ANOVA table
Analysis of variance (ANOVA) is a test for differences between the means of three or more groups. The F-value obtained from the ANOVA is used to test for these differences.
Source | df | SS | MS | F
Treatments | k-1 | SST | MST = SST/(k-1) | F = MST/MSE
Error | n-k | SSE | MSE = SSE/(n-k) |
Total | n-1 | Total SS | |
SST is used to measure the variation between the means of the k samples.
SSE is used to measure the pooled variation within the k samples.
SSE = $\sum_{i=1}^{k} (n_i - 1) s_i^2$, where $n_i$ denotes the number of observations in group $i$ and $s_i$ is the standard deviation of the observations in group $i$.
The F-value is compared with the F-crit value at the desired level of significance. If F-value > F-crit value, we reject the Null Hypothesis; otherwise we fail to reject it (Mendenhall, Beaver and Beaver 2012).
Table 1: ANOVA table
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F
Between treatments | 90 | 3 | |
Within Treatments (Error) | 120 | 20 | |
Total | | | |
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F | p-value | F-crit
Between Treatments | 90 | 3 | 30 | 5 | 0.0095 | 4.9382
Within Treatments (Error) | 120 | 20 | 6 | | |
Total | 210 | 23 | | | |
MS Excel is used to find the p-value and the F-crit value. Since the F-value is greater than the F-crit value at the 0.01 level of significance (5 > 4.9382), we reject the Null Hypothesis.
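The Mean Square and F entries of the completed table follow directly from the sums of squares and degrees of freedom. The sketch below shows that arithmetic; the function name `complete_anova` and the dictionary keys are illustrative choices, not part of the original solution.

```python
def complete_anova(ss_between, df_between, ss_within, df_within):
    """Fill in the Mean Square, F and Total cells of a one-way ANOVA table."""
    ms_between = ss_between / df_between   # mean square for treatments
    ms_within = ss_within / df_within      # mean square for error
    return {
        "MS between": ms_between,
        "MS within": ms_within,
        "F": ms_between / ms_within,
        "SS total": ss_between + ss_within,
        "df total": df_between + df_within,
    }

# Values taken from Table 1
table1 = complete_anova(ss_between=90, df_between=3, ss_within=120, df_within=20)
print(table1)
```

The p-value and F-crit columns are not reproduced here because they need the F distribution, which the assignment obtained from MS Excel.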
df between groups = 3
- m – 1 = 3
- m = 4
Thus, number of groups = 4
Total number of observations = df (Treatment) + df (Error) + 1 = 3 + 20 + 1 = 24
Table 2: Coefficients of Regression Equation
Term | Coefficients | Standard Error | t Stat | P-value
Intercept | 136.0000 | 13.7631 | 9.8815 | 0.0000
Year (t) | 39.1818 | 2.2181 | 17.6643 | 0.0000
- Number of Cars sold (in 1000s of Units) = 136.0000 + 39.1818*Year (t)
The number of cars sold for t = 11 can be calculated as
- Number of Cars sold (in 1000s of Units) = 136.0000 + 39.1818*11 = 567 (rounded), i.e. about 567,000 cars
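The forecast for t = 11 is a direct evaluation of the fitted trend line. A minimal sketch, using the coefficients from Table 2 (the names `INTERCEPT`, `SLOPE` and `forecast_cars_sold` are illustrative):

```python
INTERCEPT = 136.0000   # intercept from Table 2
SLOPE = 39.1818        # coefficient of Year (t) from Table 2

def forecast_cars_sold(t):
    """Forecast cars sold (in 1000s of units) for period t from the linear trend."""
    return INTERCEPT + SLOPE * t

forecast_t11 = forecast_cars_sold(11)
print(round(forecast_t11, 1))
```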
ANOVA
Source | df | SS | MS | F | Significance F
Regression | 1 | 59.89 | 59.89 | 29.6241 | 0.0028
Residual | 5 | 10.11 | 2.02 | |
Total | 6 | 70 | | |
At α = 0.01, the p-value for the test is 0.0028. Since p-value < α (0.0028 < 0.01), the Null Hypothesis is rejected (Cortinhas and Black 2014).
Thus, there is a statistically significant relationship between price and the number of flash drives sold.
Term | Coefficients | Standard Error | t Stat | P-value
Intercept | 40.0329 | 1.0695 | 37.4309 | 0.0000
Units sold (y) | -1.1743 | 0.2158 | -5.4428 | 0.0028
The coefficient for the number of units sold = -1.1743. The standard error of the coefficient = 0.2158.
Thus the t-stat = coefficient / standard error = -1.1743 / 0.2158 ≈ -5.44, which agrees with the tabulated value of -5.4428 up to rounding of the inputs.
At α = 0.01, the p-value for the t-statistic is 0.0028. Since p-value < α (0.0028 < 0.01), the Null Hypothesis is rejected.
Thus, there is a statistically significant relationship between price and the number of flash drives sold.
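The t-test on the slope can be sketched as follows. The p-value is copied from the regression output rather than recomputed, since that would need the t distribution; the variable names are illustrative.

```python
coefficient = -1.1743      # slope for units sold, from the coefficients table
standard_error = 0.2158    # standard error of the slope
alpha = 0.01               # level of significance used in the text
p_value = 0.0028           # taken from the regression output, not recomputed

# t statistic: estimate divided by its standard error
t_stat = coefficient / standard_error

# Decision rule: reject the Null Hypothesis when the p-value is below alpha
reject_null = p_value < alpha
print(f"t = {t_stat:.4f}, reject H0: {reject_null}")
```

With the rounded inputs the ratio comes out slightly different from the tabulated -5.4428 in the third decimal place.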
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F
Between treatments | | 4 | 800 |
Within Treatments (Error) | | 65 | |
Total | 10600 | 69 | |
df (Total) = 5*14 – 1 = 70 – 1 = 69
df (Treatment) = 5 – 1 = 4
df (Error) = 70 – 5 = 65
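Only the degrees of freedom are worked out above. The sketch below shows how the remaining cells could be filled in, under the assumption that the table gives a treatment Mean Square of 800 and a Total Sum of Squares of 10600 (this reading of the flattened table is an assumption, as are the variable names).

```python
k = 5                  # number of treatments
n_per_group = 14       # observations per treatment
n = k * n_per_group    # total observations

df_treatment = k - 1
df_error = n - k
df_total = n - 1

ms_treatment = 800     # assumed: Mean Square for treatments given in the table
ss_total = 10600       # assumed: Total Sum of Squares given in the table

ss_treatment = ms_treatment * df_treatment   # SS = MS * df
ss_error = ss_total - ss_treatment           # SS values add up to the total
ms_error = ss_error / df_error
f_value = ms_treatment / ms_error

print(df_treatment, df_error, df_total)
print(ss_treatment, ss_error, round(ms_error, 4), round(f_value, 4))
```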
Since there are more than two groups, ANOVA is used to check for differences between the three groups.
ANOVA
Source of Variation | SS | df | MS | F | P-value | F crit
Between Groups | 324 | 2 | 162 | 40.5000 | 0.0000 | 4.2565
Within Groups | 36 | 9 | 4 | | |
Total | 360 | 11 | | | |
From the ANOVA table, the p-value is 0.0000 to four decimal places. Since p-value < α (0.0000 < 0.05), we reject the Null Hypothesis.
Thus, there are statistically significant differences in the average sales of the three stores.
To test for differences in the sales of the three stores a hypothesis was developed.
Null Hypothesis: The average sales of the three stores are equal
Alternate Hypothesis: The average sales of at least one of the stores is different
A 5% level of significance is used to test the hypothesis.
At the 0.05 level of significance with df (2, 9), F-crit is 4.256. Thus, if the F-value is more than F-crit we reject the Null Hypothesis; otherwise we fail to reject it.
ANOVA
Source of Variation | SS | df | MS | F | P-value | F crit
Between Groups | 324 | 2 | 162 | 40.5 | 3.16E-05 | 4.256
Within Groups | 36 | 9 | 4 | | |
Total | 360 | 11 | | | |
At the 0.05 level of significance, F-value = 40.5. Since the F-value is more than F-crit (40.5 > 4.256), we reject the Null Hypothesis. Hence, we can conclude that there are differences in the average sales of the stores.
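The p-value and F-crit in this table were obtained from MS Excel. When the numerator degrees of freedom equal 2, as here, both have a simple closed form, which gives a way to check the tabulated values. The sketch below relies on that special case only; it is not a general F-distribution routine, and the function names are illustrative.

```python
def f_upper_tail_df1_2(f_value, df2):
    """P(F > f_value) for an F distribution with (2, df2) degrees of freedom."""
    return (1 + 2 * f_value / df2) ** (-df2 / 2)

def f_crit_df1_2(alpha, df2):
    """Critical value with upper-tail probability alpha for F(2, df2)."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)

p_value = f_upper_tail_df1_2(40.5, 9)   # F-value and error df from the table
f_crit = f_crit_df1_2(0.05, 9)          # 5% level of significance

print(f"p-value = {p_value:.3g}, F-crit = {f_crit:.4f}")
```

For any other numerator degrees of freedom, a spreadsheet or a statistics library would be needed instead.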
Null Hypothesis: There are no differences in the average sales of the three boxes
Alternate Hypothesis: There are differences in the average sales of the three boxes
Anova: Single Factor
SUMMARY
Groups | Count | Sum | Average | Variance
Box1 | 5 | 1000 | 200 | 400
Box2 | 5 | 948 | 189.6 | 133.3
Box3 | 5 | 1400 | 280 | 150
ANOVA
Source of Variation | SS | df | MS | F | P-value | F crit
Between Groups | 24467.2 | 2 | 12233.6000 | 53.7111 | 0.0000 | 3.8853
Within Groups | 2733.2 | 12 | 227.7667 | | |
Total | 27200.4 | 14 | | | |
The hypothesis is tested at α = 0.05.
At the 0.05 level of significance, F-value = 53.7111.
The p-value for the ANOVA is 0.0000 to four decimal places.
Since p-value < α (0.0000 < 0.05), we reject the Null Hypothesis.
Thus, there are statistically significant differences in the average sales of the three boxes.
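The sums of squares in the ANOVA table can be recovered from the SUMMARY block alone (count, average and variance per group). The following sketch does that; the function name `anova_from_summary` and the tuple layout are illustrative choices.

```python
def anova_from_summary(groups):
    """One-way ANOVA from (count, mean, sample variance) for each group."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / n_total
    # Between-groups SS: weighted squared distance of group means from the grand mean
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-groups SS: pooled (n - 1) * variance
    ss_within = sum((n - 1) * var for n, _, var in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_value

# (count, average, variance) for Box1, Box2, Box3 from the SUMMARY table
boxes = [(5, 200, 400), (5, 189.6, 133.3), (5, 280, 150)]
ss_between, ss_within, df_between, df_within, f_value = anova_from_summary(boxes)
print(round(ss_between, 1), round(ss_within, 1), df_between, df_within, round(f_value, 4))
```

The same function applies to any one-way layout where only group sizes, means and variances are reported.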
Statistic | Brand A | Brand B | Brand C
Average Mileage | 37 | 38 | 33
Sample Variance | 3 | 4 | 2
Number of tyres | 10 | 10 | 10
Total number of tyres = 30
Hence df (total) = 30 – 1 = 29
df (Between) = 3 (Brands) – 1 = 2
df(Error) = df(total) – df(Between) = 29 – 2 = 27
Total mileage of all 10 tyres of Brand A = 37*10 = 370
Total mileage of all 10 tyres of Brand B = 38*10 = 380
Total mileage of all 10 tyres of Brand C = 33*10 = 330
Thus total mileage of all three brands = 370 + 380 + 330 = 1080
Hence Average mileage of all 3 brands = 1080/30 = 36
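SS (between) = $\sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2$, where $n_i$ is the number of tyres of brand $i$, $\bar{x}_i$ is the average mileage of brand $i$ and $\bar{x}$ is the average mileage of all brands.
SS (between) = 10*(37 – 36)^2 + 10*(38 – 36)^2 + 10*(33 – 36)^2 = 10 + 40 + 90 = 140
SS (error) = $\sum_{i=1}^{k} (n_i - 1) s_i^2$, where $s_i^2$ is the sample variance of brand $i$.
SS (error) = 9*3 + 9*4 + 9*2 = 81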
ANOVA
Source of Variation | SS | df | MS | F | P-value | F crit
Between | 140 | 2 | 70 | 23.3333 | 0.0000 | 3.3541
Error | 81 | 27 | 3.000 | | |
Total | 221 | 29 | | | |
At the 0.05 level of significance, F-value = 23.333.
The p-value for the ANOVA is 0.0000 to four decimal places.
Since p-value < α (0.0000 < 0.05), we reject the Null Hypothesis.
Thus, there are statistically significant differences in the average mileage of the three brands of tyres.
Day | Tips | Simple Moving average
1 | 18 |
2 | 22 | 19
3 | 17 | 19
4 | 18 | 21
5 | 28 | 22
6 | 20 | 20
7 | 12 |
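The moving-average values in the table above are consistent with a 3-day average centred on each day (for example, day 2 is (18 + 22 + 17)/3 = 19). A minimal sketch under that assumption; the function name is illustrative and the window is assumed to be odd.

```python
tips = [18, 22, 17, 18, 28, 20, 12]   # tips for days 1 to 7

def centred_moving_average(values, window=3):
    """Centred simple moving average; the first and last days have no value."""
    half = window // 2
    return [
        sum(values[i - half : i + half + 1]) / window
        for i in range(half, len(values) - half)
    ]

sma = centred_moving_average(tips)   # averages for days 2 to 6
print(sma)
```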
Day | Tips | Deviation | Deviation^2 | Absolute Deviation
1 | 18 | -1.2857 | 1.6531 | 1.2857
2 | 22 | 2.7143 | 7.3673 | 2.7143
3 | 17 | -2.2857 | 5.2245 | 2.2857
4 | 18 | -1.2857 | 1.6531 | 1.2857
5 | 28 | 8.7143 | 75.9388 | 8.7143
6 | 20 | 0.7143 | 0.5102 | 0.7143
7 | 12 | -7.2857 | 53.0816 | 7.2857
AVERAGE | 19.2857 | | 20.7755 | 3.4694
The mean square error (MSE) of the amount of tip at the car park is 20.7755
The mean absolute deviation (MAD) of the amount of tip at the car park is 3.4694
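Both measures above are averages of the deviations of each day's tip from the overall mean tip. A short sketch of the calculation (variable names are illustrative):

```python
tips = [18, 22, 17, 18, 28, 20, 12]   # tips for days 1 to 7

mean_tip = sum(tips) / len(tips)
deviations = [t - mean_tip for t in tips]

# Mean square error: average of the squared deviations from the mean
mse = sum(d ** 2 for d in deviations) / len(tips)
# Mean absolute deviation: average of the absolute deviations from the mean
mad = sum(abs(d) for d in deviations) / len(tips)

print(round(mean_tip, 4), round(mse, 4), round(mad, 4))
```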
Source of Variation | Degrees of Freedom | Sum of Squares | Mean Square | F
Regression | 4 | 283940.60 | |
Error | 18 | 621735.14 | |
Total | 22 | 905675.74 | |
The coefficient of determination, R² = SS (Regression) / SS (Total) = 283940.60 / 905675.74 ≈ 0.3135, indicates that about 31.35% of the variation in the sales of Very Fresh Juice Company (dependent variable) can be explained by the four variables of price per unit, competitor’s price, advertising and type of container (independent variables).
Source of Variation | Degrees of Freedom | Sum of Squares | Mean Square | F | Significance | F-crit
Regression | 4 | 283940.60 | 70985.15 | 2.06 | 0.13 | 2.9277
Error | 18 | 621735.14 | 34540.84 | | |
Total | 22 | 905675.74 | | | |
At α = 0.05, the significance (p-value) of the F-test is 0.13.
Since p-value > α (0.13 > 0.05), we fail to reject the Null Hypothesis.
Hence, the model is not statistically significant.
Sample Size = df(Total) + 1 = 22 + 1 = 23
ANOVA
Source | df | SS | MS | F | Significance F
Regression | 2 | 118.8474 | 59.4237 | 40.9216 | 0.0000
Residual | 9 | 13.0692 | 1.4521 | |
Total | 11 | 131.9167 | | |

Term | Coefficients | Standard Error | t Stat | P-value
Intercept | 118.5059 | 33.5753 | 3.5296 | 0.0064
x1 | -0.0163 | 0.0315 | -0.5171 | 0.6176
x2 | -1.5726 | 0.3590 | -4.3807 | 0.0018
The equation to predict the price of the stock is
- y = 118.5059 – 0.0163*x1 – 1.5726*x2
The coefficients of the variables can be interpreted as follows:
- For every additional 100 stocks of Rawlston Inc. sold, the price of the stock of Rawlston Inc. decreases by 0.0163, when the volume of exchange (in millions) on the New York Stock Exchange remains constant
- For every additional one million in the volume of exchange on the New York Stock Exchange, the price of the stock of Rawlston Inc. decreases by 1.5726, when the number of stocks of Rawlston Inc. sold remains constant
Part c
At the 95% confidence level (α = 0.05)
- The p-value for the coefficient of Rawlston Inc. (x1) is 0.6176. Since p-value > α (0.6176 > 0.05), the coefficient of Rawlston Inc. is not statistically significant.
- The p-value for the coefficient of the New York Stock Exchange (x2) is 0.0018. Since p-value < α (0.0018 < 0.05), the coefficient is statistically significant.
The number of stocks of Rawlston Inc. sold = 94500 = 945 (100s)
The volume of exchange at the New York Stock Exchange = 16 million
- y = 118.5059 – 0.0163*x1 – 1.5726*x2
= 118.5059 – 0.0163*945 – 1.5726*16
= 118.5059 – 15.4035 – 25.1616 = 77.9408
Thus, the predicted stock price of Rawlston Inc. = 77.9408
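The prediction above is an evaluation of the fitted two-variable equation. A minimal sketch, with the function and argument names chosen here for illustration; x1 is in hundreds of stocks sold and x2 in millions, as in the text.

```python
def predict_price(x1_hundreds_sold, x2_nyse_volume_millions):
    """Predicted stock price from the fitted regression equation."""
    return 118.5059 - 0.0163 * x1_hundreds_sold - 1.5726 * x2_nyse_volume_millions

# 94500 stocks sold is 945 hundreds; NYSE volume is 16 million
price = predict_price(94500 / 100, 16)
print(round(price, 4))
```

Since the coefficient of x1 was found not to be statistically significant, its contribution to this prediction should be read with caution.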
Conclusion
This assignment covered the basics of ANOVA and regression. We learnt how to find the degrees of freedom, the sums of squares and the F-value, and how to evaluate the F-value on the basis of p-values (significance levels). MS Excel was used to find the p-values, so we also learnt how to use MS Excel for ANOVA and regression.
Regression analysis was used to find the relationship between variables and for forecasting.
References
Cortinhas, C. and Black, K., 2014. Statistics for business and economics. Wiley Global Education.
Mendenhall, W., Beaver, R.J. and Beaver, B.M., 2012. Introduction to probability and statistics. Cengage Learning.