Question 1
The solved questions in the following are respectively providing us the concept of frequency distribution and histograms. In the second question, simple linear regression model brings the idea about sample size, correlation co-efficient, co-efficient of determination and predicted values. Also, in the third question, one-way ANOVA is accomplished with a certain level of significance. Correspondingly, in the fourth question, multiple regression model is executed with the help of MS Excel. This model helps to generate an understanding about estimation, significance of the independent variables and slope co-efficient. Not only that, after identification of predictor, elimination of that predictor is accomplished. Finally, the creation of new regression model with single significant predictor is also shown in this context.
Below you are given the examination scores of 20 students (data set also provided in accompanying MS Excel file).
52 |
99 |
92 |
86 |
84 |
63 |
72 |
76 |
95 |
88 |
92 |
58 |
65 |
79 |
80 |
90 |
75 |
74 |
56 |
99 |
- Construct a frequency distribution, cumulative frequency distribution, relative frequency distribution, cumulative relative frequency distribution and percent frequency distribution for the data set using a class width of 10.
Using a class width of 10, the range is obtained as; (50-59), (60-69), (70-79), (80-89) and (90-99).
- Frequency distribution for the dataset
Class Range/ Interval |
Class Midpoint (x) |
Frequency |
50-59 |
54.5 |
3 |
60-69 |
64.5 |
2 |
70-79 |
74.5 |
5 |
80-89 |
84.5 |
4 |
90-99 |
94.5 |
6 |
Total |
20 |
- Cumulative frequency distribution for the dataset
Class Range/Interval |
Class Midpoint (x) |
Frequency |
Cumulative Frequency |
50-59 |
54.5 |
3 |
3 |
60-69 |
64.5 |
2 |
5 |
70-79 |
74.5 |
5 |
10 |
80-89 |
84.5 |
4 |
14 |
90-99 |
94.5 |
6 |
20 |
Total |
|
20 |
- Relative Frequency distribution for the dataset
Class Range/ Interval |
Class Mid-point (x) |
Frequency |
Relative Frequency |
50-59 |
54.5 |
3 |
3/20 = 0.15 |
60-69 |
64.5 |
2 |
2/20 = 0.10 |
70-79 |
74.5 |
5 |
5/20 = 0.25 |
80-89 |
84.5 |
4 |
4/20 = 0.20 |
90-99 |
94.5 |
6 |
6/20 = 0.30 |
Total |
|
20 |
1.00 |
- Cumulative relative Frequency distribution for the dataset
Class Range/Interval |
Class Mid-point (x) |
Frequency (f) |
Cumulative Relative Frequency (cf) |
50-59 |
54.5 |
3 |
0.15 |
60-69 |
64.5 |
2 |
0.25 |
70-79 |
74.5 |
5 |
0.50 |
80-89 |
84.5 |
4 |
0.70 |
90-99 |
94.5 |
6 |
1.00 |
Total |
|
20 |
- Percentage Frequency distribution for the dataset
Range |
Class Midpoint (x) |
Frequency |
Percentage (%) |
50-59 |
54.5 |
3 |
15% |
60-69 |
64.5 |
2 |
10% |
70-79 |
74.5 |
5 |
25% |
80-89 |
84.5 |
4 |
20% |
90-99 |
94.5 |
6 |
30% |
Total |
|
20 |
100% |
- Construct a histogram showing the percent frequency distribution of the examination scores. Comment on the shape of the distribution.
Comment: The histogram above shows the distribution of the data which are examination scores. Based on the figure, it is evident that the performances are high since the highest scores are above 90 percent. The scores between 60 and 69 were low. The histogram also suggest a normal distribution of the scores since there is no single outlier.
Shown below is a portion of a computer output for a regression analysis relating supply (Y in thousands of units) and unit price (X in thousands of dollars).
- What has been the sample size for this problem?
The sample size for this problem is equivalent to the sum of the degree of freedom of the regression and the residual.
Thus the sample size for the problem = (1+39) = 40 units/items
- Determine whether or not demand and unit price are related. Use α = 0.05.
From the above output, the relationship between demand and unit price can be determine through examination of the coefficient of the unit price (X variable).
The coefficient of unit price is 0.029. This is a positive value and thus we conclude that the demand and the unit price related.
- Compute the coefficient of determination and fully interpret its meaning. Be very specific.
The coefficient of determination helps in describing how well a regression line fits. i.e. goodness of fit. Coefficient of determination (R²) is computed is computed based on the following formula;
R² = 1- = 1- = 0.047 which is approximately equals to 0.
Given that the computed coefficient of determination = 0.047, it follows that the regression line (line of best fit) does not fit the set of data points.
- Compute the coefficient of correlation and explain the relationship between supply and unit price.
The coefficient of determination (R²) is the square of the coefficient of the correlation ®. But the coefficient of the determination computed in (c) above is 0.047.
Comment
Hence coefficient of correlation (R) = √Coefficient of determination
R = √0.047 = 0.2190
Based on the correlation of the coefficient computed which is 0.2190, this value is positive and thus there exist a positive significant relationship/correlation between supply and unit price.
- Predict the supply (in units) when the unit price is $50,000.
The regression equation is; Supply (Y) = 54.076 + 0.029 × (50,000) + Standard error
Supply (Y) = 54.076 + 0.029 × (50,000) + 0.021
Supply = $ 1504.10
Allied Corporation wants to increase the productivity of its line workers. Four different programs have been suggested to help increase productivity. Twenty employees, making up a sample, have been randomly assigned to one of the four programs and their output for a day’s work has been recorded. You are given the results below (data set also provided in accompanying MS Excel file).
Program A |
Program B |
Program C |
Program D |
150 |
150 |
185 |
175 |
130 |
120 |
220 |
150 |
120 |
135 |
190 |
120 |
180 |
160 |
180 |
130 |
145 |
110 |
175 |
175 |
- Construct an ANOVA table.
A single factor or one-way analysis of variance (ANOVA) is used in this case to test the null hypothesis that the means of the four programs are all equal. i.e.
Ho: μA = μB = μC = μD
H1: At least one inequality among μA, μB, μC and μD.
The one-way AVOVA table constructed in excel spreadsheet is as shown below;
Anova: Single Factor |
||||||
SUMMARY |
||||||
Groups |
Count |
Sum |
Average |
Variance |
||
Program A |
5 |
725 |
145 |
525 |
||
Program B |
5 |
675 |
135 |
425 |
||
Program C |
5 |
950 |
190 |
312.5 |
||
Program D |
5 |
750 |
150 |
637.5 |
||
ANOVA |
||||||
Source of Variation |
SS |
df |
MS |
F |
P-value |
F crit |
Between Groups |
8750 |
3 |
2916.667 |
6.140351 |
0.00557 |
3.238872 |
Within Groups |
7600 |
16 |
475 |
|||
Total |
16350 |
19 |
As the statistical consultant to Allied, what would you advise them? Use a .05 level of significance.
At 0.05 level of significance, F=6.140351 which is greater than F critical (3.238872). We therefore reject the null hypothesis. It implies that the means of all the three programs are not equal. As a statistical consultant to Allie, I would therefore advise them not to implement the four programs among the line workers if they want to increase the productivity.
A company has recorded data on the weekly sales for its product (y), the unit price of the competitor’s product (x1), and advertising expenditures (x2). The data resulting from a random sample of 7 weeks follows. Use Excel’s Regression Tool to answer the following questions (data set also provided in accompanying MS Excel file).
Week Price (x1) Advertising (x2) Sales
- 33 5 20
- 25 2 14
- 44 7 22
- 40 9 21
- 35 4 16
- 39 8 19
- 29 9 15
- What is the estimated regression equation? Show the regression output.
The estimated regression line equation is; y = Sales = 3.597615 + 41.32002 × Price + 0.013242 × Advertising.
Sales = 3.597615 + 41.32002 × Price + 0.013242 × Advertising.
The regression outputs from the excel;
SUMMARY OUTPUT |
||||||||
Regression Statistics |
||||||||
Multiple R |
0.877814 |
|||||||
R Square |
0.770558 |
|||||||
Adjusted R Square |
0.655837 |
|||||||
Standard Error |
1.83741 |
|||||||
Observations |
7 |
|||||||
ANOVA |
||||||||
df |
SS |
MS |
F |
Significance F |
||||
Regression |
2 |
45.35284 |
22.67642 |
6.716801 |
0.052644 |
|||
Residual |
4 |
13.5043 |
3.376075 |
|||||
Total |
6 |
58.85714 |
||||||
Coefficients |
Standard Error |
t Stat |
P-value |
Lower 95% |
Upper 95% |
Lower 90.0% |
Upper 90.0% |
|
Intercept |
3.597615 |
4.052244 |
0.887808 |
0.424805 |
-7.65322 |
14.84845 |
-5.04115 |
12.23638 |
Price (x1) |
41.32002 |
13.33736 |
3.098065 |
0.036289 |
4.289567 |
78.35048 |
12.88681 |
69.75324 |
Advertising (x2) |
0.013242 |
0.327592 |
0.040422 |
0.969694 |
-0.8963 |
0.922782 |
-0.68513 |
0.711617 |
RESIDUAL OUTPUT |
||||||||
Observation |
Predicted Y |
Residuals |
||||||
1 |
17.29943 |
2.700568 |
||||||
2 |
13.9541 |
0.045896 |
||||||
3 |
21.87112 |
0.128882 |
||||||
4 |
20.2448 |
0.7552 |
||||||
5 |
18.11259 |
-2.11259 |
||||||
6 |
19.81836 |
-0.81836 |
||||||
7 |
15.6996 |
-0.6996 |
- Determine whether the model is significant overall. Use α = 0.10.
By using α = 0.10 as the level of significance, we interpret the overall F test of the significance. We then compare the P-value for the F-test to our significance level. If the P-value is less than the significance level, then the sample data would provide a sufficient evidence to conclude that the regression model is significant.
In our case, the P-value= 0.424805 which is greater than the significance level (α = 0.10). Thus the overall model is not significant.
- Determine if competitor’s price and advertising is individuallysignificantly related to sales. Use α = 0.10.
Statistical significant is the likelihood that the relationship between competitor’s price and advertising is related to sales. The P-value is examined to determine if competitor’s price and advertising is individually significantly related to sales. If the P- value obtained in each variable is lower than 5% i.e. 0.05, then the variable is individually significantly related to sales. In this case, the P-value under competitor’s price is 0.036289 while that of advertising is 0.969694. It implies that only competitor’s price is individually significantly related to sales since its P- value is less than 0. 05. Thus, advertising is an insignificant variable.
Based on your answer to part (c), drop any insignificant independent variable(s) and d.
Re-estimate the model. What is the new estimated regression equation?
Based on the findings, the advertising is the insignificant independent variable to be dropped since it has a P-value that is higher than 5 percent (0.05). We then res-estimate the model and the new estimated regression equation becomes; Sales = 3.581788 + 41.60305 × Competitor’s Price
The new estimated regression equation;
Sales = 3.581788 + 41.60305 × Competitor’s Price
Excel output is as shown below.
SUMMARY OUTPUT |
||||||||
Regression Statistics |
||||||||
Multiple R |
0.877761 |
|||||||
R Square |
0.770464 |
|||||||
Adjusted R Square |
0.724557 |
|||||||
Standard Error |
1.643765 |
|||||||
Observations |
7 |
|||||||
ANOVA |
||||||||
df |
SS |
MS |
F |
Significance F |
||||
Regression |
1 |
45.34733 |
45.34733 |
16.78311 |
0.009385 |
|||
Residual |
5 |
13.50981 |
2.701963 |
|||||
Total |
6 |
58.85714 |
||||||
Coefficients |
Standard Error |
t Stat |
P-value |
Lower 95% |
Upper 95% |
Lower 90.0% |
Upper 90.0% |
|
Intercept |
3.581788 |
3.608215 |
0.992676 |
0.366447 |
-5.69342 |
12.857 |
-3.68894 |
10.85252 |
Price (x1) |
41.60305 |
10.15521 |
4.096719 |
0.009385 |
15.49825 |
67.70786 |
21.13981 |
62.0663 |
- Interpret the slope coefficient(s) of the model from part (d).
From the model obtained above the coefficient of competitor’s price is obtained as 41.60305. It therefore follows that for each unit increase in competitor’s price, the value of sale increases with a value that is equivalent to 41.60305.
Conclusion:
Finally, the questions of the assignment help to create a transparent concept about using common MS Excel functions and ‘Data analysis’ tool-pack. The approach of distribution and its aspects of any continuous variable (ex: exam scores of the students) is analyzed as well as depicted with respect to the analytical report. The interrelationship of various statistics and parameters of linear regression model or multiple regression model are learned in this analytical report. The ANOVA test taught us about rejecting or accepting the null or alternative hypothesis with respect to certain level of significance (here 5%). The strength of association and its significance as per coefficient of correlation or coefficient of determination are established with the reference of regression analysis. The estimation or prediction approach is apprehended by the simple or multiple regression models.
References
John, A. Rice (2005).Mathematical Statistics and Data Analysis. Berkeley: University of California
Gulezian, R. (2006). Elements of Business Statistics. London: Oxford press.