Frequency distribution problem involving class width and histogram
Solution to Question 1
Here the class width is $50.
The data are the 50 dollar amounts of furniture orders: {136, 281, 226, 123, 178, 445, 231, 389, 196, 175, 211, 162, 212, 241, 182, 290, 434, 167, 246, 338, 194, 242, 368, 258, 323, 196, 183, 209, 198, 212, 277, 348, 173, 409, 264, 237, 490, 222, 472, 248, 231, 154, 166, 214, 311, 141, 159, 362, 189, 260}
Min = minimum observation of the data = 123
Max = maximum observation of the data = 490
Range = Max – Min = 490 – 123 = 367
Number of classes = Range / width = 367 / 50 = 7.34, which is rounded up to 8.
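The range and class-count arithmetic above can be reproduced with a short Python sketch. It assumes the classes start at the minimum observation and are half-open (lower limit included, upper limit excluded); under that assumption, floor(range / width) + 1 gives the rounded-up count of 8, and it also keeps the maximum inside the last class if the range happens to be an exact multiple of the width.

```python
import math

# The 50 furniture-order amounts (dollars) listed in Question 1.
data = [136, 281, 226, 123, 178, 445, 231, 389, 196, 175,
        211, 162, 212, 241, 182, 290, 434, 167, 246, 338,
        194, 242, 368, 258, 323, 196, 183, 209, 198, 212,
        277, 348, 173, 409, 264, 237, 490, 222, 472, 248,
        231, 154, 166, 214, 311, 141, 159, 362, 189, 260]

width = 50
lo, hi = min(data), max(data)
data_range = hi - lo  # Range = Max - Min

# floor(range / width) + 1 rounds 7.34 up to 8 and guarantees the maximum
# falls inside the last class, since each class excludes its upper limit.
num_classes = math.floor(data_range / width) + 1

# Class limits start at the minimum and step by the class width.
classes = [(lo + i * width, lo + (i + 1) * width) for i in range(num_classes)]

print(lo, hi, data_range, num_classes)
print(classes)
```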
So there are 8 classes:
123-173, 173-223, 223-273, 273-323, 323-373, 373-423, 423-473, 473-523
Frequency distribution:
The frequency of a class is the number of observations in that class, i.e. the number of observations greater than or equal to the lower limit of the class and less than its upper limit.
| Class Interval | Frequency |
|---|---|
| 123-173 | 8 |
| 173-223 | 16 |
| 223-273 | 11 |
| 273-323 | 4 |
| 323-373 | 5 |
| 373-423 | 2 |
| 423-473 | 3 |
| 473-523 | 1 |
| Total | 50 |
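A minimal sketch of the counting rule, assuming the same half-open classes that start at 123 with width 50: integer division of (x - 123) by 50 gives each observation's class index, and `collections.Counter` tallies those indices.

```python
from collections import Counter

# The 50 furniture-order amounts (dollars) from Question 1.
data = [136, 281, 226, 123, 178, 445, 231, 389, 196, 175,
        211, 162, 212, 241, 182, 290, 434, 167, 246, 338,
        194, 242, 368, 258, 323, 196, 183, 209, 198, 212,
        277, 348, 173, 409, 264, 237, 490, 222, 472, 248,
        231, 154, 166, 214, 311, 141, 159, 362, 189, 260]

start, width, num_classes = 123, 50, 8

# Class i is [start + i*width, start + (i+1)*width): a value equal to an
# upper limit is counted in the next class, matching the definition above.
counts = Counter((x - start) // width for x in data)
freq = [counts[i] for i in range(num_classes)]

for i, f in enumerate(freq):
    lower = start + i * width
    print(f"{lower}-{lower + width}: {f}")
print("Total:", sum(freq))
```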
Relative Frequency distribution:
The relative frequency of a class is the number of observations in that class divided by the total frequency. Here the total frequency is 50.
| Class Interval | Relative Frequency |
|---|---|
| 123-173 | 0.16 |
| 173-223 | 0.32 |
| 223-273 | 0.22 |
| 273-323 | 0.08 |
| 323-373 | 0.10 |
| 373-423 | 0.04 |
| 423-473 | 0.06 |
| 473-523 | 0.02 |
| Total | 1.00 |
Percent Frequency distribution:
The percent frequency of a class is its relative frequency expressed as a percentage:
Percent frequency = Relative frequency × 100
| Class Interval | Percent Frequency |
|---|---|
| 123-173 | 16 |
| 173-223 | 32 |
| 223-273 | 22 |
| 273-323 | 8 |
| 323-373 | 10 |
| 373-423 | 4 |
| 423-473 | 6 |
| 473-523 | 2 |
| Total | 100 |
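Relative and percent frequencies follow directly from the class counts. This sketch simply hard-codes the frequencies from the frequency table and applies both definitions.

```python
freq = [8, 16, 11, 4, 5, 2, 3, 1]  # class frequencies from the frequency table
n = sum(freq)                      # total frequency, 50

relative = [f / n for f in freq]       # relative frequency = f / n
percent = [100 * f / n for f in freq]  # percent frequency = relative x 100

for r, p in zip(relative, percent):
    print(f"{r:.2f}  {p:.0f}%")
print("Totals:", round(sum(relative), 10), sum(percent))
```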
Figure: Percent Frequency Histogram
From the above histogram, one can say that the dollar-amount distribution of the furniture orders she has recently received is positively skewed (skewed to the right).
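For a positively skewed distribution, the long right tail pulls the mean upward, so the mean is typically larger than the median, and the median is the more representative measure of location. For the given sample data:
| Measure | Value |
|---|---|
| Mean | 251.46 |
| Median | 228.5 |
| Mode | 196, 212 and 231 (each occurs twice) |
Since the mean exceeds the median, the median (228.5) is the preferred measure of location for these data.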
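The summary statistics can be checked with Python's standard `statistics` module (Python 3.8 or later is assumed, for `multimode` and for the tie-handling behaviour of `mode`). Note that `statistics.mode` returns only the first mode encountered in the data (231 here), whereas `statistics.multimode` lists every value tied for the highest count.

```python
import statistics

# The 50 furniture-order amounts (dollars) from Question 1.
data = [136, 281, 226, 123, 178, 445, 231, 389, 196, 175,
        211, 162, 212, 241, 182, 290, 434, 167, 246, 338,
        194, 242, 368, 258, 323, 196, 183, 209, 198, 212,
        277, 348, 173, 409, 264, 237, 490, 222, 472, 248,
        231, 154, 166, 214, 311, 141, 159, 362, 189, 260]

mean = statistics.mean(data)        # 12573 / 50
median = statistics.median(data)    # average of the 25th and 26th ordered values
modes = statistics.multimode(data)  # every value tied for the highest count
first_mode = statistics.mode(data)  # first of the tied modes in data order

print(mean, median, sorted(modes), first_mode)
print("mean > median (right skew):", mean > median)
```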
Solution to Question 2
(a)
Here Y is demand and X is unit price. In linear regression analysis we test whether there is a linear relationship between Y and X: the null hypothesis is that Y and X are not related (the slope is zero), and the alternative hypothesis is that Y and X are related (the slope is nonzero). The regression ANOVA portion of the computer output is:
ANOVA
| Source | df | SS |
|---|---|---|
| Regression | 1 | 5048.818 |
| Residual | 46 | 3132.661 |
| Total | 47 | 8181.479 |
Here SS denotes the sum of squares and df the degrees of freedom.
First we need to complete the ANOVA table; for that we calculate the mean squares and the F statistic for testing the null hypothesis.
Given:
sum of squares for regression = SSR = 5048.818
sum of squares for error = SSE = 3132.661
degrees of freedom for regression = 1
degrees of freedom for error = 46
degrees of freedom for total = 47
The mean square for regression (MSR) is the regression sum of squares divided by its degrees of freedom:
MSR = SSR / d.f. = 5048.818 / 1 = 5048.818
The mean square for error (MSE) is the error sum of squares divided by its degrees of freedom:
MSE = SSE / d.f. = 3132.661 / 46 = 68.101
The F statistic is the ratio of the mean square for regression to the mean square for error:
F statistic = MSR / MSE = 5048.818 / 68.101 = 74.137
The Complete ANOVA of the regression analysis is
| Source | df | SS | MS | F statistic |
|---|---|---|---|---|
| Regression | 1 | 5048.818 | 5048.818 | 74.137 |
| Residual | 46 | 3132.661 | 68.101 | |
| Total | 47 | 8181.479 | | |
Now we compare this F statistic with the critical value from the F distribution table, which we call F tabulated.
Decision Criteria:
If F statistic > F tabulated, we reject the null hypothesis; otherwise we do not reject it.
Under the null hypothesis, the F statistic follows an F distribution with 1 numerator degree of freedom and 46 denominator degrees of freedom. The critical value of the F distribution with (1, 46) degrees of freedom at α = 0.05 is our F tabulated:
F tabulated = F(0.05; 1, 46) = 4.052
Since F statistic = 74.137 > F tabulated = 4.052, we reject the null hypothesis. So we can say that at α = 0.05, demand (Y) and unit price (X) are related.
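The ANOVA completion and F test for Question 2 can be sketched as below. The helper name `regression_anova` is hypothetical, and the critical value 4.052 is copied from the F table rather than computed, so no third-party statistics library is assumed.

```python
def regression_anova(ssr, sse, df_reg, df_err):
    """Return (MSR, MSE, F) for a regression ANOVA table."""
    msr = ssr / df_reg  # mean square for regression
    mse = sse / df_err  # mean square for error (residual)
    return msr, mse, msr / mse


msr, mse, f_stat = regression_anova(5048.818, 3132.661, df_reg=1, df_err=46)

# F(0.05; 1, 46) read from an F table, as in the text (not computed here).
f_tab = 4.052
reject_h0 = f_stat > f_tab  # reject H0 when the F statistic exceeds F tabulated

print(f"MSR = {msr:.3f}, MSE = {mse:.3f}, F = {f_stat:.3f}, reject H0: {reject_h0}")
```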
(b) The coefficient of determination is the proportion of the total variation in the response that is explained by the model. It is denoted by R² and given by
Coefficient of determination = sum of squares for regression / total sum of squares
= 5048.818 / 8181.479
= 0.6171
i.e. 61.71% of the variation in demand (Y) is explained by the regression on unit price (X).
(c)
Correlation coefficient:
The coefficient of determination is the square of the correlation coefficient; conversely, the magnitude of the correlation coefficient is the square root of the coefficient of determination. The sign of the correlation coefficient is the sign of the coefficient of X, which is -2.137, so X and Y are negatively related:
$$r = -\sqrt{R^2} = -\sqrt{0.6171} = -0.7856$$
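A sketch of parts (b) and (c): R² is SSR / SST, and r is the square root of R² carrying the sign of the coefficient of X (-2.137, taken from the regression output quoted in the text). `math.copysign` attaches that sign.

```python
import math

ssr, sst = 5048.818, 8181.479  # regression and total sums of squares
slope_x = -2.137               # coefficient of X (unit price) from the output

r_squared = ssr / sst
# |r| = sqrt(R^2); math.copysign gives r the same sign as the slope.
r = math.copysign(math.sqrt(r_squared), slope_x)

print(round(r_squared, 4), round(r, 4))
```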
Relationship between demand and unit price:
The negative coefficient of X shows that the relationship between demand and unit price is negative, and the correlation coefficient of -0.7856 indicates a fairly strong negative linear relationship.
That is, as the unit price increases demand tends to decrease, and as the unit price decreases demand tends to increase.
Solution to the Question 3
In a completely randomized design, we test whether the means of all treatments are equal. The null hypothesis is that there is no significant difference between the treatment means, and the alternative hypothesis is that at least one treatment mean differs from the others. To test this hypothesis we need to complete the following ANOVA table.
| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F |
|---|---|---|---|---|
| Between Treatments | 390.58 | | | |
| Within Treatments (Error) | 158.4 | | | |
| Total | 548.98 | 23 | | |
Given that there are three treatments:
degrees of freedom between treatments = number of treatments – 1 = 3 – 1 = 2
degrees of freedom for error (within treatments) = total degrees of freedom – degrees of freedom between treatments = 23 – 2 = 21
The mean square between treatments is the sum of squares between treatments divided by its degrees of freedom:
Mean square between treatments = sum of squares between treatments / degrees of freedom between treatments
= 390.58 / 2 = 195.290
The mean square within treatments (error) is the sum of squares within treatments (error) divided by its degrees of freedom:
Mean square within treatments (error) = sum of squares within treatments (error) / degrees of freedom within treatments (error)
= 158.4 / 21 = 7.543
The F value is the ratio of the mean square between treatments to the mean square within treatments (error):
F value = mean square between treatments / mean square within treatments (error)
= 195.290 / 7.543
= 25.891
Completed ANOVA:
| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F |
|---|---|---|---|---|
| Between Treatments | 390.58 | 2 | 195.290 | 25.891 |
| Within Treatments (Error) | 158.4 | 21 | 7.543 | |
| Total | 548.98 | 23 | | |
Now we compare this F value with the critical value from the F distribution table, which we call F tabulated.
Decision Criteria:
If F value > F tabulated, we reject the null hypothesis; otherwise we do not reject it.
Under the null hypothesis, the F statistic follows an F distribution with 2 numerator degrees of freedom and 21 denominator degrees of freedom. The critical value of the F distribution with (2, 21) degrees of freedom at α = 0.05 is our F tabulated:
F tabulated = F(0.05; 2, 21) = 3.4668
Since F value = 25.891 > F tabulated = 3.4668, we reject the null hypothesis,
i.e. there is a significant difference among the means of the three treatment populations.
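The same bookkeeping for the completely randomized design in Question 3, as a hedged sketch. The helper `crd_anova` is a hypothetical name, and the critical value 3.4668 is the tabulated F(0.05; 2, 21) used in the text rather than a computed quantile.

```python
def crd_anova(ss_treat, ss_error, k, df_total):
    """Complete a one-way (completely randomized design) ANOVA table."""
    df_treat = k - 1                # number of treatments minus one
    df_error = df_total - df_treat  # remaining degrees of freedom
    ms_treat = ss_treat / df_treat  # mean square between treatments
    ms_error = ss_error / df_error  # mean square within treatments (error)
    return df_treat, df_error, ms_treat, ms_error, ms_treat / ms_error


df_treat, df_error, ms_treat, ms_error, f_value = crd_anova(
    390.58, 158.4, k=3, df_total=23)

f_tab = 3.4668  # F(0.05; 2, 21) from an F table, as in the text
print(df_treat, df_error, round(ms_treat, 3), round(ms_error, 3), round(f_value, 3))
print("Reject H0:", f_value > f_tab)
```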
Solution to Question 4
Here
Y = number of mobile phones sold per day
X1 = price (in $1000)
X2 = number of advertising spots
(a)
The estimated regression equation of Y on X1 and X2 is
Y = intercept + (slope of X1) × X1 + (slope of X2) × X2
i.e. Y = 0.8051 + 0.4977 × X1 + 0.4733 × X2
(b)
To test for a significant relationship between Y and the variables X1 and X2, we need to complete the given ANOVA table. The null hypothesis is that X1 and X2 are not significantly related to Y (both slope coefficients are zero), and the alternative hypothesis is that X1 and X2 are significantly related to Y (at least one slope coefficient is nonzero).
Given that there are 7 observations and two regressor variables, X1 and X2:
Degrees of freedom for regression = 2
Degrees of freedom for total = 7 – 1 = 6
Degrees of freedom for residual = degrees of freedom for total – degrees of freedom for regression
= 6 – 2 = 4
Sum of squares for regression = 40.7
Sum of squares for error = 1.016
Each mean square is the sum of squares divided by its degrees of freedom.
So,
Mean square for regression = 40.7 / 2 = 20.35
Mean square for error = 1.016 / 4 = 0.254
F value = mean square for regression / mean square for error
= 20.35 / 0.254 = 80.118
The complete ANOVA is given as
| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F |
|---|---|---|---|---|
| Regression | 40.7 | 2 | 20.350 | 80.118 |
| Residual | 1.016 | 4 | 0.254 | |
| Total | 41.716 | 6 | | |
Now we compare this F value with the critical value from the F distribution table, which we call F tabulated.
Decision Criteria:
If F value > F tabulated, we reject the null hypothesis; otherwise we do not reject it.
Under the null hypothesis, the F statistic follows an F distribution with 2 numerator degrees of freedom and 4 denominator degrees of freedom. The critical value of the F distribution with (2, 4) degrees of freedom at α = 0.05 is our F tabulated:
F tabulated = F(0.05; 2, 4) = 6.944
Since F value = 80.118 > F tabulated = 6.944, we reject the null hypothesis,
i.e. there is a significant relationship between the independent variables (X1 and X2), taken jointly, and the dependent variable (Y).
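For Question 4(b), the degrees of freedom, mean squares and F value can be sketched as follows, assuming n = 7 observations and p = 2 regressors as stated. The critical value is hard-coded as the tabulated F(0.05; 2, 4), approximately 6.944, rather than computed.

```python
n, p = 7, 2             # observations and regressors (X1, X2)
ssr, sse = 40.7, 1.016  # sums of squares for regression and residual

df_reg = p
df_total = n - 1
df_err = df_total - df_reg  # 6 - 2 = 4

msr = ssr / df_reg  # mean square for regression
mse = sse / df_err  # mean square for error
f_value = msr / mse

f_tab = 6.944  # F(0.05; 2, 4) from an F table (not computed here)
print(round(ssr + sse, 3), df_err, round(msr, 3), round(mse, 3), round(f_value, 3))
print("Reject H0:", f_value > f_tab)
```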
(c)
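- i) We test H0: β1 = 0 vs H1: β1 ≠ 0.
The t statistic for testing this hypothesis is
$$t = \frac{b_1}{s_{b_1}} = 1.078$$
where b1 is the estimated coefficient of X1 and s_b1 is its standard error. Under H0, the t statistic follows a t distribution with 4 degrees of freedom. At α = 0.05, the two-tailed critical value of the t distribution is
Critical t value = t(0.025; 4) = 2.776
Decision criteria:
Reject H0 if |t| > 2.776.
Here |1.078| < 2.776, so we fail to reject H0: price (X1) is not individually significant.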
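- ii) We test H0: β2 = 0 vs H1: β2 ≠ 0.
The t statistic for testing this hypothesis is
$$t = \frac{b_2}{s_{b_2}} = 12.33$$
where b2 is the estimated coefficient of X2 and s_b2 is its standard error. Under H0, the t statistic follows a t distribution with 4 degrees of freedom. At α = 0.05, the two-tailed critical value of the t distribution is
Critical t value = t(0.025; 4) = 2.776
Decision criteria:
Reject H0 if |t| > 2.776.
Here |12.33| > 2.776, so we reject H0: the number of advertising spots (X2) is individually significant.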
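The two individual t tests reduce to comparing |t| with the two-tailed critical value t(0.025; 4) = 2.776. This sketch hard-codes the t statistics reported above; the dictionary labels are illustrative only.

```python
t_tab = 2.776  # two-tailed critical value t(0.025; 4) at alpha = 0.05

# t statistics for the individual slope tests, as reported in part (c).
t_stats = {"beta1 (price, X1)": 1.078, "beta2 (advertising spots, X2)": 12.33}

# Reject H0: beta_j = 0 when |t| exceeds the critical value.
decisions = {name: abs(t) > t_tab for name, t in t_stats.items()}
for name, reject in decisions.items():
    print(f"{name}: {'reject H0' if reject else 'fail to reject H0'}")
```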
(d)
Interpretation of Slope of X2:
If X1 is held fixed (that is, the price is fixed), each one-unit increase in X2 (one more advertising spot) increases the estimated Y (number of mobile phones sold per day) by 0.4733.
(e)
If X1 = 20 and X2 = 10, then
Y = 0.8051 + 0.4977 × 20 + 0.4733 × 10 = 15.4921 ≈ 15
If the company charges $20,000 per phone and uses 10 advertising spots, approximately 15 mobile phones are expected to be sold per day.
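Parts (d) and (e) can be checked with a small prediction function built from the estimated equation in part (a). The function name `predict_units` is hypothetical; the sketch also confirms that adding one advertising spot at a fixed price raises the prediction by the X2 slope, 0.4733.

```python
def predict_units(price_thousands, ad_spots):
    """Estimated number of phones sold per day, from the part (a) equation."""
    return 0.8051 + 0.4977 * price_thousands + 0.4733 * ad_spots


y_hat = predict_units(20, 10)  # price $20,000 (X1 = 20), 10 advertising spots
print(round(y_hat, 4), round(y_hat))  # point estimate and whole phones

# Part (d): with price fixed, one more spot changes the prediction by the X2 slope.
delta = predict_units(20, 11) - predict_units(20, 10)
print(round(delta, 4))
```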