Frequency distribution of the dataset
Question 1 Solution:
Here the variable under study is the furniture order amount (in $).
To construct the frequency distribution, we need to find the minimum and maximum observations in the data. The class width is given as 50.
Minimum = 123
Maximum = 490
So, Range of the data = Maximum − Minimum = 490 − 123 = 367
Number of classes = (Range of the data) / (class width) = 367 / 50 = 7.34
Since 7.34 is not a whole number, we round up and take 8 classes, so that the minimum of the data falls in the first class and the maximum in the last class.
So the 8 classes are as follows:
120 – 170
170 – 220
220 – 270
270 – 320
320 – 370
370 – 420
420 – 470
470 – 520
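As a quick sketch, the class-construction arithmetic above can be reproduced in a few lines of Python; the lower limit of 120 is the convenient round starting value chosen above.

```python
import math

# A minimal sketch of the class-construction arithmetic above.
# The class width is given; the minimum and maximum come from the data.
data_min, data_max = 123, 490
class_width = 50

data_range = data_max - data_min                  # 367
n_classes = math.ceil(data_range / class_width)   # ceil(7.34) = 8

# Build the class boundaries, starting from 120, the convenient
# round lower limit chosen in the solution above.
lower = 120
classes = [(lower + i * class_width, lower + (i + 1) * class_width)
           for i in range(n_classes)]
print(classes)  # [(120, 170), (170, 220), ..., (470, 520)]
```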
The frequency of a class is the count of observations in that class, i.e., the number of data points greater than or equal to the lower boundary of the class and less than its upper boundary.
Class Interval | Frequency
120 – 170 | 8
170 – 220 | 15
220 – 270 | 12
270 – 320 | 4
320 – 370 | 5
370 – 420 | 2
420 – 470 | 2
470 – 520 | 2
Total | 50
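A minimal sketch of the tally step, assuming the 50 raw order amounts (not reproduced in this write-up) are available; the file name orders.txt is hypothetical.

```python
import numpy as np

# Sketch of the tally step. The 50 raw order amounts are not
# reproduced in this write-up; "orders.txt" is a hypothetical file
# holding one order amount per line.
orders = np.loadtxt("orders.txt")

edges = np.arange(120, 521, 50)             # 120, 170, ..., 520
freq, _ = np.histogram(orders, bins=edges)  # each bin is [lower, upper),
                                            # except the last, which is closed

for lo, hi, f in zip(edges[:-1], edges[1:], freq):
    print(f"{lo} - {hi}: {f}")
print("Total:", freq.sum())                 # should be 50
```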
The relative frequency of a class is the ratio of that class's frequency to the total frequency. Here the total frequency, i.e., the total number of observations, is 50.
Class Interval | Relative Frequency
120 – 170 | 0.16
170 – 220 | 0.30
220 – 270 | 0.24
270 – 320 | 0.08
320 – 370 | 0.10
370 – 420 | 0.04
420 – 470 | 0.04
470 – 520 | 0.04
Total | 1.00
Percent frequency of any particular class = Relative frequency of that class × 100
Class Interval | Percent Frequency
120 – 170 | 16
170 – 220 | 30
220 – 270 | 24
270 – 320 | 8
320 – 370 | 10
370 – 420 | 4
420 – 470 | 4
470 – 520 | 4
Total | 100
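The conversion to relative and percent frequencies, and the percent frequency histogram itself, can be sketched as follows, using the class frequencies from the table above:

```python
import numpy as np
import matplotlib.pyplot as plt

# Relative and percent frequencies from the class frequencies
# tabulated above, plus the percent frequency histogram.
labels = ["120-170", "170-220", "220-270", "270-320",
          "320-370", "370-420", "420-470", "470-520"]
freq = np.array([8, 15, 12, 4, 5, 2, 2, 2])

rel = freq / freq.sum()   # 0.16, 0.30, 0.24, 0.08, 0.10, 0.04, 0.04, 0.04
pct = 100 * rel           # 16, 30, 24, 8, 10, 4, 4, 4

plt.bar(labels, pct, width=1.0, edgecolor="black")
plt.xlabel("Furniture order ($)")
plt.ylabel("Percent frequency")
plt.title("Percent Frequency Histogram")
plt.show()
```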
Figure: Percent Frequency Histogram
One can see that there is positive skewness in the data.
Regardless of the shape of the distribution, the mean is a good measure of center whenever it exists.
For the given observations,
Mean = 251.46, Median = 228.5 and Mode = 231
Note that the mean exceeds the median, which is consistent with the positive skew.
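A sketch of how these three measures would be computed from the raw data, reusing the hypothetical orders.txt file from the earlier sketch:

```python
import numpy as np
from statistics import mean, median, mode

# Measures of center, computed from the same hypothetical
# "orders.txt" file as above.
orders = np.loadtxt("orders.txt").tolist()
print(mean(orders), median(orders), mode(orders))
# quoted values: mean = 251.46, median = 228.5, mode = 231
```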
Question 2 Solution:
This is a simple linear regression problem.
The response variable is demand and the predictor (independent) variable is unit price:
Y: Demand and X: Unit Price
(a)
Here we test the null hypothesis that Y and X are not related against the alternative hypothesis that Y and X are related.
To test this hypothesis, we need to complete the given incomplete ANOVA table, where
SS = sum of squares and
df = degrees of freedom.
Given values in the ANOVA table:
sum of squares for regression = SSReg = 5048.818
sum of squares for error = SSError = 3132.661
df for regression = 1
df for error = 46
df for total = 47.
Now,
MSReg = mean sum of squares for regression = SSReg / df = 5048.818 / 1 = 5048.818
MSError = mean sum of squares for error = SSError / df = 3132.661 / 46 = 68.101
F-Value = MSReg / MSError = 5048.818 / 68.101 = 74.137
Completed ANOVA of the regression analysis is
ANOVA
Sources of Variation | df | SS | MSS | F-Value
Regression | 1 | 5048.818 | 5048.8180 | 74.1369
Residual | 46 | 3132.661 | 68.1013 |
Total | 47 | 8181.479 | |
Decision criteria for rejecting or not rejecting the null hypothesis:
If the F-Value > the critical value of F, we reject the null hypothesis; otherwise we do not have enough evidence to reject it. To carry out the test, we assume that the null hypothesis is true.
Under the null hypothesis, the F-Value follows an F distribution with (1, 46) degrees of freedom,
where 1 is the numerator df and 46 is the denominator df.
The critical value of the F distribution with (1, 46) degrees of freedom at α = 0.05 is F(0.05, 1, 46) = 4.0517.
Comparing the two values, F-Value = 74.1369 > F(0.05, 1, 46) = 4.0517, so we reject the null hypothesis and conclude that there is strong evidence that demand (Y) and unit price (X) are related.
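As a sketch, the completion of the table and the test can be reproduced with scipy; stats.f.ppf gives the critical value and stats.f.sf the p-value:

```python
from scipy import stats

# Completing the regression ANOVA table and running the F-test.
ss_reg, ss_err = 5048.818, 3132.661
df_reg, df_err = 1, 46

ms_reg = ss_reg / df_reg                    # 5048.818
ms_err = ss_err / df_err                    # 68.101
f_value = ms_reg / ms_err                   # 74.137

f_crit = stats.f.ppf(0.95, df_reg, df_err)  # 4.0517 at alpha = 0.05
p_value = stats.f.sf(f_value, df_reg, df_err)

print(f_value, f_crit, p_value)
print("reject H0" if f_value > f_crit else "fail to reject H0")
```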
(b)
R² / Coefficient of Determination:
When we fit the regression model to the data, the variation in the response variable is split between the part explained by the predictor (regressor) variable and the error. The coefficient of determination is the proportion of the total variation explained by the predictor variable. It is usually denoted by R² and is given by
Coefficient of determination = SSReg / SSTotal
where SSTotal is total sum of square.
Coefficient of determination (R²) = 5048.818 / 8181.479 = 0.617103
This means the predictor variable (unit price) explains about 61.7103% of the variation in the response (demand).
(c)
Correlation coefficient:
We know that R² is the square of the correlation coefficient, so we can compute the correlation coefficient by taking the square root of the coefficient of determination. As the slope of X is negative, the correlation between X and Y is negative.
correlation coefficient = −√R² = −√0.617103 = −0.785559
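In code, a minimal sketch:

```python
import math

# R^2 and the correlation coefficient from the sums of squares.
ss_reg, ss_total = 5048.818, 8181.479

r_squared = ss_reg / ss_total  # 0.617103
r = -math.sqrt(r_squared)      # negative, because the slope of X is negative

print(r_squared, r)            # 0.617103 -0.785559
```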
Relationship between response variable demand and predictor variable unit price:
Since the slope of X is negative, the relationship between the response variable demand and the predictor variable unit price is negative. The computed correlation coefficient of −0.785559 suggests a very strong negative relationship between them.
Demand and unit price are inversely related: when unit price increases, demand decreases, and when unit price decreases, demand increases.
Question 3 Solution:
Here we are interested in testing whether all the treatments have the same mean.
H0: All the treatment means are the same.
H1: At least one of the treatment means is different.
For testing the null hypothesis, we first have to complete the ANOVA table.
If we have k treatments and n total observations, then for testing the null hypothesis the ANOVA table is:
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F
Between Treatments | SST | k − 1 | MST | Cal F
Error | SSE | n − k | MSE |
Total | TSS | n − 1 | |
where SST is the sum of squares due to treatments, SSE is the sum of squares due to error, TSS is the total sum of squares, MST is the mean sum of squares due to treatments, and MSE is the mean sum of squares due to error.
Given, k = 3, n-1 = 23, SST = 390.58, SSE = 158.40, TSS = 548.98.
Treatment degrees of freedom = k − 1 = 3 − 1 = 2
Total degrees of freedom = n − 1 = 23, i.e., n = 24
Degrees of freedom for error = n − k = 24 − 3 = 21
MST = SST / d.f. of treatment = 390.58 / 2 = 195.290
MSE = SSE / d.f of error = 158.4 / 21 = 7.543
Cal F = MST / MSE = 195.290 / 7.543 = 25.891
Completed ANOVA:
Variation Source | SS | DF | MSS | Cal F
Between Treatments | 390.58 | 2 | 195.290 | 25.8907
Within Treatments (Error) | 158.40 | 21 | 7.5429 |
Total | 548.98 | 23 | |
Now we compare this Cal F value with the tabulated F value.
Decision Criteria:
If Cal F > tabulated F, we reject the null hypothesis; otherwise we do not have enough evidence to reject it.
Under the null hypothesis, Cal F follows an F distribution with (2, 21) degrees of freedom. At α = 0.05, the tabulated F = F(0.05, 2, 21) = 3.467.
Now Cal F = 25.8907 > tabulated F = 3.467, so we reject the null hypothesis, i.e., not all the treatment means are the same.
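The same arithmetic and critical value, as a scipy sketch:

```python
from scipy import stats

# Completing the one-way ANOVA table and running the F-test.
k, n = 3, 24
sst, sse = 390.58, 158.40

mst = sst / (k - 1)                      # 195.290
mse = sse / (n - k)                      # 7.5429
f_cal = mst / mse                        # 25.8907

f_tab = stats.f.ppf(0.95, k - 1, n - k)  # 3.467 at alpha = 0.05
print(f_cal, f_tab)
print("reject H0" if f_cal > f_tab else "fail to reject H0")
```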
Question 4 Solution:
This is a multiple linear regression problem with two predictor variables. The response variable is the number of mobile phones sold per day (Y); price in $1000s (X1) and the number of advertising spots (X2) are the two predictor variables.
(a)
Regression equation when Y is regressed on X1 and X2:
Ŷ = b0 + b1 X1 + b2 X2, i.e.,
Ŷ = 0.8051 + 0.4977 X1 + 0.4733 X2
(b)
H0: There is no significant relationship between the dependent variable and the independent variables
vs
H1: There is a significant relationship between the dependent variable and the independent variables.
To test this claim we need to complete the given ANOVA table. When there are p independent variables and n observations, the ANOVA table is:
ANOVA
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F
Regression | SSReg | p | MSReg = SSReg / p | F Cal = MSReg / MSE
Residual | SSE | n − 1 − p | MSE = SSE / (n − 1 − p) |
Total | TSS | n − 1 | |
The completed ANOVA table is:
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F
Regression | 40.7 | 2 | 20.350 | 80.1181
Residual | 1.016 | 4 | 0.254 |
Total | 41.716 | 6 | |
Decision Criteria:
If F Cal > F Critical, we reject the null hypothesis; otherwise we do not.
F Critical = F(0.05, 2, 4) = 6.9443
So we reject H0, as F Cal = 80.1181 > F Critical = 6.9443,
i.e., there is a significant relationship between the dependent variable and the independent variables.
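A sketch of this overall F-test with scipy (note that scipy.stats.f.ppf(0.95, 2, 4) returns 6.9443, the critical value used above):

```python
from scipy import stats

# The overall F-test for the multiple regression.
ss_reg, ss_err = 40.7, 1.016
p, n = 2, 7                               # 2 predictors, 7 observations

ms_reg = ss_reg / p                       # 20.350
mse = ss_err / (n - 1 - p)                # 0.254
f_cal = ms_reg / mse                      # 80.1181

f_crit = stats.f.ppf(0.95, p, n - 1 - p)  # 6.9443 at alpha = 0.05
print(f_cal, f_crit)
print("reject H0" if f_cal > f_crit else "fail to reject H0")
```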
(c)
i) Here we test
H0: β1 = 0 against H1: β1 ≠ 0.
The t statistic (t-cal) for testing this null hypothesis is the estimated coefficient divided by its standard error; its value here is 1.078.
Under H0, t-cal follows a t distribution with 4 degrees of freedom.
At α = 0.05, the two-sided critical value of this distribution is
Critical t value = 2.7764
Decision criteria:
Reject H0 if |t-cal| > 2.7764.
Since |1.078| < 2.7764, we do not have enough evidence to reject H0, i.e., β1 (price) is not significant.
ii) Here we test H0: β2 = 0 against H1: β2 ≠ 0.
The t statistic (t-cal) for testing this hypothesis is again the estimated coefficient divided by its standard error; its value here is 12.33.
Under H0, t-cal follows a t distribution with 4 degrees of freedom.
At α = 0.05, the two-sided critical value of this distribution is
Critical t value = 2.7764
Decision criteria:
Reject H0 if |t-cal| > 2.7764.
Since |12.33| > 2.7764, we reject H0, i.e., β2 (number of advertising spots) is significant.
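The decision step for both tests, sketched with scipy (the t statistics are taken from the solution above; the coefficient standard errors needed to recompute them are not reproduced here):

```python
from scipy import stats

# Decision step for both coefficient t-tests. The t statistics
# (1.078 for beta1, 12.33 for beta2) are taken from the solution.
t_crit = stats.t.ppf(1 - 0.05 / 2, df=4)  # 2.7764, two-sided at alpha = 0.05

for name, t_cal in [("beta1 (price)", 1.078), ("beta2 (ad spots)", 12.33)]:
    decision = "reject H0" if abs(t_cal) > t_crit else "fail to reject H0"
    print(name, t_cal, decision)
```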
(d)
Interpretation of Slope of X2:
If price (X1) is held fixed, then for each one-unit increase in the number of advertising spots (X2), the number of mobile phones sold per day (Y) increases by 0.4733 units.
(e)
If the price is $20,000, i.e., X1 = 20, and the number of advertising spots is X2 = 10,
then
Ŷ = 0.8051 + 0.4977 × 20 + 0.4733 × 10 = 15.492 ≈ 15
If the company charges $20,000 for each phone and uses 10 advertising spots, then approximately 15 mobile phones are expected to be sold in a particular day.
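The arithmetic as a short sketch:

```python
# Point prediction from the fitted equation.
b0, b1, b2 = 0.8051, 0.4977, 0.4733
x1, x2 = 20, 10            # price in $1000s, number of advertising spots

y_hat = b0 + b1 * x1 + b2 * x2
print(round(y_hat, 3))     # 15.492, i.e. about 15 phones per day
```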