Question 1: Descriptive Statistics for the Missy Walters Orders Dataset
This question looks at the descriptive statistics of the 50 orders placed with Missy Walters. Descriptive statistics summarize the characteristics of a data set in terms of the frequency distribution, the measures of central tendency, the measures of dispersion and the graphical representation of the data (David & David, 2000).
Part a: Frequency Distribution
A frequency distribution describes a dataset in terms of the numbers, or frequencies, of observations in each class (David & David, 2000). In the case of Missy Walters's orders, the frequency table below shows the number of orders in each class of order amounts.
Each class has a width of 50. From the output, the class with the highest frequency is 150-199, with 15 orders representing 30% of the total, while the class with the lowest frequency is 450-499, with 2 orders representing 4% of the total. This indicates that the majority of order amounts fall between 150 and 199, while the fewest fall between 450 and 499.
| Labels  | Frequency | Relative Frequency | Percentage Frequency |
|---------|-----------|--------------------|----------------------|
| 100-149 | 3         | 0.06               | 6                    |
| 150-199 | 15        | 0.30               | 30                   |
| 200-249 | 14        | 0.28               | 28                   |
| 250-299 | 6         | 0.12               | 12                   |
| 300-349 | 4         | 0.08               | 8                    |
| 350-399 | 3         | 0.06               | 6                    |
| 400-449 | 3         | 0.06               | 6                    |
| 450-499 | 2         | 0.04               | 4                    |
| Total   | 50        | 1.00               | 100                  |
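For reference, a short script along these lines could generate the same style of frequency table; the orders array below is only a placeholder, since the 50 actual order amounts are not listed in this section.

```python
import numpy as np

# Placeholder: replace with the 50 actual order amounts from the dataset.
orders = np.random.default_rng(1).uniform(100, 500, 50)

# Class boundaries with a width of 50, matching the table above.
bins = np.arange(100, 550, 50)
freq, edges = np.histogram(orders, bins=bins)

for lo, hi, f in zip(edges[:-1], edges[1:], freq):
    rel = f / len(orders)
    print(f"{lo:.0f}-{hi - 1:.0f}: frequency={f}, relative={rel:.2f}, percent={rel:.0%}")
```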
Part b: Histogram
A histogram is one method of graphical representation of data. It presents the dataset as vertical bars of equal width, where the height of each bar represents the frequency of the corresponding class of observations (Knight, 2000). In this case, the height of each bar represents the number of Missy Walters's orders in each class of order amounts.
The histogram output is presented below. From the output, the class with the tallest bar is 150-199 while the class with the shortest bar is 450-499, which supports the earlier observations from the frequency table.
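A minimal sketch of how such a histogram could be produced with matplotlib, again with a placeholder orders array standing in for the real data:

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder: replace with the 50 actual order amounts from the dataset.
orders = np.random.default_rng(1).uniform(100, 500, 50)

bins = np.arange(100, 550, 50)  # class width of 50, as in the frequency table
plt.hist(orders, bins=bins, edgecolor="black")
plt.xlabel("Order amount")
plt.ylabel("Frequency (number of orders)")
plt.title("Histogram of Missy Walters' 50 orders")
plt.show()
```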
Part c: Area Plot
An area plot shows the shape of a frequency distribution. In our case, an area plot of frequency against order-amount class has been developed (Tim, 2005); the plot is shown below. From the plot, it is clear that the data are right-skewed: the long tail on the right pulls the mean above the median. This implies that the median and mode provide a more reliable idea of central tendency than the mean. The modal class is the class with the highest frequency, which is 150-199. A sketch of the plot and a numeric skewness check follow.
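A hedged sketch of the area plot and a numeric skewness check; the frequencies are taken from the table in part a, while the class midpoints and plotting choices are illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import skew

midpoints = np.arange(125, 500, 50)          # class midpoints 125, 175, ..., 475
freq = np.array([3, 15, 14, 6, 4, 3, 3, 2])  # frequencies from the table in part a

# Area plot of frequency against order-amount class midpoint.
plt.fill_between(midpoints, freq, alpha=0.5)
plt.plot(midpoints, freq)
plt.xlabel("Order amount (class midpoint)")
plt.ylabel("Frequency")
plt.title("Area plot of order frequencies")
plt.show()

# Skewness of the grouped data; a positive value indicates a right skew.
expanded = np.repeat(midpoints, freq)
print(skew(expanded))
```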
Question 2: Regression Analysis
Part a: Determining the relationship
Regression analysis is one way of carrying out hypothesis testing (Tim, 2005). Hypothesis testing is used when we want to determine the truth of a statement about a phenomenon or research question (Tim, 2005). Regression analysis is used when testing for a relationship between the dependent and independent variables.
The hypothesis tested is that there is no relationship between the dependent and the independent variable, against the alternative hypothesis that there is a relationship between them. One of the output tables in this test is the ANOVA table provided below. From the table, it is possible to determine whether Y is related to X at the 0.05 significance level.
From this output, X and Y appear to be related: the estimated intercept is 80.390 and the slope coefficient on X is -2.137. The slope is the gradient of the regression line connecting X and Y, and it indicates that when X changes by 1 unit, Y changes by about 2.137 units in the opposite direction. In other words, when the unit price increases by 1 unit, the demand decreases by approximately 2.137 units, and vice versa.
ANOVA table

|            | df | SS       |
|------------|----|----------|
| Regression | 1  | 5048.818 |
| Residual   | 46 | 3132.661 |
| Total      | 47 | 8181.479 |

Coefficients table

|           | Coefficients | Standard Error |
|-----------|--------------|----------------|
| Intercept | 80.390       | 3.102          |
| X         | -2.137       | 0.248          |
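As a sketch, figures of this kind come from an ordinary least-squares fit; the snippet below uses scipy.stats.linregress on placeholder x (unit price) and y (demand) arrays, since the underlying observations are not reproduced here, so its printed numbers will only match the tables when the real data are substituted.

```python
import numpy as np
from scipy.stats import linregress

# Placeholder data: replace with the actual unit prices (x) and demand values (y).
rng = np.random.default_rng(0)
x = rng.uniform(10, 40, 48)
y = 80.39 - 2.137 * x + rng.normal(0, 8, 48)

result = linregress(x, y)
print("intercept:", round(result.intercept, 3))
print("slope:", round(result.slope, 3))
print("R squared:", round(result.rvalue ** 2, 4))
print("p-value of the slope:", result.pvalue)
```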
Part b: Coefficient of determination
The coefficient of determination is denoted R2 and commonly called R squared. R squared measures how well the regression fits the data: it is the proportion of the variation in the dependent variable that is explained by the regression.
From the ANOVA table, the coefficient of determination is given by R2 = SS(Regression)/SS(Total) = 5048.818/8181.479 = 0.6171. This implies that the regression on unit price explains 61.71% of the variation in demand.
R2 =SS (Regression)/SS(Total)
=5048.818/8181.479
=0.6171033
Part c: Coefficient of correlation
The coefficient of correlation measures the degree of the linear relationship between the dependent and the independent variables (Golubbwa, 2013). Its magnitude is the square root of the coefficient of determination, which is 0.786, and it takes the sign of the slope coefficient. Since the slope on X is negative, the correlation is -0.786, indicating a strong negative correlation between Y and X (Tarima & Dmitriev, n.d.).
Coefficient of Correlation = ±√(R2)
= -√0.6171033
= -0.785593
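Both figures follow directly from the sums of squares in the ANOVA table and the sign of the slope; a small arithmetic check:

```python
import math

ss_regression = 5048.818
ss_total = 8181.479
slope = -2.137

r_squared = ss_regression / ss_total
# The correlation coefficient carries the sign of the slope coefficient.
r = math.copysign(math.sqrt(r_squared), slope)

print(round(r_squared, 4))  # approximately 0.6171
print(round(r, 4))          # approximately -0.7856
```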
Question 3: Hypothesis Test of Equality of means
This case provides a hypothesis test of the equality of means. An ANOVA test is used to test whether there is a significant difference among the means of two or more samples (Evseenko, 2013). In a hypothesis test, there is a null and an alternative hypothesis. A null hypothesis is stated negatively while an alternative hypothesis is stated positively (Ion, Jens, Ralf, & Wolfgang, 2001). In words, the hypotheses are stated as follows.
H0: There is no significant difference among the means.
H1: There is a significant difference among the means.
The test is completed as follows:
P = number of samples = 3
N = total number of observations = 24
Degrees of freedom:
df(between treatments) = P - 1 = 3 - 1 = 2
df(within treatments) = N - P = 24 - 3 = 21
Mean squares: MS = SS/df
MS(between) = 195.29
MS(within) = 7.543
F = MS(between)/MS(within) = 195.29/7.543 ≈ 25.9
The p-value is read from the F distribution with numerator df = 2 (between treatments) and denominator df = 21 (within treatments):
p-value = 2.1487E-06
Since the p-value (2.1487 × 10^-6) is far below alpha = 0.05, we reject H0.
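The quoted p-value can be verified from the F distribution using the mean squares and degrees of freedom in the working above; a minimal check:

```python
from scipy.stats import f

ms_between = 195.29
ms_within = 7.543
df_between, df_within = 2, 21

f_stat = ms_between / ms_within
p_value = f.sf(f_stat, df_between, df_within)  # upper-tail probability of the F distribution

print(round(f_stat, 2), p_value)  # F of roughly 25.9 and a p-value on the order of 1e-6
```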
Decision and Conclusion
We reject the null hypothesis that there is no significant difference among the means. We conclude that there is sufficient evidence that at least one of the sample means differs significantly from the others.
Question 4
Part a: Estimated equation relating y to x1 and x2
The equation connecting y, x1 and x2 can be estimated by running a regression analysis. The associated hypothesis test is:
H0: b1 = b2 = 0, i.e. there is no relationship between y and x1, x2
H1: at least one of b1, b2 is not zero, i.e. there is a relationship between y and x1, x2
where b1 and b2 are the regression coefficients of y on x1 and on x2 respectively. b1 and b2 are estimated below. The estimated equation is of the form
y = b0 + b1x1 + b2x2
|           | Coefficients | SE     | t value  | p value  |
|-----------|--------------|--------|----------|----------|
| Intercept | 0.8051       |        |          |          |
| x1        | 0.4977       | 0.4617 | 1.077973 | 0.330282 |
| x2        | 0.4733       | 0.0387 | 12.22997 | 6.46E-05 |
The estimated coefficients are b0 = 0.8051, b1 = 0.4977 and b2 = 0.4733, so the estimated equation is:
y = 0.8051 + 0.4977x1 + 0.4733x2
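A sketch of how the estimated equation could be obtained by ordinary least squares; the x1 (price), x2 (advertisement) and y (sales) arrays below are placeholders for the actual observations, so the printed coefficients will only match the table when the real data are used.

```python
import numpy as np

# Placeholder observations: replace with the actual price (x1),
# advertisement (x2) and sales (y) values for the data points.
x1 = np.array([2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])
x2 = np.array([10, 12, 15, 18, 20, 22, 25])
y = np.array([6.5, 7.7, 9.4, 11.1, 12.3, 13.4, 15.2])

# Design matrix with an intercept column, fitted by ordinary least squares.
X = np.column_stack([np.ones_like(x1), x1, x2])
b, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

b0, b1, b2 = b
print(f"y = {b0:.4f} + {b1:.4f}*x1 + {b2:.4f}*x2")
```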
Part b: Testing for a significant relationship between y and x1, x2
To test for a significant overall relationship between y and the independent variables, we use the ANOVA table below. The null and alternative hypotheses are:
H0: b1 = b2 = 0 (there is no significant relationship between y and x1, x2)
H1: at least one of b1, b2 is not zero (there is a significant relationship)
From the table below, the p-value is 0.000593, which is less than alpha = 0.05. Therefore we reject the null hypothesis and conclude that there is a significant overall relationship between y and the independent variables x1 and x2.
ANOVA table

|            | df | SS    | MS    | F      | p value |
|------------|----|-------|-------|--------|---------|
| Regression | 2  | 40.7  | 20.35 | 80.118 | 0.00059 |
| Residual   | 4  | 1.016 | 0.254 |        |         |
| Total      | 6  |       |       |        |         |
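The overall-significance p-value in the ANOVA table can be checked directly from the F distribution with the degrees of freedom shown:

```python
from scipy.stats import f

ms_regression = 40.7 / 2   # SS(Regression) / df(Regression)
ms_residual = 1.016 / 4    # SS(Residual) / df(Residual)

f_stat = ms_regression / ms_residual
p_value = f.sf(f_stat, 2, 4)

print(round(f_stat, 2), round(p_value, 6))  # F of roughly 80.1 and p of roughly 0.0006
```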
Part c: Testing whether b1 and b2 are significant
To test whether each of b1 and b2 is significantly different from zero, we use the hypotheses:
H0: bi = 0 (the corresponding variable has no significant relationship with y)
H1: bi ≠ 0 (the corresponding variable has a significant relationship with y)
df = n - 2 = 5
t value = coefficient estimate / standard error of the coefficient
Therefore, for b1:
t = 0.4977/0.4617 = 1.078
Similarly, for b2:
t = 0.4733/0.0387 = 12.230
The p-values are read from the t distribution with df = 5, using a two-tailed test at alpha = 0.05.
Price (x1):
p-value for b1 = 0.3303 > 0.05 (alpha)
We fail to reject H0 and conclude that there is not sufficient evidence of a significant relationship between price and sales per day.
Advertisement (x2):
p-value for b2 = 0.0001 < 0.05 (alpha)
We reject H0 and conclude that there is a significant relationship between advertisement spots and sales per day.
|           | Coefficients | SE     | t value  | p value  |
|-----------|--------------|--------|----------|----------|
| Intercept | 0.8051       |        |          |          |
| x1        | 0.4977       | 0.4617 | 1.077973 | 0.330282 |
| x2        | 0.4733       | 0.0387 | 12.22997 | 0.0001   |
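A short check of the two t statistics and their two-tailed p-values, computed from the coefficients and standard errors in the table above and using the degrees of freedom from the working in this part (df = 5):

```python
from scipy.stats import t

df = 5  # degrees of freedom used in the working above (n - 2)

for name, coef, se in [("x1", 0.4977, 0.4617), ("x2", 0.4733, 0.0387)]:
    t_stat = coef / se
    p_value = 2 * t.sf(abs(t_stat), df)  # two-tailed p-value
    print(name, round(t_stat, 3), p_value)
```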
Part d: Interpreting the coefficient of x2
The coefficient of x2 is 0.4733. This is the slope of y with respect to x2, holding x1 constant. It indicates that a change (increase or decrease) of 1 unit in advertisement causes a corresponding change (increase or decrease) of 0.4733 units in phone sales.
Part e: Sales
Sales (y) is given by:
y = 0.8051 + 0.4977*price + 0.4733*advertisement
Sales = 0.8051 + 0.4977*20000 + 0.4733*10
= $9,959.54 per day
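The same figure can be checked by substituting the given price and number of advertisement spots into the estimated equation:

```python
b0, b1, b2 = 0.8051, 0.4977, 0.4733

price = 20000       # unit price given in the question
advertisement = 10  # number of advertisement spots given in the question

sales = b0 + b1 * price + b2 * advertisement
print(round(sales, 2))  # approximately 9959.54
```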
References
David, J. S., & David, S. (2000). Handbook of Parametric and Nonparametric Statistical Procedures.
Evseenko, O. (2013). Statistical Evaluation of the Stock Markets.
Golubbwa, G. (2013). Statistical Analysis of Basis Factor Influence on in-transit Freight in Ukraine by Regression Model. Statistics.
Ion, S., Jens, E., Ralf, B., & Wolfgang, E. (2001). Statistical Regression Modelling.
Knight, K. (2000). Mathematical Statistics. Texts in Statistical Science Series. Chapman & Hall.
Tarima, S. S., & Dmitriev, Y. G. (n.d.). Statistical Estimation with Possible Incorrect Model Assumption.
Tim, S. (2005). Mastering Statistical Process Control: A Handbook for Performance Improvement Using Cases. New York.