Question 1: Descriptive Statistics for the Missy Walters Orders Dataset
This question looks at the descriptive statistics of the 50 orders placed with Missy Walters. Descriptive statistics summarize the characteristics of a data set in terms of the frequency distribution, the measures of central tendency, the measures of dispersion and the graphical representation of the data (David & David, 2000).
Part a: Frequency Distribution
A frequency distribution describes a dataset in terms of the numbers, or frequencies, of observations in each class (David & David, 2000). In the case of Missy Walters's orders, the frequency table below shows the number of orders in each class of order amounts.
Each class has a width of 50. From the output, the class with the highest frequency is 150-199, with 15 orders representing 30% of the total, while the class with the lowest frequency is 450-499, with 2 orders representing 4% of the total. This indicates that the majority of order amounts fall between 150 and 199, while the fewest fall between 450 and 499.
| Labels  | Frequency | Relative Frequency | Percentage Frequency |
|---------|-----------|--------------------|----------------------|
| 100-149 | 3         | 0.06               | 6                    |
| 150-199 | 15        | 0.30               | 30                   |
| 200-249 | 14        | 0.28               | 28                   |
| 250-299 | 6         | 0.12               | 12                   |
| 300-349 | 4         | 0.08               | 8                    |
| 350-399 | 3         | 0.06               | 6                    |
| 400-449 | 3         | 0.06               | 6                    |
| 450-499 | 2         | 0.04               | 4                    |
| Total   | 50        | 1.00               | 100                  |
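For reference, a short script along these lines could generate the same style of frequency table; the orders array below is only a placeholder, since the 50 actual order amounts are not listed in this section.

```python
import numpy as np

# Placeholder: replace with the 50 actual order amounts from the dataset.
orders = np.random.default_rng(1).uniform(100, 500, 50)

# Class boundaries with a width of 50, matching the table above.
bins = np.arange(100, 550, 50)
freq, edges = np.histogram(orders, bins=bins)

for lo, hi, f in zip(edges[:-1], edges[1:], freq):
    rel = f / len(orders)
    print(f"{lo:.0f}-{hi - 1:.0f}: frequency={f}, relative={rel:.2f}, percent={rel:.0%}")
```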
Part b: Histogram
A histogram is one method of graphical representation of data. It presents the dataset as vertical bars of equal width, where the height of each bar represents the frequency of the corresponding class of observations (Knight, 2000). In this case, the height of each bar represents the number of Missy Walters's orders in each class of order amounts.
The histogram output is presented below. From the output, the class with the tallest bar is 150-199 while the class with the shortest bar is 450-499, which supports the earlier observations from the frequency table.
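A minimal sketch of how such a histogram could be produced with matplotlib, again with a placeholder orders array standing in for the real data:

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder: replace with the 50 actual order amounts from the dataset.
orders = np.random.default_rng(1).uniform(100, 500, 50)

bins = np.arange(100, 550, 50)  # class width of 50, as in the frequency table
plt.hist(orders, bins=bins, edgecolor="black")
plt.xlabel("Order amount")
plt.ylabel("Frequency (number of orders)")
plt.title("Histogram of Missy Walters' 50 orders")
plt.show()
```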
Part c: Area Plot
An area plot shows the shape of a frequency distribution. In our case, an area plot of frequency against order-amount class has been developed (Tim, 2005); the plot is shown below. From the plot, it is clear that the data are right-skewed: the long tail on the right pulls the mean above the median. This implies that the median and mode provide a more reliable idea of central tendency than the mean. The modal class is the class with the highest frequency, which is 150-199. A sketch of the plot and a numeric skewness check follow.
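A hedged sketch of the area plot and a numeric skewness check; the frequencies are taken from the table in part a, while the class midpoints and plotting choices are illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import skew

midpoints = np.arange(125, 500, 50)          # class midpoints 125, 175, ..., 475
freq = np.array([3, 15, 14, 6, 4, 3, 3, 2])  # frequencies from the table in part a

# Area plot of frequency against order-amount class midpoint.
plt.fill_between(midpoints, freq, alpha=0.5)
plt.plot(midpoints, freq)
plt.xlabel("Order amount (class midpoint)")
plt.ylabel("Frequency")
plt.title("Area plot of order frequencies")
plt.show()

# Skewness of the grouped data; a positive value indicates a right skew.
expanded = np.repeat(midpoints, freq)
print(skew(expanded))
```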
Question 2: Regression Analysis
Part a: Determining the relationship
Regression analysis is one way of carrying out hypothesis testing (Tim, 2005). Hypothesis testing is used when we want to determine the truth of a statement about a phenomenon or research question (Tim, 2005). Regression analysis is used when testing for a relationship between the dependent and independent variables.
The hypothesis tested is that there is no relationship between the dependent and the independent variable, against the alternative hypothesis that there is a relationship between them. One of the output tables in this test is the ANOVA table provided below. From the table, it is possible to determine whether Y is related to X at the 0.05 significance level.
From this output, X and Y appear to be related: the estimated intercept is 80.390 and the slope coefficient on X is -2.137. The slope is the gradient of the regression line connecting X and Y, and it indicates that when X changes by 1 unit, Y changes by about 2.137 units in the opposite direction. In other words, when the unit price increases by 1 unit, the demand decreases by approximately 2.137 units, and vice versa.
ANOVA table

|            | df | SS       |
|------------|----|----------|
| Regression | 1  | 5048.818 |
| Residual   | 46 | 3132.661 |
| Total      | 47 | 8181.479 |

Coefficients table

|           | Coefficients | Standard Error |
|-----------|--------------|----------------|
| Intercept | 80.390       | 3.102          |
| X         | -2.137       | 0.248          |
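As a sketch, figures of this kind come from an ordinary least-squares fit; the snippet below uses scipy.stats.linregress on placeholder x (unit price) and y (demand) arrays, since the underlying observations are not reproduced here, so its printed numbers will only match the tables when the real data are substituted.

```python
import numpy as np
from scipy.stats import linregress

# Placeholder data: replace with the actual unit prices (x) and demand values (y).
rng = np.random.default_rng(0)
x = rng.uniform(10, 40, 48)
y = 80.39 - 2.137 * x + rng.normal(0, 8, 48)

result = linregress(x, y)
print("intercept:", round(result.intercept, 3))
print("slope:", round(result.slope, 3))
print("R squared:", round(result.rvalue ** 2, 4))
print("p-value of the slope:", result.pvalue)
```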
Part b: Coefficient of determination
The coefficient of determination is denoted R2 and commonly called R squared. R squared measures how well the regression fits the data: it is the proportion of the variation in the dependent variable that is explained by the regression.
From the ANOVA table, the coefficient of determination is given by R2 = SS(Regression)/SS(Total) = 5048.818/8181.479 = 0.6171. This implies that the regression on unit price explains 61.71% of the variation in demand.
R2 =SS (Regression)/SS(Total)
=5048.818/8181.479
=0.6171033
Part c: Coefficient of correlation
The coefficient of correlation measures the degree of the linear relationship between the dependent and the independent variables (Golubbwa, 2013). Its magnitude is the square root of the coefficient of determination, which is 0.786, and it takes the sign of the slope coefficient. Since the slope on X is negative, the correlation is -0.786, indicating a strong negative correlation between Y and X (Tarima & Dmitriev, n.d.).
Coefficient of Correlation = ±√(R2)
= -√0.6171033
= -0.785593
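Both figures follow directly from the sums of squares in the ANOVA table and the sign of the slope; a small arithmetic check:

```python
import math

ss_regression = 5048.818
ss_total = 8181.479
slope = -2.137

r_squared = ss_regression / ss_total
# The correlation coefficient carries the sign of the slope coefficient.
r = math.copysign(math.sqrt(r_squared), slope)

print(round(r_squared, 4))  # approximately 0.6171
print(round(r, 4))          # approximately -0.7856
```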
Question 3: Hypothesis Test of Equality of means
This case provides a hypothesis test of the equality of means. An ANOVA test is used to test whether there is a significant difference among the means of two or more samples (Evseenko, 2013). In a hypothesis test, there is a null and an alternative hypothesis. A null hypothesis is stated negatively while an alternative hypothesis is stated positively (Ion, Jens, Ralf, & Wolfgang, 2001). In words, the hypotheses are stated as follows.
H0: There is no significant difference among the means.
H1: There is a significant difference among the means.
The test is completed as follows:
P = number of samples = 3
N = total number of observations = 24
Degrees of freedom:
df(between treatments) = P - 1 = 3 - 1 = 2
df(within treatments) = N - P = 24 - 3 = 21
Mean squares: MS = SS/df
MS(between) = 195.29
MS(within) = 7.543
F = MS(between)/MS(within) = 195.29/7.543 ≈ 25.9
The p-value is read from the F distribution with numerator df = 2 (between treatments) and denominator df = 21 (within treatments):
p-value = 2.1487E-06
Since the p-value (2.1487 × 10^-6) is far below alpha = 0.05, we reject H0.
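The quoted p-value can be verified from the F distribution using the mean squares and degrees of freedom in the working above; a minimal check:

```python
from scipy.stats import f

ms_between = 195.29
ms_within = 7.543
df_between, df_within = 2, 21

f_stat = ms_between / ms_within
p_value = f.sf(f_stat, df_between, df_within)  # upper-tail probability of the F distribution

print(round(f_stat, 2), p_value)  # F of roughly 25.9 and a p-value on the order of 1e-6
```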
Decision and Conclusion
We reject the null hypothesis that there is no significant difference among the means. We conclude that there is sufficient evidence that at least one of the sample means differs significantly from the others.
Question 4
Part a: Estimated equation relating y to x1 and x2
The equation connecting y, x1 and x2 can be estimated by running a regression analysis. The associated hypothesis test is:
H0: b1 = b2 = 0, i.e. there is no relationship between y and x1, x2
H1: at least one of b1, b2 is not zero, i.e. there is a relationship between y and x1, x2
where b1 and b2 are the regression coefficients of y on x1 and on x2 respectively. b1 and b2 are estimated below. The estimated equation is of the form
y = b0 + b1x1 + b2x2
|           | Coefficients | SE     | t value  | p value  |
|-----------|--------------|--------|----------|----------|
| Intercept | 0.8051       |        |          |          |
| x1        | 0.4977       | 0.4617 | 1.077973 | 0.330282 |
| x2        | 0.4733       | 0.0387 | 12.22997 | 6.46E-05 |
The estimated coefficients are b0 = 0.8051, b1 = 0.4977 and b2 = 0.4733, so the estimated equation is:
y = 0.8051 + 0.4977x1 + 0.4733x2
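A sketch of how the estimated equation could be obtained by ordinary least squares; the x1 (price), x2 (advertisement) and y (sales) arrays below are placeholders for the actual observations, so the printed coefficients will only match the table when the real data are used.

```python
import numpy as np

# Placeholder observations: replace with the actual price (x1),
# advertisement (x2) and sales (y) values for the data points.
x1 = np.array([2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])
x2 = np.array([10, 12, 15, 18, 20, 22, 25])
y = np.array([6.5, 7.7, 9.4, 11.1, 12.3, 13.4, 15.2])

# Design matrix with an intercept column, fitted by ordinary least squares.
X = np.column_stack([np.ones_like(x1), x1, x2])
b, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

b0, b1, b2 = b
print(f"y = {b0:.4f} + {b1:.4f}*x1 + {b2:.4f}*x2")
```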
Part b: Testing for a significant relationship between y and x1, x2
To test for a significant overall relationship between y and the independent variables, we use the ANOVA table below. The null and alternative hypotheses are:
H0: b1 = b2 = 0 (there is no significant relationship between y and x1, x2)
H1: at least one of b1, b2 is not zero (there is a significant relationship)
From the table below, the p-value is 0.000593, which is less than alpha = 0.05. Therefore we reject the null hypothesis and conclude that there is a significant overall relationship between y and the independent variables x1 and x2.
ANOVA table

|            | df | SS    | MS    | F      | p value |
|------------|----|-------|-------|--------|---------|
| Regression | 2  | 40.7  | 20.35 | 80.118 | 0.00059 |
| Residual   | 4  | 1.016 | 0.254 |        |         |
| Total      | 6  |       |       |        |         |
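The overall-significance p-value in the ANOVA table can be checked directly from the F distribution with the degrees of freedom shown:

```python
from scipy.stats import f

ms_regression = 40.7 / 2   # SS(Regression) / df(Regression)
ms_residual = 1.016 / 4    # SS(Residual) / df(Residual)

f_stat = ms_regression / ms_residual
p_value = f.sf(f_stat, 2, 4)

print(round(f_stat, 2), round(p_value, 6))  # F of roughly 80.1 and p of roughly 0.0006
```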
Part c: Testing whether b1 and b2 are significant
To test whether each of b1 and b2 is significantly different from zero, we use the hypotheses:
H0: bi = 0 (the corresponding variable has no significant relationship with y)
H1: bi ≠ 0 (the corresponding variable has a significant relationship with y)
df = n - 2 = 5
t value = coefficient estimate / standard error of the coefficient
Therefore, for b1:
t = 0.4977/0.4617 = 1.078
Similarly, for b2:
t = 0.4733/0.0387 = 12.230
The p-values are read from the t distribution with df = 5, using a two-tailed test at alpha = 0.05.
Price (x1):
p-value for b1 = 0.3303 > 0.05 (alpha)
We fail to reject H0 and conclude that there is not sufficient evidence of a significant relationship between price and sales per day.
Advertisement (x2):
p-value for b2 = 0.0001 < 0.05 (alpha)
We reject H0 and conclude that there is a significant relationship between advertisement spots and sales per day.
|           | Coefficients | SE     | t value  | p value  |
|-----------|--------------|--------|----------|----------|
| Intercept | 0.8051       |        |          |          |
| x1        | 0.4977       | 0.4617 | 1.077973 | 0.330282 |
| x2        | 0.4733       | 0.0387 | 12.22997 | 0.0001   |
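A short check of the two t statistics and their two-tailed p-values, computed from the coefficients and standard errors in the table above and using the degrees of freedom from the working in this part (df = 5):

```python
from scipy.stats import t

df = 5  # degrees of freedom used in the working above (n - 2)

for name, coef, se in [("x1", 0.4977, 0.4617), ("x2", 0.4733, 0.0387)]:
    t_stat = coef / se
    p_value = 2 * t.sf(abs(t_stat), df)  # two-tailed p-value
    print(name, round(t_stat, 3), p_value)
```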
Part d: Interpreting the coefficient of x2
The coefficient of x2 is 0.4733. This is the slope of y with respect to x2, holding x1 constant. It indicates that a change (increase or decrease) of 1 unit in advertisement causes a corresponding change (increase or decrease) of 0.4733 units in phone sales.
Part e: Sales
Sales (y) is given by:
y = 0.8051 + 0.4977*price + 0.4733*advertisement
Sales = 0.8051 + 0.4977*20000 + 0.4733*10
= $9,959.54 per day
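The same figure can be checked by substituting the given price and number of advertisement spots into the estimated equation:

```python
b0, b1, b2 = 0.8051, 0.4977, 0.4733

price = 20000       # unit price given in the question
advertisement = 10  # number of advertisement spots given in the question

sales = b0 + b1 * price + b2 * advertisement
print(round(sales, 2))  # approximately 9959.54
```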
References
David, J. S., & David, S. (2000). Handbook of Parametric and Nonparametric Statistical Procedures.
Evseenko, O. (2013). Statistical Evaluation of the Stock Markets.
Golubbwa, G. (2013). Statistical Analysis of Basis Factor Influence on in-transit Freight in Ukraine by Regression Model. Statistics.
Ion, S., Jens, E., Ralf, B., & Wolfgang, E. (2001). Statistical Regression Modelling.
Knight, K. (2000). Mathematical Statistics. Texts in Statistical Science Series. Chapman & Hall.
Tarima, S. S., & Dmitriev, Y. G. (n.d.). Statistical Estimation with Possible Incorrect Model Assumption.
Tim, S. (2005). Mastering Statistical Process Control: A Handbook for Performance Improvement Using Cases. New York.