Important Statistics to Include in Pivot Table
Sample Statistical Report
The author of the sample report described the variable types: version, gender, whether the respondents liked the product, how much they would pay for the product, and whether they were old or young. The first variable, version, was categorical: respondents were asked which version was best, and the responses were version 1, version 2, or neither, giving three levels. The second variable, gender, was a two-level categorical variable (male and female). Respondents were also asked whether they liked the product; the responses were ‘Like’ or ‘Hate’, so this was also a two-level categorical variable. How much the respondents would pay for the product was asked as an open-ended question and recorded as a quantitative (continuous) variable. Lastly, age was recorded as a two-level categorical variable: young for those aged below 40 years and old for those aged 40 years and above.
Summary statistics were used to analyse the quantitative variable (how much they would pay), both overall and broken down by gender and other categories, by calculating measures of central tendency and variation. A histogram was plotted to display the distribution of the responses; it showed that 20 respondents would be willing to pay between zero and 0.5 and 80 would be willing to pay between 2.5 and 3.5. The author also used PowerPivot's support for categorical variables to create summaries by category, frequency tables, and contingency tables. For instance, a contingency table of age against whether respondents liked the product indicated that 82.09% of those above 40 years liked the product, compared to 72.73% of those below 40 years. The amount respondents would be willing to pay was also summarised by age: those aged above 40 years were willing to pay a higher price on average than those aged below 40 years, and the same table showed the frequencies of old and young participants. Stacked bar graphs and back-to-back histograms were also used to present the data.
- Summary Statistics – relationship between old people and whether they like the product
| Row Labels | hate: Count | hate: Percent | like: Count | like: Percent | Total Count | Total Percent |
|---|---|---|---|---|---|---|
| old | 7 | 10.77% | 58 | 89.23% | 65 | 100.00% |
| young | 9 | 25.71% | 26 | 74.29% | 35 | 100.00% |
| Grand Total | 16 | 16.00% | 84 | 84.00% | 100 | 100.00% |
58 of the 65 old people (p1 estimate = 89.23%) said that they like the product.
26 of the 35 young people (p2 estimate = 74.29%) said that they like the product.
- Relationship between old people and whether they like the product
Comment on Relationship between Variables
Based on the contingency table above, a higher proportion of old people (89.23%) than young people (74.29%) like the product.
- Estimate of p1 – p2
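The estimate asked for here follows directly from the counts in the contingency table. A minimal sketch of the arithmetic, with variable names chosen only for illustration:

```python
# Counts taken from the contingency table above.
old_like, old_total = 58, 65
young_like, young_total = 26, 35

p1 = old_like / old_total        # proportion of old people who like the product
p2 = young_like / young_total    # proportion of young people who like the product
diff = p1 - p2                   # point estimate of p1 - p2

print(f"p1 = {p1:.4f}, p2 = {p2:.4f}, p1 - p2 = {diff:.4f}")
```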
- Summary statistics of old people and how much one would pay
| Are they old? | Average of how much would pay? | StdDev of how much would pay? | Count of how much would pay? |
|---|---|---|---|
| old | 2.868 | 0.921 | 65 |
| young | 2.417 | 1.251 | 35 |
| Grand Total | 2.71 | 1.064 | 100 |
Old people
Young people
- Relationships between the variables
On average, old people are willing to pay more for the product than young people (2.868 versus 2.417). The amounts the old are willing to pay also vary less (standard deviation 0.921 versus 1.251).
- The difference between the means
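Using the rounded group averages from the summary table, the difference between the means can be worked out as follows (the subscripts are labels chosen for illustration):

```latex
\bar{x}_{\text{old}} - \bar{x}_{\text{young}} = 2.868 - 2.417 = 0.451
```

Because the table values are rounded to three decimals, an estimate computed from the raw data may differ slightly in the last digit.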
- Scatter plot
- Comment about the relationship
There is a positive relationship between the number of bets and profit.
- Estimate profit of a casino when there are 1000 bets
- Using answer in section 2: Testing for difference in proportions at 5% significance level
- The appropriate hypothesis
Null hypothesis: There is no significant difference between the proportions of old and young people who like the product.
Alternative hypothesis: There is a significant difference between the proportions of old and young people who like the product.
- The p-value using webpage https://epitools.ausvet.com.au/content.php?page=z-test-2
The p-value = 0.0519
- State whether or not you reject the H0
The p-value (0.0519) is greater than the significance level (0.05), so we fail to reject the null hypothesis.
- Conclusion
We conclude that the difference between the proportions of old and young people who like the product is not significantly different from zero.
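The p-value reported above can be checked by hand. The sketch below assumes the online calculator performs a pooled, two-sided two-proportion z-test (an assumption, not something stated on this page) and uses the counts from the section 2 contingency table; the variable names are illustrative.

```python
import math
from statistics import NormalDist

x1, n1 = 58, 65   # old: number who like the product, group size
x2, n2 = 26, 35   # young: number who like the product, group size

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)                        # pooled proportion under H0
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se                                    # test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))          # two-sided p-value

alpha = 0.05
reject_h0 = p_value < alpha                           # reject only if p-value is below alpha
print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

The decision rule is the comparison in `reject_h0`: the null hypothesis is rejected only when the p-value is smaller than the significance level.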
- Using answer in section 3: Difference between means at 5% level of significance
- The null and alternative hypothesis
H0: The difference in means of how much they would pay between old and young is not significantly different from zero.
H1: The difference in means of how much they would pay between old and young is significantly different from zero.
- Finding the p-value using https://www.medcalc.org/calc/comparison_of_means.php
P-value = 0.0426
- State whether or not you reject H0
The p-value (0.0426) is less than the significance level (0.05), so we reject the null hypothesis.
- Give a conclusion in plain English
We conclude that the difference in the mean amount that old and young people would pay is significantly different from zero.
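The p-value for the difference in means can be reproduced from the summary statistics in section 3. This sketch assumes the calculator runs a pooled-variance (equal variances) two-sample t-test, which is an assumption rather than something stated here. To stay within the standard library, the t-distribution tail area is obtained by Simpson's rule; `t_pdf` and `two_sided_p` are helper names invented for this example.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 minus twice the area over [0, |t|] (Simpson's rule)."""
    b = abs(t)
    h = b / steps                       # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)

# Summary statistics from the pivot table (old vs young).
mean1, sd1, n1 = 2.868, 0.921, 65
mean2, sd2, n2 = 2.417, 1.251, 35

df = n1 + n2 - 2
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = (mean1 - mean2) / se
p_value = two_sided_p(t_stat, df)

reject_h0 = p_value < 0.05
print(f"t = {t_stat:.3f}, df = {df}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```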
- Summary Statistics
| Row Labels | Count of do you support proposed change? | Count of do you support proposed change? (% of total) |
|---|---|---|
| no | 81 | 41.12% |
| yes | 116 | 58.88% |
| Grand Total | 197 | 100.00% |
- Sample size (n) and proportion of those who support change
- 90% confidence interval for the proportion that support change
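From the table above, n = 197 and the sample proportion in favour is 116/197. One common way to turn these into a 90% interval is the normal-approximation (Wald) interval; the sketch below assumes that method, since the report does not say which one was used.

```python
import math
from statistics import NormalDist

yes, n = 116, 197                       # supporters and sample size from the table
p_hat = yes / n                         # sample proportion supporting the change

z_crit = NormalDist().inv_cdf(0.95)     # two-sided 90% interval leaves 5% in each tail
se = math.sqrt(p_hat * (1 - p_hat) / n) # standard error of the proportion
lower, upper = p_hat - z_crit * se, p_hat + z_crit * se

print(f"n = {n}, p_hat = {p_hat:.4f}, 90% CI = ({lower:.4f}, {upper:.4f})")
```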
Section 7:
- Back to back histogram
- Description of both variables
The back-to-back histogram above shows two variables: age, a quantitative variable, and gender, a categorical variable with two levels (male and female).
- The relationship between age and gender
The distribution of age is similar for males and females; it is skewed to the right for both genders.
- Consider the histogram you found yourself and discussed
The discussion is not useful in business because it does not show any significant difference between males and females.
- Consider the following discussion taken from the sample report you had to read in section 1, would the discussion be useful in business?
The distribution of how much they would pay for the product does not show any significant difference between males and females, hence the discussion is not useful in business.
Section 8:
- Using Section 2
- Z score; average is 0.14 and standard deviation is 0.088
| Row Labels | hate: Count | hate: Percent | like: Count | like: Percent | Total Count | Total Percent |
|---|---|---|---|---|---|---|
| old | 7 | 10.77% | 58 | 89.23% | 65 | 100.00% |
| young | 9 | 25.71% | 26 | 74.29% | 35 | 100.00% |
| Grand Total | 16 | 16.00% | 84 | 84.00% | 100 | 100.00% |
- P-value using wolframalpha.com
- Expected ranks
- Complete the table
| | Which sample | Rank lowest to highest | Estimate X | Zscore=(X-mean)/stdev |
|---|---|---|---|---|
| Lowest estimate | 475 | 1 | -0.143057504 | -3.194652657 |
| Estimate from allocated sample | 443 | 553 | 0.149450549 | 0.112738 |
| Highest estimate | 663 | 1000 | 0.543672014 | 4.570203319 |
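The z-score, p-value, and expected rank for the allocated sample can be sketched as below. It assumes the expected rank is the normal probability P(Z < z) multiplied by the 1000 samples implied by the highest rank in the table; that reading and the function name `expected_rank` are assumptions for illustration. The rounded mean (0.14) and standard deviation (0.088) quoted above are used, so the z-score differs slightly from the table value, which was presumably computed from unrounded figures.

```python
from statistics import NormalDist

def expected_rank(estimate, mean, stdev, n_samples=1000):
    """Return (z-score, P(Z < z), expected rank among n_samples estimates)."""
    z = (estimate - mean) / stdev
    p = NormalDist().cdf(z)            # area to the left of z under the standard normal
    return z, p, p * n_samples

# Estimate of p1 - p2 from the allocated sample, with the rounded mean and stdev.
z, p, rank = expected_rank(0.149450549, mean=0.14, stdev=0.088)
print(f"z = {z:.4f}, P(Z < z) = {p:.4f}, expected rank = {rank:.1f}")
```

The same function applies to the difference in means and to the slope coefficient once their means and standard deviations are supplied.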
- Using section 3
- Z-score in section 3c); average = 0.408 and standard deviation = 0.26
- P-value using wolframalpha.com
- Expected ranks
- Complete the table
| | Which sample | Rank lowest to highest | Estimate X | Zscore=(X-mean)/stdev |
|---|---|---|---|---|
| Lowest estimate | 475 | 1 | -0.434735858 | -3.238970652 |
| Estimate from allocated sample | 443 | 568 | 0.450549451 | 0.164841759 |
| Highest estimate | 663 | 1000 | 1.607575758 | 4.613465 |
- Using section 4
- Z score for the slope coefficient
Slope coefficient = 1.2833
- P value (Z < z score) using wolframalpha.com
- The expected rank
- Complete the table
| | Which sample | Rank lowest to highest | Estimate X | Zscore=(X-mean)/stdev |
|---|---|---|---|---|
| Lowest estimate | 141 | 1 | -0.003480103 | -4.029377699 |
| Estimate from allocated sample | 443 | 927 | 1.283285804 | 1.395943 |
| Highest estimate | 398 | 1000 | 1.871737174 | 3.876998 |
- Comparisons of the predicted and actual ranks
In a) above, the rank obtained from my sample (542.5) and the actual rank (553) vary by approximately 10.
In b) above, the rank predicted from the z-score for the difference in means (565.7) differs from the actual rank (568) by approximately 2.3.
Finally, in c) the predicted rank differs from the actual by 8.1.
- Comment on the following facts
*“part (d) shows totally different datasets that have same sampling distribution, (the normal distribution)”
The datasets are not from completely different populations. The variation in the ranks is a result of the standard deviation: a sample differs from its population because of sampling error, which leads to variation in the results.
*“Hypothesis testing uses a sampling distribution, p-value is a shaded area on the sampling distribution”
It is true that hypothesis testing uses a sampling distribution and that the p-value is an area under that distribution.