Data Collection and Statistical Analysis
Statistical data analysis plays an important role in decision making in many sectors, so it is important to use proper statistical tools and techniques. Here, we analyse data on several employee variables and draw conclusions about gender, age, education, monthly salary, monthly expense and related variables. We use descriptive statistics, graphical analysis, correlation and regression, and hypothesis tests (independent samples t tests, one-factor ANOVA and the chi-square test of independence) to check different claims regarding these variables. Let us see this statistical analysis in detail.
For this study of statistical data collection and analysis, we consider the following null hypotheses:
- H0: There is no statistically significant linear relationship between monthly salary and monthly expense.
- H0: There is no significant difference between the average monthly salary of female and male employees.
- H0: There is no significant difference in the average number of TV hours for female and male employees.
- H0: There is no significant difference in the average number of exercise hours for male and female employees.
- H0: There is no significant difference in the average monthly salary for employees with different education levels.
- H0: There is no statistically significant difference in the average monthly expense for employees with different education levels.
- H0: There is no statistically significant difference in the average number of TV hours for employees with different education levels.
- H0: There is no statistically significant difference in the average number of exercise hours for employees with different education levels.
- H0: The two categorical variables gender and education level are independent of each other.
For this research study, the data were collected by random sampling and retrieved from the government website of the Bureau of Labor Statistics. Data were collected for 100 employees, including both male and female employees. The variables used in this research study are listed below:
| Variable | Type | Scale |
|---|---|---|
| ID | Qualitative | Nominal |
| Gender | Qualitative | Nominal |
| Age | Quantitative | Ratio |
| Education | Qualitative | Ordinal |
| Monthly Salary ($) | Quantitative | Ratio |
| Monthly Expense ($) | Quantitative | Ratio |
| Medi-claim Insurance | Qualitative | Nominal |
| Pension Plan | Qualitative | Nominal |
| Exercise in hours per week | Quantitative | Ratio |
| TV hour per week | Quantitative | Ratio |
First, we discuss the descriptive statistics for the variables in the data set. Descriptive statistics give us an idea of the nature of the data for each variable.
Descriptive statistics for the variable age are summarised below:

| | N | Minimum | Maximum | Mean | Std. Deviation |
|---|---|---|---|---|---|
| Age | 100 | 27.00 | 58.00 | 43.9200 | 8.78553 |
| Valid N (listwise) | 100 | | | | |

The average age of all employees in the data is 43.92 years, with a standard deviation of 8.79 years. The minimum age is 27 years, while the maximum age is 58 years.
The descriptive statistics for the variable monthly salary are summarised below:

| | N | Minimum | Maximum | Mean | Std. Deviation |
|---|---|---|---|---|---|
| Monthly Salary ($) | 100 | 4392.00 | 10569.00 | 7481.7300 | 1378.83790 |
| Valid N (listwise) | 100 | | | | |

The average salary for all employees is $7481.73 per month, with a standard deviation of $1378.84. The minimum salary is $4392, while the maximum salary is $10569.
Descriptive statistics for the variable monthly expense are given below:

| | N | Minimum | Maximum | Mean | Std. Deviation |
|---|---|---|---|---|---|
| Monthly Expense ($) | 100 | 2081.00 | 9257.00 | 5643.8800 | 1478.30870 |
| Valid N (listwise) | 100 | | | | |

From the above table, the average monthly expense for employees is $5643.88, with a standard deviation of $1478.31. The minimum monthly expense is $2081, while the maximum monthly expense is $9257.
Descriptive statistics for the variable exercise in hours per week are summarised in the following table:

| | N | Minimum | Maximum | Mean | Std. Deviation |
|---|---|---|---|---|---|
| Exercise in hours per week | 100 | .00 | 5.00 | 2.3400 | 1.75361 |
| Valid N (listwise) | 100 | | | | |

The average number of exercise hours per week for all employees is 2.34 hours, with a standard deviation of 1.75 hours. The minimum number of exercise hours per week is 0, while the maximum is 5.
Descriptive statistics for the variable TV hours per week are given below:

| | N | Minimum | Maximum | Mean | Std. Deviation |
|---|---|---|---|---|---|
| TV hour per week | 100 | 4.00 | 20.00 | 11.3700 | 5.02047 |
| Valid N (listwise) | 100 | | | | |

From the above table, the average number of TV hours per week for all employees is 11.37 hours, with a standard deviation of 5.02 hours. The minimum number of TV hours is 4, while the maximum is 20.
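The summary measures reported in the tables above (N, minimum, maximum, mean and standard deviation) can be reproduced with a few lines of code. The following is a minimal sketch using only the Python standard library; the function name `describe` and the five ages in `sample_ages` are hypothetical and are not taken from the study data.

```python
import statistics


def describe(values):
    """Return N, minimum, maximum, mean and sample standard deviation."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        # statistics.stdev divides by n - 1 (sample standard deviation)
        "std": statistics.stdev(values),
    }


# Hypothetical ages for illustration only (not the study data)
sample_ages = [27, 35, 44, 51, 58]
summary = describe(sample_ages)
print(summary)
```

The standard deviation here uses the n - 1 denominator, which is the usual convention for sample data.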
Next, we look at the frequency distributions for the categorical variables in the study. The frequency distribution for the variable gender is given below:

| | | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | Female | 54 | 54.0 | 54.0 | 54.0 |
| | Male | 46 | 46.0 | 46.0 | 100.0 |
| | Total | 100 | 100.0 | 100.0 | |
The frequency distribution of the variable education is given below:

| | | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | Less than graduation | 34 | 34.0 | 34.0 | 34.0 |
| | Graduation | 32 | 32.0 | 32.0 | 66.0 |
| | Post-graduation or more | 34 | 34.0 | 34.0 | 100.0 |
| | Total | 100 | 100.0 | 100.0 | |
The frequency distribution for whether an employee has a mediclaim policy is given below:

| | | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | No | 55 | 55.0 | 55.0 | 55.0 |
| | Yes | 45 | 45.0 | 45.0 | 100.0 |
| | Total | 100 | 100.0 | 100.0 | |
The frequency distribution for the variable pension plan is summarised below:

| | | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | No | 54 | 54.0 | 54.0 | 54.0 |
| | Yes | 46 | 46.0 | 46.0 | 100.0 |
| | Total | 100 | 100.0 | 100.0 | |
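
The frequency, percent and cumulative percent columns in these tables follow directly from the category counts. The sketch below shows one way to compute them; the function name `frequency_table` is hypothetical, and the `gender` list is rebuilt from the reported counts (54 female, 46 male) rather than read from the original data file.

```python
from collections import Counter


def frequency_table(values, order):
    """Return (category, frequency, percent, cumulative percent) rows."""
    counts = Counter(values)
    total = sum(counts[category] for category in order)
    rows = []
    cumulative = 0.0
    for category in order:
        percent = 100.0 * counts[category] / total
        cumulative += percent
        rows.append((category, counts[category],
                     round(percent, 1), round(cumulative, 1)))
    return rows


# Rebuilt from the reported counts, not from the raw data
gender = ["Female"] * 54 + ["Male"] * 46
for row in frequency_table(gender, ["Female", "Male"]):
    print(row)
```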
In this section, we present graphical analysis for the different variables under study. Graphical analysis makes the results of the statistical analysis easier to understand. We use bar charts and box plots for comparison purposes. All graphical comparisons are provided in the appendix at the end of this report.
Correlation measures the strength and direction of the relationship between two variables, while linear regression is useful for predicting the response (dependent) variable. Here, we check whether monthly salary and monthly expense are related to each other, and whether this relationship is statistically significant. First, a scatter plot of monthly salary against monthly expense is used to inspect the relationship between the two variables.
From the scatter plot, it is observed that there is a very strong positive linear relationship between monthly salary and monthly expense. This means that a higher monthly salary is associated with a higher monthly expense.
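The strength of the linear relationship seen in a scatter plot is usually quantified with the Pearson correlation coefficient. The following is a minimal sketch of that calculation; the function name `pearson_r` and the five salary and expense pairs are hypothetical illustrations, not the study data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for illustration only
salary = [5000, 6000, 7000, 8000, 9000]
expense = [3100, 4200, 5000, 6300, 7100]
r = pearson_r(salary, expense)
print(round(r, 3))
```

A value of r close to +1 indicates a strong positive linear association, and a value close to 0 indicates little or no linear association.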
Next, we use a linear regression model to predict monthly expense from monthly salary. In this model, monthly salary is the independent (explanatory) variable and monthly expense is the dependent (response) variable. For this linear relationship, we test the following null and alternative hypotheses.
Null hypothesis: H0: There is no statistically significant linear relationship between monthly salary and monthly expense.
Alternative hypothesis: Ha: There is a statistically significant linear relationship between monthly salary and monthly expense.
The regression analysis is given below:
| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | Monthly Salary ($) (a) | . | Enter |

a. All requested variables entered.
b. Dependent Variable: Monthly Expense ($)
The model summary for this regression is provided below:

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .957 (a) | .916 | .915 | 431.43690 |

a. Predictors: (Constant), Monthly Salary ($)
From this table, the linear correlation coefficient between monthly salary and monthly expense is 0.957, which indicates a strong positive linear relationship between the dependent variable monthly expense and the independent variable monthly salary. The coefficient of determination (R square) for this linear regression model is 0.916, which means that about 91.6% of the variation in monthly expense is explained by monthly salary.
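The R square and adjusted R square values in the model summary can be checked from the reported correlation coefficient. The sketch below assumes the standard formulas for simple linear regression (R square equals r squared, and the adjusted value corrects for the number of predictors k); the function names are hypothetical.

```python
def r_squared(r):
    """Coefficient of determination for simple linear regression."""
    return r ** 2


def adjusted_r_squared(r2, n, k):
    """Adjusted R square for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


r2 = r_squared(0.957)                            # reported R
adj = adjusted_r_squared(0.916, n=100, k=1)      # reported R square, one predictor
print(round(r2, 3), round(adj, 3))
```

Rounded to three decimals, both values agree with the model summary (.916 and .915).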
The ANOVA table for the above regression model is given below:

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 1.981E8 | 1 | 1.981E8 | 1064.334 | .000 (a) |
| | Residual | 1.824E7 | 98 | 186137.798 | | |
| | Total | 2.164E8 | 99 | | | |

a. Predictors: (Constant), Monthly Salary ($)
b. Dependent Variable: Monthly Expense ($)
The p-value for this ANOVA table is reported as .000 (that is, p < 0.001), which is less than the significance level of 0.05, so we reject the null hypothesis that there is no statistically significant linear relationship between monthly salary and monthly expense.
This means there is sufficient evidence to conclude that a statistically significant linear relationship exists between monthly salary and monthly expense. So, this regression model is useful for predicting monthly expense from monthly salary.
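The F statistic in the regression ANOVA table is the ratio of the regression mean square to the residual mean square. As a rough check, the sketch below recomputes it from the rounded sums of squares reported in the table, so the result agrees with the reported 1064.334 only approximately; the function name `regression_f` is hypothetical.

```python
def regression_f(ss_regression, ss_residual, df_regression, df_residual):
    """F statistic: regression mean square divided by residual mean square."""
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return ms_regression / ms_residual


# Rounded sums of squares and degrees of freedom from the ANOVA table
f_stat = regression_f(1.981e8, 1.824e7, 1, 98)
print(round(f_stat, 1))

# With a single predictor, F also equals the square of the slope's t statistic
t_slope = 32.624
print(round(t_slope ** 2, 1))
```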
The regression coefficients for this regression model are summarised in the following table:

| Model | | B (Unstandardized) | Std. Error (Unstandardized) | Beta (Standardized) | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | -2031.987 | 239.205 | | -8.495 | .000 |
| | Monthly Salary ($) | 1.026 | .031 | .957 | 32.624 | .000 |

a. Dependent Variable: Monthly Expense ($)
The estimated regression equation is given below:
Monthly Expense = -2031.987 + 1.026 * Monthly Salary
The slope means that each additional dollar of monthly salary is associated with an increase of about $1.03 in predicted monthly expense. Using this regression model, we can predict monthly expense from monthly salary.
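As a worked example, the fitted equation can be evaluated at the sample mean salary of $7481.73. This is only a sketch: the constant and function names are hypothetical, the coefficients are the rounded values from the coefficients table, and predictions are only meaningful within the observed salary range ($4392 to $10569).

```python
INTERCEPT = -2031.987   # (Constant) from the coefficients table
SLOPE = 1.026           # coefficient of Monthly Salary ($)


def predict_expense(monthly_salary):
    """Predicted monthly expense in dollars for a given monthly salary."""
    return INTERCEPT + SLOPE * monthly_salary


prediction = predict_expense(7481.73)
print(round(prediction, 2))
```

The prediction at the mean salary is about $5644, close to the sample mean monthly expense of $5643.88, as expected for a least-squares line (the small gap comes from rounding of the coefficients).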
In this section, we check for significant differences between population means using independent samples t tests. First, we check whether there is a significant difference in the average monthly salary of female and male employees. The null and alternative hypotheses for this test are given below:
Null hypothesis: H0: There is no significant difference between the average monthly salary of female and male employees.
Alternative hypothesis: Ha: There is a significant difference between the average monthly salary of female and male employees.
We use a 5% level of significance for this test.
The test results are summarised below:
| | Gender | N | Mean | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|---|
| Monthly Salary ($) | Female | 54 | 7403.8704 | 1476.55210 | 200.93329 |
| | Male | 46 | 7573.1304 | 1264.52132 | 186.44350 |

t-test for Equality of Means

| | | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI of the Difference (Lower) | 95% CI of the Difference (Upper) |
|---|---|---|---|---|---|---|---|---|
| Monthly Salary ($) | Equal variances assumed | -0.61 | 98 | 0.543 | -169.26 | 277.5361 | -720.021 | 381.5012 |
| | Equal variances not assumed | -0.617 | 97.995 | 0.538 | -169.26 | 274.1083 | -713.219 | 374.6993 |
The p-value for this test is 0.543, which is greater than the alpha value of 0.05. So, we do not reject the null hypothesis.
There is not sufficient evidence to conclude that there is a statistically significant difference between the average monthly salary of female and male employees.
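The "equal variances assumed" row of the t-test output can be reproduced from the group statistics alone. The following is a minimal sketch of the pooled-variance t statistic; the function name `pooled_t_test` is hypothetical, and the critical value 1.984 is an approximate two-sided 5% value for 98 degrees of freedom taken from standard t tables (an assumption, since the standard library has no t distribution).

```python
import math


def pooled_t_test(n1, mean1, sd1, n2, mean2, sd2):
    """Independent samples t statistic with pooled (equal) variances."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se
    return t, df, se


# Group statistics for monthly salary: female first, then male
t, df, se = pooled_t_test(54, 7403.8704, 1476.55210,
                          46, 7573.1304, 1264.52132)

T_CRITICAL = 1.984  # approximate table value for df = 98 (assumption)
reject_null = abs(t) > T_CRITICAL
print(round(t, 3), df, round(se, 3), reject_null)
```

Because the absolute value of t is well below the critical value, the sketch reaches the same decision as the p-value comparison: the null hypothesis is not rejected.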
Next, we test whether the average monthly expense is the same for female and male employees:
Null hypothesis: H0: There is no significant difference in the average monthly expense of female and male employees.
Alternative hypothesis: Ha: There is a significant difference in the average monthly expense of female and male employees.
We use a 5% level of significance for this test.
The test results are summarised below:
| | Gender | N | Mean | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|---|
| Monthly Expense ($) | Female | 54 | 5607.2963 | 1615.49455 | 219.84096 |
| | Male | 46 | 5686.8261 | 1315.51893 | 193.96268 |

t-test for Equality of Means

| | | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI of the Difference (Lower) | 95% CI of the Difference (Upper) |
|---|---|---|---|---|---|---|---|---|
| Monthly Expense ($) | Equal variances assumed | -0.267 | 98 | 0.79 | -79.5298 | 298.0137 | -670.928 | 511.8686 |
| | Equal variances not assumed | -0.271 | 97.818 | 0.787 | -79.5298 | 293.175 | -661.34 | 502.2799 |
The p-value for this test is 0.79, which is greater than the significance level of 0.05, so we do not reject the null hypothesis that there is no statistically significant difference in the mean monthly expense of male and female employees.
Next, we test whether the average number of TV hours is the same for male and female employees. The null and alternative hypotheses are given below:
Null hypothesis: H0: There is no significant difference in the average number of TV hours for female and male employees.
Alternative hypothesis: Ha: There is a significant difference in the average number of TV hours for female and male employees.
We use a 5% level of significance for this test.
The test results are summarised below:
| | Gender | N | Mean | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|---|
| TV hour per week | Female | 54 | 12.4074 | 5.06009 | .68859 |
| | Male | 46 | 10.1522 | 4.74209 | .69918 |

t-test for Equality of Means

| | | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI of the Difference (Lower) | 95% CI of the Difference (Upper) |
|---|---|---|---|---|---|---|---|---|
| TV hour per week | Equal variances assumed | 2.286 | 98 | 0.024 | 2.25523 | 0.98649 | 0.29758 | 4.21288 |
| | Equal variances not assumed | 2.298 | 97.082 | 0.024 | 2.25523 | 0.98133 | 0.30758 | 4.20289 |
The p-value for this test is 0.024, which is less than the significance level of 0.05. So, we reject the null hypothesis that there is no significant difference in the average number of TV hours for female and male employees.
There is sufficient evidence to conclude that there is a significant difference in the average number of TV hours for female and male employees. On average, female employees in the sample watch about 2.26 more hours of TV per week than male employees.
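The "equal variances not assumed" row corresponds to Welch's t test, which does not pool the two variances and uses the Welch-Satterthwaite approximation for the degrees of freedom. The sketch below recomputes that row for TV hours from the group statistics; the function name `welch_t_test` is hypothetical.

```python
import math


def welch_t_test(n1, mean1, sd1, n2, mean2, sd2):
    """Welch's t statistic with Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    t = (mean1 - mean2) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, se


# Group statistics for TV hours per week: female first, then male
t, df, se = welch_t_test(54, 12.4074, 5.06009,
                         46, 10.1522, 4.74209)
print(round(t, 3), round(df, 2), round(se, 4))
```

The recomputed values agree with the reported t of 2.298, degrees of freedom of 97.082 and standard error of 0.98133 up to rounding.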
Next, we test whether the average number of exercise hours is the same for male and female employees.
Null hypothesis: H0: There is no significant difference in the average number of exercise hours for male and female employees.
Alternative hypothesis: Ha: There is a significant difference in the average number of exercise hours for male and female employees.
The test results are given below:
| | Gender | N | Mean | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|---|
| Exercise in hours per week | Female | 54 | 2.5926 | 1.83795 | .25011 |
| | Male | 46 | 2.0435 | 1.61873 | .23867 |

t-test for Equality of Means

| | | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI of the Difference (Lower) | 95% CI of the Difference (Upper) |
|---|---|---|---|---|---|---|---|---|
| Exercise in hours per week | Equal variances assumed | 1.572 | 98 | 0.119 | 0.54911 | 0.34926 | -0.14399 | 1.24222 |
| | Equal variances not assumed | 1.588 | 97.88 | 0.115 | 0.54911 | 0.34572 | -0.13696 | 1.23519 |
The p-value for this test is 0.119, which is greater than the alpha value of 0.05. So, we do not reject the null hypothesis that there is no significant difference in the average number of exercise hours for male and female employees.
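The 95% confidence interval in the t-test output leads to the same decision as the p-value: if the interval for the mean difference contains zero, the null hypothesis is not rejected at the 5% level. The sketch below rebuilds the interval for exercise hours from the reported mean difference and standard error; the function name is hypothetical, and the critical value 1.9845 is an approximate table value for 98 degrees of freedom (an assumption).

```python
def mean_difference_ci(mean_diff, se_diff, t_critical):
    """Confidence interval for a difference between two means."""
    margin = t_critical * se_diff
    return mean_diff - margin, mean_diff + margin


T_CRITICAL_DF98 = 1.9845  # approximate two-sided 5% value for df = 98 (assumption)

# Mean difference and standard error, equal variances assumed
lower, upper = mean_difference_ci(0.54911, 0.34926, T_CRITICAL_DF98)
contains_zero = lower < 0 < upper
print(round(lower, 4), round(upper, 4), contains_zero)
```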
In this section, we compare population averages for more than two groups. First, we test whether the average monthly salary is the same for employees with different education levels. To check this hypothesis, we use one-way analysis of variance (ANOVA). The null and alternative hypotheses for this test are given below:
Null hypothesis: H0: There is no significant difference in the average monthly salary for employees with different education levels.
Alternative hypothesis: Ha: There is a significant difference in the average monthly salary for employees with different education levels.
We use a 5% level of significance for this test.
The test results are summarised below:
Descriptives: Monthly Salary ($)

| Education | N | Mean | Std. Deviation | Std. Error | 95% CI for Mean (Lower Bound) | 95% CI for Mean (Upper Bound) | Minimum | Maximum |
|---|---|---|---|---|---|---|---|---|
| Less than graduation | 34 | 7061.706 | 1359.564 | 233.1633 | 6587.332 | 7536.08 | 4392 | 9778 |
| Graduation | 32 | 7225.719 | 1302.629 | 230.2744 | 6756.071 | 7695.367 | 5289 | 9731 |
| Post-graduation or more | 34 | 8142.706 | 1251.285 | 214.5936 | 7706.112 | 8579.3 | 5904 | 10569 |
| Total | 100 | 7481.73 | 1378.838 | 137.8838 | 7208.139 | 7755.321 | 4392 | 10569 |

ANOVA: Monthly Salary ($)

| | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | 2.295E7 | 2 | 1.147E7 | 6.735 | .002 |
| Within Groups | 1.653E8 | 97 | 1703797.387 | | |
| Total | 1.882E8 | 99 | | | |
The p-value for this ANOVA test is 0.002, which is less than the significance level of 0.05, so we reject the null hypothesis that there is no significant difference in the average monthly salary for employees with different education levels.
There is sufficient evidence to conclude that there is a significant difference in the average monthly salary for employees with different education levels.
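The one-way ANOVA F statistic can be rebuilt from the group sizes, means and standard deviations in the descriptives table: the between-groups sum of squares measures how far the group means lie from the grand mean, and the within-groups sum of squares pools the group variances. The following is a minimal sketch under that standard decomposition; the function name `one_way_anova` is hypothetical.

```python
def one_way_anova(groups):
    """F statistic from a list of (n, mean, sd) tuples, one per group."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# (n, mean, sd) of monthly salary for each education level
salary_by_education = [
    (34, 7061.706, 1359.564),   # Less than graduation
    (32, 7225.719, 1302.629),   # Graduation
    (34, 8142.706, 1251.285),   # Post-graduation or more
]
f_stat, df_between, df_within = one_way_anova(salary_by_education)
print(round(f_stat, 2), df_between, df_within)
```

The result agrees with the reported F of 6.735 on 2 and 97 degrees of freedom up to rounding of the summary statistics. A significant F only says that at least one group mean differs; it does not identify which groups differ.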
Next, we test whether the average monthly expense is the same for employees with different education levels.
Null hypothesis: H0: There is no statistically significant difference in the average monthly expense for employees with different education levels.
Alternative hypothesis: Ha: There is a statistically significant difference in the average monthly expense for employees with different education levels.
The test results are given below:
Descriptives: Monthly Expense ($)

| Education | N | Mean | Std. Deviation | Std. Error | 95% CI for Mean (Lower Bound) | 95% CI for Mean (Upper Bound) | Minimum | Maximum |
|---|---|---|---|---|---|---|---|---|
| Less than graduation | 34 | 5262.235 | 1439.163 | 246.8144 | 4760.088 | 5764.383 | 2081 | 8479 |
| Graduation | 32 | 5279.438 | 1376.105 | 243.2633 | 4783.299 | 5775.576 | 3074 | 7995 |
| Post-graduation or more | 34 | 6368.529 | 1373.395 | 235.5353 | 5889.329 | 6847.73 | 4183 | 9257 |
| Total | 100 | 5643.88 | 1478.309 | 147.8309 | 5350.552 | 5937.209 | 2081 | 9257 |

ANOVA: Monthly Expense ($)

| | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | 2.706E7 | 2 | 1.353E7 | 6.932 | .002 |
| Within Groups | 1.893E8 | 97 | 1951524.912 | | |
| Total | 2.164E8 | 99 | | | |
The p-value for this test is 0.002, which is less than the significance level of 0.05, so we reject the null hypothesis that there is no significant difference in the average monthly expense for employees with different education levels.
There is sufficient evidence to conclude that there is a significant difference in the average monthly expense for employees with different education levels.
Next, we use the same test to check for a significant difference in the average number of TV hours for employees with different education levels.
Null hypothesis: H0: There is no statistically significant difference in the average number of TV hours for employees with different education levels.
Alternative hypothesis: Ha: There is a statistically significant difference in the average number of TV hours for employees with different education levels.
Descriptives: TV hour per week

| Education | N | Mean | Std. Deviation | Std. Error | 95% CI for Mean (Lower Bound) | 95% CI for Mean (Upper Bound) | Minimum | Maximum |
|---|---|---|---|---|---|---|---|---|
| Less than graduation | 34 | 10.5 | 4.34323 | 0.74486 | 8.9846 | 12.0154 | 4 | 19 |
| Graduation | 32 | 12.6875 | 4.62418 | 0.81745 | 11.0203 | 14.3547 | 4 | 20 |
| Post-graduation or more | 34 | 11 | 5.83615 | 1.00089 | 8.9637 | 13.0363 | 4 | 20 |
| Total | 100 | 11.37 | 5.02047 | 0.50205 | 10.3738 | 12.3662 | 4 | 20 |

ANOVA: TV hour per week

| | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | 85.935 | 2 | 42.967 | 1.730 | .183 |
| Within Groups | 2409.375 | 97 | 24.839 | | |
| Total | 2495.310 | 99 | | | |
The p-value for this test is 0.183, which is greater than the alpha value of 0.05, so we do not reject the null hypothesis that there is no statistically significant difference in the average number of TV hours for employees with different education levels.
Next, we use the same test to check for a significant difference in the average number of exercise hours for employees with different education levels.
Null hypothesis: H0: There is no statistically significant difference in the average number of exercise hours for employees with different education levels.
Alternative hypothesis: Ha: There is a statistically significant difference in the average number of exercise hours for employees with different education levels.
The test results are given below:
Descriptives: Exercise in hours per week

| Education | N | Mean | Std. Deviation | Std. Error | 95% CI for Mean (Lower Bound) | 95% CI for Mean (Upper Bound) | Minimum | Maximum |
|---|---|---|---|---|---|---|---|---|
| Less than graduation | 34 | 2.2059 | 1.80537 | 0.30962 | 1.576 | 2.8358 | 0 | 5 |
| Graduation | 32 | 2.25 | 1.75977 | 0.31109 | 1.6155 | 2.8845 | 0 | 5 |
| Post-graduation or more | 34 | 2.5588 | 1.72664 | 0.29612 | 1.9564 | 3.1613 | 0 | 5 |
| Total | 100 | 2.34 | 1.75361 | 0.17536 | 1.992 | 2.688 | 0 | 5 |

ANOVA: Exercise in hours per week

| | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | 2.499 | 2 | 1.249 | .401 | .671 |
| Within Groups | 301.941 | 97 | 3.113 | | |
| Total | 304.440 | 99 | | | |
The p-value for this test is 0.671, which is greater than the alpha value of 0.05, so we do not reject the null hypothesis that there is no statistically significant difference in the average number of exercise hours for employees with different education levels.
In this section, we check whether the two categorical variables gender and education are independent of each other. To check this claim, we use the chi-square test for independence of two categorical variables. The null and alternative hypotheses for this test are given below:
Null hypothesis: H0: The two categorical variables gender and education level are independent of each other.
Alternative hypothesis: Ha: The two categorical variables gender and education level are not independent of each other.
We use a 0.05 level of significance for this test.
The test results are summarised below:
Gender by Education crosstabulation (Count)

| Gender | Less than graduation | Graduation | Post-graduation or more | Total |
|---|---|---|---|---|
| Female | 18 | 17 | 19 | 54 |
| Male | 16 | 15 | 15 | 46 |
| Total | 34 | 32 | 34 | 100 |

Chi-Square Tests

| | Value | df | Asymp. Sig. (2-sided) |
|---|---|---|---|
| Pearson Chi-Square | .074 (a) | 2 | .964 |
| Likelihood Ratio | .074 | 2 | .964 |
| Linear-by-Linear Association | .059 | 1 | .809 |
| N of Valid Cases | 100 | | |

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 14.72.
The p-value for this test is 0.964, which is greater than the alpha value of 0.05, so we do not reject the null hypothesis that the two categorical variables gender and education level are independent of each other.
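The chi-square statistic compares each observed count in the crosstabulation with the count expected under independence (row total times column total, divided by the grand total). The sketch below recomputes the statistic, the minimum expected count and the p-value from the observed counts; the function name is hypothetical, and the closed-form p-value `exp(-chi2 / 2)` is valid only because this test has exactly 2 degrees of freedom.

```python
import math


def chi_square_independence(observed):
    """Pearson chi-square statistic, df and minimum expected count."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    min_expected = float("inf")
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            min_expected = min(min_expected, expected)
            chi2 += (count - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, min_expected


# Rows: Female, Male; columns: the three education levels
observed = [[18, 17, 19], [16, 15, 15]]
chi2, df, min_expected = chi_square_independence(observed)

# For df = 2 only, the upper-tail probability is exactly exp(-chi2 / 2)
p_value = math.exp(-chi2 / 2)
print(round(chi2, 3), df, round(min_expected, 2), round(p_value, 3))
```

The recomputed values match the reported chi-square of .074, the minimum expected count of 14.72 and the significance of .964.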
The statistical analysis gives several notable results. We used simple descriptive statistics to get an initial idea of the nature of the variables in this study. The correlation coefficient between monthly salary and monthly expense is 0.957, which indicates a strong positive linear relationship, and we conclude that this linear relationship is statistically significant. No statistically significant difference is observed between female and male employees in average monthly salary or average monthly expense. A significant difference is observed in the average number of TV hours for female and male employees, but no significant difference is observed in the average number of exercise hours. A significant difference is observed in the average monthly salary and the average monthly expense for employees with different education levels. However, there is no statistically significant difference in the average number of TV hours or exercise hours for employees with different education levels. Finally, there is not sufficient evidence to conclude that gender and education level are related, so the data are consistent with the two variables being independent.
Conclusions
The conclusions of the statistical data analysis are summarised below:
- The average age of all employees in the data is 43.92 years, with a standard deviation of 8.79 years. The average salary for all employees is $7481.73 per month, with a standard deviation of $1378.84. The average monthly expense for employees is $5643.88, with a standard deviation of $1478.31. All results are based on a sample of 100 observations.
- The linear correlation coefficient between monthly salary and monthly expense is 0.957, which indicates a strong positive linear relationship between the dependent variable monthly expense and the independent variable monthly salary. The coefficient of determination (R square) for the linear regression model is 0.916, which means that about 91.6% of the variation in monthly expense is explained by monthly salary.
- There is sufficient evidence to conclude that a statistically significant linear relationship exists between monthly salary and monthly expense.
- There is not sufficient evidence to conclude that the average monthly salary differs between female and male employees.
- There is not sufficient evidence to conclude that the mean monthly expense differs between male and female employees.
- There is sufficient evidence to conclude that there is a significant difference in the average number of TV hours for female and male employees.
- There is not sufficient evidence to conclude that the average number of exercise hours differs between male and female employees.
- There is sufficient evidence to conclude that there is a significant difference in the average monthly salary for employees with different education levels.
- There is sufficient evidence to conclude that there is a significant difference in the average monthly expense for employees with different education levels.
- There is not sufficient evidence to conclude that the average number of TV hours differs for employees with different education levels.
- There is not sufficient evidence to conclude that the average number of exercise hours differs for employees with different education levels.
- There is not sufficient evidence to conclude that gender and education level are related, so the data are consistent with the two variables being independent.
References
Website for Data collection: https://www.bls.gov/