Data Collection and Analysis
Section 1: Introduction
Answer 1. A.
The Australian Taxation Office (ATO) is the principal revenue collection agency of Australia. The ATO administers a range of socio-economic benefits and incentive programs, as well as major aspects of Australia's superannuation system, and it acts as custodian of the Australian Business Register (Daniel, Keen and McPherson 2010). The data set analysed here is drawn from historical ATO data. The analysis aims to identify the gender gap in salary and wages between males and females. The factors behind the gender gap are associated with discrimination in hiring and appraisal. The analysis proceeds from the proposition that salary or wage, as well as occupation, differ significantly between the genders. The data set summarises the job profiles of females and males in various organisations. Salary or wage amounts and gift or donation amounts vary by gender. The data set contains 1000 sampled records. The analysis is carried out in MS Excel.
The research question:
The researcher seeks to determine whether the average salary amount differs between males and females in the first data set, and whether the same difference appears in the second data set.
Answer 1. B.
The first data set is secondary data. The researcher did not collect it personally; the ATO supplied the data to the analyst for analysis.
The data set has four variables: Gender, Occ_code, Sw_amt and Gift_amt. “Gender” indicates the sex of the participant, “Occ_code” gives the occupation code, “Sw_amt” gives the salary or wage amount and “Gift_amt” gives the amount of gift or donation deductions.
In this data set, the quantitative (numerical) variables are “Salary/wage amount” and “Gift amount”. “Gender” and “Occupation code” are qualitative (categorical) variables. “Gender” is a nominal variable with two levels, “Male” and “Female”. “Occupation code” is also a nominal variable, although its categories are recorded as numbers; for example, the occupation “Manager” is coded as “1”.
Table 1: The first five cases of data set 1 are shown below
Answer 1. C.
The data on “Gender”, “Working or not” and “Sw_amt” were tabulated after the survey. This second data set is named “data set 2” and is stored in a separate worksheet of the same Excel file. The analyst used a random sampling method to gather responses from 100 people, or students, of the college; hence the data is referred to as college data. Of these, only 86 people reported their salary amount. Using a randomisation technique, the researcher obtained a final sample of 75 observations.
A simple questionnaire was used to collect the data in a field survey. The questionnaire includes three questions: 1) “What is your gender?” 2) “Are you working or not?” 3) “What is your amount of salary or wage?”.
The second data set is primary data, collected by the analyst personally (Bluman 2013). It contains one numerical variable, the amount of salary or wage, and two categorical variables, gender and working status.
Section 2: Descriptive Statistics
Figure 1: Bar plot of Gender wise frequencies of Occupational code
The bar chart indicates that female workers prefer to work mainly in two occupations, “Professionals” and “Clerical and Administrative workers”. Male workers, in contrast, prefer to work mainly in “Professionals” and “Technicians and Trade Workers”. Females, like males, are least interested in working as machinery operators and drivers.
The bar plot shows that both the average salary and the standard deviation of salary are greater for males than for females. That is, males earn more than females on average, and the distribution of male earnings is more dispersed than that of female earnings.
Table 2: Table of Numerical Summary of Gender wise salary/wage amount
Female employees earn a lower salary than male employees on average (McDermott, Schemitsch and Simoncelli 2013). The standard deviations indicate that salaries are more dispersed for males than for females. The minimum salary for both genders is 0, but the highest salary earned by a male exceeds the highest earned by a female. The total salary of the males is almost twice the total salary of the females.
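The kind of gender-wise summary discussed above can be sketched in a few lines of Python. This is a minimal illustration only: the analysis in this report was done in MS Excel, the records below are hypothetical toy values rather than the ATO data, and the function name summarise is an assumption introduced here.

```python
import statistics

# Hypothetical toy records (gender, salary/wage amount); NOT the ATO data.
records = [
    ("Male", 0), ("Male", 52000), ("Male", 98000), ("Male", 61000),
    ("Female", 0), ("Female", 41000), ("Female", 47000), ("Female", 36000),
]

def summarise(records):
    """Return {gender: summary statistics} for the salary/wage amounts."""
    groups = {}
    for gender, amount in records:
        groups.setdefault(gender, []).append(amount)
    summary = {}
    for gender, amounts in groups.items():
        summary[gender] = {
            "count": len(amounts),
            "mean": statistics.mean(amounts),
            "sd": statistics.stdev(amounts),  # sample standard deviation
            "min": min(amounts),
            "max": max(amounts),
            "sum": sum(amounts),
        }
    return summary

summary = summarise(records)
for gender, stats in summary.items():
    print(gender, stats)
```

Comparing the "mean", "sd", "max" and "sum" entries across the two groups gives the same comparisons that the numerical summary table supports.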
Figure 3: Scatter plot of Salary/wage amount and Gift amount
The scatter plot of the two numerical variables, salary or wage amount and gift amount, indicates that the association between them is almost negligible (Weiss and Weiss 2012). The gift amount is also very small relative to the salary or wage amount.
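The strength of a linear association such as the one read off the scatter plot is usually quantified with the Pearson correlation coefficient. The following is a minimal sketch, assuming hypothetical toy values for the two variables (not the ATO data); the function name pearson_r is introduced here for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy values; NOT the ATO data.
salary = [30000, 45000, 60000, 75000, 90000]
gift = [0, 200, 0, 50, 100]
r = pearson_r(salary, gift)
print(f"r = {r:.3f}")
```

A value of r close to 0 corresponds to the "almost negligible" association described above, while values near 1 or -1 indicate a strong linear relationship.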
Section 3: Inferential Statistics
Answer 3. A.
The highest median salaries are observed in the occupations Professionals (2), Managers (1), Machinery operators and drivers (7) and Technicians and Trades workers (3). In decreasing order, the medians are: Professionals (median salary = $70427), Machinery operators and drivers (median salary = $59831), Managers (median salary = $59606) and Technicians and Trades Workers (median salary = $56628).
Table 3: The median values of salary and wage amounts of top 4 occupations
The proportions of males and females in the occupation of “Manager” are 0.58 and 0.42 respectively. The proportions of males and females in the occupation of “Professional” are 0.48 and 0.52 respectively. The proportions of males and females in the occupation of “Technicians and Trade Worker” are 0.88 and 0.12 respectively. The proportions of males and females in the occupation “Machinery operators and driver” are 0.94 and 0.06 respectively. The proportions of males and females across all 4 occupations with the higher median salaries are 0.65 and 0.35 respectively.
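Gender proportions within each occupation, as reported above, amount to a cross-tabulation followed by a division by the row total. A minimal sketch is given below, assuming hypothetical toy rows (not the ATO data) and an illustrative function name gender_proportions.

```python
from collections import Counter

# Hypothetical (occupation code, gender) rows; NOT the ATO data.
rows = [
    (1, "Male"), (1, "Female"), (1, "Male"),
    (2, "Female"), (2, "Male"),
    (7, "Male"), (7, "Male"),
]

def gender_proportions(rows):
    """Return {occupation code: {gender: proportion within that occupation}}."""
    totals = Counter(occ for occ, _ in rows)   # records per occupation
    pairs = Counter(rows)                      # records per (occupation, gender)
    return {
        occ: {g: pairs[(occ, g)] / totals[occ] for g in ("Male", "Female")}
        for occ in sorted(totals)
    }

props = gender_proportions(rows)
for occ, shares in props.items():
    print(occ, shares)
```

Within each occupation the two proportions sum to 1, which is a quick check on any table built this way.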
Answer 3. B.
Hypotheses:
Null hypothesis (H0): The proportion of males who are working as machinery operators and drivers is equal to 0.8.
Alternative hypothesis (HA): The proportion of males who are working as machinery operators and drivers is greater than 0.8.
(Lowry 2014)
Table 5: One sample proportional Z-test
Among 50 machinery operators and drivers, 47 are males. The observed proportion is reported as 0.928571, and the hypothesised proportion of males among all machinery operators and drivers is 0.8. Applying the one-sample proportion Z-test at the 5% level of significance gives a calculated Z-value of 2.0831. The critical value used is Z-critical = 1.959964, which is the two-tailed value at the 5% level; the one-tailed critical value, 1.644854, is smaller and leads to the same decision. Since 2.0831 > 1.959964, Z-calculated > Z-critical, and the null hypothesis is rejected at the 5% level of significance in favour of the alternative hypothesis (Grapov and Newman 2012). Hence, the proportion of males among all machinery operators and drivers is significantly greater than 80%.
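The one-sample proportion Z-test can be reproduced with the Python standard library. The sketch below is an illustration, not the Excel worksheet used in this report; the function name one_sample_proportion_z is an assumption. It uses the counts stated in the text (47 males out of 50). With those counts the observed proportion is 0.94 and the Z-value is about 2.47, whereas the reported figures of 0.928571 and 2.0831 are reproduced by 39 males out of 42. Both Z-values exceed the critical value, so the decision is the same either way.

```python
import math
from statistics import NormalDist

def one_sample_proportion_z(successes, n, p0):
    """Upper-tailed one-sample Z-test for a proportion (H0: p = p0, HA: p > p0)."""
    p_hat = successes / n
    # The standard error is computed under the null hypothesis.
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    p_value = 1 - NormalDist().cdf(z)  # upper-tail probability
    return p_hat, z, p_value

# Counts as stated in the text: 47 males among 50 machinery operators and drivers.
p_hat, z, p_value = one_sample_proportion_z(47, 50, 0.8)

# One-tailed critical value at the 5% level of significance.
z_critical = NormalDist().inv_cdf(0.95)

print(f"p_hat = {p_hat:.4f}, z = {z:.4f}, p-value = {p_value:.4f}")
print(f"z_critical = {z_critical:.6f}, reject H0: {z > z_critical}")
```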
Answer 3. C.
Hypotheses:
Null hypothesis (H0): The difference of average amount of salaries or wages of males and females is equal to 0.
Alternative hypothesis (HA): The average amount of salary or wage of males is higher than the average amount of salary or wage of females.
Table 6: Two-sample t-test assuming unequal variances
Among the 1000 samples, 461 are females and 539 are males. The average salary or wage is 55679.90 for males and 35461.83 for females. The two-sample t-test assuming unequal variances, applied at the 5% level of significance, gives a calculated t-statistic of -5.8017; the sign is negative because the female mean is lower than the male mean. The two-tailed p-value is reported as 0.0, that is, effectively zero at the displayed precision (De Winter 2013). Because the two-tailed p-value is less than 0.05, and the one-tailed p-value is smaller still, the null hypothesis is rejected at the 5% level of significance and the alternative hypothesis is accepted. It can be concluded that the average salary or wage of males is higher than the average salary or wage of females.
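The test statistic of the two-sample t-test assuming unequal variances (Welch's t-test) and its approximate degrees of freedom can be computed as sketched below. The group standard deviations are not quoted in this text, so the sketch uses hypothetical toy samples rather than the ATO data, and the function name welch_t is an assumption. The p-value would then be read from a t-distribution with the computed degrees of freedom, for example in Excel or from a t-table.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch t-statistic for mean(a) - mean(b) and Welch-Satterthwaite df."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical toy salaries; NOT the ATO data.
females = [30000, 35000, 40000, 32000]
males = [50000, 65000, 48000, 70000, 55000]

t, df = welch_t(females, males)
print(f"t = {t:.4f}, df = {df:.2f}")
```

Passing the female sample first gives a negative t-statistic when females earn less on average, which matches the sign convention of the result reported above.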
Answer 3. D.
Hypotheses:
Null hypothesis (H0): The difference of average salary or wage amounts of males and the females is 0 for the second data set.
Alternative hypothesis (HA): The average amount of salary or wage of males is higher than the average amount of salary or wage of females for the second data set.
Table 7: Two-sample t-test assuming unequal variances
Out of 75 sampled observations, 21 are females and 54 are males. The average salary or wage is 49034.33 for males and 31777.81 for females. The two-sample t-test assuming unequal variances, applied at the 5% level of significance, gives a calculated t-statistic of -2.3495. The two-tailed p-value of the test statistic is 0.0227. Because the two-tailed p-value is less than 0.05, and the one-tailed p-value is smaller still, the null hypothesis is rejected at the 5% level of significance and the alternative hypothesis is accepted. It can be inferred that the average salary or wage of males is greater than the average salary or wage of females.
Section 4: Discussion and Conclusion
Answer 4. A.
The previous sections support several conclusions. Females commonly prefer jobs in the clerical and administrative occupations as well as the professional occupations, while males prefer the technician and trade occupations as well as the professional occupations; both genders prefer professional jobs. Among the nine types of occupation, the median salary is highest for “Managers”, “Professionals”, “Technicians and Trades Workers” and “Machinery operators and drivers”. Machinery operator and driver jobs are held mainly by males, whose share is greater than 80%. Hence, occupation also plays a significant role in determining how salary is distributed by gender. Males earn a higher salary or wage than females on average. This finding is confirmed by the analysis of the second data set: in testing the hypothesis associated with the research question, the average salaries of males and females in the primary data set are found to differ, with males earning more. On the other hand, the gift amount recorded for females is higher than that recorded for males.
Answer 4. B.
Future research on this topic should address some limitations of the present study. The sample size of the primary data set is not large and could have been greater. The causes and factors related to the salary amounts of males and females are not provided in the data set. These explanatory factors should be identified so that the analyst can assess the statistical significance of the predictors (Simonsohn, Simmons and Nelson 2015).
Data on the relevant aspects should be collected from the target population (Christensen et al. 2011), and those parameters should be included in the data sets. More information could be extracted from the analysed data set if information on “years of experience”, “educational level”, “hours of working” and “designation of work” were added to it.
References:
Arnold, C., Matthews, L.J. and Nunn, C.L., 2010. The 10kTrees website: a new online resource for primate phylogeny. Evolutionary Anthropology: Issues, News, and Reviews, 19(3), pp.114-118.
Bluman, A.G., 2013. Elementary statistics. Chennai: McGraw Hill.
Christensen, L.B., Johnson, B. and Turner, L.A., 2011. Research methods, design, and analysis.
Daniel, P., Keen, M. and McPherson, C. eds., 2010. The taxation of petroleum and minerals: principles, problems and practice. Routledge.
De Winter, J.C., 2013. Using the Student’s t-test with extremely small sample sizes. Practical Assessment, Research & Evaluation, 18(10).
Grapov, D. and Newman, J.W., 2012. imDEV: a graphical user interface to R multivariate analysis tools in Microsoft Excel. Bioinformatics, 28(17), pp.2288-2290.
Lowry, R., 2014. Concepts and applications of inferential statistics.
McDermott, J.H., Schemitsch, M. and Simoncelli, E.P., 2013. Summary statistics in auditory perception. Nature neuroscience, 16(4), p.493.
Simonsohn, U., Simmons, J.P. and Nelson, L.D., 2015. Specification curve: Descriptive and inferential statistics on all reasonable specifications.
Weiss, N.A. and Weiss, C.A., 2012. Introductory statistics. London: Pearson Education.