Data Collection and Analysis
OLS regressions are used both to make forecasts and to assess the importance of a given explanatory variable in determining a dependent variable. Gender parity is currently a hotly debated topic, as Australian industries also appear to have a gender pay gap (Organization for Economic Co-operation and Development, 2017). This report attempts to analyse the gender pay gap among the highly educated population (i.e. the population that has at least a graduate degree). To do so, OLS estimation has been applied to a sample collected from the Household Survey of Australia, 2015-2016 (Australian Bureau Of Statistics, 2017).
The data are taken from the Australian Household Survey for 2015-2016 (Australian Bureau Of Statistics, 2017). This panel data set surveys respondents on a variety of demographic indicators such as age, education, wage and number of hours worked. The given data are a sample from this database and contain only those workers who have at least a graduate degree. The data were cleaned before analysis in order to remove outliers. Accordingly, the top 1% of earners, i.e. those earning more than 183.03 AUD, and the bottom 1% of earners, i.e. those earning less than 7.61 AUD, were removed as observations. Similarly, average wages are not a good statistic to consider, since wage differentials can be a result of the cost of living in a region. Hence, the log of wages (ln wages) has been used as the dependent variable in the OLS estimator. With a logged dependent variable, each coefficient is read as an approximate proportional change in the wage for a given change in a regressor, instead of a change in its absolute value (Wooldridge, 2015).
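The cleaning step described above can be sketched as follows. This is a minimal illustration and not the procedure actually run for this report: the wage values are randomly generated placeholders, the function names `percentile` and `trim_and_log` are hypothetical, and the linear-interpolation percentile rule is an assumption (statistical packages differ in how they define percentiles).

```python
import math
import random


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list."""
    if not sorted_values:
        raise ValueError("empty data")
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = math.floor(pos)
    upper = math.ceil(pos)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def trim_and_log(wages, low_q=1.0, high_q=99.0):
    """Drop observations outside the percentile band, then take natural logs."""
    ordered = sorted(wages)
    low_cut = percentile(ordered, low_q)
    high_cut = percentile(ordered, high_q)
    kept = [w for w in wages if low_cut <= w <= high_cut]
    return [math.log(w) for w in kept], low_cut, high_cut


# Hypothetical hourly wages (AUD), generated only to exercise the functions.
random.seed(0)
hourly_wages = [math.exp(random.gauss(3.5, 0.5)) for _ in range(2000)]

ln_wages, low_cut, high_cut = trim_and_log(hourly_wages)
print(f"kept {len(ln_wages)} of {len(hourly_wages)} observations; "
      f"cut-offs {low_cut:.2f} to {high_cut:.2f} AUD")
```

Applied to the survey sample, the two cut-offs would play the role of the 7.61 AUD and 183.03 AUD thresholds reported above.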
Table 1 Variables' Name, Description and Type/Units
Name | Description | Type/Units
Unique household number | Assigned unique identification number for every member surveyed | String variable (ID number) / None
Person number within each income unit | The number of people within the household | Ordinal number, e.g. 1, 2, 3
Position in income unit (relationship to the IU reference person) | Whether the member of the family is the head of the household or the spouse of the head of the household | Descriptive; takes the value "Head of the income unit" or "Spouse of the head of the income unit"
Age | Age of the respondent | Years (rounded off)
Wage | Hourly wage earned by the person | Dollars
Female | Indicates whether the person is female or not | 1, 0 (dummy variable)
Ft/pt | Indicates whether the person is employed full time or part time | 1, 0 (dummy variable)
Ind | Indicates the industry that a person is employed in, from a pre-drawn list of industries | Categorical string data, pre-selected categories
OCC | Indicates the occupation that the person is employed in | Categorical string data, pre-selected categories
Figure Summary Statistics for Log of Wages, female respondents, respondents who are working and married, the total number of children in the household, and the square of the age of the respondent
Table 4 Summary Statistics for the Industry of the Main Job of the Respondent = Accommodation and Food Services; or Administrative and Support Services; or Agriculture, Forestry and Fishing
Table 6 Summary Statistics for the Industry of the Main Job of the Respondent = Electricity, Gas, Water and Waste Services; or Financial and Insurance Services; or Health Care and Social Assistance
Table 7 Summary Statistics for the Industry of the Main Job of the Respondent = Information Media and Telecommunications; or Manufacturing; or Mining
Table 8 Summary Statistics for the Industry of the Main Job of the Respondent = Other Services; or Professional, Scientific and Technical Services; or Public Administration and Safety; or Rental, Hiring and Real Estate Services
Table 9 Summary Statistics for the Industry of the Main Job of the Respondent = Retail Trade; or Transport, Postal and Warehousing; or Wholesale Trade
Table 10 Summary Statistics for the Occupation of the Main Job of the Respondent = Community and Personal Service Workers; or Labourers; or Machinery Operators and Drivers
Table 11 Summary Statistics for the Occupation of the Main Job of the Respondent = Machinery Operators and Drivers; or Professionals
Table 12 Summary Statistics for the Occupation of the Main Job of the Respondent = Sales Workers; or Technicians and Trade Workers
- EXPLAIN THE RELATIONSHIPS (+VE or –VE), WHAT GRAPHS ARE TRYING TO SHOW?
The residuals are concentrated below zero at the lower end of the wage range, implying that the model is not a good fit for lower wages. Additionally, the residuals appear to be biased downwards, implying that the observed values are lower than the estimated values. However, the slope of the fitted line is positive.
The regression line is almost parallel to the horizontal axis, with a very slight downward trend. The residuals are concentrated below zero at the lower end of the wage range, implying that the model is not a good fit for lower wages. The model is a better fit when wages are higher.
The highest number of graduate respondents are professionals, while the lowest number are technicians and trade workers. Wages per hour do not change much with occupation, although they appear to be highest for the occupations "Managers" and "Clerical and Administrative Workers".
The initial model regresses the log of hourly wages on gender, marital status, years with children in the household, age, age squared, full-time or part-time status, number of hours worked, industry and occupation:
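$$\ln(\text{wage}_i) = f(\text{female}_i,\ \text{married}_i,\ \text{kidyrs}_i,\ \text{age}_i,\ \text{age}_i^2,\ \text{ftpt}_i,\ \text{hours}_i,\ \text{ind}_i,\ \text{occ}_i) \quad \text{(Model A)}$$
$$\ln(\text{wage}_i) = f(\text{female}_i \times \text{married}_i,\ \text{female}_i \times \text{kidyrs}_i,\ \text{age}_i,\ \text{hours}_i,\ \text{occ}_i) \quad \text{(Model B)}$$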
Testing for multicollinearity: It is assumed that there is no collinearity among the regressors. This assumption is checked using multicollinearity tests. Generally, the VIF test is used after an OLS regression has been estimated. However, since an OLS regression was not generated, the correlation statistics of the entire group of variables were checked manually. The variables age and age2 were highly correlated in Model A; hence, age2 was dropped (Wooldridge, 2015). Additionally, a special correlation test was performed to see the
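The manual correlation check between age and its square can be illustrated with a short sketch. The ages below are randomly generated placeholders, not the survey data, and the helper names are hypothetical. For a regressor explained by a single other regressor, the variance inflation factor reduces to 1 / (1 - r^2), which is the form used here. The last lines show, purely as an aside, that centring age before squaring typically lowers the correlation; this is not the approach taken in this report, which drops age2 instead.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def vif_two_regressors(r):
    """VIF when one regressor is explained by one other regressor: 1 / (1 - r^2)."""
    return 1.0 / (1.0 - r * r)


# Hypothetical ages (years), generated only for illustration.
random.seed(1)
ages = [random.randint(22, 65) for _ in range(500)]
ages_sq = [a * a for a in ages]

r = pearson_r(ages, ages_sq)
vif = vif_two_regressors(r)
print(f"corr(age, age^2) = {r:.3f}, implied VIF = {vif:.1f}")

# Aside: squaring the centred age usually gives a much weaker correlation.
mean_age = sum(ages) / len(ages)
centred_sq = [(a - mean_age) ** 2 for a in ages]
r_centred = pearson_r(ages, centred_sq)
print(f"corr(age, (age - mean)^2) = {r_centred:.3f}")
```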
Normality: It is assumed that the data are normally distributed. This is tested using the Jarque-Bera normality test in EViews (IHS Global Inc., 2017). The null hypothesis is that the data are normally distributed. The p-value was 0.00; hence, the null hypothesis was rejected (Wooldridge, 2015).
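For reference, the Jarque-Bera statistic can be computed by hand from the sample skewness S and kurtosis K as JB = n/6 · (S² + (K − 3)²/4), which is asymptotically chi-squared with 2 degrees of freedom under the null. The sketch below is a plain-Python illustration under that textbook definition; it is not the EViews routine, and both samples are randomly generated placeholders.

```python
import math
import random


def jarque_bera(values):
    """Return (JB statistic, asymptotic p-value) using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # The chi-squared(2) survival function has the closed form exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


# Hypothetical samples: one drawn from a normal, one from a skewed distribution.
random.seed(2)
normal_sample = [random.gauss(3.5, 0.5) for _ in range(1000)]
skewed_sample = [random.expovariate(1.0) for _ in range(1000)]

jb_normal, p_normal = jarque_bera(normal_sample)
jb_skewed, p_skewed = jarque_bera(skewed_sample)
print(f"normal draw: JB = {jb_normal:.2f}, p = {p_normal:.3f}")
print(f"skewed draw: JB = {jb_skewed:.2f}, p = {p_skewed:.3g}")
```

A p-value close to zero, as reported above, leads to rejection of the null of normality.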
Summary Statistics
Heteroskedasticity: Normally, a formal test would be conducted to detect heteroskedasticity. A variety of software packages (Microfit, Gretl, EViews, OxMetrics) were tried, but no test results could be obtained since no OLS regression was generated. However, heteroskedasticity occurs when the spread of the residuals is not constant across observations. The scatter plots in Figures 1 and 2 suggest heteroskedasticity, as the residuals around the fitted line do not show a homogeneous spread. Hence, there is heteroskedasticity, at least for these two variables (Lambert, 2013).
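Had a regression been available, one formal check would be the n·R² (Koenker) form of the Breusch-Pagan test: regress the squared OLS residuals on the regressors and compare n·R² with a chi-squared distribution. The sketch below covers only the one-regressor case, uses randomly generated placeholder data with an error spread that grows with hours, and is not a result from this report; all function names are hypothetical.

```python
import math
import random


def simple_ols(x, y):
    """Fit y = a + b*x by least squares; return (a, b, residuals)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    return intercept, slope, residuals


def r_squared(x, y):
    """R-squared of the simple regression of y on x."""
    _, _, residuals = simple_ols(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum(e * e for e in residuals)
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


def breusch_pagan_one_regressor(x, y):
    """LM = n * R^2 from regressing squared residuals on x (1 degree of freedom)."""
    _, _, residuals = simple_ols(x, y)
    squared = [e * e for e in residuals]
    lm = max(len(x) * r_squared(x, squared), 0.0)
    # Chi-squared(1) survival function: P(Z^2 > lm) = erfc(sqrt(lm / 2)).
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value


# Hypothetical data: the error spread grows with weekly hours.
random.seed(4)
hours = [random.uniform(10, 60) for _ in range(1000)]
ln_wage = [3.0 + 0.01 * h + random.gauss(0, 0.02 * h) for h in hours]

lm_stat, p_value = breusch_pagan_one_regressor(hours, ln_wage)
print(f"LM = {lm_stat:.1f}, p = {p_value:.3g}")
```

A small p-value would reject the null of homoskedasticity, which is the formal counterpart of the visual reading of the scatter plots.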
Since no other test could be performed, a bar diagram of the variable "industry of main job" was examined, and each industry appears to have an equal weight. Hence, this variable was dropped.
Further, the regression of the variable "Years with children in the household" was examined using a scatter plot.
Given the heteroskedasticity, the following variables were transformed into logs:
ln_age = ln(Age (years))
ln_hours = ln(Hours (hours/week))
ln_kd_years = ln(Years with children in the household)
The final model is estimated as below
The regression model A is as below:
$$\ln(\text{wage}_i) = \alpha + \beta_1\,\text{female}_i + \beta_2\,\text{married}_i + \beta_3\,\text{kidyrs}_i + \beta_4\,\text{age}_i + \beta_5\,\text{ftpt}_i + \beta_6\,\text{occ}^{\text{cler}}_i + \beta_7\,\text{occ}^{\text{comm}}_i + \beta_8\,\text{occ}^{\text{lab}}_i + \beta_9\,\text{occ}^{\text{mach}}_i + \beta_{10}\,\text{occ}^{\text{mgr}}_i + \beta_{11}\,\text{occ}^{\text{prof}}_i + \beta_{12}\,\text{occ}^{\text{sales}}_i + \beta_{13}\,\text{occ}^{\text{tech}}_i + u_i$$
where kidyrs is Years with children in the household, age is Age (years), and the occ terms are dummy variables for the Occupation of the Main Job of the Respondent: Clerical and Administrative Workers (cler), Community and Personal Service Workers (comm), Labourers (lab), Machinery Operators and Drivers (mach), Managers (mgr), Professionals (prof), Sales Workers (sales) and Technicians and Trades Workers (tech).
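When a categorical variable such as occupation enters a regression with an intercept, it is usually coded as a set of dummy variables with one category left out as the reference group, since a full set of dummies for exhaustive categories would be perfectly collinear with the constant. The sketch below illustrates that coding on a handful of made-up observations; the function `dummy_encode` and the choice of "Professionals" as the reference category are assumptions for illustration and do not describe how the estimation software handled the variable in this report.

```python
def dummy_encode(categories, reference):
    """One-hot encode a categorical variable, omitting the reference level.

    Returns the retained level names and one 0/1 row per observation.
    """
    levels = sorted(set(categories))
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not present in data")
    kept = [level for level in levels if level != reference]
    rows = [[1 if value == level else 0 for level in kept] for value in categories]
    return kept, rows


# Hypothetical observations of the occupation variable.
occupations = ["Managers", "Professionals", "Labourers",
               "Professionals", "Sales Workers"]

columns, rows = dummy_encode(occupations, reference="Professionals")
print(columns)
for occupation, row in zip(occupations, rows):
    print(f"{occupation:<15} {row}")
```

Each coefficient on a retained dummy is then read relative to the omitted reference occupation.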
Model B remained the same:
$$\ln(\text{wage}_i) = \alpha + \beta_1\,(\text{female}_i \times \text{married}_i) + \beta_2\,(\text{female}_i \times \text{kidyrs}_i) + u_i \quad \text{(Model B)}$$
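To make the structure of Model B concrete, the sketch below builds the two interaction terms and estimates the coefficients by solving the OLS normal equations. Everything in it is illustrative: the data are simulated with made-up coefficients (3.6, -0.10 and -0.01), the helper functions are hypothetical, and the output says nothing about the estimates obtained in this report.

```python
import random


def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: regressors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def ols(design, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Simulated respondents; the "true" coefficients below are invented.
random.seed(3)
design, ln_wage = [], []
for _ in range(2000):
    female = random.randint(0, 1)
    married = random.randint(0, 1)
    kidyrs = random.randint(0, 18)
    female_x_married = female * married   # interaction term 1
    female_x_kidyrs = female * kidyrs     # interaction term 2
    design.append([1.0, female_x_married, female_x_kidyrs])
    ln_wage.append(3.6 - 0.10 * female_x_married - 0.01 * female_x_kidyrs
                   + random.gauss(0, 0.3))

alpha_hat, beta1_hat, beta2_hat = ols(design, ln_wage)
print(f"alpha = {alpha_hat:.3f}, beta1 = {beta1_hat:.3f}, beta2 = {beta2_hat:.4f}")
```

Because each interaction is zero for all male respondents, the coefficients compare married women, and women with more years with children in the household, against everyone else in the sample.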
- Model A is not a good fit, since its R-squared is small. Moreover, the occupation dummies are not significant unless the main job is a labourer or machinery operator job; these are the only two occupations that are significant in explaining wages. The standard errors of the model are small, but the R-squared is very low, so the model fits poorly. Additionally, none of the variables was judged to have testing significance, with p-values hovering around zero. This model is not a good model and should be dropped in favour of Model B.
However, hours per week and the female dummy are both negatively related to wages, implying that women are likely to have lower wages and that the increase in wages is lower as the number of hours rises. (Intuitively, this makes sense, as labourers tend to have longer working hours and lower wages.)
However, the results obtained were as below
In this model, no variable is significant.
- For Model B, neither interaction term (female and married, or female and years with children in the household) was found to be a significant regressor; the reported probability of both variables is 0. However, both regression coefficients were positive.
c) The mean of the log of wages is greater for males (3.635) than for females (3.510), implying that men tend to earn more on average.
Table 17 Mean Comparison between Independent Variable for Male and Female Workers
Statistic | ln_wages (female) | ln_wages (male)
Mean | 3.509849 | 3.63497342
Standard Error | 0.012946 | 0.017099679
Median | 3.51631 | 3.646754488
Mode | 3.218876 | 3.218875825
Standard Deviation | 0.43903 | 0.510991523
Sample Variance | 0.192747 | 0.261112337
Kurtosis | 0.884741 | 0.19455751
Skewness | 0.185265 | 0.069169781
Range | 3.178218 | 3.155640226
Minimum | 2.029463 | 2.037316625
Maximum | 5.207681 | 5.192956851
Sum | 4036.326 | 3246.031264
Count | 1150 | 893
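The raw gap implied by Table 17 can be worked out directly from the reported means and standard errors. The sketch below computes the difference in mean log wages, an unequal-variance (Welch-type) t-statistic for that difference, and the corresponding percentage gap in geometric-mean wages. It is an unconditional comparison that ignores every control in Models A and B, and the choice of a Welch-type statistic is an assumption, not a test carried out in this report.

```python
import math

# Summary statistics copied from Table 17.
female = {"mean": 3.509849, "se": 0.012946, "n": 1150}
male = {"mean": 3.63497342, "se": 0.017099679, "n": 893}

# Difference in mean log wages (male minus female), in log points.
gap = male["mean"] - female["mean"]

# Standard error of the difference, treating the two groups as independent.
se_gap = math.sqrt(male["se"] ** 2 + female["se"] ** 2)

# Welch-type t-statistic for the null hypothesis of equal means.
t_stat = gap / se_gap

# exp(gap) is the ratio of geometric-mean wages, expressed here in percent.
ratio_pct = (math.exp(gap) - 1.0) * 100.0

print(f"gap in mean ln wage: {gap:.4f} log points")
print(f"t-statistic:         {t_stat:.2f}")
print(f"geometric-mean gap:  {ratio_pct:.1f}%")
```

The gap of roughly 0.125 log points is the raw difference only; whether it survives once occupation, hours and family variables are held constant is what the regression models are meant to address.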
Conclusion
The results are in line with a study by Morgan (2008), who found virtually no differences in pay by gender for recent graduates in the USA. Nevertheless, a gender pay gap exists (Blau & Kahn, 2007; Arulampalam, Booth, & Bryan, 2007). The opposing results may be due to a variety of factors, such as bias in sample selection (only graduates were considered) or the estimation technique.
The OLS estimates given in the table are not reliable, as OLS regression may not be a good technique for panel data. Many of the regressors have probabilities close to zero. Hence, it is important to use more sophisticated and current methods (Schmidheiny, 2016).
References
IHS Global Inc. (2017, October 25). Residual Diagnostics. Retrieved from EViews: https://www.eviews.com/help/helpintro.html#page/content/testing-Residual_Diagnostics.html
Arulampalam, W., Booth, A. L., & Bryan, M. L. (2007). Is There a Glass Ceiling over Europe? Exploring the Gender Pay Gap across the Wage Distribution. ILR Review, 60(2), 163-186.
Australian Bureau Of Statistics. (2017, October 27). Household Income and Wealth, Australia, 2015-16. Retrieved from Australian Bureau Of Statistics: https://abs.gov.au/household-income
Blau, F. D., & Kahn, L. M. (2007). The Gender Pay Gap: Have Women Gone as Far as They Can? Academy of Management Perspectives, 21(1), 58-63.
Lambert, B. (2013, June). Heteroscedasticity: as a symptom of omitted variable bias – part 1. Retrieved from YouTube: https://www.youtube.com/watch?v=sFOtuCKQztc
Lambert, B. (2013, June 3). Heteroskedasticity summary. Retrieved from YouTube: https://www.youtube.com/watch?v=zRklTsY9w9c&t=185s
Lambert, B. (2013, June 18). Interpreting Regression Coefficients in Linear Regression. Retrieved from YouTube.com: https://www.youtube.com/watch?v=JwGaos2Y9bM
Morgan, L. A. (2008). Major Matters: A Comparison of the Within-Major Gender Pay Gap across College Majors for Early-Career Graduates. Industrial Relations: A Journal of Economy and Society, 47(4), 625-650.
Schmidheiny, K. (2016). Panel Data: Fixed and Random Effects. Retrieved from https://www.cantab.net/users/bf100/pdf/pd_slides_fingleton.pdf
Wooldridge, J. M. (2015). Introductory Econometrics: A Modern Approach (6th ed., p. 75). Cengage Learning.