Scatter Plot
This paper examines how the Market price (Y) of houses changes with the Price index (X1), Annual % change (X2), Total number of square meters (X3), and Age of house in years (X4) by building a regression model. Y is the dependent variable and the other four variables are the predictor variables. The multiple linear regression model is fitted using the Data Analysis ToolPak of MS Excel (Slezák et al., 2014). There are 15 observations, covering the financial years 2002-03 to 2016-17. The output of the regression analysis is discussed in detail below, covering model estimation, confidence intervals, and interpretation.
Analysis for Sydney City
- Graphical representations
The scatter plots are shown below:
The dependent variable Market price is plotted on the vertical axis and each predictor variable is plotted on the horizontal axis. The scatter plots show a positive relationship between the Price index and the Market price of Sydney, and between the Total number of square meters and the Market price, although the latter relationship is weaker. The Annual % change and the Age of house show a negative association with the Market price (Tang & Zhang, 2013).
- Description of the regression model:
Let the multiple linear regression model be defined as,
Y = b0 + b1X1 + b2 X2 + b3X3 + b4X4 + e ;
b0 = the y-intercept
b1 = the partial regression coefficient of X1
b2 = the partial regression coefficient of X2
b3 = the partial regression coefficient of X3
b4 = the partial regression coefficient of X4
e = residual of estimation
The output of the regression analysis is shown in the table below:
Regression Statistics
| Multiple R | 0.88916481 |
| R Square | 0.79061406 |
| Adjusted R Square | 0.70685968 |
| Standard Error | 43.8878261 |
| Observations | 15 |

ANOVA
| | df | SS | MS | F | Significance F |
| Regression | 4 | 72728.5872 | 18182.14679 | 9.43967451 | 0.001993481 |
| Residual | 10 | 19261.4128 | 1926.141284 | | |
| Total | 14 | 91990 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
| Intercept | 548.978108 | 81.13153739 | 6.766519231 | 4.94032E-05 | 368.2057774 | 729.7504386 |
| Sydney price Index | 1.963493894 | 0.583205471 | 3.366727492 | 0.007160758 | 0.664031125 | 3.262956664 |
| Annual % change | -5.622204236 | 3.240109357 | -1.735189655 | 0.113361729 | -12.84161778 | 1.597209306 |
| Total number of square meters | 0.519145629 | 0.3239088 | 1.60275247 | 0.140071458 | -0.202568152 | 1.240859409 |
| Age of house (years) | -2.48786597 | 1.129750872 | -2.2021368 | 0.052251738 | -5.005107781 | 0.029375841 |
Table 1
(Source: As created by the Author)
- Model estimation
The estimated regression equation, obtained from Table 1, is
Ŷ = 548.978 + 1.963X1 - 5.622X2 + 0.519X3 - 2.488X4
The equation shows a linear relationship between the Market price and the four predictor variables (Cameron & Trivedi, 2013). The Annual % change and the Age of house have a negative association with the Market price, whereas the Sydney price index and the Total number of square meters have a positive association.
- Interpretation of the Coefficients
Here, b0 = 548.978, b1 = 1.963, b2 = -5.622, b3 = 0.519, and b4 = -2.488.
The y-intercept b0 indicates that the estimated Market price is 548.978 when all the independent variables are zero. The slope coefficient b1 states that a one unit increase in the Sydney price index increases the Market price by 1.963 units, keeping the other variables constant. The coefficient b2 means that a one unit increase in the Annual % change decreases the Market price by 5.622 units, keeping the other variables constant. The value of b3 indicates a 0.519 unit increase in the Market price for a one unit increase in X3, keeping X1, X2, and X4 constant. A one unit increase in X4 decreases the Y value by 2.488 units if the other variables are constant (Hinton, 2014).
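The "keeping other variables constant" reading of the coefficients can be illustrated with a short sketch. The coefficients are the rounded Table 1 values; the house characteristics passed in below are hypothetical inputs chosen only for illustration, not observations from the data set.

```python
# Rounded Sydney coefficients from Table 1.
COEF = {
    "intercept": 548.978,
    "price_index": 1.963,
    "annual_change": -5.622,
    "square_meters": 0.519,
    "age": -2.488,
}


def predict_market_price(price_index, annual_change, square_meters, age):
    """Evaluate the fitted multiple regression equation for one house."""
    return (
        COEF["intercept"]
        + COEF["price_index"] * price_index
        + COEF["annual_change"] * annual_change
        + COEF["square_meters"] * square_meters
        + COEF["age"] * age
    )


# Hypothetical house (illustrative values, not taken from the data).
base = predict_market_price(100.0, 5.0, 400.0, 10.0)

# Raise only the price index by one unit; everything else stays fixed.
bumped = predict_market_price(101.0, 5.0, 400.0, 10.0)

# The difference equals the partial regression coefficient of X1.
effect_of_price_index = bumped - base
print(round(effect_of_price_index, 3))
```

Changing any other single argument by one unit reproduces the corresponding coefficient in the same way.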
- R-square
The coefficient of determination R2 is 0.790614058, which indicates a good fit, since a higher value of R2 means that the regression model fits the data better.
Here, 79.06% of the variability of the Market price is explained by the independent variables.
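The fit statistics reported by Excel can be reproduced from the ANOVA part of Table 1. The sketch below assumes only the standard definitions (R2 = SS regression / SS total, F = MS regression / MS residual, standard error = square root of MS residual); the variable names are illustrative.

```python
import math

# Values copied from the Sydney ANOVA table (Table 1).
n_observations = 15
n_predictors = 4
ss_regression = 72728.5872
ss_residual = 19261.4128

ss_total = ss_regression + ss_residual

# Share of the total variation explained by the regression.
r_squared = ss_regression / ss_total

# Mean squares: sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / n_predictors
ms_residual = ss_residual / (n_observations - n_predictors - 1)

# Overall F statistic and the standard error of the estimate.
f_statistic = ms_regression / ms_residual
standard_error = math.sqrt(ms_residual)

print(round(r_squared, 6), round(f_statistic, 4), round(standard_error, 4))
```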
- Confidence intervals (CI)
The 95% CI for the regression coefficient of each of the independent variables can be evaluated using the formula bi ± t × SE(bi), i = 1 to 4, where SE(bi) = standard error of bi, and t = critical value of the t-statistic at the 5% significance level with n - k - 1 = 10 degrees of freedom (Zou, 2013). The values are already obtained in the Excel sheet.
95% CI for the coefficient of X1 = (0.664031125, 3.262956664); the researcher is 95% confident that the regression coefficient of the Sydney price index lies between 0.664031125 and 3.262956664. The interval excludes zero, so this coefficient is significant at the 5% level.
95% CI for the coefficient of X2 = (-12.84161778, 1.597209306); the coefficient of the Annual % change lies between the lower bound -12.84161778 and the upper bound 1.597209306 with 95% confidence. The interval contains zero.
95% CI for the coefficient of X3 = (-0.202568152, 1.240859409); the coefficient of X3 lies within the interval with lower bound -0.202568152 and upper bound 1.240859409 with 95% confidence. The interval contains zero.
95% CI for the coefficient of X4 = (-5.005107781, 0.029375841); like the above three, the coefficient of X4 lies within the calculated CI with 95% confidence. The interval narrowly contains zero.
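As a check on the Excel bounds, the intervals can be recomputed from the coefficients and standard errors in Table 1. This is a minimal sketch; the critical value 2.228139 (two-sided 5% point of the t distribution with 10 degrees of freedom) is taken from a standard t table and is an assumption here, not a figure from the Excel output.

```python
# Two-sided 5% critical value of t with 10 d.f. (assumed, from a t table).
T_CRIT = 2.228139

# (coefficient, standard error) pairs copied from Table 1 (Sydney).
sydney = {
    "Sydney price Index": (1.963493894, 0.583205471),
    "Annual % change": (-5.622204236, 3.240109357),
    "Total number of square meters": (0.519145629, 0.3239088),
    "Age of house (years)": (-2.48786597, 1.129750872),
}


def confidence_interval(coef, std_err, t_crit=T_CRIT):
    """Return (lower, upper) for coef +/- t_crit * std_err."""
    margin = t_crit * std_err
    return coef - margin, coef + margin


intervals = {name: confidence_interval(b, se) for name, (b, se) in sydney.items()}

# A coefficient is significant at the 5% level when its interval excludes zero.
excludes_zero = [name for name, (lo, hi) in intervals.items() if lo > 0 or hi < 0]

for name, (lo, hi) in intervals.items():
    print(f"{name}: ({lo:.6f}, {hi:.6f})")
print("Interval excludes zero:", excludes_zero)
```

Under these inputs only the price index interval excludes zero, which agrees with the P-value column of Table 1.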
The simple linear regression model, with the Total number of square meters (X3) as the only predictor, is shown below:
The regression analysis gives R2 = 0.098, which implies that the model is not a good fit. The estimated regression equation is
Ŷ = 659.143 + 0.5636X3
The R2 value of the original multiple regression model is 0.790614058, which is higher than that of the re-estimated model (0.098125419). Thus, the multiple linear regression model is a better fit for the Market price than the simple linear regression model. A greater percentage (79.06%) of the variability in the dependent variable is explained by all four predictor variables than by "Total number of square meters" alone (9.81%). Because R2 cannot decrease when predictors are added, the Adjusted R Square of the multiple model (0.70685968) is the fairer figure for this comparison, and it is also far above 0.098.
If Square meters = 400, then the estimated Market price = 659.143 + (0.5636 × 400) = 884.583, that is, $884,583.
Analysis for Brisbane City
The output of the regression analysis for the Market price of Brisbane is shown in the table below:
Regression Statistics
| Multiple R | 0.847362 |
| R Square | 0.7180224 |
| Adjusted R Square | 0.6052313 |
| Standard Error | 12.573407 |
| Observations | 15 |

ANOVA
| | df | SS | MS | F | Significance F |
| Regression | 4 | 4025.588 | 1006.397 | 6.365952 | 0.008182698 |
| Residual | 10 | 1580.906 | 158.0906 | | |
| Total | 14 | 5606.493 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
| Intercept | 89.871245 | 16.91237 | 5.313936 | 0.000341 | 52.18814276 | 127.5543 |
| Brisbane Price Index | -0.5082788 | 0.47905 | -1.06101 | 0.313636 | -1.575668349 | 0.559111 |
| Annual % change | 1.4620069 | 1.016153 | 1.438766 | 0.180771 | -0.802123664 | 3.726137 |
| Total number of square meters | 0.056465 | 0.094729 | 0.59607 | 0.564376 | -0.154603966 | 0.267534 |
| Age of house (years) | -0.7797538 | 0.348462 | -2.2377 | 0.049196 | -1.556176484 | -0.00333 |
Table 2
(Source: As created by the Author)
The estimated regression equation, obtained from the above table, is
Ŷ = 89.871 - 0.508X1 + 1.462X2 + 0.056X3 - 0.780X4
The equation shows that there is a negative linear relationship of the Market price with the Brisbane price index and the Age of house. However, the relationship is positive for the Total number of square meters and the Annual % change (Cohen, West & Aiken, 2014).
The y-intercept indicates that the estimated Market price is 89.871 when all the independent variables are zero. The Market price increases by 1.462 units for a one unit increase in X2, keeping the other variables constant. Similarly, it increases by 0.056 units for a one unit increase in X3. A one unit increase in the Brisbane price index corresponds to a 0.508 unit decrease in the Market price. As the regression coefficient of X4 is negative, a one unit increase in X4 corresponds to a 0.78 unit decrease in Y.
The coefficient of determination R2 is 0.718, which suggests that the regression model is a good fit for the given predictor variables. Approximately 71.8% of the variation in the Market price is explained by the four independent variables.
95% CI for the coefficient of X1 = (-1.575668349, 0.559110736); with 95% confidence, the regression coefficient of the Brisbane Price index lies between -1.575668349 and 0.559110736. The interval contains zero.
95% CI for the coefficient of X2 = (-0.802123664, 3.726137411); the coefficient of the Annual % change lies between the lower bound -0.802123664 and the upper bound 3.726137411 with 95% confidence. The interval contains zero.
95% CI for the coefficient of X3 = (-0.154603966, 0.267533947); the coefficient of X3 lies within the given interval with 95% confidence. The interval contains zero.
95% CI for the coefficient of X4 = (-1.556176484, -0.003331143); these are the lower and upper bounds within which the coefficient of X4 lies with 95% confidence. The interval excludes zero, so the Age of house is significant at the 5% level.
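The same conclusion can be reached from the t statistics, since an interval excludes zero exactly when |t| exceeds the critical value. The sketch below recomputes t = coefficient / standard error from Table 2; as before, the critical value 2.228139 for 10 degrees of freedom comes from a standard t table and is an assumption rather than part of the Excel output.

```python
# Two-sided 5% critical value of t with 10 d.f. (assumed, from a t table).
T_CRIT = 2.228139

# (coefficient, standard error) pairs copied from Table 2 (Brisbane).
brisbane = {
    "Brisbane Price Index": (-0.5082788, 0.47905),
    "Annual % change": (1.4620069, 1.016153),
    "Total number of square meters": (0.056465, 0.094729),
    "Age of house (years)": (-0.7797538, 0.348462),
}

# t statistic for the null hypothesis that the coefficient is zero.
t_stats = {name: coef / se for name, (coef, se) in brisbane.items()}

# Predictors whose |t| exceeds the critical value are significant at 5%.
significant = [name for name, t in t_stats.items() if abs(t) > T_CRIT]

for name, t in t_stats.items():
    print(f"{name}: t = {t:.4f}")
print("Significant at the 5% level:", significant)
```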
The simple linear regression model, with the Total number of square meters (X3) as the only predictor, has an R2 value of 0.097, which is small and indicates a very poor fit. Only 9.73% of the variation in the Market price is explained by the Total number of square meters.
The estimated regression equation is
Ŷ = 65.988 + 0.139X3
The R2 value of the original multiple regression model is 0.718022376, which is higher than that of the re-estimated model (0.097334455). Thus, the multiple linear regression model is a better fit for the Market price than the simple linear regression model.
If Square meters = 400, then the estimated Market price = 65.988 + (0.139 × 400) = 121.588, or $121,588.
Analysis for Melbourne City
The output of the regression analysis for the Market price of Melbourne is shown in the table below:
Regression Statistics
| Multiple R | 0.801295 |
| R Square | 0.642074 |
| Adjusted R Square | 0.498904 |
| Standard Error | 19.79055 |
| Observations | 15 |

ANOVA
| | df | SS | MS | F | Significance F |
| Regression | 4 | 7025.999 | 1756.5 | 4.484691 | 0.0247334 |
| Residual | 10 | 3916.658 | 391.6658 | | |
| Total | 14 | 10942.66 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
| Intercept | 82.70826 | 26.59056 | 3.110437 | 0.011052 | 23.460804 | 141.9557 |
| Melbourne Price Index | -0.67601 | 0.927016 | -0.72923 | 0.482589 | -2.741533 | 1.389509 |
| Annual % change | 1.851263 | 1.501679 | 1.232796 | 0.245847 | -1.494686 | 5.197213 |
| Total number of square meters | 0.116315 | 0.145828 | 0.79762 | 0.443618 | -0.20861 | 0.44124 |
| Age of house (years) | -1.24947 | 0.451735 | -2.76594 | 0.019926 | -2.255999 | -0.24294 |
Table 3
(Source: As created by the Author)
The estimated regression equation, obtained from the above table, is
Ŷ = 82.708 - 0.676X1 + 1.851X2 + 0.116X3 - 1.249X4
The equation shows that there is a negative linear relationship of the Market price with the Melbourne price index and the Age of house. However, the relationship is positive for the Total number of square meters and the Annual % change.
Here, b0 = 82.708, b1 = -0.676, b2 = 1.851, b3 = 0.116, and b4 = -1.249.
The y-intercept indicates that the estimated Market price is 82.708 when all Xi's are zero. The Market price decreases by 0.676 units for a one unit increase in X1, keeping the other variables constant. A decrease of 1.24947 units in the Market price occurs for a one unit increase in X4. A one unit increase in X2 corresponds to an increase of 1.851 units in Y, and a one unit increase in X3 corresponds to an increase of 0.116 units in Y (Bates et al., 2014).
The coefficient of determination R2 is 0.642074, which suggests a moderately good fit of the regression model for the given predictor variables. 64.21% of the variation in the Market price is explained by the four independent variables (Nimon & Oswald, 2013).
95% CI for the coefficient of X1 = (-2.741533069, 1.389508577); with 95% confidence, the regression coefficient of the Melbourne Price index lies between -2.741533069 and 1.389508577. The interval contains zero.
95% CI for the coefficient of X2 = (-1.494686069, 5.19721276); the coefficient of the Annual % change lies between the lower bound -1.494686069 and the upper bound 5.19721276 with 95% confidence. The interval contains zero.
95% CI for the coefficient of X3 = (-0.208609648, 0.441240343); the coefficient of X3 lies within the given interval with 95% confidence. The interval contains zero.
95% CI for the coefficient of X4 = (-2.255998646, -0.242941926); these are the lower and upper bounds within which the coefficient of X4 lies with 95% confidence. The interval excludes zero, so the Age of house is significant at the 5% level.
The simple linear regression model, with the Total number of square meters (X3) as the only predictor, has an R2 value of 0.108.
The estimated regression equation is
Ŷ = 50.092 + 0.204X3
The R2 value of the original multiple regression model is 0.642074, which is higher than that of the re-estimated model (0.108085). Thus, the multiple linear regression model is a better fit for the Market price than the simple linear regression model.
If Square meters = 400, then the estimated Market price = 50.09162 + (0.204012 × 400) = 131.69642, or $131,696.
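The three simple-regression predictions for a 400 square meter house can be collected in one place. This sketch uses the intercepts and slopes quoted in the text; the conversion to dollars assumes that Market price is recorded in thousands of dollars, which is what the conversions in the text imply.

```python
# (intercept, slope on square meters) of the simple regressions in the text.
simple_models = {
    "Sydney": (659.143, 0.5636),
    "Brisbane": (65.988, 0.139),
    "Melbourne": (50.09162, 0.204012),
}


def predict(city, square_meters):
    """Estimated Market price from the one-predictor model for a city."""
    intercept, slope = simple_models[city]
    return intercept + slope * square_meters


predictions = {city: predict(city, 400) for city in simple_models}

# Assumption: Market price is in thousands of dollars.
in_dollars = {city: round(value * 1000) for city, value in predictions.items()}

for city in simple_models:
    print(f"{city}: {predictions[city]:.3f} (about ${in_dollars[city]:,})")
```

Given the low R2 of each one-predictor model, these point estimates should be read with caution.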
Conclusion
From the above multiple linear regression analyses of the Market price of houses in three Australian cities, Sydney, Brisbane, and Melbourne, it can be concluded that the four chosen independent variables provide a good fit for the dependent variable Market price in all three cities, as the coefficients of determination are all above 0.5. On the other hand, the re-estimated simple linear regression model, in which the Total number of square meters is the only predictor, does not explain the variability in the Market price well. Therefore, the multiple linear regression model is the recommended model for predicting the Market price.
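The comparison behind this conclusion can be tabulated with a short sketch. The R2 values are those quoted in the text; the adjusted R2 uses the standard formula 1 - (1 - R2)(n - 1)/(n - k - 1), with n = 15 observations and k predictors, and the names below are illustrative.

```python
N_OBS = 15

# R2 of the four-predictor and one-predictor models, as quoted in the text.
fits = {
    "Sydney": {"multiple": 0.790614058, "simple": 0.098125419},
    "Brisbane": {"multiple": 0.718022376, "simple": 0.097334455},
    "Melbourne": {"multiple": 0.642074, "simple": 0.108085},
}


def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R2, which penalizes the number of predictors."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


summary = {}
for city, r2 in fits.items():
    summary[city] = {
        "adj_multiple": adjusted_r2(r2["multiple"], N_OBS, 4),
        "adj_simple": adjusted_r2(r2["simple"], N_OBS, 1),
    }

for city, row in summary.items():
    print(f"{city}: adjusted R2 {row['adj_multiple']:.4f} (multiple) "
          f"vs {row['adj_simple']:.4f} (simple)")
```

On these figures the multiple model remains ahead of the simple model in every city after adjustment, although the adjusted R2 for Melbourne falls just under 0.5, matching the Adjusted R Square of 0.498904 in Table 3.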
References
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2014). lme4: Linear mixed-effects models using Eigen and S4. R package version, 1(7), 1-23.
Cameron, A. C., & Trivedi, P. K. (2013). Regression analysis of count data (Vol. 53). Cambridge University Press.
Cohen, P., West, S. G., & Aiken, L. S. (2014). Applied multiple regression/correlation analysis for the behavioral sciences. Psychology Press.
Hinton, P. R. (2014). Statistics explained. Routledge.
Nimon, K. F., & Oswald, F. L. (2013). Understanding the results of multiple linear regression: Beyond standardized regression coefficients. Organizational Research Methods, 16(4), 650-674.
Slezák, P., Bokes, P., Pavol, N. Á., & Waczulíková, I. (2014). Microsoft Excel add-in for the statistical analysis of contingency tables. International Journal for Innovation Education and Research, 2(5), 90-100.
Tang, Q. Y., & Zhang, C. X. (2013). Data Processing System (DPS) software with experimental design, statistical analysis and data mining developed for use in entomological research. Insect Science, 20(2), 254-260.
Zou, G. Y. (2013). Confidence interval estimation for the Bland–Altman limits of agreement with multiple observations per individual. Statistical Methods in Medical Research, 22(6), 630-642.