Report objectives
The real estate industry has expanded significantly over time. As a result, estimating house prices has become a challenge, and many companies either overquote or underprice their houses (Laura, et al., 2017). There is therefore a need for a stable method of pricing houses. While companies may set prices of their own, it is important that pricing decisions be data-driven and informed by empirical studies of house prices (Laura, et al., 2017). In this study, historical data for the Sky Heights company on house prices, transaction dates, house age, distance to the market, the number of convenience stores around the house, and the house latitude and longitude were used to establish the main influencers of house prices.
Sky Heights is a construction company headquartered in Brooklyn, New York City, in the United States of America. Over the past 13 years, the firm has grown into a giant in the construction of real estate products, especially bungalows (Nasarab, 2022). The firm’s rapid growth has been attributed to its interaction with clients and its speed in construction and final product delivery. The firm’s current goal is to emerge on top of the ever-growing and competitive real estate industry, both in terms of the structures it deals in and its overall revenue (Nasarab, 2022). The structures constructed by Sky Heights include single-family houses, general buildings, and residential buildings. In its vision, the firm wishes to expand its range of structures and is interested in venturing into modern housing and technology.
Specifically, the company is interested in venturing into green building technology at an affordable construction and sales cost, which makes it even more important to identify the significant factors affecting house prices and to compare them with what clients are willing to pay and with the current market price (Nasarab, 2022). While the company’s current projects are not stated, the firm prides itself on strong delivery, as observed in PS 234, MTA Whitestone Bridge, Gantry Plaza and Urban houses, all in New York City; these structures have demonstrated the quality of the Sky Heights company. The company usually records a revenue of less than $5,000,000 annually, with each of its 25 employees accounting for close to $85,000 within a year (Nasarab, 2022). Once expenses are deducted, this level of revenue results in a low overall profit, which is most likely influenced by house prices among other factors (Laura, et al., 2017). For instance, the price of a house influences both the potential buyer’s decision to purchase and the profit margin in general. Highly valued houses are only sold if their quality matches the quoted price; otherwise, potential buyers will opt for alternatives (Laura, et al., 2017). Such houses include those with many bedrooms, good accessibility, a sociable neighborhood, and a high level of security. This implies that setting a house price that appeals to potential buyers and is competitive can only be achieved by including the many significant factors that affect overall house revenue and the resulting profit (Ira & Sampurna, 2018). Therefore, in an attempt to improve profit, revenue, and overall sales, the company desires a rigorous model that can be used to establish house prices, which is the purpose of this study (Nasarab, 2022). This was tested using both simple and multiple linear regression models.
This kind of modeling is key in determining whether a linear relationship exists between variables and how well the resulting model fits (Ira & Sampurna, 2018). It is thus possible to determine whether the model is appropriate for setting house prices. The study objectives are stated below.
- To estimate the housing prices using the historical dataset available for the assignment. The use of historical data is a key component of data-driven decisions. From the historical data on house prices, diagnostic, predictive and prescriptive analytics are applied to determine the factors affecting house prices for the Sky Heights company and the extent to which each factor affects them. The significant variables can then be used to establish a reasonably accurate house pricing model, which is the reason for this objective. In particular, the transaction date, house age, closeness to the market, number of convenience stores, longitude and latitude are tested for the significance of their influence on house prices. The variables that prove to be significant are used to build a final model that predicts a house price that is healthy for the company and the wider real estate sector.
- To develop a regression model to estimate the house price of unit area. While several analytical methods can be used to predict house prices, the choice of method is key to arriving at an accurate model. Regression analysis has proven to be a key method because it gives not only the model equation but also the significance of the variable(s), the percentage of variation explained, and the correlation value. Therefore, this objective tests the goodness of fit of the regression model for house pricing. Specifically, a model that explains more than 50% of the variation in house prices is considered good.
Data analysis
The first step involved performing six simple linear regressions to determine the influence of transaction date, house age, distance to the nearest market, number of convenience stores, latitude, and longitude. These are discussed below.
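The simple regressions described above can be sketched as follows. This is a minimal illustration of ordinary least squares with one predictor, not the software used in the study; the function name and the toy data are hypothetical and do not come from the Sky Heights dataset.

```python
from statistics import mean


def simple_linear_regression(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R-squared compares residual variation with total variation.
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - ss_resid / ss_total
    # F statistic on (1, n - 2) degrees of freedom for a single predictor.
    f_stat = r_squared / (1 - r_squared) * (n - 2)
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": r_squared,
        "f_stat": f_stat,
        "df": (1, n - 2),
    }


# Hypothetical toy data (assumption): house age in years and price of unit area.
house_age = [2.0, 5.0, 10.0, 15.0, 20.0, 30.0]
price = [48.0, 45.5, 41.0, 39.5, 36.0, 30.0]

fit = simple_linear_regression(house_age, price)
print(fit)
```

The same function would be called once per predictor, which mirrors the six separate regressions reported in this section.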
Transaction date vs house prices
The relationship was first visualized using a scatter plot and this is displayed below.
The graph displayed a weak positive relationship between house prices and transaction dates. The significance of this relationship was tested using simple linear regression and this resulted in the table below.
The results show that the transaction date was not a significant predictor at the 5% level of significance (F (1, 412) = 3.18, p = .08). This variable explained only 0.77% (R-squared = 0.007655) of the house price variation, which is very small and confirms the insignificant relationship.
House age vs house prices
The relationship was first visualized using a scatter plot and this is displayed below.
The graph displayed a negative relationship between house prices and house age, implying that older houses were cheaper whereas newer ones were more expensive. The significance of this relationship was tested using simple linear regression and this resulted in the table below.
The results show that house age was a significant predictor at the 5% level of significance (F (1, 412) = 19.11, p < .001). This variable explained 4.43% (R-squared = 0.04434) of the house price variation, which is small.
Distance to the nearest market vs house prices
The relationship was first visualized using a scatter plot and this is displayed below.
The graph displayed a negative relationship between house prices and the distance to the nearest market, implying that houses closer to the market recorded higher prices whereas those far away from the market were cheaper. The significance of this relationship was tested using simple linear regression and this resulted in the table below.
The results show that the distance from the nearest market was a significant predictor at the 5% level of significance (F (1, 412) = 342.2, p < .001). This variable explained 45.38% (R-squared = 0.4538) of the house price variation, which is moderate and the highest of the six simple regressions.
Number of convenience stores vs house prices
The relationship was first visualized using a scatter plot and this is displayed below.
The graph displayed a positive relationship between house prices and the number of convenience stores around the house, implying that houses with more convenience stores nearby recorded higher prices than those with fewer stores nearby. The significance of this relationship was tested using simple linear regression and this resulted in the table below.
The results show that the number of convenience stores close to the house was a significant predictor at the 5% level of significance (F (1, 412) = 199.3, p < .001). This variable explained 32.60% (R-squared = 0.326) of the house price variation, which is modest and considered fair.
Latitude vs house prices
The graph displayed a positive relationship between house prices and the house latitude, implying that houses at higher latitudes recorded higher prices while houses at lower latitudes, closer to the equator, recorded lower prices. The significance of this relationship was tested using simple linear regression and this resulted in the table below.
The results show that the latitude of the house was a significant predictor at the 5% level of significance (F (1, 412) = 175.3, p < .001). This variable explained 29.85% (R-squared = 0.2985) of the house price variation, which is below average and considered fair.
Longitude vs house prices
The graph displayed a positive relationship between house prices and the house longitude, implying that houses at higher longitudes recorded higher prices while houses at lower longitudes, closer to the Greenwich Meridian, recorded lower prices. The significance of this relationship was tested using simple linear regression and this resulted in the table below.
The results show that the longitude of the house was a significant predictor at the 5% level of significance (F (1, 412) = 155.4, p < .001). This variable explained 27.38% (R-squared = 0.2738) of the house price variation, which is below average and considered fair.
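The F statistics and R-squared values reported for the six simple regressions can be cross-checked against each other, because in a linear regression with an intercept R-squared equals (F × df1) / (F × df1 + df2). The sketch below applies that identity to the figures quoted in this section; the function and dictionary names are illustrative only.

```python
def r_squared_from_f(f_stat, df_model, df_resid):
    """Recover R-squared from an overall F statistic and its degrees of freedom."""
    return (f_stat * df_model) / (f_stat * df_model + df_resid)


# (F statistic, reported R-squared) as quoted in the text, each with df = (1, 412).
reported = {
    "transaction date": (3.18, 0.007655),
    "house age": (19.11, 0.04434),
    "distance to market": (342.2, 0.4538),
    "convenience stores": (199.3, 0.326),
    "latitude": (175.3, 0.2985),
    "longitude": (155.4, 0.2738),
}

for name, (f_stat, r2) in reported.items():
    implied = r_squared_from_f(f_stat, 1, 412)
    print(f"{name}: reported R-squared {r2:.4f}, implied by F {implied:.4f}")
```

Small differences in the last decimal place are expected because the F statistics are rounded in the text.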
Multiple linear regression (age, market distance, convenience stores, latitude, and longitude)
Using the significant predictors from the simple linear regression analyses above, a multiple linear regression analysis was performed to establish the joint influence of the variables on house prices.
The results showed a significant joint influence of house age, distance to the market, number of convenience stores, latitude, and longitude on house prices (F (5, 408) = 108.7, p < .001), leading to the conclusion that the variables jointly predict house prices. Together the variables explained 57.12% (R-squared = 0.5712) of the variation in house prices, which is higher than in any of the six simple linear regressions and shows that combining the variables improves the model. Individually, house age (t (408) = -6.90, p < .001), distance to the nearest market (t (408) = -5.89, p < .001), number of convenience stores (t (408) = 6.11, p < .001) and latitude (t (408) = 5.29, p < .001) were significant predictors of house prices, whereas longitude (t (408) = -0.16, p = .87) was not. The resulting model equation is,
Price (Y) = -4946 – 0.2689 (Age) – 0.004259 (Market) + 1.163 (stores) + 237.8 (Latitude) – 7.805 (Longitude)
The regression equation implies that, holding the other factors constant, a one-unit increase in house age is associated with a 0.2689 decline in house price. Similarly, a one-unit increase in the distance to the market is associated with a 0.004259 decline in house price. A one-unit increase in the number of convenience stores and in latitude is associated with a rise of 1.163 and 237.8 in house price respectively. Finally, a one-unit increase in longitude is associated with a 7.805 decline in house price.
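A multiple linear regression of this kind can be fitted by solving the normal equations (X'X)b = X'y. The sketch below is a minimal pure-Python illustration under stated assumptions: the helper names are hypothetical, the toy data are invented rather than taken from the study, and only two predictors are used for brevity where the study uses five.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations act on both sides at once.
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return [intercept, b1, ..., bk] that minimise the squared error."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 gives the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical toy data (assumption): (house age, number of convenience stores).
rows = [(5, 3), (10, 1), (2, 7), (20, 4), (15, 9), (8, 0)]
# Prices generated from a known rule so the fit can be checked by eye.
y = [40.0 - 0.3 * age + 1.2 * stores for age, stores in rows]

coefficients = fit_ols(rows, y)
print([round(c, 4) for c in coefficients])
```

In practice a statistics package would also report the standard errors, t statistics and p-values quoted above; this sketch only recovers the coefficients.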
Assuming that a house is 5 years old, at a distance of 50 from the market, close to 24 convenience stores, and located at latitude 25 and longitude 121.5, the house price can be determined as follows.
Price (Y) = -4946 – 0.2689 (5) – 0.004259 (50) + 1.163 (24) + 237.8 (25.00) – 7.805 (121.5) = 77.05
Assuming that the house age is increased to 10 years, the distance to the market is increased to 70, and the number of convenience stores is changed to 15 while the latitude and longitude are held constant, the new house price becomes,
Price (Y) = -4946 – 0.2689 (10) – 0.004259 (70) + 1.163 (15) + 237.8 (25.00) – 7.805 (121.5) = 65.15
This shows that the fitted model equation can be used to determine house prices.
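The worked examples above amount to evaluating the fitted equation at given inputs. A small sketch of that calculation follows, using the rounded coefficients quoted in the model equation; the function and variable names are hypothetical, and the two scenarios use 24 and 15 convenience stores as stated in the text.

```python
# Rounded coefficients as quoted in the model equation.
COEFFICIENTS = {
    "intercept": -4946.0,
    "age": -0.2689,
    "market_distance": -0.004259,
    "stores": 1.163,
    "latitude": 237.8,
    "longitude": -7.805,
}


def predict_price(age, market_distance, stores, latitude, longitude):
    """Evaluate the fitted linear equation for one house."""
    c = COEFFICIENTS
    return (
        c["intercept"]
        + c["age"] * age
        + c["market_distance"] * market_distance
        + c["stores"] * stores
        + c["latitude"] * latitude
        + c["longitude"] * longitude
    )


# First scenario: 5 years old, distance 50, 24 stores, latitude 25, longitude 121.5.
first = predict_price(5, 50, 24, 25.0, 121.5)
# Second scenario: 10 years old, distance 70, 15 stores, same location.
second = predict_price(10, 70, 15, 25.0, 121.5)

print(round(first, 2), round(second, 2))
```

Because the intercept and the latitude coefficient are large and rounded, predictions from this rounded equation may differ by a unit or so from those of the unrounded fitted model.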
Conclusion
The study sought to establish the significant influencers of house prices with regard to the date of transaction, house age, distance to the nearest market, number of convenience stores, latitude, and longitude. Of these variables, every factor except the transaction date significantly predicted house prices for the Sky Heights company when tested individually. Combining the significant variables into a multiple linear regression resulted in a stronger model in which longitude became insignificant, a result that was likely influenced by the presence of confounders. It can therefore be concluded that, when setting the price of its constructed houses, the Sky Heights construction company should consider age, distance to the nearest market, longitude, latitude, and the availability of convenience stores as substantial factors. The results of this study can be applied not only to identify significant influencers of house prices but also to set up more detailed research into the major drivers of house prices, which is key to data-driven business decisions.
From the study results, the following recommendations are made regarding house prices for the Sky Heights construction company.
While establishing the factors affecting the house price, the company management should:
- Take into consideration the house age as a significant factor in the house price. In most cases, new houses command higher prices than old houses. It is therefore recommended that this trend be considered when defining house prices.
- Consider the distance to the nearest market while setting house prices. The regression results show that prices fall as the distance to the market increases, so houses close to the market can command higher prices than those far away. Market places are associated with noise and possible insecurity, and some buyers who prefer a peaceful life may still favor quieter areas, but in this dataset proximity to the market was associated with higher rather than lower prices.
- Take into consideration the number of convenience stores while determining house prices. In most instances, houses close to convenience stores are more expensive because residents’ needs are readily met, unlike houses that are far away from convenience stores.
- Consider both latitude and longitude in defining house prices. Since latitude influences climate, it is a key component of individuals’ choice of residential area. House pricing should therefore take these factors into consideration.
- Develop a multiple linear regression model that involves many significant influencers of house prices, say 10 variables including some that are not covered in this study, and use it to set house prices rather than relying on company views alone. This will promote data-driven decisions for the company, which are more reliable and accurate.
References
Ira, S. & Sampurna, K., 2018. Linear Regression Model to Identify the Factors Associated with Carbon Stock in Chure Forest of Nepal. Scientifica.
Laura, T., Loreta, K. & Jurga, N., 2017. Determinants of Housing Market Fluctuations: Case Study of Lithuania. Procedia Engineering, 172(2017), pp. 1169-1175.
Nasarab, A., 2022. Sky Heights Construction Corp. Construction Journal.