Data Description
The main aim of this project is to analyse the data received from Hawaiian Electric Corporation and to determine whether it is suitable for predicting future changes in the corporation's share price. The corporation conducts various businesses. These include selling electricity and providing related facilities to various locations.
Daily data will be used for this analysis. The relationship between the daily price changes will be measured using multiple regression analysis. The Variance Inflation Factor (VIF), adjusted R², Analysis of Variance (ANOVA) and residual analysis will be discussed.
The data set consists of 382 data points, one for each of 382 trading days. Six measurements are recorded for each day: five are input measurements and one is the output measurement. The last column gives the future change in the company's share price, that is, the change from the close of trading today to the opening of trading tomorrow morning. The other columns contain the changes in price of various financial instruments. These include:
- The price of the same company’s shares
- Interest rates
- Currency exchange rates
- Price of oil
The changes in price have been ranked and scaled to a number between 0 and 1, where
- “1” indicates the highest value
- “0.5” indicates the median value
- “0” indicates the lowest value.
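The ranking described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name rank_scale, the sample values and the treatment of ties (tied values share the average of their ranks) are not taken from the data provider.

```python
def rank_scale(values):
    """Map values to [0, 1] by rank: lowest -> 0, median -> 0.5, highest -> 1.

    Assumption: tied values share the average of their ranks.
    """
    n = len(values)
    if n < 2:
        return [0.5] * n  # a single observation has no spread to rank
    order = sorted(range(n), key=lambda i: values[i])
    scaled = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2  # zero-based average rank of the tie block
        for k in range(start, end + 1):
            scaled[order[k]] = avg_rank / (n - 1)
        start = end + 1
    return scaled


# Hypothetical daily price changes, not taken from the data set.
changes = [-0.4, 0.1, 1.2, 0.0, 0.3]
print(rank_scale(changes))  # [0.0, 0.5, 1.0, 0.25, 0.75]
```

Under this convention a run of identical values in the middle of the data all receive the same score.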
Hawaiian Electric Industries, Inc., through its subsidiaries, engages in the electric utility and banking businesses, primarily in the state of Hawaii. The electric utility segment of the company generates, purchases, transmits, distributes and sells electric energy. It draws on renewable and potential energy sources including wind, solar, photovoltaic, geothermal, hydroelectric, wave, municipal waste, sugarcane waste and biofuels. The electric segment distributes and sells electricity on the islands of Hawaii, Oahu, Lanai, Molokai and Maui. It also serves suburban communities, resorts, installations of the United States armed forces and agricultural operations. The bank segment of the organization offers various accounts such as savings, money market and checking accounts and certificates of deposit. It also provides loans for residential and commercial real estate, mortgages, construction, development, multifamily properties (both residential and commercial real estate) and businesses. Hawaiian Electric Industries has its headquarters in Honolulu, Hawaii, and was founded in 1891.
The Variance Inflation Factor (VIF) test is used to check for multicollinearity among the input variables. Multicollinearity arises when two or more independent variables are highly correlated with one another (García et al., 2015). When multicollinearity is present in the data, model fitting becomes difficult because the individual coefficient estimates become unstable.
Variance Inflation Factor (VIF)
The PHStat add-in for Excel was used to find the VIF for each input variable. A variable is treated as highly correlated with the other inputs if its VIF is greater than 5, and as only weakly correlated if its VIF is less than 5. In this analysis, the VIFs of all the variables were found to be less than 5. Thus, there is little or no correlation among the independent variables.
Thus, none of the input variables is redundant with the others, and none of the columns needs to be removed from the data set on the grounds of multicollinearity.
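For readers who want to reproduce the check outside Excel, the VIF of input j is 1 / (1 - Rj²), where Rj² comes from regressing input j on the remaining inputs. This equals the j-th diagonal entry of the inverse of the correlation matrix of the inputs. The sketch below uses that identity. It is not PHStat's implementation, and the function names and sample columns are hypothetical.

```python
from statistics import mean


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length columns."""
    centered = []
    for col in columns:
        m = mean(col)
        centered.append([x - m for x in col])
    norms = [sum(x * x for x in c) ** 0.5 for c in centered]
    p = len(columns)
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[j]))
            / (norms[i] * norms[j])
            for j in range(p)
        ]
        for i in range(p)
    ]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF of each input: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Hypothetical inputs: two nearly collinear columns give a VIF far above 5.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 2.0, 3.0, 5.0]
print(vif([x1, x2]))
```

Uncorrelated inputs give a VIF of 1, which is the lower bound. The threshold of 5 used in this report is then applied to each value in turn.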
Outliers or non-normal residuals in the data set can be detected readily from a normal probability plot. Figure 1 shows the normal probability plot of the residuals. The plot is mostly linear and shows no outliers. Thus, the hypothesis tests based on this model can be regarded as valid for assessing predictions of the future share price.
The plot shows a small disturbance near 0.5. This arises because on a small number of trading days the share price did not change, so the rank for those days was 0.5. Despite this disturbance, the points continue to follow a straight line, so it makes no practical difference. Since the plot is almost linear, the residuals can be taken to be approximately normally distributed, and the hypothesis tests and predictions can be carried out safely.
Analysis of Variance (ANOVA) determines whether there is any relationship between the independent variables and the dependent variable. A relationship is said to exist if the p-value (Significance F) is less than 0.05. From figure 3, it is clear that the p-value is much less than 0.05. Thus, the null hypothesis that none of the inputs is related to the output is rejected, and there is a relationship between the future share price and one or more of the independent variables.
The ANOVA table has certain limitations. It can only indicate whether there is any relationship between the independent and dependent variables. It says nothing about the strength of that relationship, which may be weak or strong. Also, it shows only that one or more of the five variables is related to the output variable, not which ones. Other measures are therefore taken into consideration in this analysis.
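The idea behind the normal probability plot can be sketched as follows: sort the residuals, pair each with the standard normal quantile expected at its position, and check how close the pairs lie to a straight line. The plotting positions (i + 0.5) / n are one common convention and are an assumption here, as are the function names and the sample residuals. The spreadsheet tool used for figure 1 may compute its plot differently.

```python
from statistics import NormalDist, mean


def normal_plot_points(residuals):
    """Pair each sorted residual with its expected standard normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()
    # Assumed plotting positions: (i + 0.5) / n for i = 0 .. n - 1.
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


def straightness(points):
    """Correlation between the two axes; values near 1 mean a straight plot."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical residuals, not taken from the report's model.
residuals = [-0.21, 0.05, 0.12, -0.08, 0.30, -0.15, 0.02, 0.09, -0.03, -0.11]
print(round(straightness(normal_plot_points(residuals)), 3))
```

A single extreme residual pulls the correlation well below 1, which is how an outlier would show up as a point far from the line.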
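The quantities in a regression ANOVA table can be sketched as below. The sketch assumes an ordinary least-squares fit with an intercept, so that the total sum of squares splits into a regression part and a residual part. The function name and the small sample are hypothetical and are not the report's data.

```python
def regression_anova(y, y_hat, n_inputs):
    """Sums of squares and overall F statistic for a fitted regression.

    Assumes y_hat comes from a least-squares fit that includes an intercept.
    """
    n = len(y)
    y_bar = sum(y) / n
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_regression = ss_total - ss_residual
    df_regression = n_inputs
    df_residual = n - n_inputs - 1
    f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)
    return {
        "ss_regression": ss_regression,
        "ss_residual": ss_residual,
        "ss_total": ss_total,
        "df_regression": df_regression,
        "df_residual": df_residual,
        "f_stat": f_stat,
        "r_squared": ss_regression / ss_total,
    }


# Hypothetical example: y regressed on x = 1..5 gives y_hat = 0.6 + 0.8 * x.
y = [2.0, 1.0, 4.0, 3.0, 5.0]
y_hat = [1.4, 2.2, 3.0, 3.8, 4.6]
table = regression_anova(y, y_hat, n_inputs=1)
print(round(table["f_stat"], 3), round(table["r_squared"], 2))
```

The Significance F reported by Excel is the upper-tail probability of this F statistic under an F distribution with the two degrees of freedom shown. With SciPy installed it could be obtained from scipy.stats.f.sf.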
Residual Analysis
The strength of the relationship can be determined from the coefficient of determination (R²). For a least-squares model with an intercept, the value of R² ranges from 0 to 1. The higher the value of R², the stronger the relationship between the independent and dependent variables (Draper & Smith, 2014). From figure 4, it can be seen that the R² value is 0.121. This means that 0.121 is the proportion of variation that can be explained by the independent variables. Thus, this regression model can explain only 12.1 percent of the variation in Hawaiian Electric price changes. The model is therefore not a good fit, as 87.9 percent of the variation in the prices remains unexplained.
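The adjusted R² mentioned in the data description penalises R² for the number of inputs. As a rough check, it can be computed from the figures quoted in this report, assuming 382 observations and five inputs as described earlier. The value below is derived arithmetic under those assumptions, not a number read from the Excel output.

```python
def adjusted_r_squared(r_squared, n_obs, n_inputs):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_inputs - 1)


# R^2 = 0.121 from the report; n = 382 days and k = 5 inputs are assumed.
adj = adjusted_r_squared(0.121, n_obs=382, n_inputs=5)
print(round(adj, 4))  # 0.1093
```

With this many observations relative to inputs the adjustment is small, so the conclusion about the weak fit is unchanged.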
The previous finding is on the negative side. On the positive side, until now the next-day prices were treated as completely unpredictable. With the help of this model, 12.1 percent of the variation in the prices can be explained. Thus, it can be concluded that the changes in the share price are not entirely random. There is a relationship between the changes, though it may be a very weak one.
The hypothesis testing results show that all the recorded p-values are less than 0.05 (the 5% level of significance, corresponding to 95% confidence). Thus, all the variables are statistically significant for predicting the future share price changes, and none of them should be deleted from the model.
The coefficients column in figure 5 gives the values that best fit the model to the data. These coefficients give an idea of what drives the changes in the share price of Hawaiian Electric.
The largest positive coefficient, 0.7507, belongs to the variable “Year_X_Natural_gas”. This variable represents an interaction effect between the year and the price of natural gas. Its positive coefficient suggests that, as the price of natural gas increases, the share price increases.
The effect of the price of aluminum is less obvious. When the Baltic Dry Index rises, the aluminum price influences the share price, but in combination with the change in the price of Hawaiian Electric, the aluminum price reduces the future share price of Hawaiian Electric. None of the coefficients is close to zero. Thus, the previous claim holds: all the inputs do influence the output variable and none can be deleted.
Using the ANOVA, confidence interval and prediction statistics, the past data have been used to predict the future share prices of Hawaiian Electric.
Conclusion
The multiple linear regression predicts the future changes in the share price of Hawaiian Electric Corporation. The VIF test shows that there is no serious multicollinearity in the data. The residual analysis shows that the residuals are approximately normally distributed, which supports valid hypothesis tests. The coefficient of determination (R²) is very low. The ANOVA test showed that the independent variables are related to the dependent variable.
The predicted and actual values of the future share prices, shown in figure 6, differ from one another. Because the R² value is so low, the model cannot be relied on to comment on changes in the future share price.
References
Draper, N. R., & Smith, H. (2014). Applied regression analysis. John Wiley & Sons.
García, C. B., García, J., López Martín, M. M., & Salmerón, R. (2015). Collinearity: Revisiting the variance inflation factor in ridge regression. Journal of Applied Statistics, 42(3), 648-661.