Subject 1:
- The process starts with a description of the collinearity among the independent variables. The independent variables and the correlations between them are shown here:
| | WAGES | KCAPITAL | Labor | D1 | D2 |
|---|---|---|---|---|---|
| WAGES | 1 | | | | |
| KCAPITAL | 0.905554 | 1 | | | |
| Labor | 0.564246 | 0.250203 | 1 | | |
| D1 | 0.025988 | 0.028247 | -0.02952 | 1 | |
| D2 | 0.028428 | -0.02534 | 0.073159 | 0.06072 | 1 |
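A minimal sketch of how this matrix can be screened for collinear pairs, assuming a cut-off of 0.8 on the absolute correlation. The function name `collinear_pairs` and the list layout are illustrative choices, not part of the original analysis; the numbers are copied from the table above.

```python
# Lower triangle of the correlation matrix reported above.
NAMES = ["WAGES", "KCAPITAL", "Labor", "D1", "D2"]
LOWER = [
    [1.0],
    [0.905554, 1.0],
    [0.564246, 0.250203, 1.0],
    [0.025988, 0.028247, -0.02952, 1.0],
    [0.028428, -0.02534, 0.073159, 0.06072, 1.0],
]


def collinear_pairs(names, lower, cutoff=0.8):
    """Return (name_a, name_b, r) for each off-diagonal pair with |r| > cutoff."""
    flagged = []
    for i, row in enumerate(lower):
        # row[:i] leaves out the diagonal entry row[i], which is always 1.
        for j, r in enumerate(row[:i]):
            if abs(r) > cutoff:
                flagged.append((names[j], names[i], r))
    return flagged


pairs = collinear_pairs(NAMES, LOWER)
print(pairs)
```

Only one pair is flagged for this matrix, so removing either member of that pair clears the 0.8 rule.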
The correlation between WAGES and KCAPITAL (0.905554) is greater than 0.8. Therefore, WAGES has to be removed from the dataset, and the rest of the variables are not dangerously correlated. The regression of the dependent variable on the remaining four independent variables is given below:
Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.82712 |
| R Square | 0.684128 |
| Adjusted R Square | 0.681428 |
| Standard Error | 17644.38 |
| Observations | 473 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 4 | 3.16E+11 | 7.89E+10 | 253.403 | 1.2E-115 |
| Residual | 468 | 1.46E+11 | 3.11E+08 | | |
| Total | 472 | 4.61E+11 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 99.0% | Upper 99.0% |
|---|---|---|---|---|---|---|---|---|
| Intercept | -518.847 | 1524.07 | -0.34044 | 0.733682 | -3513.71 | 2476.02 | -4460.66 | 3422.97 |
| X Variable 1 | 0.74864 | 0.026659 | 28.08157 | 2E-102 | 0.696253 | 0.801027 | 0.679689 | 0.817591 |
| X Variable 2 | 147.2564 | 21.67842 | 6.792765 | 3.35E-11 | 104.6573 | 189.8555 | 91.1879 | 203.325 |
| X Variable 3 | 842.2054 | 1694.082 | 0.497145 | 0.61932 | -2486.74 | 4171.155 | -3539.33 | 5223.738 |
| X Variable 4 | 7993.062 | 1896.699 | 4.214195 | 3.01E-05 | 4265.96 | 11720.16 | 3087.485 | 12898.64 |
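The summary figures in this output are tied together by standard identities. The sketch below re-derives the adjusted R Square, the F statistic and one t statistic from the other reported values; the variable names are illustrative, and small differences from the printed figures come from rounding in the output.

```python
# Values copied from the regression output above.
r_squared = 0.684128
n_obs = 473
k = 4  # number of predictors in this regression

df_resid = n_obs - k - 1  # 468, as in the ANOVA table

# Adjusted R Square penalises R Square for the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_resid

# Overall F statistic written in terms of R Square.
f_stat = (r_squared / k) / ((1 - r_squared) / df_resid)

# A t statistic is the coefficient divided by its standard error (X Variable 1).
t_x1 = 0.74864 / 0.026659

print(round(adj_r_squared, 6), round(f_stat, 2), round(t_x1, 2))
```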
The table shows that the regression fit is good, but the coefficient table shows that X Variable 3 has a p-value (0.61932) higher than 0.01. Therefore this variable, D1, has to be deleted from the data table. The regression with the same dependent variable and the same independent variables other than D1 is given below:
Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.827019333 |
| R Square | 0.683960977 |
| Adjusted R Square | 0.681939405 |
| Standard Error | 17630.21504 |
| Observations | 473 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 3 | 3.15E+11 | 1.05E+11 | 338.3313 | 7E-117 |
| Residual | 469 | 1.46E+11 | 3.11E+08 | | |
| Total | 472 | 4.61E+11 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 99.0% | Upper 99.0% |
|---|---|---|---|---|---|---|---|---|
| Intercept | 13.95975517 | 1082.726 | 0.012893 | 0.989719 | -2113.63 | 2141.554 | -2786.35 | 2814.271 |
| X Variable 1 | 0.749167279 | 0.026617 | 28.14623 | 8.4E-103 | 0.696864 | 0.801471 | 0.680326 | 0.818008 |
| X Variable 2 | 146.7921723 | 21.64091 | 6.783087 | 3.56E-11 | 104.267 | 189.3173 | 90.82115 | 202.7632 |
| X Variable 3 | 8054.211397 | 1891.187 | 4.258812 | 2.48E-05 | 4337.963 | 11770.46 | 3162.934 | 12945.49 |
The table shows that the regression fit is quite good and that the p-values of all the slope coefficients fall below 0.01. The intercept is not significant (p-value 0.989719) and is left out of the equation. This regression can be taken as the final model, with all the retained variables significant. Therefore, the required regression equation is:
Y= (0.75)*KCAPITAL + (146.79)*Labor + (8054.21)*D2.
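As a sketch of how this equation is read, the function below evaluates it and shows the change in the predicted value when one input moves by one unit and the others stay fixed. The function name `predict_output` and the input values (10,000 units of capital, 50 units of labor) are hypothetical and chosen only for illustration; the intercept is omitted, as in the equation as written.

```python
def predict_output(kcapital, labor, d2):
    """Evaluate Y = 0.75*KCAPITAL + 146.79*Labor + 8054.21*D2."""
    return 0.75 * kcapital + 146.79 * labor + 8054.21 * d2


# Hypothetical baseline observation.
base = predict_output(kcapital=10_000, labor=50, d2=0)

# Change in the prediction when a single input moves by one unit.
effect_kcapital = predict_output(10_001, 50, 0) - base
effect_labor = predict_output(10_000, 51, 0) - base
effect_d2 = predict_output(10_000, 50, 1) - base

print(base, effect_kcapital, effect_labor, effect_d2)
```

Each difference recovers the corresponding coefficient, up to floating-point rounding, which is the sense in which a coefficient is the change in Y per unit change in its variable.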
- The coefficient of KCAPITAL is the average increase in the dependent variable per unit increase in KCAPITAL, with Labor and D2 kept fixed. The coefficient of Labor is the average increase in the dependent variable per unit increase in Labor, with KCAPITAL and D2 kept fixed. D2 is a categorical variable. Therefore, the coefficient of D2 is the average change in Y from one category of D2 to the other, with the other variables kept fixed. The coefficient of KCAPITAL can be challenged, since it can be argued that capital has a much larger effect in business. The sign of the Labor coefficient can also be challenged, since a large labor force can have a negative impact, and its magnitude can be lowered. In light of these arguments, a new model can be proposed, such as:
Y= (5)*KCAPITAL – (90)*Labor + (8054.21)*D2.
- The coefficient of determination is defined as the proportion of the variation in the dependent variable that is explained by the independent variables. Here, about 68% of the variation in industrial production is explained by Labor, KCAPITAL and D2.
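A short sketch of this definition, using the sums of squares from the ANOVA table of the final Subject 1 regression. The variable names are illustrative, and because the printed sums of squares are rounded to three significant figures, the result only approximates the reported R Square of 0.683960977.

```python
# Sums of squares as printed in the ANOVA table of the final model.
ss_regression = 3.15e11
ss_residual = 1.46e11
ss_total = 4.61e11

# Share of the total variation explained by the regression.
r_squared = ss_regression / ss_total

# Equivalent form: one minus the unexplained share.
r_squared_alt = 1 - ss_residual / ss_total

print(round(r_squared, 3), round(r_squared_alt, 3))
```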
Subject 2:
2.1 The process starts with a description of the collinearity among the independent variables. The independent variables and the correlations between them are shown here:
| | WAGES | KCAPITAL | Labor | D1 | D2 |
|---|---|---|---|---|---|
| WAGES | 1 | | | | |
| KCAPITAL | 0.844151 | 1 | | | |
| Labor | 0.960251 | 0.751036 | 1 | | |
| D1 | 0.027968 | -0.03644 | 0.004177 | 1 | |
| D2 | 0.12812 | -0.07081 | 0.155761 | 0.06072 | 1 |
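When more than one pair exceeds the cut-off, it helps to see which variable appears in the most flagged pairs. The sketch below counts this for the matrix above, assuming the same 0.8 cut-off; counting involvement is an illustrative heuristic and not a procedure stated in the original analysis.

```python
from collections import Counter

# Lower triangle of the correlation matrix reported above.
NAMES = ["WAGES", "KCAPITAL", "Labor", "D1", "D2"]
LOWER = [
    [1.0],
    [0.844151, 1.0],
    [0.960251, 0.751036, 1.0],
    [0.027968, -0.03644, 0.004177, 1.0],
    [0.12812, -0.07081, 0.155761, 0.06072, 1.0],
]
CUTOFF = 0.8

# Count how many over-the-cut-off pairs each variable takes part in.
involvement = Counter()
for i, row in enumerate(LOWER):
    for j, r in enumerate(row[:i]):  # off-diagonal entries only
        if abs(r) > CUTOFF:
            involvement[NAMES[i]] += 1
            involvement[NAMES[j]] += 1

worst, count = involvement.most_common(1)[0]
print(worst, count, dict(involvement))
```

A variable that appears in every flagged pair is the natural candidate for removal, since dropping it clears all of them at once.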
The correlations of WAGES with KCAPITAL (0.844151) and with Labor (0.960251) are greater than 0.8. Therefore, WAGES has to be removed from the dataset, and the rest of the variables are not dangerously correlated. The regression of the dependent variable on the remaining four independent variables is given below:
Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.971118892 |
| R Square | 0.943071903 |
| Adjusted R Square | 0.942585338 |
| Standard Error | 0.131050435 |
| Observations | 473 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 4 | 133.1499 | 33.28748 | 1938.224 | 1.2E-289 |
| Residual | 468 | 8.037533 | 0.017174 | | |
| Total | 472 | 141.1875 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 96.0% | Upper 96.0% |
|---|---|---|---|---|---|---|---|---|
| Intercept | 0.728946993 | 0.045026 | 16.18956 | 3.92E-47 | 0.640469 | 0.817425 | 0.636217 | 0.821677 |
| X Variable 1 | 0.745283949 | 0.017807 | 41.85416 | 2.6E-160 | 0.710293 | 0.780275 | 0.708611 | 0.781957 |
| X Variable 2 | 0.302633323 | 0.020118 | 15.04322 | 5.29E-42 | 0.263101 | 0.342165 | 0.261201 | 0.344065 |
| X Variable 3 | 0.000311127 | 0.012578 | 0.024736 | 0.980276 | -0.0244 | 0.025027 | -0.02559 | 0.026215 |
| X Variable 4 | 0.291151384 | 0.014821 | 19.64427 | 4.23E-63 | 0.262027 | 0.320276 | 0.260627 | 0.321675 |
The table shows that the regression fit is good, but the coefficient table shows that X Variable 3 has a p-value (0.980276) higher than 0.01. Therefore this variable, D1, has to be deleted from the data table. The regression with the same dependent variable and the same independent variables other than D1 is given below:
Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.971119 |
| R Square | 0.943072 |
| Adjusted R Square | 0.942708 |
| Standard Error | 0.130911 |
| Observations | 473 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 3 | 133.1499 | 44.3833 | 2589.817 | 2.2E-291 |
| Residual | 469 | 8.037544 | 0.017138 | | |
| Total | 472 | 141.1875 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 96.0% | Upper 96.0% |
|---|---|---|---|---|---|---|---|---|
| Intercept | 0.729187 | 0.043918 | 16.60335 | 4.95E-49 | 0.642887 | 0.815488 | 0.638739 | 0.819636 |
| X Variable 1 | 0.745264 | 0.01777 | 41.93906 | 8.4E-161 | 0.710345 | 0.780183 | 0.708667 | 0.781862 |
| X Variable 2 | 0.302649 | 0.020087 | 15.06726 | 4E-42 | 0.263178 | 0.342119 | 0.261281 | 0.344016 |
| X Variable 3 | 0.291168 | 0.01479 | 19.68681 | 2.48E-63 | 0.262105 | 0.320231 | 0.260708 | 0.321628 |
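The interval columns in the coefficient table follow the usual form of coefficient plus or minus a critical value times the standard error. The sketch below backs the critical values out of the X Variable 1 row and reuses the 95% value to rebuild the X Variable 2 interval; the helper names are illustrative, and the critical values are inferred from the printed figures rather than taken from a t table.

```python
def implied_critical_value(lower, upper, std_error):
    """Half-width of the interval expressed in standard errors."""
    return (upper - lower) / (2 * std_error)


def interval(coef, std_error, crit):
    """Confidence interval as coef +/- crit * std_error."""
    return coef - crit * std_error, coef + crit * std_error


# X Variable 1 row of the coefficient table above.
crit_95 = implied_critical_value(0.710345, 0.780183, 0.01777)
crit_96 = implied_critical_value(0.708667, 0.781862, 0.01777)

# Rebuild the 95% interval for X Variable 2 from its coefficient and standard error.
low_x2, high_x2 = interval(0.302649, 0.020087, crit_95)

print(round(crit_95, 3), round(crit_96, 3), round(low_x2, 4), round(high_x2, 4))
```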
The table shows that the regression fit is quite good and that the p-values of all the coefficients fall below 0.01. This regression can be taken as the final model, with all the retained variables significant. Therefore, the required regression equation is:
Y = 0.73 + 0.75*KCAPITAL + 0.30*Labor + 0.29*D2.
2.2. The coefficient of KCAPITAL is the average increase in the dependent variable per unit increase in KCAPITAL, with Labor and D2 kept fixed. The coefficient of Labor is the average increase in the dependent variable per unit increase in Labor, with KCAPITAL and D2 kept fixed. D2 is a categorical variable. Therefore, the coefficient of D2 is the average change in Y from one category of D2 to the other, with the other variables kept fixed. The coefficient of KCAPITAL can be challenged, since it can be argued that capital has a much larger effect in business. The sign of the Labor coefficient can also be challenged, since a small labor force can have a negative impact, and its magnitude can be increased. In light of these arguments, a new model can be proposed, such as:
Y= (5)*KCAPITAL – (90)*Labor + (0.29)*D2.
Subject 3.
It can be seen from the residual plots and the normality plots that, for the log-linear model, the assumptions of residual homoscedasticity and independence are not met, but the normality condition is met. For the linear model, normality and homoscedasticity are not met, but the residuals are independent. The residual plots and normality plots are attached below:
Residual plot for the log linear model.
Normality plot for log linear model.
Residual plot for linear model.
Normality plot for linear model.
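A minimal sketch of how the coordinates behind such plots can be computed from observed and fitted values. The data below are hypothetical, and the plotting positions (i - 0.5) / n are one common convention assumed here; the original plots may have been produced differently.

```python
from statistics import NormalDist, mean, pstdev


def residuals(observed, fitted):
    """Residual = observed value minus fitted value."""
    return [y - f for y, f in zip(observed, fitted)]


def normal_qq_points(resid):
    """Pair each sorted standardized residual with a theoretical normal quantile."""
    n = len(resid)
    centre, spread = mean(resid), pstdev(resid)
    standardized = sorted((e - centre) / spread for e in resid)
    # Assumed plotting positions: (i - 0.5) / n for i = 1..n.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, standardized))


# Hypothetical observed and fitted values, for illustration only.
observed = [10.0, 12.0, 15.0, 19.0, 24.0]
fitted = [11.0, 12.5, 14.5, 18.0, 24.0]

resid = residuals(observed, fitted)
residual_plot = list(zip(fitted, resid))  # x = fitted value, y = residual
qq_plot = normal_qq_points(resid)         # x = normal quantile, y = residual

print(residual_plot)
print(qq_plot)
```

In the residual plot, a spread that widens or narrows along the fitted values points to heteroscedasticity; in the normality plot, points far from a straight line point to non-normal residuals.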
- The independent variable should be chosen here.