Data Collection
Statistical data analysis plays an important role in examining facts about business, industry, management, and many other sectors. Statistical analysis of any type of data is key to making effective decisions (Hogg, 2004). It also helps managers understand the actual facts and supports creative management (DeGroot, 2002). In this research study, we use statistical data analysis to examine energy consumption data for the different sectors in Australia. We want to find out whether there are any significant differences in energy use between sectors, and to examine how energy use has changed over time. We compare the sectors with one another and also study total energy use for the country as a whole.
To answer the above research questions, data must be collected for the study variables. For this study, the data were collected from the website (www.industry.gov.au) of the Department of Industry, Innovation and Science, Australian Government. The data cover 42 years of energy use for the different sectors in Australia. A proper method of data collection should be used to obtain unbiased results (Dobson, 2001). Instrumental errors should be minimised, and other chance causes should be kept to a minimum while the study is conducted (Casella, 2002). When data from secondary sources are used, proper care should be taken in sampling the data (Hastie, 2001). A link to the data is provided in the reference section. The data give energy use for the agriculture, mining, manufacturing, electricity generation, construction, transport, commercial, residential, and other sectors. A screenshot of part of the data is provided in the appendix.
Descriptive statistics give a general idea of the variables involved in the research study. Descriptive statistics for the energy units used by each sector are summarised in the following table.
Descriptive Statistics

| Sector | N | Minimum | Maximum | Mean | Std. Deviation |
|---|---|---|---|---|---|
| Agriculture | 42 | 38.70 | 104.40 | 69.3286 | 21.74585 |
| Mining | 42 | 59.40 | 531.20 | 218.6667 | 136.46949 |
| Manufacturing | 42 | 852.70 | 1343.40 | 1088.7976 | 132.46116 |
| Electricity generation | 42 | 509.60 | 1913.40 | 1212.7452 | 419.52499 |
| Construction | 42 | 24.90 | 41.50 | 31.4310 | 4.95716 |
| Transport | 42 | 685.40 | 1612.90 | 1126.1286 | 279.86470 |
| Commercial | 42 | 84.50 | 336.20 | 190.3690 | 80.21346 |
| Residential | 42 | 231.30 | 456.00 | 350.4024 | 72.62252 |
| Other | 42 | 48.20 | 102.00 | 70.8500 | 11.54149 |
| Total | 42 | 2615.20 | 5953.80 | 4358.7190 | 1119.80978 |
| Valid N (listwise) | 42 | | | | |
From the above table, the average energy use for the agriculture sector in Australia is 69.33 energy units, with a standard deviation of 21.75 energy units. The average total energy use for Australia is 4358.72 energy units, with a standard deviation of 1119.81 energy units. Electricity generation has the highest average energy use (1212.75 units), followed by transport (1126.13 units) and manufacturing (1088.80 units). Together with residential (350.40 units), these are the largest energy-using sectors.
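As a quick cross-check of the table, the sketch below ranks the sectors by mean energy use and compares the sum of the sector means with the mean of Total. The values are copied from the Mean column above; the dictionary and variable names are illustrative only, and the reading that Total is the sum of the sectors is an assumption that the table is consistent with rather than something the source states.

```python
# Mean energy use per sector, copied from the Descriptive Statistics table.
sector_means = {
    "Agriculture": 69.3286,
    "Mining": 218.6667,
    "Manufacturing": 1088.7976,
    "Electricity generation": 1212.7452,
    "Construction": 31.4310,
    "Transport": 1126.1286,
    "Commercial": 190.3690,
    "Residential": 350.4024,
    "Other": 70.8500,
}
total_mean = 4358.7190  # Mean of the "Total" row in the same table

# Rank sectors from highest to lowest mean energy use.
ranking = sorted(sector_means, key=sector_means.get, reverse=True)

# Assumption: Total is the sum of the sectors, so the sector means
# should add up to the Total mean (up to rounding in the table).
sum_of_means = sum(sector_means.values())

for name in ranking:
    share = 100 * sector_means[name] / total_mean
    print(f"{name:<24}{sector_means[name]:>10.2f}{share:>7.1f}%")
print(f"Sum of sector means: {sum_of_means:.4f} (Total mean: {total_mean:.4f})")
```

The ranking puts electricity generation, transport, and manufacturing at the top, and the sector means agree with the Total mean to within rounding.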
Descriptive Statistics
Graphical analysis of the data makes comparisons easier and helps in understanding the concepts (Evans, 2004). We now present some graphical analysis of energy use in Australia.
First, energy use for all sectors is shown using the box plots summarised below:
From the box plots, it is observed that energy use for the manufacturing, electricity generation, and transport sectors is high compared to the other sectors, while agriculture and construction use less energy.
Next, we look at time series analysis of energy use for the different sectors over the 42 years covered by the data.
First, the time series plot for total energy use in Australia is given below:
From the above time series plot, it is observed that energy use for the country has increased continuously over this period.
The energy use pattern for the agriculture sector is provided below:
From the above time series plot, it is observed that energy use for the agriculture sector has been increasing overall, with some up-and-down movement in some years.
For the mining sector, energy use is shown in the following time series plot.
From the above time series plot, it is observed that energy use for the mining sector has increased continuously.
For the manufacturing sector, the time series plot of energy use is given.
For the electricity generation sector, the time series plot of energy use is given.
For the construction sector, the time series plot of energy use is given.
For the transport sector, the time series plot of energy use is given.
For the commercial sector, the time series plot of energy use is given.
For the residential sector, the time series plot of energy use is given.
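A trend read from a time series plot can also be summarised numerically by the slope of a least-squares line through the yearly values. The following is a minimal sketch of that idea; the function name `linear_trend` and the five-year series are hypothetical and are not the actual Australian figures.

```python
def linear_trend(years, values):
    """Return (slope, intercept) of the least-squares line values ~ years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical illustrative series, NOT taken from the source data.
years = [2010, 2011, 2012, 2013, 2014]
energy = [100.0, 104.0, 103.0, 109.0, 112.0]

slope, intercept = linear_trend(years, energy)
print(f"Estimated change per year: {slope:.2f} energy units")
```

A positive slope corresponds to an increasing series. Applied to each sector's 42 yearly values, the slopes would allow the sector trends to be compared on the same scale.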
The study of correlation and linear regression is an important statistical procedure for estimating current and future values of a response variable (Cox, 2000). In this section, we examine the correlation coefficients between the energy uses of the different sectors; these coefficients and their significance values are given below:
| | | Agriculture | Mining | Manufacturing | Electricity generation | Construction |
|---|---|---|---|---|---|---|
| Agriculture | Pearson Correlation | 1 | .911** | .868** | .965** | -.715** |
| | Sig. (2-tailed) | | 0 | 0 | 0 | 0 |
| | N | 42 | 42 | 42 | 42 | 42 |
| Mining | Pearson Correlation | .911** | 1 | .888** | .891** | -.710** |
| | Sig. (2-tailed) | 0 | | 0 | 0 | 0 |
| | N | 42 | 42 | 42 | 42 | 42 |
| Manufacturing | Pearson Correlation | .868** | .888** | 1 | .915** | -.669** |
| | Sig. (2-tailed) | 0 | 0 | | 0 | 0 |
| | N | 42 | 42 | 42 | 42 | 42 |
| Electricity generation | Pearson Correlation | .965** | .891** | .915** | 1 | -.670** |
| | Sig. (2-tailed) | 0 | 0 | 0 | | 0 |
| | N | 42 | 42 | 42 | 42 | 42 |
| Construction | Pearson Correlation | -.715** | -.710** | -.669** | -.670** | 1 |
| | Sig. (2-tailed) | 0 | 0 | 0 | 0 | |
| | N | 42 | 42 | 42 | 42 | 42 |
Strong positive correlations exist between most pairs of sectors for energy use. The agriculture and mining sectors show a correlation coefficient of 0.911, which indicates a strong positive linear relationship between these two sectors. There are also negative correlations: construction is negatively correlated with each of the other four sectors.
Graphical Analysis
The pairs of sectors with positive correlations include agriculture and mining, agriculture and manufacturing, and agriculture and electricity generation. The pairs with negative correlations include agriculture and construction, mining and construction, and electricity generation and construction.
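The significance of a Pearson correlation can be checked by converting r into a t statistic with n - 2 degrees of freedom. The sketch below applies that standard formula to the strongest and weakest coefficients quoted for agriculture-mining and manufacturing-construction, with n = 42. The helper name `t_from_r` is illustrative, and the critical value 2.704 is the usual two-tailed 1% value for 40 degrees of freedom taken from standard t tables, not from the source.

```python
import math


def t_from_r(r, n):
    """t statistic for H0: population correlation is zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


n = 42
T_CRIT_01 = 2.704  # two-tailed 1% critical value of t with 40 df (table value)

pairs = {
    "Agriculture vs Mining": 0.911,
    "Manufacturing vs Construction": -0.669,
}
for pair, r in pairs.items():
    t = t_from_r(r, n)
    print(f"{pair}: r = {r:+.3f}, t = {t:.2f}, significant = {abs(t) > T_CRIT_01}")
```

Both statistics are far beyond the critical value in absolute terms, which is consistent with the significance values reported as 0 in the table.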
We now fit a multiple linear regression model to predict total energy use from the energy use of the individual sectors. The regression model is summarised below:
Variables Entered/Removed (b)

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | Construction, Manufacturing, Agriculture, Mining, Electricity generation | . | Enter |

a. All requested variables entered.
b. Dependent Variable: Total

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | 1.000 (a) | .999 | .999 | 27.41965 |

a. Predictors: (Constant), Construction, Manufacturing, Agriculture, Mining, Electricity generation
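From the above table, there is a near-perfect linear relationship between the dependent variable and the independent variables in this regression model (R rounds to 1.000). The value of R square, the coefficient of determination, is 0.999, which means that about 99.9% of the variation in the dependent variable is explained by the independent variables. Such a high value is not surprising: the sector means in the descriptive statistics add up to the mean of Total, which suggests that total energy use is the sum of the sector values, and five of those sectors are the predictors. The ANOVA table for this regression model is given below: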
ANOVA (b)

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 5.139E7 | 5 | 1.028E7 | 13669.409 | .000 (a) |
| | Residual | 27066.148 | 36 | 751.837 | | |
| | Total | 5.141E7 | 41 | | | |

a. Predictors: (Constant), Construction, Manufacturing, Agriculture, Mining, Electricity generation
b. Dependent Variable: Total
From the above ANOVA table, the p-value for this regression model is reported as .000, which is less than the 0.05 level of significance, so we reject the null hypothesis that there is no significant linear relationship between the dependent variable and the independent variables. There is sufficient evidence to conclude that a statistically significant linear relationship exists between the dependent variable and the independent variables. The regression coefficients are summarised in the table below:
Coefficients (a)

| Model | | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | 332.377 | 105.070 | | 3.163 | .003 |
| | Agriculture | 2.144 | .937 | .042 | 2.289 | .028 |
| | Mining | 2.476 | .090 | .302 | 27.583 | .000 |
| | Manufacturing | 1.366 | .096 | .162 | 14.236 | .000 |
| | Electricity generation | 1.436 | .051 | .538 | 28.201 | .000 |
| | Construction | 3.428 | 1.294 | .015 | 2.649 | .012 |

a. Dependent Variable: Total
All of the above regression coefficients are statistically significant, as the corresponding p-values are less than the 0.05 level of significance.
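Several figures in the regression output can be reproduced from one another, which is a useful consistency check. The sketch below recomputes the standard error of the estimate and R square from the ANOVA sums of squares, and each t statistic as B divided by its standard error. All inputs are copied from the tables above; the variable names are illustrative, and small differences from the printed t values are expected because the printed B and Std. Error values are rounded.

```python
import math

# Values copied from the regression ANOVA table.
ss_residual, df_residual = 27066.148, 36
ss_total = 5.141e7  # printed to four significant figures only

# Standard error of the estimate = square root of the residual mean square.
std_error_estimate = math.sqrt(ss_residual / df_residual)

# R square = share of total variation not left in the residuals.
r_squared = 1 - ss_residual / ss_total

# (B, Std. Error) pairs copied from the Coefficients table.
coefficients = {
    "(Constant)": (332.377, 105.070),
    "Agriculture": (2.144, 0.937),
    "Mining": (2.476, 0.090),
    "Manufacturing": (1.366, 0.096),
    "Electricity generation": (1.436, 0.051),
    "Construction": (3.428, 1.294),
}

# Each t statistic is the coefficient divided by its standard error.
t_values = {name: b / se for name, (b, se) in coefficients.items()}

print(f"Std. error of the estimate: {std_error_estimate:.5f}")
print(f"R square: {r_squared:.3f}")
for name, t in t_values.items():
    print(f"{name:<24} t = {t:.3f}")
```

The recomputed standard error and R square agree with the Model Summary, and the t values agree with the Coefficients table to within rounding.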
Hypothesis testing is the technique in inferential statistics that allows us to decide whether or not a hypothesis should be rejected (Pearl, 2000). Statistical testing has a significant role in the theory of inference (Liese, 2008). We now use the independent samples t test to check whether two population means differ significantly. Here, we want to check whether there is any significant difference in average energy use between two sectors, manufacturing and transport. The null and alternative hypotheses for this test are given below:
Null hypothesis: H0: There is no statistically significant difference in average energy use between the manufacturing and transport sectors.
Alternative hypothesis: Ha: There is a statistically significant difference in average energy use between the manufacturing and transport sectors.
We use a 5% level of significance for this test. The test results are summarised below:
Calculations Area

| Item | Value |
|---|---|
| Pop. 1 Sample Variance | 17545.9695 |
| Pop. 2 Sample Variance | 78324.2503 |
| Pop. 1 Sample Var./Sample Size | 417.7612 |
| Pop. 2 Sample Var./Sample Size | 1864.8631 |
| For one-tailed tests: | |
| TDIST value | 0.2189 |
| 1-TDIST value | 0.7811 |
Correlation and Linear Regression
Separate-Variances t Test for the Difference Between Two Means (assumes unequal population variances)

| Item | Value |
|---|---|
| Data | |
| Hypothesized Difference | 0 |
| Level of Significance | 0.05 |
| Population 1 Sample | |
| Sample Size | 42 |
| Sample Mean | 1088.797619 |
| Sample Standard Deviation | 132.4612 |
| Population 2 Sample | |
| Sample Size | 42 |
| Sample Mean | 1126.128571 |
| Sample Standard Deviation | 279.8647 |
| Intermediate Calculations | |
| Numerator of Degrees of Freedom | 5210373.6950 |
| Denominator of Degrees of Freedom | 89078.9952 |
| Total Degrees of Freedom | 58.4916 |
| Degrees of Freedom | 58 |
| Standard Error | 47.7768 |
| Difference in Sample Means | -37.3310 |
| Separate-Variance t Test Statistic | -0.7814 |
| Two-Tail Test | |
| Lower Critical Value | -2.0017 |
| Upper Critical Value | 2.0017 |
| p-Value | 0.4377 |

Do not reject the null hypothesis
The p-value for the above test is 0.4377, which is greater than the 0.05 level of significance, so we do not reject the null hypothesis that there is no statistically significant difference in average energy use between the manufacturing and transport sectors.
There is insufficient evidence to conclude that average energy use differs significantly between the manufacturing and transport sectors.
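The intermediate calculations in the test output can be reproduced from the summary statistics alone. The sketch below implements the separate-variances (Welch) t statistic and its approximate degrees of freedom; the function name `welch_t` is illustrative, the inputs are the sample sizes, means, and standard deviations from the table, and the critical value 2.0017 is the one reported there.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df, standard error) for the separate-variances t test."""
    v1 = sd1 ** 2 / n1  # sample variance / sample size, population 1
    v2 = sd2 ** 2 / n2  # sample variance / sample size, population 2
    se = math.sqrt(v1 + v2)
    t = (mean1 - mean2) / se
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, se


# Population 1 = manufacturing, population 2 = transport (values from the table).
t, df, se = welch_t(1088.797619, 132.4612, 42, 1126.128571, 279.8647, 42)

# Two-tailed decision using the critical value reported in the table.
reject = abs(t) > 2.0017
print(f"t = {t:.4f}, df = {df:.4f}, SE = {se:.4f}, reject H0 = {reject}")
```

The computed statistic lies between the lower and upper critical values, which matches the decision not to reject the null hypothesis.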
If more than two population averages need to be compared, the ANOVA technique is appropriate (Ross, 2014). We now use one-way ANOVA to check whether there is any significant difference in average energy use among three sectors: manufacturing, transport, and electricity generation. The null and alternative hypotheses for this test are given below:
Null hypothesis: H0: There is no significant difference in average energy use among the three sectors manufacturing, transport, and electricity generation.
Alternative hypothesis: Ha: There is a significant difference in average energy use among the three sectors manufacturing, transport, and electricity generation.
We use a 5% level of significance for this test.
The test results are summarised below:
ANOVA: Single Factor

SUMMARY

| Groups | Count | Sum | Average | Variance |
|---|---|---|---|---|
| Group 1 | 42 | 45729.5 | 1088.797619 | 17545.9598 |
| Group 2 | 42 | 50935.3 | 1212.745238 | 176001.2201 |
| Group 3 | 42 | 1320.1 | 31.43095238 | 24.5734 |

ANOVA

| Source of Variation | SS | df | MS | F | P-value | F crit |
|---|---|---|---|---|---|---|
| Between Groups | 35404470.1035 | 2 | 17702235.0518 | 274.3515 | 0.0000 | 3.0699 |
| Within Groups | 7936441.8836 | 123 | 64523.9178 | | | |
| Total | 43340911.9871 | 125 | | | | |

Level of significance: 0.05
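From the above ANOVA table, the p-value is 0.00, which is less than the 0.05 level of significance, so we reject the null hypothesis of no significant difference in average energy use among the three groups compared. However, the group summaries do not match the sectors named in the hypotheses: Group 2 (average 1212.75) matches electricity generation, and Group 3 (average 31.43, variance 24.57) matches the construction sector in the descriptive statistics table rather than transport. The groups actually compared are therefore manufacturing, electricity generation, and construction.
There is sufficient evidence to conclude that average energy use differs significantly among these three sectors; the table as reported does not test the transport sector.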
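The F statistic of a one-way ANOVA can be rebuilt from each group's count, average, and variance, which makes it easy to check a reported table. The sketch below does this for the three group summaries in the SUMMARY table; the function name `one_way_anova` is illustrative, and no p-value is computed because that would need an F distribution routine outside the standard library.

```python
def one_way_anova(groups):
    """groups: list of (n, mean, variance). Return (F, df_between, df_within)."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    # Between-groups sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-groups sum of squares: pooled spread inside each group.
    ss_within = sum((n - 1) * v for n, _, v in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# (Count, Average, Variance) for Group 1 to Group 3, copied from the SUMMARY table.
summary = [
    (42, 1088.797619, 17545.9598),
    (42, 1212.745238, 176001.2201),
    (42, 31.43095238, 24.5734),
]

f_stat, df_b, df_w = one_way_anova(summary)
f_crit = 3.0699  # F crit reported in the ANOVA table
print(f"F = {f_stat:.4f} on ({df_b}, {df_w}) df, exceeds F crit = {f_stat > f_crit}")
```

The same function can be given the summary rows of any other three sectors from the descriptive statistics table to test a different comparison.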
Conclusions
For the above research study, the conclusions are summarised below:
- The average energy use for the agriculture sector in Australia is 69.33 energy units, with a standard deviation of 21.75 energy units. The average total energy use for Australia is 4358.72 energy units, with a standard deviation of 1119.81 energy units. Electricity generation has the highest average energy use, followed by transport and manufacturing; together with residential, these are the largest energy-using sectors.
- The box plots show that energy use for the manufacturing, electricity generation, and transport sectors is high compared to the other sectors, while agriculture and construction use less energy.
- Energy use for the country has increased continuously over the 42 years covered by the data.
- The pairs of sectors with positive correlations include agriculture and mining, agriculture and manufacturing, and agriculture and electricity generation. The pairs with negative correlations include agriculture and construction, mining and construction, and electricity generation and construction.
- There is a near-perfect linear relationship between the dependent variable and the independent variables in the regression model. The value of R square, the coefficient of determination, is 0.999, which means that about 99.9% of the variation in the dependent variable is explained by the independent variables.
- There is sufficient evidence to conclude that a statistically significant linear relationship exists between the dependent variable and the independent variables.
- There is insufficient evidence to conclude that average energy use differs significantly between the manufacturing and transport sectors.
- There is sufficient evidence of a significant difference in average energy use among the three groups compared in the one-way ANOVA; the group summaries in that table match manufacturing, electricity generation, and construction.
References
Casella, G. and Berger, R. L. (2002). Statistical Inference. Duxbury Press.
Cox, D. R. and Hinkley, D. V. (2000). Theoretical Statistics. Chapman and Hall Ltd.
DeGroot, M. and Schervish, M. (2002). Probability and Statistics. Addison-Wesley.
Dobson, A. J. (2001). An introduction to generalized linear models. Chapman and Hall Ltd.
Evans, M. (2004). Probability and Statistics: The Science of Uncertainty. Freeman and Company.
Hastie, T., Tibshirani, R. and Friedman, J. H. (2001). The elements of statistical learning: data mining, inference, and prediction: with 200 full-color illustrations. Springer-Verlag Inc.
Hogg, R., Craig, A., and McKean, J. (2004). An Introduction to Mathematical Statistics. Prentice Hall.
Liese, F. and Miescke, K. (2008). Statistical Decision Theory: Estimation, Testing, and Selection. Springer.
Pearl, J. (2000). Causality: models, reasoning, and inference. Cambridge University Press.
Ross, S. (2014). Introduction to Probability and Statistics for Engineers and Scientists. London: Academic Press.
Data link: https://www.industry.gov.au/Office-of-the-Chief-Economist/Publications/Pages/Australian-energy-statistics.aspx