Answer 1: An Overall View of CO2 Emissions
The purpose of this memorandum is to respond to the queries raised in your previous communication. You also forwarded sample data, which has been analyzed to draw meaningful conclusions about CO2 emissions and the impact of factors such as engine size, number of cylinders and fuel type. The sample is not restricted to CO2 emissions but also includes information on a host of other variables, which has proved immensely helpful in exploring the data and examining associations between the variables of interest. Based on this analysis, it is possible to gain a perspective on overall CO2 emissions and on the types of vehicles that are expected to have higher CO2 emissions, given the attributes provided in the dataset. Once identified, these factors could inform government policy and thereby modify vehicle usage patterns. The significant observations from the data analysis, as they relate to the specific queries, are presented below.
Descriptive statistics, comprising measures of central tendency and dispersion, have been computed for CO2 emissions using Excel. Additionally, to aid interpretation of the data, a histogram showing the frequency of the various emission levels has been produced in Excel.
The primary observation is that emissions are not symmetrically distributed. There is a high degree of positive skew, which reflects a few vehicles that emit very high quantities of CO2. Owing to this skew, the mean and median CO2 emissions differ, with the high emitters pulling the mean upward. The measures of dispersion indicate that the dispersion in CO2 emissions is at best low to moderate. The emphasis should therefore be on vehicles with exceptionally high emission levels, and on curbing the use and purchase of these vehicles.
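The summary measures described above can be sketched in a few lines of Python as a cross-check on the Excel output. This is only an illustration: the function name `describe` and the emission values are hypothetical and are not taken from the sample data. The skewness formula is the adjusted Fisher-Pearson coefficient, which is the definition Excel's SKEW function uses.

```python
import statistics

def describe(values):
    """Return basic central tendency, dispersion and skewness measures."""
    n = len(values)
    mean = statistics.fmean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    # Adjusted Fisher-Pearson skewness: n / ((n-1)(n-2)) * sum(((x - mean) / sd)^3)
    skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)
    return {
        "mean": mean,
        "median": median,
        "stdev": sd,
        "cv": sd / mean,  # coefficient of variation, a relative dispersion measure
        "skew": skew,
    }

# Hypothetical emissions in g/km, NOT the memorandum's data.
sample = [180, 195, 200, 210, 220, 235, 250, 270, 320, 410]
stats = describe(sample)
print(stats)
```

With a few very high values in the right tail, the mean sits above the median and the skewness is positive, which is the pattern described for the actual sample.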
To identify the nature and strength of any association between two variables, one of the most preferred techniques is the scatter plot coupled with the correlation coefficient. A scatter diagram has been constructed in Excel between CO2 emissions and the type of fuel used by each vehicle. For this purpose, regular petrol is coded as 1, while premium petrol, diesel and ethanol are coded as 2, 3 and 4 respectively. The plot suggests that the association is either insignificant or quite weak. This is supported by the correlation coefficient, which comes out as 0.167. The positive sign implies that as the number assigned to the fuel type increases, CO2 emissions tend to increase.
However, it is difficult to accept the pattern suggested by the above association. Taken at face value, the computation implies that ethanol is more polluting than even diesel, which seems unlikely as ethanol is viewed as a clean fuel, and that regular petrol has the lowest emissions of the four fuel types. Since these results are counterintuitive and the relationship is weak, caution is essential before using it in further analysis, as it is quite likely that other factors are at play.
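One reason for caution can be illustrated with a short sketch. The fuel codes 1 to 4 are labels rather than measurements, so a Pearson correlation computed on them depends on which number is assigned to which fuel. The example below uses hypothetical emission values (not the sample data) and two codings, the memorandum's and an arbitrary alternative, and also reports the mean emission per fuel type, which does not depend on the coding. The helper names are illustrative only.

```python
from collections import defaultdict
from statistics import fmean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def mean_by_group(labels, values):
    """Mean of values within each label; independent of any numeric coding."""
    groups = defaultdict(list)
    for label, value in zip(labels, values):
        groups[label].append(value)
    return {label: fmean(vals) for label, vals in groups.items()}

# Hypothetical observations, NOT the memorandum's data.
fuel = ["regular", "regular", "premium", "premium",
        "diesel", "diesel", "ethanol", "ethanol"]
co2 = [200, 220, 260, 280, 230, 250, 270, 290]

coding_memo = {"regular": 1, "premium": 2, "diesel": 3, "ethanol": 4}
coding_alt = {"ethanol": 1, "diesel": 2, "premium": 3, "regular": 4}  # arbitrary relabelling

r_memo = pearson([coding_memo[f] for f in fuel], co2)
r_alt = pearson([coding_alt[f] for f in fuel], co2)
group_means = mean_by_group(fuel, co2)
print(r_memo, r_alt, group_means)
```

In this toy data the two codings give coefficients of opposite sign for the same observations, while the per-fuel means are unchanged. Comparing group means may therefore be a more reliable way to examine fuel type.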
Answer 2: Relationships with CO2 Emissions
Sample data have been provided, but the objective here is to use the sample to draw conclusions about the population. For this purpose, a technique known as the confidence interval has been used to estimate both the mean CO2 emissions and the relative incidence of the various engine cylinder variants.
The engine is available in essentially three variants, i.e. it could contain 4, 6 or 8 cylinders. The requisite confidence intervals have been computed using the template provided in Excel. In accordance with the obtained results, it would be fair to claim with 95% confidence that the average CO2 emission of 4-cylinder engines lies between 198.38 g/km and 203.46 g/km. Similarly, it may be claimed with 95% confidence that the average CO2 emission of 6-cylinder engines lies between 255.39 g/km and 260.97 g/km, and that of 8-cylinder engines between 316.39 g/km and 328.23 g/km. It is noteworthy that these intervals are for the population mean, i.e. the expected emission level across all vehicles rather than only those in the given sample. The three intervals do not overlap, so emission levels vary significantly with the number of cylinders in the engine. The noticeable trend is that as the number of cylinders increases, CO2 emissions increase, and hence it is imperative that the government, through suitable measures, limits the usage of vehicles with higher-cylinder engines.
Having established that there is a significant difference between the emission levels of engines with different numbers of cylinders, it is imperative to estimate the population proportion of vehicles with each type of engine using the given sample data. This has been computed in the form of 95% confidence intervals using the Excel template provided.
In accordance with the obtained results, it would be fair to claim with 95% confidence that 4-cylinder engines account for between 42.23% and 48.16% of all vehicle engines. Besides, it may be claimed with 95% confidence that 6-cylinder engines account for between 32.64% and 38.34% of all vehicle engines, and 8-cylinder engines for between 16.96% and 21.67%. These population estimates clearly suggest that vehicles with fewer cylinders are more common than vehicles with more cylinders. This is a positive observation: the vehicles with higher-cylinder engines were found to have significantly higher CO2 emissions, so it provides a sense of relief that their proportion is the lowest. However, it is imperative that more measures be introduced to further alter the proportion of vehicles in favor of 4-cylinder engines.
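The interval for a mean can be sketched as follows. This is a minimal illustration that assumes a large-sample normal (z) critical value; the Excel template may instead use the t distribution, which would give a slightly wider interval for small groups. The function name and the emission values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev

def mean_confidence_interval(values, confidence=0.95):
    """Large-sample confidence interval for a population mean: mean +/- z * s / sqrt(n)."""
    n = len(values)
    centre = fmean(values)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * stdev(values) / sqrt(n)
    return centre - margin, centre + margin

# Hypothetical 4-cylinder emissions in g/km, NOT the memorandum's data.
values = [198, 205, 190, 210, 202, 196, 207, 200]
low, high = mean_confidence_interval(values)
print(low, high)
```

Applying the same function to each cylinder group separately would give one interval per group, which can then be compared for overlap.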
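The reported proportion intervals can be checked with the usual normal-approximation formula, p ± z·sqrt(p(1 − p)/n). The sketch below assumes that each sample proportion is the midpoint of its reported interval and that n is the 1,082 observations mentioned in the conclusion. Both are inferences from the memorandum's figures rather than values read from the Excel file, and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(p_hat, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a population proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

n = 1082  # sample size stated in the conclusion

# Reported 95% intervals (as fractions), keyed by number of cylinders.
reported = {4: (0.4223, 0.4816), 6: (0.3264, 0.3834), 8: (0.1696, 0.2167)}

recomputed = {}
for cylinders, (lo, hi) in reported.items():
    p_hat = (lo + hi) / 2  # assumption: sample proportion is the interval midpoint
    recomputed[cylinders] = proportion_confidence_interval(p_hat, n)

for cylinders, (lo, hi) in recomputed.items():
    print(cylinders, round(lo, 4), round(hi, 4))
```

Under these assumptions the recomputed limits agree with the reported ones to within rounding, which suggests the template uses this formula.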
Answer 3: Confidence Intervals
It is claimed that vehicles with CO2 emissions greater than 350 g/km form at least 5% of the total vehicles on the road. An inferential statistical technique known as hypothesis testing has been applied, using Excel, to test whether the given sample data support this claim.
Based on the computations in the attached Excel sheet, the null hypothesis could not be rejected on the basis of the sample data provided, and therefore the alternative hypothesis cannot be accepted. Thus, the appropriate conclusion is that the sample does not support the claim that at least 5% of vehicles emit more than 350 g/km; the proportion of such vehicles appears to be lower than 0.05. Therefore, in order to curb vehicle-related CO2 emissions effectively, it makes sense to fix the acceptable emission limit at a value lower than 350 g/km.
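The test can be sketched as a one-sample z-test for a proportion. The memorandum does not spell out the hypotheses, so this sketch assumes H0: p ≤ 0.05 against H1: p > 0.05, with the claim placed in the alternative. The count of high-emission vehicles used below is hypothetical, and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(successes, n, p0):
    """Upper-tailed one-sample z-test for a proportion.

    Returns the z statistic and the p-value for H1: p > p0.
    """
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # standard error uses p0 under H0
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value

# Hypothetical count of vehicles above 350 g/km, NOT taken from the Excel sheet.
high_emitters = 40
n = 1082  # sample size stated in the conclusion
z, p_value = proportion_z_test(high_emitters, n, p0=0.05)
reject_null = p_value < 0.05
print(z, p_value, reject_null)
```

A large p-value here means only that the sample gives no evidence that the proportion exceeds 5%. On its own it does not prove that the proportion is below 5%.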
One of the objectives is to explore the influence, if any, of engine size on CO2 emissions. The tool used to examine this relationship is linear regression analysis, which has been performed in Excel on the provided sample data. Based on the regression, the following model emerges.
A key parameter of regression analysis, which provides information on the fit of the model, is the coefficient of determination, or R². The R² value is 0.7017, which implies that 70.17% of the variation in emissions is explained by engine size, so it is indeed a critical factor. The slope of the model is positive, with a computed value of 36.594. This implies that an increase in engine size of 1 liter is associated with an increase in CO2 emissions of 36.59 g/km. This clearly indicates that it should be a priority of the government and the regulators involved to reduce the prevalence of higher-capacity engines from the current level.
With regard to whether the above model is suited to estimating the likely emission level of a 1 liter engine, it is essential to check whether this value falls within the range of input values used to derive the regression model. The engine sizes used to derive the regression equation include values both smaller and greater than 1 liter. As a result, there should be no problem in using the model to compute the emission level of a vehicle running on a 1 liter engine.
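The slope and R² discussed above come from an ordinary least-squares fit, which can be sketched as below. The data points are hypothetical and will not reproduce the reported slope of 36.594 or R² of 0.7017; the function name `fit_line` is illustrative.

```python
from statistics import fmean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x; returns slope, intercept, R^2."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
    ss_tot = sum((y - my) ** 2 for y in ys)                                    # total
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical engine sizes (liters) and emissions (g/km), NOT the memorandum's data.
engine_size = [1.0, 1.6, 2.0, 2.5, 3.0, 3.6, 5.0]
co2 = [150, 175, 190, 215, 225, 255, 300]
slope, intercept, r2 = fit_line(engine_size, co2)
print(slope, intercept, r2)
```

The slope is read as the change in emissions associated with a one-liter change in engine size, and R² as the share of the variation in emissions that the fitted line accounts for.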
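The range check described above amounts to confirming that the new predictor value lies between the smallest and largest engine sizes used to fit the model. A minimal sketch, with hypothetical engine sizes and an illustrative function name:

```python
def is_interpolation(x_new, xs):
    """True when x_new lies inside the range of predictor values used to fit the model."""
    return min(xs) <= x_new <= max(xs)

# Hypothetical engine sizes in liters, NOT the memorandum's data.
engine_sizes = [0.9, 1.2, 1.6, 2.0, 3.5, 5.0]
inside = is_interpolation(1.0, engine_sizes)   # within the fitted range
outside = is_interpolation(8.0, engine_sizes)  # would require extrapolation
print(inside, outside)
```

A prediction for a value outside the range would be an extrapolation, where the fitted linear relationship is not supported by the data.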
The minimum sample size may be computed using the following formula.
The attached Excel file applies the formula above and gives a minimum sample size of 152, which is a fraction of the current sample size of over 1,000.
A second computation in the attached Excel file, using the same formula, gives a minimum sample size of 129, which is likewise a fraction of the current sample size of over 1,000.
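The formula itself is not reproduced in this memorandum. As an assumption, the sketch below shows the two standard minimum-sample-size formulas: n = (z·σ/E)² for estimating a mean and n = z²·p(1 − p)/E² for estimating a proportion, each rounded up. The inputs are hypothetical and are not meant to reproduce the figures of 152 and 129, since the standard deviation and margin of error used in the Excel file are not stated here.

```python
from math import ceil
from statistics import NormalDist

def min_sample_size_mean(sigma, margin, confidence=0.95):
    """Smallest n so that a mean is estimated within +/- margin: (z * sigma / margin)^2."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil((z * sigma / margin) ** 2)

def min_sample_size_proportion(p, margin, confidence=0.95):
    """Smallest n so that a proportion is estimated within +/- margin: z^2 p(1-p) / margin^2."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)

# Hypothetical inputs, NOT the values used in the attached Excel file.
n_mean = min_sample_size_mean(sigma=15, margin=3)
n_prop = min_sample_size_proportion(p=0.5, margin=0.05)
print(n_mean, n_prop)
```

Rounding up is deliberate: a fractional result means that the next whole observation is needed to achieve the stated margin of error.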
Conclusion
Based on the relevant computations on the data provided, it is apparent that mean CO2 emissions are pulled upward by the high emission levels of certain vehicles. Besides, CO2 emission levels do not seem to be driven by fuel type in a significant manner. Further, it is possible that green fuels like ethanol lead to lower emissions of other toxic pollutants but, in the context of CO2, have higher emission levels than conventional fuel types. Thus, further studies need to be pursued to clearly explore the relationship between fuel type and emission levels. It has also been found that vehicles whose engines have a greater cylinder count show significantly higher CO2 emissions. However, it is favorable that the current vehicle population seems to be dominated by vehicles whose engines have a lower number of cylinders.
In the wake of the above observations, it makes sense for the concerned authorities to ensure that suitable disincentives are introduced for vehicles with an 8-cylinder engine. Also, it would be prudent to fix the requisite limit at a level lower than 350 g/km so that it could prove to be an effective deterrent. A level of 350 g/km would narrow the coverage and would fail to bring tangible change in the situation. Besides, a positively sloped linear relationship is supported between engine size (as the independent variable) and emissions (as the dependent variable), and thus it makes sense for the regulators to provide incentives and disincentives to ensure that the representation of smaller engines continuously increases. Also, a significantly smaller sample of 152 observations would be sufficient instead of the 1,082 observations considered in the given case.