The Impact of Climate Change on Human Population and the Ecological System
Most of the problems faced today by the human population and by flora and fauna stem from abrupt and frequent changes in the climate and from aggressive changes in the environment. With a growing population and its growing demands, industrial activity has increased considerably, releasing tons of unwanted and harmful gases and substances and thereby contributing to climate change in a devastating manner. Some adaptation to the changing weather has of course taken place, but the ratio of adaptation to weather change remains unsatisfactory. In simple words, climate is the combination of several circumstances such as wind, temperature, humidity, air pressure, rainfall, and other climatic components. A variety of factors, including extreme phenomena, biological influences, and human actions, can cause global warming, resulting in drastic temperature change. Some of the most well-known manifestations of climate change are increased environmental variability and fluctuations in the prevalence and magnitude of dangerous weather events such as cyclones, heavy storms, rainfall, droughts, tsunamis, and cold spells. All of this has resulted in dramatic shifts in glaciers and seas, including glacier melting, warming and acidification of the oceans, and rising sea levels. These variations existed before and continue to exist today; the difference is that their scope has grown to the point where glaciers may disappear, oceans may no longer be a supportive environment for marine life, and natural disasters such as cyclones, typhoons, and tsunamis may become more common in the coming decades.
Global warming, characterized as an unusually rapid rise in the Earth's temperature caused primarily by the combustion of coal and other fossil fuels and the emission of greenhouse gases, is one of the primary causes of the decay of the ecological system and of the depletion of the ozone layer that has accelerated in recent years. The blanket of the planet, which plays an important part in sustaining life, is affected by emissions of greenhouse gases, which absorb a considerable amount of energy in the atmosphere and consequently warm it. The greenhouse effect refers to this warming of the Earth's atmosphere; water vapor, methane, carbon dioxide, and nitrous oxide are the gases involved in the process. Aside from the use of fossil fuels, other factors contribute to environmental warming, such as forest fires, the discharge of sulfur dioxide by volcanoes, the release of carbon dioxide from the combustion of fuel by cars and trucks, and methane produced by the digestion of food in farm animals such as cattle and sheep. Glaciers and seas also play a very important role in regulating the climate, keeping it cool and stabilizing air temperature. When fossil fuels are burned and glaciers melt, most of the heat is trapped in the atmosphere and only a small portion is reflected back to space, raising the temperature of the atmosphere. According to the National Oceanic and Atmospheric Administration (NOAA), this decade has been the warmest since 1880, and it is anticipated that the Earth's temperature might rise by a further 7.2 degrees Fahrenheit over the twenty-first century if the rate of combustion of fossil fuels and other natural resources does not decline.
The Use of Fossil Fuels and Greenhouse Gases Leading to Global Warming
It is clear from the preceding information that human actions are the primary cause of the suffering of all living species. To curb global warming, the release of greenhouse gases, the use of fossil fuels, and the mismanagement of resources, several steps are needed. First, widespread societal awareness must be fostered. Second, the use of fossil fuels must be reduced. Third, industrialists must capture the gases emitted by their businesses and factories, which contribute to air pollution. Fourth, planting more trees can reduce the amount of atmospheric CO2, which has risen owing to tree cutting, climate change, and the greenhouse effect. Field specialists should conduct detailed case studies, and based on them, regulations should be set that must be observed by everyone: factory owners, laborers, and the general public. Finally, providing health-related guidance, both offline and online, can be useful in leading the public to take the required actions and motivating them to build a healthy and pleasant environment.
It has now been roughly a decade since the United Nations Environment Programme (UNEP) acknowledged that, irrespective of the pledges made by different countries to reduce emissions, carbon dioxide emissions in 2020 would overshoot, by 8-12 gigatons, the level that could still be controlled to avoid a catastrophic scenario in terms of greenhouse gas production (Milman, 2013). Likewise, the recent United Nations climate report found not only that the 1.5 degree Celsius temperature cap agreed in Paris is being approached, but that the world is on track for warming of roughly double that figure. It was also found that the regions that emit the most carbon dioxide are more susceptible to experiencing severe warming. These observations and inferences were drawn from the IPCC, or Intergovernmental Panel on Climate Change, whose reports compile data from scientists and experts from more than 195 nations and explain the trend and extent of the rise in carbon emissions throughout the world, driven largely by human activities.
Global warming is frequently regarded as a result of anthropogenic activities. It is the rapid rise in the temperature of the ecological environment, and it is producing a slew of life-threatening crises. Climate change is the most important ecological consequence currently being dealt with. Population expansion, rapid urbanization, and pollution are all driving the surge in global warming. The term "global warming" denotes the rise in the average temperature of the troposphere over the previous century. One of the major reasons global warming is problematic is that it disrupts the planet's general ecology, causing flooding, starvation, cyclones, and many other problems. There are several other causes and consequences that pose a threat to the sustainability of life. Global warming, or global climate change, is already observable, with various natural occurrences taking place throughout the world and harming all living organisms.
The Role of Natural Resources in Regulating the Climate
The following data, drawn from the past few years, gives a more accurate picture of the current state of global warming:
- In terms of numbers, the global temperature is hovering around 1.5°C higher than it was in the late 1700s, at the advent of the industrial revolution. This may not seem like much, but it is only an average; the figure has kept rising and is expected to rise further. Many parts of the world are seeing significantly larger temperature increases, which affect the planet's general health.
- Global carbon dioxide emissions in the 1950s were around 6 billion tonnes; by 1990 they had nearly quadrupled to 22 billion tonnes, in a mere four decades. Today, largely unregulated carbon emissions total roughly 35 billion tonnes.
- Urbanization, industrialization, deforestation, and other complex human activities are the most visible drivers of global warming. Such human actions have substantially increased the release of greenhouse gases such as carbon dioxide, methane, and nitrous oxide.
Global warming is caused by a multitude of factors. Some of these problems can be managed by individuals, but many are expected to be addressed by political figures, communities, and activists on a worldwide scale.
Some of the major causes of rising global warming, according to recent studies and leading scholars, are:
- Greenhouse gases and their emissions
- Pollution
- Deforestation
- Consumption of coal, fossil fuels, electricity, and oil
- Per capita carbon emissions
Undoubtedly, global warming is a concerning scenario that has a huge influence on the survival of life. Abnormal climate change has caused natural disasters that can be seen everywhere. One of the primary causes of warming is excessive greenhouse gas emissions, which become trapped in the troposphere and cause the temperature to rise. Volcanoes likewise contribute to global warming by spewing excessive carbon dioxide into the atmosphere. The population boom is another major cause of global warming, as it leads to increased air pollution. In addition, automobiles such as four- and two-wheelers emit a lot of carbon, which lingers in the atmosphere.
The rise in the world's population is one of the primary contributors to deforestation, which in turn contributes to global warming: as more and more trees are cut down, the concentration of CO2 increases. The greenhouse effect refers to the natural process in which sunlight passes through the atmosphere and warms the surface of the planet; the surface then radiates energy back as heat, maintaining a balance with the energy taken in. If global warming continues unchecked, the ozone layer will eventually be destroyed, with catastrophic consequences. There is compelling evidence suggesting that the current rate of warming will ultimately threaten the survival of life on the surface of the planet.
Many people are interested in proving that concerns about global warming are exaggerated for political gain, even though the issue is quite genuine. As informed citizens of the globe, however, we have a responsibility to ensure that the media never distorts the truth. The harm generated by global warming has direct and negative effects on many different aspects of the ecosystem, including both flora and fauna. The plight of wildlife ultimately poses a significant risk to the continued existence of mankind in its current form and to the future of civilization. This decade has seen widespread evidence of the effects of global warming. The most typical occurrences are the retreat of glaciers and the shrinking of the Arctic; glaciers are disappearing at an alarmingly rapid rate. These are textbook examples of how the climate is changing. A further key impact of global warming is the rise in sea level, which is making low-lying communities more susceptible to flooding. Many nations are experiencing weather conditions that are considered quite severe; unseasonal rainfall, intense heat and cold, wildfires, and other natural disasters are now fairly typical occurrences throughout the year, and the number of reported instances of such events is growing. The effect will be an imbalance in the environment, which will ultimately lead to the extinction of several species. The same can be said for marine life, which is likewise being severely impacted by the progression of global warming, leading to the extinction of marine species in addition to other problems. Changes are also anticipated in coral reefs, which are likely to be destroyed in the years to come. These consequences will increase sharply in the years ahead, threatening the survival of many species, and human beings will ultimately be affected negatively by global warming as well.
The Need for Societal Awareness and Regulations to Combat Global Warming
According to academics, the only chance for the planet to bring carbon emissions under control this century is to stabilize atmospheric temperatures at or below the threshold of 1.5 degrees Celsius (McGrath, 2022). Despite the dire impression given by the IPCC fact sheet, the IPCC in its briefing also demonstrates that it is still possible to cap warming at 1.5 degrees Celsius, which requires massive changes in energy and power generation, industry, transportation, and individual consumer behavior, as well as in the way humans treat and deal with nature and its resources (The Nature Conservancy, 2022).
The science behind climate change and its behavior, as demonstrated by the above findings, is a critical concern in today's world. Multiple studies have recommended the use of statistical models along with decision-support approaches to better grasp the complexities of environmental issues (Viktor, et al., 2021). Data science applications enable the integration and simulation of enormous quantities of fragmented data, allowing models to be constructed that take socio-economic interrelationships into account. Finding possible levers to mitigate the situation and bring it under control is hence a major takeaway of this report. The current report focuses on analyzing the major factors involved in the rise or fall of the carbon dioxide emission rate by analyzing data already available from relevant and recognized data repositories.
The study attempts to gain a more detailed understanding of the carbon emissions of each of the 95 countries covered since 1990, and to create an effective supervised learning model for predicting and forecasting carbon dioxide emissions in countries across the world.
The research questions were formulated around three main aspects:
- What is the distribution of carbon emissions over time?
- How do machine learning models perform in the prediction of carbon change and hence climate change?
- How can the knowledge that has been accumulated be used in combating climate change?
The remainder of the paper focuses on answering these research questions.
Exploratory data analysis (EDA), combined with some of the most widely adopted machine learning models and algorithms such as Lasso Regression, Ridge Regression, Gradient Boosting, and Random Forest, is one of the feasible and recommended strategies for analyzing and predicting climate change.
Data science plays an important part in the debate over climate change. Among its diverse functions, spanning from learning algorithms to information visualisation to machine learning, data analysis serves as an important tool for understanding the consequences of climate change in multiple areas: environmental science, the use and recovery of land in human settlements, food production around the globe, changes in vector-borne illnesses, and other environmental aspects. Data analysis can help academics make sense of fundamental inconsistencies and contradictions in data, and discover initiatives, strategies, and remedies targeted at improving the condition of mankind and the ecosystem as a means of combatting climate change.
Climate science is a subfield of data-intensive science that studies the Earth's climate, a dynamic system that is in continual motion, distributes energy, and sustains life. Climate science is mainly concerned with "…trying to study massive changes inside the lands, weather, seas, and permafrost during lengthy timeframes" (Faghmous & Kumar, 2014). Developing knowledge of global processes and functions is crucial because it allows people to understand the reasons behind the reported connections and how those variables influence life if the world's temperature rises, in other words, the influence of the changing climate.
The Consequences of Unchecked Global Warming
While climate research contributes to amassing compelling evidence on the changing climate and its causes, issuing fresh warnings, and proposing and developing new responses, the sad fact is that carbon pollution continues to rise each year. This begs the question, "What exactly is going wrong?" Perhaps it is because, in contrast to the Covid-19 epidemic, which was classified as urgent, the global warming epidemic was classified as merely severe. Or could it be owing to a degree of lawlessness? (Glavovic, et al., 2021) One of the biggest perpetrators of state inaction has been the USA, which has had persons in positions of considerable responsibility reject the problem of climate change on a regular basis, consequently raising the cost of tackling climate change to trillions of dollars annually.
Marshall (2006), in his study of the history of climate change, noted that the IPCC issued its first report on the changing climate in 1990, concluding that the Earth had warmed by about 0.5°C over the 20th century. The IPCC further claimed that only strong action would stop growing emissions of greenhouse gases and avert irreversible global temperature impacts. These conclusions laid the groundwork for the United Nations General Assembly to negotiate a climate agreement in late November of the same year. The year 1990 also witnessed an upsurge in awareness about the changing climate, leading to the publication of further studies. According to Mintzer (1990), the temperature of the globe is governed by the equilibrium between the rate at which energy produced by sunshine reaches the surface and the rate at which the world, having been heated by the sunlight, radiates that energy back out to space. The temperatures that enable humans to live on Earth are intimately related to the warmth retained by trace components such as water vapor, CO2, methane, and other substances that can absorb thermal radiation. Human actions not only boost the amount of these components in the planet's atmosphere, but also introduce new and potent gases that trap thermal radiation into the mix.
Over time, the growth of data science fields including machine learning, big data, deep learning, data mining, and data visualization has enabled the adoption of these tools and techniques across a wide range of domains, including their use as research tools in climate science (Hassani, et al., 2019).
Climate change is a large data and statistics problem, and extensive big data analysis methods have been developed to research the climate. The development of the internet of things (IoT), software as a service, and big data technologies, together with other integrated advanced technologies, has increased the importance of statistical techniques within climate research, as has the evolution of large data stores in data science. Such techniques are being used at many scales, including but not confined to sustainable technologies, precision agricultural operations, strategic urban development, weather prediction, natural hazard mitigation, and so on (Hassani, et al., 2019). Ramos et al. (2022) proposed a canonical model, M-PRECLIS, for climate variability prediction, which on their evaluation was considered a sustainable idea suited for future investigation, opening new routes for research on the changing climate.
It is suggested that communicating outcomes is just as crucial as making improvements. Visualisation of scenarios is a key part of climate change initiatives; this entails using visuals to analyse the data and illustrate trends in an appealing way. One of the primary means of communication used by the IPCC is visualisation, such as the dramatic "burning embers" graphic, which portrays the widespread effects of varying rates of global overheating using a graded colour scheme from white to red (Xexakis & Trutnevyte, 2021). Climate warming visualisation helps the core audience comprehend what might otherwise be abstract elements of climate change and supports the idea that a localised emphasis can bring environmental issues closer to home (Ballantyne, 2018).
Combining vast amounts of temperature data using KDD, or Knowledge Discovery through Data Mining, allows the generation of fresh insights about climate change and the underlying response to anthropogenic pressure. Bracco and his team (2018) employed δ-MAPS, a complex network evaluation method, to explore potential local and non-local quantitative connections in temperature records. Bracco (2018) accomplished this by investigating the commonalities and dissimilarities in the representation of known circulation patterns across observed climate findings and information from linked climate change models.
Chen (2022) and his fellow researchers have emphasized the use of deep learning to project vegetation cover under various SSPs, or Shared Socioeconomic Pathways, through the end of the twenty-first century, and discovered that there would be no essential variations in the dynamics exhibited by vegetation cover under the IPCC's sustainable development scenario. A few years earlier, Amarpuri et al. (2019) suggested a hybrid deep learning model, CNN-LSTM (Convolutional Neural Network-Long Short-Term Memory network), for estimating carbon emissions in India. They compared the quality of an autoregressive model to the suggested CNN-LSTM technique in their investigation. The CNN-LSTM model beat the autoregressive (smoothed exponential) method, with an RMSE of 1.49 in estimating overall carbon dioxide emissions in India compared to the linear regression model's RMSE of 58.45. Furthermore, Singh and Dubey (2021) tried to predict carbon gas emissions from automobiles using a combination of an RNN (Recurrent Neural Network) and an LSTM (Long Short-Term Memory) model. When contrasted with a Deep CNN and a DNN (Deep Neural Network), the suggested model had an RMSE of 9.30, whereas the DNN had an RMSE of 64.87 and the Deep CNN had an RMSE of 17.82, which shows that the combined RNN and LSTM model is the best-performing model in this case.
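As an illustration of the kind of hybrid architecture discussed above, the following is a minimal sketch of a CNN-LSTM regressor for an annual emissions series. It is not the architecture used by Amarpuri et al. (2019); the window length, layer sizes, and the placeholder series are illustrative assumptions.

```python
# Minimal CNN-LSTM sketch (illustrative only, not the authors' exact model),
# assuming annual per-country CO2 emissions reshaped into sliding windows of length 5.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def make_windows(series, window=5):
    """Turn a 1-D emission series into (samples, window, 1) inputs and next-step targets."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    return np.array(X)[..., None], np.array(y)

emissions = np.cumsum(np.random.rand(60))   # placeholder series; real data would come from the merged dataset
X, y = make_windows(emissions)

model = keras.Sequential([
    layers.Input(shape=(5, 1)),
    layers.Conv1D(32, kernel_size=2, activation="relu"),  # convolution extracts short-term patterns
    layers.MaxPooling1D(pool_size=2),
    layers.LSTM(32),                                       # LSTM captures longer temporal dependencies
    layers.Dense(1),                                       # regression output: next year's emissions
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=20, batch_size=8, verbose=0)
```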
Rao (2021) used three models in his work to forecast carbon emissions from power production: an SVM (support vector machine), an LM (linear model), and a tuned SVM. When estimating carbon dioxide emissions using the O&M (operation and maintenance) cost of electrical plants, the SVM had an RMSE of 2.70117, the linear regression model had an RMSE of 4.25878, and the TSVM, or tuned SVM, had an RMSE of 2.58541. In other research by Thanh and Lee (2022), three models were proposed for forecasting the quantity of carbon trapped in saline formations: SVM, GPR (Gaussian process regression), and RF (the Random Forest algorithm). The GPR model, with an R-squared of 0.992 and an RMSE of roughly 0.00491, surpassed both the RF (RMSE = 0.0086, R² = 0.91) and the SVM (RMSE = 0.0057, R² = 0.97) models.
A further review of past research found that to forecast the quantity of carbon in the soil as a consequence of environmental changes, SVR (Support Vector Regression), KNN (K-Nearest Neighbor) regression, GB (Gradient Boosting), and RF (Random Forest) models could be used, as employed by Adjuik and Davis (2022) in their analysis. According to their findings, the Gradient Boosting model performed best, with an RMSE of 4405.43 g C ha⁻¹ day⁻¹ and an R² of 0.88, whereas the KNN and RF performed well on out-of-sample data. Thereafter, Kadam and Vijayumar (2018) suggested a regression method to forecast carbon emissions based on World Bank data collected between 1964 and 2018. Specifically, the entire analysis of Kadam and Vijayumar (2018) revolved around forecasting the quantity of carbon emitted in the current year given the previous year's emissions. They found that the regression model attained a remarkably low RMSE of 0.2557.
Kadam and Vijayumar (2018) used historical, or secondary, data to perform their analysis and forecasting of carbon emissions throughout the world, which is closely related to the current research. The present study uses features such as carbon produced by coal, fuel, oil, and electricity to forecast a country's per capita rate of carbon dioxide emissions. Both studies make use of historical information on carbon emissions from different nations, as well as multiple linear regression as one of the major machine learning techniques. Furthermore, on both the in-sample and out-of-sample datasets, both studies employ the RMSE, or Root Mean Squared Error, as one of the performance assessment measures.
As previously stated, the report most closely related to the present work employs only regression analysis with the previous year's carbon emissions to project the emissions of the following year. In this research, four models are presented, including Lasso and Ridge Regression, Random Forest, and Gradient Boosting, with the best-ranked model chosen for prediction based on R-squared and RMSE. Furthermore, unlike Kadam and Vijayumar (2018), rather than absolute carbon emissions, the current analysis uses each country's per capita carbon emissions from 1990 onwards. This project also utilizes the name of the country as an exogenous construct to estimate each nation's per capita carbon dioxide emissions. Finally, whereas the most recent evidence offers forecasts for a single nation, India, this analysis intends to forecast emissions for all countries across the world.
Very few studies focus on these issues with the aid of machine learning, using it to generate decision-making results and anticipate the largest carbon dioxide emitters, as the current study does. This study will highlight the Pareto set of nations that are predicted to generate the most carbon dioxide over the next 5 to 10 years, along with projecting the quantity of carbon generated by those countries.
It is envisaged that by the end of the present work, it will be possible to identify the nations with the greatest carbon emissions and to spotlight the countries that are expected to generate the most carbon gas per capita in the coming decade. As a result, the research project, in adding to the growing emphasis on the use of data science and machine learning, specifically machine and deep learning, to limit global warming, can serve as a foundation for the formulation of policies to guarantee countries meet standards that confine the amount of carbon dioxide a nation can generate per capita.
The current study uses a secondary research strategy that applies quantitative methodologies to develop different machine learning algorithms to predict and forecast carbon dioxide emissions per capita. Toward that aim, if attributes are not quantifiable, they are converted to numbers using suitable procedures during the data preparation stage, as sketched below.
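As a minimal sketch of such a conversion, assuming a hypothetical dataframe with a categorical country column (names and values are illustrative, not taken from the study's data), one-hot encoding turns a non-quantifiable attribute into numeric indicator columns:

```python
# Minimal sketch: converting a categorical attribute to numbers during data preparation.
# The dataframe contents are illustrative placeholders.
import pandas as pd

df = pd.DataFrame({
    "country": ["Qatar", "Kuwait", "Qatar"],
    "coal_co2": [1.2, 0.8, 1.3],
})

# Each country becomes its own 0/1 indicator column usable by the regression models.
df_numeric = pd.get_dummies(df, columns=["country"], prefix="country")
print(df_numeric.head())
```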
Figure 1: Data Analysis Stages
This is the general flow followed in any analysis. The first step is to collect or gather the data from the relevant sources, be they primary, secondary, or tertiary data sources, as per the requirement (Obaid, Dheyab, and Sabry, 2019). The next step is "Data Preparation", which involves extracting and fixing the data for analysis; it can also include selecting the right and relevant data from the given dataset. The next step is "Data Cleaning", which can be seen as a subset of the previous step and covers the major portion of the time involved in conducting any analysis (Chandrasekar et al., 2017). In this step, the dataset is thoroughly judged, assessed, and subjected to various tests that help in cleaning and grooming the data, that is, removing any sort of irregularities and anomalies.
The next step is "Data Exploration", which is done in two ways: "Descriptive Analysis" and "Inferential Analysis". Descriptive analysis gives a summarized view of the data; in short, it describes the data and provides a brief of what the data looks like, its features and characteristics, including statistical summaries (Obaid, Dheyab, and Sabry, 2019). The purpose of descriptive analysis, which may also be referred to as descriptive analytics or descriptive statistics, is to summarise or characterize a collection of data through the use of statistical methods (Stapor, 2020, pp. 63-75). The capacity of descriptive analysis, one of the primary forms of data analysis, to provide understandable insights from data that would otherwise remain uninterpreted is one of the primary reasons for its widespread use. A further advantage of descriptive analysis is that it can help filter out data that is less useful, because the statistical methods used in this kind of study often concentrate on the patterns found within the data rather than on the outliers. Descriptive analysis also serves as a prerequisite for predictive or diagnostic analysis (Amrhein, Trafimow and Greenland, 2019, pp. 263-265): it offers insights into what has occurred in the past before one tries to explain why it has occurred or to project what will occur in the future.
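As a brief illustration of descriptive analysis in this setting, the following sketch summarises a hypothetical emissions dataframe; the file and column names are assumptions, not the study's actual code:

```python
# Minimal descriptive-analysis sketch on an assumed emissions dataframe.
import pandas as pd

df = pd.read_csv("co2_emissions.csv")   # hypothetical file name

# Count, mean, std, min, quartiles and max for every numeric feature.
print(df.describe())

# Which country-year rows have the highest per capita emissions?
print(df.nlargest(5, "co2_per_capita")[["country", "year", "co2_per_capita"]])
```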
Inferential analysis is the last part, involving the application of machine learning algorithms and other techniques used for prediction from the dataset. To arrive at the results, inferential statistical analysis is carried out as the technique of choice. It allows users to draw conclusions or inferences about patterns affecting a larger population based on analyses of smaller samples of that population; in its most basic form, it works by drawing inferences about a broader population or group based on the data obtained from a smaller sample (Stapor, 2020, pp. 63-75). Studying the correlations within a sample is a common use of this type of statistical analysis, and it makes it possible to draw conclusions and generalizations that appropriately reflect the population. In contrast to descriptive analysis, inferential analysis allows organizations to test a hypothesis and draw several conclusions based on the data (Amrhein, Trafimow and Greenland, 2019, pp. 263-265). There are many different forms of inferential statistical tests used in statistics, including ANOVA, regression, hypothesis testing, various machine learning models and algorithms, and many more.
Common data preparation challenges include the following (see the profiling sketch after this list):
1. Insufficient or missing data profiling: if data isn't adequately profiled, mistakes, anomalies, and other issues may not be recognized, which can result in faulty analytics.
2. Missing or incomplete data: data sets usually involve missing values and other types of incomplete data; problems of this kind need to be evaluated to see whether or not they are the result of an error and, if so, they must be resolved.
3. Invalid data values: misspelled words, other entry mistakes, and incorrect numbers regularly occur in data (Stapor, 2020, pp. 63-75); these errors need to be repaired if accurate analytics are to be obtained.
4. Standardization of names and addresses: names and addresses may be recorded inconsistently in data coming from various systems, with discrepancies affecting views of customers and other institutions, which complicates standardization efforts.
5. Inconsistent data across corporate systems: other discrepancies in data sets gathered from several source systems, such as different languages and unique IDs, are also a common concern in data processing efforts (Stapor, 2020, pp. 63-75); these inconsistencies can hinder the accuracy of the data.
6. Data enrichment: deciding how to improve a data set, for example what to add to it, is a difficult task that demands a solid grasp of the requirements of the business and the objectives of its analytics efforts.
7. Maintaining and enhancing data preparation procedures: data preparation often becomes a repeating process that has to be maintained and improved on a regular basis.
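The following minimal profiling sketch illustrates checks corresponding to items 1-3 above, assuming the merged emissions dataframe has been loaded; file and column names are illustrative:

```python
# Minimal data-profiling sketch for the challenges listed above.
import pandas as pd

df = pd.read_csv("co2_emissions.csv")           # hypothetical file name

print(df.isna().sum())                          # missing values per column (item 2)
print(df.duplicated().sum())                    # duplicate rows
print((df.select_dtypes("number") < 0).sum())   # negative values, invalid for emission quantities (item 3)
```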
Three sources of data were proposed for the current study including CO2 emissions per capita, CO2 emissions per country, and surface temperature. Appendix 1 provides an overview of the links to each of the datasets.
As previously stated, data preparation entailed combining the datasets and selecting key characteristics such as the name of the country; latitude and longitude; carbon produced by fossil fuel, coal, electricity, and methane in the atmosphere; carbon dioxide emissions per capita; and other features. Features that turned out to be involved in multicollinearity with others were not used when modeling the algorithms. The observations range from 1990 to 2018. Following that, feature engineering was used to create dummy variables for each of the nations involved, which is useful later in projecting CO2 emissions by country. Diverse aspects such as fuel, oil, coal, and gas consumption were then merged from several data sets. After merging, all observations with missing values were dropped, resulting in a final dataset of 2088 observations with 78 predictor attributes and 1 target attribute, i.e., CO2 emissions per capita. A sketch of this preparation flow follows.
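A minimal sketch of this preparation flow is given below. The file names, column names, and merge keys are assumptions for illustration; they mirror the steps described above rather than reproduce the study's exact code (which is shown as screenshots in Figures 2-6).

```python
# Minimal sketch of the merge / clean / encode flow, with illustrative file and column names.
import pandas as pd

per_capita = pd.read_csv("co2_per_capita.csv")        # country, year, co2_per_capita
per_country = pd.read_csv("co2_per_country.csv")      # country, year, coal_co2, oil_co2, gas_co2, ...
temperature = pd.read_csv("surface_temperature.csv")  # country, year, temperature

# Merge the three sources on country and year.
df = (per_capita
      .merge(per_country, on=["country", "year"])
      .merge(temperature, on=["country", "year"]))

# Keep the 1990-2018 window and drop rows with missing values.
df = df[df["year"].between(1990, 2018)].dropna()

# Dummy-encode the country names so they can act as predictors.
df = pd.get_dummies(df, columns=["country"], prefix="country")

X = df.drop(columns=["co2_per_capita"])   # predictor attributes (78 in the actual study)
y = df["co2_per_capita"]                  # target: CO2 emissions per capita
```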
Figure 2: Importing Packages and Datasets
Figure 3: Merging the Datasets
Figure 4: Cleaning the Data
Figure 5: Final Dataset
Figure 6: Unique Countries and Years in the Dataset
To address the research objective, four machine learning models were used: Random Forest (RF), Gradient Boosting, Lasso Regression, and Ridge Regression. During implementation, the performance of the models was optimized using cross-validation, after which the resulting models were used to predict the CO2 emissions in the validation data, along the lines sketched below.
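The comparison step can be sketched as follows, assuming the feature matrix X and target y from the preparation sketch above; the hyperparameters and fold count are illustrative choices, not the study's exact settings.

```python
# Minimal sketch: 10-fold cross-validated comparison of the four models.
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.model_selection import cross_val_score

models = {
    "Random Forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
    "Lasso": LassoCV(cv=10),                           # 10-fold CV over the regularization path
    "Ridge": RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0]),   # illustrative alpha grid
}

for name, model in models.items():
    # scikit-learn returns negative RMSE, so flip the sign for reporting.
    scores = -cross_val_score(model, X, y, cv=10, scoring="neg_root_mean_squared_error")
    print(f"{name}: mean CV RMSE = {scores.mean():.4f}")
```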
Lasso regression is a shrinkage and feature selection approach for linear regression modeling. In lasso analysis, the main purpose is to identify the subset of predictors that results in the smallest error in forecasting a quantitative response feature. The lasso does this by placing a constraint on the model parameters, which causes the regression coefficients for certain variables to shrink toward zero. After the shrinkage procedure, any variable whose regression coefficient equals zero is removed from the model (Gana, 2020, pp. 5-15); variables with non-zero regression coefficients are those most strongly connected to the response variable. The explanatory variables may be quantitative, categorical, or both. In this work, lasso regression analysis is carried out and its results are evaluated, which also builds experience in selecting the model that provides the best fit for the data and in obtaining a more accurate estimate of the test error rate by utilizing k-fold cross-validation (Arashi, Saleh and Kibria, 2019). To test a lasso regression model, a few additional categorical and quantitative predictors, also termed explanatory variables, should be selected to develop a larger pool of candidate predictors; experience with lasso analysis is maximized when there is a wider pool of predictors to evaluate (Gana, 2020, pp. 5-15). It is important to keep in mind that lasso regression is a machine learning technique; hence, the selection of extra variables does not always need to rely on a study hypothesis or theory. At the end of this analysis, it is reasonable to filter out the most essential predictors, those of greatest significance and assistance in defining and predicting the target feature. It is also worth noting that if the data set being dealt with is not very large, there is little point in splitting it into a training set and a test set.
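To illustrate the feature-selection behaviour described above, the following sketch fits a cross-validated lasso and counts the predictors whose coefficients survive the shrinkage; it assumes the X and y defined earlier and is not the study's exact configuration.

```python
# Minimal sketch of lasso shrinkage as feature selection.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

lasso = make_pipeline(StandardScaler(), LassoCV(cv=10, random_state=0))
lasso.fit(X, y)

coefs = lasso.named_steps["lassocv"].coef_
kept = np.flatnonzero(coefs)   # predictors with non-zero coefficients survive the penalty
print(f"{len(kept)} of {X.shape[1]} predictors retained by the lasso")
```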
In linear regression, the relationship between the input features and the response variable is assumed to be linear. When there is just one input variable, the association takes the form of a line; as the number of dimensions increases, the relationship can be thought of as a hyperplane linking the input features to the variable of interest (Arashi, Saleh and Kibria, 2019). An optimization procedure is used to find the coefficients that minimize the difference between the predicted outputs (ŷ) and the observed values (y). One difficulty that can arise with linear regression is that the estimated parameters of the model grow very large, making the model so unstable that it becomes highly sensitive to its parameters; this is particularly relevant when there are few observations or many variables. A strategy exists to restore the robustness of the regression model (Arashi, Saleh and Kibria, 2019): modify the loss function by adding an extra cost that penalizes models with very large coefficients. Linear regression with such modified loss functions is referred to as "regularized" or "penalized" linear regression. When performing regression modeling, the presence of multicollinearity almost always results in unstable estimated coefficients. Because it is not resistant to multicollinearity, the conventional approach of ordinary least squares (OLS) regression produces an erroneous, unstable, and volatile model as a consequence (Gana, 2020, pp. 5-15). This problem of model instability has been approached from several angles in the published research, but the ridge regression approach is by far the most popular.
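For reference, the penalised objectives take the following standard textbook form (generic notation, not drawn from the study itself): ridge adds an L2 cost and the lasso an L1 cost to the ordinary least squares loss, with λ controlling the strength of the penalty.

```latex
\text{OLS:}\quad \min_{\beta}\ \sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^{p}\beta_j x_{ij}\Big)^2

\text{Ridge:}\quad \min_{\beta}\ \sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^{p}\beta_j x_{ij}\Big)^2+\lambda\sum_{j=1}^{p}\beta_j^2

\text{Lasso:}\quad \min_{\beta}\ \sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^{p}\beta_j x_{ij}\Big)^2+\lambda\sum_{j=1}^{p}\lvert\beta_j\rvert
```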
Within the realm of machine learning, gradient boosting is considered one of the most effective algorithms. Errors made by machine learning models can be broken down into two main groups: variance errors and bias errors. Gradient boosting is a boosting procedure used to reduce the bias error of the model. In contrast to the AdaBoost technique, the gradient boosting approach does not allow the user to specify the base estimator; it uses a fixed base estimator, the decision stump, as one of its key components. The n_estimators parameter of this method may be tuned, much as it can be with AdaBoost; if the value of n_estimators is not specified, the algorithm uses a default of 100 (Bentéjac, Csörgő and Martínez-Muñoz, 2021, pp. 1938-1955). The gradient boosting algorithm can be used to predict both categorical targets (as a classifier) and continuous targets (as a regressor). When it is used as a regressor, the mean squared error (MSE) serves as the cost function; when it is used as a classifier, the log loss serves as the cost function.
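A minimal sketch of fitting the gradient boosting regressor on the prepared data is shown below; it assumes the X and y from the preparation sketch and uses scikit-learn defaults made explicit rather than the study's tuned values.

```python
# Minimal gradient boosting sketch on the prepared design matrix.
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

gbr = GradientBoostingRegressor(
    n_estimators=100,      # number of boosting stages (default 100, as noted above)
    learning_rate=0.1,     # contribution of each tree
    loss="squared_error",  # MSE cost function in the regression setting
    random_state=0,
)
gbr.fit(X_train, y_train)
print("Validation R^2:", gbr.score(X_val, y_val))
```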
Random forest is a supervised machine learning method. Because of its accuracy, ease of use, and adaptability, it is one of the most widely used algorithms. Its nonlinear character and the fact that it can be used for both classification and regression tasks make it flexible across a wide variety of data and scenarios. Tin Kam Ho first proposed the term "random decision forest" in 1995, devising a method for making predictions from randomly selected subsets of the data. The technique was subsequently extended by Leo Breiman and Adele Cutler, which resulted in the random forests we know today; the technology, and the mathematics that underpins it, is therefore still relatively young. It is called a "forest" because it produces a large collection of decision trees, and the information obtained from each of these trees is combined to provide the most reliable forecasts (Probst, Wright and Boulesteix, 2019). The forest ensures a more accurate conclusion because of its greater number of trees and options, in contrast to a single decision tree, which has only one outcome and a limited number of branches to choose from. Selecting the best feature from a random subset of features has the additional advantage of introducing randomness into the model, which may improve its accuracy. Taken together, these properties provide a model that offers a broad variety of options, which is something many data scientists want. The random forest is a favored supervised machine learning model among data scientists because it is very effective, adaptable, and agile (Probst, Wright and Boulesteix, 2019). In terms of efficacy, it provides precise forecasts and classifications in addition to a variety of advantages not offered by many alternatives. On the other hand, much of its internal workings remain opaque, making it somewhat of a "black box" in terms of how findings are produced.
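The corresponding random forest fit can be sketched as follows, reusing the train/validation split from the gradient boosting sketch; hyperparameters are illustrative.

```python
# Minimal random forest sketch, reusing X_train/X_val and y_train/y_val from above.
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(
    n_estimators=200,     # number of trees in the forest
    max_features="sqrt",  # random subset of features considered at each split
    random_state=0,
    n_jobs=-1,            # train trees in parallel
)
rf.fit(X_train, y_train)
print("Validation R^2:", rf.score(X_val, y_val))
```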
Altogether, the optimal model was chosen primarily on R-squared, which measures the amount of variance explained by the model, and secondly on RMSE, the root mean squared error, which indicates the magnitude of error the model can incur when evaluating and presenting its results or forecasts. Hence, the model with the lowest error and the greatest variance accounted for was identified as the best forecasting model; the computation is sketched below.
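The two selection metrics can be computed on the validation data as sketched below, assuming the fitted random forest rf and the split from the previous sketches.

```python
# Minimal sketch of the RMSE and R-squared selection metrics.
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

pred = rf.predict(X_val)
rmse = np.sqrt(mean_squared_error(y_val, pred))   # magnitude of the typical error
r2 = r2_score(y_val, pred)                        # share of variance explained
print(f"RMSE = {rmse:.4f}, R^2 = {r2:.4f}")
```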
Finally, the optimal model was chosen to forecast carbon emissions per capita for the next 5-10 years. For that purpose, a test dataset was constructed using the names of the nations, entries for the period 2023 to 2030, and each country's most recent fuel, oil, and coal consumption as predictor variables. The assumption was that all the carbon-emitting factors play a vital role in the rise of overall CO2 emissions throughout the globe. For this test, it was also assumed that these factors remain constant over the upcoming decade, which was one of the few conceivable alternatives because no future data for these factors were available for the predicted years.
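Under the constant-consumption assumption stated above, the 2023-2030 forecast frame can be sketched as follows. It assumes a hypothetical pre-encoding dataframe df_raw holding the country, year, and consumption columns; names are illustrative and do not reproduce the study's code.

```python
# Minimal sketch of building the future (2023-2030) design matrix.
import pandas as pd

latest = (df_raw.sort_values("year")
                .groupby("country", as_index=False)
                .last()                                   # most recent observed consumption per country
                .drop(columns=["year", "co2_per_capita"]))

# Repeat each country's latest consumption for every forecast year.
future = latest.merge(pd.DataFrame({"year": range(2023, 2031)}), how="cross")

# Re-apply the training-time dummy encoding and align columns before calling rf.predict(...).
future_X = (pd.get_dummies(future, columns=["country"], prefix="country")
            .reindex(columns=X.columns, fill_value=0))
```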
The work by Kadam and Vijayumar (2018), which is the most relevant to the ongoing study, employs simply regression analysis using the previous year's carbon dioxide emissions to estimate the following year's emissions. Four models are adopted in this research, including Lasso and Ridge Regression, Random Forest, and Gradient Boosting, out of which the top-performing framework based on RMSE and R-squared is chosen for prediction. Furthermore, rather than the absolute carbon dioxide emissions used in that research, the per capita carbon emissions of each nation from 1990 onwards are employed here. This study also utilizes the names of the countries as an exogenous construct to estimate the per capita carbon emissions of each of the nations. Moreover, the most recent research offers forecasts for a single nation, India, whereas this study intends to forecast emissions for all countries across the world. Finally, the current study used 10-fold cross-validation to determine the optimal variables for each of the algorithms.

Data Exploration

As noted in Figure 2, over the period of observation, the highest carbon emission per capita, approximately 67.0124, was associated with Qatar.
Figure 7: Annual CO2 emissions
Table 1 below provides an overview of the countries with the highest average CO2 emissions per capita between 2000 and 2013.
Table 1: Countries with the highest average CO2 emissions per capita
Between 1990 and 2018, Kuwait, Qatar, and the United Arab Emirates (UAE) had the greatest annual CO2 emissions per capita. However, as seen in Figure 3, Qatar shows a decreasing trend in the quantity of carbon dioxide released over time, while Kuwait shows a slight upward trend.
Figure 8: Countries colored by their CO2 emissions on a geographic map
From Table 2 below, it can be observed that the RF, or Random Forest, model achieves the lowest root mean squared error among all the models, with an RMSE of approximately 1.2164 when predicting CO2 emissions on the validation data. The model is also capable of explaining around 98.65% of the variation in carbon dioxide emissions. This performance on the test data makes the random forest the best-performing of the models considered here; the regression model proposed in the earlier study reported an RMSE of around 0.2557, though on absolute rather than per capita emissions, so the two figures are not directly comparable.
Table 2: Model performance on the validation data
Therefore, the Random Forest model was proposed for forecasting the CO2 emissions between 2023 and 2030. Table 3 below provides an overview of the forecasts of the top 10 countries that are predicted to have the highest carbon emissions between 2023 and 2030. Figure 4 provides an overview of the forecasts per country in 2030.
Conclusion
In conclusion, climate change is a major concern owing to the impacts of rising temperatures caused by the increase in the amount of greenhouse gas in the atmosphere. The relentless burning of fuels, coal, and oil, together with growing electricity consumption, causes icebergs to melt, and heat surges and severe storms are no longer unusual occurrences. Carbon dioxide is hence identified as a primary cause of the reported warming of the planet; as a result, an increase in CO2 levels will typically lead to further climatic change. According to the conclusions of this analysis, Qatar is a top carbon dioxide emitter, but with a decreasing trend showing that the government has implemented carbon-cutting efforts. This may not be the case for nations such as Saudi Arabia, Bahrain, and Oman, which are predicted to increase their carbon emissions into the air over the next decade.
Reference List
Adjuik, T.A. and Davis, S.C., 2022. Machine Learning Approach to Simulate Soil CO2 Fluxes under Cropping Systems. Agronomy, 12(1), p.197.
Amarpuri, L., Yadav, N., Kumar, G. and Agrawal, S., 2019, August. Prediction of CO2 emissions using deep learning hybrid approach: A Case Study in Indian Context. In 2019 Twelfth International Conference on Contemporary Computing (IC3) (pp. 1-6). IEEE.
Amrhein, V., Trafimow, D. and Greenland, S., 2019. Inferential statistics as descriptive statistics: There is no replication crisis if we don’t expect replication. The American Statistician, 73(sup1), pp.262-270.
Arashi, M., Saleh, A.M.E. and Kibria, B.G., 2019. Theory of ridge regression estimation with applications. John Wiley & Sons.
Ballantyne, A.G., 2018. Exploring the Role of Visualization in Climate Change Communication–an Audience Perspective (Vol. 744). Linköping University Electronic Press.
Bentéjac, C., Csörgő, A. and Martínez-Muñoz, G., 2021. A comparative analysis of gradient boosting algorithms. Artificial Intelligence Review, 54(3), pp.1937-1967.
Bracco, A., Falasca, F., Nenes, A., Fountalis, I. and Dovrolis, C., 2018. Advancing climate science with knowledge-discovery through data mining. npj Climate and Atmospheric Science, 1(1), pp.1-6.
Chen, Z.T., Liu, H.Y., Xu, C.Y., Wu, X.C., Liang, B.Y., Cao, J. and Chen, D., 2022. Deep learning projects future warming-induced vegetation growth changes under SSP scenarios. Advances in Climate Change Research, 13(2), pp.251-257.
Faghmous, J.H. and Kumar, V., 2014. A big data guide to understanding climate change: The case for theory-guided data science. Big data, 2(3), pp.155-163.
Ford, J.D., Tilleard, S.E., Berrang-Ford, L., Araos, M., Biesbroek, R., Lesnikowski, A.C., MacDonald, G.K., Hsu, A., Chen, C. and Bizikova, L., 2016. Big data has big potential for applications to climate change adaptation. Proceedings of the National Academy of Sciences, 113(39), pp.10729-10732.
Gana, R., 2020. Ridge regression and the Lasso: how do they do as finders of significant regressors and their multipliers?. Communications in Statistics-Simulation and Computation, pp.1-35.
Glavovic, B.C., Smith, T.F. and White, I., 2021. The tragedy of climate change science. Climate and Development, pp.1-5.
Harold, J., Lorenzoni, I., Shipley, T.F. and Coventry, K.R., 2020. Communication of IPCC visuals: IPCC authors’ views and assessments of visual complexity. Climatic Change, 158(2), pp.255-270.
Hassani, H., Huang, X. and Silva, E., 2019. Big data and climate change. Big Data and Cognitive Computing, 3(1), p.12.
IPCC, 1992. Climate Change: The 1990 and 1992 IPCC assessments, s.l.: IPCC.
Kadam, P. and Vijayumar, S., 2018, April. Prediction model: CO2 emission using machine learning. In 2018 3rd International Conference for Convergence in Technology (I2CT) (pp. 1-3). IEEE.
Kim, H., 2021. Technologies for adapting to climate change: A case study of Korean cities and implications for Latin American cities.
Leary, J., Rubinstein, B. and Whitton, H., 2020. Is US government inaction on climate change a breach of the Constitution?, s.l.: Herbert Smith Freehills.
Lynas, M., 2022. Clock is ticking in race to slow carbon dioxide emissions, scientists warn, s.l.: Alliance for Science.
Mahony, M. and Hulme, M., 2012. The colour of risk: An exploration of the IPCC’s “burning embers” diagram. Spontaneous Generations: A Journal for the History and Philosophy of Science, 6(1), pp.75-89.
Manogaran, G. and Lopez, D., 2018. Spatial cumulative sum algorithm with big data analytics for climate change detection. Computers & Electrical Engineering, 65, pp.207-221.
Marshall, M., 2006. Timeline: Climate Change, s.l.: New Scientist.
McGrath, M., 2022. Climate change: IPCC scientists say it’s ‘now or never’ to limit warming, s.l.: BBC.
McInerny, G.J., Chen, M., Freeman, R., Gavaghan, D., Meyer, M., Rowland, F., Spiegelhalter, D.J., Stefaner, M., Tessarolo, G. and Hortal, J., 2014. Information visualisation for science and policy: engaging users and avoiding bias. Trends in ecology & evolution, 29(3), pp.148-157.
Milman, O., 2013. Carbon emissions must be cut significantly by 2020, says UN report. United Nations University/The Guardian.
Mintzer, I. M., 1990. Energy, Greenhouse Gases, and Climate Change. Annual Review of Energy, 15(1990), pp. 513-550.
OECD, 2021. Climate change: Consequences of inaction, s.l.
Probst, P., Wright, M.N. and Boulesteix, A.L., 2019. Hyperparameters and tuning strategies for random forest. Wiley Interdisciplinary Reviews: data mining and knowledge discovery, 9(3), p.e1301.
Ramos, M.P., Tasinaffo, P.M., Cunha, A.M., Silva, D.A., Gonçalves, G.S. and Dias, L.A.V., 2022. A canonical model for seasonal climate prediction using Big Data. Journal of Big Data, 9(1), pp.1-25.
Rao, M., 2021. Machine Learning in Estimating CO2 Emissions from Electricity Generation. In Engineering Problems-Uncertainties, Constraints and Optimization Techniques. IntechOpen.
Singh, M. and Dubey, R., 2021. Deep Learning Model Based CO2 Emissions Prediction using Vehicle Telematics Sensors Data. IEEE Transactions on Intelligent Vehicles.
Stapor, K., 2020. Descriptive and inferential statistics. In Introduction to Probabilistic and Statistical Methods with Examples in R (pp. 63-131). Springer, Cham.
Stein, C., 2022. Data science has a key role to play in climate change., s.l.: Columbia University Data Science Institute.
Thanh, H.V. and Lee, K.K., 2022. Application of machine learning to predict CO2 trapping performance in deep saline aquifers. Energy, 239, p.122457.
The European Space Agency, 2021. Giant iceberg breaks off Brunt Ice Shelf in Antarctica, s.l.
The Nature Conservancy, 2022. The Latest IPCC Report: What is it and why does it matter?, s.l.
UNEP, 2013. The emissions gap report 2013: A UNEP Synthesis Report , s.l.
United Nations, 2022. UN climate report: It’s ‘now or never’ to limit global warming to 1.5 degrees, s.l.
Sebestyén, V., Czvetkó, T. and Abonyi, J., 2021. The applicability of big data in climate change research: the importance of system of systems thinking. Frontiers in Environmental Science, p.70.
Whitney, C. R., 1990. Scientists Urge Rapid Action on Global Warming, s.l.: The New York Times.
Xexakis, G. and Trutnevyte, E., 2021. Empirical testing of the visualizations of climate change mitigation scenarios with citizens: a comparison among Germany, Poland, and France. Global Environmental Change, 70, p.102324.