Benchmarking GHG Emissions Forecasting Models for Global Climate Policy

Abstract: Climate change and the fight against pollution have become prominent global concerns in the twenty-first century. In this context, accurate estimates of polluting emissions and their evolution are critical for robust policy-making and, ultimately, for solving pressing global climate challenges. The primary objective of this study is therefore to produce more accurate forecasts of greenhouse gas (GHG) emissions. This in turn contributes to the timely evaluation of the progress achieved towards meeting the global climate goals set by international agendas and also acts as an early-warning system. We forecast the evolution of GHG emissions in the 12 top polluting economies by using data for the 1970–2018 period and employing six econometric and machine-learning models (the exponential smoothing state-space model (ETS), the Holt–Winters model (HW), the TBATS model, the ARIMA model, the structural time series model (STS), and the neural network autoregression model (NNAR)), along with a naive model. A battery of robustness checks is performed. Results confirm a priori expectations and consistently indicate that the NNAR model presents the best out-of-sample forecasting performance for GHG emissions at different forecasting horizons, reporting the lowest average RMSE (root mean square error) and MASE (mean absolute scaled error) within the array of predictive models. Predictions made by the NNAR model indicate that total GHG emissions are projected to increase by 3.67% on average among the world's 12 most polluting countries until 2030. Only four top polluters are projected to record decreases in total GHG emissions in the coming decades (i.e., Canada, the Russian Federation, the US, and China), although their emission levels will remain in the upper decile. Emission increases in a handful of developing economies (a 22.75% increase in total GHG emissions in Brazil, 15.75% in Indonesia, and 7.45% in India) are expected to offset the modest decreases projected for these four countries. Our findings therefore suggest that the world's top polluters cannot meet the pollution reduction targets assumed in the form of NDCs under the Paris Agreement. Results thus highlight the necessity for more impactful policies and measures to bring the set targets within reach.


Introduction
Along with climate change, air pollution is one of the most serious environmental hazards to human health, estimated to cause 7 million premature deaths per year [1]. The economic consequences of air pollution are dire, as estimates indicate $5 trillion in welfare losses and $225 billion in lost income [2,3].
Air pollution includes greenhouse gas (GHG) emissions that warm the earth's surface and atmosphere [4]. GHG refer to the sum of seven gases that have direct effects on climate change: carbon dioxide (CO2), methane (CH4), nitrous oxide (N2O), chlorofluorocarbons
Electronics 2021, 10, 3149 4 of 28
This incongruence between policy targets and the current reality is particularly worrying given that top polluters continue to show significant increases in total GHG emissions, and it highlights the necessity of more impactful policies and measures to bring the set targets within reach. Consequently, accurate and robust forecasts for polluting emissions are needed for an effective and efficient policy-making process. The issue is timely, as countries must juggle post-pandemic recovery while bending emission trends [13]. However, the task is particularly challenging, as the world should halve annual greenhouse gas emissions in the next eight years to keep global warming below 1.5 °C this century, and thus meet the aspirational goal of the Paris Agreement [14]. Other studies report the need for a cut of total GHG emissions by 7.6 percent each year between 2020 and 2030 to stay on track toward the 1.5 °C temperature goal of the Paris Agreement [15]. Current statistics show a rapid recovery of economic activity and increasing emissions as energy demand soars [16].
Unsurprisingly, polluting emissions have steadily drawn the attention of academics and policymakers over the past decades, and national and international agencies increasingly employ forecasts of polluting emissions in their policy-making process. Consequently, producing accurate estimates for GHG emissions and their evolution is critical for robust policy-making processes and ultimately for solving global climate challenges [17]. This in turn is an important motivator for this study, which intends to identify the over-performing predictive model in terms of forecasting accuracy for total GHG emissions and subsequently apply it for producing forecasts for GHG emissions in top-emitting countries over long forecasting horizons, covering the first benchmark set for individual pledges within the Paris agreement, i.e., 2030.
Unlike most studies in the existing literature that investigate driving factors for polluting emissions, we take a univariate approach. This further brings two important advantages. First, it eliminates the challenge of identifying the right mix of macroeconomic, social, and financial variables that are potential impact factors for polluting emissions, and thus eliminates the risk of model misspecification, with further gains in terms of increasing estimation efficiency. Second, and most importantly, our approach allows us to produce forecasts for a validated leading indicator, independent of other variables.
In light of the above, this study makes several contributions to the extant literature.
First, we employ a wider variety of candidate predictive models, including econometric and machine-learning methods, and perform a battery of robustness checks to ensure that the best-performing out-of-sample forecasting model is identified. As we are more concerned with prediction accuracy than with in-sample information, and in light of the previous literature, we expect a priori that machine-learning methods will outperform the alternatives.
Secondly, we use a more relevant metric for air pollution, GHG emissions, instead of the CO2 emissions that are usually employed in previous studies. Because total GHG emissions are a more complete indicator of air pollution (CO2 emissions account for approximately 76 percent of total GHG emissions, according to the Center for Climate and Energy Solutions [18]), estimation results are more relevant for policymakers. To this end, this study uses data for the 1970-2018 period provided by the World Development Indicators (WDI) database of the World Bank.
Thirdly, unlike most of the aforementioned previous studies that focus only on a single country or cover at most a handful of economies, this study includes the 12 most polluting countries in the world, which are responsible together for around 75% of total GHG emissions at world level. This contributes to assuring the robustness of the forecasting method and further increases the relevance of results for policymakers.
Results of this study confirm prior expectations and find that overall on average, the neural network autoregression model (NNAR) presents the best out-of-sample forecasting performance for GHG emissions over a long forecasting horizon by reporting the lowest average RMSE within the array of predictive models. Results further show that the world's top polluters will not meet assumed GHG emissions' reduction targets under the Paris agreement, and thus more impactful policies and measures are needed to bring the set goals within reach.
The remainder of the paper is organized as follows. The next section gives an overview of the related literature. Next, Section 3 explains the data and methodology employed in the empirical investigation, while Section 4 presents and discusses the estimation results and the performed robustness checks. Finally, Section 5 concludes the study.

Literature Review
The environmental Kuznets curve (EKC) theory [19,20] states that pollution rises with economic expansion until a certain level of wealth is achieved, at which point emissions begin to decline, implying an inverted U-shaped link between environmental degradation and income [21]. Overall, previous research that looked at the presence of the EKC in different countries and across different time periods obtained mixed results [22]. As a result, the question of how economic growth and environmental quality are related (i.e., the form of the environmental Kuznets curve) continues to be contentious [23]. On the one hand, the EKC hypothesis has been validated empirically by numerous studies (among others, [6,24-26]). On the other hand, a bidirectional causality has also been repeatedly encountered [27], suggesting that emissions can also be a leading indicator of growth.
Moreover, besides its proven impact on economic growth, air pollution has a substantial influence on public health [28]. Hence, previous studies confirmed that polluting emissions are also a leading indicator for various health variables [29] and for mortality [30,31]. These effects have been found in both long-term studies, which have followed cohorts of exposed individuals over time, and in studies that connect day-to-day fluctuations in air pollution and health [32]. Moreover, there is mounting evidence that indoor air pollution is a severe concern to human health in addition to ambient air quality, particularly in low-income nations where biomass fuels are still used as an energy source [33]. All these findings further highlight the importance of combating climate change.
As such, given its validated role as an impact factor for important socio-economic variables, the primary objective of this study is to produce more accurate forecasts of GHG emissions. This in turn contributes to the timely evaluation of the progress achieved toward meeting global climate goals set by international agendas and also acts as an early-warning system when projections show that the state of affairs does not reflect policy statements and formal pledges are not followed by concrete measures and results. Hence, results of this study are also important for policymakers to incorporate forecasts of polluting emissions in their policy-making process.
However, time series analysis and forecasting remain challenging tasks [34], and air pollution prediction is no exception [35]. Broadly, based on the work of [36], prediction models pertain to two main cultures or schools of thought [37], each with its benefits and drawbacks [38]: (i) econometrics, or statistical methods, a category that covers many familiar models [39], and (ii) machine learning (self-learning systems, capable of learning from data to improve their performance). Their two common goals, information and predictability [40], are prioritized differently, with statistical methods focusing on inference, whereas machine-learning techniques concentrate on prediction [41]. As the British statistician George Box famously put it: "All models are wrong, but some are useful." Consequently, the aim in time-series forecasting should be to identify the best predictive model within a pool of candidates and employ it to produce forecasts for the series of interest. This study does not deviate from this goal. Previous studies that attempt to model and forecast univariate polluting emission time series (most often CO2) primarily employ statistical methods, including the logistic equation [42], ARIMA [43], and ARIMA, Holt-Winters, exponential smoothing, and singular spectrum analysis (SSA) [44]. In the second category, we encounter, among others, [45], which uses extreme learning machines based on particle swarm optimization to predict CO2 emissions in Hebei; [46], which uses an artificial neural network (ANN) to predict carbon emission intensity for Australia, Brazil, China, India, and the USA; and [47], which employs a neural network model for forecasting the CO2 emissions produced by the cereal sector in a region of southern Italy. Overall, previous studies confirm that nonlinear models can capture the nonlinear patterns of real-world data, and thus overcome the limitations of linear models, improving prediction performance [48].
Additionally, artificial neural networks (ANN) are found to be useful in time series modeling where past values of a variable of interest are used to determine its future values [49].
In this study, we attempt to forecast the evolution of GHG emissions by employing six candidate models belonging to both the aforementioned categories. As such, we estimate the innovations state-space models for exponential smoothing (ETS), the Holt-Winters (HW) model, the autoregressive integrated moving average (ARIMA) model, the trigonometric ETS state-space model with Box-Cox transformation, ARMA errors, trend and seasonal components (TBATS) model, the structural time series model (STS) and the neural network autoregression (NNAR) model. Additionally, a naive model is also employed for comparative purposes.
Similar approaches in the literature, applied to other time series, include the study by [50], which estimates and reports the forecasting performance of nine models for the price of gold, concluding that, on average, the exponential smoothing model provides the best forecasts in terms of the lowest root mean squared error. Similarly, ref. [51] uses seven automated forecasting techniques, including statistical and machine-learning models, for explaining and predicting the evolution of CO2 emissions in Bahrain, and identifies the NNAR model as providing the most accurate out-of-sample forecasts. More recently, ref. [34] also predicted the evolution of Bahrain's CO2 emissions by employing a neural network time series nonlinear autoregressive model, the Gaussian process regression model, and Holt's method, and likewise found that the NNAR model outperforms the other candidates. Ref. [52] also employs four of the techniques applied in this investigation (i.e., ARIMA, ETS, NNAR, and TBATS), along with their feasible hybrid combinations, to forecast the second wave of COVID-19 hospitalizations in Italy, concluding that the best single models were NNAR and ARIMA, and that the best hybrid models always included an NNAR process. Finally, ref. [53] employs statistical and deep learning methods to forecast long-term pollution trends for the two categories of particulate matter (PM) in a major city in eastern India, i.e., Kolkata. The authors conclude that statistical methods (i.e., auto-regressive (AR), seasonal auto-regressive integrated moving average (SARIMA), and Holt-Winters) outperform deep learning methods for their data. However, they argue that the results might be due to the limited data available, and that with a larger quantity of data, a higher frequency, and a longer forecasting horizon, deep-learning models would outperform.
All of these works bring important results to the literature on the global climate fight. However, most of them have a narrow scope (i.e., most are single-country studies, as seen above) and, most importantly, they do not strongly defend the robustness of their results. The vast majority stop at evaluating the predictive ability of alternative models by reporting various forecasting accuracy metrics: [34] employs the root mean square error (RMSE) to this end, whereas [53] estimates both RMSE and MAE, and [52] reports the MAE, MAPE, MASE, and RMSE metrics. Nonetheless, except for [51], which reports the KSPA test, these studies do not estimate and present statistical tests for multiple forecast comparisons, and thus do not investigate whether the forecasts are significantly different. Additionally, none of these previous works have re-estimated the models by employing an alternative forecasting technique (i.e., recursive window, changing window length, various time series splitting rules, etc.).
In this study, the robustness of results is assured firstly by employing out-of-sample forecasting on a holdout sample of observations and investigating the accuracy of several forecasting methods in comparative perspective, then by reassessing the predictive ability of candidate models via the recursive window forecasting technique, and finally by performing all estimations for 12 different top polluting countries, responsible for around three quarters of total GHG emissions at world level. Moreover, applying the Kolmogorov-Smirnov predictive accuracy (KSPA) test proposed by [54] and the Diebold-Mariano (DM) test introduced by [55] and developed by [56] further contributes to testing the over-performance of the best predictive model and supports the robustness of our results.
Additionally, a further advantage of our approach is that employing standard econometric methods together with machine-learning techniques in estimations and predictions allows comparison with previous results from the literature.
In the first stage, GHG data were extracted from the WDI for all countries included in the database, resulting in a sample of 205 individual economies. We then removed countries for which data were not available for the entire period, resulting in a final sample of 175 countries and 8575 annual observations included in the analysis.
An exploratory analysis aimed at uncovering the state of affairs was subsequently performed. As of 2018, the world's top greenhouse gas (GHG) emitters in absolute terms are China, the United States, India, the Russian Federation, Japan, and Brazil. The 20 top polluters reflected in Table 1 belong to all income categories: 50% are high-income countries, 35% are upper-middle-income countries, and 15% are lower-middle-income economies (i.e., India, Indonesia, and Pakistan). This confirms that high polluting emissions are a problem across the development divide [57]. The rate of GHG emissions growth is highest in Korea, with an alarming 711% increase over 1970-2018, followed by China with 559% and Iran with 484% over the same period. Only five of the world's top polluters (i.e., Brazil, Germany, UK, France, and Poland) register a decrease in emissions since 1970, with overall modest rates of decrease (emissions have fallen the most in the UK, by 46%). Although these (mostly) developed countries have shown a downward trend in overall emissions, their levels remain in the upper decile as of 2018 (for Brazil, Germany, UK, and France), while Poland is in the 8th decile of world countries ranked by GHG emissions in 2018. Over the 49-year period, the top 20 world polluters recorded an enormous 183.34% growth in GHG emissions, whereas the world average (including the top polluters) is 67% over the same period, as shown in Figure 1. The disparities in emissions growth are also reflected in Figure 3. Equally troublesome, only six countries have actually reduced their GHG emissions in the aftermath of the Paris Agreement (i.e., Japan, Brazil, Germany, UK, France, and Italy). All other top polluters continue to register increases in emissions since 2015, with Pakistan, Turkey, and Indonesia showing the highest growth levels.

GHG Emissions by Country, Top Polluters and Historical Trends
Figure 4 confirms that a small handful of nations account for the majority of global greenhouse gas emissions. On an absolute basis, China, the United States, and India are the three largest emitters. Together, they account for 48% of 2018 global GHG emissions. The 12 most polluting countries produce overall around three quarters of total GHG emissions at the world level, while the other 163 countries included in the analysis are responsible together for 26% of total greenhouse gas emissions in 2018. This underlines that a minority of countries create a global problem with systemic consequences. This in turn further motivates us to focus on the 12 top emitting countries in our investigation.

Method
Firstly, Appendix A presents the notations and definitions that are employed in the empirical investigations.

Forecasting Technique
This study implements a holdout technique to compare and select an optimal model for forecasting GHG emissions in 12 countries. This technique requires the division of the historical data series of length N_i, i ∈ {1, ..., 12}, into two subsets corresponding to a training (or fit) period and a test period. For our purposes, the data up until 2013 (i.e., approximately 90% of observations) are used in-sample for model training and validation, whilst the period covering 2014-2018 (i.e., 10% of observations) is set aside for testing the out-of-sample forecasting accuracy of the predictive models. The last observation in the training interval, S_i, is thus the forecasting origin (here, GHG emissions in 2013), whereas the period that is predicted (here, 2014-2018) represents the forecasting horizon or lead time, equal to N_i − S_i [58]. Figure 5 depicts the holdout forecasting technique employed in this study.
As most NDCs under the Paris Agreement specify the year 2030 as the first deadline for emissions reduction, we are particularly concerned with identifying the best predictive model within a pool of seven candidates and subsequently using it to provide h = 12 steps-ahead forecasts for GHG emissions in the 12 top polluting countries, thus including this first deadline in the forecasting horizon.
R software is employed to implement the method and estimate the predictive models via automatic forecasting algorithms, mainly included in the "forecast" package [59] and the "stats" package [60].
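The study itself is implemented in R, so the following Python fragment is only a minimal sketch of the holdout split described above. The series values are synthetic placeholders (not WDI data), and the names `FORECAST_ORIGIN` and `naive_forecast` are illustrative assumptions.

```python
# Illustrative holdout split for one annual series.
# The values below are made up; they are NOT the WDI emissions data.
years = list(range(1970, 2019))                      # 49 annual observations
series = [100.0 + 2.5 * i for i in range(len(years))]

FORECAST_ORIGIN = 2013                               # last year of the training period
split = years.index(FORECAST_ORIGIN) + 1             # S: number of training observations

train, test = series[:split], series[split:]
horizon = len(test)                                  # N - S steps in the test period


def naive_forecast(history, h):
    """Flat-line benchmark: repeat the last training observation h times."""
    return [history[-1]] * h


point_forecasts = naive_forecast(train, horizon)
print(len(train), horizon, round(len(train) / len(series), 3))
```

With 49 annual observations, a 2013 origin leaves 44 years for training and 5 for testing, which is the roughly 90/10 division the text describes.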

Robustness Checks
Forecasting accuracy: The forecasting accuracy of all candidate models for each of the 12 series is assessed by estimating the root mean squared error (RMSE), as in [44,55]. This accuracy metric brings the valuable benefit of being directly interpretable in terms of measurement units. RMSE is the square root of the mean squared error, and is thus estimated by taking the difference between each point forecast and the corresponding observed value within the lead time, squaring it, averaging the squared differences, and taking the square root, as in Equation (1):
$$\mathrm{RMSE} = \sqrt{\frac{1}{N - S} \sum_{t=S+1}^{N} \left( y_t - \hat{y}_t \right)^2} \qquad (1)$$
where y_t is the observed value and \hat{y}_t the point forecast for period t, S is the forecasting origin, and N is the total number of observations.
RMSE, like many other goodness-of-fit (GoF) metrics, is scale-dependent [61]. Within the scale-dependent category of GoFs, RMSE and the mean absolute error (MAE) have emerged as the most popular. RMSE carries some benefits relative to MAE and is usually the recommended metric [62], although it cannot be used to measure out-of-sample forecast accuracy at a single forecast horizon [63] when multiple series with different measurement units are analyzed. To solve this issue, ref. [63] proposed a new GoF metric, the mean absolute scaled error (MASE), a scale-free error metric, which we also report in this study for robustness purposes. MASE is estimated by taking the MAE and dividing it by the MAE of an in-sample naive benchmark, as in Equation (2):
$$\mathrm{MASE} = \frac{\frac{1}{N - S} \sum_{t=S+1}^{N} \left| y_t - \hat{y}_t \right|}{\frac{1}{S - 1} \sum_{t=2}^{S} \left| y_t - y_{t-1} \right|} \qquad (2)$$
The MASE metric is symmetrical and resistant to outliers, and values larger than 1 imply that the predictions are, on average, poorer than the naïve model's in-sample one-step forecasts [63]. The MASE would only be infinite or undefined if all historical observations were equal or if all of the actual values throughout the in-sample period were zeros [64].
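As a hedged illustration of the two accuracy metrics, the sketch below computes RMSE and MASE for a toy series. The function names and the toy numbers are assumptions made for the example. The scaling term follows the verbal definition in the text (the in-sample MAE of one-step naive forecasts).

```python
import math


def rmse(actual, forecast):
    """Root mean squared error over the test period."""
    errors = [a - f for a, f in zip(actual, forecast)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mase(actual, forecast, train):
    """Out-of-sample MAE scaled by the in-sample MAE of one-step naive forecasts."""
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    naive_mae = sum(abs(train[t] - train[t - 1]) for t in range(1, len(train)))
    naive_mae /= len(train) - 1
    if naive_mae == 0:
        # Happens when all training observations are equal: MASE is undefined.
        raise ValueError("MASE is undefined for a constant training series")
    return mae / naive_mae


# Toy numbers (hypothetical, for illustration only).
train = [10.0, 12.0, 11.0, 13.0, 14.0]
actual = [15.0, 17.0]
forecast = [14.0, 14.0]   # flat naive forecast from the last training value

print(rmse(actual, forecast), mase(actual, forecast, train))
```

A MASE above 1 in this toy case means the test-period forecasts are, on average, worse than the naive one-step forecasts were within the training sample.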
The recursive window forecasting technique: To further assure the robustness of our results, all estimations for the 12 time series are repeated by implementing one of the most popular techniques for cross-validation, a fixed-length rolling-window forecasting technique.
As such, the dataset covers the training period set for the first S observations (i.e., 44 years) in the sample, and a testing period of length N-S, where N is the total number of observations for each country, i.e., 49. For each year n in the testing interval [S+1:N], or here [2014:2018], the GHG emissions are predicted after the candidate models have been fit on the recursive window of S past observations. This sequence is repeated recursively over the lead-time, and consequently a total of N-S iterations (5) are performed for each of the 12 time series. Figure 6 illustrates this process applied for the current investigation.
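The windowing logic can be sketched as follows. This is a simplified reading of the procedure, assuming one-step-ahead forecasts from a window of fixed length S that slides forward by one year at each iteration. The two stand-in models (`naive`, `drift`) and the synthetic series are assumptions, not the candidate models estimated in the study.

```python
def rolling_window_forecasts(series, window, one_step_model):
    """One-step-ahead forecasts, each fit on the most recent `window` observations."""
    forecasts = []
    for end in range(window, len(series)):
        history = series[end - window:end]    # fixed-length window ending just before the target
        forecasts.append(one_step_model(history))
    return forecasts


def naive(history):
    """Stand-in model: next value equals the last observed value."""
    return history[-1]


def drift(history):
    """Stand-in model: extend the average change observed over the window."""
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return history[-1] + slope


# Synthetic series with N = 49 observations and a window of S = 44.
series = [2.0 * i for i in range(49)]
window = 44

naive_fc = rolling_window_forecasts(series, window, naive)
drift_fc = rolling_window_forecasts(series, window, drift)
print(len(naive_fc), naive_fc, drift_fc)
```

With N = 49 and S = 44 the loop runs N − S = 5 times, matching the number of iterations reported in the text.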
The estimation of the ETS model is fully automated through the "forecast" package in R and, together with ARIMA models, ETS is the base model for the most popular automatic forecasting algorithms [66]. In this study, the system is instructed to automatically select the error, trend, and seasonal types, and to apply the corrected Akaike information criterion (AICc) for model selection. Hence, following the terminology of [59,65], we specify the three-character string identifying the method as (Z,Z,Z).
The Holt-Winters Model (HW): The HW model was introduced in the late 1950s and early 1960s by [67,68]. It applies three exponential smoothing formulae to the time series: to the mean, trend, and each seasonal sub-series, respectively [69]. In this study, the estimation of the HW model for the 12 time series is automated through the "HoltWinters" function included in the "stats" package in R software. It computes Holt-Winters filtering of a given time series, and identifies unknown parameters by minimizing the squared prediction error.
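To make the idea of choosing smoothing parameters by minimizing the squared prediction error concrete, here is a minimal sketch of Holt's linear-trend method (the non-seasonal case, which suits annual data). It is not the R "HoltWinters" routine: the coarse grid search, the start values, and all function names are assumptions chosen for brevity.

```python
def holt_sse(series, alpha, beta):
    """Sum of squared one-step errors, plus the final level and trend."""
    # Start values: level = second observation, trend = first difference (an assumption).
    level, trend = series[1], series[1] - series[0]
    sse = 0.0
    for y in series[2:]:
        prediction = level + trend
        sse += (y - prediction) ** 2
        new_level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return sse, level, trend


def fit_holt(series, grid_steps=20):
    """Pick (alpha, beta) on a coarse grid by minimizing the squared prediction error."""
    best = None
    for i in range(1, grid_steps + 1):
        for j in range(1, grid_steps + 1):
            alpha, beta = i / grid_steps, j / grid_steps
            sse, level, trend = holt_sse(series, alpha, beta)
            if best is None or sse < best[0]:
                best = (sse, alpha, beta, level, trend)
    return best


def holt_forecast(series, h):
    """h-step-ahead point forecasts: last level plus k times the last trend."""
    _, _, _, level, trend = fit_holt(series)
    return [level + (k + 1) * trend for k in range(h)]


# Synthetic, perfectly linear series (hypothetical values).
demo = [5.0 + 3.0 * i for i in range(20)]
print(holt_forecast(demo, 3))
```

A numerical optimizer would normally replace the grid; the grid is used here only to keep the example dependency-free.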
TBATS Model (Exponential Smoothing State Space Model with Box-Cox Transformation, ARMA Errors, Trend, and Seasonal Components): The TBATS model, which is capable of handling multiple and complex seasonality, was introduced by [70]. The TBATS model is fit for the 12 time series through the "forecast" package in R. The fitted model is identified as TBATS(omega, phi, <m_1,k_1>, ..., <m_J,k_J>), where omega is the Box-Cox parameter, phi is the damping parameter, m_1, ..., m_J are the seasonal periods, and k_1, ..., k_J are the corresponding numbers of Fourier terms used for each seasonality. The Box-Cox parameter, the trend, and the damping parameters are automatically selected in our estimations by AIC.
ARIMA: ARIMA models constitute a popular statistical technique for time series forecasting that is capable of describing the autocorrelations in the data. This study applies the automatic ARIMA methodology provided through the "auto.arima" function within the "forecast" package for the R software. As in [66], the function uses unit root tests, minimization of the AICc, and MLE to return the best ARIMA model, through a step-wise automated procedure.
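Both the ETS and the ARIMA selection steps rely on the corrected Akaike information criterion. The sketch below uses the standard textbook definition, AICc = AIC + 2k(k + 1)/(n − k − 1) with AIC = 2k − 2 ln L; the candidate names, log-likelihoods, and parameter counts are invented for illustration, and how each R routine counts parameters is not specified here.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def aicc(log_likelihood, n_params, n_obs):
    """AIC with the small-sample correction term 2k(k + 1) / (n - k - 1)."""
    denominator = n_obs - n_params - 1
    if denominator <= 0:
        raise ValueError("AICc needs n_obs > n_params + 1")
    return aic(log_likelihood, n_params) + 2 * n_params * (n_params + 1) / denominator


# Hypothetical candidates: name -> (log-likelihood, number of estimated parameters).
candidates = {"model_a": (-100.0, 3), "model_b": (-99.5, 5)}
n_obs = 44  # length of the training period used in the study

scores = {name: aicc(ll, k, n_obs) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

With only 44 training observations the correction term is not negligible, which is one reason to prefer AICc over AIC for short annual series.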
Structural Time Series Models (STS): Structural time series models are (linear Gaussian) state-space models for (univariate) time series based on a decomposition of the series into a number of components [71]. STS models can be easily implemented in R through the function "StructTS" in the "stats" package, as in [72]. In this study, the models are fit automatically for the 12 GHG time series by maximum likelihood.
Neural network autoregression model (NNAR): The main predictive model of interest in this study is NNAR, which offers the advantage of adaptability by learning from the provided inputs and training itself to optimize weights. Generally, a neural network autoregression model (NNAR) uses p lagged values of the time series as inputs to a neural network with k hidden nodes, for forecasting the output y(t). The model is thus usually specified as NNAR(p,k), and the hidden nodes are nonlinear functions of the original inputs. The functions that are applied at the nodes of the hidden layers are called activation functions. A more complex specification is needed when the data are seasonal, and the model in this case is written as NNAR(p,P,k), where P is the number of seasonal lags. Figure 7 reflects the general structure of a neural network autoregression model, with its three main layers: the first layer of the autoregressive neural network receives the lagged values of the series (here GHG emissions) as inputs, then a linear combination of the weighted inputs is fed forward to the hidden layer or layers of the network, and finally a nonlinear activation function modifies the result from the hidden layer nodes, which is then passed to the last output layer that contains a single node representing the predicted value.
In equation format, the NNAR model depicted in Figure 7 can be expressed as:
$$Y = f(H), \qquad H = W X + B$$
where Y is the output vector, f is the activation function, H is the vector of nodes in the hidden layer, W represents the weight matrix between the input and the hidden layers, X is the vector of inputs, and B is a bias vector. In this study, the "nnetar" function within the R software "forecast" package is used to automatically fit multilayer feed-forward neural networks with a single hidden layer, k nodes, and p lagged inputs, by automatically selecting parameters p and P through AIC. The algorithm is also instructed to make 25 repetitions and to estimate the number of hidden nodes as k = (p + P + 1)/2 (rounded to the nearest integer). As the initial weights at the input layer take random values and are subsequently updated using the observed data, we follow best practices and train the network 25 times using different random starting weights, and then average the results. Based on previous results (i.e., [51]), we expect NNAR models to outperform the other candidates in terms of forecasting accuracy when applied to GHG emissions series.
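The mechanics of an NNAR(p,k) model with repeated random starts can be sketched in plain Python. This is not the "nnetar" implementation: the sigmoid hidden layer with a linear output node, the standardization of the series, the stochastic gradient descent training loop, and every function name and hyperparameter (`epochs`, `lr`, the weight range) are assumptions made for illustration. The series is assumed to be non-constant so that it can be standardized.

```python
import math
import random


def hidden_nodes(p, P=0):
    """k = (p + P + 1) / 2, rounded; Python's round() sends exact .5 ties to the even integer."""
    return round((p + P + 1) / 2)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, params):
    """Forward pass: lagged inputs -> sigmoid hidden layer -> linear output node."""
    W, b, v, c = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj) for row, bj in zip(W, b)]
    return sum(vj * hj for vj, hj in zip(v, hidden)) + c


def train_network(X, y, k, rng, epochs=100, lr=0.05):
    """Fit one network by stochastic gradient descent from random starting weights."""
    p = len(X[0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(p)] for _ in range(k)]
    b = [rng.uniform(-0.5, 0.5) for _ in range(k)]
    v = [rng.uniform(-0.5, 0.5) for _ in range(k)]
    c = 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            hidden = [sigmoid(sum(w * xi for w, xi in zip(W[j], x)) + b[j]) for j in range(k)]
            err = sum(v[j] * hidden[j] for j in range(k)) + c - target
            for j in range(k):
                grad = err * v[j] * hidden[j] * (1.0 - hidden[j])  # uses v[j] before its update
                v[j] -= lr * err * hidden[j]
                b[j] -= lr * grad
                for i in range(p):
                    W[j][i] -= lr * grad * x[i]
            c -= lr * err
    return W, b, v, c


def nnar_forecast(series, p, repeats=25, seed=0):
    """One-step-ahead forecast averaged over `repeats` networks with different random starts."""
    mean = sum(series) / len(series)
    sd = math.sqrt(sum((s - mean) ** 2 for s in series) / (len(series) - 1))
    z = [(s - mean) / sd for s in series]                        # standardize the series
    X = [z[t - p:t][::-1] for t in range(p, len(z))]             # rows: y(t-1), ..., y(t-p)
    y = z[p:]
    k = hidden_nodes(p)
    rng = random.Random(seed)
    last = z[-p:][::-1]
    preds = [predict(last, train_network(X, y, k, rng)) for _ in range(repeats)]
    return mean + sd * sum(preds) / repeats                      # back to the original scale


# Synthetic series of 44 values (hypothetical, not emissions data).
demo = [50.0 + 10.0 * math.sin(i / 3.0) + 0.5 * i for i in range(44)]
print(hidden_nodes(2), nnar_forecast(demo, p=2))
```

Averaging over several randomly initialized networks reduces the dependence of the forecast on any single set of starting weights, which is the rationale the text gives for the 25 repetitions.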

The Conceptual Framework
For a clearer view of the implemented method, Figure 8 reflects the consecutive steps that have been taken to estimate the alternative models and produce out-of-sample forecasts.
Figure 7. General structure of the nonlinear autoregressive neural network (NNAR) with one hidden layer.

Next, Figure 9 puts together all building blocks of the research and gives an overview of the work conducted.
Table 2 reports the RMSE for out-of-sample forecasting results at a horizon of h = 5 steps ahead (covering the test period, 2014-2018) for the models described above, along with a "naive" forecasting model, which predicts a flat line equal to the last observation in the training set. Although no single model provides the best forecast of GHG emissions for every country at the five-year horizon, NNAR outperforms the other candidates within the pool of seven competing models. The same conclusion follows from the second goodness-of-fit metric, MASE, which is reported in Appendix C. The overall scoring given by the two metrics is identical. Consequently, when a decision has to be made about relying on a single predictive model for GHG emissions at the selected forecasting horizon, NNAR emerges as the optimal choice.
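The two accuracy metrics and the naive benchmark described above can be sketched in a few lines of Python. This is an illustrative sketch only: the series values are hypothetical, the function names are not taken from the study, and MASE is assumed to use the usual non-seasonal scaling by the in-sample one-step naive mean absolute error.

```python
import math


def naive_forecast(train, h):
    """Flat-line forecast equal to the last observation in the training set."""
    return [train[-1]] * h


def rmse(actual, predicted):
    """Root mean square error over the test window."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, predicted)) / len(actual))


def mase(actual, predicted, train):
    """Mean absolute error scaled by the in-sample one-step naive MAE (assumed definition)."""
    mae = sum(abs(a - f) for a, f in zip(actual, predicted)) / len(actual)
    scale = sum(abs(train[t] - train[t - 1]) for t in range(1, len(train))) / (len(train) - 1)
    return mae / scale


def best_model(rmse_by_model):
    """Name of the model with the lowest RMSE for one country."""
    return min(rmse_by_model, key=rmse_by_model.get)


# Hypothetical annual series in arbitrary units; not data from the study.
train = [100.0, 102.0, 105.0, 107.0, 110.0]
test = [112.0, 113.0, 115.0, 118.0, 120.0]
forecast = naive_forecast(train, h=5)
naive_rmse = rmse(test, forecast)
naive_mase = mase(test, forecast, train)
```

Counting how often `best_model` returns each model across the 12 countries would give a score of the kind reported alongside the RMSE values.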
The STS comes in second in terms of the lowest RMSE, at a significant distance, while the other models are not able to provide competitive forecasts for the evolution of GHG emissions in the 12 top polluting countries considered in this study.
Notes: * Score indicates the number of times the model outperforms the other candidate models in terms of forecasting accuracy; ** Score (%) indicates the percentage of outperformance (out of 12 iterations, or countries); bold values mark the minimum RMSE across the seven candidate predictive models for each country.
Table 3 reports the relative root mean squared error (RRMSE) results for the out-of-sample forecasts, where the best performing forecasting model (i.e., NNAR) acts as a benchmark. The forecasting performance of the neural network model is found to be 28% better than the ETS forecast, 19% better than the ARIMA model, 14% better than STS, 54% better than Holt-Winters, 31% better than TBATS, and 37% better than the naive model for forecasting GHG emissions in the 12 top polluters. Appendix B presents a graphical representation of the forecasting performance, showing the NNAR model's fit to the actual test set data for the 12 countries. It can be seen that the NNAR model, despite its nonlinear nature, fails to follow the complex dynamics of the actual data precisely (owing to their highly nonlinear characteristics) and in some instances (particularly for Brazil, Indonesia, and Canada) is not able to accurately predict the trend over the testing interval.
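As a reading aid for the RRMSE figures, the sketch below shows one common way such percentages are obtained. It assumes that RRMSE is the ratio of the benchmark RMSE to the competitor RMSE, so that a value below 1 means the benchmark is more accurate by (1 - RRMSE) x 100 percent; the exact definition used for Table 3 is not stated in the text, and the input values are hypothetical.

```python
def rrmse(rmse_benchmark, rmse_other):
    """Relative RMSE: benchmark RMSE divided by competitor RMSE (assumed definition)."""
    return rmse_benchmark / rmse_other


def improvement_pct(rmse_benchmark, rmse_other):
    """Percentage by which the benchmark improves on the competitor."""
    return (1.0 - rrmse(rmse_benchmark, rmse_other)) * 100.0


# Hypothetical RMSE values; not results from the study.
rmse_nnar = 30.0
rmse_competitor = 40.0
ratio = rrmse(rmse_nnar, rmse_competitor)
gain = improvement_pct(rmse_nnar, rmse_competitor)
```

With these illustrative inputs the ratio is 0.75, which would be read as the benchmark being 25% more accurate than the competitor.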

Empirical Results
Subsequently, the over-performance of NNAR is tested further, and the robustness of the results reinforced, by applying the Kolmogorov-Smirnov Predictive Accuracy (KSPA) test proposed by [54] and the Diebold-Mariano (DM) test introduced by [55] and developed by [56]. The tests identify significant differences between forecasts produced by NNAR and the second-best performing model in each of the cases where NNAR emerged as the optimal model. In instances where NNAR is not found to over-perform, the KSPA/DM tests are applied to identify the differences between forecasts from NNAR and the specific optimal predictive model. As such, the forecast errors from NNAR and the competing forecasting models are used as inputs to the two-sided KSPA/DM tests, which then identify whether there is a statistically significant difference in the distribution of forecast errors from the two models. Table 4 reports the results of the predictive accuracy tests for each pair of competing models and each country, considering NNAR as the benchmark. When the two-sided predictive accuracy test statistics are significant at 1%, we can reject the null hypothesis and accept the alternative, thus confirming that the forecast errors from NNAR and the other candidate model do not share the same distribution. The KSPA and DM tests confirm for the vast majority of countries that the NNAR forecasting technique provides superior forecasts in comparison to its competitor (the only exceptions are encountered for estimations in the US and Germany). These findings align with those of [51]. In the instances when NNAR is not the optimal model in terms of forecasting accuracy, the predictive accuracy tests generally do not confirm the superiority of the competing model (i.e., for India, Brazil, Indonesia, and Canada).
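The mechanics of the two predictive accuracy tests can be illustrated with a simplified Python sketch. Several assumptions apply: the KSPA statistic is taken to be the two-sample Kolmogorov-Smirnov distance between the absolute forecast errors of the two models; the DM statistic uses squared-error loss in its one-step-ahead form, without the autocovariance correction needed for longer horizons; the p-value relies on a standard normal reference distribution; and the error values are hypothetical. The function names are illustrative and do not correspond to any package.

```python
import math


def ks_two_sample_statistic(errors_a, errors_b):
    """Largest gap between the empirical CDFs of two samples of absolute errors."""
    a = sorted(abs(e) for e in errors_a)
    b = sorted(abs(e) for e in errors_b)
    gap = 0.0
    for x in a + b:
        cdf_a = sum(1 for v in a if v <= x) / len(a)
        cdf_b = sum(1 for v in b if v <= x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def dm_statistic(errors_a, errors_b):
    """Diebold-Mariano statistic, squared-error loss, one-step-ahead form (simplified)."""
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / n
    return mean_d / math.sqrt(var_d / n)


def two_sided_normal_p_value(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical forecast errors for two models over a five-year test window.
errors_model_a = [1.0, -2.0, 1.5, -1.0, 2.0]
errors_model_b = [4.0, -5.0, 6.0, -4.5, 5.5]
ks_stat = ks_two_sample_statistic(errors_model_a, errors_model_b)
dm_stat = dm_statistic(errors_model_a, errors_model_b)
dm_p = two_sided_normal_p_value(dm_stat)
```

A negative DM statistic here means that the first model has the lower average squared error. With only five test observations, asymptotic p-values of this kind should be read with caution.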
Note: * indicates a statistically significant difference between the distribution of forecast errors from the best and second best performing models based on the two-sided HS test at a 1% significance level; ** denotes significance at 5%; *** denotes significance at 10%.
Table 5 confirms the superiority of NNAR throughout further robustness checks, including re-estimation at a forecasting horizon of 3 years and re-estimation with the recursive window forecasting technique while holding h = 5. In the last stage of this investigation, the over-performing predictive model (i.e., NNAR) is fitted to the entire dataset and employed to produce point forecasts for GHG emissions in the 12 countries for the 2019-2030 period (i.e., h = 12). The in-sample fit has also been verified with the Ljung-Box test, which checks the residuals for any significant evidence of non-zero correlations at lags 1-20. Test results confirm that all models are correctly specified. We thus confidently proceed with a discussion of forecasting results. Table 6 contains the point forecasts in absolute terms, whereas Table 7 reflects the percentage change relative to the last year with available data within the dataset (i.e., 2018). On average, results indicate a continuation of the current increasing trend of GHG emissions produced by top polluting countries in the next decade. The NNAR model predicts that the top polluting countries will see an overall increase of 3.67% in GHG emissions relative to 2018 levels, although significant disparities are identified among individual countries. In relative terms, the projections translate into a 22.75% increase for Brazil, a 15.75% increase for Indonesia, and a 7.45% increase for India.
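The Ljung-Box residual check mentioned above is based on the statistic Q = n(n + 2) Σ r_k² / (n − k), summed over lags k = 1, ..., m, where r_k is the lag-k sample autocorrelation of the residuals. The following Python sketch computes Q for a hypothetical residual series; the function names and the data are illustrative, and the comparison with a chi-square critical value is left to the reader.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


# Hypothetical residuals with strong negative lag-1 correlation; not study data.
residuals = [1.0, -1.0] * 5
q_lag1 = ljung_box_q(residuals, max_lag=1)
```

A large Q relative to the chi-square reference distribution signals residual autocorrelation; a well-specified model should produce a small Q over the lags tested (lags 1-20 in the study).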
The only countries that are projected to decrease polluting emissions are Canada (−5.57%), the Russian Federation (−3.01%), the US (−0.76%), and China (−0.89%), although their total GHG emissions remain in the upper decile and fall well short of set pledges.

Discussion of Results
Among all tested models for predicting GHG emissions in the 12 top polluters, the neural network autoregressive model has shown the best forecasting performance. This is in line with [34,51], who reach the same conclusion from forecasting a similar time series (i.e., CO2 emissions) in a single country, and supports the findings of [49], thus confirming that artificial neural networks (ANN) are useful in time series modeling when past values of a variable are used as inputs to explain its future values.
Consequently, similar to the approaches of the aforementioned studies, we issue forecasted values for total GHG emissions in the 12 countries by using the neural network time series nonlinear autoregressive model (NNAR). Overall, we find that both the recent evolution of total GHG emissions in top polluting countries and future projections of emissions fall significantly short of what is needed to achieve set climate goals. Emissions have seen massive increases in Korea, China, and Iran over the 1970-2018 period, due to the rapid economic growth, poverty eradication, and substantial integration into global value chains that characterized these economies over the period [73–76]. However, GHG emissions are projected to increase most in Brazil, Indonesia, and India over the 2019-2030 interval. Our findings conform to those of a recent study by the International Energy Agency [16], which shows that polluting emissions increased in 2020 as economic activity picked up toward the middle of the year, but deviate from the projections of the European Environment Agency [13] concerning the EU country included in the study (i.e., Germany).
However, whereas emissions have continued to increase since 2015 (recording an overall 3.4% increase at the world level and a 3.23% increase among the global top 20 polluters), the Paris Agreement requires yearly cuts of almost 8% on average at the world level to reach the global warming threshold of 1.5 degrees Celsius [15]. Moreover, ref. [77] find that emissions reductions about 80% more ambitious than those in the Paris Agreement are required to stay within the 2 degrees target, thus highlighting that set global warming targets are even further out of reach than previously considered. Ref. [78] also confirm that the current commitments are inadequate to meet temperature targets. Projections of future GHG emissions that emerge from our study confirm that no country is expected to meet its NDCs under the Paris Agreement, which are in any case inadequate in the context of limiting global warming. Given this finding, and considering the catastrophic impact of pollution on public health variables [28,29], including the mortality rate [30,31], the current trend is particularly troubling, and significant efforts should be directed toward its reversal.
Our findings further highlight that more impactful policies are needed to successfully combat global pollution. Considering previous results in the literature that indicate a negative relationship between renewable energy and polluting emissions [79–85], we argue that countries, especially top GHG emitters, should use the recovery funds available in the aftermath of the global COVID-19 pandemic to prioritize sustainable energy policies. This is also in line with the conclusion of [21].
Moreover, given that the bulk of global greenhouse gas emissions has historically come from a few countries, and that this situation is expected to continue for the foreseeable future, the logical solution should be to encourage particular nations to implement specific GHG reduction targets, rather than issuing global policies that cover the entire spectrum of economies. Consequently, the small number of nations that cause a global problem with systemic effects must, in particular, issue and implement ambitious low-carbon policies. We take a similar view to [86], who suggest that although average global reductions are expected, advanced economies should contribute more in terms of emissions reduction, considering their historical contribution to world pollution.

Conclusions
Greenhouse gas (GHG) emissions have risen significantly over the past 49 years at the world level. However, enormous disparities are encountered among individual countries, both in terms of absolute GHG emissions values and in terms of their rate of growth over the 1970-2018 period. On an absolute basis, China, the United States, and India are the three largest emitters. Together, they account for 48% of 2018 global GHG emissions. The 12 most polluting countries together produce around three quarters of total GHG emissions at the world level, while the other 163 countries included in the analysis are together responsible for 26% of total greenhouse gas emissions in 2018. This underlines that a minority of countries create a global problem with systemic consequences and further motivates our focus on the 12 top emitting countries in our investigation.
The primary objective of this study is thus to produce more accurate forecasts of GHG emissions. This in turn contributes to the timely evaluation of the progress achieved towards meeting global climate goals set by international agendas and also acts as an early-warning system when projections show that the state of affairs does not reflect policy statements and that formal pledges are not followed by concrete measures and results. Results of this study are also important for policymakers who incorporate forecasts of polluting emissions in their policy-making process. A policy can only be efficient if it is developed based on robust input elements. Consequently, an accurate estimation of GHG emissions in top polluting countries is not only paramount for an effective policy-making process in the climate combat arena but will also play a vital role in planning economic developments over the long run. The issue is timely, as countries have to pursue post-pandemic economic recovery while bending the emissions trend.
As such, this paper attempts to forecast the evolution of GHG emissions in the world's top polluters by employing six statistical and machine-learning methods, namely the exponential smoothing state-space model (ETS), the Holt-Winters model, the TBATS model, ARIMA, the structural time series model (STS), and the neural network time series forecasting method (NNAR). A naive model is also estimated and serves for comparative purposes. In particular, the study takes a univariate approach that offers the important advantage of producing forecasts for a validated leading indicator independent of other variables, aside from increasing efficiency. The results demonstrate that the best single model in terms of forecasting accuracy for GHG emissions is NNAR, and this finding withstands a battery of robustness checks (including re-estimations at different forecasting horizons and re-estimation by implementing a recursive window forecasting technique). Consequently, the NNAR model is further employed to produce GHG emissions point forecasts for the 12 top polluting countries until 2030, i.e., until the first benchmark under the Paris Agreement.
Although total GHG emissions were expected to decline sharply in the aftermath of the 2015 Paris Agreement and to continue a decreasing trend over the next decades, empirical results indicate that top polluters will see an overall increase of 3.67% in GHG emissions relative to their 2018 levels. However, significant disparities remain among individual countries. Projections from the NNAR model at the 2030 forecasting horizon point to a 22.75% increase in total GHG emissions in Brazil, a 15.75% increase in Indonesia, and a 7.45% increase in India. Decreases in total GHG emissions are expected in Canada (−5.57%), the Russian Federation (−3.01%), the US (−0.76%), and China (−0.89%), although their emissions remain in the upper decile. More importantly, projected GHG levels fall well short of set pledges for all top polluting countries, and none of the 12 sample economies is expected to meet its NDCs under the Paris Agreement.
Overall, this study makes several important contributions to the extant literature, as follows: (i) it employs a wider variety of candidate predictive models for polluting emissions, including econometric and machine-learning methods, and also performs a battery of robustness checks to defend its findings; (ii) it employs a more accurate indicator for air pollution, thus increasing the relevance of its results; (iii) it focuses on the 12 most polluting countries that are together responsible for around 75% of total GHG emissions at the world level, thus further increasing the relevance of the findings relative to single-country/narrower studies.
We conclude that country-specific policies would be more efficient to tackle global pollution than the global approach that is currently being implemented. In addition, a country-specific approach is only fair, given the enormous historical disparities in terms of individual countries' contributions to world pollution, which are expected to persist. Moreover, public policies and the recovery funds directed toward post-pandemic economic recovery should target sustainable energy production and consumption, which in turn mitigate polluting emissions.