Statistical Analysis and Modeling of the CO 2 Series Emitted by Thirty European Countries

: In recent decades, an increase in the earth’s atmospheric temperature has been noticed due to the augmentation of the volume of gases with the greenhouse effect (GHG) released into the atmosphere. To reduce this effect, the European Union’s directives indicate the action directions for reducing these emissions, among which carbon dioxide (CO 2 ) recorded the highest amount. In this context, the article analyzes the CO 2 series reported in 1990–2021 by 30 European countries. The Kruskal-Wallis test rejected the hypothesis that the series comes from the same underlying distribution. The Anderson-Darling test rejected the normality hypothesis for seven series out of thirty, and Sen’s procedure found a decreasing trend slope only for 17 series. ARIMA models have been built for all individual series. Grouping the series (by the k-means and hierarchical clustering) provided the base for building the Regional series ( RegS ), which describes the CO 2 pollution evolution over Europe. The advantage of this approach is to provide the synthetic image of the regional evolution of the CO 2 emission volume (mt), incorporating information from 30 series (one for each country) in only one— RegS . It is also shown that selecting the number of clusters involved in building RegS and assessing their stability is essential for the model’s goodness of fit


Introduction
The Intergovernmental Panel on Climate Change (IPCC) [1] shows that the GHG emissions from human activities reached the highest levels compared to the previous 800,000 years.The US Inventory [2] indicates that in 2021, in the USA, the GHG volume exceeded 6340.2 mil mt of CO 2 equivalents, higher by 6% than in 2020 but lower by 17% than in 2005.The GHG emissions from the EU economy rose to 941 mt CO 2 equivalents in the first quarter of 2023 [3].
Carbon dioxide, the GHG with the highest weight in the total GHG (followed by CO 2 , NH 4 , and N 2 H), results from natural and anthropic sources.The first category includes respiration, ocean release, and decomposition.The second is represented by burning fossil fuels, deforestation, industrial production (like the cement industry), and agricultural activities [4][5][6].Transportation emitted about a quarter of the EU's CO 2 volume in 2019, of which more than two-thirds were produced by road transportation [4,5].CO 2 emissions from transportation have increased by 33.5% from 1990 to 2019 in UE [4].
According to the Intergovernmental Panel on Climate Change (IPCC) document [1], the main cause of temperature augmentation from the middle of the twentieth century has been the enormous amount of GHG emitted into the atmosphere.The Sixth Assessment Report of IPCC shows that to limit this increase, immediate measures must be implemented [7][8][9][10].According to [11,12], natural sinks contribute to removing some GHG from the atmosphere.However, when the GHG volume is high, these gases remain and accumulate for decades or even hundreds of years (like in the case of CO 2 ).Therefore, conditions to stimulate the natural sinks' activity must be created, and artificial sinks should be designed to balance the GHG's effect [12][13][14].The experimental results show that people's constant exposure to CO 2 may harm their health [15][16][17][18][19].The review of Jacobson et al. [20] and the articles on which it is based also highlight some effects of chronic exposure to CO 2 , for example, oxidative stress, diminishing cognitive abilities, kidney calcification, bone demineralization, etc. [21].Moreover, the adverse effects of the CO 2 excess on the ecosystems, plants, and animals are presented in [22,23].In this context, Hadipoor et al. [24] indicate some measures to control and reduce CO 2 emissions.
Despite the necessity of emissions' reduction being understood [25][26][27][28][29][30], there exist differences among countries related to carbon emissions' responsibility [31,32].So, most studies in the field discuss the responsibility for embodied CO 2 emissions in international trade [25,31].Some scientists analyzed the results of the international conventions' implementation [33], while others focused on the source attribution of GHG emissions [34].
Techniques like ARIMA for modeling the CO 2 series from 1972 to 2013 in Bangladesh [35], exponential smoothing and MLP for forecasting the CO 2 series in Pakistan [36], and the use of the Cobb-Douglas function to estimate the CO 2 sectorial amount in the total CO 2 emission in Indonesia [37] proved to be efficient for this purpose.A review of the methods utilized for emphasizing the correlations between the emissions of CO 2 , the consumption of energy, and economic growth is presented in [38].A survey on approaches utilized for modeling the CO 2 emissions from stationary sources, GIS, and economic assessment can be found in [39].An overview of the CO 2 globally averaged concentrations series since 1830 is performed in [40].
In the context of the climate change negotiations, which started with the adoption of the Convention on Climate Change in 1992, followed by the Kyoto Protocol in 1997, and the launching of the EU's Emissions Trading System in 2005, the European Union has been a key player in the fight against climate change.Investigating the CO 2 series evolution is essential for assessing the results of implementing the measures for reducing climate change and diminishing its environmental effects [40,41].Therefore, different articles studied the CO 2 series' temporal [35,40,42] or spatial [43,44] evolution in different countries, but fewer performed spatiotemporal [45] analyses, mostly for countries in Asia.In this study, we propose such an approach for investigating the trend of the CO 2 series recorded in thirty European countries in Europe.First, we perform a deep statistical analysis, then build models (utilizing the Box-Jenkins methodology) for the series recorded in each country, and test the hypothesis that all series have the same underlying distribution.The rest of this study is dedicated to concisely presenting the regional and temporal evolution of the CO 2 emission series in Europe and incorporating the information collected into one representative series-the Regional series (RegS) -significant from a spatial viewpoint.RegS is constructed using a selection from the 30 series recorded in the studied countries based on two clustering algorithms-k-means and hierarchical clustering [46,47].Based on our knowledge, this approach is new in the study field.
It is also shown that finding the optimal number of clusters using different selection algorithms and testing the clusters' stability is essential for obtaining the best RegS.So, even if it appears at a superficial analysis that clustering is the research goal, it is only one of the most critical steps in the proposed algorithm.

Data Series
The analyzed data set consists of the CO 2 net emissions (in mt) series recorded from 1990 to 2021 in the EU-27 countries, Iceland, Norway, and Switzerland.The data series was downloaded in an .xlxsform from the official site of the European Union [48] and represents the series sent by the 30 countries to UNFCCC and the EU GHG Monitoring Mechanism.The series has no gaps.The series was processed and represented in Figure 1 on a logarithmic scale for clarity of representation.The international abbreviations of the countries' names are utilized in this work: AT-Austria, BE-Belgium, BG-Bulgaria, CH-Switzerland, CY-Cyprus, CZ-Czechia, DE-Germany, DK-Denmark, EE-Estonia,

Methodology
The study is split into two parts.The first focuses on evaluating the evolution of the time series recorded in each country and yearly CO2 series.The second one concerns building RegS.

Study the Time Series Recorded in Each Country
The steps in the individual series analysis are the following: (1) Test the hypothesis that the series is Gaussian against the hypothesis that it is not Gaussian by using the Anderson-Darling (AD) test [49].(2) Test the homoskedasticity hypothesis against the heteroskedasticity by the Fligner-Killeen (KF) test [50] (since it is less sensitive to the normality violation) [51].(3) Perform the Mann-Kendall (MK) [52] to test the randomness hypothesis against the existence of a monotonic trend.If the null is rejected, the slope of a linear trend will be computed by Sen's [53].The following series were subject to this analysis: (a) The CO2 series recorded in each country (30 series); (b) The total CO2 emissions during 1990-2021.
(4) Perform the Augmented Dickey-Fuller (ADF) test [54] to assess the existence of a unit root vs. the time series stationarity for the series from (3). ( 5) Perform the Kruskal-Wallis (K-W) test [55] to test whether the series in a group originates from the same distribution against the alternative that at least one comes from a different distribution.When the null hypothesis was rejected, the post-hoc Dunn's test [56], with the adjustment proposed by Hochberg [57], was run.(6) Modeling the time series from (3) using the ARIMA technique.
Considering a time process ( ), t∈Z, denote by ∆  the difference of d-th order of  .( ) is an autoregressive integrated moving average process, ARIMA (p,d,q), if:

Methodology
The study is split into two parts.The first focuses on evaluating the evolution of the time series recorded in each country and yearly CO 2 series.The second one concerns building RegS.

Study the Time Series Recorded in Each Country
The steps in the individual series analysis are the following: (1) Test the hypothesis that the series is Gaussian against the hypothesis that it is not Gaussian by using the Anderson-Darling (AD) test [49].(2) Test the homoskedasticity hypothesis against the heteroskedasticity by the Fligner-Killeen (KF) test [50] (since it is less sensitive to the normality violation) [51].(3) Perform the Mann-Kendall (MK) [52] to test the randomness hypothesis against the existence of a monotonic trend.If the null is rejected, the slope of a linear trend will be computed by Sen's [53].The following series were subject to this analysis: (a) The CO 2 series recorded in each country (30 series); (b) The total CO 2 emissions during 1990-2021.
(4) Perform the Augmented Dickey-Fuller (ADF) test [54] to assess the existence of a unit root vs. the time series stationarity for the series from (3). ( 5) Perform the Kruskal-Wallis (K-W) test [55] to test whether the series in a group originates from the same distribution against the alternative that at least one comes from a different distribution.When the null hypothesis was rejected, the post-hoc Dunn's test [56], with the adjustment proposed by Hochberg [57], was run.(6) Modeling the time series from (3) using the ARIMA technique.
Considering a time process (X t ), t∈Z, denote by ∆ d X t the difference of d-th order of X t .(X t ) is an autoregressive integrated moving average process, ARIMA (p,d,q), if: where Φ and Θ are polynomials of p and q orders, respectively, with roots higher than 1, and (ε t ), t∈Z is white noise [58].
An autoregressive process of order p, AR (p), is a particular case of ARIMA, with q = d = 0.A moving average process of order q, MA (q), is an ARIMA (0,0,q).If, after performing the ADF test, the null hypothesis was rejected, then d = 0. Otherwise, one may differentiate the series and perform the ADF test again until the null is rejected.Otherwise, one may differentiate the series and perform again the ADF test until the null is rejected.Thus, d will be equal to the number of differences taken on the series when getting the null rejected.The p and q orders were selected by analyzing the charts of the autocorrelation and partial autocorrelation functions (ACF and PACF, respectively).The model with the lowest value of the Akaike criterion [59] was kept.The Ljung-Box [60] test was used to check that the residuals form white noise.
All tests were performed at a significance level of 0.05.A p-value less than 0.05 computed in a test (except Dunn's) leads to rejecting the corresponding null hypothesis.In Dunn's test, the null hypothesis rejection is performed if the p-value < α/2.

Building RegS
The algorithm run for building RegS from the series recorded in the 30 European countries has the following steps [61,62]: Determine the optimal number of clusters, k, perform the k-means and hierarchical clustering, and choose the best clustering.
The k-means and hierarchical clustering [63][64][65] were utilized to group the series recorded in the 30 countries.The advantages and disadvantages of these techniques are presented in [66], shortly.
The silhouette [67], elbow [68], and gap statistics methods [69] were employed to determine the optimum k for running the k-means algorithm.Sometimes, these techniques do not provide the same k; thus, the study was performed for each possibility, and the best one was selected using some criteria that will be presented in the following.
The ratio WSS (the within-cluster sum of squares) and BSS/TSS (the between-clusters sum of squares by the total sum of squares) were computed to determine the best clustering in the k-means algorithm.The higher the BSS/TSS is, the better is the clustering.A smaller WSS indicates better groupings, as well [70].
Hierarchical clustering is a method whose graphical output is a dendrogram that indicates the series' hierarchy.The distance between elements is computed in the first stage, and the distance matrix is built.Different distance functions can be used for this purpose, including Euclidean, Manhattan, Hamming, Jaccard, etc.In this study, the first one was utilized.
Hierarchical clustering can be agglomerative or divisive.In the first one utilized here, a cluster is initially created for each element, then, successively, the groups are merged until only one cluster is obtained.The selection of the clusters to be merged at an intermediate step is performed after checking the distances between the couples of clusters.The most similar clusters (couples with the lowest distances between them) are merged.The cluster similarity is defined using different linkage methods, among which are "average", "complete" ,"median", "ward.D", "ward.D2", etc. [65,71].In the "average" ("complete") method, the average (maximum) distances between pairs of elements (one from a cluster and the other one from another cluster) are returned.These two methods performed the best on the CO 2 dataset, taking into account the values of the cophenetic correlation coefficient [72,73].Values greater than 0.9 (between 0.8 and 0.9) indicate a very good (good) clustering quality, whereas less than 0.8 show poor clustering results.After obtaining the clusters, the average Jaccard (AvgJaggard) measures and associated instabilities are computed to verify whether the algorithm (k-means and hierarchical clustering, in this case) provided a satisfactory representation in different groups of the studied dataset.AvgJaggard > 0.85 indicates a high stability of the cluster.AvgJaggard in [0.6, 0.85) (lower than 0.6) shows that the cluster is stable (unstable) [74].

2.
Select the cluster with the highest number of elements, Cl max .When at least two clusters have this property, Cl max is the one with the smallest WSS.

3.
Compute RegS, whose elements are the averages of the series from Cl max .More precisely, the value assigned to year j is the mean of values recorded in the same year in the countries from Cl max .4.
Evaluate the modeling errors for each series by subtracting the values of RegS from the recorded values.5.
Estimate the RegS's goodness-of-fit of by computing the mean absolute percentage error (MAPE).MAPE was chosen because it is a non-dimensional index that can be utilized for comparing different models.
The R 4.3.2software (https://www.r-project.org/) was used to perform the study.

Analysis of the CO 2 Time Series
The Anderson-Darling test rejected the null hypothesis for the series recorded in BE, DK, EE, ES, FR, HU, IE, IS, IT, LT, MT, NL, PL, PT, SI, SK, and the total CO 2 series.The homoscedasticity hypothesis was rejected for the IS, IT, LT, LU, NO, and SI series.
The MK trend test applied to the total CO 2 series obtained by summing all series values recorded in each country during 1990-2021 rejected the null hypothesis.The nonparametric Sen's procedure provided a negative slope of −2.9589×10 7 for the period 1990-2021.The polynomial trend determined for 1990-2002 has the equation: and after 2003, it has the equation: where t is the time and y t is the value of the series at the moment t.Table 1 presents the p-values computed in the MK test for the CO 2 series recorded in each country during the study period.If the p-value is less than 0.05, it is accompanied by the sign plus or minus between the brackets, meaning that the slope of the linear trend computed by Sen's method is positive or negative, respectively.Out of 30 series, the null hypothesis was rejected for 20 s.A positive trend was found for the AT, CY, and IS series, whereas a negative trend was estimated for 17 countries.The ADF test could not reject the hypothesis of a unit root existence for any series at the significance level of 5%.Therefore, there is not enough evidence for the series stationarity.Therefore, to reach stationarity, some transformations must be employed.The results from Table 2 indicate that all models except for ES were validated by the Ljung-Box test.Still, relatively high MAPEs were found in those of LV and SE.Therefore, we must find a better approach to model the ES, LV, and SE series.It will be the subject of another article.The AT, BG, CH, CZ, DE, HU, IS, LT, MT, PL, SE, and SK series are described by ARIMA (0,1,0), which is obtained by taking the first-order difference of the raw series.AR(1) models were determined for EE, HR, NO, and PT, AR(2) for LU, and MA(1) for FI and LV.The rest of the series, excluding ES, have models with d = 1 or 2. These findings are in concordance with those of the ADF test.
The model of the total CO 2 series is of ARIMA (0,1,0) with the drift = −3,481,622 (se = 18,455,798) and MAPE = 2.5848.The p-value in the Ljung-Box test is 0.7458.

Building RegS
The critical step of the algorithm used for building RegS is to determine the optimal number of clusters, k.Different values were found for k-two, three, or one-by the silhouette, the elbow-knee method (Figure 2), and gap statistics, respectively.Since the result provided by the last method is influenced by the outlier existence (Germany, in this case-Figure 1) and the small distances between some clusters [61], it was neglected; thus, the study was performed for the remaining alternatives.
Ljung -Box test.Still, relatively high MAPEs were found in those of LV and SE.Therefore, we must find a better approach to model the ES, LV, and SE series.It will be the subject of another article.The AT, BG, CH, CZ, DE, HU, IS, LT, MT, PL, SE, and SK series are described by ARIMA (0,1,0), which is obtained by taking the first-order difference of the raw series.AR(1) models were determined for EE, HR, NO, and PT, AR(2) for LU, and MA(1) for FI and LV.The rest of the series, excluding ES, have models with d = 1 or 2. These findings are in concordance with those of the ADF test.

Building RegS
The critical step of the algorithm used for building RegS is to determine the optimal number of clusters, k.Different values were found for k-two, three, or one-by the silhouette, the elbow-knee method (Figure 2), and gap statistics, respectively.Since the result provided by the last method is influenced by the outlier existence (Germany, in this case-Figure 1) and the small distances between some clusters [61], it was neglected; thus, the study was performed for the remaining alternatives.
The clusters resulting after running the k-means algorithm are shown in Figure 3a for k = 2 and in Figure 3b for k = 3, where the names of the countries are replaced, for simplicity, by numbers from 1 to 30 assigned from AT to SK (in an alphabetical order).The clusters resulting after running the k-means algorithm are shown in Figure 3a for k = 2 and in Figure 3b for k = 3, where the names of the countries are replaced, for simplicity, by numbers from 1 to 30 assigned from AT to SK (in an alphabetical order).The results from Table 2 indicate that all models except for ES were validated by the Ljung -Box test.Still, relatively high MAPEs were found in those of LV and SE.Therefore, we must find a better approach to model the ES, LV, and SE series.It will be the subject of another article.The AT, BG, CH, CZ, DE, HU, IS, LT, MT, PL, SE, and SK series are described by ARIMA (0,1,0), which is obtained by taking the first-order difference of the raw series.AR(1) models were determined for EE, HR, NO, and PT, AR(2) for LU, and MA(1) for FI and LV.The rest of the series, excluding ES, have models with d = 1 or 2. These findings are in concordance with those of the ADF test.

Building RegS
The critical step of the algorithm used for building RegS is to determine the optimal number of clusters, k.Different values were found for k-two, three, or one-by the silhouette, the elbow-knee method (Figure 2), and gap statistics, respectively.Since the result provided by the last method is influenced by the outlier existence (Germany, in this case-Figure 1) and the small distances between some clusters [61], it was neglected; thus, the study was performed for the remaining alternatives.
The clusters resulting after running the k-means algorithm are shown in Figure 3a for k = 2 and in Figure 3b for k = 3, where the names of the countries are replaced, for simplicity, by numbers from 1 to 30 assigned from AT to SK (in an alphabetical order).WSSs are 86.7687 and 197.9709, respectively, and BSS/TSS = 69.3%for k = 2. WSS is high for the second cluster that contains DE (7), FR (13), IT (18), and PL (25).WSS has significantly lower values for k = 3 (0.000, 27.1215, and 34.1752), and BSS/TSS = 93.4%.Thus, the algorithm provides a significantly higher separation between the clusters when k = 3.In this case, Germany (7) forms a single cluster, ES (11), FR (13), IT (18), NL (23), and PL (25) belong to the third cluster, and the rest of the countries form the second one.This output can be explained by the series' characteristics, like the average emissions levels or the existence of a certain type of trend of the time series in the same cluster.Indeed, the CO 2 emissions in Germany are much higher than those of the countries from the third cluster: ES, FR, IT, NL, and PL.Moreover, the last four mentioned series have a decreasing trend.Still, the groups' stability must be verified before selecting the best division of the countries in different subsets.The average Jaccard values and the instabilities for the two clusterings are shown in Table 3.When k = 2, the first cluster is stable, and the second one is highly stable.When k = 3, the first is stable, and the other two are highly stable.In both cases, the highest instability is that of the first cluster (that contains Germany).Hierarchical clustering was also formed to cross-validate the above grouping and select the best one.
The cophenetic coefficient was computed to choose the linkage method.The highest value (0.9723) was obtained when using the "average", followed by the "complete" (0.9473), compared to only 0.7474 in the case of "ward.D2", or 0.7291 in the case of "ward" methods.Therefore, the "average" and "complete" linkage were utilized.The clustering obtained is provided in Figure 4a  WSSs are 86.7687 and 197.9709, respectively, and BSS/TSS = 69.3%for k = 2. WSS is high for the second cluster that contains DE (7), FR (13), IT (18), and PL (25).WSS has significantly lower values for k = 3 (0.000, 27.1215, and 34.1752), and BSS/TSS = 93.4%.Thus, the algorithm provides a significantly higher separation between the clusters when k = 3.In this case, Germany (7) forms a single cluster, ES (11), FR (13), IT (18), NL (23), and PL (25) belong to the third cluster, and the rest of the countries form the second one.This output can be explained by the series' characteristics, like the average emissions levels or the existence of a certain type of trend of the time series in the same cluster.Indeed, the CO2 emissions in Germany are much higher than those of the countries from the third cluster: ES, FR, IT, NL, and PL.Moreover, the last four mentioned series have a decreasing trend.Still, the groups' stability must be verified before selecting the best division of the countries in different subsets.The average Jaccard values and the instabilities for the two clusterings are shown in Table 3.When k = 2, the first cluster is stable, and the second one is highly stable.When k = 3, the first is stable, and the other two are highly stable.In both cases, the highest instability is that of the first cluster (that contains Germany).Hierarchical clustering was also formed to cross-validate the above grouping and select the best one.
The cophenetic coefficient was computed to choose the linkage method.The highest value (0.9723) was obtained when using the "average," followed by the "complete" (0.9473), compared to only 0.7474 in the case of "ward.D2," or 0.7291 in the case of "ward" methods.Therefore, the "average" and "complete" linkage were utilized.The clustering obtained is provided in Figure 4a   (c) k = 3 and "average" method, (d) k = 3 and "complete" method.The countries are numbered from 1-AT to 30-SK, in the alphabetical order of the countries' names' abbreviations.
Climate 2024, 12, 34 9 of 14 For k = 2, both methods provided the same clusters.The second cluster contained all countries but Germany.When k = 3, apart from Germany, which forms a cluster, there is a slight difference between the other two clusters when using "average" and "complete" methods (Figure 4c,d).In the first case, ES (11), FR (13), IT (18), and PL(25) belong to a cluster, while in the second one, NL( 23) is added.
The clustering obtained by "complete" linkage, containing 1, 24, and 5 elements, coincides with that given by the k-means algorithm (Figure 3b).They cross-validate each other.The average Jaccard indicators computed after bootstrapping are given in Table 4. Considering the ratio BSS/TSS in the k-means algorithm, the average Jaccard values and instabilities in the k-means, and hierarchical clustering, the best choice is k = 3.Indeed, when using "average", two clusters are highly stable, and one is stable.If "complete" is employed, two clusters are stable, and one is highly stable.Both linkage methods provide good results in terms of AvgJaccard and instability.
That obtained by the "average" is better since the average instability is slightly lower than in the care of the "complete" method.Still, since the results are comparable, for building RegS, both alternatives were used, and the comparison is presented.Table 5 contains the MAPE (%).The MAPEs corresponding to the "complete" and "average" are denoted by MAPE_c and MAPE_av, respectively.Table 5. MAPE (%) in the RegS modeling built using the series form the second cluster.The cluster to which the country belongs is marked inside the bracket, with Roman numerals: I = 1, II = 2, and III = 3.The values of MAPE_c (MAPE_av) are in the following intervals: 20 (20) are lower than 100, one (zero) is between 100 and 200, 3 (2) are between 200 and 300, 3 (3) are between 300 and 500, and the rest are higher than 500.The series values from the first and third clusters and most from the second cluster are well estimated.The highest values correspond to the countries with the lowest emissions, among which CY and IS have increasing trends.Considering the average MAPE for all countries (MAPE_c = 298.88,MAPE_av = 342.44), and that the lower the MAPE, the better the fitting is, we can conclude that the best result was obtained using the first linkage method.

Country
To show that building RegS using the cluster with the highest number of elements is the best choice, we also built such a series using the series from the first and last clusters.MAPE was denoted by MAPE_DE (when using the first cluster-DE series), MAPE_c_III (when using the series from the third cluster with the "complete" method), and MAPE_av_III (when using the series from the third cluster with the "average" method).Based on the results from Table 6, the first choice is the worst, as it overestimates all the values from the other countries.Building RegS with the series from the third cluster provides significantly (sometimes more than ten times) higher MAPE than when using the second cluster for this purpose (see MAPE_c_III from Table 6 vs. MAPE_c from Table 5, and MAPE_av_III from Table 6 vs. MAPE_av from Table 5).
The regional series obtained in all cases are depicted in Figure 5, where RegS_c (RegS_av) is the series obtained by using the elements from the cluster with the highest number of elements and the "complete" ("average") method, DE is the series from the first cluster, and Reg_III_c (Reg_III_av) is the series obtained by using the elements from the cluster third cluster and the "complete" ("average") method.To show that building RegS using the cluster with the highest number of elements is the best choice, we also built such a series using the series from the first and last clusters MAPE was denoted by MAPE_DE (when using the first cluster-DE series), MAPE_c_II (when using the series from the third cluster with the "complete" method), and MAPE_av_III (when using the series from the third cluster with the "average" method) Based on the results from Table 6, the first choice is the worst, as it overestimates all the values from the other countries.Building RegS with the series from the third cluster provides significantly (some times more than ten times) higher MAPE than when using the second cluster for this purpose (see MAPE_c_III from Table 6 vs. MAPE_c from Table 5, and MAPE_av_III from Table 6 vs. MAPE_av from Table 5).
The regional series obtained in all cases are depicted in Figure 5, where RegS_c (RegS_av) is the series obtained by using the elements from the cluster with the highes number of elements and the "complete" ("average") method, DE is the series from the first cluster, and Reg_III_c (Reg_III_av) is the series obtained by using the elements from the cluster third cluster and the "complete" ("average") method.Apart from the overall decreasing trend captured by the models, one may note shor augmentation periods followed by abrupt decreases, which could be explained by the continuous monitoring and regulations that appeared and were applied to control and Apart from the overall decreasing trend captured by the models, one may note short augmentation periods followed by abrupt decreases, which could be explained by the continuous monitoring and regulations that appeared and were applied to control and restrict CO 2 emissions.Comparing the values from Tables 5 and 6 with the built regional series (Figure 5), it results a drastic overestimation of the regional evolution of CO 2 series over Europe.So, the best approach is that proposed in the methodology.
Comparing the results from Table 1 with those obtained by the first procedure for the total GHG series recorded in Europe during the same period [47], the following groups of countries are determined as follows: (a) Countries (HR, EE, FI, EL, IE, PT, SI, ES) for which the MK test did not reject H 0 for both CO 2 and GHG series, so no significant monotonic trend can be emphasized; (b) Countries for which H 0 was rejected and both CO 2 and GHG series have the same type of trend: negative (BE, BG, CH, CZ, DE, DK, FR, HU, IT, LT, LU, NL, PL, RO, SE, and SK) or positive (AT, CY, and IS); (c) Countries for which the CO 2 series has a negative slope of the trend, but H 0 was not rejected for the total GHGs series (MT).(d) Countries for which H 0 was not rejected, but the GHGs have a monotonic increasing (LV) or decreasing (NO) trend.
So, most CO 2 and total GHG series trends have a similar pattern.All series evolution but one can be described by ARIMA models.For most series, the correlation between the CO 2 and GHG series (reported by the same country) is above 0.973, except for the Netherlands (0.847) and Norway (0.853).

Conclusions
This article studied the CO 2 series reported by 30 countries from 1990 to 2021 and built the RegS using an original algorithm.RegS shows a decreasing trend of CO 2 emissions, whereas, at the individual level, measures should be applied in about half of the countries to achieve the goal of pollution decrement.
Statistical analysis of big data series from trustworthy sources provides the background for making scientifically based decisions on implementing measures for reducing the pollution from anthropic sources and mitigating environmental disasters.
An analogous study will be developed on the CO 2 volume per capita or GDP to assess the relationship between the economy, society, and environment in an interdisciplinary framework.A probabilistic approach to correlations between such variables will provide an in-depth analysis of the causality relationships.Other problems to be addressed are (a) the uncertainty, delays, and inertia in the proposed models; (b) integrating the socio-economic factors into environmental models; and (c) the risk evaluation.
Future research should answer the questions concerning the most appropriate measures and best implantation methods for sustainable development, and how to motivate society members to support actions towards a cleaner production and environment.

Figure 2 .
Figure 2. The optimal number of clusters determined by (left) the silhouette and (right) the elbow method for the CO2 series recorded in the European countries.

Figure 2 .
Figure 2. The optimal number of clusters determined by (left) the silhouette and (right) the elbow method for the CO 2 series recorded in the European countries.

Figure 2 .
Figure 2. The optimal number of clusters determined by (left) the silhouette and (right) the elbow method for the CO2 series recorded in the European countries.

Figure 3 .
Figure 3. Results of the k-means algorithm for grouping the series recorded in the European countries when (a) k = 2 and (b) k = 3. Numbers from 1 to 30 are assigned to the countries in the alphabetical order of names' abbreviations.
,b for k = 2 and Figure 4c,d for k = 3.

Figure 3 .
Figure 3. Results of the k-means algorithm for grouping the series recorded in the European countries when (a) k = 2 and (b) k = 3. Numbers from 1 to 30 are assigned to the countries in the alphabetical order of names' abbreviations.
,b for k = 2 and Figure 4c,d for k = 3.

Table 1 .
The p-values in the MK test on the CO 2 series reported by the European countries.

Table 2
contains the modeling results, as follows: -Column 2-the model type; -Column 3-the model's coefficients (when a simple differentiation did not lead to the model) and the corresponding standard error (se) inside the brackets; -Column 4-the drift, if it exists, and the standard error (se) of its estimation inside the brackets; -The p-value computed in the Ljung-Box test applied to the model's residual series; -The MAPE of the model.

Table 2 .
ARIMA-type models for the series recorded in each country.

Table 3 .
The average Jaccard values and instabilities in the k-means clustering.

Table 3 .
The average Jaccard values and instabilities in the k-means clustering.

Table 4 .
The average Jaccard values and instabilities in the hierarchical clustering.

Table 6 .
MAPE (%) when building regional series using the cluster I (DE) and III.
* Note: NL belongs to the cluster II (III) when using complete (average) method.

Table 6 .
MAPE (%) when building regional series using the cluster I (DE) and III.
* Note: NL belongs to the cluster II (III) when using complete (average) method.