Abstract
This study explores convergence in greenhouse gas (GHG) emissions and its determinants across 38 OECD countries over the period 1996–2022, using a novel approach that combines the club convergence method with a supervised machine learning algorithm, Extreme Gradient Boosting (XGBoost), and the SHapley Additive exPlanations (SHAP) method. The findings reveal the presence of three distinct convergence clubs shaped by structural economic and institutional characteristics. Club 1 exhibits low energy efficiency, high fossil fuel dependence, and weak governance structures; Club 2 features strong institutional quality, advanced human capital, and effective environmental taxation; and Club 3 displays heterogeneous energy profiles but converges through socio-economic foundations. While traditional growth-related drivers such as technological innovation, foreign direct investment, and GDP growth play a limited role in explaining emission convergence, energy structures and institutional and policy-related factors emerge as key determinants. These findings highlight the limitations of one-size-fits-all climate policy frameworks and call for a more nuanced, club-specific approach to emission mitigation strategies. By combining convergence theory with interpretable machine learning, this study contributes a novel empirical framework to assess the differentiated effectiveness of environmental policies across heterogeneous country groups, offering actionable insights for international climate governance and targeted policy design.
1. Introduction
GHG emissions represent a principal driver of anthropogenic climate change, constituting one of the most urgent environmental challenges of the 21st century. The primary GHGs—carbon dioxide (CO2), methane (CH4), nitrous oxide (N2O), and fluorinated gases—contribute to radiative forcing by trapping infrared radiation in the Earth’s atmosphere, intensifying the natural greenhouse effect and contributing to observed increases in global mean surface temperature, accelerated cryospheric melt, rising sea levels, and an increased frequency and severity of extreme climatic events.
Beyond environmental degradation, rising GHG emissions pose systemic risks to human health, food and water security, and global economic stability. Air pollution—primarily driven by fossil fuel combustion and other GHG-emitting activities—accounts for roughly one in every eight deaths globally, amounting to an estimated 6.7 million premature deaths annually [1]. Furthermore, land-use changes associated with emissions-intensive sectors continue to drive deforestation at a rate of nearly 10 million hectares per year [2], undermining the planet’s carbon sinks. Intensifying heat stress linked to global warming is projected to result in global productivity losses equivalent to 80 million full-time jobs by 2030 [3].
Current estimates indicate that global mean temperature has risen by approximately 1.1 °C above pre-industrial levels (1850–1900). To limit this increase to 1.5 °C, as emphasized by the Paris Agreement and the Intergovernmental Panel on Climate Change (IPCC), global GHG emissions must peak before 2025 and decline by roughly 43% relative to 2010 levels by 2030 [4]. In response, countries have been submitting Nationally Determined Contributions (NDCs) since 2020, each intended to enhance efforts to mitigate GHG emissions and adapt to the impacts of climate change. The mitigation targets indicated in NDCs differ greatly by country in terms of target type and of sectoral and gas coverage [5]. However, achieving the goals of the Paris Agreement requires a substantial reduction in per capita GHG emissions globally, which implies a gradual convergence in emission levels across countries.
In recent years, there has been growing attention to whether environmental degradation indicators such as GHGs and ecological footprint are converging across countries. While a number of studies have tackled this issue, most of them concentrate on identifying how economic, political, and institutional factors affect emission levels [6,7,8,9,10,11,12,13]. However, they tend to overlook a key question: what explains why countries follow different convergence paths? As a result, we still know relatively little about the factors that determine membership in different convergence clubs. Given that countries differ widely in their socio-economic, political, technological, and institutional structures—as well as in their development trajectories—it is important to consider how these differences shape convergence behavior. Recognizing this is crucial for designing climate policies that are not only effective in reducing emissions, but also fair across countries with varying national contexts.
Unlike previous studies that rely solely on traditional econometric techniques, this study adopts a hybrid methodological framework combining the club convergence approach [14] with supervised machine learning algorithms. Specifically, we employ the Extreme Gradient Boosting (XGBoost) classifier to uncover complex, non-linear relationships between country-level characteristics and convergence club membership, and use SHapley Additive exPlanations (SHAP) to interpret the marginal contribution of each feature. This approach enhances predictive power while maintaining model transparency, thereby offering a novel contribution to environmental economics.
In this context, the main objective of the study is to understand the GHG emission dynamics of 38 OECD countries during the period 1996–2022, to identify convergence clubs, and to analyze the factors affecting the formation of these clubs. The findings obtained through advanced empirical methods aim to guide policymakers in developing more targeted, fair, and effective environmental policies.
The analysis focuses on OECD countries due to several key considerations. These countries provide relatively reliable and comprehensive data, ensuring robustness in panel-based analyses. They have historically exhibited high levels of greenhouse gas (GHG) emissions, making them relevant for studying convergence dynamics, and they generally place substantial emphasis on environmental policies, offering a suitable context for assessing the interaction between policy frameworks and emission trajectories. Furthermore, the OECD encompasses a mix of advanced, emerging, and transitioning economies, allowing for the examination of heterogeneous pathways toward environmental sustainability. Understanding convergence patterns within this group is particularly relevant for global mitigation efforts, as these countries are both major historical contributors to emissions and key actors in implementing ambitious climate policies. By exploring club-specific drivers, the study contributes to the design of differentiated yet equitable climate strategies that reflect countries’ diverse capabilities and responsibilities.
The rest of the paper is structured as follows: Section 2 presents the literature review, Section 3 introduces the data and methodology, Section 4 discusses the findings, and the last section presents the conclusion.
2. Literature Review
Understanding the convergence of environmental indicators and the evolving role of machine learning in environmental analysis is crucial for identifying the key drivers of global sustainability transitions. This section reviews two major strands of the literature relevant to this study: (i) theoretical and empirical approaches to environmental convergence, and (ii) recent applications of machine learning—particularly XGBoost and SHAP—in environmental research.
2.1. Environmental Convergence: Theoretical Background and Empirical Evidence
Environmental convergence refers to the hypothesis that countries or regions gradually become more alike in terms of key environmental indicators, such as greenhouse gas (GHG) emissions and ecological footprints. Rooted in the economic convergence literature developed by [15], this concept has been extended to environmental issues in an effort to understand long-term trends in sustainability.
A variety of methodological approaches have been developed to investigate convergence in environmental outcomes. One of the earliest is σ-convergence, which focuses on a reduction in the dispersion of a given variable—such as GHG emissions—across countries over time. A declining standard deviation indicates that countries are becoming more similar in their environmental performance. In contrast, the β-convergence framework examines whether countries with initially higher emission levels tend to reduce them faster than those with lower initial levels, typically through cross-sectional or panel regressions. Stochastic convergence shifts the focus to the time-series properties of emissions, assessing whether deviations from a common path are temporary using unit root or stationarity tests. Unlike these approaches, the club convergence methodology proposed by [14] relaxes the assumption of a single common path and permits heterogeneous convergence patterns by identifying distinct subgroups—or “clubs”—of countries that exhibit internal convergence without aligning with the rest of the sample.
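As a rough illustration of the first two notions, the sketch below computes a σ-convergence path (cross-sectional standard deviation of log emissions per period) and a β-convergence slope (average log growth regressed on the initial log level). The four countries and their values are invented for illustration, and the function names `sigma_path` and `beta_slope` are hypothetical; a declining dispersion and a negative slope would be read as evidence of σ- and β-convergence, respectively.

```python
import math
import statistics

# Hypothetical per-capita emissions for four made-up countries over five
# periods; the numbers are illustrative and not taken from the study.
emissions = {
    "A": [20.0, 18.5, 17.0, 15.8, 14.9],
    "B": [12.0, 11.6, 11.1, 10.8, 10.5],
    "C": [8.0, 8.1, 8.2, 8.2, 8.3],
    "D": [5.0, 5.4, 5.9, 6.3, 6.6],
}


def sigma_path(panel):
    """Cross-sectional standard deviation of log emissions in each period."""
    n_periods = len(next(iter(panel.values())))
    return [
        statistics.pstdev([math.log(series[t]) for series in panel.values()])
        for t in range(n_periods)
    ]


def beta_slope(panel):
    """OLS slope of average log growth on the initial log level."""
    x = [math.log(s[0]) for s in panel.values()]
    y = [(math.log(s[-1]) - math.log(s[0])) / (len(s) - 1) for s in panel.values()]
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


sigmas = sigma_path(emissions)
beta = beta_slope(emissions)
print("dispersion by period:", [round(s, 3) for s in sigmas])
print("beta slope:", round(beta, 4))
```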
Several empirical studies have employed σ-convergence methodology to assess environmental convergence across different contexts and time periods. For instance, evidence of convergence is reported for 23 OECD countries out of 111 examined, with mixed results for the remaining 88 countries [16]. Convergence across 87 countries is also documented [17]. In addition, convergence is demonstrated for 176 countries [18].
Numerous studies also support the idea of environmental convergence, employing stochastic convergence. For example, convergence in CO2 emissions among 100 countries has been identified [19]. Similar evidence is found for 28 OECD countries [6], while convergence across seven global regions is reported [20]. Conditional convergence in various environmental indicators across 27 OECD countries is also documented [7]. However, contrasting findings exist as well. For instance, no convergence in per capita CO2 emissions for a sample of 30 OECD countries is reported [21].
Other researchers using β-convergence tests also highlight convergence patterns. Convergence in OECD nations is confirmed [22], while global convergence in CO2 emissions among 173 countries is demonstrated [23]. Evidence of convergence based on income groups in a sample of 110 countries is also presented [24]. Similarly, additional support for convergence in CO2 emissions across various country samples is provided [9,25].
Beyond these aggregate analyses, the club convergence method has gained traction for identifying environmental convergence within more homogeneous groups. Two convergence clubs in a global sample of 128 countries are identified [26]. Convergence in fossil fuel emissions among 162 countries is also uncovered [27]. Evidence of two convergence clubs in CO2 emissions for 53 countries is reported [28]. Regional applications also reflect these patterns: three convergence clubs for ecological footprints among EU countries [29], similar results for the G20 [30], and two convergence groups for ecological footprint among OECD nations [10].
These findings collectively suggest that convergence does not occur uniformly across countries but rather within distinct subgroups, reinforcing the relevance of club convergence in environmental policy analysis.
2.2. Machine Learning Applications in Environmental Research
In recent years, machine learning (ML) methods have grown in popularity in environmental research due to their capacity to handle complex, non-linear relationships that traditional models may fail to capture. Unlike traditional econometric approaches, ML techniques are not restricted by rigid assumptions about functional form, allowing for greater flexibility in examining patterns in environmental data [31].
Deep learning and related algorithms have proven very beneficial for forecasting and modeling in the energy and climate areas. For instance, neural networks have been applied to forecast solar energy generation [32,33,34,35]. ML techniques have also been employed to predict carbon dioxide emissions in various national contexts [36,37,38,39].
Among ML tools, XGBoost has emerged as a leading model for prediction and classification problems, notably in environmental applications. The model has been used to estimate electricity demand [40], simulate air pollution [41], and forecast pollutant concentrations [42,43]. Recent research has broadened XGBoost’s applicability to policy-related topics, such as investigating the impact of green technology and environmental taxation on incineration-related emissions [44], examining how urban infrastructure contributes to emission intensity [45], and exploring how environmental regulations affect ecological resilience [46].
To better understand how specific features contribute to these models, SHAP values can be used. This approach quantifies each predictor’s marginal influence, providing a clearer interpretation of ML outcomes. Despite increased interest in combining XGBoost and SHAP in environmental modeling, their use to explain cross-country convergence patterns in emissions remains uncommon. This study fills this gap by using these methodologies to determine which socio-economic, institutional, political, and technological aspects influence a country’s membership in GHG emission convergence clubs.
3. Materials and Methods
3.1. Data and Variables
The empirical analysis relies on a comprehensive dataset covering 38 OECD countries over the period 1996–2022. All variables used in both stages of the study are summarized in Table 1. The complete set of variables is obtained from OECD, World Development Indicators (WDI), and Worldwide Governance Indicators (WGI). The variables selected in this study were determined based on theoretical foundations and existing empirical literature in the fields of GHG emissions and environmental economics. Environmental tax revenue and policy stringency by sector are included to capture the impact of government regulations aimed at reducing emissions. Energy-related variables such as energy efficiency, fossil fuel consumption, and renewable energy consumption are critical for understanding the impact of the efficiency of energy use and the structure of energy sources on emissions. Institutional quality and human capital indicators represent structural factors that can indirectly affect environmental outcomes through policy effectiveness and innovation capacity. Furthermore, economic growth, trade openness, and foreign direct investment are included in the model to assess the role of economic integration and development dynamics on environmental performance. Collectively, these variables provide a robust framework for comprehensively analyzing the multidimensional factors determining GHG emission convergence among OECD countries.
Table 1.
Variable Description.
3.2. Methodology
As mentioned above, this study employs a combined econometric and machine learning framework. The investigation proceeds in two main steps. In the first step, the log-t regression method [14] is applied to countries’ GHG emission trajectories to identify convergence clubs. In the second step, a multi-class XGBoost classification model is used to investigate the socioeconomic, political, institutional, and environmental factors associated with club membership. The choice of XGBoost is motivated by its proven effectiveness in handling complex, high-dimensional datasets with nonlinear relationships and interactions among variables. XGBoost offers high predictive accuracy, robustness to multicollinearity, and efficient computation compared with traditional methods. However, one common criticism of tree-based ensemble methods like XGBoost is their limited interpretability. To address this, we employ SHAP, a model-agnostic interpretability tool grounded in cooperative game theory that quantifies the marginal contribution of each feature to individual predictions. SHAP values enable a transparent and granular understanding of feature importance and directionality, facilitating interpretation of heterogeneous effects across convergence clubs. Together, XGBoost and SHAP allow us to capture complex patterns in the data while providing interpretable insights into the drivers of club membership, thereby bridging the gap between predictive power and explainability. To assess the robustness of our results, a multinomial logistic regression (mlogit) model is also estimated, in which club membership is regressed on structural determinants.
3.2.1. Club Convergence Analysis
To examine convergence in GHG emissions across OECD countries, the club convergence methodology (also known as the log-t regression test) is employed. This approach has two main advantages. First, it does not rely on strict assumptions regarding trend stationarity or stochastic convergence, making it robust in the presence of complex dynamics. Second, it overcomes limitations such as biased and inconsistent estimates that arise from endogeneity and omitted variable problems typically encountered in augmented Solow-type regressions [47].
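The time-varying factor loading model is defined as:
$$X_{it} = \delta_{it}\,\mu_t + \varepsilon_{it} \qquad (1)$$
where $X_{it}$ denotes the observed GHG emissions per capita for country $i$ at time $t$, $\mu_t$ is a common latent component, $\delta_{it}$ is the time-varying factor loading representing country-specific heterogeneity, and $\varepsilon_{it}$ is the idiosyncratic error term. Since the model is under-identified, $\delta_{it}$ cannot be directly estimated. Therefore, a relative transition parameter can be used:
$$h_{it} = \frac{X_{it}}{N^{-1}\sum_{i=1}^{N} X_{it}} \qquad (2)$$
The relative transition parameter, denoted as $h_{it}$, quantifies the loading coefficient in relation to the panel average at time $t$. Equation (2) measures whether $h_{it}$ converges to unity over time. Convergence is tested by checking if the cross-sectional variance of $h_{it}$, denoted $H_t = N^{-1}\sum_{i=1}^{N}(h_{it}-1)^2$, approaches zero as time goes on ($H_t \to 0$ as $t \to \infty$). The following log-t regression model is presented [14] in order to examine the convergence hypothesis:
$$\log\!\left(\frac{H_1}{H_t}\right) - 2\log L(t) = a + b\,\log t + u_t, \qquad t = [rT],\, [rT]+1,\, \ldots,\, T$$
where $H_t$ denotes the cross-sectional variance of $h_{it}$, $b$ is the convergence coefficient, $L(t)$ is a slowly varying function of time (commonly $\log(t+1)$), and $[rT]$ is the integer part of $rT$, so that the first fraction $r$ of the sample is discarded. Based on Monte Carlo simulations, setting r = 0.3 is suggested [14] when the sample is small (T ≤ 50). The null hypothesis is convergence ($b \geq 0$), tested with a one-sided t-test: convergence is confirmed when the null hypothesis is not rejected. If the t-statistic for $b$ is less than −1.65, the null is rejected in favor of divergence.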
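The test described above can be sketched in a few lines. The following is a simplified illustration, not the estimation routine used in the study: it skips the HP-filter detrending step, takes $L(t) = \log(t+1)$, and uses a Bartlett-kernel (Newey-West) standard error with a rule-of-thumb bandwidth, all of which are assumptions made here. The function name `log_t_test` and the two synthetic panels are hypothetical.

```python
import math


def log_t_test(panel, r=0.3):
    """Return (b_hat, t_stat) of the log-t regression.

    panel: list of per-country series of equal length T with positive values.
    """
    n, T = len(panel), len(panel[0])

    # Cross-sectional variance H_t of the relative transition paths h_it.
    H = []
    for t in range(T):
        mean_t = sum(series[t] for series in panel) / n
        H.append(sum((series[t] / mean_t - 1.0) ** 2 for series in panel) / n)

    # Regression sample t = [rT], ..., T (1-based time index).
    ts = range(max(int(r * T), 1), T + 1)
    x = [math.log(t) for t in ts]
    y = [math.log(H[0] / H[t - 1]) - 2.0 * math.log(math.log(t + 1)) for t in ts]

    # OLS of y on a constant and log t.
    m = len(x)
    x_bar, y_bar = sum(x) / m, sum(y) / m
    xd = [xi - x_bar for xi in x]
    sxx = sum(v * v for v in xd)
    b = sum(v * (yi - y_bar) for v, yi in zip(xd, y)) / sxx
    a = y_bar - b * x_bar
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]

    # Bartlett-kernel long-run variance of the slope's score terms.
    g = [v * u for v, u in zip(xd, resid)]
    lags = int(4 * (m / 100.0) ** (2.0 / 9.0))  # bandwidth rule (assumption)
    lrv = sum(v * v for v in g)
    for lag in range(1, lags + 1):
        weight = 1.0 - lag / (lags + 1.0)
        lrv += 2.0 * weight * sum(g[i] * g[i - lag] for i in range(lag, m))
    se = math.sqrt(lrv) / sxx
    return b, b / se


# Synthetic panels: gaps to the common trend shrink (converging) or persist.
T = 27
common = [10.0 * 0.98 ** t for t in range(1, T + 1)]
gaps = [-0.4, -0.2, 0.0, 0.2, 0.4]
converging = [[mu * (1 + c / t) for t, mu in enumerate(common, start=1)] for c in gaps]
diverging = [[mu * (1 + c) for mu in common] for c in gaps]

b_conv, t_conv = log_t_test(converging)
b_div, t_div = log_t_test(diverging)
print("converging panel: null of convergence retained?", t_conv > -1.65)
print("diverging panel:  null of convergence retained?", t_div > -1.65)
```

With persistent gaps, $H_t$ stays constant, so the penalty term $-2\log L(t)$ drives the slope negative and the null is rejected; this is the role the penalty plays in the test.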
3.2.2. Extreme Gradient Boosting
To classify countries into their respective convergence clubs, the XGBoost algorithm [48] is employed. This approach is based on the gradient-boosted decision tree (GBDT) method [49]. It introduces several optimizations to GBDT, including: (i) a regularization term that reduces model complexity and prevents overfitting; (ii) a second-order Taylor expansion of the loss function, which improves the accuracy of the optimization; and (iii) a block storage structure that increases computational efficiency [50]. As a result, this technique offers various advantages, including the ability to handle large amounts of structured data, robustness against overfitting owing to regularization, applicability to multi-class classification tasks, and high prediction accuracy.
XGBoost minimizes a regularized objective function that balances classification loss and model complexity:
$$\mathcal{L} = \sum_{i} L\left(y_i, \hat{y}_i\right) + \sum_{k} \Omega\left(f_k\right), \qquad \Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2$$
where $L$ represents the loss function, and $y_i$ and $\hat{y}_i$ denote the true and predicted values. The regularization term $\Omega$ controls model complexity, where $\gamma$ penalizes tree splits and $\lambda$ penalizes the magnitude of leaf weights. $T$ denotes the number of terminal nodes (leaves) in the tree, and $w_j$ represents the leaf weights. The objective function is used to evaluate the improvement in model performance after splitting a decision tree node. If the split leads to a positive gain, the splitting process continues; otherwise, it terminates.
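The splitting rule can be made concrete with the standard gain expression from the XGBoost algorithm, which compares the structure score of the two children with that of the parent node. The sketch below is illustrative only: `split_gain` is a hypothetical helper name, and the gradient and Hessian sums are made-up numbers rather than values from the fitted model.

```python
def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Gain of a candidate split.

    g_* and h_* are the sums of first- and second-order loss derivatives
    over the observations falling into the left and right child nodes.
    lam is the leaf-weight penalty and gamma the per-split penalty.
    """
    def score(g, h):
        return g * g / (h + lam)

    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma


# Made-up derivative sums for one candidate split.
gain_kept = split_gain(-4.0, 3.0, 5.0, 4.0, lam=1.0, gamma=0.5)
gain_pruned = split_gain(-4.0, 3.0, 5.0, 4.0, lam=1.0, gamma=5.0)
print(gain_kept, gain_pruned)
```

Raising `gamma` turns the same candidate split from worthwhile (positive gain) into one that is rejected, which is how the split penalty limits tree growth.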
XGBoost models require extensive parameter tuning, often involving tens of thousands of parameter combinations. The process of finding the optimal hyperparameters within such a large search space is known as hyperparameter optimization. Common methods for hyperparameter optimization include grid search and Bayesian optimization, among others [50]. To enhance generalizability and reduce the risk of overfitting, key hyperparameters were fine-tuned using grid search combined with five-fold cross-validation over the learning rate, maximum tree depth, number of trees, subsample ratio, and column sample by tree. The best model configuration was: depth = 5, trees = 100, subsample = 0.7, colsample_bytree = 1.0.
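The tuning loop can be sketched generically as follows. This is a dependency-free illustration of grid search with five-fold cross-validation, not the authors' code: the grid values are hypothetical placeholders, the helper names `k_fold_indices` and `grid_search_cv` are invented here, and `toy_score` stands in for the step that would fit a classifier on the training folds and return its accuracy on the held-out fold.

```python
import itertools
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle the row indices and split them into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def grid_search_cv(param_grid, fit_and_score, n_samples, k=5, seed=0):
    """Return the parameter combination with the best mean fold score."""
    folds = k_fold_indices(n_samples, k, seed)
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for i, test_idx in enumerate(folds):
            train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
            scores.append(fit_and_score(params, train_idx, test_idx))
        mean_score = sum(scores) / k
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


# Hypothetical search grid; these are not the values searched in the study.
param_grid = {
    "learning_rate": [0.05, 0.1, 0.3],
    "max_depth": [3, 5, 7],
    "n_estimators": [100, 200],
    "subsample": [0.7, 1.0],
    "colsample_bytree": [0.7, 1.0],
}


def toy_score(params, train_idx, test_idx):
    """Placeholder scorer; a real one would train on train_idx and
    return held-out accuracy on test_idx."""
    return -abs(params["max_depth"] - 5) - abs(params["subsample"] - 0.7)


best_params, best_score = grid_search_cv(param_grid, toy_score, n_samples=100)
print(best_params, best_score)
```

Even this small grid contains 72 combinations, each evaluated on five folds, which illustrates why the search space grows quickly as more values are added.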
Although XGBoost offers strong predictive power, it is often considered a “black-box” model. Therefore, to improve interpretability, SHAP values were computed [51]. SHAP is based on Shapley values from cooperative game theory and attributes to each feature its marginal contribution to the model’s prediction:
$$f(x_i) = \phi_0 + \sum_{j=1}^{M} \phi_{ij}$$
where $\phi_0$ is the expected value of the model output, $\phi_{ij}$ represents the marginal contribution of feature $j$ to the prediction for observation $i$, and $M$ is the number of features. The SHAP values were computed using TreeExplainer, an optimized method for tree-based models.
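The additive decomposition can be checked on a toy example by computing exact Shapley values through brute-force enumeration of feature subsets. This is a sketch under stated assumptions: "absent" features are replaced by a single baseline vector, the model `toy_model` and its inputs are invented, and the enumeration is exponential in the number of features, so it does not reflect how TreeExplainer computes values efficiently for tree ensembles.

```python
import itertools
import math


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    m = len(x)

    def value(subset):
        # Features in `subset` take their observed value, others the baseline.
        z = [x[j] if j in subset else baseline[j] for j in range(m)]
        return model(z)

    phi = [0.0] * m
    for j in range(m):
        others = [k for k in range(m) if k != j]
        for size in range(m):
            weight = (math.factorial(size) * math.factorial(m - size - 1)
                      / math.factorial(m))
            for subset in itertools.combinations(others, size):
                s = set(subset)
                phi[j] += weight * (value(s | {j}) - value(s))
    return phi


def toy_model(z):
    """Hypothetical model with one additive term and one interaction."""
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi, toy_model(baseline) + sum(phi), toy_model(x))
```

The interaction term is shared equally between the two features involved, and the baseline output plus the contributions recovers the prediction, which is the property the equation above expresses.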
Global feature importance was assessed via SHAP summary plots, which rank features by their average absolute impact across all observations. Additionally, SHAP force plots were employed to interpret the direction and magnitude of each feature’s contribution at the individual level. To further analyze the marginal effects and potential non-linearities, Partial Dependence Plots (PDPs) were also computed. These plots illustrate how predicted club membership probabilities change with variations in a single predictor, averaged over the dataset.
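The averaging behind a partial dependence curve can be written out directly. In the sketch below, `predict_club1` is a hypothetical stand-in for a fitted model's predicted probability of one club, and the two features and their values are invented; only the averaging logic is the point.

```python
import math


def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is fixed at each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature] = value  # overwrite one feature, keep the rest
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def predict_club1(row):
    """Hypothetical probability model with two illustrative features."""
    energy_use, env_tax = row
    return 1.0 / (1.0 + math.exp(-(0.8 * energy_use - 1.2 * env_tax)))


rows = [[1.0, 0.5], [2.0, 1.5], [3.0, 1.0], [4.0, 2.5]]
grid = [0.0, 1.0, 2.0, 3.0, 4.0]
curve = partial_dependence(predict_club1, rows, feature=0, grid=grid)
print([round(p, 3) for p in curve])
```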
All machine learning procedures were conducted in Python 3.11 using the xgboost and shap libraries. Detailed PDP visualizations are included in Appendix A, Appendix B and Appendix C, complementing the SHAP-based explanations.
4. Results and Discussions
Having identified the convergence clubs and trained the classification model, this section presents the empirical results. First, the outcomes of the club convergence analysis, including the number of clubs, their composition, and the statistical significance of their convergence parameters are summarized. Second, the SHAP and PDP outputs from the XGBoost classifier, which offer insights into the relative importance and functional role of each predictor in determining club membership, are analyzed.
4.1. Club Convergence Results
This analysis shows, both numerically and visually, whether the clubs’ emission paths tend to converge or diverge over time. Table 2 presents the estimated coefficients along with the corresponding t-statistics obtained using the log-t regression approach. In line with Phillips and Sul (2007, 2009) [14], the log-t regression was estimated using the HP filter (λ = 400) for detrending, with a window size r = 0.3 T, trimming parameter kq = 0.3, and the default Bartlett kernel. The relative transition paths for the final club partitions are reported in Appendix A. The log-t statistic for the full panel is −32.237, which is well below the critical value of −1.65. This result leads to the rejection of the null hypothesis of full convergence across the panel, thereby validating the application of the club convergence methodology.
Table 2.
Club Convergence Results.
An initial application of the clustering algorithm identifies six distinct convergence clubs. However, this method may overestimate the number of convergence groups [14]. Therefore, it is recommended to reapply the log-t test to evaluate whether smaller clubs can be merged into larger ones. The merging procedure was carried out iteratively following Schnurbus, Haupt, and Meier (2017): at each step, the log-t test was applied to adjacent clubs, and if the joint log-t statistic exceeded the critical value of −1.65 at the 10% significance level (Phillips and Sul, 2007) [14], the clubs were merged into a new group. This process was repeated until no further merges were possible, yielding the final club classifications. Following this procedure, Club 1 is found to converge with Clubs 2 and 3, while Club 4 converges with Club 5. Club 6, on the other hand, does not exhibit convergence with any other group. Consequently, three final convergence clubs are identified. The distribution of these three clubs is illustrated in Figure 1.
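The iterative merging logic can be sketched as a simple loop over adjacent clubs. This is a simplified reading of the procedure, not a faithful reimplementation: `merge_adjacent_clubs` is a hypothetical name, and `toy_t_stat` is a stand-in that mimics the outcome reported above instead of running an actual log-t regression on the pooled members.

```python
def merge_adjacent_clubs(clubs, t_stat, critical=-1.65):
    """Merge neighbouring clubs while their joint log-t statistic
    stays above the critical value."""
    clubs = [list(c) for c in clubs]
    i = 0
    while i < len(clubs) - 1:
        candidate = clubs[i] + clubs[i + 1]
        if t_stat(candidate) > critical:
            # Merge, then retest the enlarged club against its next neighbour.
            clubs[i:i + 2] = [candidate]
        else:
            i += 1
    return clubs


# Stand-in for the joint log-t test; the groupings mirror the text above.
COMPATIBLE = [{1, 2, 3}, {4, 5}, {6}]


def toy_t_stat(members):
    return 1.0 if any(set(members) <= group for group in COMPATIBLE) else -5.0


initial = [[1], [2], [3], [4], [5], [6]]  # labels of the six initial clubs
final = merge_adjacent_clubs(initial, toy_t_stat)
print(final)
```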
Figure 1.
GHG Emission Convergence Clubs Map.
The log-t results of final classification indicate convergence in all three clubs, as the test statistics exceed the −1.65 critical value (Club 1: t = 0.409; Club 2: t = 4.785; Club 3: t = 1.030). Nevertheless, the strength and speed of convergence differ markedly across groups. Club 2 displays the most pronounced and precise convergence, whereas Club 1 and Club 3 show slower and more weakly evidenced convergence, consistent with smaller coefficients and test statistics that lie only modestly above the cutoff. Overall, the findings suggest that while convergence is present across all clubs, it is substantively strongest in Club 2 and comparatively muted in Clubs 1 and 3. These results further indicate that the weaker convergence observed especially in Club 3 may be attributable to the greater heterogeneity of the countries within this group.
Figure 2 plots the transition paths of the three convergence clubs in per capita GHG emissions. Club 1 remains at the highest emission level with only a mild decline, Club 2 shows a steady downward adjustment from intermediate levels, and Club 3 follows the lowest path with a pronounced reduction. The patterns confirm multiple convergence equilibria and highlight heterogeneity in decarbonization trajectories across country groups.
Figure 2.
Transition Paths of GHG Convergence Clubs.
Table 3 reports the distribution of OECD countries across convergence clubs, both yearly and for aggregated periods, as well as the performance metrics of the XGBoost classifier. Overall, Club 1 consistently includes 17 countries (459 observations), Club 2 covers 8 countries (216 observations), and Club 3 consists of 13 countries (351 observations). The yearly distribution remains stable from 1996 to 2022, with no transitions across clubs over the study period, consistent with the log-t convergence test. When aggregated into three subperiods—1996–2002, 2003–2012, and 2013–2022—the relative sizes of the three groups remain balanced, suggesting persistent heterogeneity in emission trajectories across OECD countries.
Table 3.
Distribution of OECD countries across convergence clubs.
4.2. XGBoost Results
This section reports the model’s predictions, accuracy, overall performance, and feature importance in detail. The confusion matrix in Table 4 summarizes the classification outcomes of the XGBoost model across the three convergence clubs. Predictors are measured contemporaneously with labels (time t), missing values are addressed through mean imputation, and variables are used in their original units without scaling. Country-specific fixed effects were not removed, so SHAP values capture both cross-sectional and temporal variation, highlighting actual structural drivers of club membership. The classifier performs with near-perfect accuracy. Of the 92 observations assigned to Club 1, 90 were correctly identified, while only two were misclassified, one each into Clubs 2 and 3. Club 2 and Club 3 exhibit perfect classification performance, with all cases accurately predicted.
Table 4.
Confusion Matrix.
Table 4 shows how well the XGBoost model classified country-year observations into their respective clubs. Most observations in Club 1 were predicted correctly (90 out of 92), while Club 2 and Club 3 had no misclassifications. Only two Club 1 observations were incorrectly labeled, one as Club 2 and one as Club 3. The results suggest that the features used in the model are effective in separating country groups with different emission paths.
As shown in Table 5, the overall performance of the XGBoost classification model was exceptionally high. The model achieved an accuracy of 98.05%, with a 95% confidence interval ranging from 97.44% to 98.66%, indicating strong reliability in predicting club membership based on environmental, political, institutional, and socioeconomic indicators. The Cohen’s Kappa score of 0.9848 indicates a very high level of consistency between the predicted and actual classifications, well beyond what could be expected by chance. Moreover, the extremely small p-value (<2.5 × 10⁻⁶⁸) from the binomial test demonstrates that the model’s accuracy is not only statistically significant but also far exceeds what would be expected under a no-information baseline, underscoring the strength of the classifier. The bootstrapped confidence intervals are very narrow, confirming that the high performance is not the result of random variation in the sample. Precision and recall values are almost identical, which implies that the model avoids both false positives and false negatives. The balance between sensitivity and specificity reflects a reliable classifier that does not favor one class at the expense of others. This consistency is particularly important in convergence studies, where overlooking even a small subset of countries could bias conclusions about structural heterogeneity.
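For readers less familiar with these summary statistics, the sketch below shows how accuracy and Cohen's Kappa follow from a confusion matrix. The Club 1 row uses the counts quoted in the text (90 correct, one error each into Clubs 2 and 3); the sizes of Clubs 2 and 3 are assumptions made purely for illustration, so the printed figures are not meant to reproduce Table 5. The function name `accuracy_and_kappa` is hypothetical.

```python
def accuracy_and_kappa(cm):
    """Overall accuracy and Cohen's Kappa from a square confusion matrix
    (rows = actual class, columns = predicted class)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(n)) / total
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    # Agreement expected by chance given the marginal class frequencies.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return observed, (observed - expected) / (1.0 - expected)


confusion = [
    [90, 1, 1],   # Club 1 row, as described in the text
    [0, 43, 0],   # Club 2 size is an assumption
    [0, 0, 70],   # Club 3 size is an assumption
]
accuracy, kappa = accuracy_and_kappa(confusion)
print(f"accuracy = {accuracy:.4f}, kappa = {kappa:.4f}")
```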
Table 5.
Overall Model Metrics (Bootstrapped, 95% CI).
The identical results across different random seeds highlight the stability of the classifier (Table 6). Random initialization and sampling strategies do not affect the outcomes, meaning the learned decision boundaries are data-driven and not artifacts of model randomness. This strengthens the reliability of the feature importance analysis, since SHAP explanations build on stable underlying predictions.
Table 6.
Performance Across Random Seeds.
The cross-validation results in Table 7 confirm that the model generalizes well beyond a single train-test split. Misclassifications remain extremely rare across clubs, and the distribution of errors is balanced rather than concentrated in one group. This indicates that the model captures structural rather than period-specific or country-specific noise. The robustness across folds provides strong evidence that the identified predictors systematically differentiate convergence clubs.
Table 7.
Average Confusion Matrix.
Table 8 presents the detailed performance metrics for each convergence club predicted by the XGBoost model. The classifier showed outstanding recall scores, correctly identifying all instances in Clubs 2 and 3, and nearly all in Club 1 (97.83%). Specificity was also very high across the board, ranging from 99.26% to 100%, which means the model made very few false positive errors. Precision was flawless for Club 1 (100%) and remained above 97% for the other clubs. Similarly, the Negative Predictive Values (NPV) were consistently close to 100%, indicating strong performance in ruling out incorrect classifications. Balanced Accuracy—the average of sensitivity and specificity—surpassed 98.9% in all cases, pointing to the model’s well-rounded and unbiased classification ability.
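The class-wise metrics are one-vs-rest quantities derived from the same confusion matrix. A minimal sketch follows; as before, the Club 1 row comes from the text, the Club 2 and Club 3 sizes are illustrative assumptions, and `class_metrics` is a hypothetical helper name.

```python
def class_metrics(cm, k):
    """One-vs-rest metrics for class k (rows = actual, columns = predicted)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp
    fp = sum(cm[i][k] for i in range(n)) - tp
    tn = total - tp - fn - fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2.0,
    }


confusion = [
    [90, 1, 1],   # Club 1 row, as described in the text
    [0, 43, 0],   # Club 2 size is an assumption
    [0, 0, 70],   # Club 3 size is an assumption
]
club1 = class_metrics(confusion, 0)
for name, value in club1.items():
    print(f"{name}: {value:.4f}")
```

Because Club 1 has no false positives but two false negatives, its precision and specificity are perfect while its recall is slightly lower, which is the asymmetry described above.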
Table 8.
Class-wise Performance Metrics.
Table 9 presents the feature importance scores derived from the XGBoost model using the gain metric, which reflects the average improvement in the model’s objective when a particular feature is used in a decision tree split. The most influential variable was Enr_use, contributing approximately 14.7% of the total gain. This was followed by Trade and Hum_cap, with normalized importance values of 12.7% and 10.8%, respectively. Other relevant predictors included Env_tax, Fossil, and Urb_pop. In contrast, Fdi and Eco_tech had relatively lower contributions to the model’s overall predictive power. These results indicate that energy-related and macro-structural variables play a leading role in explaining cross-country differences in environmental convergence patterns.
Table 9.
Feature Importance.
4.3. SHAP Analysis Results
In this section, the contribution of each feature to the decision-making process of the XGBoost model is explained visually using SHAP analysis, conducted separately for each convergence club. Unlike an aggregate SHAP analysis, this club-wise approach allows us to capture heterogeneity in variable importance across convergence clubs and identify the context-specific drivers of environmental convergence. For clarity and consistency, the interpretation is organized into three thematic dimensions: energy structure, institutional quality, and socio-economic capacity.
Figure 3 presents the SHAP analysis results for Club 1. This club is primarily characterized by low energy efficiency and high fossil fuel dependence, which emerge as the strongest determinants of membership. However, renewable energy use exerts a positive influence, suggesting that countries in this cluster attempt to offset structural weaknesses through renewable investments. While this compensatory effect is visible, it does not fully counterbalance the adverse impact of inefficiency and fossil reliance.
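To make the notion of a SHAP contribution concrete, the sketch below computes exact Shapley values for a three-feature toy scoring function by enumerating all feature coalitions. It is only an illustration of the underlying attribution idea: it is not the tree-specific algorithm normally used with gradient-boosted models, and the scoring function, its numbers, and the helper names are hypothetical (only the feature labels are borrowed from the text).

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List


def shapley_values(
    features: List[str],
    value: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Exact Shapley values by enumerating every coalition of the other features."""
    n = len(features)
    phi: Dict[str, float] = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_score(coalition: FrozenSet[str]) -> float:
    """Invented membership score with one interaction term."""
    score = 0.0
    if "Enr_eff" in coalition:
        score += 2.0
    if "Fossil" in coalition:
        score -= 1.0
    if "Enr_eff" in coalition and "Hum_cap" in coalition:
        score += 1.0  # interaction, shared equally between the two features
    return score


names = ["Enr_eff", "Fossil", "Hum_cap"]
phi = shapley_values(names, toy_score)
```

The contributions add up to the difference between the score with all features present and the score with none, which is the additivity property that makes per-feature SHAP plots interpretable.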
Figure 3.
SHAP Values for Club 1.
Institutional and policy indicators display intermediate but mixed impacts. Environmental taxation and institutional quality tend to reduce the likelihood of membership, suggesting that stronger fiscal and governance frameworks are associated with escape from this cluster. Policy stringency, by contrast, has a weak but positive contribution, highlighting that while regulatory tools exist, they are insufficient to drive structural transformation.
Socioeconomic variables contribute more moderately. Urban population exerts heterogeneous effects, but with a tendency toward positive contributions, suggesting that more urbanized members often remain locked into fossil-intensive systems while some outliers deviate from this pattern. Trade openness also shows mixed and inconsistent effects, pointing to divergent integration patterns into global markets. GDP per capita and foreign direct investment contribute positively but only marginally, indicating that economic size and capital inflows alone do not shape convergence. Human capital shows heterogeneous effects, reflecting uneven educational and skill dynamics, and eco-technology adoption exerts a weak but positive role, signaling limited innovation-driven transformation.
Overall, Club 1 emerges as the most vulnerable pathway, defined by inefficiency, fossil dependence, weak institutional anchors, and only partial compensation through renewable adoption and socioeconomic improvements. This makes it fragile, but also the most open to policy-driven transformation.
In Club 2, institutional and human capital features play the central role in shaping membership (Figure 4). Hum_cap emerges as the most influential determinant, with higher values consistently associated with positive SHAP contributions. This indicates that countries with well-educated populations and advanced skill bases are more likely to belong to Club 2. Inst and Env_tax also contribute positively and significantly, reinforcing the interpretation that this cluster represents countries with robust governance structures and effective policy frameworks for climate and environmental management.
Figure 4.
SHAP Values for Club 2.
Energy-related variables play a more nuanced role in this group. Renewable energy generally reduces the likelihood of Club 2 membership, whereas fossil fuel use shows heterogeneous effects: some countries rely on fossil fuels, but strong institutional and human capital capacities compensate for this, sustaining convergence. Enr_eff plays only a modest role compared to governance and human capital features.
Other factors, including Trade, Urb_pop, and Pol_strn, contribute moderately and in some cases inconsistently. Socioeconomic indicators such as Gdp, Fdi, and Eco_tech remain secondary determinants, with relatively small SHAP magnitudes, underscoring that traditional growth or technology drivers are not the primary explanatory variables for this cluster.
In sum, Club 2 reflects a convergence pathway anchored in strong human capital and governance capacity, complemented by institutionalized environmental policies. The presence of consistently positive SHAP effects from education, governance, and environmental taxation confirms that structural capacity building and policy design play a decisive role in shaping this group. The high and significant log-t statistic further demonstrates that these countries follow the strongest and most coherent convergence trajectory among the three identified clubs.
The SHAP analysis for Club 3 reveals that this group follows a convergence pathway primarily driven by energy-related features (Figure 5). Enr_eff stands out as the most important determinant, with consistently positive SHAP values. Countries with higher efficiency are substantially more likely to be classified into this cluster. In contrast, Fossil exerts a strong negative influence, indicating that a lower reliance on fossil energy is a defining characteristic of Club 3 membership. Renew plays a moderate role, with generally negative but heterogeneous effects across countries. Hum_cap is ranked as the second most important factor; however, its contribution is predominantly negative at higher values. While seemingly counterintuitive given the presence of advanced economies such as Denmark, Sweden, and Switzerland in this club, the result highlights that convergence within this group is not explained by human capital alone but rather by the interaction of socioeconomic capacity with structural energy transitions. Other socio-economic indicators, including Gdp, Fdi, and Eco_tech, exert only marginal effects, underscoring the limited role of traditional growth drivers in explaining emission convergence. Institutional and policy dimensions, including Env_tax, Inst, and policy stringency (Pol_strn), provide limited but mixed contributions. Although these features are not uniformly decisive, they reinforce the structural transformation in the energy sector and complement socioeconomic drivers. Urb_pop also exerts dispersed SHAP effects, suggesting that demographic dynamics interact with broader institutional and energy conditions but do not singularly determine membership.
Figure 5.
SHAP Values for Club 3.
Overall, Club 3 reflects a convergence trajectory rooted in structural and energy-sector alignment rather than institutional or economic strength alone. The moderate but statistically significant log-t statistic confirms that these countries are converging through diversified yet structured pathways. This explains why advanced economies with strong capacity for energy transition—such as Denmark, Sweden, and Switzerland—are consistently located in this cluster.
4.4. Robustness Check
The SHAP-based results of the XGBoost model were compared with multinomial logit (mlogit) regression results, using Club 1 as the base category, in order to assess their consistency and robustness (Table 10). While SHAP values reflect the marginal contribution of each variable to model predictions in a non-parametric framework, mlogit coefficients provide insights into the direction and magnitude of relationships between covariates and the likelihood of club membership within a parametric setting.
Table 10.
Mlogit Results.
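For readers less familiar with the base-category convention, the sketch below shows how multinomial logit coefficients translate into membership probabilities when the linear index of the base category (here Club 1) is normalized to zero. The function name, the single covariate, and the coefficient values are hypothetical and unrelated to the estimates in Table 10.

```python
from math import exp
from typing import Dict, List


def mlogit_probabilities(
    x: List[float],
    coefs: Dict[str, List[float]],
    base: str,
) -> Dict[str, float]:
    """Category probabilities from multinomial logit coefficients.

    coefs maps each non-base category to [intercept, b1, b2, ...];
    the base category has all coefficients fixed at zero.
    """
    index = {base: 0.0}
    for category, b in coefs.items():
        index[category] = b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))
    shift = max(index.values())  # subtract the maximum for numerical stability
    expo = {c: exp(v - shift) for c, v in index.items()}
    denom = sum(expo.values())
    return {c: e / denom for c, e in expo.items()}


# Invented coefficients: one covariate with opposite signs for Clubs 2 and 3.
coefs = {"Club 2": [0.0, 1.0], "Club 3": [0.0, -1.0]}
p_zero = mlogit_probabilities([0.0], coefs, base="Club 1")
p_one = mlogit_probabilities([1.0], coefs, base="Club 1")
```

A positive coefficient for a club therefore means that a higher covariate value raises the odds of that club relative to Club 1, which is the sense in which coefficient signs are compared with the direction of SHAP effects.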
Despite methodological differences, both approaches exhibit a high degree of consistency in identifying the relative importance and direction of key variables across the three convergence clubs. For instance, energy efficiency shows a strong positive association with membership in Club 3 in both the SHAP and mlogit results, while fossil fuel consumption and urban population growth exhibit strong negative contributions in the same club across both methods. Similarly, human capital contributes positively to Club 2 and negatively to Club 3 in both approaches. Renewable energy use generally shows a negative association with membership in Clubs 2 and 3 in the mlogit results, which aligns with the mixed but predominantly negative SHAP effects observed in these clubs. Institutional quality and environmental taxation also display broadly consistent positive effects on Club 2 membership across both analyses, although their impact is more nuanced in Club 3. Policy stringency and trade openness have smaller and less consistent influences; both SHAP and mlogit results suggest these variables contribute only modestly or insignificantly to club membership probabilities. Likewise, eco-friendly technological development and foreign direct investment appear to have limited and statistically weak effects in both methods. The alignment in sign, magnitude, and relative importance across approaches suggests that the SHAP-derived insights are not model-specific artifacts, but rather reflect underlying structural patterns in the data.
This consistency across distinct modeling frameworks strengthens the credibility of our findings. It highlights that the underlying drivers of greenhouse gas emissions convergence in OECD countries are robust to different methodological specifications, thereby reinforcing the explanatory value of key variables identified through SHAP analysis.
5. Conclusions
This study first examines greenhouse gas (GHG) emission convergence among 38 OECD countries from 1996 to 2022 using the Phillips and Sul (2007) [14] club convergence method. Supervised machine learning techniques are then applied to analyze the factors that determine membership in the identified convergence clubs. The results confirm the existence of three convergence clubs, reflecting heterogeneous emission dynamics. An XGBoost classifier trained on socioeconomic, institutional, and environmental variables achieved an accuracy of 99.03%, and SHAP analysis identified the political, socio-economic, and institutional factors that determine club membership.
The three different greenhouse gas emission convergence clubs identified in our study differ based on the energy performance, institutional structures and structural capacities of the countries. In this context, different policy priorities should be adopted for each club.
Club 1 consists of countries with lower performance in terms of energy efficiency and higher fossil fuel use, uneven or modest renewable adoption and unclear institutional/fiscal orientations. In these countries, priority should be given to incentives to increase energy efficiency and support renewable energy investments. In addition, a more sustainable and harmonious policy framework should be created by strengthening environmental taxes and institutional capacity. Considering that this club shows a looser convergence, it is important to flexibly consider local and sectoral needs in policy designs.
Club 2 consists of countries with high human capital, strong institutional capacity, and effective environmental taxation mechanisms. Although the relationship between fossil fuel use, energy efficiency and club membership is somewhat mixed, countries in this cluster generally show lower reliance on fossil fuels. For the countries in this club, the policy priority should be to maintain existing strengths and tighten environmental regulations. In addition, these countries should be supported in taking a leading role in reducing carbon emissions through innovative financial instruments and international cooperation mechanisms. In this way, the club can sustain a stronger and more consistent convergence path.
Club 3 includes countries that follow a convergence process based on structural and socioeconomic capacity. Although they exhibit a heterogeneous profile in terms of energy performance, high energy efficiency and low fossil fuel use are the main factors that increase the likelihood of membership in this club. In addition, renewable energy use and institutional quality provide moderate but significant contributions. In this context, strengthening institutional capacity should be a priority in these countries. Flexible and holistic development strategies should be developed in line with the diversity in the energy sector. Coordination between environmental taxation and energy policies should be increased.
Importantly, across all three clubs, environmentally friendly technological development exhibits limited influence on convergence outcomes. However, this area is critical for the long-term low-carbon transformation. Therefore, R&D investments in green technologies should be increased and innovative environmental technologies should be supported by stronger public policies and financial incentives.
In conclusion, the convergence of greenhouse gas emissions calls for targeted policy packages that match the different needs and capacities of countries and take their club-based characteristics into account, rather than uniform policy approaches. In this way, global climate goals can be achieved more effectively and comprehensively.
This study introduces a novel integration of club convergence analysis with XGBoost and SHAP to uncover heterogeneous drivers of environmental convergence across OECD countries. By combining machine learning explainability with traditional econometric approaches, it provides robust, nuanced insights into how key socio-economic and energy-related factors—including energy efficiency, human capital, fossil fuel use, and institutional quality—influence convergence clubs. The findings offer valuable guidance for tailored policy interventions to promote sustainable energy transitions and climate action.
Future research could investigate the potential effects of regime shifts, structural breaks, and other time-varying dynamics on greenhouse gas emission convergence. Employing rolling-window analyses, regime-specific models, or alternative machine learning algorithms would provide a more nuanced understanding of how policy changes, economic shocks, and global events shape country trajectories. Additionally, extending the framework to include all countries and incorporating time-varying data would help validate and enrich the findings presented here, offering a stronger basis for policy recommendations tailored to different convergence pathways.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The datasets analysed during the current study are available from the OECD, WDI, and WGI databases.
Conflicts of Interest
The author declares no conflicts of interest.
Abbreviations
| GHGs | Greenhouse Gas Emissions |
| XGBoost | Extreme Gradient Boosting |
| SHAP | SHapley Additive exPlanations |
| CO2 | Carbon Dioxide |
| CH4 | Methane |
| N2O | Nitrous Oxide |
| IPCC | Intergovernmental Panel on Climate Change |
| NDCs | Nationally Determined Contributions |
| ML | Machine Learning |
| WDI | World Development Indicators |
| WGI | Worldwide Governance Indicators |
| PDPs | Partial Dependence Plots |
| NPV | Negative Predictive Value |
Appendix A. Partial Dependence for Club 1

Appendix B. Partial Dependence for Club 2

Appendix C. Partial Dependence for Club 3

References
- World Health Organization. World Health Statistics: Monitoring Health for the SDGs, Sustainable Development Goals; World Health Organization: Geneva, Switzerland, 2024.
- Nationen, V. (Ed.) Realizing the Importance of Forests in a Changing World; The Global Forest Goals Report; United Nations: New York, NY, USA, 2021.
- Kjellström, T.; Maître, N.; Saget, C.; Otto, M.; Karimova, T. Working on a Warmer Planet: The Effect of Heat Stress on Productivity and Decent Work; International Labour Organization: Geneva, Switzerland, 2019.
- IPCC. Global Warming of 1.5 °C: IPCC Special Report on Impacts of Global Warming of 1.5 °C above Pre-Industrial Levels in Context of Strengthening Response to Climate Change, Sustainable Development, and Efforts to Eradicate Poverty, 1st ed.; Cambridge University Press: Cambridge, UK, 2022; Available online: https://www.researchgate.net/publication/329841417_Global_warming_of_15C_An_IPCC_Special_Report_on_… (accessed on 28 August 2025).
- GHG Emission Trends and Targets (GETT): Harmonised Quantification Methodology and Indicators; OECD Environment Working Papers No. 230. 2024. Available online: https://www.oecd.org/en/publications/ghg-emission-trends-and-targets-gett_decef216-en.html (accessed on 28 August 2025).
- Barassi, M.R.; Spagnolo, N.; Zhao, Y. Fractional Integration Versus Structural Change: Testing the Convergence of CO2 Emissions. Environ. Resour. Econ. 2018, 71, 923–968.
- Solarin, S.A. Convergence in CO2 emissions, carbon footprint and ecological footprint: Evidence from OECD countries. Environ. Sci. Pollut. Res. 2019, 26, 6167–6181.
- Lawson, L.A.; Martino, R.; Nguyen-Van, P. Environmental convergence and environmental Kuznets curve: A unified empirical framework. Ecol. Model. 2020, 437, 109289.
- Sun, H.; Kporsu, A.K.; Taghizadeh-Hesary, F.; Edziah, B.K. Estimating environmental efficiency and convergence: 1980 to 2016. Energy 2020, 208, 118224.
- Bektaş, V.; Ursavaş, N. Revisiting the environmental Kuznets curve hypothesis with globalization for OECD countries: The role of convergence clubs. Environ. Sci. Pollut. Res. 2023, 30, 47090–47105.
- Borowiec, J.; Papież, M. Convergence of CO2 emissions in countries at different stages of development. Do globalisation and environmental policies matter? Energy Policy 2024, 184, 113866.
- Ahouangbe, V.L.; Turcu, C. How bilateral foreign direct investment influences environmental convergence. World Econ. 2024, 47, 37–95.
- Diallo, S. Effect of renewable energy consumption on environmental quality in sub-Saharan African countries: Evidence from defactored instrumental variables method. Manag. Environ. Qual. Int. J. 2024, 35, 839–857.
- Phillips, P.C.B.; Sul, D. Transition Modeling and Econometric Convergence Tests. Econometrica 2007, 75, 1771–1855.
- Barro, R.J.; Sala-i-Martin, X. Convergence. J. Polit. Econ. 1992, 100, 223–251.
- Aldy, J.E. Per capita carbon dioxide emissions: Convergence or divergence. Environ. Resour. Econ. 2006, 33, 533–555.
- Ezcurra, R. Is there cross-country convergence in carbon dioxide emissions? Energy Policy 2007, 35, 1363–1372.
- Bigerna, S.; Bollino, C.A.; Polinori, P. Convergence in renewable energy sources diffusion worldwide. J. Environ. Manag. 2021, 292, 112784.
- Strazicich, M.C.; List, J.A. Are CO2 Emission Levels Converging Among Industrial Countries? Environ. Resour. Econ. 2003, 24, 263–271.
- Acaravcı, A.; Erdogan, S. The convergence behavior of CO2 emissions in seven regions under multiple structural breaks. Int. J. Energy Econ. Policy 2016, 6, 575–580.
- Lee, J.; Yucel, A.G.; Islam, M.T. Convergence of CO2 emissions in OECD countries. Sustain. Technol. Entrep. 2023, 2, 100029.
- McKibbin, W.J.; Stegman, A. Convergence and per capita carbon emissions. CAMA Work. Pap. Ser. 2005, 167, 1–69.
- Brock, W.A.; Taylor, M.S. The green Solow model. J. Econ. Growth 2010, 15, 127–153.
- Li, X.; Lin, B. Global convergence in per capita CO2 emissions. Renew. Sustain. Energy Rev. 2013, 24, 357–363.
- Karakaya, E.; Alataş, S.; Yılmaz, B. Replication of Strazicich and List (2003): Are CO2 emission levels converging among industrial countries? Energy Econ. 2019, 82, 135–138.
- Panopoulou, E.; Pantelidis, T. Club Convergence in Carbon Dioxide Emissions. Environ. Resour. Econ. 2009, 44, 47–70.
- Herrerias, M.J. The environmental convergence hypothesis: Carbon dioxide emissions according to the source of energy. Energy Policy 2013, 61, 1140–1150.
- Haider, S.; Akram, V. Club convergence analysis of ecological and carbon footprint: Evidence from a cross-country analysis. Carbon Manag. 2019, 10, 451–463.
- Ulucak, R.; Apergis, N. Does convergence really matter for the environment? An application based on club convergence and on the ecological footprint concept for the EU countries. Environ. Sci. Policy 2018, 80, 21–27.
- Bilgili, F.; Ulucak, R. Is there deterministic, stochastic, and/or club convergence in ecological footprint indicator among G20 countries? Environ. Sci. Pollut. Res. 2018, 25, 35404–35419.
- Varian, H.R. Big Data: New Tricks for Econometrics. J. Econ. Perspect. 2014, 28, 3–28.
- Elsaraiti, M.; Merabet, A. Solar Power Forecasting Using Deep Learning Techniques. IEEE Access 2022, 10, 31692–31698.
- Salman, D.; Direkoglu, C.; Kusaf, M.; Fahrioglu, M. Hybrid deep learning models for time series forecasting of solar power. Neural Comput. Appl. 2024, 36, 9095–9112.
- Gul, E.; Baldinelli, G.; Wang, J.; Bartocci, P.; Shamim, T. Artificial intelligence based forecasting and optimization model for concentrated solar power system with thermal energy storage. Appl. Energy 2025, 382, 125210.
- Panigrahi, R.; Patne, N.R.; Padmanaban, S.; Sana Amreen, T. Comparative and Novel Priority Ranking Analysis for Short-Term Solar Power Prediction. IETE J. Res. 2025, 71, 1–16.
- Appiah, K.; Du, J.; Appah, R.; Quacoe, D. Prediction of Potential Carbon Dioxide Emissions of Selected Emerging Economies Using Artificial Neural Network. J. Environ. Sci. Eng. A 2018, 7, 321–335.
- Zhao, H.; Huang, G.; Yan, N. Forecasting Energy-Related CO2 Emissions Employing a Novel SSA-LSSVM Model: Considering Structural Factors in China. Energies 2018, 11, 781.
- Komeili Birjandi, A.; Fahim Alavi, M.; Salem, M.; Assad, M.E.H.; Prabaharan, N. Modeling carbon dioxide emission of countries in southeast of Asia by applying artificial neural network. Int. J. Low-Carbon Technol. 2022, 17, 321–326.
- Sakilu, O.B.; Chen, H. Carbon Dioxide Emissions Prediction of Selected Developing Countries Using Artificial Neural Network. J. Knowl. Econ. 2025; Advance online publication.
- Li, C.; Zheng, X.; Yang, Z.; Kuang, L. Predicting Short-Term Electricity Demand by Combining the Advantages of ARMA and XGBoost in Fog Computing Environment. Wirel. Commun. Mob. Comput. 2018, 2018, 5018053.
- Ma, J.; Yu, Z.; Qu, Y.; Xu, J.; Cao, Y. Application of the XGBoost Machine Learning Method in PM2.5 Prediction: A Case Study of Shanghai. Aerosol Air Qual. Res. 2020, 20, 128–138.
- Li, J.; An, X.; Li, Q.; Wang, C.; Yu, H.; Zhou, X.; Geng, Y. Application of XGBoost algorithm in the optimization of pollutant concentration. Atmos. Res. 2022, 276, 106238.
- Begum, A.M.; Mobin, M.A. A machine learning approach to carbon emissions prediction of the top eleven emitters by 2030 and their prospects for meeting Paris agreement targets. Sci. Rep. 2025, 15, 19469.
- Imran, M.; Jijian, Z.; Sharif, A.; Magazzino, C. Evolving waste management: The impact of environmental technology, taxes, and carbon emissions on incineration in EU countries. J. Environ. Manag. 2024, 364, 121440.
- Xu, C.; Xiong, W.; Zhang, S.; Shi, H.; Wu, S.; Bao, S.; Xiao, T. Research on the Nonlinear Relationship Between Carbon Emissions from Residential Land and the Built Environment: A Case Study of Susong County, Anhui Province Using the XGBoost-SHAP Model. Land 2025, 14, 440.
- Wu, N.; Zhou, Y.; Yin, S.; Gong, H.; Zhang, C. Revealing the nonlinear impact of environmental regulation on ecological resilience using the XGBoost-SHAP model: Evidence from the Yangtze River Delta region, China. J. Clean. Prod. 2025, 514, 145700.
- Du, K. Econometric Convergence Test and Club Clustering Using Stata. Stata J. Promot. Commun. Stat. Stata 2017, 17, 882–900.
- Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016.
- Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
- Shan, T.; Feng, S.; Li, K.; Chang, R.; Huang, R. Unveiling the effects of artificial intelligence and green technology convergence on carbon emissions: An explainable machine learning-based approach. J. Environ. Manag. 2025, 373, 123657.
- Lundberg, S.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. arXiv 2017, arXiv:1705.07874.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).