Forecasting National Sustainability Trajectories with Deep Learning: Predictability, Surprise, and Early Predictive Signals

Lan, Hai; Terbeck, Fabian

doi:10.3390/su18115530

by Hai Lan * and Fabian Terbeck

Department of Earth Sciences, University of South Alabama, Mobile, AL 36688, USA

* Author to whom correspondence should be addressed.

Sustainability 2026, 18(11), 5530; https://doi.org/10.3390/su18115530

Submission received: 22 April 2026 / Revised: 26 May 2026 / Accepted: 28 May 2026 / Published: 1 June 2026

(This article belongs to the Section Development Goals towards Sustainability)

Abstract

Sustainability monitoring has mainly focused on measuring where countries stand today, rather than anticipating where they are headed. This study develops an AI-based forecasting framework to predict national sustainability outcomes and identify countries whose actual paths deviate from predictions. Using 749 World Development Indicators across 184 countries and regions from 2003 to 2022, a Temporal Fusion Transformer (TFT) is developed using data from 2003 to 2017 (training and validation) and evaluated on a held-out 2018 to 2022 test period, with calibrated prediction intervals constructed retrospectively over the test period. Assuming that historical development patterns remain informative over the forecast horizon, the model achieves mean absolute errors of 1.10 for the Sustainable Development Goals Index (SDGI, 0 to 100 scale) and 0.008 for the Human Development Index (HDI, 0 to 1 scale), reducing error by at least 19 percent for SDGI and 60 percent for HDI relative to linear trend and XGBoost baselines. Of 184 countries and regions, 115 (62 percent) are classified as on-track for both indices. Among the rest, 35 show positive SDGI deviations, mostly developing nations in Sub-Saharan Africa and South Asia that are exceeding their forecast trajectories, while 23 show negative HDI deviations concentrated among nations affected by conflict and economic disruption. We find this asymmetric pattern is consistent with a decoupling between goal-level and capability-level sustainability, in which policy-driven SDG indicators can advance while foundational human development in health and income stalls. Our model identifies economic indicators as the dominant predictors of HDI (7 of the top 10), while SDGI prediction draws on a more balanced mix of economic, social, environmental, and institutional indicators. 
We also find that better governance is associated with lower prediction error for both SDGI (p = 0.004) and HDI (p < 0.001), suggesting that countries and regions with stronger institutions follow more predictable sustainability trajectories.

Keywords:

sustainable development; SDG forecasting; time series prediction; temporal fusion transformer; sustainability trajectories; governance

1. Introduction

The Sustainable Development Goals adopted by the United Nations in 2015 have generated a large ecosystem of composite indices, dashboards, and annual reports designed to track where countries stand [1,2]. The Sustainable Development Goals Index (SDGI) published by the Sustainable Development Solutions Network aggregates performance across all 17 goals into a single score [3,4], while the Human Development Index (HDI), maintained by the United Nations Development Programme, captures health, education, and income dimensions that reflect foundational human capabilities [5]. Together, these two indices offer complementary perspectives on national sustainability, with the SDGI reflecting policy goals and the HDI capturing structural capabilities.

Although the analysis of interlinkages among the SDGs has advanced, most of this work remains diagnostic. A large body of work on pairwise interactions between the SDGs shows that synergies between goals are more prevalent than trade-offs, although the balance varies across income groups, regions, and policy contexts [6,7,8,9,10]. Network analyses have shown that the SDGs interact differently across income groups [11], and governance scholars have argued that implementing policy across goals requires restructuring institutions [12]. These studies have produced important insights into where progress is lagging and how goals interact with one another [13,14]. They share, however, a common analytical orientation, characterizing current states or static relationships rather than asking whether the trajectory itself can be anticipated from the past. Forecasting methods can complement these diagnostic efforts by establishing empirical baselines against which structural shifts can be detected.

A different question has received far less attention. Can a country’s sustainability trajectory be predicted from its development history, and what does it reveal when the actual path diverges from what historical patterns would lead us to expect?

The limited forecasting literature that does exist operates under important constraints. Chenary et al. [15] forecast SDG scores through 2030 using ARIMAX and linear regression smoothed with the Holt–Winters multiplicative method, selecting predictors based on their relevance to SDG targets. Their analysis covered six world regions aggregated from national data, and the results projected that OECD countries would reach a score of approximately 80, while Sub-Saharan Africa would remain below 60. However, the regional aggregation forgoes country-specific trajectories altogether, and the statistical models used do not capture nonlinear or cross-indicator dynamics. Xing et al. [16] made a substantial contribution by projecting 117 individual SDG indicators for 167 countries using a neural network time-series approach, comparing average annual change rates before and after SDG adoption in 2015 to classify country–goal pairs as advancing, regressing, or stagnating. Their work projected that global SDG achievement would reach approximately 63 percent by 2030. However, each indicator was modeled independently in a univariate framework, meaning the approach cannot discover cross-domain predictive relationships or leverage the shared information among hundreds of development variables. Rabbi [17] developed a more integrated approach that combines Random Forest for feature selection, LSTM for temporal forecasting, and SHAP for interpretability, and applied it to EUROSTAT sustainability indicators across seven EU countries from 2014 to 2023. Their study identified global value chain participation, social protection expenditure, and municipal recycling as key drivers of sustainability outcomes. Although methodologically rich, the study’s scope is limited to seven countries over ten years, and the composite index used is constructed with fixed weights (50 percent economic, 30 percent social, 20 percent environmental), which constrains its generalizability. 
A recent cross-sectional study [18] applied K-Means clustering to the 2025 SDG Index for 166 countries, grouping nations into five sustainability performance clusters and validating the groupings with Random Forest, SVM, and ANN classifiers. While useful for snapshot-based profiling, this work contains no temporal dimension and cannot address questions about trajectory. Meanwhile, deep learning and artificial intelligence have been applied successfully to sustainability-adjacent forecasting tasks, including ESG index prediction in emerging markets [19], energy demand and solar radiation forecasting [20,21], municipal waste volume prediction [22], and demand forecasting for eco-friendly vehicles [23], demonstrating that these methods are mature enough for complex time-series applications in sustainability contexts.

However, a clear gap remains. Few studies have jointly used high-dimensional development indicators to predict composite sustainability scores with deep learning, quantified deviations from predicted trajectories using calibrated uncertainty bounds, or identified which types of indicators carry the most predictive weight across domains.

This study addresses these gaps by shifting the analytical stance from diagnosis to prognosis. We ask whether a country’s past development can predict its current sustainability path, what it means when a country strays from that predicted path, and which development indicators carry the most predictive weight. We address four research questions: (1) How predictable are national SDGI and HDI trajectories when modeled with multivariate deep learning, and does predictability vary across income groups? (2) Which countries and regions deviate significantly from their predicted trajectories, and do deviations show a systematic discrepancy between the two targets? (3) Which development indicators are the strongest predictors, and do they differ for the SDGI and HDI? (4) Is governance quality associated with the predictability of sustainability trajectories? To answer these questions, we first evaluate model performance and predictability across income groups, then identify surprise countries and cross-target patterns, then extract predictive signals by domain, and finally examine the relationship between governance quality and forecast accuracy.

The Temporal Fusion Transformer (TFT) [24], the core forecasting model in this study, is an interpretable deep learning architecture for multi-horizon time-series forecasting that incorporates learned variable selection, temporal attention, and distributional output mechanisms. The source panel comprises 749 World Development Indicators across 184 countries and regions; the 200 most informative features are retained after variance-based screening, the model is developed on data from 2003 to 2017 (training and validation), and it is evaluated on a held-out 2018 to 2022 test period. Prediction intervals are constructed using conformal inference methods [25,26,27,28], which produce distribution-free uncertainty bounds from model residuals. Countries whose actual trajectories consistently fall outside these intervals are classified as “surprises” in this study, indicating cases where observed outcomes deviate meaningfully from predicted paths.

This study makes three contributions to the sustainability monitoring literature that distinguish it from prior work. First, it constructs a global predictability mapping across 184 countries and regions that quantifies, for the first time, the degree to which national sustainability trajectories follow historical patterns and identifies where those patterns break down. Second, it introduces a framework to detect surprising cases based on conformal prediction intervals and tests its robustness at multiple coverage levels (80, 90, and 95 percent), thereby providing a method for identifying countries and regions whose development paths diverge from expectations. Third, it identifies which development indicators best predict sustainability outcomes by comparing the drivers of progress on specific SDGI goals with broader human capability measures in the HDI.

2. Materials and Methods

Our research design consists of four stages (Figure 1). Data assembly and preprocessing (Figure 1a) produce a balanced panel of 200 development indicators across 184 countries and regions. Three forecasting models (Figure 1b) generate point predictions for the SDGI and HDI. Conformal prediction intervals and surprise detection (Figure 1c) classify each country as on-track or in surprise. Predictive signal extraction and governance analysis (Figure 1d) interpret the model’s learned structure.

2.1. Data

The predictor variables in this study come from the World Development Indicators (WDIs) database maintained by the World Bank, which covers 749 development indicators across 186 countries and regions for the period from 2003 to 2022. We use a preprocessed and gap-filled version of the WDIs prepared by Li et al. [29], who applied Probabilistic Principal Component Analysis to impute the remaining gaps after removing variables and countries with more than 70 percent missingness, and standardized all retained indicators as z-scores across countries and years. The resulting WDI panel contains no missing values. This preprocessing was performed independently of the present study and applies only to the predictor variables. The two prediction targets, the SDGI and HDI, are published by their respective organizations and were not generated or imputed by the same procedure. To reduce dimensionality while retaining the most informative features for forecasting, we ranked indicators by cross-country variance computed exclusively on training-period data and selected the top 200. Variance-based ranking prioritizes indicators that most strongly differentiate countries and regions, making it a useful screening criterion for cross-country forecasting. The threshold of 200 was chosen to balance dimensionality reduction against information loss, and a sensitivity analysis confirms that model rankings are preserved under alternative thresholds of 150 and 250 features (see the Supplementary Material, Table S7). The 200 selected features cover the four thematic domains defined by the World Bank’s WDI classification: economic, social, environmental, and institutional. The impact of full-period standardization on feature selection and model inputs is analyzed in the Supplementary Material (Tables S1–S3). 
Each retained indicator was tagged with a thematic domain label of economic, social, environmental, or institutional based on its WDI code prefix, enabling domain-level interpretation of variable importance results.
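The variance-based screening step can be illustrated with a minimal sketch. It assumes that variance is pooled over all training-period country-years, which is one plausible reading of the description above; the function name, the toy panel, and the indicator names are hypothetical and are not taken from the study.

```python
from statistics import pvariance


def top_variance_features(rows, feature_names, train_years, k):
    """Rank features by variance over training-period rows and keep the top k.

    rows: list of dicts with keys "country", "year", and one key per feature.
    Assumption: variance is pooled over all training country-years.
    """
    train_rows = [r for r in rows if r["year"] in train_years]
    variances = {
        name: pvariance([r[name] for r in train_rows]) for name in feature_names
    }
    ranked = sorted(feature_names, key=lambda name: variances[name], reverse=True)
    return ranked[:k]


# Toy panel: two countries, hypothetical z-scored indicators.
toy_rows = [
    {"country": "A", "year": 2003, "gdp": -1.0, "co2": 0.1, "school": 0.5},
    {"country": "A", "year": 2004, "gdp": -0.8, "co2": 0.1, "school": 0.4},
    {"country": "B", "year": 2003, "gdp": 1.2, "co2": 0.2, "school": -0.5},
    {"country": "B", "year": 2004, "gdp": 1.4, "co2": 0.2, "school": -0.6},
    # Test-period rows are present in the panel but never enter the ranking.
    {"country": "A", "year": 2018, "gdp": 9.0, "co2": 9.0, "school": 9.0},
    {"country": "B", "year": 2018, "gdp": -9.0, "co2": -9.0, "school": -9.0},
]
selected = top_variance_features(
    toy_rows, ["gdp", "co2", "school"], range(2003, 2016), k=2
)
```

Restricting the ranking to training years keeps the extreme test-period values in the toy panel from influencing which features are retained.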

The SDGI and HDI composite indices serve as prediction targets. The SDGI is a score on a 0 to 100 scale that aggregates performance across all 17 SDGs [3,4], with observed values in the study period ranging from approximately 36 to 87. SDGI data are available for 184 of the 186 countries and regions in the WDI panel, with complete annual coverage from 2003 to 2022 for all 184. Two regions, Hong Kong SAR and Macao SAR, were excluded because no SDGI data were available. We then filtered the HDI data to retain only the 184 countries and regions for which SDGI data were available. The HDI ranges from 0 to 1 and captures health, education, and income dimensions [5]. Both targets are used in their original measurement scales so that prediction intervals remain directly interpretable. The final panel is balanced across 184 countries and regions over a 20-year period, yielding 3680 observations with complete coverage of all retained features and both targets.

2.2. Temporal Split and Preprocessing

The data were partitioned into three non-overlapping time periods. The training set covers 2003 to 2015, a span of 13 years. The validation set covers 2016 and 2017, used for hyperparameter tuning and early stopping. The held-out test set covers 2018 to 2022, the period for which all results are reported. This partition balances a long enough training window for learning temporal patterns, a two-year validation period for model selection, and a five-year test horizon that covers both stable (2018–2019) and disrupted (2020–2022) conditions. This strict temporal holdout ensures that no future information enters model training. The test window encompasses a period of significant global disruption, including a pandemic, armed conflicts, and energy market volatility, which lends substantive meaning to the surprise analysis without framing the study around any single event.
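For concreteness, the partition can be written as a small lookup. This is only an illustrative sketch; the constant and function names are hypothetical.

```python
TRAIN_YEARS = range(2003, 2016)  # 2003 to 2015
VALID_YEARS = range(2016, 2018)  # 2016 and 2017
TEST_YEARS = range(2018, 2023)   # 2018 to 2022


def split_of(year):
    """Return the partition a calendar year belongs to."""
    if year in TRAIN_YEARS:
        return "train"
    if year in VALID_YEARS:
        return "validation"
    if year in TEST_YEARS:
        return "test"
    raise ValueError(f"year {year} is outside the 2003-2022 study period")


# With 184 countries and regions, the test window holds this many
# country-year observations per target.
n_test_observations = 184 * len(TEST_YEARS)
```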

2.3. Forecasting Models

Forecasting sustainability trajectories from high-dimensional development data requires a model that can handle heterogeneous panel structure, learn from hundreds of input features, and produce interpretable outputs that reveal which indicators drive predictions. The TFT meets all three requirements and has demonstrated state-of-the-art performance on multi-horizon forecasting benchmarks across energy demand, retail, and traffic domains [24]. It has been applied in environmental contexts, including air quality prediction [30]. A comprehensive survey of deep learning for time-series forecasting is provided by Lim and Zohren [31]. Three components of the architecture are directly relevant to this study. The Variable Selection Network applies learned softmax-normalized gates to each input feature, producing per-variable importance weights that indicate which indicators the model relies on most. The Interpretable Multi-Head Attention mechanism learns temporal weights over past time steps, revealing which historical periods the model considers most informative. The quantile regression output layer produces distributional forecasts at multiple quantile levels.

Rather than building separate models for each country, we handle cross-country heterogeneity by giving the model a learned identifier for each country. This design allows the model to share temporal dynamics across the full sample while capturing country-specific baseline patterns, an approach that has been shown to improve forecasting accuracy for panels of short individual time series [24,32]. World Bank income group and UN region are included as additional static covariates. The model was implemented using the PyTorch Forecasting library (version 1.6.1). Hidden size was set to 64, attention heads to 4, dropout to 0.1, learning rate to 0.001, and batch size to 64. These hyperparameters follow the recommendations in the original TFT paper for datasets of comparable size [24], and no additional hyperparameter search was performed. Training proceeded for a maximum of 100 epochs with early stopping at a patience of 10 epochs on validation loss. Separate models were trained for the SDGI and HDI targets. The encoder length was set to 13 years and the prediction length to 5 years. During the test period, the model generates all five annual predictions (2018–2022) in a single forward pass using only data observed up to 2017. No test-period target values or WDI covariate values are fed back. All WDI covariates are designated as time-varying unknown inputs, so the model receives no future covariate information during the prediction horizon and only the time index is known in advance.

Two baseline models are included to contextualize the TFT’s performance: a linear trend model, which serves as the simplest extrapolation benchmark, and XGBoost, which provides a strong tabular baseline using gradient-boosted trees; the TFT, by contrast, captures long-range temporal dependencies through multi-head attention. The first baseline is a per-country linear trend, which fits an ordinary least squares regression of the target variable on time using the training period and extrapolates into the test window. Linear extrapolation is a standard benchmark in sustainability forecasting studies [15,16], and because many development indicators evolve smoothly over time, it is a deliberately hard baseline to beat. The second is XGBoost [33], a gradient-boosted tree ensemble that has consistently ranked among the top tabular learning algorithms in benchmarking studies [34]. XGBoost uses the three most recent target values plus the 200 WDI features at time t-1 to predict the target at time t. During the test period, prediction proceeds auto-regressively, with predicted values from prior test years serving as lag inputs and actual WDI features from the preceding year used as covariates. XGBoost was configured with 100 estimators, maximum depth of 4, learning rate of 0.1, and L1 and L2 regularization parameters both set to 1.0.
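The per-country linear trend baseline amounts to a one-variable least squares fit followed by extrapolation. The sketch below shows the idea for a single country; the function names and the toy score series are hypothetical, and fitting on 2003 to 2015 is an assumption about what "training period" covers for this baseline.

```python
def fit_linear_trend(years, values):
    """Ordinary least squares fit of values on years; returns (slope, intercept)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def extrapolate(slope, intercept, years):
    """Evaluate the fitted line at each of the given years."""
    return [intercept + slope * year for year in years]


# Hypothetical country whose score rises by half a point per year.
train_years = list(range(2003, 2016))
toy_scores = [60.0 + 0.5 * (year - 2003) for year in train_years]

slope, intercept = fit_linear_trend(train_years, toy_scores)
forecast = extrapolate(slope, intercept, range(2018, 2023))
```

Because the fit uses only the country's own history, this baseline shares no information across countries, which is the main structural difference from the pooled TFT.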

2.4. Conformal Prediction Intervals

We use conformal prediction to convert point forecasts into prediction intervals, so that surprise detection can be conducted based on calibrated uncertainty. Identifying countries and regions whose sustainability trajectories deviate from expectations requires prediction intervals with known statistical properties. Quantile outputs generated by deep learning models can suffer from miscalibration, producing intervals that are either too narrow or too wide in practice [27]. Conformal inference [25,26] addresses this problem by providing distribution-free prediction intervals without requiring assumptions about the error distribution. Conformal prediction has been adopted increasingly in applied machine learning [27,28] and has been extended to time-series settings in energy forecasting, financial prediction, and environmental modeling [35,36].

The conformal interval is built in three steps. First, each forecasting model generates a point forecast for each country–year. Second, we compute the difference between actual and predicted values on the test set (920 country–year observations per target). Third, we set the interval as the point forecast plus or minus a threshold derived from these residuals at the desired coverage level (e.g., 80 percent), following the conformal method introduced by Vovk et al. [25] and Romano et al. [26]. To account for differences in forecast difficulty across countries, we apply a locally adaptive variant [37] in which each country’s residuals are scaled by its own median absolute deviation (MAD). This scaling produces narrower intervals for countries with stable trajectories and wider intervals for volatile ones. These intervals serve as retrospective uncertainty bands calibrated over the completed test period to identify countries whose trajectories deviate from model expectations. The prediction intervals are calibrated and evaluated on the same test set, so overall empirical coverage matches the nominal level by construction. This means that prediction interval width, not coverage, is the metric that differentiates models (Table 1). Empirical coverage by year and by income group is reported in the Supplementary Material (Tables S4 and S5). Coverage declines from 96 percent (SDGI) and 98 percent (HDI) in 2018 to 57 percent for both targets in 2022 and is relatively uniform across income groups.
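The three steps and the MAD scaling can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes that each country's scale is the median absolute deviation of its own residuals around their median, and that the threshold is the usual finite-sample conformal quantile of the scaled absolute residuals. All function names and toy values are hypothetical.

```python
import math
from statistics import median


def mad(values):
    """Median absolute deviation around the median."""
    center = median(values)
    return median([abs(v - center) for v in values])


def adaptive_intervals(actual, predicted, coverage=0.80, eps=1e-8):
    """Build per-country intervals of the form forecast +/- q * scale.

    actual, predicted: dicts mapping country -> list of values, one per year.
    """
    residuals = {
        c: [a - p for a, p in zip(actual[c], predicted[c])] for c in actual
    }
    # Assumed scale: each country's own residual MAD, floored to avoid zero.
    scale = {c: max(mad(r), eps) for c, r in residuals.items()}
    scores = sorted(abs(r) / scale[c] for c, rs in residuals.items() for r in rs)
    # Conformal quantile: the ceil((n + 1) * coverage)-th smallest score, capped at n.
    n = len(scores)
    rank = min(n, math.ceil((n + 1) * coverage))
    q = scores[rank - 1]
    return {
        c: [(p - q * scale[c], p + q * scale[c]) for p in predicted[c]]
        for c in predicted
    }


# Hypothetical two-country example: A is stable until a final-year jump, B is noisier.
toy_actual = {"A": [70.0, 70.5, 71.0, 71.5, 74.0], "B": [50.0, 50.0, 50.0, 50.0, 50.0]}
toy_predicted = {"A": [70.1, 70.4, 71.2, 71.4, 72.0], "B": [49.0, 51.0, 49.5, 50.5, 50.0]}
intervals = adaptive_intervals(toy_actual, toy_predicted)
```

In this toy case the stable country receives the narrower band, so its final-year jump falls outside the interval even though its earlier errors are small.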

2.5. Surprise Detection

A country is classified as exhibiting a sustainability surprise when its actual trajectory persistently falls outside the calibrated prediction interval (PI). Our main definition of a surprise requires the actual value to lie outside the adaptive 80 percent PI for at least two consecutive years during the 2018 to 2022 test period. The 80 percent level is chosen as a balance between detection power and false positive control, consistent with standard practice in conformal prediction [27,28]. The two-consecutive-year requirement filters out single-year anomalies that may reflect measurement noise or transient shocks rather than genuine trajectory shifts. Values above the upper bound are labeled positive surprises, and values below the lower bound are labeled negative surprises. The surprise classification is therefore a retrospective identification of countries whose trajectories deviated from model expectations during the completed test period.
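The classification rule can be expressed as a short run-length check. The sketch below assumes that the two consecutive years must fall on the same side of the interval, and it resolves the rare case of qualifying runs on both sides in favor of the longer run; neither detail is specified above, and the function name and labels are hypothetical.

```python
def classify_surprise(actual, intervals, min_run=2):
    """Label one country as 'positive', 'negative', or 'on_track'.

    actual: list of observed values, one per test year.
    intervals: list of (lower, upper) bounds for the same years.
    """
    run_above = run_below = 0
    longest_above = longest_below = 0
    for value, (lower, upper) in zip(actual, intervals):
        # Extend the current run on each side, or reset it to zero.
        run_above = run_above + 1 if value > upper else 0
        run_below = run_below + 1 if value < lower else 0
        longest_above = max(longest_above, run_above)
        longest_below = max(longest_below, run_below)
    # Assumed tie-break: the longer run wins, with ties going to 'positive'.
    if longest_above >= min_run and longest_above >= longest_below:
        return "positive"
    if longest_below >= min_run:
        return "negative"
    return "on_track"


toy_band = [(0.0, 1.0)] * 5
label_up = classify_surprise([0.5, 1.2, 1.3, 0.5, 0.5], toy_band)
label_down = classify_surprise([-1.0, -1.0, 0.5, 0.5, 0.5], toy_band)
label_isolated = classify_surprise([0.5, -1.0, 0.5, -1.0, 0.5], toy_band)
```

The last toy series leaves the interval twice, but never in consecutive years, so it stays on track under this rule.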

Two robustness variants are evaluated to ensure that findings are not sensitive to the choice of coverage level. These use the 90 percent and 95 percent PIs with the same two-consecutive-year requirement.

A cross-target typology is constructed by intersecting the SDGI and HDI surprise classifications, producing four categories. These are countries and regions on track for both targets, those with SDGI-only surprise, those with HDI-only surprise, and those with both targets in surprise. To examine whether surprises concentrate in the later portion of the test window, we compare the rate of country–year observations falling outside the prediction interval between 2018–2019 and 2020–2022.
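Both operations described here are simple to state in code. The following sketch uses hypothetical label strings, function names, and toy flags; it illustrates the logic only.

```python
def cross_target_type(sdgi_label, hdi_label):
    """Combine two per-target labels into one of four typology categories."""
    sdgi_surprise = sdgi_label != "on_track"
    hdi_surprise = hdi_label != "on_track"
    if sdgi_surprise and hdi_surprise:
        return "double_surprise"
    if sdgi_surprise:
        return "sdgi_only"
    if hdi_surprise:
        return "hdi_only"
    return "on_track_both"


def outside_rate(outside_flags, years):
    """Share of country-year observations outside the interval in the given years.

    outside_flags: dict mapping year -> list of booleans, one per country.
    """
    flags = [flag for year in years for flag in outside_flags[year]]
    return sum(flags) / len(flags)


# Hypothetical flags for four countries over the five test years.
toy_flags = {
    2018: [False, False, False, False],
    2019: [False, False, True, False],
    2020: [True, False, True, False],
    2021: [True, False, True, True],
    2022: [True, True, False, True],
}
early_rate = outside_rate(toy_flags, [2018, 2019])       # 1 of 8 observations
late_rate = outside_rate(toy_flags, [2020, 2021, 2022])  # 8 of 12 observations
```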

2.6. Predictive Signal Extraction

The Variable Selection Network within the TFT produces learned, softmax-normalized weights for each input feature, indicating its contribution to the forecast [24]. These weights are intrinsic to the model and are optimized jointly with the forecasting objective, as opposed to post hoc attribution methods such as SHAP [38]. We extract the global encoder variable importance, averaged across samples, for each target and map each feature to its thematic domain to compute domain-level cumulative predictive weight.
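The aggregation from per-sample weights to domain-level predictive weight can be sketched as an average followed by a grouped sum. The feature names, domain tags, and weights below are hypothetical placeholders, since the mapping from WDI code prefixes to domains is not reproduced here.

```python
from collections import defaultdict


def mean_importance(per_sample_weights):
    """Average softmax-normalized weights across samples, feature by feature."""
    n_samples = len(per_sample_weights)
    names = per_sample_weights[0].keys()
    return {
        name: sum(w[name] for w in per_sample_weights) / n_samples for name in names
    }


def domain_totals(importance, feature_domain):
    """Sum feature-level importance within each thematic domain."""
    totals = defaultdict(float)
    for name, weight in importance.items():
        totals[feature_domain[name]] += weight
    return dict(totals)


# Hypothetical features, domain tags, and per-sample weights (each row sums to 1).
toy_domain = {
    "feat_econ_1": "economic",
    "feat_econ_2": "economic",
    "feat_soc_1": "social",
    "feat_env_1": "environmental",
}
toy_weights = [
    {"feat_econ_1": 0.4, "feat_econ_2": 0.2, "feat_soc_1": 0.3, "feat_env_1": 0.1},
    {"feat_econ_1": 0.2, "feat_econ_2": 0.2, "feat_soc_1": 0.3, "feat_env_1": 0.3},
]
by_domain = domain_totals(mean_importance(toy_weights), toy_domain)
```

Because the per-sample weights are softmax-normalized, the domain totals also sum to one and can be read as shares of total predictive weight.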

These weights reflect each predictor’s relevance within the model, not causal relationships, and we make no claim that changing any identified feature would alter sustainability trajectories [39].

2.7. Governance and Predictability Analysis

If sustainability trajectories are more predictable in some countries and regions than others, a natural question is whether institutional quality helps explain this variation. We examine this by regressing country-level prediction error on governance quality. The Worldwide Governance Indicators (WGIs) published by the World Bank provide six dimensions of governance, namely Voice and Accountability, Political Stability, Government Effectiveness, Regulatory Quality, Rule of Law, and Control of Corruption [40]. Following standard practice in cross-country empirical research [40], we compute a composite governance score as the arithmetic mean of the six dimensions. The composite is measured using 2015 values to avoid temporal overlap with the test period. An ordinary least squares regression is estimated with country-level mean absolute error as the dependent variable and the WGI composite as the independent variable, with income group and region as control variables and robust standard errors. This analysis is correlational and not intended to identify a causal governance effect.
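Written out, the specification described above takes roughly the following form. The notation is ours and is offered only as a reading aid: c indexes countries, g_{c,k} denotes the 2015 value of the k-th WGI dimension, and the income-group and region controls are assumed to enter as sets of indicator variables.

```latex
\mathrm{WGI}_c = \frac{1}{6} \sum_{k=1}^{6} g_{c,k}^{2015},
\qquad
\mathrm{MAE}_c = \alpha + \beta \, \mathrm{WGI}_c
  + \boldsymbol{\gamma}^{\top} \mathbf{Income}_c
  + \boldsymbol{\delta}^{\top} \mathbf{Region}_c
  + \varepsilon_c
```

Under this reading, a negative estimate of β corresponds to lower prediction error in better-governed countries.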

3. Results

3.1. Model Performance and Interval Calibration

The forecasting performance of all three models on the held-out test set is reported in Table 1, using the mean absolute error (MAE) and root mean squared error (RMSE), standard accuracy metrics in time-series forecasting [31] that allow direct comparison with prior SDG projection studies [15,16]. The TFT achieves the lowest error on both targets, with an MAE of 1.10 for the SDGI and 0.008 for the HDI. These represent improvements of 19 percent and 60 percent over the linear trend baseline, respectively. The linear model in turn outperforms XGBoost on both targets. Sustainability trajectories exhibit strong path dependence, and the dominant signal is a smooth trend-like change that a simple linear extrapolation already captures effectively. XGBoost, although a strong tabular learner, appears to amplify noise through its auto-regressive lag formulation when applied to these highly inertial series. The TFT improves upon the linear model by capturing modest nonlinear dynamics, but the improvement for SDGI is limited, suggesting that SDGI trajectories are close to linear over the training horizon.

The conformal prediction intervals are calibrated at the 80 percent level for all three models. The models differ in PI width, which reflects the precision of their point predictions. The TFT produces the narrowest 80 percent intervals (4.01 SDGI points, 0.035 HDI), followed by the linear model (5.07, 0.072), with XGBoost producing the widest (9.78, 0.184). At the same nominal coverage level, a narrower interval represents a more informative uncertainty estimate. Surprise detection in subsequent sections uses the TFT intervals at the 80 percent level.

3.2. Global Predictability Landscape

The distribution of country-level prediction errors by income group is shown in Figure 2. For the SDGI, the median country-level MAE is 0.88, with a mean of 1.10 and a standard deviation of 0.83. The highest individual error is observed for Seychelles at 5.41 SDGI points. For the HDI, the median MAE is 0.006, with a mean of 0.008 and a standard deviation of 0.007. Prediction errors are modestly higher for low-income countries, consistent with the expectation that institutional volatility and external shocks reduce trajectory predictability in less stable settings. The differences across income groups are, however, smaller than might be anticipated, reflecting the strong global inertia in development trends even among the poorest countries. This suggests that the framework is applicable across countries with diverse institutional settings, not only those with strong institutions.

When scaled to the observed range of each index, HDI errors are proportionally smaller (MAE/range is about 1.2 percent for HDI versus 2.2 percent for SDGI), and the HDI error distribution is more tightly concentrated. This is consistent with what the two indices capture. The HDI reflects slow-moving capability dimensions, such as life expectancy and educational attainment, which resist year-to-year fluctuations. The SDGI includes policy-sensitive indicators such as renewable energy share and institutional quality scores that can shift more abruptly in response to reforms or crises.
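As a rough check on the SDGI figure, the normalization can be reproduced from the approximate observed range of 36 to 87 reported in Section 2.1. The observed HDI range is not stated in the text, so the corresponding HDI calculation is not reproduced here.

```latex
\frac{\mathrm{MAE}_{\mathrm{SDGI}}}{\max(\mathrm{SDGI}) - \min(\mathrm{SDGI})}
  \approx \frac{1.10}{87 - 36}
  = \frac{1.10}{51}
  \approx 0.022
```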

3.3. Sustainability Surprises and Cross-Target Typology

3.3.1. Surprise Counts and Asymmetry

The surprise detection results under the main definition (80 percent PI, two consecutive years) and under stricter coverage levels (90 and 95 percent) are presented in Table 2. Under the main definition, 35 countries and regions exhibit positive SDGI surprises and nine exhibit negative surprises, with 140 on track. For the HDI, the pattern is inverted, with 11 positive surprises and 23 negative surprises, leaving 150 on track. The asymmetry between the two targets is one of the central findings of this study. The SDGI surprises skew positive, reflecting sustainability acceleration among developing nations such as Benin (+4.59), Togo (+3.23), and Rwanda (+1.88). The HDI surprises skew negative, reflecting disruptions to health and income dimensions in countries such as Venezuela (−0.040), Libya (−0.023), and Iran (−0.013) during the 2020 to 2022 period. When the two targets are combined in the cross-target typology, 115 countries and regions (62 percent) are on track for both, 35 (19 percent) show SDGI-only surprises, 25 (14 percent) show HDI-only surprises, and nine (5 percent) are double surprises. To assess whether the observed surprise counts exceed what would be expected by chance, we estimated the null expectation under an independence model in which each country–year has a 10 percent probability of falling above the upper bound and 10 percent below the lower bound. Under this model, approximately 6.8 countries per direction would be flagged as surprises by chance alone (see the Supplementary Material, Table S6 and Figure S1). The 35 SDGI-positive surprises (p < 0.0001) and 23 HDI-negative surprises (p < 0.0001) far exceed this expectation. The 11 HDI-positive surprises are slightly above the null expectation (p = 0.08). The nine SDGI-negative surprises fall within the null range (p = 0.24), suggesting that some of these deviations may reflect random variation rather than genuine trajectory shifts. 
Widening the interval to 90 or 95 percent coverage reduces surprise counts progressively, while the core set of surprise countries remains consistent, with all countries classified as surprises at the 95 percent level also appearing at the 80 percent level (Table 2).
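The chance expectation quoted above can be reproduced with a short calculation. The sketch assumes that years are independent within a country, that a surprise in a given direction requires two consecutive years on that side, and that countries are independent of one another when computing tail probabilities. The function names are hypothetical, and the binomial tail is one simple way to obtain p-values, not necessarily the procedure used in the Supplementary Material.

```python
import math


def prob_run_of_two(p, n_years):
    """Probability of at least one run of two consecutive hits in n_years trials."""
    q = 1.0 - p
    # no_run[n] = q * no_run[n-1] + p * q * no_run[n-2], with no_run[0] = no_run[1] = 1
    no_run_prev2, no_run_prev1 = 1.0, 1.0
    for _ in range(2, n_years + 1):
        no_run_prev2, no_run_prev1 = (
            no_run_prev1,
            q * no_run_prev1 + p * q * no_run_prev2,
        )
    return 1.0 - no_run_prev1


def binomial_tail(n, p, k_min):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(k_min, n + 1)
    )


p_flag = prob_run_of_two(0.10, 5)    # chance that one country is flagged in one direction
expected_by_chance = 184 * p_flag    # roughly 6.8 countries per direction

p_nine_or_more = binomial_tail(184, p_flag, 9)
p_eleven_or_more = binomial_tail(184, p_flag, 11)
```

The recursion conditions on the final year: either it is not a hit, or it is a hit preceded by a non-hit, and in both cases the earlier years must themselves contain no run.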

The spatial distribution of the four-way typology is shown in Figure 3. Positive SDGI surprises cluster in Sub-Saharan Africa and parts of South and Southeast Asia, while negative HDI surprises are more geographically dispersed, spanning the Middle East, Eastern Europe, and Latin America.

3.3.2. Surprise Countries

The top positive and negative surprises for each prediction target are listed in Table 3. Complete lists of all surprise countries with mean residuals are provided in the Supplementary Material (Tables S8 and S9). Among positive SDGI surprises, developing nations dominate. Benin, Togo, Rwanda, China, and Ethiopia all exceeded their historical trajectory predictions by substantial margins, suggesting sustained improvements in sustainability that began in several cases before 2020. Negative SDGI surprises concentrate in states experiencing political collapse or armed conflict, including Venezuela, Yemen, and Syria. For the HDI, negative surprises extend beyond fragile states to include middle-income nations such as Iran, North Macedonia, Argentina, and Mexico, whose human development was disrupted by a combination of health system strain, economic contraction, and regional instability.

The temporal distribution of surprises is uneven across the test window. The average annual rate of observations falling outside the prediction interval rises from 0.06 per country in 2018–2019 to 0.29 in 2020–2022. This confirms that the later part of the test window, which includes multiple overlapping global disruptions, showed more trajectory deviations. However, not all surprises are attributable to post-2020 events. Countries such as Benin and Togo show sustained positive SDGI deviations that begin before 2020, pointing to structural acceleration rather than crisis-driven fluctuation.
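
The surprise rule and the out-of-interval rate discussed above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation. It assumes that a surprise requires at least two consecutive test years beyond the same interval bound and that the first qualifying run determines the direction. The function names and the toy series are hypothetical.

```python
from typing import Sequence


def classify_surprise(actual: Sequence[float], lower: Sequence[float],
                      upper: Sequence[float], min_run: int = 2) -> str:
    """Return 'positive', 'negative', or 'on_track' for one country and target."""
    run_dir, run_len = 0, 0
    for y, lo, hi in zip(actual, lower, upper):
        # +1 above the upper bound, -1 below the lower bound, 0 inside the interval
        direction = 1 if y > hi else -1 if y < lo else 0
        if direction != 0 and direction == run_dir:
            run_len += 1
        else:
            run_len = 1 if direction != 0 else 0
        run_dir = direction
        if run_len >= min_run:
            return "positive" if direction > 0 else "negative"
    return "on_track"


def outside_rate(actual: Sequence[float], lower: Sequence[float],
                 upper: Sequence[float]) -> float:
    """Share of observations falling outside their prediction interval."""
    flags = [not (lo <= y <= hi) for y, lo, hi in zip(actual, lower, upper)]
    return sum(flags) / len(flags)


# Hypothetical five-year series (2018-2022), not taken from the paper.
lower = [60.0, 60.5, 61.0, 61.5, 62.0]
upper = [64.0, 64.5, 65.0, 65.5, 66.0]
sustained = [62.0, 63.0, 65.4, 66.1, 66.8]   # above the band in the last three years
scattered = [62.0, 65.0, 63.0, 66.0, 64.0]   # above the band in two isolated years

print(classify_surprise(sustained, lower, upper))  # positive
print(classify_surprise(scattered, lower, upper))  # on_track
print(outside_rate(scattered, lower, upper))       # 0.4
```

The second series shows why the consecutive-year requirement matters: isolated exceedances raise the out-of-interval rate without triggering a surprise classification.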

Selected trajectory plots for the top surprise and on-track countries are shown in Figure 4.

3.3.3. Double Surprises and Goal–Capability Decoupling

Among the nine double-surprise countries, a dominant pattern is visible. Six of the nine (Bulgaria, Cameroon, Cabo Verde, Iran, Ukraine, and the United States) exhibit a positive SDGI surprise combined with a negative HDI surprise. This combination suggests a decoupling between goal-level and capability-level sustainability, in which policy-driven SDG indicators, including institutional reform metrics and renewable energy targets, can advance even as foundational human capabilities in health and income are disrupted.

Ukraine provides a clear example. Its negative HDI deviation coincides with the armed conflict beginning in 2022 [41] and the associated disruption to health infrastructure and real incomes. At the same time, wartime institutional reforms and accelerated alignment with European governance standards may have contributed to the positive SDGI deviation. The United States shows a different pattern with a similar outcome. The negative HDI deviation coincides with a sharp decline in life expectancy during 2020 and 2021 [42], a period marked by the COVID-19 pandemic and the ongoing opioid crisis, while clean energy legislation during the same period may help explain the modest positive SDGI deviation.

Portugal is the only country with positive surprises on both targets. This outcome coincides with a period of fiscal consolidation, rapid renewable energy deployment, and European Union recovery fund investment [43]. Brunei is the sole double negative, with its deviations occurring during a period of oil price volatility and limited economic diversification. Mali shows the reverse of the dominant pattern, with a negative SDGI surprise and a positive HDI surprise. This may reflect the impact of political instability and armed conflict on governance and institutional indicators captured by the SDGI, while basic health and education outcomes continued to improve with support from international development assistance.

3.4. Early Predictive Signals

The top 20 features by variable selection weight for each target, together with the domain-level share of cumulative predictive weight (inset pie charts), are presented in Figure 5.

The predictive pattern for the HDI is heavily concentrated in the economic domain. Seven of the top ten features are economic indicators, including fuel exports as a share of merchandise trade, the poverty gap index, petroleum rents as a share of GDP, private consumption growth, and lending interest rates. Social indicators such as adult male mortality probability and national unemployment enter only from the eighth rank onward. This concentration is consistent with the HDI’s structural composition, in which income enters directly, and both health and education in developing countries are largely dependent on economic resources.

The profile for the SDGI is more diverse. The top ten include social indicators such as the female lower secondary out-of-school rate, access to basic drinking water services, and lower secondary completion rates. It also includes economic indicators such as the import unit value index and foreign direct investment inflows, as well as institutional indicators related to the control of corruption. No single domain monopolizes the predictive signal.

At the domain level (Figure 5, inset pie charts), economic indicators account for a much larger share of the total predictive weight for the HDI than for the SDGI. For the SDGI, the four domains contribute relatively more evenly. This means that monitoring systems built around economic indicators alone may track the HDI well but will miss important signals for broader sustainability change. Effective sustainability monitoring requires indicators from multiple domains that reflect the multi-dimensional nature of sustainability [44,45,46].

The model’s temporal attention weights differ between the two targets (see the Supplementary Material, Figure S2). For the HDI, attention increases steadily toward the most recent encoder years, indicating that the recent trajectory is most informative for prediction. For the SDGI, the pattern is U-shaped, with the highest attention at the earliest encoder year and a secondary rise toward the most recent years, suggesting that the model draws on both long-term baseline levels and recent dynamics when forecasting the SDGI.

3.5. Governance and Predictability Nexus

The relationship between governance quality and prediction error is shown in Figure 6 and Table 4. The relationship is negative and statistically significant for both targets. For the SDGI, the estimated coefficient is −0.196 (standard error 0.067, p = 0.004). For the HDI, the coefficient is −0.002 (standard error 0.001, p < 0.001). Countries and regions with stronger governance exhibit more predictable sustainability trajectories.
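
The figures in Table 4 can be checked for internal consistency, because in a simple regression the slope's t-statistic can be recovered from the Pearson correlation and the sample size. The sketch below assumes a single-predictor ordinary least squares fit, which the text does not state explicitly. It uses a normal approximation for the p-value to stay within the standard library, which slightly understates the Student's t value. The function names are illustrative.

```python
from math import erfc, sqrt

N = 184  # countries and regions (Table 4)


def t_from_r(r: float, n: int) -> float:
    """t-statistic of a simple-regression slope, recovered from Pearson r."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


def two_sided_p_normal(t: float) -> float:
    """Two-sided p-value using a normal approximation to Student's t."""
    return erfc(abs(t) / sqrt(2))


# Coefficient, standard error, and Pearson r as reported in Table 4.
table4 = {
    "SDGI": {"coef": -0.196, "se": 0.067, "r": -0.212},
    "HDI": {"coef": -0.002, "se": 0.001, "r": -0.282},
}

for target, row in table4.items():
    t_r = t_from_r(row["r"], N)
    t_ratio = row["coef"] / row["se"]
    p_approx = two_sided_p_normal(t_r)
    print(f"{target}: t from r = {t_r:.2f}, coef/SE = {t_ratio:.2f}, p ~ {p_approx:.4f}")
```

For the SDGI the two t-values agree closely (about −2.93). For the HDI the ratio of the rounded coefficient and standard error is only −2, while the correlation implies a t-value near −3.97. A gap of this kind is expected when very small coefficients are reported to three decimal places.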

This relationship is correlational rather than causal, and we cannot rule out confounding from factors such as conflict exposure or resource dependence that co-vary with governance quality. Nor does it imply that well-governed countries are static. They develop and change, but they do so through systematic, policy-driven processes that produce regular trajectories amenable to forecasting from historical trends. Countries with weaker governance experience more stochastic trajectories shaped by conflict, resource price shocks, and institutional instability, which are more difficult for any model to anticipate. The governance association is somewhat stronger for the HDI (Pearson r = −0.282 versus −0.212 for the SDGI; Table 4), possibly because the health and income components of the HDI are more directly dependent on the quality of public service delivery.

4. Discussion

4.1. Path Dependence in Sustainability Trajectories

The finding that 62 percent of countries and regions remain within calibrated prediction intervals on both the SDGI and HDI prediction targets, and that a simple linear trend model outperforms a sophisticated tabular machine learning baseline (XGBoost), points to a fundamental characteristic of sustainability dynamics. National development trajectories show strong persistence, consistent with path-dependent dynamics [47] and with theories of sustainability transitions that emphasize institutional lock-in and incremental change as key mechanisms shaping long-term development trajectories [48,49]. They evolve through gradual accumulation of policy decisions, institutional capacity, and human capital investment, and they resist abrupt reorientation. For countries on positive trajectories, continuity of policy frameworks is likely sufficient to sustain progress. For countries locked into stagnating or declining trajectories, incremental adjustments may not be enough to alter course, and transformative interventions may be required.

The TFT outperforms the linear baseline, particularly for the HDI, where it achieves a 60 percent reduction in error. This suggests that nonlinear patterns exist in sustainability time series and that deep learning can capture them. However, the modest improvement for the SDGI tempers any claim that sophisticated models are necessary for all sustainability forecasting tasks. The choice of forecasting model should be guided by the complexity of the prediction target and the analytical purpose. When the goal is simply to project forward, a linear model may suffice. When the goal is to identify deviations from expectations and to understand which features drive predictions, the interpretable architecture of the TFT provides value that simpler models cannot. The framework identifies when trajectories deviate from historical patterns, but it is not designed to predict the onset of specific crises or shocks. When such events occur, they surface as surprises in the detection step rather than being anticipated in advance.
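
The error reductions quoted here follow from the MAE values in Table 1. The short calculation below makes the arithmetic explicit; the dictionary layout and function name are illustrative.

```python
# MAE values from Table 1 (test set, 2018 to 2022).
mae = {
    "SDGI": {"TFT": 1.10, "Linear": 1.35, "XGBoost": 2.32},
    "HDI": {"TFT": 0.008, "Linear": 0.020, "XGBoost": 0.036},
}


def relative_reduction(model_error: float, baseline_error: float) -> float:
    """Fractional reduction in error relative to a baseline."""
    return (baseline_error - model_error) / baseline_error


for target, scores in mae.items():
    for baseline in ("Linear", "XGBoost"):
        cut = relative_reduction(scores["TFT"], scores[baseline])
        print(f"{target} vs {baseline}: {100 * cut:.1f} percent lower MAE")
```

The smallest reduction for each target is the one against the linear baseline: roughly 18.5 percent for the SDGI (reported as 19 percent) and 60 percent for the HDI. The reductions against XGBoost are larger, at about 53 and 78 percent.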

4.2. Goal and Capability Decoupling

The asymmetry in surprise patterns across the two prediction targets is worth examining in detail. SDGI surprises are predominantly positive, with 35 countries and regions outperforming their historical trends. These are concentrated in Sub-Saharan Africa and South Asia, with Benin (+4.59), Togo (+3.23), Rwanda (+1.88), and Ethiopia (+1.31) among the largest positive deviations, suggesting that recent governance reforms, infrastructure investment, and international development partnerships in these countries may have accelerated SDG progress beyond what historical trends alone would predict. The spatial distribution of surprise types is shown in Figure 3, where positive SDGI surprises cluster in West and East Africa while negative HDI surprises are more dispersed across the Middle East, Eastern Europe, and Latin America. HDI surprises are mostly negative, with 23 countries and regions underperforming. These include not only fragile states such as Venezuela (−0.040) and Libya (−0.023) but also middle-income nations such as Iran (−0.013), Mexico, and Argentina, whose health and income dimensions appear to have been disrupted by overlapping global and regional crises during the test period.

This asymmetry points to a decoupling between goal-level and capability-level sustainability. The SDG framework’s goal-level indicators, which include policy-responsive metrics such as institutional quality scores and environmental protection targets, can continue advancing even when the underlying human capabilities that sustain long-term development, including life expectancy, educational attainment, and real income, stagnate or decline. The six countries and regions exhibiting simultaneous SDGI-positive and HDI-negative surprises illustrate this pattern, with Ukraine, Iran, and the United States each showing the combination through different contextual pathways, as discussed in Section 3.3.3. An alternative interpretation is that the apparent decoupling partly reflects differences in index construction. The SDGI includes policy-responsive indicators that can change rapidly, such as renewable energy share and protected area coverage, while HDI components such as life expectancy respond more slowly to shocks. The sensitivity of energy and environmental indicators to policy shifts has been documented in other contexts as well [50]. The observed divergence may therefore reflect both genuine decoupling and differences in measurement sensitivity across the two indices.

The policy implication is that sustainability recovery assessments should not rely solely on SDG dashboards. A country whose SDGI score rebounds while its life expectancy remains depressed should not be interpreted as having fully recovered. It has experienced a decoupling between its goals and its capabilities. Monitoring frameworks should track both, as this study does, to prevent goal-level progress from being mistaken for comprehensive sustainability improvement.

4.3. Predictive Signals and Monitoring Design

The finding that the HDI and SDGI have structurally different predictive patterns has practical implications for how countries design their monitoring systems. The HDI is predicted primarily by economic indicators such as fuel exports, the poverty gap, and petroleum rents. This is consistent with the HDI’s structural composition, in which income enters directly, and both health and education in developing countries depend on fiscal resources. The SDGI, which spans environmental protection, institutional quality, social inclusion, and economic development simultaneously, requires a broader set of signals. For instance, control of corruption, female lower secondary out-of-school rates, and access to basic drinking water services all appear among the top SDGI predictors, but none rank highly for the HDI. A monitoring system built solely on economic variables would maintain reasonable early warning capacity for HDI shifts but would be largely blind to changes in the broader sustainability trajectory.

This finding adds a temporal and predictive dimension to the existing literature on SDG interactions and interdependencies [6,7,8,10,14,51]. Where previous work has documented static correlations between SDG dimensions, our results are consistent with cross-domain information contributing to the prediction of SDGI trajectories. Social and institutional indicators carry predictive relevance for SDGI outcomes, meaning that changes in education access or governance quality today can help predict future sustainability trajectories. These variable importance rankings identify predictive associations, not causal drivers. Policy interpretation of these rankings should be guided by domain expertise and supplemented by causal inference methods.

4.4. Governance and Forecastability

Better governance is associated with more predictable sustainability trajectories. The relationship is statistically significant and holds for both prediction targets. Countries and regions with stronger governance, such as those in Northern and Western Europe, tend to develop through systematic, policy-driven processes that leave regular traces in historical data. Countries and regions with weaker institutions, such as Venezuela, Yemen, and several conflict-affected states in the negative surprise group, experience more volatile trajectories shaped by conflict, corruption, and exogenous shocks. This relationship, while intuitive, has received little empirical attention in the sustainability forecasting literature. Forecastability may serve as a complementary governance-relevant signal. Countries and regions whose sustainability futures cannot be anticipated from their developmental histories, such as those with the largest prediction errors in Figure 6, may be exhibiting a form of institutional irregularity that warrants attention from both domestic policymakers and international development partners.

4.5. Limitations

This study has several limitations. First, the prediction intervals assume that residuals are exchangeable across the test period, an assumption that may be strained when the later test years contain structural disruptions not present in the earlier years. The decline in empirical coverage from 96–98 percent in 2018 to 57 percent in 2022 (see the Supplementary Material, Table S4) reflects this limitation. Surprise classifications should therefore be interpreted as relative indicators of trajectory deviation rather than exact probabilistic coverage levels. Second, the 20-year time series is short by deep learning standards. The pooled global model with country embeddings mitigates this constraint but does not eliminate it. Third, variable selection weights reflect predictive relevance rather than causal influence. Identifying the causal drivers of sustainability transitions would require quasi-experimental or instrumental variable approaches [39]. Fourth, only two composite targets, the SDGI and HDI, are used. Individual SDG goal-level forecasting remains an important direction for future work. Fifth, while the surprise detection framework is tested across different coverage levels (90 and 95 percent), the typology remains descriptive and does not model the mechanisms underlying trajectory deviations.
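
The year-by-year coverage diagnostic behind the first limitation is simple to compute: for each test year, count the share of observations that fall inside their prediction interval and compare it with the nominal level. The sketch below is a generic illustration with hypothetical toy records, not the data or code behind Table S4, and the function name is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[int, float, float, float]  # (year, actual, lower bound, upper bound)


def coverage_by_year(records: Iterable[Record]) -> Dict[int, float]:
    """Share of observations inside their prediction interval, per year."""
    hits: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for year, actual, lower, upper in records:
        totals[year] += 1
        hits[year] += int(lower <= actual <= upper)
    return {year: hits[year] / totals[year] for year in sorted(totals)}


# Hypothetical toy records for two years and four countries each.
toy = [
    (2018, 70.1, 68.0, 72.0), (2018, 55.3, 53.0, 57.0),
    (2018, 61.0, 59.5, 63.5), (2018, 80.2, 78.0, 82.0),
    (2022, 73.5, 69.0, 73.0), (2022, 52.0, 53.5, 57.5),
    (2022, 62.0, 60.0, 64.0), (2022, 81.0, 79.0, 83.0),
]

NOMINAL = 0.80
for year, cov in coverage_by_year(toy).items():
    status = "below nominal" if cov < NOMINAL else "at or above nominal"
    print(year, f"{cov:.2f}", status)
```

A pattern in which coverage starts above the nominal level and falls well below it in later years, as reported for 2018 and 2022, is the kind of drift that signals strained exchangeability.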

5. Conclusions

National sustainability trajectories are largely predictable from historical development data, with 62 percent of 184 countries and regions remaining within calibrated prediction intervals for both the SDGI and HDI over a five-year forecast horizon. Among the remainder, the SDGI surprises skew positive while the HDI surprises skew negative, pointing to a gap between goal-level progress and capability-level resilience.

Unlike previous SDG forecasting studies that rely on univariate projections or regional aggregations, this framework jointly models 200 development indicators at the country level with interpretable deep learning and calibrated uncertainty quantification. More specifically, this study offers three major contributions. First, the trajectory intelligence framework shifts sustainability monitoring from static diagnosis to forecasting with calibrated uncertainty. Second, the asymmetric surprise structure documents the differential vulnerability of sustainability dimensions to external shocks, with direct implications for recovery assessment. Third, the structural difference in predictive patterns between the SDGI and HDI informs the design of cross-domain monitoring systems that track both goal-level and capability-level dimensions.

Future work should extend this framework to individual SDG goal-level forecasting, incorporate remote sensing inputs such as nighttime light imagery as complementary features, expand the training period as more years of WDI data become available, and apply causal inference methods to identify the drivers of sustainability trajectory deviations rather than merely predicting their occurrence.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/su18115530/s1, Table S1: Training-period statistics of the published z-scores across 749 WDI features; Table S2: Model input comparison between full-period and training-only standardization; Table S3: Feature selection stability under counterfactual standardization; Table S4: Empirical coverage of adaptive conformal prediction intervals by year; Table S5: Empirical coverage by HDI quartile; Table S6: Null expectation simulation results; Table S7: Feature count sensitivity analysis; Table S8: Complete list of SDGI surprise countries; Table S9: Complete list of HDI surprise countries; Figure S1: Null distributions of surprise counts by direction; Figure S2: Temporal attention weights for SDGI and HDI.

Author Contributions

Conceptualization, H.L. and F.T.; methodology, H.L.; validation, H.L. and F.T.; formal analysis, H.L.; investigation, H.L.; data curation, H.L.; writing—original draft preparation, H.L.; writing—review and editing, H.L. and F.T.; visualization, H.L.; supervision, H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw World Development Indicators are publicly available from the World Bank at https://databank.worldbank.org (accessed on 10 February 2026). The preprocessed and gap-filled version of the WDIs used in this study was prepared by Li et al. (2025) [29] and is available at https://doi.org/10.5281/zenodo.14876723. SDGI data are available from the Sustainable Development Report at https://dashboards.sdgindex.org (accessed on 10 February 2026). HDI data are available from the United Nations Development Programme at https://hdr.undp.org (accessed on 10 February 2026).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. United Nations. Transforming Our World: The 2030 Agenda for Sustainable Development; United Nations: New York, NY, USA, 2015.
2. United Nations. The Sustainable Development Goals Report 2025; Department of Economic and Social Affairs: New York, NY, USA, 2025.
3. Sachs, J.; Lafortune, G.; Kroll, C.; Fuller, G.; Woelm, F. From Crisis to Sustainable Development: The SDGs as Roadmap to 2030 and Beyond; Sustainable Development Report 2022; Cambridge University Press: Cambridge, UK, 2022.
4. Sachs, J.D.; Lafortune, G.; Fuller, G.; Iablonovski, G. Financing Sustainable Development to 2030 and Mid-Century; Sustainable Development Report 2025; SDSN: Paris, France; Dublin University Press: Dublin, Ireland, 2025.
5. Conceição, P. Human Development Report 2023/24: Breaking the Gridlock: Reimagining Cooperation in a Polarized World; UNDP: New York, NY, USA, 2024.
6. Pradhan, P.; Costa, L.; Rybski, D.; Lucht, W.; Kropp, J.P. A systematic study of sustainable development goal (SDG) interactions. Earth’s Future 2017, 5, 1169–1179.
7. Warchold, A.; Pradhan, P.; Kropp, J.P. Variations in sustainable development goal interactions: Population, regional, and income disaggregation. Sustain. Dev. 2021, 29, 285–299.
8. Mainali, B.; Luukkanen, J.; Silveira, S.; Kaivo-oja, J. Evaluating synergies and trade-offs among Sustainable Development Goals (SDGs): Explorative analyses of development paths in South Asia and Sub-Saharan Africa. Sustainability 2018, 10, 815.
9. de Miguel Ramos, C.; Laurenti, R. Synergies and trade-offs among sustainable development goals: The case of Spain. Sustainability 2020, 12, 10506.
10. Kroll, C.; Warchold, A.; Pradhan, P. Sustainable Development Goals (SDGs): Are we successful in turning trade-offs into synergies? Palgrave Commun. 2019, 5, 140.
11. Lusseau, D.; Mancini, F. Income-based variation in Sustainable Development Goal interaction networks. Nat. Sustain. 2019, 2, 242–247.
12. Tosun, J.; Leininger, J. Governing the interlinkages between the sustainable development goals: Approaches to attain policy integration. Glob. Chall. 2017, 1, 1700036.
13. Liu, J.; Hull, V.; Godfray, H.C.J.; Tilman, D.; Gleick, P.; Hoff, H.; Pahl-Wostl, C.; Xu, Z.; Chung, M.G.; Sun, J. Nexus approaches to global sustainable development. Nat. Sustain. 2018, 1, 466–476.
14. Zelinka, D.; Amadei, B. Systems approach for modeling interactions among the sustainable development goals part 1: Cross-impact network analysis. Int. J. Syst. Dyn. Appl. 2019, 8, 23–40.
15. Chenary, K.; Pirian Kalat, O.; Sharifi, A. Forecasting sustainable development goals scores by 2030 using machine learning models. Sustain. Dev. 2024, 32, 6520–6538.
16. Xing, Q.; Lu, L.; Wang, L.; Chen, F.; Liu, J.; Pradhan, P.; Bryan, B.A.; Moallemi, E.A.; Gao, L.; Schaubroeck, T. Country-specific progress toward the Sustainable Development Goals: Past, present, and prospects. Proc. Natl. Acad. Sci. USA 2025, 122, e2524299122.
17. Rabbi, M.F. A machine learning framework for forecasting multidimensional sustainability and informing integrated policy thresholds in the EU. Environ. Dev. Sustain. 2025, 1–56.
18. Çelik, S.; Öztürk, Ö.F.; Akkucuk, U.; Şaşmaz, M.Ü. Global sustainability performance and regional disparities: A machine learning approach based on the 2025 SDG Index. Sustainability 2025, 17, 7411.
19. Detthamrong, U.; Klangbunrueang, R.; Chansanam, W.; Dasri, R. Deep Learning for Sustainable Finance: Robust ESG Index Forecasting in an Emerging Market Context. Sustainability 2026, 18, 110.
20. Raza, M.Q.; Khosravi, A. A review on artificial intelligence based load demand forecasting techniques for smart grid and buildings. Renew. Sustain. Energy Rev. 2015, 50, 1352–1372.
21. Voyant, C.; Notton, G.; Kalogirou, S.; Nivet, M.-L.; Paoli, C.; Motte, F.; Fouilloy, A. Machine learning methods for solar radiation forecasting: A review. Renew. Energy 2017, 105, 569–582.
22. Alhathlaul, N.; Lakhouit, A.; Abdalla, G.M.; Alghamdi, A.; Shaban, M.; Alshahir, A.; Alshahr, S.; Alali, I.; Mutlaq Alshammari, F. Assessing Waste Management Using Machine Learning Forecasting for Sustainable Development Goal Driven. Sustainability 2025, 17, 8654.
23. Kozlovskyi, S.; Kulinich, T.; Duszyński, M.; Popovskyi, T.; Dluhopolska, T.; Kornatka, A.; Popovskyi, Y. Forecasting Demand for Eco-Friendly Vehicles Using Machine Learning Technologies in the Era of Management 5.0. Sustainability 2025, 17, 4429.
24. Lim, B.; Arık, S.Ö.; Loeff, N.; Pfister, T. Temporal fusion transformers for interpretable multi-horizon time series forecasting. Int. J. Forecast. 2021, 37, 1748–1764.
25. Vovk, V.; Gammerman, A.; Shafer, G. Algorithmic Learning in a Random World; Springer: New York, NY, USA, 2005.
26. Romano, Y.; Patterson, E.; Candes, E. Conformalized quantile regression. Adv. Neural Inf. Process. Syst. 2019, 32.
27. Angelopoulos, A.N.; Bates, S. Conformal prediction: A gentle introduction. Found. Trends Mach. Learn. 2023, 16, 494–591.
28. Fontana, M.; Zeni, G.; Vantini, S. Conformal prediction: A unified review of theory and new challenges. Bernoulli 2023, 29, 1–23.
29. Li, W.; Duveiller, G.; Gans, F.; Smits, J.; Kraemer, G.; Frank, D.; Mahecha, M.D.; Weber, U.; Migliavacca, M.; Ceglar, A. Diagnosing syndromes of biosphere-atmosphere-socioeconomic change. arXiv 2025, arXiv:2503.08874.
30. Du, S.; Li, T.; Yang, Y.; Horng, S.-J. Deep air quality forecasting using hybrid deep learning framework. IEEE Trans. Knowl. Data Eng. 2019, 33, 2412–2424.
31. Lim, B.; Zohren, S. Time-series forecasting with deep learning: A survey. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2021, 379, 20200209.
32. Salinas, D.; Flunkert, V.; Gasthaus, J.; Januschowski, T. DeepAR: Probabilistic forecasting with autoregressive recurrent networks. Int. J. Forecast. 2020, 36, 1181–1191.
33. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
34. Grinsztajn, L.; Oyallon, E.; Varoquaux, G. Why do tree-based models still outperform deep learning on typical tabular data? Adv. Neural Inf. Process. Syst. 2022, 35, 507–520.
35. Stankeviciute, K.; Alaa, A.M.; van der Schaar, M. Conformal time-series forecasting. Adv. Neural Inf. Process. Syst. 2021, 34, 6216–6228.
36. Auer, A.; Gauch, M.; Klotz, D.; Hochreiter, S. Conformal prediction for time series with modern Hopfield networks. Adv. Neural Inf. Process. Syst. 2023, 36, 56027–56074.
37. Lei, J.; G’Sell, M.; Rinaldo, A.; Tibshirani, R.J.; Wasserman, L. Distribution-free predictive inference for regression. J. Am. Stat. Assoc. 2018, 113, 1094–1111.
38. Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30.
39. Angrist, J.D.; Pischke, J.-S. Mostly Harmless Econometrics: An Empiricist’s Companion; Princeton University Press: Princeton, NJ, USA, 2009.
40. Kaufmann, D.; Kraay, A.; Mastruzzi, M. The worldwide governance indicators: Methodology and analytical issues. Hague J. Rule Law 2011, 3, 220–246.
41. Bluszcz, J.; Valente, M. The economic costs of hybrid wars: The case of Ukraine. Def. Peace Econ. 2022, 33, 1–25.
42. Woolf, S.H.; Schoomaker, H. Life expectancy and mortality rates in the United States, 1959–2017. JAMA 2019, 322, 1996–2016.
43. European Commission. 2023 Country Report—Portugal; SWD(2023) 622 final; European Commission: Brussels, Belgium, 2023.
44. Vinuesa, R.; Azizpour, H.; Leite, I.; Balaam, M.; Dignum, V.; Domisch, S.; Felländer, A.; Langhans, S.D.; Tegmark, M.; Fuso Nerini, F. The role of artificial intelligence in achieving the Sustainable Development Goals. Nat. Commun. 2020, 11, 233.
45. Fuso Nerini, F.; Sovacool, B.; Hughes, N.; Cozzi, L.; Cosgrave, E.; Howells, M.; Tavoni, M.; Tomei, J.; Zerriffi, H.; Milligan, B. Connecting climate action with other Sustainable Development Goals. Nat. Sustain. 2019, 2, 674–680.
46. Eizenberg, E.; Jabareen, Y. Social sustainability: A new conceptual framework. Sustainability 2017, 9, 68.
47. Acemoglu, D.; Robinson, J.A. Why Nations Fail: The Origins of Power, Prosperity, and Poverty; Crown Currency: New York, NY, USA, 2013.
48. Geels, F.W. The multi-level perspective on sustainability transitions: Responses to seven criticisms. Environ. Innov. Soc. Transit. 2011, 1, 24–40.
49. Meadowcroft, J. Engaging with the politics of sustainability transitions. Environ. Innov. Soc. Transit. 2011, 1, 70–75.
50. Gao, D.; Zhou, X.; Liu, X. The bright side of uncertainty: The impact of climate policy uncertainty on urban green total factor energy efficiency. Energies 2024, 17, 2899.
51. Zhao, Z.; Cai, M.; Connor, T.; Chung, M.G.; Liu, J. Metacoupled tourism and wildlife translocations affect synergies and trade-offs among sustainable development goals across spillover systems. Sustainability 2020, 12, 7677.

Figure 1. Methodological workflow of the trajectory intelligence framework. The PPCA gap-filled version of the WDI data shown in panel (a) was prepared by Li et al. [29].

Figure 2. Prediction error by income group, boxplots for SDGI and HDI. Circles indicate outliers beyond 1.5 times the interquartile range.

Figure 3. Choropleth map of surprise typology across 184 countries and regions. Colors indicate on-track (both targets), SDGI-only surprise, HDI-only surprise, and double surprise.

Figure 4. Trajectory plots for selected countries and regions. Rows 1 and 2 show the two largest positive and two largest negative SDGI surprises. Rows 3 and 4 show the same for HDI. Rows 5 and 6 show the two most on-track countries for SDGI and HDI (smallest absolute residual). Black dots indicate actual values, blue triangles indicate TFT predictions, and shaded bands indicate 80 percent prediction intervals. The left dashed vertical line marks the start of the study period (2003) and the right dashed line marks the training/test boundary (2017).

Figure 5. Top 20 predictive features by variable selection weight for SDGI (left) and HDI (right), colored by thematic domain. Inset pie charts show the domain-level share of cumulative predictive weight.

Figure 6. Governance quality versus prediction error, with regression lines (red dashed) and income group coloring.

Table 1. Model performance on the test set, 2018 to 2022, across 184 countries and regions (920 country–year observations). MAE is the mean absolute error and RMSE is the root mean squared error. PI width is the mean distance between the upper and lower bounds of the conformal prediction interval at the 80 percent level.

Target	Model	MAE	RMSE	PI Width 80%
SDGI	TFT	1.10	1.49	4.01
SDGI	Linear	1.35	1.86	5.07
SDGI	XGBoost	2.32	2.76	9.78
HDI	TFT	0.008	0.012	0.035
HDI	Linear	0.020	0.027	0.072
HDI	XGBoost	0.036	0.042	0.184

Table 2. Surprise counts under the main definition and stricter coverage levels. PI means prediction interval. The main definition uses the adaptive 80 percent PI with at least two consecutive years (yr) outside the interval.

Definition	SDGI-Positive	SDGI-Negative	HDI-Positive	HDI-Negative
80% PI, 2 yr (main)	35	9	11	23
90% PI, 2 yr	14	5	2	9
95% PI, 2 yr	11	1	1	5

Table 3. Top surprise countries by target and direction. Residual is the mean prediction error (actual minus predicted value) across all test years.

SDGI-Positive	Residual	SDGI-Negative	Residual
Benin	+4.59	Venezuela	−1.54
Togo	+3.23	Yemen	−0.88
Rwanda	+1.88	Syria	−0.76
China	+1.79	Brunei	−0.46
Ethiopia	+1.31	Mali	−0.34
HDI-Positive	Residual	HDI-Negative	Residual
UAE	+0.043	Venezuela	−0.040
Bhutan	+0.033	Libya	−0.023
Bangladesh	+0.023	Lebanon	−0.022
Vietnam	+0.017	Iran	−0.013
Egypt	+0.015	Ukraine	−0.008
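The residual reported in Table 3 is a signed mean error, so positive values indicate countries that outperformed their forecast. A minimal sketch of the calculation and ranking is given below; the function names and the countries "A", "B", and "C" with their values are hypothetical, and the sketch ranks all series without first filtering to flagged surprises.

```python
def mean_residual(actual, predicted):
    """Mean of (actual - predicted) over the test years."""
    return sum(a - p for a, p in zip(actual, predicted)) / len(actual)

def rank_by_residual(series_by_country):
    """Return (country, residual) pairs sorted from most positive to most negative."""
    residuals = {
        country: mean_residual(actual, predicted)
        for country, (actual, predicted) in series_by_country.items()
    }
    return sorted(residuals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical SDGI-like series: (actual, predicted) for each test year.
toy = {
    "A": ([60.0, 61.0, 62.0], [58.0, 59.0, 60.0]),  # consistently above forecast
    "B": ([70.0, 70.0, 70.0], [70.0, 70.5, 69.5]),  # errors cancel out
    "C": ([55.0, 54.0, 53.0], [56.0, 56.0, 56.0]),  # consistently below forecast
}
for country, residual in rank_by_residual(toy):
    print(f"{country}: {residual:+.2f}")
```

Because the residual is signed, over- and under-predictions in different years can offset each other, which is why it is reported alongside the interval-based surprise rule rather than in place of it.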

Table 4. Governance and predictability regression results.

Target	Coefficient	SE	Pearson Correlation r	p-Value	n
SDGI	−0.196	0.067	−0.212	0.004	184
HDI	−0.002	0.001	−0.282	<0.001	184
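The columns of Table 4 can be related to one another with a short sketch. It assumes a simple bivariate least-squares regression of prediction error on a governance score, which is an assumption about the specification rather than something stated in the table; the function names and toy data are hypothetical. In that setting the correlation follows from the coefficient and its standard error as r = t / sqrt(t^2 + n - 2), with t = coefficient / SE.

```python
import math

def simple_ols(x, y):
    """Return (slope, standard error of slope, Pearson r) for y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (n - 2) / sxx)  # residual variance uses n - 2 degrees of freedom
    r = sxy / math.sqrt(sxx * syy)
    return slope, se, r

def r_from_t(slope, se, n):
    """Pearson r implied by a bivariate regression slope and its standard error."""
    t = slope / se
    return t / math.sqrt(t * t + n - 2)

# Consistency check on the SDGI row of Table 4 (coefficient, SE, n as reported).
print(round(r_from_t(-0.196, 0.067, 184), 3))

# Toy data: governance score (x) versus absolute prediction error (y).
slope, se, r = simple_ols([0, 1, 2, 3], [1, 3, 2, 5])
print(slope, se, r)
```

For the SDGI row, the implied correlation agrees with the reported value of −0.212 to three decimals. The HDI row cannot be checked the same way because its coefficient and standard error are rounded too coarsely.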

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

