1. Introduction
Sustainable development has become a cornerstone of modern global policies, aiming to balance economic growth with environmental protection and social equity. In this context, the Sustainable Development Report (SDR) [1] plays a vital role in assessing and promoting progress toward achieving the 17 Sustainable Development Goals (SDGs) established by the United Nations. The SDR offers a comprehensive overview of countries’ sustainability performance, using quantitative indicators to highlight achievements, challenges, and strategic directions.
Most information on sustainable development focuses on the SDGs themselves rather than on how these targets are achieved. The SDR is essential for tracking and promoting progress toward the SDGs. Its key roles include:
Monitoring Progress: The report provides a detailed overview of how countries are advancing toward the SDGs. It uses data and indicators to measure progress and identify areas needing improvement [2].
Informing Policy: By highlighting successes and challenges, the report helps policymakers understand which strategies work and where adjustments are needed. This guidance supports national and international policies aligned with sustainable development objectives [3].
Raising Awareness: The report draws attention to critical issues such as poverty, inequality, climate change, and environmental degradation. It seeks to engage stakeholders, including governments, businesses, and civil society, in the global effort to achieve the SDGs [2].
Encouraging Accountability: By publicly sharing progress and setbacks, the report holds countries accountable for their commitments to sustainable development. Transparency fosters responsibility and motivates action [3].
Facilitating Collaboration: The report promotes cooperation among sectors and nations, emphasizing partnerships and collective action to address global challenges and achieve sustainable development [3].
According to the “Core Indicators for Sustainability and SDG Impact Reporting: Training Manual”, the indicators are grouped as follows [4]:
Economic area indicators
1. Revenue.
2. Value added (gross value added, GVA).
3. Net value added (NVA).
4. Taxes and other payments to the government.
5. Green investment.
6. Community investment.
7. Expenditures on research and development.
8. Share of local procurement.
Social area indicators
9. Share of women in managerial positions.
10. Hours of employee training.
11. Expenditures on employee training.
12. Employee wages and benefits.
13. Expenditures on employee health and safety.
14. Incidence rate of occupational injuries.
15. Share of employees covered by collective agreements.
Environmental area indicators
16. Water recycling and reuse.
17. Water use efficiency.
18. Water stress.
19. Waste generation.
20. Waste reused, re-manufactured and recycled.
21. Hazardous waste generation.
22. Greenhouse gas emissions.
23. Ozone-depleting substances and chemicals.
24. Share of renewable energy.
25. Energy efficiency.
26. Land used adjacent to biodiversity-sensitive areas.
Institutional area indicators
27. Board meetings and attendance.
28. Share of female board members.
29. Board members by age range.
30. Audit committee meetings and attendance.
31. Compensation per board member.
32. Corruption incidence.
33. Management training on anti-corruption.
Choosing environmentally sustainable energy sources is essential for the planet’s future. This perspective underscores the importance of adopting eco-friendly renewable energy options, including solar, wind, hydro, geothermal, and biomass [5].
Existing research examines consumption- and life-cycle-based indicators that capture environmental burdens of production and lifestyles, as well as the growing contributions of artificial intelligence (AI) and machine learning to emissions forecasting, smart grids, and sustainability analytics. The literature suggests that energy use in transport and electricity, production-related emissions, and composite sustainability metrics are promising candidates for explaining observed SDR outcomes.
Despite substantial progress, several issues remain debated. First, strong correlations between indicators and aggregate sustainability scores do not in themselves establish causal relationships; model specification and confounding can alter the direction and magnitude of effects. Second, small-sample analyses—common in policy time series—are vulnerable to overfitting in flexible models such as neural networks. Third, trade-offs can arise: for example, rapid deployment of renewables may interact with industrial structures or consumption patterns in ways that lead to mixed effects on composite indices. These uncertainties motivate careful model design, transparent reporting, and validation strategies suited for limited data.
This study investigates which measurable drivers most strongly predict SDR performance in the European Union using annual data. Focusing on energy and emissions related indicators, we combine descriptive statistics, correlation analysis, and curve fitting with a feed-forward artificial neural network (ANN). Across complementary methods, results consistently indicate that renewable energy use in transport (r4t) shows the strongest predictive association with the SDR, followed by the overall target calculation (cot), then renewable electricity (r4e); production-related emissions (gsep) display a weaker, negative association. These findings should be interpreted as preliminary, given the sample size and scope, yet they provide actionable directions for policy, namely, prioritizing transport renewables and target setting while pursuing integrated measures to curb production-related emissions.
The findings of this study might have significant implications for market actors and stakeholders. By identifying renewable energy use in transport and overall target calculation as the most influential factors on SDR outcomes, this research offers insights for strategic investment and policy prioritization. Market participants can leverage these results to align operations with sustainability metrics exhibiting predictive relevance, thereby enhancing contributions to SDG achievement. Policymakers and institutional stakeholders are encouraged to focus on these key indicators to accelerate progress and optimize resource allocation. The novelty of our hierarchical analysis and integration of AI techniques further reinforces the practical relevance of this study, providing a robust framework for future sustainability assessments and decision-making.
The study has several limitations. The dataset includes only 14 annual observations and focuses mainly on the European Union, limiting generalizability and excluding global or regional differences. The use of aggregated EU-level data may obscure country-specific variations, while reliance on secondary statistics can introduce methodological inconsistencies. Additionally, the analysis omits potentially important factors—such as governance, innovation, or energy prices—that could influence sustainability outcomes.
The remainder of this manuscript is structured as follows:
Section 2 reviews the literature.
Section 3, Materials and Methods, presents the data and methodologies employed.
Section 4, Results, details the outcomes of applying these methods and interprets the findings.
Section 5 concludes the research, discusses limitations, and suggests directions for future studies.
2. Literature Review
Recent literature increasingly emphasizes the role of sustainability indicators in shaping policy and reporting frameworks. A substantial body of research highlights the importance of renewable energy in achieving sustainability targets. Several studies underscore the environmental benefits of renewable energy sources, particularly in mitigating greenhouse gas emissions and reducing dependence on fossil fuels [6]. These reviews emphasize the potential of solar, wind, biomass, and hydro energy to promote environmental protection and sustainable development. Similarly, other researchers examine the opportunities and challenges associated with renewable energy, noting its contributions to energy security, climate change mitigation, and socio-economic development [7].
Transitioning to renewable energy sources for transportation, such as electric vehicles powered by solar or wind energy, helps reduce greenhouse gas emissions from the transport sector [8]. This shift is critical for lowering the overall carbon footprint and mitigating climate change. Increasing the share of renewables in electricity generation reduces reliance on fossil fuels, thereby decreasing greenhouse gas emissions from power production [9]. This also supports the electrification of transport and other sectors, amplifying environmental benefits. Reducing emissions from industrial and production activities is essential for meeting climate targets. Incorporating renewable energy into these processes can significantly cut emissions, contributing to a smaller overall carbon footprint [10].
The transport sector has been identified as a key area for decarbonization. The International Energy Agency (IEA, 2023) [11] reports that electric vehicle (EV) adoption is accelerating globally, with EVs playing a pivotal role in reducing emissions from road transport. The Global EV Outlook stresses the importance of policy frameworks and infrastructure development to support this transition. Further studies explore the socio-technical dimensions of transport decarbonization, advocating for inclusive and just energy transitions that consider equity and public support [12].
In parallel, the consumption footprint has emerged as a vital metric for assessing the environmental impact of production and lifestyle choices [1]. The European Commission’s Environmental Footprint methodology [13], including the Product Environmental Footprint (PEF) and Organisation Environmental Footprint (OEF), provides standardized tools for life cycle assessment (LCA) of products and services. Sala et al. [14] developed a set of 16 LCA-based indicators to quantify the environmental impacts of EU consumption, offering insights into sustainable consumption patterns and policy implications.
These indicators are interconnected, offering a holistic perspective on sustainability. Addressing them collectively enables meaningful progress toward a more sustainable and resilient future.
Recent advancements in artificial intelligence (AI) have opened new avenues for sustainability research. Studies identify high-impact areas where machine learning can contribute to climate change mitigation, including smart grids, emissions forecasting, and resource optimization [15]. Other analyses examine AI’s potential to support or hinder SDG achievement, finding that AI could positively influence 79% of SDG targets, while also posing risks related to inequality and governance [16]. Artificial intelligence can influence SDG achievement through several mechanisms, such as enhancing forecasting capacity, improving resource-allocation efficiency, and enabling data-driven policy evaluation. These mechanisms are relevant to the present study because our use of ANNs follows the same principle: AI tools can identify nonlinear patterns, highlight variable interactions, and support the detection of indicators with high predictive relevance for sustainability outcomes. In this sense, the ANN employed in our analysis is not intended to offer authoritative predictions but rather to illustrate how AI can complement classical statistical methods by revealing underlying structures in the data that may otherwise remain undetected. This provides a direct and context-specific illustration of the broader argument about AI’s potential contribution to SDG-related assessments.
As existing literature often focuses on outcomes without thoroughly analyzing underlying factors that influence or predict their evolution, this research aims to observe how renewable energy usage, greenhouse gas emissions, and consumption footprint affect the Sustainable Development Report (SDR) [1]. Particular attention is given to elements with immediate effects on population health and well-being, such as renewable energy. Notably, during the preparation of this material, a power outage occurred in Southwestern Europe, underscoring the necessity of this research. By applying rigorous statistical methods (correlation analysis, regression, normality and homogeneity tests) and advanced AI techniques (artificial neural networks), we address factors with predictive importance for SDR outcomes. This initial phase seeks to establish hierarchies of predictive relevance. Such hierarchies can inform policies and strategies to improve SDR results and accelerate progress toward SDGs. For instance, if renewable energy use in transportation proves highly influential, efforts should focus on its trends and expedite the transition to green transport. Conversely, if correlations are negative, adjustments may be necessary. These findings can optimize strategies for energy transition, reduce ecological impacts of production and consumption, and advance SDG achievement.
Unlike previous research that primarily examines isolated relationships between energy, emissions, and sustainability outcomes, our approach provides a structured ranking of predictor relevance for SDR performance, offering a clearer understanding of which variables exhibit the strongest data-driven associations. Furthermore, the study advances policy insight by translating these predictive findings into actionable priorities for EU climate-mitigation strategies, thus bridging the gap between exploratory modeling and real-world decision-making. This dual contribution—methodological integration and predictive policy relevance—differentiates the present work from the existing literature, which typically relies on single-method analyses or descriptive assessments without establishing a comparative hierarchy of sustainability drivers.
3. Materials and Methods
The present study employed static correlation techniques and artificial neural networks (ANNs) due to its exploratory nature and the limited dataset comprising only 14 annual observations. These methods enabled the identification of hierarchical correlations and predictive importance among sustainability indicators with high precision, despite the small sample size. This approach aligns with the aims of Climate, facilitating research on “climate mitigation and adaptation policies and strategies” and “sustainability, clean energy, and pollution control”.
3.1. Data
The dataset encompassed indicators for monitoring progress toward renewable energy targets under the Europe 2020 strategy (Directive 2009/28/EC on the promotion of energy from renewable sources, RED I) [17] and the Fit for 55 strategy within the Green Deal (Directive (EU) 2018/2001, RED II) [18]. Data covered the European Union (27 countries from 2020) for the years 2010–2023. The selected indicators were linked to SDR scores as they reflected core sustainability dimensions assessed within the SDR. The SDR evaluates progress across all 17 SDGs, many of which are directly shaped by trends in energy transition, environmental pressure, and production–consumption dynamics. Consequently, these indicators captured essential components of the sustainability performance that the SDR aggregates into its overall scoring.
Based on the existing literature, the research conducted, and the aim of the present work, the indicators used are presented below:
Use of renewables for transport (r4t): Measured in thousand tons of oil equivalent. According to Eurostat [nrg_ind_urtd] [19], this is a Sustainable Development Goal (SDG) indicator. Transitioning to renewable energy sources for transportation, such as electric vehicles powered by solar or wind energy, helps reduce greenhouse gas emissions from the transport sector. The EU strategy calculates the share of energy from renewable sources for four indicators: Transport (RES-T), Heating and Cooling (RES-H&C), Electricity (RES-E), and Overall RES share (RES).
Use of renewables for electricity (r4e): Measured in gigawatt-hours. According to Eurostat [nrg_ind_ured] [19], this is also an SDG indicator. Increasing renewable electricity generation supports electrification and reduces emissions.
Greenhouse gas emissions from production activities (gsep): Measured in kilograms per capita. According to Eurostat [cei_gsr011] [20], this includes emissions from all production activities, excluding household emissions. Greenhouse gases include CO₂, N₂O, CH₄, and fluorinated gases (HFC, PFC, SF₆, NF₃). Lower emissions indicate greater progress toward climate change mitigation.
Calculation of overall target (cot): Measured in thousand tons of oil equivalent. According to Eurostat [nrg_ind_cotd] [19], this SDG indicator is based on data collected under Regulation (EC) No 1099/2008 on energy statistics, supplemented by national data. Some countries’ statistical systems remain incomplete regarding RED I and RED II requirements, particularly for ambient heat, renewable cooling, and biofuel sustainability.
Consumption footprint (cfp): Indexed to 2010 = 100 per capita. According to Eurostat [cei_gsr010] [21], this indicator estimates environmental impacts of EU consumption using 16 LCA-based indicators. It combines emissions, resource use, and consumption intensities across five areas: food, mobility, housing, household goods, and appliances.
Consumption footprint, single weighted score (cfps): According to Eurostat [sdg_12_31] [22], this indicator compares environmental impacts against planetary boundaries. It measures how often consumption exceeds these thresholds.
Sustainable Development Report (SDR): According to the Europe Sustainable Development Report 2025 [23], the overall score reflects progress toward all 17 SDGs, expressed as a percentage. A score of 100 indicates full achievement. The report provides an independent quantitative assessment of EU progress toward the SDGs.
Table 1 presents descriptive statistics for all variables.
These indices are interconnected, offering a comprehensive view of sustainability. Addressing them collectively enables significant progress toward a resilient future. For this study, the SDR was considered the dependent variable, while r4t, r4e, cot, gsep, cfp, and cfps served as independent variables.
Further analysis retained only four independent variables (r4t, r4e, cot, gsep) due to their strong correlations with the SDR. To assess relationships and determine appropriate statistical tests, Bartlett’s test and Shapiro–Wilk’s test were applied.
Bartlett’s test evaluates homogeneity of variances: a p-value < 0.05 indicates that the variances differ significantly; a p-value ≥ 0.05 indicates no significant difference between variances.
Shapiro–Wilk’s test assesses normality: a p-value < 0.05 indicates a significant departure from a normal distribution; a p-value ≥ 0.05 indicates no significant departure, so normality can be assumed.
Results of these tests are shown in Table 2 and Figure 1 (Bartlett’s test only).
The test results indicated that the assumption of equal variances was violated (Bartlett’s test) and the data did not follow a normal distribution (Shapiro–Wilk’s test). Consequently, the analysis relied on trends observed in the variable graphs. To further capture and simulate the nonlinearity highlighted by these test results, artificial neural networks (ANNs) were applied.
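The two checks described above can be sketched with SciPy. This is a minimal illustration under stated assumptions, not the study's script: the text does not name the library used, and the three series below are randomly generated placeholders with 14 values each, not the Eurostat or SDR data.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in data: 14 annual values per variable (not the study's data).
rng = np.random.default_rng(0)
years = np.arange(2010, 2024)
series = {
    "r4t": rng.normal(20000.0, 3000.0, size=years.size),
    "gsep": rng.normal(6000.0, 400.0, size=years.size),
    "SDR": rng.normal(70.0, 1.5, size=years.size),
}

ALPHA = 0.05

# Bartlett's test: the null hypothesis is that all groups share one variance.
bartlett_stat, bartlett_p = stats.bartlett(*series.values())
equal_variances = bartlett_p >= ALPHA

# Shapiro-Wilk test per variable: the null hypothesis is normality.
normality = {}
for name, values in series.items():
    w_stat, p_value = stats.shapiro(values)
    normality[name] = p_value >= ALPHA
```

Because the placeholder series are on very different scales, Bartlett's test rejects equal variances here, which mirrors the situation that arises when raw indicators measured in different units are compared.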
3.2. Methods
This research employed two complementary approaches. For visualizing data evolution and concatenating these trends, several Python v3.10 and v3.11 modules and functions were utilized, including:
sns.lineplot plots a line graph with options for multiple semantic groupings [24].
sns.jointplot creates a plot of two variables with bivariate and univariate graphs [25].
sns.regplot generates a scatter plot with a linear regression line and a 95% confidence interval [26].
corrmat constructs a correlation matrix to display relationships among variables [27].
curve_fit applies nonlinear least squares to fit a function [28].
ax.plot_surface produces a three-dimensional surface plot [29].
Based on graphical representations and comparisons of growth or decline trends, rates of change, and inflection points relative to the SDR, conclusions were drawn regarding the predictive importance of independent variables. To establish a hierarchy of predictive relevance among these variables, an artificial neural network (ANN) was employed.
A feedforward ANN with a backpropagation algorithm was selected, given the nature of the problem and data characteristics. This approach, inspired by human learning, has been widely validated in prior research. The model was trained using the Adam optimizer, which efficiently manages stochastic objective functions through adaptive moment estimation. Additionally, the ReLU activation function was implemented to introduce nonlinearity and mitigate the vanishing gradient problem [30].
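A feed-forward network with ReLU activations and the Adam optimizer might be set up as in the following sketch. It assumes scikit-learn's MLPRegressor, which the text does not name as the framework used; the data are synthetic, and the (7, 3) hidden-layer layout, batch size, and iteration limit are illustrative choices rather than the study's final settings.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (not the study's): 14 rows, 4 predictors, 1 target.
rng = np.random.default_rng(42)
X = rng.normal(size=(14, 4))
y = 70.0 + X @ np.array([1.0, 0.5, 0.8, -0.3]) + rng.normal(scale=0.1, size=14)

# Standardize inputs so no predictor dominates purely because of its units.
X_scaled = StandardScaler().fit_transform(X)

# Feed-forward network trained by backpropagation with ReLU and Adam.
# The (7, 3) layout and batch size of 2 are illustrative choices only.
model = MLPRegressor(
    hidden_layer_sizes=(7, 3),
    activation="relu",
    solver="adam",
    batch_size=2,
    max_iter=2000,
    random_state=0,
)
model.fit(X_scaled, y)
predictions = model.predict(X_scaled)
```

With so few observations, in-sample predictions like these say little about generalization, so a held-out observation remains necessary.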
The training phase represents the learning process of the model. During training, the ANN receives a portion of the dataset (called the training set) and attempts to learn the underlying relationships between the input variables (features) and the output variable (target). The model repeatedly adjusts its internal parameters (weights and biases) to minimize the difference between predicted and actual values. This optimization is performed through iterative algorithms such as gradient descent or, in this study, the Adam optimizer, and is guided by a loss function (e.g., mean squared error). The goal of training is not only to fit the known data well but also to extract generalizable patterns that will hold for unseen observations.
The testing phase evaluates how well the trained model performs on new data it has never encountered before. The test set represents a small, separate subset of the dataset that is not used during training. By comparing the model’s predictions on these unseen data with the actual values, one can determine whether the model has learned genuinely meaningful patterns or whether it has simply memorized the training examples (a phenomenon known as overfitting). Performance metrics such as mean absolute error, mean squared error, or accuracy scores quantify how reliable and generalizable the model is.
Due to the limited number of data points, the size of the test set was fixed to 1 observation (test_size = 0.05), which resulted in 13 data points for training and validation and 1 data point for testing.
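The 13/1 split can be reproduced as follows, assuming scikit-learn's train_test_split, which the parameter name test_size suggests but the text does not confirm. The arrays are placeholders with the study's dimensions, and random_state is an added assumption for reproducibility.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the study's dimensions: 14 annual rows, 4 predictors.
X = np.arange(14 * 4, dtype=float).reshape(14, 4)
y = np.arange(14, dtype=float)

# scikit-learn rounds the test fraction up: ceil(0.05 * 14) = 1 held-out row.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.05, random_state=0  # random_state is an assumption
)
print(X_train.shape, X_test.shape)  # (13, 4) (1, 4)
```

A single test observation yields one error value only, so the resulting test metric should be read as a sanity check rather than as an estimate of generalization error.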
Two ANN models were developed: one using all available data and another using only the four variables selected for further analysis. Differences between these models included batch size and network architecture (number of neurons, weights, and learning rate type), as detailed in Table 3 and illustrated in Figure 2.
The settings presented in Table 3 were selected after testing multiple configurations. Final values, including the number of hidden layers, neurons per layer, learning rate type, learning rate value, and batch size, were chosen based on their ability to minimize the error between actual and simulated outputs. To identify the optimal ANN configuration, several models were compared using the following hyperparameters:
Compared Models: relu_adam (activation function ReLU, optimizer Adam) and tanh_sgd (activation function Tanh, optimizer SGD).
Tested Network Architectures: Simple architectures with a single hidden layer (3, 5, 7, 9, 11 neurons) and composite architectures with multiple hidden layers: (5, 3), (7, 3), (7, 5), (9, 3), (9, 5), (9, 7), (11, 3), (11, 5), (11, 7), (11, 9), (11, 7, 3).
Hyperparameters Used: Batch sizes of 2, 5, and 10.
Evaluation Metrics: Errors on the training set and errors on the test set.
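The search space listed above can be enumerated as in the sketch below. It assumes that every combination of model, architecture, and batch size was tried, which the text does not state explicitly; the dictionary keys are hypothetical names chosen for illustration.

```python
from itertools import product

# Search space as listed in the text.
models = {
    "relu_adam": {"activation": "relu", "optimizer": "adam"},
    "tanh_sgd": {"activation": "tanh", "optimizer": "sgd"},
}
architectures = [
    (3,), (5,), (7,), (9,), (11,),
    (5, 3), (7, 3), (7, 5), (9, 3), (9, 5), (9, 7),
    (11, 3), (11, 5), (11, 7), (11, 9), (11, 7, 3),
]
batch_sizes = [2, 5, 10]

# One candidate configuration per (model, architecture, batch size) triple.
configurations = [
    {"model": name, **settings, "hidden_layers": layers, "batch_size": batch}
    for (name, settings), layers, batch in product(
        models.items(), architectures, batch_sizes
    )
]
print(len(configurations))  # 2 * 16 * 3 = 96
```

Under this assumption the grid holds 96 candidates, each of which would be scored by its training and test errors.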
The two ANN structures identified as optimal in terms of training and testing errors differed only in the number of neurons in the input layer, which corresponds to the number of independent variables. The architectures were as follows:
4. Results
This section presents the outcomes of the data analysis, including graphical representations, their interpretation, the implementation of artificial neural networks, and the results derived from their application.
4.1. Correlation Matrix
To examine correlations between SDR values and other independent variables, a correlation matrix was employed, as shown in Figure 3. The first matrix includes all variables, while the second focuses on the four variables with correlation values greater than 0.9.
Both correlation matrices reveal strong correlations among variables, with values close to ±1, indicating near-linear relationships. Matrix 3(a) offers a comprehensive view, including additional variables (cfp, cfps), while Matrix 3(b) provides a simplified perspective for focused analysis and dimensionality reduction. Depending on the context, such as feature selection or multicollinearity assessment, Matrix 3(a) is more informative, whereas Matrix 3(b) supports streamlined modeling.
Table 4 summarizes the relationships between the SDR and all other variables based on the correlation matrix. In absolute value, correlations with the SDR range from 0.76 to 0.98, and only one of them is negative (gsep). Notably, four independent variables (r4t, r4e, cot, and gsep) have correlations with the SDR whose absolute values exceed 0.90, and indeed 0.92. Consequently, the subsequent analysis focused on two datasets: one comprising all variables and another including only the four variables with absolute correlations above 0.92.
The study primarily emphasized energy and emission indicators—r4t, r4e, cot, and gsep—due to their direct and measurable correlation, predictive importance, and mathematical fitting functions with the SDR. This narrow scope ensured methodological clarity and highlighted immediate environmental drivers of sustainability performance.
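The selection of predictors by correlation strength can be sketched with pandas. The series below are synthetic placeholders rather than the study's data, and the 0.9 cut-off on the absolute Pearson correlation follows the threshold mentioned in the text.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in series (not the study's data), 14 annual rows.
rng = np.random.default_rng(1)
trend = np.linspace(0.0, 1.0, 14)
df = pd.DataFrame({
    "r4t": trend + rng.normal(scale=0.02, size=14),
    "gsep": -trend + rng.normal(scale=0.02, size=14),
    "cfp": rng.normal(size=14),
    "SDR": trend + rng.normal(scale=0.02, size=14),
})

corrmat = df.corr()  # Pearson correlation by default

# Keep predictors whose absolute correlation with the SDR exceeds 0.9,
# so strongly negative relationships such as gsep are retained too.
sdr_corr = corrmat["SDR"].drop("SDR")
selected = sdr_corr[sdr_corr.abs() > 0.9].index.tolist()
```

Filtering on the absolute value matters because a variable such as gsep can be strongly but negatively related to the SDR.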
4.2. Graphics of Variable Evolution
Figure 4 presents scatter plots in which each annual observation is positioned according to the variable’s value on the x-axis and the SDR value on the y-axis. Around these points, density contours illustrate how frequently different combinations occur, with blue areas indicating maximum density and therefore the strongest accumulation of observations. The marginal histograms located above and to the right of the main plot display the distribution of variable and SDR values, respectively, complemented by smoothed density curves. Together, these visual elements reveal a clear, concentrated, and nearly linear association between r4t and SDR, highlighting the consistency of their joint evolution.
Figure 4a illustrates the distribution of r4t, which is concentrated within a narrow range, as indicated by the marginal histograms. The SDR exhibits a broader distribution with significant accumulation in the high-density area. The relationship between r4t and SDR appears close to linear, with a clear local correlation in the high-density zone, suggesting r4t may serve as a relevant predictor for the SDR.
Figure 4b depicts the relationship between r4e and SDR. The concentration area is more dispersed than in Figure 4a. r4e shows a wider distribution than r4t, according to the histograms. Its relationship with the SDR is less defined, indicating a weaker apparent correlation. Thus, r4e may exhibit an indirect or conditional correlation.
Figure 4c shows the association between gsep and SDR, where maximum density is concentrated in a narrow area, suggesting a localized relationship. Histograms reveal a relatively uniform distribution for gsep but significant SDR accumulation in a narrow range. gsep likely contributes to SDR variation only in combination with other factors.
Figure 4d presents the relationship between cot and SDR. Point density is more dispersed than in Figure 4c, yet an accumulation zone remains. The relationship appears diffuse, possibly indicating an indirect correlation through interaction with other variables.
Figure 4e illustrates the correlation between cfp and SDR, with maximum density concentrated in a narrow interval, suggesting a local relationship. cfp correlates with the SDR in the high-density area, indicating a potential direct correlation.
Figure 4f shows cfps and the SDR, where density is more dispersed than in Figure 4e, though an accumulation zone persists. Marginal distributions reveal greater variability for cfps than for cfp. The relationship between cfps and SDR is weaker but not negligible, possibly indirect or conditionally dependent.
To verify the fitting function, Figure 5 illustrates the relationship between SDR and each variable using both original and fitted data; the fitting was carried out with NumPy’s polyfit, applying a polynomial function to the data illustrated in Figure 4. The yellow scatter points represent the observed SDR values, while the blue line shows the fitted nonlinear function that captures the overall trend. The small inset provides a complementary linear regression view, where green points represent the local data used in the regression, the pink line represents the linear fit, and the shaded pink area corresponds to the 95% confidence interval. This inset highlights the local variability of the data and the degree of uncertainty around the linear approximation, confirming the general upward trend and consistency of the fitted model.
Figure 5a reveals a clear trend between SDR and r4t, as captured by the fitted line. The model closely represents the original data, with minimal deviation. The uncertainty shown in the inset graph is relatively narrow, suggesting a reliable fit and low variability.
Figure 5b shows the fitted line capturing the general trend of the original data, though with slightly more spread compared to Figure 5a. The uncertainty band is wider, indicating greater variability or reduced confidence in the fit. Nevertheless, the model provides a reasonable approximation.
Figure 5c demonstrates that the fitted curve closely follows the trend of the original data, suggesting a strong model fit. The confidence interval is narrow, indicating high confidence in predictions. The SDR–gsep relationship appears smooth and well captured.
Figure 5d shows the fitted line aligning well with the original data, though with slightly more deviation than in Figure 5c. The confidence interval is moderately wider, suggesting some variability. Overall, the model remains reliable for SDR–cot.
Figure 5e illustrates a strong correlation between SDR and cfp, with the fitted curve accurately capturing the trend. The inset graph provides a detailed view of a critical region, with the pink area indicating confidence intervals.
Figure 5f confirms a good model fit for SDR–cfps, with consistent performance across the observed range. The inset graph highlights uncertainty or refinement in specific regions.
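The polynomial fitting behind these panels can be sketched with NumPy's polyfit. The degree is an assumption, since the text does not state it, and the data are a synthetic quadratic rather than the study's series; the coefficient of determination is added here only as a simple goodness-of-fit check.

```python
import numpy as np

# Synthetic stand-in for one predictor and the SDR over 14 years.
x = np.linspace(10.0, 23.0, 14)
sdr = 60.0 + 0.8 * x - 0.01 * x**2

DEGREE = 2  # assumed; the text does not state the polynomial degree
coeffs = np.polyfit(x, sdr, DEGREE)  # coefficients, highest power first
fitted = np.polyval(coeffs, x)

# Coefficient of determination as a simple goodness-of-fit check.
ss_res = np.sum((sdr - fitted) ** 2)
ss_tot = np.sum((sdr - sdr.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot
```

With only 14 points, a higher degree will always fit more closely, so the degree is better chosen by plausibility and parsimony than by the fit statistic alone.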
While the correlation analysis in Table 4 and Figures 3–5 highlights strong statistical associations between sustainability indicators and SDR, these do not imply causality. Therefore, the study employed complementary techniques, including three-dimensional representations and ANN modeling, to simulate nonlinear relationships and deepen understanding of variable interactions.
The next phase introduced functions derived from fitting into three-dimensional representations of SDR evolution relative to other independent variables, considered two at a time (Figure 6). These graphs aimed to establish a hierarchy of influences (defined strictly as mathematical relationships from fitted functions and predictive modeling) by comparing previously determined functions.
In Figure 6, variables are shown on the x-axis and y-axis, while the vertical axis represents the corresponding SDR values. The colored surface reflects the fitted mathematical function derived from the modeling process, with warmer colors indicating higher SDR values. The black points represent the original annual observations, allowing direct comparison between measured data and the modeled surface. This representation highlights how the SDR responds to joint variations in variables.
Figure 6a illustrates the surface of SDR versus r4t and r4e, showing a clear increase in SDR as both variables rise. Both r4t and r4e show a positive and predictable association with the SDR.
Figure 6b, depicting SDR versus r4t and gsep, presents a more complex pattern, with the SDR increasing or decreasing depending on the combination of these variables. While r4t maintains a stable positive influence, gsep introduces nonlinear variability, adding complexity to SDR behavior.
Figure 6c, showing SDR versus r4t and cot, reveals a steady increase in SDR as both variables grow. The surface is smooth and uniform, indicating that cot reinforces the positive effect of r4t.
Figure 6d, SDR versus gsep and r4e, displays a more irregular surface. SDR values fluctuate due to gsep’s variability, while r4e exerts a more stable but nonlinear influence. This combination results in less predictable SDR evolution.
Figure 6e, SDR versus r4e and cot, demonstrates that the SDR increases primarily with cot, while r4e has a moderate effect. The surface suggests a positive and relatively uniform relationship along the cot axis, confirming cot as a strong contributor and r4e as secondary.
Figure 6f, SDR versus cot and gsep, shows the SDR varying within a narrower range. cot maintains a consistent positive effect, whereas gsep introduces local variations, suggesting a nonlinear and less stable influence. cot remains a reliable predictor, while gsep adds complexity.
Table 5 summarizes SDR evolution across all variable pairs. The table is read by comparing column headers with row variables. For example, in the second row, r4t shows a stronger correlation with the SDR than r4e, and cot shows a slightly stronger correlation than r4e. In the r4t column, stronger influences are evident compared to other variables. The column “No. of Higher Influences” counts superior influences for each variable, while “Hierarchy of Influence” ranks them accordingly.
r4t and cot are the most consistent and positively correlated variables with the SDR.
r4e has a moderate and relatively stable influence.
gsep introduces the most variability and nonlinearity, making it the least predictable factor.
The most favorable combinations for increasing the SDR involve r4t and cot.
These findings are later compared with results obtained through artificial intelligence, specifically artificial neural networks (ANNs).
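The counting logic behind the “No. of Higher Influences” and “Hierarchy of Influence” columns can be sketched as below. The pairwise outcomes in `pairwise_winner` are illustrative: they are chosen to be consistent with the hierarchy reported in the text and are not copied from Table 5.

```python
from collections import Counter

VARIABLES = ["r4t", "r4e", "gsep", "cot"]

# For each pair shown in a 3D panel, the variable judged to have the stronger
# influence on the SDR. Hypothetical entries, consistent with the reported ranking.
pairwise_winner = {
    ("r4t", "r4e"): "r4t",
    ("r4t", "gsep"): "r4t",
    ("r4t", "cot"): "r4t",
    ("gsep", "r4e"): "r4e",
    ("r4e", "cot"): "cot",
    ("cot", "gsep"): "cot",
}

# Start every variable at zero so that one with no wins still appears.
higher_influences = Counter({name: 0 for name in VARIABLES})
higher_influences.update(pairwise_winner.values())

# Rank by number of pairwise wins, highest first.
hierarchy = sorted(VARIABLES, key=lambda name: higher_influences[name], reverse=True)

for rank, name in enumerate(hierarchy, start=1):
    print(rank, name, higher_influences[name])
```

With four variables there are six pairs, so a strict ranking requires the win counts to be 3, 2, 1, and 0; ties would signal that the pairwise comparisons do not yield an unambiguous order.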
4.3. ANN Application Results
Convergence results for both ANN versions were similar, occurring at approximately 150 iterations (Figure 7).
The network error for the full dataset was 0.0002637165, while for the four selected variables it was 0.0000969775. The lower error for the reduced dataset indicates that the four-variable network fitted the targets more closely than the network trained on all variables.
Another comparison involved target (actual) values versus simulated values after ANN training, as shown in Figure 8.
The graphs reveal an almost perfect overlap between target and simulated values, indicating excellent training results and minimal differences.
For additional analysis, Table 6 presents training and testing errors for both datasets, confirming the robustness of the ANN models. The mean absolute error (MAE) is the average of the absolute differences between the actual and predicted values. The mean squared error (MSE) is the average of the squared differences between the actual and predicted values.
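The two error measures defined above can be written out directly. The sketch below uses plain Python and made-up numbers. The function names and sample values are illustrative and are not taken from the study's data or code.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of |actual - predicted| over all observations."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of (actual - predicted)^2 over all observations."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Illustrative target and simulated values only.
actual = [70.0, 71.0, 72.0, 73.0]
predicted = [70.5, 71.0, 71.0, 73.5]

mae = mean_absolute_error(actual, predicted)
mse = mean_squared_error(actual, predicted)
print(mae, mse)
```

Because the MSE squares each difference, a single large miss raises it far more than it raises the MAE, which is why reporting both gives a fuller picture of the fit.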
Table 6 reveals extremely low training and testing error values, confirming that the model was correctly trained and delivered highly accurate results suitable for simulation and evaluation purposes. To further emphasize the robustness of training, Table 7 presents training and testing accuracy scores based on the RandomForestRegressor. The closer these values are to 1.00, the higher the accuracy achieved.
The preceding tables show that both training and testing achieved excellent accuracy. The next step was therefore to determine how important the artificial neural network considered each independent variable for the training process and, consequently, for predicting SDR values. This measure, technically referred to as feature importance, was obtained from model.feature_importances_, which returns a score for each input feature indicating its relative significance in making accurate predictions. It is illustrated in Figure 9 for both scenarios: the subset of four variables (Figure 9a) and the entire dataset (Figure 9b).
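To illustrate what a per-feature importance score expresses, the sketch below uses permutation importance: each input column is shuffled in turn and the resulting increase in prediction error is recorded. This is a different, dependency-free technique from the impurity-based `feature_importances_` attribute of scikit-learn's RandomForestRegressor, and it is not the authors' code. The stand-in model, its weights, and the synthetic data are hypothetical.

```python
import random
from typing import Callable, List, Sequence

FEATURES = ["r4t", "cot", "r4e", "gsep"]
# Hypothetical weights for a stand-in model; NOT estimated from the study's data.
WEIGHTS = [0.5, 0.3, 0.15, 0.05]


def predict(row: Sequence[float]) -> float:
    """Stand-in for a trained model: a fixed weighted sum of the inputs."""
    return sum(w * x for w, x in zip(WEIGHTS, row))


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error between two equally long sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def permutation_importance(
    model: Callable[[Sequence[float]], float],
    rows: List[List[float]],
    target: Sequence[float],
    n_repeats: int = 20,
    seed: int = 0,
) -> List[float]:
    """Share of the total error increase caused by shuffling each feature."""
    rng = random.Random(seed)
    baseline = mse(target, [model(r) for r in rows])
    raw = []
    for j in range(len(rows[0])):
        increase = 0.0
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the target
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increase += mse(target, [model(r) for r in shuffled]) - baseline
        raw.append(max(increase / n_repeats, 0.0))
    total = sum(raw) or 1.0  # guard against division by zero
    return [value / total for value in raw]


# Synthetic inputs in [0, 1); the target is the stand-in model's own output.
data_rng = random.Random(42)
rows = [[data_rng.random() for _ in FEATURES] for _ in range(60)]
target = [predict(r) for r in rows]

importances = permutation_importance(predict, rows, target)
ranking = sorted(FEATURES, key=lambda f: importances[FEATURES.index(f)], reverse=True)
for name, share in zip(FEATURES, importances):
    print(f"{name}: {share:.2%}")
```

The shares sum to one, so they can be read as percentages in the same way as the values plotted in a feature-importance chart. The ranking here reflects only the hypothetical weights chosen for the stand-in model.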
Observations from Figure 9 indicate that the hierarchy identified by the ANN remains consistent regardless of whether the full dataset or only the four-variable subset is used. Moreover, the importance values obtained for both approaches are similar, with differences of approximately six percentage points. In both cases, the most influential variable is r4t, followed by cot, then r4e, and finally gsep. Regarding percentage values, r4t consistently exceeds 25%, reaching up to 29.23%, while the remaining variables cluster near this range, except when only four variables are considered, in which case gsep shows a markedly lower value.
It is noteworthy that this hierarchy mirrors the one established in Section 3.1, underscoring the validity of two-dimensional and three-dimensional representation methods and their interpretation, as well as the correct implementation and efficient training of the ANN models.
Regardless of the analytical method employed, the hierarchy of predictive importance of independent variables on the SDR remains unchanged. This confirms that the SDR is most strongly correlated with the use of renewables for transport, followed by the calculation of the overall target, then the use of renewables for electricity, and finally greenhouse gas emissions from industrial production, when considering only the four variables selected through applied analysis techniques.
5. Conclusions
This study aimed to identify and rank the predictive importance of key sustainability indicators—such as renewable energy use in transport and electricity, greenhouse gas emissions, and consumption footprint—on the Sustainable Development Report (SDR). To achieve this, a hybrid methodology was employed, combining statistical techniques (correlation analysis, regression modeling, and distribution tests) with artificial intelligence, specifically artificial neural networks (ANNs).
The results suggest that renewable energy use in transport (r4t) is the most influential factor in determining SDR values, followed by the overall target calculation (cot), renewable energy use in electricity (r4e), and greenhouse gas emissions from production (gsep). This hierarchy emerged consistently across statistical analyses, fitted-function representations, and ANN-based feature-importance results. While these convergences are encouraging, they should be interpreted cautiously given the exploratory nature of the study.
Three-dimensional analysis—strictly referring to mathematical relationships derived from fitted functions—of SDR evolution relative to pairs of independent variables revealed significant interactions, particularly between r4t and cot. These findings suggest that these two components should be prioritized in sustainability policies. Conversely, gsep exhibited variable and less predictable influence, indicating the need for integrated and adaptive approaches to reducing industrial emissions.
By leveraging artificial intelligence, the study seems to show that simulation models can accurately reproduce SDR values, providing a robust tool for evaluating future scenarios. The low training and testing error values, along with high accuracy scores, confirm the methodological efficiency of the approach. Although the ANN was able to reproduce SDR values with low training and testing error, the small dataset and constrained architecture limit the extent to which the model’s predictive ability can be generalized.
While empirical findings clearly indicate r4t as the most influential variable, the theoretical rationale behind this result warrants further discussion. Transportation has historically been one of the most challenging sectors for decarbonization due to its reliance on fossil fuels and its direct contribution to greenhouse gas emissions. The European Union has implemented ambitious policies, such as the Renewable Energy Directive (RED II) [31], the Fit for 55 package [32], and the European Green Deal [33], which prioritize the transition to clean transport through electrification, biofuels, and infrastructure development. These frameworks have accelerated renewable energy adoption in transport, amplifying its measurable impact on sustainability indicators.
In conclusion, this research offers valuable insights into SDR evolution and provides a solid foundation for developing results-oriented public policies. By prioritizing components with major predictive importance, such as renewable energy in transport and overall target setting, the transition toward a green and sustainable economy can be accelerated, aligning with global sustainable development objectives. Yet, because the analysis is based on a limited number of annual observations and reflects the specific regulatory, economic, and energy-transition dynamics of the European Union, the findings may not generalize to other regions with different policy frameworks or sustainability trajectories. Further limitations are that the data may hide country-level differences and that the analysis relies on secondary statistics, which can contain methodological inconsistencies. The analysis also deliberately omits potentially relevant variables, such as governance, innovation, or energy prices, that might influence sustainability outcomes.
The predictive hierarchy identified in this research indicates which variables are most useful for forecasting SDR performance within the modeling environment applied, but it does not establish causal relationships or prescriptive policy outcomes. Therefore, while the results may support EU policymakers in prioritizing areas such as renewable energy in transport and overall target planning, these insights should be viewed as exploratory and contingent on data availability, institutional variability, and evolving climate-policy mechanisms.
ANN application in this study was exploratory, aimed at complementing traditional statistical methods and identifying potential nonlinear relationships among sustainability indicators. To mitigate overfitting risks, safeguards such as regularization techniques, careful network architecture selection, and validation through testing accuracy and feature importance analysis were implemented. The final ANNs were selected after comparing multiple configurations: the number of hidden layers, neurons per layer, learning rate type, learning rate value, and batch size were chosen for their ability to minimize the error between actual and simulated outputs. Consistently low training and testing errors, along with close alignment between predicted and actual values, suggest that the ANN captured meaningful patterns despite limited data. Furthermore, consistency across statistical and AI-based methods supports the robustness of the findings. While larger datasets are preferable for generalization, the current approach provides valuable preliminary insights and a foundation for future research with expanded data.
To avoid confusion between simple statistical association and the predictive ordering proposed in this study, we emphasize that the “hierarchy of influence” does not correspond to correlation magnitude alone. Correlation strength reflects linear association between two variables in isolation, whereas the hierarchy introduced here is derived from model-based predictive relevance, combining fitted functional relationships, multidimensional surface analysis, and ANN feature-importance metrics. This hierarchy therefore captures how each variable contributes to predicting SDR values within the overall model structure, accounting for nonlinear interactions and multivariate contexts. Importantly, the hierarchy does not imply causality; rather, it represents a data-driven assessment of which indicators are most useful for prediction under the modeling framework applied. The term “influence” is thus used strictly in a predictive, model-based sense, not as a causal determinant.
Future research should integrate social, institutional, and economic dimensions for a comprehensive understanding of SDR evolution. Expanding the dataset to include more countries and years and incorporating additional sustainability dimensions (such as governance quality, economic resilience, biodiversity, water usage, and social equity) would enable a more holistic assessment of sustainability progress. Advanced techniques, including ensemble AI models and time-series approaches such as VAR, ARIMA, and TVP-VAR, could enhance predictive accuracy. Econometric methods like Granger causality tests and panel data analyses should also be considered to rigorously assess causal linkages and temporal dependencies. Longitudinal studies investigating policy impacts would provide deeper insights into the mechanisms driving sustainable development performance.