Article

Leveraging Explainable AI to Decode Energy Poverty in China: Implications for SDGs and National Policy

1 School of Computer Science and Technology, Taiyuan Normal University, Jinzhong 030619, China
2 Shanxi Key Laboratory for Intelligent Optimization Computing and Blockchain Technology, Taiyuan Normal University, Jinzhong 030619, China
3 School of Computing and Information Technology, Shanxi University, Taiyuan 030006, China
4 Planning and Finance Department, Taiyuan Normal University, Jinzhong 030619, China
5 School of Information, Shanxi University of Finance and Economics, Taiyuan 030006, China
* Author to whom correspondence should be addressed.
Sustainability 2025, 17(24), 11080; https://doi.org/10.3390/su172411080
Submission received: 21 October 2025 / Revised: 3 December 2025 / Accepted: 8 December 2025 / Published: 10 December 2025
(This article belongs to the Section Social Ecology and Sustainability)

Abstract

The precise identification of energy poor households is a critical step towards achieving the United Nations Sustainable Development Goals (SDGs), particularly SDG 7 (Affordable and Clean Energy) and SDG 1 (No Poverty), while also intersecting with climate action (SDG 13). As the world’s largest developing country, China faces unique energy poverty challenges characterized by significant regional disparities and uneven access to modern energy services. To support targeted interventions and equitable policy-making, this study proposes an explainable artificial intelligence (XAI) framework for predicting and interpreting energy poverty. Utilizing nationally representative data from the China Family Panel Studies (CFPS) from 2014 to 2020, we developed a predictive model that integrates a Convolutional Neural Network with SHapley Additive exPlanations (SHAP). Our model, EPPE-FCS, demonstrated exceptional predictive performance, achieving an average accuracy of 98.23%, outperforming several mainstream benchmarks. Crucially, the SHAP interpretability analysis revealed that annual per capita household expenditure is the most influential driver, while the contribution of energy burden indicators (electricity and gas expenses) exhibited a significant decreasing trend. This trend likely reflects the positive impact of China’s national policies, such as the “Clean Heating Initiative” and “Targeted Poverty Alleviation,” on improving energy infrastructure and affordability. The findings underscore the necessity of a dual-track policy that combines immediate energy cost subsidies with long-term strategies for income enhancement and clean energy transition. This research provides policymakers with a robust tool to alleviate energy poverty, thereby advancing a just, sustainable, and climate-resilient energy future in China and other developing regions.

1. Introduction

Energy serves as a fundamental cornerstone of modern society, yet energy poverty, a pervasive global challenge, directly impedes the achievement of the United Nations Sustainable Development Goals (SDGs). It sits at the nexus of SDG 7 (ensuring access to affordable, reliable, sustainable, and modern energy for all) and SDG 1 (ending poverty in all its forms everywhere), with strong linkages to SDG 13 (Climate Action) through the imperative for a clean energy transition [1,2]. Energy poverty is commonly defined as a condition in which households lack adequate access to safe, affordable, and sustainable energy services to meet basic living needs [3,4]. This condition not only undermines health and quality of life but also perpetuates a multidimensional poverty trap, posing a severe threat to social equity and long-term sustainability [5,6].
Over recent years, EU countries have continuously refined their energy poverty governance through policy frameworks. For instance, the EU Clean Energy Package [7] identifies energy poverty as one of the core indicators in the regulation of electricity and natural gas markets for member states, requiring countries to regularly monitor households with high energy burdens and implement targeted subsidies. Taken together, these research and policy practices reflect an assistance logic centered on residential energy efficiency and energy market pricing mechanisms.
In developing countries, the structural drivers of energy poverty are more complex. Factors such as inadequate basic energy infrastructure, low penetration of clean energy, and significant urban-rural disparities collectively affect households’ access to energy services. Sy & Mokaddem (2022) [8] point out that in Sub-Saharan Africa and South Asia, energy poverty is manifested not only in high energy costs but also in the absence of energy services, such as unreliable electricity supply and lack of access to modern fuels. This strand of research underscores the critical importance of energy accessibility and infrastructure investment, which shows strong parallels with the causes of energy poverty in rural China.
Energy poverty represents a common challenge confronting both developed and developing countries, revealing a fundamental equity deficit within the global energy system. According to the International Energy Agency (IEA), hundreds of millions of people worldwide still lack access to electricity, while billions rely on polluting traditional solid fuels [9,10]. This issue persists even in European contexts; for example, Papada and Kaliampakos reported that up to 58% of households in Greece experience energy poverty [11], while Betto et al. estimated that Italy’s hidden energy poverty rate ranges between 25% and 43% [12]. As the world’s largest developing country and a major emitter, China presents a critical and unique context for studying energy poverty. The nation has achieved remarkable economic growth; however, this has been accompanied by severe challenges related to uneven regional development and disparate energy consumption patterns [13]. Studies indicate that the incidence of energy poverty in China once exceeded 40% nationally, but with significant regional variation. It was particularly acute in less developed western provinces, such as Yunnan and Qinghai [14]. This landscape has been actively shaped by concerted national policies. Initiatives like “Targeted Poverty Alleviation” have directly boosted household incomes, while the “Clean Heating Initiative” and rural grid upgrades have improved access to modern and cleaner energy sources, specifically aiming to reduce the household energy burden. These context-specific interventions make China an invaluable case study for understanding how policy can reshape the drivers of energy poverty, offering lessons for other developing countries. Accurately identifying energy poverty within this complex and evolving socio-political context is therefore of paramount importance.
To effectively measure and address energy poverty, the academic community has developed various assessment methodologies. Boardman’s “10% indicator” [15] and Hills’ “Low Income-High Cost” (LIHC) metric [16] are widely applied in developed countries, yet their heavy reliance on income factors often fails to comprehensively capture the actual landscape of household energy consumption. Although the Multidimensional Energy Poverty Index (MEPI) constructed by Nussbaumer et al. [17] offers a more holistic reflection of the breadth and depth of energy poverty, it encounters challenges in acquiring micro-level data. The Basic Needs Approach [18], which relies on objective energy consumption thresholds, provides clarity in calculation but frequently overlooks regional disparities in energy demand arising from varying economic development levels and climatic conditions [19]. To establish an energy poverty line more aligned with the Chinese context, this study adopts the standard developed by Liu Zimin [20], which integrates the Basic Needs Approach with equivalence scales, aiming to achieve an accurate assessment of household-level energy poverty.
With the advent of the big data era, machine learning technologies have introduced new methodological support for energy poverty research. For instance, models based on Random Forests [21], Decision Trees [22], and XGBoost [23] have been applied in poverty identification and demonstrate potential in handling high-dimensional data. Al Kez et al. [24] integrated the United Kingdom’s housing energy-efficiency ratings with remote-sensing indicators such as nighttime light intensity and land surface temperature, together with socioeconomic variables, to construct energy poverty identification models based on XGBoost and deep learning. Their approach substantially improved the accuracy of identifying high energy-burden households, demonstrating the critical role of spatial information in predicting energy poverty. Meanwhile, Gawusu et al. [25] employed machine-learning ensemble methods to model the Multidimensional Energy Poverty Index (MEPI), revealing that factors such as educational attainment, food security, and income capability are central to explaining multidimensional energy poverty. Their findings provide actionable empirical evidence for policy formulation. However, existing studies still face several challenges: some models are susceptible to noise, resulting in limited accuracy, or carry risks of overfitting [26,27]. More critically, although many advanced algorithms (such as deep learning) possess strong predictive capabilities, their “black-box” nature significantly hinders the interpretability of predictions. This opacity makes it difficult for policymakers to understand the rationale behind model decisions, thereby restricting their practical application in precise policy formulation [28]. For example, local interpretation methods like LIME often fail to provide a global perspective on feature contributions [29]. 
Moreover, while international research has recognized the potential of explainable AI (XAI), studies that fully apply XAI in the context of energy poverty remain extremely limited. As highlighted in a relevant review [30], current XAI research predominantly focuses on technical domains such as energy system optimization, photovoltaic forecasting, and demand response, with relatively little attention paid to socio-economic applications like poverty identification.
To address the aforementioned challenges, this paper proposes an explainable artificial intelligence (XAI) framework that integrates Convolutional Neural Networks (CNN) with SHapley Additive exPlanations (SHAP) for energy poverty prediction and attribution analysis. There are three primary considerations: CNN’s ability to capture complex patterns among household characteristics, SHAP’s provision of a unified benchmark for interpretation, and the XAI framework’s direct alignment with the policy sector’s pressing need for model transparency. The main innovations and contributions of this study are threefold:
(1) Robust feature screening through Spearman correlation analysis enhances the model’s generalizability and computational efficiency;
(2) The CNN architecture captures complex temporal patterns and deep features in household panel data, achieving predictive accuracy (average 98.23%) that surpasses traditional machine learning models;
(3) The incorporation of SHAP interpretability decomposes model predictions into feature-specific contributions, thereby transparently revealing key drivers of energy poverty and their operational mechanisms.
This research aims not only to provide a high-precision predictive tool but also to establish a decision support system capable of informing the design of equitable, efficient, and sustainable energy policies, thereby contributing to the advancement of the global sustainable development agenda.
Although existing research has yielded substantial progress in energy poverty measurement, machine learning methods, and multidimensional poverty frameworks, three critical gaps remain. First, while traditional studies using linear or tree-based models offer a degree of interpretability, their capacity to handle high-dimensional household characteristics and capture nonlinear interactions remains limited. Second, despite their strong predictive performance, deep learning models struggle to provide transparent rationales for policymaking due to their “black-box” nature. Third, few studies have examined the dynamic evolution of energy poverty drivers in China within the context of major 2014–2020 national initiatives such as clean heating and targeted poverty alleviation.
Against this backdrop, this study addresses the following core research questions:
(1) Can a model combining high accuracy and strong interpretability be developed for identifying energy poverty using household-level microdata?
(2) Which economic, demographic, and energy expenditure characteristics at the household level are key determinants of energy poverty?
(3) Did these key factors undergo significant dynamic changes between 2014 and 2020, and are such changes consistent with national policy priorities?
Guided by these questions, the study aims to:
(1) Construct an explainable AI (XAI) framework integrating CNN and SHAP to accurately identify energy poor households in China;
(2) Uncover key drivers of energy poverty through feature contribution analysis and quantify their influence mechanisms;
(3) Analyze temporal trends in these drivers and explore their policy implications and links to the SDGs.
Based on the literature and theoretical framework, the following hypotheses are proposed:
(1) Household economic capacity (e.g., per capita expenditure) is the most critical determinant of energy poverty.
(2) Household energy expenditure burden (e.g., share of electricity or gas costs) continues to exert a significant independent effect on energy poverty even after controlling for economic capacity.
(3) Between 2014 and 2020, energy structure optimization and subsidy policies reduced the relative importance of energy burden-related indicators.

2. Materials and Methods

2.1. Data Source and Description

The data used in this study were drawn from the China Family Panel Studies (CFPS) [31], a nationally representative longitudinal survey conducted by the Institute of Social Science Survey (ISSS) of Peking University. Renowned for its scientific rigor, comprehensiveness, and accessibility, the CFPS data are widely utilized to assess socio-economic development and living conditions in China, providing an ideal foundation for examining energy poverty within a sustainable development framework.
To construct a multidimensional household-level panel dataset, we integrated CFPS data from four survey waves: 2014, 2016, 2018, and 2020. The dataset construction process involved the following steps: First, variables related to household energy consumption, economic status, and demographic characteristics were extracted from the household economic database, adult database, and family relationship database. These three databases were then matched and merged using Stata 17.0 based on consistent household and individual identifiers, resulting in a consolidated dataset covering households from 25 provinces in China.
During the data preprocessing stage, a comprehensive cleaning procedure was implemented using the Python 3.7.1 programming language and the Pandas library. This process included removing duplicate entries, imputing missing values, and correcting obvious outliers to ensure the reliability and robustness of subsequent analyses.
The key dependent variable, household energy poverty status, was defined with reference to the energy poverty line established by Liu Zimin, which combines the basic needs approach and equivalence scale method (set at an annual energy expenditure of 637.09 yuan). Based on this threshold, sample households were classified into two categories: those with annual energy expenditure below the threshold were defined as energy poor (assigned a value of 0), while the rest were classified as non-energy poor (assigned a value of 1).
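The labeling rule above can be illustrated with a minimal sketch. It assumes household energy expenditure is available as a numeric array in yuan per year; the function and variable names are hypothetical and are not taken from the CFPS files or from the study's code.

```python
import numpy as np

# Energy poverty line quoted in the text (yuan per year).
ENERGY_POVERTY_LINE = 637.09

def label_energy_poverty(annual_energy_expenditure):
    """Return 0 for energy poor (below the line) and 1 for non-energy poor."""
    spend = np.asarray(annual_energy_expenditure, dtype=float)
    return np.where(spend < ENERGY_POVERTY_LINE, 0, 1)

# Three illustrative households: below, exactly at, and above the line.
labels = label_energy_poverty([120.0, 637.09, 900.5])
print(labels)  # [0 1 1]
```

Note that, under this coding, the class of interest (energy poor) is labeled 0, which matters when computing class-specific metrics later on.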
Through a systematic review of the CFPS questionnaire and drawing upon established theoretical frameworks and literature in the field of energy poverty, we initially selected 12 potential characteristic variables associated with energy poverty. These variables encompass dimensions such as household economics, demographic structure, housing conditions, and energy consumption patterns [32,33]. Table 1 presents detailed descriptive statistics of these variables across the four survey years. After the above processing steps, a balanced panel dataset was formed, comprising approximately 13,016 valid household observations per year from 2014 to 2020, providing a solid data foundation for subsequent model training and interpretability analysis. During data cleaning, we identified and corrected outliers in proportion variables (e.g., gas expenditure share). Values exceeding plausible ranges were winsorized, and all proportion variables were standardized to a 0–1 scale for consistency.
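The cleaning steps described above (deduplication, imputation, winsorization, and rescaling of proportion variables) could look roughly like the following Pandas sketch. The text does not state the imputation rule, the winsorization cut-offs, or the exact rescaling, so median imputation, the 1st and 99th percentiles, and min-max rescaling are assumptions made only for illustration, as are the column names.

```python
import pandas as pd

def clean_households(df, share_cols, id_cols=("household_id", "year")):
    """Deduplicate, impute, winsorize, and rescale share columns to [0, 1]."""
    out = df.drop_duplicates(subset=list(id_cols)).copy()
    for col in share_cols:
        # Assumption: median imputation (the paper does not name its method).
        out[col] = out[col].fillna(out[col].median())
        # Assumption: winsorize at the 1st and 99th percentiles.
        lo, hi = out[col].quantile([0.01, 0.99])
        out[col] = out[col].clip(lower=lo, upper=hi)
        # Assumption: min-max rescaling so every share lies in [0, 1].
        span = out[col].max() - out[col].min()
        out[col] = (out[col] - out[col].min()) / span if span > 0 else 0.0
    return out

# Toy data with one duplicate row, one missing value, and one implausible share.
toy = pd.DataFrame({
    "household_id": [1, 1, 2, 3, 4],
    "year": [2014, 2014, 2014, 2014, 2014],
    "gas_share": [0.10, 0.10, None, 0.30, 7.5],
})
cleaned = clean_households(toy, ["gas_share"])
print(cleaned)
```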

2.2. Research Framework/Theoretical Model

To systematically address the challenges in energy poverty identification and attribution analysis, this study proposes a comprehensive explainable artificial intelligence (XAI) analytical framework [34,35]. This framework aims not only to achieve high-precision energy poverty prediction but also to deeply unravel the underlying driving mechanisms, thereby providing transparent and reliable decision support for formulating sustainable energy poverty alleviation policies. As illustrated in Figure 1, the study comprises the following four sequential phases:
Phase 1: Data Preprocessing and Feature Engineering
The core objective of this phase is to develop a high-quality dataset suitable for model training. We began by integrating and cleaning the raw multi-source data from CFPS, which involved handling missing values, addressing outliers, and performing data standardization. Subsequently, Spearman correlation analysis, a non-parametric statistical method, was employed to quantitatively assess the monotonic relationships between all initial features and energy poverty status [36,37]. This process enabled the selection of a subset of features with the highest statistical significance, aiming to reduce data dimensionality, enhance model robustness, and establish a clear variable foundation for subsequent attribution analysis.
Phase 2: Predictive Model Construction and Optimization
In this phase, the filtered features were used to construct the energy poverty prediction model. Given the complexity of energy poverty determinants and the potential existence of non-linear interactions among features, a Convolutional Neural Network (CNN) was selected as the core predictive algorithm. CNNs are renowned for their powerful capabilities in feature abstraction and pattern recognition, enabling the automatic learning of deep discriminative features from multidimensional household data. We optimized the CNN model’s hyperparameters using grid search and cross-validation techniques to ensure optimal performance, and compared it against various traditional machine learning models to validate its superiority.
Phase 3: Model Interpretability Realization
To overcome the limitations of the “model black box,” this framework incorporates SHAP (SHapley Additive exPlanations), a game theory-based interpretability technique, applied to the CNN predictive model trained in Phase 2. SHAP provides a clear “contribution report” for each sample’s final prediction, precisely quantifying the magnitude and direction (positive or negative) of each feature’s influence on the model output. This allows us to address the core question, “On which key factors does the model judge a household to be energy poor?”, from both global (entire dataset) and local (individual household) perspectives.
Phase 4: Policy Interpretation and Sustainability Discussion
The ultimate objective of this framework is to translate data-driven insights into actionable policy knowledge. In this phase, we conduct an in-depth interpretation of the key feature contribution patterns revealed by the SHAP analysis, contextualizing them within China’s specific socio-economic background and energy policy landscape (e.g., “Targeted Poverty Alleviation,” “Clean Heating Initiative”). By exploring the underlying drivers, we ultimately link the analytical findings to the United Nations Sustainable Development Goals (SDGs), distilling targeted and actionable policy recommendations to advance the broader objectives of energy equity and social sustainable development.

2.3. Methodology

2.3.1. Feature Selection Method

Performing effective feature selection prior to building a predictive model is a critical step for enhancing model efficiency, robustness, and interpretability. This study employs Spearman’s rank correlation coefficient for feature screening, primarily based on the following two considerations: First, many variables in the CFPS data (such as expenditure shares and income) may not satisfy the strict assumption of normal distribution. As a non-parametric statistic, Spearman’s correlation coefficient assesses monotonic relationships rather than strict linear relationships, making it better suited for such data distributions. Second, its calculation is based on variable ranks, rendering it insensitive to outliers and thus capable of more robustly reflecting the underlying association between features and energy poverty status.
In practical application, we computed Spearman’s correlation coefficient and its statistical significance (p-value) between each initial feature and the energy poverty label (y1). A significance level of α = 0.05 was set, and only features exhibiting a statistically significant monotonic relationship with the target variable were retained for subsequent modeling. This process not only reduces the dimensionality of the feature space, effectively mitigating the “curse of dimensionality,” but more importantly, it identifies core drivers of energy poverty supported by statistical evidence. This provides a clear and reliable set of variables for the subsequent in-depth SHAP-based attribution analysis, ensuring that the final policy discussion remains focused on the most influential factors.
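The screening rule described above can be sketched with `scipy.stats.spearmanr`. The data here are synthetic and the feature names are placeholders; only the decision rule (retain a feature when its p-value is below α = 0.05) comes from the text.

```python
import numpy as np
from scipy.stats import spearmanr

def screen_features(X, y, names, alpha=0.05):
    """Keep features whose Spearman correlation with y has p-value < alpha."""
    kept = []
    for j, name in enumerate(names):
        rho, p_value = spearmanr(X[:, j], y)
        if p_value < alpha:
            kept.append((name, float(rho), float(p_value)))
    return kept

# Synthetic example: one feature drives the label, the other is unrelated.
rng = np.random.default_rng(0)
n = 500
signal = rng.normal(size=n)
noise = rng.normal(size=n)
y = (signal + 0.3 * rng.normal(size=n) > 0).astype(int)
X = np.column_stack([signal, noise])

selected = screen_features(X, y, ["signal", "noise"])
for name, rho, p_value in selected:
    print(f"{name}: rho={rho:.3f}, p={p_value:.3g}")
```

Because each feature is tested separately, an unrelated feature can still pass the threshold by chance about 5% of the time; the sketch applies no multiple-testing correction, and the text does not mention one.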

2.3.2. Predictive Model (CNN)

Although Convolutional Neural Networks (CNNs) are traditionally applied to image data, their potential for feature learning in tabular data is increasingly being recognized. This study selects CNN as the core predictive algorithm, primarily due to its theoretical advantage in capturing complex feature interactions. The determinants of energy poverty do not result from simple feature aggregation; rather, they involve intricate nonlinear combinations and high-order interactions among multiple features, such as those between “income level” and “energy efficiency,” or “household size” and “energy expenditure burden.” By leveraging local connectivity and weight sharing mechanisms, one-dimensional convolutional layers in CNNs can autonomously learn and extract these deep feature interaction patterns [38,39]. This gives them, in theory, greater expressive power than tree-based models such as Random Forests or XGBoost.
The architecture of the CNN model constructed in this study is as follows:
Input Layer: Receives the preprocessed and feature-selected feature vectors.
Convolutional Layers: We employ 1D convolutional kernels operating along the feature dimension to extract local feature patterns. The network contains two consecutive convolutional layers (with 32 and 64 filters, respectively, kernel size of 3), each followed by a ReLU activation function to introduce non-linearity.
Pooling Layer: A 1D max-pooling layer (pool size of 2) is applied after the convolutional layers to reduce dimensionality, enhance feature invariance, and control overfitting.
Fully Connected Layers: The output feature maps from the pooling layer are flattened and connected to a fully connected layer containing 100 neurons. Finally, a Sigmoid activation function is used in the output layer for binary classification (energy poor/non-poor) prediction.
Hyperparameters: The model uses the Adam optimizer with a learning rate of 0.001, binary cross-entropy loss function, 100 training epochs, and a batch size of 32.
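To make the layer dimensions concrete, the following NumPy sketch runs a single forward pass through an architecture of this shape. It is not the authors' implementation: the weights are random and untrained, and it assumes seven input features treated as one channel, stride 1 with no padding, and a ReLU activation on the 100-unit dense layer, none of which the text specifies.

```python
import numpy as np

def conv1d(x, w, b):
    """Valid 1D convolution. x: (length, c_in), w: (k, c_in, c_out), b: (c_out,)."""
    k = w.shape[0]
    steps = x.shape[0] - k + 1
    out = np.empty((steps, w.shape[2]))
    for t in range(steps):
        # Contract the (k, c_in) window against the (k, c_in, c_out) kernel.
        out[t] = np.tensordot(x[t:t + k], w, axes=([0, 1], [0, 1])) + b
    return out

def max_pool1d(x, size=2):
    """Non-overlapping max pooling along the length axis."""
    steps = x.shape[0] // size
    return x[:steps * size].reshape(steps, size, -1).max(axis=1)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(n_features=7, seed=0):
    """Random, untrained weights with the layer sizes given in the text."""
    rng = np.random.default_rng(seed)
    flat = ((n_features - 2 - 2) // 2) * 64  # length after two convs and pooling
    return {
        "w1": rng.normal(0, 0.1, (3, 1, 32)), "b1": np.zeros(32),
        "w2": rng.normal(0, 0.1, (3, 32, 64)), "b2": np.zeros(64),
        "w3": rng.normal(0, 0.1, (flat, 100)), "b3": np.zeros(100),
        "w4": rng.normal(0, 0.1, (100, 1)), "b4": np.zeros(1),
    }

def forward(features, p):
    x = np.asarray(features, dtype=float).reshape(-1, 1)  # (7, 1)
    x = relu(conv1d(x, p["w1"], p["b1"]))                 # (5, 32)
    x = relu(conv1d(x, p["w2"], p["b2"]))                 # (3, 64)
    x = max_pool1d(x, 2)                                  # (1, 64)
    x = relu(x.reshape(-1) @ p["w3"] + p["b3"])           # (100,)
    return float(sigmoid(x @ p["w4"] + p["b4"])[0])       # scalar in (0, 1)

params = init_params()
prob = forward([0.2, 0.5, 0.1, 0.9, 0.3, 0.0, 0.4], params)
n_params = sum(v.size for v in params.values())
print(prob, n_params)
```

Under these assumptions, seven features shrink to a length of 5, then 3, then 1 after pooling, so the flattened vector has 64 entries and the network has 12,937 trainable parameters. A different padding choice would change these numbers.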

2.3.3. Interpretability Method (SHAP)

To transform the high-accuracy CNN predictive model into a trustworthy decision-support tool, this study employs the SHAP framework for model interpretability analysis. SHAP originates from the Shapley value in game theory, with its core concept being the interpretation of an individual sample’s prediction as the linear sum of contributions from all features [40,41]. The SHAP value for each feature represents its marginal contribution to a specific prediction relative to the average prediction, thereby fairly allocating the influence of each feature on the final prediction outcome [42].
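For reference, the additive decomposition described above is usually written as follows. The notation is the standard one from the SHAP literature rather than this paper's: $f$ is the model, $M$ the number of features, $F$ the full feature set, and $f_S(x)$ the expected model output when only the features in $S$ are fixed to their values in $x$.

```latex
% Additive explanation: base value plus per-feature contributions
f(x) = \phi_0 + \sum_{i=1}^{M} \phi_i(x), \qquad \phi_0 = \mathbb{E}\left[ f(X) \right]

% Shapley value of feature i: weighted average marginal contribution
\phi_i(x) = \sum_{S \subseteq F \setminus \{i\}}
  \frac{|S|!\,(M - |S| - 1)!}{M!}
  \left[ f_{S \cup \{i\}}(x) - f_S(x) \right]
```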
In this study, SHAP application aims to achieve two core objectives:
Global Interpretation: By analyzing SHAP values across all samples, identify and rank features with consistent importance to model predictions, answering the question: “Overall, which factors are the most critical drivers of energy poverty?”
Local Interpretation: For any individual household prediction, SHAP clearly demonstrates how each feature drives that household’s prediction from the “base value” (average prediction for all households) to the final predicted value, answering the question: “Why does the model classify this specific household as energy poor?”
This combined global and local interpretability capability enables policymakers to not only grasp the main drivers of energy poverty at a macro level but also understand the causative factors for individual households at a micro level. Consequently, it provides unprecedented data-driven insights for designing sustainable development policies that are both inclusive and precisely targeted.
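The local interpretation idea can be illustrated with a brute-force Shapley computation on a toy model. This is only a sketch: it enumerates every feature coalition, which is feasible for a handful of features, whereas practical SHAP explainers approximate these values. The toy predictor and the background rows are invented for illustration and are unrelated to the trained CNN.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, background):
    """Exact Shapley values of predict at x; absent features are averaged over background rows."""
    m = len(x)

    def value(subset):
        # Expected prediction when only the features in `subset` take their values from x.
        total = 0.0
        for row in background:
            mixed = [x[i] if i in subset else row[i] for i in range(m)]
            total += predict(mixed)
        return total / len(background)

    phi = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        contrib = 0.0
        for size in range(m):
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = set(subset)
                contrib += weight * (value(s | {i}) - value(s))
        phi.append(contrib)
    return value(set()), phi

# Hypothetical toy predictor with an interaction between the first two features.
def toy_predict(v):
    return 0.5 * v[0] + 0.2 * v[0] * v[1] - 0.1 * v[2]

background = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 2.0, 0.0]]
x = [1.5, 1.0, 0.5]
base_value, phi = shapley_values(toy_predict, x, background)
print(base_value, phi, toy_predict(x))
```

The base value plus the sum of the contributions reproduces the prediction for `x`, which is the property that lets a household's prediction be read as a walk from the average prediction to its final value.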

2.3.4. Rationale for Model Selection

Although certain traditional methods, such as XGBoost and LightGBM, have demonstrated strong performance on structured data, energy poverty is characterized by significant nonlinear interactive features, such as the combined effects of expenditure structure and demographic attributes. This is why we opted for a CNN, which is better suited to capturing local feature interactions and nonlinear relationships. To mitigate overfitting in the deep learning model, this study employs cross-validation, regularization, weight decay, and early stopping strategies. The performance differences between the training and test sets are reported to demonstrate model stability.

2.3.5. Benchmark Models

To rigorously evaluate the performance of our proposed model, we compared it against a diverse set of seven mainstream machine learning classifiers. This selection covers a wide spectrum of algorithmic families, from simple linear models to complex ensemble methods, ensuring a comprehensive benchmarking. The models, their key characteristics, and the rationale for their inclusion are as follows:
(1) Logistic Regression (LR): A linear baseline model. It is included to establish a simple, interpretable benchmark and to highlight the potential non-linearity in the data if more complex models perform significantly better.
(2) k-Nearest Neighbors (KNN): An instance-based learning algorithm. It is sensitive to the local structure of the data and serves as a non-parametric benchmark.
(3) Support Vector Machine (SVM): A powerful classifier effective in high-dimensional spaces. We used a linear kernel for simplicity and computational efficiency, representing a maximum-margin classifier.
(4) Random Forest (RF): A robust bagging ensemble of decision trees. It is known for its high accuracy and resistance to overfitting, making it a strong benchmark for structured data.
(5) Classification and Regression Tree (CART): A single decision tree model. It provides a simple, interpretable benchmark against which the ensemble and deep learning models can be compared.
(6) eXtreme Gradient Boosting (XGBoost): A highly efficient and effective gradient boosting framework. It is often a top performer in tabular data competitions and represents the state-of-the-art in tree-based ensembles.
(7) Light Gradient Boosting Machine (LightGBM): Another high-performance gradient boosting framework, optimized for speed and memory efficiency. Its inclusion allows for a comparison with XGBoost to assess the impact of different boosting implementations.

2.4. Experimental Setup and Evaluation Metrics

The CFPS is a longitudinal survey, so the same households can appear across multiple waves. To prevent data leakage and ensure a robust evaluation of the model’s ability to generalize to new, unseen households, we treated each annual dataset as an independent cross-section for the purpose of model training and testing. Specifically, when splitting the data for a given year into training and test sets (7:3), we ensured that all observations from a single household were allocated entirely to either the training set or the test set for that year. This prevents the model from being trained and evaluated on the same household within a given year, thereby providing a more realistic and conservative estimate of performance when deploying the model on truly new household data.
To ensure a fair and robust evaluation of model performance, the following experimental setup was implemented in this study:
Data Splitting: For each annual dataset (2014, 2016, 2018, 2020), the data were randomly divided into a training set and an independent test set using a 7:3 ratio. The training set was utilized for model construction and parameter learning, while the test set was reserved for the final performance evaluation, ensuring that the results reflect the model’s generalization capability.
Model Validation: On the training set, a 10-fold cross-validation strategy was employed to optimize model hyperparameters and assess the stability of the training process. This method involves randomly partitioning the training data into 10 subsets, iteratively using 9 subsets for training and the remaining 1 for validation over 10 cycles. This approach makes efficient use of limited data while providing more reliable performance estimates.
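The splitting and validation scheme can be sketched in NumPy as follows. The household identifiers are synthetic and the helper names are hypothetical. The sketch assumes each household contributes one row per annual cross-section, so row-level folds inside the training set do not split a household.

```python
import numpy as np

def split_by_household(household_ids, test_share=0.3, seed=0):
    """Return boolean train/test masks with each household on one side only."""
    ids = np.asarray(household_ids)
    unique = np.unique(ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(unique)
    n_test = int(round(test_share * len(unique)))
    test_mask = np.isin(ids, unique[:n_test])
    return ~test_mask, test_mask

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    order = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(order, k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

# Synthetic annual cross-section of 1000 households.
household_ids = np.arange(1000)
train_mask, test_mask = split_by_household(household_ids)
folds = list(k_fold_indices(int(train_mask.sum())))
print(train_mask.sum(), test_mask.sum(), len(folds))  # 700 300 10
```

The test mask is set aside before any fold is drawn, so hyperparameter tuning sees only the training portion.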
Evaluation Metrics: Given that energy poverty prediction is a binary classification task, a comprehensive set of metrics was adopted for thorough assessment:
Accuracy: The proportion of correctly classified instances overall, measuring the model’s general performance.
Precision: The proportion of truly energy poor households among those predicted as energy poor, focusing on the model’s cost of misclassification (i.e., avoiding incorrect labeling of non-poor households as poor).
Recall: The proportion of actual energy poor households successfully identified by the model, emphasizing the model’s coverage capability (i.e., minimizing the omission of genuine poor households).
F1-Score: The harmonic mean of precision and recall, providing a balanced evaluation of the model’s performance on the positive class (energy poverty).
AUC (Area Under the ROC Curve): The area under the Receiver Operating Characteristic curve, measuring the model’s ability to distinguish between “energy poor” and “non-energy poor” households. A value closer to 1 indicates superior model performance.
Together, these metrics form a rigorous evaluation framework, ensuring that the proposed EPPE-FCS model is not only statistically accurate but also highly reliable and practical in addressing the societal challenge of energy poverty.
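The metrics listed above can be computed from first principles as in the sketch below, which uses only the Python standard library. Because this study codes energy poor households as 0, the positive class is passed explicitly. The labels and scores are invented for illustration, and `scores` is assumed to be the model's estimated probability of being energy poor.

```python
def classification_metrics(y_true, y_pred, positive=0):
    """Accuracy, precision, recall, and F1 with an explicit positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def auc(y_true, scores, positive=0):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative labels: 0 = energy poor, 1 = non-energy poor.
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1, 0.35, 0.6]  # assumed P(energy poor)

metrics = classification_metrics(y_true, y_pred)
area = auc(y_true, scores)
print(metrics, area)
```

In this toy example two of the three energy poor households are identified and one non-poor household is mislabeled as poor, giving an accuracy of 0.75 and a precision and recall of 2/3 each.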
To prevent potential misinterpretations of the model’s high accuracy, we have further examined its rationale.
The high accuracy may be attributed to the following factors:
(1) The energy poverty indicator itself is a relatively stable and clearly defined binary classification;
(2) The seven features retained after Spearman correlation screening exhibit strong discriminative power for identifying poverty;
(3) The CNN is capable of capturing interactions among features.
Therefore, the high accuracy does not indicate an anomaly. Nonetheless, we also note in the discussion that further validation should be conducted to address potential overfitting.

3. Results

3.1. Feature Selection Results

To construct a streamlined and effective predictive model, we first performed feature selection on the initial feature set using Spearman correlation analysis. Figure A1 presents heatmaps of the correlation coefficients between each feature and energy poverty status (y1) across the four datasets from 2014 to 2020.
The analysis reveals a highly consistent pattern of associations between features and the target variable across different years. Specifically, annual net income per capita (x3), modern fuel usage (x5), pipeline facilities (x6), urban-rural division (x11), and education level (x12) generally exhibited low correlation coefficients with energy poverty status that were not statistically significant (p-value > 0.05). Consequently, these features were excluded from the final predictive model, as their weak association with the target variable suggested limited predictive power within our modeling framework.
The final optimal feature subset comprises seven retained features: expenditure logarithm (x1), household size (x2), annual expenditure per capita (x4), housing expenditure share (x7), electricity expenditure share (x8), gas expenditure share (x9), and heating expenditure share (x10). This subset captures crucial dimensions of household economic capacity, basic living burdens, and core energy consumption structure. With the feature count reduced to 58.33% of the original set (7 of 12 features), this selection not only optimizes computational efficiency but, more importantly, ensures that subsequent prediction and attribution analysis focus on a set of statistically significant core drivers closely associated with energy poverty.
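The screening rule can be illustrated with a small standard-library sketch of Spearman's rank correlation. Several things here are assumptions rather than details from the study: the helper names, the toy data, and the p-value, which uses a large-sample normal approximation (under the null hypothesis, rho times the square root of n − 1 is approximately standard normal) because the text does not state how significance was computed. With only ten toy observations the approximation is crude and serves only to show the mechanics.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def spearman(x, y):
    """Spearman's rho plus an approximate two-sided p-value (normal approx.)."""
    rho = pearson(average_ranks(x), average_ranks(y))
    z = abs(rho) * math.sqrt(len(x) - 1)
    p_value = math.erfc(z / math.sqrt(2))
    return rho, p_value

# Toy data (hypothetical): y1 = 1 marks an energy poor household.
y1 = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
features = {
    "x4_like": [9, 8, 10, 7, 6, 3, 2, 4, 1, 5],    # strongly tied to y1
    "noise_like": [1, 2, 1, 2, 1, 2, 1, 2, 1, 2],  # weakly tied to y1
}

results = {name: spearman(col, y1) for name, col in features.items()}
retained = [name for name, (rho, p) in results.items() if p <= 0.05]

# Share of features kept in the study: 7 of the original 12.
retained_share = round(7 / 12 * 100, 2)
```

Only the feature whose association passes the 0.05 threshold is kept, mirroring the rule that removed x3, x5, x6, x11, and x12.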

3.2. Model Predictive Performance

To evaluate the predictive efficacy of the EPPE-FCS model, we conducted a comprehensive comparison against eight mainstream machine learning models using independent test sets, with the results summarized in Table 2.
The experimental results show that the proposed EPPE-FCS model achieved optimal or near-optimal overall performance across all datasets from 2014 to 2020. The model attained prediction accuracies of 98.95%, 98.15%, 97.86%, and 97.94% for 2014, 2016, 2018, and 2020, respectively, an improvement of approximately 0.67 to 1.3 percentage points over the second-best performing models (such as XGBoost and LightGBM). The performance gain was substantially more pronounced relative to baseline models (e.g., CART and LR), with improvements exceeding 20%. Although the improvement over XGBoost appears marginal, it carries substantial practical significance in large-scale policy applications. With tens of millions of households in China, a one-percentage-point increase in accuracy implies that tens to hundreds of thousands of households that might otherwise be misclassified can be correctly identified. For these households, correct classification determines whether they receive crucial policy support, a matter that directly affects both social equity and the efficiency of resource allocation. Therefore, the precision improvement achieved by the CNN-based model holds considerable practical value in policy contexts that strive for precision governance.
Beyond its exceptional accuracy, the EPPE-FCS model exhibited particularly outstanding performance in terms of precision, surpassing 99% in most years. This indicates an extremely low false positive rate when the model identifies a household as energy poor. To evaluate the model’s generalization capability and assess overfitting risks, we compared prediction accuracy between the training and test sets. As shown in Table 2, the model demonstrated strong and consistent performance on the independent test set. Further quantitative comparisons revealed only minimal differences in average accuracy between the training and test sets across years: 2014 (99.10% vs. 98.95%), 2016 (98.40% vs. 98.15%), 2018 (98.00% vs. 97.86%), and 2020 (98.15% vs. 97.94%). The accuracy gap in each year did not exceed 0.25 percentage points, well below the empirical threshold of 5 percentage points. These results indicate that the EPPE-FCS model does not over-rely on patterns in the training data, generalizes robustly, and keeps overfitting risks under control. This characteristic is critically important for policy applications, as it helps ensure that limited poverty alleviation resources are directed to households genuinely in need, thereby avoiding misallocation of public funds. Such precision enhances both the efficiency and equity of poverty reduction policies, which is a key component of sustainable social governance.
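The train-test comparison reduces to simple arithmetic, reproduced here from the accuracies quoted in the text. The household totals used for the scale illustration are hypothetical round numbers, not figures from the study.

```python
# Accuracies (%) quoted in the text for each survey wave.
train_acc = {2014: 99.10, 2016: 98.40, 2018: 98.00, 2020: 98.15}
test_acc = {2014: 98.95, 2016: 98.15, 2018: 97.86, 2020: 97.94}

# Train-minus-test gap per year, in percentage points.
gaps = {year: round(train_acc[year] - test_acc[year], 2) for year in train_acc}
max_gap = max(gaps.values())

# Mean test accuracy across the four waves (about 98.2%).
mean_test_acc = sum(test_acc.values()) / len(test_acc)

# Scale illustration: households affected by one percentage point of accuracy,
# for two hypothetical population sizes.
for households in (10_000_000, 50_000_000):
    affected = households // 100
    print(f"{households:,} households -> {affected:,} per percentage point")

print(gaps, max_gap)
```

The largest gap occurs in 2016, and every gap stays far below the 5-point rule of thumb cited above.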

3.3. Interpretability Analysis Results

3.3.1. Global Feature Importance

The global feature importance analysis based on SHAP values (Figure 2) reveals the core factors influencing energy poverty predictions. Across all years, annual household expenditure per capita (x4) consistently emerges as the most influential feature, with SHAP values substantially higher than other indicators, underscoring the fundamental role of absolute economic capacity in mitigating energy poverty.
The key factors following in importance are energy expenditure burdens, particularly the share of gas expenses (x9) and electricity expenses (x8). Notably, a longitudinal comparison of results from 2014 to 2020 shows that the global importance of the gas expense share exhibits a marked declining trend. This dynamic change may indicate that external factors (such as energy structure transformation, energy efficiency improvements, or subsidy policies) are altering the strength of the association between this variable and energy poverty. Furthermore, the share of heating expenses (x10) and housing expenditure share (x7) demonstrate stable contributions, while household size (x2) and expenditure logarithm (x1) exhibit relatively weaker impacts.
To quantitatively illustrate the degree of feature importance and its dynamic evolution, Table 3 presents the mean absolute SHAP values of key features for each year. A larger SHAP value indicates a greater overall influence of the feature on the model’s output. The data clearly show that annual expenditure per capita (x4) consistently exhibits the highest SHAP value across all years, confirming its role as the most influential driver. Meanwhile, the SHAP value for gas expenditure share (x9) demonstrates a marked decline, decreasing from 0.21 in 2014 to 0.12 in 2020. This quantified trend provides clear numerical evidence of its substantially diminished importance over the study period.
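As a minimal sketch of how such a table is derived, global importance is the mean of the absolute per-sample SHAP values for each feature. The SHAP matrix below is invented for illustration; only the two endpoint values for gas expenditure share (0.21 and 0.12) come from the text.

```python
def mean_abs_shap(shap_matrix, names):
    """Mean absolute SHAP value per feature (rows = samples, columns = features)."""
    n = len(shap_matrix)
    return {name: sum(abs(row[j]) for row in shap_matrix) / n
            for j, name in enumerate(names)}

# Hypothetical per-sample SHAP values for three features.
names = ["x4", "x9", "x8"]
shap_matrix = [
    [-0.30, 0.20, 0.10],
    [0.40, -0.10, 0.05],
    [-0.20, 0.30, -0.15],
    [0.50, 0.20, 0.10],
]

importance = mean_abs_shap(shap_matrix, names)
ranking = sorted(importance, key=importance.get, reverse=True)

# Relative decline of the gas share importance reported in the text.
x9_2014, x9_2020 = 0.21, 0.12
relative_decline = (x9_2014 - x9_2020) / x9_2014  # roughly 0.43
```

Taking absolute values before averaging matters: positive and negative contributions would otherwise cancel, and a feature that pushes predictions strongly in both directions would look unimportant.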

3.3.2. Feature Effect Direction and Mechanisms

The SHAP summary plots (Figure 3) provide deeper insights into the direction and mechanisms through which key features influence prediction outcomes. Panels (a) to (d) present the results for 2014, 2016, 2018, and 2020, respectively. The vertical axis lists features in descending order of importance (measured by mean absolute SHAP values), while the horizontal axis represents the SHAP value of each feature. Each point corresponds to an individual sample, with its horizontal position indicating the SHAP value of the feature for that sample and its color reflecting the magnitude of the original feature value (typically red for high and blue for low values). The plots reveal how feature values influence model predictions across samples:
Annual Household Expenditure per Capita (x4): Exhibits a clear negative relationship. Higher expenditure levels correspond to lower SHAP values, indicating a decreased probability of being classified as energy poor. This alignment with economic principles validates the model’s logical consistency.
Gas and Electricity Expense Shares (x9, x8): Demonstrate significant positive effects. Increasing proportions of these energy costs within total household expenditure substantially elevate the predicted risk of energy poverty (higher SHAP values). This visually confirms the “energy burden” mechanism—excessive energy expenses displace other essential household expenditures, thereby increasing poverty vulnerability.
Heating Expense Share (x10): Similarly shows a positive relationship, though its impact is relatively moderate compared to gas and electricity shares, as evidenced by its more concentrated value distribution in the plots.
This study not only provides an overall SHAP-based importance analysis but also complements it with directional interpretation of features to reveal underlying mechanisms. Furthermore, by contextualizing the results within national policy frameworks and energy market dynamics, we deliver insights that go beyond mere description and offer substantive explanatory value.
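One simple way to summarize the direction read off a summary plot is the sign of the correlation between a feature's values and its SHAP values. The sketch below assumes the convention used in this subsection (higher SHAP values push toward the energy poor class); the helper names and all numbers are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def effect_direction(feature_values, shap_values):
    """Label the feature effect by the sign of corr(feature value, SHAP value)."""
    r = pearson(feature_values, shap_values)
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "flat"

# Hypothetical samples: expenditure per capita (yuan) and gas expense share.
expenditure = [1000, 3000, 8000, 20000]
expenditure_shap = [0.25, 0.10, -0.05, -0.30]
gas_share = [0.01, 0.03, 0.06, 0.10]
gas_share_shap = [-0.05, 0.00, 0.04, 0.12]

x4_direction = effect_direction(expenditure, expenditure_shap)
x9_direction = effect_direction(gas_share, gas_share_shap)
```

A single correlation sign hides non-monotonic effects, so it complements the summary plot rather than replacing it.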

3.3.3. Local Interpretation: Representative Case Studies

To fully demonstrate the explanatory power of the SHAP framework at the micro level and provide specific insights for targeted policy interventions, we conducted in-depth case studies on three representative samples from the year 2020. These cases clearly reveal different driving patterns of energy poverty (Figure 4).
Case 1 (Energy-Affluent Sample) reaffirms the decisive protective role of high income. The sample’s exceptionally high per capita expenditure (x4 = 36,900 yuan) contributed the largest positive SHAP value (+0.32), completely offsetting the potential risk associated with its high share of electricity expenditure (x8). This case indicates that when a household’s economic capacity is sufficiently strong, energy burden no longer constitutes a poverty threat, underscoring from a different perspective the effectiveness of increasing household income as a fundamental solution.
Case 2 (Sample Near the Energy Poverty Line) reveals the potential link between housing costs and energy poverty. This sample had a moderate per capita expenditure (x4 = 3631.25 yuan), but its high share of housing expenditure (x7 = 0.255) emerged as the largest negative driving factor, significantly increasing its risk of falling into energy poverty. This suggests that, under budget constraints, housing costs may exert a severe “crowding-out effect” on energy expenditures. This case implies that housing security policies and energy poverty alleviation strategies need to be designed in a coordinated manner.
Case 3 (Severe Energy Poverty Sample) demonstrates the dominant influence of absolute income deprivation. The sample’s extremely low annual per capita expenditure (x4 = 693.4 yuan) was the primary factor pushing its predicted probability toward energy poverty (SHAP value: −0.25). Additionally, its zero expenditure on gas and heating (x9, x10) further reinforced its impoverished status, potentially indicating exclusion from modern clean energy services and a possible reliance on traditional solid fuels. This case corroborates the central role of economic capacity identified in the global analysis and highlights the issue of energy accessibility.
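Local explanations of this kind rest on the additive property of Shapley values: the per-feature contributions for one household sum to the difference between its prediction and a reference prediction. The sketch below computes exact Shapley values for a three-feature toy model by enumerating feature orderings. The scoring function, its coefficients, the single reference household, and the electricity share are all hypothetical; only the x4 and x7 values are borrowed from Case 2. SHAP implementations typically average over a background dataset rather than one reference point, and nothing here reproduces the EPPE-FCS model.

```python
from itertools import permutations

def toy_risk(v):
    """Hypothetical risk score; higher means more likely energy poor."""
    expenditure, housing_share, electricity_share = v
    return (0.6 - 0.00002 * expenditure + 0.8 * housing_share
            + 1.0 * electricity_share
            + 2.0 * housing_share * electricity_share)

def shapley_values(model, x, baseline):
    """Exact Shapley values: average marginal contribution of each feature
    over all orderings, switching features from baseline to x one at a time."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = model(current)
        for i in order:
            current[i] = x[i]
            updated = model(current)
            phi[i] += updated - previous
            previous = updated
    return [total / len(orderings) for total in phi]

# x4 and x7 as in Case 2; the electricity share (0.08) is an assumption.
household = [3631.25, 0.255, 0.08]
# Hypothetical reference household used as the baseline.
reference = [5000.0, 0.10, 0.05]

phi = shapley_values(toy_risk, household, reference)
explained = sum(phi)
difference = toy_risk(household) - toy_risk(reference)
```

Here `explained` matches `difference` up to floating-point error, which is the property that lets a force plot such as Figure 4 decompose one prediction into per-feature pushes. Enumerating orderings is only feasible for a handful of features, which is why practical tools rely on approximations.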

4. Discussion

The discussion in this study is grounded not only in our empirical results but also in a critical analysis of existing domestic and international literature. We further seek to explain the mechanisms underlying the interpretable outputs of the model and their policy implications. Our findings align with prior studies from Europe, South Asia, and Africa in identifying income capability and energy burden as the two core drivers of energy poverty [43,44]. However, unlike these regions, China underwent rapid energy structure transitions and infrastructure improvements during the study period, which provides a distinctive context for interpreting the observed dynamic changes in feature importance.
By developing an interpretable AI framework, this study not only achieves high predictive accuracy for energy poverty but, more importantly, offers an interpretable and quantitative perspective for understanding the intrinsic drivers of energy poverty and their complex interactions with Sustainable Development Goals. The following sections will discuss the core findings in detail and clarify their policy implications.

4.1. Core Drivers of Energy Poverty: The Dual Interplay of Economic Capacity and Energy Burden

The interpretability analysis in this study clearly reveals that annual household expenditure per capita and energy expenditure shares (particularly for electricity and gas) are the two most critical dimensions driving energy poverty. This finding confirms the dual nature of the energy poverty issue: it is simultaneously an issue of “poverty” stemming from insufficient income and an “energy” problem arising from excessively high energy costs or low energy efficiency.
Firstly, the overwhelming importance of annual household expenditure per capita, serving as a proxy for absolute economic capacity, indicates that increasing residents’ disposable income remains the most fundamental pathway to alleviating energy poverty. This aligns directly with the core objective of Sustainable Development Goal (SDG) 1 (No Poverty). A household with sufficient income inherently possesses greater resilience to cope with energy price fluctuations or increases in basic energy requirements.
Furthermore, a finding with greater policy relevance emerges from the interpretation of energy expenditure shares. SHAP analysis demonstrates that, even after accounting for overall expenditure levels, the energy burden itself constitutes an independent poverty-inducing factor. When the proportion of household expenditure allocated to energy becomes excessively high, it directly crowds out spending on other essential needs such as food, healthcare, and education, creating a vicious cycle of “energy expenditure—other deprivations.” This quantitative finding empirically solidifies conventional qualitative understanding, emphasizing that any strategy aimed at resolving energy poverty must concurrently address both “income enhancement” (increasing resources) and “burden reduction” (lowering the energy burden).
Unlike studies employing the LIHC or energy share metrics, this research adopts a needs-based energy poverty line, thereby more accurately capturing the expenditure gap of Chinese households in meeting basic energy needs. Notably, although SHAP analysis indicates that energy burden-related features exert a significant positive influence on poverty risk, the magnitude of this effect demonstrates a declining trend over time. This pattern aligns with changes observed in European countries following the implementation of subsidy policies, as documented in policy evaluation studies such as those in Greece. Consequently, the temporal analysis in this study enriches the existing literature by providing empirical evidence on how policies can reshape the drivers of energy poverty.

4.2. Dynamic Evolution and Policy Effectiveness: A Positive Signal

An encouraging finding is the clear declining trend in the global importance of the gas expense share (x9) from 2014 to 2020. This dynamic change is unlikely to be coincidental and appears closely associated with a series of robust national policies implemented in China during this period. For instance, the “13th Five-Year Plan” period saw vigorous promotion of clean energy substitution projects such as “coal-to-electricity” and “coal-to-gas,” alongside continuous upgrades to rural power grids. These initiatives significantly enhanced the accessibility and reliability of energy infrastructure. Simultaneously, government subsidies for clean energy adoption and energy-efficient equipment likely contributed to reducing households’ actual gas expenditure burden.
This trend demonstrates that well-targeted energy and infrastructure policies can effectively reshape the drivers of energy poverty. It confirms the feasibility and effectiveness of directly reducing residents’ energy costs through technological interventions and public policy. This provides micro-level evidence from China’s experience for achieving SDG 7 (Affordable and Clean Energy): improving energy accessibility and affordability directly alleviates energy poverty.

4.3. Policy Implications: From Universal Support to Targeted Intervention

The findings of this study offer clear directions for designing more sustainable and precise poverty alleviation policies:
Establishing a Tiered Targeting Mechanism: For households identified as energy poor by the model, policies should focus on direct energy cost relief. Examples include issuing “energy vouchers” specifically for electricity and gas expenses, or implementing tiered price subsidy systems to ensure the affordability of basic living energy needs. Such interventions directly address the key lever of “energy burden” with high efficiency and rapid impact.
Promoting Fundamental Capacity Building: In the long term, aligned with SDG 8 (Decent Work and Economic Growth), it is essential to continuously enhance households’ income-generating capacity through vocational skills training and creation of local employment opportunities. This approach aims to fundamentally address the insufficiency of “annual household expenditure per capita” and strengthen endogenous resilience against various risks.
Deepening the Sustainable Energy Transition: The declining importance of the gas expense share underscores the significant value of continued investment in clean energy infrastructure and promotion of high-efficiency appliances. Future policies should increase focus on distributed renewable energy applications in both urban and rural areas. This not only reduces energy costs but also decreases dependence on fossil fuels, simultaneously addressing climate change (SDG 13) and achieving dual environmental and social sustainability.

4.4. Potential Implications of Other Features

Although education level (x12) was excluded during feature selection, this may reflect measurement limitations of the educational variables in the CFPS dataset. Existing research has shown that education indirectly influences energy poverty by affecting employment quality and energy cognition [45]. Future studies should consider incorporating more refined indicators of education.
Similarly, the urban-rural variable (x11) was not retained in the final model. This could stem from a convergence in the driving mechanisms of energy poverty amid China’s ongoing urban-rural integration. Alternatively, it might indicate the model’s limited capacity to capture geographical heterogeneity.

4.5. Implications for Sustainable Development Goals

The interpretable AI framework developed in this study provides value not only by accurately identifying energy poverty but also by offering, based on Chinese microdata, empirical evidence and implementation pathways for understanding and advancing the United Nations Sustainable Development Goals (SDGs). Our findings resonate deeply with multiple SDG targets and specific indicators, with key implications outlined below:
First, this study is closely linked to SDG 7 (Ensure access to affordable, reliable, sustainable, and modern energy for all). Our SHAP analysis reveals that the importance of energy burden, represented by the share of gas expenditure, showed a declining trend from 2014 to 2020. This trend aligns temporally with China’s substantial efforts during this period to promote clean heating initiatives and upgrade rural power grids. It thereby provides micro-level evidence for assessing progress toward SDG Indicators 7.1.1 (Proportion of population with access to electricity) and 7.1.2 (Proportion of population with primary reliance on clean fuels and technology). This suggests that enhancing the accessibility and reliability of modern energy through infrastructure investment and policy interventions can directly and effectively reduce household energy burdens, serving as a key means to escape the energy poverty trap. The XAI tools provided in this study can, in the future, be used to dynamically monitor the progress of these sub-targets and accurately identify regions and populations still in need of support.
Second, our findings strongly reinforce the core tenets of SDG 1 (End poverty in all its forms everywhere). The empirical results indicate that annual household expenditure per capita is the most dominant predictor of energy poverty. This quantitative finding echoes the premise underlying SDG Targets 1.1 (Eradicate extreme poverty globally) and 1.2 (Reduce poverty in all its dimensions according to national definitions), namely that poverty is a multidimensional phenomenon. Energy poverty, as a specific manifestation of poverty in the energy domain, can only be fundamentally alleviated by enhancing households’ overall economic capacity. Our analysis indicates that a lack of sufficient economic resources is the primary driver of household energy poverty. Therefore, any strategy aimed at eliminating energy poverty must be integrated with broader income-increasing policies, such as those advocated under SDG 8 (Promote sustained, inclusive, and sustainable economic growth), to fundamentally strengthen households’ resilience and consumption capacity.
Finally, the study’s findings reveal the potential for synergistic action addressing both climate change and energy poverty, offering an integrated governance pathway for SDG 13 (Take urgent action to combat climate change and its impacts). The optimization in energy structure observed in the SHAP analysis, such as the declining share of gas expenditure, aligns with the broader transition toward reducing dependence on traditional fossil fuels and promoting clean energy. This reflects the integration of SDG 13.2 (Integrate climate change measures into national policies, strategies, and planning) within China’s energy and social policies. Our research demonstrates that interventions aimed at alleviating energy poverty, such as promoting clean energy and improving energy efficiency, can also yield emission reduction benefits by optimizing the energy consumption structure, thus achieving a dual-win outcome of poverty alleviation and emission reduction. The dual-track policy emphasized in this study, which equally prioritizes “income enhancement” and “clean transition,” serves as a concrete embodiment of this synergy.
Taken together, the analytical framework and empirical findings presented in this study not only provide a new methodological tool for the precise identification of energy poverty but also reveal, at a strategic level, the intrinsic link between energy poverty governance and the realization of the 2030 Sustainable Development Agenda.

4.6. Consideration of Regional Heterogeneity

While this study provides a national-level analysis using the nationally representative CFPS data, we acknowledge the significant regional disparities in energy poverty within China, as highlighted by prior research. Our model, trained on pooled national data, aims to identify common household-level determinants. However, the relative importance of drivers like energy burden or the effectiveness of specific policies (e.g., clean heating) likely varies across provinces, particularly between the more developed eastern regions and the less developed western regions. The current cross-sectional treatment of waves limits our ability to analyze these spatial dynamics in depth. Future research should explicitly incorporate geographical variables (e.g., province dummies, climatic zones) or employ spatial analysis/GIS techniques to stratify results and develop spatially targeted policy recommendations, thereby enhancing the practical relevance of AI-driven poverty identification tools.

4.7. Limitations and Future Research Directions

This study has several limitations. First, while the CFPS data are nationally representative, the questionnaire design limited the inclusion of some potentially important characteristics, such as housing insulation performance and specific household energy-use behaviors. Second, the study primarily provides static and retrospective explanations; future research could develop real-time prediction and dynamic early warning systems. Third, our analytical approach treated each survey wave as an independent cross-section to prevent data leakage. While this aligns with a common deployment scenario, it does not fully leverage the longitudinal nature of the CFPS data to explore household trajectories into and out of energy poverty. Future research should employ panel data models (e.g., fixed-effects models, LSTM for sequence prediction) to capture these dynamics and strengthen causal inference. Fourth, as noted in Section 4.6, incorporating regional heterogeneity through geographical stratification or geospatial tools represents a critical avenue for extending the policy relevance of our findings.
Finally, a fundamental limitation of this study, and of post hoc explanation methods like SHAP in general, lies in the distinction between correlation and causation. Our model identifies features that are predictively important for identifying energy poverty within the context of the model and the data it was trained on. The ‘explanations’ provided by SHAP are, strictly speaking, explanations of the model’s behavior, not definitive explanations of the real-world phenomenon itself. While we contextualize these explanations with domain knowledge (e.g., linking the decline in gas share importance to policies), such interpretations are inferential and speculative. The model cannot rule out confounding variables (e.g., concurrent macroeconomic trends, changes in energy-consuming appliances, or regional price variations not captured in our data) that might be the true underlying drivers. Therefore, the insights generated should be viewed as well-grounded hypotheses about the drivers of energy poverty, which require validation through causal inference methods (e.g., quasi-experimental designs, instrumental variables) in future research.
Building upon this foundation, future work will pursue three key directions: (1) deeper integration with GIS technology to investigate the spatiotemporal evolution of energy poverty and identify its spatial agglomeration and diffusion pathways; (2) incorporation of detailed “household energy consumption behavior data” to align with SDG 7.1 indicators on access to modern energy services; and (3) the development of dynamic poverty lines that adapt automatically with socio-economic development.
Ultimately, this evolution of our research program aims to construct a more intelligent and spatialized “identification-early warning-intervention” integrated governance system. The explainable AI framework developed in this study serves as the crucial foundational component for such a system. By demonstrating how high-precision prediction and transparent explanation can be fused, this work provides a scalable, data-driven “Chinese solution” for the localized implementation of global sustainable development goals, offering a powerful tool for sustainable social governance.

5. Conclusions and Policy Implications

This study developed an interpretable artificial intelligence framework integrating CNN and SHAP techniques, using Chinese household microdata from 2014 to 2020, to systematically analyze the identification and driving factors of energy poverty. The results demonstrate that the framework not only achieves strong predictive performance but also provides transparent and interpretable feature contributions, offering an evidence-based tool for targeted energy poverty governance and policy formulation.
The main findings include:
(1) The model achieved high accuracy across all years, indicating that a framework combining deep learning with interpretability techniques exhibits robustness and practical potential in energy poverty identification;
(2) Household expenditure per capita consistently emerged as the most critical determinant, highlighting the foundational role of economic capacity in household energy well-being;
(3) Energy burden indicators (such as the share of electricity and gas expenditures) continued to exhibit independent and significant poverty-inducing effects even after controlling for economic capacity, indicating that the energy pricing system and household energy structure remain important risk sources;
(4) The importance of gas expenditure showed a declining trend over time, aligning with China’s recent clean heating initiatives, expansion of gas infrastructure, and diversified subsidy policies, reflecting the positive impact of policy interventions in improving household energy access.
Based on these findings, this study proposes the following policy recommendations:
(1) Enhance targeted energy subsidies for low-income households, with particular focus on groups facing high energy expenditure burdens, to alleviate structural risks of energy poverty;
(2) Promote the adoption of distributed energy resources and high-efficiency appliances to reduce household energy expenditure pressure by improving energy efficiency and optimizing energy structure;
(3) Implement regionally differentiated energy governance strategies, developing tailored policy solutions according to the specific needs of urban and rural areas, as well as northern heating zones and southern regions;
(4) Establish long-term monitoring and evaluation mechanisms to continuously track trends in household energy poverty indicators, linking them with SDG 7.1 (universal access to modern energy services) and SDG 1.1 (eradicating extreme poverty) to provide data support for achieving sustainable energy transition.
This study contributes to the field in three key ways: it demonstrates the value of interpretable AI in energy poverty identification, provides a new methodological reference for future research, and offers empirical evidence to support the design of precise, efficient energy intervention policies.

Author Contributions

Conceptualization, H.Q., X.Q. and Y.S.; Methodology, H.Q., X.Q., J.Z., J.Y., L.R. and Y.S.; Software, H.Q., X.Q., Q.X., J.Y. and Y.S.; Data Curation, Q.X.; Writing—original draft, H.Q., X.Q., Q.X. and Y.S.; Writing—review and editing, X.Q., Q.X. and Y.S.; Supervision, H.Q., X.Q. and Y.S.; Funding acquisition, H.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Humanities and Social Sciences Research Foundation of the Ministry of Education (25YJCZH200; 23YJAZH118); the Basic Research Plan of Shanxi Province (Free Exploration) Project (202403021221193); the Shanxi Province Science and Technology Innovation Programme for Higher Education Institutions (2023L248); and the Taiyuan Normal University Achievement Transformation and Technology Transfer Base (2023P003).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this experiment are sourced from the China Family Panel Studies (https://www.isss.pku.edu.cn/cfps/sjzx/gksj/index.htm, accessed on 20 October 2025) and are freely accessible upon registration.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
SDGs: Sustainable Development Goals
XAI: Explainable Artificial Intelligence
SHAP: SHapley Additive exPlanations
LR: Logistic Regression
KNN: k-Nearest Neighbor
SVM: Support Vector Machines
RF: Random Forest
CART: Classification and Regression Tree
XGBoost: eXtreme Gradient Boosting
LightGBM: Light Gradient Boosting Machine
CNN: Convolutional Neural Network
EPPE-FCS: Energy Poverty Prediction and Explanation Framework with CNN and SHAP
CFPS: China Family Panel Studies

Appendix A

Figure A1. Spearman Correlation Heatmap: Subfigures (a–d) show the Spearman correlation heatmaps for the respective research variables in 2014, 2016, 2018, and 2020.
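As a rough illustration of how a Spearman correlation matrix such as the one behind Figure A1 can be computed, the sketch below rank-transforms each variable (averaging ranks for ties) and takes the Pearson correlation of the ranks. The column names and values are hypothetical toy data, not CFPS records, and the function names are illustrative rather than the authors' code.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

def spearman_matrix(columns):
    """Pairwise Spearman correlations for a dict of equal-length columns."""
    names = list(columns)
    return {a: {b: spearman(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical toy data for three of the study's variable types.
toy = {
    "x2_household_size": [1, 2, 3, 4, 5, 6],
    "x4_expenditure_pc": [30000, 26000, 21000, 15000, 12000, 9000],
    "x8_electricity_share": [0.02, 0.03, 0.03, 0.04, 0.06, 0.05],
}
matrix = spearman_matrix(toy)
```

Because the statistic depends only on ranks, it is insensitive to the heavy right tails visible in the income and expenditure variables, which is one common reason to prefer it over Pearson correlation for survey data of this kind.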

References

  1. Lee, B.X.; Kjaerulf, F.; Turner, S.; Cohen, L.; Donnelly, P.D.; Muggah, R.; Davis, R.; Realini, A.; Kieselbach, B.; MacGregor, L.S.; et al. Transforming Our World: Implementing the 2030 Agenda Through Sustainable Development Goal Indicators. J Public Health Policy 2016, 37, 13–31. [Google Scholar] [CrossRef]
  2. Kanojia, S.; Kapoor, N.; Chhabra, M.; Sethi, P. Regional Disparities and International Spillover in Achieving the Sustainable Development Goals (SDGs) across the Globe. Discov. Sustain. 2025, 6, 993. [Google Scholar] [CrossRef]
  3. Dogan, E.; Madaleno, M.; Inglesi-Lotz, R.; Taskin, D. Race and Energy Poverty: Evidence from African-American Households. Energy Econ. 2022, 108, 105908. [Google Scholar] [CrossRef]
  4. Kalfountzou, E.; Papada, L.; Tourkolias, C.; Mirasgedis, S.; Kaliampakos, D.; Damigos, D. A Comparative Analysis of Machine Learning Algorithms in Energy Poverty Prediction. Energies 2025, 18, 1133. [Google Scholar] [CrossRef]
  5. Li, J.; Gao, M.; Luo, E.; Wang, J.; Zhang, X. Does Rural Energy Poverty Alleviation Really Reduce Agricultural Carbon Emissions? The Case of China. Energy Econ. 2023, 119, 106576. [Google Scholar] [CrossRef]
  6. Nie, P.; Li, Q.; Sousa-Poza, A. Energy Poverty and Subjective Well-Being in China: New Evidence from the China Family Panel Studies. Energy Econ. 2021, 103, 105548. [Google Scholar] [CrossRef]
  7. Hafner, M.; Raimondi, P.P. Priorities and Challenges of the EU Energy Transition: From the European Green Package to the New Green Deal. Russ. J. Econ. 2020, 6, 374–389. [Google Scholar] [CrossRef]
  8. Sy, S.A.; Mokaddem, L. Energy Poverty in Developing Countries: A Review of the Concept and Its Measurements. Energy Res. Soc. Sci. 2022, 89, 102562. [Google Scholar] [CrossRef]
  9. Zhao, J.; Dong, K.; Dong, X.; Shahbaz, M. How Renewable Energy Alleviate Energy Poverty? A Global Analysis. Renew. Energy 2022, 186, 299–311. [Google Scholar] [CrossRef]
  10. Yue, J.; Chen, S.; Weng, Z. Digital Technology Alleviates Intergenerational Energy Poverty: Evidence from Household Analysis. Appl. Energy 2025, 399, 126488. [Google Scholar] [CrossRef]
  11. Papada, L.; Kaliampakos, D. Measuring Energy Poverty in Greece. Energy Policy 2016, 94, 157–165. [Google Scholar] [CrossRef]
  12. Betto, F.; Garengo, P.; Lorenzoni, A. A New Measure of Italian Hidden Energy Poverty. Energy Policy 2020, 138, 111237. [Google Scholar] [CrossRef]
  13. Siksnelyte-Butkiene, I. A Systematic Literature Review of Indices for Energy Poverty Assessment: A Household Perspective. Sustainability 2021, 13, 10900. [Google Scholar] [CrossRef]
  14. Jiang, L.; Yu, L.; Xue, B.; Chen, X.; Mi, Z. Who Is Energy Poor? Evidence from the Least Developed Regions in China. Energy Policy 2020, 137, 111122. [Google Scholar] [CrossRef]
  15. Boardman, B. Fuel Poverty Synthesis: Lessons Learnt, Actions Needed: Fuel Poverty Comes of Age: Commemorating 21 Years of Research and Policy. Energy Policy 2012, 49, 143–1478. [Google Scholar] [CrossRef]
  16. Hills, J. Fuel Poverty: The Problem and Its Measurement; Department for Energy and Climate Change: London, UK, 2011.
  17. Nussbaumer, P.; Bazilian, M.; Modi, V. Measuring Energy Poverty: Focusing on What Matters. Renew. Sustain. Energy Rev. 2012, 16, 231–243. [Google Scholar] [CrossRef]
  18. Barnes, D.F.; Khandker, S.R.; Samad, H.A. Energy Poverty in Rural Bangladesh. Energy Policy 2011, 39, 894–904. [Google Scholar] [CrossRef]
  19. Li, W.; Chien, F.; Hsu, C.-C.; Zhang, Y.; Nawaz, M.A.; Iqbal, S.; Mohsin, M. Nexus between Energy Poverty and Energy Efficiency: Estimating the Long-Run Dynamics. Resour. Policy 2021, 72, 102063. [Google Scholar] [CrossRef]
  20. Liu, Z.; Lan, Y.; Deng, M.; Zhang, Y. Accurate Identification of Energy Poverty in China: An Analysis Based on an Equivalent Scale. J. Quant. Technol. Econ. 2023, 40, 136–157. [Google Scholar]
  21. Arribas-Bel, D.; Patino, J.E.; Duque, J.C. Remote Sensing-Based Measurement of Living Environment Deprivation: Improving Classical Approaches with Machine Learning. PLoS ONE 2017, 12, e0176684. [Google Scholar] [CrossRef]
  22. Shen, T.; Zhan, Z.; Jin, L.; Huang, F.; Xu, H. Research on Method of Identifying Poor Families Based on Machine Learning. In Proceedings of the 2021 IEEE 4th Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC), Chongqing, China, 18–20 June 2021; Volume 4, pp. 10–13. [Google Scholar] [CrossRef]
  23. Huang, W.; Liu, Y.; Hu, P.; Ding, S.; Gao, S.; Zhang, M. What Influence Farmers’ Relative Poverty in China: A Global Analysis Based on Statistical and Interpretable Machine Learning Methods. Heliyon 2023, 9, e19525. [Google Scholar] [CrossRef] [PubMed]
  24. Al Kez, D.; Foley, A.; Khald Abdul, Z.; Furszyfer Del Rio, D. Machine Learning-Based Approach for Predicting Energy Poverty in the United Kingdom Incorporating Remote-Sensing and Socioeconomic Data; Social Science Research Network: Rochester, NY, USA, 2023. [Google Scholar] [CrossRef]
  25. Gawusu, S.; Jamatutu, S.A.; Ahmed, A. Predictive Modeling of Energy Poverty with Machine Learning Ensembles: Strategic Insights from Socioeconomic Determinants for Effective Policy Implementation. Int. J. Energy Res. 2024, 2024, 9411326. [Google Scholar] [CrossRef]
  26. Das, R.K.; Islam, M.; Hasan, M.M.; Razia, S.; Hassan, M.; Khushbu, S.A. Sentiment Analysis in Multilingual Context: Comparative Analysis of Machine Learning and Hybrid Deep Learning Models. Heliyon 2023, 9, e20281. [Google Scholar] [CrossRef]
  27. Anand, V.; Khajuria, A.; Pachauri, R.K.; Gupta, V. Optimized Machine Learning Based Comparative Analysis of Predictive Models for Classification of Kidney Tumors. Sci. Rep. 2025, 15, 30358. [Google Scholar] [CrossRef] [PubMed]
  28. Makumbura, R.K.; Mampitiya, L.; Rathnayake, N.; Meddage, D.P.P.; Henna, S.; Dang, T.L.; Hoshino, Y.; Rathnayake, U. Advancing Water Quality Assessment and Prediction Using Machine Learning Models, Coupled with Explainable Artificial Intelligence (XAI) Techniques like Shapley Additive Explanations (SHAP) for Interpreting the Black-Box Nature. Results Eng. 2024, 23, 102831. [Google Scholar] [CrossRef]
  29. Reddy, B.; Srikanya, K.; Varshini, M.; Srijanya, S.; Reddy, G.; Shirisha, C. The Application of Machine Learning to the Task of Poverty Classification. In Proceedings of the 2023 International Conference on Advances in Computing, Communication and Applied Informatics (ACCAI), Chennai, India, 25–26 May 2023; pp. 1–4. [Google Scholar] [CrossRef]
  30. Machlev, R.; Heistrene, L.; Perl, M.; Levy, K.Y.; Belikov, J.; Mannor, S.; Levron, Y. Explainable Artificial Intelligence (XAI) Techniques for Energy and Power Systems: Review, Challenges and Opportunities. Energy AI 2022, 9, 100169. [Google Scholar] [CrossRef]
  31. Available online: https://www.isss.pku.edu.cn/cfps/sjzx/gksj/index.htm (accessed on 20 October 2025).
  32. Al Kez, D.; Foley, A.; Lowans, C.; Del Rio, D.F. Energy Poverty Assessment: Indicators and Implications for Developing and Developed Countries. Energy Convers. Manag. 2024, 307, 118324. [Google Scholar] [CrossRef]
  33. Lin, L.; Wang, Z.; Liu, J.; Xu, X. A Review of Rural Household Energy Poverty: Identification, Causes and Governance. Agriculture 2023, 13, 2185. [Google Scholar] [CrossRef]
  34. Mersha, M.A.; Yigezu, M.G.; Tonja, A.L.; Shakil, H.; Iskander, S.; Kolesnikova, O.; Kalita, J. Explainable AI: XAI-Guided Context-Aware Data Augmentation. Expert Syst. Appl. 2025, 289, 128364. [Google Scholar] [CrossRef]
  35. Zhang, W.; Lei, T.; Gong, Y.; Zhang, J.; Wu, Y. Using Explainable Artificial Intelligence to Identify Key Characteristics of Deep Poverty for Each Household. Sustainability 2022, 14, 9872. [Google Scholar] [CrossRef]
  36. Qin, C.; Luo, X.; Deng, C.; Shu, K.; Zhu, W.; Griss, J.; Hermjakob, H.; Bai, M.; Perez-Riverol, Y. Deep Learning Embedder Method and Tool for Mass Spectra Similarity Search. J. Proteom. 2021, 232, 104070. [Google Scholar] [CrossRef] [PubMed]
  37. Amarkhil, Q.; Elwakil, E.; Hubbard, B. A Meta-Analysis of Critical Causes of Project Delay Using Spearman’s Rank and Relative Importance Index Integrated Approach. Can. J. Civ. Eng. 2021, 48, 1498–1507. [Google Scholar] [CrossRef]
  38. Da, Q.; Chen, Y.; Dai, B.; Li, D.; Fan, L. Prediction of Slope Safety Factor Based on Attention Mechanism-Enhanced CNN-GRU. Sustainability 2024, 16, 6333. [Google Scholar] [CrossRef]
  39. Han, Z.; Cui, B.; Xu, L.; Wang, J.; Guo, Z. Coupling LSTM and CNN Neural Networks for Accurate Carbon Emission Prediction in 30 Chinese Provinces. Sustainability 2023, 15, 13934. [Google Scholar] [CrossRef]
  40. Sullivan, R.S.; Longo, L. Explaining Deep Q-Learning Experience Replay with SHapley Additive exPlanations. Mach. Learn. Knowl. Extr. 2023, 5, 1433–1455. [Google Scholar] [CrossRef]
  41. Uppalapati, S.; Paramasivam, P.; Kilari, N.; Chohan, J.S.; Kanti, P.K.; Vemanaboina, H.; Dabelo, L.H.; Gupta, R. Precision Biochar Yield Forecasting Employing Random Forest and XGBoost with Taylor Diagram Visualization. Sci. Rep. 2025, 15, 7105. [Google Scholar] [CrossRef]
  42. Huang, F.; Zhang, X. A New Interpretable Streamflow Prediction Approach Based on SWAT-BiLSTM and SHAP. Environ. Sci. Pollut. Res. 2024, 31, 23896–23908. [Google Scholar] [CrossRef]
  43. Guan, Y.; Yan, J.; Shan, Y.; Zhou, Y.; Hang, Y.; Li, R.; Liu, Y.; Liu, B.; Nie, Q.; Bruckner, B.; et al. Burden of the Global Energy Price Crisis on Households. Nat. Energy 2023, 8, 304–316. [Google Scholar] [CrossRef]
  44. Leal Filho, W.; Gatto, A.; Sharifi, A.; Salvia, A.L.; Guevara, Z.; Awoniyi, S.; Mang-Benza, C.; Nwedu, C.N.; Surroop, D.; Teddy, K.O.; et al. Energy Poverty in African Countries: An Assessment of Trends and Policies. Energy Res. Soc. Sci. 2024, 117, 103664. [Google Scholar] [CrossRef]
  45. Katoch, O.R.; Sharma, R.; Parihar, S.; Nawaz, A. Energy Poverty and Its Impacts on Health and Education: A Systematic Review. Int. J. Energy Sect. Manag. 2023, 18, 411–431. [Google Scholar] [CrossRef]
Figure 1. Explainable Artificial Intelligence (XAI) Analysis Framework.
Figure 2. Analysis of Feature Importance on Datasets of Different Years: Subfigures (a–d) represent the analysis results for 2014, 2016, 2018, and 2020, respectively.
Figure 3. Summary Plot of SHAP on Different Year Datasets: Subfigures (a–d) represent the analysis results for 2014, 2016, 2018, and 2020, respectively.
Figure 4. SHAP Dependence Plots of Feature Contributions for Representative Cases. (a) Non-Energy Poverty Sample, (b) Energy Poverty Line Appendix Sample, (c) Energy Poverty Sample.
Table 1. Description of the Energy Poverty Dataset.

| Year | Symbol | Variable Description | Mean | Std. Dev. | Max | Min |
|---|---|---|---|---|---|---|
| 2014 | x1 | Log expenditure | 10.50 | 0.95 | 15.45 | 5.30 |
| 2014 | x2 | Household size (persons) | 3.73 | 1.83 | 17.00 | 1.00 |
| 2014 | x3 | Annual net income per capita (yuan) | 14,420.96 | 19,829.06 | 980,000.00 | 0.25 |
| 2014 | x4 | Annual expenditure per capita (yuan) | 18,171.11 | 38,820.33 | 2,562,500.00 | 100.00 |
| 2014 | x5 | Modern fuel usage (1 = Yes, 0 = No) | 0.63 | 0.48 | 1.00 | 0.00 |
| 2014 | x6 | Piped utility access (1 = With, 0 = Without) | 0.69 | 0.46 | 1.00 | 0.00 |
| 2014 | x7 | Housing expenditure share (%) | 0.12 | 0.14 | 0.98 | 0.00 |
| 2014 | x8 | Electricity expenditure share (%) | 0.03 | 0.04 | 0.84 | 0.00 |
| 2014 | x9 | Gas expenditure share (%) | 0.03 | 0.58 | 0.90 | 0.00 |
| 2014 | x10 | Heating expenditure share (%) | 0.01 | 0.03 | 0.80 | 0.00 |
| 2014 | x11 | Urban-rural (1 = Urban, 0 = Rural) | 0.49 | 0.50 | 1.00 | 0.00 |
| 2014 | x12 | Education level (0 = Illiterate/Semi-literate, 1 = Primary, 2 = Junior high, 3 = Senior high, 4 = College+) | 1.74 | 1.00 | 4.00 | 0.00 |
| 2016 | x1 | Log expenditure | 10.75 | 0.94 | 15.46 | 4.65 |
| 2016 | x2 | Household size (persons) | 3.67 | 1.88 | 19.00 | 1.00 |
| 2016 | x3 | Annual net income per capita (yuan) | 17,466.76 | 30,417.94 | 1,806,000.00 | 0.45 |
| 2016 | x4 | Annual expenditure per capita (yuan) | 23,708.83 | 37,770.12 | 1,292,305.00 | 52.00 |
| 2016 | x5 | Modern fuel usage (1 = Yes, 0 = No) | 0.68 | 0.46 | 1.00 | 0.00 |
| 2016 | x6 | Piped utility access (1 = With, 0 = Without) | 0.75 | 0.44 | 1.00 | 0.00 |
| 2016 | x7 | Housing expenditure share (%) | 0.11 | 0.14 | 0.98 | 0.00 |
| 2016 | x8 | Electricity expenditure share (%) | 0.03 | 0.03 | 0.57 | 0.00 |
| 2016 | x9 | Gas expenditure share (%) | 0.02 | 0.04 | 0.86 | 0.00 |
| 2016 | x10 | Heating expenditure share (%) | 0.01 | 0.02 | 0.60 | 0.00 |
| 2016 | x11 | Urban-rural (1 = Urban, 0 = Rural) | 0.50 | 0.50 | 1.00 | 0.00 |
| 2016 | x12 | Education level (0 = Illiterate/Semi-literate, 1 = Primary, 2 = Junior high, 3 = Senior high, 4 = College+) | 1.77 | 1.01 | 4.00 | 0.00 |
| 2018 | x1 | Log expenditure | 10.78 | 0.96 | 14.52 | 4.80 |
| 2018 | x2 | Household size (persons) | 3.57 | 1.91 | 21.00 | 1.00 |
| 2018 | x3 | Annual net income per capita (yuan) | 22,348.76 | 30,750.96 | 1,012,500.00 | 0.00 |
| 2018 | x4 | Annual expenditure per capita (yuan) | 25,462.83 | 35,298.46 | 1,614,900.00 | 120.00 |
| 2018 | x5 | Modern fuel usage (1 = Yes, 0 = No) | 0.74 | 0.44 | 1.00 | 0.00 |
| 2018 | x6 | Piped utility access (1 = With, 0 = Without) | 0.77 | 0.42 | 1.00 | 0.00 |
| 2018 | x7 | Housing expenditure share (%) | 0.12 | 0.14 | 0.96 | 0.00 |
| 2018 | x8 | Electricity expenditure share (%) | 0.03 | 0.04 | 0.96 | 0.00 |
| 2018 | x9 | Gas expenditure share (%) | 0.03 | 0.06 | 0.65 | 0.00 |
| 2018 | x10 | Heating expenditure share (%) | 0.01 | 0.02 | 0.49 | 0.00 |
| 2018 | x11 | Urban-rural (1 = Urban, 0 = Rural) | 0.51 | 0.50 | 1.00 | 0.00 |
| 2018 | x12 | Education level (0 = Illiterate/Semi-literate, 1 = Primary, 2 = Junior high, 3 = Senior high, 4 = College+) | 1.95 | 1.03 | 4.00 | 1.00 |
| 2020 | x1 | Log expenditure | 10.87 | 0.98 | 15.20 | 5.71 |
| 2020 | x2 | Household size (persons) | 3.62 | 1.93 | 15.00 | 1.00 |
| 2020 | x3 | Annual net income per capita (yuan) | 32,606.21 | 51,798.04 | 2,011,200.00 | 0.00 |
| 2020 | x4 | Annual expenditure per capita (yuan) | 27,940.50 | 38,004.03 | 801,352.00 | 200.00 |
| 2020 | x5 | Modern fuel usage (1 = Yes, 0 = No) | 0.79 | 0.41 | 1.00 | 0.00 |
| 2020 | x6 | Piped utility access (1 = With, 0 = Without) | 0.83 | 0.38 | 1.00 | 0.00 |
| 2020 | x7 | Housing expenditure share (%) | 0.13 | 0.14 | 0.95 | 0.00 |
| 2020 | x8 | Electricity expenditure share (%) | 0.03 | 0.04 | 0.72 | 0.00 |
| 2020 | x9 | Gas expenditure share (%) | 0.03 | 0.48 | 0.72 | 0.00 |
| 2020 | x10 | Heating expenditure share (%) | 0.01 | 0.02 | 0.50 | 0.00 |
| 2020 | x11 | Urban-rural (1 = Urban, 0 = Rural) | 0.53 | 0.50 | 1.00 | 0.00 |
| 2020 | x12 | Education level (0 = Illiterate/Semi-literate, 1 = Primary, 2 = Junior high, 3 = Senior high, 4 = College+) | 2.14 | 1.07 | 4.00 | 1.00 |

Note: Variables (x7–x10) are expressed as decimals ranging from 0 to 1, representing the share of total household expenditure. Outliers were winsorized at the 99th percentile to ensure statistical consistency. Original percentages (e.g., 4.14%) have been converted to decimal form (e.g., 0.0414).
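The two preprocessing steps named in the note to Table 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the note does not say whether winsorization was one-sided or which percentile definition was used, so the sketch caps only the upper tail and uses linear interpolation between order statistics. All function names and the toy data are hypothetical.

```python
def percentile(values, q):
    """Percentile (q in [0, 100]) of a non-empty list, with linear interpolation."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)

def winsorize_upper(values, q=99):
    """Cap every value above the q-th percentile at that percentile.

    Assumption: upper-tail capping only; the source note does not specify.
    """
    cap = percentile(values, q)
    return [min(v, cap) for v in values]

def percent_to_share(percent):
    """Convert a percentage (e.g. 4.14) to a decimal share (about 0.0414)."""
    return percent / 100

# Hypothetical toy series: 100 ordinary values plus one extreme outlier.
raw = list(range(1, 101)) + [1000]
capped = winsorize_upper(raw, q=99)

# The example conversion quoted in the note: 4.14% as a decimal share.
share = percent_to_share(4.14)
```

In the toy series the single extreme value is pulled down to the 99th percentile while the other observations are left unchanged, which is the behavior that keeps very large expenditure records from dominating the summary statistics.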
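Table 2. Comparison Table of Prediction Results of Each Model.

| Year | Model | Accuracy | Precision | F1 Score | Recall | AUC |
|---|---|---|---|---|---|---|
| 2014 | LR | 73.28% | 72.78% | 63.09% | 66.36% | 81.05% |
| 2014 | KNN | 88.86% | 91.27% | 82.30% | 86.19% | 95.34% |
| 2014 | SVM | 82.57% | 81.75% | 78.64% | 79.30% | 92.10% |
| 2014 | RF | 95.44% | 94.70% | 94.80% | 94.68% | 99.49% |
| 2014 | CART | 93.74% | 92.66% | 92.78% | 92.68% | 93.61% |
| 2014 | XGBoost | 97.69% | 97.23% | 97.36% | 97.29% | 99.81% |
| 2014 | LightGBM | 97.43% | 96.81% | 97.18% | 96.99% | 99.78% |
| 2014 | Gradient Boosting | 95.83% | 95.26% | 94.92% | 95.09% | 99.27% |
| 2014 | EPPE-FCS | 98.95% | 99.64% | 97.90% | 98.75% | 98.81% |
| 2016 | LR | 77.51% | 78.37% | 71.62% | 74.07% | 85.98% |
| 2016 | KNN | 84.12% | 83.53% | 82.01% | 82.40% | 92.19% |
| 2016 | SVM | 82.52% | 82.77% | 79.24% | 80.38% | 91.20% |
| 2016 | RF | 95.75% | 95.25% | 95.54% | 95.33% | 99.49% |
| 2016 | CART | 94.21% | 93.70% | 93.63% | 93.61% | 94.16% |
| 2016 | XGBoost | 97.73% | 97.34% | 97.63% | 97.48% | 99.81% |
| 2016 | LightGBM | 97.71% | 97.42% | 97.51% | 97.46% | 99.82% |
| 2016 | Gradient Boosting | 96.31% | 95.99% | 95.83% | 95.91% | 99.45% |
| 2016 | EPPE-FCS | 98.15% | 99.99% | 98.71% | 96.73% | 96.85% |
| 2018 | LR | 77.19% | 77.86% | 80.42% | 78.71% | 84.94% |
| 2018 | KNN | 89.44% | 91.14% | 88.91% | 89.86% | 96.19% |
| 2018 | SVM | 86.50% | 84.84% | 91.42% | 87.78% | 95.05% |
| 2018 | RF | 95.34% | 95.13% | 96.25% | 95.64% | 99.39% |
| 2018 | CART | 93.49% | 93.77% | 93.99% | 93.84% | 93.46% |
| 2018 | XGBoost | 97.83% | 97.79% | 98.11% | 97.95% | 99.83% |
| 2018 | LightGBM | 97.74% | 97.75% | 97.98% | 97.86% | 99.82% |
| 2018 | Gradient Boosting | 96.18% | 95.92% | 96.88% | 96.40% | 99.50% |
| 2018 | EPPE-FCS | 97.86% | 99.59% | 96.36% | 97.94% | 97.95% |
| 2020 | LR | 72.42% | 72.57% | 82.63% | 77.26% | 77.59% |
| 2020 | KNN | 83.78% | 85.92% | 85.42% | 85.66% | 90.99% |
| 2020 | SVM | 77.99% | 75.91% | 89.68% | 82.21% | 85.89% |
| 2020 | RF | 95.07% | 94.67% | 96.77% | 95.71% | 99.13% |
| 2020 | CART | 92.60% | 93.92% | 92.99% | 93.44% | 92.67% |
| 2020 | XGBoost | 96.98% | 96.80% | 97.91% | 97.35% | 99.67% |
| 2020 | LightGBM | 96.90% | 96.87% | 97.70% | 97.28% | 99.68% |
| 2020 | Gradient Boosting | 94.53% | 93.73% | 96.82% | 95.25% | 98.86% |
| 2020 | EPPE-FCS | 97.94% | 98.77% | 98.13% | 96.44% | 95.75% |
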
Note: The bolded sections indicate the optimal performance for each metric.
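For readers who want to reproduce metrics of the kind reported in Table 2, the sketch below computes accuracy, precision, recall, F1 (the harmonic mean of precision and recall), and AUC for a binary classifier. It assumes label 1 denotes an energy-poor household and uses plain binary (not macro- or weighted-averaged) definitions; the article does not state which averaging was applied, and the labels and scores here are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 with class 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a win. Assumes both classes are present.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted probabilities for eight households.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy, precision, recall, and F1 depend on the chosen decision threshold, whereas AUC is computed from the ranking of scores alone, so a model can lead on the thresholded metrics without leading on AUC.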
Table 3. Mean Absolute SHAP Values of Key Features.

| Feature | Description | Mean SHAP Value (2014) | Mean SHAP Value (2016) | Mean SHAP Value (2018) | Mean SHAP Value (2020) |
|---|---|---|---|---|---|
| x4 | Annual household expenditure per capita (in yuan) | 0.2173 | 0.2313 | 0.2332 | 0.2158 |
| x9 | Share of gas expenditure (%) | 0.1399 | 0.1345 | 0.1294 | 0.1159 |
| x8 | Share of electricity expenditure (%) | 0.0972 | 0.1187 | 0.1155 | 0.1047 |
| x10 | Share of heating expenditure (%) | 0.0956 | 0.0991 | 0.0943 | 0.1027 |
| x7 | Share of housing expenditure (%) | 0.0515 | 0.0352 | 0.0429 | 0.0475 |
| x2 | Household size (number of persons) | 0.0342 | 0.0331 | 0.0272 | 0.0331 |
| x1 | Logarithm of total expenditure | 0.0083 | 0.0107 | 0.0172 | 0.0231 |

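A mean absolute SHAP value of the kind listed in Table 3 is the average, over all samples, of the absolute per-sample SHAP contribution of a feature. The sketch below shows that aggregation on a small hypothetical SHAP matrix (the per-household values are invented for illustration), and then uses the figures from Table 3 to compute each feature's relative change between 2014 and 2020. Function and variable names are illustrative.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Average |SHAP| per feature; rows are samples, columns are features."""
    n = len(shap_rows)
    return {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }

# Hypothetical per-household SHAP values for three features.
toy_shap = [
    [0.30, -0.10, 0.02],
    [-0.20, 0.15, -0.01],
    [0.25, -0.05, 0.03],
    [-0.15, 0.10, -0.02],
]
importance = mean_abs_shap(toy_shap, ["x4", "x9", "x1"])
ranking = sorted(importance, key=importance.get, reverse=True)

# Mean absolute SHAP values from Table 3, ordered 2014, 2016, 2018, 2020.
table3 = {
    "x4": [0.2173, 0.2313, 0.2332, 0.2158],
    "x9": [0.1399, 0.1345, 0.1294, 0.1159],
    "x8": [0.0972, 0.1187, 0.1155, 0.1047],
    "x10": [0.0956, 0.0991, 0.0943, 0.1027],
    "x7": [0.0515, 0.0352, 0.0429, 0.0475],
    "x2": [0.0342, 0.0331, 0.0272, 0.0331],
    "x1": [0.0083, 0.0107, 0.0172, 0.0231],
}

# Relative change from the first (2014) to the last (2020) wave.
relative_change = {
    feature: (values[-1] - values[0]) / values[0]
    for feature, values in table3.items()
}
```

Taking absolute values before averaging matters because positive and negative contributions would otherwise cancel. With the Table 3 figures, the gas expenditure share (x9) falls by roughly 17% between 2014 and 2020, while annual expenditure per capita (x4) remains the largest contributor in every wave.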

Share and Cite

MDPI and ACS Style

Qi, H.; Xue, Q.; Shi, Y.; Qi, X.; Yang, J.; Zheng, J.; Ren, L. Leveraging Explainable AI to Decode Energy Poverty in China: Implications for SDGs and National Policy. Sustainability 2025, 17, 11080. https://doi.org/10.3390/su172411080
