Revealing the Driving Factors of Household Energy Consumption in High-Density Residential Areas of Beijing Based on Explainable Machine Learning

Qi, Zizhuo; Zhang, Lu; Yang, Xin; Zhao, Yanxia

doi:10.3390/buildings15071205

Open Access Article


School of Architecture and Art, North China University of Technology, Beijing 100144, China

* Authors to whom correspondence should be addressed.

† These authors contributed equally to this work.

Buildings 2025, 15(7), 1205; https://doi.org/10.3390/buildings15071205

Submission received: 26 February 2025 / Revised: 22 March 2025 / Accepted: 1 April 2025 / Published: 7 April 2025

(This article belongs to the Special Issue AI and Data Analytics for Energy-Efficient and Healthy Buildings: 2nd Edition)


Abstract

This study explores the driving factors of household energy consumption in high-density residential areas of Beijing and proposes targeted energy-saving strategies. Data were collected through field surveys, questionnaires, and interviews, covering 16 influencing factors across household, building, environment, and transportation categories. Ensemble learning models (XGBoost, RF, and GBDT) were compared under hyperparameter optimization, and XGBoost tuned with a genetic algorithm performed best. SHAP analysis revealed that the key factors varied by season but included floor level, daily travel distance, building age, greening rate, water bodies, and household age. The findings inform strategies such as optimizing the workplace–residence layout, improving building insulation, increasing green space, and promoting community energy-saving programs. This study provides refined data support for energy management in high-density residential areas, supports the application of energy-saving technologies, and encourages low-carbon lifestyles. By helping to reduce energy consumption and carbon emissions during the operational phase of residential areas, it contributes to urban green development and China’s “dual carbon” goals.

Keywords:

high-density residential area; household energy consumption; hyperparameter optimization; XGBoost

1. Introduction

The global energy crisis poses a severe challenge to human well-being and sustainable development. In response, the parties to the United Nations Framework Convention on Climate Change have reached a consensus on a global action framework for controlling greenhouse gas emissions and energy consumption [1]. As a key participant, China has set forth the “dual carbon” strategy, aiming to peak carbon emissions before 2030 and achieve carbon neutrality before 2060 [2]. Notably, the building sector, a focal point of the energy transition, accounts for 40% of global energy consumption and 39% of greenhouse gas emissions [3], making it a critical constraint on sustainable development. According to the International Energy Agency (IEA), residential buildings account for more than 60% of the total energy consumption in the building sector [4]. This structural characteristic is even more pronounced in China’s urbanization process: operational building energy consumption reached 1.19 billion tons of standard coal equivalent in 2022, with urban residential buildings contributing 42% of the total carbon emissions in the building sector [5]. Studies indicate that, driven by accelerated urbanization, rising living standards, and the widespread adoption of HVAC systems, carbon emissions from residential buildings show a persistent growth trend that is positively correlated with urban development [6]. The residential building sector therefore holds significant energy-saving potential, and the effectiveness of its energy consumption control will directly affect the realization of China’s “dual carbon” strategic goals.

Energy consumption and carbon emissions during the operational phase of residential areas primarily involve three components: the building itself, household practices, and public facilities and service systems [6]. Among these, household energy consumption, which includes daily energy use (e.g., electricity, gas, and hot water) and transportation energy use [7], accounts for the largest share of energy consumption and carbon emissions during the operational phase [8]. Residents have been shown to be one of the main factors influencing a building’s energy use throughout its operational phase [9,10]. Specifically, household energy consumption is affected both by residents’ living environment (objective factors) and by their energy-saving awareness and behavior (subjective factors) [11,12].

Since the housing commercialization reform in the 1990s, Chinese cities have begun to construct large numbers of high-rise residential buildings to cope with pressures such as land shortages and population growth brought about by rapid urbanization. These residential areas, built around the year 2000, typically feature high floor area ratios, high building coverage, and low development space ratios or high population densities. In related research, existing urban residential areas with a floor area ratio between 2.0 and 4.0 or a building density of 30–40% are referred to as high-density residential areas [13]. By 2023, the existing coverage of high-density residential areas in China was approximately 335.5 billion square meters, making them the main component of urban residential development in China. Compared to newly developed residential areas, these rapidly developed, large-scale residential areas often have poor energy-saving performance due to insufficient investment during construction and the underestimation of future living requirements. There is a significant gap between energy-saving design standards applied during construction and those required by existing residential areas [14].

In summary, existing high-density residential areas are a major part of China’s urban residential development and hold great potential for achieving energy-saving and emission reduction goals in the building sector. Regulating household energy consumption in high-density residential areas is a necessary step toward achieving China’s dual carbon goals. Extensive research has been conducted on effective energy management and maximizing the energy-saving potential of buildings [15,16,17]. However, these studies often focus on newly built residential areas, with few exploring existing buildings. Furthermore, due to the complex characteristics of existing high-density residential areas and individual differences in aspects such as their structure, function, and spatial layout; the residential environment; household composition; and residents’ energy-saving awareness and behavior, large-scale planning strategies or generalized design approaches aimed at new areas are often unsuitable for the actual conditions of residential areas, significantly reducing their feasibility.

Therefore, to establish customized and refined energy-saving measures that align with the diversity and complexity of existing high-density residential areas, it is imperative to perform comprehensive surveys to precisely identify the composition, influencing factors, and weight of household energy consumption.

2. Literature Review

2.1. Objective Influencing Factors of Household Energy Consumption

The environment of residential areas has been shown to significantly influence household energy consumption. Studies of the objective factors affecting household energy consumption indicate that a reasonable building layout and orientation can optimize indoor natural lighting and ventilation [17,18]. For instance, in China, residential buildings with a north–south orientation have higher indoor solar heat gain in winter, reducing heating energy consumption [19], while in summer they receive less direct sunlight, reducing air-conditioning demand [20]. The spacing between buildings is a less obvious factor affecting household energy consumption. Sufficient spacing ensures ample natural lighting and good ventilation, reducing the need for artificial lighting and mechanical ventilation and thus lowering household energy consumption [21]. Conversely, if the spacing is too small, some rooms may receive insufficient daylight, increasing the use of lighting devices and, consequently, household energy consumption [22]. Building form has also been shown to influence household energy consumption. The building shape coefficient is the ratio of a building’s exterior surface area exposed to outdoor air to the volume it encloses, so a smaller shape coefficient means less heat-dissipating surface per unit of building volume and therefore lower energy consumption. Simpler, more regular building forms with smaller shape coefficients lose less heat in winter and gain less heat in summer, effectively reducing the energy needed for indoor heating and cooling [23]. Related studies have shown that for every 0.01 increase in the shape coefficient, energy consumption increases by about 2.5% [24]. Furthermore, the layout of high-density residential areas can create “wind-shadow zones” [25] that hinder ventilation and increase the use of active regulation equipment such as air conditioners, thereby increasing household energy consumption and carbon emissions. In addition, some studies suggest that building materials and building equipment can also affect household energy consumption.
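To make the shape-coefficient rule of thumb concrete, the sketch below computes the shape coefficient of a simple rectangular block as exposed surface area divided by enclosed volume and then applies the reported sensitivity of about 2.5% per 0.01 increase [24]. The box geometry, the choice to count the four facades plus the roof (but not the ground slab), and the linear, non-compounding use of the 2.5% figure are simplifying assumptions made only for illustration.

```python
def shape_coefficient(length_m: float, width_m: float, height_m: float) -> float:
    """Exposed surface area / enclosed volume for a rectangular block (1/m).

    Assumption: the four facades and the roof are exposed to outdoor air;
    the ground-floor slab is not counted.
    """
    walls = 2 * (length_m + width_m) * height_m
    roof = length_m * width_m
    volume = length_m * width_m * height_m
    return (walls + roof) / volume


def relative_energy_change(sc_base: float, sc_new: float,
                           pct_per_0_01: float = 2.5) -> float:
    """Approximate % change in energy use, applying ~2.5% per 0.01 linearly (assumed)."""
    return (sc_new - sc_base) / 0.01 * pct_per_0_01


if __name__ == "__main__":
    slab = shape_coefficient(60.0, 15.0, 54.0)   # hypothetical long slab block
    tower = shape_coefficient(20.0, 15.0, 54.0)  # hypothetical point tower, same height
    print(f"slab:  {slab:.3f} 1/m")
    print(f"tower: {tower:.3f} 1/m")
    print(f"approx. change, slab -> tower: {relative_energy_change(slab, tower):+.1f}%")
```

A slender tower of the same height exposes more surface per unit volume than a long slab, which is why compact forms tend to perform better under this rule.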

2.2. Subjective Influencing Factors of Household Energy Consumption

In addition to the impact of the built environment on household energy consumption, some scholars have incorporated variables such as household socioeconomic characteristics [26], cognitive traits [27], socio-psychological factors [28], and behavioral preferences [29] when studying the influencing factors of household energy consumption. Their findings indicate that these factors have a significant impact on household energy use.

For example, a study that distributed questionnaires to 249 office employees and applied statistical analysis models to explore the formation mechanisms of energy-saving cognition revealed a theoretical coupling between its driving factors and behavioral intervention variables. The results showed that normative elements in the socio-psychological dimension (e.g., descriptive norms, organizational energy-saving culture, and media communication) positively influence energy-saving cognition by reinforcing the pathways of energy-saving responsibility perception and social pressure transmission [27]. Another study focusing on behavioral preferences utilized government open-source self-reported data from residents instead of standardized behavioral assumptions, significantly improving the alignment between actual household energy consumption and model predictions while reconstructing the evaluation framework for energy retrofit benefits [29].

Furthermore, some studies have deconstructed socio-psychological variables to reveal the key roles of individual factors such as residents’ energy consumption attitudes [30], perceived behavioral control [31], and social norms [32] in shaping energy-saving cognition [21]. Notably, although research has confirmed a dynamic relationship between energy-saving cognition and behavioral practice [32,33], the response is asymmetric: improved cognition does not necessarily translate into sustained behavioral change [34], and short-term behavioral improvements rarely persist long enough to deliver lasting energy savings [35].

Therefore, analyzing the complex interaction pathways between residents’ energy-saving cognition and behavior has become a crucial research direction for unlocking energy-saving potential and achieving real energy efficiency in buildings.

2.3. Analysis Methods for Influencing Factors of Household Energy Consumption

Because household energy consumption is multi-faceted, involving building materials, the site environment, and human factors, traditional regression methods are insufficient for predicting energy consumption across multiple dimensions [36]. Machine learning models can capture complex nonlinear relationships [37], while SHAP values clarify each variable’s contribution to the model, enhancing interpretability [38]. Machine learning has therefore become a leading approach in energy consumption research. Scholars have used models such as artificial neural networks [39,40,41,42], XGBoost [39,43,44,45,46], LightGBM [47], Random Forest [39,46,48,49], support vector machines [39,40,41,50], and Extra Trees [39] to predict residential heating demand [39], energy demand [46], and building energy consumption [47], and have used SHAP values to explain parameter contributions, providing data support for energy-saving retrofits in the building sector.

For model performance evaluation, scholars commonly use the mean absolute percentage error (MAPE), mean absolute error (MAE), and coefficient of determination (R²). Despite the widespread use of these models in energy prediction, previous studies often overlook hyperparameter optimization as a means of improving performance [51]. Traditional methods such as Grid Search [52,53,54] and Random Search [53,55,56] are inefficient [57,58]. Recent work favors optimization algorithms such as Bayesian Optimization [47] and the Tree-structured Parzen Estimator (TPE) [59], which offer better tuning performance and faster convergence.

Genetic algorithms are typically applied to design-stage multi-objective optimization, such as minimizing energy consumption at reduced cost [47] or lowering CO₂ emissions [60], freeing designers from subjective trial and error. Hyperparameter tuning, which seeks optimal model performance by adjusting hyperparameters, is itself a classic optimization problem, and genetic algorithms have been introduced to this task with significant gains in efficiency [57,58]. However, this application remains scarce in building energy prediction, and the performance of genetic algorithms in complex variable environments is still underexplored.

2.4. Research Objectives and Significance

The studies in Section 2.1, Section 2.2 and Section 2.3 have thoroughly discussed the objective and subjective factors influencing household energy consumption and proposed targeted energy-saving measures for residential areas. However, due to the complexity of energy consumption statistics, most studies primarily rely on open-source data from energy or government departments. While these data provide an overall view of urban residents’ energy consumption, they lack micro-level data specific to households and individuals. Additionally, traditional linear regression methods have certain limitations in capturing the effects of complex factor interactions, making it difficult to accurately identify the energy consumption characteristics and driving factors of households in existing high-density residential areas. Furthermore, due to the diversity and complexity of these areas, the practical implementation of energy-saving measures remains unclear. Existing research has been insufficient in explaining the fundamental driving forces behind household energy consumption in these areas.

To address the diversity and complexity of existing high-density residential areas, we must enhance the precision of the interpretation of household energy consumption scenarios in these areas. This study, based on micro-survey data, conducts a “bottom-up” investigation and analysis of energy usage in existing high-density residential areas.

In this study, we innovatively introduced the genetic algorithm (GA) as a hyperparameter optimization algorithm into the research of household energy consumption, constructed a hyperparameter optimization ensemble learning model to achieve optimal prediction performance, and utilized SHAP values to interpret the contributions of different variables to the model.

The precise identification of the driving factors of household energy consumption in high-density residential areas can provide refined data support and scientific evidence for energy consumption management in existing urban high-density residential areas. This will help formulate more targeted energy-saving strategies, optimize the application paths of energy-saving technologies, and guide residents to adopt low-carbon lifestyles. As a result, it will effectively reduce energy consumption and carbon emission intensity during the operational phase of high-density residential areas, playing a foundational and strategic role in the process of urban green development and in achieving China’s “dual carbon” strategic goals.

3. Methodology

3.1. Study Area

Beijing was one of the first cities in China to begin high-rise residential development. Since the 1970s, a variety of residential areas have been built, providing a rich sample for studying the driving factors of household energy consumption in high-density residential areas. In 2019, Professor Gong P’s team from Tsinghua University released nationwide urban land use type data for China [61], which were used in this study to determine the residential distribution in Beijing’s central urban area. In 2024, Professor Che YZ’s team from Sun Yat-sen University released data on building contour vectors and building height for cities across China [62]. This study utilized ArcGIS 10.8 software to calculate the plot ratio of residential areas in Beijing’s central urban area, retaining those with a plot ratio of 2.0–4.0, to obtain the general distribution of existing high-density residential areas in Beijing (as shown in Figure 1).
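The plot-ratio screening can be sketched as follows: for each residential parcel, sum the gross floor area of its buildings, divide by the parcel area, and keep parcels whose ratio falls in the 2.0–4.0 band. The study performed this step in ArcGIS 10.8; the Python below is only an illustration. It assumes that building footprints have already been assigned to parcels and that storey counts are estimated from building height using a hypothetical typical storey height of 3 m; the parcels in the example are made up.

```python
from dataclasses import dataclass
from typing import List

STOREY_HEIGHT_M = 3.0  # hypothetical typical storey height used to estimate floor counts


@dataclass
class Building:
    footprint_m2: float  # area of the building contour polygon
    height_m: float      # building height


@dataclass
class Parcel:
    name: str
    area_m2: float
    buildings: List[Building]


def estimated_floors(b: Building) -> int:
    # Round to the nearest whole storey, with at least one floor.
    return max(1, round(b.height_m / STOREY_HEIGHT_M))


def floor_area_ratio(p: Parcel) -> float:
    gross_floor_area = sum(b.footprint_m2 * estimated_floors(b) for b in p.buildings)
    return gross_floor_area / p.area_m2


def high_density(parcels: List[Parcel], lo: float = 2.0, hi: float = 4.0) -> List[str]:
    """Names of parcels whose floor area ratio lies in [lo, hi]."""
    return [p.name for p in parcels if lo <= floor_area_ratio(p) <= hi]


if __name__ == "__main__":
    parcels = [
        Parcel("A", 20_000, [Building(1_000, 54.0)] * 3),  # three 18-storey slabs
        Parcel("B", 20_000, [Building(800, 18.0)] * 4),    # four 6-storey walk-ups
    ]
    print(high_density(parcels))  # ['A']
```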

The results show that the Fourth to Sixth Rings of Beijing are areas where high-density residential neighborhoods are concentrated. Additionally, reports indicate that the population living between the Fourth and Sixth Rings accounts for about 50.8% of the city’s total population. This region serves as a connecting area between the city’s suburbs and the central urban areas, where residents have diverse commuting needs and modes of transportation. Therefore, the scope of this study covers the area between the Fourth and Sixth Rings of Beijing, with three representative residential neighborhoods selected as survey sites, as shown in Figure 1. Detailed information on the survey sites is provided in Table 1, and the data show that all three survey sites exhibit the characteristics of high-density residential areas.

3.2. Data Collection

The basic data required for this study included Device Usage Information, Household Information, Building Information, Site Environmental Information, and Travel Condition Information. Among these, Site Environmental Information was collected through field surveys, while the other data were gathered through questionnaires and semi-structured interviews. Figure 2 shows the entire process of this study.

3.2.1. Questionnaire Design

The survey questionnaire used in this study was divided into five sections: Household Information, Housing Information, Building Energy Use, Transportation Energy Use, and Energy-saving Awareness and Behavior. It collected microdata including age, occupation, the number of residents in the household, building type, building orientation, building area, energy consumption types, energy-consuming equipment types, usage duration, usage frequency, heating and cooling methods, travel needs, travel modes, commuting distances, and energy-saving measures taken or known about. These microdata served as the foundational database for calculating energy consumption in high-density residential areas and identifying the driving factors of household energy consumption.

3.2.2. Sampling Method

To ensure the representativeness of the survey, the study employed a combination of stratified sampling and simple random sampling.

(1): Determination of Stratification Variables: variables such as age, occupation, household population, and travel demand were excluded as stratification criteria due to their high randomness and pre-survey uncertainty; variables proven in previous studies to influence building energy consumption and exhibit heterogeneity—including construction year, floor level, and building orientation—were selected as stratification variables.
(2): Simple Random Sampling within Strata: after stratification, simple random sampling was applied to each stratum to ensure sample independence.
(3): Sample Size Calculation: The total sample size for the region was first determined, and sample proportions for each building group were allocated according to building scale. A stratified proportional allocation method was then used to calculate the number of samples per stratum. For example, if Group A has N households, of which n are south-facing, the proportion of south-facing households is n/N. Given a total sample size N1 allocated to Group A, the number of south-facing samples is n1 = n × N1/N.

This approach ensures representativeness and minimizes sampling errors.
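The proportional allocation in step (3) can be written as a short function. It is a sketch: the stratum labels and household counts are hypothetical, and rounding fractional counts with a largest-remainder rule (so that stratum sizes sum exactly to the group’s sample size) is our assumption, since the text does not say how fractions were rounded.

```python
from math import floor
from typing import Dict


def proportional_allocation(stratum_sizes: Dict[str, int], sample_size: int) -> Dict[str, int]:
    """Allocate sample_size across strata in proportion to stratum size.

    Applies n1 = n * N1 / N to each stratum, then rounds with the
    largest-remainder method so the allocations sum to sample_size.
    """
    total = sum(stratum_sizes.values())  # N: households in the building group
    exact = {k: n * sample_size / total for k, n in stratum_sizes.items()}
    alloc = {k: floor(v) for k, v in exact.items()}
    shortfall = sample_size - sum(alloc.values())
    # Give the remaining units to the strata with the largest fractional parts.
    for k in sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)[:shortfall]:
        alloc[k] += 1
    return alloc


if __name__ == "__main__":
    # Hypothetical Group A: 600 households split by orientation, 95 samples allocated.
    group_a = {"south": 420, "east-west": 150, "north": 30}
    print(proportional_allocation(group_a, 95))
```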

3.2.3. Collection Process

The surveys were conducted in June 2024, with non-working days selected for household visits. Semi-structured interviews and questionnaires were conducted simultaneously, and the respondents’ consent was obtained before entering their homes. Each survey was conducted by two team members: one assisted the respondent in completing the questionnaire, and the other asked and recorded the interview questions. During this process, a team member also explained the meaning of the questionnaire variables to ensure data accuracy.

The questionnaire statistics were based on households. A total of 285 questionnaires were distributed, with 275 valid questionnaires collected, resulting in a valid response rate of 96.5%.

3.3. Data Processing

3.3.1. Definition of Indicators for the Calculation of Energy Consumption

Based on the survey data, the building and transportation energy consumption of the sample households were calculated. Building energy consumption includes the daily average winter electricity consumption, daily average summer electricity consumption, daily average winter heating consumption, and daily average natural gas consumption. Transportation energy consumption is divided by energy type into electricity consumption and fuel consumption. The calculation equations are as follows:

W_{1} = \sum_{i = 1}^{n} P_{i} t_{i}

(1)

W_{2} = \sum_{i = 1}^{n} P_{i} t_{i}

(2)

W_{3} = \sum_{i = 1}^{n} Q_{i} t = \sum_{i = 1}^{n} 24 Q_{i}

(3)

W_{4} = \sum_{i = 1}^{n} v_{i} q T_{i}

(4)

W_{5} = \sum_{i = 1}^{n} \frac{P_{i} D_{i}}{100}

(5)

W_{6} = \sum_{i = 1}^{n} \frac{D_{i} p_{i}}{100} ρ q = \sum_{i = 1}^{n} \frac{D_{i} p_{i}}{100} 32.3025

(6)

In Equations (1) and (2), W_{1} and W_{2} represent the daily average winter and summer electricity consumption, respectively; P_{i} represents the power of each electrical appliance (unit: kW); and t_{i} represents the total daily duration of use of each appliance (unit: h). In Equation (3), W_{3} represents the daily average winter heating consumption, Q_{i} represents the power of each radiator (unit: kW), and t is a constant of 24 h. In Equation (4), W_{4} represents the daily average natural gas consumption; v_{i} represents the volume of natural gas fully consumed per unit time (1 h), ranging from 0.3 to 0.5 m³; q represents the calorific value of natural gas per unit volume, taken as 36.48 MJ/m³; and T_{i} represents the total daily usage time of natural gas (unit: h).

In Equation (5), W_{5} represents the daily average electricity consumption for transportation, P_{i} represents the energy consumption (unit: kWh/100 km), and D_{i} represents the travel distance (unit: km). In Equation (6), W_{6} represents the daily average fuel consumption for transportation; p_{i} represents the fuel consumption (unit: L/100 km); D_{i} represents the travel distance (unit: km); ρ represents the density of gasoline, assumed to be 750 kg/m³ (0.75 kg/L); and q represents the average lower heating value of gasoline, assumed to be 43,070 kJ/kg. The product ρq gives the lower heating value per unit volume of gasoline: 0.75 kg/L × 43.07 MJ/kg = 32.3025 MJ/L.
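Equations (1)–(6) translate directly into code. The sketch below evaluates them for a single household; all appliance ratings, usage hours, radiator powers, gas flow rates, and trips in the example are hypothetical. It treats the natural gas flow in m³/h with a calorific value of 36.48 MJ/m³, and it divides distance by 100 in Equation (5) so that kWh/100 km multiplied by km yields kWh, mirroring Equation (6).

```python
from typing import Iterable, Tuple

GASOLINE_DENSITY_KG_PER_L = 0.75  # 750 kg/m^3, as assumed in the text
GASOLINE_LHV_MJ_PER_KG = 43.07    # 43,070 kJ/kg, as assumed in the text
NATURAL_GAS_MJ_PER_M3 = 36.48     # calorific value of natural gas (assumed per m^3)


def appliance_electricity_kwh(appliances: Iterable[Tuple[float, float]]) -> float:
    """Eqs. (1)/(2): sum of power (kW) x daily hours of use (h)."""
    return sum(p_kw * t_h for p_kw, t_h in appliances)


def heating_kwh(radiator_powers_kw: Iterable[float], hours: float = 24.0) -> float:
    """Eq. (3): radiators run all day (t = 24 h)."""
    return sum(q_kw * hours for q_kw in radiator_powers_kw)


def natural_gas_mj(burners: Iterable[Tuple[float, float]]) -> float:
    """Eq. (4): gas flow (m^3/h) x calorific value (MJ/m^3) x daily hours (h)."""
    return sum(v * NATURAL_GAS_MJ_PER_M3 * t for v, t in burners)


def ev_electricity_kwh(trips: Iterable[Tuple[float, float]]) -> float:
    """Eq. (5): consumption (kWh/100 km) x distance (km) / 100."""
    return sum(p * d / 100 for p, d in trips)


def gasoline_mj(trips: Iterable[Tuple[float, float]]) -> float:
    """Eq. (6): fuel use (L/100 km) x distance (km) / 100 x rho*q (MJ/L)."""
    mj_per_litre = GASOLINE_DENSITY_KG_PER_L * GASOLINE_LHV_MJ_PER_KG  # 32.3025
    return sum(p * d / 100 * mj_per_litre for p, d in trips)


if __name__ == "__main__":
    # Hypothetical winter day for one household.
    w1 = appliance_electricity_kwh([(0.15, 5), (1.2, 0.5), (0.05, 6)])  # TV, kettle, lights
    w3 = heating_kwh([0.8, 0.8, 1.0])                                   # three radiators
    w4 = natural_gas_mj([(0.4, 1.5)])                                   # one stove burner
    w6 = gasoline_mj([(7.5, 30.0)])                                     # 30 km commute
    print(f"W1={w1:.2f} kWh, W3={w3:.1f} kWh, W4={w4:.2f} MJ, W6={w6:.2f} MJ")
```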

To facilitate this research, the calculated household energy consumption data were further statistically processed to obtain the sample households’ Average Daily Summer Energy Consumption, Average Daily Winter Energy Consumption, and Average Daily Annual Energy Consumption, with tce (tons of standard coal equivalent) as the measurement unit.
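Expressing the kWh and MJ results in tce requires conversion coefficients that the text does not report. The sketch below assumes the calorific-equivalent convention (1 kgce = 29.3076 MJ, i.e. 7,000 kcal, and 1 kWh = 3.6 MJ for both electricity and delivered heat); if a primary-energy factor were used for electricity instead, the totals would be higher. The daily inputs in the example are hypothetical.

```python
MJ_PER_KGCE = 29.3076  # 1 kg standard coal equivalent = 7,000 kcal (assumed convention)
MJ_PER_KWH = 3.6       # calorific equivalent of 1 kWh (assumed for electricity and heat)


def to_tce(kwh: float = 0.0, mj: float = 0.0) -> float:
    """Convert energy given in kWh and/or MJ to tonnes of standard coal equivalent."""
    total_mj = kwh * MJ_PER_KWH + mj
    return total_mj / MJ_PER_KGCE / 1000.0


def daily_household_tce(electricity_kwh: float, heating_kwh: float,
                        gas_mj: float, ev_kwh: float, gasoline_mj: float) -> float:
    """Sum the building and transportation components for one day, in tce."""
    return to_tce(kwh=electricity_kwh + heating_kwh + ev_kwh, mj=gas_mj + gasoline_mj)


if __name__ == "__main__":
    # Hypothetical winter day: 8 kWh electricity, 60 kWh heat, 22 MJ gas, 97 MJ gasoline.
    print(f"{daily_household_tce(8.0, 60.0, 22.0, 0.0, 97.0):.5f} tce")
```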

3.3.2. Variable Selection

The relevant data obtained from the questionnaire survey were initially screened. Based on previous research, 16 potential variables that could affect household energy consumption in high-density residential areas were selected and categorized into four groups: household, building, environment, and transportation. The value assignments for these variables are detailed in Table 2.

The calculated Average Daily Summer Energy Consumption, Average Daily Winter Energy Consumption, and Average Daily Annual Energy Consumption were taken as the dependent variables in this study, represented as Y1, Y2, and Y3, respectively. Machine learning models were used with the aforementioned variables as independent variables to predict the three dependent variables and precisely identify the driving factors influencing the energy consumption of high-density residential areas.

3.4. Data Analysis Methods

This study adopts a machine learning modeling approach combined with SHAP-based model interpretation to identify the potential driving factors of energy consumption in existing high-density residential areas. Most current studies tune hyperparameters with traditional Grid Search or Random Search, which struggle to find the optimal combination in high-dimensional hyperparameter spaces and thus sacrifice predictive accuracy; this research therefore builds a workflow that couples ensemble learning models with hyperparameter optimization to enhance model performance. The steps are shown in Figure 3 and listed below.

(1): Data Preprocessing:

The energy consumption data obtained through data processing are cleaned, the missing and outlier values are removed, and the feature variables to be included in the model are selected.

(2): Model Selection and Pre-training:

To maximize the final model’s goodness of fit, multiple machine learning models are compared. Of the entire dataset, 80% is used as the training set and 20% as the test set. Each model is trained and evaluated with 5-fold cross-validation, and the best-performing model is selected for the next step.

(3): Model Hyperparameter Optimization:

Hyperparameter optimization is performed on the selected model as follows. First, a Grid Search over the default hyperparameter value ranges determines the approximate range of each hyperparameter. Then, based on a Random Search, the three most influential hyperparameters (n_estimators, max_depth, and learning_rate) are selected for tuning while the other hyperparameter values are kept unchanged. Finally, the model performance obtained with Random Search, the genetic algorithm (GA), and Bayesian Optimization (BO) is compared.

(4): Model Training and Interpretability Analysis:

The model with the best hyperparameter combination is trained, and SHAP values are used to identify the driving factors of household energy consumption in high-density residential areas.
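Steps (1) and (2) can be sketched with scikit-learn: hold out 20% of the data as a test set, score each candidate regressor by 5-fold cross-validation on the remaining 80%, and keep the best. This is an illustration under assumptions, not the authors’ code: default model settings, mean cross-validated R² as the selection score, and the synthetic data standing in for the 16 survey variables are all our choices, and XGBoost is included only if the `xgboost` package is installed.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score, train_test_split

try:  # XGBoost is optional so that the sketch also runs without it
    from xgboost import XGBRegressor
except ImportError:
    XGBRegressor = None


def compare_models(X, y, seed: int = 42):
    """Score candidate models by mean 5-fold CV R^2 on an 80% training split.

    Returns the best model name, all scores, and the held-out 20% test split.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    models = {
        "RF": RandomForestRegressor(random_state=seed),
        "GBDT": GradientBoostingRegressor(random_state=seed),
    }
    if XGBRegressor is not None:
        models["XGBoost"] = XGBRegressor(random_state=seed)
    cv = KFold(n_splits=5, shuffle=True, random_state=seed)
    scores = {
        name: float(cross_val_score(m, X_train, y_train, cv=cv, scoring="r2").mean())
        for name, m in models.items()
    }
    best = max(scores, key=scores.get)
    return best, scores, (X_test, y_test)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(275, 16))  # 275 households x 16 variables (synthetic)
    y = 2 * X[:, 0] + np.sin(X[:, 1]) + 0.1 * rng.normal(size=275)
    best, scores, _ = compare_models(X, y)
    print(best, {k: round(v, 3) for k, v in scores.items()})
```

The held-out split is returned rather than scored here so that it stays untouched until the tuned model is evaluated at the end.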

The explanations of the concepts in this study are as follows.

3.4.1. XGBoost Model

Gradient Boosting Decision Tree (GBDT) is an ensemble learning algorithm that combines decision trees with gradient boosting and can be used to solve both regression and classification problems [63]. In 2016, Chen et al. [64] proposed XGBoost (eXtreme Gradient Boosting), an ensemble learning algorithm based on GBDT. XGBoost handles missing values automatically, does not rely on feature scaling, and controls model complexity through regularization to improve generalization. It has been widely used in academic research. Its prediction is computed as follows:

{\hat{y}}_{i}^{(t)} = \sum_{k = 1}^{t} f_{k} (x_{i}) = {\hat{y}}_{i}^{(t - 1)} + f_{t} (x_{i}), \quad f_{k} \in F

(7)

In Equation (7), {\hat{y}}_{i}^{(t)} represents the prediction for sample i after the t-th iteration of the model; {\hat{y}}_{i}^{(t - 1)} represents the prediction of the previous t − 1 weak decision trees; f_{t}(x_{i}) represents the output of the t-th tree for sample i; and F is the set of all possible decision trees.

The objective function of the XGBoost algorithm consists of a loss function and a regularization term, and it is minimized through iterations. The expression of the objective function after the iteration step is as follows:

L = \sum_{i = 1}^{n} l(y_{i}, {\hat{y}}_{i})

(8)

Obj^{(t)} = \sum_{i = 1}^{n} l(y_{i}, {\hat{y}}_{i}^{(t)}) + \sum_{k = 1}^{t} Ω(f_{k})

(9)

In Equations (8) and (9), L represents the loss function, with l(·) the loss of a single sample; n denotes the number of samples; Obj^{(t)} represents the objective function at the t-th iteration; and Ω represents a regularization term that penalizes model complexity to prevent overfitting.
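The text leaves the form of Ω unspecified. For reference, in the formulation of Chen et al. [64] the complexity of a single tree f with T leaves and leaf weights w_{j} is penalized as shown below, where γ and λ are regularization hyperparameters (exposed in the xgboost library as `gamma` and `reg_lambda`). Whether this study changed their default values is not stated.

```latex
Ω(f) = γ T + \frac{1}{2} λ \sum_{j = 1}^{T} w_{j}^{2}
```

Larger γ discourages adding leaves, and larger λ shrinks the leaf weights, both of which reduce overfitting.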

3.4.2. Model Hyperparameters and Optimization Algorithms

Model hyperparameters refer to parameters that are manually set before training a model to control its behavior and performance. Common model hyperparameters and their value ranges are shown in Table 3.

Different combinations of hyperparameters affect a model’s training speed, convergence, and generalization ability, and thus its final performance. Traditional methods such as Random Search and Grid Search are currently the most common choices for hyperparameter optimization [65]. Grid Search exhaustively evaluates every given hyperparameter combination; it incurs high computational costs and suits small parameter spaces. Random Search instead samples combinations at random from the hyperparameter distributions; it is cheaper and suits larger parameter spaces, but its results depend on the random draws. In this study, two intelligent algorithms are used for hyperparameter optimization to enhance model performance.
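The cost difference between the two traditional methods shows up clearly in code: Grid Search evaluates every point of the Cartesian product of candidate values, whereas Random Search evaluates a fixed budget of random draws. The candidate values, the budget of 20 draws, and `toy_score` (a stand-in for a cross-validated model score) are all hypothetical.

```python
import itertools
import math
import random
from typing import Callable, Dict, List, Optional, Tuple

Params = Dict[str, float]
Grid = Dict[str, List[float]]

GRID: Grid = {  # hypothetical candidate values
    "n_estimators": [100, 200, 400, 800],
    "max_depth": [3, 4, 5, 6, 8],
    "learning_rate": [0.01, 0.03, 0.05, 0.1, 0.3],
}


def toy_score(p: Params) -> float:
    """Hypothetical stand-in for a cross-validated score (higher is better)."""
    return -((p["max_depth"] - 5) ** 2
             + math.log10(p["learning_rate"] / 0.05) ** 2
             + ((p["n_estimators"] - 400) / 400) ** 2)


def grid_search(grid: Grid, evaluate: Callable[[Params], float]) -> Tuple[Params, float, int]:
    """Evaluate every combination; cost = product of the list lengths."""
    names = list(grid)
    best: Optional[Params] = None
    best_score, n_evals = -math.inf, 0
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        n_evals += 1
        if score > best_score:
            best, best_score = params, score
    return best, best_score, n_evals


def random_search(grid: Grid, evaluate: Callable[[Params], float],
                  n_iter: int = 20, seed: int = 0) -> Tuple[Params, float, int]:
    """Evaluate n_iter randomly drawn combinations; cost is fixed by the budget."""
    rng = random.Random(seed)
    best: Optional[Params] = None
    best_score = -math.inf
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in grid.items()}
        score = evaluate(params)
        if score > best_score:
            best, best_score = params, score
    return best, best_score, n_iter


if __name__ == "__main__":
    print(grid_search(GRID, toy_score))    # 4 x 5 x 5 = 100 evaluations
    print(random_search(GRID, toy_score))  # 20 evaluations
```

Adding one more hyperparameter with five candidate values would multiply the grid cost by five, while the random-search budget stays unchanged.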

(1): GA:

The genetic algorithm is a heuristic search algorithm that simulates natural evolution; in this study, it is applied to hyperparameter optimization. The model hyperparameters (such as n_estimators, max_depth, and learning_rate) are first encoded, and an initial population is randomly generated. A fitness function evaluates each individual’s performance. Tournament selection chooses parents, which undergo crossover to generate offspring, and the offspring are mutated with a certain probability to avoid becoming stuck in local optima. Iterating these steps improves the population’s fitness until the optimal hyperparameter combination is determined [66].
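The GA loop described above can be sketched in plain Python. Each individual encodes (n_estimators, max_depth, learning_rate), and fitness is a user-supplied function; in practice this would be, for example, the mean cross-validated R² of an XGBoost model, replaced here by the hypothetical `toy_fitness` to keep the sketch fast. The bounds, population size, number of generations, tournament size, uniform crossover, mutation rate, and elitism (carrying the best individual forward) are illustrative assumptions, not the study’s settings.

```python
import math
import random
from typing import Callable, Dict, List

# Hypothetical search bounds for the three tuned hyperparameters.
BOUNDS = {
    "n_estimators": (50, 1000),     # integer gene
    "max_depth": (2, 10),           # integer gene
    "learning_rate": (0.005, 0.3),  # real-valued gene
}
Individual = Dict[str, float]


def sample_gene(gene: str, rng: random.Random) -> float:
    lo, hi = BOUNDS[gene]
    return rng.uniform(lo, hi) if gene == "learning_rate" else rng.randint(lo, hi)


def random_individual(rng: random.Random) -> Individual:
    return {g: sample_gene(g, rng) for g in BOUNDS}


def tournament(pop: List[Individual], fits: List[float], rng: random.Random, k: int = 3) -> Individual:
    """Draw k individuals at random and return the fittest."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: fits[i])]


def crossover(a: Individual, b: Individual, rng: random.Random) -> Individual:
    """Uniform crossover: each gene comes from either parent with probability 0.5."""
    return {g: (a[g] if rng.random() < 0.5 else b[g]) for g in a}


def mutate(ind: Individual, rng: random.Random, rate: float = 0.2) -> Individual:
    """Resample each gene within its bounds with probability `rate`."""
    return {g: (sample_gene(g, rng) if rng.random() < rate else v) for g, v in ind.items()}


def genetic_search(fitness: Callable[[Individual], float], pop_size: int = 20,
                   generations: int = 30, seed: int = 0) -> Individual:
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(ind) for ind in pop]
        elite = pop[max(range(pop_size), key=lambda i: fits[i])]  # elitism (our addition)
        children = [elite]
        while len(children) < pop_size:
            mother = tournament(pop, fits, rng)
            father = tournament(pop, fits, rng)
            children.append(mutate(crossover(mother, father, rng), rng))
        pop = children
    return max(pop, key=fitness)


def toy_fitness(p: Individual) -> float:
    """Hypothetical stand-in for a cross-validated score; peak at (400, 5, 0.05)."""
    return -(((p["n_estimators"] - 400) / 400) ** 2
             + (p["max_depth"] - 5) ** 2
             + math.log(p["learning_rate"] / 0.05) ** 2)


if __name__ == "__main__":
    print(genetic_search(toy_fitness))
```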

(2): BO:

BO is based on Bayesian inference. First, the hyperparameter search space and prior distributions are defined. Initial points are sampled at random and the objective function is evaluated at them, and a surrogate model (e.g., a Gaussian process) is fitted to these evaluations. The expected improvement is then computed to guide the selection of new sampling points. The surrogate is updated iteratively until the stopping condition is met and the optimal hyperparameter combination is reached [67].
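A compact BO loop written with NumPy alone illustrates these steps: a Gaussian-process surrogate with a squared-exponential kernel, the expected-improvement acquisition function, and iterative sampling. For brevity it tunes a single hyperparameter (log10 of the learning rate) against a hypothetical objective, picks the next point from a dense candidate grid rather than optimizing the acquisition function, and uses a fixed kernel length scale; the study’s actual BO configuration is not described in the text.

```python
import math

import numpy as np

LENGTH_SCALE = 0.5  # assumed kernel length scale (log10 units)
NOISE = 1e-6        # jitter for numerical stability

_erf = np.vectorize(math.erf)


def rbf_kernel(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Squared-exponential kernel with unit variance."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / LENGTH_SCALE) ** 2)


def gp_posterior(x_train, y_train, x_query):
    """Posterior mean and standard deviation of a zero-mean GP at x_query."""
    K = rbf_kernel(x_train, x_train) + NOISE * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)  # shape (n_train, n_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mu = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 1e-12, None)
    return mu, np.sqrt(var)


def expected_improvement(mu, sigma, best, xi=0.01):
    """EI for maximization: E[max(f - best - xi, 0)] with f ~ N(mu, sigma^2)."""
    z = (mu - best - xi) / sigma
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mu - best - xi) * cdf + sigma * pdf


def objective(log_lr: float) -> float:
    """Hypothetical CV score, peaking at learning_rate = 0.05."""
    return -(log_lr - math.log10(0.05)) ** 2


def bayes_opt(f, bounds=(-2.5, -0.5), n_init=3, n_iter=12, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(bounds[0], bounds[1], size=n_init)  # random initial design
    y = np.array([f(v) for v in x])
    candidates = np.linspace(bounds[0], bounds[1], 401)
    for _ in range(n_iter):
        y_std = (y - y.mean()) / (y.std() + 1e-12)  # standardize targets for the GP
        mu, sigma = gp_posterior(x, y_std, candidates)
        x_next = candidates[int(np.argmax(expected_improvement(mu, sigma, y_std.max())))]
        x = np.append(x, x_next)
        y = np.append(y, f(x_next))
    best = int(np.argmax(y))
    return float(x[best]), float(y[best])


if __name__ == "__main__":
    log_lr, score = bayes_opt(objective)
    print(f"learning_rate ~ {10 ** log_lr:.4f}, score {score:.5f}")
```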

Compared to Grid Search and Random Search, intelligent algorithms in hyperparameter optimization save computational costs, offer better tuning performance, and converge more quickly, thereby improving model performance.

3.4.3. Model Performance Evaluation

To quantitatively evaluate the accuracy of the model predictions, the RMSE (Root-Mean-Square Error), MAE (mean absolute error), and R² (coefficient of determination) are used to assess the performance of the machine learning models. RMSE and MAE measure the difference between the predicted and actual values, while R² measures the proportion of the variance in the actual values that the model explains. The closer RMSE and MAE are to 0 and the closer R² is to 1, the higher the model’s accuracy. The equations for these indicators are as follows [68,69]:

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}

(10)

\mathrm{MAE} = \frac{1}{n} \sum_{i = 1}^{n} \left| y_{i} - \hat{y}_{i} \right|

(11)

R^{2} = 1 - \frac{\sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}}

(12)

In Equations (10)–(12), n is the number of samples, and y_{i}, \bar{y}, and \hat{y}_{i} represent the actual value of the i-th sample, the mean of the actual values, and the predicted value of the i-th sample, respectively.
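Equations (10)–(12) translate directly into code. The pure-Python sketch below, applied to a small made-up example, can be used to sanity-check reported metric values; scikit-learn provides equivalent functions (`mean_squared_error`, `mean_absolute_error`, `r2_score` in `sklearn.metrics`).

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error, Equation (10)."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    """Mean absolute error, Equation (11)."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination, Equation (12)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
print(round(rmse(y_true, y_pred), 4))  # 0.3536
print(mae(y_true, y_pred))             # 0.25
print(r2(y_true, y_pred))              # 0.9
```

Because RMSE squares the errors before averaging, it penalizes a few large errors more heavily than MAE does, which is why the two metrics can rank models differently.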

3.4.4. SHAP Model

SHAP (SHapley Additive exPlanations) is based on Shapley values from cooperative game theory. It can be used to explain the contribution of each influencing factor to the model’s predictions and to analyze the role and effect of these factors on household energy consumption. Lundberg et al. [70] developed a Python package for model interpretation that conveniently computes SHAP values for various machine learning algorithms, including XGBoost. The package also has strong visualization capabilities, making it easy to generate various analysis plots [71].

The Shapley value is the contribution of a feature variable to the model’s prediction output, defined as

\phi_{i} = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|F| - |S| - 1)!}{|F|!} \left[ f_{S \cup \{i\}}(X_{S \cup \{i\}}) - f_{S}(X_{S}) \right]

(13)

In Equation (13), ϕ_{i} is the Shapley value of the i-th feature; N is the set of all features in the training dataset; |F| is the total number of features; S is a subset of features that excludes feature i; f_{S}(X_{S}) is the prediction of the machine learning model using only the features in S; and f_{S \cup \{i\}}(X_{S \cup \{i\}}) is the prediction after feature i is added to S. For tree models, these subset predictions are computed as averages based on the tree structure and the leaf-node values.
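Equation (13) can be evaluated literally by enumerating every subset S. The sketch below does this for a hypothetical three-feature value function in which features a and b interact; `toy_value` stands in for f_S(X_S) and is not derived from the study's model.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values per Equation (13), enumerating all subsets of the other features.

    value(S) plays the role of f_S(X_S): the model output when only the
    features in S are known.
    """
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi

def toy_value(S):
    """Hypothetical value function with an interaction between features a and b."""
    v = 10.0
    v += 4.0 * ("a" in S) + 2.0 * ("b" in S) - 1.0 * ("c" in S)
    v += 3.0 * ("a" in S and "b" in S)
    return v

phi = shapley_values(["a", "b", "c"], toy_value)
print({k: round(v, 6) for k, v in phi.items()})  # {'a': 5.5, 'b': 3.5, 'c': -1.0}
```

The interaction term of 3 is split equally between a and b, and the values sum to toy_value of all features minus toy_value of the empty set (18 − 10 = 8). Exact enumeration visits 2^(M−1) subsets per feature, so it is only feasible for a handful of features; for tree ensembles such as XGBoost, the `shap` package's `TreeExplainer` uses the Tree SHAP algorithm to compute these values in polynomial time.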

SHAP expresses the model’s output as the sum of a base value and the contributions of the individual input features, with each contribution given by a SHAP value. Suppose the model input consists of the feature variables x = (x_{1}, x_{2}, …, x_{M}); then, the explanation model g(x') for the original model f(x) can be expressed as

f(x) = g(x') = \phi_{0} + \sum_{i = 1}^{M} \phi_{i} x'_{i}

(14)

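In Equation (14), x' is the simplified vector of input variables derived from the original input variables x in the dataset, whose entries indicate whether each feature is present; M is the number of input feature variables; ϕ_{0} is a constant (the base value), equal to the model output when all inputs are absent, which in practice is the average prediction over the background data; and ϕ_{i} is the contribution of feature i.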
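Equation (14) can be checked on the simplest case. For a linear model with independent features, the SHAP value of feature i reduces to w_i(x_i − E[x_i]), and ϕ_{0} is the mean prediction over the background data. The sketch below, with made-up weights and data, confirms that the base value plus the SHAP values reproduces the prediction (the local accuracy property); it illustrates the additive form only and is not the study's XGBoost model.

```python
def linear_shap(weights, bias, background, x):
    """SHAP values of f(x) = bias + sum_j w_j x_j, assuming independent features."""
    m = len(weights)
    means = [sum(row[j] for row in background) / len(background) for j in range(m)]
    phi0 = bias + sum(w * mu for w, mu in zip(weights, means))    # base value E[f(X)]
    phi = [w * (xj - mu) for w, xj, mu in zip(weights, x, means)]  # per-feature contributions
    return phi0, phi

def predict(weights, bias, x):
    return bias + sum(w * xj for w, xj in zip(weights, x))

weights, bias = [2.0, -1.0], 0.5
background = [[0.0, 0.0], [2.0, 4.0]]   # hypothetical reference data
x = [3.0, 1.0]
phi0, phi = linear_shap(weights, bias, background, x)
print(phi0, phi)                                   # 0.5 [4.0, 1.0]
print(phi0 + sum(phi), predict(weights, bias, x))  # 5.5 5.5
```

For nonlinear models such as XGBoost, the same additive identity holds, but the ϕ_{i} must be computed with Equation (13) rather than read off the weights.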

Shapley values provide the relative importance of the feature variables: the larger the mean absolute SHAP value of a variable, the greater its contribution to the predictions. Furthermore, SHAP partial dependence plots and SHAP interaction values can reveal how the influencing factors interact with one another and affect household energy consumption.

4. Results

4.1. Energy Consumption Calculation Result

4.1.1. Population Information

The survey covered 275 households, including 1100 people. In terms of gender distribution, females accounted for 53.63% of the population, while males accounted for 46.37%. Regarding age structure, 30.15% of the population were over 60 years old, 14.22% were under 18 years old, 39.15% were between 25 and 45 years old, 10.15% were between 18 and 25 years old, and 6.33% were between 45 and 60 years old. In terms of household structure, the most common family size was two to three people, accounting for 66.8% of the population, while four-person or larger households in which residents lived with their parents accounted for 23.5%. Additionally, some sample households included individuals living alone or collective rental arrangements for eight or more people.

4.1.2. Characteristics of Household Energy Consumption Structure

For the sample households, energy consumption was divided into two major categories: building energy consumption and transportation energy consumption. The main types of energy consumption are electricity, heating, natural gas, and transportation energy use (Figure 4a: Sankey diagram of household energy consumption composition).

The calculated results of the sample households’ energy consumption show that the daily average energy consumption ranges from 0.00286 to 0.01876 tce, with an average daily household energy consumption of 0.00937 tce, as detailed in Table 4.

In terms of energy composition, building energy consumption is 0.00587 tce, accounting for 63%, and transportation energy consumption is 0.00349 tce, accounting for 37% (Figure 4b: pie chart of household energy consumption composition).

Regarding energy consumption types, electricity consumption accounts for 9%, heating energy consumption accounts for 29%, natural gas energy consumption accounts for 25%, and transportation energy consumption accounts for 37% (Figure 4c: pie chart of household energy consumption types).

Building Energy Consumption:

The main energy demands for buildings in the area include appliance consumption, hot water supply, heating, lighting, cooking, and others. The primary energy types are electricity, heating, and natural gas (Figure 4d: Sankey diagram of building energy consumption types). Regarding energy composition (Figure 4e: pie chart of building energy consumption composition), appliance consumption accounts for 8%, hot water supply accounts for 21%, heating accounts for 46%, lighting accounts for 4%, and cooking accounts for 21%. In terms of energy types, electricity consumption accounts for 14%, heating accounts for 46%, and natural gas accounts for 40% (Figure 4f: pie chart of building energy consumption types).
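The overall shares by energy type follow from the building and transportation totals and the building energy mix. The short calculation below uses only figures stated in this section and shows that scaling the building mix (Figure 4f) by the building share reproduces the overall percentages (Figure 4c) after rounding.

```python
building, transport = 0.00587, 0.00349  # daily averages in tce, as reported above
total = building + transport            # about 0.00936; differs from the reported 0.00937 only by rounding
building_share = building / total

# Building energy mix by type (Figure 4f), as fractions of building energy.
building_mix = {"electricity": 0.14, "heating": 0.46, "natural gas": 0.40}

overall_pct = {k: round(100 * building_share * v) for k, v in building_mix.items()}
overall_pct["transportation"] = round(100 * transport / total)
print(overall_pct)
# {'electricity': 9, 'heating': 29, 'natural gas': 25, 'transportation': 37}
```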

Transportation Energy Consumption:

The main travel demands for families in the area are commuting to work, traveling, picking up children, shopping, medical visits, and others. The main transportation modes are subway, buses, fuel-powered cars, new-energy vehicles, and electric bicycles. The main energy types used are electricity and fuel (Figure 4g: Sankey diagram of transportation energy consumption types). Regarding energy composition, commuting to work accounts for 33%, travel accounts for 21%, picking up children accounts for 19%, shopping accounts for 14%, medical visits account for 9%, and other energy consumption accounts for 4% (Figure 4h: pie chart of transportation energy consumption composition). In terms of energy types, electricity accounts for 46%, and fuel accounts for 54% (Figure 4i: pie chart of transportation energy consumption types).

4.2. Model Selection Result

4.2.1. Model Selection and Pre-Training

To ensure the accuracy of our research results, this study compares three models: XGBoost (eXtreme Gradient Boosting), GBDT (Gradient Boosting Decision Tree), and RF (Random Forest). These models are used to predict the three sets of dependent variables identified. During model training, the data are divided into a training set (80%) and a test set (20%), with the test set used to evaluate the model’s generalization ability.
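An 80/20 hold-out split can be sketched as follows. The helper name and fixed seed are assumptions, since the section does not state how the split was randomized. If each of the 275 surveyed households forms one sample (also an assumption), the split yields 220 training and 55 test households. scikit-learn's `train_test_split` with `test_size=0.2` performs the same operation.

```python
import random

def holdout_split(rows, test_frac=0.2, seed=42):
    """Shuffle row indices and hold out test_frac of the rows as a test set."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = round(test_frac * len(rows))
    test = [rows[i] for i in idx[:n_test]]
    train = [rows[i] for i in idx[n_test:]]
    return train, test

households = list(range(275))  # placeholder sample identifiers
train, test = holdout_split(households)
print(len(train), len(test))   # 220 55
```

Fixing the seed makes the split reproducible, so all three models are trained and tested on the same samples.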

Based on the common model hyperparameters and their value ranges compiled in Section 3.4.2, the initial hyperparameter combinations for model training were determined, as shown in Table 5:

The performance of the three models in the three prediction tasks is shown in Figure 5, which plots predicted against actual values as scatter points with fitted lines. The XGBoost predictions agree more closely with the observed values than those of the GBDT and RF models. To quantify model performance objectively, RMSE, MAE, and R² are used for a comprehensive evaluation; the resulting metrics are shown in Table 6.

In the prediction of the dependent variable Y1 (Average Daily Summer Energy Consumption), there are significant differences in prediction performance among different machine learning models. The XGB model has the highest R², which is 15.06% higher than that of RF and 13.87% higher than that of GBDT, demonstrating the highest model fit and explanatory power among the three models. The RF model has the lowest MAE value, indicating relatively good prediction accuracy. Its RMSE is also the smallest, reduced by 13.30% and 1.78% compared to that of the GBDT and XGB models, respectively, showing the best prediction stability.

In the prediction of the dependent variable Y2 (Average Daily Winter Energy Consumption), significant differences in prediction performance exist between the models. The XGB model, again, has the highest R², 23.38% higher than that of RF and 9.59% higher than that of GBDT, showing the highest model fit and explanatory power. The RF model has the lowest MAE, indicating better prediction accuracy. Its RMSE is the smallest, reduced by 11.65% and 13.4% compared to that of the GBDT and XGB models, respectively, showing the best prediction stability.

In the prediction of the dependent variable Y3 (Average Daily Annual Energy Consumption), significant differences are also observed in prediction performance between the models. The XGB model has the highest R², 24.66% higher than that of RF and 14.14% higher than that of GBDT, again demonstrating the highest model fit and explanatory power. The RF model has the lowest MAE, indicating the best prediction accuracy. Its RMSE is the smallest, reduced by 5.9% and 5.63% compared to that of the GBDT and XGB models, respectively, demonstrating the best prediction stability.

Considering the three evaluation metrics (R², MAE, and RMSE) together with the scatter plots of predicted versus actual values, the XGB model, which achieves the highest R² in all three tasks, demonstrates the strongest overall performance, although RF attains lower MAE and RMSE. Therefore, this study chooses the XGB model to explore the driving factors of household energy consumption.

4.2.2. Model Hyperparameter Optimization

To improve the performance of the XGB model, its hyperparameters were optimized using three algorithms: RS (Random Search), GA, and BO. The three hyperparameters with the most significant impact on model performance (n-estimators, max-depth, and learning-rate) were selected; their search ranges and step sizes are shown in Table 7. To make the GA and BO more efficient, 100 iterations of Random Search were first run over a broad range to identify a narrower hyperparameter range for the GA and BO. The number of iterations for both optimization algorithms was set to 30.
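The two-stage procedure (a broad Random Search followed by a narrowed range for the GA and BO) can be sketched as below. The section does not specify how the narrower range was derived, so the rule used here (the minimum and maximum of each hyperparameter among the ten best Random Search trials), the broad bounds, and the toy score are all hypothetical.

```python
import random

# Hypothetical broad bounds for the first-stage Random Search.
BROAD = {"n_estimators": (100, 1000), "max_depth": (3, 30), "learning_rate": (0.01, 0.3)}

def toy_score(p):
    """Hypothetical stand-in for a validation score (higher is better)."""
    return -(((p["n_estimators"] - 500) / 100) ** 2
             + ((p["max_depth"] - 6) / 5) ** 2
             + ((p["learning_rate"] - 0.1) / 0.05) ** 2)

def random_trials(n_iter=100, seed=0):
    """Stage 1: evaluate n_iter random configurations within the broad bounds."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_iter):
        p = {"n_estimators": rng.randint(*BROAD["n_estimators"]),
             "max_depth": rng.randint(*BROAD["max_depth"]),
             "learning_rate": rng.uniform(*BROAD["learning_rate"])}
        trials.append((toy_score(p), p))
    return trials

def narrowed_bounds(trials, top_k=10):
    """Hypothetical rule: min/max of each hyperparameter over the top_k trials."""
    best = sorted(trials, key=lambda t: t[0], reverse=True)[:top_k]
    return {name: (min(p[name] for _, p in best), max(p[name] for _, p in best))
            for name in BROAD}

trials = random_trials()
print(narrowed_bounds(trials))  # stage-2 search ranges for the GA and BO
```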

The initial GA parameter values were taken from Yan et al. [62]. The problem was treated as a multi-objective optimization with the objective functions Max_R², Min_MAE, and Min_RMSE. The decision variables were the three selected hyperparameters (n-estimators, max-depth, and learning-rate), and the constraints were the hyperparameter search ranges and iteration limits specified in Table 7. A population size of 30 (10 times the number of hyperparameters being optimized), a tournament size of 3, and a crossover probability of 0.2 were adopted.
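The section does not state how the three objectives (Max_R², Min_MAE, Min_RMSE) were combined into a single selection criterion. One common option in multi-objective GAs is Pareto dominance; the sketch below shows that comparison with made-up metric values. It is an illustrative assumption, not the authors' fitness definition; a weighted sum of normalized metrics would be an alternative.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one.

    Objectives: maximise r2, minimise mae and rmse.
    """
    no_worse = a["r2"] >= b["r2"] and a["mae"] <= b["mae"] and a["rmse"] <= b["rmse"]
    strictly_better = a["r2"] > b["r2"] or a["mae"] < b["mae"] or a["rmse"] < b["rmse"]
    return no_worse and strictly_better

def pareto_front(candidates):
    """Return the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o is not c)]

# Hypothetical metric values for three hyperparameter settings.
candidates = [
    {"name": "A", "r2": 0.80, "mae": 0.10, "rmse": 0.15},
    {"name": "B", "r2": 0.75, "mae": 0.12, "rmse": 0.16},  # dominated by A
    {"name": "C", "r2": 0.82, "mae": 0.11, "rmse": 0.14},  # trades MAE for R^2 and RMSE
]
print([c["name"] for c in pareto_front(candidates)])  # ['A', 'C']
```

Under Pareto dominance, A and C are both kept because neither is better on all three metrics, which mirrors the trade-off between R² and MAE/RMSE seen in the model comparison.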


Three hyperparameter optimization algorithms were applied to predict the three sets of dependent variables, with model performance shown in Table 8. Across all three prediction tasks, the GA-XGB and BO-XGB models optimized by the GA and BO demonstrated significantly improved performance compared to the RS-XGB model tuned via Random Search. Notably, the GA-XGB model achieved the best performance in all three tasks.

Table 9 lists the optimal hyperparameter combinations obtained for the XGB model with the three optimization methods (RS, GA, and BO). In all three prediction tasks, the max-depth found by Random Search is higher than that found by the GA and BO, indicating that increasing model complexity does not necessarily improve model performance. Likewise, the n-estimators value obtained with the GA is smaller than those obtained with Random Search and BO. Combined with the best prediction performance achieved by the GA-XGB model in Table 8, this suggests that the GA searches the hyperparameter space efficiently and thereby improves the model’s predictive performance.

Figure 6 further compares the hyperparameter optimization results of GA-XGB and BO-XGB over 30 iterations. During GA optimization, the hyperparameters converged clearly within the ranges of learning-rate (0.07, 0.09), n-estimators (300, 380), and max-depth (16, 18). In contrast, BO produced more scattered hyperparameter combinations, indicating that the GA is more efficient in hyperparameter search.

Taken together, the model selection and hyperparameter optimization results show that XGBoost as the base model delivers high predictive performance, and that the GA, applied as a multi-objective hyperparameter optimizer, searched the complex hyperparameter space for optimal solutions efficiently and stably. Thus, the GA-optimized XGBoost model (GA-XGB) was selected as the method for identifying the driving factors of household energy consumption in high-density residential areas of Beijing.

4.3. Interpretation of Model Results

For the three sets of household energy consumption data, machine learning models were trained, and the GA was used to optimize the model’s hyperparameters. After tuning the model, an interpretability analysis was performed to output the global feature importance rankings. The SHAP values of the model were visualized using Bar Plots and Summary Plots, as shown in Figure 7, Figure 8 and Figure 9.

Figure 7a, Figure 8a and Figure 9a present SHAP value Bar Plots. SHAP values represent the contribution of each feature variable to the model’s prediction. Each bar corresponds to one feature variable, and its length indicates the mean absolute SHAP value across all samples, reflecting the magnitude of that feature’s influence on the prediction. Features are ranked by importance from top to bottom.
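The bar lengths in these plots are mean absolute SHAP values. Given a matrix of per-sample SHAP values (for example, as returned by a SHAP explainer), the ranking can be reproduced as sketched below; the feature names and values are made up for illustration.

```python
def mean_abs_shap_ranking(shap_matrix, feature_names):
    """Rank features by mean |SHAP| across samples (largest first)."""
    n = len(shap_matrix)
    scores = {name: sum(abs(row[j]) for row in shap_matrix) / n
              for j, name in enumerate(feature_names)}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

features = ["f1", "f2", "f3"]      # hypothetical feature names
shap_matrix = [[0.4, -0.1, 0.0],   # one row per sample, one column per feature
               [-0.6, 0.3, 0.1],
               [0.2, -0.2, -0.1]]
for name, score in mean_abs_shap_ranking(shap_matrix, features):
    print(name, round(score, 3))
```

Taking absolute values before averaging means that a feature with large positive and negative effects still ranks as important, even if its effects cancel on average.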

Figure 7b, Figure 8b and Figure 9b show SHAP value Summary Plots, in which the features appear in the same order as in the Bar Plots. Each point represents the SHAP value of one feature for one sample, and its color, ranging from blue to red, indicates a low to high feature value. Points with similar SHAP values are stacked vertically to show the distribution of local effects. A SHAP value above 0 indicates that the feature increases the predicted energy consumption for that sample, while a value below 0 indicates that it decreases it.
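Reading "promotes" or "suppresses" from a Summary Plot amounts to asking whether high feature values (red points) coincide with positive SHAP values. One simple way to quantify this, assumed here for illustration rather than taken from the study, is the sign of the Pearson correlation between a feature's values and its SHAP values; the threshold and toy data below are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def effect_direction(feature_values, shap_values, threshold=0.3):
    """Label a feature's effect from the correlation between its values and SHAP values."""
    r = pearson(feature_values, shap_values)
    if r >= threshold:
        return "promotes"
    if r <= -threshold:
        return "suppresses"
    return "mixed"

print(effect_direction([1, 3, 5, 7, 9], [-0.2, -0.1, 0.0, 0.1, 0.3]))  # promotes
print(effect_direction([1, 2, 3], [0.2, 0.0, -0.2]))                   # suppresses
```

A correlation only summarizes a monotonic trend; non-monotonic effects are better inspected with the SHAP partial dependence plots mentioned in Section 3.4.4.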

4.3.1. Driving Factors of Average Daily Summer Energy Consumption

In Model 1, the interpretable machine learning results (Figure 7a) show that the main variables influencing SEC (Average Daily Summer Energy Consumption) are FL, CY, BA, WBS_0, AAH, and TD, whereas BO_0, SM_2, SM_0, WBS_1, and ET_0 have little impact on SEC. The SHAP value Summary Plot (Figure 7b) shows that FL and TD significantly increase SEC, while CY, BA, WBS_0, and AAH significantly reduce it.
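The suffixed names (e.g., WBS_0 and WBS_1, or SM_0, SM_1, and SM_2) appear to denote one-hot indicator columns created from categorical variables; this reading is an inference, since the encoding is not described in this section. A minimal sketch of such an encoding with toy records is shown below; pandas' `get_dummies` produces column names in the same pattern.

```python
def one_hot(records, column):
    """Replace a categorical column with 0/1 indicator columns named <column>_<level>."""
    levels = sorted({r[column] for r in records})
    encoded = []
    for r in records:
        row = {k: v for k, v in r.items() if k != column}
        for level in levels:
            row[f"{column}_{level}"] = int(r[column] == level)
        encoded.append(row)
    return encoded

records = [{"FL": 3, "WBS": 0}, {"FL": 12, "WBS": 1}]  # toy records
print(one_hot(records, "WBS"))
# [{'FL': 3, 'WBS_0': 1, 'WBS_1': 0}, {'FL': 12, 'WBS_0': 0, 'WBS_1': 1}]
```

Under this encoding, a SHAP value for WBS_0 describes the effect of a sample belonging (or not belonging) to that particular category.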

The above results show how the feature variables affect Average Daily Summer Energy Consumption (SEC). Specifically, the higher the floor level and the longer the average daily travel distance, the higher the SEC. Given Beijing’s climate, this may be because higher floors receive more direct sunlight in summer, leading to greater use of air conditioning and other active climate-control devices. As for travel distance, the semi-structured interviews conducted during the survey indicated that residents travel and shop more in summer than in winter, which lengthens daily travel distance and thus increases transportation energy consumption.

The results show that a more recent construction year (i.e., a lower building age) suppresses SEC, because the performance of insulation materials, windows, and other envelope components declines as buildings age; the windows and external walls of newer residential areas generally have better energy-saving performance, which helps reduce SEC. According to the model results, building area also suppresses SEC. This result contradicts previous research and common sense and may stem from the limited survey sample size, which could lead the machine learning model to misinterpret some outliers. The presence of water bodies in a residential area lowers the outdoor air temperature, indirectly reducing the use of indoor air conditioning and other active climate-control devices. The average household age also suppresses SEC, as expected, reflecting the important role of energy-saving awareness and behavior in controlling household energy consumption.

4.3.2. Driving Factors of Average Daily Winter Energy Consumption

In Model 2, the results from the explainable machine learning model (Figure 8a) show that the variables influencing WEC (Average Daily Winter Energy Consumption) include FL, TD, GR, SM_1, CY, AAH, and BA, while the impact of variables such as ET_0, WBS_1, SM_0, BT_0, BT_2, and SM_2 on WEC is minimal. According to the SHAP value Summary Plot (Figure 8b), the variable that significantly promotes WEC is FL, while the variables that significantly reduce WEC are GR, SM_1, CY, and AAH.

The above results show how the feature variables affect Average Daily Winter Energy Consumption (WEC). Specifically, the higher the floor level, the higher the WEC. This is likely because upper floors are more exposed to cold wind in winter, which increases heat loss through the building envelope and leads to greater use of heating devices such as air conditioners. Floor level significantly affects energy consumption in both winter and summer, making it an important consideration for residential area planners and designers.

The greening rate and surface materials help reduce WEC by moderating the outdoor microclimate, which indirectly creates a more comfortable indoor environment and thereby reduces energy consumption. A more recent construction year and a higher average household age also suppress WEC, reflecting the roles of insulation performance, energy-saving awareness, and energy-saving behavior in household energy consumption.

4.3.3. Driving Factors of Average Daily Annual Energy Consumption

In Model 3, the results of interpretable machine learning show (Figure 9a) that the variables affecting AEC (Average Daily Annual Energy Consumption) include FL, TD, AAH, GR, BA, NCS, and WBS_0, while BT_0, BO_0, SM_2, ET_0, SM_0, BT_2, and other feature variables have a smaller impact on AEC. According to the SHAP value Summary Plot (Figure 9b), the variables that significantly promote AEC are FL and TD, while the variables that significantly reduce AEC are AAH, GR, BA, NCS, and WBS_0.

The above results show how the feature variables affect Average Daily Annual Energy Consumption (AEC). Specifically, the higher the floor level and the longer the average daily travel distance, the higher the AEC, which is consistent with the results of the SEC and WEC models. Given Beijing’s hot summers and cold winters and its stringent regulations on building height, the effect of floor level on energy consumption warrants further research.

The effects of average household age, greening rate, and water bodies on AEC are similar to those found in the SEC and WEC models. The number of charging stations (NCS) suppresses AEC. The semi-structured interviews indicated that the availability of charging facilities is an important consideration when residents purchase a vehicle. Because the existing infrastructure in high-density residential areas provides too few charging facilities, fewer surveyed households own electric vehicles, and the resulting reliance on fuel vehicles raises household transportation energy consumption. The counterintuitive effect of building area on AEC likely carries over from the Average Daily Summer Energy Consumption data, as in the SEC model.

5. Conclusions

This article collects first-hand data on household energy consumption in high-density residential areas of Beijing through field surveys, questionnaires, and semi-structured interviews. By organizing and summarizing the survey data, the SEC, WEC, and AEC of households in high-density residential areas of Beijing were calculated. Based on related research, four categories of variables were selected: (1) Household Information, (2) Building Information, (3) Site Environmental Information, and (4) Travel Condition Information, together comprising 16 candidate variables that may influence household energy consumption. The three sets of energy consumption data and the 16 variables formed the dataset for machine learning prediction, and three machine learning models were built to identify the driving factors of household energy consumption in summer, in winter, and over the whole year.

The energy consumption calculation results show that the daily energy consumption of sample households ranges from 0.00286 to 0.01876 tce, with the average daily household energy consumption being 0.00937 tce. Among the surveyed households, energy consumption is mainly divided into building energy consumption (63%) and transportation energy consumption (37%). In terms of energy types, electricity consumption accounts for 9%, heating consumption accounts for 29%, natural gas consumption accounts for 25%, and transportation energy consumption accounts for 37%. Regarding building energy consumption, electricity accounts for 14%, heating accounts for 46%, and natural gas accounts for 40%. In terms of transportation energy consumption, electricity accounts for 46%, and fuel accounts for 54%. Calculating the energy consumption of sample households helps us to understand the structural characteristics of household energy consumption in high-density residential areas and provides a basis for proposing energy-saving measures for these areas.

To effectively explore the driving factors of energy consumption in existing high-density residential areas and propose more practical energy-saving and emission reduction strategies, this article constructed an ensemble learning workflow with hyperparameter optimization. By comparing the performance of the XGBoost, RF, and GBDT machine learning models on the same dataset, using R², RMSE, and MAE as evaluation metrics, an appropriate machine learning model was initially selected. To further improve model performance, the model hyperparameters were optimized using three algorithms: after the approximate range of the hyperparameters (n-estimators, max-depth, and learning-rate) was determined through GS, the performance of RS, GA, and BO in optimizing them was compared. The results show that XGBoost outperforms RF and GBDT, and that the GA yields greater improvement than RS and BO. Therefore, this study used the XGBoost model as the research tool, with its hyperparameters optimized by the GA.

By plotting a Bar Plot and a Summary Plot of the model’s SHAP values, the contributions of different variables to the model were demonstrated, both in terms of magnitude and direction. The results show that the driving factors of SEC include FL, TD, CY, WBS, and AAH. Among these, FL, CY, and WBS are objective factors affecting household energy consumption, while TD and AAH are subjective factors. The driving factors of WEC include FL, GR, SM, CY, and AAH. Among these, FL, GR, SM, and CY are objective factors, while AAH is a subjective factor. The driving factors of AEC include FL, TD, AAH, GR, WBS, and NCS. Among these, FL, GR, WBS, and NCS are objective factors, while TD and AAH are subjective factors.

These results show that existing high-density residential areas have certain shortcomings compared with more recently built residential areas, especially in terms of building height. The driving effect of FL on household energy consumption reflects an inadequate response of building height control to rapid urbanization during initial construction. The building cluster layout did not adequately consider climate conditions, leading to increased energy consumption from air conditioning and other climate-control devices. The driving effect of CY reflects the declining thermal insulation performance of walls and windows in older buildings, which increases residents’ reliance on air conditioning in both summer and winter under complex climatic conditions. GR, WBS, and SM reflect the failure of past planning to consider adequately how the residential environment regulates the outdoor climate and, in turn, indoor temperatures. TD reflects the concentration of residential areas in past urban planning, which has resulted in a severe separation of workplaces and residences in Beijing and made long commutes a major issue. AAH reflects households’ energy-saving awareness and behavior; its significant impact on AEC emphasizes the need to strengthen energy-saving training.

Based on these research results and the current status of existing high-density residential areas, the following energy-saving measures are proposed to fully exploit the energy-saving potential of these areas: (1) Planning and layout: optimize the site land-use structure, centralize parking spaces (NPS), free up activity areas and walking paths, and increase nearby job opportunities to reduce commuting needs. (2) Site design: adjust surface materials (SM), optimize outdoor activity spaces, and adjust plant configurations to increase carbon sequestration. (3) Building design: improve the thermal insulation performance of building envelopes and incorporate vertical greening and photovoltaic roofs to improve energy efficiency. (4) Energy-saving awareness: establish community programs to increase residents’ involvement in energy-saving and low-carbon activities.

The innovations of this study are reflected in the following aspects:

(1): Application of a GA to optimize machine learning hyperparameters in the analysis of household energy consumption in high-density residential areas, overcoming the limitations of traditional Grid Search and Random Search and significantly improving prediction accuracy;
(2): Construction of a dual-dimensional analytical framework integrating subjective and objective factors to systematically reveal the mechanisms influencing household energy consumption;
(3): Integration of field surveys and machine learning models to precisely identify key drivers of energy consumption in high-density residential areas, providing a scientific decision-making basis for urban low-carbon management.

This study also acknowledges certain limitations:

(1): Due to the time and labor constraints inherent in door-to-door surveys, the study was unable to cover all urban areas, thereby overlooking cross-regional differences;
(2): Lack of cost–benefit analysis: The evaluation of energy-saving measures lacked a comprehensive cost–benefit analysis of retrofit measures.

Subsequent research will deepen cross-regional comparative studies, establish a multidimensional data collection system, and develop cost–benefit evaluation models to address these gaps.

Author Contributions

Conceptualization, Z.Q. and X.Y.; methodology, X.Y.; software, writing—review and editing, L.Z. and Z.Q.; validation, Z.Q.; investigation, Z.Q., L.Z. and Y.Z.; data curation, Y.Z.; writing—original draft preparation, Z.Q.; visualization, Y.Z.; supervision, X.Y.; project administration, X.Y.; funding acquisition, X.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the General Project of the Natural Science Foundation of Beijing City, grant number 8202017.

Data Availability Statement

The data will be made available on request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

tce	Tons of standard coal equivalent
HP	Household population
AAH	Average age of household
CY	Construction year
BA	Building area
BO	Building orientation
FL	Floor level
BT	Building type
NW	Number of windows
FAR	Floor area ratio
GR	Greening rate
WBS	Water body status
SM	Surface material
NPS	Number of parking spaces
NCS	Number of charging stations
TD	Travel distance
ET	Energy type
SEC	Average Daily Summer Energy Consumption
WEC	Average Daily Winter Energy Consumption
AEC	Average Daily Annual Energy Consumption
XGBoost	eXtreme Gradient Boosting
GBDT	Gradient Boosting Decision Tree
RF	Random Forest
GS	GridSearchCV
RS	Random Search
GA	Genetic algorithm
BO	Bayesian Optimization
RMSE	Root-Mean-Square Error
MAE	Mean absolute error
R²	Coefficient of determination
SHAP	SHapley Additive exPlanations

References

COP 25 Presidency, UN Climate Change Conference—December 2019. Available online: https://unfccc.int/cop25 (accessed on 20 March 2025).
Zhang, S.C.; Yang, X.Y.; Xu, W.; Fu, Y.J. Contribution of nearly-zero energy buildings standards enforcement to achieve carbon neutral in urban area by 2060. Adv. Clim. Change Res. 2021, 12, 734–743.
Min, J.; Yan, G.X.; Abed, A.M.; Elattar, S.; Khadimallah, M.A.; Jan, A.M.; Ali, H.E. The effect of carbon dioxide emissions on the building energy efficiency. Fuel 2022, 326, 10.
World Energy Outlook. 2017. Available online: https://www.iea.org/reports/world-energy-outlook-2017 (accessed on 20 March 2025).
Kuang, B.; Schelly, C.; Ou, G.; Sahraei-Ardakani, M.; Tiwari, S.; Chen, J.L. Data-driven analysis of influential factors on residential energy end-use in the US. J. Build. Eng. 2023, 75, 22.
Building Energy Conservation Research, Tsinghua University. 2021 Annual Report on China Building Energy Efficiency; China Architecture & Building Press: Beijing, China, 2021. (In Chinese)
Miao, L. Examining the impact factors of urban residential energy consumption and CO₂ emissions in China—Evidence from city-level data. Ecol. Indic. 2017, 73, 29–37.
Du, J.; Yu, C.; Pan, W. Multiple influencing factors analysis of household energy consumption in high-rise residential buildings: Evidence from Hong Kong. Build. Simul. 2020, 13, 753–769.
Sun, W.H.; Sun, Y.N.; Xu, L.; Chen, X.; Zai, D.B. Research on Energy Consumption Constitution and Energy Efficiency Strategies of Residential Buildings in China Based on Carbon Neutral Demand. Sustainability 2022, 14, 16.
Day, J.K.; McIlvennie, C.; Brackley, C.; Tarantini, M.; Piselli, C.; Hahn, J.; O’Brien, W.; Rajus, V.S.; De Simone, M.; Kjærgaard, M.B.; et al. A review of select human-building interfaces and their relationship to human behavior, energy use and occupant comfort. Build. Environ. 2020, 178, 14.
Xu, X.X.; Yu, H.; Sun, Q.W.; Tam, V.W.Y. A critical review of occupant energy consumption behavior in buildings: How we got here, where we are, and where we are headed. Renew. Sustain. Energy Rev. 2023, 182, 25.
Lin, M.; Afshari, A.; Azar, E. A data-driven analysis of building energy use with emphasis on operation and maintenance: A case study from the UAE. J. Clean. Prod. 2018, 192, 169–178.
von Grabe, J. A preliminary cognitive model for the prediction of energy-relevant human interaction with buildings. Cogn. Syst. Res. 2018, 49, 65–82.
Yin, H.W.; Kong, F.H.; Zhang, X. Changes of residential land density and spatial pattern from 1989 to 2004 in Jinan City, China. Chin. Geogr. Sci. 2011, 21, 619–628.
Yu, Y.W.; Yang, J.; Chai, S.; Tang, L. Estimating the energy saving potential of residential consumption in China based on decent living standards. Front. Environ. Sci. 2022, 10, 13.
Chen, X.Y.; Gou, Z.H. Bridging the knowledge gap between energy-saving intentions and behaviours of young people in residential buildings. J. Build. Eng. 2022, 57, 15.
Qalati, S.A.; Qureshi, N.A.; Ostic, D.; Sulaiman, M. An extension of the theory of planned behavior to understand factors influencing Pakistani households’ energy-saving intentions and behavior: A mediated-moderated model. Energy Effic. 2022, 15, 21.
Wang, Y.A.; Zhang, W.K. A study about the impact of energy saving climate on college students’ energy saving behavior: Based on analysis using the hierarchical linear model. J. Environ. Plan. Manag. 2023, 66, 2943–2961.
Bansal, P.; Quan, S.J. Relationships between building characteristics, urban form and building energy use in different local climate zone contexts: An empirical study in Seoul. Energy Build. 2022, 272, 21.
Abanda, F.H.; Byers, L. An investigation of the impact of building orientation on energy consumption in a domestic building using emerging BIM (Building Information Modelling). Energy 2016, 97, 517–527.
Jurshari, M.Z.; Tazakor, M.Y.; Yeganeh, M. Optimizing the dimensional ratio and orientation of residential buildings in the humid temperate climate to reduce energy consumption (Case: Rasht Iran). Case Stud. Therm. Eng. 2024, 59, 12.
Xu, S.; Wang, S.Y.; Li, G.M.; Zhou, H.Z.; Meng, C.; Qin, Y.C.; He, B.J. Performance-based design of residential blocks for the co-benefits of building energy efficiency and outdoor thermal comfort improvement. Build. Environ. 2024, 264, 19.
Feng, W.J.; Chen, J.T.; Yang, Y.; Gao, W.J.; Xing, H.W.; Yu, S. Multi-objective optimization of morphology for high-rise residential cluster with the regards to energy use and microclimate. Energy Build. 2024, 319, 16.
Wang, J.; Zhu, Z.Z.; Zhao, J.C.; Li, X.Q.; Liu, J.Y.; Yang, Y.J. Research on the Energy Consumption Influence Mechanism and Prediction for the Early Design Stage of University Public Teaching Buildings in Beijing. Buildings 2024, 14, 22.
Taleb, S.; Yeretzian, A.; Jabr, R.A.; Hajj, H. Optimization of building form to reduce incident solar radiation. J. Build. Eng. 2020, 28, 13.
He, H.Y.; Zhao, Z.Q.; Yan, H.; Zhang, G.Q.; Jing, R.; Zhou, M.R.; Wu, X.; Lin, T.; Ye, H. Urban functional area building carbon emission reduction driven by three-dimensional compact urban forms’ optimization. Ecol. Indic. 2024, 167, 10.
de Meester, T.; Marique, A.F.; De Herde, A.; Reiter, S. Impacts of occupant behaviours on residential heating consumption for detached houses in a temperate climate in the northern part of Europe. Energy Build. 2013, 57, 313–323.
Wemyss, D.; Cellina, F.; Lobsiger-Kägi, E.; de Luca, V.; Castri, R. Does it last? Long-term impacts of an app-based behavior change intervention on household electricity savings in Switzerland. Energy Res. Soc. Sci. 2019, 47, 16–27.
Konis, K.; Orosz, M.; Sintov, N. A window into occupant-driven energy outcomes: Leveraging sub-metering infrastructure to examine psychosocial factors driving long-term outcomes of short-term competition-based energy interventions. Energy Build. 2016, 116, 206–217.
Ingle, A.; Moezzi, M.; Lutzenhiser, L.; Diamond, R. Better home energy audit modelling: Incorporating inhabitant behaviours. Build. Res. Inf. 2014, 42, 409–421.
Rausser, G.; Strielkowski, W.; Mentel, G. Consumer Attitudes Toward Energy Reduction and Changing Energy Consumption Behaviors. Energies 2023, 16, 5.
Maréchal, K. Not irrational but habitual: The importance of “behavioural lock-in” in energy consumption. Ecol. Econ. 2010, 69, 1104–1114.
Wu, L.; Zhou, Y. Social norms and energy conservation in China. Resour. Energy Econ. 2025, 82, 101491.
Podgornik, A.; Sucic, B.; Blazic, B. Effects of customized consumption feedback on energy efficient behaviour in low-income households. J. Clean. Prod. 2016, 130, 25–34.
Stephenson, J.; Barton, B.; Carrington, G.; Gnoth, D.; Lawson, R.; Thorsnes, P. Energy cultures: A framework for understanding energy behaviours. Energy Policy 2010, 38, 6120–6129.
Han, M.S.; Cudjoe, D. Determinants of energy-saving behavior of urban residents: Evidence from Myanmar. Energy Policy 2020, 140, 7.
Yan, W.; Yuan, Y.D.; Yang, M.H.; Zhang, P.; Peng, K.P. Detecting the risk of bullying victimization among adolescents: A large-scale machine learning approach. Comput. Hum. Behav. 2023, 147, 18.
Dwyer, D.B.; Falkai, P.; Koutsouleris, N. Machine Learning Approaches for Clinical Psychology and Psychiatry. In Annual Review of Clinical Psychology; Widiger, T., Cannon, T.D., Eds.; Annual Review of Clinical Psychology; Annual Reviews: Palo Alto, CA, USA, 2018; Volume 14, pp. 91–118.
Liu, H.; Chen, X.; Liu, X.X. Factors influencing secondary school students’ reading literacy: An analysis based on XGBoost and SHAP methods. Front. Psychol. 2022, 13, 18.
Alvarez-Sanz, M.; Satriya, F.A.; Terés-Zubiaga, J.; Campos-Celador, A.; Bermejo, U. Ranking building design and operation parameters for residential heating demand forecasting with machine learning. J. Build. Eng. 2024, 86, 22.
Attanasio, A.; Piscitelli, M.S.; Chiusano, S.; Capozzoli, A.; Cerquitelli, T. Towards an Automated, Fast and Interpretable Estimation Model of Heating Energy Demand: A Data-Driven Approach Exploiting Building Energy Certificates. Energies 2019, 12, 25.
Li, X.Y.; Yao, R.M. Modelling heating and cooling energy demand for building stock using a hybrid approach. Energy Build. 2021, 235, 15.
Westermann, P.; Welzel, M.; Evins, R. Using a deep temporal convolutional network as a building energy surrogate model that spans multiple climate zones. Appl. Energy 2020, 278, 16.
Runge, J.; Saloux, E. A comparison of prediction and forecasting artificial intelligence models to estimate the future energy demand in a district heating system. Energy 2023, 269, 14.
Lu, C.J.; Li, S.H.; Gu, J.H.; Lu, W.Z.; Olofsson, T.; Ma, J.G. A hybrid ensemble learning framework for zero-energy potential prediction of photovoltaic direct-driven air conditioners. J. Build. Eng. 2023, 64, 16.
Sauer, J.; Mariani, V.C.; Coelho, L.D.; Ribeiro, M.H.D.; Rampazzo, M. Extreme gradient boosting model based on improved Jaya optimizer applied to forecasting energy consumption in residential buildings. Evol. Syst. 2022, 13, 577–588.
Barbaresi, A.; Ceccarelli, M.; Menichetti, G.; Torreggiani, D.; Tassinari, P.; Bovo, M. Application of Machine Learning Models for Fast and Accurate Predictions of Building Energy Need. Energies 2022, 15, 16.
Shen, Y.X.; Pan, Y. BIM-supported automatic energy performance analysis for green building design using explainable machine learning and multi-objective optimization. Appl. Energy 2023, 333, 19.
Wang, B.H.; Li, J.W. Global sensitivity analysis based on multi-objective optimization of rural tourism building performance. J. Clean. Prod. 2023, 417, 14.
Guo, J.X.; Yun, S.N.; Meng, Y.; He, N.; Ye, D.F.; Zhao, Z.N.; Jia, L.Y.; Yang, L. Prediction of heating and cooling loads based on light gradient boosting machine algorithms. Build. Environ. 2023, 236, 15.
Ardabili, S.; Abdolalizadeh, L.; Mako, C.; Torok, B.; Mosavi, A. Systematic Review of Deep Learning and Machine Learning for Building Energy. Front. Energy Res. 2022, 10, 19.
Olu-Ajayi, R.; Alaka, H.; Sulaimon, I.; Sunmola, F.; Ajayi, S. Machine learning for energy performance prediction at the design stage of buildings. Energy Sustain. Dev. 2022, 66, 12–25.
Song, Q.Q.; Xia, S.L.; Wu, Z. Automatic Optimization of Hyperparameters for Deep Convolutional Neural Networks: Grid Search Enhanced with Coordinate Ascent. In Proceedings of the International Conference on Machine Intelligence and Digital Applications (MIDA), Ningbo, China, 30–31 May 2024; The Association for Computing Machinery: New York, NY, USA, 2024; pp. 300–306.
Wainer, J.; Fonseca, P. How to tune the RBF SVM hyperparameters? An empirical evaluation of 18 search algorithms. Artif. Intell. Rev. 2021, 54, 4771–4797.
Açikkar, M. Fast grid search: A grid search-inspired algorithm for optimizing hyperparameters of support vector regression. Turk. J. Electr. Eng. Comput. Sci. 2024, 32, 26.
Vulpe-Grigorasi, A.; Grigore, O. Convolutional Neural Network Hyperparameters Optimization for Facial Emotion Recognition. In Proceedings of the 12th International Symposium on Advanced Topics in Electrical Engineering (ATEE), Bucharest, Romania, 25–27 March 2021; IEEE: Piscataway, NJ, USA, 2021.
Wen, L.; Ye, X.C.; Gao, L. A new automatic machine learning based hyperparameter optimization for workpiece quality prediction. Meas. Control 2020, 53, 1088–1098.
Yokoyama, A.M.; Ferro, M.; Schulze, B. Multi-objective hyperparameter optimization approach with genetic algorithms towards efficient and environmentally friendly machine learning. AI Commun. 2024, 37, 429–442.
Ali, Y.A.; Awwad, E.M.; Al-Razgan, M.; Maarouf, A. Hyperparameter Search for Machine Learning Algorithms for Optimizing the Computational Complexity. Processes 2023, 11, 21.
Huang, C.Y.; Zhang, G.J.; Yao, J.W.; Wang, X.X.; Calautit, J.K.; Zhao, C.R.; An, N.; Peng, X. Accelerated environmental performance-driven urban design with generative adversarial network. Build. Environ. 2022, 224, 22.
Canbolat, A.S.; Albak, E.I. Multi-Objective Optimization of Building Design Parameters for Cost Reduction and CO₂ Emission Control Using Four Different Algorithms. Appl. Sci. 2024, 14, 24.
Gong, P.; Chen, B.; Li, X.C.; Liu, H.; Wang, J.; Bai, Y.Q.; Chen, J.M.; Chen, X.; Fang, L.; Feng, S.L.; et al. Mapping essential urban land use categories in China (EULUC-China): Preliminary results for 2018. Sci. Bull. 2020, 65, 182–187.
Che, Y.Z.; Li, X.C.; Liu, X.P.; Wang, Y.H.; Liao, W.L.; Zheng, X.W.; Zhang, X.C.; Xu, X.C.; Shi, Q.; Zhu, J.J.; et al. 3D-GloBFP: The first global three-dimensional building footprint dataset. Earth Syst. Sci. Data 2024, 16, 5357–5374.
Elith, J.; Leathwick, J.R.; Hastie, T. A working guide to boosted regression trees. J. Anim. Ecol. 2008, 77, 802–813.
Chen, T.Q.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), San Francisco, CA, USA, 13–17 August 2016; The Association for Computing Machinery: New York, NY, USA, 2016; pp. 785–794.
Stopps, H.; Lozinsky, C.H.; Touchie, M.F. Data-driven modelling of pressurized corridor ventilation system performance in a multi-unit residential building. J. Build. Phys. 2025, 18, 1–18.
Ma, R.; Xing, Q.; Zhang, J.Y.; Wang, J.; Wang, Y.J. Logging interpretation method based on Bayesian Optimization XGBoost. In Proceedings of the 16th IEEE International Conference on Signal Processing (ICSP), Beijing, China, 21–24 October 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 395–400.
Lundberg, S.M.; Erion, G.G.; Lee, S.-I. Consistent Individualized Feature Attribution for Tree Ensembles. arXiv 2018, arXiv:1802.03888.
Sharma, P.; Sahoo, B.B. An ANFIS-RSM based modeling and multi-objective optimization of syngas powered dual-fuel engine. Int. J. Hydrogen Energy 2022, 47, 19298–19318.
Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the 31st Annual Conference on Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4–9 December 2017.
Yan, H.A.; Yan, K.; Ji, G.H. Optimization and prediction in the early design stage of office buildings using genetic and XGBoost algorithms. Build. Environ. 2022, 218, 12.

Figure 1. Distribution of high-density residential areas and survey site distribution in Beijing’s central urban area (data sources: [61,62]).

Figure 2. Research flowchart showing the three main steps of the study: data collection, data processing, and data analysis.

Figure 3. Schematic diagram of the model selection and hyperparameter optimization process. The hyperparameter-optimized ensemble learning workflow designed in this study consists of two parts: model selection and training, and model hyperparameter optimization.

Figure 4. Schematic diagram of the household energy consumption structure of the sample: (a) Sankey diagram of household energy consumption composition. (b) Pie chart of household energy consumption composition. (c) Pie chart of household energy consumption types. (d) Sankey diagram of building energy consumption demand types. (e) Pie chart of building energy consumption composition. (f) Pie chart of building energy consumption types. (g) Sankey diagram of transportation energy consumption demand types. (h) Pie chart of transportation energy consumption composition. (i) Pie chart of transportation energy consumption types.

Figure 5. Scatter plots of predicted versus actual values for the different models. The x-axis shows the predicted values of the dependent variable and the y-axis shows the actual values. The red line is the best-fit line of each model, and the color of each point encodes its prediction error, with red indicating errors close to 0. (a,d,g) show the predictions of the RF model, (b,e,h) those of the GBDT model, and (c,f,i) those of the XGBoost model.

Figure 6. Comparison of hyperparameter optimization results for the optimized models: (a) GA-XGB model; (b) BO-XGB model.

Figure 7. SHAP values of the SEC model: (a) bar plot; (b) summary plot.

Figure 8. SHAP values of the WEC model: (a) bar plot; (b) summary plot.

Figure 9. SHAP values of the AEC model: (a) bar plot; (b) summary plot.
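The SHAP values in Figures 7 to 9 are Shapley values: each feature's contribution is its marginal effect on the prediction, averaged over all orders in which features can be added. The sketch below computes them by brute force for a tiny, hypothetical model so the definition is visible. It is not how the shap library handles XGBoost (that relies on the polynomial-time TreeSHAP algorithm of Lundberg et al.), and it assumes a single baseline point for "absent" features, which is a simplification; the exponential enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def exact_shapley(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """Brute-force Shapley values of f at x relative to one baseline point.

    A coalition S is valued by evaluating f with features in S taken from x
    and every other feature taken from the baseline (a simplifying assumption).
    """
    n = len(x)

    def value(coalition: set[int]) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalitions of 0..n-1 features that exclude i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z: Sequence[float]) -> float:
    """Hypothetical model: two additive terms plus an interaction of features 0 and 2."""
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[0] * z[2]


x = [3.0, 1.0, 2.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(toy_model, x, baseline)
print(phi, sum(phi), toy_model(x) - toy_model(baseline))
```

For the purely additive feature the result is just weight × (x − baseline), while the interaction term is split equally between features 0 and 2. The values always sum to f(x) − f(baseline); this additivity is what lets the bar and summary plots be read as a decomposition of each prediction.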

Table 1. Basic information about survey locations.

Group	Land Area (m²)	Building Area (m²)	FAR	Base Area (m²)	Building Density	Average Building Height (m)
Group A	296,050.83	722,364.025	2.44	103,173.71	34.85%	32.15
Group B	135,164.1	319,643	2.36	43,658	32.3%	33.15
Group C	327,926.61	823,095.79	2.51	116,249.98	35.45%	43.5
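The derived columns of Table 1 follow the usual site-planning definitions and can be checked directly from the area columns: FAR is total building (floor) area divided by land area, and building density is base (footprint) area divided by land area. A minimal sketch, assuming those standard definitions:

```python
# Areas from Table 1 in m²: (land area, building area, base area)
SITE_AREAS = {
    "Group A": (296_050.83, 722_364.025, 103_173.71),
    "Group B": (135_164.1, 319_643.0, 43_658.0),
    "Group C": (327_926.61, 823_095.79, 116_249.98),
}


def site_indicators(land_area: float, building_area: float, base_area: float) -> tuple[float, float]:
    """Return (floor area ratio, building density in %) for one site."""
    far = building_area / land_area
    density_pct = base_area / land_area * 100.0
    return far, density_pct


results = {group: site_indicators(*areas) for group, areas in SITE_AREAS.items()}
for group, (far, density_pct) in results.items():
    print(f"{group}: FAR = {far:.2f}, building density = {density_pct:.2f}%")
```

Rounded to two decimals, these reproduce the FAR and building density values listed for all three groups.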

Table 2. Summary of variables.

Category	Number	Variable	Assignment	n	Minimum Value	Maximum Value	Average Value	Standard Deviation
Household Information	X1	HP	-	275	1	7	2.80	0.941
	X2	AAH	-	275	13	70	33.48	13.076
Building Information	X3	CY	-	275	1996	2010	2001.67	5.070
	X4	BA	-	275	28.00	120.00	74.5165	20.25741
	X5	BO	0: E-W	98	0	1	0.6420	0.48241
			1: S-N	177
	X6	FL	-	275	1	21	4.78	5.211
	X7	BT	0: Slab	129	0	2	0.6543	0.69211
			1: Point	110
			2: Enclosed	36
	X8	NW	-	275	3	7	4.30	0.872
Site Environmental Information	X9	FAR	-	275	1.54	4.56	3.2890	1.16111
	X10	GR	-	275	10.4852	23.6328	15.01	4.50
	X11	WBS	0: Yes	152	0	1	0.4444	0.500
			1: No	123
	X12	SM	0: Brick	123	0	2	0.6543	0.6549
			1: Gravel	127
			2: Asphalt	25
	X13	NPS	-	275	45	112	85.63	27.135
	X14	NCS	-	275	12	45	27.36	11.711
Travel Condition Information	X15	TD	-	275	1	120	46.019	33.0080
	X16	ET	0: Gasoline	129	0	1	0.5309	0.50216
			1: Electric	146

(1) Household Information: household population (HP), average age of household (AAH). (2) Building Information: construction year (CY), building area (BA), building orientation (BO), floor level (FL), building type (BT), number of windows (NW). (3) Site Environmental Information: floor area ratio (FAR), greening rate (GR), water body status (WBS), surface material (SM), number of parking spaces (NPS), number of charging stations (NCS). (4) Travel Condition Information: travel distance (TD), energy type (ET).
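Table 2 codes the categorical survey answers as integers (the Assignment column). A minimal sketch of that encoding step follows; it assumes responses arrive as one dictionary per household keyed by the abbreviations above, and the record layout and helper names are hypothetical. It also shows why the average of a 0/1 variable is simply the share of households coded 1: for ET, 146 of 275 households coded 1 gives 0.5309, the average reported in the table.

```python
# Integer codes from the Assignment column of Table 2 (categorical variables only)
CODEBOOK = {
    "BO": {"E-W": 0, "S-N": 1},
    "BT": {"Slab": 0, "Point": 1, "Enclosed": 2},
    "WBS": {"Yes": 0, "No": 1},
    "SM": {"Brick": 0, "Gravel": 1, "Asphalt": 2},
    "ET": {"Gasoline": 0, "Electric": 1},
}


def encode_household(record: dict) -> dict:
    """Replace categorical answers with their Table 2 codes; numeric fields pass through."""
    encoded = {}
    for key, value in record.items():
        if key in CODEBOOK:
            if value not in CODEBOOK[key]:
                raise ValueError(f"unknown category {value!r} for {key}")
            encoded[key] = CODEBOOK[key][value]
        else:
            encoded[key] = value
    return encoded


def share_coded_one(codes: list[int]) -> float:
    """Mean of a 0/1 variable, i.e. the fraction of records coded 1."""
    return sum(codes) / len(codes)


# Hypothetical household record using the Table 2 abbreviations as keys
household = {"HP": 3, "BO": "S-N", "BT": "Point", "ET": "Electric", "TD": 25}
print(encode_household(household))

# ET in Table 2: 129 households coded 0 (gasoline), 146 coded 1 (electric)
et_mean = share_coded_one([0] * 129 + [1] * 146)
print(round(et_mean, 4))
```

BT and SM are coded 0/1/2, which imposes an order on nominal categories. Tree ensembles such as XGBoost can still separate the categories through successive threshold splits, but one-hot encoding is the usual alternative when that ordering carries no meaning.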

Table 3. Summary table of common model hyperparameters and their value ranges.

Hyperparameter Name	Value Range	Search Step Size
n-estimators	(10, 1000)	10/20
max-depth	(3, 15)	1/2
min-child-weight	(1, 10)	0.1/0.5
learning-rate	(0.01, 0.3)	0.01/0.05
subsample	(0, 1)	0.05/0.1
colsample-bytree	(0, 1)	0.05/0.1
alpha	(0, 100)	0.1/1
lambda	(0, 100)	0.1/1
gamma	(0, 10)	0.1/0.5

Table 4. Household energy consumption data statistics.

Category	Statistical Item	Range	Average Value	Standard Deviation
Household energy consumption	Average daily annual energy consumption	0.00286–0.01876	0.00937	0.00412
Component	Building energy consumption	0.00157–0.01586	0.00587	0.00214
Component	Transportation energy consumption	0.00032–0.01319	0.00349	0.00273
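Because a mean is linear, if each household's total is the sum of its building and transportation components (which is how Table 4 presents them, but is treated as an assumption here), the component means should add up to the total mean. The standard deviations should not add: variances combine only after a covariance term is included. A short check on the tabulated values:

```python
# Table 4 values (average daily energy consumption; all rows share one unit)
TOTAL_MEAN, TOTAL_SD = 0.00937, 0.00412
BUILDING_MEAN, BUILDING_SD = 0.00587, 0.00214
TRANSPORT_MEAN, TRANSPORT_SD = 0.00349, 0.00273

# Linearity of the mean: E[B + T] = E[B] + E[T]
mean_residual = TOTAL_MEAN - (BUILDING_MEAN + TRANSPORT_MEAN)

# Var[B + T] = Var[B] + Var[T] + 2 Cov[B, T], solved for the implied correlation
implied_cov = (TOTAL_SD**2 - BUILDING_SD**2 - TRANSPORT_SD**2) / 2
implied_corr = implied_cov / (BUILDING_SD * TRANSPORT_SD)

building_share = BUILDING_MEAN / TOTAL_MEAN
print(f"mean residual: {mean_residual:.5f}")
print(f"building share of the mean total: {building_share:.1%}")
print(f"implied building-transport correlation: {implied_corr:.2f}")
```

The residual of about 0.00001 is within rounding of the five-decimal values. The implied correlation of roughly 0.4 would indicate a moderate positive association between the two components, although with rounded summary statistics it is only indicative.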

Table 5. Initial hyperparameter combination.

Hyperparameter Name	Initial Value
n-estimators	300
max-depth	10
min-child-weight	5
learning-rate	0.1
subsample	0.5
colsample-bytree	0.5
alpha	0.1
lambda	0.1
gamma	5
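Tables 3 and 5 use descriptive hyperparameter names; in XGBoost's scikit-learn wrapper (`xgboost.XGBRegressor`) the same settings are passed as `n_estimators`, `max_depth`, `min_child_weight`, `learning_rate`, `subsample`, `colsample_bytree`, `reg_alpha`, `reg_lambda` and `gamma`. The sketch below translates the Table 5 starting point into those keyword arguments and checks each value against its Table 3 range. Treating the ranges as inclusive is an assumption, and the helper names are hypothetical.

```python
# Table 3 value ranges, treated here as inclusive bounds (an assumption)
SEARCH_RANGES = {
    "n-estimators": (10, 1000),
    "max-depth": (3, 15),
    "min-child-weight": (1, 10),
    "learning-rate": (0.01, 0.3),
    "subsample": (0, 1),
    "colsample-bytree": (0, 1),
    "alpha": (0, 100),
    "lambda": (0, 100),
    "gamma": (0, 10),
}

# Table 5 initial combination
INITIAL_VALUES = {
    "n-estimators": 300, "max-depth": 10, "min-child-weight": 5,
    "learning-rate": 0.1, "subsample": 0.5, "colsample-bytree": 0.5,
    "alpha": 0.1, "lambda": 0.1, "gamma": 5,
}

# Table names -> keyword arguments of xgboost.XGBRegressor (scikit-learn API)
XGB_KWARGS = {
    "n-estimators": "n_estimators",
    "max-depth": "max_depth",
    "min-child-weight": "min_child_weight",
    "learning-rate": "learning_rate",
    "subsample": "subsample",
    "colsample-bytree": "colsample_bytree",
    "alpha": "reg_alpha",
    "lambda": "reg_lambda",
    "gamma": "gamma",
}


def to_xgb_params(values: dict) -> dict:
    """Check each value against its Table 3 range and rename it for XGBRegressor."""
    params = {}
    for name, value in values.items():
        low, high = SEARCH_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside the range ({low}, {high})")
        params[XGB_KWARGS[name]] = value
    return params


params = to_xgb_params(INITIAL_VALUES)
print(params)
# With xgboost installed, this dictionary could be passed on as xgboost.XGBRegressor(**params)
```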

Table 6. Summary of performance metrics for different models.

Dependent Variable	Model	R²	MAE	RMSE
Y1	RF	0.6311	71.6216	91.7765
	GBDT	0.6377	82.7016	105.8520
	XGB	0.7262	70.8900	93.4421
Y2	RF	0.6180	60.6285	101.7170
	GBDT	0.6958	68.6215	114.8887
	XGB	0.7625	70.0068	112.4848
Y3	RF	0.5970	94.7043	122.4148
	GBDT	0.6520	100.5457	130.1018
	XGB	0.7442	102.1999	129.7254
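Tables 6 and 8 report R², MAE and RMSE. Their standard definitions are sketched below in plain Python; no particular library's implementation is implied, and the toy vectors are hypothetical. On a single evaluation set, R² and RMSE are tied together by R² = 1 − RMSE²/σ², where σ² is the population variance of the observed values, so on the same data a lower RMSE always goes with a higher R², whereas MAE can rank models differently.

```python
import math
from typing import Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> dict[str, float]:
    """Coefficient of determination, mean absolute error and root-mean-square error."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observed values are equal")
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": sum(abs(r) for r in residuals) / n,
        "RMSE": math.sqrt(ss_res / n),
    }


y_true = [10.0, 20.0, 30.0, 40.0]   # hypothetical observations
y_pred = [12.0, 18.0, 33.0, 37.0]   # hypothetical predictions
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```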

Table 7. Summary of hyperparameter optimization range and step size settings for three algorithms.

Algorithm	Max-Depth (Lower, Upper, Step)	Learning-Rate (Lower, Upper, Step)	n-Estimators (Lower, Upper, Step)
RS	(1, 20, 1)	(0.001, 0.1, 0.001)	(100, 500, 1)
GA and BO	(10, 18, 1)	(0.02, 0.09, 0.001)	(100, 400, 1)
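Reading each Table 7 triple as (lower bound, upper bound, step), an interpretation consistent with the caption, random search amounts to drawing every hyperparameter independently from its grid, scoring each draw and keeping the best. In the study the score would come from training and validating an XGBoost model; here a hypothetical `toy_score` function stands in for that step, so the sketch shows only the search logic.

```python
import random
from typing import Callable

# RS row of Table 7, read as (lower, upper, step)
RS_SPACE = {
    "max_depth": (1, 20, 1),
    "learning_rate": (0.001, 0.1, 0.001),
    "n_estimators": (100, 500, 1),
}


def grid_values(lower, upper, step):
    """All values lower, lower + step, ..., upper (rounded to suppress float drift)."""
    count = int(round((upper - lower) / step)) + 1
    return [round(lower + k * step, 10) for k in range(count)]


def random_search(space: dict, evaluate: Callable[[dict], float], n_trials: int, seed: int = 0):
    """Sample n_trials configurations uniformly from the grids; return the best (maximizing)."""
    rng = random.Random(seed)
    grids = {name: grid_values(*spec) for name, spec in space.items()}
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = {name: rng.choice(values) for name, values in grids.items()}
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_score(config: dict) -> float:
    """Hypothetical stand-in for a validation R²; peaks at depth 8, rate 0.05, 300 trees."""
    return -((config["max_depth"] - 8) ** 2
             + ((config["learning_rate"] - 0.05) / 0.01) ** 2
             + ((config["n_estimators"] - 300) / 50) ** 2)


best, score = random_search(RS_SPACE, toy_score, n_trials=200, seed=42)
print(best, round(score, 3))
```

Each trial is independent, so random search is easy to parallelize but does not learn from earlier trials, which is the gap that GA and BO are meant to close.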

Table 8. Summary of performance metrics for different hyperparameter optimization algorithms.

Dependent Variable	Model	R²	MAE	RMSE
Y1	RS-XGB	0.7538	78.3521	93.4250
	GA-XGB	0.9325	50.1536	95.6352
	BO-XGB	0.8402	60.3320	87.2340
Y2	RS-XGB	0.7263	70.2568	100.3252
	GA-XGB	0.9211	51.3325	91.2367
	BO-XGB	0.8677	58.6874	102.3654
Y3	RS-XGB	0.6944	100.3654	120.3547
	GA-XGB	0.9432	85.3699	115.3263
	BO-XGB	0.8424	95.6352	120.3698
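GA-XGB gives the highest R² in Table 8. The genetic-algorithm settings used in the study (population size, operators, number of generations) are not listed here, so the sketch below is a generic GA under stated assumptions: each individual is one hyperparameter combination drawn from the GA/BO grid of Table 7, fitness is a hypothetical stand-in for validation R², and the operators (tournament selection, uniform crossover, resampling mutation, elitism) are common textbook choices rather than the authors' confirmed method.

```python
import random
from typing import Callable

# GA/BO row of Table 7, read as (lower, upper, step)
GA_SPACE = {
    "max_depth": (10, 18, 1),
    "learning_rate": (0.02, 0.09, 0.001),
    "n_estimators": (100, 400, 1),
}


def grid_values(lower, upper, step):
    count = int(round((upper - lower) / step)) + 1
    return [round(lower + k * step, 10) for k in range(count)]


def genetic_search(space: dict, fitness: Callable[[dict], float], pop_size: int = 20,
                   generations: int = 30, mutation_rate: float = 0.2,
                   n_elite: int = 2, seed: int = 0):
    """Maximize fitness over the hyperparameter grids with a simple generational GA."""
    rng = random.Random(seed)
    grids = {name: grid_values(*spec) for name, spec in space.items()}
    names = list(grids)

    def tournament(scored, k=3):
        # Pick k individuals at random and return the fittest of them
        return max(rng.sample(scored, k), key=lambda pair: pair[0])[1]

    population = [{n: rng.choice(grids[n]) for n in names} for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(((fitness(ind), ind) for ind in population),
                        key=lambda pair: pair[0], reverse=True)
        next_gen = [ind for _, ind in scored[:n_elite]]  # elitism: keep the best unchanged
        while len(next_gen) < pop_size:
            mother, father = tournament(scored), tournament(scored)
            # Uniform crossover: each gene comes from either parent with equal probability
            child = {n: (mother[n] if rng.random() < 0.5 else father[n]) for n in names}
            for n in names:  # mutation: resample a gene from its grid
                if rng.random() < mutation_rate:
                    child[n] = rng.choice(grids[n])
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    return best, fitness(best)


def toy_fitness(config: dict) -> float:
    """Hypothetical stand-in for validation R²; peaks at depth 14, rate 0.05, 250 trees."""
    return -((config["max_depth"] - 14) ** 2
             + ((config["learning_rate"] - 0.05) / 0.01) ** 2
             + ((config["n_estimators"] - 250) / 50) ** 2)


best, score = genetic_search(GA_SPACE, toy_fitness, seed=1)
print(best, round(score, 3))
```

Because the elite individuals are copied unchanged into each new generation, the best fitness in the population can never decrease from one generation to the next.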

Table 9. Optimal hyperparameter values obtained by the three optimization methods.

Dependent Variable	Algorithm	Max-Depth	Learning-Rate	n-Estimators
Y1	RS-XGB	17	0.08	386
	GA-XGB	13	0.075	307
	BO-XGB	15	0.088	371
Y2	RS-XGB	18	0.07	360
	GA-XGB	12	0.067	277
	BO-XGB	16	0.077	340
Y3	RS-XGB	17	0.08	378
	GA-XGB	14	0.086	275
	BO-XGB	14	0.0077	352

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Qi, Z.; Zhang, L.; Yang, X.; Zhao, Y. Revealing the Driving Factors of Household Energy Consumption in High-Density Residential Areas of Beijing Based on Explainable Machine Learning. Buildings 2025, 15, 1205. https://doi.org/10.3390/buildings15071205

AMA Style

Qi Z, Zhang L, Yang X, Zhao Y. Revealing the Driving Factors of Household Energy Consumption in High-Density Residential Areas of Beijing Based on Explainable Machine Learning. Buildings. 2025; 15(7):1205. https://doi.org/10.3390/buildings15071205

Chicago/Turabian Style

Qi, Zizhuo, Lu Zhang, Xin Yang, and Yanxia Zhao. 2025. "Revealing the Driving Factors of Household Energy Consumption in High-Density Residential Areas of Beijing Based on Explainable Machine Learning" Buildings 15, no. 7: 1205. https://doi.org/10.3390/buildings15071205

APA Style

Qi, Z., Zhang, L., Yang, X., & Zhao, Y. (2025). Revealing the Driving Factors of Household Energy Consumption in High-Density Residential Areas of Beijing Based on Explainable Machine Learning. Buildings, 15(7), 1205. https://doi.org/10.3390/buildings15071205

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
