Article

Analysis of the Impact of Biometeorological Thermal Indices on Summer Peak Power Load Forecasting in Guangdong Province

1 College of Ocean and Meteorology, Guangdong Ocean University, Zhanjiang 524088, China
2 South China Sea Institute of Marine Meteorology, Guangdong Ocean University, Zhanjiang 524088, China
3 Western Guangdong Marine Meteorology Lab, Guangdong Ocean University, Zhanjiang 524088, China
4 Huangpu Meteorological Bureau of Guangzhou, Guangzhou 510530, China
5 Guangdong Climate Center, Guangzhou 510640, China
6 China Power Dispatching Control Center, China Southern Power Grid Co., Ltd., Guangzhou 510000, China
* Authors to whom correspondence should be addressed.
Atmosphere 2026, 17(5), 463; https://doi.org/10.3390/atmos17050463
Submission received: 12 March 2026 / Revised: 27 April 2026 / Accepted: 28 April 2026 / Published: 30 April 2026

Abstract

Accurate prediction of electricity demand during hot seasons is essential for maintaining power system reliability, particularly in humid subtropical regions such as Guangdong, China, where high temperatures strongly influence consumption. However, many models rely primarily on air temperature and may not fully capture combined atmospheric effects. This study evaluates the potential of biometeorological thermal indices for improving summer electricity load forecasting. Daily maximum load and meteorological data during May–September 2019–2021 were analyzed using Back-Propagation Neural Network (BP), Random Forest (RF), and a Stacking ensemble model. Three indices, Effective Temperature (ET), Physiological Equivalent Temperature (PET), and the Universal Thermal Climate Index (UTCI), were introduced as predictors. The ensemble model achieved the best performance, with Ensemble–UTCI yielding the highest accuracy (R² = 0.559, RMSE = 60.96 × 10⁴ kW, MAE = 45.10 × 10⁴ kW). Compared with temperature-based models, biometeorological indices consistently improved predictions, with UTCI performing best (average RMSE = 62.81 × 10⁴ kW). Bayesian analysis shows strong evidence of improvement in RF and ensemble models, but not in BP or linear models, indicating model dependence. During the July 2021 heat event, RF showed greater robustness, with PET–RF achieving the lowest error (MAPE = 3.03%). These results demonstrate the value of biometeorological indices for load forecasting in humid subtropical regions.

1. Introduction

With the intensification of global climate warming and the acceleration of urbanization, extreme heat events have increased markedly in frequency, intensity, and duration worldwide, posing substantial challenges to the safe and reliable operation of urban energy systems [1,2,3,4,5,6]. Under persistent high-temperature conditions, cooling demand rises rapidly, rendering summer electricity load highly sensitive to meteorological variations, particularly in economically developed and densely populated urban regions [6,7]. Improving the accuracy of electricity load forecasting during heat events has therefore become a critical scientific issue for power system dispatch optimization, demand-side management, and energy system resilience assessment [8].
Extensive studies have demonstrated that the relationship between meteorological factors and electricity load is strongly nonlinear, with this nonlinearity becoming more pronounced during hot seasons [9,10,11,12]. Among various meteorological variables, air temperature is widely recognized as the primary driver of summer load variability. Once ambient temperature exceeds a certain threshold, the use of cooling appliances, especially air conditioning, increases sharply, leading to a rapid rise in electricity demand. Based on large-scale residential electricity consumption data in Shanghai, Li et al. (2019) reported that when daily mean temperature exceeds 25 °C, each 1 °C increase results in a significant rise in electricity consumption, with the amplification effect becoming particularly pronounced during extreme heat conditions [13]. Similar findings have been reported in studies using cooling degree days (CDD), which consistently show a strong positive correlation between cooling demand and electricity consumption [14].
Nevertheless, air temperature alone cannot fully represent human thermal perception under real environmental conditions. Under high-temperature scenarios, relative humidity, wind speed, and radiative conditions substantially modulate human heat balance, thereby influencing cooling behavior and electricity demand [15,16]. To better characterize human thermal stress, biometeorology has developed a series of biometeorological thermal indices, including Effective Temperature (ET), Physiological Equivalent Temperature (PET), and the Universal Thermal Climate Index (UTCI). ET integrates air temperature, humidity, and wind speed and has been widely used as an early indicator of perceived temperature [8]. PET is based on the human energy balance model and has been extensively applied to outdoor thermal environment assessment across different climate zones [17,18]. UTCI further incorporates a multi-node human thermoregulation model, providing a more comprehensive theoretical representation of radiative, convective, and evaporative heat exchange processes [19,20].
In recent years, the application of biometeorological thermal indices in urban climate research has expanded substantially, covering outdoor thermal assessment across diverse climate zones, spatiotemporal pattern analysis, and extreme heat event evaluation. Indices such as UTCI have become a particular focus of attention in urban thermal studies [21,22,23,24,25]. These advances have created new opportunities for integrating biometeorological thermal indices into urban- and regional-scale electricity load studies, enabling a more physiologically meaningful representation of heat-driven energy demand.
In the field of load forecasting, traditional statistical approaches, such as linear regression and ARMA models, provide interpretable relationships between load and meteorological variables but often struggle to capture high-dimensional nonlinear interactions [8]. With increasing computational capacity, machine learning methods have become important tools for load forecasting. Back-propagation (BP) neural networks have been widely applied in short-term load forecasting due to their ability to model nonlinear relationships [26,27], and the inclusion of meteorological variables has been shown to significantly reduce prediction errors [28]. Ensemble learning methods, such as random forests, further improve model robustness and generalization by aggregating multiple decision trees, making them effective for modeling complex load relationships [29,30,31]. At the urban scale, random forest models have also been shown to capture nonlinear interactions between meteorological conditions, land-surface characteristics, and thermal environments [32,33].
More recently, load forecasting research has increasingly explored advanced machine learning and deep learning architectures. For example, Waheed and Xu (2024) [34] proposed a Wide-and-Deep Neural Network (WDNN) model that combines linear feature interactions with deep neural networks to improve short-term load forecasting accuracy. Groß et al. (2021) [35] conducted a multi-model comparison and found that model performance varies significantly across different electricity consumption scenarios. Bae et al. (2025) [36] further introduced clustering and dimensionality-reduction techniques to enhance prediction performance for large-scale consumer datasets. In addition, explainable artificial intelligence methods have begun to be incorporated into load forecasting studies. Bhupatiraju and Ahn (2025) [37] applied SHAP-based interpretation to guide feature engineering, improving the prediction of extreme load events. Comprehensive reviews indicate that the application of deep learning methods in short-term electricity load forecasting has grown rapidly and is becoming a major research direction in this field [38]. Recent studies have also attempted to integrate UTCI with machine learning models for predicting thermal environments and related energy consumption in complex urban settings, demonstrating promising performance and practical potential [39,40].
However, despite these advances, two practical challenges remain insufficiently addressed in the context of regional, meteorology-driven load forecasting. First, many advanced deep learning models (e.g., LSTM and Transformer-based architectures) require large-scale, long-term, and high-frequency datasets to achieve stable generalization performance, which are often unavailable in regional grid-level studies with limited historical records. Second, the increasing model complexity may obscure the relative contribution of different meteorological predictors, making it difficult to isolate the intrinsic predictive value of specific thermal indices. Therefore, for studies aiming to evaluate the incremental contribution of biometeorological variables, the use of well-established and interpretable machine learning models provides a more controlled and methodologically transparent framework.
Guangdong Province, located in southern coastal China, has a typical subtropical monsoon climate characterized by long, hot, and humid summers. Observational studies indicate that both the duration and intensity of heat events in the Guangdong–Hong Kong–Macao Greater Bay Area have increased in recent years [41]. Meanwhile, rapid urbanization has intensified the urban heat island effect, leading to higher thermal exposure and cooling demand in urban cores compared with surrounding suburban areas [32]. Under these conditions of rapid development and climate warming, the response of peak electricity demand to meteorological factors in Guangdong has become increasingly complex, and traditional temperature-based prediction approaches may not adequately support refined grid operation and planning [33,42]. Electricity demand is also influenced by socioeconomic factors and demand-side management policies, including economic growth, population change, and energy pricing [43,44,45]. In addition, the development of smart grids has promoted demand-side management (DSM) as an important mechanism for balancing electricity supply and demand [46,47,48]. During the summer of 2021, persistent high temperatures caused Guangdong’s daily maximum load to exceed 10,000 × 10⁴ kW continuously from 15 June onward, reaching a historical peak of 13,513 × 10⁴ kW on 27 July. These conditions highlight the importance of developing meteorology-informed load forecasting models for the region.
Despite recent progress, several important gaps remain. First, most biometeorological indices have been applied in the building energy domain as indicators of human thermal stress. At this scale, existing studies mainly focus on indoor thermal comfort and energy efficiency or on the relationship between thermal resilience and energy performance, particularly under extreme heat conditions [49,50]. However, these studies largely address indoor climate control or long-term climate trends, offering limited guidance for short-term load forecasting at the regional power grid level. Previous work has typically incorporated simple comfort or discomfort indices as meteorological predictors in regional load models [15,51]. Only a few recent studies have directly applied advanced indices such as UTCI to regional electricity demand modeling [52], demonstrating their potential to explain large-scale variations in energy consumption. Nevertheless, compared with conventional meteorological variables, the integration of biometeorological indices into load forecasting remains limited.
Second, there is a lack of systematic evaluation of multiple biometeorological indices within a unified modeling framework. In particular, it remains unclear how the physiological complexity of a thermal index translates into predictive gains under humid subtropical conditions and across different modeling paradigms. Third, the robustness of such models under extreme heat events has not been sufficiently examined.
To address these gaps, this study develops a unified modeling framework that integrates multiple biometeorological indices (ET, PET, and UTCI) with machine learning approaches. Using daily electricity load and meteorological data from Guangdong during the summer seasons of 2019–2021, a series of forecasting models based on Back-Propagation Neural Networks (BP) and Random Forests (RF) are constructed and compared. The main objectives are to: (1) quantify the added value of biometeorological indices relative to temperature-based predictors; (2) examine the trade-off between physiological realism and predictive performance; and (3) evaluate model robustness under extreme heat conditions.
Compared with existing studies, this work provides several incremental contributions. First, it offers a systematic evaluation of multiple biometeorological thermal indices (ET, PET, and UTCI) within a unified regional load forecasting framework, which has been relatively limited in humid subtropical contexts. Second, an adaptive lag-selection strategy is employed to account for the cumulative effects of meteorological variables, allowing each index to operate under a more suitable temporal response scale rather than relying on a fixed lag structure. Third, by combining machine learning models with Bayesian inference and uncertainty quantification, the study presents a probabilistic assessment of the added value of biometeorological indices, with a particular focus on performance under extreme heat conditions.
Specifically, daily maximum load data for the warm season (May–September) are combined with meteorological observations to develop multiple model configurations. By incorporating thermal indices with different levels of physiological complexity (ET, PET, and UTCI) and applying nonlinear machine learning methods, this study systematically evaluates the performance of different index–algorithm combinations. The results aim to provide insights for power system operation and planning, as well as for regional climate–load interaction studies in Guangdong.

2. Materials and Methods

2.1. Data Sources

2.1.1. Electricity Load and Meteorological Data

Electricity load data were obtained from the China Southern Power Grid and consisted of system-level daily maximum load records from 1 January 2019 to 20 September 2021. Meteorological observations were provided by the Guangdong Climate Center and included quality-controlled daily data from 1 January 2019 to 31 December 2021. The meteorological variables included daily mean air temperature, maximum temperature, minimum temperature, mean relative humidity, and mean wind speed.
After data cleaning, electricity load and meteorological observations from March 2019 to September 2021 were retained, resulting in 928 valid daily samples (retention rate: 99.68%). The dataset was divided chronologically into a training set (May–September of 2019–2020, n = 306) and a testing set (May–September of 2021, n = 143).
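The chronological split described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the helper names `daily_range` and `split_warm_season` are hypothetical, and the date range simply mirrors the load record period given in this section, assuming no warm-season day was dropped during cleaning.

```python
from datetime import date, timedelta


def daily_range(start, end):
    """Return every calendar date from start to end, inclusive."""
    return [start + timedelta(days=k) for k in range((end - start).days + 1)]


def split_warm_season(days, train_years=(2019, 2020), test_years=(2021,)):
    """Keep May-September dates and split them by year (hypothetical helper)."""
    warm = [d for d in days if 5 <= d.month <= 9]
    train = [d for d in warm if d.year in train_years]
    test = [d for d in warm if d.year in test_years]
    return train, test


# Load records run from 1 January 2019 to 20 September 2021.
days = daily_range(date(2019, 1, 1), date(2021, 9, 20))
train_days, test_days = split_warm_season(days)
print(len(train_days), len(test_days))  # 306 143
```

Under that assumption the counts match the reported sample sizes, and every training date precedes every testing date, so no future information enters model fitting.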
The overall mean system load was (9569.80 ± 1968.57) × 10⁴ kW, with values ranging from 3869 × 10⁴ to 13,513 × 10⁴ kW. The corresponding meteorological conditions were characterized by a mean air temperature of 23.62 ± 5.46 °C, mean maximum temperature of 28.40 ± 5.59 °C, relative humidity of 78.00 ± 9.07%, and wind speed of 2.07 ± 0.49 m s⁻¹.
An extreme heat event occurring from 23 to 30 July 2021 (8 days) was used for additional validation under extreme conditions. During this event, the maximum air temperature reached 35.7 °C and the peak electricity load reached 13,513 × 10⁴ kW, with an average load of (12,822.75 ± 606.07) × 10⁴ kW.
The overall data missing rate was 0.06%. Missing values were filled using linear interpolation between adjacent dates. Outliers were identified using the interquartile range (IQR) method and cross-checked with neighboring station records. Confirmed anomalies were corrected using time-series interpolation to maintain data continuity.
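A minimal sketch of the gap filling and outlier screening described above is given below. It is illustrative only: the function names are hypothetical, the conventional IQR multiplier of 1.5 is an assumption (the text does not state the value used), and the cross-check against neighboring stations is not reproduced.

```python
import statistics


def interpolate_linear(values):
    """Fill interior gaps (None) linearly between the nearest valid neighbours.

    Leading or trailing gaps are left as None because they have only one neighbour.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for k in range(1, gap):
            step = (filled[right] - filled[left]) * k / gap
            filled[left + k] = filled[left] + step
    return filled


def iqr_outlier_flags(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]; k = 1.5 is an assumed default."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]


print(interpolate_linear([1.0, None, None, 4.0]))
print(iqr_outlier_flags([10, 11, 12, 11, 10, 12, 11, 50]))
```

Flagged values are only candidates; in the workflow described above they would be confirmed against neighboring records before being replaced by interpolation.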
To reduce the influence of differences in variable magnitudes during model training, all continuous variables were normalized using the min–max scaling method, which linearly transforms each variable to a range between 0 and 1.
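Min–max scaling can be written in a few lines. The sketch below is an assumption-laden illustration: the function names are hypothetical, and the text does not say which sample the minimum and maximum were taken from, so the bounds are passed in explicitly.

```python
def minmax_fit(values):
    """Return the (min, max) bounds of a reference sample."""
    return min(values), max(values)


def minmax_transform(values, lo, hi):
    """Map values linearly so that lo -> 0 and hi -> 1."""
    span = hi - lo
    if span == 0:
        # A constant variable carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


load = [3869.0, 9569.8, 13513.0]  # minimum, mean, and maximum load from this section (×10⁴ kW)
lo, hi = minmax_fit(load)
scaled = minmax_transform(load, lo, hi)
print(scaled)
```

Values outside the fitted bounds map outside the 0–1 interval, which matters if the bounds come from the training period and a later period sets a new record.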
In this study, the warm season was defined as May to September, corresponding to the climatologically hottest period and the peak electricity demand season in Guangdong Province.

2.1.2. Solar Radiation Data

Solar radiation data were obtained from the NASA Prediction Of Worldwide Energy Resources (POWER) project through the Data Access Viewer platform. These data are derived from the CERES SYN1deg Edition 4.1 satellite product and bias-corrected using observations from the Baseline Surface Radiation Network (BSRN).
The dataset covers the period from 1983 to 2023 with a spatial resolution of 0.5°. Daily variables include global horizontal irradiance (GHI), direct normal irradiance (DNI), and diffuse horizontal irradiance (DHI).
Because the meteorological observations were station-based while the radiation data were provided on a regular grid, spatial matching was required. For each meteorological station, the nearest radiation grid cell was selected based on geographic coordinates. When multiple stations were located within the same grid cell, the same radiation value was used. The matched radiation data were then averaged across stations to obtain regional-scale radiation inputs consistent with the system-level electricity load data. These radiation variables were subsequently used in the calculation of biometeorological thermal indices.
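The station-to-grid matching and regional averaging can be sketched as below. This is a simplified illustration under stated assumptions: grid cells are represented by their centre coordinates, distance is measured in degrees, and the station coordinates, grid, and radiation values are placeholders rather than data from the study.

```python
def nearest_cell(lat, lon, grid_lats, grid_lons):
    """Return (row, column) of the grid cell centre closest to a station.

    On a regular latitude-longitude grid the nearest centre can be found
    independently along each axis.
    """
    i = min(range(len(grid_lats)), key=lambda k: abs(grid_lats[k] - lat))
    j = min(range(len(grid_lons)), key=lambda k: abs(grid_lons[k] - lon))
    return i, j


def regional_mean_radiation(stations, grid_lats, grid_lons, field):
    """Average the matched grid values over all stations.

    Stations that share a cell contribute the same value once each.
    """
    picks = []
    for lat, lon in stations:
        i, j = nearest_cell(lat, lon, grid_lats, grid_lons)
        picks.append(field[i][j])
    return sum(picks) / len(picks)


# Placeholder 0.5-degree grid and radiation field (illustrative values only).
grid_lats = [21.0, 21.5, 22.0]
grid_lons = [110.0, 110.5, 111.0]
field = [[100.0, 110.0, 120.0],
         [130.0, 140.0, 150.0],
         [160.0, 170.0, 180.0]]
stations = [(21.1, 110.45), (21.2, 110.4), (22.0, 111.0)]
print(regional_mean_radiation(stations, grid_lats, grid_lons, field))
```

Because each station counts once, cells that contain several stations carry more weight in the regional mean, which follows directly from averaging across stations rather than across cells.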

2.2. Parameter Settings of Biometeorological Thermal Indices

Three biometeorological thermal indices with different levels of physiological complexity were used in this study: Effective Temperature (ET), Physiological Equivalent Temperature (PET), and the Universal Thermal Climate Index (UTCI).
ET was calculated following the formulation proposed by Wu et al. [16]. PET was computed using the steady-state PET model implemented in the Python library pythermalcomfort (version 2.5.0). UTCI was calculated using the thermofeel library released by the European Centre for Medium-Range Weather Forecasts, which provides a standardized implementation of UTCI calculations [53].
Human physiological parameters were set according to commonly used assumptions in thermal comfort studies [54]. A reference individual was defined as a 35-year-old adult male, with a height of 1.75 m and a body mass of 75 kg. The metabolic rate was set to 80 W, corresponding to a standing posture with light activity. According to the Guangdong Statistical Yearbook (Table 1), the proportion of the working-age population (15–64 years) ranged from 72.15% to 74.72% during 2019–2021, while the urbanization rate ranged from 72.65% to 74.63%. These demographic characteristics indicate that the selected reference individual is broadly representative of the primary electricity-consuming population.
Clothing insulation was initially set to 0.9 clo, representing typical indoor or transitional-season clothing conditions. Considering the hot and humid climate in Guangdong, clothing insulation was reduced to 0.6 clo during the warm season (May–September) to reflect common summer clothing. For the remaining months, the value was maintained at 0.9 clo.
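For reproducibility, the reference-person settings and the seasonal clothing rule can be collected in one place. The sketch below only restates the values given above; the dictionary keys and the function name are hypothetical and do not correspond to any library's argument names.

```python
# Reference individual as described in this section (key names are illustrative).
REFERENCE_PERSON = {
    "age_years": 35,
    "sex": "male",
    "height_m": 1.75,
    "mass_kg": 75.0,
    "metabolic_rate_w": 80.0,
}


def clothing_insulation_clo(month):
    """Return clothing insulation: 0.6 clo in May-September, 0.9 clo otherwise."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return 0.6 if 5 <= month <= 9 else 0.9


print([clothing_insulation_clo(m) for m in range(1, 13)])
```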
These parameter settings ensure consistency with standard biometeorological practices while incorporating regional climatic characteristics, thereby providing a more realistic representation of human thermal stress under humid subtropical conditions.

2.3. Separation of Meteorological Load

Figure 1 presents the daily maximum electricity load in Guangdong Province during 2019–2021. An overall increasing trend in annual peak load is evident, which is statistically significant at the 0.05 level. The annual cycle exhibits a clear unimodal pattern, with peak values occurring in summer.
Substantial intra-annual variability is observed in the daily maximum load. Over the study period, the difference between the maximum and minimum values exceeds 6500 × 10⁴ kW, indicating pronounced fluctuations in electricity demand. Holiday effects are also evident, with major declines during the Spring Festival, Labour Day, and National Day periods. In contrast, weekday variations from Monday to Saturday remain relatively stable, while a noticeable decrease is consistently observed on Sundays.
To justify the omission of detailed socioeconomic variables in short-term load forecasting, this study cross-checked the stability of socioeconomic conditions using data from the Guangdong Statistical Yearbook for 2019–2021, covering economic structure, energy consumption, and demographic characteristics.
Economic accounts (Table 1) show that the share of the secondary sector varied within ±0.5%, while the tertiary sector remained stable at 55.6–55.8%, indicating a near-stationary industrial structure during the study period. Sectoral electricity consumption data further reveal that the proportion of industrial electricity use slightly decreased from 60.8% to 58.9%, whereas the combined share of temperature-sensitive sectors (residential and commercial) remained stable at 22–23%. Correspondingly, the overall temperature elasticity of electricity demand exhibited minimal variation (<3%) over the three years.
From a demographic perspective, the urbanization rate increased at an average annual rate of only 0.66%, while the interprovincial net migration rate remained stable at 5.5–6.2‰, indicating a relatively constant migrant population. In addition, the recurring load trough pattern during the Spring Festival suggests a stable and predictable response of electricity demand to large-scale population mobility.
These results collectively indicate that socioeconomic conditions remained relatively stable during 2019–2021, supporting the methodological assumption that short-term load variability is predominantly driven by meteorological factors.
Based on the above analysis, electricity load in Guangdong exhibits a clear long-term increasing trend driven by socioeconomic development, while also showing strong sensitivity to meteorological conditions and systematic variations associated with holidays. Accordingly, the daily maximum load can be expressed as
$$Y = Y_t + Y_m + \delta$$
where $Y_t = a + bt$ represents the long-term trend component, with $a$ as the intercept and $b$ as the linear trend coefficient. This term captures the gradual increase in electricity demand associated with economic growth, improvements in living standards, and other slowly varying factors, including systematic calendar effects. $Y_m$ denotes the meteorological component (hereafter referred to as meteorological load), and $\delta$ represents residual influences from unobserved factors, which are assumed to be small and are therefore neglected in this study.
The detrending procedure follows the widely used framework of weather normalization in electricity demand analysis, in which load is decomposed into long-term trend, seasonal variability, and meteorological contributions [55,56]. Ordinary least squares (OLS) regression is commonly applied to quantify the relationship between weather variables and electricity demand [57], and to remove non-meteorological variations from the load series.
In this study, only the long-term trend component is removed, while seasonal signals are retained to preserve intra-annual variability. The trend is estimated using OLS based on daily maximum load data during 2019–2021, and the difference between the observed load and the fitted trend is defined as the meteorological load. It should be noted that this residual component does not exclusively represent meteorological effects, as it may still contain contributions from other unobserved factors.
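The detrending step can be illustrated with a closed-form OLS fit of a straight line. This is a sketch under stated assumptions, not the authors' implementation: the time index is taken as the day count 0, 1, 2, … of an evenly spaced series without gaps, and the function names are hypothetical.

```python
def fit_linear_trend(load):
    """Ordinary least squares fit of load = a + b * t with t = 0, 1, 2, ..."""
    n = len(load)
    t_mean = (n - 1) / 2
    y_mean = sum(load) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(load))
    b = sxy / sxx
    a = y_mean - b * t_mean
    return a, b


def meteorological_load(load):
    """Observed load minus the fitted linear trend (the residual series)."""
    a, b = fit_linear_trend(load)
    return [y - (a + b * t) for t, y in enumerate(load)]


# A purely linear series leaves no residual.
print(meteorological_load([100.0 + 2.0 * t for t in range(5)]))
```

Because the fit includes an intercept, the residual series sums to zero over the fitting period, so positive and negative values indicate load above or below the trend.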
Subsequently, correlation analysis is conducted to examine the relationships between meteorological load and meteorological variables. Stepwise regression, back-propagation neural networks (BP), and random forest (RF) models are then employed to develop predictive models for summer daily peak load in Guangdong Province, with a focus on quantifying the contribution of meteorological factors.

2.4. Research Methods

2.4.1. Machine Learning Algorithms

(1) BP Neural Network
A back-propagation (BP) neural network with a multi-input single-output structure was constructed to capture nonlinear relationships between meteorological variables and peak electricity load [26,27,28,58].
The model consisted of an input layer, two hidden layers, and a single output neuron representing the predicted daily maximum load. Adjacent layers were fully connected, and nonlinear activation functions were used in the hidden layers.
The two-hidden-layer architecture improves the model’s ability to approximate complex nonlinear relationships while maintaining computational efficiency. The BP network was trained using gradient-based back-propagation to minimize the prediction error between observed and modeled electricity load.
(2) Random Forest
Random Forest is an ensemble learning algorithm based on decision trees that improves prediction accuracy and model stability by aggregating the outputs of multiple trees [29,30,31,59].
During model construction, randomness is introduced in two ways: bootstrap sampling of training data and random selection of candidate predictor variables at each tree split. These mechanisms reduce overfitting and enhance the generalization ability of the model when dealing with complex nonlinear relationships between meteorological variables and electricity demand.

2.4.2. Biometeorological Thermal Indices

(1) Effective Temperature (ET)
Effective Temperature (ET) was originally proposed by Houghton and Yaglou (1923) as an index representing human thermal perception based on laboratory experiments. Missenard (1933) later developed a mathematical formulation for ET, and Gregorczuk (1968) further incorporated the effect of wind speed into the equation.
The ET used in this study was calculated as
$$ET = 37 - \frac{37 - T}{0.68 - 0.0014\,RH + \dfrac{1}{1.76 + 1.4\,V^{0.75}}} - 0.9\,T\,(1 - 0.01\,RH)$$
where $T$ is air temperature (°C), $RH$ is relative humidity (%), and $V$ is wind speed (m s⁻¹).
This formulation integrates the combined effects of air temperature, humidity, and wind speed on perceived thermal conditions.
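The ET expression can be evaluated directly, as in the sketch below. The function name is hypothetical. The coefficient on the final humidity term is exposed as a parameter whose default (0.9) follows the equation as printed in this section; commonly cited versions of this Missenard-type formula use 0.29 in that position, so the value is worth checking against Wu et al. [16] before reuse.

```python
def effective_temperature(t_air, rh, wind, humidity_coeff=0.9):
    """Effective Temperature from air temperature (°C), relative humidity (%),
    and wind speed (m/s).

    humidity_coeff multiplies the last term; its default follows the equation
    as printed in the text and is an assumption to verify.
    """
    denom = 0.68 - 0.0014 * rh + 1.0 / (1.76 + 1.4 * wind ** 0.75)
    return 37.0 - (37.0 - t_air) / denom - humidity_coeff * t_air * (1.0 - 0.01 * rh)


# Stronger wind lowers ET when air temperature is below 37 °C.
print(effective_temperature(30.0, 80.0, 1.0) > effective_temperature(30.0, 80.0, 4.0))
```

Two limiting cases help check an implementation: at 37 °C and 100% humidity both correction terms vanish and ET equals 37 °C regardless of wind speed, and at 100% humidity the last term is zero for any temperature.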
(2) Physiological Equivalent Temperature (PET)
Physiological Equivalent Temperature (PET) is based on the Munich Energy Balance Model for Individuals (MEMI), which quantifies heat exchange between the human body and the surrounding environment through a human energy balance equation [60].
PET is defined as the air temperature in a standardized indoor reference environment that produces the same core and skin temperatures as those under actual outdoor conditions. The reference environment is characterized by equal air and mean radiant temperatures, a wind speed of 0.1 m s⁻¹, and a water vapor pressure of 12 hPa.
The human heat balance equation is expressed as
$$M + W + R + C + E_D + E_{Re} + E_{Sw} + S = 0$$
where $M$ denotes metabolic rate, $W$ represents mechanical work output, $R$ is net radiant heat flux, $C$ is convective heat flux, $E_D$ refers to latent heat loss through skin diffusion, $E_{Re}$ is respiratory heat exchange, $E_{Sw}$ is evaporative heat loss due to sweating, and $S$ denotes body heat storage.
The PET calculation involves two steps. First, the MEMI model is used to determine human core and skin temperatures under outdoor conditions. These values are then inserted into the energy balance equation under reference conditions to determine the equivalent air temperature corresponding to PET.
(3) Universal Thermal Climate Index (UTCI)
The Universal Thermal Climate Index (UTCI) is based on the UTCI-Fiala multi-node thermoregulation model combined with an adaptive clothing model [19,20]. It is formulated using the concept of an equivalent temperature representing the thermal stress experienced by humans.
UTCI is calculated as
$$UTCI(T_a, T_r, v_a, p_a) = T_a + \mathrm{Offset}(T_a, T_r, v_a, p_a)$$
where $T_a$ is air temperature, $T_r$ is mean radiant temperature, $v_a$ denotes wind speed, $p_a$ represents water vapor pressure, and Offset is the equivalent temperature adjustment determined by the combined effects of these climatic variables.
The UTCI model simulates multiple physiological responses of the human body, including core temperature and mean skin temperature over exposure periods of 30–120 min. These physiological responses are reduced to a single response index using principal component analysis, and the corresponding equivalent air temperature under reference conditions is defined as the UTCI value.

2.5. Correlation Analysis Between Load and Meteorological Factors

Meteorological load exhibited a significant positive correlation with temperature variables (Table 2). The correlation coefficient between meteorological load and mean air temperature was 0.53 (significant at the 0.01 level), indicating that temperature is the dominant meteorological factor influencing summer load variability.
In addition to temperature, meteorological variables involved in the calculation of thermal indices, including relative humidity and wind speed, were also analyzed. Relative humidity showed a weak but statistically significant negative correlation with meteorological load (r = −0.20, p < 0.05).
The biometeorological thermal indices ET, PET, and UTCI—which integrate temperature, humidity, wind speed, and solar radiation—showed strong positive correlations with meteorological load, with correlation coefficients of approximately 0.54 (all significant at the 0.01 level). These results indicate that physiological thermal indices are sensitive indicators of variations in summer electricity demand.
Although the strength of correlations between individual meteorological variables and meteorological load varied among months, stable positive relationships were consistently observed between load and mean temperature, ET, PET, and UTCI. In contrast, relative humidity generally showed negative correlations, while the relationships involving humidity and wind speed exhibited larger monthly variability and greater uncertainty.

3. Results

3.1. Model Configuration

3.1.1. Lagged Effects and Window Optimization

Based on the above analysis, electricity load and meteorological data from 2019 to 2020 were used to develop the predictive models, including air temperature and three biometeorological indices (ET, PET, and UTCI), together with apparent temperature (AT). To account for the temporal dependence in load dynamics, the meteorological load from the previous day was included as a predictor to represent load persistence.
For each thermal index (AT, ET, PET, and UTCI), three modeling approaches were implemented, namely linear regression, back-propagation neural networks (BP), and random forest (RF). The predicted meteorological load was subsequently combined with the estimated trend component to reconstruct the total load. Model hyperparameters were optimized using the training dataset, and model performance was evaluated using independent data from the summer period (May–September) of 2021.
Given that historical meteorological load may influence current load conditions, the cumulative effect of antecedent load was further examined. The cumulative meteorological load was defined as the moving average of the maximum meteorological load over the preceding 1–7 days. Pearson correlation coefficients were then calculated between these cumulative variables and the daily maximum meteorological load.
As shown in Figure 2, the strongest correlation was observed for the previous day (lag −1), with a coefficient of 0.86. The correlation gradually decreased as the averaging window increased from 2 to 7 days. This indicates that short-term persistence dominates the temporal dependence structure of meteorological load. Therefore, the maximum meteorological load from the previous day was selected as a key predictor in the forecasting models.
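The lag screening described above can be sketched as follows. This is a minimal illustration on a synthetic autoregressive series, not the study data; the helper names `trailing_mean` and `lag_correlations` are hypothetical, and the moving average is assumed to cover only the days before the target day.

```python
import numpy as np

def trailing_mean(x, window):
    """Mean of the `window` values preceding each day (days t-1 ... t-window)."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    for t in range(window, len(x)):
        out[t] = x[t - window:t].mean()
    return out

def lag_correlations(load, max_window=7):
    """Pearson r between the current load and each trailing mean (window = 1..max_window)."""
    load = np.asarray(load, dtype=float)
    result = {}
    for w in range(1, max_window + 1):
        feat = trailing_mean(load, w)
        valid = ~np.isnan(feat)          # the first `w` days have no full window
        result[w] = float(np.corrcoef(feat[valid], load[valid])[0, 1])
    return result

# Synthetic AR(1) series standing in for the meteorological load (assumption).
rng = np.random.default_rng(0)
load = np.zeros(300)
for t in range(1, 300):
    load[t] = 0.85 * load[t - 1] + rng.normal()

corr = lag_correlations(load)
for w, r in corr.items():
    print(f"window = {w} day(s): r = {r:.2f}")
```

For a series dominated by short-term persistence, the correlation is highest for the 1-day window and declines as the window lengthens, which is the pattern reported in Figure 2.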
In short-term electricity load forecasting, the cumulative influence of meteorological conditions is typically manifested within a time scale of 1–3 days. The first day (lag −1) reflects the immediate response to thermal stress, while the second day captures delayed effects associated with prior heat exposure. Beyond three days, the direct influence of antecedent meteorological conditions on current load generally diminishes.
Based on this temporal characteristic, the search range for lagged features was constrained to 1–3 days to balance information completeness and model timeliness. To ensure a systematic evaluation of lag structures, rolling windows of meteorological variables (1–3 days) were incorporated into the hyperparameter search space.
Specifically, for each biometeorological index (AT, ET, PET, and UTCI), three candidate features were constructed: the previous-day value (t − 1), the 2-day moving average (t − 1 to t − 2), and the 3-day moving average (t − 1 to t − 3). These lagged features were jointly optimized with different machine learning models.
For each index–window combination, four models were trained: linear regression (as a baseline without hyperparameter tuning), back-propagation neural networks (BP), random forest (RF), and a stacking ensemble model that integrates BP and RF using a Ridge meta-learner. Model training was conducted using data from the summer seasons (May–September) of 2019–2020, while independent data from the same period in 2021 were used for out-of-sample evaluation.
The optimal lag window and corresponding model configuration for each index were determined by exhaustively evaluating all combinations, using the test root mean square error (Test RMSE) as the sole selection criterion.
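A minimal sketch of this exhaustive window search is given below for a single index and the linear baseline only. The data are synthetic, the function names are hypothetical, and ordinary least squares stands in for the full set of tuned models; only the selection rule (lowest Test RMSE over the 1–3 day candidates) follows the text.

```python
import numpy as np

def trailing_mean(x, window):
    """Mean of the `window` values preceding each day (days t-1 ... t-window)."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    for t in range(window, len(x)):
        out[t] = x[t - window:t].mean()
    return out

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def select_window(index, load, n_train, windows=(1, 2, 3)):
    """Fit OLS on [previous-day load, trailing mean of the index] for each
    candidate window and return (best_window, {window: test RMSE})."""
    index = np.asarray(index, dtype=float)
    load = np.asarray(load, dtype=float)
    prev_load = trailing_mean(load, 1)
    start = max(windows)                 # same training rows for every window
    scores = {}
    for w in windows:
        X = np.column_stack([np.ones(len(load)), prev_load, trailing_mean(index, w)])
        beta, *_ = np.linalg.lstsq(X[start:n_train], load[start:n_train], rcond=None)
        scores[w] = rmse(load[n_train:], X[n_train:] @ beta)
    return min(scores, key=scores.get), scores

# Synthetic example in which the load responds to a 2-day mean of the index.
rng = np.random.default_rng(1)
n = 400
index = rng.normal(size=n)
load = np.zeros(n)
for t in range(2, n):
    load[t] = 0.3 * load[t - 1] + 5.0 * index[t - 2:t].mean() + 0.1 * rng.normal()

best_window, test_rmse = select_window(index, load, n_train=300)
print(best_window, test_rmse)
```

Because the same test period is used both to choose the window and to report the error, the selected RMSE may be slightly optimistic; a separate validation split would be one way to check this.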
The data-driven tuning results (Figure 3) indicate that the optimal lag structure is not uniform across indices, in contrast to conventional studies that typically impose a common lag window on all meteorological variables. This heterogeneity is plausible because the indices differ in their formulation and underlying physical assumptions. AT and PET achieve their lowest Test RMSE under a 1-day lag (Day 1), with values of 65.34 × 10⁴ kW and 64.70 × 10⁴ kW, respectively, whereas ET and UTCI perform best under a 2-day moving average window (Day 2), with Test RMSE values of 63.76 × 10⁴ kW and 62.55 × 10⁴ kW, respectively.
From a physical perspective, AT and PET are primarily governed by temperature and radiation, leading to a relatively rapid thermal response. In contrast, ET and UTCI incorporate humidity and wind speed, and the associated human thermal stress, particularly processes such as sweat evaporation, exhibits an integration effect over several hours to about one day. Consequently, a 2-day accumulation window appears better suited to capturing these effects. This finding supports the use of an index-specific adaptive lag framework.
Based on these results, this study adopts an independent single-index modeling strategy, in which the input features of each model are strictly separated to avoid multicollinearity among different thermal indices. Specifically, all models include the previous day’s meteorological load as a common lagged predictor, combined with only one biometeorological index at a time. The finalized model configurations are summarized in Table 3a,b.

3.1.2. Stacking Ensemble Configuration

Building upon the individual model frameworks, a stacking ensemble approach was employed to integrate the back-propagation neural network (BP) and random forest (RF). A Ridge regression model with L2 regularization was used as the meta-learner to achieve an optimal weighted combination of heterogeneous base learners.
(1) Base Learners: Configuration and Optimization
For each biometeorological index (AT, ET, PET, and UTCI), an independent sub-model (Model 1–Model 4) was developed. The input features for each sub-model consisted of lagged meteorological components, including the previous-day meteorological load and the optimized moving-average representation of the corresponding thermal index.
All input variables were standardized using Z-score normalization (StandardScaler) prior to model training to ensure comparability across features and improve convergence performance.
(2) Stacking Framework and Information Leakage Control
To prevent information leakage associated with training the meta-learner on in-sample predictions, a 5-fold cross-validation (5-fold CV) scheme was strictly adopted to generate out-of-fold meta-features. The procedure is summarized as follows:
  • The training dataset (May–September 2019–2020) was randomly divided into five folds;
  • For each fold, BP and RF models were trained on four folds and used to generate predictions for the remaining validation fold;
  • The out-of-fold predictions from BP and RF were concatenated to form a two-dimensional meta-feature matrix;
  • The Ridge meta-learner was trained using the meta-feature matrix as input and the observed load as the target, yielding the optimal ensemble weights and intercept.
In the final prediction stage, BP and RF models were retrained using the full training dataset. Their predictions on the independent test set (May–September 2021) were then fed into the trained Ridge meta-learner to produce the final ensemble output.
This procedure ensures that the meta-learner is trained exclusively on out-of-fold predictions, that is, predictions for samples the base learners did not see during fitting. The meta-learner can therefore learn bias-correction patterns without the leakage that in-sample predictions would introduce, which supports an unbiased evaluation of generalization performance.
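The out-of-fold stacking procedure can be sketched with scikit-learn as shown below. This is an illustrative sketch under stated assumptions: `MLPRegressor` stands in for the BP network, the hyperparameters and the Ridge penalty are placeholders rather than the tuned values, the data are synthetic, and `make_base_learners` and `fit_stacking` are hypothetical names.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def make_base_learners(seed=0):
    """Fresh, unfitted base learners (placeholder hyperparameters)."""
    bp = make_pipeline(
        StandardScaler(),  # Z-score normalization of the inputs
        MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=seed),
    )
    rf = RandomForestRegressor(n_estimators=50, random_state=seed)
    return [bp, rf]

def fit_stacking(X_train, y_train, X_test, n_splits=5, seed=0):
    """Train a Ridge meta-learner on out-of-fold predictions of BP and RF."""
    oof = np.zeros((len(y_train), 2))  # two-column meta-feature matrix
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, val_idx in folds.split(X_train):
        for j, model in enumerate(make_base_learners(seed)):
            model.fit(X_train[fit_idx], y_train[fit_idx])
            oof[val_idx, j] = model.predict(X_train[val_idx])
    meta = Ridge(alpha=1.0).fit(oof, y_train)  # ensemble weights and intercept
    # Final stage: refit the base learners on the full training set.
    final = [m.fit(X_train, y_train) for m in make_base_learners(seed)]
    test_meta = np.column_stack([m.predict(X_test) for m in final])
    return meta.predict(test_meta), meta

# Synthetic data: two predictors with a mildly nonlinear response.
rng = np.random.default_rng(0)
X = rng.normal(size=(250, 2))
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=250)

pred, meta = fit_stacking(X[:200], y[:200], X[200:])
print("Ridge weights (BP, RF):", meta.coef_, "intercept:", meta.intercept_)
```

Every row of the meta-feature matrix is predicted by base learners that were fitted without that row, which is the property the leakage-control step relies on.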

3.1.3. Weekend Effect and Its Treatment

Statistical analysis of the daily electricity load during 2019–2021 (Figure 4) reveals a pronounced intra-week cycle in the study region. Weekday loads (Monday–Friday) remain relatively stable at approximately 9700 × 10⁴ kW, with daily fluctuations below 1.0%, reflecting the dominant contribution of industrial and commercial base demand.
In contrast, a systematic reduction in load is observed during weekends. On average, Saturday load decreases to 96.9% of the weekday mean, while Sunday load further declines to 90.0%, corresponding to a reduction of about 10.0%. These findings are generally consistent with those reported by Luo et al. (2007) [61], who identified a stable weekday pattern and a pronounced drop on Sundays in the Guangdong power system. However, the magnitude of the Sunday reduction in this study (10.0%) is substantially larger than the previously reported 3.7% (96.29%), and Saturday load shifts from slightly above the mean (102.26%) in earlier studies to below the mean (96.9%). These differences may be attributed to changes in industrial structure, increased residential air-conditioning penetration, and differences in the study period.
Notably, the relative weekend load exhibits considerable variability. The Saturday load ranges from 70.7% to 119.8% of the weekday mean, while Sunday ranges from 63.1% to 122.4%. This wide dispersion indicates that weekend effects cannot be adequately represented by a fixed proportional adjustment. Instead, they exhibit substantial intra-seasonal variability. For example, during persistent high-temperature periods in summer, increased residential cooling demand may partially offset reductions in industrial activity, leading to weekend loads comparable to or even exceeding weekday levels. Conversely, during cooler seasons or in weekends adjacent to public holidays, reduced industrial demand may result in a more pronounced decline.
To account for this variability, a dynamic correction approach is adopted in this study. Following the framework proposed by Luo et al. (2007) [61], day-of-week correction factors are introduced and allowed to vary seasonally, rather than applying a uniform annual adjustment. This strategy improves the model’s ability to capture transitional periods between weekdays, weekends, and holidays, thereby enhancing prediction accuracy under varying demand conditions.
The distribution of daily load ratios over the three-year period (Figure 5) reveals pronounced interannual variability in holiday correction factors, with the most notable differences occurring during the Labour Day holiday. In 2019 and 2020, the load exhibited a typical “deep V-shaped” decline, with the load on May 1 dropping to 61.1% and 71.5% of the average weekday level of the preceding week, respectively. This pattern reflects the strong suppression of electricity demand due to industrial shutdowns and reduced commercial activity.
In contrast, the decline in 2021 was substantially weaker. On May 1, the load remained at 94.1% of the corresponding weekday level, and the holiday-related minimum shifted to May 2. This deviation is closely associated with concurrent meteorological conditions. As shown in Table 4, a pronounced heat event occurred during the 2021 Labour Day period. The maximum temperature on May 1 reached 32.5 °C, which is 6.8 °C higher than in 2019 (25.7 °C) and 2.2 °C higher than in 2020 (30.3 °C). Moreover, temperatures on the adjacent days (April 30 and May 2) both exceeded 31 °C.
This sustained high-temperature condition likely triggered a substantial increase in residential cooling demand, effectively offsetting the reduction in industrial load. As a result, a compensatory effect emerged between holiday-related demand suppression and temperature-driven load increases.
A similar interannual contrast is observed during the Dragon Boat Festival. In 2019 and 2020, a pre-holiday surge to approximately 110% of the reference weekday level occurred within three days before the holiday, likely associated with intensified industrial activity. However, this pattern was not evident in 2021, when pre-holiday loads remained relatively moderate. On the holiday itself, the 2021 load dropped to 71.4%, the lowest among the three years, possibly influenced by holiday scheduling adjustments and localized social restrictions.
These findings indicate that holiday load correction cannot be adequately represented by a fixed proportional factor. Instead, it is strongly modulated by concurrent temperature anomalies. Therefore, incorporating a temperature–holiday interaction term into the forecasting framework—specifically, adjusting holiday correction factors upward under high-temperature conditions—can improve prediction accuracy during extreme weather events.
Using the 2019–2020 training dataset, calendar-based correction factors for weekends and major holidays during the warm season (May–September) were systematically derived. The correction factors were calculated as the ratio between the daily system load on the target day (Saturday, Sunday, or a holiday) and the average load of the preceding weekday period (Monday–Friday of the previous week). These ratios were then averaged across all corresponding days in the training dataset to obtain stable correction coefficients for each calendar category.
The results indicate a clear and systematic reduction in weekend load. On average, Saturday load corresponds to 96.9% of the preceding weekday mean, while Sunday load further decreases to 90.0%.
For the two major holidays within the warm season—the Labour Day holiday (29 April–5 May) and the Dragon Boat Festival (from three days before to three days after the holiday)—the daily correction factors exhibit a characteristic “V-shaped” pattern. During the Labour Day holiday, the load reaches its minimum on May 1, averaging approximately 80% of the preceding weekday level. The load on the day before the holiday (30 April) is around 95%, and gradually recovers to above 90% by the end of the holiday period (5 May). A similar pattern is observed for the Dragon Boat Festival, where the load reaches a minimum of approximately 85% on the holiday itself and returns to near-normal levels by the third day after the holiday.
In the prediction stage, the baseline model first produces load estimates corresponding to weekday conditions. These estimates are then adjusted using the predefined correction factors according to the calendar attributes of each target date in 2021. Specifically, a multiplier of 0.969 is applied for Saturdays and 0.900 for Sundays, while holiday-specific daily coefficients are applied for the Labour Day and Dragon Boat Festival periods.
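The weekend part of this correction scheme can be illustrated with the sketch below. It assumes that the reference level is the mean load of the Monday–Friday immediately preceding each weekend, which is one reading of the definition above; holiday-specific factors are omitted, the data are synthetic, and the function names are hypothetical.

```python
from datetime import date, timedelta
import numpy as np

def weekend_factors(dates, load):
    """Average ratio of Saturday/Sunday load to the mean of the preceding Mon-Fri."""
    by_date = dict(zip(dates, load))
    ratios = {5: [], 6: []}              # date.weekday(): 5 = Saturday, 6 = Sunday
    for d in dates:
        wd = d.weekday()
        if wd not in ratios:
            continue
        monday = d - timedelta(days=wd)
        ref = [by_date.get(monday + timedelta(days=k)) for k in range(5)]
        if any(v is None for v in ref):
            continue                     # skip weekends lacking a full reference week
        ratios[wd].append(by_date[d] / np.mean(ref))
    return {wd: float(np.mean(r)) for wd, r in ratios.items()}

def apply_factors(dates, weekday_forecast, factors):
    """Scale weekday-equivalent forecasts by the calendar factor of each date."""
    return np.array([f * factors.get(d.weekday(), 1.0)
                     for d, f in zip(dates, weekday_forecast)])

# Synthetic four-week record: weekdays at 100, Saturdays at 96.9, Sundays at 90.0.
start = date(2019, 5, 6)                 # a Monday
dates = [start + timedelta(days=i) for i in range(28)]
pattern = {5: 96.9, 6: 90.0}
load = [pattern.get(d.weekday(), 100.0) for d in dates]

factors = weekend_factors(dates, load)
adjusted = apply_factors(dates, [100.0] * 28, factors)
print(factors)
```

With these inputs the derived multipliers reproduce the 0.969 and 0.900 values quoted above; in practice the ratios would vary from week to week, and the mean is only a first-order summary.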
This multiplicative correction effectively accounts for systematic deviations associated with industrial shutdowns and shifts in social electricity consumption patterns during non-working days. As a result, the prediction accuracy for weekends and holiday periods is substantially improved.
It should be noted that the above calendar-based correction scheme is estimated from a relatively limited training sample (two warm seasons), which may introduce uncertainty in the stability of the coefficients when applied out of sample. In particular, the use of multiple day-specific adjustment factors for holidays may lead to potential over-parameterization under small-sample conditions.
In this study, the correction factors are primarily intended to capture first-order behavioral differences between working days and non-working periods, such as industrial shutdowns and shifts in residential electricity consumption, rather than to provide fully generalized coefficients. Therefore, these parameters should be interpreted as approximate adjustments rather than fixed constants.

3.2. Model Performance

3.2.1. Forecasting Results

Figure 6 and Table 5 present the prediction results and performance metrics of the four algorithms under different thermal index configurations for the summer of 2021. The optimal lag structures for each index were determined through hyperparameter tuning (Day 1 for AT and PET; Day 2 moving average for ET and UTCI).
Overall, all models are able to capture the temporal variability of electricity load to a certain extent, particularly the fluctuations associated with high-temperature conditions. The simulated results reproduce the general seasonal pattern, with peak loads occurring in July–August, when temperatures are highest in Guangdong Province. However, noticeable discrepancies remain during specific periods, indicating limitations in model accuracy.

3.2.2. Bayesian Statistical Inference

Traditional hypothesis testing methods, such as the Diebold–Mariano test, rely on p-value thresholds for binary decisions. These approaches often suffer from low statistical power in small samples, increasing the risk of failing to detect real effects. To address this limitation, this study employs Bayesian bootstrap inference (Rubin, 1981). A total of 10,000 resamples are used to construct the posterior distribution of RMSE differences, providing probabilistic evidence from three complementary perspectives:
(1) The posterior probability that a given thermal index outperforms AT, expressed as P(ΔRMSE > 0 | data);
(2) The Bayes factor (BF10), quantifying the strength of evidence in favor of a difference relative to practical equivalence;
(3) The Region of Practical Equivalence (ROPE), defined as ±1% of the baseline RMSE (approximately ±0.63–0.67 × 10⁴ kW), combined with the 95% highest density interval (HDI) to assess practical significance.
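A minimal sketch of the Bayesian bootstrap for an RMSE difference is given below. It assumes ΔRMSE is defined as the baseline (AT) RMSE minus the index RMSE, so that positive values favor the index, and it uses Dirichlet(1, …, 1) weights over the test days. The Bayes factor is not computed because its exact construction is not specified here. The errors are synthetic and the function names are hypothetical.

```python
import numpy as np

def bayesian_bootstrap_delta_rmse(err_base, err_alt, n_draws=10_000, seed=0):
    """Posterior draws of RMSE(base) - RMSE(alt) under Dirichlet(1,...,1) weights."""
    rng = np.random.default_rng(seed)
    sq_base = np.asarray(err_base, dtype=float) ** 2
    sq_alt = np.asarray(err_alt, dtype=float) ** 2
    weights = rng.dirichlet(np.ones(len(sq_base)), size=n_draws)  # (n_draws, n)
    return np.sqrt(weights @ sq_base) - np.sqrt(weights @ sq_alt)

def hdi(draws, mass=0.95):
    """Shortest interval containing `mass` of the posterior draws."""
    s = np.sort(draws)
    k = int(np.ceil(mass * len(s)))
    widths = s[k - 1:] - s[:len(s) - k + 1]
    i = int(np.argmin(widths))
    return float(s[i]), float(s[i + k - 1])

def rope_decision(interval, rope_half_width):
    """Classify an HDI relative to a symmetric ROPE around zero."""
    lo, hi = interval
    if lo > rope_half_width or hi < -rope_half_width:
        return "outside ROPE"
    if lo >= -rope_half_width and hi <= rope_half_width:
        return "practically equivalent"
    return "uncertain"

# Synthetic paired forecast errors for 143 test days (not the study data).
rng = np.random.default_rng(42)
err_at = rng.normal(0.0, 65.0, size=143)
err_index = 0.9 * err_at                 # index model with uniformly smaller errors

draws = bayesian_bootstrap_delta_rmse(err_at, err_index)
posterior_prob = float(np.mean(draws > 0))
interval = hdi(draws)
baseline_rmse = float(np.sqrt(np.mean(err_at ** 2)))
print(posterior_prob, interval, rope_decision(interval, 0.01 * baseline_rmse))
```

An HDI that straddles a ROPE boundary is labelled "uncertain" by this rule, even when the posterior probability of improvement is high.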
The results (Figure 7) reveal a consistent pattern. Within the Random Forest (RF) framework, the data provide strong to very strong evidence that thermal indices outperform AT. Specifically, the posterior probability reaches 98.1% for UTCI (BF10 = 37.2), 96.5% for PET (BF10 = 12.1), and 93.3% for ET (BF10 = 13.1). These probabilistic results offer substantially richer information than marginal p-values (e.g., RF–UTCI: p = 0.064), indicating that, conditional on the observed data, the probability that UTCI outperforms AT exceeds 98%, rather than reducing the conclusion to a binary significance decision.
However, a high posterior probability does not necessarily imply practical significance. For all RF models, the 95% HDIs overlap with the ROPE boundaries. For instance, the HDI for RF–UTCI is [+0.28, +9.95] × 10⁴ kW, with the lower bound (+0.28 × 10⁴ kW) falling within the ROPE range. Similarly, the HDI for RF–PET spans [−0.21, +5.29] × 10⁴ kW and even includes negative values. As a result, all comparisons are classified as “uncertain” under the ROPE criterion. This reflects a fundamental limitation of small-sample inference: while UTCI is likely superior, the magnitude of improvement remains poorly constrained, ranging from negligible gains to substantial reductions in RMSE.
Within the ensemble framework, the posterior probability for UTCI is 91.2% (BF10 = 9.8, strong evidence), followed by ET at 80.0% (BF10 = 4.7, moderate evidence) and PET at 73.7% (BF10 = 2.0, weak evidence). Compared to RF, the ensemble models exhibit wider HDIs (e.g., UTCI: [−1.37, +6.63] × 10⁴ kW), indicating increased predictive uncertainty. This suggests that while stacking can improve point estimates, it may amplify variance when combining heterogeneous learners, thereby reducing the precision of uncertainty quantification.
In contrast, for the BP neural network and linear regression models, the posterior probabilities for all thermal indices are below 50%, with BP–UTCI as low as 2.6%. This indicates that simpler model structures are unable to effectively exploit the additional information contained in multidimensional thermal indices, and may even suffer from redundancy-induced degradation.
Overall, the Bayesian analysis suggests that the predictive advantage of UTCI likely exists, but its magnitude is uncertain and strongly dependent on the modeling approach. RF models are better able to leverage the combined effects of temperature, humidity, radiation, and wind speed, yielding the most robust posterior evidence. Ensemble models show moderate support but higher uncertainty, while simpler models provide no supporting evidence.
A key limitation is the sample size (n = 143), which constrains estimation precision. Based on the dispersion of posterior distributions, increasing the sample size to approximately 300–400 observations would likely be necessary to obtain HDIs that fall entirely outside the ROPE. Future work should therefore extend the dataset to multiple years in order to better quantify the marginal benefits and practical applicability of UTCI in electricity load forecasting under humid subtropical conditions.

3.2.3. Residual Analysis

Figure 8 and Figure 9 illustrate the residual distributions of the test dataset (n = 143) based on the index-specific optimal lag structures (Day 1 for AT and PET; Day 2 moving average for ET and UTCI). Overall, residuals from all models fluctuate around zero without clear heteroscedastic patterns, indicating reasonable statistical stability. However, distinct differences in error structures are observed across algorithms.
The linear regression model exhibits the most pronounced systematic underestimation, with mean residuals ranging from −6.20 to −3.72 × 10⁴ kW (AT: −6.19 × 10⁴ kW; ET: −4.53 × 10⁴ kW; PET: −4.81 × 10⁴ kW; UTCI: −3.72 × 10⁴ kW) and standard deviations of 63.09–64.40 × 10⁴ kW. A similar but slightly weaker underestimation bias is observed in the BP neural network, with mean residuals between −5.53 and −3.93 × 10⁴ kW and comparable variability (standard deviation: 62.59–64.58 × 10⁴ kW). The residual distributions of both models are moderately negatively skewed (linear: −1.02 to −0.86; BP: −1.13 to −0.96), suggesting a smoothing effect that limits their ability to capture nonlinear extremes in the meteorology–load relationship.
In contrast, the Random Forest (RF) model shows a clear shift toward positive residuals, indicating systematic overestimation. Mean residuals range from +2.49 to +7.59 × 10⁴ kW (AT: +2.49 × 10⁴ kW; PET: +3.02 × 10⁴ kW; ET: +7.59 × 10⁴ kW; UTCI: +6.12 × 10⁴ kW). Notably, ET and UTCI with a 2-day aggregation window exhibit substantially larger positive biases than AT and PET with a 1-day lag. This suggests that RF may overfit cumulative features over longer windows, leading to conservative predictions during high-load conditions. The RF residuals also display wider dispersion (standard deviation: 61.30–66.58 × 10⁴ kW) and pronounced left-skewed heavy tails (skewness: −1.93 to −1.61; kurtosis: 5.17–6.98), indicating higher sensitivity to extreme deviations.
In terms of overall variability, the ensemble model achieves the lowest average residual standard deviation (62.62 × 10⁴ kW), outperforming BP (63.50 × 10⁴ kW), RF (63.73 × 10⁴ kW), and linear regression (63.73 × 10⁴ kW). This indicates improved predictive stability. Notably, BP shows slightly lower variance than RF in the test phase, suggesting improved generalization relative to its training performance.
Across thermal indices, a clear performance gradient is observed. Averaged over all models, UTCI achieves the best performance (mean RMSE = 62.81 × 10⁴ kW), followed by PET (63.23 × 10⁴ kW) and ET (63.34 × 10⁴ kW), while the temperature-only model (AT) performs worst (64.03 × 10⁴ kW). Among all configurations, the Ensemble–UTCI model with a 2-day window yields the best overall performance (RMSE = 60.96 × 10⁴ kW; residual standard deviation = 61.02 × 10⁴ kW), highlighting the advantage of combining physiologically based indices with adaptive temporal aggregation under humid subtropical conditions.
It should be noted that the identification of optimal lag structures and model configurations is subject to uncertainty due to the limited sample size (training: n = 306; testing: n = 143). In small-sample settings, the relative frequency of extreme weather events may influence RMSE rankings. Future studies should validate the robustness of the index-adaptive lag framework using longer time series or multi-regional datasets.

3.2.4. Uncertainty Analysis

To further evaluate the uncertainty quantification capability of the models, an adaptive stratified residual bootstrap approach (n = 2000) was applied to construct 90% prediction intervals. As shown in Figure 10, the prediction interval coverage probability (PICP) for all models over the full test set is close to the nominal 90% level (89.51–91.61%), indicating good statistical reliability of the interval estimates.
In terms of interval sharpness, the ensemble models demonstrate improved efficiency. The prediction interval normalized average width (PINAW) is reduced to 14.48–16.61%, representing a narrowing of approximately 2–9% compared to the BP neural network (15.10–16.02%). This reflects the advantage of the stacking framework in reducing predictive variance. However, this improvement is not universal. Under the AT configuration, the PINAW of the ensemble model (15.57%) is slightly higher than that of BP (15.51%), suggesting that variance reduction through ensembling depends on both model structure and input features.
Among all configurations, the UTCI–Ensemble model achieves the narrowest prediction interval (PINAW = 14.48%), followed closely by PET–Ensemble (PINAW = 14.74%). This indicates that ensemble models driven by biometeorological indices provide more precise uncertainty quantification compared to temperature-only inputs.
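The two interval metrics can be computed as in the sketch below. It assumes the common definitions (PICP as the fraction of observations inside the interval, PINAW as the mean interval width divided by the range of the observations) and uses a plain residual bootstrap; the adaptive stratified variant applied in the study is not reproduced. Residuals are taken here as observed minus predicted, the data are synthetic, and the function names are hypothetical.

```python
import numpy as np

def picp(y, lower, upper):
    """Prediction interval coverage probability: share of observations inside the interval."""
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    return float(np.mean((y >= lower) & (y <= upper)))

def pinaw(y, lower, upper):
    """Mean interval width normalized by the range of the observations."""
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    return float(np.mean(upper - lower) / (y.max() - y.min()))

def residual_bootstrap_interval(pred, residuals, level=0.90, n_boot=2000, seed=0):
    """Add resampled training residuals (observed - predicted) to each forecast
    and take the central `level` quantiles."""
    rng = np.random.default_rng(seed)
    pred = np.asarray(pred, dtype=float)
    draws = rng.choice(np.asarray(residuals, dtype=float),
                       size=(n_boot, len(pred)), replace=True)
    sims = pred + draws
    alpha = 1.0 - level
    return (np.quantile(sims, alpha / 2, axis=0),
            np.quantile(sims, 1 - alpha / 2, axis=0))

# Synthetic forecasts and errors with the same sample sizes as the study.
rng = np.random.default_rng(0)
train_resid = rng.normal(0.0, 60.0, size=306)
pred = rng.normal(1000.0, 90.0, size=143)
obs = pred + rng.normal(0.0, 60.0, size=143)

lower, upper = residual_bootstrap_interval(pred, train_resid)
coverage, width = picp(obs, lower, upper), pinaw(obs, lower, upper)
print(f"PICP = {coverage:.2%}, PINAW = {width:.2%}")
```

A well-calibrated 90% interval should give a PICP near 0.90; among intervals with similar coverage, a smaller PINAW indicates a sharper forecast.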
It is important to note that the summer of 2021 in Guangdong was characterized by multiple episodes of extreme heat, which significantly influenced electricity demand and contributed to the lower prediction accuracy relative to 2019 and 2020. Between 20 May and 20 September, there were 19 days with maximum temperature exceeding 35 °C. In particular, from 23 July to 30 July, temperatures remained above 35 °C for eight consecutive days, with a peak of 36.8 °C. Under these persistent high-temperature conditions, electricity demand remained at elevated levels, exceeding 10,000 × 10⁴ kW from early May and reaching a peak of 13,513 × 10⁴ kW on 27 July.
To examine model behavior under extreme conditions, a case study was conducted for the heat event from 23 July to 30 July 2021. Given the limited duration of this period (8 days), the results should be interpreted as preliminary observations rather than definitive evidence of model robustness. As summarized in Table 6, the Random Forest (RF) model outperforms the BP neural network for most indices, reducing prediction errors by approximately 10–16%. However, this advantage is negligible under the AT configuration (only 0.8%), indicating that temperature alone provides limited nonlinear information under extreme conditions.
Specifically, the PET–RF configuration achieves the lowest prediction error, with a mean absolute percentage error (MAPE) of 3.03% and a dispersion of 416.6. This is followed by AT–Ensemble (MAPE = 3.48%) and AT–RF (MAPE = 3.50%). In contrast, the BP model shows relatively larger errors (MAPE = 3.49–4.03%) and higher dispersion (557.0–585.9), suggesting weaker adaptability to abrupt temperature extremes.
In terms of interval performance, PET–RF and PET–Ensemble both achieve 100% coverage during this heat event period, outperforming other configurations. This indicates that PET may better capture sustained thermal stress under extreme conditions. The adaptive lag framework also appears effective in accommodating differences in response times across thermal indices.
Overall, the ensemble models achieve a balance between coverage and sharpness in uncertainty estimation, while RF-based models—particularly when combined with PET—show relatively stable performance during extreme heat events. However, given the limited sample size of the case study, the statistical significance of these differences remains uncertain. Future work should incorporate longer time series and additional extreme events to robustly assess model performance under anomalous climate conditions.

3.2.5. Probabilistic Forecast Evaluation of the Optimal Model

To further assess the reliability of probabilistic predictions, the Ensemble–UTCI model, identified as the best-performing configuration in the previous comparison (test RMSE = 60.96 × 10⁴ kW; standard deviation = 61.02 × 10⁴ kW; R² = 0.559), was selected to construct 90% prediction intervals (PIs) using a bootstrap resampling approach (n = 2000). The analysis focuses on both seasonal performance and extreme heat event conditions.
Compared with single models, the ensemble approach effectively integrates the strengths of the BP neural network and Random Forest through a Ridge meta-learner (BP coefficient = 0.113; RF coefficient = 0.871; intercept = 28.101). This combination improves interval efficiency while maintaining adequate coverage. As shown in Figure 11, the Ensemble–UTCI model achieves a prediction interval coverage probability (PICP) of 90.91% over the summer test period (May–September, n = 143), closely matching the nominal confidence level of 90%. The prediction interval normalized average width (PINAW) is 14.48%, representing a reduction of approximately 8.3% compared to the BP model (15.79%) and about 2.0% compared to the RF model (14.77%). This demonstrates the advantage of the ensemble strategy in reducing predictive variance and improving uncertainty quantification.
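As a small worked example, the reported Ridge coefficients define a linear combination of the two base forecasts. The sketch below assumes that the intercept is expressed in the same units as the forecasts (10⁴ kW); the input values are purely illustrative and the function name is hypothetical.

```python
import numpy as np

def ensemble_predict(bp_pred, rf_pred, w_bp=0.113, w_rf=0.871, intercept=28.101):
    """Ridge meta-learner output: intercept + w_bp * BP + w_rf * RF."""
    return intercept + w_bp * np.asarray(bp_pred, dtype=float) \
        + w_rf * np.asarray(rf_pred, dtype=float)

# Illustrative base forecasts (assumed units: 10^4 kW), not values from the study.
bp_forecast = np.array([980.0, 1000.0])
rf_forecast = np.array([1020.0, 1000.0])
print(ensemble_predict(bp_forecast, rf_forecast))
```

The two weights sum to 0.984 rather than 1, so the intercept absorbs the remaining offset; the combination is therefore not a pure weighted average of the BP and RF forecasts.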
However, model performance under extreme conditions reveals important limitations. During the heat event from 23 to 30 July 2021 (8 consecutive days with maximum temperature above 35 °C), the model maintains relatively stable point predictions (MAPE = 3.65%; dispersion = 52.60 × 10⁴ kW). Nevertheless, due to the limited sample size, these results should be interpreted as preliminary. Notably, the prediction interval coverage drops to 75%, indicating under-coverage under extreme high-load conditions. This behavior suggests that the interval estimates become overly narrow, likely due to the variance-reduction effect introduced by the ensemble framework.
Despite this limitation, the ensemble model still outperforms the BP model in point prediction accuracy (MAPE = 4.03%). The Ridge meta-learner partially offsets the systematic underestimation of BP (mean residual = −3.93 × 10⁴ kW) against the overestimation of RF (mean residual = +6.11 × 10⁴ kW), yielding a mean residual of +4.35 × 10⁴ kW, which is smaller than that of RF. However, the residual remains positive, indicating that the RF-driven overestimation still dominates within the ensemble.
From an operational perspective, the proposed probabilistic framework provides a useful reference for risk-aware grid management. The bootstrap-based 90% prediction interval, with an average half-width of approximately ±72 × 10⁴ kW (about 7.2% of mean load), can serve as an initial guideline for reserve capacity planning. Under normal conditions, system operators may refer to the upper bound of the Ensemble–UTCI interval for reserve allocation. However, during extreme heat events, caution is required. Given that the validation is based on a single short-duration event and exhibits reduced coverage (75%), more conservative strategies, such as adopting RF-based intervals or applying additional safety margins, are recommended to mitigate the risk of underestimation.
Overall, the Ensemble–UTCI model demonstrates a balanced performance at the seasonal scale, achieving a reasonable trade-off between predictive accuracy and uncertainty quantification. While its application in operational decision-making shows promise, further validation using longer time series and multiple extreme events is necessary to ensure robustness under anomalous climate conditions.

4. Discussion

4.1. Model Structure and Forecasting Performance

This study compared the performance of the Back-Propagation Neural Network (BP), Random Forest (RF), and a Stacking ensemble model for summer electricity load prediction in Guangdong Province. The results reveal clear differences between the training phase (2019–2020) and the independent test phase (May–September 2021).
During training, RF exhibited the strongest fitting capability, with the RF–AT model achieving the highest coefficient of determination (R² = 0.875) and the lowest RMSE (40.72 × 10⁴ kW), reflecting its strong ability to capture nonlinear relationships. However, in the independent test phase, its generalization performance declined (R² = 0.474–0.554; RMSE = 61.30–66.60 × 10⁴ kW), indicating a certain degree of overfitting under complex meteorological conditions.
The introduction of the Stacking ensemble framework improved both predictive performance and stability. By combining BP and RF outputs through a Ridge meta-learner, the ensemble model achieved an average RMSE of 62.51 × 10⁴ kW in the test set, outperforming BP (63.47 × 10⁴ kW), linear regression (63.70 × 10⁴ kW), and RF (63.72 × 10⁴ kW). Among all configurations, the Ensemble–UTCI model performed best (R² = 0.559, RMSE = 60.96 × 10⁴ kW, MAE = 45.10 × 10⁴ kW), with the lowest error dispersion (Std = 61.02 × 10⁴ kW).
Analysis of Ridge coefficients indicates that the ensemble model relies primarily on RF outputs (weights ≈ 0.85–0.87), while BP contributes less (≈0.11–0.13), mainly correcting systematic biases. Residual analysis further confirms that the ensemble model effectively balances the error structures of individual models, improving overall stability.
Bootstrap-based interval estimation shows that all models achieve prediction interval coverage probabilities (PICP) close to the nominal 90% level (89.51–91.61%). The ensemble model demonstrates higher efficiency, with the narrowest prediction interval normalized average width (PINAW = 14.48% for UTCI–Ensemble), indicating reduced uncertainty without compromising reliability.

4.2. Performance of Thermal Indices and Physical Interpretation

Predictive performance varies across the biometeorological indices. Based on the overall results across all models, UTCI shows the best performance (average RMSE = 62.81 × 10⁴ kW), followed by PET (63.23 × 10⁴ kW) and ET (63.34 × 10⁴ kW), while models using only air temperature exhibit the largest errors (64.03 × 10⁴ kW).
Bayesian bootstrap analysis provides a more rigorous statistical assessment. Within the RF framework, UTCI shows very strong evidence of improvement over air temperature (posterior probability = 98.1%, BF₁₀ = 37.2). Similarly, strong evidence is observed in the ensemble model (posterior probability = 91.2%, BF₁₀ = 9.8). However, this advantage is not statistically supported in BP or linear models (posterior probability < 0.1), indicating a clear dependence on model structure.
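A minimal sketch of a Bayesian bootstrap comparison of two error series is given below. It assumes flat Dirichlet weights over the test days and reports the share of posterior draws in which the second model has the lower RMSE. The residual values, function name, number of draws, and seed are hypothetical, and the Bayes factor and ROPE calculations used in this study are not reproduced.

```python
import math
import random


def bayesian_bootstrap_rmse_diff(err_a, err_b, draws=4000, seed=0):
    """Posterior draws of RMSE(a) - RMSE(b) under Dirichlet(1, ..., 1) weights."""
    rng = random.Random(seed)
    sq_a = [e * e for e in err_a]
    sq_b = [e * e for e in err_b]
    diffs = []
    for _ in range(draws):
        # Normalised Exp(1) variates form one draw from a flat Dirichlet.
        g = [rng.expovariate(1.0) for _ in sq_a]
        total = sum(g)
        rmse_a = math.sqrt(sum(w * s for w, s in zip(g, sq_a)) / total)
        rmse_b = math.sqrt(sum(w * s for w, s in zip(g, sq_b)) / total)
        diffs.append(rmse_a - rmse_b)
    return diffs


# Hypothetical paired test residuals (10^4 kW); not data from this study.
err_at = [12.0, -30.0, 45.0, -8.0, 60.0, -25.0, 18.0, -40.0, 5.0, 33.0]
err_utci = [10.0, -28.0, 30.0, -12.0, 41.0, -20.0, 22.0, -35.0, 9.0, 25.0]

diffs = bayesian_bootstrap_rmse_diff(err_at, err_utci)
# Positive differences mean the second series (here UTCI) has the lower RMSE.
prob_utci_better = sum(d > 0 for d in diffs) / len(diffs)
print(prob_utci_better)
```

Because both RMSE values in a draw share the same weights, the comparison is paired by day, which tends to give a tighter posterior than resampling the two models independently.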
The superior performance of UTCI can be attributed to its comprehensive representation of thermal stress. It integrates air temperature, humidity, wind speed, and radiation, and is based on a human energy balance model. In the hot and humid climate of Guangdong, high humidity suppresses evaporative cooling, making perceived temperature significantly higher than air temperature. UTCI captures this effect more effectively, leading to improved prediction of cooling-driven electricity demand.
In contrast, PET, although based on human energy balance, assumes a standardized reference environment that may limit its responsiveness in humid conditions. ET, as an empirical index, lacks explicit representation of radiation and wind effects, resulting in comparatively weaker performance under complex climatic conditions.

4.3. Model Performance Under Extreme Heat Events

Extreme heat events provide a critical test for model robustness, although the analysis in this study is based on a limited sample. A heat event during 23–30 July 2021 (8 consecutive days with maximum temperature exceeding 35 °C) was used as a case study.
The results indicate that RF performs better overall during this event. For example, the PET–RF model achieves the lowest MAPE (3.03%), significantly outperforming BP models (3.49–4.03%). PET–RF also shows the lowest error dispersion (41.66 × 10⁴ kW), indicating higher stability under prolonged heat stress.
From the perspective of interval prediction, PET–RF and PET–Ensemble achieve full coverage (PICP = 100%), whereas some models (e.g., Ensemble–UTCI) show reduced coverage (75%), lower than the annual average (~90%). This suggests that extreme meteorological conditions increase predictive uncertainty.
However, it should be emphasized that this analysis is based on a single event with a small sample size. Therefore, the findings should be interpreted as preliminary rather than generalizable conclusions.
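For reference, the event-window scores can be computed as in the sketch below. It assumes MAPE is the mean absolute error relative to the observed load and that error dispersion is the population standard deviation of the forecast errors. The eight daily values are hypothetical and are not the observed or predicted loads of the July 2021 event.

```python
import statistics


def mape(observed, predicted):
    """Mean absolute percentage error, in percent."""
    ratios = [abs(o - p) / abs(o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def error_dispersion(observed, predicted):
    """Population standard deviation of the forecast errors."""
    errors = [p - o for o, p in zip(observed, predicted)]
    return statistics.pstdev(errors)


# Hypothetical 8-day window (10^4 kW); not data from this study.
observed = [13000.0, 13150.0, 13200.0, 13350.0, 13400.0, 13300.0, 13250.0, 13100.0]
predicted = [12700.0, 12900.0, 13350.0, 12950.0, 13050.0, 13500.0, 12900.0, 12800.0]

print(round(mape(observed, predicted), 2), round(error_dispersion(observed, predicted), 2))
```

With only eight days in the window, a single poorly forecast day shifts the MAPE noticeably, which is one reason the event-level rankings should be read with caution.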

4.4. Uncertainty and Study Limitations

The model-dependent performance of biometeorological indices suggests that no single index is universally optimal across all modeling frameworks. In addition, the standardized human assumptions embedded in these indices should be carefully considered.
In this study, a detrending approach based on removing the linear trend of summer peak load was used to isolate meteorological load. However, this method cannot fully separate the underlying base load driven by urban development, which remains a limitation.
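The detrending step can be sketched as an ordinary least-squares line fitted against the day index, with the residual treated as the meteorological load. The exact variant used here (for example, whether the fit is made per year) is not stated in this section, so the function below and its synthetic input are assumptions for illustration only.

```python
def linear_detrend(load):
    """Fit a least-squares line against the time index and return the residual."""
    n = len(load)
    t_mean = (n - 1) / 2.0
    y_mean = sum(load) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(load))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    trend = [intercept + slope * t for t in range(n)]
    residual = [y - tr for y, tr in zip(load, trend)]
    return slope, trend, residual


# Hypothetical daily peak loads (10^4 kW): a steady rise plus weather-like swings.
swings = [0.0, 30.0, -20.0, 50.0, -10.0, -40.0, 25.0, -35.0]
peak_load = [9000.0 + 4.0 * t + s for t, s in enumerate(swings)]

slope, trend, met_load = linear_detrend(peak_load)
print(round(slope, 3), [round(v, 1) for v in met_load])
```

The fitted slope need not equal the 4.0 used to build the series, because any trend-like part of the swings is absorbed into the line. This mirrors the limitation noted above: a linear fit cannot cleanly separate base-load growth from weather-driven variation.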
Moreover, the validation of extreme heat events is based solely on the 23–30 July 2021 heat event (8 days), which is insufficient to support robust conclusions on model reliability. Future studies should incorporate longer time series and more extreme events to improve the robustness of the findings.

5. Conclusions

5.1. Main Findings

Using summer electricity load data from Guangdong Province, this study developed BP neural network, random forest, and stacking ensemble models and evaluated the forecasting performance of several biometeorological thermal indices. The main findings are summarized as follows:
  1. Model performance: RF shows the strongest fitting ability during training (R² = 0.875; RMSE = 40.72 × 10⁴ kW), but its generalization performance decreases in the test phase. The ensemble model achieves the best overall performance (average RMSE = 62.51 × 10⁴ kW), with Ensemble–UTCI performing best (R² = 0.559; RMSE = 60.96 × 10⁴ kW; MAE = 45.10 × 10⁴ kW; Std = 61.02 × 10⁴ kW), indicating improved stability.
  2. Effect of biometeorological indices: Incorporating ET, PET, and UTCI improves prediction accuracy, with UTCI performing best (average RMSE = 62.81 × 10⁴ kW). Bayesian analysis shows strong statistical evidence in RF (98.1%) and ensemble models (91.2%), but not in BP or linear models, indicating model dependency.
  3. Performance during extreme heat events: Under extreme heat conditions, prediction errors increase, but RF remains relatively stable. The PET–RF model achieves the lowest MAPE (3.03%) and highest interval reliability (PICP = 100%). However, this conclusion is based on a single event and requires further validation.
Overall, biometeorological indices show clear potential for improving electricity load forecasting in hot and humid regions.

5.2. Implications and Future Research

Although the models performed well during the 23–30 July 2021 heat event, accurately predicting the timing and duration of peak load remained challenging due to the complexity of load dynamics and meteorological processes.
Future research should focus on:
  • Incorporating urban-scale data, socioeconomic factors, and advanced methods such as deep learning and big data analytics to improve prediction reliability under climate change;
  • Expanding datasets to include multiple regions and a wider range of extreme heat events to better assess model robustness;
  • Using larger samples and more rigorous statistical methods to further validate the performance differences among biometeorological indices.

Author Contributions

Conceptualization, J.M., H.Y. and F.X.; methodology, J.M. and Y.Z.; software, J.M., Q.H. and L.P.; formal analysis, J.M. and H.Y.; data curation, J.M. and L.P.; writing—original draft preparation, J.M.; writing—review and editing, J.M., F.X. and Y.Z.; visualization, J.M., H.S. and Q.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Guangdong Basic and Applied Basic Research Foundation (Grant No. 2024A1515010064), the Guangdong Basic and Applied Basic Research Fund Meteorological Joint Fund (Grant No. 2024A1515510034), the Guangdong Science and Technology Plan Project (Grant No. 2024B1212040002), the program for scientific research start-up funds of Guangdong Ocean University (Grant No. 060302032307), and the National Key R&D Program of China (Grant No. 2018YFA0605604).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Solar radiation data were obtained from the NASA POWER project [https://power.larc.nasa.gov/data-access-viewer/, accessed on 20 April 2026].

Acknowledgments

We thank China Southern Power Grid for providing power load data, the Guangdong Climate Center for providing meteorological data, and NASA for providing solar radiation data. We also appreciate the editors for their assistance in manuscript preparation, and the anonymous reviewers for their valuable comments and insightful suggestions.

Conflicts of Interest

The authors declare no conflicts of interest. Mr. Haibo Shen is an employee of China Power Dispatching Control Center, China Southern Power Grid Co., Ltd. The paper reflects the views of the scientists and not the company.

Abbreviations

The following abbreviations are used in this manuscript:
AT: Average Temperature
ET: Effective Temperature
BP: Back Propagation
RF: Random Forest
PET: Physiological Equivalent Temperature
UTCI: Universal Thermal Climate Index
PICP: Prediction Interval Coverage Probability
PINAW: Prediction Interval Normalized Average Width
PI: Prediction Interval

References

  1. Han, S.; Li, D.; Li, K.; Wu, H.; Gao, Y.; Zhang, Y.; Yuan, R. Analysis and Study of Transmission Line Icing Based on Grey Correlation Pearson Combinatorial Optimization Support Vector Machine. Measurement 2024, 236, 115086.
  2. Yuan, J.; Zhang, K.; Ding, B.; Huangfu, W.; Zhang, J.; Mou, Q.; Peng, K.; Zhang, H. Enhancement Strategy of Power System Resilience for Supply-Demand Imbalance at Extreme Weather Conditions: A High-Share Renewable Energy Case from Qinghai Province in China. Renew. Energy 2026, 256, 124048.
  3. Chen, L.; Li, C.; Xin, Z.; Nie, S. Simulation and Risk Assessment of Power System with Cascading Faults Caused by Strong Wind Weather. Int. J. Electr. Power Energy Syst. 2022, 143, 108462.
  4. Rahman, T.; Hossain Lipu, M.S.; Alom Shovon, M.M.; Alsaduni, I.; Karim, T.F.; Ansari, S. Unveiling the Impacts of Climate Change on the Resilience of Renewable Energy and Power Systems: Factors, Technological Advancements, Policies, Challenges, and Solutions. J. Clean. Prod. 2025, 493, 144933.
  5. Zhang, T.; Li, H.-Z.; Xie, B.-C. Have Renewables and Market-Oriented Reforms Constrained the Technical Efficiency Improvement of China’s Electric Grid Utilities? Energy Econ. 2022, 114, 106237.
  6. Wang, J.; Zhang, X.; Wang, Y.; Yang, S.; Wang, S.; Xie, Y.; Gong, J.; Lin, J. Power System Source-Load Forecasting Based on Scene Generation in Extreme Weather. Energy 2025, 330, 136991.
  7. Zhang, W. The Effect of Meteorology Factors on Power Load in High-Temperature Seasons. Adv. Mater. Res. 2014, 1008–1009, 796–799.
  8. Xu, X. Short-Term Load Forecasting of Power System. In Materials Science, Energy Technology, and Power Engineering I, Proceedings of the 1st International Conference on Materials Science, Energy Technology, Power Engineering (MEP 2017), Hangzhou, China, 15–16 April 2017; AIP Publishing: College Park, MD, USA, 2017; p. 020041.
  9. Imani, M. Electrical Load-Temperature CNN for Residential Load Forecasting. Energy 2021, 227, 120480.
  10. Kelo, S.; Dudul, S. A Wavelet Elman Neural Network for Short-Term Electrical Load Prediction under the Influence of Temperature. Int. J. Electr. Power Energy Syst. 2012, 43, 1063–1071.
  11. Karanta, I.; Ruusunen, J. Modelling the Temperature Factor in Short-Term Electrical Load Forecasting. IFAC Proc. Vol. 1992, 25, 269–272.
  12. Al-Hamadi, H.M.; Soliman, S.A. Short-Term Electric Load Forecasting Based on Kalman Filtering Algorithm with Moving Window Weather and Load Model. Electr. Power Syst. Res. 2004, 68, 47–59.
  13. Li, Y.; Pizer, W.A.; Wu, L. Climate Change and Residential Electricity Consumption in the Yangtze River Delta, China. Proc. Natl. Acad. Sci. USA 2019, 116, 472–477.
  14. Yi-Ling, H.; Hai-Zhen, M.; Guang-Tao, D.; Jun, S. Influences of Urban Temperature on the Electricity Consumption of Shanghai. Adv. Clim. Change Res. 2014, 5, 74–80.
  15. Ye, X.; Chen, F.; Hou, Z. Analysis on Electric Loads and Temperature in Wuhan City. Procedia Eng. 2015, 121, 2157–2162.
  16. Wu, J.; Gao, X.; Giorgi, F.; Chen, D. Changes of Effective Temperature and Cold/Hot Days in Late Decades over China Based on a High Resolution Gridded Observation Dataset. Int. J. Climatol. 2017, 37, 788–800.
  17. Abdel-Ghany, A.M.; Al-Helal, I.M.; Shady, M.R. Human Thermal Comfort and Heat Stress in an Outdoor Urban Arid Environment: A Case Study. Adv. Meteorol. 2013, 2013, 693541.
  18. Chen, Y.-C.; Matzarakis, A. Modification of Physiologically Equivalent Temperature. J. Heat Isl. Inst. Int. 2014, 9, 26–32.
  19. Jendritzky, G.; De Dear, R.; Havenith, G. UTCI—Why Another Thermal Index? Int. J. Biometeorol. 2012, 56, 421–428.
  20. Bröde, P.; Fiala, D.; Błażejczyk, K.; Holmér, I.; Jendritzky, G.; Kampmann, B.; Tinz, B.; Havenith, G. Deriving the Operational Procedure for the Universal Thermal Climate Index (UTCI). Int. J. Biometeorol. 2012, 56, 481–494.
  21. Wang, C.; Zhan, W.; Liu, Z.; Li, J.; Li, L.; Fu, P.; Huang, F.; Lai, J.; Chen, J.; Hong, F.; et al. Satellite-Based Mapping of the Universal Thermal Climate Index over the Yangtze River Delta Urban Agglomeration. J. Clean. Prod. 2020, 277, 123830.
  22. Pantavou, K.; Kotroni, V.; Lagouvardos, K.; Kyriakou, P. Future Changes of Bioclimate in Greece: Variations in Thermal Stress According to the Universal Thermal Climate Index (UTCI). Sci. Total Environ. 2025, 980, 179514.
  23. Rahman, F.; Hossain, M.D.; Akhter, M.A.E. Study of Heat Wave Events by the Thermophysiological Comfort Indexes in Khulna City, Bangladesh, during Pre-Monsoon and Monsoon Seasons. City Environ. Interact. 2025, 28, 100275.
  24. Chen, W.-A.; Fang, P.-L.; Hwang, R.-L. Calibrating the UTCI Scale for Hot and Humid Climates through Comprehensive Year-Round Field Surveys to Improve the Adaptability. Urban Clim. 2025, 59, 102267.
  25. Nie, T.; Lai, D.; Liu, K.; Lian, Z.; Yuan, Y.; Sun, L. Discussion on Inapplicability of Universal Thermal Climate Index (UTCI) for Outdoor Thermal Comfort in Cold Region. Urban Clim. 2022, 46, 101304.
  26. Mao, Y.; Yang, F.; Wang, C. Application of BP Network to Short-Term Power Load Forecasting Considering Weather Factor. In Proceedings of the 2011 International Conference on Electric Information and Control Engineering, Wuhan, China, 15–17 April 2011; pp. 172–175.
  27. Jing, S.; Delv, Z.; Hu, L.; Bingjie, L.; Jian, T.; Yunli, S. Research on Meteorological Sensitive Load in Jiangsu Based on Meteorological Dominant Factors. In Proceedings of the 2019 IEEE Innovative Smart Grid Technologies—Asia (ISGT Asia), Chengdu, China, 21–24 May 2019; pp. 2672–2676.
  28. Huang, Q.; Li, Y.; Liu, S.; Liu, P. Hourly Load Forecasting Model Based on Real-Time Meteorological Analysis. In Proceedings of the 2016 8th International Conference on Computational Intelligence and Communication Networks (CICN), Tehri, India, 23–25 December 2016; pp. 488–492.
  29. Cheng, Y.-Y.; Chan, P.P.K.; Qiu, Z.-W. Random Forest Based Ensemble System for Short Term Load Forecasting. In Proceedings of the 2012 International Conference on Machine Learning and Cybernetics, Xi’an, China, 15–17 July 2012; pp. 52–56.
  30. Di, S. Power System Short Term Load Forecasting Based on Weather Factors. In Proceedings of the 2020 3rd World Conference on Mechanical Engineering and Intelligent Manufacturing (WCMEIM), Shanghai, China, 4–6 December 2020; pp. 694–698.
  31. Lahouar, A.; Ben Hadj Slama, J. Random Forests Model for One Day Ahead Load Forecasting. In Proceedings of the IREC2015 The Sixth International Renewable Energy Congress, Sousse, Tunisia, 24–26 March 2015; pp. 1–6.
  32. Hou, H.; Liu, K.; Li, X.; Chen, S.; Wang, W.; Rong, K. Assessing the Urban Heat Island Variations and Its Influencing Mechanism in Metropolitan Areas of Pearl River Delta, South China. Phys. Chem. Earth Parts ABC 2020, 120, 102953.
  33. Liu, X.; Chen, H. Integrated Assessment of Ecological Risk for Multi-Hazards in Guangdong Province in Southeastern China. Geomat. Nat. Hazards Risk 2019, 10, 2069–2093.
  34. Waheed, W.; Xu, Q. Data-Driven Short Term Load Forecasting with Deep Neural Networks: Unlocking Insights for Sustainable Energy Management. Electr. Power Syst. Res. 2024, 232, 110376.
  35. Groß, A.; Lenders, A.; Schwenker, F.; Braun, D.A.; Fischer, D. Comparison of Short-Term Electrical Load Forecasting Methods for Different Building Types. Energy Inform. 2021, 4, 13.
  36. Bae, H.-J.; Park, J.-S.; Choi, J.; Kwon, H.-Y. Learning Model Combined with Data Clustering and Dimensionality Reduction for Short-Term Electricity Load Forecasting. Sci. Rep. 2025, 15, 3575.
  37. Bhupatiraju, A.; Ahn, S.B. Explainability-Driven Feature Engineering for Mid-Term Electricity Load Forecasting in ERCOT’s SCENT Region. arXiv 2025, arXiv:2507.22220.
  38. Dong, Q.; Huang, R.; Cui, C.; Towey, D.; Zhou, L.; Tian, J.; Wang, J. Short-Term Electricity-Load Forecasting by Deep Learning: A Comprehensive Survey. Eng. Appl. Artif. Intell. 2025, 154, 110980.
  39. Mousaeipour, M.; Sanaieian, H.; Faizi, M.; Alafchi, A.; Hosseini, S.M.A. Multi-Objective Optimization of Urban Block Configuration to Enhance Outdoor Thermal Comfort: A Case Study of District 12, Tehran. Results Eng. 2025, 27, 106012.
  40. Veisi, O.; Tehrani, A.A.; Gharaei, B.; Du, D.K.; Shakibamanesh, A. Towards Universal Thermal Climate Index Prediction via Machine Learning Approaches. Renew. Sustain. Energy Rev. 2025, 217, 115680.
  41. Wang, Y.; Han, Z.; Gao, R. Changes of Extreme High Temperature and Heavy Precipitation in the Guangdong-Hong Kong-Macao Greater Bay Area. Geomat. Nat. Hazards Risk 2021, 12, 1101–1126.
  42. Zhang, C.; Liao, H.; Mi, Z. Climate Impacts: Temperature and Electricity Consumption. Nat. Hazards 2019, 99, 1259–1275.
  43. Aziz, A.; Mahmood, D.; Qureshi, M.S.; Qureshi, M.B.; Kim, K. AI-Based Peak Power Demand Forecasting Model Focusing on Economic and Climate Features. Front. Energy Res. 2024, 12, 1328891.
  44. Zimmermann, M.; Ziel, F. Spatial Meteorological, Socio-Economic, and Political Risks in Probabilistic Electricity Demand Forecasting. SSRN 2024.
  45. Li, B.; Wu, Z.; Zhang, Z.; Li, Y.; Gholinia, F. The Impact of Socio-Economic–Climatic Indicators on Hydropower Production and Energy Demand Correlation Using Echo State Network and Quantum-Based Sand Cat Swarm Optimization Algorithm. Int. J. Low-Carbon Technol. 2025, 20, 1979–1993.
  46. Wang, X.; Wang, H.; Bhandari, B.; Cheng, L. AI-Empowered Methods for Smart Energy Consumption: A Review of Load Forecasting, Anomaly Detection and Demand Response. Int. J. Precis. Eng. Manuf.-Green Technol. 2024, 11, 963–993.
  47. Muqtadir, A.; Li, B.; Qi, B.; Ge, L.; Du, N.; Lin, C. Demand Response Potential Forecasting: A Systematic Review of Methods, Challenges, and Future Directions. Energies 2025, 18, 5217.
  48. Peng, J.; Kimmig, A.; Wang, D.; Niu, Z.; Liu, X.; Tao, X.; Ovtcharova, J. Energy Consumption Forecasting Based on Spatio-Temporal Behavioral Analysis for Demand-Side Management. Appl. Energy 2024, 374, 124027.
  49. Sun, K.; Specian, M.; Hong, T. Nexus of Thermal Resilience and Energy Efficiency in Buildings: A Case Study of a Nursing Home. Build. Environ. 2020, 177, 106842.
  50. Halhoul Merabet, G.; Essaaidi, M.; Ben Haddou, M.; Qolomany, B.; Qadir, J.; Anan, M.; Al-Fuqaha, A.; Abid, M.R.; Benhaddou, D. Intelligent Building Control Systems for Thermal Comfort and Energy-Efficiency: A Systematic Review of Artificial Intelligence-Assisted Techniques. Renew. Sustain. Energy Rev. 2021, 144, 110969.
  51. Moustris, K.; Kavadias, K.A.; Zafirakis, D.; Kaldellis, J.K. Medium, Short and Very Short-Term Prognosis of Load Demand for the Greek Island of Tilos Using Artificial Neural Networks and Human Thermal Comfort-Discomfort Biometeorological Data. Renew. Energy 2020, 147, 100–109.
  52. Gomez-Sanchez, S.; Rodríguez-Hernández, O.; Barrios, G. Analysis of the Relationship between the Regional Electricity Demand and the Universal Thermal Climate Index in Mexico. Energy Sustain. Dev. 2026, 92, 101968.
  53. Brimicombe, C.; Di Napoli, C.; Quintino, T.; Pappenberger, F.; Cornforth, R.; Cloke, H.L. Thermofeel: A Python Thermal Comfort Indices Library. SoftwareX 2022, 18, 101005.
  54. Wang, X.; Fu, X.; Xu, W.; Yao, L. The signatures and drivers of thermal comfort across the urban-rural gradient in Chinese cities from 2000 to 2020. Acta Geogr. Sin. 2024, 79, 1318–1336.
  55. Sailor, D.J. Relating Residential and Commercial Sector Electricity Loads to Climate—Evaluating State Level Sensitivities and Vulnerabilities. Energy 2001, 26, 645–657.
  56. Bessec, M.; Fouquau, J. The Non-Linear Link between Electricity Consumption and Temperature in Europe: A Threshold Panel Approach. Energy Econ. 2008, 30, 2705–2721.
  57. Hong, T.; Fan, S. Probabilistic Electric Load Forecasting: A Tutorial Review. Int. J. Forecast. 2016, 32, 914–938.
  58. Kacprzyk, J.; Pedrycz, W. (Eds.) Springer Handbook of Computational Intelligence; Springer: Berlin/Heidelberg, Germany, 2015; ISBN 978-3-662-43504-5.
  59. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  60. Höppe, P. The Physiological Equivalent Temperature—A Universal Index for the Biometeorological Assessment of the Thermal Environment. Int. J. Biometeorol. 1999, 43, 71–75.
  61. Luo, S.B.; Ji, Z.P.; Ma, Y.H.; Luo, X.M.; Zeng, Q.; Lin, S.B. The Variability Characteristics and Prediction of Guangdong Electrical Load During 2002–2004. J. Trop. Meteorol. 2007, 23, 153–161.
Figure 1. Guangdong daily maximum power load from 2019–2021.
Figure 2. Correlation between summer meteorological load and cumulative meteorological load in Guangdong Province from 2019 to 2021.
Figure 3. Hyperparameter Search Heatmap (Normalized Test RMSE). Note: Rows represent combinations of each index × window (AT Day1–3, ET Day1–3, PET Day1–3, UTCI Day1–3), and columns represent four algorithms. Greener color indicates lower RMSE (better performance), and bold text marks the optimal configuration for each algorithm column.
Figure 4. Weekly Load Pattern and Weekend Relative Load Ratio (2019–2021). Note: Subfigure (a) shows the average weekly load pattern (2019–2021), with the vertical axis for average load and horizontal axis for days of the week. Subfigure (b) presents weekend load relative to weekday average: solid lines for Saturday/Sunday relative percentages, the gray dashed line for the 100% reference line, and dotted lines for their overall averages. Bold text in (a) marks daily average load values.
Figure 5. Holiday Load Percentage Relative to Pre-Holiday Weekday Average. Note: Subfigure (a) shows May Day load percentage, and (b) shows Dragon Boat Festival load percentage. Lines represent different years (2019–2021), the gray dotted line indicates the 100% normal level, and bold text marks the optimal/minimal values.
Figure 6. Comparative time-series forecasting performance of linear regression, backpropagation neural network, random forest, and stacked ensemble models using thermal comfort indices during the summer peak load period (May–September 2021). (A) Linear regression (LR) with dash-dot lines. (B) Backpropagation neural network (BP) with dotted lines. (C) Random forest (RF) with dashed lines. (D) Stacked ensemble model (Ridge meta-learner) with solid lines. Black solid lines represent observed load.
Figure 7. Bayesian posterior distributions of RMSE differences between UTCI and air temperature across modeling algorithms. Positive values indicate superior performance of UTCI over AT. Shaded regions represent the ROPE (region of practical equivalence), and dashed vertical lines denote the 95% highest density interval (HDI). BF₁₀, Bayes factor; RF, random forest.
Figure 8. Residual diagnostic analysis for evaluating prediction error distributions and heteroscedasticity patterns across thermal comfort indices and algorithm types. (A) Linear regression (LR). (B) BP neural network. (C) Random forest. (D) Stacked ensemble model. Scatter plots show residuals vs. predicted load.
Figure 9. Residual analysis and forecast error comparison of machine learning algorithms across thermal comfort indices. (a) box plots of residual distributions; (b) RMSE comparison; (c) standard deviation of residuals; (d) residual time series for best and worst performers. All error metrics are in units of 10⁴ kW.
Figure 10. Comparison of 90% Bootstrap prediction interval metrics across different models and algorithms: (a) coverage probability (PICP); (b) normalized average width (PINAW).
Figure 11. Probabilistic forecasting performance of the Ensemble-UTCI model: (a) Annual 90% prediction intervals (May–September 2021) with smoothed observed load (red dashed box indicates the extreme event window), and (b) detailed prediction during the extreme heat event (23–30 July 2021) showing observed load, predicted values, and corresponding MAPE, dispersion, and PICP.
Table 1. Stability of Socio-economic Indicators in Guangdong Province (2019–2021).
| Indicator Category | Specific Indicator | 2019 | 2020 | 2021 |
| --- | --- | --- | --- | --- |
| A. Industrial Structure | Share of secondary industry/% | 40.2 | 39.5 | 40.4 |
| | Share of tertiary industry/% | 55.8 | 56.3 | 55.6 |
| B. Electricity Consumption Structure | Share of industrial electricity/% | 60.8 | 59.8 | 58.9 |
| | Share of residential electricity/% | 16.1 | 17 | 16.7 |
| | Share of wholesale & catering electricity/% | 6.2 | 6.2 | 6.6 |
| | Share of temperature-sensitive load/% | 22.3 | 23.2 | 23.3 |
| C. Population & Urbanization | Urbanization rate/% | 72.65 | 74.15 | 74.63 |
| | Inter-provincial net migration rate/‰ | 6.19 | 5.51 | 5.87 |
| | Permanent population/10,000 persons | 12,489 | 12,624 | 12,684 |
| | Male share/% | 52.27 | 53.07 | 52.77 |
| | Female share/% | 47.73 | 46.93 | 47.23 |
| | Share of population aged 0–14/% | 16.28 | 18.85 | 18.73 |
| | Share of population aged 15–64/% | 74.72 | 72.57 | 72.15 |
| | Share of population aged ≥65/% | 9 | 8.58 | 9.12 |
Table 2. Correlation Coefficients Between Summer Meteorological Load and Meteorological Factors in Guangdong Province from 2019 to 2021.
| Year | Month | Average Temperature/°C | ET/°C | PET/°C | UTCI/°C | Relative Humidity/% | Wind Speed/(m·s⁻¹) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2019 | 5 | 0.74 *** | 0.76 *** | 0.75 *** | 0.76 *** | 0.37 ** | 0.10 |
| | 6 | 0.38 ** | 0.41 ** | 0.40 ** | 0.40 ** | −0.11 | 0.04 |
| | 7 | 0.40 ** | 0.41 ** | 0.41 ** | 0.41 ** | −0.40 ** | −0.19 |
| | 8 | 0.75 *** | 0.75 *** | 0.75 *** | 0.74 *** | −0.69 *** | −0.36 ** |
| | 9 | 0.43 ** | 0.45 ** | 0.44 ** | 0.44 ** | 0.30 | −0.27 |
| 2020 | 5 | 0.40 ** | 0.47 *** | 0.47 *** | 0.49 *** | 0.31 * | 0.35 * |
| | 6 | 0.74 *** | 0.78 *** | 0.78 *** | 0.80 *** | −0.16 | 0.32 * |
| | 7 | 0.47 *** | 0.48 *** | 0.47 *** | 0.47 *** | −0.53 *** | −0.26 |
| | 8 | 0.47 *** | 0.52 *** | 0.50 *** | 0.54 *** | −0.31 * | −0.34 * |
| | 9 | 0.69 *** | 0.65 *** | 0.67 *** | 0.64 *** | −0.41 ** | −0.17 |
| 2021 | 5 | 0.84 *** | 0.84 *** | 0.84 *** | 0.83 *** | −0.37 ** | 0.63 *** |
| | 6 | 0.58 *** | 0.56 *** | 0.58 *** | 0.54 *** | −0.32 * | 0.44 ** |
| | 7 | 0.55 *** | 0.55 *** | 0.55 *** | 0.55 *** | −0.48 *** | −0.16 |
| | 8 | 0.19 | 0.13 | 0.13 | 0.10 | −0.11 | 0.26 |
| | 9 | 0.27 | 0.31 | 0.31 | 0.32 | −0.16 | −0.18 |
| Average | | 0.53 *** | 0.54 *** | 0.54 *** | 0.54 *** | −0.20 ** | 0.01 |
Note: *, **, and *** represent significance levels of 0.10, 0.05, and 0.01, respectively.
Table 3. (a) Input Feature Configuration for Each Prediction Model. (b) Hyperparameters and training settings of BP neural network and random forest model.
Table 3. (a) Input Feature Configuration for Each Prediction Model. (b) Hyperparameters and training settings of BP neural network and random forest model.
(a)

| Model | Index | List of Input Features |
|---|---|---|
| Model 1 | AT | Meteorological load of the previous day + Average AT of the previous day (AT) |
| Model 2 | ET | Meteorological load of the previous day + Average ET of the previous 2 days (ET) |
| Model 3 | PET | Meteorological load of the previous day + Average PET of the previous day (PET) |
| Model 4 | UTCI | Meteorological load of the previous day + Average UTCI of the previous 2 days (UTCI) |

(b)

| Category | Parameter Type | Parameter Name | Model 1 (AT) | Model 2 (ET) | Model 3 (PET) | Model 4 (UTCI) |
|---|---|---|---|---|---|---|
| BP Neural Network | Training Optimization | Number of hidden layer neurons | (64, 32) | (64, 32) | (64, 32) | (64, 32) |
| | Training Optimization | Initial learning rate | 0.01 | 0.01 | 0.01 | 0.01 |
| | | Maximum number of iterations | 3000 | 3000 | 3000 | 3000 |
| | | Early stopping strategy | Enabled | Enabled | Enabled | Enabled |
| | Core Function | Activation function | ReLU | ReLU | ReLU | ReLU |
| | | Optimizer | Adam | Adam | Adam | Adam |
| Random Forest | Tree Structure | Number of decision trees | 149 | 189 | 189 | 189 |
| | | Maximum tree depth | 20 | 15 | 15 | 15 |
| | Feature Selection | Maximum number of features | log2 | sqrt | sqrt | sqrt |
| | Node Splitting | Minimum samples for splitting | 5 | 10 | 10 | 10 |
| | | Minimum samples per leaf | 1 | 2 | 2 | 2 |
| | Sampling Strategy | Bootstrap sampling | TRUE | TRUE | TRUE | TRUE |
| | Parallel Computing | n_jobs | −1 | −1 | −1 | −1 |
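The configuration in Table 3 can be written down compactly as follows. This is a minimal sketch under two assumptions that the table itself does not confirm: (i) the parameter names follow scikit-learn's `MLPRegressor` and `RandomForestRegressor` (suggested by `n_jobs`, `log2`, and `sqrt`), and (ii) "average of the previous 2 days" means the arithmetic mean of the daily index over the two preceding days. The names `BP_PARAMS`, `RF_PARAMS`, `INDEX_LAG_DAYS`, and `build_features` are hypothetical, as are the sample values.

```python
# Assumption: keys mirror scikit-learn constructor arguments; the source
# table does not name the library that was used.
BP_PARAMS = {
    "hidden_layer_sizes": (64, 32),
    "learning_rate_init": 0.01,
    "max_iter": 3000,
    "early_stopping": True,
    "activation": "relu",
    "solver": "adam",
}

_RF_AT = dict(n_estimators=149, max_depth=20, max_features="log2",
              min_samples_split=5, min_samples_leaf=1)
_RF_OTHER = dict(n_estimators=189, max_depth=15, max_features="sqrt",
                 min_samples_split=10, min_samples_leaf=2)

# One independent dict per model so that later edits do not alias each other.
RF_PARAMS = {
    "AT": dict(_RF_AT),
    "ET": dict(_RF_OTHER),
    "PET": dict(_RF_OTHER),
    "UTCI": dict(_RF_OTHER),
}
for params in RF_PARAMS.values():
    params.update(bootstrap=True, n_jobs=-1)

# Number of preceding days averaged for each index, read from Table 3(a).
INDEX_LAG_DAYS = {"AT": 1, "ET": 2, "PET": 1, "UTCI": 2}


def build_features(load, index_values, lag_days):
    """Return (X, y) where each X row is
    [previous-day load, mean index over the preceding lag_days days]
    and y is the load of the target day."""
    if len(load) != len(index_values):
        raise ValueError("load and index series must have the same length")
    X, y = [], []
    for t in range(lag_days, len(load)):
        window = index_values[t - lag_days:t]
        X.append([load[t - 1], sum(window) / lag_days])
        y.append(load[t])
    return X, y


# Hypothetical daily values for illustration only (not the study's data).
daily_load = [500, 510, 530, 525, 540]
daily_et = [24, 25, 27, 26, 28]
X, y = build_features(daily_load, daily_et, INDEX_LAG_DAYS["ET"])
print(X)
print(y)
```

If scikit-learn is available, the dictionaries can be unpacked directly into the constructors, for example `RandomForestRegressor(**RF_PARAMS["UTCI"])` or `MLPRegressor(**BP_PARAMS)`.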
Table 4. High-Temperature Conditions in Guangdong During Two Holiday Periods from May to September, 2019–2021.
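| Holiday | Day | 2019/°C | 2020/°C | 2021/°C | Average/°C |
|---|---|---|---|---|---|
| May Day Holiday | 29-April | 30.3 | 30 | 27.7 | 29.3 |
| | 30-April | 27.6 | 30.5 | 31 | 29.7 |
| | 1-May | 25.7 | 30.3 | 32.5 | 29.5 |
| | 2-May | 23.5 | 31.2 | 31.2 | 28.6 |
| | 3-May | 25.4 | 33.5 | 26.2 | 28.4 |
| | 4-May | 24 | 33.9 | 29.5 | 29.1 |
| | 5-May | 22 | 33.6 | 30.4 | 28.7 |
| Dragon Boat Festival | 3 Days Before | 32.6 | 34.7 | 32.8 | 33.4 |
| | 2 Days Before | 31.8 | 34.8 | 31 | 32.5 |
| | 1 Day Before | 32.9 | 34.6 | 31.3 | 32.9 |
| | Festival Day | 33.8 | 33.2 | 32.2 | 33.1 |
| | 1 Day After | 33.9 | 33.3 | 33.5 | 33.6 |
| | 2 Days After | 33.1 | 34.2 | 33.2 | 33.5 |
| | 3 Days After | 31.1 | 34.2 | 34.4 | 33.2 |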
Table 5. Parameter statistics for forecast model of the summer maximum power load in Guangdong.
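| Dataset | Algorithm | Model | MSE/(10⁴ kW)² | R² | RMSE/10⁴ kW | Error < 2% | Error < 5% | AvgRE% | MaxRE% |
|---|---|---|---|---|---|---|---|---|---|
| Training Set (2019–2020, May–September) | Linear | AT | 510,959 | 0.615 | 714.81 | 0.252 | 0.618 | 5.465 | 50.848 |
| | | ET | 506,525 | 0.618 | 711.71 | 0.265 | 0.608 | 5.46 | 50.913 |
| | | PET | 506,727 | 0.618 | 711.85 | 0.271 | 0.618 | 5.459 | 51.047 |
| | | UTCI | 505,801 | 0.619 | 711.2 | 0.252 | 0.627 | 5.459 | 52.133 |
| | BP | AT | 508,865 | 0.616 | 713.35 | 0.271 | 0.595 | 5.462 | 51.273 |
| | | ET | 497,553 | 0.625 | 705.37 | 0.261 | 0.595 | 5.389 | 52.06 |
| | | PET | 499,263 | 0.624 | 706.59 | 0.271 | 0.588 | 5.415 | 51.875 |
| | | UTCI | 496,000 | 0.626 | 704.27 | 0.258 | 0.631 | 5.374 | 53.851 |
| | RF | AT | 147,340 | 0.889 | 383.85 | 0.474 | 0.859 | 2.893 | 28.428 |
| | | ET | 243,213 | 0.817 | 493.17 | 0.369 | 0.758 | 3.718 | 39.009 |
| | | PET | 255,863 | 0.807 | 505.83 | 0.379 | 0.768 | 3.825 | 39.44 |
| | | UTCI | 263,763 | 0.801 | 513.58 | 0.399 | 0.752 | 3.851 | 40.63 |
| | Ensemble | AT | 226,411 | 0.829 | 475.83 | 0.428 | 0.771 | 3.523 | 36.226 |
| | | ET | 244,911 | 0.815 | 494.88 | 0.366 | 0.758 | 3.739 | 39.289 |
| | | PET | 275,106 | 0.793 | 524.51 | 0.379 | 0.761 | 3.949 | 41.334 |
| | | UTCI | 284,049 | 0.786 | 532.96 | 0.392 | 0.735 | 3.998 | 42.248 |
| Testing Set (2021, May–September) | Linear | AT | 398,098 | 0.528 | 630.95 | 0.385 | 0.734 | 4.003 | 34.906 |
| | | ET | 412,196 | 0.511 | 642.03 | 0.315 | 0.741 | 4.198 | 32.437 |
| | | PET | 397,983 | 0.528 | 630.86 | 0.357 | 0.734 | 4.057 | 34.395 |
| | | UTCI | 414,710 | 0.508 | 643.98 | 0.308 | 0.727 | 4.238 | 32.09 |
| | BP | AT | 391,729 | 0.535 | 625.88 | 0.399 | 0.748 | 3.925 | 33.548 |
| | | ET | 410,452 | 0.513 | 640.67 | 0.315 | 0.762 | 4.195 | 31.403 |
| | | PET | 392,483 | 0.534 | 626.48 | 0.378 | 0.762 | 3.982 | 33.619 |
| | | UTCI | 417,022 | 0.505 | 645.77 | 0.287 | 0.762 | 4.263 | 31.231 |
| | RF | AT | 443,341 | 0.474 | 665.84 | 0.336 | 0.713 | 4.185 | 38.471 |
| | | ET | 395,888 | 0.53 | 629.2 | 0.322 | 0.72 | 4.072 | 35.321 |
| | | PET | 410,786 | 0.513 | 640.93 | 0.392 | 0.734 | 3.959 | 37.272 |
| | | UTCI | 375,770 | 0.554 | 613 | 0.308 | 0.72 | 3.99 | 34.695 |
| | Ensemble | AT | 407,446 | 0.517 | 638.31 | 0.364 | 0.769 | 3.96 | 37.878 |
| | | ET | 386,354 | 0.542 | 621.57 | 0.322 | 0.748 | 4.019 | 35.081 |
| | | PET | 397,914 | 0.528 | 630.8 | 0.385 | 0.748 | 3.896 | 37.367 |
| | | UTCI | 371,625 | 0.559 | 609.61 | 0.308 | 0.72 | 3.973 | 34.667 |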
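The statistics in Table 5 can be computed from paired observed and predicted loads as sketched below. The definitions are assumptions based on common usage, since this section does not spell them out: relative error is taken as |predicted − observed| / observed, "Error < 2%" and "Error < 5%" as the fraction of days whose relative error falls below the threshold, and R² as 1 − SSE/SST. The function name `evaluate` and the sample values are hypothetical.

```python
import math


def evaluate(observed, predicted):
    """Error statistics in the style of Table 5 (definitions assumed)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    rel = [abs(e) / abs(o) for e, o in zip(errors, observed)]
    mse = sum(e * e for e in errors) / n
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "R2": 1.0 - (n * mse) / ss_tot,
        "share_lt_2pct": sum(r < 0.02 for r in rel) / n,
        "share_lt_5pct": sum(r < 0.05 for r in rel) / n,
        "AvgRE_pct": 100.0 * sum(rel) / n,
        "MaxRE_pct": 100.0 * max(rel),
    }


# Hypothetical loads for illustration only (not the study's data).
observed = [1000, 1100, 1200, 1300]
predicted = [1010, 1067, 1272, 1300]
stats = evaluate(observed, predicted)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")

# Consistency check on a tabulated pair: RMSE should equal sqrt(MSE).
print(round(math.sqrt(371625), 2))  # 609.61, the Ensemble-UTCI test-set RMSE
```

The last line confirms that the tabulated RMSE of 609.61 is the square root of the tabulated MSE of 371,625 for the Ensemble–UTCI test-set row.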
Table 6. Comparison of forecast accuracy and prediction interval reliability across thermal comfort indices and machine learning algorithms during regular and extreme heat event periods.
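| Model | Algo | MAPE (%) | Dispersion | PICP (May–September) (%) | PICP (Extreme) (%) | PINAW (%) |
|---|---|---|---|---|---|---|
| AT | Linear | 3.7 | 565.8 | 90.91 | 87.5 | 15.79 |
| AT | BP | 3.53 | 557 | 90.21 | 87.5 | 15.45 |
| AT | RF | 3.5 | 475.7 | 90.91 | 87.5 | 15.86 |
| AT | Ensemble | 3.48 | 484 | 89.51 | 75 | 15.56 |
| ET | Linear | 4.11 | 585.2 | 90.21 | 87.5 | 16.23 |
| ET | BP | 4.01 | 581.7 | 90.21 | 75 | 15.9 |
| ET | RF | 3.36 | 499.3 | 89.51 | 87.5 | 14.91 |
| ET | Ensemble | 3.45 | 507.4 | 90.21 | 87.5 | 14.49 |
| PET | Linear | 3.72 | 563.8 | 90.91 | 87.5 | 15.52 |
| PET | BP | 3.49 | 547.8 | 90.21 | 87.5 | 15.1 |
| PET | RF | 3.03 | 416.6 | 90.21 | 100 | 15.27 |
| PET | Ensemble | 3.09 | 427.8 | 90.91 | 100 | 14.74 |
| UTCI | Linear | 4.12 | 585.9 | 90.21 | 87.5 | 16.23 |
| UTCI | BP | 4.03 | 579.3 | 90.21 | 75 | 15.74 |
| UTCI | RF | 3.6 | 523.7 | 91.61 | 87.5 | 14.66 |
| UTCI | Ensemble | 3.65 | 526 | 90.21 | 75 | 14.44 |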
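The interval metrics in Table 6 can be illustrated with a short sketch. It assumes the commonly used definitions, which this section does not state: PICP (prediction interval coverage probability) is the percentage of observations that fall inside their prediction interval, and PINAW (prediction interval normalized average width) is the mean interval width divided by the range of the observations. The function names, the fixed half-width of 50, and the sample values are hypothetical. The 8-day window is chosen only because the tabulated extreme-event PICP values (75, 87.5, 100) are multiples of 12.5, which would be consistent with 8 days; that count is an inference, not a figure given here.

```python
def picp(observed, lower, upper):
    """Percentage of observations lying inside [lower, upper]."""
    inside = sum(lo <= o <= hi for o, lo, hi in zip(observed, lower, upper))
    return 100.0 * inside / len(observed)


def pinaw(observed, lower, upper):
    """Mean interval width as a percentage of the observed range."""
    span = max(observed) - min(observed)
    if span == 0:
        raise ValueError("PINAW is undefined when all observations are equal")
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(observed)
    return 100.0 * mean_width / span


def mape(observed, predicted):
    """Mean absolute percentage error."""
    total = sum(abs(p - o) / abs(o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


# Hypothetical 8-day window for illustration only (not the study's data).
observed = [1200, 1250, 1300, 1350, 1400, 1380, 1320, 1260]
predicted = [1190, 1270, 1290, 1330, 1330, 1390, 1310, 1280]
half_width = 50  # assumed fixed interval half-width
lower = [p - half_width for p in predicted]
upper = [p + half_width for p in predicted]

print(f"PICP  = {picp(observed, lower, upper):.1f}%")   # 7 of 8 days covered
print(f"PINAW = {pinaw(observed, lower, upper):.1f}%")
print(f"MAPE  = {mape(observed, predicted):.2f}%")
```

In this toy window one observation (1400) falls outside its interval, so coverage is 7/8 = 87.5%. Narrower intervals lower PINAW but risk lower PICP, which is the trade-off the two columns of Table 6 are meant to expose.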
Miao, J.; Yang, H.; Zhang, Y.; Hao, Q.; Peng, L.; Xu, F.; Shen, H. Analysis of the Impact of Biometeorological Thermal Indices on Summer Peak Power Load Forecasting in Guangdong Province. Atmosphere 2026, 17, 463. https://doi.org/10.3390/atmos17050463