A Short-Term Load Forecasting Model Based on STL Decomposition and CNN-BiLSTM Optimized by Deep Reinforcement Learning

Wang, Yi; Zhou, Jian; Wu, Gang; Ma, Ruiguang; Ma, Tiannan; Liu, Jichun; Wang, Dezhuang

doi:10.3390/electronics15112375


by Yi Wang ¹, Jian Zhou ¹, Gang Wu ¹, Ruiguang Ma ², Tiannan Ma ², Jichun Liu ³,* and Dezhuang Wang ³

¹ State Grid Sichuan Electric Power Company, Chengdu 610065, China

² State Grid Sichuan Economic Research Institute, Chengdu 610041, China

³ School of Electrical Engineering, Sichuan University, Chengdu 610065, China

* Author to whom correspondence should be addressed.

Electronics 2026, 15(11), 2375; https://doi.org/10.3390/electronics15112375

Submission received: 8 May 2026 / Revised: 27 May 2026 / Accepted: 28 May 2026 / Published: 1 June 2026

(This article belongs to the Special Issue Reinforcement Learning: Emerging Techniques and Future Prospects)


Abstract

Accurate short-term electricity load forecasting is crucial for day-ahead scheduling and secure operation of power systems. However, electricity load series exhibit significant non-stationarity, with complex coupling between low-frequency trends and high-frequency fluctuations, making it difficult for conventional forecasting models to simultaneously characterize the overall trend and stochastic disturbances. To address this issue, this paper proposes a short-term load forecasting model based on STL decomposition and CNN-BiLSTM optimized by deep reinforcement learning. First, the original load series is decomposed into trend, seasonal, and residual components using the STL algorithm. Second, a dual-channel parallel forecasting architecture is constructed: the linear channel uses a linear regression model to predict the trend and seasonal components, thereby characterizing the low-frequency variations in the load; the nonlinear channel uses a CNN-BiLSTM framework optimized by deep reinforcement learning to predict the high-frequency residual component, and this process is formulated as a Markov decision process. Specifically, the attention-based CNN-BiLSTM serves as the policy network, and its forecasting strategy is dynamically optimized under the guidance of a reward function to enhance the modeling capability for high-frequency stochastic fluctuations. Finally, the load forecasting results for the next 24 h are obtained through dual-channel result reconstruction. Experimental results based on the ERCOT system-level load data show that the proposed model achieves superior forecasting performance, with a root mean square error of 976.4 MW and a mean absolute percentage error of 1.81%. 
Further multi-season testing, meteorological perturbation analysis, fair comparison under the same STL preprocessing, and ablation experiments demonstrate that the proposed model maintains good forecasting performance under different seasonal scenarios, meteorological input errors, and fair experimental settings, thereby validating its effectiveness for short-term load forecasting.

Keywords:

short-term electricity load forecasting; STL decomposition; deep reinforcement learning; CNN-BiLSTM; dual-channel model

1. Introduction

Power load forecasting provides an essential foundation for day-ahead dispatch, operational scheduling, and security assessment in power systems [1,2]. With the continuous development of new power systems and the increasing penetration of renewable energy, uncertainty and volatility on the load side have further increased [3]. Raw load series usually contain slowly varying trend components, relatively stable periodic components, and high-frequency disturbances jointly driven by meteorological conditions and stochastic behaviors. For day-ahead short-term forecasting tasks, how to capture both the overall load trend and local disturbances under non-stationary conditions remains a key issue in short-term load forecasting research.

To address the modeling difficulties caused by the complex components of load series, extensive studies have been conducted. Traditional statistical models, such as time series analysis [4], regression analysis [5], exponential smoothing [6], and Kalman filtering [7], have the advantage of clear structures, but their adaptability to complex nonlinear relationships and multi-scale fluctuations is limited. Machine learning methods, such as random forest [8], extreme learning machine [9], and support vector regression [10], improve nonlinear fitting capability, but most of them still perform unified modeling directly on the original load series. In recent years, decomposition–ensemble methods have gradually become a research focus. Related studies have employed decomposition algorithms such as STL [11] to extract periodic trend components, seasonal components, and stochastic high-frequency components, thereby reducing the modeling difficulty of the original series and improving forecasting performance by feeding the decomposed results into subsequent forecasting models. However, most existing studies treat decomposition mainly as a preprocessing step and have not yet constructed targeted forecasting models according to the characteristics of different components, lacking a differentiated modeling mechanism.

With the development of deep learning, existing studies have mainly adopted deep learning models for forecasting [12]. Convolutional neural networks [13] can extract local fluctuations from time series, while LSTM [14,15] and BiLSTM [16,17] enhance the characterization of long- and short-term dependencies through gated memory mechanisms and bidirectional temporal modeling. Meanwhile, deep reinforcement learning has also been introduced into load-related tasks [18,19,20], indicating that reinforcement learning has considerable application potential in load forecasting. However, in general, the synergy between existing deep learning models and deep reinforcement learning algorithms remains limited. For high-frequency components with stronger randomness and more pronounced abruptness, existing studies still mainly rely on static supervised losses for training and lack a mechanism that can dynamically adjust forecasting strategies based on cumulative feedback.

In addition, in terms of model combination, existing studies have developed several routes, including single-model methods [21] and hybrid-model methods [22,23]. However, most methods still focus on module stacking and pay relatively little attention to model design from the perspective of load components. Directly fitting the load variation process with deep learning networks may cause low-frequency variations and high-frequency disturbances to interfere with each other in the same representation space. In contrast, simply applying multiple forecasting models after load decomposition also makes it difficult to fully characterize the properties of different load components. Therefore, existing combination methods still lack a collaborative forecasting model with clear task division and logical consistency.

In summary, existing research on short-term load forecasting still has the following two limitations. First, although decomposition methods can reduce the impact of load non-stationarity, most studies still remain at the preprocessing stage and lack differentiated modeling based on component differences. Second, although deep learning models and attention mechanisms have improved the representation capability for complex temporal features, and deep reinforcement learning has also been applied to some load forecasting tasks, collaborative strategy optimization for high-frequency components remains insufficient.

To address these issues, this paper proposes a dual-channel short-term load forecasting model based on STL decomposition, linear regression, and CNN-BiLSTM optimized by deep reinforcement learning. First, STL is used to decompose the original load series into trend, seasonal, and residual components, based on which a dual-channel forecasting framework is constructed. In this framework, the linear channel is used to learn the low-frequency deterministic variations in the trend and seasonal components, while the nonlinear channel is used to model the high-frequency stochastic disturbances in the residual component. Second, a CNN-BiLSTM network is constructed in the residual channel and optimized within a deep reinforcement learning framework, where the forecasting strategy is improved through environment interaction and reward feedback. Finally, the forecasting results for the next 24 h are obtained through component reconstruction and compared with LSTM, BiLSTM, GRU, CNN-BiLSTM, TCN, and other models on the ERCOT system-level load dataset. Furthermore, multi-season testing, meteorological perturbation analysis, fair comparison under the same STL preprocessing, and ablation experiments are designed to evaluate the effectiveness of the proposed method from the perspectives of overall forecasting performance, seasonal adaptability, sensitivity to meteorological input errors, and the contribution of key modules. The experimental results show that, under the current dataset and experimental settings, the proposed model achieves better forecasting performance, with RMSE and MAPE reaching 976.4 MW and 1.81%, respectively, and outperforms both the baselines under the same STL preprocessing and the degraded model without the reinforcement learning module.

2. Model Forecasting Framework

This paper proposes a dual-channel short-term load forecasting model based on STL decomposition and CNN-BiLSTM optimized by deep reinforcement learning, as shown in Figure 1. First, meteorological features are selected through Pearson correlation analysis, and an input dataset is constructed by combining the selected meteorological features with date features and historical load data. Then, STL is used to decompose the original load series into trend, seasonal, and residual components. Based on the decomposed components, a dual-channel forecasting framework is constructed: the trend and seasonal components are fed into the linear channel for prediction, while the residual component is fed into the CNN-BiLSTM channel optimized by deep reinforcement learning. Finally, the load forecasting results are obtained through component reconstruction, and the forecasting performance is evaluated using error metrics.

3. Factors Affecting Electrical Load

The variation patterns of electricity load are influenced by multiple factors and are closely related to the meteorological conditions and living habits of a region. Therefore, this paper incorporates load-related influencing factors into the input variables to improve the forecasting performance of the neural network model. The influencing factors of electricity load cover several aspects. For short-term load forecasting, they can be mainly divided into date-related factors and meteorological factors.

3.1. Date Factor

Date information is a key factor influencing the inherent periodicity of electricity load patterns, with load variations exhibiting pronounced cyclicality across different time scales.

Over a seven-day week, electricity load demonstrates distinct periodic characteristics. Typically, weekday load levels are higher and exhibit similar profiles, whereas weekend load levels drop significantly due to reduced commercial and industrial activity. Intraday load curves also change, with evening peaks potentially becoming more pronounced.

Load patterns during statutory holidays also follow certain rules. Statutory holidays cause substantial shifts in societal production and lifestyle patterns, resulting in load characteristics distinct from both typical weekdays and weekends. They typically manifest as a significant reduction in load levels, with patterns closer to weekends, though specific manifestations vary by holiday type. Accurate identification and handling of holiday information are crucial for avoiding major forecasting errors.

3.2. Meteorological Factors

Meteorological factors exert a significant influence on electricity load. Common meteorological factors affecting load include temperature, humidity, wind speed, and weather patterns.

Air temperature is one of the key meteorological factors influencing short-term electricity load variations. A pronounced nonlinear relationship exists between power load and temperature: during summer, rising ambient temperatures trigger widespread use of cooling equipment, causing load increases; conversely, in winter, falling temperatures drive extensive deployment of heating equipment, similarly leading to load growth. Atmospheric humidity is another significant meteorological factor affecting electricity load. It typically does not directly impact load but indirectly amplifies or alters temperature’s driving effect on load by regulating human apparent temperature and comfort levels. Wind speed influences load by promoting building ventilation during summer gusts, thereby reducing the need for cooling equipment and lowering electricity demand. The combined effects of wind speed, temperature, and humidity on human comfort also influence the usage of related electrical appliances, thereby affecting power demand.

Compared to the ambient temperature in weather forecasts, apparent temperature more accurately reflects residential electricity consumption patterns. Apparent temperature represents the human body’s comprehensive sensation of multiple meteorological factors including temperature, humidity, and wind speed. The calculation formula for apparent temperature is shown in Equations (1) and (2).

AT = T_{a} + 0.33 \times e - 0.70 \times WS - 4.00

(1)

e = \frac{RH}{100} \times 6.105 \times \exp \left( \frac{17.27 \times T_{a}}{237.7 + T_{a}} \right)

(2)

where AT is the apparent temperature, T_{a} is the air temperature, e is the vapor pressure, WS is the wind speed at 10 m above ground level, and RH is the relative humidity.
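As a minimal sketch of Equations (1) and (2), the apparent temperature can be computed as follows. The function names are illustrative, and the units (air temperature in °C, relative humidity in percent, wind speed in m/s, vapor pressure in hPa) are assumptions, since they are not stated explicitly above.

```python
import math


def vapor_pressure(rh_percent: float, t_air_c: float) -> float:
    """Equation (2): vapor pressure e from relative humidity and air temperature."""
    return rh_percent / 100.0 * 6.105 * math.exp(17.27 * t_air_c / (237.7 + t_air_c))


def apparent_temperature(t_air_c: float, rh_percent: float, wind_speed: float) -> float:
    """Equation (1): apparent temperature AT (units are assumed, see lead-in)."""
    e = vapor_pressure(rh_percent, t_air_c)
    return t_air_c + 0.33 * e - 0.70 * wind_speed - 4.00


# Illustrative inputs: a warm, humid hour with a light wind.
at_example = apparent_temperature(t_air_c=30.0, rh_percent=60.0, wind_speed=2.0)
```

For warm and humid conditions the humidity term dominates the wind term, so the apparent temperature exceeds the air temperature, which matches the role of humidity described above.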

3.3. Summary of Influencing Factors

The Pearson correlation coefficient is used to quantify the linear relationship between input variables in electricity load forecasting, as shown in Equation (3).

R_{xy} = \frac{\sum_{i = 1}^{n} (x_{i} - \bar{x}) (y_{i} - \bar{y})}{\sqrt{\sum_{i = 1}^{n} {(x_{i} - \bar{x})}^{2}} \sqrt{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}}

(3)

where R_{xy} is the linear correlation coefficient between two variables x and y, computed over n paired samples, and \bar{x} and \bar{y} are the respective means of x and y. The closer the absolute value of R_{xy} is to 1, the stronger the linear correlation between the two variables.

The impact of meteorological factors on load was analyzed using Pearson’s correlation coefficient, with the results shown in Figure 2.

To avoid information leakage from the test set, the Pearson correlation coefficients between each meteorological variable and the load were calculated using only the training samples. The input features were then selected by considering both correlation strength and the physical mechanisms through which meteorological factors affect load demand. Specifically, |R_{xy}| \geq 0.3 was used as the screening threshold for meteorological variables, while the physical effects of temperature, humidity, wind speed, and solar radiation on residential electricity consumption and building heating/cooling loads were also considered.

As shown in Figure 2, the correlation coefficients between electricity load and apparent temperature, humidity, wind speed, and solar radiation are approximately 0.77, 0.63, −0.42, and 0.53, respectively, all meeting the feature selection criterion adopted in this study. Among them, apparent temperature comprehensively reflects the effects of air temperature, humidity, and wind speed on human thermal comfort, and is strongly related to residential cooling and heating electricity demand. Humidity indirectly affects air-conditioning load by changing perceived comfort. Wind speed influences building ventilation as well as heating and cooling load demand, while solar radiation is associated with daytime temperature rise and changes in building thermal load. In contrast, precipitation and ground air pressure show weaker correlations with load and have limited direct explanatory power in the present dataset; therefore, they were not selected as major meteorological input variables. Accordingly, apparent temperature, humidity, wind speed, and solar radiation were finally selected as meteorological input variables and combined with historical load and calendar variables to construct the model input feature set.
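The threshold-based part of this screening can be sketched as follows. The code covers only the |R_{xy}| ≥ 0.3 criterion from Equation (3), not the physical considerations, and the data are synthetic placeholders rather than the ERCOT records; the function names are hypothetical.

```python
import numpy as np


def pearson_r(x, y) -> float:
    """Equation (3): Pearson correlation coefficient between two series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))


def select_features(train_features: dict, train_load, threshold: float = 0.3):
    """Keep variables whose |R| with the load, on training samples only, meets the threshold."""
    scores = {name: pearson_r(values, train_load) for name, values in train_features.items()}
    selected = [name for name, r in scores.items() if abs(r) >= threshold]
    return selected, scores


# Synthetic training data: one variable drives the load, the other is unrelated noise.
rng = np.random.default_rng(0)
n = 500
temp = rng.normal(size=n)
unrelated = rng.normal(size=n)
load = 2.0 * temp + rng.normal(scale=0.5, size=n)

selected, scores = select_features(
    {"apparent_temperature": temp, "precipitation": unrelated}, load
)
```

Passing only the training split to `select_features` is what keeps the selection free of test-set information.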

In summary, this paper uses raw load data, highly correlated meteorological data, and date data as inputs for model training, as shown in Table 1.

4. Model Principle

4.1. Seasonal Trend Decomposition Based on LOESS

Electricity load data are usually composed of fluctuations at multiple frequencies superimposed on each other. As a filtering process, STL decomposition decomposes the original load series through nested inner and outer iterative loops. The decomposition model is defined as follows:

Y_{t} = T_{t} + S_{t} + R_{t}

(4)

where Y_{t} represents the observed original load value at time t; T_{t} denotes the trend component, reflecting the long-term low-frequency variation trend in the load sequence; S_{t} indicates the seasonal component, reflecting fixed fluctuation patterns with specific periodicity; and R_{t} signifies the residual component, representing the portion remaining after the trend and seasonal components are removed.

The core mathematical mechanism of STL lies in utilizing the LOESS smoother to perform local fitting on trend and seasonal components. For each target point, LOESS fits a low-order polynomial within its local neighborhood by minimizing the weighted least squares error, as expressed by the following formula:

\min_{P} \sum_{i = 1}^{N} W \left( \frac{|x_{i} - x|}{λ} \right) {(Y_{i} - P (x_{i}))}^{2}

(5)

where x denotes the target time point requiring smoothing estimation, x_{i} represents the temporal position of sample points within a local window centered at x, N indicates the total number of samples contained within the local window, P (x) signifies the low-order polynomial fitted within the local neighborhood, W (\cdot) is the weighting function that assigns weights based on the distance between sample points and the target point, λ is the bandwidth parameter, and Y_{i} is the actual load value at the time point x_{i}.
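A single LOESS estimate in the sense of Equation (5) can be sketched as a weighted least-squares polynomial fit. The tricube weighting function and the local linear polynomial used here are assumptions (they are common LOESS choices, but the weighting function W is not specified above), and `loess_point` is a hypothetical helper rather than part of any STL library.

```python
import numpy as np


def tricube(u):
    """Tricube weight: (1 - |u|^3)^3 for |u| < 1, zero otherwise (assumed choice of W)."""
    u = np.clip(np.abs(u), 0.0, 1.0)
    return (1.0 - u ** 3) ** 3


def loess_point(x0: float, xs, ys, bandwidth: float, degree: int = 1) -> float:
    """Smoothed value at x0 from a weighted least-squares polynomial fit (Equation (5))."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    w = tricube((xs - x0) / bandwidth)
    # Centre the design matrix at x0 so the intercept is the fitted value at x0.
    design = np.vander(xs - x0, degree + 1, increasing=True)
    sqrt_w = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(design * sqrt_w[:, None], ys * sqrt_w, rcond=None)
    return float(coef[0])


# Illustrative check on an exactly linear series, which a local linear fit reproduces.
xs = np.arange(48, dtype=float)
ys = 3.0 + 0.5 * xs
smoothed = loess_point(24.0, xs, ys, bandwidth=12.0)
```

Scaling each row by the square root of its weight turns the weighted objective of Equation (5) into an ordinary least-squares problem.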

To avoid introducing future information into the STL decomposition during day-ahead forecasting, this study adopts a rolling causal decomposition strategy to construct forecasting samples. For any forecasting origin t, STL decomposition is performed only on the historical load window of length w up to and including the current time, without using the actual load values within the forecasting horizon. Therefore, the decomposition procedure in the testing stage is consistent with practical day-ahead forecasting scenarios and avoids look-ahead information leakage caused by involving future load values in the decomposition.

Meanwhile, considering that LOESS smoothing may suffer from endpoint effects at both ends of a finite window, this study combines a historical decomposition window with an effective input window during sample construction to reduce boundary-induced disturbances. Specifically, decomposition is first performed over a relatively long historical window, and then the decomposed results close to the forecasting origin are extracted as model inputs. This treatment reduces the influence of boundary effects on forecasting inputs without introducing future actual load information.

4.2. CNN Network

The core advantages of convolutional neural networks lie in their local feature extraction capability and parameter-sharing mechanism, which make them particularly useful for processing multivariate time series data. When dealing with time series problems, one-dimensional convolutional neural networks (1D-CNNs) are commonly used. A 1D-CNN captures local patterns and short-term dependencies in a sequence by sliding convolution kernels along the temporal dimension.

Z_{j, t} = b_{j} + \sum_{c = 1}^{d} \sum_{s = 0}^{k - 1} w_{j, c, s} \cdot x_{c, t - s}, \quad t = 1, \dots, k

(6)

\mathrm{ReLU} (z) = \max (0, z)

(7)

where Z_{j, t} is the linear output of the j-th convolutional kernel at temporal position t, b_{j} is the bias term of the j-th convolutional kernel, d is the feature dimension, w_{j, c, s} is the weight of the j-th convolutional kernel for channel c at offset s within the window, x_{c, t - s} is the value of the c-th variable at time t - s, and k is the kernel length.
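Equations (6) and (7) can be illustrated with a small NumPy sketch. It evaluates the convolution only at positions where the whole kernel window lies inside the input (no padding), which is an assumption of this example; the array layout and function names are likewise illustrative.

```python
import numpy as np


def conv1d(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Equation (6) without padding.

    x: (d, T) input with d channels; w: (n_kernels, d, k); b: (n_kernels,).
    Returns Z with shape (n_kernels, T - k + 1).
    """
    d, T = x.shape
    n_kernels, _, k = w.shape
    out = np.empty((n_kernels, T - k + 1))
    for j in range(n_kernels):
        for idx, t in enumerate(range(k - 1, T)):
            # Reverse the slice so that column s holds x[:, t - s].
            window = x[:, t - k + 1 : t + 1][:, ::-1]
            out[j, idx] = b[j] + np.sum(w[j] * window)
    return out


def relu(z: np.ndarray) -> np.ndarray:
    """Equation (7)."""
    return np.maximum(0.0, z)


# Two input channels, five time steps, one kernel of length 2.
x = np.arange(10.0).reshape(2, 5)
w = np.zeros((1, 2, 2))
w[0, 0, 0] = 1.0            # only offset s = 0 of channel 0 contributes
b = np.array([-2.0])
features = relu(conv1d(x, w, b))
```

Because the same kernel weights are reused at every position t, the number of parameters does not grow with the length of the input window.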

4.3. Attention-Based BiLSTM Model

4.3.1. Standard LSTM Unit

Long short-term memory (LSTM) is an advanced variant of recurrent neural networks. Unlike the simple recurrent structure of a standard RNN unit, an LSTM is internally designed with a memory cell and three special gating structures, namely the forget gate, input gate, and output gate. Through sigmoid activation functions and element-wise multiplication operations, these gates control the flow, retention, and discarding of information in the memory cell. A typical LSTM unit structure is shown in Figure 3.

The gate and state updates are given in Equations (8)–(13).

f_{t} = σ (W_{f} \cdot [h_{t - 1}, x_{t}] + b_{f})

(8)

i_{t} = σ (W_{i} \cdot [h_{t - 1}, x_{t}] + b_{i})

(9)

{\tilde{C}}_{t} = \tanh (W_{C} \cdot [h_{t - 1}, x_{t}] + b_{C})

(10)

C_{t} = f_{t} ⊙ C_{t - 1} + i_{t} ⊙ {\tilde{C}}_{t}

(11)

o_{t} = σ (W_{o} \cdot [h_{t - 1}, x_{t}] + b_{o})

(12)

h_{t} = o_{t} ⊙ \tanh (C_{t})

(13)

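where f_{t}, i_{t}, {\tilde{C}}_{t}, C_{t}, o_{t}, and h_{t} represent the forget gate, input gate, candidate memory cell state, memory cell state, output gate, and hidden state at time step t, respectively; σ is the sigmoid activation function; ⊙ denotes element-wise multiplication; [h_{t - 1}, x_{t}] is the concatenation of the previous hidden state and the current input; and W and b with the corresponding subscripts are the weight matrices and bias vectors of each gate.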
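A single LSTM time step following Equations (8)–(13) can be written directly in NumPy. This is a minimal sketch for illustration: the parameter dictionary layout, the random initialization, and the dimensions are assumptions, not the configuration used in the experiments.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; each W_* has shape (hidden, hidden + input)."""
    concat = np.concatenate([h_prev, x_t])                      # [h_{t-1}, x_t]
    f = sigmoid(params["W_f"] @ concat + params["b_f"])         # Eq. (8)
    i = sigmoid(params["W_i"] @ concat + params["b_i"])         # Eq. (9)
    c_tilde = np.tanh(params["W_C"] @ concat + params["b_C"])   # Eq. (10)
    c = f * c_prev + i * c_tilde                                # Eq. (11)
    o = sigmoid(params["W_o"] @ concat + params["b_o"])         # Eq. (12)
    h = o * np.tanh(c)                                          # Eq. (13)
    return h, c


def init_params(input_dim: int, hidden_dim: int, seed: int = 0) -> dict:
    """Small random weights and zero biases (illustrative initialization)."""
    rng = np.random.default_rng(seed)
    params = {}
    for gate in ("f", "i", "C", "o"):
        params[f"W_{gate}"] = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim + input_dim))
        params[f"b_{gate}"] = np.zeros(hidden_dim)
    return params


# Run five time steps of a 3-feature input through a 4-unit cell.
params = init_params(input_dim=3, hidden_dim=4)
h, c = np.zeros(4), np.zeros(4)
for x_t in np.random.default_rng(1).normal(size=(5, 3)):
    h, c = lstm_step(x_t, h, c, params)
```

The additive update of the cell state in Equation (11) is what allows information to be carried across many time steps without being repeatedly squashed.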

4.3.2. BiLSTM Modeling

Standard LSTMs can capture long-range dependencies, but their unidirectional structure can only utilize historical information prior to time step t, failing to perceive subsequent evolutionary trends. In power load forecasting, load curve variations often exhibit smoothness and continuity. The current state is influenced not only by historical accumulation but also potentially constrained by future trends.

To this end, this paper employs a bidirectional long short-term memory network (BiLSTM). The BiLSTM consists of two independent LSTM layers operating in opposite directions: the forward layer processes the input sequence in chronological order from 1 to t to extract forward features, while the backward layer processes the sequence in reverse order from t to 1 to capture backward dependencies.

Let the forward hidden state be {\vec{h}}_{t} and the backward hidden state be {\overset{\leftarrow}{h}}_{t}. The final output state H_{t} of the BiLSTM at time t is formed by concatenating the forward and backward states, thereby integrating bidirectional contextual features. The relevant formulas are as follows:

{\vec{h}}_{t} = {LSTM}_{f w d} (x_{t}, {\vec{h}}_{t - 1})

(14)

{\overset{\leftarrow}{h}}_{t} = {LSTM}_{b w d} (x_{t}, {\overset{\leftarrow}{h}}_{t + 1})

(15)

H_{t} = [{\vec{h}}_{t} \oplus {\overset{\leftarrow}{h}}_{t}]

(16)

where {\vec{h}}_{t} denotes the forward hidden state, {LSTM}_{fwd} represents the forward LSTM cell, x_{t} is the input feature vector at time step t, {\vec{h}}_{t - 1} is the forward hidden state from the previous time step, {\overset{\leftarrow}{h}}_{t} denotes the backward hidden state, {LSTM}_{bwd} represents the backward LSTM cell, {\overset{\leftarrow}{h}}_{t + 1} is the backward hidden state at the following time step, H_{t} is the concatenated hidden state at time step t, and \oplus represents the concatenation operation.
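The bidirectional scan and concatenation of Equations (14)–(16) can be sketched as below. To keep the example short, a plain tanh recurrent cell stands in for the two LSTM cells; this substitution is an assumption of the sketch and only the scanning order and the concatenation are meant to be illustrative.

```python
import numpy as np


def make_cell(input_dim: int, hidden_dim: int, seed: int):
    """Stand-in recurrent cell h_t = tanh(W x_t + U h_prev); NOT a full LSTM."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.3, size=(hidden_dim, input_dim))
    U = rng.normal(scale=0.3, size=(hidden_dim, hidden_dim))

    def cell(x_t, h_prev):
        return np.tanh(W @ x_t + U @ h_prev)

    return cell


def bidirectional(xs: np.ndarray, cell_fwd, cell_bwd, hidden_dim: int) -> np.ndarray:
    """Return H with shape (T, 2 * hidden_dim): forward and backward states concatenated."""
    T = len(xs)
    h_fwd = np.zeros((T, hidden_dim))
    h_bwd = np.zeros((T, hidden_dim))
    h = np.zeros(hidden_dim)
    for t in range(T):                 # Eq. (14): chronological order
        h = cell_fwd(xs[t], h)
        h_fwd[t] = h
    h = np.zeros(hidden_dim)
    for t in reversed(range(T)):       # Eq. (15): reverse order
        h = cell_bwd(xs[t], h)
        h_bwd[t] = h
    return np.concatenate([h_fwd, h_bwd], axis=1)   # Eq. (16)


xs = np.random.default_rng(0).normal(size=(6, 3))
H = bidirectional(xs, make_cell(3, 4, seed=1), make_cell(3, 4, seed=2), hidden_dim=4)
```

In this sketch, the first half of each row of `H` depends only on inputs up to that position, whereas the second half depends only on inputs from that position onward within the same input window.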

4.3.3. Attention Mechanism

In traditional BiLSTM models, the hidden state at the final time step is typically used directly for prediction. This paper introduces a temporal attention mechanism after the BiLSTM layer to evaluate the importance of each hidden state at every time step in the BiLSTM output for the current prediction task, as shown in Figure 4.

First, the hidden state matrix output from the BiLSTM is fed into a fully connected layer. A nonlinear activation function is applied to compute the energy score at each time step, representing the significance of the features at that moment:

u_{t} = \tanh (W_{w} H_{t} + b_{w})

(17)

where u_{t} represents the energy score, W_{w} denotes the weight matrix, and b_{w} signifies the bias term.

The scores are normalized using the Softmax function to obtain the attention weight α_{t} for each time step:

α_{t} = \frac{\exp (u_{t}^{T} v_{w})}{\sum_{i = 1}^{n} \exp (u_{i}^{T} v_{w})}

(18)

where v_{w} is a randomly initialized context vector and n is the number of time steps.

The hidden states at all time steps are then weighted by the computed attention weights and summed to obtain the final context feature vector C containing the key information:

C = \sum_{t = 1}^{n} α_{t} H_{t}

(19)

To obtain the final residual prediction, C is fed into a fully connected layer for linear mapping:

{\hat{Y}}_{t}^{res} = W_{fc} \cdot C + b_{fc}

(20)

where {\hat{Y}}_{t}^{res} represents the residual forecasting result, W_{fc} denotes the weights of the fully connected layer, and b_{fc} indicates the bias of the fully connected layer.
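The attention pooling and output mapping of Equations (17)–(20) can be sketched in NumPy as follows. All dimensions and the random parameter values are placeholders chosen for illustration; the 72-step window and 24-step output follow the forecasting setting of this paper, while the hidden and attention sizes are assumptions.

```python
import numpy as np


def attention_head(H, W_w, b_w, v_w, W_fc, b_fc):
    """H: (n, m) BiLSTM outputs. Returns the forecast vector and the attention weights."""
    U = np.tanh(H @ W_w.T + b_w)             # Eq. (17): energy vectors, shape (n, a)
    scores = U @ v_w                         # u_t^T v_w for every time step
    scores = scores - scores.max()           # shift for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()   # Eq. (18): softmax over time
    context = alpha @ H                      # Eq. (19): weighted sum, shape (m,)
    y_hat = W_fc @ context + b_fc            # Eq. (20): linear output mapping
    return y_hat, alpha


rng = np.random.default_rng(0)
n_steps, m, a, horizon = 72, 8, 5, 24        # m and a are illustrative sizes
H = rng.normal(size=(n_steps, m))
W_w, b_w = rng.normal(size=(a, m)), np.zeros(a)
v_w = rng.normal(size=a)
W_fc, b_fc = rng.normal(size=(horizon, m)), np.zeros(horizon)

y_hat, alpha = attention_head(H, W_w, b_w, v_w, W_fc, b_fc)
```

Subtracting the maximum score before exponentiation leaves the softmax result unchanged but avoids overflow for large scores.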

Unlike conventional supervised learning, in which the network is fitted by gradient descent on a static loss and its output is taken directly as the final prediction, the CNN-BiLSTM model in the proposed framework functions as a policy network. Its parameters are iteratively optimized through environment interaction and reward feedback driven by a deep reinforcement learning algorithm.

4.4. Overall Network Architecture

4.4.1. Task Definition and Dual-Channel Division of Roles

The proposed model adopts a dual-channel parallel architecture, decomposing the complex short-term load forecasting task into two subtasks: low-frequency channel forecasting and high-frequency channel forecasting. Considering that the load data are hourly time series, a day-ahead 24 h forecasting setting is adopted in this paper. For any given time step, STL decomposition is first performed on a historical load window of 168 points to obtain the trend, seasonal, and residual components. Then, the most recent 72 points of the decomposed results are taken as the effective input window of the model to predict the load values over the next 24 points.
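The window construction described above can be sketched as follows. The constants 168, 72, and 24 come from the text; the `naive_decompose` helper is only a simple moving-average stand-in so that the example is self-contained, and it is not the STL algorithm. In practice an STL implementation would be passed in its place.

```python
import numpy as np

DECOMP_WINDOW = 168   # historical points used for decomposition
INPUT_WINDOW = 72     # most recent decomposed points fed to the model
HORIZON = 24          # forecast points


def naive_decompose(y: np.ndarray, period: int = 24):
    """Stand-in additive decomposition (NOT STL): moving-average trend + mean daily profile."""
    pad = (period // 2, period - 1 - period // 2)
    trend = np.convolve(np.pad(y, pad, mode="edge"), np.ones(period) / period, mode="valid")
    profile = (y - trend).reshape(-1, period).mean(axis=0)
    seasonal = np.tile(profile, len(y) // period)
    return trend, seasonal, y - trend - seasonal


def build_sample(load: np.ndarray, t: int, decompose=naive_decompose):
    """Build one sample at forecasting origin t (index of the last observed hour)."""
    if t + 1 < DECOMP_WINDOW or t + 1 + HORIZON > len(load):
        raise ValueError("not enough history or future data around origin t")
    history = load[t - DECOMP_WINDOW + 1 : t + 1]       # only data up to and including t
    components = decompose(history)
    inputs = np.stack([comp[-INPUT_WINDOW:] for comp in components])   # (3, 72)
    target = load[t + 1 : t + 1 + HORIZON]                             # next 24 points
    return inputs, target


# Synthetic hourly series for illustration only.
hours = np.arange(400)
load = 100.0 + 10.0 * np.sin(2 * np.pi * hours / 24) + 0.01 * hours
inputs, target = build_sample(load, t=200)
```

Because the decomposition sees only `load[:t + 1]`, changing any value after the origin leaves the inputs untouched, which is the property the rolling causal strategy is meant to guarantee.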

According to the differences in component characteristics, the trend and seasonal components are assigned to the linear channel, while the residual component is assigned to the nonlinear channel. The trend and seasonal components mainly reflect the low-frequency long-term variations and fixed periodic patterns in the load series and, thus, exhibit relatively strong regularity. Nonlinear peaks, high-frequency disturbances, and local deviations caused by extreme temperature variations or abrupt electricity consumption behavior are mainly reflected in the residual component, and are modeled by the CNN-BiLSTM and the deep reinforcement learning optimization mechanism in the nonlinear residual channel. Therefore, this paper adopts a dual-channel collaborative forecasting strategy in which the linear channel characterizes low-frequency periodic variations, while the nonlinear channel models high-frequency stochastic fluctuations.

4.4.2. Linear Channel

The trend and seasonal components contain the low-frequency long-term trend and fixed periodic patterns in the load data. Such signals are usually stable and highly regular, and overly complex deep networks may lead to overfitting. Therefore, this paper constructs a linear regression channel that takes forecasted future meteorological information as explanatory variables and maps it directly to the future sum of the trend and seasonal components through a multi-output linear regression model.

{\hat{Y}}_{t}^{lin} = W_{lin} \cdot X_{t}^{lin} + b_{lin}

(21)

where {\hat{Y}}_{t}^{lin} represents the predicted value of the linear component, W_{lin} denotes the linear weight matrix, X_{t}^{lin} signifies the meteorological features at future time points, and b_{lin} indicates the bias term.
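A multi-output linear regression of the form of Equation (21) can be fitted in closed form. The sketch below assumes an ordinary least-squares fit, which is one possible estimator and is not stated explicitly above; the feature dimension and the synthetic data are placeholders.

```python
import numpy as np


def fit_linear_channel(X: np.ndarray, Y: np.ndarray):
    """Least-squares fit of Y ≈ X W^T + b. X: (samples, features); Y: (samples, 24)."""
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])    # extra column for the bias
    coef, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)
    W, b = coef[:-1].T, coef[-1]
    return W, b


def predict_linear_channel(W: np.ndarray, b: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Equation (21): predicted trend + seasonal sum for the next 24 points."""
    return W @ x + b


# Synthetic, noise-free data so that the fit recovers the generating parameters.
rng = np.random.default_rng(0)
n_samples, n_features, horizon = 200, 8, 24
W_true = rng.normal(size=(horizon, n_features))
b_true = rng.normal(size=horizon)
X = rng.normal(size=(n_samples, n_features))
Y = X @ W_true.T + b_true

W_lin, b_lin = fit_linear_channel(X, Y)
y_lin = predict_linear_channel(W_lin, b_lin, X[0])
```

All 24 outputs share the same explanatory variables, so a single least-squares solve yields the full weight matrix at once.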

4.4.3. Deep Reinforcement Learning Modeling of the Nonlinear Channel

The residual component contains strong high-frequency nonlinear fluctuations and stochastic disturbances. When CNN-BiLSTM is directly trained using a conventional supervised loss, the model mainly learns a static mapping relationship from historical inputs to future residual sequences. For the day-ahead 24 h forecasting task, forecasting errors are reflected not only in point-wise errors at individual time steps, but also in the overall shape deviation of the residual sequence over the entire forecasting window. Therefore, this study formulates the residual forecasting task as a window-level policy optimization problem in a continuous action space, and adopts a deep deterministic policy gradient framework to optimize the CNN-BiLSTM residual channel.

The reinforcement learning formulation in this study does not recursively generate future load values hour by hour, nor does it emphasize decomposing the 24 forecasting points into a step-by-step control process. Instead, the future 24 h residual forecasting vector is regarded as a continuous action as a whole, and the residual forecasting error over the entire forecasting window is used as the reward feedback. Accordingly, the Actor network directly outputs the future 24 h residual forecasting action according to the current historical state, while the Critic network evaluates the overall forecasting quality of the corresponding state–action pair. This design aims to optimize the high-frequency residual component from the perspective of the entire forecasting window, rather than replacing the local feature extraction and temporal dependency modeling functions of CNN-BiLSTM itself.

The MDP can be represented as a five-tuple:

M = \langle S, A, P, R, γ \rangle

(22)

where S denotes the state space, A denotes the action space, P denotes the state transition probability, R denotes the reward function, and γ denotes the discount factor.

For any forecasting sample, one episode in the residual channel is defined as follows: given the historical residual series over the most recent 72 time steps before time t and the corresponding auxiliary features, the agent outputs the residual forecasts for the next 24 time steps and receives reward feedback according to the forecasting error. In other words, one sample corresponds to one episode, and the action corresponds to the continuous prediction of the residual sequence over the next 24 h.

Specifically, the state s_{t} is composed of the historical residual series, the historical load series, and the date and meteorological features related to forecasting. First, a 1D-CNN is used to extract local high-dimensional features from the historical residual series. These features are then fed into the BiLSTM to learn temporal dependencies, followed by a temporal attention mechanism that assigns weights to key time steps. Finally, the current state representation is obtained.

s_{t} = f_{Att} (f_{BiLSTM} (f_{CNN} (X_{t}^{res})))

(23)

where X_{t}^{res} denotes the input features of the residual channel, and f_{CNN} (\cdot), f_{BiLSTM} (\cdot), and f_{Att} (\cdot) denote convolutional feature extraction, bidirectional temporal modeling, and attention aggregation, respectively.

The action $a_t$ is defined as the residual forecasting vector for the next 24 time steps output by the agent under the current state. The 24 h residual forecasting sequence is treated as a single continuous action, so the multi-step forecasting window is modeled jointly. The reward function is calculated based on the mean squared error between the predicted and actual residuals over the entire forecasting window; therefore, the update of the Actor network is constrained by window-level error feedback. Compared with conventional multi-output supervised regression, the Actor-Critic structure further introduces state–action value evaluation, enabling the residual channel to adjust its output strategy according to the overall forecasting quality of the prediction window during training. It should be noted that this mechanism does not imply that reinforcement learning is superior to supervised learning for all multi-step load forecasting tasks; its effectiveness should be verified together with the ablation experiments in this study.

$$a_t = \hat{Y}_t^{res} \tag{24}$$

In this paper, the CNN-BiLSTM network serves as the Actor and outputs deterministic continuous actions according to the current state, namely the residual forecasting results for the next 24 h. The Critic network is used to evaluate the value function of the state–action pair, $Q(s_t, a_t)$.
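For orientation, the textbook DDPG update rules that an Actor-Critic setup of this kind typically relies on can be written as below. This is a sketch of the standard formulation, not a statement of the authors' exact implementation. The symbols $\theta^{Q}$ and $\theta^{\mu}$ (Critic and Actor parameters), the primed target networks, the mini-batch size $B$, and the next state $s_{i+1}$ are notation assumed here for illustration; $\gamma$ and $\tau$ are the discount factor and the soft update coefficient.

```latex
% Critic target and loss over a mini-batch of size B
y_i = r_i + \gamma \, Q'\!\left(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\right)

L(\theta^{Q}) = \frac{1}{B} \sum_{i=1}^{B} \left( y_i - Q(s_i, a_i \mid \theta^{Q}) \right)^2

% Deterministic policy gradient for the Actor
\nabla_{\theta^{\mu}} J \approx \frac{1}{B} \sum_{i=1}^{B}
  \nabla_{a} Q(s, a \mid \theta^{Q}) \big|_{s = s_i,\, a = \mu(s_i)} \,
  \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu}) \big|_{s = s_i}

% Soft update of both target networks
\theta' \leftarrow \tau \, \theta + (1 - \tau) \, \theta'
```

If each sample is handled as a one-step episode that terminates after a single action, the bootstrap term is dropped at the terminal step and the target $y_i$ reduces to the immediate reward $r_i$.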

The reward function is used to evaluate the forecasting quality corresponding to the current action. To encourage the agent to output residual forecasts with smaller errors, this paper defines the immediate reward as the negative mean squared error between the predicted residuals and the actual residuals.

$$r_t = -\alpha \cdot \frac{1}{H} \sum_{i=1}^{H} \left( \hat{y}_{t+i}^{res} - y_{t+i}^{res} \right)^2 \tag{25}$$

where $\alpha$ denotes the reward scaling factor; $\hat{y}_{t+i}^{res}$ and $y_{t+i}^{res}$ denote the predicted residual value and the actual residual value at time $t+i$, respectively; and $H$ denotes the output window length.
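The reward in Equation (25) can be sketched in a few lines of Python. The function name `window_reward` and its argument names are illustrative assumptions, and the inputs are assumed to be normalized residuals of equal length.

```python
def window_reward(pred_residuals, true_residuals, alpha=1.0):
    """Negative scaled mean squared error over the forecasting window, as in Eq. (25)."""
    if len(pred_residuals) != len(true_residuals):
        raise ValueError("prediction and target windows must have the same length")
    horizon = len(true_residuals)  # H in Eq. (25)
    if horizon == 0:
        raise ValueError("forecasting window must not be empty")
    mse = sum((p - y) ** 2 for p, y in zip(pred_residuals, true_residuals)) / horizon
    return -alpha * mse


# A perfect forecast yields the maximum reward of zero.
perfect = window_reward([0.1, -0.2, 0.3], [0.1, -0.2, 0.3])
# A constant error of 1.0 with alpha = 0.5 yields a reward of -0.5.
offset = window_reward([1.0, 1.0], [0.0, 0.0], alpha=0.5)
```

Because the reward is never positive, values closer to zero correspond to smaller window-level errors.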

Since the action space in this study is the future 24 h residual forecasting vector, directly performing policy search in the original scale may lead to excessively large action magnitudes and unstable training. To reduce the optimization difficulty of the continuous action space, the residual targets and input features are first normalized so that the Actor output remains within a stable numerical range. Second, an experience replay buffer is used to store historical samples, and mini-batches are randomly sampled to update the Actor and Critic networks, thereby reducing the temporal correlation among adjacent samples. Third, a target-network soft update mechanism is adopted, allowing the target Actor and target Critic to slowly track the parameters of the current networks, thus mitigating oscillations in value-function estimation. Finally, a reward scaling coefficient is introduced to control the numerical range of the reward and avoid excessive gradient fluctuations during Critic training. These settings are used to improve the stability of the DDPG training process and are combined with validation-error monitoring for model selection.
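Two of the stabilization measures described above, experience replay and target-network soft updates, can be illustrated with a minimal standard-library sketch. The class `ReplayBuffer`, the function `soft_update`, and the representation of network parameters as flat lists of floats are simplifying assumptions made here; the default capacity, batch size, and coefficient mirror the values listed in Table 2.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward) samples; oldest entries are evicted."""

    def __init__(self, capacity=50_000):
        self.storage = deque(maxlen=capacity)

    def push(self, state, action, reward):
        self.storage.append((state, action, reward))

    def sample(self, batch_size=64, rng=random):
        # Uniform random sampling breaks the temporal correlation of adjacent samples.
        k = min(batch_size, len(self.storage))
        return rng.sample(list(self.storage), k)

    def __len__(self):
        return len(self.storage)


def soft_update(target_params, online_params, tau=0.005):
    """Move each target parameter a small step toward the online parameter."""
    return [tau * w + (1.0 - tau) * w_t for w_t, w in zip(target_params, online_params)]


buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.push(state=(step,), action=(0.0,), reward=-float(step))
batch = buffer.sample(batch_size=2, rng=random.Random(0))
new_target = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.005)
```

With a small coefficient such as 0.005, the target parameters change by only 0.5% of the gap per update, which is what damps oscillations in the value estimate.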

4.4.4. Result Reconstruction

During the inference stage, the linear channel outputs the predicted low-frequency components for the next 24 time steps, while the residual channel outputs the predicted high-frequency residual components for the next 24 time steps. The final load forecasting results are obtained through dual-channel reconstruction.

$$\hat{Y}_t = \hat{Y}_t^{lin} + \hat{Y}_t^{res} \tag{26}$$

where $\hat{Y}_t$ denotes the load forecasting values for the next 24 time steps.

5. Case Study

This study uses the public load data from the Electric Reliability Council of Texas (ERCOT) as the forecasting target. The data cover the period from 1 January 2019 to 31 December 2023, with an hourly temporal resolution. The public ERCOT load data include hourly load information for eight weather zones, namely Coast, East, Far West, North, North Central, South, South Central, and West, as well as the ERCOT system-level total load. In this study, the ERCOT-wide total load is used as the forecasting target; that is, the system-level hourly total load sequence over the entire ERCOT operating area is taken as the model output.

The meteorological data are obtained from the NASA POWER Hourly API, with the same time span and hourly resolution as the load data. In this study, NASA POWER meteorological data are extracted using a regional bounding box covering the main ERCOT operating area, which is located within 25.8° N–36.5° N and 103.5° W–93.5° W. For multiple NASA POWER meteorological grid points within the bounding box, the same meteorological variable is spatially averaged at each hourly timestamp to obtain system-level meteorological input series.
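The spatial averaging step can be sketched as follows. This is a minimal illustration that assumes the grid-point series have already been downloaded and aligned to the same hourly timestamps; the function name `spatial_mean` and the sample values are hypothetical, and no NASA POWER API call is shown.

```python
def spatial_mean(grid_series):
    """Average one meteorological variable across grid points at each hourly timestamp.

    grid_series: list of per-grid-point lists, all aligned to the same timestamps.
    """
    if not grid_series:
        raise ValueError("at least one grid point is required")
    n_hours = len(grid_series[0])
    if any(len(series) != n_hours for series in grid_series):
        raise ValueError("all grid-point series must cover the same timestamps")
    n_points = len(grid_series)
    return [sum(series[h] for series in grid_series) / n_points for h in range(n_hours)]


# Two hypothetical grid points, two hourly temperature readings each (°C).
system_temperature = spatial_mean([[30.0, 32.0], [34.0, 36.0]])
```

An unweighted mean treats every grid point equally; weighting by load or population would be a different design choice that the text does not describe.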

In terms of temporal synchronization, the ERCOT load data and NASA POWER meteorological data are unified at an hourly time scale and matched according to timestamps. Calendar variables are constructed from the timestamps. The weekend variable is encoded as a binary variable, with weekends set to 1 and non-weekends set to 0; the holiday variable is also encoded as a binary variable, with holidays set to 1 and non-holidays set to 0. For the hour variable, sine–cosine cyclic encoding is adopted considering its intraday periodicity, thereby avoiding an artificial distance discontinuity between 23:00 and 00:00.
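The calendar encoding described above can be sketched with the standard library. The function names and the `holidays` argument are illustrative assumptions; the source does not specify which holiday calendar is used, so the set passed here is hypothetical.

```python
import math
from datetime import date, datetime


def calendar_features(ts, holidays=frozenset()):
    """Binary weekend/holiday flags plus sine-cosine encoding of the hour of day."""
    angle = 2.0 * math.pi * ts.hour / 24.0
    return {
        "hour_sin": math.sin(angle),
        "hour_cos": math.cos(angle),
        "is_weekend": 1 if ts.weekday() >= 5 else 0,  # Saturday = 5, Sunday = 6
        "is_holiday": 1 if ts.date() in holidays else 0,
    }


def cyclic_distance(f1, f2):
    """Euclidean distance between two hours in the (sin, cos) plane."""
    return math.hypot(f1["hour_sin"] - f2["hour_sin"], f1["hour_cos"] - f2["hour_cos"])


late = calendar_features(datetime(2023, 7, 1, 23))      # Saturday, 23:00
midnight = calendar_features(datetime(2023, 7, 2, 0))   # Sunday, 00:00
one_am = calendar_features(datetime(2023, 7, 2, 1))     # Sunday, 01:00
july4 = calendar_features(datetime(2023, 7, 4, 12), holidays={date(2023, 7, 4)})
```

In the encoded space, 23:00 and 00:00 are exactly as close as 00:00 and 01:00, whereas the raw hour values 23 and 0 differ by 23.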

Through the above processing, a regional meteorology–load joint dataset matched to the ERCOT system-level load forecasting task is constructed.

5.1. Experimental Preparation

5.1.1. Data Preprocessing

This study conducted a detailed examination and processing of anomalous data in the dataset. First, a comprehensive completeness check was performed on all feature columns. By traversing the entire time series, it was confirmed that the dataset used in this study contained no missing values and that the records were complete and continuous. Therefore, no interpolation or imputation was required. This provides a high-quality data basis for the subsequent analysis.

To eliminate the negative effects caused by differences in the scales of different features and to improve the training efficiency and performance of the model, all input features and output targets were normalized before being fed into the model. In this study, min–max normalization was adopted. This method linearly transforms the original data and maps all values into the interval [0,1]. Its mathematical formula is as follows:

$$x_{norm} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{27}$$

where $x$ is the original data point, $x_{\min}$ and $x_{\max}$ are the minimum and maximum values in the feature sequence, respectively, and $x_{norm}$ is the normalized value.

After the model completes its prediction, the same set of parameters obtained from the training set is used to denormalize the model’s outputs, restoring them to their original numerical scale. This enables comparison with actual load values and calculation of errors. The inverse normalization formula is:

$$x = x_{norm} \cdot (x_{\max} - x_{\min}) + x_{\min} \tag{28}$$

Because the normalization parameters are obtained from the training set, all features are brought to a comparable scale without exposing test-set statistics to the model. This keeps the test set unknown to the model and supports a reliable final performance evaluation.
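Equations (27) and (28) can be illustrated with a small scaler that is fitted on training values only. The class name `MinMaxScaler1D` and the sample load values are assumptions for illustration.

```python
class MinMaxScaler1D:
    """Min-max scaler for one feature, with parameters taken from the training set only."""

    def fit(self, train_values):
        self.x_min = min(train_values)
        self.x_max = max(train_values)
        if self.x_max == self.x_min:
            raise ValueError("constant feature: min-max scaling is undefined")
        return self

    def transform(self, values):
        span = self.x_max - self.x_min
        return [(v - self.x_min) / span for v in values]          # Eq. (27)

    def inverse_transform(self, scaled_values):
        span = self.x_max - self.x_min
        return [s * span + self.x_min for s in scaled_values]     # Eq. (28)


# Hypothetical load values in MW.
scaler = MinMaxScaler1D().fit([10000.0, 30000.0, 50000.0])
scaled_test = scaler.transform([30000.0, 60000.0])
restored = scaler.inverse_transform(scaled_test)
```

Note that a test value above the training maximum maps outside [0, 1] (here 60,000 MW maps to 1.25), which is expected when the parameters are not refitted on test data.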

5.1.2. Dataset Division and Training Settings

This paper selects the last 30 days of the complete time series, corresponding to 720 hourly points, as the test period. Forecasting samples are constructed using a sliding-window strategy, and all samples whose forecasting start points fall within the test period are used for testing. Except for the test set, the remaining samples are divided into the training set and validation set in a 9:1 ratio according to chronological order. This division strategy avoids future information leakage and keeps the test set unknown throughout the entire modeling process, thereby providing a more objective evaluation of the model’s generalization capability.
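The division rule can be sketched over an ordered list of forecasting origins. The function name `chronological_split` and the total of 10,000 origins are illustrative assumptions; only the 720-point test period and the 9:1 ratio come from the text.

```python
def chronological_split(n_origins, test_len=720, train_parts=9, total_parts=10):
    """Split time-ordered forecasting origins into train, validation, and test index ranges."""
    if n_origins <= test_len:
        raise ValueError("series is too short for the requested test period")
    n_dev = n_origins - test_len                      # origins before the test period
    n_train = n_dev * train_parts // total_parts      # integer arithmetic avoids float rounding
    train = range(0, n_train)
    val = range(n_train, n_dev)
    test = range(n_dev, n_origins)
    return train, val, test


train_idx, val_idx, test_idx = chronological_split(10_000)
```

No shuffling is involved, so every training origin precedes every validation origin, which in turn precedes every test origin.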

To ensure comparability among different models, all comparison models adopt the same data division strategy, input window length $L = 72$, forecasting horizon $H = 24$, normalization strategy, and random seed setting. In the linear channel, a multi-output linear regression model is used to predict the trend and seasonal components for the next 24 h. In the nonlinear channel, the residual component is optimized and trained by the CNN-BiLSTM network within the deep reinforcement learning framework. For the DRL part, an experience replay buffer and a target network soft update mechanism are adopted to reduce temporal correlations among samples and improve training stability.

To improve the reproducibility of deep reinforcement learning training, specific training parameters are shown in Table 2.

5.1.3. Sample Construction and Rolling STL Decomposition

This study focuses on day-ahead 24 h short-term load forecasting and constructs samples using a sliding-window strategy. For any forecasting origin $t$, the 168 historical hourly load points within the interval $[t - 167, t]$ are first selected as the STL decomposition window to extract the trend, seasonal, and residual components. Then, the most recent 72 time steps of the decomposed component sequences are used as the effective model input to forecast the 24 hourly load values over $[t + 1, t + 24]$.

In this process, STL decomposition, feature construction, and model input generation are all strictly based on historical information up to and including the forecasting origin. The actual load values over the future 24 h are used only as training labels or testing evaluation values, and are not involved in STL decomposition or input construction. Therefore, the sample construction strategy used in this study is consistent with rolling day-ahead forecasting scenarios and avoids the look-ahead information leakage caused by offline full-sequence decomposition.

Using a 168-point historical window covers one week of hourly load variations, which helps extract daily and weekly periodic information. The most recent 72 points are then selected as the effective input to reduce the influence of smoothing errors at both ends of the window on model inputs, while also controlling the input length and training complexity of the deep learning model. The training, validation, and test sets all adopt the same rolling STL decomposition and sample construction procedure to ensure consistency in data processing across different stages.
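The window arithmetic of the rolling procedure can be sketched as below. The function `build_sample` follows the index ranges given in the text, while `naive_decompose` is only a placeholder additive decomposition and is not STL; an actual pipeline would substitute an STL implementation (for example the `STL` class in statsmodels). All function names and the synthetic load series are assumptions for illustration.

```python
STL_WINDOW, INPUT_LEN, HORIZON = 168, 72, 24


def naive_decompose(history, period=24):
    """Placeholder additive decomposition (NOT STL): trailing mean + per-phase mean + remainder."""
    n = len(history)
    trend = []
    for i in range(n):
        window = history[max(0, i - period + 1): i + 1]
        trend.append(sum(window) / len(window))
    detrended = [x - m for x, m in zip(history, trend)]
    phase_mean = [sum(detrended[p::period]) / len(detrended[p::period]) for p in range(period)]
    seasonal = [phase_mean[i % period] for i in range(n)]
    resid = [d - s for d, s in zip(detrended, seasonal)]
    return trend, seasonal, resid


def build_sample(load, t, decompose=naive_decompose):
    """Build one sample at forecasting origin t using only load[t-167 .. t] as input."""
    if t - (STL_WINDOW - 1) < 0 or t + HORIZON >= len(load):
        raise IndexError("origin t does not leave room for the history or target window")
    history = load[t - STL_WINDOW + 1: t + 1]          # [t - 167, t]
    trend, seasonal, resid = decompose(history)
    inputs = {
        "trend": trend[-INPUT_LEN:],                   # most recent 72 points
        "seasonal": seasonal[-INPUT_LEN:],
        "resid": resid[-INPUT_LEN:],
    }
    target = load[t + 1: t + HORIZON + 1]              # [t + 1, t + 24], labels only
    return inputs, target


# Synthetic hourly series with a daily cycle and a slow upward drift.
load = [float(i % 24) + 0.01 * i for i in range(400)]
inputs, target = build_sample(load, t=200)
```

Since the decomposition is called separately for each origin, the target values never enter the decomposition window, which is the property the rolling procedure is designed to guarantee.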

5.1.4. Experimental Error Evaluation Criteria

To effectively evaluate forecasting performance, three evaluation metrics are introduced: root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE), as shown in the following equations:

$$RMSE = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left( y_t - \hat{y}_t \right)^2} \tag{29}$$

$$MAE = \frac{1}{N} \sum_{t=1}^{N} \left| y_t - \hat{y}_t \right| \tag{30}$$

$$MAPE = \frac{100\%}{N} \sum_{t=1}^{N} \left| \frac{y_t - \hat{y}_t}{y_t} \right| \tag{31}$$

where $y_t$ represents the actual load value at time $t$, $\hat{y}_t$ denotes the model’s predicted load value for that time point, and $N$ indicates the total number of samples in the test set.

Among these metrics, RMSE is more sensitive to large forecasting errors and can more effectively reflect the model’s ability to suppress large deviations. MAE is used to measure the overall absolute error level, while MAPE characterizes model accuracy from the perspective of relative error. Considering the engineering context in which electricity load forecasting is sensitive to large errors, this paper uses RMSE as the primary evaluation metric, with MAE and MAPE used as supplementary metrics for comprehensive analysis.
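The three metrics in Equations (29) to (31) can be computed as in the following sketch. The function names and the two-point example values are illustrative assumptions; MAPE assumes that no actual load value is zero.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error, Eq. (29)."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error, Eq. (30)."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent, Eq. (31); requires non-zero actual values."""
    return 100.0 / len(y_true) * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred))


# Hypothetical actual and predicted values in MW.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
scores = (rmse(actual, predicted), mae(actual, predicted), mape(actual, predicted))
```

In this example the RMSE (about 15.8) exceeds the MAE (15.0) because squaring gives the larger 20 MW error more weight, which is the sensitivity to large deviations noted above.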

5.2. Model Comparison

To comprehensively evaluate the performance of the proposed hybrid model and the comparison models, this section conducts a series of rigorous and reproducible comparative experiments on the preprocessed dataset. First, the STL decomposition results are presented. Then, model comparison experiments are carried out to analyze the forecasting errors of different models, thereby demonstrating the favorable forecasting performance of the proposed model.

During STL decomposition, the previous 168 data points are used for decomposition, and only the most recent 72 decomposed points are taken as the model input. Figure 5 shows the STL decomposition results for the last 30 days of the training set, corresponding to 720 data points. In the figure, y denotes the actual load curve, low denotes the low-frequency load trend composed of the trend and seasonal components, and resid denotes the residual component.

LSTM, BiLSTM, GRU, CNN-BiLSTM, and TCN are selected as baseline models. All models are trained and evaluated on the same training, validation, and test sets. Table 3 and Figure 6 present the performance of all models on the test set, covering three core evaluation metrics: RMSE, MAE, and MAPE.

According to the results, among all baseline models, single deep learning models such as LSTM, BiLSTM, and GRU are limited by their fitting bottlenecks for non-stationary sequences, resulting in generally high forecasting errors, with RMSE values all exceeding 2300 MW. Although TCN and CNN-BiLSTM enhance local feature extraction capability by introducing convolutional structures, they only slightly reduce the forecasting errors. In contrast, the proposed model achieves the best performance across all evaluation metrics, reducing the RMSE to 976.4 MW and achieving a MAPE of 1.81%, which indicates a significant improvement in forecasting accuracy.

To gain a deeper understanding of the forecasting behavior of the proposed model, comparative plots between the predicted load curves and the actual load curves on the test set are provided. Figure 7 presents the model comparison curves for a natural day, Figure 8 shows the model comparison curves over the entire test set, and Figure 9 illustrates the comparison between the forecasting results of the proposed model and the actual load curve over the entire test set.

Overall, the proposed model demonstrates strong load curve tracking capability across different time scales. In Figure 7, which shows the 24 h forecasting results for a natural day, the forecasting curve of the proposed model is closer to the actual load curve than those of LSTM, GRU, BiLSTM, TCN, and CNN-BiLSTM. In particular, during periods of intraday load rise, decline, and peak–valley transitions, it better preserves the variation trend of the curve, indicating that the dual-channel structure helps capture both low-frequency trends and local fluctuations. Figure 8 shows that, over the entire test set, the baseline models exhibit obvious deviations or lagging behavior during some periods, whereas the proposed model maintains higher overall consistency with the actual curve, reflecting the effectiveness of separately modeling the low-frequency components and the high-frequency residual component after STL decomposition. Figure 9 isolates the fitting performance of the proposed model on the test set. The forecasting curve follows the actual load variations relatively steadily without obvious long-term drift, indicating that the CNN-BiLSTM residual channel optimized by deep reinforcement learning can, to some extent, enhance the model’s ability to respond to high-frequency stochastic disturbances. Taken together, the figures show that the proposed model not only achieves lower error metrics but also tracks load variation trends and local fluctuations more closely than the baseline models.

To further avoid comparison bias caused by different preprocessing strategies, this study additionally constructs fair comparison experiments for LSTM, GRU, and BiLSTM under the same STL decomposition preprocessing conditions, as shown in Table 4.

It can be observed that STL-LSTM, STL-GRU, and STL-BiLSTM achieve lower errors than the baseline models that directly forecast the original load sequence, indicating that STL decomposition can effectively alleviate the modeling difficulty caused by load non-stationarity. However, compared with these baseline models that also adopt STL decomposition, the proposed model still obtains lower forecasting errors. This indicates that the performance improvement is not only derived from decomposition preprocessing, but also from the collaborative modeling of the low-frequency linear channel and the nonlinear residual channel, as well as the further optimization of the residual channel by deep reinforcement learning.

5.3. Multi-Season Testing and Meteorological Perturbation Analysis

In Section 5.2, the overall forecasting performance of the proposed model was verified on the original test set. Considering that the original test set cannot fully reflect the model’s adaptability under summer high-temperature and winter low-temperature conditions, this section further constructs multi-season test sets and conducts analyses from two aspects: multi-season testing and meteorological input errors. First, the training set, validation set, and summer and winter test windows are redefined. Second, the forecasting performance of the model in summer and winter is evaluated. Finally, perturbations are applied to the apparent temperature input variable to analyze the influence of temperature errors on forecasting accuracy.

5.3.1. Multi-Season Testing

Under the premise that the training, validation, and test sets are strictly divided in chronological order, the period from 1 January 2021 to 31 October 2022 is selected as the training set, 1 November 2022 to 31 December 2022 as the validation set, 1 July 2023 to 31 July 2023 as the summer test set, and 1 December 2023 to 31 December 2023 as the winter test set.

To ensure that the test periods are not involved in model training or parameter selection, continuous test windows in summer and winter are selected for evaluation, respectively, to verify the generalization capability of the proposed model under different seasonal load scenarios. Table 5 presents the forecasting results on the summer and winter test sets.

It can be seen that the proposed model maintains good forecasting performance in both summer high-temperature scenarios and winter low-temperature scenarios, indicating that it can adapt to load fluctuations caused by meteorological variations across different seasons. Particularly under relatively extreme temperature conditions, the dual-channel structure can separately characterize low-frequency trend variations and high-frequency residual disturbances, thereby improving the modeling capability for seasonal load variations.

5.3.2. Meteorological Perturbation Analysis

Perturbations are applied to the apparent temperature variable at all time steps in the summer and winter test sets. At each time step, a random perturbation of either +1 °C or −1 °C is added to the apparent temperature input to approximate meteorological forecast uncertainty. Table 6 presents the forecasting results under the temperature perturbation scenarios.
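The perturbation scheme can be sketched as follows. The function name `perturb_temperature` is hypothetical, and two details are assumptions made here because the text does not state them: each sign is drawn with equal probability, and a fixed seed is used for reproducibility.

```python
import random


def perturb_temperature(temps, magnitude=1.0, seed=0):
    """Add +magnitude or -magnitude (in °C) to every time step, chosen at random."""
    rng = random.Random(seed)  # seeded generator so the perturbed test set is repeatable
    return [t + rng.choice((-magnitude, magnitude)) for t in temps]


# Hypothetical apparent temperature readings in °C.
clean = [30.0, 31.5, 29.0, 33.0]
noisy = perturb_temperature(clean, magnitude=1.0, seed=42)
```

Feeding `noisy` instead of `clean` to a trained model and comparing the resulting error metrics gives the kind of sensitivity comparison reported for the perturbation scenarios.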

As shown in Table 6, meteorological forecasting errors affect the short-term load forecasting results. However, the increase in forecasting error is relatively limited, indicating that the proposed model has a certain degree of robustness to meteorological forecasting errors.

5.4. Ablation Experiment

To further investigate the internal mechanism of the model and quantitatively evaluate the contribution of its key components, a series of rigorous ablation experiments are designed in this section. Taking the proposed model as the benchmark, Group 1 denotes the model without the deep reinforcement learning module, Group 2 denotes the model without the attention module, Group 3 denotes the model without the CNN module, and Group 4 denotes the model without STL decomposition and the linear channel. To ensure fair comparison, the experimental settings are kept consistent with those of the main experiment. The results are shown in Table 7 and Figure 10.

By comparing Group 1 with the proposed model, the contribution of the deep reinforcement learning optimization mechanism to the residual channel can be evaluated. Group 1 retains STL decomposition, the dual-channel structure, and the CNN-BiLSTM residual forecasting network, but removes the Actor-Critic policy optimization and trains the residual channel only using conventional supervised learning. The experimental results show that, after removing the deep reinforcement learning optimization, the RMSE increases from 976.4 MW to 1527.3 MW, while the MAE and MAPE increase to 1094.0 MW and 2.42%, respectively. This result indicates that, under the current dataset and experimental settings, reward-feedback-based window-level policy optimization can further improve the forecasting performance of the high-frequency residual component. It also suggests that, within the STL decomposition-based dual-channel forecasting framework constructed in this study, deep reinforcement learning optimization has a positive effect on reducing the prediction error of the residual channel.

By comparing Group 2 with the proposed model, the effectiveness of the attention mechanism can be verified. The results show that the attention mechanism improves forecasting accuracy by assigning higher importance to key temporal information.

By comparing Group 3 with the proposed model, it can be observed that Group 3 obtains an RMSE of 1860.8 MW and a MAPE of 3.12%. This indicates that the residual component contains substantial high-frequency abrupt features, which are difficult for the BiLSTM alone to capture accurately. In contrast, the convolution kernels in the CNN can effectively extract local fluctuation features.

By comparing Group 4 with the proposed model, the results show that decomposing the non-stationary load series into different components with clear characteristics and then forecasting them separately can significantly reduce the forecasting difficulty, thereby verifying the effectiveness of the load decomposition–reconstruction strategy.

To further illustrate the training stability of the residual channel, the variations in the average reward and validation RMSE during DDPG optimization are plotted in Figure 11. Since the reward function is defined as the negative mean squared error of residual forecasting, a reward value closer to zero indicates better forecasting performance. It can be observed that, as the training epochs increase, the validation RMSE generally decreases and gradually stabilizes in the later stage, while the average reward also shows a progressive improvement trend. This indicates that, with input normalization, experience replay, target-network soft updates, and reward scaling, the residual channel training process does not exhibit obvious divergence and maintains good stability.

6. Conclusions

This paper investigates the challenges posed by load non-stationarity to the accuracy of short-term electricity load forecasting and proposes a dual-channel hybrid forecasting model based on STL decomposition and CNN-BiLSTM optimized by deep reinforcement learning. Through empirical analysis using ERCOT system-level load data, comparative evaluation with multiple models, multi-season testing, meteorological perturbation analysis, and ablation experiments, the following conclusions are drawn:

(1) The experimental results show that the proposed model achieves an RMSE of 976.4 MW and a MAPE of 1.81%. Compared with standalone deep learning models, the proposed method introduces a linear channel to process low-frequency components and uses a CNN-BiLSTM framework optimized by deep reinforcement learning to predict high-frequency nonlinear components, thereby effectively improving forecasting performance.

(2) This study verifies the effectiveness of the STL decomposition strategy and the dual-channel architecture. Decomposing complex non-stationary load series into independent components with clear physical meanings significantly reduces the optimization difficulty of subsequent neural networks and enables the model to capture intrinsic dynamic patterns more efficiently.

(3) Multi-season testing and meteorological perturbation analysis show that the proposed model maintains good forecasting performance on both the summer and winter test sets. When certain perturbations are introduced into the future meteorological inputs, the model error increases to some extent, but remains relatively stable overall. This indicates that the proposed method has a certain degree of adaptability to seasonal load variations and meteorological input errors.

(4) The ablation experiments show that using CNN-BiLSTM as the Actor network and introducing a reward feedback mechanism help improve the forecasting performance of the residual channel. Under the current dataset and experimental settings, the deep reinforcement learning optimization mechanism has a positive effect on reducing forecasting errors. Meanwhile, the training convergence curves show that the validation RMSE of the residual channel gradually stabilizes in the later training stage, indicating that normalization, experience replay, target-network soft updates, and reward scaling contribute to improving training stability.

In summary, this study not only provides a systematically validated high-performance short-term electricity load forecasting model, but also constructs a deep reinforcement learning-based framework for complex time series forecasting tasks.

Author Contributions

Conceptualization, Y.W. and J.Z.; methodology, G.W.; software, R.M.; validation, R.M., T.M. and D.W.; formal analysis, G.W.; investigation, J.Z.; resources, Y.W.; data curation, Y.W.; writing—original draft preparation, D.W.; writing—review and editing, J.L.; visualization, Y.W.; supervision, Y.W.; project administration, Y.W.; funding acquisition, Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Science and Technology Project of State Grid Sichuan Electric Power Company (Project name: Analysis and Short-term load forecasting of Sichuan grid under complex weather events, Project number: 521996250003).

Data Availability Statement

Data is contained within the article.

Acknowledgments

The authors would like to express gratitude to State Grid Sichuan Electric Power Company for its financial support.

Conflicts of Interest

Authors Yi Wang, Jian Zhou and Gang Wu were employed by the company State Grid Sichuan Electric Power Company, Chengdu, China; authors Ruiguang Ma and Tiannan Ma were employed by the company State Grid Sichuan Economic Research Institute, Chengdu, China. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The authors declare that this study received funding from State Grid Sichuan Electric Power Company. The funder was not involved in the study design, collection, analysis, interpretation of data, the writing of this article or the decision to submit it for publication.

References

Ahmad, N.; Ghadi, Y.; Adnan, M.; Ali, M. Load forecasting techniques for power system: Research challenges and survey. IEEE Access 2022, 10, 71054–71090.
Nti, I.K.; Teimeh, M.; Nyarko-Boateng, O.; Adekoya, A.F. Electricity load forecasting: A systematic review. J. Electr. Syst. Inf. Technol. 2020, 7, 13.
Kuster, C.; Rezgui, Y.; Mourshed, M. Electrical load forecasting models: A critical systematic review. Sustain. Cities Soc. 2017, 35, 257–270.
Hagan, M.T.; Behr, S.M. The time series approach to short term load forecasting. IEEE Trans. Power Syst. 2007, 2, 785–791.
Yildiz, B.; Bilbao, J.I.; Sproul, A.B. A review and analysis of regression and machine learning models on commercial building electricity load forecasting. Renew. Sustain. Energy Rev. 2017, 73, 1104–1122.
Rendon-Sanchez, J.F.; de Menezes, L.M. Structural combination of seasonal exponential smoothing forecasts applied to load forecasting. Eur. J. Oper. Res. 2019, 275, 916–924.
Takeda, H.; Tamura, Y.; Sato, S. Using the ensemble Kalman filter for electricity load forecasting and analysis. Energy 2016, 104, 184–198.
Dudek, G. A comprehensive study of random forest for short-term load forecasting. Energies 2022, 15, 7547.
Li, S.; Goel, L.; Wang, P. An ensemble approach for short-term load forecasting by extreme learning machine. Appl. Energy 2016, 170, 22–29.
Ceperic, E.; Ceperic, V.; Baric, A. A strategy for short-term load forecasting by support vector regression machines. IEEE Trans. Power Syst. 2013, 28, 4356–4364.
Zhu, S.; Ma, H.; Chen, L.; Wang, B.; Wang, H.; Li, X.; Gao, W. Short-term load forecasting of an integrated energy system based on STL-CPLE with multitask learning. Prot. Control Mod. Power Syst. 2024, 9, 71–92.
Zhang, L.; Wen, J.; Li, Y.; Chen, J.; Ye, Y.; Fu, Y.; Livingood, W. A review of machine learning in building load prediction. Appl. Energy 2021, 285, 116452.
Rafi, S.H.; Deeba, S.R.; Hossain, E. A short-term load forecasting method using integrated CNN and LSTM network. IEEE Access 2021, 9, 32436–32448.
Kong, W.; Dong, Z.Y.; Jia, Y.; Hill, D.J.; Xu, Y.; Zhang, Y. Short-term residential load forecasting based on LSTM recurrent neural network. IEEE Trans. Smart Grid 2017, 10, 841–851.
Meng, X.; Wang, J.; Li, D.; Zhang, H.; Sun, Q.; Wang, H. ICEEMDAN-and LSTM-Enhanced Hybrid Cosine-Attention iTransformer for Ultra-Short-Term Load Forecasting. Electronics 2025, 14, 4857.
Guo, Y.; Wang, L.; Zhao, J. A Multi-Channel Δ-BiLSTM Framework for Short-Term Bus Load Forecasting Based on VMD and LOWESS. Electronics 2025, 14, 4772.
Guo, Y.; Li, Y.; Qiao, X.; Zhang, Z.; Zhou, W.; Mei, Y.; Lin, J.; Zhou, Y.; Nakanishi, Y. BiLSTM multitask learning-based combined load forecasting considering the loads coupling relationship for multienergy system. IEEE Trans. Smart Grid 2022, 13, 3481–3492.
Zhang, W.; Chen, Q.; Yan, J.; Zhang, S.; Xu, J. A novel asynchronous deep reinforcement learning model with adaptive early forecasting method and reward incentive mechanism for short-term load forecasting. Energy 2021, 236, 121492.
Dong, Y.; Liu, K.; Jiang, H.; Dong, Y.; Wang, J. Power load forecasting using deep learning and reinforcement learning. Inf. Sci. 2025, 720, 122523.
Zhang, H.; Zhang, G.; Zhao, M.; Liu, Y. Load forecasting-based learning system for energy management with battery degradation estimation: A deep reinforcement learning approach. IEEE Trans. Consum. Electron. 2024, 70, 2342–2352.
Muzaffar, S.; Afshari, A. Short-term load forecasts using LSTM networks. Energy Procedia 2019, 158, 2922–2927.
Ullah, K.; Ahsan, M.; Hasanat, S.M.; Haris, M.; Yousaf, H.; Raza, S.F.; Tandon, R.; Abid, S.; Ullah, Z. Short-term load forecasting: A comprehensive review and simulation study with CNN-LSTM hybrids approach. IEEE Access 2024, 12, 111858–111881.
Wu, K.; Wu, J.; Feng, L.; Yang, B.; Liang, R.; Yang, S.; Zhao, R. An attention-based CNN-LSTM-BiLSTM model for short-term electric load forecasting in integrated energy system. Int. Trans. Electr. Energy Syst. 2021, 31, e12637.

Figure 1. Forecasting framework of the proposed model.

Figure 2. Correlation heatmap of factors affecting electricity load.

Figure 3. Basic unit of the LSTM network.

Figure 4. Structure of the attention mechanism.

Figure 5. STL decomposition results.

Figure 6. Comparison of forecasting results for different models.

Figure 7. Model comparison curve on a natural day.

Figure 8. Model comparison curve on the test set.

Figure 9. Fitting performance of the proposed model on the test set.

Figure 10. Column chart of ablation experiment results.

Figure 11. Average reward and validation RMSE during the DDPG optimization process.

Table 1. Summary of impact factors.

Variable Category	Variable Name	Unit/Encoding Method	Data Source	Pearson Correlation Coefficient
Historical load	Load	MW	ERCOT	1.00
Calendar factors	Weekdays	0/1	Timestamp	N/A
Calendar factors	Holidays	0/1	Timestamp	N/A
Calendar factors	Hour	Sine and cosine encoding	Timestamp	N/A
Meteorological factors	Apparent temperature	°C	NASA POWER	0.77
Meteorological factors	Humidity	%	NASA POWER	0.63
Meteorological factors	Wind speed	m/s	NASA POWER	−0.42
Meteorological factors	Solar radiation	W/m²	NASA POWER	0.53
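
Table 1 lists the hour of day as sine and cosine encoded. The following is a minimal sketch of what such an encoding typically looks like, assuming the common form sin(2πh/24), cos(2πh/24). The exact formula is not given in the table, and the names `encode_hour` and `encoded_gap` are hypothetical helpers introduced only for illustration.

```python
import math
from typing import Tuple


def encode_hour(hour: int, period: int = 24) -> Tuple[float, float]:
    """Map an hour of day to a point on the unit circle (assumed 2*pi*h/24 form)."""
    angle = 2.0 * math.pi * (hour % period) / period
    return math.sin(angle), math.cos(angle)


def encoded_gap(h1: int, h2: int) -> float:
    """Euclidean distance between two hours in the encoded (sin, cos) space."""
    s1, c1 = encode_hour(h1)
    s2, c2 = encode_hour(h2)
    return math.hypot(s1 - s2, c1 - c2)


if __name__ == "__main__":
    # Compare the gap across midnight with the gap between 00:00 and 01:00.
    print(encoded_gap(23, 0), encoded_gap(0, 1))
```

With this mapping, 23:00 and 00:00 are as close to each other as 00:00 and 01:00, which a raw integer hour from 0 to 23 would not capture.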

Table 2. Training parameters.

Parameter	Value
Experience replay buffer capacity	50,000
Mini-batch size	64
Discount factor	0.99
Target network soft update coefficient	0.005
Actor learning rate	0.0001
Critic learning rate	0.001
Optimizer	Adam
Maximum number of training rounds	100
Number of CNN convolution kernels	32
Kernel size	3
Number of hidden units in BiLSTM	64
Dropout	0.2
Fully connected layer output dimension	24
Reward scaling factor	1.0
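
To make the target network soft update coefficient in Table 2 concrete, the sketch below applies the standard DDPG-style update, target ← (1 − τ)·target + τ·online, with τ = 0.005. It assumes the authors use this standard form, which the table does not state explicitly. Weights are represented as plain lists of floats, and `soft_update` and `remaining_gap` are hypothetical names.

```python
from typing import List

TAU = 0.005  # target network soft update coefficient from Table 2


def soft_update(target: List[float], online: List[float], tau: float = TAU) -> List[float]:
    """Return target weights moved a fraction tau toward the online weights."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]


def remaining_gap(steps: int, tau: float = TAU) -> float:
    """Fraction of the initial target/online gap left after `steps` updates,
    under the simplifying assumption that the online weights stay fixed."""
    return (1.0 - tau) ** steps


if __name__ == "__main__":
    weights = [0.0]
    for _ in range(100):
        weights = soft_update(weights, [1.0])
    print(weights[0], remaining_gap(100))
```

Under the fixed-online-weights assumption, roughly 60% of the initial gap still remains after 100 updates, which illustrates how slowly the target networks track the online networks at this coefficient.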

Table 3. Comparison of forecasting results of different models.

Model	RMSE (MW)	MAE (MW)	MAPE (%)
LSTM	2970.9	2362.3	5.21
BiLSTM	2385.7	1939.6	4.32
GRU	2473.6	2048.0	4.51
CNN-BiLSTM	1624.1	1277.3	2.92
TCN	1989.6	1671.3	3.67
Proposed Model	976.4	739.7	1.81
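
The figures in Table 3 can be turned into relative error reductions with a few lines of arithmetic. The sketch below copies the table values and computes the percentage RMSE reduction of the proposed model against each baseline. The helper names `relative_reduction` and `rmse_reductions` are hypothetical, and the percentages are derived here from the table rather than quoted from the paper.

```python
from typing import Dict, Tuple

# (RMSE in MW, MAE in MW, MAPE in %) copied from Table 3
TABLE_3: Dict[str, Tuple[float, float, float]] = {
    "LSTM": (2970.9, 2362.3, 5.21),
    "BiLSTM": (2385.7, 1939.6, 4.32),
    "GRU": (2473.6, 2048.0, 4.51),
    "CNN-BiLSTM": (1624.1, 1277.3, 2.92),
    "TCN": (1989.6, 1671.3, 3.67),
    "Proposed Model": (976.4, 739.7, 1.81),
}


def relative_reduction(baseline: float, proposed: float) -> float:
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


def rmse_reductions(
    results: Dict[str, Tuple[float, float, float]],
    proposed_key: str = "Proposed Model",
) -> Dict[str, float]:
    """RMSE reduction (%) of the proposed model relative to every other model."""
    proposed_rmse = results[proposed_key][0]
    return {
        name: relative_reduction(values[0], proposed_rmse)
        for name, values in results.items()
        if name != proposed_key
    }


if __name__ == "__main__":
    for name, pct in rmse_reductions(TABLE_3).items():
        print(f"{name}: {pct:.1f}% lower RMSE")
```

For instance, relative to CNN-BiLSTM, the strongest baseline in Table 3, this gives an RMSE reduction of roughly 40%.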

Table 4. Model comparison under the same STL preprocessing.

Model	RMSE (MW)	MAE (MW)	MAPE (%)
STL-LSTM	2246.2	1818.6	4.11
STL-GRU	2053.6	1724.5	3.90
STL-BiLSTM	1983.7	1662.9	3.65
Proposed Model	976.4	739.7	1.81

Table 5. Multi-Season Testing Results.

Season	Model	RMSE (MW)	MAE (MW)	MAPE (%)
Summer	LSTM	3346.2	2713.6	6.15
Summer	BiLSTM	2536.3	2125.4	4.72
Summer	GRU	2837.4	2324.6	5.13
Summer	CNN-BiLSTM	1945.8	1662.1	3.66
Summer	TCN	2165.6	1815.9	4.08
Summer	Proposed Model	1483.2	1166.4	2.57
Winter	LSTM	2735.6	2141.5	5.02
Winter	BiLSTM	2436.4	1987.1	4.43
Winter	GRU	2517.8	2084.2	4.60
Winter	CNN-BiLSTM	1729.3	1364.7	3.05
Winter	TCN	1787.5	1411.8	3.14
Winter	Proposed Model	1053.2	796.1	1.97

Table 6. Forecasting Results under Apparent Temperature Perturbation.

Season	Model	RMSE (MW)	MAE (MW)	MAPE (%)
Summer	LSTM	3723.9	3079.3	6.48
Summer	BiLSTM	2756.5	2284.2	5.07
Summer	GRU	2876.6	2351.7	5.18
Summer	CNN-BiLSTM	2179.8	1826.6	4.10
Summer	TCN	2194.5	1837.3	4.12
Summer	Proposed Model	1728.2	1404.5	3.12
Winter	LSTM	2964.1	2347.6	5.14
Winter	BiLSTM	2618.5	2193.0	4.86
Winter	GRU	2531.3	2116.7	4.66
Winter	CNN-BiLSTM	1694.8	1326.4	2.91
Winter	TCN	2046.3	1723.2	3.82
Winter	Proposed Model	1326.4	997.5	2.43

Table 7. Comparison of ablation experiment results.

Model	RMSE (MW)	MAE (MW)	MAPE (%)
Group 1	1527.3	1094.0	2.42
Group 2	1348.3	977.8	2.21
Group 3	1860.8	1401.2	3.12
Group 4	1574.1	1139.3	2.45
Proposed Model	976.4	739.7	1.81


Share and Cite

MDPI and ACS Style

Wang, Y.; Zhou, J.; Wu, G.; Ma, R.; Ma, T.; Liu, J.; Wang, D. A Short-Term Load Forecasting Model Based on STL Decomposition and CNN-BiLSTM Optimized by Deep Reinforcement Learning. Electronics 2026, 15, 2375. https://doi.org/10.3390/electronics15112375


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
