1. Introduction
As China’s electrification level rises, so does demand for electrical energy [1]. Residential, commercial, and industrial electricity consumption all show year-on-year growth. However, the electricity load exhibits obvious randomness and volatility due to objective and social factors such as weather changes, holidays, and unexpected events [2,3]. This complicates load forecasting and affects the reliability and efficiency of power system operation [4,5,6,7]. Highly accurate load forecasting enables power dispatching authorities to develop more scientific and cost-effective generation plans, reducing fossil fuel consumption and slowing environmental degradation [8]. Improved forecasting accuracy also enables better peak regulation in power systems with energy storage. As a result, high-accuracy load forecasting has been a hot and difficult research topic in this field [9,10,11].
In terms of time scales, there are currently four main types of load forecasts: ultra-short-term [12], short-term [13,14], medium-term [15,16,17], and long-term [18,19,20,21]. Ultra-short-term forecasts (one hour or 10 min ahead) are mainly used for real-time security analysis, real-time economic dispatch, and automatic generation control. Short-term forecasts (one day or one week ahead) are used for scheduling daily start-up and shutdown plans and generation plans. Medium-term forecasts (up to one year ahead) are used for monthly maintenance plans, operation modes, and reservoir scheduling plans. Long-term forecasts (up to 10 years ahead) provide key basic data for grid planning and for determining annual maintenance plans, operation modes, etc.
Load forecasting models can be broadly classified as statistical [22,23], artificial intelligence [24,25], and combined models [26,27,28].
Statistical models mainly include the autoregressive integrated moving average, seasonal autoregressive integrated moving average, multiple linear regression, and exponential smoothing methods. Wang Bo et al. constructed an autoregressive moving average model with exogenous variables to achieve short-term load forecasting [29]. Luiz Felipe Amaral et al. developed a smooth transition periodic autoregressive model and evaluated its load forecasting performance [30]. However, traditional statistical methods are no longer applicable when forecasting complex non-linear trends.
In recent years, artificial intelligence models have been widely applied in load forecasting with excellent results. Common artificial intelligence models can be broadly classified into machine learning models [31,32,33] and deep learning models [34,35,36]. Support vector machines and neural networks are the most representative machine learning models. Jian Luo et al. [37] constructed a weighted quadratic surface support vector regression model to achieve efficient load prediction. Their results show that support vector functions handle non-linear time series well, but the method's parameter setting is cumbersome. Haoming Liu et al. [38] constructed a combined model based on support vector regression for short-term load forecasting of integrated energy systems. Pham et al. [39] used a back-propagation neural network (BPNN) as the core forecasting algorithm for load forecasting. Yusha Hu et al. [40] established a parameter-optimised BPNN to avoid prediction results falling into local optima. Deep learning models are a family of algorithms that has grown out of the neural network models of machine learning; they contain more complex structures and are suitable for processing large amounts of non-linear data. A stacked autoencoder structure based on a deep LSTM was proposed by Zahra Fazlipour et al. [41]; notably, that study shows that deep structures are useful for improving prediction accuracy. A stacked LSTM model was developed by Hongbo Ren et al. [42] for load forecasting. Temporal convolutional networks incorporating an attention mechanism were constructed by Xianlun Tang et al. [43].
Various single prediction models have their limitations, and their prediction accuracy often fails to meet production needs. Combined prediction methods have therefore emerged in recent years. Combined models fall into two main categories: combinations of optimisation algorithms with forecasting models, and integrations of multiple forecasting models. A combined forecasting model, i.e., an Elman neural network (ENN) optimised with the particle swarm optimisation algorithm, was proposed for load power forecasting by Kun Xie et al. [44]. Considerable work has also been performed on the integration of multiple predictive models. Xifeng Guo et al. [45] used a convolutional neural network to cascade features at four different scales to fully exploit the potential relationships between continuous and discontinuous data in the feature maps; the fused multi-scale feature vectors are fed into an LSTM network for short-term load prediction. Umar Javed et al. [46] combined a dilated causal convolutional network with short receptive fields and a bi-directional LSTM to build a new load forecasting architecture, achieving higher forecasting accuracy. Sana Arastehfar et al. [47] integrated a graph convolutional neural network and a long short-term memory network into a unified network that can extract both temporal and spatial information. In addition to the combined models above, combining decomposition methods with deep learning models is a new trend in the field of load forecasting. Weimin Yue et al. [48] combined ensemble empirical mode decomposition with long short-term memory neural networks to address poor load prediction accuracy. Qian Zhang et al. [49] combined a variational mode decomposition model with a stacked ensemble model for load prediction. These studies show the outstanding advantages of combined models in the field of load forecasting.
In everyday life and production processes, customers' electricity consumption behaviour is differentiated. Load characteristics therefore vary, making it difficult for generalised forecasts to meet accuracy requirements. When forecasting the load of an area, if each subarea is forecast individually and the results are then integrated, the forecast granularity is too fine and prone to over-fitting, and forecasting is more time-consuming. Conversely, if all customer loads are aggregated and then predicted, the granularity is too coarse, and the differentiated characteristics of customer electricity consumption cannot be captured. Therefore, the extraction of customer electricity-consumption characteristics is also a key technique affecting the accuracy of load forecasting. Koivisto et al. [50] used principal component analysis for dimensionality reduction and K-means for clustering. The method effectively clusters loads, but the stability of principal component analysis is poor and its accuracy is not high. Zhong et al. [51] used the Fourier transform as a dimensionality reduction method to extract the main load features and classify loads, but the method did not specify the weights of the dimensionality reduction indicators.
Based on previous research, an ensemble load forecasting algorithm based on load clustering and load decomposition strategies is proposed in this study. The improved prediction strategy includes three aspects. The first is a load clustering method based on principal characteristic extraction and dimensionality reduction. The extracted primary features reflect the fluctuation characteristics of various loads. The dimensionality reduction strategy can preserve important features while reducing the complexity of the clustering model, improving clustering efficiency and reducing memory requirements. The clustering algorithm groups loads with similar trends in variation. By predicting a class of loads with similar trends in a provincial region, the forecasting accuracy can be improved while reducing time costs. The second aspect is the load decomposition strategy. Multiple classes of loads with different characteristics are obtained after clustering. The sum of each type of load is decomposed into separate components with a single and uncoupled frequency, and then separate prediction models are constructed for the different frequency components. By decomposing each class of load, not only can the information contained in the data be fully explored, but also the interaction between different components at the characteristic scale can be reduced. Thirdly, an integrated deep learning model based on the LSTM and CNN-GRU model is constructed. In this model, the LSTM and CNN-GRU models are used for the prediction of low-frequency and high-frequency components, respectively. The LSTM can fully reflect the overall trend of the load and has high accuracy in predicting low-frequency time series. The CNN-GRU neural network, on the other hand, has a strong non-linear fitting capability and can achieve accurate prediction of high-frequency components with high randomness. The advantages of these two prediction methods complement each other. 
The prediction methods were validated by simulation, yielding desirable prediction results.
This paper is organised as follows.
Section 2 presents the novel load forecasting model and describes the relevant theoretical background of the algorithms involved in this paper.
Section 3 presents a case study of the proposed model.
Section 4 concludes the paper.
2. Methodology
2.1. An Ensemble Forecasting Model Based on an Improved Load Clustering and Decomposition Strategy
In this paper, a forecasting strategy based on load clustering and decomposition is designed to achieve accurate load forecasting for provincial areas containing multiple cities, and the following detailed steps are given.
Step 1: A load clustering method based on principal feature extraction is used to cluster the loads of multiple cities. The load characteristics are first calculated for all cities. Secondly, the SVD method is applied to reduce the dimensionality of the load characteristics and extract the main ones. Finally, the K-means algorithm is used to cluster the loads of the cities based on their main characteristics. In this study, the load data of one province (containing 10 cities) are used as the study data; the load data of the 10 cities are processed by dimensionality reduction and clustering.
Step 2: The total load of each category is obtained from the clustering results. The VMD algorithm is used to decompose the various types of loads obtained by clustering, yielding several components of different frequencies.
Step 3: Based on the improved prediction strategy, the ensemble prediction algorithm is proposed. The LSTM and CNN-GRU models are used to predict the low-frequency and high-frequency IMF components obtained using the VMD algorithm, respectively, and then the prediction results of each component are superimposed to obtain the final prediction results of each type of load. Finally, the forecast results of each type of load are superimposed to obtain the load forecast results of the province.
The flowchart of the ensemble model considering clustering and decomposition strategies to achieve provincial short-term load forecasting is given in Figure 1.
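The three steps above can be outlined in code as follows (an illustrative Python sketch, not the authors' implementation; `decompose`, `predict_low`, and `predict_high` are hypothetical stand-ins for the VMD, LSTM, and CNN-GRU components described in the rest of this section):

```python
import numpy as np

def forecast_province(city_loads, cluster_labels, decompose, predict_low, predict_high):
    """Cluster -> decompose -> per-band predict -> superimpose.

    city_loads: (n_cities, T) array of historical loads.
    cluster_labels: one cluster index per city (from Step 1).
    decompose: returns (low_components, high_components) for a series (Step 2).
    predict_low / predict_high: per-component forecasters (Step 3).
    """
    labels = np.asarray(cluster_labels)
    province_forecast = None
    for c in np.unique(labels):
        # Step 1 result: aggregate the cities of one cluster into a single series.
        cluster_load = np.asarray(city_loads)[labels == c].sum(axis=0)
        # Step 2: split the cluster load into low/high-frequency components.
        low_parts, high_parts = decompose(cluster_load)
        # Step 3: predict each component with the matching model and superimpose.
        cluster_forecast = sum(predict_low(p) for p in low_parts) + \
                           sum(predict_high(p) for p in high_parts)
        province_forecast = (cluster_forecast if province_forecast is None
                             else province_forecast + cluster_forecast)
    return province_forecast
```

With the real models plugged in, the final provincial forecast is simply the superposition of the per-cluster forecasts, mirroring Figure 1.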
2.2. Load Characteristic Dimensionality Reduction Based on Singular Value Decomposition (SVD)
As the number of dimensions of load data increases, the efficiency of load clustering decreases significantly. By reducing the dimensionality of the load characteristics, the efficiency of clustering can be improved and the memory requirements for data storage can be reduced. SVD, as a matrix decomposition method, enables the dimensionality reduction of matrices.
Assume that the n load characteristics of m users form a real matrix A of order m × n. The n load characteristics of the k-th user are denoted as ak = (ak1, ak2, …, akn). For matrix A, there exist orthogonal matrices U ∈ R^{m×m} and V ∈ R^{n×n}, such that the following equation holds [52]:

$$A = U \Sigma V^{\mathrm{T}} \tag{1}$$

where Σ ∈ R^{m×n} is the diagonal matrix of singular values σ1 ≥ σ2 ≥ … ≥ 0.
The magnitude of the singular value σi indicates the importance of the corresponding load characteristic. A larger singular value indicates that the feature is more important, while a smaller value indicates that the feature is unimportant and can be ignored. In Formula (1), only the first r dominant singular values are retained. Then, the matrices U and V are reduced to

$$U_r = (u_1, u_2, \ldots, u_r) \in \mathbb{R}^{m \times r} \tag{2}$$

$$V_r = (v_1, v_2, \ldots, v_r) \in \mathbb{R}^{n \times r} \tag{3}$$

From Equations (2) and (3), it can be seen that, after neglecting the directions in which the variance of the data is small, the original coordinate system can be reduced to a low-dimensional one. Accordingly, the coordinate values of the load characteristic ak in the low-dimensional coordinate system can be used to reflect its main characteristics. In addition, the singular value corresponding to each axis describes the importance of that load characteristic: when clustering the coordinates, the higher the singular value, the more important the corresponding load characteristic. Therefore, the singular values of the axes are chosen as the weights of the dimensionality reduction indicators and are then normalised.
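As a concrete sketch, this reduction can be implemented with a standard SVD routine (a minimal illustration of the idea; the function and variable names are our own, not the paper's):

```python
import numpy as np

def reduce_load_features(A, r):
    """Project the m x n load-characteristic matrix A onto its first r
    singular directions and return the normalised singular values as
    indicator weights for the subsequent clustering."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    coords = A @ Vt[:r].T          # low-dimensional coordinates (= U_r * diag(s_r))
    weights = s[:r] / s[:r].sum()  # normalised singular values as axis weights
    return coords, weights
```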
2.3. K-Means Clustering Algorithm
In this study, the K-means algorithm is used to cluster the loads of several cities in the province based on the main features after dimensionality reduction. The method needs the number of clusters k to be determined in advance; this paper adopts the sum of squared errors (SSE) as the criterion for evaluating the effectiveness of clustering and determines the number of clusters accordingly. The SSE metric is defined in the following equation [53]:

$$SSE = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - u_i \rVert^2 \tag{4}$$

where Ci is the set of samples of class i, ui is the cluster centre of class i, and ‖x − ui‖² is the squared Euclidean distance between ui and sample x. A smaller SSE means better clustering quality.
Suppose X is a set of n data samples with s dimensions, denoted X = {x1, x2, …, xn} ⊂ R^s. The steps for clustering the load data are as follows:
- (1) Determine the number of clusters k according to the clustering validity index SSE.
- (2) Randomly select the initial k cluster centres u1, u2, …, uk ∈ R^s, and calculate for each data sample the class label it belongs to.
- (3) For each class j, recalculate the cluster centre of that class:

$$u_j^{(t)} = \frac{1}{\lvert C_j \rvert} \sum_{x \in C_j} x \tag{5}$$

where u_j^{(t)} is the cluster centre recalculated at the t-th iteration for class j.
- (4) Update the class centre with the class mean.
- (5) Repeat (3) and (4) until the class centres no longer change.
- (6) Output the clustering results.
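Steps (2)–(6) can be sketched as follows (a minimal illustrative implementation; the random initialisation and naming are our own assumptions, and k is assumed to have been chosen beforehand via the SSE criterion of step (1)):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal K-means: assign samples to nearest centres, then move each
    centre to its class mean, until the centres stop changing."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]  # step (2): initial centres
    for _ in range(n_iter):
        # step (2): assign each sample the label of its nearest centre
        d = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # steps (3)-(4): recompute each centre as the mean of its class
        new_centres = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centres[j] for j in range(k)])
        # step (5): stop when the centres no longer change
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    sse = ((X - centres[labels]) ** 2).sum()  # clustering quality, Eq. for SSE
    return labels, centres, sse
```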
2.4. Load Decomposition Based on VMD Algorithm
Considering the non-linear and non-smooth nature of the load series, the decomposition method is used to decompose the total load for each category. Each type of load is decomposed into multiple IMFs of different frequencies, and then each load sequence component is predicted separately.
Variational mode decomposition (VMD) [54] is a new type of adaptive decomposition algorithm. It decomposes a complex time series into a number of single-frequency components based on a pre-determined number of modes M; the optimal solution of the model is obtained by the alternating direction method of multipliers with iterative updating.
Assume a signal to be decomposed:

$$f(t) = \sum_{m=1}^{M} v_m(t) = \sum_{m=1}^{M} A_m(t) \cos\!\big(\varphi_m(t)\big) \tag{6}$$

where f(t) is the original load signal to be decomposed, vm(t) (m = 1~M) is the single-frequency signal after load decomposition, M is the number of decompositions, Am(t) is the amplitude of the signal vm(t), and φm(t) is the phase angle of vm(t).
The VMD extracts M modal components from the non-smooth original signal such that the sum of the frequency bandwidths of the components is minimised and the sum of the modal components equals the original signal. The constrained model is

$$\min_{\{v_m\},\{\omega_m\}} \left\{ \sum_{m=1}^{M} \left\lVert \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * v_m(t) \right] e^{-j\omega_m t} \right\rVert_2^2 \right\} \quad \text{s.t.} \quad \sum_{m=1}^{M} v_m(t) = f(t) \tag{7}$$

where vm is the m-th decomposed modal component, ωm is its centre frequency, δ(t) is the impulse function, and f(t) is the original load signal.
The Lagrangian multiplier λ and the quadratic penalty factor α are used to construct the augmented Lagrangian function, and the alternating direction method of multipliers is then used to iteratively find the global optimal solution of the objective function. The augmented Lagrangian function is as follows:

$$L(\{v_m\},\{\omega_m\},\lambda) = \alpha \sum_{m=1}^{M} \left\lVert \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * v_m(t) \right] e^{-j\omega_m t} \right\rVert_2^2 + \left\lVert f(t) - \sum_{m=1}^{M} v_m(t) \right\rVert_2^2 + \left\langle \lambda(t),\; f(t) - \sum_{m=1}^{M} v_m(t) \right\rangle \tag{8}$$
The optimal solution of the above equation is found by alternately updating $\hat{u}_m^{n+1}$, $\omega_m^{n+1}$, and $\hat{\lambda}^{n+1}$. The value of $\hat{\lambda}^{n+1}$ is expressed as follows:

$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau \left( \hat{f}(\omega) - \sum_{m=1}^{M} \hat{u}_m^{n+1}(\omega) \right) \tag{9}$$

The minimiser for each IMF component in the Fourier domain is as follows:

$$\hat{u}_m^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq m} \hat{u}_i(\omega) + \hat{\lambda}(\omega)/2}{1 + 2\alpha \left( \omega - \omega_m \right)^2} \tag{10}$$

Similarly, the minimiser for the centre frequencies is

$$\omega_m^{n+1} = \frac{\int_0^{\infty} \omega \, \lvert \hat{u}_m(\omega) \rvert^2 \, d\omega}{\int_0^{\infty} \lvert \hat{u}_m(\omega) \rvert^2 \, d\omega} \tag{11}$$
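These alternating updates can be implemented compactly on the positive half-spectrum (a simplified re-implementation for illustration only; the initialisation, stopping rule, and default parameter values are our assumptions, not the authors'):

```python
import numpy as np

def vmd(f, M=3, alpha=2000.0, tau=0.0, n_iter=500, tol=1e-7):
    """Simplified VMD sketch: alternating updates of the modes, their
    centre frequencies, and the Lagrangian multiplier in the Fourier
    domain (tau = 0 disables the multiplier update)."""
    T = len(f)
    f_hat = np.fft.rfft(f)
    freqs = np.fft.rfftfreq(T)                 # normalised frequencies in [0, 0.5]
    u_hat = np.zeros((M, len(f_hat)), dtype=complex)
    omega = np.linspace(0.05, 0.45, M)         # initial centre frequencies (assumption)
    lam = np.zeros(len(f_hat), dtype=complex)  # multiplier in the frequency domain
    for _ in range(n_iter):
        u_prev = u_hat.copy()
        for m in range(M):
            # mode update: Wiener-like filter of the residual around omega[m]
            residual = f_hat - u_hat.sum(axis=0) + u_hat[m]
            u_hat[m] = (residual + lam / 2) / (1 + 2 * alpha * (freqs - omega[m]) ** 2)
            # centre-frequency update: power-spectrum centroid of the mode
            power = np.abs(u_hat[m]) ** 2
            omega[m] = (freqs * power).sum() / (power.sum() + 1e-12)
        # multiplier update: dual ascent on the reconstruction constraint
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))
        if np.abs(u_hat - u_prev).max() < tol:
            break
    order = np.argsort(omega)                  # return modes from low to high frequency
    modes = np.array([np.fft.irfft(u_hat[m], n=T) for m in order])
    return modes, omega[order]
```

For well-separated frequency bands, the recovered centre frequencies converge to the true tone frequencies and the modes superimpose back to the original signal.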
2.5. LSTM
Long short-term memory (LSTM) [55] is a type of recurrent neural network used for processing sequence data such as text, speech, and time series. The advantage of LSTM models is that they can capture long-term dependencies, which traditional recurrent neural network models cannot. The memory cell in the LSTM model can remember information in the input sequence and pass it on to the next time step for better prediction of future values. In addition, the LSTM model can control the flow of information through gate mechanisms to mitigate the vanishing gradient problem. These advantages make LSTM models widely applicable in areas such as speech recognition and time series prediction. The low-frequency component of the load fluctuates smoothly, and accurate prediction can be achieved using the LSTM model alone; therefore, the LSTM model is used for low-frequency component prediction in this study.
The structure of the LSTM [54] consists of three gates: the forget gate, the input gate, and the output gate. According to the internal structure of the three gates, the value of the LSTM hidden layer at the current moment depends on the joint action of the current input and the previous moment's state. The three gates act as three control switches that extract and process information: they delete historical information that is not useful, retain information related to the output characteristics, update the state of the hidden layer, and improve the convergence of the model.
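The gate computations just described can be written out for a single time step (an illustrative NumPy sketch with our own stacked parameter layout; not the paper's implementation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W (4H x D), U (4H x H), and b (4H,) stack the
    parameters of the forget, input, candidate, and output transforms."""
    z = W @ x + U @ h_prev + b
    H = len(h_prev)
    f = sigmoid(z[0:H])          # forget gate: drop useless history
    i = sigmoid(z[H:2*H])        # input gate: admit new information
    g = np.tanh(z[2*H:3*H])      # candidate cell state
    o = sigmoid(z[3*H:4*H])      # output gate
    c = f * c_prev + i * g       # updated memory cell
    h = o * np.tanh(c)           # hidden state at the current moment
    return h, c
```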
2.6. CNN-GRU
The CNN-GRU model is a novel neural network architecture that combines the convolutional neural network (CNN) and gated recurrent unit (GRU) models. The CNN-GRU model is designed to process sequential data, such as time series data, and is particularly well-suited for high-frequency component prediction. The CNN-GRU model consists of two main components: a CNN and a GRU. The CNN is responsible for extracting local features from the input data, while the GRU is used to capture the temporal dependencies in the data. The output of the CNN is fed into the GRU, which then produces time series prediction results. This allows the model to accurately predict high-frequency components in the data, which are often difficult to predict using traditional time series models. Overall, the CNN-GRU model represents a significant advancement in the field of time series prediction and can greatly improve the accuracy and reliability of high-frequency component predictions.
2.6.1. CNN
The convolutional neural network (CNN) is a deep learning model used for image classification, feature extraction, target detection, speech recognition, etc. The CNN mainly consists of multiple convolutional layers, where the convolutional layers are used to extract potential features of the data and pooling layers are used to reduce the size of the feature map. Convolutional layers play a key role in feature extraction. They are responsible for detecting local features in the input data by sliding a set of learnable filters over the input volume and computing the dot product between the filter and the corresponding patch of the input. The result of this operation is a feature map that captures potential fluctuating features in the input data. Overall, the convolutional layer enables the neural network to learn hierarchical representations of the input data that are increasingly abstract and discriminative, thereby improving the accuracy and robustness of the model.
2.6.2. GRU
The GRU is a variant of the LSTM that uses its specific memory and forgetting structure to dynamically model time series. It mitigates gradient vanishing and gradient explosion during training on load time series. Compared with the LSTM, the GRU reduces the number of gates, which preserves load prediction accuracy while reducing training time.
The GRU has two gates. It merges the forget gate and the input gate of the LSTM into a new update gate, and replaces the LSTM's output gate with a reset gate, which selects the state at the previous moment and writes it into the candidate set at the current moment [54].
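A single GRU step with these two gates can be sketched as follows (parameter names and the z/(1 − z) mixing convention are our own assumptions; some formulations swap the roles of z and 1 − z):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h_prev, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh):
    """One GRU time step: update gate z (the merged forget/input gate)
    and reset gate r, followed by the candidate state."""
    z = sigmoid(Wz @ x + Uz @ h_prev + bz)               # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev + br)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev) + bh)   # candidate state
    return (1 - z) * h_prev + z * h_tilde                # new hidden state
```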
2.7. Performance Evaluation
In this paper, the error evaluation indices root mean square error (RMSE) [56] and mean absolute percentage error (MAPE) [57] are used to evaluate the accuracy of the prediction models. Their mathematical expressions are shown in Equations (13) and (14):

$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2} \tag{13}$$

$$MAPE = \frac{100\%}{N} \sum_{i=1}^{N} \left\lvert \frac{y_i - \hat{y}_i}{y_i} \right\rvert \tag{14}$$

where N is the number of samples, yi is the true value of the load data, and ŷi is the load forecast value.
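Equations (13) and (14) translate directly into code (a routine sketch; note that MAPE assumes no true load value is zero):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error, Equation (13)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent, Equation (14)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))
```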