Load Forecasting Based on LVMD-DBFCM Load Curve Clustering and the CNN-IVIA-BLSTM Model

Hu, Linjing; Wang, Jiachen; Guo, Zhaoze; Zheng, Tengda

doi:10.3390/app13127332

Open AccessArticle

Load Forecasting Based on LVMD-DBFCM Load Curve Clustering and the CNN-IVIA-BLSTM Model

¹

College of Electric Power, Inner Mongolia University of Technology, Hohhot 010051, China

²

The Sixth Research Institute of China Aerospace Science and Industry Corporation, Hohhot 010010, China

^*

Author to whom correspondence should be addressed.

Appl. Sci. 2023, 13(12), 7332; https://doi.org/10.3390/app13127332

Submission received: 27 April 2023 / Revised: 12 June 2023 / Accepted: 15 June 2023 / Published: 20 June 2023

(This article belongs to the Topic Soft Computing)

Download

Browse Figures

Versions Notes

Abstract

:

Power load forecasting plays an important role in power systems, and the accuracy of load forecasting is of vital importance to power system planning as well as economic efficiency. Power load data are nonsmooth, nonlinear time-series and “noisy” data. Traditional load forecasting has low accuracy and curves not fitting the load variation. It is not well predicted by a single forecasting model. In this paper, we propose a novel model based on the combination of data mining and deep learning to improve the prediction accuracy. First, data preprocessing is performed. Second, identification and correction of anomalous data, normalization of continuous sequences, and one-hot encoding of discrete sequences are performed. The load data are decomposed and denoised using the double decomposition modal (LVMD) strategy, the load curves are clustered using the double weighted fuzzy C-means (DBFCM) algorithm, and the typical curves obtained are used as load patterns. In addition, data feature analysis is performed. A convolutional neural network (CNN) is used to extract data features. A bidirectional long short-term memory (BLSTM) network is used for prediction, in which the number of hidden layer neurons, the number of training epochs, the learning rate, the regularization coefficient, and other relevant parameters in the BLSTM network are optimized using the influenza virus immunity optimization algorithm (IVIA). Finally, the historical data of City H from 1 January 2016 to 31 December 2018, are used for load forecasting. The experimental results show that the novel model based on LVMD-DBFCM load c1urve clustering combined with CNN-IVIA-BLSTM proposed in this paper has an error of only 2% for electric load forecasting.

Keywords:

power load forecasting; LVMD; DBFCM; CNN; BLSTM; IVIA

1. Introduction

As quality of life has improved, the demand for power energy has also risen, and thus, the requirements for power generation and transmission and the use of electricity have increased [1]. Power load forecasting applies machine learning methods to mine the key factors affecting load from historical data, such as weather and time data, to build load forecasting models [2]. Load forecasting is the basis for ensuring the balance of power supply and demand. Accurate forecasting results can reduce the pressure on transmission and distribution links and facilitate the optimal scheduling of power transmission links. They can effectively reduce power generation costs and improve economic and social benefits [3,4].

Traditional forecasting methods use the overall historical load numbers as their basis [5,6,7,8,9]. The trend extrapolation predicts future loads based on the historical trends of the predictor variables. It predicts near future loads with high accuracy, but error gradually increases with time. The regression analysis method predicts future load changes by establishing a functional relationship between variables. It has high fitting ability but has a large error in predicting future changes. Time series extrapolation (the autoregressive moving average model) approximates a nonlinear relationship with a linear equation [10]. The grey forecasting method performs correlation analysis by identifying the degree of dissimilarity in development trends between system factors. Traditional forecasting models have a relatively simple structure and require highly accurate historical data, resulting in forecasting results with poor accuracy.

In recent decades, deep learning has made important breakthroughs in the field of artificial intelligence. In 1957, Frank Rosenblatt invented an artificial neural network to solve complex problems by means of a multilayer perceptron (MLP). Deep neural networks (DNNs) are currently a popular topic in load prediction research, and recurrent neural networks (RNNs) have advantages in memory parameter sharing and Turing completeness [11]. Long short-term memory (LSTM) networks overcome the problem that RNNs are prone to gradient disappearance/explosion, but their ability to correct errors is weak [12]. M. Schuster proposed a BLSTM network that can make full use of the hidden layer history state and has stronger robustness [13]. Yet, BLSTM networks have many parameters, and without proper optimization, the model may be overfitted or slow to train, resulting in inefficiency. The number of hidden layer neurons, the number of training epochs, the learning rate, and the regularization coefficient in the BLSTM network are optimized using the IVIA. It can improve the model’s efficiency. The challenges of the machine learning algorithms to load forecasting for power systems, with complex, nonsmooth, nonlinear time-series and “noisy” data, will decrease the accuracy of the algorithm, and a single prediction model has difficulty meeting the accuracy requirements. Integrating a hybrid preprocessing method can significantly improve the prediction accuracy [14,15,16,17].

In summary, to improve the accuracy and meet the needs of practical problems, this paper proposes a new model based on data mining and deep learning. It considers not only historical load data but also weather information, date types, real-time electricity prices, etc. Historical data load, weather information, and real-time electricity prices are normalized, and the date and holiday information are expressed through One-Hot encoding. The DBFCM algorithm is applied to load curve clustering to overcome the problems of traditional C-means clustering and fuzzy C-means clustering in which data are clustered into small classes, inaccurate classification is performed, and the “uniform effect of the cluster size” degrades performance. First, the LVMD is used to decompose and denoise the input data to improve the continuity and stability of the data. Second, the DBFCM algorithm is used to perform load curve clustering. Third, feature fusion extraction is performed using CNN. In addition, the extracted feature vectors are used as the input of IVIA-BLSTM. The IVIA is proposed to optimize the parameters in the BLSTM network. Finally, this paper proposes a new load forecasting model based on the LVMD-DBFCM algorithm and CNN-IVIA-BLSTM.

The remainder of this article is organized as follows. Section 2 describes the IVIA algorithm, Section 3 presents the methodology used in this paper, Section 4 is a case study of this paper, and Section 5 concludes the paper.

2. Influenza Virus Immunity Optimization Algorithm

The IVIA is a new metaheuristic optimization algorithm that is inspired by the way influenza viruses spread in a population and the process of population immunity. The process of an individual being infected is divided into three states: uninfected, infected, and immune, and the population has herd immunity when the number of immune individuals in the population exceeds 80% of the total population. This section examines the IVIA with 23 sets of standard test functions and compares it with seven optimization algorithms proposed in recent years. They are turbulent flow of water-based optimization (TFWO) [18], golden eagle optimization (GEO) [19], the parasitism–predation algorithm (PPA) [20], the rat swarm optimizer (RSO) [21], gray wolf optimization (GWO) [22], particle swarm optimization (PSO) [23], and whale optimization algorithm (WOA) [24]. The results of the tested functions show that the IVIA converges faster with higher accuracy, optimizes the function better, and finds the optimal value of the function in fewer iterations. Then, the IVIA is applied to the neural network BLSTM training of the power load forecasting model to optimize the number of hidden layer neurons, the number of training epochs, the learning rate and the regularization coefficient, and other relevant parameters to improve the accuracy of the forecasting model.

2.1. Biological Characteristics

The influenza virus is very contagious, spreads quickly, and has many routes of transmission [25]. The spread of the virus in the population can be achieved through droplet transmission and physical contact, and the rate of spread is related to the social distance between people [26]. The vast majority of the population can recover on their own after infection, and those who recover produce the corresponding antibodies in their bodies and have immune functions. A small number of people with underlying diseases, that is, elderly people or those with weak resistance, are at risk. When the number of immune individuals in the population exceeds 80%, the population has herd immunity, thus preventing the next round of disease transmission. A schematic diagram of the transmission process of the influenza virus in a population is shown in Figure 1.

The population is divided into infected individuals, highly susceptible individuals, susceptible individuals, safe individuals, and absolutely safe individuals.

Extremely susceptible individuals are direct contacts of infected individuals, who are socially close to infected individuals and are infected by getting influenza virus from infected individuals; susceptible individuals are indirect contacts of infected individuals who are socially distant from the infected individuals, may be infected through third parties as vectors, and have a higher risk of infection; safe individuals are not in contact with infected individuals or other individuals, and because the influenza virus is transmitted as a vector, through a change in social distance, safe individuals may become susceptible individuals or infected individuals; absolutely safe individuals have immune functions through vaccination or healing after infection and are not susceptible to infection, regardless of the contact distance of infected individuals. In this paper, the influenza virus immunization algorithm is proposed through the transmission of the influenza virus in a population and the immunization mechanism. To map the spread process of the influenza virus with the optimization algorithm, the following assumptions are made:

The initial number of infected individuals is small, and they are randomly distributed in the population. The category to which an individual currently belongs and the corresponding update formula are determined based on the distance between the individual and the infected individual;
The overall number of the population is kept constant, and absolutely safe individuals randomly move to different locations in the population. The number and location of other individuals change dynamically with the number of iterations;
Individuals in the population have three states: infected, uninfected, and immune. Infected individuals have immunity after infecting other individuals and will not be infected in the following process. A population is immune when 80% or more of the individuals in the population are immune;
Individuals are infected by contracting virus cells from an infected person, and the process of virus cell exchange is shown in Figure 2. The influenza virus cells and the diseased cells are taken as the smallest units within an individual to represent the dimension of the optimization problem.

Immunized individuals: The individuals who are categorized as immunized are protected against the virus, and they are not affected by infected individuals.

Figure 3 shows the proportion of immune individuals to the entire population with the 23 sets of standard test functions run by IVIA. From the figure, except for functions F4, F7, F9, F16, F21, and F23, all the remaining 17 tested functions stop the iterative search process when the population immunity rate reaches 0.8. In this paper, the minimum population immunity rate for IVIA is set at 0.8, i.e., the search for optimization stops when the number of immunized individuals in the population exceeds 80% of the total. It ensures that the algorithm has high optimization results.

In this paper, we propose the IVIA based on the transmission and immunization process of the influenza virus. The influenza virus spreads rapidly in a population, and the growth curve of the cumulative number of infections is shown in Figure 4. Curve ① indicates that the number of infected individuals in the population increases rapidly within 2~3 weeks, and curves ② and ③ indicate that the number of infected individuals peaks at 7~8 weeks and then gradually starts to decline. Curve ④ indicates that the number of infected individuals gradually decreases and the number of immunized individuals increases near the end of this influenza outbreak. The IVIA has a good optimization capability at the beginning of the iteration. Figure 4 is a schematic diagram summarized according to the epidemic transmission rules; the website of the epidemic transmission rules is shown below.

https://www.zhzyw.com/crb/gm/08521151042985F4176E7H4KEC0B47A.html (accessed on 12 June 2023).

2.2. Mathematical Model

The process of influenza virus transmission in a population is mapped with the influenza virus immunization algorithm. The population is represented by the matrix

X

, as shown in Equation (1), where

n

is the population size and the number of individuals, and

d

is the number of cells invaded by the influenza virus into the human body, corresponding to the dimensionality of the objective function. The initialization process of the algorithm randomly generates an

n \times d

dimension matrix

X

, which sets the initial number of infected individuals to

x

.

\begin{matrix} X = & \begin{matrix} [ & \begin{matrix} x_{11} & x_{12} & \dots & x_{1 d} \\ x_{21} & x_{22} & \dots & x_{2 d} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ x_{n 1} & x_{n 2} & \dots & x_{n d} \end{matrix} & ] \end{matrix} \end{matrix}

(1)

The fitness values of different individuals in the population are represented by Equation (2). Each row in the matrix is represented as the fitness value of the current individual. The individual corresponding to the optimal fitness value is the current optimal solution of the objective function.

F (x) = [\begin{matrix} f (x_{11}, x_{12}, \dots, x_{1 d}) \\ f (x_{21}, x_{22}, \dots, x_{2 d}) \\ ⋮ \\ f (x_{n 1}, x_{n 2}, \dots, x_{n d}) \end{matrix}]

(2)

The distance between individuals is represented by C, as shown in Equation (3).

x = {x_{1}, x_{2,} \dots, x_{d}}

is the position of individual x,

y = {y_{1}, y_{2}, \dots, y_{d}}

is the position of individual y, and d is the dimension of the optimization problem.

C = \sqrt{{(x_{1} - y_{1})}^{2} + {(x_{2} - y_{2})}^{2} + \dots + {(x_{d} - y_{d})}^{2}}

(3)

L is the maximum safety distance, as shown in Equation (4).

(L_{\min}, L_{\max})

represents the range of values for individuals.

L = \frac{1}{2} \sqrt{L_{\min}^{2} + L_{\max}^{2}}

(4)

The virus spreading process corresponds to the iterative optimization-seeking process in the algorithm. It is assumed that the safe distance between other individuals and the infected person when the influenza virus spreads in the population is L. The iterative formulas for individuals in different cases are shown in Equations (5) to (9).

Infected individuals are updated by Equation (5):

x_{i, j} (t + 1) = x_{i, j} (t) {(\exp (\frac{N_{\max} - n}{N_{\max}} - 1) \times \frac{1}{d - 1})}^{q}

(5)

When the contact distance between individuals in the population and infected individuals is

C < 0.2 L

, the individuals are moved into the infected state. The position of infected individuals is updated by Equation (5).

N_{\max}

is the maximum number of iterations,

n

is the current number of iterations,

d

is the dimensionality of the optimization problem, and

q

is used to regulate the individual position update rate. The larger the value is of

q

, the faster the position update rate and the higher the accuracy of the optimization search.

Extremely susceptible individuals are updated by Equation (6):

\begin{array}{l} x_{i, j} (t + 1) = x_{i, j} (t) + α |x_{i, j} (t) - x_{i, q} (t)| \\ α = \frac{1}{d} \sum_{p = 1}^{d} r a n d (- 1, 1) \end{array}

(6)

When the contact distance between individuals in the population and infected individuals is

0.2 L < C < 0.5 L

, the position is updated by Equation (6).

x_{i, q} (t)

is the random acquisition of viral cells by highly susceptible individuals; α represents update coefficient at the contact distance of

0.2 L < C < 0.5 L

. When the number of exchanged cells exceeds a quarter of the total number of cells, the highly susceptible individuals are transformed into infected individuals. Cell exchange occurs only once during each iteration, and the number of exchanges is random.

Susceptible individuals are updated by Equation (7):

\begin{array}{l} x_{i, j} (t + 1) = x_{i, j} (t) + β |x_{i, j} (t) - x_{i, v} (t)| \\ β = \exp (\frac{C - L}{ε \times N_{\max}}) \\ ε = r a n d (0, 1) \end{array}

(7)

When the contact distance between individuals in the population and infected individuals is

0.5 L < C < 0.8 L

, the position is updated by Equation (7).

x_{i, v} (t)

is a random acquisition of viral cells by highly susceptible individuals, β represents update coefficient at the contact distance of

0.5 L < C < 0.8 L

,

ε

is a random number in the range 0~1. When the number of exchanged cells has exceeded one-third of the total number of cells, the highly susceptible individuals are transformed into infected individuals.

Secure individuals updated by Equation (8):

\begin{matrix} x_{i, j} (t + 1) = x_{i, j} (t) + γ |x_{i, j} (t) - x_{i, w} (t)| \\ f (x_{i, w}) = \arg \min f (x_{i, j}) \\ γ = r a n d (- 1, 1) \end{matrix}

(8)

When the contact distance between individuals in the population and infected individuals is

0.8 L < C < L

, the position is updated by Equation (8). γ is a random number in the range −1~1. There is no cellular exchange between safe individuals and infected individuals. However, as the number of infected individuals increases, the contact distance between safe individuals and infected individuals may decrease, and safe individuals may become susceptible.

Absolutely safe individuals updated by Equation (9):

\begin{array}{l} x_{i, j} (t + 1) = λ x_{i, j} (t) \\ λ = γ \times (\frac{\sqrt{2} - 1}{2}) L \end{array}

(9)

Absolutely safe individuals are immune, and the contact distance between individuals and infected individuals is

C > L

; λ represents the update coefficient at the contact distance of

C > L

. They randomly change their positions during the iterative process and can be used to escape the current search region when the algorithm falls into local optima.

Stop criterion IVIA: The algorithm terminates when the number of immune individuals in the population exceeds more than 80% of the population or when the maximum number of iterations is reached. Therefore, the individual with the highest population immunity corresponds to the optimal solution of the optimization problem, and the corresponding function value is the optimal value of the optimization problem. If the problem to be solved requires higher accuracy, the population immunity rate in the Algorithm 1 can be increased. The pseudo-code of IVIA is presented below.

Algorithm 1 IVIA pseudo-code

Input:

n: The number of people

N_max: The maximum number of iterations

m: Initial number of infected individuals (usually set to 1)

L: Maximum safe contact distance

R: Herd immunity ratio

C: The distance between individuals

lb,ub: Search boundary

1: Population initialization

2: Setting parameters

3: while (t < N)

4: Calculate fitness value and sort

5: for i = 1:n

6: if (C < 0.2L) then

7: Using Equation (5) to update the location of the infected individuals

8: Record the current status of the individuals

9: else if (0.2L < C < 0.5L) then

10: Using Equation (6) to update the location of highly susceptible individuals

11: Record the current status of the individuals

12: else if (0.5L < C < 0.8L) then

13: Using Equation (7) to update the location of susceptible individuals

14: Record the current status of the individuals

15: else if (0.8L < C < L) then

16: Using Formula (8) to update the position of a safe individuals

17: Record the current status of the individual

18: else

19: Using Equation (9) to update the position of an absolutely safe individuals

20: end if

21: Recalculate fitness value

22: end for

23: Calculate immunity rate

24: t = t + 1

25: end while

2.3. Algorithm Testing

To test the actual optimization effect of the proposed IVIA, it was evaluated in this paper on 23 sets of standard test functions. The optimization results were compared with those of turbulent flow of water-based optimization (TFWO), golden eagle optimization (GEO), the parasitism–predation algorithm (PPA), the rat swarm optimizer (RSO), gray wolf optimization (GWO), particle swarm optimization (PSO), and the whale optimization algorithm (WOA).

Table 1 shows the expression of the test function, the dimension of the optimization problem, and the overall search range, and the last column shows the minimum value of the test function expectation, which is the theoretical optimum. The IVIA population size

n

is 30, the maximum number of iterations

N_{\max}

is 1000, the maximum safe contact distance

L

is

0.5 u b

, and the population immunity rate

R

is 80% during the test period. The number of dimensions of the test function and the parameters, such as search intervals

l b

and

u b

, are set according to Table 1. The overall and iteration numbers of the other optimization functions are the same as those of the IVIA.

The viability of the proposed IVIA is tested using 23 well-known benchmark functions with different sizes and complexity. Figure 5 shows the three-dimensional graphs of the standard test functions and their corresponding test results. The convergence of the IVIA is very good for both single-peaked and multipeaked functions.

From the test results in Table 2, the accuracy of the IVIA for function finding is much higher than that of several other optimization algorithms, and the theoretical optimal value can be found for some functions. IVIA has fewer iterations, a shorter running time, and faster convergence compared to the other seven algorithms.

For comparative evaluation, the proposed IVIA is compared against seven well-established comparative methods using the same benchmark functions. IVIA optimizes best in F1, F2, F3, F4, F5, F6, F7, F8, converges fastest in F11, F12, F13, F14, F15, F17, and it has the same effect as other algorithms in F18, F19, F20, F21, F22, F23. The comparative results show the effectiveness of IVIA.

In summary, the IVIA is based on the characteristics of fast convergence, better robustness, and good optimization search. In this paper, the IVIA is selected to optimize the relevant parameters, such as the number of neurons in the hidden layer, the number of training times, the learning rate, and the regularization coefficients in the BLSTM network.

3. Methodology

In this paper, we propose a method for power load forecasting based on LVMD-DBFCM load curve clustering, and the CNN-IVIA-BLSTM model is shown in Figure 6. First, data preprocessing is performed, and the influences of load are divided into continuous and discrete series. After normalizing the continuous sequences, they include load data, meteorological data (temperature, wind speed, and humidity), and economic factors. One-hot encoding is applied to discrete sequences, and the discrete sequences include the data types of workdays, weekends, and holidays. Load data decomposition and denoising with LVMD are performed on the preprocessed data. Then, the power load curve is clustered using the DBFCM algorithm. The processed data are extracted with a 1D CNN for feature extraction and finally, predicted with the IVIA-BLSTM model.

3.1. Self-Built Dataset

We expect to study the electrical load of a particular city or region where such datasets are rare. Data from the internet are collected and organized to produce an HS dataset. The following cities are replaced by H City because the electricity data relate to the economic performance of the country. The dataset includes 1096 days of electricity load data from 1 January 2016 to 31 December 2018, in H City. The dataset was divided; 80% is used as the training set and 20% is used as the test set.

The effects of the three factors of temperature, humidity, and wind speed on the power load are as follows. In summer, the temperature is high, and as air-conditioners usage increases, the power load increases. When the relative humidity is in the sensitive range 40~95%, the meteorological sensitive load varies significantly with the relative humidity. The load decreases with the increase of relative humidity. When the wind speed is in the sensitive range 2~6 m/s, the weather sensitive load changes significantly with the wind speed. The load increases with the increase of wind speed [27].

3.1.1. Dataset Visualization

The overall power load data from 2016–2018 are presented in Figure 7. From the box-whisker plot, the difficulty of prediction is reflected by the high number of data outliers in the morning from 6:00 to 8:00 and in the afternoon from 17:00 to 18:00 and 19:45 to 22:00. In summary, the importance of the data preprocessing and clustering and prediction models that follow in this paper is illustrated.

3.1.2. Data Preprocessing

Abnormal data recognition and correction

Data preprocessing is first performed to resolve data outliers before clustering and prediction is performed [28].

Missing data: Lagrange interpolation is used to handle missing values.

Data duplication: Remove duplicate datasets by similarity.

Data mutation: 3 criteria.

Data normalization

Normalized values for the clustering process can reduce the adverse effect on the clustering effect in the data domain due to the weight of the distance occupied between different attributes of the initial values. Normalization can eliminate the influence of the size of the load data volume on the distance in the clustering analysis, and thus, the information on the load pattern is highlighted [29].

Data normalization: Uniform metrics on data. Z Score normalization is shown in Equation (10). After normalization, the load data are normalized to the interval [−1, 1].

Z_{X} = \frac{x_{i} - \bar{X}}{σ}

(10)

Data preprocessing: The equation for normalizing continuous data is given in Equation (11). Using One Hot encoding for discrete sequences, each state has unique register bits; only one bit is 1, and the rest are 0.

M i n M a x_{x} = \frac{x_{i} - x_{\min}}{x_{\max} - x_{\min}}

(11)

3.2. LVMD-DBFCM Imbalanced Data Clustering

The goal of load clustering is to mine the typical daily load model [30].

Step 1. De-noising. The load data are decomposed and denoised using the LVMD (in Section 3.2.1).

Step 2. Clustering. The load data are clustered using the DBFCM (in Section 3.2.2).

Step 3. Clustering evaluation. The load data clustering effect are analyzed using the SSE evaluation index (in Section 3.2.3).

3.2.1. LVMD Double Decomposition Modal Strategy

VMD Variational Mode Decomposition

VMD [31] is an iterative process that is used to search for the optimal solution of a variational model and to determine the mode

u_{k} (t)

and its corresponding central frequency

w_{k}

and bandwidth. VMD has good denoising capabilities. VMD denoising are as follows:

Initialize parameters ${\hat{u}}_{k}^{1}$ , $w_{k}^{1}$ , ${\hat{λ}}^{1}$ , and let $n = 0$ .
$n = n + 1$ , update $w_{k}$ and $u_{k}$ .
$k = k + 1$ , repeat the previous step until $k = K$ .
Update $λ$ , and the equation is shown in Equation (12).
Repeat steps 2–5 until the end, when the condition of Equation (13) is satisfied.
The component that contains the minimum time information after decomposition is judged and considered to be random noise that is eliminated. Then, the data are reconstructed by the remaining modal component pairs, and the reconstructed data are the load data that do not contain noise.

{\hat{λ}}^{n + 1} (ω) = {\hat{λ}}^{n} (ω) + τ [\hat{f} (ω) - \sum_{k} {\hat{u}}_{k}^{n + 1} (ω)]

(12)

{\sum_{k} ‖{\hat{u}}_{k}^{n + 1} - {\hat{u}}_{k}^{n}‖}_{2}^{2} / {‖{\hat{u}}_{k}^{n}‖}_{2}^{2} < e

(13)

LMD Local Mean Decomposition

LMD is a new time–frequency analysis method that adaptively decomposes complex signals into a finite sum of product function (PF) components. Each of these PF components is actually a single component of the AM-FM information. LMD has good feature decomposition abilities [32,33]. The LMD principle is as follows.

First, find all the extreme points contained in the data series $x (t)$ , assuming that the distribution of extreme points is ${n_{1}, n_{2}, n_{3}, \dots}$ , and then calculate the mean $m_{i}$ and envelope $a_{i}$ of the adjacent extreme points according to Equations (14) and (15).

m_{i} = \frac{n_{i} + n_{i + 1}}{2}

(14)

a_{i} = \frac{|n_{i} - n_{i + 1}|}{2}

(15)

The local mean function curve

m_{11} (t)

and the envelope function curve

a_{11} (t)

are obtained by connecting

m_{i}

and

a_{i}

with line segments and smoothing, respectively, and the sliding average formula is given in Equation (16):

Y_{s} (i) = \frac{1}{2 R + 1} (Y (i + R) + Y (i + R - 1) + \dots + Y (i - R))

(16)

where

Y (i)

is the sequence to be smoothed,

2 R + 1

is the sliding span, and

R

is the distance from

Y (i)

to the starting point of the sliding process.

2.: $m_{11} (t)$ is removed from the original signal $x (t)$ to obtain $h_{11} (t)$ , and $s_{11} (t)$ is obtained by demodulating $h_{11} (t)$ using Equations (17) and (18):

h_{11} (t) = x (t) - m_{11} (t)

(17)

s_{11} (t) = \frac{h_{11} (t)}{a_{11} (t)}

(18)

When

a_{12} (t) = 1

,

s_{11} (t)

decomposes into a pure FM function; if

a_{12} (t) \neq 1

, repeat the above process until

a_{1 n} (t) = 1

and

s_{1 (n - 1)} (t)

is the corresponding pure FM function; see Equation (19) for details:

h_{1 n} (t) = s_{1 (n - 1)} (t) - m_{1 n} (t)

(19)

3.: The PF component of the envelope signal is the product of the envelope signal and the pure FM signal, and the above steps are repeated after stripping the PF component from the original signal to obtain the new signal until the residual component is a monotonic function. The final result of LMD is shown in Equation (20):

x (t) = \sum_{p = 1}^{k} P F_{p} (t)

(20)

where the last component is a monotonic function.

LMD decomposes the original signal into multiple high-frequency and low-frequency components without distorting the original signal in the decomposition process. If the sliding step of the sliding average algorithm is not chosen properly in the first step, the decomposition result will be greatly affected. In this paper, the original sliding average process is replaced by the three-time Hermite interpolation method to improve the decomposition accuracy of the LMD algorithm while reducing the overshoot and undershoot of the envelope. That is, after obtaining the extreme value points, the Hermite interpolation method is used to form the upper and lower envelopes, and then the other steps continue to apply the LMD without changes.

3.2.2. DBFCM Double Weights Fuzzy C-Means Algorithm

FCM Fuzzy C-means Algorithm

FCM is a soft clustering algorithm and is unlike traditional hard clustering (HCM) algorithms, which allows the same object to belong to the same cluster [34]. By optimizing the objective function to obtain the affiliation of each sample point to all class centers, the affiliation range is [0, 1] to determine the class of the sample points to achieve the purpose of automatic classification of sample data [35].

The objective function and constraints s.t. of the FCM algorithm are given in Equation (21):

\begin{array}{l} \min_{u_{i j}, c_{i}} J (u_{i j}, c_{i}) = {\sum_{i = 1}^{K} \sum_{j = 1}^{N} u_{i j}^{m} ‖x_{j} - c_{i}‖}^{2} \\ s . t . \sum_{i = 1}^{K} u_{i j} = 1, \begin{matrix}  \end{matrix} j = 1, 2, \dots \begin{matrix} , N \end{matrix} \end{array}

(21)

where

u_{i j}

denotes that the sample belongs to the cluster affiliation value,

m

represents the fuzziness,

K

is the number of clusters,

c_{i}

is the i-th clustering center, and

{‖x_{j} - c_{i}‖}^{2}

is the 2 parity of the Euclidean distance from each data point to the cluster center.

The clustering center and affiliation update equations are shown in Equations (22) to (23):

u_{i j} = \frac{1}{\sum_{l = 1}^{K} {(\frac{‖x_{j} - c_{i}‖}{‖x_{j} - c_{l}‖})}^{\frac{2}{m - 1}}}

(22)

c_{i} = \frac{\sum_{j = 1}^{N} u_{i j}^{m} x_{j}}{\sum_{j = 1}^{N} u_{i j}^{m}}

(23)

DBFCM Double Weights Fuzzy C-means Algorithm

Traditional FCM clustering is inaccurate in classification for holidays in small categories. From the date type, the power load data are unbalanced data. FCM has a “uniform effect of cluster size” issue, which affects the recognition effect and causes the algorithm to not recognize the holiday load model implied in the historical dataset [36,37]. In this paper, the DBFCM algorithm is proposed to improve the clustering performance of imbalanced data. The objective function of DBFCM utilizes the clustering volume as the weight and the weighted Euclidean distance as the metric distance.

The affiliation matrix defines the class volume, which is then introduced as a constraint s.t. into the traditional FCM algorithm objective function, as shown in Equation (24):

J_{D B F C M} = \sum_{i = 1}^{K} \sum_{j = 1}^{N} \frac{u_{i j}^{m} {‖x_{j} - c_{i}‖}^{2}}{v_{j}}

(24)

where

v_{j}

is the volume of the j-th class; see Equation (25). Constraint s.t. is shown in Equation (26):

v_{j} = \frac{\sum_{i = 1}^{N} u_{i j}}{N}

(25)

s . t . \sum_{i = 1}^{K} u_{i j} = 1, j = 1, 2, \dots, N

(26)

The objective function of DBFCM uses the clustering volume as weights, which can balance the volume of each class in the clustering process. Thus, it can compensate for the unequal interactions between classes and improve the clustering performance of traditional algorithms for unbalanced data.

\frac{\partial J_{D B F C M}}{\partial c_{i}} = \frac{\partial {\sum_{i = 1}^{N} u_{i j}^{m} ‖x_{j} - c_{i}‖}^{2} \frac{}{v_{j}}}{\partial c_{i}}

(27)

The derivative of Equation (27) with respect to the affiliation degree is constantly positive. Therefore, the Lagrange multiplier method is used to solve the affiliation degree and clustering center by using a greedy strategy and introducing constraint variables. The Lagrangian equation is given in Equation (28):

L = \sum_{j = 1}^{K} {\sum_{i = 1}^{N} u_{i j}^{m} ‖x_{j} - c_{i}‖}^{2} \frac{}{v_{j}} - \sum_{i = 1}^{N} λ_{i} (\sum_{i = 1}^{K} u_{i j} - 1)

(28)

The partial derivative of Equation (28) is found by setting the partial derivative = 0; see Equation (29):

\frac{\partial L}{\partial u_{i j}} = m u_{i j}^{m - 1} {‖x_{j} - c_{i}‖}^{2} \frac{}{v_{j}} - λ_{j}^{} = 0

(29)

The updated equations for the affiliation

u_{i j}

as well as the clustering center

c_{i}

are given in Equations (30) to (31):

u_{i j} = \frac{\frac{v_{j}}{{‖x_{j} - c_{i}‖}^{\frac{2}{m - 1}}}}{\sum_{q = 1}^{K} \frac{v_{q}}{{‖x_{j} - c_{i}‖}^{\frac{2}{m - 1}}}}

(30)

c_{i} = \frac{\sum_{j = 1}^{N} u_{i j}^{m} x_{j}}{\sum_{j = 1}^{N} u_{i j}^{m}}

(31)

The DBFCM algorithm is more biased towards small classes and avoids the problem of the FCM uniformity effect when dealing with the same sample Euclidean clustering.

2.: The load curve is characteristically weighted by introducing a weighting factor $w$ to the traditional Euclidean distance $d_{i j}$ . The DBFCM is used to perform cluster analysis on the decomposed and reconstructed load curves of LVMD to improve the optimal number of clusters.

The degree of influence of each dimensional feature of the analyzed sample on the classification result is assumed to be

w

. The feature weight coefficients of each dimension

w = {w_{1}, w_{2}, \dots, w_{n}}

are introduced to the Euclidean distance to form a weighted Euclidean distance, which controls the weights of each dimensional feature vector. The traditional Euclidean distance

d_{i j}

and the Euclidean distance

d_{i j}^{*}

with the addition of weight

w

are expressed in Equations (32) to (33):

d_{i j} = {\sqrt{{|x_{i 1} - x_{j 1}|}^{2} + {|x_{i 2} - x_{j 2}|}^{2} + \dots + {|x_{i k} - x_{j k}|}^{2}}}^{}

(32)

d_{i j}^{*} = \sqrt{w_{1} {|x_{i 1} - x_{j 1}|}^{2} + w_{2} {|x_{i 2} - x_{j 2}|}^{2} + \dots + w_{n} {|x_{i k} - x_{j k}|}^{2}}

(33)

After DBFCM adds weight

w

, the clustering center

c_{i}

does not change, but the objective function

J_{D B F C M}

, the affiliation

u_{i j}

, and the Euclidean distance from sample

x_{j}

to the clustering center

c_{i}

are all affected by the weighted Euclidean distance

d_{i j}^{*}

.

In summary, the objective function and affiliation of DBFCM are shown in Equations (32) to (33):

J_{D B F C M} = {\sum_{i = 1}^{K} \sum_{j = 1}^{N} u_{i j}^{m} {‖x_{j} - c_{i}‖}^{2}}_{*} \frac{}{v_{j}}

(34)

u_{i j}^{*} = \frac{\frac{v_{j}}{{‖x_{j} - c_{i}‖}_{*}^{\frac{2}{m - 1}}}}{\sum_{q = 1}^{K} \frac{v_{q}}{{‖x_{j} - c_{i}‖}_{*}^{\frac{2}{m - 1}}}}

(35)

3.2.3. Clustering Validity Quality Evaluation

The clustering criterion used in the clustering process is usually the sum of squared error (SSE). The analytical formula of SSE is shown in the Equation (36):

S S E (q) = {\sum_{i = 1}^{q} \sum_{x \in p_{i}} ‖c_{i} - x‖}_{2}^{2}

(36)

In the formula,

c_{i}

is the i-th cluster center, and

p_{i}

represents the aggregation of data points in the i-th cluster. As the number of clusters

q

increases, the samples are classified more accurately, and the SSE decreases. Theoretically, the smaller the SSE, the better the clustering effect. As the value of

q

increases to a certain level, the rate of the decline in SSE slows down. As shown in Figure 8, when

q = 5

, it is the inflection point of the curve. The most suitable cluster number is 5.

In this paper, two clustering validity indicators are used to assess the quality of clustering: the Calinski–Harabasz indicator (CHI) and the Davies–Bouldi indicator (DBI). The CHI is obtained by the ratio of the compactness to the separation [38]. Thus, the larger the index is, the more compact the result is. The DBI metric estimates intraclass closeness by the distance from the sample point within a class to the center of the class to which it belongs, and the distance between class centers indicates the interclass dispersion [39]. Thus, the smaller the index is, the better the effect. Equations (37) to (38) describe these metrics:

C H I = \frac{T r (B_{k})}{T r (W_{k})} \times \frac{n - k}{k - 1}

(37)

D B I = \frac{1}{k} \sum_{i = 1}^{k} \max_{j = 1 \dots k, i \neq j} \frac{s_{i} + s_{j}}{{‖ c_{i} - c_{j} ‖}_{2}^{2}}

(38)

3.3. The Proposed CNN-IVIA-BLSTM Forecasting Model

Using the CNN-IVIA-BLSTM model to forecast the power load data:

Step 1. Feature fusion extraction. CNN is used to extract the data feature.

Step 2. Forecasting. IVIA optimizes the parameters related to the BLSTM network, then uses the optimized BLSTM to predict.

Step 3. Power load forecast evaluation. The load data forecasting effect analyzed using RMSE, MAE, and MAPE.

3.3.1. CNN: Convolutional Neural Network

A CNN can automatically extract potential features between massive loads of continuous and discontinuous data to build a compressed complete feature vector for the top fully connected layer [40]. It is a hierarchical neural feedforward network and consists of a series of network layers with different functions. It mainly comprises the input layer, implied layer, and output layer. Among these, the implied layer takes the convolutional layer as the core, and the main function of the convolution layer is to extract the feature fusion of the input data as an arithmetic of the convolution layer. In addition, the CNN includes pooling layers (to reduce the number of parameters of the neural network) and the Dropout layer (to prevent data overfitting). The 1D convolutional kernel is two-dimensional and has a length and width; however, there is a sliding window in the width or height direction, and multiplication is transformed into addition. In this paper, a 1D CNN is used, with a convolutional kernel size of

3 \times 3

, a sliding window of 10, and a step size of 1. The overall structure of the convolutional network is shown in Figure 9. A convolutional block is a combination of M convolutional layers and b convergence layers. It can be stacked with N consecutive convolutional blocks followed by K fully connected layers.

3.3.2. BLSTM Bidirectional Long Short-Term Memory Network

BLSTM can remember a sequence well, can solve the dependency problem of longer time spans, and has a strong advantage in improving time series correlation. Based on the LSTM (long short-term memory) network, the forward and reverse network structures form a closed loop of information, which can better verify and correct the process error information while maintaining the bidirectional data information, which has stronger robustness [41]. The LSTM network structure is shown in Figure 10, and the relevant calculation equations are shown in (39)–(44).

I n p u t g a t e : i_{t} = σ (W_{i} x_{t} + U_{i} h_{t - 1} + b_{i})

(39)

F o r g e t g a t e : f_{t} = σ (W_{f} x_{t} + U_{f} h_{t - 1} + b_{f})

(40)

O u t p u t g a t e : o_{t} = σ (W_{0} x_{t} + U_{o} h_{t - 1} + b_{o})

(41)

N e w m e m o r y u n i t : c_{t}' = \tanh (W_{c} x_{t} + U_{c} h_{t - 1} + b_{c})

(42)

F i n a l m e m o r y u n i t : c_{t} = f_{t} \otimes c_{t - 1} + i_{t} \otimes c_{t}'

(43)

O u t p u t : h_{t} = o_{t} \otimes \tanh (c_{t})

(44)

where

W_{i}

,

W_{f}

, and

W_{o}

represent input weight vectors;

U_{i}

,

U_{f}

, and

U_{o}

represent upper output weight vectors; and

b_{i}

,

b_{f}

,

b_{o}

, and

b_{c}

are bias vectors. Sigmoid is generally selected as the excitation function for

σ

, which mainly plays a role of gating. It has an output between 0 and 1, which matches the physical definition of gating and is very close to 1 or 0 when the input is large or small, thus ensuring that the gate is open or closed. The tanh is an option to generate the new memory unit c_t′ due to a faster convergence rate with an output between −1 and 1, which coincides with the center of the feature distribution being 0 in most scenarios. The related formulas are given in Equations (45) and (46):

s i g m o i d (x) = \frac{1}{1 + e^{- x}}

(45)

\tanh (x) = \frac{e^{x} - e^{- x}}{e^{x} + e^{- x}}

(46)

LSTM can only be passed in one direction, and the unit computation of the bidirectional neural network is connected with the unidirectional one. BLSTM consists of an input layer, forward LSTM layer, reverse LSTM layer, and output layer. However, the hidden layer of the bidirectional neural network must save two values, i.e.,

A

, which is involved in the forward calculation, and

A'

, which is involved in the reverse calculation. The final output value depends on the sum. The BLSTM network structure is shown in Figure 11.

However, a common problem is that BLSTM networks have many parameters, and without proper optimization, the model may be overfitted or slow to train, resulting in inefficiency. Too few nodes in the hidden layer will cause the model to not have the necessary learning ability and information processing capability, and too many nodes will increase the complexity of the network structure and make the learning process easily fall into local minima, which makes the network slow. When the learning rate is too high, the cost function is not easy to reduce to the lowest point, it is not easy to converge at the lowest point, and the convergence effect is poor. When too much training is performed, the gradient descent process may cross the nadir, which causes the training rate to be too low. In the case of reasonable tuning parameters, the more layers and neurons there are, the higher the accuracy rate; however, this can also lead to the overfitting phenomenon, and a regularization process can be used to solve the overfitting problem. In this paper, the number of implicit layer neurons, training times, learning rates, and regularization coefficients in BLSTM are optimized using the IVIA to improve the performance of the sequence modelling task.

3.3.3. Hybrid Forecasting Model

The CNN-IVIA-BLSTM model is a combined prediction model consisting of a CNN and BLSTM. BLSTM is a commonly used model architecture of deep learning models for sequence modelling tasks, and it has achieved good performance in many tasks. The CNN is first used to extract the feature vectors consisting of load influencing factors, which can be considered a “feature extractor”, then the extracted feature vectors are used for load prediction with the BLSTM model optimized by the IVIA. The hybrid prediction model structure is shown in Figure 12, which takes full advantage of both the CNN and BLSTM network to ensure the accuracy of the power load.

The forecasting steps are as follows:

Selected information as model input.
LVMD decomposes and denoises the original sequence, and DBFCM performs clustering.
The IVIA population size N, the maximum number of iterations M, and the initial search range of the parameters (the number of neurons in the hidden layer H, training number E, learning rate $η$ , and regularization factor $L 2$ ) are set. The root mean square error ( $y_{R M S E}$ ) is used as the objective function in the optimization algorithm, and finally, the model of the influenza virus immunization algorithm coupled with the bidirectional long and short-term memory network is developed.
The 1D CNN reads the load sequence with a sliding time window of 10 and a step size of 1 for feature extraction.
m prediction models are obtained by inputting the CNN-IVIA-BLSTM prediction models for each component separately.
Finally, the predicted values of the m prediction models are combined to obtain the predicted values of the load.

3.3.4. Power Load Forecast Evaluation Indicator

Three evaluation indexes are set as

y_{M A P E}

(mean absolute percentage error, MAPE),

y_{R M S E}

(root mean square error, RMSE), and

y_{M A E}

(mean absolute error, MAE). The equations are shown in (47) to (49):

y_{M A P E} = \frac{1}{n} \sum_{i = 1}^{n} | \frac{X_{a c t} (i) - X_{p r e d} (i)}{X_{a c t} (i)} |

(47)

y_{R M S E} = \sqrt{\frac{\sum_{i = 1}^{n} {(X_{a c t} (i) - X_{p r e d} (i))}^{2}}{n}}

(48)

y_{M A E} = \frac{1}{n} \sum_{i = 1}^{n} | X_{a c t} (i) - X_{p r e d} (i) |

(49)

In the above equation, n is the total number of predictions,

X_{a c t} (i)

is the real value of the load at moment i, and

X_{p r e d} (i)

is the predicted value of the load at moment i.

4. Case Analysis

In this paper, historical power load data from 2016 to 2018 in H city were used for prediction with a sampling interval of 15 min. LSTM, BLSTM, CNN-BLSTM, LVMD-DBFCM-CNN-BLSTM, and the proposed model in this paper are selected to compare the prediction results.

4.1. Analysis of Data Processing Results

4.1.1. Data Denoising and Decomposition

The original load data are decomposed and denoised using VMD, as shown in Figure 13, and the original load data sequence is decomposed into 8 IMF components. From the figure, we can see that the IMF7–IMF8 values are small and contain more noisy data, which are removed from the original data, then the first six IMF components are used to reconstruct the data to achieve data denoising.

The second step of LVMD is to redecompose the denoised data with LMD. The denoised data are decomposed into a total of six PF components, and the decomposition results are shown in Figure 14. The values of PF1 are larger and maintain the same trend as the original data, containing the main valid information of the data. The values of PF2–PF6 are smaller, among which the periodic changes of PF2–PF4 are more obvious and PF6 is monotonically increasing, which is convenient for the prediction of each component. The randomness of the F5 sequence volatility is strong and the prediction accuracy of the F5 component is not high when making predictions.

4.1.2. Analysis of the Clustering Results

The daily load curve after LVMD-DBFCM clustering is shown in Figure 15.

The clustering algorithm proposed in this paper is compared with k-means [42], FCM [43], and DBFCM. Their comparison is displayed in Table 3 and Figure 16. From the results, it can be seen that the maximum value of the CHI metric is 1142.509 and the minimum value of the DBI metric is 1.083. The clustering validity of the method proposed in this paper is better than that of the other three methods.

4.2. Analysis of the Prediction Results

The comparison between the prediction model proposed in this paper and the four models LSTM, BLSTM, CNN-BLSTM, and LVMD-DBFCM+CNN-BLSTM using three load evaluation indices, RMSE, MAE, and MAPE, is shown in Table 4 and Figure 17. From the results, we can see that the prediction models proposed in this paper have better prediction results than the other four models in all evaluation indices.

Figure 18 shows the prediction graphs of the five prediction models. From the figure, it can be clearly seen that the pink line, representing the model proposed in this paper, fits so closely to the target curve represented by the black line, which can better reflect the trend of the target, indicating that the model proposed in this paper has the better prediction effect than addressed models. Figure 19 shows the prediction error comparison of the 5 models, from which it can be clearly seen that the model proposed in this paper, represented by yellow, has the lowest error.

5. Conclusions

In this paper, a new model based on LVMD-DBFCM load curve clustering and a CNN-IVIA-BLSTM hybrid model for power load forecasting is proposed. This comprehensive technique takes historical load data and influencing factors (meteorology, economy, and data type) into account, where historical load data, meteorological factors, and economic factors are normalized, and the data types are uniquely heat coded.

The novel LVMD-DBFCM algorithm improves the continuity and stability of the data, and the values of the CHI and DBI quality assessment indicators are 1142.509 and 1.083, respectively, both of which reflect the good validity of the clustering method used in this paper. In the new CNN-IVIA-BLSTM model, a CNN is used for feature extraction, BLSTM is used for load forecasting, and the IVIA is used to optimize the relevant parameters in the BLSTM network. The results of the three electric load forecasting evaluation metrics of the hybrid forecasting model show that the RMSE is 31.9942, the MAE is 23.3691, and the MAPE is 1.6421%. The prediction effect of the electric load fits well with the target, and the prediction error is minimized.

Author Contributions

Conceptualization, L.H., J.W., Z.G. and T.Z.; Data curation, L.H., J.W., Z.G. and T.Z.; Formal analysis, L.H., J.W. and T.Z.; Investigation, L.H; Methodology, L.H. and J.W; Software, J.W. and Z.G.; Project administration, L.H.; Resources, L.H.; Supervision, L.H.; Validation, L.H.; Visualization, L.H.; Writing—original draft, L.H., J.W., Z.G. and T.Z.; Writing—review & editing, J.W. and Z.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

Huang, Y. Medium-Long Term Power Load Forecasting of a Region. Master’s Thesis, Xihua University, Chengdu, China, 2015. [Google Scholar]
Han, F.J.; Wang, X.H.; Qiao, J. Review on Artificial Intelligence based Load Forecasting Research for the New-type Power System. Proc. CSEE 2023, 1–24. [Google Scholar]
Zhang, Y.; Wang, A.H.; Zhang, H. Overview of smart grid development in China. Power Syst. Prot. Control 2021, 49, 180–187. [Google Scholar]
Gao, D.D.; Gao, S.T. Review on medium-long term power load forecasting study. Sci. Technol. Innov. Her. 2014, 11, 25. [Google Scholar]
Shang, C.; Gao, J.; Liu, H.; Liu, F. Short-term load forecasting based on PSO-KFCM daily load curve clustering and CNN-LSTM model. IEEE Access 2021, 9, 50344–50357. [Google Scholar] [CrossRef]
Huang, J.B. Based on trend extrapolation method analysis of city complexes load development characteristicsitle of the chapter. Rural Electrif. 2019, 7, 39–42. [Google Scholar]
Song, F.; Liu, J.; Zhang, T. The Grey Forecasting Model for the Medium-and Long-Term Load Forecasting. J. Phys. Conf. Ser. 2020, 1654, 012104. [Google Scholar] [CrossRef]
Fan, G.F.; Qing, S.; Wang, H. TSupport Vector Regression Model Based on Empirical Mode Decomposition and Auto Regression for Electric Load Forecasting. Energies 2013, 6, 1887–1901. [Google Scholar] [CrossRef]
Qiu, X.P. Neural Network and Deep Learning. J. Chin. Inf. Process. 2020, 34, 4. [Google Scholar]
Jarquin, C.S.S.; Gandelli, A.; Grimaccia, F.; Mussetta, M. Short-Term Probabilistic Load Forecasting in University Buildings by Means of Artificial Neural Networks. Forecasting 2023, 5, 390–404. [Google Scholar] [CrossRef]
Liu, J.H. Power load forecasting in Shanghai based on CNN-LSTM combined model. Proc. SPIE 2022, 12254, 125–132. [Google Scholar]
Li, J.W.; Luong, T.; Dan, J. When are tree structures necessary for deep learning of representations. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; Association for Computational Linguistics: Stroudsburg, PA, USA, 2015. [Google Scholar]
Gu, H.C.; Mou, P.; Li, J.W. Modeling and application of ethylene cracking furnace based on cross-iterative BLSTM network. CIESC J. 2019, 70, 548–555. [Google Scholar]
Xu, Y.; Xiang, Y.F.; Ma, T.X. Short-term Power Load Forecasting Method Based on EMD-CNN-LSTM Hybrid Model. J. North China Electr. Power Univ. 2022, 49, 81–89. [Google Scholar]
Gao, W.C. Urban Gas Load Forecasting Based on EWT-CNN-LSTM Model. Master’s Thesis, Shanghai Normal University, Shanghai, China, 2021. [Google Scholar]
Sun, G.L.; Li, B.J.; Xu, D.M.; Li, Y.P. Monthly Runoff Prediction Model Based on VMD-SSA-LSTM. Water Resour. Power 2022, 40, 18–21. [Google Scholar]
Yang, Y. Research and Application of Medium and Long Term Load Forecasting Technology. Master’s Thesis, Shenyang Institute of Computing Technology, Chinese Academy of Science, Shenyang, China, 2021. [Google Scholar]
Mojtaba, G.; Faraji, D.I.; Ebrahim, A.; Abolfazl, R. A novel and effective optimization algorithm for global optimization and its engineering applications: Turbulent Flow of Water-based Optimization (TFWO). Eng. Appl. Artif. Intell. 2020, 92. [Google Scholar] [CrossRef]
Abdolkarim, M.B.; Mahmoud, D.N.; Adel, A. Golden eagle optimizer: A nature-inspired metaheuristic algorithm. Comput. Ind. Eng. 2021, 152, 107050. [Google Scholar]
Mohamed, A.A.; Hassan, S.A.; Hemeida, A.M.; Alkhalaf, S. Parasitism-Predation algorithm (PPA): A novel approach for feature selection. Ain Shams Eng. J. 2019, 11, 293–308. [Google Scholar] [CrossRef]
Dhiman, G.; Garg, M.; Nagar, A.; Kumar, V. A novel algorithm for global optimization: Rat Swarm Optimizer. J. Ambient Intell. Humaniz. Comput. 2020, 12, 8457–8482. [Google Scholar] [CrossRef]
Mirjalili, S.; Mirjalili, S.M.; Lewis, A. Grey Wolf Optimizer. Adv. Eng. Softw. 2014, 69, 46–61. [Google Scholar] [CrossRef] [Green Version]
Ramya, S.; Rajesh, N.B.; Viswanathan, B. Particle Swarm Optimization (PSO) based optimum Distributed Generation (DG) location and sizing for Voltage Stability and Loadability Enhancement in Radial Distribution System. Int. Rev. Autom. Control IREACO 2014, 7, 288–293. [Google Scholar]
Jhila, N.; Modarres, K.F.; Akiko, Y. A whale optimization algorithm (WOA) approach for clustering. Cogent Math. Stat. 2018, 5, 1483565. [Google Scholar]
Karol, S.; Ewelina, H.; Katarzyna, L.; Katarzyna, K. Spread of Influenza Viruses in Poland and Neighboring Countries in Seasonal Terms. Pathogens 2021, 10, 316. [Google Scholar]
Oxford, J.S. Special Article: What Is the True Nature of Epidemic Influenza Virus and How Do New Epidemic Viruses Spread. Epidemiol. Infect. 1987, 99, 1–3. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Zhu, Z.W. Study on the Impact of Weather Factors on Characteristics of Electric Load. Ph.D. Thesis, Zhejiang University, Hangzhou, China, 2008. [Google Scholar]
Xu, F.; Weng, G.Q. Research on Load Forecasting Based on CNN-LSTM Hybrid Deep Learning Model. In Proceedings of the 2022 IEEE 5th International Conference on Electronics Technology (ICET), Chengdu, China, 13–16 May 2022; pp. 1332–1336. [Google Scholar]
Hu, Y.C. Research on Power Load Pattern Recognition Method Based on Improved k-means Clustering Algorithm. Master’s Thesis, University of Electronic Science and Technology of China, Chengdu, China, 2018. [Google Scholar]
Zhang, H.; Zhang, Y.; Xu, Z. Thermal Load Forecasting of an Ultra-short-term Integrated Energy System Based on VMD-CNN-LSTM. In Proceedings of the 2022 International Conference on Big Data, Information and Computer Network (BDICN), Sanya, China, 20–22 January 2022; pp. 264–269. [Google Scholar]
Dragomiretskiy, K.; Zosso, D. Variational mode decomposition. IEEE Trans. Signal Process. 2014, 62, 531–544. [Google Scholar] [CrossRef]
Yang, S.S.; Zhou, H.; Zhao, H.Y. Fault diagnosis method for the bearing of reciprocating compressor based on LMD multiscale entropy and SVM. J. Mech. Transm. 2015, 39, 119–123. [Google Scholar]
Wu, D.S.; Yang, Q.; Zhang, J.Y. Ensemble fault diagnosis method based on VMD-LMD-CNN. Bearing 2020, 10, 57–63. [Google Scholar]
Li, P.S.; Li, X.R.; Chen, H.H. The characteristics classification and synthesis of power load based on fuzzy clustering. Proc. CSEE 2005, 25, 73–78. [Google Scholar]
Zhou, K.L. Theoretical and Applied Research on Fuzzy C-means Clustering and Its Cluster Validation. Ph.D. Thesis, Hefei University of Technology, Hefei, China, 2014. [Google Scholar]
Dong, T. Research on Electrical Load Pattern Recognition and Load Forecasting Based on Deep Learning. Master’s Thesis, Jilin University, Changchun, China, 2022. [Google Scholar]
Ma, Z.B.; Xu, S.A.; Zhu, S.B. Power Load Classification Based on Feature Weighted Fuzzy Clustering. ELECTRIC POWER 2022, 55, 25–32. [Google Scholar]
Tomasini, C.; Emmendorfer, L.; Borges, E.N. A methodology for selecting the most suitable cluster validation internal indices. In Proceedings of the 31st Annual ACM Symposium on Applied Computing (SAC ′16), Pisa, Italy, 4–8 April 2016; pp. 901–903. [Google Scholar]
Hao, X.H.; Li, Y.L.; Gu, Q. Power load data clustering algorithm based on DTW histogram. Transducer Microsyst. Technol. 2020, 39, 140–142. [Google Scholar]
Liu, Y.H.; Zhao, Q. Ultra-short-term power load forecasting based on cluster empirical mode decomposition of CNN-LSTM. Power Syst. Technol. 2021, 45, 4444–4451. [Google Scholar]
Zhu, H.S. Design and Implementation of CNN-BLSTM Speech Separation Algorithm Fused With Self-attention Mechanism. Master’s Thesis, Hebei University of Science and Technology, Shijiazhuang, China, 2021. [Google Scholar]
Wang, J.D.; Gu, Z.C.; Ge, L.J. Load Clustering Characteristic Analysis of the Distribution Network Based on the Combined Improved Firefly Algorithm and K-means Algorithm. J. Tianjin Univ. Sci. Technol. 2023, 56, 137–147. [Google Scholar]
Pan, X.G. Research of Fuzzy Clustering Algorithm on Complicated Data and Feature Weight Learning Techniques. Ph.D. Thesis, Jiangnan University, Wuxi, China, 2022. [Google Scholar]

Figure 1. Diagram of the process of influenza virus population transmission.

Figure 2. Diagram of the virus cell exchange process.

Figure 3. The proportion of immune individuals in the whole population.

Figure 4. Growth curve of the number of infected people.

Figure 5. Standard test function optimization result.

Figure 6. The overall structure design of this paper.

Figure 7. Distribution of whole load data in the dataset for 15 min. (a) Data from 00:00 to 12:00; (b) Data from 12:00 to 23:45. Box-whisker plot, the dots replace the abnormal value, the middle line of the box is he median of the data, which represents the average level of the sample data.

Figure 8. The gradient graph of SSE with increasing cluster number

q

from 1 to 10.

Figure 8. The gradient graph of SSE with increasing cluster number

q

from 1 to 10.

Figure 9. Overall structure of a CNN.

Figure 10. LSTM network structure diagram.

Figure 11. BLSTM network structure diagram.

Figure 12. Hybrid forecasting model structure diagram.

Figure 13. Result of VMD decomposition.

Figure 14. Result of LMD decomposition.

Figure 15. (a–e) Daily load curve after LVMD-DBFCM clustering when the number of clusters is determined to be 5.

Figure 16. Comparative graph displaying the clustering validity indicators, CHI and DBI, of the K-means, FCM, DBFCM, and LVMD-DBFCM methods.

Figure 17. Comparison of the RMSE, MAE, and MAPE results for 5 models’ predictions.

Figure 18. Comparison chart of the prediction results of 5 models.

Figure 19. Comparison of prediction errors of the 5 models.

Table 1. Standard test function.

Standard	Dimension	Search Space	Minimum
$F_{1} (x) = \sum_{i = 1}^{n} x_{i}^{2}$	30	[−100, 100]	0
$F_{2} (x) = \sum_{i = 1}^{n} \|x_{i}\| + \prod_{i = 1}^{n} \|x_{i}\|$	30	[−10, 10]	0
$F_{3} (x) = {\sum_{i = 1}^{n} (\sum_{j = 1}^{n} x_{j})}^{2}$	30	[−100, 100]	0
$F_{4} (x) = \max_{i} \{\|x_{i}\|, 1 \leq i \leq n\}$	30	[−100, 100]	0
$F_{5} (x) = \sum_{i = 1}^{n - 1} [100 (x_{i + 1} - x_{i}^{2}) + {(x_{i} - 1)}^{2}]$	30	[−30, 30]	0
$F_{6} (x) = \sum_{i = 1}^{n} {([x_{i} + 0.5])}^{2}$	30	[−100, 100]	0
$F_{7} (x) = \sum_{i = 1}^{n} i x_{i}^{4} + r a n d o m [0, 1)$	30	[−128, 128]	0
$F_{8} (x) = \sum_{i = 1}^{n} - x_{i} \sin (\sqrt{\|x_{i}\|})$	30	[−500, 500]	−12,569.5
$F_{9} (x) = \sum_{i = 1}^{n} [x_{i}^{2} - 10 \cos (2 π x_{i}) + 10]$	30	[−5.12, 5.12]	0
$F_{10} (x) = - 20 \exp (- 0.2 \sqrt{\frac{1}{n} \sum_{i = 1}^{n} x_{i}^{2}}) - \exp (\frac{1}{n} \sum_{i = 1}^{n} \cos (2 π x_{i})) + 20 + e$	30	[−32, 32]	0
$F_{11} (x) = \frac{1}{4000} \sum_{i = 1}^{n} x_{i}^{2} - \prod_{i = 1}^{n} \cos (\frac{x_{i}}{\sqrt{i}}) + 1$	30	[−600, 600]	0
$\begin{matrix} F_{12} (x) = \frac{π}{n} \{10 \sin (π y_{1}) + \sum_{i = 1}^{n - 1} {(y_{i} - 1)}^{2} [1 + 10 \sin^{2} (π y_{i + 1})] + {(y_{n} - 1)}^{2}\} \\ + \sum_{i = 1}^{n} u (x_{i}, 10, 100, 4) \\ y_{i} = 1 + \frac{x_{i} + 1}{4} u (x_{i}, a, k, m) = \{\begin{array}{l} k {(x_{i} - a)}^{m} x_{i} > a \\ 0 - a < x_{i} < a \\ k {(- x_{i} - a)}^{m} x_{i} < a \end{array} \end{matrix}$	30	[−50, 50]	0
$\begin{array}{l} F_{13} (x) = 0.1 \{\sin^{2} (3 π x_{i}) + \sum_{i = 1}^{n} \begin{array}{l} {(x_{i} - 1)}^{2} [1 + \sin^{2} (3 π x_{i} + 1)] + \\ {(x_{n} - 1)}^{2} [1 + \sin^{2} (2 π x_{n})] \end{array}\} \\ + \sum_{i = 1}^{n} u (x_{i}, 5, 100, 4) \end{array}$	30	[−50, 50]	0
$F_{14} (x) = {[\frac{1}{500} + \sum_{j = 1}^{25} \frac{1}{j + \sum_{i = 1}^{2} {(x_{i} - a_{i j})}^{6}}]}^{- 1}$	2	[−65.56, 65.56]	1
$F_{15} (x) = {\sum_{i = 1}^{11} [a_{i} - \frac{x_{1} (b_{i}^{2} + b_{i} x_{2})}{b_{i}^{2} + b_{i} x_{3} + x_{4}}]}^{2}$	4	[−5, 5]	0.0003075
$F_{16} (x) = 4 x_{1}^{2} - 2.1 x_{1}^{4} + \frac{1}{3} x_{1}^{6} + x_{1} x_{2} - 4 x_{2}^{2} + 4 x_{2}^{4}$	2	[−5, 5]	−1.0316
$F_{17} (x) = {(x_{2} - \frac{5.1}{4 π^{2}} x_{1}^{2} + \frac{5}{π} x_{1} - 6)}^{2} + 10 (1 - \frac{1}{8 π}) \cos x_{1} + 10$	2	[−5, 5]	0.398
$\begin{array}{l} F_{18} (x) = & [1 + {(x_{1} + x_{2} + 1)}^{2} (19 - 14 x_{1} + 3 x_{1}^{2} - 14 x_{2} + 6 x_{1} x_{2} + 3 x_{2}^{2})] \times \\ \begin{matrix} \end{matrix} & [30 + {(2 x_{1} - 3 x_{2})}^{2} (18 - 32 x_{1} + 12 x_{1}^{2} + 48 x_{2} - 36 x_{1} x_{2} + 27 x_{2}^{2})] \end{array}$	2	[−2, 2]	3
$F_{19} (x) = - \sum_{i = 1}^{4} c_{i} \exp (- \sum_{j = 1}^{3} a_{i j} {(x_{j} - p_{i j})}^{2})$	3	[1, 3]	−3.86
$F_{20} (x) = - \sum_{i = 1}^{4} c_{i} \exp (- \sum_{j = 1}^{6} a_{i j} {(x_{j} - p_{i j})}^{2})$	6	[0, 1]	−3.32
$\begin{array}{l} F_{21} (x) = - {\sum_{i = 1}^{10} (\sum_{j = 1}^{5} {(x_{j} - C_{j i})}^{2} + β_{i})}^{- 1} \\ β = \frac{1}{10} (1, 2, 2, 4, 4, 6, 3, 7, 5, 5) \\ C = [\begin{matrix} 4.0 & 1.0 & 8.0 & 6.0 & 3.0 & 2.0 & 5.0 & 8.0 & 6.0 & 7.0 \\ 4.0 & 1.0 & 8.0 & 6.0 & 7.0 & 9.0 & 3.0 & 1.0 & 2.0 & 3.6 \\ 4.0 & 1.0 & 8.0 & 6.0 & 3.0 & 2.0 & 5.0 & 8.0 & 6.0 & 7.0 \\ 4.0 & 1.0 & 8.0 & 6.0 & 7.0 & 9.0 & 3.0 & 1.0 & 2.0 & 3.6 \end{matrix}] \end{array}$	4	[0, 10]	−10.1532
$F_{22} (x) = - {\sum_{i = 1}^{7} (\sum_{j = 1}^{5} {(x_{j} - C_{j i})}^{2} + β_{i})}^{- 1}$	4	[0, 10]	−10.4028
$F_{23} (x) = - {\sum_{i = 1}^{10} (\sum_{j = 1}^{4} {(x_{j} - C_{j i})}^{2} + β_{i})}^{- 1}$	4	[0, 10]	−10.5364

Table 2. Results of the testing functions optimized by different algorithms.

F		IVIA	TFWO	GEO	PPA	RSO	GWO	PSO	WOA
	optimal	0.000 × 10⁰	0.000 × 10⁰	5.988 × 10⁻⁶	0.000 × 10⁰	0.000 × 10⁰	0.000 × 10⁰	8.428 × 10⁻⁵	0.000 × 10⁰
F1	average	1.820 × 10⁻²⁴⁸	4.900 × 10⁻¹	2.136 × 10⁻¹	1.598 × 10⁻⁶	9.458 × 10⁻¹⁰²	1.985 × 10⁻⁴⁷	6.789 × 10⁻¹²	9.486 × 10⁻⁷¹
	worst	1.245 × 10⁻¹⁶²	3.500 × 10⁻¹¹	5.121 × 10⁻¹	2.185 × 10⁻¹	1.169 × 10⁻¹³	8.942 × 10⁻¹⁰	8.628 × 10⁻¹⁸	6.839 × 10⁻²⁷
	optimal	0.000 × 10⁰	0.000 × 10⁰	0.000 × 10⁰	0.000 × 10⁰	0.000 × 10⁰	6.843 × 10⁻³⁶	6.880 × 10⁻¹⁰	1.642 × 10⁻¹⁰⁴
F2	average	9.840 × 10⁻⁵³	1.249 × 10⁻⁸	4.878 × 10⁻⁶	9.785 × 10⁻⁷	1.854 × 10⁻³⁶	6.842 × 10⁻¹³	6.874 × 10⁻⁷	4.859 × 10⁻⁴⁷
	worst	2.350 × 10⁻³⁰	1.895 × 10⁻⁶	1.236 × 10⁻⁴	8.545 × 10⁻⁵	1.561 × 10⁻¹⁸	6.842 × 10⁻⁶	6.420 × 10⁻⁴	6.465 × 10⁻²⁸
	optimal	0.000 × 10⁰	2.855 × 10⁻¹	1.565 × 10¹	1.585 × 10⁻⁷	1.265 × 10⁻⁶	6.872 × 10⁻³	4.265 × 10⁻²	1.063 × 10³
F3	average	4.185 × 10⁻¹¹	1.522 × 10⁰	5.016 × 10²	1.855 × 10⁻⁵	1.295 × 10⁻⁶	3.000 × 10¹	2.326 × 10¹	3.064 × 10³
	worst	1.855 × 10⁻⁸	5.489 × 10⁰	1.000 × 10⁵	1.855 × 10⁻³	1.486 × 10⁻³	1.014 × 10²	6.019 × 10¹	1.036 × 10⁴
	optimal	9.459 × 10⁻⁷⁷	1.125 × 10⁰	2.153 × 10⁰	2.126 × 10⁰	3.165 × 10⁻³	2.894 × 10⁻²⁴	8.486 × 10⁻²	3.492 × 10⁻¹
F4	average	5.895 × 10⁻³⁴	2.459 × 10⁰	4.153 × 10⁰	3.124 × 10⁰	2.989 × 10⁰	1.987 × 10⁻¹⁴	1.425 × 10⁰	9.482 × 10⁻¹
	worst	2.349 × 10⁻¹⁹	3.157 × 10⁰	8.166 × 10⁰	5.156 × 10⁰	6.362 × 10⁰	6.850 × 10⁻⁷	6.895 × 10⁰	3.843 × 10⁰
	optimal	9.646 × 10⁻¹⁰	2.360 × 10⁻¹	1.465 × 10¹	1.946 × 10⁰	1.235 × 10⁰	2.169 × 10⁰	9.892 × 10⁻¹	1.979 × 10⁰
F5	average	5.419 × 10⁻⁸	6.523 × 10⁰	5.988 × 10²	1.249 × 10¹	1.565 × 10¹	1.249 × 10¹	1.468 × 10¹	9.162 × 10⁰
	worst	1.242 × 10⁻²	2.104 × 10¹	1.275 × 10⁴	1.991 × 10¹	1.035 × 10²	5.616 × 10¹	2.317 × 10¹	3.216 × 10¹
	optimal	2.355 × 10⁻¹⁸	3.985 × 10⁻⁴	8.852 × 10⁻²	1.289 × 10⁻³	1.965 × 10⁻⁶	3.687 × 10⁻¹	7.985 × 10⁻¹⁰	6.485 × 10⁻²
F6	average	9.475 × 10⁻¹⁰	9.125 × 10⁻¹	8.049 × 10⁻²	1.597 × 10⁻¹	1.547 × 10⁻¹	9.006 × 10⁻¹	7.864 × 10⁻⁷	9.717 × 10⁻¹
	worst	9.855 × 10⁻⁵	1.836 × 10⁰	3.598 × 10⁰	1.099 × 10⁰	6.218 × 10⁻¹	2.663 × 10⁰	6.843 × 10⁻²	2.097 × 10⁰
	optimal	1.968 × 10⁻⁷	0.000 × 10⁰	3.942 × 10⁻²	3.975 × 10⁻⁶	4.885 × 10⁻³	3.946 × 10⁻³	9.847 × 10⁻⁵	3.550 × 10⁻⁵
F7	average	2.495 × 10⁻³	1.698 × 10⁻¹	3.965 × 10⁻¹	3.735 × 10⁻¹	1.125 × 10⁻²	8.972 × 10⁻¹	8.885 × 10⁻¹	4.165 × 10⁻¹
	worst	6.515 × 10⁻¹	1.317 × 10⁰	1.569 × 10⁰	1.032 × 10⁰	3.339 × 10⁻¹	1.398 × 10⁰	1.763 × 10⁰	1.223 × 10⁰
	optimal	−1.256 × 10⁴	−1.024 × 10⁴	−5.663 × 10³	−1.124 × 10⁴	−1.035 × 10⁴	−9.176 × 10³	−1.034 × 10⁴	−1.224 × 10⁴
F8	average	−1.156 × 10⁴	−9.563 × 10³	−4.983 × 10³	−8.832 × 10³	−8.865 × 10³	−5.863 × 10³	−6.856 × 10³	−8.936 × 10³
	worst	−1.055 × 10⁴	−7.934 × 10³	−3.527 × 10³	−7.295 × 10³	−5.235 × 10³	−4.330 × 10³	−5.845 × 10³	−7.532 × 10³
	optimal	0.000 × 10⁰	4.946 × 10⁻⁶	9.533 × 10⁻⁷	0.000 × 10⁰	1.968 × 10⁻²²	6.427 × 10⁻³	6.848 × 10⁻⁸	7.842 × 10⁻²⁸
F9	average	5.649 × 10⁻⁵	1.568 × 10⁻³	3.798 × 10⁻³	1.765 × 10⁻³	1.765 × 10⁻⁶	3.499 × 10⁻⁵	3.492 × 10⁻⁴	9.787 × 10⁻⁹
	worst	3.476 × 10⁻²	1.065 × 10⁰	9.833 × 10⁻¹	3.177 × 10⁻²	1.685 × 10⁻¹	1.297 × 10⁰	1.379 × 10⁻¹	3.790 × 10⁻³
	optimal	9.852 × 10⁻²¹	4.682 × 10⁻¹¹	1.649 × 10⁻⁴	6.842 × 10⁻⁸	9.845 × 10⁻⁶	5.400 × 10⁻¹⁶	6.463 × 10⁻⁵	6.842 × 10⁻¹⁵
F10	average	1.855 × 10⁻⁸	4.190 × 10⁻³	1.515 × 10⁻¹	6.850 × 10⁻³	9.815 × 10⁻²	8.943 × 10⁻³	3.163 × 10⁻³	8.463 × 10⁻⁸
	worst	1.642 × 10⁻⁵	1.684 × 10⁻¹	1.331 × 10⁰	1.146 × 10⁰	3.855 × 10⁻¹	6.315 × 10⁻¹	9.198 × 10⁻¹	6.318 × 10⁻³
	optimal	5.989 × 10 ⁻⁹	9.847 × 10⁻⁵	5.646 × 10⁻³	1.685 × 10⁻⁷	9.946 × 10⁻⁵	3.489 × 10⁻²	3.965 × 10⁻⁴	3.787 × 10⁻⁹
F11	average	3.500 × 10⁻⁸	3.490 × 10⁻¹	3.654 × 10¹	4.846 × 10⁻⁴	1.654 × 10⁻³	3.326 × 10⁻¹	1.038 × 10⁰	9.779 × 10⁻⁵
	worst	1.985 × 10⁻⁴	2.165 × 10⁰	1.068 × 10²	3.168 × 10⁰	5.486 × 10⁻¹	1.235 × 10⁰	1.380 × 10⁰	3.480 × 10⁻³
	optimal	5.648 × 10⁻¹²	2.165 × 10⁻⁶	6.846 × 10¹	6.546 × 10⁻⁷	6.468 × 10⁻⁷	4.893 × 10⁻⁸	4.896 × 10⁻¹³	1.687 × 10⁻³
F12	average	6.942 × 10⁻⁸	1.656 × 10⁻³	6.845 × 10²	8.646 × 10⁻⁵	6.579 × 10⁻⁴	7.893 × 10⁸	4.987 × 10⁻¹⁰	6.447 × 10⁻¹
	worst	1.685 × 10⁻⁴	9.924 × 10⁻²	1.007 × 10⁴	1.634 × 10⁻¹	6.846 × 10⁻²	3.589 × 10⁻³	4.983 × 10⁻⁷	1.349 × 10⁰
	optimal	6.115 × 10⁻⁸	4.216 × 10⁻⁵	2.647 × 10¹	5.242 × 10⁻²	5.464 × 10⁻⁴	4.685 × 10⁻⁴	4.165 × 10⁻⁷	4.198 × 10⁻⁸
F13	average	4.359 × 10⁻²⁰	6.410 × 10⁻³	1.464 × 10²	5.825 × 10⁰	1.351 × 10⁻²	3.199 × 10⁻³	4.147 × 10⁻⁴	4.168 × 10⁻⁶
	worst	4.642 × 10⁻⁸	1.416 × 10⁰	1.001 × 10⁴	2.353 × 10¹	3.861 × 10⁰	2.056 × 10⁰	1.325 × 10⁰	9.650 × 10⁻³
	optimal	1.000 × 10⁰	1.000 × 10⁰	1.001 × 10⁰	1.024 × 10⁰	1.115 × 10⁰	1.259 × 10⁰	1.000 × 10⁰	1.000 × 10⁰
F14	average	1.001 × 10⁰	1.674 × 10⁰	1.009 × 10⁰	2.196 × 10⁰	1.351 × 10⁰	1.950 × 10⁰	1.350 × 10⁰	1.268 × 10⁰
	worst	1.007 × 10⁰	2.419 × 10⁰	2.486 × 10⁰	1.117 × 10¹	2.652 × 10⁰	1.286 × 10¹	2.149 × 10⁰	1.635× 10⁸
	optimal	3.000 × 10⁻⁴	5.419 × 10⁻⁴	1.684 × 10⁻³	6.554 × 10⁻⁴	3.948 × 10⁻⁴	9.157 × 10⁻⁴	9.165 × 10⁻³	3.917 × 10⁻⁴
F15	average	3.425 × 10⁻⁴	8.646 × 10⁻³	1.682 × 10⁻³	6.834 × 10⁻³	6.422 × 10⁻²	2.925 × 10⁻³	6.117 × 10⁻³	9.164 × 10⁻³
	worst	6.453 × 10⁻³	6.550 × 10⁻¹	1.954 × 10⁻¹	9.846 × 10⁻²	9.648 × 10⁻¹	1.896 × 10⁻¹	1.430 × 10⁻²	9.168 × 10⁻²
	optimal	−1.032 × 10⁰	−9.186 × 10⁻¹	−1.021 × 10⁰	−1.001 × 10⁰	−9.425 × 10⁻¹	−9.168 × 10⁻¹	−8.250 × 10⁻²	−1.000 × 10⁰
F16	average	−1.000 × 10⁰	−8.634 × 10⁻¹	−9.648 × 10⁻¹	−9.545 × 10⁻¹	−8.217 × 10⁻¹	−2.198 × 10⁻¹	−1.625 × 10⁻⁸	−5.491 × 10⁻¹
	worst	4.618 × 10⁻¹	−5.462 × 10⁻¹	−6.139 × 10⁻¹	−8.316 × 10⁻¹	−7.717 × 10⁻¹	1.198 × 10⁻¹	1.032 × 10⁰	−1.493 × 10⁻¹
	optimal	3.981 × 10⁻³	4.856 × 10⁻²	4.404 × 10⁻¹	4.040 × 10⁻¹	3.990 × 10⁻²	4.299 × 10⁻¹	4.399 × 10⁻¹	3.992 × 10⁻¹
F17	average	4.040 × 10⁻²	4.000 × 10⁻²	6.004 × 10⁻¹	6.040 × 10⁻¹	8.036 × 10⁻¹	6.498 × 10⁻¹	9.413 × 10⁻¹	1.065 × 10⁰
	worst	5.033 × 10⁻¹	9.911 × 10⁻¹	1.032 × 10⁰	2.030 × 10⁰	1.603 × 10⁰	2.064 × 10⁰	2.319 × 10⁰	1.916 × 10⁰
	optimal	3.000 × 10⁰	3.005 × 10⁰	3.016 × 10⁰	3.149 × 10⁰	3.001 × 10⁰	3.198 × 10⁰	3.616 × 10⁰	3.264 × 10⁰
F18	average	3.015 × 10⁰	3.147 × 10⁰	3.656 × 10⁰	3.648 × 10⁰	3.004 × 10⁰	3.616 × 10⁰	4.086 × 10⁰	3.949 × 10⁰
	worst	3.146 × 10⁰	3.142 × 10⁰	4.411 × 10⁰	4.245 × 10⁰	3.133 × 10⁰	4.691 × 10⁰	4.620 × 10⁰	5.017 × 10⁰
	optimal	−3.860 × 10⁰	−3.861 × 10⁰	−3.195 × 10⁰	−3.619 × 10⁰	−3.812 × 10⁰	−3.852 × 10⁰	−3.801 × 10⁰	−3.859 × 10⁰
F19	average	−3.677 × 10⁰	−3.852 × 10⁰	−2.870 × 10⁰	−3.595 × 10⁰	−3.816 × 10⁰	−3.742 × 10⁰	−3.550 × 10⁰	−3.726 × 10⁰
	worst	−3.165 × 10⁰	−3.832 × 10⁰	−8.456 × 10⁻¹	−3.455 × 10⁰	−3.795 × 10⁰	−3.516 × 10⁰	−2.949 × 10⁰	−3.030 × 10⁰
	optimal	−3.320 × 10⁰	−3.146 × 10⁰	−3.307 × 10⁰	−3.032 × 10⁰	−2.919 × 10⁰	−1.982 × 10⁰	−2.098 × 10⁰	−3.156 × 10⁰
F20	average	−2.996 × 10⁰	−2.920 × 10⁰	−3.019 × 10⁰	−2.533 × 10⁰	−1.398 × 10⁰	−1.089 × 10⁰	−9.497 × 10⁻¹	−1.979 × 10⁰
	worst	−2.649 × 10⁰	−2.520 × 10⁰	−2.120 × 10⁰	1.298 × 10⁰	−5.945 × 10⁻¹	1.685 × 10⁻¹	−6.430 × 10⁻²	−1.189 × 10⁰
	optimal	−1.015 × 10¹	−1.007 × 10¹	−1.002 × 10¹	−1.015 × 10¹	−8.103 × 10⁰	−1.002 × 10¹	−9.162 × 10⁰	−1.006 × 10¹
F21	average	−9.122 × 10⁰	−9.004 × 10⁰	−9.654 × 10⁰	−1.012 × 10¹	−4.137 × 10⁰	−8.981 × 10⁰	−7.198 × 10⁰	−9.682 × 10⁰
	worst	−7.596 × 10⁰	−5.642 × 10⁰	−8.514 × 10⁰	−9.632 × 10⁰	−6.517 × 10⁻¹	−1.192 × 10⁰	−1.192 × 10⁰	−8.336 × 10⁰
	optimal	−1.040 × 10¹	−1.032 × 10¹	−1.035 × 10¹	−1.040 × 10¹	−1.039 × 10¹	−1.032 × 10¹	−9.198 × 10⁰	−1.006 × 10¹
F22	average	−1.015 × 10¹	−8.250 × 10⁰	−1.015 × 10¹	−5.365 × 10⁰	−3.349 × 10⁰	−9.195 × 10⁰	−6.927 × 10⁰	−8.198 × 10⁰
	worst	−1.002 × 10¹	−4.820 × 10⁰	−9.671 × 10⁰	−2.389 × 10⁻¹	8.924 × 10⁻²	−4.928 × 10⁰	−1.162 × 10⁰	−1.265 × 10⁻¹
	optimal	−1.052 × 10¹	−9.525 × 10⁰	−1.051 × 10¹	−8.449 × 10⁰	−1.046 × 10¹	−1.000 × 10¹	−9.198 × 10⁰	−9.062 × 10⁰
F23	average	−1.025 × 10¹	−6.489 × 10⁰	−1.050 × 10¹	−4.168 × 10⁰	−6.265 × 10⁰	−5.984 × 10⁰	−3.198 × 10⁻¹	−6.894 × 10⁰
	worst	−9.219 × 10⁰	−1.220 × 10⁰	−9.647 × 10⁰	−1.608 × 10⁻¹	−2.487 × 10⁻¹	−1.963 × 10⁻¹	1.625 × 10⁻²	−1.820 × 10⁰

Table 3. The comparative table displaying the clustering validity indicators, CHI and DBI, of the K-means, FCM, DBFCM, and LVMD-DBFCM methods.

Methods	CHI	DBI
K-means	721.201	1.549
FCM	814.014	1.401
DBFCM	1024.743	1.216
LVMD-DBFCM	1142.509	1.083

Table 4. The comparative table displaying the clustering validity indicators, CHI and DBI, of the K-means, FCM, DBFCM, and LVMD-DBFCM methods.

Models	RMSE	MAE	MAPE
LSTM	101.4817	80.0980	5.7087%
BLSTM	83.1498	61.4826	4.4460%
CNN-BLSTM	79.3340	58.0245	4.1782%
LVMD-DBFCM+CNN-BLSTM	57.7316	42.3669	2.9946%
Proposed	31.9942	23.3691	1.6421%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Hu, L.; Wang, J.; Guo, Z.; Zheng, T. Load Forecasting Based on LVMD-DBFCM Load Curve Clustering and the CNN-IVIA-BLSTM Model. Appl. Sci. 2023, 13, 7332. https://doi.org/10.3390/app13127332

AMA Style

Hu L, Wang J, Guo Z, Zheng T. Load Forecasting Based on LVMD-DBFCM Load Curve Clustering and the CNN-IVIA-BLSTM Model. Applied Sciences. 2023; 13(12):7332. https://doi.org/10.3390/app13127332

Chicago/Turabian Style

Hu, Linjing, Jiachen Wang, Zhaoze Guo, and Tengda Zheng. 2023. "Load Forecasting Based on LVMD-DBFCM Load Curve Clustering and the CNN-IVIA-BLSTM Model" Applied Sciences 13, no. 12: 7332. https://doi.org/10.3390/app13127332

APA Style

Hu, L., Wang, J., Guo, Z., & Zheng, T. (2023). Load Forecasting Based on LVMD-DBFCM Load Curve Clustering and the CNN-IVIA-BLSTM Model. Applied Sciences, 13(12), 7332. https://doi.org/10.3390/app13127332

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Load Forecasting Based on LVMD-DBFCM Load Curve Clustering and the CNN-IVIA-BLSTM Model

Abstract

1. Introduction

2. Influenza Virus Immunity Optimization Algorithm

2.1. Biological Characteristics

2.2. Mathematical Model

2.3. Algorithm Testing

3. Methodology

3.1. Self-Built Dataset

3.1.1. Dataset Visualization

3.1.2. Data Preprocessing

3.2. LVMD-DBFCM Imbalanced Data Clustering

3.2.1. LVMD Double Decomposition Modal Strategy

3.2.2. DBFCM Double Weights Fuzzy C-Means Algorithm

3.2.3. Clustering Validity Quality Evaluation

3.3. The Proposed CNN-IVIA-BLSTM Forecasting Model

3.3.1. CNN: Convolutional Neural Network

3.3.2. BLSTM Bidirectional Long Short-Term Memory Network

3.3.3. Hybrid Forecasting Model

3.3.4. Power Load Forecast Evaluation Indicator

4. Case Analysis

4.1. Analysis of Data Processing Results

4.1.1. Data Denoising and Decomposition

4.1.2. Analysis of the Clustering Results

4.2. Analysis of the Prediction Results

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI