Load Forecasting Based on LVMD-DBFCM Load Curve Clustering and the CNN-IVIA-BLSTM Model

Abstract: Power load forecasting plays an important role in power systems, and the accuracy of load forecasting is of vital importance to power system planning as well as economic efficiency. Power load data are nonsmooth, nonlinear, noisy time-series data. Traditional load forecasting has low accuracy, its curves do not fit the load variation well, and a single forecasting model does not predict it well. In this paper, we propose a novel model based on the combination of data mining and deep learning to improve the prediction accuracy. First, data preprocessing is performed: anomalous data are identified and corrected, continuous sequences are normalized, and discrete sequences are one-hot encoded. The load data are decomposed and denoised using the double decomposition modal (LVMD) strategy, the load curves are clustered using the double weighted fuzzy C-means (DBFCM) algorithm, and the typical curves obtained are used as load patterns. In addition, data feature analysis is performed: a convolutional neural network (CNN) is used to extract data features, and a bidirectional long short-term memory (BLSTM) network is used for prediction, in which the number of hidden layer neurons, the number of training epochs, the learning rate, the regularization coefficient, and other relevant parameters of the BLSTM network are optimized using the influenza virus immunity optimization algorithm (IVIA). Finally, the historical data of City H from 1 January 2016 to 31 December 2018 are used for load forecasting. The experimental results show that the novel model based on LVMD-DBFCM load curve clustering combined with CNN-IVIA-BLSTM proposed in this paper achieves an error of only 2% for electric load forecasting.


Introduction
As quality of life has improved, the demand for power energy has also risen, and thus, the requirements for power generation and transmission and the use of electricity have increased [1]. Power load forecasting applies machine learning methods to mine the key factors affecting load from historical data, such as weather and time data, to build load forecasting models [2]. Load forecasting is the basis for ensuring the balance of power supply and demand. Accurate forecasting results can reduce the pressure on transmission and distribution links and facilitate the optimal scheduling of power transmission links. They can effectively reduce power generation costs and improve economic and social benefits [3,4].
Traditional forecasting methods use the overall historical load numbers as their basis [5][6][7][8][9]. Trend extrapolation predicts future loads based on the historical trends of the predictor variables; it predicts near-future loads with high accuracy, but the error gradually increases with time. The regression analysis method predicts future load changes by establishing a functional relationship between variables; it has high fitting ability but large error when predicting future changes. Time series extrapolation (the autoregressive [...] optimizes the function better and finds the optimal value of the function in fewer iterations. Then, the IVIA is applied to the BLSTM training of the power load forecasting model to optimize the number of hidden layer neurons, the number of training epochs, the learning rate, the regularization coefficient, and other relevant parameters, improving the accuracy of the forecasting model.

Biological Characteristics
The influenza virus is very contagious, spreads quickly, and has many routes of transmission [25]. The spread of the virus in the population can be achieved through droplet transmission and physical contact, and the rate of spread is related to the social distance between people [26]. The vast majority of the population can recover on their own after infection, and those who recover produce the corresponding antibodies in their bodies and have immune functions. A small number of people with underlying diseases, such as elderly people or those with weak resistance, are at risk. When the number of immune individuals in the population exceeds 80%, the population has herd immunity, thus preventing the next round of disease transmission. A schematic diagram of the transmission process of the influenza virus in a population is shown in Figure 1. The population is divided into infected individuals, highly susceptible individuals, susceptible individuals, safe individuals, and absolutely safe individuals.
Extremely susceptible individuals are direct contacts of infected individuals, who are socially close to infected individuals and are infected by getting influenza virus from infected individuals; susceptible individuals are indirect contacts of infected individuals who are socially distant from the infected individuals, may be infected through third parties as vectors, and have a higher risk of infection; safe individuals are not in contact with infected individuals or other individuals, and because the influenza virus is transmitted as a vector, through a change in social distance, safe individuals may become susceptible individuals or infected individuals; absolutely safe individuals have immune functions through vaccination or healing after infection and are not susceptible to infection, regardless of the contact distance of infected individuals. In this paper, the influenza virus immunization algorithm is proposed through the transmission of the influenza virus in a population and the immunization mechanism. To map the spread process of the influenza virus with the optimization algorithm, the following assumptions are made:
1. The initial number of infected individuals is small, and they are randomly distributed in the population. The category to which an individual currently belongs and the corresponding update formula are determined based on the distance between the individual and the infected individual;
2. The overall number of the population is kept constant, and absolutely safe individuals randomly move to different locations in the population. The number and location of other individuals change dynamically with the number of iterations;
3. Individuals in the population have three states: infected, uninfected, and immune. Infected individuals have immunity after infecting other individuals and will not be infected in the following process. A population is immune when 80% or more of the individuals in the population are immune;
4. Individuals are infected by contracting virus cells from an infected person, and the process of virus cell exchange is shown in Figure 2. The influenza virus cells and the diseased cells are taken as the smallest units within an individual and represent the dimension of the optimization problem.
Immunized individuals: Individuals who are categorized as immunized are protected against the virus and are not affected by infected individuals.
Figure 3 shows the proportion of immune individuals in the entire population for the 23 sets of standard test functions run by the IVIA. Except for functions F4, F7, F9, F16, F21, and F23, all of the remaining 17 tested functions stop the iterative search process when the population immunity rate reaches 0.8. In this paper, the minimum population immunity rate for the IVIA is therefore set at 0.8, i.e., the search stops when the number of immunized individuals in the population exceeds 80% of the total. This ensures that the algorithm achieves high-quality optimization results.
In this paper, we propose the IVIA based on the transmission and immunization process of the influenza virus. The influenza virus spreads rapidly in a population, and the growth curve of the cumulative number of infections is shown in Figure 4. Curve ① indicates that the number of infected individuals in the population increases rapidly within 2~3 weeks; curves ② and ③ indicate that the number of infected individuals peaks at 7~8 weeks and then gradually starts to decline; and curve ④ indicates that, near the end of the influenza outbreak, the number of infected individuals gradually decreases and the number of immunized individuals increases. Accordingly, the IVIA has a good optimization capability at the beginning of the iteration. Figure 4 is a schematic diagram summarized according to the epidemic transmission rules; the website of the epidemic transmission rules is shown below.

Mathematical Model
The process of influenza virus transmission in a population is mapped with the influenza virus immunization algorithm. The population is represented by the matrix X, as shown in Equation (1), where n is the population size and the number of individuals, and d is the number of cells invaded by the influenza virus into the human body, corresponding to the dimensionality of the objective function. The initialization process of the algorithm randomly generates an n × d dimension matrix X, which sets the initial number of infected individuals to x.
The fitness values of different individuals in the population are represented by Equation (2). Each row in the matrix is represented as the fitness value of the current individual. The individual corresponding to the optimal fitness value is the current optimal solution of the objective function.
The distance between individuals is represented by C, as shown in Equation (3). x = {x 1 , x 2, . . . , x d } is the position of individual x, y = {y 1 , y 2 , . . . , y d } is the position of individual y, and d is the dimension of the optimization problem.
L is the maximum safety distance, as shown in Equation (4). (L min , L max ) represents the range of values for individuals.
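As a concrete sketch of Equations (3) and (4), the distance-based state classification can be written in Python; the function name, array layout, and state labels below are our own illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def classify_by_distance(population, infected_idx, L):
    """Classify individuals by their Euclidean distance C to the nearest
    infected individual, using the thresholds 0.2L, 0.5L, 0.8L, and L.
    `population` is the n x d matrix X; `L` is the maximum safety distance.
    (Illustrative sketch; names and boundary handling are our own.)"""
    infected = population[infected_idx]                      # positions of infected individuals
    # C = ||x - y||_2 to every infected individual, Equation (3); keep the nearest
    diff = population[:, None, :] - infected[None, :, :]
    C = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
    states = np.empty(len(population), dtype=object)
    states[C < 0.2 * L] = "infected"
    states[(C >= 0.2 * L) & (C < 0.5 * L)] = "extremely_susceptible"
    states[(C >= 0.5 * L) & (C < 0.8 * L)] = "susceptible"
    states[(C >= 0.8 * L) & (C < L)] = "safe"
    states[C >= L] = "absolutely_safe"
    return C, states
```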
The virus spreading process corresponds to the iterative optimization-seeking process in the algorithm. It is assumed that the safe distance between other individuals and the infected individuals is L.
• Infected individuals are updated by Equation (5): When the contact distance between individuals in the population and infected individuals is C < 0.2L, the individuals move into the infected state, and their positions are updated by Equation (5). N max is the maximum number of iterations, n is the current number of iterations, d is the dimensionality of the optimization problem, and q is used to regulate the individual position update rate. The larger the value of q, the faster the position update rate and the higher the accuracy of the optimization search.

• Extremely susceptible individuals are updated by Equation (6): When the contact distance between individuals in the population and infected individuals is 0.2L < C < 0.5L, the position is updated by Equation (6). x i,q (t) is the random acquisition of viral cells by highly susceptible individuals, and α represents the update coefficient at the contact distance of 0.2L < C < 0.5L. When the number of exchanged cells exceeds a quarter of the total number of cells, the highly susceptible individuals are transformed into infected individuals. Cell exchange occurs only once during each iteration, and the number of exchanged cells is random.

• Susceptible individuals are updated by Equation (7): When the contact distance between individuals in the population and infected individuals is 0.5L < C < 0.8L, the position is updated by Equation (7). x i,v (t) is the random acquisition of viral cells by susceptible individuals, β represents the update coefficient at the contact distance of 0.5L < C < 0.8L, and ε is a random number in the range 0~1. When the number of exchanged cells exceeds one-third of the total number of cells, the susceptible individuals are transformed into infected individuals.

• Safe individuals are updated by Equation (8): When the contact distance between individuals in the population and infected individuals is 0.8L < C < L, the position is updated by Equation (8). γ is a random number in the range −1~1. There is no cellular exchange between safe individuals and infected individuals. However, as the number of infected individuals increases, the contact distance between safe individuals and infected individuals may decrease, and safe individuals may become susceptible.

• Absolutely safe individuals are updated by Equation (9): Absolutely safe individuals are immune, and their contact distance to infected individuals is C > L; λ represents the update coefficient at the contact distance of C > L. They randomly change their positions during the iterative process, which can be used to escape the current search region when the algorithm falls into a local optimum.
Stop criterion of the IVIA: The algorithm terminates when the number of immune individuals in the population exceeds 80% of the population or when the maximum number of iterations is reached. Therefore, the optimal individual when the population reaches immunity corresponds to the optimal solution of the optimization problem, and the corresponding function value is the optimal value of the optimization problem. If the problem to be solved requires higher accuracy, the population immunity rate in Algorithm 1 can be increased. The pseudo-code of the IVIA is presented below (Algorithm 1):
4: Calculate fitness value and sort
5: for i = 1:n
6:   if (C < 0.2L) then
7:     Use Equation (5) to update the location of the infected individual
8:     Record the current status of the individual
9:   else if (0.2L < C < 0.5L) then
10:    Use Equation (6) to update the location of the highly susceptible individual
11:    Record the current status of the individual
12:   else if (0.5L < C < 0.8L) then
13:    Use Equation (7) to update the location of the susceptible individual
14:    Record the current status of the individual
15:   else if (0.8L < C < L) then
16:    Use Equation (8) to update the position of the safe individual
17:    Record the current status of the individual
18:   else
19:    Use Equation (9) to update the position of the absolutely safe individual
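The control flow of Algorithm 1 can be sketched as a Python loop. Since Equations (5)-(9) are not reproduced in closed form here, simple stand-in moves replace the exact per-state updates; only the structure (distance thresholds, immunity bookkeeping, and the 80% immunity stopping rule) follows the paper, and all names are our own.

```python
import numpy as np

def ivia_sketch(fitness, lb, ub, n=30, d=2, n_max=1000, L_frac=0.5, R=0.8, seed=0):
    """Structural sketch of the IVIA main loop (Algorithm 1). The per-state
    position updates of Equations (5)-(9) are replaced by simple stand-ins."""
    rng = np.random.default_rng(seed)
    L = L_frac * ub                                # maximum safe contact distance, L = 0.5*ub
    X = rng.uniform(lb, ub, size=(n, d))           # population matrix X, Equation (1)
    immune = np.zeros(n, dtype=bool)
    best_x, best_f = None, np.inf
    for _ in range(n_max):
        f = np.apply_along_axis(fitness, 1, X)     # fitness of each individual, Equation (2)
        i_best = int(np.argmin(f))
        if f[i_best] < best_f:
            best_f, best_x = f[i_best], X[i_best].copy()
        C = np.linalg.norm(X - X[i_best], axis=1)  # distance to the current infected/best
        step = (ub - lb) * 0.01
        for i in range(n):
            if immune[i]:                          # absolutely safe: random relocation (cf. Eq. (9))
                X[i] = rng.uniform(lb, ub, size=d)
            elif C[i] < 0.2 * L:                   # infected: move toward best (stand-in for Eq. (5))
                X[i] += 0.5 * (best_x - X[i])
                immune[i] = True                   # infected individuals gain immunity
            else:                                  # susceptible/safe: small random walk (cf. Eqs. (6)-(8))
                X[i] += rng.uniform(-step, step, size=d)
        X = np.clip(X, lb, ub)
        if immune.mean() >= R:                     # stop when >=80% of the population is immune
            break
    return best_x, best_f
```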

Algorithm Testing
To test the actual optimization effect of the proposed IVIA, it was evaluated on 23 sets of standard test functions. The optimization results were compared with those of turbulent flow of water-based optimization (TFWO), golden eagle optimization (GEO), the parasitism-predation algorithm (PPA), the rat swarm optimizer (RSO), gray wolf optimization (GWO), particle swarm optimization (PSO), and the whale optimization algorithm (WOA). Table 1 shows the expression of each test function, the dimension of the optimization problem, and the search range; the last column gives the expected minimum value of the test function, i.e., the theoretical optimum. During testing, the IVIA population size n is 30, the maximum number of iterations N max is 1000, the maximum safe contact distance L is 0.5ub, and the population immunity rate R is 80%. The dimensions of the test functions and parameters such as the search bounds lb and ub are set according to Table 1. The population sizes and iteration numbers of the other optimization algorithms are the same as those of the IVIA.

The viability of the proposed IVIA is tested using 23 well-known benchmark functions with different sizes and complexity. Figure 5 shows the three-dimensional graphs of the standard test functions and their corresponding test results. The convergence of the IVIA is very good for both single-peaked and multipeaked functions.
From the test results in Table 2, the accuracy of the IVIA for function finding is much higher than that of several other optimization algorithms, and the theoretical optimal value can be found for some functions. IVIA has fewer iterations, a shorter running time, and faster convergence compared to the other seven algorithms.
In summary, the IVIA offers fast convergence, good robustness, and a strong optimization search capability. In this paper, the IVIA is therefore selected to optimize the relevant parameters of the BLSTM network, such as the number of neurons in the hidden layer, the number of training epochs, the learning rate, and the regularization coefficient.

Methodology
In this paper, we propose a method for power load forecasting based on LVMD-DBFCM load curve clustering and the CNN-IVIA-BLSTM model, as shown in Figure 6. First, data preprocessing is performed, and the influencing factors of the load are divided into continuous and discrete series. The continuous sequences, which include load data, meteorological data (temperature, wind speed, and humidity), and economic factors, are normalized. One-hot encoding is applied to the discrete sequences, which include the day types of workdays, weekends, and holidays. Load data decomposition and denoising with LVMD are performed on the preprocessed data. Then, the power load curves are clustered using the DBFCM algorithm. Features of the processed data are extracted with a 1D CNN, and prediction is finally performed with the IVIA-BLSTM model. Figure 6. The overall structure design of this paper.

Self-Built Dataset
We aim to study the electrical load of a particular city or region, for which such datasets are rare. Data from the internet were collected and organized to produce the HS dataset. The city is referred to as H City because the electricity data relate to the economic performance of the country. The dataset includes 1096 days of electricity load data from 1 January 2016 to 31 December 2018 in H City. The dataset was divided such that 80% is used as the training set and 20% as the test set.
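The 80/20 split described above can be sketched as a simple chronological cut (illustrative only; the HS dataset itself is not public, and the function name is our own):

```python
def chronological_split(series, train_frac=0.8):
    """Split a load series into train/test sets in time order, as done with
    the 1096-day H City dataset (80% train, 20% test)."""
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]
```

A time-ordered cut, rather than a random shuffle, keeps the test set strictly in the future of the training set, which matches the forecasting task.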
The effects of the three factors of temperature, humidity, and wind speed on the power load are as follows. In summer, temperatures are high, and as air-conditioner usage increases, the power load increases. When the relative humidity is in the sensitive range of 40~95%, the meteorologically sensitive load varies significantly with relative humidity, decreasing as relative humidity increases. When the wind speed is in the sensitive range of 2~6 m/s, the weather-sensitive load changes significantly with wind speed, increasing as wind speed increases [27].

Dataset Visualization
The overall power load data from 2016-2018 are presented in Figure 7. The box-whisker plot reflects the difficulty of prediction through the high number of data outliers in the morning from 6:00 to 8:00 and in the evening from 17:00 to 18:00 and 19:45 to 22:00. This illustrates the importance of the data preprocessing, clustering, and prediction models that follow in this paper.


• Abnormal data recognition and correction
Data preprocessing is first performed to resolve data outliers before clustering and prediction is performed [28].
Missing data: Lagrange interpolation is used to fill in missing values. Data duplication: duplicate records are removed by similarity. Data mutation: abrupt changes are detected using the 3σ criterion.

• Data normalization
Normalizing the values before clustering reduces the adverse effect that differently scaled attributes have on the distances used in the clustering process. Normalization eliminates the influence of the magnitude of the load data on the distances in the clustering analysis, and thus the information on the load pattern is highlighted [29].
Data normalization: uniform metrics are applied to the data. Z-score normalization is shown in Equation (10); after normalization, the load data are scaled to the interval [−1, 1].
Data preprocessing: the equation for normalizing continuous data is given in Equation (11). One-hot encoding is used for discrete sequences: each state has a unique register bit, only one bit is 1, and the rest are 0.
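The two preprocessing steps can be sketched in Python; the day-type categories follow the paper, while the function names are our own:

```python
import numpy as np

def z_score(x):
    """Z-score normalization (Equation (10)): zero mean, unit variance."""
    return (x - x.mean()) / x.std()

def one_hot(labels, categories=("workday", "weekend", "holiday")):
    """One-hot encode the discrete day-type sequence: one register bit per
    state, exactly one bit set to 1 per sample."""
    idx = {c: i for i, c in enumerate(categories)}
    out = np.zeros((len(labels), len(categories)), dtype=int)
    for row, lab in enumerate(labels):
        out[row, idx[lab]] = 1
    return out
```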

LVMD-DBFCM Imbalanced Data Clustering
The goal of load clustering is to mine the typical daily load model [30].
Step 1. De-noising. The load data are decomposed and denoised using the LVMD (in Section 3.2.1).
Step 2. Clustering. The load data are clustered using the DBFCM (in Section 3.2.2).
Step 3. Clustering evaluation. The load data clustering effect is analyzed using the SSE evaluation index (in Section 3.2.3).

LVMD Double Decomposition Modal Strategy
• VMD Variational Mode Decomposition
VMD [31] is an iterative process that searches for the optimal solution of a variational model, determining each mode u_k(t) together with its corresponding central frequency w_k and bandwidth. VMD has good denoising capabilities. The VMD denoising steps are as follows:
4. Update λ, as shown in Equation (12).
After decomposition, the component that contains the minimum time information is judged to be random noise and is eliminated. The data are then reconstructed from the remaining modal components; the reconstructed data are the load data without noise.
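The elimination-and-reconstruction step can be sketched as follows, assuming the VMD modes u_k(t) are already available as rows of an array (the decomposition itself is not reproduced, and the function name is our own):

```python
import numpy as np

def reconstruct_without_noise(modes, noise_index):
    """Denoising by reconstruction: drop the mode judged to be random noise
    and sum the remaining modal components. `modes` is a (K, T) array whose
    rows are the VMD components u_k(t)."""
    keep = [k for k in range(len(modes)) if k != noise_index]
    return np.sum(modes[keep], axis=0)
```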

• LMD Local Mean Decomposition
LMD is a time-frequency analysis method that adaptively decomposes complex signals into a finite sum of product function (PF) components. Each PF component is a single-component AM-FM signal. LMD has good feature decomposition abilities [32,33]. The LMD principle is as follows.

1. First, find all the extreme points contained in the data series x(t). Assuming that the distribution of extreme points is {n_1, n_2, n_3, ...}, calculate the mean m_i and envelope a_i of the adjacent extreme points according to Equations (14) and (15).
2. The local mean function curve m_11(t) and the envelope function curve a_11(t) are obtained by connecting the m_i and a_i with line segments and smoothing them, respectively; the sliding average formula is given in Equation (16), where Y(i) is the sequence to be smoothed, 2R + 1 is the sliding span, and R is the distance from Y(i) to the starting point of the sliding window.
The PF component is the product of the envelope signal and the pure FM signal; the above steps are repeated after stripping the PF component from the original signal to obtain a new signal, until the residual component is a monotonic function. The final result of LMD is shown in Equation (20), where the last component is a monotonic function. LMD decomposes the original signal into multiple high-frequency and low-frequency components without distorting the original signal during decomposition. However, if the sliding span of the sliding average algorithm is not chosen properly in the first step, the decomposition result is greatly affected. In this paper, the original sliding average process is replaced by cubic Hermite interpolation to improve the decomposition accuracy of the LMD algorithm while reducing the overshoot and undershoot of the envelope. That is, after the extreme points are obtained, Hermite interpolation is used to form the upper and lower envelopes, and the other steps of LMD then continue unchanged.
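The sliding average of Equation (16) can be sketched in Python; the edge handling (truncated windows at the boundaries) is our own assumption:

```python
import numpy as np

def sliding_average(Y, R):
    """Moving average of Equation (16): each point is replaced by the mean of
    the window of span 2R+1 centred on it (edges use the available samples)."""
    n = len(Y)
    out = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - R), min(n, i + R + 1)
        out[i] = np.mean(Y[lo:hi])
    return out
```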

DBFCM Double Weights Fuzzy C-Means Algorithm
• FCM Fuzzy C-Means Algorithm
FCM is a soft clustering algorithm; unlike traditional hard clustering (HCM) algorithms, it allows the same object to belong to multiple clusters with different degrees of membership [34]. By optimizing the objective function, the membership of each sample point to every cluster center is obtained, with membership values in the range [0, 1]; the class of each sample point is thus determined, achieving automatic classification of the sample data [35].
The objective function and the constraint s.t. of the FCM algorithm are given in Equation (21), where u_ij denotes the membership value of the sample in the cluster, m represents the fuzziness, K is the number of clusters, c_i is the i-th cluster center, and ||x_j − c_i||^2 is the squared Euclidean distance from each data point to the cluster center. The cluster center and membership update equations are shown in Equations (22) and (23).
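The standard FCM update equations (Equations (22) and (23)) can be sketched in Python; the function name and array layout are our own assumptions:

```python
import numpy as np

def fcm_update(X, U, m=2.0):
    """One FCM iteration: centre update (Equation (22)) and membership update
    (Equation (23)). X is (n_samples, n_features); U is (K, n_samples) with
    columns summing to 1."""
    W = U ** m
    centers = (W @ X) / W.sum(axis=1, keepdims=True)                  # Equation (22)
    d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)   # d_ij
    d = np.fmax(d, 1e-12)                                             # guard against division by zero
    inv = d ** (-2.0 / (m - 1.0))
    U_new = inv / inv.sum(axis=0, keepdims=True)                      # Equation (23)
    return centers, U_new
```

Iterating these two updates until the memberships stop changing is the whole FCM algorithm.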

• DBFCM Double Weights Fuzzy C-Means Algorithm
Traditional FCM clustering classifies holidays, which form a small class, inaccurately. In terms of date type, the power load data are imbalanced. FCM suffers from a "uniform effect of cluster size" issue, which degrades recognition and prevents the algorithm from identifying the holiday load model implied in the historical dataset [36,37]. In this paper, the DBFCM algorithm is proposed to improve the clustering performance on imbalanced data. The objective function of DBFCM uses the cluster volume as a weight and the weighted Euclidean distance as the metric distance.

1. The membership matrix defines the class volume, which is then introduced as a constraint s.t. into the objective function of the traditional FCM algorithm, as shown in Equation (24), where v_j is the volume of the j-th class; see Equation (25). The constraint s.t. is shown in Equation (26). The objective function of DBFCM uses the cluster volumes as weights, which balances the volume of each class in the clustering process; it thus compensates for the unequal interactions between classes and improves the clustering performance of the traditional algorithm on imbalanced data.
The derivative of Equation (27) with respect to the membership degree is always positive. Therefore, the Lagrange multiplier method is used to solve for the membership degrees and cluster centers, using a greedy strategy and introducing the constraint variables. The Lagrangian is given in Equation (28). Setting the partial derivatives of Equation (28) to zero yields Equation (29). The update equations for the membership u_ij and the cluster center c_i are given in Equations (30) and (31). The DBFCM algorithm is thus more biased toward small classes and avoids the uniform-effect problem of FCM when clustering the same samples with the Euclidean distance.

2.
The load curve is characteristically weighted by introducing a weighting factor w to the traditional Euclidean distance d ij . The DBFCM is used to perform cluster analysis on the decomposed and reconstructed load curves of LVMD to improve the optimal number of clusters.
The degree of influence of each dimensional feature of the analyzed sample on the classification result is assumed to be w. The feature weight coefficients of each dimension, w = {w_1, w_2, · · · , w_n}, are introduced into the Euclidean distance to form a weighted Euclidean distance, which controls the weight of each dimensional feature vector. The traditional Euclidean distance d_ij and the Euclidean distance d*_ij with the addition of weight w are expressed in Equations (32) to (33). After DBFCM adds the weight w, the clustering center c_i does not change, but the objective function J_DBFCM, the affiliation u_ij, and the Euclidean distance from sample x_j to the clustering center c_i are all affected by the weighted Euclidean distance d*_ij. In summary, the objective function and affiliation of DBFCM are shown in Equations (34) to (35):
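The weighted distance of Equation (33) can be sketched as below. The exact placement of the weights is an assumption here (weights applied to the squared per-dimension differences, one common form), since the extracted text does not show the formula:

```python
import numpy as np

def weighted_euclidean(x, c, w):
    """Weighted Euclidean distance d*_ij: each feature dimension k
    contributes w_k * (x_k - c_k)^2 to the squared distance."""
    return np.sqrt(np.sum(w * (x - c) ** 2))
```

With all weights equal to 1 this reduces to the traditional Euclidean distance d_ij; raising a dimension's weight increases its influence on the clustering.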

Clustering Validity Quality Evaluation
The clustering criterion used in the clustering process is usually the sum of squared error (SSE). The analytical formula of SSE is shown in the Equation (36): In the formula, c i is the i-th cluster center, and p i represents the aggregation of data points in the i-th cluster. As the number of clusters q increases, the samples are classified more accurately, and the SSE decreases. Theoretically, the smaller the SSE, the better the clustering effect. As the value of q increases to a certain level, the rate of the decline in SSE slows down. As shown in Figure 8, when q = 5, it is the inflection point of the curve. The most suitable cluster number is 5.
In this paper, two clustering validity indicators are used to assess the quality of clustering: the Calinski-Harabasz indicator (CHI) and the Davies-Bouldin indicator (DBI). The CHI is obtained as the ratio of the compactness to the separation [38]; thus, the larger the index, the more compact the result. The DBI metric estimates intraclass closeness by the distance from the sample points within a class to the center of the class to which they belong, and the distance between class centers indicates the interclass dispersion [39]; thus, the smaller the index, the better the effect. Equations (37) to (38) describe these metrics:
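The two indices of Equations (37)-(38) can be computed directly from a labeling. The sketch below follows the usual definitions (between/within dispersion ratio for CHI; worst-case similarity ratio over cluster pairs for DBI) and is an illustration, not the evaluation code used in the paper:

```python
import numpy as np

def chi_dbi(X, labels):
    """Calinski-Harabasz and Davies-Bouldin indices for a clustering."""
    ks = np.unique(labels)
    k, n = len(ks), len(X)
    mean = X.mean(axis=0)
    centers = np.array([X[labels == c].mean(axis=0) for c in ks])
    sizes = np.array([(labels == c).sum() for c in ks])
    # between-cluster and within-cluster dispersion
    B = (sizes * ((centers - mean) ** 2).sum(axis=1)).sum()
    W = sum(((X[labels == c] - centers[i]) ** 2).sum()
            for i, c in enumerate(ks))
    chi = (B / (k - 1)) / (W / (n - k))
    # average intra-cluster distance of samples to their own center
    s = np.array([np.linalg.norm(X[labels == c] - centers[i], axis=1).mean()
                  for i, c in enumerate(ks)])
    dbi = 0.0
    for i in range(k):
        dbi += max((s[i] + s[j]) / np.linalg.norm(centers[i] - centers[j])
                   for j in range(k) if j != i)
    return chi, dbi / k
```

On well-separated, compact clusters CHI is large and DBI is small, matching the interpretation given above.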

The Proposed CNN-IVIA-BLSTM Forecasting Model
Using the CNN-IVIA-BLSTM model to forecast the power load data: Step 1. Feature fusion extraction. A CNN is used to extract the data features.
Step 2. Forecasting. The IVIA optimizes the parameters related to the BLSTM network; the optimized BLSTM is then used for prediction.
Step 3. Power load forecast evaluation. The load data forecasting effect is analyzed using RMSE, MAE, and MAPE.


CNN: Convolutional Neural Network
A CNN can automatically extract potential features from massive amounts of continuous and discontinuous load data to build a compressed, complete feature vector for the top fully connected layer [40]. It is a hierarchical feedforward neural network consisting of a series of network layers with different functions, mainly comprising the input layer, hidden layers, and output layer. Among the hidden layers, the convolutional layer is the core, and its main function is to extract and fuse features from the input data. In addition, the CNN includes pooling layers (to reduce the number of parameters of the neural network) and a Dropout layer (to prevent overfitting). Unlike a 2D kernel, which has both a length and a width, a 1D convolutional kernel slides in a single direction along the sequence, and the window multiplications are summed. In this paper, a 1D CNN is used, with a convolutional kernel size of 3 × 3, a sliding window of 10, and a step size of 1. The overall structure of the convolutional network is shown in Figure 9. A convolutional block is a combination of M convolutional layers and b pooling layers; N consecutive convolutional blocks can be stacked, followed by K fully connected layers.
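The sliding-window arithmetic of a 1D convolutional layer can be illustrated in a few lines. The size-3 difference kernel below is a made-up example for exposition, not a learned filter from the paper's network:

```python
import numpy as np

def conv1d(x, kernel, stride=1):
    """Valid 1D convolution (cross-correlation, as in CNN layers):
    slide the kernel over the sequence with the given stride and
    sum the elementwise products in each window."""
    k = len(kernel)
    out = [np.dot(x[i:i + k], kernel)
           for i in range(0, len(x) - k + 1, stride)]
    return np.array(out)

# a size-3 kernel with step size 1; this filter responds to local trend
load = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
edges = conv1d(load, np.array([-1.0, 0.0, 1.0]))  # -> [2. 2. 2. 2.]
```

In a real CNN layer many such kernels are learned jointly, and pooling layers then downsample the resulting feature maps.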

BLSTM Bidirectional Long Short-Term Memory Network
BLSTM can remember a sequence well, can solve the dependency problem of longer time spans, and has a strong advantage in capturing time series correlation. Based on the LSTM (long short-term memory) network, the forward and reverse network structures form a closed loop of information, which can better verify and correct process error information while maintaining the bidirectional data information, giving stronger robustness [41]. The LSTM network structure is shown in Figure 10, and the relevant calculation equations are shown in (39)-(44).

Forget gate: f_t = σ(W_f x_t + U_f h_{t−1} + b_f); the input and output gates are computed analogously as i_t = σ(W_i x_t + U_i h_{t−1} + b_i) and o_t = σ(W_o x_t + U_o h_{t−1} + b_o). New memory unit: c̃_t = tanh(W_c x_t + U_c h_{t−1} + b_c). Final memory unit: c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t, with hidden output h_t = o_t ⊙ tanh(c_t). Here W_i, W_f, and W_o represent input weight vectors; U_i, U_f, and U_o represent upper output weight vectors; and b_i, b_f, b_o, and b_c are bias vectors. Sigmoid is generally selected as the excitation function for σ, which mainly plays a gating role: its output lies between 0 and 1, matching the physical definition of gating, and it is very close to 1 or 0 when the input is large or small, thus ensuring that the gate is open or closed. The tanh is chosen to generate the new memory unit c̃_t due to its faster convergence rate, with an output between −1 and 1, which coincides with the center of the feature distribution being 0 in most scenarios. The related formulas are given in Equations (45) and (46).
LSTM can only be passed in one direction, whereas the unit computation of the bidirectional neural network connects with the unidirectional one. BLSTM consists of an input layer, a forward LSTM layer, a reverse LSTM layer, and an output layer. The hidden layer of the bidirectional network must save two values: A, which is involved in the forward calculation, and A′, which is involved in the reverse calculation. The final output value depends on their sum. The BLSTM network structure is shown in Figure 11.
However, a common problem is that BLSTM networks have many parameters, and without proper optimization, the model may be overfitted or slow to train, resulting in inefficiency. Too few nodes in the hidden layer will leave the model without the necessary learning ability and information processing capability, while too many nodes will increase the complexity of the network structure and make the learning process easily fall into local minima, which slows the network. When the learning rate is too high, the cost function does not easily descend to, or converge at, the lowest point, and the convergence effect is poor. When too much training is performed, the gradient descent process may cross the nadir, which causes the training rate to be too low. With reasonable tuning, more layers and neurons yield a higher accuracy rate; however, this can also lead to overfitting, and regularization can be used to solve the overfitting problem. In this paper, the number of hidden layer neurons, the number of training epochs, the learning rate, and the regularization coefficient in BLSTM are optimized using the IVIA to improve the performance of the sequence modelling task.
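The gate equations (39)-(44) and the forward/reverse sum described above can be sketched in plain NumPy. The parameter-dictionary layout, random initialization, and hidden size are assumptions for illustration; this is a single untrained cell, not the optimized network from the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, P):
    """One LSTM step: gates from current input x_t and previous state."""
    f = sigmoid(P["Wf"] @ x_t + P["Uf"] @ h_prev + P["bf"])      # forget gate
    i = sigmoid(P["Wi"] @ x_t + P["Ui"] @ h_prev + P["bi"])      # input gate
    o = sigmoid(P["Wo"] @ x_t + P["Uo"] @ h_prev + P["bo"])      # output gate
    c_new = np.tanh(P["Wc"] @ x_t + P["Uc"] @ h_prev + P["bc"])  # new memory
    c = f * c_prev + i * c_new   # final memory unit
    h = o * np.tanh(c)           # hidden output
    return h, c

def blstm_outputs(xs, P, H):
    """Run a forward and a reverse LSTM pass and sum the aligned
    hidden states (the value A plus the value A' in the text)."""
    fwd, h, c = [], np.zeros(H), np.zeros(H)
    for x in xs:
        h, c = lstm_step(x, h, c, P)
        fwd.append(h)
    bwd, h, c = [], np.zeros(H), np.zeros(H)
    for x in reversed(xs):
        h, c = lstm_step(x, h, c, P)
        bwd.append(h)
    return [a + b for a, b in zip(fwd, reversed(bwd))]
```

Sharing one parameter set P between the two directions keeps the sketch short; in practice the forward and reverse layers have separate weights.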

Hybrid Forecasting Model
The CNN-IVIA-BLSTM model is a combined prediction model consisting of a CNN and BLSTM. BLSTM is a commonly used deep learning architecture for sequence modelling tasks and has achieved good performance in many of them. The CNN is first used to extract the feature vectors consisting of load influencing factors, acting as a "feature extractor"; the extracted feature vectors are then used for load prediction with the BLSTM model optimized by the IVIA. The hybrid prediction model structure is shown in Figure 12, which takes full advantage of both the CNN and the BLSTM network to ensure the accuracy of the power load forecast.
The forecasting steps are as follows:
1.
The selected information is used as the model input.

3.
The IVIA population size N, the maximum number of iterations M, and the initial search range of the parameters (the number of neurons in the hidden layer H, training number E, learning rate η, and regularization factor L2) are set. The root mean square error (y RMSE ) is used as the objective function in the optimization algorithm, and finally, the model of the influenza virus immunization algorithm coupled with the bidirectional long and short-term memory network is developed.

4.
The 1D CNN reads the load sequence with a sliding time window of 10 and a step size of 1 for feature extraction.

5.
m prediction models are obtained by inputting each of the m components separately into the CNN-IVIA-BLSTM prediction model.

6.
Finally, the predicted values of the m prediction models are combined to obtain the predicted values of the load.

Power Load Forecast Evaluation Indicator
Three evaluation indexes are set as y_MAPE (mean absolute percentage error, MAPE), y_RMSE (root mean square error, RMSE), and y_MAE (mean absolute error, MAE). The equations are shown in (47) to (49):

y_MAPE = (1/n) ∑_{i=1}^{n} |(X_act(i) − X_pred(i)) / X_act(i)| × 100% (47)

y_RMSE = sqrt((1/n) ∑_{i=1}^{n} (X_act(i) − X_pred(i))²) (48)

y_MAE = (1/n) ∑_{i=1}^{n} |X_act(i) − X_pred(i)| (49)

In the above equations, n is the total number of predictions, X_act(i) is the real value of the load at moment i, and X_pred(i) is the predicted value of the load at moment i.
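The three indices translate directly into code; a minimal NumPy sketch:

```python
import numpy as np

def y_rmse(actual, pred):
    """Root mean square error."""
    return np.sqrt(np.mean((actual - pred) ** 2))

def y_mae(actual, pred):
    """Mean absolute error."""
    return np.mean(np.abs(actual - pred))

def y_mape(actual, pred):
    """Mean absolute percentage error, in percent."""
    return np.mean(np.abs((actual - pred) / actual)) * 100.0
```

MAPE is scale-free, which makes it convenient for comparing models across load levels, but it is undefined wherever the actual load is zero.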

Case Analysis
In this paper, historical power load data from 2016 to 2018 in H city were used for prediction with a sampling interval of 15 min. LSTM, BLSTM, CNN-BLSTM, LVMD-DBFCM-CNN-BLSTM, and the proposed model in this paper are selected to compare the prediction results.

Data Denoising and Decomposition
The original load data are decomposed and denoised using VMD, as shown in Figure 13, and the original load data sequence is decomposed into 8 IMF components. From the figure, we can see that the IMF7-IMF8 values are small and contain more noisy data, which are removed from the original data, then the first six IMF components are used to reconstruct the data to achieve data denoising.

The second step of LVMD is to redecompose the denoised data with LMD. The denoised data are decomposed into a total of six PF components, and the decomposition results are shown in Figure 14. The values of PF1 are larger and maintain the same trend as the original data, containing the main valid information of the data. The values of PF2-PF6 are smaller, among which the periodic changes of PF2-PF4 are more obvious and PF6 is monotonically increasing, which is convenient for the prediction of each component. The randomness of the PF5 sequence volatility is strong, and the prediction accuracy of the PF5 component is not high when making predictions.

Analysis of the Clustering Results
The daily load curve after LVMD-DBFCM clustering is shown in Figure 15. The clustering algorithm proposed in this paper is compared with k-means [42], FCM [43], and DBFCM. Their comparison is displayed in Table 3 and Figure 16. From the results, it can be seen that the maximum value of the CHI metric is 1142.509 and the minimum value of the DBI metric is 1.083. The clustering validity of the method proposed in this paper is better than that of the other three methods.

Figure 15. (a-e) Daily load curves after LVMD-DBFCM clustering when the number of clusters is determined to be 5.

Analysis of the Prediction Results
The comparison between the prediction model proposed in this paper and the four models LSTM, BLSTM, CNN-BLSTM, and LVMD-DBFCM+CNN-BLSTM using three load evaluation indices, RMSE, MAE, and MAPE, is shown in Table 4 and Figure 17. From the results, we can see that the prediction model proposed in this paper achieves better results than the other four models in all evaluation indices. Figure 18 shows the prediction graphs of the five prediction models. From the figure, it can be clearly seen that the pink line, representing the model proposed in this paper, fits closely to the target curve represented by the black line and better reflects the trend of the target, indicating that the proposed model has a better prediction effect than the compared models. Figure 19 shows the prediction error comparison of the five models, from which it can be clearly seen that the model proposed in this paper, represented by yellow, has the lowest error.

Conclusions
In this paper, a new model based on LVMD-DBFCM load curve clustering and a CNN-IVIA-BLSTM hybrid model for power load forecasting is proposed. This comprehensive technique takes historical load data and influencing factors (meteorology, economy, and data type) into account, where historical load data, meteorological factors, and economic factors are normalized, and the data types are one-hot encoded.
The novel LVMD-DBFCM algorithm improves the continuity and stability of the data, and the values of the CHI and DBI quality assessment indicators are 1142.509 and 1.083, respectively, both of which reflect the good validity of the clustering method used in this paper. In the new CNN-IVIA-BLSTM model, a CNN is used for feature extraction, BLSTM is used for load forecasting, and the IVIA is used to optimize the relevant parameters in the BLSTM network. The results of the three electric load forecasting evaluation metrics of the hybrid forecasting model show that the RMSE is 31.9942, the MAE is 23.3691, and the MAPE is 1.6421%. The prediction effect of the electric load fits well with the target, and the prediction error is minimized.