A VMD–CISSA–LSSVM Based Electricity Load Forecasting Model

Abstract: Accurate power load forecasting has an important impact on power systems. In order to improve load forecasting accuracy, a new load forecasting model, VMD–CISSA–LSSVM, is proposed. The model combines the variational modal decomposition (VMD) data preprocessing method, the sparrow search algorithm (SSA) and the least squares support vector machine (LSSVM) model. A multi-strategy improved chaotic sparrow search algorithm (CISSA) is proposed to address the shortcomings of the SSA algorithm, which is prone to local optima and slow convergence. First, the initial population is generated using an improved tent chaotic mapping to enhance the quality of the initial individuals and the population diversity. Second, a random following strategy is used to optimize the position update process of the followers in the sparrow search algorithm, balancing the local exploitation performance and global search capability of the algorithm. Finally, the Levy flight strategy is used to expand the search range and local search capability. The results on benchmark test functions show that the CISSA algorithm has better search accuracy and convergence performance. The volatility of the original load sequence is reduced by using VMD. The parameters of the LSSVM are optimized by the CISSA. The simulation test results demonstrate that the VMD–CISSA–LSSVM model achieves the highest prediction accuracy and the most stable prediction results among the compared models.


Introduction
Electricity load forecasting is of great importance to the development of modern power systems. Stable and efficient management and scheduling strategies for power systems rely heavily on accurate forecasts of future loads at different times [1]. Accurate short-term load forecasting can help national grids and energy suppliers cope with the increasing complexity of pricing strategies in future smart grids, further increase the utilization of renewable energy and meet the challenges posed by the development of electricity [2].
In recent years, research on electricity load forecasting has fallen into two groups: traditional forecasting methods based on mathematical statistics and forecasting methods based on artificial intelligence (AI). Traditional forecasting models include exponential smoothing [3], Kalman filtering [4] and multiple linear regression models [5]. Traditional forecasting methods rely on statistical models to analyze the regularity of electrical loads during stochastic variations and cannot effectively solve complex nonlinear problems. In order to better handle complex nonlinear time series, AI-based forecasting methods have been widely discussed and applied. AI-based prediction methods include artificial neural networks (ANNs) [6,7], support vector machines (SVMs) [8,9] and fuzzy prediction methods [10]. For example, [11] combined a real-coded genetic algorithm (GA) with a BP neural network (BPNN) for short-term gas load forecasting. In addition, the feasibility of combining the GA algorithm with the BPNN and ...
The sparrow search algorithm (SSA) is a new type of swarm intelligence optimization algorithm proposed in 2020 and is widely used in various fields [22]. The SSA algorithm has a strong global optimization capability and stability, but on complex problems its optimization capability can be insufficient, its convergence is slow and it easily falls into local optima. Researchers have proposed a number of solutions to address these shortcomings. For example, [23] proposed a fused cross-variant sparrow search algorithm. The algorithm used tent chaotic mapping to initialize the population and increase the population diversity. The crossover and variation ideas of the genetic algorithm were used to improve the position update equation of the SSA algorithm and help the algorithm jump out of local optima. The chaotic flying sparrow search algorithm was proposed in [24]. The improved algorithm was optimized mainly in the position update phase of the sparrows. In the search discovery phase, dynamic adaptive weights and the Levy flight mechanism were combined to improve the search range and flexibility of the algorithm. The backward learning strategy based on lens imaging was introduced into the followers' position update process to help the algorithm balance local and global search. Another study [25] presented an improved sparrow search algorithm applied to the field of photovoltaic microgrids. The improved algorithm used a gravity inverse learning mechanism to initialize the population. Learning coefficients were introduced into the searcher position update process to improve the global optimization capability. A variation operator was introduced into the follower position update process to help the algorithm jump out of local optima.
In addition, it was found that data pre-processing techniques can effectively reduce the effect of noise in the raw data on the prediction results. For example, [26] proposed a combined forecasting model based on improved empirical mode decomposition (IEMD), the autoregressive integrated moving average (ARIMA) and a wavelet neural network (WNN) optimized by the FOA algorithm. IEMD was used to reduce the noise of the original data. Simulation experiments not only verified the excellent prediction performance of the model, but also confirmed that data pre-processing has a positive impact on the prediction results. Another study [27] proposed a novel electricity load forecasting model based on data preprocessing and a multi-objective cuckoo search algorithm with non-dominated sorting to optimize the GRNN. Fast ensemble empirical mode decomposition (FEEMD) was used to reduce the interference in the raw data. Another study [28] used ensemble empirical mode decomposition (EEMD) to decompose the raw load data and then used the Elman neural network to make predictions. Although empirical mode decomposition (EMD) and ensemble empirical mode decomposition (EEMD) can automatically decompose the modal components based on the data, the addition of white noise during the decomposition process can create endpoint effects and cause distortion. VMD enables the effective separation of the intrinsic modal components and the division of the frequency domain of the signal, avoiding the distortions caused by the endpoint effect.
Table 1 shows a further summary and analysis of the above literature. The following conclusions are obtained from Table 1: constructing prediction models from combined models, a reasonable signal noise reduction approach and a multi-strategy optimization approach can effectively improve the prediction accuracy of power load prediction models. Studies [11-14,21] used standard intelligent optimization algorithms to optimize the network weights of neural networks in order to construct prediction models. Although such prediction models also had high prediction accuracy, the authors did not take into account the impact of the standard intelligent optimization algorithm's own shortcomings on the optimization process or the impact of nonlinear fluctuations in the original data on the prediction results. While [15,18-20] considered the impact of multi-strategy optimization approaches on intelligent optimization algorithms, the authors ignored the improvement in prediction accuracy offered by data pre-processing methods. In addition, although [26-28] integrated the ideas of combined modeling, multi-strategy optimization and data pre-processing, the authors did not consider the endpoint effects inherent in EMD denoising, which can also affect the final prediction results.

Table 1.
| Reference | Field | Contribution | Limitation |
| --- | --- | --- | --- |
| ... | ... | ... | GA algorithm optimization strategies and the choice of data pre-processing still have to be improved. |
| [12] | Power load | Application of the FOA algorithm and the GRNN model to the field of power load forecasting. | The structural shortcomings of the FOA algorithm itself and the influence of noise on the prediction results were not taken into account. |
| [13] | Power load | A new intelligent optimization algorithm (Follow the Leader, FTL) based on flock movement. | The structural shortcomings of the FTL algorithm itself and the influence of the raw data on the prediction results were not taken into account. |
| [14] | Power load | Introducing intelligent optimization algorithms to the field of structural optimization of neural networks. | The paper demonstrated the effectiveness of intelligent optimization algorithms for training neural networks but failed to take into account the shortcomings of the intelligent algorithms themselves. |
| [21] | Economic loss | It proved the superiority of the SSA algorithm and the LSSVM model. | The authors only took into account the powerful global optimality-seeking capability of the SSA algorithm, ignoring its tendency to fall into local optima and its convergence behavior. |
| [15] | Power load | The authors proposed a prediction algorithm combining reinforcement learning particle swarm optimization and least squares support vector machines. | Although the K-means algorithm was used to classify the production patterns of the raw load data, the effect of noise on the prediction results was not taken into account. |
| [18] | Power load | The authors proposed a CIGWO-ELMAN electric load forecasting model. The GWO algorithm was optimized by introducing a chaotic mapping strategy and a cosine function strategy based on random distribution. | The effect of noise in the raw load data on the prediction results was not taken into account. |
| [19] | Power load | The authors proposed an NCSOELM electric load forecasting model. The chicken swarm algorithm was optimized by using a nonlinear dynamic inertia weighting strategy and a Levy variation strategy. | The effect of noise in the raw load data on the prediction results was not taken into account. |
| [20] | Power load | The authors proposed an FA-CSSA-ELM electricity load forecasting model. The sparrow algorithm was optimized using the firefly perturbation strategy and tent chaos mapping. | The effect of noise in the raw load data on the prediction results was not taken into account. |
| [26] | Power load | The authors proposed a combined forecasting model based on improved empirical mode decomposition (IEMD), the autoregressive integrated moving average (ARIMA) and a wavelet neural network (WNN) optimized by the FOA algorithm. | No account was taken of the distortion caused by endpoint effects during the denoising process of the EMD algorithm. |
| [27] | Power load | The authors proposed a multi-strategy improved cuckoo algorithm to optimize the GRNN model for electricity load forecasting and took seasonal factors into account. | No account was taken of the distortion caused by endpoint effects during the denoising process of the EMD algorithm. |
| [28] | Power load | The authors proposed an electricity load forecasting model based on ensemble empirical mode decomposition (EEMD), approximate entropy and the extreme learning machine (ELM). | No account was taken of the distortion caused by endpoint effects during the denoising process of the EMD algorithm. The effect of parameter selection on the prediction results of neural network models was ignored. |
In summary, a new combined power load forecasting model is proposed, based on variational modal decomposition (VMD) and an LSSVM model optimized by an improved chaotic sparrow search algorithm (CISSA). First, we address the problem that the standard sparrow search algorithm is prone to fall into local extremes as the population diversity decreases in the late iterations. In this paper, an improved chaotic sparrow search algorithm (CISSA) is proposed based on an analysis of the SSA algorithm. The improved tent mapping strategy, the random following strategy from the chicken swarm optimization algorithm and the Levy flight strategy from the cuckoo search algorithm are applied to the population initialization phase, the iteration phase and the global search phase of the algorithm, respectively. Second, the original load sequence is decomposed into several modal components of different frequencies by VMD. The CISSA algorithm is used to optimize the two parameters of the LSSVM model, the penalty factor gam and the RBF kernel parameter sig. The CISSA-LSSVM prediction model is then used to train on and predict the components at different frequencies separately. Finally, the predicted values of all components are summed to produce the final prediction results.
In order to verify the performance of the CISSA algorithm proposed in this paper, eight benchmark test functions are used to evaluate its optimization capability. The comparison with two improved SSA algorithms and three basic algorithms verifies that the CISSA algorithm has better search accuracy, convergence performance and stability. Finally, simulation experiments using real historical load data are conducted to verify the prediction accuracy and stability of the model. The simulation results, compared with several competing models, also demonstrate the excellent prediction accuracy and performance of the VMD-CISSA-LSSVM prediction model.

Theory and Methods
This section presents the mathematical theory and models of the variational modal decomposition, the LSSVM model, the sparrow search algorithm, the improved chaotic sparrow search algorithm and the VMD-CISSA-LSSVM model.

Variational Modal Decomposition
VMD is an adaptive decomposition method for non-stationary signals, which can determine the number of modes according to the actual characteristics of the sequence. During the solution process, the central frequency and bandwidth of each mode are matched adaptively to obtain the optimal solution. The specific mathematical model of VMD is shown in [29]. The VMD decomposition proceeds as follows:
1. The Hilbert transform is applied to each sub-mode to obtain its one-sided spectrum.
2. The spectrum of each mode is shifted to baseband by multiplying it by an exponential signal tuned to the estimated central frequency.
3. The bandwidth is estimated from the demodulated signal, and the resulting constrained variational problem is expressed as Equation (1).
4. The quadratic penalty factor α and the Lagrange multiplier λ(t) are introduced to turn it into the unconstrained variational problem of Equation (2).
5. The alternating direction method of multipliers is used to update the modes and their central frequencies, as shown in Equation (3).
In Equations (1)-(3), δ(t) is the unit pulse signal; n indexes the modal components obtained after the signal decomposition; N is the total number of modes; k is the iteration number; $\omega_n$ is the central frequency of the n-th mode; $\partial_t$ denotes the partial derivative with respect to t; α is the penalty factor; j is the imaginary unit; ⊗ is the convolution operator; λ is the Lagrange multiplier; $\hat{f}(\omega)$, $\hat{u}_n(\omega)$ and $\hat{\lambda}(\omega)$ are the Fourier transforms of $f(t)$, $u_n(t)$ and $\lambda(t)$, respectively; and $u_n(t)$ is the n-th band-limited modal component.
Mathematics 2022, 10, 28
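For reference, Equations (1)-(3) are sketched below in the form commonly used in the VMD literature. This is an assumed reconstruction that follows the symbol definitions given in this section; the exact notation in [29] may differ.

```latex
% Eq. (1): constrained variational problem
\min_{\{u_n\},\{\omega_n\}} \left\{ \sum_{n=1}^{N} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) \otimes u_n(t) \right] e^{-j\omega_n t} \right\|_2^2 \right\}
\quad \text{s.t.} \quad \sum_{n=1}^{N} u_n(t) = f(t)

% Eq. (2): augmented Lagrangian (unconstrained problem)
L\left(\{u_n\},\{\omega_n\},\lambda\right) =
\alpha \sum_{n=1}^{N} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) \otimes u_n(t) \right] e^{-j\omega_n t} \right\|_2^2
+ \left\| f(t) - \sum_{n=1}^{N} u_n(t) \right\|_2^2
+ \left\langle \lambda(t),\; f(t) - \sum_{n=1}^{N} u_n(t) \right\rangle

% Eq. (3): alternating updates of the modes and central frequencies
\hat{u}_n^{k+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq n} \hat{u}_i(\omega) + \hat{\lambda}(\omega)/2}{1 + 2\alpha\left(\omega - \omega_n^{k}\right)^2},
\qquad
\omega_n^{k+1} = \frac{\int_0^{\infty} \omega \left| \hat{u}_n^{k+1}(\omega) \right|^2 d\omega}{\int_0^{\infty} \left| \hat{u}_n^{k+1}(\omega) \right|^2 d\omega}
```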

Least Square Support Vector Machines
In the LSSVM, the inequality constraints of the SVM algorithm are replaced by equality constraints and the sum of squared errors is used as the empirical loss. The selection of the penalty factor and the kernel function parameter directly affects the anti-interference ability and generalization ability of the LSSVM. The specific mathematical model of the LSSVM is shown in [30].
For a given training set $T = \{(x_1, y_1), \ldots, (x_n, y_n)\}$, the regression function is defined by Equation (4), where x is the sample input, y is the sample output and ω and b are the normal vector and intercept of the hyperplane in the higher-dimensional space, respectively. According to the risk minimization principle, the regression problem can be transformed into the constrained problem of Equation (5), where $e_i$ is the slack variable and γ is the regularization factor. By introducing the Lagrange multiplier α, the problem is transformed into Equation (6). The optimal values are obtained by setting the partial derivatives with respect to ω, b, e and α to zero, and the regression function of Equation (7) is then established, where $K(x, x_i)$ is the kernel function. The RBF kernel function is used in this paper; its expression is given in Equation (8), where σ is the RBF kernel parameter.
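The derivation above can be illustrated with a minimal NumPy sketch. It assumes the standard LSSVM result that the optimality conditions reduce to one linear system in b and α, and it assumes the RBF kernel takes the form exp(-||x - x_i||^2 / (2σ^2)); the paper's exact scaling of σ may differ. The function names are hypothetical, and `gam` and `sig` stand for the penalty factor and kernel parameter.

```python
import numpy as np

def rbf_kernel(A, B, sig):
    """RBF kernel matrix between rows of A and B (assumed 2*sig^2 scaling)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sig**2))

def lssvm_fit(X, y, gam, sig):
    """Solve the LSSVM linear system and return (b, alpha)."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                                   # constraint: sum(alpha) = 0
    A[1:, 0] = 1.0                                   # intercept column
    A[1:, 1:] = rbf_kernel(X, X, sig) + np.eye(n) / gam
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]

def lssvm_predict(X_train, b, alpha, sig, X_new):
    """Evaluate f(x) = sum_i alpha_i * K(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sig) @ alpha + b

# Toy usage on a sine curve (illustrative data, not the paper's load data).
X = np.linspace(0.0, 2.0 * np.pi, 40)[:, None]
y = np.sin(X[:, 0])
b, alpha = lssvm_fit(X, y, gam=1000.0, sig=0.5)
y_hat = lssvm_predict(X, b, alpha, 0.5, X)
```

A larger `gam` penalizes the training errors more strongly, while `sig` controls the width of the kernel; these are the two quantities that the optimizer has to tune.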

Sparrow Search Algorithm
The sparrow search algorithm [31] is a new swarm intelligence optimization algorithm proposed by Xue in 2020. This paper analyzes the SSA algorithm in order to develop a suitable improvement scheme.
The initial sparrow individuals in the sparrow search algorithm are randomly generated in the search space and gradually aggregate during the iterative process, making it difficult to obtain good population diversity and maintain it at a certain level. This leads to poor convergence performance and an imbalance between the global search capability and the local exploitation performance of the algorithm.
Sparrow populations are divided into searchers, followers and vigilantes, depending on their individual capabilities. The searcher's position is updated by Equation (9), where t is the current iteration number; α is a random number in [0, 1]; $R_2 \in [0, 1]$ and $ST \in [0.5, 1]$ represent the warning value and the safety threshold, respectively; L is a 1 × d matrix whose elements are all 1; and Q is a random number subject to a normal distribution.
The position of a follower is updated by Equation (10), where $xw_d^t$ denotes the worst position of the sparrows in the d-th dimension at the t-th iteration of the population, $xb_d^{t+1}$ denotes the optimal position of the sparrows in the d-th dimension at the (t+1)-th iteration of the population and L is a 1 × d matrix whose elements are all 1.
The position of the vigilantes is updated by Equation (11), where ε is a small constant that avoids a zero denominator; K is a random number within [−1, 1]; $f_i$, $f_g$ and $f_w$ are the current fitness, the best fitness and the worst fitness, respectively; and β is the step-size control parameter. When $f_i \neq f_g$, the sparrow is at the edge of the population and is vulnerable to predators; when $f_i = f_g$, the sparrow is in the middle of the population, is aware of the threat of predators and adjusts its search strategy by moving closer to other sparrows in time to avoid being attacked.
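For reference, Equations (9)-(11) are sketched below in the form given for the standard SSA in the original algorithm paper [31]. This is an assumed reconstruction: the symbols $T$ (maximum number of iterations), $n$ (population size) and $A^{+} = A^{T}(AA^{T})^{-1}$, with $A$ a 1 × d matrix of randomly assigned ±1 entries, are not defined in the text above and follow that paper. The first condition in Equation (11) is written there as $f_i > f_g$, which is equivalent to $f_i \neq f_g$ for a minimization problem.

```latex
% Eq. (9): searcher update
x_{i,d}^{t+1} =
\begin{cases}
x_{i,d}^{t} \cdot \exp\left( \dfrac{-i}{\alpha \cdot T} \right), & R_2 < ST \\[2mm]
x_{i,d}^{t} + Q \cdot L, & R_2 \geq ST
\end{cases}

% Eq. (10): follower update
x_{i,d}^{t+1} =
\begin{cases}
Q \cdot \exp\left( \dfrac{xw_{d}^{t} - x_{i,d}^{t}}{i^{2}} \right), & i > n/2 \\[2mm]
xb_{d}^{t+1} + \left| x_{i,d}^{t} - xb_{d}^{t+1} \right| \cdot A^{+} \cdot L, & \text{otherwise}
\end{cases}

% Eq. (11): vigilante update
x_{i,d}^{t+1} =
\begin{cases}
xb_{d}^{t} + \beta \cdot \left| x_{i,d}^{t} - xb_{d}^{t} \right|, & f_i \neq f_g \\[2mm]
x_{i,d}^{t} + K \cdot \dfrac{\left| x_{i,d}^{t} - xw_{d}^{t} \right|}{(f_i - f_w) + \varepsilon}, & f_i = f_g
\end{cases}
```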
From Equation (10), it can be seen that the follower position update process is mainly guided by $xw_d^t$ and $xb_d^{t+1}$. This shows that the SSA algorithm does not take full advantage of the information carried by most ordinary individuals in the population. As a result, the effective exploration area for the sparrows is small and the global search capability of the algorithm is weak.
The operation flow of the standard SSA algorithm is shown in Figure 1. The iterative search process for individual sparrows shows that the strength of the sparrow search algorithm is influenced by the quality of the individuals in the population and by the position update parameters. The position update of an individual sparrow relies on following and interaction between individuals. Due to the lack of variation in the iterative update process, once the search stagnates at a local optimum it is difficult for the sparrows to jump out of the current local region.

Figure 1. Flow of the standard SSA algorithm: initialize the population and related parameters; update the searcher positions using Equation (9); update the follower positions using Equation (10); update the vigilante positions using Equation (11).

Improved Chaotic Sparrow Search Algorithm
In this paper, the CISSA algorithm is proposed based on an analysis of the SSA algorithm. First, an improved tent chaotic mapping is used to generate the initial population, improving the quality of the initial solutions and laying the foundation for global optimization. Second, during the iterations, the random following strategy of the chicken swarm algorithm is used to optimize the position update process of the followers in the SSA algorithm, thus balancing the local exploitation performance and global search capability of the algorithm. Finally, the Levy flight strategy of the cuckoo search algorithm is introduced to improve the global search capability of the algorithm and help it jump out of local optima. The multi-strategy fusion approach helps the algorithm balance local exploitation and global search, while improving its ability to escape local extremes.

Improved Tent Mapping Strategy
Chaos is a nonlinear system between deterministic and stochastic systems [32,33]. Chaotic mappings are capable of traversing all states without repetition within a certain range.
Figure 2 shows the bifurcation diagrams of four common chaotic mappings. From Figure 2, it is clear that the tent chaotic map covers a larger area and is more uniformly distributed. Therefore, the tent chaotic mapping is chosen to initialize the sparrow population and help it to be uniformly distributed in the mapping space. In addition, random variables are introduced into the tent chaotic mapping to improve the diversity and randomness of the population.
The tent chaotic mapping can be expressed by Equation (12):
$$y_{i+1} = \begin{cases} 2y_i, & 0 \leq y_i \leq \frac{1}{2} \\ 2(1 - y_i), & \frac{1}{2} < y_i \leq 1 \end{cases} \tag{12}$$
Adding the random variable $rand(0, 1) \times \frac{1}{N}$ to Equation (12), Equation (13) is obtained:
$$y_{i+1} = \begin{cases} 2y_i + rand(0, 1) \times \frac{1}{N}, & 0 \leq y_i \leq \frac{1}{2} \\ 2(1 - y_i) + rand(0, 1) \times \frac{1}{N}, & \frac{1}{2} < y_i \leq 1 \end{cases} \tag{13}$$
Finally, the improved tent chaotic mapping is obtained after the Bernoulli shift transformation:
$$y_{i+1} = (2y_i) \bmod 1 + rand(0, 1) \times \frac{1}{N}$$
The initial position of the sparrow population in the feasible domain is obtained by Equation (14):
$$x_{id} = lb_{id} + q\,(ub_{id} - lb_{id}) \times y_{id} \tag{14}$$
where q is a random number within [0, 1]; $lb_{id}$ and $ub_{id}$ represent the lower and upper bounds of the feasible solution interval, respectively; and $y_{id}$ is the individual after mapping. The process can be expressed as follows: a d-dimensional vector is randomly generated in [0, 1] as the initial individual. Then, N-1 new individuals are generated by iterating over each dimension of the vector with Equation (13). Finally, Equation (14) is used to map the values generated by the modified tent chaotic mapping onto the sparrow individuals.
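The initialization procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name is hypothetical, the sketch wraps the chaotic values back into [0, 1) after the random term is added, and it maps them into the feasible interval as lb + q(ub - lb)y with q drawn independently per entry, so that every individual stays within the bounds. All three choices are assumptions.

```python
import numpy as np

def improved_tent_init(n_pop, dim, lb, ub, rng):
    """Generate n_pop individuals in [lb, ub] with the improved tent map."""
    lb = np.broadcast_to(np.asarray(lb, dtype=float), (dim,))
    ub = np.broadcast_to(np.asarray(ub, dtype=float), (dim,))
    y = np.empty((n_pop, dim))
    y[0] = rng.random(dim)              # random d-dimensional seed individual
    for i in range(1, n_pop):
        # Bernoulli-shift form of the map plus the rand(0, 1) / N perturbation
        nxt = (2.0 * y[i - 1]) % 1.0 + rng.random(dim) / n_pop
        y[i] = nxt % 1.0                # assumption: wrap back into [0, 1)
    q = rng.random((n_pop, dim))        # assumption: q drawn per entry
    return lb + q * (ub - lb) * y       # map chaotic values into [lb, ub]

rng = np.random.default_rng(0)
population = improved_tent_init(n_pop=30, dim=5, lb=-10.0, ub=10.0, rng=rng)
```

Because each new row is derived from the previous one, the perturbation term rand(0, 1)/N is what keeps the sequence from collapsing onto short cycles in finite-precision arithmetic.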

Random Following Strategy
The followers in the SSA algorithm tend to cluster rapidly as they move towards the optimal position. Although fast convergence can be achieved, the sudden drop in population diversity greatly increases the probability of the algorithm falling into a local optimum. Therefore, the random following strategy of the chicken swarm optimization algorithm is used to improve the position update of the followers in the SSA algorithm. The mathematical model of chicken swarm optimization is shown in [34]. In this strategy, the hens move closer to the roosters with a certain probability. This ensures convergence without reducing the diversity of the population and provides a good balance between local exploitation and global search. In the equation for updating the position of a hen, r denotes the r-th rooster, chosen as the hen's mate, and s denotes a randomly chosen rooster or hen in the flock, with $r \neq s$; $f_s$ is the fitness of the randomly selected individual s; and $f_i$ is the fitness value of the i-th sparrow.
The improved follower position update formula is given by Equation (18), where $S_3 = \exp(f_s - f_i)$.
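For reference, the hen position update is sketched below as it is usually stated in the chicken swarm optimization literature; it is assumed to correspond to the hen equation referred to above, and the notation in [34] may differ. Here $rand$ is a uniform random number in [0, 1] and $\varepsilon$ is a small constant, both assumed. The factor $S_2$ has the same form as the $S_3$ used in the improved follower update. The sketch does not restate the improved follower update itself.

```latex
% Hen position update in chicken swarm optimization
x_{i,j}^{t+1} = x_{i,j}^{t}
+ S_1 \cdot rand \cdot \left( x_{r,j}^{t} - x_{i,j}^{t} \right)
+ S_2 \cdot rand \cdot \left( x_{s,j}^{t} - x_{i,j}^{t} \right)

S_1 = \exp\left( \frac{f_i - f_r}{\left| f_i \right| + \varepsilon} \right),
\qquad
S_2 = \exp\left( f_s - f_i \right)
```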

Levy Flight Strategy
In the late iterations of the SSA algorithm, individual sparrows have already completed their position updates and are prone to stagnation at a local optimum. In order to solve this problem, the Levy flight strategy of the cuckoo search algorithm is used to update and mutate the population after the SSA position update [35].
The Levy flight strategy combines long-term small-step searches with occasional short-term large-step jumps. The small steps ensure that a small area around the individual is carefully searched during foraging. The large jumps ensure that the individual is able to move into another area and search more widely. Currently, the Mantegna method is commonly used to generate random step sizes that obey the Levy distribution. In the formula proposed by Mantegna for simulating the Levy flight path, Equation (19), s is the Levy flight step; β is a constant, usually taken as 1.5; and ρ and v are normally distributed random numbers, which obey the normal distributions of Equation (20). The standard deviations $\sigma_\rho$ and $\sigma_v$ of the corresponding normal distributions in Equation (20) take values that satisfy Equation (21). In the position update formula for the Levy flight, Equation (22), $x_i(t)$ denotes the i-th solution at generation t, $x_{best}$ denotes the current optimal solution, l denotes the step-size control weight and ⊕ denotes element-wise multiplication.
Figure 3 illustrates a two-dimensional Levy flight path generated using the Mantegna method. It is clear from Figure 3 that the path alternates between long runs of small steps and occasional large jumps, which suits the search for the optimal solution. By expanding the search space with large steps in the short term, an individual is able to escape from local stagnation. In addition, the long-term small-step search enhances the local search capability, effectively addressing the problem of individuals falling into local optima. In the standard SSA algorithm, once the sparrows' positions are updated the algorithm enters the next cycle or ends, at which point it tends to fall into a local optimum. By applying Levy flight variation to the sparrow population in the global search phase, the positions are updated once more, helping the population move away from the current local optimum.
This paper applies a selective variation update to the sparrows after the position update by comparing rand with the inertia weight factor f given by Equation (23), $f = 1 - iter/Maxiter$, where iter is the current iteration number, Maxiter is the maximum iteration number and rand is a random number within (0, 1). If rand is greater than f, the selected sparrow is subjected to Levy flight variation according to Equation (22). If rand is less than f, the variation is skipped and the next step is carried out.
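The selective Levy variation described above can be sketched as follows. The Mantegna step uses the standard formulas s = ρ/|v|^(1/β) with σ_v = 1 and σ_ρ computed from the gamma function. The step weight l = 0.01 (x_i − x_best) is the convention commonly used in cuckoo search and is an assumption here, as are the function names and the `step_scale` parameter.

```python
import math
import numpy as np

def mantegna_sigma(beta):
    """Standard deviation of rho in Mantegna's method (sigma_v is 1)."""
    num = math.gamma(1.0 + beta) * math.sin(math.pi * beta / 2.0)
    den = math.gamma((1.0 + beta) / 2.0) * beta * 2.0 ** ((beta - 1.0) / 2.0)
    return (num / den) ** (1.0 / beta)

def mantegna_step(beta, size, rng):
    """Draw Levy-distributed steps s = rho / |v|^(1/beta)."""
    rho = rng.normal(0.0, mantegna_sigma(beta), size)
    v = rng.normal(0.0, 1.0, size)
    return rho / np.abs(v) ** (1.0 / beta)

def levy_mutation(pop, best, it, max_it, rng, beta=1.5, step_scale=0.01):
    """Apply Levy variation to each sparrow whose random draw exceeds f."""
    f = 1.0 - it / max_it                    # inertia weight factor
    new_pop = pop.copy()
    for i in range(pop.shape[0]):
        if rng.random() > f:                 # variation becomes likelier late in the run
            s = mantegna_step(beta, pop.shape[1], rng)
            l = step_scale * (pop[i] - best)  # assumed step weight
            new_pop[i] = pop[i] + l * s       # element-wise product
    return new_pop

rng = np.random.default_rng(1)
pop = rng.random((6, 3)) + 1.0
best = np.zeros(3)
early = levy_mutation(pop, best, it=0, max_it=100, rng=rng)
late = levy_mutation(pop, best, it=100, max_it=100, rng=rng)
```

Since f decreases linearly from 1 to 0, almost no sparrow is mutated in the first iterations and almost every sparrow is mutated near the end, which is when stagnation is most likely.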

The CISSA Algorithm
As shown in Figure 4, the operational flow of the CISSA algorithm can be summarized as follows:
1. Initialize the relevant parameters of the SSA algorithm.
2. Initialize the sparrow population using the tent chaotic mapping with added random variables. The improved tent chaotic mapping exploits the ergodicity and randomness of the mapping to improve the diversity of the sparrow population, thus providing a basis for the global optimization of the algorithm. A d-dimensional vector is generated in the initial space as the initial individual. Then, N-1 new individuals are generated by iterating over each of its dimensions with Equation (13). Finally, the values generated by the chaotic mapping are mapped onto individual sparrows by Equation (14), $x_{id} = lb_{id} + q\,(ub_{id} - lb_{id}) \times y_{id}$.
3. Calculate and rank the fitness values of the sparrows and record the current best and worst positions.
4. Update the positions of the searchers according to Equation (9).
5. Update the positions of the followers according to Equation (18), which applies the random following strategy to balance the local exploitation performance and global search capability of the algorithm.
6. Update the positions of the vigilantes according to Equation (11).
7. Recalculate and rank the fitness values of the sparrows, recording the current best and worst positions.
8. Calculate the inertia weight factor $f = 1 - iter/Maxiter$. Whether a sparrow undergoes Levy variation is determined by comparing rand with f. If the selected random number is greater than f, the selected sparrow is subjected to Levy flight variation according to Equation (22), $x_i(t+1) = x_i(t) + l \oplus s$. The Levy flight strategy of the cuckoo search algorithm is used to improve the global search ability of the algorithm and help it jump out of local optima.
9. Recalculate the fitness values and record the best and worst positions of the sparrows.
10. Determine whether the stop condition is met. If it is met, output the result; otherwise, repeat steps 3-9.
Figure 4. Operational flow of the CISSA algorithm. Flowchart steps: start; set the parameters of the algorithm; initialize the population using the tent chaotic mapping of Equation (14); calculate the fitness values and sort them; update the searchers' positions using Equation (9); update the followers' positions using Equation (18); update the vigilantes' positions using Equation (11); update the sparrow positions according to Equation (22) for the Levy flight strategy; recalculate the fitness values and update the sparrow positions; if T > maxT, end; otherwise, continue iterating.
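The chaotic initialization step of the flow above can be sketched as follows. The exact form of the improved tent mapping in Equation (14) is not reproduced here; the perturbed map below (a standard tent map plus a small random term, taken modulo 1) is an assumed stand-in, and the function names are hypothetical. The random term matters in practice, because an unperturbed tent map with slope 2 collapses to zero in floating-point arithmetic after a few dozen iterations.

```python
import random


def perturbed_tent(y, n_pop, rng):
    """One tent-map iteration with a small random term (assumed form)."""
    base = 2.0 * y if y < 0.5 else 2.0 * (1.0 - y)
    return (base + rng.random() / n_pop) % 1.0


def init_population(n_pop, lower, upper, seed=0):
    """Generate n_pop individuals by iterating the chaotic map in every dimension."""
    rng = random.Random(seed)
    dim = len(lower)
    # Initial d-dimensional chaotic vector with entries in [0, 1).
    y = [rng.random() for _ in range(dim)]
    population = []
    for _ in range(n_pop):
        # Map the chaotic values onto the search bounds: x = lb + (ub - lb) * y.
        population.append(
            [lo + (hi - lo) * yi for lo, hi, yi in zip(lower, upper, y)]
        )
        y = [perturbed_tent(yi, n_pop, rng) for yi in y]
    return population
```

For example, `init_population(50, [-100.0] * 30, [100.0] * 30)` yields 50 individuals of dimension 30 inside the search bounds.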

VMD-CISSA-LSSVM Electricity Load Forecasting Model
In summary, a new combined power load forecasting model based on VMD, the CISSA algorithm and the LSSVM model is proposed.
The VMD algorithm is used to decompose the original data into multiple IMF components and a Res residual component. The raw load data are then denoised by means of modal reconstruction. If the sub-series data are fed directly into an untuned LSSVM model for load prediction, the accuracy of the prediction model is reduced, because the penalty factor gam and the RBF kernel parameter sig of the LSSVM have a significant impact on the prediction results. To improve the prediction accuracy, the CISSA algorithm proposed in this paper is used to find the optimal values of these two parameters, the kernel width and the penalty factor, which are then passed to the LSSVM model for load prediction.
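To make the role of the two parameters concrete, the following is a minimal LSSVM regression sketch in the standard dual form, where the bias $b$ and the coefficients $\alpha$ solve the linear system $[0, 1^T; 1, K + I/gam][b; \alpha] = [0; y]$. The kernel convention $\exp(-\|a - b\|^2 / (2\,sig2))$, with `sig2` standing for the squared kernel width, and all function names are assumptions for illustration; the sketch is not the implementation used in the paper.

```python
import math


def rbf_kernel(a, b, sig2):
    """RBF kernel exp(-||a - b||^2 / (2 * sig2)); this convention is assumed."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sig2))


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def lssvm_fit(xs, ys, gam, sig2):
    """Return (bias, alphas) from the LSSVM dual linear system."""
    n = len(xs)
    system = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf_kernel(xs[i], xs[j], sig2) for j in range(n)]
        row[i + 1] += 1.0 / gam  # ridge term controlled by the penalty factor
        system.append(row)
    solution = solve_linear(system, [0.0] + list(ys))
    return solution[0], solution[1:]


def lssvm_predict(x, xs, bias, alphas, sig2):
    """Predict y(x) = bias + sum_i alpha_i * K(x, x_i)."""
    return bias + sum(al * rbf_kernel(x, xi, sig2) for al, xi in zip(alphas, xs))
```

A larger `gam` fits the training data more tightly, while `sig2` controls how far the influence of each training point extends; an optimizer such as CISSA would search over this pair using a prediction error as the fitness value.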
The flow of the VMD-CISSA-LSSVM power load forecasting model is shown in Figure 5. The specific operational flow can be expressed as follows:
1. Perform modal decomposition of the load data using the VMD algorithm;
2. The input sub-series data have a large variance in peak values, which can significantly affect the prediction results if entered directly without processing. Therefore, the data need to be normalized before the individual subsequences are fed into the LSSVM. The normalization formula can be expressed as $x' = \frac{x - x_{min}}{x_{max} - x_{min}}$, where $x$ represents the original data and $x_{min}$ and $x_{max}$ represent the minimum and maximum values in the original data;
3. Optimize the kernel function width and penalty factor of the LSSVM using the CISSA algorithm proposed above;
4. Feed the decomposed sub-series of the original load data into the LSSVM prediction model optimized by the CISSA algorithm;
5. Sum the prediction results of each sub-series to obtain the final prediction result.
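The normalization in step 2 and the summation in step 5 can be sketched as below. The inverse transform is included on the assumption that component forecasts are mapped back to the original scale before they are summed; the function names are hypothetical.

```python
def min_max_normalize(series):
    """Scale a series to [0, 1] and return the bounds needed to invert the scaling."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("constant series cannot be min-max normalized")
    return [(x - lo) / (hi - lo) for x in series], lo, hi


def denormalize(values, lo, hi):
    """Invert the min-max scaling."""
    return [v * (hi - lo) + lo for v in values]


def sum_components(component_forecasts):
    """Sum the forecasts of all sub-series point by point."""
    return [sum(values) for values in zip(*component_forecasts)]
```

Each sub-series keeps its own `lo` and `hi`, so the components are rescaled independently before being added together.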
Mathematics 2022, 9, x FOR PEER REVIEW 15 of 31

Figure 5. Flow of the VMD-CISSA-LSSVM power load forecasting model: the load data are decomposed by VMD into IMF components and a Res component; each component is predicted by a CISSA-LSSVM model; the predicted values of the components are summed to give the final predicted value.

In addition, the proposed combined prediction model can be applied in a real power transmission environment, as shown in Figure 6. The VMD-CISSA-LSSVM model is applied in the first conversion phase. The model's forecasting performance is continuously improved by learning from historical electricity load data from previous years. Highly accurate forecasting results give effective feedback to the power sector, helping decision makers to develop reasonable power supply and production plans and reduce unnecessary losses and waste in the supply-consumption process.

Selection of the Test Function
To verify the effectiveness and stability of the CISSA, eight benchmark test functions were used in a comparative function optimization test. The test functions and their specific information are shown in Table 2. F1-F5 are unimodal test functions and F6-F8 are multimodal test functions.

Table 2. Test functions and their specific information.

Columns: Type; Test Function; Dimension; Section; Min. Recoverable entries: unimodal test functions with section [−100, 100] and minimum 0; the function $\sum_{i=1}^{n}\left[x_i^2 - 10\cos(2\pi x_i) + 10\right]$ with dimension 30, section [−5.12, 5.12] and minimum 0.

The sparrow search algorithm (SSA), chaotic sparrow search algorithm (CSSA), particle swarm optimization (PSO) algorithm, grey wolf optimization (GWO) algorithm, FA-CSSA algorithm proposed in [20] and CISSA algorithm proposed in this paper were selected for the comparison experiments on the test functions. The common parameters of all algorithms were kept the same: the population size was set to 50 and the maximum number of iterations was set to 300. The relevant parameters of each algorithm are shown in Table 3. To reduce the effect of chance, 30 independent trials were conducted for each of the eight test functions. Table 4 shows the experimental results of the PSO, GWO, SSA, CSSA, FA-CSSA and the proposed CISSA algorithms after 30 independent runs on the standard test functions. The best values are marked in bold.

In addition, the iterative convergence curves of the benchmark test functions are plotted to further visualize the convergence of each algorithm and the optimization results of the algorithms. The iterative convergence curves are shown in Figure 7. The different colored curves represent the convergence of the different algorithms. The horizontal axis represents the number of iterations and the vertical axis represents the fitness value.
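The experimental protocol above (a fixed population size and iteration budget, repeated independent runs, then the best value, mean and standard deviation) can be sketched as follows. The objective is the multimodal function with section [−5.12, 5.12] from Table 2. The optimizer is a plain random search used only as a placeholder; it is not SSA, CISSA or any of the compared algorithms, and the function names are hypothetical.

```python
import math
import random
import statistics


def rastrigin(x):
    """Sum of x_i^2 - 10 cos(2 pi x_i) + 10; the global minimum is 0 at the origin."""
    return sum(xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) + 10.0 for xi in x)


def random_search(objective, dim, lower, upper, pop_size, max_iter, rng):
    """Placeholder optimizer (NOT a swarm algorithm): best of uniform random samples."""
    best = float("inf")
    for _ in range(pop_size * max_iter):
        candidate = [rng.uniform(lower, upper) for _ in range(dim)]
        best = min(best, objective(candidate))
    return best


def run_trials(objective, n_trials=30, dim=30, lower=-5.12, upper=5.12,
               pop_size=50, max_iter=300):
    """Repeat independent runs and report the best value, mean and standard deviation."""
    results = [
        random_search(objective, dim, lower, upper, pop_size, max_iter,
                      random.Random(seed))
        for seed in range(n_trials)
    ]
    return min(results), statistics.mean(results), statistics.stdev(results)
```

The default arguments mirror the settings stated in the text (30 runs, dimension 30, population 50, 300 iterations); a reduced budget such as `run_trials(rastrigin, n_trials=5, dim=2, pop_size=5, max_iter=10)` is enough to check the bookkeeping quickly.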
The analysis in Table 4 shows that, under the same test constraints, the CISSA algorithm outperformed the three standard comparison algorithms and the two modified sparrow algorithms on the eight benchmark test functions.
For the unimodal test functions F1-F5, the CISSA algorithm achieved a better mean and standard deviation than the standard PSO and GWO algorithms. The FA-CSSA and CSSA algorithms also achieved very good results for some of the test functions. In particular, the SSA, CSSA and FA-CSSA algorithms also reached the theoretical optima, but still did not outperform the CISSA algorithm in terms of the mean and standard deviation. This demonstrates that, for the unimodal test functions F1-F5, CISSA not only achieved optimal results but also showed a better convergence accuracy and stability.
For the multimodal functions F6-F8, the CISSA algorithm achieved a better mean and standard deviation than the standard PSO and GWO algorithms. In addition, the SSA, CSSA, FA-CSSA and CISSA algorithms all found the theoretical optimal values and performed very well in terms of the mean and standard deviation. The fact that the optimal solution was closer to the theoretical value indicates that the CISSA algorithm explores the search space efficiently and has a strong global search and local exploitation capability.
In addition, compared to the SSA algorithm, the CISSA algorithm with its three improvement strategies improved the optimal value, mean and standard deviation of the search results by several to several tens of orders of magnitude. The improvement of the SSA by a single strategy was limited, and the optimization results could not be maintained at a high level across different functions. The CISSA algorithm combining the three improvement strategies showed a better overall solution performance, with most of the optimization accuracy and stability in 30 dimensions significantly better than those of the other compared algorithms, which demonstrates the all-round improvement obtained by combining multiple strategies.
From the iteration curves of the eight test functions shown in Figure 7, the following can be observed. For the F1 and F2 functions, the CISSA algorithm converged quickly and obtained the best fitness value, although the number of iterations needed to reach the optimal fitness was slightly greater than those of the other algorithms. For the F3, F4 and F5 functions, the CISSA algorithm was superior both in its convergence speed and in the number of iterations it required to achieve the optimal result. For the F6, F7 and F8 functions, the CISSA algorithm outperformed the other algorithms in terms of the number of iterations required to reach the optimal value and the speed of convergence.

To further evaluate the performance of the CISSA algorithm, a Wilcoxon signed-rank test was performed at the α = 5% significance level on the best results of the CISSA algorithm and the other five algorithms over 30 independent runs. The symbols "+", "−" and "=" indicate that CISSA outperformed, underperformed and was equivalent to the comparison algorithm, respectively, and N/A indicates that the algorithms were close to each other and no significance could be determined. The results are shown in Table 5, where CISSA outperformed SSA on five of the eight benchmark functions, PSO on eight functions, GWO on eight functions, CSSA on five functions and FA-CSSA on five functions. Moreover, most of the p-values were less than 0.05, indicating that the superiority of CISSA was statistically significant.

Overall, the CISSA algorithm performed the best. Its mean and standard deviation over multiple runs were smaller than those of the other algorithms for both unimodal and multimodal functions, which shows that the stability and robustness of CISSA are better than those of the other algorithms. The improved sparrow search algorithm was also able to explore the search space sufficiently and efficiently, with a strong global search capability and local exploitation capability. For the different types of test functions, CISSA generally required the fewest iterations and converged the fastest among the algorithms that reached the optimum, demonstrating the superiority of its convergence performance.
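A paired Wilcoxon signed-rank comparison of the kind described above can be sketched as follows. This version uses the large-sample normal approximation without continuity or tie corrections, which is a simplifying assumption; statistical packages may apply exact or corrected variants and return slightly different p-values. The function name is hypothetical.

```python
import math


def wilcoxon_signed_rank(a, b):
    """Two-sided Wilcoxon signed-rank test (normal approximation, no corrections)."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Tied absolute differences share the average of their ranks.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4.0
    std = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w - mean) / std
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return w, min(p_value, 1.0)
```

Here `a` and `b` would hold the best fitness values of two algorithms over the same 30 runs; a p-value below 0.05 is read as a significant difference at the α = 5% level.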

Simulation Experimental Data
For this paper, real historical load data from 30 April 2007 to 12 September 2007 in a region of Shandong were selected as the simulation data. The data were sampled every 0.5 h, giving a total of 20 weeks of historical data. As can be seen in Figure 8, the raw data series fluctuated considerably and was generally consistent with the "peak and trough" characteristics of the electricity load. The 20 weeks of data were divided into seven datasets by day of the week, from Monday to Sunday. For each dataset, the data from the first 19 weeks were used for training and the data from the last week were used as the prediction target. This division allowed the prediction of loads to be narrowed from a particular month to a specific day.
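The division into seven weekday subsets can be sketched as below. The sketch assumes that the series starts on a Monday, contains exactly 20 full weeks and has 48 samples per day (one every 0.5 h); the function and variable names are hypothetical.

```python
SAMPLES_PER_DAY = 48  # one sample every 0.5 h
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def split_by_weekday(load, n_weeks=20, train_weeks=19):
    """Group a half-hourly series by weekday and split each group into train and test."""
    expected = n_weeks * 7 * SAMPLES_PER_DAY
    if len(load) != expected:
        raise ValueError(f"expected {expected} samples, got {len(load)}")
    subsets = {}
    for day_index, name in enumerate(DAYS):
        days = []
        for week in range(n_weeks):
            start = (week * 7 + day_index) * SAMPLES_PER_DAY
            days.append(load[start:start + SAMPLES_PER_DAY])
        # The first 19 weeks are used for training; the last week is the target.
        subsets[name] = {"train": days[:train_weeks], "test": days[train_weeks:]}
    return subsets
```

Each subset then contains 19 training days and one test day of 48 half-hourly values.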


Evaluation Functions
For the performance evaluation of the prediction model, three common performance evaluation metrics were used: mean squared error (MSE), mean absolute error (MAE) and mean absolute percentage error (MAPE). Table 6 shows the mathematical models of the three evaluation functions.

Table 6. Evaluation functions for three prediction models.

Metrics: Mathematical Model
MSE: $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(p_i - t_i)^2$
MAE: $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|p_i - t_i\right|$
MAPE: $\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{p_i - t_i}{t_i}\right| \times 100\%$

where $p_i$ represents the predicted data, $t_i$ represents the real data and $n$ is the number of samples.
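The three metrics can be computed as in the following sketch, which uses their standard definitions with `pred` for the predicted data $p_i$ and `true` for the real data $t_i$. Expressing the MAPE as a percentage is an assumption about the convention used in the paper.

```python
def mse(pred, true):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)


def mae(pred, true):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def mape(pred, true):
    """Mean absolute percentage error in percent; real values must be nonzero."""
    return 100.0 * sum(abs((p - t) / t) for p, t in zip(pred, true)) / len(true)
```

All three are error measures, so smaller values indicate better predictions.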

Experimental Analysis of VMD Noise Reduction
From Figure 8, it can be seen that the historical load data had strong volatility and non-linear characteristics. Therefore, it was necessary to use VMD for noise reduction of the load data. The parameters of the VMD were set as follows: a penalty factor of 2000, an initial center frequency of 0 and a convergence factor of $1 \times 10^{-7}$. Figure 9 shows the IMF and Res components obtained by decomposing the load data. From Figure 9, it can be observed that the decomposed sequences were regular and had a certain periodicity. The frequencies were relatively stable and there was no obvious spectrum aliasing.
The training set of each component sequence after VMD decomposition was normalized and input into the CISSA-LSSVM power load forecasting model, and the final forecasting results were obtained by summing. With the MAPE used as the fitness function, a smaller value indicates a higher accuracy of the prediction model. Table 7 shows the MAPE values of the VMD-CISSA-LSSVM model compared with the CISSA-LSSVM model without VMD noise reduction.
From the analysis of the prediction performance and trends of the different competing models shown in Figure 10, the prediction results of the VMD-CISSA-LSSVM model were the most accurate, very stable and very close to the trend of the real load data.
Second, the FA-CSSA-ELM, CISSA1-LSSVM and CISSA2-LSSVM models performed relatively well. However, these models also showed large fluctuations, resulting in unstable prediction results. The three independent forecasting models had the worst performance. In particular, the Elman model showed the most dramatic fluctuations and was the furthest from the real load data.
In addition, the predictive performance of the different competing models was analyzed more closely using the three evaluation functions described above. Table 8 shows the values of the evaluation functions of the different competing models for the seven data subsets. The smaller the evaluation value, the better the prediction performance of the model. Based on the values of the three evaluation functions presented in Table 8, the corresponding bar graphs are plotted for a more visual analysis, as shown in Figure 11.

From the analysis of the data presented in Table 8 and Figures 10 and 11, we can draw the following conclusions: compared to the LSSVM, ELM and Elman independent forecasting models, the average MSE values of the VMD-CISSA-LSSVM model were reduced by 69.8%, 87.8% and 86.7%, respectively; the average MAPE values were reduced by 66.1%, 89.0% and 87.6%, respectively; and the average MAE values were reduced by 69.2%, 87.6% and 86.5%, respectively. This also demonstrates the inability of independent forecasting models to achieve a forecasting accuracy that meets modern requirements.
Compared to independent forecasting models, combined forecasting models provided more accurate forecasts and trends that were closer to the true historical load. Compared with the SSA-LSSVM, PSO-Elman and GWO-ELM models, the average MSE values of the VMD-CISSA-LSSVM model were reduced by 62.3%, 61.7% and 60.9%, respectively; the average MAPE values were reduced by 59.7%, 58.2% and 60.1%, respectively; and the average MAE values were reduced by 59.1%, 58.0% and 60.2%, respectively.
The combined models based on the basic optimization algorithms also achieved good prediction results, but their prediction accuracy remained comparatively low. Compared with the CISSA1-LSSVM, FA-CSSA-ELM and CISSA2-LSSVM models, the average MSE values of the VMD-CISSA-LSSVM model decreased by 25.3%, 36.3% and 54.8%, respectively; the average MAPE values decreased by 13.7%, 34.4% and 54.4%, respectively; and the average MAE values decreased by 12.0%, 32.8% and 53.7%, respectively.
In summary, the combined VMD-CISSA-LSSVM prediction model proposed in this paper had the most outstanding prediction performance and was the most stable model; it could follow the trend of the historical load data very well. The combined VMD-CISSA-LSSVM model had the lowest MSE value, indicating that its predicted values deviated the least from the true values. It also had the smallest MAE value, indicating the smallest absolute prediction error, and the smallest MAPE value, indicating the smallest relative prediction error.
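The percentage reductions quoted above follow from simple arithmetic on the averaged metrics. The sketch below assumes that each reduction is computed from the metric averaged over the seven daily subsets, relative to the competing model; the paper does not state the exact procedure, and the function names are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def percent_reduction(baseline, proposed):
    """Relative reduction of the proposed model's error with respect to a baseline, in %."""
    if baseline == 0:
        raise ValueError("baseline error must be nonzero")
    return (baseline - proposed) / baseline * 100.0


def average_metric_reduction(baseline_per_subset, proposed_per_subset):
    """Reduction computed from the subset-averaged metric values (assumed procedure)."""
    return percent_reduction(mean(baseline_per_subset), mean(proposed_per_subset))
```

For instance, an averaged error that falls from 200 to 50 corresponds to a reduction of 75%; these numbers are purely illustrative and are not results from Table 8.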

Conclusions
The innovations of this paper can be summarized as follows:
1. A detailed analysis of the iterative optimization search process of the SSA algorithm is presented. The CISSA algorithm is proposed to address the drawbacks of an uneven initial population distribution, the ease of falling into a local optimum and the slow convergence of the sparrow search algorithm. The CISSA algorithm uses the improved tent mapping strategy in the initialization phase of the population. The random following strategy, taken from the chicken swarm optimization algorithm, is used in the iteration stage of the algorithm. The Levy flight strategy, taken from the cuckoo search algorithm, is used in the global optimization phase;
2. The experimental results on the eight benchmark functions show that the improvement strategies are collaborative and complementary. The CISSA algorithm had a better convergence performance, search capability and solution stability, and its overall performance was significantly improved compared with the SSA algorithm;
3. A new VMD-CISSA-LSSVM model for power load forecasting is proposed. The load forecasting simulation results demonstrate that the VMD-CISSA-LSSVM forecasting model had the highest forecasting accuracy and the most stable forecasting results, and could follow the trend of the historical load data very well. The numerical comparison with competing models on the three evaluation functions shows the superiority of the VMD-CISSA-LSSVM prediction model. Therefore, the VMD-CISSA-LSSVM model can provide reasonable decision-making and production guidance to the authorities.

Figure 1. Block diagram of the SSA algorithm operation.

Figure 2. Four common chaotic map bifurcation graphs: (a) is the bifurcation graph of the logistic chaotic map; (b) is the bifurcation graph of the tent chaotic map; (c) is the bifurcation graph of the sine chaotic map; and (d) is the bifurcation graph of the Hénon chaotic map.

Figure 3. Simulation of a Levy flight path in the 2D plane.


Figure 6. Flowchart for the transmission of electrical loads in the transmission grid.

Figure 7. Convergence curves for eight functions: (a) indicates the convergence curve for the F1 function; (b) indicates the convergence curve for the F2 function; (c) indicates the convergence curve for the F3 function; (d) indicates the convergence curve for the F4 function; (e) indicates the convergence curve for the F5 function; (f) indicates the convergence curve for the F6 function; (g) indicates the convergence curve for the F7 function; (h) indicates the convergence curve for the F8 function.

Figure 9. The VMD decomposition provided the IMF component and the Res component.

Figure 10. Final daily power load forecast value of different forecast models. (a) Prediction results of different prediction models in the Monday subset; (b) prediction results of different prediction models in the Tuesday subset; (c) prediction results of different prediction models in the Wednesday subset; (d) prediction results of different prediction models in the Thursday subset; (e) prediction results of different prediction models in the Friday subset; (f) prediction results of different prediction models in the Saturday subset; (g) prediction results of different prediction models in the Sunday subset.

Figure 11. Statistics of evaluation indexes of different power load competition models. (a) Comparison of MAE values for different competition forecasting models; (b) comparison of MAPE values for different competition forecasting models; (c) comparison of MSE values for different competition forecasting models.

Figure A1. Final daily power load forecast values of different forecast models. (a) Prediction results of different prediction models in the Monday subset; (b) prediction results of different prediction models in the Tuesday subset; (c) prediction results of different prediction models in the Wednesday subset; (d) prediction results of different prediction models in the Thursday subset; (e) prediction results of different prediction models in the Friday subset; (f) prediction results of different prediction models in the Saturday subset; (g) prediction results of different prediction models in the Sunday subset.

Table 1. A comprehensive analysis of the relevant literature.

Table 3. Parameter setting for each algorithm.

Table 4. Optimal results of different intelligent algorithms.

Table 5. Wilcoxon signed rank test p-value.

Table 8. Evaluation index statistics of different power load competition models.