A Hybrid GA–PSO–CNN Model for Ultra-Short-Term Wind Power Forecasting

Abstract: Accurate and timely wind power forecasting is essential for achieving large-scale wind power grid integration and ensuring the safe and stable operation of the power system. To overcome the inaccuracy of wind power forecasting caused by randomness and volatility, this study proposes a hybrid convolutional neural network (CNN) model (GA–PSO–CNN) integrating a genetic algorithm (GA) and particle swarm optimization (PSO). The model can establish feature maps between factors affecting wind power, such as wind speed, wind direction, and temperature. Moreover, a mix-encoding GA–PSO algorithm is introduced to optimize the network hyperparameters and weights collaboratively, which solves the problem of subjective determination of the optimal network in the CNN and effectively prevents local optimization in the training process. The prediction effectiveness of the proposed model is verified using data from a wind farm in Ningxia, China. The results show that the MAE, MSE, and MAPE of the proposed GA–PSO–CNN model decreased by 1.13–9.55%, 0.46–7.98%, and 3.28–19.29%, respectively, in different seasons, compared with the Single–CNN, PSO–CNN, ISSO–CNN, and CHACNN models. The convolution kernel size and number in each convolution layer were reduced by 5–18.4% in the GA–PSO–CNN model.


Introduction
The unsustainability of conventional energy sources, in addition to their knock-on effects on local- and global-scale climate, necessitates the use of naturally occurring energies [1]. The use of renewable energy helps human beings pass the world on to future generations [2]. Developing wind power is an important measure to alleviate energy shortages [3], reduce environmental pollution, and combat global climate change. The global installed capacity of wind power is increasing rapidly; in 2020, it reached 651 GW, over three times the 2010 total. The demand for generating more electricity is attracting considerable attention at a rapid pace [4]. Wind power is the fifth-greatest power source in the world after coal, natural gas, water power, and nuclear power [5]. Wind power is generated from the kinetic energy of the air moving through a wind turbine [6]. It is affected by a variety of meteorological factors, including temperature, humidity, and pressure, with strong randomness, volatility, and instability [7,8]. After connecting to the power grid, wind power introduces a new factor of uncertainty and may negatively affect grid integrity in terms of power quality, system security, and system stability due to its rapid changes and its intermittent and diffuse nature [9]. Accurate wind power forecasting can effectively reduce the uncertainty of wind power to a smaller error range. It allows operators to adjust the dispatching plan and optimize the grid structure, reducing the damage that wind power grid connection causes to grid stability and improving the wind power utilization rate and economic benefits [10].
A variety of models have been proposed for wind power forecasting. Early research mainly focused on physical models based on numerical weather forecasting [11,12] and statistical models (the persistence method [13], ARIMA [14,15]). Physical models use influencing factors (such as contour lines, air pressure, and temperature) and equipment characteristics around the wind farm to establish complex mathematical models. Li et al. [16] proposed a physical approach to wind power prediction. Cassola and Burlando [17] used Kalman filtering to correct the physical model output and improve the wind speed forecasting performance. This type of model usually requires significant computing resources and time, with poor results in short-term prediction [18]. In contrast, statistical models do not require complex hydrodynamic and thermodynamic models; they require less work and perform better in short-term forecasting. Chen and Yu [19] used Kalman filtering to improve the persistence method, which reduces the prediction error and improves the stability of the system. Milligan et al. [20] used the auto-regressive moving average (ARMA) model to forecast the power of wind farms in Oregon, USA. Based on improvements to the autoregressive (AR) model, Zhang et al. [21] proposed an innovative autoregressive dynamic adaptive (ARDA) model, which was more accurate than the ARMA model, with faster calculation and better dynamic adaptability to data fluctuations. In addition, some scholars have proposed combination models; for example, Liu et al. [22] proposed an ultra-short-term forecasting method for wind power and wind speed based on the Takagi-Sugeno fuzzy model, and Liu et al. [23] developed a forecasting system based on a data pretreatment strategy, a modified multi-objective optimization algorithm, and several forecasting models to further enhance wind power forecasting performance and stability.
With the development of computer technology, shallow artificial neural network (ANN) models based on artificial intelligence have been widely used in wind power prediction. Traditional methods have higher requirements on the time series of historical data and often use linear methods to deal with nonlinear loads. Artificial intelligence methods have gradually become a research hotspot in recent years [24]. ANN models can effectively capture the nonlinear relationships in data with a stronger fitting ability and wider application prospects. Han et al. [25] proposed a BP neural network model for wind power prediction based on the Tabu search algorithm and demonstrated that the model could improve prediction accuracy and convergence speed with appropriate input parameters. Han et al. [26] reported that the training speed and prediction accuracy of an RBF network was better than those of a BP network in 3 h wind power prediction. Carolin Mabel and Fernandez [27] used a fully connected neural network (FNN) to forecast the wind energy power of the Muppandal wind farm in Tamil Nadu (India), reporting good agreement with the measured values of the actual wind farm, verifying the accuracy of the model. Wang et al. [28] proposed a hybrid model based on empirical mode decomposition and an Elman neural network (ENN) to predict wind speed. Similar wind power prediction research using ANNs can be found in [29][30][31].
With the limited information extracted from complex wind power data by shallow neural network models, it is difficult to conduct a comprehensive analysis of the factors affecting wind speed [32]; the requirements of deep data mining cannot be met [33][34][35]. Recently, deep learning network models (such as the deep belief network (DBN), recurrent neural network (RNN), and convolutional neural network (CNN)) have become popular solutions for wind power prediction. Unlike a shallow ANN, a deep neural network can spontaneously extract effective information from a large number of influencing factors, with a stronger nonlinear mapping ability [36,37]. Among deep neural networks, a CNN can directly process multi-dimensional data samples, which is conducive to the extraction of inherent data features. Its unique weight-sharing structure can reduce the number of parameters, reducing the complexity of the network and effectively preventing overfitting. Several studies have applied CNNs to wind speed prediction. Solas et al. [38] proposed a short-term wind power prediction method based on a CNN and demonstrated better wind power forecasting performance than ARIMA and gradient boosting models. Wang et al. [39] decomposed raw wind power data into different frequencies using the wavelet transform and effectively trained the CNN to improve prediction accuracy. Liu et al. [40] proposed a novel approach to predicting short-term wind power by converting time series into images and using a CNN to analyze them; compared with state-of-the-art techniques in wind power prediction (GRU, LSTM, and RNN), superior performance was demonstrated.
To further improve CNN performance, some studies have integrated different methods into the CNN prediction model. Wang et al. [18] combined the information of nearby wind farms and used a CNN for short-term wind speed prediction to overcome the problem of insufficient historical wind speed data from new wind farms. Ju et al. [41] combined the LightGBM classification algorithm with a CNN; their results showed that this model had better prediction accuracy than SVM, LightGBM, DNN, and CNN models when considering the volatility of ultra-short-term wind power. Similarly, Liu et al. [42] combined wavelet packet decomposition with a CNN, and Yildiz et al. [43] combined variational mode decomposition with a CNN. Zhang et al. [44] and Kuang et al. [45] introduced bidirectional LSTM into the CNN, forming novel hybrid wind speed forecasting models.
Although a CNN provides good prediction performance and wide applicability, efficiently determining the hyperparameters of the network (such as the number of convolutional layers and the size and number of kernels) and the connection weights (weights and bias) is always difficult. Traditional research usually uses manual parameter adjustment or empirical values to determine the relevant parameters, which requires significant manpower and material resources and cannot always produce the optimal parameter combination [46]. To obtain the optimal CNN parameters, some studies have applied an intelligent optimization algorithm such as a genetic algorithm (GA) or a particle swarm optimization (PSO) algorithm in the model. Oullette et al. [47] used a GA to obtain the optimal connection weight of a CNN. To solve the problem of slow training speed, Fedorovici et al. [48] combined the Gravity Search Algorithm, PSO, and a BP algorithm to prevent the network from falling into a local minimum while ensuring its fast convergence. Albeahdili et al. [49] used the connection weight and error function of a CNN as the particle and fitness function of a PSO to improve the backpropagation stage of the CNN. Huang et al. [50] fused an improved, simplified PSO with the stochastic gradient descent method (SGD) to complete the training of a CNN. These studies focused mainly on the optimization of CNN connection weights; the structure of the network was still determined subjectively by trial-and-error and experience.
A few studies have optimized the CNN network hyperparameters (such as the size and number of convolution kernels). Chen et al. [51] used a GA algorithm to optimize the number of nodes in the hidden layer and the training function of a CNN, and proposed a short-term wind power prediction model. Dong et al. [52] used a GA-PSO hybrid algorithm to optimize the hyperparameters of the neural network framework. These studies focused only on the structural optimization of the CNN; the learning of network connection weights used the BP algorithm. In actual problems, efficiently determining the optimal hyperparameter combination of the network while preventing the model from falling into a local optimal condition still requires further research.
Ultra-short-term prediction covers the range of minutes to hours, usually within 6 h, which is helpful in optimizing frequency modulation and spinning reserve capacity [53]. Ultra-short-term forecasting is mainly used to control the daily operation of wind farm units and the real-time dispatching of a power grid; it is the forecasting horizon with the highest theoretical accuracy. It is a key index of power system real-time dispatching and the foundation of the intraday and real-time electricity markets, and it has an important influence on the operation, control, and planning of a power system [54,55]. Improving the prediction accuracy and reliability can reduce equipment costs, optimize the grid structure, and improve the wind power utilization rate and power quality. Decision-makers can determine a reasonable scheduling scheme according to the prediction results to ensure the safe and economic operation of the power system [56]. Ultra-short-term forecasting is also the basis of probability prediction; even a slight improvement in prediction accuracy is important in promoting the safe and economic operation of the power system [57]. Thus, ultra-short-term wind power prediction is critical.
Prediction errors for ultra-short-term wind power are caused mainly by inherent random factors and external random factors [51]. The inherent random factors include defects or imperfections in the prediction system itself; the external random factors include incomplete data or input data errors. Ultra-short-term wind power forecasting requires wind turbine operation data and meteorological data [10], and the integrity and authenticity of these data have a great impact on prediction accuracy. Unlike conventional resources (coal and fossil fuels), whose generation can follow consumption on specific and accurate schedules, the production of renewable energy is variable and can change dramatically from time to time [58], further increasing the difficulty of ultra-short-term wind power prediction.
Some studies have shown that the accuracy of ultra-short-term wind power prediction depends largely on the performance of correlation mining between data [59][60][61]. Given the complex space-time relationship of wind power output and inherent characteristics such as nonlinear randomness and intermittence, deep learning is an effective approach and can improve the accuracy of ultra-short-term wind power prediction. CNNs have been selected over other deep architectures as they are particularly suitable for data with a grid-based structure [62]. A time series represents a grid of samples spaced equally in time, and CNNs are able to model this type of data. In addition, a CNN can use the information in a time series to extract common features at different levels from high-dimensional time-series data, yielding more stable predictions. Unlike other deep learning models, a CNN uses weight-sharing technology, which involves fewer parameters, reducing the amount and complexity of data and improving the learning efficiency and overall generalization of the model.
In summary, CNNs have been widely used in complex wind power forecasting. Integrating the intelligent GA-PSO algorithm into a CNN provides a feasible approach for determining the CNN network structure (such as the number of convolution layers and the size and number of convolution kernels) or training the connection weights (weights and biases). However, most existing research focuses only on optimizing the network connection weights, and the network structure parameters are still determined subjectively. Thus, to further improve ultra-short-term wind power prediction accuracy and address the shortcomings of traditional CNN prediction models, in which the hyperparameters are difficult to determine objectively and the training is likely to fall into a local optimum, a hybrid GA-PSO-CNN model based on a mix-encoding GA-PSO algorithm is proposed in this study for ultra-short-term wind power prediction. The model uses related wind farm influencing factors as inputs (wind speed, wind direction, temperature, etc.), conducts a hybrid coding design for the relevant parameters of the CNN, and uses a GA-PSO hybrid algorithm to synergistically optimize the hyperparameters and connection weights of the CNN, which solves the problem of subjective determination of the optimal network and effectively prevents local optimization in the training process. This paper thus makes novel contributions in two respects. First, the GA-PSO hybrid algorithm in the proposed model realizes the automatic optimization of hyperparameters in the CNN; the model can effectively determine the optimal hyperparameter combination, avoiding tedious and inefficient manual tuning. Second, in the CNN training process, the global search capability of PSO is combined with the fast convergence of the BP algorithm to prevent the model from falling into a local optimum during training, which further improves the performance of the prediction model.

Brief Introduction to CNN
As a representative network of deep learning, a CNN is a feedforward neural network containing a convolutional structure. CNN research began in the 1980s. Time delay network and LeNet-5 were the earliest CNN structures. With the development of deep learning theory and numerical computing equipment, the CNN has developed rapidly and is used in computer vision [63], natural language processing [64], and many other fields.
Unlike other neural networks, a CNN provides local connection, weight sharing, and pooling operation. These features can simplify the network structure by reducing the number of weights and the complexity of feature extraction. CNN structure consists of a convolution layer, a pooling layer, a full connection layer, and an output layer, as shown in Figure 1.

Convolutional Layer
The convolutional layer is the core of a CNN and is used to extract data features. The weights of the convolutional layer are shared by all feature mapping channels; these weights are also known as convolutional kernels or filters. This unique "weight sharing" significantly reduces the number of network parameters and is suitable for deep networks. The calculation of the convolutional layer is expressed as

$$y_j^l = f\left(\sum_{i=1}^{C^{l-1}} x_i^{l-1} * w_j^l + b_j^l\right), \quad j = 1, 2, \ldots, C^l$$

where $x_i^{l-1}$ and $y_j^l$ represent the output feature map of the $i$th channel at the $(l-1)$th layer and the $j$th output map at the $l$th layer, respectively; $w_j^l$ and $b_j^l$ are the weight and bias of the $l$th layer, respectively; $C^{l-1}$ and $C^l$ denote the number of channels in the $(l-1)$th layer and the $l$th layer, respectively; $*$ denotes the convolution operation and $f(\cdot)$ the activation function.
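As a minimal illustration of the convolution formula above, the following NumPy sketch (the function name and the "valid" padding convention are our own assumptions, not the paper's exact implementation) computes one 1-D convolutional layer over a multichannel series, summing over input channels and adding the per-filter bias; the activation is applied separately.

```python
import numpy as np

def conv1d_layer(x, w, b):
    """Single 1-D convolutional layer: x has shape (C_in, T),
    w has shape (C_out, C_in, K), b has shape (C_out,).
    Returns feature maps of shape (C_out, T - K + 1) (valid padding)."""
    c_out, c_in, k = w.shape
    t_out = x.shape[1] - k + 1
    y = np.zeros((c_out, t_out))
    for j in range(c_out):          # output channel j (the jth filter)
        for t in range(t_out):      # position along the series
            # sum over input channels i of x_i^{l-1} * w_j^l, plus bias b_j^l
            y[j, t] = np.sum(x[:, t:t + k] * w[j]) + b[j]
    return y
```

Each filter `w[j]` is shared across every position `t`, which is the "weight sharing" the text describes.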

Pooling Layer
After the convolution layer extracts the features, the pooling layer selects the features and filters the information of the output feature images. Its functions are to enlarge the local features of the image, speed up the calculation, and reduce the probability of overfitting. The maximum pooling method is a commonly used pooling method; it is calculated as

$$p^{l(i,j)} = \max_{t \in W} a^{l(i,t)}$$

where $a^{l(i,t)}$ represents the activation value of the $t$th neuron of the $i$th channel in the $l$th layer, $p^{l(i,j)}$ is the value after pooling, and $W$ represents the range of the pooling area.
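The max pooling operation above can be sketched in NumPy as follows (a minimal version assuming non-overlapping windows; the function name is our own):

```python
import numpy as np

def max_pool1d(a, width):
    """Non-overlapping max pooling over a 1-D feature map a
    (shape (T,)): one maximum per window of size `width`."""
    t = (len(a) // width) * width        # drop any ragged tail
    return a[:t].reshape(-1, width).max(axis=1)
```

For example, pooling `[1, 3, 2, 5, 4, 0]` with a window of 2 keeps the maximum of each pair.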

Full Connection Layer
The full connection layer of a CNN is equivalent to the hidden layer of a traditional feedforward neural network and is used to obtain the output by a nonlinear combination of extracted features. It is calculated as

$$y^l = f(w^l x^{l-1} + b^l)$$

where $x^{l-1}$ is the input of the $(l-1)$th layer; $w^l$, $b^l$, and $y^l$ are the weight, bias, and output, respectively, of the $l$th layer; and $f(\cdot)$ is the activation function.
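The affine part of the fully connected layer is a single matrix-vector product (a minimal sketch; the nonlinearity is applied afterwards by the activation layer):

```python
import numpy as np

def fully_connected(x, w, b):
    """Dense layer pre-activation: w^l x^{l-1} + b^l,
    with w of shape (n_out, n_in), x of shape (n_in,), b of shape (n_out,)."""
    return w @ x + b
```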

Activation Layer
The activation layer is connected after the convolution layer and the full connection layer. Usually, Sigmoid is used as the activation function in the output layer, and ReLU is used in the others:

$$\mathrm{Sigmoid}(x) = \frac{1}{1 + e^{-x}}, \qquad \mathrm{ReLU}(x) = \max(0, x)$$

where x is the input variable of the layer.
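The two activation functions are direct one-liners in NumPy:

```python
import numpy as np

def sigmoid(x):
    # Sigmoid: squashes inputs into (0, 1); used in the output layer
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # ReLU: zeroes out negative inputs; used in the hidden layers
    return np.maximum(0.0, x)
```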

GA-PSO Hybrid Algorithm
PSO is a random search algorithm developed by Eberhart and Kennedy in 1995 that simulates the foraging behavior of birds [65]. PSO has few internal parameters and a simple structure, is easily implemented, and only requires abstracting the solution of the specific problem without relying on the underlying principle of the problem. PSO can iterate and converge quickly. However, a particle's position is updated mainly by comparing its own position with its surrounding positions and the current optimal position in the swarm. This single update mode makes the convergence rate inefficient in the later stage of the calculation, and the algorithm easily falls into a local optimum [66]. In comparison, a GA can improve the diversity of solutions owing to its different evolutionary operations, such as crossover and mutation. However, when a GA converges to a certain range, it often produces a large number of redundant, ineffective iterations, lengthening the calculation time and reducing the solution efficiency. Therefore, it is natural to combine PSO and GA to improve the performance of the algorithm.
In the later stages of the algorithm, the differences among the individuals in the particle swarm are reduced, resulting in a significant decline in the local search ability of the algorithm and susceptibility to local optima. Moreover, the search of PSO is not sufficiently fine-grained, so the global optimal solution can be missed. For complex high-dimensional problems, both the global and local search abilities of the algorithm are insufficient; PSO alone is suitable mainly for high-dimensional optimization problems with multiple local optimal solutions and low accuracy requirements.
In this study, a GA-PSO hybrid algorithm was proposed by combining PSO with the GA mutation operation. After the PSO iteration of each generation is completed, the current global optimal particle is found and saved. A mutation operation is then conducted at specific sites of each particle. This step increases the population richness of the particle swarm so that the algorithm maintains a strong local search ability in later iterations. After the mutation operation, a particle is randomly selected from the new population and replaced with the previously saved global optimal particle, ensuring that the global optimal particle enters the next iteration of the algorithm; this improves the convergence speed and search ability of the algorithm. As an integrated algorithm, GA-PSO combines the merits of GA and PSO while avoiding their drawbacks, and it has a stronger optimization capability than GA or PSO alone [67]. The GA-PSO hybrid algorithm flowchart is shown in Figure 2. More details can be found in reference [68].
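The per-generation loop described in the text (PSO update, site-wise GA mutation, then elitist replacement of a random particle by the saved global best) can be sketched as follows. All parameter values, the Gaussian mutation, and the function names are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def gapso_generation(pos, vel, pbest, gbest, fitness,
                     w=0.7, c1=1.5, c2=1.5, pm=0.05):
    """One generation of a GA-PSO hybrid (minimization).
    pos/vel/pbest: arrays of shape (n_particles, dim); gbest: shape (dim,)."""
    n, d = pos.shape
    # --- standard PSO velocity and position update ---
    vel = (w * vel
           + c1 * rng.random((n, d)) * (pbest - pos)
           + c2 * rng.random((n, d)) * (gbest - pos))
    pos = pos + vel
    # save the current global optimal particle before mutating
    saved_gbest = gbest.copy()
    # --- GA mutation: perturb randomly chosen sites of each particle ---
    mask = rng.random((n, d)) < pm
    pos = pos + mask * rng.normal(0.0, 0.1, (n, d))
    # --- elitism: a randomly selected particle is replaced by the saved best ---
    pos[rng.integers(n)] = saved_gbest
    # re-evaluate and update the global best (monotonically improving)
    fits = np.array([fitness(p) for p in pos])
    best = pos[np.argmin(fits)]
    if fitness(best) < fitness(gbest):
        gbest = best.copy()
    return pos, vel, gbest
```

Because the saved global best always re-enters the population, the best fitness found never worsens across generations.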

Proposed GA-PSO-CNN Prediction Model
Hyperparameters in neural networks have always been a challenging problem, and they have a great impact on neural network performance. Empirical parameters in traditional research may only be applicable to a specific study; manual adjustment of parameters requires significant manpower and material resources. The gradient descent method also has obvious disadvantages in the neural network as it can easily fall into the local minimum point without obtaining the global optimum. Thus, a CNN hybrid prediction model (GA-PSO-CNN) optimized by a GA-PSO algorithm is proposed in this study. The model conducts a mixed coding design for the relevant parameters of the neural network; the hyperparameters use binary coding, and the connection weights use real number coding. The GA-PSO algorithm is used to iteratively optimize the whole particle; a mutation operator is added in the iteration process of binary coding to increase the diversity of the population, allowing the model to search for more hyperparameter combinations. The overall structure of the GA-PSO-CNN model is shown in Figure 3.
The GA-PSO-CNN model realizes synchronous optimization of hyperparameters and connection weights, avoiding cumbersome and inefficient manual parameter adjustment and empirical methods without the defects of the traditional gradient descent method to improve the accuracy and robustness of the prediction model.


Coding Scheme
In a CNN, the performance of the model is closely related to the hyperparameters. In this study, the number of convolutional layers, the kernel size, and the number of filters in each layer are the hyperparameters that must be optimized in the prediction model. The values of the hyperparameters are all integers; binary coding is used in the front part of the particle, which ensures that the integer type of the hyperparameters remains unchanged in the iteration process and is convenient for mutation operations in the genetic algorithm.
A specific coding example of the parameters in the GA-PSO-CNN model is shown in Figure 4. The hyperparameter value was obtained by summing the binary values controlling all positions of the parameter. The latter part of the particle was used to optimize the connection weights of the CNN. Different hyperparameter values established different CNN structures. The length of the particle was determined by the maximum hyperparameter value in the CNN. In the algorithm, according to the specific hyperparameter value, the weights of the corresponding position in the particle were iterated, and the other positions remain unchanged.
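The decoding step (summing the binary values that control a hyperparameter) can be illustrated with a toy convention. The base value and the unit weight per bit below are our own assumptions for illustration, not necessarily the paper's exact scheme from Figure 4.

```python
def decode_hyperparameter(bits, base=1):
    """Decode a binary-coded hyperparameter: the value is obtained by
    summing the contributions of all bit positions set to 1, on top of
    a minimum feasible value `base` (illustrative convention)."""
    return base + sum(bits)
```

For example, the bit segment `[1, 0, 1, 1]` with `base=1` decodes to 4, which might control, say, the number of filters in one convolutional layer.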


Selection of Mixed Algorithms
CNNs are widely used to predict ultra-short-term wind power. The model architecture depends mainly on the number of convolutional layers, the kernel size, the number of filters, and other relevant parameters. Different hyperparameter combinations greatly affect the model performance. In this study, the PSO algorithm was used to update the position of the binary code of the particle to realize automatic hyperparameter optimization. Considering that the solution richness of the PSO algorithm decreased in the later iterations, the GA mutation operation was added to the PSO to ensure that particles were not the same and to enhance the local optimization ability of the algorithm in later iterations. ISSO can effectively optimize complex parameter combinations in the solution space [69]. Compared with the PSO algorithm, ISSO has been demonstrated to converge more efficiently to high-quality solutions and is more suitable for continuous problems [70]. Thus, ISSO is used for updating iterations in the real number coding of particles.

Initialization Method
The initialization of weights in deep learning models is important for the training of the model. If the initial weights are too small, the signal gradually shrinks in the process of transmission, and differences are difficult to generate, which is not conducive to model convergence. If the initial weights are too large, the signal increases layer by layer in the transmission process, resulting in gradient explosion. Xavier initialization was used in this study to ensure that moderate initial weights were generated [55]. The Xavier initialization is shown in Equation (6):

$$W \sim U\left[-\sqrt{\frac{6}{fan_{in} + fan_{out}}},\ \sqrt{\frac{6}{fan_{in} + fan_{out}}}\right] \tag{6}$$

where $fan_{in}$ and $fan_{out}$ represent the number of nodes in the input and output, respectively. This method generates uniformly distributed initial weights to activate each layer.
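A minimal NumPy sketch of Xavier (uniform) initialization, assuming the standard uniform-distribution form with bound sqrt(6/(fan_in + fan_out)):

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, seed=0):
    """Draw a (fan_out, fan_in) weight matrix uniformly from
    [-sqrt(6/(fan_in+fan_out)), +sqrt(6/(fan_in+fan_out))]."""
    rng = np.random.default_rng(seed)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))
```

This keeps the variance of activations roughly constant from layer to layer, which is the property motivating the method.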

Fitness Function
In this study, the mean absolute error of the prediction model on the training set was selected as the fitness function [71]:

fit = (1/N) Σ_{i=1}^{N} |t(i) − y(i)|  (7)

where N represents the number of samples in the training set, and t(i) and y(i) represent the actual and predicted data, respectively.
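A minimal implementation of this fitness function might look like:

```python
import numpy as np

def fitness(t, y):
    """Mean absolute error between actual data t and predicted data y,
    as in Equation (7)."""
    t, y = np.asarray(t, dtype=float), np.asarray(y, dtype=float)
    return float(np.mean(np.abs(t - y)))

fitness([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])  # 0.5
```

A lower fitness value indicates a better particle (network structure plus weights).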

Model Process and Framework
Most previous studies addressed only a single CNN problem (such as difficulty in hyperparameter tuning or local minima). The proposed model optimizes both parts simultaneously to improve the performance of the neural network. The flow chart of the GA-PSO-CNN model is shown in Figure 5. The model uses GA-PSO combined with the BP algorithm to optimize the hyperparameters and connection weights of the network; the basic formulas can be found in [72,73]. Each particle is mixed-coded as P_m = [P_m^(1), P_m^(2)], m = 1, 2, …, M, where P_m^(1) is the binary part and P_m^(2) is the real part. The optimization process is described as follows.

Step 1: Initialize the inertia factor, the learning factor, the mutation probability (α) of the binary part and the real part, the maximum velocity of the particle (v_max), and the maximum number of iterations (max_g). According to the coding scheme, the hyperparameters use binary coding to control the number of convolutional layers, the kernel size, and the number of filters; the connection weights use real-number coding, with a one-to-one correspondence to the connection weights of the network nodes. Randomly generate M initial particles.
Step 2: Set the iteration counter g = 1.
Step 3: Calculate the fitness function value of each particle according to Equation (7).
Step 4: Update the current optimal solution P_g = [P_g^(1), P_g^(2)], the individual optimal solution P_i = [P_i^(1), P_i^(2)], and the global optimal value P_best.
Step 5: Update the particle velocities using Equation (8). The position of the binary part of the particle is updated according to Equations (9) and (10), and the position of the real part of the particle is updated according to Equation (11). The mutation operation of the GA algorithm is added in the process of updating the binary part.
x_(m,t+1,j) = 1 if ρ_[0,1) < s(V_(m,j)), and x_(m,t+1,j) = 0 otherwise  (10)

where V_m represents the velocity of particle m; r_1 and r_2 are random values in the interval [0, 1]; p_best is the individual optimal position reached by the particle; g_best is the global optimal position; s is the probability of changing the position of the particle; x_(m,t,j) represents the value of the jth site of particle m in the tth iteration; ρ_I is a random number generated within the interval I; g_j represents the value of the jth site of the global optimal particle; C_r and C_g are given constants; u_j is the speed parameter; U_j and L_j are the upper and lower bounds of the variable, respectively; and N_Var is the number of variables.
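For illustration, the binary-part update of Step 5 with a GA-style bit-flip mutation might be sketched as follows. The inertia factor `w`, learning factors `c1`/`c2`, `v_max`, and mutation probability are placeholder values, and the real-coded update of Equation (11) is omitted here.

```python
import numpy as np

def update_binary_part(x, v, p_best, g_best, w=0.7, c1=1.5, c2=1.5,
                       v_max=4.0, mut_prob=0.05, rng=None):
    """Binary-PSO velocity/position update with GA bit-flip mutation.

    v is updated from the individual and global bests; the sigmoid of v
    gives the probability s of setting each bit, and a small mutation
    keeps the particles from becoming identical in later iterations.
    """
    rng = rng or np.random.default_rng()
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
    v = np.clip(v, -v_max, v_max)
    s = 1.0 / (1.0 + np.exp(-v))                    # bit-set probability
    x = (rng.random(x.shape) < s).astype(float)     # stochastic position
    flip = rng.random(x.shape) < mut_prob           # GA mutation operation
    x[flip] = 1.0 - x[flip]
    return x, v
```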
Step 7: Perform n BP iterations on particle m. Update the value of the corresponding position in the real part of particle m according to the connection weight after iteration.
Step 8: If m ≤ M, then m = m + 1. Go to Step 7. If not, go to Step 9.
Step 9: If g ≤ max_g, then g = g + 1. Go to Step 3. If not, go to Step 10.
Step 10: The algorithm is finished, and the optimal parameter combination P* = [P_m*^(1), P_m*^(2)] of the CNN is obtained.

Case Study
The wind farm consisted of single-unit wind turbines with a capacity of 1500 kW. The total construction capacity was 300 MW, to be completed in 4 phases. As the output power of a wind farm is affected mainly by the surrounding environment, this study considered wind speed, wind direction, air pressure, temperature, density at hub height, and historical wind power as input features. The variables are presented in Table 1. The data used in this study came from approximately 45 wind turbines in the first phase of the wind farm project, in 2 parts. The first part was the actual output wind power from 1 January to 31 December 2012. The second part was the meteorological wind speed, wind direction, air temperature, pressure, and density at hub height from the wind farm over the same period. The sampling interval was 15 min, for a total of 35,040 groups. A few data values were missing; Lagrange interpolation was used to complete the data.

Data Description and Preprocessing
The data set was divided into 4 seasons: winter (December-February), spring (March-May), summer (June-August), and autumn (September-November). The first 70% of the data were used as the training set; the remaining 30% was the test set.

Time-Order Character
There is a time correlation in wind power data. In this section, the time-order character was established according to this correlation and used as input to predict the output at the next time point. The sliding window method was introduced to predict the output at each time point and compare it with the actual value. The wind speed, wind direction, atmospheric pressure, temperature, hub height density, and historical wind power output at the previous 4 time points were selected to form an input matrix x_1 to predict the wind power output y_1 at the next time point, as shown in Figure 6. Similarly, all characteristic variable values from the second to the fifth time points were selected to form an input matrix x_2 to predict the wind power output y_2 at the sixth time point, etc. By constructing time-order characters, the CNN can better learn the inherent features in wind power data.
However, when the time-order characters are too long, the model extracts too many redundant features, which reduces the prediction accuracy of the model and increases the time cost of training. Thus, time-order characters of different lengths were used as input to forecast the wind power output. Figure 7 shows the MSE and MAE obtained by the CNN prediction model with time-order characters of different lengths used as input. The horizontal axis represents the length of the time-order characters; the vertical axis represents the error value. In Figure 7, the first half of the error curve showed a downward trend, and the second half showed an upward trend. When the time-order character length was 4, the MSE and MAE simultaneously reached their minimum values. Thus, a time-order character length of 4 was chosen in this study.
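The sliding-window construction described above can be sketched as follows; the function name and the random demonstration data are illustrative, not from the study:

```python
import numpy as np

def make_windows(features, power, window=4):
    """Build input matrices x_k from the previous `window` time points
    and targets y_k as the wind power at the next time point."""
    X, y = [], []
    for k in range(window, len(power)):
        X.append(features[k - window:k])   # shape: (window, n_features)
        y.append(power[k])
    return np.array(X), np.array(y)

# 6 features per time step: wind speed, direction, pressure, temperature,
# density at hub height, and historical power (random data for illustration;
# the last column stands in for the power series).
data = np.random.default_rng(0).normal(size=(100, 6))
X, y = make_windows(data, data[:, -1], window=4)
# X.shape == (96, 4, 6), y.shape == (96,)
```

Each x_k is a (4 × 6) feature map, matching the input format the CNN convolves over.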
The parameter settings of each model are given in Table 2. The research was conducted using Python 3.5 and TensorFlow 1.4.0 on a personal computer with an i5-10300H CPU. To evaluate the prediction ability of the new CNN framework against the comparison models, 3 performance indicators were introduced: mean absolute error (MAE), mean square error (MSE), and mean absolute percentage error (MAPE).
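These three indicators can be computed as follows; the small `eps` guard against zero actual power is our assumption, not stated in the paper:

```python
import numpy as np

def evaluate(t, y, eps=1e-8):
    """Return the three indicators used here: MAE, MSE, and MAPE (%)."""
    t, y = np.asarray(t, dtype=float), np.asarray(y, dtype=float)
    mae = float(np.mean(np.abs(t - y)))
    mse = float(np.mean((t - y) ** 2))
    mape = float(100.0 * np.mean(np.abs((t - y) / (t + eps))))
    return mae, mse, mape

mae, mse, mape = evaluate([10.0, 20.0], [9.0, 22.0])
# mae = 1.5, mse = 2.5, mape ≈ 10.0 (approximately, given eps)
```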

Table 2. Parameter values for different models.

| Model      | Parameter    | Description                                    | Value          |
|------------|--------------|------------------------------------------------|----------------|
| Single-CNN | c            | Number of convolution layers                   | 2              |
|            | k1, k2       | Length of the convolution window               | 4/5            |
|            | f1, f2       | Number of convolution kernels in each layer    | 5/5            |
|            | epochs       | Maximum number of iterations in the network    | 200            |
| PSO-CNN    | n            | Number of particles in the swarm               | 20             |
|            | epochs1      | Maximum number of iterations of PSO            | 20             |
|            | epochs2      | Number of BP runs per iteration                | 10             |
|            | r            | Ratio of BP in each generation of particles    | 0.3            |
| ISSO-CNN   | n            | Number of particles in the swarm               | 20             |
|            | epochs1      | Maximum number of iterations of ISSO           | 20             |
|            | epochs2      | Number of BP runs per iteration                | 10             |
|            | Cr, Cv, Cg   | Parameters of the iterative formula            | 0.35/0.45/0.2  |
|            | r            | Ratio of BP in each generation of particles    | 0.3            |
| CHACNN     | kernel       | Interval of convolution kernel size            | [2, 6]         |
|            | filters      | Interval of filters in the convolution layers  | [2, 8]         |
|            | alpha        | Interval of learning rate in the network       | [0.05, 2]      |
|            | epochs1      | Maximum number of iterations of the algorithm  | 20             |
|            | epochs2      | Interval of maximum BP iterations              | [100, 250]     |
| GA-PSO-CNN | n            | Number of particles in the swarm               | 20             |
|            | epochs1      | Maximum number of iterations of GA-PSO         | 20             |
|            | epochs2      | Number of BP runs per iteration                | 50             |
|            | c1           | Interval of the number of convolution layers   | [1, 2]         |
|            | k1, k2       | Interval of convolution kernel size            | [2, 6]         |
|            | f1, f2       | Interval of filters in the convolution layers  | [2, 8]         |

Analysis of Results
To further verify the performance of the proposed model, each model was compared with GA-PSO-CNN, using MAE, MSE, and MAPE to measure prediction accuracy. To ensure the validity of the evaluation criteria, the input of all models in the experiment was the same. As the experimental results have a certain degree of randomness, each model was trained 50 times to prevent accidental results from a single run, and the error indicators were averaged over the 50 runs for comparison. Figure 8 shows the distribution of each performance index obtained from the 50 independent runs of each model. In each season, the GA-PSO-CNN model had higher prediction accuracy and a smaller fluctuation interval in each performance index than the comparison models, which verified the excellent prediction performance and robustness of the model. Table 3 shows the average MAE, MSE, and MAPE values for the different models in each season; the optimal value of each index is indicated in bold. In terms of MAE, MSE, and MAPE, the GA-PSO-CNN model had the best performance in all seasons, indicating that it was superior to the Single-CNN, PSO-CNN, ISSO-CNN, and CHACNN models in predicting ultra-short-term wind power. Compared with the benchmark model (Single-CNN), the GA-PSO-CNN model showed the most obvious improvement in the MAE, MSE, and MAPE indexes across seasons, demonstrating its superior performance in ultra-short-term wind power forecasting.
Due to the large climatic differences between seasons, 3 days were randomly selected from different seasons to draw the error box plot between the power predicted by the GA-PSO-CNN model and the actual wind power, as shown in Figure 9. In autumn and winter, the uncertainty of wind power prediction was relatively low; the error between the predicted and actual values was generally below 10%. The uncertainty of wind power forecasting was greater in summer, though the error was still within 20%: the summer climate conditions were more complex and rainy and were less conducive to wind power prediction.
To further analyze the fitting effect of GA-PSO-CNN on wind power in each season, 1 day was selected from each season to draw a comparison between the predicted values of GA-PSO-CNN and the actual values, as shown in Figure 10. GA-PSO-CNN exhibited good prediction performance in all seasons, and the prediction results were close to the actual values. When the climate conditions were complex and the wind speed changed greatly and rapidly (as in summer), the error indexes of the prediction models were relatively high, but the GA-PSO-CNN model produced the smallest error. In Figure 10, even with several wind power changes within a short period in summer, the GA-PSO-CNN model extracted the relevant features; the prediction curve was close to the actual value curve, indicating a good prediction. The formation of wind is closely related to air pressure, and the weather conditions vary greatly between morning and evening, as does the wind force. To further verify the wind power prediction accuracy improvements of the GA-PSO-CNN model, a month was randomly selected from the data set, and each day was divided into daytime (8:00-22:00) and nighttime (22:00-8:00 the next day) for prediction. Table 4 shows the prediction errors of each model in the different time periods.
During the day, the average power of the wind farm was 6.16 MW every 15 min; at night, the average power was 7.88 MW. As observed in Table 4, the influence of each disturbance factor was smaller at night, which was more conducive to wind power prediction; the prediction error of each model was significantly smaller at night than during the day. The GA-PSO-CNN model was only slightly inferior to the CHACNN model in the MAPE index at night (a 2.30% difference); in most other cases, it achieved the best results. Overall, the GA-PSO-CNN model was superior to the Single-CNN, PSO-CNN, ISSO-CNN, and CHACNN models in wind power prediction across the different time periods.
To study the wind power prediction performance of the model under different weather conditions, based on the local weather forecast, 3 days were selected in each season (a sunny, a cloudy, and a rainy day) to compare the performance of the models. Table 5 shows the performance evaluation results for each model. The forecast accuracy on rainy days was lower than on sunny days for all models in all seasons: on rainy days, the meteorological conditions were chaotic and there were many interference factors, and because wind power is sensitive to weather conditions, accurate prediction was difficult. Improving the prediction accuracy of wind power on rainy days has been a research focus in recent years. As Table 5 shows, the GA-PSO-CNN model exhibited the best performance of all prediction models, with the lowest MAE, MSE, and MAPE on most days. The error range of the GA-PSO-CNN model was the smallest of the models, even on cloudy and rainy days. The results further demonstrate the stability and effectiveness of the GA-PSO-CNN model.
The GA-PSO-CNN model was run 50 times independently on the data set to decode the optimal particles, and the size and number of convolution kernels in each layer were calculated, as shown in Table 6. Compared with the structural parameters selected by manual tuning, the GA-PSO-CNN model produced a simplified network structure. Averaged over the 50 runs, the size and number of convolution kernels in each layer decreased by 5-18.4%, indicating that the GA-PSO-CNN model has a stronger ability to resist overfitting and better generalization.
The GA-PSO-CNN model can accurately predict wind power in different seasons, time periods, weather, and other conditions. Compared with other models, the GA-PSO-CNN model has improved prediction accuracy and robustness. This model can be applied to ultra-short-term wind power prediction in different scenarios.

Conclusions
To solve the problem of wind power prediction caused by randomness and volatility, a hybrid GA-PSO-CNN prediction model is proposed in this study. The model uses CNN to extract and fit the high-dimensional features of wind farm data, integrates the intelligent algorithm GA and PSO, and optimizes the structure and parameters of CNN to obtain the optimal network structure and connection weight combination simultaneously.
The proposed GA-PSO-CNN model was applied to the power generation data of a wind farm in Ningxia. The experimental results demonstrate the improvements of the GA-PSO-CNN model in prediction accuracy and network structure simplicity. Compared with the Single-CNN, PSO-CNN, ISSO-CNN, and CHACNN models, the GA-PSO-CNN model reduced the MAE, MSE, and MAPE by 1.13-9.55%, 0.46-7.98%, and 3.28-19.29%, respectively, on data sets from different seasons. The GA-PSO-CNN model also yielded a simpler network structure than one with manually tuned hyperparameters, reducing the size and number of convolution kernels by 5-18.4%.