Wind Power Ramp Event Forecasting Based on Feature Extraction and Deep Learning

To improve the accuracy of wind power ramp forecasting and reduce the threat that ramps pose to the safe operation of power systems, this work proposes a wind power ramp event forecast model based on feature extraction and deep learning. First, the Optimized Swinging Door Algorithm (OpSDA) is introduced to detect wind power ramp events and to extract ramp features such as the ramp rate. Then, a ramp forecast model based on a deep learning network is established. The historical wind power and its ramp features are used as the model input, which strengthens the model's learning of ramp features and prevents them from being submerged in the complex wind power signal. A Convolutional Neural Network (CNN) extracts features from the model inputs to capture the coupling relationship between wind power and ramp features, and Long Short-Term Memory (LSTM) learns the time-series relationships in the data. The model outputs the forecast wind power, to which ramp detection is applied to obtain the ramp forecast. Finally, wind power data from the Elia website are used to verify the ramp forecast performance of the proposed method.


Introduction
As a clean, renewable energy source, wind energy is widely used in China. According to statistics released by the National Energy Administration, national grid-connected wind power capacity increased by 25.74 million kW in 2019, bringing the total installed capacity to 210 million kW [1]. As large-scale wind power integration grows, its uncertainty seriously affects the safe and stable operation of the power system. In particular, wind power ramps can cause an imbalance between power generation and load, creating serious safety hazards or economic losses for grid operation [2]. Accurate forecasts of ramp events are therefore essential for the safe operation and economic dispatch of power systems. Unlike wind power forecasting, ramp forecasting takes the event itself as the forecast object. There are two approaches to ramp forecasting: direct forecasting and indirect forecasting.
Direct forecasting trains a detection mechanism on historical ramp data and obtains the features of ramp events (such as ramp amplitude, ramp duration or ramp rate) directly, without forecasting wind power [3]. In [4], a model based on the autocorrelation statistics of ramps and a day-ahead forecast algorithm for ramp sequences were established. In [5], a Support Vector Machine (SVM) was used to classify and forecast the ramp amplitude directly from historical wind power data. However, direct forecasting has an obvious shortcoming: a large amount of historical wind power data is required to train the model, and the completeness of the ramp data contained in these historical records limits the forecast accuracy.
Indirect forecasting, currently the mainstream approach to wind power ramp event forecasting, first forecasts wind power and then applies ramp detection to the forecast to obtain the ramp forecast [6]. For example, Back Propagation (BP) and Radial Basis Function (RBF) neural networks were used in [7] and [8], respectively, to forecast the components of wind power decomposed by Atomic Sparse Decomposition (ASD). However, because a ramp is both an abrupt change and a low-probability event, it is difficult for traditional neural networks to learn the behavior of wind power during ramps, which limits their forecast accuracy. Deep learning networks, with their stronger learning capabilities, are now widely used in wind power forecasting. Extracting wind power features with a Convolutional Neural Network (CNN) can improve forecast accuracy: in [9,10], a CNN was used to extract hidden features of the coupling between wind energy, wind speed and wind direction, and in [11], a CNN extracted spatial features from a spatial wind speed matrix. Owing to its gated structure, Long Short-Term Memory (LSTM) can effectively learn the time-series behavior of wind power and yield more accurate forecasts [12]; in [13,14], LSTM was also used as a nonlinear forecasting mechanism. Although deep learning networks improve wind power forecasting, ramp features are not sufficiently considered in their inputs, which limits the accuracy of ramp forecasting.
To address these problems, many researchers have studied the model input and used feature data related to wind power as model inputs to improve forecast accuracy. For example, in [15], the effective meteorological variables from Numerical Weather Prediction (NWP) were used as model inputs. In [16,17], the long-term trend features of wind power were captured by physical models before forecasting. In [18], the spatial features of wind power were considered and a deep neural network was combined with multi-task learning, taking inputs from multiple wind farms simultaneously to forecast ramps based on the spatial correlation among wind farms. However, it remains uncertain whether the feature data used as inputs in these studies are directly related to ramp features.
In [19], abrupt changes in the wind power time series extracted by a CNN were used as the ramp feature, which was then fed to an LSTM to capture long-term ramp behavior. However, the CNN-based extraction of ramp features was not strictly defined. Because ramps are small-sample events within a wind power sequence, they are easily submerged by the overall wind power signal, and the ramp feature information is complex. If ramp features are not extracted according to the definition of a ramp, or are not used as inputs, models can hardly learn the complete feature picture, and effective ramp event forecasts are difficult to obtain.
In summary, this paper proposes a wind power ramp event forecast model based on the Optimized Swinging Door Algorithm (OpSDA), a Convolutional Neural Network and Long Short-Term Memory. First, OpSDA is used to detect historical wind power ramp events and extract their feature values. Then, a deep learning network automatically extracts hidden ramp features, effectively exploring the coupling relationship between power and ramps; this second extraction of ramp features by the CNN helps the forecast model learn small-sample ramp events. Next, the wind power and ramp features are used as the input of the LSTM model. Finally, ramp detection is applied again to the forecast wind power to obtain the forecast ramp events.

Features of Wind Power Ramp Events
A wind power ramp event is a phenomenon in which wind power changes greatly in one direction within a short time. Following [2], a ramp event is defined in this paper by Equation (1): a ramp event occurs when the absolute difference between the power at the beginning and at the end of a time period exceeds the decision threshold.

$$\left| P_{t+\Delta t} - P_t \right| > P_{\text{threshold}} \tag{1}$$

where $P_{t+\Delta t}$ is the power at time $t + \Delta t$, $P_t$ is the power at time $t$, and $P_{\text{threshold}}$ is the decision threshold, which is set to 3% of the installed capacity in this paper in accordance with [13].

Ramps can be divided into up-ramps and down-ramps according to their direction. An up-ramp is a sudden increase in wind power over a period of time ($P_{t+\Delta t} - P_t > 0$), and a down-ramp is a sudden decrease in wind power over a period of time ($P_{t+\Delta t} - P_t < 0$). Based on the performance analysis of wind power ramp events, the four important features that define a ramp event are shown in Figure 1: the ramp rate $R_R$, the ramp amplitude $R_{SW}$, the ramp start time $R_{ST}$ and the ramp duration $R_D$ [20].
Figure 1. Features of the wind power ramp event.
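As a minimal sketch of the ramp rule in Equation (1), the Python function below classifies the change between two power values as an up-ramp, a down-ramp or no ramp, using the 3% threshold stated above. The function name `classify_ramp` and its arguments are illustrative assumptions, not code from this paper; the 3796 MW capacity is the Elia value used later in the case study.

```python
def classify_ramp(p_start, p_end, capacity, threshold_ratio=0.03):
    """Apply the ramp rule of Equation (1) to one time period.

    Hypothetical helper: returns "up", "down" or None (no ramp).
    p_start is P_t, p_end is P_{t+dt}; the threshold is a fraction of capacity.
    """
    p_threshold = threshold_ratio * capacity  # P_threshold
    delta = p_end - p_start                   # P_{t+dt} - P_t
    if abs(delta) <= p_threshold:             # Equation (1) not satisfied
        return None
    return "up" if delta > 0 else "down"


if __name__ == "__main__":
    capacity_mw = 3796.0  # 3% of this is about 113.9 MW
    print(classify_ramp(1000.0, 1200.0, capacity_mw))  # up
    print(classify_ramp(1000.0, 1050.0, capacity_mw))  # None
    print(classify_ramp(1200.0, 900.0, capacity_mw))   # down
```

The inequality is strict, so a change exactly equal to the threshold is not counted as a ramp, matching the ">" in Equation (1).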
Wind power ramp events pose a serious threat to the power quality and safe operation of the power grid and can even cause frequency instability, load shedding and blackouts. To reduce these hazards, high-precision ramp forecasting is urgently needed to provide data support for grid dispatching [21]. Because wind power ramps are highly random and fluctuate strongly, and because ramp segments are more nonlinear than non-ramp segments, the features of ramp segments differ markedly from those of non-ramp segments. Wind power is therefore harder to forecast during ramps, and traditional wind power forecast methods produce large errors when a ramp occurs.
A forecast method that accounts for the influence of ramp features during forecasting to improve accuracy is proposed in [22]. Ramp features are considered for the following reasons: (1) a wind power ramp is a low-probability event within the entire wind power time series, so the impact of ramp features on the forecast must be fully considered when a deterministic model is used to forecast wind power; (2) the four key ramp features measure whether a ramp occurs and how severe it is. A larger ramp rate during a ramp indicates stronger nonlinearity in wind power, so forecast errors are higher for ramps than for non-ramps. Using ramp features as inputs of the forecast model therefore allows the model to learn the relationship between wind power and ramp events during training and improves forecast accuracy. It also supports the safe, stable and economic operation of grids with wind power, since it can reduce the reserve margin needed in dispatching and mitigate the impact of ramps on the balance between generation and load.

Ramp Detection and Feature Extraction Based on OpSDA
The Swinging Door Algorithm (SDA) is a data compression algorithm that filters samples according to the parallelogram rule, which constructs parallelograms based on an adjustable parameter ε (the door width) to divide a long sample into multiple short segments [23]. The initial and end points of each segment are detected as SDA piecewise points, and the data inside each segment are compressed. As shown in Figure 2, the first sample point A is taken as the initial point, and parallelograms with height 2ε are constructed in sample order to enclose the samples, such as parallelogram AC and parallelogram AD. Each time a parallelogram is constructed, the algorithm checks whether it covers all samples in the segment. When it cannot cover all of them, as with parallelogram AF, the previous sample point E is taken as the end point of the current segment and the initial point of the next [24]. Proceeding in this way, points A, E, I and K in Figure 2 are the SDA piecewise points: points B, C and D are compressed in the first segment, points F, G and H in the second and point J in the third.

In SDA, ramps are detected only between piecewise points according to Equation (1). As a result, a long-term ramp is often split into several adjacent small events, which degrades detection. Given an appropriate scoring function S, the Optimized Swinging Door Algorithm (OpSDA) is suitable for ramp detection on a long-term scale. An objective function P(i, j) is constructed for any wind power time series over the time interval (i, j), and ramps are identified by maximizing it. Starting from the SDA piecewise points, the optimized algorithm classifies adjacent events with the same ramp direction as a single event.
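The swinging-door segmentation can be sketched in Python as below. This is a minimal sketch, assuming uniformly sampled data and the common slope-corridor formulation of SDA: every sample in a segment must lie within ±ε of one straight line through the segment's initial point, which corresponds to the parallelogram of height 2ε described above. The function names are hypothetical. Ramps are then checked only between consecutive piecewise points with Equation (1), which is the SDA limitation that OpSDA addresses.

```python
def sda_piecewise_points(values, eps):
    """Return indices of swinging-door piecewise points (uniform sampling, eps >= 0)."""
    n = len(values)
    if n < 3:
        return list(range(n))
    points = [0]
    anchor = 0
    s_low, s_high = float("-inf"), float("inf")  # feasible slope corridor
    i = 1
    while i < n:
        dt = i - anchor
        # slopes of lines through the anchor that pass within +/- eps of sample i
        lo = (values[i] - eps - values[anchor]) / dt
        hi = (values[i] + eps - values[anchor]) / dt
        new_low, new_high = max(s_low, lo), min(s_high, hi)
        if new_low > new_high:
            # door closed: the previous sample ends this segment and starts the next
            anchor = i - 1
            points.append(anchor)
            s_low, s_high = float("-inf"), float("inf")
            continue  # re-examine sample i against the new anchor
        s_low, s_high = new_low, new_high
        i += 1
    if points[-1] != n - 1:
        points.append(n - 1)
    return points


def sda_ramps(values, eps, threshold):
    """Check Equation (1) only between consecutive SDA piecewise points."""
    pts = sda_piecewise_points(values, eps)
    ramps = []
    for a, b in zip(pts, pts[1:]):
        delta = values[b] - values[a]
        if abs(delta) > threshold:
            ramps.append((a, b, "up" if delta > 0 else "down"))
    return ramps


if __name__ == "__main__":
    series = [0, 1, 2, 3, 3, 3, 3, 2, 1, 0]
    print(sda_piecewise_points(series, eps=0.5))    # [0, 4, 7, 9]
    print(sda_ramps(series, eps=0.5, threshold=1.5))  # [(0, 4, 'up'), (7, 9, 'down')]
```

In this toy series the decline from index 6 to index 9 is split at piecewise point 7, and the segment (4, 7) does not pass the threshold, so SDA reports only part of the down-ramp. This fragmentation is what the merging step of OpSDA is meant to remove.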

 
The objective function is subject to a constraint, where S(i, k) is the score of the time interval (i, k) and R is the ramp rule. If Equation (1) is satisfied within the time interval (i, j), a ramp event occurs and R(i, j) = 1. To improve detection accuracy, OpSDA must also recognize bumps: events in the wind power time series that have a small amplitude but a direction opposite to that of the adjacent ramp. Taking up-ramp detection as an example, Equation (5) is used to determine whether a bump occurs. The wind power sequence is composed of ramps and non-ramps. To detect bumps, suppose that the set of all bump time intervals in the sample is given, with $t_{n,b}$ denoting the n-th non-ramp; the wind power sequence and the number of ramps in the sample are then updated accordingly.

The detection of wind power ramp events by OpSDA and the extraction of their features are shown in Figure 3. For indirect forecasting, the accuracy of ramp forecasting depends on the quality of ramp detection. Figure 4 shows an example of OpSDA detection applied to wind power data from a Belgian wind farm, taken from the Elia website [25], from 5:30 on 4 April to 2:00 on 6 April 2019. In the first and third up-ramps, the detected bumps are merged with the adjacent ramps into one event, which overcomes SDA's inability to detect long-term ramps. OpSDA can therefore effectively detect the ramp trend of the wind power sequence, providing the data basis for the second feature extraction.
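OpSDA itself maximizes a score-based objective function, and reproducing it requires the scoring function and constraints, which are not fully given here. The Python sketch below therefore uses a deliberately simplified stand-in (an assumption, not OpSDA): consecutive SDA segments whose power changes have the same sign are merged, Equation (1) is applied to each merged segment, and the four features $R_R$, $R_{SW}$, $R_{ST}$ and $R_D$ are computed. Bump handling is omitted. The 15-min sampling (0.25 h) matches the Elia data, so rates are in power units per hour; the function names are hypothetical.

```python
def merge_same_direction(values, points):
    """Merge consecutive SDA segments whose power change has the same sign.

    Simplified stand-in for OpSDA's merging of adjacent same-direction events.
    Returns (start, end, sign) tuples with sign in {-1, 0, 1}.
    """
    merged = []
    for a, b in zip(points, points[1:]):
        sign = (values[b] > values[a]) - (values[b] < values[a])
        if merged and sign != 0 and merged[-1][2] == sign:
            merged[-1] = (merged[-1][0], b, sign)  # extend the previous segment
        else:
            merged.append((a, b, sign))
    return merged


def extract_ramp_features(values, points, threshold, dt_hours=0.25):
    """Return one dict of ramp features per merged segment passing Equation (1)."""
    ramps = []
    for a, b, _ in merge_same_direction(values, points):
        amplitude = values[b] - values[a]       # R_SW (signed: up > 0, down < 0)
        if abs(amplitude) <= threshold:
            continue
        duration = (b - a) * dt_hours           # R_D in hours
        ramps.append({
            "start": a,                         # R_ST as a sample index
            "duration": duration,
            "amplitude": amplitude,
            "rate": amplitude / duration,       # R_R in power units per hour
        })
    return ramps


if __name__ == "__main__":
    series = [0, 1, 2, 3, 3, 3, 3, 2, 1, 0]
    sda_points = [0, 4, 7, 9]  # SDA piecewise points, given directly for illustration
    for ramp in extract_ramp_features(series, sda_points, threshold=1.5):
        print(ramp)
```

With these points, the two downward segments (4, 7) and (7, 9) are merged into a single down-ramp starting at index 4 with amplitude -3 over 1.25 h, instead of the shorter fragment plain SDA would report.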

Figure 3. OpSDA-based ramp detection and feature extraction: start; input wind power data; obtain the initial piecewise points by SDA; obtain the time interval (i, j) of each piecewise point in the sliding window.

Ramp Forecasting Based on Deep Learning
Compared with traditional machine learning methods, a deep learning network captures data features through multiple hidden layers and emphasizes model depth, so hidden features of the data can be mined and forecast accuracy can be effectively improved. However, it is difficult for a single type of deep learning network to learn complex wind power ramp features [26]. Therefore, this paper proposes a Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) forecast model that deeply learns the ramp features of wind power and then produces ramp event forecasts.

CNN
The CNN is a feedforward neural network and one of the representative deep learning algorithms, characterized by local connections, weight sharing, pooling operations and a multilayer structure [27]. Compared with a traditional neural network, a CNN can automatically extract effective local features of the data with lower computational complexity and stronger generalization ability. CNNs have therefore been widely used in wind power applications in recent years to extract the coupling relationships between wind power and other influencing factors, as well as the time-series relationships of features [28].
The typical structure of a CNN consists of convolutional layers, pooling layers and fully connected layers. In a convolutional layer, features are extracted from the input by a convolution operation, expressed by Equation (6):

$$x_j^{l} = f\left(\sum_{i} x_i^{l-1} * k_{ij}^{l} + b_j^{l}\right) \tag{6}$$

where $x_i^{l-1}$ is the output of the i-th feature map in the (l-1)-th layer, $x_j^{l}$ is the output of the j-th feature map in the l-th layer, $k_{ij}^{l}$ is the convolution kernel connecting them, $b_j^{l}$ is the bias, $*$ denotes convolution and $f(\cdot)$ is the activation function. The output of the convolutional layer undergoes a second feature extraction and information filtering in the pooling layer, which retains the most significant features. The pooling layer compresses the number of parameters and the amount of data, which effectively reduces overfitting and the complexity of the network. The pooling operation is given by Equation (7):

$$x_j^{l} = \mathrm{down}\left(x_j^{l-1}\right) \tag{7}$$

where $\mathrm{down}(\cdot)$ denotes a subsampling function.
Common pooling methods include mean-pooling, which takes the average feature value, and max-pooling, which takes the maximum feature value. In ramp forecasting based on ramp features, max-pooling is selected to extract the most important ramp feature information so that the subsequent LSTM model can effectively learn the patterns in the power and ramp-feature sequences.
Taking the time-series data of wind power and its four ramp features as input, this paper uses a one-dimensional convolution model to extract features, as shown in Figure 5. The model input consists of five one-dimensional feature maps of height n, where n is also the sliding window width of the model. The deep learning network uses multiple hidden layers to learn essential features effectively; here, the CNN has three convolutional layers and one max-pooling layer. Features are extracted by sliding the convolution kernels (filters) over the feature maps. The number of output feature maps equals the number of convolution kernels and determines the depth of the next layer. The first convolutional layer uses 4 kernels of size 2 (denoted 2 × 4) with a stride of 1, so without padding its output size would be (n - 1) × 4. To avoid losing ramp feature information and to keep the output length equal to that of the original feature map, the model applies all-zero padding ("same" padding), so the output of the first convolutional layer is n × 4.
Similarly, the second convolutional layer uses 16 kernels of size 2 (2 × 16), giving an output of n × 16, and the third uses 32 kernels of size 2 (2 × 32), giving an output of n × 32. As the number of feature maps increases layer by layer, the stacked convolutions can build more complex features from low-level ones. A nonlinear activation function $f(\cdot)$ is also needed to produce sparse representations that remove redundancy while retaining the features of the data as far as possible; the model therefore applies a rectified linear unit (ReLU), which generalizes well, after each convolutional layer. The pooling kernel size of the max-pooling layer is 32. After the max-pooling layer processes the feature maps produced by the last convolutional layer, the final output vector has size 1 × n and forms the input of the LSTM model.
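The shape bookkeeping above can be checked with a small NumPy sketch. It assumes the input is an n × 5 array (wind power plus four ramp features), uses random weights, applies "same" zero padding for the size-2 kernels (one zero appended at the end, which is how TensorFlow/Keras pads an even-sized kernel with stride 1) and interprets the size-32 max-pooling as pooling across the 32 feature maps at each time step, which is one reading that yields the stated 1 × n output. In Keras, each convolution would correspond to `Conv1D(filters, 2, padding="same", activation="relu")`; the helper names below are hypothetical.

```python
import numpy as np


def conv1d_same(x, kernels, bias):
    """1-D convolution with stride 1 and 'same' zero padding.

    x: (n, c_in); kernels: (k, c_in, c_out); bias: (c_out,) -> (n, c_out).
    """
    n = x.shape[0]
    k = kernels.shape[0]
    pad_left = (k - 1) // 2            # 0 for k = 2
    pad_right = k - 1 - pad_left       # 1 for k = 2
    xp = np.pad(x, ((pad_left, pad_right), (0, 0)))
    out = np.empty((n, kernels.shape[2]))
    for t in range(n):
        # sum over kernel positions and input channels
        out[t] = np.tensordot(xp[t:t + k], kernels, axes=([0, 1], [0, 1])) + bias
    return out


def cnn_feature_vector(x, filters=(4, 16, 32), kernel_size=2, seed=0):
    """Three ReLU conv layers, then max over feature maps; returns (vector, shapes)."""
    rng = np.random.default_rng(seed)
    h, shapes = x, []
    for c_out in filters:
        w = rng.normal(scale=0.1, size=(kernel_size, h.shape[1], c_out))
        h = np.maximum(conv1d_same(h, w, np.zeros(c_out)), 0.0)  # ReLU
        shapes.append(h.shape)
    return h.max(axis=1), shapes       # pool the 32 maps -> length-n vector


if __name__ == "__main__":
    n = 32                                       # sliding window width
    x = np.random.default_rng(1).random((n, 5))  # power + 4 ramp features
    vec, shapes = cnn_feature_vector(x)
    print(shapes)     # [(32, 4), (32, 16), (32, 32)]
    print(vec.shape)  # (32,)
```

Without padding, each size-2 kernel would shorten the sequence by one sample per layer; the single trailing zero keeps every layer at length n, which is the property the text relies on.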

LSTM
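LSTM has been widely used in wind power forecasting in recent years. It captures the dynamic changes of a time series, handles long-term dependencies in wind power and mitigates the vanishing-gradient problem during training. The basic unit structure of LSTM, shown in Figure 6, includes three control gates: the input gate, the forget gate and the output gate. The memory cell of LSTM retains important ramp features, while the forget gate discards unimportant ones, strengthening the network's ability to learn ramp features. The gates are computed as follows [29]:

$$f_t = \sigma\left(W_f [h_{t-1}, y_t] + b_f\right)$$
$$i_t = \sigma\left(W_i [h_{t-1}, y_t] + b_i\right)$$
$$\tilde{c}_t = \tanh\left(W_c [h_{t-1}, y_t] + b_c\right)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
$$o_t = \sigma\left(W_o [h_{t-1}, y_t] + b_o\right)$$
$$h_t = o_t \odot \tanh(c_t)$$

where σ is the sigmoid activation function and tanh is the hyperbolic tangent activation function; $y_t$ is the input vector at time t; $f_t$, $i_t$ and $o_t$ are the forget, input and output gates; $\tilde{c}_t$ is the candidate cell state; $c_t$ and $h_t$ are the cell state and hidden state; $W$ and $b$ are the corresponding weight matrices and biases; and ⊙ denotes element-wise multiplication.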
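A forward pass through the standard LSTM gate equations can be sketched in NumPy. This is an illustrative sketch with random weights, not the trained model; the stacked weight layout (forget, input, output, candidate), the per-step input size of 1 and the 16-output dense layer (the forecast-ahead step m used later) are assumptions for the example, while the 128 hidden units and the window of 32 follow the text.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(y_t, h_prev, c_prev, W, b):
    """One LSTM step; W: (4H, H + D), b: (4H,). Gate order: f, i, o, candidate."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, y_t]) + b
    f = sigmoid(z[:H])             # forget gate
    i = sigmoid(z[H:2 * H])        # input gate
    o = sigmoid(z[2 * H:3 * H])    # output gate
    c_hat = np.tanh(z[3 * H:])     # candidate cell state
    c = f * c_prev + i * c_hat     # new cell state
    h = o * np.tanh(c)             # new hidden state
    return h, c


def lstm_forecast(seq, hidden=128, horizon=16, seed=0):
    """Run the LSTM over seq (T, D) and map the last hidden state to `horizon` outputs."""
    rng = np.random.default_rng(seed)
    D = seq.shape[1]
    W = rng.normal(scale=0.1, size=(4 * hidden, hidden + D))
    b = np.zeros(4 * hidden)
    W_out = rng.normal(scale=0.1, size=(horizon, hidden))  # fully connected layer
    h, c = np.zeros(hidden), np.zeros(hidden)
    for y_t in seq:
        h, c = lstm_step(y_t, h, c, W, b)
    return W_out @ h, h


if __name__ == "__main__":
    seq = np.random.default_rng(1).random((32, 1))  # e.g. the 1 x n CNN output as 32 steps
    y_hat, h = lstm_forecast(seq)
    print(y_hat.shape, h.shape)  # (16,) (128,)
```

Because the output gate lies in (0, 1) and tanh lies in (-1, 1), every hidden-state component stays strictly between -1 and 1, while the cell state c carries information across steps without that squashing.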

Structure Design of CNN-LSTM
Building on Section 3.1, if only wind power is used as the input, the forecast model can hardly capture the ramp features of wind power. Therefore, the historical wind power and the corresponding ramp feature time series extracted by OpSDA are used as the input of the CNN-LSTM deep learning network established in this paper. They are arranged into feature maps by a sliding time window, and the wind power to be forecast is the output. The network thus learns the relationship between wind power and ramp features more deeply, which reduces the ramp forecast error. In this model, a CNN performs the second feature extraction on wind power and ramp features, and LSTM forecasts wind power. The structure of the CNN-LSTM model is shown in Figure 7.

(1) The input
Wind power and four ramp features form the input data set. The four ramp features are the ramp rate $R_R$, the ramp amplitude $R_{SW}$, the ramp start time $R_{ST}$ and the ramp duration $R_D$. To keep the ramp feature series continuous, the features are filled with zeros at non-ramp samples along the original time sequence, so that the feature dimension matches the wind power dimension. In Figure 7, P is the wind power data from the Belgian Elia website, with one data point every 15 min; t is any moment in the wind power series, m is the forecast-ahead step and n = 32 is the width of the sliding window. Thus, the input data set consists of the wind power and the four ramp features within the n-point sliding window ending at time t.

(2) Feature extraction based on CNN
As shown in Figure 7, the CNN model in this paper uses three convolutional layers and a max-pooling layer for deep learning and feature extraction. Because wind power ramp events occur only occasionally, the ramp feature data are discontinuous, in contrast to the continuous wind power sequence. The convolution kernels operate on the original feature maps, convolving all input series at the same time, so that the latent relationships among the data points in the feature map can be effectively extracted to form feature vectors. Furthermore, the CNN starts from small local regions and, through its deep structure, extracts the ramp features of the entire data set layer by layer.
A batch of wind power samples from the Elia website in January 2020 is used for illustration, and Figure 8a-c shows the outputs of the three convolutional layers for these samples. The horizontal axis shows the samples in order of sampling time, with a sampling interval of 15 min; the vertical axis shows the normalized wind power value, and the light pink background marks segments in which a ramp event occurred. As Figure 8 shows, the feature map outputs of all three convolutional layers change abruptly near the samples where a ramp occurs. In Figure 8a (the first convolutional layer), because there are few convolution kernels, only the second feature map captures the ramp feature accurately, while the other three feature maps show essentially no ramp trend. In Figure 8b, the 16 kernels of the second convolutional layer extract features whose outputs broadly describe the ramp trend, but the outputs fluctuate widely and are weakly concentrated. In Figure 8c, after the third convolutional layer, the ramp trend in the outputs is more concentrated: the 15th, 22nd and 30th feature maps change abruptly when a ramp occurs, and the 30th shows a strong mutation while remaining basically stable when there is no ramp. This indicates that adjacent samples of wind power ramp features are related, and the CNN's ability to capture local trends helps it learn this relationship and extract wind power ramp features.
(3) Forecasting based on LSTM
Because LSTM can learn the time-series relationships among samples, the ramp features extracted by the CNN and the wind power are used as the input of the LSTM model so that the relationship between ramps and wind power can be learned. To learn the underlying patterns without overfitting, a single-layer LSTM network with 128 neurons is used, and the wind power forecasts are output by a fully connected layer.
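The input construction of item (1) can be sketched as follows: ramp events are expanded into per-sample feature series that are zero outside ramps, and each sliding window of n = 32 samples is paired with the next m samples of wind power as the target. The event tuple format and the choice to repeat each event's feature values at every sample of that event are assumptions made for illustration, since the text does not specify how the features are laid out within a ramp; the function names are hypothetical.

```python
import numpy as np


def ramp_feature_series(length, events):
    """Per-sample ramp features (R_R, R_SW, R_ST, R_D), zero at non-ramp samples.

    events: iterable of (start, end, rate, amplitude, duration), end inclusive.
    Repeating each event's values over its samples is an illustrative choice.
    """
    feats = np.zeros((length, 4))
    for start, end, rate, amplitude, duration in events:
        feats[start:end + 1] = (rate, amplitude, start, duration)
    return feats


def build_windows(power, feats, n=32, m=16):
    """Pair each n-sample input window with the following m power values.

    Returns X with shape (N, n, 5) and Y with shape (N, m), N = len(power) - n - m + 1.
    """
    power = np.asarray(power, dtype=float)
    data = np.column_stack([power, feats])  # (T, 5): power + 4 ramp features
    X, Y = [], []
    for t in range(n, len(power) - m + 1):
        X.append(data[t - n:t])             # samples t-n .. t-1
        Y.append(power[t:t + m])            # samples t .. t+m-1
    return np.array(X), np.array(Y)


if __name__ == "__main__":
    T = 100
    power = np.linspace(0.0, 1.0, T)
    feats = ramp_feature_series(T, [(10, 20, 0.5, 0.2, 2.5)])
    X, Y = build_windows(power, feats)
    print(X.shape, Y.shape)  # (53, 32, 5) (53, 16)
```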
To meet dispatching requirements, the wind power ramp situation must be forecast over a longer horizon in advance. As shown in Figure 7, according to the National Energy Administration documents [30], wind farms report the forecast curve for the next 4 h every 15 min; that is, 16 future wind power data points must be forecast, so the forecast-ahead step is usually set to m = 16. To further improve forecast accuracy, this paper adopts a rolling multi-step forecast method in which the forecast data are updated every 15 min, and only the first forecast point is retained for 4 h-ahead forecasting.
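The rolling procedure can be sketched with a generic forecast function: at each 15-min step the model issues an m-step forecast from the latest n observations, only the first point is kept, and the window then advances with the newly observed value. `forecast_fn` is a hypothetical placeholder for the trained CNN-LSTM, and the persistence forecast in the example exists only to make the sketch runnable.

```python
def rolling_first_point(series, start, stop, forecast_fn, n=32, m=16):
    """Rolling multi-step forecasting that keeps the first point of each forecast.

    series: observed values; forecasts are issued at times start..stop-1 using
    only series[t-n:t]. forecast_fn(window, m) must return m values.
    """
    if start < n:
        raise ValueError("need at least n observations before the first forecast")
    kept = []
    for t in range(start, stop):
        window = series[t - n:t]       # observations available at time t
        path = forecast_fn(window, m)  # m-step-ahead forecast issued at t
        if len(path) != m:
            raise ValueError("forecast_fn must return m values")
        kept.append(path[0])           # retain only the first forecast point
    return kept


def persistence(window, m):
    """Toy stand-in model: repeat the last observed value m times."""
    return [window[-1]] * m


if __name__ == "__main__":
    observed = list(range(50))
    print(rolling_first_point(observed, 32, 40, persistence))
    # [31, 32, 33, 34, 35, 36, 37, 38]
```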
The structure of the CNN-LSTM model established in this paper is shown in Table 1, and the model is trained end to end. In summary, the overall procedure of multi-step forecasting based on ramp features and deep learning proposed in this paper is as follows:
Step 1: OpSDA is used to detect ramps in the historical wind power, and the four types of ramp features are extracted;
Step 2: A sliding window divides the input, composed of wind power and ramp features, which is fed into the CNN-LSTM model. The CNN extracts the ramp features again, and the LSTM produces 16-step-ahead forecasts. The final rolling multi-step wind power forecast is obtained by taking the first point of each multi-step forecast;
Step 3: Ramp detection is applied to the forecast power to obtain the forecast ramp events.

Ramp Forecasting Evaluation Indexes
According to [2], a ramp forecast has four possible outcomes: a ramp is forecast but does not actually occur; a ramp is not forecast but actually occurs; a ramp is forecast and actually occurs; and a ramp is neither forecast nor observed. The numbers of occurrences of these four outcomes are denoted $N_{FP}$, $N_{FN}$, $N_{TP}$ and $N_{TN}$, respectively. Many evaluation indexes exist for ramp forecasting, and no uniform standard has been specified, so this paper selects multiple indexes to evaluate the forecast model from different perspectives: the recall $R_C$, the precision $P_C$, the bias index $B_S$, the critical success index $C_{SI}$, the accuracy, the missing report rate, the false alarm rate, the up-ramp forecast accuracy rate and the down-ramp forecast accuracy rate. For the last two, $C_R$ and $C_{NR}$ are the numbers of correctly forecast up-ramps and down-ramps, and $N_R$ and $N_{NR}$ are the total numbers of actual up-ramps and down-ramps, respectively.
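The mean absolute percentage error is defined by

$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\frac{\left|P_{\text{actual},i} - P_{\text{forecast},i}\right|}{\bar{P}_{\text{actual}}} \times 100\%$$

where n is the number of wind power samples, $P_{\text{actual}}$ is the actual wind power value, $P_{\text{forecast}}$ is the forecast wind power value and $\bar{P}_{\text{actual}}$ is the average of the actual wind power.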
The recall is the probability that an actual ramp is forecast to occur. The precision is the probability that a forecast ramp actually occurs. The bias index expresses the ratio of the number of correctly forecast ramps to the actual number of ramps. The critical success index measures the accuracy of ramp forecasting and indicates the validity of the forecast results. The accuracy is the probability of correctly forecasting an event, whether ramp or non-ramp, and reflects the overall correctness of the model's ramp forecasts. The missing report rate is the probability that an actual ramp is forecast not to occur. The false alarm rate is the probability of forecasting a non-ramp as a ramp. The up-ramp and down-ramp forecast accuracy rates evaluate the model's performance for each ramp direction. The mean absolute percentage error is a common index for evaluating deterministic wind power forecast models; because this paper focuses on ramp forecasting, it is computed only over periods in which a ramp occurs.
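The indexes can be sketched in Python. Because the formulas for most indexes are not reproduced above, this sketch assumes the common contingency-table conventions: recall = N_TP/(N_TP + N_FN), precision = N_TP/(N_TP + N_FP), frequency bias = (N_TP + N_FP)/(N_TP + N_FN), CSI = N_TP/(N_TP + N_FP + N_FN), accuracy over all four outcomes, missing report rate = N_FN/(N_TP + N_FN) and false alarm rate = N_FP/(N_FP + N_TN). The bias convention in particular may differ from the paper's wording. The ramp-only MAPE normalizes by the mean actual power over ramp samples, which is also an assumption; all function names are hypothetical.

```python
def ramp_scores(tp, fp, fn, tn):
    """Contingency-table indexes under common conventions (assumed, see text)."""
    return {
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "bias": (tp + fp) / (tp + fn),       # frequency-bias convention
        "csi": tp / (tp + fp + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "missing_rate": fn / (tp + fn),
        "false_alarm_rate": fp / (fp + tn),  # share of non-ramps flagged as ramps
    }


def direction_rate(correct, actual):
    """Up-ramp (C_R / N_R) or down-ramp (C_NR / N_NR) forecast accuracy rate."""
    return correct / actual


def ramp_mape(actual, forecast, in_ramp):
    """MAPE (%) over ramp samples only, normalized by their mean actual power."""
    pairs = [(a, f) for a, f, r in zip(actual, forecast, in_ramp) if r]
    if not pairs:
        raise ValueError("no ramp samples")
    mean_actual = sum(a for a, _ in pairs) / len(pairs)
    mean_abs_err = sum(abs(a - f) for a, f in pairs) / len(pairs)
    return 100.0 * mean_abs_err / mean_actual


if __name__ == "__main__":
    print(ramp_scores(tp=8, fp=2, fn=1, tn=9))
    print(direction_rate(7, 8))
    print(ramp_mape([10, 20, 30, 40], [12, 20, 27, 40], [False, True, True, False]))
```

In the MAPE example only the two ramp samples count: their mean actual power is 25 and their mean absolute error is 1.5, giving 6%.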

Ramp Detection and Feature Extraction
In this section, wind power data from the Elia Belgian wind farm (installed capacity 3796 MW) for January 2020 are used for ramp detection. Ramp detection and feature extraction based on OpSDA were implemented in MATLAB R2019b, and the detection results are listed in Table 2, where the start time is given as a MATLAB serial date number ("datenum"). Owing to space limitations, only the features of the first 24 ramp events in January are shown. Taking 1000 wind power samples from January 2020 as an example, the ramp features detected and extracted by OpSDA are shown in Figure 9. Each rectangle in Figure 9 represents a ramp event: rectangles in the first quadrant are up-ramps and those in the fourth quadrant are down-ramps. On the horizontal axis, each rectangle starts at the wind power sample where the ramp begins and its width gives the ramp duration; the vertical axis gives the amplitude of each ramp.
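Table 2 stores start times as MATLAB serial date numbers. When the detection results are reused in Python for the CNN-LSTM stage, they can be converted with a small helper like the one below, a sketch assuming 15-min timestamps so that rounding to whole seconds is harmless; the helper name is hypothetical. MATLAB counts day 1 as 1 January of year 0, while Python's `datetime.fromordinal` counts day 1 as 1 January of year 1, hence the 366-day offset.

```python
from datetime import datetime, timedelta


def datenum_to_datetime(datenum):
    """Convert a MATLAB datenum (days since year 0) to a Python datetime."""
    days = int(datenum)
    seconds = round((datenum - days) * 86400)  # fractional day, whole seconds
    return datetime.fromordinal(days - 366) + timedelta(seconds=seconds)


if __name__ == "__main__":
    print(datenum_to_datetime(737791.5))         # 2020-01-01 12:00:00
    print(datenum_to_datetime(737791 + 1 / 96))  # 2020-01-01 00:15:00
```

Rounding to whole seconds absorbs the floating-point error in the fractional day, which for datenums near 737791 is on the order of microseconds.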

Performance Analysis of Ramp Forecast Model
The CNN-LSTM forecast model was implemented in Python using PyCharm Community Edition 2020 on a computer with 16 GB of RAM and an AMD Ryzen 7 4800H processor. The model was built with the Keras deep learning framework, a high-level application programming interface that can run on TensorFlow, Theano or the Microsoft Cognitive Toolkit (CNTK), enabling rapid model construction and experimentation. The forecast performance of CNN-LSTM was analyzed with data from the Elia website: for each quarter from October 2019 to September 2020, the first 500 points were used to train the network and the next 500 points to test it. This section also analyzes the model's performance with different parameters and compares it with other models to verify its feasibility.
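The quarterly train/test split can be sketched as below. It groups timestamped samples by calendar quarter and takes the first 500 points of each quarter for training and the next 500 for testing, as described above; the function name and the (timestamp, value) input format are assumptions, and any normalization applied before training is not shown.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def quarterly_split(timestamps, values, n_train=500, n_test=500):
    """Return {(year, quarter): (train, test)} using consecutive samples per quarter."""
    by_quarter = defaultdict(list)
    for ts, v in sorted(zip(timestamps, values)):
        by_quarter[(ts.year, (ts.month - 1) // 3 + 1)].append(v)
    splits = {}
    for key, seq in by_quarter.items():
        if len(seq) < n_train + n_test:
            raise ValueError(f"quarter {key} has only {len(seq)} samples")
        splits[key] = (seq[:n_train], seq[n_train:n_train + n_test])
    return splits


if __name__ == "__main__":
    step = timedelta(minutes=15)
    starts = [datetime(2019, 10, 1), datetime(2020, 1, 1)]
    ts = [s + k * step for s in starts for k in range(1200)]
    vals = list(range(len(ts)))
    for key, (train, test) in quarterly_split(ts, vals).items():
        print(key, len(train), len(test), train[0], test[0])
    # (2019, 4) 500 500 0 500
    # (2020, 1) 500 500 1200 1700
```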

Ramp Forecast Performance with Different Parameters
In the ramp detection process, the wind power data are first compressed by OpSDA according to the door width ε, yielding multiple SDA piecewise points, so the ε value affects both ramp detection and the subsequent ramp forecasting. In the multi-step forecast model, the forecast results also differ with the forecast-ahead step.
To evaluate the forecast results obtained with different parameters and to show how including ramp features in the model input improves accuracy, data from January 2020 are analyzed. The performance of the multi-step ramp forecast model with different forecast-ahead steps, door widths and inputs is evaluated by four indexes: $R_C$, $P_C$, $B_S$ and $C_{SI}$. Based on Equations (19) and (20), the relationship between the four evaluation indexes is shown in Figure 10. … higher than those obtained with power alone as input, and $R_C$ exceeds 0.9 in many cases. When ε = 25, the evaluation points lie mainly near the diagonal, showing that the model achieves higher recall and higher precision simultaneously than with ε = 50 or ε = 10. The evaluation results for 16 forecast-ahead steps (4 h) are clustered more closely, indicating a more stable forecast effect. Combined with the statistics in [31], which show that about 95% of wind power ramp events last less than 4.04 h, the forecast-ahead step is set to 16 steps (4 h) in this paper.
The yellow solid point in Figure 10 marks the optimal parameters: the input contains ramp features, the forecast-ahead step is 16 steps (4 h) and the door width is ε = 25. In this case, the precision $P_C$ is 0.8587, the best among the tested settings, and the recall $R_C$ is 0.9240; a higher $B_S$ and a better $C_{SI}$ are obtained at the same time. The CNN-LSTM forecast model in this paper therefore adopts these optimal parameters. In practical engineering applications, the optimal parameters can be set differently for different wind power data.

Performance Analysis of Different Forecast Models
In this section, the BP and LSTM models are compared with the proposed model, using data from the Elia website, to evaluate wind power ramp forecasting performance. The wind power data are processed by OpSDA to extract their ramp features, and the wind power and ramp features are used together as the input for training the CNN-LSTM model, the LSTM model and the BP model. From Table 3, the annual average of each index is obtained for each model and shown in Figure 11, which displays the forecast performance of the three models more intuitively. In the second quarter, the precision and recall of the […] model are both above 0.9; that is, correctly forecasted ramps account for more than 90% of both the forecasted ramps and the actual ramps. In general, every model forecasts down-ramps more accurately than up-ramps, i.e., up-ramps are more difficult to forecast.

The values listed for the compared models in Table 4 are as follows ("–" marks an index that is not reported):
CNN-LSTM: 0.8407, 0.9051, 0.9098, 0.8172, 0.9020
MLP-BT [16]: 0.8669, 0.9286, 0.8125, –, –
MLP-MSAR [17]: 0.8600, 0.8889, 0.8571, –, –
DNN-MTL(C4S1) [18]: 0.7133, –, –, 0.6567, 0.7047

The ramp forecast results for January 2020 based on CNN-LSTM are shown in Figure 12.
Figure 12a shows the 16-step-ahead wind power forecasts of the three models, obtained from the original wind power and its ramp features. The final ramp forecast results, obtained by re-detecting ramps in these forecasts with OpSDA, are shown in Figure 12b-d. The display method for ramp forecast results proposed in this paper shows the forecast effect of each model more intuitively: both the ramp amplitude and the ramp time are provided as forecast results, giving the power grid a better basis for dispatch. Figure 12 also shows severe fluctuations in the forecasts of the LSTM and BP models; in particular, the BP model forecasts a long-term ramp event as multiple events with short durations but large amplitudes. Figure 12b shows clearly that the CNN-LSTM model performs better in forecasting both the ramp amplitude and the ramp duration. After the CNN extracts the ramp features a second time, the LSTM can effectively learn the continuity of the wind power and the ramps. CNN-LSTM can thus provide ramp forecast results for long-term wind power, which benefits the safe operation and economic dispatch of the power system.
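Comparing re-detected forecast ramps with actual ramps requires a matching rule, which is not specified in this section. As an assumption, the sketch below (hypothetical function `match_ramps`) counts a forecast ramp as a hit when it has the same direction as, and overlaps in time with, an actual ramp that has not already been matched. Under such a one-to-one rule, a long ramp forecast as several short events, as the BP model does in Figure 12, yields one hit and several false alarms, which lowers precision.

```python
def match_ramps(actual, forecast):
    """Count hits, false alarms and misses between two lists of ramp events.

    Each event is a (start, end, direction) tuple with start <= end in time steps.
    Matching is greedy and one-to-one: every actual ramp can be hit at most once.
    """
    matched = set()  # indices of actual ramps already matched
    hits = 0
    for f_start, f_end, f_dir in forecast:
        for k, (a_start, a_end, a_dir) in enumerate(actual):
            overlaps = f_start <= a_end and a_start <= f_end
            if k not in matched and f_dir == a_dir and overlaps:
                matched.add(k)
                hits += 1
                break
    false_alarms = len(forecast) - hits
    misses = len(actual) - hits
    return hits, false_alarms, misses


actual = [(0, 30, "up")]
fragmented = [(0, 5, "up"), (10, 15, "up"), (20, 25, "up")]
print(match_ramps(actual, fragmented))  # (1, 2, 0)
```

The resulting counts are the inputs from which event-based indexes such as recall and precision are computed.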

Conclusions
Targeting both the ramp features and the long-term trend of wind power, this paper proposes a wind power ramp forecast method based on feature extraction and deep learning. First, the historical wind power is processed by OpSDA to obtain the historical ramp features. Then, a CNN-LSTM forecast model is established, with the historical ramp features and power as the input and the forecast power as the output, and deep learning is applied to explore the coupling relationship between ramp features and wind power. Finally, the ramp forecast results are obtained by re-detecting ramps in the forecast power.
In the proposed model, the CNN effectively extracts the ramp features from the model input, so that the LSTM can learn the relationship between wind power and ramp events more effectively and produce more accurate ramp forecasts. The forecast performance of the proposed model under different parameters is discussed, and the optimal parameters are selected so that high precision and high recall are obtained simultaneously. Using the ramp forecasting evaluation indexes, a case study is carried out on wind power samples from the Elia website. The evaluation results of the CNN-LSTM model meet the index requirements, with an annual average recall as high as 0.9059. Finally, the comparison with other research results verifies the high precision and effectiveness of the proposed deep learning network model.

Figure 2. The basic principle of the Swinging Door Algorithm (SDA).
where $*$ is the convolution operation; $k_{ij}^l$ is the convolution kernel weight matrix connecting the $j$-th feature map in the $l$-th layer and the $i$-th feature map in the $(l-1)$-th layer; $b_j^l$ is the bias matrix; $N_j$ is the set of input feature maps; $L_j^l$ is the output of the $j$-th feature map in the $l$-th layer; and $f(\cdot)$ is the activation function.
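With these symbols, a convolutional layer computes each output feature map in the usual form

$$L_j^l = f\Big(\sum_{i \in N_j} L_i^{l-1} * k_{ij}^l + b_j^l\Big).$$

The pure-Python sketch below evaluates this for one-dimensional feature maps (time series) as an illustration, not as the paper's implementation. Assumptions: $N_j$ contains every input map, the bias is a scalar per output map, ReLU is used as $f$, and, as in most deep learning libraries, $*$ is implemented as a "valid" cross-correlation without flipping the kernel. The names `relu` and `conv_layer` are hypothetical.

```python
def relu(x):
    """Activation f: rectified linear unit (assumed; the paper's f may differ)."""
    return max(0.0, x)


def conv_layer(inputs, kernels, biases, f=relu):
    """One 1-D convolutional layer: L_j = f(sum_i L_i * k_ij + b_j).

    inputs: input feature maps L_i^{l-1}, equal-length lists of numbers.
    kernels: kernels[j][i] is k_ij^l, linking input map i to output map j.
    biases: biases[j] is b_j^l (a scalar in this sketch).
    Returns the output feature maps L_j^l, each of length n - width + 1.
    """
    outputs = []
    for j, bias in enumerate(biases):
        width = len(kernels[j][0])
        length = len(inputs[0]) - width + 1
        acc = [bias] * length
        for i, x in enumerate(inputs):  # sum over the input maps in N_j
            k = kernels[j][i]
            for t in range(length):
                acc[t] += sum(x[t + m] * k[m] for m in range(width))
        outputs.append([f(v) for v in acc])
    return outputs


maps = [[1, 2, 3, 4], [0, 1, 0, 1]]  # e.g. a power series and a ramp feature series
kernels = [[[1, -1], [2, 1]], [[-1, 0], [0, 0]]]
print(conv_layer(maps, kernels, biases=[0.5, 0.0]))
# [[0.5, 1.5, 0.5], [0.0, 0.0, 0.0]]
```

Because every kernel $k_{ij}^l$ spans both input maps, each output map mixes power and ramp information, which is how the CNN can capture the coupling between them before the LSTM stage.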

Figure 5. The structure of the Convolutional Neural Network (CNN).

Figure 9. Ramp features detected and extracted by OpSDA.

Figure 10. Forecast model performance with different parameters.
In the data from October 2019 to September 2020, 1000 data points are taken from every quarter: the first 500 are used as the training set and the last 500 as the test set. To analyze the performance of the forecast models, the evaluation indexes are computed and listed in Table 3.
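The chronological split described above can be written as a short helper. Which 1000 points are taken from each quarter is not stated; the sketch assumes the first 1000 consecutive samples, and the function name `quarterly_split` and the dictionary layout are hypothetical. Keeping the split chronological (no shuffling) ensures that every test point lies after the training points of the same quarter.

```python
def quarterly_split(quarters, per_quarter=1000, n_train=500):
    """Split each quarter's series into (train, test) lists.

    quarters: mapping such as {"2019Q4": [...], "2020Q1": [...]} of power series.
    The first `per_quarter` samples of each series are used (an assumption); the
    first `n_train` of them form the training set and the rest the test set.
    """
    splits = {}
    for name, series in quarters.items():
        block = list(series[:per_quarter])
        if len(block) < per_quarter:
            raise ValueError(f"{name}: need {per_quarter} samples, got {len(block)}")
        splits[name] = (block[:n_train], block[n_train:])
    return splits


train, test = quarterly_split({"2020Q1": list(range(2000))})["2020Q1"]
print(len(train), len(test))  # 500 500
```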

Figure 11. Comparison of annual average evaluation indexes with different forecast models.

Table 1. The structure of the CNN-LSTM model.

Table 2. Ramp events and their features.

Table 3. Comparison of evaluation indexes with different forecast models in each quarter. BP: Back Propagation.

The smaller the mean absolute percentage error $I_{MAPE}$, the better the wind power forecast. The wind power forecast is best in the third quarter, where $I_{MAPE}$ is only 0.0603, and the $I_{MAPE}$ values of the LSTM and BP models are both significantly larger than that of the proposed ramp forecast model. The other seven indexes mainly reflect the ramp forecasting accuracy of each model. In the first three quarters, the $A_{CC}$ of the CNN-LSTM forecast model is above 0.82. Although its $A_{CC}$ in the fourth quarter is slightly lower, it is still higher than that of the other two forecast models. The $S_R$ value of 0.8041 in Table 3 shows that the model is useful for up-ramp forecasting. In some cases, such as the third quarter, the missing report rate $M_I$ or the false alarm rate $E_R$ of the BP or LSTM model is lower than that of the CNN-LSTM model, but the $A_{CC}$ of these two models is far inferior. The CNN-LSTM model achieves a higher accuracy while keeping both the missing report rate and the false alarm rate relatively low.
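The formulas behind $I_{MAPE}$, $M_I$ and $E_R$ are not given in this section. The sketch below assumes common definitions: MAPE averaged over samples with non-zero actual power, optionally normalized by installed capacity instead (a frequent choice for wind power because actual output can be close to zero); the missing report rate as the share of actual ramps that are missed; and the false alarm rate as the share of forecast ramps with no matching actual ramp. All function names are hypothetical.

```python
def mape(actual, forecast, capacity=None):
    """Mean absolute percentage error as a fraction (e.g. 0.05 means 5 %).

    If capacity is given, errors are divided by it instead of by the actual
    value (capacity-normalized variant); otherwise samples with zero actual
    power are skipped.
    """
    if len(actual) != len(forecast):
        raise ValueError("series must have the same length")
    terms = []
    for a, f in zip(actual, forecast):
        if capacity is not None:
            terms.append(abs(a - f) / capacity)
        elif a != 0:
            terms.append(abs(a - f) / abs(a))
    if not terms:
        raise ValueError("no usable samples")
    return sum(terms) / len(terms)


def missing_report_rate(hits, misses):
    """Share of actual ramps that were not forecast (equals 1 - recall)."""
    total = hits + misses
    return misses / total if total else 0.0


def false_alarm_rate(hits, false_alarms):
    """Share of forecast ramps that did not occur (equals 1 - precision)."""
    total = hits + false_alarms
    return false_alarms / total if total else 0.0


print(mape([100, 200], [90, 220]))                 # about 0.1
print(mape([100, 200], [90, 220], capacity=1000))  # about 0.015
print(missing_report_rate(9, 1), false_alarm_rate(8, 2))  # 0.1 0.2
```

Under these definitions a model can lower one rate simply by forecasting more (or fewer) ramps, which is why a low missing report rate or false alarm rate alone does not imply high accuracy.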
Although the overall forecast accuracy $A_{CC}$ of the proposed ramp forecast model is slightly lower than that of the MLP-BT and MLP-MSAR models, the proposed model balances the evaluation indexes better: it obtains a better precision $P_C$ while ensuring a high recall $R_C$, and every index result is higher than 0.81. In addition, its performance is better for forecasting both up-ramps and down-ramps.

Table 4. Comparison of evaluation indexes with models in other research works.