ELM-QR-Based Nonparametric Probabilistic Prediction Method for Wind Power

Abstract: Wind power has significant randomness. Probabilistic prediction of wind power is necessary to solve the problem of safe and stable power grid dispatching with the integration of large-scale wind power. Therefore, this paper proposes a novel nonparametric probabilistic prediction model for wind power based on extreme learning machine-quantile regression (ELM-QR). Firstly, the ELM-QR models of multiple quantiles are established, and then the new comprehensive index (NCI) is optimized by particle swarm optimization (PSO) to obtain the weighting coefficients corresponding to the lower and upper bounds of the prediction intervals. The final prediction interval is obtained by integrating the outputs of the ELM-QR models and the weighting coefficients. Finally, case studies are carried out with real wind farm operation data. Simulation results show that the proposed algorithm can obtain narrower prediction intervals while ensuring high reliability. Sensitivity analysis and comparison with other algorithms further verify the effectiveness of the proposed algorithm.


Introduction
Wind power has remarkable uncertainties and randomness. Traditional research projects about wind power prediction mainly focused on deterministic point prediction [1,2]. Recently, the algorithms of point prediction mainly include convolutional neural network [3], long short-term memory neural network [4], and gated recurrent neural network [5]. The main focus of these articles is to reduce the prediction errors by combining or improving some algorithms, but the errors of point prediction are unavoidable and the results cannot describe the uncertainties of wind power generation quantitatively. Considering the current real industrial applications, planning, dispatching, safety, and stability analysis of the power grid involving wind power require a more accurate estimation of the fluctuation range of wind power. Therefore, a new prediction method that can quantitatively reflect the uncertainties of wind power generation is needed to overcome the defects of traditional point prediction, and probabilistic prediction is an effective way to solve this problem.
In recent years, research on the probabilistic prediction of wind power has attracted the attention of many scholars. The modeling objects of probabilistic prediction are usually divided into two types: wind power itself and the point prediction errors. Conventional studies on probabilistic prediction are mainly based on the point prediction errors. They assume that the prediction errors obey the normal distribution [6], β distribution [7], α-stable distribution [8], or exponential distribution [9], and the error distribution function is then estimated by parametric methods to obtain the prediction intervals. These methods have low computational complexity, but their distribution assumptions are often unreasonable. In [10], the nonparametric kernel density estimation approach was used to obtain the distribution of prediction errors, which avoids the impact of unreasonable prior distribution assumptions. However, the final probabilistic prediction results still depend on the accuracy of point prediction, which often leads to poor generalization ability.
Methods that predict wind power probabilistically in a direct way do not need a point prediction first, which eliminates the dependence on point prediction results. According to the modeling approach, they can also be divided into parametric and nonparametric methods. Wan et al. [11] used the bootstrap method to resample the wind power and assumed that the outputs obeyed the normal distribution to obtain the final prediction intervals. In [12], a multi-distribution ensemble method was used for probabilistic wind power forecasting, in which three probabilistic forecasting models based on Gaussian, gamma, and Laplace predictive distributions formed the ensemble model. References [13], [14], [15], and [16] used direct quantile regression, joint quantile regression, quantile regression based on gradient boosting decision trees, and support vector quantile regression, respectively, to establish quantile regression models of wind power and obtain nonparametric probabilistic prediction results. In [17], a decomposition-based quantile regression forest was applied to day-ahead short-term load probability density forecasting. In [18], a Naive Bayesian classifier was established to classify wind power, and the particle swarm optimization (PSO) algorithm was used to optimize the weighting coefficients corresponding to the prediction intervals. In [19], the lower and upper bounds of the prediction intervals were directly treated as the outputs of an extreme learning machine (ELM), and the output weights of the ELM were obtained through PSO. Since nonparametric methods avoid the unreasonable distribution assumptions of parametric methods, their results are more reasonable, but the calculation is often more complicated. Therefore, it is necessary to seek a modeling method with a simple structure to improve practicability.
For probabilistic prediction evaluation indicators, the prediction interval coverage probability (PICP) is usually used to evaluate the reliability of the prediction intervals, the prediction interval normalized average width (PINAW) is often utilized to assess the sharpness of the prediction intervals [20], and the interval normalized average deviation (INAD) [21] is generally used to appraise the overall degree of the deviation of actual power from the intervals when it falls outside the prediction intervals. For the comprehensive evaluation indices, Shrivastava et al. [22] proposed the coverage width criterion (CWC), which combines PICP and PINAW in sections, but practical applications showed that this indicator could not scientifically evaluate the global performance of probabilistic prediction methods [19]. In [18], PICP and PINAW were simply weighted and summed, and the structure was simple, but it failed to achieve the adjustment of assessment keypoints under different conditions. Therefore, it is necessary to find a more reasonable and effective comprehensive performance evaluation index.
Based on the above analysis, wind power is a more suitable modeling object than the point prediction error, and nonparametric modeling is more reasonable than parametric modeling. Thus, in this paper, a novel wind power nonparametric probabilistic prediction method based on extreme learning machine-quantile regression (ELM-QR) is proposed. As a special type of single-hidden-layer feedforward neural network, ELM has the characteristics of fast learning speed and strong generalization capability, and it has been widely used in recent years [23,24]. In [25], ELM with error correction was used for short-term wind speed prediction; in [26], the multinomial Bayesian extreme learning machine (MBELM) was proposed for multi-class classification; an on-line sequential outlier robust extreme learning machine was applied in [27] for probabilistic wind speed forecasting; in [28], a self-adaptive kernel extreme learning machine was proposed for short-term wind speed forecasting. According to the training times reported in [29], ELM can obtain more accurate forecasting results with a faster calculation speed than the comparison models, which demonstrates its structural advantages. In general, ELM is a powerful algorithm that is well suited for further study. As a typical nonparametric estimation method, quantile regression has been extensively applied in probabilistic prediction [30,31]. Thus, in this paper, ELM is combined with quantile regression to exploit the advantages of each.
At the same time, in order to better evaluate the overall performance of the probabilistic prediction, this paper proposes the new comprehensive index (NCI), which can adaptively adjust the assessment keypoints according to different situations. The PSO algorithm is then used to maximize the NCI, which yields the weighting coefficients corresponding to the lower and upper bounds of the prediction intervals. Finally, the outputs of the ELM-QR models and the weighting coefficients are integrated to obtain the final prediction intervals. The method in this paper integrates the advantages of multiple algorithms: with a simple structure, it avoids both the impact of unreasonable prior distribution assumptions and the dependence on point prediction results. Simulation and comparison tests verify its advantages.
The rest of this article is organized as follows: Section 2 describes the basic principles of ELM and the quantile regression algorithm based on ELM; Section 3 gives the detailed definition of the proposed comprehensive performance evaluation index NCI, and introduces the structure of the probabilistic prediction model based on ELM-QR and its detailed solution steps; Section 4 conducts simulation and comparison studies with actual data; and the final conclusions are drawn in Section 5.

Extreme Learning Machine
ELM is a special form of single-hidden-layer feedforward neural network proposed by Huang in [32]. Unlike traditional neural networks, ELM randomly generates the input weights and biases, and then uses simple matrix operations to obtain the output weights, which can significantly increase the learning speed and reduce the computational complexity. Consider $N$ training samples $\{(x_j, t_j)\}_{j=1}^{N}$, where $x_j \in \mathbb{R}^n$ is the input vector and $t_j \in \mathbb{R}^m$ is the corresponding target vector. For an ELM with $K$ hidden nodes, the model can be expressed as:

$$\sum_{i=1}^{K} \beta_i \, g(w_i \cdot x_j + b_i) = o_j, \quad j = 1, 2, \cdots, N \tag{1}$$

where $o_j$ is the network output for the $j$-th sample, $g(\cdot)$ is the activation function, $w_i = [w_{i1}, w_{i2}, \cdots, w_{in}]$ is the weight vector between the input layer and the $i$-th hidden node, $\beta_i = [\beta_{i1}, \beta_{i2}, \cdots, \beta_{im}]^T$ is the weight vector between the $i$-th hidden node and the output nodes, and $b_i$ is the bias of the $i$-th hidden node.
For $N$ training samples, if the outputs of ELM can approximate the targets with nearly zero deviation, then:

$$\sum_{i=1}^{K} \beta_i \, g(w_i \cdot x_j + b_i) = t_j, \quad j = 1, 2, \cdots, N \tag{2}$$

Equation (2) can be further rewritten as:

$$H \beta = T \tag{3}$$

where $H$ is the hidden layer output matrix of the ELM, which can be expressed as:

$$H = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_K \cdot x_1 + b_K) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_K \cdot x_N + b_K) \end{bmatrix}_{N \times K} \tag{4}$$

$\beta = [\beta_1, \beta_2, \cdots, \beta_K]^T$ is the output weight matrix and $T = [t_1, t_2, \cdots, t_N]^T$ is the target matrix. ELM randomly generates the input weights and the corresponding biases at the beginning of learning, and these values remain unchanged throughout the learning process. According to Equation (4), the value of $H$ is therefore also determined at the beginning of learning and remains unchanged. ELM obtains the optimal output weights $\beta^*$ by seeking the unique smallest norm least-squares solution of Equation (3). According to the generalized inverse theory of matrices, the solution can be expressed as:

$$\beta^* = H^{\dagger} T \tag{5}$$

where $H^{\dagger}$ represents the Moore-Penrose generalized inverse matrix of $H$, which is usually obtained by using singular value decomposition (SVD).
Since ELM does not need repeated iterations with the gradient descent method, it can overcome the problems of overfitting and local optimal solutions that exist in the traditional gradient-based neural networks. At the same time, it only requires simple matrix operations to solve Equation (5), significantly reducing the computational complexity of the solving processes [33].
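The training procedure described above (random hidden layer, then a pseudoinverse solve for the output weights) can be illustrated with a minimal NumPy sketch. The sigmoid activation, the uniform range of the random weights, the number of hidden nodes, and the toy sine target are all assumptions chosen for illustration; they are not settings reported in this paper.

```python
import numpy as np

def hidden_output(X, W, b):
    """Hidden layer output matrix H (N x K) with a sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def train_elm(X, T, n_hidden=20, seed=0):
    """Fit an ELM: random input weights/biases, pseudoinverse output weights."""
    rng = np.random.default_rng(seed)
    # Input weights and biases are drawn once and never updated.
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = hidden_output(X, W, b)
    # Minimum-norm least-squares solution beta* = pinv(H) T (SVD-based).
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def predict_elm(X, W, b, beta):
    return hidden_output(X, W, b) @ beta

# Toy regression problem (illustrative data, not wind farm data).
X = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
T = np.sin(np.pi * X)
W, b, beta = train_elm(X, T)
rmse = float(np.sqrt(np.mean((predict_elm(X, W, b, beta) - T) ** 2)))
print(f"training RMSE: {rmse:.4f}")
```

Because the only fitted quantity is `beta`, training reduces to a single linear solve, which is the source of the speed advantage discussed above.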

Quantile Regression Based on Extreme Learning Machine
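Given $N$ sets of wind power input and output samples $\{(x_i, t_i)\}_{i=1}^{N}$, the quantile $q_i^{\tau}$ with nominal proportion $\tau$ of the outputs can be expressed as:

$$P(t_i \le q_i^{\tau}) = \tau \tag{6}$$

The mapping relationship between the input and output variables can be expressed by Equation (7):

$$q_i^{\tau} = f(x_i, \theta) \tag{7}$$

where $\theta$ is the model parameter. The estimation problem of the regression parameters of the $\tau$-th quantile can be transformed into the optimization problem shown in Equation (8):

$$\hat{\theta}_{\tau} = \arg\min_{\theta} \sum_{i=1}^{N} \rho_{\tau}\left(t_i - f(x_i, \theta)\right) \tag{8}$$

where $\rho_{\tau}(\cdot)$ is the check function, and its expression is as follows:

$$\rho_{\tau}(u) = \begin{cases} \tau u, & u \ge 0 \\ (\tau - 1) u, & u < 0 \end{cases} \tag{9}$$

The parameters that minimize the objective function are the $\tau$-th quantile regression coefficients $\hat{\theta}_{\tau}$. According to the derivation of ELM above, ELM randomly generates the input weights and biases. If the values of the input variables are known, $H$ in Equation (3) is uniquely determined, and the relationship between the output weights to be solved and the output variables can be regarded as linear. Substituting the multi-input single-output ELM model into Equation (8), the objective function of the extreme learning machine-quantile regression (ELM-QR) can be obtained:

$$\hat{\beta}_{\tau} = \arg\min_{\beta} \sum_{i=1}^{N} \rho_{\tau}\left(t_i - H_i \beta\right) \tag{10}$$

where $H_i$ is the $i$-th row of the matrix $H$. The current methods for solving linear quantile regression problems mainly include the simplex method, the interior point method, and the smoothing algorithm. Due to its high efficiency and numerical stability, the interior point method has been widely used. In this paper, the interior point method is used to fit the quantile regression model with ELM.
Introducing slack variables $\xi_i^{+}$ and $\xi_i^{-}$, the optimization problem (10) can be converted into:

$$\min_{\beta, \xi^{+}, \xi^{-}} \sum_{i=1}^{N} \left[\tau \xi_i^{+} + (1 - \tau) \xi_i^{-}\right] \quad \text{s.t.} \quad t_i - H_i \beta = \xi_i^{+} - \xi_i^{-}, \quad \xi_i^{+} \ge 0, \; \xi_i^{-} \ge 0 \tag{11}$$

Then it can be transformed into a standard linear programming form:

$$\min_{z} c^T z \quad \text{s.t.} \quad A z = t, \quad z \ge 0 \tag{12}$$

where, writing $\beta = \beta^{+} - \beta^{-}$ with $\beta^{+}, \beta^{-} \ge 0$ and $t = [t_1, t_2, \cdots, t_N]^T$:

$$z = \begin{bmatrix} \beta^{+} \\ \beta^{-} \\ \xi^{+} \\ \xi^{-} \end{bmatrix}, \quad c = \begin{bmatrix} 0_K \\ 0_K \\ \tau 1_N \\ (1 - \tau) 1_N \end{bmatrix}, \quad A = \begin{bmatrix} H & -H & I_N & -I_N \end{bmatrix} \tag{13}$$

Using the interior point method to solve the above linear programming problem, the $\tau$-th quantile regression parameters $\hat{\beta}_{\tau}$ can be obtained.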
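To make the role of the check function concrete, the following sketch evaluates the standard check (pinball) loss, $\tau u$ for $u \ge 0$ and $(\tau - 1) u$ otherwise, and shows that the constant which minimizes its sum over a sample is an empirical quantile of that sample. The data, the value of `tau`, and the grid search are illustrative assumptions; the grid search is only a stand-in for the interior point solver used in this paper.

```python
def pinball_loss(u, tau):
    """Check function rho_tau(u) for a residual u = target - prediction."""
    return tau * u if u >= 0 else (tau - 1.0) * u

def total_loss(targets, q, tau):
    """Sum of check losses when every target is predicted by the constant q."""
    return sum(pinball_loss(t - q, tau) for t in targets)

# Illustrative sample (not wind farm data).
targets = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
tau = 0.85

# Brute-force search over candidate constants 0.0, 0.1, ..., 11.0.
candidates = [k / 10.0 for k in range(0, 111)]
best_q = min(candidates, key=lambda q: total_loss(targets, q, tau))
print(best_q)  # the 9th smallest value: 85% of the sample lies at or below 9.0... see note
```

For this sample the minimizer is the ninth smallest value, 9.0: under-predictions are penalized with weight 0.85 and over-predictions with weight 0.15, so the optimum settles where roughly 85% of the targets lie below it. Replacing the constant `q` by the linear model $H_i \beta$ gives the ELM-QR objective.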

New Comprehensive Index
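According to the definition of prediction interval (PI), the interval $I_t^{(\alpha)}(x_i)$ of the target $t_i$ with nominal confidence $100(1 - \alpha)\%$ (the PI nominal confidence, PINC) can be expressed as:

$$I_t^{(\alpha)}(x_i) = \left[L_t^{(\alpha)}(x_i), \; U_t^{(\alpha)}(x_i)\right] \tag{14}$$

where $L_t^{(\alpha)}(x_i)$ and $U_t^{(\alpha)}(x_i)$ represent the lower and upper bounds of the prediction interval, respectively.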
The traditional prediction interval evaluation indices mainly include reliability and sharpness. Based on this, this paper proposes a new type of comprehensive performance evaluation index.

Reliability
The PI coverage probability (PICP) is usually used to express the reliability of the prediction intervals. It reflects the probability that the actual output $t_i$ falls in the interval $I_t^{(\alpha)}(x_i)$, and can be calculated by Equation (15):

$$\mathrm{PICP} = \frac{1}{N_{test}} \sum_{i=1}^{N_{test}} c_i \tag{15}$$

where $N_{test}$ is the number of test samples and $c_i$ is determined by Equation (16):

$$c_i = \begin{cases} 1, & t_i \in I_t^{(\alpha)}(x_i) \\ 0, & t_i \notin I_t^{(\alpha)}(x_i) \end{cases} \tag{16}$$

In order to ensure the high reliability of the prediction intervals, the value of PICP should be as close as possible to the PI nominal confidence (PINC), $100(1 - \alpha)\%$. Another related indicator is the average coverage error (ACE), which is defined as:

$$\mathrm{ACE} = \mathrm{PICP} - \mathrm{PINC} \tag{17}$$

Correspondingly, ACE should be as close to zero as possible to ensure the reliability of the prediction intervals.
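As a small worked example of the reliability indices, the sketch below computes PICP as the fraction of targets that fall inside their intervals and ACE as PICP minus PINC. The five targets and interval bounds are made-up numbers, and treating the interval as closed at both ends is an assumption.

```python
def picp(targets, lower, upper):
    """Fraction of targets covered by their prediction intervals."""
    covered = sum(1 for t, lo, up in zip(targets, lower, upper) if lo <= t <= up)
    return covered / len(targets)

def ace(targets, lower, upper, pinc):
    """Average coverage error: PICP minus the nominal confidence PINC."""
    return picp(targets, lower, upper) - pinc

# Illustrative normalized values (not wind farm data).
targets = [0.2, 0.5, 0.9, 0.4, 0.7]
lower = [0.1, 0.3, 0.5, 0.45, 0.6]
upper = [0.3, 0.6, 0.8, 0.6, 0.8]

print(picp(targets, lower, upper))       # 3 of 5 targets are covered
print(ace(targets, lower, upper, 0.8))   # negative: coverage is below nominal
```

Here three of the five targets are covered, so PICP is 0.6 and, for PINC = 80%, ACE is about -0.2, which signals intervals that are too narrow or badly placed.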

Sharpness
With a wider prediction interval, the corresponding reliability is higher. However, excessively wide intervals are useless in practical applications, and they cannot truly reflect the uncertain information of the output variables, so it is necessary to introduce the evaluation index of the width of the prediction intervals.
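The width of the prediction interval, $\delta_t^{(\alpha)}(x_i)$, can be expressed as:

$$\delta_t^{(\alpha)}(x_i) = U_t^{(\alpha)}(x_i) - L_t^{(\alpha)}(x_i) \tag{18}$$

In order to better evaluate the performance of the prediction intervals, the interval score $S_t^{(\alpha)}(x_i)$ [8] is introduced, and its calculation is shown in Equation (19):

$$S_t^{(\alpha)}(x_i) = \begin{cases} -2\alpha \, \delta_t^{(\alpha)}(x_i) - 4\left[L_t^{(\alpha)}(x_i) - t_i\right], & t_i < L_t^{(\alpha)}(x_i) \\ -2\alpha \, \delta_t^{(\alpha)}(x_i), & t_i \in I_t^{(\alpha)}(x_i) \\ -2\alpha \, \delta_t^{(\alpha)}(x_i) - 4\left[t_i - U_t^{(\alpha)}(x_i)\right], & t_i > U_t^{(\alpha)}(x_i) \end{cases} \tag{19}$$

The interval score can be calculated for each prediction output. For all test data, the global interval score can be obtained by Equation (20):

$$S_t^{\alpha} = \frac{1}{N_{test}} \sum_{i=1}^{N_{test}} S_t^{(\alpha)}(x_i) \tag{20}$$

The interval score considers not only the width of the prediction intervals but also the cumulative deviation outside the prediction intervals, so it expresses the performance of the prediction intervals more reasonably.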
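The interval score can be illustrated numerically. The sketch below assumes the widely used form of the score: a width penalty of $2\alpha$ times the interval width, plus a penalty of 4 times the distance by which the target misses the interval, with the sign chosen so that higher (closer to zero) is better. This form and the sample numbers are assumptions made for illustration.

```python
def interval_score(t, lo, up, alpha):
    """Interval score of one prediction interval [lo, up] for target t."""
    score = -2.0 * alpha * (up - lo)   # width penalty, always applied
    if t < lo:
        score -= 4.0 * (lo - t)        # target falls below the interval
    elif t > up:
        score -= 4.0 * (t - up)        # target falls above the interval
    return score

def mean_interval_score(targets, lower, upper, alpha):
    """Global score: average of the per-sample interval scores."""
    scores = [interval_score(t, lo, up, alpha)
              for t, lo, up in zip(targets, lower, upper)]
    return sum(scores) / len(scores)

alpha = 0.1  # nominal confidence 90%
print(interval_score(0.5, 0.4, 0.6, alpha))  # covered: width penalty only
print(interval_score(0.7, 0.4, 0.6, alpha))  # missed by 0.1 above the upper bound
```

With these numbers the covered case scores about -0.04 and the missed case about -0.44, so a miss of 0.1 costs ten times more than the width of the interval itself. This is why the score discourages narrowing intervals at the expense of coverage.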

New Comprehensive Index
In practical applications, the expected prediction intervals should be highly reliable (ACE as close to zero as possible) and should have high interval scores, so that they are more meaningful for subsequent optimization. A comprehensive evaluation index should be able to balance reliability and sharpness, and to adaptively assign the key points of assessment under different circumstances. For these purposes, this article proposes the new comprehensive index (NCI), whose expression is given in Equation (21), where RIS is the reliability index score, $|\cdot|$ represents the absolute value, $\gamma$ and $\lambda$ are the weights of the reliability index score and the interval score, $A_t^{\alpha}$ stands for the ACE value of the prediction intervals under the nominal probability $100(1 - \alpha)\%$, and $\eta$ and $\sigma$ are used to adjust the characteristics of RIS. $S_{t,\mathrm{norm}}^{\alpha}$ is the normalized value of $S_t^{\alpha}$ and can be calculated by Equation (22), where $S_{t,\min}^{\alpha}$ is set to zero and $S_{t,\max}^{\alpha}$ is set to $2\alpha$.
The curve of RIS changing with ACE is shown in Figure 1. For NCI, the reliability evaluation is mainly achieved by introducing the sigmoid function. When ACE fluctuates in a small range near 0, RIS approaches 0, which has little impact on the overall performance index. At this time, the key point of the assessment is the interval score. When the deviation between ACE and 0 is large, RIS will increase suddenly, which will have a greater impact on NCI. At this time, the main assessment is the reliability. Therefore, the NCI proposed in this paper can adaptively adjust the assessment keypoints according to diverse situations. When the reliability is poor, the reliability is mainly evaluated, and when the reliability is high, the interval score is primarily considered.

Structure of the Probabilistic Prediction Model
This paper proposes a novel probabilistic prediction model based on ELM-QR, and its structure is shown in Figure 2. As the structure diagram shows, in order to obtain the lower and upper bounds of the prediction intervals, a certain number of quantile regression models are first trained, and then the outputs of these models are weighted and summed to obtain the corresponding lower and upper bounds of the intervals. In this structure, the quantiles $\tau_1 \sim \tau_k$ ($\tau_1 > \tau_2 > \cdots > \tau_k$) on which the upper bound depends and the quantiles $\tau_{k+1} \sim \tau_{k+n}$ ($\tau_{k+1} > \tau_{k+2} > \cdots > \tau_{k+n}$) on which the lower bound depends do not cross, and $\tau_k > \tau_{k+1}$.
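The weighted-sum structure can be sketched as follows. The function name `combine_bounds`, the ordering of the quantile outputs (highest quantile first), and the example weights are hypothetical; in particular, the paper does not state that the weights sum to one, so the equal weights below are only an illustrative choice.

```python
def combine_bounds(quantile_preds, upper_weights, lower_weights):
    """Combine per-sample quantile outputs into (lower, upper) bounds.

    quantile_preds: one list per sample, ordered from the highest quantile
    to the lowest; the first len(upper_weights) entries feed the upper
    bound and the remaining entries feed the lower bound.
    """
    k = len(upper_weights)
    bounds = []
    for preds in quantile_preds:
        upper = sum(w * q for w, q in zip(upper_weights, preds[:k]))
        lower = sum(w * q for w, q in zip(lower_weights, preds[k:]))
        bounds.append((lower, upper))
    return bounds

# One sample with outputs of four quantile models, e.g. tau = 0.975, 0.95, 0.05, 0.025
# (illustrative quantile levels and values).
quantile_preds = [[0.9, 0.8, 0.2, 0.1]]
bounds = combine_bounds(quantile_preds, upper_weights=[0.5, 0.5], lower_weights=[0.5, 0.5])
print(bounds)
```

With equal weights the upper bound is the average of the two high-quantile outputs (about 0.85) and the lower bound the average of the two low-quantile outputs (about 0.15); the optimization described in the next section would replace these hand-picked weights.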

Solution of the Probabilistic Prediction Model
In the novel probabilistic prediction model, selection of the weighting coefficients will directly affect the range of the prediction intervals, so the setting of the weighting coefficients will be an important part of the probabilistic prediction processes.
PSO is widely used in various optimization scenarios for its simplicity, superior robustness, and fast convergence speed. In this paper, the PSO algorithm is used to obtain the optimal weighting coefficients by maximizing NCI. In order to avoid overfitting, this paper divides the dataset into three parts: $D_{train}$, $D_{valid}$, and $D_{test}$. Among them, $D_{train}$ is used to construct the ELM-QR model, $D_{valid}$ and $D_{train}$ are used jointly to optimize the weighting coefficients to enhance generalization ability and avoid overfitting, and $D_{test}$ is used to verify the effectiveness of the algorithm.
The calculation process of the wind power probabilistic prediction model based on ELM-QR is shown in Figure 3. The solution steps are as follows:
Step 1: Collect the actual data of the wind farm, remove bad data points, construct the input and output data pairs, normalize the data pairs to [−1,1], and divide them into three parts, $D_{train}$, $D_{valid}$, and $D_{test}$;
Step 2: Set the number of quantile points and the corresponding quantile values, and use $D_{train}$ with the interior point method to obtain the quantile regression models;
Step 3: Initialize the weighting coefficients $h_t$, set $\gamma$ and $\lambda$ in the objective function, and initialize the PSO parameters (population number, maximum iteration number, initial velocity, and initial position);
Step 4: Use the PSO algorithm to optimize and obtain the weighting coefficients based on $D_{train}$ and $D_{valid}$. While neither the maximum number of iterations nor a sufficiently good fitness has been reached, perform the following:
(a) For each particle in the population, calculate the corresponding PI and NCI over $D_{train}$ and $D_{valid}$; NCI is used as the fitness.
(b) Compare the fitness of the particle's current position with that of its historical best position ($p_{best}$), and update the historical best position with the current position if the fitness of the current position is higher.
(c) Compare the fitness of the particle's current position with the fitness of the global best position ($g_{best}$). If the fitness of the current position is higher, update the global best position with the current position.
(d) Update the particle position and velocity according to $p_{best}$ and $g_{best}$.
(e) Increment the iteration counter.
Step 5: Substitute the inputs of $D_{test}$ into each quantile model to obtain the corresponding outputs, determine the prediction intervals according to the weighting coefficients obtained from PSO, and finally calculate the evaluation indices.
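The inner loop of Step 4 can be sketched with a minimal PSO that maximizes an arbitrary fitness function. The inertia and acceleration constants, the search bounds, the zero initial velocity, and the quadratic stand-in fitness are all assumptions for illustration; in the method described above the fitness would be the NCI computed over $D_{train}$ and $D_{valid}$, and the early-stopping test on fitness is omitted here.

```python
import random

def pso_maximize(fitness, dim, n_particles=20, n_iter=100,
                 inertia=0.7, c1=1.5, c2=1.5, lo=0.0, hi=1.0, seed=0):
    """Minimal particle swarm optimizer that maximizes `fitness` on [lo, hi]^dim."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    p_best = [p[:] for p in pos]
    p_best_fit = [float("-inf")] * n_particles
    g_best, g_best_fit = pos[0][:], float("-inf")
    for _ in range(n_iter):                      # (e) iteration counter
        for i in range(n_particles):
            fit = fitness(pos[i])                # (a) evaluate fitness
            if fit > p_best_fit[i]:              # (b) update personal best
                p_best[i], p_best_fit[i] = pos[i][:], fit
            if fit > g_best_fit:                 # (c) update global best
                g_best, g_best_fit = pos[i][:], fit
        for i in range(n_particles):             # (d) update velocity and position
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (p_best[i][d] - pos[i][d])
                             + c2 * r2 * (g_best[d] - pos[i][d]))
                # Clamp the new position to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
    return g_best, g_best_fit

# Stand-in fitness with a known maximum at w = (0.3, 0.3); NOT the NCI.
def toy_fitness(w):
    return -sum((x - 0.3) ** 2 for x in w)

best_w, best_fit = pso_maximize(toy_fitness, dim=2)
print(best_w, best_fit)
```

To apply the sketch to the weighting coefficients, `dim` would equal the total number of quantile models and `toy_fitness` would be replaced by a function that builds the intervals from the candidate weights and returns their NCI.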

Simulation Data and Parameter Setting
The actual operating data used in this article are from a wind farm in Hebei province, North China, and cover the period from July to August in 2018 and 2019 with a 15 min sampling period. The wind farm has a combined generating capacity of 200.1 MW, consisting of 87 wind turbines of 2.3 MW each. The wind speed is measured by a 70-m-high wind tower, and the wind power data are collected by the supervisory control and data acquisition (SCADA) system.
Generally, the raw data contain some abnormal points caused by sensor failures or communication interruptions during the daily operation of the wind farm. Thus, data processing is required after data collection to ensure high data quality. The data processing method used in this paper is to remove bad points and replace them with linearly interpolated data. The raw data and processed data covering the period from July to August in 2018 are shown in Figure 4, where the x-axis is the average wind speed measured by the wind towers and the y-axis is the power of the entire wind farm. It can be seen from Figure 4 that there is a strong mapping relationship between wind speed and power.
In [18], the historical wind speed and the historical wind farm power are used as inputs of the probabilistic prediction model. However, when multi-step prediction is performed, cumulative error will occur because the model input contains the wind farm power, which cannot be obtained from the weather forecast. Therefore, this paper uses only historical wind speed (covering the 8 sampling periods before the current time) as input and wind farm power (at the next sampling period) as output to construct the data pairs for one-step-ahead forecasting, and normalizes them to [−1,1]. The subsequent simulations are all based on historical operation data to verify the effectiveness of the proposed method. Since the model input is wind speed, which can be easily obtained from the weather forecast, the proposed model can readily be extended to multi-step prediction with weather forecast data in practical applications.
Figure 5 shows the value of NCI, optimized by PSO, at different iteration numbers over the training data. As the iteration number increases, the value of NCI converges toward its optimum. Figures 6 and 7 show the prediction results of the test data with PINC = 90% and PINC = 80%, respectively, and Table 1 shows the detailed performance indices.
It can be seen intuitively from the results in these figures that the obtained prediction intervals are narrow, and the points falling outside the intervals are also close to the interval boundaries. The detailed performance indices calculation results show that the ACE is quite close to 0, and the scores of the interval score and the NCI are also high, which fully shows that the proposed algorithm can obtain a more effective prediction interval.
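The construction of the data pairs described in this section can be sketched as below: each input is a window of 8 consecutive wind speed samples and the output is the wind farm power at the sampling period that follows the window. The exact index alignment, the min-max scaling formula, and the sample values are assumptions for illustration; in practice the scaling bounds would normally be taken from the training data only.

```python
def scale_to_unit_range(values):
    """Min-max scale a sequence to [-1, 1]."""
    v_min, v_max = min(values), max(values)
    return [2.0 * (v - v_min) / (v_max - v_min) - 1.0 for v in values]

def build_pairs(wind_speed, power, n_lags=8):
    """Pair each window of n_lags wind speeds with the power that follows it."""
    pairs = []
    for t in range(n_lags, len(power)):
        pairs.append((wind_speed[t - n_lags:t], power[t]))
    return pairs

# Illustrative series of 10 samples (not wind farm data).
wind_speed = [3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
power = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]

ws_scaled = scale_to_unit_range(wind_speed)
pw_scaled = scale_to_unit_range(power)
pairs = build_pairs(ws_scaled, pw_scaled)
print(len(pairs))   # number of (input window, output) pairs
print(pairs[0])
```

A series of 10 samples with 8 lags yields 2 pairs. Because the inputs contain only wind speed, the same windowing can be applied to forecast wind speeds when extending the model to multi-step prediction.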

Comparative Analysis
In the proposed probabilistic prediction model, the lower and upper bounds are obtained as weighted sums of different quantile regression models. To further demonstrate the advantages of this structure, this paper conducts a sensitivity analysis on the prediction results of the individual quantile regression models, obtains the performance indices of the resulting prediction intervals, and compares them with the results of the proposed structure. Figure 8 shows the prediction results of some typical quantiles. The sensitivity coefficient $\kappa$ is introduced, and its calculation formula is given in Equation (23), where $\overline{\tau}$ and $\underline{\tau}$ are the quantile points corresponding to the upper and lower bounds of the intervals. The performance indices are calculated under different sensitivities for the subsequent comparisons. Figure 9 and Table 2 show the NCI results and the detailed performance indices under different sensitivities, respectively. The results show that the NCI of the proposed structure is better than the NCI at any sensitivity. With 90% nominal confidence, the ACE and interval scores of the proposed structure are the best. With 80% nominal confidence, the ACEs at sensitivities of −0.04 and −0.02 are better than that of the proposed structure, but the corresponding interval scores are lower. Taking both situations into account, the final NCI of the proposed structure is still higher.
Table 3 shows the calculation results of other typical algorithms. Among them, BELM is the algorithm combining bootstrap and ELM in [11], and KDE obtains the prediction interval by nonparametric kernel density estimation of the prediction errors combined with the point prediction results. Since KDE depends on the accuracy of point prediction, and estimates of the distribution of prediction errors are usually not accurate, its prediction results are not stable enough.
According to the results in Table 3, the ACE of KDE with 90% nominal confidence is high, but the ACE with 80% nominal confidence is low, and they all have a wide prediction interval. For BELM, it assumes that the final result satisfies normal distribution, but this assumption of prior distribution is also not reasonable. At 90% nominal confidence, the result is not as good as that of KDE with almost the same width of prediction interval, but at 80% nominal confidence, the overall result is better than KDE. Combining the calculation results in Table 1, it can be seen that the algorithm proposed in this paper is superior to the above two algorithms in all indicators, which can obtain narrower prediction intervals while ensuring high reliability. The model proposed in this paper can overcome the shortcomings of the above two algorithms, which can get rid of the impact of unreasonable assumptions of prior distribution, and achieve more reasonable and effective prediction intervals based on NCI optimization.

Conclusions
In order to obtain effective wind power probabilistic prediction results, this paper proposes a novel type of nonparametric probabilistic prediction method based on ELM-QR. Its main features include: (1) Make full use of the fast learning and strong generalization ability of ELM to obtain the quantile regression models effectively. (2) A novel comprehensive performance evaluation index is proposed, which can adjust the assessment keypoints adaptively according to different situations, so as to balance reliability and sharpness more reasonably. (3) A new model structure is proposed. Based on NCI, the weighting coefficients are obtained through PSO, which fully integrates the advantages of multiple algorithms.
Simulation research and comparative analysis are carried out using actual wind farm operating data. The results show that compared with typical probabilistic prediction methods, the proposed method in this paper can provide more accurate prediction intervals, its comprehensive performance has remarkable advantages, and it can provide effective data support for the safe and stable operation of the power grid.