Probability Density Forecasting of Wind Speed Based on Quantile Regression and Kernel Density Estimation

.


Introduction
Because of the independence from fossil energy and low environmental costs, wind energy has become an important part of sustainable development strategies in many countries [1]. Wind power generation is the conversion of air kinetic energy into electrical energy, and its characteristics will be directly affected by the characteristics of wind speed, which is a stochastic variable and intermittent. With the increase of the proportion of wind energy capacity in the system, the impact of fluctuating wind power on the grid system becomes more and more obvious. High wind speed disturbance will cause great changes in the voltage and frequency of the system and may cause the system to lose stability in serious cases. Therefore, accurate prediction of wind speed is meaningful for the optimal control of wind turbine operation in wind farms, the reasonable formulation of power system dispatch plan, and adverse effect reduction of the wind power on the whole grid [2][3][4][5][6][7].
In the literature, much attention has been paid to developing accurate wind speed forecasting models, which are mainly designed for point predictions. According to the implementation mechanism, current point forecasting models mainly include two categories: one is called the physical model, which is based on numerical weather prediction. The other is based on historical data to construct statistical models to predict future wind speed. Traditional statistical models are represented by time series models, i.e., autoregressive moving average (ARMA) models [8]. In recent years, artificial intelligence and machine learning (AI/ML) models, such as artificial neural networks (ANN) [9], support vector machine (SVM) [10], extreme learning machines (ELM) [11], and deep learning networks (DLN) [12], have been widely used in point prediction of short-term wind speed. In order to improve the prediction accuracy and robustness, hybrid or combined models [13] integrating the advantages of single models are attracting more and more attention.
As nondeterminacy exists in actual wind speed samples, the point prediction model could not always give satisfactory results, which makes the decision-making work face certain risks. Interval prediction is an effective tool to describe and quantify the uncertainty of wind speed. Some scholars have carried out research on wind speed interval prediction in recent years. Song, Jiang, and Zhang [14] examined a Markov-switching model in wind speed forecasting. Unlike the traditional point forecast of wind speeds, such a model could offer both the point and interval forecasts of wind speed series. Iverson et al. [15] used stochastic differential equations (SDEs) to model short-term wind speed. They showed that SDEs could effectively capture the time dependence structure of wind speed prediction errors naturally and, most importantly, derive point and interval forecasts using one SDE model. Recently, wind speed interval forecast using AI/ML models under the multi-objective optimization framework has received more and more attention [16][17][18][19][20][21]. The multi-objective optimization aims at concurrently minimizing the width and maximizing the coverage probability of the constructed intervals. Several AI/ML models, such as SVMs [16][17][18], back propagation neural networks (BPNNs) [19], radial basis function neural networks (RBFNNs) [20], and deep belief networks (DBNs) [21], have been successfully applied in wind speed interval forecasting. Some signal processing algorithms, including variational mode decomposition (VMD) [17], complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) [18], and wavelet transform (WT) [19,21], were also utilized to reduce the noise and complexity of the raw wind speed data.
Although interval prediction can give upper and lower boundaries of future wind speed, it cannot describe the probability distribution of wind speed. It is well known that the probability density function (PDF) could completely model the probabilistic characteristics of random variables. Thus, probability density prediction can describe the uncertainty of actual wind speed data more accurately and can provide fully predictive information for decision-making work. Some scholars have already made some fruitful attempts. Gneiting and his collaborators [22,23] first used multiple estimates of the current state of wind speed to generate an ensemble of deterministic predictions. They then adopted Bayesian model averaging (BMA) as a statistical post-processing method to predict the PDFs of wind speed. Their results showed that the BMA method could provide calibrated and sharp probabilistic forecasts of wind speed. A similar idea was also adopted by Baran [24,25], but using two different distributions (log normal and truncated normal distributions) to calibrate the probabilistic wind speed forecasts. Both Gneiting's [22,23] and Baran's [24,25] studies need multiple runs of numerical weather forecasting models with various initial conditions to obtain the ensembles of forecasts. Unlike these studies, Hu and Wang [26] proposed a hybrid probability prediction model based on wind speed historical data. The hybrid model is composed of empirical WT and Gaussian process regression (GPR). Experimental results showed that the hybrid GPR model can provide the most likely value and the probability information corresponding to the wind speed forecast based on the predictive PDF [26]. Compared with the large number of studies on point and interval predictions (especially for the point prediction), the studies on wind speed probability density prediction are relatively few, and more effective models should be explored to meet engineering needs.
Quantile regression (QR) is one of regression analysis methods. It was first proposed by Koenker and Bassett in 1978 [27]. QR estimates conditional quantiles of response variables from explained variables and further deduces the conditional probability distribution of response variables without assuming the distribution type of random variables [28]. Therefore, QR could be a good choice for probability density forecasting of wind speed. In order to improve the performance of linear QR (LQR) in complex nonlinear problems, some scholars integrated the AI/ML algorithms with QR and proposed quantile regression neural networks (QRNN) [29], quantile regression random forests (QRRF) [30], quantile regression support vector machine (QRSVM) [31], etc. Very recently, He and his collaborators [32][33][34][35], respectively, applied QRSVM and QRNN for short-term probability prediction of wind power. Their results showed that QRNN has excellent performance in wind power probability forecasting. However, in the current literature, it is rare to see the QR for wind speed probability prediction. Zheng et al. [36] performed pioneering work in this area. They put forward a theoretical framework for wind speed probability density prediction based on composite QR and outlier-robust ELM (CQR-ORELM) with feature selection and parameter optimization. A detailed analysis of actual wind speed data showed that the CQR-ORELM model can well describe the conditional distribution and provide satisfactory wind speed forecasts. Research in this area will be continued in this study, using both linear QR and nonlinear QR (NQR, i.e. QRNN, QRRF, QRSVM, etc.) to predict the probability density of short-term wind speed, in order to promote the further study of wind speed probability prediction.
Density estimation methods should be used to assist QR to obtain the PDF of predictive wind speed. Unlike the traditional parametric estimation methods, the kernel density estimation (KDE) method does not require prior knowledge of the data distribution and does not add any assumptions to the data distribution. Because of its flexibleness and robustness, KDE has been widely used in wind speed probability distribution estimation and wind energy assessment [6,37]. The core of KDE is the selection of the kernel function and the determination of bandwidth. In this study, the Gaussian kernel function will be chosen for its generality. The normal reference criterion (NRC) [38] will be used for calculating the bandwidth. According to the NRC, the optimal bandwidth is achieved by minimizing the value of the mean integrated squared error (MISE). By choosing the Gaussian kernel function and optimal bandwidth, the KDE method is utilized to handle future wind speed data predicted by QR models to acquire the overall PDFs at any moment. In addition, empirical mode decomposition (EMD) [39,40], which is a famous adaptive de-noising algorithm based on local characteristics of the signal, will be implemented to make the raw wind speed series less noisy and more stable.
The above literature review indicates that only a few studies have focused on the wind speed probability prediction; this is particularly true for the QR-KDE based probability density forecasting of short-term wind speed. In addition, few efforts have been made to comprehensively compare the performance and robustness of both the LQR and NQR models in point, interval, and probabilistic density predictions of short-term wind speed. Therefore, the novelty and contributions of this study can be summarized as follows: A framework for probability density forecasting of wind speed based on QR and KDE is proposed. EMD is implemented to reduce the noise of raw wind speed series. Both LQR and NQR (QRNN, QRRF, and QRSVM) are, respectively, utilized to study the de-noised wind speed signal. By choosing the Gaussian kernel function and optimal bandwidth, the KDE method is utilized to handle future wind speed data predicted by QR models to acquire the overall PDFs at any moment. Various experiments are conducted on the real wind speed data at four wind sites in China. The performance and robustness of various QR-KDE models in point, interval, and density predictions of short-term wind speed are compared comprehensively. The best QR-KDE based probabilistic density forecast model is then recommended for real applications.
The content of this paper is organized as follows. Section 2 explains the structures and procedures of QR-KDE based density forecast models. Section 3 introduces the measurement of wind speed data at four sites of China. Section 4 presents the evaluation of model parameters and compares the performances of various models, and Section 5 summarizes the conclusions of the study.

Linear Quantile Regression
Different from the classical regression analysis, QR aims at estimating the conditional quantiles of the response variable under given independent variables and then the conditional density distribution of the response variable. Similar to linear regression, LQR is also based on the method of least squares. If the independent variables are x i (t) (i = 1, 2, ..., I), regression coefficients are m i , and the intercept term is b, then we can get the τth (0 ≤ τ ≤ 1) quantile of the response variableŷ τ (t) as: From LQR,ŷ τ (t) could be estimated by minimizing the quantile regression error function: where y(t) is the observed value of response variable at time t (t = 1, 2, ..., N). ρ τ (u) is known as the pinball loss function, and its expression is given by: The detailed optimization algorithm was outlined by Koenker [28]. The whole conditional density distribution can be obtained by continuously taking the value of τ in the range of (0, 1). When dealing with complex nonlinear problems, the performance of LQR might be poor. Thus, NQR models integrating with AI/ML algorithms have also been proposed in the literature. In this study, these models will also be utilized for probability density forecasting of wind speed data.

Quantile Regression Neural Network
The ANN family are widely used AI/ML algorithms. In the forecasting field, the single hidden-layer feedforward network is the most commonly used algorithm. Cannon [29] combined LQR and this type of ANN to propose the QRNN model. By applying the hyperbolic tangent to the inner product between independent variables x i (t) and weights w  j of the hidden-layer, we can obtain the output of the jth hidden-layer node as follows: An estimate of the conditional τ−quantileŷ τ (t) is then given by: are, respectively, the weights and bias of output-layer nodes. The output-layer transfer function f (·) is usually considered as the linear function. The number of hidden-layer nodes J, which controls the model complexity, should be set carefully to avoid overfitting. Moreover, we can also use the weight decay regularization [41] to help prevent overfitting. A positive constant λ, which is called the weight penalty, should also be determined to control the relative contribution of the weight decay term.

Quantile Regression Random Forest
Both LQR and QRNN obtain the optimal parameters by minimizing the quantile regression error function (see Equation (2)). Meinshausen [30] proposed a different approach (QRRF), which is based on random forests (RFs) instead of directly optimizing the minimum error function. For QRRF, the ensemble of trees is grown as in the standard RF algorithm by employing random node and split point selection. Then, the conditional distribution of the response variable is estimated by the weighted distribution of observed response variables, where the weights attached to observations are identical to the original RF algorithm [30].
Following the notation of Breiman [42], the random parameter vector θ k describes how a tree is grown. The observed values of one leaf are (x, θ k ). Let the weight vector ω i (x, θ k ) be given by a positive constant if observation X i is part of leaf (x, θ k ) and zero if it is not. The conditional distribution F(y|X = x) could be approximated by the weighted mean over the observations of 1 {Y≤y} as: Estimations of the τth conditional quantilesŷ τ (t) could then be obtained according to the definition: The number of trees N tree and the number of leaves grown in every tree m try are two tuning parameters of the QRRF. The values of these two parameters will be fine-tuned on the out-of-bag samples.

Quantile Regression Support Vector Machine
Takeuchi et al. [31] first proposed the QRSVM model, which integrates SVM into the QR to construct an NQR model. By minimizing the error function plus a regularizer, QRSVM could provide the conditional quantiles of response variable. Similar to the regular SVM for regression [43], the independent variable vector could be projected into a higher dimensional feature space by using the kernel function defined nonlinear mapping relations, i.e., f (x) = φ(x), w + b, in which w is the weight vector, φ(x) is the mapping function of independent variable x, and b is the dual variable to the constraint. Using the connection between reproducing the kernel Hilbert space and feature space, we obtain the dual optimization problem, which is equivalent to the minimization of error function plus the regularizer: Here, we use C = 1/(λ s m), in which λ s is the regularization parameter. Equation (8) could be solved straightforwardly using the Lagrange multipliers. f (x) is also represented by the kernel expansion form as f (x) = ∑ i α i k(x, x i ) + b, where α i is the Lagrange multiplier and k(x, x i ) is the kernel function. In our study, we chose the radial basis function as the kernel function. The bandwidth of kernel function and regularization parameter λ s determine the performance of QRSVM and should be tuned to make the QRSVM in the optimal state.

Kernel Density Estimation
KDE is used to estimate the PDF from conditional quantiles obtained by QR models. By centering a smooth kernel function at each data point, KDE then sums to get a density estimate. The expression of the basic kernel estimator is given as: in which K(·) is the kernel function and h is the bandwidth. The Gaussian kernel function is chosen in this study for its generality. The selection of the bandwidth would significantly affect the estimation results. NRC [38] is used by minimizing the MISE to select the optimal bandwidth. The optimization problem is given by: As the form of f (x) is unknown, so the solution of Equation (10) is not easy. Usually, we can assume that the kernel density function obeys the normal distribution, so that the optimal bandwidth can be calculated directly from the following equation: in which d is the lag number, n is the sample number, andσ is the standard error of the sample. Theσ could be replaced by the inter-quartile range (IQR) to make it less sensitive to outliers.

Empirical Mode Decomposition
In order to reduce the noisy of original wind speed series, EMD is implemented before applying the QR-KDE models for probability density forecasting. After the process of EMD, the original wind speed data are decomposed into a finite number of intrinsic mode functions (IMFs) and residuals. Compared with the original data, each IMF behaves as more stable and regular. In order to determine whether the decomposed signal is an IMF or not, two conditions should be satisfied [40]: (a) the number of extrema and the number of zero crossings are equal or differ at most by one; (b) at any point, the mean value of the envelope defined by the local maxima and the envelope defined by the local minima is zero. The detailed computation of the EMD can be found in [40].

Probabilistic Density Forecast Models Based on QR-KDE
Based on the principles of LQR, NQR, KDE, and EMD introduced in previous sections, the solution process of QR-KDE based probabilistic density forecast models is proposed and shown in Figure 1. It can be summarized as the following steps: Step 1: For the original wind speed data, EMD is implemented to obtain a finite number of IMFs and residuals. These IMFs and residuals are then studied by the LQR or NQR (QRNN, QRRF, and QRSVM) models, respectively. An ensemble of conditional quantiles of wind speed is obtained accordingly.
Step 2: Based on the optimal bandwidth by NRC, KDE estimates fully the PDFs of predictive wind speed from the ensemble of conditional quantiles. If the prediction error does not meet the requirement, then Steps 1 and 2 are repeated by tuning and selecting the best parameters of each LQR and NQR models.
Step 3: Under the 90% and 80% confidence levels, the forecasting PDF curves are constructed, respectively. Both point and interval predictions under the two confidence levels are calculated in for model performance comparisons.
According to different QR models used, the wind speed density prediction models proposed in this study are divided into four types, namely EMD-LQR-KDE, EMD-QRNN-KDE, EMD-QRRF-KDE, and EMD-QRSVM-KDE. We can also directly apply the QR models to the original wind speed data without the implementation of EMD. Here, these four models, including LQR-KDE, QRNN-KDE, QRRF-KDE, and QRSVM-KDE, are also investigated for comparisons with EMD based models.

Forecasting Performance Evaluation
In order to compare the performance of various models in point and interval predictions, several evaluation metrics should be defined. For the point prediction, the probability mean and median are of the most interest. Here, the root mean squared error (RMSE), the mean absolute error (MAE), and the mean absolute percentage error (MAPE) are utilized to evaluate the point forecasting performance. Their expressions are given by: where y t ,ŷ t denote the actual and predicted values and n is the sample number. Due to the randomness and volatility, wind speed may be close to zero or even equal to zero at some times. The maximal value of wind speed y t,max is used as the denominator in the analysis of the MAPE to avoid the instance that the error value tends to infinity. There are usually two metrics for the performance evaluation of interval predictions. The first one is called prediction interval coverage probability (PICP) [44], which is expressed as follows: where n is the sample number. If the actual observations of the ith wind speed sample fall into the predicted interval, then c i = 1; otherwise, c i = 0. The predicted interval is unreliable if the PICP is significantly lower than the predetermined confidence level. The normalized mean prediction interval width (NMPIW) [44] is the other metric to evaluate the accuracy of interval prediction. Its mathematical expression is written as: where U(x i ), L(x i ) are the upper and lower interval boundaries of the ith sample, respectively. R is the variation in range of actual wind speed. If the NMPIW is large enough, PICP can always reach 100% to satisfy the predetermined confidence level. However, such an interval width is too large and useless for engineering applications. The goal of constructing the prediction interval is to make NMPIW as small as possible under the premise that PICP is greater than the predetermined confidence level. From a practical point of view, we expect higher PICP and lower NMPIW. However, these two objectives always conflict with each other. Under certain conditions, higher PICP would lead to higher NMPIW, and lower NMPIW would cause PICP become lower. Therefore, a coverage width based criterion (CWC) [45] is defined as a comprehensive evaluation metric: where µ is the predetermined confidence level, η is the penalty parameter (usually a large number), and: When PICP is greater than µ, the CWC is equal to NMPIW. If PICP is lower than µ, then both PICP and NMPIW determine the value of the CWC, and PICP would have a greater impact on the CWC. In short, the lower the CWC, the better the accuracy of interval prediction is.
For the probabilistic density prediction, the continuous ranked probability score (CRPS), defined by Matheson and Winkler [46] and Gneiting and Raftery [47], could be utilized as the evaluation metric. Its expression is presented as follows: where F(x) is the predictive cumulative distribution function and y is the verifying observation. Generally, the lower the CRPS metric, the higher the accuracy of probability density forecasting.

Wind Data Description
With the aid of an anemometer, the wind speed information can be recorded continuously. Usually, the time interval of recorded wind speed data is 10 min. For the convenience of following the analysis, the data are averaged hourly. Four wind speed observation stations in China, whose geographic information is listed in Table 1, are concerned in this study. For each observation station, we select a wind speed dataset with a sample size of 1000, of which the first 800 is the training set and the latter 200 is the testing set. Tim plots for each wind speed dataset are given in Figure 2.  Table 2 shows the statistical characteristics of wind speed data at four sites. It can be seen that the maximum wind speeds of the four sites are all greater than 10 m/s. The average wind speed of the HeiLongJiang (HLJ) site is the highest with a large deviation, while that of the AnHui (AH) site is the lowest with a small deviation. Although the kurtosis indexes of the four sites have little difference, the difference in the skewness index is evident. There are not only sites with high skewness, such as the GanSu (GS) site (0.9460), but also sites with low skewness, such as the GuangDong (GD) site (−0.0621). Therefore, the wind speed data of the four selected points are representative, which can provide data support for the research of the prediction method in this paper. Table 2. Statistical features of wind speed data at four wind sites.

Results and Discussions
Before applying the QR-KDE models, EMD is used for signal de-noising. In this study, two IMFs and residuals are used to represent the original series and are respectively studied by the QR-KDE models to obtain the ensemble of conditional quantiles of predictive wind speed.
The input variables of the model are wind speed data with different lag periods. The lag order of different wind speed data is determined by the partial autocorrelation function (PACF). Taking the original data of the AH site as an example, the PACF is drawn and shown in Figure 3. When the lag order is greater than four, the PACF value is small and can be ignored. Therefore, the lag order is four. For the other data, the same method can be used to determine the lag order.
Quantiles from 0.05 to 0.95 with an interval of 0.01 are used to construct the 90% confidence level, and quantiles in the range of 0.1-0.9 with the interval of 0.01 are adopted to construct the 80% confidence level. For the LQR model, the least squares method is used for model estimation. However, the parameters of various NQR models are different, and there is no uniform parameter determination method. Therefore, aiming at minimizing the MAE results of point prediction, various NQR models (QRNN, QRRF, and QRSVM) with different parameter combinations are tested for each wind speed dataset of four observation stations, and the optimal parameters are obtained, as shown in Tables 3-5.  Using these models, we compare the accuracy of both the point and interval predictions of various QR-KDE models. Then, taking the conditional quantiles obtained by the QR-KDE model with the best performance as the input, PDFs of wind speed at different times are obtained by the KDE method and verified by comparing with the actual wind speed values.

Point Prediction
Point prediction includes the probabilistic mean and median prediction. Various QR-KDE models with/without EMD are applied for point predictions. In addition, the ARMA model is also a common model in the point prediction analysis. Bayesian information criteria (BICs) are used to determine the optimal order of the ARMA model. Taking the original data of the AH site as an example, the BIC values of the ARMA models with different order combinations are calculated and shown in Table 6. Obviously, the BIC value corresponding to ARMA(1,1) is the lowest. Therefore, ARMA(1,1) is determined as the prediction model of AH wind data, and the model parameters are calculated by the maximum likelihood estimation. The order and parameter estimation of ARMA models for other wind data can be determined according to the above process. The introduction of EMD greatly improves the point prediction accuracy of the QR-KDE models. The results of the AH station (see Table 7) are taken as an example. Without EMD, the lowest RMSE, MAE, and MAPE of wind speed mean are all obtained by ARMA, i.e., 0.8692 m/s, 0.6833 m/s, and 19.68%. With EMD, the lowest RMSE, MAE, and MAPE of wind speed mean are gained by EMD-QRNN-KDE and EMD-LQR-KDE, i.e., 0.4635 m/s, 0.3498 m/s, and 11.08%. The prediction accuracy is increased by nearly 50%. For the wind speed median, the prediction accuracy is also increased 50% after considering the EMD. Similar accuracy improvement could also be found in the other three stations.  Generally, ARMA, LQR, and QRNN have better point prediction performance than QRRF and QRSVM. At the AH station (see Table 7), except for the MAPE of the mean and the MAE of the median, the lowest values of other metrics of both the mean and median prediction are obtained by EMD-QRNN-KDE. At the GD station (see Table 8), EMD-LQR-KDE has the lowest RMSE, MAE, and MAPE of the wind speed mean, while EMD-QRNN-KDE gets the lowest metrics of wind speed median. Except for the RMSE of the wind speed median at the GS station (see Table 9) and the RMSE and MSE of the wind speed median at the HLJ station (see Table 10), EMD-LQR-KDE gains the lowest values of other metrics at the two stations. Thus, we can conclude that for point prediction, EMD-LQR-KDE and EMD-QRNN-KDE have better performance than the other seven models.

Interval Prediction
Interval predictions under two confidence levels (90% and 80%) are carried out by various QR-KDE models. Both the PICP and NMPIW of the predicted intervals for the four observation stations are then calculated and given in Tables 11-14, respectively. It is shown that the PICPs of predicted intervals by QR-KDE models without EMD (LQR-KDE, QRNN-KDE, QRRF-KDE, and QRSVM-KDE) are always lower than the predetermined confidence levels. After the introduction of EMD, the PICPs of various QR-KDE models are significantly increased. The most obvious increase is found in the EMD-QRRF-KDE model. Taking the results at the GD station for example (see Table 12), we can see that the PICPs of EMD-QRRF-KDE are 99% and 95%, greatly exceeding the confidence levels (90% and 80%). However, if we pay attention to the NMPIW, we can find that the NMPIW of EMD-QRRF-KDE (36.3% and 28%) becomes wider than that of QRRF-KDE (30.3% and 22.7%). This indicates that for the QRRF model, the cost of introducing EMD to increase PICP is the widening of NMPIW, which is contrary to the goal of interval prediction. For the QRNN model, after the introduction of EMD, we can find that PICPs increase to 90.0% and 81%, which are in the vicinity of the predetermined confidence levels. Most importantly, the NMPIW is reduced greatly, from 22.3% and 15.2% of QRNN-KDE to 16.4% and 11.8% of EMD-QRNN-KDE. The positive effect of EMD on interval prediction accuracy could also be found in the LQR model. Although EMD could reduce the NMPIW of QRSVM-KDE, its increased effect on NICP is not as significant as the QRNN and LQR models. Similar findings could also be gained from the results of the other three stations.   Under the two confidence levels, Tables 11-14 also present the results of the CWC metric at the four stations, to directly show the performance of various QR-KDE models without and with EMD. According to the definition of Equation (17), the QR-KDE model with the lower CWC has higher accuracy and better performance. It is shown that except for the case of the AH station, both EMD-LQR-KDE and EMD-QRNN-KDE have the lowest CWC. At the AH station (see Table 11), the CWC of EMD-QRNN-KDE is slightly higher than that of EMD-LQR-KDE, while it is still much lower than the CWCs of the other QR-KDE models. Among the considered QR-KDE models, EMD-LQR-KDE and EMD-QRNN-KDE have the best performance in interval prediction. However, the introduction of EMD is not always useful to improve the accuracy of interval prediction, especially for the QRRF models. We can see that the performance of QRRF-KDE becomes worse after considering the EMD. For QRSVM-KDE, it is found that EMD is helpful for accuracy improvement in most cases. Only at the GS station, the introduction of EMD seems to have little effect (90% confidence level; see Table 13) and even a negative effect (80% confidence level; see Table 13) on accuracy improvement. Figures 4-7 give the upper and lower bounds of wind speed predicted by EMD-QRNN-KDE at the four stations, respectively. The interval widths of the 90% confidence level are basically larger than those of the 80% confidence level. At some times, such as the 110th hour (AH site in Figure 4b), the 150th hour (GD site, Figure 5b), the 30th hour (GS site, Figure 6a), and the 50th hour (HLJ site, Figure 7a), the interval predicted by the model can not well contain the actual wind speed values. However, these moments are in the minority. As most real wind speeds were covered by the upper and lower bounds, we can find that the performance of EMD-QRNN-KDE is good.

Probabilistic Density Prediction
From the analysis of the previous sections, one can see that the performance of the LQR and QRNN models on point and interval predictions could be greatly improved by EMD. In this section, by taking conditional quantiles obtained by various QR-KDE models as the input, the PDFs of wind speed at 200 h are obtained by the KDE method. The optimal bandwidth of the KDE method is determined by the NRC. Tables 15-18 present the CRPS values of both LQR and nonlinear QR models under the 80% and 90% confidence levels. Obviously, through the pre-processing of wind speed by EMD, the CRPS metric of probability density prediction is significantly reduced, except for the QRRF-EMD model. It is indicated that the introduction of EMD can improve the accuracy of probability density prediction for most QR-KDE models. In addition, the CRPS value of the EMD-QRNN-KDE model is the lowest, indicating that the EMD-QRNN-KDE model performs best in wind speed probability density prediction.
Here, the specific PDF curves predicted by EMD-QRNN-KDE at nine hours (the 1st, 25th, 50th, 75th, 100th, 125th, 150th, 175th, and 200th) under the 80% and 90% confidence levels for four wind observation stations are, respectively, given by Figures 8-11. In order to make the comparisons, the actual wind speed values at these hours are also presented in the figures. Except for the 150th hour of the AH station in Figure 8g, the 25th hour of the GD station in Figure 9b, the 125th hour of the GS station in Figure 10f, and the 175th hour of the HLJ station in Figure 11h, most actual wind speeds are always near the peaks of the predicted PDF curves, indicating that the probabilistic density prediction by EMD-QRNN-KDE is believable. Compared with the curves of the 90% confidence level, the density curves of the 80% confidence level usually have narrower wind speed ranges and higher peak values. At some times, the density curves might be biased distributions (see the 25th hour of the AH station in Figure 8b, the 125th hour of the GD station in Figure 9f, the 1st hour of the GS station in Figure 10a, the 1st hour of the HLJ station in Figure 11a, etc.), bimodal distributions (see the 175th hour of the AH station in Figure 8i, the 75th hour of the GD station in Figure 9c, etc.), or even multi-modal distributions (see the 75th hour of the AH station in Figure 8c, the 50th hour of the GD station in Figure 9c, the 175th hour of the HLJ station in Figure 11h, etc.). Based on the EMD-QRNN-KDE model, we can not only get the specific PDF curves of wind speeds, but also obtain the dynamic change of density distributions with time. This means that compared with traditional point and interval prediction models, the proposed QR-KDE models could acquire more information about the randomness and uncertainty of the actual wind speed.

Conclusions
A framework for probability density forecasting of wind speed based on QR and KDE is proposed. EMD is implemented to reduce the noise of raw wind speed series. Both LQR and NQR (QRNN, QRRF, and QRSVM) are, respectively, utilized to study the de-noised wind speed series. By taking the predicted conditional quantiles as the input, PDFs of wind speed at different times are obtained by the KDE method. Various experiments and comparisons are conducted on the real wind speed data at four wind observation stations in China. The conclusions are summarized as follows: (1) EMD-LQR-KDE and EMD-QRNN-KDE have the best performance and robustness in both point and interval predictions.
(2) Most actual wind speeds lay near the peak of the predicted PDF curves, indicating that the probabilistic density prediction by EMD-QRNN-KDE is believable.
(3) With the change of times, the predicted density curves might be biased distributions, bimodal distributions, or even multi-modal distributions.
The results show that the QR-KDE model can not only provide point prediction and interval prediction results, but also provide the probability density distribution of wind speed at any moment. Therefore, the research results will help to deepen the understanding of the randomness and uncertainty of actual wind speeds.

Conflicts of Interest:
The authors declare no conflict of interest.