1. Introduction
With the rapid development of science and technology industries, fossil fuel energy has been consumed in large quantities, which has led to a series of problems, such as the greenhouse effect, resource shortages, and environmental pollution [1,2]. To alleviate the existing energy crisis and resource scheduling problems, attention has turned to the development of renewable, nonpolluting energy sources. Wind energy is one of the most common sources of energy in nature and accounts for a large proportion of renewable energy development [3,4]. Wind-speed prediction is also important for the wind-resistant design of bridges [5] and railway infrastructure [6]. However, the nonstationarity, nonlinearity, and intermittency of wind-energy resources lead to uneven wind-power output of wind turbines, which hinders grid security maintenance, power quality, and power scheduling and planning [7,8]. For wind-power generation systems, storing electricity is far more difficult and costly than generating it, and most generators feed power directly to the grid [9,10]. This is very different from traditional dispatchable energy sources, such as hydroelectricity [11,12]. Another difficulty with wind-power generation is the integration of wind power into the grid [13]. Integrating a large amount of intermittent energy into the grid leads to unbalanced and unstable power-frequency regulation ranges and peak values during peak periods of electricity consumption [14]. Therefore, finding an accurate and robust wind-speed prediction method has always been an important research direction for related scholars.
Wind-speed prediction methods can generally be divided into two categories: physical methods and statistical methods [15]. Physical methods build predictive models through complex physical laws and meteorological boundary information [16]. In general, physical methods are more stable for long-term wind-speed prediction. Numerical weather prediction (NWP) [17] is the most common physical method. For example, Yamaguchi et al. [18] proposed a wind-speed forecasting method based on an autoregressive model; their experimental results showed that high-resolution numerical weather forecasting can effectively improve wind-speed forecasting accuracy. Zhao et al. [19] developed a selective ensemble system based on NWP; the experimental results showed that the proposed method can obtain accurate wind-speed predictions by automatically culling underperforming members. Chen et al. [20] employed a Gaussian process together with the NWP model to predict next-day wind power; the results showed that the prediction accuracy of the proposed model is better than that of the comparison models. However, physical models require complex physical variables and time-consuming computations.
Unlike physical methods, statistical methods do not require complex physical variables and only need historical meteorological data to build predictive models [21,22]. In general, statistical methods perform better in short-term wind-speed prediction. Widely used traditional statistical methods include stochastic models [23], autoregressive (AR) models [24,25], and Markov chains [26]. Pablo et al. [27] proposed a wind-speed model based on the Ornstein–Uhlenbeck process and applied it to predict the average wind speed of a wind field in Mexico. Karakus et al. [28] proposed a polynomial AR model for day-ahead wind-speed prediction; the experimental results showed that this model outperforms all other models in both speed prediction and power prediction. Tang et al. [29] improved the traditional Markov chain and proposed a new state classification step and wind-speed simulation method; the results showed that the proposed model outperforms traditional modeling methods. However, traditional statistical methods are mostly based on the assumption of stationarity and cannot effectively extract nonstationary features from wind-speed series.
The advent of machine learning models alleviates this dilemma. Common machine learning models include K-nearest neighbors (KNN) [30], random forests (RF) [31], artificial neural networks (ANN) [32], and support vector regression (SVR) [33,34]. For example, Zhang et al. [35] developed a short-term wind-speed forecasting model, GA-ANN; the results showed that the model can significantly improve the accuracy of short-term wind-speed predictions. Ren et al. [36] developed a new wind-speed prediction model by combining SVR and empirical mode decomposition; the results showed that the proposed model outperforms several contrast methods in terms of accuracy or computational complexity. Wang et al. [37] proposed an RF-based wind-speed forecasting method; the results showed that the proposed method can effectively improve training efficiency and the accuracy of wind-speed predictions. Owing to the characteristics of machine learning models, the nonstationary features in wind-speed sequences are effectively extracted, and the accuracy of wind-speed prediction is improved to a certain extent.
With the deepening of research, scholars have found that the prediction performance of a single model always has certain limitations [38,39], and that combining multiple models can yield better prediction results. Therefore, there is a growing trend to combine different individual models. For instance, Cadenas et al. [40] developed an ensemble model combining an ANN with the autoregressive integrated moving average (ARIMA) model; the results showed that the generalization ability of the ensemble model is better than that of the ARIMA and ANN models alone. In [41], a hybrid model based on modal decomposition and an extreme learning machine was proposed for short-term wind-speed prediction, and it was found that the proposed model can effectively improve the reliability of multistep wind-speed predictions. Wang et al. [42] proposed a hybrid model combining AdaBoost with the extreme learning machine (ELM) algorithm and verified that the proposed model has more potential than traditional methods.
However, most existing studies focus on improving the accuracy of wind-speed predictions while ignoring the quantification of uncertainty in wind-speed series. Accurately capturing the probability distribution of wind-speed sequences can provide richer decision-making information for dispatchers, which is conducive to efficient planning and rational allocation of resources. Therefore, it is necessary to carry out research on wind-speed probability prediction. This paper develops a novel hybrid model, LGB-GPR, which combines the light gradient boosting machine (LGB) and Gaussian process regression (GPR) for wind-speed forecasting and for quantifying forecast uncertainty. The LGB model can provide accurate deterministic wind-speed predictions but cannot quantify wind-speed uncertainty; in contrast, the GPR model can quantify the uncertainty of wind speed, but its prediction accuracy is poor. Fusing the two models exploits the advantages of both. The innovations and main contributions of this paper are summarized as follows:
- (1)
A new machine learning method named LGB is used to predict wind-speed sequences, which can provide accurate wind-speed prediction results.
- (2)
A novel hybrid model combining LGB and GPR is proposed for wind-speed probability prediction.
- (3)
The proposed hybrid model is applied to a real case in the United States and compared with eight contrasting models.
The rest of the paper is organized as follows: In Section 2, a brief description of LGB, GPR, and the LGB-GPR model is given. The evaluation metrics are given in Section 3. The data usage and experimental setup are presented in Section 4. Comparative results and discussion are presented in Section 5. Finally, conclusions and future research directions are given in Section 6.
2. Methodology
To solve the wind-speed probability prediction problem, this paper proposes a new hybrid model, LGB-GPR. In this section, we first describe the formulation and principles of the LGB model and the GPR model, respectively. Then, how to couple the LGB model and the GPR model to obtain reliable wind-speed probabilistic prediction results is described in detail.
2.1. Light Gradient Boosting Machine
The light gradient boosting machine (LGB) is an improved gradient boosting decision tree (GBDT) model proposed by Microsoft. It solves the traditional GBDT model's problems of slow training and large memory usage on data with large volume and high feature dimension, while achieving higher accuracy. LGB is a boosting tree model with decision trees as base learners; the final prediction is obtained by linearly adding the predictions of multiple decision trees.
2.1.1. Model Formulation
Given a dataset T = {(x_1, y_1), (x_2, y_2), …, (x_N, y_N)}, x_i ∈ X, y_i ∈ Y, where x_i is an n-dimensional feature vector, X is the input space, y_i is a one-dimensional label, Y is the output space, and N is the number of samples. The model can be expressed as follows:

f_M(x) = Σ_{m=1}^{M} T(x; Θ_m)

where T(x; Θ_m) represents a single binary regression tree, Θ_m is the parameter of the tree, and M is the number of trees.
If the input space X is divided into J disjoint regions R_1, R_2, …, R_J, with a fixed output value c_j corresponding to each region, the regression tree can be expressed as:

T(x; Θ) = Σ_{j=1}^{J} c_j I(x ∈ R_j)

where Θ = {(R_1, c_1), (R_2, c_2), …, (R_J, c_J)} denotes the divided regions of the tree and the output values on the corresponding regions, I(·) is the indicator function, and J is the complexity of the tree, that is, the number of leaf nodes.
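For instance, a tree with J = 2 regions reduces to a threshold lookup. The sketch below (with an illustrative threshold and leaf values, not taken from the paper) evaluates T(x; Θ):

```python
import numpy as np

# T(x; Theta) with Theta = {(R1, c1), (R2, c2)}, where R1 = {x < 2.5} and
# R2 = {x >= 2.5}; the threshold and leaf outputs are illustrative only.
def tree_predict(x, thr=2.5, c1=1.0, c2=3.0):
    return np.where(x < thr, c1, c2)

print(tree_predict(np.array([1.0, 4.0])))  # [1. 3.]
```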
2.1.2. Model Optimization Mechanism
Compared with the traditional GBDT model, the optimization of the LGB model mainly includes the following points:
- (1)
Gradient-based One-Side Sampling (GOSS): Without changing the distribution of the sample data, samples with small gradients are partially discarded, and the remaining samples with larger gradients are used to estimate the information gain, thereby reducing the number of training samples. Since samples with smaller gradients contribute little to the information gain, GOSS makes the LGB model faster while preserving accuracy.
- (2)
Exclusive Feature Bundling: In practical applications, high-dimensional data are often sparse. After discretizing continuous features with a histogram-based algorithm, the LGB model bundles mutually exclusive features into new features, reducing the feature dimension and memory usage and speeding up model training.
- (3)
Leaf-wise Tree Growth with Depth Limit: Most decision tree models adopt level-wise tree growth, in which every leaf node at the current depth is split. LGB instead adopts a leaf-wise growth strategy, splitting only the leaf node with the largest split gain, which reduces unnecessary overhead. For the same number of splits, the leaf-wise strategy is more accurate than the level-wise one. LGB avoids overfitting by setting a maximum tree depth parameter. The growth of the decision tree is shown in Figure 1.
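For illustration, these three optimizations correspond to training parameters exposed by the lightgbm library. The configuration below is a hypothetical sketch; the parameter values are chosen for illustration and are not the settings used in this paper:

```python
# Hypothetical LightGBM parameter set mapping to the three optimizations above.
params = {
    "boosting_type": "goss",  # (1) gradient-based one-side sampling
    "enable_bundle": True,    # (2) exclusive feature bundling
    "num_leaves": 31,         # (3) leaf-wise growth: cap on the number of leaves
    "max_depth": 7,           #     depth limit to avoid overfitting
    "learning_rate": 0.05,
}
```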
2.1.3. Model Implementation Process
The flow of the complete LGB model is as follows:
- (1)
Initialize by finding the constant value that minimizes the overall loss function:

f_0(x) = arg min_c Σ_{i=1}^{N} L(y_i, c)

where L(·) is the loss function. At this point, the model is a tree with only one root node.
- (2)
For m = 1, 2, …, M:
- (a)
For i = 1, 2, …, N, estimate the residual by the negative gradient of the loss function:

r_{mi} = −[∂L(y_i, f(x_i)) / ∂f(x_i)]_{f = f_{m−1}}

- (b)
Fit a regression tree to r_m to obtain the leaf node regions R_{mj} of the m-th tree, j = 1, 2, …, J.
- (c)
For j = 1, 2, …, J, estimate the value of each leaf node region by a linear search that minimizes the loss function:

c_{mj} = arg min_c Σ_{x_i ∈ R_{mj}} L(y_i, f_{m−1}(x_i) + c)

- (d)
Iteratively update with the following formula:

f_m(x) = f_{m−1}(x) + Σ_{j=1}^{J} c_{mj} I(x ∈ R_{mj})

- (3)
Obtain the final model:

f_M(x) = Σ_{m=1}^{M} Σ_{j=1}^{J} c_{mj} I(x ∈ R_{mj})
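The steps above can be sketched for the squared loss, where the negative gradient is simply the residual. The following minimal numpy implementation (depth-1 stumps standing in for the binary regression trees; all names are illustrative, not the paper's code) follows steps (1)-(3):

```python
import numpy as np

def fit_stump(x, r):
    """Step (2b): fit a one-split regression stump (J = 2 regions) to residuals r."""
    order = np.argsort(x)
    xs, rs = x[order], r[order]
    best = None
    for k in range(1, len(xs)):
        left, right = rs[:k], rs[k:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, (xs[k - 1] + xs[k]) / 2, left.mean(), right.mean())
    _, thr, cl, cr = best
    return lambda z: np.where(z < thr, cl, cr)  # step (2c): leaf means minimize L2

def boost(x, y, M=50, lr=0.1):
    f0 = y.mean()  # step (1): constant minimizing the squared loss
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(M):
        r = y - pred              # step (2a): negative gradient of the squared loss
        t = fit_stump(x, r)
        trees.append(t)
        pred = pred + lr * t(x)   # step (2d): additive update
    # step (3): final model as the sum over all trees
    return lambda z: f0 + lr * sum(t(z) for t in trees)

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = np.sin(x) + 0.1 * rng.normal(size=200)
model = boost(x, y)
mse = np.mean((model(x) - y) ** 2)
```

A learning rate below 1 slows each update, which is the usual shrinkage trick for reducing overfitting in boosting.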
2.2. Gaussian Process Regression
Gaussian process regression is a nonparametric model that places a Gaussian process prior on the regression function. Its basic principle is as follows: assuming the learning samples obey a Gaussian distribution, the posterior distribution of the random variable is estimated from the Gaussian prior via the Bayesian principle, and the model parameters are estimated by maximum likelihood or Monte Carlo sampling. A Gaussian process regression model is then constructed to obtain a probabilistic prediction that obeys a Gaussian distribution. The schematic diagram is shown in Figure 2.
In the figure, X = [x_1, x_2, …, x_n] represents the n-dimensional input feature vector, and Y = [y_1, y_2, …, y_n] represents the predictor variable. Suppose x and y form the following regression model:

y = f(x) + ε
where ε is the noise, obeying a normal distribution with a mean of 0 and a variance of σ_n², and n is the input feature dimension. The prior distribution of y_train is:

y_train ∼ N(0, K(X_train, X_train) + σ_n² I_n)
where K(X_train, X_train) is an n × n symmetric positive definite covariance matrix and I_n is the n-dimensional identity matrix. The detailed expression of K(X_train, X_train) is as follows:

K(X_train, X_train) =
[ cov_{1,1}  cov_{1,2}  …  cov_{1,n} ]
[ cov_{2,1}  cov_{2,2}  …  cov_{2,n} ]
[    ⋮          ⋮       ⋱     ⋮     ]
[ cov_{n,1}  cov_{n,2}  …  cov_{n,n} ]
where cov_{i,j} represents the covariance between feature i and feature j. A Gaussian process kernel function k(·, ·) is introduced to model the covariance between each pair of feature dimensions, so that K(X_train, X_train) = (k_{ij}). In this paper, the radial basis kernel function is used, with the following formula:

k(x_i, x_j) = σ² exp(−(1/2)(x_i − x_j)ᵀ M (x_i − x_j))
where σ is the hyperparameter of the radial basis kernel, and M is the matrix that characterizes the anisotropy. The joint Gaussian distribution of y_train and y_test is as follows:

[ y_train ]       (     [ K(X_train, X_train) + σ_n² I_n   K(X_train, X_test) ] )
[ y_test  ]  ∼  N ( 0,  [ K(X_test, X_train)               K(X_test, X_test)  ] )
where K(X_train, X_test) is the covariance matrix between the training set feature input X_train and the test set feature input X_test, and K(X_test, X_test) is the internal covariance matrix of the test set feature input.
The posterior distribution of the predicted value y_test of the test set can be obtained by Bayesian inference:

y_test | X_train, y_train, X_test ∼ N(μ_test, Σ_test)
μ_test = K(X_test, X_train) [K(X_train, X_train) + σ_n² I_n]⁻¹ y_train
Σ_test = K(X_test, X_test) − K(X_test, X_train) [K(X_train, X_train) + σ_n² I_n]⁻¹ K(X_train, X_test)

where μ_test is the predicted mean of the test set and Σ_test is the variance of the Gaussian distribution.
2.3. LGB-GPR
On the basis of the deterministic forecast results obtained by the LGB model, combined with the GPR method, the LGB-GPR model is obtained, which can produce both interval forecasts and probabilistic forecasts. GPR is implemented using the 'GPy 1.9.9' framework in Python, and LGB is implemented using the 'lightgbm 3.3.1' framework in Python. All the above models were run on an 'Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz'. The flowchart of the LGB-GPR prediction is presented in Figure 3.
In Figure 3, X_train represents the original training set feature input and X_test represents the original test set feature input; ŷ_train^LGB and ŷ_test^LGB represent the training set results and the test set results predicted by the trained LGB model, respectively; y_train represents the training set observations, y_test represents the test set observations, and ŷ_test^GPR represents the test set results predicted by the GPR model.
The prediction steps based on the LGB-GPR model are as follows:
Step 1: Train the LGB model with X_train and y_train as features and labels, respectively.
Step 2: Taking X_train and X_test as input, respectively, use the trained LGB model to obtain ŷ_train^LGB and ŷ_test^LGB.
Step 3: Train the GPR model with ŷ_train^LGB and y_train.
Step 4: Use the trained GPR model to obtain ŷ_test^GPR from ŷ_test^LGB, and evaluate the model prediction accuracy based on ŷ_test^GPR and y_test.
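The four steps can be sketched end to end. In the sketch below, a polynomial fit stands in for the LGB model and a closed-form GPR with an isotropic RBF kernel stands in for GPy; all names, data, and settings are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def rbf(a, b, sigma=1.0, ell=1.0):
    # 1-D isotropic RBF kernel on scalar inputs.
    return sigma**2 * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

def gpr_posterior(xtr, ytr, xte, noise=1e-2):
    K = rbf(xtr, xtr) + noise * np.eye(len(xtr))
    Ks = rbf(xte, xtr)
    mu = Ks @ np.linalg.solve(K, ytr)
    var = np.diag(rbf(xte, xte) - Ks @ np.linalg.solve(K, Ks.T))
    return mu, var

rng = np.random.default_rng(1)
X_train = np.sort(rng.uniform(0, 6, 60))
y_train = np.sin(X_train) + 0.1 * rng.normal(size=60)
X_test = np.linspace(0.5, 5.5, 20)
y_test = np.sin(X_test)

# Steps 1-2: train the deterministic model; predict on train and test inputs.
coef = np.polyfit(X_train, y_train, 5)
yhat_train = np.polyval(coef, X_train)   # plays the role of the LGB train prediction
yhat_test = np.polyval(coef, X_test)     # plays the role of the LGB test prediction

# Step 3: GPR learns the mapping from deterministic predictions to observations.
# Step 4: posterior mean and variance on the test-set predictions.
mu, var = gpr_posterior(yhat_train, y_train, yhat_test)
rmse = np.sqrt(np.mean((mu - y_test) ** 2))
```

The design point being illustrated: the GPR is trained on the deterministic model's predictions rather than on the raw features, so it inherits the accuracy of the first stage while contributing the Gaussian predictive variance needed for probabilistic forecasts.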