Probabilistic Wind Power Forecasting Approach via Instance-Based Transfer Learning Embedded Gradient Boosting Decision Trees

Abstract: With the high wind power penetration in the power system, accurate and reliable probabilistic wind power forecasting has become even more significant for the reliability of the power system. In this paper, an instance-based transfer learning method combined with gradient boosting decision trees (GBDT) is proposed to develop a wind power quantile regression model. Based on the spatial cross-correlation of wind power generation in different zones, the proposed model uses wind power generation in correlated zones as the source problems of instance-based transfer learning. By incorporating the training data of the source problems into the training process, the proposed model reduces the prediction error for wind power generation in the target zone. To prevent negative transfer, this paper proposes a method that properly assigns weights to the data from different source problems during training, increasing the weights of related source problems while reducing those of unrelated ones. Case studies are developed on the dataset from the Global Energy Forecasting Competition 2014 (GEFCom2014). The results confirm that the proposed model improves the prediction accuracy compared to GBDT-based benchmark models, especially when the target problem has a small training set while resourceful source problems are available.


Introduction
Wind energy has grown to an extent whereby its impact on the power system has become relevant in many regions. According to a report by the World Wind Energy Association (WWEA), the total capacity of wind turbines installed worldwide reached 539 GW at the end of 2017 [1]. In recent years, an increasing number of countries, including Germany, Ireland, Portugal, Spain, Sweden, and Uruguay, have reached a double-digit wind power share. Considering the higher variability of wind power generation, the increased wind power penetration level will introduce considerable uncertainties to power systems, resulting in higher requirements on power system transmission capacity, reserve capacity, and system flexibility [2]. Probabilistic wind power forecasting is proposed to model the uncertainty of wind power by estimating the probability distribution of wind power generation. Based on the probability distribution, many decision-making applications, including unit commitment [3][4][5][6][7][8], wind power trading [9][10][11][12], reserve procurement [13,14], demand response [15,16], probabilistic power flow [17,18], and economic dispatch [19][20][21], can be developed.
To model the wind power distribution, some researchers developed a point wind power forecasting model and subsequently built parametric models [22][23][24][25][26][27][28] and non-parametric models [29][30][31][32] to model the error distribution. In recent years, much attention has been paid to quantile regression. In the wind power forecasting track of the Global Energy Forecasting Competition 2014 (GEFCom2014) [32], three of the five most effective models incorporated quantile regression to solve the wind power distribution. By utilizing the pinball loss function, quantile regression estimates the wind power quantiles directly, which has been shown to be an effective probabilistic wind power forecasting method. For example, a quantile regression forest model and a stacked random forest-gradient boosting decision trees (GBDT) model were built in [33]; these two models form a voted ensemble for forecasting the probability distribution of wind power. In [34], a quantile linear regression model was built, whereby non-linear feature transformations of the input variables are taken as the model input. The winner of the GEFCom2014 wind power track developed a gradient boosted machine (GBM) approach for multiple quantile regression, therein fitting each quantile and zone independently [35]. Notably, the model developed in [35] utilizes information on correlated wind farms in the other zones. The case study shows that when the information of the wind farms in the other zones is used as input, a 2.5% decrease in the pinball loss can be achieved. Similarly, rather than solely using one numerical weather prediction (NWP) sample point measured at the location of the power plant, the work in [36] explored information from the NWP data of both the local zone and the nearby zones, which constitutes a spatial grid of NWP. By constructing new variables from the raw NWP data of the NWP grid, [36] significantly improved the forecast skill of state-of-the-art forecasting systems by 16.09% and 12.85% for solar and wind power, respectively. The success of [35,36] suggests that introducing information from the wind farms in other zones is a viable technique to reduce the prediction error.
To utilize the information of wind power generation in other zones, this paper proposes a wind power quantile forecasting method based on instance-based transfer learning. Transfer learning focuses on solving a specific problem (the target problem) using knowledge gained from different, but related, problems (the source problems) [37]. With this knowledge, transfer learning can improve the results achieved on a target problem [38]. Many successful applications can be found in visual adaptation [39] and text classification [40]. In instance-based transfer learning, the data of the related problems are used as training examples for the target problem [41,42]. For wind power quantile regression, the wind power generation of the wind farms in different zones shows spatial cross-correlation [43]. Therefore, it is reasonable to apply instance-based transfer learning techniques to reduce the forecasting error on the target zone by introducing the data of related wind farms into the training set.
Among existing works that use instance-based transfer learning in forecasting, the work in [44] focused on adding data from the source problems (auxiliary training data) to the training process together with the data from the target problem (base training data). As the source problem set may be mixed with unrelated problems, and the resulting negative knowledge transfer can degrade the prediction accuracy [45], [44] introduced a source problem selection technique based on the covariance coefficient of the load vectors. However, the selected auxiliary training data from different source problems are directly added to the training set and treated equally, neglecting the fact that the relatedness to the target problem is not the same for each source problem.
In this paper, an IBT-GBDT (instance-based transfer learning embedded gradient boosting decision trees) model is proposed. Because the gradient boosting decision trees algorithm is very effective in probabilistic wind power forecasting, it is chosen as the core forecasting method. Following the instance-based transfer learning technique, the base training data from the target problem and the auxiliary training data from the source problems together constitute the training set, but with different weights assigned. To derive the weights, this paper analyzes two types of errors, i.e., random errors and systematic errors. Then, a formula for the weights, which considers the distribution of the errors, is given. However, the errors are unknown before the model is trained; thus, in practice, based on the theoretical analysis, the weights of the auxiliary training sets are solved by iteration, whereas the weight of the base training set is a hyperparameter chosen by cross-validation. As a result of the advantages of transfer learning, the combination successfully utilizes information about other zones. With the enlarged training set, the model is well trained using the GBDT algorithm, which results in the improvement in prediction accuracy. The IBT-GBDT model is tested on a public dataset from the wind track of GEFCom2014, which consists of 10 zones. Measured by the quantile forecasting score (QS), the results show that the IBT-GBDT method can increase the forecasting accuracy for the target zone, especially when the target problem has many closely-correlated problems and a small training set.
To the best of our knowledge, this is the first time that instance-based transfer learning has been used for wind power quantile regression.The contributions of this paper are three-fold:

• Instance-based transfer learning is utilized to increase the accuracy of probabilistic wind power forecasting. Different weights are assigned to different auxiliary training sets according to their relatedness to the target problem, so that the weights reflect the real relatedness between the source problems and the target problem. Based on the maximum likelihood method, the theoretical formula for the weights is derived.
• A unique method for solving the weights is proposed. The weight for the target zone is a hyperparameter chosen by cross-validation, and the weights for the source problems are solved by iteration.
• Several GBDT-based benchmark models are developed to illustrate the effect of the instance-based transfer learning method. Compared to these benchmark models, the IBT-GBDT model achieved the highest prediction accuracy.
The remainder of this paper is organized as follows. In Section 2, the mathematical formulation of the proposed IBT-GBDT model and the corresponding training method are described. Case studies are conducted to validate the proposed approach in Section 3. Finally, Section 4 gives a summary of this paper.

The Instance-Based Transfer Learning Embedded Gradient Boosting Decision Trees
At the beginning of this section, the methodology of GBDT and its application to quantile regression are introduced. Then, the architecture of the IBT-GBDT model is described. Finally, the weight formula and the corresponding iterative weight-solving algorithm are derived.

Pinball Loss Function and Weighted Pinball Loss
The pinball loss function is an error measure for quantile regression. Given a target percentile τ, let f_τ(·) denote the predictive function for the wind power quantile. The corresponding pinball loss, denoted by l_τ(y, f_τ(X)), is defined as:

l_τ(y, f_τ(X)) = τ (y − f_τ(X)) if y ≥ f_τ(X); l_τ(y, f_τ(X)) = (1 − τ) (f_τ(X) − y) otherwise, (1)

where X denotes the model inputs, such as the weather, wind speed, and wind direction, and y denotes the actual wind power generation. For a sample set S, the pinball loss of S is obtained by summing the pinball losses across all samples in S. As some data points contribute more than others, larger weights are assigned to the samples with higher contributions. In this case, the weighted pinball loss for the sample set S is defined as the weighted sum of the pinball losses of all samples in S. Let (X_i, y_i) denote the i-th sample of S and w_i the corresponding weight (the determination of w_i depends on (X_i, y_i) and is introduced in detail in Section 2.3). The weighted pinball loss is defined as:

L_τ = Σ_{(X_i, y_i) ∈ S} w_i l_τ(y_i, f_τ(X_i)). (2)

Gradient Boosting Decision Trees with Weighted Pinball Loss

Gradient boosting is a machine learning technique for regression and classification problems that produces a prediction model in the form of an ensemble of base learners. Typically, gradient boosting is used with decision trees of a fixed size as base learners, e.g., classification and regression trees (CART). By utilizing the weighted pinball loss, gradient boosting decision trees (GBDT) become an effective technique for quantile regression.
GBDT builds the model in a stage-wise manner: in every stage, the target is to fit the residual error. Let f^(t) denote the fitted function after stage t, and let T denote the number of boosting stages. Before the t-th stage, the residual error for sample i is:

r_i^(t) = y_i − f^(t−1)(X_i), (3)

where f^(t−1)(X_i) denotes the fitted value on the i-th sample before stage t.

After the residual error is fitted by the base learner h_t of stage t, the updated weighted pinball loss for the sample set S becomes:

L_τ^(t) = Σ_i w_i l_τ(y_i, f^(t−1)(X_i) + h_t(X_i)). (4)

The gradient of the loss with respect to the current prediction for sample i is:

g_i = ∂l_τ(y_i, f) / ∂f, evaluated at f = f^(t−1)(X_i), which equals −τ if y_i > f^(t−1)(X_i) and 1 − τ otherwise. (5)

Recall the definition of gradient boosting: the base learner should fit the negative gradient of the loss function (−g_i). When CART is chosen as the base learner, the updated weighted pinball loss is:

L_τ^(t) = Σ_{k=1}^{K^(t)} Σ_{i ∈ leaf k} w_i l_τ(y_i, f^(t−1)(X_i) + ρ_k), (6)

where ρ_k denotes the value of the k-th tree leaf in stage t and K^(t) denotes the number of tree leaves in stage t. To minimize the weighted pinball loss L_τ^(t), we take the partial derivative of L_τ^(t) with respect to the value of the k-th leaf and set it to zero:

∂L_τ^(t) / ∂ρ_k = 0, (7)

whereby ρ_k is chosen as the w-weighted τ-quantile of the residuals within the leaf:

ρ_k = Q_τ^w({y_i − f^(t−1)(X_i) : i ∈ leaf k}), (8)

where Q_τ^w(·) denotes the w-weighted τ-quantile.
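In practice, this stage-wise quantile fit is available in standard libraries. The sketch below uses scikit-learn's GradientBoostingRegressor (with loss="quantile") as an illustrative stand-in for the paper's R GBM implementation; the data are synthetic and the shapes hypothetical:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 25, size=(500, 1))                   # e.g. forecast wind speed
y = np.tanh(X[:, 0] / 10) + rng.laplace(0, 0.05, 500)   # synthetic power signal

# One GBDT model per target percentile; loss="quantile" uses the pinball
# loss, alpha plays the role of tau, and sample_weight carries w_i
model = GradientBoostingRegressor(
    loss="quantile", alpha=0.5,
    n_estimators=100, learning_rate=0.05,
    max_depth=5, min_samples_leaf=20, subsample=0.8,
)
model.fit(X, y, sample_weight=np.ones(len(y)))
q50 = model.predict(X)                                  # median forecast
```

Fitting all 99 percentiles means repeating this fit with alpha = 0.01, 0.02, ..., 0.99, as described for the quantile GBDT model in this paper.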

Hyperparameters of Gradient Boosting Decision Trees
In the application of the GBDT algorithm, hyperparameter tuning is crucially important.In this paper, we consider several key hyperparameters, which can be classified into two groups: (1) regression tree-related hyperparameters; (2) gradient boosting-related hyperparameters.
The regression tree-related hyperparameters are (1) the maximum depth D_max and (2) the minimum number of samples in a leaf node N_leaf. The maximum depth controls the number of nodes used in an individual tree. A common value for the maximum depth is between five and nine, which gives a suitable model complexity for a decision tree used as the base learner of gradient boosting. The minimum number of samples in a leaf node is typically between 20 and 80 [36].
The gradient boosting-related hyperparameters are (1) the number of boosting iterations T, (2) the learning rate λ, and (3) the bag fraction bag. The bag fraction stands for the fraction of samples used to train each individual decision tree and is a real number satisfying bag ∈ [0, 1]. The number of boosting stages controls the number of trees used in the model. With sufficient boosting stages, it is a common pattern that the smaller the learning rate, the better the generalization achieved [46]. However, the quantile GBDT model has to repeat the training process 99 times, once per percentile (τ = 0.01, 0.02, . . . , 0.99), which is time consuming. Considering the computation time, the value of λ was chosen as 0.05. For this specific λ, the optimal value of T is between 400 and 800 [36].
The optimal values of these hyperparameters are determined by cross-validation, which is discussed later.

The Architecture of IBT-GBDT
Consider M source problems {P^(m)}, where m is the source problem index, and let P^(target) denote the target problem. When training the forecasting model for the target zone, the sample data of the target problem (base training data) constitute the training set. In an ideal scenario, the wind power generations of the wind farms in different zones are strongly correlated. For wind power quantile regression in the target zone, the auxiliary training data can then be added directly to the training set alongside the base training data. With the enlarged training set, the model can be better trained, and the forecasting error will thus be reduced. In this case, the samples from the base training set and the auxiliary training sets satisfy the following equations:

y_i^(target) = f(X_i^(target)) + ε_i^(target), (9)

y_i^(m) = f(X_i^(m)) + ε_i^(m), (10)

where f denotes the real mapping function between the input variables and the wind power; X_i^(target) and y_i^(target) represent the i-th input variable and output wind power from the target problem; X_i^(m) and y_i^(m) represent the i-th input variable and output wind power from the m-th source problem; and ε_i^(target) and ε_i^(m) denote the random errors for samples from P^(target) and P^(m), respectively.
However, in practice, the cross-correlation between the target problem and the source problems is not ideal. In the worst case, the source problem set may be mixed with totally irrelevant problems. Therefore, the systematic error δ_i^(m) is introduced to denote the imperfect relatedness of P^(m) to P^(target). With the systematic error considered, Formula (10) becomes:

y_i^(m) = f^(m)(X_i^(m)) + ε_i^(m) = f(X_i^(m)) + δ_i^(m) + ε_i^(m), (11)

where f^(m) denotes the real mapping function between the input variable X_i^(m) and the wind power for the m-th source problem; f^(m) is different from f. In this case, the base training data and auxiliary training data should not be treated equally. In this paper, different weights are assigned to the target problem and the source problems. The weight of the base training set is defined as w^(target), and w^(m) denotes the weight of the m-th auxiliary training set. w^(m) should reflect the relatedness of the source problem P^(m) to the target problem: a larger w^(m) implies a stronger relatedness. The formula for the weights is derived in Section 2.3, where the weights are calculated based on the performance of the GBDT model. Based on this formula, an iterative algorithm is developed to solve the weights: in every iteration, the weights are adjusted according to the weight formula, and the updated weights are sent back to the GBDT model. The iteration stops when the weights converge.
When the target percentile τ changes, this paper assumes that the relatedness between the target problem and the source problems does not change. Thus, the weights are the same for different quantiles and only need to be solved once. Therefore, the training of the IBT-GBDT model is divided into two steps. In Step 1, the weights are solved, and the quantile regression model with τ = 0.5 is trained. In Step 2, the weights from Step 1 are assigned to the corresponding datasets, and the quantile regressions with τ = 0.01 ∼ 0.49 and 0.51 ∼ 0.99 are solved. In both steps, the inner GBDT model is trained using the algorithm described in Section 2.1.2. The structure of the proposed IBT-GBDT model is depicted in Figure 1.
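At the core of this architecture is one enlarged training set in which every sample carries the weight of its zone. A minimal sketch of assembling it (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def build_weighted_training_set(base, aux_sets, w_target, w_aux):
    """Stack the base and auxiliary samples and attach per-sample weights.

    base     : (X, y) of the target problem
    aux_sets : list of (X, y), one per source problem
    w_target : weight of the base training set
    w_aux    : list of weights w(m), one per source problem
    """
    X_base, y_base = base
    Xs, ys = [X_base], [y_base]
    ws = [np.full(len(y_base), w_target, dtype=float)]
    for (X_m, y_m), w_m in zip(aux_sets, w_aux):
        Xs.append(X_m)
        ys.append(y_m)
        ws.append(np.full(len(y_m), w_m, dtype=float))
    return np.vstack(Xs), np.concatenate(ys), np.concatenate(ws)

# Toy shapes: 4 base samples and two source problems with 3 and 5 samples
base = (np.zeros((4, 2)), np.zeros(4))
aux = [(np.ones((3, 2)), np.ones(3)), (np.ones((5, 2)), np.ones(5))]
X, y, w = build_weighted_training_set(base, aux, w_target=50.0, w_aux=[1.0, 0.4])
```

The resulting weight vector is what the inner GBDT layer consumes as the per-sample weights w_i of the weighted pinball loss.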

Derivation of the Weight Formula
The analysis in [27] shows that the error distribution of wind power generation forecasting is heavy-tailed. Thus, when solving the quantile regression model with τ = 0.5 in Step 1, we assume that the random variables δ^(m) + ε^(m) and ε^(target) are independent and follow Laplace distributions:

ε^(target) ~ Laplace(0, b_ε), (12)

δ^(m) + ε^(m) ~ Laplace(0, b^(m)_{δ+ε}), (13)

where b_ε and b^(m)_{δ+ε} represent the scale parameters of ε^(target) and δ^(m) + ε^(m), respectively. For the hypothesis set of prediction functions f_θ (θ is a parameter), the likelihood of the prediction function f_θ being the correct prediction function is:

L(θ) = Π_i (1 / (2 b_ε)) exp(−|y_i^(target) − f_θ(X_i^(target))| / b_ε) · Π_m Π_i (1 / (2 b^(m)_{δ+ε})) exp(−|y_i^(m) − f_θ(X_i^(m))| / b^(m)_{δ+ε}). (14)

Using the maximum likelihood method, the most likely value of the parameter θ maximizes L(θ). Taking the negative logarithm, the target optimization function for solving the most likely prediction function f_θ is:

min_θ Σ_i (1 / b_ε) |y_i^(target) − f_θ(X_i^(target))| + Σ_m Σ_i (1 / b^(m)_{δ+ε}) |y_i^(m) − f_θ(X_i^(m))|. (15)

Note that, for τ = 0.5, the pinball loss is proportional to the absolute error, so (15) is a weighted pinball loss. According to (15), the weight formulas for w^(target) and w^(m) are:

w^(target) = 1 / b_ε, (16)

w^(m) = 1 / b^(m)_{δ+ε}. (17)

For the weights, only their relative size matters. Thus, this paper normalizes the weights of the source problems w^(m) to the range 0-1:

w^(m) = min_{m'} b^(m')_{δ+ε} / b^(m)_{δ+ε}. (18)

The corresponding w^(target) becomes:

w^(target) = min_{m'} b^(m')_{δ+ε} / b_ε. (19)
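These formulas can be sketched numerically. The maximum-likelihood estimate of a zero-mean Laplace scale parameter is the mean absolute residual, and the normalization used here (the most related source problem is fixed at weight one, consistent with the 0-1 range described above) is an illustrative assumption:

```python
import numpy as np

def laplace_scale(residuals):
    # Maximum-likelihood estimate of the Laplace scale parameter b
    return float(np.mean(np.abs(residuals)))

def normalized_source_weights(b_list):
    # w(m) proportional to 1/b(m), scaled so the largest weight equals 1
    b = np.asarray(b_list, dtype=float)
    return b.min() / b

# Three source problems with increasingly large residuals
b = [laplace_scale(np.array([0.05, -0.05, 0.05, -0.05])),  # closely related
     laplace_scale(np.array([0.1, -0.1, 0.1, -0.1])),
     laplace_scale(np.array([0.2, -0.2, 0.2, -0.2]))]      # weakly related
w = normalized_source_weights(b)
```

The smallest error scale yields the largest weight, so closely related source problems contribute most to the weighted pinball loss.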

Iterative Weight Assignment Algorithm
According to (18) and (19), the calculation of the weights depends on the error distributions, and the errors can only be calculated after the model is trained; however, the model can only be trained after the weights are set.
To resolve this circular dependency, an iterative algorithm is presented in this paper, in which w^(m) is solved via iteration, while w^(target) is a hyperparameter chosen in advance via cross-validation.
Because the relatedness between the source problems P^(m) and the target problem P^(target) is initially unknown, all source problems are assigned the same weight (according to (18), the initial value of every w^(m) is one). With w^(target) and w^(m) determined, the inner quantile regression model is trained using the gradient boosting decision trees described in Section 2.1.2. After the inner layer is trained, b^(m)_{δ+ε} (the scale parameter of the Laplace error distribution) is calculated for each source problem. The weights w^(m) are then updated according to the corresponding b^(m)_{δ+ε} based on (18). With the updated w^(m), the next iteration begins. Finally, the outer iteration stops when every w^(m) converges (i.e., Equation (18) holds for all m simultaneously). The pseudocode of the weight assignment algorithm is described in Algorithm 1.
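Abstracting the inner training step as a function, the iteration can be sketched as follows; `train_median_model` is a hypothetical stand-in for training the inner τ = 0.5 GBDT layer and returning its predictions on each auxiliary set:

```python
import numpy as np

def solve_source_weights(train_median_model, aux_sets, w_target,
                         max_iter=20, tol=1e-3):
    """Illustrative sketch of the outer iteration (not the paper's code).

    train_median_model(w_target, w_aux) is assumed to train the inner
    tau = 0.5 model under the given weights and return a list of
    prediction arrays, one per auxiliary set.
    """
    w_aux = np.ones(len(aux_sets))                 # all sources start equal
    for _ in range(max_iter):
        preds = train_median_model(w_target, w_aux)
        # Laplace scale b(m) of each source's residuals, then renormalize
        b = np.array([np.mean(np.abs(y_m - p))
                      for (_, y_m), p in zip(aux_sets, preds)])
        w_new = b.min() / b
        if np.max(np.abs(w_new - w_aux)) < tol:    # convergence check
            break
        w_aux = w_new
    return w_aux

# Toy run: a dummy "model" that always predicts 2.0 for every sample
aux = [(None, np.array([1.0, 2.0, 3.0])), (None, np.array([1.0, 2.0, 5.0]))]
dummy = lambda w_t, w_a: [np.full(len(y), 2.0) for _, y in aux]
w = solve_source_weights(dummy, aux, w_target=50.0)
```

In the real model, each call to the inner training step is a full GBDT fit, which is why the number of outer iterations directly drives the training time reported later.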
Algorithm 1 Weight assignment via iteration and validation (pseudocode).

Data Specification
To examine the IBT-GBDT model proposed in this paper, the model was tested on a public dataset from the probabilistic wind power forecasting track of GEFCom2014 (the dataset is available in the Supplementary Material of [32]). The aim of this track was to forecast the normalized wind power generation in 10 zones, corresponding to 10 wind farms in Australia. The competition consisted of 15 tasks: the first three were trial tasks, and the last 12 were evaluation tasks. To mimic real-world forecasting processes, the tasks were designed in a rolling forecasting manner. In the first trial task, several months of data were provided as training data. With one month of incremental data released at a time, the aim of each task was to forecast the wind power generation of the next month. The provided input included a 10-m wind speed vector and a 100-m wind speed vector. The desired output was the probability distribution of wind power generation described by multiple quantiles. For each task, the quantile regression model was trained independently. The data periods of the corresponding tasks are listed in Table 1.

Benchmark Models
In GEFCom2014, GBDT was shown to be a very effective algorithm for probabilistic wind power forecasting, as it was used in the top two models. In this paper, three GBDT-based models were developed and chosen as the benchmark models. The first is DL-GBDT [35], the winner of the wind power forecasting track of GEFCom2014; the authors of this paper reproduced the DL-GBDT model with exactly the same structure, input features, and hyperparameters. The other two benchmark models are basic GBDT models developed to exhibit the effectiveness of the weight assignment algorithm: one is a GBDT model trained only on the base training data, and the other is a GBDT model trained on both the base training data and the auxiliary training data.

Feature Selection
The input data provided were hourly wind forecasts of the zonal and meridional wind components at 10 m and 100 m for ten separate (but correlated) wind zones. Based on the model input, the wind speed (WS), wind direction (WD), and wind energy (WE) are defined as follows, where u and v are the provided zonal and meridional wind components:

WS = sqrt(u^2 + v^2), WD = (180/π) · arctan2(v, u), WE = WS^3. (20)
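A sketch of this feature computation follows. WS and the cubic WE follow the standard wind-power relation; the exact direction convention (here atan2-based, in degrees) is an assumption rather than a quotation from the paper:

```python
import numpy as np

def wind_features(u, v):
    """Derive WS, WD, and WE from the provided wind components u and v."""
    ws = np.sqrt(u ** 2 + v ** 2)          # wind speed
    wd = np.degrees(np.arctan2(v, u))      # wind direction in degrees (assumed convention)
    we = ws ** 3                           # wind energy (cubic wind-power relation)
    return ws, wd, we

# Example: u = 3, v = 4 gives WS = 5 and WE = 125
ws, wd, we = wind_features(np.array([3.0]), np.array([4.0]))
```

The same three features can be computed at both the 10-m and 100-m levels, since both component pairs are provided.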
In DL-GBDT, features are derived from the provided model input by feature engineering, and different combinations of input features were tested by cross-validation. As a result, the input features of the DL-GBDT model are very effective for the quantile model fit. Therefore, the proposed IBT-GBDT model uses the same input features as the DL-GBDT model, except that the feature "hour" was added. The corresponding input features for the forecasting models of the different zones are listed in Table 2.

Error Measure
Sharpness and reliability are two key measures for probabilistic forecasting. According to [32], the wind power quantile forecasting score (QS) is defined as a comprehensive evaluation of sharpness and reliability: the pinball loss averaged over all samples and all target percentiles,

QS = (1/99) Σ_{j=1}^{99} (1/N) Σ_{i=1}^{N} l_{τ_j}(y_i, f_{τ_j}(X_i)). (21)

For the convenience of comparison, the target percentiles were chosen as τ = 0.01, 0.02, . . . , 0.99, the same as in GEFCom2014.
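A direct sketch of the QS computation over the 99 target percentiles (illustrative, with synthetic data):

```python
import numpy as np

def quantile_score(y, quantile_preds, taus):
    """QS: pinball loss averaged over all samples and target percentiles.

    quantile_preds[j] holds the forecast series for percentile taus[j].
    """
    losses = []
    for tau, q in zip(taus, quantile_preds):
        diff = y - q
        losses.append(np.mean(np.where(diff >= 0, tau * diff,
                                       (tau - 1) * diff)))
    return float(np.mean(losses))

taus = np.linspace(0.01, 0.99, 99)         # tau = 0.01, 0.02, ..., 0.99
y = np.array([0.5, 0.7])
perfect = [y.copy() for _ in taus]         # a perfect quantile forecast
qs = quantile_score(y, perfect, taus)      # 0.0 for a perfect forecast
```

A lower QS is better, and a perfect (degenerate) forecast attains zero, which makes QS a convenient single number for comparing the models below.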

Illustration of Training Process
To illustrate the training process of the IBT-GBDT model, wind power generation quantile forecasting for Zone 7 in the fourth evaluation task was chosen as the target problem.The auxiliary training sets were the data of wind power generation in zones other than Zone 7.
The cross-validation results showed that the best value of the hyperparameter w^(target) was 50. Thus, 50 was chosen as the weight of the dataset from Zone 7 (the target zone). The weights of the other zones were initialized to one. Then, the quantile regression model with τ = 0.5 was trained according to Step 1 of Algorithm 1. The convergence process of the weights is presented in Table 5, which shows that the weights converged after about seven iterations. According to Table 5, the weight of Zone 8 always had the value of one, which means that Zone 8 had the highest relatedness to Zone 7 (the same result was also given in [33]).
For the trained quantile regression model with τ = 0.5, the pinball losses on the base training set and the auxiliary training sets were calculated and are shown in Table 6. Compared to the training loss after the first iteration, the training losses for the zones other than Zone 7 and Zone 8 increased after the second iteration. The reason is that the weights of these training sets dropped after the first iteration; thus, the inner GBDT algorithm fit these training sets less closely in the second iteration. After the weights were determined, Step 2 of Algorithm 1 was executed, and the wind power quantiles were forecast for the test set of Zone 7. Line plots of different prediction intervals, the measurements, and the forecast median during the first 48 h are drawn in Figure 2.

The Relationship between Forecasting Error and w (target)
As stated above, w (target) is a hyperparameter.To illustrate the effect of w (target) , for Zone 7, the relationship between QS and the hyperparameter w (target) is drawn in Figure 3.
As w^(target) increases from 2 to 500, Figure 3 shows that the QS first decreases and then increases. The reason is as follows. If w^(target) is too small, the model resembles a common model for all the zones, and negative transfer occurs. If w^(target) is too large, the model resembles a model trained only on the data of the target zone, and no positive transfer is involved. When w^(target) is given a suitable value, positive transfer is strengthened while negative transfer is prevented; thus, the QS achieves its best result.

Analysis on Model Reliability
For any probabilistic forecasting method, reliability is seen as a primary requirement [47]. For the IBT-GBDT model and the benchmark models, the average proportion deviations between the nominal and empirical proportions [47], which measure reliability, are depicted in Figure 4, according to which the effectiveness of these models is analyzed. For all of these GBDT-based methods, the quantiles were slightly overestimated for proportions lower than 0.5 and slightly underestimated for proportions above that value, which indicates that the corresponding predictive distributions were slightly too narrow. Among them, the worst-performing model was the GBDT model trained with both the base training data and the auxiliary training data; negative transfer should be the main cause. The second worst-performing model was the GBDT model trained with only the base training data. Compared with the IBT-GBDT model, these two basic GBDT models did not perform well when the target percentile τ was between 0.80 and 0.95. For those quantiles, the distribution of training data was sparser, and a complicated model could easily overfit the training data, whereas for IBT-GBDT, with more training data, the over-fitting was suppressed.
To further evaluate the reliability of the probabilistic forecasts, the average coverage error (ACE) is introduced, which is defined as:

ACE = | (1/N) Σ_{i=1}^{N} C_i − PINC |, (22)

where C_i is the indicator function of whether y_i is covered by the prediction interval, and PINC is the prediction interval nominal confidence.
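A minimal sketch of the ACE computation (hypothetical data, central interval):

```python
import numpy as np

def average_coverage_error(y, lower, upper, pinc):
    # C_i = 1 when y_i falls inside the prediction interval
    covered = (y >= lower) & (y <= upper)
    return float(abs(np.mean(covered) - pinc))

# 3 of 4 observations fall inside a 90% nominal interval: ACE = |0.75 - 0.90|
y = np.array([0.2, 0.5, 0.9, 0.4])
lo = np.full(4, 0.1)
hi = np.full(4, 0.8)
ace = average_coverage_error(y, lo, hi, pinc=0.90)
```

An ACE of zero means the empirical coverage matches the nominal confidence exactly; for a 90% PINC, the interval bounds would be the 0.05 and 0.95 forecast quantiles.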
For any probabilistic forecasting model, the reliability may not be the same across different forecast horizons. Thus, the proposed model and the benchmark models were tested on different forecast horizons. Measured by the ACE (the chosen prediction interval nominal confidence (PINC) is 90%), a comparison of the reliability across different forecast horizons is shown in Table 7. According to the results, the ACE remained within a narrow range over all forecasting horizons, which indicates that the proposed IBT-GBDT model achieved rather high reliability in short-term probabilistic forecasting.

Comparison of Forecasting Error
In this paper, the forecasting errors were measured by the quantile score (QS) defined in (21). By definition, the QS of a model is the average of the pinball loss over all target percentiles. However, different probabilistic wind power forecasting methods perform unevenly across target percentiles; thus, their pinball losses at several selected percentiles are listed and compared in Table 8. Table 8 shows that the improvement of the IBT-GBDT model over the DL-GBDT model is larger when the target percentile τ is far from the central percentile. The reason is as follows. Around the edges of the distribution of wind power generation, the training data were relatively scarce and had higher volatility. Furthermore, according to the definition of the pinball loss, these high-volatility data had higher weights when the target percentile τ was away from the central percentile. Those training samples with high volatility caused the quantile regression model to fluctuate more, which led to lower predictive accuracy. In the proposed IBT-GBDT model, the volatility and variation of the training data were reduced because more training data were available, whereas for the DL-GBDT model, the shortage of training data and the resulting high volatility remained. This explains why the proposed IBT-GBDT model was more effective when the target percentile τ was away from the central percentile.
In addition, according to bias-variance theory, a complicated model will easily overfit a small dataset, but a complicated model outperforms a simple model when more data are available. For a tree-based model, the maximum depth D_max reflects the complexity of the model. In Table 3, the validation results show that the optimal value of D_max in the IBT-GBDT model is larger than that of the other benchmark models. This means that, with more training data available, the training dataset was large enough for the higher-complexity IBT-GBDT model to gain the upper hand, which led to the improvement in prediction accuracy.
To further test the applicability of the proposed IBT-GBDT model, it was then applied to the zones other than Zone 7. For the different zones, the QS of the IBT-GBDT model was calculated and compared with the QS of the benchmark models in Table 9. According to Table 9, the QS improvement over DL-GBDT achieved by IBT-GBDT ranged from 0.54% to 2.40% across the zones, with an average improvement of 1.46%. Furthermore, for the different tasks, the QS of the IBT-GBDT model was calculated and compared with the generalized additive tree model (GAT) [33], the linear regression model (LR) [34], and the three benchmark models. The QS values of these models (the QS of the DL-GBDT, GAT, and LR models came from the provisional leaderboard of GEFCom2014, which can be downloaded from the Supplementary Material of [32]) for the different tasks are recorded in Table 10. One advantage of transfer learning is that it performs well when the training set is small; thus, it is reasonable to infer that IBT-GBDT would be more effective than the benchmark models as the training set becomes smaller. To illustrate this effect, the forecasting errors of the models trained with a reduced base training set (5%, 10%, 20%, and 50% of the samples of the original base training data) were calculated for both the IBT-GBDT model and the three benchmark models, and their QS values are compared in Table 11. According to Table 11, the maximum improvement over DL-GBDT is 4.74%. The forecasting error of IBT-GBDT was always smaller than that of the other GBDT-based benchmark models, especially when the training set of the target problem was small. For the different sample percentages of base training data, the weights of the different zones are recorded in Table 12, which shows that the weights are almost the same across the different sample percentages of the base training set. This result agrees with the above analysis: the weights of the source problems depend only on their relatedness to the target problem.

The Relatedness between Different Zones
Based on the converged weights, the relatedness between the different zones is analyzed. One characteristic of the proposed IBT-GBDT model is that the converged weights vary across tasks. Thus, the converged weights were averaged over all tasks to form a correlation matrix (the diagonal elements were set to one). In Figure 5, a heat map is drawn to present the correlation. According to Figure 5, Zone 7 and Zone 8 are closely cross-correlated, which agrees with the analysis of the correlation between zones conducted in [35].

The Comparison of Computational Time
The training process of these GBDT-based models is very time consuming. Since the proposed IBT-GBDT model adds extra structure onto the GBDT model, it is necessary to compare the computational time. In this paper, the GBDT algorithm was built on the R GBM package [48].
The training times (the tests were performed on an Intel(R) Core(TM) i7-4790 CPU with 8 GB RAM, running a 64-bit Windows 10 operating system) of the IBT-GBDT model and the three benchmark models are compared in Table 13. According to Table 13, the DL-GBDT model and the GBDT model have similar training times. However, the training time of the IBT-GBDT model was about three times that of the basic GBDT model, mainly because more training data were involved in the training process.

Figure 1. The structure of the proposed IBT-GBDT model.

1: Choose w^(target) via cross-validation
2: Initialize every w^(m) to 1
3: repeat
4:   Complete the training of the inner quantile GBDT layer with τ = 0.5 according to the current weights w^(m)
5:   Calculate b^(m)_{δ+ε} for every source problem
6:   Calculate w^(m) according to (18)
7: until every w^(m) converges
8: Complete the training of the inner quantile GBDT layer with τ = 0.01 ∼ 0.49 and 0.51 ∼ 0.99 based on the converged weights w^(m)

Figure 5. The correlation between different zones.

Table 1. Data periods of the scoring tasks.

Table 4 presents the optimal value of w^(target) in the proposed IBT-GBDT model for the different zones.

Table 4. w^(target) settings for the IBT-GBDT model.

Table 5. Weights of the training sets from different zones after each iteration.

Table 6. Pinball loss after each iteration step.

Table 8. The pinball loss of the IBT-GBDT model and benchmark models.

Table 9. The QS improvement of the IBT-GBDT model over the benchmark models.

Table 10. Comparison of QS.

Table 11. QS of different models across different amounts of training data.

Table 12. Weights after convergence (different sample percentages of base training data).

Table 13. The comparison of the training times.