Article

Enhancing Short-Term Wind Energy Forecasting with XGBoost and Conformal Prediction for Robust Uncertainty Quantification

by
Rabelani Innocent Nthangeni
,
Caston Sigauke
*,
Thakhani Ravele
and
Thinawanga Hangwani Tshisikhawe
Department of Mathematical and Computational Sciences, University of Venda, Thohoyandou 0950, South Africa
*
Author to whom correspondence should be addressed.
Computation 2026, 14(3), 56; https://doi.org/10.3390/computation14030056
Submission received: 22 January 2026 / Revised: 19 February 2026 / Accepted: 27 February 2026 / Published: 1 March 2026
(This article belongs to the Section Computational Engineering)

Abstract

This paper presents probabilistic wind energy forecasting using quantile regression averaging combined with a conformal prediction modelling framework. The study uses data from Eskom, South Africa’s power utility company, covering April 2019 to November 2023. A partial linear additive quantile regression (PLAQR) averaging method is used to combine forecasts from two competing forecasting models: eXtreme Gradient Boosting (XGBoost) and Principal Component Regression (PCR). To compare the predictive abilities of the models, two data splits are used for training, validation and testing: 80%, 10% and 10% for the first set, and 85%, 10% and 5% for the second set. Empirical results suggest that the combined predictions from PLAQR outperform the individual models, significantly improving calibration and accuracy. The proposed combination has the smallest root mean square error (RMSE) and the highest probability of change in direction (POCID); it captures nonlinearities and produces well-calibrated probabilistic forecasts, as validated by probability integral transform histograms. The additional gain observed with the larger training split reflects the importance of data volume. These findings reinforce that the PLAQR model, which combines the benefits of tree-based approaches and linear models, is a robust modelling approach for reliable renewable energy forecasting. Future research directions should consider more varied ensembles.

1. Introduction

1.1. Research Motivation

The need for accurate probabilistic wind energy forecasting is not an abstract concept but an imperative for South Africa’s economy. As the country moves towards a diversified energy mix, the inherent variability of wind-based electricity introduces substantial uncertainty into the power system. Deterministic forecasting methods, which provide only a single point estimate of the electricity that wind turbines will generate, are not sufficient on their own. Not only do they fail to convey the range of possible generation outcomes, but they can also result in substantial economic losses, as grid operators must keep costly fossil-fuel reserves on standby or risk curtailing wind generation when output deviates from the forecast.

1.2. Literature Review

Wind energy is one of the cornerstones of renewable energy production worldwide [1]. Nevertheless, the unpredictable nature of wind speed, influenced by spatial and temporal variability, has been a major challenge for power grid management [2]. As a result, short-term wind energy forecasting, which aims to predict wind energy production from minutes to days in advance [3], has become a crucial task for ensuring the stability of power grids, minimising costs, and optimising the use of renewable energy resources [4,5]. Despite the development of sophisticated forecasting models, wind energy forecasts remain error-prone owing to the chaotic nature of wind [2]. This paper aims to address these challenges by developing a short-term wind energy prediction model using the XGBoost algorithm and comparing it with the Principal Component Regression method. Furthermore, it aims to improve the model by incorporating conformal prediction. To situate the present work within the existing body of research, this literature review highlights the development of wind energy forecasting techniques both chronologically and topically.

1.2.1. Evolution of Forecasting Methods: From Statistics to Machine Learning

In the early stages of wind forecasting, statistical and time series models were used. Methods such as the Autoregressive Integrated Moving Average (ARIMA) model and the Persistence Model (PM) were popular owing to their simplicity [6]. Nevertheless, these classical models are based on linearity and stationarity, which makes it difficult for them to represent the nonlinear relationships between wind power and meteorological variables [7]. This inherent drawback led to the use of more sophisticated models.
The emergence of machine learning (ML) algorithms brought a major shift in both theme and timeline. The development of Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs) demonstrated greater ability to model nonlinear processes [8]. Recent studies have shown that ML algorithms, including Random Forest (RF), XGBoost, K-Nearest Neighbour (KNN), and Multi-Layer Perceptron (MLP), have been able to produce more accurate results than their statistical counterparts [4]. For example, gradient boosting machines (GBMs) achieved a normalised mean absolute error (NMAE) of 5.15% in short-term forecasting, outperforming traditional methods [9]. RF and GBM algorithms have further pushed the boundaries of forecasting by allowing multiple learners to work together to produce more accurate predictions [10].

1.2.2. XGBoost and Hyperparameter Optimisation in Wind Forecasting

Under the ML framework, XGBoost has proven to be a highly effective algorithm because of its efficiency, scalability, and regularisation properties [11]. Its use in wind energy forecasting has been well established. For instance, ref. [12] developed a Bayesian-optimised XGBoost (BO-XGBoost) model that performed better than SVMs, Kernel Extreme Learning Machine (KELM), and Long Short-Term Memory (LSTM) models under different testing scenarios, including adverse weather conditions [13].
The performance of XGBoost relies heavily on hyperparameter optimisation. Bayesian optimisation is highly effective, consistently outperforming grid search by exploring the complex hyperparameter space more efficiently [12]. Optimisation techniques have also been coupled with feature engineering; ref. [5] coupled XGBoost with LSTM and technical analysis tools such as MACD, resulting in a highly accurate normalised mean absolute error of 0.0396. Although deep learning architectures such as CNN-GRU may occasionally outperform XGBoost, it remains a highly competitive and reliable option for day-ahead forecasting problems [14]. Hybrid models continue to break new ground in performance, with techniques such as Boost-LR (combining XGBoost, CatBoost, and RF) resulting in substantial reductions in error [15].

1.2.3. Uncertainty Quantification and the Role of PCR

The literature indicates a critical gap in quantifying prediction uncertainty. Most studies, including those mentioned above, focus on point forecast accuracy (e.g., MAE, RMSE) and provide little information about forecast uncertainty. This is a major drawback for grid managers who require risk assessment and informed decision-making under uncertainty. Although [2] investigated probabilistic forecasting using ensemble approaches, and statistical tests such as the Diebold–Mariano test exist for model comparison, a comprehensive framework for building prediction intervals with statistical guarantees has yet to be explored [14].
Conformal prediction, proposed by [16], provides a remedy by offering valid prediction intervals with a prescribed confidence level [17]. Although its integration with ML models has demonstrated potential for quantifying uncertainty measures in other applications [18], its extension to wind energy forecasting has yet to be explored. A detailed discussion of conformal prediction is given in [19].
Moreover, there is a significant gap in relation to Principal Component Regression (PCR). Although PCR combines PCA and linear regression to address multicollinearity, a problem often encountered in meteorological data, it is rarely reported in the literature for wind energy estimation. There are no comparisons of its performance with the latest ML approaches.

1.2.4. Summary of the Literature and Research Gap

While the literature clearly shows an evolution from statistical models to sophisticated machine learning methods like XGBoost, which has been shown to improve point forecast accuracy considerably, there has been a lack of research on uncertainty quantification. Moreover, there has been a lack of research on the potential of simple yet effective techniques such as PCR. This research aims to bridge this gap by developing a framework that combines the power of XGBoost with the accuracy of conformal prediction to provide not only point forecasts but also prediction intervals, serving as a benchmark for techniques like PCR. This will address the first and foremost requirement of modern power management systems. Table 1 summarises the key studies discussed, highlighting their methodologies, focus, and limitations.

1.3. Contributions and Research Highlights

The key innovation in this work lies in the combination of tree-based and linear modelling methods, using Partial Linear Additive Quantile Regression (PLAQR), which is theoretically grounded in the unique properties of wind energy time-series data. This is justified as follows:
  • The time series of wind energy data has two essential characteristics that underpin our hybrid model: (i) nonlinear transitions between regimes driven by atmospheric stability constraints, which are addressed by the tree-based model, and (ii) linear trend components during stable atmospheric regimes, which are addressed by the linear component of PLAQR. Our hybrid model is not a result of heuristics, but rather a consequence of the physical insight that wind energy production operates on multiple scales: nonlinear atmospheric processes control transitions between regimes, while linear relationships dominate during stable operation. PLAQR captures this hierarchical structure by permitting tree-based models to divide the feature space into regions where linear relationships hold.
  • The stochastic process of wind generation, which is heteroscedastic and non-stationary, makes it difficult to apply conventional parametric uncertainty analysis. Conformal prediction is especially useful in this case because it is a distribution-free method for uncertainty analysis that holds under the actual data-generating process, which is unknown. In the context of wind energy prediction, where the distribution varies with weather and seasonal patterns, this is especially useful because it ensures that the prediction intervals have the correct coverage regardless of the true error distribution.
  • Instead of offering a heuristic combination, PLAQR provides calibrated probabilistic predictions via the theoretical guarantee of finite-sample coverage provided by the conformal framework for prediction sets. The incorporation of tree-based nonlinearities improves point prediction accuracy (lower RMSE) and directional correctness (higher POCID), while preserving valid estimates of uncertainty, as verified by Probability Integral Transform (PIT) histograms. This tackles the inherent trade-off between sharpness and calibration in probabilistic forecasting.
  • The improvement in performance with an increase in the amount of training data from 80% to 85% is not only empirical but also reflects the consistency properties of both the ensemble technique and conformal prediction. As data volumes increase, tree-based techniques will be able to distinguish between smaller regimes. On the other hand, the non-asymptotic properties of the conformal framework will be more refined.
  • The proposed modelling framework offers a template for forecasting renewable energy, addressing the key challenge of producing point forecasts and uncertainty measures for a non-standard data-generating process. The underlying principles of the approach, including regime-based hybrid modelling and distribution-free uncertainty quantification, can be applied to other areas of renewable energies.
The remainder of this paper is structured as follows. Section 2 introduces the models, Section 3 presents the empirical results, and Section 4 provides a detailed discussion of these findings. Finally, Section 5 offers concluding remarks.

2. Models

The modelling framework proposed in this study is given in Figure 1. The wind energy prediction system uses the eXtreme Gradient Boosting (XGBoost) model with conformal prediction. A comparative analysis is done with Principal Component Regression (PCR). After preparing the data with wind energy as the target variable, the model selection step is where the flowchart branches out. In the XGBoost part of the flowchart, the model uses 80–10–10 and 85–10–5 data splits for training, validation and testing sets, respectively. The training is done with parameters such as max_depth = 6 and η = 0.1, uses early stopping, and computes feature importance before performing conformal prediction with α = 0.05. In the PCR part of the flowchart, the model applies the same techniques, uses leave-one-out cross-validation, and finally selects the optimal components. The models then generate forecasts, compute MAE, RMSE, PICP, and MPIW, and produce visualisations.

2.1. eXtreme Gradient Boosting

Tianqi Chen and Carlos Guestrin introduced XGBoost in 2016 [22]. It builds on gradient boosting [23]. XGBoost is a highly scalable and efficient gradient-boosting algorithm [24]. Gradient boosting is an ensemble learning technique that sequentially constructs multiple decision trees. Each new tree is trained to predict the errors (residuals) made by the previous trees, enabling iterative improvements in overall prediction accuracy. This process results in a strong predictive model.
The key principles of gradient boosting are as follows:

2.1.1. Additive Learning

Additive learning in XGBoost is a boosting ensemble technique where the predictive model is built iteratively. This process involves sequentially adding new decision trees to an already trained ensemble. In XGBoost ensemble methods, additive learning builds the final prediction model gradually. It starts with an initial model, and in each iteration, a new weak learner f_m(x) is trained to address the errors of the ensemble F_{m-1}(x). This weak learner is then added to the ensemble, thereby improving the overall model. A mathematical representation of this is given in Equation (1).
F_m(x) = F_{m-1}(x) + f_m(x)    (1)

2.1.2. Loss Function

The loss function quantifies the discrepancy between the predicted values and the actual values. XGBoost can handle a wide range of loss functions, depending on the problem being tackled. The loss function used in regression problems is the Mean Squared Error, given in Equation (2).
L(y_i, \hat{y}_i) = (y_i - \hat{y}_i)^2,    (2)
where y_i is the actual value and \hat{y}_i is the predicted value.

2.1.3. Regularisation

Regularisation is a crucial set of techniques to prevent the model from overfitting the training data. XGBoost incorporates a regularisation term, Ω(f_j), into its overall objective function Θ. The objective function aims to minimise both the prediction error (measured by the loss function L(y_i, \hat{y}_i)) and the model complexity (measured by the regularisation term). The objective function is as follows:
Θ = \sum_{i=1}^{M} L(y_i, \hat{y}_i) + \sum_{j=1}^{J} Ω(f_j)
The specific form of the regularisation term used here is
Ω(f_j) = γT + \frac{1}{2} λ ||w||_2^2,
where T is the number of leaf nodes and w is the vector of leaf weights, which together specify the complexity of the tree; γ and λ are the parameters controlling complexity. Larger values impose a heavier penalty on complexity and therefore favour simpler tree structures.
By adding this regularisation term to the objective function, XGBoost does not just minimise error on the training data; it also builds an inherently simpler, more generalisable model. Rather than fitting the entire model at once, it is optimised iteratively. We begin with an initial prediction \hat{y}_i^{(0)} = 0. At each step, we add a new tree to enhance the model. The updated prediction after adding the t-th tree can be expressed as follows:
\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)
In decision trees, boosting is used during the model’s training to minimise the objective function. This technique involves iteratively adding a new function f to the existing model. Therefore, in the t-th iteration, a new function is added as follows:
Θ^{(t)} = \sum_{i=1}^{M} L(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)) + Ω(f_t)
The algorithm can also handle missing data and make precise decisions on where to split data based on gains. Further, XGBoost relies on post-pruning for improving efficiency. It is well-known for its scalability and flexibility, making it one of the most favourable algorithms for handling big data. Though XGBoost has many advantages over other learning algorithms, its parameters must be carefully tuned for better performance [20].
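As a concrete illustration of this training loop, the sketch below uses the Python xgboost package with the squared-error objective of Equation (2), the max_depth = 6 and η = 0.1 settings described in Section 2, and early stopping on a validation set. The arrays, feature count and round limit are illustrative placeholders rather than the study’s actual configuration.

```python
import numpy as np
import xgboost as xgb

# Illustrative data; in the study the columns would be the lagged wind-energy
# features (difLag1, difLag2, difLag12, difLag24, Hour, Day, noltrend).
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(1000, 7)), rng.normal(size=1000)
X_val, y_val = rng.normal(size=(200, 7)), rng.normal(size=200)

params = {
    "objective": "reg:squarederror",  # squared-error loss, Equation (2)
    "eta": 0.1,                       # learning rate
    "max_depth": 6,                   # maximum tree depth
    "lambda": 1.0,                    # L2 penalty on leaf weights (regularisation term)
    "gamma": 0.0,                     # minimum loss reduction required to split
}
dtrain = xgb.DMatrix(X_train, label=y_train)
dval = xgb.DMatrix(X_val, label=y_val)

# Early stopping monitors validation RMSE and retains the best boosting round.
booster = xgb.train(
    params,
    dtrain,
    num_boost_round=500,
    evals=[(dval, "validation")],
    early_stopping_rounds=20,
    verbose_eval=False,
)
print("best number of rounds:", booster.best_iteration + 1)
```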

2.2. Principal Component Regression

Principal Component Regression (PCR) is used as a benchmark model in this study. It is a dimension-reduction technique useful when multicollinearity exists among explanatory variables in the multiple regression framework. A standard multiple linear regression model is defined as:
Y = Xβ + ε,
where Y represents the vector of observed values, X denotes a matrix of explanatory variables, β represents the parameter vector, and ε is the vector of error terms. The least squares estimator of the parameter vector is expressed as follows:
\hat{β} = (X^T X)^{-1} X^T Y
The challenge is that X^T X may at times be singular, owing either to multicollinearity or to the number of variables exceeding the number of sample observations. PCR addresses this issue by transforming the original matrix X into a lower-dimensional orthogonal space using Singular Value Decomposition (SVD).
To get our first m principal components, we use SVD to approximate the X matrix:
X = \tilde{X}^{(m)} + ε_X = (U^{(m)} D^{(m)}) V^{(m)T} + ε_X = T^{(m)} P^{(m)T} + ε_X,
where T represents orthogonal scores, while P denotes loadings. Both U and V are orthonormal and the matrix D is diagonal with positive real entries. Consequently, regressing Y on the scores results in:
\hat{β} = P (T^T T)^{-1} T^T Y
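A minimal sketch of PCR is shown below, assuming scikit-learn: the predictors are standardised, projected onto the first m principal components (the scores T), and the response is then regressed on those scores by ordinary least squares. The data and the choice m = 4 are purely illustrative; in the study the number of components is chosen by cross-validation (see Sections 3.3 and 3.4).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative, possibly collinear predictors and a response.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 7))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=500)   # induce multicollinearity
y = X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=500)

# PCR: standardise -> project onto m principal components -> least squares.
m = 4
pcr = make_pipeline(StandardScaler(), PCA(n_components=m), LinearRegression())
pcr.fit(X, y)
y_hat = pcr.predict(X)
```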

2.3. Quantile Regression

Quantile regression (QR) is a modelling framework for estimating conditional quantiles of the response variable and was developed by Koenker and Bassett [25]. If we let Y represent a random variable with corresponding covariates X, then the conditional quantile q_{Y|X}(τ), where τ ∈ (0, 1), is defined as q_{Y|X}(τ) = \inf\{y ∈ \mathbb{R} : F_{Y|X}(y) ≥ τ\}, where F_{Y|X} represents the conditional distribution of Y given X. The conditional quantile q_{Y|X}(τ) is a solution to
q_{Y|X}(τ) = \arg\min_{g} E[ρ_τ(Y - g(X)) | X],    (11)
where ρ_τ(·) is the pinball loss function defined as ρ_τ(u) = u(τ - I(u < 0)) and I(·) is an indicator function. Now, let Y_t = X_t β + ε_t be a linear quantile regression model where Y_t denotes wind energy, X_t is the design matrix, β is a vector of parameters and ε_t is the error term; then, the estimates of β are given as
\hat{β}_τ = \arg\min_{β ∈ \mathbb{R}^p} \sum_{t=1}^{n} ρ_τ(Y_t - X_t β).
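For reference, the pinball loss can be evaluated with a few lines of NumPy; this is only a sketch of the loss itself, with illustrative numbers, not the full quantile regression fit.

```python
import numpy as np

def pinball_loss(y, y_hat, tau):
    """Average pinball loss rho_tau(u) = u * (tau - I(u < 0)), with u = y - y_hat."""
    u = y - y_hat
    return np.mean(u * (tau - (u < 0).astype(float)))

# The tau = 0.5 loss is half the absolute error, so it is minimised at the median.
y = np.array([1.0, 2.0, 4.0])
print(pinball_loss(y, np.full(3, 2.0), tau=0.5))   # evaluated at the sample median
```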

2.4. Partial Linear Additive Quantile Regression Framework for Forecast Combination

To go beyond the linear combination of forecasts, we propose a partial linear additive quantile regression model. The proposed model enables us to combine predictions from XGBoost and PCR models. The proposed model assumes that the true conditional quantile can be expressed as a linear combination of base forecasts plus a nonlinear adjustment term [26].
Let:
  • f t XGB be the point forecast from an XGBoost model at time t.
  • f t PCR be the point forecast from a PCR model at time t.
  • Y t be the actual realized value.
We define our combined forecast as the output of a partial linear additive quantile regression model. For a given quantile τ, the conditional quantile function is:
q_{Y|F}(τ | f_t^{XGB}, f_t^{PCR}) = β_0(τ) + β_1(τ) f_t^{XGB} + β_2(τ) f_t^{PCR} + g(f_t^{XGB}, f_t^{PCR}),
where β_0(τ) is the intercept; β_1(τ) and β_2(τ) are the linear weights for the two base forecasts, forming the linear component; and g(·,·) is a smooth, unknown function that captures the nonlinear interactions and residual patterns not accounted for by the linear combination. The parameters β_0, β_1, β_2 and the function g are estimated simultaneously for a given τ by minimising a regularised version of the quantile loss function:
\arg\min_{β_0, β_1, β_2, g} \sum_{t=1}^{T} ρ_τ(Y_t - [β_0 + β_1 f_t^{XGB} + β_2 f_t^{PCR} + g(f_t^{XGB}, f_t^{PCR})]) + λ · J(g),
where T is the number of time points in the training set; ρ_τ(·) is the pinball loss as defined in Equation (11); J(g) is a penalty term that enforces smoothness on the function g (e.g., the integral of the squared second derivatives); and λ is a smoothing parameter that controls the trade-off between fitting the data and the smoothness of g. This is an additive model estimated with a quantile loss objective.
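The sketch below is one way to approximate this combination in Python; it is not the exact PLAQR estimator used in the study. The linear part consists of the two base forecasts, the smooth term g(·,·) is approximated by additive B-spline expansions of each forecast (scikit-learn’s SplineTransformer), and the pinball-loss fit with a small L1 penalty (scikit-learn’s QuantileRegressor) stands in for the smoothness penalty λ·J(g). The data, knot counts and penalty value are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor
from sklearn.preprocessing import SplineTransformer

# Illustrative base forecasts; in the study these come from XGBoost and PCR.
rng = np.random.default_rng(2)
f_xgb = rng.uniform(0, 3000, size=400)
f_pcr = f_xgb + rng.normal(scale=150, size=400)
y = 0.6 * f_xgb + 0.4 * f_pcr + rng.normal(scale=100, size=400)

# Linear component: the two base forecasts. Nonlinear adjustment: additive
# B-spline expansions of each forecast, standing in for g(f_xgb, f_pcr).
base = np.column_stack([f_xgb, f_pcr])
splines = SplineTransformer(n_knots=5, degree=3).fit_transform(base)
design = np.column_stack([base, splines])

tau = 0.5                                  # repeat over a grid of quantiles if needed
combiner = QuantileRegressor(quantile=tau, alpha=1e-4, solver="highs")
combiner.fit(design, y)
combined_forecast = combiner.predict(design)
```

In practice the combiner would be fitted on validation-period forecasts and then applied to test-period forecasts, with a separate fit for each quantile τ of interest.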

2.5. Evaluation Metrics

Evaluation metrics are essential for determining the ability of forecasting methods to predict future time series data. In the subsequent sections, we discuss some performance evaluation metrics.

2.5.1. Root Mean Square Error

The root mean square error (RMSE) is the square root of the average squared difference between the predicted and actual values, indicating how well the model’s predictions fit the actual data. The RMSE is calculated using Equation (15).
RMSE = \sqrt{\frac{1}{m} \sum_{t=1}^{m} (y_t - \hat{y}_t)^2},    (15)
where y_t is the actual value, \hat{y}_t denotes the predicted value of the t-th observation and m represents the total number of predictions. A lower RMSE value indicates that the predictions are closer to the actual data points and hence better model performance.

2.5.2. Mean Absolute Error

The mean absolute error (MAE) is the average absolute difference between the predicted and actual values. The formula for MAE is given in Equation (16).
MAE = \frac{1}{m} \sum_{t=1}^{m} |y_t - \hat{y}_t|,    (16)
where y_t, \hat{y}_t and m are as defined in Equation (15). A lower MAE indicates that the model’s predictions are closer to the actual values.

2.5.3. Mean Absolute Scaled Error

The mean absolute scaled error (MASE) is a forecast evaluation metric which compares the forecast errors to those of a naive benchmark model. The MASE is defined by Equation (17).
MASE = \frac{\frac{1}{m} \sum_{t=1}^{m} |y_t - \hat{y}_t|}{\frac{1}{m-1} \sum_{t=2}^{m} |y_t - y_{t-1}|},    (17)
where y_t, \hat{y}_t and m are as defined in Equation (15), and y_{t-1} denotes the actual value of the (t-1)-th observation used by the naive benchmark. A low MASE value is desirable as it indicates better predictive performance.

2.5.4. Mean Bias Error

The mean bias error (MBE) measures the average bias in the predictions. It can be positive or negative, indicating whether the model overestimates or underestimates the actual values. The MBE is given by Equation (18).
MBE = \frac{1}{m} \sum_{t=1}^{m} (y_t - \hat{y}_t),    (18)
where y_t, \hat{y}_t and m are as defined in Equation (15).
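A compact NumPy sketch of these point-forecast metrics (Equations (15)–(18)) is given below; the MASE denominator uses the one-step naive forecast of the evaluation series, consistent with Equation (17).

```python
import numpy as np

def point_metrics(y, y_hat):
    """RMSE, MAE, MASE and MBE as in Equations (15)-(18)."""
    err = y - y_hat
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    naive_mae = np.mean(np.abs(np.diff(y)))   # one-step naive benchmark error
    mase = mae / naive_mae
    mbe = np.mean(err)
    return {"RMSE": rmse, "MAE": mae, "MASE": mase, "MBE": mbe}
```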

2.5.5. Prediction of Change in Direction

A major disadvantage of using evaluation metrics such as RMSE, MAE, MASE, and MBE is that they do not account for changes in forecast direction. This drawback is overcome by using the prediction of change in direction (POCID), which counts the number of correct direction changes [27]. The POCID is calculated using Equation (19).
POCID = \frac{100}{m} \sum_{t=1}^{m} D_t,    (19)
where m is the total number of forecasted periods and D_t is a directional indicator function for time period t, defined as:
D_t = 1 if (y_t - y_{t-1})(\hat{y}_t - \hat{y}_{t-1}) > 0, and D_t = 0 otherwise.
The condition (y_t - y_{t-1})(\hat{y}_t - \hat{y}_{t-1}) > 0 checks if the predicted and actual movements have the same sign. The actual change is denoted by y_t - y_{t-1}, while \hat{y}_t - \hat{y}_{t-1} is the predicted change. If the product is positive, both changes are in the same direction (both positive or both negative), and the prediction is counted as correct (D_t = 1). However, if the product is zero or negative, the directions did not match, and the prediction is counted as incorrect (D_t = 0).
A major drawback of POCID is that it does not consider the closeness of the predictions to the actual values. To address this, ref. [27] developed a fitness metric that combines POCID and MSE. In this study, we extend the fitness metric by using the RMSE together with a data-driven weight w. The modified fitness metric is given in Equation (20).
Fitness = \frac{POCID}{1 + w · RMSE},    (20)
where w is calculated as follows
w = \frac{1}{\frac{1}{k} \sum_{i=1}^{k} RMSE_i} = \frac{k}{\sum_{i=1}^{k} RMSE_i},
with k representing the number of models. Therefore
Fitness = \frac{POCID}{1 + \frac{k}{\sum_{i=1}^{k} RMSE_i} · RMSE_i}
Higher values of the fitness metric indicate better model performance in predicting fluctuations and greater prediction accuracy.
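The directional and fitness metrics can be computed as in the sketch below, with the data-driven weight w taken as the inverse of the average RMSE across the k competing models; the example values are illustrative only.

```python
import numpy as np

def pocid(y, y_hat):
    """Share of correctly predicted changes in direction (Equation (19)),
    computed over the consecutive pairs of observations."""
    return 100.0 * np.mean((np.diff(y) * np.diff(y_hat)) > 0)

def fitness(pocid_value, rmse, rmse_all_models):
    """Modified fitness metric of Equation (20), with w = k / sum(RMSE_i)."""
    w = len(rmse_all_models) / np.sum(rmse_all_models)
    return pocid_value / (1.0 + w * rmse)

# Three competing models with RMSEs around 180-190 give w close to the
# 0.006 weight reported in Section 3.6.
rmses = [179.0, 184.0, 189.0]
print(fitness(71.2, rmses[0], rmses))
```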

2.6. Conformal Prediction

Conformal Prediction (CP) is a modelling approach which produces prediction sets or prediction intervals with a specified coverage probability [19]. Given a user-specified error probability α, CP guarantees that the true target output Y_new for a new input X_new is included in the predicted set C(X_new) with probability at least 1 - α [28]. CP is model-agnostic and provides finite-sample guarantees under only the assumption of exchangeability.

2.6.1. Mathematical Framework

The most computationally efficient variant of CP is Split Conformal Prediction (SCP) [19]. This variant requires a pre-fitted model \hat{f}, trained on a proper training set, together with a separate calibration set.
Nonconformity Measure
A nonconformity function V(x, y) measures how unusual a data point is relative to the previously observed data. For regression, the absolute residual is a popular choice:
V(x, y) = |y - \hat{f}(x)|
Calibration
Given a calibration set J = {(x_1, y_1), …, (x_n, y_n)} of size n = n_cal:
  • Compute nonconformity scores for all points in J:
    s_i = V(x_i, y_i) = |y_i - \hat{f}(x_i)|,  i = 1, …, n
  • For a desired miscoverage rate α, calculate the critical quantile q from the empirical distribution of the scores:
    q = Quantile({s_1, …, s_n}; ⌈(n + 1)(1 - α)⌉ / n)
Prediction Interval
For a new test point x_new, the prediction interval is [19]:
C(x_new) = [\hat{f}(x_new) - q, \hat{f}(x_new) + q]
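A minimal NumPy sketch of split conformal prediction with absolute-residual scores is given below; `model` is assumed to be any pre-fitted regressor exposing a predict method (e.g., the XGBoost or PCR models above), and the quantile call assumes a recent NumPy that supports the `method` argument.

```python
import numpy as np

def split_conformal_interval(model, X_cal, y_cal, X_new, alpha=0.05):
    """Split conformal prediction: calibrate q on held-out residuals, then
    return the symmetric interval [f(x_new) - q, f(x_new) + q]."""
    scores = np.abs(y_cal - model.predict(X_cal))         # s_i = |y_i - f(x_i)|
    n = len(scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)  # finite-sample correction
    q = np.quantile(scores, level, method="higher")
    pred = model.predict(X_new)
    return pred - q, pred + q
```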

2.6.2. Coverage Guarantee

The fundamental guarantee of CP is marginal coverage. For a new exchangeable test point (X_new, Y_new) [19]:
P(Y_new ∈ C(X_new)) ≥ 1 - α
This probability holds over the randomness in both the calibration set and the new test point.
Proof. 
Consider the combined set of calibration scores {s_1, …, s_n} and the test score s_new = V(x_new, y_new). By exchangeability, s_new is equally likely to rank anywhere among these n + 1 scores. The probability that s_new is less than or equal to the (1 - α) empirical quantile of the calibration scores is therefore at least 1 - α. Since Y_new ∈ C(X_new) if and only if s_new ≤ q, the coverage guarantee holds for any finite n. □

2.7. Prediction Interval Evaluation Metrics

2.7.1. Prediction Interval Coverage Probability

The Prediction Interval Coverage Probability (PICP) evaluation metric calculates the percentage of actual observations that lie within their corresponding predicted intervals. It is the key measure of the validity or reliability of an interval [29]. The formula for PICP is given in Equation (22).
PICP = \frac{1}{m} \sum_{t=1}^{m} c_t,  where c_t = 1 if y_t ∈ [L_t, U_t] and c_t = 0 if y_t ∉ [L_t, U_t],    (22)
where y_t is the actual value of the t-th observation, [L_t, U_t] is the prediction interval with lower bound L_t and upper bound U_t for that observation, and m is the total number of predicted observations.

2.7.2. Mean Prediction Interval Width

The Mean Prediction Interval Width (MPIW) measures the average width of prediction intervals and quantifies their sharpness or precision. A narrower interval is more informative, provided it maintains valid coverage [29]. The formula for MPIW is given by:
MPIW = \frac{1}{m} \sum_{t=1}^{m} (U_t - L_t),
where U_t, L_t and m are as defined in Equation (22). A lower MPIW is desirable, but it should only be used to compare models that have already achieved a valid PICP.

2.7.3. Coverage Width-Based Criterion

The Coverage Width-based Criterion (CWC) is a score that penalises both low coverage and large interval widths. It provides a single value to be minimised [29]. One common formulation is:
CWC = MPIW · [1 + γ(PICP) · e^{-η(PICP - μ)}],  where γ(PICP) = 1 if PICP < μ and γ(PICP) = 0 if PICP ≥ μ,
where μ is the nominal coverage rate, γ(PICP) is an indicator function and η is a scaling parameter that controls how heavily under-coverage is penalised.
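The three interval metrics can be computed jointly as sketched below; the nominal coverage μ = 0.95 matches α = 0.05, while the penalty η = 50 is an illustrative choice rather than a value taken from the study.

```python
import numpy as np

def interval_metrics(y, lower, upper, mu=0.95, eta=50.0):
    """PICP, MPIW and CWC for a set of prediction intervals."""
    covered = (y >= lower) & (y <= upper)
    picp = np.mean(covered)
    mpiw = np.mean(upper - lower)
    gamma = 1.0 if picp < mu else 0.0          # penalise only under-coverage
    cwc = mpiw * (1.0 + gamma * np.exp(-eta * (picp - mu)))
    return {"PICP": picp, "MPIW": mpiw, "CWC": cwc}
```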

2.7.4. Probability Integral Transform Histogram

The Probability Integral Transform (PIT) is a graphical method for evaluating the probabilistic calibration of an entire predictive distribution, not just a single interval. It is calculated by applying the forecasted Cumulative Distribution Function (CDF) to its corresponding actual observation [30]. The CDF is given in Equation (25).
z_t = F_t(y_t),    (25)
where y_t is the actual value of the t-th observation and F_t is the forecasted CDF for that observation. The resulting set of values {z_1, …, z_m} is then plotted as a histogram. For a perfectly calibrated forecast, the z_t values should be uniformly distributed on [0, 1], resulting in a flat histogram.
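When the predictive distribution is available only through conformal scores, an empirical PIT value can be obtained by ranking each test residual among the calibration residuals, as in the sketch below; this mirrors the conformal p-values used for Figures 10 and 11, though the exact construction in the study may differ.

```python
import numpy as np

def pit_values(calibration_residuals, test_residuals):
    """Empirical PIT values z_t from signed residuals: each test residual is
    ranked within the calibration residuals (a conformal p-value); a roughly
    flat histogram of the returned values indicates good calibration."""
    cal = np.sort(np.asarray(calibration_residuals))
    ranks = np.searchsorted(cal, np.asarray(test_residuals), side="right")
    return (ranks + 1.0) / (cal.size + 1.0)
```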

2.7.5. Diebold–Mariano Test

In addition to computing forecast accuracy measures, it may be necessary to test whether the differences in forecast accuracy are statistically significant. For the purpose of checking the predictive accuracy of the rival models, the Diebold–Mariano test is applied in this research, as proposed by [31] and discussed in [32].
Let y_{t,τ} for t = 1, …, m be the observed wind energy values, and let \hat{y}_{i,t,τ} and \hat{y}_{j,t,τ} be two different forecasts obtained from models i and j, respectively, for i ≠ j and i, j = 1, 2, …, K. The forecast errors are defined as ε_{i,t,τ} = \hat{y}_{i,t,τ} - y_{t,τ} for each model. Let g(ε_{i,t,τ}) be a loss function of the forecast errors, for example
g(ε_{i,t,τ}) = e^{λ ε_{i,t,τ}} - 1 - λ ε_{i,t,τ}.
The loss differential series between the two forecasts is then constructed as [32]:
d_t = g(ε_{1,t,τ}) - g(ε_{2,t,τ})
The null hypothesis tested by the Diebold–Mariano test is that both forecasts have equal forecasting accuracy, which is expressed as H_0: E(d_t) = 0. In contrast, the alternative hypothesis is H_1: E(d_t) ≠ 0, which indicates a statistically significant difference in forecast performance.
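A simple implementation for one-step-ahead forecasts is sketched below, using the squared-error loss for g; for multi-step forecasts the variance of the loss differential should be replaced by a HAC (Newey–West) long-run estimate, and the exponential loss above can be substituted for the lambda function.

```python
import numpy as np
from scipy import stats

def diebold_mariano(e1, e2, loss=lambda e: e ** 2):
    """Diebold-Mariano test on the loss differential d_t = g(e1_t) - g(e2_t).
    Returns the DM statistic and a two-sided p-value (Student-t reference)."""
    d = loss(np.asarray(e1)) - loss(np.asarray(e2))
    m = d.size
    dm_stat = np.mean(d) / np.sqrt(np.var(d, ddof=1) / m)
    p_value = 2.0 * (1.0 - stats.t.cdf(abs(dm_stat), df=m - 1))
    return dm_stat, p_value
```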

3. Results

3.1. Exploratory Data Analysis

3.1.1. Data Source

This study uses wind energy data sourced from Eskom, South Africa’s power utility company. This data is freely available at https://www.eskom.co.za/dataportal/, accessed on 5 February 2025. The data is from 2 April 2019 to 28 November 2023.

3.1.2. Data Characteristics

The data will be split into two sets: (80% training, 10% validation, and 10% test) and (85% training, 10% validation, and 5% test). The response variable in this study is wind energy, for which we will make predictions. The explanatory variables in the dataset are as follows:
  • difLag1 is the first-order hourly difference of the wind energy produced time series: Y_t - Y_{t-1}
  • difLag2 is the second-order hourly difference of the wind energy produced time series: Y_t - Y_{t-2}
  • difLag12 is the difference from half a day prior: Y_t - Y_{t-12}
  • difLag24 is the daily seasonal difference: Y_t - Y_{t-24}
  • Hour represents the hour of the day (0 to 23).
  • Day represents the day of the week.
  • noltrend is the estimated nonlinear trend component of the wind energy produced time series. This component was extracted through seasonal and trend decomposition using Loess.
Figure 2 displays eight histograms that show different patterns across various variables. Day and Hour have uniform distributions, while difLag1, difLag2, difLag12 and difLag24 show bell-shaped distributions centred around zero, showing small lagged differences. In contrast, noltrend and wind exhibit right-skewed distributions, with more low values and fewer high values.

3.1.3. Summary Statistics

Table 2 shows the summary statistics for wind energy from 2 April 2019 until 27 November 2023. The data have a minimum wind energy of 19.8 MWh and a maximum of 3102.2 MWh for the specified period. The central tendency shows a median of 903.0 MWh and a mean of 982.5 MWh, which are close to each other, suggesting a moderately right-skewed distribution with the mean slightly higher than the median. This observation is supported by the skewness of 0.7557.
Furthermore, the kurtosis value of 3.3057 suggests a leptokurtic distribution, indicating heavier tails and a sharper peak than a normal distribution. This may reflect occasional high-production spikes or outliers in wind energy data. The interquartile range (IQR), calculated as the difference between the third quartile (Q3) of 1306.8 MWh and the first quartile (Q1) of 568.4 MWh, is 738.4 MWh, indicating a moderately wide middle 50% of the data.

3.2. Data Processing

3.2.1. Dataset Description

Figure 3 shows the time series plot of wind energy. There has been a steady increase in wind energy production over the years 2019 to 2023.

3.2.2. Missing Values

Figure 4 shows that the data has no missing values in any of the variables.

3.2.3. Relationship Between Variables

Figure 5 displays the Pearson correlation coefficients among various variables. Red indicates strong positive correlations, while white shows no linear relationship. Each feature is perfectly correlated with itself along the diagonal.
The analysis reveals strong multicollinearity among difLag variables, with difLag1 and difLag2 showing particularly high correlation of 0.87. Wind exhibits moderate positive relationships with difLag24 (0.52), difLag12 (0.53), and a weaker link with difLag2 (0.19). Additionally, noltrend correlates strongly with Wind (0.66) and moderately with difLag24 (0.41). In contrast, the Day and Hour features show weak correlations with most other variables, suggesting they may not be strongly linked to the rest of the dataset.

3.2.4. Variable Importance

A comparison of variable importance is presented in Table 3 for two different sizes of the training dataset (80% vs. 85%). The variable noltrend is the dominant predictor in both cases, with the highest importance value (0.66 in both models). Feature difLag12 is the second most important predictor, but its importance drops slightly, from 0.2309 to 0.2223, as the training dataset size increases. The relative order of most variables remains unchanged, indicating that the structure of variable importance is stable. There are some small changes in the less important variables, like Hour (from 0.0199 to 0.0302) and difLag24 (from 0.0605 to 0.0508). Day remains insignificant in both models.
Table 4 reports the optimal number of boosting rounds (nrounds) that minimises generalisation error and avoids overfitting for the two data-splitting strategies. For the 80% training, 10% validation, and 10% test split, the best results were obtained at nrounds = 143, achieving the lowest RMSE and MAE. For the 85% training, 10% validation, and 5% test split, the optimal value was nrounds = 152. In both cases, exceeding these values led to overfitting, emphasising the need for early stopping.
Figure 6, which is a zoomed sample of Figure A1 in Appendix A, shows that the XGBoost model delivers strong and consistent performance in both point predictions and uncertainty quantification throughout the entire dataset. The Conformal Prediction Intervals effectively bound the actual wind energy across all segments, confirming that they are well-calibrated and reliable.
Figure 7 shows the zoomed samples from Figure A2 in Appendix A, which indicate that the predicted values closely align with the actual production values.

3.3. Choosing Number of Components

When selecting the optimal number of components for Principal Component Regression (PCR), the one-sigma heuristic was used. This approach is often recommended in the literature as a way to balance model simplicity and predictive accuracy. According to [33], the one-sigma heuristic involves choosing the model with the fewest components that still has a prediction error within one standard error of the minimum error observed across all models. In essence, rather than selecting the model with the absolute lowest prediction error, this method prioritises a simpler model that achieves nearly the same level of performance, thereby minimising the risk of overfitting.

3.4. Selecting Number of Components

Figure 8 illustrates the procedure for selecting the optimal number of components using the RMSEP criterion for both the 80% and 85% training sets. The plots show a sharp decline in RMSEP with the addition of the first few components, followed by a stabilisation in RMSEP as more components are incorporated. While the absolute minimum RMSEP occurs at six components, the results indicate that most of the relevant predictive information is already captured by the first three to four components. Adding further components provides only marginal improvements to model performance.
Table 5 highlights significant trade-offs between the two data splits. The 85% split indicates improved point forecast accuracy with reduced MASE (1.3224 vs. 1.3822) and MAE (144.61 vs. 149.98), although the RMSE marginally increases. Moving to prediction intervals, the 85% split indicates improved coverage (PICP: 0.9364 vs. 0.9275) with broader intervals (MPIW: 693.34 vs. 654.76), thus indicating improved uncertainty estimation. However, it is important to note that the CWC for the 85% split improves significantly (2064.1 vs. 2669.333), indicating better overall interval forecasting performance despite the broader prediction intervals.
Time series plots comparing actual versus predicted values in Figure 9 show both models closely tracking the observed data patterns. The 80% training split predictions align more precisely with actual values, particularly at peak points, explaining its lower MASE and MAE. Both models exhibit slight systematic underestimation, visible as predicted values frequently plotting below actual observations. The similar RMSE values reflect nearly identical prediction error magnitudes, though the 85% split demonstrates marginally better overall forecast accuracy, as shown in Table 5.
Figure 10 and Figure 11 present Probability Integral Transform (PIT) histograms derived from conformal p-values; these histograms serve as a standard diagnostic tool for assessing the probabilistic calibration of a forecasting model. Both histograms provide strong visual evidence that the conformal prediction model is well-calibrated. In both plots, the distribution of p-values is approximately uniform, as desired. This consistency across both graphs confirms that the model’s estimates of uncertainty are reliable.

3.5. Results of the Diebold–Mariano Tests

We present the results from the DM tests based on the 80% training, 10% validation and 10% test split. The two models considered are M1 (XGBoost) and M2 (PCR). The hypotheses are:
H0: 
Forecasts from M1 and M2 are equally accurate.
H1: 
Forecasts from M1 and M2 have different accuracy.
The XGBoost model significantly outperforms the PCR model in forecasting accuracy. As shown in Table 6, XGBoost achieves lower error across all evaluated metrics. This is confirmed by the Diebold–Mariano test (see Table 7), which indicates that the result is statistically significant (p-value = 0.0015), with a mean loss differential of −1651.345. XGBoost also yields improvements of 4.62% and 3.18% in MSE and MAE, respectively, relative to PCR.

3.6. Probability of Change in Direction and Fitness Tests

3.6.1. XGBoost and PCR (80% Training, 10% Validation and 10% Test)

Based on 80% training, 10% validation, and 10% test sets for both the XGBoost and PCR models, we combined test-set predictions using the partially linear additive quantile regression (PLAQR) averaging method. We refer to the combined predictions as fplaqrTest10. A detailed discussion of the PLAQR method is presented in [26].
The evaluation compared three forecasting models, fplaqrTest10, f1XG and f3PCR, using a combination of traditional error metrics, directional accuracy, and a composite fitness score. A summary of the model comparison (80% training, 10% validation and 10% test) is given in Table 8, while Figure 12 shows the probability of change in direction. In terms of directional accuracy, measured by the Prediction of Change in Direction (POCID), all models performed well, with scores ranging from 70.71% to 71.71%. Model f3PCR achieved the highest directional accuracy at 71.71%. However, when considering prediction error, fplaqrTest10 recorded the lowest Root Mean Squared Error (RMSE) of approximately 179, while f3PCR had the highest RMSE at about 189.
To provide an overall assessment, a fitness metric was used that balances directional accuracy with error magnitude, applying a weight factor optimised for the observed RMSE range. According to this combined measure, model fplaqrTest10 achieved the highest fitness score of 34.28, making it the best-performing model overall. This outcome reflects its superior balance between maintaining a relatively high POCID and keeping prediction errors low. Models f1XG and f3PCR followed with fitness scores of 33.76 and 33.60, respectively.
While each model demonstrated solid performance, fplaqrTest10 is identified as the most effective, offering the optimal trade-off between accurately predicting movement direction and minimising forecast error. The fitness metric, calibrated with a weight of 0.006, effectively highlighted these differences, confirming fplaqrTest10 as the recommended choice for practical application.

3.6.2. XGBoost and PCR (85% Training, 10% Validation and 5% Test)

Similarly, for the 85% training, 10% validation, and 5% test sets, the combined predictions using PLAQR are denoted fplaqrTest5. The results in Table 9 compare three models, fplaqrTest5, f2XG, and f4PCR, using error metrics, directional accuracy, and a combined fitness score, while Figure 13 shows the probability of change in direction. In traditional error measures, fplaqrTest5 performs best, with the lowest MAE (139.12), MSE (32,000.81), RMSE (178.89), and MASE (1.22), indicating the smallest average prediction errors. The models f2XG and f4PCR follow with progressively higher error values. All models show a low mean bias error (MBE), suggesting minimal systematic over- or under-prediction.
For directional accuracy, measured by POCID, fplaqrTest5 again leads by correctly predicting the direction of change 74.24% of the time, compared to 72.97% for f2XG and 73.31% for f4PCR. A combined fitness metric, which balances POCID and RMSE with a weighting factor of 0.006, ranks the models in the same order: fplaqrTest5 achieves the highest fitness score (35.81), followed by f2XG (34.80) and f4PCR (34.24). All are interpreted as having “good” overall performance.
The analysis confirms that fplaqrTest5 is the best-performing model across both accuracy and directional metrics. The chosen weight (w = 0.006) is justified as approximately the inverse of the average RMSE across models, providing an optimal penalty that allows for meaningful comparison without overly diminishing the fitness score. The results suggest this weighting is appropriate for models with RMSE values around 180–190.

4. Discussion

This research shows that the hybrid model produced by combining XGBoost and PCR via the PLAQR averaging method is better calibrated and more accurate. The finding that the PLAQR ensemble achieves the best composite performance (fplaqrTest10, fplaqrTest5) is consistent with the established notion within the research community that model averaging is preferable, as it helps reduce variance and improves generalisability [34,35]. Importantly, this was achieved within the conformal prediction paradigm, as indicated by the well-calibrated predictions and the PIT histograms. This is a notable development because it shows that advanced combination methods can improve both point accuracy and probabilistic calibration, even though machine learning models are typically developed with a primary focus on error minimisation.
The superior directional accuracy (POCID) and reduced error (RMSE) of the PLAQR model confirm our hypothesis: it can indeed capture complex nonlinearities by leveraging the complementary strengths of tree-based and linear methods, while maintaining structural stability. The competitive POCID of the PCR model, along with its higher error, flags its sensitivity to directional trends but with limitations in magnitude precision. XGBoost, on the other hand, provided a compromise solution. More importantly, the performance gain with the larger training dataset (85% vs. 80%) indicates that these data-intensive models continue to benefit from more data, underscoring the significance of scale in wind energy forecasting models.
The implications of these findings extend beyond this specific forecasting problem. They provide one possible solution to the problem of constructing reliable forecasting systems in the renewable energy sector. The success of the conformal prediction method is important for understanding how to construct more trustworthy AI systems in problem domains where reliable estimates of uncertainty are important.
The effectiveness of the PLAQR approach needs to be tested with a more diverse ensemble, including other model types, such as neural networks and scoring functions. The robustness of this approach can also be tested using high-frequency time-series data. Another area could be exploring adaptive weights for the PLAQR model.

5. Conclusions

This research shows that the PLAQR ensemble of XGBoost and PCR yields a better, well-calibrated model for forecasting, not only in terms of accuracy (RMSE, POCID), but also in terms of probabilistic calibration, as ensured by conformal prediction. The purpose of this research was to reaffirm the significance of employing advanced methods of averaging in machine learning. As shown, combining model classes and increasing the amount of training data are vital for renewable energy forecasting. As a result, the approach offers clear benefits to grid management, reducing the need for expensive balancing power reserves. The high level of accuracy enables more efficient use of reserves, thereby generating cost savings for the system. Additionally, the improved accuracy supports better dispatch decisions by grid operators, reducing unnecessary curtailment of wind energy as the share of renewables in the energy mix grows.

Author Contributions

Conceptualisation, R.I.N., T.H.T., T.R. and C.S.; methodology, R.I.N.; software, R.I.N.; validation, R.I.N., T.H.T., T.R. and C.S.; formal analysis, R.I.N. and C.S.; investigation, R.I.N., T.H.T., T.R. and C.S.; data curation, R.I.N.; writing—original draft preparation, R.I.N.; writing—review and editing, R.I.N., T.H.T., T.R. and C.S.; visualisation, R.I.N.; supervision, T.H.T., T.R. and C.S.; project administration, T.H.T., T.R. and C.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding, and the University of Venda funded the APC.

Data Availability Statement

The data that support the findings of this study are available at https://github.com/csigauke/Enhancing-Short-Term-Wind-Energy-Forecasting-with-XGBoost-and-Conformal-Prediction, accessed on 2 January 2026. The data is analytic data which was used for developing the models used in this study.

Acknowledgments

The authors are grateful to the many people who provided helpful comments on this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ANN: Artificial Neural Networks
ARIMA: AutoRegressive Integrated Moving Average
BH-XGBoost: Bayesian Hyperparameter-optimised XGBoost
Boost-LR: Boosting with Linear Regression
CNN-GRU: Convolutional Neural Network and Gated Recurrent Unit
CWC: Coverage Width-based Criterion
GBM: Gradient Boosting Machines
GPR: Gaussian Process Regression
KDJ: Stochastic Oscillator
KNN: K-Nearest Neighbors
LSTM: Long Short-Term Memory
MACD: Moving Average Convergence and Divergence
MAE: Mean Absolute Error
MASE: Mean Absolute Scaled Error
MBE: Mean Bias Error
MLP ANN: Multi-Layer Perceptron Artificial Neural Network
MPIW: Mean Prediction Interval Width
NMAE: Normalised Mean Absolute Error
NN: Neural Networks
PCA: Principal Component Analysis
PCR: Principal Component Regression
PICP: Prediction Interval Coverage Probability
PIT: Probability Integral Transform
RF: Random Forest
RMSE: Root Mean Square Error
SVM: Support Vector Machines
XGBoost: eXtreme Gradient Boosting

Appendix A. Supplementary Plots

The series of plots in Figure A1 and Figure A2 demonstrate that the predictive models are highly effective at forecasting wind energy production.
Figure A1. Actual vs. predicted values (80% training, 10% validation and 10% test) using the XGBoost model.
Figure A2. Actual vs. predicted values (85% training, 10% validation and 5% test) using the XGBoost model.

References

  1. Behabtu, H.A.; Vafaeipour, M.; Kebede, A.A.; Berecibar, M.; Van Mierlo, J.; Fante, K.A.; Messagie, M.; Coosemans, T. Smoothing Intermittent Output Power in Grid-Connected Doubly Fed Induction Generator Wind Turbines with Li-Ion Batteries. Energies 2023, 16, 7637. [Google Scholar] [CrossRef]
  2. Kim, D.; Hur, J. Short-term probabilistic forecasting of wind energy resources using the enhanced ensemble method. Energy 2018, 157, 211–226. [Google Scholar] [CrossRef]
  3. Foley, A.M.; Leahy, P.G.; Marvuglia, A.; McKeogh, E.J. Current methods and advances in forecasting of wind power generation. Renew. Energy 2012, 37, 1–8. [Google Scholar] [CrossRef]
  4. Ekinci, G.; Ozturk, H.K. Forecasting Wind Farm Production in the Short, Medium, and Long Terms Using Various Machine Learning Algorithms. Energies 2025, 18, 1125. [Google Scholar] [CrossRef]
  5. Zheng, Y.; Guan, S.; Guo, K.; Zhao, Y.; Ye, L. Technical indicator enhanced ultra-short-term wind power forecasting based on long short-term memory network combined XGBoost algorithm. IET Renew. Power Gener. 2025, 19, e12952. [Google Scholar] [CrossRef]
  6. Giebel, G.; Brownsword, R.; Kariniotakis, G.; Denhard, M.; Draxl, C. The State-of-the-Art in Short-Term Prediction of Wind Power: A Literature Overview, 2nd ed.; ANEMOS.plus: Crete, Greece, 2011. [Google Scholar] [CrossRef]
  7. Liu, Z.; Guo, H.; Zhang, Y.; Zuo, Z. A Comprehensive Review of Wind Power Prediction Based on Machine Learning: Models, Applications, and Challenges. Energies 2025, 18, 350. [Google Scholar] [CrossRef]
  8. Lei, M.; Shiyan, L.; Chuanwen, J.; Hongling, L.; Yan, Z. A review on the forecasting of wind speed and generated power. Renew. Sustain. Energy Rev. 2009, 13, 915–920. [Google Scholar] [CrossRef]
  9. Park, S.; Jung, S.; Lee, J.; Hur, J. A short-term forecasting of wind power outputs based on gradient boosting regression tree algorithms. Energies 2023, 16, 1132. [Google Scholar] [CrossRef]
  10. Lahouar, A.; Slama, J.B.H. Hour-ahead wind power forecast based on random forests. Renew. Energy 2017, 109, 529–541. [Google Scholar] [CrossRef]
  11. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; ACM: New York, NY, USA, 2016; pp. 785–794. [Google Scholar]
  12. Xiong, X.; Guo, X.; Zeng, P.; Zou, R.; Wang, X. A short-term wind power forecast method via XGBoost hyper-parameters optimization. Front. Energy Res. 2022, 10, 905155. [Google Scholar] [CrossRef]
  13. García-Puente, B.; Rodríguez-Hurtado, A.; Santos, M.; Sierra-García, J. Evaluation of XGBoost vs. other Machine Learning models for wind parameters identification. Renew. Energy Power Qual. J. 2023, 21, 388–393. [Google Scholar] [CrossRef]
  14. Sunku, V.S.R.P.; Namboodiri, V.; Mukkamala, R. The Short-Term Wind Power Forecasting by Utilizing Machine Learning and Hybrid Deep Learning Frameworks. Probl. Reg. Energetics 2025, 1, 1–11. [Google Scholar] [CrossRef]
  15. Ahmed, U.; Muhammad, R.; Abbas, S.S.; Aziz, I.; Mahmood, A. Short-term wind power forecasting using integrated boosting approach. Front. Energy Res. 2024, 12, 1401978.
  16. Vovk, V.; Gammerman, A.; Shafer, G. Algorithmic Learning in a Random World; Springer: Berlin/Heidelberg, Germany, 2005.
  17. Angelopoulos, A.N.; Bates, S. A gentle introduction to conformal prediction and distribution-free uncertainty quantification. arXiv 2021, arXiv:2107.07511.
  18. Dheur, V. Distribution-Free and Calibrated Predictive Uncertainty in Probabilistic Machine Learning. Ph.D. Thesis, University of Mons (UMONS), Faculté des Sciences, Mons, Belgium, 2025.
  19. Angelopoulos, A.N.; Barber, R.F.; Bates, S. Theoretical Foundations of Conformal Prediction. arXiv 2025, arXiv:2411.11824.
  20. Kavzoglu, T.; Teke, A. Advanced hyperparameter optimization for improved spatial prediction of shallow landslides using extreme gradient boosting (XGBoost). Bull. Eng. Geol. Environ. 2022, 81, 201.
  21. Ponkumar, G.; Jayaprakash, S.; Kanagarathinam, K. Advanced machine learning techniques for accurate very-short-term wind power forecasting in wind energy systems using historical data analysis. Energies 2023, 16, 5459.
  22. Zhao, X.; Li, Q.; Xue, W.; Zhao, Y.; Zhao, H.; Guo, S. Research on Ultra-Short-Term Load Forecasting Based on Real-Time Electricity Price and Window-Based XGBoost Model. Energies 2022, 15, 7367.
  23. Friedman, J.H. Stochastic gradient boosting. Comput. Stat. Data Anal. 2002, 38, 367–378.
  24. Chen, T.; He, T.; Benesty, M.; Khotilovich, V.; Tang, Y.; Cho, H.; Chen, K.; Mitchell, R.; Cano, I.; Zhou, T.; et al. Xgboost: Extreme gradient boosting. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016.
  25. Koenker, R.W.; Bassett, G. Regression Quantiles. Econometrica 1978, 46, 33–50.
  26. Hoshino, T. Quantile regression estimation of partially linear additive models. J. Nonparametr. Stat. 2014, 26, 509–536.
  27. Fallahtafti, A.; Aghaaminiha, M.; Akbarghanadian, S.; Weckman, G.R. Forecasting ATM Cash Demand Before and During the COVID-19 Pandemic Using an Extensive Evaluation of Statistical and Machine Learning Models. SN Comput. Sci. 2022, 3, 164.
  28. Stocker, M.; Małgorzewicz, W.; Fontana, M.; Taieb, S.B. A Gentle Introduction to Conformal Time Series Forecasting. arXiv 2025, arXiv:2511.13608.
  29. Khosravi, A.; Nahavandi, S.; Creighton, D.; Atiya, A.F. Comprehensive review of neural network-based prediction intervals and new advances. IEEE Trans. Neural Netw. 2011, 22, 1341–1356.
  30. Gneiting, T.; Balabdaoui, F.; Raftery, A.E. Probabilistic forecasts, calibration and sharpness. J. R. Stat. Soc. Ser. B Stat. Methodol. 2007, 69, 243–268.
  31. Diebold, F.; Mariano, R. Comparing predictive accuracy. J. Bus. Econ. Stat. 1995, 13, 253–265.
  32. Triacca, U. Comparing Predictive Accuracy of Two Forecasts. 2018. Available online: https://www.lem.sssup.it/phd/documents/Lesson19.pdf (accessed on 17 September 2025).
  33. Mevik, B.H.; Wehrens, R.; Liland, K.H. Introduction to the pls Package. 2015. Available online: https://cran.r-project.org/web/packages/pls/vignettes/pls-manual.html (accessed on 23 August 2025).
  34. Nowotarski, J.; Weron, R. Computing electricity spot price prediction intervals using quantile regression and forecast averaging. Comput. Stat. 2015, 30, 791–803.
  35. Mpfumali, P.; Sigauke, C.; Bere, A.; Mulaudzi, S. Day Ahead Hourly Global Horizontal Irradiance Forecasting—Application to South African Data. Energies 2019, 12, 3569.
Figure 1. Flowchart of the modelling framework: XGBoost with conformal prediction vs. PCR.
Figure 2. Distribution of variables.
Figure 3. Time series plot of wind energy.
Figure 4. Missing values.
Figure 5. Pearson correlation coefficient matrix.
Figure 6. Conformal prediction intervals for actual vs. predicted values (80% training, 10% validation, 10% test), based on the first 500 observations.
Figure 7. Conformal prediction intervals for actual vs. predicted values (85% training, 10% validation, 5% test).
Figure 8. Selecting the optimal number of components for the 80% and 85% training sets.
Figure 9. Actual vs. predicted values for the benchmark model across the different data splits.
Figure 10. Probability integral transform histogram (80% training, 10% validation, 10% test).
Figure 11. Probability integral transform histogram (85% training, 10% validation, 5% test).
Figure 12. Probability of change in direction (80% training, 10% validation, 10% test).
Figure 13. Probability of change in direction (85% training, 10% validation, 5% test).
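Figures 6 and 7 display conformal prediction intervals around the XGBoost point forecasts for the two data splits. As a rough, minimal sketch of how split (inductive) conformal intervals of this kind can be constructed from a held-out calibration set, here in base R with hypothetical object names (cal_actual, cal_pred and test_pred are illustrative, not taken from the study's code):

# Split conformal prediction intervals: a minimal base R sketch.
# cal_actual, cal_pred : observed and predicted wind energy on a calibration set
# test_pred            : point forecasts on the test set
# alpha                : miscoverage level (e.g. 0.05 for a nominal 95% interval)
conformal_interval <- function(cal_actual, cal_pred, test_pred, alpha = 0.05) {
  scores  <- abs(cal_actual - cal_pred)                       # nonconformity scores
  n       <- length(scores)
  q_level <- min(1, ceiling((n + 1) * (1 - alpha)) / n)       # finite-sample correction
  q_hat   <- quantile(scores, probs = q_level, names = FALSE) # calibration quantile
  data.frame(lower = test_pred - q_hat, upper = test_pred + q_hat)
}

This symmetric-interval construction assumes the calibration and test residuals are exchangeable; for serially dependent wind energy data, time-series adaptations of conformal prediction such as those surveyed in [28] are typically required.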
Table 1. Summary of wind energy forecasting methods.
Author/Year | Methodology Type | Forecasting Horizon | Uncertainty Quantification | Key Performance Metrics | Main Limitations
[6] | Statistical (ARIMA, Persistence) | Short-term | Not addressed | Qualitative review | Struggles with nonlinear relationships between wind power and weather variables
[8] | Machine Learning (ANNs, SVMs) | Short-term | Not addressed | Comparative review | Early-stage ML applications, limited uncertainty quantification
[10] | Random Forest | Hour-ahead | Not addressed | Accuracy improvements demonstrated | Point predictions only, no uncertainty estimates
[2] | Ensemble methods (temporal and geographical ensembles) | Short-term | Probabilistic forecasting with analogue ensemble methods | Improved uncertainty estimation | Complex implementation, computational intensity
[12] | XGBoost with Bayesian hyperparameter optimisation (BH-XGBoost) | Short-term | Not addressed | Superior performance vs. SVM, KELM, LSTM in all test conditions | Point predictions only, no uncertainty quantification
[20] | Advanced optimisation algorithms | Not specified | Not addressed | Improved model performance | Focus on optimisation rather than uncertainty
[9] | Gradient Boosting Machine (GBM) | Short-term (15-min intervals) | Not addressed | NMAE: 5.15% | Point predictions only, limited to specific temporal resolution
[13] | XGBoost vs. SVR, GPR, NN | Short-term | Not addressed | XGBoost most effective for short-term predictions | No uncertainty quantification
[21] | LightGBM, RF, CatBoost, XGBoost | Very short-term | Not addressed | MAE, MSE, RMSE, R-squared comparisons | Point predictions only
[15] | Boost-LR (XGBoost, CatBoost, RF + Linear Regression) | Short-term | Not addressed | MAE improvements: 31.42%, 32.14%, 27.55% | Ensemble improves accuracy but lacks uncertainty intervals
[5] | XGBoost + LSTM + Technical Indicators (KDJ, SO, MACD) | Ultra-short-term | Not addressed | NMAE: 0.0396; processing time: 550 s | Computational complexity, no uncertainty quantification
[4] | XGBoost, RF, ANNs, KNN, MLP | Medium- to long-term | Not addressed | Superior stability and accuracy vs. statistical methods | Focus on accuracy, not forecast reliability
[14] | CNN-GRU vs. XGBoost, RF | Day-ahead | Statistical validation (Diebold–Mariano test) | Deep learning marginally better; XGBoost competitive | Hypothesis testing rather than operational uncertainty quantification
[7] | XGBoost, RF, LSTM vs. traditional methods | Comprehensive review | Not addressed | ML superiority demonstrated | Review format, no empirical uncertainty analysis
[18] | Machine Learning + conformal prediction | Various | Conformal prediction | Quantifiable uncertainty intervals | Not specifically applied to wind energy forecasting
Table 2. Summary statistics for wind energy produced.
Statistic | Value
Minimum | 19.8
First Quartile (Q1) | 568.4
Median (Q2) | 903.0
Third Quartile (Q3) | 1306.8
Maximum | 3102.2
Mean | 982.5
Skewness | 0.7557
Kurtosis | 3.3057
Table 3. Variable importance comparison for the 80% and 85% training sets.
Variable | Importance (80% Train) | Importance (85% Train)
noltrend | 0.6637 | 0.6670
difLag12 | 0.2309 | 0.2223
difLag24 | 0.0605 | 0.0508
Hour | 0.0199 | 0.0302
difLag2 | 0.0161 | 0.0198
difLag1 | 0.0075 | 0.0082
Day | 0.0015 | 0.0016
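The scores in Table 3 are gain-type variable importances of the kind produced by gradient-boosted tree software. Purely as an illustration of how such importances can be obtained with the xgboost R package (the data frame train_df and the response column wind_energy are hypothetical names; the hyperparameters shown are placeholders rather than the values tuned in the study):

library(xgboost)

# Predictors listed in Table 3 (hypothetical data frame `train_df`, response `wind_energy`)
features <- c("noltrend", "difLag1", "difLag2", "difLag12", "difLag24", "Hour", "Day")
dtrain   <- xgb.DMatrix(data = as.matrix(train_df[, features]), label = train_df$wind_energy)

# Fit a regression booster; nrounds = 143 echoes the optimal rounds reported for the 80% split,
# while other hyperparameters are left at their defaults purely for illustration
bst <- xgb.train(params = list(objective = "reg:squarederror"), data = dtrain, nrounds = 143)

# Gain-based variable importance, comparable in spirit to Table 3
xgb.importance(model = bst)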
Table 4. Model performance comparison using (80% training, 10% validation and 10% test) and (85% training, 10% validation and 5% test).
Data split: 80% training, 10% validation, 10% test
Evaluation Metric | Optimal Rounds = 143 | nrounds = 500 | nrounds = 1000
MASE | 1.3284 | 1.4037 | 1.4188
RMSE | 182.4441 | 193.5536 | 195.9357
MAE | 144.1475 | 152.3166 | 153.9614
MBE | 0.0677 | −7.1082 | −7.9733
Data split: 85% training, 10% validation, 5% test
Evaluation Metric | Optimal Rounds = 152 | nrounds = 500 | nrounds = 1000
MASE | 1.2551 | 1.3146 | 1.3421
RMSE | 182.781 | 192.4822 | 197.4568
MAE | 143.5088 | 150.3099 | 153.4492
MBE | −3.7311 | −17.5414 | −12.7386
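Table 4 contrasts the tuned (optimal) number of boosting rounds with fixed settings of 500 and 1000 rounds. For readers wanting to reproduce metrics of this kind, a minimal base R sketch of the usual definitions follows (the MBE sign convention and the MASE scaling by the in-sample naive forecast are standard choices assumed here, not confirmed from the paper):

# Point-forecast error metrics: a minimal base R sketch.
# actual, pred : test-set observations and forecasts
# train_actual : training series used to scale MASE by the one-step naive MAE
rmse <- function(actual, pred) sqrt(mean((actual - pred)^2))
mae  <- function(actual, pred) mean(abs(actual - pred))
mbe  <- function(actual, pred) mean(pred - actual)          # sign convention assumed
mase <- function(actual, pred, train_actual) {
  mean(abs(actual - pred)) / mean(abs(diff(train_actual)))  # scale by in-sample naive MAE
}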
Table 5. Model performance comparison and prediction interval evaluation using (80% training, 10% validation, 10% test) and (85% training, 10% validation, 5% test).
Evaluation Metric | 80%/10%/10% Split | 85%/10%/5% Split
Point-forecast metrics:
MASE | 1.3822 | 1.3224
RMSE | 189.0324 | 190.2073
MAE | 149.9833 | 144.6136
MBE | −3.4655 | −4.1088
Prediction-interval metrics:
PICP | 0.9275 | 0.9364
MPIW | 654.7578 | 693.342
CWC | 2669.333 | 2064.1
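Table 5 evaluates the prediction intervals using the prediction interval coverage probability (PICP), mean prediction interval width (MPIW) and the coverage-width-based criterion (CWC) of Khosravi et al. [29]. A minimal base R sketch under a common CWC parameterisation follows; the penalty parameters (nominal coverage mu = 0.95, eta = 50) are assumptions, although with these defaults the sketch yields CWC values of roughly the magnitude reported in Table 5.

# Prediction interval evaluation metrics: a minimal base R sketch.
# actual        : observed values
# lower, upper  : interval bounds, same length as `actual`
picp <- function(actual, lower, upper) mean(actual >= lower & actual <= upper)
mpiw <- function(lower, upper) mean(upper - lower)

# Coverage-width-based criterion in the spirit of [29]: the width is penalised
# only when empirical coverage falls below the nominal level mu.
cwc <- function(actual, lower, upper, mu = 0.95, eta = 50) {
  p     <- picp(actual, lower, upper)
  gamma <- as.numeric(p < mu)
  mpiw(lower, upper) * (1 + gamma * exp(-eta * (p - mu)))
}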
Table 6. Performance metrics comparison (80% training, 10% validation and 10% test).
Metric | XGBoost | PCR | Winner
MSE | 34,081.89 | 35,733.24 | XGBoost
RMSE | 184.61 | 189.03 | XGBoost
MAE | 145.22 | 149.98 | XGBoost
MAPE (%) | 13.58 | 14.27 | XGBoost
Table 7. Model comparisons: Diebold–Mariano test.
Null Hypothesis | Test Statistic | p-Value | Mean Loss Differential | Result
M1 = M2 | −3.182 | 0.0015 | −1651.345 | Not equally accurate
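Table 7 reports a Diebold–Mariano test [31] of the null hypothesis that the two competing forecasts are equally accurate. A minimal sketch of how such a test can be run with dm.test() from the forecast package in R (e_xgb and e_pcr are hypothetical names for the two models' forecast errors on a common test set):

library(forecast)

# Forecast errors of the two competing models on the same test set (hypothetical objects)
e_xgb <- actual - pred_xgb
e_pcr <- actual - pred_pcr

# Two-sided test of equal predictive accuracy with squared-error loss
dm.test(e_xgb, e_pcr, alternative = "two.sided", h = 1, power = 2)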
Table 8. Comprehensive model comparison (80% training, 10% validation and 10% test).
Model | MASE | RMSE | MSE | MAE | MBE | POCID (%) | Fitness
fplaqrTest10 | 1.2993 | 179.2473 | 32,129.60 | 140.9943 | −0.7070 | 71.1487 | 34.2805
f1XG | 1.3284 | 182.4441 | 33,285.86 | 144.1475 | 0.0677 | 70.7078 | 33.7562
f3PCR | 1.3822 | 189.0324 | 35,733.24 | 149.9833 | −3.4655 | 71.7120 | 33.6014
Table 9. Performance metrics for the three forecasting models (85% training, 10% validation and 5% test).
Model | MASE | RMSE | MSE | MAE | MBE | POCID (%) | Fitness
fplaqrTest5 | 1.2168 | 178.8877 | 32,000.81 | 139.1227 | 0.8072 | 74.2409 | 35.8077
f2XG | 1.2551 | 182.7810 | 33,408.91 | 143.5088 | −3.7311 | 72.9677 | 34.8014
f4PCR | 1.3224 | 190.2073 | 36,178.81 | 151.2035 | −4.1088 | 73.3105 | 34.2373
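Tables 8 and 9 additionally report the probability of change in direction (POCID), i.e., the percentage of time steps at which the forecast moves in the same direction as the observed series. A minimal base R sketch of this standard definition (the treatment of ties, i.e., zero changes, is an assumption):

# Probability of change in direction (POCID), in percent: a minimal base R sketch.
# actual, pred : observed and forecast series of equal length
pocid <- function(actual, pred) {
  100 * mean(sign(diff(actual)) == sign(diff(pred)))
}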
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
