Article

Short-Term Forecasting of Unplanned Power Outages Using Machine Learning Algorithms: A Robust Feature Engineering Strategy Against Multicollinearity and Nonlinearity

by
Khathutshelo Steven Sivhugwana
* and
Edmore Ranganai
Department of Statistics, University of South Africa, Florida Campus, Johannesburg 1709, South Africa
*
Author to whom correspondence should be addressed.
Energies 2025, 18(18), 4994; https://doi.org/10.3390/en18184994
Submission received: 20 August 2025 / Revised: 17 September 2025 / Accepted: 18 September 2025 / Published: 19 September 2025

Abstract

Efficient power grid operations and effective business strategies require accurate prediction of power outages. However, predicting outages is a difficult task due to the large amount of heterogeneous, random, intermittent, and non-linear power grid data characterised by highly complex variable relationships. Attempting to simultaneously quantify these characteristics using a conventional single (linear or nonlinear) model may lead to inaccurate and costly results. To address this, we propose a hybrid RVM-WT-AdaBoostRT-RF framework using power grid data from the Electricity Supply Commission (Eskom) of South Africa. To achieve model interpretability, the least absolute shrinkage and selection operator (LASSO) is first applied to remedy the adverse effects of multicollinearity through regularisation and variable selection. Secondly, a random forest (RF) is used to select the top 10 most influential variables for each season for further analysis. A relevance vector machine (RVM) captures complex nonlinear relationships separately for each season, while the wavelet transform (WT) decomposes residuals generated from RVM into different frequency subseries (with reduced noise). These subseries are predicted with minimal bias using AdaBoost with regression and threshold (AdaBoostRT). Finally, we stack RVM, AdaBoostRT, RF, and residual individual predictions using RF as a meta-model to produce the final forecast with minimal error accumulation and efficiency. The comparative study, based on point forecast metrics, the Diebold-Mariano test, and prediction interval widths, shows that the proposed model outperforms vector autoregressive (VAR), RF, AdaBoostRT, RVM, and Naïve models. The study results can be utilised for optimising resource allocation, effective power grid management, and customer alerts.

1. Introduction

1.1. Context

In developing regions such as Africa, an adequate and planned electricity supply should be at the core of economic development strategies [1,2]. Since 2008, South Africa has experienced stronger energy demand growth than other African regions, leading to power imbalances [3,4]. The work of [5,6] attributes this to several factors. Firstly, there was an increase in the number of houses connected to the grid, owing to the free basic electricity policy of 2001 and population growth. Secondly, there is ageing infrastructure, characterised by low maintenance because of government delays in funding coal power stations in the early 2000s. Thirdly, there was a steady increase in electricity demand of around 50% between 1994 and 2008, attributable to the lifting of international sanctions, which led to economic expansion [5]. Finally, the development of the two largest coal-based power plants in the country—Medupi in 2007 and Kusile in 2008—as well as maintenance activities for existing power plants, was frequently postponed. This led to unit malfunctions and reduced reserve margins [5,7]. In the work of [8], the authors argued that decent design and maintenance of electricity infrastructure can reduce power outages. It was further shown in [7] that maintenance delays increase the unplanned capability loss factor (UCLF) (i.e., unplanned outages) [9], leading to higher load-shedding stages.
Due to persistent power imbalances, the Electricity Supply Commission (Eskom), which produces most (at least 90%) of the electricity in South Africa, has been implementing country-wide load-shedding since late 2007 to avoid total grid collapse [6,10]. The frequent power outages associated with these power capacity constraints in South Africa have reduced foreign direct investment (FDI) and increased production costs, thereby compounding socio-economic challenges in the country [11,12]. To alleviate this challenge, a substitute in the form of clean energy has been recommended, given the abundance of solar and wind energy resources [12,13,14,15,16]. Nonetheless, coal remains the primary source of electricity. For instance, coal accounted for more than 85% of electricity in 2021, while clean energy sources contributed only 6% [4,17,18]. According to [19], the lack of academic studies in the energy space contributed to the ongoing energy crisis in South Africa. Academic research into solar energy has gained traction in the country. However, very little has been done on power grid management and wind energy [13] (also see [2,20]).

1.2. Motivation

The reliability and steadiness of the power grid system depend on various factors, namely, intrinsic factors (such as the life span of equipment/power plants, equipment defects, and internal maintenance), external factors (such as fluctuations in weather patterns), and human error factors (such as vandalism of infrastructure) [21]. The authors of [2] showed an interdependent relationship between power outage events, the stability of the grid system, and the uninterruptible power supply. In the work of [22], the authors argued that grid outages occur due to higher electricity demand than the system can supply (often due to increased connections to the grid or increased economic development activities) [2,5]. In turn, this results in higher power grid maintenance costs [2]. The work of [2,5,21] articulated that accurate predictions of power outages are not always available in advance for businesses, the general public, and utility operators to plan for the associated economic and maintenance costs. In this regard, a reliable power outage predictive model is essential for electricity supply reliability and effective power grid system management [23,24]. Accurate power outage predictive models can improve proactive decisions to prevent power grid strains and outages [2,25]. However, the computational complexity and operational integration of power grid data into power system planning and operational decision frameworks are the key challenges to transforming the heterogeneous large dataset into actionable outcomes.

1.3. Literature Review and Gaps

As statistical analysis and machine learning (ML) tools become available, recent research has focused on predicting and evaluating power outages based on heterogeneous data such as meteorological data [24,26]. The majority of the literature thus focuses on the impact and the effect of weather-related variables (e.g., wind, storm, snow, etc.) on the duration of power outages (see, e.g., [8,27,28,29,30,31]). In the literature, power outage predictive models have been developed using statistical techniques [27,31], ML techniques [2,20,26], and hybrid techniques [32,33] (also see Table 1).
There are several basic statistical techniques used to model causal relationships for forecasting power outages using meteorological data [21,27,28,29,30]. These include but are not limited to generalised negative binomials (GNBs) [31], generalised additive models (GAMs) [27], exponential regression [30], and generalised linear models (GLMs) [27]. However, power outage forecasting has become increasingly complex due to the use of increasingly unpredictable and high-variant weather-related variables (e.g., wind) as inputs. Despite their mathematical tractability, statistical models require pre-modelling distributional assumptions about data, and consequently, the results obtained do not accurately capture deviations from these assumptions, such as the inherent nonlinear power system patterns and dynamics (also see Table 2).
Recent advances in computing power have led to more advanced, efficient, and accurate ML algorithms. As ML methods are based on historical data, they are capable of processing nonlinear data and of self-adjusting, making them highly robust and adaptive for predicting highly variant datasets [2,20,32]. Among many, these include neural networks (NNs) such as conventional artificial neural networks (ANNs), applied in the work of [8,20,26,28], and deep NNs (DNNs), employed in [32,34]. Despite their simplicity, ANNs tend to converge on local minima and can easily overfit small datasets. Hence, unlike relevance vector machines (RVMs), they require large amounts of data alongside strict data correlations to guarantee stability (or avoid gradient explosion) and accuracy [35,36,37]. Though robust, recurrent neural networks (RNNs) (and their variants, such as long short-term memory (LSTM) and gated recurrent units (GRUs)) are sensitive to outliers and may struggle to adequately capture local patterns, particularly when dealing with small and noisy datasets [37]. In some cases, their complexity and requirement for large data may compromise the accuracy-efficiency trade-off [37]: they are highly computationally intensive, so efficiency is surrendered for high-level accuracy. There are, however, more efficient, flexible, reliable, and accurate ML models, such as random forests (RFs) [24,38] and AdaBoost with regression and threshold (AdaBoostRT) [28], which have been successfully applied (to a lesser extent) to predict power outages using meteorological datasets. Although AdaBoostRT can create strong learners from a group of weak learners [28], it is susceptible to outliers (also see Table 2). Unlike RNNs, RF is effective at forecasting power outages because of its low sensitivity to noise and robustness to outliers. RF's complexity allows it to cope with complex relationships; however, its behaviour is difficult to interpret at the individual-tree level (also see Table 1 and Table 2).
In [24], the authors articulated that power outage prediction is a difficult task, principally due to the vast amount of diverse data features and the multifaceted causes and effects of numerous factors on the power grid, which cannot be entirely explained by a single model. To overcome this challenge, ref. [33] successfully combined classification and regression trees (CART) and Bayesian additive regression trees (BART) to predict power outages due to damaged poles. Though the aforementioned regression trees (RTs) are not significantly affected by outliers when applied to continuous values, they are ineffective on their own, and a small variation in the data can have a huge impact on a tree's structure. In similar work, ref. [39] found that combining a decision tree (DT), RF, and a boosted gradient tree (BT) for forecasting power outages (due to storms) yielded better accuracy than the individual models. The proposed ensemble decision tree model was also found to be less susceptible to noise and missing values; hence, it could learn more general representations of the data. The authors of [40] employed an LSTM-RNN to accurately predict South African unplanned outages (UCLF) using installed capacity, historic demand, and the planned capability loss factor (PCLF). The proposed approach showed potential for modelling and predicting UCLF. However, the study did not include the other capability loss factor (OCLF), an essential variable for predicting power outages, as it represents some elements of external and human factors (also see Table 1).
Table 1. Review summary of the related works.
| Author | Problem | Methods | Dataset | Results | Limitations |
|---|---|---|---|---|---|
| Onaolapo et al. (2022) [2] | Electricity outage forecasting (South Africa) | ML (ANNs, exponential smoothing, linear regression) | Historical outage and weather conditions | ANNs were highly accurate and effective over conventional methods | Limited/smaller datasets were used; the model is data-greedy and easily overfits small datasets |
| Das et al. (2021) [32] | Outage estimation in electric power distribution systems (USA) | ML (deep neural network ensemble (DNNE), AdaBoost+, ANN) | Overhead distribution feeder outages and weather conditions | DNNE was superior to other models and captured complex relationships very well | The complex model underestimated outages due to wind, lightning, and animals |
| Kankanala et al. (2014) [28] | Estimating weather-related outages in distribution systems (USA) | ML (AdaBoost+, AdaBoostRT, ANN, linear regression, mixture of experts) | Historical outage and weather conditions | AdaBoost+ captures complexity and nonlinearity well, hence error reduction in forecasting | Model is complex, computationally expensive, and noise-sensitive; under-predicted outages in the sparse high range |
| Han et al. (2009) [27] | Prediction of power outages due to hurricanes (USA) | Statistical methods (GAM, GLM) | Hurricane outages and weather conditions | To some extent, GAM enhanced predictive performance and accuracy | GAM's precision is challenged when dealing with complex variables |
| Kankanala et al. (2011) [29] | Estimating outages due to wind and lightning on overhead distribution feeders (USA) | Statistical methods (linear regression models) | Historical outages and weather conditions (wind and lightning) | Models showed a high positive correlation between predictors and increased outages | Fewer meteorological data were used, excluding outlier days; the proposed model struggled to handle high-range outages |
| Kankanala et al. (2011) [30] | Estimating power outages on overhead distribution feeders (USA) | Statistical methods (exponential regression models) | Historical outage and weather conditions (wind and lightning) | Proposed exponential methods enhanced outage forecasting to some extent | Fewer meteorological data were used; the proposed model could not handle high-range outage values |
| Guikema et al. (2010) [33] | Pre-storm estimation of hurricane damage to electric power distribution systems (USA) | Statistical/ML (GAM, GLM, CART, BART) | Historical outage and weather conditions | The proposed model captured complexity and nonlinearity effectively | The lack of outage data limits model capabilities; the model is complex and computationally expensive |
| Wanik et al. (2017) [39] | Storm outage modelling for an electric distribution network (USA) | ML/ensemble (BT, DT, RF, DT-RF-BT) | Electric distribution network outages and weather conditions | DT-RF-BT accurately predicted outages due to storms | Limited data on infrastructure; weather variables alone cannot account for power grid dynamics; the model is also complex |
| Motepe et al. (2022) [40] | Forecasting unplanned capability loss factors (South Africa) | ML/hybrid (deep belief network (DBN), optimally pruned extreme learning machines (OP-ELM), LSTM-RNN) | Historic outages, weather conditions, and capacity factors | The hybrid DBN and LSTM-RNN outcompeted other models; prediction error was reduced | The model is computationally expensive |
To address the significant difficulties associated with the operation and administration of the power grid system due to the limited predictability of power outages, hybrid models are available in the literature, albeit to a lesser degree. In addition, the performance of hybrid models can be enhanced through signal pre-processing methods such as wavelet transforms (WTs) [37,41] (also see Table 2). The processing of signals through WTs often entails denoising, short-term local feature extraction, and filtering. As far as we know, these strategies have been scarcely explored in the power outage literature (also see Table 1). The analysis of the literature further shows that unplanned power outage forecasting research in South Africa is very scant aside from [2,20,40], making it difficult for utility managers to accurately predict when power outages might occur (also see [19]). According to [2], utility managers often rely on experience and discretion instead of reliable power outage predictive models when developing power restoration strategies. The potential of hybrid methods for better planning and operation of the power grid thus remains largely untapped and calls for significant effort all around.
Table 2. Strengths and weaknesses of the models implemented.
| Model | Strengths | Weaknesses | Citation |
|---|---|---|---|
| LASSO | Besides being computationally efficient, LASSO is effective at regularisation, variable selection, and dimension reduction. | Selects only a subset of correlated predictors and shrinks the rest to zero; the number of predictors selected is limited to the number of samples. | [42,43] |
| RF | Through the bagging technique, RF effectively handles nonlinearities, outliers, and missing values, thereby avoiding model overfitting and minimising variance; can effectively handle both classification and regression problems. | Requires more training time than other decision-tree-based algorithms; complex compared to other decision-tree-based algorithms. | [44,45,46] |
| WT | Compatible with both frequency and time domains; removes noise and reveals patterns in the signals; provides statistically sound signals that are simple to model and predict. | It is difficult to determine the most appropriate decomposition level. | [37,47] |
| AdaBoostRT | Leveraging boosting, these methods enhance generalisation capabilities; can avoid overfitting and minimise bias; do not require a large training dataset. | AdaBoostRT's convergence speed depends on the threshold selected. | [48,49,50] |
| RVM | Founded on the Bayesian framework, RVMs are sparse, probabilistic, and require fewer support vectors; handle high-dimensional data well, offering greater generalisation and preventing overfitting (high variance); need not comply with Mercer's criteria and perform very well on smaller datasets. | Requires more training time for large datasets. | [35,36,51,52] |
| VAR | Captures complex and interdependent relations and structural changes in the data; handles high dimensionality efficiently and is easy to comprehend. | Lag length affects performance; parameters increase with dimension; in higher-dimensional spaces, sparsity is required to avoid strong correlations; has a complex stochastic structure. | [53,54] |
In our proposed RVM-WT-AdaBoostRT-RF hybrid framework, LASSO addresses the adverse effects of multicollinearity by selecting variables to achieve model interpretability. On the other hand, RVM captures nonlinearity and bias using the original dataset, and WT decomposes residuals from fitting RVMs into signals that are easy to predict. An AdaBoostRT model uses residual signals from RVM as input features to generate residual forecasts. Not only is RF utilised to select the best top 10 variables (to some extent account for LASSO’s inability to capture nonlinearity and enhance season-specific modelling), but RF is also used as a meta-model that fuses RVM, AdaBoostRT, RF, and residual predictions with efficiency and accuracy (minimal error variance).
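As a rough illustration of the final stacking stage described above, the sketch below (in Python, although the study was implemented in R) feeds base-model predictions into an RF meta-model. All data, noise levels, and variable names here are hypothetical stand-ins for the study's actual RVM, AdaBoostRT, RF, and residual forecasts, not the authors' pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n, split = 200, 150

# Synthetic target and hypothetical base-learner predictions standing in
# for the RVM, AdaBoostRT, and RF level forecasts plus a residual forecast.
y = np.sin(np.linspace(0, 8, n)) + 0.1 * rng.standard_normal(n)
pred_rvm = y + 0.20 * rng.standard_normal(n)
pred_ada = y + 0.15 * rng.standard_normal(n)
pred_rf = y + 0.25 * rng.standard_normal(n)
resid_fc = (pred_rvm - y) + 0.10 * rng.standard_normal(n)  # noisy residual forecast

# Stack the base predictions column-wise as meta-features;
# RF acts as the meta-model that fuses them into the final forecast.
Z = np.column_stack([pred_rvm, pred_ada, pred_rf, resid_fc])
meta = RandomForestRegressor(n_estimators=200, random_state=0)
meta.fit(Z[:split], y[:split])
final = meta.predict(Z[split:])

rmse_meta = float(np.sqrt(np.mean((final - y[split:]) ** 2)))
print(rmse_meta)
```

The meta-model sees only the base predictions, so error accumulation is controlled by letting the RF learn how to weight (and correct) the individual forecasts.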
The efficacy of the proposed approach is validated against RVM, AdaBoostRT, RF, vector autoregressive (VAR), and benchmark Naïve models, using hourly measurements of the power grid data accessed from the Eskom data portal. The recorded power grid data cover the period from 1 March 2021 to 30 April 2022 and include instances of zero-inflated values; the intention is to preserve the true operational characteristics of the power grid. The data provide substantial insights into power grid operations and usage during this period.

1.4. Novelty and Contributions

An overabundance of outage forecasting models exists in the literature; these can be classified into three major categories, namely, statistical, ML, and hybrid models. However, the dynamic nature of the power grid necessitates a model suited to capturing its multi-dimensional features, viz., fluctuating power demands, varying weather conditions, and unpredictable system failures [55,56], as an attempt to use a single model will often lead to costly, inaccurate, and unreliable predictions.
Although individual methods are efficient and easy to comprehend, they frequently lack precision when compared with hybrid methods [13]. It is often observed in the literature that a hybrid framework can greatly enhance prediction accuracy and robustness [13]. However, this benefit may be offset by increased model complexity and computational intensity if the hybrid strategy is not carefully designed. The novelty of the proposed stacked hybrid approach rests on the observation that very few hybrids in the existing literature achieve a desirable trade-off between complexity, efficiency, and accuracy. Hybrids that are effective and efficient, and that are distinguished by significant accuracy in deterministic and probabilistic predictions, low variability and bias, and robustness, are pivotal for successful and informed decision-making; yet they remain very scant in the literature. We therefore summarise our contribution, which exploits an ensemble of regularised regression, bagging methods, wavelets, ensemble boosting methods, and vector machines, as follows:
  • A preliminary examination of data utilising variance inflation factor (VIF) diagnostics revealed the presence of a high degree of multicollinearity. Hence, LASSO regression as a regularisation and variable selection procedure is used to remedy high levels of multicollinearity and predictor redundancies, thus ensuring dimensionality reduction in the model (i.e., with fewer parameters).
  • Since LASSO cannot adequately capture complex nonlinear relations, we further employ RF to select the top 10 season-based variables, thereby enhancing model interpretability, efficiency, and accuracy.
  • In our preliminary inspection of the original power grid data, we discovered that some variables were both unstable and noisy. To address these issues, we opted to utilise the superior capabilities of the sparse Bayesian RVM algorithm. By doing so, we can effectively handle complex behaviour (e.g., nonlinearity) in the data and improve forecast accuracy.
  • At their core, WTs reduce noise and the effect of outliers from the underlying time series to ease modelling and forecasting [37,41]. We, therefore, employ WTs to decompose RVM residuals into high-frequency and approximate subseries with improved and sound statistical characteristics (less noise), which are easy to model and predict.
  • Using AdaBoostRT’s robustness capabilities, we minimise bias and accurately and efficiently forecast the residual subseries, utilising the decomposed subseries as input features.
  • Leveraging RF’s accuracy and its ability to avoid model overfitting, this computationally efficient bagging approach is also used to combine the RVM, RF, AdaBoostRT, and residual forecasts into a final forecast with speed and minimal error accumulation.
  • In [40], the method employed UCLF alone as the predictand of power outages, using just a handful of other factors. Conversely, the proposed TUCLF.OCLF extends this by incorporating both OCLF and UCLF. Thus, TUCLF.OCLF is a more comprehensive target variable for predicting power outages, as (to some extent) it accounts for unplanned power outages in their totality.
  • To an extent, our proposed framework could effectively capture the seasonality effects, nonlinearity, random fluctuations, and nonstationarity patterns inherent in the power grid data.
The study has been conducted in a manner that is reliable and reproducible, and we have provided appropriate and comprehensive assessment metrics (such as point and probabilistic forecast evaluation metrics and statistical tests) that are suitable for power outage modelling and forecasting evaluation.
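The VIF diagnostic mentioned in the first contribution can be sketched minimally as follows (Python rather than the study's R; the predictors below are synthetic and purely illustrative). Each VIF is $1/(1-R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on the remaining predictors; values above roughly 10 are a conventional flag for strong multicollinearity.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of a numeric predictor matrix."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])       # intercept + other predictors
        beta, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ beta
        r2 = 1.0 - resid.var() / target.var()
        out.append(1.0 / max(1.0 - r2, 1e-12))          # guard against division by zero
    return np.array(out)

rng = np.random.default_rng(1)
x1 = rng.standard_normal(500)
x2 = x1 + 0.05 * rng.standard_normal(500)   # nearly collinear with x1
x3 = rng.standard_normal(500)               # independent predictor
vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)
```

The collinear pair produces very large VIFs, while the independent predictor stays near 1, which is the pattern that motivated the LASSO step in this study.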

1.5. Structure of the Study

The rest of the study is structured as follows: The methods and materials are described carefully and thoroughly in Section 2. In Section 3, an analysis of the results and discussion are provided whilst Section 4 concludes the study.

2. Materials and Methods

2.1. Case Study Report

2.1.1. Data Description

This comprehensive study explores the feasibility of developing a predictive model for power outages in South Africa through a thorough analysis of historical power grid data. The research period spans from 1 March 2021, to 30 April 2022, as illustrated in Figure 1. The data analysis was executed using the R program (version 4.4.1), and the final dataset incorporated 43 variables as outlined in Table 3.
Figure 1 illustrates the unplanned outage data, which fluctuate from hour to hour. The unplanned outages (in MW) (i.e., $y_{TUCLF.OCLF}$) are used as the dependent variable, while the other variables are considered independent (see Table 3). The data details presented in Table 3 are supplemented by a comprehensive glossary and individual distributions in Appendix A. These help us understand the physical or operational meaning of the variables and their significance in power outage forecasting.

2.1.2. Problem Formulation

This study approximates the complex relationship between power grid factors and unplanned outages. The predictor matrix and dependent variable are given, respectively, by $X_{1464 \times 42} = \left[ x_{ORCL}, \ldots, x_{NCS} \right]$ and $y_{1464 \times 1} = \left[ y_{TUCLF.OCLF}(1), \ldots, y_{TUCLF.OCLF}(1464) \right]^{T}$ such that
$$x_i^{T} = \left( x_{i,1}, \ldots, x_{i,42} \right), \qquad y = \left( y_1, \ldots, y_{1464} \right)^{T},$$
where $x_i^{T}$, $i = 1, \ldots, 1464$, denotes the $i$th row of the predictor matrix $X$ and $y_i$ denotes the $i$th unplanned outage observation. Fundamentally, we would like to find an approximation $\hat{f}(X)$ of an unknown function $f: \mathbb{R}^{1464 \times 42} \rightarrow \mathbb{R}^{1464 \times 1}$ such that $\hat{f}(X) \approx f(X)$. In essence, we would like to resolve the following regression exercise:
$$y = f(X) + \zeta, \qquad \begin{pmatrix} y_1 \\ \vdots \\ y_{1464} \end{pmatrix} = f\begin{pmatrix} x_1 \\ \vdots \\ x_{1464} \end{pmatrix} + \begin{pmatrix} \zeta_1 \\ \vdots \\ \zeta_{1464} \end{pmatrix},$$
where $\zeta_i$ is the noise or residual error associated with the $i$th unplanned outage prediction.

2.1.3. Data Partition

To capture seasonal effects (including weather variation), the data were partitioned into four distinct seasons with different seasonal variations: autumn (March–May 2021), winter (June–August 2021), spring (September–November 2021), and summer (December 2021–February 2022). Fundamentally, an effective predictive system must be tested and validated across all year-round conditions and seasons (see Table 4). For experimentation, the first two months of each season, approximately 1464 h, were selected. For each season, the dataset was divided into a training set (including a 20% validation set), which accounts for 80%, and a testing set, which accounts for the remaining 20% (also see Table 4). In addition, we retained the Autumn 2022 (1 March 2022–30 April 2022) dataset to evaluate the models' applicability and seasonal robustness across years. By analysing the four distinct seasons within a single year, we can examine the seasonal patterns that affect the model, ensuring its robustness across different seasonal conditions. Furthermore, the incorporation of seasonal data from a different year, specifically Autumn 2022, allows us to assess the applicability of the model in scenarios beyond those encountered in the initial year.
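The per-season split above can be sketched as follows (a Python illustration; the study itself used R, and the exact boundary handling here is an assumption). The split is chronological, with the 20% validation slice taken from the end of the training block so that no future observations leak into training.

```python
import numpy as np

def chronological_split(n_hours=1464, train_frac=0.8, val_frac=0.2):
    """Chronological train/validation/test index split (no shuffling)."""
    idx = np.arange(n_hours)
    n_train = int(train_frac * n_hours)                 # 80% for training overall
    train_idx, test_idx = idx[:n_train], idx[n_train:]  # last 20% held out for testing
    n_val = int(val_frac * n_train)                     # 20% of training for validation
    fit_idx, val_idx = train_idx[:-n_val], train_idx[-n_val:]
    return fit_idx, val_idx, test_idx

fit_idx, val_idx, test_idx = chronological_split()
print(len(fit_idx), len(val_idx), len(test_idx))
```

The three index blocks are contiguous and ordered in time, which matters for hourly grid data where random shuffling would mix past and future.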
As part of the supervised learning approach employed in this work, we optimise the ML algorithms (outlined in the following sub-sections) to estimate the function $f$. The rationale is to minimise the loss (error) function $L$ between $y_i$ and $\hat{y}_i$:
$$L(y_i, \hat{y}_i) = \frac{1}{1176} \sum_{i=1}^{1176} \hat{\zeta}_i^{2},$$
where $\hat{y}_i$ is the predicted value and $\hat{\zeta}_i = y_i - \hat{y}_i$ denotes the residual for the $i$th of the 1176 observations in the training set. During the training stage, the model parameters are adjusted until the loss function reaches acceptably low levels.

2.2. Variable Selection

There are various variable selection and regularisation methods, such as LASSO, Elastic Net, and Ridge regression. LASSO encourages sparse models and performs both variable selection and regularisation to increase the interpretability and prediction accuracy of the model [43]. LASSO also excels at handling high levels of multicollinearity (see [43]). The loss function that LASSO seeks to minimise combines the ordinary least squares error loss with the absolute-deviation-based penalty $\sum_{j=1}^{p} |\xi_j|^{\eta}$ with $\eta = 1$ (the $\ell_1$-norm constraint) and is given in the following Lagrangian form:
$$\min_{\xi \in \mathbb{R}^{p}} \frac{1}{2N} \sum_{i=1}^{N} \left( y_i - \sum_{j=1}^{p} x_{ij} \xi_j \right)^{2} + \lambda \sum_{j=1}^{p} |\xi_j|,$$
where the $N$ = 10,223 pairs of predictor variables ($X$) and response variables ($y$) are denoted by $\{(x_{ij}, y_i),\ i = 1, \ldots, N;\ j = 1, \ldots, p\}$, $p = 42$ denotes the number of predictor variables, $\xi = (\xi_1, \xi_2, \ldots, \xi_p)$ are the regression weights, and $\lambda$ (controlling the amount of shrinkage) is a tuning parameter. If $\lambda = 0$, we revert to the ordinary least squares solution. The larger the value of $\lambda$, the more the coefficients are shrunk towards zero; hence, the model will under-fit. As $\lambda$ increases, more coefficients are set exactly to zero and eliminated, thereby increasing model bias, whereas a decreasing $\lambda$ results in increased variance. LASSO seeks to strike the best compromise between model over-fitting and sparsity by avoiding the extreme (i.e., either positive or negative) values of the constraint $\eta$. Furthermore, LASSO regression is also applicable to feature selection, since the coordinates of less significant features are truncated towards zero while those of statistically insignificant ones are shrunk entirely to zero. We controlled multicollinearity, regularised, and selected variables using the function ‘cv.glmnet’ in the R program. In the subsequent analysis, the variables $x_{RF}$, $x_{RSA.CF}$, $x_{IE}$, $x_{EGG}$, $x_{TREIC}$, $x_{TRE}$, $x_{CSPIC}$, $x_{PGUH}$, $x_{TUCLF}$, and $x_{TOCLF}$ were excluded.
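The cross-validated LASSO step can be approximated in Python with scikit-learn's `LassoCV`, an assumed analogue of R's `cv.glmnet` rather than the study's actual code. The synthetic data below simply show how the cross-validated penalty zeroes out redundant predictors while retaining informative ones:

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
n, p = 400, 12
X = rng.standard_normal((n, p))
X[:, 1] = X[:, 0] + 0.05 * rng.standard_normal(n)   # collinear pair of predictors
y = 3 * X[:, 0] - 2 * X[:, 2] + 0.5 * rng.standard_normal(n)

# Standardise so the l1 penalty treats all predictors on the same scale,
# then pick lambda (alpha) by 5-fold cross-validation.
Xs = StandardScaler().fit_transform(X)
lasso = LassoCV(cv=5, random_state=0).fit(Xs, y)

kept = np.flatnonzero(lasso.coef_ != 0)     # surviving predictors
print(kept, lasso.alpha_)
```

As with `cv.glmnet`, only one member of a strongly collinear pair tends to survive, which is exactly the multicollinearity-pruning behaviour exploited in this section.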
As a regularised linear regression technique, LASSO assumes a linear association between the dependent and independent variables. Thus, some aspect of linearity was assumed during feature engineering and selection. Nonetheless, the drawbacks associated with linearity were later addressed through the application of nonlinear ML models. Using the remaining 31 predictor variables and the dependent variable, we further used RF to select the top 10 variables for each season. The rationale is to further reduce model dimensions, capture nonlinearity characteristics, and adequately capture season-specific behaviour, thereby improving accuracy and robustness. Algorithm A1 in Appendix A summarises the implementation process in the R program.

2.3. Random Forest

RF regression is an ensemble flexible ML model that relies on the aggregation of weak predictors, namely, regression trees, to provide accurate and reliable prediction [44,45]. RF models are often used to solve regression and classification problems since they possess a high level of accuracy without complex hyperparameter tuning. They also handle (internally) missing values very well and rectify overfitting in a decision tree. A process known as bootstrapping aggregation, or bagging (see Algorithm A2), which reduces overfitting and bias, is employed to build each tree independently using a subset of the training data [45,46]. Suppose we have input data ( X r e t a i n e d ) and target data ( y r e t a i n e d ) each tree is sampled randomly with replacement (bootstrapped sample) from the training data ( X t r a i n _ r e t a i n e d ,   y t r a i n _ r e t a i n e d ). Each tree must be grown on an independent bootstrap sample from the training data. From all M (number of trees in the forest) possible variables, select m variables at random and find the most optimal split at each node (see Algorithm A2). In the end, by averaging the forecasts from all trees, the forecast is calculated through the following equation:
$$\hat{y}_{\text{train\_retained}} = \frac{1}{M} \sum_{i=1}^{M} T_i\!\left(X_{\text{train\_retained}}\right).$$
The RF assumes that, when using a bootstrapped sample, the most effective data split is reached at each stage of growing a tree. The outputs of the grown trees are combined by averaging. In [45], the authors showed that the randomness and variability in tree construction reduce the generalisation error, yielding a better overall model with lower variance. Unlike boosting techniques, RF does not sequentially modify the training set [45,46]. Overall, RFs are relatively insensitive to outliers and noise and can handle highly nonlinear interactions, since the bagging (bootstrap averaging) that RF employs effectively reduces the variance of decision trees [46]. The implementation of RF (through the bagging process) in R is summarised in Algorithm A2 in Appendix A.
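The bootstrap-and-average mechanism can be illustrated with a toy Python sketch that bags one-split regression trees (stumps). This is a deliberately simplified illustration: a real RF grows deep trees and samples $m$ features at each node, which is omitted here for brevity:

```python
import numpy as np

def fit_stump(X, y):
    """Fit a one-split regression tree: best (feature, split, left mean, right mean)."""
    best, best_sse = None, np.inf
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j])[:-1]:          # candidate thresholds
            left = X[:, j] <= s
            ml, mr = y[left].mean(), y[~left].mean()
            sse = ((y[left] - ml) ** 2).sum() + ((y[~left] - mr) ** 2).sum()
            if sse < best_sse:
                best_sse, best = sse, (j, s, ml, mr)
    return best

def predict_stump(stump, X):
    j, s, ml, mr = stump
    return np.where(X[:, j] <= s, ml, mr)

def bagged_forecast(X_train, y_train, X_new, M=50, seed=1):
    """Grow M stumps on bootstrap samples and average their predictions,
    i.e. y_hat = (1/M) * sum_i T_i(X_new)."""
    rng = np.random.default_rng(seed)
    n, preds = len(X_train), np.zeros(len(X_new))
    for _ in range(M):
        idx = rng.integers(0, n, n)                # bootstrap: n rows with replacement
        preds += predict_stump(fit_stump(X_train[idx], y_train[idx]), X_new)
    return preds / M
```

Averaging over bootstrap-trained learners is precisely what stabilises the variance of the individual trees.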

2.4. Signal Processing Methods

The Fourier transform (FT) assumes a stationary signal and cannot reflect time-domain features of a time series [47], in that its density and marginal distribution are independent of time [57]. Due to this property, FTs often fail to explain non-stationary real-world signals effectively and reliably. While short-time Fourier transforms (STFTs) remedy this deficiency through window functions and short localised waveforms in the time-frequency domain, these techniques have the drawback of using a fixed-width window function. To circumvent the drawbacks of FTs and STFTs, WTs are often employed, as they operate in both the time and frequency domains [47,58] and are independent of window functions [57]. WTs rely on families of basis functions that dilate and contract with signal frequency [57]. WTs also extract meaningful information whilst removing noise and anomalies from the original time series to ease the modelling and forecasting process [37,41]. The WT presents high frequency resolution at low frequencies and high time resolution at high frequencies, such that noise is removed and underlying patterns or trends are revealed [37,41].

Wavelet Transform

In practice, the discrete WT (DWT) is often applied to enhance models' predictive strength, as its calculations are simpler and faster than those of the continuous WT (CWT). Furthermore, the DWT contrasts with the CWT in that its wavelet scaling and translation factors are discrete (see, e.g., [13,37,47]). For a given time series decomposed by DWT, the approximation ($A_i$) and detail ($D_i$) subseries are, respectively, determined by:
$$A_{i+1}(t) = \sum_{l} h(l)\, A_i(2t + l),$$
$$D_{i+1}(t) = \sum_{l} g(l)\, A_i(2t + l),$$
where $i \in [0, M]$ indexes the decomposition levels, $h$ and $g$ respectively denote the low-pass and high-pass filter functions, and $A_0 = y(t)$ is the original signal. Hence, the inverse DWT is computed by the expression below:
$$A_i(t) = \sum_{l} h(l)\, A_{i+1}(2t + l) + \sum_{l} g(l)\, D_{i+1}(2t + l).$$
Despite excellent time-frequency domain features, it is difficult to compute the best decomposition level when working with WTs. This study employs maximal overlap DWT (MODWT), which is a type of DWT. Different from the traditional DWT, the MODWT is time-invariant, such that subseries signals maintain the same coefficients even if the original signal has shifted. Furthermore, MODWT offers better statistical features compared to conventional time-variant DWT (see [37,47]). The MODWT is implemented through the waveslim package in R as summarised in Algorithm A3 (see Appendix A).
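To make the decomposition step concrete, the following minimal Python sketch implements the MODWT's circular filtering with no downsampling. The Haar filter is used purely for brevity (the study used the db4 filter via the waveslim package in R), and the sketch is illustrative rather than a production transform:

```python
import numpy as np

def modwt_haar(x, levels):
    """Minimal MODWT with the Haar filter: circular filtering, no downsampling.
    MODWT filters are the DWT filters rescaled by 1/sqrt(2)."""
    h = np.array([0.5, 0.5])                  # scaling (low-pass) filter
    g = np.array([0.5, -0.5])                 # wavelet (high-pass) filter
    N, v, details = len(x), np.asarray(x, float).copy(), []
    for j in range(1, levels + 1):
        gap = 2 ** (j - 1)                    # filter upsampling at level j
        w, vn = np.zeros(N), np.zeros(N)
        for t in range(N):
            for l in range(2):
                idx = (t - l * gap) % N       # circular (periodic) boundary
                vn[t] += h[l] * v[idx]
                w[t] += g[l] * v[idx]
        details.append(w)                     # detail subseries D_j
        v = vn                                # approximation carried to next level
    return details, v                         # (D_1..D_levels, A_levels)
```

Because no downsampling occurs, the subseries keep the original length and remain aligned in time, which is the time-invariance property exploited in the proposed framework; the level-1 coefficients also preserve the signal's energy, a standard MODWT property.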

2.5. AdaBoostRT Algorithm

AdaBoost is an ensemble boosting algorithm that aims to enhance model robustness and generalisation by combining the predictions of multiple learners [48,49,50]. Weak learners (typically decision trees built on the knowledge of the existing trees) are trained sequentially, with each new learner correcting the errors of its predecessors, thereby reducing bias [48,49]. In AdaBoost, the weights of incorrectly predicted observations are increased while those of correctly predicted observations are decreased. AdaBoostRT is a variant of AdaBoost designed to handle regression problems. It categorises each sample as either correct or incorrect based on the absolute relative error score, calculated using the equation below:
$$\rho_i = \left|\frac{\hat{y}_i - y_i}{y_i}\right|,$$
where $\hat{y}_i$ and $y_i$ respectively denote a predicted and an actual unplanned outage observation. The AdaBoostRT algorithm discriminates between correct and incorrect predictions by establishing a distinct threshold from the training data, and then places greater emphasis (increased weights) on observations that are incorrectly predicted. The key drawback of AdaBoostRT is that its performance depends on the selected threshold. A higher threshold (e.g., $\delta > 0.4$) results in more values being classified as correct and a slower convergence speed for the algorithm [49]. If the threshold value is set too low, accuracy inevitably decreases, reducing the reliability or stability of the ensemble [49]. The main steps of AdaBoostRT are summarised in Algorithm A4 in Appendix A (also see [48,50] for details).
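The loop can be sketched as follows. This is a simplified Python illustration under stated assumptions: weighted one-split trees stand in for the weak learners, the weight-update exponent is fixed at 1, and targets are assumed non-zero so that the relative error $\rho_i$ is well defined:

```python
import numpy as np

def fit_weighted_stump(X, y, w):
    """Weak learner: one-split regression tree fitted to weighted data."""
    best, best_sse = None, np.inf
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j])[:-1]:
            L = X[:, j] <= s
            ml = np.average(y[L], weights=w[L])
            mr = np.average(y[~L], weights=w[~L])
            sse = (w[L] * (y[L] - ml) ** 2).sum() + (w[~L] * (y[~L] - mr) ** 2).sum()
            if sse < best_sse:
                best_sse, best = sse, (j, s, ml, mr)
    return best

def predict_stump(st, X):
    j, s, ml, mr = st
    return np.where(X[:, j] <= s, ml, mr)

def adaboost_rt(X, y, delta=0.1, rounds=15):
    n = len(y)
    D = np.full(n, 1.0 / n)                          # sample weights tau_i
    learners, betas = [], []
    for _ in range(rounds):
        st = fit_weighted_stump(X, y, D)
        rho = np.abs((predict_stump(st, X) - y) / y) # absolute relative error
        eps = np.clip(D[rho > delta].sum(), 1e-6, 1 - 1e-6)
        beta = eps                                   # beta_t = eps_t^n with n = 1
        D = np.where(rho <= delta, D * beta, D)      # shrink weights of correct samples
        D /= D.sum()
        learners.append(st)
        betas.append(beta)
    vote = np.log(1.0 / np.array(betas))             # model weights for the weak learners
    def predict(X_new):
        P = np.array([predict_stump(st, X_new) for st in learners])
        return (vote[:, None] * P).sum(axis=0) / vote.sum()
    return predict
```

Accurate rounds (small $\varepsilon_t$) receive large voting weights $\log(1/\beta_t)$, while rounds dominated by incorrect predictions contribute little to the final combination.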

2.6. Relevance Vector Machine

The RVM is a variant of sparse linear models based on hierarchical prior distributions [36], in which sparseness is achieved by assuming a sparsity-inducing distribution over the weights of the regression model [35,36,51]. A zero-mean Gaussian prior is usually assumed on each weight, with independent Gamma hyperprior distributions on the weight precisions [36]; marginalising over these precisions yields a Student-t prior distribution over the weights, thereby achieving sparseness. RVMs have the same functional form as the well-known SVMs; however, the kernel functions that form the basis of RVMs do not have to comply with Mercer's criteria (i.e., being a continuous symmetric positive-definite integral operator). In addition, RVMs employ a smaller number of relevance vectors (compared to the support vectors used by SVMs) and exhibit reduced sensitivity to hyperparameter settings [35,36,51,52]. RVMs, however, usually involve highly nonlinear optimisation processes [36]. RVMs compute predictions based on the following mathematical expression (see [35,52] for details):
$$f(X, w) = \sum_{i=1}^{N} w_i K(X, x_i) + \omega_0,$$
where $w = (w_1, w_2, \ldots, w_N)^{T}$ denotes the weights of the model, $K(\cdot,\cdot)$ is the kernel function centred at the individual training observations, and $\omega_0$ is the bias term. The kernel function defines one basis function for each observation in the training dataset. To automatically select the right kernel at each location, the sparse component of the RVM prunes all irrelevant kernels [36]. Suppose we are given a training dataset of input-output pairs $\Theta = \{(x_n, t_n)\}_{n=1}^{N}$ and assume that the targets $t_n$ are generated by the model defined by the following mathematical expression [35,36]:
$$t_n = f(x_n, w) + \varepsilon_n,$$
where the additive noise terms $\varepsilon_n \sim N(0, \beta^{-1})$ are independent samples and $\beta^{-1}$ is the variance of the noise term ($\beta$ being its precision). Thus, the likelihood distribution of the targets is given by the following expression (see [35,36,51] for details):
$$p(\mathbf{t} \mid X, w, \beta) = \prod_{n=1}^{N} p(t_n \mid x_n, w, \beta) = \prod_{n=1}^{N} N\!\left(t_n \mid f(x_n, w), \beta^{-1}\right).$$
For each weight $w_i$, the RVM model introduces a separate hyperparameter $\pi_i$ (representing the precision of that weight), such that the weights have a prior distribution concentrated around zero of the following form [35,52]:
$$p(w \mid \pi) = \prod_{i=1}^{N} N\!\left(w_i \mid 0, \pi_i^{-1}\right),$$
where $\pi = (\pi_1, \pi_2, \ldots, \pi_N)$ and, as mentioned before, $\pi_i$ denotes the precision of the weight $w_i$, which controls its variability (or shrinkage). Since basis functions whose precisions diverge make no contribution to predictions, the final model is sparse. Unlike the SVM, which uses sequential minimal optimisation (SMO) [59], RVMs use expectation-maximisation-type algorithms [35,36,52]. According to [52], the RVM updates its hyperparameters iteratively until a stopping condition is satisfied, following the steps summarised in Algorithm A5 (see Appendix A).
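These iterative updates can be sketched in Python as follows. This is a minimal illustration of Tipping-style precision re-estimation under stated assumptions (an RBF kernel, and pruning replaced by a simple cap on the precisions); it is not the Algorithm A5 implementation used in the study:

```python
import numpy as np

def rbf(A, B, gamma):
    """Gaussian (RBF) kernel matrix between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def rvm_fit(X, t, gamma=2.0, n_iter=30):
    N = X.shape[0]
    Phi = np.hstack([np.ones((N, 1)), rbf(X, X, gamma)])  # bias + one kernel per sample
    M = Phi.shape[1]
    alpha = np.ones(M)                    # weight precisions (pi_i in the text)
    beta = 1.0 / (np.var(t) + 1e-9)       # noise precision
    for _ in range(n_iter):
        Sigma = np.linalg.inv(beta * Phi.T @ Phi + np.diag(alpha) + 1e-8 * np.eye(M))
        m = beta * Sigma @ Phi.T @ t      # posterior mean of the weights
        g = 1.0 - alpha * np.diag(Sigma)  # how well-determined each weight is
        alpha = np.minimum(g / (m ** 2 + 1e-12), 1e12)    # cap in lieu of pruning
        beta = max(N - g.sum(), 1e-6) / (((t - Phi @ m) ** 2).sum() + 1e-12)
    return m

def rvm_predict(m, X_train, X_new, gamma=2.0):
    Phi = np.hstack([np.ones((len(X_new), 1)), rbf(X_new, X_train, gamma)])
    return Phi @ m
```

Weights whose precisions diverge (here, hit the cap) shrink to zero, so only a few "relevance vectors" remain active in the prediction.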

2.7. Vector Autoregressive Models

VARs are data-driven methods for analysing multivariate time series data. Aside from being easy to comprehend, these techniques can (to some extent) handle higher-dimensional data and capture structural changes [53,54]. However, they come with limitations: estimated coefficients can be hard to interpret in high dimensions; the choice of lag length affects performance; the number of parameters grows with dimension; and, in high-dimensional settings, sparsity is required to avoid multicollinearity [53,54] (also see Table 2). A VAR($p$) model can be intuitively represented by the following equation:
$$y_{i,t} = \tau_i + \sum_{k=1}^{p} \phi_{i1,k}\, y_{1,t-k} + \sum_{k=1}^{p} \phi_{i2,k}\, y_{2,t-k} + \cdots + \sum_{k=1}^{p} \phi_{in,k}\, y_{n,t-k} + e_{i,t},$$
where $\tau_i$ ($i = 1, \ldots, n$) are the constants or intercept terms of the $i$th time series; $y_{i,t}$ ($i = 1, \ldots, n$) denotes the $i$th time series at time $t$; $p$ represents the lag order of the model; $\phi_{ij,k}$ is the effect of series $j$ on series $i$ at a lag of $k$ time points; and $e_{i,t}$ ($i = 1, \ldots, n$) is the uncorrelated noise or residual term for the $i$th series at time $t$. The model above is estimated equation by equation using least squares, with the parameters of each equation obtained by minimising its sum of squared errors. VAR models are effective under data stationarity; otherwise, transformations (usually differencing) need to be applied. To meet this requirement, we applied differencing to stabilise the variance and ensure that the data were stationary. To lower the significant risk of overfitting, which is heightened by the small sample size and the number of variables, we further reduced the number of exogenous variables by employing the principal component analysis (PCA) method.
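The equation-by-equation least-squares estimation above can be sketched directly (an illustrative numpy sketch, not the R implementation used in the study):

```python
import numpy as np

def fit_var(Y, p):
    """Equation-by-equation least squares for a VAR(p): regress y_t on an
    intercept and the p most recent lags of every series. Y has shape (T, n)."""
    T, n = Y.shape
    X = np.ones((T - p, 1 + n * p))
    for k in range(1, p + 1):
        X[:, 1 + n * (k - 1):1 + n * k] = Y[p - k:T - k]   # lag-k block
    B, *_ = np.linalg.lstsq(X, Y[p:], rcond=None)          # (1 + n*p, n) coefficients
    return B

def forecast_var(Y, B, p):
    """One-step-ahead forecast from the last p observations."""
    x = np.concatenate([[1.0]] + [Y[-k] for k in range(1, p + 1)])
    return x @ B
```

For a VAR(1), the block `B[1:]` holds the transposed lag-1 coefficient matrix, so the fit can be checked against the coefficients of a simulated process.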

2.8. Proposed Framework

In terms of study contributions, the proposed RVM-WT-AdaBoostRT-RF improves upon the inadequacies of the previous work in the sense that it provides computational efficiency, effectively captures nonlinearity, avoids overfitting, and has high accuracy in the prediction of unplanned power outages using power grid parameters. The contribution of each of the models involved in the construction of the stacked hybrid model is presented in Table 5.

Proposed Stacking Prediction Approach

In stacking, several forecasting models are fused using a meta-model. In essence, each subsequent model in the stack tackles the errors of the previous model, thereby minimising the overall error. Thus, the model identifies missed patterns or trends in the previous model, thereby preventing overfitting and ultimately enhancing the stability of the model on unseen data. In this study, for instance, LASSO and RF identify and handle broad trends or aspects such as dimension reduction; RVM facilitates nonlinear probabilistic learning; WTs and AdaBoostRT focus on more detailed aspects of the residuals to minimise bias and noise. RF also ensures that predictions are more stable by effectively capturing nonlinearity and averaging errors with efficiency and accuracy when stacking. Fundamentally, the stacking approach reduces the errors of the base learner, boosts forecast accuracy, and increases the seasonal robustness of the model. The specifics of the proposed RVM-WT-AdaBoostRT-RF are also provided in Algorithm 1 (also see Appendix A; Algorithms A1–A5) and Figure 2.
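As a minimal illustration of the fusion step, the sketch below fits a meta-learner on validation-set base predictions and applies it to test-set base predictions. A linear meta-learner stands in here purely for simplicity; the proposed framework stacks with an RF meta-model instead:

```python
import numpy as np

def fit_stack(base_val_preds, y_val):
    """Fit a meta-learner on validation-set base-model predictions.
    A linear least-squares meta-learner is used here for illustration;
    the paper's framework uses a random forest as the meta-model."""
    Z = np.column_stack([np.ones(len(y_val)), base_val_preds])
    w, *_ = np.linalg.lstsq(Z, y_val, rcond=None)
    return w

def stack_predict(w, base_test_preds):
    """Combine test-set base predictions with the fitted meta-weights."""
    Z = np.column_stack([np.ones(len(base_test_preds)), base_test_preds])
    return Z @ w
```

Because the meta-learner sees each base model's validation errors, it can down-weight biased or noisy learners, so the stacked forecast typically beats every individual base forecast.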
Algorithm 1: RVM-WT-AdaBoostRT-RF
  • Data cleaning, formatting, dimension reduction, and partition
1. 
     Data cleaning
  • Input:   Raw _ data   10,223 × 42 ,
  • Output:   X n e w   10,223   × 42 ,   y n e w   10,223   × 1
2. 
     Detect multicollinearity through VIF
  • Input:   X n e w ,   y n e w
  • Output:  X v i f   10,223   × 42 , y v i f 10,223   × 1
3. 
     Partition data into 80% training (including validation) and 20% test set
  • Input:   X v i f ,   y v i f
  • Output:   ( X t r a i n   8178   × 42 ,   y t r a i n 8178   × 1 ) training set,
    ( X t e s t   2045   × 42 ,   y t e s t 2045   ×   1 )   test set
4. 
     Variable selection (LASSO)
  • Input:   X t r a i n , y t r a i n ,   α l a s s o
            Output: 31 retained predictors; 1 dependent variable
5. 
     Extract data to represent each season
  • Input:   X s e l e c t e d   v a r 10,223 × 31 , y s e l e c t e d   v a r 10,223 × 1
  • Output:     X s e a s o n 1464 × 31 ,     y s e a s o n 1464 × 1
6. 
     Select top 10 variable per season (RF)
  • Input:     X s e a s o n ,     y s e a s o n
  • Output:   ( X t r a i n   r e t a i n e d     1176   = 888 t r a i n +   288 v a l   × 10 ,   y t r a i n   r e t a i n e d 1176   = 888 t r a i n +   288 v a l × 1   training   set ;   ( X t e s t   r e t a i n e d   288   × 10 ,   y t e s t _ r e t a i n e d 288   ×   1 )     test   set ;   ( X r e t a i n e d   1464   × 10 , y r e t a i n e d 1464   ×   1 )     Full retained set
B.
Decomposition of RVM residuals
7. 
     Fit RVM model on the entire retained data
  • Input RVM o p t i m a l ,   X r e t a i n e d 1464   × 10 ,   y r e t a i n e d 1464   × 1
  • Output: f ^ X r e t a i n e d y ^ f i t t e d 1464   × 1
8. 
     Calculate residuals
  • Input:   y r e t a i n e d ,   y ^ f i t t e d
  • Output:   y r = y r e t a i n e d y ^ f i t t e d 1464   × 1
9. 
     Initialise wavelet parameters
  • Input:   y r ,   db 4 wavelet _ filter ,   2 n _ level ,   periodic   boundary  
  • Output:  y r ,   wavelet _ filter ,   n _ level , boundary
10. 
     Perform wavelet decomposition using modwt
  • Input:   y r ,   wavelet _ filter ,   n _ level , boundary
  • Output:   A 2 1464   × 1 A   ( Approximate   subseries ) ;   ( D 1 ,   D 2 )   1464   × 2   D (Detailed subseries)
C.
Residual predictions through AdaBoostRT
11. 
     Partition subseries into training and test datasets
  • Input y r ,   X d e c o m p o s e d = ( A , D )   1464   × 3
  • Output X t r a i n _ r   1176 = 888 t r a i n +   288 v a l   × 3 ,   y t r a i n   r 1176 = 888 t r a i n +   288 v a l     × 1   training   set   ( 80 % ) ;   ( X t e s t _ r ,   288   × 3 , y t e s t _ r 288   × 1 )   test set (20%)
12. 
     Initialise parameters
  • Set weights τ i = 1 n , weak learners V ,
  • error_threshold δ  
13. 
     Train (60%) and validate (20%) using the 80% of the retained data
13.1. 
 Tune hyperparameters
  • For each iteration i = 1   to   V
  • Input:   X t r a i n _ r , y t r a i n _ r , τ i
    Fit   a   weak   learner   G i   to   weighted   X t r a i n _ r ;   Predict   for   y t r a i n _ r  
  • Output:   Predictions   from   weak   learners   G i   ( X t r a i n _ r )
  • a. Calculate error
  •      Input:   y t r a i n _ r ,   G i   X t r a i n _ r
  • Compare   G i   ( X t r a i n _ r )   with   y t r a i n _ r :
  • If   ρ i < δ , correct; otherwise, incorrect
  •      Output: ρ i   (incorrectly classified)
                 b. Update weights
Input :   y t r a i n _ r , G ( X t r a i n _ r ) ,   τ i , ρ i  
Output :   τ i + 1 (updated weight)
                 c. Calculate model weights
Input :   ρ i  
Output :   weights   ψ i   (for the weak learners)
13.2.
  Model validation on the 20% of the data
  • Input weak   trees   G   G 1 ,   G 2 ,   ,   G V ,   weights   ψ { ψ 1 , ψ 2 , . . . , ψ V }
  • Output:   Weighted   combination   f ^ X t r a i n _ r v a l = i = 1 V ψ i   G i X t r a i n _ r v a l = f ^ A d a B o o s t R T _ r _ v a l   288   × 1
14. 
     Predicting using test data
  • Input:   f ^ X t r a i n _ r v a l ,   X t e s t _ r ,   y t e s t _ r
  • Output:   f ^ X t e s t _ r = y ^ t e s t _ r   f ^ A d a B o o s t R T _ r 288   × 1
15. 
     Model performance assessment
  • Input: y test _ retained , f ^ A d a B o o s t R T _ r
  • Output: {Mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE)}←Performance metrics
16. 
     Final output
  • Output: f ^ A d a B o o s t R T _ r ,   Performance metrics
D.
Stacking through RF
17. 
     Initialise model
  • Input: y t r a i n _ r e t a i n e d v a l , f ^ A d a B o o s t R T _ r _ v a l , f ^ A d a B o o s t R T _ v a l , f ^ R F _ v a l ;   f ^ R V M _ v a l
  • Output: y t r a i n _ r e t a i n e d v a l 288 × 1 ; ( f ^ A d a B o o s t R T _ r _ v a l , f ^ A d a B o o s t R T _ v a l , f ^ R F _ v a l ;   f ^ R V M _ v a l ) X ^ v a l i d a t i o n 288 × 4
18. 
     Forecast fusion (stacking)
  • Input: y t r a i n _ r e t a i n e d v a l ,     X ^ v a l i d a t i o n ;   X ^ b a s e _ m o d e l s = f ^ R F _ m o d e l , f ^ A d a B o o s t R T _ r , f ^ R V M _ m o d e l , f ^ A d a B o o s t R T   288 × 4
    Train   using   X v a l i d a t i o n ,   y t r a i n _ r e t a i n e d v a l   and   then   compute   y ^ t e s t _ r e t a i n e d = f ^ X ^ b a s e _ m o d e l s = i = 1 M T i   f ^ R F m o d e l , f ^ A d a B o o s t R T _ r , f ^ R V M m o d e l , f ^ A d a B o o s t R T
Output :   y ^ t e s t _ r e t a i n e d f ^ H y b r i d 288   × 1
19. 
     Model performance assessment
  • Input y t e s t _ r e t a i n e d ,     f ^ H y b r i d
  • Output: MAE, MAPE, RMSE, prediction interval normalised average width (PINAW), Mincer-Zarnowitz (MZ) test, and the Diebold-Mariano (DM) test←Performance metrics
20. 
     Final Output
  • Output:   f ^ H y b r i d , Performance metrics

2.9. Evaluation Metrics

2.9.1. Point Prediction Performance Indicators

In this study, the point predictions are evaluated by MAE, MAPE, and RMSE, which are, respectively, given by the following equations:
$$MAE = \frac{1}{n}\sum_{t=1}^{n} \left|\hat{\zeta}_t\right|,$$
$$MAPE = \frac{1}{n}\sum_{t=1}^{n} \left|\frac{\hat{\zeta}_t}{y_t}\right| \times 100,$$
$$RMSE = \sqrt{\frac{1}{n}\sum_{t=1}^{n} \hat{\zeta}_t^{2}},$$
where $\hat{\zeta}_t = y_t - \hat{y}_t$, with $y_t$ and $\hat{y}_t$ respectively denoting the actual and predicted values at time $t$, and $n$ representing the sample size. Smaller values of MAE, MAPE, and RMSE imply a better prediction model. Though easy to comprehend, the MAE is less sensitive to extreme values than squared-error indices. MAPE returns the error in percentage form, which is easy to interpret, but is singularity-prone whenever actual values approach zero [53,60]. RMSE, on the other hand, produces an absolute value on the same scale as the output variable. Consequently, RMSE is more suitable for operations relating to power grid stability, as it accurately reflects the impact of extreme errors. However, RMSE is more prone to outliers than MAPE [53,60].
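The three indicators can be computed directly, as in this short Python sketch (an illustrative implementation; note the MAPE singularity when $y_t = 0$):

```python
import numpy as np

def point_metrics(y, y_hat):
    """Return (MAE, MAPE in %, RMSE) with residuals zeta_t = y_t - y_hat_t."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    zeta = y - y_hat
    mae = np.mean(np.abs(zeta))
    mape = np.mean(np.abs(zeta / y)) * 100.0   # undefined when any y_t == 0
    rmse = np.sqrt(np.mean(zeta ** 2))
    return mae, mape, rmse
```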

2.9.2. Residual Analysis

Suppose that $y_t$ and $\hat{y}_t$, respectively, correspond to the observed and predicted power outage values at time $t$. The residual estimates $\hat{\zeta}_t$, $t = 1, \ldots, n$, will be analysed to determine whether the model predictions underestimate or overestimate actual power outages. If $\hat{\zeta}_t \neq 0$, the predictive model either overestimates ($\hat{\zeta}_t < 0$) or underestimates ($\hat{\zeta}_t > 0$) the actual outage values.

2.9.3. Probabilistic Performance Indicators

PINAW is employed to evaluate the reliability and certainty of prediction intervals. This prediction interval performance indicator is given by the following expression [61].
$$PINAW = \frac{1}{n\gamma} \sum_{t=1}^{n} \left(U_t - L_t\right),$$
where $L_t$ and $U_t$ respectively represent the lower and upper limits of the prediction interval, and $\gamma$ is the range of the target values, used for normalisation. Lower PINAW values are preferred, as they suggest a narrower deviation from the target values and a more robust prediction interval with better generalisation characteristics. PINAW is, however, known to be computationally expensive and sensitive to deviations from normality [60].
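In code, the indicator reduces to a few lines (a Python sketch; $\gamma$ is taken here as the range of the observed targets):

```python
import numpy as np

def pinaw(lower, upper, y):
    """Prediction interval normalised average width: mean interval width
    divided by the range of the target values (gamma)."""
    lower, upper, y = map(np.asarray, (lower, upper, y))
    gamma = y.max() - y.min()
    return np.mean(upper - lower) / gamma
```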

2.9.4. Diebold-Mariano Test

The DM test evaluates the forecasting strength of the models applied. Consider two forecasts $\hat{y}_r^{(i)}$ and $\hat{y}_r^{(j)}$ of the value $y_r$ from models $i$ and $j$, respectively, and let $L(\zeta_r^{(i)})$ and $L(\zeta_r^{(j)})$ denote the losses associated with the corresponding forecast errors $\zeta_r^{(i)} = y_r - \hat{y}_r^{(i)}$ and $\zeta_r^{(j)} = y_r - \hat{y}_r^{(j)}$. Then, the loss differential between the two forecasts is given by [62,63]:
$$d_r = L\!\left(\zeta_r^{(i)}\right) - L\!\left(\zeta_r^{(j)}\right).$$
The DM test evaluates the null hypothesis that the two forecasts have the same predictive accuracy. Thus, the following hypothesis is tested [62,63]: $H_0\!: E(d_r) = 0$ vs. $H_a\!: E(d_r) \neq 0$ for all $r$. The DM test statistic is calculated as follows:
$$DM = \frac{\frac{1}{n}\sum_{r=1}^{n} d_r}{\sqrt{s^2/n}},$$
where $s^2$ is the estimated variance of $d_r$. The calculated DM statistic is then compared with the critical values: the null hypothesis is rejected if the DM statistic is greater than the upper critical value $Z_{\alpha/2}$ or less than the lower critical value $-Z_{\alpha/2}$.
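With absolute-error loss, the statistic can be computed as in this Python sketch (an illustrative implementation that omits the autocorrelation correction sometimes applied for multi-step forecasts):

```python
import numpy as np

def dm_statistic(err_i, err_j):
    """Diebold-Mariano statistic with absolute-error loss:
    d_r = |e_{i,r}| - |e_{j,r}|, DM = mean(d) / sqrt(s^2 / n)."""
    d = np.abs(np.asarray(err_i, float)) - np.abs(np.asarray(err_j, float))
    n = len(d)
    s2 = np.var(d, ddof=1)          # estimated variance of d_r
    return np.mean(d) / np.sqrt(s2 / n)
```

A large positive value indicates that model $j$'s forecasts are significantly more accurate than model $i$'s, and vice versa for a large negative value.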

2.9.5. Computational Tools

All six models were trained and tested in a Dell development environment with the following specifications: 13th Gen Intel(R) Core i7 2.50 GHz processor, 32 GB of RAM, and Windows 11. The models' hyperparameters were tuned through cross-validation, grid search, and heuristics. The optimal parameter intervals, alongside the respective R libraries, are given in Table 6. On average, implementation on each dataset took around 4–5 min.

3. Results

3.1. Exploratory Analysis

Table 7 summarises the statistics of the five datasets under experimentation. The highest power outage levels were recorded in the summer (17,558 MW), whilst the lowest was observed in the autumn (8410 MW). On average, summer power outage levels (13,928 MW) were significantly higher than in any other season. This trend may be attributed to the surge in electricity consumption during the summer months caused by intense heat in various parts of the country. For instance, during the summer months, the air temperature rises. As a result, the majority of consumers rely on air conditioners, increasing their energy consumption. Winter had the highest variance (1562.242 MW) compared to any other time of year. Power outage levels vary by season. Furthermore, the Kruskal–Wallis test at 5% level of significance resulted in p-value   < 0.05 , confirming the presence of seasonality effects in the power outage data (also see Figure 3). All datasets are platykurtic (kurtosis less than 3) with skewness ranging from 0.42 to 0.40. Figure 3 also shows non-normality in autumn, winter, and summer. Furthermore, the autumn and summer datasets are multimodal.

3.2. Empirical Results

3.2.1. Wavelet Analysis

The characteristics of the decomposed RVM residuals using Daubechies 4 (DB4) are shown in Figure 4. We compared LA8’s performance (based on RMSE and coefficient of determination ( R 2 )) with DB4’s at different decomposition levels ( M = 2 and 3). DB4 dominated LA8, with best results at M = 2. It was also found that the predictive error increased with increasing decomposition levels, which concurs with the results in [64]. It is worth noting that the Autumn 2022 dataset is reserved for evaluating the seasonal robustness of the proposed strategy over time.

3.2.2. Comparative Analysis

Table 8 shows how the hybrid (or RVM-WT-AdaBoostRT-RF), RF, RVM, AdaBoostRT, VAR, and benchmark Naive model performed in the five datasets tested. The PINAW indices measure the reliability of the prediction interval estimates, while the other indicators measure the precision of point predictions. The MZ test assesses the bias of the forecasts. The DM test compares the predictive accuracy of forecasts from each of the six models. The top performers in each category of the performance index are bolded.
Overall, comparative tests showed that the hybrid dominated all other models based on low values of RMSEs, MAEs, and MAPEs, followed by RVM, AdaBoostRT, RF, VAR, and Naive. With the exception of the autumn dataset, RVM was the single most influential model across all seasons based on RMSE, MAE, and MAPE. Furthermore, the AdaBoostRT model accounted for power outage dynamics better than the RF model for the winter, spring, and summer seasons, whilst for the autumn, it outcompeted RVM based on the least RMSE, MAE, and MAPE. All models (including the proposed hybrid) are subject to seasonal fluctuations. Overall, our analysis revealed that the hybrid model produced more accurate predictions than other models (see Figure 5 and Figure 6).
Except for the summer dataset, the hybrid model underestimated all outage datasets, as its residuals were positively skewed (indicating more positive errors). Similar results were obtained for the RVM and VAR predictions across all four seasons. Except for the spring dataset, the AdaBoostRT predictions overestimated the remaining datasets. The Naive model overestimated the autumn dataset while underestimating the other datasets.
The bias test was also conducted using the MZ test. To evaluate unbiasedness, the null hypothesis that the intercept and slope terms are, respectively, 0 and 1 is tested. If the p-value < 0.05, the model is considered biased; otherwise, the predictions are said to be unbiased. The MZ test revealed that all models (including the proposed hybrid) were biased (with p-values < 0.05) for all four datasets under investigation.
Except for the summer dataset, the hybrid model is less deviant (smaller standard deviation values) from actual outage levels than other models (see Figure 5). The best performance of the hybrid model was recorded in the autumn season, more than in any other season. With the exception of the autumn dataset, RVM dominated all the single models (in terms of small standard deviation values) across all datasets. Overall, the hybrid model provides more accurate results.
The 95% prediction interval coverage probabilities (PICPs) for all models across the four datasets tested were valid (i.e., PICP ∈ [94%, 96%]). At the 95% prediction interval with nominal confidence (PINC), the hybrid produced the narrowest PINAW (i.e., better-calibrated intervals) compared to all other models for the autumn, winter, and spring datasets. The RVM model produced the narrowest prediction interval width (PIW) among the single models across all four datasets, and it outcompeted the hybrid on the summer dataset based on the smallest PINAW. Overall, the hybrid prediction interval was narrowest for the autumn dataset and broadest for the spring dataset. Clearly, the overall analysis shows that the hybrid generated more accurate predictions with less uncertainty and better seasonal robustness.
The DM test was applied to all models. At 5% significance level, each comparison between the hybrid model and the five other models resulted in p-values   < 0.05 (i.e., H 0   Rejected ), indicating a unique and higher predictive accuracy for the hybrid model compared to the five other models.
The assessment of the models' seasonal adaptability over the years (using the Autumn and Autumn 2022 datasets) showed that the stacked hybrid method is the only unbiased model and is superior to all others across all performance evaluation metrics. The RVM model outperformed all individual models on every metric. The DM test showed that the hybrid approach provided the most accurate and distinct forecasts among the six models, demonstrating stability and robustness to seasonal effects.
Overall, the proposed hybrid approach captures higher peaks and variations embedded in power outage data better than all models (see Figure 5). Thus, the proposed and recommended hybrid approach provides better short-term point estimates and the most reliable prediction interval estimates with less uncertainty and strong seasonal adaptability (see Figure 5).

3.2.3. Accuracy–Complexity Trade-Off

Though the proposed hybrid strategy requires more computational power (around 60 s), the resulting increase in accuracy makes the extra time worthwhile (see Table 9). Our proposed hybrid approach improves forecast accuracy over each individual model by at least 40%, which is crucial for making well-informed decisions about resource allocation, power grid risk management, and cost minimisation in the energy space. Thus, the extra training time is immaterial relative to the benefits that come with accurate hourly prediction of unplanned power outages (also see [23,24,25]). Moreover, the use of advanced computer hardware would enable efficient and effective management of this computational cost. The average computational cost does limit the model's applicability to real-time power grid control; nonetheless, the model remains valuable for short-term power grid operational planning.

3.2.4. Ablation Study

Table 10 presents the ablation study results based on a summer dataset, testing the efficacy of each element of the full stacked approach. The study results revealed that each model included in the full stacked or proposed approach improves forecasting performance, confirming the synergetic structure of the proposed strategy. Overall, the best and well-balanced performance was recorded for the full stacked hybrid model as compared to the partial hybrids.

3.2.5. Comparison with State-of-the-Art Methods

The proposed approach shows superiority over the other models, as illustrated by the performance metrics in Table 11. Overall, our approach demonstrated greater robustness and reliability in predicting unplanned power outages across the seasons. This can be attributed to the fact that we used LASSO and RF to reduce complex dimensionality and enhance interpretability; probabilistic sparse Bayesian learning (RVM) to capture complex data behaviour (i.e., nonlinearity, intermittence, etc.) whilst minimising overfitting; WT to decompose and smooth residuals; a boosting approach (AdaBoostRT) to reduce bias in the residual predictions; and a bagging RF to efficiently minimise variance effects when stacking the forecasts. This hybrid approach eliminates redundant calculations, reduces model complexity, and reduces noise and dimensionality prior to feeding in the data, thereby reducing overall execution time (to some extent).

4. Conclusions

To ensure the efficient operation of power grids and the development of effective business strategies for utility managers, efficient and accurate predictions through an appropriate and most suitable methodology are imperative. However, power grid data has a complex structure that possesses various characteristics that make it difficult to be accurately and reliably quantified by a single model. Attempting to model such a multi-dimensional and ever-changing system with a single model may not only be impractical but might also lead to costly, inaccurate predictions and unreliable results.
In our pursuit of a hybrid model well balanced between complexity and accuracy, LASSO and RF were deployed to reduce high-dimensional data complexity and enhance the feature engineering process, while the sparse, probabilistic Bayesian learning of the RVM was leveraged to capture complex data behaviour (such as nonlinearity and random fluctuations) whilst guarding against model overfitting. WTs (through MODWT) decomposed and smoothed the residuals, whose bias was then minimised through the weak-learner boosting inherent in the AdaBoostRT model. Finally, RF was used as a meta-model that combines the RVM, AdaBoostRT, RF, and residual predictions with efficiency and accuracy (minimal error accumulation). Thus, this work proposes a stacked hybrid learning model, RVM-WT-AdaBoostRT-RF, to accurately and reliably predict unplanned power outages in South Africa using power outage data provided by Eskom. The power grid variables used in the study relate to electricity demand, electricity supply, renewables, storage, and outages.
The feasibility of the proposed hybrid approach was validated using RMSE, MAPE, MAE, residual analysis, 95% PINAW, the MZ test, and the DM test against the benchmark Naive, VAR, RF, RVM, and AdaBoostRT. The study’s overall findings showed that the developed hybrid model outperforms all other models (i.e., Naive, VAR, RF, RVM, and AdaBoostRT) across all datasets based on the RMSE, MAPE, MAE, residual analysis, 95% PINAW, and the DM test. In addition to improving model accuracy, the proposed strategy also provided the most accurate interval prediction estimates with a greater amount of certainty and seasonal robustness. Similar results were obtained in the work of [40]. Overall, the findings illustrate that the proposed hybrid approach exhibits varying performance depending on the dataset, the time, or the season of the year. This result concurs with those in [2,40]. The results further showed that the proposed stacking approach is robust and generalisable over short-term periods under various conditions. Furthermore, our analysis found that models achieved better results on unscaled data than on scaled data (using min-max scaling). Thus, scaling did not improve performance. This was consistent with the results in [70,71].
The season-segmented analysis supports the development of resource allocation and power grid management strategies tailored to the needs of each season. The study results will be of great value to grid management teams, utility operators, and other energy infrastructure developers in South Africa, such as Eskom, in creating effective maintenance strategies.
The proposed hybrid performed worst, with consistent underestimation, on the spring dataset, which is likely driven by wind power (one of the main predictors during this season, and a highly variable and irregular one) [72]. The study therefore underscores the importance of weather-related factors, as noted in the literature review; however, such data were not available to the authors. Hence, the main limitation of the study is that it does not fully incorporate weather data such as storms and rainfall. Furthermore, the study focuses on short-term forecasting using season-segmented data.
Future work could test the efficacy of the proposed hybrid approach on substantially larger datasets, involving comparisons across years and locations (including different provinces) and incorporating critical meteorological variables such as wind speed, cloud cover, storm intensity, and temperature. Furthermore, the study employed only the db4 filter in signal decomposition; other wavelet filters, such as the least-asymmetric ("la8") filter, could be tested. Replacing AdaBoostRT with a highly scalable, accurate, and efficient gradient boosting approach such as the light gradient boosting machine (LightGBM) could prove beneficial in terms of efficiency and accuracy. These could then be compared with robust and accurate deep-learning approaches such as GRUs, LSTMs, CNNs, and RNNs.

Author Contributions

Conceptualisation, K.S.S.; methodology, K.S.S.; software, K.S.S.; validation, K.S.S. and E.R.; formal analysis, K.S.S.; investigation, K.S.S.; resources, K.S.S. and E.R.; data curation, K.S.S.; writing—original draft, K.S.S.; writing—review and editing, K.S.S. and E.R.; visualisation, K.S.S.; supervision, E.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Research Foundation rating grant RA201117574546 and by the University of South Africa's Department of Statistics ROF1 funds.

Data Availability Statement

The data presented in this study are openly available from Eskom at https://www.eskom.co.za/dataportal/ (accessed on 11 August 2022).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

I. Algorithms implemented
Algorithm A1: Variable Selection (through LASSO and RF)
1. Load relevant R libraries (glmnet, caret, randomForest)
2. Data cleaning
  • Input: Raw_data (10,223 × 42)
  • Check completeness, correctness, and consistency; handle structural errors; drop irrelevant features; create new features
  • Output: X_new (10,223 × 42), y_new (10,223 × 1)
3. Detect multicollinearity through VIF
  • Input: X_new, y_new
  • Check for multicollinearity using the VIF
  • Output: X_vif (10,223 × 42), y_vif (10,223 × 1)
4. Data division for LASSO training
  • Input: X_vif, y_vif
  • Partition the data into an 80% training set and a 20% test set
  • Output: training set (X_train (8178 × 42), y_train (8178 × 1)); test set (X_test (2045 × 42), y_test (2045 × 1))
5. Variable selection (LASSO)
  • Input: X_train, y_train, α_lasso
  • Through cross-validation, find the optimal λ and fit the LASSO
  • Retain variables with non-zero coefficients (ξ_j ≠ 0)
  • Output: 31 predictors; 1 dependent variable
6. Extract data to represent each season
  • Input: X_selected_var (10,223 × 31), y_selected_var (10,223 × 1)
  • Extract the data representing each season
  • Output: X_season (1464 × 31), y_season (1464 × 1)
7. Select the top 10 variables per season (RF)
  • Input: X_season (1464 × 31), y_season (1464 × 1)
  • Partition the data into an 80% training set [60% train + 20% validation] and a 20% test set; select the top ten variables for each season through RF
  • Output: training set (X_train_retained (1176 = 888 train + 288 val × 10), y_train_retained (1176 × 1)); test set (X_test_retained (288 × 10), y_test_retained (288 × 1)); full retained set (X_retained (1464 × 10), y_retained (1464 × 1))
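The VIF screening in step 3 can be made concrete for the special case of two predictors, where VIF_1 = VIF_2 = 1/(1 − r²) with r the Pearson correlation between them; values above roughly 10 are commonly flagged as collinear. The Python sketch below uses hypothetical demand/supply series (the general multi-predictor VIF requires a full regression of each predictor on the others).

```python
import math

def pearson_r(x, z):
    # Sample Pearson correlation between two equal-length series.
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    cov = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sz = math.sqrt(sum((b - mz) ** 2 for b in z))
    return cov / (sx * sz)

def vif_two_predictors(x, z):
    # Two-predictor special case: VIF = 1 / (1 - r^2).
    r = pearson_r(x, z)
    return 1.0 / (1.0 - r ** 2)

demand = [10.0, 12.0, 11.0, 15.0, 14.0]   # hypothetical predictor
supply = [9.5, 12.5, 11.2, 14.8, 13.9]    # nearly collinear twin
v = vif_two_predictors(demand, supply)     # large VIF -> multicollinearity
```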
Algorithm A2: RF (through bagging)
1. Load relevant R libraries (caret, randomForest, ranger)
2. Train (60%) and validate (20%) using 80% of the retained data
2.1. Tuning hyperparameters
  • Input: X_train_retained (888 × 10), y_train_retained (888 × 1)
  • Tune the RF hyperparameters to find the optimal number of trees (M) and features (m)
  • Output: optimised M, m
  • For each tree i = 1 to M:
    a. Bootstrapped sample generation
      • Input: X_train_retained, y_train_retained
      • Draw samples with replacement from (X_train_retained, y_train_retained) to create a bootstrapped sample (X_bstraped, y_bstraped) = B*
      • Output: B*
    b. Build decision tree
      • Input: B*, m
      • Grow a decision tree, randomly selecting m features at each node
      • Output: decision tree T_i
    c. Build forest
      • Input: T_i
      • Build the forest from all the trees (T_1, T_2, …, T_M)
      • Output: (T_1, T_2, …, T_M) → RF_optimal
2.2. Model validation on the 20% validation set
  • Input: RF_optimal, X_train_retained_val, y_train_retained_val
  • Aggregate the predictions from all trees such that ŷ_train_retained_val = f̂(X_train_retained_val) = (1/M) Σ_{i=1}^{M} T_i(X_train_retained_val) = f̂_RF_val (288 × 1)
  • Compute MAE, MAPE, and RMSE between ŷ_train_retained_val and y_train_retained_val
  • Output: {MAE, MAPE, RMSE} ← performance metrics
3. Predicting using the test data
  • Input: RF_optimal, X_test_retained, y_test_retained
  • Use RF_optimal on X_test_retained to predict y_test_retained such that f̂(X_test_retained) = ŷ_test_retained
  • Output: ŷ_test_retained → f̂_RF_model (288 × 1)
4. Model performance assessment using the test data
  • Input: y_test_retained, f̂_RF_model
  • Calculate MAE, MAPE, RMSE, PINAW, the MZ test, and the DM test
  • Output: {MAE, MAPE, RMSE, PINAW, MZ test, DM test} ← performance metrics
5. Final output
  • Output: f̂_RF_model, performance metrics
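The bagging loop of steps 2.1a-c can be illustrated with a toy Python version in which each "tree" is replaced by a hypothetical stub learner that simply predicts the mean of its bootstrap sample; the forest prediction is then the average over all stubs, mirroring the aggregation formula in step 2.2.

```python
import random

def bootstrap_sample(data, rng):
    # Step a: draw len(data) items with replacement.
    return [rng.choice(data) for _ in range(len(data))]

def bagged_mean_predict(y_train, n_trees=100, seed=42):
    # Steps b-c with a stub learner: each "tree" predicts its bootstrap
    # sample mean, and the forest averages the stub predictions.
    rng = random.Random(seed)
    stubs = []
    for _ in range(n_trees):
        sample = bootstrap_sample(y_train, rng)
        stubs.append(sum(sample) / len(sample))
    return sum(stubs) / len(stubs)

y_train = [3.0, 5.0, 4.0, 6.0, 2.0]   # hypothetical training targets
pred = bagged_mean_predict(y_train)    # close to the sample mean (4.0)
```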
Algorithm A3: Wavelet transform (through MODWT)
1. Load relevant R libraries (waveslim, forecast, caret, kernlab)
2. Fit the entire retained data
  • Input: X_retained (1464 × 10), y_retained (1464 × 1)
  • Compute ŷ_fitted = f̂(X_retained) using the RVM model
  • Output: ŷ_fitted (1464 × 1)
3. Calculate residuals
  • Input: y_retained, ŷ_fitted
  • Compute the residuals y_r = y_retained − ŷ_fitted
  • Output: y_r (1464 × 1)
4. Set wavelet parameters
  • Input: y_r; db4 → wavelet_filter; 2 → n_level; periodic → boundary
  • Output: y_r, wavelet_filter, n_level, boundary
5. Perform wavelet decomposition using MODWT
  • Input: y_r, wavelet_filter, n_level, boundary
  • Decompose y_r into detailed and approximate signals
  • Output: A_2 (1464 × 1) → A (approximate subseries); D_1, D_2 (1464 × 2) → D (detailed subseries)
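The key property of the decomposition in step 5 is additivity: the approximate and detailed subseries sum back to the residual series. The Python sketch below demonstrates that structure with a deliberately simplified two-band split (a centred moving average as the "approximate" band and the remainder as the "detailed" band); it is only a stand-in, not the db4 MODWT computed by waveslim in the paper.

```python
# Illustrative two-band residual split in the spirit of Algorithm A3:
# a centred moving average gives a smooth "approximate" band A, and
# D = y_r - A is the "detailed" band, so A + D reconstructs y_r exactly.
# (The paper uses a db4 MODWT; this split only mimics its additivity.)

def two_band_split(y_r, width=3):
    half = width // 2
    n = len(y_r)
    approx = []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)
        approx.append(sum(y_r[lo:hi]) / (hi - lo))   # local average
    detail = [y - a for y, a in zip(y_r, approx)]    # what the average missed
    return approx, detail

residuals = [0.2, -0.1, 0.4, -0.3, 0.1, 0.0]   # hypothetical RVM residuals
A, D = two_band_split(residuals)
```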
Algorithm A4: AdaBoostRT (through boosting)
1. Load relevant R libraries (ReBoost)
2. Initialise parameters
  • τ_i = 1/n → weights; V → number of weak learners; δ = 0.38 → error threshold
  • Output: initialised τ_i, V, δ
3. Train (60%) and validate (20%) using 80% of the retained data
3.1. Tuning hyperparameters
  • For each i = 1 to V:
    a. Train weak learner
      • Input: X_train_retained, y_train_retained, τ_i
      • Fit a weak learner q_i to the weighted X_train_retained
      • Predict y_train_retained
      • Output: q_i(X_train_retained)
    b. Calculate error function
      • Input: y_train_retained, q_i(X_train_retained)
      • Compare q_i(X_train_retained) with y_train_retained: if the error ρ_i < δ, correct; otherwise, incorrect
      • Output: ρ_i (incorrectly classified)
    c. Update weights
      • Input: y_train_retained, q_i(X_train_retained), τ_i, ρ_i
      • Increase the weights of incorrectly predicted examples
      • Normalise the updated weights
      • Output: τ_{i+1}
3.2. Calculate model weights
  • Input: ρ_i
  • Compute the model weight ψ_i
  • Output: ψ_i (for the i-th weak learner)
3.3. Preserve trees and weights
  • Input: weak learners {q_1, q_2, …, q_V}, weights {ψ_1, ψ_2, …, ψ_V}
  • Store {q_1, q_2, …, q_V} → q and {ψ_1, ψ_2, …, ψ_V} → ψ
  • Output: (q, ψ) → AdaBoostRT_optimal
3.4. Model validation on the 20% validation set
  • Input: AdaBoostRT_optimal, X_train_retained_val, y_train_retained_val
  • Form the weighted ensemble of the predictions from all learners such that f̂(X_train_retained_val) = Σ_{i=1}^{V} ψ_i q_i(X_train_retained_val) = ŷ_train_retained_val = f̂_AdaBoostRT_val (288 × 1)
  • Compute MAE, MAPE, and RMSE
  • Output: {MAE, MAPE, RMSE} ← performance metrics
4. Predicting using the test data
  • Input: AdaBoostRT_optimal, X_test_retained (288 × 10), y_test_retained (288 × 1)
  • Compute f̂(X_test_retained) = Σ_{i=1}^{V} ψ_i q_i(X_test_retained) = ŷ_test_retained
  • Output: ŷ_test_retained → f̂_AdaBoostRT (288 × 1)
5. Model performance assessment using the test data
  • Input: y_test_retained, f̂_AdaBoostRT
  • Calculate MAE, MAPE, RMSE, PINAW, the MZ test, and the DM test
  • Output: {MAE, MAPE, RMSE, PINAW, MZ test, DM test} ← performance metrics
6. Final output
  • Output: f̂_AdaBoostRT, performance metrics
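One boosting round (steps 3.1b-c and 3.2) can be sketched as follows, after Solomatine and Shrestha's AdaBoost.RT update [48]: each example's absolute relative error is compared against the threshold δ, correctly predicted examples are down-weighted by β = ε^n, and the learner receives weight ψ = log(1/β). The data below are hypothetical, and the sketch assumes non-zero actuals and at least one "incorrect" example.

```python
import math

def adaboost_rt_round(y_true, y_pred, weights, delta=0.38, power=2):
    # Absolute relative error per example (assumes non-zero actuals).
    are = [abs(p - t) / abs(t) for p, t in zip(y_pred, y_true)]
    # Weighted error rate over the "incorrect" examples (ARE > delta);
    # assumed non-zero here so that beta and log(1/beta) are defined.
    eps = sum(w for w, e in zip(weights, are) if e > delta)
    beta = eps ** power
    # Down-weight the correctly predicted examples, then normalise.
    new_w = [w * beta if e <= delta else w for w, e in zip(weights, are)]
    total = sum(new_w)
    new_w = [w / total for w in new_w]
    model_weight = math.log(1.0 / beta)   # psi for this weak learner
    return new_w, model_weight

# Hypothetical round: the third example is badly mispredicted.
new_w, psi = adaboost_rt_round([1.0, 1.0, 1.0, 1.0],
                               [1.0, 1.0, 2.0, 1.0],
                               [0.25, 0.25, 0.25, 0.25])
```

After the round, the mispredicted example carries most of the weight, which is what forces the next weak learner to focus on it.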
Algorithm A5: RVM (through Bayesian framework)
1. Load relevant R libraries (kernlab, caret)
2. Initialise parameters
  • Set π (888 × 1) → precision weights; w (888 × 1) → weights; β⁻¹ = σ² → noise term in the regression; ω_0 → bias term
  • Output: π, σ², w, ω_0
3. Train (60%) and validate (20%) using 80% of the retained data
3.1. Tuning hyperparameters
  • Choose a basis and transform the data
    – Input: X_train_retained (888 × 10), y_train_retained (888 × 1)
    – Define a basis function such that K(X_train_retained) is (888 × 888)
    – Output: K(X_train_retained)
  • Fit the model on the training data
    – Input: π, w, ω_0, X_train_retained, K(X_train_retained)
    – Compute f̂(X_train_retained) = Σ_{i=1}^{888} w_i K(X_train_retained, X_i) + ω_0
    – Output: f̂(X_train_retained)
  • Update hyperparameters
    – Input: f̂(X_train_retained), y_train_retained, ω_0
    – Through the marginal likelihood, optimise w and update both σ² and π
    – Prune excessive weights and remove non-relevant vectors, adjusting the precisions π accordingly
    – Output: updated π, σ², w, ω_0
  • Check for convergence
    – Input: convergence criterion Σ_i |w_i^(n+1) − w_i^(n)| < ε_Thresh (based on the weights)
    – While convergence is not achieved, fit the model and update the hyperparameters (repeat from step 3)
    – If convergence is achieved, stop the process
    – Output: convergence decision (True or False); optimised parameters (π, σ², w, ω_0) → RVM_optimal
3.2. Model validation on the 20% validation set
  • Input: X_train_retained_val, RVM_optimal
  • Use RVM_optimal to fit f̂(X_train_retained_val) = ŷ_train_retained_val = f̂_RVM_val (288 × 1)
  • Compute MAE, MAPE, and RMSE
  • Output: ŷ_train_retained_val, {MAE, MAPE, RMSE} ← performance metrics
4. Predicting using the test data
  • Input: RVM_optimal, X_test_retained (288 × 10), y_test_retained (288 × 1)
  • Compute ŷ_test_retained = f̂(X_test_retained) = Σ_{i=1}^{288} w_i K(X_test_retained, X_i) + ω_0 using RVM_optimal
  • Output: ŷ_test_retained (288 × 1)
5. Evaluate model performance on the test data
  • Input: y_test_retained, ŷ_test_retained
  • Compute MAE, MAPE, RMSE, PINAW, the MZ test, and the DM test
  • Output: ŷ_test_retained → f̂_RVM_model, {MAE, MAPE, RMSE, PINAW, MZ test, DM test} ← performance metrics
6. Final output
  • Output: f̂_RVM_model, performance metrics
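The prediction step of Algorithm A5 (step 4) is just a weighted kernel expansion over the retained relevance vectors: f(x) = Σ_i w_i K(x, X_i) + ω_0. The Python sketch below shows that expansion with an RBF kernel; the relevance vectors, weights, bias, and kernel width γ are all hypothetical stand-ins for the quantities produced by the Bayesian training loop in step 3.

```python
import math

def rbf_kernel(x, xi, gamma=0.5):
    # Gaussian RBF kernel K(x, xi) = exp(-gamma * ||x - xi||^2).
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, xi)))

def rvm_predict(x, relevance_vectors, weights, bias, gamma=0.5):
    # f(x) = sum_i w_i * K(x, X_i) + omega_0 over the relevance vectors.
    return bias + sum(w * rbf_kernel(x, xi, gamma)
                      for w, xi in zip(weights, relevance_vectors))

# Hypothetical trained quantities (stand-ins for RVM_optimal).
relevance_vectors = [[0.0, 0.0], [1.0, 1.0]]
weights = [0.7, -0.3]
bias = 0.1

pred = rvm_predict([0.0, 0.0], relevance_vectors, weights, bias)
```

Sparsity is what makes this cheap at prediction time: pruning in step 3 leaves only a few relevance vectors in the sum.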
II. Variable distributions
Figure A1. Variable distributions.
III. Glossary
Available Dispatchable Capacity (Incl Non-Comm Units)—The capacity that is available from all dispatchable generation resources, and includes non-commercial generation, as it is dispatchable energy available to support the system.
CSP—Total contracted Concentrated Solar Power generation.
Dispatchable IPP OCGT—OCGT plant that is owned by an IPP and is dispatched by Eskom National Control.
Gen Unit Hours—The number of hours that one unit at pump storage stations can generate based on the amount of water still available in the dams or the number of hours that one unit at an OCGT power station can generate based on the fuel available at that power station.
GW—Gigawatt = 1000 megawatts.
GWh—Gigawatt-hour = 1000 MWh.
Hydro Generation—Generation from large hydropower stations, and sent out onto the Transmission network.
ILS—Interruptible Load Shed. This is consumer load(s) that can be contractually interrupted without notice or reduced by remote control or on instruction from Eskom National Control. Individual contracts place limitations on usage.
International Exports—Energy that is exported from RSA to neighbouring countries.
International Imports—Energy that is imported into RSA from neighbouring countries.
IOS—Interruption of Supply. It is all contracted as well as mandatory demand reduction resources utilised by Eskom National Control. This includes interruption of supply due to Transmission network faults.
IPP—Independent Power Producers that Eskom has contracts with.
kWh—Kilowatt-hour = 1000 watt-hours.
Load Factor—The ratio of the energy generated over a specific time versus the maximum generating capability over the same period.
MLR—Manual Load Reduction. It is an estimation of the demand that has been reduced due to load shedding and/or curtailment.
MW—Megawatt = 1 million watts.
MWh—Megawatt-hour = 1000 kWh.
Non-Dispatchable Conventional IPP—IPP that uses conventional fuel sources to generate energy. These IPPs are contracted with Eskom but not dispatched by Eskom National Control.
Nuclear Generation—Generation from nuclear power stations, and sent out onto the Transmission network.
OCGT—Open Cycle Gas Turbine. Generation from open cycle gas turbine power stations, and sent out onto the Transmission network. These power stations use diesel as their primary resource.
OCLF—Other Capability Loss Factor of Eskom plant. It is the ratio between the unavailable energy of the units that cannot be dispatched, due to constraints out of the power station management control, over a period compared to the total net installed capacity of all units over the same period.
Other RE—Generation from other smaller contracted renewables (small hydro, biomass, landfill gas, etc.).
PCLF—Planned Capability Loss Factor of Eskom plant. It is the ratio between the unavailable energy of the units that are out on planned maintenance over a period compared to the total net installed capacity of all units over the same period.
Pumped Water Generation—Generation from pumped storage power stations, and sent out onto the Transmission network.
Pumping—During off-peak periods and when the system allows, water is pumped from the bottom dams at pumped storage stations to the top dams so that this water is available to generate again. During this process, energy is used from the Transmission network.
PV—Total contracted Photovoltaic generation.
Residual Demand—The hourly average demand that needs to be supplied by all resources that can be dispatched by Eskom National Control. It includes Eskom generation, international imports, dispatchable IPPs and IOS. Normally expressed in MW.
Residual Energy—The total residual demand that is summated over a period of time. Normally expressed in MWh or GWh.
Residual Forecast—The forecast of what the expected residual demand will be in the future.
RSA Contracted Demand—The hourly average demand that needs to be supplied by all resources that Eskom has contracts with. It is the residual demand including demand supplied by self-dispatched generation (such as the renewables).
RSA Contracted Energy—The total RSA contracted demand that is summated over a period of time. Normally expressed in MWh or GWh.
RSA Contracted Forecast—The forecast of what the expected RSA contracted demand will be in the future.
SCO—Synchronous Condenser Operation. The energy used (MW per hour) to overcome the frictional losses when the plant is used to assist in stabilizing the network by supplying or absorbing reactive power.
Thermal Generation—Generation from coal-fired power stations, and sent out onto the Transmission network.
Total Available Capacity (Incl Non-Comm Units and Renewables)—The capacity that is available from all generation resources that Eskom has contracts with, and includes non-commercial generation, as it is energy available to support the system.
UCLF—Unplanned Capability Loss Factor of Eskom plant. It is the ratio between the unavailable energy of the units that are out on unplanned outages over a period compared to the total net installed capacity of all units over the same period.
Wind—Total contracted Wind generation.

References

  1. Maythem, A.; Maryam, A. A New Load Forecasting Model Considering Planned Load Shedding Effect. Int. J. Energy Sect. Manag. 2018, 13, 149–165. [Google Scholar] [CrossRef]
  2. Onaolapo, A.K.; Carpanen, R.P.; Dorrell, D.G.; Ojo, E.E. A Comparative Assessment of Conventional and Artificial Neural Networks Methods for Electricity Outage Forecasting. Energies 2022, 15, 511. [Google Scholar] [CrossRef]
  3. Oladunni, O.J.; Mpofu, K.; Olanrewaju, O.A. Greenhouse Gas Emissions and Its Driving Forces in the Transport Sector of South Africa. Energy Rep. 2022, 8, 2052–2061. [Google Scholar] [CrossRef]
  4. Chikobvu, D.; Mamba, M. Modelling Emissions from Eskom’s Coal-Fired Power Stations Using Generalised Linear Models. J. Energy S. Afr. 2023, 34, 1–14. [Google Scholar] [CrossRef]
  5. Rakotonirainy, R.G.; Durbach, I.; Nyirenda, J. Considering Fairness in the Load Shedding Scheduling Problem. Orion 2019, 35, 127–144. [Google Scholar] [CrossRef]
  6. Inglesi, R.; Pouris, A. Forecasting Electricity Demand in South Africa: A Critique of Eskom’s Projections. S. Afr. J. Sci. 2010, 106, 50–53. [Google Scholar] [CrossRef]
  7. Pretorius, I.; Piketh, S.; Burger, R. The Impact of the South African Energy Crisis on Emissions. WIT Trans. Ecol. Environ. 2015, 198, 255–264. [Google Scholar]
  8. Jaech, A.; Zhang, B.; Ostendorf, M.; Kirschen, D.S. Real-Time Prediction of the Duration of Distribution System Outages. IEEE Trans. Power Syst. 2018, 34, 773–781. [Google Scholar] [CrossRef]
  9. Pombo-van Zyl, N. Warning: Stage 2 Loadshedding Returns States Eskom. ESI Afr. Afr. Power J. 2020. Available online: https://www.esi-africa.com/industry-sectors/transmission-and-distribution/warning-high-risk-of-loadshedding-returns-states-eskom/ (accessed on 9 October 2023).
  10. Marta, N.; Agnieszka, T. Load Shedding and the Energy Security of Republic of South Africa. J. Pol. Saf. Reliab. Assoc. Summer Saf. Reliab. Semin. 2015, 6, 99–108. Available online: https://bibliotekanauki.pl/articles/2069278 (accessed on 17 June 2023).
  11. IEA. Electricity Market Report—January 2022; IEA: Paris, France, 2022; Available online: https://www.iea.org/reports/electricity-market-report-january-2022 (accessed on 15 September 2023).
  12. Inglesi-Lotz, R. The Impact of Electricity Shortage on South Africa’s Economy. National Science and Technology Forum (NSTF). 2021. Available online: https://nstf.org.za/wp-content/uploads/2022/05/NSTF-2021-Loadshedding-Roula-Inglesi-Lotz.pdf (accessed on 17 December 2023).
  13. Sivhugwana, K.S.; Ranganai, E. An Ensemble Approach to Short-Term Wind Speed Predictions Using Stochastic Methods, Wavelets and Gradient Boosting Decision Trees. Wind 2024, 4, 44–67. [Google Scholar] [CrossRef]
  14. Gordon, R.; Gareth, E. Offshore Wind Energy—South Africa’s Untapped Resource. J. Energy S. Afr. 2020, 31, 26–42. [Google Scholar] [CrossRef]
  15. Fluri, T.P. The Potential of Concentrating Solar Power in South Africa. Energy Policy 2009, 37, 5075–5080. [Google Scholar] [CrossRef]
  16. Bosch, J.; Staffell, I.; Hawkes, A.D. Temporally Explicit and Spatially Resolved Global Offshore Wind Energy Potentials. Energy 2018, 163, 766–781. [Google Scholar] [CrossRef]
  17. Akinbami, O.M.; Oke, S.R.; Bodunrin, M.O. The State of Renewable Energy Development in South Africa: An Overview. Alex. Eng. J. 2021, 60, 5077–5093. [Google Scholar] [CrossRef]
  18. Statistics South Africa (Stats SA). Electricity, Gas and Water Supply Industry Report 2021. Available online: https://www.statssa.gov.za/publications/Report-41-01-02/Report-41-01-022021.pdf (accessed on 15 September 2023).
  19. Pouris, A. Energy and Fuels Research in South African Universities: A Comparative Assessment. Open Inf. Sci. J. 2008, 1, 1–9. Available online: https://repository.up.ac.za/bitstream/handle/2263/5990/Pouris_Energy(2008).pdf?sequence=1 (accessed on 17 October 2024). [CrossRef]
  20. Onaolapo, A.K.; Pillay Carpanen, R.; Dorrell, D.G.; Ojo, E.E. A Comparative Evaluation of Conventional and Computational Intelligence Techniques for Forecasting Electricity Outage. In Proceedings of the Southern African Universities Power Engineering Conference/Robotics and Mechatronics/Pattern Recognition Association of South Africa (SAUPEC/RobMech/PRASA), Potchefstroom, South Africa, 27–29 January 2021; pp. 1–6. [Google Scholar]
  21. Pahwa, A. Effect of Environmental Factors on Failure Rate of Overhead Distribution Feeders. In Proceedings of the IEEE Power Engineering Society General Meeting, Denver, CO, USA, 6–10 June 2004; pp. 691–692. [Google Scholar] [CrossRef]
  22. Tartibu, L.K.; Kabengele, K.T. Forecasting Net Energy Consumption of South Africa Using Artificial Neural Network. In Proceedings of the International Conference on the Industrial and Commercial Use of Energy (ICUE 2018), Cape Town, South Africa, 13–15 August 2018; pp. 16–22. [Google Scholar]
  23. Dahal, K.P. A Review of Maintenance Scheduling Approaches in Deregulated Power Systems. In Proceedings of the International Conference on Power Systems (ICPS 2004), Kathmandu, Nepal, 3–5 November 2004; pp. 565–570. Available online: http://hdl.handle.net/10454/2502 (accessed on 11 March 2023).
  24. Hou, H.; Zhu, S.; Geng, H.; Li, M.; Xie, Y.; Zhu, L.; Huang, Y. Spatial Distribution Assessment of Power Outage under Typhoon Disasters. Int. J. Electr. Power Energy Syst. 2021, 132, 107169. [Google Scholar] [CrossRef]
  25. Mamun, A.A.; Sohel, M.; Mohammad, N.; Haque Sunny, M.S.; Dipta, D.R.; Hossain, E. A Comprehensive Review of the Load Forecasting Techniques Using Single and Hybrid Predictive Models. IEEE Access 2020, 8, 134911–134939. [Google Scholar] [CrossRef]
  26. Oh, S.; Kong, J.; Choi, M.; Jung, J. Data-Driven Prediction Method for Power Grid State Subjected to Heavy-Rain Hazards. Appl. Sci. 2020, 10, 4693. [Google Scholar] [CrossRef]
  27. Han, S.R.; Guikema, S.D.; Quiring, S.M. Improving the Predictive Accuracy of Hurricane Power Outage Forecasts Using Generalized Additive Models. Risk Anal. 2009, 29, 1443–1453. [Google Scholar] [CrossRef]
  28. Kankanala, P.; Das, S.; Pahwa, A. AdaBoost+: An Ensemble Learning Approach for Estimating Weather-Related Outages in Distribution Systems. IEEE Trans. Power Syst. 2014, 29, 359–367. [Google Scholar] [CrossRef]
  29. Kankanala, P.; Pahwa, A.; Das, S. Regression Models for Outages Due to Wind and Lightning on Overhead Distribution Feeders. In Proceedings of the IEEE PES General Meeting 2011, Detroit, MI, USA, 24–28 July 2011; p. 3. [Google Scholar] [CrossRef]
  30. Kankanala, P.; Pahwa, A.; Das, S. Exponential Regression Models for Wind and Lightning Caused Outages on Overhead Distribution Feeders. In Proceedings of the North America Power Symposium (NAPS), Boston, MA, USA, 4–6 August 2011. [Google Scholar] [CrossRef]
  31. Liu, H.; Davidson, R.; Rosowsky, D.; Stedinger, J. Negative Binomial Regression of Electric Power Outages in Hurricanes. J. Infrastruct. Syst. 2005, 11, 258–267. [Google Scholar] [CrossRef]
  32. Das, S.; Kankanala, P.; Pahwa, A. Outage Estimation in Electric Power Distribution Systems Using a Neural Network Ensemble. Energies 2021, 14, 4797. [Google Scholar] [CrossRef]
  33. Guikema, S.D.; Quiring, S.M.; Han, S.R. Prestorm Estimation of Hurricane Damage to Electric Power Distribution Systems. Risk Anal. 2010, 30, 1744–1752. [Google Scholar] [CrossRef] [PubMed]
  34. Rizvi, M. Leveraging Deep Learning Algorithms for Predicting Power Outages and Detecting Faults: A Review. Adv. Res. 2023, 25, 80–88. [Google Scholar] [CrossRef]
  35. Tipping, M.E. Sparse Bayesian Learning and the Relevance Vector Machine. J. Mach. Learn. Res. 2001, 1, 211–244. Available online: http://www.jmlr.org/papers/volume1/tipping01a/tipping01a.pdf (accessed on 3 August 2023).
  36. Tzikas, D.G.; Wei, L.; Likas, A.C.; Yang, Y.; Galatsanos, N.P. A Tutorial on Relevance Vector Machines for Regression and Classification with Applications. EURASIP J. Adv. Signal Process. 2006, 17, 4. [Google Scholar]
  37. Sivhugwana, K.S.; Ranganai, E. Short-Term Wind Speed Prediction via Sample Entropy: A Hybridisation Approach against Gradient Disappearance and Explosion. Computation 2024, 12, 163. [Google Scholar] [CrossRef]
  38. Yuan, S.; Quiring, M.S.; Zhu, L.; Huang, Y.; Wang, J. Development of a Typhoon Power Outage Model in Guangdong, China. Int. J. Electr. Power Energy Syst. 2020, 117, 105711. [Google Scholar] [CrossRef]
  39. Wanik, D.W.; Anagnostou, E.N.; Hartman, B.M.; Frediani, M.E.B.; Astitha, M. Using Machine Learning Methods to Improve Prediction of Weather-Related Power Outages. Electr. Power Syst. Res. 2017, 146, 236–245. [Google Scholar] [CrossRef]
  40. Motepe, S.; Hasan, A.N.; Shongwe, T. Forecasting the Total South African Unplanned Capability Loss Factor Using an Ensemble of Deep Learning Techniques. Energies 2022, 15, 2546. [Google Scholar] [CrossRef]
  41. Bruce, L.M.; Koger, C.H.; Li, J. Dimensionality Reduction of Hyperspectral Data Using Discrete Wavelet Transform Feature Extraction. IEEE Trans. Geosci. Remote Sens. 2002, 40, 2331–2338. [Google Scholar] [CrossRef]
  42. Zou, H.; Hastie, T. Regularization and Variable Selection via the Elastic Net. J. R. Stat. Soc. Ser. B Stat. Methodol. 2005, 67, 301–320. [Google Scholar] [CrossRef]
  43. Ranganai, E.; Mudhombo, I. Variable Selection and Regularization in Quantile Regression via Minimum Covariance Determinant Based Weights. Entropy 2020, 23, 33. [Google Scholar] [CrossRef]
  44. Natras, R.; Soja, B.; Schmidt, M. Ensemble Machine Learning of Random Forest, AdaBoost and XGBoost for Vertical Total Electron Content Forecasting. Remote Sens. 2022, 14, 3547. [Google Scholar] [CrossRef]
  45. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  46. Bühlmann, P. Methods. In Handbook of Computational Statistics; Gentle, J., Härdle, W., Mori, Y., Eds.; Springer: Berlin/Heidelberg, Germany, 2012. [Google Scholar] [CrossRef]
  47. Haijian, S.; Wei, H.; Xing, D.; Song, X. Short-Term Wind Speed Forecasting Using Wavelet Transformation and AdaBoosting Neural Networks in Yunnan Wind Farm. IET Renew. Power Gener. 2016, 11, 374–381. [Google Scholar] [CrossRef]
  48. Solomatine, D.P.; Shrestha, D.L. AdaBoost_RT: A Boosting Algorithm for Regression Problems. In Proceedings of the 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541), Budapest, Hungary, 25–29 July 2004; IEEE: New York, NY, USA, 2004; Volume 2, pp. 1163–1168. [Google Scholar] [CrossRef]
  49. Zhang, P.; Yang, Z. A Robust AdaBoost_RT Based Ensemble Extreme Learning Machine. Math. Probl. Eng. 2015, 2015, 260970. [Google Scholar] [CrossRef]
  50. Li, R.; Sun, H.; Wei, X.; Ta, W.; Wang, H. Lithium Battery State-of-Charge Estimation Based on AdaBoost_RT-RNN. Energies 2022, 15, 6056. [Google Scholar] [CrossRef]
  51. Karatzoglou, A.; Smola, A.; Hornik, K.; Zeileis, A. kernlab—An S4 Package for Kernel Methods in R. J. Stat. Softw. 2004, 11, 1–20. [Google Scholar] [CrossRef]
  52. Fletcher, T. Relevance Vector Machines Explained. Available online: https://www.di.fc.ul.pt/~jpn/r/PRML/chp7/Fletcher_RVM_Explained.pdf (accessed on 14 March 2023).
  53. Hyndman, R.J.; Athanasopoulos, G. Forecasting: Principles and Practice, 2nd ed.; OTexts: Melbourne, Australia, 2021.
  54. Hou, P.S.; Fadzil, L.M.; Manickam, S.; Al-Shareeda, M.A. Vector Autoregression Model-Based Forecasting of Reference Evapotranspiration in Malaysia. Sustainability 2023, 15, 3675.
  55. Bhattarai, B.P.; Paudyal, S.; Luo, Y.; Mohanpurkar, M.; Cheung, K.; Tonkoski, R.; Hovsapian, R.; Myers, K.S.; Zhang, R.; Zhao, P.; et al. Big Data Analytics in Smart Grids: State-of-the-Art, Challenges, Opportunities, and Future Directions. IET Smart Grid 2019, 2, 141–154.
  56. Mohamed, M.A.; Eltamaly, A.M.; Farh, H.M.; Alolah, A.I. Energy Management and Renewable Energy Integration in Smart Grid System. In Proceedings of the 2015 IEEE International Conference on Smart Energy Grid Engineering (SEGE), Oshawa, ON, Canada, 17–19 August 2015; IEEE: New York, NY, USA, 2015.
  57. Arts, L.; van den Broek, E.L. The Fast Continuous Wavelet Transformation (fCWT) for Real-Time, High-Quality, Noise-Resistant Time-Frequency Analysis. Nat. Comput. Sci. 2022, 2, 47–58.
  58. Yarmohammadi, M. A Filter Based Fisher g-Test Approach for Periodicity Detection in Time Series Analysis. Sci. Res. Essays 2011, 6, 3717–3723.
  59. Hong, X.; Mitchell, R.; Di Fatta, G. Simplex Basis Function Based Sparse Least Squares Support Vector Regression. Neurocomputing 2019, 330, 394–402.
  60. Gensler, A. Wind Power Ensemble Forecasting: Performance Measures and Ensemble Architectures for Deterministic and Probabilistic Forecasts. Ph.D. Thesis, University of Kassel, Kassel, Germany, 2018.
  61. Sun, X.; Wang, Z.; Hu, J. Prediction Interval Construction for By-Product Gas Flow Forecasting Using Optimized Twin Extreme Learning Machine. Math. Probl. Eng. 2017, 2017, 12.
  62. Diebold, F.X.; Mariano, R. Comparing Predictive Accuracy. J. Bus. Econ. Stat. 1995, 13, 253–265.
  63. Zhou, Q.; Lv, Z.; Zhang, G. A Combined Forecasting System Based on Modified Multi-Objective Optimization for Short-Term Wind Speed and Wind Power Forecasting. Appl. Sci. 2021, 11, 9383.
  64. Sivhugwana, K.S.; Ranganai, E. Wind Speed Forecasting with Differentially Evolved Minimum-Bandwidth Filters and Gated Recurrent Units. Forecasting 2025, 7, 27.
  65. Hasanat, S.M.; Ullah, K.; Yousaf, H.; Munir, K.; Abid, S.; Bokhari, S.; Aziz, M.M.; Naqvi, S.F.M.; Ullah, Z. Enhancing Short-Term Load Forecasting with a CNN-GRU Hybrid Model: A Comparative Analysis. IEEE Access 2024, 12, 184132–184141.
  66. Alhussein, M.; Aurangzeb, K.; Haider, S.I. Hybrid CNN-LSTM Model for Short-Term Individual Household Load Forecasting. IEEE Access 2020, 8, 180544–180557.
  67. Marino, D.L.; Amarasinghe, K.; Manic, M. Building Energy Load Forecasting Using Deep Neural Networks. In Proceedings of the 42nd Annual Conference of the IEEE Industrial Electronics Society (IECON), Florence, Italy, 23–26 October 2016; IEEE: Piscataway, NJ, USA, 2016.
  68. Tan, Z.; Zhang, J.; He, Y.; Zhang, Y.; Xiong, G.; Liu, Y. Short-Term Load Forecasting Based on Integration of SVR and Stacking. IEEE Access 2020, 8, 227719–227728.
  69. Li, S.; Chen, W. A Study on Interpretable Electric Load Forecasting Model with Spatiotemporal Feature Fusion Based on Attention Mechanism. Technologies 2025, 13, 219.
  70. Ahsan, M.M.; Mahmud, M.A.P.; Saha, P.K.; Gupta, K.D.; Siddique, Z. Effect of Data Scaling Methods on Machine Learning Algorithms and Model Performance. Technologies 2021, 9, 52.
  71. Pinheiro, J.M.H.; de Oliveira, S.V.B.; Silva, T.H.S.; Saraiva, P.A.R.; de Souza, E.F.; Godoy, R.V.; Becker, M. The Impact of Feature Scaling in Machine Learning: Effects on Regression and Classification Tasks. arXiv 2025, arXiv:2506.08274.
  72. Wright, M.A. Wind Speed Climatology in the Northern, Western, and Eastern Capes of South Africa: Implications for Wind Power. Ph.D. Thesis, University of the Witwatersrand, Johannesburg, South Africa, 2021.
Figure 1. Hourly unplanned outage levels plot for the period 1 March 2021 to 30 April 2022 (source: own image).
Figure 2. Schematic representation of the proposed stacking hybrid RVM-WT-AdaBoostRT-RF model (source: own image).
Figure 3. The time plot, density plot, boxplot, and Q-Q plot for power outage data for Autumn (a), Winter (b), Spring (c), Summer (d), and Autumn 2022 (e) datasets (source: own image). Blue lines represent Q-Q lines and sky-blue boxes indicate interquartile ranges.
Figure 4. Level 2 DB4 wavelet decomposition of the RVM residuals for Autumn (top left panel), Winter (top right panel), Spring (middle left panel), Summer (middle right panel), and Autumn 2022 (bottom centre panel) datasets (source: own image).
Figure 5. Comparison of models’ predictions and actual power outage levels for Autumn (top left panel), Winter (top right panel), Spring (middle left panel), Summer (middle right panel), and Autumn 2022 (bottom centre panel) datasets (source: own image).
Figure 6. Box plot comparison of models’ residuals for Autumn (top left panel), Winter (top right panel), Spring (middle left panel), Summer (middle right panel), and Autumn 2022 (bottom centre panel) datasets (source: own image).
Table 3. Power grid data description.

| Variable | Keynote |
|---|---|
| x; y | Independent variable; dependent variable |
| x_ORFL; x_RF; x_RSA.CF; x_DG; x_IE; x_RD; x_RSA.CD | ORFL = Original Residual Forecast Before Lockdown; RF = Residual Forecast; RSA.CF = Republic of South Africa (RSA) Contracted Forecast; DG = Dispatchable Generation; IE = International Exports; RD = Residual Demand; RSA.CD = RSA Contracted Demand |
| x_IM; x_TG; x_NG; x_EGG; x_E.OCGT.G; x_HWG; x_ILSU; x_MLR; x_IOS; x_D.IPP.OCGT; x_E.GSCO; x_E.OCGT.SCO; x_PWSCO.P; x_PS; x_IEC | IM = International Imports; TG = Thermal Generation; NG = Nuclear Generation; EGG = Eskom Gas Generation; E.OCGT.G = Eskom Open Cycle Gas Turbine Generation; HWG = Hydro Water Generation; PWG = Pumped Water Generation; ILSU = Interruptible Load Shed Usage; MLR = Manual Load Reduction; IOS = Interruption of Supply excl. ILS and MLR; D.IPP.OCGT = Dispatchable Independent Power Producer Eskom Open Cycle Gas Turbine; E.GSCO = Eskom Gas Synchronous Condenser Operation; E.OCGT.SCO = Eskom Open Cycle Gas Turbine Synchronous Condenser Operation; PWSCO.P = Pumped Water Synchronous Condenser Operation Pumping; PS = Pump Storage; IEC = Installed Eskom Capacity |
| x_DGUH; x_PGUH; x_IGUH | DGUH = Drakensberg Generation Unit Hours; PGUH = Palmiet Generation Unit Hours; IGUH = Ingula Generation Unit Hours |
| x_WIND; x_PV; x_CSP; x_ORE; x_TRE; x_WIC; x_PVIC; x_CSPIC; x_OREIC; x_TREIC | PV = Photovoltaic; CSP = Concentrated Solar Power; ORE = Other Renewable; TRE = Total Renewable; WIC = Wind Installed Capacity; PVIC = PV Installed Capacity; CSPIC = CSP Installed Capacity; OREIC = Other Renewable Installed Capacity; TREIC = Total Renewable Installed Capacity |
| x_TPCLF; x_TUCLF; x_TOCLF; x_E.GSCO; x_lag1; x_lag2; x_lag24; x_NCS | TPCLF = Total Planned Capability Loss Factor of Eskom plant; TUCLF = Total Unplanned Capability Loss Factor of Eskom plant; TOCLF = Total Other Capability Loss Factor of Eskom plant; lag 1 = TUCLF.OCLF 1 h ago (captures immediate fluctuations); lag 2 = TUCLF.OCLF 2 h ago (captures short-term trends); lag 24 = TUCLF.OCLF 24 h ago (captures daily patterns); NCS = Non-comm sentout |
| y_TUCLF.OCLF | TUCLF.OCLF = Total unplanned power outage including TOCLF |
Table 4. Sample breakdown for model training and testing.

| Dataset | Date | Sample | Training (80%) | Test (20%) |
|---|---|---|---|---|
| Autumn | 1 March–30 April 2021 | 1464 | 1176 | 288 |
| Winter | 1 June–31 July 2021 | 1464 | 1176 | 288 |
| Spring | 1 September–31 October 2021 | 1464 | 1176 | 288 |
| Summer | 1 December 2021–31 January 2022 | 1488 | 1195 | 293 |
| Autumn 2022 | 1 March 2022–30 April 2022 | 1464 | 1176 | 288 |
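The split in Table 4 is chronological: the test block is strictly later than the training block, so no future information leaks into training. A minimal sketch (note that the paper's reported 1176/288 split of 1464 observations is slightly above a strict 80% cut, so the exact cut index is an assumption):

```python
# Chronological train/test split, as in Table 4. Shuffling would leak
# future values into the training set, so the series is cut in order.

def chronological_split(series, train_frac=0.8):
    """Return (train, test) with the test block strictly later in time."""
    cut = round(len(series) * train_frac)
    return series[:cut], series[cut:]

hourly = list(range(1464))  # two months of hourly observations
train, test = chronological_split(hourly)
```

A strict 80% cut of 1464 observations gives 1171/293; the paper's 1176/288 suggests a slightly different boundary (e.g., a whole-day cut), which is not specified.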
Table 5. Model breakdown and contribution to the proposed RVM-WT-AdaBoostRT-RF.

| Model | Contribution to the Strategy |
|---|---|
| LASSO | Provides regularisation, variable selection, and dimension reduction, so that unplanned power outages are predicted from only the most relevant and significant variables. |
| RVM | A sparse Bayesian learning technique: a probabilistic framework that requires fewer support vectors while delivering accuracy and generalisation similar to SVMs. RVMs capture complex data behaviour (random fluctuations, nonlinearity, intermittence, etc.) while guarding against overfitting, making them well suited to regression on heterogeneous power grid data. |
| WT | Compatible with both the time and frequency domains, WTs effectively suppress noise and reveal complex patterns in power outage data. The RVM residuals are therefore decomposed with a WT, which is efficient and handles nonstationary fluctuations well; the resulting subseries are statistically better behaved and easier to predict, enhancing the model's predictive power. |
| AdaBoostRT | Handles the highly volatile residuals: taking the decomposed subseries as input, AdaBoostRT forecasts each residual subseries accurately while keeping model bias to a minimum. |
| RF | Besides selecting the top 10 most important variables (pivotal for robust season-specific modelling), RFs capture nonlinearity efficiently while preventing overfitting and minimising variance. RF is therefore used as the meta-model that ensembles the RVM, RF, AdaBoostRT, and residual forecasts into the final forecast, minimising error accumulation and enhancing overall robustness. |
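The stacking step in Table 5 treats the base models' forecasts as meta-features for a second-stage learner. The paper's meta-model is an RF; in the dependency-free sketch below a least-squares blender stands in for it (an assumption made purely so the stacking layout is visible), with two toy base forecasts:

```python
# Minimal stacking sketch. The paper blends RVM, RF, AdaBoostRT, and
# residual forecasts with an RF meta-model; here an ordinary
# least-squares blender stands in so the example needs no libraries.

def fit_blend_weights(base_preds, y):
    """Solve the 2x2 normal equations (X^T X) w = X^T y for two base models."""
    x1, x2 = base_preds
    a = sum(v * v for v in x1)
    b = sum(u * v for u, v in zip(x1, x2))
    d = sum(v * v for v in x2)
    r1 = sum(u * v for u, v in zip(x1, y))
    r2 = sum(u * v for u, v in zip(x2, y))
    det = a * d - b * b
    return ((d * r1 - b * r2) / det, (a * r2 - b * r1) / det)

def blend(base_preds, w):
    """Second-stage forecast: weighted combination of base forecasts."""
    return [w[0] * u + w[1] * v for u, v in zip(*base_preds)]

# Toy targets and two imperfect base forecasts (one biased low, one high).
y_true = [10.0, 12.0, 11.0, 13.0]
m1 = [9.0, 11.0, 10.0, 12.0]
m2 = [11.0, 13.0, 12.0, 14.0]
w = fit_blend_weights((m1, m2), y_true)
y_hat = blend((m1, m2), w)
```

Here the blender learns equal weights (0.5, 0.5), cancelling the opposite biases of the two base models — the same intuition that motivates stacking heterogeneous learners in the hybrid.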
Table 6. Model parameter settings.

| Model | Libraries | Method | Parameter | Optimal Range |
|---|---|---|---|---|
| LASSO | glmnet | Variable selection | lambda; family; nlambda | 0–2; "gaussian"; 100–500 |
| RF | caret, ranger, randomForest | Bagging ensemble | mtry; ntree; nodesize | 1–10; 100–1000; 1–15 |
| RVM | kernlab (rvm) | Bayesian inference | kernel; sigma; degree | ("anovadot", "rbfdot"); 0–2; 1–2 |
| AdaBoostRT | ReBoost (AdaBoostRT) | Boosting ensemble | thr; power; t_final | 0.001–0.3; 0–2; 30–500 |
| WT | waveslim (modwt) | Signal decomposition (noise reduction) | wf; n.levels; boundary | 'db4'; 2; 'periodic' |
| VAR | vars (VAR) | Autoregression | lag order p | 1–3 |
| Hybrid | — | Stacked | — | — |
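The WT row in Table 6 specifies a level-2 DB4 MODWT (waveslim's modwt in R) applied to the RVM residuals. The dependency-free sketch below substitutes the simpler orthonormal Haar filter for DB4 (an assumption, chosen so the analysis/synthesis steps stay short) to show how a level-2 decomposition splits a residual series into one coarse approximation and two detail subseries, and reconstructs it exactly:

```python
import math

# Level-2 orthonormal Haar DWT sketch. The paper uses a level-2 DB4
# MODWT; Haar is substituted here so the decomposition logic is
# visible without external libraries.

S = math.sqrt(2.0)

def haar_step(x):
    """One analysis step: pairwise approximations and details."""
    approx = [(x[i] + x[i + 1]) / S for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / S for i in range(0, len(x), 2)]
    return approx, detail

def inverse_step(approx, detail):
    """One synthesis step: recombine approximations and details."""
    out = []
    for a, d in zip(approx, detail):
        out += [(a + d) / S, (a - d) / S]
    return out

def dwt2(x):
    """Two-level decomposition: returns (A2, D2, D1)."""
    a1, d1 = haar_step(x)
    a2, d2 = haar_step(a1)
    return a2, d2, d1

def idwt2(a2, d2, d1):
    """Perfect reconstruction from the level-2 subseries."""
    return inverse_step(inverse_step(a2, d2), d1)

residuals = [4.0, 6.0, 10.0, 12.0, 14.0, 14.0, 16.0, 18.0]
a2, d2, d1 = dwt2(residuals)
recon = idwt2(a2, d2, d1)
```

In the hybrid, each subseries (the smoother A2 and the noisier D1, D2) is forecast separately by AdaBoostRT; longer filters such as DB4 and the shift-invariant MODWT refine this idea but follow the same split-forecast-recombine pattern.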
Table 7. Summary statistics for the datasets (in MW).

| Dataset | Min | Q1 | Median | Mean | Q3 | Max | Std.Dev | Kurtosis | Skewness |
|---|---|---|---|---|---|---|---|---|---|
| Autumn | 8410 | 11,184 | 11,931 | 11,863 | 12,619 | 14,867 | 986.3464 | 0.0874 | −0.4222 |
| Winter | 8957 | 10,914 | 11,754 | 12,076 | 13,303 | 15,862 | 1562.242 | −0.7764 | 0.3735 |
| Spring | 9819 | 11,966 | 13,044 | 13,055 | 14,193 | 16,573 | 1503.281 | −0.7384 | 0.0026 |
| Summer | 10,144 | 12,823 | 14,219 | 13,928 | 14,924 | 17,558 | 1396.025 | −0.5362 | −0.3862 |
| Autumn 2022 | 10,981 | 12,829 | 13,749 | 13,793 | 14,676 | 17,022 | 1245.443 | −0.6943 | 0.1651 |
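The kurtosis and skewness columns in Table 7 come from the usual third and fourth standardised moments. A minimal sketch using population-moment estimators (an assumption — R packages sometimes apply bias corrections, so values can differ slightly):

```python
# Moment-based skewness and excess kurtosis, as reported in Table 7.
# Population-moment estimators are assumed here.

def skewness(x):
    """Third standardised moment: sign indicates tail direction."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    return m3 / m2 ** 1.5

def excess_kurtosis(x):
    """Fourth standardised moment minus 3 (0 for a Gaussian)."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m4 / m2 ** 2 - 3.0

sym = [1.0, 2.0, 3.0, 4.0, 5.0]
skew_val = skewness(sym)           # 0 for a symmetric sample
kurt_val = excess_kurtosis(sym)    # negative: flatter than Gaussian
```

The negative kurtosis values in Table 7 (all seasons except Autumn 2021) indicate flatter-than-Gaussian outage distributions, consistent with the boxplots in Figure 3.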
Table 8. Performance indicators for the developed models.

Point forecast evaluation

RMSE (MW)

| Model | Autumn | Winter | Spring | Summer | Autumn 2022 |
|---|---|---|---|---|---|
| Hybrid | **262.6653** | **264.5506** | **394.6098** | **379.0801** | **260.6709** |
| RF | 403.1206 | 326.5305 | 740.894 | 678.7496 | 383.0104 |
| RVM | 414.4714 | 302.2173 | 549.6189 | 608.635 | 301.7799 |
| AdaBoostRT | 390.2795 | 318.0112 | 714.8113 | 637.2507 | 359.4349 |
| VAR | 2491.183 | 942.7002 | 1100.073 | 1201.092 | 812.6005 |
| Naive | 1214.92 | 1027.169 | 1075.973 | 3252.646 | 993.0945 |

MAE (MW)

| Model | Autumn | Winter | Spring | Summer | Autumn 2022 |
|---|---|---|---|---|---|
| Hybrid | **201.1949** | **198.5543** | **288.1561** | **273.6507** | **190.9103** |
| RF | 287.4192 | 253.7929 | 538.0478 | 519.6454 | 289.1329 |
| RVM | 298.4295 | 232.7954 | 385.2204 | 543.8472 | 227.4744 |
| AdaBoostRT | 252.9515 | 239.721 | 500.6655 | 469.6258 | 264.6619 |
| VAR | 2294.874 | 800.4576 | 893.751 | 990.4461 | 695.3962 |
| Naive | 986.413 | 908.9877 | 902.3139 | 3013.444 | 781.4761 |

MAPE (%)

| Model | Autumn | Winter | Spring | Summer | Autumn 2022 |
|---|---|---|---|---|---|
| Hybrid | **1.8197** | **1.6408** | **1.9897** | **2.2190** | **1.2744** |
| RF | 2.5850 | 2.1058 | 3.7770 | 4.0968 | 1.9376 |
| RVM | 2.9030 | 1.9925 | 2.6684 | 4.6424 | 1.5185 |
| AdaBoostRT | 2.2760 | 1.9778 | 3.4994 | 3.7291 | 1.7754 |
| VAR | 25.1661 | 6.7485 | 6.3391 | 7.9346 | 4.5550 |
| Naive | 8.2233 | 7.4672 | 6.2169 | 19.2540 | 5.4756 |

Residual analysis

Standard deviation (MW)

| Model | Autumn | Winter | Spring | Summer | Autumn 2022 |
|---|---|---|---|---|---|
| Hybrid | **257.5268** | **257.9521** | **376.4643** | 379.1566 | **261.116** |
| RF | 368.4571 | 322.784 | 584.5953 | 474.7601 | 326.8649 |
| RVM | 369.7274 | 283.4306 | 445.0868 | **279.8739** | 301.1337 |
| AdaBoostRT | 374.4322 | 318.5555 | 606.9506 | 533.957 | 325.5703 |
| VAR | 970.9904 | 931.584 | 1080.136 | 1188.798 | 722.2145 |
| Naive | 1056.375 | 1007.025 | 1070.423 | 1226.374 | 767.169 |

Skewness/Error direction

| Model | Autumn | Winter | Spring | Summer | Autumn 2022 |
|---|---|---|---|---|---|
| Hybrid | Underestimate | Underestimate | Underestimate | Overestimate | Underestimate |
| RF | Overestimate | Underestimate | Underestimate | Overestimate | Underestimate |
| RVM | Underestimate | Underestimate | Underestimate | Underestimate | Underestimate |
| AdaBoostRT | Overestimate | Overestimate | Underestimate | Overestimate | Underestimate |
| VAR | Underestimate | Underestimate | Underestimate | Underestimate | Underestimate |
| Naive | Overestimate | Underestimate | Underestimate | Underestimate | Underestimate |

Bias test (conclusion)

| MZ * | Autumn | Winter | Spring | Summer | Autumn 2022 |
|---|---|---|---|---|---|
| Hybrid | Biased | Biased | Biased | Biased | **Unbiased** |
| RF | Biased | Biased | Biased | Biased | Biased |
| RVM | Biased | Biased | Biased | Biased | Biased |
| AdaBoostRT | Biased | Biased | Biased | Biased | Biased |
| VAR | Biased | Biased | Biased | Biased | Biased |
| Naive | Biased | Biased | Biased | Biased | Biased |

Prediction intervals evaluation

95% PINAW

| Model | Autumn | Winter | Spring | Summer | Autumn 2022 |
|---|---|---|---|---|---|
| Hybrid | **21.2277** | **24.2517** | **30.2351** | 30.7052 | **30.0080** |
| RF | 25.8908 | 27.8732 | 40.7178 | 32.6159 | 35.2706 |
| RVM | 25.3539 | 27.8480 | 32.4444 | **17.4738** | 37.7469 |
| AdaBoostRT | 27.9150 | 31.2775 | 43.6040 | 38.3266 | 36.9853 |
| VAR | 68.7904 | 70.4972 | 79.6423 | 70.5827 | 57.4267 |
| Naive | 86.1431 | 84.3254 | 92.8462 | 82.2576 | 79.2907 |

Predictive accuracy evaluation: Hybrid vs. individual models

| DM ** | Autumn | Winter | Spring | Summer | Autumn 2022 |
|---|---|---|---|---|---|
| RF | H₀ rejected | H₀ rejected | H₀ rejected | H₀ rejected | H₀ rejected |
| RVM | H₀ rejected | H₀ rejected | H₀ rejected | H₀ rejected | H₀ rejected |
| AdaBoostRT | H₀ rejected | H₀ rejected | H₀ rejected | H₀ rejected | H₀ rejected |
| VAR | H₀ rejected | H₀ rejected | H₀ rejected | H₀ rejected | H₀ rejected |
| Naive | H₀ rejected | H₀ rejected | H₀ rejected | H₀ rejected | H₀ rejected |

Keynote: * MZ test: p-values < 0.05 imply that the model is biased; otherwise the model is unbiased. ** DM test: reject the null hypothesis of equal predictive accuracy if p-value < 0.05. Bold = best model.
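Table 8's interval metrics follow the standard definitions: PICP is the share of observations falling inside their intervals, and PINAW is the mean interval width normalised by the target range (both in %). A minimal sketch under those assumed definitions:

```python
# Prediction-interval metrics as used in Table 8 (and Table 10).
# Standard definitions are assumed: PICP = coverage probability,
# PINAW = mean width normalised by the observed target range.

def picp(y, lower, upper):
    """Percentage of observations inside their prediction intervals."""
    hits = sum(1 for v, lo, hi in zip(y, lower, upper) if lo <= v <= hi)
    return 100.0 * hits / len(y)

def pinaw(y, lower, upper):
    """Mean interval width as a percentage of the target range."""
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(y)
    return 100.0 * mean_width / (max(y) - min(y))

y_obs = [10.0, 12.0, 15.0, 20.0]
lower = [ 9.0, 11.0, 14.0, 21.0]   # last interval misses its target
upper = [11.0, 13.0, 16.0, 23.0]
coverage = picp(y_obs, lower, upper)
width = pinaw(y_obs, lower, upper)
```

A good interval forecaster keeps PICP near the nominal 95% while driving PINAW down; the hybrid's lower PINAW at comparable coverage is what the table rewards.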
Table 9. Trade-off between accuracy and complexity (excluding Autumn 2022 dataset).

| Model | Computational Time Interval (s) | Average Computational Time (s) | Hybrid vs. Single-Model Time Difference (s) | %ΔRMSE |
|---|---|---|---|---|
| RF | 20–30 | 25 | 30 | 61 |
| RVM | 30–40 | 35 | 20 | 43 |
| AdaBoostRT | 30–40 | 35 | 20 | 55 |
| VAR | 5–10 | 7.5 | 47.5 | 375 |
| Naive | 5–10 | 7.5 | 47.5 | 395 |
| Hybrid | 50–60 | 55 | – | – |
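The %ΔRMSE column in Table 9 compares each single model against the hybrid. The exact definition is not spelled out; the sketch below assumes the conventional relative RMSE increase over the hybrid, in percent:

```python
# %ΔRMSE as assumed for Table 9: relative RMSE increase of a single
# model over the hybrid benchmark. The paper reports only the values,
# so this formula is an interpretation, not a quoted definition.

def pct_delta_rmse(rmse_single, rmse_hybrid):
    """Relative RMSE increase of a single model over the hybrid (%)."""
    return 100.0 * (rmse_single - rmse_hybrid) / rmse_hybrid

delta = pct_delta_rmse(450.0, 300.0)  # single model 50% worse
```

Under this reading, VAR's 375% and Naive's 395% quantify how much accuracy is sacrificed for their 47.5 s speed advantage over the hybrid.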
Table 10. Ablation study using the summer dataset.

| Model | Blender | RMSE/MW | MAPE/% | PICP/% | PINAW/% | MAD/MW | %ΔRMSE |
|---|---|---|---|---|---|---|---|
| RVM + AdaBoostRT | RF | 458.6394 | 2.6951 | 95.5631 | 31.4419 | 376.9945 | 21 |
| RVM + AdaBoostRT + ζ̂_f | RF | 430.3829 | 2.5669 | 95.2218 | 31.7904 | 352.7375 | 14 |
| RF + AdaBoostRT | RF | 466.5318 | 2.6454 | 95.9044 | 33.4339 | 304.2609 | 23 |
| RF + AdaBoostRT + ζ̂_f | RF | 436.3922 | 2.4485 | 94.8806 | 31.6265 | 287.6163 | 15 |
| RVM + RF + ζ̂_f | RF | 399.4977 | 2.7087 | 95.5631 | 25.5820 | 440.9243 | 5 |
| RVM + AdaBoostRT + RF + ζ̂_f | Average | 3337.8813 | 5.689 | 95.2218 | 26.8902 | 5001.13 | 781 |
| Full stacked (Hybrid) | RF | **379.0801** | **2.2190** | 95.9044 | 30.7052 | 292.8671 | – |

Keynote: MAD = Median absolute deviation (measures prediction sharpness); smaller MAD values imply a better model. ζ̂_f = residual forecast. Bold = best model.
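Table 10's keynote uses MAD as a sharpness measure. A minimal sketch, interpreting MAD as the median of absolute forecast errors (an assumption — the keynote does not specify the exact estimator):

```python
import statistics

# MAD as read from Table 10's keynote: median absolute deviation of
# the forecast errors, a robust measure of prediction sharpness.
# The exact estimator is assumed, not quoted from the paper.

def mad(y_true, y_pred):
    """Median of absolute forecast errors (robust to outliers)."""
    return statistics.median(abs(a - b) for a, b in zip(y_true, y_pred))

mad_demo = mad([10.0, 12.0, 15.0], [9.0, 14.0, 15.0])  # errors 1, 2, 0
```

Unlike RMSE, MAD is insensitive to a few large misses, which is why the averaging blender's row can combine an extreme RMSE with a similarly extreme MAD: both its typical and worst errors blow up.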
Table 11. Comparison of the proposed method with state-of-the-art methods from the literature.

| Model | RMSE (MW) | MAPE (%) | Citation | Data Description |
|---|---|---|---|---|
| Hybrid (Proposed) | **260.67** | **1.27** | Present | South Africa, outage data (UCLF) (MW), 2021–2022 |
| CNN-LSTM | 381.66 | 2.15 | [65,66] | American Electric Power (AEP) and ISO New England (ISONE) load data (MW), 2014 |
| LSTM | 975.00 | 5.11 | [65,67] | AEP and ISONE load data (MW), 2014 |
| SVR-stacking | 583.77 | 1.75 | [68] | Spain, load data (MW), 2015–2018 |
| XGBoost-stacking | 1087.45 | 3.62 | [68] | Spain, load data (MW), 2015–2018 |
| TCN-GRU-Attention | 1008.23 | 8.80 | [69] | Australian Energy Market Operator (AEMO) load data (MW), 2006–2011 |

Keynote: CNN = Convolutional neural network; TCN = Temporal convolutional network; Bold = best model.