Article

Short-Term Forecasting of Unplanned Power Outages Using Machine Learning Algorithms: A Robust Feature Engineering Strategy Against Multicollinearity and Nonlinearity

by
Khathutshelo Steven Sivhugwana
* and
Edmore Ranganai
Department of Statistics, University of South Africa, Florida Campus, Johannesburg 1709, South Africa
*
Author to whom correspondence should be addressed.
Energies 2025, 18(18), 4994; https://doi.org/10.3390/en18184994
Submission received: 20 August 2025 / Revised: 17 September 2025 / Accepted: 18 September 2025 / Published: 19 September 2025

Abstract

Efficient power grid operations and effective business strategies require accurate prediction of power outages. However, predicting outages is a difficult task due to the large amount of heterogeneous, random, intermittent, and non-linear power grid data characterised by highly complex variable relationships. Attempting to simultaneously quantify these characteristics using a conventional single (linear or nonlinear) model may lead to inaccurate and costly results. To address this, we propose a hybrid RVM-WT-AdaBoostRT-RF framework using power grid data from the Electricity Supply Commission (Eskom) of South Africa. To achieve model interpretability, the least absolute shrinkage and selection operator (LASSO) is first applied to remedy the adverse effects of multicollinearity through regularisation and variable selection. Secondly, a random forest (RF) is used to select the top 10 most influential variables for each season for further analysis. A relevance vector machine (RVM) captures complex nonlinear relationships separately for each season, while the wavelet transform (WT) decomposes residuals generated from RVM into different frequency subseries (with reduced noise). These subseries are predicted with minimal bias using AdaBoost with regression and threshold (AdaBoostRT). Finally, we stack RVM, AdaBoostRT, RF, and residual individual predictions using RF as a meta-model to produce the final forecast with minimal error accumulation and efficiency. The comparative study, based on point forecast metrics, the Diebold-Mariano test, and prediction interval widths, shows that the proposed model outperforms vector autoregressive (VAR), RF, AdaBoostRT, RVM, and Naïve models. The study results can be utilised for optimising resource allocation, effective power grid management, and customer alerts.

1. Introduction

1.1. Context

In developing regions such as Africa, an adequate and planned electricity supply should be at the core of economic development strategies [1,2]. Since 2008, South Africa has experienced stronger energy demand growth than other African regions, leading to power imbalances [3,4]. The work of [5,6] attributes this to several factors. Firstly, there was an increase in the number of houses connected to the grid, owing to the free basic electricity policy of 2001 and population growth. Secondly, there is ageing infrastructure, characterised by low maintenance because of government delays in funding coal power stations in the early 2000s. Thirdly, there was a steady increase in electricity demand of around 50% between 1994 and 2008, attributable to the lifting of international sanctions, which led to economic expansion [5]. Finally, the development of the two largest coal-based power plants in the country—Medupi in 2007 and Kusile in 2008—as well as maintenance activities for existing power plants, was frequently postponed. This led to unit malfunctions and reduced reserve margins [5,7]. In the work of [8], the authors argued that decent design and maintenance of electricity infrastructure can reduce power outages. It was further shown in [7] that maintenance delays increase the unplanned capability loss factor (UCLF) (i.e., unplanned outages) [9], leading to higher load-shedding stages.
Due to persistent power imbalances, the Electricity Supply Commission (Eskom), which produces most (at least 90%) of the electricity in South Africa, has been implementing country-wide load-shedding since late 2007 to avoid total grid collapse [6,10]. The frequent power outages associated with these power capacity constraints in South Africa have reduced foreign direct investment (FDI) and increased production costs, thereby compounding socio-economic challenges in the country [11,12]. To alleviate this challenge, a substitute in the form of clean energy has been recommended, given the abundance of solar and wind energy resources [12,13,14,15,16]. Nonetheless, coal remains the primary source of electricity. For instance, coal accounted for more than 85% of electricity in 2021, while clean energy sources contributed only 6% [4,17,18]. According to [19], the lack of academic studies in the energy space contributed to the ongoing energy crisis in South Africa. Academic research into solar energy has gained traction in the country. However, very little has been done on power grid management and wind energy [13] (also see [2,20]).

1.2. Motivation

The reliability and steadiness of the power grid system depend on various factors, namely, intrinsic factors (such as the life span of equipment/power plants, equipment defects, and internal maintenance), external factors (such as fluctuations in weather patterns), and human error factors (such as vandalism of infrastructure) [21]. The authors of [2] showed an interdependent relationship between power outage events, the stability of the grid system, and the uninterruptible power supply. In the work of [22], the authors argued that grid outages occur due to higher electricity demand than the system can supply (often due to increased connections to the grid or increased economic development activities) [2,5]. In turn, this results in higher power grid maintenance costs [2]. The work of [2,5,21] articulated that accurate predictions of power outages are not always available in advance for businesses, the general public, and utility operators to plan for the associated economic and maintenance costs. In this regard, a reliable power outage predictive model is essential for electricity supply reliability and effective power grid system management [23,24]. Accurate power outage predictive models can improve proactive decisions to prevent power grid strains and outages [2,25]. However, the computational complexity and operational integration of power grid data into power system planning and operational decision frameworks are the key challenges to transforming the heterogeneous large dataset into actionable outcomes.

1.3. Literature Review and Gaps

As statistical analysis and machine learning (ML) tools become available, recent research has focused on predicting and evaluating power outages based on heterogeneous data such as meteorological data [24,26]. The majority of the literature thus focuses on the impact and the effect of weather-related variables (e.g., wind, storm, snow, etc.) on the duration of power outages (see, e.g., [8,27,28,29,30,31]). In the literature, power outage predictive models have been developed using statistical techniques [27,31], ML techniques [2,20,26], and hybrid techniques [32,33] (also see Table 1).
There are several basic statistical techniques used to model causal relationships for forecasting power outages using meteorological data [21,27,28,29,30]. These include but are not limited to generalised negative binomials (GNBs) [31], generalised additive models (GAMs) [27], exponential regression [30], and generalised linear models (GLMs) [27]. However, power outage forecasting has become increasingly complex due to the use of increasingly unpredictable and high-variant weather-related variables (e.g., wind) as inputs. Despite their mathematical tractability, statistical models require pre-modelling distributional assumptions about data, and consequently, the results obtained do not accurately capture deviations from these assumptions, such as the inherent nonlinear power system patterns and dynamics (also see Table 2).
Recent advances in computing power have led to more advanced, efficient, and accurate ML algorithms. As ML methods are based on historical data, they are capable of processing nonlinear data and of self-adjusting, making them highly robust and adaptive for predicting highly variant datasets [2,20,32]. Among many, these include neural networks (NNs) such as conventional artificial neural networks (ANNs), applied in the work of [8,20,26,28], and deep NNs (DNNs), employed in [32,34]. Despite their simplicity, ANNs tend to converge on local minima and can easily overfit small datasets. Hence, unlike relevance vector machines (RVMs), they require large amounts of data alongside strict data correlations to guarantee stability (or avoid gradient explosion) and accuracy [35,36,37]. Though robust, recurrent neural networks (RNNs) (and their variants, such as long short-term memory (LSTM) and gated recurrent units (GRUs)) are sensitive to outliers and may struggle to adequately capture local patterns, particularly when dealing with small and noisy datasets [37]. In some cases, their complexity and requirement for large data may compromise the accuracy-efficiency trade-off [37]: they are highly computationally intensive, so efficiency is surrendered for high-level accuracy. There are, however, more efficient, flexible, reliable, and accurate ML models, such as random forests (RFs) [24,38] and AdaBoost with regression and threshold (AdaBoostRT) [28], which have been successfully applied (to a lesser extent) to predict power outages using meteorological datasets. Although AdaBoostRT can create strong learners from a group of weak learners [28], it is susceptible to outliers (also see Table 2). Unlike RNNs, RF is effective at forecasting power outages because of its low sensitivity to noise and robustness to outliers. RF's complexity allows it to cope with complex relationships; however, its behaviour is difficult to interpret at the individual-tree level (also see Table 1 and Table 2).
In [24], the authors articulated that power outage prediction is a difficult task, principally due to the vast amount of diverse data features and the multifaceted causes and effects of numerous factors on the power grid, which cannot be entirely explained by a single model. To overcome this challenge, ref. [33] successfully combined classification and regression trees (CART) and Bayesian additive regression trees (BART) to predict power outages due to damaged poles. Though the aforementioned regression trees (RTs) are not significantly affected by outliers when applied to continuous values, they are ineffective on their own, and a small variation in the data can have a huge impact on a tree's structure. In similar work, ref. [39] found that combining a decision tree (DT), RF, and a boosted gradient tree (BT) for forecasting power outages (due to storms) yielded better accuracy than the individual models. The proposed ensemble decision tree model was also found to be less susceptible to noise and missing values; hence, it could learn more general representations of the data. The authors of [40] employed an LSTM-RNN to accurately predict South African unplanned outages (UCLF) using installed capacity, historic demand, and the planned capability loss factor (PCLF). The proposed approach showed potential for modelling and predicting UCLF. However, the study did not include the other capability loss factor (OCLF), an essential variable for predicting power outages, as it represents some elements of external and human factors (also see Table 1).
Table 1. Review summary of the related works.
| Author | Problem | Methods | Dataset | Results | Limitations |
|---|---|---|---|---|---|
| Onaolapo et al. (2022) [2] | Electricity outage forecasting (South Africa) | ML (ANNs, exponential smoothing, linear regression) | Historical outage and weather conditions | ANNs were highly accurate and effective over conventional methods | Limited/smaller datasets were used; the model is data-greedy and easily overfits small datasets |
| Das et al. (2021) [32] | Outage estimation in electric power distribution systems (USA) | ML (deep neural network ensemble (DNNE), AdaBoost+, ANN) | Overhead distribution feeder outages and weather conditions | DNNE was superior to other models and captured complex relationships very well | The complex model underestimated outages due to wind, lightning, and animals |
| Kankanala et al. (2014) [28] | Estimating weather-related outages in distribution systems (USA) | ML (AdaBoost+, AdaBoostRT, ANN, linear regression, mixture of experts) | Historical outage and weather conditions | AdaBoost+ captures complexity and nonlinearity well, hence error reduction in forecasting | Model is complex, computationally expensive, and noise-sensitive; under-predicted outages in the sparse high range |
| Han et al. (2009) [27] | Prediction of power outages due to hurricanes (USA) | Statistical methods (GAM, GLM) | Hurricane outages and weather conditions | To some extent, GAM enhanced predictive performance and accuracy | GAM's precision is challenged when dealing with complex variables |
| Kankanala et al. (2011) [29] | Estimating outages due to wind and lightning on overhead distribution feeders (USA) | Statistical methods (linear regression models) | Historical outages and weather conditions (wind and lightning) | Models showed a high positive correlation between predictors and increased outages | Fewer meteorological data were used, excluding outlier days; the proposed model struggled to handle high-range outages |
| Kankanala et al. (2011) [30] | Estimating power outages on overhead distribution feeders (USA) | Statistical methods (exponential regression models) | Historical outage and weather conditions (wind and lightning) | Proposed exponential methods enhanced outage forecasting to some extent | Fewer meteorological data were used; the proposed model could not handle high-range outage values |
| Guikema et al. (2010) [33] | Pre-storm estimation of hurricane damage to electric power distribution systems (USA) | Statistical/ML (GAM, GLM, CART, BART) | Historical outage and weather conditions | The proposed model captured complexity and nonlinearity effectively | The lack of outage data limits model capabilities; the model is complex and computationally expensive |
| Wanik et al. (2017) [39] | Storm outage modelling for an electric distribution network (USA) | ML/ensemble (BT, DT, RF, DT-RF-BT) | Electric distribution network outages and weather conditions | DT-RF-BT accurately predicted outages due to storms | Limited data on infrastructure; weather variables alone cannot account for power grid dynamics; the model is also complex |
| Motepe et al. (2022) [40] | Forecasting unplanned capability loss factors (South Africa) | ML/hybrid (deep belief network (DBN), optimally pruned extreme learning machines (OP-ELM), LSTM-RNN) | Historic outages, weather conditions, and capacity factors | The hybrid DBN and LSTM-RNN outcompeted other models; prediction error was reduced | The model is computationally expensive |
To address the significant difficulties associated with the operation and administration of the power grid system due to the limited predictability of power outages, hybrid models are available in the literature, albeit to a lesser degree. In addition, the performance of hybrid models can be enhanced through signal pre-processing methods such as wavelet transforms (WTs) [37,41] (also see Table 2). The processing of signals through WTs often entails denoising, short-term local feature extraction, and filtering. As far as we know, these strategies have been scarcely explored in the power outage literature (also see Table 1). The analysis of the literature further shows that unplanned power outage forecasting research in South Africa is very scant aside from [2,20,40], making it difficult for utility managers to accurately predict when power outages might occur (also see [19]). According to [2], utility managers often rely on experience and discretion instead of reliable power outage predictive models when developing power restoration strategies. The potential of hybrid methods for better planning and operation of the power grid thus remains largely untapped and calls for significant effort all around.
Table 2. Strengths and weaknesses of the models implemented.
| Model | Strengths | Weaknesses | Citation |
|---|---|---|---|
| LASSO | Besides being computationally efficient, LASSO is effective at regularisation, variable selection, and dimension reduction. | Selects only a subset of correlated predictors and shrinks the rest to zero; the number of predictors selected is limited to the number of samples. | [42,43] |
| RF | Through the bagging technique, RF effectively handles nonlinearities, outliers, and missing values, thereby avoiding model overfitting and minimising variance; can effectively handle both classification and regression problems. | Requires more training time than other decision-tree-based algorithms; complex compared to other decision-tree-based algorithms. | [44,45,46] |
| WT | Compatible with both frequency and time domains; removes noise and reveals patterns in the signals; provides statistically sound signals that are simple to model and predict. | It is difficult to determine the most appropriate decomposition level. | [37,47] |
| AdaBoostRT | Leveraging boosting, these methods enhance generalisation capabilities; can avoid overfitting and minimise bias; do not require a large training dataset. | AdaBoostRT's convergence speed depends on the threshold selected. | [48,49,50] |
| RVM | Founded on the Bayesian framework, RVMs are sparse, probabilistic, and require fewer support vectors; handle high-dimensional data well, offering greater generalisation and preventing overfitting (high variance); need not comply with Mercer's criteria and perform very well on smaller datasets. | Requires more training time for large datasets. | [35,36,51,52] |
| VAR | Captures complex and interdependent relations and structural changes in the data; handles high dimensionality efficiently and is easy to comprehend. | Lag length affects performance; parameters increase with dimension; in higher-dimensional spaces, sparsity is required to avoid strong correlations; has a complex stochastic structure. | [53,54] |
In our proposed RVM-WT-AdaBoostRT-RF hybrid framework, LASSO addresses the adverse effects of multicollinearity by selecting variables to achieve model interpretability. On the other hand, RVM captures nonlinearity and bias using the original dataset, and WT decomposes residuals from fitting RVMs into signals that are easy to predict. An AdaBoostRT model uses residual signals from RVM as input features to generate residual forecasts. Not only is RF utilised to select the best top 10 variables (to some extent account for LASSO’s inability to capture nonlinearity and enhance season-specific modelling), but RF is also used as a meta-model that fuses RVM, AdaBoostRT, RF, and residual predictions with efficiency and accuracy (minimal error variance).
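As a rough illustration of the final stacking stage described above, the sketch below (in Python, although the study was implemented in R) feeds base-model predictions into an RF meta-model. All data, noise levels, and variable names here are hypothetical stand-ins for the study's actual RVM, AdaBoostRT, RF, and residual forecasts, not the authors' pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n, split = 200, 150

# Synthetic target and hypothetical base-learner predictions standing in
# for the RVM, AdaBoostRT, and RF level forecasts plus a residual forecast.
y = np.sin(np.linspace(0, 8, n)) + 0.1 * rng.standard_normal(n)
pred_rvm = y + 0.20 * rng.standard_normal(n)
pred_ada = y + 0.15 * rng.standard_normal(n)
pred_rf = y + 0.25 * rng.standard_normal(n)
resid_fc = (pred_rvm - y) + 0.10 * rng.standard_normal(n)  # noisy residual forecast

# Stack the base predictions column-wise as meta-features;
# RF acts as the meta-model that fuses them into the final forecast.
Z = np.column_stack([pred_rvm, pred_ada, pred_rf, resid_fc])
meta = RandomForestRegressor(n_estimators=200, random_state=0)
meta.fit(Z[:split], y[:split])
final = meta.predict(Z[split:])

rmse_meta = float(np.sqrt(np.mean((final - y[split:]) ** 2)))
print(rmse_meta)
```

The meta-model sees only the base predictions, so error accumulation is controlled by letting the RF learn how to weight (and correct) the individual forecasts.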
The efficacy of the proposed approach is validated against RVM, AdaBoostRT, RF, vector autoregressive (VAR), and benchmark Naïve models, using hourly measurements of the power grid data accessed from the Eskom data portal. The recorded power grid data cover the period from 1 March 2021 to 30 April 2022 and include instances of zero-inflated values; the intention is to preserve the true operational characteristics of the power grid. The data provide substantial insights into power grid operations and usage during this period.

1.4. Novelty and Contributions

An overabundance of outage forecasting models exists in the literature; these can be classified into three major categories, namely, statistical, ML, and hybrid models. However, the dynamic nature of the power grid necessitates a model suited to capturing its multi-dimensional features, viz., fluctuating power demands, varying weather conditions, and unpredictable system failures [55,56], as an attempt to use a single model will often lead to costly, inaccurate, and unreliable predictions.
Although individual methods are efficient and easy to comprehend, they frequently lack precision when compared with hybrid methods [13]. It is often observed in the literature that a hybrid framework can greatly enhance prediction accuracy and robustness [13]. However, this benefit may be offset by increased model complexity and computational intensity if the hybrid strategy is not carefully designed. The novelty of the proposed stacked hybrid approach rests on the observation that very few hybrids in the existing literature achieve a desirable trade-off between complexity, efficiency, and accuracy. Hybrids that are effective and efficient, and that are distinguished by significant accuracy in deterministic and probabilistic predictions, low variability and bias, and robustness, are pivotal for successful and informed decision-making; yet they remain very scant in the literature. We therefore summarise our contribution, which exploits an ensemble of regularised regression, bagging methods, wavelets, ensemble boosting methods, and vector machines, as follows:
  • A preliminary examination of data utilising variance inflation factor (VIF) diagnostics revealed the presence of a high degree of multicollinearity. Hence, LASSO regression as a regularisation and variable selection procedure is used to remedy high levels of multicollinearity and predictor redundancies, thus ensuring dimensionality reduction in the model (i.e., with fewer parameters).
  • Since LASSO cannot adequately capture complex nonlinear relations, we further employ RF to select the top 10 season-based variables, thereby enhancing model interpretability, efficiency, and accuracy.
  • In our preliminary inspection of the original power grid data, we discovered that some variables were both unstable and noisy. To address these issues, we opted to utilise the superior capabilities of the sparse Bayesian RVM algorithm. By doing so, we can effectively handle complex behaviour (e.g., nonlinearity) in the data and improve forecast accuracy.
  • At their core, WTs reduce noise and the effect of outliers from the underlying time series to ease modelling and forecasting [37,41]. We, therefore, employ WTs to decompose RVM residuals into high-frequency and approximate subseries with improved and sound statistical characteristics (less noise), which are easy to model and predict.
  • Using AdaBoostRT’s robustness capabilities, we minimise bias and accurately and efficiently forecast the residual subseries, utilising the decomposed subseries as input features.
  • Leveraging RF’s accuracy and its ability to avoid model overfitting, this computationally efficient bagging approach is also used to combine the RVM, RF, AdaBoostRT, and residual forecasts into a final forecast with speed and minimal error accumulation.
  • In [40], the method employed UCLF alone as the predictand of power outages, using just a handful of other factors. Conversely, the proposed TUCLF.OCLF extends this by incorporating both OCLF and UCLF. Thus, TUCLF.OCLF is a more comprehensive target variable for predicting power outages, as (to some extent) it accounts for unplanned power outages in their totality.
  • To an extent, our proposed framework could effectively capture the seasonality effects, nonlinearity, random fluctuations, and nonstationarity patterns inherent in the power grid data.
The study has been conducted in a manner that is reliable and reproducible, and we have provided appropriate and comprehensive assessment metrics (such as point and probabilistic forecast evaluation metrics and statistical tests) that are suitable for power outage modelling and forecasting evaluation.
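The VIF diagnostic mentioned in the first contribution can be sketched minimally as follows (Python rather than the study's R; the predictors below are synthetic and purely illustrative). Each VIF is $1/(1-R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on the remaining predictors; values above roughly 10 are a conventional flag for strong multicollinearity.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of a numeric predictor matrix."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])       # intercept + other predictors
        beta, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ beta
        r2 = 1.0 - resid.var() / target.var()
        out.append(1.0 / max(1.0 - r2, 1e-12))          # guard against division by zero
    return np.array(out)

rng = np.random.default_rng(1)
x1 = rng.standard_normal(500)
x2 = x1 + 0.05 * rng.standard_normal(500)   # nearly collinear with x1
x3 = rng.standard_normal(500)               # independent predictor
vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)
```

The collinear pair produces very large VIFs, while the independent predictor stays near 1, which is the pattern that motivated the LASSO step in this study.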

1.5. Structure of the Study

The rest of the study is structured as follows: The methods and materials are described carefully and thoroughly in Section 2. In Section 3, an analysis of the results and discussion are provided whilst Section 4 concludes the study.

2. Materials and Methods

2.1. Case Study Report

2.1.1. Data Description

This comprehensive study explores the feasibility of developing a predictive model for power outages in South Africa through a thorough analysis of historical power grid data. The research period spans from 1 March 2021, to 30 April 2022, as illustrated in Figure 1. The data analysis was executed using the R program (version 4.4.1), and the final dataset incorporated 43 variables as outlined in Table 3.
Figure 1 illustrates the unplanned outage data, which fluctuate from hour to hour. The unplanned outages (in MW) (i.e., $y_{TUCLF.OCLF}$) are used as the dependent variable, while the other variables are considered independent (see Table 3). The data details presented in Table 3 are supplemented by a comprehensive glossary and individual distributions in Appendix A. These help us understand the physical or operational meaning of the variables and their significance in power outage forecasting.

2.1.2. Problem Formulation

This study approximates the complex relationship between power grid factors and unplanned outages. The predictor matrix and dependent variable are given, respectively, by $X_{1464 \times 42} = \left[ x_{ORCL}, \ldots, x_{NCS} \right]$ and $y_{1464 \times 1} = \left[ y_{TUCLF.OCLF}(1), \ldots, y_{TUCLF.OCLF}(1464) \right]^{T}$ such that
$$x_i^{T} = \left( x_{i,1}, \ldots, x_{i,42} \right), \qquad y = \left( y_1, \ldots, y_{1464} \right)^{T},$$
where $x_i^{T}$, $i = 1, \ldots, 1464$, denotes the $i$th row of the predictor matrix $X$ and $y_i$ denotes the $i$th unplanned outage observation. Fundamentally, we would like to find an approximation $\hat{f}(X)$ of an unknown function $f: \mathbb{R}^{1464 \times 42} \rightarrow \mathbb{R}^{1464 \times 1}$ such that $\hat{f}(X) \approx f(X)$. In essence, we would like to resolve the following regression exercise:
$$y = f(X) + \zeta, \qquad \begin{pmatrix} y_1 \\ \vdots \\ y_{1464} \end{pmatrix} = f\begin{pmatrix} x_1 \\ \vdots \\ x_{1464} \end{pmatrix} + \begin{pmatrix} \zeta_1 \\ \vdots \\ \zeta_{1464} \end{pmatrix},$$
where $\zeta_i$ is the noise or residual error associated with the $i$th unplanned outage prediction.

2.1.3. Data Partition

To capture seasonal effects (including weather variation), the data were partitioned into four distinct seasons with different seasonal variations: autumn (March–May 2021), winter (June–August 2021), spring (September–November 2021), and summer (December 2021–February 2022). Fundamentally, an effective predictive system must be tested and validated across all year-round conditions and seasons (see Table 4). For experimentation, the first two months of each season, approximately 1464 h, were selected. For each season, the dataset was divided into a training set (including a 20% validation set), which accounts for 80%, and a testing set, which accounts for the remaining 20% (also see Table 4). In addition, we retained the Autumn 2022 (1 March 2022–30 April 2022) dataset to evaluate the models' applicability and seasonal robustness across years. By analysing the four distinct seasons within a single year, we can examine the seasonal patterns that affect the model, ensuring its robustness across different seasonal conditions. Furthermore, the incorporation of seasonal data from a different year, specifically Autumn 2022, allows us to assess the applicability of the model in scenarios beyond those encountered in the initial year.
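The per-season split above can be sketched as follows (a Python illustration; the study itself used R, and the exact boundary handling here is an assumption). The split is chronological, with the 20% validation slice taken from the end of the training block so that no future observations leak into training.

```python
import numpy as np

def chronological_split(n_hours=1464, train_frac=0.8, val_frac=0.2):
    """Chronological train/validation/test index split (no shuffling)."""
    idx = np.arange(n_hours)
    n_train = int(train_frac * n_hours)                 # 80% for training overall
    train_idx, test_idx = idx[:n_train], idx[n_train:]  # last 20% held out for testing
    n_val = int(val_frac * n_train)                     # 20% of training for validation
    fit_idx, val_idx = train_idx[:-n_val], train_idx[-n_val:]
    return fit_idx, val_idx, test_idx

fit_idx, val_idx, test_idx = chronological_split()
print(len(fit_idx), len(val_idx), len(test_idx))
```

The three index blocks are contiguous and ordered in time, which matters for hourly grid data where random shuffling would mix past and future.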
As part of the supervised learning approach employed in this work, we optimise the ML algorithms (outlined in the following sub-sections) to estimate the function $f$. The rationale is to minimise the loss (error) function $L$ between $y_i$ and $\hat{y}_i$:
$$L(y_i, \hat{y}_i) = \frac{1}{1176} \sum_{i=1}^{1176} \hat{\zeta}_i^{2},$$
where $\hat{y}_i$ is the predicted value and $\hat{\zeta}_i = y_i - \hat{y}_i$ denotes the residual for the $i$th of the 1176 observations in the training set. During the training stage, the model parameters are adjusted until the loss function reaches acceptably low levels.

2.2. Variable Selection

There are various variable selection and regularisation methods, such as LASSO, Elastic Net, and Ridge regression. LASSO encourages sparse models and performs both variable selection and regularisation to increase the interpretability and prediction accuracy of the model [43]. LASSO also excels at handling high levels of multicollinearity (see [43]). The loss function that LASSO seeks to minimise combines the ordinary least squares error loss with the absolute-deviation-based penalty $\sum_{j=1}^{p} |\xi_j|^{\eta}$ with $\eta = 1$ (the $\ell_1$-norm constraint) and is given in the following Lagrangian form:
$$\min_{\xi \in \mathbb{R}^{p}} \frac{1}{2N} \sum_{i=1}^{N} \left( y_i - \sum_{j=1}^{p} x_{ij} \xi_j \right)^{2} + \lambda \sum_{j=1}^{p} |\xi_j|,$$
where the $N$ = 10,223 pairs of predictor variables ($X$) and response variables ($y$) are denoted by $\{(x_{ij}, y_i),\ i = 1, \ldots, N;\ j = 1, \ldots, p\}$, $p = 42$ denotes the number of predictor variables, $\xi = (\xi_1, \xi_2, \ldots, \xi_p)$ are the regression weights, and $\lambda$ (controlling the amount of shrinkage) is a tuning parameter. If $\lambda = 0$, we revert to the ordinary least squares solution. The larger the value of $\lambda$, the more the coefficients are shrunk towards zero; hence, the model will under-fit. As $\lambda$ increases, more coefficients are set exactly to zero and eliminated, thereby increasing model bias, whereas a decreasing $\lambda$ results in increased variance. LASSO seeks to strike the best compromise between model over-fitting and sparsity by avoiding the extreme (i.e., either positive or negative) values of the constraint $\eta$. Furthermore, LASSO regression is also applicable to feature selection, since the coordinates of less significant features are truncated towards zero while those of statistically insignificant ones are shrunk entirely to zero. We controlled multicollinearity, regularised, and selected variables using the function ‘cv.glmnet’ in the R program. In the subsequent analysis, the variables $x_{RF}$, $x_{RSA.CF}$, $x_{IE}$, $x_{EGG}$, $x_{TREIC}$, $x_{TRE}$, $x_{CSPIC}$, $x_{PGUH}$, $x_{TUCLF}$, and $x_{TOCLF}$ were excluded.
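The cross-validated LASSO step can be approximated in Python with scikit-learn's `LassoCV`, an assumed analogue of R's `cv.glmnet` rather than the study's actual code. The synthetic data below simply show how the cross-validated penalty zeroes out redundant predictors while retaining informative ones:

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
n, p = 400, 12
X = rng.standard_normal((n, p))
X[:, 1] = X[:, 0] + 0.05 * rng.standard_normal(n)   # collinear pair of predictors
y = 3 * X[:, 0] - 2 * X[:, 2] + 0.5 * rng.standard_normal(n)

# Standardise so the l1 penalty treats all predictors on the same scale,
# then pick lambda (alpha) by 5-fold cross-validation.
Xs = StandardScaler().fit_transform(X)
lasso = LassoCV(cv=5, random_state=0).fit(Xs, y)

kept = np.flatnonzero(lasso.coef_ != 0)     # surviving predictors
print(kept, lasso.alpha_)
```

As with `cv.glmnet`, only one member of a strongly collinear pair tends to survive, which is exactly the multicollinearity-pruning behaviour exploited in this section.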
As a regularised linear regression technique, LASSO assumes a linear association between the dependent and independent variables. Thus, some aspect of linearity was assumed during feature engineering and selection. Nonetheless, the drawbacks associated with linearity were later addressed through the application of nonlinear ML models. Using the remaining 31 predictor variables and the dependent variable, we further used RF to select the top 10 variables for each season. The rationale is to further reduce model dimensions, capture nonlinearity characteristics, and adequately capture season-specific behaviour, thereby improving accuracy and robustness. Algorithm A1 in Appendix A summarises the implementation process in the R program.

2.3. Random Forest

RF regression is an ensemble flexible ML model that relies on the aggregation of weak predictors, namely, regression trees, to provide accurate and reliable prediction [44,45]. RF models are often used to solve regression and classification problems since they possess a high level of accuracy without complex hyperparameter tuning. They also handle (internally) missing values very well and rectify overfitting in a decision tree. A process known as bootstrapping aggregation, or bagging (see Algorithm A2), which reduces overfitting and bias, is employed to build each tree independently using a subset of the training data [45,46]. Suppose we have input data ( X r e t a i n e d ) and target data ( y r e t a i n e d ) each tree is sampled randomly with replacement (bootstrapped sample) from the training data ( X t r a i n _ r e t a i n e d ,   y t r a i n _ r e t a i n e d ). Each tree must be grown on an independent bootstrap sample from the training data. From all M (number of trees in the forest) possible variables, select m variables at random and find the most optimal split at each node (see Algorithm A2). In the end, by averaging the forecasts from all trees, the forecast is calculated through the following equation:
$$\hat{y}_{\text{train\_retained}} = \frac{1}{M} \sum_{i=1}^{M} T_i\!\left(X_{\text{train\_retained}}\right).$$
The RF assumes that, when using a bootstrapped sample, the most effective data split is reached at each stage of growing a tree. The outputs of the grown trees are combined by averaging. In [45], the authors showed that the randomness and variability in tree construction reduce the generalisation error, yielding a better overall model with lower variance. Unlike boosting techniques, RF does not sequentially modify the training set [45,46]. Overall, RFs are relatively insensitive to outliers and noise and can handle highly nonlinear interactions, since the bagging (bootstrap averaging) that RF employs effectively reduces the variance of decision trees [46]. The implementation of RF (through the bagging process) in R is summarised in Algorithm A2 in Appendix A.
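The bootstrap-and-average mechanism can be illustrated with a toy Python sketch that bags one-split regression trees (stumps). This is a deliberately simplified illustration: a real RF grows deep trees and samples $m$ features at each node, which is omitted here for brevity:

```python
import numpy as np

def fit_stump(X, y):
    """Fit a one-split regression tree: best (feature, split, left mean, right mean)."""
    best, best_sse = None, np.inf
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j])[:-1]:          # candidate thresholds
            left = X[:, j] <= s
            ml, mr = y[left].mean(), y[~left].mean()
            sse = ((y[left] - ml) ** 2).sum() + ((y[~left] - mr) ** 2).sum()
            if sse < best_sse:
                best_sse, best = sse, (j, s, ml, mr)
    return best

def predict_stump(stump, X):
    j, s, ml, mr = stump
    return np.where(X[:, j] <= s, ml, mr)

def bagged_forecast(X_train, y_train, X_new, M=50, seed=1):
    """Grow M stumps on bootstrap samples and average their predictions,
    i.e. y_hat = (1/M) * sum_i T_i(X_new)."""
    rng = np.random.default_rng(seed)
    n, preds = len(X_train), np.zeros(len(X_new))
    for _ in range(M):
        idx = rng.integers(0, n, n)                # bootstrap: n rows with replacement
        preds += predict_stump(fit_stump(X_train[idx], y_train[idx]), X_new)
    return preds / M
```

Averaging over bootstrap-trained learners is precisely what stabilises the variance of the individual trees.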

2.4. Signal Processing Methods

The Fourier transform (FT) assumes a stationary signal and cannot reflect time-domain features of a time series [47], in that its density and marginal distribution are independent of time [57]. Due to this property, FTs often fail to explain non-stationary real-world signals effectively and reliably. While short-time Fourier transforms (STFTs) remedy this deficiency through window functions and short localised waveforms in the time-frequency domain, these techniques have the drawback of using a fixed-width window function. To circumvent the drawbacks of FTs and STFTs, WTs are often employed, as they operate in both the time and frequency domains [47,58] and are independent of window functions [57]. WTs rely on families of basis functions that dilate and contract with signal frequency [57]. WTs also extract meaningful information whilst removing noise and anomalies from the original time series to ease the modelling and forecasting process [37,41]. The WT presents high frequency resolution at low frequencies and high time resolution at high frequencies, such that noise is removed and underlying patterns or trends are revealed [37,41].

Wavelet Transform

In practice, the discrete WT (DWT) is often applied to enhance models' predictive strength, as its calculations are simpler and faster than those of the continuous WT (CWT). Furthermore, the DWT contrasts with the CWT in that its wavelet scaling and translation factors are discrete (see, e.g., [13,37,47]). For a given time series decomposed by DWT, the approximation ($A_i$) and detail ($D_i$) subseries are, respectively, determined by:
$$A_{i+1}(t) = \sum_{l} h(l)\, A_i(2t + l),$$
$$D_{i+1}(t) = \sum_{l} g(l)\, A_i(2t + l),$$
where $i \in [0, M]$ indexes the decomposition levels, $h$ and $g$ respectively denote the low-pass and high-pass filter functions, and $A_0 = y(t)$ is the original signal. Hence, the inverse DWT is computed by the expression below:
$$A_i(t) = \sum_{l} h(l)\, A_{i+1}(2t + l) + \sum_{l} g(l)\, D_{i+1}(2t + l).$$
Despite excellent time-frequency domain features, it is difficult to compute the best decomposition level when working with WTs. This study employs maximal overlap DWT (MODWT), which is a type of DWT. Different from the traditional DWT, the MODWT is time-invariant, such that subseries signals maintain the same coefficients even if the original signal has shifted. Furthermore, MODWT offers better statistical features compared to conventional time-variant DWT (see [37,47]). The MODWT is implemented through the waveslim package in R as summarised in Algorithm A3 (see Appendix A).
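To make the decomposition step concrete, the following minimal Python sketch implements the MODWT's circular filtering with no downsampling. The Haar filter is used purely for brevity (the study used the db4 filter via the waveslim package in R), and the sketch is illustrative rather than a production transform:

```python
import numpy as np

def modwt_haar(x, levels):
    """Minimal MODWT with the Haar filter: circular filtering, no downsampling.
    MODWT filters are the DWT filters rescaled by 1/sqrt(2)."""
    h = np.array([0.5, 0.5])                  # scaling (low-pass) filter
    g = np.array([0.5, -0.5])                 # wavelet (high-pass) filter
    N, v, details = len(x), np.asarray(x, float).copy(), []
    for j in range(1, levels + 1):
        gap = 2 ** (j - 1)                    # filter upsampling at level j
        w, vn = np.zeros(N), np.zeros(N)
        for t in range(N):
            for l in range(2):
                idx = (t - l * gap) % N       # circular (periodic) boundary
                vn[t] += h[l] * v[idx]
                w[t] += g[l] * v[idx]
        details.append(w)                     # detail subseries D_j
        v = vn                                # approximation carried to next level
    return details, v                         # (D_1..D_levels, A_levels)
```

Because no downsampling occurs, the subseries keep the original length and remain aligned in time, which is the time-invariance property exploited in the proposed framework; the level-1 coefficients also preserve the signal's energy, a standard MODWT property.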

2.5. AdaBoostRT Algorithm

AdaBoost is an ensemble boosting algorithm that aims to enhance model robustness and generalisation by combining the predictions of multiple learners [48,49,50]. Weak learners (typically decision trees built on the knowledge of the existing trees) are trained sequentially, with each new learner correcting the errors of its predecessors, thereby reducing bias [48,49]. In AdaBoost, the weights of incorrectly predicted observations are increased while those of correctly predicted observations are decreased. AdaBoostRT is a variant of AdaBoost designed to handle regression problems. It categorises each sample as either correct or incorrect based on the absolute relative error score, calculated using the equation below:
$$\rho_i = \left|\frac{\hat{y}_i - y_i}{y_i}\right|,$$
where $\hat{y}_i$ and $y_i$ respectively denote a predicted and an actual unplanned outage observation. The AdaBoostRT algorithm discriminates between correct and incorrect predictions by establishing a distinct threshold from the training data, and then places greater emphasis (increased weights) on observations that are incorrectly predicted. The key drawback of AdaBoostRT is that its performance depends on the selected threshold. A higher threshold (e.g., $\delta > 0.4$) results in more values being classified as correct and a slower convergence speed for the algorithm [49]. If the threshold value is set too low, accuracy inevitably decreases, reducing the reliability or stability of the ensemble [49]. The main steps of AdaBoostRT are summarised in Algorithm A4 in Appendix A (also see [48,50] for details).
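The loop can be sketched as follows. This is a simplified Python illustration under stated assumptions: weighted one-split trees stand in for the weak learners, the weight-update exponent is fixed at 1, and targets are assumed non-zero so that the relative error $\rho_i$ is well defined:

```python
import numpy as np

def fit_weighted_stump(X, y, w):
    """Weak learner: one-split regression tree fitted to weighted data."""
    best, best_sse = None, np.inf
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j])[:-1]:
            L = X[:, j] <= s
            ml = np.average(y[L], weights=w[L])
            mr = np.average(y[~L], weights=w[~L])
            sse = (w[L] * (y[L] - ml) ** 2).sum() + (w[~L] * (y[~L] - mr) ** 2).sum()
            if sse < best_sse:
                best_sse, best = sse, (j, s, ml, mr)
    return best

def predict_stump(st, X):
    j, s, ml, mr = st
    return np.where(X[:, j] <= s, ml, mr)

def adaboost_rt(X, y, delta=0.1, rounds=15):
    n = len(y)
    D = np.full(n, 1.0 / n)                          # sample weights tau_i
    learners, betas = [], []
    for _ in range(rounds):
        st = fit_weighted_stump(X, y, D)
        rho = np.abs((predict_stump(st, X) - y) / y) # absolute relative error
        eps = np.clip(D[rho > delta].sum(), 1e-6, 1 - 1e-6)
        beta = eps                                   # beta_t = eps_t^n with n = 1
        D = np.where(rho <= delta, D * beta, D)      # shrink weights of correct samples
        D /= D.sum()
        learners.append(st)
        betas.append(beta)
    vote = np.log(1.0 / np.array(betas))             # model weights for the weak learners
    def predict(X_new):
        P = np.array([predict_stump(st, X_new) for st in learners])
        return (vote[:, None] * P).sum(axis=0) / vote.sum()
    return predict
```

Accurate rounds (small $\varepsilon_t$) receive large voting weights $\log(1/\beta_t)$, while rounds dominated by incorrect predictions contribute little to the final combination.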

2.6. Relevance Vector Machine

The RVM is a variant of sparse linear models based on hierarchical prior distributions [36], in which sparseness is achieved by assuming a sparsity-inducing distribution over the weights of the regression model [35,36,51]. A zero-mean Gaussian prior is usually assumed on each weight, with independent Gamma hyperprior distributions on the weight precisions [36]; marginalising over these precisions yields a Student-t prior distribution over the weights, thereby achieving sparseness. RVMs have the same functional form as the well-known SVMs; however, the kernel functions that form the basis of RVMs do not have to comply with Mercer's criteria (i.e., being a continuous symmetric positive-definite integral operator). In addition, RVMs employ a smaller number of relevance vectors (compared to the support vectors used by SVMs) and exhibit reduced sensitivity to hyperparameter settings [35,36,51,52]. RVMs, however, usually involve highly nonlinear optimisation processes [36]. RVMs compute predictions based on the following mathematical expression (see [35,52] for details):
$$f(X, w) = \sum_{i=1}^{N} w_i K(X, x_i) + \omega_0,$$
where $w = (w_1, w_2, \ldots, w_N)^{T}$ denotes the weights of the model, $K(\cdot,\cdot)$ is the kernel function centred at the individual training observations, and $\omega_0$ is the bias term. The kernel function defines one basis function for each observation in the training dataset. To automatically select the right kernel at each location, the sparse component of the RVM prunes all irrelevant kernels [36]. Suppose we are given a training dataset of input-output pairs $\Theta = \{(x_n, t_n)\}_{n=1}^{N}$ and assume that the targets $t_n$ are generated by the model defined by the following mathematical expression [35,36]:
$$t_n = f(x_n, w) + \varepsilon_n,$$
where the additive noise terms $\varepsilon_n \sim N(0, \beta^{-1})$ are independent samples and $\beta^{-1}$ is the variance of the noise term ($\beta$ being its precision). Thus, the likelihood distribution of the targets is given by the following expression (see [35,36,51] for details):
$$p(\mathbf{t} \mid X, w, \beta) = \prod_{n=1}^{N} p(t_n \mid x_n, w, \beta) = \prod_{n=1}^{N} N\!\left(t_n \mid f(x_n, w), \beta^{-1}\right).$$
For each weight $w_i$, the RVM model introduces a separate hyperparameter $\pi_i$ (representing the precision of that weight), such that the weights have a prior distribution concentrated around zero of the following form [35,52]:
$$p(w \mid \pi) = \prod_{i=1}^{N} N\!\left(w_i \mid 0, \pi_i^{-1}\right),$$
where $\pi = (\pi_1, \pi_2, \ldots, \pi_N)$ and, as mentioned before, $\pi_i$ denotes the precision of the weight $w_i$, which controls its variability (or shrinkage). Since basis functions whose precisions diverge make no contribution to predictions, the final model is sparse. Unlike the SVM, which uses sequential minimal optimisation (SMO) [59], RVMs use expectation-maximisation-type algorithms [35,36,52]. According to [52], the RVM updates its hyperparameters iteratively until a stopping condition is satisfied, following the steps summarised in Algorithm A5 (see Appendix A).
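These iterative updates can be sketched in Python as follows. This is a minimal illustration of Tipping-style precision re-estimation under stated assumptions (an RBF kernel, and pruning replaced by a simple cap on the precisions); it is not the Algorithm A5 implementation used in the study:

```python
import numpy as np

def rbf(A, B, gamma):
    """Gaussian (RBF) kernel matrix between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def rvm_fit(X, t, gamma=2.0, n_iter=30):
    N = X.shape[0]
    Phi = np.hstack([np.ones((N, 1)), rbf(X, X, gamma)])  # bias + one kernel per sample
    M = Phi.shape[1]
    alpha = np.ones(M)                    # weight precisions (pi_i in the text)
    beta = 1.0 / (np.var(t) + 1e-9)       # noise precision
    for _ in range(n_iter):
        Sigma = np.linalg.inv(beta * Phi.T @ Phi + np.diag(alpha) + 1e-8 * np.eye(M))
        m = beta * Sigma @ Phi.T @ t      # posterior mean of the weights
        g = 1.0 - alpha * np.diag(Sigma)  # how well-determined each weight is
        alpha = np.minimum(g / (m ** 2 + 1e-12), 1e12)    # cap in lieu of pruning
        beta = max(N - g.sum(), 1e-6) / (((t - Phi @ m) ** 2).sum() + 1e-12)
    return m

def rvm_predict(m, X_train, X_new, gamma=2.0):
    Phi = np.hstack([np.ones((len(X_new), 1)), rbf(X_new, X_train, gamma)])
    return Phi @ m
```

Weights whose precisions diverge (here, hit the cap) shrink to zero, so only a few "relevance vectors" remain active in the prediction.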

2.7. Vector Autoregressive Models

VARs are data-driven methods for analysing multivariate time series data. Aside from being easy to comprehend, these techniques can (to some extent) handle higher-dimensional data and capture structural changes [53,54]. However, they come with limitations: estimated coefficients can be hard to interpret in high dimensions; the choice of lag length affects performance; the number of parameters grows with dimension; and, in high-dimensional settings, sparsity is required to avoid multicollinearity [53,54] (also see Table 2). A VAR($p$) model can be intuitively represented by the following equation:
$$y_{i,t} = \tau_i + \sum_{k=1}^{p} \phi_{i1,k}\, y_{1,t-k} + \sum_{k=1}^{p} \phi_{i2,k}\, y_{2,t-k} + \cdots + \sum_{k=1}^{p} \phi_{in,k}\, y_{n,t-k} + e_{i,t},$$
where $\tau_i$ ($i = 1, \ldots, n$) are the constants or intercept terms of the $i$th time series; $y_{i,t}$ ($i = 1, \ldots, n$) denotes the $i$th time series at time $t$; $p$ represents the lag order of the model; $\phi_{ij,k}$ is the effect of series $j$ on series $i$ at a lag of $k$ time points; and $e_{i,t}$ ($i = 1, \ldots, n$) is the uncorrelated noise or residual term for the $i$th series at time $t$. The model above is estimated equation by equation using least squares, with the parameters of each equation obtained by minimising its sum of squared errors. VAR models are effective under data stationarity; otherwise, transformations (usually differencing) need to be applied. To meet this requirement, we applied differencing to stabilise the variance and ensure that the data were stationary. To lower the significant risk of overfitting, which is heightened by the small sample size and the number of variables, we further reduced the number of exogenous variables by employing the principal component analysis (PCA) method.
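The equation-by-equation least-squares estimation above can be sketched directly (an illustrative numpy sketch, not the R implementation used in the study):

```python
import numpy as np

def fit_var(Y, p):
    """Equation-by-equation least squares for a VAR(p): regress y_t on an
    intercept and the p most recent lags of every series. Y has shape (T, n)."""
    T, n = Y.shape
    X = np.ones((T - p, 1 + n * p))
    for k in range(1, p + 1):
        X[:, 1 + n * (k - 1):1 + n * k] = Y[p - k:T - k]   # lag-k block
    B, *_ = np.linalg.lstsq(X, Y[p:], rcond=None)          # (1 + n*p, n) coefficients
    return B

def forecast_var(Y, B, p):
    """One-step-ahead forecast from the last p observations."""
    x = np.concatenate([[1.0]] + [Y[-k] for k in range(1, p + 1)])
    return x @ B
```

For a VAR(1), the block `B[1:]` holds the transposed lag-1 coefficient matrix, so the fit can be checked against the coefficients of a simulated process.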

2.8. Proposed Framework

In terms of study contributions, the proposed RVM-WT-AdaBoostRT-RF improves upon the inadequacies of the previous work in the sense that it provides computational efficiency, effectively captures nonlinearity, avoids overfitting, and has high accuracy in the prediction of unplanned power outages using power grid parameters. The contribution of each of the models involved in the construction of the stacked hybrid model is presented in Table 5.

Proposed Stacking Prediction Approach

In stacking, several forecasting models are fused using a meta-model. In essence, each subsequent model in the stack tackles the errors of the previous model, thereby minimising the overall error. Thus, the model identifies missed patterns or trends in the previous model, thereby preventing overfitting and ultimately enhancing the stability of the model on unseen data. In this study, for instance, LASSO and RF identify and handle broad trends or aspects such as dimension reduction; RVM facilitates nonlinear probabilistic learning; WTs and AdaBoostRT focus on more detailed aspects of the residuals to minimise bias and noise. RF also ensures that predictions are more stable by effectively capturing nonlinearity and averaging errors with efficiency and accuracy when stacking. Fundamentally, the stacking approach reduces the errors of the base learner, boosts forecast accuracy, and increases the seasonal robustness of the model. The specifics of the proposed RVM-WT-AdaBoostRT-RF are also provided in Algorithm 1 (also see Appendix A; Algorithms A1–A5) and Figure 2.
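As a minimal illustration of the fusion step, the sketch below fits a meta-learner on validation-set base predictions and applies it to test-set base predictions. A linear meta-learner stands in here purely for simplicity; the proposed framework stacks with an RF meta-model instead:

```python
import numpy as np

def fit_stack(base_val_preds, y_val):
    """Fit a meta-learner on validation-set base-model predictions.
    A linear least-squares meta-learner is used here for illustration;
    the paper's framework uses a random forest as the meta-model."""
    Z = np.column_stack([np.ones(len(y_val)), base_val_preds])
    w, *_ = np.linalg.lstsq(Z, y_val, rcond=None)
    return w

def stack_predict(w, base_test_preds):
    """Combine test-set base predictions with the fitted meta-weights."""
    Z = np.column_stack([np.ones(len(base_test_preds)), base_test_preds])
    return Z @ w
```

Because the meta-learner sees each base model's validation errors, it can down-weight biased or noisy learners, so the stacked forecast typically beats every individual base forecast.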
Algorithm 1: RVM-WT-AdaBoostRT-RF
  • Data cleaning, formatting, dimension reduction, and partition
1. 
     Data cleaning
  • Input:   Raw _ data   10,223 × 42 ,
  • Output:   X n e w   10,223   × 42 ,   y n e w   10,223   × 1
2. 
     Detect multicollinearity through VIF
  • Input:   X n e w ,   y n e w
  • Output:  X v i f   10,223   × 42 , y v i f 10,223   × 1
3. 
     Partition data into 80% training (including validation) and 20% test set
  • Input:   X v i f ,   y v i f
  • Output:   ( X t r a i n   8178   × 42 ,   y t r a i n 8178   × 1 ) training set,
    ( X t e s t   2045   × 42 ,   y t e s t 2045   ×   1 )   test set
4. 
     Variable selection (LASSO)
  • Input:   X t r a i n , y t r a i n ,   α l a s s o
            Output: 31 retained predictors; 1 dependent variable
5. 
     Extract data to represent each season
  • Input:   X s e l e c t e d   v a r 10,223 × 31 , y s e l e c t e d   v a r 10,223 × 1
  • Output:     X s e a s o n 1464 × 31 ,     y s e a s o n 1464 × 1
6. 
     Select top 10 variable per season (RF)
  • Input:     X s e a s o n ,     y s e a s o n
  • Output:   ( X t r a i n   r e t a i n e d     1176   = 888 t r a i n +   288 v a l   × 10 ,   y t r a i n   r e t a i n e d 1176   = 888 t r a i n +   288 v a l × 1   training   set ;   ( X t e s t   r e t a i n e d   288   × 10 ,   y t e s t _ r e t a i n e d 288   ×   1 )     test   set ;   ( X r e t a i n e d   1464   × 10 , y r e t a i n e d 1464   ×   1 )     Full retained set
B.
Decomposition of RVM residuals
7. 
     Fit RVM model on the entire retained data
  • Input RVM o p t i m a l ,   X r e t a i n e d 1464   × 10 ,   y r e t a i n e d 1464   × 1
  • Output: f ^ X r e t a i n e d y ^ f i t t e d 1464   × 1
8. 
     Calculate residuals
  • Input:   y r e t a i n e d ,   y ^ f i t t e d
  • Output:   y r = y r e t a i n e d y ^ f i t t e d 1464   × 1
9. 
     Initialise wavelet parameters
  • Input:   y r ,   db 4 wavelet _ filter ,   2 n _ level ,   periodic   boundary  
  • Output:  y r ,   wavelet _ filter ,   n _ level , boundary
10. 
     Perform wavelet decomposition using modwt
  • Input:   y r ,   wavelet _ filter ,   n _ level , boundary
  • Output:   A 2 1464   × 1 A   ( Approximate   subseries ) ;   ( D 1 ,   D 2 )   1464   × 2   D (Detailed subseries)
C.
Residual predictions through AdaBoostRT
11. 
     Partition subseries into training and test datasets
  • Input y r ,   X d e c o m p o s e d = ( A , D )   1464   × 3
  • Output X t r a i n _ r   1176 = 888 t r a i n +   288 v a l   × 3 ,   y t r a i n   r 1176 = 888 t r a i n +   288 v a l     × 1   training   set   ( 80 % ) ;   ( X t e s t _ r ,   288   × 3 , y t e s t _ r 288   × 1 )   test set (20%)
12. 
     Initialise parameters
  • Set weights τ i = 1 n , weak learners V ,
  • error_threshold δ  
13. 
     Train (60%) and validate (20%) using the 80% of the retained data
13.1. 
 Tune hyperparameters
  • For each iteration i = 1   to   V
  • Input:   X t r a i n _ r , y t r a i n _ r , τ i
    Fit   a   weak   learner   G i   to   weighted   X t r a i n _ r ;   Predict   for   y t r a i n _ r  
  • Output:   Predictions   from   weak   learners   G i   ( X t r a i n _ r )
  • a. Calculate error
  •      Input:   y t r a i n _ r ,   G i   X t r a i n _ r
  • Compare   G i   ( X t r a i n _ r )   with   y t r a i n _ r :
  • If   ρ i < δ , correct; otherwise, incorrect
  •      Output: ρ i   (incorrectly classified)
                 b. Update weights
Input :   y t r a i n _ r , G ( X t r a i n _ r ) ,   τ i , ρ i  
Output :   τ i + 1 (updated weight)
                 c. Calculate model weights
Input :   ρ i  
Output :   weights   ψ i   (for the weak learners)
13.2.
  Model validation on the 20% of the data
  • Input weak   trees   G   G 1 ,   G 2 ,   ,   G V ,   weights   ψ { ψ 1 , ψ 2 , . . . , ψ V }
  • Output:   Weighted   combination   f ^ X t r a i n _ r v a l = i = 1 V ψ i   G i X t r a i n _ r v a l = f ^ A d a B o o s t R T _ r _ v a l   288   × 1
14. 
     Predicting using test data
  • Input:   f ^ X t r a i n _ r v a l ,   X t e s t _ r ,   y t e s t _ r
  • Output:   f ^ X t e s t _ r = y ^ t e s t _ r   f ^ A d a B o o s t R T _ r 288   × 1
15. 
     Model performance assessment
  • Input: y test _ retained , f ^ A d a B o o s t R T _ r
  • Output: {Mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE)}←Performance metrics
16. 
     Final output
  • Output: f ^ A d a B o o s t R T _ r ,   Performance metrics
D.
Stacking through RF
17. 
     Initialise model
  • Input: y t r a i n _ r e t a i n e d v a l , f ^ A d a B o o s t R T _ r _ v a l , f ^ A d a B o o s t R T _ v a l , f ^ R F _ v a l ;   f ^ R V M _ v a l
  • Output: y t r a i n _ r e t a i n e d v a l 288 × 1 ; ( f ^ A d a B o o s t R T _ r _ v a l , f ^ A d a B o o s t R T _ v a l , f ^ R F _ v a l ;   f ^ R V M _ v a l ) X ^ v a l i d a t i o n 288 × 4
18. 
     Forecast fusion (stacking)
  • Input: y t r a i n _ r e t a i n e d v a l ,     X ^ v a l i d a t i o n ;   X ^ b a s e _ m o d e l s = f ^ R F _ m o d e l , f ^ A d a B o o s t R T _ r , f ^ R V M _ m o d e l , f ^ A d a B o o s t R T   288 × 4
    Train   using   X v a l i d a t i o n ,   y t r a i n _ r e t a i n e d v a l   and   then   compute   y ^ t e s t _ r e t a i n e d = f ^ X ^ b a s e _ m o d e l s = i = 1 M T i   f ^ R F m o d e l , f ^ A d a B o o s t R T _ r , f ^ R V M m o d e l , f ^ A d a B o o s t R T
Output :   y ^ t e s t _ r e t a i n e d f ^ H y b r i d 288   × 1
19. 
     Model performance assessment
  • Input y t e s t _ r e t a i n e d ,     f ^ H y b r i d
  • Output: MAE, MAPE, RMSE, prediction interval normalised average width (PINAW), Mincer-Zarnowitz (MZ) test, and the Diebold-Mariano (DM) test←Performance metrics
20. 
     Final Output
  • Output:   f ^ H y b r i d , Performance metrics

2.9. Evaluation Metrics

2.9.1. Point Prediction Performance Indicators

In this study, the point predictions are evaluated by MAE, MAPE, and RMSE, which are, respectively, given by the following equations:
$$MAE = \frac{1}{n}\sum_{t=1}^{n} \left|\hat{\zeta}_t\right|,$$
$$MAPE = \frac{1}{n}\sum_{t=1}^{n} \left|\frac{\hat{\zeta}_t}{y_t}\right| \times 100,$$
$$RMSE = \sqrt{\frac{1}{n}\sum_{t=1}^{n} \hat{\zeta}_t^{2}},$$
where $\hat{\zeta}_t = y_t - \hat{y}_t$, with $y_t$ and $\hat{y}_t$ respectively denoting the actual and predicted values at time $t$, and $n$ representing the sample size. Smaller values of MAE, MAPE, and RMSE imply a better prediction model. Though easy to comprehend, the MAE is less sensitive to extreme values than squared-error indices. MAPE returns the error in percentage form, which is easy to interpret, but is singularity-prone whenever actual values approach zero [53,60]. RMSE, on the other hand, produces an absolute value on the same scale as the output variable. Consequently, RMSE is more suitable for operations relating to power grid stability, as it accurately reflects the impact of extreme errors. However, RMSE is more prone to outliers than MAPE [53,60].
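The three indicators can be computed directly, as in this short Python sketch (an illustrative implementation; note the MAPE singularity when $y_t = 0$):

```python
import numpy as np

def point_metrics(y, y_hat):
    """Return (MAE, MAPE in %, RMSE) with residuals zeta_t = y_t - y_hat_t."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    zeta = y - y_hat
    mae = np.mean(np.abs(zeta))
    mape = np.mean(np.abs(zeta / y)) * 100.0   # undefined when any y_t == 0
    rmse = np.sqrt(np.mean(zeta ** 2))
    return mae, mape, rmse
```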

2.9.2. Residual Analysis

Suppose that $y_t$ and $\hat{y}_t$, respectively, correspond to the observed and predicted power outage values at time $t$. The residual estimates $\hat{\zeta}_t$, $t = 1, \ldots, n$, will be analysed to determine whether the model predictions underestimate or overestimate actual power outages. If $\hat{\zeta}_t \neq 0$, the predictive model either overestimates ($\hat{\zeta}_t < 0$) or underestimates ($\hat{\zeta}_t > 0$) the actual outage values.

2.9.3. Probabilistic Performance Indicators

PINAW is employed to evaluate the reliability and certainty of prediction intervals. This prediction interval performance indicator is given by the following expression [61].
$$PINAW = \frac{1}{n\gamma} \sum_{t=1}^{n} \left(U_t - L_t\right),$$
where $L_t$ and $U_t$ respectively represent the lower and upper limits of the prediction interval, and $\gamma$ is the range of the target values, used for normalisation. Lower PINAW values are preferred, as they suggest a narrower deviation from the target values and a more robust prediction interval with better generalisation characteristics. PINAW is, however, known to be computationally expensive and sensitive to deviations from normality [60].
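In code, the indicator reduces to a few lines (a Python sketch; $\gamma$ is taken here as the range of the observed targets):

```python
import numpy as np

def pinaw(lower, upper, y):
    """Prediction interval normalised average width: mean interval width
    divided by the range of the target values (gamma)."""
    lower, upper, y = map(np.asarray, (lower, upper, y))
    gamma = y.max() - y.min()
    return np.mean(upper - lower) / gamma
```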

2.9.4. Diebold-Mariano Test

The DM test evaluates the forecasting strength of the models applied. Consider two forecasts $\hat{y}_r^{(i)}$ and $\hat{y}_r^{(j)}$ of the value $y_r$ from models $i$ and $j$, respectively, and let $L(\zeta_r^{(i)})$ and $L(\zeta_r^{(j)})$ denote the losses associated with the corresponding forecast errors $\zeta_r^{(i)} = y_r - \hat{y}_r^{(i)}$ and $\zeta_r^{(j)} = y_r - \hat{y}_r^{(j)}$. Then, the loss differential between the two forecasts is given by [62,63]:
$$d_r = L\!\left(\zeta_r^{(i)}\right) - L\!\left(\zeta_r^{(j)}\right).$$
The DM test evaluates the null hypothesis that the two forecasts have the same predictive accuracy. Thus, the following hypothesis is tested [62,63]: $H_0\!: E(d_r) = 0$ vs. $H_a\!: E(d_r) \neq 0$ for all $r$. The DM test statistic is calculated as follows:
$$DM = \frac{\frac{1}{n}\sum_{r=1}^{n} d_r}{\sqrt{s^2/n}},$$
where $s^2$ is the estimated variance of $d_r$. The calculated DM statistic is then compared with the critical values: the null hypothesis is rejected if the DM statistic is greater than the upper critical value $Z_{\alpha/2}$ or less than the lower critical value $-Z_{\alpha/2}$.
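With absolute-error loss, the statistic can be computed as in this Python sketch (an illustrative implementation that omits the autocorrelation correction sometimes applied for multi-step forecasts):

```python
import numpy as np

def dm_statistic(err_i, err_j):
    """Diebold-Mariano statistic with absolute-error loss:
    d_r = |e_{i,r}| - |e_{j,r}|, DM = mean(d) / sqrt(s^2 / n)."""
    d = np.abs(np.asarray(err_i, float)) - np.abs(np.asarray(err_j, float))
    n = len(d)
    s2 = np.var(d, ddof=1)          # estimated variance of d_r
    return np.mean(d) / np.sqrt(s2 / n)
```

A large positive value indicates that model $j$'s forecasts are significantly more accurate than model $i$'s, and vice versa for a large negative value.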

2.9.5. Computational Tools

All six models were trained and tested in a Dell development environment with the following specifications: 13th Gen Intel(R) Core i7 2.50 GHz processor, 32 GB of RAM, and Windows 11. The models' hyperparameters were tuned through cross-validation, grid search, and heuristics. The optimal parameter intervals, alongside the respective R libraries, are given in Table 6. On average, implementation on each dataset took around 4–5 min.

3. Results

3.1. Exploratory Analysis

Table 7 summarises the statistics of the five datasets under experimentation. The highest power outage levels were recorded in the summer (17,558 MW), whilst the lowest was observed in the autumn (8410 MW). On average, summer power outage levels (13,928 MW) were significantly higher than in any other season. This trend may be attributed to the surge in electricity consumption during the summer months caused by intense heat in various parts of the country. For instance, during the summer months, the air temperature rises. As a result, the majority of consumers rely on air conditioners, increasing their energy consumption. Winter had the highest variance (1562.242 MW) compared to any other time of year. Power outage levels vary by season. Furthermore, the Kruskal–Wallis test at 5% level of significance resulted in p-value   < 0.05 , confirming the presence of seasonality effects in the power outage data (also see Figure 3). All datasets are platykurtic (kurtosis less than 3) with skewness ranging from 0.42 to 0.40. Figure 3 also shows non-normality in autumn, winter, and summer. Furthermore, the autumn and summer datasets are multimodal.

3.2. Empirical Results

3.2.1. Wavelet Analysis

The characteristics of the decomposed RVM residuals using Daubechies 4 (DB4) are shown in Figure 4. We compared LA8’s performance (based on RMSE and coefficient of determination ( R 2 )) with DB4’s at different decomposition levels ( M = 2 and 3). DB4 dominated LA8, with best results at M = 2. It was also found that the predictive error increased with increasing decomposition levels, which concurs with the results in [64]. It is worth noting that the Autumn 2022 dataset is reserved for evaluating the seasonal robustness of the proposed strategy over time.

3.2.2. Comparative Analysis

Table 8 shows how the hybrid (or RVM-WT-AdaBoostRT-RF), RF, RVM, AdaBoostRT, VAR, and benchmark Naive model performed in the five datasets tested. The PINAW indices measure the reliability of the prediction interval estimates, while the other indicators measure the precision of point predictions. The MZ test assesses the bias of the forecasts. The DM test compares the predictive accuracy of forecasts from each of the six models. The top performers in each category of the performance index are bolded.
Overall, comparative tests showed that the hybrid dominated all other models based on low values of RMSEs, MAEs, and MAPEs, followed by RVM, AdaBoostRT, RF, VAR, and Naive. With the exception of the autumn dataset, RVM was the single most influential model across all seasons based on RMSE, MAE, and MAPE. Furthermore, the AdaBoostRT model accounted for power outage dynamics better than the RF model for the winter, spring, and summer seasons, whilst for the autumn, it outcompeted RVM based on the least RMSE, MAE, and MAPE. All models (including the proposed hybrid) are subject to seasonal fluctuations. Overall, our analysis revealed that the hybrid model produced more accurate predictions than other models (see Figure 5 and Figure 6).
Except for the summer dataset, the hybrid model underestimated all outage datasets, as its residuals were positively skewed (indicating more positive errors). Similar results were obtained for the RVM and VAR predictions across all four seasons. Except for the spring dataset, the AdaBoostRT predictions overestimated the remaining datasets. The Naive model overestimated the autumn dataset while underestimating the other datasets.
The bias test was also conducted using the MZ test. To evaluate unbiasedness, the null hypothesis that the intercept and slope terms are, respectively, 0 and 1 is tested. If the p-value < 0.05, the model is considered biased; otherwise, the predictions are said to be unbiased. The MZ test revealed that all models (including the proposed hybrid) were biased (with p-values < 0.05) for all four datasets under investigation.
Except for the summer dataset, the hybrid model is less deviant (smaller standard deviation values) from actual outage levels than other models (see Figure 5). The best performance of the hybrid model was recorded in the autumn season, more than in any other season. With the exception of the autumn dataset, RVM dominated all the single models (in terms of small standard deviation values) across all datasets. Overall, the hybrid model provides more accurate results.
The 95% prediction interval coverage probabilities (PICPs) for all models across the four datasets tested were valid (i.e., PICP ∈ [94%, 96%]). At the 95% prediction interval with nominal confidence (PINC), the hybrid produced the narrowest PINAW (i.e., better-calibrated intervals) compared to all other models for the autumn, winter, and spring datasets. The RVM model produced the narrowest prediction interval width (PIW) among the single models across all four datasets, and it outcompeted the hybrid on the summer dataset based on the smallest PINAW. Overall, the hybrid prediction interval was narrowest for the autumn dataset and broadest for the spring dataset. Clearly, the overall analysis shows that the hybrid generated more accurate predictions with less uncertainty and better seasonal robustness.
The DM test was applied to all models. At 5% significance level, each comparison between the hybrid model and the five other models resulted in p-values   < 0.05 (i.e., H 0   Rejected ), indicating a unique and higher predictive accuracy for the hybrid model compared to the five other models.
The assessment of the models' seasonal adaptability over the years (using the Autumn and Autumn 2022 datasets) showed that the stacked hybrid method is the only unbiased model and is superior to all others across all performance evaluation metrics. The RVM model outperformed all individual models on every metric. The DM test showed that the hybrid approach provided the most accurate and distinct forecasts among the six models, demonstrating stability and robustness to seasonal effects.
Overall, the proposed hybrid approach captures higher peaks and variations embedded in power outage data better than all models (see Figure 5). Thus, the proposed and recommended hybrid approach provides better short-term point estimates and the most reliable prediction interval estimates with less uncertainty and strong seasonal adaptability (see Figure 5).

3.2.3. Accuracy–Complexity Trade-Off

Though the proposed hybrid strategy requires more computational power (around 60 s), the resulting increase in accuracy makes the extra time worthwhile (see Table 9). Our proposed hybrid approach improves forecast accuracy over each individual model by at least 40%, which is crucial for making well-informed decisions about resource allocation, power grid risk management, and cost minimisation in the energy space. Thus, the extra training time is immaterial relative to the benefits that come with accurate hourly prediction of unplanned power outages (also see [23,24,25]). Moreover, the use of advanced computer hardware would enable efficient and effective management of this computational cost. The average computational cost does limit the model's applicability to real-time power grid control; nonetheless, the model remains valuable for short-term power grid operational planning.

3.2.4. Ablation Study

Table 10 presents the ablation study results based on a summer dataset, testing the efficacy of each element of the full stacked approach. The study results revealed that each model included in the full stacked or proposed approach improves forecasting performance, confirming the synergetic structure of the proposed strategy. Overall, the best and well-balanced performance was recorded for the full stacked hybrid model as compared to the partial hybrids.

3.2.5. Comparison with State-of-the-Art Methods

The proposed approach shows superiority over the other models, as illustrated by the performance metrics in Table 11. Overall, our approach demonstrated greater robustness and reliability in predicting unplanned power outages across the seasons. This can be attributed to the fact that we used LASSO and RF to reduce complex dimensionality and enhance interpretability; probabilistic sparse Bayesian learning (RVM) to capture complex data behaviour (i.e., nonlinearity, intermittence, etc.) whilst minimising overfitting; WT to decompose and smooth residuals; a boosting approach (AdaBoostRT) to reduce bias in the residual predictions; and a bagging RF to efficiently minimise variance effects when stacking the forecasts. This hybrid approach eliminates redundant calculations, reduces model complexity, and reduces noise and dimensionality prior to feeding in the data, thereby reducing overall execution time (to some extent).

4. Conclusions

To ensure the efficient operation of power grids and the development of effective business strategies for utility managers, efficient and accurate predictions through an appropriate and most suitable methodology are imperative. However, power grid data has a complex structure that possesses various characteristics that make it difficult to be accurately and reliably quantified by a single model. Attempting to model such a multi-dimensional and ever-changing system with a single model may not only be impractical but might also lead to costly, inaccurate predictions and unreliable results.
In our pursuit of a hybrid model well balanced between complexity and accuracy, LASSO and RF were deployed to reduce high-dimensional data complexity and enhance the feature engineering process, while the sparse, probabilistic Bayesian learning of the RVM was leveraged to capture complex data behaviour (such as nonlinearity and random fluctuations) whilst guarding against model overfitting. WTs (through MODWT) decomposed and smoothed the residuals, whose bias was then minimised through the weak-learner boosting inherent in the AdaBoostRT model. Finally, RF was used as a meta-model that combines the RVM, AdaBoostRT, RF, and residual predictions with efficiency and accuracy (minimal error accumulation). Thus, this work proposes a stacked hybrid learning model, RVM-WT-AdaBoostRT-RF, to accurately and reliably predict unplanned power outages in South Africa using power outage data provided by Eskom. The power grid variables used in the study relate to electricity demand, electricity supply, renewables, storage, and outages.
The feasibility of the proposed hybrid approach was validated using RMSE, MAPE, MAE, residual analysis, 95% PINAW, the MZ test, and the DM test against the benchmark Naive, VAR, RF, RVM, and AdaBoostRT. The study’s overall findings showed that the developed hybrid model outperforms all other models (i.e., Naive, VAR, RF, RVM, and AdaBoostRT) across all datasets based on the RMSE, MAPE, MAE, residual analysis, 95% PINAW, and the DM test. In addition to improving model accuracy, the proposed strategy also provided the most accurate interval prediction estimates with a greater amount of certainty and seasonal robustness. Similar results were obtained in the work of [40]. Overall, the findings illustrate that the proposed hybrid approach exhibits varying performance depending on the dataset, the time, or the season of the year. This result concurs with those in [2,40]. The results further showed that the proposed stacking approach is robust and generalisable over short-term periods under various conditions. Furthermore, our analysis found that models achieved better results on unscaled data than on scaled data (using min-max scaling). Thus, scaling did not improve performance. This was consistent with the results in [70,71].
The season-segmented analysis supports the development of resource allocation and power grid management strategies tailored to the needs of each season. The study results will be of great value to grid management teams, utility operators, and other energy infrastructure developers in South Africa, such as Eskom, in creating effective maintenance strategies.
The proposed hybrid performed worst, with consistent underestimation, on the spring dataset, which is likely driven by wind power (one of the main predictors during this season, and a highly variable and irregular one) [72]. The study therefore underscores the importance of weather-related factors, as noted in the literature review; however, such data were not available to the authors. Hence, the main limitation of the study is that it does not fully incorporate weather data such as storms and rainfall. Furthermore, the study focuses on short-term forecasting using season-segmented data.
Future work could test the efficacy of the proposed hybrid approach on substantially larger datasets, involving comparisons across years and locations (including different provinces) and incorporating critical meteorological variables such as wind speed, cloud cover, storm intensity, and temperature. Furthermore, the study employed only the db4 filter in signal decomposition; other wavelet filters, such as the least-asymmetric ("la8") filter, could be tested. Replacing AdaBoostRT with a highly scalable, accurate, and efficient gradient boosting approach such as the light gradient boosting machine (LightGBM) could prove beneficial in terms of efficiency and accuracy. These could then be compared with robust and accurate deep-learning approaches such as GRUs, LSTMs, CNNs, and RNNs.

Author Contributions

Conceptualisation, K.S.S.; methodology, K.S.S.; software, K.S.S.; validation, K.S.S. and E.R.; formal analysis, K.S.S.; investigation, K.S.S.; resources, K.S.S. and E.R.; data curation, K.S.S.; writing—original draft, K.S.S.; writing—review and editing, K.S.S. and E.R.; visualisation, K.S.S.; supervision, E.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Research Foundation rating grant RA201117574546 and by the University of South Africa's Department of Statistics ROF1 funds.

Data Availability Statement

The data presented in this study are openly available from Eskom at https://www.eskom.co.za/dataportal/ (accessed on 11 August 2022).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

I. Algorithms implemented
Algorithm A1: Variable Selection (through LASSO and RF)
1. Load relevant R libraries (glmnet, caret, randomForest)
2. Data cleaning
  • Input: Raw_data (10,223 × 42)
  • Check completeness, correctness, and consistency; handle structural errors; drop irrelevant features; create new features
  • Output: X_new (10,223 × 42), y_new (10,223 × 1)
3. Detect multicollinearity through VIF
  • Input: X_new, y_new
  • Check for multicollinearity using the VIF
  • Output: X_vif (10,223 × 42), y_vif (10,223 × 1)
4. Data division for LASSO training
  • Input: X_vif, y_vif
  • Partition the data into an 80% training set and a 20% test set
  • Output: training set (X_train (8178 × 42), y_train (8178 × 1)); test set (X_test (2045 × 42), y_test (2045 × 1))
5. Variable selection (LASSO)
  • Input: X_train, y_train, α_lasso
  • Through cross-validation, find the optimal λ and fit the LASSO
  • Retain variables with non-zero coefficients (ξ_j ≠ 0)
  • Output: 31 predictors; 1 dependent variable
6. Extract data to represent each season
  • Input: X_selected_var (10,223 × 31), y_selected_var (10,223 × 1)
  • Extract the data representing each season
  • Output: X_season (1464 × 31), y_season (1464 × 1)
7. Select the top 10 variables per season (RF)
  • Input: X_season (1464 × 31), y_season (1464 × 1)
  • Partition the data into an 80% training set [60% train + 20% validation] and a 20% test set; select the top ten variables for each season through RF
  • Output: training set (X_train_retained (1176 = 888 train + 288 val × 10), y_train_retained (1176 × 1)); test set (X_test_retained (288 × 10), y_test_retained (288 × 1)); full retained set (X_retained (1464 × 10), y_retained (1464 × 1))
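The VIF screening in step 3 can be made concrete for the special case of two predictors, where VIF_1 = VIF_2 = 1/(1 − r²) with r the Pearson correlation between them; values above roughly 10 are commonly flagged as collinear. The Python sketch below uses hypothetical demand/supply series (the general multi-predictor VIF requires a full regression of each predictor on the others).

```python
import math

def pearson_r(x, z):
    # Sample Pearson correlation between two equal-length series.
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    cov = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sz = math.sqrt(sum((b - mz) ** 2 for b in z))
    return cov / (sx * sz)

def vif_two_predictors(x, z):
    # Two-predictor special case: VIF = 1 / (1 - r^2).
    r = pearson_r(x, z)
    return 1.0 / (1.0 - r ** 2)

demand = [10.0, 12.0, 11.0, 15.0, 14.0]   # hypothetical predictor
supply = [9.5, 12.5, 11.2, 14.8, 13.9]    # nearly collinear twin
v = vif_two_predictors(demand, supply)     # large VIF -> multicollinearity
```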
Algorithm A2: RF (through bagging)
1. Load relevant R libraries (caret, randomForest, ranger)
2. Train (60%) and validate (20%) using 80% of the retained data
2.1. Tuning hyperparameters
  • Input: X_train_retained (888 × 10), y_train_retained (888 × 1)
  • Tune the RF hyperparameters to find the optimal number of trees (M) and features (m)
  • Output: optimised M, m
  • For each tree i = 1 to M:
    a. Bootstrapped sample generation
      • Input: X_train_retained, y_train_retained
      • Draw samples with replacement from (X_train_retained, y_train_retained) to create a bootstrapped sample (X_bstraped, y_bstraped) = B*
      • Output: B*
    b. Build decision tree
      • Input: B*, m
      • Grow a decision tree, randomly selecting m features at each node
      • Output: decision tree T_i
    c. Build forest
      • Input: T_i
      • Build the forest from all the trees (T_1, T_2, …, T_M)
      • Output: (T_1, T_2, …, T_M) → RF_optimal
2.2. Model validation on the 20% validation set
  • Input: RF_optimal, X_train_retained_val, y_train_retained_val
  • Aggregate the predictions from all trees such that ŷ_train_retained_val = f̂(X_train_retained_val) = (1/M) Σ_{i=1}^{M} T_i(X_train_retained_val) = f̂_RF_val (288 × 1)
  • Compute MAE, MAPE, and RMSE between ŷ_train_retained_val and y_train_retained_val
  • Output: {MAE, MAPE, RMSE} ← performance metrics
3. Predicting using the test data
  • Input: RF_optimal, X_test_retained, y_test_retained
  • Use RF_optimal on X_test_retained to predict y_test_retained such that f̂(X_test_retained) = ŷ_test_retained
  • Output: ŷ_test_retained → f̂_RF_model (288 × 1)
4. Model performance assessment using the test data
  • Input: y_test_retained, f̂_RF_model
  • Calculate MAE, MAPE, RMSE, PINAW, the MZ test, and the DM test
  • Output: {MAE, MAPE, RMSE, PINAW, MZ test, DM test} ← performance metrics
5. Final output
  • Output: f̂_RF_model, performance metrics
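The bagging loop of steps 2.1a-c can be illustrated with a toy Python version in which each "tree" is replaced by a hypothetical stub learner that simply predicts the mean of its bootstrap sample; the forest prediction is then the average over all stubs, mirroring the aggregation formula in step 2.2.

```python
import random

def bootstrap_sample(data, rng):
    # Step a: draw len(data) items with replacement.
    return [rng.choice(data) for _ in range(len(data))]

def bagged_mean_predict(y_train, n_trees=100, seed=42):
    # Steps b-c with a stub learner: each "tree" predicts its bootstrap
    # sample mean, and the forest averages the stub predictions.
    rng = random.Random(seed)
    stubs = []
    for _ in range(n_trees):
        sample = bootstrap_sample(y_train, rng)
        stubs.append(sum(sample) / len(sample))
    return sum(stubs) / len(stubs)

y_train = [3.0, 5.0, 4.0, 6.0, 2.0]   # hypothetical training targets
pred = bagged_mean_predict(y_train)    # close to the sample mean (4.0)
```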
Algorithm A3: Wavelet transform (through MODWT)
1. Load relevant R libraries (waveslim, forecast, caret, kernlab)
2. Fit the entire retained data
  • Input: X_retained (1464 × 10), y_retained (1464 × 1)
  • Compute ŷ_fitted = f̂(X_retained) using the RVM model
  • Output: ŷ_fitted (1464 × 1)
3. Calculate residuals
  • Input: y_retained, ŷ_fitted
  • Compute the residuals y_r = y_retained − ŷ_fitted
  • Output: y_r (1464 × 1)
4. Set wavelet parameters
  • Input: y_r; db4 → wavelet_filter; 2 → n_level; periodic → boundary
  • Output: y_r, wavelet_filter, n_level, boundary
5. Perform wavelet decomposition using MODWT
  • Input: y_r, wavelet_filter, n_level, boundary
  • Decompose y_r into detailed and approximate signals
  • Output: A_2 (1464 × 1) → A (approximate subseries); D_1, D_2 (1464 × 2) → D (detailed subseries)
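The key property of the decomposition in step 5 is additivity: the approximate and detailed subseries sum back to the residual series. The Python sketch below demonstrates that structure with a deliberately simplified two-band split (a centred moving average as the "approximate" band and the remainder as the "detailed" band); it is only a stand-in, not the db4 MODWT computed by waveslim in the paper.

```python
# Illustrative two-band residual split in the spirit of Algorithm A3:
# a centred moving average gives a smooth "approximate" band A, and
# D = y_r - A is the "detailed" band, so A + D reconstructs y_r exactly.
# (The paper uses a db4 MODWT; this split only mimics its additivity.)

def two_band_split(y_r, width=3):
    half = width // 2
    n = len(y_r)
    approx = []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)
        approx.append(sum(y_r[lo:hi]) / (hi - lo))   # local average
    detail = [y - a for y, a in zip(y_r, approx)]    # what the average missed
    return approx, detail

residuals = [0.2, -0.1, 0.4, -0.3, 0.1, 0.0]   # hypothetical RVM residuals
A, D = two_band_split(residuals)
```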
Algorithm A4: AdaBoostRT (through boosting)
1. Load relevant R libraries (ReBoost)
2. Initialise parameters
  • τ_i = 1/n → weights; V → number of weak learners; δ = 0.38 → error threshold
  • Output: initialised τ_i, V, δ
3. Train (60%) and validate (20%) using 80% of the retained data
3.1. Tuning hyperparameters
  • For each i = 1 to V:
    a. Train weak learner
      • Input: X_train_retained, y_train_retained, τ_i
      • Fit a weak learner q_i to the weighted X_train_retained
      • Predict y_train_retained
      • Output: q_i(X_train_retained)
    b. Calculate error function
      • Input: y_train_retained, q_i(X_train_retained)
      • Compare q_i(X_train_retained) with y_train_retained: if the error ρ_i < δ, correct; otherwise, incorrect
      • Output: ρ_i (incorrectly classified)
    c. Update weights
      • Input: y_train_retained, q_i(X_train_retained), τ_i, ρ_i
      • Increase the weights of incorrectly predicted examples
      • Normalise the updated weights
      • Output: τ_{i+1}
3.2. Calculate model weights
  • Input: ρ_i
  • Compute the model weight ψ_i
  • Output: ψ_i (for the i-th weak learner)
3.3. Preserve trees and weights
  • Input: weak learners {q_1, q_2, …, q_V}, weights {ψ_1, ψ_2, …, ψ_V}
  • Store {q_1, q_2, …, q_V} → q and {ψ_1, ψ_2, …, ψ_V} → ψ
  • Output: (q, ψ) → AdaBoostRT_optimal
3.4. Model validation on the 20% validation set
  • Input: AdaBoostRT_optimal, X_train_retained_val, y_train_retained_val
  • Form the weighted ensemble of the predictions from all learners such that f̂(X_train_retained_val) = Σ_{i=1}^{V} ψ_i q_i(X_train_retained_val) = ŷ_train_retained_val = f̂_AdaBoostRT_val (288 × 1)
  • Compute MAE, MAPE, and RMSE
  • Output: {MAE, MAPE, RMSE} ← performance metrics
4. Predicting using the test data
  • Input: AdaBoostRT_optimal, X_test_retained (288 × 10), y_test_retained (288 × 1)
  • Compute f̂(X_test_retained) = Σ_{i=1}^{V} ψ_i q_i(X_test_retained) = ŷ_test_retained
  • Output: ŷ_test_retained → f̂_AdaBoostRT (288 × 1)
5. Model performance assessment using the test data
  • Input: y_test_retained, f̂_AdaBoostRT
  • Calculate MAE, MAPE, RMSE, PINAW, the MZ test, and the DM test
  • Output: {MAE, MAPE, RMSE, PINAW, MZ test, DM test} ← performance metrics
6. Final output
  • Output: f̂_AdaBoostRT, performance metrics
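One boosting round (steps 3.1b-c and 3.2) can be sketched as follows, after Solomatine and Shrestha's AdaBoost.RT update [48]: each example's absolute relative error is compared against the threshold δ, correctly predicted examples are down-weighted by β = ε^n, and the learner receives weight ψ = log(1/β). The data below are hypothetical, and the sketch assumes non-zero actuals and at least one "incorrect" example.

```python
import math

def adaboost_rt_round(y_true, y_pred, weights, delta=0.38, power=2):
    # Absolute relative error per example (assumes non-zero actuals).
    are = [abs(p - t) / abs(t) for p, t in zip(y_pred, y_true)]
    # Weighted error rate over the "incorrect" examples (ARE > delta);
    # assumed non-zero here so that beta and log(1/beta) are defined.
    eps = sum(w for w, e in zip(weights, are) if e > delta)
    beta = eps ** power
    # Down-weight the correctly predicted examples, then normalise.
    new_w = [w * beta if e <= delta else w for w, e in zip(weights, are)]
    total = sum(new_w)
    new_w = [w / total for w in new_w]
    model_weight = math.log(1.0 / beta)   # psi for this weak learner
    return new_w, model_weight

# Hypothetical round: the third example is badly mispredicted.
new_w, psi = adaboost_rt_round([1.0, 1.0, 1.0, 1.0],
                               [1.0, 1.0, 2.0, 1.0],
                               [0.25, 0.25, 0.25, 0.25])
```

After the round, the mispredicted example carries most of the weight, which is what forces the next weak learner to focus on it.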
Algorithm A5: RVM (through Bayesian framework)
1. Load relevant R libraries (kernlab, caret)
2. Initialise parameters
  • Set π (888 × 1) → precision weights; w (888 × 1) → weights; β⁻¹ = σ² → noise term in the regression; ω_0 → bias term
  • Output: π, σ², w, ω_0
3. Train (60%) and validate (20%) using 80% of the retained data
3.1. Tuning hyperparameters
  • Choose a basis and transform the data
    – Input: X_train_retained (888 × 10), y_train_retained (888 × 1)
    – Define a basis function such that K(X_train_retained) is (888 × 888)
    – Output: K(X_train_retained)
  • Fit the model on the training data
    – Input: π, w, ω_0, X_train_retained, K(X_train_retained)
    – Compute f̂(X_train_retained) = Σ_{i=1}^{888} w_i K(X_train_retained, X_i) + ω_0
    – Output: f̂(X_train_retained)
  • Update hyperparameters
    – Input: f̂(X_train_retained), y_train_retained, ω_0
    – Through the marginal likelihood, optimise w and update both σ² and π
    – Prune excessive weights and remove non-relevant vectors, adjusting the precisions π accordingly
    – Output: updated π, σ², w, ω_0
  • Check for convergence
    – Input: convergence criterion Σ_i |w_i^(n+1) − w_i^(n)| < ε_Thresh (based on the weights)
    – While convergence is not achieved, fit the model and update the hyperparameters (repeat from step 3)
    – If convergence is achieved, stop the process
    – Output: convergence decision (True or False); optimised parameters (π, σ², w, ω_0) → RVM_optimal
3.2. Model validation on the 20% validation set
  • Input: X_train_retained_val, RVM_optimal
  • Use RVM_optimal to fit f̂(X_train_retained_val) = ŷ_train_retained_val = f̂_RVM_val (288 × 1)
  • Compute MAE, MAPE, and RMSE
  • Output: ŷ_train_retained_val, {MAE, MAPE, RMSE} ← performance metrics
4. Predicting using the test data
  • Input: RVM_optimal, X_test_retained (288 × 10), y_test_retained (288 × 1)
  • Compute ŷ_test_retained = f̂(X_test_retained) = Σ_{i=1}^{288} w_i K(X_test_retained, X_i) + ω_0 using RVM_optimal
  • Output: ŷ_test_retained (288 × 1)
5. Evaluate model performance on the test data
  • Input: y_test_retained, ŷ_test_retained
  • Compute MAE, MAPE, RMSE, PINAW, the MZ test, and the DM test
  • Output: ŷ_test_retained → f̂_RVM_model, {MAE, MAPE, RMSE, PINAW, MZ test, DM test} ← performance metrics
6. Final output
  • Output: f̂_RVM_model, performance metrics
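The prediction step of Algorithm A5 (step 4) is just a weighted kernel expansion over the retained relevance vectors: f(x) = Σ_i w_i K(x, X_i) + ω_0. The Python sketch below shows that expansion with an RBF kernel; the relevance vectors, weights, bias, and kernel width γ are all hypothetical stand-ins for the quantities produced by the Bayesian training loop in step 3.

```python
import math

def rbf_kernel(x, xi, gamma=0.5):
    # Gaussian RBF kernel K(x, xi) = exp(-gamma * ||x - xi||^2).
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, xi)))

def rvm_predict(x, relevance_vectors, weights, bias, gamma=0.5):
    # f(x) = sum_i w_i * K(x, X_i) + omega_0 over the relevance vectors.
    return bias + sum(w * rbf_kernel(x, xi, gamma)
                      for w, xi in zip(weights, relevance_vectors))

# Hypothetical trained quantities (stand-ins for RVM_optimal).
relevance_vectors = [[0.0, 0.0], [1.0, 1.0]]
weights = [0.7, -0.3]
bias = 0.1

pred = rvm_predict([0.0, 0.0], relevance_vectors, weights, bias)
```

Sparsity is what makes this cheap at prediction time: pruning in step 3 leaves only a few relevance vectors in the sum.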
II. Variable distributions
Figure A1. Variable distributions.
III. Glossary
Available Dispatchable Capacity (Incl Non-Comm Units)—The capacity that is available from all dispatchable generation resources, and includes non-commercial generation, as it is dispatchable energy available to support the system.
CSP—Total contracted Concentrated Solar Power generation.
Dispatchable IPP OCGT—OCGT plant that is owned by an IPP and is dispatched by Eskom National Control.
Gen Unit Hours—The number of hours that one unit at pump storage stations can generate based on the amount of water still available in the dams or the number of hours that one unit at an OCGT power station can generate based on the fuel available at that power station.
GW—Gigawatt = 1000 megawatts.
GWh—Gigawatt-hour = 1000 MWh.
Hydro Generation—Generation from large hydropower stations, and sent out onto the Transmission network.
ILS—Interruptible Load Shed. This is consumer load(s) that can be contractually interrupted without notice or reduced by remote control or on instruction from Eskom National Control. Individual contracts place limitations on usage.
International Exports—Energy that is exported from RSA to neighbouring countries.
International Imports—Energy that is imported into RSA from neighbouring countries.
IOS—Interruption of Supply. It is all contracted as well as mandatory demand reduction resources utilised by Eskom National Control. This includes interruption of supply due to Transmission network faults.
IPP—Independent Power Producers that Eskom has contracts with.
kWh—Kilowatt-hour = 1000 watt-hours.
Load Factor—The ratio of the energy generated over a specific time versus the maximum generating capability over the same period.
MLR—Manual Load Reduction. It is an estimation of the demand that has been reduced due to load shedding and/or curtailment.
MW—Megawatt = 1 million watts.
MWh—Megawatt-hour = 1000 kWh.
Non-Dispatchable Conventional IPP—IPP that uses conventional fuel sources to generate energy. These IPPs are contracted with Eskom but not dispatched by Eskom National Control.
Nuclear Generation—Generation from nuclear power stations, and sent out onto the Transmission network.
OCGT—Open Cycle Gas Turbine. Generation from open cycle gas turbine power stations, and sent out onto the Transmission network. These power stations use diesel as their primary resource.
OCLF—Other Capability Loss Factor of Eskom plant. It is the ratio between the unavailable energy of the units that cannot be dispatched, due to constraints out of the power station management control, over a period compared to the total net installed capacity of all units over the same period.
Other RE—Generation from other smaller contracted renewables (small hydro, biomass, landfill gas, etc.).
PCLF—Planned Capability Loss Factor of Eskom plant. It is the ratio between the unavailable energy of the units that are out on planned maintenance over a period compared to the total net installed capacity of all units over the same period.
Pumped Water Generation—Generation from pumped storage power stations, and sent out onto the Transmission network.
Pumping—During off-peak periods and when the system allows, water is pumped from the bottom dams at pumped storage stations to the top dams so that this water is available to generate again. During this process, energy is used from the Transmission network.
PV—Total contracted Photovoltaic generation.
Residual Demand—The hourly average demand that needs to be supplied by all resources that can be dispatched by Eskom National Control. It includes Eskom generation, international imports, dispatchable IPPs and IOS. Normally expressed in MW.
Residual Energy—The total residual demand that is summated over a period of time. Normally expressed in MWh or GWh.
Residual Forecast—The forecast of what the expected residual demand will be in the future.
RSA Contracted Demand—The hourly average demand that needs to be supplied by all resources that Eskom has contracts with. It is the residual demand including demand supplied by self-dispatched generation (such as the renewables).
RSA Contracted Energy—The total RSA contracted demand that is summated over a period of time. Normally expressed in MWh or GWh.
RSA Contracted Forecast—The forecast of what the expected RSA contracted demand will be in the future.
SCO—Synchronous Condenser Operation. The energy used (MW per hour) to overcome the frictional losses when the plant is used to assist in stabilizing the network by supplying or absorbing reactive power.
Thermal Generation—Generation from coal-fired power stations, and sent out onto the Transmission network.
Total Available Capacity (Incl Non-Comm Units and Renewables)—The capacity that is available from all generation resources that Eskom has contracts with, and includes non-commercial generation, as it is energy available to support the system.
UCLF—Unplanned Capability Loss Factor of Eskom plant. It is the ratio between the unavailable energy of the units that are out on unplanned outages over a period compared to the total net installed capacity of all units over the same period.
Wind—Total contracted Wind generation.

References

  1. Maythem, A.; Maryam, A. A New Load Forecasting Model Considering Planned Load Shedding Effect. Int. J. Energy Sect. Manag. 2018, 13, 149–165. [Google Scholar] [CrossRef]
  2. Onaolapo, A.K.; Carpanen, R.P.; Dorrell, D.G.; Ojo, E.E. A Comparative Assessment of Conventional and Artificial Neural Networks Methods for Electricity Outage Forecasting. Energies 2022, 15, 511. [Google Scholar] [CrossRef]
  3. Oladunni, O.J.; Mpofu, K.; Olanrewaju, O.A. Greenhouse Gas Emissions and Its Driving Forces in the Transport Sector of South Africa. Energy Rep. 2022, 8, 2052–2061. [Google Scholar] [CrossRef]
  4. Chikobvu, D.; Mamba, M. Modelling Emissions from Eskom’s Coal-Fired Power Stations Using Generalised Linear Models. J. Energy S. Afr. 2023, 34, 1–14. [Google Scholar] [CrossRef]
  5. Rakotonirainy, R.G.; Durbach, I.; Nyirenda, J. Considering Fairness in the Load Shedding Scheduling Problem. Orion 2019, 35, 127–144. [Google Scholar] [CrossRef]
  6. Inglesi, R.; Pouris, A. Forecasting Electricity Demand in South Africa: A Critique of Eskom’s Projections. S. Afr. J. Sci. 2010, 106, 50–53. [Google Scholar] [CrossRef]
  7. Pretorius, I.; Piketh, S.; Burger, R. The Impact of the South African Energy Crisis on Emissions. WIT Trans. Ecol. Environ. 2015, 198, 255–264. [Google Scholar]
  8. Jaech, A.; Zhang, B.; Ostendorf, M.; Kirschen, D.S. Real-Time Prediction of the Duration of Distribution System Outages. IEEE Trans. Power Syst. 2018, 34, 773–781. [Google Scholar] [CrossRef]
  9. Pombo-van Zyl, N. Warning: Stage 2 Loadshedding Returns States Eskom. ESI Afr. Afr. Power J. 2020. Available online: https://www.esi-africa.com/industry-sectors/transmission-and-distribution/warning-high-risk-of-loadshedding-returns-states-eskom/ (accessed on 9 October 2023).
  10. Marta, N.; Agnieszka, T. Load Shedding and the Energy Security of Republic of South Africa. J. Pol. Saf. Reliab. Assoc. Summer Saf. Reliab. Semin. 2015, 6, 99–108. Available online: https://bibliotekanauki.pl/articles/2069278 (accessed on 17 June 2023).
  11. IEA. Electricity Market Report—January 2022; IEA: Paris, France, 2022; Available online: https://www.iea.org/reports/electricity-market-report-january-2022 (accessed on 15 September 2023).
  12. Inglesi-Lotz, R. The Impact of Electricity Shortage on South Africa’s Economy. National Science and Technology Forum (NSTF). 2021. Available online: https://nstf.org.za/wp-content/uploads/2022/05/NSTF-2021-Loadshedding-Roula-Inglesi-Lotz.pdf (accessed on 17 December 2023).
  13. Sivhugwana, K.S.; Ranganai, E. An Ensemble Approach to Short-Term Wind Speed Predictions Using Stochastic Methods, Wavelets and Gradient Boosting Decision Trees. Wind 2024, 4, 44–67. [Google Scholar] [CrossRef]
  14. Gordon, R.; Gareth, E. Offshore Wind Energy—South Africa’s Untapped Resource. J. Energy S. Afr. 2020, 31, 26–42. [Google Scholar] [CrossRef]
  15. Fluri, T.P. The Potential of Concentrating Solar Power in South Africa. Energy Policy 2009, 37, 5075–5080. [Google Scholar] [CrossRef]
  16. Bosch, J.; Staffell, I.; Hawkes, A.D. Temporally Explicit and Spatially Resolved Global Offshore Wind Energy Potentials. Energy 2018, 163, 766–781. [Google Scholar] [CrossRef]
  17. Akinbami, O.M.; Oke, S.R.; Bodunrin, M.O. The State of Renewable Energy Development in South Africa: An Overview. Alex. Eng. J. 2021, 60, 5077–5093. [Google Scholar] [CrossRef]
  18. Statistics South Africa (Stats SA). Electricity, Gas and Water Supply Industry Report 2021. Available online: https://www.statssa.gov.za/publications/Report-41-01-02/Report-41-01-022021.pdf (accessed on 15 September 2023).
  19. Pouris, A. Energy and Fuels Research in South African Universities: A Comparative Assessment. Open Inf. Sci. J. 2008, 1, 1–9. Available online: https://repository.up.ac.za/bitstream/handle/2263/5990/Pouris_Energy(2008).pdf?sequence=1 (accessed on 17 October 2024). [CrossRef]
  20. Onaolapo, A.K.; Pillay Carpanen, R.; Dorrell, D.G.; Ojo, E.E. A Comparative Evaluation of Conventional and Computational Intelligence Techniques for Forecasting Electricity Outage. In Proceedings of the Southern African Universities Power Engineering Conference/Robotics and Mechatronics/Pattern Recognition Association of South Africa (SAUPEC/RobMech/PRASA), Potchefstroom, South Africa, 27–29 January 2021; pp. 1–6. [Google Scholar]
  21. Pahwa, A. Effect of Environmental Factors on Failure Rate of Overhead Distribution Feeders. In Proceedings of the IEEE Power Engineering Society General Meeting, Denver, CO, USA, 6–10 June 2004; pp. 691–692. [Google Scholar] [CrossRef]
  22. Tartibu, L.K.; Kabengele, K.T. Forecasting Net Energy Consumption of South Africa Using Artificial Neural Network. In Proceedings of the International Conference on the Industrial and Commercial Use of Energy (ICUE 2018), Cape Town, South Africa, 13–15 August 2018; pp. 16–22. [Google Scholar]
  23. Dahal, K.P. A Review of Maintenance Scheduling Approaches in Deregulated Power Systems. In Proceedings of the International Conference on Power Systems (ICPS 2004), Kathmandu, Nepal, 3–5 November 2004; pp. 565–570. Available online: http://hdl.handle.net/10454/2502 (accessed on 11 March 2023).
  24. Hou, H.; Zhu, S.; Geng, H.; Li, M.; Xie, Y.; Zhu, L.; Huang, Y. Spatial Distribution Assessment of Power Outage under Typhoon Disasters. Int. J. Electr. Power Energy Syst. 2021, 132, 107169. [Google Scholar] [CrossRef]
  25. Mamun, A.A.; Sohel, M.; Mohammad, N.; Haque Sunny, M.S.; Dipta, D.R.; Hossain, E. A Comprehensive Review of the Load Forecasting Techniques Using Single and Hybrid Predictive Models. IEEE Access 2020, 8, 134911–134939. [Google Scholar] [CrossRef]
  26. Oh, S.; Kong, J.; Choi, M.; Jung, J. Data-Driven Prediction Method for Power Grid State Subjected to Heavy-Rain Hazards. Appl. Sci. 2020, 10, 4693. [Google Scholar] [CrossRef]
  27. Han, S.R.; Guikema, S.D.; Quiring, S.M. Improving the Predictive Accuracy of Hurricane Power Outage Forecasts Using Generalized Additive Models. Risk Anal. 2009, 29, 1443–1453. [Google Scholar] [CrossRef]
  28. Kankanala, P.; Das, S.; Pahwa, A. AdaBoost+: An Ensemble Learning Approach for Estimating Weather-Related Outages in Distribution Systems. IEEE Trans. Power Syst. 2014, 29, 359–367. [Google Scholar] [CrossRef]
  29. Kankanala, P.; Pahwa, A.; Das, S. Regression Models for Outages Due to Wind and Lightning on Overhead Distribution Feeders. In Proceedings of the IEEE PES General Meeting 2011, Detroit, MI, USA, 24–28 July 2011; p. 3. [Google Scholar] [CrossRef]
  30. Kankanala, P.; Pahwa, A.; Das, S. Exponential Regression Models for Wind and Lightning Caused Outages on Overhead Distribution Feeders. In Proceedings of the North America Power Symposium (NAPS), Boston, MA, USA, 4–6 August 2011. [Google Scholar] [CrossRef]
  31. Liu, H.; Davidson, R.; Rosowsky, D.; Stedinger, J. Negative Binomial Regression of Electric Power Outages in Hurricanes. J. Infrastruct. Syst. 2005, 11, 258–267. [Google Scholar] [CrossRef]
  32. Das, S.; Kankanala, P.; Pahwa, A. Outage Estimation in Electric Power Distribution Systems Using a Neural Network Ensemble. Energies 2021, 14, 4797. [Google Scholar] [CrossRef]
  33. Guikema, S.D.; Quiring, S.M.; Han, S.R. Prestorm Estimation of Hurricane Damage to Electric Power Distribution Systems. Risk Anal. 2010, 30, 1744–1752. [Google Scholar] [CrossRef] [PubMed]
  34. Rizvi, M. Leveraging Deep Learning Algorithms for Predicting Power Outages and Detecting Faults: A Review. Adv. Res. 2023, 25, 80–88. [Google Scholar] [CrossRef]
  35. Tipping, M.E. Sparse Bayesian Learning and the Relevance Vector Machine. J. Mach. Learn. Res. 2001, 1, 211–244. Available online: http://www.jmlr.org/papers/volume1/tipping01a/tipping01a.pdf (accessed on 3 August 2023).
  36. Tzikas, D.G.; Wei, L.; Likas, A.C.; Yang, Y.; Galatsanos, N.P. A Tutorial on Relevance Vector Machines for Regression and Classification with Applications. EURASIP J. Adv. Signal Process. 2006, 17, 4. [Google Scholar]
  37. Sivhugwana, K.S.; Ranganai, E. Short-Term Wind Speed Prediction via Sample Entropy: A Hybridisation Approach against Gradient Disappearance and Explosion. Computation 2024, 12, 163. [Google Scholar] [CrossRef]
  38. Yuan, S.; Quiring, M.S.; Zhu, L.; Huang, Y.; Wang, J. Development of a Typhoon Power Outage Model in Guangdong, China. Int. J. Electr. Power Energy Syst. 2020, 117, 105711. [Google Scholar] [CrossRef]
  39. Wanik, D.W.; Anagnostou, E.N.; Hartman, B.M.; Frediani, M.E.B.; Astitha, M. Using Machine Learning Methods to Improve Prediction of Weather-Related Power Outages. Electr. Power Syst. Res. 2017, 146, 236–245. [Google Scholar] [CrossRef]
  40. Motepe, S.; Hasan, A.N.; Shongwe, T. Forecasting the Total South African Unplanned Capability Loss Factor Using an Ensemble of Deep Learning Techniques. Energies 2022, 15, 2546. [Google Scholar] [CrossRef]
  41. Bruce, L.M.; Koger, C.H.; Li, J. Dimensionality Reduction of Hyperspectral Data Using Discrete Wavelet Transform Feature Extraction. IEEE Trans. Geosci. Remote Sens. 2002, 40, 2331–2338. [Google Scholar] [CrossRef]
  42. Zou, H.; Hastie, T. Regularization and Variable Selection via the Elastic Net. J. R. Stat. Soc. Ser. B Stat. Methodol. 2005, 67, 301–320. [Google Scholar] [CrossRef]
  43. Ranganai, E.; Mudhombo, I. Variable Selection and Regularization in Quantile Regression via Minimum Covariance Determinant Based Weights. Entropy 2020, 23, 33. [Google Scholar] [CrossRef]
  44. Natras, R.; Soja, B.; Schmidt, M. Ensemble Machine Learning of Random Forest, AdaBoost and XGBoost for Vertical Total Electron Content Forecasting. Remote Sens. 2022, 14, 3547. [Google Scholar] [CrossRef]
  45. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  46. Bühlmann, P. Methods. In Handbook of Computational Statistics; Gentle, J., Härdle, W., Mori, Y., Eds.; Springer: Berlin/Heidelberg, Germany, 2012. [Google Scholar] [CrossRef]
  47. Haijian, S.; Wei, H.; Xing, D.; Song, X. Short-Term Wind Speed Forecasting Using Wavelet Transformation and AdaBoosting Neural Networks in Yunnan Wind Farm. IET Renew. Power Gener. 2016, 11, 374–381. [Google Scholar] [CrossRef]
  48. Solomatine, D.P.; Shrestha, D.L. AdaBoost_RT: A Boosting Algorithm for Regression Problems. In Proceedings of the 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541), Budapest, Hungary, 25–29 July 2004; IEEE: New York, NY, USA, 2004; Volume 2, pp. 1163–1168. [Google Scholar] [CrossRef]
  49. Zhang, P.; Yang, Z. A Robust AdaBoost_RT Based Ensemble Extreme Learning Machine. Math. Probl. Eng. 2015, 2015, 260970. [Google Scholar] [CrossRef]
  50. Li, R.; Sun, H.; Wei, X.; Ta, W.; Wang, H. Lithium Battery State-of-Charge Estimation Based on AdaBoost_RT-RNN. Energies 2022, 15, 6056. [Google Scholar] [CrossRef]
  51. Karatzoglou, A.; Smola, A.; Hornik, K.; Zeileis, A. kernlab—An S4 Package for Kernel Methods in R. J. Stat. Softw. 2004, 11, 1–20. [Google Scholar] [CrossRef]
  52. Fletcher, T. Relevance Vector Machines Explained. Available online: https://www.di.fc.ul.pt/~jpn/r/PRML/chp7/Fletcher_RVM_Explained.pdf (accessed on 14 March 2023).
  53. Hyndman, R.J.; Athanasopoulos, G. Forecasting: Principles and Practice, 2nd ed.; OTexts: Melbourne, Australia, 2021.
  54. Hou, P.S.; Fadzil, L.M.; Manickam, S.; Al-Shareeda, M.A. Vector Autoregression Model-Based Forecasting of Reference Evapotranspiration in Malaysia. Sustainability 2023, 15, 3675.
  55. Bhattarai, B.P.; Paudyal, S.; Luo, Y.; Mohanpurkar, M.; Cheung, K.; Tonkoski, R.; Hovsapian, R.; Myers, K.S.; Zhang, R.; Zhao, P.; et al. Big Data Analytics in Smart Grids: State-of-the-Art, Challenges, Opportunities, and Future Directions. IET Smart Grid 2019, 2, 141–154.
  56. Mohamed, M.A.; Eltamaly, A.M.; Farh, H.M.; Alolah, A.I. Energy Management and Renewable Energy Integration in Smart Grid System. In Proceedings of the 2015 IEEE International Conference on Smart Energy Grid Engineering (SEGE), Oshawa, ON, Canada, 17–19 August 2015; IEEE: New York, NY, USA, 2015.
  57. Arts, L.; van den Broek, E.L. The Fast Continuous Wavelet Transformation (fCWT) for Real-Time, High-Quality, Noise-Resistant Time-Frequency Analysis. Nat. Comput. Sci. 2022, 2, 47–58.
  58. Yarmohammadi, M. A Filter Based Fisher g-Test Approach for Periodicity Detection in Time Series Analysis. Sci. Res. Essays 2011, 6, 3717–3723.
  59. Hong, X.; Mitchell, R.; Di Fatta, G. Simplex Basis Function Based Sparse Least Squares Support Vector Regression. Neurocomputing 2019, 330, 394–402.
  60. Gensler, A. Wind Power Ensemble Forecasting: Performance Measures and Ensemble Architectures for Deterministic and Probabilistic Forecasts. Ph.D. Thesis, University of Kassel, Kassel, Germany, 2018.
  61. Sun, X.; Wang, Z.; Hu, J. Prediction Interval Construction for By-Product Gas Flow Forecasting Using Optimized Twin Extreme Learning Machine. Math. Probl. Eng. 2017, 2017, 12.
  62. Diebold, F.X.; Mariano, R. Comparing Predictive Accuracy. J. Bus. Econ. Stat. 1995, 13, 253–265.
  63. Zhou, Q.; Lv, Z.; Zhang, G. A Combined Forecasting System Based on Modified Multi-Objective Optimization for Short-Term Wind Speed and Wind Power Forecasting. Appl. Sci. 2021, 11, 9383.
  64. Sivhugwana, K.S.; Ranganai, E. Wind Speed Forecasting with Differentially Evolved Minimum-Bandwidth Filters and Gated Recurrent Units. Forecasting 2025, 7, 27.
  65. Hasanat, S.M.; Ullah, K.; Yousaf, H.; Munir, K.; Abid, S.; Bokhari, S.; Aziz, M.M.; Naqvi, S.F.M.; Ullah, Z. Enhancing Short-Term Load Forecasting with a CNN-GRU Hybrid Model: A Comparative Analysis. IEEE Access 2024, 12, 184132–184141.
  66. Alhussein, M.; Aurangzeb, K.; Haider, S.I. Hybrid CNN-LSTM Model for Short-Term Individual Household Load Forecasting. IEEE Access 2020, 8, 180544–180557.
  67. Marino, D.L.; Amarasinghe, K.; Manic, M. Building Energy Load Forecasting Using Deep Neural Networks. In Proceedings of the 42nd Annual Conference of the IEEE Industrial Electronics Society (IECON), Florence, Italy, 23–26 October 2016; IEEE: Piscataway, NJ, USA, 2016.
  68. Tan, Z.; Zhang, J.; He, Y.; Zhang, Y.; Xiong, G.; Liu, Y. Short-Term Load Forecasting Based on Integration of SVR and Stacking. IEEE Access 2020, 8, 227719–227728.
  69. Li, S.; Chen, W. A Study on Interpretable Electric Load Forecasting Model with Spatiotemporal Feature Fusion Based on Attention Mechanism. Technologies 2025, 13, 219.
  70. Ahsan, M.M.; Mahmud, M.A.P.; Saha, P.K.; Gupta, K.D.; Siddique, Z. Effect of Data Scaling Methods on Machine Learning Algorithms and Model Performance. Technologies 2021, 9, 52.
  71. Pinheiro, J.M.H.; de Oliveira, S.V.B.; Silva, T.H.S.; Saraiva, P.A.R.; de Souza, E.F.; Godoy, R.V.; Becker, M. The Impact of Feature Scaling in Machine Learning: Effects on Regression and Classification Tasks. arXiv 2025, arXiv:2506.08274.
  72. Wright, M.A. Wind Speed Climatology in the Northern, Western, and Eastern Capes of South Africa: Implications for Wind Power. Ph.D. Thesis, University of the Witwatersrand, Johannesburg, South Africa, 2021.
Figure 1. Hourly unplanned outage levels plot for the period 1 March 2021 to 30 April 2022 (source: own image).
Figure 2. Schematic representation of the proposed stacking hybrid RVM-WT-AdaBoostRT-RF model (source: own image).
Figure 3. The time plot, density plot, boxplot, and Q-Q plot for power outage data for Autumn (a), Winter (b), Spring (c), Summer (d), and Autumn 2022 (e) datasets (source: own image). Blue lines represent Q-Q lines and sky-blue boxes indicate interquartile ranges.
Figure 4. Level 2 DB4 wavelet decomposition of the RVM residuals for Autumn (top left panel), Winter (top right panel), Spring (middle left panel), Summer (middle right panel), and Autumn 2022 (bottom centre panel) datasets (source: own image).
Figure 5. Comparison of models’ predictions and actual power outage levels for Autumn (top left panel), Winter (top right panel), Spring (middle left panel), Summer (middle right panel), and Autumn 2022 (bottom centre panel) datasets (source: own image).
Figure 6. Box plot comparison of models’ residuals for Autumn (top left panel), Winter (top right panel), Spring (middle left panel), Summer (middle right panel), and Autumn 2022 (bottom centre panel) datasets (source: own image).
Table 3. Power grid data description.

| Variable | Keynote |
|---|---|
| x; y | Independent variable; dependent variable |
| x_ORFL; x_RF; x_RSA.CF; x_DG; x_IE; x_RD; x_RSA.CD | ORFL = Original Residual Forecast Before Lockdown; RF = Residual Forecast; RSA.CF = Republic of South Africa (RSA) Contracted Forecast; DG = Dispatchable Generation; IE = International Exports; RD = Residual Demand; RSA.CD = RSA Contracted Demand |
| x_IM; x_TG; x_NG; x_EGG; x_E.OCGT.G; x_HWG; x_ILSU; x_MLR; x_IOS; x_D.IPP.OCGT; x_E.GSCO; x_E.OCGT.SCO; x_PWSCO.P; x_PS; x_IEC | IM = International Imports; TG = Thermal Generation; NG = Nuclear Generation; EGG = Eskom Gas Generation; E.OCGT.G = Eskom Open Cycle Gas Turbine Generation; HWG = Hydro Water Generation; PWG = Pumped Water Generation; ILSU = Interruptible Load Shed Usage; MLR = Manual Load Reduction; IOS = Interruption of Supply excl. ILS and MLR; D.IPP.OCGT = Dispatchable Independent Power Producer Eskom Open Cycle Gas Turbine; E.GSCO = Eskom Gas Synchronous Condenser Operation; E.OCGT.SCO = Eskom Open Cycle Gas Turbine Synchronous Condenser Operation; PWSCO.P = Pumped Water Synchronous Condenser Operation Pumping; PS = Pump Storage; IEC = Installed Eskom Capacity |
| x_DGUH; x_PGUH; x_IGUH | DGUH = Drakensberg Generation Unit Hours; PGUH = Palmiet Generation Unit Hours; IGUH = Ingula Generation Unit Hours |
| x_WIND; x_PV; x_CSP; x_ORE; x_TRE; x_WIC; x_PVIC; x_CSPIC; x_OREIC; x_TREIC | PV = Photovoltaic; CSP = Concentrated Solar Power; ORE = Other Renewable; TRE = Total Renewable; WIC = Wind Installed Capacity; PVIC = PV Installed Capacity; CSPIC = CSP Installed Capacity; OREIC = Other Renewable Installed Capacity; TREIC = Total Renewable Installed Capacity |
| x_TPCLF; x_TUCLF; x_TOCLF; x_E.GSCO; x_lag1; x_lag2; x_lag24; x_NCS | TPCLF = Total Planned Capability Loss Factor of Eskom plant; TUCLF = Total Unplanned Capability Loss Factor of Eskom plant; TOCLF = Total Other Capability Loss Factor of Eskom plant; lag 1 = TUCLF.OCLF 1 h ago (captures immediate fluctuations); lag 2 = TUCLF.OCLF 2 h ago (captures short-term trends); lag 24 = TUCLF.OCLF 24 h ago (captures daily patterns); NCS = Non-comm sentout |
| y_TUCLF.OCLF | TUCLF.OCLF = Total unplanned power outage including TOCLF |
Table 4. Sample breakdown for model training and testing.

| Dataset | Date | Sample | Training (80%) | Test (20%) |
|---|---|---|---|---|
| Autumn | 1 March–30 April 2021 | 1464 | 1176 | 288 |
| Winter | 1 June–31 July 2021 | 1464 | 1176 | 288 |
| Spring | 1 September–31 October 2021 | 1464 | 1176 | 288 |
| Summer | 1 December 2021–31 January 2022 | 1488 | 1195 | 293 |
| Autumn 2022 | 1 March 2022–30 April 2022 | 1464 | 1176 | 288 |
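The split in Table 4 is chronological: the test block is strictly later than the training block, so no future information leaks into training. A minimal sketch (note that the paper's reported 1176/288 split of 1464 observations is slightly above a strict 80% cut, so the exact cut index is an assumption):

```python
# Chronological train/test split, as in Table 4. Shuffling would leak
# future values into the training set, so the series is cut in order.

def chronological_split(series, train_frac=0.8):
    """Return (train, test) with the test block strictly later in time."""
    cut = round(len(series) * train_frac)
    return series[:cut], series[cut:]

hourly = list(range(1464))  # two months of hourly observations
train, test = chronological_split(hourly)
```

A strict 80% cut of 1464 observations gives 1171/293; the paper's 1176/288 suggests a slightly different boundary (e.g., a whole-day cut), which is not specified.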
Table 5. Model breakdown and contribution to the proposed RVM-WT-AdaBoostRT-RF.

| Model | Contribution to the Strategy |
|---|---|
| LASSO | Provides regularisation, variable selection, and dimension reduction, so that unplanned power outages are predicted from only the most relevant and significant variables. |
| RVM | A sparse Bayesian learning technique: a probabilistic framework that requires fewer support vectors while delivering accuracy and generalisation similar to SVMs. RVMs capture complex data behaviour (random fluctuations, nonlinearity, intermittence, etc.) while guarding against overfitting, making them well suited to regression on heterogeneous power grid data. |
| WT | Compatible with both the time and frequency domains, WTs effectively suppress noise and reveal complex patterns in power outage data. The RVM residuals are therefore decomposed with a WT, which is efficient and handles nonstationary fluctuations well; the resulting subseries are statistically better behaved and easier to predict, enhancing the model's predictive power. |
| AdaBoostRT | Handles the highly volatile residuals: taking the decomposed subseries as input, AdaBoostRT forecasts each residual subseries accurately while keeping model bias to a minimum. |
| RF | Besides selecting the top 10 most important variables (pivotal for robust season-specific modelling), RFs capture nonlinearity efficiently while preventing overfitting and minimising variance. RF is therefore used as the meta-model that ensembles the RVM, RF, AdaBoostRT, and residual forecasts into the final forecast, minimising error accumulation and enhancing overall robustness. |
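The stacking step in Table 5 treats the base models' forecasts as meta-features for a second-stage learner. The paper's meta-model is an RF; in the dependency-free sketch below a least-squares blender stands in for it (an assumption made purely so the stacking layout is visible), with two toy base forecasts:

```python
# Minimal stacking sketch. The paper blends RVM, RF, AdaBoostRT, and
# residual forecasts with an RF meta-model; here an ordinary
# least-squares blender stands in so the example needs no libraries.

def fit_blend_weights(base_preds, y):
    """Solve the 2x2 normal equations (X^T X) w = X^T y for two base models."""
    x1, x2 = base_preds
    a = sum(v * v for v in x1)
    b = sum(u * v for u, v in zip(x1, x2))
    d = sum(v * v for v in x2)
    r1 = sum(u * v for u, v in zip(x1, y))
    r2 = sum(u * v for u, v in zip(x2, y))
    det = a * d - b * b
    return ((d * r1 - b * r2) / det, (a * r2 - b * r1) / det)

def blend(base_preds, w):
    """Second-stage forecast: weighted combination of base forecasts."""
    return [w[0] * u + w[1] * v for u, v in zip(*base_preds)]

# Toy targets and two imperfect base forecasts (one biased low, one high).
y_true = [10.0, 12.0, 11.0, 13.0]
m1 = [9.0, 11.0, 10.0, 12.0]
m2 = [11.0, 13.0, 12.0, 14.0]
w = fit_blend_weights((m1, m2), y_true)
y_hat = blend((m1, m2), w)
```

Here the blender learns equal weights (0.5, 0.5), cancelling the opposite biases of the two base models — the same intuition that motivates stacking heterogeneous learners in the hybrid.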
Table 6. Model parameter settings.

| Model | Libraries | Method | Parameter | Optimal Range |
|---|---|---|---|---|
| LASSO | glmnet | Variable selection | lambda; family; nlambda | 0–2; "gaussian"; 100–500 |
| RF | caret, ranger, randomForest | Bagging ensemble | mtry; ntree; nodesize | 1–10; 100–1000; 1–15 |
| RVM | kernlab (rvm) | Bayesian inference | kernel; sigma; degree | ("anovadot", "rbfdot"); 0–2; 1–2 |
| AdaBoostRT | ReBoost (AdaBoostRT) | Boosting ensemble | thr; power; t_final | 0.001–0.3; 0–2; 30–500 |
| WT | waveslim (modwt) | Signal decomposition (noise reduction) | wf; n.levels; boundary | 'db4'; 2; 'periodic' |
| VAR | vars (VAR) | Autoregression | lag order p | 1–3 |
| Hybrid | — | Stacked | — | — |
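The WT row in Table 6 specifies a level-2 DB4 MODWT (waveslim's modwt in R) applied to the RVM residuals. The dependency-free sketch below substitutes the simpler orthonormal Haar filter for DB4 (an assumption, chosen so the analysis/synthesis steps stay short) to show how a level-2 decomposition splits a residual series into one coarse approximation and two detail subseries, and reconstructs it exactly:

```python
import math

# Level-2 orthonormal Haar DWT sketch. The paper uses a level-2 DB4
# MODWT; Haar is substituted here so the decomposition logic is
# visible without external libraries.

S = math.sqrt(2.0)

def haar_step(x):
    """One analysis step: pairwise approximations and details."""
    approx = [(x[i] + x[i + 1]) / S for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / S for i in range(0, len(x), 2)]
    return approx, detail

def inverse_step(approx, detail):
    """One synthesis step: recombine approximations and details."""
    out = []
    for a, d in zip(approx, detail):
        out += [(a + d) / S, (a - d) / S]
    return out

def dwt2(x):
    """Two-level decomposition: returns (A2, D2, D1)."""
    a1, d1 = haar_step(x)
    a2, d2 = haar_step(a1)
    return a2, d2, d1

def idwt2(a2, d2, d1):
    """Perfect reconstruction from the level-2 subseries."""
    return inverse_step(inverse_step(a2, d2), d1)

residuals = [4.0, 6.0, 10.0, 12.0, 14.0, 14.0, 16.0, 18.0]
a2, d2, d1 = dwt2(residuals)
recon = idwt2(a2, d2, d1)
```

In the hybrid, each subseries (the smoother A2 and the noisier D1, D2) is forecast separately by AdaBoostRT; longer filters such as DB4 and the shift-invariant MODWT refine this idea but follow the same split-forecast-recombine pattern.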
Table 7. Summary statistics for the datasets (in MW).

| Dataset | Min | Q1 | Median | Mean | Q3 | Max | Std.Dev | Kurtosis | Skewness |
|---|---|---|---|---|---|---|---|---|---|
| Autumn | 8410 | 11,184 | 11,931 | 11,863 | 12,619 | 14,867 | 986.3464 | 0.0874 | −0.4222 |
| Winter | 8957 | 10,914 | 11,754 | 12,076 | 13,303 | 15,862 | 1562.242 | −0.7764 | 0.3735 |
| Spring | 9819 | 11,966 | 13,044 | 13,055 | 14,193 | 16,573 | 1503.281 | −0.7384 | 0.0026 |
| Summer | 10,144 | 12,823 | 14,219 | 13,928 | 14,924 | 17,558 | 1396.025 | −0.5362 | −0.3862 |
| Autumn 2022 | 10,981 | 12,829 | 13,749 | 13,793 | 14,676 | 17,022 | 1245.443 | −0.6943 | 0.1651 |
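The kurtosis and skewness columns in Table 7 come from the usual third and fourth standardised moments. A minimal sketch using population-moment estimators (an assumption — R packages sometimes apply bias corrections, so values can differ slightly):

```python
# Moment-based skewness and excess kurtosis, as reported in Table 7.
# Population-moment estimators are assumed here.

def skewness(x):
    """Third standardised moment: sign indicates tail direction."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    return m3 / m2 ** 1.5

def excess_kurtosis(x):
    """Fourth standardised moment minus 3 (0 for a Gaussian)."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m4 / m2 ** 2 - 3.0

sym = [1.0, 2.0, 3.0, 4.0, 5.0]
skew_val = skewness(sym)           # 0 for a symmetric sample
kurt_val = excess_kurtosis(sym)    # negative: flatter than Gaussian
```

The negative kurtosis values in Table 7 (all seasons except Autumn 2021) indicate flatter-than-Gaussian outage distributions, consistent with the boxplots in Figure 3.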
Table 8. Performance indicators for the developed models.

Point forecast evaluation

RMSE (MW)

| Model | Autumn | Winter | Spring | Summer | Autumn 2022 |
|---|---|---|---|---|---|
| Hybrid | **262.6653** | **264.5506** | **394.6098** | **379.0801** | **260.6709** |
| RF | 403.1206 | 326.5305 | 740.894 | 678.7496 | 383.0104 |
| RVM | 414.4714 | 302.2173 | 549.6189 | 608.635 | 301.7799 |
| AdaBoostRT | 390.2795 | 318.0112 | 714.8113 | 637.2507 | 359.4349 |
| VAR | 2491.183 | 942.7002 | 1100.073 | 1201.092 | 812.6005 |
| Naive | 1214.92 | 1027.169 | 1075.973 | 3252.646 | 993.0945 |

MAE (MW)

| Model | Autumn | Winter | Spring | Summer | Autumn 2022 |
|---|---|---|---|---|---|
| Hybrid | **201.1949** | **198.5543** | **288.1561** | **273.6507** | **190.9103** |
| RF | 287.4192 | 253.7929 | 538.0478 | 519.6454 | 289.1329 |
| RVM | 298.4295 | 232.7954 | 385.2204 | 543.8472 | 227.4744 |
| AdaBoostRT | 252.9515 | 239.721 | 500.6655 | 469.6258 | 264.6619 |
| VAR | 2294.874 | 800.4576 | 893.751 | 990.4461 | 695.3962 |
| Naive | 986.413 | 908.9877 | 902.3139 | 3013.444 | 781.4761 |

MAPE (%)

| Model | Autumn | Winter | Spring | Summer | Autumn 2022 |
|---|---|---|---|---|---|
| Hybrid | **1.8197** | **1.6408** | **1.9897** | **2.2190** | **1.2744** |
| RF | 2.5850 | 2.1058 | 3.7770 | 4.0968 | 1.9376 |
| RVM | 2.9030 | 1.9925 | 2.6684 | 4.6424 | 1.5185 |
| AdaBoostRT | 2.2760 | 1.9778 | 3.4994 | 3.7291 | 1.7754 |
| VAR | 25.1661 | 6.7485 | 6.3391 | 7.9346 | 4.5550 |
| Naive | 8.2233 | 7.4672 | 6.2169 | 19.2540 | 5.4756 |

Residual analysis

Standard deviation (MW)

| Model | Autumn | Winter | Spring | Summer | Autumn 2022 |
|---|---|---|---|---|---|
| Hybrid | **257.5268** | **257.9521** | **376.4643** | 379.1566 | **261.116** |
| RF | 368.4571 | 322.784 | 584.5953 | 474.7601 | 326.8649 |
| RVM | 369.7274 | 283.4306 | 445.0868 | **279.8739** | 301.1337 |
| AdaBoostRT | 374.4322 | 318.5555 | 606.9506 | 533.957 | 325.5703 |
| VAR | 970.9904 | 931.584 | 1080.136 | 1188.798 | 722.2145 |
| Naive | 1056.375 | 1007.025 | 1070.423 | 1226.374 | 767.169 |

Skewness/Error direction

| Model | Autumn | Winter | Spring | Summer | Autumn 2022 |
|---|---|---|---|---|---|
| Hybrid | Underestimate | Underestimate | Underestimate | Overestimate | Underestimate |
| RF | Overestimate | Underestimate | Underestimate | Overestimate | Underestimate |
| RVM | Underestimate | Underestimate | Underestimate | Underestimate | Underestimate |
| AdaBoostRT | Overestimate | Overestimate | Underestimate | Overestimate | Underestimate |
| VAR | Underestimate | Underestimate | Underestimate | Underestimate | Underestimate |
| Naive | Overestimate | Underestimate | Underestimate | Underestimate | Underestimate |

Bias test (conclusion)

| MZ * | Autumn | Winter | Spring | Summer | Autumn 2022 |
|---|---|---|---|---|---|
| Hybrid | Biased | Biased | Biased | Biased | **Unbiased** |
| RF | Biased | Biased | Biased | Biased | Biased |
| RVM | Biased | Biased | Biased | Biased | Biased |
| AdaBoostRT | Biased | Biased | Biased | Biased | Biased |
| VAR | Biased | Biased | Biased | Biased | Biased |
| Naive | Biased | Biased | Biased | Biased | Biased |

Prediction intervals evaluation

95% PINAW

| Model | Autumn | Winter | Spring | Summer | Autumn 2022 |
|---|---|---|---|---|---|
| Hybrid | **21.2277** | **24.2517** | **30.2351** | 30.7052 | **30.0080** |
| RF | 25.8908 | 27.8732 | 40.7178 | 32.6159 | 35.2706 |
| RVM | 25.3539 | 27.8480 | 32.4444 | **17.4738** | 37.7469 |
| AdaBoostRT | 27.9150 | 31.2775 | 43.6040 | 38.3266 | 36.9853 |
| VAR | 68.7904 | 70.4972 | 79.6423 | 70.5827 | 57.4267 |
| Naive | 86.1431 | 84.3254 | 92.8462 | 82.2576 | 79.2907 |

Predictive accuracy evaluation: Hybrid vs. individual models

| DM ** | Autumn | Winter | Spring | Summer | Autumn 2022 |
|---|---|---|---|---|---|
| RF | H₀ rejected | H₀ rejected | H₀ rejected | H₀ rejected | H₀ rejected |
| RVM | H₀ rejected | H₀ rejected | H₀ rejected | H₀ rejected | H₀ rejected |
| AdaBoostRT | H₀ rejected | H₀ rejected | H₀ rejected | H₀ rejected | H₀ rejected |
| VAR | H₀ rejected | H₀ rejected | H₀ rejected | H₀ rejected | H₀ rejected |
| Naive | H₀ rejected | H₀ rejected | H₀ rejected | H₀ rejected | H₀ rejected |

Keynote: * MZ test: p-values < 0.05 imply that the model is biased; otherwise the model is unbiased. ** DM test: reject the null hypothesis of equal predictive accuracy if p-value < 0.05. Bold = best model.
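Table 8's interval metrics follow the standard definitions: PICP is the share of observations falling inside their intervals, and PINAW is the mean interval width normalised by the target range (both in %). A minimal sketch under those assumed definitions:

```python
# Prediction-interval metrics as used in Table 8 (and Table 10).
# Standard definitions are assumed: PICP = coverage probability,
# PINAW = mean width normalised by the observed target range.

def picp(y, lower, upper):
    """Percentage of observations inside their prediction intervals."""
    hits = sum(1 for v, lo, hi in zip(y, lower, upper) if lo <= v <= hi)
    return 100.0 * hits / len(y)

def pinaw(y, lower, upper):
    """Mean interval width as a percentage of the target range."""
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(y)
    return 100.0 * mean_width / (max(y) - min(y))

y_obs = [10.0, 12.0, 15.0, 20.0]
lower = [ 9.0, 11.0, 14.0, 21.0]   # last interval misses its target
upper = [11.0, 13.0, 16.0, 23.0]
coverage = picp(y_obs, lower, upper)
width = pinaw(y_obs, lower, upper)
```

A good interval forecaster keeps PICP near the nominal 95% while driving PINAW down; the hybrid's lower PINAW at comparable coverage is what the table rewards.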
Table 9. Trade-off between accuracy and complexity (excluding Autumn 2022 dataset).

| Model | Computational Time Interval (s) | Average Computational Time (s) | Hybrid vs. Single-Model Time Difference (s) | %ΔRMSE |
|---|---|---|---|---|
| RF | 20–30 | 25 | 30 | 61 |
| RVM | 30–40 | 35 | 20 | 43 |
| AdaBoostRT | 30–40 | 35 | 20 | 55 |
| VAR | 5–10 | 7.5 | 47.5 | 375 |
| Naive | 5–10 | 7.5 | 47.5 | 395 |
| Hybrid | 50–60 | 55 | – | – |
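The %ΔRMSE column in Table 9 compares each single model against the hybrid. The exact definition is not spelled out; the sketch below assumes the conventional relative RMSE increase over the hybrid, in percent:

```python
# %ΔRMSE as assumed for Table 9: relative RMSE increase of a single
# model over the hybrid benchmark. The paper reports only the values,
# so this formula is an interpretation, not a quoted definition.

def pct_delta_rmse(rmse_single, rmse_hybrid):
    """Relative RMSE increase of a single model over the hybrid (%)."""
    return 100.0 * (rmse_single - rmse_hybrid) / rmse_hybrid

delta = pct_delta_rmse(450.0, 300.0)  # single model 50% worse
```

Under this reading, VAR's 375% and Naive's 395% quantify how much accuracy is sacrificed for their 47.5 s speed advantage over the hybrid.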
Table 10. Ablation study using the summer dataset.

| Model | Blender | RMSE/MW | MAPE/% | PICP/% | PINAW/% | MAD/MW | %ΔRMSE |
|---|---|---|---|---|---|---|---|
| RVM + AdaBoostRT | RF | 458.6394 | 2.6951 | 95.5631 | 31.4419 | 376.9945 | 21 |
| RVM + AdaBoostRT + ζ̂_f | RF | 430.3829 | 2.5669 | 95.2218 | 31.7904 | 352.7375 | 14 |
| RF + AdaBoostRT | RF | 466.5318 | 2.6454 | 95.9044 | 33.4339 | 304.2609 | 23 |
| RF + AdaBoostRT + ζ̂_f | RF | 436.3922 | 2.4485 | 94.8806 | 31.6265 | 287.6163 | 15 |
| RVM + RF + ζ̂_f | RF | 399.4977 | 2.7087 | 95.5631 | 25.5820 | 440.9243 | 5 |
| RVM + AdaBoostRT + RF + ζ̂_f | Average | 3337.8813 | 5.689 | 95.2218 | 26.8902 | 5001.13 | 781 |
| Full stacked (Hybrid) | RF | **379.0801** | **2.2190** | 95.9044 | 30.7052 | 292.8671 | – |

Keynote: MAD = Median absolute deviation (measures prediction sharpness); smaller MAD values imply a better model. ζ̂_f = residual forecast. Bold = best model.
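Table 10's keynote uses MAD as a sharpness measure. A minimal sketch, interpreting MAD as the median of absolute forecast errors (an assumption — the keynote does not specify the exact estimator):

```python
import statistics

# MAD as read from Table 10's keynote: median absolute deviation of
# the forecast errors, a robust measure of prediction sharpness.
# The exact estimator is assumed, not quoted from the paper.

def mad(y_true, y_pred):
    """Median of absolute forecast errors (robust to outliers)."""
    return statistics.median(abs(a - b) for a, b in zip(y_true, y_pred))

mad_demo = mad([10.0, 12.0, 15.0], [9.0, 14.0, 15.0])  # errors 1, 2, 0
```

Unlike RMSE, MAD is insensitive to a few large misses, which is why the averaging blender's row can combine an extreme RMSE with a similarly extreme MAD: both its typical and worst errors blow up.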
Table 11. Comparison of the proposed method with state-of-the-art methods from the literature.

| Model | RMSE (MW) | MAPE (%) | Citation | Data Description |
|---|---|---|---|---|
| Hybrid (Proposed) | **260.67** | **1.27** | Present | South Africa, outage data (UCLF) (MW), 2021–2022 |
| CNN-LSTM | 381.66 | 2.15 | [65,66] | American Electric Power (AEP) and ISO New England (ISONE) load data (MW), 2014 |
| LSTM | 975.00 | 5.11 | [65,67] | AEP and ISONE load data (MW), 2014 |
| SVR-stacking | 583.77 | 1.75 | [68] | Spain, load data (MW), 2015–2018 |
| XGBoost-stacking | 1087.45 | 3.62 | [68] | Spain, load data (MW), 2015–2018 |
| TCN-GRU-Attention | 1008.23 | 8.80 | [69] | Australian Energy Market Operator (AEMO) load data (MW), 2006–2011 |

Keynote: CNN = Convolutional neural network; TCN = Temporal convolutional network; Bold = best model.