Short-Term Load Forecasting Models: A Review of Challenges, Progress, and the Road Ahead

Abstract: Short-term load forecasting (STLF) is critical for the energy industry: accurate predictions of future electricity demand are necessary to ensure the reliable and efficient operation of power systems. Various STLF models have been proposed in recent years, each with strengths and weaknesses. This paper reviews the principal classes of STLF models, including time series, artificial neural network (ANN), regression-based, and hybrid models. It first introduces the fundamental concepts and challenges of STLF, then discusses each model class's main features and assumptions. The paper compares the models in terms of accuracy, robustness, computational efficiency, scalability, and adaptability, and identifies each approach's advantages and limitations. Although this study suggests that ANNs and hybrid models may be the most promising routes to accurate and reliable STLF, additional research is required to handle multiple input features, manage massive data sets, and adapt to shifting energy conditions.


Introduction
The efficient operation and planning of the power system require accurate load forecasting. Short-term load forecasting (STLF) is critical in power system applications, as it allows the optimal scheduling of energy resources and the efficient management of energy storage systems [1]. STLF is also a crucial task for the energy industry more broadly, as accurate predictions of future electricity demand are necessary to ensure power systems' reliable and efficient operation [2]. STLF models, shown in Figure 1, predict the electricity demand for the next few hours, days, or weeks, typically up to a maximum of one month. Accurate load forecasting is essential for efficient energy planning, scheduling, and dispatch, as it helps to balance supply and demand, minimize production costs, and avoid power outages [3]. Various STLF models have been proposed in recent years, each with strengths and weaknesses [4]. These models include time series, artificial neural network (ANN), regression-based, and hybrid models. Time series models use historical load data to identify patterns and trends in the load profile and make predictions based on statistical and mathematical techniques such as autoregressive integrated moving average (ARIMA) and exponential smoothing [5]. ANNs are machine learning models that can capture complex nonlinear relationships between input and output variables and can be trained on large and diverse datasets using backpropagation algorithms [6].
Regression-based models use linear or nonlinear regression analysis to establish causal relationships between load and environmental factors such as temperature, humidity, and time of day [7]. Hybrid models combine the strengths of multiple model classes, such as ANNs and time series models, to achieve better accuracy and robustness [8]. Despite the growing interest in STLF, several challenges and limitations are associated with each model class. Time series models may be limited by their inability to capture long-term trends and non-stationary load data and by their sensitivity to data outliers and missing values.
ANNs may be limited by their computational complexity, interpretability, and overfitting, as well as their susceptibility to noise and bias in the training data [9]. Regression-based models may be limited by their assumptions of linearity and additivity in the input-output relationships and their difficulty in handling categorical or non-numeric data [10]. Hybrid models may be limited by their complexity, parameter tuning, and scalability, as well as their potential overfitting and data redundancy [11]. To address these challenges, it is essential to compare and evaluate the performance of different STLF models on real-world load data using a variety of metrics and validation techniques.
Energies 2023, 16, x FOR PEER REVIEW
Given the fast-paced progress in machine learning, artificial intelligence, and data processing methods, there is a continuous need for updated reviews encompassing the latest developments in short-term load forecasting (STLF) models. This review paper aims to cover these recent advancements and provide insights into state-of-the-art models, which might not be covered in older reviews. As the world moves toward a more sustainable energy future, accurate load forecasting is becoming even more critical in managing power systems' increasing complexity and variability. Table 1 lists the contributions and limitations of the latest published articles on the topic of short-term load forecasting.
Table 1. Contributions and limitations of the latest published articles on short-term load forecasting.

Contributions: (1) a comprehensive review of deep learning techniques; (2) detailed analysis of CNNs, LSTMs, and GRUs. Limitations: (1) limited to deep learning techniques; (2) lacks comparison with traditional or hybrid methods.

[13] Contributions: (1) comparative study of hybrid models; (2) clear analysis of model performance. Limitations: (1) limited to specific hybrid combinations.

Contributions: (1) meta-analysis of ensemble learning techniques; (2) thorough discussion of bagging, boosting, and stacking. Limitations: (1) limited to ensemble techniques; (2) does not consider standalone models.

Contributions: (1) case study on feature selection in load forecasting; (2) demonstrates the potential of feature selection. Limitations: (1) limited to one case study.

Contributions: (1) explores Bayesian neural networks for forecasting; (2) explanation of methodology and advantages. Limitations: (1) limited to probabilistic methods; (2) lacks comparison with deterministic approaches.
This review paper highlights the challenges in the field of short-term load forecasting and suggests suitable solutions. The latest developments in statistical, hybrid, and intelligent STLF models are explored with mathematical analysis. The various features of deep learning strategies are analyzed. This in-depth analysis is valuable for researchers and practitioners, enabling them to make informed decisions when selecting and applying appropriate forecasting models. Our paper presents a critical analysis of the models' advantages and limitations, offering insights into potential improvements and suggesting new approaches that can enhance the performance of STLF models. This contribution aids in advancing the field by addressing existing challenges and identifying areas for future research.

Challenges and Solutions
One of the critical challenges in developing accurate STLF models is dealing with the dynamic nature of load data. Weather, economic conditions, and consumer behavior affect load data, which can change over time [17]. Therefore, STLF models need to be able to adapt to changes in the underlying data-generating process. One approach to dealing with non-stationary data is using time-varying autoregressive moving average (TVARMA) models [13]. TVARMA models allow the autoregressive coefficients and moving average terms to vary over time, allowing the model to capture changes in the underlying data-generating process. TVARMA models have been shown to outperform stationary models such as ARIMA in some STLF applications. Another approach to dealing with non-stationary data is to use machine learning models, such as deep learning models [18]. Deep learning models, such as deep neural networks (DNNs), can automatically learn complex patterns and relationships and adapt to changes in the data-generating process [18]. However, training and implementing deep learning models require much data and computational resources.
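The core idea behind time-varying coefficients can be sketched very simply: re-estimating an AR(1) coefficient on a sliding window tracks a regime change that a single stationary fit would average away. This is a minimal numpy illustration, not the full TVARMA formulation; the series, window length, and regime values are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a load-like series whose AR(1) dynamics change halfway through
n = 1000
y = np.zeros(n)
for t in range(1, n):
    phi = 0.9 if t < n // 2 else 0.3   # regime shift in the data-generating process
    y[t] = phi * y[t - 1] + rng.normal()

def rolling_ar1(y, window=200):
    """Re-estimate the AR(1) coefficient on a sliding window (closed-form OLS)."""
    coeffs = []
    for end in range(window, len(y) + 1):
        seg = y[end - window:end]
        x, z = seg[:-1], seg[1:]
        coeffs.append(np.dot(x, z) / np.dot(x, x))
    return np.array(coeffs)

coeffs = rolling_ar1(y)
print(coeffs[0], coeffs[-1])  # early estimate near 0.9, late estimate near 0.3
```

A stationary AR(1) fit on the whole series would return a single coefficient between the two regimes, misrepresenting both.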
In addition to dealing with non-stationary data, another challenge in STLF is uncertainty and risk management. STLF models provide point forecasts of the future load values but do not provide information on the uncertainty or risk associated with the forecasts. Therefore, there is a need for models that can provide probabilistic forecasts, such as quantile regression models [19]. Quantile regression models provide estimates of the conditional quantiles of the load distribution, allowing for the construction of prediction intervals and risk assessment. Quantile regression models are effective in providing probabilistic forecasts in STLF applications. One more strategy to deal with uncertainty and risk management is scenario-based forecasting. Scenario-based forecasting involves generating multiple scenarios of the future load values based on different assumptions about the underlying data-generating process [20]. Scenario-based forecasting allows for assessing the risk associated with different scenarios and can inform decision-making under uncertainty.
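As a simple stand-in for a fitted quantile regression model, conditional quantiles can be estimated empirically: the sketch below bins a synthetic temperature-load sample and reads off per-bin 0.1/0.5/0.9 quantiles to form an 80% prediction interval. The data, bin count, and quantile levels are illustrative assumptions, not a published method.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: load rises with temperature, and the noise grows with temperature
temp = rng.uniform(0, 35, 5000)
load = 100 + 3.0 * temp + rng.normal(0, 2 + 0.4 * temp)

def conditional_quantiles(x, y, taus, n_bins=7):
    """Empirical conditional quantiles of y given x, estimated per x-bin."""
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    centers, bands = [], []
    for i in range(n_bins):
        mask = (x >= edges[i]) & (x < edges[i + 1]) if i < n_bins - 1 else (x >= edges[i])
        centers.append(0.5 * (edges[i] + edges[i + 1]))
        bands.append([np.quantile(y[mask], t) for t in taus])
    return np.array(centers), np.array(bands)

centers, bands = conditional_quantiles(temp, load, taus=[0.1, 0.5, 0.9])
# bands[:, 0] and bands[:, 2] bound an 80% prediction interval around the median forecast
```

Because the simulated noise is heteroscedastic, the interval widens at high temperatures, which is exactly the risk information a point forecast cannot convey.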
STLF models must also be scalable and adaptable to different power systems and operating conditions. Power systems vary in size, complexity, and generation mix, and STLF models must be able to accommodate these variations [21]. Transfer learning is one approach to developing scalable and adaptable STLF models. Transfer learning involves using a pre-trained model on one power system and adapting it to another. Transfer learning can reduce the data and computational resources required to train a new model and enable the transfer of knowledge and expertise across different power systems. Using meta-learning is a different strategy for developing scalable and adaptable STLF models. Meta-learning involves learning the optimal model and hyperparameters for a new power system based on the characteristics of the power system and the available data. Meta-learning can reduce the time and resources required to develop a new STLF model and enable the development of models tailored to the power system's specific characteristics [22].
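A toy sketch of the transfer idea under stated assumptions: coefficients fitted on a data-rich source system are blended with a small-sample fit on the target system, shrinking the unstable target estimate toward the mature one. The two systems, the linear temperature-load model, and the shrinkage weight are all invented for illustration; this is not a specific published transfer-learning method.

```python
import numpy as np

rng = np.random.default_rng(8)

# Source system: plenty of data relating temperature to load
temp_a = rng.uniform(0, 35, 5000)
load_a = 200 + 6.0 * temp_a + rng.normal(0, 4, 5000)
w_src = np.polyfit(temp_a, load_a, 1)          # [slope, intercept] learned on system A

# Target system: similar physics, but very little data available
temp_b = rng.uniform(0, 35, 15)
load_b = 220 + 6.5 * temp_b + rng.normal(0, 4, 15)
w_tgt = np.polyfit(temp_b, load_b, 1)

# "Transfer": shrink the small-sample target fit toward the source coefficients
lam = 0.5                                      # illustrative shrinkage weight
w_transfer = lam * w_src + (1 - lam) * w_tgt
```

The shrinkage weight would in practice be chosen by cross-validation on the target data; with more target data, lam would shrink toward zero.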
Another challenge in STLF is dealing with the uncertainty and variability of renewable energy sources, such as wind and solar. Renewable energy sources are highly variable and uncertain, and their integration into power systems can affect the accuracy of STLF models [23]. One method of handling the variability of renewable energy sources is to use hybrid models that combine STLF and renewable energy forecasting models. Renewable energy forecasting models use weather data and other predictors to forecast the output of renewable energy sources, such as wind and solar. Hybrid models can combine the forecasts of the STLF models with the forecasts of the renewable energy forecasting models, improving the load forecasts' accuracy [24]. Another strategy to cope with the variability of renewable energy sources is the use of demand response and energy storage. Demand response involves incentivizing consumers to reduce their electricity consumption during periods of high demand or low renewable energy output, while energy storage involves storing excess renewable energy for later use. Demand response and energy storage can reduce the variability and uncertainty of renewable energy sources and improve STLF model accuracy.
Another difficulty in STLF is addressing the high-dimensional and complex nature of the data. STLF models often require many predictors, such as weather data, holiday schedules, and economic indicators [25]. However, using too many predictors can lead to overfitting and reduced accuracy. One strategy for handling the high-dimensional and complex nature of the data is to use feature selection techniques. Feature selection techniques involve selecting a subset of the most relevant predictors for the forecasting task [26]. This strategy can reduce the number of predictors used in the model, reduce overfitting, and improve the accuracy of the forecasts. Another method for handling the high-dimensional and complex nature of the data is to use dimensionality reduction techniques. Dimensionality reduction techniques involve transforming the data into a lower-dimensional space while preserving the essential information. This technique can reduce the number of predictors used in the model, reduce the computational resources required, and improve the accuracy of the forecasts [27].
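A correlation-based sketch of feature selection (a simple univariate filter, not a specific published method): rank candidate predictors by their absolute Pearson correlation with the load and keep the top k. The data and the choice of which features are informative are synthetic assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# 10 candidate predictors; only features 0, 3, and 7 actually drive the load
X = rng.normal(size=(1500, 10))
load = 5 * X[:, 0] - 4 * X[:, 3] + 3 * X[:, 7] + rng.normal(0, 0.5, 1500)

def select_k_best(X, y, k):
    """Rank predictors by |Pearson correlation| with the target and keep the top k."""
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    return np.sort(np.argsort(scores)[::-1][:k])

kept = select_k_best(X, load, k=3)
print(kept)  # recovers the three informative predictors
```

Univariate filters like this are cheap but miss interactions between predictors; wrapper or embedded methods (e.g., regularized regression) address that at higher computational cost.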
STLF also faces the difficulty of the non-Gaussian and heavy-tailed nature of the load data. Load data often exhibits non-Gaussian and heavy-tailed distributions, which can violate the assumptions of many STLF models [28]. One method to deal with the non-Gaussian and heavy-tailed nature of the load data is to use robust STLF models. Robust STLF models are designed to be less sensitive to outliers and heavy-tailed distributions and can provide more accurate forecasts in these scenarios. Another technique to deal with the non-Gaussian and heavy-tailed nature of the load data is to use distributional STLF models [29]. Distributional STLF models model the entire distribution of the load data rather than just the mean, allowing for the estimation of quantiles, prediction intervals, and risk assessment.
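The difference between outlier-sensitive and robust estimators is easy to demonstrate on a tiny example; the readings below are invented, with three corrupted values standing in for meter glitches.

```python
import numpy as np

# A clean hourly load segment, then the same segment with three corrupted readings
clean = np.array([610., 605., 612., 608., 611., 607., 609., 613., 606., 610.])
corrupted = clean.copy()
corrupted[[2, 5, 8]] = [5000., 0., 4200.]      # meter glitches / outliers

print(clean.mean(), corrupted.mean())          # 609.1 vs 1346.6: the mean is dragged far away
print(np.median(clean), np.median(corrupted))  # 609.5 vs 610.0: the median barely moves
```

Robust STLF models apply the same principle at the level of the loss function, for example by minimizing absolute rather than squared errors so that a few extreme observations do not dominate the fit.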
A further challenge in STLF is dealing with the nonlinear and non-monotonic relationships between the load data, weather data, and other predictors [30]. STLF models often require nonlinear and non-monotonic models to capture these relationships. One strategy to cope with nonlinear and non-monotonic relationships is to use non-parametric models, such as decision trees or random forests [31]. Non-parametric models can handle nonlinear and non-monotonic relationships and automatically learn complex patterns and relationships in the data. One more approach to dealing with nonlinear and non-monotonic relationships is to use kernel-based models, such as kernel regression or support vector machines [32]. Kernel-based models can handle nonlinear and non-monotonic relationships and can be more computationally efficient than non-parametric models.

Another critical challenge in STLF is dealing with the spatial dimension of load data. Load data varies over time and space, and STLF models must account for spatial dependencies and heterogeneity. One way to deal with the spatial dimension of load data is to use spatial-temporal models. Spatial-temporal models account for the spatial and temporal dependencies of the load data and can provide more accurate forecasts by incorporating information from neighboring locations [33]. Another strategy to handle the spatial dimension of load data is to use clustering and spatial interpolation techniques. Clustering techniques involve grouping similar places based on their load patterns. By contrast, spatial interpolation techniques involve estimating the load values at unobserved locations based on the load values at nearby observed locations.
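A minimal Nadaraya-Watson kernel regression sketch shows a kernel-based model following a nonlinear, non-monotonic relationship that a straight-line fit cannot. The U-shaped temperature-load data (heating demand at low temperatures, cooling at high) and the bandwidth are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# U-shaped load/temperature relation: heating at low temps, cooling at high temps
temp = rng.uniform(-5, 35, 1200)
load = 80 + 0.15 * (temp - 18) ** 2 + rng.normal(0, 2, 1200)

def kernel_regression(x_train, y_train, x_query, bandwidth=2.0):
    """Nadaraya-Watson estimator: kernel-weighted average of nearby targets."""
    d = x_query[:, None] - x_train[None, :]
    w = np.exp(-0.5 * (d / bandwidth) ** 2)
    return (w * y_train).sum(axis=1) / w.sum(axis=1)

grid = np.linspace(-5, 35, 9)
pred_kernel = kernel_regression(temp, load, grid)

# A straight-line fit for comparison: it cannot follow the U shape
a, b = np.polyfit(temp, load, 1)
pred_linear = a * grid + b

truth = 80 + 0.15 * (grid - 18) ** 2
kernel_err = np.abs(pred_kernel - truth).max()
linear_err = np.abs(pred_linear - truth).max()
```

The bandwidth controls the bias-variance trade-off: too small and the estimate chases noise, too large and it flattens the U shape back toward the linear fit.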
In addition to the challenges mentioned above, STLF models must be transparent and interpretable. STLF models are used in critical decision-making processes, such as load shedding and demand response, and power system operators and decision-makers need to understand their results quickly. One method of developing transparent and interpretable STLF models is to use explainable artificial intelligence (XAI) techniques [34]. XAI techniques involve developing models that explain their decisions and predictions. This can increase the transparency and trustworthiness of the models and enable power system operators and decision-makers to better understand the underlying factors that contribute to the load forecasts. Another way to develop transparent and interpretable STLF models is to use causal inference techniques [35]. Causal inference techniques involve developing models that can identify the causal relationships between the load data and other predictors and provide insights into the underlying drivers of load demand.
Finally, another challenge in STLF is dealing with the lack of data and data quality issues. STLF models require a large amount of data to train and validate, and the data quality can affect the forecasts' accuracy [36]. Data augmentation techniques are one method to deal with the lack of data and data quality issues. Data augmentation techniques involve generating new data from the existing data by adding noise, perturbing the data, or generating synthetic data. Data augmentation can increase the data available for training and improve the model's generalization. Another way to deal with the lack of data and data quality issues is to use transfer learning and domain adaptation techniques [37]. These techniques can reduce the data required to train a new model and enable the transfer of knowledge and expertise across different power systems.
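The noise-injection variant of data augmentation can be sketched in a few lines; the load profile, noise scale, and number of copies below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# One week of hourly load observations (illustrative values in MW)
load = 500 + 80 * np.sin(np.arange(168) * 2 * np.pi / 24) + rng.normal(0, 5, 168)

def augment_with_noise(series, n_copies=4, noise_scale=0.01):
    """Create jittered copies of a load series by adding small Gaussian noise."""
    sigma = noise_scale * series.std()
    copies = [series + rng.normal(0, sigma, series.shape) for _ in range(n_copies)]
    return np.vstack([series] + copies)

augmented = augment_with_noise(load)
print(augmented.shape)  # (5, 168): the original series plus four jittered copies
```

The noise scale matters: it should be small relative to the series' own variability, or the augmented samples stop resembling plausible load profiles.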

Development in STLF Models
Traditional STLF models use historical load, weather, and calendar data as input features. However, the availability of new data sources, such as social media data and smart meter data, presents an opportunity to develop more accurate and robust STLF models. Advanced machine learning techniques, such as deep learning and reinforcement learning, have shown great potential in improving the accuracy of STLF models [38]. Deep learning techniques can automatically learn complex patterns and relationships in the data, while reinforcement learning can learn to optimize actions based on feedback from the environment [39]. Probabilistic forecasting provides a measure of uncertainty around the point forecast, which can help power system operators make more informed decisions in the face of uncertainty. Various STLF models have been developed, but they can be broadly classified into statistical, intelligent, and hybrid [40]. Figure 2 shows three main types of STLF models: statistical, intelligent, and hybrid models.

Statistical Models
Statistical models are based on time series analysis and can capture the temporal patterns of load demand. Figure 3 shows common statistical models for STLF, including autoregressive integrated moving average (ARIMA), seasonal ARIMA (SARIMA), exponential smoothing (ES), and generalized linear models (GLM) [41]. ARIMA models assume that the current value of the load demand is a function of its past values and a random error term. SARIMA models incorporate seasonal patterns in the data, which can be helpful in power system applications where load demand exhibits daily, weekly, or monthly cycles. ES models use a weighted average of the past load demand values to predict future values [42]. Statistical models are relatively simple and require low computational resources. However, they may not be able to capture the nonlinear relationships and complex dynamics of power systems, which can result in lower forecasting accuracy [43]. Short-term load forecasting (STLF) is a critical component of energy management systems (EMS) for power system applications. Accurate STLF is essential for optimizing power system energy supply and demand. It can improve energy efficiency, reduce costs, and enhance reliability [44].



Autoregressive Integrated Moving Average (ARIMA) Models
ARIMA models are widely used in STLF applications due to their simplicity and ability to capture the temporal dependence of the load data. ARIMA models are based on three components: the autoregressive (AR) component, the integrated (I) component, and the moving average (MA) component [45]. The AR component models the dependence of the load on its past values, the I component models the trend in the load data, and the MA component models the dependency on past errors. ARIMA models can be customized by adjusting the parameters of the three components, such as the order of the AR and MA components and the degree of differencing in the I component [46]. However, ARIMA models assume that the load data follows a stationary process, which may not always be valid for power system applications.
Furthermore, ARIMA models may not capture the nonlinear relationships between the load and other factors that influence the load, such as weather and occupancy. The algorithm of ARIMA models is shown in Figure 4. The ARIMA model is denoted as ARIMA(p,d,q), where p is the order of the AR model, d is the order of differencing, and q is the order of the MA model. These parameters are chosen based on the characteristics of the time series being analyzed and can be estimated using statistical methods such as maximum likelihood estimation. Once the parameters have been estimated, the ARIMA model can be used to forecast future values of the time series. The model works by using past observations to generate a prediction of the next value in the series, based on the AR and MA components of the model, and then using this predicted value to update the error term and make a new prediction for the following time step. This process is repeated recursively to generate a forecast for a specified number of time steps into the future.
The AR component assumes that the current value of the time series depends on its previous values. An AR model of order p is represented as AR(p), and the equation is:

Y(t) = c + φ1·Y(t−1) + φ2·Y(t−2) + … + φp·Y(t−p) + ε(t)

where Y(t) is the value of the time series at time t, c is a constant, ε(t) is the error term at time t, φ1, φ2, …, φp are the autoregressive coefficients, and p is the order of the AR model.

The I component removes trends by differencing the series. The first-order difference is:

Y′(t) = Y(t) − Y(t−1)

Higher-order differencing can be applied by differencing the already differenced series multiple times. For example, the second-order differencing is:

Y″(t) = Y′(t) − Y′(t−1)

The MA component represents the dependency of the current value of the time series on the past error terms. An MA model of order q is represented as MA(q), and the equation is:

Y(t) = c + ε(t) + θ1·ε(t−1) + θ2·ε(t−2) + … + θq·ε(t−q)

where θ1, θ2, …, θq are the moving average coefficients and q is the order of the MA model.
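The forecasting recursion described above can be checked numerically. This is a minimal numpy sketch of an ARIMA(2,1,0) model: difference once, fit the AR component by ordinary least squares, forecast the differences recursively, then integrate back. The orders, the synthetic trending series, and the pure-AR simplification are illustrative assumptions; a full ARIMA fit would also estimate MA terms by maximum likelihood.

```python
import numpy as np

rng = np.random.default_rng(5)

# Trending load series: differencing (d=1) removes the trend
y = 1000 + 2.0 * np.arange(300) + rng.normal(0, 3, 300)

def arima_210_forecast(y, horizon):
    """ARIMA(2,1,0) sketch: difference once, fit AR(2) by OLS, forecast, integrate back."""
    dy = np.diff(y)
    # Design matrix of lagged differences: dy[t] ~ c + phi1*dy[t-1] + phi2*dy[t-2]
    X = np.column_stack([np.ones(len(dy) - 2), dy[1:-1], dy[:-2]])
    c, phi1, phi2 = np.linalg.lstsq(X, dy[2:], rcond=None)[0]
    hist = list(dy[-2:])
    level, out = y[-1], []
    for _ in range(horizon):
        step = c + phi1 * hist[-1] + phi2 * hist[-2]
        hist.append(step)
        level += step            # invert the differencing to return to the original scale
        out.append(level)
    return np.array(out)

fc = arima_210_forecast(y, horizon=24)
```

Because differencing has absorbed the trend, the fitted AR recursion forecasts roughly constant increments, so the integrated forecast continues the upward trend of the original series.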
Combining the AR, I, and MA components, we can represent an ARIMA(p,d,q) model as:

Y′(t) = c + φ1·Y′(t−1) + … + φp·Y′(t−p) + ε(t) + θ1·ε(t−1) + … + θq·ε(t−q)

where Y′(t) is the series after differencing d times.

Seasonal Autoregressive Integrated Moving Average (SARIMA) Models
SARIMA models are extensions of ARIMA models that can capture the seasonal patterns in the load data. SARIMA models include additional parameters to model the seasonal variation in the load data, such as the seasonal period and the order of the AR, I, and MA components [47]. SARIMA models are helpful for STLF applications in power systems that exhibit strong seasonal patterns, such as peak load periods during the day or week. However, SARIMA models suffer from the same limitations as ARIMA models, such as the assumption of a stationary process and the inability to capture nonlinear relationships between the load and other factors [48]. SARIMA models may also require a large amount of historical data to estimate the seasonal parameters accurately, which may not be available for new power system installations. The algorithm of SARIMA models is shown in Figure 5. The SARIMA model is specified using three main parameters, p, d, and q, for the non-seasonal component, and P, D, and Q for the seasonal component. The p parameter represents the order of autoregression, the d parameter the degree of differencing, and the q parameter the order of the moving average for the non-seasonal component. Similarly, the P, D, and Q parameters represent the order of autoregression, the degree of differencing, and the order of the moving average for the seasonal component. The algorithm for fitting a SARIMA model involves several steps. First, the model parameters are estimated using maximum likelihood estimation. This process involves selecting the values of p, d, q, P, D, and Q that maximize the likelihood of the observed data [49]. Once the parameters are estimated, the fitted model is used for forecasting, predicting future periods based on historical data.
The equations for the SARIMA model components are as follows, where B is the backshift operator (B·X(t) = X(t−1)), X(t) is the time series at time t, c is a constant, s is the seasonal period, and ε(t) is the error term. Autoregressive (AR) component: φ(B) = 1 − φ1·B − … − φp·B^p, where φ(B) is the autoregressive operator. Differencing (I) component: Y(t) = (1 − B)^d X(t), where (1 − B)^d is the differencing operator and Y(t) is the differenced time series. Moving average (MA) component: θ(B) = 1 + θ1·B + … + θq·B^q, where θ(B) is the moving average operator and θi are the MA coefficients. Seasonal autoregressive (SAR) component: Φs(B^s) = 1 − Φ1·B^s − … − ΦP·B^(P·s), where Φs(B^s) is the seasonal autoregressive operator. Seasonal differencing (SI) component: Z(t) = (1 − B^s)^D Y(t), where (1 − B^s)^D is the seasonal differencing operator and Z(t) is the seasonally differenced time series. Seasonal moving average (SMA) component: Θs(B^s) = 1 + θs1·B^s + … + θsQ·B^(Q·s), where θsi are the seasonal MA coefficients. Combining these components, the SARIMA(p,d,q)(P,D,Q)s model equation can be represented as:

φ(B)·Φs(B^s)·(1 − B)^d·(1 − B^s)^D·X(t) = c + θ(B)·Θs(B^s)·ε(t)
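The effect of the differencing operators can be verified numerically: applying (1 − B)(1 − B^s) to a trending, daily-seasonal hourly series leaves a series that is far closer to stationary (roughly constant mean and much smaller spread) than the original. The series below is synthetic and the seasonal period of 24 hours is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(6)

s = 24  # seasonal period: one day of hourly data
t = np.arange(30 * s)
# Trend + daily cycle + noise, mimicking an hourly load profile
y = 1000 + 0.5 * t + 120 * np.sin(2 * np.pi * t / s) + rng.normal(0, 5, t.size)

# (1 - B^s) removes the daily cycle, then (1 - B) removes the trend
z = np.diff(y[s:] - y[:-s])

print(y.std(), z.std())  # the doubly differenced series has a far smaller spread
```

This is exactly the preprocessing the (1 − B)^d (1 − B^s)^D factors perform inside the SARIMA equation, leaving the AR and MA operators to model only the remaining short-range correlation.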

Exponential Smoothing (ES) Models
ES models are time series models that use an exponentially weighted average of past observations to forecast future values.ES models can be customized by adjusting the smoothing parameter, which controls the weights assigned to past observations.ES models include several variants, such as simple exponential smoothing (SES), Holt's linear exponential smoothing (Holt), and Holt-Winters seasonal exponential smoothing (HW) [49].SES models use a single smoothing parameter to forecast the load data based on past values.Holt models include an additional trend component to capture the linear trend in the load data.HW models include both a trend and a seasonal component to capture the seasonal patterns in the load data.HW models are helpful for STLF applications in the power system that exhibit trend and seasonal patterns [50].
ES models are computationally efficient and require less historical data than ARIMA and SARIMA models. ES models can also capture the nonlinear relationships between the load and other factors that influence the load, such as weather and occupancy. However, ES models assume that the load data follow a stationary process and may not perform well for power systems with non-stationary load data [51]. The algorithm of ES models is shown in Figure 6. The ES model is specified using two main parameters: alpha and beta. The alpha parameter controls the weight given to the most recent observation, while the beta parameter controls the weight given to the trend component. The algorithm for fitting an ES model involves several steps. First, the initial level and trend estimates are calculated using the first few observations of the time series. Then, the level and trend estimates are updated for each subsequent observation using the following equations:

L(t) = α X(t) + (1 − α)(L(t − 1) + T(t − 1))

T(t) = β (L(t) − L(t − 1)) + (1 − β) T(t − 1)

Here L(t) represents the level of the observations and T(t) shows the trend of the data values.
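The level and trend recursion can be sketched as follows; the smoothing constants, the initialization from the first two observations, and the function names are illustrative assumptions, not values from the text:

```python
def holt_fit(series, alpha, beta):
    """Holt's linear exponential smoothing.

    Applies L(t) = alpha*X(t) + (1-alpha)*(L(t-1) + T(t-1)) and
    T(t) = beta*(L(t) - L(t-1)) + (1-beta)*T(t-1) over the series,
    returning the final level and trend estimates."""
    level = series[0]
    trend = series[1] - series[0]  # simple initialization from the first points
    for x in series[1:]:
        prev = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev) + (1 - beta) * trend
    return level, trend

def holt_forecast(level, trend, h):
    """h-step-ahead forecast: F(t + h) = L(t) + h * T(t)."""
    return level + h * trend
```

On a perfectly linear load series the recursion tracks the line exactly, so the one-step forecast continues the trend.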
Once the level and trend estimates are updated, the ES model can be used to predict future periods. The forecast for period t + 1 is calculated using the following equation:

F(t + 1) = L(t) + T(t)

Generalized Linear Models (GLMs)

GLMs are a class of statistical models that extend the linear regression framework to handle non-normal distributions of the response variable. GLMs include several variants, such as Poisson regression, negative binomial regression, and gamma regression [48]. GLMs can capture nonlinear relationships between the load and other factors that influence the load, such as weather and occupancy, by modeling the conditional mean of the response variable as a function of the predictor variables through a link function [49]. One popular GLM for STLF is Poisson regression, which models the load count as a function of predictor variables. Poisson regression assumes that the response variable follows a Poisson distribution and uses a log link function to model the expected value of the response variable as a linear function of the predictor variables. Poisson regression can capture the nonlinear relationships between the load and other factors that influence the load and is well-suited for power systems with count data, such as the number of appliances or devices in use [50]. Using a log-linear relationship, the Poisson regression model relates the expected count (λ) to the predictor variables (X). The model can be written as:

log(λ) = β_0 + β_1 X_1

where β_0 is the intercept and β_1 is the coefficient for the predictor variable X_1.
Another GLM for STLF is negative binomial regression, an extension of Poisson regression that can handle over-dispersed count data.Overdispersion occurs when the variance of the response variable exceeds its mean, which is common in power system applications due to the high variability in load data [51].Negative binomial regression uses a log link function to model the expected value of the response variable as a linear function of the predictor variables.It includes an additional dispersion parameter to model the variance in the response variable [52].
The dispersion model can be written as:

log(µ) = β_0 + β_1 X_1

where µ is the expected count (mean) of the response variable, β_0 is the intercept, and β_1 is the coefficient for the predictor variable X_1.
For the dispersion parameter k, you can either fix it at a constant value or model it as a function of the predictor variables. To model k as a function of the predictor variables, we can use a log link:

log(k) = γ_0 + γ_1 Z_1

where γ_0 is the intercept for the dispersion part of the model and γ_1 is the coefficient for the predictor variable Z_1 affecting the dispersion. Gamma regression is a critical GLM that can be used for STLF in power system applications. Gamma regression models the continuous load data as a function of predictor variables if the response variable follows a gamma distribution [53]. Gamma regression uses a log link function to model the expected value of the response variable as a linear function of the predictor variables. The gamma regression model can be written as:

g(µ) = η

Equation (19) represents the relationship between the mean of the response variable (µ) and the linear predictor (η) through a link function (g).
Combining the link function and the linear predictor, we have:

log(µ) = β_0 + β_1 X_1

GLMs are computationally efficient and can handle various predictor variables, including categorical and interaction terms [54]. GLMs also provide interpretable coefficients that can be used to identify the most significant predictor variables and quantify their impact on the load. However, GLMs assume that the response variable follows a specific distribution, which may not always be valid for power system applications [55]. GLMs may also require a large amount of historical data to estimate the model parameters accurately, which may not be available for new power system installations [56].
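As a sketch of how a log-link GLM such as Poisson regression is actually fitted, the maximum-likelihood estimate can be obtained with Newton (iteratively reweighted least squares) updates; the function name and data are illustrative:

```python
import numpy as np

def fit_poisson_glm(X, y, iters=25):
    """Fit log(mu) = X @ beta by Newton's method (IRLS).

    At each step, with mu = exp(X @ beta), solve
    (X^T diag(mu) X) delta = X^T (y - mu), i.e. Fisher scoring for the
    Poisson log-likelihood with the canonical log link."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = np.exp(X @ beta)
        grad = X.T @ (y - mu)            # score vector
        hess = X.T @ (X * mu[:, None])   # Fisher information
        beta = beta + np.linalg.solve(hess, grad)
    return beta
```

On noise-free data generated from a known log-linear mean, the iteration converges to the true intercept and slope, since the score equations vanish exactly at the generating coefficients.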

Intelligent Models
Intelligent models in short-term load forecasting (STLF) refer to forecasting techniques that leverage advanced computational methods, such as artificial intelligence, machine learning, and optimization algorithms, to predict the electricity load in the short term.These models are designed to capture complex patterns, nonlinear relationships, and dependencies in the load data, leading to improved forecasting accuracy and reliability [57].Figure 7 displays some of the main intelligent models used in STLF.


Support Vector Machine
One important ML model for STLF is the support vector machine (SVM), a supervised learning model that can handle linear and nonlinear relationships between the load and other factors that influence the load. SVMs can handle various predictor variables, including categorical variables and interaction terms [58]. They can also handle noisy data by using a kernel function to map it to a higher-dimensional space. SVMs have been used successfully in power system applications to forecast the load based on temperature, humidity, and time of day. In the context of short-term load forecasting (STLF), SVMs can be employed to predict the electricity load or demand for a specific upcoming period, usually ranging from a few hours to a week ahead.

The primary goal of an SVM is to find the optimal hyperplane that maximally separates two classes of data points. In a two-dimensional space, the hyperplane is a line that separates the data into two classes; in higher-dimensional spaces, it is a hyperplane (a subspace with one dimension less than the containing space). The margin is the distance between the hyperplane and the closest data points from both classes. These closest points are called support vectors, because they "support" the hyperplane. The objective of an SVM is to maximize the margin while correctly classifying the data points. Given a dataset of labeled data points (x_i, y_i), where x_i ∈ R^n is the feature vector and y_i ∈ {−1, 1} is the class label, the hyperplane can be defined as [59]:

w · x + b = 0

Here, w is the weight vector, x is the input feature vector, and b is the bias term. The dot product (w · x) measures the projection of x onto the direction of w. The goal is to find the optimal w and b that maximize the margin between the two classes.
The decision function for classification is:

f(x) = sign(w · x + b)

where sign(·) is the signum function that outputs the class label based on the sign of its argument. The (functional) margin for each data point can be computed as:

γ_i = y_i (w · x_i + b)

where: y_i is the true label of the i-th data point; in the binary classification case, it takes the value of either −1 or +1. w is the weight vector, which is orthogonal to the decision boundary (hyperplane).
x i is the feature vector of the i-th data point.b is the bias term, which shifts the decision boundary away from the origin.w•x i is the dot product between the weight vector w and the feature vector x i , which represents the projection of x i onto the weight vector.
The objective of the SVM is to maximize the margin while ensuring that all data points are correctly classified. The margin can be maximized by minimizing the norm of the weight vector, ‖w‖. This is because the distance between the hyperplane and the closest data point is inversely proportional to ‖w‖.
The optimization problem for an SVM can be formulated as a constrained optimization problem:

minimize (1/2) ‖w‖²

subject to: y_i (w · x_i + b) ≥ 1, for i = 1, …, N

Here, N is the number of data points in the dataset. This is a convex quadratic programming problem with linear constraints. We can use the Lagrange multipliers method to solve this problem, which leads to the dual problem. The dual problem is a more convenient form for solving the SVM optimization problem, especially for nonlinear cases when using kernel functions [60].
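A minimal way to see the primal problem in action is its soft-margin variant (the ‖w‖² regularizer plus a hinge-loss penalty for margin violations), trained by subgradient descent on toy data. This is an illustrative sketch, not the QP/dual solver described above, and all names are hypothetical:

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Subgradient descent on (1/2)||w||^2 + C * sum(max(0, 1 - y*(w.x + b)))."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) < 1:        # margin violated: hinge term is active
                w -= lr * (w - C * yi * xi)
                b += lr * C * yi
            else:                            # only the regularizer contributes
                w -= lr * w
    return w, b

def predict(X, w, b):
    """Decision function f(x) = sign(w.x + b)."""
    return np.sign(X @ w + b)
```

On a small linearly separable set (one cluster per class), the learned hyperplane classifies all training points correctly.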
The step-wise working of STLF is shown in Figure 8 and explained below.


1. The first step involves collecting historical electricity load data and relevant exogenous variables such as weather data, day of the week, and time of the day. These data are cleaned and preprocessed to remove any inconsistencies, outliers, or missing values, and are often normalized or standardized to improve the performance of the SVM.
2. Next, the most relevant features are selected for the forecasting task. This step is crucial, as irrelevant or redundant features can negatively impact the model's performance. Techniques such as recursive feature elimination (RFE), correlation analysis, or principal component analysis (PCA) can be applied to identify the most significant features of the problem.
3. The SVM model is trained with the preprocessed data and the selected features. SVMs aim to find the optimal hyperplane that best separates the data into different classes or categories. In the case of STLF, it is a regression problem, so the model will learn to predict continuous values for the electricity load. To do this, the SVM algorithm uses kernel functions (such as linear, polynomial, or radial basis functions) to transform the input data into a higher-dimensional space, making it easier to find the optimal separating hyperplane.
4. To ensure the SVM model performs well on unseen data, it is validated using techniques such as cross-validation. During this process, the dataset is divided into training and validation subsets, with the model being trained on one subset and tested on the other. This helps to assess the model's performance and generalizability. Moreover, hyperparameters such as the cost parameter (C), kernel type, and kernel parameters are tuned to find the best combination for the specific STLF problem.
5. After training and tuning the SVM model, it is tested on an unseen dataset to evaluate its forecasting accuracy. Performance metrics such as mean absolute error (MAE), mean squared error (MSE), or mean absolute percentage error (MAPE) are used to quantify the model's predictive capabilities.
6. Once the SVM model has been trained, validated, and tested, it can be used to make short-term load forecasts based on new input data. The model takes in the relevant features for the desired forecasting period and outputs the predicted electricity load.

Decision Tree
Another ML model for STLF is the decision tree (DT), a supervised learning model that can handle both categorical and continuous predictor variables [60].DTs work by partitioning the predictor variables into subsets based on their relevance to the load and constructing a decision tree that can be used to predict the load based on the predictor variables.DTs have been used successfully in power system applications to forecast the load based on weather conditions, occupancy, and time of day.The key concepts in decision trees are [61]: Node: A decision tree consists of nodes, where each node represents a decision or a split based on a feature's value [62].
Leaf: The terminal nodes of the tree, where no further splitting occurs, are called leaves.They represent the final decision or output for a given input.
Split criterion: The choice of feature and the split value at each node is based on a split criterion, which aims to maximize the homogeneity (purity) of the resulting child nodes.
For a classification task, two common split criteria are:

a. Gini impurity: Gini impurity measures a node's impurity (class mixture), with lower values indicating higher purity [63]. The Gini impurity for a node with class probabilities p_i is:

Gini = 1 − Σ_i p_i²

b. Information gain: Information gain is based on entropy, which measures the randomness or uncertainty in a set. The entropy for a node with class probabilities p_i is:

Entropy = −Σ_i p_i log₂ p_i

The information gain is the difference in entropy before and after the split:

IG = Entropy(parent) − Σ_j (N_j / N) Entropy(child_j)

The goal is to maximize the information gain, which leads to more homogeneous child nodes.
For a regression task, the common split criterion is mean squared error (MSE).The MSE measures the average squared difference between the actual and predicted target values [64].The objective is to minimize the MSE for each split.
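These criteria are inexpensive to compute directly; an illustrative sketch (function names are ours):

```python
from math import log2

def gini(labels):
    """Gini impurity: 1 - sum(p_i^2) over the class probabilities."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def entropy(labels):
    """Entropy: -sum(p_i * log2(p_i)) over the class probabilities."""
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n)
                for c in set(labels))

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)
```

For a perfectly balanced two-class node, Gini is 0.5 and entropy is 1 bit; a split that isolates the classes completely has an information gain of exactly 1 bit.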
Figure 9 shows the working of a decision tree algorithm, and this process is explained below.


1. The first step involves collecting historical electricity load data and relevant exogenous variables such as weather data, day of the week, and time of the day. These data are cleaned and preprocessed to remove any inconsistencies, outliers, or missing values. Feature scaling is generally not required for decision trees, as they are less sensitive to the scale of input features.
2. The most relevant features for the forecasting task are selected to ensure that irrelevant or redundant features do not negatively impact the model. Techniques such as recursive feature elimination (RFE), correlation analysis, or information gain can be applied to identify the most significant features of the problem.
3. With the preprocessed data and selected features, the decision tree model is trained. The algorithm recursively splits the data into subsets based on the input features' values to minimize the impurity of the resulting subsets. For a regression task such as STLF, the impurity can be measured using criteria such as mean squared error (MSE). The algorithm splits the data until a stopping criterion is reached, such as a maximum tree depth or a minimum number of samples in a leaf node.
4. Decision trees can be prone to overfitting, especially when they grow too deep. To address this issue, the model is validated using techniques such as cross-validation. The dataset is divided into training and validation subsets, with the model being trained on one subset and tested on the other. This method helps assess the model's performance and generalizability. Additionally, pruning techniques, such as cost-complexity or reduced-error pruning, can simplify the tree and reduce overfitting.
5. After training and pruning the decision tree model, it is tested on an unseen dataset to evaluate its forecasting accuracy. Performance metrics such as mean absolute error (MAE), mean squared error (MSE), or mean absolute percentage error (MAPE) are used to quantify the model's predictive capabilities [65].
6. Once the decision tree model has been trained, validated, and tested, it can make short-term load forecasts based on new input data. The model takes in the relevant features for the desired forecasting period and traverses the tree from the root node to a leaf node, following the decision rules at each split. The output at the leaf node is the predicted electricity load.
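The MSE-based splitting in step 3 can be illustrated with a single split (a "stump"); a full tree simply applies this search recursively to each resulting subset. The data and names below are hypothetical:

```python
import numpy as np

def best_split(x, y):
    """Find the threshold on one feature that minimizes the summed squared
    error of predicting each side's mean, i.e. the MSE criterion of step 3."""
    best = (None, float("inf"))
    for t in np.unique(x)[1:]:            # candidate thresholds between values
        left, right = y[x < t], y[x >= t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[1]:
            best = (t, sse)
    return best

x = np.array([1., 2., 3., 10., 11., 12.])   # e.g. an hour-of-day feature
y = np.array([5., 5., 5., 20., 20., 20.])   # load is low early, high late
threshold, sse = best_split(x, y)
```

Because the toy load takes exactly two levels, the search recovers the boundary between them with zero residual error.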

Random Forest and Gradient Boosting
Random forest (RF) is an extension of DTs that can handle overfitting and improve the accuracy of the load forecasts.RFs construct multiple decision trees using bootstrap samples of the training data and averaging the predictions of the individual trees [66].RFs have been used successfully in power system applications to forecast the load based on temperature, humidity, and solar radiation.Figure 10 shows the working of a random forest algorithm.The m features are selected from the incoming instances.The different numbers of trees are made, leading to unique prediction classes.The majority vote determines the final class [66].

Gradient boosting (GB) is an ensemble ML model that can improve the accuracy of load forecasts [67]. GB works by sequentially adding decision trees to the model, each of which corrects the errors of the previous trees, resulting in a final model that can capture the nonlinear relationships between the load and other factors that influence the load. GB has been used successfully in power system applications to forecast the load based on weather conditions, occupancy, and time of day [68]. Figure 11 is a block diagram of the gradient boosting algorithm. The data training determines the weak learners used to make a more accurate prediction. The step-wise working of the random forest and gradient boosting algorithms is described below.

1. The first step involves collecting historical electricity load data and relevant exogenous variables such as weather data, day of the week, and time of the day. These data are cleaned and preprocessed to remove inconsistencies, outliers, or missing values. Feature scaling is generally not required for random forest, as decision trees, its base learners, are less sensitive to the scale of input features.
2. The most relevant features for the forecasting task are selected to ensure that irrelevant or redundant features do not negatively impact the model. Although random forest has an inherent ability to handle many features and automatically estimate feature importance, using domain knowledge or techniques such as recursive feature elimination (RFE) and correlation analysis can help further improve model performance.
3. With the preprocessed data and selected features, the random forest model is trained. The algorithm creates multiple decision trees, and each tree is trained on a different bootstrap sample of the original dataset (sampling with replacement). Additionally, a random subset of features is considered at each split in the tree construction process, which introduces further diversity among the trees and reduces overfitting.
4. The cross-validation technique ensures that the random forest model performs well on unseen data. The dataset is divided into training and validation subsets, with the model being trained on one subset and tested on the other. This process helps assess the model's performance and generalizability.
5. Random forest has several hyperparameters, such as the number of trees (n_estimators), the maximum depth of the trees, and the minimum number of samples required to split a node. These hyperparameters can be tuned using techniques such as grid or random search and cross-validation to find the best combination for the specific STLF problem.
6. Once the random forest model has been trained, validated, and tested, it can make short-term load forecasts based on new input data. The model takes in the relevant features for the desired forecasting period and produces a prediction from each decision tree. The final prediction is the average of the individual tree predictions, which provides a more accurate and stable forecast.
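The two ensemble ideas (averaging bootstrap-trained trees in random forest, and sequentially fitting residuals in gradient boosting) can be sketched with one-split regression stumps as base learners. This is a toy sketch under our own simplifications; it omits random feature subsets, tree depth, and other refinements of the real algorithms:

```python
import numpy as np

rng = np.random.default_rng(0)

def stump_fit(x, y):
    """Fit a one-split regression stump (threshold plus per-side means)."""
    vals = np.unique(x)
    if len(vals) < 2:                      # degenerate bootstrap sample
        m = float(y.mean())
        return lambda q: np.full(np.shape(q), m)
    best = None
    for t in vals[1:]:
        l, r = y[x < t], y[x >= t]
        sse = ((l - l.mean()) ** 2).sum() + ((r - r.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, l.mean(), r.mean())
    _, t, lm, rm = best
    return lambda q: np.where(q < t, lm, rm)

def random_forest(x, y, n_trees=25):
    """Bagging: each stump sees a bootstrap sample; predictions are averaged."""
    stumps = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(x), len(x))
        stumps.append(stump_fit(x[idx], y[idx]))
    return lambda q: np.mean([s(q) for s in stumps], axis=0)

def gradient_boost(x, y, n_rounds=20, lr=0.3):
    """Boosting: each stump fits the residuals left by the ensemble so far."""
    base = y.mean()
    pred = np.full_like(y, base)
    stumps = []
    for _ in range(n_rounds):
        s = stump_fit(x, y - pred)
        stumps.append(s)
        pred = pred + lr * s(x)
    return lambda q: base + lr * sum(s(q) for s in stumps)
```

On a two-level toy load profile, the forest's averaged prediction lands near the correct level on each side, while boosting drives the training error toward zero geometrically as each round removes a fraction of the remaining residual.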

Multilayer Perceptron Model
A multilayer perceptron (MLP) is an artificial neural network consisting of multiple layers of interconnected nodes, also known as neurons or perceptrons. It is widely used in supervised learning tasks, such as classification and regression. An MLP, shown in Figure 12, has the following components [69]:
1. Input layer: This is the first layer of the MLP model, receiving the input data (e.g., numbers, images, and text). Each node in this layer corresponds to a single feature of the input data.
2. Hidden layers: These are the layers between the input and output layers. They consist of neurons that learn to represent and process the data. The more hidden layers and neurons per layer, the more complex the patterns the model can learn.
3. Output layer: The last layer in the MLP model produces the final results or predictions. The number of nodes in this layer depends on the problem one is trying to solve. For example, if images are classified into ten categories, the output layer will have ten nodes.
4. Neurons: Each neuron in the MLP model receives input from other neurons, processes it using an activation function, and sends the output to other neurons in the next layer. The activation function introduces non-linearity, which enables the MLP to learn complex patterns in the data.
5. Weights and biases: Each connection between neurons has a weight that determines the strength of the association. The weights are adjusted during training to minimize the difference between the predicted and actual values. Biases are additional constants that help shift the activation function, improving the model's learning ability.
6. Training: MLP models are trained using a backpropagation algorithm, which adjusts the weights and biases by minimizing the error between the predicted and actual values. The process is iterative, involving multiple passes through the data to fine-tune the model.
7. Loss function: This is a measure of how well the MLP model is performing. A lower value indicates better performance. During training, the goal is to minimize the loss function.

Mathematically, the output of a neuron can be represented as [66]:

a_j = f(Σ_i w_ij x_i + b_j)

where a_j is the output (activation) of neuron j; f is the activation function; w_ij is the weight connecting input i to neuron j; x_i is the input value for input i; and b_j is the bias term for neuron j.
For each layer in the MLP, this equation can be applied in matrix form:

A = f(W X + B)

where A is the activation matrix (each column represents the activation of a neuron); f is the activation function applied element-wise; W is the weight matrix; X is the input matrix (each column represents an input feature vector); and B is the bias matrix.
After computing the activations for all layers, the output layer produces the final prediction. For classification tasks, a softmax function is typically used in the output layer to convert the activations into probabilities:

softmax(a_i) = exp(a_i) / Σ_j exp(a_j) (32)

where a_i is the activation of output neuron i, a_j is the activation of output neuron j, and softmax(a_i) is the probability for class i.
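A single forward pass through these equations, ending in the softmax of Equation (32), can be sketched in a few lines; the layer sizes and weights below are arbitrary illustrative values:

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())    # subtract the max for numerical stability
    return e / e.sum()

def mlp_forward(x, layers):
    """One forward pass: each layer computes A = f(W @ x + b).

    `layers` is a list of (W, b, f) triples; the final activations are
    passed through softmax to obtain class probabilities."""
    for W, b, f in layers:
        x = f(W @ x + b)
    return softmax(x)

relu = lambda z: np.maximum(z, 0.0)
identity = lambda z: z

# A 3-input -> 4-hidden -> 2-output network with arbitrary weights.
rng = np.random.default_rng(1)
layers = [(rng.normal(size=(4, 3)), np.zeros(4), relu),
          (rng.normal(size=(2, 4)), np.zeros(2), identity)]
probs = mlp_forward(np.array([0.5, -1.0, 2.0]), layers)
```

Whatever the weights, the softmax output is a valid probability vector: non-negative entries that sum to one.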

Deep Learning Models
Deep learning (DL) is a class of ML models that can capture complex nonlinear relationships between the load and other factors that influence the load [70].DL models consist of multiple layers of interconnected nodes that process and transmit information through weighted connections [71].One popular DL model for STLF is the convolutional neural network (CNN), which can capture the spatial and temporal patterns in the load data.CNNs have been used successfully in power system applications to forecast the load based on weather conditions, occupancy, and time of day [72].Another critical DL model for STLF is the recurrent neural network (RNN), which can capture the temporal dependencies in the load data.RNNs use feedback connections to allow information to be passed from one time step to the next, enabling the model to capture the dynamics of the load over time.One type of RNN is the long short-term memory (LSTM) network, which is well-suited for STLF as it can capture both short-term and long-term dependencies in the load data [73].

Ensemble Models
Ensemble models are machine learning models that combine multiple models' predictions to produce a final prediction [74]. The basic idea behind ensemble models is to use the strengths of different models and combine their predictions to create a more accurate and robust forecast. Ensemble models can be used in various machine learning tasks, including classification, regression, and clustering [75]. One of the main advantages of ensemble models is their ability to reduce variance and overfitting. Variance is a measure of how much the predictions of the models in the ensemble vary from each other. Overfitting occurs when a model is too complex and fits the training data too closely, leading to poor performance on new, unseen data. Ensemble models can reduce variance and overfitting by combining the predictions of multiple models, thereby reducing the overall variance and producing a more robust forecast [76]. One of the main disadvantages of ensemble models is their complexity. Ensemble models can be more complex to implement and interpret than single models and may require more computational resources [77].
Additionally, the performance of ensemble models can depend on the specific combination of models used, and finding the optimal combination can be challenging. Another disadvantage of ensemble models is their sensitivity to the quality of the models in the ensemble [78]. Poor-quality models, such as those that overfit or underfit the data, can degrade the ensemble's performance. It is essential to carefully select the models in the ensemble and ensure that they are of high quality. Several techniques for creating ensemble models include bagging, boosting, and stacking, as shown in Figure 13. Each technique has its strengths and limitations, and the choice of method depends on the specific application and dataset.
A. Bagging
Bagging, or bootstrap aggregating, is a technique that involves training multiple models on different subsets of the training data and then combining their predictions using a weighted average. Bagging can be used with any model and can reduce variance and overfitting. The basic idea behind bagging is to create multiple copies of the original dataset, each with a different subset of the data [79]. The models are then trained on each of these copies, and their predictions are combined to produce a final prediction. By combining the predictions of multiple models, bagging can produce a more accurate and robust forecast.
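The bagging procedure described above can be sketched as follows. The hour-of-day base model and the synthetic hourly load series are illustrative stand-ins for real forecasters and data.

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic hourly load: a daily cycle plus noise, two weeks long.
t = np.arange(24 * 14)
y = 100 + 20 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 3, t.size)

def fit_predict(idx, hours_new):
    # Weak base model: average load per hour of day on a bootstrap sample,
    # falling back to the overall mean if an hour is missing from the sample.
    hod = t[idx] % 24
    out = []
    for h in hours_new % 24:
        vals = y[idx][hod == h]
        out.append(vals.mean() if vals.size else y[idx].mean())
    return np.array(out)

hours_new = np.arange(6)                    # forecast the next six hours
preds = []
for _ in range(25):                         # 25 bootstrap replicates
    idx = rng.integers(0, t.size, t.size)   # resample rows with replacement
    preds.append(fit_predict(idx, hours_new))
bagged = np.mean(preds, axis=0)             # aggregate by averaging
print(bagged.round(1))
```

Averaging the 25 bootstrap forecasts smooths out the variance of any single resampled model, which is exactly the variance-reduction effect described above.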

B. Boosting
Boosting is a technique that involves training multiple models sequentially, with each model focusing on the examples that the previous model misclassified. Boosting can improve the accuracy of the forecast but may be more prone to overfitting than bagging. The basic idea behind boosting is to start with a simple model and then sequentially add more complex models that focus on the examples that the previous model misclassified [80]. The models are combined using a weighted sum, with weights that depend on their accuracy. By focusing on the examples that the previous model misclassified, boosting can produce a more accurate and robust forecast.
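A minimal sketch of the sequential idea behind boosting follows. For a regression target, "focusing on misclassified examples" becomes fitting each new weak learner to the current residuals; the one-split regression stumps, learning rate, and toy data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, x.size)   # noisy target

def fit_stump(x, r):
    # Weak learner: a one-split regression stump fitted to residuals r.
    best = None
    for s in np.linspace(0.05, 0.95, 19):        # candidate split points
        left, right = r[x <= s].mean(), r[x > s].mean()
        err = ((np.where(x <= s, left, right) - r) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, s, left, right)
    _, s, left, right = best
    return lambda q: np.where(q <= s, left, right)

pred = np.zeros_like(y)
nu = 0.3                                         # learning rate (shrinkage)
for _ in range(50):                              # 50 boosting rounds
    stump = fit_stump(x, y - pred)               # fit the current residuals
    pred += nu * stump(x)                        # add the weak learner's vote

mse = np.mean((pred - y) ** 2)
print(mse)
```

Each round corrects what the running ensemble still gets wrong, so the fit improves well past the constant-mean baseline; the shrinkage factor nu is what keeps this sequential process from overfitting too quickly.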

C. Stacking
Stacking, or stacked generalization, is a technique that involves training multiple models on the training data and then using their predictions as input to a higher-level model. The higher-level model learns to combine the predictions of the lower-level models to produce a final prediction. Stacking can improve the accuracy and robustness of the forecast but may be more complex to implement and interpret than bagging or boosting [81]. By combining the predictions of multiple models in this way, stacking can create a more accurate and robust forecast.
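The stacking scheme can be sketched as follows: two simple level-0 forecasters are combined by a least-squares level-1 model fitted on a held-out block. Both base models and the synthetic data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
t = np.arange(24 * 21, dtype=float)               # three weeks of hours
y = 100 + 20 * np.sin(2 * np.pi * t / 24) + 0.05 * t + rng.normal(0, 2, t.size)

train, hold = t < 24 * 14, t >= 24 * 14           # level-0 / level-1 split

# Two level-0 models: a linear trend and an hour-of-day profile.
trend = np.polyfit(t[train], y[train], 1)
p1 = np.polyval(trend, t)
profile = np.array([y[train][(t[train] % 24) == h].mean() for h in range(24)])
p2 = profile[(t % 24).astype(int)]

# Level-1 model: least-squares weights over the base predictions,
# fitted on the held-out block only to avoid reusing the training data.
Z = np.column_stack([p1[hold], p2[hold], np.ones(hold.sum())])
w, *_ = np.linalg.lstsq(Z, y[hold], rcond=None)
stacked = np.column_stack([p1, p2, np.ones(t.size)]) @ w

mse_stack = np.mean((stacked[hold] - y[hold]) ** 2)
print(mse_stack)
```

Because each base prediction lies in the span of the level-1 design matrix, the stacked combination can never do worse on the held-out block than either base model alone.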

Hybrid Models
Hybrid models combine the advantages of both statistical and machine learning models. The most widely used hybrid models for STLF are ARIMA-SVR and ES-ANN. ARIMA-SVR integrates the ARIMA and SVR models and can capture the temporal patterns and nonlinear relationships of load demand [82]. ES-ANN combines the ES and ANN models and can capture the seasonal and nonlinear patterns of load demand. Hybrid models can improve the accuracy and interpretability of STLF models for power system applications. However, they require more computational resources than statistical models and may be more challenging to implement than machine learning models [83]. Hybrid models, which combine different modeling techniques, have been proposed to improve the accuracy of load forecasts [84]. The following are some of the essential hybrid models:

ARIMA-ANN Hybrid Model
The autoregressive integrated moving average (ARIMA) model is a classical time-series forecasting method that captures linear dependencies in the data. Artificial neural networks (ANNs) are capable of learning complex nonlinear patterns. By combining the linear forecasting ability of ARIMA with the nonlinear forecasting ability of ANNs, this hybrid model can capture both linear and nonlinear dependencies in the load data, resulting in improved STLF accuracy [85].
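The residual-modeling idea behind the ARIMA-ANN hybrid can be sketched as follows, with two simplifying stand-ins: a plain least-squares AR(24) model in place of a full ARIMA, and a random-feature network with least-squares output weights (an extreme-learning-machine-style model) in place of a backpropagation-trained ANN.

```python
import numpy as np

rng = np.random.default_rng(5)
t = np.arange(500, dtype=float)
# Load with a smooth cycle (linear-model friendly) plus a square-wave
# weekly component (nonlinear) plus noise.
y = (10 * np.sin(2 * np.pi * t / 24)
     + 3 * np.sign(np.sin(2 * np.pi * t / 168))
     + rng.normal(0, 0.5, t.size))

p, N = 24, y.size                                # one day of lags
X = np.column_stack([y[p - j - 1: N - j - 1] for j in range(p)])  # lags 1..p
target = y[p:]

# Stage 1: linear AR(p) model fitted by least squares (ARIMA stand-in).
D = np.column_stack([X, np.ones(target.size)])
w_ar, *_ = np.linalg.lstsq(D, target, rcond=None)
lin = D @ w_ar
resid = target - lin                             # what the linear model misses

# Stage 2: a random-feature network on the same lags models the residuals
# (stand-in for a backprop-trained ANN on the nonlinear part).
H = np.tanh(X @ rng.normal(scale=0.3, size=(p, 50)))
w_nn, *_ = np.linalg.lstsq(H, resid, rcond=None)
hybrid = lin + H @ w_nn                          # linear part + nonlinear correction

mse_lin = np.mean((lin - target) ** 2)
mse_hyb = np.mean((hybrid - target) ** 2)
print(mse_lin, mse_hyb)
```

Since the second stage fits only the residuals of the first, the hybrid's in-sample error can never exceed the linear model's, which mirrors the rationale for combining the two approaches.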

Wavelet-Transform-Based Hybrid Models
Wavelet transform is a technique that decomposes a time series into different frequency components, which can be analyzed separately. The high-frequency components represent noise and sudden changes, while the low-frequency components capture the underlying trends. Wavelet transform can be combined with various forecasting techniques, such as ANNs, support vector machines (SVMs), or long short-term memory (LSTM) networks, to create a hybrid model that handles the frequency components separately, leading to improved forecast accuracy [86].
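One level of the Haar wavelet transform, the simplest wavelet, illustrates this decomposition into low- and high-frequency components. In a hybrid model, each component would be forecast separately and the forecasts recombined; the synthetic load series here is an illustrative assumption.

```python
import numpy as np

def haar_step(x):
    """One level of the Haar wavelet transform: split a signal into a
    low-frequency approximation and a high-frequency detail, each half
    the original length (len(x) must be even)."""
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)    # underlying trend (low frequency)
    detail = (even - odd) / np.sqrt(2)    # noise / sudden changes (high frequency)
    return approx, detail

def haar_inverse(approx, detail):
    # Exact inverse of haar_step: interleave the reconstructed samples.
    x = np.empty(2 * approx.size)
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x

rng = np.random.default_rng(6)
load = 100 + 20 * np.sin(2 * np.pi * np.arange(64) / 24) + rng.normal(0, 1, 64)
a, d = haar_step(load)
rebuilt = haar_inverse(a, d)
print(np.max(np.abs(rebuilt - load)))
```

The transform is perfectly invertible, so forecasting the approximation and detail series separately and then applying the inverse loses no information.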

EEMD-ANN Hybrid Model
The ensemble empirical mode decomposition (EEMD) is an advanced signal processing technique that decomposes a non-stationary time series into a set of intrinsic mode functions (IMFs). Combining EEMD with ANN allows this hybrid model to handle non-stationary and nonlinear load data more effectively. The EEMD preprocesses the load data by extracting the IMFs, and the ANN is trained on these IMFs to generate forecasts. The forecasts are then combined to produce the final STLF [87].

Fuzzy-Logic-Based Hybrid Models
Fuzzy logic is a mathematical approach that deals with uncertainty and imprecision in data. It can be combined with other forecasting techniques, such as ANNs, SVMs, or regression models, to create a hybrid model that handles the uncertainty in load data more effectively. Fuzzy logic can preprocess the input data, model the uncertainties in the forecasting model, or fuse the forecasts from different models [88].
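A minimal sketch of the fuzzy preprocessing step: triangular membership functions map a crisp load value to degrees of membership in linguistic terms. The term names and breakpoints below are hypothetical illustrations, not taken from any standard or from the reviewed literature.

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function: 0 outside [a, c], peaking at 1 at b."""
    return np.maximum(0.0, np.minimum((x - a) / (b - a), (c - x) / (c - b)))

def fuzzy_load_level(load_mw):
    # Hypothetical linguistic terms for a feeder load, with overlapping
    # supports so intermediate loads belong partly to two terms.
    return {"low":    tri(load_mw,  -1.0,   0.0,  50.0),
            "medium": tri(load_mw,  30.0,  60.0,  90.0),
            "high":   tri(load_mw,  70.0, 100.0, 130.0)}

mu = fuzzy_load_level(np.array([40.0]))
print({k: float(v[0]) for k, v in mu.items()})
```

A load of 40 MW is partly "low" and partly "medium", which is precisely the graded, imprecise encoding a fuzzy hybrid feeds into the downstream forecasting model.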

Deep-Learning-Based Hybrid Models
Deep learning techniques, such as convolutional neural networks (CNNs) and LSTMs, have shown great potential in STLF due to their ability to learn hierarchical and temporal features in the data. These deep learning models can be combined with other forecasting techniques, such as statistical models, wavelet transform, or fuzzy logic, to create hybrid models that leverage the strengths of both approaches for improved STLF accuracy [89].

Performance Comparison of STLF Models
The performance of STLF models depends on various factors, such as the size and quality of the data, the forecasting horizon, and the complexity of the underlying relationships between the variables. In general, intelligent models outperform statistical models, and hybrid models outperform both statistical and intelligent models [90]. However, the models' relative performance can vary depending on the application and dataset. For example, statistical models may perform well when the data are stationary and linear and when limited data are available. Intelligent models may perform well when the data are nonlinear and non-stationary and when a large amount of data is available. Hybrid models may perform well when there are both linear and nonlinear relationships between the variables and when the data are noisy or incomplete. Table 2 summarizes the findings.

The Road Ahead
Short-term load forecasting (STLF) is critical in the energy industry. It enables power system operators to make informed decisions about resource allocation, power generation, and grid stability. Over the years, numerous STLF models have been developed and evaluated, but there is still room for improvement. Some potential future research directions in the development of STLF models are shown in Figure 14.

1. Incorporating new data sources: Traditional STLF models rely on historical load, weather, and calendar data as input features. However, the availability of new data sources, such as social media data and smart meter data, presents an opportunity to develop more accurate and robust STLF models [103]. Future studies can examine the application of machine learning algorithms to identify the most relevant data sources for predicting electricity demand and how best to incorporate these data sources into STLF models.

2. Development of hybrid models: Hybrid models combine different models or techniques to address specific challenges or achieve specific goals. Hybrid models can combine traditional STLF models with models for predicting renewable energy production or demand-side management (DSM) models [104]. Future research can examine the development of more advanced hybrid models that can handle multiple input features, uncertainty quantification, and other challenges in STLF modeling.

3. Integration of advanced machine learning techniques: Advanced machine learning techniques, such as deep learning and reinforcement learning, have shown great potential in improving the accuracy of STLF models. Deep learning techniques can automatically learn complex patterns and relationships in the data, while reinforcement learning can learn to optimize actions based on feedback from the environment [105]. Future work can investigate the creation of more sophisticated machine learning models that can manage vast volumes of data, combine numerous input features, and adjust to shifting energy system conditions.

4. Handling of non-stationary and nonlinear load data: Traditional STLF models assume that the load data are stationary and linear. However, the load data can be non-stationary and nonlinear due to changes in consumer behavior, the introduction of new technologies, and other factors [106]. Future research can explore the development of STLF models that can handle non-stationary and nonlinear load data, either through advanced machine learning techniques or more flexible statistical models.

5. Integration of probabilistic forecasting: Probabilistic forecasting measures the uncertainty around the point forecast. It can help power system operators make more informed decisions in the face of uncertainty [107]. Future work can support the development of STLF models that provide both point forecasts and uncertainty estimates, such as prediction intervals or probabilistic forecasts. These uncertainty estimates can help to identify potential risks and improve the overall reliability of the power system.

6. Integration of online learning: Online learning is a type of machine learning that can adapt to changing conditions in the energy system in real time [108]. Online learning algorithms can learn from new data as they become available and adjust the forecast accordingly. Future studies may look toward creating STLF models that employ online learning algorithms to increase forecast precision and timeliness [109].

7. Development of interpretable models: Interpretable models are models that can provide insights into the factors that are driving the forecast. Interpretable models can help power system operators understand the underlying patterns and relationships in the data and make more informed decisions about resource allocation and power generation [110]. Additional studies may examine the creation of STLF models that are easier to understand, either via the use of advanced machine learning techniques or more straightforward statistical models [111].

8. Integration of ensemble methods: Ensemble methods, such as bagging and boosting, have shown great potential in improving the accuracy and robustness of STLF models. Ensemble methods can combine multiple models, mitigate overfitting, and be used to select the best model for a given dataset [112]. Future work could investigate the creation of more sophisticated ensemble models capable of handling various input features, uncertainty quantification, and other difficulties in STLF modeling [113].

9. Handling of data quality issues: Data quality issues, such as missing data, outliers, and measurement errors, can have a significant impact on the accuracy of STLF models [114]. Future studies could help create STLF models that are robust to such issues, either through prediction models that impute missing values or more sophisticated statistical models that can handle missing data directly.

10. Integration of domain knowledge: Domain knowledge, such as knowledge about consumer behavior, the energy system, and the environment, can provide valuable insights into the factors driving electricity demand [115]. Future studies can create STLF models that incorporate domain knowledge into the modeling process, either through expert systems or sophisticated machine learning methods that can add domain knowledge as additional input features.

11. Development of adaptive models: The energy system constantly changes, and the factors driving the electricity demand can vary over time [116]. Future studies may examine the creation of STLF models that can modify their predictions in response to changing energy system conditions, either using online learning algorithms or more adaptable statistical models.

12. Handling multiple time scales: The electricity demand can exhibit patterns on various time scales, such as daily, weekly, and seasonal patterns [117]. Future studies can develop STLF models that can handle multiple time scales, either by combining models trained on various time scales or by utilizing more sophisticated machine learning methods.

13. Integration of uncertainty information: Uncertainty information, such as information about input data reliability or model accuracy, can provide valuable insights into the quality of the forecast [118]. Future studies may explore creating STLF models that can incorporate uncertainty data into the modeling process through probabilistic models or more sophisticated statistical models that can calculate forecast uncertainty [119].

14. Development of models for distributed energy resources: The increasing use of distributed energy resources, such as rooftop solar panels and energy storage systems, has introduced new challenges for STLF models [120]. Future studies may develop STLF models that can account for distributed energy resources, either through models that forecast the production of renewable energy or models that forecast the effects of distributed energy resources on electricity consumption [121,122].
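As a minimal sketch of the probabilistic-forecasting direction discussed above, the following example wraps a simple point forecaster in a distribution-free prediction interval built from empirical residual quantiles. The hour-of-day model and synthetic data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
t = np.arange(24 * 28, dtype=float)               # four weeks of hourly load
y = 100 + 20 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 3, t.size)

# Point model: hour-of-day profile fitted on the first three weeks.
train = t < 24 * 21
profile = np.array([y[train][(t[train] % 24) == h].mean() for h in range(24)])
point = profile[(t % 24).astype(int)]

# Empirical residual quantiles on the training window give a simple,
# distribution-free 90% prediction interval around the point forecast.
resid = (y - point)[train]
lo_q, hi_q = np.quantile(resid, [0.05, 0.95])

future = ~train                                    # the held-out fourth week
covered = np.mean((y[future] >= point[future] + lo_q)
                  & (y[future] <= point[future] + hi_q))
print(covered)
```

On the held-out week the empirical coverage should sit near the nominal 90%, giving operators an interval rather than a single number to plan against.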

Conclusions
STLF models are an essential component of the energy industry, as they enable power system operators to make informed decisions about resource allocation, power generation, and grid stability.This review article has presented an overview of the state-of-the-art STLF models for power system applications, including statistical, intelligent, and hybrid models.These models have their strengths and limitations.The choice of model depends on various factors, such as the size and quality of the data, the forecasting horizon, and the complexity of the underlying relationships between the variables.Power system operators should carefully evaluate the performance of different models and consider these factors when selecting the most appropriate model for their application.The development of STLF models is an ongoing research area, and future advances in machine learning, data analytics, and computational resources are expected to improve the accuracy and robustness of STLF models.

Figure 4 .
Figure 4. Algorithm of the ARIMA model.


Figure 5 .
Figure 5. Algorithm of the SARIMA model. The equations for the SARIMA model components are as follows. Autoregressive (AR) and moving average (MA) components: φ(B)X_t = c + θ(B)ε_t, where φ(B) is the autoregressive operator, B is the backshift operator, X_t is the time series at time t, c is a constant, θ(B) is the moving average operator, and ε_t is the error term. Differencing (I) component: Y_t = (1 − B)^d X_t, where (1 − B)^d is the differencing operator and Y_t is the differenced time series.

Figure 6 .
Figure 6.Principle of an ES model.

Figure 8 .
Figure 8. Block diagram of the support vector machine system.


Figure 9 .
Figure 9. Block diagram of a decision tree.

Figure 11 is a block diagram of the gradient boosting algorithm. The training data determine the weak learners used to make a more accurate prediction.

Figure 11 .
Figure 11.Block diagram of the gradient boosting algorithm.


4. Neurons: Each neuron in the MLP model receives input from other neurons, processes it using an activation function, and sends the output to other neurons in the next layer. The activation function introduces non-linearity, which enables the MLP to learn complex patterns in the data.
5. Weights and biases: Each connection between neurons has a weight that determines the strength of the association. The weights are adjusted during training to minimize the difference between the predicted and actual values. Biases are additional constants that help shift the activation function, improving the model's learning ability.
6. Training: MLP models are trained using a backpropagation algorithm, which adjusts the weights and biases by minimizing the error between the predicted and actual values. The process is iterative, involving multiple passes through the data to fine-tune the model.
7. Loss function: This is a measure of how well the MLP model is performing. A lower value indicates better performance. During training, the goal is to minimize the loss function.

Figure 13 .
Figure 13.Types of the ensemble method.


Author Contributions: Conceptualization, S.A. and S.S.; methodology, S.S. and M.J.; formal analysis, A.Z. and H.S.U.; validation, Z.L. and R.G.; visualization, H.K.; investigation, all authors; writing-original draft preparation, S.A. and S.S.; writing-review and editing, A.Z., H.S.U., M.J., H.K., R.G. and Z.L.; supervision, H.K. and M.J.; project administration, H.K.; funding, R.G. All authors have read and agreed to the published version of the manuscript.

Funding: This work received funding from an SGS Grant from VSB-Technical University of Ostrava under grant number SP2023/005.

Data Availability Statement: Not applicable.

Figure 14 .
Figure 14.The road ahead for STLF models.

Table 1 .
Contributions and limitations of recent publications.


Table 2 .
Performance comparison of STLF Models.