Generic and multi-site electricity demand data is difficult to obtain. Because the authors are familiar with Panama’s electricity infrastructure and Panama makes its electricity load data publicly available, the Panama electricity load data was used to build the models. In Panama, the National Dispatch Center (CND) is in charge of power system planning and operation. According to CND methodologies [3], the goal of forecasting with an acceptable level of deviation is to anticipate and supply the demand at minimum cost. Short-term forecasting (for the following week) is needed to cover security aspects of the electrical system operation. As stated in the short-term and mid-term methodologies [3], CND performs this forecast planning every week. For short-term scheduling, CND uses optimization software that works on an hourly basis [4]. This optimization tool solves the weekly minimal dispatch cost and requires hourly data about the load forecast, the power plants, and the power grid. CND currently uses the Nostradamus tool by HITACHI ABB [5] to forecast the hourly load and feed the short-term optimization tool in order to plan the following week’s hourly dispatch [6].
Electricity consumption patterns evolve, and new machine learning (ML) approaches keep emerging, which motivates updating the forecasting tools with the most efficient and robust methods to minimize errors. Accordingly, the current research aims to develop better STLF models. The models are benchmarked against Nostradamus’ historical weekly forecasts for Panama’s power grid in an effort to show that the 168 h STLF can be improved. This research’s dataset includes historical load, a large set of weather variables, holidays, and the historical weekly load forecasts, which are used to compare the proposed ML approaches and achieve the objectives stated above.
The Extreme Gradient Boosting Regressor (XGBoost) algorithm showed the best performance among the ML algorithms evaluated, surpassing the historical weekly forecasts predicted by an artificial neural network (ANN) tool and providing information about feature importance.
In addition to this introduction, this paper includes a section with a literature review on the STLF field, followed by a section with the materials and methods employed. Results are presented and discussed in the following section. Finally, the last section presents the conclusions along with limitations and recommendations for future work.
Literature Review
Short-term electricity load forecasting serves a wide range of needs and applications. The most evident difference among studies is the load scale, from a single transformer [
9] to buildings [
10], to cities [
11], regions [
12], and even countries [
13]. The second most important distinction among studies is the forecasting horizon, which varies from very short-term applications, such as forecasting the next 900 s for machine tools [14], to a few hours [15], the day ahead, which is the most common [16], 48 h ahead [17], and weekly forecasts [18]. The forecasting granularity also varies: some studies use 15 or 30 min intervals, but most approaches forecast at hourly granularity. Despite the variety of forecasting applications, the current review focuses on the implemented methodologies, chosen variables, algorithms, and evaluation criteria, since forecast success depends heavily on the decisions made during these development stages.
A wide variety of methodologies and algorithms have been implemented to address STLF, from the most straightforward persistence method, proposed in Reference [19], which follows the basic rule of “today equals tomorrow”, to the most recent deep learning algorithms, as described in the review article of Reference [20]. In that article, the authors compare traditional ML approaches with deep learning methods in the electricity forecasting field and identify the most trending algorithms in Scopus-indexed publications from 2005 to 2015.
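The “today equals tomorrow” rule can be illustrated with a minimal sketch. It assumes hourly data and a 24 h season; the function name `persistence_forecast` and its parameters are hypothetical and are not taken from Reference [19].

```python
def persistence_forecast(hourly_load, horizon=24, season=24):
    """Forecast each future hour with the load observed `season` hours earlier.

    Assumption: `hourly_load` is a chronologically ordered list of hourly values.
    """
    if len(hourly_load) < season:
        raise ValueError("need at least one full season of history")
    history = list(hourly_load)
    forecast = []
    for _ in range(horizon):
        value = history[-season]   # same hour, one season (day) earlier
        forecast.append(value)
        history.append(value)      # lets the horizon exceed one season
    return forecast


# Toy example with a 3-step "day": the last day is repeated over the horizon.
print(persistence_forecast([10.0, 12.0, 15.0], horizon=5, season=3))
# [10.0, 12.0, 15.0, 10.0, 12.0]
```

Such a baseline requires no training, which is why it is commonly used as a reference point for more elaborate models.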
Time-series analysis is considered one of the most widely discussed forecasting methodologies in which the Box-Jenkins and Holt-Winters procedures are extensively used. For example, the authors of Reference [
21] used those methods to forecast the weekly load for Riyadh Power System in Saudi Arabia, concluding that these approaches give insights to decompose the electric load forecast. The autoregressive integrated moving average (ARIMA) model is proposed to forecast the next 24 h in Iran [
22]. This modified ARIMA combines the estimation with temperature and load data, producing an enhancement to the traditional ARIMA model. The ARIMA model by itself does not significantly improve the forecast accuracy and is computationally more expensive, demonstrating the need to complement these models with external inputs to enhance the results.
Overall, in most recent research, these models are less used for electricity STLF since ML methods provide better results, as demonstrated in References [
23,
24], and more recently in Reference [
25]. Particularly, in this last cited study, the authors compare the performance of six classical data-driven regression models and two deep learning models to deliver a day-ahead forecast for Jiangsu province, China, concluding that the ARIMA model had several limitations to solve the STLF problem. First, it can only consider time-series data to forecast based on the electrical load. Second, the determination of the model order is either computationally expensive or empirical. Lastly, to make residuals uncorrelated, several trials are required. At the same time, autocorrelation function (ACF) and partial autocorrelation function (PACF) graphs need to be iteratively checked to tune the model.
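As an illustration of the quantity inspected when tuning such models, the sample autocorrelation function can be computed as in the following minimal sketch. The function name `sample_acf` is hypothetical, and the estimator shown is the common biased one; it is not claimed to be the procedure used in Reference [25].

```python
def sample_acf(series, max_lag):
    """Return the sample autocorrelation for lags 0..max_lag.

    Uses the usual biased estimator: the lag-k autocovariance divided by
    the lag-0 autocovariance, both computed around the overall mean.
    """
    n = len(series)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the series length")
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


# A series with period 4 shows a positive peak at lag 4 and a negative value at lag 2.
acf = sample_acf([1.0, 2.0, 3.0, 2.0] * 6, max_lag=4)
```

When applied to model residuals, values close to zero at all non-zero lags suggest that the residuals are uncorrelated.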
In contrast with the classical statistical time-series models, ML models can handle more valuable factors, such as weather conditions, to improve the STLF accuracy. Multiple linear regression (MLR) has been widely used for STLF. For example, the authors of Reference [
26] used it to forecast the hourly weekly load in Thailand, obtaining an average mean absolute percentage error (MAPE) of 7.71% for 250 testing weeks and pointing out that temperature is a primary factor to predict load. Similarly, the authors of Reference [
13] utilized MLR to forecast electricity consumption 24 h ahead for 14 west-African countries, considering weather variables like temperature, humidity, and daylight hours. The researchers that have implemented MLR agreed on the fast training and interpretability this model offers, although it shows poor performance for irregular load profiles.
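Since MAPE is the evaluation criterion most often quoted in this review, with RMSE also appearing, a minimal sketch of both metrics follows. The function names are illustrative, and the check for zero actual values is an assumption added to keep the MAPE well defined.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the units of the load."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Two hours with a 10% error each give a MAPE of about 10%.
error_pct = mape([100.0, 200.0], [110.0, 180.0])
```

MAPE is scale-free, which makes results comparable across grids of different sizes, whereas RMSE depends on the load units and penalizes large deviations more strongly.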
Another approach that has been widely used for STLF during the last decades is the artificial neural network (ANN), mainly due to the algorithm flexibility. For example, in the 2006 study of Reference [
10], an ANN is proposed with a Levenberg-Marquardt training algorithm to forecast hourly, daily, and weekly load in Ontario, Canada, presenting good results without comparing with other algorithms. Furthermore, the authors of Reference [
9] predicted a single transformer hourly load, using quarter-hour load records and weather data with hourly records, obtaining a MAPE performance below 1% with ANN for summer and winter seasons. In more recent research, the authors of Reference [
27] apply STLF for urban smart grid systems in Australia, commenting that ANN has good generalization ability for the task. However, this approach still has many disadvantages such as quickly falling into a local optimum, over-fitting, and exhibiting a relatively low convergence rate. Nevertheless, forecasting smart grid loads with increasing renewable energy sources is challenging and deserves complex solutions to obtain good results.
The support vector regression (SVR) model is another popular model for STLF, mainly with a linear kernel, due to the linearity between the inputs and the forecast, as concluded by the authors of Reference [
25], who obtained a MAPE under 2.6% for the day-ahead prediction, performing better than MLR and multivariate adaptive regression splines. Similarly, the authors of Reference [
17] proposed forecasting 48 h of Portuguese electricity consumption with SVR as a better alternative, after previously proposing an ANN for the same task in Reference [
28]. The main reason for preferring SVR was the efficiency of the hyperparameter tuning on the daily online forecast. The SVR achieved a MAPE between 1.9% and 3.1% for the first-day forecast and between 3.1% and 4% for the second day. A variant of SVR is compared against ANN by the authors of Reference [
29] to forecast the south-Iranian day-ahead hourly load. They proposed the nu-SVR, which improves upon SVR by changing the algorithm optimization problem and automatically allowing the epsilon tube width to adapt to data. They evaluated both models for each season: the average MAPE was 2.95% for nu-SVR and 3.24% for ANN.
The random forest (RF) ensemble technique combines independent learners to improve the overall model forecasting ability. The research presented in Reference [
30] took advantage of this principle to forecast the day-ahead hourly consumption in office buildings. They used many ensemble algorithms, with RF being one of them, including environmental variables such as temperature and humidity and lagged load records to improve the results. Finally, they obtained a 6.11% MAPE for RF.
Similarly, the authors of Reference [
31] submitted a comparative study between many models to forecast smart buildings’ electricity load. ARIMA, Seasonal ARIMA (SARIMA), RF, and extreme gradient boosting (XGB) were on this set of models. Their experiments demonstrated that RF showed decent results, but XGB outperformed the other methods, concluding that XGB gives better accuracy and better performance in terms of execution time. The study from Reference [
32] compares RF solely with XGB to forecast the next 24 h load and concludes that XGB, as an emerging ensemble learning algorithm, can achieve higher prediction accuracy, producing a root mean squared error (RMSE) of 3.31 for RF and 2.01 for XGB.
The authors of Reference [
33] suggest using XGB, including weather variables and historical load, to forecast the hourly weekly load of a power plant. They remark on the complexity of the XGB hyperparameter tuning phase. For this reason, the fireworks algorithm is proposed to search for the global minimum in the hyperparameter space and thereby obtain a more accurate load forecast. STLF for holidays is one of the most challenging tasks within the field due to their irregular load profile. In Reference [16], the authors argue that there are many mature predictive methods for STLF, such as SVR, ANN, and deep learning (DL). However, those methods have some issues: SVR is not robust to outliers, ANN requires setting the correct number of hidden layers and can easily be trapped in a local minimum, and DL approaches require massive high-dimensional datasets for good performance. According to the authors, XGB lacks these issues and outperforms the others for solving STLF. Their approach averages the daily profile curves of similar holidays and combines this with XGB; this combination outperforms RF, SVR, ANN, and even XGB used alone.
Despite the good XGB performance, some authors recommend training the model based on similar days to enhance the forecast [
34,
35]. A comparison between a traditional XGB and the similar days XGB is demonstrated in Reference [
34]. The similar days approach showed a noticeable improvement, emphasizing that the accurate selection of similar days will directly affect the STLF.
Because XGB provides the feature importance property, the authors of Reference [
36] proposed a hybrid algorithm to classify similar days with K-means clustering fed by XGB feature importance results. Once the classification is done, an empirical mode decomposition method is used to decompose the similar days’ data into several intrinsic mode functions, a separate long short-term memory (LSTM) model is trained on each of them, and finally the time series is reconstructed from the individual LSTM model predictions. This hybrid model using LSTM performed better for STLF over 24 and 168 h horizons when compared with ARIMA, SVR, and a back-propagation neural network using the same similar-day approach as initial input.
The authors of Reference [
37] proposed a multi-step-ahead forecasting methodology using XGB and SVR to forecast hourly heat load, in which a “direct” and a “recursive” forecasting strategy are compared. The direct strategy trains an independent model for each period of the forecasting horizon, while the recursive strategy uses a single model that iterates one step at a time over the forecasting horizon, taking the previously predicted steps as input for the following forecasting step. The main disadvantage of the direct strategy is its computational cost, because it needs to train as many models as there are periods to forecast. The recursive strategy is sensitive to prediction errors, because errors propagate along the forecasting horizon.
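The difference between the two strategies can be sketched as follows. To keep the example self-contained, a one-lag least-squares line replaces the XGB and SVR models of Reference [37]; the helper names `fit_line`, `recursive_forecast`, and `direct_forecast` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return mean_y, 0.0  # constant input: fall back to the mean of the targets
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope


def recursive_forecast(series, horizon):
    """One model for the next step, applied repeatedly to its own predictions."""
    if len(series) < 3:
        raise ValueError("series is too short")
    intercept, slope = fit_line(series[:-1], series[1:])
    predictions, last = [], series[-1]
    for _ in range(horizon):
        last = intercept + slope * last  # the prediction becomes the next input
        predictions.append(last)
    return predictions


def direct_forecast(series, horizon):
    """One independent model per step h, each mapping y[t] to y[t + h]."""
    if len(series) < horizon + 2:
        raise ValueError("series is too short for the requested horizon")
    predictions = []
    for h in range(1, horizon + 1):
        intercept, slope = fit_line(series[:-h], series[h:])
        predictions.append(intercept + slope * series[-1])
    return predictions


trend = [float(i) for i in range(20)]
print(recursive_forecast(trend, 3), direct_forecast(trend, 3))
```

On this noise-free linear trend both strategies return the same values. On real load data they generally differ: the direct strategy fits `horizon` models, while the recursive strategy fits one model and feeds its own errors forward.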
A study to forecast the 10-day streamflow for a hydroelectric dam used a decomposition-based methodology to compare XGB and SVR [
38]. In this study, the streamflow time-series were decomposed into seven contiguous frequency components using the Fourier Transform. Then, each component was forecasted independently by the SVR or XGB. The study results showed that SVR outperformed XGB in terms of evaluation criteria through the Fourier decomposition methodology.
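A decomposition into contiguous frequency components can be sketched with a plain discrete Fourier transform. How Reference [38] selected its seven bands is not stated here, so the equal-width bands below are an assumption, and the function names `dft` and `band_components` are hypothetical.

```python
import cmath


def dft(values):
    """Naive O(n^2) discrete Fourier transform, sufficient for short series."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def band_components(series, n_bands=7):
    """Split a real series into n_bands time-domain components.

    Each component keeps one contiguous range of frequencies (assumed to be
    of equal width); the components sum back to the original series.
    """
    n = len(series)
    spectrum = dft(series)
    half = n // 2
    # Band edges over the folded frequency index 0..half (bin k and bin n-k share one index).
    edges = [round(i * (half + 1) / n_bands) for i in range(n_bands + 1)]
    components = []
    for b in range(n_bands):
        low, high = edges[b], edges[b + 1]
        kept = [c if low <= min(k, n - k) < high else 0 for k, c in enumerate(spectrum)]
        # Inverse DFT of the masked spectrum; the result is real up to rounding error.
        component = [
            (sum(kept[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)
        ]
        components.append(component)
    return components


parts = band_components([1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 3.0, 2.0], n_bands=3)
```

Each component could then be forecast by a separate model, and the component forecasts summed to obtain the final prediction.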
Another solution joining ANN with ensemble approaches is presented in Reference [
39], where the authors seek to improve ANN generalization ability using bagging-boosting. Ensembles of ANNs are trained in parallel, each on a bootstrapped sample of the training data, and within each ensemble the ANNs are trained sequentially. This method reduces the STLF error but increases the computational time because of the several training procedures. As an alternative to training several ANNs sequentially, the authors of Reference [40] propose a novel evolutionary optimization procedure for tuning an ANN, thereby avoiding issues related to ANN tuning such as overfitting and selecting the best ANN architecture. Their results achieved a 4.86% MAPE. Based on the results from References [
10,
36], ANN for STLF can outperform other forecasting methods if a robust hyperparameter optimization is performed to avoid the issues related to ANN tuning.
Among neural network approaches, recurrent neural networks (RNN), especially LSTM, are taking an important place in the STLF field. Unlike standard feedforward neural networks, LSTM has feedback connections, which are beneficial for time-series forecasting applications. For example, a study forecasting the next 24 h load of a smart grid compared LSTM with a back-propagation ANN and SVR, reporting a MAPE of 1.9% for LSTM against 3.3% for ANN and 4.8% for SVR [
41]. Similarly, the authors of Reference [
42] address the STLF for a furniture company with a method based on a multi-layer LSTM and compare it to other models like ARIMA, exponential smoothing, k-nearest neighbors regressor, and ANN. Their results showed that LSTM performed better in both RMSE and MAPE, followed by SVM and ANN. According to Reference [43], deep learning methods have a superior performance in electricity STLF. However, the potential of these methods has not yet been fully exploited in terms of the hidden layer structures. For this reason, they evaluate deep-stacked LSTM with multiple layers in both unidirectional (Uni-LSTM) and bidirectional (Bi-LSTM) configurations, with SVR as the baseline. Their results showed that Bi-LSTM returned a MAPE of 0.22% against MAPE scores above 2% for Uni-LSTM and SVR.
The hybridization of the successive geometric transformations model (SGTM) neural-like structure is another promising approach for STLF, as used in Reference [
44] to predict Libya’s solar radiation. This approach demonstrated a higher accuracy than MLR, SVR, RF, and multilayer perceptron neural network, besides having a faster training time due to the non-iterative training procedure.