1. Introduction
The fundamental objective of electric power industry deregulation is to maximize efficient generation and consumption of electricity and reduce energy prices. To achieve these goals, accurate and efficient electricity load forecasting is becoming more and more important [
1]. Distribution and transmission system operators rely on these forecasts to deal with the stochastic variations of the distributed renewable power sources connected to the grid [
2]. This holds true both for the aggregate system load (i.e., on a country basis) as well as for the load met by micro-grids. Although the main body of the specialized literature addresses the prediction of the total load on a country, region, or county/community level, significant attention is shifted to the bus load of the transmission and distribution systems, which are more affected by the stochastic nature of individual loads [
3]. Moreover, forecasting the aggregate system load has become a major issue in modern power systems, since it is a prerequisite for electricity price forecasting [
4]. Thus, short-term load forecasting (STLF) is an indispensable tool in power systems planning, maintenance, and management and in smart grid applications [
5]. It builds on decades of research activity and applications, which have produced a multitude of forecasting models, methodologies, and tools for day-ahead and hour-ahead load predictions.
Forecasting techniques can be broadly categorized into (i) statistical or time series-based methods, (ii) physical methods, and (iii) hybrid or ensemble methods. However, machine learning (ML) techniques have by far outperformed the other categories. ML techniques include a large variety of artificial neural networks (ANN), starting from the well-known multilayer perceptron (MLP), as well as the support vector machine (SVM) [
6], the Markov chain methods, etc. ANN-based forecasting is generally considered very effective because of the ANNs’ ability to learn complex, nonlinear relationships [
2] and their significant advantage of being universal approximators [
7]. The specific application requirements should be considered in the selection of the most suitable forecasting methodology. However, every successful forecasting model should be characterized by low computational expenditure and should be able to incorporate empirical knowledge. Further, it should be flexible and straightforward in the interpretation of its results [
4]. The models proposed in the literature can be classified into two categories: trend methods and similar-day approaches [
5]. Trend methods interpolate the demand curve as a function of time and extrapolate the curve to predict future demand. The similar-day approach tracks similarities between current and historical load curves. Machine learning techniques fit this second category. A training set is formed in order to optimally determine or fit the model parameters. After sufficient training, the tuned model is applied to the test dataset. If the modeling error exceeds the set tolerance, the training set is modified, and the model is trained again. Aiming at the reduction of forecasting errors, many researchers have proposed hybrid models that combine a clustering algorithm and a forecaster. The clustering algorithm captures characteristic attributes of the data, including outliers, periodic behavior, and other relationships. The training set is divided into a number of relatively homogeneous clusters, and each cluster is employed to train its own forecaster, which improves the training of each forecaster. Time series models or ML models can be employed as forecasters in this process. The self-organizing map (SOM) can be profitably combined with SVM for peak load prediction. Fan et al. exploited an SOM for the categorization of their training data [
8]. Separate support vector regression models are applied to each cluster to predict the daily peak load. Che et al. [
9] combined the SOM and the support vector regression (SVR) with adaptive fuzzy rule forecasting and applied it to cases with variable training period lengths. The seasonality of load peaks is addressed through a functional clustering technique. Mori et al. [
10] combined the deterministic annealing (DA) clustering for the preprocessing of input data, together with an MLP ANN, to predict day-ahead peak load. Kim et al. [
11] classified existing seasonal load data into four patterns using a Kohonen NN. Daubechies D2, D4, and D10 wavelet transforms (WTs) were subsequently adopted to predict the hourly load. Martinez-Alvarez et al. [
12] applied clustering techniques to group and label the data set samples. Thus, the prediction of a data point starts with the extraction of the pattern sequence prior to the day to be predicted. Traditionally, most models were based on feed-forward (FF) ANNs trained by modification of the basic back-propagation algorithm. Cecati et al. [
13] assessed five different learning algorithms for radial basis function (RBF) ANNs to advance their performance in the load forecasting of the ISO-New England market. Typical RBF networks were applied to 24 h electric load forecasting based on SVR, Extreme Learning Machines, and Decay RBF NN. In addition to the shallow and deep, fully connected FF ANNs routinely employed in load forecasting, more complex types are also in use. The long short-term memory (LSTM) network, a version of recurrent neural networks (RNN), adopts a block structure with a number of gates interacting with the previous and next network state. Although more complex than FF ANNs, LSTMs are capable of effectively handling temporal dependencies between variable time series lags. For this reason, they are employed in more complex forecasting tasks, such as the electricity price forecasting in auctions [
14]. Similarly, convolutional neural networks (CNNs), which use convolution to learn patterns within specific time windows, are also employed to learn from the data from different perspectives via data shuffling. A review of more recent developments in the use of deep learning (DL) methods in electric power systems is presented in [
15]. Mishra et al. [
16] analyzed the taxonomy of existing DL algorithms applied to different forecasting problems in the electrical utility industry. Khodayar et al. explored the theoretical advantages of deep learning in power systems research. Supervised, unsupervised, and semi-supervised applications, as well as reinforcement learning tasks, were covered [
17]. Sun et al. [
18] combined Bayesian probability theory and deep learning in a framework employing clustering in sub-profiles to forecast aggregated net load from the Ausgrid distribution network. Input data for the numerical experiments were collected from smart meters in load centers in Sydney, New South Wales. Additional input from residential rooftop PV outputs was considered to enhance the performance of aggregated net load forecasting. In spite of the significant research effort allocated to the structure of the ANN applied in the load forecasting problem, the structure of input data employed for the training dataset did not receive the necessary degree of attention. The majority of the research effort uses standardized input data from data repositories and performs benchmark tests to assess possible improvements in error metrics. On the other hand, the nature of the load forecasting problem on a country or regional basis is very complicated and affected by multiple factors discussed in the next section, in modes that are not yet well understood [
19]. Moreover, day-ahead load forecasting on a country or regional level was challenged by the advent of COVID-19 and the associated shutdown of economic activity, which complicated the prediction. The period from March 2020 to May 2022 had peculiar characteristics due to the COVID-19 pandemic and the measures taken over long time intervals in order to protect public health from the spread of the virus. A study of the effects of shutting down and restarting several activities, aiming to assess the calibration capabilities of the prediction models, is still underway [
20]. Surakhi et al. [
21] investigated the dependence of forecasting accuracy on the selection of an optimal time-lag value. They comparatively tested a statistical approach using auto-correlation, LSTM, and a heuristic optimization algorithm combined with LSTM. In a comprehensive study involving data from load datasets from Australia, Germany, and America, Li et al. [
22] tested a convolution-based DL model with a densely connected network. The model’s backbone is an unshared CNN with a densely connected structure to avoid the vanishing of the gradient. Pavicevic et al. tested various models in temporal convolutional network (TCN) and RNN/LSTM architectures for predicting the electricity price on the Hungarian market and the electricity load in Montenegro. TCN and LSTM layers, both in combination with fully connected layers, demonstrated the best performance, but in cases where all models failed with large errors, the autoregressive LSTM performed even worse [
19]. Mir et al. [
23] presented the systematic development of a short-term load forecast (STLF) model using 5-year hourly load time series for an electric power utility in Pakistan. Following the investigation of previously developed models, they addressed the challenges of STLF by comparatively applying multiple linear regression, bootstrap aggregated decision trees, and ANNs.
Guo et al. [
24] employed electricity consumption data from three cities in Jiangsu, China, to train a DL-based framework, random forest, and gradient boosting machine to forecast the total electricity consumption of 3000 users. To address various factors affecting residential electricity consumption, they used feature engineering. The influencing factors were divided into date-related and air-quality-related factors, weather factors, and local economic factors. As long as the forecasting is applied to large regions or a country level, the attainable accuracy is inferior. This is true, especially when the error metrics are applied to the hourly values of power demand. Wang et al. [
25] applied a stacked denoising auto-encoder (SDA) model, a class of DNN, to forecast the hourly electricity price. The datasets were compiled from hubs in five U.S. states. Two types of forecasting, online hourly forecasting and day-ahead hourly forecasting, were examined. MAPE values in the range from 2.51% to 46% were reported, depending on the price fluctuation, which was very high in January and very low in April 2014. Hossen et al. [
26] employed a DNN for forecasting day-ahead electricity consumption. Ninety days of data from the Iberian utility market were employed for training the multilayer DNN. Various combinations of activation functions were tested, aiming at improved MAPE, taking into account the weekday and weekend variations. The functions tested include the sigmoid, the rectified linear unit (ReLU), and the exponential linear unit (ELU). Weekday MAPE ranged between 2.1 and 3.9%. Weekend MAPE ranged between 1.3 and 2.5%. Din and Marnerides [
27] investigated the feasibility of applying feed-forward DNN and recurrent DNN models using datasets from ISO New England. The proposed models attained their lowest daily demand prediction MAPE, of the order of 1%, in the spring season. The highest errors were observed in the summer and were attributed to unexpectedly high electricity consumption caused by high temperatures and social events. Dong et al. [
28] combined CNN and K-means algorithms to predict hourly load. They applied the K-means algorithm to a dataset of 1.4 million electrical load records, restructuring it into several subsets. Afterward, these subsets were input to the CNN for training and testing. The results were promising, attaining 3% MAPE during summer and 7.4% MAPE during winter. Wen et al. [
29] employed deep RNN–gated recurrent unit (GRU) models for short- and medium-term prediction, which attained a MAPE of 3.5% in their forecasts. Kong et al. [
30] made short-term load predictions at the individual building level. They applied a density-based clustering method to calculate and compare the inconsistency between the combined load and individual loads. Since the consumers’ lifestyle significantly changes the energy consumption pattern, the authors proposed an LSTM–RNN-based load forecasting structure for the load demand dataset. The LSTM and BPNN-T in the top tier outperformed all the other benchmarks: MAPE varied from 8.18% to 8.64% in the predictions. Shi et al. [
31] proposed a pooling-based deep RNN for household load forecasting, aiming to avoid the over-fitting caused by increasing data diversity and dimensions. Their STLF model was tested on 920 residential smart meter datasets in Ireland. In terms of RMSE, it outperformed ARIMA by 19.5%, SVR by 13.1%, and the classical deep RNN by 6.5%. Peng et al. [
32] presented a useful comparison of important research works on typical methods used in electricity load forecasting, with indicative values of statistical metrics. They developed and applied a hybrid STLF method that combines an improved backtracking search optimization algorithm (IBSA) with a double-reservoir echo state network (DRESN). Mutual information is utilized to eliminate low-significance input features and retain key input features. The DRESN structure aims to increase the diversity of the network. A roulette strategy, an adaptive mutation operator, and a niche operator are introduced to improve the standard BSA algorithm. The IBSA is applied to optimize several critical parameters in the DRESN neural network. The proposed method outperformed eight popular benchmark models, as tested with North America and PJM load datasets. The decades of development of short-term electricity load forecasting techniques have resulted in flexible and easy-to-use computational tools currently employed by the network operators [
33]. However, as reported above, the prediction accuracy needs further improvement regarding the design and implementation of the input training datasets, which have specific peculiarities for each country, depending on its size, climate, diversity of economic activities, and other factors. As seen in the above presentation, the modeling errors reported by several research works in short-term electrical load forecasting vary over a wide range of MAPE and nRMSE. This can be attributed to the wide variation in training input data structures, the different types of predictions obtained, and the different geographical scales met in the different applications. The focus of the present work is on further improvement and standardization of the training dataset of the day-ahead load prediction. This is accomplished by adding the daily heating and cooling degree-days of a representative location, as well as by improving the prediction of the peak load, after carefully studying specific time periods in the Greek system where all models systematically fail. In this process, two standard, popular, and cost-effective FF ANN models are employed for the day-ahead system’s load forecasting. The comparison is carried out using consistent metrics and the same data from the Greek system [
34]. The hourly actual aggregate electricity load, as reported by the Greek Independent Power Transmission Operator (IPTO) [
34] during the five-year period 2017–2021, was employed in the training of the models, along with meteorological data. Testing and validation of the models are carried out for various periods of 2022. The main contributions of the present work are the following: (i) it is shown that the prediction accuracy level currently attained in the Greek system, which is not high, can be easily matched with simple types of FF ANNs and easily available datasets; (ii) a systematic procedure is adopted to find and discuss the most important incidents of prediction failure, explain the reasons for failure, and indicate possible pathways to their remedy; (iii) this procedure leads to suggestions for improvement in the selection and phasing of the training variables, the increase in data monitoring frequency, and the possible inclusion of information on specific economic activity variables in the training and prediction. The remainder of the paper is organized into four sections.
Section 2 presents the overview of input data and the formulation of the prediction methodology to be employed.
Section 3 discusses the selection of input data and the specific types of ANN for the day-ahead forecast. The results of the simulation are presented and discussed in
Section 4. Finally, the conclusions and future work are presented in
Section 5.
2. Materials and Methods
Before attempting to formulate a day-ahead load prediction methodology, it is useful to present and discuss the current state of the day-ahead prediction of Greece’s electricity load curve, as reported to the European Transparency Platform (ENTSO-E), which is responsible for the central collection and publication of Electricity Generation, Transportation, and Consumption Data and Information for the Pan-European Market [
35]. As an example, the actual demand values, on an hourly basis, are compared with the day-ahead predictions for January 2022 in
Figure 1. The average values and variability, on a monthly basis, of the most important statistical metrics for 2022 are shown in the same Figure to quantify the prediction accuracy. MAPE is 2.61% with a standard deviation of 0.33%, and nRMSE is 0.036 with a standard deviation of 0.005. Finally, the mean bias error (MBE) is 73 MW, which indicates an over-prediction. On a qualitative basis, it is interesting to see in this example of January 2022 (
Figure 1) that the most pronounced prediction failures are observed with regard to the morning peaks, and to a much lesser extent, to the late afternoon peak loads. It must be mentioned in this respect that the specific level of prediction accuracy is not attained by a truly day-ahead computation. That is, not all 24 h of the next day are predicted. Instead, the next day’s prediction is corrected at noon, based on the actual load data of the first half of the day, which are known at that time.
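The three metrics quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the normalization of nRMSE by the mean actual load is an assumption (the text does not state the normalization used), the sign convention of MBE (prediction minus actual) is inferred from the remark that a positive value indicates over-prediction, and the sample loads are hypothetical.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def nrmse(actual, predicted):
    """RMSE divided by the mean actual load (normalization is an assumption)."""
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
    return rmse / (sum(actual) / len(actual))

def mbe(actual, predicted):
    """Mean bias error; a positive value means over-prediction on average."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical hourly loads in MW, for illustration only
actual = [4000.0, 5000.0, 6000.0, 5000.0]
predicted = [4100.0, 5100.0, 5900.0, 5100.0]

print(mape(actual, predicted), nrmse(actual, predicted), mbe(actual, predicted))
```

With hourly series for a whole month, the same functions would give one monthly value per metric, which is how the monthly averages and standard deviations above could be assembled.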
The effect of this improved prediction may be seen in the example of a typical weekday (20 January 2023) based on the data reported daily by the Greek system operator (IPTO) [
34] presented in
Figure 2. The modified noon prediction reduces the 24 h MAPE from 3.35% to 2.47% and nRMSE from 0.041 to 0.035. Thus, the forecasts reported in the ENTSO-E database are not true 24-hour-ahead forecasts. This explains the somewhat reduced accuracy in our true 24-hour-ahead forecasts presented in the next section.
Before proceeding to study the accuracy of the prediction of special events and spot-specific cases of prediction failure, it is important to understand the general typologies of the Greek system load curves in the main seasons of the year.
2.1. Typical Load Curves of the Greek System, Seasonal Effects
To understand the behavior of the 24 h total electric load curve, its evolution during a Friday of January is shown in
Figure 2. The total system’s load drops during the night, stabilizing at about 3.9 GW. The minimum values occur before dawn, from 03:00 to 05:00. This base load level accounts for the night consumption, which covers the following main activities:
Urban, road, and highway lighting;
Industrial production continuing into the night shift;
Base load of the residential sector;
Refrigeration loads for the industrial, commercial, and residential sectors.
In the period 05:00–09:00, we observe the morning ramp. During the noon hours, the demand drops below 5.4 GW. From 16:00 to 18:00, the afternoon ramp leads to a demand plateau close to 6.3 GW between 18:00 and 20:00. Next, it is useful to study the patterns of another day, which is a Monday in November (
Figure 3).
The behavior of the demand curve during the night is about the same. However, the base load during the night drops below 3.5 GW. The morning ramp during this weekday is characteristic, with a total demand increase of 1.25 GW between 05:00 and 08:00. The morning ramp continues more gradually to the first load peak of the day (about 10:00). This corresponds to the following activities:
Industrial activity starts for the morning shift.
Commercial and services activity starts.
People prepare and go to work.
Students prepare and go to school.
Space heating starts after the night shut down.
After 09:00, we observe an approximate consumption plateau, which ends in a daytime minimum at about 15:00. Following the noon hours, we observe a gradual increase in consumption, which leads to the second ramp of the day and to the second consumption peak at 18:00–19:00 (because of winter time and the early onset of evening; during summer time, this peak shifts to 21:00–22:00). This second peak reaches 5.5 GW during this mild late autumn day but may reach 8.5 GW or more during winter (
Figure 1) due to the part of space heating supplied by electricity. After 20:00, we observe a gradual reduction in the electric load, which returns to its night levels after 02:00.
The system’s load levels may drop significantly lower during the neutral months of April–May and October, which do not require electricity consumption for space heating in the residential and part of the commercial sector.
An example of this performance is presented in
Figure 4 for the month of April 2022, when the minimum demand during the night drops close to 3 GW and the evening peak drops to less than 6 GW during late April, when ambient temperatures are of the order of 20–24 °C during the day.
Further, it is interesting to observe in
Figure 5 a pronounced load prediction failure of the Greek system’s operator for the period 20–22 April (Wednesday to Friday, hours 2640–2712). An observation of the weather data for typical places in central Greece shows a sudden weather improvement after a rainy and cold weekend. A closer observation of this period in
Figure 5 indicates that the day-ahead prediction routinely over-predicts the demand for the 20th, 21st, and half of the 22nd of April.
Only after the noon correction in the prediction of the afternoon system’s demand does the error vanish, and the prediction accuracy returns to high levels.
Next, the Greek system’s operator’s forecasting is compared with the actual demand for the month of July 2022 (
Figure 6). This is a difficult period for forecasting because of the high electricity demand for air conditioning, which leads the system’s peak demand to exceed 9 GW for certain cases (25–28 July, hours 4920–5016). As regards the prediction accuracy, it is interesting to observe a prediction failure for the weekend 16–17 July (hours 4704–4752), which is shown in detail in
Figure 7.
The system’s operator failed to predict with sufficient accuracy the effect on the system’s demand of the onset of high ambient temperatures during the specific weekend. This will be examined in more detail in view of the respective predictions of our models in
Section 4. Next, we are going to present the modeling approach we developed and tested, aiming at further improving the forecasting accuracy of the Greek system demand.
2.2. Input Data Employed for the 24-Hour-Ahead Forecasting
Our investigations were based on the processing of the measured hourly electrical load during the years from 2017 to 2021, for which the hourly load demand input data for the Greek system were obtained from ENTSO-E [
35]. Testing and validation of the models were carried out for various months in 2022. In addition to the electric load curves, meteorological data, at least on a daily basis, need to be employed for representative climatic conditions of Greece. The central location of Athens in Greek geography, together with the concentration there of about 40% of the population and a significant part of industrial and business activity, allows us to consider Athens weather data as corresponding, more or less, to the average Greek climate as weighted by inhabited space and number of inhabitants. For this purpose, out of the four weather stations of the central Athens area (Gazi, Ambelokipi, Patissia, Psychico) [
36], the suburb of Psychico was selected for weather data [
37]. Psychico may be considered as representing the climatic conditions of Athens, which hosts a major part of the population and economic activity. At the same time, because it belongs to the northern suburbs, is not densely built with high-rise buildings, and has plenty of trees and park space, its climate is not severely affected by the city-center conditions, a fact that allows it to be more representative of Greece. For this reason, input weather data as daily averages and high-low temperature values were obtained from the meteorological station of Psychico [
37].
Apart from meteorological variables such as dry bulb temperature and relative humidity, load presents a high correlation to its past values [
4]. To this end, it is interesting to confirm, for the specific dataset, the generally observed short-term periodicity of the load using the Pearson correlation coefficient [
38]. In
Figure 8, the Pearson correlation coefficients of the current hourly load for the full year 2021, correlated with its previous hourly values up to 216 h before, are graphically presented. As expected, they start from the value of 1 at zero delay, and they are seen to fluctuate with 24 h periodicity. A high correlation coefficient of 0.9344 is clearly observable for a 24 h delay. Another high correlation coefficient, 0.8392, was observed for a 168 h delay. For this reason, the 24 h lagged load and the 168 h (previous week) lagged load are routinely fed as inputs to FF ANNs applied for day-ahead load prediction.
For the day-ahead load forecast, the following input parameters are usually applied in the specialized literature:
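The lag analysis described above can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `pearson` and `lag_correlation` are hypothetical, and the load series is a synthetic pure 24 h cycle rather than the 2021 Greek data, so it reproduces only the periodic pattern and not the reported coefficients.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def lag_correlation(load, lag):
    """Correlate load[t] with load[t - lag] over the overlapping hours."""
    if lag == 0:
        return pearson(load, load)
    return pearson(load[lag:], load[:-lag])

# Synthetic hourly load in MW with a pure 24 h cycle (hypothetical data)
load = [5000.0 + 1000.0 * math.sin(2 * math.pi * t / 24) for t in range(24 * 14)]

for lag in (0, 12, 24, 168):
    print(lag, round(lag_correlation(load, lag), 4))
```

On real data the coefficients at 24 h and 168 h stay below 1 because consecutive days and weeks differ; the synthetic series repeats exactly, so its correlation at those lags is essentially 1.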
Dry bulb temperature;
Dew point temperature;
Hour of day;
Day of the week;
Holiday/weekend indicator (0 or 1);
Previous 24 h average load;
24 h (previous day) lagged load;
168 h (previous week) lagged load.
The ambient dry-bulb (DB) temperature is included in most investigations because temperature affects electricity consumption. The correlation between the total daily electricity demand (GWh) of the Greek system in 2021 and daily average temperature is shown in
Figure 9. A nonlinear relationship between load and temperature is observed. There exists a baseline daily demand of the order of 100–130 GWh, during the days with normal average temperature (DB) in the range 18–24 °C. With the onset of higher average temperatures, the total daily load steeply increases up to 210 GWh.
The same trend is observed when the mean daily temperature deviates to values lower than normal. However, as seen in
Figure 9, the effect of lower mean daily temperature levels on the total daily load (GWh) is significantly less pronounced. This is due to the fact that space cooling is carried out almost exclusively by means of electrically driven heat pumps and air-conditioning equipment, whereas heating is mainly carried out by natural gas, oil, and pellet-fueled boilers, and only to a lesser extent by heat pumps or split units in heating mode. Hourly temperature data (dry bulb and dew point) for locations in high-demand areas of the system are usually considered. Another environmental variable that affects electricity consumption is the ambient air humidity, usually reported in the form of indices such as the relative humidity (RH), dew point (DP) temperature, or wet bulb (WB) temperature. However, the correlation of any one of these indices with the total load is not straightforward. On the other hand, since the effect of ambient temperature and humidity on the electricity consumption is conveyed through the heating or cooling requirements, we considered it good practice to correlate the total daily load in GWh with the heating degree-days instead. These are routinely reported on a daily basis by all weather stations. This correlation is presented in
Figure 10 for a typical weather station in the Athens area during 2021.
As seen in
Figure 10, the correlation of the total daily load with the daily heating degree days during the heating season is significantly better than the correlation with ambient temperature seen in the previous Figure. This hints at a possibly better training ability of the machine learning models to be employed in the predictions.
Moreover, cooling degree days are also reported on a daily basis by all weather stations since they give an estimate of the necessary energy consumption for space cooling during the summer. Again, a correlation of the total daily load in GWh with the cooling degree days of a typical weather station in Athens, shown in
Figure 11, reveals a clear positive correlation whenever the daily value exceeds five cooling degree days.
As already observed for the cooling season, the correlation coefficient of the total daily load with the daily cooling degree days is significantly higher than the correlation with the heating degree days. Based on the above findings, in the current work, we preferred to use the daily heating and daily cooling degree days instead of the DB and the DP temperature. The required weather data are significantly simpler and easier to acquire since they consist of just two daily values instead of the two 24 h vectors required by the usual practice. Moreover, the weather forecast needed for the day-ahead load forecast reduces to a single value (daily heating or cooling degree days) instead of 48 values, i.e., the forecasted hourly values of the DB and DP temperatures. To summarize our model training approach, the following input parameters are selected for training:
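For readers unfamiliar with degree days, the following sketch shows one common way to derive them from a daily mean temperature. It is an assumption-laden illustration: the base temperature of 18 °C and the daily-mean method are placeholders chosen here, since the text does not state how the weather station computes its reported values (stations may integrate sub-daily readings and use a different base).

```python
def degree_days(daily_mean_temp_c, base_temp_c=18.0):
    """Return (heating, cooling) degree days for one day.

    Daily-mean method with an assumed base temperature; both choices are
    illustrative and are not taken from the weather station's procedure.
    """
    hdd = max(0.0, base_temp_c - daily_mean_temp_c)
    cdd = max(0.0, daily_mean_temp_c - base_temp_c)
    return hdd, cdd

# Hypothetical daily mean temperatures in deg C: a winter, a mild, and a summer day
for temp in (8.0, 18.0, 30.0):
    print(temp, degree_days(temp))
```

At most one of the two values is non-zero on any given day under this method, which is consistent with the remark that a single daily value carries the weather information needed for the forecast.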
Heating degree days (daily for a representative meteorological station);
Cooling degree days (daily for a representative meteorological station);
Hour of day;
Day of the week;
Holiday/weekend indicator (0 or 1);
24 h lagged load (hourly resolution);
168 h (previous week) lagged load (hourly resolution).
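The seven inputs listed above could be assembled for one target hour roughly as follows. This is a sketch under assumptions: the function name `build_features`, the dictionary-based load lookup, the Monday = 0 day encoding, and the treatment of weekends and holidays as one flag are illustrative choices, and the load values are synthetic.

```python
from datetime import datetime, timedelta

def build_features(target_time, load_by_hour, hdd, cdd, holidays=frozenset()):
    """Return the seven model inputs for one target hour (hypothetical layout)."""
    is_off_day = 1 if (target_time.weekday() >= 5 or target_time.date() in holidays) else 0
    return [
        hdd,                                               # heating degree days of the day
        cdd,                                               # cooling degree days of the day
        target_time.hour,                                  # hour of day, 0-23
        target_time.weekday(),                             # day of week, Monday = 0
        is_off_day,                                        # holiday/weekend indicator
        load_by_hour[target_time - timedelta(hours=24)],   # 24 h lagged load
        load_by_hour[target_time - timedelta(hours=168)],  # 168 h lagged load
    ]

# Synthetic hourly load in MW starting on Monday, 3 January 2022
start = datetime(2022, 1, 3)
load_by_hour = {start + timedelta(hours=h): 4000.0 + h for h in range(24 * 14)}

row = build_features(datetime(2022, 1, 10, 9), load_by_hour, hdd=9.5, cdd=0.0)
sat_row = build_features(datetime(2022, 1, 15, 9), load_by_hour, hdd=7.0, cdd=0.0)
print(row)
```

Both lagged loads are already measured when a forecast for the next day is issued, so no forecasted load enters the input vector; only the degree-day values come from a weather forecast.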
3. Neural Network Selection
As already discussed in the introduction section, a significant volume of research work has been carried out in the last decade, especially concerning the application of deep learning in the electrical utility industry, including power system fault detection and classification, load and power forecasting, wind speed and irradiance forecasting for wind and PV energy systems, power quality detection, etc. As regards applications in short-term power load forecasting, the more advanced deep learning models are usually applied in specific cities or communities. On the other hand, the complex nature of the problem of electrical load forecasting on a country basis requires special attention to the type of data to be employed for the training. The reason is that the problem of forecasting the total load demand of a country, with 1 h resolution, cannot be reconstructed by adding up a very large number of distribution units covering cities and counties, each addressed with advanced ML models trained on numerous input data of more local character. Based on the above reasoning, our selected approach must be checked for effectiveness, starting from the simplest types of shallow neural networks and comparing them to the prediction accuracy of the state-of-the-art commercial forecasting platforms.
To this end, the following two types of simple, feed-forward ANNs were selected for comparative testing in our investigations.
3.1. FF ANN
Due to the fact that the above-mentioned inputs must be assimilated in a complex way to match the training target points, a feed-forward neural network is very well suited for this type of time series forecasting. Neural networks have a proven ability to fit multi-dimensional mapping problems arbitrarily well, given consistent data and enough neurons in their hidden layers. The first neural network to be applied is the well-known, simple form of a feed-forward artificial neural network (FF ANN) with one hidden layer, or MLP, with sigmoid hidden neurons and linear output neurons.
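The architecture just described, one hidden layer of sigmoid neurons feeding a linear output neuron, can be illustrated with a short NumPy forward pass. This is a sketch only: the paper reports a TensorFlow implementation, whereas NumPy is used here to keep the example self-contained. The sizes (seven inputs, 20 hidden neurons) follow the configuration given in the text, while the random weights and the input batch are placeholders.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation used by the hidden neurons."""
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """One sigmoid hidden layer followed by a linear output neuron."""
    hidden = sigmoid(x @ w_hidden + b_hidden)
    return hidden @ w_out + b_out

rng = np.random.default_rng(0)
n_inputs, n_hidden = 7, 20

# Untrained placeholder weights; a real model would obtain them by training
w_hidden = rng.normal(scale=0.1, size=(n_inputs, n_hidden))
b_hidden = np.zeros(n_hidden)
w_out = rng.normal(scale=0.1, size=(n_hidden, 1))
b_out = np.zeros(1)

batch = rng.normal(size=(5, n_inputs))  # five hypothetical, already scaled samples
prediction = mlp_forward(batch, w_hidden, b_hidden, w_out, b_out)
print(prediction.shape)
```

The linear output neuron leaves the predicted load unbounded, which suits a regression target such as hourly demand; the sigmoid layer supplies the nonlinearity.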
The values of several important parameters of the FF ANN employed are presented in Table 1. The network training function updates weight and bias states (expressed in vector form $x_k$) according to the Levenberg–Marquardt back propagation optimization algorithm [39]:

$$x_{k+1} = x_k - \left[ J^{T} J + \mu_k I \right]^{-1} J^{T} e$$

where $J$ is the Jacobian, $\mu_k$ is the value of mu at step k, and $e$ is the vector of the components of the modeling error (sum of squares) [40]. As $\mu_k$ is increased, the algorithm approaches the behavior of the steepest descent algorithm with small learning rate [39]:

$$x_{k+1} \cong x_k - \frac{1}{\mu_k} J^{T} e$$

As $\mu_k$ decreases to zero, the algorithm becomes Gauss–Newton.
The algorithm begins with the initial value of $\mu$ listed in Table 1. If a step does not yield a smaller value for the modeling error, the step is repeated with $\mu_k$ multiplied by a factor of 10 (Table 1). If a step reduces the modeling error, then $\mu_k$ is divided by 10 for the next step. Thus, we approach Gauss–Newton, to provide faster convergence. The algorithm provides a nice compromise between the speed of Newton’s method and the guaranteed convergence of the steepest descent.
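The damping schedule described above can be illustrated with a minimal, standard-library sketch. It is not the authors' TensorFlow implementation: the two-parameter exponential model, the function names, the starting value `mu=1e-3`, and the safety cap on `mu` are all assumptions chosen for illustration. Only the multiply/divide-by-10 rule comes from the text.

```python
import math


def residuals(params, xs, ys):
    """Error vector e for the hypothetical model y = a * exp(b * x)."""
    a, b = params
    return [a * math.exp(b * x) - y for x, y in zip(xs, ys)]


def jacobian(params, xs):
    """Jacobian of the error vector with respect to (a, b), one row per sample."""
    a, b = params
    return [[math.exp(b * x), a * x * math.exp(b * x)] for x in xs]


def sum_sq(e):
    return sum(v * v for v in e)


def lm_step(params, e, J, mu):
    """Return params - (J^T J + mu I)^-1 J^T e, solved in closed form for 2 parameters."""
    h00 = sum(r[0] * r[0] for r in J) + mu
    h01 = sum(r[0] * r[1] for r in J)
    h11 = sum(r[1] * r[1] for r in J) + mu
    g0 = sum(r[0] * v for r, v in zip(J, e))
    g1 = sum(r[1] * v for r, v in zip(J, e))
    det = h00 * h11 - h01 * h01
    d0 = (h11 * g0 - h01 * g1) / det
    d1 = (h00 * g1 - h01 * g0) / det
    return [params[0] - d0, params[1] - d1]


def levenberg_marquardt(xs, ys, params, mu=1e-3, factor=10.0,
                        max_iter=200, tol=1e-12, mu_max=1e10):
    """Fit (a, b); mu is multiplied by `factor` on failure and divided on success."""
    e = residuals(params, xs, ys)
    err = sum_sq(e)
    for _ in range(max_iter):
        J = jacobian(params, xs)
        while True:
            trial = lm_step(params, e, J, mu)
            trial_e = residuals(trial, xs, ys)
            trial_err = sum_sq(trial_e)
            if trial_err < err:
                break                  # step accepted
            mu *= factor               # step rejected: move toward steepest descent
            if mu > mu_max:            # assumed safety cap, not from the source
                return params, err
        improvement = err - trial_err
        params, e, err = trial, trial_e, trial_err
        mu /= factor                   # step accepted: move toward Gauss-Newton
        if improvement < tol:
            break
    return params, err
```

Calling `levenberg_marquardt(xs, ys, [1.5, 1.0])` on noise-free samples of `2 * exp(1.5 * x)` should recover parameters close to `(2, 1.5)`. For a real network the 2×2 closed-form solve would be replaced by a general linear solver.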
The FF ANN is implemented with the open-source platform TensorFlow [41]. The input layer includes seven input nodes, namely, heating degree days, cooling degree days, hour of day, day of week, holiday/weekend indicator, 24 h lagged load, and 168 h lagged load. The hidden layer comprises 20 neurons, each with a sigmoid activation function [42]. Training involves the fitting of a complex curve through the training data. This is effected by loss minimization algorithms that optimize the corresponding weights and biases. Since this type of complex fitting procedure is employed in the day-ahead prediction, erroneous or noisy information must be excluded from the dataset. This explains why a quality assurance procedure must be included in the preprocessing of the reported electricity demand data to correct any reporting errors or missing values [43,44,45]. Validation is performed with one dataset, which corresponds to the actual demand of the year 2022. In the specific runs, we did not check for overfitting, which would require further validation datasets and monitoring for a decrease in the modeling error accompanied by an increase in the validation error [46].
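To make the 7-20-1 topology concrete, the following standard-library sketch evaluates one forward pass through a network with seven inputs, 20 sigmoid hidden neurons, and one linear output. It is an illustration only: the weights are random rather than trained, the names `init_network` and `forward` are hypothetical, and the inputs are assumed to have been scaled to a comparable range beforehand.

```python
import math
import random

N_INPUTS = 7    # HDD, CDD, hour of day, day of week, holiday flag, 24 h lag, 168 h lag
N_HIDDEN = 20   # sigmoid hidden neurons


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_network(seed=0):
    """Random (untrained) weights; the initialisation range is an assumption."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(N_INPUTS)]
                for _ in range(N_HIDDEN)]
    b_hidden = [0.0] * N_HIDDEN
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)]
    b_out = 0.0
    return w_hidden, b_hidden, w_out, b_out


def forward(x, network):
    """Sigmoid hidden layer followed by a single linear output neuron."""
    w_hidden, b_hidden, w_out, b_out = network
    if len(x) != N_INPUTS:
        raise ValueError(f"expected {N_INPUTS} inputs, got {len(x)}")
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Hypothetical, already-scaled input vector for one forecast hour.
sample = [0.2, 0.0, 0.5, 0.3, 0.0, 0.6, 0.55]
prediction = forward(sample, init_network(seed=1))
```

Because the output neuron is linear, the prediction is not confined to the (0, 1) range of the hidden activations, which is what allows the network to reproduce load values on an arbitrary scale.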
3.2. Feed-Forward Back Propagation Neural Network
A more complex type of neural network examined is a feed-forward back-propagation neural network (BPNN) with two fully connected hidden layers [
47]. This type of network has already been successfully applied to forecasting problems of complex systems with highly nonlinear behavior affected by several parameters [
14,
48,
49]. For a single-layer network, the error is an explicit function of the network weights, and its derivatives with respect to the weights can be easily computed. In multilayer networks with nonlinear transfer functions, the relationship between the network weights and the error is more complex [
39]. The goal of BPNN is to update each of the network weights so that the network output approximates the desired target. The error between the neural network output and the desired target for a specific time step can be written in the form of a cost (or loss) function.
The objective is to minimize this cost function by updating the weights and biases during training, which, for this type of network, is an iterative process. It starts with forward propagation of the training dataset, which computes an initial output and compares it to the reference values (targets) to calculate the loss function. The loss is then back-propagated through a series of partial derivatives with respect to each of the network's internal parameters (weights and biases). The values of these parameters are updated accordingly, and a new iteration starts. The iterative process ends when the convergence criteria are fulfilled or the maximum number of epochs is reached. At regular intervals during the training iterations, a separate test dataset is employed to validate the accuracy of the BPNN. This procedure is similar to the one employed to train the feed-forward ANN described in Section 3.1.
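The iterative procedure above can be sketched in plain Python for a small network with two hidden layers. Several choices here are assumptions made for illustration and are not taken from the source: the squared-error cost `0.5 * (y - t)**2`, sigmoid hidden activations, plain per-sample gradient descent in place of the optimizer of Table 2, the layer sizes, and all function names.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layers(sizes, seed=0):
    """One (weights, biases) pair per layer; weights[j][i] links input i to neuron j."""
    rng = random.Random(seed)
    return [
        ([[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)],
         [0.0] * n_out)
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def forward(layers, x):
    """Return the activations of every layer (sigmoid hidden layers, linear output)."""
    activations = [list(x)]
    for idx, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activations[-1])) + b
             for row, b in zip(weights, biases)]
        is_output = idx == len(layers) - 1
        activations.append(z if is_output else [sigmoid(v) for v in z])
    return activations


def loss(layers, data):
    """Mean of the assumed cost 0.5 * (y - t)^2 over a dataset."""
    return sum(0.5 * (forward(layers, x)[-1][0] - t) ** 2 for x, t in data) / len(data)


def backprop_step(layers, x, t, lr):
    """One forward pass, one backward pass, and one gradient-descent update."""
    acts = forward(layers, x)
    delta = [acts[-1][0] - t]  # dE/dz at the linear output neuron
    for idx in range(len(layers) - 1, -1, -1):
        weights, biases = layers[idx]
        a_prev = acts[idx]
        if idx > 0:
            # Chain rule through the sigmoid, sigma'(z) = a * (1 - a),
            # evaluated before this layer's weights are modified.
            prev_delta = [
                sum(weights[j][i] * delta[j] for j in range(len(weights)))
                * a_prev[i] * (1.0 - a_prev[i])
                for i in range(len(a_prev))
            ]
        for j, d in enumerate(delta):
            for i, a in enumerate(a_prev):
                weights[j][i] -= lr * d * a
            biases[j] -= lr * d
        if idx > 0:
            delta = prev_delta


def train(layers, train_data, val_data, lr=0.1, max_epochs=300,
          tol=1e-6, check_every=10):
    """Stop when the training loss stops changing or max_epochs is reached."""
    history = []  # (epoch, training loss, validation loss)
    previous = loss(layers, train_data)
    for epoch in range(1, max_epochs + 1):
        for x, t in train_data:
            backprop_step(layers, x, t, lr)
        current = loss(layers, train_data)
        if epoch % check_every == 0:
            history.append((epoch, current, loss(layers, val_data)))
        if abs(previous - current) < tol:
            break
        previous = current
    return history
```

A call such as `train(init_layers([2, 4, 3, 1]), train_data, val_data)` returns the periodic training and validation losses, which is where a growing gap between the two would become visible.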
The Adam optimization algorithm is employed for training the specific ANN, as mentioned in Table 2. Adam is an extension of the classical stochastic gradient descent procedure, which computes individual adaptive learning rates for different parameters from estimates of the first and second moments of the gradient [50]. Training methods with a derivative-based optimization algorithm, such as Levenberg–Marquardt or Adam, may be trapped in local minima; hence, training runs should be repeated to ensure they lead to an appropriate ANN.
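The per-parameter adaptive step of Adam can be sketched in a few lines. The hyperparameter defaults below are those commonly quoted for the algorithm, not the values of Table 2, and the function name `adam_minimize` and the quadratic test function are hypothetical.

```python
import math


def adam_minimize(grad, params, lr=0.001, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=1000):
    """Minimise a function given its gradient, using the Adam update rule."""
    params = list(params)
    m = [0.0] * len(params)  # first-moment (mean) estimate of the gradient
    v = [0.0] * len(params)  # second-moment (uncentred variance) estimate
    for t in range(1, steps + 1):
        g = grad(params)
        for i, gi in enumerate(g):
            m[i] = beta1 * m[i] + (1.0 - beta1) * gi
            v[i] = beta2 * v[i] + (1.0 - beta2) * gi * gi
            m_hat = m[i] / (1.0 - beta1 ** t)   # bias correction
            v_hat = v[i] / (1.0 - beta2 ** t)
            # Each parameter gets its own effective learning rate.
            params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params


def quadratic_grad(p):
    """Gradient of the hypothetical test function (p0 - 3)^2 + 10 * (p1 + 1)^2."""
    return [2.0 * (p[0] - 3.0), 20.0 * (p[1] + 1.0)]


solution = adam_minimize(quadratic_grad, [0.0, 0.0], lr=0.05, steps=2000)
```

Since the two coordinates of the test function have very different curvatures, the example shows why dividing by the square root of the second-moment estimate helps: both parameters advance at a comparable pace despite the tenfold difference in gradient magnitude.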
A comparative analysis of the results of the two types of networks against the actual demand for various periods of 2022 takes place in the next section and is based on the MAPE and nRMSE error metrics. These performance metrics are routinely applied in load forecasting problems. Both are computed from the actual and forecasted loads of each hour i, where i = 1, 2, …, N is the sequential number of hours in the time period examined.
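As an illustration of how the two metrics are typically evaluated, the sketch below computes both from hourly actual and forecasted loads. The MAPE follows its usual definition. For the nRMSE, normalization by the mean actual load is an assumption made here, since other conventions (range or maximum) are also in use. The function names are hypothetical.

```python
import math


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    n = len(actual)
    return 100.0 / n * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))


def nrmse(actual, forecast):
    """RMSE divided by the mean actual load (assumed normalization), in percent."""
    n = len(actual)
    rmse = math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)
    return 100.0 * rmse / (sum(actual) / n)


# Hypothetical two-hour example, loads in MW.
actual_load = [100.0, 200.0]
forecast_load = [110.0, 180.0]
print(mape(actual_load, forecast_load), nrmse(actual_load, forecast_load))
```

MAPE weights every hour by its own load, so errors in low-demand hours count proportionally more, whereas the nRMSE penalizes large absolute deviations more heavily because the errors are squared before averaging.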