This paper proposes a method of obtaining accurate short-term load forecasts by using artificial neural networks in recurrent mode with weather-related variables used as exogenous inputs. The model produces a 24-hour forecast as the output of a fully-connected ANN with one hidden layer and sigmoid functions as activation functions in the hidden nodes. The output nodes use a linear activation function. Inputs are historical values of load. The input vector also includes day, month, day of the week and whether or not the day is a working day. Weather information is also used as exogenous input. The architecture of the network is such that the neural network is trained in open-loop using actual load data in order to calculate the node weights, and then, the network is used in closed-loop to generate the forecast applying the calculated forecast load as input. The results show an improvement of the forecast when compared with ANN models without exogenous inputs. The resulting network is a recursive network, and therefore, by using its own calculated output, it does not need to be retrained in order to continue providing forecasts.

This research lies in the intersection of the fields of data forecasting, electrical load forecasting, artificial neural networks and time-series forecasting. We will focus the literature review on the topic of short-term load forecasting using artificial neural networks.

#### 1.2. Multi-Layer Feedforward Neural Network Approach

Abdel et al. in [

10] developed a univariate model for medium-term load forecast utilizing abductive and neural networks. The authors compared the performance (MAPE-based) between the abductive networks model and the neural network model. The MAPE of the neural network model was about 4%. The ANN used by the authors in this research was a multi-layer feedforward neural network (MLFFN). Since their model was univariate, only historical values of load data were used. The authors utilized six years of load data, normalized in order to remove the trend, and used a single-next month model for their iterative forecasting. The authors compare two different approaches: a single 12-month stream of data and 12 separate one-month streams of data. The single stream method proved more accurate, 3.2% vs. 3.8%. The authors claim better MAPE performance with the ANN than ARIMA and other forecasting methods.

The researchers Alfuhaid et al. [

11] utilize a cascaded artificial neural network (CANN) to produce a forecast that includes peak, minimum and daily energy as additional input data for the final forecast stage. The authors utilize two neural networks: one small network to predict the peak, maximum and minimum loads; another network that uses the output of the first network, to predict half-hourly loads for the next day. The first network cascades onto the second. Researchers used data from the network in Kuwait. The average (MAPE) forecasting error by applying CANN is 2.707% as opposed to 3.367% when using conventional ANN (FitNet).

The work by Bennett [

12] describes the comparison between an ARIMAXmodel and an MLFFN ANN model to forecast low voltage next day total energy use and next day peak demand. The authors propose a hybrid model incorporating a double exponential smoothing algorithm, autoregressive terms, relative humidity and day of the week dummy variables to increase accuracy. The ARIMAX model included linear and quadratic terms for the temperature and relative humidity as a multiplying term for both. The double exponential regression term was introduced in order to account for the general trend. The ARIMAX model yields an MAPE of an average of 7% and the ANN model an MAPE of 6%. Data were taken from a transformer serving 128 residential customers. The authors found that daily average temperature explained half of the observed variance in the model. The ANN used was a feedforward classical FitNet network. The hybrid ARIMAX-ANN model showed improvement in forecast accuracy and fit.

The work done by Bilgic [

13] shows an application of an ANN network for forecasting hourly load 24-hour ahead. The authors used previous research by others for comparison purposes. The research used a multilayer ANN with five years of scaled data. The best results are with one, three and five hidden neurons. They tested the algorithm for each of the nine geographical regions, with MAPE results in the order of 3%. Then, they tested the model on the aggregate of all regions with MAPE around 1.85% for the aggregate. Compared with previous results for the same dataset by other authors, the ANN model showed improvement in accuracy. Kandil [

14] uses an MLFFN network trained with a backpropagation algorithm to achieve 0.981% error in forecasting 24-hour loads for Hydro-Quebec.

In [

15], the authors study the energy consumption of Mauritius with different training schemes. The first scheme, used for comparison purposes, uses a supervised training MLFFN ANN. The second scheme uses a three-step hybrid process in which data clustering, by means of a Kohonen self-organizing map (SOM) clustering technique, which is used to group load by type, identifying the different day use patterns. The second step updates the information manually in order to incorporate the results of the first phase into an input model for the third step. The third step is to run an MLFFN ANN with a backpropagation learning algorithm using supervised learning. MAPE results for the first model run in the order of 4.l5%, whereas using the hybrid model MAPE is in the order of 3.9%.

The work presented by the author by Charytoniuk [

16] focuses on predicting a very short-term (60–90 min) forecast based on the incremental changes of load during the day before. The author claims that incremental changes of load during a day are more predictable than actual loads, and therefore, his model focuses on incremental load forecasting. The research mentions that this method is more robust because it is less dependent on the actual weather conditions, especially when the forecast weather is different from the training weather values. The author uses an MLFFN ANN with only one day’s data. The author also breaks down the forecast by day type, generating one forecast for each type and using up to 40 neural networks per day type.

Ekonomou et al. [

17] present a comparison of several feedforward ANNs using combinations of two learning algorithms (gradient descent and Levenberg–Marquardt), three transfer functions (hyperbolic tangent sigmoid, logarithmic sigmoid and hard-limit) and several different structures (1–3 hidden layers with 2–30 neurons in each hidden layer). The methods were applied to actual loads in Greece for a period of one year with similar results in different combinations. Furthermore, Ekonomou [

18] presents a hybrid method in which a wavelet denoising algorithm is used to clean up the input data to a feedforward ANN.

The work by Gooi [

19] reveals an ANN network using peak and valley load with a single hidden layer. Inputs include load and predicted weather data. The output predicts either a peak or a valley load depending on whether the model is trained to forecast the peak or the valley load. Once the peak and valley load are predicted, the hourly forecast can be obtained from a linear combination of both. Five years of historical data were used. Statistical methods are used to classify the inputs according to patterns. Ad hoc groups are created based on their similarity, with some groups, like weekend or holiday, added and some days grouped into similar day types. The model parameters are varied manually in order to improve the forecast. The parameters are the number of input nodes, the number of hidden nodes, momentum rates and learning rates. Once the model is trained, it is put online in order to use the latest one-hour load data. MAPE errors obtained are in the range of 1.43%–4.14% for days of the week and 5.8% for special days.

Tee [

20] proposes a model using a model using an MLFFN with 51 inputs, 16 hidden neurons and one output layer. The inputs include 24 dummy variables for time of day, past hour load for 24 h, one dummy variable for weekend or weekday, temperature at the hour before and month of the year. The author supports the choice of just temperature as a sole weather variable, citing [

3,

14] that wind speed, humidity and cloud cover have little influence on the performance of the ANN. The training method is a Levenberg–Marquardt back propagation algorithm. The MAPE achieved by this study is 0.439% with a maximum MAPE of 7.986% for the month of December.

The work by Harun [

21] shows a feedforward ANN, which has two data preprocessing schemes for comparison: one group does not use differencing of the data, and the other uses first order differencing in order to achieve stationarity. The authors run different simulations using lags of 24, 48 and 72 h, with the latter giving the best results. Furthermore, the stationary model gave better results for most forecast results.

Hernandez [

22] uses an MLFFN in order to forecast load for a region in Spain. The author’s approach is a three-layer MLFFN (input, hidden, output) with input variables day of week, month and total day load. The model uses 16 neurons in the hidden layer. A heuristic method is used to select the number of hidden neurons. Training was done via a Bayesian regulation backpropagation training function. The mean error MAPE was 2.4037%. Special days had errors in the order of 4%. In a different research work, Hernandez [

23] tests two different models of MLFFN in order to forecast total load for a city. The first model is a traditional MLFFN, and the second one is a two-stage modified MLFFN in which a first MLFFN model is run to predict peaks and valleys in load for the next day, and the output of that model is used as input for the regular MLFFN model. The variables considered were precipitation, air temperature, average wind speed, average wind direction, relative humidity, atmospheric pressure and solar radiation. The MAPE for the dual-stage MLFFN model was 1.62% and 2.47% for the MLFFN.

The work by Kalaitzakis [

24] is a comparison of the performance of nine different methods, ordered from the highest error to the lowest error: the autoregressive (stochastic AR) method; an MLFFN trained with backpropagation; an adaptive learning rate with back propagation; an MLFFN with Gaussian encoding; a random activation weight neural network; a weight matrix random activation weight neural network; a zero-order regularization radial basis function neural network; an MLFFN using a real-time recurrent learning algorithm; and an autoregressive recurrent neural network (ARNN). The authors found that the best performance was obtained by an ARNN, with a relative error of 1.22%.

Khotanzad [

25] presents an approach with two different generations of the same MLFFN ANN. The first generation broke down the model into three separate ANNs: hourly (yesterday’s and two days ago data at this hour), daily (yesterday’s data for 24 h) and weekly (24-hour data for the last week on the same day). The second generation broke down the model into different hours of the day (early morning, mid-morning, afternoon peak and late night). Both models utilize an adaptive combiner in order to provide a forecast based on the output of the individual models. The team achieved 2.19% MAPE for the second generation model, compared with 2.52% MAPE for the first generation for a day-ahead forecast.

Kiartzis [

26,

27] presents an MLFFN structure capable of achieving 2.52% MAPE by using a three-layer MLFFN and backpropagation training. The model uses 64 inputs: loads for one and two days ago (24 h), maximum temperature for one and two days ago, square of the temperature deviation for both days, maximum temperature difference for the past two days, day of the year as a sine wave and a cosine wave and day of the week. No comparison data are provided.

Matsumoto [

28] presents a short-term load forecasting technique for summer loads, using a two-part predictor. The first part is an MLFFN that uses data from the same year only; the second part is another MLFFN that uses data from consecutive years. The forecast produced by the first part is adjusted by the trend found from the second module. This way, the model can accommodate variations in the trend from one year to another. The first ANN is used to classify inputs using the norm as the classifier, then grouping the data in order to select the forecast load to input the second module. The authors found that they can generate forecasts with an MAPE of 2.52%.

Moharari [

29] shows the implementation of an MLFFN ANN for forecasting short-term loads considering special days. The input to the MLFFN network includes day variables, such as weekends and holidays, as well as weather variables, such as minimum and maximum forecast temperatures, and historical loads for the past 15 days, for a total of 23 inputs. The implementation results in a forecast with an MAPE of 1.43%.

Raza [

8] presents a model using an MLFFN trained with a gradient descent algorithm. The inputs to the network include day of week, working day, hour of day, dew point, dry bulb temperature and loads for current day, day before and week before. Twenty neurons were used in the hidden layer. The forecast accuracy achieved was separated by season, and it varied from 3.81% in the spring to 4.59% during the summer. The analysis included a statistical insight into the resulting MAPE error with confidence level intervals for the error.

Papalexopoulos [

30] shows a model with an MLFFN trained with 77 inputs. The architecture is a feedforward network (FitNet) that uses seasonal input variables in the shape of sine and cosine functions of a period equal to one year in order to accommodate seasonal variations; temperature for today and tomorrow (forecast), indirect temperature variables (average maximum temperature forecast, average minimum temperature forecast) for tomorrow and last week; temperature trend variables, such as the difference in temperature between today and average maximum, etc.; cooling and heating degrees by day; and historical load variables (averages, trends, peaks). The model achieved an MAPE of 1.783%.

Reinschmidt [

31] describes a model with two modules: the first module is a feedforward neural network that produces an ARMA model of the load data; the second module, a recurrent ANN, uses the output from the first module, in addition to other weather variables in order to train the network to produce a load forecast. One subnet is developed for each of the 24 h of the day, and then, their results are combined to produce the complete forecast. The author claims that traditional models utilizing only ANN networks do not have the capability of adapting to sudden changes in weather conditions. The authors do not discuss the accuracy of their forecast.

Santos et al. [

32] discuss a univariate ANN model that uses minimal load data, only current day, past week and past two weeks’ load in order to make a forecast using a feedforward ANN. The activation function is a hyperbolic tangent function. One hidden layer was used. In order to incorporate the effects of temperature, one additional load variable is added to the input, which is the average expected load for the next day, which is calculated offline by another method. The MAPE obtained by this arrangement is 1.71%. Furthermore, Santos [

33] includes reactive power in the input of an ANN model in order to reduce the forecast error. The reason the authors included this variable in their study is that the test data were taken in an area with a mild climate, and therefore, the correlation between load and weather variables was very low. The approach mentioned in the paper includes pre-processing the data in order to influence the composition of the input vector in such a way as to reduce the margin of discretion in its definition. The results yield MAPE values in the range of 3.27%–4.96%.

Shimakura [

34] describes a two-step method for generating a forecast. The idea is based on the fact that there can be a trend factor in the load from one year to another in the same comparable season. In order to remove the problems associated with ANN learning time-series data that contain trends, the author proposes a system in which the trend is first calculated and removed from the data by means of a data compensation process, and then, the data are processed in the regular manner by an MLFFN. The trend extracted by the preprocessing algorithm is then added to the forecast of the MLFFN to generate the final forecast values. The author also describes a technique in which some of the weights are not allowed to change (restricted change) in order to minimize overfitting. The MAPE obtained by this model is in the range of 2.1%–2.7%.

Zhang [

35] proposes an MLFFN model that uses three layers on the ANN (input, hidden, output). The input includes information on weather, load and whether the day is a working day, weekend or holiday. Even though the network is not recurrent by design, past values of the load are used in the input vector. The network is trained with a backpropagation algorithm. The MAPE obtained by this method is in the range of 1.87%–3.051%.

Sinha [

36] presents research on an MLFFN that is broken into six subnetworks in order to improve training time. Each subnetwork processes 4 h of the day. The ANN is trained using a backpropagation algorithm. The error reported was in the order of 3%.

#### 1.5. Other Computational Intelligence Methods

The research by Fattaheian [

43] uses support vector regression (SVR) and the radial basis function (RBF) on an ANN network trained with a backpropagation algorithm. First, the authors applied an SVR as a regression mechanism. Then, four kernels are tried on the results: linear, polynomial, sigmoid and RBF, selecting the best of the four in terms of the MAPE of the fit. Then, an optimization problem is solved with an objective function, which is the same as the objective of the SVR model. A combinatorial model is run in order to obtain the model with the lowest MAPE. That model is the one used for forecasting. A correlation analysis is used on the input variables in order to determine the best subset of inputs.

Hayati [

44] compared the performance of three different ANN structures: (a) MLFFN with one hidden layer and various configurations for number of hidden neurons; (b) Elman recurrent neural network (ERNN), which is a modified MLFFN with feedback from the hidden layer output to the input layer, also with a varying number of hidden neurons; and (c) RBFN, where the hidden layer clusters the inputs based on a radial basis function, and the output layer performs a linear transformation of the hidden layer to generate the output. The author found that the RBFN network yields the smallest error. The number of centers (neurons in the input layer) was selected manually by the user. The MAPE results were 0.17% for RBFN, 0.38% for MLFFN and 0.76% for ERNN.

Hernandez [

45] tested a three-stage model with a Kohonen SOM as a previous stage, then a K-means clustering algorithm and, finally, an MLFFN for post-processing. The groupings found by the SOM were: (a) working days and holidays; (b) seasonal months; (c) weekends and weekdays. The clusters used by the K-means were: (i) working days of January, February, March, April, November, December and 15–31 October; (ii) Saturdays, Sundays and holidays; and (iii) working days of May, June, July, August, September and 1–14 October. The MAPE found by this arrangement was in the order of 2.76%. The work by Hsu [

46,

47] is very similar: using an SOM to group the demand patterns and predict the peaks and valleys for the next 24 h; and the second phase uses an MLFFN to produce the 24-h load as the output by means of a linear transformation. The MAPE for the combined system is in the range of 1.14%–1.25%. Similarly, Marin [

48] describes a model that uses an SOM for pre-classification of the input into similar load profiles, then uses a recurrent ANN to generate forecast load; then, the model is put online in a recall phase with a simplified model, and retraining of the forecasting model is done once a year for adjusting the weights and biases. Their Kohonen SOM uses 15 classes (patterns) of load, e.g., “Saturdays in August”, which demonstrate similarities. The authors create one Elman ANN (ERNN) for each class, and the results are presented for each network. The average MAPE ranges from 1.03% for Class 14 (Tuesday–Friday of the third and fourth weeks of June and July) to 1.71% for Class 8 (Mondays from October–March).

Another method used for comparison is fuzzy logic. In [

49], Badri compares a feedforward neural network (MLFFN) with a fuzzy logic (FL) algorithm for 24-hour ahead forecast. The FL algorithm starts from a linear regression on the available data. The fitted data provide a predicted peak load and maximum daily temperature. The authors did not provide information on the length of time for the data used for the MLFFN model. Their results were an accuracy of 0.5% MAPE for the MLFFN and 4.91% for the FL model.

Khosravi [

50] utilizes an interval Type-2 fuzzy logic system (FLS). The idea is to utilize three dimensional fuzzy membership functions in order to accommodate uncertainty in the data. By using this FLS system, the researchers accomplished an improvement over the interval Type-1 FLS. The RMSE found was in the order of 0.170.

Kim [

51] demonstrates an implementation of an MLFFN network, which produces a provisional forecast, which is later adjusted for weather and holiday behavior by a fuzzy logic algorithm to produce a final forecast. The MAPE achieved in this model is 1.3%. The MLFFN utilizes load data only. The training algorithm is backpropagation.

Mahmoud [

52] presents a post-processing module that takes the output from the traditional forecasting module and, using some of the inputs to the models, adjusts their forecast output by means of a fuzzy logic algorithm. The idea is to make a forecasting system more robust to operational scenarios not reflected in the training data. The goal is to increase forecasting intelligence in order to optimize parameter selection and cover any missing knowledge in the model. The fuzzy algorithm optimizes the error by means of converging to an acceptable range. The system works similarly to a PI controller in which two gains need to be tuned in order to achieve the optimum error range. The fuzzy logic mechanism finds the optimal

${K}_{p}$ and

${K}_{i}$. The technique is applied on SVM, MLFFN and FBTFSwith improvement in the forecast accuracy of 4.9%, 4.2% and 4.6%, respectively.

Santos [

53] and Rafael [

54] developed a model that uses a neuro-fuzzy approach utilizing Gaussian membership functions to group signals into specific days based on fuzzy logic. The fuzzy logic parameters were adjusted by using backpropagation. The number of parameters for the model were determined using trial and error. The result is a 20-member function model that yields a forecast accuracy of 3.64%.

Srinivasan [

55] proposes a genetic algorithm (GA) to deal with some shortcomings of the ANNs: dependence on initial parameters, long training time, lack of a problem-independent way to choose appropriate network topology and the incomprehensive (black box) nature of ANNs. The GA is used to evolve an MLFFN and connecting weights in order to improve its forecasting accuracy. One network is developed for each day of the week, and Mondays are lumped together with days after holidays. The achieved MAPE of the evolved ANN ranges between 0.8% and 1.01%, compared with the MAPE of the comparable statistical model used by the same utility of 1.28%–1.89%, respectively.

Subbaraj [

56] presents an approach using two modules: evolutionary programming (EP) and particle swarm optimization (PSO). The modules do linear combinations between the results of different MLFFN networks: instead of selecting the best fit ANN, it selects an optimal combination of ANNs to produce the forecast. The input consists of current and time-delayed values for load, temperature, relative humidity and forecast values of temperature and relative humidity for the forecast period. The EP algorithm searches for and finds the optimal solution by evolving a population of candidate solutions over a number of iterations. The PSO algorithm is a stochastic global optimization technique in which all of the solutions tend to follow the optimal one. The technique results are as follows: the unprocessed best ANN result by an MLFFN gives an MAPE of 2.95%. By using EP, the solution improves to an MAPE of 2.24%; by using PSO, the solution improves to an MAPE of 2.27%. Therefore, EP gives a better forecast error.

Sun [

57] describes a method that uses two different models for different days: a fuzzy logic (FL) support vector (SVM) method to forecast low load days, such as weekends and Mondays, and a linear extrapolation method for the rest of the weekdays. The linear extrapolation method is adapted to include weather variables. The method reduces the MAPE from 2.32% down to 1.63%.

Yang [

58] proposes a forecasting scheme where the input is partitioned by an FL algorithm into different groups (fuzzy sets), and an ARMAX forecast is done on each one of the groups. The input variables are pre-filtered based on their correlation with the system load. The ANN used for comparison, an MLFFN, used six inputs, 10 hidden layer neurons and one output neuron. The result is a model with an MAPE of 1.98% compared with 2.31% for MLFFN and 2.22% for ARMAX run in SAS.

Carpinteiro [

59] presents a model with two SOM networks, one on top of the other. Seven inputs are used in the representation, five loads and a sine/cosine function for the hour on a 24-hour period. The comparing ANN is an MLFFN with univariate input of hourly loads (2160 instances). The SOM networks utilize feedback in between them. The results of the SOM networks show an MAPE of 2.33% vs. 2.64% for the MLFFN for a Friday and 2.03% vs. 5.92% for a Sunday.

Crone [

60] proposes an empirical comparison between MLFFN and SVR models using the radial basis function (RBF) and linear kernel functions, by analyzing their forecasting power on five time series. Their results show that RBF SVR models have problems in extrapolating trends, while MLFFN and linear SVR models without data preprocessing provide robust accuracy across all patterns and clearly outperform the commonly-used RBF SVR on trended time series. The MAPE results are: 1.391% for MLFFN, 1.632% for linear SVR and 1.590% for RBF SVR.

ANNs are used in combination with other techniques in order to address specific deficiencies. Ghayekhloo [

9] uses a hybrid time series and regression analysis to select the best sets on inputs and, then, a genetic algorithm to process the weights. Satish [

1] uses a multi-ANN method to address the day-of-week problem, which affects most statistical methods. Buitrago [

61] and also Abdulaal et al. [

62] propose a framework combining parameter estimation, clustering and ANNs to group load patterns. Lopez [

63] uses self-organizing maps (SOM) for addressing the groupings of load patterns. Liang [

64] uses differential dynamic programming together with ANNs trained with supervised and unsupervised learning algorithms. Zhang [

6] uses wavelet decomposition and an adaptive ANN model for pricing purposes. The motivation of this research is that no attempt has been found in the references to model short-term loads using non-linear autoregressive ANNs. A different implementation using a feedforward neural network can be found in [

22,

23].