# Optimal Deep Learning LSTM Model for Electric Load Forecasting using Feature Selection and Genetic Algorithm: Comparison with Machine Learning Approaches

## Abstract


## 1. Introduction

## 2. Literature Review

- (i) Feature importance via wrapper and hybrid methods, together with GA-based selection of the optimal lags and number of layers for the LSTM model, enabled us to prevent overfitting and resulted in more accurate and stable forecasting.
- (ii) We train a robust LSTM-RNN model to forecast aggregate electric load over short- and medium-term horizons, using a large dataset covering a complete metropolitan region over a period of nine years at 30 min resolution.
- (iii) We compare the LSTM-RNN model against the best-performing machine learning benchmark, chosen from several linear and non-linear models optimized with hyperparameter tuning.

## 3. Background

#### 3.1. From RNNs to LSTM-RNNs

A recurrent neural network processes an input sequence {x_1, x_2, ..., x_n} using the recurrence:

$$h_t = f(h_{t-1}, x_t) \tag{1}$$

where $x_t$ is the input at time $t$ and $h_t$ is the hidden state. Gates are introduced into the recurrence function $f$ in order to solve the gradient vanishing or explosion problem. The states of LSTM cells are computed as follows:

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \tag{2}$$

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \tag{3}$$

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \tag{4}$$

$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \tag{5}$$

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \tag{6}$$

$$h_t = o_t * \tanh(C_t) \tag{7}$$

where $i_t$, $f_t$ and $o_t$ are the input, forget and output gates respectively, the $W$'s and $b$'s are parameters of the LSTM unit, $C_t$ is the current cell state and $\tilde{C}_t$ holds the new candidate values for the cell state. The sigmoid functions for the $i_t$, $f_t$ and $o_t$ gates modulate their outputs between 0 and 1, as given in Equations (2)–(4). The decisions of these three gates depend on the current input $x_t$ and the previous output $h_{t-1}$; if a gate is 0, the signal it controls is blocked. The forget gate $f_t$ defines how much of the previous cell state $C_{t-1}$ is allowed to pass. The input gate $i_t$ decides which new information from the input to update or add to the cell state. The output gate $o_t$ decides which information to output based on the cell state. These gates work in tandem to learn and store both long- and short-term sequence information.

Updating the old cell state $C_{t-1}$ into the new cell state $C_t$ is performed using Equation (6). The new candidate values $\tilde{C}_t$ of the memory cell and the output $h_t$ of the current LSTM block are computed with the hyperbolic tangent function, as in Equations (5) and (7). Both the cell state and the hidden state are transferred to the next cell at every time step, and this process repeats. The weights and biases are learnt by the model by minimizing the differences between the LSTM outputs and the actual training samples.
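As a minimal sketch of one LSTM time step, assuming randomly initialized parameters and toy dimensions (not the paper's trained model; the variable names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step following Equations (2)-(7): each weight matrix maps the
    concatenated [h_prev, x_t] to one gate's pre-activation."""
    z = np.concatenate([h_prev, x_t])
    i_t = sigmoid(W["i"] @ z + b["i"])      # input gate, Eq. (2)
    f_t = sigmoid(W["f"] @ z + b["f"])      # forget gate, Eq. (3)
    o_t = sigmoid(W["o"] @ z + b["o"])      # output gate, Eq. (4)
    c_tilde = np.tanh(W["c"] @ z + b["c"])  # candidate values, Eq. (5)
    c_t = f_t * c_prev + i_t * c_tilde      # cell state update, Eq. (6)
    h_t = o_t * np.tanh(c_t)                # block output, Eq. (7)
    return h_t, c_t

# Toy dimensions: 3 input features, 4 hidden units.
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = {k: rng.standard_normal((n_hid, n_hid + n_in)) for k in "ifoc"}
b = {k: np.zeros(n_hid) for k in "ifoc"}
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.standard_normal((5, n_in)):    # unroll over 5 time steps
    h, c = lstm_step(x, h, c, W, b)
```

Because the output gate and the tanh both squash their inputs, every component of $h_t$ stays strictly inside (−1, 1), which is what keeps the recurrence numerically stable across long sequences.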

#### 3.2. Alternative Modeling Approaches

#### 3.3. Performance Metrics for Evaluation

The following metrics are used to evaluate forecasting performance:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

$$\mathrm{CV(RMSE)} = \frac{\mathrm{RMSE}}{\overline{y}} \times 100$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

where $y_i$ is the actual, $\hat{y}_i$ is the predicted, and $\overline{y}$ is the average energy consumption.
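These three metrics can be computed with a short helper (the function name and the toy inputs are illustrative, not from the paper's code):

```python
import numpy as np

def forecast_metrics(y_true, y_pred):
    """Return RMSE, CV(RMSE) in percent, and MAE for a forecast."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    cv_rmse = 100.0 * rmse / np.mean(y_true)  # RMSE normalized by mean load
    mae = np.mean(np.abs(y_true - y_pred))
    return rmse, cv_rmse, mae

rmse, cv, mae = forecast_metrics([100, 110, 120], [98, 113, 119])
```

CV(RMSE) is the scale-free variant: it divides the RMSE by the mean consumption, which is why the tables below can compare models across very different load magnitudes.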

## 4. Methodology

#### 4.1. Methodology Process Overview

#### 4.2. Data Preparation and Pre-Processing

#### 4.2.1. Preliminary Data Analysis

#### 4.2.2. Data Pre-Processing

A high $R^2$ combined with high residual autocorrelation can be a sign of spurious regression. The Dickey-Fuller test is conducted to check stationarity. The resulting p-value is less than 0.05, so we reject the null hypothesis of a unit root and conclude that the time series is stationary.

#### 4.3. Selecting the Machine Learning Benchmark Model

#### 4.3.1. Improving Benchmark Performance with Feature Selection and Hyperparameter Tuning

#### 4.3.2. Checking Overfitting for Machine Learning Model

#### 4.4. GA-Enhanced LSTM-RNN Model

#### 4.4.1. GA Customization for Optimal Lag Selection

- (i) Selection: roulette wheel selection is used to select parents according to their fitness; better chromosomes have a higher chance of being selected.
- (ii) Crossover: the crossover operator exchanges variables between two parents. We use two-point crossover on the input-sequence individuals with a crossover probability of 0.7.
- (iii) Mutation: this operation introduces diversity into the solution pool by randomly flipping bits. Mutation is a binary flip with a probability of 0.1.
- (iv) Fitness function: the RMSE on the validation set acts as the fitness function.
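The four operators above can be sketched end-to-end as follows. The fitness here is a mocked validation RMSE (the distance of a chromosome to a hypothetical "true" lag set), standing in for the expensive train-and-score step; the population size, generation count and target lags are illustrative, while the crossover (0.7) and mutation (0.1) probabilities follow the text:

```python
import numpy as np

rng = np.random.default_rng(1)
N_LAGS, POP, GENS = 12, 20, 30                 # toy sizes, not the paper's settings

# Stand-in fitness: pretend only lags 1, 2 and 7 matter, so the ideal
# chromosome selects exactly those bits (replaces real LSTM training).
TARGET = np.zeros(N_LAGS, dtype=int)
TARGET[[0, 1, 6]] = 1

def validation_rmse(chrom):
    return float(np.sum(chrom != TARGET))      # lower is better

def roulette(pop, fitness):
    """(i) Roulette wheel: lower RMSE -> higher selection probability."""
    inv = 1.0 / (1.0 + fitness)
    p = inv / inv.sum()
    i, j = rng.choice(len(pop), size=2, p=p, replace=False)
    return pop[i].copy(), pop[j].copy()

def two_point_crossover(a, b, pc=0.7):
    """(ii) Exchange the segment between two random cut points."""
    if rng.random() < pc:
        i, j = sorted(rng.choice(range(1, N_LAGS), size=2, replace=False))
        a[i:j], b[i:j] = b[i:j].copy(), a[i:j].copy()
    return a, b

def mutate(c, pm=0.1):
    """(iii) Binary flip of each bit with probability pm."""
    flip = rng.random(N_LAGS) < pm
    c[flip] = 1 - c[flip]
    return c

pop = rng.integers(0, 2, size=(POP, N_LAGS))
for _ in range(GENS):
    fit = np.array([validation_rmse(c) for c in pop])  # (iv) fitness
    children = [pop[fit.argmin()].copy()]              # elitism: keep the best
    while len(children) < POP:
        a, b = roulette(pop, fit)
        a, b = two_point_crossover(a, b)
        children += [mutate(a), mutate(b)]
    pop = np.array(children[:POP])

best = pop[np.array([validation_rmse(c) for c in pop]).argmin()]
```

Keeping an unmutated copy of the best chromosome each generation (elitism) guarantees the validation RMSE of the incumbent never worsens, which matters when each real fitness evaluation costs a full LSTM training run.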

#### 4.4.2. GA-LSTM Training

## 5. Experimental Results: GA-LSTM Settings

- The number of neurons in the hidden layers was varied from 20 to 100.
- The batch size was varied from 10 to 200 training examples, and the number of training epochs from 50 to 300.
- Sigmoid, hyperbolic tangent (tanh) and rectified linear unit (ReLU) were tested as activation functions in the hidden layers.
- Stochastic gradient descent (SGD), Root Mean Square Propagation (RMSProp) and adaptive moment estimation (ADAM) were tested as optimizers.
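The search space above can be encoded as a hypothetical grid; the step values below are assumptions, since the paper reports only the ranges:

```python
from itertools import product

# Illustrative discretization of the ranges listed above (assumed steps).
grid = {
    "neurons":    [20, 40, 60, 80, 100],
    "batch_size": [10, 50, 100, 200],
    "epochs":     [50, 100, 200, 300],
    "activation": ["sigmoid", "tanh", "relu"],
    "optimizer":  ["sgd", "rmsprop", "adam"],
}

# Every combination becomes one candidate LSTM configuration to train and score.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
```

Even with this coarse discretization the grid contains 5 × 4 × 4 × 3 × 3 = 720 configurations, which is why each candidate's fitness evaluation has to stay cheap (e.g., early stopping on the validation set) for an exhaustive or GA-driven search to be practical.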

#### 5.1. Cross Validation of the LSTM Model

#### 5.2. Short and Medium Term Forecasting Results

## 6. Discussion: Threats to Validity

## 7. Conclusions

## Author Contributions

## Acknowledgments

## Conflicts of Interest


No. | Model | Parameters |
---|---|---|
1 | Ridge Regression | Regularization parameter α = 0.8 |
2 | k-Nearest Neighbor | No. of neighbors n = 5, weight function = uniform, distance metric = Euclidean |
3 | Random Forest | No. of trees = 125, max tree depth = 100, min samples split = 4, min samples leaf = 4 |
4 | Gradient Boosting | No. of estimators = 125, max depth = 75, min samples split = 4, min samples leaf = 4 |
5 | Neural Network | Activation = relu, weight optimization = adam, batch size = 150, number of epochs = 300, learning rate = 0.005 |
6 | Extra Trees | No. of trees = 125, max tree depth = 100, min samples split = 4, min samples leaf = 4 |

Model | RMSE | CV (RMSE) | MAE |
---|---|---|---|
Linear Regression | 847.62 | 1.55 | 630.76 |
Ridge | 877.35 | 1.60 | 655.70 |
k-Nearest Neighbor | 1655.70 | 3.02 | 1239.35 |
Random Forest | 539.08 | 0.98 | 370.09 |
Gradient Boosting | 1021.55 | 1.86 | 746.24 |
Neural Network | 2741.91 | 5.01 | 2180.89 |
Extra Trees | 466.88 | 0.85 | 322.04 |

Metrics | Values After | Values Before |
---|---|---|
RMSE | 428.01 | 466.88 |
CV(RMSE) % | 0.78 | 0.85 |
MAE | 292.49 | 322.04 |

Metrics | LSTM (30 Lags) | LSTM (Optimal Time Lags) | Extra Trees Model | Error Reduction (%) |
---|---|---|---|---|
RMSE | 353.38 | 341.40 | 428.01 | 20.3 |
CV(RMSE) | 0.643 | 0.622 | 0.78 | 20.3 |
MAE | 263.14 | 249.53 | 292.49 | 14.9 |

**Table 5.** Performance Metrics of the LSTM-RNN and Extra Trees Regressor Models using the Cross-Validation Approach.

Model | Mean | Std. Deviation |
---|---|---|
RMSE Extra Trees | 513.8 | 90.9 |
RMSE LSTM | 378 | 59.8 |
CV (RMSE) % Extra Trees | 1.95 | 0.3 |
CV (RMSE) % LSTM | 1.31 | 0.2 |
MAE Extra Trees | 344 | 55.8 |
MAE LSTM | 270.4 | 45.4 |

Forecasting Horizon | MAE | RMSE | CV (RMSE) % |
---|---|---|---|
2 Weeks | 251 | 339 | 0.61 |
Between 2–4 Weeks | 214 | 258 | 0.56 |
Between 2–3 Months | 225 | 294 | 0.63 |
Between 3–4 Months | 208 | 275 | 0.50 |
Mean (Medium Term) | 215.6 | 275.6 | 0.56 |
Std. Dev. | 8.6 | 18 | 0.06 |

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Bouktif, S.; Fiaz, A.; Ouni, A.; Serhani, M.A. Optimal Deep Learning LSTM Model for Electric Load Forecasting using Feature Selection and Genetic Algorithm: Comparison with Machine Learning Approaches ^{†}. *Energies* **2018**, *11*, 1636.
https://doi.org/10.3390/en11071636
