# Wind Power Forecasting Based on Echo State Networks and Long Short-Term Memory

## Abstract

## 1. Introduction

## 2. Wind Power Forecasting

- Ultra-short-term forecasting: from a few minutes to 1 h ahead.
- Short-term forecasting: from 1 h to several hours ahead.
- Medium-term forecasting: from several hours to one week ahead.
- Long-term forecasting: from one week to one year or more ahead.

#### 2.1. Persistence Method

#### 2.2. Physical Methods

#### 2.3. Statistical Methods

#### 2.4. Machine Learning Methods

## 3. Recurrent Neural Networks

#### 3.1. Long Short-Term Memory

- $\mathrm{net}_{j}^{[i]}(t)=\sum_{m} w_{mj}^{[i]}\cdot x_{m}(t)+\sum_{m} w_{mj}^{[i]_{rec}}\cdot y_{m}(t-1),$
- $\mathrm{net}_{j}^{[ig]}(t)=\sum_{m} w_{mj}^{[ig]}\cdot x_{m}(t)+\sum_{m} w_{mj}^{[ig]_{rec}}\cdot y_{m}(t-1)+w_{\cdot j}^{[ig]_{peep}}\cdot c_{j}(t-1),$
- $y_{j}^{[ig]}(t)=f\left(\mathrm{net}_{j}^{[ig]}(t)\right),$
- $\mathrm{net}_{j}^{[fg]}(t)=\sum_{m} w_{mj}^{[fg]}\cdot x_{m}(t)+\sum_{m} w_{mj}^{[fg]_{rec}}\cdot y_{m}(t-1)+w_{\cdot j}^{[fg]_{peep}}\cdot c_{j}(t-1),$
- $y_{j}^{[fg]}(t)=f\left(\mathrm{net}_{j}^{[fg]}(t)\right),$
- $c_{j}(t)=y_{j}^{[fg]}(t)\cdot c_{j}(t-1)+y_{j}^{[ig]}(t)\cdot g\left(\mathrm{net}_{j}^{[i]}(t)\right),$
- $\mathrm{net}_{j}^{[og]}(t)=\sum_{m} w_{mj}^{[og]}\cdot x_{m}(t)+\sum_{m} w_{mj}^{[og]_{rec}}\cdot y_{m}(t-1)+w_{\cdot j}^{[og]_{peep}}\cdot c_{j}(t),$
- $y_{j}^{[og]}(t)=f\left(\mathrm{net}_{j}^{[og]}(t)\right),$
- $y_{j}(t)=h\left(c_{j}(t)\right)\cdot y_{j}^{[og]}(t),$
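
The forward pass defined by the equations above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the names `lstm_step`, `init_params` and the dictionary keys are hypothetical, the activations are assumed to be a logistic sigmoid for $f$ and $\tanh$ for $g$ and $h$, and bias terms are omitted because the equations listed here do not include them.

```python
import numpy as np

def sigmoid(a):
    # Logistic function, assumed here for the gate activation f.
    return 1.0 / (1.0 + np.exp(-a))

def init_params(M, J, rng, scale=0.1):
    # Hypothetical helper: W_* are input weights (M, J), R_* are recurrent
    # weights (J, J), peep_* are peephole weights (J,), one per memory block.
    p = {}
    for gate in ("i", "ig", "fg", "og"):
        p[f"W_{gate}"] = rng.uniform(-scale, scale, (M, J))
        p[f"R_{gate}"] = rng.uniform(-scale, scale, (J, J))
    for gate in ("ig", "fg", "og"):
        p[f"peep_{gate}"] = rng.uniform(-scale, scale, J)
    return p

def lstm_step(x, y_prev, c_prev, p, f=sigmoid, g=np.tanh, h=np.tanh):
    # Block input and the input/forget gates use the previous cell state.
    net_i = x @ p["W_i"] + y_prev @ p["R_i"]
    net_ig = x @ p["W_ig"] + y_prev @ p["R_ig"] + p["peep_ig"] * c_prev
    net_fg = x @ p["W_fg"] + y_prev @ p["R_fg"] + p["peep_fg"] * c_prev
    y_ig, y_fg = f(net_ig), f(net_fg)
    # Cell state update: forget part of the old state, admit gated new input.
    c = y_fg * c_prev + y_ig * g(net_i)
    # The output gate peeks at the *current* cell state c(t).
    net_og = x @ p["W_og"] + y_prev @ p["R_og"] + p["peep_og"] * c
    y_og = f(net_og)
    y = h(c) * y_og
    return y, c

rng = np.random.default_rng(0)
params = init_params(M=2, J=3, rng=rng)
y, c = lstm_step(np.ones(2), np.zeros(3), np.zeros(3), params)
```

Note that the only asymmetry among the three gates is that the output gate reads $c_j(t)$ rather than $c_j(t-1)$, which is why it is computed after the cell update.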

#### 3.2. Echo State Networks

- The reservoir connection matrix ${\mathbf{W}}_{start}$ is randomly initialized.
- The matrix is updated as $\mathbf{W}=\alpha \cdot \mathbf{W}_{start}/\rho\left(\mathbf{W}_{start}\right)$.
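
As a rough illustration of these two steps, the sketch below draws a random reservoir matrix and rescales it so that its spectral radius equals $\alpha$. The function names, the uniform range, and the `density` parameter are assumptions made for the example and are not taken from the paper.

```python
import numpy as np

def spectral_radius(W):
    # Largest eigenvalue magnitude of a square matrix.
    return float(np.max(np.abs(np.linalg.eigvals(W))))

def init_reservoir(J, alpha, density=0.2, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: random initialization (the sparsity mask is an assumption).
    W_start = rng.uniform(-1.0, 1.0, (J, J))
    W_start *= rng.random((J, J)) < density
    rho = spectral_radius(W_start)
    if rho == 0.0:
        raise ValueError("W_start has zero spectral radius; redraw it")
    # Step 2: W = alpha * W_start / rho(W_start).
    return alpha * W_start / rho

W = init_reservoir(J=50, alpha=0.5)
```

Because eigenvalues scale linearly with the matrix, dividing by $\rho(\mathbf{W}_{start})$ and multiplying by $\alpha$ sets the spectral radius of $\mathbf{W}$ to $\alpha$ up to floating-point error.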

## 4. LSTM+ESN Proposed Model

**Algorithm 1** LSTM+ESN training scheme

1: Dataset is split into two: $\{{\mathbf{x}}_{tr},{\mathbf{y}}_{tr}\}$ for training and $\{{\mathbf{x}}_{va},{\mathbf{y}}_{va}\}$ for validation.

2: Randomly initialize sparse hidden weights matrices ${\mathbf{W}}^{{\left[i\right]}_{rec}}$, ${\mathbf{W}}^{{\left[ig\right]}_{rec}}$, ${\mathbf{W}}^{{\left[fg\right]}_{rec}}$ and ${\mathbf{W}}^{{\left[og\right]}_{rec}}$ from a uniform distribution with range (−0.1, 0.1).

3: Initialize hidden weight matrices ${\mathbf{W}}^{\left[i\right]}$, ${\mathbf{W}}^{{\left[ig\right]}_{i}}$, ${\mathbf{W}}^{{\left[fg\right]}_{i}}$, ${\mathbf{W}}^{{\left[og\right]}_{i}}$, ${\mathbf{W}}^{{\left[ig\right]}_{peep}}$, ${\mathbf{W}}^{{\left[fg\right]}_{peep}}$, ${\mathbf{W}}^{{\left[og\right]}_{peep}}$, ${\mathbf{W}}^{{\left[i\right]}_{bias}}$, ${\mathbf{W}}^{{\left[ig\right]}_{bias}}$, ${\mathbf{W}}^{{\left[fg\right]}_{bias}}$ and ${\mathbf{W}}^{{\left[og\right]}_{bias}}$ from a uniform distribution with range (−0.1, 0.1).

4: Set $\mathbf{z}$ as ${\mathbf{x}}_{tr}$ or ${\mathbf{y}}_{tr}$.

5: Set the value for Regularization.

6: TrainNet_by_LSTM(${\{{\mathbf{x}}_{tr}\left(t\right),\mathbf{z}\left(t\right)\}}_{t=1,\dots ,T}$)

7: TrainNet_by_ESN(${\{{\mathbf{x}}_{tr}\left(t\right),{\mathbf{y}}_{tr}\left(t\right)\}}_{t=1,\dots ,T}$, Regularization)

8: repeat

9: TrainNet_by_LSTM(${\{{\mathbf{x}}_{tr}\left(t\right),{\mathbf{y}}_{tr}\left(t\right)\}}_{t=1,\dots ,T}$)

10: TrainNet_by_ESN(${\{{\mathbf{x}}_{tr}\left(t\right),{\mathbf{y}}_{tr}\left(t\right)\}}_{t=1,\dots ,T}$, Regularization)

11: until convergence, using the validation set as the early stopping condition.

Algorithm 2, `TrainNet_by_LSTM`, systematically describes Step 3 of Algorithm 1, used to train the network. First, the network is trained online with the AdaDelta [60] optimization algorithm, as in LSTM training, using only one epoch (Step 6). We chose this optimization technique because it has been shown to converge faster than other gradient descent algorithms [61]. In addition, to preserve the requirements imposed on the ESN reservoir, after each weight update the matrix of recurrent connections is kept sparse by zeroing out the weights that were initially set to zero (Step 7). To avoid diverging output values when updating the output weights, and to avoid the LSTM gates being biased by a few very large weights, a filter resets to zero any weight that exceeds a threshold $\zeta$ (Step 10 and Step 12). Since training is performed online, the matrix of recurrent connections is re-scaled to maintain a spectral radius $\alpha$ at each instant $t$. The parameter `MaxEpoch` indicates the maximum number of epochs to use; $\zeta$ is the maximum magnitude that the weights can take; `fixedWo` indicates whether the output weights are updated; `keepSR` indicates whether the spectral radius is kept; and `cleanHW` controls whether the weights that exceed $\zeta$ are reset to zero.

**Algorithm 2** TrainNet_by_LSTM()

Input: A set of instances ${\{X\left(t\right),Z\left(t\right)\}}_{t=1,\dots ,T}$

Input: MaxEpoch = 1, $\zeta =10$, fixedWo = False, keepSR = True, cleanHW = True

1: Define $i=0$

2: while $i<$ MaxEpoch do

3: $i\leftarrow i+1$

4: for $t=1$ to $T$ do

5: ${\widehat{Z}}_{tmp}\left(t\right)\leftarrow$ ForwardPropagate($X\left(t\right)$, network)

6: UpdateWeights($Z\left(t\right)$, ${\widehat{Z}}_{tmp}\left(t\right)$, network)

7: Sparse connectivity is maintained by setting the inactive weights back to zero.

8: if cleanHW == True then

9: if fixedWo == False then

10: ${W}^{\left[o\right]}\leftarrow 0$ iff $|{W}^{\left[o\right]}|>\zeta$

11: end if

12: ${W}^{[.]}\leftarrow 0$ iff $|{W}^{[.]}|>\zeta$, $\forall\,[.]\setminus$ output layer

13: end if

14: end for

15: if keepSR == True then

16: ${W}^{{[.]}_{rec}}\leftarrow {W}^{{[.]}_{rec}}\cdot \alpha /\rho \left({W}^{{[.]}_{rec}}\right)$

17: end if

18: end while

Output: Net's weights
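
The three maintenance operations in Algorithm 2 (Step 7, Steps 10 and 12, and Step 16) can be illustrated with the NumPy sketch below. The function names and the boolean `mask` argument are hypothetical, and the gradient update itself (AdaDelta) is deliberately left out; this only shows what happens to the weights after an update.

```python
import numpy as np

def keep_sparse(W_rec, mask):
    # Step 7: entries that were zero at initialization are forced back to zero.
    # `mask` is True where a connection existed at initialization.
    return W_rec * mask

def clean_weights(W, zeta=10.0):
    # Steps 10 and 12: any weight whose magnitude exceeds zeta is reset to zero.
    return np.where(np.abs(W) > zeta, 0.0, W)

def rescale_spectral_radius(W_rec, alpha):
    # Step 16: rescale the recurrent matrix so its spectral radius equals alpha.
    rho = np.max(np.abs(np.linalg.eigvals(W_rec)))
    return W_rec if rho == 0 else W_rec * alpha / rho

W = np.array([[0.5, 12.0], [-11.0, 1.0]])
cleaned = clean_weights(W)                         # large entries become 0
masked = keep_sparse(W, np.array([[True, False], [False, True]]))
scaled = rescale_spectral_radius(np.diag([2.0, 1.0]), alpha=0.5)
```

Resetting oversized weights to zero, rather than clipping them to $\pm\zeta$, follows the wording of Steps 10 and 12.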

`TrainNet_by_ESN()`. This can be done, for example, using ridge regression as in Equation (2) or by means of quantile regression [54]. The latter approach yields a more robust estimate of $\mathbf{y}$. Recall that the model needs a regularized regression because ${\mathbf{x}}^{\left[o\right]}$ is a singular matrix. Several alternatives exist in the literature, but for this proposal we used a regularized quantile regression model based on the elastic net, implemented in the R package `hqreg`.
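
For the ridge regression option, a minimal sketch of fitting the output weights is given below. It uses the standard closed form $(\mathbf{X}^{\top}\mathbf{X}+\lambda\mathbf{I})^{-1}\mathbf{X}^{\top}\mathbf{Y}$, whose layout may differ from the paper's Equation (2); the function name `ridge_readout` and the array shapes are assumptions. The quantile regression variant fitted with `hqreg` is not reproduced here.

```python
import numpy as np

def ridge_readout(states, targets, lam):
    # states: (T, J) hidden-layer outputs collected over time.
    # targets: (T, K) desired outputs. Returns output weights of shape (J, K).
    J = states.shape[1]
    A = states.T @ states + lam * np.eye(J)  # regularization makes A invertible
    return np.linalg.solve(A, states.T @ targets)

# Two identical state columns make states.T @ states singular,
# yet the regularized system is still solvable.
states = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
targets = np.array([[2.0], [4.0], [6.0]])
W_out = ridge_readout(states, targets, lam=1e-6)
predictions = states @ W_out
```

The example illustrates the point made above: without the $\lambda\mathbf{I}$ term the normal equations have no unique solution when the state matrix is rank deficient.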

(i) `TrainNet_by_LSTM` is performed using $\mathbf{y}$ as the target output; (ii) `TrainNet_by_ESN` is performed using the same regression as in Step 7 of Algorithm 1.

`regression` is set in Step 5, Algorithm 1. For instance, LSTM+ESN+Y+RR denotes the LSTM+ESN architecture trained with $\mathbf{y}$ as a target in Step 4 and Ridge Regression in Step 5.

`validate` indicates whether the input dataset is split into dataset-valid and dataset-train; `MaxEpochGlobal` indicates the maximum number of epochs in the fine-tuning stage; and `Regularization` refers to the regularization used by the `TrainNet_by_ESN` step, which in this case is always set to `QR`.

**Algorithm 3** LSTM+ESN+X+QR

Input: A set of instances ${\{\mathbf{x}\left(t\right),\mathbf{y}\left(t\right)\}}_{t=1,\dots ,T}$.

1: validate = True, MaxEpochGlobal = 1, Regularization = QR

2: $\{{\mathbf{x}}_{ori}\left(t\right),{\mathbf{y}}_{ori}\left(t\right)\}\leftarrow {\{\mathbf{x}\left(t\right),\mathbf{y}\left(t\right)\}}_{t=1,\dots ,T}$

3: if validate == True then

4: $\{{\mathbf{x}}_{va}\left(t\right),{\mathbf{y}}_{va}\left(t\right)\}\leftarrow {\{\mathbf{x}\left(t\right),\mathbf{y}\left(t\right)\}}_{t=\lfloor 0.9\cdot T\rfloor +1,\dots ,T}$

5: $\{{\mathbf{x}}_{tr}\left(t\right),{\mathbf{y}}_{tr}\left(t\right)\}\leftarrow {\{\mathbf{x}\left(t\right),\mathbf{y}\left(t\right)\}}_{t=1,\dots ,\lfloor 0.9\cdot T\rfloor}$

6: end if

7: TrainNet_by_LSTM(${\{{\mathbf{x}}_{tr}\left(t\right),{\mathbf{x}}_{tr}\left(t\right)\}}_{t=1,\dots ,T}$)

8: TrainNet_by_ESN(${\{{\mathbf{x}}_{tr}\left(t\right),{\mathbf{y}}_{tr}\left(t\right)\}}_{t=1,\dots ,T}$, Regularization)

9: if validate == True then

10: $Error_{best}\leftarrow$ CalculateForecastError(${\{{\mathbf{x}}_{va}\left(t\right),{\mathbf{y}}_{va}\left(t\right)\}}_{t=\dots}$)

11: end if

12: Save(network)

13: Define $i=0$, attempts = 1, MaxAttempts = 10

14: while $i<$ MaxEpochGlobal do

15: $i\leftarrow i+1$

16: RestartOutputs(network)

17: RestartGradients(network)

18: TrainNet_by_LSTM(${\{{\mathbf{x}}_{tr}\left(t\right),{\mathbf{y}}_{tr}\left(t\right)\}}_{t=1,\dots ,T}$; fixedWo = True)

19: if validate == True then

20: TrainNet_by_ESN(${\{{\mathbf{x}}_{tr}\left(t\right),{\mathbf{y}}_{tr}\left(t\right)\}}_{t=1,\dots ,T}$, Regularization)

21: $Error_{new}\leftarrow$ CalculateForecastError(${\{{\mathbf{x}}_{va}\left(t\right),{\mathbf{y}}_{va}\left(t\right)\}}_{t=\dots}$)

22: if $Error_{new}<Error_{best}$ then

23: $Error_{best}\leftarrow Error_{new}$

24: Save(network)

25: else

26: attempts $\leftarrow$ attempts + 1

27: end if

28: else

29: Save(network)

30: end if

31: if attempts > MaxAttempts then

32: break

33: end if

34: end while

35: LoadSavedNet()

36: TrainNet_by_ESN(${\{{\mathbf{x}}_{ori}\left(t\right),{\mathbf{x}}_{ori}\left(t\right)\}}_{t=1,\dots ,T}$)

Output: Last network saved
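
The control flow of Algorithm 3 (the chronological 90/10 split of Steps 4 and 5 and the early-stopping loop of Steps 13 to 34) can be sketched as follows. This is a schematic only: `train_once` and `validation_error` are hypothetical callables standing in for the `TrainNet_by_LSTM`/`TrainNet_by_ESN` pair and for `CalculateForecastError`, and saving the network is reduced to remembering the best epoch.

```python
import math

def chronological_split(x, y, train_fraction=0.9):
    # Steps 4-5: the first floor(0.9*T) samples train, the rest validate.
    cut = math.floor(train_fraction * len(x))
    return (x[:cut], y[:cut]), (x[cut:], y[cut:])

def fine_tune(train_once, validation_error, max_epoch_global=1, max_attempts=10):
    best_error = validation_error()   # Step 10: error of the initial network
    best_epoch = 0                    # Step 12: "saved" network is the initial one
    attempts = 1                      # Step 13
    for epoch in range(1, max_epoch_global + 1):
        train_once()                  # Steps 16-20
        new_error = validation_error()
        if new_error < best_error:    # Steps 22-24: keep the improved network
            best_error, best_epoch = new_error, epoch
        else:
            attempts += 1             # Step 26
        if attempts > max_attempts:   # Steps 31-33
            break
    return best_error, best_epoch

(train_x, train_y), (val_x, val_y) = chronological_split(list(range(10)), list(range(10)))
errors = iter([5.0, 4.0, 4.5, 3.0])   # toy validation errors for illustration
result = fine_tune(lambda: None, lambda: next(errors), max_epoch_global=3)
```

The split is chronological rather than shuffled, so the validation block always lies after the training block in time.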

## 5. Experiments

#### 5.1. Data Description and Preprocessing

#### 5.2. Forecasting Accuracy Evaluation Metrics

- s1: the one that achieves the lowest MSE averaged over every subseries and every step ahead of the test set, as in Equation (22).
- s2: the one that achieves the lowest weighted average of: $$\frac{1}{R}\sum_{r=1}^{R}\sum_{h=1}^{H} v_{h}\cdot {\left(e_{r}(T+h|T)\right)}^{2},$$
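
The s2 criterion can be computed directly from a matrix of forecast errors, as in the sketch below. The function name and the example weights are assumptions; the values of $v_h$ used in the experiments are not stated in this section.

```python
import numpy as np

def weighted_multistep_error(errors, v):
    # errors[r, h-1] holds e_r(T+h|T) for subseries r and horizon h; v[h-1] is v_h.
    errors = np.asarray(errors, dtype=float)
    v = np.asarray(v, dtype=float)
    R = errors.shape[0]
    return float(np.sum(v * errors**2) / R)

# Two subseries, two horizons, with the second horizon weighted by one half.
score = weighted_multistep_error([[1.0, 2.0], [3.0, 4.0]], v=[1.0, 0.5])
```

With all weights equal to $1/H$, this reduces to the MSE averaged over subseries and horizons, which is the quantity the s1 criterion minimizes.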

#### 5.3. Case Study and Results

## 6. Conclusions

`MaxEpochGlobal` = 1, making it an efficient machine learning approach. Finally, our proposal trains all the layers of the model, taking advantage of the potential of its architecture in automatically modeling the underlying temporal dependencies of the data.

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## References

1. Chang, W.-Y. A Literature Review of Wind Forecasting Methods. J. Power Energy Eng. **2014**, 2, 161–168.
2. Perera, K.S.; Aung, Z.; Woon, W. Machine learning techniques for supporting renewable energy generation and integration: A survey. In Data Analytics for Renewable Energy Integration; Woon, W.L., Aung, Z., Madnick, S., Eds.; Springer International Publishing: Cham, Switzerland, 2014; pp. 81–96.
3. Schmidt, T.S.; Born, R.; Schneider, M. Assessing the costs of photovoltaic and wind power in six developing countries. Nat. Clim. Chang. **2012**, 2, 548–553.
4. De Aguiar, B.C.G.; Valenca, M.J.S. Using reservoir computing for forecasting of wind power generated by a wind farm. In Proceedings of the Sixth International Conference on Advanced Cognitive Technologies and Applications, Venice, Italy, 25–29 May 2014.
5. Lei, M.; Shiyan, L.; Chuanwen, J.; Hongling, L.; Yan, Z. A review on the forecasting of wind speed and generated power. Renew. Sustain. Energy Rev. **2009**, 13, 915–920.
6. Jung, J.; Broadwater, R.P. Current status and future advances for wind speed and power forecasting. Renew. Sustain. Energy Rev. **2014**, 31, 762–777.
7. Fortuna, L.; Nunnari, S.; Guariso, G. Fractal order evidences in wind speed time series. In Proceedings of the 2014 International Conference on Fractional Differentiation and Its Applications (ICFDA), Catania, Italy, 23–25 June 2014; pp. 1–6.
8. Shen, Z.; Ritter, M. Forecasting Volatility of Wind Power Production; Humboldt University: Berlin, Germany, 2015.
9. Fazelpour, F.; Tarashkar, N.; Rosen, M.A. Short-term wind speed forecasting using artificial neural networks for Tehran, Iran. Int. J. Energy Environ. Eng. **2016**, 7, 377–390.
10. Kadhem, A.A.; Wahab, N.I.A.; Aris, I.; Jasni, J.; Abdalla, A.N. Advanced wind speed prediction model based on a combination of weibull distribution and an artificial neural network. Energies **2017**, 10, 1744.
11. Zheng, D.; Shi, M.; Wang, Y.; Eseye, A.T.; Zhang, J. Day-ahead wind power forecasting using a two-stage hybrid modeling approach based on scada and meteorological information, and evaluating the impact of input-data dependency on forecasting accuracy. Energies **2017**, 10, 1988.
12. Hallas, M.; Dorffner, G. A Comparative Study on Feedforward and Recurrent Neural Networks in Time Series Prediction Using Gradient Descent Learning. 1998. Available online: http://www.smartquant.com/references/NeuralNetworks/neural22.pdf (accessed on 1 February 2018).
13. Madsen, H.; Nielsen, H.A.; Nielsen, T.S. A tool for predicting the wind power production of off-shore wind plants. In Proceedings of the Copenhagen Offshore Wind Conference & Exhibition, Copenhagen, Denmark, 25–28 October 2005.
14. Al-Deen, S.; Yamaguchi, A.; Ishihara, T. A physical approach to wind speed prediction for wind energy forecasting. In Proceedings of the Fourth International Symposium on Computational Wind Engineering, Yokohama, Japan, 16–19 July 2006.
15. Ishihara, T.; Yamaguchi, A.; Fujino, Y. A Nonlinear Model MASCOT: Development and Application. In Proceedings of the European Wind Energy Conference, Madrid, Spain, 16–19 June 2003.
16. Cassola, F.; Burlando, M. Wind speed and wind energy forecast through Kalman filtering of Numerical Weather Prediction model output. Appl. Energy **2012**, 99, 154–166.
17. Li, L.; Liu, Y.-Q.; Yang, Y.-P.; Han, S.; Wang, Y.-M. A physical approach of the short-term wind power prediction based on CFD pre-calculated flow fields. J. Hydrodyn. **2013**, 25, 56–61.
18. Marti, I.; Cabezon, D.; Villanueva, J.; Sanisisdro, M.J.; Loureiro, Y.; Cantero, E. LocalPred and RegioPred, Advanced tools for wind energy prediction in complex terrain. In Proceedings of the European Wind Energy Conference & Exhibition, Madrid, Spain, 16–19 June 2003.
19. Landberg, L. Short-term prediction of local wind conditions. J. Wind Eng. Ind. Aerodyn. **2001**, 89, 235–245.
20. Focken, U.; Lange, M.; Waldl, H.-P. Previento - A wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference & Exhibition, Copenhagen, Denmark, 2–6 July 2001.
21. Liu, H.; Tian, H.-Q.; Chen, C.; Li, Y.-F. A hybrid statistical method to predict wind speed and wind power. Renew. Energy **2010**, 35, 1857–1861.
22. Wang, M.-D.; Qiu, Q.-R.; Cui, B.-W. Short-Term wind speed forecasting combined time series method and ARCH model. In Proceedings of the 2012 International Conference on Machine Learning and Cybernetics, Xian, China, 15–17 July 2012.
23. Liu, Y.; Roberts, M.C.; Sioshansi, R. A vector autoregressive weather model for electricity supply and demand modeling. J. Mod. Power Syst. Clean Energy **2014**, 1–14.
24. Cadenas, E.; Rivera, W.; Campos-Amezcua, R.; Heard, C. Wind speed prediction using a univariate ARIMA model and a multivariate NARX model. Energies **2016**, 9, 109.
25. Foley, A.M.; Leahy, P.G.; Marvuglia, A.; McKeogh, E.J. Current methods and advances in forecasting of wind power generation. Renew. Energy **2012**, 37, 1–8.
26. Olaofe, Z.O.; Folly, K.A. Wind power estimation using recurrent neural network technique. In Proceedings of the IEEE Power and Energy Society Conference and Exposition in Africa: Intelligent Grid Integration of Renewable Energy Resources (PowerAfrica), Johannesburg, South Africa, 9–13 July 2012; pp. 1–7.
27. Cadenas-Barrera, J.L.; Meng, J.; Castillo-Guerra, E.; Chang, L. A neural network approach to multi-step-ahead, short-term wind speed forecasting. In Proceedings of the 12th International Conference on Machine Learning and Applications (ICMLA), Miami, FL, USA, 4–7 December 2013.
28. Cheggaga, N. Improvements in wind speed forecasting using an online learning. In Proceedings of the 2014 5th International Renewable Energy Congress (IREC), Hammamet, Tunisia, 25–27 March 2014; pp. 1–6.
29. Kishore, G.R.; Prema, V.; Rao, K.U. Multivariate wind power forecast using artificial neural network. In Proceedings of the IEEE Global Humanitarian Technology Conference, South Asia Satellite (GHTC-SAS), Trivandrum, India, 26–27 September 2014.
30. De Aquino, R.R.B.; Souza, R.B.; Neto, O.N.; Lira, M.M.S.; Carvalho, M.A.; Ferreira, A.A. Echo state networks, artificial neural networks and fuzzy system models for improve short-term wind speed forecasting. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), Killarney, Ireland, 12–17 July 2015.
31. Sun, W.; Liu, M.; Liang, Y. Wind speed forecasting based on FEEMD and LSSVM optimized by the Bat algorithm. Energies **2015**, 8, 6585–6607.
32. Wu, Q.; Peng, C. Wind power generation forecasting using least squares support vector machine combined with ensemble empirical mode decomposition, principal component analysis and a bat algorithm. Energies **2016**, 9, 261.
33. Ghorbani, M.A.; Khatibi, R.; FazeliFard, M.H.; Naghipour, L.; Makarynskyy, O. Short-term wind speed predictions with machine learning techniques. Meteorol. Atmos. Phys. **2016**, 128, 57–72.
34. Bonanno, F.; Capizzi, G.; Sciuto, G.L.; Napoli, C. Wavelet recurrent neural network with semi-parametric input data preprocessing for micro-wind power forecasting in integrated generation systems. In Proceedings of the 2015 International Conference on Clean Electrical Power (ICCEP), Taormina, Italy, 16–18 June 2015; pp. 602–609.
35. Chang, G.W.; Lu, H.J.; Hsu, L.Y.; Chen, Y.Y. A hybrid model for forecasting wind speed and wind power generation. In Proceedings of the 2016 IEEE Power and Energy Society General Meeting (PESGM), Boston, MA, USA, 17–21 July 2016; pp. 1–5.
36. Brusca, S.; Capizzi, G.; Sciuto, G.L.; Susi, G. A new design methodology to predict wind farm energy production by means of a spiking neural network-based system. Int. J. Numer. Model. Electron. Netw. Devices Fields **2017**, 30.
37. De Alencar, D.B.; de Mattos Affonso, C.; de Oliveira, R.C.L.; Rodríguez, J.L.M.; Leite, J.C.; Filho, J.C.R. Different models for forecasting wind power generation: Case study. Energies **2017**, 10, 1976.
38. Bengio, Y.; Simard, P.; Frasconi, P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. **1994**, 5, 157–166.
39. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. **1997**, 9, 1735–1780.
40. Jaeger, H. The “Echo State” Approach to Analysing and Training Recurrent Neural Networks; GMD Report 148; GMD - German National Research Institute for Computer Science: Hanover, Germany, 2001.
41. Del Brío, B.M.; Molina, A.S. Neural Networks and Fuzzy Systems, 3rd ed.; Springer: Berlin, Germany, 2006. (In Spanish)
42. Giles, C.L.; Lawrence, S.; Tsoi, A.C. Noisy time series prediction using recurrent neural networks and grammatical inference. Mach. Learn. **2001**, 44, 161–183.
43. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
44. Martens, J.; Sutskever, I. Learning recurrent neural networks with hessian-free optimization. In Proceedings of the 28th International Conference on Machine Learning (ICML 2011), Bellevue, WA, USA, 28 June–2 July 2011; pp. 1033–1040.
45. Pascanu, R.; Mikolov, T.; Bengio, Y. On the difficulty of training recurrent neural networks. In Proceedings of the 30th International Conference on International Machine Learning (ICML’13), Atlanta, GA, USA, 16–21 June 2013; Volume 28, pp. III-1310–III-1318.
46. Werbos, P.J. Backpropagation through time: What it does and how to do it. Proc. IEEE **1990**, 78, 1550–1560.
47. Williams, R.J.; Zipser, D. A learning algorithm for continually running fully recurrent neural networks. Neural Comput. **1989**, 1, 270–280.
48. Gers, F. Long Short-Term Memory in Recurrent Neural Networks. Ph.D. Thesis, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland, 2001.
49. Lukoševičius, M.; Jaeger, H. Reservoir computing approaches to recurrent neural network training. Comput. Sci. Rev. **2009**, 3, 127–149.
50. Schölkopf, B.; Smola, A.J. Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond; MIT Press: Cambridge, MA, USA, 2002.
51. Goga, C.; Shehzad, M.A. Overview of Ridge Regression Estimators in Survey Sampling; Université de Bourgogne: Dijon, France, 2010.
52. Lukoševičius, M. Neural Networks: Tricks of the Trade, 2nd ed.; Chapter A Practical Guide to Applying Echo State Networks; Springer: Berlin/Heidelberg, Germany, 2012; pp. 659–686.
53. Lv, Z.; Zhao, J.; Liu, Y.; Wang, W. Use of a quantile regression based echo state network ensemble for construction of prediction intervals of gas flow in a blast furnace. Control Eng. Pract. **2016**, 46, 94–104.
54. Nielsen, H.A.; Madsen, H.; Nielsen, T.S. Using quantile regression to extend an existing wind power forecasting system with probabilistic forecasts. Wind Energy **2006**, 9, 95–108.
55. Schmidhuber, J.; Gagliolo, M.; Gomez, F. Training recurrent networks by evolino. Neural Comput. **2007**, 19, 757–779.
56. Zhao, Z.; Chen, W.; Wu, X.; Chen, P.C.Y.; Liu, J. LSTM network: A deep learning approach for short-term traffic forecast. IET Intell. Transp. Syst. **2017**, 11, 68–75.
57. Liu, D.; Wang, J.; Wang, H. Short-term wind speed forecasting based on spectral clustering and optimised echo state networks. Renew. Energy **2015**, 78, 599–608.
58. Sheng, C.; Zhao, J.; Liu, Y.; Wang, W. Prediction for noisy nonlinear time series by echo state network based on dual estimation. Neurocomputing **2012**, 82, 186–195.
59. Palangi, H.; Deng, L.; Ward, R.K. Learning input and recurrent weight matrices in echo state networks. arXiv, 2013; arXiv:1311.2987.
60. Zeiler, M.D. ADADELTA: An adaptive learning rate method. arXiv, 2012; arXiv:1212.5701.
61. Ruder, S. An overview of gradient descent optimization algorithms. arXiv, 2016; arXiv:1609.04747.
62. Iversen, E.B.; Morales, J.M.; Møller, J.K.; Trombe, P.-J.; Madsen, H. Leveraging stochastic differential equations for probabilistic forecasting of wind power using a dynamic power curve. Wind Energy **2017**, 20, 33–44.
63. Cheng, H.; Tan, P.; Gao, J.; Scripps, J. Advances in Knowledge Discovery and Data Mining. In Proceedings of the 10th Pacific-Asia Conference (PAKDD 2006), Singapore, 9–12 April 2006; Chapter Multistep-Ahead Time Series Prediction. Springer: Berlin/Heidelberg, Germany, 2006; pp. 765–774.

**Figure 6.** Preprocessing data scheme. Original forecasts in black, merged series in red. NWP, Numerical Weather Prediction.

Parameter | Value |
---|---|
MaxEpoch | 1 |
$\zeta$ | 10 |
fixedWo | False |
keepSR | True |
cleanHW | True |

Parameter | Value |
---|---|
validate | True |
MaxEpochGlobal | 1 |

Parameter | Value |
---|---|
$\rho_{AD}$ | $0.95$ |
$\epsilon_{AD}$ | $1\times {10}^{-8}$ |

**Table 4.** Configurations with the lowest test error. RR, Ridge Regression; QR, Quantile Regression; s1, Setting 1.

ID | Model | J | $\alpha$ | $\lambda$ |
---|---|---|---|---|
M1 | LSTM+ESN+Y+RR+s. | 480 | 0.5 | ${10}^{-3}$ |
M2 | LSTM+ESN+X+RR+s1 | 270 | 0.5 | ${10}^{-2}$ |
M3 | LSTM+ESN+X+RR+s2 | 110 | 0.6 | ${10}^{-3}$ |
M4 | LSTM+ESN+Y+QR+s1 | 270 | 0.8 | ${10}^{-2}$ |
M5 | LSTM+ESN+Y+QR+s2 | 340 | 0.6 | ${10}^{-5}$ |
M6 | LSTM+ESN+X+QR+s1 | 190 | 0.5 | ${10}^{-3}$ |
M7 | LSTM+ESN+X+QR+s2 | 110 | 0.5 | ${10}^{-3}$ |

**Table 5.** Average MSE, MAE, MAPE and Standard Deviation of Error (SDE). WPPT, Wind Power Prediction Tool.

| ID | Model | MSE | MAE | MAPE | SDE |
|---|---|---|---|---|---|
| | Persistence | 11,990,351 | 2545.84 | 150.85 | 2309.09 |
| | WPPT | 3,831,958 | 1364.76 | 76.78 | 1691.58 |
| M1 | LSTM+ESN+Y+RR+s. | 3,775,138 * | 1391.16 | 79.63 | 1739.10 |
| M2 | LSTM+ESN+X+RR+s1 | 3,792,689 * | 1401.85 | 81.77 | 1814.31 |
| M3 | LSTM+ESN+X+RR+s2 | 3,795,159 * | 1415.13 | 87.45 | 1788.09 |
| M4 | LSTM+ESN+Y+QR+s1 | 3,924,957 | 1432.32 | 80.33 | 1742.01 |
| M5 | LSTM+ESN+Y+QR+s2 | 3,902,590 | 1420.41 | 80.18 | 1718.10 |
| M6 | LSTM+ESN+X+QR+s1 | 3,766,980 * | 1323.35 * | 66.72 * | 1666.40 * |
| M7 | LSTM+ESN+X+QR+s2 | 3,689,831 * | 1343.60 * | 71.43 * | 1701.15 |

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

López, E.; Valle, C.; Allende, H.; Gil, E.; Madsen, H. Wind Power Forecasting Based on Echo State Networks and Long Short-Term Memory. *Energies* **2018**, *11*, 526.
https://doi.org/10.3390/en11030526
