Enhancing Short-Term Wind Power Forecasting through Multiresolution Analysis and Echo State Networks

: This article suggests the application of multiresolution analysis by Wavelet Transform—WT and Echo State Networks—ESN for the development of tools capable of providing wind speed and power generation forecasting. The models were developed to forecast the hourly mean wind speeds, which are applied to the wind turbine’s power curve to obtain wind power forecasts with horizons ranging from 1 to 24 h ahead, for three different locations of the Brazilian Northeast. The average improvement of Normalized Mean Absolute Error—NMAE for the ﬁrst six, twelve, eighteen and twenty-four hourly power generation forecasts obtained by using the models proposed in this article were 70.87%, 71.99%, 67.77% and 58.52%, respectively. These results of improvements in relation to the Persistence Model—PM are among the best published results to date for wind power forecasting. The adopted methodology was adequate, assuring statistically reliable forecasts. When comparing the performance of fully-connected feedforward Artiﬁcial Neural Networks—ANN and ESN, it was observed that both are powerful time series forecasting tools, but the ESN proved to be more suited for wind power forecasting.


Introduction
Nowadays, wind energy is one of the most successful sources of renewable energy used around the world. The incentive policies adopted by several countries are among the drivers for using wind power, since they guarantee the purchase of energy produced by wind farms even if the price of energy is not competitive. The first countries that applied incentive policies to stimulate wind power development were Germany and Denmark. Then, other countries also adopted incentive policies, as in the case of Brazil, with the creation of the Incentive Program for Alternative Sources of Electric Energy-PROINFA [1].
Since wind generation is an intermittent source of generation, the use of models or techniques capable of predicting the energy produced by these type of sources is essential for an adequate wind power integration into electrical power systems.
Accurate wind power forecasting methods can assist power systems operators in the accurate balancing between demand and supply. The higher the penetration level of wind generation, the greater the need for forecasting tools of this type of generation, in order to maximize the integration between the generation from conventional sources (which is planned and programmed) and the forecast electricity demand (which is predictable with sufficient accuracy). According to [2], even when using state-of-the-art wind forecasting methods, the hour-ahead prediction errors are around 10-15% for a single wind farm.
In interconnected electrical power systems, the balance between demand and generation in the short-term depends on the automatic generation control, which does not regulate power flows in transmission lines. Regional voltage controls typically regulate the primary buses, which may not result in improved voltage levels on other buses of the power system. Thus, due to the limitations of generation controllers and voltage regulators, a high level of wind power penetration can cause temporary overloads on transmission lines and violation of acceptable system voltage levels.
Regarding the different forecasting horizons, the day-ahead period will always be important for wind forecasting. The unit commitment process of the next-day is a core task for power system operators. As described in [3], the day-ahead time frame is typically when the overall operating plan for the next day is put in place, including the selection of dispatchable sources (like large thermal plants that may take several hours to start up).
The current tools used for forecasting wind speed and power may or may not involve a Numerical Weather Prediction-NWP model. The models that use NWP provide better predictions of temporal series for larger horizons, starting from 3 to 6 h, which makes them more appropriate for utilities usage. In Brazil, for example, the National Electric System Operator-ONS uses wind hourly forecasts on the heights of 10 m and 100 m, from the numerical model ETA with spatial resolution of 15 km, to several wind farms. This model is processed at Center for Weather Forecasting and Climate Studies-CPTEC.
For short-term prediction, there are basically two different types of modeling: physical and statistical. A combination of the two types is also used to make more reliable predictions, trying to use physical variables to obtain an adequate estimate of the local wind speed before use a statistical model (Model Output Statistics-MOS) to decrease the remaining error. Statistical modeling searches for strong relations between the historical values of the electrical energy production (and other meteorological parameters) and information measured in real time, usually using recursive techniques. For time series forecasting, there are statistical models that can be expressed analytically, such as Autoregressive Moving Average-ARMA and its variants, or "black box" models, which are not described analytically, such as ANN.
Several short-term forecasting models using ANN, Fuzzy Logic and Wavelets were analyzed in [1]. The forecasts results were compared with reference models. The performance gains of the best models proposed in relation to reference models were approximately 80% to the forecasting horizon of one hour ahead. It was shown that there are improvements when the average hourly wind speeds are preprocessed with the application of multiresolution analysis by Wavelet Transform-WT before providing them as inputs to neural networks, especially for the forecasting horizon in the range from 1 to 6 h. The analyzed forecasting horizon range of varied from 1 to 24 h ahead.
Based on the theories of wavelet, wavelet packet, time series analysis and ANN, three hybrid models for wind speed prediction were proposed in [4]. The models were compared with classical wind speed forecasting methods like PM. The results showed that the Wavelet Packet-ANN model is the best among them.
A hybrid neuro-fuzzy system for wind power prediction was proposed in [5]. A wireless sensor network measures and transmits the required parameters (air temperature, density and pressure, and wind speed) for the prediction model at the operations center. According to the authors, consideration of all these parameters increased the prediction accuracy of the proposed model.
A procedure to make judgment about the nonlinearity of wind speed time series was proposed in [6]. The proposed method solves the problem with quantizing the nonlinear deterministic components. The performance of some prediction were compared, with Grey-Markov presenting the best performance.
A prediction method based on wavelet networks, using multi-resolution analysis and adaptive wavelet neural networks, was implemented and evaluated by numerical simulations using real wind speed profiles in [7]. According to the authors, the method allows reducing computational efforts without reduction in the accuracy, which is very important for prediction systems that uses digital processors.
Nonparametric methods using ANN and Fuzzy Inference Systems for modeling wind turbine's power curve were explored in [8]. The models were adjusted to predict the wind power in the range between 1 and 24 h ahead, for actual case studies from two wind farms in the Brazilian Northeast. The gain of the developed models lay in the range of 29 to 60% when compared to the PM in a twenty-four hours forecasting horizon.
Models presented in [9], based on techniques of ESN, ANN and Adaptive Neuro-Fuzzy Inference Systems-ANFIS, investigated the contribution of using climate variables as inputs seeking for the performance improvement of wind speed forecasting. The gain of some of the best models developed were approximately 50% in relation to the PM in a four hours ahead time window.
A method to find the best reservoir applied to the time series prediction task was proposed in [10]. The developed method, called RCDESIGN, combines an evolutionary algorithm with Reservoir Computing-RC and searches simultaneously for the best values of the ESN hyperparameters without rescaling the reservoir weights matrix by its spectral radius-SR. The method also considers the RC in all its nonlinearity, because it allows the use of all possible connections, rather than using only the required connections. The proposed method was applied in two real data sets of average hourly wind speed looking to verify the feasibility of the proposed method applied to wind speed forecasting.
A novel ESN that improves performance of the forecasting engine by decreasing the number of the internal states was proposed in [11]. The proposed topology has multiple sub-reservoirs with nonlinear relations between the internal states, resulting in reduced orders of the weight matrices. Simulation results show faster learning speed, fewer computations, and improved prediction performance as compared to ESN with fixed sizes and topologies.
The authors in [12] propose a Growing Echo State Network-GESN to automatically design ESN. The GESN makes use of multiple sub-reservoirs to reduce the coupling effects between units in a single reservoir. It was shown that the Echo State Property could be guaranteed without post-scaling of reservoir weight matrix. Experimental results on four time-series benchmarks demonstrate that the GESN performs better than some ESN with fixed hyperparameters.
An ESN consists of a large recurrent neural network, the reservoir (Figure 1), whose weights of connections are often set randomly and sparse. As discussed in [13], the reservoir states are driven by the input signal and then projected to output units. The present article suggests the application of multiresolution analysis by WT and ESN for the development of tools capable of providing wind speed and power generation forecasting. The generated tools seek to support planning of a multisource electrical power system, composed essentially by hydro, thermal and wind generation. The main objective is to provide further insight into the process used to develop a new model that uses ESN for wind speed forecasting. The performance of this model is then compared with one of the different models proposed in [1]. The wind power forecasts for each hour of the day are done through the application of the forecasted speeds into the wind turbine's power curve.
The main difference between the compared models is that rather than using ESN, the authors in [1] used Multilayer Feedforward fully connected Artificial Neural Networks. The model proposed in [1], that will be used here, was named by their authors of WTANNLM. The "WT" prefix means Wavelet Transform and "ANN" is due to the use of Multilayer Feedforward fully connected Artificial Neural Networks. The suffix "LM" refers to the use of the Levenberg-Marquardt-LM training algorithm. These networks architecture consists of an input, an intermediate (hidden) and an output layer, as shown in Figure 2. The goal of the ANN training algorithm is to adjust all weights of the connections at all existing layers. In the case of ESN, the training algorithm is used to adjust only the connections from the reservoir to the output units.  In this article, the networks were replaced by ESN, and the proposed model is called WTESN. The results show significant improvements compared to the forecasts presented in previous articles for the same databases.
Due to the existence of extensive literature addressing the ANN, ESN and WT, their detailed description is not the scope of this work. For more details, we suggest reading the references listed at the end of this article.

Materials and Methods
The materials and methods used for performing this article will be detailed in this section. First, will be presented an overview of the wind speed time series used for the development of the forecasting models and the error criteria used to evaluate their performance. After that, there is a brief description of the data preprocessing through multiresolution analysis with WT. Next, will be shown the structures of the forecasting models and the description of the BASE MATRIX-BM, which serves to facilitate the pattern separation for adjusting the models. In sequence, using a flow chart as reference, will be detailed each one of the principal steps carried out for a better understanding of the developed models. Finally, will be presented the methodology used to convert the forecasted wind speeds to power generation through the power curve of the wind turbine.
The data handling and the development of the proposed models was done with MATLAB ® in its version 7.10.0.499 (R2010a). The 64-bit processor used was the Intel (R) Core (TM) 2 Duo CPU T6600 @ 2.20 GHz, with 4.00 GB of RAM, and the operating system, Windows 7-64 bit.

Arrangement for the Wind Speed Time Series
The models developed in this article are univariate, i.e., wind speed is the only variable used as input for the forecasting models.
The wind speed series used to develop the models are the hourly average values measured in three different cities (Macau, Mossoro and Natal) in the Brazilian state of Rio Grande do Norte. The data were measured at a 10 m height, between January 2008 and December 2008, and each of the three series is composed by 8784 data samples. The statistics of the time series are shown in Table 1.  The composition of a typical day was obtained by calculating the arithmetic average of the data corresponding to each time of the day throughout all the studied year. It is clearly observed in Figure 4 that the wind speed has a daily cyclical variation. The values decrease throughout the day, reaching its minimum at approximately 09:00 (Coordinated Universal Time-UTC). Subsequently, the values increase until reaching its peak between 15:00 and 19:00 (UTC).
Seasonality can be seen in the graph of monthly average wind speeds ( Figure 5). The seasonal behavior of wind in the Brazilian northeast region is extremely important for wind power generation, as there is the possibility of using this type of generation as a way to supplement hydroelectric generation, since in periods of low rainfall, winds are more favorable, and during wet periods winds are weaker.
As the autocorrelation directly affects the performance of univariate models, it was used for the definition of the forecasting model inputs. Figure 6 depicts the autocorrelation function of the wind speed. Because of the winds' daily cyclic behavior, as expected, it can be observed that the series autocorrelation has another maximum 24 h after the initial peak.

Performance Evaluation
The error metrics used in this article to evaluate the performance of a forecasting model were Mean Squared Error-MSE, and the Mean Absolute Error-MAE, defined as follows: where k-forecasting horizon (number of time-steps); N-number of data used for the model evaluation; e s (t + k|t)-error corresponding to time t + k for the wind speed forecast made at origin time t. The errors defined in (1) and (2) refer to the wind speed forecasts. The Normalized Mean Absolute Error-NMAE was used to illustrate the performance of the power generation. The equation used to calculate it is defined as follows: where k-forecasting horizon (number of time-steps); N-number of data used for the model evaluation; P inst -installed wind power capacity; e P (t + k|t)-error corresponding to time t + k for the power generation forecast made at origin time t.
Probably the most common reference model used in the time series prediction is the Persistence Model [14]. This simple predictor tool states that the variable's future value will be the same as the last one measured, i.e.,: where y PERS (t + k|t)-wind speed (or power generation) forecast for time t + k made at origin time t; y(t)-measured wind speed (or power generation) at time t.
In order to calculate the improvement of the forecasts obtained with the proposed models in relation to PM, the following equation was used: where k-forecasting horizon (number of time-steps); EC-considered Evaluation Criteria, which can be MSE, MAE or NMAE.

Multiresolution Analysis
The wind speed have a highly oscillatory behavior, so developing models that provide satisfactory forecasts is a very complex task. In order to reduce the noise caused by the wind variability, a multiresolution analysis by WT is proposed in this article.
The multiresolution analysis decomposes time series into different time scales. As result, the decomposed signals are both an "approximation" and "details" of an original signal.
After preliminary tests, it was found that the Daubechies family wavelets were more robust for the models developed in this article. The level of decomposition was set to 3. It should be emphasized that the results are strongly influenced by both the level of decomposition and the kind of wavelet mother.

Structure of the Forecasting Models
The WTANNLM and WTESN models have the same input and output patterns, i.e., sixteen inputs and one output. The input data are the wavelet coefficients from the multiresolution analysis of the last four hourly average wind speeds, using the Daubechies wavelet family (db10) decomposed at level 3. The output is the forecast of the hourly average wind speed for a given forecasting horizon k. Figure 7 schematically illustrates the models.

Mounting of the BASE MATRIX
Before performing the models' training, a matrix of input and output patterns was set up for each series of wind speeds, and for each forecasting horizon (1 to 24 h ahead). The main purpose of these matrices is to further the manipulation of the training patterns. In total, 72 matrices were mounted (3 locations and 24 different hours ahead).
The BASE MATRIX-BM is a matrix with m rows and n columns. The number of rows depends on the size of the wind speed and of the forecasting horizon. The number of columns is determined by summing the number of inputs and outputs (targets) of the forecasting models.
Given a forecasting horizon k, and a wind speed time series with w elements, the number of rows will be determined by the following expression: m = w − 3 − k, with k = 1, 2, . . . , 24, and w ≥ 28.
As there are sixteen inputs and one output for the networks, then, the number of columns is equal to seventeen (n = 17). Let us call the elements of the time series as s 1 , s 2 , ..., s w . Then, for each element s i , there are four corresponding elements a i,3 , d i,1 , d i,2 , d i,3 , where a i,j and d i,j corresponds, respectively, to the approximation and detail coefficients of the element s i for the level j. The BM is filled as presented in Table 2.

Development of the WTANNLM Model
A flow chart that illustrates the development process for the WTANNLM model is presented in Figure 8. For a better understanding of the process, we will explain in the following subsections the steps outlined on the diagram. All the procedures implemented in the following subsections aim to improve the generalization power of the neural networks, as addressed in [15,16], avoiding the memorization of the wind speeds patterns during the training of the models. Models based on ANN demand a pre-treatment of the data to ensure good performance during network training. Data normalization is the most commonly used method in preparation for ANN training and testing, because it tries to ensure that the same importance will be given to all inputs and targets of the neural network in weights' tuning. And this is also why artificial neuron activation functions are limited. In this article, the BM elements were normalized to values in the range from 0 to 1, according the following equation: where S and S are the wind speed non-normalized and normalized, respectively; S max and S min are the maximum and the minimum absolute value of the wind speed, respectively. Due to the use of preprocessing with wavelets, normalization was performed separately on three different groups: one group formed by columns of the BM referring to the coefficients of approximation (columns 1, 5, 9 and 13). The second group was formed by columns referring to the coefficients of details (columns 2, 3, 4, 6, 7, 8, 10, 11, 12, 14, 15 and 16). The third group refers to outputs (column 17).

Shuffle
As the training algorithm is based on a gradient descent method, if the data is given in some meaningful order, this can bias the gradient and lead to poor convergence. Thus, after normalization, the NORMALIZED BASE MATRIX-NBM rows were shuffled.
The shuffling of the NBM rows was done as follows: using the MATLAB function "rand", we created a vector whose number of elements is equal to the number of rows of the NBM. Then, applying the MATLAB function "sort" to the generated vector, we stored the indexes of the original elements positions; and finally, we used these indexes to reorder the NBM rows.

Database Partition
The NBM shuffled rows were used to create the datasets for adjusting the weights and bias of the neural networks. The database was divided into 60%, 30%, and 10% for training, validation and testing sets, respectively. Calling m the number of rows of the NBM, the separation sequence was defined as follows: training set-rows 1 to 0.60 m; validation set-rows (0.60 m + 1) to 0.90 m; test set-rows (0.90 m + 1) to m.

Selection of the Best Architecture
The RNA learning process is strongly influenced both by the number of patterns in the training set and by the free parameters (weights and biases), and also by the complexity of the problem addressed, as shown in [16]. Although the relationship between these variables can be defined deterministically, part of the scientific community prefers to use some practical rules for this definition. In this work, the variables are restricted to the number of free parameters, i.e., to the ANN architecture, because the database's size already limits the number of training patterns and the complexity is intrinsically defined by the kind of problem handled.
In this regard, the ANN architecture selection was defined by changing the amount of neurons in the hidden layer from 3 to 15, and selecting the one with the best performance in the training validation set, as described in [1]. For each hidden layer size, the weights and bias of the neural network were initialized ten times. The maximum number of training epochs was equal to 500. After the end of the process, the number of neurons in the hidden layer was defined by the lowest average MSE for the initializations.

Selection of the Best ANN
The 10-fold cross-validation method was applied to select the best RNA for each neural model. The training, validation and test sets were defined from ten mutually exclusive partitions composed by the random division of the rows of NBM. Each experiment, in total 10 for each forecasting horizon of a specific wind speed series, was performed by setting the partition d = 1, 2, 3, ..., 10 to test the ANN; 6 of the remaining partitions for training; and the last 3 for validation.
When applying this technique, it is expected that the MAE of the ANN predictions will tend to the MAE for the experiment's test set, as discussed in [15,16]. In this article, the selection of the best ANN was done based in the fold with lower MAE, averaged in ten initializations. Again, the maximum number of training epochs was equal to 500.

Development of the WTESN Model
A flow chart that illustrates the development process for the model WTESN is presented in Figure 9. For a better understanding of the process, we will explain in the following subsections the steps outlined on the diagram. All the procedures implemented in the following subsections aim to improve the generalization power of the Echo State Network.

Normalization
Normalization is performed exactly as presented for the WTANNLM model.

Database Partition
The NBM shuffled rows were used to create the datasets for adjusting the weights of the readout. The database was divided into 90% and 10%, for training and testing sets, respectively. Calling m the number of rows of the NBM, the separation sequence was defined as follows: training set-rows 1 to 0.90 m; test set-rows (0.90 m + 1) to m.

Selection of the Best Reservoir Size
The best reservoir size (i.e., the number of neurons in reservoir) was defined by changing the number of neurons in the reservoir, and selecting the one with the best performance in the test set during the training process. The procedure for each location and for each forecasting horizon was performed as follows: It was created an ESN containing 100 neurons in the reservoir, with hyperbolic tangent as activation functions; II. Random weights were assigned to the reservoir weight matrix, and the matrix was rescaled to yield a spectral radius equal to 0.8; III. Supervised training was performed through the MATLAB function "train_esn", specific function from the toolbox in [17], in offline mode and with parameter "nForgetPoints" equal to 100; IV. The MSE value for the test set was calculated; V. The number of neurons in the reservoir was increased by 100 units, it was returned to the step II, and this "loop" continued until the step IV of the reservoir with 1000 neurons was performed; VI. The best reservoir size was set equal to that one which presented the lowest average MSE for the test set.
After completing the steps above, the number of reservoir neurons for each of considered sites (Macau, Mossoro and Natal) was determined. For all of them, the least average MSE value was obtained with 1000 neurons in the reservoir. It was observed in all cases that increasing the number of neurons resulted in reduction of average MSE, however runtimes to training increased significantly. Some more training with a higher quantity of neurons were performed, but the reduction in average MSE was not significant.

Selection of the Best ESN
The next step after determining the reservoir size, which provides the final architecture of ESN, is the definition of the ESN that presents better generalization ability to forecasting wind speeds for each location and for each of the different forecasting horizons.
The best ESN was defined by changing the spectral radius of the reservoir weight matrix and the activation functions of its neurons, selecting the one with the best performance in the test set during the training process. The procedure for each location and for each forecasting horizon was performed as follows: It was created an ESN containing 1000 neurons in the reservoir, with hyperbolic tangent as activation functions; II. Random weights were assigned to the reservoir weight matrix, and the matrix was rescaled to yield a spectral radius equal to 0.1; III. Supervised training was performed through the MATLAB function "train_esn", specific function from the toolbox in [17], in offline mode and with parameter "nForgetPoints" equal to 100; IV. The MSE value for the test set was calculated; V. The spectral radius was increased by 0.1, it was returned to the step II, and this "loop" continued until the step IV with spectral radius equal to 1.4 was performed; VI. It was returned to the step I changing the activation functions to logistic sigmoid (in the first time that step VI was reached) or identity (in the second time that step VI was reached); VII. The best ESN was set equal to that one which presented the lowest average MSE for the test set.
After completing the steps above, the best ESN for each of considered sites was determined. The average MSE are summarized in Table 3. It is presented that when utilizing hyperbolic tangent and logistic sigmoid, lower values for the MSE was obtained for spectral radii between 1.1 and 1.4. With the use of identity function, there were convergence problems for spectral radiuses greater or equal to 1 (one), and this is the reason why the last four rows of Table 3 are marked in grey color and with the text Did Not Converge (DNC). Despite this, for each one of the three sites, the best ESN (those that presented the lowest average MSE for the 24 h forecasting horizons of test sets, marked in green in Table 3), has spectral radius equal to 0.9 and uses the identity as activation function of the reservoir neurons. The behavior of MSE for all tested ESN is presented in Figures 10-12. The rectangles colors are a Red-Green-Blue (RBG) mapping based on interpolated MSE values of their vertices, whose coordinates refer to the spectral radius (in the axis of the abscissa) and to the forecasting horizon (in the axis of the ordinates). The gray color was used to indicate that there was no convergence in at least one of the rectangle's vertices.

Conversion of Wind Speed to Wind Power Generation Forecasting
As the selected turbine has a hub height designed to 57 m and the wind speed series were measured at a 10 m height, and consequently the forecasts will be realized at this latter height too, it is necessary to convert the wind speed from 10 to 57 m before transforming it into wind power using the wind turbine manufacturer's power curve. The change in speed or direction over some distance is called wind shear. According to [18], the wind shear at a ground-level surface increases the wind speed with height, as follows: where s 0 is the wind speed measured at the reference height h 0 ; s 1 is the wind speed estimated at height h 1 ; and α is the friction coefficient (also known as Hellmann exponent), which represents the ground surface friction. In the present article, as in [1], the friction coefficient was equal to 0.1.

Results
As previously mentioned, the models were developed to forecast the hourly mean wind speeds with horizons ranging from 1 to 24 h ahead, for three different locations of Brazilian Northeast. These cities are: Macau, Mossoro and Natal.
The following subsections are divided in the following way: first, development of the models is presented; in sequence, comparisons between forecasts of the proposed models and PM are done. To avoid the use of several similar figures to illustrate the results of the simulations, it was chosen to present only some results for MACAU. Similar patterns and results were observed for all forecasting horizons and analyzed wind speed series.

Best Neural Networks for the WTANNLM Model
The MAE values for the test set with WTANNLM for one, six and twenty-four hours ahead are presented in Figure 13. For some initializations, due to a not adequate choice of the starting point, MAE values were higher. It occurs when the training algorithm did not converge, ending the process once in the first few training epochs.

Best Echo State Network for the WTESN Model
The MAE of test sets for the best ESN are presented in Figure 14. It can be observed that for all considered locations, as expected, the error increases when increasing the forecasting horizon.

Forecasts and Comparisons between the Models
This subsection presents comparisons between the performances of the WTANNLM and WTESN models with those obtained using PM (4). The criteria for selecting the periods was based on the choice of months that had no gaps in the wind speed data. It is important to emphasize that the data for these periods have never been used for training the WTANNLM and WTESN models.

Wind Speed Forecasts
The MAE values of wind speed forecasts are presented in Figure 15. As can be observed, the WTESN model proposed in this article provided better forecasts than PM and WTANNLM for all look-ahead times. For lower forecasting horizons (mainly up to 12 h ahead), the average errors of forecasts with the WTESN model are significantly lower than those obtained with PM. From 12 to 24 h ahead, the average errors of the WTANNLM model hardly vary, while these errors reduce for the PM.

Power Generation Forecasts
The power curve of the wind turbine with rated capacity of 2300 kW used in this article is presented in Figure 16. As manufacturers only provide a few points of the curve, it was parametrized at four different intervals through the least squares method, enabling the power estimation at any speed between the cut-in and cut-out. The NMAE for the power generation forecasts are presented in Figure 17 and its values are presented in Table 4, in which the minimum and maximum NMAE values marked in green and red, respectively. For the WTESN model, the maximum NMAE is 5.91% of the wind power installed capacity, while for the PM, the maximum NMAE is 13.13%. As expected, the minimum NMAE occurs for one hour ahead, with values equal to 1.88% for WTESN, 2.15% for WTANNLM and 4.73% for PM. The results for Mossoro and natal are also presented in Table 4.

Improvements of Power Generation Forecasts
The improvements of NMAE (5) regarding PM (4) for the power generation forecasts are presented in Figure 18, in which can be observed that for forecasts up to 15 h ahead, improvements of the WTESN model are greater than 60%. Figure 18. Improvements of NMAE with respect to PM of wind power forecasts for Macau.

Discussion
The models developed in this article are based on preprocessing the wind speed time series through multiresolution analysis by WT, with posterior wind speed forecasting by ESN and power forecasting obtained via wind turbine's power curve. The presented forecasting horizons fall within short-term forecasts up to twenty-four hours ahead, which is appropriate to support the operation planning of electrical power systems, since the startup of a thermal power plant must be planned in advance and the starting time varies from one plant to another.
According to [19], for wind farms in flat terrain, typical NMAE for wind power forecasting is around 6% of the installed capacity for the first 6 h, rising to 9-11% for 48 h ahead.
The average NMAE value for 3 h ahead with the hybrid model proposed in [23] is equal to 1.65%, and the average NMAE value for the PM is 5.24%. In [24], the NMAE value varies from 2 to 16% for forecasting horizons between 1 to 24 h ahead, respectively.
In [25], the authors present one hour ahead wind power forecasts based on random forests-RF for a one-year period. Results were compared with PM and ANN. The best NMAE values presented are equal to 3.49%, 3.67% and 4.25% for RF, ANN and PM, respectively.
The daily average NMAE for hourly wind power forecasts obtained with the proposed approach in [26] was equal to 0.48%, while for the PM, NMAE was equal to 1.01%. Care should be taken when analyzing such low values for NMAE. The authors in [26] did not emphasize, however, according to the graphs presented in that article, it can be observed that for a great part of the time wind power generation remains at very low or zero values. Thus, forecasting errors naturally tends to zero, as can be seen in the reduced NMAE for the PM.
The results obtained with WTESN are promising when compared with the state-of-the-art forecasters. As can be calculated by Table 4, the average NMAE of first six, twelve, eighteen and twenty-four hourly power generation forecasts are 2.32%, 2.87%, 3.59% and 4.16%, respectively. The average improvement of NMAE for the first six, twelve, eighteen and twenty-four hourly power generation forecasts are 70.87%, 71.99%, 67.77% and 58.52%, respectively.

Conclusions
When compared to PM and a model proposed in literature that uses ANN, the WTESN model provided better forecasts for all the look-ahead times considered, especially for the shorter ones (up to 12 h ahead). The average improvement of NMAE for the horizon of twelve hours ahead was above 70%. These results of improvements in relation to the PM are among the best published results to date in the field of wind power generation forecasts. Here is observed that the forecasts quality is strongly influenced by the autocorrelation series for the PM because it does not use WT.
Decomposition of wind speed by WT allow models to extract relevant information about the series, which significantly improve the forecasts when the decomposed signals are used as inputs.
Through the analysis, it was found that the behavior of forecast errors for different horizons was similar for all the three sites (as presented in Table 4). The adopted methodology was adequate, assuring statistically reliable forecasts, that is, the models acquired generalization capacity.
When comparing the performance of ANN and ESN, the results show that both are powerful time series forecasting tools, but the ESN proved to be better suited for wind power forecasting.