Applying Wavelet Filters in Wind Forecasting Methods

Wind is a physical phenomenon with uncertainties on several temporal scales; in addition, measured wind time series have noise superimposed on them. These time series are the basis for forecasting methods. This paper studied the application of the wavelet transform to three forecasting methods, namely stochastic, neural network, and fuzzy methods, and to six wavelet families. Wind speed time series were first filtered to eliminate the high-frequency component using wavelet filters, and then the different forecasting methods were applied to the filtered time series. All methods showed important improvements when the wavelet filter was applied. It is important to note that the application of the wavelet technique requires a deep study of the time series in order to select the appropriate family and filter level. The best results were obtained with an optimal filtering level, and improper selection may significantly affect the accuracy of the results.


Introduction
When the penetration of wind power into the network reaches a certain level, system operators have difficulties in balancing generation with demand. To help address this issue, it is necessary to apply forecasting methods to estimate the wind power generated in the next few hours and days.
It is difficult to compare different methods if they do not use the same dataset and the same performance indexes. A typical approach for comparison is to use the persistence model as the reference [12]. Table 1 illustrates the improvement achieved by different forecasting methods compared to the persistence method (although different forecast horizons are used, the list gives an idea of the range of improvement). These improvements are rarely over 25%.
Wind speed series have considerable uncertainty because of weather fluctuations and the added instrument uncertainty. This uncertainty and noise make it difficult to improve the forecasts. There are several strategies to process the data [13][14][15]. Some authors have used wavelet transforms [16] to process the time series. Most authors [17][18][19][20][21][22][23][24][25][26][27][28][29][30][31] have used wavelets to decompose the time series into sub-series, called the approximation and the details; applied the forecasting method to each sub-series; and, finally, summed the forecasting results to obtain the final solution. The advantage of this method comes from the sub-series having improved performance with respect to the original series. A few authors [32][33][34][35][36][37] have used other wavelet filtering techniques to eliminate the high-frequency variations and smooth the time series. In all of these papers, the authors selected the wavelet family and decomposition level without much justification.

Table 1. Improvement achieved by different forecasting methods compared to the persistence method.

Authors | Forecast Method | Forecast Horizon | Improvement vs. Persistence Model
[1] | AR | -- | --
[2] | ARMA | 1-10 h | 12-20%
[3] | ARMA | 20-10 | 5-12%
[4] | ARMA | 6 h | 19%
[5] | Kalman filter | 1 | 4-10%
[6] | Kalman filter | -- | --
[7] | ANN | 1-10 | 11-8%
[8] | Spatial/ANN | 2-2 h | 15-25%
[9] | ANN | 1-1 h | -
[10] | Neuro-fuzzy | 6-36 h | 46%
[11] | Neuro-fuzzy | 10 | 5%

In this paper, the wavelet transform was analyzed thoroughly. This work demonstrated that the selection of the wavelet family and decomposition level is far more important than it has been given credit for thus far. The improvement obtained was greater than that achieved with most new forecasting methods. The result was applied to the three main forecasting methods currently used, namely statistical, neural network, and fuzzy methods. These were applied to several forecast horizons and sample times. In all cases, the results for each method improved when the optimal wavelet filter was applied. Finally, the main contribution of the paper is to highlight the importance of data processing and to propose it as an additional phase in the forecasting method, so that both steps are optimized together.
The rest of this paper is structured as follows. Section 2 explains the basic concepts of the wavelet transform. Section 3 presents the different forecasting approaches. Section 4 describes the forecasting approach proposed. In Section 5, the comparison criteria to evaluate the improvement of each method are explained. Section 6 presents the results for the different methods considered. Finally, Section 7 draws the main conclusions of this research.

Basic Concepts of the Wavelet Transform
Fourier analysis is commonly used to help analyze different types of signals. With this method, a signal f(t) is expressed as a linear decomposition of real-valued functions of t, as shown in Equation (1):

f(t) = Σ_k a_k φ_k(t) (1)

where a_k are the real-valued expansion coefficients and φ_k(t) are a set of real-valued functions of t called the expansion set. In Fourier series, these are sin(kω_0 t) and cos(kω_0 t), with frequencies of kω_0.
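As a quick numerical illustration of this expansion idea (not taken from the paper), the sketch below evaluates the partial Fourier sum of a square wave, whose expansion set contains only odd sine harmonics with coefficients 4/(πk); the function name is hypothetical.

```python
import math

def square_wave_partial(t, n_terms):
    """Partial Fourier sum of a unit square wave: (4/pi) * sum over odd k of sin(k*t)/k."""
    return sum(4.0 / (math.pi * k) * math.sin(k * t)
               for k in range(1, 2 * n_terms, 2))
```

With enough terms, the partial sum approaches +1 on the positive half-period and -1 on the negative half-period, showing how the expansion coefficients reconstruct the signal.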

Wavelet Transform
An introduction to the wavelet transform can be found in [16]. In Equation (2), the signal is decomposed into coefficients a_j,k and functions Ψ_j,k(t), which depend on the parameters j and k:

f(t) = Σ_k Σ_j a_j,k Ψ_j,k(t) (2)

where Ψ_j,k are the wavelet expansion functions and a_j,k is the discrete wavelet transform of f(t), or the set of expansion coefficients. The wavelet expansion functions, or family of wavelets, are generated from a mother wavelet Ψ(t) by scaling and translation:

Ψ_j,k(t) = 2^(j/2) Ψ(2^j t − k) (3)

where the parameter k translates the function and the parameter j scales it. Figure 1 shows the translating and scaling operations of the function Ψ(2^j t − k).

Multi-Resolution Formulation of Wavelet Systems
In multi-resolution analysis, the resolution of the approximation of f(t) depends on the choice of j in Equation (2). For a value of j = j0, the equation is:

f(t) = Σ_k c_j0,k φ_j0,k(t) (4)

For low values of j, the approximation of f(t) can represent only coarse information. In the multi-resolution formulation, the φ_j0,k(t) are called scaling functions. If we want to represent detailed information, then high values of j are required.
However, there is another way to describe a signal with better resolution without increasing j. This new approach consists of describing the differences between the approximation and the original signal with a combination of other functions, called wavelets Ψ_j,k(t), and the coefficients d_j,k, as shown in Equation (5):

f(t) = Σ_k c_j0,k φ_j0,k(t) + Σ_k Σ_{j≥j0} d_j,k Ψ_j,k(t) (5)

The parameters k and j indicate the translation and scaling of the function.
There are several packets of scaling functions φ(t) and wavelets Ψ(t), related as shown in Equation (6) (see Figure 2), which are chosen depending on the signal that has to be approximated:

φ(t) = Σ_n h_0(n) √2 φ(2t − n)
Ψ(t) = Σ_n h_1(n) √2 φ(2t − n) (6)

In Equation (6), the coefficients h_0(n) and h_1(n), with n ∈ Z, are a sequence of real numbers called filter coefficients. The process is similar to digital filters, where h_0(m − 2k) acts as a low-pass filter and h_1(m − 2k) acts as a high-pass filter. Figure 3 shows the decomposition process of c_j into c_j+1 (low frequency) and d_j+1 (high frequency). The j+1 level scaling coefficients are:

c_j+1(k) = Σ_m h_0(m − 2k) c_j(m)
d_j+1(k) = Σ_m h_1(m − 2k) c_j(m) (7)

These expressions represent the approximation and details of the signal for a j + 1 level scaling, where m = 2k + n.
This process can be repeated iteratively to reduce the high-frequency component, as shown in Figure 4. Figure 5b shows the approximation c_2(k) and the details d_2(k), d_1(k), and d_0(k) of the original signal f(t) shown in Figure 5a.
Wind speed time series have a high-frequency component due to wind gusts, measurement errors, and random events, as well as a low-frequency component with slower variation. The high-frequency component of the signal introduces a lot of noise into forecasting methods, causing them to perform poorly. If this component is eliminated and the forecasting methods are applied to an approximation with only the low-frequency component, improved results can be obtained.
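The iterated filter bank of Equation (7) can be sketched for the simplest family, Haar, whose filter coefficients are h_0 = [1/√2, 1/√2] and h_1 = [1/√2, −1/√2]; a real study would use a wavelet library (e.g. PyWavelets) and the families discussed later, so this is only a minimal pure-Python illustration.

```python
import math

def haar_step(c):
    """One analysis step: split scaling coefficients c_j into c_{j+1} and d_{j+1}."""
    s = math.sqrt(2.0)
    approx = [(c[2 * k] + c[2 * k + 1]) / s for k in range(len(c) // 2)]  # low-pass
    detail = [(c[2 * k] - c[2 * k + 1]) / s for k in range(len(c) // 2)]  # high-pass
    return approx, detail

def haar_decompose(signal, levels):
    """Iterate the filter bank (as in Figure 4): (approximation, list of details)."""
    c, details = list(signal), []
    for _ in range(levels):
        c, d = haar_step(c)
        details.append(d)
    return c, details
```

Filtering in the sense of this paper keeps only the final approximation and discards the detail lists.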

Wavelet Families
There is a large number of wavelets. The selection of the wavelet function depends on the problem and the properties of the wavelet function [16]. The main properties are its region of support and the number of vanishing moments. The region of support affects its localization capabilities, whereas the vanishing moments limit the ability of the wavelet to represent the information of a signal. In this paper, the wavelet families used were: Haar, Daubechies, Symlet, Coiflet, Biorsplines, and Meyer.
There are some methods to select the optimal wavelet family, but they have been developed for specific applications and it is not certain that they can be applied to forecasting problems:

• In the cross-correlation method [38], the optimum wavelet maximizes the cross-correlation between the signal of interest and the wavelet;
• In the energy method [39], the aim is to maximize the energy of the signal of interest; and
• In the entropy method [40], the best wavelet minimizes the entropy of the signal of interest.

Forecasting Models
The wavelet filter was applied to several forecasting methods, namely regression, neural network and fuzzy models.

Persistence Model
In the persistence model, the variable value at t + Δt is equal to the variable value at t. Due to its simplicity, this model was used as the reference:

y_t = y_{t−1} (8)


Regressive Model
This model [41] is based on multiple regression, which studies the relations between a dependent variable and a set of independent variables. Among the independent variables, there are exogenous variables, such as temperature, and intrinsic variables, such as the historical values of the variable itself. When the model only uses these historical values, it is called an autoregressive time series model. In this work, the model used the historical values:

y_t = Σ_{i=1..p} α_i y_{t−i} (9)

where α_i represents the autoregressive parameters and p is the number of past values.
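The paper does not state how the α_i are estimated; a common choice is ordinary least squares on the lagged values, sketched below with hypothetical helper names (`fit_ar`, `predict_next`) and a tiny Gaussian-elimination solver to stay dependency-free.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_ar(series, p):
    """Least-squares fit of y_t = a_1*y_{t-1} + ... + a_p*y_{t-p} via normal equations."""
    X = [list(reversed(series[t - p:t])) for t in range(p, len(series))]
    y = series[p:]
    A = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    rhs = [sum(X[i][a] * y[i] for i in range(len(X))) for a in range(p)]
    return solve(A, rhs)

def predict_next(series, coeffs):
    """One-step-ahead forecast from the fitted autoregressive parameters."""
    p = len(coeffs)
    return sum(c * v for c, v in zip(coeffs, reversed(series[-p:])))
```

On a series generated by a pure AR(1) process, `fit_ar(series, 1)` recovers the generating coefficient.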

Neural Network Model
Neural networks [42] are auto-adaptive dynamic systems that are able to find nonlinear relations between several variables. The model used is a multilayer perceptron, which gives good results in forecasting problems:

y_t = Σ_j w_kj g(Σ_i w_ji y_{t−i} + θ_j) + θ_k (10)

where θ_j and θ_k are the layer thresholds; w_ji and w_kj are the layer weights; i and j index the neurons in each layer; and g is the activation function.
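The forward pass of such a perceptron can be sketched as below; the weights shown in use are illustrative values, not trained ones, and training (e.g. by backpropagation) is omitted.

```python
import math

def mlp_forward(x, w_hidden, theta_hidden, w_out, theta_out, g=math.tanh):
    # Hidden layer: neuron j computes g(sum_i w_ji * x_i + theta_j).
    hidden = [g(sum(w * xi for w, xi in zip(row, x)) + th)
              for row, th in zip(w_hidden, theta_hidden)]
    # Linear output neuron combines hidden activations with weights w_kj.
    return sum(w * h for w, h in zip(w_out, hidden)) + theta_out
```

In this paper's setting, `x` would be the last eight values of the (filtered) wind speed series.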

Fuzzy Model
The fuzzy model [43] is based on concepts of fuzzy set theory, fuzzy if-then rules, and approximate reasoning:

y_t = Σ_i ω_i u_i (11)

where ω_i are the normalized firing strengths; u_i are functions that depend on the inputs y_{t−i}; A_i is the fuzzy set that represents the input variables; and p_i is the membership grade of each input y_t in A_i.
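A minimal Takagi-Sugeno-style evaluation of this idea is sketched below, under the assumption of Gaussian membership functions and linear consequents u_i; the names and rule encoding are hypothetical, not the paper's implementation.

```python
import math

def gauss(x, center, sigma):
    """Gaussian membership grade of x in a fuzzy set A_i."""
    return math.exp(-((x - center) ** 2) / (2 * sigma ** 2))

def tsk_output(inputs, rules):
    """Each rule is (memberships, coeffs): memberships is a (center, sigma)
    pair per input; coeffs is [a_1, ..., a_p, constant] for u_i."""
    firing = []
    for mfs, _ in rules:
        w = 1.0
        for x, (c, s) in zip(inputs, mfs):
            w *= gauss(x, c, s)          # firing strength: product of grades
        firing.append(w)
    total = sum(firing)
    out = 0.0
    for w, (_, coeffs) in zip(firing, rules):
        u = coeffs[-1] + sum(a * x for a, x in zip(coeffs, inputs))  # u_i linear in inputs
        out += (w / total) * u           # omega_i: normalized firing strength
    return out
```

With a single rule, the normalized firing strength is 1 and the output reduces to that rule's consequent.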

Forecasting Approach
Wind time series have high variability due to the intrinsic uncertainty of the wind; this variability negatively influences the forecasting result. In this paper, the adopted approach, illustrated in Figure 6, consists of using an optimized filter based on wavelets to de-noise the data that are used to train the chosen forecasting method. The filter is optimized with a genetic algorithm that selects the best wavelet family and the optimal decomposition level.
The algorithm receives as inputs the wind speed data and the prediction method to be used.
In step 1, a random population of individuals is created. Each individual contains the information of the parameters of the prediction method, the wavelet family, and the level of decomposition.
In step 2, each individual in the population is evaluated. The evaluation has three phases.
The first phase consists of applying the wavelet filter to the input data with the wavelet family and the level indicated by each individual. The data are divided into training and test sets. The original time series is filtered with the wavelet transform and is decomposed into an approximation component and several details of the signal. The approximation component has improved behavior in comparison to the original series in the forecasting process. Therefore, only the approximation component is used in the next phase and the details are discarded. In this phase, there are two important decisions to analyze: the best wavelet family to use and the optimal filter level. These questions have not been answered in the technical literature.
The second phase consists of training the prediction method with the parameters indicated by each individual and the training dataset. A forecasting method is applied only to the approximation component. In this paper, three methods were used to forecast the time series: autoregressive, neural networks, and fuzzy models.
The third phase consists of evaluating the prediction method, already trained, with the test dataset.
In step 3, the best individuals are selected based on the error obtained in the evaluation with the test data. In step 4, the crossover and mutation operators of the genetic algorithm are applied that give rise to a new population.
Steps 2 through 4 are repeated until the termination criterion is reached, which is the number of generations or iterations of the genetic algorithm.
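The loop described in these steps can be sketched as follows. This is a toy, dependency-free version: a moving average stands in for the wavelet approximation (so the family label is a placeholder), and a persistence forecast stands in for the trained prediction method; a real run would filter with a wavelet library and train the chosen model in the fitness evaluation.

```python
import random

FAMILIES = ["haar", "db4", "sym4", "coif2", "bior2.2", "dmey"]  # labels only here

def approx(series, level):
    # Stand-in for the wavelet approximation: moving average whose window
    # grows with the decomposition level (the toy filter ignores the family).
    w = 2 ** level
    out = []
    for i in range(len(series)):
        window = series[max(0, i - w + 1):i + 1]
        out.append(sum(window) / len(window))
    return out

def rmse(pred, real):
    return (sum((p - r) ** 2 for p, r in zip(pred, real)) / len(real)) ** 0.5

def fitness(ind, train, test):
    # Phase 1: filter with the individual's level; phases 2-3: a persistence
    # forecast on the filtered series stands in for the trained model.
    filtered = approx(train + test, ind["level"])
    test_f = filtered[len(train):]
    return rmse(test_f[:-1], test[1:])

def evolve(series, generations=20, pop_size=12):
    random.seed(0)                       # reproducible toy run
    split = len(series) // 2
    train, test = series[:split], series[split:]
    # Step 1: random population of (family, level) individuals.
    pop = [{"family": random.choice(FAMILIES), "level": random.randint(1, 5)}
           for _ in range(pop_size)]
    for _ in range(generations):
        # Step 2: evaluate; step 3: keep the best half as parents.
        pop.sort(key=lambda ind: fitness(ind, train, test))
        parents = pop[:pop_size // 2]
        # Step 4: crossover and mutation produce the new population.
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child = {"family": a["family"], "level": b["level"]}
            if random.random() < 0.2:
                child["level"] = random.randint(1, 5)
            children.append(child)
        pop = parents + children
    return min(pop, key=lambda ind: fitness(ind, train, test))
```

The returned individual carries the selected wavelet family and decomposition level, which is the output the real algorithm feeds to the forecasting stage.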

Forecasting Errors
We compared these forecasting methods with the simpler persistence model used as the reference. The error of each method was measured by the root mean square error (RMSE):

RMSE = sqrt( (1/n) Σ_{t=1..n} (Xpred_t − Xreal_t)² )

where Xpred_t is the predicted value at t; Xreal_t is the real value at t; and n is the number of samples.
The improvement of each method in comparison to the persistence model was calculated with the following equation:

Improvement (%) = 100 × (RMSE_persistence − RMSE_method) / RMSE_persistence

Energies 2021, 14, 3181
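The two measures can be sketched directly; the improvement formula written here is the standard normalized RMSE difference against the persistence reference.

```python
import math

def rmse(pred, real):
    """Root mean square error between predicted and real series."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, real)) / len(real))

def improvement(rmse_method, rmse_persistence):
    """Percentage improvement of a method over the persistence reference."""
    return 100.0 * (rmse_persistence - rmse_method) / rmse_persistence
```

A method with half the persistence RMSE therefore shows a 50% improvement.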

Data Description
The wavelet filter was applied to several forecasting methods: regression, neural network, and fuzzy models. Five wind speed series were used in this work: two series with a high sampling frequency (1 and 1 , respectively) and three with a sampling frequency of 10 . Table 2 shows the main statistical characteristics of these time series, and Figures A1-A5 in Appendix A represent their temporal variation. Several cases were built from these data to observe the performance of the forecasting models when the sampling step (∆t) and the forecasting horizon (FH) changed. An identifier (ID) was assigned to each case in Table 3.

Results
Every time series was divided into two sets of the same length: a training set and a test set. The forecasting models were built with the training set and the results presented here were obtained by applying these models to the test set. Moreover, eight inputs were used in all methods.
First, a detailed analysis of wavelet filtering is presented, aiming to answer whether they are helpful with different prediction methods and whether they depend on factors such as the level of filtering, the prediction horizon, and the sampling frequency. Afterward, we analyze whether it is possible to select the wavelet family by any of the methods described in the literature regardless of the prediction method used. Finally, its application is presented in a specific case.

Influence of Wavelet Filters in Several Forecasting Methods
The best results obtained by applying the wavelet filters to the time series are presented in Figures 7 and 8. Regardless of the model, the forecasting horizon, or the time step, the performance was much better with the wavelet filter than without it when the optimal wavelet family and level were chosen. Detailed results are provided in Tables A1-A6 in Appendix A. It is important to note that the optimal wavelet family and level were different in each case; that is, there was a lot of variability in this point. This fact is contrary to the widespread practice among researchers of choosing these parameters depending on the application.

Influence of Decomposition Level
However, there was great similarity in the performance of the different wavelet families in each case. Each family reached different levels of improvement, but all families achieved their maximum improvement percentage at a similar level. Figure 9 shows the improvement/level rate for a particular case with different forecasting methods. Figure 10 shows the improvement/level rate of the same case and method for different wavelet families (see Tables A10-A13 for details).
These last results explain why researchers can obtain favorable solutions by applying wavelets, even when they do not select the wavelet family and level accurately.

Influence of the Forecasting Horizon
Comparisons made up to now were expressed in percent because this adequately shows the difference between applying the wavelet filters or not. However, it should be remembered that the error (RMSE) increased with the forecasting horizon, as can be seen in Figures 11-13, although less so when the wavelet filters were applied.

Influence of Different Sampling Frequencies
In Figure 14, it can be seen that with low filtering levels, an important improvement was obtained, but with high filtering levels, information was lost and the improvement decreased or even worsened substantially at high sampling frequencies.


Selection of Optimal Wavelet Family
In Table 4, the wavelet families found by the cross-correlation, energy, and entropy methods are shown in the columns "cross-corr", "energy", and "entropy", respectively. The column "optimum" shows the wavelet family that gave the best results in our tests. These methods were applied to the time series with poor results. The cross-correlation method obtained the correct result in cases 7, 30, and 31; the energy method in cases 1, 5, 13, 15, 18, 25, 27, 32, 33, 34 and 3; and the entropy method in cases 5 and 33.

Applying the Forecasting Approach
The importance of using filtered data is illustrated in the following example. Figure 15 shows the first 300 data points (to appreciate the detail) of the original data series of case 22, the data series filtered with the wavelet family "dmey" and a filter level of 2, and the difference between the two series. Figure 16 shows the results of the forecast made with the neural network trained with the filtered training set, and Figure 17 shows the results of the forecast made with the neural network trained with the unfiltered training set. The results using correctly filtered data were considerably better than those with the unfiltered data.




Conclusions
In this paper, the forecasting models were applied to the approximation component of the wavelet decomposition and the details components were discarded, as opposed to most authors who use both components in their forecasting models.
A deep analysis of the wavelet filter in results was made, and the conclusions will enable improvements in all forecasting models.
The wavelet filter method was applied to different forecasting models: regression, neural network, and fuzzy models. In all models, this technique (wavelet filter + forecasting model) improved the obtained results compared to the case when only the forecasting model was used. The improvement of these methods versus the persistence method was between 2% and 30%, but with the wavelet filter method, it was between 20% and 50%.
The study was extended to several wavelet families. Improvements were obtained in all cases, but it was not easy to select the best family, and the existing selection methods did not work for the proposed approach.
In most cases, the filtering level was more important than the wavelet family for obtaining good results. The optimum level was between 2 and 5 for all wavelet families.
As a final conclusion, it seems necessary to use an optimization algorithm to select the wavelet family and level.
It has become clear that it is not easy to determine the parameters of the data processing methods and that they significantly influence the results obtained. Hence, future research will address the joint optimization of the data processing and the forecasting method.
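The suggested optimization over wavelet family and filtering level can be as simple as an exhaustive search on a validation set. The sketch below is a hypothetical illustration: `filter_and_forecast` is an assumed user-supplied function that filters the series with the given family and level, runs the forecasting model, and returns the validation RMSE.

```python
# Hypothetical grid search over wavelet family and filtering level,
# selecting the combination with the lowest validation RMSE.

def select_wavelet(series, families, levels, filter_and_forecast):
    """Return (rmse, family, level) for the best combination found."""
    best = None
    for family in families:
        for level in levels:
            rmse = filter_and_forecast(series, family, level)
            if best is None or rmse < best[0]:
                best = (rmse, family, level)
    return best

# Toy example: a made-up scoring function whose minimum is at ("db4", 3);
# a real application would train and validate the forecasting model here.
def toy_score(series, family, level):
    return abs(level - 3) + (0.0 if family == "db4" else 0.5)

best = select_wavelet([0.0], ["haar", "db4", "sym4"], range(1, 6), toy_score)
```

Since the optimum level was found to lie between 2 and 5 for all families, the search space stays small and exhaustive evaluation is feasible.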

Data Availability Statement:
The data presented in this study are available in supplementary material.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
Below are the graphical representations of the measured wind speed data in each time series (Figures A1-A5).
In each table, the column FH is the forecasting horizon and Δt is the time step in the look-ahead period; then follow the RMSE of the persistence model, the RMSE of the study model without the wavelet filter, and the RMSE of the study model with the wavelet filter; finally, we give the improvement of the model, with and without the wavelet filter, versus the persistence model. In Tables A10-A13, we can see the effect (in %) of using different wavelet families and different filtering levels j. The best results are marked in bold and underlined. The abscissa axis is the filter level and the ordinate axis is the wavelet family.
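The improvement index reported in the tables is the percentage reduction in RMSE relative to the persistence reference. A minimal sketch of this calculation, with illustrative (not measured) RMSE values:

```python
# Improvement (%) of a forecasting model over the persistence model,
# computed from their RMSE values, as reported in the appendix tables.

def improvement(rmse_persistence, rmse_model):
    """Percentage reduction in RMSE relative to persistence."""
    return 100.0 * (rmse_persistence - rmse_model) / rmse_persistence

# Illustrative values: persistence RMSE 2.0 m/s, filtered model 1.2 m/s,
# giving roughly a 40% improvement.
imp = improvement(2.0, 1.2)
```

The same formula is applied to the model without the wavelet filter, so the two improvement columns in each table can be compared directly.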