Innovative Hybrid Modeling of Wind Speed Prediction Involving Time-Series Models and Artificial Neural Networks

Abstract: This work proposes hybrid models combining time-series models (using linear functions) and artificial intelligence (using a nonlinear function) that can be used to provide monthly mean wind speed predictions for the Brazilian northeast region. These predictions might be useful for wind power generation; for example, knowing future wind speed values could provide important information on how usable the local wind potential is for a possible wind power plant. To create the proposed hybrid models, it was necessary to set the wind speed variable as a dependent variable of exogenous variables (i


Introduction
The development of renewable energy research is growing all over the world. This can be justified by the occurrence of climate change, which many scientists claim is a result of increased anthropogenic gas emissions (i.e., CO2, NOx, SOx). Climate change causes negative effects such as an increased greenhouse effect, acid rain, and the degradation of the ozone layer [1]. The benefits of using renewable sources of energy mainly address economic and environmental concerns. Thus, renewable energy sources can give future generations the opportunity to live in a healthier world and are fundamental to humanity. It is, therefore, necessary to invest more in clean and renewable energy research in order to reach sustainable development [2].
One of the renewable energy sources that contributes most to sustainable development is wind power (the generation of electric energy from the wind), which is growing meaningfully all over the world [3]. This growth is certainly motivated by research that supports its viability in economic and environmental terms. The impulse may have started from constant debates within the academic community, which warned of the need to seek alternatives to conventional energy sources that use the combustion of petroleum, gas, or coal, and of the need to attenuate climate change.
Brazil can be considered one of the countries most capable of generating electricity from renewable resources. For example, its main sources of energy are hydropower plants, which represent approximately 65% of the nation's total energy generation according to the Brazilian Energetic Budget of 2014 [4]. According to the Brazilian Wind Power Atlas, it is estimated that about 143 GW of the country's energy is generated from the wind [5], 51% of which was generated in the northeast region. Of all this generated power, the Ceará state (where the annual mean wind speed is 8.5 m/s at 50 m) produces approximately 18 GW [6]; this has drawn the attention of wind power sector companies to the state. Ceará's annual mean wind speed is quite significant compared to those in the European Wind Atlas, especially in Germany, where the mean wind speed ranges from 5.0 to 6.0 m/s at the same height in a coastal area [7].
A wind generation research technique that might contribute to its further growth is wind speed forecasting, which can assist with local wind exploration. In Liu et al. [8], the authors use the autoregressive integrated moving average (ARIMA) model, also known as Box-Jenkins modeling [9], to predict wind speed with the aim of protecting running trains against the strong cross winds along the Chinese Qinghai-Tibet railway; a strong wind warning system was developed. Another methodology that has been studied to forecast wind speed uses artificial neural networks (ANN). For example, in Cadenas and Rivera [10], a short-term prediction was performed for the La Venta region, in Oaxaca, Mexico. Yet another model that has been studied for wind speed prediction is simple exponential smoothing (SES), which uses weighted averages in order to give greater weight to more recent observations; for example, in Cadenas et al. [11] the authors show that SES produces smaller errors than ARIMA and ANN models when applied to a Mexican coastal area. It is also possible to find papers that combine ARIMA and ANN models to predict time-series, i.e., the so-called hybrid models. An example is Liu et al. [12], where the application of this type of modeling to wind speed prediction in wind farms in China was analyzed.
This paper proposes hybrid models which combine time-series models (through linear functions) with artificial intelligence (through a nonlinear function), with the main objective of providing monthly mean wind speed predictions for the northeast region of Brazil. This could be useful to the wind power sector, for example, in acquiring important information on how local wind power can be harnessed. The intention is to identify the reliability of the proposed models in representing observed data and, from there, develop something that might be used in the wind sector's areas of interest. Such models could give more assurance to this sector's decision-makers regarding the wind exploration of a given region, as they could forecast the intensity of local wind speeds and, thus, help to determine the future wind regime of the region.

Materials and Methods
In this section we comment on the data of the study regions, as well as on the forecast models used. The accuracy measures are presented to identify the quality of the adjustments produced by the models as compared to the observed time-series. All calculations and graphics in this study were produced with the free software R (https://www.r-project.org/ (accessed on 30 January 2017)), which has been widely used in several academic activities.

Regions under Investigation and Wind Speed Data
The regions of study in this work are Fortaleza (3.77W). These regions are all located in the northeast of Brazil, as shown in Figure 1. The meteorological data used are wind speed (in m/s), pressure (in hPa), temperature (in °C), and precipitation (in mm). The latter three are independent variables (also known as exogenous variables), which are explained below. All of these data are monthly averages, with 144 measurements starting in January 2003 and ending in December 2014. They all serve as input to the models, which were run in the R software. The meteorological data used were originally measured at a height of 10 m and were collected from anemometric towers at airports in the respective localities. Pressure, temperature, and precipitation data were also collected there. All of the data mentioned belong to the Brazilian Government.
Atmosphere 2018, 9, 77
To apply the meteorological data to the models proposed in this paper (with the aim of predicting monthly mean wind speed), the following phases were followed. (1) First, the adjustment phase, which consists of introducing the observed data (i.e., wind speed as the dependent variable and pressure, temperature, and precipitation as exogenous variables) to the forecast models, so that the respective adjustments can be produced using the period from January 2003 to December 2014. It should be emphasized that all meteorological variables mentioned were complete in this period, i.e., 144 measurements were accounted for. (2) Second, the adjustment quality phase, in which error statistics (discussed below) are used to identify the accuracy of the models. (3) Third, the forecast phase, which was carried out using the best adjusted models, as identified by the error statistics. The forecasts were made for the following year, i.e., the months of 2015, based on the observed data used to find the adjustments.

Box-Jenkins and Box-Tiao Modeling
To perform wind speed predictions for the study regions, Box-Jenkins type modeling [9] was used. This refers to the systematic method of identification, adjustment, checking, and use of ARIMA models. ARIMA models are a powerful approach to many prediction problems, as they can provide very accurate time-series forecasts. Because a time-series is collected sequentially over time, its values are expected to be serially correlated. ARIMA models are mathematical models that aim to capture the behavior of this serial correlation (or autocorrelation) between the values of the time-series and, based on this behavior, make future predictions [13].
ARIMA modeling is commonly applied to non-stationary linear processes, i.e., time-series that show some tendency toward growth or decay rather than fluctuating around a constant level. To make such a series stationary, differences between consecutive values are taken a certain number of times, d (usually one or two). For these processes, the model is commonly represented by ARIMA(p,d,q). The number of times that the original series must be differenced before a stationary series results is called the "order of homogeneity". These processes are described using the autoregressive integrated moving average model of orders p, d, and q, ARIMA(p,d,q), where "integrated" refers to the number of differences; the model can also include a seasonal operator. In that case, it is represented by SARIMA(p,d,q)(P,D,Q) (Seasonal AutoRegressive Integrated Moving Average), in which P, D, and Q are the seasonal counterparts of p, d, and q [13]. Once the orders are chosen, the parameters of the ARIMA model to be estimated are the p autoregressive coefficients (φ) (i.e., the coefficients related to the autoregressive stationary filter) and the q moving average coefficients (θ) (i.e., the coefficients related to the moving averages filter), together with the model errors (ε_t, also called residues). This is done according to Equation (1), in which W_t represents the time-series for projection.
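The differencing step described above can be illustrated with a minimal Python sketch (the study itself used R). The function names and the three toy series are hypothetical and serve only to show how d ordinary differences, or one seasonal difference, remove a trend or a repeating cycle.

```python
def difference(series, lag=1):
    """Return the lagged differences series[t] - series[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def difference_d_times(series, d, lag=1):
    """Apply the differencing operator d times (the 'd' in ARIMA(p,d,q))."""
    out = list(series)
    for _ in range(d):
        out = difference(out, lag)
    return out


# Hypothetical series with a linear trend: one difference removes the trend.
trend = [2.0 + 0.5 * t for t in range(8)]
first_diff = difference_d_times(trend, d=1)

# Hypothetical series with a quadratic trend: two differences are needed.
quadratic = [float(t * t) for t in range(8)]
second_diff = difference_d_times(quadratic, d=2)

# Seasonal differencing (the 'D' in SARIMA) uses a lag equal to the
# seasonal period, here 12 for monthly data.
monthly = [float(t % 12) for t in range(36)]
seasonal_diff = difference(monthly, lag=12)
```

Each differenced series above is constant, which is the simplest case of a stationary result; real wind speed series would of course retain random variation after differencing.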
Another methodology used is the Box-Tiao model, which is called ARIMAX [14]; it extends the ARIMA models by adding a linear component that is a function of covariate observations (or exogenous variables). The difference between the two is that ARIMAX has an exogenous input in addition to the autoregressive and moving average parameters [15]. The ARIMAX model can be understood as the combination of the autoregressive (p), integrated (d), moving average (q), and exogenous (r) components, and can therefore be symbolized as ARIMAX(p,d,q,r). A simplified mathematical representation of this model is given in Equation (2).
In Equation (2), y_t is the time-dependent variable to be projected (i.e., wind speed); ρ is a constant; y_{t−i} is the wind speed lagged by i time steps; β_i is the coefficient of y_{t−i}; p is the maximum number of lagged time intervals; w_j represents the exogenous variables (in this case pressure, temperature, and precipitation); ω_j represents the coefficients of the exogenous variables; r is the maximum number of exogenous variables; θ_j is the coefficient of ε_{t−j}, the model error lagged by j time steps; and ε_t is the error component of the model, with ε_t ~ N(0, σ²). The coefficients of the models are estimated by regression. Further details on Equation (2) can be found in [16].
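The displayed forms of Equations (1) and (2) are not reproduced in this text. As a hedged reconstruction based only on the symbol definitions given above, one common way to write them is sketched below. The plus sign on the moving average terms follows the convention of R's arima function (Box and Jenkins write these terms with a minus sign), and the index ranges of the sums are assumptions; the authors' exact notation may differ.

```latex
% Equation (1), assumed form: ARMA(p,q) model for the differenced series W_t
W_t = \phi_1 W_{t-1} + \dots + \phi_p W_{t-p}
      + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}

% Equation (2), assumed form: ARIMAX model with r exogenous variables w_j
y_t = \rho + \sum_{i=1}^{p} \beta_i \, y_{t-i}
           + \sum_{j=1}^{r} \omega_j \, w_j
           + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}
           + \varepsilon_t ,
\qquad \varepsilon_t \sim N(0, \sigma^2)
```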
The orders of the ARIMA and ARIMAX models (i.e., the numbers that determine the structure of the functions) were found through the auto.arima(...) function. This function is part of the forecast package of the R software, which is described in Hyndman and Khandakar [17]. The choice of the best ARIMA and ARIMAX models takes into account a series of criteria to provide the best adjustments. For example, the Akaike criterion is based on the identification of the lowest AIC (Akaike Information Criterion) value which, theoretically, results in the best model to fit the observed data [13]. The auto.arima(...) function is very useful for finding the (p,d,q) and (p,d,q,r) terms of the ARIMA and ARIMAX models, respectively. In turn, these terms are needed to find the coefficients of Equations (1) and (2). Thus, it is possible to estimate the best time-series adjusted by each model. In the specific case of the ARIMA model, which is univariate, the only input to the auto.arima(...) function is the observed wind speed time-series (from January 2003 to December 2014, which is considered the training phase). The only change for the ARIMAX model is that, because it is a multivariate model, other input variables are needed (i.e., pressure, temperature, and precipitation) that can be related to the observed wind speed of the regions investigated. In addition to the AIC value, statistical error measurements (to be discussed later) are also used to find the most parsimonious models that most accurately represent the observed time-series.
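To make the Akaike criterion concrete, the following Python sketch ranks candidate models by AIC = 2k − 2 ln(L). It is not the auto.arima(...) search itself: the candidate labels, their residuals, and the parameter counts k are hypothetical, and a Gaussian likelihood with variance RSS/n is assumed.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def gaussian_log_likelihood(residuals):
    """Maximized Gaussian log-likelihood of a residual series,
    using the variance estimate sigma^2 = RSS / n (RSS must be > 0)."""
    n = len(residuals)
    rss = sum(e * e for e in residuals)
    sigma2 = rss / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


def select_by_aic(candidates):
    """candidates maps a label to (residuals, n_params).
    Returns the label with the lowest AIC and the full AIC table."""
    table = {
        label: aic(gaussian_log_likelihood(res), k)
        for label, (res, k) in candidates.items()
    }
    best = min(table, key=table.get)
    return best, table


# Hypothetical candidates: the larger model fits only slightly better,
# so its two extra parameters are not justified and the simpler model wins.
candidates = {
    "ARIMA(1,0,0)": ([0.5, -0.5] * 6, 2),
    "ARIMA(2,0,1)": ([0.49, -0.49] * 6, 4),
}
best_label, aic_table = select_by_aic(candidates)
```

The example shows why the criterion favours parsimony: each additional parameter adds 2 to the AIC, so a more complex model is preferred only when it improves the likelihood by more than that penalty.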
In order to evaluate whether the models of the Box-Jenkins and Box-Tiao methodologies are feasible for wind speed prediction, a residue analysis was performed. According to Brockwell and Davis [13], for the ARIMA and ARIMAX models to be viable for adjusting observed data, the error term (ε_t) of the model should behave as white noise, i.e., have zero mean and constant variance (homoscedasticity) and be uncorrelated (that is, the errors are independent). The term ε_t must also follow a normal distribution. Several tests can be applied to the residues to check these assumptions. Among them are the Shapiro-Wilk Test [18] for residue normality, the Durbin-Watson Test for residue independence, and the Breusch-Pagan Test for the homoscedasticity hypothesis (which verifies the equality of the residue variance) [19]. Table 1 summarizes the tests used, which were applied at a significance level of α = 0.05.

Durbin-Watson Test
H0: the residues are independent. H1: the residues are not independent. Decision making: if the p-value is greater than α, i.e., p > 0.05, do not reject H0.

Breusch-Pagan Test
H0: the residues have homoscedasticity. H1: the residues have heteroscedasticity. Decision making: if the p-value is greater than α, i.e., p > 0.05, do not reject H0.
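As an illustration of the independence check, the Durbin-Watson statistic can be computed directly from a residual series. The Python sketch below computes only the statistic, not the p-value used for the decision rule above; the function name and the toy residuals are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences of the
    residuals divided by their sum of squares. Values near 2 suggest no
    first-order autocorrelation; values toward 0 or 4 suggest positive or
    negative autocorrelation, respectively."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Hypothetical residuals that alternate in sign (negative autocorrelation).
dw_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])

# Hypothetical residuals that persist in sign (positive autocorrelation).
dw_persistent = durbin_watson([1.0, 1.0, -1.0, -1.0])
```

Converting the statistic to a p-value requires its sampling distribution, which statistical packages provide; that step is not shown here.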
Regarding the choice of exogenous variables applied to the ARIMAX model, it is important to note that wind speed is related to displacements of air masses, which occur due to pressure differences in the atmosphere [20]. The movement always happens from high-pressure zones to low-pressure zones and, thus, determines the general dynamics of the atmospheric circulation and the formation of the different climatic types [21]. Air masses can be defined as portions or volumes of the atmosphere that have practically the same characteristics in terms of pressure, temperature, and humidity because of their location. They are quite thick and homogeneous [21]. It is also important to note that air is composed mainly of nitrogen, oxygen, and argon which, together, make up most of the gases in the atmosphere. Other gases include greenhouse gases such as water vapor, carbon dioxide, methane, nitrous oxide, and ozone. Variables such as pressure, volume, and temperature are considered intrinsic quantities of any gas [22]. Due to the importance of pressure and temperature for the displacement of air masses, it was decided to relate both of these to the wind speed in applying the proposed model. Precipitation was chosen because several studies in the literature show differences in the intensity of the winds over the northeastern region of Brazil between the rainy and dry periods [23-25].

Holt-Winters Model
In 1957, Holt extended the simple exponential smoothing model to deal with data that show a linear trend, making predictions more accurate than those of simple exponential smoothing for such data. In 1960, Winters extended the Holt model with a new equation that predicts the behavior of the seasonal component of the data. This generated the Holt-Winters (HW) method [13]. The form used here is the exponential smoothing method with seasonality and linear trend in which the seasonal component is treated in an additive way. It is represented as follows.
In these equations, a_t is the series level (in m/s), which describes how the predicted time-series evolves over time, i.e., whether it varies slowly or, exceptionally, undergoes sudden variations. For example, the predicted series can grow slowly with linear behavior. b_t represents the trend (in m/s²), which describes the growth or decrease of the predicted time-series, possibly over distinct time intervals. For example, a hypothetical predicted time-series may show a linear growth trend in its first year and a linear decrease in its second year. s_t represents the seasonal component (in m/s), which describes cyclical patterns of variation in the time-series that repeat at relatively constant intervals of time. For example, a hypothetical predicted time-series may show undulatory behavior every three months within one year. Y_{t+n} represents the forecast for n periods ahead (in m/s); p represents the seasonal period; and n = 1, 2, 3, ..., h, where h represents the forecast horizon.
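The displayed Holt-Winters equations are not reproduced in this text. The Python sketch below implements the standard additive form of the method using the symbols defined above (level a_t, trend b_t, seasonal component s_t, seasonal period p, forecast n steps ahead). The smoothing parameters alpha, beta, and gamma follow the usual textbook notation, and the initialisation scheme is an assumption chosen for simplicity; neither is taken from the paper.

```python
def holt_winters_additive(y, period, alpha, beta, gamma, horizon):
    """Additive Holt-Winters smoothing followed by an h-step forecast.

    Recursions (standard additive form):
      a_t = alpha * (y_t - s_{t-p}) + (1 - alpha) * (a_{t-1} + b_{t-1})
      b_t = beta * (a_t - a_{t-1}) + (1 - beta) * b_{t-1}
      s_t = gamma * (y_t - a_t) + (1 - gamma) * s_{t-p}
      forecast_{t+n} = a_t + n * b_t + (latest s for that point in the cycle)
    """
    if len(y) < 2 * period:
        raise ValueError("need at least two full seasons to initialise")

    # Assumed initialisation: level = mean of the first season, trend =
    # average per-step change between the first two seasons, seasonal terms =
    # deviations of the first season from its mean.
    first = y[:period]
    second = y[period:2 * period]
    level = sum(first) / period
    trend = (sum(second) - sum(first)) / (period * period)
    seasonal = [value - level for value in first]

    for t in range(period, len(y)):
        s_old = seasonal[t - period]
        prev_level = level
        level = alpha * (y[t] - s_old) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal.append(gamma * (y[t] - level) + (1 - gamma) * s_old)

    n_obs = len(y)
    forecasts = []
    for n in range(1, horizon + 1):
        # Most recent seasonal estimate for the same position in the cycle.
        s = seasonal[n_obs - period + (n - 1) % period]
        forecasts.append(level + n * trend + s)
    return forecasts


# Hypothetical series: a fixed four-step cycle with no trend, repeated
# three times; the forecast should continue the same cycle.
pattern = [3.0, 5.0, 4.0, 6.0]
history = pattern * 3
forecast = holt_winters_additive(
    history, period=4, alpha=0.5, beta=0.5, gamma=0.5, horizon=4
)
```

For monthly wind speed data the period would be 12 and the horizon 12 for a one-year forecast; in practice the smoothing parameters are chosen by minimising the one-step-ahead squared error rather than fixed by hand.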

Artificial Intelligence with Neural Networks
ANNs are part of so-called artificial intelligence, which is a branch of computer science research. They were first presented in 1943 by the neurophysiologist McCulloch and the mathematician Walter Pitts [26]. According to Zhang et al. [27], ANN training is one of the computational intelligence techniques commonly used in predicting time-series. ANNs are based on the architecture and learning of the human brain. They work conceptually in a similar way, i.e., by trying to recognize regularities and patterns in data, and so they are able to learn from experience and make generalizations based on their previously accumulated knowledge. Structurally, ANNs can represent both nonlinear and linear models and can thus obtain better results than other prediction models [28].
An ANN can be thought of as a network of "neurons" organized in layers. The predictors (or inputs, x_{t−i}) form the lower layer and the predictions (or outputs, y_{t+h}) form the upper layer. There may be intermediate layers containing "hidden neurons." Figure 2 shows an example of the structure of an ANN with four inputs and one hidden layer. The coefficients connected to the predictors are called "weights" and are commonly represented by w_i. The weights are selected through a "learning algorithm" such as backpropagation, so named because the algorithm propagates the errors backwards to adjust the weights of the intermediate layers. This minimizes the error between the predicted and observed time-series [27].
In an ANN structure similar to Figure 2, the node outputs in one layer are inputs to the next layer. The inputs to each node are combined linearly, and the result is then modified by a nonlinear function before being output. For example, the inputs into neuron j in the hidden layer are linearly combined through the expression in Equation (3), where b_j and w_{i,j} are parameters discovered in the "learning" step from the observed data of the current time-series. It is worth noting that the weights may initially assume random values which, in turn, will be updated using the observed data. In the hidden layer, the result of Equation (3) is modified using a nonlinear function, such as the sigmoid given in Equation (4). The output of Equation (4), in turn, serves as an input to the next layer, and this strategy tends to reduce the effect of extreme input values, providing better ANN performance. Further details on the use of ANN to predict time-series can be obtained in Zhang et al. [27]. Works on the use of ANN to predict wind speed time-series include Saavedra-Moreno et al. [29], where the authors predicted wind speed at a wind farm using artificial neural networks. The ANN was based on the backpropagation learning algorithm and was evaluated with real data measured at two different locations.
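The forward computation described above (a linear combination of the inputs with bias b_j and weights w_{i,j}, followed by a sigmoid) can be sketched in Python as follows. The three hidden neurons, the linear output layer, and all numerical values are assumptions made for illustration; only the four inputs follow the description of Figure 2, and training by backpropagation is not shown.

```python
import math


def sigmoid(z):
    """Logistic sigmoid s(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def hidden_layer(inputs, weights, biases):
    """weights[i][j] is w_{i,j}, linking input i to hidden neuron j;
    biases[j] is b_j. Returns the sigmoid-transformed hidden outputs."""
    outputs = []
    for j, b_j in enumerate(biases):
        z_j = b_j + sum(w_i[j] * x_i for w_i, x_i in zip(weights, inputs))
        outputs.append(sigmoid(z_j))
    return outputs


def forward(inputs, weights, biases, out_weights, out_bias):
    """One hidden layer followed by a linear output neuron (an assumption,
    common for regression problems)."""
    hidden = hidden_layer(inputs, weights, biases)
    return out_bias + sum(w * h for w, h in zip(out_weights, hidden))


# Hypothetical network: four lagged inputs, three hidden neurons.
lagged_inputs = [4.2, 4.8, 5.5, 5.9]
zero_weights = [[0.0, 0.0, 0.0] for _ in lagged_inputs]
zero_biases = [0.0, 0.0, 0.0]

# With all hidden weights and biases at zero, every hidden output is
# sigmoid(0) = 0.5, which makes the result easy to check by hand.
prediction = forward(
    lagged_inputs, zero_weights, zero_biases,
    out_weights=[1.0, 1.0, 2.0], out_bias=0.25,
)
```

Because the sigmoid maps any real input into the interval (0, 1), very large or very small values of the linear combination are compressed, which is the damping of extreme inputs mentioned in the text.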

Hybrid Modelling
In this work, three forms of hybrid modeling were tested for wind speed prediction; all are described below.

Hybrid Model (ARIMA + ANN)
A hybrid model is built by combining ARIMA (a time-series model) with ANN (an artificial intelligence model). In Zhang [28], the author proposes the combination of the ARIMA and ANN models, stating that this model can capture different patterns in the data. Statistical models of the ARIMA type are able to capture linear patterns. ANNs, in turn, have been used for time-series forecasting because of their flexible nonlinear modeling capacity [30]. The combined hybrid forecast model for ARIMA and ANN seasonal time-series is represented as follows.
Hybrid(1) = ARIMA_forecast + ANN_forecast/ARIMA    (6)
where Hybrid(1) represents the prediction of the hybrid model, which is the result of adding the linear component predicted by the ARIMA_forecast model to the nonlinear component predicted by the ANN_forecast/ARIMA model. The latter component is modeled from the residues of the ARIMA model, which are assumed to contain the nonlinear structure that the linear model cannot capture (the Breusch-Pagan Test in Table 1 assesses their homoscedasticity). It is important to note that this methodology has already been used in several wind speed time-series prediction papers, as in Cadenas and Rivera [31], where the authors proposed a hybrid model based on ARIMA and ANN to predict wind speed in three different places in Mexico.

Hybrid Models (ARIMAX + ANN) and (HW + ANN)
The hybrid models proposed in this paper are similar to the one previously described. The first is composed of one linear component (the ARIMAX model, which uses the exogenous variables pressure, temperature, and precipitation) and one nonlinear component (an ANN that models the residues of the ARIMAX model). This is represented in the following expression.
Hybrid(2) = ARIMAX_forecast + ANN_forecast/ARIMAX    (7)
where Hybrid(2) represents the prediction of the hybrid model, which is the result of adding the linear component predicted by the ARIMAX_forecast model to the nonlinear component predicted by the ANN_forecast/ARIMAX model (the component that is modeled from the residues of the ARIMAX model).
The second hybrid model combines the HW model (as the linear component) and the ANN model (as the nonlinear component). The mathematical representation of this model is given by the following expression.
Hybrid(3) = HW_forecast + ANN_forecast/HW    (8)
where Hybrid(3) represents the prediction of the hybrid model, which is the result of adding the linear component predicted by the HW_forecast model to the nonlinear component predicted by the ANN_forecast/HW model. The latter component is also modeled from the residues of the HW_forecast model. It is worth noting that these two hybrid models are a novel approach to time-series forecasting, especially for wind speed.
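The additive structure shared by Equations (6)-(8) can be sketched in Python as follows. The linear forecast and fitted values are taken as given, and a simple seasonal-mean model is used as a placeholder for the ANN trained on the residues; this placeholder is an assumption made to keep the example short and is not the residual model used in the paper. All names and numbers are hypothetical.

```python
def residuals(observed, fitted):
    """Residues of the linear model: observed minus fitted values."""
    return [o - f for o, f in zip(observed, fitted)]


def seasonal_mean_model(resid, period):
    """Placeholder for the ANN: learns the mean residue at each position of
    the seasonal cycle and returns a function that forecasts future residues."""
    if len(resid) < period:
        raise ValueError("need at least one full season of residues")
    sums = [0.0] * period
    counts = [0] * period
    for t, e in enumerate(resid):
        sums[t % period] += e
        counts[t % period] += 1
    means = [s / c for s, c in zip(sums, counts)]
    start = len(resid)

    def predict(horizon):
        return [means[(start + n) % period] for n in range(horizon)]

    return predict


def hybrid_forecast(linear_forecast, residual_forecast):
    """Hybrid = linear forecast + forecast of the linear model's residues."""
    return [lin + res for lin, res in zip(linear_forecast, residual_forecast)]


# Hypothetical data: the linear model fits a flat 4.0 m/s, while the
# observations alternate around it.
observed = [5.0, 3.0, 5.0, 3.0]
fitted = [4.0, 4.0, 4.0, 4.0]
resid = residuals(observed, fitted)

predict_residues = seasonal_mean_model(resid, period=2)
combined = hybrid_forecast([4.0, 4.0], predict_residues(2))
```

Replacing the placeholder with a trained ANN, and the fixed linear forecast with an ARIMA, ARIMAX, or HW forecast, gives the three hybrid combinations described above.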

Accuracy Measurements
One of the ways to verify the accuracy of the proposed forecast models is through the analysis of statistical errors. The mean absolute error (MAE) measures the mean absolute difference between the observed and adjusted series. Its mathematical representation is given in Equation (9), in which v_adj represents an individual value of the adjusted wind speed time-series, v_obs represents the corresponding value of the observed wind speed time-series, and n is the number of values in the series. In this study MAE is measured in meters per second (m/s). The root mean square error (RMSE) is the square root of the mean of the squared differences between the adjusted and observed wind speed time-series.
As with MAE, the RMSE calculated in Equation (10) is measured in m/s. RMSE can be interpreted as follows: large RMSE values represent large errors in the adjusted variables, and values close to zero indicate a near perfect fit. Another form of error measurement is the mean absolute percentage error (MAPE). The great advantage of this measure is its representation as a percentage (%), which allows quick understanding. A disadvantage that must be considered is that, if an observed value is very small, any discrepancy causes MAPE to "explode". MAPE is calculated using Equation (11). Further details on statistical errors, especially Equations (9)-(11), can be found in [19,32]. Another measure of the quality of the adjustments is the Nash-Sutcliffe (NS) efficiency coefficient. The coefficient of efficiency can go as high as 1, with this value indicating a perfect fit. According to Galanis et al. [33], when the NS value is higher than 0.75, the performance of the model is considered good. For NS values between 0.36 and 0.75, the performance is considered acceptable, while NS values less than 0.36 deem the model unacceptable. The expression used for the calculation of NS is given in Equation (12), in which v̄_obs represents the observed series average.
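The four accuracy measures can be sketched in Python using their standard definitions, which are assumed here to match Equations (9)-(12) since the displayed equations are not reproduced in this text. The classification thresholds are those quoted above from Galanis et al. [33]; the function names and the toy series are hypothetical.

```python
import math


def mae(adjusted, observed):
    """Mean absolute error, in the units of the series (m/s)."""
    return sum(abs(a - o) for a, o in zip(adjusted, observed)) / len(observed)


def rmse(adjusted, observed):
    """Root mean square error, in the units of the series (m/s)."""
    squared = sum((a - o) ** 2 for a, o in zip(adjusted, observed))
    return math.sqrt(squared / len(observed))


def mape(adjusted, observed):
    """Mean absolute percentage error (%); undefined if any observation is 0."""
    total = sum(abs((a - o) / o) for a, o in zip(adjusted, observed))
    return 100.0 * total / len(observed)


def nash_sutcliffe(adjusted, observed):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of the squared error to
    the squared deviation of the observations from their mean."""
    mean_obs = sum(observed) / len(observed)
    error = sum((o - a) ** 2 for a, o in zip(adjusted, observed))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - error / spread


def classify_ns(ns):
    """Performance classes quoted in the text from Galanis et al. [33]."""
    if ns > 0.75:
        return "good"
    if ns >= 0.36:
        return "acceptable"
    return "unacceptable"


# Hypothetical observed and adjusted wind speeds (m/s).
observed = [4.0, 5.0, 6.0, 5.0]
adjusted = [4.5, 5.0, 5.5, 5.0]

example_mae = mae(adjusted, observed)
example_rmse = rmse(adjusted, observed)
example_mape = mape(adjusted, observed)
example_ns = nash_sutcliffe(adjusted, observed)
```

The division by each observed value in mape is what makes the measure unstable when observations are close to zero, which is the disadvantage noted above.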

Results and Discussion
Rainfall was set as an exogenous variable in the ARIMAX models because of the influence that it exerts on the intensity of wind speed in the northeastern region of Brazil, as commented in [23-25]. In order to better identify this influence in the data used in this paper, Figure 3 shows the relationships between the monthly mean wind speed (black line) and precipitation (blue line) in the study regions Fortaleza (Figure 3a), Parnaíba (Figure 3b), and Natal (Figure 3c). The intensities of these two variables vary inversely over the year. As such, it is possible to propose the following relationship: the periods of highest monthly precipitation averages occur in the first half of the year, and this may be influencing the lower monthly wind speed averages in those periods. In the second half of the year, however, higher monthly wind speed averages and smaller monthly precipitation averages are identified. For example, in Fortaleza, the peak of the monthly mean precipitation occurs in April with an approximate value of 350 mm, and this same month shows the lowest monthly wind speed average (approximately 3.0 m/s). The highest monthly wind speed average is recorded in October with an approximate value of 6.0 m/s, while the average precipitation is less than 10 mm. The differences in wind speed estimations for the study regions in the different periods may reveal the importance of including rainfall in the ARIMAX model in order to provide precise predictions.
The identification of the viability of the ARIMA and ARIMAX models in adjusting the observed study region data is done by analyzing the ε_t residues, which must behave like white noise. In this regard, several assumptions must hold: the residues must follow a normal distribution (checked with the Shapiro-Wilk test), be independent (checked with the Durbin-Watson test), and have equal variance, i.e., the homoscedasticity hypothesis (checked with the Breusch-Pagan test). Table 2 shows the p-value of each test in each locality; the tests were applied with a significance level of α = 0.05. Residue analyses for all locations show that none of the assumptions is rejected for either the ARIMA or the ARIMAX models, since all p-values are higher than the significance level. These results indicate that the adjustments of these models adhere to the observed data.
Table 3 shows the results of the error statistics. In Fortaleza and Natal, the Hybrid(2) model, composed of the ARIMAX and ANN models, showed the lowest values; the MAPE values were approximately 8.03% and 7.21%, respectively. For Parnaíba, the Hybrid(3) model, composed of the HW and ANN models, showed the lowest error values; the MAPE value was approximately 4.93%. The hybrid models proposed in this work are able to reduce the error measures when compared to the traditional models in the literature; in Parnaíba, for example, the MAPE values for the HW and ANN models are close to 14.21% and 19.25%, respectively. This result may provide greater similarity between the adjusted and observed time-series and, thus, produce more accurate wind speed predictions.
Compared with similar literature on monthly mean wind speed forecasts, the errors shown in Table 3 are similar to, for example, those found in Cadenas and Rivera [34]. This reference compared the performance of the ARIMA and ANN models in an attempt to forecast monthly wind speed averages in the coastal region of Mexico. According to the authors, the ARIMA model showed greater sensitivity to wind velocity adjustment and better prediction for this particular case, with a MAPE of approximately 13.40%. However, it is likely that, with an increase in the number of training vectors, the ANN model's fitting performance will improve, providing statistical errors smaller than the presented MAPE of about 20.70%.
Fadare [35] used monthly mean wind speed data from 28 meteorological stations for a period of 20 years. In this study, data from 18 stations were used to train the model and data from 10 stations were used to test it. The proposed ANN consisted of four inputs, two hidden layers, and one output. The author used latitude, longitude, altitude, and the month of the year as inputs of the proposed model, while the wind speed was the ANN output. The results indicate that the proposed topology shows high precision in the average monthly wind speed forecast, with a MAPE of 8.9%. More importantly, the correlation coefficient between forecast and observation was 0.983, which attests to the performance of this model.
Figure 4 shows the monthly mean time-series, both observed (black line) and adjusted (blue line). In Fortaleza (Figure 4a) and Natal (Figure 4c), the observations are compared with the adjustments of the best model according to the error statistics, which in both cases was the Hybrid(2) model. For Parnaíba (Figure 4b), the observed time-series is compared with the Hybrid(3) model, which presented the best fit according to the error statistics. In the three regions, the observed and adjusted time-series are similar, especially in terms of the maximum and minimum wind speed values in most months. For example, in Fortaleza the minimum wind speed in 2011 occurred in April in both series, with the same value of approximately 2.5 m/s, and the 2011 maxima of both series fell in September, with values close to 6.0 m/s.
Figure 4 also shows the NS coefficient for each region: 0.86, 0.96, and 0.79 in Fortaleza, Parnaíba, and Natal, respectively. These NS values indicate that the hybrid models fit the observed data well.
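The fit quality measure reported in Figure 4 can be sketched as follows, assuming that NS denotes the Nash-Sutcliffe efficiency coefficient in its standard form, NS = 1 − Σ(obs − fit)² / Σ(obs − mean(obs))². The function name and the sample series are hypothetical and serve only to show the behavior of the coefficient.

```python
def nash_sutcliffe(observed, fitted):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 means the
    fit is no better than the mean of the observations."""
    if len(observed) != len(fitted) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Squared error of the fitted series.
    sse = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    # Squared deviation of the observations from their own mean.
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


# Hypothetical monthly means in m/s (NOT the paper's data).
obs = [2.5, 4.0, 6.0, 5.5]
print(nash_sutcliffe(obs, obs))          # perfect fit
print(nash_sutcliffe(obs, [4.5] * 4))    # fit equal to the observed mean
```

Under this definition, values such as 0.86, 0.96, and 0.79 lie much closer to the perfect-fit end of the scale than to the mean-only baseline.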
According to a study carried out by the National Institute of Meteorology (INMET (http://www.inmet.gov.br/portal/index.php?r=clima/normaisclimatologicas (accessed on 3 September 2016))) regarding the climatological normals throughout the Brazilian territory, the months of greatest precipitation (the rainy season) in the city of Fortaleza are February to May, peaking in April. According to Camelo et al. [36], the monthly mean wind speed in this season in Fortaleza is lower than in the period of low precipitation (the dry season), which runs from October to December. Figure 4a shows that the Hybrid(2) model is able to represent the seasonal characteristics of the wind speed in Fortaleza, with lower intensities in the rainy season and higher intensities in the dry season. Also according to INMET, the historical precipitation series for the cities of Natal and Parnaíba both show a rainy period in the first half of the year: between March and July in Natal and between February and May in Parnaíba, both with a peak in April. As in Fortaleza, the Hybrid(2) and Hybrid(3) adjustments in Natal and Parnaíba, respectively, were also able to represent the seasonal wind characteristics of the series observed in both regions, i.e., lower intensities in the rainy season and higher intensities in the dry season.
To further illustrate the quality of the Hybrid(2) and Hybrid(3) models, forecasts were made for the year following the observations, i.e., for the months of 2015, and compared with the observational data of that year. Figure 5 shows these comparisons for Fortaleza (Figure 5a), Parnaíba (Figure 5b), and Natal (Figure 5c), where the observed time-series are represented as empty circles and the predicted time-series as solid black circles. The comparisons show the efficiency of the models in forecasting monthly mean wind speed: the forecasts follow the profile of the observed series in the three regions, with values very close to the observations. For example, in Natal, the forecast and observed values for February 2015 were both approximately 5.0 m/s. This precision was expected, since the forecasts are influenced by the quality of the adjustments. The forecasts also reproduce the seasonality of their respective observed time-series, i.e., lower and higher wind speed intensities in the rainy and dry seasons, respectively.
It is important to highlight that the wind speed prediction results in this paper were presented at 10 m above ground, whereas wind turbines in Brazil are currently being built at heights above 100 m (according to ABEEólica (http://www.portalabeeolica.org.br/noticias/5283-torres-mais-altas-elevam-em-6-vezes-potencial-elico.html (accessed on 3 September 2016))). The models presented here, however, do not depend on height. This means they can be applied at any height level; it is sufficient to supply the models with the time series observed at that level to obtain the projections. This work could be a useful tool in several areas. For example, in the wind sector it can give decision makers more information by forecasting the intensity of the local wind speed. Another highlight is the usefulness of wind speed predictions in terms of monthly averages for the wind sector. Several works on wind speed forecasting can be found in the literature, and they classify wind power prediction methods by timescale or by methodology. Timescale classifications for wind power forecasting methods vary in the literature. According to Chang [37], timescale methods of wind power forecasting can be divided into four categories, shown in Table 4.

Figure 1. Map of Brazil with the representation of study regions: Fortaleza, Natal, and Parnaíba.
The Pitts and McCulloch model consisted of only one output and one input function, calculated by the weighted sum of several values. This model resembles the human neuron, where signals are received with different intensities, causing a certain reaction in a part of the nerve cell. Although biologically inspired by the human neuron, ANNs have found applications in different scientific areas.
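The weighted-sum idea described above can be sketched in a few lines. This is a minimal illustration, assuming a fixed threshold that turns the weighted sum into a binary output, as in the classical threshold neuron; the function name, weights, and threshold are hypothetical choices for the example and are not taken from the paper.

```python
def neuron_output(inputs, weights, threshold):
    """Single artificial neuron: weighted sum of the inputs,
    followed by a hard threshold (assumed activation rule)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Each input signal contributes with its own intensity (weight).
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    # The neuron fires (1) only if the combined signal is strong enough.
    return 1 if weighted_sum >= threshold else 0


# With unit weights and threshold 2, the neuron behaves like a logical AND.
for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pair, neuron_output(pair, weights=[1, 1], threshold=2))
```

Networks such as the one in Figure 2 combine many such units and replace the hard threshold with a differentiable activation so that the weights can be learned by backpropagation.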

Figure 2. Illustration of an ANN structure with backpropagation algorithm.

Figure 4. Monthly wind speed in (a) Fortaleza; (b) Parnaíba; and (c) Natal. A comparison between observed (continuous line in black) and adjusted (continuous line in blue) time-series. Information is given about the NS value to identify the fitting quality in each region.

Table 1. Presentation of tests that identify white noise in Box-Jenkins and Box-Tiao models.
H1: sample does not come from a normal population. Decision making: if the p-value is greater than α, i.e., p > 0.05, do not reject H0.
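The decision rule above can be sketched as a small check over the p-values of the residual tests, together with the standard Durbin-Watson statistic, d = Σ(e_t − e_{t−1})² / Σ e_t². The function names, the dictionary keys, and the p-values below are hypothetical placeholders, not the values reported in Table 2.

```python
ALPHA = 0.05  # significance level used for all tests


def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 suggest no
    first-order autocorrelation in the residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def assumptions_met(p_values, alpha=ALPHA):
    """True when no null hypothesis is rejected, i.e. every
    p-value is greater than the significance level."""
    return all(p > alpha for p in p_values.values())


# Hypothetical p-values for one location (NOT taken from Table 2).
p_values = {"shapiro_wilk": 0.21, "durbin_watson": 0.47, "breusch_pagan": 0.33}
print(assumptions_met(p_values))
```

In practice the p-values themselves would come from a statistics package; the sketch only shows how the white-noise decision is taken once they are available.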

Table 2. Tests applied to the residuals to identify if the assumptions of the ARIMA and ARIMAX models were met.

Table 3. Error statistics to identify the proposed model's accuracy.