A Short-Term Wind Speed Forecasting Model by Using Artificial Neural Networks with Stochastic Optimization for Renewable Energy Systems

Abstract: To efficiently manage unstable wind power generation, precise short-term wind speed forecasting is critical. To overcome the challenges in wind speed forecasting, this paper proposes a new convolutional neural network algorithm for short-term forecasting. The forecasting performance of the proposed algorithm was compared with that of four other artificial intelligence algorithms commonly used in wind speed forecasting. Numerical results based on data from a designated wind site in Taiwan demonstrate the efficiency of the proposed learning method. Mean absolute error (MAE) and root-mean-square error (RMSE) were adopted as accuracy evaluation indexes. Experimental results indicate that the MAE and RMSE values of the proposed algorithm are 0.800227 and 0.999978, respectively, demonstrating very high forecasting accuracy.


Introduction
The depletion of fossil fuels, increased environmental pollution, and the development and maximum utilization of renewable energy sources have attracted the attention of experts and scholars globally [1][2][3]. Sustainability transitions are long-term, multi-dimensional, and fundamental transformation processes [4], and they are among the greatest challenges of the 21st century [5]. As a non-polluting renewable energy source, wind energy is highly valued by many countries. Wind power generation has emerged as one of the most mature renewable energy technologies, with high commercialization potential [6].
With regard to the price time series problem [7][8][9], Cincotti et al. modeled the issue with three different methods: a discrete-time univariate econometric model and two artificial intelligence techniques. The support vector machine (SVM) methodology gave better forecasting accuracy for price time series [8]. In this paper, however, the proposed WindNet is shown to outperform SVM, random forest (RF), decision tree (DT), multilayer perceptron (MLP), convolutional neural network (CNN), and long short-term memory (LSTM) architectures, in that its average MAE and RMSE values are the lowest.
Short-term wind speed prediction for wind farms is one of the most effective ways to address the above-mentioned energy problems. It corresponds to the effective prediction of the wind speed and, through the power curve of the wind turbine, of the wind farm power output. This enables the power dispatch system to adjust the dispatch schedule in time, according to changes in wind power output. This ensures power quality, reduces power system reserve capacity and lowers power system operation cost. Effective prediction will also improve the wind power penetration rate and reduce the impact of wind power on the power grid. Therefore, accurate short-term wind power forecasting can reduce the risk of power grid transmission and integration [10].
Energies 2018, 11, 2777
In theory, since the wind speed pattern is determined by natural meteorological rules, it has inherent regularity, which supports the feasibility of wind speed forecasting. In practice, however, wind speed fluctuates randomly and is unstable [11]. Wind speeds differ between locations, may vary with time at the same location, and may also vary at the same altitude. Factors such as seasonality, temperature, humidity, air pressure and other parameters have to be considered [12]. Therefore, there are still significant obstacles in wind speed forecasting.
Renewable energy issues have recently attracted much attention, and there have been many studies on wind speed forecasting [13]. In [14], a Bayesian structural break model was proposed for accurate short-term wind speed forecasting. The test data were actual data collected from utility-scale wind turbines, and mean absolute error (MAE), mean square error (MSE) and root mean square error (RMSE) were applied for performance evaluation. The approach could also be used in applications such as wind turbine predictive control and wind power scheduling. The precision of this method is very high, but it is only suitable for ultra-short-term wind speed predictions of a few seconds to a few hours. The authors of [15] proposed a wind speed and solar radiation co-testing system based on extreme learning machines (ELM) and principal component analysis (PCA). According to [15], PCA can be used to reduce the data dimension, which greatly reduces the complexity of model training. The experiment also showed that, while maintaining forecasting accuracy, ELM model training is significantly faster than that of (a) multi-layer perceptron networks, (b) radial basis function networks and (c) least squares support vector machines. However, this method is applicable exclusively to short-term wind speed forecasting (a few hours) and not to long-term predictions.
Wang et al. [16] integrated various multi-step-ahead wind speed forecasting models and compared and analyzed each model. They noted that multi-step-ahead prediction of wind speed is very challenging and that this goal can be achieved by adopting the weather research and forecasting (WRF) model. They also noted that multiple strategies are more effective than single strategies in wind speed prediction research. Integrating various models into a combination of direct and multi-input multi-output strategies (COMB-DIRMOs) yields a practical, effective and robust model.
Currently there are many studies related to wind speed forecasting methods, and each method has its own advantages and disadvantages. Therefore, many experts and scholars use ensemble methods to predict wind speed [17]. Ensemble methods can be divided into two major types: competitive ensemble forecasting and cooperative ensemble forecasting. Ren et al. [17] compared state-of-the-art wind speed prediction ensemble methods and proposed improvements on them. Jiang et al. [18] combined the v-support vector machine (v-SVM) and cuckoo search (CS) to conduct short-term wind speed forecasting. In this method, CS is used to adjust the parameters of v-SVM, and the experimental results showed that CS performs better than particle swarm optimization (PSO). Zhang et al. [19] integrated three different methods to predict wind speed: ensemble empirical mode decomposition (EEMD), the adaptive neural network-based fuzzy inference system (ANFIS) and the seasonal auto-regression integrated moving average (SARIMA). Jiang et al. [18] and Zhang et al. [19] both proposed hybrid wind speed prediction models with adequate prediction performance. Nevertheless, both studies present wind speed predictions within a period of a few hours and are not applicable to long-term wind speed analysis.
With regard to neural network wind speed prediction, More and Deo [20] proposed a basic neural network architecture for wind speed prediction, and its feasibility was experimentally confirmed. However, there are various kinds of neural network architectures, and even the most basic neural networks have several variants. The most basic neural network architecture was presented by Li and Shi [21], who compared the predictions of three different models (adaptive linear element, back propagation and radial basis function). Their results indicated that each of the three methods has advantages and disadvantages; thus, the selection of a neural network model is a significant issue. Guo et al. [22] proposed a multi-step forecasting model for wind speed prediction. This method applied the empirical mode decomposition (EMD) neural network, which uses a number of traditional neural networks to predict wind speed. Although this method proved effective in experiments, the number of neural network parameters that need to be trained increases further, and hence the complexity of training may increase significantly. With the development of deep learning technology, however, the convolutional neural network (CNN) architecture adopted in this paper not only stands out among various machine-learning algorithms, but also predicts wind speed for a 3-day period with the highest accuracy among the compared methods. Hence, it provides energy management systems with a basis for precise and efficient power dispatching.
The major contributions of this paper include: (a) the development of a powerful wind speed forecasting algorithm for renewable energy systems; (b) a comparison of the performance of several popular machine learning methods on the challenge of wind speed forecasting; and (c) a demonstration of the feasibility and practicality of the proposed model as a wind speed forecasting application.
This paper is organized as follows: Section 2 reviews the variety of renewable resource forecasting techniques. Section 3 introduces the proposed CNN model. Section 4 illustrates the wind speed forecasting results of the proposed model and compares them with those of current methods. Section 5 presents the discussion. Finally, Section 6 concludes the paper.

Renewable Resource Forecasting Techniques Overview
Figure 1 illustrates the architecture of a renewable energy management system [23]. The energy management system (EMS) addresses issues of energy control, management, maintenance, and consumption, and assists in the maintenance and repair of electrical equipment within a factory, a farm, or even a whole city. It can monitor the operation status of the equipment and improve overall management in real time. Good management practices can extend the life of electrical equipment and reduce costs. In the event of equipment failures or other abnormal conditions, the system can immediately send out an alarm to support monitoring and maintenance by management personnel, so that losses are kept to a minimum. In the case of aging, energy-intensive devices, the EMS can also notify management personnel to proceed with a replacement. In the renewable energy management system, the EMS can use programmed control system technology, network communication technology and database technology to connect the renewable energy data collection points, monitoring stations, and management and control centers distributed on the site, in order to achieve data collection, storage, processing, statistics, query and analysis, and even data monitoring and diagnosis.
Based on Figure 1, the EMS dispatches the power generated from renewable energy following a power prediction by the forecasting model, to achieve the goals of energy monitoring and effective management. Through centralized monitoring and effective management of energy data, energy consumption per unit is reduced and economic and energy efficiency are significantly improved. Therefore, the forecasting model is a crucial component of the EMS.
Many applications related to energy forecasting are available. Figure 2 illustrates the distribution of various applications in spatial resolution and forecast time coordinates. The horizontal axis of Figure 2 corresponds to the forecast horizon (time), which is divided into four time scales: seconds, minutes, hours, and days. The vertical axis corresponds to spatial resolution, which is ordinarily divided into 3 categories: interconnection level, transmission level and distribution level. Figure 2 also lists several of the most common applications, such as voltage regulation, grid stability, power reserve management (primary, secondary and tertiary), dispatching and load following, unit commitment and transmission scheduling. It is worth noting that dispatching and load following, the main focus of this paper, is positioned at the transmission level, and that its prediction horizon in current studies mostly falls within a few minutes to a few hours.
In addition, Figure 3 shows the classification of different prediction methods [24]. In this figure, the vertical axis corresponds to the spatial resolution (distance) and the horizontal axis represents the forecast horizon (time). Figure 3 classifies and describes a variety of prediction methods and models. At present, common prediction methods include persistence, autoregressive (AR), moving average (MA), ARMA, artificial neural network (ANN), support vector regression (SVR), fuzzy theory, statistical models, sky imagers, satellite imagery, mesoscale numerical weather prediction (NWP) and global NWP. Among these methods, sky imagers fall in the interval of 1 s to 30 min and 1 m to 2 km. Satellite imagery falls between 15 min and 6 h and between 1 km and 10 km. Mesoscale NWP falls between 4 h and 120 h and between 5 km and 20 km. Global NWP falls between 12 h and days and between 10 km and 90 km. The ANN algorithm applied in this paper, based on current research, has a prediction horizon of a few seconds to several hours. Figures 2 and 3 indicate that current studies on power dispatch and ANN prediction do not reach day-length forecasting horizons. The CNN prediction model proposed in this paper analyzes data from the past 7 days and predicts wind speed conditions for the next 3 days. Hence, the EMS can accurately forecast future electricity output.

The Proposed CNN Model
Neural networks offer powerful modeling capabilities and are widely used in many applications. In this section, the authors introduce the basic multilayer perceptron (MLP) [25] and the convolutional neural network (CNN) [26][27][28][29][30][31] architecture. The WindNet algorithm proposed in this paper is also described in this section.

Multilayer Perceptron
The basic computation unit in a neural network is a neuron, commonly referred to as a "node" or "unit". The node receives input from other nodes or from an external source and calculates an output. Each input is associated with a weight (w), which reflects its importance relative to the other inputs. The feedforward neural network was the first and simplest type of artificial neural network to be devised. It contains multiple neurons (nodes) arranged in multiple layers. Nodes in adjacent layers are joined by connections, or edges, and every connection carries a weight. By definition, an MLP has at least one hidden layer in addition to one input layer and one output layer. The architecture of the fully connected neural network is illustrated in Figure 4. The leftmost green circles represent the input layer, the middle yellow circles correspond to the hidden layer, the rightmost red circles serve as the output, and the blue lines represent the weights. In the MLP architecture, each layer is fully connected. The MLP uses backpropagation to adjust the weight values during training. Once the MLP is trained, the result is computed by passing the signal from layer to layer.
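The layer-by-layer transfer described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the layer sizes (3 inputs, 4 hidden nodes, 2 outputs), the random weight initialization, the use of the sigmoid activation in every layer, and the helper names `dense`, `mlp_forward` and `random_layer` are all hypothetical and are not taken from the paper. Training by backpropagation is not shown.

```python
import math
import random

def sigmoid(z):
    """Logistic activation: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases):
    """One fully connected layer: every output node sees every input node."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def mlp_forward(inputs, layers):
    """Pass the signal through the layers one after another."""
    activation = inputs
    for weights, biases in layers:
        activation = dense(activation, weights, biases)
    return activation

def random_layer(n_in, n_out, rng):
    """Illustrative random initialization (an assumption, not the authors' scheme)."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
# Hypothetical sizes: 3 input nodes, 4 hidden nodes, 2 output nodes.
layers = [random_layer(3, 4, rng), random_layer(4, 2, rng)]
output = mlp_forward([0.2, 0.5, 0.1], layers)
print(len(output))
```

Each row of a weight matrix holds the weights of one node, so a layer with n inputs and m nodes carries n × m weights; this is the count that the convolution layer later reduces.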

Convolutional Neural Network
Although the MLP performs well in many respects, a CNN offers better feature extraction capabilities. A CNN, which can perform feature extraction automatically, can be applied to image recognition and natural language processing. It can also effectively reduce the load of neural network training thanks to the convolution layer concept. The CNN is also a cognitive method that mimics the human brain. For example, when a human brain identifies an image, it may initially notice the distinctly colored points, lines and planes and then assemble them into different shapes such as eyes, nose and mouth. This abstraction process is analogous to the way the CNN algorithm builds its model. The convolution layer shifts the comparison of points to a comparison of sections by analyzing characteristics block by block. The integrated comparison results are then gradually stacked, and a better identification result can be obtained.
The 1D convolution process is illustrated in Figure 5. The filter is shown at the top of Figure 5. There are three weight values in the filter; hence, in this example, its kernel size equals 3. The convolution process multiplies the corresponding section of the input sequence by the weight values of the filter and adds up the results. The filter strides over the input sequence one position at a time to calculate the output. It is worth noting that the weight values of the filter are determined through backpropagation and not manually. Unlike the MLP architecture, the CNN uses a lower number of weights, which is one of the reasons that a CNN can converge faster.
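The multiply-and-add step with a kernel of size 3 and a stride of one can be sketched as follows. The input sequence and the filter weights are made-up values for illustration only; in a CNN the weights would be learned by backpropagation. As in most deep learning libraries, the filter is applied without flipping (strictly speaking, a cross-correlation), and no padding is used here.

```python
def conv1d(sequence, kernel, stride=1):
    """Slide the kernel over the sequence; at each position multiply and sum."""
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, sequence[i:i + k]))
        for i in range(0, len(sequence) - k + 1, stride)
    ]

signal = [1, 2, 3, 4, 5]   # illustrative input sequence
kernel = [1, 0, -1]        # illustrative filter with kernel size 3
result = conv1d(signal, kernel)
print(result)  # [-2, -2, -2]
```

The same three weights are reused at every position, so the number of trainable weights depends on the kernel size and not on the input length, whereas a fully connected layer needs one weight per input-output pair.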

The Proposed Model
The WindNet model proposed in this paper is presented in Figure 6. WindNet combines a CNN with fully connected architectures. The input of WindNet is the wind speed record of the past 7 days and the output is the wind speed estimate for the following 3 days. Since wind speed data are collected hourly, the previous 7 days correspond to 24 × 7 = 168 data points, while the following 3 days correspond to 24 × 3 = 72 data points. After collecting the data from the previous 7 days, WindNet performs a 1D convolution. The authors used 16 filters for this convolution, hence the feature map of the 1D convolution has the shape 168 × 16. To connect it to the subsequent estimation framework, the feature map is flattened after the 1D convolution layer, which turns it back into one dimension. This is the feature extraction process. Subsequently, WindNet feeds the extracted features into a 2-layer fully connected architecture. The number of neurons in both fully connected layers equals 72, which is identical to the output length. Finally, the output of the second fully connected layer (FC2) corresponds to the wind speed forecast for the next 3 days. Two activation functions are used in WindNet: sigmoid and the Rectified Linear Unit (ReLU). The related formulas are presented in Equations (1) and (2). In the 1D convolution, WindNet uses ReLU to mitigate the vanishing gradient problem. In the subsequent fully connected architecture, the sigmoid function was selected to limit the output value range to [0, 1]. The sigmoid and ReLU curves are illustrated in Figure 7. In terms of programming, the wind speed database is read first and the data are normalized, which limits the value range to [0, 1]. Subsequently, the data are divided into training and testing data.
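The data flow described above (168 inputs, 16 filters, a 168 × 16 feature map, flattening, and two fully connected layers of 72 neurons) can be traced with a minimal forward-pass sketch. Several details are assumptions rather than facts from the paper: the kernel size of 3, zero padding so that the feature map keeps 168 positions (chosen because it reproduces the stated 168 × 16 shape), sigmoid on both fully connected layers, the random placeholder weights, and all function names. The activations are assumed to be the usual definitions, sigmoid(x) = 1 / (1 + e^(-x)) and ReLU(x) = max(0, x).

```python
import math
import random

INPUT_LEN, N_FILTERS, OUTPUT_LEN = 168, 16, 72  # values stated in the text
KERNEL_SIZE = 3                                 # assumption: not given in this section

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def conv1d_same(seq, kernel, bias):
    """1D convolution with zero padding, so output length equals input length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(seq) + [0.0] * pad
    return [
        relu(sum(w * x for w, x in zip(kernel, padded[i:i + len(kernel)])) + bias)
        for i in range(len(seq))
    ]

def dense(inputs, weights, biases):
    """Fully connected layer with sigmoid activation."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def init_windnet(rng):
    """Random placeholder weights; a trained model would learn these values."""
    def matrix(rows, cols, scale):
        return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]
    flat_len = INPUT_LEN * N_FILTERS
    return {
        "kernels": matrix(N_FILTERS, KERNEL_SIZE, 0.5),
        "conv_bias": [0.0] * N_FILTERS,
        "fc1": (matrix(OUTPUT_LEN, flat_len, 0.01), [0.0] * OUTPUT_LEN),
        "fc2": (matrix(OUTPUT_LEN, OUTPUT_LEN, 0.1), [0.0] * OUTPUT_LEN),
    }

def windnet_forward(history, params):
    """Convolution (ReLU) -> flatten -> FC1 (sigmoid) -> FC2 (sigmoid)."""
    feature_maps = [conv1d_same(history, k, b)
                    for k, b in zip(params["kernels"], params["conv_bias"])]
    # Flatten the 168 x 16 feature map into one vector of 2688 values.
    flat = [feature_maps[f][t] for t in range(INPUT_LEN) for f in range(N_FILTERS)]
    hidden = dense(flat, *params["fc1"])
    return dense(hidden, *params["fc2"])

rng = random.Random(0)
params = init_windnet(rng)
history = [rng.random() for _ in range(INPUT_LEN)]  # stand-in for normalized hourly speeds
forecast = windnet_forward(history, params)
print(len(forecast))
```

Because the last layer uses the sigmoid, every forecast value lies in the same [0, 1] range as the normalized input and has to be mapped back to m/s with the inverse of the normalization.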

Stochastic Optimization
Deep learning often requires a large amount of time and computing resources for training, which is a major challenge for the development of deep learning algorithms. Although multi-GPU parallel training can be used to accelerate the learning process of the model, the required computing resources are not reduced. An optimization algorithm that requires fewer resources and allows the model to converge faster can fundamentally accelerate the learning process and increase the effectiveness of the machine. To improve the training performance of the deep learning model, this paper applies the adaptive moment estimation (Adam) optimizer [32], a stochastic optimization algorithm, to adjust the parameters. The Adam optimization algorithm is an extension of stochastic gradient descent (SGD) that has recently been widely used in deep learning applications, especially for tasks such as computer vision and natural language processing. Adam can replace traditional SGD processes and iteratively updates the weights of neural networks based on training data. The main formulas are presented in Equations (3)-(8).
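In the standard formulation of Adam given by Kingma et al. [32], the update rules read as follows. The assignment of the equation numbers (3)-(8) to the individual lines and the small stabilizing constant $\epsilon$ are assumptions made here, not details taken from this text.

```latex
\begin{align}
g_t &= \nabla_{\theta} f_t(\theta_{t-1}) \tag{3}\\
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t \tag{4}\\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^{2} \tag{5}\\
\hat{m}_t &= \frac{m_t}{1-\beta_1^{t}} \tag{6}\\
\hat{v}_t &= \frac{v_t}{1-\beta_2^{t}} \tag{7}\\
\theta_t &= \theta_{t-1} - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t}+\epsilon} \tag{8}
\end{align}
```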
$\alpha$ represents the step size, while $\beta_1$ and $\beta_2$ are the exponential decay rates of the moment estimates. $f(\theta)$ is the stochastic objective function, $\theta_0$ is the initial parameter vector, $m_t$ is the 1st moment vector, $v_t$ is the 2nd moment vector, $\hat{m}_t$ and $\hat{v}_t$ are the bias-corrected moment estimates, and $g_t^2$ denotes the element-wise square $g_t \odot g_t$. Kingma et al. [32] mentioned that effective initial settings are $\alpha = 0.001$, $\beta_1 = 0.9$ and $\beta_2 = 0.999$. Adam is a very popular algorithm in deep learning since it can quickly achieve excellent results. Experimental results show that the Adam algorithm performs very well in practice [32] and has great advantages compared to other stochastic optimization algorithms. The detailed operation process of Adam is presented in Algorithm 2. Initially, after the parameters $\alpha$, $\beta_1$, $\beta_2$ and the stochastic objective function $f(\theta)$ have been fixed, the following quantities are initialized: the parameter vector $\theta_0$, the 1st moment vector $m_0$, the 2nd moment vector $v_0$ and the timestep $t$. Then, as long as the parameter $\theta_t$ has not converged, each part of the loop is updated iteratively: add 1 to the timestep $t$, compute the gradient $g_t$ of the stochastic objective function with respect to the parameters at this timestep, update the biased first moment estimate $m_t$, update the biased second raw moment estimate $v_t$, calculate the bias-corrected first moment estimate $\hat{m}_t$ and the bias-corrected second raw moment estimate $\hat{v}_t$, and finally update the model parameter $\theta_t$ with the values calculated above.
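The loop described above can be sketched in plain Python as follows. This is a minimal illustration rather than the authors' implementation: the toy quadratic objective, the fixed number of steps used in place of a convergence test, the constant `eps` (the small stabilizer from [32]), the larger step size chosen for the toy run, and the function names are all assumptions.

```python
import math

def adam_minimize(grad, theta0, alpha=0.001, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=5000):
    """Minimize a function, given its gradient, with the Adam update rule."""
    theta = list(theta0)
    m = [0.0] * len(theta)  # 1st moment vector, initialized to zero
    v = [0.0] * len(theta)  # 2nd moment vector, initialized to zero
    for t in range(1, steps + 1):          # timestep t starts at 1
        g = grad(theta)                    # gradient at the current parameters
        for i, g_i in enumerate(g):
            m[i] = beta1 * m[i] + (1 - beta1) * g_i        # biased 1st moment
            v[i] = beta2 * v[i] + (1 - beta2) * g_i * g_i  # biased 2nd raw moment
            m_hat = m[i] / (1 - beta1 ** t)                # bias correction
            v_hat = v[i] / (1 - beta2 ** t)
            theta[i] -= alpha * m_hat / (math.sqrt(v_hat) + eps)
    return theta

def toy_grad(theta):
    """Gradient of the toy objective (theta_0 - 3)^2 + (theta_1 + 1)^2."""
    return [2 * (theta[0] - 3), 2 * (theta[1] + 1)]

# The step size 0.05 is chosen only to keep this toy run short.
solution = adam_minimize(toy_grad, [0.0, 0.0], alpha=0.05, steps=2000)
print(solution)
```

In the toy run the parameters approach the minimum at (3, -1); in WindNet the same update would be applied to every filter weight and fully connected weight, with the gradient supplied by backpropagation.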

Data Descriptions
This experiment used the wind speed record reported at Zuoying, Taiwan in 2016 for performance analysis. The wind speed profile is illustrated in Figure 8b. The collected data span one year and include 8784 records (366 days × 24 hourly readings). Other collected data were examined in our previous research. According to the records, the wind speed at Zuoying mostly falls in the range of 1.5 m/s to 4 m/s, but there are still many cases where the wind speed is higher than 5 m/s or even exceeds 10 m/s. Based on Figure 8a, it can be seen that Zuoying is located in an offshore area. Thus the wind speed at Zuoying is not very stable and there are often sudden peaks, which makes it difficult to predict. This is also the reason this dataset was chosen for the experiments in this paper. In this experiment, the wind speed information of the past 7 days was used to predict the wind speed of the following 3 days. The authors expected that, with this information, the machine learning model would undergo supervised learning and analysis to achieve accurate predictions.
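The preparation of supervised training pairs from such a series can be sketched as follows. The sketch assumes min-max scaling for the normalization to [0, 1] and a sliding window that advances one hour at a time; neither detail is stated in the text, and the series used here is a synthetic stand-in, not the Zuoying record. The function names are hypothetical.

```python
import math

def min_max_normalize(values):
    """Scale values linearly into [0, 1]; also return the (min, max) used."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant series
    return [(v - lo) / span for v in values], (lo, hi)

def make_windows(series, n_in=168, n_out=72, step=1):
    """Pair each 168-hour history with the 72 hours that follow it."""
    samples = []
    for start in range(0, len(series) - n_in - n_out + 1, step):
        x = series[start:start + n_in]
        y = series[start + n_in:start + n_in + n_out]
        samples.append((x, y))
    return samples

# Synthetic stand-in for one leap year of hourly readings (8784 values).
series = [3.0 + 2.0 * math.sin(2 * math.pi * h / 24) for h in range(8784)]
scaled, (lo, hi) = min_max_normalize(series)
samples = make_windows(scaled)
print(len(samples))  # 8545
```

With a one-hour step, 8784 records yield 8784 - 168 - 72 + 1 = 8545 input-target pairs; a forecast in m/s is recovered from a normalized value `y` as `lo + y * (hi - lo)`.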

Experiment Results
In this experiment, the mean absolute error (MAE) and root-mean-square error (RMSE) were applied as evaluation indicators (the formulas are shown in Equations (9) and (10), respectively). The test results of the various algorithms are illustrated in Figures 9-13. Figure 14 presents a comprehensive comparison of all algorithms. In actual situations, the forecasting model can only use past experience to predict future wind speed. Once model training is completed, the data used for real forecasting will be information that the trained model has never encountered before. Therefore, to reflect real situations, the testing data did not participate in the model training process, and the experimental results reported here are based on the testing data. Figure 14 shows that each algorithm can roughly capture the trends of future wind speeds, but the prediction results of SVM and DT are relatively unstable. Compared with SVM and DT, RF, MLP and WindNet are more stable. In particular, the wind speed forecasted by WindNet is similar to the real conditions (as presented by the blue line). This confirms that WindNet is very effective and accurate in wind speed forecasting.
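Assuming that Equations (9) and (10) follow the standard definitions, the MAE is the mean of the absolute differences between actual and forecasted values, and the RMSE is the square root of the mean squared difference. A minimal sketch with made-up numbers (not results from the paper) is given below.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: average magnitude of the forecast errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root-mean-square error: penalizes large errors more strongly than MAE."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

actual = [2.0, 3.0, 4.0, 5.0]      # illustrative wind speeds in m/s
predicted = [2.5, 3.0, 3.0, 7.0]   # illustrative forecasts
print(mae(actual, predicted))      # 0.875
print(rmse(actual, predicted))
```

The RMSE is never smaller than the MAE on the same data, and the gap between the two grows when a few forecasts miss by a wide margin, as with the sudden wind speed peaks described for this site.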

Discussion
The choice of parameters is also an important issue during the training process. With the parameter setting of WindNet shown in Table 3, the performance of the proposed WindNet can be clearly demonstrated. Figure 15 presents the detailed comparison result of each model. The thick blue line corresponds to the actual data, and the lines of other colors are the prediction results of the various algorithms. The blue box in Figure 15 indicates that the prediction result of DT barely coincides with the actual data, and the trend forecasted by SVM does not coincide with that of the actual data either. Among all algorithms, RF, MLP and WindNet still perform better. The green box in Figure 15 shows that when the wind speed is about to decrease, many algorithms struggle to grasp the trend; DT and SVM in particular misjudge it. Even RF cannot forecast the data accurately and produces relatively disordered predictions. This shows that wind speed forecasting still involves a certain degree of difficulty. However, even where many algorithms cannot predict the data accurately, MLP and WindNet can still supply stable forecast results. Overall, the performances of RF and MLP are stable and accurate; nevertheless, WindNet produced the most accurate results. The ability of WindNet for wind speed forecasting is therefore supported by this experiment.
Energies 2018, 11, x FOR PEER REVIEW 6 of 21

training session. Following the MLP training, the calculation result can be output through layer-by-layer transfer.

Figure 4 diagram labels: Input, Hidden layer, Output.

Figure 4. The architecture of the fully connected neural network.

Figure 6. The architecture of the proposed WindNet model.

In WindNet, two activation functions were used: sigmoid and Rectified Linear Unit (ReLU). The related formulas are presented in Equations (1) and (2). In the 1D convolution layers, WindNet uses ReLU as the activation function to mitigate the vanishing gradient problem. In the fully connected architecture in

Algorithm 1. The algorithm of the proposed WindNet.
1: Loading the data
2: Data normalization
3: Partition the data into training data and testing data
4

Figure 7. The activation functions: (a) Sigmoid; (b) ReLU.

As for the implementation, the wind speed database is read first and the data are normalized, which limits the value range to [0, 1]. Subsequently, the data are divided into training and testing data. The training data are used to train the model, while the testing data are not used during the training process. Next, the WindNet model is constructed and initialized. In each training epoch, to reduce data interdependency, the training data sequences are shuffled and split into several batches for training. In the WindNet architecture, the batch size equals 32, which means that each batch contains 32 samples. After training is completed, WindNet uses the testing data for performance evaluation. The procedure of the proposed WindNet is presented in Algorithm 1.
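The data-handling steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: min-max scaling is assumed as the normalization method, the split is assumed to be chronological, and the window length of 24 inputs, the series length, and all function names are hypothetical. A synthetic series stands in for the wind speed database. Following the order given in the text, normalization is applied before the data are partitioned.

```python
import numpy as np


def min_max_normalize(series):
    """Scale a 1-D series into [0, 1] (min-max scaling is an assumption)."""
    lo, hi = series.min(), series.max()
    return (series - lo) / (hi - lo)


def split_ordered(series, n_train):
    """Chronological split: the first n_train points train, the rest test."""
    return series[:n_train], series[n_train:]


def make_windows(series, n_inputs):
    """Build (past window, next value) pairs from a 1-D series."""
    X = np.stack([series[i:i + n_inputs] for i in range(len(series) - n_inputs)])
    y = series[n_inputs:]
    return X, y


def shuffled_batches(X, y, batch_size, rng):
    """Yield mini-batches in shuffled order; the last batch may be smaller."""
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]


rng = np.random.default_rng(0)
wind = rng.uniform(0.0, 12.0, size=300)   # synthetic stand-in, not the real data

scaled = min_max_normalize(wind)                 # step 2: normalization to [0, 1]
train, test = split_ordered(scaled, n_train=200) # step 3: train/test partition
X_train, y_train = make_windows(train, n_inputs=24)  # hypothetical window length

# One training epoch: reshuffle the samples and iterate in batches of 32.
batches = list(shuffled_batches(X_train, y_train, batch_size=32, rng=rng))
print(len(batches), batches[0][0].shape)
```

Calling `shuffled_batches` again at the start of each epoch produces a new permutation, so consecutive samples in the original sequence do not always end up in the same batch.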

Figure 8. The information of (a) the position and (b) wind speed record in Zuoying, Taiwan.

Figure 9. The forecasting results of SVM: (a) Partial results A; (b) Partial results B; (c) Partial results C; (d) Partial results D; (e) Partial results E; (f) Partial results F.

To increase the reliability of the test, the authors extracted 11 segments from the dataset, each containing two months of training data and one month of testing data, and conducted model training and testing on each of these 11 segments. The test results are presented in Table 1 (MAE) and Table 2 (RMSE). Ranked by MAE from lowest to highest, the algorithms are: WindNet (0.800227), RF (0.831981), MLP (0.833486), DT (0.955564), SVM (0.967744). Ranked by RMSE from lowest to highest, they are: WindNet (0.999978), MLP (1.022898), RF (1.030018), SVM (1.198929), DT (1.203666). According to these results, SVM and DT performed poorly compared with the other algorithms, and their MAE and RMSE values are close to each other: by MAE, DT outperforms SVM, but by RMSE, SVM outperforms DT. A similar situation occurs for MLP and RF: by MAE, RF performs better than MLP, but by RMSE, MLP outperforms RF. Although RF and MLP perform well, WindNet, the model proposed in this article, leads in both MAE and RMSE.
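The ranking comparison above can be reproduced directly from the values quoted in the text. The short sketch below sorts the algorithms by each indicator and lists the pairs whose order differs between MAE and RMSE; the variable and function names are illustrative.

```python
# MAE and RMSE values as quoted in the text.
reported_mae = {"WindNet": 0.800227, "RF": 0.831981, "MLP": 0.833486,
                "DT": 0.955564, "SVM": 0.967744}
reported_rmse = {"WindNet": 0.999978, "MLP": 1.022898, "RF": 1.030018,
                 "SVM": 1.198929, "DT": 1.203666}


def rank(scores):
    """Return algorithm names ordered from lowest to highest error."""
    return sorted(scores, key=scores.get)


mae_rank = rank(reported_mae)
rmse_rank = rank(reported_rmse)

# Pairs (a, b) where a is ahead of b by MAE but behind b by RMSE.
flips = [(a, b)
         for i, a in enumerate(mae_rank)
         for b in mae_rank[i + 1:]
         if rmse_rank.index(a) > rmse_rank.index(b)]

print("MAE ranking :", mae_rank)
print("RMSE ranking:", rmse_rank)
print("Order changes between indicators:", flips)
```

Only the RF/MLP and DT/SVM pairs change order between the two indicators, while WindNet is first under both.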

Figure 12. The forecasting results of MLP: (a) Partial results A; (b) Partial results B; (c) Partial results C; (d) Partial results D; (e) Partial results E; (f) Partial results F.

Figure 13. The forecasting results of the proposed WindNet: (a) Partial results A; (b) Partial results B; (c) Partial results C; (d) Partial results D; (e) Partial results E; (f) Partial results F.

Figure 14. The comparisons of all the forecasting results: (a) Partial results A; (b) Partial results B; (c) Partial results C; (d) Partial results D; (e) Partial results E; (f) Partial results F.

Table 1. The experimental results in terms of mean absolute error (MAE).

Table 2. The experimental results in terms of root mean square error (RMSE).

Table 3. The parameter setting of WindNet.