A Fuzzy Seasonal Long Short-Term Memory Network for Wind Power Forecasting

To protect the environment and achieve the Sustainable Development Goals (SDGs), governments worldwide have actively promoted the reduction of greenhouse gas emissions. Clean energy, such as wind power, has therefore become a very important topic for governments around the world. However, accurately forecasting wind power output is not a straightforward task. The present study develops a fuzzy seasonal long short-term memory network (FSLSTM), which combines a fuzzy decomposition method with a long short-term memory network (LSTM), to forecast a monthly wind power output dataset. LSTM technology has been successfully applied to forecasting problems, especially time series problems. This study is the first to incorporate the fuzzy seasonal index into a fuzzy LSTM model, which effectively extends traditional LSTM technology. The FSLSTM, LSTM, autoregressive integrated moving average (ARIMA), generalized regression neural network (GRNN), back propagation neural network (BPNN), least squares support vector regression (LSSVR), and seasonal autoregressive integrated moving average (SARIMA) models are then used to forecast monthly wind power output datasets in Taiwan. The empirical results indicate that the FSLSTM achieves better forecasting accuracy than the other methods. Therefore, the FSLSTM can efficiently provide credible prediction values for Taiwan's wind power output datasets.


Introduction
Wind power generation uses the flow of air to drive wind turbines and is increasingly replacing conventional power generation. In 2000, to protect the environment, the Taiwanese government began actively promoting clean energy to reduce the greenhouse gas emissions of traditional methods such as thermal power generation. The Taiwanese government has very high expectations for wind power generation. The government has developed an offshore wind power facility, the main goal of which is to generate enough electricity so that renewable energy can replace nuclear power generation. The vision of the Taiwanese government is to build a strong support industry by manufacturing the necessary wind turbine components, towers, and underwater cables for coastal engineering; by building underwater foundation piles; and by installing generators several miles offshore. According to statistics from the International Energy Agency, offshore wind power currently accounts for only 0.3% of global power generation, but experts expect wind power generation to grow rapidly over the next 20 years, representing a business opportunity of up to one trillion US dollars. Accurate wind power forecasting is therefore a very important issue that can help governments engage in effective policy planning.

In recent years, many studies have investigated wind speed and power forecasting and adopted various prediction models to improve wind power generation forecasting. Lu et al. [1,2] used the Takagi-Sugeno fuzzy model to predict the power load in Thailand, and the prediction accuracy of the two models was compared. The results showed that LSTM offers better predictive ability. Fan et al. [20] developed an integrated method that combined the ARIMA model with the LSTM model. Shahid et al. [21] developed a novel genetic long short-term memory (GLSTM) method, which improved wind power prediction by 6% to 30% compared with existing techniques.
Zhang et al. [22] developed a convolutional neural network model based on a deep factorization machine and attention mechanism (FA-CNN). The results indicated that FA-CNN outperformed the traditional LSTM.

Table 1. Long short-term memory (LSTM) applications since 2015.

| Author | Applied Field | Methodology | Compared Methodology |
|---|---|---|---|
| Tian and Pan [15] | Traffic | LSTM | SVR, Random Walk (RW), Fuzzy Neural Network (FNN), and Stacked Autoencoder (SAE) |
| Liu et al. [16] | Electroencephalography | LSTM | RNN |
| Janardhanan and Barrett [17] | CPU usage of Google's data center | LSTM | ARIMA |
| Siami-Namini et al. [18] | Financial and economic indexes | LSTM | ARIMA |
| Phyo et al. [19] | Power load | LSTM | Deep Belief Network (DBN) |
| Fan et al. [20] | Production forecasting | Integrated ARIMA and LSTM model | ARIMA |
| Shahid et al. [21] | Wind power | GLSTM | LSTM |
| Zhang et al. [22] | Stock price movement prediction | FA-CNN | LSTM |

In this study, the prediction model adopts three LSTMs with fuzzy seasonal indexes to estimate the lower bound, mode, and upper bound of the fuzzy forecast, respectively. This is a novel prediction model for wind power output forecasting. The rest of this paper is organized as follows. Section 2 introduces the fuzzy seasonal LSTM (FSLSTM) in detail, including fuzzy seasonal decomposition and fuzzy LSTM technology. Section 3 presents the experimental results of the FSLSTM for wind power output prediction. Finally, we draw conclusions and make suggestions for future research in Section 4.

Fuzzy Seasonal LSTM for Wind Power Output
In this study, the wind power output dataset is examined. The dataset is divided into training, validation, and testing sets. First, the fuzzy seasonal index is calculated by seasonal trend decomposition. This method defines the fuzzy seasonal membership function from the time series, and the fuzzy trend dataset is then estimated using the multiplicative model. LSTM is employed to predict the fuzzy trend datasets for the upper bound, lower bound, and mode values. Based on the fuzzy LSTM and the fuzzy seasonal index, the final forecasts are obtained and evaluated with the accuracy measures. A flowchart of the fuzzy seasonal LSTM for wind power output is shown in Figure 1.
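The workflow described above can be sketched as a skeleton. It is hypothetical: the function name `fsl_pipeline` is illustrative, each bound's trend series is assumed to be obtained by dividing the data by the matching component of the fuzzy seasonal index, and `forecaster` is a placeholder for the LSTM trained on that bound, not the authors' network.

```python
import numpy as np

def fsl_pipeline(series, fuzzy_index, horizon, forecaster):
    """Hypothetical skeleton of the FSLSTM workflow (illustrative, not the authors' code).

    series      : 1-D array of monthly outputs whose first value falls in season 1
    fuzzy_index : array of shape (12, 3) holding (lower, mode, upper) seasonal indices
    horizon     : number of future months to forecast
    forecaster  : callable(trend, horizon) -> array of length horizon; a stand-in
                  for the LSTM trained on one fuzzy trend series
    Returns an array of shape (horizon, 3): lower, mode, and upper forecasts.
    """
    series = np.asarray(series, dtype=float)
    past_season = np.arange(len(series)) % 12                        # season of each observation
    future_season = np.arange(len(series), len(series) + horizon) % 12
    forecasts = np.empty((horizon, 3))
    for j in range(3):                                               # 0 = lower, 1 = mode, 2 = upper
        trend = series / fuzzy_index[past_season, j]                 # multiplicative deseasonalization
        trend_hat = np.asarray(forecaster(trend, horizon), dtype=float)
        forecasts[:, j] = trend_hat * fuzzy_index[future_season, j]  # reseasonalize
    return forecasts
```

For a quick check, a trivial stand-in such as `lambda trend, h: np.full(h, trend[-1])` can replace the LSTM; in the actual model, one trained LSTM per bound would take that place.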

Fuzzy Seasonal Decomposition
Chang [23] proposed the fuzzy seasonality index $S_k^*$, which is defined as possessing a triangular membership function, from the seasonality index set. Chang [23] determined $S_k^*$ in terms of $s_k^L$, $s_k^M$, and $s_k^U$, which are the W-period lower bound, the W-period smoothing operator ($1 \le W \le T$), and the W-period upper bound, respectively.
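Because the defining equation is not available here, the following is only a sketch of how a triangular seasonality index could be built from a seasonality index set. It assumes (none of this is taken from the paper) that each seasonal ratio is an observation divided by the mean of its own 12-month cycle, and that the lower bound, mode, and upper bound are the minimum, mean, and maximum of the ratios observed for that season; the W-period smoothing operator is simplified to an average over all complete cycles.

```python
import numpy as np

def fuzzy_seasonal_index(series, period=12):
    """Triangular fuzzy seasonality index (lower, mode, upper) for each season.

    Assumptions (illustrative, not from the paper): the series starts at season 1
    and covers whole cycles; each seasonal ratio is the observation divided by the
    mean of its own cycle; lower, mode, and upper are the min, mean, and max of the
    ratios observed for that season.
    """
    x = np.asarray(series, dtype=float)
    if x.size % period != 0:
        raise ValueError("series length must be a multiple of the period")
    cycles = x.reshape(-1, period)                          # one row per cycle (year)
    ratios = cycles / cycles.mean(axis=1, keepdims=True)    # seasonality index set
    lower = ratios.min(axis=0)
    mode = ratios.mean(axis=0)
    upper = ratios.max(axis=0)
    return np.column_stack([lower, mode, upper])            # shape (period, 3)
```

With only 24 training months, each season contributes two ratios, so the spread between the lower and upper bounds reflects the difference between the two years.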
Reducing seasonality is a very important step in time series prediction, and the fuzzy seasonality index has been shown to perform well in time series predictions. Therefore, this study proposes a novel fuzzy seasonal LSTM that uses the fuzzy seasonality index and a decomposition method to solve the seasonal time series problem, employing the multiplicative model. Moreover, the IFLR with a spread unrestricted model is combined with symmetrical triangular fuzzy numbers (FNs) for forecasting and can obtain an accurate estimate using Equation (3). Accordingly, in the proposed model, a multiplicative model combines the fuzzy trend estimates with the fuzzy seasonality index to obtain the forecast as an FN. In this model, $F_{k+(T+v)}$ represents the fuzzy seasonal LSTM forecast value, $f^{LTr}_{k+(T+v)}$ is the lower-bound estimated value of the trend, $f^{MTr}_{k+(T+v)}$ is the mode estimated value of the trend, $f^{UTr}_{k+(T+v)}$ is the upper-bound estimated value of the trend, and $\varepsilon$ is the model noise. The proposed fuzzy seasonal LSTM model can effectively use fuzzy seasonal decomposition to reduce seasonal effects in time series problems.
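The multiplicative equation itself did not survive extraction. One form consistent with the symbols defined above, assuming the usual component-wise approximation for the product of positive triangular fuzzy numbers, is shown below; it is a reconstruction under that assumption, not a quotation of the authors' equation.

```latex
S_k^{*} = \left(s_k^{L},\; s_k^{M},\; s_k^{U}\right)

F_{k+(T+v)} = \left(f^{LTr}_{k+(T+v)}\, s_k^{L},\;
                    f^{MTr}_{k+(T+v)}\, s_k^{M},\;
                    f^{UTr}_{k+(T+v)}\, s_k^{U}\right) + \varepsilon
```

Read this way, each LSTM forecasts one deseasonalized trend component, and the matching component of the fuzzy seasonality index restores the seasonal pattern.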

Fuzzy Seasonal LSTM Model
In the fuzzy seasonal LSTM model, the lower bound, upper bound, and mode of the fuzzy trend must each be trained. Therefore, the fuzzy long-term state $c = (c_{Li}, c_{Mi}, c_{Ui})$ and the fuzzy output gate $o = (o_{Li}, o_{Mi}, o_{Ui})$ are estimated. A fully connected fuzzy LSTM unit contains four layers, as in the traditional LSTM, and the fuzzy input vector $x_i$ and the previous fuzzy short-term state $h_{i-1}$ are fed into these four layers (Figure 2). The layer $g_i$ is the main layer of the fuzzy LSTM; it uses the tanh activation function, and its fuzzy output is stored in the fuzzy long-term state $c_i$. The other three layers use logistic (sigmoid) activation functions, so their outputs range from 0 to 1. The fuzzy forget gate $f_i$ controls which parts of the long-term state should be erased, the fuzzy input gate $i_i$ determines which parts of the fuzzy input should be added, and the fuzzy output gate $o_i$ controls which parts of the fuzzy long-term state should be read and output at this time step.

Mathematics 2021, 9, 1178

The operations of the fuzzy input gate, fuzzy forget gate, output gate, and neuron (cell) input are each defined by the fuzzy weight matrices of the corresponding layer for its connection with the fuzzy input vector, the fuzzy weight matrices connected to the fuzzy short-term state, and the bias term of each of the four layers. Here, tanh is the hyperbolic tangent function, $\tanh(x) = (e^{x} - e^{-x})/(e^{x} + e^{-x})$, and $\sigma$ is the sigmoid function, $\sigma(x) = 1/(1 + e^{-x})$.
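The gate equations did not survive extraction. As a hedged sketch, the code below assumes that each fuzzy component (L, M, U) is handled by its own standard LSTM cell, in line with the "three LSTMs" described in the introduction: $i = \sigma(W_{xi}x + W_{hi}h_{i-1} + b_i)$, and likewise for $f$ and $o$; $g = \tanh(W_{xg}x + W_{hg}h_{i-1} + b_g)$; $c_i = f \odot c_{i-1} + i \odot g$; and $h_i = o \odot \tanh(c_i)$. The stacked weight layout (blocks ordered i, f, o, g) and the function names are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One standard LSTM step.

    p["Wx"]: (n_in, 4n) input weights, p["Wh"]: (n, 4n) recurrent weights,
    p["b"]: (4n,) biases; the four column blocks are ordered i, f, o, g (assumed layout).
    """
    z = x @ p["Wx"] + h_prev @ p["Wh"] + p["b"]
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c_prev + i * g          # long-term state: keep part of c_prev, add gated input
    h = o * np.tanh(c)              # short-term state: gated read-out of the long-term state
    return h, c

def fuzzy_lstm_step(x, h_prev, c_prev, params):
    """Hypothetical fuzzy step: one independent LSTM cell per component L, M, U."""
    return {comp: lstm_step(x[comp], h_prev[comp], c_prev[comp], params[comp])
            for comp in ("L", "M", "U")}
```

Treating the three components independently keeps the sketch simple; it does not enforce the ordering lower ≤ mode ≤ upper, which would need an extra constraint or post-processing step.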
Finally, the fuzzy long-term state and the fuzzy short-term state are calculated from the gate outputs. Moreover, the FSLSTM adopts the adaptive moment estimation (Adam) optimization algorithm of Kingma and Ba [24], a stochastic optimization method, to search for suitable FSLSTM parameters. Kingma and Ba showed empirically that the convergence of Adam matches the expectations of their theoretical analysis. The proposed FSLSTM achieves robust performance with the Adam optimization algorithm. The maximum number of epochs, initial learning rate, gradient threshold, learning rate drop period, and learning rate drop factor of the FSLSTM are 250, 0.005, 1, 125, and 0.2, respectively.
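The training settings listed above can be read as a step-decay learning-rate schedule with gradient clipping, combined with the Adam update of Kingma and Ba. The sketch below rests on three assumptions the paper does not state: epochs are counted from 1 and the rate is multiplied by 0.2 after every 125 epochs; the gradient threshold of 1 is applied to the L2 norm of the gradient; and Adam uses the default $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$ from Kingma and Ba.

```python
import numpy as np

def learning_rate(epoch, base=0.005, drop_period=125, drop_factor=0.2):
    """Step decay (assumed semantics): epochs 1-125 use 0.005, epochs 126-250 use 0.001."""
    return base * drop_factor ** ((epoch - 1) // drop_period)

def clip_by_l2_norm(grad, threshold=1.0):
    """Rescale the gradient when its L2 norm exceeds the threshold (one common clipping rule)."""
    norm = np.linalg.norm(grad)
    return grad * (threshold / norm) if norm > threshold else grad

def adam_update(theta, grad, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step; t is the 1-based iteration count used for bias correction."""
    m = beta1 * m + (1 - beta1) * grad            # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2       # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1 ** t)                  # bias-corrected moments
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

On the first iteration the bias correction makes the step size roughly equal to the learning rate in each coordinate, which is one reason Adam is relatively insensitive to the raw gradient scale.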

A Wind Power Output Example and Empirical Results
Energy conservation and carbon reduction are very important management issues for the global power industry. To address global warming and to comply with the government's Sustainable Energy Guidelines, the Taiwanese government is actively promoting the use of clean energy. The Taipower company has built 17 wind power stations in Taiwan, which record monthly data on total power output. All experimental data can be downloaded from the National Development Council in Taiwan (https://data.gov.tw (accessed on 21 July 2020)). In this study, we selected three wind power stations: the Shimen, Taichung, and Mailiao wind power plants. Figure 3 and Table 2 depict the monthly generated output power (units: kilowatt-hours) of these stations from January 2017 to June 2020 (42 months in total). The monthly data were divided into three sets: first, a training set, from January 2017 to December 2018 (24 samples), was used to determine the optimum forecasting model; second, a validation set, from January 2019 to December 2019 (12 samples), was used to prevent overfitting of the different models; finally, a testing set, from January 2020 to June 2020 (6 samples), was used to investigate the performance of the different models. The training, validation, and testing sets therefore account for 57%, 29%, and 14% of the data, respectively. Table 3 lists the fuzzy seasonality index, with k ranging from 1 to 12, for the selected wind power stations.
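The chronological split can be sketched as follows, using the month boundaries stated above. The function name and its defaults are illustrative, and the series passed in is a placeholder rather than the Taipower data.

```python
import numpy as np

def chronological_split(values, start="2017-01", train_end="2019-01", valid_end="2020-01"):
    """Split a monthly series into training/validation/testing sets by date, without shuffling."""
    values = np.asarray(values)
    months = np.datetime64(start, "M") + np.arange(len(values))   # one month per observation
    train = months < np.datetime64(train_end, "M")
    valid = (months >= np.datetime64(train_end, "M")) & (months < np.datetime64(valid_end, "M"))
    test = months >= np.datetime64(valid_end, "M")
    return values[train], values[valid], values[test]
```

Applied to 42 monthly values starting in January 2017, this yields 24, 12, and 6 samples, that is, about 57%, 29%, and 14% of the data. Keeping the split chronological prevents future months from leaking into model fitting.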
In addition, the mean absolute percentage error, MAPE(%), was used to measure forecasting accuracy. Equation (12) gives the expression of MAPE(%):

$$\mathrm{MAPE}(\%) = \frac{100}{M}\sum_{i=1}^{M}\left|\frac{A_i - P_i}{A_i}\right| \qquad (12)$$

where M is the number of forecasting periods, $A_i$ is the actual production value at period i, and $P_i$ is the forecast production value at period i. Moreover, the root mean square error (RMSE) is employed to evaluate the training error of the FSLSTM; using the same notation over the training periods, it can be expressed as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{M}\sum_{i=1}^{M}\left(A_i - P_i\right)^{2}}$$

Figure 4 shows the training error of the FSLSTM, trained with the Adam algorithm, at the three wind power stations. The FSLSTM obtains a low RMSE training error (below 0.2) at all three stations.

In this study, the FSLSTM, LSTM [8], ARIMA(1, 0, 0) [25], generalized regression neural network (GRNN) [26], back propagation neural network (BPNN) [27], least squares support vector regression (LSSVR) [28], and seasonal autoregressive integrated moving average (SARIMA(1, 0, 0)(1, 0, 0)_12) [25] models were used to forecast the monthly wind power output datasets of the selected stations in Taiwan. The construction of the LSTM is similar to that of the FSLSTM (see Section 2.1), and the LSTM network also adopted the Adam optimization algorithm to search for optimal parameters. The ARIMA model is similar to SARIMA (see Appendix B), differing only in the seasonal parameters. The construction of the GRNN and its parameter (σ) are described in Appendix A; σ was set to 1. The BPNN, a well-known intelligent computing model, was also adopted for comparison. In the BPNN, the input layer has one input neuron to receive the input patterns, the hidden layer has ten neurons to propagate the intermediate signals, and the output layer has one neuron.
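The two accuracy measures defined above can be computed as follows. This is a minimal sketch; MAPE assumes that no actual value is zero, which holds for monthly power output.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs((a - p) / a))

def rmse(actual, predicted):
    """Root mean square error, in the units of the data."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((a - p) ** 2)))
```

For example, actual values of 100 and 200 with forecasts of 110 and 180 give a 10% error in each period, so the MAPE is 10%. Because MAPE is scale-free, it allows comparison across the three stations, whereas RMSE depends on the units of the series.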
To train the BPNN, the hyperbolic tangent sigmoid function is used as the activation function in the hidden layer, a linear (pure-line) transfer function is used in the output layer, and gradient-descent training is adopted as the learning algorithm. The LSSVR is a popular prediction model for time series problems; for its main constructs, readers can refer to [28]. In the experiment, the regularization parameter was set to 1, and a radial basis function (RBF) kernel with parameter σ = 0.01 was employed. Table 4 lists the training errors of the various prediction models. The proposed FSLSTM, LSTM, and GRNN approaches obtained lower training errors, indicating that these three approaches fit the training data better.

Table 5 lists the actual values and the experimental results of the FSLSTM model with the mode (M) and upper (U) and lower (L) bounds from January 2020 to June 2020 for the Shimen wind power plant. Figure 5a makes a point-to-point comparison of the actual values and the values predicted by the FSLSTM. As shown in Figure 4, the peak power output was in April 2020, which was not easily observed in the training dataset. Table 5 (see also Figure 5b) shows the experimental results and MAPE(%) obtained by the various models. The ranking of MAPE(%) is as follows: GRNN < FSLSTM-L < FSLSTM-M < LSTM < ARIMA < SARIMA < FSLSTM-U < BPNN < LSSVR. Table 5 indicates that the GRNN obtained the smallest MAPE(%) and thus the best performance by this measure. However, Figure 5 shows that the GRNN predictions could not capture the trend of power output at the Shimen wind power plant. The FSLSTM-L model was able to capture the trends of the data efficiently by using the fuzzy seasonal index, although its MAPE(%) was higher than that of the GRNN in this example.
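The LSSVR baseline above is described only by reference to [28] and its two parameters. A minimal sketch of standard least squares support vector regression follows, assuming the common dual formulation in which the bias $b$ and coefficients $\alpha$ solve $\begin{bmatrix} 0 & \mathbf{1}^T \\ \mathbf{1} & K + I/\gamma \end{bmatrix}\begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}$ and predictions are $\sum_i \alpha_i K(x, x_i) + b$. The defaults $\gamma = 1$ and $\sigma = 0.01$ follow the experiment, but the kernel scaling used here, $\exp(-\lVert x - x' \rVert^2 / (2\sigma^2))$, is an assumption because conventions differ between toolboxes.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian RBF kernel matrix K[i, j] = exp(-||A_i - B_j||^2 / (2 sigma^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvr_fit(X, y, gamma=1.0, sigma=0.01):
    """Solve the LS-SVR dual linear system; returns the bias b and coefficients alpha."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                                   # constraint: sum(alpha) = 0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]

def lssvr_predict(X_train, b, alpha, X_new, sigma=0.01):
    """Evaluate sum_i alpha_i K(x, x_i) + b for each row x of X_new."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b
```

Unlike a standard SVR, every training point receives a non-zero coefficient, and the training residuals equal $\alpha_i / \gamma$; a small γ therefore trades training fit for smoothness.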
Thus, the proposed FSLSTM model is suggested as a prediction model for power output at the Shimen wind power plant.

Figure 6. Illustration of the actual and forecasting power outputs of various models for the Taichung wind power plant.

Table 6 lists the actual values and the experimental results of the proposed model for the Taichung wind power plant (see also Figure 6a), together with the MAPE(%) obtained by the various models. The ranking of MAPE(%) is FSLSTM-M < SARIMA < FSLSTM-U < GRNN < LSTM < LSSVR < FSLSTM-L < BPNN < ARIMA. Table 6 indicates that the FSLSTM-M obtained the smallest MAPE(%), which means that it achieved the best performance in this example. Moreover, Figure 6 shows that the FSLSTM-M predictions were able to capture the trend of power output at the Taichung wind power plant.
Moreover, the two seasonal models, FSLSTM-M and SARIMA, obtained better performance than the other models, possibly because the power output at the Taichung wind power plant is influenced by seasonality. Thus, the proposed FSLSTM model is also suggested as a prediction model for power output at the Taichung wind power plant.

Table 7 shows the experimental results and MAPE(%) obtained using the various models for the Mailiao wind power plant. The ranking of MAPE(%) is FSLSTM-M < SARIMA < FSLSTM-U < GRNN < LSTM < LSSVR < FSLSTM-L < BPNN < ARIMA. Table 7 indicates that the FSLSTM-M obtained the smallest MAPE(%), which means that it achieved the best performance in this example. Both seasonal models, FSLSTM-M and SARIMA, performed better than the other models and were able to capture the trend of power output at the Mailiao wind power plant, possibly for the same reason as at the Taichung plant. The Mailiao region is also geographically close to the Taichung region in Taiwan, which may explain why the MAPE(%) rankings for the two plants are identical. Again, the proposed FSLSTM model is suggested as a prediction model for wind power output at the Mailiao wind power plant.
By reviewing the three forecasting examples, the following findings can be drawn: (1) The FSLSTM model can efficiently handle seasonal influence; in the Taichung and Mailiao regions, the seasonal influence on wind power output can be observed. (2) In all examples, the FSLSTM model obtained better performance and captured the trends of wind power output more accurately, which was not observed for the traditional LSTM in the three examples. (3) For the three different wind power output series, the FSLSTM-M model performed better than almost all other models. The FSLSTM-M model is thus recommended as a prediction model for wind power output.

Managerial Implications
In this study, a wind power forecasting system is implemented for the government. The government can use the forecasts of monthly wind power output to reduce the risk of an insufficient power supply, because the system can give decision-makers an early warning of such shortfalls. Since the government's offshore wind power facility aims to generate enough electricity for renewable energy to replace nuclear power generation, accurate predictions of wind power output are especially valuable. With these early warnings, decision-makers in the government can plan properly and avoid the risk of an insufficient power supply.

Conclusions
Because of the Sustainable Development Goals, wind power prediction has become increasingly crucial in Taiwan. LSTM models have been successfully used in time series forecasting problems; however, they have not been widely explored for seasonal time series prediction. This study developed a novel FSLSTM model that exploits the unique strengths of the fuzzy seasonal index and the LSTM technique to predict wind power output in Taiwan. In all examples, the FSLSTM model obtained better performance and captured the trends of wind power output more accurately, which was not observed for the traditional LSTM in the three examples. Taken together, the results indicate that the FSLSTM model is a promising alternative for analyzing wind power output in Taiwan. The superior performance of the FSLSTM model can be ascribed to two causes: first, the FSLSTM inherits the advantages of LSTM and can effectively capture time series dynamics through its recurrent neural network mechanism; second, the fuzzy decomposition method enhances the ability of the FSLSTM model to capture seasonal nonlinear data patterns in an uncertain environment. A limitation of the FSLSTM model is that it is suitable only for data with strong monthly seasonal patterns. Forecasting other types of time series data using LSTM-related models remains a challenging issue for future studies. Future research could also use data preprocessing techniques to improve the forecasting accuracy of the FSLSTM model for seasonal time series data, and the parameters of the FSLSTM could be tuned by a heuristic search algorithm to improve its performance.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Generalized Regression Neural Network (GRNN)
The GRNN is based on nonlinear regression theory, a well-established statistical technique for function estimation. By definition, the regression of a dependent variable y on an independent variable x estimates the most probable value of y, given x and a training set. The training set consists of values of x, each with a corresponding value of y (in general, x and y are vectors). Although y may be corrupted by additive noise, the regression method produces the estimate of y that minimizes the mean-squared error. The GRNN is, in essence, a method for estimating the joint density f(x, y) given only a training set. Because the probability density function is derived from the data with no preconceptions about its form, the system is perfectly general: there is no difficulty if the function is composed of multiple disjoint non-Gaussian regions in any number of dimensions, as well as simpler distributions. The output $\hat{y}_j$ is estimated optimally as follows:

$$\hat{y}_j = \frac{\sum_{i=1}^{n} w_{ij}\, h_i}{\sum_{i=1}^{n} h_i} \qquad \text{(A1)}$$

where $w_{ij}$ is the target output corresponding to training vector $u_i$ and output $j$; $h_i = \exp\left(-D_i^2/(2\sigma^2)\right)$ is the output of hidden-layer neuron $i$; $D_i^2 = (x - u_i)^{T}(x - u_i)$ is the squared distance between the input vector $x$ and the training vector $u_i$; $x$ is the input vector (a column vector); $u_i$ is training vector $i$, the center of neuron $i$ (a column vector); $n$ is the number of training vectors; and $\sigma$ is a constant controlling the size of the receptive region. Equation (A1) is a normalized radial basis function. However, the GRNN differs from a standard radial basis network (RBN) in that the target values are used as the weights of the output layer.
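Equation (A1) can be sketched in NumPy as follows. Variable names mirror the appendix (U holds the training vectors $u_i$ as rows, W the targets $w_{ij}$); the function name is illustrative, and the row-wise shift of the squared distances is a numerical safeguard added in this sketch, not part of the original description. The experiments used σ = 1.

```python
import numpy as np

def grnn_predict(U, W, X, sigma=1.0):
    """GRNN output of Equation (A1): a kernel-weighted average of the training targets.

    U : (n, d) training vectors u_i (neuron centers)
    W : (n, m) target outputs w_ij
    X : (q, d) query vectors x
    Returns a (q, m) array of estimates y_hat.
    """
    U, W, X = (np.asarray(a, dtype=float) for a in (U, W, X))
    D2 = ((X[:, None, :] - U[None, :, :]) ** 2).sum(axis=-1)   # (q, n) squared distances D_i^2
    # Subtracting each row's minimum scales all h_i in that row by the same factor, which
    # cancels in the ratio of (A1) but prevents 0/0 when sigma is very small.
    D2 = D2 - D2.min(axis=1, keepdims=True)
    H = np.exp(-D2 / (2.0 * sigma ** 2))                          # hidden-layer outputs h_i
    return (H @ W) / H.sum(axis=1, keepdims=True)
```

As σ shrinks, the estimate approaches the target of the nearest training vector; as σ grows, it approaches the mean of all targets. Every estimate therefore lies within the range of the training targets.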

Appendix B. Seasonal Autoregressive Integrated Moving Average Model (SARIMA)
The SARIMA model is a popular tool for forecasting time series with a seasonal pattern. A SARIMA(p, d, q) × (P, D, Q)_S process generates a time series $\{X_t, t = 1, 2, \ldots, N\}$ with mean $\mu$ satisfying the Box and Jenkins model

$$\phi(B)\,\Phi(B^S)\,(1 - B)^{d}(1 - B^S)^{D}(X_t - \mu) = \theta(B)\,\Theta(B^S)\,a_t$$

where p, d, q, P, D, and Q are nonnegative integers; S is the seasonal length; $\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p$ is the regular autoregressive operator of order p; $\Phi(B^S) = 1 - \Phi_1 B^S - \Phi_2 B^{2S} - \cdots - \Phi_P B^{PS}$ is the seasonal autoregressive operator of order P; $\theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q$ is the regular moving average operator of order q; and $\Theta(B^S) = 1 - \Theta_1 B^S - \Theta_2 B^{2S} - \cdots - \Theta_Q B^{QS}$ is the seasonal moving average operator of order Q. In addition, B is the backward shift operator, d is the number of regular differences, D is the number of seasonal differences, and $a_t$ is the forecast residual at time t. When fitting a SARIMA model to data, the first task is to estimate d and D, the orders of differencing needed to make the series stationary and to remove most of the seasonality. Suitable values of p, P, q, and Q can then be evaluated from the autocorrelation function and partial autocorrelation function of the differenced series. Parameter selection for the SARIMA model consists of four iterative steps: (a) identifying a tentative SARIMA model; (b) estimating the parameters of the tentative model; (c) evaluating the adequacy of the tentative model; and (d) if an appropriate model is obtained, applying it for forecasting; otherwise, returning to step (a).
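The empirical comparison uses SARIMA(1, 0, 0)(1, 0, 0)_12. With p = P = 1, d = D = q = Q = 0, and S = 12, the model expands to $X_t - \mu = \phi_1(X_{t-1} - \mu) + \Phi_1(X_{t-12} - \mu) - \phi_1\Phi_1(X_{t-13} - \mu) + a_t$, where $\phi_1$ is the regular and $\Phi_1$ the seasonal autoregressive coefficient. The sketch computes the one-step-ahead forecast by setting the unknown future shock $a_t$ to zero. The coefficients are assumed to be already estimated, and the function name and test values are hypothetical.

```python
def sarima_100_100_12_forecast(history, phi, Phi, mu):
    """One-step-ahead forecast of SARIMA(1,0,0)(1,0,0)_12 with known coefficients.

    Uses (1 - phi*B)(1 - Phi*B^12)(X_t - mu) = a_t with the future shock a_t set to 0.
    history: past observations ordered oldest to newest (at least 13 values).
    """
    if len(history) < 13:
        raise ValueError("need at least 13 past observations")
    dev = [x - mu for x in history]          # deviations from the mean
    # history[-k] is X_{t-k} when the next period to forecast is t
    return mu + phi * dev[-1] + Phi * dev[-12] - phi * Phi * dev[-13]
```

The cross term $-\phi_1\Phi_1(X_{t-13} - \mu)$ comes from multiplying the regular and seasonal operators; omitting it would turn the model into a simple additive AR model with lags 1 and 12.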