A Combined Forecasting System Based on Modified Multi-Objective Optimization for Short-Term Wind Speed and Wind Power Forecasting

Wind speed and wind power are two important indexes for wind farms. Accurate wind speed and power forecasting can help to improve wind farm management and increase the contribution of wind power to the grid. However, nonlinear and non-stationary wind speed and wind power can influence the forecasting performance of different models. To improve forecasting accuracy and overcome the influence of the original time series on the model, a forecasting system that can effectively forecast wind speed and wind power based on a data pre-processing strategy, a modified multi-objective optimization algorithm, a multiple single forecasting model, and a combined model is developed in this study. A data pre-processing strategy was implemented to determine the wind speed and wind power time series trends and to reduce interference from noise. Multiple artificial neural network forecasting models were used to forecast wind speed and wind power and construct a combined model. To obtain accurate and stable forecasting results, the multi-objective optimization algorithm was employed to optimize the weight of the combined model. As a case study, the developed forecasting system was used to forecast the wind speed and wind power over 10 min from four different sites. The point forecasting and interval forecasting results revealed that the developed forecasting system exceeds all other models with respect to forecasting precision and stability. Thus, the developed system is extremely useful for enhancing forecasting precision and is a reasonable and valid tool for use in intelligent grid programming.


Introduction
With the rise of globalization, renewable and alternative energy sources, which address security issues associated with conventional energy sources, are increasingly being favored to provide power for a wide range of social and economic activities. Wind energy, a promising technology utilized in many renewable energy systems, has attracted an increasing amount of attention in recent decades due to the drive to meet the rapidly growing electricity demand across the globe without emitting environmental pollutants, such as CO 2 . According to the GLOBAL WIND REPORT 2021 [1], China possesses 39% (278324MW) of the world's total onshore wind power capacity, and accounted for 56% of new onshore installations by the end of 2020, making it the world leader. As a result, the accurate forecasting of wind speed-the determining factor in wind energy electricity generation-has increasingly become a focus of public conversation, especially on the shortterm horizon. Unfortunately, it is difficult to obtain excellent forecasting results because of the stochastic and nonlinear oscillations caused by uneven atmosphere stratification and complex topography [2].
In recent years, the capacity of onshore wind power sets has been rapidly increasing, which requires high reliability and maintainability in wind turbines with poor natural (1) For physical methods, to attain effective forecasting results, information on a range of physical factors is needed, rather than information on just a single factor such as the wind speed time series; thus, these meteorological models cannot generate forecasts simply [15]. In addition, physical models are not proficient in dealing with short-term series and involve a complex calculation process and high expansion costs, which all contribute to significant forecasting errors [16]. Physical models necessitate the gathering of numerical physical variables, such as the horizontal pressure gradient, geostrophic force, and fractionate to undergo wind speed forecasting [17]. However, the complex calculation and polytrophic processes involved are not only time consuming but create the risk of forecasting error [18]. (2) With sufficient accessible spatial and temporal information from multiple wind farms, a few spatio-temporal prediction methodologies have been able to be investigated in recent studies. The spatio-temporal characteristics of wind speed are extracted by an undirected graph of wind farms [19]. A hybrid support vector machine forecasting model is proposed, which is based on the spatio-temporal and grey wolf optimization, to forecast wind power for multiple wind farms [20]. Ref. [21] uses copula theory and Bayesian theory to simulate spatio-temporal correlations between wind farms and deduce a conditional distribution of aggregated wind power. A probabilistic wind speed prediction approach was presented in Ref. [22] based on a spatio-temporal neural network (STNN) and variational Bayesian inference. With feeding of both the spatial and temporal information into the forecasting model, these methods have achieved a better forecasting performance. Nevertheless, most of these forecasting models often collected the wind speed information from different wind farms indiscriminately, and to some extent, the implicit spatial correlations cannot be fully exploited in the original wind speed data. Also, the thorny multi-dimensional computing problem caused by the large amount of wind speed data from multiple wind farms needs to be solved in an effective way. (3) Typical statistical methods can yield excellent forecasting results under the assumption that the input series was recorded under normal conditions [23]; however, nonlinearity, noise, instability, fluctuations, and other features within the raw time series are always difficult to control, resulting in a lack of modeling information. Therefore, such methods often result in bad short-term wind speed forecasting performance, especially in multistep-ahead forecasting [24]. In addition, historical data are utilized for statistical modeling methods, and only potential linear correlations between the variables and future forecasts are revealed; such models are unable to obtain a good forecasting performance within the required limits [25]. Statistical models, including typical autoregressive moving average family models (e.g., AR [26], MA, ARMA [27], autoregressive integrated moving average models (ARIMA) [28], SARIMA, etc.), exponential smoothing [29], Kalman filtering [30], vector autoregression structures for very short-term wind power forecasting [31], and autoregressive conditional heteroskedastic family models (e.g., ARCH, GARCH, and EARCH) [32] utilize significant amounts of historical data for wind speed forecasting with no consideration of other potential influencing factors to support the stochastic process. Meanwhile, a spatialtemporal forecasting method based on the vector autoregression framework has been proposed for renewable forecasting [33]. Generally, statistical methods work well for approaching linear features; however, they tend to fail when it comes to nonlinear problems due to the linear assumptions of the models [34]. (4) Fortunately, the timely emergence of artificial intelligence (AI) arithmetic, subsuming artificial neural networks (ANNs) [35], support vector machines (SVMs) [36], deep neural networks [37], and fuzzy logical methods (FLMs) [38] have efficiently remedied the flaws in the wind speed forecasting territory in recent years [39]. However, because of the inherent disadvantages of each model and the boom in the integration of wind power into the grid system, a variety of hybrid and combined models with promising forecasting potentials have been created [40]. Generally, artificial intelligence methods can achieve greater forecasting accuracy than physical or statistical models [41]; however, they also possess insurmountable drawbacks. ANNs have been extensively studied and applied to explore the complexity of wind speed forecasting; however, their performance mostly relies on training sets, which can result in a focus on local optima, over-fitting, and a reduction in the convergence rate [42]. (5) Differently to conventional or single models, hybrid models can reduce the current shortcomings associated with the forecasting of irregular, fluctuant, and nonstationary time series with noise or unpredictable components. In this regard, significant hybrid models have recently been launched [43]. Hybrid methods integrate different single algorithms to achieve a superior forecasting performance, and can overcome defects in AI models (e.g., falling into local minima, over-fitting, etc.)-greatly improving the accuracy of continuous fluctuant wind speed forecasting and providing better validity and stability than a single model [44]. Recently, the use of hybrid methods in the wind speed forecasting field has been widespread. Jiang et al. [45] proposed a hybrid model consisting of a grey correlation analysis, cuckoo search algorithm, and v-SVM (v-support vector machine). Dong et al. [46] proposed a hybrid preprocessing strategy coupled with an optimized local linear fuzzy neural network for wind power forecasting. It has been proven that this is a effective approach for predicting wind power. In 2017, Hu et al. [47] proposed a novel approach based on the Gaussian process with a t-observation model for short-term wind speed forecasting. Based on a spatio-temporal method, in [48], the performance of predictive clustering trees with a new feature space for wind power forecasting was investigated. The results showed that the proposed model achieved a satisfactory level of point forecasting accuracy and interval forecasting performance. A forecasting framework has been proposed in 2021 [49], that is multi-layer stacked bidirectional long/short-term memory (LSTM)-based for short-term time series forecasting. After being studied extensively, it is clear that no arithmetic method is omnipotent across all data cases. In future research, individual statistical models and artificial intelligence algorithms will be integrated to improve the precision of wind speed forecasting-these are referred to as hybrid models [50].
As mentioned previously, conventional or individual methods always show inherent weaknesses when approaching complex and actual wind time series with noise, which results in poor forecasting performances.
In accordance with previous studies, we designed three procedures for wind speed and wind power forecasting. First, data preprocessing was employed to identify features and eliminate useless information from the original time series. Then, the preprocessed time series was used to train and test the multi-artificial neural network. The models were ranked based on the test set accuracy for each model. Finally, the top five models were used as sub-models of the combined model. To obtain stabilized and accurate forecasting results, the weight of the combined model was optimized by the multi-objective optimization algorithm, which improved the stabilization and accuracy of the combined model.
The key findings of this study and comparisons with relative research in the field of wind speed forecasting are as follows: (1) To achieve accurate and stable forecasting of short-term wind speed and wind power, a robust, novel, combined system based on three modules was developed in this study. This novel hybrid system for forecasting short-term wind speed and wind power includes a data preprocessing module, a forecast optimization module, and an evaluation module. The excellent performances of these algorithms are combined to provide accurate and stable results for multi-step wind speed forecasting. (2) To effectively eliminate fluctuations in the original time series and avoid the limitations of a single algorithm, a new data preprocessing step was proposed. This was shown to decrease uncertainty and irregularity in the wind speed times series. SSA-EEMD, a powerful secondary denoising algorithm, was used to decompose and further denoise the actual wind speed time series. These steps were found to successfully overcome the limitations of single SSA algorithms. (3) To overcome the disadvantages of individual models, multi models were used to forecast the wind speed and wind power. To consider the nonlinear characteristics of wind speed and wind power time series, seven artificial neural networks (ANNs) were employed to forecast two types of time series, and five optimal hybrid models were selected based on the accuracy of data testing by hybrid models to form a combined model and act as sub-models. (4) To further improve forecasting accuracy and stability, the multi-objective dragonfly algorithm was used to determine the optimal weight of the combined model. In the optimization process, the optimized parameters were found to not only have good accuracy, but they also ensured that the output results had a high level of stability. Therefore, this paper used the multi-objective optimization algorithm to optimize the weight of the combined model. (5) A more scientific and comprehensive forecasting evaluation method was conducted to estimate the forecasting performance of the developed forecasting system in the model evolution module. Interval forecasting was used to assess the uncertainty of the combined model, and this indicated that the forecasting results of the proposed combined model were accurate and stabilized in an all-around manner. Additionally, the Diebold-Mariano test and Wilcoxon rank-sum test were implemented to further analyze the forecasting accuracy of each model.
The remaining sections of this paper are as follows: The methods involved in the proposed model are introduced in Section 2. Section 3 demonstrates the experiment preparations and the four numerical experiments used, as well as presenting the forecast results. Deeper discussion about the forecasting performance of our developed model is presented in Section 4. Section 5 provides the conclusions.

Flow of the Proposed Combined Model
In the 1960s, J.M. Bates et.al. discovered combined forecasting methods, which use more than two different forecasting models to address the same problem. Multiple models can be combined by the combination of quantitative methods. The main purpose of this combination is to make full use of the information provided by various models to improve forecasting accuracy [51]. In our study, a combined forecasting model was developed to forecast the nonlinear characteristics of wind speed and wind power. The forecasting processes used are shown in Figure 1, and the details of the forecasting procedure are described below.
Appl. Sci. 2021, 11, x FOR PEER REVIEW 6 of 34 The forecasting precision and stability of the models during point forecasting were evaluated by four metrics. Three metrics were used for the uncertainty analysis.

Experiment and Results
In this section, the wind speed series data selection and experiment settings used for our study are illustrated systematically.

Data Acquisition
In this paper, the original wind speed and wind power time series used for forecasting were acquired in 2018 from four observation sites in Shandong province in China. These were used to establish and test the forecasting performance of each model.
Statistical descriptions for the datasets obtained for the two sites (mean, standard deviation, skewness, kurtosis, minimum, maximum, and median) are provided in Table  1. The mean values characterize the central tendency of the historical observations. The standard deviation values (the wind speed hovered around 2 m/s, and the wind power was around 200 kw) clearly reflect the fluctuations in wind speed and wind power. The skewness values of the wind speed and wind time series were greater than zero, which indicates that both types of time series had right-skewed distributions. Most wind time series had peak values greater than 3, indicating a fat tail distribution. Most kurtosis values of the wind power time series were greater than 3, indicating a fat tail distribution. The kurtosis values of the wind speed time series were less than 3, indicating a slight tail distribution.

Procedure 1: Data Pretreatment
This stage included two integral parts. First, SSA was used to decompose the raw wind speed series into a (r) low-frequency components group [52] and a (d-r) high-frequency components group, according to KPAC. Second, EEMD [53] was utilized to further denoise the (d-r) high-frequency components group on the premise of SSA.

Procedure 2: Prediction of Hybrid Models
The forecasting accuracy of a single model cannot be optimal when a change in training data occurs during the forecasting process. Thus, single models do not always give optimal forecasting results when training data changes. To avoid a poor forecasting performance, combined models were developed. In this study, seven different models (ANFIS, BPNN, ELM, ENN, GRNN, LSTM, RNN, WNN and SVM) were employed to forecast wind speed and wind power, and five optimal forecasting models were selected as sub-models of the combined model. Meanwhile, data pre-processing was used to eliminate noise and useless information in the original time series. In the process of forecasting and modeling, the first 9 days (1296 data point) were used as the training set for the model, and days 10-14 (720 data points) were used as the testing set of the model. Each model produced 144 forecasting values.

Procedure 3: Establishment of the Proposed Combined System
After completing the modeling and forecasting process for the single hybrid model, the accuracy of each model testing set was calculated, and the five forecasting models with the highest accuracy levels were selected as sub-models of the combined model. Then, the testing data (720 data points) of the sub-models were used to build the combined model. The modified multi-objective dragonfly algorithm (the detail of MODA shown in Appendix A) was used to determine the optimal weight of the combined model. Subsequently, the forecasting results obtained from every individual model were integrated using the obtained weight coefficients, and wind speed and wind power forecasting was conducted.

Procedure 4: Wind Speed and Wind Power Forecasting
Based on realistic data, the developed combined model was used to carry out dayahead forecasting. The theory behind day-ahead forecasting is as follows: First, the forecast origin and forecast horizon are set and denoted by a time point h and a positive whole number l, respectively. Then, if we want to forecastŷ h+l at time point h where l ≥ 1, we now defineŷ h (l) as the prediction data of y h+l ; therefore,ŷ h (l) can represent the l-step day-ahead forecasting of y t at prediction point h. Once l = 1,ŷ h (l) is used to carry out one-step day-ahead forecasting [54].

Procedure 5: Model Evaluation
The forecasting precision and stability of the models during point forecasting were evaluated by four metrics. Three metrics were used for the uncertainty analysis.

Experiment and Results
In this section, the wind speed series data selection and experiment settings used for our study are illustrated systematically.

Data Acquisition
In this paper, the original wind speed and wind power time series used for forecasting were acquired in 2018 from four observation sites in Shandong province in China. These were used to establish and test the forecasting performance of each model.
Statistical descriptions for the datasets obtained for the two sites (mean, standard deviation, skewness, kurtosis, minimum, maximum, and median) are provided in Table 1.
The mean values characterize the central tendency of the historical observations. The standard deviation values (the wind speed hovered around 2 m/s, and the wind power was around 200 kw) clearly reflect the fluctuations in wind speed and wind power. The skewness values of the wind speed and wind time series were greater than zero, which indicates that both types of time series had right-skewed distributions. Most wind time series had peak values greater than 3, indicating a fat tail distribution. Most kurtosis values of the wind power time series were greater than 3, indicating a fat tail distribution. The kurtosis values of the wind speed time series were less than 3, indicating a slight tail distribution.

Experimental Setup
In this study, three experiments were conducted to verify the validity, superiority, and generalizability of the proposed combinatorial optimization model. Experiment I was designed to compare the forecasting performances of combined models with different numbers of sub-models for wind speed and wind power forecasting in the first season. Experiment II contrasted the forecasting performances of the combined models. The models were optimized by different optimization algorithms for wind speed and wind power forecasting in the second season. Experiment III verified the performance of the combined model for wind speed and wind energy forecasting in the third and fourth seasons. The details of the experiments are as follows: Experiment I was designed to compare the combined model with different numbers of forecasting sub-models to determine the optimal number of sub models required by the combined model. Wind speed and wind power data collected from four sites in the first season were used to determine the forecasting capacity of the proposed model. Furthermore, the forecasting step was classified as day-ahead and used to assess the conducted models. Experiment II was conducted to compare the performances of different optimization algorithms to optimize the weight of the combined model. Wind speed and wind power series from Site 1 to Site 4 in the second season were collected at 10 min intervals to establish four combined models based on different optimization algorithms.
Experiment III aimed to verify the applicability of the combined model based on five sub-models and optimized by MODA. Wind speed and wind power time series data collected at 10 min intervals in the third and fourth seasons were employed to verify the prediction performance of the combined model for day-ahead forecasting. Table 2 shows the forecasting precision and stability of the models during point forecasting. The goodness-of-forecasting fit DA Directions or turning points between actual and forecasting values DA = 100

Performance Metrics and Benchmark Model
Five fitting error indices and two main benchmark models were introduced to ensure the predictability of the proposed model. These indicators were MAE, RMSE, MAPE, STDAPE, DA, U1, U2, and R 2 (the coefficient of determination), as well as seven ANN models. MAPE was a significant focus.

Parameter Setting
Parameter settings and initializations important to the methods applied included the following: Input dimension. For the input vector, we set varying values of 1-8 for the input vector based on the ten-minute wind speed data and values of 1-6 for the input vector based on the 10-min wind power and wind speed data. The results of our empirical study indicated that the forecasting accuracy was at its best when the input vector dimension was 6.
SSA decomposition and EEMD denoising. For the previously mentioned SSA decomposition stage, the window length L and the principal components were the two most vital factors. On the basis of the trial-and-error method, L was set to 24 for ten-minute intervals to assess the wind speed transition and wind power from observation Sites 1 to Site 4 due to the homogeneity of the data structure among intervals. Thirteen principal components were selected for all sites. Table 3 shows the eigenvalue contributions to the various lowand high-frequency data groups in the time sequences. Taking the Site 1 wind speed series as an example, the principal components related to the first 13 eigenvalues accounted for 97.7643% of the variability, representing the main trend in the series. Further denoising was carried out for the high-frequency components. By parity of reasoning, the wind power time series data from the four sites were also analyzed by SSA using the trial-and-error and principal component analysis methods. The details are given in Table 3. For the EEMD denoising stage, the number of IMFs was acquired by the formula log 2 N − 1, where N is the length of the series. The IMFs ranged from high-frequency to low-frequency. Table 3. Results for four sites after SSA.

Data Sets
Frequency Eigenvalue For MODA, the population size of the dragonfly was 40, the archive size was 500, and the maximum number of iterations was 200.
For each artificial neural network, the building and training of the network were nearly parallel. There were 5 input layer neurons, 20 initial hidden layer neurons, and 1 output layer neuron.

Experimental Results and Analysis
This section presents the results from the three experiments on the forecasting performance of the proposed novel model.

Experiment I: The Forecasting Performance of the Combined Models Optimized by Different Optimization Algorithms
Based on the historical wind speed and wind power time series, four experiments were designed to analyze and contrast the developed combined model with the sub-model forecasting models. Experiment I contrasted combined models with different numbers of sub-models in terms of their forecasting performances and analyzed the ability of the proposed model with five sub-models to forecast wind speed and wind power. The wind speed and wind power forecasting performance were evaluated by eight metrics.
For Site 1, the proposed combined model had the best forecasting performance for both wind speed forecasting and wind power forecasting. Specifically, the mean absolute percentage error (MAPE) value was 6.51% for wind speed forecasting and 13.21% for wind power forecasting, smaller than those of the other combined models and sub-models. The relevant characteristics of the models used for Site 1 are shown in Figure 2.
For Site 2, the best mean absolute error (MAE), root-mean-square error (RMSE), mean absolute percentage error (MAPE), and standard deviation of absolute percentage error (STDAPE) were all obtained with the developed combined method for wind speed forecasting, with values of 0.2608, 0.3783, 6.90%, and 6.41%, respectively. For wind power forecasting, the best forecasting metrics were obtained with the proposed combined model shown in Table 4. This confirms the good performance of the proposed combined model. model shown in Table 4. This confirms the good performance of the proposed combined model.
For Site 3, the wind speed forecasting accuracy of the combined model with five submodels was significantly better than that of the combined models with less than five submodels. This means that increasing the number of sub-models can improve the forecasting accuracy of the combined model. The forecasting results of the combined model for Site 4 were similar to those obtained for previous sites; the proposed combined model provided better forecasting results in the first season. Taking wind power forecasting as an example, the lowest MAPE value, obtained with the advanced model, was 13.66%. The combined model with two sub-models had the worst forecasting result, having a MAPE value of 18.48% (the forecasting results of the sub-models are shown in Appendix B). Table 4 also shows that the goodness of fit values for different combined models were over 0.9, indicating that combined models have a reduced forecasting error.

Remark 1.
A general survey of the forecasting results for wind speed and wind power in the first season showed that the proposed combined model hds an optimal wind speed forecasting capacity. MAPE values at one day ahead were 6.51%, 6.41%, 7.00%, and 6.16% for Sites 1 to 4, respectively. Meanwhile, for wind power forecasting, the proposed combined model had the lowest forecasting error among all involved models. Moreover, for the forecasting results of wind speed and For Site 3, the wind speed forecasting accuracy of the combined model with five sub-models was significantly better than that of the combined models with less than five sub-models. This means that increasing the number of sub-models can improve the forecasting accuracy of the combined model.
The forecasting results of the combined model for Site 4 were similar to those obtained for previous sites; the proposed combined model provided better forecasting results in the first season. Taking wind power forecasting as an example, the lowest MAPE value, obtained with the advanced model, was 13.66%. The combined model with two sub-models had the worst forecasting result, having a MAPE value of 18.48% (the forecasting results of the sub-models are shown in Table A1). Table 4 also shows that the goodness of fit values for different combined models were over 0.9, indicating that combined models have a reduced forecasting error.

Remark 1.
A general survey of the forecasting results for wind speed and wind power in the first season showed that the proposed combined model hds an optimal wind speed forecasting capacity. MAPE values at one day ahead were 6.51%, 6.41%, 7.00%, and 6.16% for Sites 1 to 4, respectively. Meanwhile, for wind power forecasting, the proposed combined model had the lowest forecasting error among all involved models. Moreover, for the forecasting results of wind speed and wind power, the MAPE value of wind power was twice that of wind speed with similar goodness-of-fit values.

Experiment II: The Forecasting Performance of Combined Models Optimized by Different Optimization Algorithms
This experiment was designed to compare combined models optimized by different optimization algorithms, including MOGWO, MODA, MOMVO, and MODE. This experiment aimed to assess wind speed and wind energy forecasting in the second season. The evaluation metrics obtained for each model (MAE, RMSE, STDAPE, DA, U1, U2, MAPE and R 2 ) are shown in Table 5 and Figure 3.
For Site 1, the combined model optimized by MODA for wind speed forecasting and the combined model optimized by MODA for wind power forecasting produced the most accurate forecasting results in the most efficient manner. The MAPE values of the optimal combined model were 8.965% and 19.506%, respectively. In comparison, the forecasting performances of the combined models optimized by four algorithms showed no significant differences; for example, the RMSE values of the four combined models were 0.4647, 0.4649, 0.4644, and 0.4648 for wind speed forecasting. The wind power forecasting results for each model are listed in Table 5. For wind speed forecasting at Site 2, the combined model optimized by MODA obtained the best assessment metrics for wind speed forecasting. Specifically, for wind speed forecasting, the MAPE value obtained by the combined model optimized by MOGWO was 10.409%, better than the values obtained by the combined models optimized by MOGWO, MOMVO, and MODE by 0.0019%, 0.0019%, and 0.0048%. These models were ranked from second to last in terms of their forecasting accuracy, respectively. Figure 3 shows the wind power forecasting results for the combined models optimized by four optimization algorithms.
For Site 3, all of the combined models optimized by optimization algorithms provided satisfactory wind speed and wind power forecasting results, as proven by their better goodness-of-fit values compared with those of the sub-models (the forecasting results of sub-models shows in Table A2).
For Site 4, the combined models optimized by four different optimization algorithms showed outstanding forecasting potential. For wind power forecasting, the combined model optimized by MODA obtained the lowest MAE, RMSE, STDAPE, and MAPE values of 45.4039, 81.9297, 24.57%, and 14.226%, respectively. There were no obvious differences in MAPE among the other combined models.
ences in MAPE among the other combined models.

Remark 2.
The evaluation index values obtained in Experiment II reveal that regardless of the optimization algorithm applied to optimize the combined model and the site used for forecasting, there is no optimal method for wind speed and wind power forecasting. For the second season of wind speed and wind power forecasting, the forecasting evaluation metrics of the combined models optimized by each algorithm had no significant differences.

Remark 2.
The evaluation index values obtained in Experiment II reveal that regardless of the optimization algorithm applied to optimize the combined model and the site used for forecasting, there is no optimal method for wind speed and wind power forecasting. For the second season of wind speed and wind power forecasting, the forecasting evaluation metrics of the combined models optimized by each algorithm had no significant differences.      (1) The combined model with five sub-models produced the smallest MAE, RMSE, STDAPE, U1, U2, and MAPE values and the largest DA and R 2 values at all sites. This indicates that the combined model has the best forecasting performance and stability.
(2) For wind power forecasting, the results of the combined model with five submodels (CM5) for four sites, as evaluated by eight metrics, were better than those of the other combined models. For example, the RMSE values of the CM5 model at the four sites were 40.1238, 32.9801, 56.0705, and 69.8088. Compared with the other combined models, the forecasting accuracy of the CM5 was greater. The main reason for this is that the CM5 can effectively use the advantages of each sub-model and the weight of the combined model optimized by the optimization algorithm. By comparing the CM5 with the CM2, CM3 and CM4 models, we determined that the number of sub-models included influences the forecasting performance of the combined model. The MAPE value of the CM5 was 12.00% lower than those of the other combined models for Site 1 in the third season.
(3) For wind speed forecasting, the combined model obtained satisfactory forecasting results. The forecasting results of the combined models were better than those of the sub-models. Thus, the use of combined models can improve forecasting accuracy. For wind speed forecasting at Site 1, the MAPE values of the five sub-models were 14.61%, 16.31%, 18.46%, 21.04%, and 24.38%, respectively. In contrast, the MAPE value of the combined model (CM5) was 6.29%, an improvement of 8.32%, 10.02%, 12.17%, 14.75%, and 18.09%, respectively, compared with the five sub-models.
(4) The models' goodness of fit results are shown in Table 6. The minimum value was 0.9665, which proves that the forecasting series obtained by combined model (CM5) was consistent with the actual series. To verify the applicability of the combined model with five sub-models optimized by MODA for determining the wind speed and wind power series, the forecasting results for the fourth season are shown in Table 7 and Figure 5.
For Site 1, in the fourth season, the combined model with five sub-models (CM5) obtained the best forecasting accuracy. For wind speed and wind power forecasting, the evaluation metrics of the CM5 were also better than those of combined models with less than five sub-models and for the sub-models individually. This means that the CM5 is more suitable for wind speed and wind power forecasting.
For Site 2, the sub-models and combined models showed analogical forecasting abilities in terms of the values obtained for the forecasting evaluation metrics. To be more specific, for wind speed forecasting, the MAPE values of the five sub-models were 16.30%, 18.64%, 20.90%, 23.90% and 27.26%, respectively. In contrast, the MAPE value of the combined model (CM5) was 6.77%-an improvement of 9.53%, 11.87%, 14.13%, 17.13%, and 20.49%, respectively, compared with the five sub-models (shown in Table A4).
As for Site 3, the combined model with five sub-models showed a superior performance to the other combined models based on the eight employed evaluation criteria with MAE, RMSE, STDAPE, DA, U1, U2, MAPE, and R 2 values for wind power forecasting of 26 For Site 4, the MAPE values of the combined model (CM5) for wind speed and wind power forecasting were 7.86% and 33.05%, respectively. Compared with the other three combined models with the highest MAPE values, the combined model with five sub-models improved wind speed and wind power forecasting by 2.79% and 10.76%, respectively. Comparing the combined models and sub-models, the forecasting accuracy of the combined models was better. A comparison of the wind speed and wind power forecasting results for Site 4 is presented in Figure 5.

Remark 4.
The differences in the forecasting results obtained by the developed model and those obtained by the other individual models were significant. The evaluation indicator values obtained with the proposed model were more satisfactory, as they were lower than those computed with the contrasting models, regardless of the forecasting step. Hence, we conclude that the advanced combined model has a superior capacity relative to conventional individual models for short-term wind speed forecasting.

Discussion
In this subsection, a comprehensive discussion related to the proposed model is provided. This includes two parts: a significance test for forecasting values and forecasting errors and a forecasting uncertainty analysis.

Significance Test between Forecasting Values and Actual Data
To evaluate the significance level of forecasting errors between the proposed combined model and the other models, a classical hypothesis test method, the Diebold-Mariano Test [55], was used to measure the significance of prediction errors for different models. Theoretical support for this model is as follows.
Given a certain probability of coming to a wrong conclusion α, the original hypothesis H 0 will not be rejected as long as the forecasting capacity of the proposed model possesses no evident distinction when contrasted with the comparison model; if the inverse is true, H 0 will be rejected and H 1 accepted. The hypothetical form is: where L represents the loss function for forecasting errors, and error 1 and error 2 are the forecasting errors of the combined model with five sub-models and those of other combined models and single models, respectively. Further, the DM statistical magnitude can be expressed by: In the above equation, S 2 is the estimated variance of d = L(error 1 ) − L(error 2 ). After obtaining the calculated DM value, a comparison between the DM statistics and a critical value Z α/2 is conducted. Once the DM statistic is greater than Z α/2 or less than −Z α/2 , the original hypothesis will be rejected. Moreover, it may be concluded that there is an evident distinction between our developed model and the compared model.
The Wilcoxon rank-sum test [56,57] is a nonparametric test that can be used to determine whether two independent samples have been selected from populations with the same distribution.
If h = 1, the null hypothesis that there is no difference between two samples at the 5% significance level is rejected.
If h = 0, there is a failure to reject the null hypothesis at the 5% significance level. The results of the Diebold-Mariano test and Wilcoxon rank-sum test for wind speed and wind power forecasting are shown in Table 8. The following conclusions were made: First, the test results of DM test for wind speed showed little variation in the forecasting error among the different combined models, and the Wilcoxon rank-sum test results showed no difference between the forecasting results of each model and the actual wind speed at the 5% significance level. When the combined models were compared, the combined model with five sub-models was found to be evidently different from the other models at the 1% significance level for wind power forecasting. However, the DM test between the combined model and the sub-model revealed that some sub-models obtained DM values that were higher than the threshold at a 5% significance level. For example, in the first season, the SSAWD-Elman sub-model had a DM value of 1.6448 for wind power forecasting. The test results for the other sub-models are shown in Table A5. The Wilcoxon rank-sum test result of each combined model showed no differences between the actual wind power values and the forecasting results of each combined model. Meanwhile, the Wilcoxon rank-sum test results presented in Table A5 show that some sub-models rejected the null hypothesis at the 5% significance level, which indicates that the forecasting results of some sub-models differed from the actual data.  Note: * is the 5% significance level. (0) there is difference between two sample at the 5% significance level. (1) there is no dif-ference between two sample at the 5% significance level.

MODA-CM5
For the wind speed and wind energy forecasting error test results, there was little variation among the combined models at the 1% significance level, while there were greater differences between the combined models and the sub-models. Comparing the actual data with those forecast by each model, it was found that the wind speed data forecast by each sub-model and the real data represented the same sample at the 5% significance level. However, when the wind power forecasting values obtained with each model and the actual data were compared with the Wilcoxon rank-sum test result, they were found to represent two different samples at the 5% significance level. Based on these results, the wind speed forecasting accuracy was deemed to be higher than the wind power forecasting accuracy.

Uncertainty Analysis
Uncertainties in wind speed and wind power forecasting result in an imbalance between projected and actual time series, and this influences the operation reliability of wind farms [58]. Based on the forecasting results of a combined model for wind speed and wind power, interval forecasting was employed to analyze the level of uncertainty in wind speed and wind power forecasting and to assess possible threats for decision planners in terms of power dispatching.
More specifically, the uncertainty analysis used interval forecasting to analyze wind speed and wind power data sets from four seasons, each with a span of 10 min. By comparing the forecasting performances of different combined models and sub-models, the forecasting error of each model was analyzed in four seasons, and the differences in the prediction error were compared with the level of error in the proposed models, namely, the sub-model based on the ANN and the MODA-based combined model with different numbers of sub-models. Sub-models and combined models were used as contrasting models to verify the superiority of the proposed combined model in terms of interval forecasting. If the evaluation results of the proposed combined model were better than those of all comparison models under the same conditions, this would verify that the proposed model is better than combined models with less than five sub-models or submodels alone in terms of uncertainty forecasting and stability forecasting [59].
To better identify the characteristics of the wind speed and wind power forecasting values from each model, a diverse probability density function including the t locationscale, stable distribution, logistic, and normal distribution functions was employed to fit the forecasting wind speed series and wind power series via maximum likelihood estimation (MLE) [60]. Referring to the results of the assessment index R 2 shown in Table  9 and Figure 6, the t-location scale and stable function was selected as the most suitable probabilistic distribution for further interval forecasting. The evaluation indices FICP, FINAW, and AWD were adopted to assess the interval forecasting results and to analyze the forecasting uncertainty of the four selected models [61]. Notably, the larger FICP values gave better analysis results, whereas larger FINAW and AWD values gave worse analysis results. The above three metrics and the lower and upper confines of the wind prediction values [62] are defined in Table 9, and the results of the uncertainty analysis are presented in Table 10. Furthermore, the expectation probability, defined as (1 − α) × 100%, was set at 95%, 90%, and 85% to assess the forecasting uncertainty of the four selected models in the uncertainty analysis.  Figure 6. The optimal distribution of forecasting error in CM5. The evaluation indices FICP, FINAW, and AWD were adopted to assess the inter forecasting results and to analyze the forecasting uncertainty of the four selected mod [61]. Notably, the larger FICP values gave better analysis results, whereas larger FINA and AWD values gave worse analysis results. The above three metrics and the lower upper confines of the wind prediction values [62] are defined in Table 9, and the res of the uncertainty analysis are presented in Table 10. Furthermore, the expectation pr ability, defined as 1 100  −  ( ) % , was set at 95%, 90%, and 85% to assess the forecast Figure 6. The optimal distribution of forecasting error in CM5.

Metric Definition Equation
Upper Bound Upper bounds of the wind speed forecasting value

Lower Bound
Lower bounds of the wind speed forecasting value Forecast interval coverage probability of testing dataset Forecast interval normalized average width of testing dataset Note: F(i) is the corresponding point prediction value at point i. K and σ are the quantile and scale parameters of the logistic DF. N represents the forecasting length, and NR is the difference between the maximum and minimum forecasting values. If the actual value is Au i ∈ [L i , U i ], c i = 1; otherwise, c i = 0. Table 11 shows that the combined model based on MODA and the five sub-models obtained satisfactory uncertainty analysis assessment results, which proves that the proposed model is superior to the other combined models. Using the wind power results for Site 1 as examples at the 15% significance level, the FICP values obtained from each combined model with different numbers of sub-models (MODA-CM2, MODA-CM3, MODA-CM4 and MODA-CM5) were found to be 77.58%, 79.71%, 81.97% and 82.96%, respectively, whereas the FINAW values were 0.0852, 0.0787, 0.0726 and 0.0694, respectively. Furthermore, the proposed model resulted in reductions in AWD values of 0.1119, 0.0645, and 0.0214, respectively, compared with the combined model with five sub-models.

Conclusions
As an important source of clean energy, the use of wind energy has undergone rapid development, becoming widespread in recent years. However, the irregularity and instability of wind speed series data greatly restricts the development of wind power generation. There is an urgent need to successfully and accurately forecast wind speed and wind power, solve the dispatch problem, and further improve the operation efficiency of the power market. In this study, a combined model for wind speed and wind power day-ahead forecasting was developed. First, the original wind speed and wind energy time series were preprocessed using the secondary denoising method. Then, nine different ANN and machine learning forecasting models were established to forecast the wind speed and wind energy time series after denoising.
By assessing the accuracy of the model validation set, the optimal sub-model was selected for the combined model. Finally, the combined weight of the combined model was optimized by the multi-objective optimization algorithm. Considering the wind speed forecasting results, the MAPE of the optimal combined model was between 5.86% and 11.92%, the R 2 was over 0.94, and the RMSE was between 0.3379 and 0.7595. Furthermore, the corresponding values for wind power forecasting were in the range of 11.10% to 33.05%, over 0.97, and between 0.10102 and 0.33641, respectively. All of the experimental results indicate that the combined forecasting system has high levels of accuracy and stability for wind speed and wind power forecasting.
Author Contributions: The experiment, data analysis, and paper writing were conducted by Q.L.; the experiment and data analysis were completed by Q.L. and Q.Z.; supervision, paper writing, and editing were conducted by Q.Z. and G.Z.; validation, methodology, paper editing, and supervision were handled by Q.L. and G.Z. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by State Grid Corporation of China Science and Technology Project: SGGSKY00WYJS2000062.

Conflicts of Interest:
The authors declare no conflict of interest.

Modified Multi-Objective Dragonfly Algorithm
In order to solve the problems associated with the MODA of easily falling into local optimal solutions and having slower convergence speeds, a steps-based strategy based on an exponential function and an elite opposition learning strategy were used for modification. First, the elite opposition learning strategy was used to generate a broader search scope and diversify the population identified as the next generation and to improve the global search capacity and search accuracy of the MODA [62]. Then, a steps-based strategy based on an exponential function was applied to enhance the convergence speed of the algorithm in the later stages. The specific implementation scheme is as follows: where i = 1, 2, . . . , SN, j= 1, 2, . . . , EN, k = rand(0, 1), SN represents the population size of the dragonflies, and, generally, EN represents the selected number of elite individuals with the value of SN*0.1. k represents a generalized argument that is subject to a uniform distribution. The elite opposition learning strategy is used in the basic MODA to effectively expand the search area, diversify the population, and enhance the optimization performance and global search capability.
(2) Steps-based Strategy based on an Exponential Function In a basic MODA, there are several parameters (s, a, c, f , e, and w), as described by ∆x t+1 = (sS i + aA i + cC i + f F i + eE i ) + wx t , which are adaptively adjusted at random; hence, the dragonfly individuals update themselves according to a stochastic linear step size in the course of iteration. This means that the position update of the dragonfly completely relies on the site of the present individual and the stochastic linear step size, which can be determined, respectively. Despite the above strategy being conducive to finding a globally optimal solution to some degree, it cannot ensure that the solution is optimal; thus, this situation causes a slow convergence rate [51]. To expedite the convergence rate of the MODA, the exponential step strategy is applied to replace the original linear step strategy. This means that an exponential function is added to the original step and generates an exponential function steps-based strategy. It should be noted that a vital parameter µ is adopted to renew the step, so that the local and global hunt capacities can be improved, and the convergence rate can be further accelerated. Hence, it is pivotal to set a suitable µ.
In our study, the steps were updated based on the following exponential function: The enhanced step-size rule used is: Here, rand ∈ [0, 1] is a stochastic constant, and ∆x t is the step size in the tth iteration.
The equation for updating the position vector of dragonflies is expressed by: In the above, t is the current iteration, and x t is the position vector in the tth iteration. Apparently, the updating speed of the step size accelerates as the number of iterations increases. At the beginning of an iteration, the advanced step can accomplish local hunting and discover a more optimal hunt space. During the medium and later periods in the course of iteration, the advanced step will accelerate the convergence rate, avoiding the local optimum and finding the optimal solution in the global hunt process.     Note: * is at the 10% significance level. ** is at the 5% significance level. *** is at the 1% significance level. (0) there is difference between two sample at the 5% significance level. (1) there is no difference between two sample at the 5% significance level.