Abstract
In recent years, although deep learning algorithms have been widely applied to various fields, ranging from translation to time series forecasting, researchers have paid limited attention to model parameter optimization and to combining deep learning with fuzzy time series. In this paper, a novel hybrid forecasting system, named CFML (complementary ensemble empirical mode decomposition (CEEMD)-fuzzy time series (FTS)-multi-objective grey wolf optimizer (MOGWO)-long short-term memory (LSTM)), is proposed and tested. This model is based on the LSTM model with parameters optimized by MOGWO, before which a fuzzy time series method involving the LEM2 (learning from examples module version two) algorithm is adopted to generate the final input data of the optimized LSTM model. In addition, the CEEMD algorithm is used to de-noise and decompose the raw data. The CFML model successfully overcomes the nonstationary and irregular features of wind speed data and electrical power load series. Experimental results covering four wind speed datasets and two electrical power load datasets indicate that our hybrid forecasting system achieves average improvements of 49% and 70% in wind speed and electrical power load forecasting, respectively, under the metric MAPE (mean absolute percentage error).
1. Introduction
Effective forecasting plays an essential role in various aspects, such as energy application, economic risk management, standardized management, policy making, and so on. Forecasting helps corporations, governments, and other organizations and institutions to evaluate the market and to make relative predictions to better understand potential relations among entities and to plan for the future, which is a useful way to make policies on both the private and the social levels. As a result, many forecasting methods have been proposed during the past decades. Among these, there are two different categories: time series forecasting and causal forecasting. Since causal forecasting has some inherent limitations, including the reliability and availability of independent variables, time series forecasting has been applied much more widely due to its convenience for data collection and its high accuracy as well as stability. Time series forecasting methods presume that history will repeat itself, which means that the forecasting of future values is based on present values and past observations. Nowadays, time series forecasting has achieved great success in many industries, especially in the energy industry.
With the rapid development of the energy industry and increasing demand for high-level management and application, its infrastructure has been upgraded to a great extent, as a result of which prices, supply, and demand have oscillated more strongly and have become more unpredictable than ever before. This has posed a great challenge to forecasting methods in terms of accuracy and stability, since forecasting plays an essential role in quantifying those unfavorable features, and more accurate forecasts can then be applied to risk management, energy planning, industry configuration, and so on. Electrical power load forecasting has long been an important part of power system planning and the basis of the economic operation of power systems. Unfortunately, several difficulties must be confronted, such as meteorological factors, development speed, and unpredictable natural disasters, if we want to take good advantage of power load data. Researchers have focused on exploring nonsymmetrical faults [1], ground faults [2], microgrid distributions [3], etc. For instance, Qu et al. [4] explored and developed an intelligent damping controller which can reduce power fluctuations in hybrid power systems. Ye et al. [5] studied long-term load forecasting based on support vector regression (SVR) and explored nonlinear relationships between economic growth in terms of GDP and power load requirements. On the other hand, with the inadequate implementation of corresponding emission and environmental protection policies [6], wind power has attracted many scientists and researchers [7]. Currently, wind power accounts for roughly 10% of the total consumption of energy in Europe, 15% more than that of Spain and Germany [8]. To utilize the wind more effectively and efficiently, accurate forecasts of the wind speed are needed.
Nevertheless, wind speed is inherently volatile and irregular and is considered a fairly difficult weather element to predict accurately because of its randomness and nonlinearity [9]. Numerous researchers and scientists have made great contributions to the development of effective and robust wind speed forecasting models, which can also be used to forecast electrical power load data. According to time horizons, there are four different types of forecasting methods: long-term forecasting, medium-term forecasting, short-term forecasting, and very-short-term forecasting. Moreover, forecasting methods can also be divided into the following four types: artificial intelligence methods, statistical methods, spatial correlation methods, and physical methods [10].
Physical models containing parameters ranging from temperature to topography to pressure are usually used on a massive scale for long-term wind speed prediction with multiple weather parameters [11]. In contrast, statistical models, such as the autoregressive (AR) model, autoregressive moving average (ARMA) [12], autoregressive integrated moving average (ARIMA) [13], fractional ARIMA (FARIMA) [14], exponential smoothing (ES) [15], and grey prediction (GP) [16], are developed on the basis of the relationships among variables through mathematical statistics to illustrate the potential correlations within the historical data sampled from the observed wind speed data. Spatial correlation methods mainly take into account other factors, such as the direction of the wind, the terrain roughness, and the height above the horizon. Sometimes, this kind of method achieves high accuracy [17].
With the rapid development of computer science, performing complex calculations in less time has become possible. Consequently, in the past few years, a large number of statistical learning models have been reported, which eventually formed a mature theoretical system. The renowned ANN (artificial neural network), which is able to carry out parallel processing and to deliver nonlinear mappings, is widely utilized in wind speed forecasting. ANNs mainly include back propagation (BP) [18], the radial basis function (RBF) [19], the Elman neural network (ENN) [20], the wavelet neural network (WNN) [21,22], and others. In addition, during the past twenty years, the neural network field has experienced some innovations which have resulted in the well-known deep learning (DL) models [23]. In particular, the large computational cost has been the largest drawback of conventional neural network algorithms. However, greedy layer-wise pretraining is able to train the so-called deep belief network (DBN) more efficiently [24,25]. Following this progress, scientists are now able to create and train neural networks with more than one hidden layer, which, in turn, has increased generalization capabilities and allowed better outcomes. The field has also been renamed “deep learning” to emphasize the depth of the progress made [26]. The success of DL models can be seen in computer science applications, such as image recognition [27], speech recognition [28], and machine translation [29]. Moreover, the benefits have also spread to energy-related fields, such as wind power forecasting, which is closely related to wind speed forecasting. In the same field, Wang et al. [30] proposed using convolutional neural networks (CNNs) to obtain precise probabilistic predictions of wind power.
However, relatively little research has applied DL-related models to wind speed forecasting compared with the most active application areas of this technology. In Reference [31], a deep autoencoder (DAC) combined with extreme gradient boosting (XGB) was proposed to forecast the building cooling load; a deep neural network (DNN) was also applied to obtain the forecasting results, and this method was more accurate than the other methods presented in the same paper. In Reference [32], a DL model was also shown to detect islanding with high accuracy. Therefore, we considered the application of these kinds of technologies to wind speed forecasting in an effort to achieve a higher accuracy. Furthermore, the authors of Reference [33] proposed a DL strategy for time series forecasting and demonstrated how it can be successfully used in electricity consumption forecasting, which correlates with the wind speed data to some extent. Apart from ANNs, fuzzy logic methods [34] as well as support vector machine (SVM) [35]-related methods, such as least-squares support vector machines (LSSVMs) [36], Gaussian processes [37], and others, are also commonly applied in the forecasting of wind speed.
However, each method has different drawbacks and disadvantages as a result of its inherent nature. The drawbacks of the aforementioned models are summarized as follows:
- (1)
- Because physical algorithms are very sensitive to market information, they need a long run time and a large amount of computing resources. In addition, these models have shortcomings in dealing with short-term forecasting problems and they do not have high accuracy and validity in short-term forecasting.
- (2)
- Traditional statistical arithmetic methods fail to manage forecasting with fluctuations and high levels of noise, nonlinear and irregular trends, or other inherent characteristics of wind speed data that are primarily confined by the premise of a linear pattern along a time series. Moreover, oftentimes, these methods require a large amount of historical data on which they deeply depend in realistic cases. This means that once there is an abrupt and unexpected change in the original data as a result of social or environmental factors, prediction errors will proliferate all at once [38].
- (3)
- Spatial correlation methods rely on vast quantities of information, for example, the wind speed information of many spatially correlated sites, which is difficult to collect and analyze; this makes it hard to achieve accurate wind speed forecasting [39].
- (4)
- Artificial intelligence arithmetic methods, different from other approaches, are able to deal with nonlinear features which are hidden among historical wind speed data. Although many studies have been carried out and the methods have been successfully applied to address complex data patterns, there are also some defects and drawbacks within artificial intelligence methods, such as showing a relatively low convergence rate and over-fitting, easily getting into a local optimum, etc.
- (5)
- Individual forecasting models are good at forecasting to some extent, but they rarely focus on the importance and necessity of data preprocessing; therefore, these approaches cannot always get a good forecasting outcome.
Hence, with the objective of combining all the advantages and of avoiding the weaknesses, a number of combined forecasting methods have been proposed [40]. Bates and Granger proposed the combination prediction theory and showed promising outcomes in 1969 [41]. Since then, research on combinatorial forecasting theory has attracted extensive attention [42]. Xiao et al. developed two combined models for wind speed sequence prediction: the AI combination model [43,44,45] and NNCT (no negative constraint theory). The results indicate that more reliable and accurate forecasts are attained when the combined models are applied.
In addition, with the purpose of achieving highly accurate forecasting, several time series preprocessing techniques, such as wavelet packet decomposition (WPD) [46], fast ensemble empirical mode decomposition (FEEMD) [47], and singular spectrum analysis (SSA) [48], have been effectively applied in the data preprocessing stage of time series forecasting in an effort to reduce the random disturbances in the original wind speed data. Similarly, such techniques have been widely used in hybrid models to obtain a higher forecasting accuracy. Thus, the complementary ensemble empirical mode decomposition (CEEMD), which is modified from the ensemble empirical mode decomposition (EEMD), is applied in this paper.
Thus, in this study, a combined model, CEEMD-FTS (fuzzy time series)-MOGWO (multi-objective grey wolf optimizer)-LSTM (long short-term memory), is developed. It uses CEEMD as the preprocessing part and is based on LSTM, a modified member of the RNN (recurrent neural network) family within the DL field that has fewer disadvantages and a more powerful memorizing capability, together with the multi-objective optimization algorithm MOGWO. Subsequently, to deal with uncertain forecasting problems and to extract more useful information hidden within the historical data, we also combine the aforementioned model with a fuzzy time series analysis method based on rough set rule induction, which uses the LEM2 (learning from examples module version two) algorithm.
Generally, the innovations of this study can be summarized as follows:
- (1).
- This study proposes a hybrid forecasting model which takes advantage of deep learning networks as well as the fuzzy time series analysis technique based on the LEM2 rule-generating algorithm, which markedly increases the forecasting accuracy. To our knowledge, deep learning neural networks have not previously been combined with rough set induction theory. Hence, our study develops a hybrid model combining LSTM with the fuzzy time series analysis technique that uses rough sets to generate rules as a replacement for traditional rule-generating methods.
- (2).
- This study improves the forecasting stability and accuracy simultaneously with the deep learning neural network through the weight-determining method called MOGWO based on the leave-one-out strategy and swarm intelligence, which helps to find best weighting parameters for the LSTM neural network. Most previous studies just paid attention to one aim (stability or accuracy). Therefore, to achieve high accuracy and stability, a multi-objective optimization algorithm, MOGWO, is successfully applied in this study.
- (3).
- This study provides a scientific and reasonable evaluation of the new hybrid forecasting model to verify its forecasting performance. Three experiments are carried out in this paper, including comparisons between different deep learning neural networks, efficiency and effectiveness tests among various models at four different wind sites, and a contrast experiment in which the proposed hybrid forecasting system is applied to electrical load forecasting with two different electrical power load data series on Wednesday and Sunday. The outcome illustrates that the proposed system performs well.
- (4).
- This study delivers an insightful discussion about the developed forecasting system, illustrating the improvements brought about by different parts of the proposed forecasting model as well as the multistep forecasting ability. Five discussion topics are presented in this paper, namely statistical significance, association strength, improvement percentage, multistep ahead forecasting, and sensitivity analysis. Through these discussions, the effectiveness of the hybrid forecasting framework is verified.
The remainder of this paper is organized as follows:
Section 2 outlines the principles of the methods underlying the proposed hybrid model, namely the CFML model (CEEMD-FTS-MOGWO-LSTM), including the data preprocessing method, the fuzzy time series technique with LEM2, the MOGWO, and the long short-term memory algorithm. Several evaluations and experiments that help to demonstrate the performance of the CFML model are presented in Section 3. Section 4 discusses the different comparison outcomes. Finally, Section 5 concludes this study.
2. Methodology
An innovative hybrid forecasting model is successfully developed and the corresponding components are introduced briefly in this section, including the data preprocessing technique named complementary ensemble empirical mode decomposition (CEEMD), the fuzzy analyzing part based on rough sets induction theory, the forecasting algorithm named LSTM, and the multi-objective optimization algorithm MOGWO.
2.1. Hybrid Forecasting Framework
Figure 1 and Figure 2 show the combined CFML forecasting model, which can be described as follows:
Figure 1.
Explicit processes of data input and complementary ensemble empirical mode decomposition (CEEMD) parts of the CEEMD-fuzzy time series (FTS) multi-objective grey wolf optimizer (MOGWO)-LSTM (CFML) model.
Figure 2.
Flowchart of the paper and the input data of the first forecasting.
- The original wind speed data is decomposed by applying the CEEMD method into several subseries named Intrinsic Mode Functions (IMFs).
- The fuzzy analysis method is applied using the rough set induction LEM2 algorithm to generate the forecasting rules, and raw data are applied to these rules to generate preliminary forecasts. These forecasts obtained by fuzzy time series forecasting are not precise enough, but the difference between these forecasts and the actual values can demonstrate potential forecasting biases that are useful for modifying the learning process of the following neural network, namely the LSTM model optimized with MOGWO. As for the raw input data, we accept five dimensions for each forecast, including lag1, lag2, lag3, slope, and the present data, in order to forecast the following one for each subseries (Figure 2).
- The output data generated from the previous steps is used as the input data for the LSTM forecasting module, which is optimized by the multi-objective optimization algorithm called MOGWO for each subseries. Specifically, real values of lag1, and lag2 and their differences, including D1, D2, and D3, are adopted as input data of the LSTM model modified by MOGWO (Table 1).
- The forecasting outcomes of each subseries generated from the preprocessing part named CEEMD are aggregated to obtain the eventual forecasting results of CFML.
Table 1. The selected input variables for long short-term memory (LSTM).
2.2. Data Preprocessing Module
The CEEMD algorithm, proposed by Yeh et al. [49], is the modified version of the EEMD and EMD. According to Anbazhagan et al. [50], the primary steps of this algorithm are as follows:
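- Step 1:
- Add white noise pairwise, with identical amplitude and opposite phase, to the raw data sequence $x(t)$, after which we obtain a pair of polluted signals:
$$P_i^{+}(t) = x(t) + n_i(t), \qquad P_i^{-}(t) = x(t) - n_i(t)$$
where $n_i(t)$ is the white noise added in the $i$-th trial.
- Step 2:
- Decompose the polluted signal pair ($P_i^{+}(t)$, $P_i^{-}(t)$) into a finite set of IMF components:
$$P_i^{+}(t) = \sum_{j=1}^{J} c_{ij}^{+}(t), \qquad P_i^{-}(t) = \sum_{j=1}^{J} c_{ij}^{-}(t)$$
- Step 3:
- Two sets of IMF components, i.e., the positive-noise set $\{c_{ij}^{+}(t)\}$ and the negative-noise set $\{c_{ij}^{-}(t)\}$, are obtained by performing the above two steps T times with different white noise realizations.
- Step 4:
- The j-th IMF component is calculated as the ensemble mean over all trials:
$$c_j(t) = \frac{1}{2T} \sum_{i=1}^{T} \left( c_{ij}^{+}(t) + c_{ij}^{-}(t) \right)$$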
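The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this paper: `toy_decompose` is a hypothetical stand-in for the EMD sifting procedure (it only splits a series into a fast and a slow part), and the names `ceemd`, `trials`, and `noise_std` are chosen here for illustration. A real application would pass an actual EMD routine as `decompose`.

```python
import random


def toy_decompose(signal):
    """Hypothetical stand-in for EMD (NOT a real sifting procedure).

    Splits a series into a fast part (deviation from a 3-point moving
    average) and a slow part (the moving average itself).
    """
    n = len(signal)
    slow = []
    for k in range(n):
        window = signal[max(0, k - 1):min(n, k + 2)]
        slow.append(sum(window) / len(window))
    fast = [s - m for s, m in zip(signal, slow)]
    return [fast, slow]  # the two components sum back to the input


def ceemd(signal, decompose, trials=50, noise_std=0.2, seed=0):
    """Average the components obtained from +noise / -noise signal pairs."""
    rng = random.Random(seed)
    n = len(signal)
    sums = None
    for _ in range(trials):
        noise = [rng.gauss(0.0, noise_std) for _ in range(n)]
        plus = [s + e for s, e in zip(signal, noise)]   # Step 1: x + n_i
        minus = [s - e for s, e in zip(signal, noise)]  # Step 1: x - n_i
        for polluted in (plus, minus):                  # Step 2: decompose both
            comps = decompose(polluted)
            if sums is None:
                sums = [[0.0] * n for _ in comps]
            for j, comp in enumerate(comps):            # Step 3: accumulate sets
                for k in range(n):
                    sums[j][k] += comp[k]
    # Step 4: ensemble mean over the 2 * trials decompositions
    return [[v / (2 * trials) for v in comp] for comp in sums]


signal = [((k * 7) % 5) / 5.0 for k in range(40)]
imfs = ceemd(signal, toy_decompose)
reconstructed = [sum(comp[k] for comp in imfs) for k in range(len(signal))]
```

Because the noise is added in pairs with opposite signs, it cancels in the ensemble mean, so the averaged components of this sketch sum back to the raw series up to rounding error.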
2.3. Rough Set Theory (RST) and LEM2
In this part, the fuzzy forecasting module of the proposed new hybrid model CFML which contains the rough set theory and the more detailed rule induction algorithm called LEM2 is introduced in brief.
Pawlak and Skowron proposed RST [51], and it has been acknowledged as one of the most effective mathematical techniques for dealing with uncertainty and vagueness. The premise of rough set philosophy is that, because only limited information is associated with each object in the universe of discourse, objects characterized by the same information are indiscernible from one another. The set of all indiscernible objects is regarded as an elementary set and forms a basic granule of knowledge about the universe. Any union of elementary sets is called an exact set; otherwise, the set is called a rough set. RST uses indiscernibility relations to approximate sets of objects by upper and lower approximations [52]. Rough set theory is widely used to acquire more accurate rules to predict objects, and the LEM2 algorithm is usually adopted as a way of applying rough set theory to the induction of rules.
LEM2 [53], a rough set rule induction algorithm, is most frequently adopted as it has better results in most cases. In this study, the formed rules are generated in an “if-then” manner through composing several fuzzy decision values as well as fuzzy conditional values. Moreover, “supports” indicate how many records are archived in the dataset that matches the generated decision rules. LEM2 computes a local covering and then converts it into a rule set. LEM2 learns a discriminant rule set; it learns the smallest set of minimal rules describing a concept. This algorithm can generate both certain and possible rules from a decision table. The rough set induction LEM2 algorithm has several advantages because of the application of rough set theory, as follows:
- 1.
- Rough sets can discern hidden facts and make it possible for us to understand these facts in natural language, which contributes a great deal to decision making;
- 2.
- Rough sets take the background information of decision makers into account;
- 3.
- Rough sets can deal with both qualitative and quantitative attributes;
- 4.
- Rough sets enable machines to extract certain rules in a relatively short time, which means it reduces the time cost of discovering hidden rules.
The process of how LEM2 works is briefly described as follows. For an attribute–value pair $(e, u) = n$, the block of $n$, denoted by $[n]$, is the set of all instances belonging to $U$ for which attribute $e$ has value $u$. For a concept represented by the decision–value pair $(d, w)$, $B$ is a nonempty upper or lower approximation of it. Set $B$ depends on a set $N$ of attribute–value pairs $n = (e, u)$ only under the condition that $\emptyset \neq [N] = \bigcap_{n \in N} [n] \subseteq B$. Set $N$ is a minimal complex of $B$ only under the condition that $B$ depends on $N$ and that there is no proper subset of $N$ such that $B$ depends on that subset. Let $C$ be a nonempty collection of nonempty sets of attribute–value pairs. Then $C$ is a local covering of $B$ only under the condition that each member of $C$ is a minimal complex of $B$, $\bigcup_{N \in C} [N] = B$, and $C$ is minimal, i.e., it has the smallest possible number of members. A more detailed explanation can be found in the work of Grzymala-Busse [53].
Figure 3 demonstrates the pseudocode of LEM2 based on the study of Liu et al. [54].
Figure 3.
Flowchart of fuzzy time series forecasting.
- Step 1.
- Compute all attribute–value pair blocks.
- Step 2.
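- Identify the attribute–value pair $n$ whose block $[n]$ has the largest overlap with the set $G$ of instances still to be covered, i.e., the largest $|[n] \cap G|$.
- Step 3.
- If several pairs tie on this cardinality, select the attribute–value pair with the smallest block size $|[n]|$.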
- Step 4.
- If necessary, we have to go through an additional internal loop in order to find the candidates for the minimal complex.
- Step 5.
- Then, the following steps are used to find the second minimal complex and so on.
- Step 6.
- Finally, we can get the local covering of a hidden fact, which may reveal the decision-making process.
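As an illustration of the steps above, the following Python sketch implements the LEM2 covering procedure in the form described by Grzymala-Busse [53]. It is a minimal sketch, not the implementation used in this paper: the function name `lem2`, the small decision table, and the attribute names `lag1` and `lag2` with fuzzy values "low" and "high" are hypothetical. The sketch also assumes that the concept is consistent with the table, i.e., that it is a lower or upper approximation.

```python
def lem2(rows, concept):
    """Return a local covering of `concept` as a list of minimal complexes.

    rows:    list of dicts mapping attribute -> value (one dict per instance)
    concept: indices of the instances to be described (the set B)
    """
    concept = set(concept)
    # Step 1: compute the block of every attribute-value pair
    blocks = {}
    for idx, row in enumerate(rows):
        for pair in row.items():
            blocks.setdefault(pair, set()).add(idx)

    def block_of(pairs):
        """Instances matched by all pairs at once (intersection of blocks)."""
        covered = set(range(len(rows)))
        for pair in pairs:
            covered &= blocks[pair]
        return covered

    covering, goal = [], set(concept)
    while goal:
        complex_, g = [], set(goal)
        # Step 4: inner loop, add pairs until the complex lies inside B
        while not complex_ or not block_of(complex_) <= concept:
            candidates = [p for p in blocks
                          if p not in complex_ and blocks[p] & g]
            # Steps 2-3: largest overlap with the goal, ties -> smallest block
            best = max(candidates,
                       key=lambda p: (len(blocks[p] & g), -len(blocks[p])))
            complex_.append(best)
            g &= blocks[best]
        # drop redundant pairs so that the complex is minimal
        for pair in list(complex_):
            rest = [q for q in complex_ if q != pair]
            if rest and block_of(rest) <= concept:
                complex_ = rest
        covering.append(complex_)
        # Step 5: continue with the instances that are not covered yet
        goal = concept - set().union(*(block_of(c) for c in covering))
    # Step 6: drop redundant complexes so that the covering is minimal
    for c in list(covering):
        rest = [d for d in covering if d is not c]
        if rest and set().union(*(block_of(d) for d in rest)) == concept:
            covering = rest
    return covering


# Hypothetical fuzzified table; instances 0, 1, and 3 have the decision "up".
rows = [
    {"lag1": "low", "lag2": "low"},
    {"lag1": "low", "lag2": "high"},
    {"lag1": "high", "lag2": "low"},
    {"lag1": "high", "lag2": "high"},
]
rules = lem2(rows, {0, 1, 3})
```

For this toy table, the covering corresponds to two "if-then" rules: if lag1 is low, then the decision is "up", and if lag2 is high, then the decision is "up".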
2.4. Multi-Objective Grey Wolf Optimizer (MOGWO)
To get more accurate forecasts, we adopt the GWO (grey wolf optimizer) algorithm which is modified to deal with the multi-objective problems to optimize the main forecasting model LSTM. By using the multi-objective optimization theory, we can achieve both an accurate and a stable forecasting quality.
Mirjalili et al. proposed the grey wolf optimization algorithm [55], which was based on grey wolves’ social leadership and hunting skills. In addition, the hunting process is led by three wolves (α, β, and δ). The rest of the wolves follow these three leaders throughout the whole search process to approach the global best solution.
The following formulas were proposed to emulate the encircling behavior of grey wolves:
$$K = \left| B \cdot R_p(ite) - R(ite) \right|$$
$$R(ite + 1) = R_p(ite) - M \cdot K$$
where $K$ denotes the distance between the prey and the predator, $ite$ refers to the current iteration, $R$ denotes the position vector of a wolf, $R_p$ is the prey’s position vector, and $M$ and $B$ are coefficient vectors:
$$M = 2c \cdot e_1 - c$$
$$B = 2 e_2$$
where $e_1$ and $e_2$ are random vectors in $[0, 1]$ and the elements of $c$ decrease linearly from 2 to 0 across all iterations.
The GWO algorithm stores the three best solutions obtained so far in each iteration and then obliges the other search agents, namely the rest of the wolves, to update their positions with respect to them. The following formulas are evaluated repeatedly for each search agent [55] in order to mimic the hunting process and to find the promising regions of the search space:
$$K_\alpha = \left| B_1 \cdot R_\alpha - R \right|, \quad K_\beta = \left| B_2 \cdot R_\beta - R \right|, \quad K_\delta = \left| B_3 \cdot R_\delta - R \right|$$
$$R_1 = R_\alpha - M_1 \cdot K_\alpha, \quad R_2 = R_\beta - M_2 \cdot K_\beta, \quad R_3 = R_\delta - M_3 \cdot K_\delta$$
$$R(ite + 1) = \frac{R_1 + R_2 + R_3}{3}$$
The vector $B$ produces random values in $[0, 2]$. This helps the GWO algorithm show more random behavior throughout the optimization process, which favors exploration and helps to avoid local optima. All these steps are illustrated in Figure 4. $R_i$ is the position of wolf $i$, which also represents the initial weights and thresholds of the LSTM model. That is to say, $R_i$ is a vector whose dimension is determined by the number of initial weights and thresholds of the LSTM model, and each element of this vector is the value of one weight or threshold of the LSTM.
Figure 4.
Position updating mechanism of search agents and the effects of A on it.
Attacking is the final stage of hunting, in which the wolf pack catches the prey and the prey stops moving. The process is determined by $M$. Grey wolves will continue to attack the prey when $|M| < 1$, and the wolves are obliged to leave the prey when $|M| > 1$.
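The position update described in this subsection can be sketched as follows. This is a minimal single-objective illustration under stated assumptions, not the MOGWO used in this paper: it has no archive or leader-selection mechanism, it takes $B = 2 e_2$ as in the original GWO [55], and the function names, the search bounds, and the clamping of positions to those bounds are hypothetical choices made for the example.

```python
import random


def gwo_step(position, leaders, c, rng):
    """One position update of a search agent towards alpha, beta, and delta."""
    candidates = []
    for leader in leaders:                  # leaders = (R_alpha, R_beta, R_delta)
        cand = []
        for r, r_lead in zip(position, leader):
            m = 2 * c * rng.random() - c    # M = 2 c e1 - c, lies in [-c, c]
            b = 2 * rng.random()            # B = 2 e2, lies in [0, 2]
            k = abs(b * r_lead - r)         # K = |B R_leader - R|
            cand.append(r_lead - m * k)     # R_i = R_leader - M K
        candidates.append(cand)
    # R(ite + 1) = (R_1 + R_2 + R_3) / 3
    return [sum(vals) / len(candidates) for vals in zip(*candidates)]


def gwo_minimize(objective, dim, n_wolves=20, iters=100,
                 low=-5.0, high=5.0, seed=0):
    """Minimize `objective` over [low, high]^dim with a basic GWO loop."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(low, high) for _ in range(dim)]
              for _ in range(n_wolves)]
    for ite in range(iters):
        c = 2.0 * (1 - ite / iters)         # decreases linearly from 2 towards 0
        leaders = sorted(wolves, key=objective)[:3]
        wolves = [
            [min(high, max(low, v)) for v in gwo_step(w, leaders, c, rng)]
            for w in wolves
        ]
    return min(wolves, key=objective)


def sphere(w):
    """Hypothetical test objective with its minimum at the origin."""
    return sum(v * v for v in w)


best = gwo_minimize(sphere, dim=2, iters=30)
```

When $c$ reaches 0, $M$ vanishes and every wolf moves to the mean of the three leaders, which corresponds to the attacking stage; large values of $|M|$ early in the run push wolves away from the leaders and favor exploration.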
2.5. Long Short-Term Memory (LSTM)
The LSTM model was developed by Hochreiter and Schmidhuber [56]. The gradient in the network is truncated where this does no harm, and a constant error flow is enforced through constant error carousels within special units. Multiplicative gate units learn to open and close access to this constant error flow.
The cell state is the key part of the LSTM structure. It runs directly along the entire chain, and the LSTM can delete information from or add information to the cell state, carefully regulated by structures called gates. These gates are a way to optionally let information through. They consist of a sigmoid neural net layer and a pointwise multiplication operation (Figure 5).
Figure 5.
LSTM (long-short-term memory) structure.
An input at time $t$ is ($x_t$), and the following formulas are used to compute the hidden state ($h_t$):
- In the LSTM module, the first step is to determine which information will be discarded from the cell state. The forget gate ($f_t$) is in charge of making this decision, as follows:
$$f_t = \sigma\left( T^{f} x_t + V^{f} h_{t-1} + b^{f} \right)$$
where σ is the sigmoid function, which maps the input value to an outcome between 0 and 1, $T$ and $V$ signify the weight parameters applied to the input and to the previous hidden state, respectively, and $b$ denotes bias parameters (i.e., $T^{f}$, $T^{i}$, $T^{c}$, and $T^{o}$; $V^{f}$, $V^{i}$, $V^{c}$, and $V^{o}$; and $b^{f}$, $b^{i}$, $b^{c}$, and $b^{o}$). In this part, the superscripts of $T$, $V$, and $b$ are not powers; they are just notations used to indicate which gate the parameters belong to. For instance, $T^{f}$ represents the weight parameters belonging to the forget gate, namely gate f.
- The next step is to determine which new information will be selected and stored in the cell state. This step has two sub-steps: The first one is the input gate ($i_t$) layer, which determines which values are going to be updated. A tanh layer is the second one, which produces a vector composed of new candidate values $\tilde{c}_t$. The calculations are as follows:
$$i_t = \sigma\left( T^{i} x_t + V^{i} h_{t-1} + b^{i} \right)$$
$$\tilde{c}_t = \tanh\left( T^{c} x_t + V^{c} h_{t-1} + b^{c} \right)$$
where $\tilde{c}_t$ is a candidate memory cell, which is similar to a memory cell but uses a tanh function.
- The next step is to update the old cell state $c_{t-1}$ into the new cell state $c_t$, which can be described as follows:
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
Here, the symbol $\odot$ represents pointwise multiplication.
- The final step is to determine what is going to be generated and selected as the output. This output is a filtered version of the cell state: the output gate ($o_t$) determines which part of the cell state the final output will consist of. The cell state is then passed through the tanh layer and multiplied by the output gate as follows:
$$o_t = \sigma\left( T^{o} x_t + V^{o} h_{t-1} + b^{o} \right)$$
$$h_t = o_t \odot \tanh(c_t)$$
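The gate computations above can be traced with a scalar LSTM step in Python. This is a minimal sketch under stated assumptions rather than the network trained in this paper: all quantities are scalars instead of vectors and matrices, `T` and `V` are taken to be the input and recurrent weights, and the function names and the dictionary layout keyed by gate name are choices made for illustration.

```python
import math


def sigmoid(z):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, T, V, b):
    """One scalar LSTM step.

    T, V, b are dicts keyed by gate name: 'f' (forget), 'i' (input),
    'c' (candidate), and 'o' (output).
    """
    pre = {g: T[g] * x + V[g] * h_prev + b[g] for g in "fico"}
    f = sigmoid(pre["f"])           # forget gate
    i = sigmoid(pre["i"])           # input gate
    c_tilde = math.tanh(pre["c"])   # candidate memory cell
    c = f * c_prev + i * c_tilde    # new cell state
    o = sigmoid(pre["o"])           # output gate
    h = o * math.tanh(c)            # new hidden state
    return h, c


# With all weights and biases at zero, every gate is half open.
zeros = dict.fromkeys("fico", 0.0)
h, c = lstm_step(x=1.0, h_prev=0.5, c_prev=2.0, T=zeros, V=zeros, b=zeros)
```

In this zero-weight case, the forget gate keeps half of the old cell state and the candidate contributes nothing, so the new cell state is 1.0 and the hidden state is half of tanh(1.0).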
| Algorithm: MOGWO-LSTM | |
| Objective function Input: | |
| Training data: | |
| Testing data: | |
| Output: | |
| —a series of forecasting data | |
| Parameters of MOGWO: | |
| Iter: the maximum number of iterations | n: the number of grey wolves |
| t: the current iteration number | Ri: the position of wolf i |
| e1: the random vector in [0, 1] | c: the constant vector in [0, 2] |
| Parameters of LSTM: | |
| Iteration: the maximum number of iterations | Bias_input: the bias vector of the input gate in [0, 1] |
| Input_num: the number of input nodes | Bias_forget: the bias vector of the forget gate in [0, 1] |
| Cell_num: the number of cell nodes | Bias_output: the bias vector of the output gate in [0, 1] |
| Output_num: the number of output nodes | yita: the rate of adjustment for the weight at each step |
| Cost_gate: the termination error cost | data_num: the number of columns of training data |
| 1:/*Set the parameters of MOGWO and LSTM*/ | |
| 2:/*Initialize the grey wolf population Ri (i = 1, 2, ..., n) randomly*/ | |
| 3:/*Initialize c, M, and B*/ | |
| 4:/*Define the archive size*/ | |
| 5: FOR EACH i: 1 ≤ i ≤ n DO | |
| 6: Evaluate the corresponding fitness function Fi for each search agent | |
| 7: END FOR | |
| 8: /*Find the non-dominated solutions and initialize the archive with them*/ | |
| 9: Rα, Rβ, Rδ= SelectLeader(archive) | |
| 10: WHILE (t < Iter) DO | |
| 11: FOR EACH i: 1 ≤ i ≤ n DO | |
| 12: /*Update the position of the current search agent*/ | |
| 13: Kj = |Bi Rj−R|, i = 1, 2, 3; j = α, β, δ | |
| 14: Ri = Rj−Mi Kj, i = 1, 2, 3; j = α, β, δ | |
| 15: R(t + 1) = (R1 + R2 + R3)/3 | |
| 16: END FOR | |
| 17: /*Update c, M, and B*/ | |
| 18: M = 2 c e1−c; B = 2 e2 | |
| 19: /*Evaluate the corresponding fitness function Fi for each search agent*/ | |
| 20: /*Find the non-dominated solutions*/ | |
| 21: /*Update the archive with regard to the obtained non-dominated solutions*/ | |
| 22: IF the archive is full DO | |
| 23: /*Delete one solution from the current archive members*/ | |
| 24: /*Add the new solution to the archive*/ | |
| 25: END IF | |
| 26: IF any newly added solutions to the archive are outside the hypercubes DO | |
| 27: /*Update the grids to cover the new solution(s)*/ | |
| 28: END IF | |
| 29: Rα, Rβ, Rδ = SelectLeader(archive) | |
| 30: t = t + 1 | |
| 31: END WHILE | |
| 32: RETURN archive | |
| 33: OBTAIN R* = SelectLeader(archive) | |
| 34: Set R* as the initial weight and threshold of LSTM | |
| 35: /*Standardize the training data and testing data*/ | |
| 36: /*Initialize the structure of the LSTM network*/ | |
| 37: /*Initialize cost_gate, bias_input, bias_forget, bias_output and the weight of the LSTM network*/ | |
| 38: FOR EACH i: 1 ≤ i ≤ Iteration DO | |
| 39: yita = 0.01 | |
| 40: FOR EACH m: 1 ≤ m ≤ data_num DO | |
| 41: Equation (15) to Equation (20) | |
| 42: /*Calculate the error cost of this round*/ | |
| 43: error cost = …, where l is the dimension of testing data | |
| 44: IF error cost < cost_gate DO | |
| 45: Break | |
| 46: END IF | |
| 47: /*Update the weight of all gates*/ | |
| 48: END FOR | |
| 49: IF error cost < cost_gate DO | |
| 50: Break | |
| 51: END IF | |
| 52: END FOR | |
| 53: /*Learning process has been done*/ | |
| 54: Input the standardized historical data into LSTM to forecast the future changes | |
| 55: De-normalize the obtained forecasting outcomes and generate the final forecasting results | |
There are two commonly adopted criteria for verifying forecasting effectiveness, accuracy and stability. Also, we should not just focus on one objective. Both objectives—high accuracy and stability—should be studied simultaneously and implemented in the optimization part. Therefore, based on bias-variance framework, the fitness function should be defined as follows:
where x is the actual value, is the forecasted value, and E is the expectation value of the corresponding variable.
The bias equals the average difference between the actual and forecasted values, which represents forecasting accuracy. A smaller absolute value of the bias indicates more accurate forecasts, and a smaller variance indicates more stable forecasting performance. However, in most of our experiments, these criteria proved unsuitable for the problem that this paper seeks to address. Thus, the standard deviation of the forecasting errors is used in place of fitness 2. Therefore, the fitness function in this paper is formulated as follows:
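As a minimal sketch of these two objectives, the snippet below computes the absolute mean error (accuracy) and the standard deviation of the forecasting errors (stability) for one candidate solution. The function name `fitness` and the sample values are hypothetical, and the exact form of fitness 1 used in this paper may differ from the absolute bias assumed here.

```python
from statistics import mean, pstdev


def fitness(actual, forecast):
    """Return (fitness_1, fitness_2) for one candidate solution.

    fitness_1: absolute mean forecasting error (assumed accuracy objective).
    fitness_2: standard deviation of the forecasting errors (stability objective).
    Both objectives are to be minimized.
    """
    errors = [f - a for a, f in zip(actual, forecast)]
    return abs(mean(errors)), pstdev(errors)


# Hypothetical values for illustration only (not data from this study).
actual = [5.0, 6.0, 7.0, 8.0]
forecast = [5.5, 5.5, 7.5, 7.5]
f1, f2 = fitness(actual, forecast)
print(f1, f2)
```

In this toy case the errors cancel on average, so the first objective is zero while the second is not, which illustrates why the two objectives have to be considered together.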
The objectives of a multi-objective optimization problem usually conflict with each other. In that regard, the Pareto optimal solution set provides an answer since it represents the best trade-offs between the different objectives. Our optimization problem in this study is a minimization problem, so the way we choose suitable solutions can be formulated as follows:
Minimize the following:
Subject to the following:
where o denotes the number of objectives, m is the number of inequality constraints, p is the number of equality constraints, and Li and Ui are the lower and upper boundaries of the i-th variables, respectively.
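A common way of writing such a problem in the multi-objective optimization literature is sketched below. The decision vector $\vec{x}$, the number of variables n, and the direction of the inequality constraints are assumptions made for illustration; o, m, p, L_i, and U_i are as defined above.

```latex
\begin{aligned}
\text{Minimize:}\quad & F(\vec{x}) = \{ f_1(\vec{x}), f_2(\vec{x}), \ldots, f_o(\vec{x}) \} \\
\text{Subject to:}\quad & g_j(\vec{x}) \ge 0, \quad j = 1, 2, \ldots, m \\
& h_j(\vec{x}) = 0, \quad j = 1, 2, \ldots, p \\
& L_i \le x_i \le U_i, \quad i = 1, 2, \ldots, n
\end{aligned}
```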
Several definitions regarding this problem are listed as follows:
Definition 1.
Pareto dominance.
Suppose that there are two vectors, x and y. Vector x dominates y if x is no worse than y in every objective and strictly better than y in at least one objective.
Definition 2.
Pareto optimality.
A solution is named Pareto optimal if no other feasible solution dominates it.
Two solutions are non-dominated with respect to each other if neither of them dominates the other.
Definition 3.
Pareto optimal set.
The set including all non-dominated solutions is named the Pareto optimal set.
Definition 4.
Pareto optimal front.
The set containing the objective values of the Pareto optimal solutions in the Pareto optimal set is defined as the Pareto optimal front.
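The definitions above can be illustrated with a short sketch for a minimization problem. The function names `dominates` and `non_dominated` and the sample objective vectors are hypothetical; this is a brute-force illustration, not the archive mechanism of MOGWO.

```python
def dominates(x, y):
    """True if objective vector x Pareto-dominates y (all objectives minimized)."""
    no_worse = all(a <= b for a, b in zip(x, y))
    strictly_better = any(a < b for a, b in zip(x, y))
    return no_worse and strictly_better


def non_dominated(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (fitness_1, fitness_2) pairs for four candidate solutions.
candidates = [(1, 5), (2, 3), (3, 4), (4, 1)]
front = non_dominated(candidates)
print(front)
```

Here (3, 4) is dominated by (2, 3), whereas the remaining three candidates are mutually non-dominated and therefore form the Pareto optimal front of this small set.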
2.6. Evaluation Module
This section describes the evaluation modules. Several typical evaluation metrics that are commonly adopted in the relevant research are used to verify the forecasting performance; Pearson’s correlation coefficient and the DM test are also used in this paper.
2.6.1. Typical Performance Metric
To the best of our knowledge, there are no uniform and consistent criteria for testing the validity of prediction results or for comparing them with those of other models. In this study, we adopt a variety of metrics, which are all shown in Table 2. Here, N is the length of the dataset, A denotes the actual value, and F represents the forecasted value.
Table 2.
Performance metric rules.
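As an illustration, three of the metrics referred to in this paper (MAE, RMSE, and MAPE) can be sketched in their standard forms as follows, with A and F denoting the actual and forecasted values as above. The function names and the sample values are hypothetical, and the definitions in Table 2 take precedence if they differ.

```python
from math import sqrt


def mae(A, F):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(A, F)) / len(A)


def rmse(A, F):
    """Root-mean-square error."""
    return sqrt(sum((a - f) ** 2 for a, f in zip(A, F)) / len(A))


def mape(A, F):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(A, F)) / len(A)


# Hypothetical values for illustration only.
A = [10.0, 20.0, 40.0]
F = [11.0, 18.0, 40.0]
print(mae(A, F), rmse(A, F), mape(A, F))
```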
2.6.2. Diebold–Mariano Test
Considering α as the significance level, the null hypothesis H0 states that there is no significant difference between the two forecasting models, whereas the alternative hypothesis H1 denotes disagreement with H0. The following formulas indicate the related hypotheses:
where represents the loss function of forecasting errors and are the forecasting errors of two comparison models.
Furthermore, the DM test statistics can be calculated as follows:
where is an estimation for the variance of .
The DM statistic is compared with the critical value Zα/2. H0 is rejected when the DM statistic falls outside the acceptance interval [−Zα/2, Zα/2], which indicates that there is a significant difference between the forecasting performance of the proposed model and that of the comparison models, meaning we accept H1.
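The test procedure can be sketched as follows for one-step-ahead forecasts. This is a simplified illustration under stated assumptions: the loss function is taken to be the squared error, the variance of the loss differential is estimated without any autocovariance correction (which multistep forecasts would require), and the function names and error values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, pvariance


def dm_statistic(e1, e2, loss=lambda e: e * e):
    """DM statistic for two series of forecasting errors (one-step-ahead case)."""
    d = [loss(a) - loss(b) for a, b in zip(e1, e2)]  # loss differential series
    return mean(d) / sqrt(pvariance(d) / len(d))


def reject_null(dm, alpha=0.05):
    """True if the DM statistic falls outside the two-sided acceptance interval."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return abs(dm) > z


# Hypothetical forecasting errors of a benchmark model (e1) and a better model (e2).
e1 = [1.0, -1.2, 0.9, -1.1, 1.3, -0.8]
e2 = [0.2, -0.1, 0.3, -0.2, 0.1, -0.3]
dm = dm_statistic(e1, e2)
print(dm, reject_null(dm, alpha=0.01))
```

A positive statistic means that the first model has the larger average loss; swapping the two error series only changes the sign.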
3. Analysis and Experiments
In this part, three experiments are carried out to test the proposed hybrid system, using four wind speed datasets acquired from the Liaotung peninsula and two electrical power load datasets collected from QLD (Queensland).
3.1. Raw Data Description
In this study, four different 10-min wind speed datasets were collected from four sites (Figure 6), namely four wind power plants in the Liaotung peninsula: the Hengshan site, Xianren island, the Donggang site, and the Danton site.
Figure 6.
Four wind speed datasets with 10-min time intervals.
In addition, two electrical load datasets were used to demonstrate the efficiency of the hybrid forecasting model. Each wind speed dataset contained 9488 data points, and each electrical load dataset contained 2544. Only the first 1000 observations were adopted to verify the model: the first 900 observations were used as the training set, and the remaining 100 observations formed the testing set (Figure 6). Furthermore, some basic statistical information on these datasets, such as the minimum, average, and maximum values, is given in Table 3.
Table 3.
Statistical values of each experiment dataset.
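The data partition described above can be sketched as follows. The helper names `split_series` and `describe` are hypothetical, and the series used here is a synthetic placeholder rather than the measured wind speed data.

```python
from statistics import mean


def split_series(series, n_used=1000, n_train=900):
    """Keep the first n_used observations; the first n_train form the training set."""
    used = series[:n_used]
    return used[:n_train], used[n_train:]


def describe(series):
    """Basic statistics of the kind reported in Table 3."""
    return {"min": min(series), "mean": mean(series), "max": max(series)}


# Synthetic placeholder with the same length as one wind speed dataset (9488 points).
series = [float(i % 17) for i in range(9488)]
train, test = split_series(series)
print(len(train), len(test), describe(train))
```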
3.2. Experiment I: Tests of MOGWO and LSTM
This experiment consists of two parts, which verify the superiority of the MOGWO optimizer and of the LSTM forecasting algorithm, respectively.
3.2.1. Test of MOGWO
The four typical test functions listed in Table 4 are commonly used to verify the performance of an optimizer on multi-objective optimization problems [57,58,59]. NSGA-Ⅱ and the multi-objective dragonfly algorithm (MODA) were used in this study for comparison. The experimental parameters were as follows: the number of search agents was 50, the archive size was 50, and the number of iterations was 100. The inverted generational distance (IGD), a widely used metric, was adopted in this paper for the evaluation. Each test function was tested fifty times, and Table 5 shows the statistical values of the IGD. Moreover, Figure 7 shows the Pareto optimal solutions acquired by the different algorithms.
Table 4.
Four test benchmark functions.
Table 5.
Statistical values of the inverted generational distance (IGD) for four test functions.
Figure 7.
Obtained Pareto optimal solutions by NSGA-II, MODA, and MOGWO for the test functions: Kursawe, ZDT1, ZDT2, and ZDT3.
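For readers unfamiliar with the IGD, one common form of the metric is sketched below: the average Euclidean distance from each point of the true Pareto front to its nearest obtained solution. Other variants exist (for example, the square root of the summed squared distances divided by the number of points), and the variant used for Table 5 is not restated here. The function name `igd` and the sample fronts are hypothetical.

```python
from math import dist


def igd(true_front, obtained):
    """Mean distance from each true Pareto front point to its nearest obtained point."""
    return sum(min(dist(p, q) for q in obtained) for p in true_front) / len(true_front)


# Hypothetical two-objective fronts for illustration only.
true_front = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
perfect = list(true_front)
sparse = [(0.0, 1.0), (1.0, 0.0)]
print(igd(true_front, perfect), igd(true_front, sparse))
```

A smaller IGD indicates that the obtained solutions are both closer to the true front and better spread along it, since a gap in coverage, as in the second case, increases the value.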
Based on the outcomes, two conclusions were made as follows:
- The MOGWO algorithm obtained the best IGD outcomes in almost all cases for the four test functions (Kursawe, ZDT1, ZDT2, and ZDT3), although it performed worse on the Kursawe and ZDT1 functions in terms of the minimum value and worse than MODA in terms of the standard deviation. Overall, these outcomes indicate the superior optimization ability of the MOGWO algorithm compared with the others.
- Figure 7 shows that the MOGWO algorithm was able to obtain more Pareto optimal solutions. In addition, the solutions found by the MOGWO algorithm were more evenly distributed along the true Pareto front (PF) curve and were closer to the real Pareto optimal solutions.
Remark: The optimizing ability of MOGWO is supported by the results and discussion of the aforementioned comparison. MOGWO is therefore well suited to multi-objective problems and is adopted as the optimization model in the proposed CFML system.
3.2.2. Test of LSTM in CEEMD-FTS-MOGWO-LSTM
This subsection compares LSTM, DBN, CNN, and SAE on the four 10-min wind speed datasets collected from four different wind farms. We set the parameters of each model based on the error and bias since there are no previous studies on how to set the optimal parameters. Also, to reduce the impact of randomness, we report the mean value over 50 runs of each experiment. The results and detailed values are listed in Table 6, and Figure 8 shows the prediction outcomes of the four models at the four wind speed sites. From the forecasting data, we drew several conclusions:
Table 6.
Forecasting results of the four deep learning algorithms at four sites.
Figure 8.
Forecasting results of the four deep learning algorithms.
- The LSTM model achieved almost the best results and the most accurate predictions on all four wind speed datasets with roughly the same run time and identical training and testing datasets. That is, the adopted LSTM model outperformed the CNN, DBN, and SAE overall and provided fairly competitive results.
- For the data collected from the four different wind farms, the LSTM model worked better than the other three deep learning models, which suggests that the superiority of the LSTM forecasting algorithm holds, to some extent, regardless of geographical location.
- The forecasting performance of the different models was adequately reflected by the error metrics adopted in this part. That is to say, the error measurement is effective and can be used to evaluate the ability of the prediction models.
Remark: For all four datasets, although the LSTM model performed more poorly than the other models on some metrics, it achieved the best values for the majority of error metrics, such as the mean absolute error (MAE), root-mean-square error (RMSE), mean absolute percentage error (MAPE), and index of agreement (IA), which indicates that the adopted LSTM model can achieve excellent forecasting accuracy. That is also the reason why we chose LSTM as the main forecasting model in our proposed hybrid forecasting model.
3.3. Experiment II
The comparisons made in this experiment were conducted to demonstrate the specific improvements brought by the fuzzy time series forecasting part and the optimizer algorithm as well as the combination of MOGWO and FTS. Furthermore, an experiment to prove the enhancement in the forecasting ability of the combined model brought by CEEMD was made as well. Moreover, the comparisons between the proposed hybrid forecasting model and all the other models are also listed and analyzed in this part. Table 7 and Figure 9 demonstrate the relevant error metric values of the models mentioned above.
Table 7.
Results of the developed forecasting framework and other models (Experiment II).
Figure 9.
Forecasting results of the developed forecasting system and the other compared models (Experiment II).
- (1)
- For the first comparison, the WNN, GRNN, ARIMA, and LSTM models were built and compared with each other in order to determine the best one for wind speed forecasting, which was found to be ARIMA. However, of all the neural network algorithms, LSTM was shown to be the best one, and Experiment I showed that LSTM is better than the other three deep learning models as well. Hence, the following steps and comparisons are all based on LSTM as the basic forecasting model.
- (2)
- In terms of R (Pearson’s correlation coefficient), ARIMA failed to outdo LSTM in datasets A and B. In addition, we tried AR, MA, ARMA, and ARIMA with different parameters, and we found that, of all these settings, ARMA(2,1), ARIMA(3,1,2), and ARIMA(3,2,2) achieved almost the same forecasting accuracy at about 8% MAPE, which is better than that of the neural networks. The reason for this phenomenon is that the moving-average model that includes AR requires clear periodic patterns and fairly linear trends in the data series, whereas wind speed datasets are neither seasonal nor regular, so most of these irregular features were removed by the moving-average method as a result of the differencing operation.
- (3)
- For example, Table 7 shows that, in the case of site A, the MOGWO-LSTM achieved a MAPE value of 8.64%, while the basic LSTM model achieved a MAPE value of 9.48%. Moreover, we tested the effectiveness of the fuzzy time series forecasting part. For example, in the case of site B, the MAPE value of FTS-LSTM was 7.91%, which is 8.34% lower than that of the LSTM model.
- (4)
- According to Table 7, the FTS-MOGWO-LSTM model achieved 8.02% in MAPE and 75.34% in r2 on average, although it failed to reach the highest r2 value in datasets A and B. Next, the separate improvements in forecasting ability brought by FTS and MOGWO varied between datasets. For example, in the case of dataset A, the MAPE of FTS-LSTM was higher than that of MOGWO-LSTM, which means that MOGWO contributes more to forecasting there.
- (5)
- Apart from these comparisons, the decomposition algorithm was also tested in this part. We tested several parameter configurations of the CEEMD algorithm regarding the Nstd (signal noise ratio), NR (noise addition number), Maxiter (maximum number of iterations), and modes (number of IMFs). The tested ranges were Nstd (0.05–0.4), NR (10–500), Maxiter (100–1000), and modes (9–13). The best parameter settings vary from dataset to dataset, so they should be retuned whenever the dataset changes. For instance, the best settings for dataset A were as follows: an Nstd of 0.2, an NR of 50, and a Maxiter of 500. The total IMF number was 12, and the best accuracy was achieved with 11 IMFs. Also, Table 7 shows that the CFML model achieved the highest r2 value and the lowest MAPE at all four data sites, which demonstrates the improvements brought by CEEMD.
Table 8. Experimental outcomes of the proposed forecasting system and other models (Experiment III, Wednesday).
Remark: The aforementioned comparisons show that the proposed hybrid forecasting model achieves the best values in all the applied error metrics. Moreover, the outcomes indicate that the adopted multi-objective optimizer MOGWO, the data decomposition approach CEEMD, and the fuzzy time series part can improve the forecasting ability of the original forecasting model LSTM to a great extent.
3.4. Experiment III: Tested with Electrical Load Data
The third experiment aims to verify the performance of the proposed CFML forecasting model in electrical power load forecasting for QLD (Queensland, Australia). Because load data are similar within weekdays and within weekends but differ noticeably between the two, the data from Wednesday were randomly selected to represent weekdays and the data from Sunday were chosen to represent weekends [60]. Table 8 and Table 9 list the experimental outcomes, and Figure 10 depicts the basic datasets and all forecasting results for Wednesday and Sunday. From these results, the following conclusions were drawn:
Figure 10.
The forecasting results from Wednesday and Sunday as well as the basic data descriptions.
Table 9.
Experimental outcomes of the proposed forecasting system and other models (Experiment III, Sunday).
- (1)
- Regarding the electrical power load data from Wednesday and all forecasting steps, the proposed hybrid forecasting system outperformed all the other models. Moreover, among all the single models involved in this experiment, the WNN algorithm performed best and the CNN model performed worst. However, this may be a result of the data features and does not mean that the CNN always performs more poorly than the WNN model. Since the regular form of the CNN model is designed to deal with image data, a unidimensional time series should first be transformed into a matrix in which each row contains many observations, such as 128 or 256, much like grey-scale image data. Alternatively, it is also reasonable and practical to let each row contain only the number of observations used as input data, but this may compromise the accuracy on some occasions.
- (2)
- For the test of the optimization part and the verification of the fuzzy forecasting part, comparisons between MOGWO-LSTM and LSTM and between FTS-LSTM and LSTM are shown in the aforementioned tables and figures, respectively. For instance, on Wednesday, the regular LSTM model achieved a MAPE of 2.81%, which FTS-LSTM reduced by 39.14%. Moreover, MOGWO-LSTM achieved a MAPE of 1.76%, an improvement of 37.36% over the regular LSTM model. Also, the FTS-MOGWO-LSTM model achieved a MAPE of 1.46%, lower than that of the single LSTM combined with either FTS or MOGWO. Notably, although this combined model did not have the highest r2, its r2 was not much lower than that of the other compared models and was clearly higher than that of regular networks such as GRNN, WNN, DBN, and SAE.
- (3)
- All comparisons for the electrical power load data on Wednesday and Sunday demonstrate that the models using decomposition methods achieved the best forecasting results. In this study, we tested different parameter settings regarding the Nstd, NR, Maxiter, and modes for EMD, EEMD, and CEEMD. The following outcomes were all acquired based on the best parameter settings for each decomposition algorithm. Table 8 and Table 9 show that the CEEMD method outperforms the EEMD and EMD methods, which explains why CEEMD was selected and employed in this research. Also, Figure 10 shows that the forecasts obtained with CEEMD corresponded most closely to the real data on both Wednesday and Sunday.
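The transformation of a unidimensional series into a matrix, mentioned in the first conclusion above, can be sketched as follows. The use of overlapping windows with the next observation as the target is an assumption made for illustration, and the function name `to_matrix` and the short window width are hypothetical; in practice each row would contain many observations, such as 128 or 256.

```python
def to_matrix(series, width):
    """Turn a 1-D series into rows of `width` consecutive observations.

    Each row is paired with the observation that immediately follows it,
    which serves as the one-step-ahead forecasting target.
    """
    rows, targets = [], []
    for start in range(len(series) - width):
        rows.append(series[start:start + width])
        targets.append(series[start + width])
    return rows, targets


# Toy series; a real input would be the standardized load or wind speed data.
series = list(range(10))
rows, targets = to_matrix(series, width=4)
print(rows[0], targets[0], len(rows))
```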
Table 10. Results for the Diebold–Mariano (DM) test.
Remark: Across the three experiments mentioned above, the developed model performed well on two electrical power load signals and on wind data from different sites, which feature different characteristics. This indicates that the CEEMD-FTS-MOGWO-LSTM model has broad applicability. Also, the CFML model performs better than all other compared benchmark models.
4. Discussion
In this section, based on the Diebold–Mariano test (DM test), we discuss and analyze the forecasting model’s statistical significance, after which we adopt the Pearson’s correlation coefficient to discuss the association strength. Then, to verify the contributions of our CFML model, the improvement percentages between different combinations of basic models are also discussed in this section. Also, the multistep-ahead forecasting of the developed model and a sensitivity analysis are conducted.
4.1. Discussion I: Statistical Significance
The DM test is used here to demonstrate the significance of the improvement brought by the developed CFML forecasting system compared with other algorithms. Table 10 lists the specific DM test outcomes, which show that we are able to reject the null hypothesis at the 1% significance level because all of the compared models’ DM test outcomes were greater than the critical value at the 1% significance level for all four wind speed datasets and the two electrical power load data series. Hence, we conclude that the forecasting performance of the proposed CFML system differs significantly, in the statistical sense, from that of the other compared algorithms. Furthermore, this supports the claim that the proposed CFML model is superior to the other models mentioned above in wind speed and electrical power load forecasting.
4.2. Discussion II: Association Strength
The Pearson test, proposed by Karl Pearson, reveals the strength of the correlation between the predicted and actual values. In this section, the correlation strength is discussed based on the Pearson test to compare the proposed hybrid prediction model with all other comparative models. Specifically, if the Pearson’s correlation coefficient is equal to 0, there is no linear relationship between the two sets of data and, if the Pearson’s correlation coefficient is equal to 1, there is a perfect positive linear relationship between the actual value and the predicted value. Table 11 gives the outcomes of the Pearson’s test, from which we conclude that the values of all other comparative models were lower than that of the proposed CFML forecasting model, which shows that the forecasting values of the CFML model possess a higher association strength.
Table 11.
Results for the Pearson’s test.
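For illustration, the Pearson’s correlation coefficient between the actual and forecasted values can be computed as sketched below. The function name `pearson_r` and the sample values are hypothetical and are not taken from Table 11.

```python
from math import sqrt


def pearson_r(actual, forecast):
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_f = sum(forecast) / n
    cov = sum((a - mean_a) * (f - mean_f) for a, f in zip(actual, forecast))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    return cov / sqrt(var_a * var_f)


# Hypothetical values: a perfectly linear forecast and an uncorrelated one.
actual = [1.0, 2.0, 3.0, 4.0]
print(pearson_r(actual, [2.0, 4.0, 6.0, 8.0]))
print(pearson_r(actual, [1.0, -1.0, -1.0, 1.0]))
```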
4.3. Discussion III: Improvement Percentage
In order to demonstrate the superiority of the proposed hybrid prediction system fully and clearly, this section discusses the percentage improvements in MAPE, RMSE, MAE, and direction accuracy (DA) between the developed system and the other comparative models. These comparisons quantify how much each component contributes to the overall prediction framework. Table 12 gives the improvement percentages, taking dataset B and the electrical power load on Wednesday as examples, from which the following conclusions were drawn:
Table 12.
Results for the discussion of improvement percentages.
- (1)
- By contrasting the improvement percentage between FTS-MOGWO-LSTM with FTS-LSTM and MOGWO-LSTM, we drew the conclusion that the combination of MOGWO and FTS contributes more than either FTS-LSTM or MOGWO-LSTM to the forecasting ability of the whole presented hybrid CFML forecasting model.
- (2)
- The comparison between the CEEMD-FTS-MOGWO-LSTM and the FTS-MOGWO-LSTM models obviously revealed the improvement brought by the addition of the decomposition approach CEEMD.
- (3)
- On average, all improvement percentages were positive and significant, except for those of FTS-MOGWO-LSTM, which fluctuated across datasets with different features. This can be studied in the future. Regardless of the fluctuations, all values revealed that FTS-MOGWO-LSTM does perform better than the regular LSTM.
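The improvement percentage of an error metric can be sketched as the relative reduction with respect to the baseline model. This form is an assumption, but it is consistent with the Wednesday MAPE values quoted in Experiment III (2.81% for LSTM and 1.76% for MOGWO-LSTM). The function name `improvement` is hypothetical.

```python
def improvement(baseline_error, model_error):
    """Relative reduction of an error metric with respect to a baseline, in percent."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# MAPE values for Wednesday quoted in Experiment III.
lstm_mape = 2.81
mogwo_lstm_mape = 1.76
print(round(improvement(lstm_mape, mogwo_lstm_mape), 2))
```

The result is approximately 37.37%, which is close to the reported 37.36%; the small gap presumably comes from rounding of the quoted MAPE values. For metrics where larger values are better, such as DA, the sign of the numerator would have to be reversed.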
4.4. Discussion Ⅳ: Multistep-Ahead Forecasting
One-step forecasting is sometimes insufficient to ensure the controllability and reliability of an electrical power load or wind speed forecasting system. Therefore, to test the multistep performance of the developed CFML system, multistep prediction was carried out on two of the datasets listed in Table 3 (dataset A and the electrical power load on Sunday) as representatives.
Table 13 gives the forecasting outcomes of the comparative models (i.e., GRNN, LSTM, and EEMD-FTS-MOGWO-LSTM) and the proposed CEEMD-FTS-MOGWO-LSTM forecasting model. It can be observed that, for one-step, two-step, and three-step predictions using electrical power load data or wind speed data, the proposed model always achieved the lowest MAPE value among the tested models. That is to say, according to the error metrics, the developed framework effectively carried out multistep-ahead forecasting in both electrical power load prediction and wind speed prediction.
Table 13.
Results for the multistep-ahead forecasting.
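One common way to obtain multistep-ahead forecasts from a one-step model is the recursive strategy, in which each prediction is fed back as an input for the next step. The sketch below illustrates this idea only; the strategy actually used for Table 13 is not restated here, and `recursive_forecast` and the stand-in model `linear_extrapolation` are hypothetical.

```python
def recursive_forecast(one_step_model, history, steps, width):
    """Forecast `steps` values ahead by repeatedly applying a one-step model."""
    window = list(history[-width:])
    forecasts = []
    for _ in range(steps):
        nxt = one_step_model(window)
        forecasts.append(nxt)
        window = window[1:] + [nxt]  # slide the window over the new prediction
    return forecasts


def linear_extrapolation(window):
    """Hypothetical stand-in for a trained one-step forecasting model."""
    return window[-1] + (window[-1] - window[-2])


print(recursive_forecast(linear_extrapolation, [1.0, 2.0, 3.0], steps=3, width=2))
```

Because later steps depend on earlier predictions, errors can accumulate with the forecasting horizon, which is one reason why multistep accuracy is usually reported separately for each step.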
4.5. Discussion V: Sensitivity Analysis
The hybrid forecasting model has two essential parameters, namely the number of iterations and the number of search agents. Hence, in this subsection, we explore the effects of these two parameters on the prediction performance of wind speed dataset A. That is, the other parameters’ values were unchanged, while the number of search agents and iterations changed. Specifically, we set the search agents as 5, 10, 15, 20, 25, and 30, and then, we kept the search agent at the value of 10, changing the values of iterations to 5, 10, 20, 30, 40, and 50. Table 14 and Table 15 illustrate the experimental outcomes of dataset A. The following conclusions were drawn:
Table 14.
Sensitivity analysis of different search agent numbers based on MOGWO.
Table 15.
Sensitivity analysis of the different iteration numbers based on MOGWO.
- (1)
- The MAPE first decreased as the number of search agents increased, reaching its minimum at 10 search agents, after which it increased and fluctuated at a high level, apart from a decrease at 25 search agents. Overall, the proposed hybrid CFML forecasting model performed the best with 10 search agents.
- (2)
- Keeping the number of search agents at the best value of 10, we changed the number of iterations in order to check its influence on the performance of the presented model. We drew a conclusion similar to that for the search agents. As the number of iterations increased from 5 to 30, the error measured by various metrics, especially MAPE, first fell to its minimum value at 10 iterations and then rose gradually as the number of iterations increased. According to these two conclusions, we set both the number of search agents and the number of iterations to 10 in our experiment.
- (3)
- The comparisons show that values of these two parameters that are either too small or too large worsen the performance of the CEEMD-FTS-MOGWO-LSTM system proposed in this study. In addition, the best setting was shown to depend to a large extent on the prediction conditions. Therefore, it is important to determine the optimal parameters for each application condition.
Remark: According to Discussions I to V, we can draw the conclusion that the proposed hybrid forecasting system, namely CEEMD-FTS-MOGWO-LSTM, possesses a more effective and stable forecasting ability than the other models for both wind speed and electrical power load in many respects, such as the correlation strength, statistical significance, and forecasting accuracy. Also, the small number of iterations and search agents required demonstrates the convenience of the proposed model.
5. Conclusions
Accurate wind speed and electrical power load forecasting is crucial for power grid safety management, power system operation, and the power market. However, due to the nonlinearity and randomness of wind speed data and electrical power load series, it is still a difficult and challenging task to establish an effective forecasting framework to deal with this problem. In this study, a new hybrid prediction system was developed in order to obtain stability and accuracy simultaneously. Four wind speed datasets and two electrical power load datasets were adopted to test the effectiveness of the hybrid forecasting framework. The outcomes show that our proposed system outperformed all other comparative benchmark models on many indicators. Firstly, a data preprocessing decomposition approach, named CEEMD, was successfully applied in this study to enhance the forecasting ability of the CFML forecasting model. Secondly, an effective multi-objective optimization algorithm, MOGWO, was used to find the optimal initial parameters. It achieved better results on the test functions than the other two optimization models (NSGA-II and MODA) and showed the best optimization capability. Moreover, fuzzy time series forecasting with the rough set induction rule, which is based on the LEM2 algorithm to build rule sets, was successfully combined with MOGWO and the deep learning algorithm LSTM in this paper. It was shown that the FTS part, the MOGWO part, and the data decomposition part all improve the performance of the hybrid forecasting framework. Also, a similar method can be applied in other fields, for example, electrical power load forecasting, which was verified in this paper. Finally, the components CEEMD, FTS, and MOGWO each contributed their own strengths and effectively improved the forecasting ability of the CFML forecasting model in terms of stability and accuracy.
Author Contributions
Conceptualization, D.W. and J.W.; Methodology, J.W.; Software, D.W.; Validation, J.W., K.N. and G.T.; Formal Analysis, D.W. and J.W.; Investigation, K.N. and G.T.; Resources, D.W.; Data Curation K.N. and G.T.; Writing-Original Draft Preparation, D.W.; Writing-Review & Editing, D.W. and J.W.; Visualization, D.W. and J.W.; Supervision, J.W.; Project Administration, D.W.; Funding Acquisition, J.W.
Funding
This work was supported by the National Natural Science Foundation of China (grant number 71671029).
Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
Abbreviations
| List of Abbreviations | | | |
| FB | The fractional bias | U1 | The Theil U statistic 1 |
| CFML | CEEMD-FTS-MOGWO-LSTM | U2 | The Theil U statistic 2 |
| WNN | Wavelet Neural Network | DA | The direction accuracy |
| GRNN | Generalized Regression Neural Network | INDEX | The improvement ratio of the index among different models |
| SAE | Sparse Autoencoder | R2 | The Pearson’s correlation coefficient |
| LSTM | Long Short-Term Memory | DM | Diebold–Mariano test |
| DBN | Deep Belief Network | H0 | The null hypothesis |
| CNN | Convolutional Neural Network | H1 | The alternative hypothesis |
| IGD | The inverted generational distance | α | The significance level |
| FTS | Fuzzy time series | Xt | An input at time t |
| LEM2 | Learning from examples module version two | St | The hidden state |
| AR | Autoregressive model | MA | Moving-average model |
| ARIMA | Autoregressive Integrated Moving Average | ARMA | Autoregressive moving average model |
| MODA | Multi-objective dragonfly | St−1 | The previous time step |
| MOGWO | Multi-objective grey wolf | ft | The forget gate |
| NSGA-Ⅱ | Non-dominated sorted genetic algorithm-Ⅱ | it | The input gate |
| Kα | The distance between wolf α and the prey | R1 | The position of wolf α at time ite+1 |
| Kβ | The distance between wolf β and the prey | R2 | The position of wolf β at time ite+1 |
| Kδ | The distance between wolf δ and the prey | R3 | The position of wolf δ at time ite+1 |
| QLD | Queensland | Ct−1 | The old cell state |
| Pni | Positive noise | Nni | Negative noise |
| AE | The average error | Wni | Noise with identical amplitude and phase |
| MAE | The mean absolute error | Ot | The output gate |
| RMSE | The root-mean-square error | gj | The j-th inequality constraint |
| NMSE | The normalized average of the squares of error | hj | The j-th equality constraint |
| MAPE | The mean absolute percentage error | RST | Rough set theory |
| IMF | Intrinsic mode function | IA | The index of agreement |
| ZDT2 | Zitzler–Deb–Thiele’s function N. 2 | ZDT1 | Zitzler–Deb–Thiele’s function N. 1 |
| Kursawe | Kursawe function | ZDT3 | Zitzler–Deb–Thiele’s function N. 3 |
| EMD | Empirical Mode Decomposition | ||
| EEMD | Ensemble Empirical Mode Decomposition | ||
| CEEMD | Complementary Ensemble Empirical Mode Decomposition | ||
References
- Ou, T.C. A novel unsymmetrical faults analysis for microgrid distribution system. Int. J. Electr. Power Energy Syst. 2012, 43, 1017–1024. [Google Scholar] [CrossRef]
- Ou, T.C. Ground fault current analysis with a direct building algorithm for microgrid distribution. Int. J. Electr. Power Energy Syst. 2013, 53, 867–875. [Google Scholar] [CrossRef]
- Lin, W.M.; Ou, T.C. Unbalanced distribution network fault analysis with hybrid compensation. IET Gener. Transm. Distrib. 2010, 5, 92–100. [Google Scholar] [CrossRef]
- Ou, T.C.; Lu, K.H.; Huang, C.J. Improvement of transient stability in a hybrid power multi-system using a designed NIDC (novel intelligent damping controller). Energies 2017, 10, 488. [Google Scholar] [CrossRef]
- Ye, S.; Zhu, G.; Xiao, Z. Long term load forecasting and recommendations for china based on support vector regression. Energy Power Eng. 2012, 4, 380–385. [Google Scholar] [CrossRef]
- He, Q.; Wang, J.; Haiyan Lu, H. A hybrid system for short-term wind speed forecasting. Appl. Energy 2018, 226, 756–771. [Google Scholar] [CrossRef]
- Yang, W.; Wang, J.; Lu, H. Hybrid wind energy forecasting and analysis system based on divide and conquer scheme: A case study in China. J. Clean. Prod. 2019, 222, 942–959.
- Abdel-Aal, R.E.; Elhadidy, M.A.; Shaahid, S.M. Modeling and forecasting the mean hourly wind speed time series using GMDH-based abductive networks. Renew. Energy 2009, 34, 1686–1699.
- Wang, J.; Niu, T.; Lu, H.; Yang, W.; Du, P. A Novel Framework of Reservoir Computing for Deterministic and Probabilistic Wind Power Forecasting. IEEE Trans. Sustain. Energy 2019.
- Ma, L.; Luan, S.Y.; Jiang, C.W.; Liu, H.L.; Zhang, Y. A review on the forecasting of wind speed and generated power. Renew. Sustain. Energy Rev. 2009, 13, 915–920.
- Cardenas-Barrera, J.L.; Meng, J.; Castillo-Guerra, E.; Chang, L. A neural network approach to multi-step-ahead, short-term wind speed forecasting. IEEE 2013, 2, 243–248.
- Torres, J.L.; García, A.; Blas, M.D.; Francisco, A.D. Forecast of hourly average wind speed with ARMA models in Navarre (Spain). Sol. Energy 2005, 79, 65–77.
- Liu, H.; Tian, H.Q.; Li, Y.F. An EMD-recursive ARIMA method to predict wind speed for railway strong wind warning system. J. Wind Eng. Ind. Aerodyn. 2015, 141, 27–38.
- Kavasseri, R.G.; Seetharaman, K. Day-ahead wind speed forecasting using ARIMA models. Renew. Energy 2009, 34, 1388–1393.
- Yang, D.; Sharma, V.; Ye, Z.; Lim, L.I.; Zhao, L.; Aryaputera, A.W. Forecasting of global horizontal irradiance by exponential smoothing, using decompositions. Energy 2015, 81, 111–119.
- Li, Y.; Ling, L.; Chen, J. Combined grey prediction fuzzy control law with application to road tunnel ventilation system. J. Appl. Res. Technol. 2015, 13, 313–320.
- Barbounis, T.G.; Theocharis, J.B. A locally recurrent fuzzy neural network with application to the wind speed prediction using spatial correlation. Neurocomputing 2007, 70, 1525–1542.
- Guo, Z.H.; Wu, J.; Lu, H.Y.; Wang, J.Z. A case study on a hybrid wind speed forecasting method using BP neural network. Knowl. Based Syst. 2011, 24, 1048–1056.
- Li, G.; Shi, J. On comparing three artificial neural networks for wind speed forecasting. Appl. Energy 2010, 87, 2313–2320.
- Jiang, P.; Liu, F.; Song, Y.L. A hybrid forecasting model based on date-framework strategy and improved feature selection technology for short-term load forecasting. Energy 2017, 119, 694–709.
- Hao, Y.; Tian, C. The study and application of a novel hybrid system for air quality early-warning. Appl. Soft Comput. 2019, 74, 729–746.
- Zhang, X.; Wang, J.; Gao, Y. A hybrid short-term electricity price forecasting framework: Cuckoo search-based feature selection with singular spectrum analysis and SVM. Energy Econ. 2019, 81, 899–913.
- Lago, J.; Ridder, F.D.; Schutter, B.D. Forecasting spot electricity prices: Deep learning approaches and empirical comparison of traditional algorithms. Appl. Energy 2018, 221, 386–405.
- Hinton, G.E.; Osindero, S.; Teh, Y.W. A fast learning algorithm for deep belief nets. Neural Comput. 2006, 18, 1527–1554.
- Ni, K.L.; Wang, J.; Tang, G.J.; Wei, D.X. Research and Application of a Novel Hybrid Model Based on a Deep Neural Network for Electricity Load Forecasting: A Case Study in Australia. Energies 2019, 12, 2467.
- Fischer, T.; Krauss, C. Deep learning with long short-term memory networks for financial market predictions. Eur. J. Oper. Res. 2018, 270, 654–669.
- Khatami, A.; Khosravi, A.; Nguyen, T.; Lim, C.P.; Nahavandi, S. Medical image analysis using wavelet transform and deep belief networks. Expert Syst. Appl. 2017, 86, 190–198.
- Hinton, G.; Deng, L.; Yu, D.; Dahl, G.; Mohamed, A.R.; Jaitly, N.; Senior, A.; Vanhoucke, V.; Nguyen, P.; Sainath, T.; et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. Signal Process. Mag. 2012, 29, 82–97.
- Peris, A.; Domingo, M.; Casacuberta, F. Interactive neural machine translation. Comput. Speech Lang. 2017, 45, 201–220.
- Wang, H.Z.; Li, G.Q.; Wang, G.B.; Peng, J.C.; Jiang, H.; Liu, Y.T. Deep learning based ensemble approach for probabilistic wind power forecasting. Appl. Energy 2017, 188, 56–70.
- Fan, C.; Xiao, F.; Zhao, Y. A short-term building cooling load prediction method using deep learning algorithms. Appl. Energy 2017, 195, 222–233.
- Kong, X.; Xu, X.; Yan, Z.; Chen, S.; Yang, H.; Han, D. Deep learning hybrid method for islanding detection in distributed generation. Appl. Energy 2018, 210, 776–785.
- Coelho, I.; Coelho, V.; Luz, E.; Ochi, L.; Guimarães, F.; Rios, E. A GPU deep learning metaheuristic based model for time series forecasting. Appl. Energy 2017, 201, 412–418.
- Hong, Y.Y.; Chang, H.L.; Chiu, C.S. Hour-ahead wind power and speed forecasting using simultaneous perturbation stochastic approximation (SPSA) algorithm and neural network with fuzzy inputs. Energy 2010, 35, 3870–3876.
- Mohandes, M.A.; Halawani, T.O.; Rehman, S.; Hussain, A.A. Support vector machines for wind speed prediction. Renew. Energy 2004, 29, 939–947.
- Zhou, J.; Shi, J.; Li, G. Fine tuning support vector machines for short-term wind speed forecasting. Energy Convers. Manag. 2011, 52, 1990–1998.
- He, J.M.; Wang, J.; Xiao, L.Q. A hybrid approach based on the Gaussian process with t-observation model for short-term wind speed forecasts. Renew. Energy 2017, 114, 670–685.
- Hao, Y.; Tian, C. A novel two-stage forecasting model based on error factor and ensemble method for multi-step wind power forecasting. Appl. Energy 2019, 238, 368–383.
- Niu, T.; Wang, J.; Lu, H.; Du, P. Uncertainty modeling for chaotic time series based on optimal multi-input multi-output architecture: Application to offshore wind speed. Energy Convers. Manag. 2018, 156, 597–617.
- Wang, J.; Li, H.; Lu, H. Application of a novel early warning system based on fuzzy time series in urban air quality forecasting in China. Appl. Soft Comput. J. 2018, 71, 783–799.
- Bates, J.M.; Granger, C.W.J. The combination of forecasts. Oper. Res. Q. 1969, 20, 451–468.
- Xiao, L.; Qian, F.; Shao, W. Multi-step wind speed forecasting based on a hybrid forecasting architecture and an improved bat algorithm. Energy Convers. Manag. 2017, 143, 410–430.
- Xiao, L.; Wang, J.; Hou, R.; Wu, J. A combined model based on data pre-analysis and weight coefficients optimization for electrical load forecasting. Energy 2015, 82, 524–549.
- Wang, J.; Du, P.; Lu, H.; Yang, W.; Niu, T. An improved grey model optimized by multi-objective ant lion optimization algorithm for annual electricity consumption forecasting. Appl. Soft Comput. J. 2018, 72, 321–337.
- Li, H.; Wang, J.; Li, R.; Lu, H. Novel analysis-forecast system based on multi-objective optimization for air quality index. J. Clean. Prod. 2019, 208, 1365–1383.
- Liu, H.; Tian, H.Q.; Pan, D.F.; Li, Y.F. Forecasting models for wind speed using wavelet, wavelet packet, time series and artificial neural networks. Appl. Energy 2013, 107, 191–208.
- Liu, H.; Tian, H.Q.; Li, Y.F. Comparison of new hybrid FEEMD-MLP, FEEMD-ANFIS, Wavelet Packet-MLP and Wavelet Packet-ANFIS for wind speed predictions. Energy Convers. Manag. 2014, 89, 11.
- Afshar, K.; Bigdeli, N. Data analysis and short term load forecasting in Iran electricity market using singular spectral analysis (SSA). Energy 2011, 36, 2620–2627.
- Yeh, J.R.; Shieh, J.S.; Huang, N.E. Complementary ensemble empirical mode decomposition: A novel noise enhanced data analysis method. Adv. Adapt. Data Anal. 2010, 2, 135–156.
- Anbazhagan, S.; Kumarappan, N. Day-ahead deregulated electricity market price forecasting using recurrent neural network. IEEE Syst. J. 2013, 7, 866–872.
- Pawlak, Z.; Skowron, A. Rudiments of rough sets. Inf. Sci. 2007, 177, 3–27.
- Stefanowski, J. On rough set based approaches to induction of decision rules. Rough Sets Knowl. Discov. 1998, 1, 500–529.
- Grzymala-Busse, J.W. A new version of the rule induction system LERS. Fundam. Inform. 1997, 31, 27–39.
- Liu, L.; Wiliem, A.; Chen, S.; Lovell, B.C. Automatic Image Attribute Selection for Zero-Shot Learning of Object Categories. In Proceedings of the Twenty Second International Conference on Pattern Recognition, Stockholm, Sweden, 24–28 August 2014; pp. 2619–2624.
- Mirjalili, S.; Saremi, S.; Mirjalili, S.M.; Coelho, L.S. Multi-objective grey wolf optimizer: A novel algorithm for multi-criterion optimization. Expert Syst. Appl. 2016, 47, 106–119.
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
- Yang, W.; Wang, J.; Niu, T. A hybrid forecasting system based on a dual decomposition strategy and multi-objective optimization for electricity price forecasting. Appl. Energy 2019, 235, 1205–1225.
- Zhou, Q.G.; Wang, C.; Zhang, G.F. Hybrid forecasting system based on an optimal model selection strategy for different wind speed forecasting problems. Appl. Energy 2019, 250, 1559–1580.
- Jiang, P.; Liu, Z. Variable weights combined model based on multi-objective optimization for short-term wind speed forecasting. Appl. Soft Comput. 2019, 82, 105587.
- Zhang, X.; Wang, J.; Zhang, K. Short-term electric load forecasting based on singular spectrum analysis and support vector machine optimized by Cuckoo search algorithm. Electr. Power Syst. Res. 2017, 146, 270–285.
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).