Article

Research and Application of a Novel Hybrid Model Based on a Deep Neural Network Combined with Fuzzy Time Series for Energy Forecasting

School of Statistics, Dongbei University of Finance and Economics, Dalian 116025, China
* Author to whom correspondence should be addressed.
Energies 2019, 12(18), 3588; https://doi.org/10.3390/en12183588
Submission received: 5 August 2019 / Revised: 6 September 2019 / Accepted: 6 September 2019 / Published: 19 September 2019
(This article belongs to the Special Issue Intelligent Optimization Modelling in Energy Forecasting)

Abstract:
In recent years, although deep learning algorithms have been widely applied to various fields, ranging from translation to time series forecasting, researchers have paid limited attention to model parameter optimization and to combinations with fuzzy time series. In this paper, a novel hybrid forecasting system, named CFML (complementary ensemble empirical mode decomposition (CEEMD)-fuzzy time series (FTS)-multi-objective grey wolf optimizer (MOGWO)-long short-term memory (LSTM)), is proposed and tested. This model is based on the LSTM model with parameters optimized by MOGWO; before this, a fuzzy time series method involving the LEM2 (learning from examples module version two) algorithm is adopted to generate the final input data of the optimized LSTM model. In addition, the CEEMD algorithm is used to de-noise and decompose the raw data. The CFML model successfully overcomes the nonstationary and irregular features of wind speed data and electrical power load series. Several experimental results covering four wind speed datasets and two electrical power load datasets indicate that our hybrid forecasting system achieves average improvements of 49% and 70% in wind speed and electrical power load, respectively, under the metric MAPE (mean absolute percentage error).

1. Introduction

Effective forecasting plays an essential role in many areas, such as energy application, economic risk management, standardized management, and policy making. Forecasting helps corporations, governments, and other organizations and institutions evaluate the market, better understand potential relations among entities, and plan for the future, informing policy at both the private and the social levels. As a result, many forecasting methods have been proposed during the past decades. These fall into two different categories: time series forecasting and causal forecasting. Since causal forecasting has some inherent limitations, including the reliability and availability of independent variables, time series forecasting has been applied much more widely due to its convenience for data collection and its high accuracy and stability. Time series forecasting methods presume that history will repeat itself, which means that the forecasting of future values is based on present values and past observations. Nowadays, time series forecasting has achieved great success in many industries, especially in the energy industry.
With the rapid development of the energy industry and increasing demand for high-level management and application, its infrastructure has been upgraded to a great extent, as a result of which prices, supply, and demand have oscillated more widely and have become more unpredictable than ever before. This poses a great challenge to forecasting methods in terms of accuracy and stability, since forecasting plays an essential role in quantifying these unfavorable features, enabling more accurate forecasts that can be applied to risk management, energy planning, industry configuration, and so on. In previous years, electrical power load forecasting has been an important part of power system planning and the basis of the economic operation of power systems. Unfortunately, several difficulties must be confronted, such as meteorological factors, development speed, and unpredictable natural devastation, if we want to take good advantage of power load data. Researchers have focused on exploring nonsymmetrical faults [1], ground faults [2], microgrid distributions [3], etc. For instance, Qu et al. [4] explored and developed an intelligent damping controller which can reduce power fluctuations in hybrid power systems. Ye et al. [5] studied long-term load forecasting based on support vector regression (SVR) and explored nonlinear relationships between economic growth in terms of GDP and power load requirements. On the other hand, with the inadequate implementation of corresponding emission and environmental protection policies [6], wind power has attracted many scientists and researchers [7]. Currently, wind power accounts for roughly 10% of total energy consumption in Europe, and for more than 15% in Spain and Germany [8]. To utilize the wind more effectively and efficiently, we need accurate forecasts of the wind speed.
Nevertheless, wind speed is inherently volatile and irregular and is considered a fairly tricky weather element to predict accurately as a result of its randomness and nonlinearity [9]. Numerous researchers and scientists have made great contributions to the development of effective and robust wind speed forecasting models, which can also be used to forecast electrical power load data. According to the time horizon, there are four different types of forecasting: long-term, medium-term, short-term, and very-short-term forecasting. Methods can also be divided into the following four categories: artificial intelligence methods, statistical methods, spatial correlation methods, and physical methods [10].
Physical models containing parameters ranging from temperature to topography to pressure are usually used on a massive scale for long-term wind speed prediction with multiple weather parameters [11]. On the contrary, statistical models, such as the autoregressive (AR) model, the autoregressive moving average (ARMA) [12], the autoregressive integrated moving average (ARIMA) [13], fractional ARIMA (FARIMA) [14], exponential smoothing (ES) [15], and grey prediction (GP) [16], are developed on the basis of the relationships among variables through mathematical statistics to illustrate the potential correlations within the historical data sampled from the observed wind speed data. Spatial correlation methods mainly take into account other factors, such as the direction of the wind, the terrain roughness, and the height above the horizon. Sometimes, this kind of method achieves high accuracy [17].
With the rapid development of and increasing research on computer science, performing complex calculations in less time has become possible. Consequently, in the past few years, a large number of statistical learning models have been reported, which eventually formed a mature theoretical system. The renowned artificial neural network (ANN) is widely utilized in wind speed forecasting, as it can perform parallel processing and deliver nonlinear mappings. This category mainly includes back propagation (BP) [18], the radial basis function (RBF) [19], the Elman neural network (ENN) [20], the wavelet neural network (WNN) [21,22], and others. In addition, during the past twenty years, the neural network field has experienced innovations which have resulted in the well-known deep learning (DL) models [23]. In particular, the large computational cost was long the largest drawback of conventional neural network algorithms; however, greedy layer-wise pretraining makes it possible to train the so-called deep belief network (DBN) more efficiently [24,25]. Following these advances, scientists are now able to create and train neural networks with more than one hidden layer, which, in turn, has increased generalization capabilities and allowed better outcomes. The field has also been renamed "deep learning" to reflect this added depth [26]. The success of DL models can be seen in computer science applications, such as image recognition [27], speech recognition [28], and machine translation [29]. Moreover, the benefits have also spread to energy-related fields, such as wind power forecasting, and especially wind speed forecasting. In the same field, Wang et al. [30] proposed convolutional neural networks (CNNs) to acquire precise probabilistic predictions of wind power.
However, there is still relatively little research on DL-related models applied to wind speed forecasting compared with the most active areas of this technology. In Reference [31], a deep autoencoder (DAC) combined with extreme gradient boosting (XGB) was proposed to forecast the building cooling load; a deep neural network (DNN) was also applied to obtain forecasting results, and this method was more accurate than the other methods presented in the same paper. In Reference [32], a DL model was also shown to discern islanding highly accurately. Therefore, regarding this point, we considered applying these kinds of technologies to wind speed forecasting in an effort to achieve higher accuracy. Furthermore, the authors of Reference [33] proposed a DL strategy for time series forecasting and demonstrated how it can be successfully used in electricity consumption forecasting, which correlates with wind speed data to some extent. Besides ANNs, fuzzy logic methods [34] as well as support vector machine (SVM) [35]-related methods, such as least-squares support vector machines (LSSVMs) [36], Gaussian processes [37], and others, are also commonly applied in wind speed forecasting.
However, each method has different drawbacks and disadvantages as a result of its inherent nature. The drawbacks of the aforementioned models are summarized as follows:
(1)
Because physical algorithms are very sensitive to market information, they need long run times and large amounts of computing resources. In addition, these models perform poorly on short-term forecasting problems, lacking accuracy and validity in that setting.
(2)
Traditional statistical methods, largely confined by the premise of a linear pattern along a time series, fail to manage forecasting with fluctuations and high levels of noise, nonlinear and irregular trends, or other inherent characteristics of wind speed data. Moreover, these methods often require, and deeply depend on, a large amount of historical data in realistic cases. This means that once there is an abrupt and unexpected change in the original data as a result of social or environmental factors, prediction errors proliferate all at once [38].
(3)
Spatial correlation methods are based on vast quantities of information (for example, wind speed data from many spatially correlated sites) that are difficult to collect and analyze, which makes accurate wind speed forecasting hard to achieve [39].
(4)
Artificial intelligence methods, unlike the other approaches, are able to deal with the nonlinear features hidden among historical wind speed data. Although many studies have been carried out and these methods have been successfully applied to address complex data patterns, artificial intelligence methods also have defects and drawbacks, such as a relatively low convergence rate, over-fitting, and easily falling into a local optimum.
(5)
Individual forecasting models are good at forecasting to some extent, but they rarely address the importance and necessity of data preprocessing; therefore, these approaches cannot always achieve good forecasting outcomes.
Hence, with the objective of combining all the advantages and of avoiding the weaknesses, a number of combined forecasting methods have been proposed [40]. Bates and Granger proposed the combination prediction theory and showed promising outcomes in 1969 [41]. Since then, research on combinatorial forecasting theory has attracted extensive attention [42]. Xiao et al. developed two combined models for wind speed sequence prediction: the AI combination model [43,44,45] and NNCT (no negative constraint theory). The results indicate that more reliable and accurate forecasts are attained when the combined models are applied.
In addition, with the purpose of achieving highly accurate forecasting, several time series preprocessing techniques, such as wavelet packet decomposition (WPD) [46], fast ensemble empirical mode decomposition (FEEMD) [47], and singular spectrum analysis (SSA) [48], have been effectively applied in the data preprocessing stages of time series forecasting in an effort to decrease the random disturbance traits of the original wind speed data. Such techniques have been widely used in hybrid models to obtain higher forecasting accuracy. Thus, the complementary ensemble empirical mode decomposition (CEEMD), a modification of the ensemble empirical mode decomposition (EEMD), is applied in this paper.
Thus, in this study, the CEEMD-FTS (fuzzy time series)-MOGWO (multi-objective grey wolf optimizer)-LSTM (long short-term memory) combined model is developed, with CEEMD as the preprocessing part. It is based on LSTM, a member of the RNN (recurrent neural network) family within the DL field, in a modified version with fewer disadvantages and a more powerful memorizing capability, together with the meritorious multi-objective optimization algorithm MOGWO. Subsequently, to deal with uncertain forecasting problems and to uncover more useful and constructive information hidden within the historical data, we also combine the aforementioned model with a fuzzy time series analysis method based on rough set rule induction using the LEM2 (learning from examples module version two) algorithm.
Generally, the innovations of this study can be summarized as follows:
(1).
This study proposes a hybrid forecasting model which can take advantage of deep learning networks as well as the fuzzy time series analysis technique based on the LEM2 rule-generating algorithm, which noticeably increases forecasting accuracy. To our knowledge, deep learning neural networks have not previously been combined with rough set induction theory. Hence, our study develops a hybrid model combining LSTM with a fuzzy time series analysis technique that uses rough sets to generate rules as a replacement for traditional rule-generating methods.
(2).
This study improves forecasting stability and accuracy simultaneously for the deep learning neural network through the weight-determining method MOGWO, based on the leave-one-out strategy and swarm intelligence, which helps to find the best weighting parameters for the LSTM neural network. Most previous studies paid attention to only one aim (stability or accuracy). Therefore, to achieve both high accuracy and stability, the multi-objective optimization algorithm MOGWO is successfully applied in this study.
(3).
This study provides a scientific and reasonable evaluation of the new hybrid forecasting model to verify the forecasting performance of the combined model proposed in this paper. Three experiments are carried out, including comparisons between different deep learning neural networks, efficiency and effectiveness tests among various models at four different wind sites, and a contrast experiment in which the proposed hybrid forecasting system is applied to electrical load forecasting with two different electrical power load data series on Wednesday and Sunday. The outcomes illustrate that the proposed system performs well.
(4).
This study delivers an insightful discussion about the developed forecasting system, illustrating the improvements brought about by different parts of the proposed forecasting model as well as the multistep forecasting ability. Five discussion topics are presented in this paper, namely statistical significance, association strength, improvement percentage, multistep ahead forecasting, and sensitivity analysis. Through these discussions, the effectiveness of the hybrid forecasting framework is verified.
The remainder of this paper is organized as follows:
Section 2 profiles the principles of the methods making up the proposed hybrid model, namely the CFML model (CEEMD-FTS-MOGWO-LSTM). The relevant methodology is also presented in this section, including the data preprocessing method, the fuzzy time series technique with LEM2, MOGWO, and the long short-term memory algorithm. Several evaluations and experiments that demonstrate the performance of the CFML model are presented in Section 3. Section 4 discusses the different comparison outcomes. Finally, Section 5 concludes this study.

2. Methodology

An innovative hybrid forecasting model is successfully developed and the corresponding components are introduced briefly in this section, including the data preprocessing technique named complementary ensemble empirical mode decomposition (CEEMD), the fuzzy analyzing part based on rough sets induction theory, the forecasting algorithm named LSTM, and the multi-objective optimization algorithm MOGWO.

2.1. Hybrid Forecasting Framework

Figure 1 and Figure 2 show the combined CFML forecasting model, from which the CFML system can be expounded as follows:
  • The original wind speed data is decomposed by applying the CEEMD method into several subseries named Intrinsic Mode Functions (IMFs).
  • The fuzzy analysis method is applied using the rough set induction LEM2 algorithm to generate the forecasting rules, and raw data are applied to these rules to generate preliminary forecasts. These forecasts obtained by fuzzy time series forecasting are not precise enough, but the difference between these forecasts and the actual values can demonstrate potential forecasting biases that are useful for modifying the learning process of the following neural network, namely the LSTM model optimized with MOGWO. As for the raw input data, we accept five dimensions for each forecast, including lag1, lag2, lag3, slope, and the present data, in order to forecast the following one for each subseries (Figure 2).
  • The output data generated from the previous steps is used as the input data for the LSTM forecasting module, which is optimized by the multi-objective optimization algorithm MOGWO for each subseries. Specifically, the real values of $X_t$, lag1, and lag2 and their differences, including D1, D2, and D3, are adopted as input data of the LSTM model modified by MOGWO (Table 1).
  • The forecasting outcomes of each subseries generated from the preprocessing part named CEEMD are aggregated to obtain the eventual forecasting results of CFML.
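The five-dimensional input described above (lag1, lag2, lag3, slope, and the present value) can be sketched as a sliding-window feature builder. The function name and the slope definition (taken here as the first difference between the present value and lag1) are illustrative assumptions, since the paper does not give explicit formulas:

```python
import numpy as np

def build_features(series):
    """Build [lag3, lag2, lag1, slope, present] -> next-value samples.

    `slope` is assumed to be the first difference (present - lag1),
    an illustrative choice.
    """
    X, y = [], []
    for t in range(3, len(series) - 1):
        lag1, lag2, lag3 = series[t - 1], series[t - 2], series[t - 3]
        present = series[t]
        slope = present - lag1
        X.append([lag3, lag2, lag1, slope, present])
        y.append(series[t + 1])  # the value to forecast
    return np.array(X), np.array(y)
```

Each row of `X` then serves as one five-dimensional input vector for forecasting the following value of a subseries.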

2.2. Data Preprocessing Module

The CEEMD algorithm, proposed by Yeh et al. [49], is the modified version of the EEMD and EMD. According to Anbazhagan et al. [50], the primary steps of this algorithm are as follows:
Step 1:
Add white noise pairwise with identical amplitude and opposite phase to the raw data sequence $v(t)$, after which we obtain a pair of polluted signals:
$$P_n^i(t) = v(t) + W_n^i(t), \qquad N_n^i(t) = v(t) - W_n^i(t)$$
where $P_n^i$ denotes the signal polluted with positive noise in the i-th trial, $N_n^i$ is the signal with negative noise in the i-th trial, and $W_n^i$ represents the added white noise of identical amplitude and opposite phase.
Step 2:
Decompose the polluted signal pairs ( P n i , N n i ) into a finite set including IMF components:
$$P_n^i(t) = \sum_{j=1}^{M} u_{ij}^{+}(t), \qquad N_n^i(t) = \sum_{j=1}^{M} u_{ij}^{-}(t)$$
where $u_{ij}^{+}$ and $u_{ij}^{-}$ are the j-th intrinsic mode functions of the i-th trial with positive and negative noise, respectively. Furthermore, M signifies the number of IMFs.
Step 3:
Two sets of IMF components, i.e., the negative-noise set $\{u_{ij}^{-}(t)\}_{i=1,j=1}^{T,M}$ and the positive-noise set $\{u_{ij}^{+}(t)\}_{i=1,j=1}^{T,M}$, are obtained by performing the above two steps T times with different realizations of white noise.
Step 4:
The j-th IMF component $u_j(t)$ is calculated as the ensemble mean over all trials:
$$u_j(t) = \frac{1}{2T} \sum_{i=1}^{T} \left[ u_{ij}^{+}(t) + u_{ij}^{-}(t) \right]$$
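The four steps above can be sketched as follows. The `emd` argument is assumed to be any routine mapping a 1-D signal to an (M, N) array of IMFs (in practice one might use a library such as PyEMD); it is a stated assumption, not part of the paper's method:

```python
import numpy as np

def ceemd(v, emd, T=50, noise_amp=0.2):
    """Sketch of CEEMD: add paired +/- white noise, decompose each
    polluted signal with the supplied EMD routine, and average the IMFs.

    `emd` must map a 1-D signal to an (M, N) array of IMFs.
    """
    pos_imfs, neg_imfs = [], []
    for _ in range(T):
        w = noise_amp * np.random.randn(len(v))
        pos_imfs.append(emd(v + w))   # P_n^i(t) = v(t) + W_n^i(t)
        neg_imfs.append(emd(v - w))   # N_n^i(t) = v(t) - W_n^i(t)
    # Step 4: u_j(t) = (1 / 2T) * sum_i [u_ij^+(t) + u_ij^-(t)]
    return 0.5 * (np.mean(pos_imfs, axis=0) + np.mean(neg_imfs, axis=0))
```

Because the noise is added in positive/negative pairs, its contribution cancels in the ensemble mean, which is the motivation for CEEMD over plain EEMD.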

2.3. Rough Set Theory (RST) and LEM2

In this part, the fuzzy forecasting module of the proposed hybrid model CFML, which rests on rough set theory and the rule induction algorithm LEM2, is introduced in brief.
Pawlak and Skowron proposed RST [51], which has been acknowledged as one of the most effective mathematical techniques for dealing with uncertainty and vagueness. The premise of rough set philosophy is that, owing to the limited information available about the objects in the universe of discourse, objects characterized by the same information are indiscernible. Each set of indiscernible objects is regarded as an elementary set, and these form the basic granules of knowledge about the universe. Any union of elementary sets is called an exact set; otherwise, the set is called a rough set. RST uses indiscernibility relations to approximate sets of objects by upper and lower approximations [52]. Rough set theory is widely used to acquire more accurate rules for predicting objects, and the LEM2 algorithm is usually adopted as a way of applying rough set theory to rule induction.
LEM2 [53], a rough set rule induction algorithm, is most frequently adopted as it has better results in most cases. In this study, the formed rules are generated in an “if-then” manner through composing several fuzzy decision values as well as fuzzy conditional values. Moreover, “supports” indicate how many records are archived in the dataset that matches the generated decision rules. LEM2 computes a local covering and then converts it into a rule set. LEM2 learns a discriminant rule set; it learns the smallest set of minimal rules describing a concept. This algorithm can generate both certain and possible rules from a decision table. The rough set induction LEM2 algorithm has several advantages because of the application of rough set theory, as follows:
1.
Rough sets can discern hidden facts and make it possible for us to understand these facts in natural language, which contributes a great deal to decision making;
2.
Rough sets take the background information of decision makers into account;
3.
Rough sets can deal with both qualitative and quantitative attributes;
4.
Rough sets enable machines to extract certain rules in a relatively short time, which means it reduces the time cost of discovering hidden rules.
The detailed process of how LEM2 works is briefly demonstrated as follows. For an attribute–value pair o = (e; u), the block of o, denoted [o], is the set of instances of H for which attribute e has value u. For a concept represented by the decision–value pair (n; p), let K be a nonempty upper or lower approximation of it. A set T of attribute–value pairs o = (e; u) is a complex of K if and only if ∅ ≠ [T] = ∩_{o∈T}[o] ⊆ K; T is a minimal complex of K if K depends on T and K depends on no proper subset of T. C is a nonempty collection of nonempty attribute–value pair sets, and L is the local covering of K. A more detailed explanation can be found in the work of Grzymala-Busse [53].
Figure 3 demonstrates the pseudocode of LEM2 based on the study of Liu et al. [54].
Step 1.
Compute all attribute–value pair blocks.
Step 2.
Identify the attribute–value pair (e; u) whose block has the largest intersection with the goal set G, i.e., the largest |[(e; u)] ∩ G|.
Step 3.
If two pairs tie on |[(e; u)] ∩ G|, select the pair with the smallest block cardinality |[(e; u)]|.
Step 4.
If necessary, we have to go through an additional internal loop in order to find the candidates for the minimal complex.
Step 5.
The same steps are then used to find the second minimal complex, and so on.
Step 6.
Finally, we can get the local covering of a hidden fact, which may reveal the decision-making process.
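Step 1 of LEM2 (computing all attribute–value pair blocks) can be sketched as below. The decision-table format, a list of dicts mapping attribute names to values, is an illustrative assumption; the paper does not fix a representation:

```python
def compute_blocks(table):
    """Compute attribute-value pair blocks [(e, u)]: for each attribute e
    and value u, the set of case indices whose attribute e equals u.

    `table` is a list of dicts mapping attribute name -> value
    (an illustrative format).
    """
    blocks = {}
    for idx, case in enumerate(table):
        for attr, val in case.items():
            blocks.setdefault((attr, val), set()).add(idx)
    return blocks
```

With the blocks in hand, Steps 2 and 3 reduce to comparing |[(e; u)] ∩ G| and block sizes across the candidate pairs.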

2.4. Multi-Objective Grey Wolf Optimizer (MOGWO)

To obtain more accurate forecasts, we adopt the GWO (grey wolf optimizer) algorithm, modified to deal with multi-objective problems, to optimize the main forecasting model LSTM. By using multi-objective optimization theory, we can achieve forecasts that are both accurate and stable.
Mirjalili et al. proposed the grey wolf optimization algorithm [55], which was based on grey wolves’ social leadership and hunting skills. In addition, the hunting process is led by three wolves (α, β, and δ). The rest of the wolves follow these three leaders throughout the whole search process to approach the global best solution.
The following formulas were proposed in an effort to emulate the encircling behaviors of grey wolves:
$$K = \left| B \cdot R_p(ite) - R(ite) \right|$$
$$R(ite + 1) = R_p(ite) - M \cdot K$$
where K denotes the distance between the prey and the predator, ite refers to the current iteration, R denotes the position vector of wolves, R p is the prey’s position vector, and M and B are coefficient vectors:
$$M = 2c \cdot e_1 - c$$
$$B = 2 \cdot e_2$$
where e 1 and e 2 are random vectors in [ 0 , 1 ] and the elements of c decrease linearly from 2 to 0 across all iterations.
The GWO algorithm archives the first three best results gained so far in each iteration and then imposes other agents, namely the rest of the wolves, to update the positions with respect to them. The following formulas are calculated constantly for each search agent [55] in order to mimic the hunting process, and the promising regions of the search space are also found in this process:
$$K_\alpha = |B_1 \cdot R_\alpha - R|, \qquad K_\beta = |B_2 \cdot R_\beta - R|, \qquad K_\delta = |B_3 \cdot R_\delta - R|$$
$$R_1 = R_\alpha - M_1 \cdot K_\alpha, \qquad R_2 = R_\beta - M_2 \cdot K_\beta, \qquad R_3 = R_\delta - M_3 \cdot K_\delta$$
$$R(t+1) = \frac{R_1 + R_2 + R_3}{3}$$
The B vector takes random values in [0, 2], which helps the GWO algorithm maintain exploratory behavior throughout the whole optimization process and avoid local optima. All these steps are illustrated in Figure 4. Ri is the position of wolf i, which also represents the initial weights and thresholds of the LSTM model. That is to say, Ri is a vector whose dimension is determined by the number of initial weights and thresholds of the LSTM model, and each element of this vector is the value of one threshold or weight of the LSTM.
Attacking is the final stage of hunting, in which the wolf pack catches the prey and the prey stops moving. The process is governed by the coefficient vector M: grey wolves continue to attack the prey when |M| < 1, whereas they are obliged to leave the prey and search for a fitter one when |M| > 1.
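The encircling and hunting equations above can be sketched as one position update. For reproducibility the random vectors e1 and e2 are passed in explicitly, and, as a simplification, the same pair is reused for all three leaders (the full algorithm draws fresh random vectors for each):

```python
import numpy as np

def gwo_step(R, leaders, c, e1, e2):
    """One GWO position update toward the alpha, beta, and delta wolves.

    R: current position of a wolf; leaders: (R_alpha, R_beta, R_delta);
    c: scalar decreasing linearly from 2 to 0 over the iterations;
    e1, e2: random vectors in [0, 1].
    """
    candidates = []
    for R_lead in leaders:
        M = 2 * c * e1 - c           # M = 2c * e1 - c
        B = 2 * e2                   # B = 2 * e2
        K = np.abs(B * R_lead - R)   # distance to this leader
        candidates.append(R_lead - M * K)
    return np.mean(candidates, axis=0)  # R(t+1) = (R1 + R2 + R3) / 3
```

In the CFML context, each position vector encodes one full set of LSTM initial weights and thresholds, so this update moves a candidate weight configuration toward the three best configurations found so far.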

2.5. Long Short-Term Memory (LSTM)

The LSTM model was developed by Hochreiter and Schmidhuber [56]. The vanishing gradient problem is addressed by enforcing constant error flow through the "constant error carousel" within special multiplicative units. To cope with this constant error flow, the nonlinear units learn to open or close gates in the network.
The cell state is the key part of the LSTM structure. It runs directly along the entire chain, deleting or adding information to the cell state, carefully adjusted by structures called gates. These gates serve as optional entry points for this information. They consist of a pointwise multiplication operation and a sigmoid neural net layer (Figure 5).
Given an input $X_i$ at time i, the following formulas are used to compute the hidden state $S_i$:
  • In the LSTM module, the first step is to determine which information will be discarded from the cell state. The forget gate ($f_i$) is in charge of this decision, as follows:
    $$f_i = \sigma(X_i T^f + S_{i-1} V^f + b^f)$$
    where σ is the sigmoid function, which maps its input to a value between 0 and 1. T and V signify weight parameters, and b denotes bias parameters (i.e., $T^f$, $T^j$, $T^c$, and $T^o$ and $b^f$, $b^j$, $b^c$, and $b^o$). The superscripts of T, V, and b are not exponents; they merely indicate which gate the parameters belong to. For instance, $T^f$ represents the weight parameters of the forget gate f.
  • The next step is to determine which new information will be selected and stored in the cell state. This step has two sub-steps: the first is the input gate ($Input_i$) layer, which determines which values are going to be updated; the second is a tanh layer, which produces a vector of new candidate values $\tilde{C}_i$. The calculations are as follows:
    $$Input_i = \sigma(X_i T^j + S_{i-1} V^j + b^j)$$
    $$\tilde{C}_i = \tanh(X_i T^c + S_{i-1} V^c + b^c)$$
    where $\tilde{C}_i$ is a candidate memory cell, computed like a memory cell but with a tanh activation.
  • The next step is to update the old cell state $C_{i-1}$ into the new cell state $C_i$, which can be described as follows:
    $$C_i = f_i \odot C_{i-1} + Input_i \odot \tilde{C}_i$$
    Here, the symbol ⊙ represents pointwise multiplication.
  • The final step is to determine what will be produced as the output. This output is a filtered version of the cell state: the output gate ($o_i$) determines which part of the cell state makes up the final output. The cell state is then passed through a tanh layer and multiplied by the output gate, as follows:
    $$o_i = \sigma(X_i T^o + S_{i-1} V^o + b^o)$$
    $$S_i = o_i \odot \tanh(C_i)$$
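The four gate equations above can be sketched as a single cell step. Scalar inputs and dict-keyed parameters are an illustrative simplification (a real LSTM uses weight matrices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, s_prev, c_prev, W, V, b):
    """One LSTM step following the gate equations above.

    W, V, b are dicts keyed by gate name ('f', 'j', 'c', 'o'),
    matching the superscript notation in the text.
    """
    f = sigmoid(x * W['f'] + s_prev * V['f'] + b['f'])        # forget gate
    inp = sigmoid(x * W['j'] + s_prev * V['j'] + b['j'])      # input gate
    c_tilde = np.tanh(x * W['c'] + s_prev * V['c'] + b['c'])  # candidate cell
    c = f * c_prev + inp * c_tilde                            # new cell state
    o = sigmoid(x * W['o'] + s_prev * V['o'] + b['o'])        # output gate
    s = o * np.tanh(c)                                        # new hidden state
    return s, c
```

Iterating `lstm_step` over a sequence produces the hidden states $S_i$ that the forecasting module reads out.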
Algorithm: MOGWO-LSTM
Objective function
$$\min \; \{\, fitness_1 = Bias(\hat{x}), \;\; fitness_2 = Std(x - \hat{x}) \,\}$$
Input:
Training data: $x_t^0 = (x^0(1), x^0(2), \dots, x^0(p))$
Testing data: $x_f^0 = (x^0(p+1), x^0(p+2), \dots, x^0(p+l))$
Output:
$\hat{y}_f^0 = (\hat{y}^0(p+1), \hat{y}^0(p+2), \dots, \hat{y}^0(p+l))$, a series of forecasting data
Parameters of MOGWO:
Iter—the maximum number of iterations
n—the number of grey wolves
t—the current iteration number
Ri—the position of wolf i
e1—the random vector in [0, 1]
c—the constant vector in [0, 2]
Parameters of LSTM:
Iteration—the maximum number of iterations
Input_num—the knots of the input
Cell_num—the knots of the cell
Output_num—the knots of the output
Cost_gate—the termination error cost
yita—the rate of adjustment for the weight at each time
data_num—the number of columns of training data
Bias_input—the bias vector of the input gate in [0, 1]
Bias_forget—the bias vector of the forget gate in [0, 1]
Bias_output—the bias vector of the output gate in [0, 1]
1:/*Set the parameters of MOGWO and LSTM*/
2:/*Initialize the grey wolf population Ri (i = 1, 2, ..., n) randomly*/
3:/*Initialize c, M, and B*/
4:/*Define the archive size*/
5: FOR EACH i: 1 ≤ in DO
6: Evaluate the corresponding fitness function Fi for each search agent
7: END FOR
8: /*Find the non-dominated solutions and initialize the archive with them*/
9: Rα, Rβ, Rδ= SelectLeader(archive)
10: WHILE (t < Iter) DO
11: FOR EACH i: 1 ≤ in DO
12: /*Update the position of the current search agent*/
13: Kj = |Bi · Rj − R|, i = 1, 2, 3; j = α, β, δ
14: Ri = Rj − Mi · Kj, i = 1, 2, 3; j = α, β, δ
15: R(t + 1) = (R1 + R2 + R3)/3
16: END FOR
17: /*Update c, M, and B*/
18: M = 2c · e1 − c; B = 2 · e2
19: /*Evaluate the corresponding fitness function Fi for each search agent*/
20: /*Find the non-dominated solutions*/
21: /*Update the archive with regard to the obtained non-dominated solutions*/
22: IF the archive is full DO
23: /*Delete one solution from the current archive members*/
24: /*Add the new solution to the archive*/
25: END IF
26: IF any newly added solutions to the archive are outside the hypercubes DO
27: /*Update the grids to cover the new solution(s)*/
28: END IF
29: Rα, Rβ, Rδ = SelectLeader(archive)
30: t = t + 1
31: END WHILE
32: RETURN archive
33: OBTAIN R* = SelectLeader(archive)
34: Set R* as the initial weight and threshold of LSTM
35: /*Standardize the training data and testing data*/
36: /*Initialize the structure of the LSTM network*/
37:/*Initialize cost_gate, bias_input, bias_forget, bias_output and the weight of the LSTM network*/
38: FOR EACH i: 1 ≤ iIteration DO
39: yita=0.01
40: FOR EACH m: 1 ≤ mdata_num DO
41: Equation (15) to Equation (20)
42: /*Calculate the error cost of this round*/
43: error cost = Σ_{t=1}^{l} ( forecasted x̂_t − actual x_t )², where l is the dimension of the testing data
44: IF error cost < cost_gate DO
45: Break
46: END IF
47: /*Update the weight of all gates*/
48: END FOR
49: IF error cost < cost_gate DO
50: Break
51: END IF
52: END FOR
53: /*Learning process has been completed*/
54: Input the standardized historical data into LSTM to forecast the future changes
55: De-normalize the obtained forecasting outcomes and generate the final forecasting results
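The position-update core of the pseudocode (lines 13–18) can be sketched as follows. The function name and the scalar `c` argument are illustrative, and a full MOGWO implementation would add the archive and grid maintenance steps:

```python
import numpy as np

def gwo_position_update(R, leaders, c, rng):
    """One wolf's position update (lines 13-18 of the pseudocode):
    encircle each leader (alpha, beta, delta) and average the results.
    `c` decays from 2 to 0 over the iterations in the full algorithm."""
    candidates = []
    for R_lead in leaders:                # leaders = (R_alpha, R_beta, R_delta)
        e1 = rng.random(R.shape)          # random vectors in [0, 1]
        e2 = rng.random(R.shape)
        M = 2.0 * c * e1 - c              # M = 2c*e1 - c
        B = 2.0 * e2                      # B = 2*e2
        K = np.abs(B * R_lead - R)        # distance to this leader
        candidates.append(R_lead - M * K)
    return np.mean(candidates, axis=0)    # R(t+1) = (R1 + R2 + R3) / 3
```

As `c` shrinks, `M` shrinks with it, so wolves converge toward the three leaders rather than exploring.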
There are two commonly adopted criteria for verifying forecasting effectiveness: accuracy and stability. Rather than focusing on a single objective, both high accuracy and high stability should be pursued simultaneously and implemented in the optimization part. Therefore, based on the bias-variance framework, the fitness function is defined as follows:
E[(x̂ − x)²] = E[((x̂ − E(x̂)) + (E(x̂) − x))²] = E[(x̂ − E(x̂))²] + (E(x̂) − E(x))² = Var(x̂) + Bias²(x̂)
where x is the actual value, x ^ is the forecasted value, and E is the expectation value of the corresponding variable.
The bias equals the average difference between the actual and forecasted values and thus represents forecasting accuracy: a smaller absolute bias indicates higher forecasting accuracy, while a smaller variance indicates a more stable forecasting performance. However, in most experiments it was found that the variance criterion is not suitable for the issues this paper addresses. Thus, the standard deviation of the forecasting errors is selected as a substitute for fitness_2, and the fitness function in this paper is formulated as follows:
min { fitness_1 = Bias(x̂), fitness_2 = Std(x − x̂) }
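As a minimal sketch, the two objectives can be computed from an actual series and a forecast series like this (the variable names are ours, not the paper's):

```python
import numpy as np

def fitness(actual, forecast):
    """The two MOGWO objectives from the equation above: forecasting
    accuracy (absolute bias) and stability (std of the errors)."""
    errors = np.asarray(actual, float) - np.asarray(forecast, float)
    fitness1 = abs(float(errors.mean()))   # |Bias(x_hat)|
    fitness2 = float(errors.std())         # Std(x - x_hat)
    return fitness1, fitness2
```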
The objectives of a multi-objective optimization problem are usually conflicting. In that regard, the Pareto optimal solution set provides an answer, since it represents the best trade-offs between the different objectives. The optimization problem in this study is a minimization problem, so the way suitable solutions are chosen can be formulated as follows:
Minimize the following:
F ( x ) = f 1 ( x ) , f 2 ( x ) , , f o ( x )
Subject to the following:
g_i(x) ≥ 0,   i = 1, 2, …, m
h_i(x) = 0,   i = 1, 2, …, p
L i x i U i ,   i = 1 , 2 , , n
where o denotes the number of objectives, m is the number of inequality constraints, p is the number of equality constraints, and Li and Ui are the lower and upper boundaries of the i-th variables, respectively.
Also, several definitions related to this problem are listed as follows:
Definition 1.
Pareto dominance.
Suppose that there are two vectors: x = (x_1, x_2, …, x_k) and y = (y_1, y_2, …, y_k). Vector x dominates y, denoted as x ≺ y, if
∀ i ∈ {1, 2, …, k}: f_i(x) ≤ f_i(y) ∧ ∃ i ∈ {1, 2, …, k}: f_i(x) < f_i(y)
Definition 2.
Pareto optimality.
The solution x ∈ X is called Pareto optimal if
∄ y ∈ X: F(y) ≺ F(x)
Two solutions are non-dominated with respect to each other if neither of them dominates the other.
Definition  3.
Pareto optimal set.
The set including all non-dominated solutions is named a Pareto set as follows:
P_s := { x ∈ X | ∄ y ∈ X: F(y) ≺ F(x) }
Definition 4.
Pareto optimal front.
A set containing the corresponding values of Pareto optimal solutions in a Pareto optimal set is defined as a Pareto optimal front:
P_f := { F(x) | x ∈ P_s }
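The definitions above translate directly into code. A minimal sketch of Pareto dominance and the non-dominated filter used to maintain an archive (for minimization) might look like this:

```python
def dominates(fx, fy):
    """Pareto dominance (Definition 1): fx dominates fy if it is no worse
    in every objective and strictly better in at least one (minimization)."""
    return (all(a <= b for a, b in zip(fx, fy))
            and any(a < b for a, b in zip(fx, fy)))

def pareto_set(points):
    """Non-dominated solutions (Definition 3) among a list of objective
    vectors; this is the kind of filter used to maintain the archive."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]
```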

2.6. Evaluation Module

This section describes the evaluation module. Several typical evaluation metrics that are usually adopted in the relevant research are used to verify the forecasting performance; R2 (Pearson's correlation coefficient) and the DM test are also employed in this paper.

2.6.1. Typical Performance Metric

As far as we know, there are no uniform and consistent criteria for testing the validity of prediction results or for comparing them with those of other models. In this study, we therefore adopt a variety of methods and metrics, all shown in Table 2. Here, N is the length of the dataset, A denotes the actual value, and F represents the forecast value.
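For concreteness, a few of the standard metrics referenced here (MAE, RMSE, MAPE) can be written directly from their usual definitions; this is a generic sketch, not a transcription of the paper's Table 2:

```python
import numpy as np

def error_metrics(A, F):
    """MAE, RMSE and MAPE for actual values A and forecasts F
    (A must be non-zero for MAPE to be defined)."""
    A, F = np.asarray(A, float), np.asarray(F, float)
    mae = float(np.mean(np.abs(A - F)))
    rmse = float(np.sqrt(np.mean((A - F) ** 2)))
    mape = float(100.0 * np.mean(np.abs((A - F) / A)))
    return mae, rmse, mape
```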

2.6.2. Diebold–Mariano Test

With α as the significance level, the null hypothesis H0 states that there is no significant difference between the two forecasting models, while the alternative hypothesis H1 denotes disagreement with H0. The hypotheses are formulated as follows:
H_0: E[ Loss(e_i^1) ] = E[ Loss(e_i^2) ]
H_1: E[ Loss(e_i^1) ] ≠ E[ Loss(e_i^2) ]
where Loss represents the loss function of the forecasting errors and e_i^p (p = 1, 2) are the forecasting errors of the two compared models.
Furthermore, the DM test statistics can be calculated as follows:
DM value = ( Σ_{i=1}^{n} [ Loss(e_i^1) − Loss(e_i^2) ] / n ) / √( s² / n )
where s² is an estimate of the variance of d_i = Loss(e_i^1) − Loss(e_i^2).
The DM test value is compared with Z_{α/2}. H_0 is rejected when the DM statistic falls outside the acceptance interval [ −Z_{α/2}, Z_{α/2} ], which indicates a significant difference between the forecasting performances of the compared model and the proposed model; in that case, we accept H_1.
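A minimal sketch of the DM statistic in the simple form given above (squared-error loss, no autocorrelation correction) could be:

```python
import numpy as np

def dm_test(e1, e2, loss=np.square):
    """DM statistic: mean loss differential d_i = Loss(e_i^1) - Loss(e_i^2)
    divided by sqrt(s^2 / n), with s^2 the sample variance of d_i."""
    d = loss(np.asarray(e1, float)) - loss(np.asarray(e2, float))
    n = len(d)
    return float(d.mean() / np.sqrt(d.var(ddof=1) / n))
```

A large positive value indicates that the first model's losses are significantly larger than the second's.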

3. Analysis and Experiments

In this part, three different experiments using four different wind speed datasets acquired from Liaotung peninsula and two different electrical power load datasets collected from QLD (Queensland) are carried out to test the proposed hybrid system.

3.1. Raw Data Description

In this study, four different 10-min wind speed datasets were collected from four sites (Figure 6), namely the four wind power plants in the Liaotung peninsula: the Hengshan site (40° N, 120° E), Xianren island (40° N, 122.5° E), the Donggang site (42.5° N, 122.5° E), and the Danton site (40° N, 125° E).
Also, two additional electrical load datasets were applied to demonstrate the efficiency of the hybrid forecasting model. Each wind speed dataset contains 9488 data points, and each electrical load dataset contains 2544. Only the first 1000 observations were adopted to verify the model: the first 900 observations were used as the training set, while the testing set contained the remaining 100 observations (Figure 6). Furthermore, some basic statistical information (minimum, average, and maximum values, etc.) for these datasets is presented in Table 3.
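The split described above amounts to a simple slicing rule; a sketch with the counts from this section is:

```python
def split_series(series, n_use=1000, n_train=900):
    """Data split used in Section 3.1: keep the first 1000 observations,
    use 900 for training and the remaining 100 for testing."""
    data = series[:n_use]
    return data[:n_train], data[n_train:]
```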

3.2. Experiment I: Tests of MOGWO and LSTM

In this experiment, we present two subparts to verify the superiority of the MOGWO and LSTM forecasting algorithm, respectively.

3.2.1. Test of MOGWO

The four typical test functions demonstrated in Table 4 are commonly used to verify the superiority of a proposed optimizer on multi-objective optimization issues [57,58,59]. NSGA-II and the multi-objective dragonfly algorithm (MODA) were used in this study for comparison. The experimental parameters were as follows: the total number of search agents was 50, the archive size was 50, and the number of iterations was 100. The inverted generational distance (IGD), a widely used metric, was adopted for the evaluation. Each test function was tested fifty times, and Table 5 shows the statistical values of the IGD. Moreover, Figure 7 shows the Pareto optimal solutions acquired by the different algorithms.
Based on the outcomes, two conclusions were made as follows:
  • The MOGWO algorithm obtained the best IGD outcomes among the optimizers for the four test functions (Kursawe, ZDT1, ZDT2, and ZDT3), although it performed worse on the Kursawe and ZDT1 functions in terms of the minimum value and worse than MODA in terms of the standard deviation. Taken as a whole, these outcomes are strong enough to demonstrate the superior optimization ability of the MOGWO algorithm compared with the others.
  • Figure 7 shows that the MOGWO algorithm was able to obtain more Pareto optimal solutions. In addition, the solutions found by the MOGWO algorithm were more evenly distributed on the true PF (Pareto front) curve and were closer to the real Pareto optimal solutions.
Remark: The optimizing ability of MOGWO has been proven through the results and discussions of the aforementioned experiment comparison. Thus, MOGWO can be widely used to cope with multi-objective problems, thus being adopted as the best optimization model in the proposed CFML system.

3.2.2. Test of LSTM in CEEMD-FTS-MOGWO-LSTM

This subsection aims to compare LSTM, DBN, CNN, and SAE for the four wind speed datasets collected from four different wind farms with 10-min data. We set the parameters for each model based on the error and bias since there are no previous studies on how to set the optimal parameters. Also, to reduce the impact of randomness, we took the mean value of the experiments performed 50 times. The relative results and detailed values are listed in Table 6, and Figure 8 demonstrates the prediction outcomes of the aforementioned four models at the four wind speed sites. From the forecasting data, we drew several conclusions:
  • The LSTM model achieved almost the best results and the most accurate predictions of all four wind speed datasets with roughly the same run time and identical training and testing datasets. Namely, the adopted LSTM model outperformed the CNN, DBN, and SAE from a whole perspective and provided fairly competitive results.
  • For the data collected from the four different wind farms, the LSTM model worked better than the other three deep learning models, which means that the superiority of the LSTM forecasting algorithm remained, regardless of the different geographical distribution, to some extent.
  • The forecasting performance of different models was adequately reflected by the error metrics adopted by us in this part. That is to say, error measurement is effective and can be used to accurately evaluate the ability of the prediction models.
Remark: For all four datasets, although the LSTM model performed more poorly than the other models on some metrics, the best values of the majority of error metrics, such as mean absolute error (MAE), square root of average of the error squares (RMSE), mean absolute percentage error (MAPE), index of agreement (IA), and so on, indicate that the adopted LSTM model can achieve excellent forecasting accuracy. That is also the reason why we chose LSTM as the main forecasting model in our proposed hybrid forecasting model.

3.3. Experiment II

The comparisons made in this experiment were conducted to demonstrate the specific improvements brought by the fuzzy time series forecasting part and the optimizer algorithm as well as the combination of MOGWO and FTS. Furthermore, an experiment to prove the enhancement in the forecasting ability of the combined model brought by CEEMD was made as well. Moreover, the comparisons between the proposed hybrid forecasting model and all the other models are also listed and analyzed in this part. Table 7 and Figure 9 demonstrate the relevant error metric values of the models mentioned above.
(1)
For the first comparison, the WNN, GRNN, ARIMA, and LSTM models were built and compared with one another to determine the best model for wind speed forecasting, which was found to be ARIMA. However, among the neural network algorithms, LSTM was shown to be the best, and Experiment I proved that LSTM outperforms the other three deep learning models as well. Hence, the following steps and comparisons are all based on the basic forecasting model, LSTM.
(2)
In terms of R (Pearson's correlation coefficient), ARIMA failed to outdo LSTM on datasets A and B. In addition, we tried AR, MA, ARMA, and ARIMA with different parameters each and found that ARMA(2,1), ARIMA(3,1,2), and ARIMA(3,2,2) achieved almost the same forecasting accuracy, at about 8% MAPE, which is clearly better than that of the neural networks. The reason for this is that autoregressive moving-average models require clear rhythmic patterns and fairly linear series trends, whereas wind speed datasets are neither seasonal nor regular, so most of these irregular features were removed by the differencing operation.
(3)
From Table 7, for example, the MOGWO-LSTM achieved a MAPE value of 8.64%, while the basic LSTM model only achieved a MAPE value of 9.48% in the case of site A. Moreover, we tested the effectiveness of the fuzzy time series forecasting part. For example, in the case of site B, the MAPE value of FTS-LSTM was 7.91%, 8.34% lower than that of the LSTM model.
(4)
According to Figure 8, the FTS-MOGWO-LSTM model achieved an average MAPE of 8.02% and an average r2 of 75.34%, although it failed to reach the highest r2 value on datasets A and B. Next, the separate improvement in forecasting ability brought by FTS or MOGWO varied across datasets. For example, in the case of dataset A, the MAPE of FTS-LSTM was higher than that of MOGWO-LSTM, which means that MOGWO contributed more to the forecasting improvement there.
(5)
Apart from these comparisons, the decomposition algorithm was also tested in this part. We tested several parameter configurations for the CEEMD algorithm regarding Nstd (signal noise ratio), NR (noise addition number), Maxiter (maximum number of iterations), and modes (number of IMFs): Nstd in 0.05–0.4, NR in 10–500, Maxiter in 100–1000, and modes in 9–13. The detailed parameter settings vary from dataset to dataset, so they should be re-tuned whenever the dataset changes. For instance, the best settings for dataset A were an Nstd of 0.2, an NR of 50, and a Maxiter of 500; the total number of IMFs was 12, and the best accuracy was acquired with 11 IMFs. Also, Table 8 shows that the CFML model achieved the highest r2 value and the lowest MAPE at all four data sites, which demonstrates the improvements brought by CEEMD.
Remark: Through the aforementioned comparisons and conclusions, it is apparent that the proposed hybrid forecasting model achieves the best values in all the applied error metrics. Moreover, the outcomes prove that the adopted multi-objective optimizer MOGWO, the data decomposition approach CEEMD, and the fuzzy time series part can improve the forecasting ability of the original forecasting model LSTM to a great extent.
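The parameter sweep described in point (5) is essentially a grid search. A hypothetical sketch, where `evaluate` stands in for running the full CEEMD pipeline for one (Nstd, NR, Maxiter) setting and returning its test MAPE, is:

```python
from itertools import product

def grid_search(evaluate, nstds, nrs, maxiters):
    """Hypothetical sketch of the CEEMD parameter sweep: try every
    (Nstd, NR, Maxiter) combination and keep the one with lowest MAPE."""
    best, best_mape = None, float('inf')
    for nstd, nr, maxiter in product(nstds, nrs, maxiters):
        mape = evaluate(nstd, nr, maxiter)
        if mape < best_mape:
            best, best_mape = (nstd, nr, maxiter), mape
    return best, best_mape
```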

3.4. Experiment III: Tested with Electrical Load Data

The third experiment aims to verify the performance of the proposed CFML forecasting model on QLD (Queensland) electrical power load forecasting (Figure 10). Because load profiles are similar within weekdays and within weekends but differ noticeably between the two, the data from Wednesday was randomly selected to represent weekdays and the data from Sunday to represent weekends [60]. Table 8 and Table 9 list the experimental outcomes, and all forecasting results for Wednesday and Sunday are depicted in Figure 10, together with the underlying datasets, both collected from Queensland, Australia. The specific electrical load forecasting results presented in this subsection lead to the following conclusions:
(1)
Regarding the electrical power load data from Wednesday and all forecasting steps, the proposed hybrid forecasting system performed the best among all models. Moreover, among the single models in this experiment, the best was the WNN algorithm, while the worst was the CNN model. However, this may be a result of the data features and does not mean that the CNN always performs worse than the WNN. Since the regular form of the CNN is designed for image data, a univariate time series should first be transformed into a matrix in which each row contains many observations, such as 128 or 256, analogous to grey-scale image data. Alternatively, it is also reasonable and practical to let each row contain the number of observations used as input data, although a compromise in accuracy may arise on some occasions.
(2)
For the test of the optimization part and the verification of the fuzzy forecasting part, the comparisons between MOGWO-LSTM and LSTM and between FTS-LSTM and LSTM are shown in the aforementioned tables and figures, respectively. For instance, on Wednesday, the regular LSTM model achieved a MAPE of 2.81%, which is 39.14% higher than that of FTS-LSTM. Moreover, MOGWO-LSTM improved on LSTM by 37.36%, achieving a MAPE of 1.76%. Also, the FTS-MOGWO-LSTM model achieved a MAPE of 1.46%, lower than that of LSTM combined with either FTS or MOGWO alone. Noticeably, although this combined model did not achieve the highest r2, its value was not obviously lower than those of the compared models and was clearly higher than those of regular networks such as GRNN, WNN, DBN, and SAE.
(3)
All comparisons for the electrical power load data on Wednesday and Sunday demonstrate that the decomposition methods achieved the best forecasting results. In this study, we tested different parameter settings regarding the Nstd, NR, Maxiter, and modes for EMD, EEMD, and CEEMD, and the following outcomes were all acquired with the best parameter settings for each decomposition algorithm. Table 9 and Table 10 show that the CEEMD method clearly outperforms the EEMD and EMD methods, which explains why CEEMD was selected and employed in this research. Also, Figure 10 shows that the forecasts obtained with the CEEMD model corresponded most closely to the real data on both Wednesday and Sunday.
Remark: Based on the three experiments mentioned above, the strong applicability of the developed model in these two electrical power load signals and in different wind data sites, which feature different characteristics, reasonably and convincingly demonstrates that the CEEMD-FTS-MOGWO-LSTM model has universal applicability. Also, the CFML model performs better than all other compared benchmark models.

4. Discussion

In this section, based on the Diebold–Mariano test (DM test), we discuss and analyze the forecasting model’s statistical significance, after which we adopt the Pearson’s correlation coefficient to discuss the association strength. Then, to verify the contributions of our CFML model, the improvement percentages between different combinations of basic models are also discussed in this section. Also, the multistep-ahead forecasting of the developed model and a sensitivity analysis are conducted.

4.1. Discussion I: Statistical Significance

The DM test is widely used to demonstrate the significance of the improvement brought by the developed CFML forecasting system compared with other algorithms. Table 10 lists the specific DM test outcomes, which show that we can reject the null hypothesis at the 1% significance level, because the DM statistics of all compared models exceeded the 1% critical value for all four wind speed datasets and both electrical power load series. Hence, the proposed CFML forecasting system significantly outperforms the compared algorithms, and we can reasonably conclude that the hybrid forecasting framework displays a statistically significant difference. Furthermore, this proves that the proposed CFML model is superior to the other models involved in wind speed forecasting.

4.2. Discussion II: Association Strength

The Pearson test, proposed by Karl Pearson, reveals the correlation strength between the predicted and actual values. In this section, the correlation strength is discussed based on the Pearson test for the proposed hybrid prediction model and all comparative models. Specifically, a Pearson's correlation coefficient of 0 indicates no linear relationship between the two series, while a coefficient of 1 indicates a perfect positive linear relationship between the actual and predicted values. Table 11 demonstrates the outcomes of the Pearson test, from which we conclude that the values of all comparative models were lower than that of the proposed CFML forecasting model, showing that the forecasts of the CFML model possess higher association strength.
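A minimal sketch of the coefficient used in this discussion, via NumPy:

```python
import numpy as np

def pearson_r(actual, forecast):
    """Pearson's correlation coefficient between the actual series and
    the forecasts, as used in the association-strength discussion."""
    return float(np.corrcoef(np.asarray(actual, float),
                             np.asarray(forecast, float))[0, 1])
```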

4.3. Discussion III: Improvement Percentage

In order to fully and clearly demonstrate the superiority of the proposed hybrid prediction system, this section discusses the percentage improvements in MAPE, RMSE, MAE, and direction accuracy (DA) between the developed system and other comparative models. These comparisons analyze and quantify how each component works in the overall prediction framework. Table 12 demonstrates the outcomes of the improvement percentages, taking dataset B and the electrical load power on Wednesday as examples, which shows the following conclusions:
(1)
By contrasting the improvement percentage between FTS-MOGWO-LSTM with FTS-LSTM and MOGWO-LSTM, we drew the conclusion that the combination of MOGWO and FTS contributes more than either FTS-LSTM or MOGWO-LSTM to the forecasting ability of the whole presented hybrid CFML forecasting model.
(2)
The comparison between the CEEMD-FTS-MOGWO-LSTM and the FTS-MOGWO-LSTM models obviously revealed the improvement brought by the addition of the decomposition approach CEEMD.
(3)
On average, all improvement percentages were positive and significant, except for some percentages of FTS-MOGWO-LSTM, which fluctuated across datasets with different features; this can be studied in the future. Regardless of the fluctuations, all values revealed that FTS-MOGWO-LSTM performs better than the regular LSTM.
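The improvement percentages in Table 12 follow the usual form (benchmark error minus proposed error, relative to the benchmark); a sketch:

```python
def improvement_pct(benchmark, proposed):
    """Improvement percentage between a benchmark error metric and the
    proposed model's value: positive when the proposed model is better."""
    return 100.0 * (benchmark - proposed) / benchmark
```

For example, moving from a MAPE of 2.81% to 1.76% gives roughly the 37.4% improvement quoted in Experiment III.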

4.4. Discussion Ⅳ: Multistep-Ahead Forecasting

A one-step forecasting model is sometimes insufficient to ensure the controllability and reliability of an electrical power load or wind speed forecasting system. Therefore, to test the multistep performance of the developed CFML system, the multistep prediction in this study used two of the datasets listed in Table 3 (dataset A and the Sunday electrical power load as representatives).
Table 13 illustrates the forecasting outcomes of those comparative models (i.e., GRNN, LSTM, and EEMD-FTS-MOGWO-LSTM) and the proposed CEEMD-FTS-MOGWO-LSTM forecasting model. It can be observed that for one-step, two-step, and three-step predictions using electrical power load data or wind speed data, the proposed model always achieved the lowest MAPE value in the test models. That is to say, the developed framework effectively carried out multistep-ahead forecasting in electrical power load prediction or wind speed prediction (through effective error index measurements).
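One common way to produce multistep-ahead forecasts from a one-step model is the recursive scheme sketched below; the paper does not state which multistep strategy it uses, so this is an illustrative assumption:

```python
def multistep_forecast(model, history, steps):
    """Recursive multistep-ahead forecasting: each one-step forecast is
    fed back into the input window for the next step."""
    window = list(history)
    forecasts = []
    for _ in range(steps):
        y = model(window)          # one-step-ahead forecast
        forecasts.append(y)
        window = window[1:] + [y]  # slide the window forward
    return forecasts
```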

4.5. Discussion V: Sensitivity Analysis

The hybrid forecasting model has two essential parameters: the number of iterations and the number of search agents. Hence, in this subsection, we explore the effects of these two parameters on the prediction performance for wind speed dataset A; all other parameter values were kept unchanged while the number of search agents and iterations varied. Specifically, we set the number of search agents to 5, 10, 15, 20, 25, and 30; then, keeping the number of search agents at 10, we changed the number of iterations to 5, 10, 20, 30, 40, and 50. Table 14 and Table 15 illustrate the experimental outcomes for dataset A. The following conclusions were drawn:
(1)
The value of MAPE first decreased as the number of search agents increased. Then, it declined to the minimum value with 10 search agents, after which it started increasing and fluctuated at a high level except for a decrease at 25 search agents. Overall, we can see that the proposed hybrid CFML forecasting model performed the best with 10 search agents.
(2)
Keeping the number of search agents at the best value of 10, we changed the number of iterations to check their influence on the performance of the presented model. We drew a conclusion similar to that for the search agents: as the number of iterations increased from 5 to 30, the error measured by various metrics, especially MAPE, first fell to its minimum at 10 iterations and then rose gradually as the number of iterations increased. According to these two conclusions, we set both the number of search agents and the number of iterations to 10 in our experiments.
(3)
The comparisons showed that values of these two parameters that are either too small or too large worsen the performance of the proposed CEEMD-FTS-MOGWO-LSTM system. In addition, the optimal settings depend to a large extent on the prediction conditions and the decision-making process. Therefore, it is important to determine the optimal parameters under different application conditions.
Remark: According to Discussions I to V, we can conclude that the proposed hybrid forecasting system, CEEMD-FTS-MOGWO-LSTM, possesses a more effective and stable forecasting ability than the other models, for both wind speed and electrical power load, in many aspects such as correlation strength, statistical significance, and forecasting accuracy. Also, the small number of iterations and search agents required demonstrates the convenience of the proposed model.

5. Conclusions

Accurate wind speed and electrical power load forecasting is crucial for power grid safety management, power system operation, and the power market. However, due to the nonlinearity and randomness of wind speed data and electrical power load series, it is still a difficult and challenging task to establish an effective forecasting framework for this problem. In this study, a new hybrid prediction system was developed to obtain stability and accuracy simultaneously. Four wind speed datasets and two electrical power load datasets were adopted to test the effectiveness of the hybrid forecasting framework, and the outcomes show that the proposed system outperformed all comparative benchmark models on many indicators. Firstly, a data preprocessing decomposition approach, CEEMD, was successfully applied to enhance the forecasting ability of the CFML model. Secondly, an effective multi-objective optimization algorithm, MOGWO, was used to find the optimal initial parameters; it not only achieved better results on the test functions than the other two optimization models (NSGA-II and MODA) but also showed the best optimization capability. Moreover, fuzzy time series forecasting with the rough set induction rule, which builds rule sets with the LEM2 algorithm, was successfully combined with MOGWO and the deep learning algorithm LSTM. It was shown that the addition of the FTS part, the MOGWO part, and the data decomposition part all improve the performance of the hybrid forecasting framework. A similar method can also be applied in other fields, as verified here for the electrical power load. Finally, the CEEMD, FTS, and MOGWO components showed the ability to carry the strength of each part and to effectively improve the forecasting ability of the CFML model in terms of both stability and accuracy.

Author Contributions

Conceptualization, D.W. and J.W.; Methodology, J.W.; Software, D.W.; Validation, J.W., K.N. and G.T.; Formal Analysis, D.W. and J.W.; Investigation, K.N. and G.T.; Resources, D.W.; Data Curation K.N. and G.T.; Writing-Original Draft Preparation, D.W.; Writing-Review & Editing, D.W. and J.W.; Visualization, D.W. and J.W.; Supervision, J.W.; Project Administration, D.W.; Funding Acquisition, J.W.

Funding

This work was supported by the National Natural Science Foundation of China (grant number 71671029).

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Abbreviations

List of Abbreviations
CFML—CEEMD-FTS-MOGWO-LSTM
WNN—Wavelet Neural Network
GRNN—Generalized Regression Neural Network
SAE—Sparse Autoencoder
LSTM—Long Short-Term Memory
DBN—Deep Belief Network
CNN—Convolutional Neural Network
IGD—The inverted generational distance
FTS—Fuzzy time series
LEM2—Learning from examples module version two
AR—Autoregressive model
MA—Moving-average model
ARMA—Autoregressive moving average model
ARIMA—Autoregressive Integrated Moving Average
MODA—Multi-objective dragonfly algorithm
MOGWO—Multi-objective grey wolf optimizer
NSGA-II—Non-dominated sorted genetic algorithm-II
QLD—Queensland
EMD—Empirical Mode Decomposition
EEMD—Ensemble Empirical Mode Decomposition
CEEMD—Complementary Ensemble Empirical Mode Decomposition
RST—Rough set theory
IMF—Intrinsic mode function
FB—The fractional bias
U1—The Theil U statistic 1
U2—The Theil U statistic 2
DA—The direction accuracy
INDEX—The improvement ratio of the index among different models
R2—The Pearson's correlation coefficient
DM—Diebold–Mariano test
H0—The null hypothesis
H1—The alternative hypothesis
α—The significance level
AE—The average error
MAE—The mean absolute error
RMSE—The root-mean-square error
NMSE—The normalized average of the squares of errors
MAPE—The mean absolute percentage error
IA—The index of agreement
Kursawe—Kursawe function
ZDT1—Zitzler–Deb–Thiele's function N. 1
ZDT2—Zitzler–Deb–Thiele's function N. 2
ZDT3—Zitzler–Deb–Thiele's function N. 3
Xt—An input at time t
St—The hidden state
St−1—The hidden state at the previous time step
ft—The forget gate
it—The input gate
Ot—The output gate
Ct−1—The old cell state
Kα—The distance between wolf α and the prey
Kβ—The distance between wolf β and the prey
Kδ—The distance between wolf δ and the prey
R1—The position of wolf α at time ite+1
R2—The position of wolf β at time ite+1
R3—The position of wolf δ at time ite+1
Pni—Positive noise
Nni—Negative noise
Wni—Noise with identical amplitude and phase
gj—The j-th inequality constraint
hj—The j-th equality constraint

References

  1. Ou, T.C. A novel unsymmetrical faults analysis for microgrid distribution system. Int. J. Electr. Power Energy Syst. 2012, 43, 1017–1024. [Google Scholar] [CrossRef]
  2. Ou, T.C. Ground fault current analysis with a direct building algorithm for microgrid distribution. Int. J. Electr. Power Energy Syst. 2013, 53, 867–875. [Google Scholar] [CrossRef]
  3. Lin, W.M.; Ou, T.C. Unbalanced distribution network fault analysis with hybrid compensation. IET Gener. Transm. Distrib. 2010, 5, 92–100. [Google Scholar] [CrossRef]
  4. Ou, T.C.; Lu, K.H.; Huang, C.J. Improvement of transient stability in a hybrid power multi-system using a designed NIDC (novel intelligent damping controller). Energies 2017, 10, 488. [Google Scholar] [CrossRef]
  5. Ye, S.; Zhu, G.; Xiao, Z. Long term load forecasting and recommendations for china based on support vector regression. Energy Power Eng. 2012, 4, 380–385. [Google Scholar] [CrossRef]
  6. He, Q.; Wang, J.; Lu, H. A hybrid system for short-term wind speed forecasting. Appl. Energy 2018, 226, 756–771. [Google Scholar] [CrossRef]
  7. Yang, W.; Wang, J.; Lu, H. Hybrid wind energy forecasting and analysis system based on divide and conquer scheme: A case study in China. J. Clean. Prod. 2019, 222, 942–959. [Google Scholar] [CrossRef]
  8. Abdel-Aal, R.E.; Elhadidy, M.A.; Shaahid, S.M. Modeling and forecasting the mean hourly wind speed time series using GMDH-based abductive networks. Renew. Energy 2009, 34, 1686–1699. [Google Scholar] [CrossRef]
  9. Wang, J.; Niu, T.; Lu, H.; Yang, W.; Du, P. A Novel Framework of Reservoir Computing for Deterministic and Probabilistic Wind Power Forecasting. IEEE Trans. Sustain. Energy 2019. [Google Scholar] [CrossRef]
  10. Ma, L.; Luan, S.Y.; Jiang, C.W.; Liu, H.L.; Zhang, Y. A review on the forecasting of wind speed and generated power. Renew. Sustain. Energy Rev. 2009, 13, 915–920. [Google Scholar]
  11. Cardenas-Barrera, J.L.; Meng, J.; Castillo-Guerra, E.; Chang, L. A neural network approach to multi-step-ahead, short-term wind speed forecasting. IEEE 2013, 2, 243–248. [Google Scholar]
  12. Torres, J.L.; García, A.; Blas, M.D.; Francisco, A.D. Forecast of hourly average wind speed with ARMA models in Navarre (Spain). Sol. Energy 2005, 79, 65–77. [Google Scholar] [CrossRef]
  13. Liu, H.; Tian, H.Q.; Li, Y.F. An EMD-recursive ARIMA method to predict wind speed for railway strong wind warning system. J. Wind Eng. Ind. Aerodyn. 2015, 141, 27–38. [Google Scholar] [CrossRef]
  14. Kavasseri, R.G.; Seetharaman, K. Day-ahead wind speed forecasting using ARIMA models. Renew. Energy 2009, 34, 1388–1393. [Google Scholar] [CrossRef]
  15. Yang, D.; Sharma, V.; Ye, Z.; Lim, L.I.; Zhao, L.; Aryaputera, A.W. Forecasting of global horizontal irradiance by exponential smoothing, using decompositions. Energy 2015, 81, 111–119. [Google Scholar] [CrossRef]
  16. Li, Y.; Ling, L.; Chen, J. Combined grey prediction fuzzy control law with application to road tunnel ventilation system. J. Appl. Res. Technol. 2015, 13, 313–320. [Google Scholar] [CrossRef]
  17. Barbounis, T.G.; Theocharis, J.B. A locally recurrent fuzzy neural network with application to the wind speed prediction using spatial correlation. Neurocomputing 2007, 70, 1525–1542. [Google Scholar] [CrossRef]
  18. Guo, Z.H.; Wu, J.; Lu, H.Y.; Wang, J.Z. A case study on a hybrid wind speed forecasting method using BP neural network. Knowl. Based Syst. 2011, 24, 1048–1056. [Google Scholar] [CrossRef]
  19. Li, G.; Shi, J. On comparing three artificial neural networks for wind speed forecasting. Appl. Energy 2010, 87, 2313–2320. [Google Scholar] [CrossRef]
  20. Jiang, P.; Liu, F.; Song, Y.L. A hybrid forecasting model based on date-framework strategy and improved feature selection technology for short-term load forecasting. Energy 2017, 119, 694–709. [Google Scholar] [CrossRef]
  21. Hao, Y.; Tian, C. The study and application of a novel hybrid system for air quality early-warning. Appl. Soft Comput. 2019, 74, 729–746. [Google Scholar] [CrossRef]
  22. Zhang, X.; Wang, J.; Gao, Y. A hybrid short-term electricity price forecasting framework: Cuckoo search-based feature selection with singular spectrum analysis and SVM. Energy Econ. 2019, 81, 899–913. [Google Scholar] [CrossRef]
  23. Lago, J.; Ridder, F.D.; Schutter, B.D. Forecasting spot electricity prices: Deep learning approaches and empirical comparison of traditional algorithms. Appl. Energy 2018, 221, 386–405. [Google Scholar] [CrossRef]
  24. Hinton, G.E.; Osindero, S.; Teh, Y.W. A fast learning algorithm for deep belief nets. Neural Comput. 2006, 18, 1527–1554. [Google Scholar] [CrossRef] [PubMed]
  25. Ni, K.L.; Wang, J.; Tang, G.J.; Wei, D.X. Research and Application of a Novel Hybrid Model Based on a Deep Neural Network for Electricity Load Forecasting: A Case Study in Australia. Energies 2019, 12, 2467. [Google Scholar] [CrossRef]
  26. Fischer, T.; Krauss, C. Deep learning with long short-term memory networks for financial market predictions. Eur. J. Oper. Res. 2018, 270, 654–669. [Google Scholar] [CrossRef]
  27. Khatami, A.; Khosravi, A.; Nguyen, T.; Lim, C.P.; Nahavandi, S. Medical image analysis using wavelet transform and deep belief networks. Expert Syst. Appl. 2017, 86, 190–198. [Google Scholar] [CrossRef]
  28. Hinton, G.; Deng, L.; Yu, D.; Dahl, G.; Mohamed, A.R.; Jaitly, N.; Senior, A.; Vanhoucke, V.; Nguyen, P.; Sainath, T.; et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. Signal Process. Mag. 2012, 29, 82–97. [Google Scholar] [CrossRef]
  29. Peris, A.; Domingo, M.; Casacuberta, F. Interactive neural machine translation. Comput. Speech Lang. 2017, 45, 201–220. [Google Scholar] [CrossRef]
  30. Wang, H.Z.; Li, G.Q.; Wang, G.B.; Peng, J.C.; Jiang, H.; Liu, Y.T. Deep learning based ensemble approach for probabilistic wind power forecasting. Appl. Energy 2017, 188, 56–70. [Google Scholar] [CrossRef]
  31. Fan, C.; Xiao, F.; Zhao, Y. A short-term building cooling load prediction method using deep learning algorithms. Appl. Energy 2017, 195, 222–233. [Google Scholar] [CrossRef]
  32. Kong, X.; Xu, X.; Yan, Z.; Chen, S.; Yang, H.; Han, D. Deep learning hybrid method for islanding detection in distributed generation. Appl. Energy 2018, 210, 776–785. [Google Scholar] [CrossRef]
  33. Coelho, I.; Coelho, V.; Luz, E.; Ochi, L.; Guimarães, F.; Rios, E. A GPU deep learning metaheuristic based model for time series forecasting. Appl. Energy 2017, 201, 412–418. [Google Scholar] [CrossRef]
  34. Hong, Y.Y.; Chang, H.L.; Chiu, C.S. Hour-ahead wind power and speed forecasting using simultaneous perturbation stochastic approximation (spsa) algorithm and neural network with fuzzy inputs. Energy 2010, 35, 3870–3876. [Google Scholar] [CrossRef]
  35. Mohandes, M.A.; Halawani, T.O.; Rehman, S.; Hussain, A.A. Support vector machines for wind speed prediction. Renew. Energy 2004, 29, 939–947. [Google Scholar] [CrossRef]
  36. Zhou, J.; Shi, J.; Li, G. Fine tuning support vector machines for short-term wind speed forecasting. Energy Convers. Manag. 2011, 52, 1990–1998. [Google Scholar] [CrossRef]
  37. He, J.M.; Wang, J.; Xiao, L.Q. A hybrid approach based on the Gaussian process with t-observation model for short-term wind speed forecasts. Renew. Energy 2017, 114, 670–685. [Google Scholar]
  38. Hao, Y.; Tian, C. A novel two-stage forecasting model based on error factor and ensemble method for multi-step wind power forecasting. Appl. Energy 2019, 238, 368–383. [Google Scholar] [CrossRef]
  39. Niu, T.; Wang, J.; Lu, H.; Du, P. Uncertainty modeling for chaotic time series based on optimal multi-input multi-output architecture: Application to offshore wind speed. Energy Convers. Manag. 2018, 156, 597–617. [Google Scholar] [CrossRef]
  40. Wang, J.; Li, H.; Lu, H. Application of a novel early warning system based on fuzzy time series in urban air quality forecasting in China. Appl. Soft Comput. J. 2018, 71, 783–799. [Google Scholar] [CrossRef]
  41. Bates, J.M.; Granger, C.W.J. The combination of forecasts. Oper. Res. Q. 1969, 20, 451–468. [Google Scholar] [CrossRef]
  42. Xiao, L.; Qian, F.; Shao, W. Multi-step wind speed forecasting based on a hybrid forecasting architecture and an improved bat algorithm. Energy Convers. Manag. 2017, 143, 410–430. [Google Scholar] [CrossRef]
  43. Xiao, L.; Wang, J.; Hou, R.; Wu, J. A combined model based on data pre-analysis and weight coefficients optimization for electrical load forecasting. Energy 2015, 82, 524–549. [Google Scholar] [CrossRef]
  44. Wang, J.; Du, P.; Lu, H.; Yang, W.; Niu, T. An improved grey model optimized by multi-objective ant lion optimization algorithm for annual electricity consumption forecasting. Appl. Soft Comput. J. 2018, 72, 321–337. [Google Scholar] [CrossRef]
  45. Li, H.; Wang, J.; Li, R.; Lu, H. Novel analysis-forecast system based on multi-objective optimization for air quality index. J. Clean. Prod. 2019, 208, 1365–1383. [Google Scholar] [CrossRef]
  46. Liu, H.; Tian, H.Q.; Pan, D.F.; Li, Y.F. Forecasting models for wind speed using wavelet, wavelet packet, time series and artificial neural networks. Appl. Energy 2013, 107, 191–208. [Google Scholar] [CrossRef]
  47. Liu, H.; Tian, H.Q.; Li, Y.F. Comparison of new hybrid FEEMD-MLP, FEEMD-ANFIS, Wavelet Packet-MLP and Wavelet Packet-ANFIS for wind speed predictions. Energy Convers. Manag. 2014, 89, 11. [Google Scholar] [CrossRef]
  48. Afshar, K.; Bigdeli, N. Data analysis and short term load forecasting in Iran electricity market using singular spectral analysis (SSA). Energy 2011, 36, 2620–2627. [Google Scholar] [CrossRef]
  49. Yeh, J.R.; Shieh, J.S.; Huang, N.E. Complementary ensemble empirical mode decomposition: A novel noise enhanced data analysis method. Adv. Adapt. Data Anal. 2010, 2, 135–156. [Google Scholar] [CrossRef]
  50. Anbazhagan, S.; Kumarappan, N. Day-ahead deregulated electricity market price forecasting using recurrent neural network. IEEE Syst. J. 2013, 7, 866–872. [Google Scholar] [CrossRef]
  51. Pawlak, Z.; Skowron, A. Rudiments of rough sets. Inf. Sci. 2007, 177, 3–27. [Google Scholar] [CrossRef]
  52. Stefanowski, J. On rough set based approaches to induction of decision rules. Rough Sets Knowl. Discov. 1998, 1, 500–529. [Google Scholar]
  53. Grzymala-Busse, J.W. A new version of the rule induction system LERS. Fundam. Inform. 1997, 31, 27–39. [Google Scholar]
  54. Liu, L.; Wiliem, A.; Chen, S.; Lovell, B.C. Automatic Image Attribute Selection for Zero-Shot Learning of Object Categories. In Proceedings of the Twenty-Second International Conference on Pattern Recognition, Stockholm, Sweden, 24–28 August 2014; pp. 2619–2624. [Google Scholar]
  55. Mirjalili, S.; Saremi, S.; Mirjalili, S.M.; Coelho, L.S. Multi-objective grey wolf optimizer: A novel algorithm for multi-criterion optimization. Expert Syst. Appl. 2016, 47, 106–119. [Google Scholar] [CrossRef]
  56. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  57. Yang, W.; Wang, J.; Niu, T. A hybrid forecasting system based on a dual decomposition strategy and multi-objective optimization for electricity price forecasting. Appl. Energy 2019, 235, 1205–1225. [Google Scholar] [CrossRef]
  58. Zhou, Q.G.; Wang, C.; Zhang, G.F. Hybrid forecasting system based on an optimal model selection strategy for different wind speed forecasting problems. Appl. Energy 2019, 250, 1559–1580. [Google Scholar] [CrossRef]
  59. Jiang, P.; Liu, Z. Variable weights combined model based on multi-objective optimization for short-term wind speed forecasting. Appl. Soft Comput. 2019, 82, 105587. [Google Scholar] [CrossRef]
  60. Zhang, X.; Wang, J.; Zhang, K. Short-term electric load forecasting based on singular spectrum analysis and support vector machine optimized by Cuckoo search algorithm. Electr. Power Syst. Res. 2017, 146, 270–285. [Google Scholar] [CrossRef]
Figure 1. Explicit processes of the data input and complementary ensemble empirical mode decomposition (CEEMD) parts of the CEEMD-fuzzy time series (FTS)-multi-objective grey wolf optimizer (MOGWO)-long short-term memory (LSTM) (CFML) model.
Figure 2. Flowchart of the paper and the input data for the first forecasting stage.
Figure 3. Flowchart of fuzzy time series forecasting.
Figure 4. Position-updating mechanism of the search agents and the effects of A on it.
Figure 5. LSTM (long short-term memory) structure.
Figure 6. Four wind speed datasets with 10-min time intervals.
Figure 7. Pareto optimal solutions obtained by NSGA-II, MODA, and MOGWO for the test functions Kursawe, ZDT1, ZDT2, and ZDT3.
Figure 8. Forecasting results of the four deep learning algorithms.
Figure 9. Forecasting results of the developed forecasting system and the other compared models (Experiment II).
Figure 10. The forecasting results for Wednesday and Sunday, as well as the basic data descriptions.
Table 1. The selected input variables for long short-term memory (LSTM).

Factor | Explanation
X_t | the present value
LAG1 | first-order lagged value X_{t−1}
LAG2 | second-order lagged value X_{t−2}
D1 | difference 1: D_t = X_t − forecasted X_t
D2 | difference 2: D_{t−1} = X_{t−1} − forecasted X_{t−1}
D3 | difference 3: D_{t−2} = X_{t−2} − forecasted X_{t−2}
Table 2. Performance metric rules ($A_i$: actual value; $F_i$: forecast value; $\bar{A}$, $\bar{F}$: their means).

Metric | Definition | Equation
AE | Average error of N forecasting results | $AE = \frac{1}{N}\sum_{i=1}^{N}(A_i - F_i)$
MAE | Mean absolute error of N forecasting results | $MAE = \frac{1}{N}\sum_{i=1}^{N}\left|A_i - F_i\right|$
RMSE | Square root of the average of the squared errors | $RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(A_i - F_i)^2}$
NMSE | The normalized average of the squares of the errors | $NMSE = \frac{1}{N}\sum_{i=1}^{N}\frac{(A_i - F_i)^2}{F_i A_i}$
MAPE | Average of N absolute percentage errors | $MAPE = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{A_i - F_i}{A_i}\right| \times 100\%$
IA | Index of agreement of the forecasting results | $IA = 1 - \frac{\sum_{i=1}^{N}(A_i - F_i)^2}{\sum_{i=1}^{N}\left(\left|F_i - \bar{A}\right| + \left|A_i - \bar{A}\right|\right)^2}$
FB | Fractional bias of N forecasting results | $FB = \frac{2(\bar{A} - \bar{F})}{\bar{A} + \bar{F}}$
U1 | Theil U statistic 1 of the forecasting results | $U1 = \frac{\sqrt{\frac{1}{N}\sum_{i=1}^{N}(A_i - F_i)^2}}{\sqrt{\frac{1}{N}\sum_{i=1}^{N}A_i^2} + \sqrt{\frac{1}{N}\sum_{i=1}^{N}F_i^2}}$
U2 | Theil U statistic 2 of the forecasting results | $U2 = \sqrt{\sum_{i=1}^{N-1}\left(\frac{F_{i+1} - A_{i+1}}{A_i}\right)^2 \Big/ \sum_{i=1}^{N-1}\left(\frac{A_{i+1} - A_i}{A_i}\right)^2}$
DA | Direction accuracy of the forecasting results | $DA = \frac{1}{l}\sum_{i=1}^{l}w_i$, where $w_i = 1$ if $(A_{i+1} - A_i)(F_{i+1} - A_i) > 0$ and $w_i = 0$ otherwise
INDEX | Improvement ratio of an index between two models | $INDEX = \frac{Index_{compared} - Index_{proposed}}{Index_{compared}} \times 100\%$
R | Pearson's correlation coefficient | $R = \frac{\sum_{i=1}^{N}(A_i - \bar{A})(F_i - \bar{F})}{\sqrt{\sum_{i=1}^{N}(A_i - \bar{A})^2}\sqrt{\sum_{i=1}^{N}(F_i - \bar{F})^2}}$
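The metric formulas in Table 2 translate directly into code. The sketch below implements four of them (MAE, RMSE, MAPE, and IA) in plain Python; the function names are ours, and the arguments follow the table's actual/forecast convention:

```python
import math

def mae(actual, forecast):
    """MAE: (1/N) * sum of |A_i - F_i|."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """RMSE: square root of the mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mape(actual, forecast):
    """MAPE: mean absolute percentage error, in percent."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def index_of_agreement(actual, forecast):
    """IA: 1 - sum((A_i - F_i)^2) / sum((|F_i - A_bar| + |A_i - A_bar|)^2)."""
    a_bar = sum(actual) / len(actual)
    num = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    den = sum((abs(f - a_bar) + abs(a - a_bar)) ** 2 for a, f in zip(actual, forecast))
    return 1 - num / den
```

A perfect forecast gives MAE = RMSE = MAPE = 0 and IA = 1, which matches the direction of the comparisons in the tables below (smaller errors and IA closer to 1 are better).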
Table 3. Statistical values of each experiment dataset.

Dataset | Samples | Mid. | Max. | Min. | Std. | Mean
A | All samples | 4.9391 | 17.200 | 0.1000 | 2.7072 | 4.6000
A | Training | 5.7401 | 11.800 | 1.2000 | 2.0136 | 5.8000
A | Testing | 4.9750 | 7.1000 | 2.8000 | 0.7774 | 5.0000
B | All samples | 5.2674 | 28.800 | 0.1000 | 2.9040 | 4.9000
B | Training | 6.1190 | 12.700 | 1.3000 | 2.0481 | 6.2000
B | Testing | 5.6190 | 7.1000 | 3.1000 | 0.8237 | 5.7000
C | All samples | 5.0718 | 22.100 | 0.1000 | 2.9000 | 4.6000
C | Training | 5.8262 | 12.300 | 1.3000 | 2.0946 | 5.8000
C | Testing | 5.0920 | 6.5000 | 2.9000 | 0.7108 | 5.1000
D | All samples | 4.8754 | 17.700 | 0.1000 | 2.6413 | 4.6000
D | Training | 5.6011 | 12.500 | 0.9000 | 1.8937 | 5.8000
D | Testing | 5.1120 | 6.7000 | 3.1000 | 0.7983 | 5.2500
E | All samples | 6043.4 | 8180.7 | 4488.0 | 841.07 | 6189.2
E | Training | 6065.9 | 8180.7 | 4488.0 | 849.28 | 6214.1
E | Testing | 5840.7 | 7221.2 | 4515.0 | 736.47 | 5981.9
F | All samples | 5515.5 | 7780.5 | 4357.8 | 684.29 | 5444.2
F | Training | 5542.3 | 7780.5 | 4357.8 | 693.79 | 5472.0
F | Testing | 5273.9 | 6416.3 | 4447.3 | 537.19 | 5170.9
Table 4. Four test benchmark functions.

Kursawe:
Minimize $f_1(x) = \sum_{i=1}^{2}\left[-10\exp\left(-0.2\sqrt{x_i^2 + x_{i+1}^2}\right)\right]$
Minimize $f_2(x) = \sum_{i=1}^{3}\left[\left|x_i\right|^{0.8} + 5\sin(x_i^3)\right]$
where $-5 \le x_i \le 5$, $1 \le i \le 3$.

ZDT1:
Minimize $f_1(x) = x_1$; Minimize $f_2(x) = g(x)\,h(f_1(x), g(x))$
where $g(x) = 1 + \frac{9}{N-1}\sum_{i=2}^{N}x_i$, $h(f_1, g) = 1 - \sqrt{f_1/g}$, $0 \le x_i \le 1$, $1 \le i \le N = 30$.

ZDT2: same as ZDT1 except $h(f_1, g) = 1 - (f_1/g)^2$.

ZDT3: same as ZDT1 except $h(f_1, g) = 1 - \sqrt{f_1/g} - (f_1/g)\sin(10\pi f_1)$.
Table 5. Statistical values of the inverted generational distance (IGD) for four test functions.

Test Function | Algorithm | Mean | Max. | Min. | Std. | Med.
Kursawe | MODA | 0.012500 | 0.021500 | 0.008500 | 0.003600 | 0.011500
Kursawe | NSGA-II | 0.006500 | 0.015500 | 0.004500 | 0.002800 | 0.005900
Kursawe | MOGWO | 0.005200 | 0.005800 | 0.004900 | 0.000251 | 0.005200
ZDT1 | MODA | 0.014600 | 0.022300 | 0.007900 | 0.004800 | 0.014400
ZDT1 | NSGA-II | 0.015800 | 0.036400 | 0.000375 | 0.008800 | 0.013500
ZDT1 | MOGWO | 0.006800 | 0.016400 | 0.002100 | 0.003800 | 0.005900
ZDT2 | MODA | 0.013900 | 0.022100 | 0.006900 | 0.004600 | 0.012100
ZDT2 | NSGA-II | 0.029200 | 0.060400 | 0.003300 | 0.013500 | 0.025600
ZDT2 | MOGWO | 0.009000 | 0.019400 | 0.001200 | 0.005500 | 0.008100
ZDT3 | MODA | 0.018700 | 0.025900 | 0.007000 | 0.005200 | 0.019300
ZDT3 | NSGA-II | 0.011500 | 0.021500 | 0.004700 | 0.004700 | 0.011000
ZDT3 | MOGWO | 0.005600 | 0.015000 | 0.001000 | 0.003000 | 0.005600
The values in bold indicate the best value of each benchmark function.
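The IGD values in Table 5 measure how closely an obtained Pareto set covers a reference front: the mean Euclidean distance from each reference-front point to its nearest obtained solution, with lower values better. A minimal sketch (the function name is ours):

```python
import math

def igd(reference_front, obtained_front):
    """Inverted generational distance: mean distance from each point of the
    true (reference) Pareto front to the nearest obtained solution."""
    total = sum(min(math.dist(r, p) for p in obtained_front)
                for r in reference_front)
    return total / len(reference_front)
```

Because every reference point contributes, IGD penalizes both poor convergence and poor spread, which is why it is a common single-number summary for multi-objective optimizers such as MOGWO, MODA, and NSGA-II.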
Table 6. Forecasting results of the four deep learning algorithms at four sites.

Site | Model | AE | MAE | RMSE | NMSE | MAPE | IA | FB | r | U1 | U2
Dataset A | CNN | −0.1466 | 0.5558 | 0.6871 | 0.0215 | 0.1143 | 0.9958 | 0.0299 | 0.5871 | 0.0693 | 0.8450
Dataset A | DBN | −0.2578 | 0.4917 | 0.6105 | 0.0169 | 0.0988 | 0.9967 | 0.0528 | 0.7122 | 0.0618 | 0.8078
Dataset A | SAE | −0.2431 | 0.4891 | 0.6084 | 0.0165 | 0.0982 | 0.9967 | 0.0501 | 0.7083 | 0.0620 | 0.7952
Dataset A | LSTM | 0.2706 | 0.4364 | 0.5462 | 0.0138 | 0.0948 | 0.9973 | −0.0530 | 0.7915 | 0.0529 | 0.6470
Dataset B | CNN | 0.1251 | 0.5515 | 0.7198 | 0.0191 | 0.1063 | 0.9963 | −0.0220 | 0.5558 | 0.0628 | 0.8427
Dataset B | DBN | 0.2338 | 0.4853 | 0.6111 | 0.0115 | 0.0885 | 0.9974 | −0.0402 | 0.7284 | 0.0521 | 0.8130
Dataset B | SAE | −0.01814 | 0.5077 | 0.6337 | 0.0146 | 0.0947 | 0.9971 | 0.0032 | 0.6608 | 0.0560 | 0.7455
Dataset B | LSTM | 0.2034 | 0.4448 | 0.5677 | 0.0122 | 0.0863 | 0.9977 | −0.0356 | 0.7645 | 0.0492 | 0.6598
Dataset C | CNN | −0.1974 | 0.5585 | 0.7095 | 0.0221 | 0.1288 | 0.9957 | 0.0394 | 0.5892 | 0.0700 | 0.8672
Dataset C | DBN | 0.1024 | 0.4765 | 0.6138 | 0.0154 | 0.0981 | 0.9968 | −0.0197 | 0.5589 | 0.0587 | 0.7241
Dataset C | SAE | −0.0451 | 0.4538 | 0.5808 | 0.0145 | 0.0936 | 0.9971 | 0.0089 | 0.6481 | 0.0568 | 0.6088
Dataset C | LSTM | 0.0131 | 0.4419 | 0.5731 | 0.0146 | 0.0928 | 0.9971 | −0.0026 | 0.6241 | 0.0557 | 0.6151
Dataset D | CNN | −0.1974 | 0.5585 | 0.7095 | 0.0221 | 0.1132 | 0.9957 | 0.0394 | 0.5892 | 0.0700 | 0.8672
Dataset D | DBN | −0.3784 | 0.5490 | 0.6827 | 0.0202 | 0.1077 | 0.9960 | 0.0772 | 0.7419 | 0.0688 | 0.8875
Dataset D | SAE | −0.1586 | 0.4776 | 0.6025 | 0.0154 | 0.0958 | 0.9969 | 0.0315 | 0.7227 | 0.0592 | 0.7510
Dataset D | LSTM | −0.1168 | 0.4295 | 0.5565 | 0.0131 | 0.0868 | 0.9974 | 0.0231 | 0.7488 | 0.0544 | 0.7198
The values in bold indicate the best value for each metric.
Table 7. Results of the developed forecasting framework and other models (Experiment II).

Site | Model | AE | MAE | RMSE | NMSE | MAPE | IA | FB | U1 | U2 | DA | r²
Dataset A | ARIMA | 0.0019 | 0.3995 | 0.4941 | 0.0115 | 0.0837 | 0.9978 | −0.0004 | 0.0491 | 0.6576 | 0.4242 | 0.7830
Dataset A | GRNN | 0.1427 | 0.4650 | 0.5723 | 0.0154 | 0.1004 | 0.9970 | −0.0283 | 0.0562 | 0.6984 | 0.4949 | 0.7057
Dataset A | WNN | 0.0089 | 0.4678 | 0.5812 | 0.0161 | 0.0994 | 0.9969 | −0.0018 | 0.0578 | 0.7621 | 0.5152 | 0.6738
Dataset A | LSTM | 0.2706 | 0.4364 | 0.5462 | 0.0138 | 0.0948 | 0.9973 | −0.0530 | 0.0529 | 0.6470 | 0.4343 | 0.7915
Dataset A | MOGWO-LSTM | 0.1142 | 0.4069 | 0.5007 | 0.0117 | 0.0864 | 0.9978 | −0.0227 | 0.0492 | 0.6376 | 0.5152 | 0.7848
Dataset A | FTS-LSTM | 0.0599 | 0.4041 | 0.4966 | 0.0115 | 0.0852 | 0.9978 | −0.0120 | 0.0491 | 0.6453 | 0.4848 | 0.7822
Dataset A | FTS-MOGWO-LSTM | −0.0106 | 0.3962 | 0.4884 | 0.0110 | 0.0822 | 0.9979 | 0.0021 | 0.0483 | 0.6585 | 0.4343 | 0.7828
Dataset A | CEEMD-FTS-MOGWO-LSTM | −0.0205 | 0.2314 | 0.2964 | 0.0041 | 0.0487 | 0.9992 | 0.0041 | 0.0296 | 0.7439 | 0.7374 | 0.9359
Dataset B | ARIMA | 0.0129 | 0.4160 | 0.5344 | 0.0103 | 0.0768 | 0.9980 | −0.0023 | 0.0470 | 0.6528 | 0.4848 | 0.7809
Dataset B | GRNN | 0.1744 | 0.4548 | 0.5907 | 0.0136 | 0.0889 | 0.9975 | −0.0306 | 0.0514 | 0.7058 | 0.5859 | 0.7252
Dataset B | WNN | 0.1796 | 0.4394 | 0.5510 | 0.0118 | 0.0858 | 0.9978 | −0.0315 | 0.0479 | 0.6692 | 0.4949 | 0.7726
Dataset B | LSTM | −0.2642 | 0.4271 | 0.5487 | 0.0138 | 0.0863 | 0.9973 | 0.0545 | 0.0560 | 0.7711 | 0.5152 | 0.7855
Dataset B | MOGWO-LSTM | −0.0291 | 0.4338 | 0.5521 | 0.0114 | 0.0803 | 0.9978 | 0.0052 | 0.0488 | 0.6926 | 0.5152 | 0.7519
Dataset B | FTS-LSTM | −0.2094 | 0.4398 | 0.5700 | 0.0116 | 0.0791 | 0.9977 | 0.0380 | 0.0512 | 0.7650 | 0.5455 | 0.7632
Dataset B | FTS-MOGWO-LSTM | −0.0980 | 0.4204 | 0.5356 | 0.0105 | 0.0773 | 0.9979 | 0.0176 | 0.0477 | 0.7189 | 0.5253 | 0.7663
Dataset B | CEEMD-FTS-MOGWO-LSTM | −0.0532 | 0.2345 | 0.2850 | 0.0032 | 0.0439 | 0.9994 | 0.0095 | 0.0253 | 0.6145 | 0.7677 | 0.9737
Dataset C | ARIMA | −0.0031 | 0.4313 | 0.5485 | 0.0130 | 0.0893 | 0.9974 | −0.0006 | 0.0534 | 0.5905 | 0.4343 | 0.6835
Dataset C | GRNN | 0.0932 | 0.4836 | 0.6393 | 0.0185 | 0.1038 | 0.9964 | −0.0181 | 0.0617 | 0.6799 | 0.5051 | 0.5105
Dataset C | WNN | 0.0199 | 0.4727 | 0.6438 | 0.0186 | 0.1005 | 0.9964 | −0.0039 | 0.0625 | 0.7127 | 0.4343 | 0.5528
Dataset C | LSTM | 0.0131 | 0.4419 | 0.5731 | 0.0146 | 0.0928 | 0.9971 | −0.0026 | 0.0557 | 0.6151 | 0.5556 | 0.6241
Dataset C | MOGWO-LSTM | 0.1206 | 0.4315 | 0.5739 | 0.0146 | 0.0919 | 0.9971 | −0.0234 | 0.0553 | 0.5997 | 0.5253 | 0.6341
Dataset C | FTS-LSTM | −0.0550 | 0.4181 | 0.5416 | 0.0129 | 0.0868 | 0.9974 | 0.0109 | 0.0531 | 0.6260 | 0.5758 | 0.6515
Dataset C | FTS-MOGWO-LSTM | 0.0929 | 0.3892 | 0.5201 | 0.0121 | 0.0826 | 0.9976 | −0.0181 | 0.0503 | 0.6044 | 0.6061 | 0.6905
Dataset C | CEEMD-FTS-MOGWO-LSTM | −0.0189 | 0.2432 | 0.3085 | 0.0039 | 0.0488 | 0.9992 | 0.0037 | 0.0301 | 0.5552 | 0.7677 | 0.9154
Dataset D | ARIMA | −0.0415 | 0.4194 | 0.5416 | 0.0121 | 0.0846 | 0.9975 | 0.0081 | 0.0525 | 0.6718 | 0.4242 | 0.7712
Dataset D | GRNN | −0.1744 | 0.4986 | 0.6308 | 0.0173 | 0.1008 | 0.9966 | 0.0347 | 0.0621 | 0.7936 | 0.5051 | 0.6845
Dataset D | WNN | −0.2658 | 0.4778 | 0.5979 | 0.0150 | 0.0943 | 0.9969 | 0.0534 | 0.0594 | 0.7732 | 0.4242 | 0.7516
Dataset D | LSTM | −0.1168 | 0.4295 | 0.5565 | 0.0131 | 0.0868 | 0.9974 | 0.0231 | 0.0544 | 0.7198 | 0.4848 | 0.7488
Dataset D | MOGWO-LSTM | −0.0652 | 0.4094 | 0.5411 | 0.0122 | 0.0831 | 0.9975 | 0.0128 | 0.0527 | 0.7009 | 0.5455 | 0.7516
Dataset D | FTS-LSTM | −0.0447 | 0.4075 | 0.5263 | 0.0114 | 0.0825 | 0.9976 | 0.0088 | 0.0512 | 0.6954 | 0.5859 | 0.7582
Dataset D | FTS-MOGWO-LSTM | 0.0151 | 0.3825 | 0.5031 | 0.0106 | 0.0787 | 0.9978 | −0.0030 | 0.0487 | 0.6923 | 0.5859 | 0.7740
Dataset D | CEEMD-FTS-MOGWO-LSTM | −0.1282 | 0.2569 | 0.3272 | 0.0039 | 0.0491 | 0.9993 | 0.9991 | 0.0254 | 0.5921 | 0.7179 | 0.9280
Table 8. Experimental outcomes of the proposed forecasting system and other models (Experiment III, Wednesday).

Model | AE | MAE | RMSE | NMSE | MAPE | IA | FB | U1 | U2 | DA | R
GRNN | 45.5683 | 196.0143 | 241.4824 | 0.0018 | 0.0347 | 0.9996 | −0.0078 | 0.0204 | 0.7029 | 0.3535 | 0.9478
WNN | 20.7296 | 168.5709 | 205.6567 | 0.0014 | 0.0302 | 0.9997 | −0.0035 | 0.0174 | 0.7651 | 0.6667 | 0.9604
CNN | 137.5887 | 229.2342 | 239.6551 | 0.0025 | 0.0403 | 0.9994 | −0.0233 | 0.0247 | 0.7953 | 0.4440 | 0.9362
DBN | 19.7873 | 191.6701 | 243.2305 | 0.0019 | 0.0341 | 0.9996 | −0.0034 | 0.0206 | 0.7248 | 0.2727 | 0.9444
SAE | 101.5634 | 175.8553 | 214.9376 | 0.0013 | 0.0304 | 0.9997 | −0.0173 | 0.0181 | 0.7351 | 0.4343 | 0.9652
LSTM | −161.6647 | 172.3742 | 205.9165 | 0.0011 | 0.0281 | 0.9997 | 0.0281 | 0.0177 | 0.7425 | 0.7576 | 0.9916
FTS-LSTM | 47.6418 | 96.6102 | 115.9360 | 0.0004 | 0.0171 | 0.9999 | −0.0081 | 0.0098 | 0.5328 | 0.7071 | 0.9904
MOGWO-LSTM | −83.5761 | 106.2048 | 129.7533 | 0.0004 | 0.0176 | 0.9999 | 0.0144 | 0.0111 | 0.5570 | 0.7677 | 0.9918
FTS-MOGWO-LSTM | 26.9179 | 84.9093 | 104.1058 | 0.0003 | 0.0146 | 0.9999 | −0.0046 | 0.0088 | 0.4629 | 0.7677 | 0.9903
EMD-FTS-MOGWO-LSTM | 52.6362 | 67.7765 | 79.8200 | 0.0002 | 0.0116 | 1.0000 | −0.0090 | 0.0068 | 0.4940 | 0.7374 | 0.9968
EEMD-FTS-MOGWO-LSTM | 11.3961 | 55.9408 | 69.4255 | 0.0001 | 0.0096 | 1.0000 | −0.0020 | 0.0059 | 0.4142 | 0.8283 | 0.9957
CEEMD-FTS-MOGWO-LSTM | −29.7311 | 47.5537 | 64.3627 | 0.0001 | 0.0083 | 1.0000 | 0.0051 | 0.0055 | 0.4030 | 0.9293 | 0.9970
The values in bold indicate the best value for each metric.
Table 9. Experimental outcomes of the proposed forecasting system and other models (Experiment III, Sunday).

Model | AE | MAE | RMSE | NMSE | MAPE | IA | FB | U1 | U2 | DA | R
GRNN | 17.5396 | 149.6115 | 190.1444 | 0.0013 | 0.0284 | 0.9997 | −0.0033 | 0.0179 | 0.6929 | 0.3838 | 0.9376
WNN | 38.7488 | 103.6095 | 137.2156 | 0.0006 | 0.0194 | 0.9998 | −0.0073 | 0.0129 | 0.6450 | 0.7071 | 0.9711
CNN | 16.6657 | 162.4187 | 200.3068 | 0.0013 | 0.0302 | 0.9997 | −0.0032 | 0.0189 | 0.7094 | 0.3737 | 0.9308
DBN | 8.2873 | 163.9351 | 205.5054 | 0.0015 | 0.0310 | 0.9996 | −0.0016 | 0.0194 | 0.7268 | 0.3232 | 0.9234
SAE | 6.4847 | 132.5766 | 173.3199 | 0.0010 | 0.0244 | 0.9998 | −0.0012 | 0.0163 | 0.6732 | 0.4949 | 0.9487
LSTM | −2.8595 | 92.7390 | 113.8205 | 0.0004 | 0.0175 | 0.9999 | 0.0005 | 0.0107 | 0.6788 | 0.8081 | 0.9916
FTS-LSTM | −20.6497 | 77.1604 | 100.2723 | 0.0003 | 0.0143 | 0.9999 | 0.0039 | 0.0095 | 0.5844 | 0.8485 | 0.9910
MOGWO-LSTM | −4.2641 | 78.6017 | 98.7397 | 0.0003 | 0.0148 | 0.9999 | 0.0008 | 0.0093 | 0.6091 | 0.8485 | 0.9917
FTS-MOGWO-LSTM | 1.9840 | 64.8811 | 82.8495 | 0.0002 | 0.0122 | 0.9999 | −0.0004 | 0.0078 | 0.5073 | 0.8586 | 0.9907
EMD-FTS-MOGWO-LSTM | −6.8800 | 62.4585 | 79.6677 | 0.0002 | 0.0118 | 0.9999 | 0.0013 | 0.0075 | 0.5218 | 0.7778 | 0.9904
EEMD-FTS-MOGWO-LSTM | −8.9363 | 47.3082 | 60.5872 | 0.0001 | 0.0088 | 1.0000 | 0.0017 | 0.0057 | 0.4453 | 0.8586 | 0.9960
CEEMD-FTS-MOGWO-LSTM | −40.7260 | 41.1810 | 47.4667 | 0.00008 | 0.0079 | 1.0000 | 0.0078 | 0.0045 | 0.3802 | 0.8687 | 0.9990
The values in bold indicate the best value for each metric.
Table 10. Results for the Diebold–Mariano (DM) test.

Model | Dataset A | Dataset B | Dataset C | Dataset D | Dataset E | Dataset F | Average
GRNN | 3.9575 | 3.6231 * | 4.0912 * | 5.1216 * | 6.8446 * | 6.3102 * | 7.4871 *
WNN | 4.2354 * | 3.8835 * | 4.0655 * | 5.3478 * | 7.0481 * | 5.5907 * | 5.0285 *
CNN | 6.4340 * | 4.4792 * | 5.4704 * | 5.1895 * | 6.6125 * | 6.8757 * | 5.8436 *
DBN | 5.8905 * | 5.1538 * | 4.0607 * | 5.7001 * | 5.7287 * | 6.2989 * | 5.4721 *
SAE | 5.9415 * | 4.7566 * | 4.7852 * | 5.5428 * | 7.7388 * | 6.3725 * | 5.8563 *
LSTM | 4.0334 * | 3.8478 * | 4.3132 * | 4.4067 * | 7.9982 * | 6.4910 * | 5.1817 *
FTS-LSTM | 4.9204 * | 4.6690 * | 5.3686 * | 4.1885 * | 5.1504 * | 4.8412 * | 4.8064 *
MOGWO-LSTM | 3.6338 * | 3.8032 * | 3.9883 * | 4.0228 * | 5.4912 * | 5.5515 * | 4.4151 *
FTS-MOGWO-LSTM | 4.4712 * | 4.1507 * | 3.8489 * | 3.8727 * | 4.0104 * | 4.5032 * | 4.1429 *
EMD-FTS-MOGWO-LSTM | – | – | – | – | 1.9867 * | 4.7634 * | 3.3751 *
EEMD-FTS-MOGWO-LSTM | – | – | – | – | 0.8353 | 2.5458 * | 1.6901 *
CEEMD-FTS-MOGWO-LSTM | – | – | – | – | – | – | –
* Indicates the 1% significance level.
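The DM values in Table 10 compare each benchmark's forecast errors with those of the proposed model. A simplified one-step-ahead form of the Diebold–Mariano statistic (squared-error loss, no small-sample correction; naming is ours) can be sketched as:

```python
import math

def dm_statistic(errors_a, errors_b, power=2):
    """Diebold-Mariano statistic on the loss differential
    d_i = |e_a,i|**power - |e_b,i|**power. Under the null of equal
    accuracy, DM is asymptotically standard normal; a large positive
    value means model B is significantly more accurate than model A."""
    d = [abs(a) ** power - abs(b) ** power for a, b in zip(errors_a, errors_b)]
    n = len(d)
    d_bar = sum(d) / n
    var = sum((di - d_bar) ** 2 for di in d) / n  # population variance of d
    return d_bar / math.sqrt(var / n)
```

Read this way, the consistently positive starred entries in Table 10 indicate that the CEEMD-FTS-MOGWO-LSTM forecasts are significantly more accurate than each compared model at the 1% level.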
Table 11. Results for the Pearson's test.

Model | Dataset A | Dataset B | Dataset C | Dataset D | Dataset E | Dataset F | Average
GRNN | 0.7830 | 0.7252 | 0.5105 | 0.6845 | 0.9478 | 0.9376 | 0.7648
WNN | 0.6738 | 0.7726 | 0.5528 | 0.7516 | 0.9504 | 0.9711 | 0.7787
CNN | 0.5871 | 0.5558 | 0.2078 | 0.5892 | 0.9362 | 0.9308 | 0.6345
DBN | 0.7122 | 0.7284 | 0.5589 | 0.7419 | 0.9444 | 0.9234 | 0.7682
SAE | 0.7083 | 0.6608 | 0.6481 | 0.7227 | 0.9652 | 0.9487 | 0.7756
LSTM | 0.7915 | 0.7855 | 0.6241 | 0.7488 | 0.9916 | 0.9916 | 0.8221
FTS-LSTM | 0.7822 | 0.7632 | 0.6515 | 0.7582 | 0.9904 | 0.9910 | 0.8228
MOGWO-LSTM | 0.7848 | 0.7663 | 0.6341 | 0.7515 | 0.9918 | 0.9917 | 0.8200
FTS-MOGWO-LSTM | 0.7828 | 0.7663 | 0.6905 | 0.7740 | 0.9903 | 0.9907 | 0.8324
EMD-FTS-MOGWO-LSTM | – | – | – | – | 0.9968 | 0.9904 | 0.9936
EEMD-FTS-MOGWO-LSTM | – | – | – | – | 0.9957 | 0.9960 | 0.9959
CEEMD-FTS-MOGWO-LSTM | 0.9369 | 0.9737 | 0.9154 | 0.9280 | 0.9970 | 0.9990 | 0.9583
The values in bold indicate the best value for each metric.
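The coefficients in Table 11 are the standard Pearson correlation between the actual series and each model's forecast series; values closer to 1 indicate a tighter linear fit. A quick sketch (the function name is ours):

```python
import math

def pearson_r(actual, forecast):
    """Pearson's correlation coefficient between two equal-length series."""
    n = len(actual)
    a_bar = sum(actual) / n
    f_bar = sum(forecast) / n
    num = sum((a - a_bar) * (f - f_bar) for a, f in zip(actual, forecast))
    den = math.sqrt(sum((a - a_bar) ** 2 for a in actual)
                    * sum((f - f_bar) ** 2 for f in forecast))
    return num / den
```

Note that r measures linear association only: a forecast with a constant bias can still score r close to 1, which is why Table 11 is read alongside the error metrics of Tables 7–9.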
Table 12. Results for the discussion of improvement percentages (%).

Comparison | Metric | Dataset B | Wednesday | Average
MOGWO-LSTM vs. LSTM | MAE | −1.568719 | 38.387067 | 18.409174
MOGWO-LSTM vs. LSTM | RMSE | −0.619646 | 36.987420 | 18.183887
MOGWO-LSTM vs. LSTM | MAPE | 6.952491 | 37.366548 | 22.159520
MOGWO-LSTM vs. LSTM | U2 | 10.180262 | 24.983165 | 17.581714
FTS-LSTM vs. LSTM | MAE | −2.973542 | 43.953213 | 20.489836
FTS-LSTM vs. LSTM | RMSE | −3.881903 | 43.697567 | 19.907832
FTS-LSTM vs. LSTM | MAPE | 8.342990 | 39.145907 | 23.744449
FTS-LSTM vs. LSTM | U2 | 0.791078 | 28.242424 | 14.516751
FTS-MOGWO-LSTM vs. LSTM | MAE | 1.568719 | 50.741294 | 26.155007
FTS-MOGWO-LSTM vs. LSTM | RMSE | 2.387461 | 49.442711 | 25.915086
FTS-MOGWO-LSTM vs. LSTM | MAPE | 10.428737 | 48.042705 | 29.235721
FTS-MOGWO-LSTM vs. LSTM | U2 | 6.769550 | 37.656566 | 22.213058
FTS-MOGWO-LSTM vs. MOGWO-LSTM | MAE | 3.088981 | 20.051354 | 11.570168
FTS-MOGWO-LSTM vs. MOGWO-LSTM | RMSE | 2.988589 | 19.766357 | 11.377473
FTS-MOGWO-LSTM vs. MOGWO-LSTM | MAPE | 3.735990 | 17.045455 | 10.390723
FTS-MOGWO-LSTM vs. MOGWO-LSTM | U2 | −3.797286 | 16.894075 | 6.548395
FTS-MOGWO-LSTM vs. FTS-LSTM | MAE | 4.411096 | 12.111454 | 8.261275
FTS-MOGWO-LSTM vs. FTS-LSTM | RMSE | 6.035088 | 10.204078 | 8.119583
FTS-MOGWO-LSTM vs. FTS-LSTM | MAPE | 2.275601 | 14.619883 | 8.447742
FTS-MOGWO-LSTM vs. FTS-LSTM | U2 | 5.804507 | 13.119369 | 9.461938
CEEMD-FTS-MOGWO-LSTM vs. MOGWO-LSTM | MAE | 45.942835 | 55.224528 | 50.583680
CEEMD-FTS-MOGWO-LSTM vs. MOGWO-LSTM | RMSE | 48.378920 | 50.396098 | 49.387508
CEEMD-FTS-MOGWO-LSTM vs. MOGWO-LSTM | MAPE | 45.330010 | 52.840909 | 49.085461
CEEMD-FTS-MOGWO-LSTM vs. MOGWO-LSTM | U2 | 11.276350 | 27.648115 | 19.462233
CEEMD-FTS-MOGWO-LSTM vs. FTS-MOGWO-LSTM | MAE | 44.219791 | 43.994710 | 44.107251
CEEMD-FTS-MOGWO-LSTM vs. FTS-MOGWO-LSTM | RMSE | 46.788648 | 38.175683 | 42.482166
CEEMD-FTS-MOGWO-LSTM vs. FTS-MOGWO-LSTM | MAPE | 43.208279 | 43.150685 | 43.179482
CEEMD-FTS-MOGWO-LSTM vs. FTS-MOGWO-LSTM | U2 | 14.522187 | 12.940160 | 13.731174
CEEMD-FTS-MOGWO-LSTM vs. LSTM | MAE | 45.094830 | 72.412519 | 58.753673
CEEMD-FTS-MOGWO-LSTM vs. LSTM | RMSE | 48.059050 | 68.743301 | 58.401175
CEEMD-FTS-MOGWO-LSTM vs. LSTM | MAPE | 49.130940 | 70.462633 | 59.796786
CEEMD-FTS-MOGWO-LSTM vs. LSTM | U2 | 20.308650 | 45.723906 | 33.016278
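The entries in Table 12 are consistent with the improvement ratio 100 × (metric_compared − metric_proposed) / metric_compared; for example, the MOGWO-LSTM vs. LSTM MAE entry for Dataset B (−1.568719) follows from the Table 7 values 0.4271 and 0.4338. A one-line helper (the name is ours):

```python
def improvement_pct(metric_compared, metric_proposed):
    """Percentage improvement of the proposed model over a compared model;
    negative values mean the proposed model scored worse on that metric."""
    return 100 * (metric_compared - metric_proposed) / metric_compared
```

This convention makes the table directly comparable across metrics with very different scales (wind speed MAE in m/s versus load MAE in MW).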
Table 13. Results for the multistep-ahead forecasting.

Data | Multistep Ahead | Model | MAE | RMSE | NMSE | MAPE | IA | FB | U1 | U2 | DA | r
Dataset A | One-step | GRNN | 0.4650 | 0.5723 | 0.0154 | 0.1004 | 0.9970 | −0.0283 | 0.0562 | 0.6984 | 0.4949 | 0.7057
Dataset A | One-step | LSTM | 0.4364 | 0.5462 | 0.0138 | 0.0948 | 0.9973 | −0.0530 | 0.0529 | 0.6470 | 0.4343 | 0.7915
Dataset A | One-step | EEMD-FTS-MOGWO-LSTM | 0.2513 | 0.3215 | 0.0048 | 0.0524 | 0.9990 | 0.0101 | 0.0322 | 0.7000 | 0.7778 | 0.9478
Dataset A | One-step | CEEMD-FTS-MOGWO-LSTM | 0.2314 | 0.2964 | 0.0041 | 0.0487 | 0.9992 | 0.0041 | 0.0296 | 0.7439 | 0.7374 | 0.9359
Dataset A | Two-step | GRNN | 0.5546 | 0.6748 | 0.0203 | 0.1170 | 0.9959 | −0.0149 | 0.0666 | 0.8160 | 0.3232 | 0.5548
Dataset A | Two-step | LSTM | 0.4980 | 0.6034 | 0.0175 | 0.1058 | 0.9967 | 0.0023 | 0.0600 | 0.8169 | 0.4545 | 0.6754
Dataset A | Two-step | EEMD-FTS-MOGWO-LSTM | 0.3725 | 0.4655 | 0.0090 | 0.0730 | 0.9980 | 0.0586 | 0.0477 | 0.7563 | 0.6263 | 0.8897
Dataset A | Two-step | CEEMD-FTS-MOGWO-LSTM | 0.3121 | 0.3930 | 0.0072 | 0.0667 | 0.9986 | −0.0169 | 0.0387 | 0.5739 | 0.6566 | 0.8691
Dataset A | Three-step | GRNN | 0.5688 | 0.7183 | 0.0239 | 0.1262 | 0.9954 | −0.0601 | 0.0694 | 0.8542 | 0.4545 | 0.5725
Dataset A | Three-step | LSTM | 0.5449 | 0.6986 | 0.0213 | 0.1076 | 0.9956 | 0.0716 | 0.0719 | 0.8715 | 0.3535 | 0.6332
Dataset A | Three-step | EEMD-FTS-MOGWO-LSTM | 0.4327 | 0.5276 | 0.0122 | 0.0877 | 0.9975 | 0.0546 | 0.0539 | 0.7566 | 0.6162 | 0.8082
Dataset A | Three-step | CEEMD-FTS-MOGWO-LSTM | 0.3818 | 0.4891 | 0.0116 | 0.0840 | 0.9978 | −0.0405 | 0.0478 | 0.7496 | 0.5859 | 0.8332
Dataset F | One-step | GRNN | 149.6115 | 190.1444 | 0.0013 | 0.0284 | 0.9997 | −0.0033 | 0.0179 | 0.6929 | 0.3838 | 0.9376
Dataset F | One-step | LSTM | 92.7390 | 113.8205 | 0.0004 | 0.0175 | 0.9999 | 0.0005 | 0.0107 | 0.6788 | 0.8081 | 0.9916
Dataset F | One-step | EEMD-FTS-MOGWO-LSTM | 47.3082 | 60.5872 | 0.0001 | 0.0088 | 1.0000 | 0.0017 | 0.0057 | 0.4453 | 0.8586 | 0.9960
Dataset F | One-step | CEEMD-FTS-MOGWO-LSTM | 41.1810 | 47.4667 | 0.00008 | 0.0079 | 1.0000 | 0.0078 | 0.0045 | 0.3802 | 0.8687 | 0.9990
Dataset F | Two-step | GRNN | 192.6101 | 244.0532 | 0.0021 | 0.0368 | 0.9995 | −0.0069 | 0.0230 | 0.7653 | 0.3535 | 0.8958
Dataset F | Two-step | LSTM | 137.3114 | 171.6678 | 0.0010 | 0.0258 | 0.9998 | 0.0032 | 0.0162 | 0.6710 | 0.4848 | 0.9708
Dataset F | Two-step | EEMD-FTS-MOGWO-LSTM | 109.3356 | 151.5123 | 0.0007 | 0.0202 | 0.9998 | −0.0096 | 0.0142 | 0.6495 | 0.6061 | 0.9685
Dataset F | Two-step | CEEMD-FTS-MOGWO-LSTM | 56.0655 | 72.4324 | 0.0002 | 0.0105 | 1.0000 | −0.0010 | 0.0068 | 0.4913 | 0.7879 | 0.9909
Dataset F | Three-step | GRNN | 225.4602 | 287.1460 | 0.0029 | 0.0429 | 0.9993 | −0.0092 | 0.0270 | 0.8026 | 0.2525 | 0.8480
Dataset F | Three-step | LSTM | 225.4602 | 287.1460 | 0.0029 | 0.0429 | 0.9993 | −0.0092 | 0.0270 | 0.8026 | 0.2525 | 0.8480
Dataset F | Three-step | EEMD-FTS-MOGWO-LSTM | 139.4634 | 177.1613 | 0.0012 | 0.0272 | 0.9997 | −0.0209 | 0.0165 | 0.7375 | 0.5859 | 0.9673
Dataset F | Three-step | CEEMD-FTS-MOGWO-LSTM | 107.3264 | 133.4660 | 0.0007 | 0.0208 | 0.9999 | −0.0120 | 0.0125 | 0.6643 | 0.6364 | 0.9779
Table 14. Sensitivity analysis of different search agent numbers based on MOGWO.

Metric | 5 | 10 | 15 | 20 | 25 | 30
AE | −0.583443 | −0.020492 | −0.480544 | −1.462529 | −0.670987 | −1.334932
MAE | 0.613591 | 0.231376 | 0.492307 | 1.462529 | 0.680333 | 1.334932
RMSE | 0.680007 | 0.296408 | 0.540443 | 1.483524 | 0.730494 | 1.349639
NMSE | 0.019238 | 0.004058 | 0.011898 | 0.127156 | 0.022886 | 0.105523
MAPE | 0.119743 | 0.048723 | 0.096181 | 0.294995 | 0.133244 | 0.271309
IA | 0.995908 | 0.999202 | 0.997397 | 0.983123 | 0.995343 | 0.985745
FB | 0.124580 | 0.004128 | 0.101493 | 0.344632 | 0.144625 | 0.309906
U1 | 0.071900 | 0.029562 | 0.056484 | 0.172524 | 0.077915 | 0.154459
U2 | 0.974610 | 0.743881 | 0.851054 | 1.010876 | 0.974186 | 1.003426
DA | 0.575758 | 0.737374 | 0.626263 | 0.484848 | 0.565657 | 0.484848
r | 0.925713 | 0.935871 | 0.969081 | 0.964314 | 0.955029 | 0.970766
The values in bold indicate the best value for each metric.
Table 15. Sensitivity analysis of the different iteration numbers based on MOGWO.

Metric | 5 | 10 | 20 | 30 | 40 | 50
AE | −0.372790 | −0.020492 | 0.288276 | 0.091318 | 0.133480 | −0.398180
MAE | 0.402220 | 0.231376 | 0.310034 | 0.350980 | 0.355459 | 0.409639
RMSE | 0.474841 | 0.296408 | 0.362281 | 0.419625 | 0.438289 | 0.464493
NMSE | 0.008926 | 0.004058 | 0.006453 | 0.008568 | 0.009241 | 0.009084
MAPE | 0.077856 | 0.048723 | 0.068445 | 0.076189 | 0.077642 | 0.080769
IA | 0.997962 | 0.999202 | 0.998823 | 0.998390 | 0.998213 | 0.998088
FB | 0.077850 | 0.004128 | −0.056310 | −0.018190 | −0.026470 | 0.083373
U1 | 0.049091 | 0.029562 | 0.035048 | 0.041391 | 0.043116 | 0.048082
U2 | 0.767918 | 0.743881 | 0.716270 | 0.726236 | 0.744410 | 0.779216
DA | 0.656566 | 0.737374 | 0.666667 | 0.666667 | 0.646465 | 0.676768
r | 0.947467 | 0.935871 | 0.969309 | 0.850194 | 0.883903 | 0.955046
The values in bold indicate the best value for each metric.

Share and Cite

Wei, D.; Wang, J.; Ni, K.; Tang, G. Research and Application of a Novel Hybrid Model Based on a Deep Neural Network Combined with Fuzzy Time Series for Energy Forecasting. Energies 2019, 12, 3588. https://doi.org/10.3390/en12183588