Exploiting Deep Learning for Wind Power Forecasting Based on Big Data Analytics

Abstract: Power systems currently face the challenges of growing power demand, depleting fossil fuels and worsening environmental pollution (caused by carbon emissions from fossil fuel based power generation). The incorporation of alternative low carbon energy generation, i.e., Renewable Energy Sources (RESs), has become crucial for energy systems. Effective Demand Side Management (DSM) and RES incorporation enable power systems to maintain the demand-supply balance and optimize energy in an environmentally friendly manner. Wind power is a popular energy source because of its environmental and economic benefits. However, the uncertainty of wind power makes its incorporation in energy systems difficult. To mitigate the risk of demand-supply imbalance, an accurate estimation of wind power is essential. To address this challenging task, an efficient deep learning based prediction model is proposed for wind power forecasting. The proposed model has two stages. In the first stage, Wavelet Packet Transform (WPT) is used to decompose the past wind power signals. In addition to the decomposed signals and lagged wind power, multiple exogenous inputs (such as calendar variables and Numerical Weather Prediction (NWP)) are used as inputs to forecast wind power. In the second stage, a new prediction model, Efficient Deep Convolution Neural Network (EDCNN), is employed to forecast wind power. A DSM scheme is formulated based on forecasted wind power, day-ahead demand and price. The proposed forecasting model's performance was evaluated on big data of the Maine wind farm, ISO NE, USA.


Introduction
Due to the industrial revolution, power demand has increased and fossil fuels are used extensively, resulting in an alarming energy crisis [1]. To mitigate the energy crisis, regulatory acts that encourage the utilization of renewable energy are promoted worldwide. Wind power has recently attracted a lot of attention as a Renewable Energy Source (RES). It has gained popularity due to its wide availability, low investment cost [2] and absence of carbon emissions. Wind power helps in reducing environmental pollution [3] and is introduced worldwide as a way to reduce greenhouse gas emissions. Moreover, replacing thermal generation with wind generation leads to a fuel cost saving, as wind has zero fuel cost. According to the Global Wind Energy Council [4], the cumulative capacity of wind power reached 486 GW across the global market in 2016. Wind power is expected to significantly ...
Wind power has a chaotic nature. Therefore, the incorporation of wind power in power supply systems is a risky task. Wind power forecasting is the most popular method for mitigating this risk. Wind power is forecasted using classical, statistical, data mining [9,20–28] and artificial intelligence methods [1,29–34]. The accuracy of wind power forecasting is important to avoid demand-supply imbalance. Therefore, researchers are still competing to improve the wind power forecasting accuracy.
In the literature, there are two types of wind power forecasting techniques: (1) Time series (univariate): Past generation data are used to predict future generation [12,24,25,36]. Univariate data are decomposed to make them multidimensional. Generally, data are decomposed by Discrete Wavelet Transform (DWT), Empirical Mode Decomposition (EMD) or Wavelet Packet Transform (WPT).
Artificial Neural Networks (ANNs) have been widely used for modeling highly fluctuating wind power data [1,29–32]. In [29], the authors forecasted wind power using an ensemble ANN: the wind power time series is decomposed using DWT, relevant features are selected using conditional mutual information, and the ensemble ANN is used for short-term wind power prediction. A Gaussian process based ensemble ANN is implemented in [30], where five Gaussian processes and 52 ANN sub-models are used to predict 48 h wind power. The authors of [1] proposed a bidirectional Extreme Learning Machine (ELM) for 6 h ahead Wind Power Forecasting (WPF), with the Nelder-Mead simplex optimization algorithm used for the ELM's learning. ANNs combined with optimization techniques show a reasonable forecasting accuracy. However, ANNs have a few limitations, such as over-training, sensitivity to the initial parameter settings and instability. The aforementioned methods are shallow learners and are therefore unable to learn the deep underlying structures hidden in the wind power data. Deep learning methods were introduced to overcome this limitation. Deep Neural Networks (DNNs) can model abstract features hidden in the data, and deep learning models have achieved better accuracy in WPF than ANN forecasting models [33–37]. The popular DNN methods used for WPF are Deep Belief Networks (DBNs) [33], Recurrent Neural Networks (RNNs) [34], Long Short-term Memory (LSTM) [35] and Convolution Neural Networks (CNNs) [36,37].
In [33], ensemble DBNs are utilized as the wind power forecasting model. The wind power time series is decomposed by EMD and predicted by DBN. The building blocks of a DBN are Restricted Boltzmann Machines (RBMs); several RBMs are stacked together to construct a DBN. The DBN training process consists of two main steps: greedy layer-wise pre-training and fine-tuning. As the number of inputs increases, the DBN's computational complexity increases. The authors of [34] combined the RNN and an infinite feature selection technique to address the WPF problem. The RNN has a recurrence operation and maintains data in its memory cells. CNN is superior to the DBN and RNN due to its shorter training time and efficient feature mining. CNN is a state-of-the-art deep learning method. A characteristic of the CNN is that it extracts spatial features automatically. It is the most popular method for extracting features from images and is widely used in the field of computer vision. The efficient feature extraction capability of CNN motivates us to exploit it for wind power forecasting. CNN successfully extracts the spatiotemporal correlations in wind power data [36,37]. Wang et al. proposed an ensemble CNN model [36], in which the wind power time series is decomposed by DWT and short-term wind power is predicted using the ensemble CNN. In the ensemble CNN, multiple CNNs are used to predict each data point, and the final prediction is a (weighted) vote over the predictions made by all the CNNs. In [37], an enhanced CNN is proposed for WPF. A new activation function, Scaled Exponential Linear Unit (SELU), is proposed, and NWP inputs are used for short-term wind power forecasting. The aforementioned CNN based prediction models perform reasonably well. However, the effect of using both decomposed and exogenous inputs simultaneously on the accuracy of the prediction model still needs to be investigated.
To the best of our knowledge, decomposed data and NWP have not been used simultaneously as inputs for predicting wind power. Therefore, in this paper, a wind power prediction method is proposed that takes wavelet packet decomposed past wind power, NWP and lagged wind power data as input. The second objective of this research work is optimal load profiling with the incorporation of wind generation. Previously, the optimal load profile was achieved by load forecasting [38], price-demand forecasting [39] or load and generation forecasting [40]. In this work, the wind generation is also considered, in addition to day-ahead demand and price. Here, optimal means that the goal is to achieve a load profile that reduces the generation from dispatchable sources in an economical manner. A DSM algorithm is proposed on the basis of day-ahead demand, Locational Marginal Price (LMP) and wind power forecasting. The wind power forecast and day-ahead demand of the micro grid (MG) are used to calculate the difference between the load demand and the wind generation. The load is adjusted by shifting it to the low consumption time (valley filling). Thus, the peak periods' load is clipped and the valley periods are filled. The day-ahead LMP is used to calculate the day-ahead consumption cost. In this way, the objectives of energy management and DSM are achieved.

Contributions
In this paper, we are concerned with the problems of predicting wind power and of demand side management with the incorporation of wind power, demand and price. The contributions of this research work are listed below:

1. A novel big data-driven wind power prediction model is proposed that combines the strengths of both the univariate and multivariate wind power forecasting techniques by using decomposed and exogenous inputs for forecasting; consequently, the forecasting accuracy is significantly enhanced.
2. The proposed model employs an existing method, wavelet packet decomposition, and an enhanced method, Efficient DCNN (EDCNN), for feature extraction and forecasting, respectively.
3. A DSM algorithm is also proposed. The proposed DSM algorithm takes into account the day-ahead demand, day-ahead price and wind power.
4. The proposed DSM algorithm reduces the consumption cost and improves the load profile to almost a normal shape.

Proposed Model
The proposed model for forecasting wind power generation (as shown in Figure 1) and the proposed DSM algorithm are discussed in this section.

Data Preprocess
The features and targets (wind power) are normalized using min-max normalization. The inputs to the forecasting model are shown in Table 2. Three types of inputs are given to the forecasting model: (i) NWP, i.e., dew point temperature, dry bulb temperature, and wind speed; (ii) past lagged values of wind power; and (iii) wavelet packet decomposed wind power. The wavelet decomposition is described in the next section.
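The min-max normalization mentioned above can be sketched as follows. This is a minimal illustration rather than the authors' MATLAB code; the function names and the sample values are hypothetical, and the [0, 1] target range is an assumption.

```python
def min_max_normalize(values):
    """Scale a sequence to [0, 1]; also return (min, max) so forecasts can be mapped back."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A constant series carries no scale information; map it to zeros.
        return [0.0 for _ in values], (lo, hi)
    return [(v - lo) / span for v in values], (lo, hi)


def min_max_denormalize(scaled, bounds):
    """Invert min_max_normalize using the stored (min, max) pair."""
    lo, hi = bounds
    return [s * (hi - lo) + lo for s in scaled]


# Hypothetical hourly wind power values (MW), for illustration only.
wind_mw = [12.0, 30.0, 48.0, 21.0]
scaled, bounds = min_max_normalize(wind_mw)
restored = min_max_denormalize(scaled, bounds)
```

Keeping the (min, max) pair from the training data is what allows the normalized forecasts to be converted back to physical units.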

[Figure 1: proposed wind power forecasting model; surviving labels: Regression Layer, Wind Power Forecast]

Feature Engineering
The historical wind power signal is decomposed using WPT. The WPT is a generalization of wavelet decomposition that offers a richer signal analysis. It was introduced in 1992 by Coifman and Wickerhauser [43]. Unlike DWT waveforms, the WPT waveforms or packets are indexed by three parameters: frequency, in addition to position and scale as in the DWT. For every orthogonal wavelet function, multiple wavelet packets are generated, having different bases (Figure 2). With the help of these bases, the input signal can be encoded in such a way that the global energy of the signal is preserved and the exact signal can be reconstructed. Multiple expansions of an input signal can be obtained using WPT. The most suitable decomposition is selected by calculating an entropy (e.g., Shannon entropy), which yields the minimal representation of the relevant data with respect to a cost function. The benefit of the WPT is its ability to analyze signals at different temporal as well as spatial positions. For a highly nonlinear and oscillating signal such as wind power, DWT does not guarantee good results [44]. In WPT, both the approximation and detail coefficients are further decomposed into approximation and detail coefficients as the wavelet tree grows deeper. The wavelet packet decomposition operation can be expressed by Equations (1) and (2). For a signal a to be decomposed, two filters of size 2N are applied on a. The filters corresponding to the wavelet are h(n) and g(n).
In these equations, the scaling function is W_0(a) = φ(a) and the wavelet function is W_1(a) = ψ(a). The past wind power signal is decomposed into 36 signals and the best representation of the input signal is selected through Shannon entropy.
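For reference, the standard wavelet packet recursion that matches the definitions above (two filters h and g of length 2N, with W_0 = φ and W_1 = ψ) is given below. It is the textbook form and is assumed, not confirmed, to correspond to Equations (1) and (2):

```latex
\begin{align}
W_{2n}(a)   &= \sqrt{2} \sum_{k=0}^{2N-1} h(k)\, W_{n}(2a - k), \\
W_{2n+1}(a) &= \sqrt{2} \sum_{k=0}^{2N-1} g(k)\, W_{n}(2a - k).
\end{align}
```

Setting n = 0 recovers the usual two-scale relations for the scaling function and the wavelet.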
After decomposing the past wind signals, the engineered features along with NWP variables (dew point, dry bulb, and wind speed), lagged wind power (w-24 and w-25) and time are input to the proposed forecasting model. The proposed forecasting model is discussed in the next section.
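The decomposition described in this section can be illustrated with a small sketch. It assumes the Haar wavelet and a full tree of fixed depth purely for brevity; the source does not state which wavelet or tree produces the 36 signals, and all names and sample values here are hypothetical.

```python
import math


def haar_split(signal):
    """One Haar analysis step: return (approximation, detail), each half the input length."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def wavelet_packet_level(signal, level):
    """Full wavelet packet tree: unlike the DWT, BOTH branches are split at every level."""
    nodes = [list(signal)]
    for _ in range(level):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_split(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return nodes


def shannon_entropy(coeffs):
    """Non-normalized Shannon entropy cost, -sum(c^2 * log(c^2)), skipping zero coefficients."""
    return -sum(c * c * math.log(c * c) for c in coeffs if c != 0.0)


# Hypothetical past wind power samples (length divisible by 2**level).
power = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 3.0]
packets = wavelet_packet_level(power, 2)
costs = [shannon_entropy(p) for p in packets]
```

A best-basis search would compare the entropy cost of a parent node against the summed cost of its children and keep the cheaper representation; the sketch only computes the per-node costs.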

Efficient DCNN
Modified CNNs are widely used for forecasting [45]. An enhanced CNN for wind power forecasting is discussed below. The inputs are given to the EDCNN for predicting day-ahead hourly wind power (24 values). First, the functionality of a conventional CNN is discussed in this section. Second, the proposed method, EDCNN, is explained.
CNN is a computational model of the functionality of the human visual cortex. It has an excellent capability of extracting deep underlying features of data. The CNN effectively identifies the spatially local correlations in data through the convolution operation. In the convolution operation, a filter is applied to a block of spatially adjacent neurons (Figure 3) and the result is passed through an activation function. The output of the convolution layer becomes the input to the next layer's neurons. Thus, the input to every neuron of a layer is the output of a convolved block of the previous layer. Unlike ANN training, CNN training is efficient because the weight sharing scheme reduces the number of parameters to be learned. CNN is composed of three alternating layer types: (i) convolution layer; (ii) sampling layer; and (iii) fully connected layer. The convolution operation can be explained by Equation (3). Suppose X = [x_1, x_2, x_3, ..., x_n] is the vector of training samples and C = [c_1, c_2, c_3, ..., c_n] is the vector of corresponding targets, where n is the number of training samples. CNN attempts to learn the optimal filter weights and biases that minimize the forecasting error. CNN can be defined by Equation (3), where i = 1, 2, ..., n and m = 1, 2, ..., M. M is the number of layers to be learned. The filter weights of the mth layer are denoted by w^m, b^m represents the corresponding biases and ⊗ is the convolution operator. f(·) is the nonlinear activation function. Y_i^m is the feature map generated by sample X_i at layer m.
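A single convolution layer of the kind described above can be sketched in a few lines. The sketch assumes a one-dimensional input, no padding and a ReLU activation, and it computes a layer output of the form f(x ⊗ w + b); the filter, bias and input values are hypothetical.

```python
def conv1d_valid(x, w, b):
    """Slide filter w over x without padding; the same weights are shared by every position."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) + b for i in range(len(x) - k + 1)]


def relu(values):
    """Rectified Linear Unit applied element-wise."""
    return [max(0.0, v) for v in values]


def conv_layer(x, w, b):
    """Feature map of one layer: activation applied to the filtered input plus bias."""
    return relu(conv1d_valid(x, w, b))


# Hypothetical input block and filter.
x = [1.0, 2.0, 4.0, 3.0, 1.0]
w = [1.0, 0.0, -1.0]
feature_map = conv_layer(x, w, 0.5)
```

As in most deep learning libraries, the filter is applied without flipping (a cross-correlation). Weight sharing is visible in the fact that the three values in `w` produce every entry of the feature map.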
In the proposed forecasting method EDCNN, there are eleven layers: three convolution layers, three max pooling layers, two batch normalization layers, three ReLU (Rectified Linear Unit) layers, one modified fully connected layer and one modified output layer (Enhanced Regression Output Layer (EROL)). The number of filters in all convolution layers is 9. The number of neurons in all the hidden layers is 200. The functionality of two layers is modified to improve the forecasting performance of EDCNN. According to the ANN literature, there is no standard way to choose an optimal activation function. A modified activation function is therefore employed in a hidden layer. The proposed activation function is an ensemble of three activation functions: hyperbolic tangent, sigmoid and radial basis function (Equations (4)-(6), respectively). The proposed activation function, Equation (7), takes the average of the results of these three activation functions.
In these equations, xw is the intermediate output of a network layer (the weighted sum of the inputs) to which the activation is applied to obtain the final output, and φ is the radial basis function. The proposed activation function averages the three aforementioned functions to calculate the output of the corresponding hidden layer.
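The averaged activation can be sketched as below. The hyperbolic tangent and sigmoid are standard; the Gaussian form of the radial basis function is an assumption made for illustration, since its exact definition is not reproduced in the text.

```python
import math


def sigmoid(z):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))


def gaussian_rbf(z):
    """Assumed radial basis function: a zero-centred Gaussian with unit width."""
    return math.exp(-z * z)


def ensemble_activation(z):
    """Average of tanh, sigmoid and the (assumed) radial basis function at pre-activation z."""
    return (math.tanh(z) + sigmoid(z) + gaussian_rbf(z)) / 3.0


# z plays the role of xw, the weighted sum of inputs of one hidden neuron.
outputs = [ensemble_activation(z) for z in (-2.0, 0.0, 2.0)]
```

Averaging keeps the output bounded, because each of the three component functions is bounded.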
In the proposed output layer EROL, a modified objective function is embedded. The objective is to minimize the absolute percentage error between the forecast values and actual targets. The objective is expressed by Equation (8), in which L(w, X_i, c_i) is the forecasting error or loss from sample X_i. The loss function is expressed by Equation (9), in which c_i is the desired or actual target and Y_i is the output of the output layer of EDCNN, calculated as Y_i = F(Σ_{i=1}^{n} X_i w_i).
After the wind power is forecasted, it is used in the DSM algorithm. The day-ahead Locational Marginal Price (LMP), day-ahead demand and forecasted wind power are the inputs to the proposed DSM algorithm. The proposed DSM algorithm is applied to the data of a smart grid-connected micro grid. The system description is presented in the next section.

System Description
A micro grid (MG) with a wind power plant that is connected to a smart grid (SG) is studied in this article. For the MG's load management, three parameters are utilized: (i) wind power forecast; (ii) day-ahead demand/load; and (iii) day-ahead LMP. The LMP is the price of energy purchased from the SG in the case of insufficient generation of wind power. For the wind power generation, the following cases are possible:

Case 1
The first and simplest case is when the generated wind power is equal to the load. There is no gap between the generation and demanded power. In this case, no energy is required to be purchased from the SG. MG is self-sufficient.

Case 2
The wind power generated in the MG is greater than the required power. In this case, the excess power is transmitted to the SG.
In the corresponding expression, P_G is the active power, W is the wind power, L is the load and the transmission process is denoted by the symbol →. In exchange for this energy, the SG gives the MG a subsidy on the price of the energy it purchases in the following 24 h.

Case 3
Another case is when there is either no wind power or less wind power than the demand. In this case, the MG has to purchase the required power from the SG. If there is a subsidy on the price from the past, the price is reduced; otherwise, the actual price is paid for purchasing energy. Generally, a 10-15% concession on the energy price is offered as a subsidy. In this case, the proposed demand management algorithm is applied to achieve the objectives listed below:
- Load factor maximization
- Consumption cost minimization
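The three cases can be summarized by a small helper that decides, for one hour, how much power flows to or from the SG. This is only an illustrative sketch; the function name and return layout are hypothetical.

```python
def exchange_with_grid(wind, load):
    """Return (case, surplus_sent_to_sg, purchase_from_sg) for one hour, in the unit of the inputs."""
    if wind == load:
        return 1, 0.0, 0.0          # Case 1: MG is self-sufficient
    if wind > load:
        return 2, wind - load, 0.0  # Case 2: excess wind power is sent to the SG
    return 3, 0.0, load - wind      # Case 3: the shortfall is purchased from the SG


# Hypothetical hourly (wind, load) pairs in MW.
hours = [(50.0, 50.0), (70.0, 50.0), (30.0, 50.0)]
decisions = [exchange_with_grid(w, l) for w, l in hours]
```

Only Case 3 triggers the demand management step, because it is the only case in which energy must be bought at the LMP.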

Problem Formulation
The wind power is forecasted for 24 h. The first objective is to maximize the load factor for maximum utilization of the power resource (RES generation from the wind power plant). The second objective is to minimize the consumption cost.
Here, LF is the load factor (Equation (13)) and C is the total consumption cost (Equation (14)).
where L_sum is the sum of total load, L_avg is the average load, L is the load vector, P is the LMP vector (in $/MWh) and n is the length of the load and LMP vectors.
The system has three constraints. First, the total demanded load must be equal to the total load after applying the DSM scheme. Second, after applying the DSM, the consumption cost should be less than the initial cost. Third, the load factor must increase. The constraints are given by Equations (15)-(17), where L is the load before DSM and L_new is the load after applying DSM, C_old is the consumption cost before DSM and C is the cost after DSM, and LF_new is the load factor after DSM. The purpose of the proposed DSM scheme is to bring the consumption as close to the normal distribution curve as possible. Let the input vectors contain 24 values: W = wind power forecast, L = day-ahead demand and P = day-ahead LMP. The other variables used in the algorithm are: C = consumption cost, S = subsidy, DWD = demand-wind power difference, P_new = new adjusted price, and L_new = new normally distributed load after applying the DSM scheme.
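The objectives and the three constraints can be checked numerically as sketched below. The load factor is taken as average load divided by peak load, which is the usual definition and is assumed here to agree with Equation (13); the cost is the sum of hourly load times LMP. All names and sample values are hypothetical.

```python
def load_factor(load):
    """Average load divided by peak load (assumed definition)."""
    return (sum(load) / len(load)) / max(load)


def consumption_cost(load, lmp):
    """Total cost: hourly load (MWh) times hourly LMP ($/MWh), summed over all hours."""
    return sum(l * p for l, p in zip(load, lmp))


def satisfies_constraints(load_old, load_new, lmp, tol=1e-6):
    """Check the three DSM constraints: same total load, lower cost, higher load factor."""
    same_total = abs(sum(load_old) - sum(load_new)) <= tol
    cheaper = consumption_cost(load_new, lmp) < consumption_cost(load_old, lmp)
    flatter = load_factor(load_new) > load_factor(load_old)
    return same_total and cheaper and flatter


# Hypothetical 4-hour example: 5 MWh is moved from the priciest peak hour to the cheapest hour.
load_old = [10.0, 20.0, 30.0, 20.0]
load_new = [15.0, 20.0, 25.0, 20.0]
lmp = [20.0, 30.0, 50.0, 30.0]
feasible = satisfies_constraints(load_old, load_new, lmp)
```

In this toy example the cost drops from $2900 to $2750 and the load factor rises from about 0.67 to 0.80, so all three constraints hold.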
Manage_Demand(·) is the proposed function for managing demand in an economical manner. This function distributes the load in an approximately normal form by shaving the peak periods and filling the valley periods (Algorithm 1), so that the resultant load profile approximately follows the normal distribution.

Algorithm 1 (excerpt)
    L_new = Manage_Demand(DWD, L)    ▷ Managing demand to distribute it normally
    if S = 0.9 then    ▷ If there is a subsidy on the price, the price will be adjusted
    ...
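Since only an excerpt of Algorithm 1 is available here, the following is a hypothetical stand-in for Manage_Demand, not the authors' procedure. It clips a fixed fraction of the above-average demand-wind difference and redistributes it to below-average hours while keeping the total load unchanged. The `shiftable_fraction` parameter and the reading of S = 0.9 as a price multiplier (a 10% concession) are assumptions.

```python
def manage_demand(dwd, load, shiftable_fraction=0.2):
    """Hypothetical peak clipping and valley filling that preserves the total load."""
    mean_dwd = sum(dwd) / len(dwd)
    # Peak hours give up a fraction of their excess over the mean difference.
    clipped = [shiftable_fraction * (d - mean_dwd) if d > mean_dwd else 0.0 for d in dwd]
    # Valley hours receive the clipped energy in proportion to their deficit.
    deficits = [mean_dwd - d if d < mean_dwd else 0.0 for d in dwd]
    pool, total_deficit = sum(clipped), sum(deficits)
    if total_deficit == 0.0:
        return list(load)  # flat profile: nothing to shift
    return [l - c + pool * g / total_deficit for l, c, g in zip(load, clipped, deficits)]


def adjust_price(lmp, subsidy=1.0):
    """Apply the subsidy as a multiplier on the LMP (assumed meaning of S = 0.9)."""
    return [subsidy * p for p in lmp]


# Hypothetical 4-hour example.
load = [10.0, 20.0, 30.0, 20.0]
wind = [5.0, 5.0, 5.0, 5.0]
dwd = [l - w for l, w in zip(load, wind)]  # demand-wind power difference
load_new = manage_demand(dwd, load)
price_new = adjust_price([20.0, 30.0, 50.0, 30.0], subsidy=0.9)
```

The sketch only flattens the profile; it makes no attempt to fit a normal-distribution shape, and the shiftable fraction would in practice depend on how much consumer load can actually be moved.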

Results and Analysis
The proposed algorithms were implemented in MATLAB R2018a on a computer with a Core i3 processor, 4 GB RAM and a 500 GB hard disk.

Data Description
The three-year hourly data of wind power were taken from ISO New England's wind farm located in Maine. The duration of data utilized in this research was from January 2015 to December 2017. The data are publicly available for researchers on the ISO New England's website [46].

Wind Power Analysis
Wind power is a widely available RES; therefore, it is one of the most popular and emerging power generation sources. The predictive analytics were performed on wind power data of the Maine wind farms, ISO New England. According to the annual report, Maine wind farms annually produce approximately 900 MW, which contributes almost 14% of the total electricity in Maine. The wind power increases with the wind speed (the power available in the wind scales with the cube of the wind speed). In Maine, USA, the wind speed is affected by seasonality. The wind power in autumn is higher than in the other seasons, because the winds are fastest in the coastal area of Maine, where the wind turbines are installed.

EDCNN Performance Evaluation
EDCNN was compared with two models, namely typical CNN and SELU CNN [37], for wind power forecasting (Figure 4). For performance evaluation of wind power forecasting, three evaluation indicators were used: Mean Absolute Error (MAE), Normalized Root Mean Square Error (NRMSE) and Mean Absolute Percentage Error (MAPE) (Table 3). MAPE, NRMSE, and MAE are widely used to evaluate the performance of wind power forecasting models [22,26,47,48]. All the results shown in Figure 4 and Table 3 were taken on one day (24 h) of every season, i.e., 1 January (winter), 1 April (spring), 1 July (summer) and 1 October (autumn).
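The three indicators can be computed as in the sketch below. The normalization of the RMSE by the range of the actual values is an assumption, since normalizing by the mean or by the installed capacity is also common; the sample values are hypothetical. The MAPE function is the same quantity that the EROL objective minimizes.

```python
import math


def mae(actual, forecast):
    """Mean Absolute Error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def nrmse(actual, forecast):
    """Root mean square error divided by the range of the actual values (assumed normalization)."""
    mse = sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    return math.sqrt(mse) / (max(actual) - min(actual))


def mape(actual, forecast):
    """Mean Absolute Percentage Error in percent; actual values are assumed to be nonzero."""
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical actual and forecasted wind power (MW).
actual = [100.0, 200.0, 300.0]
forecast = [110.0, 190.0, 330.0]
scores = {"MAE": mae(actual, forecast),
          "NRMSE": nrmse(actual, forecast),
          "MAPE": mape(actual, forecast)}
```

Hours with zero actual generation make MAPE undefined, so they would have to be excluded or handled separately before applying this function to real wind farm data.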

Statistical Analysis of EDCNN
The aforementioned error indicators (Table 3) were utilized for the accuracy comparison of forecasting models. However, a lower error or higher accuracy of a model does not by itself guarantee its superiority over other models. A model is better than another model only if the difference between their accuracies is statistically significant. Different statistical tests are used to validate the significance of models, such as error analysis [49], the Friedman test [50] and the Diebold-Mariano (DM) test [51]. To validate the performance of the proposed forecasting model EDCNN, the well-known DM test was used. Diebold and Mariano proposed the classical Diebold-Mariano statistical test in 1995 [51]. The DM test evaluates whether the difference between the forecasting errors of two models is significant. The null hypothesis H_0 states that the models have equal accuracy (the value of d_t^{FM_1,FM_2} in Equation (18) is equal to zero). The alternative hypothesis H_1 is that one model is significantly more accurate than the other model (if the value of d_t^{FM_1,FM_2} in Equation (18) is greater than zero, Model 1 is better than Model 2).
The vector of values to be forecasted is X = [X_1, X_2, ..., X_n]. Two prediction models, FM_1 and FM_2, predict these values, and the forecasting errors of the two models are then computed. In this study, the error metric used for DM is MAE. A covariance loss function L(·) and the loss differential were calculated in DM as in Equation (18) [52]. DM is widely used for the validation of wind power forecasting [53]. DM was applied to the forecasting results of EDCNN and the two compared methods, CNN and SELU CNN [37]. Three comparisons were performed, i.e., EDCNN with CNN, EDCNN with SELU CNN and CNN with SELU CNN. The EDCNN was better than CNN and SELU CNN, and SELU CNN was better than CNN. The DM values and p-values at a confidence level of 95% are shown in Table 4.
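A one-step-ahead version of the DM test with an absolute-error loss can be sketched as follows. The loss differential is written so that a positive statistic favours Model 1, which follows the sign convention stated above. The use of only the lag-0 variance and of a normal approximation for the p-value are simplifying assumptions, and the sample data are hypothetical.

```python
import math


def diebold_mariano(actual, forecast_1, forecast_2):
    """Return (DM statistic, two-sided p-value) for one-step-ahead forecasts, absolute-error loss."""
    # Loss differential d_t = |e2_t| - |e1_t|: positive when model 1 has the smaller error.
    d = [abs(a - f2) - abs(a - f1) for a, f1, f2 in zip(actual, forecast_1, forecast_2)]
    n = len(d)
    d_bar = sum(d) / n
    variance = sum((x - d_bar) ** 2 for x in d) / n  # lag-0 autocovariance only
    if variance == 0.0:
        raise ValueError("loss differential is constant; the DM statistic is undefined")
    dm = d_bar / math.sqrt(variance / n)
    # Two-sided p-value under the standard normal approximation: 2 * (1 - Phi(|dm|)).
    p_value = math.erfc(abs(dm) / math.sqrt(2.0))
    return dm, p_value


# Hypothetical targets and two competing forecasts.
actual = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
model_1 = [10.5, 11.5, 14.5, 15.5, 18.5, 19.5]
model_2 = [11.0, 13.5, 12.0, 17.0, 20.5, 18.0]
dm_stat, p_val = diebold_mariano(actual, model_1, model_2)
```

For multi-step forecasts, the variance estimate would also need autocovariance terms up to lag h - 1, which this sketch omits.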
The significance level for the p-value is 5%. In the comparison of EDCNN and SELU CNN, the DM value is greater than zero, which indicates that the EDCNN model is significantly better than SELU CNN; similarly, EDCNN is better than CNN. According to the DM hypothesis H_1, if the DM value is greater than zero, the first model is significantly better than the second model. The results in Table 4 show that the forecasting accuracy of EDCNN is significantly better than that of SELU CNN and CNN, and that SELU CNN is significantly better than CNN.

Analysis of Proposed DSM Algorithm
The results of the proposed DSM algorithm are shown in Figure 5. It can be clearly seen that the load from peak hours is clipped and shifted to the off peak hours. The total power consumption, the power supplied by the MG and the power consumed from the SG are shown in Figure 5. The proposed DSM scheme was applied to the 24 h of 7 January 2017, because the wind power generation on that day was fairly reasonable and there was no hour with zero generation, which leads to a clear depiction of the DSM results.
The purpose of DSM is to reduce the consumption load in peak hours in order to minimize the usage of the dispatchable generators of the SG. The MG only has a wind power plant (WPP) and no dispatchable generators. If the wind generation is insufficient, the MG purchases energy from the SG. If the energy demand of the MG's consumers falls in the peak hours, the load of the MG is shifted from peak hours to off peak hours. It is assumed that the MG encourages its consumers to shift their load from peak hours to off peak hours by offering incentives and that the consumers shift their consumption accordingly, which leads to overall load shifting in the MG; consequently, the consumption cost of the consumers is reduced. The MG benefits by not purchasing more energy from the SG in peak hours (when the price is higher than in off peak hours), which also reduces the purchasing cost for the MG. In this manner, the consumers are satisfied and the MG has cost effective demand management. The proposed algorithm successfully shifts the load. In the proposed method, the load is shifted to off peak hours that are not late at night. This is suitable because late night hours are sleeping hours, when little electricity can be consumed. The goal of an almost normally distributed load profile is achieved. The load before DSM and after applying the proposed DSM algorithm is shown in Figure 6. The load profile after DSM is closer to the normal distribution than the profile before DSM.
An exactly normal distribution of load cannot be achieved because of the fixed working hours: the electricity consumption in working hours cannot be shifted to other hours in a way that yields a perfectly normal load profile. Only a portion of the load, known as the shiftable load, can be shifted. The goal is to shift the shiftable load to improve the load factor and reduce the price, which is achieved by applying the proposed DSM.
Another goal of the proposed DSM algorithm is reducing the consumption cost. When the load is shifted to off peak hours, the consumption cost decreases due to the low power price in off peak hours. The reduction in consumption cost achieved by the proposed DSM algorithm is presented in Table 5, which shows the cost before and after applying the DSM algorithm, together with the cost reduction and its percentage. On average, the total cost is reduced by 1.1% by applying the proposed DSM algorithm. When the proposed algorithm is applied to the 365 days of 2017, the consumption cost is reduced by approximately $2.25 million. The DSM results for one day's consumption cost from each of the four seasons are presented in Table 5, i.e., 1 January (winter), 1 April (spring), 1 July (summer) and 1 October (autumn). The results confirm the effectiveness of the proposed DSM algorithm, as it achieves both objectives: improving the load factor and reducing the consumption cost (as discussed in Section 6).

Conclusions and Future Work
This paper proposes a wind power forecasting scheme and a demand management strategy. To take part in the daily market that regulates the supply and demand in the Maine micro grid, a new demand management scheme is proposed that makes use of big data-driven wind power forecasting. Effective demand management depends on the forecasting accuracy. A deep learning technique, EDCNN, is developed to accurately predict the day-ahead hourly wind power on the Maine wind farm data. The numerical results validate the efficiency of the proposed model for wind power forecasting. The results also show that the proposed DSM method successfully redistributes the load, making the load profile almost normally distributed. Moreover, the proposed DSM algorithm effectively reduces the consumption cost.
In the future work, shorter forecasting times will be considered, e.g., 12 h ahead. The day will be divided into daytime and nighttime to determine the impact of current conditions on wind power forecasting.
The connection possibilities of the power grid under the operating conditions of several wind farms will be analyzed by considering the throughput of power lines, permissible voltage values in the nodes and the power balance in the area.
The power grid in the area will be mapped by modeling the power lines, transformers, sources and loads to determine the impact of wind farm capacity maximization on the operation of the power system's balance and stability.
The impact of wind power forecasting on the changes in power losses in the grid will also be determined in the future work.

Funding: This research received no external funding.

Conflicts of Interest:
The authors declare no conflict of interest.