Ultra-Short-Term Wind Power Prediction Based on Multivariate Phase Space Reconstruction and Multivariate Linear Regression

: In order to improve the accuracy of wind power prediction (WPP), we propose a WPP based on multivariate phase space reconstruction (MPSR) and multivariate linear regression (MLR). Firstly, the multivariate time series (TS) are constructed through reasonable selection of wind power and weather factors, which are closely associated with wind power. Secondly, the phase space of the multivariate time series is reconstructed based on the chaos theory and C-C method. Thirdly, an auto regression model for multivariate phase space is created by regarding phase variables as state variables, and the very-short-term wind power is predicted by using a multi-linear regression algorithm. Finally, a parallel algorithm based on map/reduce is presented to improve computing speed. A cloud computing platform, Hadoop consisting of ﬁve nodes, is established as a matter of convenience, followed by the prediction of wind power of a wind farm in the Hunan province of China. The experimental results show that the model based on MPSR and MLR is more accurate than both the continuous method and the simple approximation method, and the parallel algorithm based on map/reduce effectively accelerates the computing speed.


Introduction
In the past decades, with the increasing population, industrial need, and energy need [1], a large amount of fuels such as fossil oil, coal, and natural gas have been consumed. However, the fossil fuels can discharge a large amount of greenhouse gas and pollute the environment. What is more, the fossil fuels are non-renewable and diminishing day-by-day. Therefore, researchers have focused on the renewable energy sources, among which wind power generation is one of the most mature renewable energies with lower pollution and greenhouse gas emissions [2]. Wind power generation is affected by wind speed, wind direction, temperature, turbine type, terrain roughness, air density, and so on [3]. The wind power is random and intermittent. To reduce the risk that is caused by the wind power's fluctuation, both a one-day-ahead (0-24 h) and a real-time (15 min-4 h) wind power predicting report should be submitted to the Grid Dispatch Center in China. Actually, the ultra-short-term wind power prediction (UST-WPP) has been extensively employed in many fields, such as balancing load, optimal operation of reserves [4], and wind farm control [5]. The wind farm owners, power users, and facilities benefit from an improved wind power prediction (WPP).
According to the current law in China, the wind power prediction with horizons of 1-4 h and 24 h are necessary. This paper focuses on forecasting wind power with short horizons. Specifically, we The long-term and medium-term wind power predictions do not require very high forecasting accuracy. The short-term WPP forecasts the wind power in next three days and needs more precise results [7], while the ultra-short-term WPP, whose temporal resolution is 15 min, predicts the wind power in next 4 h and requires the highest precision [1]. It is very difficult to accurately forecast wind power because of its chaotic and stochastic characteristics. Additionally, compared to other WPP, the ultra-short-term WPP is more difficult due to its shorter time frames [7]. In the past decades, extensive efforts have focused on WPP, and a large number of wind power prediction methods, models, and tools have been developed. Generally, the WPP methods includes five categories: (a) physical methods, (b) statistic methods, (c) artificial intelligent methods, (d) hybrid methods, and (e) spatio-temporal methods [8].
The physical methods forecast the wind power in terms of the meteorological parameters such as topography, temperature, pressure, wind speed, and wind direction. The well-known physical wind power prediction systems are Prediktor system [9], Previento, and eWind. However, the physical methods are suitable to predict the wind power in 6-72 h or long-term wind power due to the low update frequency of numerical weather prediction (NWP). Because of the high computational cost of NWP, the physical models' application to ultra-short-term WPP are limited. The hybrid methods, which improve the WPP by making use of the advantages of physical methods, statistical methods, and intelligent methods, are gaining attention. Mehmet Baris Ozkan and Pinar Karagoz presented a novel wind power forecast model: Statistical Hybrid Wind Power Forecast Technique (SHWPFT). Compared with other statistical and physical models such as ANNs and SVM, SHWPFT requires less historical data to establish a model [10]. A hybrid forecasting model based on grey relational analysis and wind speed distributional features is presented to improve the effects of the ultra-short-term WPP [11]. Reference [12] provides an approach of short-term WPP with multiple observation points.
The speed and direction of wind are used to forecast the wind power; unfortunately, the error of numeric weather prediction (NWP) must be considered. Since the hybrid models also require the NWP, their application to ultra-short-term WPP is limited too. The statistical methods are widely used for the shorter-term wind power prediction because of the lower computational complexity and cost.
The statistical models can be modified based on the allocation and features of wind farms, and statistical models are widely used to forecast short-term wind power. Generally, the statistical models include the direct prediction and the indirect prediction. The wind speed is forecasted first and then is mapped to the wind power according to the wind power curve in indirect wind power prediction. However, the wind power curve is influenced by the environmental and meteorological variables, which are not considered in indirect methods. Consequently, the accuracy of indirect methods is limited. In direct methods, the wind power is predicted directly according to the historical data, and the wind power predictions have greater accuracy [13]. The statistical models are divided into time series (TS) methods and artificial intelligence (AI) methods. Persistence method is a classical TS method, and it assumes that the wind power at time 't + ∆t' equals the wind power at time 't'. Persistence method is more accurate than most physical and other statistical methods in short-term or ultra-short-term WPP. Hence, any new predicting method should be tested against the classical benchmark (persistence method). The other well-known TS models includes autoregressive (AR), autoregressive moving average (ARMA), autoregressive integrated moving average (ARIMA), and so on. However, these TS models are linear and it is difficult to capture the non-linear patterns. It is too difficult to directly create precise mathematical models for wind power prediction because of wind's complexity, so the artificial intelligence and machine learning methods are employed to improve WPP. Artificial neural network (ANN), Bayesian network (BN) and least square support vector regression (LSSVR) are widely used to improve the ultra-short-term WPP. Duehee Lee and Ross Baldick presented an ensemble model for short-term WPP based on Gaussian processes and neural networks [14] from which the predicted wind power and distribution can be obtained. The ensemble model is relatively accurate in the whole forecasting time horizon, and the predicted distribution objectively reflects the real distribution of wind power. The comparative study shows that the neural networks (NNs) outperform the ARIMA in [15]. A short-term WPP model based on wavelet support vector machine is proposed in reference [16], in which a wavelet support vector machine-based (WSVM) approach effectively reduces the negative effects on predicting results induced by errors of NWP data. Additionally, wind power can be forecasted when fewer training samples are accessed. Nevertheless, the universality should be further verified. The least square support vector machine (LSSVM) outperformed the SVM and NNs in terms of the computational complexity and the global convergence [17] in WPP. It is difficult to directly model the wind power time series that is composed of several components with different characteristics. To improve WPP, the wind power time series are decomposed by the wavelet transform (WT), empirical mode decomposition (EMD), and ensemble empirical mode decomposition (EEMD) in the preprocessing stage [18]. Some prior knowledge and assumptions are necessary to determine the mother wavelet in WT. EMD is a data-driven decomposition method, and outperforms WT. Compared with WT, EMD and its variant EEMD have the greatest accuracy. Although the non-linearity of wind power time series can be dealt with using decomposition methods, the chaotic nature of wind power time series can lead to prediction errors [19]. A method of wind power prediction based on chaotic theory and phase space reconstruction is proposed in reference [20]. Reference [13] proposed a short-term WPP framework based on chaotic time series analysis and singular spectrum analysis, and the accuracy is improved by distinguishing the chaotic and non-chaotic components in both decomposition and prediction stages. However, only the historical wind power data are used to forecast wind power in the phase-space-reconstruction-based WPP model.
According to the output, the statistical WPP is divided into probabilistic prediction and point prediction. To deal with wind ramp dynamics, a support-vector-machine-enhanced Markova model (SVMEM) for short-term WPP was presented by Lei Yang, Miao He, and Junshan Zhang et al. in which both distributional predictions and point predictions are derived [21]. Reference [14] has proved that the uncertainty of wind speed satisfies Gaussian distribution with zero mean and heteroscedasticity. An optimal loss function for heteroscedastic regression and a new framework of v-support vector regression (v-SVR) for learning tasks of Gaussian noise with heteroscedasticity were thus developed to improve the situation. Both the wind speed and wind power, however, satisfy other distribution but not the proposed model in reference [22]. The uncertainty of the wind power is modeled by the probabilistic prediction and prediction interval, which are the extension of point prediction. The point prediction is regarded as the most likely wind power. The probabilistic prediction has been applied to power system security [23], various unit commitment strategies [24], and energy storage sizing [25], etc.
The wind power data at each site are self-correlated individually, but also the wind power data at different sites are spatially cross-correlated. A sparsity-controlled vector autoregressive model is introduced to obtain the spatio-temporal wind power prediction in reference [26]. Although, the predicting methods are discussed in batch learning mode, a general overview of wind power prediction is presented in Table 2. The high penetration of wind power will definitely affect the reliability and security of the grid due to the fluctuation and randomness of wind power. Compared with the short-term WPP, the ultra-short-term WPP is more effective to solve these problems. Although the above mentioned works contribute a lot to the development of WPP, the ultra-short-term WPP is still a young area and needs more contributions from different areas.

Time-Series Similarity
Time-series similarity [27,28] is measured with many methods, such as Minkowski distance, Euclidean distance, Pearson coefficient, and dynamic time warping (DTW) distance [29,30]. Compared to other methods, DTW is not sensitive to synchronization and can measure the similarity of time-series with different lengths. Thus, DTW is employed to measure the similarity of numerical weather prediction series. DTW is defined as follows: Definition 1. Dynamic time warping (DTW) distance. X = {x 1 , x 2 , . . . , x l } and Y = {y 1 , y 2 , . . . , y n } are 2 time-series, where l and n are the lengths of time series X and Y, respectively. The DTW between X and Y is defined recursively as Equation (1).
where {} indicates empty time-series, d(x, y) =||x, y|| 2 is the 2-norm [31] of x and y, and rest(X) = {x 2 , . . . , x l }. Definition 2. Time-series similarity. X and Y are 2 time-series, and the time-series similarity between X and Y is defined as Equation (2).
where θ is a threshold, D dtw (X, Y) is the DTW between X and Y.

Multi-Variable Phase Space Reconstruction
According to Takens-theorem [32,33], the phase space of chaotic time series x 1 , x 2 , . . . , x n is reconstructed with appropriate embedding dimension m and time delay τ; the original system and the trajectory of reconstructed phase space are differential homeomorphism. The phase space reconstruction theory [34] of single-variable chaotic time series could be extended to multi-variable chaotic time series. The multi-variable phase space reconstruction is defined as follows.
where q = j, j + 1, . . . n, j = max 1≤i≤K {(m i − 1)τ i } + 1 and m i and τ i are embedding dimension and time delay of the chaotic time-series X i , respectively. The embedding dimension m i and time delay τ i are obtained using C-C algorithm [33].

Ultra-Short-Term WPP Modeling
In this paper the ultra-short-term WPP model is based on multi-variable phase space reconstruction, similarity of time-series, and linear regression. Our forecasting model includes steps steps, which are specifically presented in Figure 1. More details about the proposed ultra-short-term WPP model are listed below.
Step 1: Data preprocessing. The historical wind power data and weather data may be missing for some reason, for instance, equipment failure, network interruption, etc. The continuous missing data series Md, whose length is more than 5, is removed from historical data series. Otherwise, data is specified by simple interpolation, as expressed in Equation (4).
where x s is the data point before Md, x e is the data point behind Md, i = 1, 2, . . . , N, and N is the length of missing data series Md.
Step 2: Data dimensionality reduction. NWP includes some important parameters, for instance, wind speed v, wind direction θ, atmospheric pressure p, and temperature t. The speed and direction of wind are shown in Figure 2. In Figure 2, the x-axis is the latitude line of the given location, and the east is the positive direction. The y-axis is the longitude of the given location, and the north is the positon direction. In order to conveniently forecast wind power, wind speed ν is decomposed to ν x and ν y .
where θ is the angle between wind direction ν and the x-axis.
Only the most important parameters (ν x , ν y and p) of NWP are selected in the presented model. The historical NWP data can be expressed as a N × 3 matrix M, which is clearly shown as Equation (6).
where N is the number of NWP data points. The matrix M is converted to a N × 1 matrix using the approach in reference [2], as displayed in Equation (8).
where e is an eigenvector, whose corresponding Eigen value is the maximal Eigen value of matrix C.
Step 3: The most similar NWP series segments searching: NWP 0 is the NWP data series during 2L hours, which consist of two equal parts, namely, the former L hours and the latter L hours. NWP0 includes 8L data points (4 points per hour). NWP 0 is converted to X 0 , and the whole historical NWP data series is converted to X according to Equations (6)- (8). X is is the top K sub-series of X, which are most similar to X 0 , and S i is the similarity between X 0 and X is , i = 1, 2, . . . , K.
Step 4: The phase space reconstruction for multivariate time series. τ X and τ P are the delay time of X and P, respectively, and m X and m P are the embedding dimensions of X and P, respectively. τ X , τ P , m X and m P are achieved by C-C method. P i and X i (i = 0, 1, . . . , K) are the wind power sub-series and reduced NWP sub-series at the same time period, and Ph i is the reconstructed phase space of P i and X i .
The training process of linear regression models is to search for appropriate coefficient matrix C and constant term C 0 , so that residual error ξ minimizes.
The Equation (11) is further expressed as follows: Given a group of training samples < X i , Y i >, (i = 1, 2, . . . , K), the goal of the training process is to find appropriate C and C 0 , so that the mean error, namely, Equation (13) minimizes.
After the linear regression model is trained, Y is obtained from Equation (12) by assuming X is given.
Ph(i) is the i-th phase point of reconstructed phase space Ph. Considering that the reconstructed phase space is locally linear, Ph(i) and Ph(i + 1) are linearly related as follows.
The Equation (15) is obtained by substituting all the phase points of Ph into Equation (14).
Step 6: The comprehensive forecasting results. If Ph 1 ,Ph 2 , . . . , Ph k are the reconstructed phase space of the top K most similar time-series segments respectively, according to step 4, then C k is achieved by substituting Ph k into Equation (16). The multivariate linear regression modes are shown as Equation (17).
where Ph k (i) and Ph k (i + 1) are the input and output of the k-th linear regression model, k = 1, 2, . . . , k.
The predicted phase point is achieved when the phase points of the current NWP and wind power data are input into Equation (17). The predicted value of the k-th WPP model is shown as Equation (18).
where P k is the predicted value of wind power by the k-th regression model and Ph k (m P , i + 1) is the m P -th element of the predicted phase point Ph k (i + 1), k = 1, 2, . . . , K. The comprehensive forecasting sums the weighed P k as follows.
where S k is the similarity between X 0 and X is . The forecasting wind power series will be obtained when the above comprehensive forecasting process is executed iteratively.

Parallel Algorithm Based on Map/Reduce
To improve the ultra-short-term WPP, the multivariate phase space reconstruction, the similarities of time-series and the multi-variate linear regression are employed to model wind power prediction. However, the increasing wind power and NWP data increase the time and space complexity and affect the predicting accuracy of the ultra-short-term WPP. In order to accelerate the computing speed, we present a parallel algorithm of ultra-short-term WPP, which is based on the map/reduce programming model.

Map/Reduce Programming Model
Map/reduce is an extendable parallel programming model, which is widely used in the parallel computation of big data. The diagrammatic layout of map/reduce is shown in Figure 3. Map/reduce includes two phases: mapping and reducing. Every sub-process is highly parallel in these two phases. The specific procedures are as follows: 1.
The raw data are input into the key/value pairs (key, value), and the data are processed as much as possible without communication.

The Algorithm of Ultra-Short-Term WPP
The map/reduce model is widely used to process big data, however, a single map/reduce job only can complete simple jobs. To solve complex problems, the complex process is usually divided into some sub-processes, and the work is completed together with many map/reduce jobs. Due to the complexity of the ultra-short-term WPP, which is proposed in Section 3.2, the work is completed by five map/reduce jobs (see Algorithm 1). Algorithm 1. Parallel Ultra-short-term wind power prediction based on Map/reduce.

Job 1: Reducing dimension of NWP matrix. Map:
The NWP matrix M is separated into N sub-matrixes M i , where i = 1, 2, . . . , N. Each sub-matrix M i is a reduced dimension based on the method in Section 3.2.
(1) Each map process reads sub-matrix M i of M.
(2) The intermediate data pairs (0, i, X i is achieved by reducing the data dimension of each sub-matrix M i , according to Equation (8). Reduce: The reducing dimension series X is created by connecting in order all X i , which are from map process.
(1) X = {}; (2) For i = 1:N X i is added to the end of X; end for (3) Output X; For the sake of convenience, X 0 is the reduced NWP data series during 2L hours, which consists of two equal parts, namely, the former L hours and the latter L hours. X 0 is removed from X, and the remainder of X is divided into N sub-series, X 1 , X 2 , . . . , X N , where |X i ||≥ 8L , the head of X 1 is the first node in X, and the head of X i+1 is the (8L − 1)-th node of X i from the bottom, i = 1, 2, . . . , N − 1. Job 2: Searching top k sub-series, which are the most similar to X 0 .
(1) We compute the DTW between X 0 and X ij , which is X i 's sub-series including L elements and starting with the j-th element of Xi; (2) The intermediate data are key/value pairs (i, 'j, DTW') End for Reduce: (2) S k = θ − DTW ij ; (3) Output (k, S k ).
The sub-series are sorted by S k according to the method in reference [35], and the top k sub-series of NWP and wind power are selected. The top k most similar sub-series of NWP and wind power are denoted as X k and P k , where k = 1, 2, . . . , k.
The sub-series are sorted by S k according to the method in reference [35], and the top k sub-series of NWP and wind power are selected. The top k most similar sub-series of NWP and wind power are denoted as X k and P k where k = 1, 2, . . . , k. Job 3: The embedding dimension and delaying time of P and X are computed in this job.
For the sake of convenience, the time series P and X in our algorithm are denoted as P 1 and P 2 , respectively. Map: % calculating the starting index J i of P i (3) The intermediate data of this map are data pairs (0, "m i , τ i , J i "), where m i , τ i and J i are the embedding dimension, delaying time, and starting point of P i , respectively. Reduce: (3) Output m P , τ P , m X , τ X , J.
(2) The linear model is trained by < ℵ k,j , ℵ k,j+1 , where ℵ k,j is the j-th phase point of reconstructed phase space ℵ k , j = 1, 2, . . . , ℵ k − 1; (3) After the linear model has been trained, the next phase point is obtained according to Equation (12) and the last point of ℵ 0 . (4) The step 3 is executed iteratively, and then the forecasting reconstructed phase space ℵ P k is achieved. (5) The forecasting power sequence P kj is obtained based on an inverse process of the phase space reconstruction and reconstructed phase space ℵ P k . (6) The intermediate data are < 0, P kj , S k >.

Reduce:
The comprehensive wind power is given as follows.

The Experimental Data and Environments
The experimental data includes 2 sections: NWP data and wind power data. The NWP data ranged from 1 September 2012 to 31 August 2013 and were from the key laboratory of regional numerical weather predictions in a province of China. NWP data include temperature, humidity, wind speed, wind direction, atmosphere pressure, and so on. The NWP is from the mesoscale numerical prediction model. The temporal resolution of NWP is 1 h. The horizontal resolution of NWP is 3 × 3 km, and it is improved to 1 × 1 km by the dynamic downscaling model [36]. In our experiment, only the data of wind speed, wind direction, and atmosphere pressure were selected. The wind speed and wind direction were first converted into x components and y components, where the east and north are regarded as the positive direction of x-axis and y-axis, respectively. The converted NWP data are shown in Table 3. In Table 3, each row is an NWP data record, which is composed of the time, the wind speed in the x-component, the wind speed in the y-component, and the atmosphere pressure. simple interpolation method was used to get the 15-min NWP data. The wind power (15-min) from 1 September 2012 to 31 August 2013 was used to train the model. The Figure 4 shows the daily peak wind power in each month. In Figure 4, the x axis is time, y axis is the per-unit value of wind power. Because the wind power fluctuates severely at different times, the data are normalized based on Equation (18). (21) where P N is the normalized wind power, P is wind power, and P max is the peak daily wind power from 1 September 2012 to 31 August 2013.
To implement the experiment, we tried to establish an experimental cloudy computing platform Hadoop, which is composed of five nodes including a 4-core central processing unit (CPU) (Intel Core i5) and an Ubuntu operation system. The ultra-short-term wind power was individually predicted by continuous prediction method, simple approximation method, and other methods proposed in this paper.
In terms of continuous prediction method, the nearest observed values were regarded as the next predicting values, namely x i+1 = x i . The simple approximation method is another kind of continuous method based on phase space reconstruction, and the nearest observed phase points were regarded as the values of next phase point i, namely, X i+1 = X j+1 , where X j is the nearest phase point of X i . The error index was the normal mean absolute error (NMAE) in this paper.
where x i is the measured wind power, y i is the forecasting wind power, C is the installed capacity of wind farm, and N is the number of periods being forecasted. More details about NMAE are described in reference [26].

Results and Analysis of the Experiment
The embedding dimension and delaying time of wind power series P and NWP series X were computed by C-C method with 4000 points and 8000 points, respectively. The embedding dimension and delaying time of wind power series P were 5 and 18, respectively. The embedding dimension and delaying time of NWP series P were 6 and 17, respectively.
The number of the most similar sub-series could affect the ultra-short-term wind power prediction. To optimize the number of most similar sub-series, the number of most similar sub-series ranged from 1 to 10 in the experiment. The experimental results are shown as Figure 5, when the number of most similar sub-series ranges from 1 to 10.  Figure 5 shows the relation between NMAE and the number of most similar sub-series. When fewer of the most similar segments (especially 1) are used to train a multi-variate linear regression model, the multi-time weighted linear regression becomes a single variable linear regression, and the results have bigger errors. With increasing K, the errors are slowly reduced. When K is near to the threshold, the NMAE keeps getting smaller. However, when the K is larger than a certain number, the errors increase slowly. The reason for this phenomenon is that the dissimilar series could increase the errors of ultra-short-term WPP. A good result can be achieved when K is between 4 and 8. Figure 6 demonstrates the relation between forecasting time-scale and errors. When the predicting time-scale is less than 6 h, the predicting errors are stable and smaller. With an increasing forecasting time-scale, the average predicting error increases gradually. Figure 6 illustrates that the proposed model is suitable for ultra-short-term WPP, due to its smaller predicting time-scale.    In Figure 8, 'PM' is the errors of the persistence method. 'ARIMA' is the errors of the WPP based on autoregressive integrated moving average. 'BPNN' is the errors of the WPP based on back propagation neural networks [37]. 'SPSR' is the errors of the WPP based on the single-variable phase space reconstruction. 'LSSVR' is the errors of the WPP based on LSSVR. 'MPSR-MLR' is the errors of the proposed model. The statistical results of errors for different methods are listed in Table 4.  Figure 8 and Table 4 show that PM is not applicable to the seriously fluctuating wind power series. MPSR-MLR can reduce the predicting errors in the seriously fluctuating wind power series. The error of our model is lower than that of other WPP methods. MPSR-MLR outperforms SPSR by mining the correlation of different time series.

Conclusions
Various models and methods have been extensively employed in the field of ultra-short-term WPP as no model is suitable for all conditions and all wind farms. In order to improve the ultra-short-term WPP, we have proposed a novel model of the ultra-short-term WPP based on the multi-variates phase space reconstruction and the linear regression in this paper. The proposed model improves the ultra-short-term WPP by mining the correlation of different time series. The performance is particularly good, when the wind power series fluctuate seriously. A map/reduce-based parallel algorithm is implemented on a cloudy computing platform, Hadoop. The research results allow for three conclusions:

1.
Wind power is a chaotic time series, and multi-variate phase space reconstruction can improve the ultra-short-term WPP by mining the correlation of these wind power series.

2.
It is very difficult to forecast wind power, especially when the wind power series fluctuates seriously. The proposed model improves the performance during the wind ramp drastically.

3.
The forecasting speed is accelerated by adopting both the map/reduce-based parallel algorithm and the cloudy computing platform.