An Ultrashort-Term Net Load Forecasting Model Based on Phase Space Reconstruction and Deep Neural Network

Recently, a large number of distributed photovoltaic (PV) power generation systems have been connected to the power grid, which has increased the fluctuation of the net load and made load forecasting more difficult. Considering the characteristics of the net load, an ultrashort-term forecasting model based on phase space reconstruction and a deep neural network (DNN) is proposed. The model works in two steps. First, the phase space of the net load time series is reconstructed using the C-C method. Second, the reconstructed data are fitted by the DNN to obtain the predicted value of the net load. The performance of the model is verified using real data. The model forecasts the net load accurately under a high PV penetration rate and different weather conditions.


Introduction
In recent years, an increasing number of photovoltaic (PV) power generation systems have been connected to the distribution network. The use of new energy brings huge benefits. However, while it improves the environment, it also has negative impacts on the power grid. PV power is strongly influenced by weather conditions and fluctuates with changes in irradiance.
To ensure the quality of the power supply and the safe operation of the power system, power generation and power consumption must be kept equal. Power load forecasting is an indispensable means of maintaining this dynamic balance. In addition, power load forecasting is of great significance for the planning and scheduling of power systems and for the planning of maintenance. However, when many PV systems are connected, the fluctuations in PV power cause fluctuations in the load, so load forecasting becomes more difficult.
Generally, load forecasting can be divided into long-term, medium-term, short-term, and ultrashort-term forecasting [1]. Many systematic and fruitful studies on traditional load forecasting have been conducted. Load forecasting methods mainly include the similar day prediction method [2,3], time series prediction method [4,5], expert system [6,7], and regression analysis method [8,9]. Artificial intelligence and machine learning algorithms are prediction methods that have developed rapidly in recent years. Support vector regression (SVR) [10], relevance vector regression (RVR) [11], artificial neural network (ANN) [12], deep neural network (DNN) [13], and their improved hybrid algorithms [14] have been applied in the field of load forecasting. In Moon et al. [15], the random forest and multilayer perceptron are combined to predict the daily electrical load. In Nazar et al. [16], the wavelet and Kalman machines, Kohonen self-organizing map (SOM), multi-layer perceptron artificial neural network (MLP-ANN), and adaptive neuro-fuzzy inference system (ANFIS) are used to establish a hybrid three-stage forecasting model. In Zhang et al. [17], a short-term power load forecasting method with a wavelet neural network (WNN) and an adaptive mutation bat optimization algorithm (AMBA) is proposed. Liang et al. [18] propose a hybrid model that combines the empirical mode decomposition (EMD), minimal redundancy maximal relevance (mRMR), general regression neural network (GRNN), and fruit fly optimization algorithm (FOA). In Dai et al. [19], a load forecasting model is proposed based on the complete ensemble empirical mode decomposition (EEMD) with adaptive noise and a support vector machine (SVM), which is optimized by the modified grey wolf optimization (MGWO) algorithm.
Since traditional load data typically use hours as the smallest unit, short-term and ultrashort-term forecasts are often not strictly distinguished. With the large-scale integration of distributed PV, short-term load forecasting at hourly resolution cannot satisfy the requirements of real-time safety analysis. For the real-time safety analysis of power systems and the reliable operation of economic dispatch, a finer-grained ultrashort-term prediction is required. Considering the volatility of distributed PV power generation and the real-time requirements of ultrashort-term prediction, the PV power should also be considered a load and merged with the traditional load to form a net load [20][21][22]. Because the net load is a nonlinear time series with large volatility, in this paper the phase space of the net load data is first reconstructed, which projects the data onto a moving point whose trajectory in the phase space shows a certain regularity. Then, the excellent nonlinear fitting ability of the deep neural network is used to fit the trajectory of the moving point and obtain the final prediction value. Finally, measured load data are used to verify the effectiveness of the model under different weather conditions. The phase space is a tool for characterizing a dynamic system and is reconstructed from a univariate or multivariate time series [23]; it is widely used in forecasting models. The DNN is a development of the traditional ANN and is suitable for net load forecasting because its nonlinear fitting ability is strengthened [24]. The high accuracy in forecasting the net load under a high PV penetration rate and different weather conditions is verified using real data. The contributions of this paper can be summarized as follows:

• The bus load prediction model is established considering distributed PV power supply. The prediction results are necessary guidance for power grid dispatching, which is conducive to the improvement of PV consumption.

• The phase space reconstruction is used to process bus load data, and one-dimensional time series data are inversely constructed into the phase space structure of the original system, which can better describe the dynamic characteristics and adapt to the strong fluctuation of the bus load.

• The Levenberg-Marquardt back propagation (LMBP) algorithm is used to train the DNN, which accelerates training. Compared with the single hidden layer neural network, the DNN can fit the historical data better and significantly improve the accuracy of ultrashort-term load forecasting.

Phase Space Reconstruction
Phase space reconstruction is a method proposed by Takens to analyze time series. The basic idea is to regard the time series as a component produced by a certain nonlinear dynamic system. The equivalent high-dimensional phase space of the system can be reconstructed from the variation law of this component. The key to the reconstruction is determining the optimal embedding dimension $m_{opt}$ and delay $t_{opt}$.
In this paper, the optimal embedding dimension $m_{opt}$ and delay $t_{opt}$ are obtained simultaneously by the C-C method. If the time series is $x = \{x_1, x_2, \ldots, x_N\}$, the embedding dimension is $m$, and the time delay is $t$, then the set of points in the reconstructed phase space can be expressed by Formula (1), where $M = N - (m - 1)t$ [25].
$$X_i = (x_i, x_{i+t}, \ldots, x_{i+(m-1)t}), \quad i = 1, 2, \ldots, M \tag{1}$$
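The reconstruction in Formula (1) can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the authors' implementation (the experiments in this paper were run in MATLAB); the function name `delay_embed` is hypothetical.

```python
import numpy as np

def delay_embed(x, m, t):
    """Return the M x m matrix of delay vectors, with M = N - (m - 1) * t."""
    x = np.asarray(x, dtype=float)
    M = len(x) - (m - 1) * t
    if M <= 0:
        raise ValueError("series too short for this embedding dimension and delay")
    # Row i holds (x[i], x[i + t], ..., x[i + (m - 1) * t]).
    idx = np.arange(M)[:, None] + t * np.arange(m)[None, :]
    return x[idx]
```

Each row of the returned array is one point of the reconstructed phase space, so a series of length N yields M points of dimension m.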
The correlation integral is then given by Formula (2), where $\theta(x) = 0$ for $x < 0$ and $\theta(x) = 1$ for $x \geq 0$.
$$C(m, N, r, t) = \frac{2}{M(M-1)} \sum_{1 \leq i < j \leq M} \theta\left(r - \lVert X_i - X_j \rVert\right) \tag{2}$$
According to the statistical conclusion of BDS, the ranges of $m$ and $r_k$ can be obtained when $N > 3000$: $m \in \{2, 3, 4, 5\}$ and $r_k = k \times 0.5\sigma$, where $r_k$ is a real number representing a given distance range, $\sigma$ is the standard deviation of the time series, and $k \in \{1, 2, 3, 4\}$. The correlation integral indicates the probability that the distance between any two points in the phase space is less than $r_k$.
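We define the test statistics $S$ and $\Delta S$ using the block averaging strategy, as shown in Formula (3), where $C_s$ is the correlation integral of the $s$-th of the $t$ disjoint subseries.
$$S(m, N, r, t) = \frac{1}{t} \sum_{s=1}^{t} \left[ C_s\left(m, \tfrac{N}{t}, r, t\right) - C_s^m\left(1, \tfrac{N}{t}, r, t\right) \right], \quad \Delta S(m, t) = \max_k S(m, r_k, t) - \min_k S(m, r_k, t) \tag{3}$$
Formula (4) gives the averages of $S$ and $\Delta S$ over all $m$ and $r_k$. The optimal delay $t_{opt}$ is the value of $t$ at the first zero of $\bar{S}(t)$ or the first minimum of $\Delta\bar{S}(t)$.
$$\bar{S}(t) = \frac{1}{16} \sum_{m=2}^{5} \sum_{k=1}^{4} S(m, r_k, t), \quad \Delta\bar{S}(t) = \frac{1}{4} \sum_{m=2}^{5} \Delta S(m, t) \tag{4}$$
Formula (5) is the combined test statistic. The global minimum of $S_{cor}(t)$ gives the optimal embedded window $t_\omega$.
$$S_{cor}(t) = \Delta\bar{S}(t) + \lvert \bar{S}(t) \rvert \tag{5}$$
The optimal delay $t_{opt}$ determined from Formula (4) and the optimal embedded window $t_\omega$ determined from Formula (5) can then be substituted into Formula (6), and the result rounded, to obtain the optimal embedding dimension $m_{opt}$.
$$t_\omega = (m - 1)\,t \tag{6}$$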
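The C-C statistics described above can be sketched as follows. This is an illustrative outline under stated assumptions rather than the authors' code: the sup-norm is assumed for the distance between phase points (the text does not name the norm), the pairwise distance matrix is built in memory and so suits only moderate series lengths, and the names `corr_integral`, `cc_statistics`, and `embedding_dimension` are hypothetical.

```python
import numpy as np

def corr_integral(x, m, r):
    """Correlation integral of x embedded with dimension m and unit delay."""
    M = len(x) - (m - 1)
    d = np.zeros((M, M))
    for j in range(m):
        col = x[j:j + M]
        # Sup-norm distance between delay vectors (assumed norm).
        d = np.maximum(d, np.abs(col[:, None] - col[None, :]))
    iu = np.triu_indices(M, k=1)
    return np.mean(d[iu] <= r)  # theta(r - d) equals 1 when d <= r

def cc_statistics(x, t_max):
    """Return S_bar(t), dS_bar(t), and S_cor(t) for t = 1 .. t_max."""
    x = np.asarray(x, dtype=float)
    sigma = x.std()
    ms = (2, 3, 4, 5)
    rs = [k * 0.5 * sigma for k in (1, 2, 3, 4)]
    S_bar = np.zeros(t_max)
    dS_bar = np.zeros(t_max)
    for t in range(1, t_max + 1):
        S = np.zeros((len(ms), len(rs)))
        for a, m in enumerate(ms):
            for b, r in enumerate(rs):
                # Block averaging over the t disjoint subseries x[s::t].
                S[a, b] = np.mean([
                    corr_integral(x[s::t], m, r) - corr_integral(x[s::t], 1, r) ** m
                    for s in range(t)
                ])
        S_bar[t - 1] = S.mean()
        dS_bar[t - 1] = (S.max(axis=1) - S.min(axis=1)).mean()
    S_cor = dS_bar + np.abs(S_bar)
    return S_bar, dS_bar, S_cor

def embedding_dimension(t_opt, t_w):
    """Solve t_w = (m - 1) * t_opt for m and round to an integer."""
    return int(round(t_w / t_opt)) + 1
```

With these arrays, the delay would be read off as the first local minimum of `dS_bar` (or the first zero crossing of `S_bar`) and the embedded window as the position of the global minimum of `S_cor`, both counted from t = 1.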

Deep Neural Network
A neural network generally consists of three kinds of layers: an input layer, a hidden layer, and an output layer. Figure 1 shows a simple neural network with three inputs, two outputs, and four neurons in a single hidden layer. The neurons in adjacent layers are connected by weights ω. The DNN contains multiple hidden layers, which significantly improves its nonlinear fitting ability compared with a single hidden layer. However, too many hidden layers are likely to cause over-fitting. The learning process of the network is the process of adjusting and determining the connection weights ω of each neuron from the training samples.
In this paper, the LMBP algorithm is used to train the DNN. Compared with the traditional back propagation (BP) algorithm, the LMBP algorithm converges faster and more reliably. It is more suitable for training the DNN and can also satisfy the requirements of real-time ultrashort-term prediction. Unlike the traditional BP algorithm, which uses gradient descent, the LMBP algorithm is based on the Gauss-Newton method for least-squares problems and takes the sum of squares of the error v as the objective function.

$$E(\omega) = \sum_{i=1}^{a} v_i^2(\omega) = v^T(\omega)\,v(\omega) \tag{7}$$
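Taking the second-order Taylor expansion of the objective function in Formula (7) and differentiating gives the change of the weight ω:
$$\Delta\omega = -\left[\nabla^2 E(\omega)\right]^{-1} \nabla E(\omega) \tag{8}$$
where:
$$\nabla E(\omega) = 2J^T(\omega)\,v(\omega), \quad \nabla^2 E(\omega) = 2J^T(\omega)\,J(\omega) + 2\nabla^2 v^T(\omega)\,v(\omega) \tag{9}$$
$J(\omega)$ is the Jacobian matrix of $v(\omega)$. If $v(\omega)$ consists of $a$ elements and $n$ denotes the number of weights, $J(\omega)$ can be written as follows:
$$J(\omega) = \begin{bmatrix} \dfrac{\partial v_1}{\partial \omega_1} & \cdots & \dfrac{\partial v_1}{\partial \omega_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial v_a}{\partial \omega_1} & \cdots & \dfrac{\partial v_a}{\partial \omega_n} \end{bmatrix} \tag{10}$$
Since $2\nabla^2 v^T(\omega)\,v(\omega)$ is usually negligible, Formula (8) can be rewritten as follows:
$$\Delta\omega = -\left[J^T(\omega)\,J(\omega)\right]^{-1} J^T(\omega)\,v(\omega) \tag{11}$$
Considering that $J^T(\omega)J(\omega)$ may not be invertible, Formula (11) is modified by adding the correction coefficient $\mu$, where $I$ is the identity matrix.
$$\Delta\omega = -\left[J^T(\omega)\,J(\omega) + \mu I\right]^{-1} J^T(\omega)\,v(\omega) \tag{12}$$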
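Similar to the BP algorithm, the weight $\omega^{(k+1)}$ is updated in the kth iteration as shown in Formula (13), where $\Delta\omega^{(k)}$ is obtained from Formula (12).
$$\omega^{(k+1)} = \omega^{(k)} + \Delta\omega^{(k)} \tag{13}$$
When $E(\omega^{(k+1)}) < \varepsilon$, the algorithm has converged, where $\varepsilon$ is the given error limit.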
The initial value of $\mu$ is generally a small positive number such as 0.001. If the objective function $E(\omega^{(k)})$ decreases in the kth iteration, $\mu^{(k)}$ is divided by a factor $\theta$ to give $\mu^{(k+1)}$ for the next iteration. If the objective function $E(\omega^{(k)})$ increases in the kth iteration, the iteration is restarted with $\mu^{(k)}$ multiplied by the factor $\theta$. The factor $\theta$ is generally a number greater than 1, such as 4.
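The damped update and the adjustment of µ can be illustrated on a generic least-squares problem. The sketch below is a plain NumPy outline of a Levenberg-Marquardt loop under the rules stated above (initial µ = 0.001, factor θ = 4); it is not the authors' MATLAB training code, and the function name, the `residual` and `jacobian` callables, and the upper bound on µ are assumptions added for illustration.

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, w0, mu=1e-3, theta=4.0,
                        eps=1e-12, max_iter=100):
    """Minimize E(w) = v(w)^T v(w) with a damped Gauss-Newton step."""
    w = np.asarray(w0, dtype=float)
    v = residual(w)
    E = v @ v                       # objective, sum of squared errors
    for _ in range(max_iter):
        if E < eps:                 # converged within the error limit
            break
        J = jacobian(w)
        g = J.T @ v
        H = J.T @ J
        while mu < 1e12:            # bound on mu is an added safeguard (assumption)
            dw = -np.linalg.solve(H + mu * np.eye(w.size), g)
            v_new = residual(w + dw)
            E_new = v_new @ v_new
            if E_new < E:           # objective fell: accept step, divide mu by theta
                w, v, E = w + dw, v_new, E_new
                mu /= theta
                break
            mu *= theta             # objective rose: retry with mu multiplied by theta
        else:
            break                   # damping exhausted without improvement
    return w, E
```

For a network, `residual` would return the vector of prediction errors over the training samples and `jacobian` its derivatives with respect to all weights; a large µ makes the step resemble gradient descent, while a small µ approaches the Gauss-Newton step.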

Ultrashort-Term Load Forecasting Model Based on Phase Space Reconstruction and Deep Neural Network
The traditional load fluctuation is mainly caused by fluctuations in users' power usage. Although the electricity consumption of an individual user is uncertain, it generally follows certain patterns, and the fluctuation range is not large. For ultrashort-term prediction, linear extrapolation, time series prediction, and other methods can usually achieve the required accuracy. With the large-scale integration of distributed energy sources such as PV power plants, the net load can be expressed by Formula (14), where $p_t$ is the actual net load, p is the user's electricity load, and $p^{PV}_t$ is the negative of the PV power generation.
Since the amount of PV power generation is as uncertain as the power load, unlike a traditional power supply with a known power output, the PV power generation can be considered a load, which reduces the dispatching burden of the system. As a result, the uncertainty of the load increases and its range of fluctuation widens. Power reversal can even occur at noon on sunny days. If traditional forecasting methods are still used, they produce larger errors and cannot accurately predict the load.
Since the load becomes a non-linear time series with large fluctuations after distributed energy is connected, it is difficult to predict directly. Moreover, complex short-term prediction models may not satisfy the real-time requirements. In this paper, the phase space reconstruction is used to project the load time series onto a moving point in the high-dimensional phase space whose trajectory shows short-term regularity. Then, the non-linear fitting ability and fast convergence of the LMBP-trained DNN are used to fit and predict the trajectory of the point in the phase space, realizing the ultrashort-term prediction of the load considering distributed energy.

Modelling Steps of Prediction Model
For a net load time series $p = \{p_1, p_2, \ldots, p_N\}$ that accounts for PV power generation, the modelling and forecasting steps of the ultrashort-term forecasting model based on phase space reconstruction and DNN are as follows:

• Step 1: The load time series is linearly normalized to facilitate the training of the DNN. The maximum and minimum values of the data are saved so that the forecast values can later be inverse-normalized to restore the actual values.

• Step 2: The C-C method is used to process the load time series, and the optimal embedding dimension $m_{opt}$ and optimal delay $t_{opt}$ of the time series are obtained.

• Step 3: The load time series is reconstructed according to the embedding dimension $m$ and delay $t$ obtained in Step 2. The phase space matrix of the reconstructed load time series is given by Formula (15), where $M = N - (m - 1)t$.
$$\begin{bmatrix} \mathbf{p}_1 \\ \mathbf{p}_2 \\ \vdots \\ \mathbf{p}_M \end{bmatrix} = \begin{bmatrix} p_1 & p_{1+t} & \cdots & p_{1+(m-1)t} \\ p_2 & p_{2+t} & \cdots & p_{2+(m-1)t} \\ \vdots & \vdots & & \vdots \\ p_M & p_{M+t} & \cdots & p_{M+(m-1)t} \end{bmatrix} \tag{15}$$

• Step 4: The neural network is constructed, and the phase space matrix of the load time series reconstructed in Step 3 is used as the training set to train the DNN. The trained DNN is used to predict the load value immediately after training.

• Step 5: Using the maximum and minimum values stored in Step 1, the load prediction values returned by the DNN are inverse-normalized to obtain the actual load prediction values.

The model workflow chart is shown in Figure 2.
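Steps 1, 3, and 5 are simple data transformations and can be sketched as follows. This is a hedged illustration only: the text says the series is "linearly normalized" without giving the target range, so scaling to [0, 1] is an assumption, and the helper names `normalize`, `denormalize`, and `training_pairs` are hypothetical.

```python
import numpy as np

def normalize(p, p_min, p_max):
    """Step 1: linear (min-max) scaling to [0, 1] (assumed range)."""
    return (np.asarray(p, dtype=float) - p_min) / (p_max - p_min)

def denormalize(q, p_min, p_max):
    """Step 5: restore actual load values from normalized predictions."""
    return np.asarray(q, dtype=float) * (p_max - p_min) + p_min

def training_pairs(p, m, t):
    """Step 3: phase-space rows as inputs, the next load value as target."""
    p = np.asarray(p, dtype=float)
    M = len(p) - (m - 1) * t
    # Only the first M - 1 rows have a known "next" value in the series.
    idx = np.arange(M - 1)[:, None] + t * np.arange(m)[None, :]
    X = p[idx]
    y = p[(m - 1) * t + 1:]   # value one step after the last entry of each row
    return X, y
```

A typical use would store `p_min, p_max = p.min(), p.max()` from the training data, build `X, y` from the normalized series, train the network on them (Step 4), and pass the network outputs through `denormalize`.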

Determination of the Structure of Deep Neural Networks
Determining the structure of the DNN is part of hyperparameter tuning. An unreasonable structure can make the prediction results of the DNN deviate seriously, or make the training time so long that much effort yields little benefit. The structure is determined as follows:

• Input layer: If the DNN is trained directly on the original load data, determining the number of neurons in the input layer can be very difficult and requires a lot of trial and error to find the optimal value. Moreover, when the training set changes, previous optimal values may no longer be applicable, and the structure of the input layer must be tuned again. In this model, the input data of the DNN is the reconstructed phase space matrix. Therefore, the number of neurons in the input layer is determined directly by the embedding dimension $m$ obtained by the C-C method, without manual specification or trial-and-error selection of the optimal value.

• Hidden layer: The number of hidden layers can be determined heuristically. With too few hidden layers, the model under-fits and the predicted values deviate strongly. Conversely, too many hidden layers can cause over-fitting. During the trial, the number of hidden layers can be gradually increased until the predicted values show significant over-fitting. Then, the number of hidden layers is gradually reduced until the predicted and true values are as close as possible on the validation set, which determines the optimal number of hidden layers. The number of neurons in each hidden layer can be taken as 75% of the number of neurons in the preceding layer, but generally more than the number of neurons in the output layer. The activation function of the hidden layer neurons is usually the tanh function or the rectified linear unit (ReLU) function. The tanh function is used in this paper.

• Output layer: The prediction model in this paper adopts one-step prediction, so only the load value at the next moment is predicted at a given time. After the load time series is projected onto the moving point in the phase space, the model must output the position vector of the point at the next moment. In fact, if the input of the model is $\mathbf{p}_i$ $(1 \leq i \leq M)$ in the phase space reconstruction matrix of Formula (15), only $p_{i+1+(m-1)t}$ is unknown in the position vector $\mathbf{p}_{i+1}$ at the next moment. Therefore, the output layer needs to output only the predicted value of the load $\hat{p}_{i+1+(m-1)t}$. If $i + 1 > M$, the phase space reconstruction matrix of Formula (15) must be extended downward, and $\mathbf{p}_{i+1}$ is added as a new row. The expression of $\mathbf{p}_{i+1}$ is shown in Formula (16), where $p_{i+1+(m-1)t}$ is the true value of the newly measured load.
$$\mathbf{p}_{i+1} = \left(p_{i+1}, p_{i+1+t}, \ldots, p_{i+1+(m-1)t}\right) \tag{16}$$
Then $\mathbf{p}_{i+1}$ is used as the input of the model to obtain the predicted value of $p_{i+2+(m-1)t}$; the matrix is augmented again, the predicted value of $p_{i+3+(m-1)t}$ is obtained, and the process continues until the end of the prediction. The activation function of the output layer is a linear function. The DNN structure is shown in Figure 3.
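The layer-size heuristic, the forward pass, and the rolling one-step procedure can be sketched together. This is a minimal NumPy outline under several assumptions: widths are rounded to the nearest integer with a floor of 2, weights are drawn at random rather than trained, and the names `hidden_sizes`, `init_mlp`, `forward`, and `rolling_forecast` are hypothetical.

```python
import numpy as np

def hidden_sizes(first, n_layers):
    """Widths of the hidden layers, each about 75% of the previous one."""
    sizes = [first]
    for _ in range(n_layers - 1):
        sizes.append(max(2, round(0.75 * sizes[-1])))  # rounding rule is an assumption
    return sizes

def init_mlp(sizes, rng):
    """Random (untrained) weights and zero biases for consecutive layer sizes."""
    return [(rng.normal(scale=0.5, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """tanh hidden layers followed by a linear output layer."""
    a = np.asarray(x, dtype=float)
    for W, b in params[:-1]:
        a = np.tanh(a @ W + b)
    W, b = params[-1]
    return a @ W + b

def rolling_forecast(predict, history, actuals, m, t):
    """One-step forecasts; each new true measurement extends the series."""
    series = list(history)
    preds = []
    for actual in actuals:
        # Latest phase-space point: m values spaced t apart, ending at the newest one.
        state = np.asarray(series[len(series) - 1 - (m - 1) * t::t], dtype=float)
        preds.append(float(np.ravel(predict(state))[0]))
        series.append(actual)  # append the measured value, not the prediction
    return np.array(preds)
```

Passing `lambda s: forward(params, s)` as `predict` would run a trained network in this loop; because the measured value is appended after every step, prediction errors do not accumulate from one step to the next.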

The DNN can be used in many different areas. In addition to load forecasting, the DNN can also solve problems including image processing, speech recognition, and fault diagnosis. The training method and network structure are basically the same; the difference is the training data. For the prediction of time series load data, the output layer gives the actual load data. For image processing and fault diagnosis, the output layer gives the data label. The structure of the DNN is basically the same as that of the traditional ANN, but the improved training method gives it better performance.

Analysis of Prediction Results
To verify its validity, the proposed model was implemented and tested in MATLAB R2018a. The net load data of the upper bus of a city's PV substation in the first 15 days of May 2017 were used. The sampling interval of the net load data is 5 min, and the installed capacity of the distributed PV power station is approximately 50 MW. The data from 1-11 May were selected as the training set to build the prediction model, and the parameters of the model were adjusted by cross-validation. The data from 12-15 May were selected as the test samples, where 12 May was sunny and 13-15 May were cloudy.

Result of the Phase Space Reconstruction by the C-C Method
The net load data from 1-11 May were processed by the C-C method. The corresponding statistics $\Delta\bar{S}(t)$ and $S_{cor}(t)$ are shown in Figure 4. The first extremum point of $\Delta\bar{S}(t)$ was at $t = 7$. $S_{cor}(t)$ had no obvious minimum point, so the optimal embedded window $t_\omega$ was not obtained. According to the BDS statistics, when $N > 3000$, $m \in \{2, 3, 4, 5\}$, so $m$ could be at most 5. According to Formula (6), the final optimal embedding dimension $m_{opt} = 5$ and optimal delay $t_{opt} = 7$ were obtained.

Prediction Results of the Deep Neural Network
Since the embedding dimension determined by the C-C method is m = 5, there were five input neurons in the DNN. Cross-validation showed that with five hidden layers, the predicted values showed significant over-fitting. Therefore, four hidden layers were used, with 5, 4, 3, and 2 neurons, respectively. The DNN used single-step prediction and predicted only the load value of the next 5 min at a time. The predicted result is shown in Figure 5. The black solid line is the actual net load, the red solid line is the ultrashort-term prediction value based on the phase space reconstruction and DNN, and the blue dotted line is the ultrashort-term prediction value based on the traditional BP neural network.
In Figure 5, at approximately 12:00 noon on the clear day (12 May), the net load became negative due to the increase in PV power generation. The prediction results based on the phase space reconstruction and DNN were closer to the actual net load value and clearly better than the prediction results using the traditional BP neural network.
On cloudy days (13-15 May), the net load fluctuated sharply with the PV output. Even after the actual net load had stabilized, the predicted values of the traditional BP neural network continued to fluctuate strongly, which resulted in a large deviation. In contrast, the prediction results based on the phase space reconstruction and DNN did not deviate strongly and basically followed the actual trend of the net load.
To evaluate the prediction accuracy of the model, the mean absolute percentage error (MAPE) and root mean square error (RMSE) were used as evaluation indicators. In Formulas (17) and (18), $n$ is the number of predicted samples, $p_i$ is the actual value of the net load at time $i$, and $\hat{p}_i$ is the predicted value of the net load at time $i$.
$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{p_i - \hat{p}_i}{p_i} \right| \times 100\% \tag{17}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(p_i - \hat{p}_i\right)^2} \tag{18}$$
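Both indicators are straightforward to compute; a short NumPy sketch is given below. The function names are hypothetical. Note that MAPE divides by the actual value, so it becomes unstable when the net load is close to zero, which can happen around power reversal; how such samples were handled is not stated in the text.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean(np.abs((actual - predicted) / actual)) * 100.0

def rmse(actual, predicted):
    """Root mean square error, in the units of the load."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.sqrt(np.mean((actual - predicted) ** 2))
```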
The prediction accuracy is shown in Table 1. Compared with the prediction model based on the traditional BP neural network, the forecasting scheme proposed in this paper improved the accuracy of net load forecasting under different weather conditions. On a cloudy day (15 May), the PV power generation was very small, and the accuracy of the prediction model based on the phase space reconstruction and the deep neural network was basically identical to that of the model based on the traditional BP neural network. However, on a sunny day (12 May), the PV power was relatively high, and the difference in MAPE between the two models was nearly 10%. The predictive model based on the phase space reconstruction and DNN still had higher prediction accuracy even in the case of large distributed PV power access and large power fluctuation.

Conclusions
• The large-scale integration of distributed PV power generation increases the fluctuation of the net load power, which challenges ultrashort-term load forecasting. Considering this phenomenon and the characteristics of ultrashort-term load forecasting, this paper presents an ultrashort-term load forecasting model based on the phase space reconstruction and DNN. Through the phase space reconstruction, the time series is projected onto a moving point in the phase space, and the DNN is subsequently used to fit the trajectory to realize the load forecasting.

• The prediction of actual load data and a comparison experiment with the BP neural network prediction model have verified that the proposed model has higher prediction accuracy even in the case of large distributed PV power fluctuations and different weather conditions. In particular, the model also performs well when there is a large amount of PV power access and high penetration.

Table 1. Prediction accuracy of different models.