Phase Space Reconstruction Algorithm and Deep Learning-Based Very Short-Term Bus Load Forecasting

: With the reﬁnement and intelligence of power system optimal dispatching, the widespread adoption of advanced grid applications that consider the safety and economy of power systems, and the massive access of distributed energy resources, the requirement for bus load prediction accuracy is continuously increasing. Aiming at the volatility brought about by the large-scale access of new energy sources, the adaptability to di ﬀ erent forecasting horizons and the time series characteristics of the load, this paper proposes a phase space reconstruction (PSR) and deep belief network (DBN)-based very short-term bus load prediction model. Cross-validation is also employed to optimize the structure of the DBN. The proposed PSR-DBN very short-term bus load forecasting model is veriﬁed by applying the real measured load data of a substation. The results prove that, when compared to other alternative models, the PSR-DBN model has higher prediction accuracy and better adaptability for di ﬀ erent forecasting horizons in the case of high distributed power penetration and large ﬂuctuation of bus load.


Introduction
Electricity cannot be stored in large quantities, and the investment recovery cycle of large-scale energy storage equipment is long.Therefore, in order to ensure the safe operation of power systems and power quality on the user side, the operators must have knowledge of future power loads [1].Power system load forecasting is an important method to understand the trend of future electric load.In addition, power load forecasting is of great significance for the planning of power systems and scheduling of generation and transmission maintenance.Power system load forecasting is generally divided into long-term forecasting, medium-term forecasting, short-term forecasting, and very short-term forecasting [2].Among them, short-term load forecasting (STLF) and very short-term load forecasting (VSTLF) are of great significance for economic dispatch, optimal power flow, and electricity market trading.The higher the accuracy of load forecasting is, the more beneficial it is to improve the utilization rate of power generation equipment and the effectiveness of economic dispatch, and reduce the operation cost of smart grid.
In the past decades, experts and scholars have made systematic and effective research on traditional deterministic and probabilistic STLF and VSTLF.Deterministic forecasting methods can be divided Energies 2019, 12, 4349 2 of 17 into two main categories [3]: The first category uses statistical forecasting models, such as linear regression [4], curve extrapolation [5], Autoregressive Integrated Moving Average (ARIMA) model [6,7], and other time series methods; the second category uses artificial intelligent forecasting models, such as Bayesian estimation [8], Random Forests [9], Support Vector Regression (SVR) [10,11], Artificial Neural Network (ANN) [12,13], Deep Belief Network (DBN) [14,15], and Long Short Term Memory (LSTM) Network [16,17].These methods have achieved high forecasting accuracy and good robustness in day-ahead and hour-ahead load forecasting.However, most of the studies are focused on system-level load forecasting and there are relatively few on bus load forecasting.Generally, bus loads can refer to loads supplied by transmission and distribution systems transformers [18].Bus load forecasting can be conducive to the optimal scheduling of decentralized generation, network congestion studies, and others [19].
With the refinement and intelligence of power grid optimal dispatching and the wide application of advanced smart grid applications, which take into account the security and economy of power system, the demand for bus load forecasting accuracy is increasing.Since bus load base is much smaller than that of a system, the uncertainty of bus load and the multi-dimensional nonlinearity [20] are more obvious.The traditional method of distributing the predicted value of system through bus load ratio often fails to achieve satisfactory results [21].In this regard, the literature [18] modifies the ANN model for the aggregated load of the interconnected system and proposes two novel hybrid forecasting models, which can capture and successfully treat the special characteristics of bus load patterns.In Ref. [19], a bus load forecasting model based on clustering and ANN is proposed; the day-ahead load forecasting and hour-ahead load forecasting are carried out, achieving high prediction accuracy.
With a large number of distributed power access, the uncertainty and nonlinear characteristics of system [22] and bus load are further enhanced.As the collection time interval of distributed photovoltaic power, plant data is generally several minutes.The short-term and day-ahead load forecasting in hours of bus cannot make full use of historical information and have low prediction accuracy.In order to ensure the reliable operation of power system real-time security analysis and economic dispatch, more detailed VSTLF is needed.The authors of [23] proposed a chaotic-radial basis function (RBF) photovoltaic power generation prediction model and verified its prediction accuracy under different weather conditions.However, the author only validated the prediction accuracy of the model in the case of single-step prediction and did not involve the forecasting horizon problem of the model [24].Ref. [20] proposes a novel load forecasting model based on phase space reconstruction (PSR) algorithm and bi-square kernel (BSK) regression, and achieves high prediction accuracy on different data sets.However, after phase space reconstruction, different BSK models are used to independently predict the various dimensional data, which neglect the time series characteristics of the load.
In view of the shortcomings of the above forecasting model, considering the adaptability to different forecasting horizons, the time series characteristics of load, and the volatility brought by large-scale access of new energy sources, this paper proposes a novel very short-term bus load forecasting model based on phase space reconstruction and deep belief network (PSR-DBN).Because the amount of historical data in VSTLF is relatively large and closely related to future load trend, the impact of weather, electricity price, and other factors on VSTLF is not considered in this paper.Firstly, the proposed PSR-DBN model performs phase space reconstruction on bus load history data, and projects the historical data to the motion track of a moving point in the phase space.Then the model takes advantage of the excellent nonlinear fitting ability of the deep belief network to fit the moving point trajectory and provide a prediction of the trajectory.Finally, the predicted value of the load is obtained.At the same time, the structure of the DBN is optimized by cross-validation.In order to test the validity and superiority of the proposed PSR-DBN very short-term bus load forecasting model, this paper applies the measured load data of a substation in China to verify the forecasting effectiveness of the model under different forecasting horizons (5 min-1 h).In addition, other six alternative forecasting models are employed to further compare with the proposed PSR-DBN model.The major contributions of this paper are as follows:

•
A novel hybrid VSTLF model based on phase space reconstruction ensemble deep belief network is proposed, which can maintain high prediction accuracy in the case of high distributed power penetration and large fluctuation of bus load.

•
The Levenberg-Marquardt backpropagation (LMBP) algorithm is used to fine-tune the DBN, which can make DBN convergence faster and more accurate, compared with a BP algorithm.

•
A practical method based on cross-validation is proposed to tune the structure of DBN for better forecasting performance.

•
The PSR algorithm is adopted to make a regular pattern that could not be obtained in one-dimensional time series appear in a high-dimensional phase space, which improves the adaptability of a forecasting model to different forecasting horizons, especially long estimation.
The rest of this paper is organized as follows.In Section 2, the relevant theory of the PSR-DBN model is introduced.Section 3 provides the principal steps of the PSR-DBN model and covers the tuning method of network hyperparameter based on cross-validation.Section 4 presents the evaluation criteria of forecasting accuracy, case study settings, forecasting results, and a comparison.Section 5 gives the conclusion of the paper and an outlook for future research.

Phase Space Reconstruction (PSR)
Phase space reconstruction (PSR) is an efficient method for analyzing nonlinear time series.The basic idea of phase space reconstruction is to regard the time series as a component generated by a nonlinear dynamic system.The variation law of the component can reconstruct the equivalent high-dimensional phase space of the dynamic system, and the time series can be projected into a moving point trajectory in the high-dimensional phase space.If there is a one-dimension time series x = {x 1 , x 2 • • • x N }, the embedding dimension is m, and the delay time is t, then the set of time series reconstructed by phase space can be expressed as: where The key to PSR is to determine the optimal embedding dimension m opt and optimal delay t opt .In this paper, the C-C method [25] is employed to determine the optimal embedding dimension m opt and delay t opt at the same time.
Based on matrix partitioning average strategy, the test statistics S is defined as: For N → ∞ , Equation (3) can be deformed to: For the fixed m and t, S(m, r k , t) will equal for 0 for all r, if the data are iid and N → ∞ .However, the real data set is not infinite, and there may be a correlation between data.Thus, the optimal delay time may be either the zero crossing of S(m, r k , t) or the times at which S(m, r k , t) shows the least variation with r [25].
To represent the variation of S(m, r k , t) with r, the test statistics ∆S is defined as: where The means of S and ∆S are defined as S and ∆S, and the equations are shown as: For all values of t, S(t) and ∆S(t) can find corresponding values.Wherein, the t value corresponding to the first zero point of S(t) or the first minimum point of ∆S(t) is rounded to be the optimal delay t opt .
The test statistic S cor is defined as: where the t value corresponding to the global minimum point of S cor (t) is the optimal embedded window t ω .When the optimal delay t opt is determined by Equation ( 6) and the optimal embedded window t ω is determined by Equation (7), the optimal embedding dimension m opt can be determined by rounding the value of Equation (8).

Deep Belief Network (DBN)
The Deep Belief Network (DBN) is a deep learning model proposed by Geoffrey Hinton [28] in 2006 and is a stack of multiple Restricted Boltzmann Machines (RBM).Compared with the Artificial Neural Network (ANN), DBN employs pre-training technology combined with Backpropagation (BP) algorithm to solve network parameters.Therefore, it is not easy for DBN to fall into a local optimal solution and has higher convergence accuracy.Furthermore, when the number of layers and the number of neurons in each layer are large, DBN also has a fast convergence speed, which makes it more suitable for the fitting problem of complex nonlinear time series [29].

Restricted Boltzmann Machine (RBM)
The RBM consists of a visible layer V and a hidden layer H.As shown in Figure 1, the visible layer consists of n v neurons and the hidden layer consists of m h neurons, each of which takes a value of Energies 2019, 12, 4349 5 of 17 0 or 1 and obeys the Bernoulli distribution, i.e., v i ∈ {0, There is no connection between the neurons in each layer, and the neurons between the layers are connected by weights ω.

ω [ ]
RBM is a kind of probabilistic unsupervised learning.Its network parameters are composed of visible layer bias b , weight matrix ω , and hidden layer bias c , and the optimal value of network parameters is determined by the minimum energy function.The energy function is defined as: ( ) Where ij ω is the connection weight of the i-th visible layer neuron and the j-th hidden layer neuron, and Based on Equation ( 9), the joint probability distribution of the visible neuron state and the hidden neuron state is shown as: Where normalization factor , which represents the sum of the energy function negative exponents under all possible values of visible layer neuron state variable v and hidden layer neuron state variable h.
The probability distribution ( ) P v of v can be derived from Equation (10) as: Thus, the objective function of RBM training can be expressed as a likelihood function of the probability distribution of visible layer state variable v on the training set, and the likelihood function can be derived from the Equation (11) as: Where T represents the set of sample inputs on the training set, and when the objective function takes the maximum value, the energy function is the minimum.
According to the network structure in Figure 1, the activation probability of i v in a given hidden layer neuron state h and the activation probability of j h in a given visible layer neuron state v can be derived as: ( ) ( ) Where σ represents the sigmoid function, ( ) RBM is a kind of probabilistic unsupervised learning.Its network parameters are composed of visible layer bias b, weight matrix ω, and hidden layer bias c, and the optimal value of network parameters is determined by the minimum energy function.The energy function is defined as: where ω ij is the connection weight of the i-th visible layer neuron and the j-th hidden layer neuron, and θ = {b, ω, c}.
Based on Equation ( 9), the joint probability distribution of the visible neuron state and the hidden neuron state is shown as: where normalization factor Z = v,h e −E(v,h|θ) , which represents the sum of the energy function negative exponents under all possible values of visible layer neuron state variable v and hidden layer neuron state variable h.The probability distribution P(v) of v can be derived from Equation (10) as: Thus, the objective function of RBM training can be expressed as a likelihood function of the probability distribution of visible layer state variable v on the training set, and the likelihood function can be derived from the Equation ( 11) as: where T represents the set of sample inputs on the training set, and when the objective function takes the maximum value, the energy function is the minimum.
According to the network structure in Figure 1, the activation probability of v i in a given hidden layer neuron state h and the activation probability of h j in a given visible layer neuron state v can be derived as: Energies 2019, 12, 4349 6 of 17 where σ represents the sigmoid function, σ(x) = 1 1+e −x .Since the gradient cannot be directly obtained when using stochastic gradient ascent algorithm to seek the maximum of Equation ( 12), the training of RBM usually applies Contrastive Divergence (CD) algorithm to approximate the gradient of likelihood function [30].The specific steps of RBM training are as follows: Step 1: Substitute the input of the training set as v 1 in Equation ( 14) to obtain P(h 1 = 1|v 1 ), then employ random sampling to acquire the reconstructed value of h 1 .
Step 2: Substitute h 1 in Equation ( 13) to obtain P(v 2 = 1|h 1 ) and then employ random sampling to acquire the reconstructed value of v 2 .
Step 4: Update network parameters.The iteration algorithm of network parameters is as follows: where ε is the learning rate, which takes the value of 0.8 in this paper, and the superscript k represents the k-th iteration.

DBN based on Levenberg-Marquardt backpropagation (LMBP) Algorithm
Traditional DBN is formed by stacking multiple RBMs, in which the hidden layer of the previous RBM is used as the visible layer of the next RBM.CD algorithm is used to determine the network parameters layer by layer during pre-training, which is unsupervised learning.Then the pre-trained network parameters are assigned to the neural network as the initial training value of network parameters.The network parameters are fine-tuned by using the sample labels in the training set combined with BP algorithm, which is supervised learning.The structure of a traditional DBN is shown in Figure 2.
Energies 2019, 12, x FOR PEER REVIEW 6 of 18 Since the gradient cannot be directly obtained when using stochastic gradient ascent algorithm to seek the maximum of Equation ( 12), the training of RBM usually applies Contrastive Divergence (CD) algorithm to approximate the gradient of likelihood function [30].The specific steps of RBM training are as follows: Step 1: Substitute the input of the training set as 1 v in Equation ( 14) to obtain ( ) then employ random sampling to acquire the reconstructed value of 1 h .
Step 2: Substitute 1 h in Equation ( 13) to obtain ( ) and then employ random sampling to acquire the reconstructed value of 2 v .
Step 3: Substitute 2 v in Equation ( 14) to obtain ( ) Step 4: Update network parameters.The iteration algorithm of network parameters is as follows: ( ) Where ε is the learning rate, which takes the value of 0.8 in this paper, and the superscript k represents the k-th iteration.In this paper, the LM (Levenberg-Marquardt) BP algorithm [31] is used to replace the traditional BP algorithm to fine-tune the DBN.Compared with the traditional BP algorithm, the LMBP algorithm has faster convergence speed and higher convergence reliability, and is more suitable for training neural networks with many hidden layers and neurons.

DBN based on Levenberg-Marquardt backpropagation (LMBP) Algorithm
Different from the traditional BP algorithm, the LMBP algorithm is based on the Gauss-Newton method in the least square solution.The square of error v is taken as the objective function and the second-order Taylor expansion of objective function is derived.After approximating the gradient of the objective function (ignoring the high-order term), the correction of weight ω is: where µ is the correction coefficient, which is set to prevent J T (ω)J(ω) from being irreversible; I is the identity matrix; J(ω) is the Jacobian matrix of v(ω), which can be written as: where represents the partial derivative of v i to ω j .
Similar to BP algorithm, the modifier expression of weight ω (k+1) in the k-th iteration is: µ needs to be adjusted in each iteration to obtain a better convergence effect.When µ is small, the algorithm is standard Gauss-Newton method, which has higher convergence accuracy.However, if the difference between the objective function and the approximate quadratic function is too large in the iteration, the convergence effect will be poor.When µ is large, the algorithm becomes traditional BP algorithm.When the Gauss-Newton method has a poor convergence performance, the gradient descent BP algorithm can be used as an auxiliary solution.

The Procedure of the PSR-DBN Model
For a list of bus load historical data time series p = p 1 , p 2 • • • p N , the prediction process of the PSR-DBN forecasting model is as follows: Step 1 normalization: The load time series is normalized to prepare for the training of deep belief network, and the maximum and minimum values of data are saved for subsequent denormalization of the load predicted value to restore real value.
Step 2 PSR: The C-C method is adopted to process the load time series to find the optimal embedding dimension m and the optimal delay t of time series.Then the load time series is reconstructed according to the obtained embedding dimension m and delay t.The reconstructed load time series is as follows: where M = N − (m − 1)t.
Energies 2019, 12, 4349 8 of 17 Step 3 DBN: The DBN is constructed by using the phase space matrix in step 2 as the training set and employing cross-validation to optimize the hyperparameters of the network.The details of hyperparameters tuning are introduced in Section 3.2.Finally, the trained DBN is adopted to predict the load value of the future moment.
Step 4 denormalization: The load prediction value returned by the deep belief network in step 3 is denormalized by applying the maximum and minimum values saved in step 1, then the actual load forecasting value is obtained.
The flow chart corresponding to the above steps is shown in Figure 3: Where Step 3 DBN: The DBN is constructed by using the phase space matrix in step 2 as the training set and employing cross-validation to optimize the hyperparameters of the network.The details of hyperparameters tuning are introduced in Section 3.2.Finally, the trained DBN is adopted to predict the load value of the future moment.
Step 4 denormalization: The load prediction value returned by the deep belief network in step 3 is denormalized by applying the maximum and minimum values saved in step 1, then the actual load forecasting value is obtained.
The flow chart corresponding to the above steps is shown in Figure 3:

Determination of DBN Network Structure
Similar to the neural network, DBN also has many hyperparameters to be set.The rationality of hyperparameters adjustment determines the prediction accuracy of the prediction model.
Unreasonable hyperparameters will lead to a significant increase in prediction error.The determination of the network structure is an important part of DBN hyperparameters adjustment.
For the input layer of the DBN, since the original load data is input to DBN after passing through PSR, the number of the input layer neurons of DBN does not need to be tuned and can be directly set to embedding dimensions m, that is, one line of elements in Equation ( 19) is input each time.For the DBN output layer, when the DBN input is a row of elements in Equation ( 19), it is equivalent to inputting the position vector of the moving point in phase space at a certain moment.
Then it needs to output the predicted value of the moving point position vector at the next moment.
However, if the input of the model is ( ) in the phase space matrix of Equation ( 19),  )

Determination of DBN Network Structure
Similar to the neural network, DBN also has many hyperparameters to be set.The rationality of hyperparameters adjustment determines the prediction accuracy of the prediction model.Unreasonable hyperparameters will lead to a significant increase in prediction error.The determination of the network structure is an important part of DBN hyperparameters adjustment.
For the input layer of the DBN, since the original load data is input to DBN after passing through PSR, the number of the input layer neurons of DBN does not need to be tuned and can be directly set to embedding dimensions m, that is, one line of elements in Equation ( 19) is input each time.For the DBN output layer, when the DBN input is a row of elements in Equation (19), it is equivalent to inputting the position vector of the moving point in phase space at a certain moment.Then it needs to output the predicted value of the moving point position vector at the next moment.
However, if the input of the model is p i (1 ≤ i ≤ M) in the phase space matrix of Equation ( 19), only p i+1+(m−1)t in the position vector p i+1 at the next moment is unknown, so the output layer only needs to output the predicted value pi+1+(m−1)t of the load at time i + 1 + (m − 1)t.If i+1 is greater than M, then the phase space matrix of Equation ( 19) needs to be extended downwards, p i+1 is added as a new row, and p i+1 is taken as the input of the model to obtain the predicted value of p i+2+(m−1)t ; Then the matrix is augmented and the predicted value of p i+3+(m−1)t is obtained, and so on until the end of the forecasting.The expression of p i+1 in the augmented matrix is: where the value of p i+1+(m−1)t is determined by the forecasting horizons.If it is one-step forecasting, p i+1+(m−1)t takes the real value of load measured at time i + 1 + (m − 1)t.If it is multi-step forecasting and the number of forecasting steps is k, p i+1+(m−1)t = pi+1+(m−1)t is taken until the k-th step forecasting.and then the predicted value in the augmented matrix is replaced by the real value of the measured load.
For the hidden layer of DBN, the number of hidden layers and neurons in each hidden layer has a significant influence on the prediction result, and it is found that the effect of optimizing the number of hidden layers is usually more obvious.Too few hidden layers will cause the under-fitting to affect the forecasting performance, and too many hidden layers will lead to over-fitting and will make forecasting performance worse.Therefore, in order to improve the prediction accuracy of the PSR-DBN model, cross-validation is used to optimize the number of hidden layers and neurons in each hidden layer.The flow is shown in the blue dotted line in Figure 3.The specific steps are as follows: 1.
Cross-validation method: Considering that the load data is a time series, it is not appropriate to use a K-fold cross-validation method to disrupt the order.Therefore, the last part of the training set is eliminated and used as a verification set by hold-out cross-validation.The rest of the data is kept as a training set.

2.
Determine the optimal number of layers: The enumeration is used to determine the optimal number of hidden layers.Many researchers find that a shallow network requires exponential width (number of neurons in each layer) to implement a function that a deep network of polynomial width could implement [32].That is, compared with the number of layers, the number of neurons in each layer has less influence on prediction, so it is fixed during the enumeration.The number of neurons in each layer is set to be 2m, and the number of hidden layers is increased layer by layer until a significant over-fitting occurs.Then the number of hidden layers with the smallest forecasting error is selected.

3.
Determine the number of neurons in each layer: Since the forecasting performance of DBN varies with the initial value, the effect of changing the number of neurons one by one on forecasting performance is easily submerged in the fluctuation of forecasting performance caused by different initial values.Therefore, this paper uses a fixed step size to search for the superior number of neurons roughly.After determining the number of hidden layers according to step 2, the combination of the number of neurons with minimum prediction error is searched in steps of m in each layer.Because too many neurons will make the training of network slow and bring the risk of over-fitting, the selected search range of this paper is m to 5m, and a good combination of the number of neurons is determined by testing.
In summary, the structure, input, and output of the proposed DBN are shown in Figure 4, in which the number of the hidden layers and the number of neurons in each hidden layer are obtained by cross-validation.

Bus Load Data
In order to test the validity of the model, the load data of a 220-kV substation bus in a city of is used as given in Ref. [22], the prediction horizons are too short to meet the real-time safety analysis and economic dispatching requirements of the power system.However, short-term load forecasting on an hourly scale combined with the interpolation has poor forecasting accuracy.
Therefore, the forecasting horizons of very short-term load forecasting in this paper are from 5 minutes to 1 hour, and the proposed model is validated in the MATLAB (R2018a, MathWorks Inc., Massachusetts, USA) environment.

Forecasting Evaluation Index
In order to more intuitively and accurately evaluate the forecasting performance of the model and the accuracy of prediction, this paper adopts Mean Absolute Percentage Error (MAPE), Mean Absolute Scaled Error (MASE), Symmetric Mean Absolute Error (sMAPE), Geometric Mean Absolute Error (GMAE), and Root Mean Square Error (RMSE) [33] as evaluation indicators.

Bus Load Data
In order to test the validity of the model, the load data of a 220-kV substation bus in a city of China from 1 May to 18 May 2017 are used in this paper.The bus is connected with a distributed photovoltaic power station with an installed capacity of about 50 MW, and the sampling time interval of load data is 5 min.During the period, the substation had no overhaul or fault shutdown.The reliability of historical data is high, and 3σ criteria is used to detect that there is no abnormal data.
In this paper, the data of 1-14 May are selected as the training set of the PSR-DBN forecasting model, and cross-validation is used to adjust the model hyperparameters.The data of 15-18 May are selected as the forecasting test set.If only one-step forecasting of the load in the next 5 min is used as given in Ref. [22], the prediction horizons are too short to meet the real-time safety analysis and economic dispatching requirements of the power system.However, short-term load forecasting on an hourly scale combined with the interpolation has poor forecasting accuracy.Therefore, the forecasting horizons of very short-term load forecasting in this paper are from 5 min to 1 h, and the proposed model is validated in the MATLAB (R2018a, MathWorks Inc., Massachusetts, USA) environment.

Forecasting Evaluation Index
In order to more intuitively and accurately evaluate the forecasting performance of the model and the accuracy of prediction, this paper adopts Mean Absolute Percentage Error (MAPE), Mean Absolute Scaled Error (MASE), Symmetric Mean Absolute Error (sMAPE), Geometric Mean Absolute Error (GMAE), and Root Mean Square Error (RMSE) [33] as evaluation indicators.
Energies 2019, 12, 4349 where n represents the number of predicted samples, p i represents the real value of the load at time i, and pi represents the predicted value of the load at time i.
The smaller the values of each metric, the higher the prediction accuracy of the model.However, these indicators are relative values and need to be compared under the same data scale to be meaningful.

PSR Reconstruction Results
Based on the theoretical analysis of PSR in Section 2.1, the C-C method is employed to reconstruct the phase space of the bus load data from 1 May to 14 May.The corresponding statistics of ∆S(t) and S cor (t) are shown in Figure 5.It can be seen that the first minimum point of ∆S(t) is t = 18, while S cor (t) cannot get the optimal embedding window t ω without an obvious minimum point.However, from the BDS statistics, when N > 3000, m ∈ {2, 3, 4, 5}, so the maximum value of m can only be 5.According to Equation ( 8), the final optimal embedding dimension m opt = 5 and the optimal delay t opt = 18.
    Where n represents the number of predicted samples, i p represents the real value of the load at time i, and ˆi p represents the predicted value of the load at time i.
The smaller the values of each metric, the higher the prediction accuracy of the model.
However, these indicators are relative values and need to be compared under the same data scale to be meaningful.

PSR Reconstruction Results
Based on the theoretical analysis of PSR in Section 2.1, the C-C method is employed to reconstruct the phase space of the bus load data from , so the maximum value of m can only be 5.According to Equation ( 8), the final optimal embedding dimension mopt = 5 and the optimal delay topt = 18.

DBN Hyperparameter Setting
In Section 4.3, the optimal embedding dimension obtained by PSR is mopt = 5.The structure of DBN is determined by the method described in Section 3.2.The number of neurons in the input layer and output layer of DBN is 5 and 1, respectively.
To determine the hidden layer hyperparameter, 13-14 May of the training set is eliminated and used as a verification set; the rest are reserved as the training set.The number of hidden layer neurons of DBN is fixed to 10, the number of layers is increased one by one, and the MAPE of the prediction result on the verification set is used as the criterion.The forecasting horizon is 1 hour, and the MAPE of different hidden layers models is shown in Table 1.

DBN Hyperparameter Setting
In Section 4.3, the optimal embedding dimension obtained by PSR is m opt = 5.The structure of DBN is determined by the method described in Section 3.2.The number of neurons in the input layer and output layer of DBN is 5 and 1, respectively.
To determine the hidden layer hyperparameter, 13-14 May of the training set is eliminated and used as a verification set; the rest are reserved as the training set.The number of hidden layer neurons of DBN is fixed to 10, the number of layers is increased one by one, and the MAPE of the prediction result on the verification set is used as the criterion.The forecasting horizon is 1 h, and the MAPE of different hidden layers models is shown in Table 1.In Table 1, when the number of hidden layers equals 8, the MAPE is significantly increased, and it can be inferred that over-fitting occurs at this time, so the continuation of increasing the number of layers is stopped.When the number of hidden layers is 4, MAPE is the smallest and is 0.9358.Therefore, the optimal number of hidden layers equals 4.
Based on the four hidden layers, the optimal number of neurons was roughly searched by a fixed step size, and the step size was set to 5 neurons.The MAPE of prediction result on the verification set is used as the criterion, and the forecasting horizon is 1 h.The sample space of the search is 5 4 = 625.When the MAPE is the smallest, the number of neurons in each hidden layer is [25,15,20,15].At this time, the MAPE of predicted value on the verification set is 0.8447, which is obviously better than the MAPE = 0.9358 of 4 hidden layers and the number of neurons in each layer is 10.Therefore, the number of neurons in each hidden layer is [25,15,20,15].
In summary, the hyperparameter setting of DBN in this paper is shown in Table 2.The solid black line in Figure 6 is the measured value.It can be inferred that the active output of the photovoltaic power station has large fluctuations, and the bus load curve is severely distorted by the general saddle type, showing irregular fluctuations.If short-term load forecasting is used at this time, it will cause a large error and waste a lot of information.The remaining curves are the forecasting values of the ARIMA model, NN model, PSR-NN model, LSTM model, PSR-DBN model (no tuning), and DBN model.It can be clearly seen that the predicted curves of the ARIMA model with the golden dashed line and the NN model with the blue dashed line deviate significantly from the black measured values, while the forecasting performance of other models cannot be directly judged by curves.In order to more intuitively see the prediction accuracy of each model, the forecasting evaluation indicators and training time of each model are calculated as shown in Table 3.In Table 3, it is seen that the training time of all models meet the requirements of VSTLF.The PSR-DBN model proposed in this paper has the smallest indicators, except for the second smallest RMSE in all seven models.The RMSE of the PSR-DBN model proposed in this paper is only 0.0006 more than the minimum RMSE from the PSR-NN model.The 'Inf' of GAME indicates that the product of local error exceeds the upper limit of double type.It can be seen that the prediction accuracy of the DBN pre-trained by the CD method is better than that of the ordinary NN.The forecasting evaluation indicators of the PSR-DBN model and PSR-NN model, which adds PSR link to reconstruct original data, are also significantly less compared with the DBN model and the NN model.Compared with the PSR-DBN model without tuning, the tuned PSR-DBN model also has less evaluation indicators.Therefore, it can be inferred that the proposed method has higher prediction accuracy and better forecasting performance in the one-hour very short-term prediction of bus load with high distributed energy permeability and large fluctuation.
Figure 7 is a bar graph of the relative error of the predicted values of each model on 17 May.It can be seen from Figure 7 that the prediction errors of the ARIMA model and NN model are larger than those of the other five models, and the time with large prediction errors is concentrated in the noon period.At this time, the power output of the photovoltaic power station is large and vulnerable to clouds and other weather factors, resulting in great fluctuations of output and difficulties in prediction.Although the prediction accuracy of the noon period is not improved after adding the PSR algorithm, the reconstruction of the data reduces the influence of the fluctuation of the historical load data at noontime on the load forecasting of other time periods, thus effectively reducing the relative error of other time periods and improving the prediction accuracy of the model.

Prediction of Different Forecasting Horizons (5 min to 1 h ahead)
In order to verify the adaptability of the proposed PSR-DBN model to different forecasting horizons, the DBN and NN models are still employed in this paper, and the load of 15-18 May is forecasted by using historical data of 1-14 May.The forecasting horizons are 5 min to 1 h, and the corresponding MAPE is calculated, respectively.The MAPE curves of different models vary with forecasting horizons (shown in Figure 8).
(g)   As can be seen in Figure 8, MAPE generally increases with increase of the forecasting horizon.As can be seen in Figure 8, MAPE generally increases with increase of the forecasting horizon.The model proposed in this paper has higher prediction accuracy in most of the forecasting horizon, while the forecasting performance of the NN model is not ideal.When the forecasting horizon is 5 min to half an hour, the DBN model and PSR-DBN model do not have much difference in prediction accuracy.However, when the forecasting horizon is further increased from half an hour to one hour, the advantage of reconstructing data by PSR begins to appear.At this time, the MAPE of the PSR-DBN model is obviously less than that of the pure DBN model.Therefore, the model proposed in this paper can also have a small prediction error within the forecasting horizon of 5 min to 1 h and better adaptability in different forecasting horizons.
As can be seen in Figure 8, MAPE generally increases with increase of the forecasting horizon.The model proposed in this paper has higher prediction accuracy in most of the forecasting horizon, while the forecasting performance of the NN model is not ideal.When the forecasting horizon is 5 min to half an hour, the DBN model and PSR-DBN model do not have much difference in prediction accuracy.However, when the forecasting horizon is further increased from half an hour to one hour, the advantage of reconstructing data by PSR begins to appear.At this time, the MAPE of the PSR-DBN model is obviously less than that of the pure DBN model.Therefore, the model proposed in this paper can also have a small prediction error within the forecasting horizon of 5 min to 1 h and better adaptability in different forecasting horizons.

Conclusions
In this paper, aiming at the adaptability of forecasting horizons, the time series characteristics of the load, and the fluctuation caused by large amounts of distributed power access in bus load forecasting, a very short-term bus load forecasting model based on phase space reconstruction and deep belief network is proposed.The time series is projected by phase space reconstruction as the trajectory of a moving point in phase space, then the excellent non-linear fitting ability of DBN network is applied to fit the trajectory, so as to realize bus load forecasting.This paper also employs a practical

Figure 2 .
Figure 2. Structure of the traditional Deep Belief Network (DBN).

Figure 2 .
Figure 2. Structure of the traditional Deep Belief Network (DBN).

p
at the next moment is unknown, so the output layer only needs to output the predicted value1 ( 1)ˆi m t p + + − of the load at time

DBNFigure 4 .
Figure 4.The input, output, and structure of the network.

China from 1
photovoltaic power station with an installed capacity of about 50 MW, and the sampling time interval of load data is 5 minutes.During the period, the substation had no overhaul or fault shutdown.The reliability of historical data is high, and 3 σ criteria is used to detect that there is no abnormal data.In this paper, the data of 1-14 May are selected as the training set of the PSR-DBN forecasting model, and cross-validation is used to adjust the model hyperparameters.The data of 15-18 May are selected as the forecasting test set.If only one-step forecasting of the load in the next 5 minutes

Figure 4 .
Figure 4.The input, output, and structure of the network.

Figure 5
Figure 5. Curves of

4. 5 . Forecasting Result 4 . 5 . 1 .Figure 6 .
Figure 6.Load forecasting curve one hour ahead of 15-18 May.The solid black line in Figure 6 is the measured value.It can be inferred that the active output of the photovoltaic power station has large fluctuations, and the bus load curve is severely distorted by the general saddle type, showing irregular fluctuations.If short-term load forecasting is used at this time, it will cause a large error and waste a lot of information.The remaining curves are the forecasting values of the ARIMA model, NN model, PSR-NN model, LSTM model, PSR-DBN model

4. 5 . 2 .
Prediction of Different Forecasting Horizons (5 minutes to 1 hour ahead) In order to verify the adaptability of the proposed PSR-DBN model to different forecasting horizons, the DBN and NN models are still employed in this paper, and the load of 15-18 May is forecasted by using historical data of 1-14 May.The forecasting horizons are 5 minutes to 1 hour, and the corresponding MAPE is calculated, respectively.The MAPE curves of different models vary with forecasting horizons (shown in Figure 8).

Figure 8 .
Figure 8.The curve of MAPE varies with forecasting horizons.
The model proposed in this paper has higher prediction accuracy in most of the forecasting horizon, while the forecasting performance of the NN model is not ideal.When the forecasting horizon is 5 minutes to half an hour, the DBN model and PSR-DBN model do not have much difference in prediction accuracy.However, when the forecasting horizon is further increased from half an hour to one hour, the advantage of reconstructing data by PSR begins to appear.At this time, the MAPE of the PSR-DBN model is obviously less than that of the pure DBN model.Therefore, the model proposed in this paper can also have a small prediction error within the forecasting horizon of 5 minutes to 1 hour and better adaptability in different forecasting horizons.

Figure 8 .
Figure 8.The curve of MAPE varies with forecasting horizons.

Table 1 .
Mean Absolute Percentage Error (MAPE) of models with various number of hidden layers.

Table 3 .
Forecasting performance of each model.