A Multi-Stage Planning Method for Distribution Networks Based on ARIMA with Error Gradient Sampling for Source–Load Prediction

As distributed renewable generation, represented by wind power and photovoltaics, continues to expand and load demand gradually changes, the future evolution of the smart distribution network will be driven directly by both distributed generation and user demand. The smart distribution network contains a wide range of flexible resources, and their flexibility and uncertainty pose great challenges to grid data acquisition and control feedback. To enable precise control and feedback of equipment connected to the smart distribution network under a high share of new energy, and to ensure safe system operation, it is urgent to study how the future distribution grid will evolve from the existing one. Hence, a multi-stage planning method for distribution networks based on source–load prediction is proposed in this paper. Firstly, a source–load prediction method based on the autoregressive integrated moving average (ARIMA) model and error gradient sampling is proposed: ARIMA predicts the scale of source–load development, and error gradient sampling generates source–load scenarios within the error intervals. K-means is then used for scenario reduction to explore multiple operating scenarios of the source–load in China’s distribution network, and the unit output forecast intervals and load demand from 2021 to 2030 for typical regions are derived by rolling forecasts that combine historical unit output, end demand, and clean energy share. Secondly, planning models of distribution grid evolution at different stages are constructed to analyze the future evolution of the distribution grid at each load cross-section and to provide a development path reference for the future construction of the distribution grid in China.


Introduction
With the broad access of new energy sources such as wind power and photovoltaics, and with the development and growing controllability of customer-side electricity load, higher requirements are placed on the operation mode and structure of the distribution network. The distribution network is no longer a passive network that simply receives and distributes electricity, but a primary support platform that carries a large amount of distributed energy access and supports electricity interaction among users engaged in production and sales. Because distributed clean energy sources such as wind power and photovoltaics are highly uncertain and volatile, distribution network planning must consider not only the planning constraints of component and equipment investments but also the operation of the network under different distributed renewable output scenarios [1,2]. Beyond the traditional investment in transformers and distribution lines, the allocation and utilization of flexible regulation resources such as energy storage and user-side controllable loads should also be considered.
With the continuous development and construction of new power systems, a large number of new energy sources will be connected to the distribution network, and their uncertainty will have a greater impact on the power grid. At the same time, customer-side load will gradually expand in scale and type from the original single fixed load to multiple flexible loads. The source-load situation of the distribution network will develop from "source follows load" to "load follows source." Therefore, the morphological evolution of the distribution network will be reflected mainly in the addition, removal, and expansion of grid power sources and customer-side equipment.
To cope with the uncertainty introduced by the extensive access of new energy sources and with the interaction of multiple flexible loads on the customer side, the future source-load trend must be analyzed and predicted more accurately, so as to reduce the impact of source-load fluctuations on the distribution network morphology over the long term. Based on the forecasting results for different load types in typical areas, the actual situation in different areas of the distribution network must also be considered, including the development of the urban residential population, the investment and expansion of factories and shopping malls, the growth of diversified loads, and other time-section nodes. Taking these factors into account, a planning model of the power grid is constructed, and the optimal evolution route of the distribution network is formed with the objective of minimizing the total planning cost over the planning period. The framework for analyzing the morphological evolution of the distribution grid at different stages is shown in Figure 1.
Sensors 2022, 22, x FOR PEER REVIEW 4 of 30

Prediction of Source-Load Morphology in Distribution Networks Based on ARIMA and Error Gradient Sampling
The planning elements of the future distribution network will be richer, and various types of sources and loads will have greater variability in physical properties and dynamic characteristics. In particular, access to distributed generation sources, energy storage, electric vehicles, and other devices often has a strong degree of freedom, which leads to a significant increase in the differences and uncertainties of individual user characteristics, which not only makes it difficult to carry out uniform metrics, but also greatly increases the difficulty of analysis and prediction. In this regard, a specific analysis of various source and load planning elements is needed to condense the common features of a large number of source-load responses, while distinguishing the typical individual differences between different sources and loads.
In analyzing the morphological evolution of the distribution network, the evolution trend of the unsupported source-load morphology must be taken as the premise, and China's distribution network contains many types of sources and loads. Source-load prediction therefore needs to capture the development trend effectively and to mine the trend and periodicity of source-load evolution from the current state of development. The ARIMA prediction model can exploit the trend of a series on the basis of existing time series data. Compared with general prediction methods, the model is simple and takes the smoothness (stationarity) of the time series into account when predicting the development trend. The source-load growth rate is introduced to dynamically correct the error interval of adjacent moments during error gradient sampling, from which multiple load scenarios are derived. Based on the load scenarios, the ARIMA model is used to make rolling forecasts of the power sources, which eventually form multi-class source-load prediction scenarios; K-means clustering then reduces the scenarios, and the evolution curve of the source-load development trend is derived.

The Source-Load Prediction Model Based on ARIMA
ARIMA is a time series forecasting method that combines the autoregressive model and the moving average model to form the ARIMA(p, d, q) model. The input data, i.e., the historical source-load data, are transformed into a smooth (stationary) series by differencing; the dependent variable is then regressed only on its own lagged values and on the present and lagged values of the random error term to derive the final source-load forecast. Here p is the number of autoregressive terms, q is the number of moving average terms, and d is the number of differences (order) needed to make the series smooth.
The smoothness of the historical source-load data is first examined using the difference method, which converts non-smooth data into smooth data by removing its non-constant trend. Let $x_t$ denote the historical source-load series on day $t$; the difference is taken between the values of the time series at moments $t$ and $t-1$, as shown in Equation (1).

$$\Delta x_t = x_t - x_{t-1} \tag{1}$$

where $\Delta x_t$ is the first-order difference of the time series at moment $t$. The second-order difference is performed on the first-order difference to observe whether the data are smooth, as shown in Equation (2).

$$\Delta(\Delta x_t) = \Delta x_t - \Delta x_{t-1} = x_t - 2x_{t-1} + x_{t-2} \tag{2}$$

where $\Delta(\Delta x_t)$ is the second-order difference of the time series at moment $t$. The difference order d is the number of differences after which the time series becomes smooth. On this basis the ARIMA model is constructed, and the autoregressive order p and the moving average order q are determined. p and q are usually selected using the autocorrelation function (ACF) and the partial autocorrelation function (PACF). The ACF describes the linear correlation between the time series and its own lagged values, and the PACF describes the linear correlation between the current value and a lagged value after the influence of the intermediate observations has been removed. The formulas for ACF and PACF are as follows.
Let the input sequence be $X = \{x_1, x_2, \ldots, x_t, \ldots, x_n\}$, with mean $u$ as shown in Equation (3) and variance $\sigma^2$ as shown in Equation (4), where the divisor $n$ is used throughout.

$$u = \frac{1}{n}\sum_{t=1}^{n} x_t \tag{3}$$

$$\sigma^2 = \frac{1}{n}\sum_{t=1}^{n} (x_t - u)^2 \tag{4}$$

For two different series X and Y of equal length, the covariance can be used to characterize their correlation:

$$\operatorname{cov}(X, Y) = \frac{1}{n}\sum_{t=1}^{n} (x_t - u_x)(y_t - u_y)$$

where $u_x$ and $u_y$ denote the mean values of series X and Y, respectively. The larger the magnitude of the covariance $\operatorname{cov}(X, Y)$, the stronger the correlation between the sequences X and Y (positive correlation when greater than 0, negative correlation when less than 0). Similarly, for the sequence X, the self-covariance at lag $k$ is

$$c_k = \frac{1}{n}\sum_{t=k+1}^{n} (x_t - u)(x_{t-k} - u)$$

From $c_k$, the autocorrelation coefficient (ACF) is obtained as

$$r_k = \frac{c_k}{c_0}$$

The partial correlation coefficient (PACF) at lag $k$, $\phi_{kk}$, is obtained from the autocorrelations and the Toeplitz matrix as the last component of

$$\phi = R^{-1} r$$

where R is the Toeplitz matrix, expressed by Equation (10), and r is the autocorrelation vector, expressed by Equation (11).

$$R = \begin{bmatrix} 1 & r_1 & \cdots & r_{k-1} \\ r_1 & 1 & \cdots & r_{k-2} \\ \vdots & \vdots & \ddots & \vdots \\ r_{k-1} & r_{k-2} & \cdots & 1 \end{bmatrix} \tag{10}$$

$$r = (r_1, r_2, \ldots, r_k)^{\mathsf{T}} \tag{11}$$
The ACF and PACF of the series are calculated in order to choose appropriate values of the autoregressive order p and the moving average order q. When the PACF falls quickly to 0 beyond a certain lag, that lag is taken as the autoregressive order p; similarly, when the ACF falls quickly to 0 beyond a certain lag, that lag is taken as the moving average order q. The model is then fitted.
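As a minimal sketch of the steps above, the following pure-Python functions compute differences, the sample ACF, and the PACF. The function names and the sample series are illustrative assumptions, the divisor n is assumed for the autocovariance, and the PACF is computed with the Durbin-Levinson recursion, which solves the same Toeplitz system lag by lag instead of inverting R directly.

```python
def difference(x, d=1):
    """Apply d rounds of first-order differencing, x_t - x_{t-1}."""
    for _ in range(d):
        x = [x[t] - x[t - 1] for t in range(1, len(x))]
    return x


def acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag, with r_k = c_k / c_0."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    # c_k: lag-k autocovariance with divisor n (assumed convention)
    c = [sum(dev[t] * dev[t - k] for t in range(k, n)) / n
         for k in range(max_lag + 1)]
    return [ck / c[0] for ck in c]


def pacf(x, max_lag):
    """Partial autocorrelations phi_kk via the Durbin-Levinson recursion."""
    r = acf(x, max_lag)
    out, phi_prev = [1.0], []
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append phi_kk
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


# Hypothetical annual values, for illustration only
series = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 12.0, 11.0, 13.0]
diffed = difference(series, d=1)
print(acf(diffed, 3))
print(pacf(diffed, 3))
```

Lags at which the printed values drop close to zero are the candidates for q (from the ACF) and p (from the PACF).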
Selecting p and q from the ACF and PACF is subjective, so the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) are introduced to choose p and q more objectively. The AIC is given by Equation (12) and the BIC by Equation (13).

$$\mathrm{AIC} = 2K - 2\ln L \tag{12}$$

$$\mathrm{BIC} = K\ln n - 2\ln L \tag{13}$$

where $L$ denotes the maximum likelihood of the model, $K$ denotes the number of model parameters, and $n$ is the number of samples. AIC and BIC balance the prediction error against the model complexity, and the model order is determined as the one that minimizes the information criterion.
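The order selection described above can be sketched as follows. The log-likelihood values are hypothetical placeholders rather than results from this paper, and the parameter count K = p + q + 1 (coefficients plus noise variance) is an assumed convention.

```python
import math


def aic(log_likelihood, k):
    """AIC = 2K - 2 ln L."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """BIC = K ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_likelihood


def select_order(candidates, n, criterion="aic"):
    """Pick the (p, q) pair with the smallest criterion value.

    candidates maps (p, q) to the maximised log-likelihood ln L of that model.
    """
    def score(item):
        (p, q), ll = item
        k = p + q + 1  # assumed parameter count: AR + MA terms + noise variance
        return aic(ll, k) if criterion == "aic" else bic(ll, k, n)

    return min(candidates.items(), key=score)[0]


# Hypothetical log-likelihoods for four candidate models
candidates = {(1, 1): -42.0, (1, 2): -41.8, (2, 1): -41.5, (0, 1): -48.0}
print(select_order(candidates, n=30, criterion="aic"))
print(select_order(candidates, n=30, criterion="bic"))
```

A larger model is accepted only if its gain in log-likelihood outweighs the penalty on K, which is how the criteria correct a purely visual reading of the ACF and PACF plots.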
After the difference order d, the autoregressive order p, and the moving average order q are determined, the ARIMA(p, d, q) model is formed, and the differenced (smoothed) time series is predicted as shown in Equation (14).

$$x_t = \alpha_1 x_{t-1} + \alpha_2 x_{t-2} + \cdots + \alpha_p x_{t-p} + u_t \tag{14}$$

where $x_t$ is the predicted output of the autoregressive model at moment $t$, $x_i$, $i \in [t-p, t-1]$, is the historical input sequence, and $u_t$ is the random perturbation. When the random perturbation term is assumed to be white noise, Equation (14) can be rewritten in the following form.

$$x_t = \sum_{i=1}^{p} \alpha_i x_{t-i} + \varepsilon_t \tag{15}$$

where $\varepsilon_t$ is the white noise term and $\alpha_i$, $i = 1, \ldots, p$, are the weighting factors of the historical input series. If $u_t$ is not white noise, a moving average of order q is used to represent it:

$$u_t = \varepsilon_t + \beta_1 \varepsilon_{t-1} + \cdots + \beta_q \varepsilon_{t-q} \tag{16}$$

where $\varepsilon_i$, $i \in [t-q, t-1]$, denotes the white noise series and $\beta_j$, $j = 1, \ldots, q$, are the weighting factors of the white noise sequence. In particular, when $x_t = u_t$, i.e., the current value of the time series is not related to the historical values and depends only on the linear combination of historical white noise, the moving average term is obtained as

$$x_t = \varepsilon_t + \sum_{j=1}^{q} \beta_j \varepsilon_{t-j} \tag{17}$$

The ARIMA(p, d, q) model is obtained by combining the autoregressive model and the moving average model on the differenced series, as shown in the following equation.

$$x_t = \sum_{i=1}^{p} \alpha_i x_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \beta_j \varepsilon_{t-j} \tag{18}$$
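To make the combined model concrete, the sketch below evaluates a one-step forecast from given coefficients and then maps a forecast of the second-order difference back to the original scale. The function names, coefficients, and data are illustrative assumptions; estimating the coefficients is outside the scope of this sketch.

```python
def arma_one_step(history, residuals, alpha, beta):
    """One-step forecast on the differenced series.

    Uses sum_i alpha_i * x_{t-i} + sum_j beta_j * eps_{t-j}; the current
    noise term eps_t is unknown at forecast time and is replaced by its
    mean of zero.
    """
    ar = sum(a * history[-i] for i, a in enumerate(alpha, start=1))
    ma = sum(b * residuals[-j] for j, b in enumerate(beta, start=1))
    return ar + ma


def undo_second_difference(dd_next, x_prev, x_prev2):
    """Invert x_t - 2*x_{t-1} + x_{t-2} = dd_next to recover x_t."""
    return dd_next + 2 * x_prev - x_prev2


# Hypothetical second-order differences and past residuals
dd_history = [0.5, -0.2, 0.3]
residuals = [0.1, -0.05, 0.02]
dd_hat = arma_one_step(dd_history, residuals, alpha=[0.6], beta=[0.4])

# Last two observed values on the original scale (hypothetical)
x_hat = undo_second_difference(dd_hat, x_prev=120.0, x_prev2=115.0)
print(dd_hat, x_hat)
```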

Source-Load Scene Generation Based on Error Gradient Sampling Method
To generate specific source-load scenarios for the next 10 years, this paper uses random sampling. In the random sampling process, however, the source-load value at the future moment t + 1 is constrained by the value at moment t: once the source-load scenario at moment t has been determined, the source-load distribution at moment t + 1 no longer satisfies the initial predicted distribution. To solve this problem, this paper introduces the concept of the source-load growth rate, whose expression is given in Equations (19) and (20).
where $P_{t+1}$ and $P_t$ are the source-load values at moments t + 1 and t, respectively, and $r_{P,\min}$ and $r_{P,\max}$ are the minimum and maximum growth rates of the historical source-loads, respectively. After the source-load scenario at moment t has been randomly generated, the predicted distribution at moment t + 1 is dynamically corrected as follows.

$$P^{t+1}_{\min} = \begin{cases} P^{t+1}_{\mathrm{init},\min}, & \text{if } P^{t+1}_{\mathrm{mean}} - P^{t+1}_{\min} > r_{P,\min} \\ P^{t+1}_{\mathrm{mean}} - r_{P,\min}, & \text{else} \end{cases} \tag{21}$$

$$P^{t+1}_{\max} = \begin{cases} P^{t+1}_{\mathrm{init},\max}, & \text{if } P^{t+1}_{\max} - P^{t+1}_{\mathrm{mean}} > r_{P,\max} \\ P^{t+1}_{\mathrm{mean}} + r_{P,\max}, & \text{else} \end{cases} \tag{22}$$

where $P^{t+1}_{\max}$, $P^{t+1}_{\min}$ and $P^{t+1}_{\mathrm{mean}}$ are the initial predicted maximum, minimum and trend baseline values of the source-load at time t + 1, respectively.
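The idea of correcting the sampling interval at each step can be illustrated with the simplified sketch below. It is not a transcription of the correction formula above: it assumes a relative growth rate (P_{t+1} - P_t) / P_t, positive values, uniform sampling, and a clipping rule that intersects the forecast interval with the growth-feasible range. All names and numbers are hypothetical.

```python
import random


def sample_scenario(p0, lower, upper, r_min, r_max, rng):
    """Draw one trajectory whose step-to-step growth stays in [r_min, r_max].

    lower/upper hold the initial forecast interval bounds of each future step.
    """
    path, prev = [], p0
    for lo, hi in zip(lower, upper):
        # Range reachable from the previous sample under the growth limits
        g_lo, g_hi = prev * (1 + r_min), prev * (1 + r_max)
        lo_t, hi_t = max(lo, g_lo), min(hi, g_hi)  # corrected interval
        if lo_t <= hi_t:
            value = rng.uniform(lo_t, hi_t)
        else:
            # No overlap: keep the growth limit nearest to the forecast interval
            value = g_lo if g_lo > hi else g_hi
        path.append(value)
        prev = value
    return path


rng = random.Random(42)
mean = [104.0, 108.0, 113.0, 118.0, 124.0]  # hypothetical trend baseline
lower = [m * 0.9 for m in mean]
upper = [m * 1.1 for m in mean]
scenarios = [sample_scenario(100.0, lower, upper, 0.0, 0.08, rng)
             for _ in range(5)]
for s in scenarios:
    print([round(v, 1) for v in s])
```

Because each interval is corrected from the previously drawn value, no generated trajectory contains a jump that the historical growth rates would rule out.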

Multi-Stage Morphological Evolution Planning Model of Distribution Network
According to the above source-load prediction results, the distribution network can access a variety of flexible adjustment resources at different time points and adjust the operation mode of the network by optimizing its own scheduling or adding flexible adjustment resources. In this chapter, the multi-stage planning model of the distribution network is constructed to analyze the morphological evolution route of distribution network at different time points.

Objective Function
As the proportion of new energy sources such as wind and solar energy is gradually increasing, and demand-side resources are more widely involved in power grid operation and dispatching, when analyzing the evolution of distribution network, it is necessary to consider the development trend of customer-side resources and wind and solar resources. Based on this, the planning model of distribution network is constructed. The planning model takes the minimum planning cost of the distribution network system as the objective function, and considers the distribution network loss, system equipment investment cost and equipment operation cost, as shown in Equation (23).
where $C_e$ denotes the time coefficient of the power grid loss cost; $E_{\mathrm{eloss}}$ denotes the power loss of the distribution network; $C_s$ denotes the initial operation cost of the power distribution system; $C_M$ denotes the total operation and maintenance cost of the distribution system equipment; $C_{MP}$ is the operation and maintenance cost of the equipment in the first year; $n$ is the total number of pieces of equipment that the system needs to build; $C_j$ is the initial investment cost per unit capacity of equipment $j$; $Q_j$ is the capacity of equipment $j$; $t_r$ is the tax rate; $i$ is the interest rate; $L_p$ is the equipment planning year; $r_M$ is the maintenance rate of the equipment; $\alpha_n$ is the development degree of the system in the $n$th year; $C_Q$ is the cost of curtailed (abandoned) wind and solar power; $C_{QP}$ is the loss from curtailed wind and solar power in the first year of the system; $C_r$ is the penalty coefficient for curtailed wind and solar power; and $P_r$ is the amount of curtailed wind and solar power.

Equipment Capacity Constraint
The capacity constraints of the devices involved above are shown in Equation (29).
where $Q_{EX,k}$ denotes the actual reserve capacity of resource $k$; $Q^{\min}_{EX,k}$ and $Q^{\max}_{EX,k}$ denote the minimum and maximum reserve capacity of resource $k$; $P_j$ denotes the actual output of device $j$; $P^{\min}_j$ and $P^{\max}_j$ denote the minimum and maximum output of device $j$; $P_{u,j}$ denotes the actual upward ramp (climbing) rate of device $j$; $P^{\min}_{u,j}$ and $P^{\max}_{u,j}$ denote the minimum and maximum upward ramp rate of device $j$; $P_{d,j}$ denotes the actual downward ramp rate of device $j$; and $P^{\min}_{d,j}$ and $P^{\max}_{d,j}$ denote the minimum and maximum downward ramp rate of device $j$.
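Each quantity defined above is bounded by its own minimum and maximum, so a feasibility check reduces to a set of range tests. The sketch below assumes this simple box form; the helper name, the dictionary keys, and the numbers are hypothetical.

```python
def violated(values, limits):
    """Return the names of quantities that fall outside their [min, max] range."""
    return [name for name, v in values.items()
            if not (limits[name][0] <= v <= limits[name][1])]


# Hypothetical device data, for illustration only
limits = {"reserve": (0.0, 5.0), "output": (1.0, 10.0),
          "ramp_up": (0.0, 2.0), "ramp_down": (0.0, 2.0)}
values = {"reserve": 3.0, "output": 11.0, "ramp_up": 1.5, "ramp_down": 0.5}
print(violated(values, limits))  # ['output']
```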

Power Grid Operation Constraint
The distribution network system model mainly includes power balance constraint, nodal voltage constraint, three-phase power flow constraint, line power constraint, generator output constraint, and generator climbing constraint.
where $P_{ij,t}$ denotes the transmission power between node $i$ and node $j$ at time $t$; $P_{g,t}$ denotes the active power of the generator at time $t$; $P^{WT}_{i,t}$ denotes the active power of wind power at node $i$ at time $t$; $P^{PV}_{i,t}$ denotes the active power of photovoltaics at node $i$ at time $t$; $P^{indl}_{i,t}$, $P^{ecol}_{i,t}$ and $P^{resl}_{i,t}$ denote the industrial, commercial and residential load power at node $i$, respectively; $P^{EV,in}_{i,t}$ and $P^{EV,out}_{i,t}$ denote the charging power and the discharging power of the electric vehicles at node $i$ at time $t$; $Q_{ij,t}$ denotes the reactive power between node $i$ and node $j$ at time $t$; $Q_{g,t}$ denotes the reactive power of the generator at time $t$; $Q^{EV}_{i,t}$ denotes the reactive power of the electric vehicles at node $i$ at time $t$; $Q^{indl}_{i,t}$, $Q^{ecol}_{i,t}$ and $Q^{resl}_{i,t}$ denote the industrial, commercial and residential load reactive power at node $i$, respectively; $V_{i,t}$ denotes the voltage of node $i$ at time $t$; $G_{ij}$ denotes the conductance between node $i$ and node $j$; $B_{ij}$ denotes the susceptance between node $i$ and node $j$; $\theta_{ij}$ denotes the voltage phase angle between node $i$ and node $j$; $P^{\max}_{ij}$ denotes the maximum transmission power; $P^{\min}_g$ and $P^{\max}_g$ denote the lower and upper limits of the generator active power, respectively; $Q^{\min}_g$ and $Q^{\max}_g$ denote the lower and upper limits of the generator reactive power, respectively; and $RU_g$ and $RD_g$ denote the upward and downward ramp (climbing) limits of the generator.

Morphological Evolution Planning Analysis of Typical Distribution Network Based on Source-Load Prediction
In this paper, the source-load prediction model proposed in Section 4.1 is used to predict the trend of the nodes of a typical example system, which is shown in Figure 2. In this example, the load demand of each node is equivalently converted from the predicted load of typical areas. Nodes 1, 5, 6, 7, and 10 carry mainly industrial loads, and nodes 1 and 7 have the highest total loads of all nodes. The total loads of nodes 2, 8, and 9 are lower: the economic growth of the region represented by node 2 is slower, with commercial and industrial loads accounting for roughly equal shares, while nodes 8 and 9 are relatively small and dominated by commercial loads because of their geographical location. The load distribution and total load of the remaining nodes are more uniform.
Nodes 1, 4, and 7 also have a higher demand for electric vehicles, which is consistent with their larger industrial loads. Owing to the impact of the epidemic, the load growth of each node is small before 2024; after 2024 the economy is back on track and the resident population in the various regions gradually rises, driving load growth in commercial complexes. As the distribution network evolves morphologically, some nodes will be connected to new energy, and urban users will also carry out demand response with adjustable loads.
In order to verify the effectiveness of the method proposed in this paper, two scenarios are set up for comparative analysis: S1: Single-stage planning scenario of distribution network considering the source-load prediction; S2: Multi-stage planning scenario of distribution network considering the source-load prediction.
The effectiveness of the proposed method is verified by comparing the results of the two scenarios. Because the ARIMA model produces its forecast one step at a time, a rolling forecast is used for multi-step forecasting, in which each predicted value is fed back as input for the next step; a schematic diagram of the rolling forecast is shown in Figure 3.
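The rolling forecast can be sketched as a loop around any one-step predictor. In the sketch below, `drift_predictor` is a deliberately simple stand-in (last value plus average past change) used only to keep the example self-contained; it is an assumption and not the fitted ARIMA model of this paper.

```python
def rolling_forecast(series, steps, one_step):
    """Forecast `steps` values by repeatedly predicting one step ahead.

    Each prediction is appended to the working window so that it becomes
    an input for the next prediction.
    """
    window = list(series)
    forecasts = []
    for _ in range(steps):
        nxt = one_step(window)
        forecasts.append(nxt)
        window.append(nxt)
    return forecasts


def drift_predictor(window):
    """Stand-in one-step model: last value plus the mean historical change."""
    diffs = [window[t] - window[t - 1] for t in range(1, len(window))]
    return window[-1] + sum(diffs) / len(diffs)


# Hypothetical history; any callable taking the window can replace drift_predictor
print(rolling_forecast([10, 12, 14], steps=3, one_step=drift_predictor))
# [16.0, 18.0, 20.0]
```

Since later steps are built on earlier predictions, forecast errors accumulate along the horizon, which is one reason to carry an error interval alongside the trend.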


Prediction Analysis of the Coming Distributed Generations Morphology in the Distribution Network
The annual load, gas turbine output, wind power output, and photovoltaic output data of a region are used to predict the corresponding quantities for the region over the next 10 years. The historical data of load, gas turbine output, wind power output, and photovoltaic output are shown in Figure 4.
In Figure 4, the blue curve is the historical output of gas turbines, which shows a downward trend from 2013 to 2019. The red curve is the wind power output, which is growing rapidly. The yellow curve is the photovoltaic output, which is small but still growing overall.
Firstly, the ARIMA model is established for the historical load data, and the ADF unit root test is used to test the stability of the original historical load, the first-order difference, and the second-order difference sequences. The results are shown in Figure 5. It can be seen from Figure 5 that the ADF test statistic of the second-order difference sequence is −5.78, much smaller than those of the first-order difference and the original sequence, indicating that the second-order difference is more stable; its values also fluctuate around 0, so the difference order d of ARIMA is 2.
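To show what such a unit root statistic measures, the sketch below computes the basic Dickey-Fuller statistic, i.e., the t-statistic of gamma in the regression dx_t = c + gamma * x_{t-1} + e_t. This is a simplification of the ADF test: the lagged difference terms of the augmented version are omitted, no critical values are evaluated, and the function name and simulated data are assumptions.

```python
import math
import random


def df_statistic(x):
    """t-statistic of gamma in  dx_t = c + gamma * x_{t-1} + e_t."""
    y = [x[t] - x[t - 1] for t in range(1, len(x))]  # dx_t
    z = x[:-1]                                       # x_{t-1}
    n = len(y)
    zm, ym = sum(z) / n, sum(y) / n
    sxx = sum((v - zm) ** 2 for v in z)
    sxy = sum((z[i] - zm) * (y[i] - ym) for i in range(n))
    gamma = sxy / sxx
    c = ym - gamma * zm
    rss = sum((y[i] - c - gamma * z[i]) ** 2 for i in range(n))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of gamma
    return gamma / se


# Simulated random walk (non-stationary) and its first difference (stationary)
rng = random.Random(0)
walk, level = [], 0.0
for _ in range(200):
    level += rng.gauss(0.0, 1.0)
    walk.append(level)
steps = [walk[t] - walk[t - 1] for t in range(1, len(walk))]
print(df_statistic(walk), df_statistic(steps))
```

The more negative the statistic, the stronger the evidence against a unit root, which is why the differenced series yields the smaller value.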
ACF and PACF are used to select p and q for the ARIMA model. The results of ACF and PACF for the second-order differenced load sequence are shown in Figure 6. It can be seen from Figure 6 that both ACF and PACF cut off after the first lag, that is, the values become very small after the first point, so p and q should be taken as 1. However, because selecting p and q from ACF and PACF is subjective, p and q are corrected by Equations (12) and (13).
The prediction process for gas turbine output, wind power output, and photovoltaic output over the next 10 years is similar to the process above, so it is not repeated here; the modeling and prediction results are given directly. The fitted models are ARIMA(1,2,3), ARIMA(1,2,0), and ARIMA(0,2,0), and the prediction results are shown in Figures 7-9, respectively.
On the basis of the historical gas turbine output curve, combined with the characteristics of distributed generation at the present stage [24], the prediction trend and error interval of the gas turbine over the next 10 years are formed. With large-scale access to new energy in the future, the utilization rate of primary energy gradually decreases, and the output of traditional units also shows a gradual downward trend. Because the future energy development trend cannot be obtained exactly, the rolling prediction uses three times the variance of the data of adjacent years to generate the error interval, as shown in Figure 7a. The proposed error gradient sampling method is used to generate gas turbine output trend scenarios within the error interval, and K-means clustering is used to reduce the generated scenarios, as shown in Figure 7b,c. On this basis, the final selected gas turbine output curve for the next 10 years is shown in Figure 7d; the overall output shows a downward trend.
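The scenario reduction step can be sketched with a plain K-means loop over trajectories, where each cluster centre becomes a representative scenario and the share of members gives its weight. The function name, the seeded random initialisation, the fixed iteration count, and the toy data are assumptions made for this illustration.

```python
import random


def kmeans(scenarios, k, iters=50, seed=0):
    """Reduce scenarios to k centres; returns (centres, weights)."""
    rng = random.Random(seed)
    centers = [list(s) for s in rng.sample(scenarios, k)]
    labels = [0] * len(scenarios)
    for _ in range(iters):
        # Assignment step: nearest centre by squared Euclidean distance
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2
                                        for a, b in zip(s, centers[c])))
                  for s in scenarios]
        # Update step: each centre becomes the mean of its members
        for c in range(k):
            members = [s for s, lab in zip(scenarios, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    weights = [labels.count(c) / len(scenarios) for c in range(k)]
    return centers, weights


# Hypothetical two-year trajectories forming a low and a high group
scenarios = [[1.0, 2.0], [1.2, 2.1], [0.9, 1.9],
             [10.0, 20.0], [10.5, 19.5], [9.5, 20.5]]
centers, weights = kmeans(scenarios, k=2)
print(sorted(centers), weights)
```

In practice the trajectories would be the sampled 10-year output curves, and the number of clusters k is a modelling choice.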
There are abundant wind power resources in some areas of China. With the development of new power systems and breakthroughs in wind power technology, the output of wind power in China has generally shown an upward trend over the past 10 years. Based on the analysis of the development trend of wind power [25], ARIMA combined with the three times variance principle is used to predict the output of wind power in the next 10 years, and the trend and error interval of wind power output are obtained, as shown in Figure 8. On this basis, the error gradient sampling method is used to generate multiple wind power output trend scenarios within the error interval, and K-means clustering is used to reduce them, as shown in Figure 8b,c. The finally selected wind power output scenario is shown in Figure 8d. It can be seen that the wind power output will show a rapid upward trend in the next 10 years. However, because the development of wind power resources is driven by updates in technology, the growth of wind power output will occasionally slow down during this period.
The historical data show that the historical output curve of photovoltaics is similar to that of wind power. ARIMA and the three times variance principle are used to predict the photovoltaic output, and the output curve and error interval of photovoltaics in the next 10 years are obtained, as shown in Figure 9a. The error gradient sampling method is used to generate photovoltaic output scenarios within the error interval, and K-means clustering is used to reduce the scenarios. Finally, the typical photovoltaic output scenario is selected, as shown in Figure 9b,d. It can be seen that photovoltaic output, like wind power output, grows rapidly in the next 10 years, which is in line with the characteristics of China's future new power system construction. In the future, wind power and photovoltaics will be connected to the power grid on a large scale and will gradually become the main body of power generation in China.

Prediction Analysis of the Coming Multi-Type Load Morphology in the Distribution Network
The grid search method is used to traverse p ∈ [0, 5] and q ∈ [0, 5]. The minimum AIC of 72.227 is obtained at p = 0, q = 2, so the ARIMA (0, 2, 2) model is finally established. The model is trained, and the load data for the next 10 years are predicted. The results are shown in Figure 10.
Figure 10. Initial load prediction results.
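The order selection described above can be sketched as an exhaustive search that keeps the (p, d, q) combination with the smallest AIC. In this sketch the model fit is abstracted behind a caller-supplied function `aic_of`; the function `toy_aic` is a made-up AIC surface used only to make the example runnable, and is not the paper's data or fitted model.

```python
import itertools

def grid_search_order(aic_of, d=2, max_p=5, max_q=5):
    """Return (best_order, best_aic) over p in [0, max_p] and q in [0, max_q]."""
    best_order, best_aic = None, float("inf")
    for p, q in itertools.product(range(max_p + 1), range(max_q + 1)):
        aic = aic_of((p, d, q))  # caller fits the model and returns its AIC
        if aic < best_aic:
            best_order, best_aic = (p, d, q), aic
    return best_order, best_aic

def toy_aic(order):
    # Hypothetical stand-in for a real model fit: a made-up surface
    # whose minimum is placed at p = 0, q = 2 for illustration only.
    p, _, q = order
    return 72.227 + 1.5 * p + 0.8 * (q - 2) ** 2

best_order, best_aic = grid_search_order(toy_aic)
print(best_order, best_aic)
```

In practice `aic_of` would wrap an actual fit, for example returning the `aic` attribute of a fitted statsmodels `ARIMA(series, order=order)` result, and would return infinity for orders that fail to converge.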
From Figure 10, it can be seen that the load in the region will show an increasing trend in the next 10 years, and its value fluctuates within an error range that satisfies the Gaussian distribution. After processing by the proposed sampling method, the final load scenarios of node 1 in the next 10 years are shown in Figure 11. Then, K-means clustering is used to reduce the generated scenarios. The number of clustering centers k is 4, and the typical load scenarios in the next 10 years after reduction are shown in Figure 12. From the clustering results, a typical scenario is randomly selected for analysis, as shown in Figure 13. Using the same method, the load scenarios of the other nodes in the next 10 years are obtained, and the results are shown in Figure 14.
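The scenario reduction step can be illustrated with a small, dependency-free K-means sketch with k = 4. The scenario generator here draws uniformly inside a hypothetical error band and is only a stand-in for the error gradient sampling method, whose details are not reproduced in this section. The forecast, band widths, and scenario count are likewise assumed values.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(scenarios, centers):
    """Label each scenario with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(s, centers[j]))
            for s in scenarios]

def kmeans(scenarios, k, iters=50, seed=0):
    """Plain K-means; returns (centers, probability of each cluster)."""
    rng = random.Random(seed)
    centers = [list(s) for s in rng.sample(scenarios, k)]
    for _ in range(iters):
        labels = assign(scenarios, centers)
        new_centers = []
        for j in range(k):
            members = [s for s, lab in zip(scenarios, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(scenarios, centers)
    probs = [labels.count(j) / len(scenarios) for j in range(k)]
    return centers, probs

# Hypothetical 10-year forecast with a widening error band (not paper data).
rng = random.Random(1)
forecast = [100 + 5 * t for t in range(10)]
half_width = [2 + t for t in range(10)]
scenarios = [[f + rng.uniform(-w, w) for f, w in zip(forecast, half_width)]
             for _ in range(200)]

centers, probs = kmeans(scenarios, k=4)
for c, p in zip(centers, probs):
    print(round(p, 3), [round(v, 1) for v in c])
```

Each center is a typical 10-year trajectory, and its probability is the share of generated scenarios assigned to it.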
It can be seen from Figure 14 that the load of each node is on the rise, with nodes 1, 4, and 7 showing a large increase and the other nodes showing slow load growth. At the same time, in order to study the composition of each node load, the proportions of industrial heating load, industrial refrigeration load, other industrial load, commercial heating load, commercial refrigeration load, other commercial load, residential heating load, residential refrigeration load, and other residential load of each node from 2010 to 2020 are analyzed, and the proportion distribution of each node load in the next 10 years is derived statistically. For a load with an obvious change trend, the mean and standard deviation of its growth rate are calculated, and the growth rate at time t + 1 is randomly selected using the 3σ principle. For a load with no obvious change trend, the mean and standard deviation of its proportion are calculated, and the proportion at time t + 1 is randomly selected using the 3σ principle. In the corresponding formulas, $p_{o}^{t+1}$ and $p_{o}^{t}$ denote the load values with obvious proportion changes at time t + 1 and t, $p_{no}^{t+1}$ denotes the load value whose proportion change is not obvious at time t + 1, $r_{P,mean}$ and $r_{P,std}$ denote the mean and standard deviation of the growth rate of the loads with obvious proportion changes, $P_{no,mean}$ and $P_{no,std}$ denote the mean and standard deviation of the loads with no obvious proportion change, and random (*) is a random function. The proportion of each type of load at node 1 from 2021 to 2030 is obtained through the above method, as shown in Figures 15 and 16.
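The two sampling rules above can be sketched as follows. This is an illustration under stated assumptions, not the paper's formula: random (*) is taken to be a uniform draw within mean ± 3σ, the sampled shares are renormalized to sum to 1, only three load types are used instead of nine, and every number is hypothetical.

```python
import random

def next_shares(shares, trending, rate_stats, share_stats, rng):
    """Sample load-type shares at t + 1 and renormalize them to sum to 1.

    trending:    load types with an obvious change trend
    rate_stats:  {type: (mean, std)} of the growth rate for trending types
    share_stats: {type: (mean, std)} of the share for non-trending types
    """
    raw = {}
    for name, share in shares.items():
        if name in trending:
            mean, std = rate_stats[name]
            rate = rng.uniform(mean - 3 * std, mean + 3 * std)  # assumed 3-sigma draw
            raw[name] = share * (1 + rate)
        else:
            mean, std = share_stats[name]
            raw[name] = rng.uniform(mean - 3 * std, mean + 3 * std)
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}  # assumed renormalization

# Hypothetical values for illustration only.
shares = {"industrial": 0.50, "commercial": 0.25, "residential": 0.25}
trending = {"commercial"}
rate_stats = {"commercial": (0.03, 0.005)}
share_stats = {"industrial": (0.50, 0.01), "residential": (0.25, 0.01)}

rng = random.Random(0)
shares_next = next_shares(shares, trending, rate_stats, share_stats, rng)
print(shares_next)
```

Applying `next_shares` repeatedly, year by year, gives one possible trajectory of the load composition over the planning horizon.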
Figure 16. Node 1 load prediction by type.
As shown in Figure 15, the load of node 1 in the next 10 years is mainly industrial load, which accounts for about 50% of the total load demand. Within it, the proportion of other industrial loads is about 0.35, and the proportions of commercial load and residential load are about 0.25. The load proportions at node 1 remain basically unchanged over the 10 years, with the proportion of industrial load declining and the proportion of commercial load increasing. The overall stability is mainly because industrial load has always been the main part of the regional load, with large industrial users occupying a major position. It can be seen from Figure 16 that although the change in load proportions is small, the annual load demand continues to increase. The total load of node 1 in 2030 is about 27.09% higher than that in 2021, which is mainly due to China's future economic development and population. From 2021 to 2030, the industrial load will increase by about 19.06%, the commercial load by about 53.89%, and the residential load by about 24.12%. The commercial load therefore shows the strongest growth in the next 10 years, mainly because urban development will eventually favor comfort and user satisfaction. The industrial load will gradually shift to areas with lower cost or smaller population, and the commercial load in more populated areas will gradually increase. Heating and cooling loads account for only a small proportion of the industrial, commercial, and residential loads, which are still dominated by other loads.

Multi-Stage Planning Results of Distribution Network
For the source-load demand from 2021 to 2030 predicted in Section 3.2, the evolution route of the distribution network in the next ten years is analyzed on the IEEE 14-node distribution system under the predicted typical source-load scenarios. The node source-load data of the 14-node distribution system are shown in Figure 17. On the basis of Section 5.3, the time cross-section of source-load development is considered, as shown in Figure 18. Combined with the source-load prediction results and development trend in Section 5.2, it can be seen that the load demand shows a large increase in 2025, rising by about 25% compared with the previous year, and that by 2030 the total load demand has increased by 40%. Therefore, considering that there is a time cross-section between 2021 and 2030, a multi-stage evolution route of the distribution network is formed, as shown in Tables 1 and 2. The morphological evolution of the distribution network is shown in Figures 19-22.
As a result of the double carbon target, China has started to reduce carbon emissions. With cars being main greenhouse gas emitters, the use of clean-energy electric vehicles is an inevitable trend. Therefore, in the future evolution of the distribution network, electric vehicles will participate in power grid planning as a main load. By comparing Figures 19-22 and considering the load growth cross-section in 2025, it can be found that the addition of electric vehicles and of industrial and commercial refrigeration and heating equipment shows a certain timeliness and is mostly concentrated in 2021 and 2023. This is because the load changes little before 2025, and the electric vehicles and the industrial, commercial, and residential refrigeration and heating equipment configured in 2021 can effectively support the overall operation of the distribution network. The power grid can rely on its own dispatching operation to balance load supply and demand, and there is no need to configure new equipment. In 2025, the load shows massive growth. Before 2025, the scale of new energy is still expanding and new energy is only considered for local consumption. After 2025, the installed capacity of new energy increases massively, and the original local balance can no longer accommodate the output of new energy such as wind power and photovoltaics. Therefore, in addition to adding electric vehicles and industrial, commercial, and residential refrigeration and heating equipment, new energy gradually forms cross-regional return with external power grids to promote its consumption.
It can be seen from Table 2 that in the distribution network planning from 2025 to 2030, electric vehicle charging stations are added at nodes 8 and 9, and residential heating equipment is added at nodes 5 and 14.

Comparative Analysis of Different Scene Configuration Results
Based on the above-proposed distribution network source-load prediction results, the distribution network is planned in a single phase without considering the increase or decrease in source-load in a specific year. The development of the distribution network is always accompanied by changes in energy, the alternation of new and old towns, and the addition of emerging entities (such as factories, load aggregators, residential communities, etc.). In view of the changes in node load demand and the trend of total node load under the access of different entities, considering the configuration of electric vehicles, industrial, commercial, and residential heating equipment and refrigeration equipment, the results of resource optimization configuration of the distribution network from 2021 to 2030 are shown in Table 3.
Comparing the planning costs of the S1 and S2 scenarios shows that the total planning cost is about CNY 121,900.69 in the S1 scenario and CNY 117,289.49 in the S2 scenario. Compared with the S1 scenario, the planning cost of S2, which considers multi-stage planning, is reduced by about 3.8%, indicating that multi-stage planning improves the economy of planning compared with single-stage planning. From Table 3, it can be found that when single-stage planning is carried out without considering the change in the load cross-section, most of the equipment in the S1 scenario is configured in the first year of planning, and the configured capacity is large compared with the configuration results of S2. This is mainly because single-stage planning does not consider the time cross-section change and minimizes the cost of the entire planning cycle, so allocating the majority of the capacity in the first year yields the optimal system planning cost. Electric vehicle charging stations are planned at each node, mostly in the first year. According to the load prediction results of Section 5.1, the electric vehicle load of each node shows an increasing trend year by year, and the electric vehicle loads of node 1 and node 7 account for a large proportion of the electric vehicle load of the whole system. Calculating the proportion of electric vehicle load in the total node load for all nodes shows that this proportion varies little between nodes. Nodes 1 and 7 nevertheless have a much larger electric vehicle charging station allocation capacity than the other nodes, because their total load demand is larger than that of the other nodes.
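The relative saving of multi-stage planning can be checked directly from the two total costs quoted above. The short calculation below uses only those two figures; the variable names are illustrative.

```python
cost_s1 = 121_900.69  # CNY, single-stage planning (S1)
cost_s2 = 117_289.49  # CNY, multi-stage planning (S2)

saving = cost_s1 - cost_s2
saving_pct = 100 * saving / cost_s1

print(f"saving = CNY {saving:,.2f} ({saving_pct:.2f}%)")
```

The difference is CNY 4,611.20, or roughly 3.8% of the S1 cost.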
The configuration of industrial equipment is mostly concentrated at node 1 and node 7, mainly because industrial load accounts for a large proportion of the load at these two nodes, and the share of industrial heat load demand at node 7 is relatively large. However, in the early stage of planning, the thermal load demand of the nodes can be met through the dispatching instructions of the power grid, so industrial heating equipment is not configured until 2023. For industrial refrigeration, the cooling load demand of node 1 is high, so refrigeration equipment is already configured in 2021. When configuring equipment for commercial loads, nodes 3, 8, and 9 are considered to have a small geographical area and a smaller overall share of the required load than the other nodes, while nodes 4 and 5 have a larger demand, so these two nodes are allocated a large proportion of the capacity.
Comparing the configuration results of the S1 and S2 scenarios, namely Tables 2 and 3, it can be found that compared with the single-stage planning results, when considering 2025 as the time cross-section, the distribution network does not need to plan most of the capacity in the first year, which can effectively reduce the cost of the equipment configuration. Before the second stage configuration, the system can balance the source-load through its own optimal scheduling. When the line reaches the maximum capacity or the load demand increases significantly, the second configuration is performed. Compared with the overall configuration of the single-stage configuration in the first year, the equipment configuration considering the time cross-section of the distribution network can effectively reduce the planning cost of the distribution network and flexibly adjust the planning scheme of the power grid.

Analysis of Distribution Network Dispatch Results in Years without New Equipment
From the above analysis, it can be found that distribution network equipment is configured only in certain years. In the years without new configuration, the power grid needs to plan the power output reasonably through its own dispatching operation to balance supply and demand at each node of the network. The power output curve of the interaction between the distribution network and the external power grid in each stage year is shown in Figure 23. It can be seen from Figure 23 that after the capacity configuration of the distribution network is completed in 2021, the load is far less than the configured capacity, so the new energy units have additional output that the main network helps to consume. After that, owing to the continuous growth of the load, the power output within the distribution network is not enough to meet the load demand. At this time, it is necessary to purchase electricity from the main network to supply the node loads, so the interactive power increases. In 2025, as shown in Figure 24, the addition and expansion of node equipment in the distribution network alleviates the pressure on new energy output, and the interactive power is reduced compared with 2024, although it still shows an overall growth trend. Between 2020 and 2024, the power output of the distribution network is shown in Figure 25.
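The behavior of the interactive power described above follows from a simple balance, sketched here under simplifying assumptions: network losses are ignored, positive values mean purchase from the main network, negative values mean surplus delivered to it, and all numbers are hypothetical rather than taken from Figure 23.

```python
def interactive_power(load, local_generation):
    """Exchange with the main network for each year.

    Assumed sign convention: positive = purchased from the main network,
    negative = surplus local output delivered to the main network.
    """
    return [l - g for l, g in zip(load, local_generation)]

# Hypothetical yearly totals: load grows while local output stays fixed.
load = [80.0, 95.0, 110.0, 125.0]
local_generation = [100.0, 100.0, 100.0, 100.0]

exchange = interactive_power(load, local_generation)
print(exchange)
```

With these illustrative values the exchange changes sign once the load exceeds local output, which mirrors the shift from surplus consumption by the main network to electricity purchase.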
It can be seen from the above that 2024 is a time cross-section of load change. Before 2024, the load growth trend is relatively flat. According to Table 1, compared with the single-stage planning evolution, the equipment configuration capacity is smaller. Therefore, the power output is dominated by the gas turbines at node 1 and node 2, with a smaller output from the new energy units, and the output of each unit can meet the load demand at each node of the distribution network.
It can be seen from Figure 26 that the load will increase greatly in 2025, which is the result of the economic recovery after the domestic epidemic and the construction of new towns. Therefore, most of the equipment expansion and new equipment installation is carried out in 2025 to meet the load demand of the power grid. At the same time, it can be seen from Figure 26

Conclusions
In this paper, ARIMA and the error gradient sampling method are used to predict the future power structure and load demand of the distribution network, and multi-stage planning of a typical distribution network is then carried out based on the prediction results and the planning model of the coming source-load morphology. The main conclusions are as follows.

1. The ARIMA prediction model can be used to effectively explore the trend of power supply and load and take into account the smoothness of the prediction sequence. The error gradient sampling method can effectively divide the error interval formed