Identification of Distribution Network Topology and Line Parameter Based on Smart Meter Measurements

Abstract: Accurate line parameters are the basis for the optimal control and safety analysis of distribution networks. The lack of real-time monitoring equipment in grids has meant that data-driven identification methods have become the main tool to estimate line parameters. However, frequent network reconfigurations increase the uncertainty of distribution network topologies, creating challenges in the data-driven identification of line parameters. In this paper, a line parameter identification method compatible with an uncertain topology is proposed, which reduces the model complexity of the joint identification of topology and line parameters by removing the unconnected branches through noise reduction. In order to improve the solving accuracy and efficiency of the identification model, a two-stage identification method is proposed. First, the initial values of the topology and line parameters are quickly obtained using a linear power flow model. Then, the identification results are modified iteratively based on the classical power flow model to achieve a more accurate estimation of the grid topology and line parameters. Finally, a simulation analysis based on IEEE 33- and 118-bus distribution systems demonstrates that the proposed method can effectively estimate the topology and line parameters, and is robust with regard to both measurement errors and grid structures.


Introduction
Accurate line parameters are the basis of state estimation, security control and other advanced applications in energy management systems (EMS). With the continuous expansion of power grids, the operation of distribution networks has become complex and changeable, and dynamic network reconfiguration is frequent [1,2]. Unlike transmission networks, distribution networks have not realized the real-time monitoring of topology changes or periodic checks of line parameters; therefore, EMS can only rely on the topology and line impedance data recorded in grid-planning files. However, adjustments to the operation structure of a distribution network and changes in the environments around grid lines can lead to deviations between recorded data and actual parameters, which seriously affects the accuracy of EMS safety analysis and control decisions. Therefore, for a distribution network with missing or outdated topology information, an effective line parameter identification method is of great significance for improving operation stability and management [3,4].
Fortunately, the rapid deployment of advanced metering infrastructure (AMI) in distribution networks has enabled high-density historical data to be obtained for line parameter identification [5,6]. In [7-9], line parameters were estimated based on nonlinear least squares (NLS), reweighted NLS and augmented state estimation, respectively. In [10], a graphical learning algorithm based on physical parameters was proposed, and the stochastic gradient descent method was used to estimate line parameters; however, the identification efficiency was low in networks with a large number of lines. In [11], a fast graphical learning method was proposed for a large-scale distribution network, which improved the efficiency of parameter estimation. In [12,13], line parameters were identified using particle swarm optimization and a machine learning algorithm, respectively; however, these had little physical interpretability and could not guarantee the generalization of an identification model to different distribution networks. In [14], a hierarchical estimation strategy for line parameters was proposed based on the generalized equations of line voltage drops, and reinforcement learning was used to improve the robustness of the line impedance estimation method to measurement errors. Although the above methods can identify line parameters, they all require an accurate network topology as prior information.
The uncertainty of dynamic distribution network topologies limits the use of the above identification methods in engineering applications. In this regard, some scholars have proposed joint identification methods for distribution network topologies and line parameters. In [15,16], topology identification was transformed into a generalized low-rank approximation problem, and the error-in-variables (EIV) model was used to realize the joint identification of topology and line impedance in a maximum-likelihood estimation framework. In [17], the distribution network topology and line impedance were estimated through the iteration of the Kalman filtering method and the Newton-Raphson method. In [18], a deep-shallow neural network was proposed, and the network topology and parameters were identified based on reinforcement learning. In [19], a complex recursive grouping algorithm for the unsupervised identification of topology and line parameters was adopted, which is applicable in distribution networks with latent nodes. The methods proposed in [15-19] are based on the data collected by phasor measurement units (PMU), which synchronously sample the power and voltage measurement data of buses based on global positioning system (GPS) time references.
However, due to the limitations of economic cost, PMU devices have not been fully deployed in many distribution networks [20], and smart meters are generally used as alternative measurement equipment. Unlike PMUs, smart meters cannot measure voltage angles. There are several identification methods based on smart meter data. In [21], a linear regression model was established to construct the network topology and calculate the line parameters sequentially from the leaf nodes to the root nodes; however, the computation required by the algorithm increased rapidly for large-scale distribution networks. In [22], the measurement data of voltage magnitudes and power injections were used to estimate the impedance distance between observed buses, and the operation topology and line parameters were iteratively identified. In [23], a topology label matrix was constructed based on the LinDistFlow model, and the topology and line parameters were estimated by feature clustering. Nevertheless, the methods proposed in [22,23] require a large number of samples, so their effectiveness cannot be guaranteed in a distribution network with frequent topology changes, and they only take the radial grid structure into consideration.
As above, existing parameter identification methods generally require prior information on grid topology, PMU devices, or months of measurement data. To overcome these limitations, we propose a method to identify topology and line parameters simultaneously that does not rely on prior information of the grid topology or parameters. Consequently, it addresses uncertainty in the topology, which may change flexibly and frequently in distribution networks. First, a linear power flow model is established according to the operating characteristics of the distribution network, and the voltage angle information is ignored temporarily. The initial identification results of distribution network topology and line admittance are quickly obtained through the simplified model. Then, the identification results are modified through decoupling iterative optimization among variables, aiming to reduce the influence of modeling errors on the accuracy of the parameter estimation. Finally, we demonstrate the effectiveness of our method on IEEE test cases, and the robustness of the proposed algorithm is verified through a sensitivity analysis.
The main contributions of this paper are given as follows: (1) All the measurement data are provided by smart meters, so the method does not rely on expensive equipment such as PMUs. Additionally, the number of samples required by the algorithm is modest: only one day of historical data with a five-minute sampling period is needed, which largely avoids topology changes during data collection; (2) The method accurately identifies the topology and effectively estimates the line parameters; moreover, it is suitable for large-scale distribution networks and superior to methods that can only estimate either the topology or the line parameters; (3) The proposed identification algorithm is robust to measurement noise, and is applicable to distribution grids with different structures, including weakly meshed grids.
The rest of this paper is organized as follows. Section 2 provides the mechanism analysis of line parameter identification and demonstrates the overall flow of the proposed method. Sections 3 and 4 introduce the initial identification strategy and the enhanced identification strategy, respectively. Section 5 validates the performance of our method on IEEE 33- and 118-bus distribution systems. Section 6 concludes the paper.

Mechanism Analysis of Line Parameter Identification in Dynamic Distribution Network
The admittance matrix contains both the topology and the line parameter information of the distribution network, so this paper solves for the estimated admittance matrices $\hat{G}$ and $\hat{B}$ based on the power and voltage data collected by smart meters. Similar to state estimation, the objective function is established according to the minimum variance between the actual power injection and the power calculated by $\hat{G}$ and $\hat{B}$, which can be expressed as follows.
$$\min_{\hat{G},\hat{B}} \sum_{i=1}^{N} \left[ (p_i - \hat{p}_i)^2 + (q_i - \hat{q}_i)^2 \right] \tag{1}$$
where $N$ is the total number of buses in the distribution network, $p_i$ and $q_i$ are the active and reactive power injection of bus $i$, respectively, and $\hat{p}_i$ and $\hat{q}_i$ are the estimated active and reactive power injection of bus $i$, respectively. According to the classical power flow equations, the estimated power injection and the estimated admittance matrix are constrained as follows.
$$\hat{p}_i = |v_i| \sum_{k=1}^{N} |v_k| \left( \hat{G}_{ik}\cos\theta_{ik} + \hat{B}_{ik}\sin\theta_{ik} \right) \tag{2}$$
$$\hat{q}_i = |v_i| \sum_{k=1}^{N} |v_k| \left( \hat{G}_{ik}\sin\theta_{ik} - \hat{B}_{ik}\cos\theta_{ik} \right) \tag{3}$$
where $|v_i|$ and $|v_k|$ are the voltage magnitudes of bus $i$ and bus $k$, respectively, $\theta_{ik}$ is the voltage angle difference between bus $i$ and bus $k$, and $\hat{G}_{ik}$ and $\hat{B}_{ik}$ are the estimated conductance and susceptance between bus $i$ and bus $k$, respectively. In order to simplify the notation without loss of generality, the following will uniformly use $v$ to represent $|v|$. In addition, according to the characteristics of the admittance matrix, the optimization model needs to meet the following constraints.
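(1) Where there is no connecting branch between bus $i$ and bus $k$, $\hat{G}_{ik} = \hat{B}_{ik} = 0$.
(2) For the non-diagonal elements of the admittance matrix ($i \neq k$), we always have:
$$\hat{G}_{ik} \le 0 \tag{4}$$
$$\hat{B}_{ik} \ge 0 \tag{5}$$
(3) Since the shunt resistance in the distribution network can be neglected [15], the diagonal elements of the admittance matrix always satisfy:
$$\hat{G}_{ii} = -\sum_{k \neq i} \hat{G}_{ik} \tag{6}$$
$$\hat{B}_{ii} = -\sum_{k \neq i} \hat{B}_{ik} \tag{7}$$
(4) The admittance matrix is symmetric:
$$\hat{G} = \hat{G}^{T} \tag{8}$$
$$\hat{B} = \hat{B}^{T} \tag{9}$$
where $[\cdot]^{T}$ represents the matrix transpose.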
When the topology information of the dynamic distribution grid is unavailable, we can construct a mixed-integer programming problem by introducing a binary variable representing the line connection status to jointly identify the grid topology and line parameters; however, there is hardly any effective method for solving this problem. Furthermore, for two buses without a connecting branch, it is difficult to optimize the corresponding elements in $\hat{G}$ and $\hat{B}$ exactly to 0. In this paper, the value of each unconnected line in the estimated admittance matrix is set to 0 by noise reduction, which implicitly encodes the topology information and reduces the number of optimization variables. On this basis, a two-stage identification method for jointly estimating the distribution network topology and line parameters is proposed to ensure the accuracy of the identification model.
The principle of noise reduction is that, due to the constraints of power flow and the characteristics of the admittance matrix, if there is no connecting branch between bus $i$ and bus $k$, then $\hat{G}_{ik}$ and $\hat{B}_{ik}$ will be gradually optimized to numbers close to 0; consequently, the value of $|\hat{G}_{ik}|/|\hat{G}_{ii}|$ will be sufficiently small. Therefore, we calculate $|\hat{G}_{ik}|/|\hat{G}_{ii}|$ for each non-diagonal element $\hat{G}_{ik}$ in $\hat{G}$; if the value is less than the noise reduction threshold $\omega_g$, the corresponding branch is eliminated, and $\hat{G}_{ik}$ and $\hat{B}_{ik}$ are set to 0. Since the distribution network usually adopts a radial or weakly meshed grid structure, the number of branches connected to bus $i$ is much lower than $N - 1$; therefore, $\omega_g$ can be properly set to $1/(N - 1)$.
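The thresholding rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `noise_reduction`, the NumPy array representation, and the toy 3-bus values are hypothetical and not taken from the paper, and every diagonal element of the conductance estimate is assumed to be non-zero.

```python
import numpy as np

def noise_reduction(G_hat, B_hat, omega_g=None):
    """Zero out off-diagonal entries whose ratio |G_ik| / |G_ii| is below omega_g."""
    G = np.array(G_hat, dtype=float)
    B = np.array(B_hat, dtype=float)
    n = G.shape[0]
    if omega_g is None:
        omega_g = 1.0 / (n - 1)  # default threshold suggested in the text
    # Row-wise ratio of each element to the diagonal element of its row
    ratio = np.abs(G) / np.abs(np.diag(G))[:, None]
    drop = ratio < omega_g
    np.fill_diagonal(drop, False)  # diagonal elements are never removed
    G[drop] = 0.0
    B[drop] = 0.0
    return G, B

# Hypothetical 3-bus chain 1-2-3 with small spurious values on the (1,3) pair
G_noisy = [[1.0, -1.0, 0.01], [-1.0, 2.0, -1.0], [0.02, -1.0, 1.0]]
B_noisy = [[-2.0, 2.0, -0.03], [2.0, -4.0, 2.0], [-0.01, 2.0, -2.0]]
G_clean, B_clean = noise_reduction(G_noisy, B_noisy, omega_g=0.1)
```

In this toy case only the spurious (1,3) and (3,1) entries fall below the threshold, so the chain structure is preserved.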
In order to further verify the rationality of the proposed criterion, we use four IEEE test grids to calculate $|G_{ik}|/|G_{ii}|$ for each branch. As shown in Table 1, for all branches in the four grids, the values of $|G_{ik}|/|G_{ii}|$ are greater than $\omega_g$. Accordingly, we can effectively remove the unconnected branches by noise reduction. The overall flow of the proposed two-stage identification method is shown in Figure 1. In stage 1, the approximate topology and line parameters are quickly obtained based on linear regression, and the number of optimization variables is reduced significantly. On this basis, the identification results are modified by decoupling the iterative optimization between variables in stage 2. At the same time, the unconnected branches are further eliminated by noise reduction. The final identification result is output once the iteration converges.
Energies 2024, 17, x FOR PEER REVIEW

Initial Identification Strategy Based on Linear Regression
The nonlinearity of power flow equations may turn the identification model into a non-convex optimization problem, which is difficult to solve; its linearization can improve the convergence and robustness of the identification algorithm [24,25]. Therefore, a linear power flow model is established in this stage, and we propose an initial identification strategy based on linear regression. Although the simplification of power flow equations and the neglect of the voltage angle affect the identification accuracy of line parameters to a certain extent, the main purpose of this stage is to quickly remove most of the unconnected branches and reduce the number of optimization variables, as well as to provide the basic topology and initial values of line parameters for stage 2. In that stage, the estimated admittance matrix will be further modified based on a classical power flow model to improve the accuracy of the identification results.
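According to the characteristics of a normally operating distribution grid, we can obtain the following approximations [24-26]:
(1) The voltage magnitude of bus $i$ is close to 1 p.u., which implies that $\Delta v_i = v_i - 1$ is a small number.
(2) The voltage angle difference $\theta_{ik}$ is usually less than 5°, so we have:
$$\sin\theta_{ik} \approx \theta_{ik} \tag{10}$$
$$\cos\theta_{ik} \approx 1 \tag{11}$$
With the approximations discussed above, the active power flow equation can be expressed as follows.
$$\frac{p_i}{v_i} = \sum_{k=1}^{N} v_k \left( G_{ik}\cos\theta_{ik} + B_{ik}\sin\theta_{ik} \right) \tag{12}$$
Replacing $v_k$ with $1 + \Delta v_k$, and combining Equations (10) and (11), we obtain:
$$\frac{p_i}{v_i} = \underbrace{\sum_{k} G_{ik}}_{(a)} + \underbrace{\sum_{k} G_{ik}\Delta v_k}_{(b)} + \underbrace{\theta_i \sum_{k} B_{ik}}_{(c)} + \underbrace{\left(-\sum_{k} B_{ik}\theta_k\right)}_{(d)} + \underbrace{\sum_{k} B_{ik}\Delta v_k\theta_{ik}}_{(e)} \tag{13}$$
Equations (6) and (7) imply that terms (a) and (c) in Equation (13) are 0; furthermore, because term (e) contains the product of $\Delta v_k$ and $\theta_{ik}$, it is much smaller than terms (b) and (d); therefore, we can omit term (e) and obtain:
$$\frac{p_i}{v_i} = \sum_{k} G_{ik}\Delta v_k - \sum_{k} B_{ik}\theta_k \tag{14}$$
Analogously, we can obtain the linear equation of reactive power flow as follows.
$$\frac{q_i}{v_i} = -\sum_{k} B_{ik}\Delta v_k - \sum_{k} G_{ik}\theta_k \tag{15}$$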
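Equations (14) and (15) can be rewritten in matrix form, and we can obtain the linear power flow model, as follows.
$$\begin{bmatrix} P/V \\ Q/V \end{bmatrix} = \begin{bmatrix} G & -B \\ -B & -G \end{bmatrix} \begin{bmatrix} \Delta V \\ \theta \end{bmatrix} \tag{16}$$
where $P/V = [p_1/v_1 \ldots p_N/v_N]^T$, $Q/V = [q_1/v_1 \ldots q_N/v_N]^T$, $\Delta V = [\Delta v_1 \ldots \Delta v_N]^T$ and $\theta = [\theta_1 \ldots \theta_N]^T$ are vectors containing the data of all buses.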
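As a quick numerical sanity check of the linearization, the following sketch compares the exact value of p_i/v_i from the classical power flow equation with the linear form p_i/v_i ≈ Σ_k G_ik Δv_k − Σ_k B_ik θ_k, which is what the approximations above yield when the row sums of G and B vanish. The 3-bus admittance values, voltages and angles are hypothetical numbers chosen only for illustration.

```python
import numpy as np

def exact_p_over_v(G, B, v, theta):
    """Classical power flow: p_i / v_i = sum_k v_k (G_ik cos(theta_ik) + B_ik sin(theta_ik))."""
    dth = theta[:, None] - theta[None, :]  # theta_ik = theta_i - theta_k
    return (v[None, :] * (G * np.cos(dth) + B * np.sin(dth))).sum(axis=1)

def linear_p_over_v(G, B, v, theta):
    """Linearized form: p_i / v_i ~ sum_k G_ik dv_k - sum_k B_ik theta_k."""
    return G @ (v - 1.0) - B @ theta

# Hypothetical 3-bus chain; the rows of G and B sum to zero (no shunt elements)
G = np.array([[5.0, -5.0, 0.0], [-5.0, 10.0, -5.0], [0.0, -5.0, 5.0]])
B = np.array([[-10.0, 10.0, 0.0], [10.0, -20.0, 10.0], [0.0, 10.0, -10.0]])
v = np.array([1.0, 0.98, 0.97])           # voltage magnitudes in p.u.
theta = np.array([0.0, -0.01, -0.015])    # voltage angles in rad

exact = exact_p_over_v(G, B, v, theta)
linear = linear_p_over_v(G, B, v, theta)
gap = np.max(np.abs(exact - linear))
```

For operating points of this kind the gap stays small relative to the injections, which is why the linear model is a reasonable starting point for stage 1.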
Assuming that we obtain $M_1$ independent samples without voltage angle information, it is necessary to further simplify Equation (16). Considering that the voltage angle difference between the slack bus and any other bus is typically small, Equation (16) is simplified in stage 1 by omitting the voltage angle terms, and $\hat{G}$ and $\hat{B}$ are then solved by linear regression over the $M_1$ samples.
In order to eliminate the unconnected branches, $\hat{G}$ and $\hat{B}$ should be modified by noise reduction. That is, we traverse $\hat{G}$ and set $\hat{G}_{ik}$ and $\hat{B}_{ik}$ to 0 if the value of $|\hat{G}_{ik}|/|\hat{G}_{ii}|$ is less than $\omega_g$. Additionally, $\hat{G}$ and $\hat{B}$ should preserve the symmetry of the admittance matrix, thus we symmetrize $\hat{G}$ and $\hat{B}$ to obtain $\hat{G}_{Sym}$ and $\hat{B}_{Sym}$; for convenience, we still denote $\hat{G}_{Sym}$ and $\hat{B}_{Sym}$ as $\hat{G}$ and $\hat{B}$, respectively. Moreover, in order to eliminate the influence of the removed branches on the regression results, we renew the remaining non-zero elements in $\hat{G}$ and $\hat{B}$ by regression on the retained columns, where $\hat{G}_i$ and $\hat{B}_i$ are the non-zero elements in the $i$-th row of $\hat{G}$ and $\hat{B}$, respectively, $[P/V]_i$ and $[Q/V]_i$ are vectors corresponding to the measurement data of bus $i$, and $\Delta V_i$ is a voltage matrix corresponding to the column indexes of the non-zero elements in the $i$-th row.
The steps of noise reduction, symmetrization, and element renewing are repeated until $\hat{G}$ and $\hat{B}$ remain unchanged between iterations; we can then obtain the initial identification results of the topology and line parameters.
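The stage 1 loop (regression, noise reduction, symmetrization, element renewing) might be sketched as follows. Several details here are assumptions rather than the paper's exact formulas: the samples are stacked row-wise into M1-by-N arrays, the angle-free model is taken as P/V ≈ ΔV·G and Q/V ≈ −ΔV·B, symmetrization is done by averaging with the transpose and dropping a pair if either side was removed, and the function name `stage1_initial_estimate` is hypothetical. The synthetic data are generated directly from the linear model, so the test case is noise-free.

```python
import numpy as np

def stage1_initial_estimate(dV, PV, QV, omega_g, max_iter=20):
    """dV, PV, QV: (M1, N) arrays holding v - 1, p / v and q / v for each sample."""
    n = dV.shape[1]
    # Unrestricted least squares with the angle terms dropped
    G = np.linalg.lstsq(dV, PV, rcond=None)[0].T
    B = -np.linalg.lstsq(dV, QV, rcond=None)[0].T
    for _ in range(max_iter):
        G_old, B_old = G.copy(), B.copy()
        # Noise reduction: mark entries with |G_ik| / |G_ii| below the threshold
        drop = np.abs(G) / np.abs(np.diag(G))[:, None] < omega_g
        np.fill_diagonal(drop, False)
        keep = ~(drop | drop.T)
        # Symmetrization (assumed form: average with the transpose)
        G = 0.5 * (G + G.T) * keep
        B = 0.5 * (B + B.T) * keep
        # Element renewing: refit each row on its retained columns only
        for i in range(n):
            cols = np.flatnonzero(keep[i])
            G[i, cols] = np.linalg.lstsq(dV[:, cols], PV[:, i], rcond=None)[0]
            B[i, cols] = -np.linalg.lstsq(dV[:, cols], QV[:, i], rcond=None)[0]
        if np.allclose(G, G_old) and np.allclose(B, B_old):
            break
    return G, B

# Hypothetical 4-bus chain used to generate noise-free synthetic samples
rng = np.random.default_rng(0)
G_true = np.array([[1.0, -1.0, 0.0, 0.0], [-1.0, 3.0, -2.0, 0.0],
                   [0.0, -2.0, 5.0, -3.0], [0.0, 0.0, -3.0, 3.0]])
B_true = -2.0 * G_true
dV = rng.normal(0.0, 0.02, size=(50, 4))
PV = dV @ G_true
QV = -dV @ B_true
G_est, B_est = stage1_initial_estimate(dV, PV, QV, omega_g=0.1)
```

With real measurements the neglected angle terms act as modeling error, so the output is only an initial value for stage 2.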

Enhanced Identification Strategy Based on Decoupling Iterative Optimization
In stage 1, the influence of the voltage angle on line parameter identification accuracy is temporarily ignored, and there are certain modeling errors in the linear regression model. Hence, in this stage, we take the change of voltage angle into consideration, and iteratively modify $\hat{G}$ and $\hat{B}$, i.e., the outputs of stage 1, based on the classical power flow model, so as to obtain more accurate identification results.
In stage 2, the variables of the optimization problem include the branch admittance and the voltage angle. Optimizing all variables simultaneously may slow down the algorithm, or even make the iterative process difficult to converge. In order to improve the robustness of the optimization algorithm, we take the branch admittance as the main optimization variable and propose a decoupling iterative optimization method between the voltage angle and the branch admittance. In each iteration, we first obtain the estimated voltage angle corresponding to each data sample through a pseudo power flow calculation; on this basis, the adaptive ridge regression model is established to modify $\hat{G}$ and $\hat{B}$, and the topology is further corrected by noise reduction. The iterative process ends when the convergence condition is satisfied.
In the pseudo power flow calculation, an existing power system simulation tool, such as MATPOWER (version 7.1) [27], is used to build the distribution network model; taking the power injection measurements and $\hat{G}$, $\hat{B}$ as inputs, the voltage angle information is obtained through the power flow calculation.
Once the estimation value of the voltage angle is acquired, we are able to establish a more accurate mathematical model to solve the line parameters.

Mathematical Optimization Model and Its Solving Algorithm
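Taking line admittance as variables, the power flow equations can be rewritten as follows.
$$p_i = \sum_{j:\, l_{ij} \in E} \left[ g_{ij}\left( v_i^2 - v_i v_j \cos\theta_{ij} \right) - b_{ij} v_i v_j \sin\theta_{ij} \right] \tag{25}$$
$$q_i = \sum_{j:\, l_{ij} \in E} \left[ -b_{ij}\left( v_i^2 - v_i v_j \cos\theta_{ij} \right) - g_{ij} v_i v_j \sin\theta_{ij} \right] \tag{26}$$
where $g_{ij}$ and $b_{ij}$ are the conductance and susceptance of line $l_{ij}$, respectively, and $E$ is the set of connected lines. We take Equation (25) as an example to illustrate the establishment process of the mathematical optimization model. Equation (25) suggests that there is a linear relationship between $p_i$ and $g_{ij}$, $b_{ij}$. We denote the set of connected lines obtained in stage 1 as $\hat{E}$, with cardinality $\hat{M}$; then the matrix form of Equation (25) can be expressed as follows:
$$p = a_g g + a_b b \tag{27}$$
where $p = [p_1 \ldots p_N]^T$ is the active power injection measurement, $g = [g_1 \ldots g_{\hat{M}}]^T$ and $b = [b_1 \ldots b_{\hat{M}}]^T$ are vectors containing the line conductance and susceptance in $\hat{E}$, and $a_g, a_b \in R^{N \times \hat{M}}$. For the $m$-th line, with $i = \tau(m,1)$ and $j = \tau(m,2)$, their non-zero elements can be expressed as follows.
$$a_g(i,m) = v_i^2 - v_i v_j \cos\theta_{ij}, \quad a_g(j,m) = v_j^2 - v_j v_i \cos\theta_{ji} \tag{28}$$
$$a_b(i,m) = -v_i v_j \sin\theta_{ij}, \quad a_b(j,m) = -v_j v_i \sin\theta_{ji} \tag{29}$$
where $\tau \in R^{\hat{M} \times 2}$, and its first and second columns store the start bus and end bus index of each line in $\hat{E}$, respectively. We denote $a_p = [a_g \; a_b]$ and $x = [g; b]$; then Equation (27) can be expressed as follows:
$$p = a_p x \tag{30}$$
A minimized loss function model for solving line parameters can be built based on Equation (30). We denote the number of data samples in stage 2 as $M_2$, and the optimization model based on the square loss function can be expressed as follows:
$$\min_{x} \left\| A_1 x - y_1 \right\|_2^2 \tag{31}$$
where $A_1 = [a_{p1}; \ldots; a_{pM_2}]$ and $y_1 = [p_1; \ldots; p_{M_2}]$.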
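To make the construction of the regression matrix concrete, the sketch below builds a_p = [a_g a_b] for one sample from a line list and then recovers the line admittances by ordinary least squares over several stacked samples. It assumes the series admittance convention y_ij = g_ij + j·b_ij (so b_ij is negative for an inductive line), zero-based bus indices, and known angles; `build_a_p` and all numeric values are hypothetical.

```python
import numpy as np

def build_a_p(v, theta, tau):
    """Return the N x 2M matrix [a_g a_b] for one sample; tau lists (start, end) buses."""
    n, m = len(v), len(tau)
    a_g = np.zeros((n, m))
    a_b = np.zeros((n, m))
    for col, (i, j) in enumerate(tau):
        for s, e in ((i, j), (j, i)):  # each line contributes to both of its end buses
            d = theta[s] - theta[e]
            a_g[s, col] = v[s] ** 2 - v[s] * v[e] * np.cos(d)
            a_b[s, col] = -v[s] * v[e] * np.sin(d)
    return np.hstack([a_g, a_b])

# Hypothetical 3-bus chain with two lines
tau = [(0, 1), (1, 2)]
g_true = np.array([5.0, 4.0])
b_true = np.array([-10.0, -8.0])
x_true = np.concatenate([g_true, b_true])

rng = np.random.default_rng(1)
rows, rhs = [], []
for _ in range(5):  # five synthetic samples (the role of M2 in the text)
    v = rng.uniform(0.95, 1.05, 3)
    theta = rng.uniform(-0.05, 0.05, 3)
    a_p = build_a_p(v, theta, tau)
    rows.append(a_p)
    rhs.append(a_p @ x_true)  # noise-free active power injections
A1 = np.vstack(rows)
y1 = np.concatenate(rhs)
x_hat = np.linalg.lstsq(A1, y1, rcond=None)[0]
```

Because the unknowns enter linearly once the angles are fixed, each pass of stage 2 reduces to a regression problem of this shape.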
In order to prevent overfitting and enhance the stability of the solution, the $L_2$ regularization term is added to the loss function to construct the ridge regression model as follows:
$$\min_{x} \left\| A_1 x - y_1 \right\|_2^2 + \lambda \left\| x \right\|_2^2 \tag{32}$$
where $\lambda$ is the regularization parameter. The regularization term of the ridge regression model penalizes all parameters of $x$ equally; consequently, biased estimates may occur when the distribution grid contains several lines with relatively large admittance values. Therefore, an adaptive ridge regression model, Equation (33), is established, in which the penalty on $x$ is weighted element-wise by $w$; here '•' is the matrix dot product operator, and $w \in R^{\hat{M} \times 2}$ is given by Equation (34), where both $\delta$ and $\gamma$ are constant parameters. Referring to [28], we select $\delta = 10^{-5}$ and $\gamma = 2$. $x_0$ is the initial estimated value of $x$, which can be obtained by the least squares method as follows:
$$x_0 = \left( A_1^T A_1 \right)^{-1} A_1^T y_1 \tag{35}$$
Equations (33)-(35) indicate that the adaptive ridge regression model adds a penalty weight coefficient to each variable according to the initial estimated value of the corresponding line parameter. The weight coefficient is smaller when the admittance of the corresponding line is larger; as a result, the degree of shrinkage is reduced.
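The effect of the adaptive weights can be illustrated with a small closed-form sketch. The weight rule used here, w_k = 1 / (|x0_k| + δ)^γ, and the penalty λ·Σ w_k x_k² are assumptions chosen only to reproduce the qualitative behavior described above (larger initial estimate, smaller weight); they are not claimed to be the paper's exact Equations (33) and (34). The function names and the toy data are hypothetical.

```python
import numpy as np

def adaptive_ridge(A, y, lam, delta=1e-5, gamma=2.0):
    """Weighted ridge: minimize ||A x - y||^2 + lam * sum_k w_k x_k^2 (assumed form)."""
    x0 = np.linalg.lstsq(A, y, rcond=None)[0]   # initial least-squares estimate
    w = 1.0 / (np.abs(x0) + delta) ** gamma     # assumed weight rule
    x = np.linalg.solve(A.T @ A + lam * np.diag(w), A.T @ y)
    return x, w

def plain_ridge(A, y, lam):
    """Standard ridge regression with a uniform penalty on every coefficient."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)

# Toy problem with one large and one small coefficient
A = np.eye(2)
y = np.array([10.0, 0.1])
x_ad, w = adaptive_ridge(A, y, lam=1.0)
x_rr = plain_ridge(A, y, lam=1.0)
```

In this toy case plain ridge halves both coefficients, while the adaptive variant leaves the large coefficient almost untouched and shrinks the small one strongly.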
Analogously, we can obtain Equation (36) from Equation (26), where $A_2 = [a_{q1}; \ldots; a_{qM_2}]$, $y_2 = [q_1; \ldots; q_{M_2}]$, $a_q = [a_b \; -a_g]$, and $q = [q_1 \ldots q_N]^T$. Based on the above analysis, the flow chart of the proposed enhanced identification algorithm is shown in Figure 2. In the figure, $\xi$ is the empirical threshold for deciding whether the iteration has converged, which is set to 0.02. Appendix A introduces the process of determining the value of $\xi$. It is worth noting that the adaptive ridge regression models in (33) and (36) are convex optimization problems, and stage 1 provides an appropriate initial value for the estimation of branch admittance; as a result, the enhanced identification algorithm has a fast solution speed and good convergence performance.

The Number of Measurement Samples
In order to improve the accuracy of the initial identification results, the measurement data of stage 1 should have a certain degree of redundancy. Fortunately, the linear regression problem is easy to solve, and the identification algorithm with redundant measurements still has a fast execution speed. The number of optimization variables is lower in stage 2 than in stage 1; as a consequence, we can reduce the samples used in stage 2 to save computing resources. In order to ensure the robustness of the algorithm, the number of independent equations $M_2 \times N$ in Equation (33) should not be less than $2 \times \hat{M}$. Therefore, $M_2$ should not be less than $2\hat{M}/N$.
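The lower bound on the number of stage 2 samples is simple arithmetic. The helper below is a hypothetical illustration, and the line counts passed to it are example values rather than results from the paper.

```python
import math

def min_stage2_samples(num_lines, num_buses):
    """Smallest integer M2 such that M2 * N >= 2 * M_hat."""
    return math.ceil(2 * num_lines / num_buses)

# Hypothetical example: a 33-bus grid where stage 1 retains 40 candidate lines
m2_min = min_stage2_samples(40, 33)
```

In practice M2 would be chosen above this bound to leave some redundancy against measurement noise.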

The Value of λ
λ directly affects the performance of the algorithm. When the value of λ is too small, the effect of the penalty term is not obvious; when the value of λ is too large, there is a high risk of eliminating too many lines, which makes it difficult for the power flow calculation to converge. Generalized cross-validation (GCV) selects the appropriate value of λ by minimizing the function V(λ) defined in [29], which is expressed in terms of the identity matrix I, the matrix trace tr(·), and a matrix $K_1$.
Figure 3 shows the V(λ) curve of an IEEE 33-bus system in one iteration. The blue dot in the figure is the minimum point, and the curve changes gently near the minimum value. In order to better reflect the advantage of regularization, we select λ where the curve begins to rise significantly; that is, the λ corresponding to the red star point in the figure.
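For readers unfamiliar with GCV, the following sketch evaluates a GCV curve for plain ridge regression on synthetic data. It uses the textbook form V(λ) = (1/n)·||(I − K(λ))y||² / [(1/n)·tr(I − K(λ))]² with K(λ) = A(AᵀA + λI)⁻¹Aᵀ; this is an assumption for illustration and may differ from the K_1 used with the adaptive weights in the paper. The sketch simply takes the minimizer of the curve, whereas the text above prefers the point where the curve begins to rise.

```python
import numpy as np

def gcv_score(A, y, lam):
    """Textbook GCV score for ridge regression (assumed form, see lead-in)."""
    n = A.shape[0]
    K = A @ np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T)  # hat matrix
    resid = y - K @ y
    return (resid @ resid / n) / (np.trace(np.eye(n) - K) / n) ** 2

# Synthetic regression problem with mild noise
rng = np.random.default_rng(2)
A = rng.normal(size=(40, 5))
y = A @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + 0.1 * rng.normal(size=40)

lams = np.logspace(-4, 2, 25)  # candidate regularization parameters
scores = np.array([gcv_score(A, y, lam) for lam in lams])
lam_best = lams[np.argmin(scores)]
```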

Simulation Parameter Setting
The identification algorithm was executed in MATLAB, and the convex optimization model was solved by the CVX toolbox. IEEE 33- and 118-bus systems (case 1 and case 2, respectively, for short) were used to verify the effectiveness of the proposed identification method. The grid structure of the test cases is shown in Figure 4. We chose bus 1 as the slack bus and assumed that all buses in the grids were equipped with smart meters.
The one-day active power injection measurement data of 1200 users in China were collected as the original sample sets, and 50-150 users were randomly assigned to each bus except the slack bus to simulate the power consumption characteristics of the distribution network. We emulated the corresponding reactive power injection according to a random lagging power factor cos φ, e.g., cos φ ~ Unif(0.85, 0.95) [30], and the voltage magnitude data were simulated by MATPOWER. The sampling period of the original datasets was 1 min. With the intention of alleviating the data redundancy of the regression model and reducing the consumption of computational resources, we set the sampling interval as 5 min. In practice, sample data are contaminated with noise because of the limited accuracy of the measuring equipment. In order to verify the practical applicability of the proposed method in real scenarios, noise was added to the measurement data. In most cases, the accuracy class of existing smart meters deployed in distribution networks is 0.2S, so we set a 0.2% additional error ε for the power injection data. Furthermore, we set $\omega_g$ of case 1 and case 2 as 0.03 and 0.008, respectively, and the $M_2$ of case 1 and case 2 as 10 and 20, respectively.
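The data preparation steps described above (downsampling from 1 min to 5 min, emulating reactive power from a random lagging power factor, and adding a 0.2% error) can be sketched as follows. The load profile is a synthetic placeholder instead of the real user data, and several choices are assumptions: every fifth sample is kept, the power factor is drawn independently for every sample, and the error is modeled as uniform multiplicative noise within ±0.2%.

```python
import numpy as np

rng = np.random.default_rng(3)

# Placeholder for one day of 1-min active power data at 32 load buses (p.u.)
p_1min = rng.uniform(0.5, 2.0, size=(1440, 32))

# Downsample to a 5-min interval: 1440 / 5 = 288 samples per day
p_5min = p_1min[::5]

# Reactive power from a random lagging power factor, cos(phi) ~ Unif(0.85, 0.95)
pf = rng.uniform(0.85, 0.95, size=p_5min.shape)
q_5min = p_5min * np.tan(np.arccos(pf))

# Additional measurement error of 0.2% on the power injection data (assumed uniform)
eps = 0.002
p_meas = p_5min * (1.0 + rng.uniform(-eps, eps, size=p_5min.shape))
q_meas = q_5min * (1.0 + rng.uniform(-eps, eps, size=q_5min.shape))
```

One day at a 5-min interval gives 288 samples per bus, which matches the one-day data requirement stated in the contributions.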

Performance Evaluation Index
We selected the F1 score as the index to evaluate the accuracy of the topology identification results. The F1 score is defined as follows:

$$F_1 = \frac{2 P_f R_f}{P_f + R_f}$$

where $P_f$ and $R_f$ are the precision and recall, respectively:

$$P_f = \frac{TP}{TP + FP}, \qquad R_f = \frac{TP}{TP + FN}$$

where TP, FP and FN denote, respectively, the number of branches that are correctly identified, the number of branches that are mistakenly identified as connected, and the number of branches that are mistakenly identified as unconnected.
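As an illustration of how this index can be computed from two branch lists, the following sketch treats each branch as an unordered bus pair. The function name `f1_score` and the toy branch lists are hypothetical and not taken from the test cases.

```python
def f1_score(true_edges, est_edges):
    """F1 score of an estimated branch set against the true branch set."""
    # A branch (i, j) is the same as (j, i), so compare unordered pairs.
    true_set = {frozenset(e) for e in true_edges}
    est_set = {frozenset(e) for e in est_edges}
    tp = len(true_set & est_set)   # correctly identified branches
    fp = len(est_set - true_set)   # wrongly identified as connected
    fn = len(true_set - est_set)   # wrongly identified as unconnected
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

# Toy example: two of three true branches are found, plus one false branch.
score = f1_score([(1, 2), (2, 3), (3, 4)], [(2, 1), (2, 3), (1, 4)])
```

In this toy example precision and recall are both 2/3, so the F1 score is also 2/3.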
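We selected the mean absolute percentage error (MAPE) as the index to evaluate the accuracy of the line parameter identification results, which is defined as follows:

$$I_{\mathrm{MAPE}(g)} = \frac{1}{m} \sum_{l_{ij}} \left| \frac{\hat{g}_{ij} - g_{ij}}{g_{ij}} \right| \times 100\%, \qquad I_{\mathrm{MAPE}(b)} = \frac{1}{m} \sum_{l_{ij}} \left| \frac{\hat{b}_{ij} - b_{ij}}{b_{ij}} \right| \times 100\%$$

where m is the number of lines, $\hat{g}_{ij}$ and $\hat{b}_{ij}$ are the estimated conductance and susceptance of $l_{ij}$, respectively, and $g_{ij}$ and $b_{ij}$ are the corresponding actual values.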
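A minimal sketch of the MAPE computation is given below; the function name `mape` and the sample conductance values are illustrative assumptions only.

```python
import numpy as np

def mape(estimated, actual):
    """Mean absolute percentage error (in percent) over all lines."""
    estimated = np.asarray(estimated, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Relative deviation per line, averaged over the m lines.
    return 100.0 * np.mean(np.abs(estimated - actual) / np.abs(actual))

# Hypothetical conductances: each estimate deviates by 1% from its true value.
g_true = [2.0, 4.0]
g_est = [2.02, 3.96]
i_mape_g = mape(g_est, g_true)
```

The same function applies unchanged to the susceptance estimates.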

Identification Results
The identification results of the test cases are shown in Table 2. The F1 scores of both cases exceed 0.9 after stage 1, indicating that the initial identification algorithm can effectively remove most of the unconnected lines. However, due to the simplification of the power flow model and the neglect of the voltage angle information, there is still a fraction of misidentified elements in $\hat{E}$ and certain errors in the estimated values of the line admittance. After stage 2, both cases achieved correct identification of the grid topology, and the MAPE of the line parameters was between 0.26% and 0.65%. Considering that we set ε to 0.2%, the line parameter identification accuracy of the two cases is within the acceptable range. Figures 5 and 6 show the parameter estimation errors of each line after stage 1 and stage 2, respectively. After the identification results are modified in stage 2, the estimation accuracy of all line parameters is significantly improved.

Comparative Analysis of Linear Models
To verify the advantage of the proposed linear power flow model in topology identification, we selected two typical linear power flow models for comparative testing under different levels of measurement error. The corresponding mathematical models are shown in Table 3.
The initial identification results of case 2 with the three linear models are shown in Table 4. For all levels of measurement error, the topology identification accuracy of the proposed model is higher than that of the two comparison models, and the proposed model is robust to data noise.


Table 3 (columns: Model; Linear Power Flow Equations). Row for [24]: equations not recovered.

Energies 2024, 17, 830

To verify the performance advantage of the adaptive ridge regression model, we selected the minimum loss function model (model A for short), the ridge regression model (model B for short), and the adaptive lasso regression model based on $L_1$ regularization (model C for short) [31] for comparison. The test results are shown in Figure 7.
All models except model B correctly identify the topology; model B misidentifies lines in case 2. The reason is that case 2 has more line parameters to be estimated, and stage 1 can only output rough admittance estimates. Because model B adds the same penalty weight to all variables, branches are more likely to be deleted incorrectly, resulting in errors in topology identification. Compared with the adaptive ridge regression model, the algorithm based on model A needs more iterations and its estimation error of the line admittance is higher. The $I_{\mathrm{MAPE}(g)}$ in case 2 increases by 0.72%, indicating that over-fitting can be effectively mitigated by regularization, so that the proposed model has better regression performance. Among the four optimization models, the identification algorithm based on model B has the most iterations and the lowest accuracy with respect to line parameter estimation. The $I_{\mathrm{MAPE}(b)}$ in case 1 reaches 5.69%. This is because the ridge regression model excessively penalizes the elements with large admittance values, which leads to suboptimal estimation. The identification algorithm based on model C needs fewer iterations, indicating that $L_1$ regularization converges faster than $L_2$ regularization. However, the model proposed in this paper still has higher accuracy with regard to line parameter estimation. This is because $L_1$ regularization not only constrains parameters and prevents overfitting, but also forces some line parameters with small admittances to zero, which makes it better suited to feature selection. By contrast, $L_2$ regularization prevents a variable from being deleted directly even if its weight becomes too large during the optimization process, and the optimization model is less sensitive to data noise; therefore, its identification performance is superior.
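The contrast between a uniform ridge penalty and an adaptive one can be illustrated with a small numerical sketch. This is not the paper's optimization model, which is formulated elsewhere and solved with CVX. It assumes one common reweighting rule, $w_i = 1/(x_i^2 + \delta)$, applied iteratively to a weighted ridge problem with a closed-form solution. The function names, the value of `delta` and the synthetic data are all hypothetical.

```python
import numpy as np

def ridge(A, y, lam, w=None):
    """Closed-form minimizer of ||A x - y||^2 + lam * sum_i w_i * x_i^2."""
    w = np.ones(A.shape[1]) if w is None else w
    return np.linalg.solve(A.T @ A + lam * np.diag(w), A.T @ y)

def adaptive_ridge(A, y, lam, n_iter=20, delta=1e-6):
    """Iteratively reweighted ridge; the weight rule is an assumption."""
    x = ridge(A, y, lam)  # start from the uniform-penalty solution
    for _ in range(n_iter):
        # Small coefficients receive a large penalty, large ones a small penalty.
        w = 1.0 / (x**2 + delta)
        x = ridge(A, y, lam, w)
    return x

# Synthetic regression: three "connected" coefficients and three zeros.
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 6))
x_true = np.array([5.0, 0.0, 3.0, 0.0, 0.0, 2.0])
y = A @ x_true
x_uniform = ridge(A, y, lam=1.0)
x_adaptive = adaptive_ridge(A, y, lam=1.0)
```

In this sketch the uniform penalty shrinks the large coefficients the most in absolute terms, whereas the adaptive weights leave them nearly untouched and push the zero coefficients very close to zero without setting them exactly to zero.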

Sensitivity Analysis
In this subsection, we tested the influence of measurement errors and various grid structures on the identification results to verify the robustness of the proposed identification method.

Sensitivity to Measurement Errors
Data samples inevitably contain measurement errors. The identification results of the proposed method under different measurement errors are shown in Figure 8. For all error levels, the proposed method correctly identifies the grid topology, and the number of iterations of the enhanced identification algorithm is relatively stable. As the measurement error increases, the estimation accuracy of the line parameters decreases accordingly. When ε is 2%, $I_{\mathrm{MAPE}(g)}$ and $I_{\mathrm{MAPE}(b)}$ in case 1 are 3.66% and 4.80%, respectively, which are less than 2.5 times ε; in case 2 they are 2.57% and 5.96%, respectively, which are less than 3 times ε. Therefore, even if the data accuracy is low, the estimation error of the line parameters is still within the normal range.
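As a quick check of the ratios quoted above, dividing each reported error by ε = 2% gives:

$$\text{case 1:}\quad \frac{3.66\%}{2\%} = 1.83, \qquad \frac{4.80\%}{2\%} = 2.40 < 2.5$$

$$\text{case 2:}\quad \frac{2.57\%}{2\%} \approx 1.29, \qquad \frac{5.96\%}{2\%} = 2.98 < 3$$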

Sensitivity to Various Grid Structures
Distribution networks have various topology structures. By changing the state of line switches in case 1, we generated 9 comparative cases (called T1-T9, respectively). The specific modification scheme is shown in Table 5. Notably, the structures of T8 and T9 are weakly meshed.
The identification results are shown in Table 6. For the various grid structures, the proposed method correctly identifies the topology, and the estimation accuracy of the line parameters is stable. However, compared with the other cases, the enhanced identification algorithm needs more iterations in T9 due to the robustness of the ridge regression. Therefore, the proposed identification method is effective for various grid structures; however, for some weakly meshed grids, the time cost of the identification algorithm may be relatively high. If real-time application is strictly required, the iterative convergence condition can be relaxed to improve the solving speed, which may sacrifice some accuracy.
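Whether a switch configuration is radial or weakly meshed can be checked by counting independent loops, |E| - |V| + (number of connected components). The sketch below does this with a union-find structure. The function names and the 4-bus toy network are hypothetical, since the switching schemes of Table 5 are not reproduced here.

```python
def loop_count(n_bus, closed_lines):
    """Return (independent loops, connected components) for buses 1..n_bus."""
    parent = list(range(n_bus + 1))

    def find(u):
        # Find the set representative, halving the path along the way.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    components = n_bus
    for a, b in closed_lines:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            components -= 1
    return len(closed_lines) - n_bus + components, components

def classify(n_bus, closed_lines):
    loops, components = loop_count(n_bus, closed_lines)
    if components > 1:
        return "disconnected"
    return "radial" if loops == 0 else f"meshed ({loops} loop(s))"

# Toy 4-bus feeder: closing one extra tie switch creates a single loop.
radial_case = classify(4, [(1, 2), (2, 3), (3, 4)])
meshed_case = classify(4, [(1, 2), (2, 3), (3, 4), (4, 1)])
```

A grid with only a few loops relative to its number of buses would correspond to the weakly meshed structures mentioned above.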


Conclusions
Accurate topology and line parameters are prerequisites for the safe and efficient operation of a distribution network. Based on the measurement data collected by smart meters, a method for the joint identification of grid topologies and line parameters was proposed in this paper. First, a linear power flow model was established according to the operation characteristics of the distribution network, and the initial identification results were quickly obtained based on linear regression. On this basis, a decoupling iterative optimization algorithm was proposed to modify the identification results. The superiority and robustness of the proposed method were verified in a case study. The method is suitable for distribution networks without PMU equipment. Moreover, the method was validated for both radial and weakly meshed grids.

Furthermore, to verify the effectiveness of the selected ξ in a large-scale distribution network, the same test was executed on the IEEE 118-bus system with ξ = 0.02. The results show that the algorithm ends after 19 iterations. To confirm the validity of ξ, we continued to execute the algorithm manually beyond this point, and the changes in $\Delta\hat{x}$, $I_{\mathrm{MAPE}(g)}$ and $I_{\mathrm{MAPE}(b)}$ during the iterative process are shown in Figure A2. It can be observed that $\Delta\hat{x}$, $I_{\mathrm{MAPE}(g)}$ and $I_{\mathrm{MAPE}(b)}$ are basically stable after the 19th iteration, after which the estimation errors of the line parameters do not decrease noticeably. It is thus appropriate to end the algorithm after the 19th iteration; consequently, 0.02 is a suitable value for ξ.
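The role of ξ as a stopping threshold can be illustrated with a generic iteration loop. The definition of $\Delta\hat{x}$ is not restated in this section, so the sketch assumes it is the relative change of the estimate between consecutive iterations. The function name, the `update` callback and the toy contraction are hypothetical.

```python
import numpy as np

def iterate_until_stable(update, x0, xi=0.02, max_iter=100):
    """Repeat x <- update(x) until the relative change drops below xi."""
    x = np.asarray(x0, dtype=float)
    for k in range(1, max_iter + 1):
        x_new = update(x)
        # Assumed convergence measure: relative change of the estimate.
        delta = np.linalg.norm(x_new - x) / max(np.linalg.norm(x), 1e-12)
        x = x_new
        if delta < xi:
            return x, k
    return x, max_iter

# Toy update that halves the distance to a fixed target at each step.
target = np.array([1.0, 2.0])
x_final, n_iter = iterate_until_stable(lambda x: 0.5 * (x + target), np.zeros(2))
```

Relaxing ξ makes the loop stop earlier at the cost of a larger remaining error, which matches the trade-off between solving speed and accuracy noted in the sensitivity analysis.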

Figure 1. Overall flow of the identification method.

Figure 2. Flow chart of enhanced identification algorithm.

Figure 5. Relative errors of line parameters after stage 1: (a) IEEE 33-bus distribution system; and (b) IEEE 118-bus distribution system (g denotes line conductance and b denotes line susceptance).

Figure 6. Relative errors of line parameters after stage 2: (a) IEEE 33-bus distribution system; and (b) IEEE 118-bus distribution system.

Figure 7. Identification results under different optimization models: (a) F1 score and number of enhanced identification algorithm iterations; (b) $I_{\mathrm{MAPE}(g)}$; and (c) $I_{\mathrm{MAPE}(b)}$.

Figure 8. Identification results under different measurement errors: (a) F1 score and number of enhanced identification algorithm iterations; (b) $I_{\mathrm{MAPE}(g)}$; and (c) $I_{\mathrm{MAPE}(b)}$.

Figure A1. Convergence characteristics of IEEE 33-bus distribution system in stage 2.

Figure A2. Convergence characteristics of IEEE 118-bus distribution system in stage 2.

Table 2. Identification results of test cases.

Table 3. Typical linear power flow model.

Table 4. Initial identification results under different linear power flow models.

Table 6. Identification results under different structures.
1 T0 is the original topology of case 1.