A Data Preprocessing Based on Cluster and Testing of Parameter Identification Method in Power Distribution Network

Li, Bin; Chen, Haoran; Hu, Ke

doi:10.3390/en15218007

Open Access Communication

by Bin Li ¹,†, Haoran Chen ²,† and Ke Hu ³,*

¹ Energy Research Institute, Nanjing Institute of Technology, Nanjing 211167, China

² College of Information and Communication, National University of Defense Technology, Wuhan 430000, China

³ College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Energies 2022, 15(21), 8007; https://doi.org/10.3390/en15218007

Submission received: 8 September 2022 / Revised: 9 October 2022 / Accepted: 21 October 2022 / Published: 28 October 2022

Abstract

We present a data preprocessing method for parameter identification in a power distribution network, based on clustering and hypothesis testing, that yields more accurate identification results. The method considers the similarity of the data in terms of both spatial relationship and statistical theory, and builds a data processing pipeline that improves the performance of dynamic model-based parameter identification methods, i.e., Markov chain Monte Carlo and sequential model-based global optimization. We applied this preprocessing to actual feeder data without adjusting any other condition. The experiments show that our method improves accuracy by up to 4.8%.

Keywords: power distribution network; parameter identification; data preprocessing

1. Introduction

In a power distribution network (PDN), system security and stability require that the line and transformer parameters, such as the line resistance, line reactance, transformer resistance, transformer reactance, transformer conductance, and transformer electrical susceptance, be recorded as accurately as possible. Many approaches have been studied, such as the full-scale approach [1], normalized Lagrange multiplier test [2], finite-time algorithm [3], Lagrange multiplier method [4], and deep learning [5]. However, because real-time measuring equipment is lacking, none of these methods suits current conditions. To solve this problem, parameter identification based on node equations has been proposed recently. To simplify the computation, a power-flow calculation circuit model is built, as shown in Figure 1.

In Figure 1, $P_{d}$, $Q_{d}$, and $U_{d}$ denote the active power, reactive power, and voltage on the high-voltage side, respectively. These quantities can be obtained directly by real-time measurement, and they are the inputs of the node equations. The other parameters, namely the transformer resistance $R_{d}$, transformer reactance $X_{d}$, transformer conductance $G_{d}$, transformer electrical susceptance $B_{d}$, line resistance $R_{cd}$, and line reactance $X_{cd}$, are hard to measure. Node equations provide a way to identify them. The basic schedule is shown in Figure 2.

Following this schedule, methods such as least squares (LS) [6], Markov chain Monte Carlo (MCMC) [7], and sequential model-based global optimization (SMBO) [8] have been proposed to solve the node equations. These methods rely only on limited real-time raw measurement data, including the active power, reactive power, and voltage on the high-voltage side of the transformer, which can be collected easily from the end layer of the power grid. All of the methods described above focus on the parameter identification algorithm. Among them, the methods based on node equations can complete the parameter identification of the entire PDN and need only a small amount of easily measured raw data. However, there is no research on how to select the most appropriate raw data to make the identified parameters more accurate.

These node-equation-based methods are mostly suited to ideal PDN environments and assume that the line and transformer parameters remain stable over a long period. In practice, however, influences such as heat, troughs and peaks in electricity consumption, unstable operating conditions, and grounding resistance cannot be avoided. As a result, the actual parameter values in the line and transformers are not stable, and irreversible changes may occur. Because the sampling frequency of the raw measurement data is very low, even consecutive samples may differ, resulting in a large variance between the identified parameters and the real values. It is therefore necessary to process the raw measurement data to reduce this variance; however, there is currently no research on this.

In this study, we make the reasonable assumption that the parameters to be identified are similar when the three phases are balanced and the input parameters are highly similar. Based on this assumption, we propose a raw measurement data processing method based on clustering and hypothesis testing to achieve more accurate parameter identification. The whole schedule is shown in Figure 3. The clustering step produces subsets with the strongest internal similarity. Because the computation grows greatly with more raw data, hypothesis testing is then introduced to aggregate these subsets and to ensure that the results are statistically grounded.

To obtain more accurate identification results with smaller variance, the raw measurement data are first divided into clusters of minimal granularity by the clustering algorithm; clusters with statistical similarity are then aggregated into sets by hypothesis testing. Finally, each set, instead of the whole raw measurement data, is used as the input of the node equations. In this way, a higher identification accuracy is obtained.

2. Materials and Methods

2.1. Node Equations

As Figure 1 shows, to simplify the computation, the three phases are assumed to be balanced when calculating the power flow in this paper. The three input parameters (active power $P_{Ld}$, reactive power $Q_{Ld}$, and voltage $U_{Ld}$ on the high-voltage side) and the six parameters to be identified (transformer resistance $R_{d}$, transformer reactance $X_{d}$, transformer conductance $G_{d}$, transformer electrical susceptance $B_{d}$, line resistance $R_{cd}$, and line reactance $X_{cd}$) satisfy Equations (1)–(3):

$$P_{d} = P_{Ld} + \frac{P_{Ld}^{2} + Q_{Ld}^{2}}{U_{Ld}^{2}} R_{d}^{T} + U_{Ld}^{2} G_{d}^{T} \tag{1}$$

$$Q_{d} = Q_{Ld} + \frac{P_{Ld}^{2} + Q_{Ld}^{2}}{U_{Ld}^{2}} X_{d}^{T} + U_{Ld}^{2} B_{d}^{T} \tag{2}$$

$$U_{d} = \sqrt{(U_{Ld} + \Delta U_{d}^{T})^{2} + (\delta U_{d}^{T})^{2}} \tag{3}$$

$\Delta U_{d}^{T}$ and $\delta U_{d}^{T}$ in Equation (3) are the longitudinal and transverse components of the transformer impedance voltage drop at bus $d$, and can be expressed by Equations (4) and (5), respectively.

$$\Delta U_{d}^{T} = \frac{P_{Ld} R_{d}^{T} + Q_{Ld} X_{d}^{T}}{U_{Ld}} \tag{4}$$

$$\delta U_{d}^{T} = \frac{P_{Ld} X_{d}^{T} + Q_{Ld} R_{d}^{T}}{U_{Ld}} \tag{5}$$

The equations for bus $c$ can be expressed as Equations (6)–(8), and the final equation is given by Equation (9):

$$U_{c} = \sqrt{(U_{d} + \Delta U_{cd}^{T})^{2} + (\delta U_{cd}^{T})^{2}} \tag{6}$$

$$\Delta U_{cd}^{T} = \frac{P_{d} R_{cd}^{T} + Q_{d} X_{cd}^{T}}{U_{d}} \tag{7}$$

$$\delta U_{cd}^{T} = \frac{P_{d} X_{cd}^{T} + Q_{d} R_{cd}^{T}}{U_{d}} \tag{8}$$

$$f_{c} = \sqrt{(U_{d} + \Delta U_{cd}^{T})^{2} + (\delta U_{cd}^{T})^{2}} - \sqrt{(U_{Lc} + \Delta U_{c}^{T})^{2} + (\delta U_{c}^{T})^{2}} \tag{9}$$

The parameters in the line and transformer can be calculated based on Equations (1)–(9) with the measured data of power and voltage.
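As a rough illustration of how Equations (1)–(5) link the measured quantities to the unknown transformer parameters, the following Python sketch evaluates them for a single sample. It is not the authors' implementation: the function name and the numeric values are hypothetical, the shunt terms are assumed to be linear in the conductance and susceptance (the dimensionally consistent reading), and the transverse component follows Equation (5) as printed.

```python
import math

def high_side_from_low_side(p_ld, q_ld, u_ld, r_t, x_t, g_t, b_t):
    """Evaluate Equations (1)-(5) at bus d for one sample (hypothetical helper)."""
    # Series loss factor (P^2 + Q^2) / U^2 shared by Equations (1) and (2).
    loss_factor = (p_ld ** 2 + q_ld ** 2) / u_ld ** 2
    p_d = p_ld + loss_factor * r_t + u_ld ** 2 * g_t      # Eq. (1), shunt term assumed U^2 * G
    q_d = q_ld + loss_factor * x_t + u_ld ** 2 * b_t      # Eq. (2), shunt term assumed U^2 * B
    longitudinal = (p_ld * r_t + q_ld * x_t) / u_ld       # Eq. (4)
    transverse = (p_ld * x_t + q_ld * r_t) / u_ld         # Eq. (5), sign as printed
    u_d = math.hypot(u_ld + longitudinal, transverse)     # Eq. (3)
    return p_d, q_d, u_d

# Made-up values, chosen only to make the arithmetic easy to follow.
p_d, q_d, u_d = high_side_from_low_side(100.0, 50.0, 10.0, 0.1, 0.2, 0.001, 0.002)
print(p_d, q_d, u_d)
```

An identification algorithm would search for the parameter values that make such computed quantities agree with the measured ones across all samples.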

2.2. Dimension Processing

The measurement data consist of $P_{d}$, $Q_{d}$, and $U_{d}$, so each sample has three dimensions. To help the clustering algorithm explore the associations among all samples of raw measurement data, the dimensions need to be expanded with additional information. In this study, the time stamp of each sample is used to extract the following information: peak/trough of electricity consumption, month, and holiday/workday. Because this extra information consists entirely of categorical features, it can be encoded as a multidimensional vector. Thus, the three-dimensional raw measurement data are converted to a new data set with more dimensions.
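The dimension expansion described above can be sketched as a one-hot encoding of the time-stamp features. This is only an illustration under stated assumptions: the peak-hour window, the holiday list, the treatment of weekends as holidays, and all names below are hypothetical, since the text does not specify how the categories are derived or encoded.

```python
from datetime import datetime

PEAK_HOURS = range(8, 22)        # hypothetical peak-consumption window
HOLIDAYS = {(1, 1), (10, 1)}     # hypothetical (month, day) public holidays

def one_hot(index, size):
    """Return a vector of zeros with a single 1.0 at the given index."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def expand_sample(timestamp, p_d, q_d, u_d):
    """Append encoded peak/trough, month, and holiday/workday features."""
    is_peak = timestamp.hour in PEAK_HOURS
    # Assumption: weekends count as holidays.
    is_holiday = (timestamp.month, timestamp.day) in HOLIDAYS or timestamp.weekday() >= 5
    return (
        [p_d, q_d, u_d]
        + one_hot(int(is_peak), 2)          # trough / peak
        + one_hot(timestamp.month - 1, 12)  # month of the year
        + one_hot(int(is_holiday), 2)       # workday / holiday
    )

row = expand_sample(datetime(2022, 10, 1, 9, 15), 112.6, 75.2, 10.3)
print(len(row))  # 3 measurements + 2 + 12 + 2 encoded features
```

With this particular encoding, each three-dimensional sample becomes a 19-dimensional vector in which the three measured values are dense and the remaining entries are sparse.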

2.3. Cluster Processing

The encoded categorical features are sparse in each dimension, whereas the raw measurement data are dense; in other words, the density distribution of the data is uneven, which causes traditional clustering algorithms, such as k-means and DBSCAN [9], to work ineffectively. One way to overcome this problem is to run an algorithm that produces a special ordering of the database with respect to its density-based clustering structure, which contains the information about every clustering level of the data set. In this study, we improve the DBSCAN algorithm and redefine the core distance as follows:

$$\mathrm{coreDist}(x) = \begin{cases} \text{Undefined}, & |N_{\varepsilon}(x)| < M \\ d(x, N_{\varepsilon}^{M}(x)), & |N_{\varepsilon}(x)| \geq M \end{cases} \tag{10}$$

where $x$ is a sample and $x \in X$; $\varepsilon$ stands for the radius of the neighborhood; $M$ is the min-points parameter, with $M = \mathrm{sample\_num}/10$; and $N_{\varepsilon}^{i}(x)$ represents the $i$-th neighbor of $x$ in the set $N_{\varepsilon}(x)$.

If $x, y \in X$, the reachability distance of $y$ with respect to $x$ is defined as follows:

$$\mathrm{reachDist}(y, x) = \begin{cases} \text{Undefined}, & |N_{\varepsilon}(x)| < M \\ \max\{\mathrm{coreDist}(x), \mathrm{Dist}(x, y)\}, & |N_{\varepsilon}(x)| \geq M \end{cases} \tag{11}$$

In particular, when $x$ is a core point, i.e., $|N_{\varepsilon}(x)| \geq M$,

$$\mathrm{reachDist}(y, x) = \min\{\eta : y \in N_{\eta}(x)\} \tag{12}$$
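The core distance and reachability distance of Equations (10) and (11) can be sketched in a few lines of Python. The sketch assumes Euclidean distance, excludes a point from its own neighborhood, and takes the $M$-th neighbor to be the $M$-th nearest one; these choices and the function names are assumptions made for illustration, and `None` stands in for "Undefined".

```python
import math

def neighbors(points, x, eps):
    """Indices of the points within eps of points[x], excluding x itself (assumption)."""
    return [j for j in range(len(points))
            if j != x and math.dist(points[x], points[j]) <= eps]

def core_dist(points, x, eps, m):
    """Equation (10): distance to the m-th nearest neighbor, or None if undefined."""
    dists = sorted(math.dist(points[x], points[j]) for j in neighbors(points, x, eps))
    if len(dists) < m:
        return None
    return dists[m - 1]

def reach_dist(points, y, x, eps, m):
    """Equation (11): reachability distance of y with respect to x."""
    cd = core_dist(points, x, eps, m)
    if cd is None:
        return None
    return max(cd, math.dist(points[x], points[y]))

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
print(core_dist(pts, 0, 3.0, 2))      # 2.0: the second-nearest neighbor is (0, 2)
print(reach_dist(pts, 1, 0, 3.0, 2))  # 2.0: the core distance dominates the distance 1.0
print(core_dist(pts, 3, 3.0, 2))      # None: the isolated point has no neighbors within eps
```

Points close to a dense region inherit that region's core distance as their reachability, which is what lets an ordering by reachability expose clusters of different densities.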

2.4. Hypothesis Testing

Clustering yields a series of sample clusters at minimum granularity. Hypothesis testing is then applied to examine the relationships among these clusters, and clusters that are statistically similar are gathered into one sample set. The resulting sample sets differ significantly from one another.
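The text does not state which hypothesis test is used, so the following Python sketch only illustrates the idea of aggregating clusters by a significance test. It assumes a Welch-type two-sample test on a single feature, a normal approximation for the p-value, a significance level of 0.05, and a greedy merge order; all of these, and the function names, are illustrative assumptions rather than the authors' procedure.

```python
import math
from statistics import NormalDist, mean, variance

def welch_p_value(a, b):
    """Two-sided p-value for equal means (normal approximation; an assumption)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        return 1.0 if mean(a) == mean(b) else 0.0
    t = (mean(a) - mean(b)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))

def aggregate(clusters, alpha=0.05):
    """Greedily merge each cluster into the first set it does not differ from."""
    sets = []
    for cluster in clusters:
        for existing in sets:
            if welch_p_value(existing, cluster) >= alpha:
                existing.extend(cluster)   # statistically similar: same set
                break
        else:
            sets.append(list(cluster))     # significantly different: new set
    return sets

# Made-up one-dimensional clusters: the first two overlap, the third does not.
clusters = [
    [10.0, 10.2, 9.9, 10.1],
    [10.05, 9.95, 10.15, 10.0],
    [12.0, 12.1, 11.9, 12.2],
]
merged = aggregate(clusters)
print([len(s) for s in merged])
```

In this toy example the first two clusters end up in one set and the third stays separate, so each resulting set could be passed to the node equations on its own.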

2.5. Identification Algorithms

The latest research on parameter identification includes MCMC [7] and SMBO [8]. The principle of both algorithms is to build a model with a posteriori probability that optimizes an objective function using the existing samples in the parameter identification task. In this model, samples are regarded as drawn from a distribution such as the Gaussian distribution or the compound Gaussian distribution. Thus, the model can be represented easily, and its dimension is lower than that of most machine learning and deep learning models. In this paper, these two algorithms are applied to verify the effect of our data preprocessing method.

3. Experiments

3.1. Clustering Processing Results

The Silhouette Coefficient is a measure of clustering quality, given by Equation (13),

$$S(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}} \tag{13}$$

where $a(i)$ represents the cohesion of sample point $i$, given by Equation (14),

$$a(i) = \frac{1}{n - 1} \sum_{j \neq i}^{n} \mathrm{distance}(i, j) \tag{14}$$

where $j$ ranges over the other sample points in the same class as sample $i$, and distance is the distance between $i$ and $j$. Therefore, the smaller $a(i)$ is, the tighter the cluster. In the standard definition, $b(i)$ is the separation of sample $i$: the smallest mean distance from $i$ to the points of any other cluster. The closer $S$ is to 1, the more distinct the cluster contour and the better the clustering.
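For concreteness, Equations (13) and (14) can be evaluated with a short Python sketch. It assumes Euclidean distance, uses the standard definition of $b(i)$ as the smallest mean distance to another cluster, and returns 0 for degenerate cases; the function names and the toy data are hypothetical.

```python
import math

def silhouette(points, labels, i):
    """Silhouette value S(i) of sample i, following Equations (13) and (14)."""
    by_label = {}
    for j, label in enumerate(labels):
        if j != i:
            by_label.setdefault(label, []).append(j)

    own = by_label.pop(labels[i], [])
    if not own or not by_label:
        return 0.0  # assumed convention: singleton cluster or only one cluster

    def mean_dist(members):
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    a = mean_dist(own)                                  # cohesion, Eq. (14)
    b = min(mean_dist(m) for m in by_label.values())    # separation (standard definition)
    if max(a, b) == 0.0:
        return 0.0  # all relevant points coincide
    return (b - a) / max(a, b)                          # Eq. (13)

def silhouette_coefficient(points, labels):
    """Mean silhouette value over all samples."""
    return sum(silhouette(points, labels, i) for i in range(len(points))) / len(points)

pts = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
labels = [0, 0, 1, 1]
score = silhouette_coefficient(pts, labels)
print(round(score, 3))
```

Here the two clusters are compact and far apart, so the score is close to 1; overlapping clusters would push it toward 0 or below.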

The results of the clustering process are detailed in Figure 4 and Table 1. To display the clustering results more clearly, the raw data are normalized to (−1, 1), and the PCA algorithm is then applied to reduce the dimensions to two. Thus, the x and y axes in Figure 4 carry no label information.

As Figure 4 and Table 1 show, our method obtains a more reasonable clustering result and a higher Silhouette Coefficient (0.477 versus 0.146 for DBSCAN), which indicates that our method performs better.

3.2. Experiment Results

A total of 1499 samples in the raw dataset were collected from a standard 10 kV feeder with a sampling period of 15 min [10]. As shown in Figure 5, the voltage is close to three-phase balance, which satisfies the requirements of the node equations. Thus, our method can be applied.

To compare identification with and without our preprocessing, two related parameter identification methods, MCMC and SMBO, are used. We mainly assess the identification results by the error between the measured and calculated voltage on the high-voltage side. The error distribution is used to weigh the error level [11,12,13,14,15,16]. The experimental results are shown in Figure 6.

In Figure 6, $U_{std}$ is the target voltage value; $U_{MCMC}$ and $U_{SMBO}$ stand for the results of MCMC and SMBO, respectively; and $U_{MCMC_{C}}$ and $U_{SMBO_{C}}$ represent the corresponding results with our processing.

From Figure 6b,d, after processing by our method, the parameters identified by $MCMC_{C}$ and $SMBO_{C}$ achieve better error distributions than those of MCMC and SMBO. From Table 2, $MCMC_{C}$ improves MAE and RMSE by about 4.8% and 2.2%, respectively, while $SMBO_{C}$ improves them by about 3.6% and 3.0%, respectively. The experimental results indicate that our method can improve parameter identification in a PDN.
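The error metrics and the relative improvements can be reproduced with a short Python sketch. MAE and RMSE are written here in their standard form, which the text does not spell out, and the improvement is assumed to be the relative reduction with respect to the unprocessed baseline; the function names are hypothetical, and the numbers are the mean values from Table 2.

```python
import math

def mae(measured, calculated):
    """Mean absolute error between measured and calculated voltages."""
    return sum(abs(m - c) for m, c in zip(measured, calculated)) / len(measured)

def rmse(measured, calculated):
    """Root mean square error between measured and calculated voltages."""
    return math.sqrt(sum((m - c) ** 2 for m, c in zip(measured, calculated)) / len(measured))

def relative_improvement(baseline, processed):
    """Percentage reduction of an error metric relative to the baseline."""
    return 100.0 * (baseline - processed) / baseline

# Mean values taken from Table 2: (MAE, RMSE).
table2 = {
    "MCMC": (65.6009, 67.1831),
    "MCMC_C": (62.4666, 65.7231),
    "SMBO": (64.1194, 67.1831),
    "SMBO_C": (61.8680, 65.2317),
}

improvements = {
    name: tuple(
        relative_improvement(base, proc)
        for base, proc in zip(table2[name], table2[name + "_C"])
    )
    for name in ("MCMC", "SMBO")
}
for name, (mae_gain, rmse_gain) in improvements.items():
    print(f"{name}: MAE {mae_gain:.2f}%, RMSE {rmse_gain:.2f}%")
```

Recomputed this way, the MCMC gains match the quoted 4.8% and 2.2%, while the SMBO gains come out near 3.5% and 2.9%, slightly below the rounded values quoted in the text.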

4. Conclusions

Parameter identification plays a key role in PDN calculation and analysis; however, current research pays more attention to the residuals between the true and calculated voltage values, while analysis and reprocessing of the raw data are lacking. In this paper, the relationships among the raw feeder data in a high-dimensional space are mined by an improved clustering algorithm, and statistical theory, namely hypothesis testing, is applied to ensure that the processing results are reasonable. In the experiments, $MCMC_{C}$ and $SMBO_{C}$ both achieve better error distributions and metric values (MAE and RMSE) than MCMC and SMBO, which indicates that our method can improve parameter identification in a PDN.

Although this method achieves more accurate identification results, it is limited to a serial computing mode, so its application faces efficiency problems. In future work, parallel and dynamic methods will be considered to achieve a more efficient search among all similarity subsets.

Funding

This work is supported by Nanjing Institute of Technology Scientific Research Start-up Fund for High-level Introduced Talents (YKJ202046) and State Grid Jiangsu Electric Power Co., Ltd. (J2020097).

Data Availability Statement

Restrictions apply to the availability of these data. Data were obtained from State Grid Jiangsu Electric Power Co., Ltd. and are available from the corresponding author with the permission of State Grid Jiangsu Electric Power Co., Ltd.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Lave, M.; Reno, M.J.; Peppanen, J. Distribution System Parameter and Topology Estimation Applied to Resolve Low-Voltage Circuits on Three Real Distribution Feeders. IEEE Trans. Sustain. Energy 2019, 10, 1585–1592.
2. Lin, Y.; Abur, A. Strategic Use of Synchronized Phasor Measurements to Improve Network Parameter Error Detection. IEEE Trans. Smart Grid 2018, 9, 5281–5290.
3. Arnaldo, A.; Soriano-Rangel, C.A.; Mancilla-David, F.; Ortega, R.; Strunz, K. Finite-time identification of the Thévenin equivalent parameters in power grids. Int. J. Electr. Power Energy Syst. 2020, 116, 105534.
4. Venkata Krishna, B.; Padma Srinivasu, N. A Direct Approach for Distribution System Load Flow Solutions. Int. J. Eng. Adv. Technol. 2019, 8, 63.
5. Mahmud, M.; Kaiser, M.S.; Hussain, A.; Vassanelli, S. Applications of Deep Learning and Reinforcement Learning to Biological Data. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 2063–2079.
6. Yuan, K.; Wei, Z. Power line parameter identification based on multi-innovation least square algorithm. Electr. Power Eng. Technol. 2020, 39, 55–160. (In Chinese)
7. Wang, H.; Jiao, H.; Chen, J.; Liu, W. Parameter Identification for a Power Distribution Network Based on MCMC Algorithm. IEEE Access 2021, 9, 104154–104161.
8. Li, B.; Ma, J.Y.; Hu, K.; Xu, S.H.; Jiao, H.; Chen, J.M.; Liu, W. A Method for Parameter Identification of Distribution Network Equipment Based on Sequential Model-Based Optimization. Int. Trans. Electr. Energy Syst. 2022, 14, 9880284.
9. Liu, S.; Song, W.; Ying, M.; Sun, W.; Wang, R.; Niu, R. Seismic facies analysis for angle gathers based on DBSCAN waveform clustering. Geophys. Prospect. Pet. 2022, 58, 773–782.
10. Ma, Z.; Zhang, S.; Li, J.; Cai, Y.C. A Single-phase Grounding Fault Section Location Method for Power Distribution Network Based on Parameter Identification. Guangdong Electr. Power 2019, 32, 65–71.
11. Song, J.; Dall’Anese, E.; Simonetto, A.; Zhu, H. Dynamic Distribution State Estimation Using Synchrophasor Data. IEEE Trans. Smart Grid 2020, 11, 821–831.
12. Anderson, P.M. Analysis of Faulted Power Systems; Wiley-IEEE Press: Hoboken, NJ, USA, 1995.
13. Zhang, S. Power system state estimation based on particle swarm optimization algorithm. Power Syst. Prot. Control 2010, 38, 86–89+95. (In Chinese)
14. Carquex, C.; Rosenberg, C.; Bhattacharya, K. State Estimation in Power Distribution Systems Based on Ensemble Kalman Filtering. IEEE Trans. Power Syst. 2018, 33, 6600–6610.
15. do Nascimento Sepulchro, W.; Encarnacao, L.F.; Brunoro, M. Harmonic Distortion and Power Flow State Estimation for Distribution Systems Based on Evolutionary Strategies. IEEE Lat. Am. Trans. 2015, 13, 3066–3071.
16. Weakliem, D.L. A Critique of the Bayesian Information Criterion for Model Selection. Sociol. Methods Res. 1999, 27, 359–397.

Figure 1. Power-flow calculation circuit model.

Figure 2. Schedule of parameter identification based on node equations.

Figure 3. Schedule of measurement raw data preprocessing.

Figure 4. Results after clustering and aggregation: (a) DBSCAN [9]; (b) our method.

Figure 5. Three-phase voltage: (a) high side, (b) low side.

Figure 6. Experiment results with two identification algorithms: (a) calculation results based on MCMC and $MCMC_{C}$; (b) calculation error distribution based on MCMC and $MCMC_{C}$; (c) calculation results based on SMBO and $SMBO_{C}$; (d) calculation error distribution based on SMBO and $SMBO_{C}$.

Table 1. Metrics of the processing results.

Algorithm	Estimated Cluster Number	Silhouette Coefficient
DBSCAN	10	0.146
Our method	9	0.477

Table 2. Experiment results.

Algorithm	MAE	RMSE
MCMC	65.6009 ± 0.3562	67.1831 ± 0.3202
$MCMC_{C}$	62.4666 ± 0.3655	65.7231 ± 0.3122
SMBO	64.1194 ± 0.3573	67.1831 ± 0.3202
$SMBO_{C}$	61.8680 ± 0.3218	65.2317 ± 0.3010

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
