1. Introduction
The distribution network is a crucial component of the power grid, ensuring stable and efficient power delivery. Identifying the topology of low-voltage (LV) networks is a prerequisite for developing accurate simulation models, which are vital for a cleaner, more efficient future grid. Accurate topology identification yields precise models that extend the capabilities of existing networks, guide utility investments, and inform technological developments to meet increasing electricity demand.
Smart-meter data can provide valuable insights into the network by capturing voltage, current, power, and harmonic information. Such data has been utilised in various studies for load profiling [1], demand response programmes [2], grid optimisation [3], fault detection and diagnostics [4], voltage prediction [5], and power quality monitoring [6]. However, relatively few studies have explored the use of harmonics to provide insights into the network, such as mapping the LV network, identifying impedance, and analysing faults such as high-impedance connections [7].
Research has been conducted on the deployment of smart-meters and how their data can be used in modelling and operating the electrical distribution system. In [8], the authors investigated the data provided by smart-meters to support planning of new distribution network configurations; however, the heuristic approach used cannot guarantee globally optimal solutions. Other studies explored issues related to the availability and synchronisation of smart-meter data [9,10,11]. In [9], the researchers achieved network observability by extracting a subset of the total node data, although their analysis assumed idealised conditions regarding measurement errors and communication synchronisation. Other researchers have discussed how to limit the retrieval of massive volumes of real-time smart-meter data by developing an intelligent algorithm that efficiently collects data for outage detection and mapping [12].
Electrical utilities are increasingly interested in alternative solutions for determining distribution network topologies from smart-meter data. Software-based methods that rely on available smart-meter data can be implemented relatively quickly and at low cost. Some research has shown promising results in modelling the network using these methods, but they provide only an approximation of the real network topology. In [13], a methodology relying on principal component analysis and its graph-theoretic interpretation was developed to determine the distribution system topology. However, the authors assumed that all energy readings from the smart-meters are available, which is difficult to achieve in actual networks. In [14], Pearson correlation was used to cluster customers’ voltage profiles by phase. This research did not consider asynchronous data or the effect of the impedance of the customers’ connections.
Recursive grouping algorithms have emerged as a powerful tool for topology estimation. Park et al. [15] presented a method that can recover the full network structure using only smart-meter data from leaf nodes. The method analyses injection statistics at end users to infer the topology. This approach was extended by Pengwah et al. [16] to handle scenarios with partial observability. In earlier research, Pengwah et al. [17] utilised smart-meter data to quantify voltage sensitivity coefficients in response to fluctuations in load currents.
In [18], a method combining t-distributed stochastic neighbour embedding (t-SNE) with density-based spatial clustering of applications with noise (DBSCAN) was developed to identify LV distribution network topology using noisy smart-meter data. A two-stage method combining linear power-flow modelling with adaptive ridge regression was developed in [19] to jointly identify distribution network topology and line parameters using smart-meter data. In [20], the authors present a wavelet-based topology identification method that uses only energy measurements from smart-meters, demonstrating high accuracy whilst requiring minimal data. However, the method’s performance depends on customers exhibiting distinctive consumption patterns, and its accuracy degrades under low network observability or high renewable energy penetration.
A latent tree model approach was developed in [21] for identifying low-voltage distribution grid topology using only end-user smart-meter voltage data. It employs the Bayesian information criterion and expectation-maximisation algorithms to handle unmeasured intermediate nodes through a three-stage search methodology. However, the method relies on voltage correlation assumptions and requires sufficient voltage variation patterns for accurate latent node identification. The research presented in [22] describes a two-stage topology identification framework that uses a modified expectation-maximisation algorithm, called split-EM, for historical data analysis, followed by machine learning classifiers for real-time prediction; it can handle mixed topologies without prior knowledge of the topology. However, the method requires substantial historical data for effective classifier training.
A correlation-based algorithm, enhanced with Fisher’s Z-transform, was developed to infer the partial topology of LV distribution networks, as reported in [7]. That study focused solely on identifying transformer/phase mappings and did not address the topological connectivity between Installation Control Points (ICPs). Its authors used the correlation of voltage total harmonic distortion (THD) instead of voltage correlation to make the algorithm more robust and resilient against noise and missing records, and their findings indicated that harmonic data provided more reliable results than voltage and energy measurements. Building on their work, the proposed research extends the methodology by employing individual harmonic components.
Previous work on the identification of distribution network topology used estimation of the network voltage sensitivity matrix [23], but it required synchronised voltage and energy data. A low-complexity algorithm was developed in [24] to identify the network operational structure from voltage magnitudes and voltage phase angles, but this research considered only radial networks. In [25], the authors proposed an algorithm that identifies network topologies as a probabilistic graphical model using regularised linear regression. This algorithm can model meshed networks with integrated Distributed Energy Resources (DERs). In [26], the authors discussed how to identify network topologies with limited measurements by approximating these measurements as normally distributed random variables and applying the Maximum Likelihood Principle. The study only considers scenarios in which a single breaker changes status at a time; extending the approach to multiple simultaneous breaker changes would increase the computational complexity exponentially.
This paper provides insights into the factors that influence the accuracy of topology identification algorithms. Furthermore, it introduces a novel approach to identifying the topology of an LV distribution network using smart-meter harmonic measurements. The proposed method demonstrates that utilising harmonic information can yield more precise results than voltage-based approaches.
The paper is structured as follows:
Section 2 discusses the factors affecting topology identification algorithms;
Section 3 describes the methodology for generating synthetic harmonic measurements;
Section 4 proposes the methodology for identifying the LV distribution network topology using harmonics;
Section 5 presents and discusses the results; and
Section 6 provides a conclusion.
2. Influential Factors on Topology Identification Algorithm
This section provides a comprehensive discussion of the factors that may affect the accuracy of the topology identification algorithm. The discussion focuses solely on algorithms that utilise voltage correlation. Furthermore, the discussion highlights the differences in correlation between harmonic measurements and RMS voltage measurements.
Figure 1 illustrates how these factors can be grouped and linked.
2.1. Upstream Voltage Variability
Variability in the upstream voltage, typically originating in the Medium-Voltage (MV) network (usually 11 kV to 66 kV), influences the correlation among different ICPs. This correlation is shaped by upstream events, including automatic transformer tap changes, network switching, and other network operations, all of which have wide-reaching downstream voltage implications. Consequently, these events strengthen the correlation between ICPs associated with different phases and LV networks, effectively decreasing the accuracy of the topology identification algorithm.
The harmonic behaviour of a network is complex, with each non-linear device interacting with its terminal conditions. The sensitivity of harmonic current emission is a function of the device’s component parameters and controls.
The harmonic voltage distortion at the various ICPs results from two sources: harmonic currents injected into the LV network, and harmonic distortion in the upstream network (sometimes modelled as a background harmonic voltage source). This harmonic voltage distortion is a function of the injected harmonic currents and the network’s system admittance matrices, which embody the harmonic impedances of all components and hence resonances.
Although the tap changer operation of transformers does alter the transformer impedance and hence system admittance matrices, the primary effect is on the fundamental voltage, as the dX/dtap is relatively small. However, the presence of upstream capacitors that have been switched IN or OUT can change the system admittance matrices significantly, and hence the resonance point. This will be reflected in a significantly different background harmonic level as seen from the supply point of the LV network.
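A commonly used approximation for the parallel resonance seen from a bus with a shunt capacitor bank is h_r ≈ √(S_sc / Q_c), where S_sc is the short-circuit power at the bus and Q_c is the capacitor rating. The sketch below uses purely illustrative values (not taken from this study) to show how switching an upstream bank IN moves the resonance towards lower harmonic orders; the function name `resonant_harmonic_order` is hypothetical.

```python
import math


def resonant_harmonic_order(s_sc_mva: float, q_cap_mvar: float) -> float:
    """Approximate parallel-resonance harmonic order h_r = sqrt(S_sc / Q_c).

    Assumes a single shunt capacitor bank at a bus with short-circuit power
    s_sc_mva; returns infinity when no capacitance is connected.
    """
    if q_cap_mvar <= 0:
        return math.inf  # bank switched OUT: no parallel resonance from it
    return math.sqrt(s_sc_mva / q_cap_mvar)


# Illustrative fault level at an MV bus (hypothetical value).
S_SC = 250.0
for q_cap in (0.0, 5.0, 10.0):
    h_r = resonant_harmonic_order(S_SC, q_cap)
    print(f"Q_c = {q_cap:4.1f} Mvar -> h_r = {h_r:.2f}")
```

A resonance shifted close to a characteristic order (e.g. the 5th or 7th) amplifies the background distortion seen at the LV supply point, which is the change in background level described above.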
Upstream capacitor switching will influence the voltage distortion levels at the ICPs, but not the correlation. Similarly, the harmonic contribution from upstream non-linear industrial loads and/or distributed energy resources does influence the harmonic voltage distortion at the ICPs. However, the harmonic voltage distortion seen at the ICP is a combination of the upstream distortion influence (MV distortion × Transfer Coefficient) and the harmonic voltage across the branches of the LV system. It is this latter component that allows identification to occur.
The frequency-dependent nature of the inductive branch impedance (Z = R + jX), and also the capacitance present in the network, means that higher-order harmonics encounter a higher series branch impedance. This leads to a more localised observable impact of the higher-order harmonic currents on the harmonic voltage distortion.
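As a minimal sketch of the frequency dependence, assuming a series R-L branch whose reactance scales linearly with harmonic order (X_h = h·X_1) and whose resistance is frequency-independent (skin effect and network capacitance ignored), the branch impedance magnitude grows as follows. The cable values and the function name are illustrative assumptions.

```python
import math


def harmonic_impedance(r_ohm: float, x1_ohm: float, h: int) -> float:
    """|Z_h| of a series R-L branch at harmonic order h.

    Assumes R is constant and the reactance scales with frequency: X_h = h * X_1.
    """
    return math.hypot(r_ohm, h * x1_ohm)


# Hypothetical LV cable section: R = 0.10 ohm, X1 = 0.08 ohm at the fundamental.
R, X1 = 0.10, 0.08
for h in (1, 3, 5, 7, 11, 13):
    print(f"h = {h:2d}: |Z_h| = {harmonic_impedance(R, X1, h):.3f} ohm")
```

Because the reactive term dominates at higher orders, a given harmonic current produces a larger, more local voltage drop across each branch.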
In this paper, background harmonic distortion from upstream sources was explicitly represented in the network modelling to ensure these effects were captured and to ensure the robustness of the identification algorithm to upstream harmonic sources.
2.2. Load Characteristics
The ICP’s voltage is directly affected by the real and reactive power consumed by the ICP, because the drawn current, which is approximately proportional to the power, increases the voltage drop along the line. Equipment used for heating and cooling typically has a higher power consumption than many other loads. However, such high-power loads may be absent during mild seasons when no heating or cooling is required, or in economically disadvantaged areas where electricity is unaffordable. These loads may also be absent when households rely on alternative energy sources such as gas, wood pellets, or coal for heating.
A relatively high consumption is crucial for establishing a strong correlation between the voltages at different ICPs, thereby enhancing the accuracy of identifying network topology.
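The link between consumption and voltage variation can be illustrated with the standard small-angle approximation ΔV ≈ (R·P + X·Q) / V. The sketch below assumes a hypothetical single-phase service line and nominal 230 V; the impedance values and function name are illustrative only.

```python
def approx_voltage_drop(p_w: float, q_var: float, r_ohm: float, x_ohm: float,
                        v_nom: float = 230.0) -> float:
    """Approximate voltage drop (V) along a single-phase line: (R*P + X*Q) / V.

    Assumes a small angle between sending- and receiving-end voltages; P and Q
    are the load's active and reactive power.
    """
    return (r_ohm * p_w + x_ohm * q_var) / v_nom


# Hypothetical line impedance.
R_LINE, X_LINE = 0.2, 0.1
heater = approx_voltage_drop(3000.0, 0.0, R_LINE, X_LINE)   # 3 kW heater
standby = approx_voltage_drop(200.0, 50.0, R_LINE, X_LINE)  # light load
print(f"3 kW heater: {heater:.2f} V, light load: {standby:.2f} V")
```

A voltage swing an order of magnitude larger gives correlation analysis a much clearer signal to work with than the small variation produced by light loads.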
The introduction of DERs has the potential to significantly impact voltage fluctuations within distribution networks. This can pose challenges for traditional techniques that rely on voltage correlation for network topology identification. Moreover, the integration of rooftop solar panels, with their increasing penetration, further exacerbates the issue of phase identification within the network. When many solar systems in the same area generate power at the same time, they cause similar voltage changes across all phases, making phase identification even more difficult.
Conversely, harmonics do not directly depend on the load power but rather on the load type. In the presence of increased non-linear loads, harmonics become more apparent in the LV network, enabling a robust harmonic correlation between ICPs. Moreover, the rise in DERs and electric vehicles will also contribute to an increase in harmonics in the LV network. Consequently, this allows for more accurate results in identifying network topology based on harmonic measurements.
Energy theft can alter the metered consumption trend of ICPs, while the energy actually drawn from the network increases. The additional current increases the voltage drop and may give the phases on which theft occurs more distinctive features, resulting in more accurate topology identification.
Balanced three-phase loads, or phase-to-phase connected loads (that are connected between phases rather than phase-to-neutral), will affect the same phases with a similar trend of voltage variation. This may lead to the introduction of correlations between phases that do not reflect their topological connections.
2.3. Load Demand Response
The demand response of the ICP may be synchronised in a manner that complicates the network identification process. This can arise in the following ways:
Some utilities introduce programmes to incentivise consumers to reduce or shift their electricity usage during peak demand periods. Moreover, certain companies offer programmes such as “free hours” power plans. These initiatives can result in increased or decreased demand during specific times, leading to voltage variations that exhibit similar trends across wider geographical areas.
The implementation of ripple control for hot water cylinders can adjust the load across the entire network in a synchronised manner. This adjustment strengthens the correlation between different phases, potentially causing misidentification of the network structure. The broader the coverage of ripple control across an area, the more pronounced its effects become. This phenomenon occurs because its impact permeates through both LV networks and is compounded by similar LV demand control measures in adjacent networks.
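The effect of a synchronised control signal on correlation can be shown with synthetic data. In this sketch, two phases carry independent load-driven voltage variations, and a hypothetical square-wave component (standing in for ripple-controlled hot water cylinders switching together) is added to both; all data are generated, not measured.

```python
import numpy as np


def pearson(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.corrcoef(a, b)[0, 1])


rng = np.random.default_rng(seed=1)
n_samples = 2000

# Independent voltage variations on two different phases (synthetic).
phase_a = rng.normal(0.0, 1.0, n_samples)
phase_b = rng.normal(0.0, 1.0, n_samples)

# Hypothetical synchronised component common to both phases.
t = np.arange(n_samples)
ripple = np.where((t // 200) % 2 == 0, 1.5, -1.5)

r_without = pearson(phase_a, phase_b)
r_with = pearson(phase_a + ripple, phase_b + ripple)
print(f"without synchronised control: r = {r_without:.2f}")
print(f"with synchronised control:    r = {r_with:.2f}")
```

The two phases are physically unrelated in both cases, yet the shared component alone pushes their correlation well above zero, which is the misidentification risk described above.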
The Vehicle-to-Grid (V2G) technology presents another challenge. Although V2G technologies remain in the early developmental stages, the potential for synchronised control by third parties may emerge in future implementations. When such capabilities materialise, the coordinated utilisation of V2G infrastructure on a large scale could substantially increase the correlation between different phases and networks within the system, thereby complicating the network identification processes.
Overall, the introduction of DERs, dynamic load demand, and emerging technologies such as V2G have profound implications for network topology identification and phase recognition in distribution systems. Consequently, there is a pressing need for the development of novel methodologies and algorithms to effectively address these challenges in contemporary distribution network management.
2.4. Network Structures
The structure of a network can significantly impact the correlation matrix between different ICPs. For example, in networks with radial and long feeders, the correlations between ICPs are more distinct. Conversely, mesh networks might make these correlations less clear, potentially affecting the performance of the network identification algorithms.
The extensive size of the network and the high number of ICPs typically result in a more distinct correlation matrix. However, a higher number of ICPs can increase the probability of exhibiting misleading strong correlations between ICPs that lack physical linkage.
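The growth of misleading correlations with the number of ICPs can be demonstrated with mutually independent synthetic series, where every non-zero correlation is spurious. The 96-sample window (e.g. one day of 15-min readings) and the function name are illustrative assumptions.

```python
import numpy as np


def max_spurious_correlation(series: np.ndarray) -> float:
    """Largest |r| between any two columns (ICPs) of `series`."""
    r = np.corrcoef(series, rowvar=False)
    np.fill_diagonal(r, 0.0)  # ignore self-correlation
    return float(np.abs(r).max())


rng = np.random.default_rng(seed=7)
n_samples = 96
# Independent series: any correlation between columns is spurious.
all_icps = rng.normal(size=(n_samples, 200))

for n_icps in (10, 50, 200):
    print(n_icps, round(max_spurious_correlation(all_icps[:, :n_icps]), 2))
```

Because the number of ICP pairs grows quadratically, the strongest chance correlation rises with network size, and a short measurement window makes this worse.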
The voltages of the ICPs are directly influenced by the impedance of the network. However, different harmonic orders perceive the network impedance differently, particularly in terms of reactance. Higher-order harmonics perceive the network impedance as a higher value, leading to increased harmonic voltage distortion and more distinct values for each phase in the network. Consequently, it is expected that voltage harmonic distortion will provide better performance in topology identification, especially in networks with relatively lower line impedance.
2.4.1. Capacitor Banks and Voltage Regulators
The operation of capacitor banks and voltage regulators in LV networks can obscure natural voltage fluctuations, reducing the ability to distinguish correlations between interconnected nodes within the network. This occurs because these devices maintain voltage levels within specified limits, thereby modifying the inherent voltage variability across the network.
2.4.2. Poor Neutral Conductor Connection
If a poor neutral conductor connection remains undetected, it can cause unexpected voltage behaviour on individual phases. Certain phases may then experience overvoltages or undervoltages, which can strengthen the correlation between ICPs that are not necessarily topologically connected.
2.5. Consideration of Measurement Characteristics
2.5.1. Data Synchronisation
Ensuring accurate correlations between corresponding phases relies heavily on effective data synchronisation. Nonetheless, the importance of synchronisation diminishes as the data time resolution becomes coarser (i.e. as the averaging interval lengthens). However, selecting a coarser time resolution may compromise the accuracy of correlation results, because the distinctive features of voltage fluctuations begin to fade. Furthermore, synchronisation issues could become more prominent for instantaneous voltage readings from smart-meters or, in some cases, for the minimum and maximum values over specific periods. For harmonic measurements, synchronisation is a less critical concern, as these typically use standard measurement windows. According to IEC 61000-4-30 [27], the standard window for harmonic measurements is typically 10/12 cycles for 50/60 Hz systems, which corresponds to a 200 ms window.
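The trade-off between synchronisation and time resolution can be sketched with synthetic data: two hypothetical meters record the same 1-min voltage deviations, but one clock lags by a single sample. White noise is an assumption that exaggerates the effect, since real voltage profiles are autocorrelated.

```python
import numpy as np


def block_average(x: np.ndarray, block: int) -> np.ndarray:
    """Average consecutive, non-overlapping blocks of `block` samples."""
    n = len(x) // block * block
    return x[:n].reshape(-1, block).mean(axis=1)


rng = np.random.default_rng(seed=3)
# Synthetic 1-min voltage deviations shared by two ICPs on the same phase.
common = rng.normal(size=30 * 500)
icp_1 = common
icp_2 = np.roll(common, 1)  # second meter's clock lags by one sample (1 min)

for block in (1, 5, 30):  # 1-min, 5-min and 30-min resolution
    a, b = block_average(icp_1, block), block_average(icp_2, block)
    print(f"{block:2d}-min: r = {np.corrcoef(a, b)[0, 1]:.2f}")
```

At 1-min resolution the one-sample offset destroys the correlation, whereas 30-min averages are almost unaffected; the cost, not shown here, is that short-lived distinctive features are averaged away.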
2.5.2. Metering Errors
Smart-meter errors can generally be categorised into two types:
Systematic Errors: Systematic errors are constant and may arise from issues such as faulty internal components or incorrect calibration. Correlation techniques that are insensitive to such errors, such as Pearson correlation, can be employed to address them. Pearson correlation is invariant to constant offsets and positive gain errors, capturing only the strength and direction of the linear relationship between data sets.
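The insensitivity of Pearson correlation to constant errors follows from its invariance under positive affine transformations (a·x + b with a > 0). The sketch below applies a hypothetical 2 % gain error and 3 V offset to synthetic voltages and checks that the correlation with another ICP is unchanged.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
true_v = 230 + rng.normal(0, 2, 1000)        # synthetic true voltages (V)
other_v = true_v + rng.normal(0, 0.5, 1000)  # another ICP on the same phase

# Hypothetical systematic error: 2 % gain error plus a 3 V offset.
biased_v = 1.02 * true_v + 3.0

r_true = np.corrcoef(true_v, other_v)[0, 1]
r_biased = np.corrcoef(biased_v, other_v)[0, 1]
print(np.isclose(r_true, r_biased))  # True
```

This holds only for errors that are constant over the analysis window; a gain that drifts over time would alter the correlation.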
Random Errors: Random errors associated with smart-meters are typically unpredictable and may arise from fluctuations in measured data due to factors such as a malfunctioning smart-meter. These errors can lead to outliers, potentially impacting the performance of correlation analyses. However, the significance of these random errors diminishes when they persist continuously over an extended period, as they then take on characteristics resembling those of systematic errors.
2.6. Methods for Analysing Voltage Measurements
2.6.1. Correlation Techniques
Correlation and clustering techniques are the most common analytical methods for identifying network topologies. However, depending on the data characteristics, certain correlation techniques prove more suitable than others. For example, Pearson correlation is appropriate for assessing linear relationships, whereas Spearman’s rank and Kendall’s tau correlations are more suitable for assessing monotonic, non-linear relationships.
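The difference can be seen on a monotonic but strongly non-linear relationship. This NumPy-only sketch assumes continuous data without ties (Kendall's tau-a and tau-b then coincide); in practice, `scipy.stats.spearmanr` and `scipy.stats.kendalltau` provide tested implementations that also handle ties.

```python
import numpy as np


def ranks(x: np.ndarray) -> np.ndarray:
    """Ranks 0..n-1 (assumes no ties, which holds for continuous data)."""
    order = np.argsort(x)
    r = np.empty(len(x))
    r[order] = np.arange(len(x))
    return r


def pearson(x: np.ndarray, y: np.ndarray) -> float:
    return float(np.corrcoef(x, y)[0, 1])


def spearman(x: np.ndarray, y: np.ndarray) -> float:
    # Spearman's rho is the Pearson correlation of the ranks.
    return pearson(ranks(x), ranks(y))


def kendall_tau(x: np.ndarray, y: np.ndarray) -> float:
    # Tau-a: (concordant - discordant) / number of pairs; O(n^2), fine for small n.
    i, j = np.triu_indices(len(x), k=1)
    s = np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
    return float(s.sum() / len(i))


x = np.linspace(0.1, 3.0, 50)
y = np.exp(3 * x)  # monotonic but strongly non-linear
print(round(pearson(x, y), 3), round(spearman(x, y), 3), round(kendall_tau(x, y), 3))
```

Both rank-based measures report a perfect monotonic association, while Pearson understates it because the relationship is not linear.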
2.6.2. Enhancing the Precision of Topology Identification by Integrating the Locational Information of ICPs
The accuracy of topology identification can be greatly enhanced by integrating the locational information of ICPs, when available. This enables the identification of ICPs that are more likely to be physically linked. The locations are then checked against the correlation values, as some ICP pairs may exhibit strong correlations but lack a physical link, possibly because they belong to a different network. Such situations can arise when processing a large number of ICPs over a relatively short measurement period. Therefore, the algorithm should incorporate location information rather than relying solely on measurement results, which can sometimes be misleading. This improves the precision of correlation-based identification and, consequently, leads to more accurate network topologies.
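One simple way to combine location with correlation is to make physically implausible pairs ineligible before topology construction. The sketch below is a hypothetical illustration, not the method used in this paper: it assumes projected x/y coordinates in metres, straight-line separation, and an illustrative threshold `max_span_m`.

```python
import numpy as np


def mask_by_location(dist: np.ndarray, coords_m: np.ndarray,
                     max_span_m: float) -> np.ndarray:
    """Copy of a correlation-distance matrix in which ICP pairs further apart
    than `max_span_m` (straight line, metres) get an infinite distance.

    `coords_m` is an (n, 2) array of projected x/y coordinates in metres.
    """
    diff = coords_m[:, None, :] - coords_m[None, :, :]
    separation = np.sqrt((diff ** 2).sum(axis=-1))
    masked = dist.copy()
    masked[separation > max_span_m] = np.inf
    return masked


# ICP 2 is strongly correlated with ICP 0 but located about 5 km away.
d = np.array([[0.0, 0.3, 0.05],
              [0.3, 0.0, 0.4],
              [0.05, 0.4, 0.0]])
xy = np.array([[0.0, 0.0], [50.0, 0.0], [5000.0, 0.0]])
print(mask_by_location(d, xy, max_span_m=500.0))
```

The misleading 0.05 distance between ICP 0 and ICP 2 is removed, while the plausible pair (ICP 0, ICP 1) keeps its correlation-based distance.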
3. Methodology for Generating Synthetic Harmonic Measurements
Obtaining actual harmonic data remains a challenge in the distribution sector compared with readily available power and voltage data. The industry may not perceive significant benefits from these measurements, especially since there are often no regulations on harmonic emissions from domestic ICPs. However, harmonics are increasingly causing issues at the distribution level and are gaining more attention from industry bodies.
For this research, synthetic harmonic measurements were generated to test the proposed algorithm for identifying the distribution network topology. The synthetic data provide flexibility, scalability, and full network observability. The methodology used for generating synthetic harmonic and voltage measurements was presented in [28]. It comprises three stages, as illustrated in Algorithm 1. This approach combines data from the CREST demand model [29] and PANDA (equiPment hArmoNic DAtabase) [30] to create detailed profiles, including active power, reactive power, and harmonic emissions. The process entails generating initial real power load profiles, assigning specific appliances to each dwelling, and subsequently calculating the corresponding reactive power and harmonic emissions. This method produces varied and realistic load profiles that reflect the diversity of household electricity consumption patterns and harmonic emissions across ICPs.
The CREST tool, which was utilised to generate realistic power load profiles, is provided in Excel format. The tool’s Excel VBA script was modified to extract individual appliance load profiles.
Table 1 presents the main electrical devices modelled in this study.
Algorithm 1 Generation of Load Profiles and Harmonic Emissions for Residential Dwellings
Require: CREST demand model, PANDA database, number of dwellings N
Stage 1: Initial Active Load Demand Generation
    Initialise the CREST demand model
    Configure the CREST model parameters  ▹ T denotes the time interval
Stage 2: Appliance Selection and Assignment
    for each of the N dwellings do
        select and assign appliances, with variance
    end for
Stage 3: Load Profile Refinement and Harmonic Emission Generation
    ▹ Refined active load profiles
    ▹ Generated reactive load profiles
    ▹ Generated harmonic emissions
Output:
    Refined active power demand profiles
    Reactive power demand profiles
    Harmonic current emissions
    Set of device assignments per dwelling
Harmonic measurements for each appliance were obtained from PANDA. Multiple actual measurements of household appliances with different types and ratings were selected. The measurements were selected randomly, without considering the supply type (whether a pure sinusoidal source or the main grid), as no clear relationship exists between these conditions and the appliances’ harmonic emissions [31]. Therefore, for each dwelling, specific PANDA appliance measurements were created by adjusting the original data with a uniform random variable, as shown in Equation (1).
The random value alters the actual harmonic measurements from the PANDA database by up to ±5%, varying them around the recorded level. This approach creates a unique appliance database for each house, introducing more realistic variations in harmonic emissions: because harmonic emissions are influenced by the terminal voltage waveform, they vary around the recorded operating point.
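The ±5% variation can be sketched as follows. This assumes Equation (1) multiplies each recorded harmonic current by an independent factor drawn from U(0.95, 1.05); whether one factor per appliance or one per harmonic order is used is an assumption, as are the spectrum values and the function name.

```python
import numpy as np


def vary_appliance_spectrum(harmonic_currents: np.ndarray,
                            rng: np.random.Generator,
                            spread: float = 0.05) -> np.ndarray:
    """Scale each recorded harmonic current by a factor drawn uniformly from
    [1 - spread, 1 + spread) (assumption: independent factor per order)."""
    factors = rng.uniform(1.0 - spread, 1.0 + spread, size=harmonic_currents.shape)
    return harmonic_currents * factors


rng = np.random.default_rng(seed=42)
# Hypothetical recorded spectrum (A) for the 3rd, 5th, 7th and 9th harmonics.
recorded = np.array([0.80, 0.50, 0.30, 0.15])
# A unique variant of the same appliance for each of three dwellings.
dwellings = [vary_appliance_spectrum(recorded, rng) for _ in range(3)]
for spectrum in dwellings:
    print(np.round(spectrum, 3))
```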
Following the establishment of a unique database of appliances for each house, power consumption and harmonic emissions were aggregated to create comprehensive load profiles. These profiles were subsequently assigned to the respective houses within the network, as shown in Figure 2. Each load profile encompasses two primary components:
Power characteristics, including both active power and reactive power components, with loads modelled as constant power elements.
Harmonic emission profiles, selected based on the corresponding power profiles, with emissions modelled as harmonic current injection sources.
The power-flow and harmonic analyses were performed using DIgSILENT PowerFactory 2024 (x64). Because quasi-dynamic simulation for harmonic analysis is not available in PowerFactory, an automation script written in Python 3.10.8 was developed to obtain these results.
The voltage and THD measurements for different nodes are shown in Figure 3 and Figure 4. The figures show that ICPs on the same phase exhibit similar voltage trends, leading to strong correlation between them, whereas the correlation between ICPs on different phases is relatively weaker.
4. Proposed Algorithm for Identifying the Network Topology
The proposed methodology leverages harmonic voltage correlation patterns between ICPs to reconstruct the network topology. This approach is based on the fundamental principle that voltage variations at nodes within the same electrical phase exhibit stronger correlations compared to nodes on different phases or networks.
The primary objective of the proposed algorithm is to accurately identify the topology of LV distribution networks. Unlike conventional methods that depend on current measurements or power consumption data from ICPs, this approach relies solely on voltage-based metrics, namely RMS voltage, individual harmonic voltage components, and voltage THD. By analysing the correlations of these metrics, the method maps the underlying network structure.
This section introduces the methodology in detail, highlighting its novel aspects. Specifically, the proposed approach employs THD and individual harmonic voltage components as topology indicators. Furthermore, it modifies Kruskal’s classical minimum spanning tree algorithm (MST-KRUSKAL) to suit the specific characteristics of electrical distribution network topologies.
Figure 5 presents a high-level overview of the proposed algorithm for identifying the distribution network topology, while Algorithm 2 provides a more detailed description of the methodology, which consists of three main stages:
Stage I: Correlation and Distance Matrix Calculation. Development of correlation and distance matrices from voltage and harmonic measurement data, as elaborated in Section 4.1.
Stage II: Topology Construction via Modified MST-KRUSKAL. Construction of the topology using the modified MST-KRUSKAL algorithm, as elaborated in Section 4.2.
Stage III: Topological Similarity Assessment. Quantitative assessment of the topological similarity between the estimated and actual network configurations, as elaborated in Section 4.3.
Algorithm 2 Distribution Network Topology Identification using Modified MST-KRUSKAL
Require:
    Complete measurement space, comprising:
        RMS voltage measurements at ICPs
        Harmonic voltage components
        THD measurements
    k: target number of clusters (three phases × number of networks)
Ensure: Network topology G
Stage I: Correlation and Distance Matrix Calculation
    for each measurement type do
        compute the correlation matrix R  ▹ Correlation matrix
        convert R to the distance matrix D  ▹ Convert to distances
    end for
Stage II: Topology Construction using Modified MST-KRUSKAL
    G ← CreateGraph(V)  ▹ Initialise with nodes
    E ← SortedEdges(D)  ▹ Sort by weighted distances
    for each edge (i, j) in E do
        if not HasPath(G, i, j) then
            AddEdge(G, i, j)
        end if
        if NumberComponents(G) = k then
            break
        end if
    end for
Stage III: Topological Similarity Assessment
    compare G with the actual topology  ▹ Calculate similarity metrics
    return G
4.1. Distance Matrix Calculation Using Pearson Correlation
A crucial step in hierarchical clustering is constructing the distance matrix, which quantifies the dissimilarity between observations. Since the measurement data comprise voltages and harmonics, and the focus here is on the relationship between variables rather than their absolute values, the Pearson correlation coefficient provides a natural basis for calculating distances. The process involves the following steps:
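First, the Pearson correlation coefficient $r_{xy}$ between two observations $x$ and $y$, each consisting of $m$ samples, is calculated as follows:
$$ r_{xy} = \frac{\sum_{t=1}^{m} (x_t - \bar{x})(y_t - \bar{y})}{\sqrt{\sum_{t=1}^{m} (x_t - \bar{x})^2}\,\sqrt{\sum_{t=1}^{m} (y_t - \bar{y})^2}} $$
where
$$ \bar{x} = \frac{1}{m} \sum_{t=1}^{m} x_t, \qquad \bar{y} = \frac{1}{m} \sum_{t=1}^{m} y_t. $$
Let $R = [r_{ij}] \in \mathbb{R}^{n \times n}$ be the correlation matrix of $n$ variables, where each element $r_{ij}$ represents the Pearson correlation coefficient between the $i$-th and $j$-th variables. The matrix is symmetric, with diagonal elements equal to 1 ($r_{ii} = 1$) and off-diagonal elements satisfying $-1 \le r_{ij} \le 1$, for all $i \neq j$.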
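To use the correlation matrix $R$ for clustering, it is transformed into a distance matrix $D$ using the following transformation:
$$ d_{ij} = 1 - r_{ij} $$
This transformation ensures non-negative distances, where $d_{ij} = 0$ for perfectly positively correlated variables ($r_{ij} = 1$) and $d_{ij} \ge 1$ for zero or negatively correlated variables ($r_{ij} \le 0$). The resulting distance matrix is as follows:
$$ D = \begin{bmatrix} 0 & d_{12} & \cdots & d_{1n} \\ d_{21} & 0 & \cdots & d_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ d_{n1} & d_{n2} & \cdots & 0 \end{bmatrix} $$
Each element $d_{ij}$ of $D$ is derived from the corresponding correlation coefficient $r_{ij}$, ensuring compatibility with clustering algorithms by representing dissimilarity between variables. Note that the diagonal elements are now 0.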
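The correlation-to-distance step can be sketched in a few lines with `numpy.corrcoef`, treating each ICP as a column and each time sample as a row. The sketch assumes the transformation d_ij = 1 − r_ij; the `clip_negative` option is a hypothetical variant that sets d_ij = 1 for all r_ij ≤ 0.

```python
import numpy as np


def correlation_distance(measurements: np.ndarray,
                         clip_negative: bool = False) -> np.ndarray:
    """Pearson correlation between ICPs (columns) converted to distances.

    Assumes d_ij = 1 - r_ij; with clip_negative=True, negative correlations
    are first set to zero so that d_ij = 1 whenever r_ij <= 0.
    """
    r = np.corrcoef(measurements, rowvar=False)
    if clip_negative:
        r = np.maximum(r, 0.0)
    d = 1.0 - r
    np.fill_diagonal(d, 0.0)  # remove round-off on the diagonal
    return d


rng = np.random.default_rng(seed=5)
base = rng.normal(size=500)
# Columns: reference ICP, offset/scaled copy, inverted copy, unrelated ICP.
X = np.column_stack([base, 2 * base + 1, -base, rng.normal(size=500)])
print(np.round(correlation_distance(X), 2))
```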
4.2. Modified Kruskal’s Minimum Spanning Tree Algorithm
Kruskal’s algorithm solves the minimum spanning tree (MST) problem using a greedy approach [32]. Given a connected, undirected graph $G = (V, E)$ with a weight function $w: E \rightarrow \mathbb{R}$, the algorithm builds a minimum spanning tree by iteratively selecting the lowest-weight edge that connects two distinct components. It begins with each vertex as its own tree and progressively merges these trees until a single tree spans all vertices.
Unlike the standard MST approach, wherein the process continues until a single tree spans all vertices, the proposed modified MST-KRUSKAL algorithm is designed to construct multiple spanning trees based on predefined network constraints. Rather than merging all components into a single tree, the algorithm ensures that precisely k trees are formed, where k corresponds to the number of distinct network clusters. For instance, in an LV distribution network, each phase (A, B, and C) is treated as a separate cluster, ensuring that connections respect the underlying phase structure. This modification aligns the result with practical electrical distribution constraints whilst preserving the efficiency of the original Kruskal’s approach.
The proposed Modified MST-KRUSKAL algorithm is formally defined in Algorithm 3 and employs the following principal functions:
CreateGraph(V): Initialises a graph structure G with the vertex set V, representing the collection of network nodes, including ICPs.
SortedEdges(D): Sorts the edges in ascending order of the distances in the distance matrix D, which encodes the edge weights of the graph.
HasPath(G, i, j): Checks whether a path exists between nodes i and j in graph G, thereby ensuring that no cycles are formed in the resulting structure.
AddEdge(G, i, j): Adds a new edge connecting vertices i and j to graph G, thus expanding the network topology.
NumberComponents(G): Returns the current number of connected components (trees) in graph G, providing a measure of network segmentation.
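Algorithm 3 Modified MST-KRUSKAL($D$, k)
Require: 1: $D$: Distance Matrix
2: k: Target number of clusters (three phases × number of networks)
Ensure: $G$: forest containing k trees
3: $G \leftarrow$ CreateGraph($V$) ▹ Initialise with nodes
4: $E \leftarrow$ SortedEdges($D$) ▹ Sort by weighted distances (ascending order)
5: for each edge $(i, j) \in E$ do
6:   if not HasPath($G$, i, j) then
7:     AddEdge($G$, i, j)
8:   end if
9:   if NumberComponents($G$) = k then
10:     break
11:   end if
12: end for
13: return $G$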
The algorithm initialises by creating a singleton tree for each ICP. It then examines edges in ascending order of weight, checking whether each would form a cycle. Edges that do not create cycles are added to the forest, merging two previously separate trees. The algorithm continues this process while monitoring the number of trees. Initially, the number of trees equals the number of vertices ($|V|$), as each vertex represents an individual tree. Each accepted edge reduces this number by one, and the algorithm terminates when the number of trees reaches the target value k.
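The procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the distance matrix is taken to be a symmetric list of lists, the function name `modified_kruskal` is hypothetical, and a disjoint-set (union-find) structure replaces the HasPath search as the cycle check, which rejects exactly the same edges.

```python
from typing import List, Tuple


def modified_kruskal(dist: List[List[float]], k: int) -> List[Tuple[int, int]]:
    """Grow a minimum spanning forest and stop once exactly k trees remain.

    Hypothetical sketch: cycle detection uses union-find instead of an
    explicit path search; both reject an edge whose endpoints are already
    connected.
    """
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must lie between 1 and the number of nodes")
    parent = list(range(n))  # every node starts as its own tree

    def find(x: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All undirected edges (i < j), sorted by ascending distance.
    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    forest: List[Tuple[int, int]] = []
    components = n
    for _, i, j in edges:
        if components == k:  # target number of trees reached
            break
        ri, rj = find(i), find(j)
        if ri != rj:  # no existing path, so (i, j) creates no cycle
            parent[ri] = rj
            forest.append((i, j))
            components -= 1
    return forest


# Two tight pairs of nodes: {0, 1} and {2, 3}.
dist = [
    [0, 1, 9, 9],
    [1, 0, 9, 9],
    [9, 9, 0, 2],
    [9, 9, 2, 0],
]
print(modified_kruskal(dist, 2))
```

With k = 2, the loop accepts the two short edges (0, 1) and (2, 3) and stops before joining the pairs. One design difference from Algorithm 3: the sketch tests the tree count before each edge, so a k equal to the number of nodes returns an empty forest instead of adding an edge first.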
4.3. Topological Similarity in Electrical Distribution Networks
Electrical distribution networks possess distinct characteristics that make traditional graph comparison methods insufficient for meaningful topology comparison. The fundamental challenge in comparing electrical distribution network topologies lies in recognising functional configurations and understanding the impact of incorrect topologies.
To compare network topologies, let $G = (V, E)$ denote a network topology in formal mathematical notation, where:
$V$ is the set of vertices;
$E$ is the set of edges.
Let $G_a = (V, E_a)$ represent the actual topology and $G_e = (V, E_e)$ represent the estimated topology. The estimated and actual graphs share the same vertex set $V$ but may have different edge sets ($E_a \neq E_e$), or the same edges ($E_a = E_e$) in the case of identical topologies.
To identify missing edges, the comparison algorithm examines each edge within the estimated topology $G_e$. For every edge $(u, v) \in E_e$, the minimum edge count $d_{G_a}(u, v)$ in $G_a$ is calculated. The minimum edge count function measures the smallest number of edges on a path between vertices $u$ and $v$ in $G_a$, providing a quantitative measure of topological differences that accounts for both local and global variations in network structure. The results from this comparison indicate the number of missing edges in $G_a$ compared with $G_e$.
Similarly, incorrect edges are identified by examining each edge $(u, v) \in E_a$ within the actual topology and computing $d_{G_e}(u, v)$ in $G_e$. In this framework, an edge is classified as incorrect or missing only when the minimum edge count between vertices $u$ and $v$ exceeds a predefined threshold. This relaxation tolerates minor topology errors in which the two vertices are only a few edges apart, and therefore yields more consistent results.
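The similarity metric for topology comparison is defined as one minus the proportion of incorrect edges relative to the total number of edges. The metric S ranges from 0 to 1, where 0 indicates no similarity (all edges are incorrect) and 1 indicates perfect similarity (no incorrect or missing edges):
$$S = 1 - \frac{N_{\mathrm{inc}}}{N_{\mathrm{tot}}}$$
where $N_{\mathrm{inc}}$ is the number of incorrect edges and $N_{\mathrm{tot}}$ is the total number of edges.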
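The comparison and scoring steps above can be sketched in Python, assuming edges are given as pairs of integer node labels. All names (`min_edge_count`, `flag_edges`, `similarity`) are hypothetical, and `max_hops` is a placeholder for the relaxation threshold, whose value is not restated here. The minimum edge count is computed by breadth-first search, and swapping the two edge lists in `flag_edges` covers both comparison directions.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def adjacency(edges: Iterable[Edge]) -> Dict[int, Set[int]]:
    """Build an undirected adjacency map from an edge list."""
    adj: Dict[int, Set[int]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def min_edge_count(adj: Dict[int, Set[int]], src: int, dst: int) -> float:
    """Breadth-first search for the fewest edges between src and dst (inf if unreachable)."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, hops = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == dst:
                return hops + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return float("inf")


def flag_edges(checked: List[Edge], reference: List[Edge], max_hops: int) -> List[Edge]:
    """Edges of `checked` whose endpoints are more than max_hops apart in `reference`."""
    adj = adjacency(reference)
    return [(u, v) for u, v in checked if min_edge_count(adj, u, v) > max_hops]


def similarity(n_incorrect: int, n_total: int) -> float:
    """S = 1 - N_inc / N_tot; input validation keeps S within [0, 1]."""
    if n_total <= 0 or not 0 <= n_incorrect <= n_total:
        raise ValueError("need n_total > 0 and 0 <= n_incorrect <= n_total")
    return 1.0 - n_incorrect / n_total


# Toy example: the actual topology is the path 0-1-2-3-4; the estimate
# replaces edge (3, 4) with the spurious edge (0, 4).
actual = [(0, 1), (1, 2), (2, 3), (3, 4)]
estimated = [(0, 1), (1, 2), (2, 3), (0, 4)]
flagged = flag_edges(estimated, actual, max_hops=2)
print(flagged, similarity(len(flagged), len(estimated)))
```

In this toy case only (0, 4) is flagged, because nodes 0 and 4 are four edges apart in the actual path, giving S = 1 − 1/4 = 0.75. Edges whose endpoints lie within `max_hops` of each other are tolerated, which mirrors the relaxation described above.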
The relationship between topological similarity and functional impact is not straightforward in these networks. Two topologies may be structurally similar yet differ in ways that significantly affect the electrical distribution network; for example, a single edge connecting two different phases. Conversely, some dissimilarities between two network topologies may have negligible impact, such as an edge that connects two nodes in the same phase that are very close to each other. This difference in impact stems from the physical principles that govern electrical distribution networks. When assessing topology accuracy, edge anomalies can therefore also be categorised as follows:
Intra-network errors: edges that incorrectly connect ICPs within the same phase and LV network.
Inter-network errors: edges that incorrectly link ICPs across different phases or networks.
These two types of incorrect connections are shown in Figure 6.
Identifying the type of each incorrect edge provides a more comprehensive assessment of topology accuracy, as the two types have distinct impacts on the network. In particular, edges that incorrectly link ICPs across different phases or networks can significantly distort the overall network topology.
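Given phase and network labels for each ICP, separating the two error types reduces to a label comparison. The Python sketch below assumes such labels are available as dictionaries (hypothetical inputs), and the function name `classify_incorrect_edges` is likewise hypothetical. The toy labels reuse node names from the examples in Section 5.1 purely for illustration; the phases assigned to 61S and 14S are assumptions.

```python
from typing import Dict, Iterable, Tuple


def classify_incorrect_edges(
    incorrect: Iterable[Tuple[str, str]],
    phase: Dict[str, str],
    network: Dict[str, str],
) -> Dict[str, int]:
    """Count incorrect edges as intra-network (same phase and same LV network) or inter-network."""
    counts = {"intra": 0, "inter": 0}
    for u, v in incorrect:
        same_cluster = phase[u] == phase[v] and network[u] == network[v]
        counts["intra" if same_cluster else "inter"] += 1
    return counts


# Illustrative labels only; the network letter is taken from the node-name suffix.
phase = {"2S": "C", "61S": "C", "45N": "C", "14S": "C", "33E": "B", "53S": "A"}
network = {node: node[-1] for node in phase}
edges = [("2S", "61S"), ("45N", "14S"), ("33E", "53S")]
print(classify_incorrect_edges(edges, phase, network))
```

Here 2S–61S stays within one phase and network (intra-network), 45N–14S crosses networks, and 33E–53S crosses phases and networks, so two of the three edges count as inter-network errors.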
5. Results and Discussion
5.1. Initial Visualised Results for Topology Identification Algorithm Performance
This section evaluates the performance of the proposed topology identification methodology using two measurements: RMS voltage and THD. The results are visualised for three LV distribution networks (denoted S, N, and E), each fed by a distinct transformer connected to an MV network. Estimated topology connections are depicted as coloured lines (red, blue, and green) corresponding to the three phases, whilst actual connections are shown as grey dotted lines for comparison.
Figure 7 illustrates the topology derived from RMS voltage measurements. Although the RMS-voltage-based results capture the general radial structure of the actual network, several incorrect links are observed. These inaccuracies can be categorised as follows:
Intra-network errors: connections between nodes of the same phase and network that deviate from the actual network. For example, the blue line (phase C) incorrectly links node 2S to 61S in network S, whereas the correct connection should link 2S to either node 42S or 55S.
Inter-network errors: spurious links across different phases or networks. A prominent example is the blue line (phase C) connecting node 45N to 14S, which violates the radial hierarchy. Similarly, the edge between node 33E (phase B) and 53S (phase A) misrepresents the actual connections.
In contrast, Figure 8, based on THD measurements, demonstrates superior accuracy in resolving phase-specific connections and minimising ambiguous links. The enhanced performance stems from the sensitivity of THD to harmonic propagation patterns, which are inherently tied to network impedance and topology. This allows finer discrimination of the phases and networks of ICPs. For instance, the THD-based method correctly isolates phase C (blue) in network N, avoiding the cross-network errors seen in the RMS voltage results.
The colour-coded phase identification in both figures aligns with the actual topology to varying degrees. Whilst the RMS-voltage-based approach shows good alignment mainly in simpler radial branches, the THD-based method achieves near-perfect correspondence with the actual connections. This robustness arises from the unique harmonic signature at each node, which serves as a discriminative feature for topology inference that RMS-voltage-based methods alone do not exploit.
These findings underscore the advantages of incorporating harmonic distortion metrics into topology identification frameworks. Utility operators can leverage THD-driven insights, particularly in systems equipped with smart-meters capable of capturing harmonic data, to enhance network visibility and support operational tasks such as phase balancing, network restructuring, fault localisation, and DER integration. The next subsection presents quantitative performance metrics that further validate these observations.
5.2. Performance Metrics: Similarity Score
The topology identification results depicted in Figure 9 demonstrate a clear relationship between measurement type and similarity score across various time resolution settings. The results are illustrated through two visual representations: Figure 9a displays a line graph of the similarity score as a function of time resolution, while Figure 9b shows a matrix of numerical similarity scores, with each cell colour-coded to form a heatmap. The similarity score is calculated between the actual and estimated topologies produced by the proposed topology identification algorithm. The graph reveals distinct performance patterns among RMS voltage, the individual harmonic components, and THD, with notable variations in similarity scores as the time resolution increases. This analysis indicates how effective each measurement type is for topology identification.
Figure 9 demonstrates that THD measurements exhibit superior accuracy in topology identification across all time resolution settings, with perfect similarity scores for time resolutions of less than 30 min. In contrast, the results based on RMS voltage show the lowest performance among the evaluated metrics.
Analysis of lower-order harmonics (–) reveals several important characteristics. These components exhibit lower similarity scores than THD, particularly at a one-minute time resolution. For instance, the second harmonic achieves a similarity score of only 0.926 at a one-minute resolution. This behaviour can be attributed to the nature of lower-order harmonics: at lower frequencies the network presents smaller impedance values, since inductive reactance ($X_L = 2\pi f L$) increases with frequency. The reduced impedance allows these harmonics to propagate more extensively throughout the network, making them susceptible to cumulative effects from multiple harmonic emission sources. Consequently, more diffuse distortion patterns emerge, complicating topology identification.
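The frequency dependence invoked above can be illustrated numerically. The sketch below assumes a purely inductive branch with a hypothetical inductance of 0.25 mH and a 50 Hz fundamental, and ignores resistance and skin effect, so it shows only the scaling $X_h = h \cdot 2\pi f_1 L$, not actual impedances of the studied networks. The harmonic orders listed are arbitrary examples.

```python
import math


def harmonic_reactance(inductance_h: float, order: int, f1_hz: float = 50.0) -> float:
    """Inductive reactance X_h = 2*pi*(h*f1)*L of a purely inductive branch, in ohms."""
    return 2 * math.pi * order * f1_hz * inductance_h


L_LINE = 0.25e-3  # hypothetical branch inductance in henries (illustrative only)

for h in (1, 2, 3, 5, 11, 25):
    print(f"h={h:2d}  X={harmonic_reactance(L_LINE, h):.3f} ohm")
```

Because the reactance grows linearly with harmonic order h, the 25th harmonic sees 12.5 times the reactance of the second harmonic, which is consistent with the more localised behaviour of higher-order harmonics.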
In contrast, higher-order harmonics (–) perform better in topology identification than lower-order harmonics. Almost all higher-order harmonics achieve a similarity score of 1 at a one-minute time resolution and maintain better accuracy at extended time resolutions. This enhanced performance stems from the higher impedance these harmonics encounter within the distribution network, which leads to more localised effects that better preserve topology-specific characteristics. Because higher-order harmonic profiles are strongly correlated mainly between neighbouring nodes, they are particularly valuable for identification, as they are less influenced by broader network conditions and load variations.
Overall, increasing the time resolution (i.e., lengthening the averaging interval) tends to reduce similarity scores across most metrics, although the extent of this reduction varies. This occurs because longer averaging intervals smooth out the variation in measurement profiles and reduce the number of available measurement points. A notable observation is the non-monotonic behaviour exhibited by several harmonic components, for example THD, , and , which show local fluctuations in similarity scores, i.e., the score may rise or fall as the time resolution increases. Such variations, observed within the 24 h testing period, may be attributed to the network structure or to inherent changes in harmonic content throughout the daily cycle; the limited testing period of one day may also contribute. The relationship between time resolution and similarity score is therefore neither linear nor straightforward, particularly when comparing closely spaced time resolutions such as 45 min and 60 min. The optimal harmonic order or time resolution for accurately identifying network topology likely depends on multiple factors, including the network structure and its complexity, load characteristics, and the length of the measurement period. Optimising the time resolution or selecting the appropriate harmonic order for a specific network would therefore require an understanding of that system’s load behaviour, structural configuration, and measurement characteristics.
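The smoothing effect of longer time resolutions can be reproduced with synthetic data. In the sketch below, two hypothetical nodes share a common daily profile plus independent local noise (all values invented for illustration, not measurements from this study). Block-averaging the one-minute samples over longer windows both shrinks the number of points (1440 one-minute samples become 48 half-hour means) and suppresses the local variation, pushing the correlation between the two nodes towards 1 and making them harder to tell apart.

```python
import math
import random
from typing import List, Sequence


def block_average(samples: Sequence[float], window: int) -> List[float]:
    """Average consecutive non-overlapping windows (e.g. 1-min data -> 30-min means)."""
    usable = len(samples) - len(samples) % window  # drop an incomplete final window
    return [sum(samples[i:i + window]) / window for i in range(0, usable, window)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


random.seed(0)
minutes = 24 * 60
daily = [math.sin(2 * math.pi * t / minutes) for t in range(minutes)]  # shared daily trend
node_a = [d + random.gauss(0, 0.5) for d in daily]  # hypothetical node A: trend + local noise
node_b = [d + random.gauss(0, 0.5) for d in daily]  # hypothetical node B: trend + local noise

for window in (1, 15, 30, 60):
    a, b = block_average(node_a, window), block_average(node_b, window)
    print(f"{window:2d}-min resolution: {len(a):4d} points, r = {pearson(a, b):.3f}")
```

In this toy setting the local noise dominates at one-minute resolution, whereas hourly means retain almost only the shared daily trend. This is one simple mechanism by which longer averaging can mask nodal distinctions; it does not reproduce the non-monotonic fluctuations observed in Figure 9.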
5.3. Performance Metrics: Intra-Network and Inter-Network Errors
Other important metrics to consider are the intra-network and inter-network error indices, shown in Figure 10 and Figure 11. As described in Section 4.3, intra-network errors are edges that incorrectly connect ICPs within the same phase and LV network, whilst inter-network errors are edges that incorrectly link ICPs across different phases or networks. The similarity score was calculated from the total number of incorrect edges, with both types assigned equal weight. From an electrical perspective, however, intra-network errors may be more tolerable than inter-network errors. An inter-network error connects different phases or networks, which can lead to errors in power-flow analysis. In contrast, an intra-network error does not cause power-flow calculation errors, although it can reduce the accuracy of the results; this is why it is more acceptable.
For lower time resolutions, Figure 10 reveals a distinctive pattern: the error values are higher for lower harmonic orders and decrease for higher-order harmonics. This trend suggests that lower harmonic orders generate more distributed harmonic distortion throughout the grid, resulting in stronger correlations between electrically distant ICP locations.
In contrast, Figure 11 presents a different behaviour: its values remain relatively constant across all harmonic measurements, including THD, and are consistently lower than those shown in Figure 10.
As the time resolution increases, both the intra-network and inter-network error metrics exhibit an upward trend. However, the two metrics display distinct behaviours:
Intra-network errors show stronger sensitivity to time resolution, particularly for higher-order harmonics. This occurs because neighbouring nodes inherently share strong correlations and similar harmonic profiles at high frequencies. Prolonged averaging masks subtle nodal distinctions, amplifying misidentification within localised regions.
Inter-network errors increase moderately with extended time resolution but are relatively resilient at higher harmonics. Since these harmonics decay rapidly with distance, remote nodes retain distinct profiles even under averaging. This inherent dissimilarity limits how far averaging can strengthen correlations between distant ICPs.
Lower-order harmonic behaviour: both metrics follow comparable growth patterns for lower-order harmonics. The diffuse nature of low-frequency oscillations creates network-wide correlation uniformity, reducing the difference between local and remote misconnection trends.
For THD, the inter-network error remains largely unaffected by the time resolution, indicating a more localised influence. In contrast, this localised effect causes the intra-network error to increase as the time resolution extends, reinforcing misidentification among nearby ICPs.
6. Conclusions
This research proposes a novel methodology for identifying distribution network topology using voltage harmonic measurements. Unlike conventional approaches, it eliminates the need for energy measurements, historical data, or geographical information. The algorithm is based on a three-stage process consisting of harmonic voltage correlation matrices, a modified Kruskal’s minimum spanning tree, and a similarity assessment. Key innovations include the first use of THD and individual harmonics (–) as topology identifiers, demonstrating superior accuracy compared with conventional RMS voltage measurements.
THD measurements consistently achieved perfect similarity scores (1.0) for time resolutions below 30 min, significantly outperforming traditional voltage-based approaches. Higher-order harmonics (–) performed better than lower-order harmonics because of frequency-dependent network impedance. Higher frequencies encounter increased impedance (inductive reactance $X_L = 2\pi f L$ grows with frequency), creating more localised effects that better preserve the unique characteristics of each network topology. In contrast, lower-order harmonics exhibit distributed propagation patterns throughout the network, where cumulative effects from multiple emission sources obscure the distinct signatures needed for accurate topology identification.
The error analysis revealed different behaviours for the two error types. Intra-network errors () show stronger sensitivity to time resolution, particularly for higher-order harmonics, as neighbouring nodes naturally share strong correlations and prolonged averaging masks subtle differences between adjacent nodes, leading to increased misidentification within localised regions. Conversely, inter-network errors () increase only moderately with extended time resolution and display relative resilience at higher harmonics, since these frequencies decay rapidly with distance, allowing remote nodes to maintain distinct harmonic profiles even when measurements are averaged over longer periods.
These results highlight the significant potential of leveraging smart-meter harmonic data to enhance distribution network monitoring and operation. The methodology provides utilities with improved network visibility by accurately identifying network topology, which can subsequently support critical operational tasks such as phase balancing, fault localisation, and optimal DER integration. Importantly, this enhanced capability is achieved without requiring substantial additional hardware investments, making it a cost-effective solution for modern distribution system management.