Article

Improved DBSCAN-Based Electricity Theft Detection Using Spatiotemporal Fusion Features

1
Cincinnati Joint Co-Op Institute, Chongqing University, Chongqing 400044, China
2
State Grid Chongqing Electric Power Company Shinan Power Supply Branch, Chongqing 400044, China
3
State Grid Zhejiang Provincial Electric Power Company Anji County Power Supply Branch, Huzhou 313300, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(22), 12028; https://doi.org/10.3390/app152212028 (registering DOI)
Submission received: 11 October 2025 / Revised: 5 November 2025 / Accepted: 10 November 2025 / Published: 12 November 2025

Abstract

Electricity theft is a major source of non-technical losses in distribution networks, threatening both economic revenues and power supply reliability. This study addresses the identification of nodes exhibiting anomalous load behavior (anomalous nodes) in 10 kV distribution feeders. Based on the IEEE-33 bus benchmark system, the disturbance patterns induced by abnormal consumption are analyzed. The results show that voltage and current fluctuations intensify with increasing electrical distance from the power source, while branch loss peaks localize at the affected terminals and propagate unidirectionally along the power flow path. Building on these findings, an improved density-based spatial clustering of applications with noise (DBSCAN) method is proposed, integrating five spatial network features and sixteen temporal electrical features extracted from voltage, current, and power series. Prior to clustering, the features are standardized and reduced via principal component analysis (PCA), retaining over 90% of the cumulative variance. Validation on a hybrid dataset demonstrates that the proposed method achieves 90.7% accuracy, 87.5% recall, and an F1-score of 0.895, outperforming traditional K-means and approaching supervised CNN models without requiring labeled data. These results confirm the method’s robustness and suitability for practical deployment in distribution networks.

1. Introduction

Energy losses are inevitable throughout the process of electricity generation, transmission, and consumption. These losses can be categorized into technical losses (TL) and non-technical losses (NTL) [1,2]. TL primarily arise from inherent dissipation in transmission lines, substations, distribution systems, and metering processes required to maintain normal grid operation. Conversely, NTL primarily stems from abnormal electricity consumption behaviors, notably electricity theft at the distribution level, constituting controllable losses [3,4]. Global economic losses due to electricity theft are estimated to exceed USD 96 billion annually [5]. In China, NTL accounts for approximately 16% of total electricity generation, while in India, it exceeds 25% [6]. Even in developed countries, the corresponding losses are considerable, reaching about USD 10 billion in Canada and USD 6 billion in the United States [7]. Common theft techniques include line hooking, meter bypassing, and meter tampering [8]. Therefore, effective and accurate theft detection is essential for safeguarding utility revenues and ensuring distribution system reliability.
Traditional electricity theft detection relies on manual inspection and heuristic judgment, which are inefficient and lack precise localization. With the deployment of Advanced Metering Infrastructure (AMI) and the widespread installation of smart meters, utilities now have access to massive amounts of consumption data, providing the foundation for data-driven theft detection [9]. Existing methods can generally be divided into two categories: network-oriented and data-oriented approaches [10]. The former leverage the physical and topological characteristics of the distribution network, while the latter analyzes statistical and temporal patterns of customer load data.
Network-oriented methods typically utilize voltage, current, and other electrical measurements from distribution sensors, in combination with network topology, to identify abnormal nodes. Luan et al. proposed a “weight-dropped polling” state estimation method for theft detection [11]. Silva et al. developed a P-median model to optimize the placement of power-quality monitors [12]. Manito et al. introduced an Equivalent Operational Impedance (EOI) approach to distinguish TL and NTL using load-flow analysis [13]. Ferreira et al. employed node voltage and active/reactive power data to locate illegal connections [14]. Veeramani et al. utilized an IoT-based current comparison between distribution transformers and aggregated customer currents [15]. These approaches accurately represent the physical laws of distribution networks, directly reflect real operating conditions, and exhibit low dependence on customer-side data, making them interpretable and practically implementable in engineering applications.
Data-oriented methods focus on mining temporal consumption patterns such as daily energy usage and load fluctuations to identify abnormal behaviors. They can be further divided into supervised and unsupervised techniques [10]. Supervised approaches rely on labeled datasets to train models that distinguish between normal and theft behaviors. Ul Haq et al. developed a Deep Convolutional Neural Network (Deep-CNN) for theft detection [16]. Bai et al. proposed a dual-scale dual-branch CNN combined with a Gaussian-weighted Transformer [17]. Fernandes introduced a probabilistic Optimum-Path Forest (OPF) classifier [18]. Rajesh proposed a hybrid method combining Dynamic Time Warping (DTW) and k-Nearest Neighbor (k-NN) [19]. Punmiya et al. improved eXtreme Gradient Boosting (XGBoost) for theft detection under Time-of-Use (ToU) pricing [20]. Xia et al. adopted a Convolutional Long Short-Term Memory (ConvLSTM) model to capture both global and local temporal dependencies [21]. Supervised learning methods generally achieve higher accuracy and exhibit strong pattern recognition capabilities when sufficient labeled data are available.
Unsupervised approaches, on the other hand, detect anomalies by exploring intrinsic data structures without requiring labeled samples. Wu applied K-means clustering to analyze consumption patterns [22]. Peng et al. combined K-means and Local Outlier Factor (LOF) for anomaly detection [23]. Bondok et al. integrated K-means, autoencoders, and one-class SVM [24]. Tian et al. employed DBSCAN to detect anomalous energy use [25]; and Zheng et al. used density-Clust with local density and distance metrics to characterize abnormal patterns [26]. These unsupervised methods effectively eliminate the dependence on labeled datasets, discover hidden patterns autonomously, and offer strong generalization and scalability in real-world applications.
Although supervised methods generally achieve higher detection accuracy, electricity theft data are highly confidential within utilities, and large-scale labeling is both difficult and privacy-sensitive. Even when suspicious users are detected, on-site verification is still required. Therefore, unsupervised approaches that can detect anomalies from unlabeled data offer greater practical value. Additionally, most existing studies rely solely on consumption data without considering spatial information, such as the user’s position in the distribution feeder. This limitation weakens the model’s ability to represent electrical correlations across the network. Incorporating network topology into analysis can improve detection accuracy under the same amount of electrical measurement data.
To address these challenges, this study investigates electricity theft detection in 10 kV distribution feeders. The disturbance patterns of node voltage, branch current, and power loss under theft scenarios are analyzed, and an improved DBSCAN-based electricity theft detection method with spatiotemporal feature fusion is proposed. Spatially, five topological features are constructed based on the node adjacency matrix; temporally, sixteen load time-series features are extracted. The fused spatiotemporal features are then reduced via Principal Component Analysis (PCA), and DBSCAN clustering is applied to achieve unsupervised identification of nodes exhibiting anomalous load behavior (anomalous nodes).
The main contributions of this paper are as follows:
  • Analyzes the impact of anomalous load behaviors on node voltage, branch current, and power loss in distribution feeders, providing physical insights and engineering guidance for theft detection in practical systems.
  • Enables low-cost implementation relying solely on data collected from existing smart meters, without the need for additional monitoring devices.
  • Proposes a spatiotemporal feature fusion approach that jointly captures electrical correlations in feeder topology and temporal load dynamics, enhancing the representational capacity of input features.
  • Develops an unsupervised detection framework integrating PCA-based dimensionality reduction and DBSCAN clustering, effectively removing dependence on labeled or historical data.
The remainder of this paper is organized as follows. Section 2 introduces the benchmark distribution system and analyzes the impact of theft behaviors on electrical parameters. Section 3 describes the proposed improved DBSCAN-based detection method incorporating spatiotemporal fusion features. Section 4 provides case studies and performance evaluations. Section 5 discusses the practical application, significance, limitations, and potential future improvements of the proposed method. Finally, Section 6 concludes the paper.

2. Electrical Parameter Analysis of Single-Node and Multi-Node Anomalies in Distribution Networks

2.1. Benchmark Model Construction

The IEEE 33-bus distribution network is adopted as the benchmark model in this study due to its representative 10 kV radial structure and moderate network scale, which effectively balances modeling realism and computational efficiency for simulation-based theft analysis [27]. As shown in Figure 1, the system consists of 33 nodes and 32 feeder branches in a typical radial configuration.
Bus 1 serves as the slack (source) node. A single main feeder (trunk line) extends from Bus 1 to Bus 18, from which three lateral branches supply downstream consumer areas (Buses 19–33). The network parameters, including line impedances and load power vectors, are derived from the IEEE PES Test Feeder standard [28]. The detailed configuration parameters are listed in Table 1.
To address the scarcity of labeled datasets for electricity theft identification, this study constructs power flow analysis models for theft scenarios by proportionally reducing the injected power at selected nodes of the IEEE 33-bus system using open-source load data. Both single-node and multi-node theft cases are simulated under theft ratios of 0.2, 0.4, 0.6, and 0.8. The Newton–Raphson algorithm is used to solve the power flow equations.
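The scenario construction described above can be sketched as follows. This is a minimal illustration, assuming that "proportionally reducing the injected power" means scaling both active and reactive power at the selected nodes by (1 − theft ratio). The `Load` type, the `apply_theft` helper, and the three-node excerpt are hypothetical and do not reproduce Table 1; the subsequent Newton–Raphson power flow is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Load:
    p_kw: float    # active power injection
    q_kvar: float  # reactive power injection


def apply_theft(loads, theft_nodes, ratio):
    """Return a copy of `loads` with the selected nodes scaled by (1 - ratio)."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    scaled = dict(loads)
    for node in theft_nodes:
        load = loads[node]
        scaled[node] = Load(load.p_kw * (1.0 - ratio), load.q_kvar * (1.0 - ratio))
    return scaled


# Hypothetical excerpt of a feeder; the values are illustrative only.
base = {4: Load(120.0, 80.0), 6: Load(60.0, 20.0), 14: Load(120.0, 80.0)}

# One single-node scenario (node 6) per theft ratio used in the study.
scenarios = {r: apply_theft(base, [6], r) for r in (0.2, 0.4, 0.6, 0.8)}
```

Each entry of `scenarios` would then be passed to a power flow solver and compared against the unmodified `base` case.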
Based on the simulation results, the impacts of electricity theft on network electrical parameters are analyzed. Typical patterns include voltage drop at anomalous nodes, current reduction in adjacent branches, and variations in power factor. These results provide essential support for the subsequent identification of anomalous nodes.

2.2. Parameters Under Normal Operating Conditions

Figure 2 depicts the voltage distribution characteristics of the IEEE-33 distribution network under normal operating conditions, solved via the Newton-Raphson power flow algorithm. As the electrical distance from the 11 kV source bus increases, the system exhibits a characteristic voltage gradient decay, with node voltage magnitudes decreasing in a stepwise manner along the feeder direction.
To facilitate comparative analysis of electrical parameter variations in the IEEE-33 distribution network model under theft conditions, the branches are systematically numbered (see Table 2 for numbering conventions).
The key electrical parameters under normal conditions, including branch voltages, currents, and power losses, are presented in Table 3, serving as a reference baseline for subsequent comparative studies of anomalous conditions.

2.3. Single-Node Anomaly Analysis

Based on the IEEE 33-bus distribution network model, a single-node anomaly scenario is constructed. The scenario selects the central feeder nodes (6 and 4) and secondary branch nodes (14 and 24) to investigate the variations in node voltages, branch currents, and branch power losses under theft conditions. The calculation formula for the rate of change is given in Equation (1), and the variations in complex power injections at the anomalous nodes are listed in Table 4.
$$\Delta x_k(\%) = \frac{x_k^{\mathrm{post}} - x_k^{\mathrm{pre}}}{x_k^{\mathrm{pre}}} \times 100\%, \tag{1}$$
In Equation (1), $x_k^{\mathrm{pre}}$ and $x_k^{\mathrm{post}}$ denote the branch or node data before and after electricity theft, respectively.
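As a small illustration of Equation (1), the rate of change can be computed as below. The function name `relative_change` and the example values are hypothetical and are not taken from Tables 3 to 5.

```python
def relative_change(pre, post):
    """Percentage change of a node or branch quantity between two cases."""
    if pre == 0:
        raise ZeroDivisionError("pre-theft value must be non-zero")
    return (post - pre) / pre * 100.0


# Hypothetical branch current falling from 100 A to 92 A.
delta = relative_change(100.0, 92.0)
```

A negative result indicates that the quantity decreased after the theft was introduced.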
(1) Node Voltage Variation under Single-Node Anomalies
Figure 3a–d show the relative voltage changes in nodes in the IEEE-33 distribution network under a single-node anomaly. Upstream nodes (1–3) exhibit increasing voltage disturbances with electrical distance from the source, peaking at the anomalous node. In the middle and downstream sections of the main feeder (nodes 4–18), voltage fluctuations remain relatively stable, whereas branch lines (19–22, 23–25) follow disturbance patterns similar to their adjacent main feeder nodes. The radial network structure amplifies voltage variations at remote nodes due to higher impedance and sudden changes in theft current, causing significant Ohmic voltage drops.
Comparing Figure 3a–c with Table 4, the voltage change trends remain consistent as the theft ratio increases from 0.2 to 0.8. For instance, node 4 shows voltage change rates of 2.26%, 4.56%, 6.64%, and 9.11% corresponding to theft ratios of 0.2, 0.4, 0.6, and 0.8, respectively, demonstrating approximate linear proportionality. Node 17, located at the feeder terminus, also shows increasing changes of 2.47%, 4.94%, 7.39%, and 9.86%, indicating that theft has a network-wide impact. These results confirm that, under stable operation of other nodes, an increase in theft proportion leads to proportional voltage changes across the system.
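The approximate linear proportionality can be checked with an ordinary least-squares line through the four node 4 values quoted above. The helper `fit_line` is a hypothetical name introduced for this sketch; only the theft ratios and the percentage values come from the text.

```python
ratios = [0.2, 0.4, 0.6, 0.8]
node4_change = [2.26, 4.56, 6.64, 9.11]  # voltage change rates (%) quoted in the text


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, plus R^2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


slope, intercept, r2 = fit_line(ratios, node4_change)
print(f"slope={slope:.2f} % per unit ratio, intercept={intercept:.3f}, R^2={r2:.4f}")
```

An intercept close to zero together with an R² close to one is what "approximately proportional" would look like for these four points.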
(2) Branch Current Variation under Single-Node Anomalies
Figure 4a–d present relative changes in branch currents under single-node anomalies. In Figure 4a, upstream branches of the anomalous node exhibit increasing current changes along the feeder in the reverse direction, peaking at the terminal branch, while other branches show negligible fluctuations. A similar pattern is observed in Figure 4b. For remote-node theft (Figure 4c), the current variation gradient increases along the main feeder branches (Line 1–Line 13), while subsequent branches (Line 14–19, etc.) remain unchanged. In Figure 4d, proximal branch theft induces pronounced disturbances solely in upstream segments of the theft branches (Line 22–Line 24), with maximum change rates exceeding 40%.
(3) Branch Power Loss Variation under Single-Node Anomalies
Based on the IEEE-33 distribution network benchmark model, the relative changes in branch power losses caused by a single-node anomaly at node 6 (near the source on the main feeder) and node 24 (end of the secondary branch) are listed in Table 5. The changes in branch power losses induced by theft follow trends similar to those of branch currents. In the main feeder theft scenario (node 6), the power losses of upstream branches along the energy transmission path (Line 1–Line 5) increase significantly compared to normal operation. In the secondary branch theft scenario (node 24), the branches closer to the source (Line 1, Line 2) and the connected branches (Line 22, Line 23) also exhibit notable increases in losses. The results indicate that in both scenarios, the magnitude of branch power loss gradually increases along the energy transmission path, reaching a maximum at the branch where theft occurs, while subsequent branches show no significant fluctuations.

2.4. Multi-Node Anomaly Analysis

This section establishes multi-node electricity theft scenarios by combining central trunk nodes (nodes 4 and 9) with branch nodes (nodes 20 and 24). The variations in injected complex power loads at anomalous nodes are summarized in Table 6.
(1) Node Voltage Variation under Multi-Node Anomalies
Figure 5a,b illustrate the spatial distribution of relative node voltage changes under multi-node anomalies in the IEEE-33 distribution network. In the trunk-feeder scenario (Figure 5a), voltage variation rates in the upstream region of anomalous nodes (nodes 1–9) increase with electrical distance from the source, peaking at nodes 4 and 9. Downstream branch nodes exhibit voltage changes comparable to their nearest anomalous node, while lateral branches (nodes 19–22, 23–25) follow the variations in their respective connection points on the main feeder. In the secondary branch scenario (Figure 5b), trunk feeder voltages remain stable (fluctuations <0.1%), whereas terminal branch nodes 20 and 24 experience significant voltage surges.
In these cases, each anomalous node induces abnormal currents in its upstream branches, which in turn generate ohmic voltage drops on the corresponding line impedances. Consequently, the upstream feeders of multiple anomalous nodes consistently exhibit a distance-dependent increase in voltage variation rate. Branch voltages are always constrained by the potentials of their connection points to the main feeder. Regardless of whether theft occurs on a single-node or multi-node system, voltage disturbances on branch feeders remain synchronized with their main feeder connection points—an inherent property determined by electrical connectivity. Both single-node anomalies (Figure 4d, node 25) and multi-node anomalies (Figure 5b, nodes 20 and 24) exhibit elevated voltage change rates at terminal nodes, indicating the relative instability of electrical node voltages at these locations.
Simulation experiments indicate that each anomalous node precipitates incremental voltage change rates in subsequent branch nodes. When the anomalous nodes are located in close proximity, the voltage fluctuations rise sharply. This reveals that the spatial distribution of anomalous nodes directly influences electrical parameter fluctuations in the distribution network.
(2) Branch Current Variation under Multi-Node Anomalies
Figure 6a,b depict relative branch current variations under multi-node anomalies. In the trunk-feeder theft scenario (Figure 6a), upstream branches of anomalous nodes (Line 1–2 and Line 5–7) exhibit gradually increasing currents, peaking at terminal branches connected to anomalous nodes (Line 3 and Line 8). Subsequent branches (Line 9–18) and lateral branches (Line 18–24) show negligible changes.
In the branch feeder theft scenario (Figure 6b), currents in branches directly connected to anomalous nodes (Line 18–19 and Line 22–23) also increase progressively, reaching their maximum at the anomalous nodes. Due to shorter paths and higher branch resistances, current variations in lateral feeders are more pronounced than in the trunk feeder.
Overall, in multi-node theft conditions, abnormal currents originating from each anomalous node propagate upstream toward the source, causing incremental increases in upstream branch currents and peaking at the terminal branches where theft occurs. This trend is consistent with single-node theft scenarios. Furthermore, because of their shorter paths and higher resistances, branch feeders exhibit more substantial current fluctuations than trunk feeders under identical theft levels.
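The upstream propagation pattern summarized above follows from the radial structure: only branches on the path between an anomalous node and the source carry the changed load current. The sketch below illustrates this on a hypothetical 8-node feeder. The `PARENT` map and both helper functions are assumptions for illustration and do not follow the branch numbering of Table 2. In a full power flow, the remaining branches still show small, non-zero changes.

```python
# Hypothetical radial feeder given as child -> parent; node 1 is the source.
PARENT = {2: 1, 3: 2, 4: 3, 5: 4, 6: 2, 7: 6, 8: 3}


def upstream_branches(node, parent=PARENT):
    """Branches (parent, child) on the path from the source down to `node`."""
    path = []
    while node in parent:
        path.append((parent[node], node))
        node = parent[node]
    return path[::-1]  # ordered from the source towards the node


def affected_branches(theft_nodes, parent=PARENT):
    """Union of the upstream paths of all anomalous nodes."""
    affected = set()
    for node in theft_nodes:
        affected.update(upstream_branches(node, parent))
    return affected


all_branches = {(p, c) for c, p in PARENT.items()}
hit = affected_branches([5, 7])   # two anomalous nodes on different laterals
untouched = all_branches - hit    # branches off both upstream paths
```

Branches shared by several upstream paths, such as the one leaving the source, accumulate the contribution of every anomalous node downstream of them.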
(3) Branch Power Loss Variation under Multi-Node Anomalies
Multi-node theft scenarios are simulated at trunk nodes 4 and 6 and branch nodes 20 and 24. Table 7 summarizes relative branch power loss variations. Using the source side as a reference, it is observed that branches adjacent to anomalous nodes exhibit significantly higher increases in power loss compared with other branches, and the magnitude of variation is strongly correlated with the topological location of the anomalous nodes. This finding suggests that branch power loss can serve as an effective localization indicator, where the analysis of loss gradients enables identification of anomalous nodes within the network.

3. Spatiotemporal Feature Fusion-Based Improved DBSCAN for Anomalous Nodes Detection

Power flow analysis results show that, under normal operating conditions and electricity theft scenarios, the distribution network exhibits significantly different patterns in the distribution of electrical parameters. Abnormal consumption at nodes introduces load disturbances that propagate through the network, with their magnitude strongly dependent on nodes’ topological positions. To enable unsupervised detection of anomalous nodes, it is essential to construct a recognition model that integrates both spatial and temporal features. From the spatial perspective, the connectivity among nodes must be exploited, while from the temporal perspective, the dynamic characteristics of voltage magnitudes, branch current phases, and complex power fluctuations must be captured.
This paper proposes an improved DBSCAN-based method for electricity theft detection using spatiotemporal fusion features, with operational data from distribution feeders as the foundation. The overall procedure is illustrated in Figure 7. The method comprises three steps: first, establishing a nodal spatial topology model based on grid connectivity relations and analyzing the time-series variation features of parameters such as voltage and current; second, integrating spatial and temporal features and applying PCA for dimensionality reduction; finally, employing DBSCAN density-based clustering to automatically identify anomalous nodes.

3.1. Extraction of Topological Features and Load Data Features

The 10 kV distribution feeder is characterized by a tree-like topology, where node positions are determined by their physical connectivity. Specifically, the low-voltage side of the 35 kV/11 kV or 110 kV/11 kV transformer is defined as the root node, while the high-voltage side of the 11 kV/0.4 kV transformer serves as the leaf nodes. The adjacency matrix $A \in \{0, 1\}^{N \times N}$ is employed to represent the connectivity among nodes, where $A_{ij} = 1$ indicates that node $i$ is directly connected to node $j$. Based on this relationship, five categories of spatial network features are extracted, with their calculation methods summarized in Table 8.
The listed spatial network features characterize structural properties of distribution network nodes from different dimensions. Node closeness centrality measures the extent to which a node is close to the network core; anomalous nodes are typically located at the periphery but can induce deviations in central node parameters. Node betweenness centrality reflects the bridging role of a node, where anomalies on critical paths can exert wider influence. Node hierarchical depth describes the vertical position within the tree structure. Neighbor connectivity density reveals local sparsity, where low-density regions are more likely to conceal anomalies. Electrical coupling strength captures electrical correlations with adjacent nodes; theft behaviors cause observable fluctuations in the parameters of neighboring nodes.
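Two of these features can be sketched directly from the adjacency matrix. The example below assumes the standard graph-theoretic definitions (hierarchical depth as hop count from the root, closeness as $(N-1)$ divided by the sum of shortest-path hop counts), which may differ from the exact formulas in Table 8. The 5-node matrix `A` and the function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, start):
    """Hop counts from `start` to every reachable node of an unweighted graph."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v, connected in enumerate(adj[u]):
            if connected and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def depth_and_closeness(adj, root=0):
    """Hierarchical depth and closeness centrality for a connected feeder."""
    n = len(adj)
    depth = bfs_distances(adj, root)
    closeness = {}
    for i in range(n):
        d = bfs_distances(adj, i)
        closeness[i] = (n - 1) / sum(d.values())
    return depth, closeness


# Hypothetical feeder: trunk 0-1-2-3 with a lateral 1-4; node 0 is the root.
A = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
]
depth, closeness = depth_and_closeness(A)
```

In this toy feeder the branching node has the highest closeness, while the end of the trunk has the greatest depth and the lowest closeness.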
Electrical parameters within the distribution transformer area display seasonal cyclic fluctuation characteristics over time. Based on annual time-series data of node voltages, branch currents, and complex power, load usage patterns at both the feeder and node level can be identified, while theft behaviors lead to significant abnormal deviations. For implementation, annual data are divided into monthly windows, from which temporal features of voltage, current, and power at each node are extracted, thereby constructing a load behavior analysis framework covering the annual cycle. With monthly data as the analysis unit, the extracted temporal electrical features are summarized in Table 9.
Among the electricity consumption features, Features 1–6 are derived from voltage data, Features 7–9 are extracted from current data, Features 10–12 are generated from power data, and Features 13–16 are obtained by combining the relationships among voltage, current, and power. These features capture regularities, periodicity, and parameter correlations, thereby simplifying computation while retaining critical information, and enabling a comprehensive analysis of the electricity consumption patterns of each node.
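A few of the temporal features named in this paper (monthly average current, monthly active and reactive power integrals, current jump count, power factor) can be sketched for a single monthly window as follows. The formulas are assumptions, since Table 9 is not reproduced here: the integrals use a rectangular rule, the power factor is taken as the ratio of active to apparent energy, and the sampling interval `dt_h` and jump threshold `jump_a` are illustrative values.

```python
import math
from statistics import mean


def monthly_features(current_a, p_kw, q_kvar, dt_h=1.0, jump_a=50.0):
    """Summarize one month of samples for one node (illustrative definitions)."""
    active_kwh = sum(p_kw) * dt_h          # rectangular-rule active power integral
    reactive_kvarh = sum(q_kvar) * dt_h    # rectangular-rule reactive power integral
    apparent = math.hypot(active_kwh, reactive_kvarh)
    jumps = sum(
        1 for a, b in zip(current_a, current_a[1:]) if abs(b - a) > jump_a
    )
    return {
        "mean_current_a": mean(current_a),
        "active_energy_kwh": active_kwh,
        "reactive_energy_kvarh": reactive_kvarh,
        "current_jumps": jumps,
        "power_factor": active_kwh / apparent if apparent else 0.0,
    }


# Hypothetical four-sample window; a real window would hold a month of readings.
feats = monthly_features(
    current_a=[10.0, 12.0, 80.0, 11.0],
    p_kw=[30.0, 30.0, 30.0, 30.0],
    q_kvar=[40.0, 40.0, 40.0, 40.0],
)
```

Applying the same function to each of the twelve monthly windows would yield the per-month feature rows that are later flattened and standardized.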

3.2. Feature Dimensionality Reduction

The feature set constructed based on the spatiotemporal feature extraction method consists of T months of multidimensional monitoring data per sample (with D feature parameters per month), forming a high-dimensional multivariate time-series feature set of size T × D. Direct application of unsupervised clustering algorithms to such high-dimensional data can be severely affected by the “curse of dimensionality”, which may distort Euclidean distance calculations between samples, leading to biased cluster partitioning and misclassification of noise points. To address this, Principal Component Analysis (PCA) is employed to perform global dimensionality reduction, extracting principal component vectors that account for over 90% of the cumulative variance. This approach compresses the feature space while retaining the major variance information of the original data [29].
Initially, the feature matrix of the i-th node, $X_i \in \mathbb{R}^{T \times D}$, is flattened as:
$$x_i = \mathrm{vec}(X_i) = \left[ x_{i,1}^{(1)}, x_{i,2}^{(1)}, \ldots, x_{i,D}^{(T)} \right] \in \mathbb{R}^{TD}, \tag{2}$$
where $x_{i,1}$ to $x_{i,5}$ denote spatial features, and $x_{i,6}$ to $x_{i,21}$ denote temporal features.
To avoid the influence of differing units across dimensions, Z-score standardization is applied to each feature:
$$z_{i,k} = \frac{x_{i,k} - \mu_k}{\sigma_k}, \quad k = 1, 2, \ldots, 21, \tag{3}$$
where $\mu_k$ and $\sigma_k$ are the mean and standard deviation of the k-th feature across all nodes in the distribution network.
The covariance matrix of the standardized data matrix $Z \in \mathbb{R}^{N \times (TD)}$ is then computed as:
$$C = \frac{1}{N - 1} Z^{\top} Z \tag{4}$$
The covariance matrix C is decomposed by eigenvalue decomposition:
$$C = V \Lambda V^{\top} \tag{5}$$
where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_{TD})$ is the diagonal matrix of eigenvalues, and $V = [v_1, v_2, \ldots, v_{TD}]$ is the matrix of eigenvectors.
The variance contribution rate of each principal component is calculated as:
$$\mathrm{CR}_j = \lambda_j \Big/ \sum_{i=1}^{TD} \lambda_i \tag{6}$$
The first k principal components are selected such that the cumulative variance contribution rate meets or exceeds the threshold, and the data is projected onto these components for dimensionality reduction:
$$F_{\mathrm{PCA}} = Z V[:, 1{:}k] = [F_1, F_2, \ldots, F_k] \in \mathbb{R}^{N \times k} \tag{7}$$
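The standardization, covariance, eigendecomposition, and projection steps of this section can be sketched with NumPy as follows. This is an illustrative sketch, not the authors' implementation: the function name `pca_reduce`, the handling of constant features, and the synthetic 40-node, 21-feature matrix are assumptions.

```python
import numpy as np


def pca_reduce(X, threshold=0.90):
    """Z-score the columns of X, then keep the leading principal components
    whose cumulative variance contribution first reaches `threshold`."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=1)
    sigma[sigma == 0] = 1.0                  # constant features stay at zero
    Z = (X - mu) / sigma                     # Z-score standardization
    C = Z.T @ Z / (X.shape[0] - 1)           # covariance of standardized data
    eigvals, eigvecs = np.linalg.eigh(C)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder to descending
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]
    cum = np.cumsum(eigvals) / eigvals.sum() # cumulative contribution rate
    k = min(int(np.searchsorted(cum, threshold)) + 1, len(eigvals))
    return Z @ eigvecs[:, :k], cum[:k]       # projection onto first k components


# Synthetic stand-in for N = 40 nodes with 21 fused features driven by 2 factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(40, 2))
mixing = rng.normal(size=(2, 21))
X = latent @ mixing + 0.05 * rng.normal(size=(40, 21))
F, cum = pca_reduce(X)
```

`F` plays the role of $F_{\mathrm{PCA}}$ and `cum` holds the cumulative contribution rates of the retained components, the last of which is the first to reach the 90% threshold.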

3.3. Anomalous Node Identification Based on DBSCAN

Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a density-based unsupervised clustering algorithm. Its core principle is to partition high-density regions into clusters by defining a neighborhood radius $\epsilon$ and a minimum number of neighbors $MinPts$, while sparsely distributed outliers are identified as noise. Compared with traditional clustering methods, DBSCAN does not require a predefined number of clusters and is highly adaptable to arbitrarily shaped data distributions, making it particularly suitable for detecting anomalous nodes with irregular and unknown spatial distribution in electricity theft scenarios [30].
Based on the number of nodes in the distribution area and empirical experience, $MinPts$ is specified. The Euclidean distance from each node to its $MinPts$-th nearest neighbor is calculated using Equation (8) and denoted as the $k$-distance:
$$d(F_i, F_j) = \sqrt{\sum_{k=1}^{D} \left( f_{ik} - f_{jk} \right)^2}, \tag{8}$$
All nodes' $k$-distances are sorted in ascending order, and the distance at the point of maximal curvature is selected as $\epsilon$:
$$\epsilon = \arg\max_{x} \frac{d^2 k(x)}{dx^2}, \tag{9}$$
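The $k$-distance heuristic can be sketched as below. Two choices here are assumptions rather than details from the paper: the point itself is excluded when counting nearest neighbors, and the curvature is approximated by the discrete second difference of the sorted curve. The function names and the toy point set (a 3 × 3 grid plus two distant outliers) are hypothetical.

```python
import math


def k_distances(points, min_pts):
    """Sorted distances from each point to its min_pts-th nearest neighbor."""
    out = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        out.append(d[min_pts - 1])
    return sorted(out)


def knee_epsilon(kdist):
    """Return the k-distance at the largest discrete second difference."""
    if len(kdist) < 3:
        raise ValueError("need at least three k-distances")
    best = max(
        range(1, len(kdist) - 1),
        key=lambda i: kdist[i + 1] - 2 * kdist[i] + kdist[i - 1],
    )
    return kdist[best]


# Toy data: a dense 3 x 3 grid plus two far-away points.
pts = [(x, y) for x in range(3) for y in range(3)] + [(10, 10), (20, 0)]
eps = knee_epsilon(k_distances(pts, 3))
```

The selected value sits just before the sorted curve turns sharply upward, so points whose $k$-distance lies beyond the knee are candidates for noise.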
For each node $F_i$, its $\epsilon$-neighborhood is defined as the set of all samples $F_j$ with Euclidean distances not exceeding $\epsilon$:
$$N_\epsilon(F_i) = \left\{ F_j \in D \mid d(F_i, F_j) \le \epsilon \right\}, \tag{10}$$
If the number of samples within the neighborhood $N_\epsilon(F_i)$ of node $F_i$ is not less than $MinPts$, then $F_i$ is defined as a core point. If node $F_j \in N_\epsilon(F_i)$ but does not itself satisfy the core point condition, it is designated as a border point. If node $F_k \in N_\epsilon(F_i)$ and $F_i$ is a core point, then $F_k$ is said to be directly density-reachable from $F_i$. Furthermore, if there exists a chain of nodes $F_i \to F_j \to F_k \to \cdots \to F_z$, then the node $F_z$ is considered density-reachable from $F_i$.
Starting from an unvisited core node, all density-reachable nodes are recursively merged into a single cluster. When no new nodes can be added, the expansion of that cluster ends. After all clusters are formed, nodes not assigned to any cluster are identified as noise points, which are directly regarded as anomalous nodes.
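The cluster expansion procedure can be sketched in plain Python as follows. This is a compact textbook-style DBSCAN written for illustration, not the authors' implementation. It assumes that a point's neighborhood includes the point itself when compared against `min_pts`; the toy data and the `eps` value are hypothetical.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    n = len(points)
    # Each neighborhood includes the point itself (distance zero).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = NOISE            # may later be relabeled as a border point
            continue
        labels[i] = cluster              # i is an unvisited core point
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                frontier.extend(neighbors[j])  # only core points expand the cluster
        cluster += 1
    return labels


# Toy data: a dense 3 x 3 grid plus two far-away points.
pts = [(x, y) for x in range(3) for y in range(3)] + [(10, 10), (20, 0)]
labels = dbscan(pts, eps=1.5, min_pts=4)
anomalous = [p for p, lab in zip(pts, labels) if lab == NOISE]
```

Points left with the `NOISE` label after all clusters are formed correspond to the anomalous nodes in the procedure described above.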
To evaluate the clustering performance in identifying anomalous nodes, three mainstream metrics are employed: Silhouette Coefficient, Calinski–Harabasz Index, and Davies–Bouldin Index:
  • Silhouette Coefficient evaluates clustering quality by comparing the average intra-cluster distance with the nearest-cluster distance. Its value ranges from −1 to 1, with values closer to 1 indicating better cluster structure.
  • Calinski–Harabasz Index is based on the ratio of between-cluster variance to within-cluster variance, with higher values indicating better inter-cluster separation and intra-cluster compactness.
  • Davies–Bouldin Index calculates the mean ratio of intra-cluster compactness to inter-cluster separation; lower values indicate superior clustering performance.
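As an illustration of the first metric, the mean Silhouette Coefficient can be computed from its standard definition as sketched below. The function name, the convention of scoring singleton clusters as zero, and the four-point example are assumptions. How noise points are treated when scoring DBSCAN results is not specified here, so the sketch expects the caller to pass only clustered points.

```python
import math


def silhouette(points, labels):
    """Mean silhouette over all points; expects at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)           # convention for singleton clusters
            continue
        # Mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # Smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Two well-separated pairs, scored with a sensible and a poor labeling.
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette(pts, [0, 0, 1, 1])
bad = silhouette(pts, [0, 1, 0, 1])
```

The sensible labeling scores close to 1, while the labeling that splits each pair scores below zero, matching the interpretation of the value range given above.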
Table 10 compares K-means and DBSCAN clustering results across these metrics.
DBSCAN significantly outperforms K-means: the Silhouette Coefficient increases to 0.68, the Calinski–Harabasz Index reaches 286.4, and the Davies–Bouldin Index decreases to 0.62. These results confirm that DBSCAN is effective in identifying complex and sparsely distributed anomalous nodes. Therefore, DBSCAN is selected as the unsupervised method for detecting anomalous nodes in the distribution network.

4. Case Study of Anomalous Node Identification in a Distribution Feeder

4.1. Feeder Topology and Two-Anomalous-Node Scenario

A 10 kV busbar distribution network of a substation is selected as the application case for algorithm validation, with its topology illustrated in Figure 8. The network comprises 76 public transformers, with a total line length of 98.86 km and a rated capacity of 8085 kVA.
Based on one-year operational data of node voltages, nodal complex power, and branch currents, 5–10% of the nodes were randomly selected as anomalous nodes. Corresponding electrical parameters for these nodes were generated, and the resulting theft disturbance data were combined with historical normal data at a 10% anomaly sample ratio to construct a hybrid validation dataset. This dataset strikes a balance between realism and controllability, providing a reliable benchmark for evaluating the proposed anomalous node identification algorithm.
To demonstrate the effectiveness of the proposed method, a scenario with two nodes exhibiting anomalous load behavior is implemented. Two representative anomalous nodes, A and B, are marked in red in Figure 8, and all subsequent experiments and analyses are conducted under this scenario.

4.2. Node Feature Analysis

Feature extraction was performed using a monthly sliding window. Table 11 lists the spatiotemporal feature values and their Z-score normalization results for Nodes A and B in January under normal operating conditions.
From the spatial network features, the closeness centrality of node A (Z = −0.32) indicates that it is slightly closer to the network center than node B (Z = −0.49), although neither qualifies as a core node. The difference in betweenness centrality (node A Z = 0.49; node B Z = −0.55) indicates that node A functions as a transmission hub, while node B is at the downstream end.
From the temporal electrical features, node A exhibits typical industrial load behavior, with a monthly average current of 480 A (Z = 1.25), electricity consumption of 1380 MWh (Z = 1.12), and a power factor of 0.90, indicating stable operation. In contrast, node B has a monthly average current of 85.6 A (Z = −0.32), electricity consumption of 65.8 MWh (Z = −0.31), and experiences five current jumps in the month (Z = 0.84), demonstrating intermittent residential load patterns. These distinct differences in spatial and temporal features confirm the identifiability of nodes with different load characteristics.
After theft occurs (node A: undercurrent type; node B: phase-shifting type), the spatial network features remain unchanged, as theft actions do not alter network connectivity. However, temporal electrical features show notable deviations. Figure 9 presents the Z-score variations (ΔZ) before and after theft, where ΔZ is defined as the post-theft Z minus the pre-theft Z.
Node A exhibits systematic negative shifts in monthly average current (ΔZ = −0.48), monthly active power integral (ΔZ = −0.42), and voltage-current correlation coefficient (ΔZ = −0.73), revealing current attenuation patterns. Node B shows declines in power-current covariance (ΔZ = −0.43), monthly active power integral (ΔZ = −0.29), and power factor (ΔZ = −0.87), while monthly reactive power integral increases (ΔZ = +1.82), indicating phase disturbance effects. Both theft types cause significant deviation from normal feature patterns, confirming the sensitivity of spatiotemporal features to theft disturbances.
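The ΔZ comparison can be illustrated with a minimal sketch. It assumes that the pre-theft and post-theft values are standardized with the same population mean and standard deviation; the text does not state whether the statistics are recomputed after theft. The function names and all numeric values below are hypothetical.

```python
from statistics import mean, pstdev

def z_score(value, population):
    """Standardize one value against a population of node values."""
    mu = mean(population)
    sigma = pstdev(population)
    return (value - mu) / sigma

def delta_z(pre_value, post_value, population):
    """Post-theft Z minus pre-theft Z, using fixed population statistics."""
    return z_score(post_value, population) - z_score(pre_value, population)

# Hypothetical monthly average currents (A) for a handful of feeder nodes.
currents = [60.0, 85.0, 120.0, 200.0, 480.0]

# Hypothetical undercurrent theft: the metered current at one node drops by 20%.
shift = delta_z(pre_value=480.0, post_value=384.0, population=currents)
print(round(shift, 3))  # negative, since the metered current decreased
```

Under this assumption ΔZ reduces to the raw change divided by the population standard deviation, so a negative ΔZ directly reflects a drop in the metered quantity.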

4.3. Detection Results of the Unsupervised Clustering Method

To evaluate the effectiveness of the proposed improved DBSCAN method in detecting anomalous nodes, experiments were conducted under the same feature extraction settings, with K-means clustering used for comparison. Figure 10a,b present 3D visualizations of the detected anomalous nodes, with X, Y, and Z axes corresponding to the top three principal components obtained via PCA.
As shown in Figure 10a, K-means identifies six noise points. Verification against feeder logs and tampering records reveals that five of them are normal nodes and only one is an actual theft case, indicating a high false positive rate and reduced detection reliability. In contrast, Figure 10b shows that the DBSCAN algorithm adaptively identifies clusters of varying densities, automatically separating the nodes into two user groups and accurately detecting the two theft cases. When the feeder topology is taken into account, the first group corresponds to residential areas (e.g., small community transformers and villages) and the second to industrial and commercial users (e.g., schools, hospitals, steel plants). These results show that the proposed DBSCAN method achieves higher detection accuracy and robustness than traditional K-means clustering.
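To make the notion of noise points concrete, the following is a minimal sketch of plain DBSCAN applied to points in a three-dimensional principal-component space. It is the textbook algorithm, not the improved variant proposed in this paper, and the coordinates, `eps`, and `min_pts` are hypothetical values chosen only for illustration.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point, NOISE for outliers."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, keep expanding
    return labels

# Hypothetical PCA coordinates: two dense user groups and two isolated nodes.
points = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.1), (0.0, 0.1, 0.0), (0.1, 0.1, 0.1),
    (3.0, 3.0, 3.0), (3.1, 3.0, 3.0), (3.0, 3.1, 3.1), (3.1, 3.1, 3.0),
    (1.5, -2.0, 1.5), (-2.0, 3.0, 0.5),
]
labels = dbscan(points, eps=0.3, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1, -1]
```

The two isolated points have no neighbors within `eps`, so they are labeled as noise, which is the role the suspected theft nodes play in Figure 10b.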

4.4. Method Robustness and Quantitative Performance Evaluation

To further assess the robustness and effectiveness of the proposed approach, extended experiments were conducted on the 10 kV feeder case. Because theft is typically infrequent in real networks, anomaly ratios were set between 1% and 5%. Twenty combinations of dual-node theft scenarios were generated, forming 100 diverse validation samples that balance data volume and realistic variability.
Starting from the benchmark feature set, six configurations were evaluated: the full set and five ablations, each removing one feature category (Figure 11). The configurations were as follows: G0, all 21 spatiotemporal features; G1, excluding spatial features; G2, excluding voltage features; G3, excluding current features; G4, excluding power features; G5, excluding cross-correlation features.
Under low-anomaly-rate conditions (1–5%), the full feature group (G0) achieved the best performance, with an accuracy of 90.7%, precision of 88.2%, recall of 87.5%, and an F1-score of 0.895.
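For reference, the four reported metrics follow from the confusion-matrix counts in the usual way. The sketch below uses hypothetical counts for a 100-sample set; they are not the counts behind the figures reported above, which the text does not give.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts: 8 true anomalies among 100 samples, 7 of them detected,
# plus 1 normal sample flagged by mistake.
acc, prec, rec, f1 = classification_metrics(tp=7, fp=1, fn=1, tn=91)
print(acc, prec, rec, f1)
```

At low anomaly rates, accuracy is dominated by the many normal samples, which is why precision, recall, and F1 are reported alongside it.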
The ablation results quantify the contribution of each feature category. As shown in Figure 11, excluding spatial features (G1) or power features (G4) caused minor degradation, while removing current features (G3) or cross-correlation features (G5) led to significant drops in performance, with F1-scores of 0.755 and 0.710, respectively. These results indicate that cross-correlation, current, and spatial features contribute most to model robustness, whereas voltage features have limited marginal impact.
During PCA-based dimensionality reduction, component selection was determined by cumulative variance contribution. When the cumulative contribution exceeds 90%, the reduced representation can be considered to preserve the essential information of the original dataset. As shown in Figure 12, the first five principal components explain 53.21%, 26.73%, 5.28%, 3.32%, and 2.10% of the variance, respectively, with a cumulative contribution of 90.64%. Thus, five principal components are sufficient to capture the main variance structure while substantially reducing feature dimensionality. This ensures computational efficiency and supports reliable clustering-based identification of anomalous nodes.
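The component-selection rule can be checked with a short sketch that accumulates the variance contributions quoted above until the 90% threshold is met. The function name is hypothetical; only the five percentages come from the text.

```python
from itertools import accumulate

def components_for_threshold(variance_ratios, threshold):
    """Smallest number of leading components whose cumulative ratio meets the threshold."""
    for k, cumulative in enumerate(accumulate(variance_ratios), start=1):
        if cumulative >= threshold:
            return k, cumulative
    raise ValueError("threshold not reached by the supplied components")

# Variance contributions (%) of the first five principal components.
ratios = [53.21, 26.73, 5.28, 3.32, 2.10]
k, cumulative = components_for_threshold(ratios, threshold=90.0)
print(k, round(cumulative, 2))  # 5 90.64
```

Four components reach only 88.54%, so the fifth is the first to cross the 90% threshold.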

4.5. Comparison with Supervised and Unsupervised Baselines

To evaluate overall performance, the proposed method was compared with K-means clustering and a supervised CNN model under low-anomaly-rate (1–5%) conditions. All methods used the same 21 standardized spatiotemporal features, and the CNN input was reshaped to match network requirements. The results are presented in Table 12.
In 100 low-anomaly (1–5%) distribution feeder samples, the supervised CNN model achieved slightly higher accuracy than the proposed improved DBSCAN method. However, CNN relies heavily on large amounts of labeled data, which are difficult to obtain in practice due to privacy constraints and the scarcity of confirmed theft labels in real power systems, making it unsuitable for large-scale deployment. Both the proposed method and K-means are unsupervised and require no labeled data. Nevertheless, the proposed approach outperforms K-means in terms of accuracy and stability, demonstrating stronger adaptability to real-world distribution networks where labeled data are limited.

5. Discussion

5.1. Practical Application Significance

The proposed improved DBSCAN-based unsupervised detection method demonstrates significant practical value in power distribution systems. Unlike supervised approaches that rely on extensive manual labeling, this method requires no prior labels. It can directly utilize multi-dimensional data collected by AMI to identify nodes exhibiting anomalous load behavior, effectively addressing the “label scarcity” issue commonly encountered in real-world scenarios. Since the algorithm depends only on basic measurements such as voltage, current, and power, it offers low deployment cost and strong scalability, enabling rapid implementation at the feeder or substation level. Experimental results confirm that the method maintains high accuracy and robustness under low anomaly rates (1–5%), making it an effective tool to support anti-theft operations in distribution networks.

5.2. Limitations

Although the proposed method achieves promising performance in terms of detection accuracy and practicality, several limitations remain. In real AMI systems, measurement data may suffer from noise interference, sampling loss, or drifting baselines, which can blur cluster boundaries and reduce detection accuracy. Moreover, when processing millions of user records or long time-series datasets, the computational complexity of density-based clustering may lead to reduced efficiency in large-scale environments.

5.3. Future Research Directions

Future work will focus on more realistic and complex distribution network scenarios. A closed-loop framework will be developed to integrate detection results with on-site verification. This will enable self-adaptive parameter tuning and continuous model optimization. In addition, parallel and distributed implementations of the algorithm will be explored to improve computational efficiency and scalability for large-scale power data applications. Moreover, the development of secondary post-clustering filters, such as those based on branch loss gradients or topological location, will be investigated to further mitigate potential false positives.

6. Conclusions

This study addresses the challenges of label scarcity and complex feature interactions in distribution networks. An improved DBSCAN-based unsupervised detection method is proposed, integrating spatiotemporal features for identifying nodes exhibiting anomalous load behavior. The effectiveness of the proposed approach is verified through hybrid dataset experiments. The main conclusions are summarized as follows:
(1) Based on the IEEE 33-bus benchmark distribution network, the disturbance patterns of voltage, current, and power loss under theft conditions were analyzed. Results indicate that the farther a theft node is from the power source, the stronger the propagated voltage and current disturbances along the feeder. The electrical distance is positively correlated with fluctuation amplitude. Meanwhile, branch loss peaks are localized near theft terminals and increase unidirectionally along the power flow path without back-propagation to downstream lines.
(2) By exploring the correlation between theft disturbances and network topology, a comprehensive set of spatial, temporal, and cross-correlation features was constructed to represent node operating states. An improved DBSCAN clustering framework was developed based on these fused features.
(3) Validation on hybrid datasets demonstrated that the proposed method achieved an accuracy of 90.7%, a recall of 87.5%, and an F1-score of 0.895, outperforming the traditional K-means algorithm and approaching the performance of supervised CNN models, without the need for labeled data.
(4) The proposed method relies solely on basic electrical measurements such as voltage, current, and power, without requiring user labels or complex training. This ensures low deployment cost and strong scalability.
In summary, the improved DBSCAN-based method achieves high detection accuracy and robustness, offering a practical and scalable approach for electricity theft detection in distribution networks. Future work will focus on parallel optimization and field feedback integration to further enhance the real-time performance and engineering applicability of the method.

Author Contributions

Software, formal analysis, writing—original draft, writing—review and editing, J.C., Z.G. and W.B.; supervision, L.X.; methodology, validation, J.L.; funding acquisition, project administration, Y.Z.; data curation, resources, J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the General Project of the National Natural Science Foundation of China (Grant No. 52577149).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data will be made available on request.

Acknowledgments

The Huzhou Power Supply Bureau in Zhejiang Province, China, provided invaluable support in the data collection process. The authors extend their appreciation.

Conflicts of Interest

Author Wei Bai was employed by the company State Grid Chongqing Electric Power Company Shinan Power Supply Branch. Author Yanlong Zhao was employed by the company State Grid Zhejiang Provincial Electric Power Company Anji County Power Supply Branch. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Reddy Depuru, S.S.S.; Wang, L.; Devabhaktuni, V. Electricity theft: Overview, issues, prevention and a smart meter based approach to control theft. Energy Policy 2011, 39, 1007–1015.
  2. Jiang, R.; Lu, R.; Wang, Y.; Luo, J.; Shen, C.; Shen, X. Energy-theft detection issues for advanced metering infrastructure in smart grid. Tsinghua Sci. Technol. 2014, 19, 105–120.
  3. Massaferro, P.; Di Martino, J.M.; Fernández, A. Fraud detection in electric power distribution: An approach that maximizes the economic return. IEEE Trans. Power Syst. 2019, 35, 703–710.
  4. Esmael, A.A.; Da Silva, H.H.; Ji, T.; da Silva Torres, R. Non-technical loss detection in power grid using information retrieval approaches: A comparative study. IEEE Access 2021, 9, 40635–40648.
  5. Xia, X.; Xiao, Y.; Liang, W.; Cui, J. Detection methods in smart meters for electricity thefts: A survey. Proc. IEEE 2022, 110, 273–319.
  6. Muzumdar, A.; Modi, C.; Vyjayanthi, C. Designing a blockchain-enabled privacy-preserving energy theft detection system for smart grid neighborhood area network. Electr. Power Syst. Res. 2022, 207, 107884.
  7. Ahmad, T.; Chen, H.; Wang, J.; Guo, Y. Review of various modeling techniques for the detection of electricity theft in smart grid environment. Renew. Sustain. Energy Rev. 2018, 82, 2916–2933.
  8. Shahid, M.B.; Shahid, M.O.; Tariq, H.; Saleem, S. Design and development of an efficient power theft detection and prevention system through consumer load profiling. In Proceedings of the 2019 International Conference on Electrical, Communication, and Computer Engineering (ICECCE), Cox’s Bazar, Bangladesh, 7–9 February 2019; pp. 1–6.
  9. Wang, Y.; Chen, Q.; Hong, T.; Kang, C. Review of smart meter data analytics: Applications, methodologies, and challenges. IEEE Trans. Smart Grid 2018, 10, 3125–3148.
  10. Messinis, G.M.; Hatziargyriou, N.D. Review of non-technical loss detection methods. Electr. Power Syst. Res. 2018, 158, 250–266.
  11. Luan, W.; Wang, G.; Yu, Y.; Lin, J.; Zhang, W.; Liu, Q. Energy theft detection via integrated distribution state estimation based on AMI and SCADA measurements. In Proceedings of the 2015 5th International Conference on Electric Utility Deregulation and Restructuring and Power Technologies (DRPT), Changsha, China, 26–29 November 2015; pp. 751–756.
  12. Silva, L.G.d.O.; da Silva, A.A.; de Almeida-Filho, A.T. Allocation of power-quality monitors using the P-median to identify nontechnical losses. IEEE Trans. Power Deliv. 2016, 31, 2242–2249.
  13. Manito, A.R.; Bezerra, U.H.; Soares, T.M.; Vieira, J.P.; Nunes, M.V.; Tostes, M.E.; de Oliveira, R.C. Technical and non-technical losses calculation in distribution grids using a defined equivalent operational impedance. IET Gener. Transm. Distrib. 2019, 13, 1315–1323.
  14. Ferreira, T.S.D.; Trindade, F.C.; Vieira, J.C. Load flow-based method for nontechnical electrical loss detection and location in distribution systems using smart meters. IEEE Trans. Power Syst. 2020, 35, 3671–3681.
  15. Veeramani, P.; Aravindaguru, I.; Prathap, M.; Bhavesh, L.; Kamalesh, R.; Hassan, A.T. IOT Based Power Theft Detection For Transmission Lines. In Proceedings of the 2024 5th International Conference on Smart Electronics and Communication (ICOSEC), Tholurpatti, India, 18–20 September 2024; pp. 507–512.
  16. Haq, E.U.; Pei, C.; Zhang, R.; Jianjun, H.; Ahmad, F. Electricity-theft detection for smart grid security using smart meter data: A deep-CNN based approach. Energy Rep. 2023, 9, 634–643.
  17. Bai, Y.; Sun, H.; Zhang, L.; Wu, H. Hybrid CNN-Transformer Network for Electricity Theft Detection in Smart Grids. Sensors 2023, 23, 8405.
  18. Fernandes, S.E.; Pereira, D.R.; Ramos, C.C.; Souza, A.N.; Gastaldello, D.S.; Papa, J.P. A probabilistic optimum-path forest classifier for non-technical losses detection. IEEE Trans. Smart Grid 2018, 10, 3226–3235.
  19. Ahir, R.K.; Chakraborty, B. Pattern-based and context-aware electricity theft detection in smart grid. Sustain. Energy Grids Netw. 2022, 32, 100833.
  20. Punmiya, R.; Choe, S. ToU Pricing-Based Dynamic Electricity Theft Detection in Smart Grid Using Gradient Boosting Classifier. Appl. Sci. 2021, 11, 401.
  21. Xia, X.; Lin, J.; Jia, Q.; Wang, X.; Ma, C.; Cui, J.; Liang, W. ETD-ConvLSTM: A Deep Learning Approach for Electricity Theft Detection in Smart Grids. IEEE Trans. Inf. Forensics Secur. 2023, 18, 2553–2568.
  22. Wu, Q.; Zhang, M.; Liao, L. Analysis of electricity stealing based on user electricity characteristics of electricity information collection system. Energy Rep. 2022, 8, 488–494.
  23. Peng, Y.; Yang, Y.; Xu, Y.; Xue, Y.; Song, R.; Kang, J.; Zhao, H. Electricity theft detection in AMI based on clustering and local outlier factor. IEEE Access 2021, 9, 107250–107259.
  24. Bondok, A.; Abdelsalam, O.; Badr, M.; Mahmoud, M.; Alsabaan, M.; Alsaqhan, M.; Ibrahem, M.I. Accurate Power Consumption Predictor and One-Class Electricity Theft Detector for Smart Grid “Change-and-Transmit” Advanced Metering Infrastructure. Appl. Sci. 2024, 14, 9308.
  25. Tian, L.; Xiang, M. Abnormal power consumption analysis based on density-based spatial clustering of applications with noise in power systems. Autom. Electr. Power Syst. 2017, 41, 64–70.
  26. Zheng, K.; Wang, Y.; Chen, Q.; Li, Y. Electricity theft detecting based on density-clustering method. In Proceedings of the 2017 IEEE Innovative Smart Grid Technologies-Asia (ISGT-Asia), Auckland Central, New Zealand, 4–7 December 2017; pp. 1–6.
  27. Dolatabadi, S.H.; Ghorbanian, M.; Siano, P.; Hatziargyriou, N.D. An enhanced IEEE 33 bus benchmark test system for distribution system studies. IEEE Trans. Power Syst. 2020, 36, 2565–2572.
  28. Baran, M.E.; Wu, F.F. Network reconfiguration in distribution systems for loss reduction and load balancing. IEEE Trans. Power Deliv. 1989, 4, 1401–1407.
  29. Gewers, F.L.; Ferreira, G.R.; Arruda, H.F.D.; Silva, F.N.; Comin, C.H.; Amancio, D.R.; Costa, L.d.F. Principal component analysis: A natural approach to data exploration. ACM Comput. Surv. 2021, 54, 1–34.
  30. Hahsler, M.; Piekenbrock, M.; Doran, D. dbscan: Fast density-based clustering with R. J. Stat. Softw. 2019, 91, 1–30.
Figure 1. Radial Topological Structure of the IEEE 33-Node Distribution System.
Figure 2. Voltage Distribution of Nodes in the IEEE-33 Distribution System. All voltage values are in kV. The color gradient from yellow to green to blue indicates the gradual voltage decrease along the feeder.
Figure 3. Node Voltage Disturbance under a Single-Node Anomaly Scenario: (a) anomaly at node 4; (b) anomaly at node 6; (c) anomaly at node 14; (d) anomaly at node 24. The vertical red dashed line indicates the location of the anomalous node.
Figure 4. Branch Current Disturbance under a Single-Node Anomaly Scenario: (a) anomaly at node 4; (b) anomaly at node 6; (c) anomaly at node 14; (d) anomaly at node 24. The vertical red dashed line indicates the branch containing the anomalous node.
Figure 5. Node Voltage Disturbance under Multi-Node Anomaly Scenarios: (a) anomaly at nodes 4 and 9; (b) anomaly at nodes 20 and 24. The vertical red dashed line indicates the location of the anomalous node.
Figure 6. Branch Current Disturbance under Multi-Node Anomaly Scenarios: (a) anomaly at nodes 4 and 9; (b) anomaly at nodes 20 and 24. The vertical red dashed line indicates the branch containing the anomalous node.
Figure 7. Framework of Anomalous Node Detection in Distribution Networks.
Figure 8. Distribution Topology Diagram of a 10 kV Busbar in a Substation.
Figure 9. Standardized Scores of Node Features Before and After Electricity Theft.
Figure 10. Three-dimensional PCA Clustering Visualization of the Case Study Area: (a) K-means clustering; (b) DBSCAN clustering.
Figure 11. Robustness of the Proposed Method under Low Anomaly Rate Scenarios.
Figure 12. Variance contribution rates and cumulative contribution rates of PCA components.
Table 1. Benchmark Parameter Configuration of IEEE 33-Bus Distribution System.
Node i | Node j | Line Impedance (p.u.) | Node j Load (kW + j kVar) | Node i | Node j | Line Impedance (p.u.) | Node j Load (kW + j kVar)
1 | 2 | 0.0922 + j0.047 | 100 + j60 | 17 | 18 | 0.3720 + j0.5740 | 90 + j40
2 | 3 | 0.4930 + j0.2511 | 90 + j40 | 2 | 19 | 0.1640 + j0.1565 | 90 + j40
3 | 4 | 0.3660 + j0.1864 | 120 + j80 | 19 | 20 | 1.5042 + j1.3554 | 90 + j40
4 | 5 | 0.3811 + j0.1941 | 60 + j30 | 20 | 21 | 0.4095 + j0.4784 | 90 + j40
5 | 6 | 0.8190 + j0.7070 | 60 + j20 | 21 | 22 | 0.7089 + j0.9373 | 90 + j40
6 | 7 | 0.1872 + j0.6188 | 200 + j100 | 3 | 23 | 0.4512 + j0.3083 | 90 + j50
7 | 8 | 0.7114 + j0.2351 | 200 + j100 | 23 | 24 | 0.8980 + j0.7091 | 420 + j200
8 | 9 | 1.0300 + j0.7400 | 60 + j20 | 24 | 25 | 0.8960 + j0.7011 | 420 + j200
9 | 10 | 1.0440 + j0.7400 | 60 + j20 | 6 | 26 | 0.2030 + j0.1034 | 60 + j25
10 | 11 | 0.1966 + j0.0650 | 45 + j30 | 26 | 27 | 0.2842 + j0.1447 | 60 + j25
11 | 12 | 0.3744 + j0.1238 | 60 + j35 | 27 | 28 | 1.0590 + j0.9337 | 60 + j20
12 | 13 | 1.4680 + j1.1550 | 60 + j35 | 28 | 29 | 0.8042 + j0.7006 | 120 + j70
13 | 14 | 0.5416 + j0.7129 | 120 + j80 | 29 | 30 | 0.5075 + j0.2585 | 200 + j600
14 | 15 | 0.5910 + j0.5260 | 60 + j10 | 30 | 31 | 0.9744 + j0.9630 | 150 + j70
15 | 16 | 0.7463 + j0.5450 | 60 + j20 | 31 | 32 | 0.3105 + j0.3619 | 210 + j100
16 | 17 | 1.2890 + j1.7210 | 60 + j20 | 32 | 33 | 0.3410 + j0.5362 | 60 + j40
Table 2. Branch Numbering of the IEEE-33 Distribution Network.
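Line ID | From Node | To Node | Line ID | From Node | To Node | Line ID | From Node | To Node | Line ID | From Node | To Node
Line1 | 1 | 2 | Line9 | 9 | 10 | Line17 | 17 | 18 | Line25 | 6 | 26
Line2 | 2 | 3 | Line10 | 10 | 11 | Line18 | 2 | 19 | Line26 | 26 | 27
Line3 | 3 | 4 | Line11 | 11 | 12 | Line19 | 19 | 20 | Line27 | 27 | 28
Line4 | 4 | 5 | Line12 | 12 | 13 | Line20 | 20 | 21 | Line28 | 28 | 29
Line5 | 5 | 6 | Line13 | 13 | 14 | Line21 | 21 | 22 | Line29 | 29 | 30
Line6 | 6 | 7 | Line14 | 14 | 15 | Line22 | 3 | 23 | Line30 | 30 | 31
Line7 | 7 | 8 | Line15 | 15 | 16 | Line23 | 23 | 24 | Line31 | 31 | 32
Line8 | 8 | 9 | Line16 | 16 | 17 | Line24 | 24 | 25 | Line32 | 32 | 33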
Table 3. Electrical Parameters of Branches in the IEEE 33-Bus Distribution System.
Line ID | Voltage (V) | Current (A) | Power Loss (kW + j kVar) | Line ID | Voltage (V) | Current (A) | Power Loss (kW + j kVar)
Line1 | 36.72 | 210.34 | 7.07 + j3.12 | Line17 | 5.83 | 4.92 | 0.02 + j0.03
Line2 | 179.30 | 187.11 | 29.90 + j15.23 | Line18 | 7.10 | 18.09 | 0.10 + j0.09
Line3 | 95.76 | 134.61 | 11.49 + j5.85 | Line19 | 47.62 | 13.58 | 0.48 + j0.43
Line4 | 94.72 | 127.87 | 10.79 + j5.50 | Line20 | 9.88 | 9.06 | 0.06 + j0.07
Line5 | 233.79 | 124.75 | 22.08 + j19.06 | Line21 | 9.22 | 4.53 | 0.03 + j0.03
Line6 | 65.37 | 58.38 | 1.105 + j3.65 | Line22 | 45.88 | 48.48 | 1.84 + j1.25
Line7 | 62.01 | 47.61 | 2.80 + j0.92 | Line23 | 86.59 | 43.69 | 2.97 + j2.34
Line8 | 80.79 | 36.78 | 2.41 + j1.73 | Line24 | 43.12 | 21.88 | 0.74 + j0.58
Line9 | 74.73 | 33.71 | 2.06 + j1.46 | Line25 | 25.78 | 65.34 | 1.50 + j0.76
Line10 | 10.99 | 30.64 | 0.32 + j0.11 | Line26 | 34.51 | 62.48 | 1.92 + j9.79
Line11 | 19.13 | 28.00 | 0.51 + j0.17 | Line27 | 145.83 | 59.63 | 6.52 + j5.75
Line12 | 79.59 | 24.60 | 1.54 + j1.21 | Line28 | 105.25 | 56.97 | 4.52 + j3.94
Line13 | 32.85 | 21.18 | 0.42 + j0.55 | Line29 | 49.89 | 50.58 | 2.25 + j1.15
Line14 | 19.44 | 14.19 | 0.21 + j0.18 | Line30 | 55.40 | 23.35 | 0.92 + j0.91
Line15 | 17.94 | 11.21 | 0.16 + j0.12 | Line31 | 12.49 | 15.13 | 0.12 + j0.14
Line16 | 30.04 | 8.06 | 0.15 + j0.19 | Line32 | 3.95 | 3.59 | 0.008 + j0.012
Table 4. Proportional Changes in Power Load of Anomalous Node.
All loads are in kW + j kVar.
No. | Node | Normal Power Load | Power at r1 = 0.2 | Power at r2 = 0.4 | Power at r3 = 0.6 | Power at r4 = 0.8
1 | 4 | 120 + j80 | 96 + j64 | 72 + j48 | 48 + j32 | 24 + j16
2 | 6 | 60 + j20 | 48 + j16 | 36 + j12 | 24 + j8 | 12 + j4
3 | 14 | 120 + j80 | 96 + j64 | 72 + j48 | 48 + j32 | 24 + j16
4 | 24 | 420 + j200 | 336 + j160 | 252 + j120 | 168 + j80 | 84 + j40
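The rows of Table 4 are consistent with scaling both the active and reactive load by (1 − r), where r is the theft ratio. A minimal sketch of that arithmetic for node 4 follows; the function name is hypothetical, and the scaling rule is inferred from the tabulated values rather than stated as a formula in the text.

```python
def metered_load(normal_load, theft_ratio):
    """Scale active and reactive power by (1 - r), matching the Table 4 rows."""
    return normal_load * (1 - theft_ratio)

normal = complex(120, 80)  # node 4: 120 kW + j80 kVar
for r in (0.2, 0.4, 0.6, 0.8):
    s = metered_load(normal, r)
    print(f"r = {r}: {s.real:.0f} + j{s.imag:.0f}")
```

For r = 0.2 this gives 96 + j64, and for r = 0.8 it gives 24 + j16, matching the first row of the table.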
Table 5. Relative Variation in Branch Power Loss under Single-Node Anomaly (%).
Branch | Node 6, r1 = 0.2 | Node 6, r2 = 0.4 | Node 6, r3 = 0.6 | Node 6, r4 = 0.8 | Node 24, r1 = 0.2 | Node 24, r2 = 0.4 | Node 24, r3 = 0.6 | Node 24, r4 = 0.8
Line1 | 0.593 | 1.183 | 1.771 | 2.357 | 4.196 | 8.288 | 12.279 | 16.168
Line2 | 0.664 | 1.325 | 1.983 | 2.639 | 4.701 | 9.274 | 13.72 | 18.04
Line3 | 0.906 | 1.808 | 2.704 | 3.595 | 0.092 | 0.183 | 0.274 | 0.364
Line4 | 0.952 | 1.899 | 2.841 | 3.777 | 0.092 | 0.184 | 0.275 | 0.366
Line5 | 0.974 | 1.943 | 2.906 | 3.863 | 0.092 | 0.184 | 0.276 | 0.367
Line6 | 0.048 | 0.095 | 0.143 | 0.19 | 0.092 | 0.183 | 0.274 | 0.364
Line7 | 0.048 | 0.096 | 0.144 | 0.192 | 0.092 | 0.184 | 0.276 | 0.367
Line8 | 0.049 | 0.097 | 0.145 | 0.194 | 0.093 | 0.186 | 0.279 | 0.371
Line9 | 0.049 | 0.097 | 0.146 | 0.194 | 0.094 | 0.187 | 0.279 | 0.372
Line10 | 0.049 | 0.097 | 0.146 | 0.195 | 0.094 | 0.187 | 0.28 | 0.373
Line11 | 0.049 | 0.098 | 0.146 | 0.195 | 0.094 | 0.187 | 0.281 | 0.373
Line12 | 0.049 | 0.098 | 0.147 | 0.196 | 0.094 | 0.188 | 0.281 | 0.374
Line13 | 0.049 | 0.098 | 0.147 | 0.196 | 0.094 | 0.188 | 0.282 | 0.375
Line14 | 0.049 | 0.098 | 0.147 | 0.196 | 0.095 | 0.189 | 0.282 | 0.376
Line15 | 0.049 | 0.098 | 0.148 | 0.197 | 0.095 | 0.189 | 0.283 | 0.376
Line16 | 0.049 | 0.099 | 0.148 | 0.197 | 0.095 | 0.189 | 0.283 | 0.377
Line17 | 0.049 | 0.099 | 0.148 | 0.197 | 0.095 | 0.189 | 0.283 | 0.377
Line18 | 0.002 | 0.004 | 0.005 | 0.007 | 0.012 | 0.025 | 0.037 | 0.05
Line19 | 0.002 | 0.004 | 0.005 | 0.007 | 0.013 | 0.025 | 0.037 | 0.05
Line20 | 0.002 | 0.004 | 0.005 | 0.007 | 0.013 | 0.025 | 0.037 | 0.05
Line21 | 0.002 | 0.004 | 0.005 | 0.007 | 0.013 | 0.025 | 0.037 | 0.05
Line22 | 0.012 | 0.024 | 0.035 | 0.047 | 17.412 | 33.115 | 47.127 | 59.459
Line23 | 0.012 | 0.024 | 0.035 | 0.047 | 19.203 | 36.314 | 51.351 | 64.331
Line24 | 0.012 | 0.024 | 0.036 | 0.047 | 0.29 | 0.577 | 0.863 | 1.146
Line25 | 0.048 | 0.097 | 0.145 | 0.193 | 0.093 | 0.185 | 0.278 | 0.369
Line26 | 0.048 | 0.097 | 0.145 | 0.193 | 0.093 | 0.186 | 0.278 | 0.37
Line27 | 0.049 | 0.097 | 0.145 | 0.194 | 0.093 | 0.186 | 0.279 | 0.371
Line28 | 0.049 | 0.097 | 0.146 | 0.194 | 0.093 | 0.186 | 0.279 | 0.371
Line29 | 0.049 | 0.097 | 0.146 | 0.194 | 0.094 | 0.187 | 0.279 | 0.372
Line30 | 0.049 | 0.098 | 0.147 | 0.195 | 0.094 | 0.188 | 0.281 | 0.374
Line31 | 0.049 | 0.098 | 0.147 | 0.196 | 0.094 | 0.188 | 0.281 | 0.374
Line32 | 0.049 | 0.098 | 0.147 | 0.196 | 0.094 | 0.188 | 0.281 | 0.374
Table 6. Changes in Power Load of Anomalous Nodes.
All loads are in kW + j kVar.
No. | Node | Normal Power Load | Power at r1 = 0.2 | Power at r2 = 0.4 | Power at r3 = 0.6 | Power at r4 = 0.8
1 | 4 | 120 + j80 | 96 + j64 | 72 + j48 | 48 + j32 | 24 + j16
1 | 9 | 60 + j20 | 48 + j16 | 36 + j12 | 24 + j8 | 12 + j4
2 | 20 | 90 + j40 | 72 + j32 | 54 + j24 | 36 + j16 | 18 + j8
2 | 24 | 420 + j200 | 336 + j160 | 252 + j120 | 168 + j80 | 84 + j40
Table 7. Relative Variation in Branch Power Loss under Multi-Node Anomaly (%).
Branch | Nodes 4 and 9, r1 = 0.2 | Nodes 4 and 9, r2 = 0.4 | Nodes 4 and 9, r3 = 0.6 | Nodes 4 and 9, r4 = 0.8 | Nodes 20 and 24, r1 = 0.2 | Nodes 20 and 24, r2 = 0.4 | Nodes 20 and 24, r3 = 0.6 | Nodes 20 and 24, r4 = 0.8
Line1 | 1.921 | 3.821 | 5.702 | 7.562 | 5.034 | 9.923 | 14.669 | 19.273
Line2 | 2.156 | 4.286 | 6.391 | 8.471 | 4.704 | 9.28 | 13.728 | 18.049
Line3 | 2.964 | 5.88 | 8.748 | 11.569 | 0.095 | 0.189 | 0.283 | 0.376
Line4 | 1.030 | 2.053 | 3.068 | 4.077 | 0.095 | 0.19 | 0.284 | 0.378
Line5 | 1.053 | 2.098 | 3.136 | 4.166 | 0.095 | 0.19 | 0.285 | 0.379
Line6 | 2.211 | 4.394 | 6.548 | 8.675 | 0.095 | 0.189 | 0.282 | 0.376
Line7 | 2.687 | 5.333 | 7.939 | 10.504 | 0.095 | 0.19 | 0.285 | 0.379
Line8 | 3.441 | 6.817 | 10.127 | 13.372 | 0.096 | 0.192 | 0.288 | 0.383
Line9 | 0.141 | 0.281 | 0.421 | 0.56 | 0.097 | 0.193 | 0.288 | 0.384
Line10 | 0.141 | 0.282 | 0.422 | 0.561 | 0.097 | 0.193 | 0.289 | 0.384
Line11 | 0.141 | 0.282 | 0.422 | 0.562 | 0.097 | 0.193 | 0.290 | 0.385
Line12 | 0.142 | 0.283 | 0.424 | 0.564 | 0.097 | 0.194 | 0.290 | 0.386
Line13 | 0.142 | 0.283 | 0.424 | 0.565 | 0.097 | 0.194 | 0.291 | 0.387
Line14 | 0.142 | 0.284 | 0.425 | 0.566 | 0.098 | 0.195 | 0.291 | 0.388
Line15 | 0.142 | 0.284 | 0.426 | 0.567 | 0.098 | 0.195 | 0.292 | 0.388
Line16 | 0.143 | 0.285 | 0.426 | 0.568 | 0.098 | 0.195 | 0.292 | 0.389
Line17 | 0.143 | 0.285 | 0.427 | 0.568 | 0.098 | 0.195 | 0.292 | 0.389
Line18 | 0.006 | 0.011 | 0.017 | 0.023 | 9.804 | 19.096 | 27.878 | 36.149
Line19 | 0.006 | 0.011 | 0.017 | 0.023 | 12.94 | 24.976 | 36.11 | 46.343
Line20 | 0.006 | 0.011 | 0.017 | 0.023 | 0.069 | 0.137 | 0.206 | 0.274
Line21 | 0.006 | 0.011 | 0.017 | 0.023 | 0.069 | 0.137 | 0.206 | 0.274
Line22 | 0.038 | 0.076 | 0.113 | 0.151 | 17.414 | 33.119 | 47.131 | 59.464
Line23 | 0.038 | 0.076 | 0.113 | 0.151 | 19.205 | 36.317 | 51.355 | 64.335
Line24 | 0.038 | 0.076 | 0.114 | 0.152 | 0.292 | 0.583 | 0.871 | 1.157
Line25 | 0.096 | 0.191 | 0.287 | 0.382 | 0.096 | 0.191 | 0.286 | 0.381
Line26 | 0.096 | 0.192 | 0.287 | 0.383 | 0.096 | 0.192 | 0.287 | 0.382
Line27 | 0.096 | 0.192 | 0.288 | 0.384 | 0.096 | 0.192 | 0.288 | 0.383
Line28 | 0.096 | 0.193 | 0.288 | 0.384 | 0.096 | 0.192 | 0.288 | 0.383
Line29 | 0.097 | 0.193 | 0.289 | 0.385 | 0.097 | 0.193 | 0.288 | 0.384
Line30 | 0.097 | 0.194 | 0.290 | 0.385 | 0.097 | 0.193 | 0.288 | 0.384
Line31 | 0.097 | 0.194 | 0.291 | 0.385 | 0.097 | 0.194 | 0.290 | 0.386
Line32 | 0.097 | 0.194 | 0.291 | 0.385 | 0.097 | 0.194 | 0.290 | 0.386
Table 8. Spatial Network Characteristic Parameters and Their Calculation Methods.
No. | Topological Feature | Calculation Method
1 | Node Closeness Centrality | $C_{close}(i) = \dfrac{N-1}{\sum_{j \neq i} d(i,j)}$
2 | Node Betweenness Centrality | $C_{btw}(i) = \sum_{s \neq i \neq t} \dfrac{\delta_{st}(i)}{\delta_{st}}$
3 | Node Hierarchical Depth | $D(i) = l(i)$
4 | Neighbor Connectivity Density | $p(i) = \dfrac{k_i}{k_{\max}}$
5 | Electrical Coupling Strength | $E(i) = \dfrac{1}{Z_{avg}(i)}, \quad Z_{avg}(i) = \dfrac{1}{k_i} \sum_{j \in N(i)} Z_{ij}$
Here, $N$ denotes the total number of nodes; $d(i,j)$ is the shortest path length between nodes $i$ and $j$; $\delta_{st}$ is the total number of shortest paths between nodes $s$ and $t$; $\delta_{st}(i)$ is the number of shortest paths passing through node $i$; $l(i)$ represents the shortest path length from the root node to node $i$; $k_i$ and $k_{\max}$ denote the actual and maximum possible number of neighbors, respectively; $Z_{ij}$ is the impedance magnitude of the branch between node $i$ and its neighbor $j$; and $N(i)$ is the neighbor set of node $i$.
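Two of the spatial features in Table 8 can be sketched with a breadth-first search on the feeder graph. The sketch assumes that path lengths are hop counts on an unweighted graph; the text does not specify whether d(i, j) is measured in hops or weighted by impedance. The five-node feeder and the function names are hypothetical.

```python
from collections import deque

def bfs_distances(adjacency, source):
    """Hop-count shortest paths from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness_centrality(adjacency, node):
    """C_close(i) = (N - 1) / sum over j != i of d(i, j)."""
    dist = bfs_distances(adjacency, node)
    return (len(adjacency) - 1) / sum(d for n, d in dist.items() if n != node)

def hierarchical_depth(adjacency, root, node):
    """D(i) = l(i), the hop distance from the root (source) node."""
    return bfs_distances(adjacency, root)[node]

# Hypothetical 5-node radial feeder: 1 - 2 - 3 - 4, with a lateral 2 - 5.
feeder = {1: [2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]}
print(closeness_centrality(feeder, 2))   # 0.8
print(hierarchical_depth(feeder, 1, 4))  # 3
```

Node 2 is one hop from three nodes and two hops from the fourth, so the distance sum is 5 and the closeness is 4/5.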
Table 9. Summary of Extracted Temporal Electrical Features.
Table 9. Summary of Extracted Temporal Electrical Features.
No.Feature NameCalculation MethodNo.Feature NameCalculation Method
1Monthly Average Voltage V ¯ m = 1 T t = 1 T V t 9Current Jump Frequency Count I t + 1 I t I ¯ m > 0.3
2Voltage Fluctuation Index σ V = 1 N t = 1 T ( V t V ¯ m ) 2 10Monthly Active Power Integral E P = t = 1 N P t Δ t
3Voltage Skewness S V = 1 T t 1 T ( V t V ¯ m ) 3 σ V 3 11Monthly Reactive Power Integral E Q = t = 1 N Q t Δ t
4Weekly Fluctuation Rate σ V , w = 1 4 k = 1 4 σ V , k 12Mean Power Factor PF m = 1 N t = 1 N P t P t 2 + Q t 2
5Mean Daily Peak-to-Valley Voltage Δ V d a y = 1 D d = 1 D max ( V d ) min ( V d ) 13Power–Current Covariance Cov ( P , I ) = 1 N t = 1 N ( P t P ¯ m ) ( I t I ¯ m )
6Voltage Jump Frequency Count V t V ¯ m > 3 δ V 14Power Output per Unit Current η = P ¯ m I ¯ m
7Monthly Average Current I ¯ m = 1 T t = 1 T I t 15Voltage–Current Correlation Coefficient ρ V , I = Cov ( V t , I t ) σ V σ I
8Peak-to-Valley Current Difference Δ I max min = max ( I t ) min ( I t ) 16Power-Time Peak-Valley Synchronization Sync = Count arg max ( P d ) = arg max ( V d ) D
Here, $T$ is the number of sampling points in the month (the summation limit $N$ in the power-related formulas likewise counts sampling instants); $D$ is the number of days in the month; $\Delta t$ is the sampling interval; $V_t$ is the voltage at sampling instant $t$; $\bar{V}_m$ is the monthly average voltage; $V_d$ is the voltage sequence of day $d$; $\sigma_V$ is the voltage standard deviation; $\sigma_{V,k}$ is the standard deviation of the voltage in week $k$; $\delta_V$ denotes the voltage fluctuation index; $I_t$ is the current at sampling instant $t$; $\bar{I}_m$ is the monthly average current; $P_t$ and $Q_t$ are the active and reactive power at sampling instant $t$; $\bar{P}_m$ is the monthly average active power; $\sigma_I$ is the current standard deviation; and $\arg\max(P_d)$ and $\arg\max(V_d)$ represent the peak and valley instants of power and voltage, respectively.
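Several of the temporal features above can be computed directly from sampled series. The following is a minimal sketch, not the authors' implementation: the function names, the toy four-sample series, and the units (kV, A, kW, kvar, hours) are assumptions, the standard deviations use the population form, and the jump tests take absolute deviations. The weekly and daily features are omitted because they need the series to be split by week and day.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def pop_std(xs):
    """Population standard deviation (divides by the sample count)."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def temporal_features(v, cur, p, q, dt_hours):
    """v, cur, p, q are equal-length series; sigma_v and sigma_i must be nonzero."""
    v_mean, i_mean, p_mean = mean(v), mean(cur), mean(p)
    sigma_v, sigma_i = pop_std(v), pop_std(cur)
    return {
        "mean_voltage": v_mean,
        "voltage_std": sigma_v,
        "voltage_skewness": sum((x - v_mean) ** 3 for x in v) / (len(v) * sigma_v ** 3),
        # samples deviating from the mean by more than three standard deviations
        "voltage_jump_count": sum(1 for x in v if abs(x - v_mean) > 3 * sigma_v),
        # consecutive-sample changes exceeding 30% of the mean current
        "current_jump_count": sum(
            1 for a, b in zip(cur, cur[1:]) if abs(b - a) / i_mean > 0.3
        ),
        "active_energy": sum(p) * dt_hours,
        "reactive_energy": sum(q) * dt_hours,
        "mean_power_factor": mean([pt / math.hypot(pt, qt) for pt, qt in zip(p, q)]),
        "cov_power_current": covariance(p, cur),
        "power_per_current": p_mean / i_mean,
        "corr_voltage_current": covariance(v, cur) / (sigma_v * sigma_i),
    }

# Toy series (hypothetical values, 15-minute sampling interval).
voltage = [10.0, 10.2, 9.8, 10.0]          # kV
current = [100.0, 100.0, 140.0, 60.0]      # A
active = [300.0, 300.0, 420.0, 180.0]      # kW
reactive = [400.0, 400.0, 560.0, 240.0]    # kvar
temporal = temporal_features(voltage, current, active, reactive, dt_hours=0.25)
print(temporal)
```

With these toy values the current changes by more than 30% of its mean twice, so the current jump count is 2, and every sample has a power factor of 0.6.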
Table 10. Comparison of Clustering Performance.
| Evaluation Metric | K-Means Results | DBSCAN Results |
|---|---|---|
| Silhouette Coefficient | 0.32 | 0.68 |
| Calinski–Harabasz Index | 152.7 | 286.4 |
| Davies–Bouldin Index | 1.85 | 0.62 |
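For the silhouette coefficient and the Calinski–Harabasz index a higher value indicates better-separated clusters, whereas for the Davies–Bouldin index a lower value is better. As a hedged illustration of how two of these metrics behave, the sketch below computes the silhouette coefficient and the Davies–Bouldin index from their standard definitions for a good and a poor labeling of four toy points. The data, labelings, and function names are hypothetical and do not reproduce the values in Table 10.

```python
import math

def group(points, labels):
    """Collect points by cluster label."""
    clusters = {}
    for pt, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(pt)
    return clusters

def silhouette(points, labels):
    """Mean silhouette over all points; needs at least two clusters."""
    clusters = group(points, labels)
    scores = []
    for pt, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:                # singleton clusters score 0
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(pt, o) for o in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(pt, o) for o in other) / len(other)
            for key, other in clusters.items() if key != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def davies_bouldin(points, labels):
    """Mean over clusters of the worst (scatter_i + scatter_j) / centroid distance."""
    clusters = group(points, labels)
    cents = {k: tuple(sum(c) / len(pts) for c in zip(*pts))
             for k, pts in clusters.items()}
    scatter = {k: sum(math.dist(p, cents[k]) for p in pts) / len(pts)
               for k, pts in clusters.items()}
    keys = list(clusters)
    worst = [
        max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in keys if j != i)
        for i in keys
    ]
    return sum(worst) / len(worst)

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = [0, 0, 1, 1]    # groups nearby points together
poor = [0, 1, 0, 1]    # pairs up distant points

sil_good, sil_poor = silhouette(points, good), silhouette(points, poor)
db_good, db_poor = davies_bouldin(points, good), davies_bouldin(points, poor)
print(sil_good, sil_poor, db_good, db_poor)
```

The good labeling yields a silhouette close to 1 and a small Davies–Bouldin value, while the poor labeling yields a negative silhouette and a large Davies–Bouldin value, which is the same direction of difference as in the table.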
Table 11. Characteristic Values and Z-Score Standardized Values of Nodes under Normal Operating Conditions (January).
| No. | Feature | Node A Value | Node A Z-Score | Node B Value | Node B Z-Score |
|---|---|---|---|---|---|
| 1 | Node Closeness Centrality | 0.081 | −0.32 | 0.078 | −0.49 |
| 2 | Node Betweenness Centrality | 650 | 0.49 | 146 | −0.55 |
| 3 | Node Hierarchical Depth | 9 | −0.63 | 19 | 1.12 |
| 4 | Neighbor Connectivity Density | 0.027 | 0.037 | 0.027 | 0.037 |
| 5 | Electrical Coupling Strength | 4.80% | −0.45 | 3.60% | −1.11 |
| 6 | Monthly Average Voltage (kV) | 10.51 | 0.35 | 10.36 | 0.05 |
| 7 | Voltage Fluctuation Index | 0.15% | 0.55 | 0.12% | 0.42 |
| 8 | Voltage Skewness | 0.03 | 0.12 | 0.08 | 0.42 |
| 9 | Weekly Fluctuation Rate | 0.25% | −1.02 | 0.18% | −1.31 |
| 10 | Mean Daily Peak-to-Valley Voltage (kV) | 0.18 | −0.75 | 0.25 | 0.13 |
| 11 | Voltage Jump Frequency | 1/month | −1.22 | 3/month | 0.25 |
| 12 | Monthly Average Current (A) | 480.3 | 1.25 | 85.6 | −0.32 |
| 13 | Peak-to-Valley Current Difference (A) | 78.5 | 0.28 | 45.2 | −0.35 |
| 14 | Current Jump Frequency | 2/month | −0.54 | 5/month | 0.84 |
| 15 | Monthly Active Power Integral (MWh) | 1380 | 1.12 | 65.8 | −0.31 |
| 16 | Monthly Reactive Power Integral (MVarh) | 320 | 0.85 | 12.3 | −0.67 |
| 17 | Mean Power Factor | 0.90 | 0.41 | 0.85 | −0.23 |
| 18 | Power-Current Covariance | 0.65 | 0.95 | 0.42 | 0.25 |
| 19 | Power Output per Unit Current (kW/A) | 0.33 | 0.15 | 0.25 | −0.45 |
| 20 | Voltage-Current Correlation Coefficient | 0.87 | 0.91 | 0.82 | −0.54 |
| 21 | Power-Time Peak-Valley Synchronization | 0.92 | 0.65 | 0.75 | 0.26 |
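Z-score standardization maps each feature to zero mean and unit variance across nodes, $z = (x - \mu) / \sigma$, so that features with very different units become comparable. A minimal sketch follows, assuming the population standard deviation and a hypothetical three-node, two-feature matrix; the resulting scores are not comparable to the table, whose statistics are taken over the full set of nodes.

```python
from statistics import fmean, pstdev

def zscore_columns(rows):
    """Standardize each column of a node-by-feature matrix to zero mean, unit variance."""
    cols = list(zip(*rows))
    means = [fmean(c) for c in cols]
    stds = [pstdev(c) for c in cols]     # assumption: population standard deviation
    return [
        # a constant column carries no information; map it to 0 to avoid dividing by zero
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Hypothetical raw features per node: [hierarchical depth, monthly average current (A)]
raw = [
    [9.0, 480.3],
    [19.0, 85.6],
    [14.0, 120.0],
]
standardized = zscore_columns(raw)
for row in standardized:
    print([round(z, 3) for z in row])
```

After standardization every column has mean 0 and standard deviation 1, so a depth measured in hops and a current measured in amperes contribute on the same scale to the distance computations used in clustering.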
Table 12. Comparison of Detection Performance under Low Anomaly Rate (1–5%).
| Method | Accuracy (%) | Anomaly Variability (%) | Noise Sensitivity (%) | Label Requirement |
|---|---|---|---|---|
| Proposed Method | 90.72 | ±1.2 | ±2.15 | None |
| CNN (Supervised) | 92.31 | ±0.8 | ±1.53 | High |
| K-means | 84.84 | ±4.51 | ±6.32 | None |

Chen, J.; Guan, Z.; Bai, W.; Liu, J.; Zhao, Y.; Zhou, J.; Xiong, L. Improved DBSCAN-Based Electricity Theft Detection Using Spatiotemporal Fusion Features. Appl. Sci. 2025, 15, 12028. https://doi.org/10.3390/app152212028
