1. Introduction
Harmonic distortion, arising from nonlinear loads such as power electronic converters, variable frequency drives, and renewable energy inverters, presents significant challenges to power quality and system reliability in modern power systems [1,2,3,4,5,6]. These distortions, increasingly prevalent in smart grids and distributed generation, cause equipment overheating, reduced power factor, and heightened risks of grid instability, particularly in networks with high penetration of photovoltaic systems and electric vehicle chargers [7,8]. Effective harmonic source localization is critical to mitigate these adverse effects, ensuring the reliable integration of renewable energy sources and maintaining compliance with power quality standards.
Analytical methods for harmonic source localization, such as harmonic state estimation (HSE) and power flow analysis, rely on detailed system models to identify source locations using synchronized voltage and current measurements [9,10]. These approaches achieve high accuracy in well-characterized transmission networks but are computationally intensive and sensitive to model inaccuracies or dynamic topologies [11]. Recent advancements leverage multi-source information fusion and underdetermined measurement systems to enhance HSE performance, yet these methods often require extensive instrumentation, limiting their scalability in large or rapidly changing systems [12].
Signal processing techniques offer an alternative by capturing transient or sparse harmonic signals, making them well-suited for smart grids and microgrids [13,14]. Compressive sensing exploits signal sparsity to reduce measurement requirements, while wavelet-based methods excel in analyzing the time–frequency characteristics of harmonic distortions [15]. Advanced signal decomposition algorithms further improve localization accuracy, particularly in three-phase systems [16]. However, these methods often necessitate high-frequency sampling and complex computations, posing challenges for real-time applications and scalability in large-scale systems.
Machine learning-based methods have gained prominence due to their ability to extract patterns from measurement data with reduced reliance on detailed system models [2,17]. Deep learning techniques, such as convolutional neural networks, achieve high accuracy in analyzing harmonic current limits, while soft computing approaches, like fuzzy logic, address complexities in distribution networks [4,18]. Particle swarm optimization has also been applied to locate dominant harmonic sources with minimal metering [19]. Despite their potential, these methods often require large datasets and significant computational resources, and their interpretability is limited compared to ensemble methods like random forests.
Existing methods struggle to balance accuracy, computational efficiency, and robustness across diverse power system configurations, particularly in dynamic or large-scale networks. This paper proposes a harmonic source localization method that integrates voltage difference features with a random forest classifier to address these challenges. The method aims to deliver a scalable, topology-independent solution, validated on IEEE standard transmission networks, with objectives of achieving high accuracy, computational efficiency, and robustness to network variations.
The proposed method introduces several key innovations: (1) voltage difference features, incorporating magnitude and phase components, robustly capture harmonic propagation patterns across various network topologies; (2) a random forest classifier ensures high accuracy and computational efficiency, outperforming resource-intensive deep learning models; (3) optimized topology handling, such as merging parallel branches, enhances adaptability to complex systems; and (4) validation on small (9-bus), medium (39-bus), and large (118-bus) IEEE test systems demonstrates scalability. These contributions distinguish the method from existing approaches, providing a practical framework for harmonic mitigation in modern power systems.
The remainder of this paper is organized as follows: Section 2 details the proposed methodology, including feature extraction and classification; Section 3 describes the simulation setup; Section 4 analyzes results across IEEE test systems; and Section 5 concludes with future research directions.
3. Simulation Setup
To validate the effectiveness and scalability of the proposed harmonic source localization method, comprehensive experiments are conducted on IEEE standard transmission networks. The simulation workflow, illustrated in Figure 2, encompasses test system selection, data generation, feature extraction, model training, and performance evaluation. This section details each component, ensuring a robust framework for assessing the method’s performance across diverse network configurations [21].
3.1. Test Systems
The proposed method is evaluated on three IEEE standard test systems: the 9-bus (9 nodes, 8 branches), 39-bus (39 nodes, 46 branches), and 118-bus (118 nodes, 186 branches, reduced to 179 after merging parallel branches) systems. These systems represent small-, medium-, and large-scale transmission networks, respectively, with diverse bus types (PQ, PV, and slack) and line characteristics, making them widely used benchmarks for power system studies. Their selection ensures comprehensive validation across varying network complexities, from simple configurations to large-scale topologies with parallel branches.
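The parallel-branch merging mentioned for the 118-bus system can be illustrated with a minimal sketch. It assumes each branch is described only by its series admittance, so parallel branches between the same pair of buses combine by summing admittances; shunt charging and transformer taps are ignored here. The function name and the three-bus example are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def merge_parallel_branches(branches):
    """Combine branches that connect the same pair of buses.

    `branches` is a list of (from_bus, to_bus, series_admittance) tuples.
    Series admittances of parallel branches add, so every bus pair is
    reduced to a single equivalent branch.
    """
    merged = defaultdict(complex)
    for i, j, y in branches:
        key = (min(i, j), max(i, j))  # (i, j) and (j, i) are the same pair
        merged[key] += y
    # Sort on the bus pair only, since complex values cannot be ordered.
    return [(i, j, y) for (i, j), y in sorted(merged.items(), key=lambda kv: kv[0])]


# Hypothetical 3-bus example with a double circuit between buses 1 and 2.
example = [(1, 2, 1 - 4j), (2, 1, 1 - 4j), (2, 3, 0.5 - 2j)]
merged_example = merge_parallel_branches(example)
```

In this example the three input branches reduce to two, which mirrors the reduction from 186 to 179 branches reported for the 118-bus system.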
3.2. Data Generation and Feature Extraction
For each system with n nodes, a dataset of 100n samples is generated by simulating 5th harmonic current injections at each node 100 times. Trials with 50, 100, and 200 samples per node yield accuracies of 99.44%, 100.00%, and 99.44% for the 9-bus system; 97.95%, 98.97%, and 98.97% for the 39-bus system; and 97.03%, 98.56%, and 98.56% for the 118-bus system, confirming that 100 samples per node balance accuracy and dataset size. The injection amplitude is randomly sampled between 0.5 and 2.0 per unit [23], and the phase angle is uniformly distributed between 0 and 2π, reflecting realistic harmonic variations observed in power systems. To simulate real-world measurement noise, 5% Gaussian noise is added to 10% of the samples, enhancing robustness evaluation [22]. Harmonic power flow is calculated using the harmonic admittance matrix as described in Section 2.1. Voltage difference features, defined in (5) of Section 2.2, are extracted across all branches, yielding
features per sample for a system with
m branches. This process ensures a comprehensive representation of harmonic propagation patterns.
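The data generation steps above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the four-bus ring network, the shunt admittance used to keep the matrix invertible, the multiplicative form of the 5% noise, and the feature layout (one magnitude and one phase angle per branch) are all hypothetical, since Equation (5) and the Section 2.1 model are not reproduced here.

```python
import numpy as np


def build_admittance(n, branches, shunt=0.1 - 0.05j):
    """Nodal admittance matrix from (i, j, y) branches with 0-based buses.

    A small shunt at every bus (hypothetical load model) keeps Y invertible.
    """
    Y = np.zeros((n, n), dtype=complex)
    for i, j, y in branches:
        Y[i, i] += y
        Y[j, j] += y
        Y[i, j] -= y
        Y[j, i] -= y
    return Y + shunt * np.eye(n)


def generate_dataset(Y, branches, per_node=100, noise_frac=0.10,
                     noise_std=0.05, seed=0):
    """Simulate single-source harmonic injections and extract branch features."""
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    features, labels = [], []
    for node in range(n):
        for _ in range(per_node):
            amplitude = rng.uniform(0.5, 2.0)      # per unit
            phase = rng.uniform(0.0, 2.0 * np.pi)  # radians
            current = np.zeros(n, dtype=complex)
            current[node] = amplitude * np.exp(1j * phase)
            voltage = np.linalg.solve(Y, current)  # harmonic nodal voltages
            # Assumption: each sample is noisy with probability noise_frac,
            # and the noise scales the voltage phasors multiplicatively.
            if rng.random() < noise_frac:
                voltage = voltage * (1 + noise_std * rng.standard_normal(n))
            dv = np.array([voltage[i] - voltage[j] for i, j, _ in branches])
            features.append(np.concatenate([np.abs(dv), np.angle(dv)]))
            labels.append(node)
    return np.array(features), np.array(labels)


# Hypothetical 4-bus ring used only to exercise the functions.
ring = [(0, 1, 1 - 5j), (1, 2, 1 - 4j), (2, 3, 0.8 - 4j), (3, 0, 1.2 - 6j)]
Y_ring = build_admittance(4, ring)
X_demo, y_demo = generate_dataset(Y_ring, ring, per_node=10)
```

With `per_node=100` the same call produces 100 samples per node, matching the dataset size described above.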
3.3. Model Training and Testing
The random forest classifier is implemented using scikit-learn on a desktop (Intel Core i7 CPU with 16 GB RAM). The dataset is split into 70% training and 30% testing sets, with stratification to ensure balanced node representation [19]. Uniform sampling of 100 samples per node and stratification prevent class imbalance across all test systems, with class weighting available to address any potential imbalances. Hyperparameters are optimized via grid search to balance accuracy and computational complexity, with a minimum leaf size of 5. The grid covers tree counts of 50, 100, 150, and 200 and maximum depths of 10, 20, 30, and None (no depth limit). The results show that increasing the tree count and maximum depth significantly improves test accuracy while increasing computational time; to balance accuracy and efficiency, we select 100 trees with a maximum depth of 10. The training process is repeated five times with different random seeds to check that the results are stable across runs. Out-of-bag (OOB) accuracy is computed using samples excluded from bootstrap subsets, providing an unbiased estimate of generalization performance.
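The training procedure described above maps onto standard scikit-learn calls roughly as follows. This is a sketch, not the authors' script: the data come from `make_blobs` as a stand-in for the voltage difference features, the grid is shortened to keep the example fast, and the three-fold cross-validation inside the grid search is an assumption.

```python
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Stand-in data: 5 "nodes" with 100 samples each and 8 features per sample.
X, y = make_blobs(n_samples=500, centers=5, n_features=8, random_state=0)

# 70/30 split, stratified so every node keeps the same share in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Shortened grid; the text uses 50/100/150/200 trees and depths 10/20/30/None.
param_grid = {"n_estimators": [50, 100], "max_depth": [10, None]}
search = GridSearchCV(
    RandomForestClassifier(min_samples_leaf=5, random_state=0),
    param_grid,
    cv=3,  # assumption: the fold count is not stated in the text
)
search.fit(X_train, y_train)

# Configuration selected in the text: 100 trees, depth 10, leaf size 5.
final_model = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    min_samples_leaf=5,
    oob_score=True,  # evaluate each tree on the samples left out of its bootstrap
    random_state=0,
)
final_model.fit(X_train, y_train)
test_accuracy = final_model.score(X_test, y_test)
oob_accuracy = final_model.oob_score_
```

Passing `class_weight="balanced"` to `RandomForestClassifier` is one way to apply the class weighting mentioned above if the classes were not already uniform.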
3.4. Evaluation Metrics
Performance is evaluated using training accuracy, testing accuracy, OOB accuracy, precision, recall, and F1-score. Training accuracy measures the model’s fit to the training data, while testing accuracy, the primary indicator, assesses generalization to unseen samples [4]. OOB accuracy complements testing accuracy by estimating generalization without additional validation sets. Precision, recall, and F1-score provide per-class insights, particularly for nodes prone to misclassification. Metrics are averaged over five runs to account for randomness in the random forest, ensuring a robust evaluation across the test systems.
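A minimal sketch of how these metrics could be computed and averaged over five seeds is given below. The synthetic data, the macro averaging of precision, recall, and F1-score, and the choice to vary the train/test split together with the forest seed are assumptions; the text does not specify them.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split


def evaluate_once(X, y, seed):
    """Train one forest and return the six metrics named in the text."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=seed
    )
    model = RandomForestClassifier(
        n_estimators=100, max_depth=10, min_samples_leaf=5,
        oob_score=True, random_state=seed,
    ).fit(X_tr, y_tr)
    y_pred = model.predict(X_te)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_te, y_pred, average="macro", zero_division=0
    )
    return {
        "train_accuracy": accuracy_score(y_tr, model.predict(X_tr)),
        "test_accuracy": accuracy_score(y_te, y_pred),
        "oob_accuracy": model.oob_score_,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Stand-in data with 5 classes; replace with the real feature matrix.
X_syn, y_syn = make_blobs(n_samples=500, centers=5, n_features=8, random_state=1)
runs = [evaluate_once(X_syn, y_syn, seed) for seed in range(5)]
mean_metrics = {name: float(np.mean([r[name] for r in runs])) for name in runs[0]}
```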
4. Results and Discussion
This section presents a detailed analysis of the proposed harmonic source localization method’s performance across IEEE standard transmission networks, focusing on validation results, misclassification patterns, performance insights, and practical implications. The method leverages voltage difference features, as defined in (5), to achieve high accuracy and scalability, with results visualized through tables and figures to elucidate its effectiveness [21].
4.1. Validation on Multiple Test Systems
The proposed method is validated on the IEEE 9-bus (9 nodes, 8 branches), 39-bus (39 nodes, 46 branches), and 118-bus (118 nodes, 186 branches, reduced to 179 after merging parallel branches) systems. Table 1 compares RF with CNN and KNN; CNN and KNN exhibit lower accuracy in larger systems, underscoring the robustness of RF. Taking the 5th harmonic as an example, the 9-bus system achieves perfect classification (100% training and testing accuracy), attributed to its simple topology with minimal feature overlap. The 39-bus system records 100% training accuracy and 98.72% testing accuracy, while the 118-bus system yields 99.99% training accuracy, 98.98% testing accuracy, and 98.43% out-of-bag (OOB) accuracy [4]. The slight accuracy drop in larger systems reflects increased topological complexity, particularly in the 118-bus system with higher node connectivity.
The random forest model achieves out-of-bag (OOB) accuracies of 100.00% for the 9-bus system, 99.80% for the 39-bus system, and 99.85% for the 118-bus system, slightly higher than test accuracies (Table 1) due to the absence of additional noise in OOB data compared to the test set under 5% noise. Compared to deep learning methods requiring several minutes for training on similar power systems [2] and PSO methods taking seconds to minutes [19], our method achieves superior computational efficiency with training times of 2.52–1222.31 s and testing times of 1.29–17.00 s across the 9-bus, 39-bus, and 118-bus systems.
Feature importance analysis, based on out-of-bag predictor importance from rerun simulations, reveals that magnitude features dominate model decisions, contributing approximately 60.29%, 68.96%, and 72.81% in the 9-bus, 39-bus, and 118-bus systems, respectively, compared to 39.71%, 31.04%, and 27.19% for phase features.
Parallel branch merging in the 118-bus system, as described in Section 2.1, reduces feature redundancy while preserving electrical characteristics, enabling efficient testing (17 s for 3540 samples). Figure 3 provides the feature importance of the IEEE-9 system. Figure 4 illustrates training, testing, and OOB accuracies across the three systems, highlighting consistent performance. The 39-bus system’s topology, shown in Figure 5, displays classification results with green nodes indicating correct classifications and red nodes marking misclassifications, annotated with misclassification frequencies [18]. The high accuracy in smaller systems and sustained performance in larger ones underscore the method’s scalability.
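The split between magnitude and phase importance reported above can be reproduced in spirit by summing per-feature importances over each feature group. The sketch below uses scikit-learn's impurity-based `feature_importances_`, which is not the same quantity as the out-of-bag predictor importance named in the text; the synthetic data and the column layout (magnitude columns first, phase columns second) are also assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
m = 4            # hypothetical number of branches
n_samples = 300

# Synthetic stand-in: magnitude columns depend on the class, phase columns do not.
labels = rng.integers(0, 3, size=n_samples)
magnitude = labels[:, None] + 0.3 * rng.standard_normal((n_samples, m))
phase = rng.uniform(-np.pi, np.pi, size=(n_samples, m))
X_grouped = np.hstack([magnitude, phase])

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_grouped, labels)

importances = forest.feature_importances_     # sums to 1 across all features
magnitude_share = float(importances[:m].sum())
phase_share = float(importances[m:].sum())
```

For an importance measure closer to an out-of-bag permutation estimate, `sklearn.inspection.permutation_importance` evaluated on held-out data is a common substitute.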
4.2. Confusion Matrix Analysis
The confusion matrices for the IEEE 9-bus, 39-bus, and 118-bus systems (illustrated in Figure 6) exhibit strong diagonal dominance, indicating a high proportion of correct classifications and showing the per-node classification accuracy. Off-diagonal elements are sparse, representing misclassifications that are limited in number and concentrated among specific node pairs.
For the IEEE 9-bus system (9 nodes), the matrix shows only 1 misclassification, with node 3 predicted as node 6 in 1 instance. All other diagonal elements equal the per-node test samples (approximately 30), resulting in minimal off-diagonal presence.
For the IEEE 39-bus system (39 nodes), the matrix contains 10 misclassifications, distributed as follows: node 22 predicted as 23 (3 instances), node 30 as 2 (3 instances), node 28 as 29 (2 instances), and node 29 as 28 (2 instances). The diagonal elements dominate for the remaining nodes, with off-diagonal sparsity highlighting localized errors.
For the IEEE 118-bus system (118 nodes), the matrix includes 38 misclassifications, with key off-diagonal entries: node 114 predicted as 115 (11 instances), node 36 as 35 (8 instances), node 110 as 111 (5 instances), node 109 as 108 (4 instances), node 105 as 104 (3 instances), node 35 as 36 (3 instances), node 86 as 87 (2 instances), node 114 as 115 (2 instances), and several single-instance pairs (e.g., node 77 as 78 and node 56 as 55). The diagonal remains predominant, underscoring overall classification reliability despite the system’s complexity.
Precision, recall, and F1-score for the IEEE 9-bus, 39-bus, and 118-bus systems, derived from Figure 6, average approximately 99.59%, 98.90%, and 98.49% for precision; 99.47%, 98.87%, and 98.47% for recall; and 99.52%, 98.84%, and 98.40% for F1-score, respectively. Values are lower for nodes such as 114 and 115 in the 118-bus system due to higher misclassification rates (Table 2). Detailed per-node metrics are omitted due to space constraints and are left for future work.
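The quantities discussed in this subsection follow directly from a confusion matrix. The sketch below assumes rows are true nodes and columns are predicted nodes (the scikit-learn convention) and uses a hypothetical three-class matrix rather than the paper's data; the helper names are illustrative.

```python
import numpy as np


def top_confusions(cm, k=5):
    """Return the k largest off-diagonal entries as (true, predicted, count)."""
    off_diagonal = cm.copy()
    np.fill_diagonal(off_diagonal, 0)
    pairs = [
        (int(i), int(j), int(off_diagonal[i, j]))
        for i, j in zip(*np.nonzero(off_diagonal))
    ]
    return sorted(pairs, key=lambda p: -p[2])[:k]


def per_class_metrics(cm):
    """Per-class precision, recall, and F1 from a confusion matrix."""
    true_pos = np.diag(cm).astype(float)
    precision = true_pos / np.maximum(cm.sum(axis=0), 1)  # column sums: predicted
    recall = true_pos / np.maximum(cm.sum(axis=1), 1)     # row sums: actual
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return precision, recall, f1


# Hypothetical matrix: class 1 is confused with class 2 three times, and
# class 2 with class 1 twice, similar to the adjacent-node pattern described.
cm_demo = np.array([[30, 0, 0],
                    [0, 27, 3],
                    [0, 2, 28]])
confused_pairs = top_confusions(cm_demo)
precision_demo, recall_demo, f1_demo = per_class_metrics(cm_demo)
```

Averaging the three returned arrays gives macro-averaged scores comparable in form to the percentages quoted above.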
4.3. Misclassification Analysis
The misclassification patterns detailed in Table 2 reveal that errors primarily occur between adjacent nodes with similar harmonic propagation paths and identical node types (PQ or PV). In the 39-bus system, nodes 22 and 23 (both PQ, 6-node common path) exhibit 3 misclassifications, and nodes 30 and 2 (both PV, 2-node path) also show 3 misclassifications. In the 118-bus system, nodes 114 and 115 (PQ, 7-node path) result in 11 misclassifications, while nodes 110 and 111 (PV, 14-node path) yield 5. Conversely, nodes 34 (PV) and 37 (PQ), despite a 5-node common path, are correctly classified due to differing node types, as visualized in Figure 5. These patterns suggest that long topological shortest common paths amplify voltage difference feature similarity, challenging the random forest classifier’s ability to distinguish nodes with identical electrical behavior (PQ or PV). Additionally, nodes with higher degrees, indicating more connected branches, tend to show higher misclassification rates due to complex signal interactions, while nodes closer to the slack bus exhibit fewer errors due to distinct harmonic propagation. Potential improvements include incorporating topology-aware features, such as node degree or distance to the slack bus, or leveraging ensemble methods to enhance discrimination [15].
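The two topology-aware features suggested above, node degree and distance to the slack bus, can be computed from the branch list alone. The following sketch measures distance in hops using breadth-first search, which is an assumption; an electrical distance based on impedance would require a different computation. The function name and the small example graph are hypothetical.

```python
from collections import deque


def topology_features(n, branches, slack=0):
    """Return (degree, hop_distance_to_slack) lists for an n-bus network.

    `branches` holds (i, j) pairs with 0-based bus indices. Buses that cannot
    be reached from the slack bus keep a distance of None.
    """
    adjacency = [set() for _ in range(n)]
    for i, j in branches:
        adjacency[i].add(j)
        adjacency[j].add(i)
    degree = [len(neighbours) for neighbours in adjacency]

    distance = [None] * n
    distance[slack] = 0
    queue = deque([slack])
    while queue:  # breadth-first search visits buses in order of hop count
        u = queue.popleft()
        for v in sorted(adjacency[u]):
            if distance[v] is None:
                distance[v] = distance[u] + 1
                queue.append(v)
    return degree, distance


# Hypothetical 4-bus network: a chain 0-1-2-3 with an extra branch 1-3.
degree_demo, distance_demo = topology_features(4, [(0, 1), (1, 2), (2, 3), (1, 3)])
```

These per-node values describe the class labels rather than individual samples, so in practice they would more naturally support error analysis or a hierarchical classifier than serve as direct inputs to the forest.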
4.4. Performance Insights
The proposed method demonstrates stable performance across network scales, with testing accuracies ranging from 98.98% to 100% as shown in Table 1. The perfect accuracy on the 9-bus system reflects its low node connectivity, which minimizes feature overlap in voltage difference features (Equation (5)). In contrast, the 118-bus system’s slightly lower accuracy (98.98%) is attributed to increased topological complexity, including higher node degrees and longer propagation paths, which challenge feature discrimination [4]. The computational efficiency, with testing times as low as 1.29 s for the 9-bus system and 17 s for the 118-bus system, highlights the method’s suitability for large-scale applications. The use of voltage difference features (Equation (5)) ensures topology independence, while the random forest classifier’s ensemble nature mitigates overfitting as evidenced by the small training-testing accuracy gap [18]. These insights suggest that the method’s performance is robust, particularly in transmission networks with well-defined topologies. The method’s low testing times support near-real-time harmonic source localization on moderate hardware, though future optimizations, including streaming data processing, are needed to minimize latency for large-scale systems like the 118-bus as reported in Table 1.
4.5. Practical Implications and Limitations
The method’s high accuracy and low testing time make it suitable for integration into power system monitoring systems, such as those using phasor measurement units (PMUs) or supervisory control and data acquisition (SCADA) systems [21]. The proposed method’s computational complexity of
for feature extraction and random forest classification, where
m is the number of branches and
n is the number of nodes, scales linearly with network size, unlike HSE methods (
due to matrix inversions) or DL approaches (
for
k layers,
e epochs, and
l neurons), making it more suitable for large networks [9]. It enables rapid harmonic source identification, reducing equipment downtime and enhancing power quality in transmission networks. Applications include renewable energy integration, where harmonics from inverters are prevalent, and smart grids with dynamic topologies [7]. The simulations use generalized harmonic source injections and do not explicitly model distributed generation sources like photovoltaic or inverter-based systems; future work will evaluate these sources to enhance applicability.
However, challenges persist in distribution networks due to high impedance and radial topologies, which may increase feature overlap and reduce classification accuracy. Radial topologies and high impedance diversity in distribution networks exacerbate feature overlap by amplifying voltage difference variations across branches, reducing the random forest classifier’s ability to distinguish harmonic sources, as observed with increased noise levels. Sensitivity to high noise levels (e.g., 10%) further limits performance in noisy environments, as discussed in Section 2.4 [6]. To assess robustness under real-world PMU conditions, we evaluated the method with 10% and 20% Gaussian noise added to 10% of samples, which yielded a noticeable performance decline compared to the 5% noise case. To further validate model generality, RF testing on 3rd and 7th harmonics achieves robust performance across harmonic orders. Future work should explore topology-aware feature engineering and validation with real-world PMU data to address these limitations.
5. Conclusions
This study introduces a harmonic source localization method that integrates voltage difference features with a random forest classifier, achieving high precision and computational efficiency in power systems. By capturing harmonic distortion propagation patterns and optimizing network topology handling, the method ensures robust performance across diverse system configurations. Validation on IEEE standard transmission networks confirms its high accuracy and scalability, demonstrating effectiveness in large-scale systems.
Despite its robust performance in transmission networks, the method faces significant challenges in distribution networks due to high impedance and complex radial topologies. Further challenges arise in real-world PMU data scenarios involving sampling rate mismatches, varying harmonic amplitudes, missing data, multiple harmonic sources, decision paths, rules, and synchronization errors. Addressing these limitations will require advanced feature engineering, data augmentation, real-time processing, and validation with actual PMU data. Dynamic topology changes, such as switch events and line outages, may alter feature distributions and challenge model accuracy; future improvements will explore topology-aware features such as node degree and electrical distance to enhance robustness [15]. This work provides a robust framework for harmonic source localization, offering an effective solution to improve power quality across varied power system configurations.