Deep Reinforcement-Learning-Optimized Adaptive EKF for Robust Utility Harmonic Impedance Estimation
Abstract
1. Introduction
2. System Model and Problem Formulation
2.1. Norton Equivalent Model for PCC Harmonic Analysis
2.2. State-Space Dynamic Model
2.2.1. Norton Equivalent Circuit and State Definition
2.2.2. Conventional AEKF with EM Algorithm
2.3. Problem Formulation
3. Proposed Deep Reinforcement-Learning-Optimized AEKF Method
3.1. DQN-Based Adaptive Optimization Framework
3.2. DQN Algorithm and Training Process
- (1) Initialize the main network and target network parameters, the experience replay buffer, and the training hyperparameters (discount factor $\gamma$, learning rate $\alpha$, batch size = 32, target network update frequency = 100 steps).
- (2) Generate training data through simulation under various grid scenarios (varying background harmonics, impedance ratios, noise levels, and abrupt state changes).
- (3) For each training episode:
  - Initialize the AEKF and the grid environment, and obtain the initial state $s_0$.
  - For each step $t$ in the episode:
    - I. Select an action $a_t$ according to the $\varepsilon$-greedy strategy ($\varepsilon$ decays at a rate of 0.999 per episode from its initial value down to its minimum value).
    - II. Execute the AEKF update with the selected action $a_t$, and obtain the new state $s_{t+1}$ and the immediate reward $r_t$.
    - III. Store the transition $(s_t, a_t, r_t, s_{t+1})$ in the experience replay buffer.
    - IV. Sample a batch of transitions from the experience replay buffer to train the main network, and update the network parameters by minimizing the loss function $L(\theta) = \mathbb{E}\left[\left(r_t + \gamma \max_{a'} Q(s_{t+1}, a'; \theta^{-}) - Q(s_t, a_t; \theta)\right)^2\right]$, where $\theta$ is the parameter of the main network and $\theta^{-}$ is the parameter of the target network.
    - V. Update the target network parameters every fixed number of steps.
  - End the episode when the maximum number of steps is reached.
- (4) Finish the training when the maximum number of episodes is reached or the cumulative reward converges, and save the trained main network parameters for online application.
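The core quantities of the training procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy linear Q-function, the discount factor value, and the initial and minimum exploration rates are placeholders chosen for the example, while the batch size (32), the target update interval (100 steps), and the decay rate (0.999 per episode) are taken from the text. All function names are hypothetical.

```python
import numpy as np

GAMMA = 0.99         # ASSUMED discount factor (value not stated in the text)
BATCH_SIZE = 32      # from the text
TARGET_SYNC = 100    # from the text: target network update every 100 steps
EPS_DECAY = 0.999    # from the text: decay rate per episode
EPS_START = 1.0      # ASSUMED initial exploration rate
EPS_MIN = 0.01       # ASSUMED minimum exploration rate


def epsilon_for_episode(episode):
    """Exponentially decayed exploration rate, floored at EPS_MIN."""
    return max(EPS_MIN, EPS_START * EPS_DECAY ** episode)


def select_action(q_row, epsilon, rng):
    """Epsilon-greedy: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row))


def q_values(weights, states):
    """Toy linear Q-function (an assumption): one weight row per action."""
    return states @ weights.T


def dqn_loss(main_w, target_w, batch):
    """Mean squared TD error; targets use the frozen target parameters."""
    s, a, r, s_next = batch
    targets = r + GAMMA * q_values(target_w, s_next).max(axis=1)
    predicted = q_values(main_w, s)[np.arange(len(a)), a]
    return float(np.mean((targets - predicted) ** 2))


def should_sync_target(step):
    """True on the steps where the target network copies the main network."""
    return step > 0 and step % TARGET_SYNC == 0
```

In a full training loop, `dqn_loss` would be minimized with respect to the main network parameters only, and the target parameters would be overwritten with the main parameters whenever `should_sync_target` returns true.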
3.3. Generalization Assurance
4. Simulation Validation
- M1: Binary Linear Regression method;
- M2: Fluctuation Method;
- M3: Independent Random Vector method;
- M4: Complex ICA method.
4.1. Performance Under Varying Background Harmonics
4.2. Performance Under Varying Impedance Ratio
4.3. Performance Under Different Measurement Noise Levels
5. Field Measurement Verification
5.1. Electric Arc Furnace Data
5.2. DC Terminal Data
6. Discussion
- The proposed method does not rely on the three key assumptions required by conventional methods (customer-side impedance much larger than utility-side impedance, negligible background harmonics, and independent harmonic sources on both sides), which broadens its applicability in practical industrial scenarios.
- The DQN-based RL optimization framework adjusts the noise covariance update rule in real time according to the grid operating state and the filtering performance. This addresses the accuracy degradation of the conventional fixed-rule AEKF in dynamic grid scenarios and improves both disturbance rejection and dynamic tracking.
- Both the simulations and the field measurements show that the proposed method achieves higher estimation accuracy than the compared methods (M1 to M4) across the tested scenarios, with the smallest fluctuation of the estimates over consecutive time windows, so it can provide stable impedance estimates for practical engineering.
- The proposed method requires only the PCC harmonic voltage and current data that conventional power quality monitoring devices already measure. It needs no additional signal injection or equipment modification, does not interfere with system operation, and can be integrated into existing power quality monitoring systems. In addition, the DQN agent is trained offline, so the online application has low computational complexity and can meet the real-time requirements of industrial monitoring systems.
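To make the idea of an action-dependent noise covariance update concrete, the following Python sketch shows one filter step in which a discrete action selects how quickly the measurement noise covariance is adapted. It is an illustration under assumptions, not the update rule used in the paper: the action set (three forgetting factors), the residual-based blending rule, the random-walk state model, and the linear measurement model (for which the EKF reduces to a standard Kalman filter) are all hypothetical choices made for the example.

```python
import numpy as np

# HYPOTHETICAL action set: each action picks a forgetting factor used when
# blending the previous R with a residual-based estimate.
FORGETTING_FACTORS = (0.90, 0.95, 0.99)


def adaptive_kf_step(x, P, z, H, Q, R, action):
    """One predict/update step with an action-dependent update of R.

    Assumes a random-walk state model x_k = x_{k-1} + w_k and a linear
    measurement model z_k = H x_k + v_k.
    """
    # Prediction (random walk: the state estimate is carried over)
    x_pred = x
    P_pred = P + Q

    # Innovation, innovation covariance, and Kalman gain
    innovation = z - H @ x_pred
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)

    # Measurement update
    x_new = x_pred + K @ innovation
    P_new = (np.eye(len(x)) - K @ H) @ P_pred

    # Residual-based adaptation of R, weighted by the selected action
    residual = z - H @ x_new
    b = FORGETTING_FACTORS[action]
    R_new = b * R + (1.0 - b) * (np.outer(residual, residual) + H @ P_new @ H.T)
    return x_new, P_new, R_new
```

A smaller forgetting factor lets the covariance track abrupt changes faster at the cost of noisier estimates, which is the kind of trade-off a learned policy could arbitrate step by step.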
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| PCC | Point of Common Coupling |
| RL | Reinforcement Learning |
| AEKF | Adaptive Extended Kalman Filter |
| EM | Expectation Maximization |
| ICA | Independent Component Analysis |
| DQN | Deep Q-Network |
| SNR | Signal-to-Noise Ratio |
| EAF | Electric Arc Furnace |
| DDPG | Deep Deterministic Policy Gradient |
| SAC | Soft Actor-Critic |











| Algorithm | Average Relative Error (%) | Maximum Transient Error (%) | Convergence Time (ms) | Training Episodes | Inference Time per Sample (μs) | p-Value vs. DQN |
|---|---|---|---|---|---|---|
| DQN | 0.18 | 0.80 | 0.5 | 500 | 15.7 | - |
| DDPG | 0.21 | 1.30 | 1.2 | 1500 | 28.4 | 0.023 |
| SAC | 0.23 | 1.50 | 1.5 | 1800 | 32.1 | 0.008 |

| Algorithm | Amplitude/Ω | Phase/rad | Convergence Status (Failed Runs) |
|---|---|---|---|
| The proposed method | 10.1173 ± 0.0421 | 1.2018 ± 0.0123 | 0/10 |
| M1 | 11.2061 ± 0.1875 | 1.2733 ± 0.0452 | 3/10 |
| M2 | 10.1883 ± 0.0964 | 1.1914 ± 0.0287 | 4/10 |
| M3 | 10.0985 ± 0.0732 | 1.2177 ± 0.0215 | 2/10 |
| M4 | 10.1352 ± 0.1200 | 1.1697 ± 0.0330 | 2/10 |

| Method | Quantity | 3rd | 5th | 7th | 11th |
|---|---|---|---|---|---|
| The proposed method | Amplitude/Ω | 10.1173 ± 0.0421 | 6.2451 ± 0.0317 | 4.5218 ± 0.0273 | 2.8974 ± 0.0225 |
| The proposed method | Phase/rad | 1.2018 ± 0.0123 | 1.1872 ± 0.0105 | 1.1735 ± 0.0098 | 1.1598 ± 0.0087 |
| M1 | Amplitude/Ω | 11.2061 ± 0.1875 | 6.8924 ± 0.1523 | 4.9876 ± 0.1345 | 3.2145 ± 0.1127 |
| M1 | Phase/rad | 1.2733 ± 0.0452 | 1.2541 ± 0.0387 | 1.2367 ± 0.0352 | 1.2189 ± 0.0318 |
| M2 | Amplitude/Ω | 10.1883 ± 0.0964 | 6.3102 ± 0.0821 | 4.5721 ± 0.0735 | 2.9316 ± 0.0642 |
| M2 | Phase/rad | 1.1914 ± 0.0287 | 1.1785 ± 0.0243 | 1.1659 ± 0.0217 | 1.1523 ± 0.0194 |
| M3 | Amplitude/Ω | 10.0985 ± 0.0732 | 6.2137 ± 0.0619 | 4.4982 ± 0.0563 | 2.8763 ± 0.0491 |
| M3 | Phase/rad | 1.2177 ± 0.0215 | 1.2034 ± 0.0189 | 1.1896 ± 0.0172 | 1.1754 ± 0.0156 |
| M4 | Amplitude/Ω | 10.1352 ± 0.1200 | 6.2689 ± 0.1012 | 4.5437 ± 0.0925 | 2.9128 ± 0.0817 |
| M4 | Phase/rad | 1.1697 ± 0.0330 | 1.1562 ± 0.0291 | 1.1438 ± 0.0265 | 1.1315 ± 0.0238 |

| Algorithm | Amplitude/Ω | Phase/rad | Convergence Status (Failed Runs) |
|---|---|---|---|
| The proposed method | 79.6336 ± 0.0872 | 1.0810 ± 0.0094 | 0/10 |
| M1 | 79.9247 ± 0.2135 | 1.0898 ± 0.0217 | 3/10 |
| M2 | 80.3419 ± 0.1768 | 1.0819 ± 0.0183 | 3/10 |
| M3 | 79.2138 ± 0.1429 | 1.0815 ± 0.0156 | 1/10 |
| M4 | 79.7757 ± 0.1984 | 1.0711 ± 0.0241 | 2/10 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Tang, Z.; Wei, X.; Wei, Z.; Tan, F.; Tian, C.; Tang, Y.; Xiong, X. Deep Reinforcement-Learning-Optimized Adaptive EKF for Robust Utility Harmonic Impedance Estimation. Electronics 2026, 15, 2557. https://doi.org/10.3390/electronics15122557

