Deep Reinforcement-Learning-Optimized Adaptive EKF for Robust Utility Harmonic Impedance Estimation

Tang, Zhirong; Wei, Xin; Wei, Zhaobin; Tan, Fei; Tian, Cong; Tang, Ying; Xiong, Xuedou

doi:10.3390/electronics15122557

Open Access Article


by Zhirong Tang ¹, Xin Wei ¹,*, Zhaobin Wei ², Fei Tan ¹, Cong Tian ¹, Ying Tang ¹ and Xuedou Xiong ¹

¹ Guang’an Institute of Technology, Guang’an 638550, China
² Datang Hydropower Science and Technology Research Institute Co., Ltd., Chengdu 610036, China
* Author to whom correspondence should be addressed.

Electronics 2026, 15(12), 2557; https://doi.org/10.3390/electronics15122557 (registering DOI)

Submission received: 9 May 2026 / Revised: 3 June 2026 / Accepted: 4 June 2026 / Published: 10 June 2026

(This article belongs to the Special Issue Reinforcement Learning: Emerging Techniques and Future Prospects)


Abstract

Accurate estimation of the utility harmonic impedance at the Point of Common Coupling (PCC) is critical for harmonic pollution management in industrial power grids. Existing non-invasive methods rely heavily on restrictive assumptions that are rarely satisfied in practice, and conventional filtering-based approaches lose accuracy in dynamic scenarios because the noise covariance is updated by fixed rules. This paper proposes a deep reinforcement learning (RL)-optimized adaptive extended Kalman filter (AEKF) method for robust harmonic impedance estimation. A state-space model is established without restrictive assumptions, and a deep Q-network (DQN) framework is designed to optimize noise covariance updates adaptively. Simulation results show that the method achieves reliable estimation under normal conditions. Although its errors increase under strong noise, the method remains stable and exhibits better noise robustness than conventional methods. Field measurements in actual power grid environments further verify the feasibility and application potential of the proposed method in field engineering.

Keywords:

reinforcement learning; deep Q-network; harmonic impedance estimation; adaptive extended Kalman filter; power quality; robust state estimation

1. Introduction

With the increasing development of smart grids, AC microgrids have become an important part of future intelligent power distribution systems [1,2,3]. However, with the large-scale integration of nonlinear loads and renewable energy power stations, the voltage quality at the Point of Common Coupling (PCC) of microgrids and distribution networks has deteriorated significantly, directly affecting the stable operation of power systems [4,5]. Meanwhile, the dynamic interaction between widely deployed power electronic converters has made harmonic characteristics more complex and time-varying [6]. This interaction generates interharmonics and can cause harmonic resonance, which further complicates accurate harmonic impedance estimation; traditional methods that assume stationary harmonic conditions are therefore no longer sufficient for modern industrial grids. Accurate estimation of the utility harmonic impedance at the PCC is thus critical for quantifying harmonic pollution responsibility, evaluating customer harmonic emission levels, and realizing microgrid islanding detection [7].

Existing utility harmonic impedance estimation methods are mainly divided into invasive and non-invasive methods [8]. Invasive methods estimate harmonic impedance by injecting additional harmonic current into the system, which interferes with normal system operation. In contrast, non-invasive methods use only the naturally fluctuating harmonic voltage and current data at the PCC and have become the mainstream research direction [9]. Mainstream non-invasive methods include regression methods [10,11,12,13,14], fluctuation methods [15,16], covariance characteristic methods [17], and independent component analysis (ICA) methods [18,19,20,21,22,23]. However, all these methods rely on one or more restrictive assumptions, such as negligible background harmonics and independent harmonic sources on both sides, which are rarely satisfied in actual complex power grids and lead to large estimation errors [8,24].

The adaptive extended Kalman filter (AEKF) has been widely used in state estimation because of its good dynamic tracking performance [25,26]. Conventional AEKF adopts the expectation maximization (EM) algorithm to update the noise covariance, but its update rule applies fixed weight coefficients to all historical data. When the grid operating state changes abruptly, this fixed rule cannot adapt to the system dynamics, which degrades tracking performance and estimation accuracy. In recent years, reinforcement learning (RL) has shown excellent performance in adaptive optimization problems of dynamic systems [27]. Cavus et al. [28] demonstrated that multi-task learning architectures can effectively capture cross-variable dependencies in non-stationary energy time-series data, achieving superior robustness compared with single-task approaches. Existing studies have shown that deep RL can effectively optimize the parameter adjustment process of Kalman filters, outperforming fixed-rule algorithms [29,30]. However, no research has yet applied deep RL-optimized AEKF to harmonic impedance estimation.

To fill this research gap, this paper proposes a DQN-based RL-AEKF method for robust utility harmonic impedance estimation. A state-space dynamic model is established based on the PCC Norton equivalent circuit, avoiding the restrictive assumptions of conventional methods. A DQN agent is designed to adaptively adjust the weight coefficient of noise covariance update in real time, solving the problem of poor adaptive ability of a conventional fixed-rule AEKF. Comprehensive simulations and field measurements at two industrial sites verify the superiority and engineering applicability of the proposed method.

The rest of this paper is organized as follows: Section 2 describes the system model and problem formulation. Section 3 presents the proposed RL-AEKF method. Section 4 presents the simulation validation, and Section 5 presents the field measurement verification. The remaining sections discuss the advantages, limitations, and future work, and draw the main conclusions.

2. System Model and Problem Formulation

2.1. Norton Equivalent Model for PCC Harmonic Analysis

The Norton equivalent circuit is the most widely used model for PCC harmonic emission analysis, as shown in Figure 1 [8].

In Figure 1, $\dot{U}_{p}$ is the harmonic voltage phasor at the PCC, $\dot{I}_{p}$ is the harmonic current phasor at the PCC, $\dot{Z}_{s} = R_{s} + jX_{s}$ is the utility-side harmonic impedance (the quantity to be estimated), $\dot{I}_{s}$ is the equivalent harmonic current source on the utility side, $\dot{I}_{c}$ is the customer harmonic current source, and $\dot{Z}_{c}$ is the customer-side harmonic impedance. According to Kirchhoff’s circuit laws, the governing circuit equation at the PCC is

{\dot{U}}_{p} = {\dot{I}}_{s} {\dot{Z}}_{s} - {\dot{I}}_{p} {\dot{Z}}_{s} = {\dot{U}}_{s} - {\dot{I}}_{p} {\dot{Z}}_{s},

(1)

where $\dot{U}_{s} = \dot{I}_{s} \dot{Z}_{s}$ is the equivalent utility harmonic voltage. Decomposing Equation (1) into real and imaginary parts gives

\left\{ \begin{array}{l} U_{p,R} = -I_{p,R} R_{s} + I_{p,X} X_{s} + U_{s,R} \\ U_{p,X} = -I_{p,X} R_{s} - I_{p,R} X_{s} + U_{s,X} \end{array} \right.

(2)

where $U_{p,R}$, $I_{p,R}$ and $U_{s,R}$ are the real parts of $\dot{U}_{p}$, $\dot{I}_{p}$ and $\dot{U}_{s}$, respectively, and $U_{p,X}$, $I_{p,X}$ and $U_{s,X}$ are the corresponding imaginary parts.

2.2. State-Space Dynamic Model

2.2.1. Norton Equivalent Circuit and State Definition

Based on the hidden Markov principle [31], a linear state-space dynamic model is established in which the current state depends only on the previous state and the observation depends only on the current state. The state vector $x_{n}$ is defined as

x_{n} = {[\begin{matrix} R_{s, n} & X_{s, n} & U_{s, R, n} & U_{s, X, n} \end{matrix}]}^{T}

(3)

where $n$ denotes the $n$-th sampling instant. Because the state variables change slowly over a short observation window, the state transition matrix is set to the identity matrix $A = I_{4}$. This choice is physically justified by the time-scale separation between the sampling process and grid impedance variations. As specified in Section 4, the sampling frequency is 10 kHz, corresponding to a sampling interval of 0.1 ms. In contrast, the grid harmonic impedance is determined by the physical parameters of transmission lines, transformers, and system topology, which change on timescales ranging from milliseconds (fast load switching) to seconds (topology changes). Within a single 0.1 ms sampling interval, these parameters can therefore be approximated as constant, a principle widely adopted in power system state estimation [25,26]. All unmodeled dynamic variations, including fast impedance changes, grid frequency fluctuations, and the stochastic output of renewable energy generation, are lumped into the zero-mean Gaussian process noise vector $w_{n}$ with covariance matrix $Q_{n}$. The core innovation of our method is a DQN agent that adaptively adjusts $Q_{n}$ and $R_{n}$ in real time, which is intended to compensate for approximation errors introduced by the identity state transition assumption. The state equation is

x_{n} = A x_{n - 1} + w_{n}

(4)

where $w_{n}$ is the process noise vector obeying a zero-mean Gaussian distribution with covariance matrix $Q_{n} = E[w_{n} w_{n}^{T}]$. The real and imaginary parts of the PCC harmonic voltage form the observation vector:

z_{n} = {[\begin{matrix} U_{p, R, n} & U_{p, X, n} \end{matrix}]}^{T}

(5)

The observation matrix is constructed by the real and imaginary parts of the PCC harmonic current:

H_{n} = [\begin{matrix} - I_{p, R, n} & I_{p, X, n} & 1 & 0 \\ - I_{p, X, n} & - I_{p, R, n} & 0 & 1 \end{matrix}]

(6)

Thus, the observation equation is

z_{n} = H_{n} x_{n} + v_{n}

(7)

where $v_{n}$ is the observation noise vector obeying a zero-mean Gaussian distribution with covariance matrix $R_{n} = E[v_{n} v_{n}^{T}]$.
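As a quick consistency check on Equations (1)–(7), the short NumPy sketch below builds $H_{n}$ and $z_{n}$ from complex phasors and confirms that, without noise, $H_{n} x_{n}$ reproduces the PCC voltage. The phasor values are hypothetical and chosen only to exercise the algebra; the function names are illustrative, not from the paper.

```python
import numpy as np


def observation_matrix(I_p: complex) -> np.ndarray:
    """H_n of Eq. (6), built from the PCC harmonic current phasor."""
    i_r, i_x = I_p.real, I_p.imag
    return np.array([[-i_r,  i_x, 1.0, 0.0],
                     [-i_x, -i_r, 0.0, 1.0]])


def state_vector(Z_s: complex, U_s: complex) -> np.ndarray:
    """x_n of Eq. (3): [R_s, X_s, U_s,R, U_s,X]^T."""
    return np.array([Z_s.real, Z_s.imag, U_s.real, U_s.imag])


# Hypothetical phasors (illustration only)
Z_s = 3.37 + 14.62j
U_s = 50.0 + 10.0j
I_p = 2.0 - 1.5j

U_p = U_s - I_p * Z_s                    # Eq. (1), noise-free
z = np.array([U_p.real, U_p.imag])       # Eq. (5)
H = observation_matrix(I_p)
x = state_vector(Z_s, U_s)
print(np.allclose(H @ x, z))             # Eq. (7) with v_n = 0
```

The identity columns in $H_{n}$ are what let $U_{s,R}$ and $U_{s,X}$ enter the observation directly, so both rows of $H_{n}$ are always linearly independent.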

The key problem is therefore to achieve accurate and stable estimation of the utility harmonic impedance using only the measurable PCC harmonic voltage and current data, without relying on restrictive assumptions, while adapting to unknown time-varying noise and dynamic grid state changes in practical engineering.

2.2.2. Conventional AEKF with EM Algorithm

The conventional AEKF algorithm includes the prediction step, update step, and fixed-rule noise covariance update step based on the EM algorithm [32].

Prediction Step:

{\hat{x}}_{n | n - 1} = A {\hat{x}}_{n - 1 | n - 1}

(8)

P_{n | n - 1} = A P_{n - 1 | n - 1} A^{T} + {\hat{Q}}_{n - 1}

(9)

where $\hat{x}_{n|n-1}$ is the prior state prediction and $P_{n|n-1}$ is the prior error covariance matrix.

Update Step:

K_{n} = P_{n | n - 1} H_{n}^{T} {(H_{n} P_{n | n - 1} H_{n}^{T} + {\hat{R}}_{n})}^{- 1}

(10)

{\hat{x}}_{n | n} = {\hat{x}}_{n | n - 1} + K_{n} (z_{n} - H_{n} {\hat{x}}_{n | n - 1})

(11)

P_{n | n} = (I - K_{n} H_{n}) P_{n | n - 1}

(12)

where $K_{n}$ is the Kalman gain and $P_{n|n}$ is the posterior error covariance matrix.

Fixed-Rule Noise Covariance Update:

e_{n} = z_{n} - H_{n} {\hat{x}}_{n | n - 1}

(13)

{\hat{R}}_{n} = \frac{1}{n} \sum_{i = 1}^{n} (e_{i} e_{i}^{T} - H_{i} P_{i | i - 1} H_{i}^{T})

(14)

{\hat{w}}_{n} = {\hat{x}}_{n | n} - A {\hat{x}}_{n - 1 | n - 1}

(15)

{\hat{Q}}_{n} = \frac{1}{n} \sum_{i = 1}^{n} ({\hat{w}}_{i} {\hat{w}}_{i}^{T} + P_{i | i} - A P_{i - 1 | i - 1} A^{T})

(16)

where $e_{n}$ is the innovation sequence and $\hat{w}_{n}$ is the estimated process noise.
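A minimal NumPy sketch of one step of the conventional AEKF in Equations (8)–(16), assuming $A = I_{4}$ and running $1/n$ averages for the noise estimates. The class name, the initial values, and the eigenvalue floor that keeps $\hat{R}_{n}$ and $\hat{Q}_{n}$ positive definite are assumptions added for illustration and numerical robustness; they are not part of the equations above. Without such a floor, Equation (14) at $n = 1$ makes $H_{n} P_{n|n-1} H_{n}^{T} + \hat{R}_{n} = e_{n} e_{n}^{T}$, which is singular.

```python
import numpy as np


def floor_psd(M, eps=1e-6):
    """Symmetrise M and clip its eigenvalues at eps (added safeguard, not in the paper)."""
    M = 0.5 * (M + M.T)
    w, V = np.linalg.eigh(M)
    return (V * np.maximum(w, eps)) @ V.T


class EmAekf:
    """Conventional AEKF, Eqs. (8)-(16), with A = I_4 and running 1/n means.

    R_hat is recomputed from Eq. (14) before it is used, so no initial R is needed.
    """

    def __init__(self, x0, P0, Q0):
        self.A = np.eye(4)
        self.x = np.asarray(x0, dtype=float)
        self.P = np.asarray(P0, dtype=float)
        self.Q = np.asarray(Q0, dtype=float)
        self.R = None
        self.n = 0
        self.sum_R = np.zeros((2, 2))   # running sum of the Eq. (14) terms
        self.sum_Q = np.zeros((4, 4))   # running sum of the Eq. (16) terms

    def step(self, z, H):
        A, x_prev, P_prev = self.A, self.x, self.P
        # Prediction, Eqs. (8)-(9), using Q_hat from the previous step
        x_pred = A @ x_prev
        P_pred = A @ P_prev @ A.T + self.Q
        # Innovation and EM estimate of R, Eqs. (13)-(14)
        e = z - H @ x_pred
        self.n += 1
        self.sum_R += np.outer(e, e) - H @ P_pred @ H.T
        self.R = floor_psd(self.sum_R / self.n)
        # Kalman update, Eqs. (10)-(12)
        S = H @ P_pred @ H.T + self.R
        K = P_pred @ H.T @ np.linalg.inv(S)
        x_post = x_pred + K @ e
        P_post = (np.eye(4) - K @ H) @ P_pred
        # Process-noise estimate and EM estimate of Q, Eqs. (15)-(16)
        w = x_post - A @ x_prev
        self.sum_Q += np.outer(w, w) + P_post - A @ P_prev @ A.T
        self.Q = floor_psd(self.sum_Q / self.n)
        self.x, self.P = x_post, P_post
        return x_post, e
```

Because every past term carries the same $1/n$ weight, the estimates react ever more slowly as $n$ grows, which is the fixed-rule limitation that the DQN-based update in Section 3 targets.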

2.3. Problem Formulation

From the above model, the PCC harmonic voltage $\dot{U}_{p}$ and current $\dot{I}_{p}$ can be measured directly at the PCC, while the utility harmonic impedance $\dot{Z}_{s}$ (i.e., $R_{s}$ and $X_{s}$) and the equivalent background harmonic voltage $\dot{U}_{s}$ are hidden state variables to be estimated. The problem addressed in this paper is to estimate the utility harmonic impedance accurately and stably using only the measurable PCC harmonic voltage and current data, without relying on restrictive assumptions (customer-side impedance much larger than the utility side, negligible background harmonics, independent harmonic sources on both sides), while adapting to unknown, time-varying noise and dynamic grid state changes in practical industrial engineering.

The conventional AEKF uses a fixed-rule EM algorithm to update the noise covariance matrices $Q_{n}$ and $R_{n}$, which cannot adapt to dynamic changes in the grid. This paper therefore designs a deep RL framework to optimize the noise covariance update, improving the adaptability and estimation accuracy of the algorithm in dynamic and uncertain industrial grid scenarios.

3. Proposed Deep Reinforcement-Learning-Optimized AEKF Method

3.1. DQN-Based Adaptive Optimization Framework

The deep Q-network (DQN) algorithm is adopted to design the adaptive optimization framework, which is suitable for discrete action space decision-making problems and has good stability for industrial applications [33].

The state vector of the DQN agent is constructed by key filtering state quantities and grid fluctuation indicators, which are all observable in real time:

s_{n} = [‖ e_{n} ‖_{2}, tr (P_{n | n - 1}), tr (P_{n | n}), | Δ {‖ z_{n} ‖}_{2} |]

(17)

where $‖\cdot‖_{2}$ is the 2-norm, $\mathrm{tr}(\cdot)$ is the matrix trace, and $|Δ‖z_{n}‖_{2}|$ reflects the degree of grid fluctuation. The state space is normalized to [0,1] to improve training efficiency.
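The paper does not state how the features of Equation (17) are normalized. The sketch below assumes per-feature min-max bounds (for example, taken from training data) with clipping to [0,1], and reads $|Δ‖z_{n}‖_{2}|$ as the absolute change in the observation norm between consecutive samples; the function name and the bounds are hypothetical.

```python
import numpy as np


def dqn_state(e, P_prior, P_post, z, z_prev, lo, hi):
    """Feature vector s_n of Eq. (17), min-max normalised to [0, 1].

    lo, hi: hypothetical per-feature bounds, e.g. estimated from training data.
    The fourth feature is read as | ||z_n||_2 - ||z_{n-1}||_2 | (an assumption).
    """
    raw = np.array([
        np.linalg.norm(e),                                # ||e_n||_2
        np.trace(P_prior),                                # tr(P_{n|n-1})
        np.trace(P_post),                                 # tr(P_{n|n})
        abs(np.linalg.norm(z) - np.linalg.norm(z_prev)),  # grid fluctuation
    ])
    lo, hi = np.asarray(lo, dtype=float), np.asarray(hi, dtype=float)
    return np.clip((raw - lo) / (hi - lo), 0.0, 1.0)
```

Clipping keeps out-of-range inputs (for example, during a transient larger than anything seen in training) from pushing the network far outside its training distribution.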

The action space is discrete with 10 optional actions, corresponding to the adaptive weight coefficient $a_{n} \in \{0.1, 0.2, \ldots, 1.0\}$. The optimized noise covariance update equations with a sliding window are

{\hat{R}}_{n} = (1 - a_{n}) {\hat{R}}_{n - 1} + a_{n} \cdot \frac{1}{N} \sum_{i = n - N + 1}^{n} (e_{i} e_{i}^{T} - H_{i} P_{i | i - 1} H_{i}^{T})

(18)

{\hat{Q}}_{n} = (1 - a_{n}) {\hat{Q}}_{n - 1} + a_{n} \cdot \frac{1}{N} \sum_{i = n - N + 1}^{n} ({\hat{w}}_{i} {\hat{w}}_{i}^{T} + P_{i | i} - A P_{i - 1 | i - 1} A^{T})

(19)

where $N$ is the sliding window length. A small $a_{n}$ ensures filtering stability in steady state, while a large $a_{n}$ improves tracking during abrupt changes. The reward function guides the agent toward the optimal decision-making strategy; it is designed to minimize the filtering error while keeping the filtering process stable. The immediate reward at the $n$-th instant is defined as

r_{n} = - (α \cdot ‖ e_{n} ‖_{2}^{2} + β \cdot | tr (P_{n | n}) - tr (P_{n - 1 | n - 1}) |)

(20)

where α and β are weight coefficients, set to 0.7 and 0.3, respectively, in this paper. The first term penalizes the innovation error to improve estimation accuracy; the second term penalizes abrupt changes in the error covariance matrix to keep the filtering process stable. The agent therefore obtains a higher reward when the filtering error is smaller and the filtering process is more stable, and learns the adaptive weight coefficient adjustment strategy accordingly.
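The blended update of Equations (18)–(19) and the reward of Equation (20) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and a `deque` with `maxlen=N` is one convenient way to hold the last $N$ residual terms.

```python
from collections import deque

import numpy as np

# Action table: a_n in {0.1, 0.2, ..., 1.0}
ACTIONS = np.round(np.arange(1, 11) * 0.1, 1)


def blended_update(prev_cov, window_terms, a_n):
    """Eqs. (18)/(19): (1 - a_n) * previous estimate + a_n * sliding-window mean.

    window_terms holds the most recent N residual matrices, e.g.
    e_i e_i^T - H_i P_{i|i-1} H_i^T for R_hat, or the bracketed term of Eq. (19).
    """
    window_mean = sum(window_terms) / len(window_terms)
    return (1.0 - a_n) * prev_cov + a_n * window_mean


def reward(e, tr_P_post, tr_P_post_prev, alpha=0.7, beta=0.3):
    """Eq. (20): penalise the squared innovation norm and jumps in tr(P_{n|n})."""
    return -(alpha * float(e @ e) + beta * abs(tr_P_post - tr_P_post_prev))


# A deque with maxlen=N keeps only the last N terms (N = 60 in Section 4).
N = 60
r_window = deque(maxlen=N)
```

With $a_{n} = 1$ the estimate is simply the window mean (fastest tracking), while $a_{n} = 0.1$ keeps 90% of the previous estimate (smoothest), which is the trade-off the agent selects per sample.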

3.2. DQN Algorithm and Training Process

The discrete-action DQN algorithm was adopted after a systematic comparison with continuous-action reinforcement learning algorithms such as Deep Deterministic Policy Gradient (DDPG) [34] and Soft Actor-Critic (SAC) [35]. This choice rests on three considerations. First, the weight coefficient $a_{n}$ for the noise covariance update does not require fine-grained continuous tuning. As discussed in Section 3.1, the optimal $a_{n}$ typically falls within the range 0.1 to 1.0, and dividing this range into 10 discrete steps of 0.1 provides sufficient resolution for adaptive tuning. Additional experiments comparing 5, 10, 20, and 50 discrete actions show that increasing the number of actions beyond 10 yields no significant improvement in estimation performance (average error reduction < 0.03%), while the computational overhead increases linearly. Second, discrete-action DQN offers better training stability and lower computational complexity. Unlike continuous-action algorithms, which require separate actor and critic networks, DQN uses a single Q-network and is less prone to training instability, mode collapse, and overfitting. In our experiments, DQN converged reliably within 500 training episodes, whereas DDPG required more than 1500 episodes and exhibited 15% higher variance in final performance. Third, computational efficiency is a critical requirement for real-time power system applications. DQN inference at each sampling instant involves only a single forward pass through a small neural network, which can be implemented on low-cost embedded systems, whereas continuous-action algorithms require additional computation for the actor network, which could compromise the real-time performance of the estimation system.

To further validate this choice, we conducted a comparative study of the proposed 10-action DQN, DDPG, and SAC. All algorithms were trained under the same grid scenarios and evaluated on the same test datasets. The performance and computational complexity comparison results are shown in Figure 2 and summarized in Table 1.

The results show that the proposed DQN achieves performance comparable to or better than the continuous-action algorithms on all metrics. DQN achieves the lowest average relative error (0.18%) and the fastest convergence time (0.5 ms) during impedance step changes. To confirm that this difference is statistically significant rather than due to random variation, we conducted a paired two-tailed t-test on the estimation errors over 100 independent test scenarios. The p-value between DQN and DDPG is 0.023 (< 0.05), indicating a statistically significant improvement. In terms of computational efficiency, DQN has the shortest inference time per sampling point (15.7 μs), only 55% of that of DDPG and 49% of that of SAC. DQN also trains faster, requiring only one third of the training episodes of DDPG to converge. For comparison, the conventional EM-AEKF algorithm has an inference time of 12.3 μs per sample, so the proposed DQN-AEKF adds only 3.4 μs of computational overhead, which is negligible for real-time industrial monitoring systems operating at a 10 kHz sampling frequency (a 100 μs sampling interval). These results confirm that discrete-action DQN is the most suitable choice for this application, balancing estimation accuracy, dynamic tracking performance, and computational efficiency.
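For readers reproducing the significance test, a paired t-test works on the per-scenario error differences between two methods. The sketch below is standard-library only; its p-value uses a standard-normal approximation to the t distribution, which is close for 100 pairs (df = 99) but not for small samples. For the exact t-based p-value, `scipy.stats.ttest_rel` can be used instead. The function name is illustrative.

```python
import math
from statistics import NormalDist, mean, stdev


def paired_t_test(errors_a, errors_b):
    """Paired two-tailed t-test on per-scenario errors of two methods.

    Returns (t, p). p uses a normal approximation to the t distribution,
    adequate for ~100 pairs; use scipy.stats.ttest_rel for exact values.
    """
    if len(errors_a) != len(errors_b) or len(errors_a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    d = [a - b for a, b in zip(errors_a, errors_b)]
    t = mean(d) / (stdev(d) / math.sqrt(len(d)))
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, p
```

The timing figures follow the same simple arithmetic: 15.7 μs per sample occupies about 15.7% of the 100 μs interval available at 10 kHz, and 15.7 − 12.3 = 3.4 μs is the overhead over EM-AEKF.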

The reward function is designed to balance tracking accuracy and filtering stability. As defined in Equation (20), it penalizes both large innovation errors and abrupt changes in the error covariance matrix. To determine the optimal values of the reward weights α and β, we conducted a sensitivity analysis, evaluating all combinations of α and β in the range [0.1, 0.9] with a step size of 0.1 under three operating conditions: steady state, step impedance change, and high noise. The best performance is achieved with α = 0.7 and β = 0.3, which minimizes the overall estimation error while keeping filter operation stable. When α > 0.7, the filter becomes too aggressive and exhibits increased steady-state noise; when α < 0.7, it becomes overly conservative and responds more slowly to dynamic changes. The results of this sensitivity analysis are shown in Figure 3.

Figure 3 shows the average relative estimation error as a function of α and β. The minimum error is clearly achieved at (α = 0.7, β = 0.3), confirming that this weight combination provides the best balance between tracking accuracy and filtering stability. The error increases gradually as the weights deviate from these values, more sharply when α is too small or β is too large. These results validate the choice of reward weights and show that the proposed method is robust to reasonable variations in these hyperparameters.

Figure 3. Contour plot of the average relative estimation error (%) for different combinations of α and β. The minimum error, at α = 0.7 and β = 0.3, is marked by the red dot; the color bar indicates the magnitude of the average relative error in percent.

The DQN algorithm uses a deep neural network to approximate the action-value function $Q(s, a)$, which represents the expected cumulative reward when the agent takes action $a$ in state $s$ [36]. The algorithm adopts an experience replay mechanism and a target network to improve training stability. The Q-network is a fully connected neural network with three hidden layers of 64 neurons each and ReLU activation. The network input is the state vector $s_{n}$, and the output is the Q value of each optional action. The target network has the same structure as the main network, and its parameters are updated by copying the main network parameters every fixed number of steps [37].
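The network described above (4 inputs, three 64-unit ReLU layers, 10 outputs) can be sketched as a NumPy forward pass. The He-style initialization and the function names are assumptions for illustration; a training implementation would normally use a deep learning framework to obtain gradients.

```python
import numpy as np

ACTIONS = np.round(np.arange(1, 11) * 0.1, 1)   # a_n in {0.1, ..., 1.0}


def init_qnet(rng, sizes=(4, 64, 64, 64, 10)):
    """4 state features -> three 64-unit ReLU layers -> 10 Q-values.

    He-style initialisation is an assumption; the paper does not state one.
    """
    return [(rng.normal(0.0, np.sqrt(2.0 / m), (m, k)), np.zeros(k))
            for m, k in zip(sizes[:-1], sizes[1:])]


def q_values(params, s):
    """Forward pass: ReLU hidden layers, linear output (one Q per action)."""
    h = np.asarray(s, dtype=float)
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W, b = params[-1]
    return h @ W + b


def greedy_action(params, s):
    """Index of the highest-valued action and the corresponding weight a_n."""
    idx = int(np.argmax(q_values(params, s)))
    return idx, ACTIONS[idx]


def sync_target(params):
    """Target network parameters: an independent copy of the main network."""
    return [(W.copy(), b.copy()) for W, b in params]
```

Because the network is this small (about 9,000 weights), one forward pass per sample is cheap, which is consistent with the microsecond-scale inference times reported above.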

The offline training process of the DQN agent is as follows:

(1) Initialize the main network and target network parameters, the experience replay buffer, and the training hyperparameters (discount factor $γ = 0.9$, learning rate $λ = 0.001$, batch size = 32, target network update frequency = 100 steps).

(2) Generate training data through simulation under various grid scenarios (varying background harmonics, impedance ratios, noise levels, and abrupt state changes).

(3) For each training episode:

Initialize the AEKF and the grid environment, and obtain the initial state $s_{0}$.
For each step in the episode:
I. Select an action $a_{n}$ according to the $ε$-greedy strategy (initial $ε = 0.9$, decay rate = 0.999 per episode, minimum $ε = 0.1$).
II. Execute the AEKF update with the selected action $a_{n}$, and obtain the new state $s_{n+1}$ and the immediate reward $r_{n}$.
III. Store the transition $(s_{n}, a_{n}, r_{n}, s_{n+1})$ in the experience replay buffer.
IV. Sample a batch of transitions from the experience replay buffer to train the main network, and update the network parameters by minimizing the loss function

$L(θ) = E\left[\left(r_{n} + γ \max_{a'} Q(s_{n+1}, a'; θ^{-}) - Q(s_{n}, a_{n}; θ)\right)^{2}\right]$

(21)

where θ denotes the parameters of the main network and $θ^{-}$ those of the target network.
V. Update the target network parameters every fixed number of steps.
End the episode when the maximum number of steps is reached.

(4) Finish training when the maximum number of episodes is reached or the cumulative reward converges, and save the trained main network parameters for online application.
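The bookkeeping in steps (1)–(4) can be sketched without committing to a deep learning framework: a replay buffer, the per-episode ε schedule, ε-greedy selection, and the loss of Equation (21) evaluated from Q-value arrays. Gradient computation is omitted. The buffer capacity and all names are hypothetical; the hyperparameter defaults are those listed in step (1).

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s_next) transitions.

    The capacity is a hypothetical value; the paper does not state one.
    """

    def __init__(self, capacity=50_000):
        self.buf = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.buf.append((np.asarray(s, dtype=float), int(a), float(r),
                         np.asarray(s_next, dtype=float)))

    def sample(self, batch_size=32):
        batch = random.sample(list(self.buf), batch_size)
        s, a, r, s2 = zip(*batch)
        return np.stack(s), np.array(a), np.array(r), np.stack(s2)


def epsilon(episode, eps0=0.9, decay=0.999, eps_min=0.1):
    """Exploration rate after `episode` episodes (step 3-I)."""
    return max(eps_min, eps0 * decay ** episode)


def select_action(q_row, eps, rng):
    """Explore uniformly with probability eps, otherwise act greedily."""
    if rng.random() < eps:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row))


def dqn_loss(q_main, q_target_next, a, r, gamma=0.9):
    """Eq. (21): mean squared TD error with a frozen target network.

    q_main:        Q(s_n, . ; theta)       for the batch, shape (B, n_actions)
    q_target_next: Q(s_{n+1}, . ; theta^-) for the batch, shape (B, n_actions)
    """
    td_target = r + gamma * q_target_next.max(axis=1)
    q_taken = q_main[np.arange(len(a)), a]
    return float(np.mean((td_target - q_taken) ** 2))
```

Evaluating the target with the separate parameters $θ^{-}$, which change only every 100 steps, is what keeps the regression target in Equation (21) from shifting at every update.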

3.3. Generalization Assurance

To minimize the domain shift between simulated training data and real-world field measurements and to ensure the generalization ability of the DQN agent, we took three measures. First, the training dataset was systematically constructed to cover typical operating conditions of industrial power grids, including background harmonic current ratios from 0.1 to 1.0, SNR levels from 10 dB to 60 dB, random load fluctuations with amplitudes up to ±30% of the base load, uniform, Gaussian, and skewed harmonic phase distributions, impedance ratios from 3 to 12, and common transient events such as voltage sags, current spikes, and capacitor bank switching. In total, over 15,000 independent simulation scenarios were generated to expose the agent to a wide range of operating conditions that closely resemble real industrial environments. Second, we incorporated statistical characteristics extracted from the collected EAF and DC terminal field data into the simulation process: matching the harmonic amplitude distribution, calibrating the noise model with the field-measured noise variance, and generating transient event patterns based on real EAF transient characteristics. This statistical matching reduces the domain gap by making the simulated data statistically consistent with real field measurements rather than relying on idealized conditions. Finally, no fine-tuning on real field data was performed at any point during validation. The same DQN model, trained exclusively on 3rd harmonic simulation data, was applied directly to both the 150 kV EAF site and the 500 kV DC terminal site, which have completely different grid topologies and operating characteristics, and it maintained stable estimation performance across different harmonic orders. This indicates that the model has learned a general adaptive noise covariance adjustment strategy rather than overfitting to specific simulation conditions.
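One way to organize the scenario coverage described above is a randomized scenario sampler. The sketch below draws parameters from the ranges listed in this section; uniform sampling within each range, the field names, and the inclusion of a "none" transient option are assumptions, and the field-data statistical matching step is not modeled.

```python
import random
from dataclasses import dataclass


@dataclass
class Scenario:
    q: float           # background harmonic current ratio |I_s / I_c|
    snr_db: float      # measurement SNR in dB
    load_swing: float  # relative load fluctuation (fraction of base load)
    phase_dist: str    # harmonic phase distribution family
    m: float           # impedance ratio |Z_c| / |Z_s|
    transient: str     # injected transient event ("none" is an assumption)


def sample_scenario(rng: random.Random) -> Scenario:
    """Draw one hypothetical training scenario from the ranges in Section 3.3."""
    return Scenario(
        q=rng.uniform(0.1, 1.0),
        snr_db=rng.uniform(10.0, 60.0),
        load_swing=rng.uniform(-0.30, 0.30),
        phase_dist=rng.choice(["uniform", "gaussian", "skewed"]),
        m=rng.uniform(3.0, 12.0),
        transient=rng.choice(["none", "voltage_sag", "current_spike",
                              "capacitor_switching"]),
    )


rng = random.Random(42)   # fixed seed for a reproducible scenario set
scenarios = [sample_scenario(rng) for _ in range(15_000)]
```

Seeding the generator makes the scenario set reproducible, which matters when comparing DQN, DDPG, and SAC on identical training conditions.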

The flowchart of the proposed DRL-optimized AEKF method for utility harmonic impedance estimation is shown in Figure 4.

4. Simulation Validation

To verify the estimation accuracy, robustness, and superiority of the proposed RL-AEKF method under different practical industrial grid scenarios, a simulation model based on the Norton equivalent circuit is established in MATLAB R2021a. The performance of the proposed method is compared with four mainstream methods widely used in engineering:

M1: Binary Linear Regression method;
M2: Fluctuation Method;
M3: Independent Random Vector method;
M4: Complex ICA method.

The fundamental parameters of the simulation model are as follows: system rated voltage 10 kV; fundamental frequency 50 Hz [24]; target utility harmonic impedance (true value) $\dot{Z}_{s} = 15∠77^{\circ}$ Ω ($R_{s} = 3.37$ Ω, $X_{s} = 14.62$ Ω); customer-side harmonic impedance $\dot{Z}_{c} = m \cdot |Z_{s}| ∠ 66^{\circ}$, where $m$ is the impedance ratio between the customer side and the utility side. The amplitude ratio of the utility-side and customer-side harmonic current sources, $q = |\dot{I}_{s} / \dot{I}_{c}|$, represents the relative level of background harmonics. The sampling frequency is 10 kHz, 300 groups of valid samples are generated for each scenario, and the sliding window length is set to 60. Estimation performance is evaluated by the relative errors of amplitude and phase:

δ_{|Z|} = \left| \frac{|\hat{Z}_{s}| - |Z_{s,true}|}{|Z_{s,true}|} \right| \times 100\%

(22)

δ_{φ} = \left| \frac{\hat{φ}_{s} - φ_{s,true}}{φ_{s,true}} \right| \times 100\%

(23)

where $δ_{|Z|}$ is the amplitude relative error and $δ_{φ}$ is the phase relative error.
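The ground-truth rectangular components and the error metrics of Equations (22)–(23) can be checked with a few lines of standard-library Python. The function name is illustrative; phases are compared in radians, although the ratio in Equation (23) is the same in degrees.

```python
import cmath
import math


def relative_errors(Z_hat: complex, Z_true: complex):
    """Amplitude and phase relative errors of Eqs. (22)-(23), in percent."""
    amp_err = abs(abs(Z_hat) - abs(Z_true)) / abs(Z_true) * 100.0
    phi_hat, phi_true = cmath.phase(Z_hat), cmath.phase(Z_true)
    phase_err = abs((phi_hat - phi_true) / phi_true) * 100.0
    return amp_err, phase_err


# Simulation ground truth: Z_s = 15 at 77 degrees (ohms)
Z_true = cmath.rect(15.0, math.radians(77.0))
print(round(Z_true.real, 2), round(Z_true.imag, 2))   # 3.37 14.62
```

Note that the phase error is normalized by the true phase angle, so the same absolute phase deviation yields a larger percentage for impedances with smaller angles.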

4.1. Performance Under Varying Background Harmonics

This scenario corresponds to the practical situation in which background harmonics fluctuate greatly because of changes in the output of renewable energy stations. The impedance ratio is fixed at $m = 5.5$, and $q$ is varied from 0.1 to 1.0 in steps of 0.1, covering the range from very small background harmonics to background harmonics comparable to those on the customer side. The estimation errors and 95% confidence intervals of each method under different background harmonic levels are shown in Figure 5. As $q$ increases (i.e., as background harmonics strengthen), the estimation errors of all methods increase, but the proposed method always maintains the smallest error. Even at $q = 1.0$ (strong background harmonics, under which conventional methods fail), the amplitude relative error of the proposed method remains below 4%, while the errors of all comparison methods exceed 10%. The proposed method is also markedly more accurate than the Complex Independent Component Analysis method (M4) at all background harmonic levels, which supports the effectiveness of the RL-based adaptive optimization. Furthermore, Figure 5c,d show that the proposed method consistently maintains the narrowest 95% confidence intervals across all background harmonic current ratios, indicating better stability and reproducibility than all conventional methods.

4.2. Performance Under Varying Impedance Ratio

This scenario corresponds to the practical situation in which the customer-side impedance changes greatly after different nonlinear loads are connected to the PCC. The background harmonic level is fixed at $q = 0.5$, and $m$ is varied from 3 to 12 in steps of 1, covering the range from a customer-side impedance close to the utility-side impedance to one much larger than it. The estimation errors under different impedance ratios are shown in Figure 6. Even at $m = 3$ (customer-side impedance close to the utility side, which violates a key assumption of most conventional methods), the amplitude and phase relative errors of the proposed method remain below 4%, while the errors of all comparison methods exceed 8%. At $m = 12$, the proposed method still achieves the best estimation performance. This shows that the proposed method does not require the customer-side impedance to be much larger than the utility-side impedance and remains robust across different impedance ratios. In addition, the proposed method consistently maintains the narrowest 95% confidence intervals across all impedance ratios, indicating better stability than the conventional methods.

4.3. Performance Under Different Measurement Noise Levels

To further examine the advantages of the proposed method, we compare it with M1–M4 under different noise conditions. In this simulation, we set $m = 5$ and $q = 0.3$, and add white noise with a signal-to-noise ratio (SNR) of 10–50 dB, in steps of 5 dB, to the PCC harmonic voltage and harmonic current, respectively. The utility harmonic impedance estimates of the proposed method and M1–M4 are shown in Figure 7. The errors of all methods decrease as the SNR increases. The proposed method achieves an amplitude error of 41.2% at 10 dB, 14.1 percentage points lower than the best-performing conventional method in this harsh low-SNR environment, and its error drops sharply to 2.1–8.3% under typical industrial conditions of 20–30 dB, where the conventional methods show fluctuating rankings with no consistently stable performer. Its advantage is clearer in phase estimation, with a 20.5% error at 10 dB (2.7–11.6 percentage points lower than the baselines) and a 51–74% improvement in accuracy at 15 dB. Overall, the proposed method outperforms all conventional methods across the tested SNR range, with the largest advantages in challenging low-SNR industrial scenarios, which supports its adaptive noise suppression capability. The 95% confidence intervals in Figure 7c,d further confirm that the proposed method has the smallest estimation variance at all SNR levels.
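The noise injection in this experiment can be sketched as adding white Gaussian noise scaled to a target SNR. The paper does not spell out its SNR definition; the sketch assumes SNR = mean signal power / noise power and, for complex phasor samples, splits the noise power equally between real and imaginary parts. The function name is hypothetical. Voltage and current would each receive an independent draw.

```python
import numpy as np


def add_white_noise(signal, snr_db, rng):
    """Return signal + white Gaussian noise at the requested SNR (dB).

    SNR is taken as mean signal power / noise power (an assumption). For
    complex phasor samples the noise power is split equally between the
    real and imaginary parts.
    """
    signal = np.asarray(signal)
    p_signal = np.mean(np.abs(signal) ** 2)
    p_noise = p_signal / 10.0 ** (snr_db / 10.0)
    if np.iscomplexobj(signal):
        sigma = np.sqrt(p_noise / 2.0)
        noise = (rng.normal(0.0, sigma, signal.shape)
                 + 1j * rng.normal(0.0, sigma, signal.shape))
    else:
        noise = rng.normal(0.0, np.sqrt(p_noise), signal.shape)
    return signal + noise


# SNR sweep of Section 4.3: 10, 15, ..., 50 dB
snr_grid = np.arange(10, 55, 5)
```

At 10 dB the noise power is one tenth of the signal power, so the noise standard deviation is about 32% of the RMS signal level, which helps explain the large errors all methods show at the low end of the sweep.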

5. Field Measurement Verification

To further verify the practical engineering applicability of the proposed method, two sets of field-measured harmonic data from actual industrial sites are used for testing, and the performance is compared with that of the four mainstream methods above.

5.1. Electric Arc Furnace Data

A well-recognized challenge in field validation of harmonic impedance estimation methods is the lack of direct ground-truth measurements. In operational power grids, invasive methods such as controlled current injection are generally prohibited due to safety concerns and potential interference with normal system operation. To address this limitation, we adopted two complementary validation approaches for the electric arc furnace (EAF) site: (1) cross-method comparison with four widely accepted existing methods (M1–M4), where high consistency between results provides strong evidence of estimation accuracy; and (2) consistency analysis across 10 non-overlapping 1 min sub-intervals, where small standard deviations demonstrate the stability and reproducibility of the method.

The measured data come from the 150 kV bus of a 100 MW DC EAF in a steel plant [38]. The EAF is a typical strongly nonlinear impact load with large harmonic fluctuations and complex background harmonics, which makes it a difficult scenario for harmonic impedance estimation. In total, 600 groups of 3rd harmonic voltage and current data are collected at a sampling frequency of 20 kHz. The amplitudes of the harmonic voltage and current are shown in Figure 8 [8]. As Figure 8 shows, the amplitudes of the EAF harmonic voltage and harmonic current follow almost the same trend, and the voltage amplitude is about 10 times the current amplitude.

The overall estimation results of each method for the EAF site data are shown in Table 2. The estimates of all methods are close, with amplitudes of approximately 10–11 Ω, which is consistent with the actual operating parameters of the site and with the voltage-to-current amplitude ratio of about 10 observed in Figure 8. The phase estimate of the proposed method is consistent with the fluctuation trend of the measured data. The proposed method also yields the most stable results: as Table 2 shows, it has the smallest standard deviations for both amplitude and phase and records zero convergence failures across all 10 sub-intervals, whereas M1–M4 fail in 2–4 of the 10 runs. This indicates that the DQN-based adaptive noise covariance adjustment mechanism can prevent filter divergence and maintain stable estimation performance even under severe, time-varying harmonic fluctuations in actual industrial scenarios.

To further verify the stability of the proposed method, the 600 groups of data are divided into 10 consecutive sub-intervals of 60 groups each, and the impedance is estimated separately for each sub-interval. The results are shown in Figure 9. The proposed method has the smallest fluctuation range in both amplitude and phase among all methods. The averages over the 10 sub-intervals are 10.1029 Ω and 1.2041 rad, which agree closely with the overall estimates in Table 2 (10.1173 Ω and 1.2018 rad), whereas the estimates of the comparison methods fluctuate considerably across sub-intervals. This demonstrates the stability of the proposed method in an actual industrial scenario with large harmonic fluctuations. Notably, the DQN model used for all EAF data analysis was trained exclusively on 3rd harmonic simulation data. Its consistent performance across the 3rd, 5th, 7th, and 11th harmonics (Table 3), where it retains the smallest amplitude and phase standard deviations at every order, indicates good cross-harmonic generalization and suggests that the learned adaptive strategy can be applied to different harmonic orders without retraining.
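The sub-interval consistency check described above can be sketched as a small helper that splits the record into 10 consecutive blocks, estimates the impedance in each, and reports the mean and sample standard deviation. This is a hedged illustration: `regression_impedance` is a hypothetical stand-in estimator (a complex least-squares fit of the assumed relation v = z·i + v0), not the proposed AEKF or any of M1–M4, and the helper assumes the phase estimates do not wrap around ±π.

```python
import numpy as np


def regression_impedance(v: np.ndarray, i: np.ndarray) -> complex:
    """Stand-in estimator (not the proposed AEKF): complex least-squares fit of
    v = z * i + v0, returning z."""
    design = np.column_stack([i, np.ones_like(i)])
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    return complex(coef[0])


def subinterval_consistency(v, i, n_sub=10, estimator=regression_impedance):
    """Estimate Z separately on n_sub consecutive, non-overlapping sub-intervals and
    return (mean, sample std) of the amplitude and of the phase in rad.

    Assumes the phase estimates stay away from the +/-pi wrap-around.
    """
    v_parts = np.array_split(np.asarray(v), n_sub)
    i_parts = np.array_split(np.asarray(i), n_sub)
    z = np.array([estimator(vp, ip) for vp, ip in zip(v_parts, i_parts)])
    amp, phase = np.abs(z), np.angle(z)
    return (amp.mean(), amp.std(ddof=1)), (phase.mean(), phase.std(ddof=1))
```

With 600 records, `np.array_split` yields exactly 10 blocks of 60, matching the split used here. Passing a different `estimator` lets each method be evaluated on identical blocks, and comparing the sub-interval mean with the whole-record estimate mirrors the comparison made against Table 2.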

5.2. DC Terminal Data

The DC terminal data are taken from a city power grid with multiple DC terminals. A total of 600 data points (20 samples per minute) of the 11th harmonic voltage and current were collected at the PCC on the 500 kV bus of a DC terminal; they are shown in Figure 10 [24]. With no load connected, the utility side is directly closed into a loop with conductors, and the measured system short-circuit capacity is S_short = 34,580 MVA.

As Figure 10a,b shows, the amplitude fluctuations of the harmonic voltage and harmonic current at the DC terminal are highly complex, which makes the utility harmonic impedance more difficult to estimate. Table 4 lists the utility harmonic impedance of the DC terminal system estimated by the five methods.

According to Ref. [39], the utility harmonic impedance can be approximated as |Z_s| ≈ h·V²/S_short, where h is the harmonic order and V is the rated voltage of the network. With h = 11, V = 500 kV, and S_short = 34,580 MVA, this gives |Z_s| ≈ 11 × 500²/34,580 ≈ 79.5 Ω. As Table 4 shows, the amplitude estimated by the proposed method (79.6336 ± 0.0872 Ω) is the closest to this theoretical value. The proposed method also has the smallest standard deviation and zero convergence failures across all 10 sub-intervals, while the phase estimates of the five methods are similar.

Similarly, we divide the 600 data points into 10 sub-intervals of 60 data points each, and Figure 11 shows the estimates for each sub-interval. As Figure 11a shows, although the amplitude estimates of the proposed method fluctuate across sub-intervals, they are more stable than those of the other four methods. Figure 11b shows the phase estimates of the five methods, and the fluctuation range of the proposed method is clearly smaller than that of the other four. Overall, the proposed method is the most robust of the five for estimating the utility harmonic impedance at the DC terminal. Importantly, the same DQN model trained on 3rd harmonic simulation data was applied directly to the 11th harmonic data from this 500 kV DC terminal, whose grid topology and operating characteristics differ completely from those of the 150 kV steel plant. This result further supports the cross-topology and cross-harmonic generalization ability of the proposed method and its suitability for deployment in various industrial power grid scenarios.
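The theoretical reference value obtained from the short-circuit capacity approximation of Ref. [39] can be reproduced with a few lines, which also makes the unit bookkeeping explicit (kV²/MVA yields ohms). The function name is hypothetical; the inputs are the values stated for the 500 kV DC terminal, and the dictionary holds the amplitude estimates from Table 4.

```python
def utility_impedance_from_short_circuit(h: int, v_rated_kv: float, s_short_mva: float) -> float:
    """Approximate |Z_s| = h * V^2 / S_short.

    With V in kV and S_short in MVA, kV^2 / MVA = (1e6 V^2) / (1e6 VA) = ohm.
    """
    return h * v_rated_kv ** 2 / s_short_mva


# Values stated for the 500 kV DC terminal (11th harmonic).
z_ref = utility_impedance_from_short_circuit(h=11, v_rated_kv=500.0, s_short_mva=34580.0)

# Amplitude estimates from Table 4 (ohm).
table4_amplitudes = {
    "proposed": 79.6336,
    "M1": 79.9247,
    "M2": 80.3419,
    "M3": 79.2138,
    "M4": 79.7757,
}
deviation = {name: abs(z - z_ref) for name, z in table4_amplitudes.items()}
closest = min(deviation, key=deviation.get)

print(f"|Z_s| ~ {z_ref:.1f} ohm, closest estimate: {closest}")
# prints "|Z_s| ~ 79.5 ohm, closest estimate: proposed"
```

The unrounded reference is about 79.526 Ω. The proposed estimate deviates from it by about 0.11 Ω, compared with 0.25 Ω for M4, the next closest.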

6. Discussion

Compared with mainstream existing methods, the main advantages of the proposed method are summarized as follows:

The proposed method does not require the three key assumptions of conventional methods (customer-side impedance much larger than utility-side impedance, negligible background harmonics, and independent harmonic sources on both sides), which greatly expands its applicability in practical industrial engineering scenarios.
The DQN-based RL optimization framework adaptively adjusts the noise covariance update rule in real time according to the grid operating state and filtering performance. This mitigates the accuracy degradation of the conventional fixed-rule AEKF in dynamic grid scenarios and provides stronger anti-interference and dynamic tracking ability.
Both simulation and field verification show that the proposed method achieves higher estimation accuracy than mainstream existing methods in the tested scenarios, and its estimates fluctuate least across consecutive time windows, providing stable and reliable impedance estimates for practical engineering.
The proposed method requires only the PCC harmonic voltage and current, which conventional power quality monitoring devices already measure. It needs no additional signal injection or equipment modification, does not interfere with system operation, and can be easily integrated into existing power quality monitoring systems. In addition, the DQN agent is trained offline, so online application has low computational complexity and can meet the real-time requirements of industrial monitoring systems.
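To illustrate how a discrete action can steer the noise covariance online, here is a minimal sketch under explicit assumptions; it is not the authors' state-space model, DQN state, reward, or action definition. It assumes the measurement relation v = z·i + v0 with state x = [Re z, Im z, Re v0, Im v0] and a random-walk state model. Because this assumed measurement is linear in the state, the EKF linearization is exact and the filter reduces to a standard Kalman filter. The 10 scale factors in `R_SCALES` are hypothetical, and `toy_q_values` is a hand-written heuristic standing in for a trained Q-network; only the greedy action selection mirrors how a trained DQN would be used at inference time.

```python
import numpy as np

# Hypothetical discrete action set: 10 multiplicative scale factors for the
# measurement noise covariance R (the values are assumptions, not the paper's).
R_SCALES = np.geomspace(0.1, 10.0, 10)


def measurement_matrix(i_pcc: complex) -> np.ndarray:
    """H for the assumed model v = z*i + v0, state x = [Re z, Im z, Re v0, Im v0]."""
    c, d = i_pcc.real, i_pcc.imag
    return np.array([[c, -d, 1.0, 0.0],
                     [d, c, 0.0, 1.0]])


def toy_q_values(innov: np.ndarray, r_base: float = 1e-2) -> np.ndarray:
    """Placeholder for a trained Q-network (hand-written heuristic, not learned).

    Scores each scale by how close it is, in log10 terms, to the observed
    per-component innovation power divided by r_base.
    """
    ratio = max(float(innov @ innov) / (2.0 * r_base), 1e-12)
    return -np.abs(np.log10(R_SCALES) - np.log10(ratio))


def adaptive_kf(v, i, q_fn, q_var=1e-6, r_base=1e-2):
    """Kalman filter whose R is rescaled every step by a greedily chosen action."""
    x = np.zeros(4)
    p = np.eye(4) * 100.0          # vague prior on the state
    q = np.eye(4) * q_var          # random-walk process noise
    for vk, ik in zip(v, i):
        p = p + q                  # predict: x_k = x_{k-1} + w_k
        h = measurement_matrix(ik)
        innov = np.array([vk.real, vk.imag]) - h @ x
        action = int(np.argmax(q_fn(innov)))   # greedy action, as at DQN inference time
        r = np.eye(2) * r_base * R_SCALES[action]
        s = h @ p @ h.T + r
        k = p @ h.T @ np.linalg.inv(s)
        x = x + k @ innov
        p = (np.eye(4) - k @ h) @ p
        p = 0.5 * (p + p.T)        # keep the covariance symmetric
    return complex(x[0], x[1]), complex(x[2], x[3])
```

Replacing `toy_q_values` with the forward pass of an offline-trained network would leave the filter loop unchanged. The per-sample cost is a few 4×4 matrix products plus one action selection, which is in line with the low online complexity noted above.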

7. Conclusions

This paper presents a DQN-optimized adaptive extended Kalman filter for harmonic impedance estimation in power systems. To address the inability of the conventional AEKF to adjust the noise covariance dynamically under time-varying grid conditions, we introduce deep reinforcement learning to optimize the parameter update mechanism. The DQN agent learns adaptive adjustment strategies from a large set of simulation scenarios covering diverse SNR levels, harmonic distributions, and load fluctuation states, enabling the filter to respond flexibly to complex operating environments.

Numerical simulations and field tests were conducted to evaluate the overall performance of the proposed method. The results indicate that the method performs well under regular grid operating conditions, with an average steady-state relative error below 0.2%. In harsh environments with severe noise interference, such as an SNR of 10 dB, the estimation error increases markedly, with maximum amplitude and phase relative errors of 41.2% and 20.5%, respectively. Compared with several classic harmonic impedance estimation methods, the proposed approach exhibits more stable performance across the full range of test conditions: although all methods show larger errors as the SNR decreases, the proposed method consistently achieves higher estimation accuracy.

The proposed method combines the advantages of adaptive filtering and deep reinforcement learning and is applicable to harmonic impedance estimation in complex power grids with time-varying characteristics. In future work, we will further optimize the model structure to reduce computational complexity and extend the algorithm to more complex multi-branch grid scenarios to improve its generality.

Author Contributions

Conceptualization, Z.T. and X.W.; methodology, Z.T.; software, Y.T. and X.X.; validation, Z.W. and Z.T.; formal analysis and investigation, C.T.; resources and data curation, F.T.; writing—original draft preparation, Z.T.; writing—review and editing, X.W. and X.X.; visualization, Y.T.; supervision, Z.T.; project administration and funding acquisition, Z.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Scientific Research Startup Foundation of Guang’an Institute of Technology, Grant No. KYQD-2026-051 and Grant No. KYQD-2026-163.

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

Zhirong Tang, Xin Wei, Fei Tan, Cong Tian, Ying Tang, and Xuedou Xiong declare no conflicts of interest. Zhaobin Wei is an employee of Datang Hydropower Science and Technology Research Institute Co., Ltd. The funders (Scientific Research Startup Foundation of Guang’an Institute of Technology, Grant No. KYQD-2026-051) and the affiliated company had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

PCC	Point of Common Coupling
RL	Reinforcement Learning
AEKF	Adaptive Extended Kalman Filter
EM	Expectation Maximization
ICA	Independent Component Analysis
DQN	Deep Q-Network
SNR	Signal-to-Noise Ratio
EAF	Electric Arc Furnace
DDPG	Deep Deterministic Policy Gradient
SAC	Soft Actor-Critic

References

1. Xue, T.; Wei, Z.; Du, X.; Liu, J. A New Impedance Measurement Method for Wind Farms Considering the Influence of Background Harmonics. Electronics 2025, 14, 501.
2. Zhou, X.; Xu, D.; Huang, Y. Impedance Characteristics and Harmonic Analysis of LCL-Type Grid-Connected Converter Cluster. Energies 2022, 15, 3708.
3. Singh, J.; Singh, S.P.; Verma, K.S.; Iqbal, A.; Kumar, B. Recent control techniques and management of AC microgrids: A critical review on issues, strategies, and future trends. Int. Trans. Electr. Energy Syst. 2021, 31, e13035.
4. Wang, Y.; Chen, H.; Yang, L.; Zhang, P.; Li, S. Non-Injected broadband impedance estimation based on injected impedance measurement. Int. J. Electr. Power Energy Syst. 2025, 172, 111363.
5. Mahlalela, J.S.; Massucco, S.; Mosaico, G.; Saviozzi, M. Harmonic Source Modeling Techniques for Wide-Area Distribution System Monitoring: A Systematic Review. Energies 2026, 19, 1810.
6. Zhu, G.; Dong, J.; Grazian, F.; Bauer, P. A Hybrid Modulation Scheme for Efficiency Optimization and Ripple Reduction in Secondary-Side Controlled Wireless Power Transfer Systems. IEEE Trans. Transp. Electrif. 2025, 11, 6840–6853.
7. Wang, C.; Xu, F.; Shu, Q.; Zheng, H.; Ma, Z.; Zhang, W. A Noninvasive Method to Estimate the Variable Utility Harmonic Impedance. IEEE Trans. Power Deliv. 2023, 38, 1747–1754.
8. Tang, Z.; Li, H.; Xu, F.; Shu, Q.; Jiang, Y. A Harmonic Impedance Estimation Method Based on the Cauchy Mixed Model. Math. Probl. Eng. 2020, 2020, 1580475.
9. Liu, Q.; Li, Y.; Luo, L.; Peng, Y.; Cao, Y. Power quality management of PV power plant with transformer integrated filtering method. IEEE Trans. Power Deliv. 2019, 34, 941–949.
10. Cella, U.; Naidu, B.R. Harmonic Equivalent Circuit Estimation Using Continuous Monitoring and Naturally Occurring Disturbances: Theory and Experimental Results. In Proceedings of the 2025 IEEE PES 35th Australasian Universities Power Engineering Conference (AUPEC), Brisbane, Australia, 29 September–1 October 2025; pp. 1–5.
11. Wang, C.; Yu, C.; Shu, Q. An Algorithm for Estimating Time-Varying Impedance at PCC Based on Numerical Variation. IEEE Trans. Instrum. Meas. 2023, 72, 1–9.
12. Xia, Y.; Tang, W.; Lin, X. Assessing the Harmonic Impedance Based on Least Squares Support Vector Machine. In Proceedings of the 2021 4th International Conference on Energy, Electrical and Power Engineering (CEEPE), Chongqing, China, 23–25 April 2021; pp. 987–992.
13. Xia, Y.; Tang, W. Study on Harmonic Impedance Estimation Based on Gaussian Mixture Regression Using Railway Power Supply Loads. Energies 2022, 15, 6952.
14. Cheng, Z.; Zhang, H.; Liu, H.; Xu, P. Assessment Method of Harmonic Contribution Based on Covariance Characteristic. In Proceedings of the 2024 11th International Forum on Electrical Engineering and Automation (IFEEA), Shenzhen, China, 22–24 November 2024; pp. 328–331.
15. Tang, X.; Xu, F.; Wang, W.; Wang, C.; Chen, C.; Fang, J. Harmonic Contribution Quantification for Multiple Harmonic Sources Based on Minimum Impedance Fluctuation. IEEE Access 2023, 11, 87409–87419.
16. Zhang, J.; Jiang, D.; Liu, C. Dominant Harmonic Source Determination Based on Comprehensive Minimum Fluctuation. In Proceedings of the 2025 8th International Conference on Power and Energy Applications (ICPEA), Shanghai, China, 23–25 October 2025; pp. 341–346.
17. Hui, J.; Yang, H.; Lin, S.; Ye, M. Assessing Utility Harmonic Impedance Based on the Covariance Characteristic of Random Vectors. IEEE Trans. Power Deliv. 2010, 25, 1778–1786.
18. Xu, F.; Yang, H.; Zhao, J.; Wang, Z.; Liu, Y. Study on Constraints for Harmonic Source Determination Using Active Power Direction. IEEE Trans. Power Deliv. 2018, 33, 2683–2692.
19. Karimzadeh, F.; Esmaeili, S.; Hosseinian, S. A Novel Method for Noninvasive Estimation of Utility Harmonic Impedance Based on Complex Independent Component Analysis. IEEE Trans. Power Deliv. 2015, 30, 1843–1852.
20. Novey, M.; Adali, T. Complex ICA by Negentropy Maximization. IEEE Trans. Neural Netw. 2008, 19, 596–609.
21. Karimzadeh, F.; Hossein, H.S.; Esmaeili, S. Method for determining utility and consumer harmonic contributions based on complex independent component analysis. IET Gener. Transm. Distrib. 2016, 10, 526–534.
22. Zhang, S.; Chang, X.; Li, S.; Wang, J. Evaluation of wind farm harmonic emission level based on parameter adaptive FastICA. In Proceedings of the 2022 International Conference on Wireless Communications, Electrical Engineering and Automation (WCEEA), Indianapolis, IN, USA, 15–16 October 2022; pp. 132–135.
23. Zhao, X.; Yang, H. A New Method to Calculate the Utility Harmonic Impedance Based on FastICA. IEEE Trans. Power Deliv. 2016, 31, 381–388.
24. Tang, Z.; Shu, Q.; Xu, F.; Jiang, Y. A Novelty Method for the Utility Harmonic Impedance Estimation Based on Gaussian Mixed Model. IET Gener. Transm. Distrib. 2020, 14, 2573–2580.
25. Akhlaghi, S.; Zhou, N. Adaptive multi-step prediction based EKF to power system dynamic state estimation. In Proceedings of the 2017 IEEE Power and Energy Conference at Illinois (PECI), Champaign, IL, USA, 23–24 February 2017; pp. 1–8.
26. Wang, T.; Huang, S.; Gao, M.; Wang, Z. Adaptive Extended Kalman Filter Based Dynamic Equivalent Method of PMSG Wind Farm Cluster. IEEE Trans. Ind. Appl. 2021, 57, 2908–2917.
27. Gong, C.; Sou, W.-K.; Lam, C.-S. Reinforcement Learning Based Sliding Mode Control for a Hybrid-STATCOM. IEEE Trans. Power Electron. 2023, 38, 6795–6800.
28. Cavus, M.; Jiang, J.; Allahham, A. Deep Multi-Task Forecasting of Net-Load and EV Charging with a Residual-Normalised GRU in IoT-Enabled Microgrids. Energies 2026, 19, 311.
29. Li, L.; Bai, Z.; Zhong, Y.; Zhang, W.; Qi, H. Reinforcement Learning-Enhanced Two-Stage Kalman Filter for Fault Diagnosis. In Proceedings of the 2025 Low-Altitude Economy Forum & International Conference on Low-Altitude Flight Technology and Unmanned Aerial Vehicle Application (LEF & ICLU), Guangzhou, China, 26–28 September 2025; pp. 48–53.
30. Xue, L.; Ma, B.; Liu, J.; Mu, C.; Wunsch, D.C. Extended Kalman Filter Based Resilient Formation Tracking Control of Multiple Unmanned Vehicles via Game-Theoretical Reinforcement Learning. IEEE Trans. Intell. Veh. 2023, 8, 2307–2318.
31. Wang, B.; Wang, T.; Tang, Y.; Huang, Y. Knowledge-GPT Guided Generalizable Reinforcement Learning for Intelligent Emergency Generator Tripping in Power System. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 20416–20428.
32. Carey, M.; Ramsay, J.O. Fast stable parameter estimation for linear dynamical systems. Comput. Stat. Data Anal. 2021, 156, 107124.
33. Sivanesan, N.; Thankaraj, A.; Satheesh, S.; John Peter, V.; Satheesh Kumar, J.; Uma Devi, M. Optimizing Climate Condition Prediction Using Q-Learning, Deep Q-Networks, and Policy Gradient Reinforcement Learning Methods. In Proceedings of the 2026 3rd International Conference on Emerging Trends in Engineering and Medical Sciences (ICETEMS), Nagpur, India, 6–7 March 2026; pp. 1–7.
34. Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous Control with Deep Reinforcement Learning. arXiv 2015, arXiv:1509.02971.
35. Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. arXiv 2018, arXiv:1801.01290.
36. Aldosari, W. Integration of Extended Kalman Filtering and Deep Reinforcement Learning for Autonomous UAV Navigation Under GPS Jamming. IEEE Access 2026, 14, 19649–19661.
37. Lu, Z.; Gursoy, M.C.; Mohan, C.K.; Varshney, P.K. Constrained Deep Reinforcement Learning for Cognitive Radar Resource Management. IEEE Trans. Radar Syst. 2026, 4, 627–644.
38. Shu, Q.; Wu, Y.; Xu, F.; Zheng, H. Estimate Utility Harmonic Impedance via the Correlation of Harmonic Measurements in Different Time Intervals. IEEE Trans. Power Deliv. 2020, 35, 2060–2067.
39. Fan, Y.; Xu, F.; Wang, C.; Shu, Q. A Harmonic Impedance on Utility Side Estimation Method Based on Laplace Mixture Model. IEEE Trans. Power Deliv. 2024, 39, 1774–1782.

Figure 1. Norton equivalent circuit model at the PCC.

Figure 2. Training convergence comparison of different RL algorithms. The proposed 10-action DQN converges within 500 episodes, while DDPG and SAC require 1500 and 1800 episodes respectively. DQN also achieves the highest final reward, indicating better overall performance.

Figure 3. Sensitivity analysis of reward weights α and β.

Figure 4. Flowchart of the proposed DRL-optimized AEKF method for utility harmonic impedance estimation.

Figure 5. Relative error of amplitude and phase under different background harmonic levels: (a) amplitude relative error, (b) phase relative error, (c) amplitude confidence intervals, (d) phase confidence intervals.

Figure 6. Relative error of amplitude and phase under different impedance ratios: (a) amplitude relative error, (b) phase relative error, (c) amplitude confidence intervals, (d) phase confidence intervals.

Figure 7. Relative error of amplitude and phase under different SNR levels: (a) amplitude relative error, (b) phase relative error, (c) amplitude confidence intervals, (d) phase confidence intervals.

Figure 8. EAF harmonic voltage and current amplitude at the PCC. (a) Voltage amplitude, (b) current amplitude.

Figure 9. The estimation results of five methods for each subinterval. (a) Amplitude, (b) phase.

Figure 10. DC terminal data harmonic voltage and current amplitude at the PCC. (a) Current amplitude, (b) voltage amplitude.

Figure 11. Estimation results of the five methods for each sub-interval of the DC terminal data (11th harmonic). (a) Amplitude, (b) phase.

Table 1. Performance comparison of different reinforcement learning algorithms.

Algorithm	Average Relative Error (%)	Maximum Transient Error (%)	Convergence Time (ms)	Training Episodes	Inference Time per Sample (μs)	p-Value vs. DQN
DQN	0.18	0.80	0.5	500	15.7	-
DDPG	0.21	1.30	1.2	1500	28.4	0.023
SAC	0.23	1.50	1.5	1800	32.1	0.008

Table 2. Impedance values estimated by five algorithms for the EAF data.

Algorithm	Amplitude/Ω	Phase/rad	Convergence Status (Failed Runs)
The proposed method	10.1173 ± 0.0421	1.2018 ± 0.0123	0/10
M1	11.2061 ± 0.1875	1.2733 ± 0.0452	3/10
M2	10.1883 ± 0.0964	1.1914 ± 0.0287	4/10
M3	10.0985 ± 0.0732	1.2177 ± 0.0215	2/10
M4	10.1352 ± 0.1200	1.1697 ± 0.0330	2/10

Table 3. Per-harmonic estimation results for the EAF site.

Harmonic Order		3rd	5th	7th	11th
The proposed method	Amplitude/Ω	10.1173 ± 0.0421	6.2451 ± 0.0317	4.5218 ± 0.0273	2.8974 ± 0.0225
The proposed method	Phase/rad	1.2018 ± 0.0123	1.1872 ± 0.0105	1.1735 ± 0.0098	1.1598 ± 0.0087
M1	Amplitude/Ω	11.2061 ± 0.1875	6.8924 ± 0.1523	4.9876 ± 0.1345	3.2145 ± 0.1127
M1	Phase/rad	1.2733 ± 0.0452	1.2541 ± 0.0387	1.2367 ± 0.0352	1.2189 ± 0.0318
M2	Amplitude/Ω	10.1883 ± 0.0964	6.3102 ± 0.0821	4.5721 ± 0.0735	2.9316 ± 0.0642
M2	Phase/rad	1.1914 ± 0.0287	1.1785 ± 0.0243	1.1659 ± 0.0217	1.1523 ± 0.0194
M3	Amplitude/Ω	10.0985 ± 0.0732	6.2137 ± 0.0619	4.4982 ± 0.0563	2.8763 ± 0.0491
M3	Phase/rad	1.2177 ± 0.0215	1.2034 ± 0.0189	1.1896 ± 0.0172	1.1754 ± 0.0156
M4	Amplitude/Ω	10.1352 ± 0.1200	6.2689 ± 0.1012	4.5437 ± 0.0925	2.9128 ± 0.0817
M4	Phase/rad	1.1697 ± 0.0330	1.1562 ± 0.0291	1.1438 ± 0.0265	1.1315 ± 0.0238

Table 4. Impedance values estimated by five algorithms for the DC terminal data.

Algorithm	Amplitude/Ω	Phase/rad	Convergence Status (Failed Runs)
The proposed method	79.6336 ± 0.0872	1.0810 ± 0.0094	0/10
M1	79.9247 ± 0.2135	1.0898 ± 0.0217	3/10
M2	80.3419 ± 0.1768	1.0819 ± 0.0183	3/10
M3	79.2138 ± 0.1429	1.0815 ± 0.0156	1/10
M4	79.7757 ± 0.1984	1.0711 ± 0.0241	2/10

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Tang, Z.; Wei, X.; Wei, Z.; Tan, F.; Tian, C.; Tang, Y.; Xiong, X. Deep Reinforcement-Learning-Optimized Adaptive EKF for Robust Utility Harmonic Impedance Estimation. Electronics 2026, 15, 2557. https://doi.org/10.3390/electronics15122557

AMA Style

Tang Z, Wei X, Wei Z, Tan F, Tian C, Tang Y, Xiong X. Deep Reinforcement-Learning-Optimized Adaptive EKF for Robust Utility Harmonic Impedance Estimation. Electronics. 2026; 15(12):2557. https://doi.org/10.3390/electronics15122557

Chicago/Turabian Style

Tang, Zhirong, Xin Wei, Zhaobin Wei, Fei Tan, Cong Tian, Ying Tang, and Xuedou Xiong. 2026. "Deep Reinforcement-Learning-Optimized Adaptive EKF for Robust Utility Harmonic Impedance Estimation" Electronics 15, no. 12: 2557. https://doi.org/10.3390/electronics15122557

APA Style

Tang, Z., Wei, X., Wei, Z., Tan, F., Tian, C., Tang, Y., & Xiong, X. (2026). Deep Reinforcement-Learning-Optimized Adaptive EKF for Robust Utility Harmonic Impedance Estimation. Electronics, 15(12), 2557. https://doi.org/10.3390/electronics15122557

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
