1. Introduction
Recently, with the wide application of power electronic devices and the high penetration of renewable energy, the sources of power pollution have increased rapidly in number. In order to ensure the safe, stable and economic operation of the power system, the need for power quality monitoring technology has grown accordingly.
Presently, power quality monitoring faces numerous challenges, such as data recovery under noise and bad data interference. Measurement errors, external attacks, signal loss and other factors inevitably produce bad data, noise interference and related problems [1,2,3]. In order to ensure the reliable and stable operation of the power system, an efficient recovery algorithm is of the utmost importance.
The existing methods can be roughly classified into three main categories: interpolation algorithms [4,5,6], state estimation methods [7,8,9,10,11,12,13] and low-rank matrix algorithms [14,15,16,17,18,19,20]. The advantages and drawbacks of these algorithms are briefly summarized as follows.
A Lagrange interpolation algorithm that can adaptively recover missing data is proposed in article [4]; however, its accuracy decreases when data are missing continuously or on a large scale. In article [5], an improved cubic spline interpolation method is proposed to address the recovery of continuous data loss. In article [6], a recovery algorithm based on non-linear interpolation is proposed, which recovers randomly missing data well at small missing ratios (about 10%). The aforementioned interpolation algorithms are computationally light, simple and fast. However, they all recover from a single phasor measurement unit (PMU) without considering the correlations between the data; their recovery effects are poor for large-scale and continuous missing data and their anti-interference performance is weak, especially in the presence of bad data.
In order to overcome the above shortcomings of the interpolation methods, state estimation algorithms have been developed. In article [7], a network embedding-based method that describes the spatial correlations among buses with graphs is proposed, which requires prior knowledge of topological parameters. In article [8], a missing data substitution method based on stochastic implicit Krasulina decomposition is introduced; the missing data can be inferred from past states or from the known data of other terminals. These methods require system topology parameters and several complete PMU datasets, i.e., redundant measurement terminals. Recently, the advent of machine learning technology has increased the accuracy and reliability of state estimation algorithms. In article [9], a convolutional neural network model was built that uses a data-driven method to alleviate the dependence on the parameters of the network's topological structure. In article [10], graph-based deep learning technology was used to solve the problem of power system state monitoring for the first time. Furthermore, some probabilistic machine learning models, such as the probabilistic deep autoencoder [11] and generative adversarial networks [12], do not recover the missing or bad data, but directly produce data with the same characteristics. Article [13] proposed a univariate temporal convolutional denoising autoencoder, which uses a convolutional denoising network in order to improve the method's robustness to bad data interference. These methods rely on historical data for training; thus, data cannot be recovered reliably without sufficient historical data. Moreover, they are limited by their high costs and poor transferability.
With the development of compressed sensing technology, sparse representations have shown greater robustness to noisy data [14]. Presently, low-rank matrix recovery techniques extend sparse representations to the low-rank setting of the matrix, mainly along three directions: robust principal component analysis [15], matrix completion [16,17] and low-rank representation [18,19,20]. In article [15], a QRPCA method that resolves the convex relaxation problem of quantified low-rank matrix recovery is proposed. A BSVT algorithm is proposed in article [16] that is robust to bad data, combining nuclear norm minimization and Bayesian estimation; in each iteration, the algorithm structure is optimized by an adaptive threshold. In article [17], the ADMM algorithm was used to recover missing data without estimating the rank of the target data matrix, offering lower computational complexity. In article [18], the concept of non-negative matrix factorization was first proposed, which outperforms the SVD algorithm. Accordingly, graph-regularized non-negative matrix factorization [19] and the LMFAGR algorithm [20] have been proposed; the former introduces a graph regularization constraint in order to improve the robustness to noise interference, while the latter combines graph construction with low-rank matrix factorization in order to improve the accuracy and the robustness to bad data interference. These methods have high computational complexity and are limited by minimum observable-data requirements and iterative convergence problems. In particular, when the amount of lost data increases, the recovery performance decreases dramatically. Moreover, the matrix becomes full-rank and disordered when the bad data interference is large, leading to the failure of most of these methods.
Based on the aforementioned analysis, the existing algorithms generally exhibit deficiencies in the recovery of power quality data under bad data interference. In addition, the existing power quality monitoring equipment mainly includes the power quality terminal (PQT), the PMU and the wide-area measurement system. The PQT uses various statistical methods in order to measure power quality. Owing to its numerous channels, data in different power quality index channels may interfere with each other, making the production of bad data more likely.
Therefore, this study presents a recovery method based on non-negative matrix factorization (NMF) for the harmonic data of PQTs. First, the similarity matrix is used to analyze the correlation of PQTs. Subsequently, the raw data are processed based on the threshold graph clustering algorithm; this improves the accuracy and reliability of recovery. Next, we propose a harmonic recovery model of NMF under multiple constraints and offer a solution with high accuracy and strong resistance to bad data interference. This includes low-rank part extraction and optimization processes. The low-rank part extraction is based on the two-sided random projection (BRP) method, the power scheme model and the QR decomposition model, while the low-rank optimization process uses a Lagrangian iterative algorithm.
Based on the simulation experiments that are presented in this study, the proposed algorithm can reliably recover partially missing harmonic data. Meanwhile, it has a strong anti-noise ability and robustness to bad data interference. In most complex environments, the presented harmonic data recovery method maintains high accuracy and reliability, producing satisfactory results. The main contributions of this paper are as follows:
In order to obtain the similarity matrix, the information entropy weighting of the harmonic data was introduced and a graph threshold clustering algorithm was used so as to effectively obtain and utilize the correlation between harmonic data from multiple PQTs.
A multi-constraint harmonic data restoration model based on the NMF algorithm was established. Low-rank, sparse and regularization constraints were introduced in order to more accurately distinguish and correct the bad data and noise interference mixed into the harmonic data.
The remainder of this article is arranged as follows. Section 2 introduces the correlation analysis and clustering methods for PQTs. In Section 3, we propose a harmonic data recovery model based on NMF under multiple constraints and provide a solution that guarantees convergence. In Section 4, the simulation results and analyses of the algorithm are presented. Section 5 summarizes the simulation and discusses the numerical findings. The conclusions and future research directions are provided in Section 6.
3. Data Recovery Algorithm Based on NMF under Multiple Constraints
This section addresses the recovery of the harmonic data that were processed as described in Section 2. We improve the NMF algorithm and combine it with the recovery of the harmonic data, introducing sparse, low-rank and graph regularization constraints, which ensure that the recovered harmonic data values are as close to the actual values as is possible in the real physical environment, better realizing the monitoring and control of the system state. Moreover, this process provides a solution method with high precision and strong resistance to bad data interference, which further improves the practicability of the algorithm.
3.1. Non-Negative Matrix Factorization
For a non-negative matrix X = {x1, x2, …, xj, …, xn} ∈ Rm×n (Xij ≥ 0), NMF attempts to find two non-negative matrices U and V, where U = [U1, U2, …, Uk] ∈ Rm×k and V = [V1, V2, …, Vn] ∈ Rk×n, whose product approximates the input matrix X, namely X ≈ UV. The objective function of NMF is
min{U ≥ 0, V ≥ 0} ‖X − UV‖F²,
where ‖·‖F is the F norm.
However, when the traditional NMF algorithm is applied to harmonic data recovery, it may lose the expression of the data's geometric relationships and destroy the low-rank structure. Furthermore, as mentioned in article [21], the performance of matrix decomposition can deteriorate slightly when the smoothness of the data is low. Based on the aforementioned problems, we propose a non-negative matrix factorization model with sparsity, graph regularization and low-rank constraints and then offer a solution method for the proposed model.
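As a concrete illustration of this objective, the following is a minimal sketch of the classical multiplicative-update NMF (not the constrained model proposed below); the rank, iteration count and synthetic data are hypothetical placeholders.

```python
import numpy as np

def nmf(X, k, n_iter=200, eps=1e-9):
    """Basic NMF via multiplicative updates, minimizing ||X - U V||_F^2."""
    m, n = X.shape
    rng = np.random.default_rng(0)
    U = rng.random((m, k))                      # non-negative basis matrix
    V = rng.random((k, n))                      # non-negative coefficient matrix
    for _ in range(n_iter):
        V *= (U.T @ X) / (U.T @ U @ V + eps)    # update coefficients
        U *= (X @ V.T) / (U @ V @ V.T + eps)    # update basis
    return U, V

# Hypothetical usage on a small non-negative matrix
X = np.abs(np.random.default_rng(1).normal(size=(8, 40)))
U, V = nmf(X, k=3)
print(np.linalg.norm(X - U @ V, "fro"))
```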
3.2. NMF under Multiple Constraints Recovery Model
Here, the raw harmonic data matrix X is approximately represented as L + E, where L is the low-rank part and E is the anomalous part (including missing data and the interference of noise and bad data). The objective function of the proposed algorithm is first presented in Equation (13), where the basis matrix U and the coefficient matrix V are derived from the NMF of the low-rank matrix L. rank(L) represents the rank of L and card(E) represents the number of non-zero entries in E. The variable r denotes the upper limit on the rank of L and k denotes the upper limit on the sparsity of E. The variable Ls is the graph Laplacian operator, λ and μ are regularization parameters, and ν is the sparse constraint parameter of the basis matrix U. tr(·) represents the trace of a matrix and ‖·‖1 is the L1 norm.
In Equation (13), the first term is the self-representation model loss under the constraint conditions rank(L) ≤ r and card(E) ≤ k. The low-rank constraint maximizes the retention of the low-rank information during decomposition, whereas the sparse constraint minimizes the energy of the anomalous data. The second term is the local non-negative orthogonal constraint of the spectral clustering, for which a Tikhonov regularization operator G and the Laplacian matrix Ls = D − W′ are introduced, where D is the degree matrix and W′ij is the entropy-weighted similarity matrix. The third term is the Tikhonov regularization constraint on U, which improves the smoothness of the harmonic data. The last term is a constraint on sparse representations of U.
According to the model that is proposed in Equation (13), the recovery problem of the raw harmonic data X is transformed into the optimal solution problem of U and V.
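Since Equation (13) itself is not reproduced above, the following LaTeX block is only a hedged reconstruction of its general shape, assembled from the terms just described (self-representation loss, graph-regularized trace term, Tikhonov term on U and L1 sparsity on U); the exact grouping and weighting of the terms in the original equation may differ.

```latex
\min_{L,\,E,\,U\ge 0,\,V\ge 0}\;
\underbrace{\|X-L-E\|_F^2}_{\text{self-representation loss}}
+\lambda\,\mathrm{tr}\!\left(VL_sV^{\mathsf T}\right)
+\mu\,\|GU\|_F^2
+\nu\,\|U\|_1
\quad\text{s.t.}\quad L\approx UV,\;
\operatorname{rank}(L)\le r,\;
\operatorname{card}(E)\le k
```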
3.3. Solution Method of the Proposed Model
As can be seen from Equation (13), the constraints are all distributed over U and V. Therefore, the solution process can be divided into two layers: the approximate calculation of the raw harmonic data X ≈ L + E and the constrained NMF solution. The former is the extraction process of the target matrix L, attempting to obtain the externally optimal L. The latter solves the internal optimization of the target matrix, which weakens the interference caused by noise or bad data and outputs the optimal state observation for each monitoring terminal.
3.3.1. Process of Extracting the Low-Rank Portion
According to Equation (13), the first part of the problem after decomposition can be described as:
Although the problem of solving both L and E in Equation (14) is non-convex, it can be converted into the following two convergent iterations:
To simplify the calculation, the BRP method proposed in [22] was used. For the input raw harmonic data matrix X ∈ Rm×n, we have Equation (16), where J1 ∈ Rn×r and J2 ∈ Rm×r are Gaussian random matrices and Y1 and Y2 are the right and left random projection matrices, respectively. Thus, the low-rank matrix L can be constructed as in Equation (17).
However, although the relaxation of X is completed in the random projection process, it may cause singular value distortion. To avoid this, the power scheme model is introduced: X is replaced with X* = (XX^T)^c X, which has the same singular vectors as X, where c is the power parameter. Moreover, in order to improve the accuracy and further weaken the noise level and the bad data interference, an orthogonal-triangular (QR) decomposition is performed on Y1 and Y2, as in Equations (18) and (19), where Q1 and Q2 are orthogonal matrices and R1 and R2 are upper triangular matrices. Equation (17) is then updated using Equations (18) and (19) as Equation (20).
We use Y1 and Y2 to update J1 and J2, respectively. Thus, we alternately iterate Y1, Y2, J1 and J2 in order to improve the accuracy of the extraction of the low-rank part L.
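A hedged sketch of the BRP construction of the low-rank part is given below; it follows the general recipe of [22] (Gaussian projections, QR factors), while the power scheme step and its corresponding root in the reconstruction are only noted in a comment, and the exact forms of Equations (16)–(20) may differ.

```python
import numpy as np

def brp_lowrank(X, r, seed=0):
    """Rank-r approximation of X via bilateral random projection (BRP).
    The power scheme X* = (X X^T)^c X described in the text (and the matching
    root in the reconstruction) is omitted here for brevity."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    J1 = rng.standard_normal((n, r))   # right Gaussian random matrix
    J2 = rng.standard_normal((m, r))   # left Gaussian random matrix
    Y1 = X @ J1                        # right random projection (m x r)
    Y2 = X.T @ J2                      # left random projection  (n x r)
    Q1, R1 = np.linalg.qr(Y1)          # QR step improves numerical stability
    Q2, R2 = np.linalg.qr(Y2)
    # L = Y1 (J2^T Y1)^{-1} Y2^T, written with the Q/R factors
    middle = np.linalg.inv(J2.T @ Y1)
    return Q1 @ (R1 @ middle @ R2.T) @ Q2.T

# Hypothetical usage: a synthetic rank-5 matrix plus small perturbations
rng = np.random.default_rng(1)
X = rng.random((20, 5)) @ rng.random((5, 60)) + 0.01 * rng.random((20, 60))
L = brp_lowrank(X, r=5)
print(np.linalg.norm(X - L, "fro") / np.linalg.norm(X, "fro"))
```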
Et in Equation (15) is obtained by hard thresholding X − Lt, as shown in Equation (21), where the matrix-element hard threshold operator retains the k largest-magnitude elements of |X − Lt| and sets the others to zero. In addition, the allowable threshold error is set to 0.001 and the iterations are repeated cyclically until the error requirement is met. Thus, we can complete the optimal solution process for L and E. The L obtained in this process is the externally optimal harmonic data, in which the interference of bad data and noise has been filtered out to the utmost extent. However, matrix L still cannot meet the requirements of the recovery accuracy of missing data or thoroughly resolve the interference of bad data, because this process does not consider the optimal recovery performance of the harmonic data of individual PQTs. Thus, matrix L is passed to the internal optimization process.
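Putting the pieces of this subsection together, the following is a hedged sketch of the alternating extraction of L and E (Equations (14), (15) and (21)): a rank-r BRP approximation of X − E, followed by a hard threshold that keeps the k largest-magnitude entries of X − L, repeated until the residual change falls below the allowable error; brp_lowrank refers to the sketch above and the stopping rule is my reading of the text.

```python
import numpy as np

def hard_threshold(M, k):
    """Keep the k largest-magnitude entries of M and set the rest to zero."""
    out = np.zeros_like(M)
    if k > 0:
        idx = np.argsort(np.abs(M), axis=None)[-k:]   # flat indices of top-k |M|
        out.flat[idx] = M.flat[idx]
    return out

def extract_low_rank(X, r, k, tol=1e-3, max_iter=100):
    """Alternate L (rank-r BRP of X - E) and E (top-k hard threshold of X - L)
    until the change in the residual ||X - L - E||_F is below tol."""
    E = np.zeros_like(X)
    prev_err = np.inf
    for _ in range(max_iter):
        L = brp_lowrank(X - E, r)                     # external low-rank step
        E = hard_threshold(X - L, k)                  # anomalous (sparse) part
        err = np.linalg.norm(X - L - E, "fro")
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return L, E
```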
3.3.2. Internal Optimization Process Based on the Solution of U and V
According to Equation (13), the internal optimization problem of the low-rank part L, which is obtained as described in the previous section, can be described as follows:
For a known linear system Ax = b with A ∈ Rm×n and m < n, minimizing the zero norm of x is NP-hard, but an approximate solution x′ can be obtained by minimizing the L1 norm instead. Similarly, the NMF optimization model of the harmonic low-rank part uses the L1 norm to add the penalty term of the sparse constraint on the basis matrix U. Thus, Equation (22) can be rewritten as Equation (23), where S is the NMF residual operator and G is the Tikhonov regularization operator. Next, Equation (23) is expanded as a Lagrangian function of U and V, letting δ and ξ be the Lagrange multipliers with δ = [δia] and ξ = [ξaj]. According to the trace calculation properties of linear algebra, the Lagrangian function K(L, U, V) can be constructed as Equation (24).
Subsequently, the partial derivative method is used to solve the transformed unconstrained Equation (24) in order to obtain the matrices U and V. Combined with the KKT conditions of the Lagrangian function, the iterative update expressions for uia and vaj in Equation (25) can be obtained, where f denotes the iteration number, W′ is the similarity matrix and D is the degree matrix. The function sgn(·) is the sign (symbolic extraction) function: when Wij is positive, sgn(Wij) = 1; when Wij is negative, sgn(Wij) = −1; and when Wij = 0, sgn(Wij) = 0.
The allowable error threshold is set to 0.001 and the updates are iterated cyclically until the error requirement is met. In this way, we complete the optimization process for L.
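Because Equation (25) is not reproduced above, the following is only a hedged sketch of graph-regularized NMF multiplicative updates of the kind this derivation leads to; it keeps the similarity matrix W′ and degree matrix D but omits the Tikhonov and L1/sgn terms of the full update, and it assumes the graph regularization acts on the coefficient matrix V (if the similarity graph is defined over the rows of L, the analogous term would act on U instead).

```python
import numpy as np

def constrained_nmf(L, W, k, lam=0.1, n_iter=300, eps=1e-9):
    """Graph-regularized multiplicative updates for L ~ U V, using the
    Laplacian Ls = D - W of an n x n similarity matrix W over the columns of L."""
    m, n = L.shape
    D = np.diag(W.sum(axis=1))                 # degree matrix of the similarity graph
    rng = np.random.default_rng(0)
    U = rng.random((m, k))
    V = rng.random((k, n))
    for _ in range(n_iter):
        U *= (L @ V.T) / (U @ V @ V.T + eps)
        V *= (U.T @ L + lam * V @ W) / (U.T @ U @ V + lam * V @ D + eps)
    return U, V
```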
A method that may be used to recover missing harmonic data from a power quality terminal under bad data and noise interference is proposed here. The flowchart of the proposed method is shown in Figure 2 and the steps are summarized as follows:
- Step 1: Similarity analysis. The entropy-weighted diffusion distance is introduced in order to obtain the similarity matrix W′ and the best group number q using Equations (1)–(4);
- Step 2: Based on the similarity, according to D2(xi, xj)′ ≤ Ω, the PQTs are clustered into q subsets;
- Step 3: A data recovery model of NMF under multiple constraints is built and its solution is divided into two processes: low-rank information extraction and internal optimization;
- Step 4: The low-rank part is extracted. Based on Equations (15)–(21), an iterative algorithm built on the BRP method with the power scheme model and QR decomposition solves for the low-rank part L;
- Step 5: The internal optimization of L is completed. The Lagrange algorithm transforms Equation (23) into the unconstrained optimization problem of Equation (24). Combined with the KKT conditions, the update rule of Equation (25) is obtained to iterate U and V, and the recovery of the harmonic data is completed (see the sketch after this list).
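For orientation only, a hedged end-to-end sketch of Steps 3–5 follows, reusing the illustrative extract_low_rank and constrained_nmf functions sketched earlier; the clustering of Steps 1–2 is assumed to have already grouped the PQT data into X and produced the entropy-weighted similarity matrix W′, and the function names are not the paper's implementation.

```python
def recover_harmonic_data(X, W_prime, r, k_sparse, nmf_rank):
    """Step 4: extract the low-rank part L of X; Step 5: refine L by the
    constrained NMF and return the recovered harmonic data U @ V."""
    L, _E = extract_low_rank(X, r, k_sparse)
    U, V = constrained_nmf(L, W_prime, nmf_rank)
    return U @ V
```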
4. Simulation Analysis
This section presents the simulation analyses of the proposed graph clustering and NMF algorithm under multiple constraints, using the normalized mean absolute deviation (NMAD) as the evaluation standard of the recovery effect. In the NMAD calculation, xi is the raw harmonic data value, yi is the value after recovery and ‖·‖2 denotes the L2 norm.
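A hedged sketch of the metric is shown below, assuming NMAD is the L2 norm of the recovery error normalized by the L2 norm of the raw data and expressed as a percentage; the exact normalization used in the paper's equation may differ.

```python
import numpy as np

def nmad(x_raw, y_rec):
    """Normalized deviation: ||y - x||_2 / ||x||_2, expressed as a percentage."""
    x_raw = np.asarray(x_raw, dtype=float)
    y_rec = np.asarray(y_rec, dtype=float)
    return 100.0 * np.linalg.norm(y_rec - x_raw) / np.linalg.norm(x_raw)
```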
In order to objectively evaluate the performance of the proposed method, this section compares it with the cubic spline interpolation, OLAP and ADMM algorithms. The harmonic data collected in our research mainly include the harmonic current and harmonic voltage; we selected the harmonic voltage for analysis. In order to increase the practicality and accuracy of the simulation analysis, we selected the main harmonic components of harmonic data measured in northwest China for the simulation. In order to determine the main harmonic components, we measured the harmonic voltage content for harmonic orders up to 50 and collected the data over 16 h at 2 h intervals. Subsequently, we arranged the measurements in descending order after calculating their 95% probability values.
In Table 1, Hc-i represents the ith harmonic component. As observed, for the northwest region, the third, fifth and seventh harmonics carry the largest weights. In order to simulate the real power quality environment as closely as possible, the third, fifth and seventh harmonic voltages were injected into the simulation system and the harmonic data were sampled at 3 s intervals, similarly to the harmonic data of the PQT. The data in this study were obtained by averaging 50 replicated experiments. Meanwhile, in order to verify the robustness of the algorithm under bad data interference with different offset ranges and proportions, parameters for different offset ranges were set in the simulation. However, in actual measurement tasks, the offset ranges of such bad harmonic data are difficult to obtain.
In this part, method 1 represents the proposed algorithm, method 2 represents the cubic spline interpolation algorithm, method 3 represents the ADMM algorithm and method 4 represents the OLAP algorithm.
4.1. Missing Harmonic Data Recovery without Interference
In article [5], power harmonic data loss is divided into two main types: single-point loss and continuous missing data. Thus, in this study, 35 harmonic data points within 105 s of each group (the third, fifth and seventh harmonics) were used for single-point loss and continuous missing data (10 consecutive lost points). The results are shown in Table 2, Table 3 and Table 4 and the recovered missing data are tagged in gray.
Using the data provided in Table 2, Table 3 and Table 4, the recovery effects of the four methods were evaluated using the NMAD. Comparing the three situations with the third, fifth and seventh harmonics, each algorithm showed a similar NMAD tendency for the same type of data loss in the different situations. It can be observed that the four algorithms show good recovery performance for single-point loss of the third, fifth and seventh harmonics' data, with NMADs < 0.02%. However, for 10 continuous missing data points, the algorithms perform differently. The proposed algorithm's NMADs are 0.25%, 0.21% and 0.22% for the third, fifth and seventh harmonics, respectively, showing the best recovery for longer missing intervals. The average NMAD of the cubic spline interpolation algorithm is approximately 6.41%, that of the OLAP algorithm is approximately 1.44% and that of the ADMM algorithm is approximately 1.23%. The recovery effects of cubic spline interpolation are the worst because only a single PQT's harmonic data were used for this recovery. Its recovery effects are good for a small quantity of missing data; however, when the ratio of continuous data loss is large, the remaining intact data cannot estimate the state over a longer missing interval, severely diminishing the recovery performance. In contrast, the OLAP and ADMM algorithms utilize multiple PQTs for cooperative recovery. Although they cannot infer the missing state from their own good data when the duration of the missing data is long, they can use the harmonic data of other highly correlated PQTs without data loss in order to approximate the state of these terminals. However, the recovery performance of these two algorithms is still slightly lower than that of the proposed algorithm.
In order to better explore the effects of the proposed algorithm in situations of continuous missing data, we simulated the recovery of the fifth harmonic data with different missing ratios. The results are shown in Figure 3. The total number of datapoints included in the following simulation is 200 (over 10 min).
As shown in Figure 3, the accuracy of each algorithm decreases with an increase in the continuous missing data ratio, but the rising trends of their NMADs differ. Compared to the other algorithms, the NMADs of the cubic spline interpolation algorithm are always high; with more than 50% continuous data loss, its NMAD exceeds 30%, so it cannot provide reliable data recovery. In contrast, when the continuous missing data ratio is lower than 30%, the NMADs of the OLAP, ADMM and proposed algorithms are low (1.43%, 1.2% and 0.25%, respectively). However, as the missing ratio increases beyond 30%, the NMADs of the OLAP and ADMM algorithms increase and their recovery effects deteriorate rapidly. This is because, when the amount of missing data increases, neither of these algorithms can substitute the harmonic data properly.
In contrast, the data recovered by our proposed algorithm remain strongly correlated and smoother owing to the introduction of the graph regularization and sparse constraints. Therefore, when the data missing ratio increased, although the algorithm also produced additional errors, the errors were smaller. Up to a 60% missing ratio, the proposed algorithm always had excellent recovery performance and the NMAD did not increase significantly but rather accumulated slowly with the increase in missing data. Moreover, in an extreme situation with 70% of the continuous data missing, the proposed algorithm still achieved an accuracy of approximately 95%. Therefore, this algorithm has excellent recovery performance and high stability for single-point and continuous data losses, providing good recovery of the PQTs' lost harmonic data.
4.2. Impact of Noise Interference on the Recovery of Missing Data
In an actual power system, the noise interference generated by the monitoring terminals and the environment severely influences the quality of the power data. In order to verify the noise resistance of the proposed algorithm, Gaussian white noise at SNR = 60 dB was added to the fifth harmonic data described above and the recovery effects were evaluated at missing ratios varied in 10% intervals.
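For reference, a small sketch of how Gaussian white noise at a prescribed SNR (in dB) can be added to a harmonic signal for such a test is shown below; the seeding and function name are illustrative.

```python
import numpy as np

def add_awgn(x, snr_db, seed=0):
    """Add zero-mean Gaussian white noise so that the signal-to-noise ratio
    equals snr_db (power ratio in decibels)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return x + rng.normal(scale=np.sqrt(noise_power), size=x.shape)

# e.g. noisy = add_awgn(fifth_harmonic_series, snr_db=60)   # hypothetical data
```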
Combining the results of Figure 3 and Figure 4, following the introduction of Gaussian white noise, the performance of the proposed algorithm changes only slightly, with NMAD changes within 0.08%, when the missing ratio is lower than 50%. Further, at the 60% and 70% missing ratios, the NMAD increases to 0.12% and 0.17%, respectively, which is still within a satisfactory range, showing excellent recovery effects and noise resistance. The ADMM algorithm had slight fluctuations in the NMAD, which remained within 0.05% when the data loss was within 50%, showing excellent noise resistance. However, its NMADs increased significantly when the missing ratio exceeded 50%, reaching 1.12% at a 60% missing ratio and 1.23% at 70%. This is because the influence of noise is explicitly considered in these two algorithms. A Gaussian noise matrix is introduced in the ADMM algorithm, which weakens the influence of noise through the iteration process; however, this weakening effect decreases when the missing ratio increases. In the proposed algorithm, the introduction of the anomalous data matrix E and the regularization constraints leads to low energy and good smoothness of the anomalous data, which effectively eliminates the noise interference. In contrast, the cubic interpolation and OLAP algorithms, which do not consider noise interference, are sensitive to noise. Thus, their NMADs increase significantly compared with their performance in situations without interference, especially the cubic interpolation algorithm, whose NMADs increase by over 10% on average.
The recovery effects of the algorithms were further tested, as shown in Figure 5, for the proposed algorithm and the OLAP and ADMM methods (excluding the cubic spline interpolation algorithm) with 20% continuous loss and an SNR ranging from 30 to 90.
As shown, the recovery effect of each algorithm decreases with an increase in the noise level. Although the NMADs of the proposed algorithm also increase significantly when the SNR of the noise interference is less than 50, the values remain below 1%; when the SNR is no less than 50, the NMADs maintain a low rate of increase, showing a strong ability to resist noise interference. Although ADMM also demonstrates good recovery accuracy, with NMADs ranging from 0.14% to 1.09%, its accuracy is always lower than that of the proposed algorithm and the gap widens when the SNR is less than 50. Compared with these two algorithms, the NMADs of OLAP are slightly higher, ranging from 0.48% to 2.41%. The aforementioned simulation further verifies the effectiveness of the proposed algorithm.
4.3. Recovery of Harmonic Data with the Interference of Bad Data
Bad data is another type of interference. In the case of external attacks, measurement errors or equipment failures, bad data interference may cause misjudgment and bias the state estimation of the power quality of the system. The modes of bad data interference can be classified as follows: first, distortion of the power quality terminal's data caused by measurement errors or attacks by an external signal, such as an abnormal fluctuation of the power quality data's amplitude [23]; second, failure of the power quality terminal or a transmission delay, leading to a mixture of normal harmonic data with a small amount of random wrong data.
A small amount of random wrong data has only a minor effect on harmonic data recovery, smaller in magnitude than the recovery error caused by a single randomly missing data point. Therefore, we selected the bad data with a significant impact on the recovery performance for verification. In order to test the effect of bad data on the recovery performance of the proposed algorithm, we analyzed it along two dimensions: the bad data ratio and the maximum amplitude offset range.
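A hedged sketch of how such bad data can be injected for testing is given below: a chosen fraction of samples is perturbed by a random relative amplitude offset of up to max_offset (e.g., 0.1 for a 10% maximum offset); the exact perturbation model used in the paper may differ.

```python
import numpy as np

def inject_bad_data(x, ratio, max_offset, seed=0):
    """Perturb a fraction `ratio` of entries of x by random relative
    amplitude offsets drawn uniformly from [-max_offset, +max_offset]."""
    x = np.asarray(x, dtype=float).copy()
    rng = np.random.default_rng(seed)
    n_bad = int(round(ratio * x.size))
    idx = rng.choice(x.size, size=n_bad, replace=False)
    x.flat[idx] *= 1.0 + rng.uniform(-max_offset, max_offset, size=n_bad)
    return x
```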
Table 5 compares the impact of different ratios of bad data, with a 10% offset range, on the recovery effects.
From Table 5, it can be noted that each algorithm was negatively influenced by increasing ratios of bad data, especially the cubic spline interpolation algorithm, which cannot efficiently identify bad data because it ignores the correlation of the harmonic data; this results in the severe deterioration of its recovery effect. When the bad data ratio increases to 20%, its NMAD rises to 18.99%, which indicates that it cannot complete the recovery effectively. In contrast, the OLAP, ADMM and proposed algorithms show good recovery effects, with NMADs lower than 1.5% for 10% bad data. This is because these methods exploit the low-rank character of harmonic data, which enables them to correct the harmonic data, including the bad data, without modifying the system's topology and parameters [24]. Among them, our proposed algorithm shows the best resistance to bad data interference, with the lowest NMADs.
When the ratio of bad data increases beyond 20%, the NMADs of the OLAP and ADMM algorithms increase significantly, similarly to the behavior observed previously for large missing ratios and noise interference. Owing to the loss of excessive good data, both algorithms reach the upper limit of their data recovery capacity, resulting in poor recovery effects. In contrast, the proposed algorithm utilizes the low-rank property of the harmonic data and appropriately sparsifies the raw harmonic data in the recovery process, which improves its smoothness and its ability to resist bad data. Thus, its recovery performance for each harmonic order changes only slightly with increasing ratios of bad data and high accuracy is always maintained.
Table 6 compares the impact of different offset ranges on the recovery performance of the four methods with 10% bad data.
From Table 6, the aforementioned conclusion is further verified. When the ratio of bad data is low (10%), apart from the high NMADs of the cubic spline interpolation algorithm, the algorithms show good recovery effects and the growth in NMAD is approximately proportional to the growth of the amplitude offsets. Among them, the NMADs of our proposed algorithm grow the most slowly with an increasing deviation range of bad data, reflecting the best recovery performance.
Based on the aforementioned simulation results, and in order to better support power state monitoring in different power system environments, we tested the critical turning point of the robustness to bad data interference. The third, fifth and seventh harmonics' data showed similar trends and rates of change in the presence of bad data interference; thus, we selected the third harmonic data, which had the largest total NMADs, for the test. Moreover, we simultaneously varied the amplitude offset range and the ratio of bad data and calculated the NMADs shown in Table 7.
The gray cells indicate that the NMADs increased substantially and exceeded 1.5%. As the gray cells show, with an increase in the ratio of bad data or in the maximum amplitude deviation range, the robustness of the algorithm to bad data interference decreases rapidly, and the power monitoring task can therefore no longer be completed accurately and reliably. The yellow cells indicate the critical turning points of the robustness to bad data interference, such as a 20% bad data ratio at a 20% maximum amplitude offset and a 40% bad data ratio at a 10% maximum amplitude offset. Within the corresponding range, the NMADs of the proposed algorithm grow slowly, exhibiting strong resistance to bad data interference.
4.4. Robustness Test against Bad Data Interference in Noisy Environments
This section considers the robustness of the proposed algorithm to bad data interference in complex environments with different noise levels. Following the aforementioned sections, in order to enhance the practicality of the results and the simulation, we introduced Gaussian white noise interference with an SNR from 30 to 80 and bad data interference at ratios from 2.5% to 20%. Based on the aforementioned test results, the maximum amplitude offset range of the bad data was set at 20%. We chose the ADMM algorithm as the comparison algorithm. The results are shown in Figure 6.
In noisy environments, when the ratio of bad data is less than 10%, the proposed and ADMM algorithms have good recovery effects. Their NMADs are approximately equal to the superposition of the NMADs obtained when the bad data or the noise interference acts individually; that is, they cause no additional loss of recovery accuracy. However, when the ratio of bad data further increases to 15% and 20%, the NMADs of both algorithms increase to varying degrees, far exceeding the superposition of the two effects acting separately. In the case of 20% bad data, the NMAD of the ADMM algorithm is 2.03% in a low-noise environment with SNR = 80, which is much higher than the 1.52% obtained without noise and the 0.23% obtained with 20% data loss when considering noise alone. The NMAD further increases when the SNR is less than 50. Moreover, when the fifth harmonic data include 20% bad data interference and noise with SNR = 30, the NMAD is approximately 5.12%.
In noisy environments, however, the change in the NMADs of the proposed algorithm with no more than 15% bad data is smooth. Although its NMAD also increases significantly when it is exposed to 20% bad data interference, it always remains at approximately 2%, a result that is better than that of the ADMM algorithm. Thus, we have verified the robustness of the algorithm against bad data interference in noisy environments.
6. Conclusions
This study presents an algorithm based on NMF under multiple constraints to recover the harmonic data provided by PQTs under bad data and noise interference. Meanwhile, the data correlation was properly utilized by a graph clustering algorithm based on the entropy-weighted diffusion distance, providing good recovery effects and a strong anti-interference ability for data loss accompanied by bad data interference. In addition, the low-rank, sparse and regularization constraints introduced in the NMF model enhance the protection and utilization of the low-rank features, make the recovered data smooth, and offer better resistance to noise and bad data interference. These advantages were verified by missing data recovery, noise interference and bad data interference tests. Through the analysis and discussion of the simulation results, the algorithm was found to have high recovery accuracy for harmonic data with single-point loss and with less than 60% continuous data loss. Furthermore, it maintains high accuracy and reliability under Gaussian white noise interference with an SNR of at least 50 and under 40% or less bad data interference acting individually. Moreover, in a noisy environment with SNR = 50 and within 15% bad data interference at a 20% amplitude offset, the algorithm still exhibited strong robustness to bad data interference and the error increase was insignificant; thus, it can be applied efficiently to PQT power quality monitoring.
However, the performance test of the proposed algorithm was built on a 10 min dataset with 200 datapoints; the recovery performance of the algorithm on large datasets has not been considered. Meanwhile, although we provide a convergent iterative algorithm for the proposed model, its computational complexity has not been analyzed. Therefore, we plan to conduct further research on reducing the computational time and complexity of the algorithm.