1. Introduction
Recently, with the wide application of power electronic devices and the high penetration of renewable energy, the sources of power pollution have increased rapidly in number. In order to ensure the safe, stable and economic operation of the power system, the need for power quality monitoring technology has grown accordingly.
Presently, power quality monitoring faces numerous challenges, such as data recovery under noise and bad data interference. Measurement errors, external attacks, signal loss and other factors inevitably produce bad data, noise interference and related problems [1,2,3]. In order to ensure the reliable and stable operation of the power system, an efficient recovery algorithm is of the utmost importance.
The existing methods can be roughly classified into three main categories: interpolation algorithms [4,5,6], state estimation methods [7,8,9,10,11,12,13] and low-rank matrix algorithms [14,15,16,17,18,19,20]. The advantages and drawbacks of these algorithms are briefly summarized as follows.
A Lagrange interpolation algorithm that can adaptively recover missing data is proposed in article [4]; however, its accuracy decreases when data are missing continuously or on a large scale. In article [5], an improved cubic spline interpolation method is proposed to address the recovery of continuous data loss. In article [6], a recovery algorithm based on non-linear interpolation is proposed, which recovers randomly missing data well at small missing ratios (about 10%). The aforementioned interpolation algorithms are computationally light, simple and fast. However, they all recover from a single phasor measurement unit (PMU) without considering the correlations between the data; their recovery effects are poor for large-scale and continuous missing data and their anti-interference performance is weak, especially in the presence of bad data.
In order to overcome the above shortcomings of the interpolation methods, state estimation algorithms have been developed. In article [7], a network embedding-based method that describes the spatial correlations among buses with graphs is proposed, which requires prior knowledge of topological parameters. In article [8], a missing data substitution method based on stochastic implicit Krasulina decomposition is introduced; the missing data can be inferred from past states or from the known data of other terminals. These methods require system topology parameters and several complete PMU datasets, i.e., redundant measurement terminals. Recently, the advent of machine learning technology has increased the accuracy and reliability of state estimation algorithms. In article [9], a convolutional neural network model was built that uses a data-driven method to alleviate the dependence on the parameters of the network's topological structure. In article [10], graph-based deep learning technology was used to solve the problem of power system state monitoring for the first time. Furthermore, some probabilistic machine learning models, such as the probabilistic deep autoencoder [11] and generative adversarial networks [12], do not recover the missing or bad data, but directly produce data with the same characteristics. Article [13] proposed a univariate temporal convolutional denoising autoencoder, which uses a convolutional denoising network in order to improve the method's robustness to bad data interference. These methods rely on historical data for training; thus, data cannot be recovered reliably without sufficient historical data. Moreover, they are limited by their high costs and poor transferability.
With the development of compressed sensing technology, sparse representations have shown greater robustness to noisy data [14]. Presently, low-rank matrix recovery techniques extend sparse representations to the low-rank setting of the matrix, mainly along three directions: robust principal component analysis [15], matrix completion [16,17] and low-rank representation [18,19,20]. In article [15], a QRPCA method that resolves the convex relaxation problem of quantified low-rank matrix recovery is proposed. A BSVT algorithm is proposed in article [16] that is robust to bad data, combining nuclear norm minimization and Bayesian estimation; in each iteration, the algorithm structure is optimized by an adaptive threshold. In article [17], the ADMM algorithm was used to recover missing data without estimating the rank of the target data matrix, offering lower computational complexity. In article [18], the concept of non-negative matrix factorization was first proposed, which outperforms the SVD algorithm. Accordingly, graph-regularized non-negative matrix factorization [19] and the LMFAGR algorithm [20] have been proposed; the former introduces a graph regularization constraint in order to improve the robustness to noise interference, while the latter combines graph construction with low-rank matrix factorization in order to improve the accuracy and the robustness to bad data interference. These methods have high computational complexity and are limited by minimum observable-data requirements and iterative convergence problems. In particular, when the amount of lost data increases, the recovery performance decreases dramatically. Moreover, the matrix becomes full-rank and disordered when the bad data interference is large, leading to the failure of most of these methods.
Based on the aforementioned analysis, the existing algorithms generally exhibit deficiencies in the recovery of power quality data under bad data interference. In addition, the existing power quality monitoring equipment mainly includes the power quality terminal (PQT), the PMU and the wide-area measurement system. The PQT uses various statistical methods in order to measure power quality. Owing to its numerous channels, data in different power quality index channels may interfere with each other, making the production of bad data more likely.
Therefore, this study presents a recovery method based on non-negative matrix factorization (NMF) for the harmonic data of PQTs. First, the similarity matrix is used to analyze the correlation of PQTs. Subsequently, the raw data are processed based on the threshold graph clustering algorithm; this improves the accuracy and reliability of recovery. Next, we propose a harmonic recovery model of NMF under multiple constraints and offer a solution with high accuracy and strong resistance to bad data interference. This includes low-rank part extraction and optimization processes. The low-rank part extraction is based on the two-sided random projection (BRP) method, the power scheme model and the QR decomposition model, while the low-rank optimization process uses a Lagrangian iterative algorithm.
Based on the simulation experiments that are presented in this study, the proposed algorithm can reliably recover partially missing harmonic data. Meanwhile, it has a strong anti-noise ability and robustness to bad data interference. In most complex environments, the presented harmonic data recovery method maintains high accuracy and reliability, producing satisfactory results. The main contributions of this paper are as follows:
In order to obtain the similarity matrix, the information entropy weighting of the harmonic data was introduced and a graph threshold clustering algorithm was used so as to effectively obtain and utilize the correlation between harmonic data from multiple PQTs.
A multi-constraint harmonic data restoration model based on the NMF algorithm was established. Low-rank, sparse and regularization constraints were introduced in order to more accurately distinguish and correct the bad data and noise interference mixed into the harmonic data.
The remainder of this article is arranged as follows. Section 2 introduces the correlation analysis and clustering methods for PQTs. In Section 3, we propose a harmonic data recovery model based on NMF under multiple constraints and provide a solution that guarantees convergence. In Section 4, the simulation results and analyses of the algorithm are presented. Section 5 summarizes the simulation and discusses the numerical findings. The conclusions and future research directions are provided in Section 6.
3. Data Recovery Algorithm Based on NMF under Multiple Constraints
This section addresses the recovery of the harmonic data that were processed as described in Section 2. We improve the NMF algorithm and combine it with the recovery of the harmonic data, introducing sparse, low-rank and graph regularization constraints, which ensure that the recovered harmonic data values are as close to the actual values as is possible in the real physical environment, better realizing the monitoring and control of the system state. Moreover, this process provides a solution method with high precision and strong resistance to bad data interference, which further improves the practicability of the algorithm.
3.1. Non-Negative Matrix Factorization
For a non-negative matrix X = {x1, x2, …, xj, …, xn} ∈ Rm×n (Xij ≥ 0), NMF attempts to find two non-negative matrices U and V, where U = [U1, U2, …, Uk] ∈ Rm×k and V = [V1, V2, …, Vn] ∈ Rk×n, whose product approximates the input matrix X, namely X ≈ UV. The objective function of NMF is
min{U ≥ 0, V ≥ 0} ‖X − UV‖F²,
where ‖·‖F is the F norm.
However, when the traditional NMF algorithm is applied to harmonic data recovery, it may lose the expression of the data's geometric relationships and destroy the low-rank structure. Furthermore, as mentioned in article [21], the performance of matrix decomposition can deteriorate slightly when the smoothness of the data is low. Based on the aforementioned problems, we propose a non-negative matrix factorization model with sparsity, graph regularization and low-rank constraints and then offer a solution method for the proposed model.
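As a concrete illustration of this objective, the following is a minimal sketch of the classical multiplicative-update NMF (not the constrained model proposed below); the rank, iteration count and synthetic data are hypothetical placeholders.

```python
import numpy as np

def nmf(X, k, n_iter=200, eps=1e-9):
    """Basic NMF via multiplicative updates, minimizing ||X - U V||_F^2."""
    m, n = X.shape
    rng = np.random.default_rng(0)
    U = rng.random((m, k))                      # non-negative basis matrix
    V = rng.random((k, n))                      # non-negative coefficient matrix
    for _ in range(n_iter):
        V *= (U.T @ X) / (U.T @ U @ V + eps)    # update coefficients
        U *= (X @ V.T) / (U @ V @ V.T + eps)    # update basis
    return U, V

# Hypothetical usage on a small non-negative matrix
X = np.abs(np.random.default_rng(1).normal(size=(8, 40)))
U, V = nmf(X, k=3)
print(np.linalg.norm(X - U @ V, "fro"))
```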
3.2. NMF under Multiple Constraints Recovery Model
Here, the raw harmonic data matrix X is approximately represented as L + E, where L is the low-rank part and E is the anomalous part (including missing data and the interference of noise and bad data). The objective function of the proposed algorithm is first presented in Equation (13), where the basis matrix U and the coefficient matrix V are derived from the NMF of the low-rank matrix L. rank(L) represents the rank of L and card(E) represents the number of non-zero entries in E. The variable r denotes the upper limit on the rank of L and k denotes the upper limit on the sparsity of E. The variable Ls is the graph Laplacian operator, λ and μ are regularization parameters, and ν is the sparse constraint parameter of the basis matrix U. tr(·) represents the trace of a matrix and ‖·‖1 is the L1 norm.
In Equation (13), the first term is the self-representation model loss under the constraint conditions rank(L) ≤ r and card(E) ≤ k. The low-rank constraint maximizes the retention of the low-rank information during decomposition, whereas the sparse constraint minimizes the energy of the anomalous data. The second term is the local non-negative orthogonal constraint of the spectral clustering, for which a Tikhonov regularization operator G and the Laplacian matrix Ls = D − W′ are introduced, where D is the degree matrix and W′ij is the entropy-weighted similarity matrix. The third term is the Tikhonov regularization constraint on U, which improves the smoothness of the harmonic data. The last term is a constraint on sparse representations of U.
According to the model that is proposed in Equation (13), the recovery problem of the raw harmonic data X is transformed into the optimal solution problem of U and V.
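Since Equation (13) itself is not reproduced above, the following LaTeX block is only a hedged reconstruction of its general shape, assembled from the terms just described (self-representation loss, graph-regularized trace term, Tikhonov term on U and L1 sparsity on U); the exact grouping and weighting of the terms in the original equation may differ.

```latex
\min_{L,\,E,\,U\ge 0,\,V\ge 0}\;
\underbrace{\|X-L-E\|_F^2}_{\text{self-representation loss}}
+\lambda\,\mathrm{tr}\!\left(VL_sV^{\mathsf T}\right)
+\mu\,\|GU\|_F^2
+\nu\,\|U\|_1
\quad\text{s.t.}\quad L\approx UV,\;
\operatorname{rank}(L)\le r,\;
\operatorname{card}(E)\le k
```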
3.3. Solution Method of the Proposed Model
As can be seen from Equation (13), the constraints are all distributed over U and V. Therefore, the solution process can be divided into two layers: the approximate calculation of the raw harmonic data X ≈ L + E and the constrained NMF solution. The former is the extraction process of the target matrix L, attempting to obtain the externally optimal L. The latter solves the internal optimization of the target matrix, which weakens the interference caused by noise or bad data and outputs the optimal state observation for each monitoring terminal.
3.3.1. Process of Extracting the Low-Rank Portion
According to Equation (13), the first part of the problem after decomposition can be described as:
Although the problem of solving both L and E in Equation (14) is non-convex, it can be converted into the following two convergent iterations:
To simplify the calculation, the BRP method proposed in [22] was used. For the input raw harmonic data matrix X ∈ Rm×n, we have Equation (16), where J1 ∈ Rn×r and J2 ∈ Rm×r are Gaussian random matrices and Y1 and Y2 are the right and left random projection matrices, respectively. Thus, the low-rank matrix L can be constructed as in Equation (17).
However, although the relaxation of X is completed in the random projection process, it may cause singular value distortion. To avoid this, the power scheme model is introduced: X is replaced with X* = (XX^T)^c X, which has the same singular vectors as X, where c is the power parameter. Moreover, in order to improve the accuracy and further weaken the noise level and the bad data interference, an orthogonal-triangular (QR) decomposition is performed on Y1 and Y2, as in Equations (18) and (19), where Q1 and Q2 are orthogonal matrices and R1 and R2 are upper triangular matrices. Equation (17) is then updated using Equations (18) and (19) as Equation (20).
We use Y1 and Y2 to update J1 and J2, respectively. Thus, we alternately iterate Y1, Y2, J1 and J2 in order to improve the accuracy of the extraction of the low-rank part L.
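A hedged sketch of the BRP construction of the low-rank part is given below; it follows the general recipe of [22] (Gaussian projections, QR factors), while the power scheme step and its corresponding root in the reconstruction are only noted in a comment, and the exact forms of Equations (16)–(20) may differ.

```python
import numpy as np

def brp_lowrank(X, r, seed=0):
    """Rank-r approximation of X via bilateral random projection (BRP).
    The power scheme X* = (X X^T)^c X described in the text (and the matching
    root in the reconstruction) is omitted here for brevity."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    J1 = rng.standard_normal((n, r))   # right Gaussian random matrix
    J2 = rng.standard_normal((m, r))   # left Gaussian random matrix
    Y1 = X @ J1                        # right random projection (m x r)
    Y2 = X.T @ J2                      # left random projection  (n x r)
    Q1, R1 = np.linalg.qr(Y1)          # QR step improves numerical stability
    Q2, R2 = np.linalg.qr(Y2)
    # L = Y1 (J2^T Y1)^{-1} Y2^T, written with the Q/R factors
    middle = np.linalg.inv(J2.T @ Y1)
    return Q1 @ (R1 @ middle @ R2.T) @ Q2.T

# Hypothetical usage: a synthetic rank-5 matrix plus small perturbations
rng = np.random.default_rng(1)
X = rng.random((20, 5)) @ rng.random((5, 60)) + 0.01 * rng.random((20, 60))
L = brp_lowrank(X, r=5)
print(np.linalg.norm(X - L, "fro") / np.linalg.norm(X, "fro"))
```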
Et in Equation (15) is obtained by hard thresholding X − Lt, as shown in Equation (21), where the matrix-element hard threshold operator retains the k largest-magnitude elements of |X − Lt| and sets the others to zero. In addition, the allowable threshold error is set to 0.001 and the iterations are repeated cyclically until the error requirement is met. Thus, we can complete the optimal solution process for L and E. The L obtained in this process is the externally optimal harmonic data, in which the interference of bad data and noise has been filtered out to the utmost extent. However, matrix L still cannot meet the requirements of the recovery accuracy of missing data or thoroughly resolve the interference of bad data, because this process does not consider the optimal recovery performance of the harmonic data of individual PQTs. Thus, matrix L is passed to the internal optimization process.
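Putting the pieces of this subsection together, the following is a hedged sketch of the alternating extraction of L and E (Equations (14), (15) and (21)): a rank-r BRP approximation of X − E, followed by a hard threshold that keeps the k largest-magnitude entries of X − L, repeated until the residual change falls below the allowable error; brp_lowrank refers to the sketch above and the stopping rule is my reading of the text.

```python
import numpy as np

def hard_threshold(M, k):
    """Keep the k largest-magnitude entries of M and set the rest to zero."""
    out = np.zeros_like(M)
    if k > 0:
        idx = np.argsort(np.abs(M), axis=None)[-k:]   # flat indices of top-k |M|
        out.flat[idx] = M.flat[idx]
    return out

def extract_low_rank(X, r, k, tol=1e-3, max_iter=100):
    """Alternate L (rank-r BRP of X - E) and E (top-k hard threshold of X - L)
    until the change in the residual ||X - L - E||_F is below tol."""
    E = np.zeros_like(X)
    prev_err = np.inf
    for _ in range(max_iter):
        L = brp_lowrank(X - E, r)                     # external low-rank step
        E = hard_threshold(X - L, k)                  # anomalous (sparse) part
        err = np.linalg.norm(X - L - E, "fro")
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return L, E
```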
3.3.2. Internal Optimization Process Based on the Solution of U and V
According to Equation (13), the internal optimization problem of the low-rank part L, which is obtained as described in the previous section, can be described as follows:
For a known linear system Ax = b with A ∈ Rm×n and m < n, minimizing the zero norm of x is NP-hard, but an approximate solution x′ can be obtained by minimizing the L1 norm instead. Similarly, the NMF optimization model of the harmonic low-rank part uses the L1 norm to add the penalty term of the sparse constraint on the basis matrix U. Thus, Equation (22) can be rewritten as Equation (23), where S is the NMF residual operator and G is the Tikhonov regularization operator. Next, Equation (23) is expanded as a Lagrangian function of U and V, letting δ and ξ be the Lagrange multipliers with δ = [δia] and ξ = [ξaj]. According to the trace calculation properties of linear algebra, the Lagrangian function K(L, U, V) can be constructed as Equation (24).
Subsequently, the partial derivative method is used to solve the transformed unconstrained Equation (24) in order to obtain the matrices U and V. Combined with the KKT conditions of the Lagrangian function, the iterative update expressions for uia and vaj in Equation (25) can be obtained, where f denotes the iteration number, W′ is the similarity matrix and D is the degree matrix. The function sgn(·) is the sign (symbolic extraction) function: when Wij is positive, sgn(Wij) = 1; when Wij is negative, sgn(Wij) = −1; and when Wij = 0, sgn(Wij) = 0.
The allowable error threshold is set to 0.001 and the updates are iterated cyclically until the error requirement is met. In this way, we complete the optimization process for L.
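Because Equation (25) is not reproduced above, the following is only a hedged sketch of graph-regularized NMF multiplicative updates of the kind this derivation leads to; it keeps the similarity matrix W′ and degree matrix D but omits the Tikhonov and L1/sgn terms of the full update, and it assumes the graph regularization acts on the coefficient matrix V (if the similarity graph is defined over the rows of L, the analogous term would act on U instead).

```python
import numpy as np

def constrained_nmf(L, W, k, lam=0.1, n_iter=300, eps=1e-9):
    """Graph-regularized multiplicative updates for L ~ U V, using the
    Laplacian Ls = D - W of an n x n similarity matrix W over the columns of L."""
    m, n = L.shape
    D = np.diag(W.sum(axis=1))                 # degree matrix of the similarity graph
    rng = np.random.default_rng(0)
    U = rng.random((m, k))
    V = rng.random((k, n))
    for _ in range(n_iter):
        U *= (L @ V.T) / (U @ V @ V.T + eps)
        V *= (U.T @ L + lam * V @ W) / (U.T @ U @ V + lam * V @ D + eps)
    return U, V
```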
A method that may be used to recover missing harmonic data from a power quality terminal under bad data and noise interference is proposed here. The flowchart of the proposed method is shown in Figure 2 and the steps are summarized as follows:
- Step 1: Similarity analysis. The entropy-weighted diffusion distance is introduced in order to obtain the similarity matrix W′ and the best group number q using Equations (1)–(4);
- Step 2: Based on the similarity, according to D2(xi, xj)′ ≤ Ω, the PQTs are clustered into q subsets;
- Step 3: A data recovery model of NMF under multiple constraints is built and its solution is divided into two processes: low-rank information extraction and internal optimization;
- Step 4: The low-rank part is extracted. Based on Equations (15)–(21), an iterative algorithm built on the BRP method with the power scheme model and QR decomposition solves for the low-rank part L;
- Step 5: The internal optimization of L is completed. The Lagrange algorithm transforms Equation (23) into the unconstrained optimization problem of Equation (24). Combined with the KKT conditions, the update rule of Equation (25) is obtained to iterate U and V, and the recovery of the harmonic data is completed (see the sketch after this list).
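For orientation only, a hedged end-to-end sketch of Steps 3–5 follows, reusing the illustrative extract_low_rank and constrained_nmf functions sketched earlier; the clustering of Steps 1–2 is assumed to have already grouped the PQT data into X and produced the entropy-weighted similarity matrix W′, and the function names are not the paper's implementation.

```python
def recover_harmonic_data(X, W_prime, r, k_sparse, nmf_rank):
    """Step 4: extract the low-rank part L of X; Step 5: refine L by the
    constrained NMF and return the recovered harmonic data U @ V."""
    L, _E = extract_low_rank(X, r, k_sparse)
    U, V = constrained_nmf(L, W_prime, nmf_rank)
    return U @ V
```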
4. Simulation Analysis
This section presents the simulation analyses of the proposed graph clustering and NMF algorithm under multiple constraints, using the normalized mean absolute deviation (NMAD) as the evaluation standard of the recovery effect. In the NMAD calculation, xi is the raw harmonic data value, yi is the value after recovery and ‖·‖2 denotes the L2 norm.
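A hedged sketch of the metric is shown below, assuming NMAD is the L2 norm of the recovery error normalized by the L2 norm of the raw data and expressed as a percentage; the exact normalization used in the paper's equation may differ.

```python
import numpy as np

def nmad(x_raw, y_rec):
    """Normalized deviation: ||y - x||_2 / ||x||_2, expressed as a percentage."""
    x_raw = np.asarray(x_raw, dtype=float)
    y_rec = np.asarray(y_rec, dtype=float)
    return 100.0 * np.linalg.norm(y_rec - x_raw) / np.linalg.norm(x_raw)
```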
In order to objectively evaluate the performance of the proposed method, this section compares it with the cubic spline interpolation, OLAP and ADMM algorithms. The harmonic data collected in our research mainly include the harmonic current and harmonic voltage; we selected the harmonic voltage for analysis. In order to increase the practicality and accuracy of the simulation analysis, we selected the main harmonic components of harmonic data measured in northwest China for the simulation. In order to determine the main harmonic components, we measured the harmonic voltage content for harmonic orders up to 50 and collected the data over 16 h at 2 h intervals. Subsequently, we arranged the measurements in descending order after calculating their 95% probability values.
In Table 1, Hc-i represents the ith harmonic component. As observed, for the northwest region, the third, fifth and seventh harmonics carry the largest weights. In order to simulate the real power quality environment as closely as possible, the third, fifth and seventh harmonic voltages were injected into the simulation system and the harmonic data were sampled at 3 s intervals, similarly to the harmonic data of the PQT. The data in this study were obtained by averaging 50 replicated experiments. Meanwhile, in order to verify the robustness of the algorithm under bad data interference with different offset ranges and proportions, parameters for different offset ranges were set in the simulation. However, in actual measurement tasks, the offset ranges of such bad harmonic data are difficult to obtain.
In this part, method 1 represents the proposed algorithm, method 2 represents the cubic spline interpolation algorithm, method 3 represents the ADMM algorithm and method 4 represents the OLAP algorithm.
4.1. Missing Harmonic Data Recovery without Interference
In article [5], power harmonic data loss is divided into two main types: single-point loss and continuous missing data. Thus, in this study, 35 harmonic data points within 105 s of each group (the third, fifth and seventh harmonics) were used for single-point loss and continuous missing data (10 consecutive lost points). The results are shown in Table 2, Table 3 and Table 4 and the recovered missing data are tagged in gray.
Using the data provided in Table 2, Table 3 and Table 4, the recovery effects of the four methods were evaluated using the NMAD. Comparing the three situations with the third, fifth and seventh harmonics, each algorithm showed a similar NMAD tendency for the same type of data loss in the different situations. It can be observed that the four algorithms show good recovery performance for single-point loss of the third, fifth and seventh harmonics' data, with NMADs < 0.02%. However, for 10 continuous missing data points, the algorithms perform differently. The proposed algorithm's NMADs are 0.25%, 0.21% and 0.22% for the third, fifth and seventh harmonics, respectively, showing the best recovery for longer missing intervals. The average NMAD of the cubic spline interpolation algorithm is approximately 6.41%, that of the OLAP algorithm is approximately 1.44% and that of the ADMM algorithm is approximately 1.23%. The recovery effects of cubic spline interpolation are the worst because only a single PQT's harmonic data were used for this recovery. Its recovery effects are good for a small quantity of missing data; however, when the ratio of continuous data loss is large, the remaining intact data cannot estimate the state over a longer missing interval, severely diminishing the recovery performance. In contrast, the OLAP and ADMM algorithms utilize multiple PQTs for cooperative recovery. Although they cannot infer the missing state from their own good data when the duration of the missing data is long, they can use the harmonic data of other highly correlated PQTs without data loss in order to approximate the state of these terminals. However, the recovery performance of these two algorithms is still slightly lower than that of the proposed algorithm.
In order to better explore the effects of the proposed algorithm in situations of continuous missing data, we simulated the recovery of the fifth harmonic data with different missing ratios. The results are shown in Figure 3. The total number of datapoints included in the following simulation is 200 (over 10 min).
As shown in Figure 3, the accuracy of each algorithm decreases with an increase in the continuous missing data ratio, but the rising trends of their NMADs differ. Compared to the other algorithms, the NMADs of the cubic spline interpolation algorithm are always high; with more than 50% continuous data loss, its NMAD exceeds 30%, so it cannot provide reliable data recovery. In contrast, when the continuous missing data ratio is lower than 30%, the NMADs of the OLAP, ADMM and proposed algorithms are low (1.43%, 1.2% and 0.25%, respectively). However, as the missing ratio increases beyond 30%, the NMADs of the OLAP and ADMM algorithms increase and their recovery effects deteriorate rapidly. This is because, when the amount of missing data increases, neither of these algorithms can substitute the harmonic data properly.
In contrast, the data recovered by our proposed algorithm remain strongly correlated and smoother owing to the introduction of the graph regularization and sparse constraints. Therefore, when the data missing ratio increased, although the algorithm also produced additional errors, the errors were smaller. Up to a 60% missing ratio, the proposed algorithm always had excellent recovery performance and the NMAD did not increase significantly but rather accumulated slowly with the increase in missing data. Moreover, in an extreme situation with 70% of the continuous data missing, the proposed algorithm still achieved an accuracy of approximately 95%. Therefore, this algorithm has excellent recovery performance and high stability for single-point and continuous data losses, providing good recovery of the PQTs' lost harmonic data.
4.2. Impact of Noise Interference on the Recovery of Missing Data
In an actual power system, the noise interference generated by the monitoring terminals and the environment severely influences the quality of the power data. In order to verify the noise resistance of the proposed algorithm, Gaussian white noise at SNR = 60 dB was added to the fifth harmonic data described above and the recovery effects were evaluated at missing ratios varied in 10% intervals.
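For reference, a small sketch of how Gaussian white noise at a prescribed SNR (in dB) can be added to a harmonic signal for such a test is shown below; the seeding and function name are illustrative.

```python
import numpy as np

def add_awgn(x, snr_db, seed=0):
    """Add zero-mean Gaussian white noise so that the signal-to-noise ratio
    equals snr_db (power ratio in decibels)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return x + rng.normal(scale=np.sqrt(noise_power), size=x.shape)

# e.g. noisy = add_awgn(fifth_harmonic_series, snr_db=60)   # hypothetical data
```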
Combining the results of Figure 3 and Figure 4, following the introduction of Gaussian white noise, the performance of the proposed algorithm changes only slightly, with NMAD changes within 0.08%, when the missing ratio is lower than 50%. Further, at the 60% and 70% missing ratios, the NMAD increases to 0.12% and 0.17%, respectively, which is still within a satisfactory range, showing excellent recovery effects and noise resistance. The ADMM algorithm had slight fluctuations in the NMAD, which remained within 0.05% when the data loss was within 50%, showing excellent noise resistance. However, its NMADs increased significantly when the missing ratio exceeded 50%, reaching 1.12% at a 60% missing ratio and 1.23% at 70%. This is because the influence of noise is explicitly considered in these two algorithms. A Gaussian noise matrix is introduced in the ADMM algorithm, which weakens the influence of noise through the iteration process; however, this weakening effect decreases when the missing ratio increases. In the proposed algorithm, the introduction of the anomalous data matrix E and the regularization constraints leads to low energy and good smoothness of the anomalous data, which effectively eliminates the noise interference. In contrast, the cubic interpolation and OLAP algorithms, which do not consider noise interference, are sensitive to noise. Thus, their NMADs increase significantly compared with their performance in situations without interference, especially the cubic interpolation algorithm, whose NMADs increase by over 10% on average.
The recovery effects of the algorithms were further tested, as shown in Figure 5, for the proposed algorithm and the OLAP and ADMM methods (excluding the cubic spline interpolation algorithm) with 20% continuous loss and an SNR ranging from 30 to 90.
As shown, the recovery effect of each algorithm decreases with an increase in the noise level. Although the NMADs of the proposed algorithm also increase significantly when the SNR of the noise interference is less than 50, the values remain below 1%; when the SNR is no less than 50, the NMADs maintain a low rate of increase, showing a strong ability to resist noise interference. Although ADMM also demonstrates good recovery accuracy, with NMADs ranging from 0.14% to 1.09%, its accuracy is always lower than that of the proposed algorithm and the gap widens when the SNR is less than 50. Compared with these two algorithms, the NMADs of OLAP are slightly higher, ranging from 0.48% to 2.41%. The aforementioned simulation further verifies the effectiveness of the proposed algorithm.
4.3. Recovery of Harmonic Data with the Interference of Bad Data
Bad data is another type of interference. In the case of external attacks, measurement errors or equipment failures, bad data interference may cause misjudgment and bias the state estimation of the power quality of the system. The modes of bad data interference can be classified as follows: first, distortion of the power quality terminal's data caused by measurement errors or attacks by an external signal, such as an abnormal fluctuation of the power quality data's amplitude [23]; second, failure of the power quality terminal or a transmission delay, leading to a mixture of normal harmonic data with a small amount of random wrong data.
A small amount of random wrong data has only a minor effect on harmonic data recovery, smaller in magnitude than the recovery error caused by a single randomly missing data point. Therefore, we selected the bad data with a significant impact on the recovery performance for verification. In order to test the effect of bad data on the recovery performance of the proposed algorithm, we analyzed it along two dimensions: the bad data ratio and the maximum amplitude offset range.
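A hedged sketch of how such bad data can be injected for testing is given below: a chosen fraction of samples is perturbed by a random relative amplitude offset of up to max_offset (e.g., 0.1 for a 10% maximum offset); the exact perturbation model used in the paper may differ.

```python
import numpy as np

def inject_bad_data(x, ratio, max_offset, seed=0):
    """Perturb a fraction `ratio` of entries of x by random relative
    amplitude offsets drawn uniformly from [-max_offset, +max_offset]."""
    x = np.asarray(x, dtype=float).copy()
    rng = np.random.default_rng(seed)
    n_bad = int(round(ratio * x.size))
    idx = rng.choice(x.size, size=n_bad, replace=False)
    x.flat[idx] *= 1.0 + rng.uniform(-max_offset, max_offset, size=n_bad)
    return x
```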
Table 5 compares the impact of different ratios of bad data, with a 10% offset range, on the recovery effects.
From Table 5, it can be noted that each algorithm was negatively influenced by increasing ratios of bad data, especially the cubic spline interpolation algorithm, which cannot efficiently identify bad data because it ignores the correlation of the harmonic data; this results in the severe deterioration of its recovery effect. When the bad data ratio increases to 20%, its NMAD rises to 18.99%, which indicates that it cannot complete the recovery effectively. In contrast, the OLAP, ADMM and proposed algorithms show good recovery effects, with NMADs lower than 1.5% for 10% bad data. This is because these methods exploit the low-rank character of harmonic data, which enables them to correct the harmonic data, including the bad data, without modifying the system's topology and parameters [24]. Among them, our proposed algorithm shows the best resistance to bad data interference, with the lowest NMADs.
When the ratio of bad data increases beyond 20%, the NMADs of the OLAP and ADMM algorithms increase significantly, similarly to the behavior observed previously for large missing ratios and noise interference. Owing to the loss of excessive good data, both algorithms reach the upper limit of their data recovery capacity, resulting in poor recovery effects. In contrast, the proposed algorithm utilizes the low-rank property of the harmonic data and appropriately sparsifies the raw harmonic data in the recovery process, which improves its smoothness and its ability to resist bad data. Thus, its recovery performance for each harmonic order changes only slightly with increasing ratios of bad data and high accuracy is always maintained.
Table 6 compares the impact of different offset ranges on the recovery performance of the four methods with 10% bad data.
From Table 6, the aforementioned conclusion is further verified. When the ratio of bad data is low (10%), apart from the high NMADs of the cubic spline interpolation algorithm, the algorithms show good recovery effects and the growth in NMAD is approximately proportional to the growth of the amplitude offsets. Among them, the NMADs of our proposed algorithm grow the most slowly with an increasing deviation range of bad data, reflecting the best recovery performance.
Based on the aforementioned simulation results, and in order to better support power state monitoring in different power system environments, we tested the critical turning point of the robustness to bad data interference. The third, fifth and seventh harmonics' data showed similar trends and rates of change in the presence of bad data interference; thus, we selected the third harmonic data, which had the largest total NMADs, for the test. Moreover, we simultaneously varied the amplitude offset range and the ratio of bad data and calculated the NMADs shown in Table 7.
The gray cells indicate that the NMADs increased substantially and exceeded 1.5%. As the gray cells show, with an increase in the ratio of bad data or in the maximum amplitude deviation range, the robustness of the algorithm to bad data interference decreases rapidly, and the power monitoring task can therefore no longer be completed accurately and reliably. The yellow cells indicate the critical turning points of the robustness to bad data interference, such as a 20% bad data ratio at a 20% maximum amplitude offset and a 40% bad data ratio at a 10% maximum amplitude offset. Within the corresponding range, the NMADs of the proposed algorithm grow slowly, exhibiting strong resistance to bad data interference.
4.4. Robustness Test against Bad Data Interference in Noisy Environments
This section considers the robustness of the proposed algorithm to bad data interference in complex environments with different noise levels. Following the aforementioned sections, in order to enhance the practicality of the results and the simulation, we introduced Gaussian white noise interference with an SNR from 30 to 80 and bad data interference at ratios from 2.5% to 20%. Based on the aforementioned test results, the maximum amplitude offset range of the bad data was set at 20%. We chose the ADMM algorithm as the comparison algorithm. The results are shown in Figure 6.
In noisy environments, when the ratio of bad data is less than 10%, the proposed and ADMM algorithms have good recovery effects. Their NMADs are approximately equal to the superposition of the NMADs obtained when the bad data or the noise interference acts individually; that is, they cause no additional loss of recovery accuracy. However, when the ratio of bad data further increases to 15% and 20%, the NMADs of both algorithms increase to varying degrees, far exceeding the superposition of the two effects acting separately. In the case of 20% bad data, the NMAD of the ADMM algorithm is 2.03% in a low-noise environment with SNR = 80, which is much higher than the 1.52% obtained without noise and the 0.23% obtained with 20% data loss when considering noise alone. The NMAD further increases when the SNR is less than 50. Moreover, when the fifth harmonic data include 20% bad data interference and noise with SNR = 30, the NMAD is approximately 5.12%.
In noisy environments, however, the change in the NMADs of the proposed algorithm with no more than 15% bad data is smooth. Although its NMAD also increases significantly when it is exposed to 20% bad data interference, it always remains at approximately 2%, a result that is better than that of the ADMM algorithm. Thus, we have verified the robustness of the algorithm against bad data interference in noisy environments.
6. Conclusions
This study presents an algorithm based on NMF under multiple constraints to recover the harmonic data provided by PQTs under bad data and noise interference. Meanwhile, the data correlation was properly utilized by a graph clustering algorithm based on the entropy-weighted diffusion distance, providing good recovery effects and a strong anti-interference ability for data loss accompanied by bad data interference. In addition, the low-rank, sparse and regularization constraints introduced in the NMF model enhance the protection and utilization of the low-rank features, make the recovered data smooth, and offer better resistance to noise and bad data interference. These advantages were verified by missing data recovery, noise interference and bad data interference tests. Through the analysis and discussion of the simulation results, the algorithm was found to have high recovery accuracy for harmonic data with single-point loss and with less than 60% continuous data loss. Furthermore, it maintains high accuracy and reliability under Gaussian white noise interference with an SNR of at least 50 and under 40% or less bad data interference acting individually. Moreover, in a noisy environment with SNR = 50 and within 15% bad data interference at a 20% amplitude offset, the algorithm still exhibited strong robustness to bad data interference and the error increase was insignificant; thus, it can be applied efficiently to PQT power quality monitoring.
However, the performance test of the proposed algorithm was built on a 10 min dataset with 200 datapoints; the recovery performance of the algorithm on large datasets has not been considered. Meanwhile, although we provide a convergent iterative algorithm for the proposed model, its computational complexity has not been analyzed. Therefore, we plan to conduct further research on reducing the computational time and complexity of the algorithm.