Disturbance Detection of a Power Transmission System Based on the Enhanced Canonical Variate Analysis Method

Abstract: Aiming at the dynamic correlation, periodic oscillation, and weak disturbance symptoms of power transmission system data, this paper proposes an enhanced canonical variate analysis (CVA) method, called SLCVAkNN, for monitoring the disturbances of power transmission systems. In the proposed method, CVA is first used to extract the dynamic features by analyzing the data correlation and to establish a statistical model with two monitoring statistics, $T^2$ and $Q$. Then, in order to handle the periodic oscillation of power data, the two statistics are reconstructed in phase space, and the k-nearest neighbor (kNN) technique is applied to design the statistics nearest neighbor distances $DT^2$ and $DQ$ as the enhanced monitoring indices. Further considering the difficulty of detecting weak disturbances with insignificant symptoms, statistical local analysis (SLA) is integrated to construct the primary and improved residual vectors of the CVA dynamic features, which improve the disturbance detection sensitivity. The verification results on real industrial data show that the SLCVAkNN method detects the occurrence of power system disturbances more effectively than the traditional data-driven monitoring methods.


Introduction
With the increasing demand for electric power in modern industry, power transmission systems are becoming increasingly large-scale and complicated [1,2]. Due to the system complexity, anomalies and disturbances are often unavoidable in real power systems. If these unexpected events are not handled in time, they may cause serious accident risks and even widespread power outages, which are accompanied by huge economic loss and severe inconvenience to daily life. Therefore, it is of great value to detect the abnormal events quickly and maintain the safe running of power systems [3]. In recent years, the wide area measurement system (WAMS) based on synchronous phasor technology has been successfully applied in the power industry. The phasor measurement units in WAMS provide the basic data support for the real-time dynamic monitoring of the power system [4]. Accordingly, safety monitoring and disturbance detection of power systems based on measurement data analysis has become a hot topic in academic and engineering fields [5][6][7].
For the power system disturbance detection task, researchers have conducted many studies, which can be roughly divided into two categories: time/frequency domain analysis and multivariate statistical analysis. The time/frequency domain analysis investigates the power system changes from the perspective of signal processing, which involves the time domain, frequency domain, or time-frequency domain. In consideration of its good time-frequency localization property, Huang et al. [8] discussed the application of the Morlet wavelet method in power system disturbance detection. The Hilbert Huang

CVA Monitoring Model
A power system is a classical dynamic process [26,27], where the measurement data demonstrate the clear trend along the sampling time. The measured three-phase electric field and current waveform change with time, and the current data point has a certain correlation with the historical samples. Therefore, it is more reasonable to apply the dynamic data analysis tool to extract the process features.
Canonical variate analysis (CVA) is an effective dynamic data analysis tool, which has been applied to the model identification and control in the multivariate dynamic system [28,29]. This paper introduces it to deal with power system data. For a certain power transmission line, the data points under normal system operation have a fixed correlation along the time dimension. When a disturbance occurs, this correlation may be destroyed. By monitoring the correlation among the time series data, CVA can find the system disturbance effectively. When CVA is applied to data modeling, the training data are firstly divided into the historical data set and the future data set, and the CVA optimization problem is designed to find the maximum correlation between these two data sets for describing the data dynamic features. The algorithm details are clarified as follows.
For the power system measurement data vector $x_h \in \mathbb{R}^m$ at the $h$-th sampling instant, its corresponding historical data vector $p_h$ and future data vector $f_h$ are constructed as

$$p_h = [x_{h-1}^T, x_{h-2}^T, \cdots, x_{h-l}^T]^T \in \mathbb{R}^M \qquad (1)$$

$$f_h = [x_h^T, x_{h+1}^T, \cdots, x_{h+l-1}^T]^T \in \mathbb{R}^M \qquad (2)$$

where $M = m \times l$, and $l$ represents the time lag order. Given the projection vectors $a$ and $b$, the historical and future vectors are transformed into their respective projections $d = a^T p_h$ and $v = b^T f_h$, which are also called canonical variates. CVA optimizes the vector pair $a$ and $b$ so that the correlation between $d$ and $v$ is maximized. This can be described by the mathematical expression

$$\max_{a,b} \; a^T \Sigma_{pf} b \quad \text{s.t.} \quad a^T \Sigma_{pp} a = 1, \;\; b^T \Sigma_{ff} b = 1 \qquad (3)$$

where $\Sigma_{pf}$ represents the cross-covariance matrix of the historical and future data vectors, and $\Sigma_{pp}$, $\Sigma_{ff}$ denote the covariance matrices of the historical and future data vectors, respectively.
Suppose that the training data set includes $n$ samples as $X = [x_1^T, x_2^T, \cdots, x_n^T]^T \in \mathbb{R}^{n \times m}$. Then the historical and future data matrices can be expressed by

$$P = [p_{l+1}, p_{l+2}, \cdots, p_{l+N}] \in \mathbb{R}^{M \times N} \qquad (4)$$

$$F = [f_{l+1}, f_{l+2}, \cdots, f_{l+N}] \in \mathbb{R}^{M \times N} \qquad (5)$$

where $N = n - 2l + 1$ is the sample number of the historical and future data matrices. The covariance matrices defined in Equation (3) can then be calculated by

$$\Sigma_{pp} = \frac{1}{N-1} P P^T \qquad (6)$$

$$\Sigma_{ff} = \frac{1}{N-1} F F^T \qquad (7)$$

$$\Sigma_{pf} = \frac{1}{N-1} P F^T \qquad (8)$$

Solving the optimization problem described by Equation (3) leads to a singular value decomposition on the matrix

$$\Sigma_{pp}^{-1/2} \Sigma_{pf} \Sigma_{ff}^{-1/2} = U \Lambda V^T \qquad (9)$$

The solution of Equation (9) is further used to build a series of projection vectors $a_i$ and $b_i$ ($1 \le i \le M$), which are computed by

$$a_i = \Sigma_{pp}^{-1/2} U(:, i), \qquad b_i = \Sigma_{ff}^{-1/2} V(:, i) \qquad (10)$$

where $(:, i)$ represents the $i$-th column of the matrix. The vectors $a_i$ and $b_i$ are ordered by the corresponding correlation degree, which is given by the diagonal elements of the matrix $\Lambda$, that is, the correlation coefficients. The first $s$ pairs of projection vectors $\{a_i, b_i, 1 \le i \le s\}$ describe the stronger correlation and indicate the close relationship between the historical data and the future data. Therefore, a projection matrix $A_s = [a_1 \; a_2 \; \cdots \; a_s]$ is defined to extract the canonical variate vector $d_h$ as

$$d_h = A_s^T p_h \qquad (11)$$

which describes the main dynamic features of the process data. Here, $s$ is determined so that the corresponding canonical variates account for a cumulative percentage of 90% of the correlation coefficients. As $A_s$ only involves the first $s$ projection directions, it cannot cover all the data information. The remaining information in the CVA model is described by the CVA residual vector $e_h$ as

$$e_h = A_r p_h \qquad (12)$$

where $A_r \in \mathbb{R}^{M \times M}$ is the residual projection matrix obtained in the model training procedure. Based on the canonical variate vector and the CVA residual vector, two monitoring statistics $T^2$ and $Q$ are often used to judge the process state. The $T^2$ statistic describes the changes of the principal dynamic states, while the $Q$ statistic monitors the changes of the residual information. For the $h$-th sample, the statistics are written as

$$T_h^2 = d_h^T d_h \qquad (13)$$

$$Q_h = e_h^T e_h \qquad (14)$$

In normal operation, these two statistics should satisfy $T_h^2 \le T_{h,\mathrm{lim}}^2$ and $Q_h \le Q_{h,\mathrm{lim}}$, where $T_{h,\mathrm{lim}}^2$ and $Q_{h,\mathrm{lim}}$ are the corresponding confidence limits.
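The CVA modeling steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: samples are stored as rows (so the matrices are the transposes of P and F), the data are assumed to be already normalized to zero mean, and the residual projection is assumed to take the common form $(I - U_s U_s^T)\Sigma_{pp}^{-1/2}$, which the text does not specify. The function names `build_past_future`, `fit_cva`, and `cva_statistics` are hypothetical.

```python
import numpy as np

def build_past_future(X, l):
    """Stack l lagged samples into past/future matrices (one row per instant)."""
    n = X.shape[0]
    N = n - 2 * l + 1
    # Past row for instant h: [x_{h-1}, x_{h-2}, ..., x_{h-l}]
    P = np.hstack([X[l - 1 - i : l - 1 - i + N] for i in range(l)])
    # Future row for instant h: [x_h, x_{h+1}, ..., x_{h+l-1}]
    F = np.hstack([X[l + i : l + i + N] for i in range(l)])
    return P, F

def inv_sqrt(S, eps=1e-10):
    """Inverse square root of a symmetric positive semi-definite matrix."""
    w, V = np.linalg.eigh(S)
    w = np.maximum(w, eps)          # guard against tiny or negative eigenvalues
    return (V / np.sqrt(w)) @ V.T

def fit_cva(X, l, corr_share=0.90):
    """Fit the CVA projections; X is assumed normalized (zero mean)."""
    P, F = build_past_future(X, l)
    N, M = P.shape
    Spp = P.T @ P / (N - 1)
    Sff = F.T @ F / (N - 1)
    Spf = P.T @ F / (N - 1)
    Spp_is, Sff_is = inv_sqrt(Spp), inv_sqrt(Sff)
    U, lam, _ = np.linalg.svd(Spp_is @ Spf @ Sff_is)
    # Smallest s whose correlation coefficients reach the cumulative share
    s = int(np.searchsorted(np.cumsum(lam) / lam.sum(), corr_share) + 1)
    A_s = Spp_is @ U[:, :s]                      # columns are a_1 .. a_s
    # ASSUMPTION: residual projection form, not given in the text
    A_r = (np.eye(M) - U[:, :s] @ U[:, :s].T) @ Spp_is
    return {"l": l, "s": s, "A_s": A_s, "A_r": A_r, "lam": lam}

def cva_statistics(model, X):
    """Return the T^2 and Q statistics for every instant with a full history."""
    P, _ = build_past_future(X, model["l"])
    D = P @ model["A_s"]            # rows are canonical variate vectors d_h
    E = P @ model["A_r"].T          # rows are residual vectors e_h
    return np.sum(D**2, axis=1), np.sum(E**2, axis=1)
```

Because each canonical variate has unit variance on the training data, the training $T^2$ values average to roughly $s$, which is a quick sanity check on a fitted model.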
In some literature, the confidence limits of these two statistics can be obtained by assuming the prior distribution [30]. However, these distribution assumptions are often difficult to satisfy. Therefore, this paper applies the data-driven kernel density estimation to determine the confidence limit [31,32].
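As an illustration of how a confidence limit can be read from a kernel density estimate, the sketch below fits a Gaussian KDE to the training values of a statistic and returns the point below which a fraction `alpha` of the estimated probability mass lies. The Gaussian kernel, the rule-of-thumb bandwidth, and the name `kde_limit` are assumptions made for this example; the paper does not state which kernel or bandwidth it uses.

```python
import math
import numpy as np

def kde_limit(values, alpha=0.95):
    """Upper confidence limit: the alpha-quantile of a Gaussian KDE of `values`."""
    v = np.asarray(values, dtype=float)
    n = v.size
    # ASSUMPTION: rule-of-thumb bandwidth; requires non-constant data
    h = 1.06 * v.std(ddof=1) * n ** (-1.0 / 5.0)
    erf = np.vectorize(math.erf)

    def cdf(x):
        # The KDE cumulative distribution is the mean of the kernel CDFs
        return float(np.mean(0.5 * (1.0 + erf((x - v) / (h * math.sqrt(2.0))))))

    lo, hi = v.min() - 5.0 * h, v.max() + 5.0 * h
    for _ in range(60):             # bisection on the monotone CDF
        mid = 0.5 * (lo + hi)
        if cdf(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The same routine can serve any of the monitoring indices, since it only needs the one-dimensional sample of index values computed on normal training data.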

CVAkNN Model Based on kNN Monitoring Index
As the measurement data of power transmission systems have the periodic fluctuation characteristic, the traditional CVA monitoring statistics $T^2$ and $Q$ behave unsteadily with the periodic changes. In this case, disturbance detection by directly monitoring the amplitudes of the monitoring statistics cannot discover the disturbance signals effectively and may lead to a high disturbance missing rate.
In order to overcome this defect, this paper introduces the k-nearest neighbor analysis (kNN) to enhance the basic monitoring statistics. kNN is one effective multimodal data analysis tool and does not depend on the amplitude changes before and after the disturbance. In the literature [33,34], kNN was introduced and adapted for real-time detection of system disturbances. By combining the CVA model and the kNN-based monitoring statistics, the improved method, which is called CVAkNN, has a stronger capability of dealing with the periodic oscillation data property. The main idea of CVAkNN is to first reconstruct the monitoring statistic in the phase space and then build the monitoring index based on the distance between the reconstructed statistic vector and its k-nearest neighbor.
Phase space reconstruction is a good method to deal with time series analysis. This method regards one-dimensional time series as the result of nonlinear dynamic system motion and constructs the phase vectors by re-arranging the time series. This theory has been successfully applied in the fields of chaotic time series prediction and equipment failure data analysis [35,36]. Here it is introduced to deal with the CVA monitoring statistics for the further kNN analysis.
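For the training data set with $n$ samples $x_1, x_2, \ldots, x_n$, the corresponding statistics vectors are obtained by the CVA modeling as

$$T^2 = [T_1^2, T_2^2, \cdots, T_N^2]^T \qquad (15)$$

$$Q = [Q_1, Q_2, \cdots, Q_N]^T \qquad (16)$$

where $N$ is the number of training samples for which the statistics are available. Further, the phase reconstruction statistics matrices can be formulated as follows:

$$MT^2 = \begin{bmatrix} T_1^2 & T_2^2 & \cdots & T_L^2 \\ T_2^2 & T_3^2 & \cdots & T_{L+1}^2 \\ \vdots & \vdots & & \vdots \\ T_{N-L+1}^2 & T_{N-L+2}^2 & \cdots & T_N^2 \end{bmatrix} \qquad (17)$$

$$MQ = \begin{bmatrix} Q_1 & Q_2 & \cdots & Q_L \\ Q_2 & Q_3 & \cdots & Q_{L+1} \\ \vdots & \vdots & & \vdots \\ Q_{N-L+1} & Q_{N-L+2} & \cdots & Q_N \end{bmatrix} \qquad (18)$$

where $L$ is the embedding dimension defining the length of the reconstructed phase vector, and each row is one reconstructed phase vector. Based on the results of the phase space reconstruction, the dynamic behavior of the statistics can be better described, which is conducive to the detection of power system disturbances.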
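The phase space reconstruction of a statistic series can be illustrated with a short sketch. It assumes a unit time delay and stores one phase vector of length L per row; the function name `phase_matrix` is hypothetical.

```python
import numpy as np

def phase_matrix(stat, L):
    """Rearrange a 1-D statistic series into overlapping phase vectors (rows)."""
    x = np.asarray(stat, dtype=float)
    n_rows = x.size - L + 1
    if n_rows < 1:
        raise ValueError("series is shorter than the embedding dimension L")
    # Row i holds [x_i, x_{i+1}, ..., x_{i+L-1}] (unit time delay assumed)
    idx = np.arange(n_rows)[:, None] + np.arange(L)[None, :]
    return x[idx]
```

For example, a series of five values with L = 3 gives three rows, each shifted by one sample relative to the previous row.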
In the online monitoring stage, a new testing sample $x_t$ is collected at the $t$-th sampling instant. Then the monitoring statistics can be computed by applying Equations (13) and (14), and the reconstructed phase vectors are described as

$$NT_t^2 = [T_{t-L+1}^2, T_{t-L+2}^2, \cdots, T_t^2] \qquad (19)$$

$$NQ_t = [Q_{t-L+1}, Q_{t-L+2}, \cdots, Q_t] \qquad (20)$$

To determine whether the test sample $x_t$ is normal, it is necessary to compare the similarity between $NT_t^2$, $NQ_t$ and the reconstructed statistics matrices in Equations (17) and (18). If the reconstructed statistics $NT_t^2$ and $NQ_t$ are strongly similar to one row of the training statistics matrices in Equations (17) and (18), then the test sample $x_t$ describes the normal working condition. Otherwise, some fault has occurred in the power transmission system. Therefore, the key is how to perform this similarity comparison. This paper introduces the k-nearest neighbor (kNN) analysis to construct a kNN-based distance measurement indicator: the statistical nearest neighbor distance (SNND).
The idea of SNND is to find the $k$ nearest neighbors of the test vector among the rows of the given matrix and to use the distance between the test vector and its $k$-th nearest neighbor as the disturbance detection criterion. The SNND index for $NT_t^2$ is defined as

$$DT_t^2 = \| NT_t^2 - MT^2(j_k, :) \| \qquad (21)$$

where $MT^2(j_k, :)$ represents the $j_k$-th row of the $MT^2$ matrix, which corresponds to the $k$-th nearest neighbor of $NT_t^2$, and $\|\cdot\|$ represents the L2 norm. By analogy, the SNND indicator of $NQ_t$ can be established as

$$DQ_t = \| NQ_t - MQ(j_k, :) \| \qquad (22)$$

Under normal operating conditions, the above two indicators should fluctuate within a relatively small range, that is, $DT_t^2 \le DT_{\mathrm{lim}}^2$ and $DQ_t \le DQ_{\mathrm{lim}}$ for the normal running status. Once a threshold is exceeded, a system disturbance is indicated. The thresholds can be obtained by the kernel density estimation method.
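The SNND computation can be sketched as below, taking the index to be the Euclidean distance to the k-th nearest training row. Two details are assumptions of this example rather than statements from the paper: the distance is left unsquared, and when the index is evaluated on the training rows themselves, each row is excluded from its own neighbor search. The function names are hypothetical.

```python
import numpy as np

def snnd(train_matrix, query, k=3):
    """Distance from `query` to its k-th nearest row of `train_matrix`."""
    train = np.asarray(train_matrix, dtype=float)
    dists = np.linalg.norm(train - np.asarray(query, dtype=float), axis=1)
    return float(np.sort(dists)[k - 1])

def snnd_training(train_matrix, k=3):
    """SNND of every training row, excluding the row itself (assumption)."""
    train = np.asarray(train_matrix, dtype=float)
    diff = train[:, None, :] - train[None, :, :]
    d = np.linalg.norm(diff, axis=2)
    np.fill_diagonal(d, np.inf)     # a row is not its own neighbor
    return np.sort(d, axis=1)[:, k - 1]
```

The values returned by `snnd_training` are the kind of sample from which a detection threshold would be estimated, and `snnd` is then applied to each online phase vector.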

SLCVAkNN Model Assisted by Statistical Local Analysis
In the power transmission system, some weak disturbances are often difficult to detect, such as the high-impedance single-phase ground fault. When this kind of disturbance occurs, the changes reflected by the measured voltage and current variables are very small. Further, considering the influence of modeling error and process noise, this kind of disturbance may be concealed and viewed as a normal process change. Therefore, enhancing the weak disturbance detection is of great value to ensure the safety of power transmission systems. In this paper, we integrate the statistical local analysis (SLA) with CVAkNN and propose an improved SLCVAkNN monitoring model for better weak disturbance monitoring performance.
SLA was originally proposed by Basseville [37] for inspecting the process parameter changes. In recent years, some researchers have introduced it into the chemical process fault detection and demonstrated its effectiveness [38][39][40]. In this paper, we will perform the statistical local analysis on the CVA model. To look back into the CVA monitoring statistics in Equations (13) and (14), it is found that the monitoring statistics used to indicate the process status are composed of the canonical variate vector d h and the CVA residual vector e h . Therefore, if we attempt to improve the weak disturbance monitoring of CVA statistics, the vectors d h and e h must be improved with stronger disturbance sensitivity.
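According to the statistical local analysis theory, given the system observation $z_j$ and the system parameter $\vartheta$, a primary residual vector $\phi(z_j, \vartheta)$ can be defined for disturbance detection if it meets the following conditions [37,38]:

• $E\{\phi(z_j, \vartheta)\} = 0$ if $\vartheta = \vartheta_0$;
• $E\{\phi(z_j, \vartheta)\} \neq 0$ if $\vartheta$ lies in the neighborhood of $\vartheta_0$ and $\vartheta \neq \vartheta_0$;
• $\phi(z_j, \vartheta)$ is differentiable with respect to $\vartheta$;
• $\phi(z_j, \vartheta)$ exists in the neighborhood of $\vartheta_0$.

Here $\vartheta_0$ represents the parameters under the normal condition. Consider the $i$-th element of the vector $d_h$, denoted as $d_{h,i}$. It is easily derived from Equation (11) that $d_{h,i} = a_i^T p_h$. Naturally, the variance of $d_{h,i}$ can be computed as

$$E\{d_{h,i}^2\} = E\{a_i^T p_h p_h^T a_i\} = a_i^T E\{p_h p_h^T\} a_i \qquad (24)$$

For the statistical samples, $E\{p_h p_h^T\}$ is equal to the covariance matrix $\Sigma_{pp}$. Further combining the first constraint on the vector $a$ in Equation (3), it is known that $a_i^T E\{p_h p_h^T\} a_i = 1$. Therefore, we build the SLA primary residual of the canonical variate as

$$\phi_{d_{h,i}} = d_{h,i}^2 - 1 \qquad (25)$$

which meets the condition $E\{\phi_{d_{h,i}}\} = 0$ in the normal condition. Similarly, we analyze the variance of $e_{h,i}$ to obtain

$$E\{e_{h,i}^2\} = A_r(i,:) \, E\{p_h p_h^T\} \, A_r(i,:)^T \qquad (26)$$

As $A_r$ is obtained in the model training procedure, the above expression is equal to a fixed value, which is denoted as $\sigma_i = A_r(i,:) E\{p_h p_h^T\} A_r(i,:)^T$. Therefore, the SLA primary residual of the CVA residual can be built as

$$\phi_{e_{h,i}} = e_{h,i}^2 - \sigma_i \qquad (27)$$

which meets the condition $E\{\phi_{e_{h,i}}\} = 0$ for the normal data. For a more sensitive disturbance detection, the SLA improved residual is computed in a moving window with the width of $w$, which is expressed by

$$\psi_{d_{h,i}} = \frac{1}{\sqrt{w}} \sum_{j=h-w+1}^{h} \phi_{d_{j,i}}, \qquad \psi_{e_{h,i}} = \frac{1}{\sqrt{w}} \sum_{j=h-w+1}^{h} \phi_{e_{j,i}} \qquad (28)$$

In this way, we obtain the SLA improved residual vectors $\psi_{d,h} = [\psi_{d_{h,1}} \; \psi_{d_{h,2}} \; \cdots \; \psi_{d_{h,s}}]^T$ and $\psi_{e,h} = [\psi_{e_{h,1}} \; \psi_{e_{h,2}} \; \cdots \; \psi_{e_{h,M}}]^T$. These residual vectors replace the original CVA features $d_h$ and $e_h$, so that the monitoring model is modified to the SLCVAkNN model.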
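The primary and improved residuals described above can be sketched as follows. The example assumes the features are stored one sample per row, that the normal-condition variance of each feature is supplied by the caller (1 for the canonical variates, $\sigma_i$ for the CVA residuals), and that the improved residual is the window sum scaled by $1/\sqrt{w}$, as is usual in the local approach. The function names are hypothetical.

```python
import numpy as np

def primary_residual(features, variances):
    """phi_{h,i} = feature_{h,i}^2 - variance_i (zero mean under normal data)."""
    return np.asarray(features, dtype=float) ** 2 - np.asarray(variances, dtype=float)

def improved_residual(phi, w):
    """psi_h = w^(-1/2) * sum of the last w primary residuals (2-D input)."""
    phi = np.asarray(phi, dtype=float)
    if not 1 <= w <= phi.shape[0]:
        raise ValueError("window width w must lie between 1 and the sample count")
    # Prefix sums give every moving-window sum in one subtraction
    c = np.cumsum(np.vstack([np.zeros((1, phi.shape[1])), phi]), axis=0)
    return (c[w:] - c[:-w]) / np.sqrt(w)
```

Row j of the output corresponds to the window ending at sample j + w - 1, so the first w - 1 samples have no improved residual.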
With the SLA improved residual vectors, the monitoring statistics are constructed as follows:

Disturbance Detection Procedure Based on SLCVAkNN
Power transmission system disturbance detection based on the SLCVAkNN method is divided into two stages: the offline modeling stage and the online detection stage. The corresponding flowchart is shown in Figure 1.

Offline modeling stage:

1. Acquire the normal condition data to constitute the training data set $X = [x_1^T, x_2^T, \cdots, x_n^T]^T \in \mathbb{R}^{n \times m}$ and perform data normalization. Here, the normal condition data mean the data from a section of transmission line between two adjacent nodes. For different lines, separate models are needed.
2. Construct the historical data set P and the future data set F according to Equations (4) and (5), and calculate the covariance matrices by Equations (6)-(8).
6. Construct the statistics matrices in the phase space according to Equations (17) and (18).
7. Calculate the SNND monitoring indices $DT^2$ and $DQ$ for all the training samples and determine the 95% confidence limits $DT_{\mathrm{lim}}^2$ and $DQ_{\mathrm{lim}}$ by kernel density estimation.

Online detection stage:

1. Obtain the online new data $x_t$ and normalize it with the normalization parameters of the training data.
2. Construct the corresponding historical vector $p_t$, project $p_t$ onto the CVA model, and obtain the canonical variate vector $d_t$ and the residual vector $e_t$ according to Equations (11) and (12).
4. Compute the monitoring statistics $T_t^2$ and $Q_t$ for the online new sample $x_t$ according to Equations (29) and (30).
5. Construct the phase space statistics vectors $NT_t^2$ and $NQ_t$, and calculate the SNND indices $DT_t^2$ and $DQ_t$ by Equations (21) and (22).
6. Compare the SNND indices with the corresponding confidence limits $DT_{\mathrm{lim}}^2$ and $DQ_{\mathrm{lim}}$. If either one exceeds its confidence limit, a disturbance sample is indicated.
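The last two online steps (building the phase vector from the most recent statistic values and comparing the SNND index with its limit) can be sketched as a small stateful monitor. This is an illustrative sketch only: the class name `OnlineSNNDMonitor` is hypothetical, one instance handles one statistic, and the training phase matrix and limit are assumed to come from the offline stage.

```python
from collections import deque
import numpy as np

class OnlineSNNDMonitor:
    """Tracks one monitoring statistic and flags samples whose SNND exceeds the limit."""

    def __init__(self, train_matrix, limit, k=3):
        self.train = np.asarray(train_matrix, dtype=float)  # rows = phase vectors
        self.limit = float(limit)
        self.k = k
        # Buffer length equals the embedding dimension L
        self.buffer = deque(maxlen=self.train.shape[1])

    def update(self, stat_value):
        """Feed one new statistic value; return True/False, or None while warming up."""
        self.buffer.append(float(stat_value))
        if len(self.buffer) < self.buffer.maxlen:
            return None
        query = np.array(self.buffer)
        dists = np.sort(np.linalg.norm(self.train - query, axis=1))
        return bool(dists[self.k - 1] > self.limit)
```

In use, one monitor would be created for each of the two statistics, and a disturbance would be indicated when either monitor returns True.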
Here, it is pointed out that the local neighborhood standardization (LNS) [41] may be used to enhance the traditional z-score standardization. Compared with the traditional z-score method, LNS has better capability to deal with the non-steady data with the periodic oscillations.

Case Analysis
In order to verify the advantages of the SLCVAkNN method in power transmission system disturbance detection, this section gives a case study on real industrial data collected from an actual power transmission system. For method comparison, four methods, including the proposed SLCVAkNN method and three other methods of PCA, PCAkNN, and CVAkNN, are all applied to build the monitoring models for disturbance detection. The PCA method has two monitoring statistics $T^2$ and $Q$, while the other three methods use the kNN-based statistics $DT^2$ and $DQ$. These methods indicate the system status by monitoring charts, where the monitoring indices of normal and faulty samples are given by black and blue solid lines, respectively, while the detection threshold, that is, the 95% confidence limit of the monitoring index, is plotted by the red dashed line. One evaluation index, called the disturbance detection rate (DDR), is used to evaluate the different monitoring methods. DDR is the percentage of the abnormal samples exceeding the detection threshold over all the abnormal samples.
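The DDR definition translates directly into a few lines of code. In this sketch, `dst` is taken to be the zero-based position of the first abnormal sample, and all samples from that position onward are treated as abnormal; both conventions are assumptions of the example.

```python
def disturbance_detection_rate(index, limit, dst):
    """Percentage of samples from position `dst` onward whose index exceeds `limit`."""
    abnormal = list(index[dst:])
    if not abnormal:
        raise ValueError("no abnormal samples after the disturbance starting time")
    detected = sum(1 for value in abnormal if value > limit)
    return 100.0 * detected / len(abnormal)
```

For instance, with index values [0.1, 0.2, 1.5, 0.3, 2.0, 3.0], a limit of 1.0, and `dst = 2`, three of the four abnormal samples exceed the limit, which gives a DDR of 75%.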
The real industrial data used here were collected from seven transmission lines in a power supply station in August 2018. These lines are radially connected. Their data were collected because all of them involve a ground fault. The data acquisition units, designed by Qingdao Topscomm Communication Co., Ltd., are used to collect the electric field intensity and current. Here, the real line voltage is up to 110 kV, so the existing equipment cannot measure it directly. Therefore, the electric field intensity is applied to reflect the voltage trend. For each transmission line, one corresponding data set is recorded that involves the normal state and the abnormal state. Each data set has a length of about 1300 samples, and the disturbance starting time (DST) differs between transmission lines. The detailed information about the acquired data sets is listed in Table 1, where the DST column records the sample number corresponding to the disturbance starting time. A demonstration of the collected data for the DATA-A is given in Figure 2, where six measured variables, including the electric field intensities of phases A, B, and C and the currents of phases A, B, and C, are involved. Due to the existence of the harmonic load, the current sine wave distortion can be seen in these curves.

Taking the data set DATA-2 as one example, it was collected from the pole 116-3 of the line 906. This data set includes 1312 samples. An investigation with the help of on-site engineers shows that the disturbance starts at the 456th sample. Although engineers can find this disturbance by careful analysis, this manual approach is time-consuming and inefficient, so it is difficult to implement in large-scale transmission system monitoring. Therefore, building an automatic multivariate data analysis tool is necessary. In this section, we apply four MSA methods, which are PCA, PCAkNN, CVAkNN, and SLCVAkNN, to perform the automatic fault detection.
When the statistical models are developed, the model parameters are set as follows: k = 3, L = 10, l = 2, w = 20. For the data set DATA-2, the first 320 sampling points are considered to be in a normal operating state and are used as the training data set for model development. The monitoring charts of PCA, PCAkNN, CVAkNN, and SLCVAkNN are shown in Figures 3-6, respectively. The PCA monitoring results in Figure 3 show that the disturbance cannot be detected effectively. The DDR of the PCA $T^2$ is 4.43%, while the $Q$ statistic is somewhat better with a DDR of 29.52%. When PCAkNN is used, the $DT^2$ has a similarly poor detection rate, but the $DQ$ statistic achieves a clear improvement with a DDR of 57.76%. These results demonstrate that the PCAkNN method proposed by Cai et al. [20] can deal with the power system data with oscillation characteristics effectively. However, in these figures the monitoring statistics do not exceed the confidence limits significantly, which may lead to an uncertain judgement on the occurrence of the disturbance. When CVAkNN is applied in Figure 5, the $DQ$ statistic gives a DDR of 49.71%, slightly below that of PCAkNN's $DQ$. However, its $DT^2$ indicator clearly improves the DDR to 92.51%, which is an improvement of about 35 percentage points over the 57.76% of PCAkNN's $DQ$ index. The best monitoring results on this data set are provided by SLCVAkNN, as shown in Figure 6. In this figure, the disturbance is detected very clearly, with DDRs of 97.25% and 96.80% for $DT^2$ and $DQ$, respectively. This case gives a comprehensive comparison of the four methods of PCA, PCAkNN, CVAkNN, and SLCVAkNN. The applications show that PCAkNN does better than PCA due to the use of kNN, while SLCVAkNN further improves the disturbance detection performance with the integration of CVA and SLA. Another example on the data set DATA-6, which corresponds to the line 906 exit, is illustrated next.
The modeling procedure is similar to the above case. Here we only give the monitoring charts of CVAkNN and SLCVAkNN, as shown in Figures 7 and 8. With the consideration of system dynamics, the CVAkNN $DT^2$ monitoring chart gives a higher DDR of 88.51%. Compared with the CVAkNN method, which has only one effective monitoring statistic, SLCVAkNN has two well-behaved monitoring statistics. The $DT^2$ and $DQ$ have DDRs of 97.37% and 97.25%, respectively. The testing results on DATA-6 further verify the advantage of the proposed method over the CVAkNN method.

The summary of disturbance detection rates for all seven data sets is shown in Table 2. This table shows that the faults in DATA-2 and DATA-4 are difficult to detect by PCA, whose DDRs are all lower than 30%. With PCAkNN, these two faults are detected with DDRs of 57.76% and 26.78%, respectively. By contrast, CVAkNN does better on the two faults. In particular, its $DT^2$ statistic gives a DDR higher than 90%. When SLCVAkNN is used, both of its monitoring statistics have DDRs higher than 95%. For the sets DATA-1, DATA-3, DATA-6, and DATA-7, PCA can detect these faults with about 70-80% DDR on one statistic. That means PCA can raise an alarm for these faults, but not reliably enough. PCAkNN and CVAkNN improve the DDR to about 90%. By further combining the SLA technique, SLCVAkNN achieves a higher DDR than CVAkNN on these four sets. As to DATA-5, all four methods give a similarly good performance with DDRs higher than 95%. Considering all seven data sets, we observe that the average detection rates of CVAkNN outperform those of the PCA and PCAkNN methods, while those of the SLCVAkNN statistics reach 97.46% and 96.29%, which are the highest among these four methods.

To sum up, the applications on real industrial data verify the effectiveness of the proposed SLCVAkNN in power transmission system monitoring.
All the tested faults are ground faults. Although this paper does not provide results on other disturbances such as single-phase or three-phase short circuits and overvoltages, the presented algorithm is also suitable for these cases because they similarly lead to changes of voltage and current. However, one related issue should be noted. In this article, the method detects all the disturbances that occur, including normal disturbances such as load power variations. Judging whether a disturbance is a fault or a normal disturbance is a further task. One solution to this issue is to enrich the modeling data with different normal changes. As the kNN used in this method can deal with multimodal data, the trained model can distinguish faults from normal disturbances effectively when the normal changing data are considered in the model training procedure.

Conclusions
This paper proposes a power transmission system disturbance detection method based on SLCVAkNN. The real industrial data collected from the field transformer station are applied to verify the proposed method. By investigating the application results, we can draw the following conclusions.

• The CVA-based monitoring method can provide better dynamic information mining. The dynamic data analysis tool CVA is introduced to deal with the power transmission system data. By observing the application results, we find that CVAkNN has a higher detection rate than PCAkNN.
• The statistical local analysis can further enhance the disturbance monitoring. Considering that many high-impedance ground faults in real power systems have insignificant symptoms, weak disturbance detection methods are very important in improving the disturbance detection sensitivity. By focusing on the statistical local information of the CVA features, the proposed SLCVAkNN method outperforms the CVAkNN method.