Detection and Analysis of Multiple Events Based on High-Dimensional Factor Models in Power Grid

Multiple event detection and analysis in real time is a challenge for the modern grid, as the features of such events are usually non-identifiable. This paper proposes a data-driven approach, based on high-dimensional factor models, to gain insight into the constituent components of a multiple event via high-resolution phasor measurement unit (PMU) data, such that proper actions can be taken before any sporadic fault escalates into a cascading blackout. Under the framework of random matrix theory, the proposed approach maps the raw data into a high-dimensional space with two parts: (1) factors (spikes, mapping faults); (2) residuals (a bulk, mapping white/non-Gaussian noise or normal fluctuations). For the factors, we employ their number as a spatial indicator to estimate the number of constituent components in a multiple event. Simultaneously, the autoregressive rate of the noise is utilized to measure the variation of the temporal correlation of the residuals, so as to track the system movement. Taking the spatial-temporal correlation into account, this approach allows for detection, decomposition and temporal localization of multiple events. Case studies based on simulated data and real 34-PMU data verify the effectiveness of the proposed approach.


Introduction
A multiple event is composed of several constituent components that occur successively within a short time period in the power system, such as load fluctuations, oscillations and faults. For a large-scale power grid, multiple events can hardly be identified properly, as it is difficult to distinguish their features from the raw data. The uncertainty and non-transparency of multiple events pose a serious threat to the system; they may lead to wide-spread blackouts, such as the August 2003 U.S. northeastern blackout and the July 2012 India blackout [1,2].
The development of data-driven algorithms [3,4] and the increasing deployment of synchronized phasor measurement units (PMUs) allow for real-time situational awareness of the system. However, it is not feasible to process the raw data in a handcrafted way, which motivates the need for multivariate statistical approaches to assist multiple event detection and analysis. For example, in [5], Rafferty utilized moving-window principal component analysis (PCA) for real-time multiple event detection and classification. In that approach, a threshold of the cumulative percentage of total variation is selected in advance, the number of principal components is determined accordingly, and the statistics T² and Q are then derived from the resulting PCA model. In [6], sequential events can be effectively distinguished if their occurrence times are sufficiently far apart that the event signal can be segmented into signals from multiple single events. In [7], Wang proposed the nonnegative sparse event unmixing (NSEU) approach to extract each single event from a mixed multiple event in a small power grid, and then extended this work to large-scale power systems through cluster-based sparse coding in [8]. Other research on multiple event analysis is mainly model-based [9,10]; such methods use many power system parameters to model the power grid and depend strongly on the topology of the grid network.
Along the well-established research line of random matrix theory (RMT) [11][12][13][14], this paper proposes a novel statistical approach, namely high-dimensional factor models, for multiple event detection, decomposition and temporal localization in a modern grid. Given the characteristics of PMU measurements, we explore the spatial correlation (cross-correlation) and temporal correlation (autocorrelation) in the PMU data structure. The autocorrelation exists between the measurements of adjacent sampling points, whereas the cross-correlation exists between different PMUs at different locations [15,16]. The proposed method extracts spatial and temporal information from the raw data in the form of factors (principal components) and residuals, respectively. The factors are related to certain events (anomaly signals or faults) occurring in the power grid, whereas the residuals are associated with white/non-Gaussian noise or normal fluctuations within the raw PMU measurements. Formulating time and space jointly is analytically very demanding. Thanks to free probability (a modern tool in random matrix theory), this analytical difficulty was first overcome in [17], and in recent years the approach has been well studied in economics [18] and statistics [19]. This paper, for the first time, extends the approach to the context of a power grid.
The main advantages of our approach are summarized as follows: (1) It takes advantage of unique properties of high-dimensional factor models: under reasonable assumptions on the noise distribution (white/non-Gaussian), the PMU data matrix can be exactly decomposed into a signal subspace and a noise subspace by minimizing the distance between two spectra. The spatial and temporal correlations in the PMU data matrix are then extracted from the signal subspace and the noise subspace, respectively, in the form of the number of factors and the autoregressive rate of the residuals.
(2) The number of factors can reflect the number of constituent components and their respective details (e.g., occurring timings) in a multiple event, so as to realize the multi-event decomposition.
(3) The autoregressive rate of the residuals can extract the temporal correlation within the noise or fluctuations, so as to track the system movement. Notably, the proposed approach can handle both white and non-Gaussian noise. (4) The approach is purely data-driven, requiring no knowledge of the topology or parameters of the power system, which makes it able to adapt to changes in topology. (5) It is practical for both on-line and off-line analysis using moving split-windows.
To the best of our knowledge, this is the first algorithm based on random matrix theory aimed at multiple event detection and analysis (in which successive constituent components occur within a short time period) for power systems. The rest of this paper is organized as follows. In Section 2, the raw PMU data is processed into a normalized spatial-temporal matrix, and the main idea of high-dimensional factor models is introduced. In Section 3, the concrete formulation of high-dimensional factor models is given; the number of factors and the autocorrelation rate of residuals are estimated by minimizing the distance between two spectra. In Section 4, both simulated data and real 34-PMU data are utilized to validate the effectiveness of the proposed approach. Comparisons with other algorithms for feature selection, such as deep learning and principal component analysis (PCA), are included in Section 5. Conclusions are drawn in Section 6.

Problem Formulation
In this section, we first construct the PMU data into a spatial-temporal matrix and normalize it. Then, we give the general idea of high-dimensional factor models.

Data Processing
In this paper, PMU measurements (e.g., voltage magnitude data, power flow data) are employed as status data for analysis. We consider N observation variables jointly for a duration of T sampling points, so that the observation window is represented by a matrix of size N × T. At sampling time t, t ≥ T, the spatial-temporal matrix of PMU measurements X̂_t ∈ C^{N×T} is formed as

X̂_t = (x̂_{t−T+1}, x̂_{t−T+2}, · · · , x̂_t),   (1)

where x̂_s = (x̂_{1,s}, x̂_{2,s}, · · · , x̂_{N,s})^H is the N-dimensional vector measured at sampling time s.
If we keep the last sampling time as the current time, real-time analysis can be conducted with the moving split-window. We then convert the raw data matrix X̂_t obtained at each sampling time t into a standard non-Hermitian matrix X̃_t with the following normalization:

x̃_{i,j} = (x̂_{i,j} − μ(x̂_i)) / σ(x̂_i),   i = 1, 2, · · · , N,  j = 1, 2, · · · , T,   (2)

where x̂_i denotes the i-th row of X̂_t, and μ(·) and σ(·) denote the mean and the standard deviation, respectively. X̃_t is the normalized spatial-temporal matrix built at sampling time t.
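As a concrete sketch, the windowing and row-wise normalization of Equation (2) can be written as follows (a minimal illustration in Python; the function names `build_window` and `normalize` are ours, not from the paper):

```python
import numpy as np

def build_window(samples, t, T):
    """Stack the last T sample vectors into an N x T spatial-temporal
    window ending at (0-indexed) sampling time t, as in Equation (1)."""
    # samples: array of shape (N, total_samples); column s is the vector x_hat_s
    return samples[:, t - T + 1 : t + 1]

def normalize(X_hat):
    """Row-wise z-score normalization of Equation (2): every PMU channel
    gets zero mean and unit variance inside the window."""
    mu = X_hat.mean(axis=1, keepdims=True)
    sigma = X_hat.std(axis=1, keepdims=True)
    sigma[sigma == 0] = 1.0   # guard: constant channels map to all-zero rows
    return (X_hat - mu) / sigma
```

Sliding the window one sample at a time and renormalizing each window reproduces the real-time setting described above.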

High-Dimensional Factor Models
For a certain window, i.e., the one obtained at sampling time t, we aim to decompose the standard non-Hermitian matrix X̃_t, as given in Equation (2), into factors and residuals:

X̃_t = L^(p) F^(p) + U^(p),   (3)

where p is the number of factors, F^(p) contains the factors, L^(p) the corresponding loadings, and U^(p) the residuals. Usually, only X̃_t is available, whereas L^(p), F^(p) and U^(p) need to be estimated. The factor model aims to simultaneously provide estimators of the number of factors and the autocorrelation rate of the residuals. By minimizing the distance between the empirical spectral distribution ρ_real(p) and the theoretical spectral distribution ρ_model(b), the parameter-estimation problem becomes a minimum-distance problem. The empirical distribution ρ_real(p), depending on the sampled data, is obtained as the empirical spectral distribution (ESD) of Σ^(p)_real in Equation (6), and the theoretical one ρ_model(b), based on Sokhotsky's formula, is given as ρ_model(b) in Equation (10). Therefore, we turn the factor model estimation problem into a classical optimization:

(p̂, b̂) = arg min_{p,b} D(ρ_real(p), ρ_model(b)),   (4)

where D is a spectral distance measure or loss function. The solution of this minimization problem gives the number of factors, p̂, and the parameter for the autocorrelation structure of the residuals, b̂. Note that the proposed approach considers the spatial-temporal correlation of the PMU data structure: it explores the spatial (cross-) correlation of the data by estimating the factors, and it measures the temporal correlation (autocorrelation) by analyzing the residuals obtained by subtracting the factors from the raw data.

High-Dimensional Factor Model Analysis
In this section, we further discuss the calculation of the high-dimensional factor models, so as to obtain the spatial indicator p̂ and the temporal indicator b̂. In this process, the free random variable techniques proposed in Burda's work [20] are used to calculate the modeled spectral density. Then, the Jensen-Shannon divergence is used to find the optimal parameter set (p̂, b̂).

Empirical Spectral Distribution
According to Equation (3), the p-level empirical residuals are generated as

U^(p) = X̃_t − L̂^(p) F̂^(p),   (5)

where F̂^(p) is a p × T matrix of p factors, whose j-th row (j = 1, · · · , p) is the j-th principal component of X̃_t^T X̃_t, and L̂^(p) is an N × p matrix of factor loadings, estimated by multivariate least-squares regression of X̃_t on F̂^(p). The covariance matrix of the p-level residuals is then obtained as

Σ^(p)_real = (1/T) U^(p) (U^(p))^T.   (6)

The subscript "real" indicates that Σ^(p)_real is obtained from real data. The idea behind this is simple: we keep subtracting factors until the bulk spectrum of the residuals from the real data becomes close to that of the modeled residuals. The steps are summarized in Table 1.

Table 1. Steps of calculating ρ_real(p).
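The steps in Table 1 can be sketched numerically as follows (our own minimal implementation; taking the principal components from an eigen-decomposition is one standard route, and the 1/T scaling of Equation (6) is assumed):

```python
import numpy as np

def residual_esd(X, p):
    """Sketch of Table 1: empirical spectral distribution of the p-level
    residual covariance.  The top-p principal components of X^T X serve
    as factors F (Equation (5)); loadings L come from least-squares
    regression of X on F; Sigma_real = U U^T / T (Equation (6))."""
    N, T = X.shape
    _, V = np.linalg.eigh(X.T @ X)       # eigenvectors, ascending order
    if p > 0:
        F = V[:, ::-1][:, :p].T          # p x T matrix of factors
        L = X @ F.T @ np.linalg.inv(F @ F.T)   # N x p loadings (OLS)
        U = X - L @ F                    # p-level residuals
    else:
        U = X
    Sigma = U @ U.T / T                  # residual covariance
    return np.sort(np.linalg.eigvalsh(Sigma))   # its eigenvalue spectrum
```

Subtracting factors can only shrink the residual energy, so the total mass of the spectrum is non-increasing in p, which is the behavior the matching procedure relies on.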

Theoretical Spectral Distribution
Reference [21] provides a fundamental formulation for modeling noise. We assume there are cross- and auto-correlated structures within the noise, such that U is represented as

U = A_N^{1/2} G B_T^{1/2},   (7)

where G is an N × T matrix with independent identically distributed (i.i.d.) Gaussian entries, and A_N and B_T are N × N and T × T symmetric non-negative definite matrices, representing cross- and auto-correlations, respectively. The theoretical spectral distribution for general A_N and B_T, however, is difficult to obtain via the Stieltjes transform because of the heavy computation involved. To simplify the modeling, following Yeo's work [17], we make two assumptions.
1. The cross-correlations are effectively removed by subtracting the p principal components, where p is the true number of factors, so that the residual U has sufficiently negligible cross-correlations: A_N ≈ I_{N×N}.
2. The autocorrelations of U are exponentially decreasing. That is, B_T takes the form of exponential decays with respect to time lags:

(B_T)_{i,j} = b^{|i−j|},

where |i − j| is the distance between times i and j, and |b| < 1. This is equivalent to modeling the residual as an autoregressive AR(1) process: u_{i,t} = b u_{i,t−1} + ε_{i,t}. When b = 0, the process degenerates into Gaussian white noise.
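Under these two assumptions, the residual model can be simulated directly; the sketch below (our illustration) builds B_T from the decay rate b and draws U = G B_T^{1/2}:

```python
import numpy as np

def ar1_autocov(T, b):
    """Auto-covariance matrix of assumption 2: (B_T)_{i,j} = b^|i-j|."""
    idx = np.arange(T)
    return float(b) ** np.abs(idx[:, None] - idx[None, :])

def simulate_ar1_residuals(N, T, b, rng):
    """Draw U = G B_T^{1/2} (Equation (7) with A_N = I): every row of U
    is an AR(1) time series with autoregressive rate b."""
    G = rng.standard_normal((N, T))
    w, V = np.linalg.eigh(ar1_autocov(T, b))
    B_sqrt = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T  # symmetric root
    return G @ B_sqrt
```

Setting b = 0 makes B_T the identity, so the draw reduces to Gaussian white noise, matching the degenerate case noted above.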
As a result, the correlation structure of the time-series residuals following autoregressive processes can be represented as ρ_model(b). This simplification enables us to calculate the modeled spectral density much more easily. In this paper, we calculate ρ_model(b) using the free random variable techniques proposed in [20]. For the autoregressive model U, using the Moment Generating Function M ≡ M_{Σmodel}(z) and its inverse relation to the N-transform, we derive the polynomial equation (8); solving it yields M_{Σmodel}(z). Then, we calculate the Green's function G_{Σmodel}(z) by Equation (9). Finally, the theoretical spectral density ρ_model(b) is reconstructed from the imaginary part of G_{Σmodel}(z) using Sokhotsky's formula, Equation (10).
The steps of calculating ρ model (b) are summarized as follows. For the key concepts and derivation details of the calculation process, please refer to Appendix A.
Steps of calculating ρ_model(b):
(1) Solve the polynomial equation (8) for the Moment Generating Function M ≡ M_{Σmodel}(z).
(2) Obtain the Green's function G_{Σmodel}(z) from M by Equation (9).
(3) Get the theoretical spectral density from the Green's function G_{Σmodel}(z) by using Sokhotsky's formula:

ρ_model(λ) = −(1/π) lim_{ε→0⁺} Im G_{Σmodel}(λ + iε).   (10)
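When the closed-form route of Equations (8)-(10) is not at hand, ρ_model(b) can also be approximated by simulation; the sketch below is our Monte Carlo stand-in, not the paper's analytic method, and simply averages the empirical spectrum of AR(1)-correlated noise matrices:

```python
import numpy as np

def rho_model_mc(N, T, b, n_trials=20, bins=40, seed=0):
    """Monte Carlo stand-in for rho_model(b): average the empirical
    spectrum of U U^T / T over AR(1)-correlated noise draws U = G B_T^{1/2},
    instead of solving the polynomial N-transform equation analytically."""
    rng = np.random.default_rng(seed)
    idx = np.arange(T)
    B = float(b) ** np.abs(idx[:, None] - idx[None, :])   # (B_T)_{i,j} = b^|i-j|
    w, V = np.linalg.eigh(B)
    B_sqrt = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T
    eigs = []
    for _ in range(n_trials):
        U = rng.standard_normal((N, T)) @ B_sqrt
        eigs.append(np.linalg.eigvalsh(U @ U.T / T))
    hist, edges = np.histogram(np.concatenate(eigs), bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, hist   # grid points and density estimate
```

For moderate N and T this converges to the same bulk that the free-probability calculation characterizes exactly, at the cost of sampling noise.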

Distance Measure
The spectral distance measure D is used to find the optimal parameter set (p̂, b̂) by minimizing the distance between ρ_real(p) and ρ_model(b). Since the empirical spectrum contains spikes, the distance measure should be sensitive to their presence. This paper uses the Jensen-Shannon divergence, a symmetric version of the Kullback-Leibler divergence, defined as

D_JS(ρ_real, ρ_model) = (1/2) D_KL(ρ_real ‖ ρ̄) + (1/2) D_KL(ρ_model ‖ ρ̄),   (11)

where ρ_real and ρ_model are probability densities and ρ̄ = (1/2)(ρ_real + ρ_model). Equations (12) and (13) give the discretized Kullback-Leibler divergence used in Equation (11).
where ρ_i = ρ(ih) with eigenvalue grid size h. To deal with zero elements that may appear in ρ_i, each zero element is replaced by a small enough positive number ε; denoting the number of zero elements in ρ_i as num, the remaining elements are adjusted accordingly so that the regularized density satisfies Σ_i ρ̃_i = 1. Note that the Kullback-Leibler distance becomes large if one density has a spike at a point where the other is almost zero; it is therefore sensitive to the presence of spikes.
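A minimal implementation of the regularized Jensen-Shannon divergence might look as follows (the ε-flooring plus renormalization is our simplified version of the adjustment described above):

```python
import numpy as np

def js_divergence(rho_a, rho_b, eps=1e-12):
    """Jensen-Shannon divergence between two discretized spectral
    densities (Equation (11)).  Zero bins are floored at eps and each
    vector renormalized to sum to one."""
    def _prep(r):
        r = np.where(np.asarray(r, dtype=float) <= 0, eps, r)
        return r / r.sum()
    a, b = _prep(rho_a), _prep(rho_b)
    m = 0.5 * (a + b)          # the mixture density rho_bar
    kl = lambda p, q: float(np.sum(p * np.log(p / q)))
    return 0.5 * kl(a, m) + 0.5 * kl(b, m)
```

The divergence is symmetric, vanishes for identical densities, and approaches ln 2 for densities with disjoint support, which is exactly the spike-sensitivity property exploited here.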
Based on Sections 3.1-3.3, the spectrum of the PMU data is decomposed into two parts: spikes and a bulk. As we subtract p factors from the data, the p spikes corresponding to the p largest eigenvalues are removed from the spectrum of the original data. Meanwhile, the parameter b controls the region of the smaller eigenvalues. Together, the parameters p and b give an overall description of the PMU measurements.

Case Studies
In this section, the proposed method is tested with both simulated data from the IEEE 118-bus system and real 34-PMU data. The MATPOWER package is used to generate the simulated PMU data [22]. To mimic the normal fluctuations of active load, noise is added such that the signal-to-noise ratio (SNR) is 22 dB. In the simulations, three kinds of events, i.e., a short-circuit fault, a disconnection fault and a generator tripping event, are considered as the constituent components of a multiple event, and the magnitudes of bus voltages are used as raw data for analysis. The topology of the IEEE 118-bus system is shown in Figure 1.

Case Study with Simulated Data
In this set of cases, the simulated data contains 118 voltage measurements with 700 sampling points. The size of the moving split-window is set as 118 × 250, i.e., X̃_t ∈ C^{118×250}. Then, Equation (4) is used in each split-window to estimate the parameters p and b as p̂ and b̂: p̂ for the number of constituent components in a multiple event, and b̂ for the autocorrelation structure of the residuals. The idea is as follows. First, the raw voltage magnitude data in each split-window is normalized using Equation (2). Then, we keep subtracting different numbers of factors from the normalized voltage magnitude data using Equation (5) until the residuals become close to the autoregressive model with autoregressive rate b as calculated in Equation (8). The Jensen-Shannon divergence in Equation (11) is utilized to measure how close the two residual spectral distributions are to each other. The optimal parameter set (p̂, b̂) is finally obtained once the Jensen-Shannon divergence is minimized. Three cases are designed in this part to validate the proposed approach.
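The per-window search just described can be sketched end to end; the toy routine below is our illustration, replacing the analytic ρ_model(b) of Equations (8)-(10) with a simulated AR(1) spectrum and scanning a small (p, b) grid:

```python
import numpy as np

def estimate_p_b(X, p_max=5, b_grid=(0.0, 0.2, 0.4, 0.6), bins=20, seed=0):
    """Toy grid search for (p_hat, b_hat) on one normalized N x T window:
    subtract the top-p principal components, histogram the residual
    eigenvalues, and match against a simulated AR(1) noise spectrum via
    Jensen-Shannon divergence (Equation (4))."""
    rng = np.random.default_rng(seed)
    N, T = X.shape
    edges = np.linspace(0.0, 6.0, bins + 1)        # shared eigenvalue grid

    def esd_hist(M):
        eigs = np.clip(np.linalg.eigvalsh(M @ M.T / T), 0.0, edges[-1] - 1e-9)
        h, _ = np.histogram(eigs, bins=edges, density=True)
        return h

    def js(a, c, eps=1e-12):
        a = np.where(a <= 0, eps, a); a = a / a.sum()
        c = np.where(c <= 0, eps, c); c = c / c.sum()
        m = 0.5 * (a + c)
        return 0.5 * np.sum(a * np.log(a / m)) + 0.5 * np.sum(c * np.log(c / m))

    idx = np.arange(T)
    _, V = np.linalg.eigh(X.T @ X)
    best = (0, 0.0, np.inf)
    for p in range(p_max + 1):
        F = V[:, ::-1][:, :p].T                    # p orthonormal factor rows
        U = X - (X @ F.T) @ F if p else X          # p-level residuals
        rho_real = esd_hist(U)
        for b in b_grid:
            B = float(b) ** np.abs(idx[:, None] - idx[None, :])
            wB, VB = np.linalg.eigh(B)
            Bs = VB @ np.diag(np.sqrt(np.clip(wB, 0.0, None))) @ VB.T
            rho_mod = esd_hist(rng.standard_normal((N, T)) @ Bs)
            d = js(rho_real, rho_mod)
            if d < best[2]:
                best = (p, b, d)
    return best[0], best[1]
```

In the paper's pipeline the inner comparison uses the closed-form density rather than a single noise draw, which makes the minimization far less noisy than this sketch; the control flow, however, is the same.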

Figure 2. Case study with simulated data. The estimated number of constituent components in a multiple event, p̂, is equal to the actual number of constituent components, n_event, that we assume in the multiple event.

Case 1-A Single Event
In Case 1, a single event, i.e., a short-circuit fault, is set at Bus 64, as shown in Table 2. As can be seen from Table 2, no events occur in the system while p̂ keeps steady. Right at t_s = 500, a short-circuit fault happens at Bus 64; therefore, we can conduct single event detection with the proposed approach. Moreover, in this case, from the split-window t ∈ [251, 500] to t ∈ [451, 700], there exists a single event (a short-circuit fault at Bus 64 at t_s = 500) and one factor, i.e., one event, p̂ ≈ 1.

Case 2-A Multiple Event with Two Constituent Components
In this case, a multiple event is set with two constituent components: a short-circuit fault at Bus 64 and a disconnection fault on the line connecting Bus 23 and Bus 24, as shown in Table 3. Figure 2b shows the following: 1. During the sampling time t_s = 250 ∼ 499, p̂ and b̂ remain steady, which is consistent with Table 3: no events occur. 2. At t_s = 500, p̂ starts to decline and then stays around 1 until t_s = 599. From the split-window t ∈ [251, 500] to t ∈ [350, 599], there exists a single event (the short-circuit fault at Bus 64 at t_s = 500) and one factor, i.e., one event, p̂ ≈ 1.
3. At t_s = 600, p̂ starts to rise and then stays around 2. From the split-window t ∈ [351, 600] to t ∈ [451, 700], there exist two constituent components (the short-circuit fault at Bus 64 at t_s = 500 and the disconnection fault on the line connecting Bus 23 and Bus 24 at t_s = 600) and 2 factors, i.e., two constituent components, p̂ ≈ 2.

Case 3-A Multiple Event with Three Constituent Components
In this case, a multiple event is set with three constituent components at Bus 64, the line connecting Bus 23 and Bus 24, and Bus 107, respectively, as shown in Table 4; the results are summarized in Table 5. Based on the three cases, we can deduce that the estimated number of factors (p̂) is equal to the number of constituent components in a multiple event (n_event). To validate this conclusion, we explore the empirical spectral distribution (ESD) of the covariance matrix of X̃ in Case 3. The ESD typically exhibits two parts: a bulk and several spikes (i.e., the deviating eigenvalues). The bulk arises from noise or normal fluctuations, while the spikes are caused by faults. Figure 3 shows the following: 1. At t_s = 400, no spikes (factors) are observed. It is noted that p̂ remains almost at 5 in Figure 2c when no events occur. The most likely explanation is that some normal fluctuations of active load create several weak factors; our approach is sensitive to weak factors under normal operating conditions. However, this phenomenon does not interfere with the judgment of the number of constituent components in a multiple event, because the number of factors is stable under normal conditions, whereas the change in the number of factors is regular when a multiple event occurs. 2. At t_s = 520, one spike (factor) is observed; it is caused by the single event in the window, which is consistent with the results in Table 5. 3. At t_s = 570, two spikes (factors) are observed; they are caused by the two constituent components of the multiple event in the window. 4. At t_s = 620, three spikes (factors) are observed; they are caused by the three constituent components of the multiple event.
Note that p̂ is a spatial indicator that maps the events occurring in the system.

More Discussions ofb
In Figure 3, we can see that the estimations of p̂ and b̂ are simultaneous and interdependent. The deviating eigenvalues are factors, whose number is estimated by the indicator p̂, whereas the region of the bulk is closely related to b̂. In other words, p̂ derives from the signal subspace, denoting the number of signals or events, whereas b̂ derives from the noise subspace, denoting the autocorrelation rate of the residuals. The number of factors can be determined only if the bulk is well fitted by the theoretical autoregressive model with parameter b̂.
For the PMU data structure, p̂ reveals the correlation in space (cross-correlation), while b̂ tracks that in time (autocorrelation). Although the number of constituent components within a multiple event is contained in the spatial information, the above results show that the parameter b̂ can also reflect the status of the system. Every time a constituent component of the multiple event occurs, b̂ changes together with p̂ and can act as a temporal indicator to assist the judgment.

Case Study with Real Data
In this section, we evaluate the proposed algorithm with real 34-PMU data. In the following experiment, the real power flow data of a multiple event that occurred in the China power grid in 2013 is utilized as the raw data for analysis. The PMU number, the sampling rate and the total sampling time are 34, 50 Hz and 284 s, respectively. The multiple event happened from 65.4 s to 73.3 s. Figure 4a,b depict the three-dimensional power flow data under normal conditions and in the case of the multiple event. It can be seen that the power flow is relatively steady in the normal state, whereas it changes irregularly when the multiple event occurs. In this experiment, we use a window of size 34 × 400 (i.e., X̃_t ∈ C^{34×400}) and move it at each sampling point. In each split-window, we estimate the number of factors p̂ and the autoregressive rate of residuals b̂ of the power flow data, in the same way as for the simulated voltage magnitude data in Section 4.1. Near the occurrence of the multiple event, p̂ and b̂ are estimated, and the indicators are shown in Figure 5. From the p̂ − t and b̂ − t curves, we can realize multiple event detection and analysis as follows. 1. From 60.00 s to 65.38 s, the values of p̂ and b̂ remain steady, indicating that no events occur during this period. It is noted that the value of p̂ stays around 7 rather than 5 as in the simulated data. One explanation is that even though noise with an SNR of 22 dB is added to the simulated data, it is still difficult to accurately mimic the fluctuations of real PMU data; the normal fluctuations of real power flow data cause some weak factors, which are detected by our algorithm. As the window continues to move, the number of constituent components in the window decreases, corresponding to the decrease of the p̂ value from 73.38 s to 80.98 s. In addition, as a temporal indicator describing the autocorrelation of the residuals, b̂ decreases every time a new constituent component occurs. p̂, together with b̂, reveals the occurrence of the multiple event.

Comparisons and Discussions
In this section, we compare our algorithm with two frameworks for feature selection, i.e., deep learning and PCA. We first give a brief review of each framework, and then provide further analysis and comparisons.

Comparison with Deep Learning
In recent years, deep artificial neural networks have undergone rapid development in pattern recognition and machine learning. Deep learning algorithms can automatically select features from massive data sets. The most successful form of deep learning is supervised learning. In a typical supervised deep-learning system, hundreds of millions of labelled examples are needed to train the deep neural networks [23]. However, the scarcity of real multiple event data, especially labelled data, has greatly hindered the practical application of supervised learning. In addition, the interpretability and robustness of deep learning are still being questioned [24][25][26][27]. The undesirable behaviour of deep learning has raised concerns about AI safety [28], making it difficult to apply to power grids with high security requirements.
In contrast, the proposed high-dimensional factor model is not only label-free but also imposes no requirement on data quantity. Moreover, our method is deeply rooted in random matrix theory and has the advantage of transparency, in that our results are provable. With the aid of Equation (7), our approach is capable of dealing with spatial-temporal correlation, which is a difficulty for many common algorithms. Using the powerful capabilities of free probability, Equation (7) can be analyzed in closed form, so iterative procedures can be developed to search for the two fundamental parameters, p̂ and b̂, as defined in Equation (4).

Comparison with Principal Component Analysis (PCA)
Principal Component Analysis (PCA) is a classical method for feature selection and dimensionality reduction. It transforms highly correlated variables in the datasets into a set of simplified, uncorrelated variables that best summarize the information in the raw data. These simplified, uncorrelated variables are called Principal Components (PCs). They are ordered and preserve most of the variation that was present in the original data [29]. For more details of the PCA algorithm, readers can refer to [29].
We compare our method with PCA using moving split-windows. That is, we first choose a threshold of the cumulative variance contribution rate, and then the number of PCs is determined according to this empirically chosen threshold, defined as follows.
η_k = (Σ_{j=1}^k λ_j) / (Σ_{j=1}^m λ_j),   (14)

η_k ≥ α,   (15)

where λ_j is the j-th eigenvalue of the raw matrix X̂_t after subtracting the mean value within each dimension, the eigenvalues are sorted in descending order, and m is their total number. η_k is the cumulative variance contribution rate of the first k PCs and α is the pre-defined threshold, α ∈ [0%, 100%]. The idea is simple: we keep adding the eigenvalues from the largest to the smallest until inequality (15) holds for the first time, and that k is the number of PCs. Apparently, the number of PCs is closely related to the predetermined cumulative variance contribution rate α. In this experiment, we choose different cumulative variance contribution rates, as shown in Figure 6. The real 34-PMU data in Section 4.2 is utilized for analysis. The size of the split-window is set the same as in Section 4.2, i.e., 34 × 400, so that the multiple events are the same in each split-window of the two comparative experiments and the two methods can be compared directly. Figure 7 shows the results under the thresholds in Figure 6, with the number of principal components indicated by the color bar on the right side. The figures indicate that the cumulative variance contribution rate has a significant influence on the number of principal components; the threshold, however, must be set empirically in advance. Furthermore, Figure 7 shows that the number of PCs determined by PCA cannot effectively reflect the number of constituent components in the multiple event, no matter which cumulative variance contribution rate we choose.
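The threshold rule of Equations (14)-(15) amounts to a few lines of code (our own sketch; the function name `num_pcs` is ours):

```python
import numpy as np

def num_pcs(X, alpha=0.9):
    """Number of principal components whose cumulative variance
    contribution rate eta_k first reaches the threshold alpha,
    i.e. the smallest k satisfying inequality (15)."""
    Xc = X - X.mean(axis=1, keepdims=True)     # subtract row means
    eig = np.linalg.eigvalsh(np.cov(Xc))       # ascending eigenvalues
    eig = np.clip(eig[::-1], 0.0, None)        # descending, non-negative
    eta = np.cumsum(eig) / eig.sum()           # cumulative rate eta_k
    return int(np.searchsorted(eta, alpha) + 1)
```

Because the result depends directly on alpha, sweeping alpha over a window reproduces the sensitivity to the threshold that Figures 6 and 7 illustrate.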
In contrast, our method can determine the number of factors without any pre-defined conditions and thus outperforms the moving split-window PCA. Both spatial and temporal information are taken into account, and the respective parameters p̂ and b̂ are selected automatically to detect and analyze the multiple events.

Conclusions
Based on high-dimensional factor models, this paper proposes a data-driven method for multiple event detection and analysis. Jointly considering both spatial and temporal correlation, this approach is able to reveal (and even decompose) a multiple event: its constituent components and their respective details, e.g., occurring timings. This joint consideration is made possible by recent advances in random matrix theory-based statistical methods. In particular, following [17], we model the joint correlation using Equation (7). This model is very difficult to solve analytically; using free random variables, an analytical solution becomes possible. Our paper exploits this mathematical trick.
The proposed method is purely data-driven, requiring no prior knowledge of the network topology or parameters, which enables it to adapt to topology changes. Both simulated data and real PMU data are used to verify the effectiveness of the algorithm. In future work, more extensions are possible. For example, more general noise models can be employed: if we consider the autoregressive moving average ARMA(1,1) process, we obtain polynomial equations of up to sixth order [20].
ρ_H(λ) can be reconstructed from the imaginary part of the Green's function G_H(z). Now, recall the residual U defined in Equation (7). Its covariance matrix Σ_model can be expressed as

Σ_model = (1/T) U U^T = (1/T) A_N^{1/2} G B_T G^T A_N^{1/2}.
The N-transform of Σ_model can then be derived, where the free random variable (FRV) multiplication law and the cyclic property of the trace are used. Note that N_{(1/T) G^T G}(z) = (1 + z)(r + z)/z.
Using the Moment Generating Function M ≡ M_{Σmodel}(z) and its inverse relation to the N-transform, we obtain Equation (A10). For the AR(1) process assumed in Section 3.2, the cross-correlation matrix is A_N ≈ I_{N×N}; then N_A(z) = N_I(z) = 1 + 1/z, and Equation (A10) can be simplified accordingly. Next, we need to find the Moment Generating Function of B_T. Recall from Section 3.2 that the auto-covariance matrix of the AR(1) process can be written as (B_T)_{i,j} = b^{|i−j|}. Using its Fourier transform, we can derive M_{B_T}. Therefore, we obtain Equation (8).