A Novel Framework for Synchrophasor Based Online Recognition and E ﬃ cient Post-Mortem Analysis of Disturbances in Power Systems

Featured Application: Synchrophasor based data compression and post-mortem analysis as well as online detection and classiﬁcation of grid disturbances. Abstract: Synchrophasor based applications become more and more popular in today’s control centers to monitor and control transient system events. This can ensure secure system operation when dealing with bidirectional power ﬂows, diminishing reserves and an increased number of active grid components. Today’s synchrophasor applications provide a lot of additional information about the dynamic system behavior but without signiﬁcant improvement of the system operation due to the lack of interpretable and condensed results as well as missing integration into existing decision-making processes. This study presents a holistic framework for novel machine learning based applications analyzing both historical as well as online synchrophasor data streams. Di ﬀ erent methods from dimension reduction, anomaly detection as well as time series classiﬁcation are used to automatically detect disturbances combined with a web-based online visualization tool. This enables automated decision-making processes in control centers to mitigate critical system states and to ensure secure system operations (e.g., by activating curate actions). Measurement and simulation-based results are presented to evaluate the proposed synchrophasor application modules for di ﬀ erent use cases at the transmission and distribution level. This study presents a novel framework of machine learning-based applications for analyzing historical and online synchrophasor measurements. enables the e ﬃ cient processing and post-mortem analysis of large, unlabeled measurement records as the online detection and classiﬁcation of grid disturbances in an automated fashion. The interaction of these novel applications as the online visualization of the results allow enhanced situational awareness for the and assessment states the measures system


Introduction
The electrical power system is in a transition process. While the number of converter-interfaced renewable generation rises, conventional power plants are decommissioned, which leads to a reduced system inertia and rises volatility in the electrical power system [1,2]. The deregulation of electricity generation and unbundling of the market from transmission and distribution tasks introduces additional challenges [3]. Thus, today's control room operators are facing a large number of events during daily system operation. To address these challenges, synchronized phasor measurements and wide area monitoring (WAM) systems are deployed worldwide in power system control rooms [4]. Being a valuable resource to observe and understand the dynamics of power systems, they additionally enable a new quality of operator decision support functions, assistant systems and automated control [4][5][6].
This study presents a novel framework of machine learning-based applications for analyzing historical and online synchrophasor measurements. This enables the efficient processing and post-mortem analysis of large, unlabeled measurement records as well as the online detection and classification of grid disturbances in an automated fashion. The interaction of these novel applications as well as the online visualization of the results allow an enhanced situational awareness for the recognition and assessment of critical system states and the activation of appropriate counter measures to ensure a secure system operation. Chapter 2 presents a state of the art for synchrophasor applications, their current usage in today's power systems as well as limitations for a widespread utilization. Following a literature review of recent research fields in Section 2.3, the novel framework for advanced synchrophasor applications is presented in Section 3.1. Sections 3.2 and 3.3 describe the offline applications spatiotemporal data compression module and disturbance extraction module and Sections 3.4 and 3.5 follow with detailed descriptions of the disturbance detection module and disturbance classification module. Section 4 presents and discusses the application results from different case studies including field measurements as well as dynamic grid simulations. A short presentation of a web-based visualization tool is given in Section 5. Section 6 concludes the investigations and gives a short outlook for possible future work.

Application of Synchrophasors in Modern Control Centers
Conventional supervisory control and data acquisition (SCADA) systems do not provide the ability to monitor transient events [7,8]. Even though the connected remote terminal units (RTU) can communicate cyclically, the measurements samples are typically not time aligned when arriving at the control center [9,10]. Thus, a state estimator is required [9]. Unless a general interrogation is inquired, traditional SCADA protocols (e.g., IEC 60870-5-101 /-104 [11] and DNP3 [12]) only transmit spontaneous messages in case of measurement value changes or if a switch status message has been received. This requires a state estimation (SE) to generate a valid steady-state power flow image of the system state. This state estimate is used by most secondary energy management system (EMS) functions and applications (i.e., contingency analysis, optimization, etc.); hence the SE module is often referred to as the cornerstone of an EMS. Dynamic stability assessment (DSA) systems can enhance the situational awareness in transmission control centers significantly [13]. DSA systems run time-domain simulations to evaluate possible contingencies as well as suitable mitigation strategies. These results can also be used for the creation of training sets for machine learning based analysis modules. Additionally, phasor measurement unit (PMU) based wide area monitoring (WAM) systems increase the dynamic observability by recording transient events by time synchronized samples. The synchronization accuracy of modern GPS linked PMUs is below 1µs [14]. Modern phasor data concentrators (PDC) can acquire data from several hundred PMUs [15][16][17]. An IEEE C37.118 [18] conform interface to IEC 61850 [19] is available, which increases the flexibility of PMU data processing [20]. The synthetic patterns created by DSA systems and the recorded patterns from WAMS can be used to improve the training set as well as the DSA time-domain model. A promising application is the HVDC based RAS, an automated control strategy to exploit the flexibility of the VSC technology to substitute the expensive generator-based redispatch [21].
Today's synchrophasor applications typically run in parallel to existing SCADA based control room applications (see Figure 1). According to [22], common PMU applications can be distinguished into real-time operation tasks (e.g., frequency stability monitoring, power oscillation monitoring, phasor-only state estimation, dynamic line rating) and planning tasks (e.g., model calibration, primary frequency response analysis, post-mortem analysis). Further applications as well as possible enhancements of SCADA based monitoring and control applications were also investigated in [23,24]. A recent study identified the following hindering issues for a wider application of synchrophasor technology in today's system operation [25]: • A clear determination of accurate, dynamic operating limits (i.e., phase-angle differences, or oscillations) is not available, which diminishes the value of the information gain for the system operation.

•
The data quality still strongly depends on the quality of the instrument transformers, as well as the ICT infrastructure.

•
The rising observability due to a newly installed WAMS can lead to an exposition of new issues, which can lead to a commitment of valuable personnel to investigate the problems.

•
Unless some TSOs exchange data of strategic important PMUs, the data exchange is often subject to cyber security issues or other sensitivities. • A separate development of EMS and WAMS lead to challenges for human operators, who prefer a single unified user interface to support a smoother workflow and a clear decision-making. • A general evaluation scheme of system dynamics and correlating actions still needs to be defined.

•
An operator training, addressing the understanding and the interpretation of dynamic phenomena, needs to be established to raise the level of operator awareness and to establish a flexible response to events, instead of a mainly rule-based operation.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 3 of 35 • The data quality still strongly depends on the quality of the instrument transformers, as well as the ICT infrastructure. • The rising observability due to a newly installed WAMS can lead to an exposition of new issues, which can lead to a commitment of valuable personnel to investigate the problems. • Unless some TSOs exchange data of strategic important PMUs, the data exchange is often subject to cyber security issues or other sensitivities. • A separate development of EMS and WAMS lead to challenges for human operators, who prefer a single unified user interface to support a smoother workflow and a clear decisionmaking. • A general evaluation scheme of system dynamics and correlating actions still needs to be defined. • An operator training, addressing the understanding and the interpretation of dynamic phenomena, needs to be established to raise the level of operator awareness and to establish a flexible response to events, instead of a mainly rule-based operation.

Enhanced Situational Awareness
The most commonly used definition of situation awareness (SA) describes it as "the perception of the elements in the environment within a volume of time and space, the comprehension of their meaning, and the projection of their status in the near future" [26]. As the complexity of the power system operation grows, the risk of human induced errors with consequences for the system security will rise. In future, human operators need appropriate tools to assist their cognitive capabilities in the evaluation of a growing amount of data gathered in the control room for a quicker diagnosis and decision-making [27]. Assistance systems can enhance the SA during the system operation and allows to perceive critical situations earlier.

State of the Art Analysis of Synchrophasor Based Detection and Mitigation of Critical Events
Current research in the field of synchrophasor based applications for enhanced monitoring and control mainly focus on online or real-time methods. This includes robust statistical or machine learning based algorithms for the automated detection, localization and identification of critical events or cascading events [28][29][30]. Online algorithms for the automated detection or assessment of system instabilities (e.g., by computing stability margins), like transient and small signal stability,

Enhanced Situational Awareness
The most commonly used definition of situation awareness (SA) describes it as "the perception of the elements in the environment within a volume of time and space, the comprehension of their meaning, and the projection of their status in the near future" [26]. As the complexity of the power system operation grows, the risk of human induced errors with consequences for the system security will rise. In future, human operators need appropriate tools to assist their cognitive capabilities in the evaluation of a growing amount of data gathered in the control room for a quicker diagnosis and decision-making [27]. Assistance systems can enhance the SA during the system operation and allows to perceive critical situations earlier.

State of the Art Analysis of Synchrophasor Based Detection and Mitigation of Critical Events
Current research in the field of synchrophasor based applications for enhanced monitoring and control mainly focus on online or real-time methods. This includes robust statistical or machine learning based algorithms for the automated detection, localization and identification of critical events or cascading events [28][29][30]. Online algorithms for the automated detection or assessment of system instabilities (e.g., by computing stability margins), like transient and small signal stability, were introduced in [31,32]. Apart from recognition tasks, authors in [33][34][35] suggest the utilization of PMUs for the real-time control of active grid components (e.g., for damping of inter-area oscillations or activating special protection schemes) as part of a wide area monitoring, protection and control (WAMPAC) system. This also enables a synchrophasor based congestion or restoration management including suitable monitoring and control algorithms. In addition to the use of PMUs in transmission power systems, authors in [36,37] investigated the application of low-cost µ-PMUs for lower voltage levels (distribution grids). The automated detection of grid disturbances and the activation of suitable countermeasures (e.g., curative actions) is an important application to prevent or mitigate major supply disruptions or blackouts. It enables a fast and reliable system operation with reduced human supervision. Existing approaches for the online detection, identification, and localization of grid disturbances from synchrophasor measurements utilize different feature extraction and classification techniques. Additionally, there are major differences with regard to the addressed task (detection, identification or localization), the investigated contingency events and the used input data (e.g., measurement channels, pre-and post-disturbance time). Authors in [38] propose a simple k-Nearest Neighbor based method to detect different events from its oscillation modes using frequency measurements. More sophisticated approaches utilize support-vector machines [39] or decision rules [40] to identify different disturbance events voltage spikes, load losses or line trips. The combined identification and localization (disturbance classification) of different line trip and short circuit events is investigated in [28,41] by analyzing voltage signals. Other works use frequency and voltage measurements to classify various disturbance events by combining wavelets and support vector machines [42,43] or S-Transforms and extreme learning machines [30]. The disturbance classification, as a supervised learning task, relies on a sufficient training database with a predefined selection of critical contingencies. The sole generation of a measurement based training dataset using post-mortem analysis is computationally costly, has high risks to incomplete datasets (only observed contingencies can be considered) and requires a large amount of historical phasor measurement records. Most of the studies, therefore, utilize dynamic grid simulations to generate a representative training dataset. This requires a detailed knowledge of the grid topology and the control parameters of the relevant active grid components as well as a suitable simulation tool to perform large scale contingency simulations for a wide range of grid scenarios.
Existing recognition approaches mainly focus on measured or simulated online phasor data streams and neglect potential useful information from historical measurement records. In contrast to that, a combined online and offline analysis framework can improve simulation-based detection methods by utilizing knowledge from historical measurements.

Framework Architecture and Analysis Modules
In this study, a novel framework for analyzing historical and online synchrophasor measurements is presented in order to detect and classify grid disturbances and to activate suitable countermeasures. In contrast to existing approaches (see Section 2.3), additional knowledge from historical measurement records can be integrated in order to increase the recognition capabilities and the classification accuracies of the online analysis modules. This framework combines different offline and online synchrophasor applications to enable an efficient analysis and decision-making based on simulated and measured critical events. The proposed analysis modules comprise of a spatiotemporal data compression module, disturbance extraction module, disturbance detection module and disturbance classification module.
The spatiotemporal data compression module eliminates redundant information for an efficient processing and analysis of large historical phasor measurement records. Due to the high reporting rates (typically between 10 and 50 frames per second [18]) and the large number of installed PMU sensors especially in large power grids, high data volumes and transmission rates occur when collecting synchrophasor data in control centers. With the use of statistical methods, the storage demand in modern control centers can be drastically reduced. This enables a long-term archiving of phasor data as well as an efficient post-mortem analysis of large datasets. A detailed description of the spatiotemporal data compression module is given in Section 3.2.
The disturbance extraction module analyzes compressed synchrophasor data records to extract potential critical events and to enhance the training database of the disturbance classification module. For this purpose, an intelligent analysis procedure is proposed to automatically explore historical measurement records and to discover potential interesting events that were not detected with existing grid monitoring tools. These events mainly include disturbances like malfunctions, outages, oscillations, faults or other unusual deviations from regular grid operation. It is to be expected, that only a small part of the measurement records contain relevant events. Additionally, no prior knowledge is available, which further complicates the recognition task. Extracted events or disturbances can be processed further, e.g., for root-cause event analysis, the assessment of grid assets and the enhancement of training data or subsequent signal analysis algorithms (e.g., for the disturbance detection module and disturbance classification module in Sections 3.4 and 3.5).
The disturbance detection module detects deviations from steady conditions by computing an anomaly score for each PMU measurement channel. In contrast to the previous application modules, this is done online using a sliding window-based approach. Generally, there is a vast number of possible events or disturbances, which lead to deviations from steady state conditions including outages, short circuits / faults, oscillations or malfunctions. The automated recognition of these events is a basic prerequisite to prevent undesired or unstable system states and to initiate suitable countermeasures. In contrast to the disturbance classification module, an exact identification of the disturbance type or the location of the disturbance origin is not performed. Instead, the goal of the disturbance detection module is to recognize possible disturbances, if only undisturbed system conditions are known. Therefore, the disturbance detection module can act as a trigger for subsequent process analysis or can enhance the training data base.
In case of high anomaly scores, the disturbance classification module estimates the disturbance type and location of the transient event by computing the corresponding probabilities of occurrence. Similar to the disturbance detection module, this is done online by analyzing the current phasor data stream. Compared to traditional SCADA or protection systems, a wide range of possible disturbances can be detected within short time spans (< 1 s) including outages from generators, lines, or renewable energy sources as well as short circuits or load trips. Precise information about the origin and type of the disturbance as well as additional time and frequency analysis information (e.g., oscillation modes, frequency drops) enable the initialization of precise countermeasures (e.g., curative actions) to restore a stable system state. These actions can be precomputed for different disturbances and updated with each new steady state condition. Today's DSA systems provide sufficient computation options to simulate different contingencies and to select and assess suitable countermeasures [13]. Further descriptions are given in Section 3.6.
For a better understanding, Figure 2 shows the simplified workflow and interaction between the different synchrophasor based online and offline analysis modules. The phasor data concentrator collects the phasor measurements and continuously updates a database. At the same time, the online data stream is forwarded to the online analysis modules. The offline analysis modules can be activated periodically to process a certain amount of historical measurement records. The measurement and analysis results can be visualized with a web-based user interface (see Section 5) and can be transferred to subsequent EMS applications. Appl. Sci. 2020, 10,

Spatiotemporal Synchrophasor Data Compression
The idea of data compression is to transform the original input measurements into a small subset of coefficients with minimum information loss. The smaller this subset is compared to the size of the uncompressed signal, the higher is the compression rate. The applied signal transformation aims to eliminate all redundant information and can be inverted to reconstruct the original signals with a certain error (reconstruction error). In contrast to existing single-step approaches, the spatiotemporal data compression module combines two compression stages to identify and eliminate redundant synchrophasor data in space (e.g., by PMU sensors with close proximity) and time (e.g., by low signal variations over long time spans)-see Figure 3. This enables high compression rates in presence of transient events and preserves low reconstruction errors. The compression rate C is calculated according to (1) given the cardinality of the normalized input signal N divided by the cardinality of the compressed signal C . The transformation coefficients C must also be taken into account, because they are required to reconstruct the original signals.
The reconstruction error R is based on the L2-norm (squared Euclidean distance) between the original signal and the reconstructed signal N -see (2).
Minimizing this error ensures a close approximation of the original measurement matrix.

Spatiotemporal Synchrophasor Data Compression
The idea of data compression is to transform the original input measurements into a small subset of coefficients with minimum information loss. The smaller this subset is compared to the size of the uncompressed signal, the higher is the compression rate. The applied signal transformation aims to eliminate all redundant information and can be inverted to reconstruct the original signals with a certain error (reconstruction error). In contrast to existing single-step approaches, the spatiotemporal data compression module combines two compression stages to identify and eliminate redundant synchrophasor data in space (e.g., by PMU sensors with close proximity) and time (e.g., by low signal variations over long time spans)-see Figure 3. This enables high compression rates in presence of transient events and preserves low reconstruction errors. The compression rate r C is calculated according to (1) given the cardinality of the normalized input signal X N divided by the cardinality of the compressed signal X C . The transformation coefficients θ C must also be taken into account, because they are required to reconstruct the original signals.
The reconstruction error e R is based on the L2-norm (squared Euclidean distance) between the original signal and the reconstructed signalX N -see (2).
Minimizing this error ensures a close approximation of the original measurement matrix.  For the spatial compression stage, different dimension reduction techniques can be used including principle component analysis (PCA), independent component analysis and non-negative matrix factorization. In case of the temporal compression, time-frequency transformations are suitable like discrete wavelet transform (DWT), discrete cosine transform or fast Fourier transform. Based on preliminary work [44], PCA and DWT are used as best combination to achieve low reconstruction errors and high compression rates. In the spatial compression stage, PCA is applied on the normalized measurement matrix, which creates a set of "virtual" PMU sensors depending on the number of chosen principle components. This ensures that only non-redundant information (highest signal variances) are captured by the first principle components. In this step, the temporal dimension (number of time steps) remains the same. In the second compression stage, DWT transforms each "virtual" sensor signal into the frequency domain. In case of multiple decomposition levels, this transform is applied recursively to generate the approximation and detailed coefficients. A compression is achieved by setting some detailed coefficients to zero depending on a predefined, global threshold value. PCA and DWT are both linear signal transform techniques. The PCA and the DWT parameters can be reused in the spatial and the temporal inverse processes to reconstruct the "virtual" sensor signals as well as the input measurement matrix-see Table 1. This procedure is applied on each PMU channel separately assuring optimal compression rates and reconstruction errors. A detailed description of the methodology as well as a comparison between the different compression techniques is given in [44].

Method
Compression Result Parameters PCA PCA scores (principle components) PCA loadings, sample means DWT approximation and detailed coefficients low-and high-pass filters, wavelet expansion coefficients

Disturbance Extraction (Post-Mortem Analysis)
The disturbance extraction module detects a predefined number of possible grid disturbances by analyzing historical records of a single phasor measurement signal (e.g., frequency). The lack of positive examples (non-outliers) and negative examples (outliers) as well as the high diversity of outlier types makes an exact prediction very difficult. To minimize the variance of the model For the spatial compression stage, different dimension reduction techniques can be used including principle component analysis (PCA), independent component analysis and non-negative matrix factorization. In case of the temporal compression, time-frequency transformations are suitable like discrete wavelet transform (DWT), discrete cosine transform or fast Fourier transform. Based on preliminary work [44], PCA and DWT are used as best combination to achieve low reconstruction errors and high compression rates. In the spatial compression stage, PCA is applied on the normalized measurement matrix, which creates a set of "virtual" PMU sensors depending on the number of chosen principle components. This ensures that only non-redundant information (highest signal variances) are captured by the first principle components. In this step, the temporal dimension (number of time steps) remains the same. In the second compression stage, DWT transforms each "virtual" sensor signal into the frequency domain. In case of multiple decomposition levels, this transform is applied recursively to generate the approximation and detailed coefficients. A compression is achieved by setting some detailed coefficients to zero depending on a predefined, global threshold value. PCA and DWT are both linear signal transform techniques. The PCA and the DWT parameters can be reused in the spatial and the temporal inverse processes to reconstruct the "virtual" sensor signals as well as the input measurement matrix-see Table 1. This procedure is applied on each PMU channel separately assuring optimal compression rates and reconstruction errors. A detailed description of the methodology as well as a comparison between the different compression techniques is given in [44].

Disturbance Extraction (Post-Mortem Analysis)
The disturbance extraction module detects a predefined number of possible grid disturbances by analyzing historical records of a single phasor measurement signal (e.g., frequency). The lack of positive examples (non-outliers) and negative examples (outliers) as well as the high diversity of outlier types makes an exact prediction very difficult. To minimize the variance of the model predictions, different feature extraction, dimension reduction and outlier score estimation techniques are combined within an ensemble-based approach. Each outlier detection algorithm uses different metrics for the outlier scores, which makes a direct comparison difficult. Therefore, a rank transformation aggregates the outlier scores of the base detectors, such that high rank values correspond to low outlier scores and vice versa. In a final clustering stage, only the samples with the lowest total rank values are chosen. The goal of the clustering algorithm is to group similar disturbance patterns into meaningful classes. Therefore, a similarity matrix is computed using time-warp edit distance (TWED) [45]. The resulting dissimilarity matrix is passed to a hierarchical density based clustering algorithm (HDBSCAN) [46] to identify the cluster groups without prior knowledge about the number of cluster groups. For the disturbance extraction module, three outlier detection algorithms are used including local outlier factors (LOF) [47], correlation outlier probabilities (COP) [48] and single-linkage based outlier detection (SiLiOd) [49]. These algorithms compute an outlier score to measure the deviation to the normal or majority signal behavior. The main principles of the three algorithms for the outlier scores are given in Table 2. Table 2. Overview of the outlier detection techniques and outlier score properties.

Method
Main Principle Outlier Score

LOF
Local density of data points and its neighborhoods Local outlier factor COP Deviation within local correlation model using robust PCA Correlation outlier probability SiLiOd Hierarchical clustering using shortest distances Path lengths to final cluster As mentioned earlier, the ensemble-based disturbance extraction module uses multiple base detectors to compute outlier scores and extract potential unusual or abnormal records. The base detectors differentiate with respect to the used dimension reduction technique (PCA or Isomap) and the outlier detection method (LOF, COP or SiLiOd). This leads to a total number of six base detectors within the disturbance extraction module. Each base detector calculates in total 13 features including statistical and information-based metrics in time-domain as well as energy based features from Stockwell transform coefficients and discrete wavelet transform coefficients. This comprehensive feature representation allows to capture various signal traits in different disturbed situations like abrupt signal changes, oscillations, or other dynamic variations. A short overview of the different base detectors is given in Figure 4. Further literature can be found in [49,50].
Appl. Sci. 2020, 10, x FOR PEER REVIEW 8 of 35 predictions, different feature extraction, dimension reduction and outlier score estimation techniques are combined within an ensemble-based approach. Each outlier detection algorithm uses different metrics for the outlier scores, which makes a direct comparison difficult. Therefore, a rank transformation aggregates the outlier scores of the base detectors, such that high rank values correspond to low outlier scores and vice versa. In a final clustering stage, only the samples with the lowest total rank values are chosen. The goal of the clustering algorithm is to group similar disturbance patterns into meaningful classes. Therefore, a similarity matrix is computed using timewarp edit distance (TWED) [45]. The resulting dissimilarity matrix is passed to a hierarchical density based clustering algorithm (HDBSCAN) [46] to identify the cluster groups without prior knowledge about the number of cluster groups. For the disturbance extraction module, three outlier detection algorithms are used including local outlier factors (LOF) [47], correlation outlier probabilities (COP) [48] and single-linkage based outlier detection (SiLiOd) [49]. These algorithms compute an outlier score to measure the deviation to the normal or majority signal behavior. The main principles of the three algorithms for the outlier scores are given in Table 2. As mentioned earlier, the ensemble-based disturbance extraction module uses multiple base detectors to compute outlier scores and extract potential unusual or abnormal records. The base detectors differentiate with respect to the used dimension reduction technique (PCA or Isomap) and the outlier detection method (LOF, COP or SiLiOd). This leads to a total number of six base detectors within the disturbance extraction module. Each base detector calculates in total 13 features including statistical and information-based metrics in time-domain as well as energy based features from Stockwell transform coefficients and discrete wavelet transform coefficients. This comprehensive feature representation allows to capture various signal traits in different disturbed situations like abrupt signal changes, oscillations, or other dynamic variations. A short overview of the different base detectors is given in Figure 4. Further literature can be found in [49,50].

Disturbance Detection
Anomaly detection or novelty detection is an active research area in machine learning to automatically detect abnormal signal behavior or signal patterns that deviate strongly from the expected or majority behavior. This is different from the outlier detection (see Section 3.3), in which the negative examples in the training dataset are unlabeled or unknown. Different techniques are proposed in literature including classification based approaches (e.g., one-class support vector machine), clustering based approaches (e.g., density-based clustering) or subspace-based approaches (e.g., principle component analysis) and have a high resemblance with outlier detection methods. Further literature can be found in [51,52]. Compared to other approaches, the z-score based disturbance detection module computes anomaly scores instead of a binary decision. This is done by deriving features in the time-and frequency-domain and by applying a Z-transformation considering a fixed amount of historical feature values. The resulting z-scores are computed for each PMU measurement channel separately assuming a Gaussian feature distribution within the normal grid operation. The basic workflow is given in Figure 5.

Disturbance Detection
Anomaly detection or novelty detection is an active research area in machine learning to automatically detect abnormal signal behavior or signal patterns that deviate strongly from the expected or majority behavior. This is different from the outlier detection (see section 3.3), in which the negative examples in the training dataset are unlabeled or unknown. Different techniques are proposed in literature including classification based approaches (e.g., one-class support vector machine), clustering based approaches (e.g., density-based clustering) or subspace-based approaches (e.g., principle component analysis) and have a high resemblance with outlier detection methods. Further literature can be found in [51,52]. Compared to other approaches, the z-score based disturbance detection module computes anomaly scores instead of a binary decision. This is done by deriving features in the time-and frequency-domain and by applying a Z-transformation considering a fixed amount of historical feature values. The resulting z-scores are computed for each PMU measurement channel separately assuming a Gaussian feature distribution within the normal grid operation. The basic workflow is given in Figure 5. The input matrix consists of the last measurement samples n and are normalized into the range 0; 1 . Within the feature extraction step, a feature vector F n is created for each sample using signal analysis techniques in the time-und frequency-domain-see (3) and (4).
After the feature extraction phase, z-scores k are computed for each feature using the actual feature value F,N k , the mean and the standard deviation of all past features of the current measurement channel F k . The z-score vector contains the z-scores from all feature values-see (5) and (6).
The final z-score for a given PMU measurement channel can be derived from the maximum value within the z-score vector. A short overview of the features is given in Table 3. As a very The input matrix X consists of the N last measurement samples x n and are normalized into the range [0; 1]. Within the feature extraction step, a feature vector x n F is created for each sample using signal analysis techniques in the time-und frequency-domain-see (3) and (4).
After the feature extraction phase, z-scores z k are computed for each feature using the actual feature value x k F,N , the mean and the standard deviation of all past N features of the current measurement channel x k F . The z-score vector z contains the z-scores from all feature values-see (5) and (6).
Appl. Sci. 2020, 10, 5209 10 of 33 The final z-score for a given PMU measurement channel can be derived from the maximum value within the z-score vector. A short overview of the K features is given in Table 3. As a very rough approach, an anomaly level can be estimated depending on these z-score values using the overview in Table 4.  Table 4. Overview z-scores and corresponding anomaly levels.

Disturbance Classification
From machine learning perspective, the PMU based identification and localization of grid disturbances can be solved with a time-series classification (TSC) system. A general classification system comprises of the following sub-modules: • preprocessing: normalize the input data into a suitable data range, • feature extraction: extract relevant features to distinguish between the given classes, • classification: compute affiliation values (e.g., probabilities) for each class using the features and • decision-making: final class assignment of the current observation based on the maximum affiliation value.
In particular, the feature extraction from multidimensional sequential data is a challenging task that takes into account the spatiotemporal relationships within the input data. Current TSC approaches include sequence-based, similarity-based (e.g., dynamic time warping), feature-based (e.g., symbolic piecewise aggregation) and model-based techniques (e.g., hidden Markov models, neural networks)-see [53,54]. Especially recurrent neural networks like long short-term memories (LSTM) or gated recurrent units (GRU) provide state-of-the-art results when dealing with high-dimensional sequential inputs like speech, text or video data-see [55][56][57].
Compared to existing approaches, see Section 2.3, the disturbance classification module uses a single prediction model to simultaneously identify and locate grid disturbances from phasor measurements. Dynamic simulations are used to create the training data for a predefined set of contingencies. The input matrix for the classification algorithm comprises of K normalized measurements from multiple PMU sensors of the grid (no full observability required) over a fixed time span T-see (7). The target values of the classifier are the disturbance location y Loc and the disturbance type y Type , which are summarized within the class label y-see (8).
y = y Loc , y Type First investigations of a simultaneous disturbance identification and localization were performed in [58] by comparing five different classification approaches. From these results and additional research [59], recurrent neural networks were rated as very suitable for the online classification of grid disturbances regarding the classification accuracy and prediction time. Recurrent neural networks like LSTMs or GRUs can capture autoregressive, non-linear dependencies and time-varying patterns in high-dimensional sequential data by utilizing flexible gating mechanisms within the LSTM or GRU cells. Within this study, GRUs were preferred over LSTMs to increase the robustness and to minimize overfitting problems. The principle workflow of the GRU-based disturbance classification approach is shown in Figure 6.
In the feature extraction step, a hidden state matrix H is computed by a single GRU layer with Q dimensions given the normalized measurement matrix X N and the learned parameters θ F . Afterwards, a feature vector x F of dimension P is generated from the high-dimensional hidden state matrix using an embedding function f E with θ E . The general equations are given in (9) and (10). x Appl. Sci. 2020, 10, x FOR PEER REVIEW 11 of 35 classification of grid disturbances regarding the classification accuracy and prediction time. Recurrent neural networks like LSTMs or GRUs can capture autoregressive, non-linear dependencies and time-varying patterns in high-dimensional sequential data by utilizing flexible gating mechanisms within the LSTM or GRU cells. Within this study, GRUs were preferred over LSTMs to increase the robustness and to minimize overfitting problems. The principle workflow of the GRUbased disturbance classification approach is shown in Figure 6.
In the feature extraction step, a hidden state matrix is computed by a single GRU layer with dimensions given the normalized measurement matrix N and the learned parameters F . Afterwards, a feature vector F of dimension is generated from the high-dimensional hidden state matrix using an embedding function E with E . The general equations are given in (9) and (10). The goal of the embedding function is to capture the necessary information from the hidden state matrix to differentiate between the classes. According to (11) and (12), the classifier C is based on a standard softmax regression to compute the class probability vector P with the parameters C over all classes.
The final class assignment is based on the maximum probability value estimated by the classifier. The parameters of the feature extraction F , embedding and classifier are learned jointly via backpropagation using the cross-entropy loss formulation given in (13).
The GRU layer in the feature extraction step contains the most expensive operations in the disturbance classification module. For each time step , the GRU computes a hidden state vector ℎ using the actual input vector and the hidden state vector from the previous time step ℎ . This is done sequentially over all time steps to compute the hidden state matrix . The weights and biases of the GRU F are shared among all time steps. The principle workflow is shown in Figure 7. The goal of the embedding function is to capture the necessary information from the hidden state matrix to differentiate between the classes. According to (11) and (12), the classifier f C is based on a standard softmax regression to compute the class probability vector x P with the parameters θ C over all C classes. x The final class assignmentŷ is based on the maximum probability value estimated by the classifier. The parameters of the feature extraction θ F , embedding θ E and classifier θ C are learned jointly via backpropagation using the cross-entropy loss formulation given in (13).

of 33
The GRU layer in the feature extraction step contains the most expensive operations in the disturbance classification module. For each time step t, the GRU computes a hidden state vector h t using the actual input vector x t and the hidden state vector from the previous time step h t−1 . This is done sequentially over all T time steps to compute the hidden state matrix H. The weights and biases of the GRU θ F are shared among all time steps. The principle workflow is shown in Figure 7. Inside a GRU, the input vector and the last hidden state vector are used to compute the output vectors of the update gate and the reset gate . From that, a candidate hidden state vector ℎ is estimated, which represents the actual information gained by the GRU. In the last step, the new hidden state vector ℎ is computed depending on the output of the update gate vector, which decides whether to keep the old information ( → 1) or to forget them and to use the new information ( → 0). These gating mechanisms allow a very flexible control over the learned representations and accounts for the vanishing gradient problem when applying backpropagation over time (BPTT). The basic formulations for the gate calculations are given in (14) to (17). = z ℎ + z + z with z ∈ ℝ Q×Q and z ∈ ℝ K×Q = r ℎ + r + r with r ∈ ℝ Q×Q and r ∈ ℝ K×Q (15) ℎ = tanh h ∘ ℎ + h + with h ∈ ℝ Q×Q and h ∈ ℝ K×Q (16) Despite of the gating mechanisms, the information cannot be maintained over long time periods. As the number of time steps increases, some information might be lost or overwritten such that the final hidden state vector ℎ T does not contain all necessary information to classify the time series. Hence, different embedding functions are proposed to create a representative feature vector from the full hidden state matrix including feedforward neural networks as well as parametric and nonparametric attention models-see Figure 8. Inside a GRU, the input vector and the last hidden state vector are used to compute the output vectors of the update gate z t and the reset gate r t . From that, a candidate hidden state vector h t is estimated, which represents the actual information gained by the GRU. In the last step, the new hidden state vector h t is computed depending on the output of the update gate vector, which decides whether to keep the old information ( z t → 1 ) or to forget them and to use the new information ( z t → 0 ). These gating mechanisms allow a very flexible control over the learned representations and accounts for the vanishing gradient problem when applying backpropagation over time (BPTT). The basic formulations for the gate calculations are given in (14) to (17).
Despite of the gating mechanisms, the information cannot be maintained over long time periods. As the number of time steps increases, some information might be lost or overwritten such that the final hidden state vector h T does not contain all necessary information to classify the time series. Hence, different embedding functions are proposed to create a representative feature vector from the full hidden state matrix including feedforward neural networks as well as parametric and non-parametric attention models-see Figure 8.
Parametric and non-parametric attention model based embedding: A more efficient approach utilizes attention models to compute a weighted sum of the hidden state vectors-see (19). According to (20), these attention weights are computed for each time step using a softmax normalization on the score values . In this case, the feature dimension equals the hidden dimension = , such that no additional dimension reduction is performed within the embedding function.

= ∑
A score value indicates the importance of the current hidden state, such that high score values lead to high attention weights and vice versa. Within the parametric attention model (derived from [60]), a small feedforward neural network with tangent hyperbolic activation function computes the score values-see (21). The network parameters can be shared over time steps (global parametric attention model) or calculated for each time step separately (local parametric attention model).
= tanh S ℎ + S with S ∈ ℝ Q The non-parametric attention model uses dot-product (22) or cosine similarity (23) to calculate the score values. For this, the actual hidden state ℎ is compared with the last hidden state ℎ T for each time step.
= 〈ℎ , ℎ T 〉 (22) x F = σ W S vec(H) + b S with W S ∈ R P×(T·Q) and b S ∈ R P (18) Parametric and non-parametric attention model based embedding: A more efficient approach utilizes attention models to compute a weighted sum of the hidden state vectors-see (19). According to (20), these attention weights α t are computed for each time step using a softmax normalization on the score values s t . In this case, the feature dimension equals the hidden dimension P = Q, such that no additional dimension reduction is performed within the embedding function.
A score value indicates the importance of the current hidden state, such that high score values lead to high attention weights and vice versa. Within the parametric attention model (derived from [60]), a small feedforward neural network with tangent hyperbolic activation function computes the score values-see (21). The network parameters can be shared over time steps (global parametric attention model) or calculated for each time step separately (local parametric attention model).

of 33
The non-parametric attention model uses dot-product (22) or cosine similarity (23) to calculate the score values. For this, the actual hidden state h t is compared with the last hidden state h T for each time step.

Curative Actions
A main security aspect for the system operation is the maintenance of the N-1 criterion. It is applied to the system operation in such a way, that equipment outages do not result into a violation of operational constraints or limit violations and to prevent cascading outages. The N-1 criterion is conservative in terms of unexploited transmission capacities. "Curative actions are operational measures that are executed immediately after the occurrence of a foreseeable contingency" [61]. Thus, the application of curative actions is suitable to sensitively dissolve or extend the conservative N-1 criterion under certain conditions to gain transmission capacity. It is important to note, that curative actions are selective against distinguished events or critical contingencies. They can include line switching, redispatch or adaptation of VSC active power set points [21].

Results from Case Studies
Different datasets are used to evaluate the proposed synchrophasor applications for efficient data processing and post-mortem analysis (see Sections 3.2 and 3.3) as well as enhanced situational awareness (see Sections 3.4 and 3.5). These include low-to medium-voltage field measurements as well as high-voltage PMU signals extracted from dynamic grid simulations. Table 5 gives a short overview. The LVField dataset contains synchrophasor measurements of five PMUs from a public vendor, which are placed at a low-voltage distribution grid in Germany. The M-class PMUs provide three-phase voltage and current phasors (12 phasors per PMU) as well as frequency and rate of change of frequency values at 10 f.p.s reporting rate. The total record time covers 16.5 h. The MVField dataset contains measurements from the Texas Synchrophasor Network. It contains a 1 h record of single-phase voltage phasors and frequency measurements from six different PMUs placed at a medium-voltage distribution grid. Further information as well as the dataset provides [62]. In contrast to the previous datasets, the HVSim records are based on dynamic grid simulations from a generic transmission power grid. The synthetic PMU signals comprise of single-phase voltage phasors and frequencies from 21 PMUs (25 f.p.s. reporting rate). Detailed descriptions are given in the following Section 4.1.

Grid Topology and Key Assumptions
For the HVSim dataset, a single-phase grid model based on the ENTSO-E European Transmission System is used to facilitate the modelling and the analysis of the dynamic behavior of the power system. The 380 kV transmission system consist of 33 substations, where each interconnection consists of multiple circuit transmission lines to ensure the N-1 criteria. In order to emulate the dynamic of the system under outage conditions or load/generation changes, some generator units are equipped with Automatic Voltage Regulators (AVR) and Power System Stabilizers (PSS) to ensure the system stability in the simulated scenarios. Furthermore, multiple protection mechanisms for generators and lines are modeled. The frequency protection maintains the generator connected for frequency ranges between 47.5 Hz to 51.5 Hz and disconnects them otherwise for 5 s to 30 s to preserve the technical integrity of the equipment. Additionally, a fault-ride-through (FRT) capability for the generators protects the equipment under long duration short circuits. Finally, a simple maximum loading protection schema is implemented for the lines, which allows a line overloading for short time periods (<5 s). The implementation of these protection scheme allows simulating cascading effects for certain grid states. The basic grid topology with station and line labels is given in Figure 9. In a preprocessing step, different generation and load conditions are evaluated using optimal power flow (OPF) simulations. Stabilizers (PSS) to ensure the system stability in the simulated scenarios. Furthermore, multiple protection mechanisms for generators and lines are modeled. The frequency protection maintains the generator connected for frequency ranges between 47.5 Hz to 51.5 Hz and disconnects them otherwise for 5 s to 30 s to preserve the technical integrity of the equipment. Additionally, a faultride-through (FRT) capability for the generators protects the equipment under long duration short circuits. Finally, a simple maximum loading protection schema is implemented for the lines, which allows a line overloading for short time periods (<5 s). The implementation of these protection scheme allows simulating cascading effects for certain grid states. The basic grid topology with station and line labels is given in Figure 9. Compared to field measurements, the phasor estimation procedure and consequent signal deviations (e.g., filtering effects) are neglected.

Dynamic Simulation Results
For a better illustration of the dynamic grid model, some exemplary contingencies are presented for simulated frequency and voltage magnitude signals from four different PMUs. Figure 10 shows the corresponding simulations results over a period of 10 s for a partial PV plant outage of 50% installed capacity for two different operational points. In both cases, a frequency drop of about 20 mHz can be noticed resulting from a 0.5 MW PV capacity loss (about 1 MW nominal capacity). The voltage magnitudes increase from steady conditions of up to 1.5 kV in adjacent substations. A comparison between both operational points reveals minor differences between frequency signals but larger deviations between voltage magnitude signals due to the difference in the supplied reactive power. A short circuit at 90% length of line L19 is shown in Figure 11 for the same PMU sensor signals and operational points. Compared to the previous contingency, the frequency is oscillating within the first seconds of the disturbance between +20/−40 mHz and returns slowly to the pre-disturbance frequency level of 50 Hz. Characteristic voltage drops of about 110 kV can be observed for both operational points shortly after the disturbance event.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 16 of 35 For a better illustration of the dynamic grid model, some exemplary contingencies are presented for simulated frequency and voltage magnitude signals from four different PMUs. Figure 10 shows the corresponding simulations results over a period of 10 seconds for a partial PV plant outage of 50% installed capacity for two different operational points. In both cases, a frequency drop of about 20 mHz can be noticed resulting from a 0.5 MW PV capacity loss (about 1 MW nominal capacity). The voltage magnitudes increase from steady conditions of up to 1.5 kV in adjacent substations. A comparison between both operational points reveals minor differences between frequency signals but larger deviations between voltage magnitude signals due to the difference in the supplied reactive power. A short circuit at 90% length of line L19 is shown in Figure 11 for the same PMU sensor signals and operational points. Compared to the previous contingency, the frequency is oscillating within the first seconds of the disturbance between +20/−40 mHz and returns slowly to the pre-disturbance frequency level of 50 Hz. Characteristic voltage drops of about 110 kV can be observed for both operational points shortly after the disturbance event.

Results from Spatiotemporal Synchrophasor Data Compression
The evaluation of the spatiotemporal data compression module is based on the LVField dataset. The spatiotemporal compression algorithm is applied on randomly selected measurement records with a fixed time range of 100 s (1000 observations). Within the data pre-processing, the raw measurement values are normalized into the range [0,1] and filtered using backward sliding moving averages to eliminate high-frequency oscillations. To avoid signal discontinuities, the voltage and current phase angles are unwrapped before normalization. Regarding Section 3.2, the evaluation is focused on PCA as spatial and DWT as temporal compression algorithm whereas the chosen DWT hyperparameters are given in Table 6. The number of principle components is varied during the evaluation. Original and reconstructed voltage magnitude and frequency values are compared in Figure 12 for one and three principle components. Increasing the number of principle components reduces the compression rate roughly from 17 to 5. At the same time, the reconstruction error raises from 3.47 to 17.30 for voltage magnitudes and from 0.64 to 2.59 for frequencies. As a result, high deviations between original and reconstructed signals can be observed when reducing the number of principle components. Table 6. Hyperparameters of the spatiotemporal synchrophasor data compression module.

Hyperparameter Value
Wavelet function Db5 Decomposition level 2 Coefficient threshold 0.05 Spatial, temporal and total compression rates as well as reconstruction errors are summarized for the different measurement channels in Table 7 assuming three principle components used within the spatial compression stage. Spatial, temporal and total compression rates r C as well as reconstruction errors e R are summarized for the different measurement channels in Table 7 assuming three principle components used within the spatial compression stage. Using three principle components reduces the compression rate up to 5.8 and the reconstruction error up to 0.04. The highest reconstruction errors can be observed especially during the spatial compression and for voltage magnitude signals. The reconstruction errors heavily depend on the signal variability and the deviations within the different measurement channels. Assuming 1000 observations, a single measurement record of one voltage and current phasor as well as one frequency and ROCOF signal comprises 240 kByte memory space allocation. Specific file formats or additional over-head information are not considered here. The size of the compressed dataset varies between 14 kByte and 40 kByte depending on the number of principle components used in the spatial compression stage. The necessary calculations are performed in Matlab ® with the implementation of an additional toolbox for wavelet decomposition [63].

Results from Synchrophasor Disturbance Extraction
The evaluation of the disturbance extraction module is based on the MVField dataset including voltage magnitudes and angles as well as frequencies from different PMU sensors at medium-voltage level. The reporting rate is 30 f.p.s. Before preprocessing, high frequency components are eliminated from the raw measurements using a wavelet based denoising. A chosen time section of about 15 min of the raw voltage phasors and frequencies is given in Figure 13.

Results from Synchrophasor Disturbance Extraction
The evaluation of the disturbance extraction module is based on the MVField dataset including voltage magnitudes and angles as well as frequencies from different PMU sensors at medium-voltage level. The reporting rate is 30 f.p.s. Before preprocessing, high frequency components are eliminated from the raw measurements using a wavelet based denoising. A chosen time section of about 15 min of the raw voltage phasors and frequencies is given in Figure 13.   Table 8. The influence of changing hyperparameters is investigated in [49]. The disturbance extraction module is applied on a historical record of a single measurement channel. Within this survey, results are shown for the voltage magnitude measurements. The dataset is sampled with a window length of 3000 time steps (100 s) and normalized into the range [0,1]. The definition of the window length is crucial for the disturbance extraction module and should be selected in accordance with the expected disturbance time spans.
The outlier detection algorithms compute outlier scores for all samples depending on their deviation from the major signal behavior. Figure 14 shows the distribution of outlier scores (outlier map) for different combinations of outlier detection and dimension reduction techniques using time domain features. The 10 samples with highest outlier scores (top-10 outliers) are additionally highlighted in red.
in accordance with the expected disturbance time spans.
The outlier detection algorithms compute outlier scores for all samples depending on their deviation from the major signal behavior. Figure 14 shows the distribution of outlier scores (outlier map) for different combinations of outlier detection and dimension reduction techniques using time domain features. The 10 samples with highest outlier scores (top-10 outliers) are additionally highlighted in red. The majority of data points are clustered close together and correspond to samples with very similar features and signal behavior. For the LOF and COP, the outlier score decreases for data points The majority of data points are clustered close together and correspond to samples with very similar features and signal behavior. For the LOF and COP, the outlier score decreases for data points with a high distance to the center, while SiLiOd shows opposite behavior due to the different metric calculation (see also Section 3.3). Additionally, PCA and Isomap compute slightly different feature embeddings, which increases the variance of the results of the different base detectors. In all cases, LOF, COP and SiLiOd mainly agree when detecting far-away outliers (data points with high distance to center), but disagree when detecting near-by outliers (data points with medium or low distance to center). Due to their low distances to neighboring data points, near-by outliers are more difficult to detect and to distinguish from the majority signal behavior. As stated in Section 3.3, the outlier scores of the different base detectors are aggregated via rank transformation and passed to the clustering stage. For a better illustration, Figure 15 shows the rank transformed top-10 outliers for all base detectors in the time and frequency domain. Low outlier ranks correspond to high outlier scores and vice versa. A high agreement among all base detectors and domains can be observed for the samples 8, 9, 97 and 98 indicating a high outlier degree and a potential disturbance. These samples are assigned with low total rank values. Other samples like 43, 65 or 85 are only detected by a minority of the base detectors which results in a lower certainty of the results and higher total rank values. Figure 16 gives some exemplary results for disturbed patterns (low total ranks) and undisturbed patterns (high total ranks). In case of the disturbed signal patterns, sample 6 corresponds to a voltage sag and sample 10 corresponds to a voltage oscillation. From Figure 15, sample 10 with the voltage oscillation shows low total ranks among all base detectors in the time-and frequency-domain. In contrast to that, sample 6 with the voltage sag can only be detected by a few base detectors in the time-domain resulting in high total ranks. This maybe a result of the short time span of the disturbance compared to the window size.
The necessary calculations are performed in Matlab ® with an additional open-source implementation for the S transform [63,64] and a JAVA ® based library for Isometric mapping [65].
The outlier detection methods were taken from the JAVA ® based ELKI (Environment for Developing KDD-Applications Supported by Index-Structures) data mining framework [66].
sag and sample 10 corresponds to a voltage oscillation. From Figure 15, sample 10 with the voltage oscillation shows low total ranks among all base detectors in the time-and frequency-domain. In contrast to that, sample 6 with the voltage sag can only be detected by a few base detectors in the time-domain resulting in high total ranks. This maybe a result of the short time span of the disturbance compared to the window size.
The necessary calculations are performed in Matlab ® with an additional open-source implementation for the S transform [63,64] and a JAVA ® based library for Isometric mapping [65]. The outlier detection methods were taken from the JAVA ® based ELKI (Environment for Developing KDD-Applications Supported by Index-Structures) data mining framework [66].

Results from Synchrophasor Disturbance Detection
The disturbance detection module is evaluated for different contingencies using dynamic grid simulations from the HVSim dataset as described in Section 4.1. A fixed hyperparameter set has been chosen according to Table 9. The disturbance detection module is applied to assess features from 13 different PMU stations. For each PMU measurement channel, the last 50 observations (corresponds to 2 s) are used to generate the features in time and frequency domain. The resulting feature vector is compared with the features from the last N samples to compute the anomaly scores (see also Section 3.4). Some exemplary results are shown in Figures 17 and 18 for two different contingencies. It can be seen that high anomaly scores occur mainly within the first PMU samples, which represents the beginning of a disturbance and usually causes high signal variations and fluctuations. During this time period, there is a transition from steady to transient system states. For subsequent PMU samples, the anomaly scores decrease depending on the measurement and the disturbance type. Especially in case of the line outage, high anomaly scores can be observed for voltage magnitudes over the whole timespan, while the frequency values almost remain unchanged. For a better understanding, some exemplary signal patterns of low and high anomaly scores are given in Figure 19. Each frequency, voltage magnitude or voltage angle signal represents a single PMU station.

# of time steps per sample 50
The disturbance detection module is applied to assess features from 13 different PMU stations. For each PMU measurement channel, the last 50 observations (corresponds to 2 s) are used to generate the features in time and frequency domain. The resulting feature vector is compared with the features from the last N samples to compute the anomaly scores (see also section 3.4). Some exemplary results are shown in Figures 17 and 18 for two different contingencies. It can be seen that high anomaly scores occur mainly within the first PMU samples, which represents the beginning of a disturbance and usually causes high signal variations and fluctuations. During this time period, there is a transition from steady to transient system states. For subsequent PMU samples, the anomaly scores decrease depending on the measurement and the disturbance type. Especially in case of the line outage, high anomaly scores can be observed for voltage magnitudes over the whole timespan, while the frequency values almost remain unchanged. For a better understanding, some exemplary signal patterns of low and high anomaly scores are given in Figure 19. Each frequency, voltage magnitude or voltage angle signal represents a single PMU station.      Clear differences can be observed between patterns with low and high anomaly score. At low anomaly scores below or equal 1.0, only small signal variations occur over time for frequency, voltage magnitude and voltage angle signals. At high anomaly scores between 3.0 and 4.0, high frequency oscillations are present as well as significant changes of voltage magnitude and angle signals over time. The necessary calculations are performed in Python ® with additional packages for wavelet decomposition [67] and statistical analysis [68].

Results from Synchrophasor Disturbance Classification
For the evaluation of the disturbance classification module, dynamic grid simulations are used as described in Section 4.1. The base scenario comprises 20 different contingencies or disturbance classes, which are concentrated at 4 different stations and lines in the grid-as shown in Table 10. Generator outages are distinguished between small power plants (DKW) and large power plants (GKW). PV power plant outages and load losses are modelled as partial outages related to their installed capacities. Short circuits are modelled as 3-phase line-to-ground faults with varying fault location with regard to the line length. These disturbances are simulated for three operational points with different generation (renewable and conventional) and load profiles. The main parameters of the base scenario are given in Table 11 concerning the used input data, the time windows and the number of instances in training, validation and test phase. The time window equals the sampling window T, while the post-disturbance time corresponds to the total time range for a disturbance event. Training, validation, and test instances contain the same disturbance classes and operational points but differ with respect to the number of samples included into the datasets which depends on the degree of overlapping between subsequent samples. The following results relate to the base scenario and a GRU-based classifier with a local parametric attention model (compare with Section 3.5). Table 12 summarizes the chosen hyperparameters for the model. Hidden and feature dimensions are equal due to the attention based embedding function. Optimizer, learning rate and batch size are chosen by extensive investigations with different hyperparameter settings. Additionally, early stopping was used to limit the number of training epochs and prevent overfitting. Table 13 shows the accuracy and F1-score results of the training, validation and test datasets for a single training run. Appendix A gives additional information for calculating both metrics. The validation accuracy and F1-score are quite low compared to the training and validation test due to the small size of the validation dataset. The test performance is between the training and the validation values as a result of the high number of samples. Additionally, the training and validation instances cannot be excluded from the test dataset, because of the different sampling rates. Figure 20 shows the normalized confusion matrix for the true and predicted class labels from the training dataset. Very high accuracies can be observed for the conventional and renewable power plant outages as well as the load losses. A few misclassified training instances occur for line trips and short circuits of Line 32. Also in other cases, it can be seen that a correct differentiation between line trips and short circuits represents a big challenge for the PMU based disturbance classification task. One possible reason could be the short duration of these events, which makes a classification difficult for samples with a large time lag to the start of the disturbance. Similar conclusions arise from the receiver operating characteristic (ROC) analysis of the training and validation predictions shown in Figure 21. In accordance with the previous findings, accuracy drops between training and validation can be observed for line outages (minimum AUC value: 0.97) and short circuits (minimum AUC value: 0.99). Other disturbance classes are affected only marginal.    In further investigations, the embedding functions are compared with each other for different numbers of hidden dimensions, learning rates and optimizers. The training, validation and test performance results are given in Figure 22. Highest accuracies can be achieved in case of the local parametric attention model ("local scoring") with 95.12% (training) and 90.89% (test) as well as the non-parametric attention model using cosine similarity ("cosine attention") with 94.58% (training) and 91.96% (test). In case of no embedding with 90.54% (training) and 86.53% (test), the last hidden state h T is used as the feature vector. In general, attention based embedding functions increase the test accuracy about 4-5%. Feedforward neural network based embedding functions show very low accuracies on the training (max. 85.88%), validation (max. 78.94%) and test datasets (max. 79.89%) and even perform worse than the "no embedding" variant. This could be due to the vectorization of the hidden state matrix ignoring the dependencies between the hidden state vectors. In contrast to that, attention-based embeddings are more effective to summarize the information across the time steps but keep the number of hidden dimensions unchanged. This can lead to performance degradations when calculating distances between the high-dimensional feature vectors.
Furthermore, some parameters of the base scenario are varied to evaluate their influence on the classification accuracies. For this, a GRU-based classifier with a local parametric attention based embedding function is investigated using different hidden dimensions. The average accuracy deviations to the base scenario are shown in Figure 23. Larger changes in the test accuracies can be observed when decreasing (−3.1%) or increasing (+2.4%) the number of available PMUs, decreasing (−4.5%) or increasing (+4.1%) the sample overlapping of the training instances and increasing the post-disturbance time from 10 s to 40 s (−2.3%). A higher number of PMUs or sample overlapping increases the input data size and improves the classification accuracies. In contrast to that, a high post-disturbance time complicates the recognition of disturbances especially with short disturbance durations like short circuits or line outages. The sole use of voltage magnitudes and angles as input measurements reduces the training accuracy (−1.6%) but increases the test accuracy (+1.1%). Combining voltage phasors and frequencies has no significant impact on the test accuracy (−0.2%). The necessary calculations are performed in Python ® with additional packages for statistical analysis [68] as well as for creating and training of neural networks [70].
observed when decreasing (−3.1%) or increasing (+2.4%) the number of available PMUs, decreasing (−4.5%) or increasing (+4.1%) the sample overlapping of the training instances and increasing the post-disturbance time from 10 s to 40 s (−2.3%). A higher number of PMUs or sample overlapping increases the input data size and improves the classification accuracies. In contrast to that, a high post-disturbance time complicates the recognition of disturbances especially with short disturbance durations like short circuits or line outages. The sole use of voltage magnitudes and angles as input measurements reduces the training accuracy (−1.6%) but increases the test accuracy (+1.1%). Combining voltage phasors and frequencies has no significant impact on the test accuracy (−0.2%). The necessary calculations are performed in Python ® with additional packages for statistical analysis [68] as well as for creating and training of neural networks [70].

Synchrophasor Online Visualization Tool for Enhanced Operator Guidance
For a better integration into existing control center architectures, a web-based online visualization tool has been developed for the proposed synchrophasor application modules. The application results are written into a PostgreSQL ® database and continuously updated at the front end applying the Highcharts ® visualization framework. This enables a good interpretability of the results and gives the operator access to the online PMU streaming data as well as notifications for potential disturbances or critical system states. Some exemplary visualization charts for the

Synchrophasor Online Visualization Tool for Enhanced Operator Guidance
For a better integration into existing control center architectures, a web-based online visualization tool has been developed for the proposed synchrophasor application modules. The application results are written into a PostgreSQL ® database and continuously updated at the front end applying the Highcharts ® visualization framework. This enables a good interpretability of the results and gives the operator access to the online PMU streaming data as well as notifications for potential disturbances or critical system states. Some exemplary visualization charts for the disturbance detection module and disturbance classification module are given in Figure 24.

Synchrophasor Online Visualization Tool for Enhanced Operator Guidance
For a better integration into existing control center architectures, a web-based online visualization tool has been developed for the proposed synchrophasor application modules. The application results are written into a PostgreSQL ® database and continuously updated at the front end applying the Highcharts ® visualization framework. This enables a good interpretability of the results and gives the operator access to the online PMU streaming data as well as notifications for potential disturbances or critical system states. Some exemplary visualization charts for the disturbance detection module and disturbance classification module are given in Figure 24.

Conclusions and Future Work
In this study, a novel framework is presented with additional visualization capabilities for efficient processing and analyzing historical as well as online PMU data streams. Integrated into existing control centers, this framework provides valuable information for the automated online detection and classification of critical system states as well as the activation of suitable countermeasures to maintain secure system operation. The framework comprises of different application modules for efficient data processing (spatiotemporal data compression module), postmortem analysis (disturbance extraction module) and online recognition (disturbance detection and classification module). Compared to existing approaches, this framework enables a comprehensive situational awareness to recognize known disturbance events, which belong to simulated contingencies, or new disturbance events, which are revealed post-mortem from efficient analysis of large historical measurements. Within different case studies incorporating field measurements at distribution level as well as dynamic grid simulation data at the transmission level, the proposed application modules are evaluated. The investigations comprise of different recognition tasks including the spatio-temporal compression of measured PMU signals with an average compression rate of 5.8, the detection and extraction of different voltage sag and oscillation events from historical PMU voltage signals as new disturbance events, the computation of anomaly scores for abnormal PMU signals as well as the automated identification and localization of predefined contingencies

Conclusions and Future Work
In this study, a novel framework is presented with additional visualization capabilities for efficient processing and analyzing historical as well as online PMU data streams. Integrated into existing control centers, this framework provides valuable information for the automated online detection and classification of critical system states as well as the activation of suitable countermeasures to maintain secure system operation. The framework comprises of different application modules for efficient data processing (spatiotemporal data compression module), post-mortem analysis (disturbance extraction module) and online recognition (disturbance detection and classification module). Compared to existing approaches, this framework enables a comprehensive situational awareness to recognize known disturbance events, which belong to simulated contingencies, or new disturbance events, which are revealed post-mortem from efficient analysis of large historical measurements. Within different case studies incorporating field measurements at distribution level as well as dynamic grid simulation data at the transmission level, the proposed application modules are evaluated. The investigations comprise of different recognition tasks including the spatio-temporal compression of measured PMU signals with an average compression rate of 5.8, the detection and extraction of different voltage sag and oscillation events from historical PMU voltage signals as new disturbance events, the computation of anomaly scores for abnormal PMU signals as well as the automated identification and localization of predefined contingencies (e.g., short circuits at different line positions, generator and line trips, partial PV and load losses). High accuracies of about 95% can be achieved with a GRU based classification model and an attention based embedding function. Additionally, a web-based visualization is presented to integrate analysis results into subsequent decision processes for system operation. Future work would comprise the demonstration of a combined disturbance classification and automated selection of curative actions for critical system events. An evaluation of defined benchmark grids is preferable for a better comparison with similar research. Additionally, cyber-security issues should be taken into account in order to increase the robustness of the application modules in case of data manipulations or communication interruptions. A combined online compression and encryption of PMU data streams could be a promising solution for secure and effective data transmission.
Author Contributions: A.K. proposed the approach and prepared the manuscript with contributions by C.B. and C.M. under the guidance of S.N. and D.W. All authors have read and agreed to the published version of the manuscript.
Funding: This paper is a partly result of the project DGCC (03ET7541D) funded by the German federal ministry for Economic Affairs and Energy (BMWi) as part of the funding initiative "Future-proof Power Grids".

Conflicts of Interest:
The authors declare no conflict of interest. Nomenclature X, X N raw PMU measurement matrix, normalized PMU measurement matrix N number of PMU measurement samples H, h hidden state matrix, hidden state vector T, t number of PMU measurement time steps, single time step Q, P, K number of hidden dimensions, number of features, number of measurements α, s attention weight and score value x F , X F feature vector, feature matrix x P class probability vector z z-score vector µ, σ sampled mean, sampled standard deviation y, y Loc , y Type disturbance event, event location, event type θ F , θ E , θ C parameter for feature extraction, embedding, classification The F1-score is calculated using the positive precision rate PP and the right positive rate RPR as described in (A2). In multi-class settings macro-averaging represents the mean of all class-wise evaluation metrics whereas micro-averaging take into account all class samples to calculate an aggregated evaluation metric.