Power Line Communication and Sensing Using Time Series Forecasting

Smart electrical grids rely on data communication to support their operation and on sensing for diagnostics and maintenance. Usually, the roles of communication and sensing equipment are different, i.e., communication equipment does not participate in sensing tasks and vice versa. Power line communication (PLC) offers a cost-effective solution for joint communication and sensing for smart grids. This is because the high-frequency PLC signals used for data communication also reveal detailed information regarding the health of the power lines that they travel through. Traditional PLC-based power line or cable diagnostic solutions are dependent on prior knowledge of the cable type, network topology, and/or characteristics of the anomalies. In this paper, we develop a power line sensing technique that can detect various types of cable anomalies without any prior domain knowledge. To this end, we design a solution that first uses time-series forecasting to predict the PLC channel state information at any given point in time based on its historical data. Under the approximation that the prediction error follows a Gaussian distribution, we then perform chi-squared statistical test to build an anomaly detector which identifies the occurrence of a cable fault. We demonstrate the effectiveness and universality of our sensing solution via evaluations conducted using both synthetic and real-world data extracted from low- and medium-voltage distribution networks.


Background
Asset monitoring is critical for the safe and smooth operation of the electricity grid system [1]. The advent of the smart grid, which allows for bidirectional data exchange between the utility and the consumer [2][3][4], unfolds a new paradigm of solutions for smart grid sensing and infrastructure monitoring to improve system resilience of the grid. Among various smart grid data communication solutions, power line communications (PLC) use the existing power line infrastructure, which was originally designed to transport electric energy and for communication purposes. In this paper, we further propose to re-use PLC modems for cable diagnostics as a use case of joint communication and sensing [5] for the smart grid. This provides the benefits of realizing a low-cost solution that can operate in an online, independent, and automatic manner without requiring any new component installations [6][7][8][9][10][11].
PLC is a commonly used solution to enable information and communication technology for the smart grid [12][13][14]. Power line modems (PLMs) that transmit and receive smart grid data constantly estimate the power line channel state information (PLCSI) for adapting their operation. In this context, we refer to PLCSI as any parameter that conveys the channel behavior, either directly, e.g., channel frequency response (CFR) or access impedance, or indirectly, e.g., signal-to-noise ratio (SNR) or precoder matrix. Prior articles have shown that this estimated PLCSI also contains information that can be used as a smart grid sensing solution to infer cable health conditions [6][7][8][9][10][11]. We build on these methods to propose an enhanced cable diagnostics solution using time series forecasting.

Related Works
To learn from and contrast our proposed solution to prior articles, we present a brief overview of the existing body of literature in the broad area of diagnostics by categorizing them into the following topics.

Legacy Cable Diagnostics
Some legacy cable diagnostic solutions, e.g., reflectometry-based methods, require deployment of specialized equipment and/or personnel to conduct the tests [15,16], ([17] Ch. 6), [18]. Furthermore, several non-PLC solutions that sample the electrical signal with a lower frequency, such as phasor measurement units, suffer from noisy data impacted by electrical disturbances, and are unable to discern precise information about cable defects, as seen in, e.g., [19] to determine the age of degradation or accurate location of the position of a fault [16,19]. Our proposed PLC-based monitoring technique, on the other hand, reuses the high-frequency broadband communication signals as probing waves to provide effective cable diagnostics similar to other PLC-based diagnostic solutions as presented in [6,20].

PLC Cable Diagnostics Solution
Various PLC-based solutions have been developed in the past to exploit the throughthe-grid nature of PLC signals to gain insights into the characteristics of the transmission medium [6][7][8][9][10][11]21]. Many of the proposed PLC-based diagnostic solutions typically require a reference healthy measurement, i.e., PLCSI of the exact type of the deployed cable that is undamaged (e.g., [8][9][10]). PLCSI estimated within the PLM is then compared against this reference measurement to infer the health of the cable. Aside from practical challenges associated with employing such a method in the real world, these methods also provide less than reliable results in anomaly detection since the load conditions are constantly fluctuating, which renders it hard to distinguish between benign and malicious PLCSI changes, e.g., those that are caused due to load variations as opposed to grid anomalies. Alternatively, data-driven methods that were designed to use machine-learning (ML) techniques to intelligently detect and assess cable health are resilient against such challenges [6,7,22]. These methods harness ML classification and regression techniques to detect, locate, and assess various smart grid network anomalies, such as cable degradation and faults and network intrusions [6,7,20,23]. One notable work is [24], which extracts features from the PLM SNR spectrogram and conducts unsupervised clustering for anomaly detection. However, these methods are not universally applicable since the machines used here are typically trained under a specific operating network topology to detect a few known types of characterized anomalies. When the machine is deployed under a different network topology or is applied to detect a type of anomaly it has never encountered in the process of training, the performance of these solutions suffer significantly. Our proposed universal cable anomaly detector counters these drawbacks by processing the collected PLC signals and/or PLCSI data using time-series forecasting to determine a variety of anomalies in a smart grid network. Unlike the above methods, our solution does not demand stringent machine training regimens that are heavily reliant on the accuracy of the network topology emulation or load modeling.

Time-Series Prediction for Diagnostics
Our proposed solution includes a two-pronged approach with a time-series forecasting followed by the anomaly detection procedure. Prior articles on the former use auto-regressive integrated moving-average (ARIMA) and NN-based forecasting to provide accurate prediction results for a variety of applications such as gearbox fault trend prediction [25], peak load value forecasting [26], and other seasonal and time-series trend analyses [27][28][29][30]. The issue with these prior articles is that they fall short of the task that we consider, i.e., detecting anomalies using the predicted data. We nevertheless borrow approaches from the aforementioned works for our first task of time-series forecasting, and then further build on the work to use the results from the prediction to determine possible anomalies by comparing them against practical values obtained from a device under test. To this end, forecasting approaches developed in the past form a solid starting point for our proposed work.

Cable Diagnostics Solution Using Other Techniques
Several cable diagnostic solutions have been historically proposed that use high-frequency probing or testing signals, for example, using time-frequency reflectometry [31][32][33][34][35][36]. Such techniques have been shown to be efficient at detecting abrupt cable anomalies, e.g., a cable fault introduced due to neighboring infrastructure upgrade, such as a neighboring line drilling. However, these methods are not suitable for diagnosing cable anomalies that progressively develop over a longer period of time. For instance, some incipient faults do not typically manifest as distinct changes in waveform responses observed between relatively closer instances of time, and could instead take a considerable amount of time to be seen as a noticeable change in the probing waveform. Our solution is applicable to both these types of faults and can also be used for capturing anomalies that tend to manifest over a longer duration of time.

Diagnostics Solution for Other Systems
A vast body of literature exists in the broad domain of diagnostics that neither use time-series forecasting-based techniques nor are targeted at detecting anomalies for cables. The works in [37][38][39][40][41] are some of the classical works to name a few. While we recognize the existence of these prior articles that focus on a problem similar to that of ours, techniques of little relevance are found in these works that can be used or adapted to counter the challenges we face in our considered setup. We highlight that most of these works are predominantly suitable for detecting, locating, and/or assessing abrupt faults that manifest as noticeable response change in a short duration time. In contrast, our time-series prediction-based diagnostics solution is a powerful method to detect a host of anomalies of varied nature without requiring prior domain knowledge of the infrastructure under test.

Contributions
In this paper, we develop a general purpose cable anomaly detector. This means our developed cable anomaly detector does not require any reference measurements from healthy cables and is universally applicable. Our design is fully agnostic to the nature of the anomaly, i.e., its physical behavior and its cause, and to the infrastructure configuration, such as cable type or network topology. To this end, we propose the use of historical PLCSI, in particular the SNR as we will explain below, for the link between a transmitting and a receiving PLM, to train a time-series predictor. By treating the time-stamped PLCSI as time-series data, we use time-series forecasting to predict the PLCSI at any given point in time using historical data by exploiting the knowledge that the network topology, cable configuration, and the physical properties of the cable are relatively stable for extended periods of time. In addition, since the long-term load (throughout this paper, we refer to load as the impedance load connected to a circuitry, as opposed to the electricity consumption load of a grid) conditions are closely related to their historical values, the PLCSI is also correlated in time and can be predicted using historical state information [42]. We then compare the predicted response against the actual response estimated by the PLM to detect a potential anomaly.
The performance of our solution relies heavily on the accuracy of the predicted PLCSI. With a highly accurate prediction, the detector would be capable of detecting even subtle faults, which might not be discernible if the prediction itself is noisy. To this end, we investigate a range of possible candidates for forecasting, including classical approaches such as the ARIMA model ( [43], Ch. 4) and feed-forward neural networks (FFNNs) [30], and also relatively recently developed techniques such as the long short-term memory (LSTM) model [44]. Furthermore, owing to its success in previous PLC-based cable diagnostics [6,20], we also evaluate the use of least-square boosting (L2Boost) [45].
The second factor of consideration toward building our solution is the design of the cable anomaly detector based on the predicted and the measured PLCSI values. PLMs estimate a range of instances of PLCSI for adapting data communication in a time varying environment. Some of the estimated PLCSIs that shed direct light on the channel and in turn on the cable health are the end-to-end CFR, access impedance, precoder matrix, and self-interference channel impulse response [46][47][48]. However, several existing PLM chip-sets are unable to extract these parameters in their entirety without additional firmware modifications [7]. In light of this, we consider the use of SNRs instead, which can be readily extracted from current-day PLM chip-sets [7] and can be used for processing either locally within the PLM or reported to a common location by all PLMs, e.g., a substation, for centralized data processing. The challenge lies in differentiating between a cable anomaly and an inaccurate prediction. Furthermore, the use of SNRs as PLCSIs introduces an added challenge of designing a cable diagnostics solution with poor quality information, since SNRs contain lesser insight into the power line medium when compared to, e.g., CFR, together with being distorted by ambient noise. To counter these impacts, we exploit the orthogonal frequency-division multiplexing (OFDM) nature of broadband PLC transmissions [49]. We first divide all the OFDM subcarriers into several groups and average the value of SNRs across all subcarriers within each group. This stabilizes the group SNR average, which then in turn also makes it more accurately predictable. With the approximation that the prediction errors across the subcarrier groups follow a multi-variate Gaussian distribution as in [50,51], we determine a probable occurrence of an anomaly event based on the significance level of the squared Mahalanobis distance (SMD) [52]. The significance level can be determined either empirically from the training data or theoretically from a chi-squared test [53].
We verify the feasibility and the effectiveness of our proposed schemes through numerical evaluations using both synthetic data and in-field collected data. For the former, we use a bottom-up PLC channel emulator to generate the SNR time-series data, which allows us to investigate the performance of our proposed solution under various types of cable anomalies in a customized and a controlled environment. The in-field data obtained from [54] further allows us to verify our proposed schemes in the real-world, which indicates the performance of our proposed technique in practice.
To summarize, in this work, we propose a general purpose cable anomaly detector using improvised grid sensors, i.e., PLMs. Our proposed solution does not require any prior domain knowledge of the infrastructure under test and is universally applicable. We design our anomaly detection schemes based on the techniques of time-series forecasting and the statistical test of prediction errors. We verify the feasibility and effectiveness of our proposed schemes through numerical evaluations using both synthetic and in-field collected data.

Paper Organization
In Section 2, we formulate our time-series forecasting problem and elaborate our techniques for the time-series prediction. Further, in Section 3, we introduce our cable anomaly detection scheme based on the time-series prediction results, including the construction of stabilizer group and the detection based on the use of SMD for the chi-squared test. We present our findings from our case studies of our proposed schemes in Section 4. Supplementary analyses, including a robustness test and discussion of incipient faults, are provided in Section 5. Section 6 concludes the paper.

Time-Series Forecasting
We begin by presenting a brief overview of time-series prediction by focusing on the pertinent algorithms that we consider for our proposed method. This helps us in understanding the performance of the SNR forecasting using time-series data. We note that the SNR for the link between two PLMs that is measured in the receiving PLM is the ratio of the power of the received data signal of interest to the power of the background noise. As we will discuss in Section 3.1, the SNR is typically measured for each OFDM subcarrier in a PLM. Since this is not significant for the discussion of the prediction algorithms, we omit the subcarrier index for the moment.

Time-Series Data for Cable Anomaly Detection
The time-stamped SNR between a transmitter-receiver PLM pair is denoted as x j , where j is the integer discrete time index. We formulate our problem as using windowed instances of x j , where n − w ≤ j < n, to predict x n and obtain the predicted value asx n , with w being the window size. While a larger w can improve the prediction performance, we note that increasing w requires more training data to determine the predictor parameters, leads to a higher computational complexity for prediction, and only provides incremental benefits beyond a certain point. Hence, the value of w should thus be determined by considering the trade-off between benefits and costs.
Among the available samples of x j , we use x j , where j ≤ n tr , to train the time-series predictor, where n tr is the number of samples used for training. Once the model is trained, we then use it to predictx j , where j > n tr . We use the normalized root mean square error (RMSE), η, as the performance indicator of our prediction, which is computed as where µ x is the sample mean of the observations of x j for n tr + 1 ≤ j ≤ N, and N is the total number of x j samples used for training and testing. Typically, we would want a large value of N, so as to saturate the training of the predictors' parameters and to reliably measure their performances.
To compare the performance of our ARIMA and ML-based predictors against a baseline approach, we consider a simple extrapolation, In the following, we discuss the use of different time-series forecasting methods for predictingx n . We defer to Section 4.2 for the procedure to choose suitable time-series prediction models to use for our anomaly detection, depending on the nature of the data used for our diagnostics scheme.

ARIMA
The ARIMA model is a classical time-series predictor that has successfully been used across various domains of application, including financial series prediction, demand and load prediction in the power generation and distribution industry, and customer sales prediction ( [43], Ch. 1). An ARIMA model is specified by its order and its associated parameters. A (p, d, q) ARIMA model is a pth order auto-regressive, qth order movingaverage linear model with dth order of difference. A (p, d, q) ARIMA model has p autoregressive terms with p auto-regressive coefficients and q moving-average terms with q moving-average coefficients. A dth order difference is generated using d subtraction operations, i.e., u d, The resultant time-series after difference is then assumed to be a (p, q) auto-regressive moving-average model, which is a linear model with p auto-regressive terms and q movingaverage terms, which is specified by where φ i are coefficients for auto-regressive terms, θ i are coefficients for moving-average terms, and a j is the random shock terms drawn independently from a Gaussian distribution having zero mean and variance σ 2 a .

Least-Square Boosting
As our second time-series predictor candidate, we investigate L2Boost, which has been shown to be successful in the past, specifically for cable diagnostics [6,20]. L2Boost is a popular ML technique used for supervised regression tasks [45]. It is one of the meta-ML algorithms which works by consolidating multiple weak learners into a strong learner [55]. It applies the weak learners sequentially to weighted versions of the data, where a higher weight is allocated to examples that suffered greater inaccuracy in earlier prediction rounds. These weak learners are typically only marginally better than random guessing but are computationally simple. Boosting is also known to be robust to over-fitting, and can be efficiently executed since it is a forward stage-wise additive model.
To use the L2Boost for time-series prediction, we organize the SNR time series into a labeled data set for the supervised learning. For the training data set, i.e., x j , where 1 ≤ j ≤ n tr , we prepare each sample with input x j = (x j , x j+1 , . . . , x j+w−1 ) and its associated label y j = x j+w , where j + w ≤ n tr . We then prepare the testing samples in a similar way with input from x j to x j+w−1 and its associated label as x j+w , but with j > n tr .

Feed-Forward Neural Network and Long-Short-Term-Memory
As our last set of predictor candidates, we investigate the use of two types of artificial neural network (ANN) models, FFNN and LSTM. Despite the absence of feature engineering, ANNs can still explore the inherent structure of the input data, which could be hidden and/or complex. The architecture of ANN is flexible with varying number of hidden layers and neurons in each layer. To use ANNs for time-series prediction, we organize the PLCSI values into a labeled data set of x j and y j for the supervised learning the same manner as in Section 2.3.
While the FFNN has a plain architecture, where the output of the previous layer is fed as the input to the current layer, i.e., feed-forward from the input layer to the output layer, the LSTM has a feed-back mechanism, where the output of the current layer at the last time stamp together with the output of the previous layer at the current time stamp are fed as the input to the current layer at the current time stamp. For the LSTM model, the feed-back of the current layer from the last time stamp is controlled by a forgetting gate and the output of the previous layer at the current time stamp is controlled by an input gate. The forgetting gate controls how much previous information memorized by the LSTM machine is forgotten and the input gate controls how much new information from the input layer is passed through the LSTM machine. Such a feed-back mechanism is capable of capturing a long-term time dependence relationship and suitable for a variety of time-series prediction tasks. When such a long-term time dependence relationship is not present, using FFNN in place of an LSTM machine can reduce the risk of over-fitting.

Cable Anomaly Detection
In this section, we present the design of the cable anomaly detector. We consider PLMs deployed in a smart grid that communicate with each other through the grid's power lines. For a link between two PLMs, which could be part of a longer multi-hop PLC connection, the receiving PLM measures the SNR. The current SNR measurement and its predicted value, which are obtained from previous measurements and time-series forecasting as described in Section 2, are input to the anomaly detector. An anomaly located between the two PLMs is identified by comparing the difference between the measured and the predicted SNR against a threshold, as will be explained in more detail below. The objective of the anomaly detector is to maximize the detection rate while simultaneously not exceeding a set probability of false alarm (FA).
We note that we use the term "cable anomaly" fairly broadly, as there are various factors that affect the communication quality as quantified by the SNR. First, while our work, and in particular our numerical results presented in Section 4, focus on underground cables, the presented concept is also applicable to overhead power lines. Second, since the PLC signal is actively probing the power line infrastructure, our method is able to detect anomalies resulting from physical degeneration of cables. We will describe several models for such a degeneration in Section 4.3. However, anomalies in other infrastructure elements such as cable joints or fuses would also affect the SNR, as well as changes in the noise environment due to, e.g., imminent equipment failures.

Data Preparation
For the overwhelming majority of broadband PLC transceivers that use OFDM transmission [49], SNRs are measured individually at each subcarrier. To stabilize the time-series SNR data, we divide all the OFDM subcarriers into multiple groups called stabilizer batches. We then average the SNR across all individual subcarriers within each stabilizer batch. This procedure of averaging within a stabilizer batch ensures that the time-series SNR data are more predictable when compared to using SNRs of individual subcarriers. In this regard, it is essential to have the subcarriers within a batch to be contiguous. This ensures that, within each stabilizer batch, the variations in individual SNR values are only gradual and the impacts of cable anomalies on the individual subcarrier SNRs are similar in nature.
This process results in several stabilizer batches, and the time-stamped average SNR values in each individual stabilizer batch are treated as a set of time-series data. We denote z i = {z i,j }, 1 ≤ i ≤ n SB to denote the time series of the average SNR of the ith stabilizer batch, where n SB is the number of such stabilizer batches. For every ith stabilizer batch, we use the candidate forecasting models described in Section 2 to develop a time-series predictor F i to predict the average time-series SNR,γ i,j . The input to the predictor is the windowed time series with the samples corresponding to j + w ≤ n tr used during training and those corresponding to j + w > n tr used while testing. Hence, the prediction isγ i,j = F i (v i,j ) while the true label is γ i,j = z i,j+w .

Detection Using Squared Mahalanobis Distance
To detect an anomaly we consider the difference in the predicted SNR to the one measured by the PLM, The measured SNR is a random variable affected by several aspects in the grid such as loads that are random in nature. The prediction error is a result of the linear (ARIMA) or non-linear (L2Boost, FFNN, LSTM) superposition of several such measured SNRs. Therefore, we approximate δ i,j as a multi-variate Gaussian distribution as in [50,51], which is stationary over j, with mean µ and covariance matrix Σ. With δ j = [δ 1,j , δ 2,j , ..., δ n SB ,j ] T , we compute the SMD as D 2 MA follows a chi-squared distribution with a degree of freedom of κ = n SB . Then, following the theory of chi-squared statistical test [53], for a significance level of α, we define the quantile function of the chi-squared distribution with a degree of freedom κ, as χ 2 κ (·), i.e., where Pr(·) is the probability function. Finally, for a chosen target FA rate of p FA , our anomaly detector declares a warning of a potential cable anomaly when where the threshold T r (p FA ) is determined according to the corresponding significance level by

Design and Case Studies
We now highlight the performance of our proposed cable anomaly detection by applying it to two different types of data sets, one generated synthetically and the other collected in-field, and we describe the design details involved.

Data Sets In-Field Data
We acquire in-field measurements from the data made available to us by the author of [54]. The data were measured using 24 BB-PLC modems installed in the low-voltage (LV) sector of a distribution network and 12 BB-PLC modems in the medium-voltage (MV) grid. The PLMs in the LV network were placed at house connection and at distribution points, and several point-to-multi-point connections were possible. Overall these formed 22 transmitter-receiver pairs with 44 bidirectional data transmission links. The PLMs in the MV network established point-to-point connections, and we thus have 6 pairs with 12 PLC links. The SNR data for PLC-links were measured by the receiving PLMs every 15 min over 917 OFDM subcarriers spaced 24.414 kHz apart. This data collection spanned an overall time period from 17 to 21 months.
Due to limitations in generating flexible observations and anomalies in practical grids, the in-field data consist of only two recorded instances of network anomalies. Furthermore, although information of the cable type, length, and the biological age of the cables are provided in [54], there is limited information available on the operation condition during the field test. Therefore, for a comprehensive evaluation, together with using the in-field collected data, we also use synthetic data sets obtained from constructing a PLC network and generating SNR time-series data using the bottom-up approach.
For consistency between the two types of data sets, we borrow several network settings for generating the synthetic data from the in-field measurement campaign. We generate samples for the SNR between a pair of PLMs for every 15 min over a period of 664 days. For this, we adopt the same OFDM parameters as listed above, a transmit power spectral density (PSD) of −50 dBm/Hz, and a noise PSD of −120 dBm/Hz, which are typical values for PLC. The channel is generated with an emulator [56] for the constructed PLC network, which we choose as a T-topology as shown in Figure 1. The SNR measurements are performed at PLM-2 for the link between PLM-1 and PLM-2, i.e., anomalies anywhere along the hop from PLM-1 to PLM-2 including in the branch leading to PLM-3 are targeted. The three PLMs are connected through multi-core N2XSEY HELUKABEL cables with crosslinked polyethylene (XLPE) insulation, whose configuration and parameters can be found in ( [57], Table 2). Load impedances (LIs) which represent the power grid extending outward from the T-topology are connected in parallel to the PLMs. That is, the LIs are the equivalent impedances of the surrounding grid experienced at the edges of the T-topology. We consider three types of time-series LI models to emulate the temporal dependence of electrical loads, motivated by seasonal and auto-regressive properties of loads in the mains frequency [42]. We denote the load value at discrete time index j of the LI model k, k ∈ {1, 2, 3}, as L k,j . For k = 1, 2, we apply a second-order auto-regressive model and a cyclic model with one day per cycle, respectively. Furthermore, we add random shocks, r k,j , to the models to introduce a degree of randomness in the load variations. As a result, our first LI model is where L 2,j is a summation of a set of sine and cosine terms, each with its frequencies being harmonics of a set fundamental frequencies. We set the cycle corresponding to the fundamental frequency to be one day. We then set the third model to be as a hybrid of both the auto-regression and the cyclic behaviors.

Time Series Prediction for Studied Data Sets
In this part, we develop the time-series prediction solutions for our studied data sets using the candidate models described in Section 2. For the in-field-data cases, the SNR data were measured by the PLMs every 15 min spanning an overall time period from 17 to 21 months, so that between N = 45, 000 to N = 65, 000 data samples were available to us. For the synthetic-data case, we generated N = 63, 702 data samples, which is the same size as the two in-field collected data sets with the documented events. These values are sufficiently large so that the performances of the trained time-series predictors do not improve with the further increase of N. We set the window size to w = 20, as we observed from our numerical study that this window size generally results in a favorable prediction performance with a moderate model complexity. For the subcarrier grouping, we observed eight notches in the frequency band considered for the PLC data transmission for the in-field data set. Hence, setting n SB = 9 follows quite naturally as discussed in Section 3.1. This number also provides for sufficient averaging of per-subcarrier SNRs to obtain a smooth time series for the average SNR in each subcarrier group.

ARIMA
We consider ARIMA models for all combinations of p, d, q, where 0 ≤ p, d, q ≤ 2, which is known to be sufficient for most practical time-series prediction tasks ( [43], Ch. 6). Discarding the case of p = q = 0, we investigate a total of 24 candidate ARIMA models.

L2Boost
We choose the hyper-parameter, k total , which represents the total number of iterations as k total = 50, 100, 200. We make the choice considering that for smaller values of k total , the resultant trained model has a lesser representation power but also a lower risk of over-fitting.

ANN
Given the small input size to the NN, i.e., the window size w, we consider a simple architecture with one hidden layer with eight neurons for the FFNN and the LSTM models. For the FFNN and LSTM, we use the sigmoid function and hyperbolic tangent as the activation functions for the hidden layer, respectively. The purpose of the activation function for the hidden layer is to implement a non-linear transform so that non-linear relationship between the output and the input can be learned by the ANN.

Results
Our aim is to develop a time-series predictor that can predict future values as accurately as possible when the system is operated under normal conditions, i.e., without anomalies. Thereby, an anomaly produces a pronounced deviation between the actual value and the predicted one. Therefore, in this part of the study, the training and testing data for the synthetic data sets only contain the SNR values when the cable is under normal operating conditions. For the data from field measurements, we stipulate that most of the data were collected when the cables were operated under the normal condition with only occasional values corresponding to anomalous conditions. We use n tr = 0.8 N and the remaining samples for testing the performance of the time series predictor. The performance of our chosen set of time-series predictors are shown in Table 1, where the results are presented for the SNR of the first stabilizer batch. Exp MV and Exp LV are the two in-field collected (or experimental) data sets for the MV network and the LV network for which instances of network anomalies have been recorded. The MV link consists of a 376 m three-core cable, and the LV link comprises a four-core cable over 69.4 m long Syn 1 , Syn 2 , Syn 3 are the synthetic data sets generated using the three LI models L k,j for k = 1, 2, 3. For brevity, we only present selective results for ARIMA models. From Table 1, we can observe that FFNN, LSTM, L2Boost and some ARIMA models match or improve the performance over the baseline setting. Moreover, the LSTM model shows the best performance across the data sets that we have investigated, supporting its suitability to time-series prediction tasks. Similar results were obtained for other subcarrier groups.
We also note from Table 1 that the performance of the baseline model is often fairly close to those from other time-series prediction models. This is highlighted in Table 2, which shows the results for the ARIMA(2, 0, 1) and the baseline models for all subcarrier groups. Therefore, since the baseline predictor does not require any training and presents no additional computational complexity (see (2)), the anomaly detector can begin prediction with this technique until sufficient samples are collected over the operation to use other predictors that require a meaningful set of training data. From the results in Table 2 we also observe a larger variation in the prediction accuracy across subcarrier groups for the in-field compared to the synthetic data. We attribute this to the same LI models and noise PSD used across the subcarrier groups, which is likely not the case regarding the in-field data.

Anomaly Detection for Studied Data Sets
In this section, we develop and test our anomaly detector for the studied data sets. According to the discussion in Section 3.2, we approximate the prediction errors for the average SNR values as a multivariate Gaussian distribution, with a dimension of nine since we have nine stabilizer batches in total. We then calculate D 2 MA using (6) and use (9) for the anomaly detection with varying p FA and κ = n SB = 9.
The only available recorded anomaly events for the in-field data in [54], are the switching operations at the 20th day in the data set Exp MV and the fuse failure at the 156th day in the data set Exp LV . For each stabilizer batch, we compute the average SNR data and calculate the SMDs based on the prediction errors. The results for data set Exp MV and data set Exp LV are shown in Figure 2a and Figure 2b, respectively. The two documented events are clearly seen in these two figures as notable spikes. To relate this result with the observed raw data, we present the SNR color maps for the two data sets in Figure 3a,b. It is also clearly noticeable from the figures that there are multiple (undocumented) anomalies in the LV data in Figure 3b, which are rightly represented as notable spikes in the SMD plot of Figure 2b. The higher rate of the indicated abnormal events in LV networks in comparison with MV networks can be attributed to the increased presence of interference and higher disturbance levels in an LV network.
While the two documented events in the in-field data provided us the opportunity to test the performance of our solution using real-world data, the exercise does not provide a comprehensive evaluation of our method, especially for different types of cable anomalies and operation under various load types and load changes. To this end, we use synthetic training and testing data sets obtained from the network and LI models constructed as explained in Section 4.1. This provides us the flexibility to choose a variety of load and anomaly types to investigate the robustness of our method. We identify three main categories of anomalies, similar to those in [58], which are, concentrated faults, distributed faults (DFs), and abnormal termination impedance changes. We emulate a concentrated fault by inserting a fault resistance r f between a pair of conductors at the fault point. Such a line-line fault is the most common among all types of hard faults [59]. This process can also be extended by placing a fault impedance r f between each pair of conductors to emulate a symmetrical fault. To emulate a DF, we increase the perunit-length (PUL) resistance of the conductor and the PUL conductance of the insulation materials over a section of the cable that is affected by this degradation. For many types of DF, the conductors have a deteriorated conductance and the insulation material has degraded insulation property [60], which we emulate by this process. Finally, to emulate the abnormal termination impedance changes, for our synthetic generators, we change from one LI model to another, among the three that we use, over a period of time, e.g., one hour for four samples.
We first present the results of the change in SNR values with the introduction of a concentrated fault. We introduce a fault impedance r f = 100 Ω between a pair of conductors at a location that is 100 m from the PLM transmitter, i.e., PLM-1 in Figure 1. We show the impact of this in Figure 4 by contrasting the average SNR change of one stabilizer batch for the condition of concentrated fault in Figure 4a with LI model 1 and a termination impedance change from LI model 1 to LI model 3 in Figure 4b. It is clearly visible that these variations cause a significant, noticeable, and distinctive change in the measured SNR values. As a result, we focus on the more challenging case of DF in the following.  We introduce three different types of DFs, a slight DF, a mild DF, and a medium DF. We emulate each of these three conditions by increasing the PUL serial resistance and shunt conductance of the cable by 10%, 20%, and 60%, respectively, to emulate different extents of cable degradation [61]. We introduce the DF over a 300 m section of the cable with the starting point of the faulty section being at a distance of 100 m away from PLM-1.
The average SNR values of the first stabilizer batch over time, as shown in Figure 5a,b, signify that detecting a DF is more challenging than a hard fault. We employ our anomaly detection procedure, and accordingly compute the SMDs, as illustrated in Figure 6, where the faulty events are indicated as distinctive spikes in the middle. We then determine the anomaly detection thresholds with an FA rate p FA either theoretically using (9) or empirically through the training data. For the empirical determination, we sort |D 2 MA | for the training data prediction difference in the descending order as d i from d 1 to d (n tr −w) . We then compute the threshold as where · is the floor function.
Choosing the threshold involves a trade-off between the detection rate p DT , i.e., the probability that an anomaly can be successfully detected, and p FA , i.e., the probability that a normal condition is identified as an anomaly. An increase in detection rate is typically accompanied by higher FA rates. We show this behavior in the receiver operating characteristic (ROC) curve for our anomaly detection solution in Figure 7. Since the performance of our method for the cases of mild and medium DFs are nearly ideal for all candidate forecasting choices, we only present ROC behaviors for the more challenging case of slight DF. We generate 100 different test cases, where in each case, we introduce a slight DF in the middle of the time series. The blue single-step (SS) curve in Figure 7 is the baseline prediction method in (2), and AVG is an alternative trivial prediction scheme that uses the average of the training data as the predicted value at all times. We observe from Figure 7 that ARIMA and baseline predictors provide the best detection performance, as also evidenced in Table 1 for prediction performance. However, anomaly detectors using LSTM or other data driven time-series predictors demonstrate worse performance than even the AVG predictor for the case of slight DF. We observe that data driven time-series predictors, including LSTM, FFNN and L2Boost, have good prediction performance both before and after the slight DF is introduced. This shows that they adapt better to the case of faulty cable condition. Such generalization ability to unobserved data with a slight difference from the training data is a detriment to anomaly detection as it does not produce a distinct change of the prediction error after the slight DF is introduced. For more distinct DFs however, e.g., mild and medium DFs, anomaly predictors using data driven time-series predictors and those using the classical ARIMA models have matched performance. For an FA rate of p FA = 1%, we obtain the threshold as T r (p FA ) = 21.67 theoretically using (9), or empirically using the training data and (13) as T r (p FA ) = 23.91 as an alternative. For the generated test cases, the threshold to achieve an FA rate of p FA = 1% is T r (p FA ) = 23.75, which is very close to the threshold determined theoretically using (9) or empirically using the training data. This shows that both theoretical and empirical approaches are viable methodologies to determine the threshold T r (p FA ).

Supplementary Evaluation
In this section, we further investigate the suitability of our proposed solution in practical scenarios. In particular, we address two challenges faced in practice, which are the lack of available data for training and the identification of cable anomalies that are gradual in nature, such as an incipient fault.

Robustness Test
Our evaluation campaign in Section 4 involved using historical SNR time-series data for training and prediction. This type of data collection is suitable in fixed asset monitoring. However, we investigate the suitability of using our solution as a dynamic diagnostics technique, where a machine is trained to detect anomalies on one type of a network and required to function on another type. This expands the scope of our proposed solution to make it more universally applicable, where, e.g., the SNR data from one pair of transmitter and receiver can be used to detect anomalies in networks operating in a different portion of the grid. A likely more beneficial use-case is to train the machine using synthetic data extracted from a best-guess estimate of the network-under-test and to use it in a real-world network to detect cable anomalies. We conduct both these investigations and present the performance results in Figure 8a,b. In the first evaluation, we train the machine using SNR data extracted from the dataset Exp MV2 , another MV experimental dataset from [54], and test it over the data set collected in a different portion of the MV network, i.e., dataset Exp MV . The result in Figure 8a shows a clearly discernible spike in the SMD plot, which is easily detectable by our anomaly detector with little/no FA. The adjacent Figure 8b demonstrates that training the network with synthetic data, which were generated using L 3 according to the procedure explained in Section 4.1, and testing it with the in-field collected data set Exp MV is also able to detect network anomalies. The results from Figure 8a,b indicate the robustness of our solution to variations between training and application data.

Incipient Fault
Our investigations in Section 4 considered faults that are abrupt, i.e., occurring to their full extent at one instant of time. However, the cable may also be susceptible to an incipient fault, which is introduced gradually over time. We emulate such a condition by generating a 132-day time sequence, where the incipient fault begins to develop on the 66th day. We quantify the severity of the fault after the 66th day by γ(t) ∝ t, where t is time in seconds. We increase the PUL serial resistance and PUL shunt conductance by a factor of γ(t) between γ(t) = 0 on the 66th day to γ(t) = 2 on the 132nd day. We place the incipient fault on a cable section of 300 m whose starting point is 100 m from the transmitter PLC modem, i.e., PLM-1, and use L 1 to generate our synthetic SNR data. We train the predictor using normal operating conditions, i.e., without the incipient fault, and then use ARIMA(2, 1, 1) for time-series forecasting. The resultant SMD for the generated incipient fault case is shown in Figure 9. The SMD plot shows spikes indicating a fault from the 66th day onward and whose magnitude increases as time progresses. Naturally, the choice of the threshold determines how quickly an incipient fault can be detected and what the FA rate is that is sacrificed in the process. This decision would be made based on the operating scenario.

Conclusions
We have designed a first-of-its-kind PLC-based universal cable anomaly detector as a smart grid sensing solution using time-series forecasting and statistical test of prediction errors. Our low-cost solution repurposed PLC modems as sensors to also enable monitoring of the grid system to ensure its smooth operation and improve its resilience by reusing the channel state information inherently estimated by the modems. Our method, which combines forecasting with the post-processing of prediction errors based on Mahalanobis distance, produces a robust cable anomaly detection performance. Further, our solution is applicable across various network conditions and can operate without prior domain knowledge of the anomaly, network topology, type of cable, or load conditions.