A Low-Complexity Model-Free Approach for Real-Time Cardiac Anomaly Detection Based on Singular Spectrum Analysis and Nonparametric Control Charts

: While the importance of continuous monitoring of electrocardiographic (ECG) or photoplethysmographic (PPG) signals to detect cardiac anomalies is generally accepted in preventative medicine, there remain numerous challenges to its widespread adoption. Most notably, difficulties arise regarding crucial characteristics such as real-time capability, computational complexity, the amount of required training data, and the avoidance of too-restrictive modeling assumptions. We propose a lightweight and model-free approach for the online detection of cardiac anomalies such as ectopic beats in ECG or PPG signals on the basis of the change detection capabilities of singular spectrum analysis (SSA) and nonparametric rank-based cumulative sum (CUSUM) control charts. The procedure is able to quickly detect anomalies without requiring the identification of fiducial points such as R-peaks, and it is computationally significantly less demanding than previously proposed SSA-based approaches. Therefore, the proposed procedure is equally well suited for standalone use and as an add-on to complement existing (e.g., heart rate (HR) estimation) procedures.


Introduction
The ubiquity of powerful smartphones and other smart devices, which nowadays incorporate a plethora of advanced sensing capabilities, has led to an increasing trend in the consumer sphere to continuously gather and evaluate physiological signals [1,2].In particular, cardiovascular parameters such as heart rate (HR) and pulse rate (PR), extracted respectively from measurements of myocardial electrical potentials through electrocardiographic (ECG) observations [3][4][5][6] and from measurements of volumetric changes in blood perfusion during cardiac cycles by optical means through photoplethysmographic (PPG) observations [7][8][9], are being recorded and analyzed in apps, fitness trackers and so forth, with great potential benefits for public health [1,[10][11][12].
Virtually all of such consumer-oriented apps and devices fall into the category of fitness and well-being products, thereby avoiding the substantial burden of having to comply with requirements imposed by medical device regulatory frameworks [13,14], which is reflected by often notoriously inaccurate and unreliable results [15,16].Moreover, functionality is usually limited to providing estimates of the average HR.
While low-resolution averaged HR estimates may be of some use in fitness and well-being scenarios, from a clinical perspective, the detection of sudden changes in the signal structure is of the utmost importance.ECG recordings from a healthy heart are characterized by a sinus rhythm, wherein the normal cardiac cycle begins with an action potential in the sinoatrial (SA) node, located in the right atrium, which propagates and depolarizes neighboring cells.The depolarization of the SA node spreads rapidly throughout both atria, specifically to the left atrium through Bachmann's bundle and through internodal pathways in the right atrium to the atrioventricular (AV) node.Full depolarization gives rise to the P-wave, which initiates atrial contraction.From the AV node, excitation is further propagated after an initial delay of about 100 ms through the bundle of His, which splits up into the right bundle branch and the left bundle branch, initiating respectively the depolarization of the right and left ventricle, yielding to the conspicuous QRS-complex, which ends with completely depolarized and contracting ventricles.Ventricular repolarization following the contraction eventually results in the T-wave and concludes the normal cardiac cycle.We note that both left and right bundle branches eventually differentiate into a large number of Purkinje fibers, the repolarization of which is thought to occasionally result in an additional U-wave [3].
Deviations from the normal sinus rhythm are referred to as arrhythmias and comprise a large number of specific arrhythmias (see, e.g., [17]).Heart rhythms exhibiting variations in timing, such as those that are either below 60 beats per minute (bpm) or above 100 bpm, as well as rhythms disrupted by changes in the morphology, for example, as a result of ectopic beats (i.e., heart beats whose origin is different from and outside of the region typically responsible for impulse generation, namely, the SA node) such as premature ventricular contractions (PVCs) or premature atrial contractions (PACs), all qualify as arrhythmias.Furthermore, they all tend to induce distinctive changes in the ECG signal, thereby allowing for a change detection approach, which must not necessarily be based on templates corresponding to the various arrhythmia-induced changes [18].Additionally, many arrhythmias, although usually not acutely life-threatening, are paroxysmal and asymptomatic and therefore likely to go unnoticed for long periods of time, which carries the risk of exacerbation and possibly the development of more serious types of arrhythmias [3,17,18].Outpatient cardiac monitoring through wearable devices and apps is commonly accepted as a promising approach to tackle this issue and improve treatment outcome while at the same time lowering overall healthcare costs [1,[10][11][12]18,19].
The automatic monitoring of ECG signals has been researched for decades, and various algorithms have been proposed and implemented.However, the shift towards outpatient monitoring through low-power wearable devices and apps introduces additional challenging requirements such as real-time capability, the ability to cope with rather noisy and low-quality signals with various artifacts, and harsh constraints on computational complexity and power consumption [18][19][20].Various approaches for the automatic online detection of cardiac arrhythmias have been proposed in the literature.Machine learning approaches have been adopted by numerous authors [21,22], and as have wavelet- [22,23], artificial neural network (ANN)- [24][25][26], and decision tree-based [27] approaches.For a more detailed review, the reader is referred to recent review papers [18,28].Most approaches require the extraction of certain features from the signal, such as the location of QRS complexes [20,27,29,30] or R-peaks [22,23,[31][32][33].This is commonly performed using the algorithm proposed by Pan and Tompkins [34] or variations thereof [18,28], resulting in an inherent vulnerability to inaccuracies in the initial estimation of these fiducial points.The reliance upon rather restrictive modeling assumptions and the requirement of large amounts of training data is not uncommon.On the other hand, recent research has resulted in improved and numerically efficient QRS detectors that are particularly suited for mobile battery-powered applications.Among the latter, we point out the fast QRS detection algorithm proposed by Elgendi [35], as well as the general two event-related moving averages (TERMA) framework [36] by the same author, for being highly efficient, yet much simpler and faster than conventional QRS detectors.It is furthermore worth noting that Elgendi and colleagues in [37] and more recently in [38] were able to show the good performance characteristics of TERMA-based QRS detectors to also hold for compressed ECG signals.In particular, in [38], they were able to corroborate this for ECG records decimated by a factor of up to 8, making TERMA-based QRS detectors particularly well suited for mobile health applications in low-and middle-income countries (see also [39]).
We propose a lightweight and model-free approach for the online detection of cardiac anomalies such as ectopic beats in ECG or PPG signals on the basis of the change detection capabilities of singular spectrum analysis (SSA) and nonparametric rank-based cumulative sum (CUSUM) control charts.The procedure is able to quickly detect anomalies without requiring templates, extensive training data sets or the identification of fiducial points such as R-peaks.This is accomplished by leveraging the power and versatility of SSA to capture the essential signal structure as well as potential changes therefrom.We note that, crucially and contrary to the aforementioned QRS detectors, this is achieved without requiring a morphological dissection of the signal.From a computational complexity perspective, while our proposed method is certainly more involved than a QRS detector, it is significantly less demanding than previously proposed SSA approaches.
The proposed method is essentially composed of two consecutive steps: a SSA-based algorithm is sequentially applied to the observed data to construct viable test statistics that reflect potential changes in the cardiac signal, and these statistics are then monitored using distribution-free CUSUM-type control charts.
The use of SSA in a sequential framework as a means for change detection as introduced in Section 2 is based on works by Moskvina and Zhigljavsky, discussed in detail in [40] and in a more condensed fashion in [41], although it should be noted that the concept was already described earlier in [42].This algorithm has successfully been applied to various real-world detection problems, for example, anomaly detection in cognitive radio networks [43], smart power grids [44], software engineering [45] and change point detection for complex-valued time series [46].In the biomedical context, it has shown to be useful for the identification of freezing of gait in patients with Parkinson's disease [47], and the detection of anomalies in periodic biosignals such as ECGs [48], as well as in other biomedical applications [49].
Building on the prior technique outlined in Section 2, we introduce a novel SSA-based change detection procedure (lightweight singular spectrum analysis change point detection: l-SSA-CPD) that exhibits good performance characteristics while at the same time drastically reduces the computational burden.As is shown in Section 3, this is accomplished by modifying the conventional SSA-based change detection algorithm such that the computationally expensive task of computing the singular value decomposition (SVD) is only performed at the very beginning instead of each time a new data point becomes available.Furthermore, the proposed procedure uses more elaborate test statistics that take into account the information derived from the angle between data vectors representing new observations and the subspace representing the signal characteristics as well as the Euclidean distances.Lastly, our procedure differs from previous approaches also in that rank-based control limits and the reinitialization of control charts after an anomaly has been detected are used.
A performance evaluation of l-SSA-CPD using ECG and PPG records from the publicly available Physionet Challenge 2015 training database (PC15) [50,51] are presented in Section 4.3 and compared to a recently proposed competing algorithm by Pflugradt et al. [52].
Furthermore, we evaluate the performance of our simpler method against that of a previously published work by Uus and Liatsis [48], which is related to our work in that it, too, is based on leveraging the change detection capabilities of SSA.For the latter assessment, we used 10 cardiologist-annotated ECG records from the St. Petersburg Institute of Cardiological Technics 12-Lead Arrhythmia (INCART) database, which is also publicly available through Physiobank [51].These results are presented and elaborated on in Section 4.4 and are eventually followed by a short discussion in Section 5, which concludes this paper.

Fundamentals of Singular Spectrum Analysis
SSA is a technique of time series analysis and can be interpreted as belonging to the general class of principal component analysis (PCA) methods.SSA has become a standard tool in meteorology and climatology but is mostly unknown outside of those disciplines.Golyandina et al. [42] attribute this to the nature of SSA being more a technique of multivariate geometry than of statistics.According to their representation, SSA should rather be seen as an exploratory, model-building tool than a confirmatory procedure.In essence, SSA can be seen as the application of PCA to the "trajectory matrix" (obtained directly from the original time series) with the subsequent attempt to reconstruct the original series.Prior to proceeding to SSA for change detection, a short introduction to the basic SSA algorithm appears in order.

Basic SSA Algorithm
Consider N observations X N = (x 1 , . . . ,x N ) of a univariate time series and an integer M (1 < M << N) commonly referred to as the window length, lag-integer or embedding dimension.The basic SSA algorithm is commonly described as being composed of the following four stages (see, e.g., [42,49,53]): 1. Embedding: Notice the Hankel-structure of X = x ij M,K i,j=1 , i.e., X has equal elements on the anti-diagonals i + j = const.
One can think of X as multivariate data with M characteristics and K observations and accordingly X j of X as vectors in the M-dimensional space R M .

Singular Value Decomposition of X
Taking the SVD of X decomposes the trajectory matrix into its orthogonal bases and yields a collection of M eigenvalues and eigenvectors.Let λ 1 ≥ • • • ≥ λ M ≥ 0 and U 1 , . . ., U M denote, respectively, the eigenvalues and eigenvectors of XX T and the rank of X be denoted as d = max (i, such that λ i > 0).The SVD of X can then be rewritten as the sum of d elementary matrices with matrices Note that V i are the eigenvectors of X T X and √ λ i , U i , V i the eigentriples of the SVD in Equation (2).Also note that due to the symmetry of left and right singular vectors, the SVD of trajectory matrices obtained with window length M and K = N − M + 1 are equivalent.Accordingly, one can impose the limitation M ≤ N/2 on the window length since there is no additional benefit in using a larger window (see, e.g., [53] at 47, [42] at 69).

Eigentriple Grouping
In order to separate the signals of interest from noise and artifacts, the third stage of basic SSA aims to find particular disjoint subsets of the set of indices {1, . . ., d} such that the respective systems of eigenvectors span the subspaces associated with the different signal components.
Consider the task of separating a signal of interest from unwanted noise.One then looks for a certain subset of indices I = {i 1 , . . ., i l }, l < d ≤ M that span an l-dimensional subspace in R M , denoted as L I ⊂ R M = span{U I } = span{U i 1 , . . ., U i l }.Analogously, the remaining eigentriples with Ī = {i 1 , . . ., i d } \ I span the noise subspace L Ī ⊂ R M = span{U Ī }.
The trajectory matrix component X I corresponding to the subset I of eigentriples associated with the signal of interest is then and the component X Ī corresponding to the subset Ī = {i 1 , . . . ,i d } \ I associated with the remainder of the observed signal is such that In the case of separability (see, e.g., [53] at 17), the contribution of X I to the entire observed signal X is represented by the respective share of eigenvalues ∑ i∈I λ i / ∑ d i=1 λ i .

Diagonal Averaging
For perfectly separable components, all matrices in the expansion of Equation ( 5) are Hankel matrices.For real world problems, however, such perfect separability is rarely achievable and results in matrices with unequal entries on the antidiagonals.The last step of the basic SSA algorithm therefore performs a Hankelization of said matrices, i.e., a diagonal averaging is performed on all the X i of Equation ( 5) yielding matrices X i that have equal elements on the antidiagonals One can then e.g., easily reconstruct the approximation of the signal of interest through the eigentriples with indices I through the one-to-one correspondence between X I and the respective time series X N = ( x1 , . . . ,xN ) which provides an approximation of the entire time series X N or some components of it, depending on the particular choice of indices I.
The usefulness of basic SSA is illustrated in the example depicted in Figure 1 where the wandering baseline of an ECG signal (blue solid line) is removed by subtracting the trend reconstructed through SSA (with a window length of M = 100 and using the first two eigentriples yielding the cleaned ECG signal (green solid line).For a more detailed discussion of SSA, we refer to two well-known monographs [42,54] in the field as well as [49,53] and references therein.

SSA Based Change Detection: Prior Art
The sequential application of SSA described in the following is based on work by Moskvina and Zhigljavsky and will be referred to as MZ in the remainder of this paper (see [40][41][42]).The need for an adaptation of basic SSA is due to the circumstance that it operates in batch mode and is therefore not suited for online change-point detection.
Assume a truly sequential problem in which observations x 1 , x 2 , . . .arrive one at a time.Having collected a sufficiently large number N of observations, MZ constructs the trajectory matrix B (the subindex B refers to 'base' for reasons that will become obvious in a moment) for time index n with M ≤ N/2, K = N − M + 1 and performs the SVD and grouping steps as in basic SSA yielding an l-dimensional subspace L (n) I ⊂ R M spanned by the respective eigenvectors which captures the main structure of the signal.
The basic idea of MZ relies on the fact that the distance between the vectors X (n) j , j = 1, . . ., K and L (n) I , controlled by the specific choice of I, can be reduced to rather small values.If monitoring of the series {x t } N t=1 continues for t > N without a change in the underlying data generating mechanism, the vectors X j , j > K are expected to remain relatively close to L (n) I while, on the other hand, if such a change were to occur at time N + τ, the distance between X j , j ≥ K + τ and L I would increase as it would move such vectors X j out of the subspace L (n) I (see [41] at 2).Therefore, said distance can be used as a test statistic for change-point detection.Note that only the first three steps of basic SSA need to be performed since reconstruction of the original series is not required.MZ constructs two matrices, the above mentioned base matrix X (n) B , i.e., the trajectory matrix using data samples x n+1 , . . ., x n+N , and a test matrix X (n) T using observations x n+p+1 , . . ., x n+q+M−1 .The former is subjected to SVD and used to obtain the subspace L (n) I while the latter serves to calculate the sum of squared Euclidean distances between its column vectors and L (n) I .This process can be thought of as having two (possibly intersecting) windows (of M-dimensional data), of length K and Q = q − p respectively, slide over the data.
Let N, M, l, p, q be fixed integers s.t.l < M < N/2 and 0 ≤ p < q.Then, for each n = 0, 1, . . .MZ proceeds as follows: 1. Apply SSA on the interval [n + 1, n + N] (after centering x n+1 , . . ., x n+N ) to get L (n) I (a) Construct the trajectory/base matrix X where 3. Compute the detection statistic D n,I,p,q D n,I,p,q where I , i.e., D n,I,p,q is the sum of squared Euclidean distances between the columns of X (n) T and L (n) I .MZ normalizes the sum of distances D n,I,p,q to the number of elements in and further normalizes the test statistic as S n,I,p,q = Dn,I,p,q v n (12) such that it does not depend on the unknown variance of the noise (see [40] at 28) with v n being an estimator of Dn,I,p,q , e.g., v n = Dm,I,0,K with m ≤ n such that the hypothesis of no change can be accepted.
4. Monitoring of S n,I,p,q using CUSUM-type Control Charts MZ then constructs the following CUSUM-type control chart: with κ suggested as κ = 1/ 3 √ MQ (see [40] at 29) and threshold h MZ = 1 + 1.9 √ M (see [40] at 35).A change-point at n is then declared if holds.

The Proposed Method (l-SSA-CPD)
While MZ provides a powerful methodology that could be applied directly to raw ECG (or PPG) data, it exhibits some drawbacks (for the particular application at hand) that motivated the development of the novel approach to be presented below which we shall refer to as lightweight-SSA-ChangePointDetection (l-SSA-CPD).

Motivation and Informal Description of the Improvements
As discussed in the preceding Section, MZ makes use of two (possibly intersecting) windows that are slid over the observed time series, one comprising the data that is embedded to form the trajectory matrix, which is then decomposed by means of SVD to identify an appropriate low(er)-dimensional subspace, and another one containing new (or, in case of overlap, a combination of old and new) observations whose distance to said low-dimensional subspace is then used as a test statistic.This entails the quite burdensome step of performing a SVD every time a new data sample becomes available.
We shall first highlight the main improvements of our method prior to its formal description.

• Low Computational Complexity
Small variations over time are intrinsic to cardiac signals and may, besides noise and motion artifacts, e.g., be due to Heart Rate Variability.Contrary to anomalies caused by abnormal cardiac excitation phenomena, these changes in the time between consecutive R-peaks are subtle and often not readily discernible.Most importantly, they do not induce changes as severe as to change the signal's main characteristics which are captured through the decomposition and grouping stages of SSA.This is illustrated in Figure 2 which shows a raw (unfiltered) ECG signal with two distinctly shaped PVCs (highlighted in purple) in the third quarter of the excerpt.
For the task at hand, performing the SVD of a newly generated trajectory matrix each time a new data point becomes available is not strictly necessary.We are able to drastically reduce the computational burden by generating only one initial trajectory matrix X B and relying on the obtained reference subspace L (0) I throughout the monitoring.This is shown in Figure 2 with the section of the signal highlighted through gray and blue backgrounds representing the intervals used to generate X .The computational burden of l-SSA-CPD is greatly reduced compared to MZ by relying on the reference subspace obtained from an initial, non-sliding trajectory matrix (illustrated by the gray background area).Furthermore, note that l-SSA-CPD's reference subspace remains locked on the main signal's characteristics while in MZ, since the reference subspace is updated at each observation (illustrated by the gray area sliding as well), it will lock on the two anomalies (highlighted in pink) for some time as the two moving windows are pass over them.
Clearly, relying on a fixed reference subspace L I throughout the entire monitoring process goes along with an inherent limitation of the potential application scenarios l-SSA-CPD would be suited for as it implies the fixed nominal subspace to be a valid representation of the anomaly free signal over long periods of time.While such an assumption may at first appear extreme and untenable, this is not the case when monitoring subjects at rest over relatively short periods of time (e.g., in the order of 2-10 min) as would be the case in a typical prescreening scenario for ambulatory care settings.Said implied assumption however becomes untenable in other relevant scenarios such as continuous long-term monitoring as well as the monitoring of patients not at rest; for in those cases the general cardiovascular responses to homeostatic disturbances triggered by various physical and mental stressors would suffice to move new observations out of the designated nominal subspace.
There exist, however, various rather straightforward mitigation strategies to this problem.For instance, we may view the method presented here as a special case of a more general l-SSA-CPD that allows for the regular recalculation of the reference subspace according to an update frequency which depends on the particular scenario.Possibilities to lower the computational burden on the mobile device are equally manifold and include, among others, establishing a wireless data connection to a server or PC and outsourcing demanding tasks such as SVD or substituting SVD with a computationally less expensive alternative such as SVD updating algorithms.Eventually, such considerations inevitably boil down to a scenario specific trade-off between required performance and available resources.

• Simplicity
By sliding only a single instead of two windows over the time series the entire procedure is simplified and benefits from a reduction in tuning parameters.
In fact, while the total number Q of columns in X T is of course relevant, p and q are not since, due to B an overlap of X B and X T can only occur in the first N − p samples for p < N.
While our algorithm allows for such an initial overlap of X B and X T , the following discussion is purposely limited to p = N < q, which is in line with recommendations by Moskvina and Zhigljavsky who point out that p = N, q = N + 1 and accordingly Q = 1 is a very reasonable choice if minimizing the detection delay is of importance, since Q > 1 entails a smoother behavior of the test statistic and thus a loss of agility (see, e.g., [40] at 30).
The question as to whether or not X B and X T should overlap and if so by how much is therefore removed.Furthermore, since p = N, q = N + 1 can generally be recommended (see Section 4), we can omit both tuning parameters p and q.
• Augmentation of Test Statistic by considering the angle between L (0) I and X (n) j Some authors [55][56][57] successfully proposed a modified version of MZ, wherein the test statistic is based on angles rather than on Euclidean distances.While both approaches are viable on their own merits, we chose to merge them as they augment each other yielding a test statistic that, according to our results on raw ECG and PPG records, performs favorably compared to the test statistic constructed using either one on its own.
In other words, we augment and improve upon the test statistic of MZ by making use of the information from the angles between L (0) I and X (n) j as well.
• Improved thresholding through Sequential Ranks CUSUM MZ provides further potential for improvement by employing a CUSUM-type control chart (see Equations ( 13) and ( 14)) whose control limit (or threshold) h is obtained through suitable normal approximation and asymptotic considerations (see [40] at 31; see also [41] at 8).
The nuisance of having to properly normalize the test statistic (see Equation ( 12)) is a direct consequence of this design choice.
We instead propose the use of McDonald's Sequential Ranks CUSUM (SRC) [58] which we deem to be more appropriate and in line with the model-free nature of SSA.

Restarting of SRC control chart after it signaled
Lastly, to allow for the detection of multiple and potentially nearby change-points we restart the SRC every time after it signaled an anomaly by exceeding the preset threshold h SRC .This is indicated since otherwise, depending on the extent of the anomaly (in terms of number of samples) and how it relates to the embedding dimension M as well as the number Q of columns in the Hankel matrix X T , it may take some time before the anomaly propagates through and clears X T (i.e., our sliding window) thereby resulting in a return of the test statistic to its 'baseline level'.This is illustrated in Figure 3 where, as depicted in (a), three windows (differing in size) with Q = {1, 20, 40} are slid over an ECG containing a single PVC (highlighted in purple and marked with an arrow).As can be seen in (b), while Q = 1 is a feasible choice even without restarting the control chart after exceeding the threshold h SRC , since the respective SRC returns to values below h SRC after a relatively long but perhaps still acceptable amount of time, the same cannot be said for Q = {20, 40}.Part (c) of Figure 3 illustrates the clear benefits of restarting the control charts, in that regardless of the choice of Q the PVC is detected and monitoring for further changes can swiftly resume.As was to be expected, Q = 1 is favorable in terms of detection delay.

Formal Description of l-SSA-CPD
To allow for better comparison, we use the notation introduced in Section 2.2 as far as possible.
Let N, M, l, p, q be fixed integers s.t.l < M < N/2 and 0 ≤ p < q.Then our method proceeds as follows: 1. Initialization at n = 0 SSA is applied on the interval [n + 1, n + N] to get L I = L Then, for each n = 0, 1, . . .we proceed as follows: see Equation ( 9).

Compute the detection statistics
taking values in 0, π 2 , accordingly D † 2 n,I,p,q ∈ [0, 1], U I = U i 1 , . . ., U i l being the M × l matrix of eigenvectors spanning L I , and • denoting the Hadamard (element-wise) product.
The Sequential Ranks CUSUM is then with C 0 = 0 and k SRC being a reference constant.
The SRC then signals and a change-point at n is declared if holds, i.e., if C n exceeds a predetermined control limit h SRC .
It can be shown [58] that, given that no change in the monitored signal occurred, the quantities R n n+1 are independent and discrete uniform on which represents a crucial advantage of the SRC in that it implies that for any k SRC we can obtain the control limit h SRC without the need for any historical training data or further assumptions through simulations as outlined in Algorithm 1.

Performance Evaluation
To evaluate and assess the performance and utility of our method we use records which are publicly available through Physiobank [51], a vast and commonly used resource for ECG and other biophysiological data.In particular, since we claim our method to be suitable for both ECG and PPG data, we found the Physionet Challenge 2015 training database (PC15) [50] to be of particular interest as it provides a collection of synchronized ECG and PPG recordings from which we chose a subset similar to the one used in [52].With PC15 records not being annotated, we purposely chose to limit our evaluation to records containing PVCs (with different frequencies of occurrence) since they can quite accurately be spotted by careful visual inspection.
Furthermore, we evaluate the performance of our simpler method against a competing one proposed by Uus and Liatsis [48], which is related to our work in that it, too, is based on leveraging the change detection capabilities of SSA.For the latter assessment, we use ten cardiologist-annotated ECG records from the St. Petersburg Institute of Cardiological Technics 12-Lead Arrhythmia (INCART) database, which is also publicly available through Physiobank [51].The additional use of INCART records shall shed light on the robustness of l-SSA-CPD, since as we shall see, contrary to PC15, INCART contains both very noisy signals as well as occasional occurrences of arrhythmias other than PVCs.
Before presenting some results, it seems appropriate to briefly restate the goal of our method, which is to provide a lightweight, model-free tool capable of providing rough assessment and general anomaly detection capabilities under tight resource constraints, e.g., to be used as a pre-screening tool.It is therefore not to substitute for but rather to complement more sophisticated procedures in very much the same way in which it is not a substitute for QRS detectors and should be valued on its own particular merits (which have been discussed in Sections 1 and 2).

Performance Metrics
In reporting our results we rely on established metrics commonly used in the literature and report sensitivity (Se), specificity (Sp), and accuracy (Acc) defined as with TP, FP, TN, FN being the number of true positives, false positives, true negatives, and false negatives, respectively.Accordingly, sensitivity quantifies the ability to correctly detect actual anomalies while vice versa specificity quantifies the proportion of non-abnormal segments that are correctly identified as such.Accuracy, on the other hand, assesses the overall performance in terms of both correctly identified abnormal and non-abnormal segments.
Note that there is a nonzero detection delay introduced by the use of a control chart.Typically, said delay tends to be longer for nonparametric control charts such as the SRC compared to parametric charts (see, e.g., [58]).Use of the latter however would require imposing a parametric model and therefore inevitably conflict with our goal of minimizing (distributional) assumptions as much as possible.
For an event occurring at time instance n we allow for a certain detection delay τ d and consider a signal from the control chart as true positive if it falls in the interval [n, n + τ d ].All results presented here were obtained using τ d = N, i.e., we allow for a detection delay less or equal to the length of the interval used to construct the initial trajectory matrix X B .Furthermore, note that when directly comparing (synchronized) ECG and PPG signals there is an inherent delay (between the R-peak of the ECG and the respective pulse peak in the PPG) in the PPG signal due to the propagation delay of the pulse pressure wave through the arterial system.This is commonly referred to as Pulse Arrival Time (PAT) or Pulse Transit Time (PTT) (see, e.g., [9,59]) and illustrated on a short data excerpt (containing a single PVC, highlighted by blue and red backgrounds for ECG and PPG, respectively) in Figure 4. To account for the PTT delay, when dealing with PPG signals we shift the interval [n, n + τ d ] by 2M/5 , taken to be a rough estimate of the actual PTT.

Setup and l-SSA-CPD Parameters
The PC15 database [50] provides a collection of ECG and PPG recordings from which we chose a subset similar to the one used in [52].With PC15 records not being annotated, we purposely chose to limit our evaluation to records containing PVCs (with different frequencies of occurrence), since they can quite accurately be spotted by careful visual inspection.Eventually we included 8 records (composed of two ECG and one PPG signal per record) with varying frequency of PVC occurrence in our analysis.The records are approximately 5 min long with the sampling frequency being 250 Hz, yielding about 75,000 observations each.
Since an in-depth discussion of how to select important SSA tuning parameters, most notably window length M and number of eigentriples used (i.e., selection of I), would be beyond the scope of this paper (see, e.g., [40][41][42]49,53,54] and references therein), it shall suffice to briefly discuss our settings and the rationale behind them.
Consider a periodic signal with period T, then for SSA to capture the main structure of the signal it is important that M be at least equal to T. Taking into account the physiological limits on HR and the sampling frequency of our signals, M = 300 appears to be a safe and reasonable choice.Accordingly, since as discussed in Section 2.1.1 we impose M ≤ N/2, we set N = 2M = 600.Furthermore, we set I to contain the leading l eigentriples such as to account for 92.5% of the data's variance.As for the SRC's control limit, we use h SRC = 59.4246 which we obtained through Algorithm 1 for B = 10 6 , N SRC = 3000, ARL 0 = 3000, k SRC = 0.5.
It shall further be emphasized that, unless otherwise stated, we apply l-SSA-CPD to the raw unfiltered data without any preprocessing steps.This is one further peculiarity that sets this work apart from not only the two competing methods it is benchmarked against in the following subsections but any related work we are aware of.Clearly, suitable preprocessing steps might further enhance performance.The objective here, however, is to ascertain whether or not useful information pertaining to the presence or absence of anomalies) can be obtained by solely applying our l-SSA-CPD with very general parameter settings.A direct performance comparison to MZ is omitted for two main reasons:

•
MZ would be computationally prohibitively expensive.
Recall that the PC15 records are approximately 75 • 10 3 samples long, requiring computing the SVD of a 300 × 301 trajectory matrix, assuming M = 300, N = 600, K = N − M + 1, Q = 1 about 74100 times as opposed to just once for l-SSA-CPD (see Figure 2).

•
We aim to assess whether, based on its own merits, the performance of l-SSA-CPD suffices to be considered for potential real life applications such as the use case presented in this paper.
To further this goal an in-depth comparative analysis to competing algorithms is not required and deemed to be beyond the scope of this paper.

Performance Evaluation and Comparison on PC15 Data
Table 1 shows the experimental results obtained by applying l-SSA-CP configured as described above to 8 PC15 ECG records with varying length Q = {1, 5, 10} of the test matrix X T .As elaborated on in Section 3.1, a crucial condition for l-SSA-CPD to work properly by relying solely on the fixed nominal subspace is that the first N samples, which are embedded to form the trajectory matrix X B , be an adequate representation of the underlying signal.In other words, we require this initial segment to be free of anomalies.If an anomaly occurs in the first N samples, those samples are to be discarded.This was the case for record t662s, which contains a premature ventricular contraction at about n = 332 < N and required us to discard the first 615 observations.Similar issues were encountered with some of the PC15 PPG traces.More specifically, the first 4000 observations were discarded due to heavy distortions for v253l's as well as as for v368s's PPG signal, as were the first 450 samples of v255l due the presence of an ectopic in the PPG signal.Note that in the case of v255l the ECG traces were not affected, since, consistent with delay due to PTT, the PVC was already over when the recording began.
Examining the entries of Table 1 it is apparent that l-SSA-CPD performs well, especially keeping in mind that in our setup it is applied with fairly general parameters to raw, unfiltered ECG traces.The bottom of the table presents the average performance over the entire 8 records for the three different test statistics {D † 1 n,I,p,q , D † 2 n,I,p,q , D † 3 n,I,p,q } and test matrix widths Q = {1, 5, 10}.Note how the performance of l-SSA-CPD with test statistic D † 3 n,I,p,q is not negatively affected by the at times deteriorated performance of D † 2 n,I,p,q and the absence of any benefit from using larger values for Q.These findings corroborate our recommendations made in Section 3.1 to use D † 3 n,I,p,q and Q = 1, with the latter being in agreement with results reported by other authors (see, e.g., [40,41]).Results obtained using the second ECG trace of the 8 PC15 records were similar and are therefore omitted.
Experimental results obtained by applying l-SSA-CPD to the PPG trace of the same 8 PC15 records are shown in Table 2, again with varying length Q = {1, 5, 10} of the test matrix X T .
Table 1.Detection performance of l-SSA-CPD on PC15 ECG trace I.  Comparing the entries of Table 2 with those Table 1 we notice an overall drop in performance.Nevertheless, l-SSA-CPD still manages to provide reasonable results.Furthermore, the recommendation of using D † 3 n,I,p,q and Q = 1 is shown to hold for the PPG traces as well.Focusing in particular on v368s in Table 2 allows us to highlight an instance of failure and to briefly address its causes.Specifically, after the removal of the first 4000 heavily distorted observations, the record contains 6 PVCs in total, none of which was correctly detected (since the detection delays exceeded the allowed maximum delay and were accordingly counted as false alarms) using D † 1 n,I,p,q while the angle-based as well as the augmented statistics D † 2 n,I,p,q and D † 3 n,I,p,q were both able to detect the first of the 6 PVC events.The detection delay is mainly due the SRC control chart.Accordingly, it would certainly be straightforward to tweak the settings by using an SRC that allows for a more agile response (i.e., lower ARL 0 ) at the cost of an increase of the false alarm rate (i.e., a decrease in specificity).

Record
A few additional remarks on the three detection statistics {D † 1 n,I,p,q , D † 2 n,I,p,q , D † 3 n,I,p,q }, however, appear warranted and the fact that in the above example of v368s our augmented statistic D † 3 n,I,p,q performs akin to D † 2 n,I,p,q rather than D † 1 n,I,p,q provides cause for elaborating on the subject.As stated previously, both D † 1 n,I,p,q and D † 2 n,I,p,q are suitable detection statistics and reliably and readily reflect deviations of new observations from the designated reference subspace.Recalling Equations ( 15) and (17) note that D † 2 n,I,p,q takes values in [0, 1] while D † 1 n,I,p,q , which our proposed method does, contrary to MZ, not require to be normalized, takes potentially very large values.Accordingly, we propose to use D † 2 n,I,p,q as a weighting of D † 1 n,I,p,q by combining them taking the Hadamard or element-wise product, which yields D † 3 n,I,p,q (see Equation (17)).The benefit of the proposed augmentation is most readily discernible in Table 2, where we can observe that D † 3 n,I,p,q mimics D † 1 n,I,p,q in cases where the latter performs well but D † 2 n,I,p,q performs very poorly.On the other hand, if the circumstances were reversed (i.e., poorly performing D † 1 n,I,p,q and better/well performing D † 2 n,I,p,q ), D † 3 n,I,p,q then mimics the better performing angle-based detection statistic.Thus, overall, the proposed augmentation is favorable in that only a minor performance drop (if at all) is experienced when both D † 1 n,I,p,q and D † 2 n,I,p,q perform well, whereas a lot is gained when they do not.Furthermore, it should be pointed out that, as has already been observed by other authors (see, e.g., [52]), there are some inconsistencies in the PC15 records in that the some of the supposedly synchronized PPG traces exhibit an unusual delay not consistent with the assumption that the PPG pulse peak should have an offset equal to the PTT with respect to the respective R peak in the ECG.Pflugradt et al. attribute this occasional unusual offset to glitches in the original measurement setup (see [52] at 11) and we assume it to have negatively impacted the performance characteristics presented in Table 2 since we allow only for a very limited detection delay τ d .Focusing on Table 3, which directly compares our results to those obtained by Pflugradt on the almost identical subset of PC15 records, we observe that on average our l-SSA-CPD outperforms Pflugradt et al.'s Fast Multimodal Ectopic Beat Detector (MEBD) on ECG records, achieving both higher sensitivity and specificity.The case becomes less clear for the respective PPG records, where MEBD exhibits higher sensitivity to the detriment of specificity, which is considerably better using l-SSA-CPD.Again, it shall be emphasized that contrary to MEBD our approach does not require a morphological dissection of the examined signal.As previously mentioned, INCART records provide only ECG traces and furthermore significantly differ from their PC15 counterparts in that they are much longer (30 min as opposed to 5 min), at times very noisy and contain occasional anomalies other than PVCs as well.Therefore, to allow for a fair comparison, we evaluated the performance of l-SSA-CPD with and without an additional preprocessing step consisting of a zero-phase FIR bandpass filter tuned to the interval of [0.04, 40] Hz.
As evidenced by two excerpts from different INCART records shown in Figure 5,the database contains both very contaminated as well as moderately contaminated signals.This directly translates to very good and very poor performance of l-SSA-CPD, as can be seen by looking up the respective records in Table 4.In fact, while our proposed algorithm performs well on the fifth record (and records of similar quality), with sensitivities of 0.82 and 0.99 for the unfiltered and the bandpass-filtered signal, respectively and a specificity of 0.99 for both, its sensitivity drops to rather poor 0.25 and 0.34 for the first record.Note that this was to be anticipated since it is consistent with the limitations of l-SSA-CPD as discussed in Section 3.1.That being said, we would nonetheless like to emphasize the surprisingly good performance of l-SSA-CPD on the ten examined INCART records with the exception of signals {I01m,I02m,I04m,I07m} both with and without the additional filtering step.In fact, l-SSA-CPD without BP-filtering still outperforms the manually fine-tuned algorithm of Uus and Liatsis for record I03m and its automatically tuned version additionally for records {I08m,I10m}.Eventually, with the simple addition of an FIR BP-filtering preprocessing step to l-SSA-CPD, while it still lags behind the algorithm by Uus and Liatsis overall, it manages to outperform it's manually fine-tuned version on half of the records, which is remarkable.Table 4. Performance comparison of l-SSA-CPD (using D † 3 n,I,p,q , Q = 1) and the SSA-based approach by Uus and Liatsis (see [48], Table I Unfortunately, we are unable to provide detailed commentary on the method proposed by Uus and Liatsis since their paper lacks in clarity and does not provide details pertaining to crucial parts of their algorithm such as how the automatic adjustment of parameters is performed.It is however safe to state that their method is far more complex than ours, for it comprises various stages (such as preprocessing, peak detection, piecewise quadratic polynomial modeling of the R-R series, re-adjustment of parameters, and the creation of a pattern dictionary) that are not required in our method.

Discussion and Outlook
In this paper, we have proposed a novel lightweight and model-free approach for the online detection of cardiac anomalies such as ectopic beats in ECG or PPG signals based on the change detection capabilities of Singular Spectrum Analysis (SSA) and nonparametric rank-based cumulative sum (CUSUM) control charts.The procedure is able to quickly detect anomalies without requiring the identification of fiducial points such as R-peaks and is computationally significantly less demanding than previously proposed SSA-based approaches.This is accomplished by modifying the conventional SSA-based change detection algorithm such that the computationally expensive task of computing the SVD is only performed at the very beginning instead of each time a new data point becomes available.Furthermore, our procedure uses more elaborate test statistics that take into account the information derived from the angle between data vectors representing new observations and the subspace representing the signals characteristics as well as the euclidean distances.Lastly, our procedure differs from previous approaches also in that rank-based control limits and the reinitialization of control charts after an anomaly has been detected are used.
Using a set of ECG and PPG records we demonstrated that the direct application of our l-SSA-CPD without any further pre-or post-processing yields not only viable but surprisingly accurate results with an average sensitivity and specificity of 0.9888 and 0.9989 for ECG and 0.8666 and 0.9987 for PPG records, respectively, which compares well and tends to outperform a recently proposed competing approach.
Furthermore, we evaluated the performance of our simpler method against that of a more complex SSA-based approach and were again able to highlight strengths as well as some weaknesses of our proposed method.
With regards to the selection and fine-tuning of SSA-parameters, the performance evaluation on records containing mostly cardiac anomalies other than PVCs and a comparative performance analysis, questions for future works are left open.

TFigure 2 .
Figure 2. Comparison of MZ (left) and l-SSA-CPD (right).The computational burden of l-SSA-CPD is greatly reduced compared to MZ by relying on the reference subspace obtained from an initial, non-sliding trajectory matrix (illustrated by the gray background area).Furthermore, note that l-SSA-CPD's reference subspace remains locked on the main signal's characteristics while in MZ, since the reference subspace is updated at each observation (illustrated by the gray area sliding as well), it will lock on the two anomalies (highlighted in pink) for some time as the two moving windows are pass over them.

Figure 3 .
Figure 3. Whilst monitoring the ECG signal (a), restarting the SRC each time the threshold was hit (c) is indicated to avoid potentially lengthy delays before the anomaly propagates through and clears X T (b).
to MZ involves: (a) Construction of the trajectory/base matrix X B = X Singular Value Decomposition of X B .(c) Selection of I = {i 1 , . . .

4 .
Monitoring of D † 1,...,3 n,I,p,q using the Sequential Ranks CUSUM Control Chart Let us denote the sequential rank of D † 1,...,3 n,I,p,q as Set the control limit h SRC as the B • (1 − ARL 0 −1 ) ordered extracted maximum value with ARL 0 being the nominal in-control average run length (ARL) (see Appendix A).

Figure 4 .
Figure 4. Excerpt of a synchronized recording of ECG (blue dash-dotted) and PPG (red solid) containing a single ectopic beat, highlighted with blue and red backgrounds respectively.Note the shift between the R-peak of the ECG and the respective pulse peak of the PPG, known as Pulse Transit Time (PTT).

Figure 5 .
Figure 5. Excerpts of a very noisy I01m (top) and a rather clean I05m (bottom) INCART signal.

Table 2 .
Detection performance of l-SSA-CPD on PC15 PPG trace.