ECG Classification Based on Wasserstein Scalar Curvature

Electrocardiograms (ECG) analysis is one of the most important ways to diagnose heart disease. This paper proposes an efficient ECG classification method based on Wasserstein scalar curvature to comprehend the connection between heart disease and the mathematical characteristics of ECG. The newly proposed method converts an ECG into a point cloud on the family of Gaussian distribution, where the pathological characteristics of ECG will be extracted by the Wasserstein geometric structure of the statistical manifold. Technically, this paper defines the histogram dispersion of Wasserstein scalar curvature, which can accurately describe the divergence between different heart diseases. By combining medical experience with mathematical ideas from geometry and data science, this paper provides a feasible algorithm for the new method, and the theoretical analysis of the algorithm is carried out. Digital experiments on the classical database with large samples show the new algorithm’s accuracy and efficiency when dealing with the classification of heart disease.


Introduction
As one of the most common and deadly diseases in the world, heart disease poses a significant threat to people's happiness and healthy life. With the increase of social pressure, the incidence of heart disease is proliferating in recent years [1]. Therefore, it is of great indispensability to realize efficient diagnosis, real-time monitoring and prediction for heart disease. At present, the diagnosis is mainly made through doctors' manual analysis of ECG (electrical activity records of patient's heart contraction) [2]. With no exception, ECG analysis requires doctors to command professional and detailed medical knowledge. However, as a kind of valuable medical resource, the distribution of experienced doctors is not well balanced in different regions of the world. Thus, the life and health of patients can not be fully protected in a situation with poor medical conditions and resources.
With the rapid development of computer technology, using computer-aided diagnosis technology to screen heart diseases has become a new technical method to alleviate the imbalance of artificial medical resources. The normal ECG, that is healthy ECG, shows regular changes in the PQRST complex. PQRST complex [3] is the pattern of the electrical activity of the heart during one cardiac cycle as recorded by ECG, including P-R interval, QRS complex, Q-T interval and S-T segment. Furthermore, different pathological features will cause symmetry to vanish. The computer-aided diagnosis techniques aim to extract different features of ECG and detect fluctuation of the PQRST complex in view of these features to achieve accurate classification.
Commonly used computer-aided diagnosis techniques mainly extract features from the viewpoints of signal analysis, dynamic system modeling (DSA) and topological data analysis (TDA), which are also combined with classic statistical analysis [4] and machine learning [5][6][7][8][9][10]. Signal analysis can directly extract morphological features such as amplitude [11,12] or use the wavelet transform to acquire frequency domain features [13,14]. Other emerging signal analysis methods [15] also have the potential to analyze ECG. Dynamical system analysis is used to model cardiac dynamical system through phase space

Preliminaries
In this section, we will introduce some basic preliminaries such as STFFT, local statistics and Wasserstein geometric structure on SPD(n).

Short-Time Fourier Fast Transform
Fourier transform decomposes signals into waves with different frequencies and reveals certain features hidden in the time domain. For discrete inputs, fast Fourier transform (FFT) is a widely used tool. Given a signal T = {t i } n i=1 with even n, t i can be represented as: where If the sample frequency is f s , by combining Equations (1) and (2), we have where t i = t i f s . Now we use FFT and sliding windows (short-time Fourier transform) to convert signal T = {t i } n i=1 to a point cloud in R d where d is odd. Let l be window length and τ be sliding speed. Firstly, we transform T into a signal set P l,τ (T) = {p k }n −1 k=1 withn = n−l τ and p k = t kτ , t kτ+1 , · · · , t kτ+(l−1) , where · denotes floor operator. Secondly, we transform each p k into a point in R d by choosing the first d bases from FFT and obtain the following point cloud: where S l,τ,d (T) = 2 l a k 0 , a k and d ≤ l.

Local Statistic
Objective phenomena in nature are often disturbed by many small random variables, which assign a random distribution with the local neighborhood of any point in the point cloud. If the factors which affect the local distribution of a point cloud are considered small and complex enough, then according to the law of large numbers and the central limit theorem, we can assume that the local statistics come from a high-dimensional Gaussian distribution whose parameters are neighborhood mean and neighborhood covariance matrix.
In data science, k-nearest neighbors (kNN) algorithm algorithm provides a natural neighborhood selection method. The idea is to search for the nearest k points as the neighborhood samples of any fixed point in the point cloud. To acquire local statistics, for every point P i ∈ S F (T), we can search kNN to obtain neighborhood N i = {N ij |j = 1, · · · , k}, and calculate local mean µ i and covariance matrix Σ i : Consequently, S F (T) is converted to a point cloud in SPD(d):

Wasserstein Geometric Structure on SPD(n)
Wasserstein distance describes the minimal energy used to transport one distribution to another. It can be used to measure the difference between two distributions and is vividly called earth-moving distance [28,[30][31][32][33].
Let F 1 , F 2 be two distributions. Then the Wasserstein distance between F 1 and F 2 based on the L p norm is defined as the infimum of geodesic distance integral needed for transporting probability measure element [34]: where Π(F 1 , F 2 ) is the set of joint distributions taking F 1 , F 2 as marginal distributions and E is the expectation. Although the definition is abstract and there is no explicit expression for the general Wasserstein distance, the Wasserstein distance based on the L 2 norm between any two Gaussian distributions in R n has the following explicit expression [35]: where µ i and Σ i are the mean and covariance matrix of Gaussian distribution N i , i = 1, 2.
Wasserstein distance can be induced by a Riemannian metric on SPD(n) defined as: where S ∈ SPD(n), X, Y ∈ T S SPD(n) are tangent vectors and Because the geodesic distance induced by Equation (9) is consistent with the original definition of Wasserstein distance (8), we call g W Wasserstein metric. In addition, we write the Riemannian manifold SPD(n) endowed with Wasserstein metric as (SPD(n), g W ).
For any S ∈ (SPD(n), g W ), let X, Y be the smooth vector field of SPD(n). P.A. Absil et al. provide the explicit expression of Riemannian curvature tensor R XY X, Y at S [29,37]: Furthermore, the scalar curvature at S satisfies where {e i } is any standard orthonormal basis of T S SPD(n), Λ = diag(λ 1 , · · ·, λ n ) is orthogonally similar to S, λ i is the ith eigenvalue of S, and U = 1 λ i +λ j i<j is an upper triangular matrix.
Note that the scalar curvature of S can be controlled by the second small eigenvalue of S. In fact, there exists a standard orthonormal basis {e k } of T S SPD(n), such that ∀ e k 1 , e k 2 ∈ {e k }, 0 < K S e k 1 , e k 2 = n ∑ j=1 R e k 1 , e k j , e k 2 , e k j < 3 where λ min2 is the second small eigenvalue of S. Furthermore, by Equation (11), we have The boundedness of WSC indicates that the curvature of the local covariance matrix is controllable unless the local covariance degenerates in two dimensions or more. Consequently, for most neighborhoods, as long as using appropriate embedding methods to make sure there is no degeneration beyond two dimensions, the WSC of point cloud on SPD(n) will be in a controllable range, which provides a theoretical criterion for the robustness of our algorithm.

ECG Classification Algorithm Based on Wasserstein Scalar Curvature
In this section, we will introduce an ECG classification algorithm based on Wasserstein scalar curvature (WSCEC) which can detect heart disease. Wasserstein scalar curvature dispersion is extracted to reveal the change of regularity of the PQRST complex. The framework of the WSCEC algorithm is as follows. • Continuous ECG signals are segmented and denoised by interpolation and filter to obtain multiple single heartbeats. • Every single ECG is transformed into a point cloud in Euclidean space by STFFT. Through local statistics, the point cloud in Euclidean space is converted to the point cloud on SPD(n).

•
Calculate the WSC of each point to obtain WSCH and extract WSCD as the feature.
• Do auxiliary diagnosis according to clustering results.
The intuitive algorithm pipeline is shown in Figure 1.

Preprocessing of ECG Signal
We adopt the idea of Butterworth filter algorithm [38] to cut off noisy portions with spectral power over 50 Hz. By developing a local search algorithm to find the periodic R-peak in the PQRST complex, we transform continuous ECG signals into multiple single heartbeats with a length of 300. Noting that the length of the sharp part of the QRS complex in normal ECG is around 10, we set window length l = 10 and sliding speed τ = 1 to emphasize the change of the QRS complex when sliding the window. Then by using Equation (4) and taking d = 3, we convert various single heartbeats to point clouds in R 3 . Let T s be a standard normal ECG signal and denote S F (T s ) as the Euclidean point cloud of T s after STFFT. Figure 2 shows In an attempt to describe the local differences of Euclidean point clouds more accurately, we obtain neighborhood properties by local statistics. We combine kNN algorithm with Equation (5) to obtain S P (T s ), and the parameter k of kNN algorithm is 20. With an attempt to give readers an intuitive understanding of the structure of point clouds on SPD(3), we acquire Wasserstein distance matrices by Equation (8) and present the grayscale images of Wasserstein distance matrice for T s , see Figure 3.

Feature Extraction
The pathological differences of original ECG signals are completely reflected by the local information of point clouds, and the differences of local information are further reflected in the neighborhood mean and covariance matrix, that is, reflected in the pointby-point difference of point clouds on SPD(n). Since Wasserstein scalar curvature reflects the structural relationship between a point and its adjacent points, we can use Wasserstein scalar curvature (WSC) to describe different pathological features of ECG signals.
Firstly, we calculate WSC of each point in the distribution point cloud, then the corresponding WSC sequence W = {w i }n i=1 can be obtained. Furthermore, given m and b, we can acquire Wasserstein scalar curvature histogram (WSCH): where · denotes the integer operator and | · | denotes the cardinality or size of a finite set. where We call cur 1 (m, b, s) Wasserstein scalar curvature transverse dispersion of W, cur 2 (m, b, s) Wasserstein scalar curvature longitudinal dispersion of W and cur(m,b,s) Wasserstein scalar curvature dispersion of W, respectively.
In histogram H(m, b), transverse dispersion cur 1 (m, b, s) is the median of the intersection between W and [ms, b]. cur 1 The longitudinal dispersion cur 2 (m, b, s) represents the fluctuation of the column, which can be regarded as the correction of the standard deviation. If the column heights of some curvature intervals in H(m, b) are 0, then this fluctuation is further amplified. cur 2 (m, b, s) describes the uniformity of W ranging in [m(s + 1), b] vertically. In particular, Therefore, cur(m, b, s) describes the homogeneity of W overall, and cur(m, b, s) is closer to 1 2 (b + ms), 0 if the columns are more evenly distributed. Since the ECG of healthy heartbeats has strong regularity, their WSCD is more even, as shown in Figure 5. We consider WSCD cur(m, b, s) as the feature of our final classification.   B.B.B.) is obviously broadened, and generally, there are two R peaks. The P wave of atrial premature heartbeats (A.P.) occurs earlier and is significantly different from that of the sinus. P.V.C. has the larger QRS complex amplitude, which always companies with more significant range differences in the waves. The QRS complex in the fusion of ventricular and normal heartbeats (F.V.N.) is the fusion of normal heartbeat and ventricular flutter heartbeats (V.F.), and its deterioration will change into V.F. whose waveform is similar to a sine wave, in which case cardiopulmonary resuscitation is needed for treatment.   Figure 7, the differences of waves are reflected in the differences of local information of point clouds in R 3 . Notice that the local information of the point clouds of ECG signals with diseases is significantly different from those of normal ECG signals except A.P., this may be due to the fact that A.P. is generated by atrial abnormal excitation foci in advance, and sometimes there are only P wave differences with normal ECG signals. Therefore, their local structures of Euclidean point clouds are similar. There are also similarities among the point cloud structures of P.V.C., F.V.N., and V.F., which may be due to the fact that these three types of diseases are also generated from ventricular ectopic excitation foci, their pathogenesis and trend also have a certain progressive relationship. The similarities of local structures between different ECG signal point clouds also reflect the necessity of introducing WSC to describe the differences of such fine structures more accurately.

Case Analysis
By local statistics, we change the point clouds of ECG signals with diseases into the point clouds on SPD(3). The grayscale images of Wasserstein distance matrices describe the dispersion of distribution point clouds on SPD(3), where black represents the zero distance and white represents the maximum distance. Local structure differences of distribution point clouds can be visually presented in Figures 3 and 8. We calculate the WSC of each point to precisely characterize the differences in neighborhood information of different point clouds on SPD(3). WSCHs are also formed, as shown in Figure 9.
WSC sequences corresponding to seven kinds of ECG signals are almost located at [0, 200], as shown in Figures 4 and 9. By Equation (13), it can be inferred that the neighborhood information of most points in the Euclidean point clouds does not have more than two-dimensional degradation, which shows the effectiveness of the STFFT method and the selected parameters.
In the histogram of normal ECG signal and A.P., their columns are evenly distributed horizontally and the columns of A.P. fluctuate more violently. The histograms of P.V.C.,    For those heartbeats that land in D 4 = D − ∪ 3 j=0 D j , we consider these heartbeats are from other abnormal areas. Thus, we derive a symptom description domain partition D = ∪ 4 j=0 D j . Given an ECG signal T i , the auxiliary diagnostic analysis of heart disease is as follows: we think T i has the pathological features of both D 2j and D 2k . In this case, we cannot classify T i , but we can label it ventricular abnormal.
we cannot classify T i , but we can label it bundle branch block. • If cur T i (m, b, s) ∈ D 4 , T i cannot be classified. Now we give our algorithm to show how to classify the ECG signals and carry out the auxiliary diagnosis.

WSCEC Algorithm
j=0 Q j be the set of the given ECG signal where T 0 is the set of normal ECG in T , T j (1 ≤ j ≤ r) is the set of ECG with jth pathological feature in T , and Q j is the original heartbeat set which is corresponding to symptom description domain D j . Then WSCEC algorithm is shown in Algorithm 1.

Algorithm 1 WSCEC algorithm
Input: ECG set T ; parameter k, m, s, Output: Classification result T = ∪ r+1 j=0 T j ∪ Q 4 = ∪ 4 j=0 Q j 1: Choose standard normal ECG signal T s 2: For every ECG signal T i in T , acquire point cloud S F (T i ) after STFFT by Equation (4) For the output of Algorithm 1, T 0 represents the healthy heartbeat set, T j (1 ≤ j ≤ r) represents the heartbeat set with jth pathological feature, and T r+1 represents the unclassified heartbeat set with label of lesion area. Q j denotes the classified heartbeat set which is corresponding to the symptom description domain D j . Thus, for an unclassified heartbeat T i ∈ T r+1 , although we cannot know the exact type of T i , we can also know which part of the heart is abnormal.

Digital Experiment
In this section, the main numerical results are introduced. The sampled data comes from MIT-BIH Arrhythmia Database [39]. We sample 5000 heartbeats with distinguishing features, including 2500 normal heartbeats (N), 1000 premature ventricular contraction heartbeats (P.  Figure 11 shows the calculation results of WSCD for the 5000 segment ECG signals sampled, the parameters are = 0.09, b = 3d(d−1) = 200, m = 1, and s = 0. Note that there are overlaps between bundle branch block heartbeats and abnormal ventricular heartbeats such as F.V.N. and P.V.C., this may be because all these heartbeats have widened QRS complex and some of them are indeed pretty similar. We introduce sensitivity or true positive rate (TPR), noise removal rate (NRR), precision or positive predictive value (PPV) and F 1 score to show the efficiency of our symptom description domain partition. Let T = ∪ 4 j=0 Q j be the original ECG set and T = ∪ 4 j=0 Q j be the set after classification. Then for all 0 ≤ j ≤ 4,

Results of WSCEC Algorithm
where | · | denotes the cardinality or size of a finite set.
TPR j describes the accuracy of original ECG signals with lesion area j preserved by the new classification. NRR j represents the success rate of removing ECG signals except j. PPV j describes the proportions of a positive result and F 1 score shows the accuracy of the test, which is appropriate when working with imbalanced data [40]. It can be intuitively understood that higher TPR and NRR represent better classification ability. Table 1 shows the classification accuracy of symptom description domain partition. Intuitively, the normal ECG can be very accurately separated from other heartbeats with pathological features. To show the classification results of each specific disease and the validity of symptom description domain partition, we use principal component analysis (PCA) to obtain the confidence elliptic region [41], as shown in Figure 12. Notice that almost every confidence elliptic region is contained in the corresponding symptom description domain, hence the selection of our symptom description domain partition has strong robustness. Although the results from normal ECG signals are basically identical, ECG signals with different pathological features tend to have different degrees of change and ECG signals may vary even if they share the same pathological feature. Thus, firstly, we infer the location of cardiac abnormalities such as the atrium, ventricle and atrioventricular bundle. Secondly, we give the reference diagnostic results. For those ECG signals in which we can infer the location of cardiac abnormality but cannot be sure of the exact disease, doctors need to make further diagnoses. Furthermore, we can estimate the severity of pathology according to the longitudinal dispersion. Table 2 shows the classification result of pathological classes by PH method. The main idea is to utilize persistent homology to construct a topological structure to classify point clouds after STFFT. It is noted that the PH method can characterize the normal and pathological ECGs, but the performance of classifying different pathological ECGs is not satisfactory. This may be because the topological method largely reflects the overall feature of the point cloud and ignores the local difference. However, the WSCEC algorithm emphasizes local structure and makes up for this defect.  Table 3 shows the classification result of pathological classes by CNN method. We sample 1500 ECG signals as the training dataset and set the other 3500 ECG signals as the test database. The overall accuracy of the CNN method is similar to the WSCEC method. It is noted that F 1 score of atrial abnormality in CNN is lower than WSCEC, which may be because the training dataset of atrial abnormality is small. This observation reflects the advantage that our WSCEC method does not depend on training samples. As the training samples increase, such as ventricular abnormal and bundle branch block, the accuracy of the CNN method becomes higher, which is a little bit better than WSCEC.

Conclusions, Limitations and Future Works
In this paper, a novel ECG-assisted classification algorithm based on Wasserstein scalar curvature is proposed. By introducing Wasserstein scalar curvature, the WSCEC algorithm more accurately describes the neighborhood information differences of point clouds obtained after STFFT and shows its ability to classify heart-healthy conditions, especially to recognize atrial fibrillation and ventricular fibrillation. Meanwhile, WSCEC presents its potential in predicting the developing tendency of heart diseases, which might decline the incidence of sudden death. Consequently, a well-packaged heart disease prediagnosis system might be designed.
WSCEC algorithm is an original attempt to incorporate geometric invariants into medical research, however, there remain some limitations. More precise classification of heart diseases in each pathological class needs to be studied. In addition, we will verify the effectiveness of the WSCEC method on many other available ECG databases including more different arrhythmias in the future.
For further research, we hope to combine Wasserstein scalar curvature, topological features and machine learning to comprehensively investigate the signal and obtain more efficient algorithms. Besides, the standard 12-lead ECG auxiliary diagnosis is expected to be studied. In addition, the WSCEC algorithm can be further applied to a variety of signal research, such as signal identification and Electroencephalogram (EEG) analysis. We believe that the geometrical method will show more potential in data science.