Investigation on the Prediction of Cardiovascular Events Based on Multi-Scale Time Irreversibility Analysis

Investigation of the risk factors associated with cardiovascular disease (CVD) plays an important part in the prevention and treatment of CVD. This study investigated whether alteration in the multi-scale time irreversibility of sleeping heart rate variability (HRV) was a risk factor for cardiovascular events. The D-value, based on analysis of multi-scale increments in HRV series, was used as the measurement of time irreversibility. Eighty-four subjects from an open-access database (i.e., the Sleep Heart Health Study) were included in this study. None of them had any CVD history at baseline; 42 subjects had cardiovascular events within 1 year after baseline polysomnography and were classed as the CVD group, and the other 42 subjects in the non-CVD group were age matched with those in the CVD group and had no cardiovascular events during the 15-year follow-up period. We compared D-values of sleeping HRV between the CVD and non-CVD groups and found that the D-values of the CVD group were significantly lower than those of the non-CVD group on all 10 scales, even after adjusting for gender and body mass index. Moreover, we investigated the performance of a machine learning model to classify CVD and non-CVD subjects. The model, which was fed with a feature space based on the D-values on 10 scales and trained by a random forest algorithm, achieved an accuracy of 80.8% and a positive prediction rate of 86.7%. These results suggest that the decreased time irreversibility of sleeping HRV is an independent predictor of cardiovascular events that could be used to assist the intelligent prediction of cardiovascular events.


Introduction
Cardiovascular disease (CVD) has become the most common chronic disease [1]. The early and accurate prediction of CVD risk is thus of vital importance for the timely prevention and treatment of CVD events. The assessment of CVD risk and prediction of CVD events have been topics of great interest in recent decades. Multiple cardiovascular risk assessment systems, including the Framingham risk score, have been proposed and applied in clinical diagnosis. However, it remains a challenge to improve the accuracy of methods to identify the risk of CVD events automatically. Thus, exploring effective biomarkers of CVD events is of great significance.
As a reflection of the cardiovascular dynamical system, electrocardiograms (ECGs) contain substantial information associated with the physiological and pathological activity of the heart [2]. Heart rate variability (HRV), defined as the variation in the intervals between consecutive heartbeats, can be extracted from ECG signals. It is widely accepted that HRV is controlled by a variety of factors, including autonomic modulation, body fluids, and blood pressure. As a non-invasive method, HRV analysis has been used in numerous studies to investigate the alterations in cardiovascular dynamics related to CVD [3,4].
It is generally accepted that the cardiovascular system is chaotic and nonlinear [5], as it is coordinated and controlled by many factors, including the autonomic nervous system (ANS), which is composed of the sympathetic and parasympathetic nervous systems. Time irreversibility is an important nonlinear aspect and fundamental property of the cardiovascular system [6]. If a dynamic system is time reversible, it can return to its past state when time is reversed. In statistics, a time series is generally regarded as time reversible only when its statistical characteristics do not change after time is reversed [7]; otherwise, it is time irreversible.
In recent decades, time irreversibility analysis has been widely applied in many research fields including dynamics, finance, and physiological signals [8-12]. Cammarota et al. reported that the HRV signal in healthy people was time irreversible [13], whereas Costa et al. reported that the time irreversibility of HRV signals decreased with aging and with the occurrence of CVD [14]. However, despite studies showing that HRV signals in healthy controls were time irreversible and became more symmetrical with the emergence of diseases, especially CVD, it has remained unclear whether the time irreversibility of HRV has predictive value for cardiovascular events.
Almost all previous studies on time irreversibility analysis focused on 24 h HRV signals [13,14] or signals that had been sampled during the daytime when the participants were awake [15,16]. However, about a third of a person's lifetime is spent in sleep. Numerous studies have shown that sleep has important roles in human attention [17,18], working memory [19], visual learning [20], mood regulation [21], cognitive function [22,23], etc. Furthermore, the sleep-wake cycle also affects the regulation of the cardiovascular system by the ANS [24,25]. As demonstrated by Morris et al. and Neufeld et al., circadian misalignment increases CVD risk factors in humans [26], and sleep deprivation may increase the risk of CVD [27]. In addition, alterations in HRV during sleep were found to be related to the risk of CVD. Nakayama et al. revealed that increased activity during the first hour of sleep in patients with cardiovascular risk factors increased HRV [28]. Eguchi et al. showed that HRV during sleep was closely associated with an increased risk of CVD in people with type 2 diabetes [29]. Ulmer et al. demonstrated that the high-frequency component of HRV in sleep, which reflects the regulatory effect of parasympathetic nerves on the cardiovascular system, is an independent predictor of CVD [30]. Zhang et al. demonstrated that sleep heart rate variability assists the automatic prediction of long-term cardiovascular outcomes [31]. However, most of these studies only measured HRV based on time and frequency domain analyses, which may not be sufficient to determine the characteristics of the complex dynamics of the chaotic and nonlinear cardiovascular system [32].
Therefore, based on a nonlinear analysis method (i.e., time irreversibility analysis), this study evaluated the differences in HRV time irreversibility during sleep between participants who were healthy at baseline but suffered cardiovascular events in the subsequent year (i.e., the CVD group) and those who experienced no cardiovascular events during the 15-year follow-up period (i.e., the non-CVD group). We aimed to explore whether the change in HRV time irreversibility during sleep had predictive value for the occurrence of cardiovascular events and could thus contribute to the improvement and development of cardiovascular risk assessment systems.

Participants
The research data for this study were downloaded from a public database (the Sleep Heart Health Study) [33]. All participants underwent baseline polysomnography (PSG) for sleep monitoring between 1995 and 1998. After the baseline PSG monitoring, all the participants were followed up for 15 years, and information about cardiovascular events including coronary heart disease, angina, myocardial infarction, heart failure, and stroke was recorded.
Among the multiple signals recorded in PSG monitoring, there is one ECG channel with a sampling rate of 125 Hz. The R-wave peaks of the ECG were identified using the Pan-Tompkins method [34], and the original RR intervals were then calculated. To obtain the final HRV series for further analysis, an RR interval was flagged as an artifact if it was (1) shorter than 300 ms or shorter than 0.8 times the median RR interval, or (2) longer than 1700 ms or longer than 1.2 times the median RR interval.
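The artifact criteria above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the median is assumed to be taken once over the whole recording (the text does not say whether a global or local median was used), and flagged intervals are assumed to be dropped.

```python
from statistics import median

def flag_rr_artifacts(rr_ms, low_ms=300.0, high_ms=1700.0,
                      low_ratio=0.8, high_ratio=1.2):
    """Return one boolean per RR interval, True where it is flagged as an artifact."""
    med = median(rr_ms)  # assumption: a single median over the whole recording
    flags = []
    for rr in rr_ms:
        too_short = rr < low_ms or rr < low_ratio * med
        too_long = rr > high_ms or rr > high_ratio * med
        flags.append(too_short or too_long)
    return flags

def clean_rr(rr_ms):
    """Drop flagged intervals and report the proportion of artifacts."""
    flags = flag_rr_artifacts(rr_ms)
    kept = [rr for rr, bad in zip(rr_ms, flags) if not bad]
    return kept, sum(flags) / len(rr_ms)

# Made-up RR intervals in milliseconds, for illustration only.
rr = [800, 810, 790, 250, 805, 1800, 795, 1000, 600]
kept, artifact_fraction = clean_rr(rr)
```

The returned artifact proportion is the quantity that would be compared against the 10% inclusion threshold.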
In this study, participants who experienced cardiovascular events within the year following baseline PSG monitoring were selected as the CVD group. Each participant was required to have no CVD history at baseline and an HRV series longer than 10,000 points after sleep onset, with a proportion of artifacts of less than 10%. In this way, 42 participants were selected for the CVD group. They comprised 24 males and 18 females, aged 70 ± 9 years (mean ± standard deviation, SD), with a body mass index (BMI) of 29.28 ± 4.69 kg/m² (mean ± SD). The first cardiovascular event among these participants occurred 221 ± 114 days (mean ± SD) after the baseline PSG recording. Moreover, 42 age-matched healthy controls with neither a CVD history at baseline nor any cardiovascular events during the 15-year follow-up period were included as the non-CVD group. There were 17 males and 25 females in the non-CVD group, with a BMI of 29.01 ± 4.50 kg/m² (mean ± SD). As in the CVD group, the sleep HRV series of each participant in the non-CVD group was required to be longer than 10,000 points, with a proportion of artifacts of less than 10%.

Multi-Scale Time Irreversibility Analysis
For each participant, an HRV segment of 10,000 data points (after artifact removal) during sleep was used for time irreversibility analysis. Given an HRV time series $\{x_i \mid i = 1, 2, \ldots, N\}$ with $N$ data points, a coarse-grained series $\{y^{\tau}_j \mid j = 1, 2, \ldots, N_\tau\}$ was first constructed according to Formula (1) [35]:
$$y^{\tau}_j = \frac{1}{\tau} \sum_{i=(j-1)\tau+1}^{j\tau} x_i, \quad 1 \le j \le N_\tau \tag{1}$$
Here, the integer $\tau$ is the scale factor for coarse graining, and $N_\tau = \lfloor N/\tau \rfloor$ is the ratio of $N$ to $\tau$ rounded down. When $\tau = 1$, the coarse-grained series is simply the original series $\{x_i\}$.
For each coarse-grained series $\{y^{\tau}_j\}$, the differences between successive values were then calculated and denoted as $\{\Delta RR^{\tau}_k \mid 1 \le k \le N_\tau - 1\}$ [36], as shown in Formula (2):
$$\Delta RR^{\tau}_k = y^{\tau}_{k+1} - y^{\tau}_k, \quad 1 \le k \le N_\tau - 1 \tag{2}$$
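As a small Python sketch of these two steps, assuming that the coarse-graining of [35] is the usual average over consecutive, non-overlapping windows of length τ (the helper names `coarse_grain` and `increments` are illustrative, not from the paper):

```python
def coarse_grain(x, tau):
    """Average consecutive, non-overlapping windows of length tau."""
    n_tau = len(x) // tau  # floor(N / tau); trailing leftover points are dropped
    return [sum(x[j * tau:(j + 1) * tau]) / tau for j in range(n_tau)]

def increments(y):
    """Differences between successive values of a series."""
    return [y[k + 1] - y[k] for k in range(len(y) - 1)]

# Made-up RR intervals in milliseconds, for illustration only.
x = [800, 820, 790, 810, 830, 805, 795]
y2 = coarse_grain(x, 2)   # scale factor 2: three window averages
d2 = increments(y2)       # two successive differences
```

With a scale factor of 1 the windows contain a single point each, so the coarse-grained series reproduces the original series.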
In $\{\Delta RR^{\tau}_k\}$, the components greater than 0 were denoted as $\{\Delta RR^{+}\}$, while the components less than 0 were denoted as $\{\Delta RR^{-}\}$. According to Hou et al. [36], two measurements of the time irreversibility of $\{y^{\tau}_j\}$, $P(\tau)$ and $G(\tau)$, could then be calculated according to Formulas (3) and (4).
In Formulas (3) and (4), $N_{\Delta RR^{+}}$ and $N_{\Delta RR^{-}}$ represent the numbers of components in $\{\Delta RR^{+}\}$ and $\{\Delta RR^{-}\}$, respectively. The time irreversibility of the HRV series could thus be quantified by the Euclidean distance from $(P(\tau), G(\tau))$ to the symmetric center (50%, 50%), as shown in Formula (5) [36]:
$$D_\tau = \sqrt{(P(\tau) - 50\%)^2 + (G(\tau) - 50\%)^2} \tag{5}$$
In this way, $D_\tau$ could be obtained for different scale factors, such as $\tau$ ranging from 1 to 10 in steps of 1.
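Formulas (3) and (4) are not reproduced in this text, so the following Python sketch rests on an assumption about their form: it takes P(τ) to be the percentage of negative increments among the non-zero increments (a Porta-type index) and G(τ) to be the percentage contribution of the positive increments to the total squared increment magnitude (a Guzik-type index). Only the distance to the symmetric center (50%, 50%) comes directly from the description above, and the function name `d_value` is illustrative.

```python
import math

def d_value(delta):
    """Distance of (P, G) from the symmetric centre (50, 50), in percentage points."""
    pos = [d for d in delta if d > 0]
    neg = [d for d in delta if d < 0]
    if not pos and not neg:
        raise ValueError("series has no non-zero increments")
    # Assumed form of P: share of negative increments among the non-zero ones.
    p = 100.0 * len(neg) / (len(pos) + len(neg))
    # Assumed form of G: share of positive increments in the total squared magnitude.
    sq_pos = sum(d * d for d in pos)
    sq_neg = sum(d * d for d in neg)
    g = 100.0 * sq_pos / (sq_pos + sq_neg)
    return math.hypot(p - 50.0, g - 50.0)

# Made-up increment series, for illustration only.
symmetric = [10, -10, 5, -5]     # balanced in both count and magnitude
skewed = [30, -10, -10, -10]     # one large rise, three small falls
```

Under these assumed definitions, a series whose rises and falls are balanced in both number and magnitude sits at the symmetric center, and any imbalance moves it away.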
Computed over multiple scales, $D_\tau$ overcomes the restriction of time irreversibility measures to a two-dimensional state space: it reflects the average asymmetry of the multi-dimensional vector with respect to the main diagonal across multiple projection planes [37]. A larger $D_\tau$ corresponds to greater asymmetry between the forward and backward readings of the HRV series. The metric $D_\tau$ was therefore proposed and demonstrated as a measurement of the time irreversibility of HRV series [36].
In this study, for each participant, $D_\tau$ ($\tau$ ranging from 1 to 10 in steps of 1) was calculated on the corresponding HRV segment of 10,000 data points. The resulting values of $D_\tau$ were denoted as D1, D2, . . . , and D10. Moreover, the average of these D-values was also computed and denoted as Dmean.

Conventional HRV Analysis
In this study, we also performed conventional HRV analysis on each HRV segment with 10,000 data points to explore whether the proposed multi-scale time irreversibility analysis outperformed the conventional approach.
Here, we used four well-established HRV indices derived from time domain, frequency domain, and nonlinear complexity analyses, i.e., the standard deviation of all RR intervals (SDNN), the power in the high-frequency (0.15-0.4 Hz) range (HF), the power in the low-frequency (0.04-0.15 Hz) range in normalized units (LFnorm) [38], and the multiscale sample entropy (MSE) [35]. HF and LFnorm were calculated on each 5 min HRV segment [38] and then averaged over the whole 10,000-point segment. Similar to the proposed multi-scale time irreversibility analysis, the calculation of MSE was conducted on 10 time scales of each 10,000-point segment. For each scale, the sample entropy (SE) was computed with an embedding dimension of 2 and a tolerance of 2 * SD (the SD of the 10,000-point segment) [35]. Then, the average of these SE values was computed and denoted by SEmean.
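For readers unfamiliar with sample entropy, the single-scale computation can be sketched in Python as below. This is an illustrative implementation under stated assumptions, not the code used in the study: it uses the Chebyshev distance between templates, the population SD, and a tolerance factor passed explicitly as `r_factor`. The value 0.2 in the demonstration is chosen only for illustration, and the function name is hypothetical.

```python
import math
import random
from statistics import pstdev

def sample_entropy(x, m, r_factor):
    """Sample entropy with embedding dimension m and tolerance r_factor * SD."""
    r = r_factor * pstdev(x)
    n = len(x)

    def count_matches(length):
        # The same n - m starting points are used for both template lengths.
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)       # matching pairs of length m
    a = count_matches(m + 1)   # matching pairs of length m + 1
    if a == 0 or b == 0:
        return float("inf")    # undefined when no matches are found
    return -math.log(a / b)

# Synthetic signals (not HRV data): white noise versus a regular oscillation.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(300)]
periodic = [math.sin(2 * math.pi * i / 20) for i in range(300)]
se_noise = sample_entropy(noise, m=2, r_factor=0.2)
se_periodic = sample_entropy(periodic, m=2, r_factor=0.2)
```

A regular signal yields a lower value than an irregular one. Applying the function to each coarse-grained series and averaging over the 10 scales would give a quantity analogous to SEmean.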

Classification of CVD and Non-CVD Participants by Random Forest (RF) Algorithm
To further investigate the capability of the time irreversibility of sleep HRV for CVD prediction, we used machine learning technology to classify CVD and non-CVD participants based on their D-values over 10 scales. The RF algorithm was chosen as it is widely used in classification tasks [39]. RF is a machine learning method based on the decision tree (DT) and bagging ensemble learning algorithm proposed by Leo Breiman and Adele Cutler in 2001 [40].

The Basic Principle of DT
DT is a process of classifying samples in a training set based on a series of decision rules, including classification DT and regression DT. The core idea behind the construction of a classification DT is to recursively divide the training samples into different subsets from the DT's root according to the value of a certain feature, until all sample categories in the subset at a node are the same or the features of the samples have been exhausted. For example, Figure 1 shows the decision-making process used to screen participants for the CVD group in this study. Four features were considered, i.e., whether there was a baseline CVD history or not, whether the individual suffered from cardiovascular events during the following year or not, whether the HRV series was longer than 10,000 points or not, and whether the proportion of artifacts was less than 10% or not.

Bagging Ensemble Learning Algorithm
Bagging is a parallel ensemble learning method proposed by Breiman in 1996 [41]. Its basic principle is shown in Figure 2. With the help of the bagging algorithm, a stronger learner can be established by combining a series of weak learners. Each weak learner is trained independently on a subset of the available samples drawn by bootstrap sampling. For classification tasks, a simple voting strategy is usually used to combine the weak learners into the stronger learner [42].

RF Algorithm
RF is a variant of the bagging ensemble learning algorithm that adopts DTs as the weak learners and introduces random feature selection into the training process of DTs. Compared with a single DT, RF usually shows better generalization ability. The pseudocode of RF is shown in Algorithm 1. As a bagging ensemble learning algorithm, the RF algorithm generates T training sets by bootstrapping and then constructs a DT for each training set. Moreover, for each DT, a subset of features is randomly selected from all the features. That is, each DT in RF is trained on its own bootstrap sample and its own random subset of features.
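The combination of bootstrap sampling, random feature selection, and majority voting can be illustrated with a deliberately simplified Python sketch. It is not the Algorithm 1 referred to above: each "tree" here is a depth-1 decision stump, so selecting features once per tree coincides with selecting them at the single split, and all function names and the synthetic data are hypothetical.

```python
import random
from collections import Counter

def fit_stump(X, y, feature_ids):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= threshold]
            right = [lab for row, lab in zip(X, y) if row[f] > threshold]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(lab != left_label for lab in left)
                      + sum(lab != right_label for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:  # no valid split: always predict the majority label
        majority = Counter(y).most_common(1)[0][0]
        return (feature_ids[0], float("inf"), majority, majority)
    return best[1:]

def fit_forest(X, y, n_trees=25, n_features=2, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        feats = rng.sample(range(len(X[0])), n_features)       # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Majority vote over the stumps."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]

# Synthetic data: features 0 and 1 separate the classes, feature 2 is noise.
rng = random.Random(1)
X = ([[rng.gauss(1, 0.3), rng.gauss(1, 0.3), rng.random()] for _ in range(20)]
     + [[rng.gauss(5, 0.3), rng.gauss(5, 0.3), rng.random()] for _ in range(20)])
y = [0] * 20 + [1] * 20
forest = fit_forest(X, y)
```

Because every stump sees a different resample and a different feature subset, individual stumps disagree on borderline samples, and the vote smooths those disagreements out.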
In this study, the RF model was used to assign each participant to the CVD group or the non-CVD group. For each participant, there were ten features, i.e., the D-values on 10 scales (D1, D2, . . . , and D10), which were combined to form a feature vector and then fed into the RF classifier. To achieve better generalization, five-fold cross-validation was used in the model selection and training processes. Furthermore, two hyperparameters of the RF algorithm, i.e., the number of DTs (denoted as n_estimators) and the maximum depth of the tree (denoted as max_depth), were adjusted by grid search in Python 3.6 with the help of the sklearn library [40].
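A grid search of this kind might look as follows with scikit-learn. The sketch is only indicative: the feature matrix is synthetic (it is not study data), the candidate values in `param_grid` are illustrative because the text does not list the grid that was searched, and `random_state` is fixed here purely for reproducibility.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
# Synthetic stand-in for an 84 x 10 matrix of D1..D10 values (NOT study data).
X = np.vstack([rng.normal(4.0, 1.0, size=(42, 10)),
               rng.normal(6.0, 1.0, size=(42, 10))])
y = np.array([1] * 42 + [0] * 42)  # 1 = CVD, 0 = non-CVD

# Illustrative grid; the values actually searched are not given in the text.
param_grid = {"n_estimators": [10, 20, 50], "max_depth": [2, 3, 5]}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

best_params = search.best_params_        # best n_estimators / max_depth pair
best_cv_accuracy = search.best_score_    # mean accuracy over the five folds
```

With a classifier and an integer `cv`, `GridSearchCV` uses stratified folds, so each fold keeps roughly the same proportion of the two groups.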

Statistical Analysis
To investigate the difference in the time irreversibility of sleep HRV between the CVD and non-CVD groups, we compared the D-values obtained on different scales for both groups. If the D-values followed a normal distribution, a t-test was used; otherwise, the Wilcoxon rank test was used. Moreover, logistic regression analysis was used to explore whether the D-value was an independent predictor of cardiovascular events after adjusting for the effects of gender and BMI, as studies have shown that women have a lower risk of CVD than men [43], and that obesity is a risk factor for CVD [44]. A value of p < 0.05 was considered to indicate a significant difference.
For comparison of the proposed multi-scale time irreversibility analysis with conventional HRV analysis, the variance inflation factor (VIF) was first used to check the multicollinearity of the HRV indices, and the area under the receiver operating characteristic (ROC) curve (AUC) was used as the measurement of the ability to discriminate between the CVD and non-CVD groups.
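The AUC used for this comparison has a simple rank interpretation: it is the probability that a randomly chosen participant from one group receives a higher score than a randomly chosen participant from the other group. A minimal Python sketch, with a hypothetical function name and made-up values, is given below. Because lower D-values are associated with the CVD group, the negated D-value is used as the score for CVD.

```python
def auc(scores_pos, scores_neg):
    """Probability that a positive case outscores a negative case (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Made-up index values, for illustration only (not study data).
d_cvd = [3.1, 4.0, 2.5, 5.2]
d_non_cvd = [4.8, 6.0, 5.5, 3.9]

# Lower values indicate CVD here, so negate them to obtain a "CVD score".
auc_value = auc([-d for d in d_cvd], [-d for d in d_non_cvd])
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.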
All the statistical analyses were performed using SAS 9.4 software (SAS Institute Inc., Cary, NC, USA).

Results
For each participant, time irreversibility analysis and conventional HRV analysis were applied to the HRV time series of 10,000 data points, which was obtained after sleep onset and preprocessed by artifact removal. D1, D2, . . . , and D10 were calculated and compared between the CVD and non-CVD groups. As shown in Table 1 and Figure 3, compared with the non-CVD group, all the D-values significantly decreased in the CVD group (p < 0.05), indicating a weakening of time irreversibility with a higher risk of CVD.
Moreover, logistic regression models were used to test whether the differences in D-values between the two groups were associated with gender or BMI. As shown in Table 2, a significant reduction in the D-value remained an independent predictor of cardiovascular events after correction for gender and BMI (p < 0.05).
Furthermore, we used the RF algorithm to classify CVD and non-CVD subjects automatically using a feature space constructed based on the values of D1, D2, . . . , and D10. The results of the grid search and five-fold cross-validation show that the highest performance (i.e., accuracy of 80.8%, specificity of 81.3%, sensitivity of 80.0%, negative prediction rate of 72.7%, and positive prediction rate of 86.7%) was achieved when max_depth was set to 2 and n_estimators was set to 20.
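For reference, the five performance measures quoted above are all ratios of confusion-matrix counts. The Python sketch below shows the standard definitions; the counts passed in the example are hypothetical and are not taken from the study, and the positive class is assumed to be the CVD group.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard performance measures from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),               # true positive rate
        "specificity": tn / (tn + fp),               # true negative rate
        "positive_prediction_rate": tp / (tp + fp),  # positive predictive value
        "negative_prediction_rate": tn / (tn + fn),  # negative predictive value
    }

# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=8, fp=2, tn=7, fn=3)
```

Sensitivity and specificity are conditioned on the true group, whereas the positive and negative prediction rates are conditioned on the predicted group, so the latter two depend on the proportion of CVD cases in the evaluated sample.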
Finally, we compared the ability of the various HRV indices, i.e., Dmean, SDNN, HF, LFnorm, and SEmean, to discriminate between CVD and non-CVD participants (Figure 4). The multicollinearity test showed no collinearity among these indices (VIF less than 10), and the highest AUC (0.7948) was achieved by Dmean. These results suggest that the proposed method could supplement conventional HRV analysis and might be more sensitive to the pathological dynamics of CVD than conventional HRV indices. Moreover, we calculated the AUC for D1 to separate CVD and non-CVD participants. The result (0.744) was also lower than that achieved with Dmean.

Conclusions and Discussion
In this study, we investigated the role of the time irreversibility of sleep HRV in the prediction of CVD events. Forty-two participants who were healthy at baseline but suffered from cardiovascular events in the subsequent year were included as potential CVD patients. Compared with those participants who were healthy at baseline and had no cardiovascular events in the following 15 years, the results show the following: (1) the time irreversibility of the cardiovascular system during sleep was significantly reduced in people at risk of CVD, regardless of the scale used; (2) the significant decrease in time irreversibility remained after adjusting for gender and BMI; (3) the machine learning model based on the RF algorithm showed an accuracy of 80.8% in predicting whether the participant would have CVD events within 1 year by using features derived from the proposed time irreversibility analysis, i.e., D1, D2, . . . , and D10; (4) Dmean, i.e., the averaged time irreversibility on all 10 scales, outperformed several conventional HRV indices as well as D1 in distinguishing potential CVD patients. These findings suggest that the proposed method might be a more sensitive means of capturing alterations in cardiovascular dynamics in CVD patients than conventional HRV analysis. Moreover, the measurements of time irreversibility used here (i.e., D1, D2, . . . , and D10) were potential predictors of cardiovascular events.
The human body is a complex physiological system. In order to maintain the stability of the body's internal environment, the ANS regulates the heart through various cardiovascular reflexes [45]. These reflexes often act on the heart's dynamics system with different time delays, leading to complex fluctuations in the heart rate on multiple time scales. This is the main reason why multi-scale analysis has been widely used in the study of physiological signals [35]. In this work, we calculated the D-values of 1-10 scales to explore the predictive value of time irreversibility analysis for cardiovascular events. Our results show that the AUC of Dmean was higher than that of D1, indicating that multi-scale analysis was more informative than only focusing on the original time scale in cardiac dynamics research.
Since the 1990s, nonlinear concepts and research methods have been widely used in the analysis of HRV time series. Time irreversibility analysis, a nonlinear method, has also been used successfully [13,14,46,47]. Previous studies have shown that time irreversibility was reduced among CVD patients, especially in those suffering with heart failure [14]. Our work extends the results of previous studies to demonstrate that the decrease in the time irreversibility of sleep HRV may be a potential biomarker for CVD at a very early stage. Furthermore, our results indicate that time irreversibility analysis might be associated with some aspect of the underlying mechanism of the cardiovascular system that can be captured by nonlinear complexity analysis as well as conventional time and frequency domain analyses.
Owing to the limited sample size, we only used the RF algorithm to construct a machine learning model to distinguish CVD and non-CVD participants. The model's classification was solely based on the D-values. Although the model was simple, it achieved an accuracy of 80.8% and a positive prediction rate of 86.7%, indicating its promise for applications in the early prediction of cardiovascular events to enable timely intervention. For example, it would be informative for clinical staff to include the proposed biomarker in cardiovascular risk assessment systems. Moreover, people engaged in the construction of CVD prediction models could use this biomarker as one characteristic input of their model. In addition, with the development of wearable devices, e.g., smart watches, HRV data obtained during sleep are becoming increasingly accessible. These wearable devices could use this biomarker to monitor the user's cardiovascular health.
In conclusion, in this study, we demonstrated the predictive value of the multi-scale time irreversibility of sleep HRV for CVD events. However, there were still some limitations. Considering the influence of sample size on the accuracy of models, it would be necessary to confirm the results of this study in a wider range of subjects. Moreover, adverse events such as snoring and sleep apnea could interfere with ECG, but information about such events was not available in the database. Therefore, future efforts could also focus on exploring the effects of these adverse events on the time irreversibility of sleep HRV.

Institutional Review Board Statement:
Not applicable.

Informed Consent Statement:
In this study, we employed data from an open-access database, i.e., the Sleep Heart Health Study. The study protocol of the datasets used was approved by the institutional review board of each participating center, and each participant signed informed consent. All methods were carried out in accordance with relevant guidelines and regulations. The current study only analyzed de-identified data from this database and did not involve a research protocol requiring approval by the relevant institutional review board or ethics committee.

Data Availability Statement:
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: Sleep Heart Health Study: https://sleepdata.org/datasets/shhs (accessed on 22 September 2021).

Acknowledgments:
The authors would like to acknowledge a graduated student, Lulu Zhang, for her useful suggestions on the experiments.

Conflicts of Interest:
The authors declare no conflict of interest.