4.1. Implementation Details and Evaluation Metrics
4.1.1. Implementation Details
The software and hardware environments used in this study are as follows: the operating system is Ubuntu 20.04.6 LTS (Canonical, London, UK), the graphics card is an NVIDIA GeForce RTX 4090 (NVIDIA Inc., Santa Clara, CA, USA), the deep learning framework is PyTorch 2.1.0 (Meta Inc., Menlo Park, CA, USA), and the parallel computing platform is CUDA 12.1 (NVIDIA Inc., Santa Clara, CA, USA). During model training, the cross-entropy loss function and the Adaptive Moment Estimation (Adam) optimizer were adopted. The initial learning rate was set to 0.001 and multiplied by 0.1 after every 5 training epochs. The batch size was set to 32, and the maximum number of training epochs was set to 40 to ensure stable convergence.
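For clarity, this optimization setup can be expressed in a few lines of PyTorch. In the sketch below, the network and data are trivial placeholders; only the loss function, optimizer, learning-rate schedule, batch size, and epoch budget reflect the settings stated above.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Trivial placeholder network; only the training settings below follow the paper.
model = nn.Sequential(nn.Conv1d(1, 8, 7), nn.ReLU(), nn.AdaptiveAvgPool1d(1),
                      nn.Flatten(), nn.Linear(8, 2))

criterion = nn.CrossEntropyLoss()                          # cross-entropy loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam, initial lr = 0.001
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

# Synthetic stand-in for the ECG training set; batch size 32 as stated above.
x = torch.randn(128, 1, 15000)
y = torch.randint(0, 2, (128,))
loader = DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)

for epoch in range(40):                 # maximum of 40 training epochs
    for signals, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(signals), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()                    # lr multiplied by 0.1 every 5 epochs
```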
For linear feature extraction, the process involved computing 5-dimensional handcrafted features from ECG signals, including time-domain, statistical, and morphological characteristics. These features were normalized using StandardScaler to ensure zero mean and unit variance. Key hyperparameters included a 15,000-point analysis window with no overlap and physiologically plausible heart rate constraints.
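As an illustration of this pipeline, the sketch below computes a 5-dimensional feature vector per window and standardizes the result. The five specific features shown (mean RR, SDNN, RMSSD, and two amplitude statistics) and the 250 Hz sampling rate are assumptions for demonstration, not the exact features used in the paper.

```python
import numpy as np
from scipy.signal import find_peaks
from sklearn.preprocessing import StandardScaler

FS = 250  # sampling rate in Hz (an assumption for this sketch)

def handcrafted_features(ecg):
    """Illustrative 5-dimensional feature vector for one 15,000-point window."""
    # Simple R-peak detection; real pipelines use more robust detectors.
    peaks, _ = find_peaks(ecg, distance=int(0.3 * FS),
                          height=np.mean(ecg) + 0.5 * np.std(ecg))
    rr = np.diff(peaks) / FS                      # RR intervals in seconds
    rr = rr[(rr >= 0.3) & (rr <= 2.0)]            # keep 30-200 bpm (plausible HR)
    return np.array([rr.mean(),                   # mean RR (time-domain)
                     rr.std(),                    # SDNN (statistical)
                     np.sqrt(np.mean(np.diff(rr) ** 2)),  # RMSSD
                     np.mean(ecg), np.std(ecg)])  # amplitude/morphology statistics

windows = [np.random.randn(15000) for _ in range(4)]   # placeholder windows
X = np.vstack([handcrafted_features(w) for w in windows])
X = StandardScaler().fit_transform(X)             # zero mean, unit variance
```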
For nonlinear feature extraction, Detrended Fluctuation Analysis (DFA) was employed to capture long-range correlations in heart rate variability, yielding a 1-dimensional scaling exponent. The DFA implementation required careful parameter selection, including logarithmic scaling ranges (4–16 for short-term and 16–64 for long-term fluctuations), 50% segment overlap, and first-order detrending. Additional nonlinear measures such as sample entropy were considered, with embedding dimension 2 and tolerance threshold 0.2.
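To make the procedure concrete, the following minimal sketch implements DFA with the stated settings (first-order detrending, 50% segment overlap) on an RR-interval series. For simplicity it evaluates every integer scale within each range rather than a logarithmically spaced subset, and the synthetic input is a placeholder.

```python
import numpy as np

def dfa_alpha(rr, scales):
    """DFA scaling exponent with first-order detrending and 50% segment overlap."""
    y = np.cumsum(rr - np.mean(rr))               # integrated (profile) series
    flucts = []
    for n in scales:
        step = max(n // 2, 1)                     # 50% overlap between segments
        rms = []
        for start in range(0, len(y) - n + 1, step):
            seg = y[start:start + n]
            t = np.arange(n)
            trend = np.polyval(np.polyfit(t, seg, 1), t)  # first-order detrend
            rms.append(np.sqrt(np.mean((seg - trend) ** 2)))
        flucts.append(np.mean(rms))
    # Alpha is the slope of log F(n) versus log n.
    return np.polyfit(np.log(scales), np.log(flucts), 1)[0]

rr = 0.8 + 0.05 * np.random.randn(1000)           # placeholder RR series (seconds)
alpha_short = dfa_alpha(rr, np.arange(4, 17))     # short-term range: scales 4-16
alpha_long = dfa_alpha(rr, np.arange(16, 65))     # long-term range: scales 16-64
```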
For the TCN-Seq2vec deep feature extraction module, a temporal convolutional network with multi-scale processing automatically learned hierarchical representations from raw ECG signals. The TCN architecture employed progressively increasing channels (32, 64, 128) with exponential dilation factors (1, 2, 4). A temporal attention mechanism with reduction to 64 channels and tanh activation highlighted clinically relevant segments. Comprehensive regularization included spatial dropout (0.5), feature dropout (0.3), and attention dropout (0.2) to prevent overfitting.
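A minimal PyTorch sketch of this module is given below. The channel widths, dilation factors, attention reduction, and dropout rates follow the description above, while the kernel size (3), causal padding, and the exact attention wiring are assumptions of this sketch rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class TCNBlock(nn.Module):
    """One dilated causal convolution block (illustrative design)."""
    def __init__(self, c_in, c_out, dilation, k=3):
        super().__init__()
        self.pad = (k - 1) * dilation                  # causal left-padding
        self.conv = nn.Conv1d(c_in, c_out, k, dilation=dilation)
        self.drop = nn.Dropout1d(0.5)                  # spatial dropout
        self.act = nn.ReLU()
    def forward(self, x):
        x = nn.functional.pad(x, (self.pad, 0))
        return self.drop(self.act(self.conv(x)))

class TCNSeq2Vec(nn.Module):
    """TCN with channels (32, 64, 128), dilations (1, 2, 4), and temporal
    attention (reduction to 64 channels, tanh), per the description above."""
    def __init__(self):
        super().__init__()
        self.tcn = nn.Sequential(TCNBlock(1, 32, 1),
                                 TCNBlock(32, 64, 2),
                                 TCNBlock(64, 128, 4))
        self.attn = nn.Sequential(nn.Conv1d(128, 64, 1), nn.Tanh(),
                                  nn.Dropout(0.2),     # attention dropout
                                  nn.Conv1d(64, 1, 1))
        self.feat_drop = nn.Dropout(0.3)               # feature dropout
    def forward(self, x):                              # x: (B, 1, 15000)
        h = self.tcn(x)                                # (B, 128, 15000)
        w = torch.softmax(self.attn(h), dim=-1)        # (B, 1, 15000)
        return self.feat_drop((h * w).sum(dim=-1))     # (B, 128) sequence vector

vec = TCNSeq2Vec()(torch.randn(2, 1, 15000))  # -> torch.Size([2, 128])
```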
4.1.2. Evaluation Metrics
To comprehensively evaluate the performance of the proposed method, precision (Pre), recall (Rec), accuracy (Acc), and F1-score (F1) were selected as the evaluation metrics. Their calculation formulas are shown in Equations (11) to (14):
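$$\text{Pre} = \frac{TP}{TP + FP} \quad (11)$$
$$\text{Rec} = \frac{TP}{TP + FN} \quad (12)$$
$$\text{Acc} = \frac{TP + TN}{TP + TN + FP + FN} \quad (13)$$
$$F1 = \frac{2 \times \text{Pre} \times \text{Rec}}{\text{Pre} + \text{Rec}} \quad (14)$$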
Here, True Positive (TP) and True Negative (TN) denote the numbers of positive and negative samples that are correctly classified, respectively; False Positive (FP) denotes the number of negative samples incorrectly classified as positive, and False Negative (FN) denotes the number of positive samples incorrectly classified as negative.
4.2. Experimental Results
To evaluate the generalization capability of the proposed method, we conducted 10 independent repeated experiments for each of the six datasets obtained during signal preprocessing (SCD10, SCD20, SCD30, SCD40, SCD50, SCD60). In each experiment, the dataset was randomly partitioned into training and testing sets following an inter-patient paradigm, with the specific record numbers detailed in Table 3 (where each record number corresponds to a distinct patient). To systematically present the findings, Table 4, Table 5, Table 6, Table 7, Table 8 and Table 9 provide detailed experimental results for the six datasets.
As shown in Table 4, the SCD10 model achieved consistently high performance metrics (Rec = 99.00%, Pre = 99.02%, Acc = 99.00%, F1 = 99.00%) during the 0–10 min window preceding ventricular fibrillation (VF) onset, demonstrating excellent stability across all experimental trials.

Table 5 reveals that the SCD20 model maintained equally outstanding performance (Rec = 99.00%, Pre = 99.04%, Acc = 99.00%, F1 = 99.00%) in the 10–20 min pre-VF window, matching the SCD10 model’s effectiveness.

The results in Table 6 indicate a slight performance degradation for the SCD30 model (Rec = 98.40%, Pre = 98.43%, Acc = 98.40%, F1 = 98.40%) during the 20–30 min pre-VF window, though it retained high accuracy and stability compared to earlier time windows.

Performance trends show a further decline in the 30–40 min pre-VF window (Table 7), where the SCD40 model yielded Rec = 97.00%, Pre = 97.19%, Acc = 97.00%, and F1 = 97.00%. Notably, Experiments 3 and 5 demonstrated markedly lower performance, suggesting reduced model stability compared to shorter prediction windows.

This downward trend continues in the 40–50 min pre-VF window (Table 8), with the SCD50 model achieving Rec = 96.50%, Pre = 96.67%, Acc = 96.50%, and F1 = 96.50%. The increased frequency of false positives (FPs) across multiple experiments indicates deteriorating model reliability.

The poorest performance occurs in the 50–60 min pre-VF window (Table 9), where the SCD60 model’s metrics drop to Rec = 95.00%, Pre = 95.48%, Acc = 95.00%, and F1 = 94.98%. The substantial increase in both false negatives (FNs) and FPs confirms significantly reduced model stability at this extended prediction horizon.
The SCD10 and SCD20 models demonstrated exceptional performance across all evaluation metrics, maintaining both high accuracy (≥99%) and remarkable stability. These results indicate that our proposed method can reliably predict SCD events within the critical 20 min window preceding onset.
While the SCD30 and SCD40 models showed modest performance degradation compared to their shorter-term counterparts, they still maintained accuracy levels above 97%. The observed decline became more pronounced in the SCD50 and SCD60 models, particularly for SCD60, potentially due to diminished feature discriminability between SCD and NSR samples in these extended timeframes.
Our analysis reveals an important temporal pattern: SCD signals occurring closer to VF onset exhibit more distinct characteristics compared to normal ECG, while those further removed (50–60 min pre-onset) demonstrate greater similarity to normal sinus rhythm. This finding explains the observed performance gradient across time windows.
As summarized in Table 10, the proposed risk prediction model achieved an average accuracy of 97.48% across 10 independent trials, with individual experiment accuracy ranging from 95.50% to 98.67%. The consistent performance (Rec, Pre, and F1 all >97%) confirms balanced and stable identification of positive cases. The relatively lower accuracy (95.50%) in Experiment 3 may be attributed to reduced feature discriminability in its particular test set composition.
As shown in Figure 9, statistical analysis was conducted to assess the significance of differences among the performance metrics. The model achieved an average recall of 97.48% (SD = 0.82%), precision of 97.64% (SD = 0.71%), and F1-score of 97.48% (SD = 0.83%). Pairwise t-tests revealed that precision was significantly higher than both recall (p = 0.004) and F1-score (p = 0.004), while no significant difference was found between recall and F1-score (p = 0.104). The lower standard deviation for precision (0.71%) compared to recall (0.82%) and F1-score (0.83%) suggests more consistent positive predictive value across experimental trials. All metrics demonstrated low variability (SD < 0.85%), indicating stable model performance.
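This significance testing can be reproduced with paired t-tests over the ten per-trial metric values, as in the sketch below; the synthetic arrays are placeholders standing in for the actual per-experiment results.

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(0)
# Placeholder per-trial values (%); the paper's actual per-experiment results
# come from Table 10 / Figure 9.
precision = rng.normal(97.64, 0.71, size=10)
recall = rng.normal(97.48, 0.82, size=10)

t_stat, p_value = ttest_rel(precision, recall)   # paired t-test across 10 trials
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```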
These comprehensive results demonstrate that our method provides reliable predictions in most experimental scenarios, exhibiting both good generalizability and robustness for clinical SCD risk assessment.
Table 11 presents a comprehensive comparison between our proposed method and existing approaches in the literature. The results demonstrate significant advancements in both prediction accuracy and clinical applicability. The comparative analysis follows the standard practice in the field, benchmarking reported performances on the common task of SCD prediction from short-term ECG signals. While direct implementation of all cited methods with an identical preprocessing pipeline is not feasible, the comparison remains valid as it reflects the performance achieved by different methodological paradigms on standard clinical prediction tasks using public datasets. The consistent and significant performance advantage of our proposed method, as established through our rigorous internal validation, indicates that its superiority is primarily attributable to the model architecture rather than minor variations in data preparation.
Tseng et al. [31] developed a CNN-based model utilizing 2D short-time Fourier transform or continuous wavelet transform (CWT) for ECG feature extraction, achieving 88% accuracy in SCD prediction 5 min before onset. Chen et al. [32] employed phase space reconstruction and fuzzy C-means clustering for HRV analysis, reaching 98.40% accuracy at the 5 min prediction window.
Several studies have explored nonlinear analysis techniques: Khazaei et al. [13] applied increment entropy and recurrence quantification analysis to HRV signals, obtaining 95% accuracy 6 min pre-SCD. Ebrahimzadeh et al. [14] combined time-domain, frequency-domain, time-frequency, and nonlinear features with TLSFS-based feature selection, achieving 82.85% accuracy 13 min before SCD onset.
Advanced signal processing methods have shown promising results: Shi et al. [16] implemented discrete wavelet transform (DWT) and locality preserving projections (LPP) with sophisticated feature-ranking techniques, attaining 97.6% accuracy 14 min pre-SCD using only 5 LPP features. Centeno-Bautista et al. [35] combined complete ensemble empirical mode decomposition (CEEMD) with statistical analysis and SVM classification, reaching 97.28% accuracy 30 min before SCD.
Longer-term prediction studies include Gao et al. [21], who developed a specialized algorithm for low-SNR single-lead ECG signals incorporating 12-dimensional features (including ventricular late potentials and T-wave alternans), achieving 93.22% accuracy 30 min pre-SCD (improving to 95.43% with NSR controls). Abrishami et al. [33] proposed a Deep Bidirectional LSTM (DBLSTM) network for ECG interval segmentation, achieving 90% accuracy in T-wave segmentation; this architecture employs stacked bidirectional LSTM layers to process sequences in both forward and backward directions. Saragih and Isa [34] proposed an SCA prediction system that integrates the Wavelet Packet Transform with a CNN to identify subtle patterns in ECG recordings; evaluated through 10-fold cross-validation on one-minute segments taken 30 min before onset, the model attained an accuracy of 95.89%. Jablo et al. [36] employed empirical mode decomposition (EMD) with nonlinear feature extraction and SVM/KNN classification, obtaining 94.42% accuracy 60 min before SCD onset.
Notably, our method requires only three carefully selected features to achieve superior performance (97.48% accuracy) at the 60 min prediction window, while offering three key advantages:
(1) Providing substantially longer clinical intervention time (60 min vs. typically <30 min in the literature);
(2) Maintaining lower computational complexity through efficient feature selection;
(3) Delivering more consistent performance across extended prediction windows.
This comparative analysis demonstrates that our approach represents a significant improvement over existing methods in terms of both early prediction capability and practical clinical implementation.
To further validate the proposed method, we performed additional evaluation using the Creighton University Ventricular Tachyarrhythmia Database (CUDB), which contains 35 eight-minute ECG recordings. Based on expert annotations, we identified 7 subjects with documented ventricular fibrillation (VF), ventricular tachycardia (VT), or atrial fibrillation (AF) (Record Numbers: cu01, cu02, cu03, cu09, cu16, cu18, cu21).
Following the same experimental protocol, we conducted 10 repeated random trials using an inter-patient paradigm for dataset partitioning. The training set was composed of 13 patients from the NSR dataset and 18 patients from the SCD dataset, while the test set included the remaining 5 patients from the NSR dataset along with 5 patients from CUDB (Record Numbers: cu01, cu02, cu03, cu09, cu16).
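A sketch of this inter-patient partitioning protocol is shown below. The NSR and SCD record identifiers are illustrative placeholders, whereas the five CUDB test records are those listed above.

```python
import random

nsr_records = [f"nsr{i:02d}" for i in range(1, 19)]   # 18 NSR patients (placeholder IDs)
scd_records = [f"scd{i:02d}" for i in range(1, 19)]   # 18 SCD patients (placeholder IDs)
cudb_test = ["cu01", "cu02", "cu03", "cu09", "cu16"]  # CUDB test records (from the text)

def inter_patient_split(seed):
    rng = random.Random(seed)
    nsr = nsr_records[:]
    rng.shuffle(nsr)                    # shuffle at the patient level, so no
    train = nsr[:13] + scd_records      # record appears in both train and test
    test = nsr[13:] + cudb_test         # remaining 5 NSR patients + 5 CUDB patients
    return train, test

splits = [inter_patient_split(trial) for trial in range(10)]  # 10 repeated trials
```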
As demonstrated in Table 12, our method achieved consistently strong performance on this independent validation dataset. These results indicate that the proposed approach maintains excellent generalization capability when applied to different patient populations and clinical datasets. The robust performance across multiple validation scenarios underscores the method’s potential for reliable clinical implementation in SCD risk assessment.
4.4. Complexity Analysis of the Proposed Method
To thoroughly evaluate the computational efficiency and practical deployment potential of our proposed model, we conducted a detailed analysis of its time and space complexity. The analysis indicates that the model achieves a favorable balance between high performance and computational cost, making it suitable for potential clinical applications.
Time complexity is measured in terms of the number of Floating Point Operations (FLOPs) required for a single forward pass of one input sample. Space complexity is quantified by the total number of learnable parameters. The detailed breakdown is presented in Table 14.
The analysis in Table 14 yields several key observations:
Localized Computational Burden: The computational bottleneck is concentrated in the TCN module, with the third TCN block alone accounting for over 52% of the total FLOPs. This is a direct result of its large number of output channels (128) operating on the full-length sequence (L = 15,000). This design is an intentional trade-off to extract high-level, abstract features from the ECG signal, which is crucial for achieving high accuracy in SCD risk prediction.
High Parameter Efficiency: Despite the significant computational load, the model maintains a relatively modest count of approximately 192.9 K learnable parameters. This high parameter efficiency is largely attributable to the weight-sharing mechanism of one-dimensional convolutions, which allows the model to process long sequences effectively without an explosion in the number of parameters, reducing the risk of overfitting.
Feasible Deployment Profile: With a total complexity of approximately 612.85 MFLOPs per sample, the model is computationally feasible for modern GPU hardware. When processed in batches (e.g., Batch Size = 16), real-time or near-real-time inference can be readily achieved. This level of complexity is justifiable given the critical nature of the SCD prediction task and supports the model’s potential for future clinical integration.
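For readers wishing to verify such counts, the sketch below applies a common accounting convention for one-dimensional convolutions (each multiply-accumulate counted as two operations). The kernel size (k = 3) is an assumption, and because Table 14 reflects the full block internals and a specific counting convention, these per-layer estimates will not reproduce its figures exactly.

```python
def conv1d_flops(c_in, c_out, length, k=3):
    # Multiply-accumulate counted as two operations (one common convention).
    return 2 * k * c_in * c_out * length

def conv1d_params(c_in, c_out, k=3):
    return k * c_in * c_out + c_out     # weights plus biases

L = 15000  # full-length input sequence
for c_in, c_out in [(1, 32), (32, 64), (64, 128)]:
    print(f"{c_in:3d} -> {c_out:3d}: "
          f"{conv1d_flops(c_in, c_out, L) / 1e6:8.2f} MFLOPs, "
          f"{conv1d_params(c_in, c_out) / 1e3:6.2f} K params")
```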
In summary, the proposed model strikes a balance between performance and efficiency by strategically allocating computational resources to its core feature extractor (the TCN), resulting in a robust yet practically deployable architecture for SCD risk assessment.