Article

ECG Waveform Segmentation via Dual-Stream Network with Selective Context Fusion

1 School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450000, China
2 School of Economics and Management, Beijing Institute of Petrochemical Technology, Beijing 102627, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(19), 3925; https://doi.org/10.3390/electronics14193925
Submission received: 8 August 2025 / Revised: 21 September 2025 / Accepted: 29 September 2025 / Published: 2 October 2025

Abstract

Electrocardiogram (ECG) waveform delineation is fundamental to cardiac disease diagnosis. This task requires precise localization of key fiducial points, specifically the onset, peak, and offset positions of P-waves, QRS complexes, and T-waves. Current methods exhibit significant performance degradation in noisy clinical environments (baseline drift, electromyographic interference, powerline interference, etc.), compromising diagnostic reliability. To address this limitation, we introduce ECG-SCFNet: a novel dual-stream architecture employing selective context fusion. Our framework is further enhanced by a consistency training paradigm, enabling it to maintain robust waveform delineation accuracy under challenging noise conditions. The network comprises three components: (1) a temporal stream captures dynamic rhythmic features through sequential multi-branch convolution and temporal attention mechanisms; (2) a morphology stream combines parallel multi-scale convolution with feature pyramid integration to extract multi-scale waveform structural features through morphological attention; and (3) the Selective Context Fusion (SCF) module adaptively integrates features from the temporal and morphology streams using a dual attention mechanism, which operates across both channel and spatial dimensions to selectively emphasize informative features from each stream, thereby enhancing the representation learning for accurate ECG segmentation. On the LUDB and QT datasets, ECG-SCFNet achieves high performance, with F1-scores of 97.83% and 97.80%, respectively. Crucially, it maintains robust performance under challenging noise conditions on these datasets, with F1-scores of 88.49% and 86.25%, showing significantly improved noise robustness compared to other methods and demonstrating precise boundary localization for clinical ECG analysis.

1. Introduction

Cardiovascular diseases (CVDs) represent a leading global health burden, accounting for over ten million annual fatalities according to World Health Organization estimates. As one of the world’s most detrimental disease categories, CVDs encompass conditions such as heart disease, ischemic and hemorrhagic stroke, heart failure, various arrhythmias, and valvular disorders. Early detection through preventive measures is crucial for mitigating the severe outcomes associated with these conditions. Electrocardiography (ECG), which captures the heart’s electrical activity from the body surface, serves as a fundamental diagnostic tool for cardiac ailments. Clinicians rely on ECG tracings to assess cardiac health, distinguishing between benign and pathological states [1]. This analysis hinges on the morphology, durations, and intervals (e.g., PR, ST, QT) of the characteristic waveforms—namely the P wave, QRS complex, and T wave—that constitute each cardiac cycle in a normal ECG. Precise delineation of these waveforms is essential for both electrophysiological characterization and clinical diagnosis. However, achieving reliable automated waveform delineation faces significant hurdles. ECG signals are inherently nonstationary and vulnerable to diverse noise sources, including baseline drift, powerline interference, electromyographic noise, motion artifacts, and instrumentation noise. The typically low amplitude of the P wave renders it particularly susceptible to obscuration by noise. Furthermore, the biphasic nature occasionally exhibited by both P and T waves complicates the precise demarcation of their onsets and offsets. Compounding these issues is the potential absence of standard waveform segments in certain cardiac cycles (e.g., missing P waves). Consequently, accurate ECG waveform delineation remains a substantial challenge.
In this study, ECG waveform delineation refers to the task of identifying the onset, offset, and peak of P-waves, QRS complexes, and T-waves. The primary objective of this study is to develop a robust and accurate framework for adaptive waveform delineation of noisy ECG signals. This task requires precise localization of key fiducial points, specifically the onset, peak, and offset positions of P-waves, QRS complexes, and T-waves, to aid in the diagnosis of cardiovascular diseases.
Achieving accurate waveform delineation in noisy environments requires a model to resolve the conflict between capturing fine-grained morphological details and maintaining robustness to artifact corruption. We hypothesize that although ECG signals present an integrated representation of cardiac activity, the temporal–rhythmic and morphological–spatial features within them exhibit distinct representational properties and spectral sensitivities. For example, low-frequency baseline wander disproportionately distorts wave amplitudes and offsets (morphological features), while the underlying rhythm (temporal features) often remains discernible. Conversely, high-frequency noise can corrupt fine morphological details while leaving the coarse temporal sequence intact.
To exploit this differential sensitivity, we propose a dual-stream architecture, ECG-SCFNet, composed of two dedicated feature extraction pathways: a temporal stream designed to learn robust rhythmic patterns and long-range contextual dependencies that persist across varying noise conditions, and a morphological stream engineered to capture the structural signatures essential for precise boundary localization of individual waves.
A central design consideration is the effective integration of these two complementary feature representations. Our solution is the Selective Context Fusion (SCF) module, which functions as an adaptive feature recalibration gate. It dynamically modulates and integrates the contributions from each stream, learning to emphasize the most reliable and informative features in a context-dependent manner. This enables the model to, for instance, leverage stable rhythmic contexts to guide morphological interpretation when the signal quality is poor.
To further endow the model with explicit invariance to noise, we employ a consistency training paradigm. This strategy regularizes the model to produce coherent delineations for both clean and artificially corrupted versions of an ECG signal, thereby solidifying its robustness.
The synergistic combination of the dual-stream architecture, the adaptive fusion mechanism, and the consistency training paradigm provides an integrated and robust solution for reliable ECG waveform delineation in clinical environments plagued by noise.

1.1. Traditional ECG Waveform Segmentation Methods

In their pioneering study, Pan and Tompkins [2] first proposed a method for QRS complex detection based on the fusion of slope, amplitude, and width features. Since then, ECG waveform recognition technology has continued to evolve, with numerous advanced solutions emerging for the detection of P-waves and T-waves: (1) time–frequency analysis techniques such as the wavelet transform [3,4,5,6,7], Hilbert transform [8,9], and phasor transform [10] adopted from digital signal processing; and (2) probabilistic modeling methods such as Hidden Markov Models [11,12] and Gaussian Mixture Models [13] drawn from classical machine learning. Owing to their excellent performance on public datasets such as LUDB [14], wavelet transform-based solutions are widely recognized as the performance benchmark among traditional methods. However, such methods have inherent limitations: (1) they rely heavily on manual feature engineering and a priori domain knowledge; (2) wavelet-based algorithms require fine threshold tuning and generalize poorly; and (3) the overall framework cannot adapt to individual differences or noise interference.

1.2. Segmentation of ECG Waveform Based on Fixed-Length Heartbeat Slicing

Heartbeat length is dynamic, varying over time and across individuals and environmental conditions, yet the common fixed-length slicing approach (after locating the R-peak, a fixed number of samples is taken before and after it to delimit the heartbeat) ignores this variability. Reference [15] first downsampled the ECG signal and then took 61 points before the R-wave peak and 38 points after it, totaling 100 points, to represent a heartbeat. Reference [16] similarly downsampled the ECG time series and then took 64 points on each side of the R-wave peak. Such fixed-length representations easily ignore the dynamics of heartbeat length, so the sliced heartbeats suffer truncation or redundancy, which degrades subsequent feature extraction and waveform recognition.

1.3. Deep Learning-Based ECG Waveform Segmentation Method

Deep learning has made significant breakthroughs in the field of ECG segmentation [17,18,19,20,21,22,23,24,25,26,27,28], and a number of innovative architectures have emerged: Guillermo et al. introduced the U-Net architecture to the ECG segmentation task for the first time [29]; Sereda et al. developed an 8-layer convolutional network to realize end-to-end waveform segmentation [30]; and Moskalenko et al. designed a fully convolutional neural network that can directly process ECG signals of arbitrary sampling rates and output P/T/QRS waves with precise boundary points [31]. In recent studies, Chen et al. proposed a one-dimensional U-Net variant [32] that innovatively introduces a single-beat segmentation post-processing strategy and enhances local and global feature extraction via heartbeat cycle isolation. Nurmaini’s team, on the other hand, fused convolutional recurrent networks with grid search optimization techniques to significantly improve the segmentation accuracy of P-QRS-T composite waves [33]. Li et al. developed the SEResUTer model [34], which replaces the standard convolutional layer with a Transformer encoder through a residual linking mechanism to improve optimization and coding capabilities.
Beyond domain-specific approaches, the general concept of decoupling temporal and spatial features has proven to be a powerful paradigm in other segmentation tasks, particularly in computer vision. For instance, in video object segmentation, reference [35] demonstrated that separately modeling temporal distribution and spatial correlation leads to more robust and universal performance. This philosophy aligns with and reinforces the design motivation behind our dual-stream architecture. While our application domain is fundamentally different, we share the core insight that processing temporal and spatial (morphological) features through dedicated pathways allows for more effective and adaptive feature extraction. Our work effectively translates and validates this powerful concept in the domain of ECG signal analysis.
Our proposed dual-stream network with the selective context fusion module achieves state-of-the-art performance on both the LUDB and QT datasets, with F1-scores of 97.83% and 97.80%, respectively, under clean conditions, representing absolute improvements of 0.37% and 0.22% over the previous best method (97.46% and 97.58% F1-score). Under noisy conditions, our method maintains F1-scores of 88.49% and 86.25% on the respective datasets, a substantial 3.04% and 1.46% absolute improvement over the previous best performance (85.45% and 84.79% F1-score). These findings demonstrate the effectiveness of our proposed method for clinical applications requiring reliable ECG analysis in challenging environments.

2. Materials and Methods

As shown in Figure 1, in this paper, we propose ECG-SCFNet, a novel dual-stream architecture for robust ECG segmentation. The model includes the basic components described below.

2.1. Model Architecture

2.1.1. Temporal Stream

As shown in Figure 2, the temporal stream is designed to capture dynamic rhythm patterns and long-range dependencies within ECG signals. The architecture utilizes a sequential multi-branch structure, where each branch processes the output of the previous one. This cascaded design—comprising localized feature extraction, intermediate processing, and dilated convolutional layers for long-range context—progressively expands the receptive field to integrate contextual information over extended time periods. By emphasizing hierarchical temporal aggregation, the stream effectively models rhythm evolution (e.g., RR intervals and heart rate variability). A temporal attention mechanism is incorporated at the end to recalibrate channel-wise features, enhancing salient rhythm information.
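The effect of the cascaded dilated design can be made concrete by tracking how the receptive field grows with depth. A minimal sketch follows; the kernel sizes and dilation rates here are illustrative assumptions, not the network's exact configuration:

```python
def receptive_field(layers):
    """Receptive field (in samples) of a stack of stride-1 1-D conv layers.

    Each layer is a (kernel_size, dilation) pair; every layer adds
    (kernel_size - 1) * dilation samples of context."""
    rf = 1
    for k, d in layers:
        rf += (k - 1) * d
    return rf

# Hypothetical cascade: two local layers, then dilated layers for long-range context.
cascade = [(3, 1), (3, 1), (3, 2), (3, 4), (3, 8)]
print(receptive_field(cascade))  # 33 samples
```

Doubling the dilation at each layer grows the context window exponentially with depth, which is what lets a shallow stream cover rhythm-scale spans such as RR intervals at modest cost.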

2.1.2. Morphology Stream

As shown in Figure 3, the morphology stream focuses on characterizing the structural shape of individual ECG waveforms (e.g., P, QRS, and T waves). It adopts a genuinely parallel multi-scale architecture, where four independent convolutional branches—with varying kernel sizes (3, 5, 7, 9) and dilation rates—operate simultaneously on the same input feature map. This design enables the concurrent extraction of shape characteristics across different scales, capturing both fine-grained details and broader waveform contours essential for morphological discrimination. The multi-scale features are subsequently integrated through a feature pyramid module. A morphology-specific attention mechanism, combining channel and spatial attention, further emphasizes regions with prominent morphological signatures.
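The parallel multi-scale idea can be sketched with the kernel sizes named above (3, 5, 7, 9); here, simple moving-average filters stand in for the learned convolutional branches, which is an illustrative simplification:

```python
import numpy as np

def multiscale_features(x, kernel_sizes=(3, 5, 7, 9)):
    """Toy parallel multi-scale extraction: each branch filters the SAME input
    at a different width, and the outputs are stacked for later fusion."""
    branches = []
    for k in kernel_sizes:
        kern = np.ones(k) / k                      # stand-in for a learned kernel
        branches.append(np.convolve(x, kern, mode="same"))
    return np.stack(branches)                      # shape: (num_branches, len(x))

x = np.sin(np.linspace(0, 2 * np.pi, 100))
print(multiscale_features(x).shape)  # (4, 100)
```

Because all branches see the same input, narrow kernels preserve sharp transitions (QRS edges) while wide kernels summarize broader contours (P and T waves), and the stacked output keeps both views available for fusion.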

2.1.3. Feature Fusion Module

As shown in Figure 4, the Selective Context Fusion (SCF) module integrates temporal and morphological features through a dual-attention gating mechanism. It first applies channel attention to each input stream, modulating feature significance across channels using global statistics. Spatial attention is then computed exclusively for the temporally modulated features by combining max-pooling and average-pooling outputs with a 1D convolutional layer designed to highlight salient regions; the morphologically modulated features remain unaltered at this stage. Both streams are then concatenated and processed by a convolutional gating mechanism that dynamically balances inter-stream contributions through element-wise multiplication. Finally, the fused representation is enhanced by a convolutional block with batch normalization, ReLU activation, and efficient channel attention refinement, optimizing features for precise segmentation.
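A toy NumPy version of the gating logic may help fix ideas. The specific attention functions here (channel attention from global-average statistics plus a sigmoid, and a single per-channel gate vector) are deliberate simplifications of the module described above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def scf_fuse(f_temp, f_morph, w_gate):
    """Minimal SCF-style fusion for (channels, length) feature maps.

    Channel attention derived from global average statistics modulates each
    stream, then a gate balances the two streams element-wise."""
    a_t = sigmoid(f_temp.mean(axis=1, keepdims=True))   # channel attention, temporal
    a_m = sigmoid(f_morph.mean(axis=1, keepdims=True))  # channel attention, morphology
    f_t, f_m = f_temp * a_t, f_morph * a_m
    g = sigmoid(w_gate)[:, None]                        # learned balance per channel
    return g * f_t + (1.0 - g) * f_m

rng = np.random.default_rng(0)
fused = scf_fuse(rng.normal(size=(8, 50)), rng.normal(size=(8, 50)), rng.normal(size=8))
print(fused.shape)  # (8, 50)
```

The key property is that the gate is learned rather than fixed, so a stream whose features are unreliable in a given context (e.g., morphology under heavy noise) can be down-weighted without being discarded.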

2.1.4. Decoder

The decoder progressively upsamples the feature maps through transposed convolutions, reconstructing the output signal. At each stage, upsampled features from higher levels are fused with corresponding encoder features via attention-guided concatenation. This fusion is modulated by an attention mechanism that weights diagnostically critical regions, followed by convolutional refinement to generate resolution-enhanced representations for precise boundary localization.
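The resolution recovered at each decoder stage follows the standard transposed-convolution output-length formula (PyTorch convention); the kernel, stride, and padding values below are hypothetical:

```python
def tconv_out_len(n_in, kernel, stride, padding=0, output_padding=0):
    """Output length of a 1-D transposed convolution (PyTorch convention)."""
    return (n_in - 1) * stride - 2 * padding + kernel + output_padding

# Doubling temporal resolution at a decoder stage with kernel=4, stride=2, padding=1:
print(tconv_out_len(125, kernel=4, stride=2, padding=1))  # 250
```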

2.1.5. Training Configuration

We adopted the Adam optimizer with its default hyperparameter settings. The initial learning rate was set to 0.001 and followed a cosine annealing schedule throughout training. The final architecture contains 12.32 million trainable parameters and requires about 70 MB of memory, making it suitable for real-time analysis in clinical applications. The experiments were conducted on an ASUS TUF Gaming A15 laptop (ASUS, Suzhou, China) equipped with an Intel(R) Core(TM) i7-13650H CPU, 16.0 GB of RAM, and an NVIDIA GeForce RTX 4060 GPU with 8 GB of VRAM. The models were implemented in Python 3.12 using the PyTorch 2.3.1 deep learning framework with CUDA 12.1 support.
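The cosine annealing schedule decays the learning rate along a half cosine from the initial value; a minimal sketch (a minimum learning rate of 0 is assumed here):

```python
import math

def cosine_annealed_lr(step, total_steps, lr_init=1e-3, lr_min=0.0):
    """Cosine annealing: lr_init at step 0, decaying to lr_min at total_steps."""
    cos = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return lr_min + (lr_init - lr_min) * cos

print(cosine_annealed_lr(0, 100))   # 0.001 (initial rate)
print(cosine_annealed_lr(50, 100))  # 0.0005 (halfway)
```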

2.2. Dataset

The Lobachevsky University Electrocardiography Database (LUDB) [14] comprises 200 distinct 12-lead electrocardiogram (ECG) recordings, each lasting 10 s, capturing a spectrum of ECG morphologies. A defining feature of this resource is the inclusion of detailed manual annotations for each record, meticulously delineating the onset, peak, and offset boundaries of the P-wave, QRS complex, and T-wave by board-certified cardiologists. The ECG data originated from both healthy volunteers and patients receiving care at Nizhny Novgorod City Hospital No. 5 during 2017–2018. The patient cohort presented with diverse cardiovascular conditions, including individuals with implanted pacemakers. Beyond the waveform boundary annotations, each record is supplemented with its corresponding clinical diagnosis. Consequently, LUDB serves as a valuable benchmark dataset, suitable both as an educational resource and for the critical tasks of training and evaluating automated ECG segmentation algorithms designed to perform accurate delineation of P-waves, QRS complexes, and T-waves.
The QT Database [36] is a widely used resource for validating electrocardiographic (ECG) waveform boundary detection algorithms. It consists of 105 two-channel, 15 min excerpts from Holter recordings, selected for their diversity in QRS and ST-T morphologies. A subset of 3622 beats across all records (minimum 30 per record) was manually annotated by expert cardiologists using an interactive tool that allowed for simultaneous viewing of both channels. This annotated set includes representative examples of each morphological variation. To enable the study of inter-observer variability, 11 records include 2 independently annotated sets. The database is formatted consistently with the MIT-BIH Arrhythmia and European ST-T Databases, from which some recordings are derived.

2.3. Evaluation Metrics

The Association for the Advancement of Medical Instrumentation (AAMI) mandates a 150-millisecond tolerance threshold for validating ECG segmentation accuracy. Under this protocol, an algorithm’s detection of a fiducial point (e.g., P-wave onset) is accepted as a True Positive (TP) only when its deviation from the expert-annotated reference remains within ±150 ms. Predictions exceeding this tolerance window are classified as False Positives (FP), while undetected reference annotations constitute False Negatives (FN).
The efficacy assessment of our proposed framework employs three key metrics: precision (positive predictive value), recall (sensitivity), and F1-score (harmonic mean of precision and recall).
Precision indicates the proportion of examples classified as positive that are actually positive:
Pre = TP / (TP + FP),
Recall indicates the proportion of actual positive instances that are correctly detected; it is equivalent to sensitivity:
Rec = TP / (TP + FN),
The F1-score, defined as the harmonic mean of precision and recall, serves as a robust composite metric for holistic model evaluation:
F1 = 2 × Pre × Rec / (Pre + Rec),
To ensure comparability with prior work, we evaluate our model using these widely adopted metrics for ECG delineation.
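The protocol above can be sketched as a small scoring routine. The greedy one-to-one matching strategy and the 500 Hz sampling rate are assumptions for illustration:

```python
def delineation_metrics(pred, ref, tol_s=0.150, fs=500):
    """Precision/recall/F1 for fiducial points under a +/-150 ms tolerance.

    pred, ref: sample indices of predicted and reference points.
    A prediction within the tolerance of an unmatched reference is a TP."""
    tol = int(tol_s * fs)
    unmatched = list(ref)
    tp = 0
    for p in sorted(pred):
        hit = next((r for r in unmatched if abs(p - r) <= tol), None)
        if hit is not None:
            tp += 1
            unmatched.remove(hit)
    fp, fn = len(pred) - tp, len(unmatched)
    pre = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
    return pre, rec, f1

# At fs = 500 Hz the tolerance window is 75 samples.
pre, rec, f1 = delineation_metrics(pred=[100, 300, 900], ref=[110, 310, 520])
print(round(pre, 3), round(rec, 3), round(f1, 3))  # 0.667 0.667 0.667
```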

3. Results

3.1. Data Augmentation and Training

To simulate the noise encountered in real settings, we apply data augmentation methods including Gaussian noise, baseline wander, baseline shift, and amplitude resizing, as shown in Figure 5.
Gaussian noise: Zero-mean Gaussian noise was added. The standard deviation of the noise was a fixed value of 0.1 multiplied by the standard deviation of the input ECG signal.
Baseline wander: A compound sinusoidal wave was added to simulate respiratory artifact. The base amplitude scaling factor was 0.15 multiplied by the standard deviation of the input signal. The frequency of the constituent waves ranged from 0.1 to 0.5 Hz.
Baseline shift: Gradual DC shifts were introduced to simulate electrode motion. The maximum absolute shift magnitude was a fixed value of 0.3 multiplied by the standard deviation of the input signal. The transition time for shifts was 200 milliseconds.
Resize: The signal amplitude was globally rescaled. The scaling factor was uniformly sampled from the range of 0.7 to 1.5.
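The four augmentations with the parameters listed above can be sketched as follows. For brevity, the 200 ms shift transition is simplified to an instantaneous step, and a 500 Hz sampling rate is assumed:

```python
import numpy as np

def augment(sig, fs=500, rng=None):
    """Apply the four augmentations: Gaussian noise (0.1 * signal std),
    baseline wander (0.15 * std, 0.1-0.5 Hz), baseline shift (up to 0.3 * std,
    simplified here to a step transition), and amplitude resize (0.7-1.5)."""
    if rng is None:
        rng = np.random.default_rng()
    t = np.arange(len(sig)) / fs
    s = sig.std()
    out = sig + rng.normal(0.0, 0.1 * s, len(sig))          # Gaussian noise
    f = rng.uniform(0.1, 0.5)
    out = out + 0.15 * s * np.sin(2 * np.pi * f * t)        # baseline wander
    k = rng.integers(0, len(sig))
    out = out + np.where(np.arange(len(sig)) >= k,          # baseline shift
                         rng.uniform(-0.3, 0.3) * s, 0.0)
    return out * rng.uniform(0.7, 1.5)                      # amplitude resize

sig = np.sin(2 * np.pi * np.arange(1000) / 500)             # 2 s toy signal at 500 Hz
print(augment(sig, rng=np.random.default_rng(0)).shape)  # (1000,)
```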
During the training process, the network is first trained using augmented ECG data generated through various augmentation techniques, including Gaussian noise, baseline wander, baseline shift, and resizing. These augmentations are applied in a randomized manner to enhance the diversity of the training samples. The augmented ECG signals, along with the original signals, are subsequently fed into ECG-SCFNet for consistency training. The consistency loss is computed using the Mean Squared Error (MSE) loss function, and model parameters are updated via backpropagation to improve the network’s robustness. As illustrated in Figure 6, this consistency training phase is followed by supervised training on the original dataset to improve final performance.
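The MSE consistency term described above is straightforward to compute; a toy example (the shapes and probability values are illustrative):

```python
import numpy as np

def consistency_loss(probs_clean, probs_aug):
    """MSE between segmentation outputs for the clean and augmented views
    of the same signal (arrays of shape classes x length)."""
    return float(np.mean((probs_clean - probs_aug) ** 2))

p_clean = np.array([[0.9, 0.1], [0.1, 0.9]])
p_aug = np.array([[0.8, 0.2], [0.2, 0.8]])
print(round(consistency_loss(p_clean, p_aug), 6))  # 0.01
```

Minimizing this term pushes the network toward identical delineations regardless of the injected noise, which is the mechanism behind the noise-invariance claimed for the model.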

3.2. Ablation Experiment

As shown in Table 1 and Table 2, the ablation studies conducted on both LUDB and QT datasets provide comprehensive validation of our architectural design choices.
Individual stream analysis reveals that both temporal and morphological pathways contribute significantly to the final performance. The temporal stream achieves 96.51% and 96.60% F1-scores on the LUDB and QT datasets, respectively, effectively capturing rhythm patterns and long-range dependencies. The morphology stream demonstrates comparable performance with 96.47% and 96.55% F1-scores, validating its capability in extracting structural waveform characteristics. This balanced performance confirms that both types of features are essential for accurate ECG segmentation.
The integration of both streams through simple concatenation already shows meaningful improvements, achieving 97.26% and 97.20% F1-scores on the respective datasets. This improvement indicates the complementary nature of temporal and morphological information. However, our proposed selective context fusion module provides additional performance gains, reaching 97.83% and 97.80% F1-scores, demonstrating that adaptive feature integration outperforms basic fusion strategies.
The consistent performance gaps between concatenation and SCF-based fusion across both datasets (0.57% on LUDB and 0.60% on QT) validate the effectiveness of our fusion mechanism. This advantage can be attributed to the module’s ability to dynamically adjust feature contributions based on their relevance and quality, particularly under varying signal conditions.
These results collectively demonstrate that our dual-stream architecture with selective fusion learns more robust and generalizable representations compared to single-stream approaches or simple fusion strategies.

3.3. Comparison Experiment

We conducted noise robustness experiments by introducing composite noise perturbations—including Gaussian noise, baseline wander, baseline shift, and amplitude resizing—to the ECG signals. The experimental results are summarized in Table 3 and Table 4.
In the ECG waveform segmentation evaluation on the LUDB dataset, as shown in Table 3, our method achieved a precision of 96.79%, recall of 98.90%, and F1-score of 97.83% under standard testing conditions, outperforming all reference methods including recent approaches by Joung et al. [27] (97.46% F1-score) and Tutuko et al. [37] (96.86% F1-score). Under noisy conditions with composite artifacts, our approach maintained its performance advantage, with 87.60% precision, 89.40% recall, and 88.49% F1-score, demonstrating significantly enhanced robustness compared to other methods. This was particularly evident when compared with Joung et al. [27] (85.45% F1-score) and Tutuko et al. [37] (83.76% F1-score) under identical noise conditions. The performance advantage was most pronounced when compared to Sereda et al.’s [30] method, which exhibited a substantial performance degradation to 77.58% F1-score under noise corruption.
Table 3. Comparison of segmentation performance on LUDB.

Method                    Pre (%)   Rec (%)   F1 (%)
Sereda [30]               84.08     98.23     90.41
Moskalenko [31]           95.59     98.75     97.14
Liang [38]                95.85     98.30     97.05
Tutuko [37]               95.53     98.22     96.86
Joung [27]                96.23     98.73     97.46
Our method                96.79     98.90     97.83
Sereda [30] + noise       72.35     83.62     77.58
Moskalenko [31] + noise   82.40     85.15     83.75
Liang [38] + noise        83.25     86.40     84.79
Tutuko [37] + noise       82.31     85.26     83.76
Joung [27] + noise        84.10     86.85     85.45
Our method + noise        87.60     89.40     88.49
Table 4. Comparison of segmentation performance on QT database.

Method                    Pre (%)   Rec (%)   F1 (%)
Sereda [30]               90.25     96.80     93.41
Moskalenko [31]           96.15     97.82     96.98
Liang [38]                96.40     97.65     97.02
Tutuko [37]               96.20     97.56     96.87
Joung [27]                97.07     98.11     97.58
Our method                97.35     98.25     97.80
Sereda [30] + noise       75.80     82.45     78.98
Moskalenko [31] + noise   80.15     84.20     82.12
Liang [38] + noise        81.30     85.65     83.41
Tutuko [37] + noise       80.23     84.39     82.25
Joung [27] + noise        83.25     86.40     84.79
Our method + noise        84.95     87.60     86.25
The evaluation on the QT database provides compelling evidence for the superior generalization capability and noise robustness of our proposed method. As shown in Table 4, our approach achieves state-of-the-art performance under both clean and noisy conditions, demonstrating consistent advantages across all evaluation metrics. Under clean conditions, our method obtains a 97.80% F1-score, representing a modest but consistent improvement over existing approaches. This performance advantage becomes substantially more pronounced under noisy conditions, where our method maintains an 86.25% F1-score compared to 84.79% for the next best method (Joung et al. [27]).
This consistent performance advantage across both clean and noisy conditions on the LUDB and QT datasets validates the effectiveness of our dual-stream architecture and consistency training strategy in learning noise-invariant representations.

4. Conclusions

This study presents ECG-SCFNet, a dual-stream network architecture that integrates temporal rhythm processing and morphological feature extraction through selective context fusion. The proposed model achieves improved segmentation performance on the LUDB and QT datasets, consistently outperforming existing methods under both clean and noisy conditions. Notably, ECG-SCFNet exhibits strong robustness in the presence of simulated noise, maintaining accurate waveform delineation where other approaches show significant performance degradation. These results highlight the method’s potential to support reliable ECG analysis in clinical settings where signal quality often varies.

Author Contributions

Conceptualization, Y.N. and N.L.; methodology, Y.N. and N.L.; software, Y.N. and K.T.; validation, Y.N., N.L. and B.L.; formal analysis, Y.N.; investigation, Y.N.; resources, N.L. and Y.T.; data curation, Y.N.; writing—original draft, Y.N. and N.L.; writing—review and editing, N.L., Y.T., K.T. and B.L.; visualization, Y.N.; supervision, N.L.; project administration, Y.N.; funding acquisition, N.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Henan Province Key Research, Development, and Promotion (Science and Technology) Special Project, grant number 222102310663.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Gacek, T. ECG Signal Processing, Classification and Interpretation: A Comprehensive Framework of Computational Intelligence; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2011. [Google Scholar]
  2. Pan, J.; Tompkins, W.J. A real-time QRS detection algorithm. IEEE Trans. Biomed. Eng. 1985, 32, 230–236. [Google Scholar] [CrossRef]
  3. Li, C. Detection of ECG characteristic points using wavelet transforms. IEEE Trans. Biomed. Eng. 1995, 42, 21–28. [Google Scholar] [CrossRef] [PubMed]
  4. Martínez, J. A wavelet-based ECG delineator: Evaluation on standard databases. IEEE Trans. Biomed. Eng. 2004, 51, 570–581. [Google Scholar] [CrossRef]
  5. Kalyakulina, A. Finding morphology points of electrocardiographic-signal waves using wavelet analysis. Radiophys. Quantum Electron. 2019, 61, 689–703. [Google Scholar] [CrossRef]
  6. Sabherwal, P. Independent detection of T-waves in single lead ECG signal using continuous wavelet transform. Cardiovasc. Eng. Technol. 2019, 14, 167–181. [Google Scholar] [CrossRef]
  7. Di Marco, L. A wavelet-based ECG delineation algorithm for 32-bit integer online processing. Biomed. Eng. Online 2011, 10, 23. [Google Scholar] [CrossRef]
  8. Benitez, D. A new QRS detection algorithm based on the Hilbert transform. Comput. Cardiol. 2000, 27, 379–382. [Google Scholar]
  9. Mukhopadhyay, S. Time plane ECG feature extraction using Hilbert transform, variable threshold and slope reversal approach. In Proceedings of the International Conference on Communication and Industrial Application, Kolkata, India, 26–28 December 2011; pp. 1–4. [Google Scholar]
  10. Martinez, A. Automatic electrocardiogram delineator based on the phasor transform of single lead recordings. In Proceedings of the Computers in Cardiology, Belfast, UK, 26–29 September 2010; pp. 987–990. [Google Scholar]
  11. Graja, S. Hidden Markov tree model applied to ECG delineation. IEEE Trans. Instrum. Meas. 2005, 54, 2163–2168. [Google Scholar] [CrossRef]
  12. Akhbari, M. ECG segmentation and fiducial point extraction using multi hidden Markov model. Comput. Biol. Med. 2016, 79, 21–29. [Google Scholar] [CrossRef]
  13. Dubois, R. Automatic ECG wave extraction in long-term recordings using Gaussian mesa function models and nonlinear probability estimators. Comput. Methods Programs Biomed. 2007, 88, 217–233. [Google Scholar] [CrossRef] [PubMed]
  14. Kalyakulina, A. LUDB: A new open-access validation tool for electrocardiogram delineation algorithms. IEEE Access 2020, 8, 186181–186190. [Google Scholar] [CrossRef]
  15. Cimen, E. Arrhythmia classification via k-means based polyhedral conic functions algorithm. In Proceedings of the CSCI, Las Vegas, NV, USA, 15–17 December 2016; pp. 798–802. [Google Scholar]
  16. Kiranyaz, S. Real-time patient-specific ECG classification by 1-D convolutional neural networks. IEEE Trans. Biomed. Eng. 2015, 63, 664–675. [Google Scholar] [CrossRef]
  17. Abrishami, H. Supervised ECG interval segmentation using LSTM neural network. In Proceedings of the IEEE BIOCOMP, Las Vegas, NV, USA, 19–21 March 2018; pp. 71–77. [Google Scholar]
  18. Londhe, A. Semantic segmentation of ECG waves using hybrid channel-mix convolutional and bidirectional LSTM. Biomed. Signal Process. Control 2021, 63, 102162. [Google Scholar] [CrossRef]
  19. Camps, J. Deep learning based QRS multilead delineator in electrocardiogram signals. In Proceedings of the CinC, Maastricht, The Netherlands, 23–26 September 2018; Volume 45, pp. 1–4. [Google Scholar]
  20. Peimankar, A. DENS-ECG: A deep learning approach for ECG signal delineation. Expert Syst. Appl. 2021, 165, 113911. [Google Scholar] [CrossRef]
  21. Sodmann, P. ECG segmentation using a neural network as the basis for detection of cardiac pathologies. In Proceedings of the Computing in Cardiology, Rimini, Italy, 13–16 September 2020; pp. 1–4. [Google Scholar]
  22. Cai, W. QRS complex detection using novel deep learning neural networks. IEEE Access 2020, 8, 97082–97089. [Google Scholar] [CrossRef]
  23. Mitrokhin, M. Deep learning approach for QRS wave detection in ECG monitoring. In Proceedings of the AICT, Moscow, Russia, 20–22 September 2017; pp. 1–3. [Google Scholar]
  24. Wang, J. A knowledge-based deep learning method for ECG signal delineation. Future Gener. Comput. Syst. 2020, 109, 56–66. [Google Scholar] [CrossRef]
  25. Jimenez-Perez, G. Delineation of the electrocardiogram with a mixed-quality-annotations dataset using convolutional neural networks. Sci. Rep. 2021, 11, 863. [Google Scholar] [CrossRef] [PubMed]
  26. Avetisyan, A. Self-Trained Model for ECG Complex Delineation. In Proceedings of the ICASSP, Hyderabad, India, 6–11 April 2025; pp. 1–5. [Google Scholar]
  27. Joung, C. Deep learning based ECG segmentation for delineation of diverse arrhythmias. PLoS ONE 2024, 19, e0303178. [Google Scholar] [CrossRef]
  28. Emrich, J. Physiology-Informed ECG Delineation Based on Peak Prominence. In Proceedings of the EUSIPCO, Lyon, France, 26–30 August 2024; pp. 1402–1406. [Google Scholar]
  29. Jimenez-Perez, G. U-Net architecture for the automatic detection and delineation of the electrocardiogram. In Proceedings of the CinC, Singapore, 8–11 September 2019; pp. 1–4. [Google Scholar]
  30. Sereda, I. ECG segmentation by neural networks: Errors and correction. In Proceedings of the IJCNN, Budapest, Hungary, 14–19 July 2019; pp. 1–7. [Google Scholar]
  31. Moskalenko, V. Deep learning for ECG segmentation. In Proceedings of the International Conference on Neuroinformatics, Dolgoprudny, Russia, 7–11 October 2019; pp. 246–254. [Google Scholar]
  32. Chen, Z. Post-processing refined ECG delineation based on 1D-UNet. Biomed. Signal Process. Control 2023, 79, 104106. [Google Scholar] [CrossRef]
  33. Nurmaini, S. Robust electrocardiogram delineation model for automatic morphological abnormality interpretation. Sci. Rep. 2023, 13, 13736. [Google Scholar] [CrossRef] [PubMed]
  34. Li, X. SEResUTer: A deep learning approach for accurate ECG signal delineation and atrial fibrillation detection. Physiol. Meas. 2023, 44, 125005. [Google Scholar] [CrossRef] [PubMed]
  35. Dong, G. Learning Temporal Distribution and Spatial Correlation Toward Universal Moving Object Segmentation. IEEE Trans. Image Process. 2024, 33, 2447–2461. [Google Scholar] [CrossRef]
  36. Laguna, P. A database for evaluation of algorithms for measurement of QT and other waveform intervals in the ECG. In Proceedings of the Computers in Cardiology, Lund, Sweden, 7–10 September 1997; pp. 673–676. [Google Scholar]
  37. Tutuko, B. Short single-lead ECG signal delineation-based deep learning: Implementation in automatic atrial fibrillation identification. Sensors 2022, 22, 2329. [Google Scholar] [CrossRef] [PubMed]
  38. Liang, X. ECGSegNet: An ECG delineation model based on the encoder-decoder structure. Comput. Biol. Med. 2022, 145, 105445. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Overview of the ECG-SCFNet architecture for ECG waveform segmentation. ECG-SCFNet is a dual-stream network that integrates temporal and morphological features through a novel selective context fusion mechanism. Raw ECG signals are processed by two parallel streams: a temporal stream that captures dynamic rhythm patterns and a morphology stream that extracts structural waveform features.
Figure 2. Architecture of the temporal stream module.
Figure 3. Architecture of the morphology stream module.
Figure 4. Architecture of the SCF module.
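The paper's methods section defines the exact SCF computation; as a rough, hypothetical numpy sketch (all function and variable names are our own) of how a dual attention mechanism can gate two feature streams along both the channel and spatial (time) dimensions before fusing them:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def selective_context_fusion(f_temp, f_morph):
    """Illustrative SCF-style fusion of two (C, T) feature maps.

    Channel attention: one gate per channel, from time-pooled statistics.
    Spatial attention: one gate per time step, from channel-pooled statistics.
    The gated streams are summed, so the network can emphasize the more
    informative stream per channel and per position.
    """
    concat = np.concatenate([f_temp, f_morph], axis=0)   # (2C, T)
    # Channel attention over the concatenated streams.
    ch_gate = sigmoid(concat.mean(axis=1))               # (2C,)
    C = f_temp.shape[0]
    g_temp, g_morph = ch_gate[:C, None], ch_gate[C:, None]
    # Spatial attention shared across channels.
    sp_gate = sigmoid(concat.mean(axis=0))[None, :]      # (1, T)
    return g_temp * sp_gate * f_temp + g_morph * sp_gate * f_morph
```

In the actual module the gates would be produced by learned layers rather than raw pooled statistics; the sketch only shows the gating-and-sum structure.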
Figure 5. (a) Gaussian noise: Simulates high-frequency myographic interference caused by muscle activity. (b) Baseline shift: Represents abrupt changes in baseline due to variations in electrode–skin contact impedance. (c) Baseline wander: Mimics low-frequency artifacts induced by respiratory motion or electrode sway. (d) Resize: Simulates amplitude variations from lead gain differences and physiological variability.
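The four perturbations in Figure 5 can be sketched as simple signal transforms. This is an illustrative implementation under assumed parameter values (noise level, shift amplitude, wander frequency, sampling rate), not the paper's exact augmentation pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_noise(sig, std=0.05):
    """(a) High-frequency, EMG-like interference."""
    return sig + rng.normal(0.0, std, size=sig.shape)

def baseline_shift(sig, amp=0.3):
    """(b) Abrupt baseline step from a random onset onward."""
    out = sig.copy()
    onset = rng.integers(1, len(sig))
    out[onset:] += amp * rng.choice([-1.0, 1.0])
    return out

def baseline_wander(sig, amp=0.2, freq_hz=0.3, fs=500):
    """(c) Slow sinusoidal drift mimicking respiration or electrode sway."""
    t = np.arange(len(sig)) / fs
    return sig + amp * np.sin(2 * np.pi * freq_hz * t)

def resize(sig, low=0.8, high=1.2):
    """(d) Random amplitude scaling from lead-gain variability."""
    return sig * rng.uniform(low, high)
```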
Figure 6. Consistency training.
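A common form of consistency training combines a supervised loss on the clean input with a term penalizing disagreement between predictions on clean and augmented views. The following is a generic sketch of that objective (the function names, the MSE-between-distributions choice, and the weighting `lam` are our assumptions, not the paper's exact loss):

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def consistency_loss(logits_clean, logits_aug):
    """Mean squared difference between the class distributions of two views."""
    return np.mean((softmax(logits_clean) - softmax(logits_aug)) ** 2)

def total_loss(logits_clean, logits_aug, labels, lam=1.0):
    """Supervised cross-entropy on the clean view + weighted consistency term."""
    p = softmax(logits_clean)
    ce = -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))
    return ce + lam * consistency_loss(logits_clean, logits_aug)
```

When the augmented view yields the same logits as the clean one, the consistency term vanishes and only the supervised loss remains.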
Table 1. Ablation experiment on the LUDB dataset.
Method               Pre (%)   Rec (%)   F1 (%)
Temporal stream      95.20     97.85     96.51
Morphology stream    95.85     97.10     96.47
DualStream + concat  96.25     98.30     97.26
DualStream + SCF     96.79     98.90     97.83
Table 2. Ablation experiment on the QT dataset.
Method               Pre (%)   Rec (%)   F1 (%)
Temporal stream      95.92     97.30     96.60
Morphology stream    96.15     96.95     96.55
DualStream + concat  96.75     97.65     97.20
DualStream + SCF     97.35     98.25     97.80
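The F1 column in both tables is the harmonic mean of precision and recall; a one-line check reproduces the headline scores (97.83% on LUDB, 97.80% on QT) from the DualStream + SCF rows:

```python
def f1(precision, recall):
    """F1 as the harmonic mean of precision and recall (values in percent)."""
    return 2 * precision * recall / (precision + recall)

print(f"LUDB: {f1(96.79, 98.90):.2f}")  # 97.83
print(f"QT:   {f1(97.35, 98.25):.2f}")  # 97.80
```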
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

Niu, Y.; Lin, N.; Tian, Y.; Tang, K.; Liu, B. ECG Waveform Segmentation via Dual-Stream Network with Selective Context Fusion. Electronics 2025, 14, 3925. https://doi.org/10.3390/electronics14193925
