1. Introduction
Non-invasive fetal electrocardiogram (FECG) monitoring is a vital clinical tool for evaluating fetal cardiac electrophysiology during late pregnancy and delivery. The bandwidth of the FECG is typically 5–100 Hz, with the dominant energy of the fetal QRS complex concentrated at approximately 8–45 Hz. FECG provides precise timestamps for each cardiac cycle and complete QRS–T morphological details, enabling the early identification of fetal distress, arrhythmias, and other abnormalities, thereby helping to reduce the incidence of adverse perinatal outcomes [
1,
2]. Nevertheless, in abdominal electrocardiogram (AECG) signals, the amplitude of the fetal component is typically only 1/5 to 1/10 that of the maternal electrocardiogram (MECG), or even lower. The FECG is therefore severely contaminated by strong maternal interference, electromyographic noise, respiration-induced baseline drift, power-line interference, and other instrumental disturbances. Although multichannel systems can enhance signal separation via spatial filtering, they are bulky, costly, and impractical for long-term home monitoring. In comparison, single-channel acquisition is characterized by simple hardware, a compact size, low cost, and superior wearability, making it highly suitable for home-based fetal surveillance. This configuration can be used to improve maternal compliance and expand the accessibility of FECG monitoring [
3,
4]. Accordingly, the efficient extraction of FECG from single-channel AECG is not only essential for addressing existing technical bottlenecks but also fundamental for realizing portable and widely accessible home fetal monitoring, thereby improving the timeliness and reliability of fetal health assessment [
5,
6].
Classical signal processing techniques, including Independent Component Analysis (ICA), Principal Component Analysis (PCA), adaptive filtering, wavelet transform, and template coherent averaging subtraction, deliver satisfactory performance in multichannel FECG extraction. However, under single-channel acquisition conditions, the absence of spatial information, combined with weak fetal signal components and nonstationary noise, significantly complicates maternal–fetal signal separation. Behar et al. reported that single-channel approaches such as template subtraction (TS), adaptive filtering, and echo state networks (ESNs) exhibited performance degradation under limited spatial information, particularly in the presence of strong maternal interference or high noise levels [
7]. ICA-based blind source separation relies on statistical independence and requires multichannel recordings, rendering it inapplicable to single-channel AECG signals. Adaptive filtering enables online updating for non-invasive FECG extraction [
8], yet it heavily depends on high-quality maternal reference signals that are difficult to acquire reliably. Wavelet-based methods suppress noise through multiresolution decomposition but are sensitive to threshold selection and may distort the morphology of fetal QRS complexes under low-signal-to-noise ratio (SNR) conditions [
9]. Although TS effectively removes maternal QRS complexes [
10], it demonstrates limited robustness against high levels of background noise and temporal overlap between maternal and fetal cardiac waveforms.
Data-driven decomposition methods have emerged as promising alternatives for single-channel FECG extraction. Empirical Mode Decomposition (EMD) and Ensemble Empirical Mode Decomposition (EEMD) are well suited for nonlinear and nonstationary AECG signals, as they do not require predefined basis functions [
11,
12,
13]. Dash and Nath combined EMD with wavelet decomposition for fetal QRS extraction [
14]; however, their selection of intrinsic mode functions (IMFs) was based on fixed frequency criteria, which limits the robustness of the method under low-SNR conditions. Zhang and Guo applied EMD combined with coherent averaging [
13], but threshold-based selection carries the risk of retaining maternal residual components or discarding weak fetal ECG content. Liu and Luan introduced an EEMD-based method that uses energy and correlation coefficients for IMF selection [
15], yet manual intervention was still required. Barnova et al. integrated ICA, recursive least squares (RLS), and EEMD [
16], but manual IMF selection reduced the degree of automation and limited clinical practicability. Despite their adaptive decomposition capability, EEMD-based approaches still rely on heuristic rules or manual experience for IMF selection, which results in inconsistent performance across diverse clinical scenarios.
Although existing studies made important progress in system platform architecture and semi-automatic algorithm workflow, the core algorithmic bottleneck in single-channel fetal ECG extraction, namely the automatic selection of IMFs after EEMD decomposition, remained insufficiently resolved. In this study, a one-dimensional convolutional neural network (1D CNN) was adopted for IMF classification instead of Transformer or Long Short-Term Memory (LSTM) architectures. EEMD-derived IMFs exhibit local and quasi-periodic impulse patterns, and their discriminative features mainly lie in QRS morphology and periodicity rather than long-range dependencies. 1D CNNs capture these local characteristics through translation-invariant convolution with fewer parameters and provide high computational efficiency that is critical for biomedical applications [
17,
18]. In contrast, Transformers require excessive computational resources, while LSTMs are prone to gradient vanishing and slow convergence [
19,
20]. Conventional machine learning methods rely on handcrafted features that cannot adequately describe the nonlinear and nonstationary characteristics of fetal QRS complexes, especially under low-SNR conditions [
21,
22,
23]. Studies that focus on the semantic discrimination of individual EEMD-derived IMFs and systematically integrate deep learning into single-channel FECG extraction remain limited [
24,
25], and this gap further motivated the proposed hybrid framework.
This study presented a deep learning-guided method for single-channel FECG extraction. This approach replaced manual component selection in conventional EEMD-based methods with a data-driven IMF classification module. First, EEMD decomposes single-channel AECG into multiple IMFs. A trained 1D CNN then analyzes each IMF and outputs fetal-related probabilities to automatically identify fetal-dominant components, thereby eliminating subjective component selection and enhancing robustness. Building on this module, EEMD decomposition, intelligent IMF selection, maternal QRS template coherent averaging subtraction, and a second EEMD-based purification step were integrated into a sequential multi-stage signal processing pipeline. Unlike conventional hybrid architectures requiring manual intervention for IMF selection, the proposed framework completely eliminates this subjective manual procedure. Previous approaches retain operator-dependent heuristic rules for component identification; in contrast, the proposed method uses a physiologically guided 1D CNN to classify IMFs based on fetal probability scores. This data-driven strategy transformed the conventional semi-automatic pipeline into a fully automated system without compromising clinical interpretability. The contribution of this study extends beyond algorithmic integration, providing a reproducible standard for single-channel FECG extraction using adaptive decomposition.
2. Materials and Methods
2.1. Method Validation Design
To rigorously evaluate the effectiveness, robustness, and clinical generalizability of the proposed single-channel FECG extraction method, a two-stage validation protocol was designed and implemented in this study. First, a mixed dataset was constructed using 15 high-fidelity simulated signals with accurate fetal R-peak ground-truth annotations (noise augmented to SNR = 10 dB) and 5 real clinical recordings from the Abdominal and Direct Fetal Electrocardiogram Database (ADFECGDB). A leave-one-subject-out (LOSO) cross-validation scheme was adopted to train and evaluate the IMF classification model, thereby preventing inter-subject data leakage and ensuring an objective assessment.
Second, short single-channel recordings from the Database for the Identification of Systems (DaISy), which were not used for training, were introduced as an independent external test set. Under identical preprocessing and maternal template subtraction settings, a head-to-head comparison was conducted between the proposed 1D CNN-guided IMF selection method (CNN-2×EEMD), a fixed-heuristic automatic selection method (Auto-2×EEMD), and manual visual selection (Manual-2×EEMD, used as a manual visual reference). Performance was evaluated using multiple metrics, including waveform fidelity, fetal R-peak detection accuracy, and signal-to-noise ratio improvement, in conjunction with visual inspection of the reconstructed waveforms. The overall procedure is shown in
Figure 1.
2.2. Dataset Description
To balance data diversity and training efficiency under a limited sample size, 15 single-channel abdominal ECG signals were generated. By adjusting key parameters, including maternal heart rate, fetal heart rate, and the fetal-to-maternal amplitude ratio, these signals covered clinical scenarios ranging from normal pregnancy to low-SNR conditions with weak fetal components. This design ensured sufficient diversity and difficulty for learning fetal characteristics, while LOSO was enabled to reduce overfitting and provide an objective assessment of generalization. Moreover, the chosen sample size limited the computational burden and reproduced the physiological property that fetal amplitudes are markedly lower than maternal amplitudes, providing reliable supervised labels for IMF classification under conditions close to clinical practice. Because high-quality public AECG recordings are scarce and often lack accurate fetal R-peak annotations, they could not directly support supervised deep learning training. Therefore, the 15 signals were generated using an H2-3000F maternal–fetal ECG simulator, and precise fetal R-peak annotation was performed on a 500 Hz sampling grid. The fetal ECG simulator (Xuzhou Mingsheng Electronic Technology Co., Ltd., Xuzhou, China) integrates four 18,650 lithium batteries and a dedicated power management module to ensure stable, low-interference operation. It supports 12 V DC charging and provides dual-channel outputs for maternal-only and mixed maternal–fetal ECG signals. Maternal and fetal parameters such as heart rate, R-wave amplitude, and signal intensity can be flexibly adjusted to simulate various clinical conditions. This simulator is widely used for algorithm development and validation of maternal–fetal ECG monitoring systems. However, systematic discrepancies exist between simulated and real clinical recordings, which may weaken model generalization. In typical simulators, the fetal-to-maternal amplitude ratio is fixed between 1:1.17 and 1:16, with only Gaussian noise and simple baseline drift added. Real abdominal ECG signals show dynamic amplitude fluctuations and complex non-Gaussian noise, including electromyographic interference and irregular baseline wander. Such domain mismatch may degrade model performance in practical applications. To alleviate this issue, random amplitude jitter was introduced during simulation to replicate physiological variability, and real clinical noise segments were incorporated to augment the training data. Based on the parameters listed in
Table 1, 15 synthetic datasets were generated, covering maternal heart rates of 60–120 bpm, fetal heart rates of 110–160 bpm, and variable fetal–maternal amplitude ratios. These cases included both normal and low-SNR conditions to better represent real clinical signals.
In addition, five clinical recordings were selected from ADFECGDB (sampling rate: 1000 Hz) to enhance the model’s generalization to real-world clinical data [
26,
27]. Together with the 15 simulated cases spanning extreme physiological parameter settings and a LOSO strategy, this design mitigated the limited number of clinical samples and supported an objective evaluation of generalization. ADFECGDB provides simultaneous recordings of direct fetal ECG reference leads and multiple abdominal channels. This study focuses on single-channel FECG extraction, so one abdominal channel was selected as the sole input for both training and inference. During IMF classifier training, the direct fetal lead was used only to detect fetal R-peaks and label the IMFs decomposed from the corresponding abdominal signal. For final inference, the direct fetal lead was excluded entirely; only the selected abdominal channel served as input, consistent with the single-channel paradigm. To ensure consistent processing of simulated and clinical data and reduce the effects of noise and frequency interference on subsequent EEMD decomposition and model training, a standardized preprocessing pipeline was applied to all signals used for IMF classifier training.
To further examine generalization on a fully independent external dataset, short single-channel recordings from DaISy that were not involved in training were used. DaISy was sampled at 250 Hz; each record was 10 s long, and five abdominal channels were available, though fetal R-peak annotations were not provided [
28]. Based on preliminary signal analysis, channels 2 and 3 were selected for validation. As shown in
Figure 2, these channels exhibited concentrated energy within the characteristic 8–45 Hz band.
Figure 3 shows that these channels contained fetal QRS complexes with sharper rising and falling edges and more stable RR intervals than other channels, which facilitated the construction of reliable pseudo-references. Fetal R-peak locations were manually verified on a beat-by-beat basis, and ambiguous samples were excluded, providing a robust basis for head-to-head comparisons among the three IMF selection strategies.
2.3. Preprocessing
2.3.1. Sampling Rate Unification
The simulated data were resampled from 500 Hz to 1000 Hz to ensure consistent processing with the ADFECGDB dataset. The abdominal signals in ADFECGDB were originally sampled at 1000 Hz and thus required no resampling. Resampling was implemented using a polyphase filtering method with an upsampling factor of 2 and a downsampling factor of 1, as expressed in Equation (
1):
where
is the unit impulse response of the polyphase filter, which ensures accurate mapping of the signal frequency components.
2.3.2. Signal Filtering
A 50 Hz IIR notch filter was used to remove power-line interference. A fourth-order Butterworth bandpass filter (0.1–100 Hz) was then applied via zero-phase forward–backward filtering (filtfilt) to avoid phase distortion. The 0.1 Hz cutoff attenuated baseline wander without excessive waveform distortion, while the 100 Hz upper limit preserved diagnostically relevant frequency content: maternal ECG components extend to 100 Hz and fetal QRS complexes reach approximately 80–90 Hz.
2.3.3. Noise Augmentation (Simulated Training Data)
To better match the SNR distribution observed in clinical recordings, the simulated AECG was augmented with synthetic EMG interference and baseline wander, and the overall SNR was controlled to 10 dB after augmentation (
Figure 4). In clinical practice, after contamination by EMG and baseline drift, the fetal signal SNR typically fell within 8–12 dB; thus, 10 dB was a representative value for common medium-to-low-SNR scenarios. This setting avoided insufficient noise at high SNR (e.g., >15 dB), which may reduce model generalization, while also preventing extremely low SNR (e.g., <5 dB) where fetal components are fully masked and meaningful features become difficult to learn. Consequently, the training data remained both challenging and informative, and the learned IMF discrimination capability was closer to practical clinical conditions [
29,
30]. The noise synthesis procedure was as follows.
EMG interference simulation: Gaussian white noise was generated and then bandpass-filtered at 20–150 Hz to obtain EMG-like components. Its amplitude was scaled to 0.1 times the standard deviation of the original AECG signal to mimic high-frequency interference caused by muscle activity.
Baseline wander simulation: Two low-frequency sinusoids at 0.2 Hz and 0.33 Hz were generated and summed, with an amplitude set to 0.18 times the standard deviation of the original AECG signal, to reproduce slow baseline fluctuations induced by respiration and posture changes.
Noise was added following the energy superposition principle. By precisely adjusting the amplitude ratio between the two noise sources, the overall SNR of the augmented signals was strictly maintained at 10 dB. The noise-augmented signals were processed using exactly the same preprocessing pipeline as the original signals: 50 Hz notch filtering followed by bandpass filtering. This ensured a consistent training data distribution, avoided additional bias introduced by preprocessing, and provided a realistic and standardized noise environment for model training.
2.4. Ensemble Empirical Mode Decomposition
After the above preprocessing, power-line interference and broadband noise were reduced, and baseline wander was partially attenuated, providing a cleaner input for subsequent separation of fetal and maternal ECG components [
31]. Next, EEMD was used for adaptive decomposition. The nonstationary AECG signal was decomposed into IMFs with different time scales and frequency characteristics, providing interpretable units for subsequent IMF classification. By introducing noise-assisted decomposition, EEMD could alleviate mode mixing in nonstationary signals [
16]. The decomposition was expressed as follows:
where
denotes the input AECG signal,
(
) are the extracted intrinsic mode functions, and
is the residual.
To balance decomposition stability and computational efficiency, a fixed EEMD configuration was adopted. The ensemble size was set to 30, and the added noise level was set to noise_width = 0.2, and the maximum number of the extracted IMFs was limited to 8 to capture components related to fetal QRS activity (approximately 8–45 Hz), residual maternal activity, and noise, while avoiding redundancy due to over-decomposition. During training, the preprocessed AECG was segmented into 10 s windows (10,000 samples at 1000 Hz), and each segment was decomposed into eight IMFs. During inference, the same EEMD configuration was applied in both decomposition stages (i.e., after the maternal template subtraction and on the subsequent residual), ensuring consistent IMF generation for CNN-based scoring and selection.
The first-stage reconstruction may still contain residual maternal activity and nonstationary noise, with the residual signal retaining low-amplitude fetal information, particularly under low-SNR conditions. Therefore, a second EEMD was applied to the residual to further decompose the remaining components, and the fetal-related IMFs were selected to refine the extracted FECG.
Each of the IMFs obtained by EEMD may contain fetal ECG, residual maternal activity, and noise. To train the IMF classifier, binary labels (fetal-related = 1; non-related = 0) were generated using the reference fetal R-peak locations, as described below.
2.5. IMF Label Generation Strategy
For the simulated data, fetal R-peak indices annotated at 500 Hz were mapped to the 1000 Hz grid after resampling by multiplying the indices by 2. For ADFECGDB, fetal R-peaks were detected from the direct fetal ECG lead after 8–45 Hz bandpass filtering, using an amplitude threshold with a minimum inter-peak interval constraint of 300 ms. These R-peak locations were used only to generate the IMF labels during training and were not used in the final inference stage.
The abdominal AECG was segmented into 10 s windows (10,000 samples at 1000 Hz). After EEMD decomposition of each segment, the global peak-to-peak amplitude
of each IMF and the local peak-to-peak amplitude
within a ±50-sample window around each fetal R-peak were computed. The labeling rule was defined in Equation (
3):
where
is the maximum local peak-to-peak amplitude and
is the standard deviation of the IMF. For segments without fetal R-peaks, all IMF labels were set to 0 by default, because such segments contribute little fetal information and were excluded from model training.
The thresholds were selected by considering both the physiological characteristics of fetal QRS complexes and statistical properties of the IMFs. Given that fetal amplitudes are typically only 1/5 to 1/10 of maternal amplitudes, the 1.4
criterion helped distinguish local fetal QRS pulses from random noise, reducing the risk that low-amplitude fetal activity is mislabeled as noise. The
criterion was set according to the expected energy contribution of fetal components within the informative IMFs, thereby selecting fetal-dominant components while excluding the IMFs dominated by maternal residuals or noise. Together, these two constraints were suitable for low-SNR single-channel AECG and improved the specificity and plausibility of the labeling procedure [
32].
To avoid mislabeling caused by isolated noise spikes (e.g., EMG bursts) within a single R-peak window, the local peak-to-peak value from one window was not relied on. Instead, for each IMF, the local peak-to-peak amplitudes across windows centered at all fetal R-peak reference points were computed, and the median was used as the final decision statistic. This strategy leveraged the quasi-periodic nature of fetal ECG and effectively distinguished true fetal activity that persists across most cardiac cycles from transient noise that occurs sporadically [
32,
33,
34].
In total, from 20 subjects (15 simulated and 5 clinical), 1505 IMFs (each sample length: 10,000 points) were obtained, among which 744 were labeled as fetal-related positives, accounting for 49.44%. To automatically score and classify the IMFs, a 1D CNN tailored to the IMF waveform characteristics (IMFClassifier1D) was designed. Notably, the labeling strategy was used only to construct reference labels for supervised learning and was not directly used for the final IMF selection decision. Unlike fixed-rule IMF screening, the trained CNN learned high-dimensional temporal features from the IMF waveforms, and its decision boundary was not constrained by a single threshold, thereby improving robustness and generalization during inference. The detailed architecture is described below.
2.6. Architecture of IMFClassifier1D
IMFClassifier1D was a 1D CNN designed for the IMF waveforms. As shown in
Figure 5, the input was a single-channel signal of length 10,000, corresponding to a 10 s IMF segment. The network consisted of five convolution–normalization–activation–pooling blocks and three fully connected layers [
35,
36]. The architectural design adopted a progressive multi-scale strategy: the kernel sizes were set to [31, 21, 15, 11, 7] in decreasing order, where large kernels (31, 21) in the early layers captured low-frequency trends (maternal ECG components), while small kernels (15, 11, 7) in the deep layers extracted high-frequency fetal ECG features. The first convolutional layer mapped the single-channel IMF to 32 feature channels, using a kernel size of 31 with a stride of 2 to rapidly reduce the temporal dimension, followed by batch normalization, ReLU activation, and max pooling. The next two convolutional layers increased the number of feature channels to 64 and 128, respectively, using kernels of 21 and 15, and further reduced the sequence length via pooling while extracting higher-level time–frequency features. The fourth and fifth convolutional layers increased the feature channels to 256 and 512, with kernels of 11 and 7, and adaptive average pooling was used to compress the temporal dimension to a fixed short length, yielding a fixed-dimensional high-level feature representation. This hybrid pooling strategy (max pooling for layers 1–4, adaptive average pooling for layer 5) balanced the preservation of local details with the stabilization of output dimensions. After flattening, the feature dimensions were progressively reduced to 1024 and 256 via three fully connected layers, and a scalar logit representing the log-probability of the IMF being fetal-related was subsequently generated. ReLU nonlinearities and a relatively high dropout rate were employed to mitigate overfitting.
2.7. Training Strategy and Cross-Validation
The above architecture was tailored for extracting time–frequency characteristics from the IMF waveforms. To ensure generalization across subjects and to mitigate overfitting and class imbalance, the following training strategy and cross-validation protocol were adopted. All IMFs were organized into a unified matrix in the format of feature–label–subject ID, and training and evaluation were conducted on 20 subjects (15 simulated and 5 real). In each fold, one subject was used as the validation set and the remaining 19 subjects were used for training.
To balance the class distribution, negative samples in the training set were resampled such that the number of negatives was approximately twice the number of positives. While a 1:1 balanced distribution is standard practice in many classification tasks, the EEMD decomposition process inherently generates an extensive and highly heterogeneous set of noise-dominated IMFs, including maternal ECG residuals, baseline wander, and high-frequency EMG noise. A 2:1 ratio ensured that the CNN was exposed to a sufficiently broad spectrum of these non-fetal IMF morphologies, rather than over-representing positive fetal IMFs. This design avoided overwhelming majority-class bias while preserving the network’s ability to distinguish subtle fetal ECG features from background noise, thus achieving an optimal trade-off between sensitivity and specificity for fetal IMF detection. The Adam optimizer (learning rate: 1 ×
; weight decay: 5 ×
) was used for training. The training batch size was 64 and the validation batch size was 128. The loss function was binary cross-entropy with logits, as shown in Equation (
4):
where
is the ground-truth IMF label,
is the model logit,
is the sigmoid function, and
N is the batch size. Each fold was trained for 60 epochs, and three independent runs with different random seeds were conducted to assess training stability.
During validation, the IMF-level AUC values and the ROC curves were computed. If a validation set contained only one class, the AUC was set to 0.5 to avoid bias. This yielded an AUC matrix of 3 runs × 20 subjects, which was used to analyze model performance and stability.
Using the above training and validation procedure, a stable IMF classifier that outputs a fetal-related probability for each IMF produced by EEMD was obtained. The classifier was then combined with conventional signal processing steps to form a complete single-channel FECG extraction method, as described below.
2.8. Single-Channel FECG Extraction Pipeline
First, the input AECG was preprocessed using a 50 Hz notch filter and a fourth-order 0.1–100 Hz bandpass filter to eliminate power-line interference and retain physiologically relevant frequency components. Maternal R-peaks were primarily detected using a method implemented in the NeuroKit2 toolbox, which integrates multimodal feature-recognition logic and adapts to different noise levels in AECG. This robustness is critical for handling scenarios where the maternal ECG dominates and the fetal component is weak. Compared with single-threshold detectors, this hybrid detection strategy reduces missed detections and false positives of maternal R-peaks. If primary detection failed, a fallback scheme based on an amplitude threshold with a minimum RR-interval constraint (500 ms, corresponding to a maximum maternal heart rate of 120 bpm) was implemented [
37,
38]. Fixed-length windows centered on the detected maternal R-peaks were then extracted, and a maternal ECG template was constructed via coherent averaging, as defined in Equation (
5):
where
is the number of maternal R-peaks,
is the location of the
k-th maternal R-peak, and
is the preprocessed AECG signal. The template was then superimposed at all maternal R-peak locations to reconstruct the maternal waveform, which was subtracted from the original signal to obtain a noise-contaminated fetal signal [
39].
The noise-contaminated fetal signal after the maternal subtraction was first decomposed by EEMD. A fetal-related probability was output by IMFClassifier1D for each IMF. For recordings sampled at rates lower than 1000 Hz, each IMF segment was internally resampled to 10,000 points (equivalent to 10 s at 1000 Hz) only for CNN inference to match the fixed input length of the IMFClassifier1D. Reconstruction was performed using the original IMFs at the original sampling rate. In the first EEMD stage, the two IMFs with the highest probabilities were summed to obtain the initial FECG estimate. In the second EEMD stage, only the single highest-scoring IMF was retained to refine the extracted FECG. Next, fetal R-peaks were detected from this estimate using the NeuroKit2 method, and a 10–60 Hz bandpass filter was applied to further suppress residual noise [
40]. The difference between the optimized initial FECG (after fetal R-peak detection and 10–60 Hz bandpass filtering) and the original initial estimate was treated as a residual. This residual was decomposed again by EEMD, and the same model was used to select the optimal IMFs, yielding a refined FECG. The final output was obtained using a zero-phase fourth-order Butterworth bandpass filter (10–60 Hz). Given the distinctly different signal characteristics processed in the two stages, it is essential to clarify the discrepancies in the criteria and outcomes of IMF selection for each stage. The input to Stage 1 is the residual signal after maternal template subtraction, which still contains prominent maternal ECG remnants, low-frequency noise, and core fetal ECG components, with fetal cardiac energy dispersed across multiple frequency bands. For this reason, we retained the two IMFs with the highest relevance scores; this strategy ensures the complete preservation of morphological features of the fetal QRS complex, and preliminary analyses confirm that fetal cardiac energy is primarily distributed in two adjacent IMFs. The input to Stage 2 is the reconstructed signal from Stage 1, from which most interference has been removed. The remaining fetal ECG signal has a low amplitude but a significantly reduced noise level, with cardiac energy highly concentrated. Accordingly, we only retained the IMF with the highest relevance score to effectively suppress the high-frequency noise not fully eliminated in Stage 1. From a statistical perspective, the average fetal relevance probability of the IMF selected in Stage 2 is significantly higher than that of the IMFs selected in Stage 1, which validates the progressive purification effect of the two-stage processing pipeline.
The proposed multi-stage pipeline, consisting of template subtraction, two-stage EEMD, and CNN-based selection, is designed with clear modular contributions. Template subtraction serves to effectively suppress strong maternal ECG interference and highlight weak fetal ECG components. Two-stage EEMD serves to adaptively decompose the mixed signal and separate fetal ECG from various interferences and noise. The CNN selection module serves to automatically identify fetal-related modes and ensure the accuracy and robustness of final extraction. Collectively, these three sequential modules work in synergy to achieve reliable and accurate fetal ECG extraction, with each component playing an indispensable and irreplaceable role.
4. Discussion
4.1. Interpretation of Key Experimental Findings
The LOSO cross-validation results demonstrated that IMFClassifier1D achieves both high discriminative performance and strong robustness for fetal-related IMF identification, with a mean AUC of 0.9282 ± 0.0189 across simulated and clinical datasets. This low SD indicates minimal sensitivity to random initialization, validating the stability of the model architecture. The optimal IMF selection strategy (two IMFs in Stage 1, one IMF in Stage 2) aligns with the signal characteristics of two-stage EEMD decomposition: retaining two IMFs in the first stage preserves complete fetal QRS morphological features from maternal-subtracted residual signals, while selecting one IMF in the second stage effectively suppresses high-frequency noise from reconstructed signals.
Ablation experiments further validated the rationality of the proposed architectural design for IMFClassifier1D. The decreasing kernel size configuration balances large receptive fields in shallow layers (capturing low-frequency maternal interference) and fine-grained feature extraction in deep layers (identifying fetal QRS high-frequency components). The pyramid channel structure achieves an optimal trade-off between feature extraction capability and model complexity, while hybrid pooling combines the strengths of max pooling (preserving local temporal details of QRS complexes) and adaptive average pooling (stabilizing output dimensions and enhancing generalization). Computational complexity analysis confirmed that IMFClassifier1D maintains superior performance (AUC = 0.9445) with ultra-low inference latency (2.24 ms) and computational overhead (0.08 GFLOPs), making it well-suited for real-time deployment on resource-constrained edge devices.
4.2. Comparison with Existing Practical System Platforms
The proposed CNN-2×EEMD method demonstrated excellent performance on standard datasets. However, it is crucial to objectively acknowledge the significant progress made in the clinical translation of fetal ECG monitoring technology. Previous studies established a clinical feasibility evaluation framework for single-channel methods, developed adaptive extraction systems integrating multiple signal processing techniques with explicit goals toward practical clinical application, and comprehensively summarized commercialization progress in the field, noting that multichannel wearable devices have entered the clinical translation stage. These works represent an important transition from laboratory algorithms to real-world deployment. Nevertheless, for practical deployment in home monitoring scenarios, existing system platforms still face critical limitations. Most current systems rely on multichannel electrode configurations and require manual IMF selection, hindering the degree of automation. Commercial systems predominantly adopt complex cloud computing architectures, which introduce network latency and privacy risks while offering insufficient real-time performance at the edge. More importantly, the key bottleneck for achieving full automation in single-channel acquisition scenarios, namely the intelligent selection of IMFs following EEMD decomposition, remains unresolved.
The CNN-2×EEMD method was designed for deployment on resource-constrained edge computing platforms. The IMFClassifier1D required merely 0.08 GFLOPs and exhibited an inference latency of 2.24 ms, which was readily supported by widely available ARM Cortex-M series microcontrollers (e.g., STM32H7 with Cortex-M7 at 480 MHz) or entry-level ARM64 processors (e.g., Raspberry Pi Zero 2W with Cortex-A53). This computational efficiency eliminated the dependency on cloud computing or GPU acceleration, ensuring data privacy and real-time responsiveness under unstable network conditions typical of home environments. For practical implementation, the single-channel abdominal ECG signal would undergo analog preprocessing and digitization, be processed through the CNN-2×EEMD pipeline on the embedded processor, and the extracted mECG and fECG waveforms would be transmitted via Bluetooth Low Energy to paired iOS/Android smartphones for real-time display and clinical interpretation. However, the present study was limited to algorithmic validation on standard datasets; hardware-in-the-loop testing on actual wearable prototypes was not conducted due to the absence of dedicated embedded development platforms and clinical-grade acquisition hardware in the laboratory. Future work would integrate the proposed algorithm with commercial ARM-based wearable systems to validate end-to-end latency, power consumption, and long-term monitoring stability in real-world home settings.
4.3. Comprehensive Analysis of Method Limitations
Despite promising experimental results, this study has several notable limitations that require objective acknowledgment. First, the clinical training dataset remains small (five subjects from ADFECGDB) and lacks geographic and demographic diversity, limiting generalization to extreme low-SNR scenarios (<5 dB) and special clinical cases such as multiple pregnancies or fetal arrhythmias. Second, the proposed method is exclusively designed for single-channel AECG signals and cannot be directly extended to multichannel data; the fixed IMF selection logic and maternal template subtraction strategy are optimized for single-channel characteristics, and adapting to multichannel inputs would require reconfiguring the feature fusion mechanism. Third, the model relies entirely on supervised training with labeled fetal reference leads, which restricts its applicability in clinical settings where ground-truth FECG labels are unavailable. This dependence on annotated data also increases the labor cost of model training and validation.
Beyond these foundational limitations, error analysis of experimental results reveals specific sources of performance degradation that hinder further optimization. For the DaISy Channel 3 dataset, two fetal R-peak missed detections were observed in the CNN-2×EEMD results, attributed to morphological distortion of fetal QRS complexes in the original signal (caused by overlapping maternal ECG components) and subsequent misclassification of relevant IMFs by IMFClassifier1D. High-frequency noise residues in Auto-2×EEMD results stem from incomplete maternal template subtraction, where the fixed template update rule fails to adapt to nonstationary maternal ECG variations, leading to residual maternal components being misclassified as fetal-related IMFs. Additionally, quantitative analysis of false detection cases in the NIFEA database (Case ARR01) shows that 28% of missed detections are associated with weak fetal QRS amplitudes and overlapping EMG interference, which reduces the accuracy of IMF feature extraction and classification.
To intuitively characterize these errors, time-domain error heatmaps were generated by marking missed detection/false detection positions on raw and processed waveforms, combined with PSD analysis of these regions. The heatmaps reveal that missed detection regions exhibit reduced energy in the 20–40 Hz core band of FECG (15–20% lower than normal regions) and increased energy accumulation in the 0–10 Hz band (maternal baseline drift), while false detection regions show abnormal high-frequency energy (>40 Hz) from EMG interference. This analysis confirms a direct correlation between error distribution and signal feature anomalies, providing clear targets for method improvement.
4.4. Targeted Directions for Future Improvement
To address the identified limitations and error sources, future work will focus on actionable and context-specific optimizations rather than general strategies. For dataset expansion, a standardized multi-center clinical dataset will be constructed, collecting over 100 samples from five or more regional hospitals across different geographic areas (Europe, Asia, North America) to resolve uneven data distribution. Each sample will include detailed metadata (maternal age, gestational week, acquisition device) to enable stratified analysis and reduce selection bias, with special emphasis on including extreme low-SNR cases and special clinical scenarios to enhance model generalization.
For model architecture optimization, channel attention mechanisms will be integrated into the 1D CNN backbone to enhance feature weighting for fetal QRS complexes in the 8–45 Hz band, improving discrimination against maternal interference and reducing IMF classification errors. Lightweight design strategies, including pruning redundant convolutional layers and implementing INT8 precision quantization, will be applied to reduce the model parameter count by 30% while maintaining an AUC above 0.93, further lowering inference latency and device power consumption for wearable deployment. To address the dependence on labeled data, semi-supervised learning frameworks will be explored, using unlabeled clinical data to pre-train the model and fine-tuning with a small subset of labeled samples to reduce reliance on ground-truth FECG references.
For error mitigation and application expansion, targeted improvements will be made to address specific error sources. The IMF label generation rule will be optimized to account for morphological variations in fetal QRS complexes, incorporating dynamic thresholding based on local signal SNR. The maternal template update mechanism will be enhanced with adaptive windowing to track nonstationary maternal ECG variations, reducing residual maternal components in subtracted signals. Additionally, the preprocessing workflow will be adapted for dynamic monitoring scenarios (maternal walking, sitting, lying down) by adding a motion artifact suppression module, combining adaptive filtering and IMF-based noise reduction tailored to wearable device acquisition characteristics. The optimized pipeline will be validated on a chest-worn wearable prototype with 24-h continuous monitoring to ensure robustness in real-world clinical settings.
5. Conclusions
This study proposed a CNN-guided two-stage EEMD method for single-channel FECG extraction, with a dedicated 1D CNN (IMFClassifier1D) achieving high precision (F1-score: 0.8372–0.9565), ultra-low computational overhead (0.08 GFLOPs), and strong generalization on diverse independent datasets including normal pregnancies (DaISy) and fetal arrhythmias (NIFEA DB). Systematic ablation experiments validated the optimality of architectural choices, including the decreasing kernel size strategy, pyramid channel configuration, and hybrid pooling mode. The IMF selection strategy was quantitatively optimized through parameter sensitivity analysis. The method achieved low inference latency (2.24 ms) with minimal computational requirements, demonstrating its suitability for real-time home monitoring. The modular design, which included maternal template subtraction, EEMD decomposition and CNN-guided IMF selection, effectively addressed core challenges in single-channel FECG extraction, including strong maternal interference and nonstationary noise. While limitations existed in dataset diversity, applicability scope, and labeled data dependence, the comprehensive error analysis and targeted improvement strategies outlined here provided clear pathways for advancing clinical translation and enabling real-time, portable fetal monitoring in diverse clinical scenarios. It was noted that although this study was validated only on standard datasets without large-scale clinical trials, the proposed ultra-low-complexity CNN-2×EEMD method provided a direct algorithmic foundation for integration with existing system platforms. Compared to existing semi-automatic methods and traditional adaptive methods, the fully automated edge-computing approach of this study offered distinct advantages in real-time performance, computational efficiency, and fully automated processing capability, potentially enabling true clinical translation for home fetal monitoring through future integration with hardware architecture of existing system platforms.