MCA-FM: Robust Non-Invasive Fetal ECG Extraction via Minimal Channel Attention and Flow Matching

Duan, Qingqing; Hu, Xinyu; Zhang, Yuwei; Xiao, Zhijun; Liu, Chengyu

doi:10.3390/app16125953

by Qingqing Duan ¹, Xinyu Hu ¹, Yuwei Zhang ¹,*, Zhijun Xiao ² and Chengyu Liu ³

¹ School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang 212003, China
² School of Information and Artificial Intelligence, Yangzhou University, Yangzhou 225127, China
³ The State Key Laboratory of Bioelectronics, School of Instrument Science and Engineering, Southeast University, Nanjing 210096, China
* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(12), 5953; https://doi.org/10.3390/app16125953 (registering DOI)

Submission received: 28 April 2026 / Revised: 3 June 2026 / Accepted: 10 June 2026 / Published: 12 June 2026

(This article belongs to the Special Issue Research and Technology in Electrocardiology)

Abstract

Non-invasive fetal electrocardiogram (FECG) extraction from maternal abdominal ECG (AECG) is crucial for prenatal monitoring but remains challenging due to strong interference from maternal ECG (MECG), baseline drift, and noise. We propose an FECG extraction method based on minimal channel attention (MCA) and flow matching (FM), learning a deterministic mapping from AECG to FECG via a probabilistic path. To balance the preservation of physiological signals and separation of interference, we employ bridge variance scheduling for the diffusion process. Target matching loss is introduced to regress the FECG directly, enhancing training stability and waveform fidelity. For feature selection, a minimal channel attention module with global average pooling and a single linear layer is embedded after feature extraction, capturing cross-channel dependencies with minimal parameters. Enhanced residual connections are incorporated to retain underlying features and optimize gradient flow in deep networks. Experiments on two public datasets (ADDB and BDDB) with a leave-one-out cross-validation strategy show that our method achieves average Pearson correlation coefficients (PCCs) of 0.94 ± 0.050 on ADDB and 0.91 ± 0.122 on BDDB, demonstrating robust performance across diverse real-world recording conditions. The method balances high accuracy with efficient feature extraction, offering a reliable solution for non-invasive fetal heart health monitoring.

Keywords: minimal channel attention; flow matching; non-invasive fetal electrocardiogram; residual connection

1. Introduction

Fetal electrocardiogram (FECG) is a cornerstone signal for non-invasive prenatal evaluation of fetal cardiac health. It captures a myriad of key physiological indicators, most notably fetal heart rate (FHR) and the morphology of the QRS complex. These parameters offer a pivotal and reliable foundation for the early diagnosis of critical fetal distress conditions [1]. Scalp electrocardiography can obtain clear fetal electrocardiograms, but the signal can only be obtained during delivery, and the risk of fetal infection is relatively high. Non-invasive fetal ECG (FECG) extraction is typically implemented by acquiring mixed abdominal electrocardiogram (AECG) signals via patch-type electrodes affixed to the maternal abdomen, which constitutes the core of a non-invasive abdominal patch-based real-time FECG monitoring and analysis system. This approach offers prominent merits including fetal safety, compatibility with real-time processing, and feasibility for long-term continuous monitoring, establishing it as a key research focus in clinical prenatal fetal health assessment [2]. The complete workflow of signal acquisition, FECG extraction, and real-time analysis in this system is illustrated in Figure 1. The system is structured as a four-stage pipeline: Figure 1a shows a wearable abdominal biosensor patch with textile-integrated hydrogel electrodes and a low-power wireless chip, which collects raw multi-channel AECG signals non-invasively and transmits data via Bluetooth; Figure 1b shows a patient-facing mobile terminal that supports real-time FHR display and localized arrhythmia alerts; Figure 1c shows an AI-powered IoMT cloud platform that performs secure AECG signal reconstruction and MECG–FECG separation; and Figure 1d shows a bidirectional remote clinician portal that enables multi-modal fetal rhythm review and real-time prenatal health assessment. 
However, the amplitude of the FECG signal is only 1/10 to 1/5 of the maternal ECG (MECG), and the two overlap significantly in both time and frequency domains. Simultaneously, complex interferences such as baseline drift, electromyographic noise, and power line interference are superimposed, collectively rendering accurate FECG extraction a persistent challenge in biomedical signal processing [3].

Conventional FECG extraction algorithms rely on traditional signal processing techniques, with the core idea to suppress MECG interference through linear separation or template matching. Blind source separation (BSS) and adaptive noise cancellation (ANC) are typical representatives. The ANC was first systematically proposed by Widrow et al. [4], achieving interference cancellation by constructing an MECG reference signal. However, limited by the nonlinear propagation characteristics of MECG from the maternal thorax to the abdomen, waveform differences between the reference signal and the actual interference led to suboptimal separation results. Independent component analysis (ICA), as a fundamental technique in BSS, was utilized by Zarzoso et al. [5] to blindly separate the MECG and FECG based on statistical independence. Nevertheless, its effectiveness was compromised by sensitivity to non-stationary noise [6]. More fundamentally, the linear mixing assumption underlying BSS and ICA does not hold for abdominal ECG recordings, where the maternal ECG propagates through a nonlinear, time-varying volume conductor from the thorax to the abdomen, causing waveform distortion and phase shifts that cannot be captured by linear demixing models [7]. To optimize the robustness of traditional methods, Zhang et al. [8] proposed an improved ANC scheme combining SVD with a smoothing window (SWSVD), enhancing interference suppression capability by adaptively constructing reference signals. However, performance still degraded significantly in low signal-to-noise ratio scenarios. Template matching (TM) and singular value decomposition (SVD) are also important branches of traditional methods. Cerutti et al. [9] proposed a template subtraction method based on coherent averaging, which canceled MECG interference by leveraging an averaged MECG template. Nonetheless, its performance heavily relied on accurate detection of MECG QRS complexes. 
When maternal and fetal QRS complexes overlap in time, which occurs frequently because their heart rates differ by only 20 to 40 bpm on average, template subtraction either fails to remove the maternal component completely or inadvertently attenuates the fetal waveform, leading to missed or distorted fetal beats [10]. Liu et al. [11] proposed a method that combined RR interval smoothing with an SVD template to enhance FECG detection accuracy in overlapping regions through correction of falsely detected R-waves. Nevertheless, the generalization capability of manually designed template parameters remained limited. Kanjilal et al. [12] employed SVD-based decomposition of the AECG matrix to extract the FECG from the residual after removing MECG-dominant components. However, it could not effectively separate fetal signals that overlap spectrally with MECG. Overall, traditional methods have simple structures and low computational cost. Nonetheless, they rely on handcrafted feature design and have poor adaptability to signal variations and noise.

In recent years, deep learning has provided end-to-end solutions for FECG extraction. Mohebbian et al. [13] introduced an attention-based CycleGAN, which employed attention masks to focus on fetal signal regions and reached a 99.70% F1-score. However, the full-layer attention design led to computational redundancy. Huang et al. [14] proposed TCGAN, which utilized temporal convolution to enhance temporal feature capture capability and improved waveform detail preservation. Nevertheless, the instability of adversarial training easily leads to mode collapse, where the generator learns to produce only a limited subset of plausible FECG morphologies and fails to capture the full variability of real fetal ECG waveforms. This is particularly problematic for fetal monitoring, where subtle morphological changes in the P-wave, T-wave, or ST segment can carry diagnostic significance. Basak et al. [15] developed 1D-CycleGAN, achieving morphology-preserving FECG extraction through spectral consistency constraints, further optimizing the waveform fidelity of GAN-based methods.

Beyond GAN-based approaches, attention mechanisms have been extensively investigated to enhance feature discrimination. Wang et al. [16] proposed PA²Net, which integrated periodic-aware attention with residual connections and used KL divergence shared weights to improve the robustness of joint FECG detection, achieving a positive predictive value (PPV) of 99.74% on the Abdominal and Direct Fetal ECG Database (ADDB), though the stacking of multiple modules increased model complexity. Chen et al. [17] developed a CNN–transformer hybrid model and compared the effectiveness of channel and spatial attention, verifying the advantage of channel attention for time-series signal adaptation. Wang et al. [18] introduced ECA-Net, an efficient local cross-channel interaction strategy without dimensionality reduction, implementing efficient channel attention via 1D convolution and providing theoretical support for attention design. Wang et al. [19] further proposed Correlation-Aware Attention CycleGAN, combining attention mechanisms with GANs to enhance feature discrimination capability. Nonetheless, the self-attention mechanism of transformers still suffers from the problem of computational cost increasing quadratically with sequence length.

Given the limitations of existing methods in handling complex noise and computational efficiency, alternative generative frameworks have been explored. Diffusion models have also been applied to signal reconstruction. Chen et al. [20] proposed DIFF-FECG, a conditional diffusion-based method designed to handle non-Gaussian noise, achieving a Pearson correlation coefficient (PCC) of 0.92. Flow matching methods offer a promising alternative. Lipman et al. [21] first proposed the generative modeling framework based on flow matching, which learns a deterministic vector field to achieve efficient mapping between distributions. Its single-step inference characteristic avoids the efficiency loss caused by multi-step iterations. However, existing flow matching methods lack specific feature enhancement design for the FECG extraction task, have limited ability to distinguish channel features between MECG and FECG, and struggle to further improve signal separation accuracy.

Despite these advances, several key limitations persist. Traditional methods lack robustness and have poor adaptability to MECG variations and complex noise. Deep learning methods suffer from a performance–efficiency imbalance: GANs and full-attention models are computationally complex, while diffusion models remain time-consuming for inference. The baseline flow matching model offers efficiency but lacks targeted attention enhancement, resulting in limited feature discrimination capability.

We propose a fetal electrocardiogram extraction algorithm based on minimal channel attention and flow matching (MCA-FM). We draw on the design ideas of hierarchical multi-kernel filtering (HNF) and diffusion-inspired conditional fusion (Diff) feature extraction modules found in [20] and add a minimal channel attention mechanism and strong residual connection to achieve synergistic optimization of accuracy and robustness. Specifically, a precise mapping from AECG to FECG is constructed based on deterministic flow transformation. Inspired by the non-dimensionality-reduction cross-channel interaction idea of ECA-Net [18], the newly added minimal channel attention module includes only global average pooling and a single linear layer. It uses learnable mixing weights to focus on key frequency channels related to the fetal signal, avoiding the redundant computation of full-layer attention. Enhanced residual connections fuse original features with attention-enhanced features, alleviating the gradient vanishing problem in deep models. Simultaneously, bridge variance scheduling (BVS) and target matching loss (TM) are introduced to ensure training stability and waveform morphology fidelity.

The main contributions of this paper are summarized as follows:

Novel Generative Modeling for FECG Extraction: We formulate non-invasive FECG extraction as a continuous-time generative process using flow matching. By learning a deterministic mapping via a probabilistic path, the model effectively isolates clean fetal signals from highly complex maternal mixtures, overcoming the limitations of linear assumptions in traditional blind source separation methods.
Minimal Channel Attention for Feature Discrimination: We introduce a minimal channel attention module tailored for physiological time-series signals. By capturing cross-channel dependencies without dimensionality reduction, this design significantly enhances the model’s ability to discriminate between MECG and FECG features. Crucially, it avoids computational redundancy, achieving a strict balance between high accuracy and efficient feature extraction.
Real-World Clinical Robustness and Efficient Inference: Validated on ADDB and BDDB via leave-one-out cross-validation, our method maintains consistent waveform reconstruction and R-peak detection across diverse subjects under severe interference such as uterine contractions. Leveraging single-step flow matching inference, it achieves linear time complexity and ultra-fast execution, making it well-suited for real-time prenatal monitoring and deployment on resource-constrained portable devices.

The remainder of this paper is organized as follows. Section 2 details the proposed MCA-FM framework, including the experimental datasets, preprocessing steps, model design, training strategy, inference process, and evaluation metrics. Section 3 presents the experimental results, including waveform reconstruction and R-peak detection performance, visualization verification, statistical validation, ablation studies, sensitivity analysis, normalization strategy comparison, cross-dataset generalization, evaluation on synthetic data, consistency of fetal physiological parameters across datasets, and resource footprint analysis. Section 4 discusses the advantages and limitations of the proposed method in comparison with existing approaches. Finally, Section 5 concludes the paper and outlines directions for future research.

2. Materials and Methods

This section details the proposed MCA-FM framework for non-invasive FECG extraction. We begin by introducing the datasets and preprocessing steps. We then present the model design, including the conditional flow matching formulation, network architecture, and our core improvements. Finally, we outline the training strategy, inference process, and evaluation metrics.

2.1. Databases

To verify the model’s generalization, we employ three datasets: two real-world clinical datasets of maternal abdominal ECG recordings and one synthetic dataset with controlled interference.

Abdominal and Direct Fetal ECG Database (ADDB): This contains abdominal recordings from 5 full-term pregnant women (38–41 weeks). Each recording includes 4 channels of abdominal electrode signals and 1 channel of direct fetal scalp ECG (reference signal). The sampling rate is 1000 Hz, and each recording lasts 5 min. This includes expert-annotated fetal R-peak positions, which are used for evaluating waveform reconstruction and R-peak detection accuracy [22,23].

Fetal electrocardiograms, direct and abdominal with reference heart beats annotations (BDDB): Contains abdominal recordings from 12 full-term pregnant women (38–42 weeks). This includes 4 channels of abdominal electrode signals with a sampling rate of 500 Hz. Each recording lasts 5 min. A direct fetal scalp ECG is simultaneously collected as a reference. Fetal R-peaks are automatically detected and then corrected by clinical experts [24]. Notably, BDDB contains a greater diversity of physiological interferences, such as uterine contractions and motion artifacts, making it a more challenging benchmark for evaluating the robustness of FECG extraction algorithms under realistic clinical conditions.

Fetal ECG synthetic database (FECGSYNDB): A synthetic dataset generated based on 10 virtual pregnancy scenarios. It contains 32 channels of abdominal electrodes and 2 channels of maternal reference signals. The sampling rate is 250 Hz, and each recording lasts 5 min. The dataset includes seven event scenarios: baseline (no events), noise only, fetal movement, maternal or fetal heart rate accelerations or decelerations, uterine contractions, ectopic beats (for both fetus and mother), and additional NI-FECG (twin pregnancy). Each scenario is available at five noise levels (0–12 dB), enabling evaluation of the model’s robustness under controllable interference [6]. For this dataset, we select four abdominal channels (indices 11, 19, 22, and 25) as the input leads [25].

2.2. Preprocessing

To unify the signal distribution and suppress interference, the signals are preprocessed in the following steps:

1. Filtering: A 7.5–75 Hz bandpass filter (Butterworth, 3rd order) is applied to suppress EMG noise and baseline drift [20,26]. Simultaneously, 50 Hz and 60 Hz notch filters are used to reduce power line interference.
2. Downsampling: ADDB (1000 Hz), BDDB (500 Hz), and FECGSYNDB (250 Hz) are uniformly downsampled to 200 Hz to balance computational efficiency and signal fidelity.
3. Normalization: Z-score normalization is applied, with the following formula:

$$s_{norm} = \frac{s - μ}{σ}, \tag{1}$$

where $μ$ and $σ$ are statistics (mean and standard deviation) calculated from the training set, which keeps magnitude differences from affecting training.

4. Segmentation: The signal is segmented into 5 s overlapping segments (1000 sample points) with an overlap rate of 50%. This ensures that each segment contains a complete fetal heartbeat waveform while also augmenting the training sample size.
5. Multichannel handling: All datasets provide four abdominal leads. The MCA-FM model itself operates on single-lead input: it takes one channel of AECG and outputs the estimated FECG. During training, segments from all four channels are independently preprocessed and combined to enlarge the training set, with each single-lead segment paired with the corresponding reference FECG. During inference, the trained model is applied separately to each lead, and the lead with the highest Pearson correlation coefficient relative to the reference is selected as the final output per subject.
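The normalization and segmentation steps above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the function names `zscore_with_train_stats` and `segment` are hypothetical, the statistics are pooled over all training samples using the population standard deviation (the text does not say whether they are pooled or per recording), and the filtering and downsampling steps are omitted.

```python
import statistics

def zscore_with_train_stats(train_signals, signal):
    """Normalize `signal` with mean/std computed on the training signals only (Eq. 1)."""
    pooled = [v for sig in train_signals for v in sig]
    mu = statistics.fmean(pooled)
    sigma = statistics.pstdev(pooled)  # assumption: population std, pooled over all samples
    return [(v - mu) / sigma for v in signal]

def segment(signal, fs=200, seconds=5, overlap=0.5):
    """Cut a signal into fixed-length windows with fractional overlap."""
    length = int(fs * seconds)          # 1000 samples at 200 Hz
    hop = int(length * (1 - overlap))   # 500 samples for 50% overlap
    return [signal[i:i + length] for i in range(0, len(signal) - length + 1, hop)]

# Toy usage: a 12 s ramp sampled at 200 Hz (2400 samples), not real ECG data.
toy = [float(i) for i in range(2400)]
windows = segment(zscore_with_train_stats([toy], toy))
```

With a 2400-sample input, the sketch yields three windows of 1000 samples each, and consecutive windows share 500 samples.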

2.3. Proposed MCA-FM Architecture

The non-invasive FECG extraction task can be described as follows: Given a single-channel or multi-channel mixed signal $s \in ℝ^{L}$ (AECG) recorded from the maternal abdomen, the goal is to recover the pure fetal ECG signal $\hat{s} \in ℝ^{L}$ (FECG) contained within it, where $L$ is the signal length. The AECG can be modeled as the superposition of the FECG and interference noise $n$, primarily maternal ECG (MECG):

$$s = \hat{s} + n. \tag{2}$$

Our network architecture is based on the conditional flow matching (CFM) framework. Referring to the design concept of HNF and Diff blocks, it adopts a multi-scale convolutional feature extraction structure. The core improvement lies in adding a minimal channel attention module and enhanced residual connections (Figure 2), achieving a balance between accuracy and feature extraction efficiency.

2.3.1. Conditional Flow Matching Formulation

Flow matching provides an efficient way to learn a continuous transformation from a simple prior distribution $p_{0}(x)$ to a complex data distribution $p_{1}(x)$. Its core is to learn a time-dependent vector field $v_{θ}(x_{t}, t, s)$, which defines an ordinary differential equation (ODE) [21]:

$$\frac{d}{dt} x_{t} = v_{θ}(x_{t}, t, s), \tag{3}$$

where $t \in [0, 1]$, $x_{0} \sim p_{0}$, and $x_{1}$ should follow the target data distribution $p_{1}(s)$. By numerically integrating this ODE from $t = 0$ to $t = 1$, an estimated FECG signal $\hat{s}$ can be generated from the observation $s$.

Conditional flow matching (CFM) constructs a time-dependent probability path $p_{t}(x \mid s, \hat{s})$, which starts near the clean FECG $\hat{s}$ at $t = 0$ and ends at the observed AECG $s$ at $t = 1$. Specifically, we define a linear interpolation path, as follows:

$$x_{t} = μ_{t}(\hat{s}, s) + σ_{t} ε, \quad ε \sim \mathcal{N}(0, I), \tag{4}$$

where the mean path $μ_{t}(\hat{s}, s) = (1 - t)\hat{s} + t s$ ensures that the trajectory transitions smoothly from $\hat{s}$ to $s$, and where $σ_{t}$ is the noise scale following the bridge variance schedule (BVS). The corresponding true vector field is as follows:

$$u_{t} = \frac{dμ_{t}}{dt} + \frac{dσ_{t}}{dt} ε = (s - \hat{s}) + \dot{σ}_{t} ε. \tag{5}$$

A parameterized neural network $v_{θ}(x_{t}, t, s)$ is trained to match this vector field by minimizing the following loss:

$$\mathcal{L}_{CFM} = \mathbb{E}_{t, ε} ‖v_{θ}(x_{t}, t, s) - u_{t}‖_{2}^{2}. \tag{6}$$

The trained model thus defines a deterministic ODE. Because the path runs from the clean FECG at $t = 0$ to the observed AECG at $t = 1$, inference starts from the initial point $x_{1} \approx s$ and integrates this ODE backward to $t = 0$. A single integration step can generate a high-quality FECG estimate $\hat{s}$, which gives a significant efficiency advantage over diffusion models that require multi-step iterations.
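As a rough illustration of Equations (4) and (5), the following sketch draws one point on the conditional path together with its target velocity. The function name `sample_path_point` and the toy signals are assumptions made for the example; the noise scale and its time derivative are passed in as plain numbers so that the snippet does not depend on any particular schedule.

```python
import random

def sample_path_point(fecg, aecg, t, sigma_t, dsigma_t, rng):
    """Draw x_t (Eq. 4) and the matching target velocity u_t (Eq. 5)."""
    eps = [rng.gauss(0.0, 1.0) for _ in fecg]
    mu_t = [(1 - t) * f + t * a for f, a in zip(fecg, aecg)]      # mean path
    x_t = [m + sigma_t * e for m, e in zip(mu_t, eps)]
    u_t = [(a - f) + dsigma_t * e for f, a, e in zip(fecg, aecg, eps)]
    return x_t, u_t

rng = random.Random(0)
fecg = [0.0, 1.0, 0.0, -0.5]   # toy clean FECG samples (hypothetical values)
aecg = [0.3, 2.0, -0.4, 0.1]   # toy observed AECG samples (hypothetical values)
# With zero noise the path is the straight line between the two signals.
x_half, u_half = sample_path_point(fecg, aecg, t=0.5, sigma_t=0.0, dsigma_t=0.0, rng=rng)
```

In the noise-free case, `x_half` is the sample-wise midpoint of the two signals and `u_half` equals `aecg - fecg`, which is the constant velocity of the linear mean path.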

2.3.2. Flow Matching Modules

HNF Block: Captures multi-scale temporal features through 4 parallel 1D convolutions (kernel sizes 3, 5, 9, 15), covering from fetal QRS complex (~0.1 s) to maternal QRS complex (~0.15 s). The convolution outputs are concatenated and then compressed using a 9 × 1 convolution. Subsequently, channel splitting (instance normalization branch + identity branch) is applied. After concatenation, LeakyReLU activation is used. Finally, a residual connection fuses the input features to alleviate gradient vanishing. Instance normalization in the HNF block is applied only to half of the feature channels after channel splitting and operates on learned feature maps rather than raw ECG signals. It is important to distinguish this internal instance normalization from the input-level Z-score normalization applied during preprocessing. The Z-score normalization removes global amplitude variations across different subjects and recordings, ensuring that the network receives inputs with a consistent scale. In contrast, instance normalization operates on deep feature maps within the network, helping to stabilize the distribution of intermediate features and reduce internal covariate shift [27].

Diff block: Encodes the time step $t$ via positional encoding (PE) and maps it to modulation parameters through a fully connected layer. These parameters are element-wise added to the input features. Then, bidirectional dilated convolutions that progress through the layers capture long-range dependencies. A gating mechanism (tanh feature × sigmoid ($σ$) gate) selects effective features. Finally, a 1 × 1 convolution compresses the dimensions and a residual connection fuses the output with the output of the HNF block, injecting time step information to guide the flow transformation.
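Two ingredients of the Diff block, the time-step encoding and the gating mechanism, can be sketched as follows. The text does not specify the form of the positional encoding, so the sinusoidal form used here is an assumption, as are the names `timestep_embedding` and `gated_unit` and the embedding size of 8. The dilated convolutions and the fully connected modulation layer are left out.

```python
import math

def timestep_embedding(t, dim=8):
    """Sinusoidal encoding of the scalar time step t (assumed form, not from the paper)."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * k / half) for k in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]

def gated_unit(filter_in, gate_in):
    """Element-wise tanh(filter) * sigmoid(gate)."""
    return [math.tanh(a) / (1.0 + math.exp(-b)) for a, b in zip(filter_in, gate_in)]

emb = timestep_embedding(0.0)            # all sine terms 0, all cosine terms 1
gated = gated_unit([100.0], [0.0])       # tanh saturates at 1, sigmoid(0) = 0.5
```

The gate multiplies a bounded feature in (-1, 1) by a weight in (0, 1), so a feature passes through only when its gate branch is strongly positive.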

2.3.3. Minimal Channel Attention Module

To enhance the selection capability of key fetal features, a minimal channel attention module is added after the Diff block feature fusion. Following the cross-channel interaction idea without dimensionality reduction of ECA-Net [18], it avoids the redundant computation of full-layer attention. Given an input feature $F \in ℝ^{C \times L}$ ($C$ is the number of channels), the module first compresses the temporal information to $ℝ^{C \times 1}$ using AdaptiveAvgPool1d (global average pooling). Then, a single linear layer captures cross-channel dependencies (adapting to the local channel correlations of time-series signals). Finally, a Sigmoid activation function generates the channel weight vector $ω \in ℝ^{C \times 1}$, with the following formula:

$$ω = σ(\mathrm{Linear}(\mathrm{AdaptiveAvgPool1d}(F))), \tag{7}$$

where $σ(\cdot)$ denotes the Sigmoid function here, not the noise scale $σ_{t}$. The weighted feature is $\hat{F} = ω \cdot F$. A learnable mixing weight $α = 0.05$ is introduced to balance the original feature and the enhanced feature:

$$F_{out} = F + α(\hat{F} - F). \tag{8}$$

This module introduces $C^{2}$ parameters from the linear layer, which is negligible relative to the total model size.
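The computation in Equations (7) and (8) is small enough to write out without a deep learning framework. The sketch below is a plain-Python illustration, not the authors' implementation: the name `minimal_channel_attention` is hypothetical, the bias vector is an added assumption (the stated $C^{2}$ parameter count suggests the linear layer may have none), and $α = 0.05$ is treated as a fixed value, although the text describes it as learnable.

```python
import math

def minimal_channel_attention(feat, weight, bias, alpha=0.05):
    """feat: C channels of length L; weight: C x C matrix; bias: length C (assumed)."""
    pooled = [sum(ch) / len(ch) for ch in feat]                 # global average pooling
    logits = [sum(w * p for w, p in zip(row, pooled)) + b       # single linear layer
              for row, b in zip(weight, bias)]
    omega = [1.0 / (1.0 + math.exp(-z)) for z in logits]        # Eq. (7): sigmoid weights
    # Eq. (8): F_out = F + alpha * (omega * F - F)
    return [[x + alpha * (w * x - x) for x in ch] for ch, w in zip(feat, omega)]

feat = [[1.0, 3.0], [2.0, 2.0]]            # toy feature map, C = 2, L = 2
zero_w = [[0.0, 0.0], [0.0, 0.0]]          # zero weights give omega = 0.5 everywhere
out = minimal_channel_attention(feat, zero_w, [0.0, 0.0])
```

With all weights at zero, every channel weight is 0.5 and each feature is scaled by $1 - 0.05 \times 0.5 = 0.975$, which shows how a small $α$ keeps the output close to the original feature.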

2.3.4. Enhanced Residual Connection

To optimize gradient flow in the deep network, we systematically enhance residual connections at key positions in the network. First, the identity mapping residual connection inside the HNF block is retained. Second, a new strong residual connection is added between the attention module and subsequent feature transmission. Specifically, the attention-weighted features do not directly replace the original features. Instead, a linear fusion of “original feature + weighted residual” ($F_{out} = F + α(\hat{F} - F)$) is adopted. This design ensures that underlying effective features are not overwritten, thereby stabilizing the training process and accelerating model convergence.

2.4. Training Strategy

Training is performed using the target matching (TM) objective, directly regressing the clean signal to provide stable gradients. The training loss function is defined as follows:

$$\mathcal{L}_{TM} = \mathbb{E}_{t, ε} ‖x_{θ}(x_{t}, t, s) - \hat{s}‖_{2}^{2}, \tag{9}$$

where $x_{θ}$ is the MCA-FM network.

Bridge variance scheduling (BVS) is introduced to enhance training stability and waveform morphology fidelity. BVS adopts a parabolic form:

$$σ_{t} = σ_{\min} + (σ_{\max} - σ_{\min}) \cdot 4t(1 - t). \tag{10}$$

We set $σ_{\min} = 0.0$ and $σ_{\max} = 0.5$. This maintains low variance at the path endpoints ($t = 0, 1$) to stay close to the physiological signal and high variance in the middle ($t = 0.5$) to explore the interference separation space.

Training parameters are set as follows: The AdamW optimizer is used, with a learning rate of 2 × 10⁻⁴ and weight decay of 5 × 10⁻⁵. The batch size is set to 8, and the number of training epochs is 90. An early stopping strategy is adopted (training stops if the validation set PCC does not improve for 5 consecutive epochs), and gradient clipping (norm threshold 1.0) is used to stabilize the training process. The combination of leave-one-out cross-validation, early stopping on validation PCC, weight decay, and gradient clipping serves to detect and prevent overfitting. Across all experimental folds, no systematic divergence between training and validation performance was observed at convergence. All convolutional layers use Kaiming normal initialization with biases set to zero, and fully connected layers use the PyTorch 2.7.1 default Kaiming uniform initialization. The random seed was set to 1 for all experiments, and descriptive statistics (mean ± SD) are reported over the five leave-one-out cross-validation folds. The MCA-FM network uses a feature dimension of 128 channels throughout all HNF and Diff blocks, with five HNF blocks, six bridge modules, and five Diff blocks. The total number of trainable parameters is approximately 6.62 M. The complete training procedure is summarized in Algorithm 1.
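The parabolic schedule in Equation (10) and its time derivative can be checked numerically with a short sketch. The function names `bvs_sigma` and `bvs_sigma_dot` are illustrative; the default values are the $σ_{\min}$ and $σ_{\max}$ stated above, and the derivative is the closed form obtained by differentiating Equation (10) with respect to $t$.

```python
def bvs_sigma(t, sigma_min=0.0, sigma_max=0.5):
    """Bridge variance schedule, Eq. (10): low at the endpoints, peak at t = 0.5."""
    return sigma_min + (sigma_max - sigma_min) * 4.0 * t * (1.0 - t)

def bvs_sigma_dot(t, sigma_min=0.0, sigma_max=0.5):
    """Time derivative of Eq. (10): 4 (sigma_max - sigma_min) (1 - 2t)."""
    return 4.0 * (sigma_max - sigma_min) * (1.0 - 2.0 * t)

# Schedule sampled at t = 0, 0.25, 0.5, 0.75, 1.
schedule = [(k / 4, bvs_sigma(k / 4)) for k in range(5)]
```

With the stated settings the noise scale is 0 at both endpoints, 0.375 at $t = 0.25$ and $t = 0.75$, and 0.5 at the midpoint, while the derivative changes sign at $t = 0.5$.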

Algorithm 1 MCA-FM Training

Input: Paired AECG-FECG data $\{(s^{(i)}, \hat{s}^{(i)})\}_{i = 1}^{N}$, network $x_{θ}$ (including HNF block, Diff block and attention), bridge variance parameters $σ_{\min}, σ_{\max}$, learning rate $η$, epochs $E$, batch size $B$

Output: Trained model $x_{θ}$

for epoch = 1 to $E$ do
    for each mini-batch $\{(s^{(i)}, \hat{s}^{(i)})\}$ of size $B$ do
        Sample $t$ uniformly from $[0, 1]$
        Compute $σ_{t} = σ_{\min} + (σ_{\max} - σ_{\min}) \cdot 4t(1 - t)$
        Compute $μ_{t} = (1 - t) \cdot \hat{s}^{(i)} + t \cdot s^{(i)}$
        Sample noise $ε \sim \mathcal{N}(0, I)$, set $x_{t} = μ_{t} + σ_{t} \cdot ε$
        Extract features from $x_{t}$ via HNF block, from $s^{(i)}$ via Diff block, then fuse to obtain $f$
        Apply channel attention: $f_{att} = f + α \cdot (ω \cdot f - f)$ // $ω$ from Equation (7), fusion as in Equation (8), $α$ learnable
        Predict clean FECG: $\hat{s}_{t} = x_{θ}(x_{t}, t, s^{(i)})$
        Compute loss $\mathcal{L} = ‖\hat{s}_{t} - \hat{s}^{(i)}‖_{2}^{2}$
        Update $θ \leftarrow θ - η \nabla_{θ} \mathcal{L}$
    end for
end for

2.5. Inference Process

Inference follows single-step ODE integration. Starting from $t = 1$, we initialize $x_{1} = s + σ_{1} \cdot ε$ with $ε \sim \mathcal{N}(0, I)$ and $σ_{1}$ computed from Equation (10). This initialization follows naturally from the path definition: $t = 1$ corresponds to the observed AECG $s$, so the starting point is set near the observation. A small amount of noise is added for numerical stability during integration. The clean signal and the vector field are estimated by the trained network:

$$\hat{s}_{t} = x_{θ}(x_{t}, t, s), \tag{11}$$

$$\hat{u}_{t} = (s - \hat{s}_{t}) + \frac{\dot{σ}_{t}}{σ_{t}} (x_{t} - μ_{t}(\hat{s}_{t}, s)), \tag{12}$$

where $μ_{t}(\hat{s}_{t}, s) = (1 - t)\hat{s}_{t} + t s$ and $\dot{σ}_{t} = 4(σ_{\max} - σ_{\min})(1 - 2t)$. The state is updated using the Euler method with a single step ($Δt = -1$):

$$x_{t - 1} = x_{t} - \hat{u}_{t}. \tag{13}$$

The final extracted FECG is obtained as $\hat{s} = x_{0}$. The inference procedure is detailed in Algorithm 2.

Algorithm 2 MCA-FM Inference (Single-Step ODE)

Input: Noisy AECG $s$, trained model $x_{θ}$, variance parameters $σ_{\min}, σ_{\max}$

Output: Extracted FECG $\hat{s}$

Set $t = 1.0$
Compute $σ_{t} = σ_{\min} + (σ_{\max} - σ_{\min}) \cdot 4t(1 - t)$
Initialize $x_{t} = s + σ_{t} \cdot ε$, with $ε \sim \mathcal{N}(0, I)$ (small noise added for robustness)
Predict clean FECG: $\hat{s}_{t} = x_{θ}(x_{t}, t, s)$
Compute $μ_{t} = (1 - t) \cdot \hat{s}_{t} + t \cdot s$
Compute $\dot{σ}_{t} = 4(σ_{\max} - σ_{\min})(1 - 2t)$
Compute velocity: $\hat{u}_{t} = (s - \hat{s}_{t}) + \frac{\dot{σ}_{t}}{σ_{t}} (x_{t} - μ_{t})$
Update state: $x_{t - 1} = x_{0} = x_{t} - \hat{u}_{t}$ (since $Δt = -1$ for single-step inference)
Obtain extracted FECG: $\hat{s} = x_{0}$

Return $\hat{s}$
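The single Euler step of Algorithm 2 can be traced with a stand-in for the trained network. In the sketch below, `single_step_inference`, `toy_model`, and the `floor` argument are assumptions made for illustration. In particular, with $σ_{\min} = 0.0$ Equation (10) gives $σ_{1} = 0$, so the ratio $\dot{σ}_{t} / σ_{t}$ in Equation (12) is undefined; the sketch guards the denominator with a small floor, which may differ from how the authors handle this case.

```python
import random

def single_step_inference(aecg, model, sigma_min=0.0, sigma_max=0.5, rng=None, floor=1e-8):
    """One Euler step from t = 1 to t = 0, following Eqs. (10)-(13)."""
    rng = rng or random.Random(0)
    t = 1.0
    sigma_t = sigma_min + (sigma_max - sigma_min) * 4.0 * t * (1.0 - t)
    sigma_dot = 4.0 * (sigma_max - sigma_min) * (1.0 - 2.0 * t)
    x_t = [a + sigma_t * rng.gauss(0.0, 1.0) for a in aecg]
    s_hat_t = model(x_t, t, aecg)                                  # Eq. (11)
    mu_t = [(1.0 - t) * p + t * a for p, a in zip(s_hat_t, aecg)]
    ratio = sigma_dot / max(sigma_t, floor)   # assumed guard: sigma_1 equals sigma_min, which may be 0
    u_hat = [(a - p) + ratio * (x - m)                             # Eq. (12)
             for a, p, x, m in zip(aecg, s_hat_t, x_t, mu_t)]
    return [x - u for x, u in zip(x_t, u_hat)]                     # Eq. (13) with step -1

def toy_model(x_t, t, s):
    """Hypothetical stand-in for the trained network: halves the observation."""
    return [0.5 * v for v in s]

aecg = [0.2, 1.0, -0.6]        # toy observed samples (hypothetical values)
fecg_hat = single_step_inference(aecg, toy_model)
```

Under the stated settings the noise term vanishes at $t = 1$, so the update reduces to $x_{0} = \hat{s}_{1}$: the single step returns the network's direct prediction of the clean signal.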

2.6. Evaluation Metrics

Two types of well-recognized metrics in the field of FECG extraction are respectively used to evaluate waveform quality and R-peak detection accuracy.

2.6.1. Waveform Reconstruction Quality Metrics

Pearson correlation coefficient (PCC): This measures the linear correlation between the estimated signal $\hat{s}$ and the true signal $s$ (in this section, $s$ denotes the reference FECG rather than the AECG). A value closer to 1 indicates higher waveform morphological consistency. The calculation formula is as follows:

$$\mathrm{PCC}(s, \hat{s}) = \frac{\sum_{i = 1}^{L} (s_{i} - \bar{s})(\hat{s}_{i} - \bar{\hat{s}})}{\sqrt{\sum_{i = 1}^{L} (s_{i} - \bar{s})^{2} \sum_{i = 1}^{L} (\hat{s}_{i} - \bar{\hat{s}})^{2}}}, \tag{14}$$

where $\bar{s}$ and $\bar{\hat{s}}$ are the means of the true and estimated signals, respectively. Notably, although PCC may yield optimistic values for very low-amplitude signals, it remains the most widely adopted waveform-level metric in the non-invasive FECG extraction task [28,29], as it directly quantifies morphological similarity between the extracted and reference signals.

Spectral correlation (SPC): This evaluates frequency domain consistency and is sensitive to the fidelity of periodic features (such as QRS complexes). The formula is as follows:

SPC (s, \hat{s}) = PCC (PSD (s), PSD (\hat{s})),

(15)

where PSD stands for power spectral density.

Mean absolute error (MAE): Quantifies the average amplitude deviation between the estimated and reference signals, with a lower value indicating better waveform fidelity. The formula is as follows:

MAE = \frac{1}{L} \sum_{i = 1}^{L} |s_{i} - {\hat{s}}_{i}|

(16)
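As a minimal sketch of Equations (14)–(16), the three waveform metrics can be written in plain Python. The text does not state how the PSD is estimated, so the direct-DFT periodogram used here is an assumption, as are the function names; a Welch estimate or another estimator would change the SPC values.

```python
import cmath
import math


def pcc(s, s_hat):
    """Pearson correlation coefficient between two equal-length sequences (Eq. 14)."""
    n = len(s)
    mean_s = sum(s) / n
    mean_h = sum(s_hat) / n
    num = sum((a - mean_s) * (b - mean_h) for a, b in zip(s, s_hat))
    den = math.sqrt(
        sum((a - mean_s) ** 2 for a in s) * sum((b - mean_h) ** 2 for b in s_hat)
    )
    return num / den  # undefined (division by zero) for a constant signal


def mae(s, s_hat):
    """Mean absolute error between reference and estimate (Eq. 16)."""
    return sum(abs(a - b) for a, b in zip(s, s_hat)) / len(s)


def periodogram(x):
    """One-sided periodogram via a direct DFT (assumed PSD estimator, O(L^2))."""
    n = len(x)
    psd = []
    for k in range(n // 2 + 1):
        coeff = sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        psd.append(abs(coeff) ** 2 / n)
    return psd


def spc(s, s_hat):
    """Spectral correlation: PCC of the two power spectral densities (Eq. 15)."""
    return pcc(periodogram(s), periodogram(s_hat))
```

Because PCC is invariant to positive rescaling, an estimate that is a scaled copy of the reference scores PCC = 1 and SPC = 1 while still having a nonzero MAE, which is why the three metrics are reported together.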

2.6.2. Fetal R-Peak Detection Metrics

From the estimated FECG signal $\hat{s}$, we use the Pan–Tompkins [30] algorithm to detect fetal R-peak positions. These are compared with the ground truth annotations to calculate the following metrics:

Sensitivity (Sen, recall): The ratio of correctly detected R-peaks to the total number of true R-peaks.

Sen = \frac{TP}{TP + FN} \times 100 %

(17)

Positive predictive value (PPV, precision): The ratio of correctly detected R-peaks to all detected peaks.

PPV = \frac{TP}{TP + FP} \times 100 %

(18)

F1-score: The harmonic mean of Sen and PPV.

F1 = \frac{2 \times Sen \times PPV}{Sen + PPV}

(19)

Here, TP, FN, and FP denote the numbers of true positives, false negatives, and false positives, respectively. A detected R-peak is counted as correct if it lies within 50 ms of a reference annotation.
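The counting rule behind Equations (17)–(19) can be sketched as follows. The text fixes only the 50 ms tolerance; the greedy one-to-one matching of sorted peak times and the function names are assumptions made for illustration, not necessarily the procedure used in the experiments.

```python
def match_r_peaks(ref_times, det_times, tol=0.050):
    """Match detected to reference R-peak times (seconds), one-to-one.

    Returns (TP, FP, FN) for a tolerance of `tol` seconds.
    """
    ref = sorted(ref_times)
    det = sorted(det_times)
    i = j = tp = 0
    while i < len(ref) and j < len(det):
        diff = det[j] - ref[i]
        if abs(diff) <= tol:
            tp += 1  # detection lies within the tolerance window
            i += 1
            j += 1
        elif diff < 0:
            j += 1  # detection is too early for this and all later references
        else:
            i += 1  # reference has no detection within tolerance
    return tp, len(det) - tp, len(ref) - tp


def detection_scores(tp, fp, fn):
    """Sensitivity, positive predictive value, and F1, all in percent."""
    sen = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    ppv = 100.0 * tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * sen * ppv / (sen + ppv) if sen + ppv else 0.0
    return sen, ppv, f1
```

For example, with reference peaks at 0.5, 1.0, 1.5, and 2.0 s and detections at 0.51, 1.04, 1.70, and 2.0 s, the detection at 1.70 s is 200 ms from the nearest annotation, so it counts as one false positive and the peak at 1.5 s as one false negative.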

3. Results

All experiments were conducted on a unified platform to ensure reproducibility. The hardware configuration included a Lenovo Legion Y9000P IRX8 laptop equipped with 16.0 GB DDR5 RAM and an NVIDIA GeForce RTX 4050 Laptop GPU (6 GB VRAM) for accelerated deep learning. The software environment comprised 64-bit Windows 11 Home (version 25H2), Python 3.9 within a Conda virtual environment, and PyTorch 2.7.1 as the core deep learning framework.

To comprehensively verify the performance of the proposed model, systematic experiments were conducted on the ADDB and BDDB clinical datasets. Consistent with the experimental design outlined in Section 2, the experiments used leave-one-out cross-validation and evaluated performance along multiple dimensions. Evaluation metrics included waveform reconstruction quality (MAE, SPC, PCC) and R-peak detection accuracy (Sen, PPV, F1), as well as Pearson correlation (r) and root mean square error (RMSE) for heart rate (HR) estimation, and mean difference with 95% limits of agreement (LoA) for R-to-R wave (RR) interval estimation. Together, these metrics assess morphology, localization, consistency, and generalization. The specific results are as follows.

3.1. Waveform Reconstruction Performance

Building on the designed preprocessing pipeline, we evaluated the overall performance of the MCA-FM model on real clinical datasets. In addition to ADDB, the model was tested on BDDB, which features a broader variety of interferences, to examine its generalization ability. Table 1 and Table 2 present the quantitative evaluation of the proposed model’s FECG waveform reconstruction quality on the ADDB and BDDB datasets, respectively. The core metrics are MAE, SPC, and PCC. The model demonstrated excellent FECG extraction capability on both datasets.

The ADDB dataset (Table 1) consists of recordings from five subjects, with detailed waveform reconstruction metrics reported for each. The model achieved excellent reconstruction performance, with mean PCC of 0.94 and MAE of 0.21, indicating high morphological agreement with the direct scalp ECG reference. The mean SPC of 0.94 confirmed strong frequency-domain consistency. Among individual subjects, r01, r04, r07, and r08 exhibited particularly high performance (PCC ≥ 0.96), while r10 displayed relatively lower PCC (0.86), possibly due to higher noise levels and atypical signal characteristics in that recording.

The BDDB dataset (Table 2), comprising recordings from 12 subjects with significant physiological interferences, such as uterine contractions and electrode displacement, more closely resembled practical clinical scenarios. Despite these challenges, the model maintained good waveform reconstruction stability, achieving mean PCC of 0.91, SPC of 0.92, and MAE of 0.27 ± 0.125. Most subjects attained PCC above 0.92, with B2_Labour_11 reaching the highest PCC of 0.97. However, subject B2_Labour_03 exhibited substantially lower performance (PCC = 0.53, MAE = 0.66). Further analysis revealed that all four abdominal channels of this subject had severely degraded signal quality, with the best channel achieving a PCC of only 0.52 and no channel exceeding 0.60. The fetal frequency band signal-to-noise ratio was approximately −0.1 dB, indicating that fetal ECG energy was nearly indistinguishable from background noise.

3.2. R-Peak Detection Performance

R-peak detection accuracy is a key indicator for clinical application, and performance across individual subjects reflects the model’s adaptability to varying signal quality. Table 3 and Table 4 present the detailed R-peak classification results (TP, FP, FN, TN) for each subject in the ADDB and BDDB datasets, respectively. From these, Sen, PPV, and F1 metrics were calculated. The model achieved consistently high detection accuracy across subjects, as further illustrated by the confusion matrices in Figure 3.

In the ADDB dataset (Table 3, Figure 3), the R-peak detection performance across the five subjects was generally balanced: Subjects r01 and r07 had zero false positives (FP = 0) and zero false negatives (FN = 0), that is, completely accurate classification. Subjects r04 and r08 had only two false positives each, with missed detections ≤ 3. Subject r10 showed six false positives and eight missed detections, possibly related to signal baseline drift caused by fetal movement in that recording, yet its F1-score was still 98.80%, indicating very high overall R-peak detection accuracy. The overall confusion matrix shows that the model’s true positive rate (TPR) and true negative rate (TNR) on ADDB were both close to 100%, with only a small number of misjudgments in high-interference segments.

In the BDDB dataset (Table 4, Figure 3), among the 12 subjects, B2_Labour_01 and B2_Labour_08, which had excellent waveform reconstruction metrics, also achieved completely accurate R-peak detection. Eleven subjects had F1-scores ≥ 97.10%, maintaining a high level of detection accuracy. Only B2_Labour_03 performed markedly worse (FP = 143, FN = 176). A waveform comparison with the best-performing subject confirmed that QRS complexes were almost entirely obscured by noise in this recording, resulting in the high false positive and false negative counts. Such recordings are typically classified as invalid in clinical practice and highlight the need for a signal quality assessment front-end in real-world wearable monitoring. The remaining subjects had false positive and false negative counts ≤ 31, indicating that the model was robust for most subjects.

3.3. Visualization Verification

To further corroborate the quantitative results with intuitive evidence, two representative subjects from the BDDB dataset were selected for layered validation: subject B2_Labour_04 (low SNR with weak FECG amplitude, a typical challenging scenario) was used to demonstrate the model’s FECG extraction effect and waveform consistency, while subject B2_Labour_11 (strong uterine contraction interference) was adopted to verify the R-peak localization accuracy.

3.3.1. Visual Analysis of Extracted FECG Waveforms

Figure 4 illustrates the FECG extraction process using subject B2_Labour_04 as an example. The raw AECG signal (Figure 4a) contains strong MECG components, background noise, and weakly expressed fetal ECG signals. After processing by the proposed model, the maternal component (Figure 4b) was effectively separated and the clean FECG signal (Figure 4c) was successfully extracted, with morphological features clearly restored. Compared with the true scalp FECG signal (Figure 4d), the extracted FECG achieved high consistency in both time and amplitude.

To further evaluate the model’s temporal stability over a longer duration, Figure 5 presents a continuous 30 s overlay of the extracted FECG and the ground-truth scalp FECG for the same subject, divided into three 10 s panels for visual clarity. The two waveforms exhibit strong agreement in both QRS timing and overall morphology, with no visible degradation or drift over the extended recording period, confirming the model’s consistent performance.

3.3.2. Verification of R-Wave Localization Accuracy

Figure 6 verifies the model’s R-peak detection performance using the extraction results from subject B2_Labour_11. Figure 6a shows the true FECG signal and expert-annotated R-peaks (red dots); Figure 6b shows the extracted FECG and predicted R-peaks (purple dots). In the displayed segment, the predicted R-peaks were closely aligned with the true R-peaks, with no missed or false detections, in line with the subject-level quantitative metrics (Sen = 99.70%, PPV = 99.40%, F1 = 99.50%). The time offset was less than 50 ms (clinically acceptable). The absence of missed and false detections under strong uterine contraction interference suggests that the minimal channel attention module focuses on fetal QRS features and suppresses background interference.

3.4. Statistical Validation of Cross-Dataset Stability

To statistically verify the cross-dataset stability of MCA-FM, we compared the PCC and F1 scores between ADDB (n = 5) and BDDB (n = 12) using the Mann–Whitney U test (independent samples, two-sided). The results are visualized in Figure 7. No statistically significant differences were found for either PCC (p = 0.1037) or F1 (p = 0.2226), indicating that the model maintains consistent performance across the two independent clinical datasets despite the larger variability in BDDB.
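For readers who want to reproduce this kind of comparison without external libraries, the following is a minimal sketch of a two-sided Mann–Whitney U test computed exactly by enumerating all group assignments of the pooled ranks. The software used for the reported p-values is not stated, so this exact permutation variant and its function names are assumptions, and its p-values need not coincide with those of an asymptotic implementation. With n = 5 and n = 12 there are only 6188 assignments, so enumeration is cheap.

```python
from itertools import combinations


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            ranks[k] = mean_rank
        pos = end + 1
    return ranks


def mann_whitney_exact(x, y):
    """Return (U statistic of x, exact two-sided p-value)."""
    n1 = len(x)
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    expected = n1 * (len(ranks) + 1) / 2  # mean rank sum under the null
    observed = abs(rank_sum_x - expected)
    total = extreme = 0
    # Every way of choosing which pooled ranks belong to the first group.
    for subset in combinations(ranks, n1):
        total += 1
        if abs(sum(subset) - expected) >= observed - 1e-9:
            extreme += 1
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    return u_x, extreme / total
```

Calling `mann_whitney_exact(pcc_addb, pcc_bddb)` with the per-subject scores of the two datasets (hypothetical variable names) would return the U statistic and the exact p-value.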

3.5. Ablation Study

To evaluate the contribution of each key component of MCA-FM, we conducted ablation experiments on the ADDB dataset by systematically removing or replacing six components: minimal channel attention (MCA), enhanced residual connection, bridge variance scheduling (BVS), target matching (TM) loss, single-step inference (replaced with 20-step Euler ODE solver), and instance normalization (IN). All ablations were performed under the same leave-one-subject-out cross-validation protocol as the full model. The results are summarized in Table 5.

Removing MCA leads to a slight decrease in Sen (99.64% to 99.54%), PPV (99.67% to 99.61%), and F1 (99.66% to 99.58%), indicating that MCA primarily contributes to improving recall and overall detection accuracy by enhancing the selection of key fetal features through cross-channel attention. Removing the enhanced residual connection causes a clear drop in F1 (99.66% to 99.46%) and Sen (99.64% to 99.50%), confirming its role in stabilizing gradient flow in deeper networks and preserving effective features that would otherwise be weakened during training.

Replacing BVS with a linear schedule results in a small but consistent degradation across all metrics, most notably F1, which decreases from 99.66% to 99.46%, and Sen, which decreases from 99.64% to 99.55%. This demonstrates that the parabolic variance schedule provides a measurable positive contribution, particularly to the overall F1 score.

Removing instance normalization from all HNF blocks leads to a consistent drop in R-peak detection performance: F1 falls from 99.66% to 99.58%, Sen from 99.64% to 99.58%, and PPV from 99.67% to 99.58%. These results confirm that instance normalization, applied internally on deep feature maps, makes a meaningful contribution to detection accuracy, likely by stabilizing the feature distributions across different input segments. This finding also validates the complementary design where input-level Z-score normalization handles global amplitude variations while internal IN enhances feature-level stability [27].

In stark contrast, replacing the target matching loss with a standard noise-prediction loss causes a catastrophic collapse: Sen drops from 99.64% to 37.10%, PPV from 99.67% to 38.68%, and F1 from 99.66% to 35.96%. This unequivocally demonstrates that direct regression of the clean FECG signal is essential for this task. Finally, multi-step inference (20 steps) achieves nearly identical accuracy to the single-step full model (for example, F1 99.55% versus 99.66%) but increases inference time from 6.05 ms/segment to 33.31 ms/segment, validating the efficiency of our single-step design.

3.6. Sensitivity Analysis of BVS Parameter

The bridge variance scheduling (BVS) introduces a parabolic noise schedule parameterized by σ_max, which controls the peak noise level at the midpoint t = 0.5. To investigate the robustness of MCA-FM to this hyperparameter, we evaluated the model on ADDB with σ_max ∈ {0.1, 0.3, 0.5, 0.7, 1.0} while keeping all other settings unchanged. The results are summarized in Table 6 and visualized in Figure 8.

The model achieves the best performance at the default σ_max = 0.5, with PCC = 0.944 ± 0.050, F1 = 99.66 ± 0.49%, and MAE = 0.206 ± 0.066. When σ_max deviates from this value, performance degrades moderately: PCC decreases by approximately 0.04 to 0.05, F1 drops by about 0.6 to 1.6 percentage points, and MAE increases by roughly 0.05 to 0.07. These results confirm that σ_max = 0.5 provides a balanced trade-off between diffusion strength and signal preservation, and that the model maintains reasonable performance within a moderate range (0.3 to 0.7).
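The parabolic schedule discussed in this subsection can be sketched as below. The closed form is inferred from two statements in the text, namely the derivative $4 (σ_{\max} - σ_{\min}) (1 - 2 t)$ used in the inference steps and the peak value σ_max at t = 0.5; the value σ_min = 0.01 is purely hypothetical, since the section does not report it.

```python
def sigma(t, sigma_max=0.5, sigma_min=0.01):
    """Parabolic noise level: sigma_min at t = 0 and t = 1, sigma_max at t = 0.5."""
    return sigma_min + 4.0 * (sigma_max - sigma_min) * t * (1.0 - t)


def sigma_dot(t, sigma_max=0.5, sigma_min=0.01):
    """Time derivative of sigma(t), matching the expression in the inference steps."""
    return 4.0 * (sigma_max - sigma_min) * (1.0 - 2.0 * t)


# Noise levels on a coarse grid for each sigma_max value in the sensitivity study.
for s_max in (0.1, 0.3, 0.5, 0.7, 1.0):
    levels = [round(sigma(k / 4, sigma_max=s_max), 4) for k in range(5)]
    print(s_max, levels)
```

Under this form, σ_max changes only the height of the parabola, so the endpoints of the path stay equally close to the data for every setting in the study.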

3.7. Effect of Normalization

To justify the choice of Z-score normalization, we compared it with two alternative preprocessing strategies on the ADDB dataset: min–max normalization (scaling to [0, 1]) and no normalization (only filtering). All other experimental settings (model architecture, training hyperparameters, and five-fold leave-one-subject-out cross-validation) were kept identical. The results are summarized in Table 7.

Z-score normalization yields the highest PCC (0.944 ± 0.050) and F1 (99.66 ± 0.49%), slightly outperforming min–max (PCC = 0.935 ± 0.014, F1 = 99.18 ± 0.35%) and no normalization (PCC = 0.936 ± 0.052, F1 = 99.15 ± 0.30%). These results indicate that Z-score provides a modest but consistent advantage in waveform reconstruction and R-peak detection, likely due to its ability to stabilize training by centering and scaling the input signals. As the primary objective of MCA-FM is to recover the morphological shape of the FECG waveform and to detect R-peaks, both of which are largely invariant to global amplitude scaling, the use of Z-score normalization does not affect the core task. Therefore, we adopt Z-score as the default preprocessing method.
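The two normalization strategies compared here can be sketched as follows. Whether the statistics are computed per segment or per recording, and whether the population or sample standard deviation is used, is not stated in this section; the per-sequence population form below is an assumption.

```python
import statistics


def z_score(x):
    """Center to zero mean and scale to unit (population) standard deviation."""
    mu = statistics.fmean(x)
    sd = statistics.pstdev(x)
    return [(v - mu) / sd for v in x]


def min_max(x):
    """Scale linearly so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(x), max(x)
    return [(v - lo) / (hi - lo) for v in x]
```

Both are affine maps of the input, so they leave the waveform shape, and hence the PCC against a reference, unchanged; they differ only in the offset and scale presented to the network.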

3.8. Cross-Dataset Generalization Analysis

To evaluate the generalization capability of MCA-FM across different datasets, we conducted a cross-dataset validation experiment. The model trained on ADDB was tested on BDDB, and the model trained on BDDB was tested on ADDB, using the same preprocessing pipeline and leave-one-out protocol. The results are summarized in Table 8.

When trained on ADDB and tested on BDDB, the model maintained an F1 score of 96.98% and PCC of 0.89, demonstrating that even with only five training subjects, the model generalizes reasonably well to a larger and more challenging dataset. When trained on BDDB and tested on ADDB, the model achieved excellent performance (PCC = 0.97, F1 = 99.53%), indicating that the richer training data from BDDB transfers effectively to the smaller ADDB dataset. These cross-dataset results confirm that MCA-FM learns transferable features rather than overfitting to dataset-specific characteristics.

3.9. Evaluation on Synthetic Data with Controlled Interferences

To further evaluate the model’s robustness under controlled interference conditions beyond the real-world clinical datasets, we tested MCA-FM on the FECGSYNDB. The results for five representative interference scenarios are summarized in Table 9.

Across the five interference scenarios, the model maintained reasonable performance, with PCC ranging from 0.81 to 0.92 and F1 from 85.64% to 95.13%. The variation in performance across cases reflects the differing severity of the simulated interferences. Case 1 (fetal movement) and Case 2 (MHR/FHR accelerations/decelerations) achieved the highest F1 scores (94.36% and 95.13%, respectively). Case 3 (uterine contractions) yielded the lowest PCC (0.81) but still maintained an F1 above 91%. These results on synthetic data with controlled interferences complement the evaluations on ADDB and BDDB and further demonstrate the model’s ability to handle diverse physiological artifacts.

3.10. Consistency of Fetal Physiological Parameters (HR and RR Intervals) Across Datasets

HR and RR intervals are the most direct and critical clinical physiological indicators for fetal cardiac monitoring, and their estimation accuracy and consistency across datasets are important benchmarks for evaluating the clinical applicability of FECG extraction models. Based on the FECG extraction and R-peak detection results on the ADDB and BDDB clinical datasets, we conducted correlation analysis and Bland–Altman agreement analysis on the predicted and ground-truth HR and RR interval values, with the visualization results shown in Figure 9 and Figure 10, respectively. Note that the coordinate axes of the HR and RR Bland–Altman plots differ because heart rate is measured in beats per minute while RR interval is measured in milliseconds; this reflects the typical magnitude of each metric rather than a difference in estimation quality.

Figure 9 presents the HR correlation and agreement analysis across the ADDB and BDDB datasets. Figure 9a and Figure 9c are scatter plots of predicted HR (HR_pred) versus ground-truth HR (HR_ECG) for the ADDB and BDDB datasets, respectively, demonstrating strong linear correlation between the predicted and true HR values (ADDB: r = 0.907, RMSE = 4.26 bpm; BDDB: r = 0.908, RMSE = 5.51 bpm; p < 0.001), with all sample points closely clustered around the identity line (y = x). Figure 9b,d are Bland–Altman plots illustrating the mean difference and 95% limits of agreement (LoA) between HR_pred and HR_ECG: the ADDB dataset has a mean difference of −0.045 bpm and LoA of [−8.405, 8.315] bpm, while the BDDB dataset has a mean difference of 0.011 bpm and LoA of [−10.797, 10.818] bpm. The HR difference values of both datasets are tightly distributed around the zero line, with no obvious systematic bias, indicating high agreement in HR estimation across different clinical datasets.

Figure 10 shows the RR interval estimation consistency across the ADDB and BDDB datasets via Bland–Altman plots. Figure 10a is the Bland–Altman plot for the ADDB dataset, where the mean difference between predicted and ground-truth RR intervals is −0.02 ms with 95% LoA of [−29.56, 29.52] ms; Figure 10b is the Bland–Altman plot for the BDDB dataset, where the mean difference is −0.04 ms with 95% LoA of [−34.76, 34.67] ms. The two datasets exhibit negligible overall systematic bias (mean difference ≈ −0.03 ms) and tight 95% LoA (≈±30–35 ms), with residual errors randomly distributed around the zero line without trending deviation. This confirms that the model has high temporal fidelity in recovering fetal heart rate variability and that the RR interval estimation results have good consistency across different clinical datasets.
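The agreement statistics reported for Figures 9 and 10 can be computed with a short sketch like the one below. It assumes the conventional Bland–Altman definition of the 95% LoA as the mean difference ± 1.96 standard deviations of the differences, which is consistent with the reported intervals but is not spelled out in the text; the function names are illustrative.

```python
import statistics


def bland_altman(pred, ref):
    """Return (mean difference, lower LoA, upper LoA) for paired measurements."""
    diffs = [p - r for p, r in zip(pred, ref)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def rr_to_hr(rr_ms):
    """Convert an RR interval in milliseconds to heart rate in beats per minute."""
    return 60000.0 / rr_ms
```

The conversion also explains the different axis scales: an RR interval of 400 ms corresponds to 150 bpm, so errors of a few tens of milliseconds in RR translate into errors of several beats per minute in HR.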

3.11. Time Complexity, Resource Footprint, and Real-Time Performance Analysis

Beyond reconstruction and detection accuracy, the practical deployment of a fetal ECG extraction model in real-time monitoring scenarios also depends on its computational efficiency and resource requirements. This section evaluates the time complexity, inference efficiency, memory footprint, and energy consumption of the MCA-FM model on both GPU and CPU, with the quantitative results summarized in Table 10.

Table 10 presents the total inference time, mean and median latency per segment, throughput, real-time factor (RTF), time complexity, peak memory usage, and energy consumption per segment for both GPU and CPU inference on the ADDB and BDDB datasets. The signal segment length is uniformly set to 1000 samples (5 s at a sampling rate of 200 Hz). RTF is defined as the ratio of signal duration to inference time; RTF > 1 indicates that the model’s inference speed exceeds real-time requirements.

On GPU, the model achieves a mean inference latency of approximately 4.5 ms per 5 s segment with an RTF exceeding 1100, far surpassing real-time requirements. The peak GPU memory usage is 41.3 MB, and the energy consumption is 220–519 mJ per segment. On CPU, the mean latency is approximately 61–63 ms per segment, corresponding to an RTF of approximately 80, which still exceeds real-time requirements by a substantial margin. The peak CPU memory usage is 666.5–674.6 MB, and the energy consumption is 548–1342 mJ per segment. The model has a linear time complexity of O(N·L), and the mean inference time per segment on both platforms is far lower than the actual segment duration (5 s). These results demonstrate that MCA-FM offers excellent computational efficiency and real-time performance. The low GPU memory footprint and moderate energy consumption further indicate that the model is suitable for both high-throughput clinical workstation deployment and resource-constrained edge computing scenarios, providing a solid technical foundation for practical clinical application.
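The real-time factor defined above follows directly from the segment length, sampling rate, and latency. The sketch below uses the rounded latencies quoted in this paragraph (about 4.5 ms on GPU and about 62 ms on CPU), so its outputs are approximations rather than the exact Table 10 entries; the function name is illustrative.

```python
def real_time_factor(n_samples, fs_hz, latency_s):
    """Ratio of signal duration to inference time; values above 1 are faster than real time."""
    return (n_samples / fs_hz) / latency_s


gpu_rtf = real_time_factor(1000, 200, 4.5e-3)  # 5 s segment, about 4.5 ms latency
cpu_rtf = real_time_factor(1000, 200, 62e-3)   # 5 s segment, about 62 ms latency
print(f"GPU RTF ~ {gpu_rtf:.0f}, CPU RTF ~ {cpu_rtf:.0f}")
```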

4. Discussion

This study proposes MCA-FM, a model built on two core improvements: minimal channel attention and flow matching. The model achieves efficient non-invasive FECG extraction with strong waveform reconstruction, R-peak detection, and generalization. In the following, we compare it with mainstream methods, validate our improvements, and discuss the model’s limitations and future research directions.

To ensure a fair comparison, all baseline methods were reimplemented and retrained under the same preprocessing pipeline, segmentation scheme, sampling rate, and leave-one-out cross-validation protocol as MCA-FM. The comparison includes representative methods from multiple categories: ICA-based (SA-KICA [31]), diffusion-based (DIFF-FECG [20]), GAN-based (1D-CycleGAN [19]), CNN–transformer [17], and time-frequency domain approaches [26]. The complete results are summarized in Table 11.

Kernel-based blind source separation approaches, such as SA-KICA [31], extend traditional ICA by incorporating spectral attention to enhance separation. However, their underlying linear mixing assumption limits performance under nonlinear interference. On ADDB, SA-KICA achieves PCC = 0.73 ± 0.064 and MAE = 0.52 ± 0.071, substantially lower than MCA-FM’s PCC = 0.94 ± 0.050 and MAE = 0.21 ± 0.066. On BDDB, the extracted FECG amplitude is also severely distorted. MCA-FM abandons the linear assumption, learns a single-step deterministic mapping via CFM, and directly models the nonlinear AECG-FECG relationship, overcoming nonlinear interference.

As a conditional diffusion-based method, DIFF-FECG [20] handles non-Gaussian noise through multi-step denoising. However, its iterative inference process leads to a higher computational cost. Under our standardized evaluation protocol, it achieves PCC = 0.74 ± 0.142 and F1 = 92.20 ± 6.70% on ADDB, and PCC = 0.69 ± 0.162 and F1 = 87.46 ± 12.42% on BDDB, both considerably lower than MCA-FM. In contrast, MCA-FM’s single-step flow matching inference not only achieves higher accuracy but also offers substantially faster inference.

Generative adversarial networks (GANs) [19] risk mode collapse under low SNR, losing weak features (e.g., P-waves, T-waves). On ADDB, 1D-CycleGAN achieves F1 = 98.91 ± 1.48% for R-peak detection, while MCA-FM reaches 99.66 ± 0.49%. We attribute the gain to minimal channel attention, which dynamically focuses on the fetal QRS frequency band (10–15 Hz), thereby avoiding feature loss from adversarial training. On BDDB, the gap is of similar size (0.80 versus 0.75 percentage points), with 1D-CycleGAN dropping to F1 = 96.57 ± 7.89% compared with MCA-FM’s 97.37 ± 6.43%; the smaller standard deviation of MCA-FM points to greater stability under strong physiological interferences.

The CNN–transformer hybrid model [17] combines the local feature extraction capability of CNNs with the global context modeling of transformers. However, the standard self-attention mechanism brings considerable computational overhead when processing long input sequences, which limits its efficiency for real-time monitoring scenarios. Under our standardized evaluation, it achieves PCC = 0.83 ± 0.061 and F1 = 98.17 ± 0.66% on ADDB, and PCC = 0.85 ± 0.109 and F1 = 96.80 ± 5.96% on BDDB, trailing MCA-FM by a clear margin on both datasets. This suggests that, while hybrid architectures are promising, the lightweight channel attention in MCA-FM provides a more effective and efficient alternative for fetal ECG extraction.

Time-frequency domain methods [26] convert AECG to 2D representations, introducing redundancy and higher complexity. On ADDB, their MAE (0.27 ± 0.065) is 29% higher than MCA-FM’s, and on BDDB their F1 drops to 97.27 ± 6.46% compared with MCA-FM’s 97.37 ± 6.43%. The 2D transformation also increases the number of parameters and processing steps without a corresponding gain in extraction quality. MCA-FM directly optimizes 1D signal mapping without domain transformation, achieving a better accuracy–efficiency balance.

Specifically, the minimal channel attention module in MCA-FM follows ECA-Net’s dimensionality-reduction-free cross-channel interaction [18]. This design avoids unnecessary computational burden while significantly enhancing feature selection. After adding this module, PCC increased to 0.94 ± 0.050 and SPC reached 0.94 ± 0.044, confirming its effectiveness in improving the frequency-domain consistency of fetal QRS complexes. The enhanced residual connection draws on the periodic-aware residual design of PA²Net [16] for FECG detection. It retains the identity mapping inside the HNF block and adds a new linear fusion of “original feature + weighted residual” ($F_{out} = F + α (\hat{F} - F)$, $α = 0.05$) after the attention module. The resulting R-peak detection F1 reaches 97.37 ± 6.43% on BDDB, demonstrating its ability to stabilize gradient flow and avoid loss of effective features under strong interference.

The residual false positives and false negatives observed in the confusion matrices are predominantly concentrated in low-quality segments where the fetal ECG is nearly obscured by transient artifacts. Integrating a signal quality assessment module to identify and flag such segments before processing would be a practical direction for reducing these errors. The BDDB dataset is particularly informative for assessing clinical robustness, as its greater diversity of physiological interferences provides a more rigorous test of generalization. The fact that MCA-FM maintains high PCC and F1 on BDDB underscores its practical potential in challenging clinical scenarios.
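To make the two components concrete, the sketch below implements a channel attention step (global average pooling followed by a single linear layer) and the fusion $F_{out} = F + α (\hat{F} - F)$ with plain Python lists. The sigmoid gate, the full C × C weight matrix, and the function names are assumptions made for illustration; the trained model would use a deep learning framework and learned weights.

```python
import math


def minimal_channel_attention(feat, weight, bias):
    """Reweight channels of `feat` (C lists of length T) with a pooled linear gate."""
    pooled = [sum(ch) / len(ch) for ch in feat]  # global average pooling per channel
    logits = [
        sum(w * p for w, p in zip(row, pooled)) + b  # single linear layer across channels
        for row, b in zip(weight, bias)
    ]
    gates = [1.0 / (1.0 + math.exp(-z)) for z in logits]  # sigmoid gate (assumed)
    return [[g * v for v in ch] for g, ch in zip(gates, feat)]


def enhanced_residual(feat, attended, alpha=0.05):
    """Linear fusion F_out = F + alpha * (F_hat - F) applied element-wise."""
    return [
        [f + alpha * (a - f) for f, a in zip(f_ch, a_ch)]
        for f_ch, a_ch in zip(feat, attended)
    ]
```

With α = 0.05 the output equals 0.95 F + 0.05 F̂, so the attended features act as a small correction on top of the original features rather than replacing them.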

As shown in the quantitative comparison results in Table 11, although MCA-FM outperforms existing methods on most metrics, the following limitations still exist. The experiments are based on two clinical datasets of healthy full-term pregnancies, ADDB (5 subjects) and BDDB (12 subjects), supplemented by the FECGSYNDB synthetic dataset for controlled interference evaluation. However, real pathological scenarios such as preterm birth (<37 weeks) or fetal arrhythmias (e.g., supraventricular tachycardia) are not included. In such clinical scenarios, the FECG amplitude is lower (about 0.1–0.3 mV, 50% lower than that of full-term fetuses) and waveform distortion is more severe, so model performance might decrease by 10–15%. The assumption of short-term noise stationarity within each 5 s window, while common in fetal ECG processing, has not been formally verified. These limitations indicate directions for the model’s further optimization, which will be addressed in future research.

In the future, we hope to collaborate with obstetric departments to collect AECG data from 24 to 42 weeks of gestation (including 30 preterm cases and 20 arrhythmia cases), and introduce domain adaptation training (e.g., domain-adversarial neural networks) to reduce the distribution difference between healthy and pathological data. We aim to further enhance the model’s clinical value and promote the popularization of non-invasive fetal monitoring technology in primary healthcare institutions.

5. Conclusions

We propose a channel attention and flow matching algorithm for fetal electrocardiogram (ECG) extraction, providing a reliable solution for non-invasive fetal heart health monitoring. First, starting from the nonlinear mixing relationship between AECG and FECG, we construct a single-step deterministic mapping based on flow matching. Next, we combine target matching loss and bridge variance scheduling to improve training stability. Finally, we employ a minimal channel attention mechanism to enhance the selection of key fetal features and combine enhanced residual connections to optimize gradient flow and feature preservation. We validate the model’s effectiveness on two real-world clinical datasets (ADDB and BDDB). On the ADDB dataset, the model achieves a waveform reconstruction PCC of 0.94 ± 0.050 and a fetal R-peak detection F1 score of 99.66 ± 0.49%. On the BDDB dataset, the model maintains stable performance (PCC = 0.91 ± 0.122, F1 = 97.37 ± 6.43%), even in the presence of severe physiological interferences such as uterine contractions. These results demonstrate that our method achieves high-precision waveform reconstruction and robust R-peak detection across diverse subjects, confirming its strong generalization capability. Future research will focus on expanding data to include pathological scenarios, optimizing strong interference suppression, and integrating signal quality assessment to reduce the residual false positive and false negative detections. These efforts aim to further enhance the clinical applicability of the model and promote the application of non-invasive fetal monitoring technology in primary healthcare institutions.

Author Contributions

Conceptualization, Q.D., X.H. and Y.Z.; methodology, Q.D. and Y.Z.; software, X.H. and Z.X.; validation, Q.D. and X.H.; formal analysis, Y.Z. and C.L.; investigation, Q.D. and Z.X.; resources, Z.X. and C.L.; data curation, Q.D. and X.H.; writing—original draft preparation, Q.D.; writing—review and editing, Q.D., X.H., Z.X. and C.L.; visualization, Q.D.; supervision, Y.Z., Z.X. and C.L.; project administration, Y.Z.; funding acquisition, Y.Z. and C.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. 62171123), the National Key Research and Development Program of China (Grant No. 2023YFC3603600), and the Research Start-up Funding for High-level Talent of Jiangsu University of Science and Technology (Grant No. 1132932502).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets used in this study are all publicly available from the PhysioNet and Springer Nature Figshare repositories. The Abdominal and Direct Fetal ECG Database (ADDB) is available at https://physionet.org/content/adfecgdb/1.0.0/ (accessed on 1 November 2025). The Fetal electrocardiograms, direct and abdominal with reference heart beats annotations (BDDB) is available at https://springernature.figshare.com/articles/dataset/Fetal_electrocardiograms_direct_and_abdominal_with_reference_heart_beats_annotations/10311029 (accessed on 1 November 2025). The Fetal ECG Synthetic Database (FECGSYNDB) is available at https://physionet.org/content/fecgsyndb/1.0.0/ (accessed on 1 November 2025). All data were used in accordance with the original licenses provided by the sources.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Kong, Z.; Ping, W.; Huang, J.; Zhao, K.; Catanzaro, B. Diffwave: A Versatile Diffusion Model for Audio Synthesis. arXiv 2020, arXiv:2009.09761.
2. Vullings, R.; Peters, C.H.; Sluijter, R.J.; Mischi, M.; Oei, S.G.; Bergmans, J.W. Dynamic Segmentation and Linear Prediction for Maternal ECG Removal in Antenatal Abdominal Recordings. Physiol. Meas. 2009, 30, 291–306.
3. Zhang, Y.; Gu, A.; Xiao, Z.; Xing, Y.; Yang, C.; Li, J.; Liu, C. Wearable Fetal ECG Monitoring System from Abdominal Electrocardiography Recording. Biosensors 2022, 12, 475.
4. Widrow, B.; Glover, J.R.; McCool, J.M.; Kaunitz, J.; Williams, C.S.; Hearn, R.H.; Goodlin, R.C. Adaptive Noise Cancelling: Principles and Applications. Proc. IEEE 1975, 63, 1692–1716.
5. Zarzoso, V.; Nandi, A.K. Noninvasive Fetal Electrocardiogram Extraction: Blind Separation versus Adaptive Noise Cancellation. IEEE Trans. Biomed. Eng. 2002, 49, 12–18.
6. Andreotti, F.; Behar, J.; Zaunseder, S.; Oster, J.; Clifford, G.D. An Open-Source Framework for Stress-Testing Non-Invasive Foetal ECG Extraction Algorithms. Physiol. Meas. 2016, 37, 627–647.
7. Tang, M.; Wu, Y. A Blind Extraction Method of Fetal Electrocardiogram Signal Based on MNCMD-NLBCA. EURASIP J. Adv. Signal Process. 2024, 2024, 102.
8. Zhang, N.; Zhang, J.; Li, H.; Mumini, O.O.; Samuel, O.W.; Ivanov, K.; Wang, L. A Novel Technique for Fetal ECG Extraction Using Single-Channel Abdominal Recording. Sensors 2017, 17, 457.
9. Cerutti, S.; Baselli, G.; Civardi, S.; Ferrazzi, E.; Marconi, A.M.; Pagani, M.; Pardi, G. Variability Analysis of Fetal Heart Rate Signals as Obtained from Abdominal Electrocardiographic Recordings. In Proceedings of the 8th International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS), Boston, MA, USA, 13 November 1986; pp. 623–624.
10. Jezewski, J.; Kupka, T.; Horoba, K.; Czabanski, R.; Wrobel, J. A Problem of Maternal and Fetal QRS Complexes Overlapping in Fetal Heart Rate Estimation. In Proceedings of the 2010 International Conference on Information Technology and Applications in Biomedicine (ITAB); ACM: Corfu, Greece, 2010.
11. Liu, H.; Chen, D.; Sun, G. Detection of Fetal ECG R Wave from Single-Lead Abdominal ECG Using a Combination of RR Time-Series Smoothing and Template-Matching Approach. IEEE Access 2019, 7, 66633–66643.
12. Kanjilal, P.P.; Palit, S.; Saha, G. Fetal ECG Extraction from Single-Channel Maternal ECG Using Singular Value Decomposition. IEEE Trans. Biomed. Eng. 1997, 44, 51–59.
13. Mohebbian, M.R.; Vedaei, S.S.; Wahid, K.A.; Dinh, A.; Marateb, H.R.; Tavakolian, K. Fetal ECG Extraction from Maternal ECG Using Attention-Based CycleGAN. IEEE J. Biomed. Health Inform. 2022, 26, 515–526.
14. Huang, Z.Z.; Zhang, W.T.; Li, Y.; Cui, J.; Zhang, Y.R. TCGAN: Temporal Convolutional Generative Adversarial Network for Fetal ECG Extraction Using Single-Channel Abdominal ECG. IEEE J. Biomed. Health Inform. 2024, 28, 3192–3203.
15. Basak, P.; Sakib, A.N.; Chowdhury, M.E.; Al-Emadi, N.; Yalcin, H.C.; Pedersen, S.; Al-Maadeed, S. A Novel Deep Learning Technique for Morphology Preserved Fetal ECG Extraction from Mother ECG Using 1D-CycleGAN. Expert Syst. Appl. 2024, 235, 121196.
16. Wang, X.; He, Z.; Lin, Z.; Han, Y.; Liu, T.; Lu, J.; Xie, S. PA²Net: Period-Aware Attention Network for Robust Fetal ECG Detection. IEEE Trans. Instrum. Meas. 2022, 71, 1–12.
17. Chen, Z.; Zheng, K.; Shen, J.; Lin, Y.; Feng, Y.; Xu, J. Sample Point Classification of Abdominal ECG through CNN-Transformer Model Enables Efficient Fetal Heart Rate Detection. IEEE Trans. Instrum. Meas. 2023, 73, 1–12.
18. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14 June 2020; pp. 11534–11542.
19. Wang, X.; He, Z.; Lin, Z.; Han, Y.; Su, W.; Xie, S. Correlation-Aware Attention CycleGAN for Accurate Fetal ECG Extraction. IEEE Trans. Instrum. Meas. 2023, 72, 1–13.
20. Chen, Z.; Lin, Y.; Luo, Q.; Xu, J. DIFF-FECG: A Conditional Diffusion-Based Method for Fetal ECG Extraction from Abdominal ECG. IEEE Trans. Artif. Intell. 2025, 7, 534–546.
21. Lipman, Y.; Chen, R.T.; Ben-Hamu, H.; Nickel, M.; Le, M. Flow Matching for Generative Modeling. arXiv 2022, arXiv:2210.02747.
22. Goldberger, A.L.; Amaral, L.A.; Glass, L.; Hausdorff, J.M.; Ivanov, P.C.; Mark, R.G.; Stanley, H.E. PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals. Circulation 2000, 101, e215–e220.
23. Jezewski, J.; Matonia, A.; Kupka, T.; Roj, D.; Czabanski, R. Determination of Fetal Heart Rate from Abdominal Signals: Evaluation of Beat-to-Beat Accuracy in Relation to the Direct Fetal Electrocardiogram. Biomed. Tech. 2012, 57, 383–394.
24. Matonia, A.; Jezewski, J.; Kupka, T.; Jezewski, M.; Horoba, K.; Wrobel, J.; Kahankowa, R. Fetal Electrocardiograms, Direct and Abdominal with Reference Heartbeat Annotations. Sci. Data 2020, 7, 200.
25. Almadani, M.; Hadjileontiadis, L.; Khandoker, A. One-Dimensional W-NETR for Non-Invasive Single Channel Fetal ECG Extraction. IEEE J. Biomed. Health Inform. 2023, 27, 3198–3209.
26. Lin, Y.; Liu, H.; Ruan, L.; Chen, Z.; Xu, J. Advancing Non-Invasive Fetal Health Monitoring: A Time–Frequency Approach to Extracting Fetal Electrocardiogram Signals. Biomed. Signal Process. Control 2024, 95, 106477.
27. Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the 32nd International Conference on Machine Learning (ICML), Lille, France, 6 July 2015; Volume 37, pp. 448–456.
28. Qu, R.; Song, T.; Wei, G.; Wei, L.; Cao, W.; Song, J. Integrating Contrastive Learning and Cycle Generative Adversarial Networks for Non-Invasive Fetal ECG Extraction. Pediatr. Cardiol. 2024, 46, 2078–2088.
29. Rahman, A.; Mahmud, S.; Chowdhury, M.E.H.; Yalcin, H.C.; Khandakar, A.; Mutlu, O.; Mahbub, Z.B.; Kamal, R.Y.; Pedersen, S. Fetal ECG Extraction from Maternal ECG Using Deeply Supervised LinkNet++ Model. Eng. Appl. Artif. Intell. 2023, 123, 106414.
30. Pan, J.; Tompkins, W.J. A Real-Time QRS Detection Algorithm. IEEE Trans. Biomed. Eng. 1985, BME-32, 230–236.
31. Qiao, L.; Hu, S.; Xiao, B.; Bi, X.; Li, W.; Gao, X. A Dual Self-Calibrating Framework for Noninvasive Fetal ECG R-Peak Detection. IEEE Internet Things J. 2023, 10, 16579–16593.

Figure 1. Framework of the non-invasive abdominal patch-based real-time FECG monitoring and analysis system. (a) Abdominal patch with textile-integrated hydrogel electrodes and wireless chip for multi-channel AECG acquisition. (b) Mobile terminal for wireless data reception and upload via 5G network. (c) Cloud/server-side AECG reconstruction and MECG separation to extract clean FECG. (d) Clinical dashboard displaying real-time FECG waveform, heart rate, and rhythm classification.

Figure 2. Architecture of the proposed MCA-FM for fetal ECG extraction, consisting of a flow matching network, minimal channel attention, and enhanced residual connections. (a) Overall structure; (b) minimal channel attention; (c) flow matching network. Abbreviations: PE, positional encoding; FC, fully connected; IN, instance normalization; ID, identity mapping.

Figure 3. Confusion matrix of fetal R-peak detection by the proposed MCA-FM model on the ADDB and BDDB test sets. The horizontal axis is the predicted label and the vertical axis is the true label.

Figure 4. Signal analysis for subject B2_Labour_04 (BDDB). (a) Input AECG; (b) extracted MECG; (c) extracted FECG by MCA-FM; (d) ground-truth FECG. Vertical axis: normalized amplitude; horizontal axis: time (s).

Figure 5. Continuous 30 s FECG extraction comparison for BDDB subject B2_Labour_04. The signal is divided into three 10 s panels (0–10 s, 10–20 s, and 20–30 s). Blue: ground-truth invasive FECG; red: FECG extracted by MCA-FM.

Figure 6. R-peak localization for subject B2_Labour_11 (BDDB). (a) Ground-truth FECG and expert-annotated R-peaks (red dots); (b) extracted FECG and predicted R-peaks (purple dots). Axes: amplitude (mV) vs. time (s).

Figure 7. Boxplot comparison of MCA-FM performance between ADDB (n = 5) and BDDB (n = 12). Left: PCC; right: F1 score. Each box shows the median (red line), interquartile range (box), and whiskers. Group means ± standard deviations are displayed below each x-axis label. The Mann–Whitney U test p-values are shown at the bottom of each subplot (inside the blue information box).

Figure 8. Sensitivity of PCC to BVS parameter σ_max on the ADDB dataset. The model achieves optimal performance at σ_max = 0.5, with performance gradually degrading as σ_max deviates from this value.

Figure 9. HR correlation and Bland–Altman agreement analysis across ADDB and BDDB datasets. (a,c) Predicted vs. ground-truth HR scatter plots (gray dashed: y = x). (b,d) Bland–Altman plots with mean difference (black dashed) and 95% limits of agreement (red solid). X-axis (a,c): HR_ECG (bpm); Y-axis (a,c): HR_pred (bpm). X-axis (b,d): mean HR ((HR_pred + HR_ECG)/2, bpm); Y-axis (b,d): difference HR_pred − HR_ECG (bpm). All heart rate values are in beats per minute (bpm).

Figure 10. Bland–Altman plots for RR interval estimation on ADDB (a) and BDDB (b), showing negligible bias and tight LoA. Axes: mean RR (ms) vs. prediction error (ms). Black dashed: mean difference; red solid: 95% LoA.
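
The agreement statistics named in the captions of Figures 9 and 10 (mean difference and 95% limits of agreement) can be sketched as follows. This is a minimal illustration that assumes the conventional Bland–Altman definition, bias ± 1.96 × the sample standard deviation of the paired differences; the exact formula used by the authors is not stated in this section. The function name `bland_altman` and the RR values are hypothetical and are not data from the paper.

```python
from statistics import mean, stdev


def bland_altman(pred, ref):
    """Return (bias, lower LoA, upper LoA) for paired measurements."""
    diffs = [p - r for p, r in zip(pred, ref)]
    bias = mean(diffs)
    # Assumed convention: 95% limits of agreement = bias +/- 1.96 * sample SD.
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


# Hypothetical RR intervals in ms, for illustration only (not from the paper).
rr_pred = [428.0, 431.0, 425.0, 440.0, 436.0]
rr_ref = [427.0, 433.0, 424.0, 441.0, 435.0]

bias, loa_low, loa_high = bland_altman(rr_pred, rr_ref)
print(f"bias = {bias:.2f} ms, 95% LoA = [{loa_low:.2f}, {loa_high:.2f}] ms")
```

A bias near zero with narrow limits corresponds to the "negligible bias and tight LoA" wording of the Figure 10 caption.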

Table 1. Waveform reconstruction performance of the proposed MCA-FM on the ADDB dataset.

Subject	MAE	SPC	PCC
r01	0.16	0.97	0.97
r04	0.19	0.94	0.96
r07	0.19	0.96	0.96
r08	0.16	0.97	0.97
r10	0.32	0.87	0.86
Mean ± standard deviation	0.21 ± 0.066	0.94 ± 0.044	0.94 ± 0.050

Table 2. Waveform reconstruction performance of the proposed MCA-FM on the BDDB dataset.

Subject	MAE	SPC	PCC
B2_Labour_01	0.23	0.96	0.95
B2_Labour_02	0.26	0.94	0.93
B2_Labour_03	0.66	0.60	0.53
B2_Labour_04	0.29	0.94	0.92
B2_Labour_05	0.24	0.95	0.95
B2_Labour_06	0.23	0.97	0.95
B2_Labour_07	0.23	0.93	0.94
B2_Labour_08	0.26	0.95	0.95
B2_Labour_09	0.27	0.94	0.93
B2_Labour_10	0.21	0.95	0.96
B2_Labour_11	0.19	0.96	0.97
B2_Labour_12	0.21	0.96	0.95
Mean ± standard deviation	0.27 ± 0.125	0.92 ± 0.103	0.91 ± 0.122

Table 3. R-peak detection classification results for each subject in the ADDB dataset.

Subject	TP	FP	FN	TN	Sen (%)	PPV (%)	F1 (%)
r01	644	0	0	2213	100	100	100
r04	629	2	3	2223	99.50	99.70	99.60
r07	627	0	0	2230	100	100	100
r08	651	2	0	2204	100	99.70	99.80
r10	592	6	8	2251	98.70	99.00	98.80
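
The Sen, PPV, and F1 columns of Table 3 can be recomputed from the TP, FP, and FN counts. The sketch below assumes the standard definitions Sen = TP/(TP + FN), PPV = TP/(TP + FP), and F1 = 2TP/(2TP + FP + FN); the function name `r_peak_metrics` is illustrative and does not come from the authors' code.

```python
def r_peak_metrics(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (Sen, PPV, F1) in percent from R-peak detection counts.

    Assumes tp + fn > 0 and tp + fp > 0 (at least one true and one
    detected peak), so no division by zero occurs.
    """
    sen = 100.0 * tp / (tp + fn)
    ppv = 100.0 * tp / (tp + fp)
    f1 = 100.0 * 2 * tp / (2 * tp + fp + fn)
    return sen, ppv, f1


# Counts for subject r04 as listed in Table 3.
sen, ppv, f1 = r_peak_metrics(tp=629, fp=2, fn=3)
print(f"Sen = {sen:.1f}%, PPV = {ppv:.1f}%, F1 = {f1:.1f}%")
# prints "Sen = 99.5%, PPV = 99.7%, F1 = 99.6%"
```

The TN column does not enter any of the three metrics, which is why they remain informative even though non-peak samples vastly outnumber R-peaks.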

Table 4. R-peak detection classification results for each subject in the BDDB dataset.

Subject	TP	FP	FN	TN	Sen (%)	PPV (%)	F1 (%)
B2_Labour_01	644	0	0	2213	100	100	100
B2_Labour_02	629	31	6	2191	99.10	95.30	97.10
B2_Labour_03	538	143	176	2000	75.40	79.00	77.10
B2_Labour_04	676	10	10	2161	98.50	98.50	98.50
B2_Labour_05	660	0	1	2196	99.80	100	99.90
B2_Labour_06	683	1	1	2172	99.90	99.90	99.90
B2_Labour_07	627	7	8	2215	98.70	98.90	98.80
B2_Labour_08	645	0	0	2212	100	100	100
B2_Labour_09	662	11	12	2172	98.20	98.40	98.30
B2_Labour_10	626	2	1	2228	99.80	99.70	99.80
B2_Labour_11	648	4	2	2203	99.70	99.40	99.50
B2_Labour_12	652	3	5	2197	99.20	99.50	99.40

Table 5. Ablation study on ADDB dataset (mean ± standard deviation over 5 folds).

Configuration	Sen (%)	PPV (%)	F1 (%)
MCA-FM	99.64 ± 0.58	99.67 ± 0.41	99.66 ± 0.49
w/o MCA	99.54 ± 0.60	99.61 ± 0.32	99.58 ± 0.45
w/o enhanced residual	99.50 ± 0.58	99.54 ± 0.46	99.46 ± 0.59
w/o BVS	99.55 ± 0.59	99.56 ± 0.40	99.46 ± 0.59
w/o instance norm	99.58 ± 0.45	99.58 ± 0.45	99.58 ± 0.45
w/o TM loss	37.10 ± 10.99	38.68 ± 9.16	35.96 ± 9.18
inference (20 steps)	99.53 ± 0.58	99.58 ± 0.41	99.55 ± 0.46

Table 6. Sensitivity analysis of BVS parameter σ_max on the ADDB dataset (mean ± standard deviation over 5 folds).

σ_max	MAE	PCC	F1 (%)
0.1	0.268 ± 0.090	0.887 ± 0.084	97.99 ± 2.60
0.3	0.271 ± 0.076	0.893 ± 0.071	99.00 ± 0.92
0.5 (ours)	0.206 ± 0.066	0.944 ± 0.050	99.66 ± 0.49
0.7	0.256 ± 0.084	0.906 ± 0.069	99.04 ± 0.75
1.0	0.256 ± 0.089	0.903 ± 0.071	98.97 ± 0.59

Table 7. Performance comparison of different normalization strategies on ADDB (mean ± standard deviation over 5 folds).

Normalization	MAE	PCC	F1 (%)
Z-score (ours)	0.206 ± 0.066	0.944 ± 0.050	99.66 ± 0.49
Min–max	0.220 ± 0.018	0.935 ± 0.014	99.18 ± 0.35
None	0.211 ± 0.085	0.936 ± 0.052	99.15 ± 0.30

Table 8. Cross-dataset generalization performance of MCA-FM.

Source	Target	Subjects	MAE	SPC	PCC	Sen (%)	PPV (%)	F1 (%)
ADDB	BDDB	12	0.29 ± 0.130	0.90 ± 0.106	0.89 ± 0.132	96.96 ± 7.45	97.00 ± 6.59	96.98 ± 7.00
BDDB	ADDB	5	0.17 ± 0.007	0.97 ± 0.006	0.97 ± 0.003	99.84 ± 0.31	99.23 ± 1.17	99.53 ± 0.75

Table 9. Performance of MCA-FM across different physiological interference scenarios in FECGSYNDB. Note: Case 0: Noise only; Case 1: Fetal movement; Case 2: MHR/FHR accelerations/decelerations; Case 3: Uterine contractions; Case 4: Ectopic beats.

Case	PCC	Sen (%)	PPV (%)	F1 (%)
0	0.89	86.95	84.37	85.64
1	0.89	94.99	93.73	94.36
2	0.92	95.85	94.42	95.13
3	0.81	92.16	90.68	91.41
4	0.88	88.03	86.11	87.06

Table 10. Inference performance and resource footprint of MCA-FM on GPU and CPU. Note: Inf. = inference time; seg = segment; RTF = real-time factor; Mem. = memory; Energy = energy consumption per segment. Time complexity is expressed as O(N·L), where N is the total number of segments and L = 1000.

Platform	Dataset	Total Inf. (s)	Mean (ms/seg)	Median (ms/seg)	Throughput (seg/s)	RTF	Time Complexity	Peak Mem. (MB)	Energy (mJ/seg)
GPU	ADDB	2.753	4.54 ± 0.68	4.52	216.2	1101.9	O(N·L)	41.3	220
GPU	BDDB	6.486	4.45 ± 0.34	4.50	220.2	1123.4	O(N·L)	41.3	519
CPU	ADDB	36.536	61.33 ± 2.16	61.10	16.3	81.5	O(N·L)	666.5	548
CPU	BDDB	89.444	62.56 ± 1.97	62.37	16.0	79.9	O(N·L)	674.6	1342
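
One way to read the RTF column is as the ratio of signal duration to compute time per segment. The sketch below assumes that definition and, importantly, assumes that each 1000-sample segment spans 5 s; neither the segment duration nor the sampling rate is stated in this section, so `SEGMENT_SECONDS` is a hypothetical value chosen because it approximately reproduces the reported RTF figures from the rounded mean latencies.

```python
def real_time_factor(segment_seconds: float, inference_ms: float) -> float:
    """Seconds of signal processed per second of compute time."""
    return segment_seconds / (inference_ms / 1000.0)


# Assumption (not stated in Table 10): one segment covers 5 s of signal.
SEGMENT_SECONDS = 5.0

# Mean per-segment latencies (ms) for ADDB as listed in Table 10.
for platform, mean_ms in [("GPU", 4.54), ("CPU", 61.33)]:
    rtf = real_time_factor(SEGMENT_SECONDS, mean_ms)
    print(f"{platform}: RTF = {rtf:.1f}")
```

Under this reading, any RTF above 1 means a segment is processed faster than it is acquired, which is the condition for real-time operation.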

Table 11. Comparison of waveform reconstruction quality and R-peak detection performance metrics between state-of-the-art methods and the proposed MCA-FM model on the ADDB and BDDB datasets (mean ± standard deviation). Note: ↓ indicates the lower the value, the better the performance; ↑ indicates the higher the value, the better the performance.

Dataset	Method	MAE ↓	SPC ↑	PCC ↑	Sen(%) ↑	PPV(%) ↑	F1(%) ↑
ADDB	SA-KICA [31]	0.52 ± 0.071	0.73 ± 0.094	0.73 ± 0.064	97.16 ± 1.62	96.27 ± 1.95	96.71 ± 1.61
	DIFF-FECG [20]	0.46 ± 0.137	0.73 ± 0.110	0.74 ± 0.142	91.95 ± 7.14	92.46 ± 6.26	92.20 ± 6.70
	1D-CycleGAN [19]	0.28 ± 0.066	0.88 ± 0.060	0.90 ± 0.057	98.98 ± 1.59	98.83 ± 1.38	98.91 ± 1.48
	CNN–transformer [17]	0.35 ± 0.076	0.83 ± 0.030	0.83 ± 0.061	98.53 ± 0.74	97.82 ± 0.63	98.17 ± 0.66
	Time-frequency [26]	0.27 ± 0.065	0.88 ± 0.036	0.90 ± 0.052	99.55 ± 0.48	99.20 ± 0.40	99.37 ± 0.44
	MCA-FM	0.21 ± 0.066	0.94 ± 0.044	0.94 ± 0.050	99.64 ± 0.58	99.67 ± 0.41	99.66 ± 0.49
BDDB	SA-KICA [31]	0.56 ± 0.108	0.70 ± 0.160	0.68 ± 0.132	95.19 ± 9.51	95.15 ± 8.10	95.16 ± 8.81
	DIFF-FECG [20]	0.52 ± 0.131	0.69 ± 0.151	0.69 ± 0.162	86.69 ± 13.44	88.30 ± 11.36	87.46 ± 12.42
	1D-CycleGAN [19]	0.29 ± 0.110	0.89 ± 0.105	0.89 ± 0.117	96.61 ± 8.61	96.56 ± 7.15	96.57 ± 7.89
	CNN–transformer [17]	0.33 ± 0.123	0.84 ± 0.107	0.85 ± 0.109	97.14 ± 6.40	96.48 ± 5.56	96.80 ± 5.96
	Time-frequency [26]	0.27 ± 0.120	0.88 ± 0.105	0.89 ± 0.110	97.59 ± 6.70	96.97 ± 6.25	97.27 ± 6.46
	MCA-FM	0.27 ± 0.125	0.92 ± 0.103	0.91 ± 0.122	97.37 ± 6.96	97.38 ± 5.93	97.37 ± 6.43

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Duan, Q.; Hu, X.; Zhang, Y.; Xiao, Z.; Liu, C. MCA-FM: Robust Non-Invasive Fetal ECG Extraction via Minimal Channel Attention and Flow Matching. Appl. Sci. 2026, 16, 5953. https://doi.org/10.3390/app16125953

