1. Introduction
Electroencephalography (EEG) is a widely used non-invasive technique for monitoring brain activity with millisecond temporal resolution and comparatively low-cost instrumentation, supporting applications ranging from clinical assessment to brain–computer interfaces and the longitudinal monitoring of cognitive states [1,2,3,4,5,6,7,8]. In parallel, there is growing interest in wearable and limited-lead EEG (e.g., headbands, ear-EEG, and EEG integrated into mobile or VR form factors) to improve comfort and support use outside the laboratory [9,10,11,12,13]. These designs can, however, make analysis more sensitive to non-neural contamination, particularly when fewer channels are available [14].
EEG is low-amplitude (tens of microvolts) and is, therefore, susceptible to physiological artifacts, especially ocular activity (EOG) from blinks and eye movements and muscle activity (EMG) from cranial and facial muscles [15,16,17]. These contaminants can reach amplitudes comparable to, or larger than, the underlying neural signal and can overlap with conventional EEG bands (ocular transients often dominate low frequencies, while EMG contributes broadband activity extending into higher frequencies). Such overlap can distort event-related potentials and bias spectral or connectivity measures, with effects that may be more pronounced in ambulatory settings, where blinking and facial activity are frequent [17,18,19].
A broad toolbox exists for artifact mitigation in multichannel EEG; however, many common approaches rely on spatial redundancy, which is not present in single-lead recordings. Blind source separation and subspace methods (e.g., independent component analysis (ICA) and artifact subspace reconstruction (ASR)) typically exploit cross-channel structures to separate artifact-dominated sources from neural activity [20,21,22,23], while regression methods that use auxiliary reference channels may be impractical in some wearable deployments [24,25,26,27]. As a result, single-channel pipelines often turn to time–frequency or data-adaptive decompositions that map a 1-D mixture into multiple components that can be selectively attenuated, including wavelet transforms, empirical mode decomposition (EMD) family methods, variational mode decomposition (VMD), and singular spectrum analysis (SSA).
Wavelet decompositions, including discrete wavelet transform (DWT) and stationary wavelet transform (SWT), are widely used for single-channel EEG denoising because they provide a multiscale representation (approximation plus detail sub-bands) that can support sub-band selection and/or coefficient thresholding during reconstruction [28,29,30,31,32]. Their behaviour is shaped by a small set of practical hyperparameters, most notably the mother wavelet, decomposition level, and thresholding/shrinkage rule (global vs. band-wise); published results suggest that these choices can be condition- and dataset-dependent. For instance, Khatun et al. examined blink removal under multiple wavelets and thresholding schemes and reported sensitivity to these design choices [32].
EMD-family methods are widely used for single-channel EEG denoising because they provide a data-adaptive decomposition of the signal into intrinsic mode functions (IMFs) plus a residual, often yielding components ordered from faster to slower oscillations without assuming stationarity [33]. To reduce the mode-mixing seen in classical EMD, noise-assisted variants such as ensemble empirical mode decomposition (EEMD) and complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) perform ensemble decompositions with added noise, with CEEMDAN commonly used when more stable IMFs are desired [34,35]. In practice, CEEMDAN’s behaviour is influenced by a few key hyperparameters (the added noise amplitude, the ensemble size, and the maximum number of IMFs), which can trade reconstruction quality against runtime and may vary with artifact regimes [36]. Recent studies illustrate both the popularity of these methods and the variability in evaluation choices (e.g., comparisons across EMD/EEMD/CEEMDAN on EMG/ECG denoising [37], and bidimensional empirical mode decomposition-based ERP denoising compared with EEMD-based baselines [38]).
Variational mode decomposition (VMD) is a commonly used single-channel decomposition method that represents a signal as a sum of K band-limited modes, each centred around an adaptively estimated frequency via an optimisation procedure [39]. In EEG denoising, VMD is often used to generate candidate components that can be attenuated or selected during reconstruction without requiring multichannel spatial information [40]. Its practical behaviour is influenced by a small set of hyperparameters, most notably the number of modes (K), which controls decomposition granularity and can affect both separability and runtime; other settings (e.g., penalty/bandwidth and stopping criteria) are frequently kept at defaults in applied work. Recent single-channel studies have used VMD mainly for ocular artifact suppression with fixed or heuristically chosen settings, e.g., two-stage VMD pipelines for blink removal [41], multiscale modified sample entropy-guided VMD component regression [42], and hybrid VMD–BSS approaches reporting dataset-specific preferences for K [40].
Furthermore, recent work has expanded VMD-based denoising through hybrid selector stages that explicitly identify artifact-related modes or localise artifacts in the time–frequency domain. For example, Kaur et al. proposed a hybrid VMD–detrended fluctuation analysis (DFA)–wavelet approach in which VMD modes are screened using DFA/Hurst-type criteria and only the flagged modes are further denoised using wavelet transforms, with evaluation reported on simulated mixtures and a real depression dataset [43]. Other recent studies have also explored VMD-family variants and hybrid frameworks for artifact removal, including successive or adaptive VMD strategies coupled with automated component identification, and muscle-artifact pipelines combining VMD with wavelet-domain localisation and correlation-based separation [44,45]. Collectively, this literature reinforces that VMD’s realised denoising performance depends not only on the decomposition itself (notably K) but also on the downstream mode-selection or weighting rule, motivating decomposition-centric evaluation under ideal weighting and systematic hyperparameter sweeps, as performed in this study.
Singular spectrum analysis (SSA) is a single-channel decomposition method that embeds a time series into a trajectory (Hankel) matrix and uses singular value decomposition to reconstruct a set of components that can be recombined for denoising [46,47]. In EEG applications, SSA is often used as a data-adaptive component generator followed by heuristic selection or attenuation of components before reconstruction [48]. Its behaviour is largely governed by the window length (L) and the associated grouping/retention strategy, which influence both separability and the number of components to manage [48]. Prior work has applied SSA to ocular blink suppression with clustering-based selection [49], to EMG attenuation using mobility-type criteria [48], and to multi-artifact settings when combined with ICA (e.g., SSA–SOBI) under fixed design choices and semi-simulated mixtures [50].
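The SSA steps described above (Hankel embedding, singular value decomposition, and reconstruction of additive components) can be sketched as follows. This is a minimal illustration in Python with NumPy, not the MATLAB implementation used in this study; the function name `ssa_components` and the example window length are hypothetical, and no grouping rule is applied.

```python
import numpy as np

def ssa_components(x, window):
    """Split a 1-D signal into additive SSA components (one per singular value)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    k = n - window + 1
    # Trajectory (Hankel) matrix: entry (i, j) equals x[i + j].
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    # Sample index (i + j) of every matrix entry, in row-major order.
    idx = np.add.outer(np.arange(window), np.arange(k)).ravel()
    counts = np.bincount(idx, minlength=n)
    comps = np.empty((s.size, n))
    for i in range(s.size):
        elem = s[i] * np.outer(u[:, i], vt[i])  # rank-1 elementary matrix
        # Diagonal averaging maps the matrix back to a length-n series.
        comps[i] = np.bincount(idx, weights=elem.ravel(), minlength=n) / counts
    return comps

# Illustrative use on a synthetic 2 s epoch (hypothetical signal and window).
t = np.arange(512) / 256.0
signal = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 1 * t)
components = ssa_components(signal, window=32)
```

Because diagonal averaging is linear, the components sum back to the input signal, so any denoising effect comes entirely from how the components are subsequently weighted or grouped.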
In parallel with decomposition-based denoising, recent work has increasingly explored end-to-end deep learning for single-channel EEG artifact removal, including convolutional neural network (CNN)–transformer hybrids, decision-guided routing networks, and transformer-based denoisers that fuse local and non-local structure [51,52,53,54]. These models typically learn a direct nonlinear mapping from contaminated to clean EEG using semi-simulated mixtures (e.g., EEGdenoiseNet) and report signal-level metrics on those benchmarks. However, their operation is usually “always-on”; the learned mapping is less transparent than decomposition pipelines that expose intermediate components and provide explicit handles (e.g., number of modes, wavelet family/level, window length) for controlling latency and behaviour. Accordingly, deep denoisers and decomposition methods address complementary needs: learned models can provide powerful nonlinear suppression, whereas decomposition pipelines can be more interpretable and tuneable, but are sensitive to hyperparameter choice and component-selection rules.
Across decomposition-based approaches, several practical considerations recur. Performance depends on design choices (e.g., decomposition depth, basis or penalty settings, ensemble size) that influence component separability, computational cost, and downstream selection burden, and these settings are often chosen heuristically. Moreover, end-to-end denoising outcomes conflate the capabilities of the decomposition itself with those of the downstream selector. To disentangle these factors, we quantify how much of the clean signal each decomposition can recover under ideal component weighting, using an oracle reconstruction; this provides an upper-bound reference and exposes stability/latency trade-offs that can guide practical selector design and controlled deployment.
Recent denoising studies, including both decomposition-based hybrids and end-to-end deep networks, are typically evaluated as complete pipelines in which the decomposition, selector, and reconstruction stages are intertwined. As a result, it is often unclear whether observed performance differences arise from the representational capacity of the decomposition itself or from the downstream selection rule. The present work addresses this gap by introducing an oracle-based benchmark that isolates decomposition recoverability under ideal component weighting, while also characterising hyperparameter stability and performance–latency trade-offs across method families.
These considerations motivate several practical questions:
RQ1: Are decomposition hyperparameters (e.g., depth/ensemble size/mode count) reasonably stable across artifact types and contamination strengths, or do preferred settings shift by regime?
RQ2: Do different decomposition families show regime-dependent advantages (EOG, EMG, mixed; low vs. high NSR), or is any one family consistently competitive?
RQ3: How does decomposition depth affect the performance–runtime balance, and where do diminishing returns appear?
Here, we use an oracle-based benchmarking approach to examine the representational behaviour of several common single-channel decomposition families under controlled conditions. Using a benchmark with a known clean reference EEG, we sweep hyperparameter grids for DWT, CEEMDAN, VMD, and SSA. For each noisy epoch, we compute an oracle reconstruction as a bounded, nonnegative linear recombination of the decomposition components, which serves as a reference point for what can be recovered, given that representation. We summarise results by artifact kind (EOG, EMG, mixed) and by noise-to-signal ratio (NSR) bins, rather than relying only on pooled averages. To relate oracle results to more practical settings, we also report heuristic operating points: Best, an effect-size–aware Opt1 rule that selects the fastest configuration that is negligibly different from the bin-wise Best, and a Pareto-based Opt2 criterion to summarise performance–latency trade-offs under consistent tuning rules.
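The oracle reconstruction described above can be read as a box-constrained least-squares problem: find nonnegative, bounded weights on the decomposition components that bring their weighted sum as close as possible to the clean reference. The sketch below is one way to solve it (projected gradient descent in Python with NumPy) and is not necessarily the solver used in this study; the upper bound of 1, the iteration count, and the function names are assumptions.

```python
import numpy as np

def oracle_weights(components, clean, upper=1.0, n_iter=2000):
    """Minimise ||clean - w @ components||^2 subject to 0 <= w <= upper."""
    c = np.asarray(components, dtype=float)  # shape (n_components, n_samples)
    y = np.asarray(clean, dtype=float)
    gram = c @ c.T
    rhs = c @ y
    # Step 1/L, with L the largest eigenvalue of the Gram matrix.
    step = 1.0 / max(np.linalg.eigvalsh(gram)[-1], 1e-12)
    w = np.full(c.shape[0], 0.5 * upper)
    for _ in range(n_iter):
        # Gradient step on the quadratic loss, then projection onto the box.
        w = np.clip(w - step * (gram @ w - rhs), 0.0, upper)
    return w

def oracle_reconstruct(components, clean, upper=1.0):
    """Return the oracle reconstruction and the weights that produced it."""
    w = oracle_weights(components, clean, upper)
    return w @ np.asarray(components, dtype=float), w

# Toy example: one "neural" component and one "artifact" component.
t = np.arange(256) / 256.0
neural = np.sin(2 * np.pi * 5 * t)
artifact = np.sin(2 * np.pi * 40 * t)
recon, weights = oracle_reconstruct(np.vstack([neural, artifact]), clean=neural)
```

In this toy case the two components are orthogonal, so the weights settle at 1 for the neural component and 0 for the artifact. With real decompositions the components overlap, and convergence of this simple solver can be slow for ill-conditioned component sets.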
In this work, we:
evaluate DWT, CEEMDAN, VMD, and SSA under controlled hyperparameter sweeps in a single-channel benchmark;
compute oracle reconstructions to summarise what is recoverable from each decomposition independent of a specific selector;
report performance by artifact kind (EOG, EMG, mixed) and by NSR bins, rather than only pooled averages;
summarise practical settings using Best/Opt1/Opt2 criteria to reflect performance–latency trade-offs.
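To make the Best/Opt1/Opt2 idea concrete, the sketch below selects operating points from a grid of configurations given epoch-wise errors and per-epoch runtimes. It is illustrative only: the effect-size measure (a paired Cohen's d against Best), the 0.2 "negligible" threshold, and the function name are assumptions, and for Opt2 only the Pareto front is returned because the rule for picking a single point from it is not restated here.

```python
import numpy as np

def best_opt1_pareto(errors, runtimes, negligible=0.2):
    """errors: (n_configs, n_epochs) epoch-wise error; runtimes: (n_configs,)."""
    errors = np.asarray(errors, dtype=float)
    runtimes = np.asarray(runtimes, dtype=float)
    mean_err = errors.mean(axis=1)
    best = int(np.argmin(mean_err))
    # Paired effect size of every configuration against Best (assumed measure).
    diff = errors - errors[best]
    mean_diff = diff.mean(axis=1)
    sd = diff.std(axis=1, ddof=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        d = np.where(mean_diff == 0, 0.0, mean_diff / sd)
    # Opt1: fastest configuration whose difference from Best is negligible.
    candidates = np.flatnonzero(np.abs(d) < negligible)
    opt1 = int(candidates[np.argmin(runtimes[candidates])])
    # Pareto front: configurations not dominated in (mean error, runtime).
    n = len(runtimes)
    pareto = [
        i for i in range(n)
        if not any(
            mean_err[j] <= mean_err[i] and runtimes[j] <= runtimes[i]
            and (mean_err[j] < mean_err[i] or runtimes[j] < runtimes[i])
            for j in range(n)
        )
    ]
    return best, opt1, pareto

# Hypothetical grid of four configurations evaluated on 50 epochs.
base = np.linspace(0.2, 0.4, 50)
wobble = 0.05 * (-1.0) ** np.arange(50)
errs = np.vstack([base + 0.5, base + 0.001 + wobble, base, base + 0.2])
times = np.array([1.0, 5.0, 20.0, 30.0])
best_idx, opt1_idx, pareto_idx = best_opt1_pareto(errs, times)
```

Here the third configuration has the lowest mean error (Best), the second is four times faster and differs negligibly from it (Opt1), and the fourth is dominated because it is both slower and less accurate than Best.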
This study is intentionally framed as a recoverability benchmark rather than a deployable artifact-removal pipeline. By using an oracle that has access to the clean target, we quantify an upper bound on what each decomposition family can reconstruct when component weighting is ideal, independent of any practical component-selection rule. Consequently, conclusions are restricted to the controlled benchmark conditions and are intended to guide subsequent development and evaluation of implementable selectors.
The remainder of the paper describes the benchmark construction and decomposition grids, defines the constrained oracle reconstruction and evaluation metrics, introduces Best/Opt1/Opt2 selection, and reports within-method hyperparameter behaviour and inter-method comparisons stratified by artifact kind and NSR.
4. Discussion
Our results provide a decomposition-focused view of four commonly used single-channel families (DWT, CEEMDAN, VMD, SSA) under the oracle reconstruction setting and can be summarised in relation to the practical questions posed in the Introduction.
RQ1 (hyperparameter stability across regimes). Across the NSR bins and contamination kinds considered, SSA and DWT showed comparatively stable near-optimal regions under the explored grids. SSA’s Best settings clustered around a moderate window length (near ). DWT’s Best settings consistently favoured coif3 at the highest tested depth, while Opt1/Opt2 selected shallower levels with only small changes in oracle metrics.
For VMD and CEEMDAN, the Opt1 selections showed clearer regime dependence by contamination type. Under bin-wise Opt1 tuning, EOG segments selected relatively modest VMD mode counts ( across bins), whereas EMG and mixed segments selected higher values ( for EMG and for mixed), with the highest-NSR bins favouring the largest in both cases. For CEEMDAN (Nstd/NR/MaxIMF), Opt1 was stable within each contamination kind but shifted across kinds: EOG bins consistently selected 0.10/30/8, while EMG and mixed bins consistently selected 0.30/30/16 under the tested grid. These patterns suggest that reasonable operating points may be more transferable for some families (SSA, DWT) than others, and that preferred settings can depend on contamination regime.
RQ2 (regime-dependent competitiveness across families). Under bin-wise Opt1 tuning, SSA was most frequently top-ranked by epoch-wise RRMSE across most conditions, particularly under EMG and mixed contamination. The clearest exception occurred in the lowest EOG NSR bin, where DWT was often most competitive and rank distributions overlapped more strongly between methods. Overall, these results suggest that relative rankings can vary with both artifact type and contamination level, with some regimes showing clearer separation and others appearing closer to “tied” behaviour under the oracle/Opt1 lens.
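The epoch-wise ranking used for this comparison can be sketched as follows. The RRMSE definition here (root-mean-square error divided by the root-mean-square of the clean signal) is the commonly used temporal form and is assumed rather than restated from the Methods; ties are broken by method order, and the scores are made-up numbers for illustration.

```python
import numpy as np

def rrmse(clean, recon):
    """Relative RMSE per epoch: RMS(recon - clean) / RMS(clean)."""
    clean = np.asarray(clean, dtype=float)
    recon = np.asarray(recon, dtype=float)
    err = np.sqrt(np.mean((recon - clean) ** 2, axis=-1))
    return err / np.sqrt(np.mean(clean ** 2, axis=-1))

def epoch_ranks(scores):
    """scores: (n_methods, n_epochs). Rank 1 = lowest error within each epoch."""
    order = np.argsort(scores, axis=0, kind="stable")
    return np.argsort(order, axis=0, kind="stable") + 1

# Three hypothetical methods scored on two epochs.
scores = np.array([[0.1, 0.5],
                   [0.3, 0.2],
                   [0.2, 0.9]])
ranks = epoch_ranks(scores)
mean_rank = ranks.mean(axis=1)          # lower is better
first_share = (ranks == 1).mean(axis=1)  # fraction of epochs ranked first
```

Reporting the full rank distribution alongside the mean rank helps show when methods are effectively tied, which a single pooled average would hide.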
RQ3 (performance–runtime trade-offs and diminishing returns). The within-method sweeps illustrate that improved oracle performance can coincide with higher computational cost; however, the degree of this trade-off differs by family. VMD benefited from increasing K in terms of oracle metrics, but with near-linear growth in runtime, making moderate K values a plausible compromise under Opt1/Opt2. CEEMDAN’s runtime was dominated by ensemble size (NR) in our implementation; larger ensembles often provided only modest oracle gains within the tested grid. By contrast, DWT and SSA were comparatively inexpensive per epoch in our MATLAB implementation; their Opt1/Opt2 operating points tended to retain most of the Best performance while reducing depth/complexity.
Notably, oracle reconstruction errors were generally higher under EMG and mixed regimes than under EOG regimes in this benchmark; the Opt1 selections for VMD and CEEMDAN tended to require higher decomposition capacity in EMG/mixed than in EOG. While this does not imply that EMG is universally harder in all settings, it is consistent with the idea that EMG-heavy regimes can be more demanding for single-channel decomposition-based suppression under the contamination model and hyperparameter ranges considered here.
4.1. Practical Considerations Beyond Performance
Our primary inter-method comparison is performance-driven (epoch-wise rank distributions under bin-wise Opt1 tuning using oracle reconstructions). In applied settings, however, method choice is also influenced by latency constraints, the transparency of decomposition outputs, and the practical burden of tuning and component handling.
Runtime practicality. In our MATLAB setup (MATLAB R2023b, i7 CPU, 32 GB RAM), DWT and SSA executed in the millisecond range per 2 s epoch; VMD was slower, with runtime increasing with the number of modes; and CEEMDAN was slowest (≈0.14–0.49 s per epoch under our implementation). Because absolute timings depend on software optimisation and hardware, these results are best interpreted as relative indicators within a consistent implementation rather than fixed deployment estimates.
Decision practicality (component-handling burden). Decomposition-based denoising requires translating components into a reconstruction decision. DWT yields a structured set of sub-bands that can support simple attenuation rules. VMD produces K modes, which can remain manageable at moderate K, while larger K may introduce redundancy and increase handling burden. CEEMDAN produces multiple IMFs and can be sensitive to noise-assistance settings. SSA can yield many candidate components; however, in practice, it is often paired with grouping rules rather than purely manual component-by-component curation. These considerations are most relevant in regimes where oracle performance is closely overlapping between methods.
Tuning burden and stability. Under our explored grids, SSA and DWT showed relatively broad near-optimal regions across NSR bins, suggesting that once a reasonable configuration is established, extensive retuning may be less critical within this benchmark. By comparison, VMD’s K-dependent trade-offs and CEEMDAN’s interacting parameters (particularly ensemble size vs. runtime) make tuning choices more consequential, especially when latency constraints are present.
Implications for method choice under this evaluation. Under oracle/Opt1 conditions, SSA most frequently ranked first across most bins and contamination types, while DWT was most competitive in the lowest EOG NSR bin. In regimes where rank distributions overlap (i.e., methods appear effectively tied), practical considerations, such as latency and component-handling complexity, may reasonably guide method choice. These observations should be interpreted as patterns within our synthetic contamination and oracle reconstruction framework, rather than as deployment-ready prescriptions.
4.2. Interpreting Oracle Results and Implications
4.2.1. Interpreting Oracle Performance
Oracle reconstructions are defined using the clean reference and, therefore, serve as a best-case reference point for what is recoverable from a given decomposition representation under the tested contamination regimes. They do not constitute a deployable method without an explicit selection or weighting rule. Within this framing, the strong oracle/Opt1 performance of SSA, particularly under EMG and mixed contamination, suggests that SSA representations can support low-error reconstructions when component weights are chosen optimally. The EOG results were more regime-dependent, with closer competition at low EOG NSR and clearer separation at higher EOG NSR, consistent with the idea that relative advantages can vary with contamination level in single-channel settings. The oracle should therefore be interpreted as a best-case reference within a fixed component basis, not as evidence that the decomposition itself yields perfectly separated neural and artifact components.
4.2.2. From Oracle to Deployment: The Selection Gap
A practical pipeline requires an explicit mechanism to approximate the oracle’s component decisions. Candidate approaches include lightweight heuristics (e.g., component bandpower ratios, time–frequency concentration measures, or transient morphology indicators) or learned selectors that output per-component attenuation weights. Quantifying the resulting gap-to-oracle across contamination regimes helps to distinguish decompositions that are not only recoverable in principle but also tractable in practice.
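As an illustration of the simplest kind of heuristic mentioned above, the sketch below assigns a binary weight to each component from a bandpower ratio: a component is suppressed when most of its power lies in a low-frequency band where ocular activity tends to dominate. Every numeric choice (256 Hz sampling rate, 0–4 Hz band, 0.8 threshold) and both function names are hypothetical and would need tuning per decomposition family and contamination regime.

```python
import numpy as np

def band_fraction(component, fs, band):
    """Fraction of a component's spectral power inside [band[0], band[1]) Hz."""
    spec = np.abs(np.fft.rfft(component)) ** 2
    freqs = np.fft.rfftfreq(len(component), d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs < band[1])
    total = spec.sum()
    return float(spec[in_band].sum() / total) if total > 0 else 0.0

def heuristic_weights(components, fs=256.0, band=(0.0, 4.0), max_fraction=0.8):
    """Weight 0 for components dominated by the given band, 1 otherwise."""
    return np.array([
        0.0 if band_fraction(c, fs, band) > max_fraction else 1.0
        for c in components
    ])

# Two synthetic components over a 2 s epoch: slow (1 Hz) and fast (10 Hz).
t = np.arange(512) / 256.0
slow = np.sin(2 * np.pi * 1 * t)
fast = np.sin(2 * np.pi * 10 * t)
selector_w = heuristic_weights(np.vstack([slow, fast]))
```

Comparing the error of a reconstruction built with such weights against the oracle reconstruction on the same components gives one estimate of the gap-to-oracle for that selector.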
To provide an empirical bridge in a setting where established selector rules exist, we added a practical baseline within the wavelet family using standard coefficient thresholding (DWT, universal soft thresholding). Under this baseline, coif3 achieved the best overall performance on ocular mixtures and was consistently best in moderate-to-high ocular regimes ( dB; Table 5), broadly aligning with the oracle-guided preference for coif3 under nontrivial contamination. In the mildest ocular bin dB, db4 slightly outperformed coif3, and paired testing confirmed that this small difference is statistically detectable (Supplementary Table S4). Together, these results emphasise that oracle analysis quantifies representational headroom, whereas realised performance depends on selector behaviour and operating regime.
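For readers unfamiliar with universal soft thresholding, the sketch below shows the rule on a single-level Haar transform. The Haar wavelet and the single level are simplifications chosen to keep the example dependency-free; the baseline reported here used coif3 and db4, and this is not that implementation. The noise estimate uses the median absolute deviation of the detail coefficients, which is the conventional choice and is assumed here.

```python
import numpy as np

def haar_dwt(x):
    """Single-level Haar transform of an even-length signal."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    x = np.empty(2 * approx.size)
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

def universal_soft_denoise(x):
    """Soft-threshold the detail coefficients at sigma * sqrt(2 ln N)."""
    approx, detail = haar_dwt(x)
    sigma = np.median(np.abs(detail)) / 0.6745   # MAD-based noise estimate
    thr = sigma * np.sqrt(2.0 * np.log(len(x)))  # universal threshold
    shrunk = np.sign(detail) * np.maximum(np.abs(detail) - thr, 0.0)
    return haar_idwt(approx, shrunk)

# Synthetic check: slow sinusoid plus white noise (illustrative values).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 512, endpoint=False)
clean_sig = np.sin(2 * np.pi * 3 * t)
noisy_sig = clean_sig + 0.3 * rng.standard_normal(512)
denoised_sig = universal_soft_denoise(noisy_sig)
```

Unlike the oracle, this rule never sees the clean signal: the threshold is derived from the noisy coefficients alone, which is why its performance reflects selector behaviour as well as the wavelet basis.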
From a deployment perspective, the results suggest that decomposition choice and hyperparameter selection should be matched to both the contamination regime and the latency budget. The Opt1 operating points are useful when maximising recoverability is the primary goal, whereas the Opt2 points provide a more pragmatic choice when runtime or energy constraints are important. In this sense, the benchmark is not only comparative but also prescriptive: it indicates which settings are likely to offer the best accuracy–efficiency trade-off before any selector is designed. More broadly, practical selector design is likely to remain family-specific: wavelet pipelines naturally align with thresholding rules, VMD with mode-selection or mode-weighting criteria based on frequency content and bandwidth, SSA with grouping or subspace-selection rules, and EMD-family methods with IMF-selection heuristics. Likewise, within-family extensions, such as adaptive or successive VMD variants, stationary or dual-tree complex wavelet transforms, and other advanced decomposition variants, may further improve practical performance; however, evaluating these would shift the emphasis from inter-family upper-bound comparison to within-family method development. We therefore view the present study as a decomposition-centric reference point that can guide these next-stage selector and family-extension studies in a more controlled and interpretable way.
4.2.3. Limitations
A key limitation of the present benchmark is that the oracle reconstruction requires access to the clean target and, therefore, cannot be used directly in deployment. Importantly, the oracle does not assume that the decomposition has already achieved perfect separation of neural and artifact components; rather, it computes the best achievable reconstruction within the span of the obtained component set under bounded weighting. This makes it a decomposition-centric upper bound rather than a practical denoiser. Relatedly, the benchmark relies on synthetic mixtures because the question posed here, how much clean EEG is recoverable from a given decomposition under ideal weighting, requires a known clean reference. In this sense, the use of semi-simulated data is intentional: the clean EEG and artifact exemplars are real signals drawn from EEGdenoiseNet, while the simplification lies in the additive mixing model and the controlled NSR design. We therefore interpret the results as upper-bound recoverability under controlled conditions, while recognising that genuine EEG artifacts can be more nonlinear, nonstationary, and context-dependent than the mixtures considered here. Real-data validation remains important; however, in this study, it is most naturally framed as a next step for evaluating practical selectors rather than as a substitute for the oracle benchmark itself. Furthermore, because epochs are not linked to subject/session identifiers, epoch-wise inferential testing may overstate certainty if samples are correlated; accordingly, we emphasise rank distributions and robust descriptive summaries and treat inferential tests as exploratory. Runtime comparisons are implementation-dependent and should be interpreted comparatively within our setup.
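The additive mixing model referred to above can be sketched as scaling an artifact exemplar so that the mixture reaches a target NSR. The power-ratio definition of NSR in dB used below is an assumption for illustration and may differ from the exact definition in the Methods; the function name and the synthetic signals are hypothetical.

```python
import numpy as np

def mix_at_nsr(clean, artifact, nsr_db):
    """Return clean + scale * artifact with 10*log10(P_artifact / P_clean) = nsr_db."""
    clean = np.asarray(clean, dtype=float)
    artifact = np.asarray(artifact, dtype=float)
    p_clean = np.mean(clean ** 2)
    p_art = np.mean(artifact ** 2)
    # Scale the artifact so its power relative to the clean EEG hits the target.
    scale = np.sqrt(p_clean / p_art * 10.0 ** (nsr_db / 10.0))
    return clean + scale * artifact, scale

# Synthetic stand-ins for a clean EEG epoch and an artifact exemplar.
t = np.arange(512) / 256.0
eeg = np.sin(2 * np.pi * 10 * t)
blink = 3.0 * np.cos(2 * np.pi * 1 * t)
mixed, lam = mix_at_nsr(eeg, blink, nsr_db=-3.0)
measured_nsr = 10.0 * np.log10(np.mean((mixed - eeg) ** 2) / np.mean(eeg ** 2))
```

Because the clean epoch is kept alongside the mixture, every reconstruction can be scored against a known reference, which is the property the oracle analysis relies on.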
4.2.4. Future Work
Immediate next steps are (i) evaluating heuristic and learned selectors to approximate oracle weighting and reporting the resulting gap-to-oracle across regimes; and (ii) assessing task preservation (e.g., ERP morphology, SSVEP peaks, and downstream decoding stability) to check that improvements in reconstruction metrics do not come at the cost of attenuating neurophysiologically meaningful content. More complex adaptive pipelines (e.g., regime-aware switching across methods) could then be explored once baseline selector performance is established.
5. Conclusions
This work presented an oracle-based benchmark of four commonly used one-dimensional decomposition families (VMD, SSA, DWT, and CEEMDAN) for single-channel EEG artifact suppression under controlled EOG, EMG, and mixed contamination across NSR bins and the pooled range . By computing an oracle reconstruction through the constrained re-weighting of decomposition components against a clean reference, we used a decomposition-focused reference point to examine how recoverability varies with both noise regime and decomposition granularity.
Under bin-wise Opt1 tuning (effect-size–aware “good-enough” settings), SSA was the most consistently competitive method, ranking first in 14/15 contamination conditions. In the pooled bin, SSA achieved the best mean ranks for EOG (1.85), EMG (1.27), and mixed (1.47) contamination. The main exception was mild EOG contamination , where DWT most frequently ranked first, suggesting that relative performance can be regime-dependent even under consistent tuning rules.
Beyond oracle performance, we summarised practical considerations that may influence method choice in applied pipelines, including runtime, tuning burden, and component-handling complexity. While absolute timings are implementation-dependent, DWT and SSA were comparatively lightweight in our MATLAB setup, VMD offered adjustable granularity with higher compute cost, and CEEMDAN’s runtime was substantially higher under the tested settings.
Because oracle weighting uses access to the clean reference, these findings should be interpreted as an upper bound on what each decomposition can recover under the tested regimes. Translating this bound into deployable pipelines requires practical component selection/weighting; future work should quantify the gap to oracle and validate task preservation (e.g., ERP/SSVEP integrity) on broader real-world and wearable artifacts.