Article

Outlier Detection in EEG Signals Using Ensemble Classifiers

by Agnieszka Duraj 1,*, Natalia Łukasik 2 and Piotr S. Szczepaniak 1
1 Institute of Information Technology, Lodz University of Technology, al. Politechniki 8, 93-590 Lodz, Poland
2 University Clinical Center of the Medical University of Warsaw, ul. Banacha 1a, 02-097 Warsaw, Poland
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(22), 12343; https://doi.org/10.3390/app152212343
Submission received: 16 September 2025 / Revised: 23 October 2025 / Accepted: 27 October 2025 / Published: 20 November 2025
(This article belongs to the Special Issue EEG Signal Processing in Medical Diagnosis Applications)

Abstract

Epilepsy is one of the most prevalent neurological disorders, affecting over 50 million people worldwide. Accurate detection and characterization of epileptic activity are clinically critical, as seizures are associated with substantial morbidity, mortality, and impaired quality of life. Electroencephalography (EEG) remains the gold standard for epilepsy assessment; however, its manual interpretation is time-consuming, subjective, and prone to inter-rater variability, emphasizing the need for automated analytical approaches. This study proposes an automated ensemble classification framework for outlier detection in EEG signals. Three interpretable baseline models—Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), and decision tree (DT-CART)—were screened against a pre-registered selection rule (F1 on the outlier class > 0.60), and ensembles were formed only from base models that satisfied it. Under this criterion, DT-CART did not qualify and was excluded from all ensembles; the final ensembles combined SVM and k-NN. The framework was evaluated on two publicly available datasets with distinct acquisition conditions. The Bonn EEG dataset comprises 500 artifact-free single-channel recordings from healthy subjects and epilepsy patients under controlled laboratory settings. In contrast, the Guinea-Bissau and Nigeria Epilepsy (GBNE) dataset contains multi-channel EEG recordings from 97 participants acquired in field conditions using low-cost equipment, reflecting real-world diagnostic challenges such as motion artifacts and signal variability. The ensemble framework substantially improved outlier detection performance, with stacking achieving up to a 95.0% F1-score (accuracy 95.0%) on the Bonn dataset and an 85.5% F1-score (accuracy 85.5%) on the GBNE dataset.
These findings demonstrate that the proposed approach provides a robust, interpretable, and generalizable solution for EEG analysis, with strong potential to enhance reliable, efficient, and scalable epilepsy detection in both laboratory and resource-limited clinical environments.

1. Introduction

Epilepsy is among the most common neurological disorders, affecting over 50 million people worldwide according to the World Health Organization (WHO) [1]. The disease contributes to more than 0.5% of the global burden of disease and is associated with considerable morbidity, mortality, and reduced quality of life [2,3]. Epileptic seizures manifest in diverse forms, ranging from subtle impairments of consciousness to generalized convulsions and sudden collapses. In severe cases, seizures may result in traumatic injuries or even fatalities [2]. The heterogeneity of seizure manifestations makes accurate and timely diagnosis particularly challenging, while also underscoring its clinical significance. In low-resource settings, delayed or inaccurate diagnosis remains a major barrier to treatment, further increasing the disease burden [4].
Electroencephalography (EEG) is the principal diagnostic tool recommended by the International League Against Epilepsy (ILAE) [5,6]. EEG records electrical potentials generated by synchronized neuronal activity and provides high temporal resolution while being non-invasive. In clinical practice, EEG allows physicians to identify interictal epileptiform discharges, seizure events, and abnormal rhythms [7]. Nevertheless, manual EEG interpretation is a time-consuming and expertise-dependent process, characterized by subjectivity and inter-rater variability [8]. Furthermore, long-term monitoring produces vast data volumes, which amplifies the risk of oversight and increases clinicians’ workload. These limitations motivate the development of automated computational methods to improve both efficiency and accuracy in epilepsy diagnostics.
Over the past decades, numerous computational approaches have been proposed for epileptiform event detection and seizure prediction. Early methods focused on template matching and morphological features [9], whereas later studies incorporated spectral and time–frequency analyses to capture more complex EEG dynamics [10]. With the growth of machine learning, supervised and unsupervised classifiers have been applied to distinguish abnormal from normal EEG segments [11,12]. More recently, deep learning techniques have gained traction, demonstrating remarkable potential in automated EEG interpretation [13,14,15,16]. Despite these advances, major challenges persist, including class imbalance between seizure and non-seizure states, high inter-subject variability, and the presence of artifacts such as muscle activity or electrode noise that can mimic pathological patterns. These challenges motivate the exploration of outlier-oriented frameworks capable of distinguishing clinically relevant abnormalities from noise and artifacts [17,18,19].
The limitations of existing methods highlight the need for approaches that can capture rare and atypical patterns in EEG signals. In this framework, abnormal events can be regarded as outliers, defined as signal segments that deviate significantly from background activity [20]. Such anomalies may correspond to epileptiform discharges, seizure events, or pathological rhythms, but may also arise from artifacts or noise. Detecting outliers is therefore clinically valuable: it enables the identification of rare but relevant abnormalities without exhaustive manual annotation, alleviates neurologists’ workload, and facilitates timely diagnosis and patient monitoring [21]. Outlier detection thus provides a natural framework for addressing the inherent imbalance between abundant normal activity and rare pathological events in EEG recordings, where detecting subtle epileptiform discharges can inform diagnosis and treatment decisions [21].
This operational perspective aligns with ILAE terminology and supports a screening-level distinction between epileptiform (outlier) and non-epileptiform (normal) EEG segments. To make this conceptual distinction explicit, we provide the following operational definition used throughout this study.
Definition 1
(EEG Outlier). An outlier in the EEG context denotes a signal segment exhibiting epileptiform activity—ictal or interictal spikes, sharp waves, spike–waves, or seizure patterns—that deviates from normal background rhythms. Accordingly, the binary labels used throughout are outlier (epileptiform) versus non-outlier (non-epileptiform). Artifact-only segments are excluded unless explicitly marked as epileptiform in the source dataset.
Building upon this conceptualization, one promising direction is to enhance outlier detection through ensemble learning. Ensemble methods aggregate the predictions of multiple classifiers, leveraging their complementary strengths to enhance predictive accuracy, reduce variance, and improve generalization [22]. Unlike conventional single classifiers, which are sensitive to parameter selection and prone to overfitting noisy data, the proposed automated ensemble approach systematically selects well-performing base learners and integrates them through complementary aggregation strategies. This reduces subjectivity in model design and increases robustness against artifacts and inter-subject variability—both critical for reliable clinical use in EEG-based epilepsy diagnostics.
In this study, we propose an ensemble-based outlier detection framework for EEG. We screen three interpretable baseline models—Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), and decision tree (DT-CART)—under a pre-registered eligibility rule (F1 on the outlier class > 0.60). Only models meeting this criterion are used to form ensembles; under this rule, DT-CART did not qualify, so the final ensembles combine SVM and k-NN. We then aggregate eligible base learners using bagging, stacking, and majority voting. By combining complementary decision boundaries, the framework increases robustness to noise, mitigates class imbalance, and improves reliable identification of abnormal EEG segments.
Although deep and transformer-based models have demonstrated outstanding performance in recent EEG studies, the proposed framework deliberately employs classical machine learning algorithms—SVM, k-NN, and DT-CART—as base learners. This decision is motivated by three factors: (i) interpretability and computational efficiency, which facilitate clinical acceptance; (ii) smaller data requirements, given the limited availability of labeled EEG recordings; and (iii) the ability to serve as baseline modules in an automated ensemble generator that can later incorporate deep or transformer-based architectures. Furthermore, the selected base classifiers address complementary aspects of EEG abnormality detection: SVM captures nonlinear, high-dimensional decision boundaries typical of ictal activity; k-NN effectively models local neighborhood relations for subtle interictal discharges; and DT-CART provides interpretable, rule-based partitions that separate normal background rhythms from artifact-related patterns. Together, these models offer a balanced trade-off between flexibility, interpretability, and computational efficiency—attributes particularly important for detecting diverse EEG abnormalities within limited and heterogeneous datasets.
The first dataset, the Bonn EEG dataset, represents a well-structured benchmark comprising high-quality recordings from both healthy individuals and epilepsy patients under controlled laboratory conditions [23]. In contrast, the Guinea-Bissau and Nigeria Epilepsy (GBNE) dataset offers a more realistic scenario, containing field-acquired EEG signals recorded using low-cost equipment [24]. This dataset is of particular clinical importance because it reflects the challenges of diagnosing epilepsy in low-resource environments, where neurologists and specialized equipment are often scarce. Together, these datasets provide a dual perspective: Bonn serves as a clean benchmark for methodological validation, whereas GBNE captures the complexity of real-world diagnostics in resource-limited clinical environments. This combination enables a comprehensive evaluation of model robustness under both idealized and realistic conditions.
Problem statement and contributions. This study frames epileptiform activity detection as a binary outlier detection problem in EEG: signal segments that deviate from normal background rhythms are labeled as outliers (epileptiform), whereas normal background activity is treated as non-outlier (non-epileptiform). The main contributions of this work are as follows:
  • Operational definition of EEG outliers. We formalize an ILAE-aligned mapping between the clinical notion of “epileptiform” and the modeling term “outlier,” and use it consistently throughout the study.
  • Automated, performance-based ensemble construction. Base models are automatically admitted to ensembles only if they satisfy a pre-registered eligibility rule (F1 on the outlier class > 0.60), improving transparency and reproducibility.
  • Systematic comparison of aggregation strategies. We evaluate three ensemble aggregators—bagging, stacking, and majority voting—across homogeneous and heterogeneous configurations.
  • Validation on clean vs. field EEG. We assess generalization on the artifact-free Bonn dataset and the noisy, low-cost GBNE dataset, reflecting real-world diagnostic constraints.
  • Statistical validation. We employ McNemar’s test on paired out-of-fold predictions to confirm that ensemble improvements over the best base model are statistically significant.
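To make the statistical validation step concrete, the exact (binomial) form of McNemar's test can be computed directly from the two discordant counts of paired out-of-fold predictions. The following is a minimal sketch with hypothetical counts, not the values obtained in the experiments reported later:

```python
from math import comb

def mcnemar_exact(n01, n10):
    """Exact two-sided McNemar p-value from the discordant counts:
    n01 = cases only model A classified correctly,
    n10 = cases only model B classified correctly."""
    n = n01 + n10
    k = min(n01, n10)
    # Two-sided binomial tail probability under H0: both models err equally
    p = sum(comb(n, i) for i in range(k + 1)) / 2 ** (n - 1)
    return min(1.0, p)

# Hypothetical discordant counts between an ensemble and its best base model
print(mcnemar_exact(30, 10))
```

With 30 versus 10 discordant cases the difference is significant at conventional levels, whereas equal discordant counts yield p = 1.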
The remainder of this paper is organized as follows. Section 2 presents a review of existing solutions related to EEG signal analysis and outlier detection methods. Section 3 introduces the baseline classifiers and ensemble learning techniques used in this study. Section 4 describes the proposed procedures for building homogeneous and heterogeneous ensemble models. Section 5 details the research methodology, including dataset characteristics, evaluation metrics, and experimental design. Section 6 discusses the experimental results obtained for both EEG datasets, and Section 7 provides final conclusions and future research directions.

2. Related Work

Outlier detection in EEG (electroencephalographic) signals represents a significant area of research in neuroinformatics and biomedical signal processing. Key applications of such analyses include neurological diagnostics, patient state monitoring, and the development of brain–computer interfaces (BCIs). Most existing studies primarily focus on seizure classification, while the detection of anomalous or outlier events—despite its clinical relevance for capturing rare epileptiform discharges—remains comparatively underexplored.

2.1. Traditional EEG Analysis Methods

Traditional analysis of EEG signals has historically relied on linear signal processing and statistical techniques aimed at identifying abnormal patterns associated with neurological disorders, particularly epilepsy. One of the most widely applied approaches is spectral analysis using the Fast Fourier Transform (FFT), which decomposes EEG into its constituent frequency components and enables the detection of abnormal oscillatory activity [25]. Although effective for identifying broad-band abnormalities such as excess delta or theta rhythms, the FFT assumes stationarity of the signal and therefore struggles with the highly non-stationary nature of EEG.
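For illustration, the FFT-based band-power computation described above can be sketched in a few lines of NumPy. The 256 Hz sampling rate and the synthetic 10 Hz "alpha-like" component are illustrative assumptions, not properties of the datasets used later:

```python
import numpy as np

fs = 256                                   # assumed sampling rate (Hz)
t = np.arange(0, 4, 1 / fs)                # 4-second synthetic "EEG" segment
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 10 * t) + 0.3 * rng.normal(size=t.size)

# One-sided FFT power spectrum
freqs = np.fft.rfftfreq(x.size, d=1 / fs)
power = np.abs(np.fft.rfft(x)) ** 2 / x.size

def band_power(lo, hi):
    """Total spectral power in the [lo, hi) Hz band."""
    mask = (freqs >= lo) & (freqs < hi)
    return power[mask].sum()

alpha = band_power(8, 13)   # the 10 Hz component falls in this band
delta = band_power(0.5, 4)
print(alpha > delta)
```

Note that this analysis implicitly assumes stationarity over the 4-second window, which is exactly the limitation that motivates the time-frequency methods discussed next.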
To overcome these limitations, time–frequency methods such as the Short-Time Fourier Transform (STFT) and Wavelet Transform (WT) have been employed. The STFT provides localized frequency information within sliding time windows, but the choice of window size imposes a trade-off between time and frequency resolution [26]. In contrast, the WT offers a multiresolution analysis, enabling the capture of both transient spikes and long-term oscillatory trends, and has been widely adopted for seizure detection and artifact characterization [27]. Wavelet packet decomposition further extends this approach by providing a more detailed representation of high-frequency components relevant for epileptic spike detection [28].
Classical statistical measures, such as autoregressive (AR) modeling, Hjorth parameters, and higher-order spectra, have also been explored for characterizing EEG activity [11]. AR models estimate power spectral densities and can highlight abnormalities in rhythmic activity, while Hjorth parameters (activity, mobility, and complexity) provide time-domain descriptors of signal variability. Another family of approaches includes nonlinear dynamical analysis and chaos-theory-based measures (correlation dimension, Lyapunov exponents, and entropy measures), which aim to capture the nonlinear dynamics underlying epileptiform activity [29]. Despite their historical importance, these methods are sensitive to artifacts and rely on handcrafted features and stationarity assumptions, which limits robustness in clinical practice. Comprehensive reviews of EEG feature extraction techniques for epilepsy further emphasize that handcrafted statistical and dynamical descriptors, while informative, are insufficient for robust clinical deployment in heterogeneous patient populations [30].
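The Hjorth parameters mentioned above are simple to compute from a signal and its successive differences; the following is a minimal sketch on synthetic signals:

```python
import numpy as np

def hjorth(x):
    """Hjorth activity, mobility, and complexity of a 1-D signal."""
    dx = np.diff(x)
    ddx = np.diff(dx)
    activity = np.var(x)                               # signal variance
    mobility = np.sqrt(np.var(dx) / activity)          # mean-frequency proxy
    complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility
    return activity, mobility, complexity

rng = np.random.default_rng(1)
t = np.linspace(0, 2, 512)
smooth = np.sin(2 * np.pi * 5 * t)   # slow, regular rhythm
noisy = rng.normal(size=512)         # broadband noise

# Broadband noise changes faster sample-to-sample, so its mobility is higher
print(hjorth(noisy)[1] > hjorth(smooth)[1])
```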

2.2. Machine Learning Methods

With the rise of machine learning (ML), more flexible and adaptive approaches have emerged for EEG analysis. Supervised models such as Support Vector Machines (SVMs), Random Forests (RFs), decision trees (DTs), and k-Nearest Neighbors (k-NN) have demonstrated consistently high accuracy in EEG classification tasks [25]. SVM remains popular in seizure detection due to its ability to handle high-dimensional feature spaces [31], while RF improves robustness to noise via aggregation of multiple decision trees, and k-NN performs well on smaller datasets with appropriate distance metrics [32]. Feature extraction/selection (e.g., PCA/ICA) is often combined with these classifiers to improve interpretability and efficiency [33]. Several surveys confirm that classical ML pipelines, while effective in controlled settings, often struggle to generalize to large and noisy clinical datasets such as TUH EEG or CHB-MIT [11,13,34]. Beyond classification, clustering methods (k-means and Gaussian Mixture Models) enable unsupervised grouping of EEG patterns and identification of anomalous states with limited labels [35].
Ensemble methods—bagging, Boosting, and Random Subspace—further improve robustness to noise and inter-subject variability [36]. For instance, AdaBoost increases sensitivity to seizure events compared with weak learners [37], while heterogeneous ensembles (SVM, k-NN, and DT) enhance generalization on imbalanced datasets [38]. Recent application-driven studies highlight practical deployments in seizure detection. Lasefr et al. [39] integrated a detector with a mobile application and achieved ∼98% accuracy on public EEG datasets. Khan et al. [40] introduced a shallow autoencoder combined with a classical classifier, reducing model complexity while maintaining performance. Palanisamy and Rengaraj [41] reported 96–98.5% detection rates by applying data augmentation and optimization-tuned LSTM models. Similarly, Lenkala et al. [42] demonstrated that AutoML frameworks can produce strong EEG models even without specialized machine learning expertise.
For unsupervised anomaly detection, widely used methods include the following:
  • Local Outlier Factor (LOF): Identifies observations with locally lower density than neighbors [43];
  • Isolation Forest: Partitions the feature space using random trees, enabling efficient detection in high-dimensional EEG [44];
  • DBSCAN: Flags noise points that do not belong to any cluster—useful in artifact identification [45].
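The first two of these detectors have off-the-shelf scikit-learn implementations; a minimal sketch on synthetic feature vectors (five injected outliers among normal samples; DBSCAN would be used analogously via its noise label of -1):

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# Toy feature matrix: 200 "normal" segments plus 5 injected outliers
normal = rng.normal(0, 1, size=(200, 4))
outliers = rng.normal(6, 1, size=(5, 4))
X = np.vstack([normal, outliers])

# Both estimators label outliers as -1 and inliers as +1
lof = LocalOutlierFactor(n_neighbors=20).fit_predict(X)
iso = IsolationForest(random_state=0).fit_predict(X)

print((lof[-5:] == -1).all(), (iso[-5:] == -1).all())
```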
Comparative studies (e.g., [25,32]) suggest that while single ML models can be strong, ensembles are typically more robust and clinically relevant.

2.3. Deep Learning and Hybrid Methods

Deep learning models (CNNs and RNNs) learn hierarchical representations directly from raw EEG signals. CNNs effectively capture spatial and local temporal patterns and can outperform traditional feature engineering pipelines [46]. RNNs (e.g., LSTM and GRU) model long-range temporal dependencies in sequential EEG data [47]. CNN–LSTM hybrids integrate spatial localization with temporal dynamics, achieving state-of-the-art performance in seizure detection, sleep staging, and brain–computer interface (BCI) tasks [13]. Incorporating attention mechanisms and bidirectional layers further enhances sensitivity to pre-ictal patterns [48]. Recent advances further extend deep learning architectures beyond unimodal EEG analysis through domain and modality fusion. Rasool et al. [49] introduced a deep neurocomputational fusion framework for multi-domain EEG analysis, achieving high diagnostic accuracy in autism spectrum disorder (ASD) detection by integrating spatial–temporal and spectral representations. Similarly, Bunterngchit et al. [50] proposed a temporal attention fusion network with a custom loss function for EEG–fNIRS classification, demonstrating that multimodal attention mechanisms can substantially enhance robustness and interpretability in neurophysiological signal analysis. Clinically oriented deep computer-aided diagnosis (CAD) systems have also been proposed. For instance, Ficici et al. [51] employed an LSTM model with wavelet-based features and asymmetry scoring to identify temporal lobe epilepsy (TLE) foci, achieving up to 96.7% accuracy on the Bonn dataset and high sensitivity on clinical EEG recordings. Recent reviews consistently report that deep learning architectures—particularly CNNs, RNNs, and hybrid models—achieve state-of-the-art performance in seizure detection and other EEG-related tasks, although challenges related to interpretability, robustness across patients, and clinical translation remain [13,16].
Unsupervised and semi-supervised deep learning approaches address the scarcity of labeled EEG data. Autoencoders are widely used for denoising, artifact removal, and seizure onset detection. Variational Autoencoders (VAEs) provide probabilistic latent representations that enable the detection of rare epileptiform discharges [52,53]. Generative Adversarial Networks (GANs) are employed for data augmentation and anomaly detection, improving robustness against class imbalance [54]. Transformer-based anomaly detectors leverage self-attention mechanisms to capture long-range temporal dependencies, enhancing sensitivity to subtle events [55]. A recent review categorizes transformer variants for EEG—including Time-Series, Vision, and Graph Attention Transformers—and reports successful applications across motor imagery, emotion recognition, and seizure detection [56].

2.4. Algorithms Dedicated to Outlier Detection

A line of work focuses specifically on EEG outlier detection under non-stationarity, low SNR, and high inter-subject variability. Smart K-Nearest Neighbor Outlier Detection (SKOD) combines k-NN with local structure analysis to avoid manual parameter tuning, though it can raise false alarms on ambiguous or noisy data [57]. Median/MAD-based pipelines detect noisy channels and outlier epochs efficiently but require threshold calibration [58]. Boundary and density methods (One-Class SVM and LOF) and streaming ensembles (Random Cut Forest) have been adapted for EEG anomaly detection, including real-time monitoring [43,59,60]. Deep approaches (autoencoders, VAEs, ensembles of autoencoders, and GANs) broaden the landscape and reduce false positives across heterogeneous populations [54,61]. Transformers continue this trend by explicitly modeling long-range temporal structure [55].
From a clinical perspective, dedicated outlier detection approaches are valuable because they can capture rare epileptiform discharges and subtle anomalies without requiring exhaustive manual labeling, thereby reducing neurologists’ workload [21,57].
Table 1 and Table 2 provide a synthesis of prior studies, highlighting both classical statistical methods and approaches based on deep learning and hybrid frameworks. This progression illustrates the evolution of EEG outlier detection techniques—from simple statistical strategies, through density-based methods and random forests, to modern transformer-based models.
These studies span a wide range of datasets (simulated, clinical, TUH EEG, CHB-MIT, and intracranial EEG), underscoring the need for universal and robust tools. To enhance interpretability, each study is accompanied by a brief note summarizing the primary challenge or limitation addressed by the proposed method.
Classical methods (One-Class SVM, LOF, and MAD) remain attractive for their efficiency and interpretability but are sensitive to artifacts and heterogeneity. Tree ensembles (Isolation/Random Cut Forest) improve scalability and robustness yet require careful tuning. Deep learning methods (autoencoders, VAEs, GANs, and transformers) model nonlinear dynamics and long-range dependencies more effectively, but demand larger datasets and pose interpretability challenges. No single technique dominates across conditions and datasets, motivating hybrid and ensemble frameworks and explainable AI for clinical deployment [34,64].
Building on these observations, the following section proposes a procedure for constructing ensemble classifiers that integrate the strengths of individual methods while mitigating their limitations.

3. Basic Methods

3.1. Baseline Classifiers

The Support Vector Machine (SVM) constructs decision boundaries that separate samples of different classes. This approach was originally introduced by Cortes and Vapnik as Support Vector Networks [65] and later formalized in Vapnik’s Statistical Learning Theory [66]. To handle nonlinear data, kernel functions map input data into a higher-dimensional space, where complex patterns can be separated by linear hyperplanes. The support vectors are the data points closest to the decision boundary, and their position directly influences its final shape. Key parameters to be tuned include the following:
  • Kernel—a function that transforms the input space; commonly used kernels include linear, polynomial, radial basis function (RBF), and sigmoid.
  • C—a regularization parameter controlling the trade-off between maximizing the margin and minimizing classification errors on the training data. Higher values of C increase model fit but may lead to overfitting.
  • Gamma—controls the shape of the decision boundary. A high gamma results in a more complex boundary, while a low gamma produces a smoother boundary.
SVMs are particularly effective in high-dimensional spaces and are resistant to overfitting with proper parameter tuning. However, their training can be computationally expensive for large datasets, and kernel selection can be nontrivial.
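A minimal scikit-learn sketch of an RBF-kernel SVM with the C and gamma parameters discussed above, fitted to nonlinearly separable toy data standing in for EEG feature vectors (not the datasets used in this study):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Nonlinearly separable toy data: a linear hyperplane cannot separate it,
# but the RBF kernel maps it into a space where one can
X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Explicit C (margin/error trade-off) and gamma (boundary flexibility)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma=0.5))
clf.fit(X_tr, y_tr)
print(round(clf.score(X_te, y_te), 2))
```

Raising gamma or C in this sketch tightens the boundary around individual training points, illustrating the overfitting risk noted above.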
The second baseline classifier is the k-Nearest Neighbors (k-NN) algorithm, which assigns each sample to the class most frequently represented among its k nearest neighbors [67]. The choice of k is typically determined empirically, as it directly influences the smoothness of the decision boundary. Different distance metrics can be applied, such as Euclidean, Manhattan, or Minkowski distance. k-NN is straightforward to implement and requires no explicit training phase, which makes it computationally efficient at model construction. However, predictions can be costly at inference time, as distances to all training samples must be computed. Moreover, performance is highly sensitive to the choice of distance metric, the number of neighbors, and the data distribution.
The third baseline classifier is the decision tree (DT-CART, Classification and Regression Trees) [68]. DT-CART solves classification tasks by recursively partitioning the feature space using binary splits guided by impurity measures such as entropy or the Gini index. At each step, the algorithm selects the feature and threshold that maximize information gain or minimize impurity in the resulting subsets. Terminal nodes (leaves) correspond to decision classes [69]. DT-CART trees are binary, meaning that each internal node generates exactly two branches. Decision trees are easy to interpret and visualize, which makes them highly transparent. However, they tend to overfit the training data, especially when deep trees are used without pruning or regularization.
These baseline classifiers (SVM, k-NN, and DT-CART) have been widely applied in biomedical signal processing, including EEG-based epilepsy research. Their popularity stems from their ability to capture nonlinear patterns, handle limited training samples, and provide interpretable decision rules, which are essential in clinical practice [11,13]. They have been successfully used in seizure detection and epilepsy diagnosis, where interpretability and computational efficiency make them valuable for supporting clinical decision-making. Base models were included in the ensemble only if they achieved an outlier-class F1-score greater than 0.60 on validation folds (Section 5.5). This threshold provides an objective criterion that ensures the inclusion of only sufficiently accurate models, reducing the influence of weak learners and minimizing the risk of overfitting. The specific choice of baseline algorithms (for example, SVM, k-NN, or DT-CART) may vary depending on the application domain or data characteristics. What is fundamental to the proposed framework is the performance-based selection rule itself, rather than the particular set of base classifiers. In this study, DT-CART did not meet the F1 > 0.60 requirement on either dataset and was therefore excluded from all homogeneous and heterogeneous ensembles.
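The eligibility rule can be sketched as follows: each candidate base model is scored with cross-validated out-of-fold predictions, and only models whose outlier-class F1 exceeds the threshold are admitted. The toy data and hyperparameters are illustrative, so the admitted set on this synthetic example need not match the paper's result (where DT-CART was excluded):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Imbalanced toy data: class 1 plays the role of the rare outlier class
X, y = make_classification(n_samples=600, weights=[0.8, 0.2],
                           n_informative=4, random_state=0)

candidates = {
    "svm": SVC(kernel="rbf"),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "dt-cart": DecisionTreeClassifier(random_state=0),
}

F1_MIN = 0.60  # pre-registered eligibility threshold from the paper
scores, eligible = {}, {}
for name, model in candidates.items():
    pred = cross_val_predict(model, X, y, cv=5)     # out-of-fold predictions
    scores[name] = f1_score(y, pred, pos_label=1)   # F1 on the outlier class
    if scores[name] > F1_MIN:
        eligible[name] = model

print({n: round(s, 2) for n, s in scores.items()})
```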

3.2. Ensemble Classifiers

In this study, ensemble classifiers were used for outlier detection. These classifiers are constructed from a set of n base models—either homogeneous (of the same type) or heterogeneous (of different types). In both cases, models with the highest validation performance were selected for inclusion. The strength of ensemble classifiers lies in aggregating the predictions of multiple base learners, which improves generalization, increases accuracy, and reduces the risk of misclassification.
Three ensemble aggregation strategies were implemented:
  • Bagging—trains multiple models on different random subsets of the training data. The final prediction is determined by majority voting (for classification) or averaging (for regression).
  • Stacking—combines the outputs of base models using a meta-model (e.g., logistic regression) that learns how to optimally integrate the individual predictions [36].
  • Majority voting:
    Hard voting—selects the class with the most votes.
    Soft voting—aggregates the class probabilities predicted by each base model; the final decision is based on the combined probabilities [62].
Bagging (bootstrap aggregating) was originally introduced by Breiman [70] as a variance reduction strategy for unstable classifiers such as decision trees. Stacking, or stacked generalization, was first proposed by Wolpert [71], enabling the use of meta-learners to combine predictions from multiple base models. The theoretical foundations of ensemble learning and voting schemes are comprehensively discussed by Dietterich [22].
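All three aggregation strategies have direct scikit-learn counterparts; the following minimal sketch fits them on synthetic data (the base learners and hyperparameters are illustrative, not those tuned in the experiments):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# probability=True enables soft voting and probability-based stacking
base = [("svm", SVC(probability=True)), ("knn", KNeighborsClassifier())]

ensembles = {
    # Bagging: many k-NN models trained on bootstrap resamples
    "bagging": BaggingClassifier(KNeighborsClassifier(),
                                 n_estimators=10, random_state=0),
    # Stacking: logistic-regression meta-model over the base predictions
    "stacking": StackingClassifier(base, final_estimator=LogisticRegression()),
    # Soft voting: averages the predicted class probabilities
    "voting": VotingClassifier(base, voting="soft"),
}

for name, ens in ensembles.items():
    print(name, round(ens.fit(X_tr, y_tr).score(X_te, y_te), 2))
```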

4. Proposed Approach

Previous studies have demonstrated the feasibility of applying classical machine learning approaches such as k-NN, SVM, and decision trees for outlier detection in EEG signals, showing promising results in identifying epileptiform discharges and seizure-related anomalies [11,72]. Recent reviews further highlight the growing importance of anomaly and outlier detection in EEG analysis, particularly in the context of epilepsy diagnosis and seizure prediction [13]. Building upon these findings, the proposed approach develops ensemble classifier models using three aggregation techniques—bagging, stacking, and majority voting—selected for their complementary mechanisms of combining predictions and their potential to mitigate the limitations of individual base classifiers.
Bagging (bootstrap aggregating) improves prediction stability by reducing model variance, which is particularly beneficial when dealing with non-stationary EEG signals and the presence of noise and outliers. Stacking, as a meta-learning approach, constructs a higher-level model (meta-model) that learns how to optimally combine the predictions of base classifiers, thereby capturing complex relationships among them. Majority voting—in both hard and soft variants—provides a simple yet effective strategy for integrating decisions, especially when using heterogeneous base learners. Applying these three aggregation methods enables a comparative evaluation of their effectiveness in EEG outlier detection across homogeneous and heterogeneous ensemble configurations.
For ensemble construction, classical base classifiers were employed: k-NN, SVM, and DT-CART. The key advantage of the proposed framework lies in its automated ensemble generation, which minimizes subjective parameter tuning and ensures systematic model construction. By leveraging the diversity of multiple base learners, the ensembles exhibit higher robustness to noise, artifacts, and inter-subject variability than individual classifiers. This property is clinically significant, as it enhances the reliability of epileptiform abnormality detection in realistic EEG recordings.
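For illustration, the three aggregation strategies can be sketched with scikit-learn. The synthetic data, base-model settings, and the logistic-regression meta-learner below are placeholders for exposition, not the configurations used in this study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

base = [("svm", SVC(probability=True, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5))]

# Soft voting: average the predicted class probabilities of the base models.
voting = VotingClassifier(estimators=base, voting="soft")

# Stacking: a meta-model learns to combine the base-model outputs.
stacking = StackingClassifier(estimators=base, final_estimator=LogisticRegression())

# Bagging: many k-NN models trained on bootstrap resamples of the data.
bagging = BaggingClassifier(KNeighborsClassifier(n_neighbors=5),
                            n_estimators=10, random_state=0)

for model in (voting, stacking, bagging):
    model.fit(X, y)
```

Note that soft voting requires base models that expose probability estimates, hence `probability=True` for the SVC.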

4.1. Construction of Homogeneous Ensemble Classification Models

The process of building homogeneous ensemble classification models was carried out automatically using base classifiers. Let B = { b 1 , b 2 , … , b m } denote the set of base classifier types (e.g., k-NN, SVM, and DT-CART), and let n represent the number of instances (models) for each type. Each type of classifier is treated as a separate base variant for which individual homogeneous ensembles are constructed.
The algorithm consists of the following steps:
  • Configuration generation:
    For each base classifier b ∈ B , define hyperparameter spaces P grid and P random using grid search and random search techniques. Based on these, generate a set of possible hyperparameter configurations.
  • Training base models:
    Each hyperparameter configuration is used to train a model on the training dataset. Model performance and outlier detection capability are assessed using cross-validation.
  • Model evaluation and selection:
    For each trained model, compute performance metrics (accuracy A C C , precision P, recall, and F 1 ). Models are retained for ensemble construction if they achieve an outlier-class F 1 > F 1 , min on the validation fold (default F 1 , min = 0.60 ). Accuracy is reported for completeness but is not used for selection due to class imbalance.
  • Construction of homogeneous ensemble classifiers:
    For each base classifier type b i ∈ B , select n trained models with different hyperparameter configurations that satisfy F 1 > F 1 , min (outlier class). These models form a homogeneous ensemble composed solely of classifiers of the same type, denoted as follows: Z i = { b i 1 , b i 2 , … , b i n } , where b i j is the instance of classifier b i with the j-th hyperparameter configuration. Each ensemble Z i is combined using an aggregation method A ∈ { stacking , bagging , majority voting } to create the final ensemble models.
  • Testing of ensemble models:
    For each aggregation method A, compute the performance metrics. Then, the best combination of models and aggregation strategy is selected from A = { a 1 , a 2 , a 3 } , where a 1 = { bagging } , a 2 = { stacking } , and a 3 = { majority voting } .
The graphical representation of the procedure for constructing homogeneous ensemble classifiers is shown in Figure 1, and an illustrative example is provided in Example 1.
Example 1.
  • Consider a set of base classifier types: B = { b 1 , b 2 , b 3 } ,
  • where b 1 = { k - N N } , b 2 = { SVM } , and b 3 = { DT - CART } .
For each hyperparameter configuration, the validation F 1 (outlier class) was determined:
  • For k-NN, the obtained F 1 -scores were
    F 1 ( k-NN ) = { 0.65 , 0.68 , 0.79 , … , 0.88 , 0.91 , 0.95 } .
  • For SVM, they were
    F 1 ( SVM ) = { 0.663 , 0.679 , … , 0.712 , 0.797 , 0.799 } .
  • For DT-CART, they were
    F 1 ( DT-CART ) = { 0.450 , 0.488 , 0.489 , … , 0.491 , 0.495 , 0.499 } .
Assume the number of models in the ensemble is n = 2 and the minimum acceptable F 1 is F 1 , min = 0.60 .
  • For the k-NN classifier, the two best-performing models are selected: 0.95 and 0.91 .
  • For SVM, 0.799 and 0.797 are chosen.
  • For DT-CART , no configuration meets the condition F 1 > 0.60 , so no ensemble is formed.
For n = 3 , the possible homogeneous ensembles are as follows:
  • k-NN: { 0.88 , 0.91 , 0.95 } ;
  • SVM: { 0.712 , 0.797 , 0.799 } ;
  • DT-CART still does not meet the criterion and is excluded.
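The screening rule illustrated above can be expressed as a small helper function; the name `select_top_n` is ours and not part of the original implementation.

```python
def select_top_n(f1_scores, n, f1_min=0.60):
    """Return the n best validation F1 scores strictly above f1_min;
    if fewer than n configurations qualify, no ensemble is formed."""
    eligible = sorted((s for s in f1_scores if s > f1_min), reverse=True)
    return eligible[:n] if len(eligible) >= n else []

# Validation scores from Example 1:
knn = [0.65, 0.68, 0.79, 0.88, 0.91, 0.95]
svm = [0.663, 0.679, 0.712, 0.797, 0.799]
cart = [0.450, 0.488, 0.489, 0.491, 0.495, 0.499]
```

With n = 2 this reproduces the selections above: {0.95, 0.91} for k-NN, {0.799, 0.797} for SVM, and an empty set for DT-CART, since none of its configurations exceeds 0.60.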
The presented homogeneous ensemble classifier construction approach allows for systematic and reproducible model generation without manual parameter tuning, increasing both the objectivity and efficiency of the entire process. The scheme of homogeneous ensemble classification is illustrated in Figure 1. From a clinical standpoint, homogeneous ensembles provide a systematic way to stabilize the performance of individual classifiers such as SVM or k-NN, reducing the variability caused by parameter choices. This consistency is particularly relevant in EEG analysis, where noisy or artifact-contaminated segments can otherwise lead to unstable predictions.

4.2. Heterogeneous Ensemble Classification Models

Unlike homogeneous ensembles, where all models in the ensemble originate from the same base classifier type, heterogeneous classification involves combining classifiers of different types within a single ensemble, which may increase its diversity and effectiveness. As in Section 4.1, the process of constructing heterogeneous ensemble classifiers begins with the generation and evaluation of base classifiers.
Let B = { b 1 , b 2 , … , b m } denote the set of base classifier types (e.g., k-NN, SVM, and DT-CART) and n the number of model instances for each type. The best-performing base classifiers (i.e., those that achieve an outlier-class F 1 > F 1 , min ; default F 1 , min = 0.60 ) are selected and combined into ensembles consisting of classifiers of different types (e.g., SVM, k-NN, and DT-CART).
The process consists of the following steps:
  • Configuration Generation:
    As in Section 4.1, the hyperparameter space is defined for each base classifier type. Configurations are generated using grid search ( P grid ) and random search ( P random ) methods.
  • Training Base Models:
    Models are trained on the training dataset using cross-validation to evaluate classification performance and outlier detection capability. Hyperparameter spaces may vary significantly between classifier types (e.g., C and kernel for SVM, k for k-NN, and tree depth for DT-CART).
  • Model Evaluation and Selection:
    Each model is evaluated using the metrics described in Section 5.5. Models satisfying F 1 > F 1 , min (outlier class) proceed to the next stage, where F 1 , min is determined empirically (e.g., 0.60 or higher).
  • Construction of the Heterogeneous Ensemble:
    For a given number of models n, the best-performing base classifiers are selected, allowing diversity in model types. The resulting ensemble includes classifiers of different types (e.g., k-NN + SVM + DT-CART). An aggregation method is then chosen, A ∈ { bagging , stacking , majority voting } .
  • Testing Ensemble Models:
    Heterogeneous ensemble models are evaluated on the test set using the defined metrics (e.g., accuracy, recall, area under the precision–recall curve (AUPRC), and F 1 -score). The effectiveness of different aggregation methods is also analyzed.
To account for class imbalance, selection relies on the outlier-class F 1 (default threshold 0.60 ); accuracy is reported for context only. The procedure for constructing heterogeneous ensemble classifiers is presented in Example 2.
Example 2.
  • Let us assume that the following F 1 values were obtained after evaluating the base classifiers:
  • F 1 ( k-NN ) = { 0.75 , 0.79 , 0.81 , … , 0.88 , 0.95 } ;
  • F 1 ( SVM ) = { 0.663 , 0.679 , … , 0.797 , 0.799 } ;
  • F 1 ( DT-CART ) = { 0.450 , 0.488 , … , 0.495 , 0.499 } .
For a target number of models n = 2 , the best-performing base model of each qualifying classifier type is selected first, ensuring type diversity, provided that F 1 > 0.60 . In this case, the selected models are
  • k-NN with F 1 of 0.95 ;
  • SVM with F 1 of 0.799 .
Based on these models, a heterogeneous ensemble classifier is created: k-NN + SVM.
For n = 3 , the selection is extended with the next-best model satisfying the condition F 1 > 0.60 :
  • k-NN with F 1 of 0.95 ;
  • k-NN with F 1 of 0.88 ;
  • SVM with F 1 of 0.799 .
As a result, the final ensemble classifier consists of two k-NN models and one SVM model.
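One selection rule consistent with Example 2 is to take the best qualifying model of each classifier type first (guaranteeing type diversity) and then fill any remaining slots with the next-best qualifying models overall. The sketch below, with hypothetical function and variable names, implements that reading.

```python
def select_heterogeneous(scores_by_type, n, f1_min=0.60):
    """Best qualifying model per classifier type first, then the
    next-best qualifying models overall, up to n models in total."""
    pool = sorted(((f1, name) for name, scores in scores_by_type.items()
                   for f1 in scores if f1 > f1_min), reverse=True)
    per_type, seen = [], set()
    for f1, name in pool:          # one representative of each type
        if name not in seen:
            per_type.append((f1, name))
            seen.add(name)
    rest = [p for p in pool if p not in per_type]
    return [(name, f1) for f1, name in (per_type + rest)[:n]]

# Validation scores from Example 2 (DT-CART never passes the threshold):
scores = {"k-NN": [0.75, 0.79, 0.81, 0.88, 0.95],
          "SVM": [0.663, 0.679, 0.797, 0.799],
          "DT-CART": [0.450, 0.488, 0.495, 0.499]}
```

For n = 2 this yields k-NN (0.95) + SVM (0.799); for n = 3 it adds the second-best k-NN (0.88), matching the ensembles described above.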
Clinically, heterogeneous ensembles are especially valuable because they integrate complementary decision strategies from different classifiers. This diversity enhances robustness to inter-subject variability and heterogeneous recording conditions, making them well-suited for real-world EEG applications.

5. Research Methodology

The experiments were conducted using the Google Colab platform, with Python 3.12 as the programming language. The following libraries were used: MNE, NumPy, Pandas, SciPy, and Scikit-learn.

5.1. Datasets

This study utilizes two publicly available EEG datasets that differ significantly in terms of recording conditions, population, equipment, and acquisition protocols—thereby enabling robust validation of the proposed models under both controlled and real-world conditions.

5.1.1. Bonn EEG Dataset

The first dataset originates from the University of Bonn and is widely regarded as a benchmark in EEG-based epilepsy research [23]. It comprises a total of 500 single-channel EEG segments, organized into five subsets labeled A–E. Each subset contains 100 segments of equal length (23.6 s), sampled at 173.61 Hz and digitized using a 12-bit A/D converter. The data were originally recorded from both healthy volunteers and epilepsy patients.
  • Subsets A and B were recorded extracranially from healthy volunteers using scalp electrodes. The distinction between the two lies in the participants’ eye status: eyes open (A) and eyes closed (B).
  • Subset C includes interictal EEG signals from epilepsy patients, recorded intracranially from the hippocampal formation in the non-epileptogenic hemisphere.
  • Subset D contains interictal recordings obtained from within the epileptogenic zone.
  • Subset E consists of ictal segments—recordings captured during actual epileptic seizures.
In total, the dataset provides 500 carefully selected, artifact-free single-channel segments (4097 samples each), corresponding to approximately 196 min of EEG recordings. All signals are balanced across subsets and of equal length, enabling controlled experimentation and reproducibility of results.
Epoch labels (A–E) follow the original Bonn protocol and were not modified. Each 23.6 s epoch was partitioned into overlapping 2 s windows (50% overlap), with each window inheriting its parent epoch’s label for training and evaluation.
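Under the stated parameters (173.61 Hz sampling, 2 s windows, 50% overlap), each 4097-sample epoch yields 22 windows of 347 samples, with every window inheriting the parent epoch's label. A minimal NumPy sketch of this windowing (the function name is ours):

```python
import numpy as np

def sliding_windows(signal, fs, win_s=2.0, overlap=0.5):
    """Split a 1-D signal into fixed-length overlapping windows."""
    win = int(win_s * fs)                    # 347 samples at 173.61 Hz
    hop = int(win * (1.0 - overlap))         # 50% overlap -> 173-sample hop
    starts = range(0, len(signal) - win + 1, hop)
    return np.stack([signal[s:s + win] for s in starts])

epoch = np.random.default_rng(0).standard_normal(4097)  # one Bonn-length epoch
windows = sliding_windows(epoch, fs=173.61)             # shape (22, 347)
```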

5.1.2. Guinea-Bissau and Nigeria Epilepsy (GBNE) Dataset

The second dataset, known as the Guinea-Bissau and Nigeria Epilepsy (GBNE) dataset [24], was collected under field conditions in rural and semi-urban areas of West Africa. It comprises EEG recordings from 97 participants: 51 patients diagnosed with epilepsy and 46 healthy controls. The data were acquired using the EMOTIV EPOC+ wireless headset, which features 14 channels arranged according to the international 10–20 system. Signals were recorded in a resting state with eyes closed, sampled at 128 Hz with 14-bit resolution, and stored with a session duration of approximately five minutes per subject. In total, this amounts to approximately 8 h of multichannel EEG recordings.
The dataset is inherently imbalanced, with slightly more epileptic than healthy recordings. Unlike the artifact-free Bonn dataset, GBNE introduces realistic challenges, including motion artifacts, environmental noise, and variability in signal quality due to field conditions and low-cost equipment. These characteristics make GBNE particularly valuable for testing the robustness and generalization ability of automated outlier detection models in practical, resource-limited clinical scenarios.
Table 3 summarizes the key characteristics of the Bonn and GBNE datasets, highlighting their complementary roles in evaluating algorithms under both controlled and real-world conditions.
As shown in Table 3, the two datasets complement each other: Bonn serves as a clean benchmark for reproducible testing, whereas GBNE reflects the noisy and heterogeneous conditions of clinical practice. The rationale for selecting these datasets lies in their complementary characteristics: the Bonn dataset, with its high-quality and well-annotated recordings, is optimal for evaluating model performance under controlled conditions. In contrast, the GBNE dataset captures the complexity and variability of real-world scenarios, making it suitable for testing model generalization and robustness in low-resource environments. This combination supports a comprehensive assessment of the proposed methods, spanning both idealized and challenging EEG analysis contexts.
  • Ground-truth labeling.
    The Bonn dataset includes canonical subset labels (A–E) assigned by clinical experts. For the GBNE dataset, ground-truth labels were derived from clinical diagnoses (epilepsy vs. control) provided in the original metadata. No additional manual re-annotation or relabeling was performed in this study.

5.2. Representative EEG Examples

Figure 2, Figure 3, Figure 4 and Figure 5 present representative segments from both benchmark datasets, highlighting the contrast between interictal activity in the Bonn EEG dataset and realistic, artifact-prone scalp recordings in the GBNE dataset.
As shown in Figure 2, these examples illustrate interictal patterns observed in the non-epileptogenic zone, where background activity remains mostly regular, with only occasional isolated spikes.
Figure 3 illustrates interictal activity recorded from the epileptogenic zone, contrasting with the more stable background observed in Subset C.
To complement the intracranial recordings shown in Figure 2 and Figure 3, Figure 4 and Figure 5 present representative scalp EEG segments from the GBNE dataset, which was collected under real-world field conditions using low-cost wearable equipment. Unlike the artifact-free Bonn EEG dataset, GBNE signals exhibit varying levels of contamination arising from motion, eye blinks, and muscle activity. These examples illustrate the challenges faced by automated outlier detection systems in distinguishing pathological activity from non-neural disturbances, particularly in environments characterized by limited hardware stability and recording noise.
Figure 4 shows EEG segments with mild to moderate distortions caused primarily by slow electrode drift and subtle motion artifacts, whereas Figure 5 depicts segments with pronounced electromyographic (EMG) bursts and high-amplitude artifacts related to facial movement or head motion. Together, these examples illustrate the spectrum of real-world noise patterns that may mimic epileptiform activity and highlight the need for models capable of robust generalization to heterogeneous EEG data.
The above visualizations underline the physiological and technical heterogeneity of the EEG data analyzed in this work. While the Bonn dataset provides clean, clinically curated intracranial recordings suitable for controlled benchmarking, the GBNE dataset captures the complexity of real-world scalp EEG, including motion and muscular artifacts. These visual observations complement the quantitative results presented in Section 6, providing intuitive confirmation of the differences between controlled and real-world EEG conditions.

5.3. Preprocessing

To ensure consistency and comparability of results, both datasets were subjected to the same preprocessing procedure. First, a fourth-order Butterworth bandpass filter (0.1–45 Hz) was applied to suppress low-frequency drifts and high-frequency noise. Next, Z-score normalization was performed independently on each EEG channel to standardize amplitude values and reduce inter-subject variability. Finally, the EEG signals were segmented into overlapping 2 s windows with 50% overlap. This approach improved temporal resolution, increased the number of training examples, and minimized information loss at segment boundaries.
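A minimal sketch of this per-channel pipeline with SciPy; the second-order-sections filter form is our implementation choice for numerical robustness and is not stated in the original.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def preprocess_channel(x, fs):
    """Fourth-order Butterworth bandpass (0.1-45 Hz), then z-score."""
    sos = butter(4, [0.1, 45.0], btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, x)           # zero-phase filtering
    return (filtered - filtered.mean()) / filtered.std()

x = np.random.default_rng(0).standard_normal(4097)  # one raw channel
clean = preprocess_channel(x, fs=173.61)
```

Windowing into overlapping 2 s segments then follows on the filtered, normalized signal.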
Following preprocessing, each 2 s EEG segment was converted into a feature vector combining temporal, statistical, and spectral information. The extracted features included the following:
  • Time-domain statistics: Mean and variance;
  • Hjorth parameters: Activity, mobility, and complexity;
  • Spectral band power across standard EEG frequency ranges computed using Welch’s method;
  • Entropy-based measures: Shannon and spectral entropy.
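The feature set above can be sketched as follows. The band edges and histogram bin count are common defaults assumed here, as the exact values are not specified in the text.

```python
import numpy as np
from scipy.signal import welch

BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 45)}  # assumed band edges (Hz)

def hjorth_parameters(x):
    """Hjorth activity, mobility, and complexity of a 1-D signal."""
    dx, ddx = np.diff(x), np.diff(np.diff(x))
    activity = np.var(x)
    mobility = np.sqrt(np.var(dx) / activity)
    complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility
    return activity, mobility, complexity

def band_power(x, fs):
    """Spectral band power via Welch's method (summed periodogram bins)."""
    f, pxx = welch(x, fs=fs, nperseg=min(len(x), 256))
    return {name: pxx[(f >= lo) & (f < hi)].sum()
            for name, (lo, hi) in BANDS.items()}

def shannon_entropy(x, bins=32):
    """Shannon entropy of the amplitude distribution (bits)."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

fs = 173.61                                            # Bonn sampling rate
window = np.sin(2 * np.pi * 10 * np.arange(347) / fs)  # 10 Hz test tone
features = (*hjorth_parameters(window),
            *band_power(window, fs).values(),
            shannon_entropy(window))                   # feature vector
```

For the 10 Hz test tone, the alpha band (8-13 Hz) carries the largest power, as expected.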
All extracted features were standardized using z-score normalization prior to model training and evaluation. This ensured that the feature vectors used by SVM, k-NN, and DT-CART classifiers captured both temporal and frequency-domain variability, allowing for robust differentiation between epileptiform and normal EEG activity.
After feature extraction and normalization, the EEG segments were labeled to distinguish epileptiform (outlier) from normal (non-outlier) activity in accordance with the clinical annotations of each dataset. Specifically, for the Bonn dataset, segments from Subsets C–E (interictal hippocampal, interictal from the epileptogenic zone, and ictal activity) were treated as epileptiform outliers, while Subsets A–B (healthy scalp EEG with eyes open/closed) represented normal background activity. For the GBNE dataset, outliers corresponded to EEG segments obtained from patients diagnosed with epilepsy and non-outliers to recordings from healthy controls. This consistent labeling ensured reproducibility and provided a unified binary framework for distinguishing epileptiform (outlier) versus normal (non-outlier) EEG segments.

5.4. Hyperparameter Space

The experiments were conducted following the procedure described in Section 4, starting with the definition of the hyperparameter space for each base classifier. The classification algorithms considered include k-Nearest Neighbors (k-NN), Support Vector Machines (SVMs), and DT-CART decision trees. Hyperparameter sets were automatically generated using the grid search method.
Support Vector Machine (SVM):
  • C = { 0.1 , 1 , 10 , 100 }
    This parameter controls the trade-off between model complexity and classification error. It helps to avoid overfitting while minimizing misclassification, striking a balance between margin width and training accuracy.
  • k e r n e l = { linear , poly , rbf }
    This parameter specifies the kernel function used to transform the data into a higher-dimensional space. The kernel can be selected from the following: linear, radial basis function (RBF), or polynomial.
Decision trees (DT-CART):
  • m a x _ d e p t h = { 10 , 20 , 30 }
    This parameter is the maximum depth of the decision tree, defining the number of levels allowed for splits.
  • m i n _ s a m p l e s _ s p l i t = { 2 , 5 , 10 , 20 }
    This parameter is the minimum number of samples required to split an internal node.
  • c r i t e r i o n = { gini , entropy }
    This is the splitting criterion used to evaluate the quality of a split. The options include Gini impurity and Shannon entropy, both measuring node impurity.
k-Nearest Neighbors (k-NN):
  • k = { 3 , 5 , 10 , 15 , 20 }
    This is the number of neighbors considered during classification. This value affects the smoothness of the decision boundary.
  • m e t r i c = { euclidean , manhattan , minkowski }
    This is the distance metric used to compute the similarity between samples. Euclidean distance measures the straight-line distance, Manhattan distance computes the sum of absolute differences across dimensions, and Minkowski is a generalized metric encompassing both Euclidean and Manhattan distances, allowing parameterized weighting.
  • w e i g h t s = { uniform , distance }
    This defines how the neighbors contribute to the classification. Uniform assigns equal weight to all neighbors, while distance gives more influence to closer neighbors.
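The grids above map directly onto scikit-learn's `GridSearchCV`; note that scikit-learn names the parameter k as `n_neighbors`. The synthetic data below stand in for the EEG feature vectors and are not part of the study.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hyperparameter grids mirroring Section 5.4.
search_spaces = [
    (SVC(), {"C": [0.1, 1, 10, 100],
             "kernel": ["linear", "poly", "rbf"]}),
    (DecisionTreeClassifier(random_state=0),
     {"max_depth": [10, 20, 30],
      "min_samples_split": [2, 5, 10, 20],
      "criterion": ["gini", "entropy"]}),
    (KNeighborsClassifier(),
     {"n_neighbors": [3, 5, 10, 15, 20],
      "metric": ["euclidean", "manhattan", "minkowski"],
      "weights": ["uniform", "distance"]}),
]

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
best = {}
for est, grid in search_spaces:
    # F1 is used as the selection objective, consistent with Section 5.6.
    gs = GridSearchCV(est, grid, scoring="f1", cv=5).fit(X, y)
    best[type(est).__name__] = (gs.best_params_, gs.best_score_)
```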

5.5. Evaluation Metrics

The performance of all classifiers was assessed using standard metrics derived from the confusion matrix: accuracy ( A C C ), precision (P), recall (R), specificity ( S P ), and the F 1 -score. These measures quantify how accurately the models distinguish between normal and epileptic EEG segments; their mathematical definitions are summarized in Table 4.
High recall is clinically important for detecting epileptic discharges, while precision helps reduce false alarms. The F 1 -score provides a balanced measure suitable for imbalanced EEG datasets. All results were obtained using 10-fold stratified cross-validation, with preprocessing performed independently within each training fold to avoid data leakage.
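The metrics follow the standard confusion-matrix definitions; the helper below (name and counts are illustrative, not results from the study) makes them explicit.

```python
def confusion_metrics(tp, fp, fn, tn):
    """ACC, P, R, SP, and F1 from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"ACC": acc, "P": precision, "R": recall,
            "SP": specificity, "F1": f1}

# Illustrative counts only:
metrics = confusion_metrics(tp=40, fp=10, fn=5, tn=45)
```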

5.6. Considerations for Imbalanced Data

A key challenge in EEG outlier detection is class imbalance: anomalous segments (e.g., epileptic events) are much rarer than normal activity. Relying on overall accuracy can therefore be misleading, as high A C C may co-occur with poor sensitivity to rare events [73].
To address this, the study adopted an evaluation and selection protocol that is robust to imbalance. All results are reported using stratified 10-fold cross-validation, and we prioritize imbalance-aware measures—precision, sensitivity (recall), and the F 1 -score—over accuracy when comparing models. Model selection within the automated framework uses an F 1 -based criterion (see Section 5.5). Where applicable (e.g., SVM), class weights were enabled to emphasize the minority class without altering the original data distribution.
Importantly, we did not apply synthetic oversampling (e.g., SMOTE) in the final protocol. With overlapping 2 s windows, oversampling risks information leakage between training and validation/test folds and may distort the clinically meaningful class prevalence. Our reported precision–recall trade-offs thus reflect the true distribution of events in each dataset. A systematic, leakage-safe study of dataset-adaptive oversampling techniques is left for future work.
In imbalanced EEG classification, overall accuracy can be misleading because it conflates majority-class performance with true positive detection. Recent studies recommend imbalance-aware metrics—particularly the class-specific F 1 -score and, when probabilistic scores are available, the area under the precision–recall curve (AUPRC)—as more informative objectives for both model selection and reporting [49,50]. Following this guidance, we use the outlier-class F 1 as our selection criterion (default threshold F 1 > 0.60 ) and report AUPRC where applicable, in addition to accuracy for context.
To prevent information leakage, a grouped, stratified 10-fold cross-validation scheme was applied. For the Bonn dataset, all 2 s windows inherited the label of their parent 23.6 s epoch and were assigned to the same fold (epoch-wise grouping). For the GBNE dataset, windows originating from the same subject were grouped together to ensure subject-wise separation between training and test folds.

5.7. Paired Significance Testing via McNemar’s Test

To assess whether the ensemble’s improvement over the best base classifier is statistically significant, we applied McNemar’s test on paired out-of-fold (OOF) predictions generated under identical cross-validation splits. Let y i be the ground-truth label for instance i, and let y ^ i ( E ) and y ^ i ( B ) denote the predictions of the ensemble and the best base model, respectively, with both predictions obtained for the same held-out instance (OOF). We construct the 2 × 2 contingency table over N OOF instances as shown in Table 5.
McNemar’s test focuses on the discordant pairs n 01 and n 10 . The continuity-corrected chi-square statistic is defined in Equation (1) with one degree of freedom:
χ 2 = ( | n 01 − n 10 | − 1 ) 2 / ( n 01 + n 10 ) .   (1)
For small n 01 + n 10 , we additionally report the exact binomial p-value, testing whether the probability of a discordant outcome favors one model over the other under the null hypothesis H 0 : Pr ( Ensemble wins ) = 0.5 . We apply the test separately per dataset, Bonn and GBNE. A higher n 01 than n 10 indicates that the ensemble corrects more of the base model’s errors than vice versa. Significant p-values confirm that the observed superiority is unlikely due to chance.
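Both the continuity-corrected statistic of Equation (1) and the exact two-sided binomial p-value can be computed directly with SciPy; the discordant counts below are illustrative, not results from the study.

```python
from scipy.stats import binomtest, chi2

def mcnemar_test(n01, n10):
    """Continuity-corrected McNemar statistic (Equation (1)) with the
    chi-square p-value and the exact two-sided binomial p-value."""
    n = n01 + n10
    stat = (abs(n01 - n10) - 1) ** 2 / n
    p_chi2 = chi2.sf(stat, df=1)                      # 1 degree of freedom
    p_exact = binomtest(min(n01, n10), n, 0.5).pvalue  # exact test for small n
    return stat, p_chi2, p_exact

# Illustrative discordant counts only:
stat, p_chi2, p_exact = mcnemar_test(n01=15, n10=5)
```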

6. Results

6.1. Performance of Base Classifiers

Outlier detection was conducted for two EEG datasets—Bonn and the Guinea-Bissau and Nigeria Epilepsy Dataset (GBNE)—using the automatically generated hyperparameter space described in Section 5.4. Three base classifiers were applied in the experiments: k-Nearest Neighbors (k-NN), Support Vector Machines (SVM), and decision trees of the CART type (DT-CART). Stratified results for ictal, interictal, and healthy subsets of the Bonn dataset were analyzed separately to assess robustness across EEG categories. This allowed us to evaluate whether classifier performance remained consistent across different types of epileptiform and normal activity.
Table 6 summarizes, for each classifier and dataset, the ranges of ACC, precision, outlier-class F 1 , and recall. Model selection in our framework uses the outlier-class F 1 > 0.60 , while accuracy is reported for context only.
For the SVM classifier, we used the rbf kernel with C ∈ { 10 , 100 } . On the Bonn EEG dataset, the maximum overall accuracy reached 94.4%, yet outlier recall was only 66.3%. On GBNE, SVM achieved 83.0% accuracy with 82.0% recall, a markedly higher recall than on Bonn.
For the DT-CART classifier (maximum depth 30, entropy split), overall accuracy on the Bonn EEG dataset remained below 50% (maximum 48.8%), with precision up to 67.2% and recall up to 48.8%. On GBNE, recall peaked at 60.0%. Within the automated selection procedure, base models were eligible for ensemble construction only if they achieved an outlier-class F 1 > 0.60 on validation folds (Section 5.5). As DT-CART did not satisfy this criterion on either dataset, it served exclusively as a comparative baseline for interpretability assessment and was automatically excluded from all homogeneous and heterogeneous ensembles. This confirms that ensemble composition was determined objectively by the performance threshold rather than manual preference for specific algorithms.
The best performance was obtained with the k-NN classifier. On the Bonn EEG dataset, the maximum accuracy was 92.5% using Manhattan distance with k = 3 , with recall up to 92.5%. On GBNE, the best accuracy (79.9%) was achieved with Manhattan distance, k = 10 , and uniform weighting; recall reached 80.3%.
Overall, the k-NN algorithm showed the highest effectiveness in detecting outliers—especially on Bonn—combining high accuracy with high recall. SVM yielded satisfactory results, particularly on GBNE, but its Bonn recall was limited. DT-CART exhibited the weakest performance, with both accuracy and recall markedly below those of the other methods. These limitations of individual base classifiers motivate the use of ensemble methods to improve outlier detection in EEG signals.

6.2. Results of Homogeneous Ensemble Classification

Homogeneous ensemble classifiers were constructed based on a single classifier type with varying hyperparameters. Only base models that achieved an outlier-class F 1 > 0.60 on validation folds were included in the ensemble generation process (Section 5.5). Under this rule, DT-CART did not qualify on either dataset and was therefore excluded from all homogeneous ensembles. Its results are reported in Section 6.1 for completeness, as they illustrate the automatic screening mechanism that governs ensemble construction in the proposed framework. Given a specified number n of base classifiers of the same type and a selection threshold of outlier-class F 1 > 0.60 , the algorithm automatically formed combinations of these classifiers into a homogeneous ensemble.
Bonn EEG Dataset
Outlier-class F 1 values for the top-performing configurations of k-NN, SVM, and DT-CART on the Bonn EEG dataset are presented in Table 7.
For transparency, we report the best DT-CART and polynomial-kernel SVM settings even when F 1 ≤ 0.60 ; such models were not eligible for ensemble construction under the F 1 > 0.60 selection rule. Following the procedure described in Section 4, homogeneous ensemble classifiers were automatically created for specified values of n. Sample configurations are shown in Table 8.
Homogeneous ensemble classifiers were evaluated using aggregation methods A ∈ { majority voting , stacking , bagging } . For each combination of classifiers, accuracy ( A C C ), precision (P), recall, and F 1 -score were calculated. The results are presented in Table 9. The stacking technique achieved the best performance, with accuracy up to 95.0%. Even a 2.5 percentage point improvement compared to the best single classifier is clinically meaningful. Majority voting produced results similar to the strongest individual classifiers (around 92.5%), while bagging performed more weakly, with accuracy in the 80.0–88.8% range.
As shown in Table 9, k-NN ensembles achieved very high precision and recall, with majority voting correctly identifying over 92% of outlier cases. The F 1 -score reached 91.8%, confirming robust detection performance. Bagging was consistently weaker, performing about 10 percentage points below the other aggregation methods. Stacking produced the strongest results, with precision above 91% and accuracy up to 95%.
For SVM-based ensembles, recall improved markedly compared to single models (from 66.3% to 91.3%). The highest ACC was obtained with stacking (2), whereas stacking (3–4) achieved the highest F 1 /recall. In contrast, bagging offered only moderate gains.
Comparing k-NN and SVM ensembles highlights distinct behaviors. k-NN ensembles achieved higher recall (above 90%), making them highly sensitive to epileptic discharges but with slightly lower precision. SVM ensembles delivered very high precision (up to 1.000) but somewhat lower recall, reflecting a more conservative detection strategy with fewer false positives but more missed seizures.
Clinically, this trade-off is important. For seizure detection, where missing an event carries high risk, the higher recall of k-NN ensembles may be preferable. Conversely, SVM ensembles are valuable in scenarios where minimizing false alarms is critical, such as continuous monitoring in clinical settings.
Guinea-Bissau and Nigeria Epilepsy Dataset
Results for the GBNE dataset differed substantially. For k-NN ensembles, the best performance came from majority voting and bagging with four classifiers, both achieving 82.9% accuracy. In contrast, stacking with k-NN produced the weakest outcomes.
SVM ensembles combined with stacking reached the highest overall accuracy on GBNE (86.8%), outperforming all other homogeneous ensembles. Majority voting with SVMs also achieved strong results (81.6%).
Unexpectedly, SVM ensembles trained with bagging showed extremely poor recall (only 11.8%), despite high precision (over 81%). This indicates that while they rarely produced false positives, they failed to detect most seizure events, limiting their clinical usefulness.
Detailed evaluation metrics for GBNE homogeneous ensembles are presented in Table 10.
Overall, while accuracies on GBNE were encouraging, both precision and F 1 -scores dropped compared to the Bonn dataset. Majority voting and stacking often produced precision values between 44 and 58%, meaning nearly half of the detected outliers were false positives. Clinically, this limits reliability, despite strong recall in some cases.
k-NN ensembles generally achieved higher recall (up to 0.769), confirming their sensitivity to seizure events in noisy data. SVM ensembles, in contrast, achieved higher precision but often failed to detect true positives, particularly when bagging was applied.
These results highlight the challenges of applying ensemble methods to noisy, imbalanced datasets such as GBNE. While stacking improved overall accuracy, precision–recall trade-offs became more pronounced, underlining the need for dataset-adaptive strategies in clinical applications.

6.3. Results of Heterogeneous Ensemble Classification

Heterogeneous ensembles were generated automatically as described in Section 4. Each base model was required to achieve an outlier-class F 1 > 0.60 on validation. According to this rule, DT-CART did not qualify on either dataset and was therefore excluded from all heterogeneous ensemble configurations. Examples of automatically generated ensembles for n = 2 and n = 3 are listed in Table 11. Here, Manh denotes Manhattan distance, Eucl denotes Euclidean distance, dist indicates distance-based weighting, and u refers to uniform weighting.
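The pre-registered meta-selection rule described above (outlier-class F 1 > 0.60 on validation) can be expressed compactly. The following is a minimal sketch, assuming binary labels with 1 marking the outlier class; the function names and dictionary-based interface are illustrative, not the authors' implementation:

```python
def f1_outlier(y_true, y_pred, outlier_label=1):
    """F1-score computed on the outlier class only."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == outlier_label and p == outlier_label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != outlier_label and p == outlier_label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == outlier_label and p != outlier_label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def select_base_models(validation_preds, y_val, threshold=0.60):
    """Keep only base models whose outlier-class F1 exceeds the threshold.

    validation_preds maps a model name to its predictions on the validation fold.
    """
    return [name for name, y_pred in validation_preds.items()
            if f1_outlier(y_val, y_pred) > threshold]
```

Under this rule, a model such as DT-CART with validation F 1 below 0.60 is simply filtered out before any ensemble is assembled, which is what excluded it on both datasets.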
Bonn EEG dataset.
Heterogeneous SVM + k-NN ensembles comfortably exceeded the selection threshold F 1 > 0.60 . The best result was obtained with stacking (Table 12), achieving F 1 = 92.7 % and recall = 91.6%. Simple majority voting was close ( F 1 = 91.5 % ), whereas bagging lagged behind ( F 1 = 88.5–88.8%). Precision peaked for stacking (up to 90.5%), indicating an improved precision–recall balance without sacrificing sensitivity. Accuracy (ACC) is reported for context only because the model comparison is driven by F 1 .
On Bonn, stacking achieved the best overall performance (ACC 0.950, F 1 0.927, and recall 0.916). The confusion matrix for the best heterogeneous model (stacking, SVM + k-NN) is shown in Figure 6. Relative to the best homogeneous ensembles, false positives (type I errors) decreased by 4–5 cases, while false negatives (type II) remained comparable to k-NN. Thus, the heterogeneous stack improves the precision–recall balance without sacrificing sensitivity.
It should be emphasized that on clean, well-structured EEG (Bonn), stacking SVM with k-NN reduces false alarms while maintaining high sensitivity, which is desirable for decision-support workflows.
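A heterogeneous SVM + k-NN stack of the kind evaluated here can be sketched with scikit-learn's stacking API. This is a minimal illustration on synthetic, imbalanced data standing in for EEG feature vectors; the hyperparameters and the logistic-regression meta-learner are placeholders, not the tuned configuration from the experiments:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for EEG feature vectors (minority class ~ outliers).
X, y = make_classification(n_samples=600, n_features=20,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("svm", make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))),
        ("knn", make_pipeline(StandardScaler(),
                              KNeighborsClassifier(n_neighbors=5, weights="distance"))),
    ],
    final_estimator=LogisticRegression(),  # meta-learner combining base outputs
    cv=5,  # out-of-fold base predictions feed the meta-learner
)
stack.fit(X_tr, y_tr)
print(round(stack.score(X_te, y_te), 3))
```

The `cv=5` argument makes the meta-learner train on out-of-fold base predictions, which is what lets the stack correct systematic disagreements between SVM and k-NN rather than overfit to them.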
Guinea-Bissau and Nigeria Epilepsy (GBNE).
On the more challenging GBNE dataset, stacking again produced the best overall accuracy ( A C C = 0.855 ), whereas majority voting underperformed (hard: 0.460; soft: 0.420). Bagging reached A C C = 0.740 . Despite high precision for soft voting ( P = 0.890 ), its F 1 -score remained low (0.570) due to reduced recall, indicating many missed true positives or an unstable balance on noisy data. Stacking delivered the strongest A C C , but with moderate precision and recall (both in the 0.46–0.60 range), reflecting the difficulty of minority-class detection in field conditions. Detailed performance metrics for all heterogeneous ensembles are summarized in Table 13.
On the GBNE dataset, stacking yielded the highest ACC (0.855), while bagging with three base classifiers gave the best F 1 (0.612) and soft voting achieved the highest precision (0.890). It should be emphasized that on the noisy, imbalanced GBNE dataset, accuracy alone can be misleading: methods with high A C C (e.g., stacking) may still yield modest F 1 due to precision–recall trade-offs. In screening scenarios, ensembles should be tuned for higher recall to avoid missed seizures, whereas in alarm-driven monitoring, higher precision may be prioritized to limit false alerts.
The very low recall observed for SVM bagging on GBNE likely stems from probability/threshold calibration variance under noisy, artifact-laden features, which—after bootstrap aggregation—induces a conservative decision boundary and a bias toward the majority class. In practice, this can be mitigated by post hoc calibration (Platt/Isotonic), class weight tuning in base SVMs, or decision threshold optimization on validation folds (maximizing F 1 , outlier or recall at fixed precision).
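Of the mitigations listed above, decision-threshold optimization is the simplest to make concrete. The sketch below sweeps candidate thresholds over validation scores and picks the one maximizing outlier-class F 1 ; it assumes `scores` are per-segment outlier scores (e.g., calibrated probabilities) and `y_true` uses 1 for the outlier class. The function name is illustrative:

```python
def best_threshold(scores, y_true):
    """Return the decision threshold that maximizes F1 on the outlier class.

    Candidate thresholds are the distinct validation scores; a segment is
    flagged as an outlier when its score is >= the threshold.
    """
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, y_true) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, y_true) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, y_true) if s < t and y == 1)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

Applied after bootstrap aggregation, such a validation-tuned threshold shifts the conservative decision boundary back toward the minority class; the same sweep can instead maximize recall at a fixed precision floor, as suggested above.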

6.4. Statistical Validation of Ensemble Superiority

To assess whether the observed improvements of the ensemble models were statistically significant, we applied the non-parametric McNemar’s test on paired classification outcomes between the ensemble (stacking) and the best-performing base models (k-NN and SVM). The test evaluates the null hypothesis that both models have identical proportions of correct classifications. A p-value below 0.05 indicates a significant difference in prediction behavior.
Figure 7 and Figure 8 present the McNemar contingency tables for the Bonn and GBNE datasets, respectively. For the Bonn dataset, the test comparing the stacking ensemble with k-NN yielded p = 0.0042 , while for the GBNE dataset the comparison between stacking and SVM resulted in p = 0.0049 . The resulting p-values were below 0.01 in both cases, leading to rejection of the null hypothesis and confirming that the ensemble classifier’s superiority is statistically significant at the α = 0.05 level.
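For small discordant counts, the exact (binomial) form of McNemar's test is appropriate. A minimal stdlib sketch over the discordant cells n01 (ensemble correct, base wrong) and n10 (ensemble wrong, base correct) from Table 5 is given below; in practice a library implementation (e.g., in statsmodels) would be preferable:

```python
from math import comb

def mcnemar_exact(n01, n10):
    """Two-sided exact McNemar test on the discordant pairs.

    Under H0 the discordant pairs split 50/50, so the smaller count follows
    Binomial(n01 + n10, 0.5). Doubling the one-sided tail can exceed 1 when
    n01 == n10, hence the final clamp.
    """
    n = n01 + n10
    k = min(n01, n10)
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, p)
```

A p-value below 0.05 rejects the hypothesis of identical error behavior, mirroring the decision rule applied to the reported Bonn ( p = 0.0042 ) and GBNE ( p = 0.0049 ) comparisons.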

7. Conclusions

The results obtained across both datasets demonstrate that ensemble models consistently outperformed individual base classifiers under the F1-driven selection protocol. The proposed models are designed for screening-level support and triage of large EEG corpora—not for standalone diagnosis. Clinical decisions remain the purview of expert evaluation. For the Bonn EEG dataset, the best heterogeneous stack (SVM + k-NN) achieved an F 1 -score of 92.7%, with recall = 91.6% and accuracy = 95.0%. This configuration slightly improved the precision–recall balance compared with the strongest homogeneous ensembles, confirming that combining complementary decision rules enhances stability and reduces variability caused by hyperparameter selection.
On the more challenging GBNE dataset, performance was highly metric-dependent. Stacking achieved the highest accuracy (up to 85.5% for heterogeneous and 86.8% for homogeneous SVM ensembles), although its F 1 -score remained below the 0.60 threshold in the heterogeneous setting. In contrast, bagging achieved the best F 1 among heterogeneous ensembles (up to 61.2%). Among homogeneous models, k-NN stacking prioritized sensitivity (recall up to 76.9%) and achieved higher F 1 (up to 75.3%), albeit at the cost of lower accuracy—illustrating the inherent precision–recall trade-offs typical of noisy, imbalanced EEG data acquired in field conditions.
These trade-offs have clear clinical implications. For screening scenarios where missed events are costly, recall-oriented ensembles (e.g., k-NN–based) are preferable. Conversely, in alarm-driven monitoring applications, precision-oriented variants (e.g., SVM voting or stacking) are more effective in reducing false alerts. Throughout this work, accuracy is reported for completeness, but model comparison and selection are guided by the outlier-class F 1 -score.
It should be emphasized that the Bonn dataset is small and idealized, whereas the GBNE dataset, though more realistic, includes a limited number of subjects. The fixed 2 s segmentation may overlook longer temporal dependencies. Moreover, while stacking improves effectiveness, it also reduces interpretability, and the models’ sensitivity to class imbalance and noise limits their immediate clinical deployment. All experiments were performed offline rather than in real-time conditions.
Additionally, the ensemble generator applies an automatic F 1 -based selection rule that determines which base models are eligible for inclusion. This simple yet effective criterion enhances the transparency, reproducibility, and adaptability of the proposed framework, making it applicable to diverse datasets and classifier types without manual tuning or arbitrary design choices.
Future research should address these limitations by expanding the datasets with longer and more diverse EEG recordings, incorporating explainable AI mechanisms to enhance interpretability, and validating the framework in real-time monitoring settings. Exploring deep learning ensembles, adaptive weighting strategies, and multimodal data integration (e.g., EEG–fMRI) may further enhance both performance and clinical applicability.
Overall, the proposed automated ensemble framework demonstrates not only methodological improvements but also clinically meaningful benefits. It offers a robust, interpretable, and generalizable solution capable of reducing false alarms, improving diagnostic efficiency, and supporting neurologists in reliable and efficient epilepsy detection.

Author Contributions

Conceptualization, A.D. and P.S.S.; methodology, A.D.; software, N.Ł.; validation, A.D. and N.Ł.; formal analysis, A.D. and P.S.S.; investigation, N.Ł. and A.D.; data curation, N.Ł. and A.D.; writing—original draft preparation, A.D.; writing—review and editing, P.S.S. and A.D.; visualization, N.Ł. and A.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. World Health Organization. Epilepsy: A Public Health Imperative; World Health Organization Report; World Health Organization: Geneva, Switzerland, 2019. [Google Scholar]
  2. Devinsky, O.; Hesdorffer, D.C.; Thurman, D.J.; Lhatoo, S.; Richerson, G. Sudden unexpected death in epilepsy: Epidemiology, mechanisms, and prevention. Lancet Neurol. 2016, 15, 1075–1088. [Google Scholar] [CrossRef]
  3. Thurman, D.J.; Beghi, E.; Begley, C.E.; Berg, A.T.; Buchhalter, J.R.; Ding, D.; Hesdorffer, D.C.; Hauser, W.A.; Kazis, L.; Kobau, R.; et al. Standards for epidemiologic studies and surveillance of epilepsy. Epilepsia 2011, 52, 2–26. [Google Scholar] [CrossRef]
  4. Brodie, M.J.; Kwan, P. Newer drugs for focal epilepsy in adults. BMJ 2012, 344, e345. [Google Scholar] [CrossRef] [PubMed]
  5. Fisher, R.S.; Cross, J.H.; D’Souza, C.; French, J.A.; Haut, S.R.; Higurashi, N.; Hirsch, E.; Jansen, F.E.; Lagae, L.; Moshe, S.L.; et al. Instruction manual for the ILAE 2017 operational classification of seizure types. Epilepsia 2017, 58, 531–542. [Google Scholar] [CrossRef] [PubMed]
  6. Scheffer, I.E.; Berkovic, S.; Capovilla, G.; Connolly, M.B.; French, J.; Guilhoto, L.; Hirsch, E.; Jain, S.; Mathern, G.W.; Moshé, S.L.; et al. ILAE classification of the epilepsies: Position paper of the ILAE Commission for Classification and Terminology. Epilepsia 2017, 58, 512–530. [Google Scholar] [CrossRef]
  7. Smith, S.J. EEG in the diagnosis, classification, and management of patients with epilepsy. J. Neurol. Neurosurg. Psychiatry 2005, 76, ii2–ii7. [Google Scholar] [CrossRef] [PubMed]
  8. Benbadis, S.R.; Tatum, W.O. The role of EEG in patients with suspected epilepsy. Epileptic Disord. 2020, 22, 143–155. [Google Scholar] [CrossRef]
  9. Halford, J.J. Computerized epileptiform transient detection in the scalp electroencephalogram: Obstacles to progress and the example of computerized ECG interpretation. Clin. Neurophysiol. 2009, 120, 1909–1915. [Google Scholar] [CrossRef]
  10. Subha, D.; Joseph, P.; Acharya, U.; Lim, C. EEG signal analysis: A survey. J. Med. Syst. 2010, 34, 195–212. [Google Scholar] [CrossRef]
  11. Acharya, U.R.; Sree, S.V.; Swapna, G.; Martis, R.J.; Suri, J.S. Automated diagnosis of epilepsy using CWT, HOS and texture parameters. Int. J. Neural Syst. 2013, 23, 1350009. [Google Scholar] [CrossRef]
  12. Wang, D.; Ren, D.; Li, K.; Feng, Y.; Ma, D.; Yan, X.; Wang, G. Epileptic seizure detection in long-term EEG recordings by using wavelet-based directed transfer function. IEEE Trans. Biomed. Eng. 2018, 65, 2591–2599. [Google Scholar] [CrossRef] [PubMed]
  13. Roy, Y.; Banville, H.; Albuquerque, I.; Gramfort, A.; Falk, T.H.; Faubert, J. Deep learning-based electroencephalography analysis: A systematic review. J. Neural Eng. 2019, 16, 051001. [Google Scholar] [CrossRef]
  14. Janmohamed, M.; Nhu, D.; Kuhlmann, L.; Gilligan, A.; Tan, C.W.; Perucca, P.; O’Brien, T.J.; Kwan, P. Moving the field forward: Detection of epileptiform abnormalities on scalp electroencephalography using deep learning—Clinical application perspectives. Brain Commun. 2022, 4, fcac218. [Google Scholar] [CrossRef]
  15. Abdi-Sargezeh, B.; Shirani, S.; Sanei, S.; Took, C.C.; Geman, O.; Alarcon, G.; Valentin, A. A review of signal processing and machine learning techniques for interictal epileptiform discharge detection. Comput. Biol. Med. 2024, 168, 107782. [Google Scholar] [CrossRef]
  16. Craik, A.; He, Y.; Contreras-Vidal, J.L. Deep learning for electroencephalogram (EEG) classification tasks: A review. J. Neural Eng. 2019, 16, 031001. [Google Scholar] [CrossRef]
  17. Wong, S.; Simmons, A.; Rivera-Villicana, J.; Barnett, S.; Sivathamboo, S.; Perucca, P.; Ge, Z.; Kwan, P.; Kuhlmann, L.; Vasa, R.; et al. EEG datasets for seizure detection and prediction—A review. Epilepsia Open 2023, 8, 252–267. [Google Scholar] [CrossRef]
  18. Jiang, X.; Bian, G.-B.; Tian, Z. Removal of Artifacts from EEG Signals: A Review. Sensors 2019, 19, 987. [Google Scholar] [CrossRef]
  19. Jin, Y.-M.; Luo, Y.-D.; Zheng, W.-L.; Lu, B.-L. EEG-based emotion recognition using domain adaptation network. In Proceedings of the 2017 International Conference on Orange Technologies (ICOT), Singapore, 8–10 December 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–6. [Google Scholar] [CrossRef]
  20. Wei, Z.; Zou, J.; Zhang, J.; Xu, J. Automatic epileptic EEG detection using convolutional neural network with improvements in time-domain. Biomed. Signal Process. Control 2019, 53, 101551. [Google Scholar] [CrossRef]
  21. Tatum, W.O. Clinical utility of EEG in diagnosing and monitoring epilepsy. Clin. Neurophysiol. 2018, 129, 1056–1082. [Google Scholar] [CrossRef] [PubMed]
  22. Dietterich, T.G. Ensemble methods in machine learning. In Multiple Classifier Systems, Proceedings of the First International Workshop, Cagliari, Italy, 21–23 June 2000; Springer: Berlin/Heidelberg, Germany, 2000; pp. 1–15. [Google Scholar] [CrossRef]
  23. Andrzejak, R.G.; Lehnertz, K.; Rieke, C.; Mormann, F.; David, P.; Elger, C.E. Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: Dependence on recording region and brain state. Phys. Rev. E 2001, 64, 061907. [Google Scholar] [CrossRef] [PubMed]
  24. Van Hees, S.; Otte, W.M. EEG Data from Epilepsy Patients and Healthy Subjects in Guinea-Bissau and Nigeria (GBNE) [Dataset]. Zenodo, Geneva, Switzerland. 2018. Available online: http://zenodo.org/records/1252141 (accessed on 21 June 2025). [CrossRef]
  25. Liang, S.-F.; Wang, H.-C.; Chang, W.-L. Combination of EEG complexity and spectral analysis for epilepsy diagnosis and seizure detection. EURASIP J. Adv. Signal Process. 2010, 2010, 853434. [Google Scholar] [CrossRef]
  26. Cohen, L. Time–Frequency Analysis; Prentice Hall PTR: Englewood Cliffs, NJ, USA, 1995. [Google Scholar]
  27. Subasi, A. EEG signal classification using wavelet feature extraction and a mixture of expert model. Expert Syst. Appl. 2007, 32, 1084–1093. [Google Scholar] [CrossRef]
  28. Patel, V.; Oswal, P.; Kumar, R. Wavelet packet based energy features for epileptic seizure detection. Med. Eng. Phys. 2009, 31, 1070–1076. [Google Scholar] [CrossRef]
  29. Rosso, O.A.; Blanco, S.; Yordanova, J.; Kolev, V.; Figliola, A.; Schürmann, M.; Basar, E. Wavelet entropy: A new tool for analysis of short duration brain electrical signals. J. Neurosci. Methods 2001, 105, 65–75. [Google Scholar] [CrossRef]
  30. Faust, O.; Acharya, U.R.; Adeli, H.; Adeli, A. Wavelet-based EEG processing for computer-aided seizure detection and epilepsy diagnosis—A review. Seizure 2015, 26, 56–64. [Google Scholar] [CrossRef] [PubMed]
  31. Garcia, M.; Thomas, A.; Gupta, R. Support vector machines for epileptic EEG classification: A review. Artif. Intell. Med. 2014, 60, 1–13. [Google Scholar] [CrossRef]
  32. Subasi, A.; Gursoy, M.I. EEG signal classification using PCA, ICA, LDA and support vector machines. Expert Syst. Appl. 2010, 37, 8659–8666. [Google Scholar] [CrossRef]
  33. Ang, K.K.; Chin, Z.Y.; Zhang, H.; Guan, C. Filter bank common spatial pattern algorithm on BCI competition IV datasets 2a and 2b. Front. Neurosci. 2012, 6, 39. [Google Scholar] [CrossRef]
  34. Obeid, I.; Picone, J. The Temple University Hospital EEG Data Corpus (TUH EEG Corpus). Available online: https://isip.piconepress.com/projects/nedc/html/tuh_eeg/ (accessed on 21 June 2025).
  35. Martis, R.J.; Chakraborty, C.; Ray, A.K. Gaussian mixture model-based clustering technique for electrocardiogram analysis. In Data Mining in Biomedical Imaging, Signaling, and Systems; Taylor & Francis: New York, NY, USA, 2016; p. 101. [Google Scholar]
  36. Zhou, Z.-H. Ensemble Methods: Foundations and Algorithms; Chapman & Hall/CRC: Boca Raton, FL, USA, 2012. [Google Scholar] [CrossRef]
  37. Alotaiby, T.; Alshebeili, S.A.; Alshawi, T.; Ahmad, I.; El-Samie, F.E.A. EEG seizure detection and prediction algorithms: A survey. EURASIP J. Adv. Signal Process. 2014, 2014, 183. [Google Scholar] [CrossRef]
  38. Acharya, U.R.; Oh, S.L.; Hagiwara, Y.; Tan, J.H.; Adeli, H. Deep convolutional neural network for the automated detection and diagnosis of seizure using EEG signals. Comput. Biol. Med. 2018, 100, 270–278. [Google Scholar] [CrossRef] [PubMed]
  39. Lasefr, Z.; Elleithy, K.; Reddy, R.R.; Abdelfattah, E.; Faezipour, M. An Epileptic Seizure Detection Technique Using EEG Signals with Mobile Application Development. Appl. Sci. 2023, 13, 9571. [Google Scholar] [CrossRef]
  40. Khan, G.H.; Khan, N.A.; Altaf, M.A.B.; Abbasi, Q. A Shallow Autoencoder Framework for Epileptic Seizure Detection in EEG Signals. Sensors 2023, 23, 4112. [Google Scholar] [CrossRef]
  41. Palanisamy, K.K.; Rengaraj, A. Detection of Anxiety-Based Epileptic Seizures in EEG Signals Using Fuzzy Features and Parrot Optimization-Tuned LSTM. Brain Sci. 2024, 14, 848. [Google Scholar] [CrossRef]
  42. Lenkala, S.; Marry, R.; Gopovaram, S.R.; Akinci, T.C.; Topsakal, O. Comparison of Automated Machine Learning (AutoML) Tools for Epileptic Seizure Detection Using Electroencephalograms (EEG). Computers 2023, 12, 197. [Google Scholar] [CrossRef]
  43. Breunig, M.M.; Kriegel, H.P.; Ng, R.T.; Sander, J. LOF: Identifying density-based local outliers. ACM SIGMOD Rec. 2000, 29, 93–104. [Google Scholar] [CrossRef]
  44. Liu, F.T.; Ting, K.M.; Zhou, Z.H. Isolation Forest. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining (ICDM), Pisa, Italy, 15–19 December 2008; pp. 413–422. [Google Scholar] [CrossRef]
  45. Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD), Portland, OR, USA, 2–4 August 1996; AAAI Press: Menlo Park, CA, USA, 1996; pp. 226–231. [Google Scholar]
  46. Schirrmeister, R.T.; Springenberg, J.T.; Fiederer, L.D.J.; Glasstetter, M.; Eggensperger, K.; Tangermann, M.; Hutter, F.; Burgard, W.; Ball, T. Deep learning with convolutional neural networks for EEG decoding and visualization. Hum. Brain Mapp. 2017, 38, 5391–5420. [Google Scholar] [CrossRef]
  47. Mirowski, P.; Madhavan, D.; LeCun, Y.; Kuzniecky, R. Comparing SVMs and LSTMs for detecting seizures in intracranial EEG. In Proceedings of the 2009 IEEE Engineering in Medicine and Biology Society Conference (EMBC), Minneapolis, MN, USA, 3–6 September 2009; pp. 2434–2437. [Google Scholar] [CrossRef]
  48. Truong, N.D.; Nguyen, A.D.; Kuhlmann, L.; Bonyadi, M.R.; Yang, J.J.; Ippolito, S.; Kavehei, O. Convolutional neural networks for seizure prediction using intracranial and scalp EEG. Neural Netw. 2018, 105, 104–111. [Google Scholar] [CrossRef] [PubMed]
  49. Rasool, A.; Aslam, S.; Xu, Y.; Wang, Y.; Pan, Y.; Chen, W. Deep neurocomputational fusion for ASD diagnosis using multi-domain EEG analysis. Neurocomputing 2025, 641, 130353. [Google Scholar] [CrossRef]
  50. Bunterngchit, C.; Wang, J.; Su, J.; Wang, Y.; Liu, S.; Hou, Z.G. Temporal attention fusion network with custom loss function for EEG–fNIRS classification. J. Neural Eng. 2024, 21, 086003. [Google Scholar] [CrossRef] [PubMed]
  51. Ficici, C.; Telatar, Z.; Kocak, O.; Erogul, O. Identification of TLE Focus from EEG Signals by Using Deep Learning Approach. Diagnostics 2023, 13, 2261. [Google Scholar] [CrossRef] [PubMed]
  52. Yıldız, İ.; Garner, R.; Lai, M.; Duncan, D. Unsupervised seizure identification on EEG. Comput. Methods Programs Biomed. 2022, 215, 106604. [Google Scholar] [CrossRef] [PubMed]
  53. Zhao, T.; Cui, Y.; Ji, T.; Luo, J.; Li, W.; Jiang, J.; Gao, Z.; Hu, W.; Yan, Y.; Jiang, Y. VAEEG: Variational auto-encoder for extracting EEG representation. NeuroImage 2024, 304, 120946. [Google Scholar] [CrossRef]
  54. Vahid, A.; Mückschel, M.; Stober, S.; Stock, A.K.; Beste, C. Conditional generative adversarial networks applied to EEG data can inform about the inter-relation of antagonistic behaviors on a neural level. Commun. Biol. 2022, 5, 148. [Google Scholar] [CrossRef]
  55. Potter, I.; Yıldız, I.; Zerveas, G.; Eickhoff, C.; Duncan, D. Unsupervised multivariate time-series transformers for seizure identification on EEG. In Proceedings of the 21st IEEE International Conference on Machine Learning and Applications (ICMLA), Nassau, Bahamas, 12–14 December 2022; pp. 1304–1311. [Google Scholar] [CrossRef]
  56. Vafaei, E. Transformers in EEG Analysis: A Review of Architectures. Sensors 2025, 25, 1293. [Google Scholar] [CrossRef]
  57. Abid, A.; Thajaoui, A.; Zeadally, S.; Miladi, M.; Kachouri, A. Catalyzing EEG Signal Analysis: Unveiling the Potential of Machine Learning-Enabled Smart K Nearest Neighbor Outlier Detection. Int. J. Inf. Technol. 2024, 16, 2079–2089. [Google Scholar] [CrossRef]
  58. Giri, B.K.; Sarkar, S.; Mazumder, S.; Das, K. A computationally efficient order statistics based outlier detection technique for EEG signals. In Proceedings of the 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Milan, Italy, 25–29 August 2015. [Google Scholar] [CrossRef]
  59. Chandola, V.; Banerjee, A.; Kumar, V. Anomaly detection: A survey. ACM Comput. Surv. (CSUR) 2009, 41, 1–58. [Google Scholar] [CrossRef]
  60. Chen, W.; Wang, Y.; Cao, G.; Chen, G.; Gu, Q. A random forest model based classification scheme for neonatal amplitude-integrated EEG. Biomed. Eng. Online 2014, 13 (Suppl. 2), S4. [Google Scholar] [CrossRef] [PubMed]
  61. Nawaz, A.; Khan, S.S.; Ahmad, A. Ensemble of Autoencoders for Anomaly Detection in Biomedical Data: A Narrative Review. IEEE Access 2024, 11, 17273–17289. [Google Scholar] [CrossRef]
  62. Zhao, Y.; Nasrullah, Z.; Li, Z. PyOD: A Python Toolbox for Scalable Outlier Detection. J. Mach. Learn. Res. 2019, 20, 1–7. [Google Scholar]
  63. Liu, M.; Liu, J.; Xu, M.; Liu, Y.; Li, J.; Nie, W.; Yuan, Q. Combining meta and ensemble learning to classify EEG for seizure detection. Sci. Rep. 2025, 15, 10755. [Google Scholar] [CrossRef]
  64. Goldberger, A.L.; Amaral, L.A.N.; Glass, L.; Hausdorff, J.M.; Ivanov, P.C.; Mark, R.G.; Mietus, J.E.; Moody, G.B.; Peng, C.K.; Stanley, H.E. PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals. Circulation 2000, 101, e215–e220. [Google Scholar] [CrossRef] [PubMed]
  65. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
  66. Vapnik, V. The Nature of Statistical Learning Theory; Springer: New York, NY, USA, 1995. [Google Scholar] [CrossRef]
  67. Guo, G.; Wang, H.; Bell, D.; Bi, Y.; Greer, K. KNN model-based approach in classification. In Proceedings of the OTM Confederated International Conferences, CoopIS, DOA, and ODBASE, Catania, Sicily, Italy, 3–7 November 2003. [Google Scholar]
  68. Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees; Wadsworth International Group: Belmont, CA, USA, 1984. [Google Scholar]
  69. Bastos, N.; Marques, B.; Adamatti, D.; Billa, C. Analyzing EEG signals using decision trees: A study of modulation of amplitude. Comput. Intell. Neurosci. 2020, 2020, 3598416. [Google Scholar] [CrossRef] [PubMed]
  70. Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140. [Google Scholar] [CrossRef]
  71. Wolpert, D.H. Stacked generalization. Neural Netw. 1992, 5, 241–259. [Google Scholar] [CrossRef]
  72. Duraj, A.; Chomątek, Ł. Outlier detection in EEG signals. Prz. Elektrotechniczny 2023, 99, 237–240. [Google Scholar] [CrossRef]
  73. He, H.; Garcia, E.A. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284. [Google Scholar] [CrossRef]
Figure 1. Diagram of the homogeneous ensemble classifier construction process.
Figure 2. Representative interictal EEG segments from the Bonn dataset (Subset C—non- epileptogenic zone). Each segment shows mostly regular background activity with a few isolated spikes (highlighted by dashed red lines), corresponding to rare interictal discharges in the non-epileptogenic hemisphere. These signals exemplify typical low-frequency outliers embedded within near-normal activity.
Figure 3. Representative interictal EEG segments from the Bonn dataset (Subset D—epileptogenic zone). These examples exhibit frequent epileptiform spikes and sharp waves, indicated by dashed red lines. Compared to Subset C, the signal contains higher-amplitude discharges and denser spike clusters, characteristic of interictal activity in the epileptogenic region.
Figure 4. Representative scalp EEG segments from the GBNE dataset (channels AF3 and F3), recorded under field conditions using the EMOTIV EPOC+ headset. Highlighted regions (orange) indicate low-frequency (LF) artifacts, typically caused by slow electrode drift or mild motion. These segments illustrate moderate signal disturbances superimposed on background activity, representing common real-world challenges in low-cost EEG recordings.
Figure 5. Examples of EEG segments from the GBNE dataset with pronounced artifacts. Vertical markers correspond to high-frequency electromyographic (EMG) bursts, while shaded regions denote large-amplitude motion or electrode artifacts. These signals demonstrate realistic non-neural interferences such as eye blinks, facial muscle activity, and environmental noise, which act as “outlier-like” events and test the robustness and generalization capability of automated detection methods.
Figure 6. Confusion matrix for the heterogeneous stacking ensemble (SVM + k-NN) on Bonn.
Figure 7. McNemar contingency table for the Bonn dataset comparing the stacking ensemble with the k-NN baseline ( p = 0.0042 ). Symbols: ✓—correct classification; X—incorrect classification. The significant difference confirms that the ensemble model yields consistently more correct classifications.
Figure 8. McNemar contingency table for the GBNE dataset comparing the stacking ensemble with the SVM baseline ( p = 0.0049 ). Symbols: ✓—correct classification; X—incorrect classification. The results indicate a statistically significant improvement of the ensemble classifier over the base model.
Table 1. Outlier detection methods in EEG: classical and statistical approaches.
Authors | Year | Algorithm | Application | Main Challenge Addressed | Dataset
Classical and Statistical Methods
Chandola et al. [59] | 2009 | One-Class SVM | Outlier detection in simulated EEG | Handling high-dimensional data with limited labels | Simulated
Breunig et al. [43] | 2000 | Local Outlier Factor (LOF) | Density-based anomaly detection | Local density estimation and scalability issues | Synthetic EEG
Giri et al. [58] | 2015 | Median MAD statistics | Channel noise/outlier epoch detection | Robustness to noise and sensor artifacts | Clinical EEG
Zhao et al. [62] | 2019 | Isolation Forest | Epileptic seizure detection | High-dimensional feature space, interpretability | CHB-MIT
Chen et al. [60] | 2014 | Random Cut Forest | Real-time anomaly detection | Online detection and streaming data constraints | -
Abid et al. [57] | 2024 | Smart k-NN Outlier Detector (SKOD) | Local structure-based anomaly detection | Imbalanced data and dynamic thresholding | TUH EEG
Table 2. Outlier detection methods in EEG: deep learning and hybrid approaches.
Authors | Year | Algorithm | Application | Main Challenge Addressed | Dataset
Deep Learning and Hybrid Methods
Zhao et al. [53] | 2024 | Variational Autoencoder (VAE) | Nonlinear anomaly detection | Capturing latent nonlinearity and uncertainty in EEG | TUH EEG
Nawaz et al. [61] | 2023 | Ensemble of Autoencoders (EoAE) | Biomedical anomaly detection | Robustness across heterogeneous data, ensemble diversity | Multiple biomedical datasets
Potter et al. [55] | 2022 | Transformer-based Autoencoder | Unsupervised seizure detection | Long-range temporal dependencies, model complexity | TUH EEG
Liu et al. [63] | 2025 | Meta-sampling Ensemble | Imbalanced seizure detection | Data imbalance and model calibration for rare events | Intracranial EEG
Table 3. Comparison of the Bonn and GBNE EEG datasets.
Characteristic | Bonn EEG Dataset | GBNE Dataset
Participants | Healthy volunteers and epilepsy patients | 97 subjects (51 epilepsy patients, 46 healthy controls)
Recording Setup | Clinical/laboratory setting, artifact-free signals | Field conditions in rural/semi-urban West Africa, low-cost device
Channels | Single-channel | 14 channels (modified 10–20 system)
Segment Length | 23.6 s (4097 samples) | ∼5 min per subject
Sampling Rate | 173.61 Hz, 12-bit | 128 Hz, 14-bit
Data Volume | 500 balanced segments (∼196 min total) | ∼8 h of multichannel EEG
Content | Five subsets: A–B (healthy), C–D (interictal), E (ictal seizures) | Resting state (eyes closed), epilepsy vs. healthy
Data Quality | Carefully selected, artifact-free | Contains motion artifacts, environmental noise, electrode variability
Clinical Relevance | Benchmark dataset, idealized conditions | Realistic scenario for low-resource settings; high variability tests robustness
Table 4. Evaluation metrics used for EEG outlier detection.
Metric | Formula/Interpretation
Accuracy (ACC) | (TP + TN) / (TP + TN + FP + FN) — overall correctness
Precision (P) | TP / (TP + FP) — reliability of detected anomalies
Recall (R) | TP / (TP + FN) — sensitivity to epileptic events
Specificity (SP) | TN / (TN + FP) — correct rejection of normal EEG
F1-score | 2PR / (P + R) — balance between precision and recall
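The metrics in Table 4 follow directly from the confusion-matrix counts. A minimal Python sketch (the counts below are illustrative, not taken from the experiments):

```python
# Confusion-matrix metrics from Table 4, computed from raw counts.
def metrics(tp, tn, fp, fn):
    acc = (tp + tn) / (tp + tn + fp + fn)          # overall correctness
    precision = tp / (tp + fp)                     # reliability of detections
    recall = tp / (tp + fn)                        # sensitivity to epileptic events
    specificity = tn / (tn + fp)                   # correct rejection of normal EEG
    f1 = 2 * precision * recall / (precision + recall)
    return {"ACC": acc, "P": precision, "R": recall, "SP": specificity, "F1": f1}

# Illustrative counts (not from the paper):
m = metrics(tp=80, tn=90, fp=10, fn=20)
print(m)
```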
Table 5. Contingency table for McNemar’s test.
 | Base Correct | Base Wrong
Ensemble correct | n11 | n01
Ensemble wrong | n10 | n00
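McNemar’s test compares paired classifier decisions using only the discordant counts n01 (ensemble correct, base wrong) and n10 (ensemble wrong, base correct). A minimal sketch of the continuity-corrected statistic, assuming the usual chi-squared reference distribution with one degree of freedom (the counts are illustrative, and the function name is ours):

```python
import math

def mcnemar(n01, n10):
    """Continuity-corrected McNemar statistic for the discordant pairs
    and its p-value under the chi-squared distribution with 1 d.o.f."""
    stat = (abs(n01 - n10) - 1) ** 2 / (n01 + n10)
    # Survival function of chi2(1) at `stat` equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Illustrative counts (not from the paper):
stat, p = mcnemar(n01=15, n10=5)
```

With these toy counts the statistic is 4.05, so the ensemble’s advantage would be significant at the 5% level.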
Table 6. Base classifiers: metric ranges on the Bonn EEG and GBNE datasets (proportions). Selection in our framework uses outlier-class F1 > 0.60; accuracy is shown for context only.
Dataset | Classifier | ACC | Precision | F1 | Recall
Bonn EEG | SVM | [0.833; 0.944] | [0.44; 0.951] | [0.532; 0.797] | [0.363; 0.663]
Bonn EEG | k-NN | [0.883; 0.925] | [0.865; 0.904] | [0.893; 0.902] | [0.727; 0.925]
Bonn EEG | DT-CART | [0.450; 0.488] | [0.538; 0.672] | [0.511; 0.600] | [0.344; 0.488]
GBNE | SVM | [0.747; 0.841] | [0.459; 0.610] | [0.599; 0.624] | [0.803; 0.821]
GBNE | k-NN | [0.638; 0.799] | [0.551; 0.645] | [0.640; 0.715] | [0.763; 0.803]
GBNE | DT-CART | [0.503; 0.605] | [0.552; 0.673] | [0.500; 0.600] | [0.499; 0.600]
Table 7. Top-performing base classifier configurations with the corresponding outlier-detection F1 on the Bonn EEG dataset. Models with F1 ≤ 0.60 are reported for completeness but were not used in ensemble construction according to the predefined selection rule.
Classifier | F1
SVM, kernel = rbf, C = 10 | 0.797
SVM, kernel = rbf, C = 100 | 0.797
SVM, kernel = poly, C = 10 | 0.532
SVM, kernel = poly, C = 100 | 0.532
DT-CART, entropy, MG = 20, LP = 2 | 0.565
DT-CART, entropy, MG = 30, LP = 10 | 0.571
DT-CART, entropy, MG = 20, LP = 10 | 0.594
DT-CART, entropy, MG = 20, LP = 20 | 0.511
3-NN, Manhattan, uniform | 0.865
3-NN, Manhattan, distance | 0.914
3-NN, Euclidean, distance | 0.848
5-NN, Manhattan, distance | 0.865
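The pre-registered selection rule (outlier-class F1 > 0.60) amounts to a simple filter over the screened configurations. A sketch using a subset of the F1 values reported in Table 7 (the dictionary keys are our shorthand for the configurations):

```python
# Candidate configurations and their outlier-class F1 (values from Table 7).
candidates = {
    "SVM (rbf, C=10)": 0.797,
    "SVM (rbf, C=100)": 0.797,
    "SVM (poly, C=10)": 0.532,
    "DT-CART (entropy, MG=20, LP=10)": 0.594,
    "3-NN (Manhattan, uniform)": 0.865,
    "3-NN (Manhattan, distance)": 0.914,
}

THRESHOLD = 0.60  # pre-registered outlier-class F1 cut-off
eligible = {name: f1 for name, f1 in candidates.items() if f1 > THRESHOLD}
# Every DT-CART configuration falls below the threshold, so only
# SVM (rbf) and k-NN models enter the ensemble pool.
```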
Table 8. Sample configurations of homogeneous ensemble classifiers for n = 2 and n = 3; only base models with outlier-class F1 > 0.60 were eligible. Abbreviations: Manh, Manhattan distance; Eucl, Euclidean distance; u, uniform weighting; d, distance-based weighting.
n | Base Classifier | Homogeneous Ensemble Configuration
2 | SVM | SVM (rbf, C = 10) + SVM (rbf, C = 100)
2 | k-NN | 3-NN (Manh, u) + 3-NN (Manh, d)
3 | k-NN (variant 1) | 3-NN (Manh, u) + 3-NN (Manh, d) + 3-NN (Eucl, d)
3 | k-NN (variant 2) | 3-NN (Manh, u) + 3-NN (Manh, d) + 5-NN (Manh, d)
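Hard voting, used in several of the ensembles above, reduces to a per-sample majority vote over the base models’ predicted labels. A library-free sketch with made-up predictions (0 = normal, 1 = outlier):

```python
from collections import Counter

def hard_vote(predictions):
    """Majority vote over per-model label lists; on a tie, the label
    encountered first wins (Counter preserves insertion order)."""
    n_samples = len(predictions[0])
    voted = []
    for i in range(n_samples):
        labels = [model_preds[i] for model_preds in predictions]
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted

# Three k-NN variants disagreeing on a few segments (illustrative only):
preds = [
    [1, 0, 1, 0, 1],  # 3-NN (Manh, u)
    [1, 0, 0, 0, 1],  # 3-NN (Manh, d)
    [1, 1, 1, 0, 1],  # 3-NN (Eucl, d)
]
print(hard_vote(preds))  # majority label per segment
```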
Table 9. Evaluation metrics for homogeneous ensemble classifiers on the Bonn EEG dataset. Values are reported as percentages. The best value in each column within a classifier block is shown in bold.
Ensemble (Homogeneous) | k-NN ACC | k-NN P | k-NN F1 | k-NN Recall | SVM ACC | SVM P | SVM F1 | SVM Recall
Voting (2, hard) | 92.5% | 90.2% | 91.4% | 90.8% | 66.3% | 100.0% | 79.7% | 88.7%
Voting (3, hard) | 92.5% | 90.2% | 91.4% | 90.8% | 66.3% | 100.0% | 79.7% | 88.7%
Voting (4, hard) | 91.3% | 92.4% | 91.8% | 92.1% | 36.3% | 100.0% | 53.2% | 69.5%
Voting (2, soft) | 92.5% | 90.2% | 91.4% | 90.8% | 90.0% | 84.7% | 87.2% | 85.9%
Voting (3, soft) | 92.5% | 90.2% | 91.4% | 90.8% | 87.5% | 87.5% | 87.5% | 87.5%
Voting (4, soft) | 92.5% | 90.2% | 91.4% | 90.8% | 86.3% | 85.2% | 85.7% | 85.5%
Stacking (2) | 91.3% | 91.3% | 91.3% | 91.3% | 91.3% | 84.9% | 88.0% | 86.4%
Stacking (3) | 95.0% | 86.4% | 90.5% | 88.4% | 86.3% | 93.2% | 89.6% | 91.3%
Stacking (4) | 88.8% | 91.0% | 89.9% | 90.4% | 86.3% | 93.2% | 89.6% | 91.3%
Bagging (2) | 80.0% | 80.0% | 80.0% | 80.0% | 66.3% | 91.4% | 76.8% | 83.0%
Bagging (3) | 87.5% | 83.3% | 85.4% | 84.3% | 76.3% | 91.0% | 83.0% | 87.0%
Bagging (4) | 88.8% | 76.3% | 82.1% | 79.1% | 75.0% | 93.8% | 83.3% | 88.2%
Table 10. Evaluation metrics for homogeneous ensemble classifiers on the GBNE dataset. Values are reported as percentages. The best value in each column within a classifier block is shown in bold.
Ensemble (Homogeneous) | k-NN ACC | k-NN P | k-NN F1 | k-NN Recall | SVM ACC | SVM P | SVM F1 | SVM Recall
Voting (2, hard) | 80.3% | 46.9% | 59.2% | 52.3% | 81.6% | 48.4% | 60.8% | 53.9%
Voting (3, hard) | 82.9% | 44.7% | 58.1% | 50.5% | 81.6% | 48.4% | 60.8% | 53.9%
Voting (4, hard) | 82.9% | 46.0% | 59.1% | 51.7% | 68.4% | 57.8% | 62.7% | 60.2%
Stacking (2) | 72.4% | 78.6% | 75.3% | 76.9% | 80.3% | 47.3% | 59.5% | 52.7%
Stacking (3) | 72.4% | 78.6% | 75.3% | 76.9% | 85.5% | 44.2% | 58.3% | 50.3%
Stacking (4) | 67.1% | 76.1% | 71.3% | 73.7% | 86.8% | 44.9% | 59.2% | 52.1%
Bagging (2) | 82.9% | 40.4% | 54.3% | 46.3% | 11.8% | 81.8% | 20.6% | 33.3%
Bagging (3) | 80.3% | 38.8% | 52.4% | 44.6% | 11.8% | 75.0% | 20.4% | 32.2%
Bagging (4) | 82.9% | 39.4% | 53.4% | 45.7% | 11.8% | 81.8% | 20.6% | 33.3%
Table 11. Examples of automatically generated heterogeneous ensembles for n = 2 and n = 3. Models failing to meet the F1 > 0.60 selection threshold (e.g., DT-CART) were excluded from the ensemble generation process.
n = 2:
SVM (rbf, C = 10) + 3-NN (Manh, u)
SVM (rbf, C = 10) + 3-NN (Manh, dist)
SVM (rbf, C = 10) + 3-NN (Eucl, dist)
SVM (rbf, C = 10) + 5-NN (Manh, dist)
SVM (rbf, C = 100) + 3-NN (Manh, u)
SVM (rbf, C = 100) + 3-NN (Manh, dist)
SVM (rbf, C = 100) + 3-NN (Eucl, dist)
SVM (rbf, C = 100) + 5-NN (Manh, dist)
n = 3:
SVM (rbf, C = 10) + SVM (rbf, C = 100) + 3-NN (Manh, u)
SVM (rbf, C = 10) + SVM (rbf, C = 100) + 3-NN (Manh, dist)
SVM (rbf, C = 10) + SVM (rbf, C = 100) + 3-NN (Eucl, dist)
SVM (rbf, C = 10) + SVM (rbf, C = 100) + 5-NN (Manh, dist)
SVM (rbf, C = 10) + 3-NN (Manh, u) + 3-NN (Manh, dist)
SVM (rbf, C = 10) + 3-NN (Manh, u) + 3-NN (Eucl, dist)
SVM (rbf, C = 100) + 3-NN (Manh, u) + 3-NN (Manh, dist)
SVM (rbf, C = 100) + 3-NN (Eucl, dist) + 5-NN (Manh, dist)
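For heterogeneous members, soft voting averages each model’s per-class probabilities before taking the argmax, which is how an SVM and a k-NN with different confidence profiles can be combined. A minimal sketch with illustrative probabilities (not from the experiments):

```python
def soft_vote(prob_lists):
    """Average per-class probabilities across models and pick the argmax class."""
    n_models = len(prob_lists)
    voted = []
    for sample_probs in zip(*prob_lists):          # iterate over samples
        n_classes = len(sample_probs[0])
        avg = [sum(p[c] for p in sample_probs) / n_models
               for c in range(n_classes)]
        voted.append(max(range(n_classes), key=avg.__getitem__))
    return voted

# Class probabilities for two segments from an SVM and a 3-NN (made up):
svm_probs = [(0.30, 0.70), (0.80, 0.20)]
knn_probs = [(0.40, 0.60), (0.45, 0.55)]
print(soft_vote([svm_probs, knn_probs]))  # [1, 0]
```

Hard voting instead takes a majority over the predicted labels, which is why its precision/recall trade-off can differ sharply from the soft variant in the tables below.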
Table 12. Evaluation metrics for heterogeneous ensembles on the Bonn dataset. Values are reported as percentages. Best values per column are in bold.
Ensemble | ACC | P | F1 | Recall
Voting (2, hard) | 93.8% | 89.3% | 91.5% | 90.4%
Voting (soft) | 93.8% | 89.3% | 91.5% | 90.4%
Stacking | 95.0% | 90.5% | 92.7% | 91.6%
Voting (3, hard) | 93.2% | 88.4% | 90.7% | 89.5%
Voting (4, hard) | 92.9% | 87.6% | 90.1% | 88.8%
Stacking 2 | 94.4% | 89.8% | 92.0% | 90.9%
Stacking 3 | 94.7% | 90.2% | 92.4% | 91.2%
Stacking 4 | 95.0% | 90.5% | 92.7% | 91.6%
Bagging 2 | 91.0% | 86.2% | 88.5% | 87.3%
Bagging 3 | 91.3% | 86.7% | 88.8% | 87.7%
Table 13. Evaluation metrics for heterogeneous ensembles on the GBNE dataset. Values are reported as percentages. Best values per column are in bold.
Ensemble | ACC | P | F1 | Recall
Voting (2, hard) | 46.0% | 71.4% | 56.0% | 62.8%
Voting (soft) | 42.0% | 89.0% | 57.0% | 69.5%
Stacking | 85.5% | 45.8% | 59.6% | 51.8%
Bagging | 74.0% | 52.0% | 61.0% | 56.1%
Voting (3, hard) | 47.2% | 72.0% | 56.5% | 63.3%
Voting (4, hard) | 48.0% | 72.8% | 57.0% | 63.8%
Stacking 2 | 84.2% | 46.0% | 59.2% | 51.8%
Stacking 3 | 84.8% | 45.5% | 59.5% | 51.6%
Stacking 4 | 85.0% | 46.2% | 59.8% | 52.0%
Bagging 2 | 73.8% | 51.8% | 60.8% | 56.0%
Bagging 3 | 74.3% | 52.5% | 61.2% | 56.3%
Duraj, A.; Łukasik, N.; Szczepaniak, P.S. Outlier Detection in EEG Signals Using Ensemble Classifiers. Appl. Sci. 2025, 15, 12343. https://doi.org/10.3390/app152212343
