Article

FDSTCN-EEG: Federated Depthwise Separable Temporal Convolutional Networks for Decentralized EEG Seizure Detection

1
Centre for Advanced Analytics, CoE for Artificial Intelligence, Multimedia University, Jalan Ayer Keroh Lama, Bukit Beruang, Melaka 75450, Malaysia
2
Faculty of Information Science and Technology, Multimedia University, Jalan Ayer Keroh Lama, Bukit Beruang, Melaka 75450, Malaysia
*
Author to whom correspondence should be addressed.
AI 2026, 7(3), 101; https://doi.org/10.3390/ai7030101
Submission received: 6 January 2026 / Revised: 3 March 2026 / Accepted: 6 March 2026 / Published: 10 March 2026
(This article belongs to the Section Medical & Healthcare AI)

Abstract

In this paper, we propose FDSTCN-EEG, a customized federated learning framework for EEG-based seizure detection that leverages deep depthwise separable temporal convolutions and asynchronous model aggregation. The design tackles major problems in distributed healthcare AI by jointly improving computational efficiency, training speed, and classification performance. Our key contributions are threefold: first, high architectural efficiency through depthwise separable temporal convolutions, which reduce parameters by 40.4% (9.8M to 5.8M) while maintaining an accuracy of 96.96%; second, training that is 38.5% faster than synchronous learning via an asynchronous aggregation protocol; and third, a privacy-preserving decentralized learning model that never shares raw EEG data and copes with heterogeneous clinical technology infrastructure. Extensive experiments show superior performance (accuracy: 96.96%, F1-score: 97.02%) and practical viability for real-world seizure monitoring systems. This work introduces a practical privacy-preserving medical AI paradigm that balances model efficiency, training scalability, and clinical-grade accuracy.

1. Introduction

Epilepsy is one of the most prevalent neurological disorders, affecting about 50 million people worldwide, approximately 4 to 10 per 1000 population [1]. The main test for the diagnosis of seizures is the EEG [1]. While the application of deep learning to seizure classification has been remarkably successful, current methods have three major limitations: (1) privacy issues, because EEG data must first be centralized [2]; (2) the extremely high computational demands of traditional DNNs for implementation on edge devices [3]; and (3) the performance degradation of synchronous federated learning when applied to clinical or real-world IoT datasets that are heterogeneous in nature [4].
Federated learning (FL) provides a decentralized approach, and the aggregated model can be trained on local devices or institutional servers without transmitting raw EEG data, which reduces privacy concerns [5]. However, a major limitation of classical FL is the synchronous aggregation methodology, which can be affected by straggler (slow worker) effects that will slow down the updating process [6]. Asynchronous approaches to FL address this drawback by allowing clients to update the server in a non-blocking manner, and are resilient against the varying network conditions and device capabilities that occur in realistic scenarios [7].
To jointly tackle the privacy preservation, computation, and practical deployment challenges, we present FDSTCN-EEG, an innovative framework that integrates:
  • Depthwise separable Temporal Convolutional Networks—We propose a compact model which replaces plain convolutions with depthwise operations, significantly reducing the number of parameters while keeping its temporal-feature extracting ability, an important factor for analyzing EEG data [8].
  • Adaptive asynchronous FL aggregation—We propose a time-aware aggregation algorithm that weights client updates by the transmission delay and relevance of datasets, resulting in faster convergence compared to synchronous baselines [9].
In summary, our key contributions include:
  • The first application of depthwise separable TCNs with asynchronous FL in EEG-based seizure detection.
  • A thorough clinical-grade evaluation on datasets recorded in clinics, achieving an accuracy of 96.96% and a 40.4% reduction in communication overhead.
  • Asynchronous aggregation for training the TCN in FL, which provides up to 38.5% faster convergence compared with the synchronous aggregation approach.
This work fills a critical gap by combining privacy-preserving distributed learning with on-device efficiency for large-scale, real-world seizure monitoring systems. Federated learning is illustrated in Figure 1.

2. Related Works

Traditional machine learning approaches for seizure classification relied on handcrafted features (e.g., spectral power, entropy) with shallow classifiers such as SVMs [10] or random forests [11]. While interpretable, these methods often struggled to generalize across diverse patient populations. Deep learning has been a game changer in the field: CNNs [12] and LSTMs [13] achieved better accuracy by automatically learning spatiotemporal EEG patterns. Later, Temporal Convolutional Networks (TCNs) emerged as an attractive alternative, offering both parallel processing and the ability to capture long-range dependencies via dilated convolutions [8]. Yet such architectures usually require centralized storage of the EEG records, which raises privacy issues under medical data laws [14].
The FL paradigm is increasingly acknowledged as a secure approach for distributed medical data analysis. Existing works demonstrate FL’s effectiveness in diagnosing brain disorders with MRI and EEG [15], although most of them use synchronous aggregation, which is vulnerable to straggler effects in practical deployments with heterogeneous devices. Asynchronous FL methods address this flexibly by allowing clients to participate asynchronously [16], and some integrate staleness-aware weighting [17] or clustered aggregation [18] to cope with non-IID data. Specifically, [19] introduced an FL framework for epileptic seizure prediction with a computationally expensive CNN combined with a Bi-LSTM, and [20] used lightweight models but did not consider asynchronous communication; our work fills this gap.
Model efficiency is paramount when deploying on edge devices in healthcare. Depthwise separable convolutions, first popularized by MobileNet [21], are effective for EEG analysis because they disentangle spatial and channel-wise feature learning [22]. To the best of our knowledge, no study in seizure EEG classification has applied depthwise separable TCNs within asynchronous FL; this combination is a contribution of our work toward privacy-preserving, real-time capability.
The existing literature suffers from three major limitations: (1) the majority of FL-EEG research exploits synchronous aggregation, which does not suit real-world device heterogeneity; (2) previous lightweight designs (such as pruned CNNs) sacrifice the temporal modeling ability that is essential for seizure detection and characterization; and (3) no prior work has jointly optimized separable TCNs with asynchronous FL schemes. Our work consolidates these dimensions, enabling privacy-preserving, low-latency seizure monitoring.

3. Methodology

Despite the established competence of deep learning at automatic seizure classification, existing approaches face three main challenges: (1) centralized collection of EEG data violates patient privacy, (2) traditional models such as CNNs and RNNs are too computationally heavy for real-world edge deployment, and (3) synchronous federated training performs suboptimally across heterogeneous healthcare devices. In particular, the integration of a depthwise separable TCN architecture, which optimizes efficiency and temporal modeling, with asynchronous FL aggregation, which is crucial for handling device heterogeneity, has not been studied. We bridge this gap by presenting the first unified framework that combines depthwise separable TCNs with adaptive asynchronous FL, realizing confidential and low-latency seizure monitoring. Figure 2 depicts the overall process of FDSTCN-EEG on EEG signals for seizure classification.

3.1. Datasets

This research employed two publicly available EEG datasets. The first dataset was curated at the Rochester Institute of Technology and hosted by the University of California Irvine (UCI) Machine Learning Repository (available at: https://www.kaggle.com/datasets/harunshimanto/epileptic-seizure-recognition (accessed on 1 July 2025)) [23]. It contains EEG recordings from 500 subjects, each contributing a 23.6 s recording of 4097 data points sampled at 178 Hz. The recordings were divided into 23 non-overlapping, one-second epochs (178 samples each), yielding a final total of 11,500 standardized data segments across the entire dataset.
The dataset covers five clinically relevant conditions: (1) eyes open (no-seizure baseline), (2) eyes closed (no-seizure baseline), (3) activity in healthy brain regions (no-seizure control), (4) activity in tumor-affected regions (pathological, no seizure), and (5) ictal seizure activity. The 11,500 standardized EEG segments are distributed evenly across the five conditions:
  • Condition 1 (Eyes open): 2875 samples;
  • Condition 2 (Eyes closed): 2875 samples;
  • Condition 3 (Healthy brain activity): 2875 samples;
  • Condition 4 (Tumor-affected regions): 2875 samples;
  • Condition 5 (Ictal seizure activity): 2875 samples.
This categorical stratification provides strong differentiation between normal brain activity, non-epileptiform abnormalities, and epileptic seizure states, offering a thorough structure within which classification models can be established. Figure 3 illustrates a sample seizure EEG and a healthy EEG.
The second dataset was the CHB-MIT Scalp EEG Database, which comprises EEG recordings from 22 children with intractable epilepsy, collected during long-term monitoring sessions at Boston Children’s Hospital. It contains 182 labeled seizures, recorded over several days during which anti-seizure medication was withdrawn. The cleaned data comprise 11,233 samples (rows) and 36,865 features (columns), with a binary target distinguishing seizure (ictal, 1434 samples) from non-seizure (interictal, 9799 samples).
Both datasets exhibited considerable class imbalance; in the first, seizures represented only 20% of the samples (prevalence = 0.2), while the four non-seizure conditions made up the remaining 80%. To correct this imbalance and reduce bias, we performed stratified subsampling: we randomly undersampled the non-seizure classes while preserving all seizure instances, achieving a balanced number of seizure and non-seizure cases. This step helps the model learn equally from the important classes during training and evaluation.
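The stratified undersampling step can be sketched as follows (a minimal standard-library illustration; the class labels, the seizure label 5, and the even per-class split are assumptions matching the UCI layout, not the authors' exact code):

```python
import random
from collections import Counter

def undersample_to_balance(labels, seizure_label=5, seed=42):
    """Randomly undersample the non-seizure classes so that, together,
    they roughly match the seizure count. Returns the kept indices."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    seizure_idx = by_class[seizure_label]
    non_seizure = [c for c in by_class if c != seizure_label]
    # Spread the seizure count evenly over the non-seizure classes.
    per_class = len(seizure_idx) // len(non_seizure)
    kept = list(seizure_idx)  # preserve every seizure instance
    for c in non_seizure:
        kept.extend(rng.sample(by_class[c], per_class))
    return sorted(kept)

# Mimic the UCI layout: 5 conditions x 2875 samples; condition 5 is ictal.
labels = [c for c in range(1, 6) for _ in range(2875)]
kept = undersample_to_balance(labels)
counts = Counter(labels[i] for i in kept)
```

The even split (2875 // 4 = 718 per non-seizure class) yields a roughly 50/50 seizure/non-seizure balance while keeping some of every baseline condition.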

3.2. Federated Learning Setup

3.2.1. Client Selection and Count Rationale

We simulated 20 clients in our federated setting, a representative scale for hospital or edge-device deployments. This value was chosen to reflect the likely scenario where multiple patients or clinics contribute data independently.

3.2.2. Non-IID Data Simulation

Non-IID scenarios were emulated by distributing the dataset among clients according to:
  • Patient-specific models (simulating physiological differences between subjects);
  • Changes in class imbalance (clients with different numbers of seizures and non-seizures);
  • Temporal pattern diversity (varied patterns of seizure onset among clients).
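One common way to emulate such non-IID splits is label-skew partitioning, sketched below (a hypothetical illustration; the client count of 20 matches Section 3.2.1, but the classes-per-client parameter and shard sizes are assumptions, not the authors' protocol):

```python
import random

def partition_non_iid(labels, n_clients=20, classes_per_client=2, seed=0):
    """Label-skew partition: each client draws mostly from a small subset
    of classes, emulating patient-specific seizure/non-seizure mixes."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    for pool in by_class.values():
        rng.shuffle(pool)
    classes = sorted(by_class)
    client_data = [[] for _ in range(n_clients)]
    for cid in range(n_clients):
        # Each client sees only a couple of classes -> skewed local mixes.
        for c in rng.sample(classes, classes_per_client):
            pool = by_class[c]
            take = max(1, len(pool) // n_clients)
            client_data[cid].extend(pool[:take])
            del pool[:take]  # keep shards disjoint across clients
    return client_data

labels = [c for c in range(5) for _ in range(1000)]
shards = partition_non_iid(labels)
```

Because each client holds a different class mix and shard size, the resulting shards also exhibit the class-imbalance variation listed above.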

3.2.3. Privacy Guarantees

In addition to avoiding the sharing of raw data, we applied:
  • Differential privacy with ε = 0;
  • Secure aggregation using homomorphic encryption;
  • Gradient Clipping (as a possible defense against any data leakage from model updates) with threshold = 1.0.
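The gradient clipping step with threshold 1.0 can be sketched as follows (a minimal stand-alone version operating on a flat list; real implementations clip per-parameter tensors before aggregation):

```python
import math

def clip_by_global_norm(grads, max_norm=1.0):
    """Scale the whole gradient list so its global L2 norm is at most
    max_norm, a standard defence step against leakage via large updates
    (and a prerequisite for adding calibrated DP noise)."""
    total = math.sqrt(sum(g * g for g in grads))
    if total <= max_norm:
        return list(grads)  # already within the threshold
    scale = max_norm / total
    return [g * scale for g in grads]

clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)  # norm 5 -> norm 1
```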

3.3. FDSTCN-EEG: Federated Depthwise Separable Temporal Convolutional Networks

3.3.1. Depthwise Separable Temporal Convolutional Network

In this work, we use a Temporal Convolutional Network (TCN) to classify the EEG recordings. The TCN comprises dilated causal convolutions that model brain activity across time. The architecture consists of multi-layer TemporalBlocks, each containing two 1D convolutional layers, with dilation rates doubling through 1, 2, 4, and 8. The network was designed to identify both short epileptiform discharges and long ictal patterns while preserving temporal causality through Chomp1d layers, which remove the extra padding. The network also benefits from improved gradient flow during training: residual connections (via 1 × 1 convolutions where channel counts differ) and dropout layers with p = 0.2 help prevent overfitting to specific seizure EEG characteristics. The receptive field eventually extends to about 32 time steps, retaining the fine timing information useful for medical diagnosis and intervention. The TCN dilated causal convolution approach is shown in Figure 4.
The network structure exploits four feature extraction layers, which are consecutively deeper, with channel numbers from 16 to 128. Each layer is tuned to model patterns at different temporal resolutions. The use of dilated convolutions enables the model to capture dynamic seizure signatures effectively across long time-windows without incurring an exponential increase in the number of parameters as higher layers naturally aggregate information from broader temporal windows. After each convolutional operation, batch normalization and ReLU activation are used to stabilize the gradient flow during training.
Causality is enforced through careful padding and filtering so that no information leaks from future time steps. This property is essential for real-time seizure detection, where predictions must be based only on past and present inputs.
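The quoted receptive field of about 32 time steps is consistent with counting one dilated convolution per level (kernel size 3, dilations 1, 2, 4, 8); counting both convolutions of each TemporalBlock would give a larger field. A small sketch of the arithmetic:

```python
def receptive_field(kernel_size=3, dilations=(1, 2, 4, 8), convs_per_level=1):
    """Receptive field of a stack of dilated causal convolutions:
    each convolution at dilation d widens the field by (k - 1) * d."""
    rf = 1
    for d in dilations:
        rf += convs_per_level * (kernel_size - 1) * d
    return rf

rf = receptive_field()  # 1 + 2 * (1 + 2 + 4 + 8) = 31, i.e. "about 32"
```

The kernel size of 3 here is an assumption consistent with the parameter counts given in Section 3.3.1.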
The architecture transitions from feature extraction to classification through a two-step procedure. First, a pointwise (1 × 1) convolution with 128 input and 64 output channels reduces the channel dimension from 128 to 64 without changing the temporal length. These features are then flattened and fed to a fully connected layer with SoftMax activation for binary seizure/non-seizure classification. Unlike global pooling methods, this design preserves fine temporal information and remains sensitive to short-duration epileptiform events.
The final architecture is also efficient, and in part owes its efficiency to the use of dilated convolutions and residual connections, as well as deliberate reduction in dimensionality. It has been shown that it is capable of better handling long EEG sequences compared to regular RNNs, and yet it is more computationally economical than usual CNNs in such temporal classification. As a result, the introduced TCN model reaches an optimal trade-off between temporal resolution, model complexity and classification performance for seizure detection. The architecture of the TCN is shown in Figure 5.
The Temporal Convolutional Network (TCN) used in this study is structured hierarchically through successive dilated causal convolutions, modeling both the short-duration components of epileptiform discharges and the long-range structure of ictal patterns more efficiently than previous methods. The main network consists of four successive TemporalBlocks with two convolutional layers each, where the dilation rate grows exponentially from block to block (1-2-4-8). This progression of dilation gives the model a large receptive field, up to about 32 time steps, without the computational burden of a deeper network or larger kernels, enabling multi-scale feature learning. Growing dilation spreads the kernel taps progressively further apart, so each convolution extracts features from a sliding window at increasingly broad intervals; for example, a dilation rate of 8 means the kernel samples every eighth time step, summarizing events over long temporal sequences. Each convolution is made causal through padded Chomp1d layers. Because standard 1D convolutions with symmetric padding add elements to both ends of the sequence, they accidentally include future information. The Chomp1d module addresses this by slicing the output after each convolution to discard the extra padding at the end of the sequence, ensuring that an output at time t depends only on inputs at and before time t. This is a key requirement for processing real-time seizure data, since predictions must be computed from ongoing brain activity without access to future frames.
After each dilated convolution and Chomp1d operation, the network applies batch normalization to normalize activations in each layer, enabling faster training. Batch normalization normalizes activations by centering and scaling the output of every layer with the mean and standard deviation calculated across the mini-batch dimension, which helps reduce internal covariate shift as well as allows higher learning rates. Normalizing the inputs to each layer is especially relevant for applications involving EEG, as signal amplitudes can vary drastically between patients and recording sessions; normalizing helps maintain consistent gradient flow, thus preventing the model from over-emphasizing high-amplitude features. The TemporalBlock consists of two such convolution–normalization–activation sequences (convolution → Chomp1d → batch normalization → ReLU → dropout) with a residual connection skipping both convolutions. This residual path through identity mapping, or a 1 × 1 convolution if changing the number of channels, allows gradients to flow directly through it back to the earlier layers, averting the vanishing-gradient problem present in deep networks. The four blocks increase channel dimensions from 16 to 128; this also corresponds to increasingly abstract representations built by the neural network: low dimensionalities (16 → 32 channels) capture fine temporal detail, such as spike–wave discharges, while high dimensionalities (64 → 128 channels) model richer spatiotemporal patterns of seizure evolution. Finally, at the end of the TemporalBlock, we have a pointwise 1 × 1 convolution to reduce the channel dimension from 128 to 64 and flatten the output before sending it for classification as a binary seizure using a fully connected layer with SoftMax activation. 
Notably, the inclusion of batch normalization and residual connections allows for both enhanced training stability and efficient depth, ensuring that the network captures fine-grained temporal information through causal constraints in order to provide robust seizure detection across heterogeneous EEG recordings.
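The causal-convolution behavior described above can be checked with a tiny sketch: left-padding by (k − 1) × dilation is equivalent to symmetric padding followed by Chomp1d, and an impulse in the input only affects outputs at the same or later time steps (a pure-Python illustration, not the authors' implementation):

```python
def causal_dilated_conv1d(x, w, dilation):
    """Causal dilated 1D convolution: left-pad by (k - 1) * dilation so
    the output at time t depends only on inputs at times <= t. This
    matches symmetric padding followed by a Chomp1d trim on the right."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = [0.0] * pad + list(x)
    # y[t] = sum_i w[i] * x[t - i * dilation], with zeros before t = 0
    return [sum(w[i] * xp[t + pad - i * dilation] for i in range(k))
            for t in range(len(x))]

x = [0.0] * 16
x[8] = 1.0  # unit impulse at t = 8
y = causal_dilated_conv1d(x, [1.0, 1.0, 1.0], dilation=2)
# causality: outputs before t = 8 are untouched by the impulse
```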
The applied depthwise separable convolution splits an ordinary 1D convolution into two separate convolutions: a depthwise and a pointwise convolution. The depthwise convolution operates on each input channel independently, while the pointwise convolution combines information across all channels, yielding large computational savings while largely preserving representational capacity. The depthwise convolution applies exactly one convolutional filter per input channel: if the input has C_in channels, it uses C_in separate kernels, each of size k × 1, which operate purely on their associated channel without any cross-channel interaction. This is implemented by setting the number of groups equal to the number of input channels C_in, so that each channel receives its own processing. The output of this step therefore has the same channel depth as the input, but the information is spatially filtered per channel. The pointwise convolution then performs a 1 × 1 convolution along the channel dimension to linearly combine the outputs of all depthwise channels, producing the final output with the desired number of channels C_out. In doing so, it mixes information between channels and learns how best to combine the spatial features computed by the depthwise layer.
To put it simply, a depthwise separable convolution is a smarter way of computing the same information in two simple stages. A standard convolution is as if every person on a team were attempting to piece together every bit of a complicated puzzle all at once, requiring enormous effort and communication. A depthwise separable convolution instead lets each person specialize on one puzzle piece, studying it with care and becoming an expert on its unique patterns; this is the depthwise part, where each channel is processed independently. A supervisor then reviews all the specialists' findings and merges them to see how the individual pieces fit into the whole; this is the pointwise part, where a simple 1 × 1 view mixes information across channels. This division of labor gives the model the same insight into the data with far less effort, much as a well-organized kitchen produces dishes faster by distributing chopping, stirring, and combining tasks among several cooks rather than having one chef do everything. In short, this efficiency allows the model to remain highly accurate while needing many fewer parameters, so it can run even on small medical devices (such as portable EEG edge devices) that lack the computational power of a full workstation.
Mathematically, for an input tensor $X \in \mathbb{R}^{N \times C_{in} \times L}$ and kernel $W \in \mathbb{R}^{C_{out} \times C_{in} \times k}$, a standard convolution computes $Y = X \circledast W$, where $\circledast$ denotes cross-correlation. The depthwise separable variant first applies a channel-wise spatial convolution ($W_{depth} \in \mathbb{R}^{C_{in} \times 1 \times k}$) in Equation (1):

$$Y_{depth}[c, l] = \sum_{i=0}^{k-1} X[c, l+i] \cdot W_{depth}[c, 0, i] \quad (1)$$

This is then followed by a $1 \times 1$ pointwise convolution ($W_{point} \in \mathbb{R}^{C_{out} \times C_{in} \times 1}$) in Equation (2):

$$Y[c, l] = \sum_{m=0}^{C_{in}-1} Y_{depth}[m, l] \cdot W_{point}[c, m, 0] \quad (2)$$

This factorization reduces the parameter count from $O(k \cdot C_{in} \cdot C_{out})$ to $O(k \cdot C_{in} + C_{in} \cdot C_{out})$, achieving roughly threefold compression when $k = 3$ and $C_{in} = C_{out} = 128$, as in the final TemporalBlock. Figure 6 illustrates the architecture of the depthwise separable Temporal Convolutional Network in the FDSTCN-EEG.
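The parameter arithmetic can be verified directly (a minimal sketch; note that for 1D kernels with k = 3 and 128 channels the compression works out to roughly 2.9×, whereas the well-known 8-9× figure from MobileNet applies to 3 × 3 2D kernels):

```python
def conv1d_params(c_in, c_out, k):
    """Weights of a standard 1D convolution (bias ignored)."""
    return k * c_in * c_out

def separable_conv1d_params(c_in, c_out, k):
    """Depthwise (one k-tap filter per input channel) plus
    pointwise (1x1 cross-channel mixing) weight counts."""
    depthwise = k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

std = conv1d_params(128, 128, 3)            # 3 * 128 * 128 = 49,152
sep = separable_conv1d_params(128, 128, 3)  # 3 * 128 + 128 * 128 = 16,768
ratio = std / sep                           # ~2.9x compression, ~66% fewer
```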
The DepthwiseSeparableConv1d module replaces the usual convolutions in TemporalBlocks but keeps the residual structure. Each block processes EEG signals through two depthwise separable convolution branches with ReLU activation and a dropout value of 0.2. An exponentially expanding receptive field is obtained by multiplying the dilation rate by a factor of 2 at each layer (1,2,4,8), although the depthwise operations guarantee this expansion to still be computationally affordable.
For EEG inputs with L = 178 time steps and C_in = 128 channels, the depthwise separable convolutions in the final TemporalBlock reduce the parameter count from 3 × 128 × 128 = 49,152 for standard convolutions to 3 × 128 + 128 × 128 = 16,768, a reduction of about 66%. Thus, the model remains suitable for edge deployment without sacrificing temporal precision, because (1) causal dilation prevents forward-looking label contamination, (2) the residual pathway retains signal quality and can make relationships easier to learn, and (3) the pointwise convolution allows feature recombination across channels. The 22,784-dimensional flattened feature map (128 × 178) acts as a compact representation of both oscillatory signatures (encoded by the depthwise layers) and inter-channel interactions (embedded in the pointwise operations), providing an efficient basis for seizure detection.

3.3.2. Asynchronous Federated Aggregation Method

The proposed asynchronous federated aggregation framework enables dynamic decentralized training in which synchronized client updates are not needed. In essence, the system is based on a staleness-adaptive weighted average that automatically assigns a different weight to each client according to the arrival time of its most recent contribution, so as to privilege recent contributions and discount older ones. This is implemented with a server-side priority queue for immediate update processing and a momentum-based merge algorithm that absorbs variability in the incoming update stream. Moreover, the algorithm checks update compatibility by comparing similarity against a moving average of recent gradients, so that only consistent parameter-level changes are assimilated into the global model. Pseudocode for the asynchronous federated aggregation process is given as follows:
    METHOD async_client_train(client):
        client.train()

        ACQUIRE update_lock
        round_num = self.current_round
        model_update = deep_copy(client.model)
        weight = client.train_samples

        # Calculate staleness (delay between client's round and current round)
        staleness = round_num - client.last_round_participated

        # Store update with staleness info
        self.received_updates[round_num][client.id] = (model_update, weight, staleness)

        # Pass staleness to aggregation
        self.aggregate_update(model_update, weight, staleness)

        NOTIFY condition
        RELEASE update_lock

    METHOD aggregate_update(model_update, weight, staleness):
        ACQUIRE model_lock
        self.global_counter += 1

        # Staleness-aware weighting (polynomial decay: alpha(t) = (staleness + 1)^(-beta))
        beta = 0.5  # decay factor (tuned hyperparameter)
        staleness_weight = (staleness + 1) ** (-beta)

        effective_weight = weight * staleness_weight
        self.total_weight += effective_weight

        # Weighted aggregation
        FOR server_param, client_param IN zip(self.temp_model.parameters(), model_update.parameters()):
            server_param.data += client_param.data * effective_weight

        IF sufficient_updates_received:
            NORMALIZE global_model by total_weight
        RELEASE model_lock
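The staleness-aware weighting at the heart of the aggregation, α(t) = (staleness + 1)^(−β) with β = 0.5, can be illustrated in isolation (a minimal sketch; the sample counts are hypothetical):

```python
def staleness_weight(staleness, beta=0.5):
    """Polynomial decay alpha(t) = (staleness + 1) ** (-beta): a fresh
    update (staleness 0) keeps full weight; older ones are discounted."""
    return (staleness + 1) ** (-beta)

def effective_weight(n_samples, staleness, beta=0.5):
    """Client influence = dataset size scaled by the staleness decay,
    as in the aggregate_update pseudocode above."""
    return n_samples * staleness_weight(staleness, beta)

w_fresh = effective_weight(1000, staleness=0)  # full weight: 1000.0
w_stale = effective_weight(1000, staleness=3)  # 1000 * 4^-0.5 = 500.0
```

With β = 0.5 an update three rounds old still contributes half its nominal weight, so slow clients are discounted rather than discarded.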
Our approach makes several critical technical contributions and is both simpler and more accurate in practice. On the client side, training with the compressed representations of depthwise separable networks reduces communication overhead for large-scale deployment. On the server side, update management combines staleness-aware decay with concurrent processing of received contributions. The framework exhibits better convergence properties than its synchronous counterparts, even in highly irregular environments. It remains robust under non-IID data distributions through client-wise normalization, and stability is ensured by L2 regularization with an update validation protocol.
A key difference between asynchronous and synchronous aggregation is in the underlying coordination model and in their ability to cope with system heterogeneity. Using synchronous aggregation, the server has to wait until all of those chosen clients have reported before model merging. This results in a delay that is bottlenecked by the slower clients (or stragglers), such as those with low computational power or those with bad network connections, which can slow down the whole training process. This stringent synchronization restriction often leads to very inefficient utilization of resources, as faster clients are idle while they wait for slow peers.
On the other hand, an asynchronous strategy eliminates this rigid coordination: when updates are received, the server incorporates them immediately without needing to know the state of other clients. This non-blocking paradigm improves resource utilization and speeds up convergence, especially in very heterogeneous device environments. A key ingredient of the asynchronous approach is its staleness-adaptive weighting, a more subtle notion than the simple synchronous average. This mechanism modulates the influence of each update over time through momentum-based fusion and a consistency check against the global model trajectory. This architectural benefit makes asynchronous aggregation well-suited to real-world deployment, where device capacities and network quality vary. The operational contrast between these two paradigms is depicted in Figure 7.

4. Results and Discussion

4.1. Training Performance

To systematically assess the contributions of our architectural design choices, we conduct an ablation study analyzing the isolated and combined effects of the two major architectural enhancements in FDSTCN-EEG: (1) depthwise separable convolution layers for parameter efficiency and (2) an asynchronous aggregation protocol for faster training. This systematic investigation compares four variants: the baseline TCN with synchronous aggregation, the TCN with asynchronous aggregation, the depthwise separable TCN with synchronous aggregation, and our full FDSTCN-EEG. The quantitative results for total training time over 30 rounds are shown in Table 1, and the corresponding comparison of training loss is shown in Figure 8. This experimental design makes it possible to quantify the contribution of each component to overall efficiency and performance.
The results, aggregated over 100 independent training runs, show that both the model and aggregation optimizations contribute to better training efficiency. The depthwise separable TCN with asynchronous aggregation (Lightweight FedAsync TCN) is substantially faster than the normal TCN with synchronous aggregation (4658.63 s) and the normal TCN with asynchronous aggregation (3788.13 s). The 95% CI for the FDSTCN-EEG ([2851.61, 2872.45]) indicates performance that is both consistent and stable, compared with the wider interval of [4651.71, 4670.01] for the normal TCN with synchronous aggregation. Standard deviations were similar across all models, ranging from 46.12 to 52.61 s, which indicates that the variability of training times was well-behaved across experiments. These results show that adding depthwise separable convolutions produced significant time savings with stable performance across repeated experiments.
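The reported interval widths are consistent with a normal-approximation 95% CI over 100 runs, mean ± 1.96·σ/√n; with σ ≈ 50 s and n = 100 the half-width is about 10 s (the mean and standard deviation below are illustrative values, not exact figures from Table 1):

```python
import math

def ci95(mean, std, n):
    """Normal-approximation 95% confidence interval for a mean estimated
    from n independent runs: mean +/- 1.96 * std / sqrt(n)."""
    half = 1.96 * std / math.sqrt(n)
    return (mean - half, mean + half)

# Assumed mean of ~2862 s and std of ~50 s over 100 runs gives a
# half-width near 10 s, the same order as the reported interval.
lo, hi = ci95(2862.0, 50.0, 100)
```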
As shown in Figure 8, the training loss curves reveal important convergence characteristics under the different model architectures and aggregation approaches. For the base TCN setup, asynchronous aggregation converges much faster than synchronous aggregation to similar loss values (452 s vs. 616 s). This 26.6% faster time to convergence reflects the efficiency advantage of asynchronous over synchronous updating: no waiting for straggler nodes is required, and model updates are more frequent. For the depthwise separable TCN variants, the advantage of asynchronous aggregation is even larger: the FDSTCN-EEG takes only 597 s to converge, while its synchronous counterpart requires 1622 s, corresponding to a substantial 63.2% faster convergence.
Interestingly, the results also show that the depthwise separable TCN converges more slowly overall than the standard TCN (597 s vs. 452 s in the asynchronous setup), which can be attributed to architectural differences. The depthwise separable design factorizes spatial and channel information sequentially rather than jointly, which slows feature learning early in training while being far more parameter-efficient. This trade-off between early convergence speed and long-term model efficiency is a common trait of factorized convolution architectures. The markedly worse performance of synchronous aggregation with depthwise separable TCNs (1622 s) indicates that lightweight models are more sensitive to update delays, making asynchronous methods the more appropriate choice for training efficiency. Together, these results suggest that although depthwise separable architectures may converge more slowly at first, their combination with asynchronous aggregation achieves the best trade-off between training efficiency and model compactness in a federated learning setting. The parameter counts of the baseline TCN and the proposed FDSTCN-EEG are listed in Table 2.
The parameter count analysis reveals significant architectural efficiency gains of FDSTCN-EEG over the standard TCN implementation. The normal TCN architecture contains 9,769,436 trainable parameters, while FDSTCN-EEG reduces this by 40.4% to 5,823,508 parameters through its use of depthwise separable convolutions. This reduction is achieved without sacrificing performance, as shown by the similar convergence behavior in the training loss curves of Figure 8b,d. The efficiency stems from the depthwise separable architecture, which separates spatial and channel-wise feature learning and thereby eliminates redundant parameters in the convolutional operations without degrading the model's representational power.
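The source of the savings can be made concrete by counting parameters for a single layer. The channel sizes and kernel length below are illustrative, not the exact FDSTCN-EEG configuration; the per-layer ratio (~6.5× here) is diluted at the network level by dense layers and other non-convolutional parameters, which is consistent with the overall 40.4% reduction.

```python
# Sketch: parameter counts for a standard 1-D convolution vs. its depthwise
# separable factorization (depthwise temporal filter + 1x1 pointwise mixing).

def conv1d_params(c_in, c_out, k, bias=True):
    """Standard conv: every output channel mixes every input channel over k taps."""
    return c_in * c_out * k + (c_out if bias else 0)

def depthwise_separable_params(c_in, c_out, k, bias=True):
    depthwise = c_in * k + (c_in if bias else 0)           # one k-tap filter per channel
    pointwise = c_in * c_out * 1 + (c_out if bias else 0)  # 1x1 channel mixing
    return depthwise + pointwise

std = conv1d_params(64, 128, 7)               # 57,472 parameters
sep = depthwise_separable_params(64, 128, 7)  # 8,832 parameters
print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.1f}x")
```
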
This parameter efficiency makes FDSTCN-EEG particularly well suited to the federated learning environment. First, the reduced model size lowers the communication overhead of client-server model updates, an issue of paramount importance for the bandwidth-limited edge devices that participate in federated training. Second, the low parameter count enables faster on-device inference, so the model runs well even on low-powered compute devices. Third, combined with asynchronous aggregation, these architectural changes yield a highly effective solution for real-world federated learning deployments that prioritizes both computational and communication efficiency. These results demonstrate that FDSTCN-EEG achieves its dual goals: preserving model performance while radically reducing the computational and communication costs of training in distributed settings.

4.2. Classification Performance

To demonstrate the effectiveness of the proposed FDSTCN-EEG, we conduct a thorough comparative study against several state-of-the-art models from the recent literature, including classical non-federated models and our baseline federated TCN implementation. The performance metrics used for comparison are accuracy, recall, precision, specificity, and F1-score, providing a comprehensive assessment of classification capability. We also trained the FDSTCN-EEG model on the UCI Seizure Dataset 100 times and report statistical results: Table 3 gives the 95% confidence intervals for each metric, and Table 4 the Wilcoxon signed-rank test results.
Table 3 reports these metrics for the FDSTCN-EEG model over 100 independent training runs (mean accuracy: 0.9696 ± 0.0022; mean F1-score: 0.9702 ± 0.0026). Each run trains the same architecture from initialization to convergence with a different random seed, completing multiple communication rounds within a single federated session; the 100 repetitions are therefore complete end-to-end trainings, not 100 communication rounds of one federated session. This rigorous evaluation methodology supports the robustness of our results: the tight 95% confidence intervals (e.g., accuracy CI [0.9689, 0.9698]) and very low standard deviations across all scores show stable performance across initializations. That this consistency holds over 100 separate runs with different random seeds and data shuffles suggests the reported performance is not a fortunate run but reflects the genuine generalization ability of the proposed architecture, providing direct evidence of the model's suitability for clinical deployment.
Wilcoxon signed-rank tests showed that all performance metrics were significantly greater than our baseline threshold (all p < 0.0001, Table 4). In pairwise comparisons, precision was significantly higher than recall (W = 37,789, p = 1.55 × 10−14), while the remaining inter-metric differences (e.g., recall vs. specificity) were smaller in magnitude. This indicates that the model is slightly but consistently better at reducing false positives than false negatives; that is, it leans marginally towards certainty in its seizure predictions over capturing every potential seizure, while the overall profile remains balanced across all measures. This marginal bias can be clinically relevant when the cost of false alarms is high.
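The W statistic behind these tests can be sketched in a few lines. Note that the single-metric W values of 125,250.0 in Table 4 equal n(n + 1)/2 for n = 500, the maximum possible sum of positive ranks, which is consistent with every paired difference exceeding the baseline threshold. The sketch below assumes W is the sum of ranks of positive differences, with zero differences dropped and ties given average ranks; library implementations may report a different convention (e.g., the smaller of the two rank sums).

```python
# Sketch: the Wilcoxon signed-rank W statistic as the sum of ranks of
# positive paired differences (zeros dropped, ties average-ranked).

def wilcoxon_w(diffs):
    d = [x for x in diffs if x != 0.0]
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):                      # assign average ranks over |d| ties
        j = i
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1                  # ranks are 1-based
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return sum(r for r, x in zip(ranks, d) if x > 0)

# All five differences positive -> W = 1+2+3+4+5 = 15 = n(n+1)/2 for n = 5.
print(wilcoxon_w([0.01, 0.02, 0.03, 0.04, 0.05]))  # 15.0
```
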
Next, the quantitative results are systematically presented in Table 5, which enables direct comparison of model performance across architectural paradigms and training methods. This comparison allows us to objectively assess the balance between model complexity and prediction accuracy, and to show the advantages of our federated learning approach.
In Table 5, the comparison of performance metrics shows that the proposed FDSTCN-EEG achieves competitive performance relative to both the non-federated counterparts and the baseline FL-TCN. Among the non-federated results, CNN-Bi-LSTM [19] reports an F1-score of 0.7760 and DNN [24] an accuracy of 0.80, whereas the federated configurations perform better on all evaluation metrics. The FL-TCN baseline surpasses the non-federated methods by a wide margin, with an accuracy of 0.9698 and an F1-score of 0.9706. Strikingly, FDSTCN-EEG maintains almost identical performance (accuracy: 0.9696; F1-score: 0.9702) with a 40.4% parameter reduction, verifying that our architectural optimizations do not sacrifice model efficacy. The balanced precision (0.9706) and recall (0.9698) indicate that the model handles both false positives and false negatives well, while the high specificity (0.9694) confirms its ability to identify negative cases correctly.
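For reference, the five reported metrics follow directly from confusion-matrix counts. The counts in this sketch are synthetic round numbers chosen for illustration, not values from the paper.

```python
# Sketch: computing the five evaluation metrics from confusion-matrix counts
# (synthetic example counts, for illustration only).

def metrics(tp, fn, fp, tn):
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "recall": tp / (tp + fn),            # sensitivity: missed seizures lower this
        "precision": tp / (tp + fp),         # false alarms lower this
        "specificity": tn / (tn + fp),       # correct rejection of non-seizure epochs
        "f1": 2 * tp / (2 * tp + fp + fn),   # harmonic mean of precision and recall
    }

m = metrics(tp=90, fn=10, fp=5, tn=95)
print(m)  # accuracy 0.925, recall 0.9, precision ~0.947, specificity 0.95, f1 ~0.923
```
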
The results also show that the federated approach causes no meaningful degradation relative to non-federated solutions. Although 1D CNN-BiLSTM + TBPTT [26] reports high specificity (0.988), it provides little information on the other metrics. By contrast, both federated TCN models have complete performance profiles with consistent scores across all four measures, indicating more robust and balanced classification ability. The performance of FDSTCN-EEG is particularly notable because it combines architectural efficiency (depthwise separable convolutions) with training efficiency (asynchronous federated aggregation). The very small accuracy gap (0.0002) between FL-TCN and FDSTCN-EEG illustrates that the proposed optimizations preserve model quality while delivering practical advantages for federated deployment setups.
Our federated learning method shows a clear edge over centralized and other federated approaches, especially in balancing performance against privacy. While CAE-ELSTM [27] is competitive with 0.9932 accuracy on the CHB-MIT dataset in a centralized setting, it depends on aggregating raw EEG data, which conflicts with medical privacy regulations (e.g., HIPAA, GDPR). More importantly, the proposed FDSTCN-EEG achieves strong accuracy (0.8591) and F1-score (0.8672) on CHB-MIT while keeping raw EEG data within each hospital and transmitting only model parameters. Our approach also outperforms the only existing federated method, Res1DCNN [15], by 4.72 percentage points in F1-score (0.8672 vs. 0.82), setting a new state of the art for federated EEG seizure detection. Lightweight SE-EEGNet [28], a centralized solution, attains an accuracy of 0.9451 even with full data access; in comparison, our approach effectively learns seizure patterns while preserving patient privacy.
In addition to preserving privacy, our federated designs offer robustness and practical deployment advantages. The centralized FCEEG model [29] exhibits a pronounced precision-recall imbalance (0.9995 recall against only 0.7586 precision) on the UCI dataset, which would lead to excessive false alarms and clinical alarm fatigue if deployed. FDSTCN-EEG, in contrast, is almost perfectly balanced (0.9698 recall and 0.9706 precision) while raising accuracy from FCEEG's 0.8766 to 0.9696, an absolute gain of more than 9 percentage points. This balance matters clinically because it limits both missed seizures and false alarms during real-world monitoring. Moreover, in cross-dataset generalization, our federated TCN consistently outperforms the non-federated alternatives on all metrics, and FDSTCN-EEG improves accuracy on CHB-MIT by 2.76 percentage points over FL-TCN (0.8591 vs. 0.8315). This demonstrates that federated learning, combined with our architectural novelties, provides not only privacy but also generalizability across patient populations and clinical environments, which are key prerequisites for scalable epilepsy monitoring systems.
The CHB-MIT results make the advantages of our approach even clearer. FDSTCN-EEG outperforms the classic approaches and our FL-TCN baseline in accuracy (0.8591 vs. 0.8315), recall (0.9148 vs. 0.8626), and F1-score (0.8672 vs. 0.8373). The recall value (0.9148) is particularly important, as it shows the model misses few seizure occurrences, which is of paramount importance in clinical use. Precision also edges above the FL-TCN baseline (0.8242 vs. 0.8135), indicating that the recall-oriented behavior does not come at the cost of precision. These findings show that federated Temporal Convolutional Networks, and the optimized FDSTCN-EEG in particular, can rival mainstream centralized methods while meeting the requirements of a distributed learning setup. Because classification performance differs between the UCI and CHB-MIT datasets, a confusion matrix on the CHB-MIT dataset is generated for failure/error analysis, as shown in Table 6.
Our model reached an accuracy of 85.91% on CHB-MIT but still missed 39 of 287 seizures (FN rate = 13.59%). Review of the false negatives identified three main patterns: (1) unilateral focal epileptic events confined to one or two EEG channels (50% of FNs); (2) high-frequency, low-amplitude discharges that mimic muscle or movement artifacts in sleep recordings (30%); and (3) onset periods contaminated by muscle or acquisition artifacts that trained clinicians could still recognize as ictal in sleep-unit recordings but not in polygraphic traces recorded at home. These errors stem from the inherent difficulties of pediatric EEG: variable seizure semiology, heavy movement interference, and the higher prevalence of focal epilepsies in children. The model's temporal receptive field (∼180 ms) may also be too short for seizures with onsets longer than 1 s. Although acceptable for outpatient screening, this FN rate would require additional clinical review before critical care use. Future work will add patient-specific adaptation and multi-modal verification to reduce misdetections.
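The quoted false-negative rate follows directly from the Table 6 counts:

```python
# Quick check: FN rate on CHB-MIT from the Table 6 confusion matrix.
fn, actual_positive = 39, 287         # missed seizures / total seizure epochs
fn_rate = fn / actual_positive
print(f"FN rate: {fn_rate:.2%}")      # 13.59%
```
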
Regarding inference latency, FDSTCN-EEG detects seizures in real time, processing each one-second EEG epoch in 230 ms, which meets the requirement for continuous monitoring. Cost-wise, relative to centralized deep learning models on a GPU server, the federated approach is estimated to cut compute costs by about 40% through distributed computation, while the parameter-efficient model reduces communication costs by 40.4%.
FL in healthcare faces significant challenges due to the heterogeneity of clinical data across hospitals. Non-IID data distributions complicate model convergence because of variation in patients, imaging protocols, and EHR systems. Resource limitations, privacy regulations, and class imbalance further hinder efficient FL implementation. These problems are especially pronounced in asynchronous FL environments, where late or biased updates may impair model performance.
To mitigate these risks, FDSTCN-EEG includes adaptive mechanisms such as dynamic client weighting and staleness-aware aggregation to manage data and system heterogeneity. The lightweight TCN architecture and gradient harmonization techniques support efficient training across diverse hospitals, while differential privacy and synthetic data augmentation address privacy and class-representation issues. Together, these strategies increase robustness and make the model practical for clinical FL deployments.
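A staleness-aware aggregation step can be sketched in the spirit of FedAsync-style polynomial staleness decay: a client update that arrives many global versions late is blended into the global model with a smaller mixing weight. The base rate and decay exponent below are illustrative assumptions, not the paper's exact hyperparameters.

```python
# Sketch: staleness-aware asynchronous aggregation. A stale client update is
# blended with a mixing weight that decays polynomially with its staleness
# (hypothetical base_alpha and decay values).

def staleness_weight(base_alpha, staleness, decay=0.5):
    """Polynomial decay: the staler the update, the less it moves the model."""
    return base_alpha * (1.0 + staleness) ** (-decay)

def aggregate(global_w, client_w, staleness, base_alpha=0.6):
    a = staleness_weight(base_alpha, staleness)
    return [(1.0 - a) * g + a * c for g, c in zip(global_w, client_w)]

w_global = [0.0, 0.0]
w_client = [1.0, 1.0]
print([round(v, 3) for v in aggregate(w_global, w_client, staleness=0)])  # [0.6, 0.6]
print([round(v, 3) for v in aggregate(w_global, w_client, staleness=8)])  # [0.2, 0.2]
```

A fresh update (staleness 0) moves the global model by the full base rate, while an update eight versions stale contributes only a third as much, limiting the damage from slow or biased clients.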

5. Conclusions

This study has introduced an efficient federated learning approach to EEG seizure detection built on the scalable FDSTCN-EEG architecture. Our model combines depthwise separable convolutions with asynchronous aggregation to balance three crucial aspects of distributed medical AI: computational efficiency, training speed, and model performance. The depthwise separable design reduced the parameter count by 40.4% (from 9.8M to 5.8M parameters) while performance remained nearly identical to that of the standard TCN (96.96% vs. 96.98%), showing that architectural efficiency need not come at the cost of diagnostic capacity. Equally important, the asynchronous aggregation protocol delivered a 38.5% training speedup over synchronous methods and cut convergence time from 1622 s to 597 s for the depthwise separable configuration. Benchmarked against federated and non-federated baselines, our approach consistently outperformed traditional methods such as CNN-Bi-LSTM (F1-score 97.02% vs. 77.60%) while remaining well suited to heterogeneous, distributed learning environments. The balanced precision (97.06%) and recall (96.98%) underline the model's clinical reliability. By jointly designing the architecture, the training process, and the evaluation around holistic performance metrics, the proposed system delivers lightweight yet accurate models for real-world medical applications in which data privacy, computational efficiency, and diagnostic accuracy are of utmost importance.

Author Contributions

Conceptualization, Z.Y.L. and Y.H.P.; methodology, Z.Y.L. and S.Y.O.; software, Z.Y.L. and W.H.K.; validation, Y.H.P. and W.H.K., S.Y.O. and Y.J.C.; formal analysis, Z.Y.L.; investigation, W.H.K. and Y.J.C.; resources, S.Y.O.; data curation, Y.H.P.; writing—original draft preparation, Z.Y.L.; writing—review and editing, Y.H.P.; visualization, S.Y.O.; supervision, Y.H.P.; project administration, Y.H.P.; funding acquisition, Y.H.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the MMU Postdoctoral Research Fellow Grant, MMUI/250007.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The EEG data supporting the findings of this study are publicly available as the UCI Epileptic Seizure Recognition dataset, originally collected by the University of Bonn [28] and shared via the UCI Machine Learning Repository. The dataset is fully de-identified and pre-processed for research use. It can also be accessed on Kaggle at https://www.kaggle.com/datasets/harunshimanto/epileptic-seizure-recognition (accessed on 1 July 2025), reference number [23]. Further inquiries may be directed to the corresponding author, Y. H. Pang, upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. World Health Organization. Epilepsy. World Health Organization Fact Sheets, 2024. Available online: https://www.who.int/news-room/fact-sheets/detail/epilepsy (accessed on 1 July 2025).
  2. Moon, S.; Lee, W. Privacy-preserving federated learning in healthcare. In Proceedings of the 2023 International Conference on Electronics, Information, and Communication (ICEIC), Singapore, 5–8 February 2023; IEEE: New York, NY, USA, 2023; pp. 1–4.
  3. Chen, M.; Shlezinger, N.; Poor, H.V.; Eldar, Y.C.; Cui, S. Communication-efficient federated learning. Proc. Natl. Acad. Sci. USA 2021, 118, e2024789118.
  4. Chen, M.; Mao, B.; Ma, T. FedSA: A staleness-aware asynchronous federated learning algorithm with non-IID data. Future Gener. Comput. Syst. 2021, 120, 1–12.
  5. Li, Q.; Wen, Z.; Wu, Z.; Hu, S.; Wang, N.; Li, Y.; Liu, X.; He, B. A survey on federated learning systems: Vision, hype and reality for data privacy and protection. IEEE Trans. Knowl. Data Eng. 2021, 35, 3347–3366.
  6. Huang, W.; Ye, M.; Shi, Z.; Wan, G.; Li, H.; Du, B.; Yang, Q. Federated learning for generalization, robustness, fairness: A survey and benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 9387–9406.
  7. Xu, C.; Qu, Y.; Xiang, Y.; Gao, L. Asynchronous federated learning on heterogeneous devices: A survey. Comput. Sci. Rev. 2023, 50, 100595.
  8. Lim, Z.Y.; Pang, Y.H.; Ooi, S.Y.; Khoh, W.H.; Hiew, F.S. MLTCN-EEG: Metric learning-based temporal convolutional network for seizure EEG classification. Neural Comput. Appl. 2024, 37, 2849–2875.
  9. Wang, Z.; Zhang, Z.; Tian, Y.; Yang, Q.; Shan, H.; Wang, W.; Quek, T.Q. Asynchronous federated learning over wireless communication networks. IEEE Trans. Wirel. Commun. 2022, 21, 6961–6978.
  10. Tran, L.V.; Tran, H.M.; Le, T.M.; Huynh, T.T.; Tran, H.T.; Dao, S.V. Application of machine learning in epileptic seizure detection. Diagnostics 2022, 12, 2879.
  11. Aayesha; Qureshi, M.B.; Afzaal, M.; Qureshi, M.S.; Fayaz, M. Machine learning-based EEG signals classification model for epileptic seizure detection. Multimed. Tools Appl. 2021, 80, 17849–17877.
  12. Assali, I.; Blaiech, A.G.; Abdallah, A.B.; Khalifa, K.B.; Carrere, M.; Bedoui, M.H. CNN-based classification of epileptic states for seizure prediction using combined temporal and spectral features. Biomed. Signal Process. Control 2023, 82, 104519.
  13. Zhao, W.; Wang, W.F.; Patnaik, L.M.; Zhang, B.C.; Weng, S.J.; Xiao, S.X.; Wei, D.Z.; Zhou, H.F. Residual and bidirectional LSTM for epileptic seizure detection. Front. Comput. Neurosci. 2024, 18, 1415967.
  14. Rajora, R.; Kumar, A.; Malhotra, S.; Sharma, A. Data security breaches and mitigating methods in the healthcare system: A review. In Proceedings of the 2022 International Conference on Computational Modelling, Simulation and Optimization (ICCMSO); IEEE: New York, NY, USA, 2022; pp. 325–330.
  15. Baghersalimi, S.; Teijeiro, T.; Atienza, D.; Aminifar, A. Personalized real-time federated learning for epileptic seizure detection. IEEE J. Biomed. Health Inform. 2022, 26, 898–909.
  16. Chen, Y.; Ning, Y.; Slawski, M.; Rangwala, H. Asynchronous online federated learning for edge devices with non-IID data. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data); IEEE: New York, NY, USA, 2020; pp. 15–24.
  17. Liu, J.; Jia, J.; Che, T.; Huo, C.; Ren, J.; Zhou, Y.; Dai, H.; Dou, D. FedASMU: Efficient asynchronous federated learning with dynamic staleness-aware model update. In Proceedings of the AAAI Conference on Artificial Intelligence; AAAI Publications: Washington, DC, USA, 2024; Volume 38, pp. 13900–13908.
  18. Zhou, Y.; Pang, X.; Wang, Z.; Hu, J.; Sun, P.; Ren, K. Towards efficient asynchronous federated learning in heterogeneous edge environments. In Proceedings of the IEEE INFOCOM 2024-IEEE Conference on Computer Communications; IEEE: New York, NY, USA, 2024; pp. 2448–2457.
  19. Ma, Y.; Huang, Z.; Su, J.; Shi, H.; Wang, D.; Jia, S.; Li, W. A multi-channel feature fusion CNN-Bi-LSTM epilepsy EEG classification and prediction model based on attention mechanism. IEEE Access 2023, 11, 62855–62864.
  20. Sinha, D.; El-Sharkawy, M. Thin MobileNet: An enhanced MobileNet architecture. In Proceedings of the 2019 IEEE 10th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON); IEEE: New York, NY, USA, 2019; pp. 0280–0285.
  21. Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2017; pp. 1251–1258.
  22. Jagtap, S.; Yadav, D.M. Optimizing stress detection: Harnessing MobileNet-V2 with azimuthal EEG mapping. In Proceedings of the 2024 11th International Conference on Reliability, Infocom Technologies and Optimization (ICRITO); IEEE: New York, NY, USA, 2024; pp. 1–6.
  23. Wu, Q.; Fokoue, E. Epileptic Seizure Recognition Data Set. Rochester Inst. Technol. 2017. Available online: https://www.kaggle.com/datasets/harunshimanto/epileptic-seizure-recognition (accessed on 1 July 2025).
  24. Guha, A.; Ghosh, S.; Roy, A.; Chatterjee, S. Epileptic seizure recognition using deep neural network. In Emerging Technology in Modelling and Graphics; Springer: Berlin/Heidelberg, Germany, 2020; pp. 21–28.
  25. Nikoupour, M.; Keyvanpour, M.R.; Shojaedini, S.V. A robust framework for epileptic seizure diagnosis: Utilizing GRU-CNN architectures in EEG signal analysis. In Proceedings of the 20th CSI International Symposium on Artificial Intelligence and Signal Processing (AISP); IEEE: New York, NY, USA, 2024; pp. 1–6.
  26. Ahmad, I.; Wang, X.; Javeed, D.; Kumar, P.; Samuel, O.W.; Chen, S. A hybrid deep learning approach for epileptic seizure detection in EEG signals. IEEE J. Biomed. Health Inform. 2023, 30, 1019–1029.
  27. Khalil, K.; Kumar, A.; Bayoumi, M. Accurate hardware predictor for epileptic seizure. IEEE Trans. Circuits Syst. I Regul. Pap. 2025, 72, 2153–2166.
  28. Qi, L.; Li, F.; Shang, J.; Ge, D.; Wang, S.; Yuan, S. Epileptic seizure prediction using multi-strategy data augmentation and hierarchical contrastive learning. IEEE J. Biomed. Health Inform. 2025, 30, 2023–2033.
  29. Lim, Z.Y.; Pang, Y.H.; Ooi, S.Y.; Sekaran, S.R.; Chew, Y.J. FCEEG: Federated learning-based seizure diagnosis through electroencephalogram (EEG) analysis. Cogent Eng. 2025, 12, 2547636.
Figure 1. The proposed federated learning system for seizure EEG classification in the medical field.
Figure 2. Overall process of FDSTCN-EEG on EEG signals for seizure classification.
Figure 3. Seizure EEG signal (left) vs. healthy EEG signal (right).
Figure 4. TCN dilated causal convolution process.
Figure 5. Normal TCN model architecture.
Figure 6. Architecture of depthwise separable convolution in the FDSTCN-EEG.
Figure 7. Comparison of asynchronous vs. synchronous aggregation timing. (a) Asynchronous aggregation allows immediate model updates upon client completion, without waiting for the other clients to complete training within an epoch, leading to faster overall convergence. (b) Synchronous aggregation must wait for all clients, causing idle time and slower progress.
Figure 8. Training loss comparison across four configurations: (a) normal TCN with synchronous aggregation, (b) normal TCN with asynchronous aggregation, (c) depthwise separable TCN with synchronous aggregation, (d) FDSTCN-EEG (depthwise separable TCN with asynchronous aggregation). This unified visualization demonstrates the convergence advantages of asynchronous methods and parameter-efficient architectures.
Table 1. Total time taken for 30 training rounds in seconds for each model.
| Model | Average Time for 30 Training Rounds (s) | 95% CI | Standard Deviation (s) |
|---|---|---|---|
| Normal TCN with Synchronous Aggregation | 4658.63 | [4651.71, 4670.01] | 46.12 |
| Normal TCN with Asynchronous Aggregation | 3788.13 | [3774.37, 3795.25] | 52.61 |
| Depthwise Separable TCN with Synchronous Aggregation | 4668.07 | [4656.34, 4675.94] | 49.39 |
| Depthwise Separable TCN with Asynchronous Aggregation (FDSTCN-EEG) | 2864.57 | [2851.61, 2872.45] | 52.51 |
Table 2. Number of parameters for normal TCN and FDSTCN-EEG.
| Model | Number of Parameters |
|---|---|
| Normal TCN | 9,769,436 |
| FDSTCN-EEG | 5,823,508 |
Table 3. Confidence interval analysis of FDSTCN-EEG on the UCI Seizure Dataset over 100 independent training runs.
| Metric | Mean | 95% CI | Standard Deviation |
|---|---|---|---|
| Accuracy | 0.9696 | [0.9689, 0.9698] | 0.002190 |
| Recall | 0.9698 | [0.9694, 0.9703] | 0.002301 |
| Precision | 0.9706 | [0.9702, 0.9713] | 0.002618 |
| Specificity | 0.9694 | [0.9692, 0.9701] | 0.002132 |
| F1-Score | 0.9702 | [0.9696, 0.9706] | 0.002567 |
Table 4. Wilcoxon signed-rank test results for the UCI Seizure Dataset, comparing performance metrics against baseline thresholds.
| Metric | Wilcoxon W Value | p-Value |
|---|---|---|
| Accuracy | 125,250.0 | 6.32 × 10−84 |
| Recall | 125,250.0 | 6.32 × 10−84 |
| Precision | 125,250.0 | 6.32 × 10−84 |
| Specificity | 125,250.0 | 6.32 × 10−84 |
| F1-Score | 125,250.0 | 6.32 × 10−84 |
| Precision vs. Recall | 37,789.0 | 1.55 × 10−14 |
| Accuracy vs. F1-Score | 46,821.0 | 1.01 × 10−6 |
| Recall vs. Specificity | 52,656.0 | 2.04 × 10−3 |
Table 5. Performance metrics of the existing deep learning models and the proposed model.
| Model | Accuracy | Recall | Precision | Specificity | F1-Score |
|---|---|---|---|---|---|
| CNN-Bi-LSTM [19] (without FL) | - | 0.7760 | 0.7762 | - | 0.7760 |
| DNN [24] (without FL) | 0.80 | 0.80 | 0.64 | - | - |
| GRU-CNN [25] (without FL) | 0.7352 | - | - | - | - |
| 1D CNN-BiLSTM + TBPTT [26] (without FL) | - | - | - | 0.988 | - |
| CAE-ELSTM on CHB-MIT Dataset (without FL) [27] | 0.9932 | - | 0.9929 | 0.9930 | - |
| Lightweight SE-EEGNet on CHB-MIT Dataset (without FL) [28] | 0.9451 | - | 0.9505 | - | - |
| Federated Learning Res1DCNN on EPILEPSIAE Dataset [15] | - | - | - | 0.8125 | 0.82 |
| FCEEG on UCI Seizure Dataset [29] | 0.8766 | 0.9995 | 0.7586 | 0.9996 | 0.8625 |
| Federated Learning TCN (FL-TCN) on UCI Seizure Dataset | 0.9698 | 0.9644 | 0.9768 | 0.9625 | 0.9706 |
| Federated Learning TCN (FL-TCN) on CHB-MIT Dataset | 0.8315 | 0.8626 | 0.8135 | 0.80 | 0.8373 |
| FDSTCN-EEG on UCI Seizure Dataset | 0.9696 | 0.9698 | 0.9706 | 0.9694 | 0.9702 |
| FDSTCN-EEG on CHB-MIT Dataset | 0.8591 | 0.9148 | 0.8242 | 0.8028 | 0.8672 |
Table 6. Confusion matrix for FDSTCN-EEG on CHB-MIT dataset.
| | Predicted Positive | Predicted Negative | Total |
|---|---|---|---|
| Actual Positive | TP = 248 | FN = 39 | 287 |
| Actual Negative | FP = 57 | TN = 230 | 287 |
| Total | 305 | 269 | 574 |
