Article

FedSW-TSAD: SWGAN-Based Federated Time Series Anomaly Detection

1 Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China
2 Shandong Data Open Innovative Application Laboratory, Qingdao 266580, China
3 Shandong Key Laboratory of Intelligent Oil & Gas Industrial Software, Qingdao 266580, China
* Author to whom correspondence should be addressed.
Sensors 2025, 25(13), 4014; https://doi.org/10.3390/s25134014
Submission received: 3 May 2025 / Revised: 7 June 2025 / Accepted: 22 June 2025 / Published: 27 June 2025
(This article belongs to the Special Issue AI-Driven Security and Privacy for IIoT Applications)

Abstract

As distributed sensing technologies evolve, the collection of time series data is becoming increasingly decentralized, which introduces serious challenges for both model training and data privacy protection. In response to this trend, federated time series anomaly detection enables collaborative analysis across distributed sensing nodes without exposing raw data. However, federated anomaly detection suffers from unstable training and poor generalization due to client heterogeneity and the limited expressiveness of single-path detection methods. To address these challenges, this study proposes FedSW-TSAD, a federated time series anomaly detection method based on the Sobolev–Wasserstein GAN (SWGAN). It leverages the Sobolev–Wasserstein constraint to stabilize adversarial training and combines discriminative signals from both reconstruction and prediction modules, thereby improving robustness against diverse anomalies. In addition, FedSW-TSAD adopts a differential privacy mechanism with L2-norm-constrained noise injection, ensuring privacy in model updates under the federated setting. Experimental results on four real-world sensor datasets demonstrate that FedSW-TSAD outperforms existing federated methods by an average of 14.37% in F1-score while also enhancing gradient privacy under the differential privacy mechanism. These results highlight the practical value of FedSW-TSAD for privacy-preserving anomaly detection in sensor-based monitoring systems such as industrial IoT, remote diagnostics, and predictive maintenance.

1. Introduction

With the rapid development of distributed sensing and edge computing, massive volumes of time series data are continuously collected and stored by various sensors [1]. These multivariate time series often exhibit complex inter-variable correlations and temporal structures, which can be leveraged by time series anomaly detection (TSAD) methods to identify abnormal points or segments that deviate from normal patterns. Accordingly, TSAD has become a core technique for monitoring dynamic systems across domains such as industrial process control [2,3], sensor-based healthcare monitoring [4], and critical infrastructure protection [5].
Most existing TSAD methods are designed for centralized settings and typically assume access to all training data on a single node [6]. However, in real-world deployments, time series data are often generated and stored locally by different nodes that form distributed sensor networks. Due to privacy concerns and bandwidth constraints, these nodes cannot upload raw data to a central server, making centralized TSAD approaches difficult to apply. This not only hinders the effective utilization of locally collected data but also limits the scalability and practicality of distributed sensor networks.
Facing these limitations, federated learning [7] (FL) offers a practical solution by enabling collaborative model training across distributed nodes, each treated as a federated client, without sharing raw data. In an FL framework, local models are trained independently on clients and periodically aggregated by a central server to form a global model. This paradigm has been successfully applied to privacy-sensitive domains, where it effectively balances collaboration and privacy [8]. Nonetheless, applying TSAD in federated environments raises several non-trivial challenges that merit closer examination.
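To make the aggregation step concrete, here is a minimal sketch of sample-size-weighted parameter averaging in the style of FedAvg. It is an illustration under assumptions, not FedSW-TSAD's aggregation rule. Each client model is assumed to be flattened into a plain list of floats, and the hypothetical `fedavg` helper weights each client by its local sample count.

```python
def fedavg(client_params, client_sizes):
    """Weighted average of flattened client parameters (FedAvg-style sketch).

    client_params: list of equal-length lists of floats, one per client.
    client_sizes:  number of local training samples per client (assumed weights).
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = sum(client_sizes)
    dim = len(client_params[0])
    global_params = [0.0] * dim
    for params, n in zip(client_params, client_sizes):
        if len(params) != dim:
            raise ValueError("all clients must share the same parameter shape")
        weight = n / total  # clients with more data contribute more
        for k in range(dim):
            global_params[k] += weight * params[k]
    return global_params


# Two clients holding 100 and 300 samples, respectively.
print(fedavg([[1.0, 2.0], [3.0, 4.0]], [100, 300]))  # [2.5, 3.5]
```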
One major challenge in federated TSAD settings is the instability of model training caused by data heterogeneity and inconsistent local optimization [6]. Unsupervised methods are widely adopted for TSAD due to the scarcity of labeled anomalies, among which Generative Adversarial Networks (GANs) [9] have shown particular promise for modeling complex temporal patterns. However, despite their potential, GAN-based anomaly detection methods face significant stability issues, such as mode collapse and training failure, which are widely observed in empirical studies [10]. These issues can prevent the generator from effectively capturing the complex distribution of normal time series data. In a federated setting, this challenge is further exacerbated by client data heterogeneity, where local models may learn divergent data modes, leading to unstable global aggregation and degraded convergence behavior. These problems highlight the need for more stable generative architectures tailored to federated TSAD.
Another key challenge lies in the diversity of anomaly types across clients, which limits the generalization ability of conventional detection paradigms. In unsupervised TSAD, reconstruction-based methods identify anomalies by measuring reconstruction errors, an approach that is typically effective for contextual or collective anomalies. In contrast, prediction-based methods forecast future values and flag deviations as anomalies, showing better sensitivity to point anomalies or abrupt changes [11]. As illustrated in Figure 1, federated clients often encounter heterogeneous anomaly types due to local sensor heterogeneity and diverse operational contexts. Consequently, single-path detection methods struggle to generalize across the federation. A hybrid scoring mechanism that integrates both reconstruction and prediction paths is needed to ensure robust performance under heterogeneous anomaly types across clients in distributed sensor networks.
Building on these observations, this study proposes FedSW-TSAD, a federated time series anomaly detection framework based on an improved Sobolev–Wasserstein GAN (SWGAN). First, to ensure stable training in federated settings, FedSW-TSAD introduces an enhanced SWGAN module, which replaces the Jensen–Shannon (JS) divergence used in standard GANs with the Sobolev–Wasserstein (SW) constraint [12]. The SWGAN module incorporates a Temporal Convolutional Network (TCN) into its generator to capture long-range dependencies and support the high-quality reconstruction of multivariate sequences. This integration enables robust and fine-grained modeling of multivariate time series, improving the detection performance in decentralized environments. Second, to address the heterogeneity of anomaly types across clients, FedSW-TSAD employs an additional prediction-based module. This module is co-trained with the SWGAN model to jointly optimize detection performance. During inference, anomaly scores from both modules are fused into a unified detection metric, leveraging the complementary strengths of reconstruction and prediction. Finally, to preserve privacy throughout the federated training process, an L2-norm-constrained noise-injection mechanism is applied to model updates. This mechanism enforces formal privacy guarantees while maintaining model utility. Together, these design choices result in stable training, robust detection across heterogeneous clients, and effective privacy protection. FedSW-TSAD further achieves significant performance gains over existing baselines, making it a practical and effective solution for federated anomaly detection in sensor-based monitoring systems.
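L2-norm-constrained noise injection is commonly implemented by clipping each update to a fixed L2 norm and then adding Gaussian noise. The sketch below shows that generic pattern. The helper name `clip_and_noise`, the flattened-list representation of an update, and the noise scale `noise_multiplier * clip_norm` are assumptions; they are not the calibration used in FedSW-TSAD.

```python
import math
import random


def clip_and_noise(update, clip_norm, noise_multiplier, rng=None):
    """Clip a flattened update to L2 norm <= clip_norm, then add Gaussian noise.

    Assumption: noise std = noise_multiplier * clip_norm (a common Gaussian-
    mechanism convention); the paper's exact calibration is not given here.
    """
    if clip_norm <= 0:
        raise ValueError("clip_norm must be positive")
    if rng is None:
        rng = random.Random()
    norm = math.sqrt(sum(v * v for v in update))
    # Scale down only when the update is longer than the bound.
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [v * scale for v in update]
    sigma = noise_multiplier * clip_norm
    return [v + rng.gauss(0.0, sigma) for v in clipped]


noisy = clip_and_noise([3.0, 4.0], clip_norm=1.0, noise_multiplier=1.1,
                       rng=random.Random(42))
print(len(noisy))  # 2
```

Because clipping bounds every update's L2 norm at `clip_norm`, no single client's contribution can exceed that norm, which is what makes a fixed Gaussian noise scale meaningful.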
The main contributions of this study are as follows:
  • A novel framework named FedSW-TSAD is proposed, incorporating an improved Sobolev–Wasserstein GAN with a Temporal Convolutional Network. This design leads to more stable convergence and better anomaly detection performance in federated learning over distributed sensor networks.
  • A hybrid scoring mechanism is developed, where reconstruction-based and prediction-based modules are jointly optimized to leverage their complementary strengths. This approach improves robustness in complex distributed environments with diverse anomaly types and client behaviors.
  • Comprehensive experiments are conducted on four real-world sensor datasets. FedSW-TSAD consistently outperforms both centralized and federated baselines, achieving average F1-score improvements of 4.27% and 14.37% over the strongest centralized and federated baselines, respectively. Furthermore, a case study demonstrates that the proposed differential privacy mechanism reduces gradient leakage risk.
The remainder of this study is structured as follows. Section 2 reviews related work on time-series anomaly detection and federated learning. Section 3.1 introduces the SW-TSAD model, which serves as the centralized backbone combining Sobolev–Wasserstein-based adversarial reconstruction mechanisms and hybrid anomaly scoring. Section 3.2 extends the framework to the federated setting, proposing FedSW-TSAD with privacy-preserving and robustness-enhancing techniques. Section 4 presents the experimental setup, baseline comparisons, and quantitative results. Section 5 offers a comprehensive analysis including ablation, efficiency, robustness, hyperparameter sensitivity, and the impact of privacy mechanisms. Finally, Section 6 concludes the paper and outlines future directions.

2. Related Work

2.1. Time Series Anomaly Detection

Time series anomaly detection is essential for maintaining the safety of modern systems and has long been a prominent research focus. Due to the scarcity of labeled anomalies, unsupervised learning is the dominant paradigm in most practical TSAD applications. Under this paradigm, existing TSAD methods can be broadly categorized into classical statistical approaches, prediction-based techniques, and reconstruction-based techniques.
Classical statistical methods have historically served as foundational tools in unsupervised TSAD. One representative approach is the Local Outlier Factor (LOF) [13], which detects anomalies by measuring the local data density. The Isolation Forest (IF) [14], in contrast, detects anomalies by randomly partitioning data points and measuring the ease of isolation. A notable early attempt to model temporal dependencies is the Hidden Markov Model (HMM) [15], which assumes fixed-order Markovian dynamics and predefined transition structures. Other commonly used techniques include PCA [16], KNN [17], and one-class SVM (OC-SVM) [18]. While computationally lightweight and interpretable, these methods often rely on strong distributional assumptions and struggle to capture the nonlinear or temporal dependencies in multivariate time series. As a result, they are often outperformed by deep learning methods in complex real-world scenarios.
Prediction-based methods identify anomalies by first forecasting future values, then measuring the deviation between predicted and observed data points. In practice, researchers typically train a prediction model on normal time series to learn temporal patterns, and prediction errors are used as anomaly scores during inference [19,20,21]. Accordingly, several deep models have shown promising results in different application scenarios. Neural architectures such as LSTM-based models [22,23] achieve strong performance in telemetry systems. In addition, hybrid frameworks combining VAE and GBDT [24] have demonstrated utility in smart grid monitoring. Prediction-based methods are particularly effective in capturing abrupt deviations, making them well-suited for detecting point anomalies. However, their reliance on short-term forecasting limits their ability to handle contextual and collective anomalies with complex temporal structures. Beyond anomaly detection, it is worth noting that time series prediction itself has been widely studied in various application scenarios. For example, models based on recurrent neural networks and fully connected architectures have demonstrated strong capabilities in COVID-19 trend forecasting [25] and high-speed train vibration prediction [26], highlighting the importance of predictive modeling for complex temporal systems.
Reconstruction-based methods detect anomalies by learning low-dimensional representations of normal patterns and identifying deviations in reconstructions. These models reconstruct inputs from compressed representations and assume that anomalies cannot be accurately recovered due to their rarity and unpredictability. Approaches including LSTM-Autoencoders [27,28], Variational Autoencoders (VAEs) [29,30], and DAGMM [31] have demonstrated effectiveness in modeling complex multivariate time series. By modeling long-range temporal dependencies, reconstruction-based methods are well positioned to identify contextual and collective anomalies. Nevertheless, these methods often suffer from false negatives and unstable training behavior.
Among reconstruction-based methods, Generative Adversarial Networks have attracted considerable attention due to their powerful generative capabilities and potential to model complex temporal dependencies. Representative methods such as MAD-GAN [32] and TadGAN [33] leverage adversarial training to improve reconstruction fidelity and capture nonlinear dynamics in multivariate TSAD tasks. Despite these advantages, GAN-based models also suffer from notorious training issues like mode collapse and gradient vanishing. Such instability tends to be exacerbated in federated environments, where data heterogeneity and inconsistent local updates undermine convergence. Consequently, existing approaches still fail to provide a robust TSAD framework that combines the strengths of reconstruction and prediction while ensuring stable training under distributed sensor networks.

2.2. Federated Learning

Federated Learning (FL) [7], an innovative distributed machine learning framework, facilitates collaborative model training without requiring direct access to raw data. It has gained increasing attention in privacy-sensitive domains, including IoT and healthcare, where time series data are naturally distributed across edge devices. For instance, DIoT [34] demonstrates the effectiveness of FL-based anomaly detection in device communication monitoring, while [35] applies FL to cross-institutional healthcare data to detect abnormal patterns early.
Recent studies have also systematically investigated the integration of federated learning with TSAD. These studies explore the challenges and solutions for time series anomaly detection in federated settings and are the most relevant to our work. FedTADBench [36] provides a systematic benchmark, evaluating the performance of mainstream TSAD algorithms (e.g., LSTM-AE, USAD, and GDN) under federated settings with varying client partitions and aggregation strategies. The benchmark primarily reports AUC-ROC and AUC-PR as performance metrics across different scenarios. PeFAD [6] improves communication efficiency by employing pre-trained language models (PLMs) as the local model backbone, a parameter-efficient training scheme, and knowledge distillation on a synthetic shared dataset. It is evaluated in terms of F1-score, detection accuracy, and communication cost. FedAnomaly [37] adopts a variational autoencoder with convolutional and recurrent encoders (ConvGRU) to model spatiotemporal dependencies at the edge and is evaluated in terms of F1-score, detection latency, and the ability to detect contiguous anomaly segments.
Although these methods demonstrate promising results, most rely solely on either reconstruction-based or prediction-based strategies, limiting their generalization under diverse anomaly types. Furthermore, while some studies have explored the combination of federated learning with differential privacy in other domains, such as intrusion detection systems [38], similar efforts remain rare in federated TSAD. Generative instability also remains underexplored, especially in federated environments, where non-IID data and inconsistent local updates often exacerbate convergence issues. To address these gaps, this study proposes FedSW-TSAD, a framework designed to stabilize adversarial training, integrate complementary reconstruction and prediction signals, and incorporate differential privacy mechanisms to improve robustness and preserve privacy across FL clients.

3. Materials and Methods

This section presents the proposed frameworks for time series anomaly detection. It first introduces SW-TSAD, a centralized model that integrates adversarial reconstruction and temporal prediction. Building upon this foundation, FedSW-TSAD is developed to enable privacy-preserving federated training across distributed sensor networks.

3.1. SWGAN-Based Time Series Anomaly Detection

SW-TSAD serves as the local modeling framework of FedSW-TSAD, combining adversarial reconstruction and temporal prediction to enable robust time series anomaly detection. It begins by formalizing the problem of multivariate time series anomaly detection, followed by a detailed presentation of the proposed model, which integrates a Sobolev-Wasserstein GAN for generative reconstruction and a parallel LSTM-based predictor for temporal forecasting. The joint training and hybrid scoring mechanism are further elaborated, along with the complete algorithmic workflow.

3.1.1. Problem Description

In multivariate time series anomaly detection, the input data is typically collected by a network of sensors. Each sensor continuously records measurements over time, forming a multivariate time series. Formally, let $\mathbf{x} = [x_1, x_2, \ldots, x_T]$ denote a multivariate time series of length $T$, where each observation $x_t \in \mathbb{R}^{M}$ represents measurements from $M$ sensors (or features) at time step $t$. Thus, the entire time series $\mathbf{x}$ is a matrix in $\mathbb{R}^{T \times M}$. To facilitate training, $\mathbf{x}$ is segmented into $N$ fixed-length subsequences of size $S$, using a sliding window with stride $d$. Here, the stride $d$ determines the step size between two adjacent windows, and $N = \lfloor (T - S)/d \rfloor + 1$. This segmentation yields a training tensor $\mathbf{X} \in \mathbb{R}^{N \times S \times M}$, where each subsequence $x_{i,1:S} \in \mathbb{R}^{S \times M}$ is a contiguous temporal window of $\mathbf{x}$.
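A short sketch can be used to check the segmentation rule. The hypothetical `sliding_windows` helper below stores the series as a list of $T$ rows with $M$ values each. It drops any trailing steps that do not fill a complete window, so it returns $N = \lfloor (T - S)/d \rfloor + 1$ windows.

```python
def sliding_windows(series, window, stride):
    """Split a T x M series (list of rows) into windows of length `window`.

    Returns N = (T - window) // stride + 1 windows; trailing steps that do not
    fill a complete window are dropped.
    """
    T = len(series)
    if window < 1 or window > T or stride < 1:
        raise ValueError("require 1 <= window <= T and stride >= 1")
    n = (T - window) // stride + 1
    return [series[i * stride : i * stride + window] for i in range(n)]


x = [[float(t), float(10 * t)] for t in range(10)]  # T = 10, M = 2
windows = sliding_windows(x, window=4, stride=3)     # S = 4, d = 3
print(len(windows))    # 3 = (10 - 4) // 3 + 1
print(windows[-1][0])  # [6.0, 60.0]
```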
Based on the segmented input, the TSAD model in this paper is designed to jointly optimize three components: a predictor $P$, a discriminator $D$, and a generator $G$, with corresponding parameters $\Theta = (\eta, \omega, \theta)$. In the training phase, the objective is to minimize the average loss across all $N$ input subsequences:
$$\Theta^{*} = \arg\min_{\Theta} \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}(\Theta; x_{i,1:S}) \tag{1}$$
where $\mathcal{L}(\Theta; x_{i,1:S})$ denotes the loss function for the $i$-th subsequence.
During inference, the trained model with optimal parameters $\Theta^{*} = (\eta^{*}, \omega^{*}, \theta^{*})$ is applied to a set of subsequences $\mathbf{X} = \{x_{1,1:S}, x_{2,1:S}, \ldots, x_{T,1:S}\}$ extracted from the test time series $\mathbf{x}$. For each time step $t$, an anomaly score is computed as follows:
$$AD_{score}(t) = \alpha\, R_{score}^{(t)}(x_{t,1:S}; \theta^{*}) + \beta\, D_{score}^{(t)}(x_{t,1:S}; \omega^{*}) + \gamma\, P_{score}^{(t)}(x_{t,1:S}; \eta^{*}) \tag{2}$$
where $R_{score}$, $D_{score}$, and $P_{score}$ denote the reconstruction, discrimination, and prediction scores, with $\alpha$, $\beta$, and $\gamma$ as their respective weights. A higher $AD_{score}(t)$ indicates a higher likelihood that the observation at time step $t$ is anomalous.
The final output of the TSAD model is a binary indicator sequence $\hat{y} \in \{0, 1\}^{T}$, where $\hat{y}(t) = 1$ denotes an anomaly at time $t$. This is achieved by labeling time steps as anomalies if their anomaly scores $AD_{score}(t)$ exceed a predefined threshold.
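The text leaves open how the threshold is chosen. One hypothetical option is sketched below: take a nearest-rank percentile of anomaly scores computed on held-out normal data, then produce the binary indicator sequence. The percentile rule and the helper names are assumptions, not the paper's procedure.

```python
import math


def percentile_threshold(val_scores, q):
    """Hypothetical rule: nearest-rank q-th percentile (0 < q <= 100) of scores."""
    ordered = sorted(val_scores)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def label_anomalies(scores, threshold):
    """y_hat[t] = 1 if AD_score(t) exceeds the threshold, else 0."""
    return [1 if s > threshold else 0 for s in scores]


thr = percentile_threshold([0.1, 0.2, 0.3, 0.4, 0.5], q=80)
print(thr)                                           # 0.4
print(label_anomalies([0.05, 0.45, 0.4, 0.9], thr))  # [0, 1, 0, 1]
```

A score equal to the threshold is not flagged, which matches the "exceed" wording above.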

3.1.2. Overall Architecture

Figure 2 illustrates the overall architecture of the proposed SW-TSAD model, which serves as the core local anomaly detection component within the broader FedSW-TSAD framework. The input time series is first processed by a fixed-size sliding window, generating a set of overlapping subsequences. Each subsequence, representing a temporal segment of the raw data, then serves as the input to the SW-TSAD model. The model comprises two jointly optimized modules: a Sobolev–Wasserstein Generative Adversarial Network (SWGAN) module and a prediction module. The SWGAN module trains a generator and a discriminator using real normal data, generator-produced (fake) data, and noise-corrupted data. By minimizing the Sobolev–Wasserstein distance between the real and generated distributions, it learns the structural and distributional properties of normal subsequences. This design improves the stability of adversarial training and enhances the model's sensitivity to subtle anomalies. Within the SWGAN module, a Temporal Convolutional Network in the generator captures long-range temporal dependencies. Concurrently, the prediction module uses Long Short-Term Memory (LSTM) layers to forecast future time series behavior. Through this joint optimization, the model learns to encode both the temporal dependencies and the distributional regularities of normal sequences. It thereby captures three complementary aspects of normal time series behavior: structural reconstruction, adversarial discrimination, and temporal forecasting. During inference, the trained model assigns an anomaly score to each input segment based on its deviation from these learned normal patterns. Specifically, the final anomaly score combines the signals from the SWGAN module (reconstruction and discrimination errors) and the prediction module (forecasting errors). 
This fusion of complementary anomaly signals significantly enhances the model’s capacity to detect various anomalies with diverse temporal and structural characteristics. To facilitate a deeper understanding of SW-TSAD, the structure and specific roles of each module are detailed in the subsequent sections.
The SWGAN module comprises a generator and a discriminator. The generator performs conditional reconstruction: it takes input subsequences perturbed with Gaussian noise and maps them back onto the original data manifold. Aided by a Temporal Convolutional Network, it captures long-range temporal dependencies to enhance reconstruction quality. The discriminator, in turn, evaluates the realism of generated samples and guides the generator through adversarial feedback. Traditional GANs rely on the Jensen–Shannon divergence and often suffer from instability such as mode collapse or vanishing gradients. This design instead replaces the JS divergence with the SW constraint, which enforces a smoother Lipschitz constraint and promotes more stable training.
The prediction module is implemented using a two-layer LSTM followed by a linear output layer. It learns to forecast future values based on past observations, thereby capturing sequential dependencies that may not be explicitly modeled by the generator.
To produce the final anomaly score, SW-TSAD combines the outputs from all three branches: the reconstruction score R s c o r e from the generator, the discrimination score D s c o r e from the discriminator, and the prediction score P s c o r e from the predictor. This hybrid scoring strategy allows the model to leverage both reconstruction fidelity and temporal consistency. As shown in Figure 2, test data are processed in a sliding-window fashion, where each subsequence is evaluated by all three branches. The final anomaly score is computed through a weighted combination of the three outputs, which is further detailed in Section 3.1.4.

3.1.3. SWGAN Module

Adversarial learning in GANs is known to suffer from instability issues, such as mode collapse, vanishing gradients, and poor convergence. These problems are particularly critical for time series anomaly detection, where the generator must accurately learn the distribution of normal time series patterns. If the generator fails to model this distribution, it produces biased or incomplete reconstructions, causing the reconstruction errors to lose their discriminative power. As a result, both false alarms and missed anomalies may occur, degrading the model's reliability. A principal source of this instability is the use of the Jensen–Shannon divergence as the optimization objective in standard GANs. When the real and generated distributions barely overlap, the JS divergence saturates and provides vanishing gradients to the generator. To address this, we adopt the Sobolev–Wasserstein constraint, which replaces the JS divergence and provides a more stable and theoretically grounded alternative. The original formulation of the SWGAN is defined as follows:
$$\min_{G} \max_{D} \mathcal{L}_S(D_w, G_\theta) = \mathbb{E}_{x \sim \mathbb{P}_r}\left[D_w(x)\right] - \mathbb{E}_{z \sim \mathbb{P}_z}\left[D_w(G_\theta(z))\right], \tag{3}$$
with the constraint that
$$\mathbb{E}_{x \sim \mu^{x_i, x_j}}\left[\lVert \nabla_x D_w(x) \rVert^2\right] \le 1, \quad x_i \sim \mathbb{P}_r,\; x_j \sim \mathbb{P}_g \tag{4}$$
where $G_\theta$ is the generator parameterized by $\theta$, $D_w$ is the discriminator parameterized by $w$, $\mathbb{P}_r$ is the real data distribution, $\mathbb{P}_g$ is the generator-induced distribution, $\mathbb{P}_z$ is the distribution of latent noise vectors $z$ used as inputs for the generator (typically a Gaussian or uniform distribution), and $\mu^{x_i, x_j}$ denotes the uniform interpolation distribution between real subsequences $x_i \sim \mathbb{P}_r$ and generated subsequences $x_j \sim \mathbb{P}_g$. The constraint enforces smooth discriminator behavior across the interpolated input space, helping stabilize adversarial training and improving generalization.
Although the original SWGAN enforces this constraint via Lagrangian multipliers and slack variables [12], such formulations increase optimization complexity and are sensitive to hyperparameters. To simplify training and improve stability, SW-TSAD retains the core objective of SWGAN but replaces the constraint mechanism with a gradient penalty, which softly enforces the Lipschitz condition by adding a regularization term to the discriminator loss. This formulation eliminates the need for auxiliary variables and enables more stable adversarial optimization. Based on the reformulated objective, the implementation details of the SWGAN module are introduced below.
Given a time series segmented into $N$ fixed-length subsequences $\{x_{i,1:S}\}_{i=1}^{N}$, which collectively form the training set, the generator $G(\cdot; \theta)$ reconstructs the values for time steps in $[1, t_0]$, where $t_0$ denotes the reconstruction horizon. A noise term $z \in \mathbb{R}^{t_0 \times M}$, sampled from a standard normal distribution $\mathcal{N}(0, 1)$, is added to the target segment to introduce stochasticity into the generative process. As a result, the reconstructed segment is given by $\hat{x}_{i,1:t_0} = G(x_{i,1:t_0} + z; \theta)$. The discriminator $D(\cdot; \omega)$ then evaluates how closely the output of $G(\cdot; \theta)$ matches the true values in the target range. The objective function for the discriminator is given by
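$$\mathcal{L}_D = \frac{1}{N} \sum_{i=1}^{N} D(\hat{x}_{i,1:t_0}; \omega) - \frac{1}{N} \sum_{i=1}^{N} D(x_{i,1:t_0}; \omega) + \lambda \frac{1}{N} \sum_{i=1}^{N} \left( \lVert \nabla_{\bar{x}_i} D(\bar{x}_i; \omega) \rVert_2 - 1 \right)^2 \tag{5}$$
where $\bar{x}_i = \epsilon \cdot x_{i,1:t_0} + (1 - \epsilon) \cdot \hat{x}_{i,1:t_0}$ and $\epsilon \sim U[0, 1]$. Here, $\lambda$ is a regularization coefficient that controls the strength of the Sobolev gradient penalty.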
The objective function of the generator is
$$\mathcal{L}_G = -\frac{1}{N} \sum_{i=1}^{N} D(\hat{x}_{i,1:t_0}; \omega) \tag{6}$$
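Both objectives can be evaluated numerically as a sanity check. The sketch below is illustrative only and makes several assumptions:

- The critic is any Python callable on a flattened window.
- $\nabla_{\bar{x}} D$ is approximated with central finite differences rather than the automatic differentiation a real implementation would use.
- One interpolation weight $\epsilon$ is drawn per real/reconstructed pair.
- The helper names and the default $\lambda = 10$ are hypothetical.

```python
import math
import random


def finite_diff_grad(critic, x, h=1e-5):
    """Central finite-difference gradient of a scalar critic at flat input x."""
    grad = []
    for k in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[k] += h
        x_minus[k] -= h
        grad.append((critic(x_plus) - critic(x_minus)) / (2 * h))
    return grad


def sw_tsad_gan_losses(critic, real, fake, lam=10.0, rng=None):
    """Return (L_D, L_G) for paired lists of flattened real / reconstructed windows.

    L_D = mean D(fake) - mean D(real) + lam * mean (||grad D(x_bar)||_2 - 1)^2
    L_G = -mean D(fake)
    """
    if rng is None:
        rng = random.Random(0)
    n = len(real)
    mean_fake = sum(critic(f) for f in fake) / n
    mean_real = sum(critic(r) for r in real) / n
    penalty = 0.0
    for r, f in zip(real, fake):
        eps = rng.random()  # one interpolation weight per pair, eps in [0, 1)
        x_bar = [eps * a + (1.0 - eps) * b for a, b in zip(r, f)]
        g = finite_diff_grad(critic, x_bar)
        penalty += (math.sqrt(sum(v * v for v in g)) - 1.0) ** 2
    loss_d = mean_fake - mean_real + lam * penalty / n
    loss_g = -mean_fake
    return loss_d, loss_g


def linear_critic(x):
    # Toy critic whose gradient (0.6, 0.8) has unit norm, so the penalty is ~0.
    return 0.6 * x[0] + 0.8 * x[1]


loss_d, loss_g = sw_tsad_gan_losses(linear_critic, real=[[1.0, 1.0]], fake=[[0.0, 0.0]])
print(round(loss_d, 4))  # approximately -1.4
```

With a unit-gradient critic the penalty term vanishes. A critic with gradient norm 2 instead adds $\lambda (2 - 1)^2$ per pair.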
This adversarial training process proceeds by alternately updating the discriminator and generator. The discriminator learns to maximize its ability to distinguish real from generated data, while the generator is optimized to fool the discriminator by generating reconstructions that align closely with the structural characteristics of normal data. To further support this objective, a Temporal Convolutional Network is incorporated into the generator to improve its ability to model complex temporal patterns and capture long-range dependencies.
As shown in Figure 3, the Temporal Convolutional Network in SW-TSAD is designed to efficiently capture temporal dependencies while maintaining stability and computational efficiency. The architecture is structured as follows:
  • Residual Block Design: TCN employs residual connections to facilitate stable gradient propagation and accelerate convergence. Each residual block consists of multiple convolutional layers interleaved with non-linear activations, enabling effective feature extraction while preserving historical information.
  • Temporal Processing via Convolutional Layers: The network utilizes 1D convolutional layers (Conv1D) to model sequential dependencies. To keep the output length equal to the input length and preserve temporal alignment, a constant padding (ConstantPad1d) operation is applied before convolution.
  • Activation and Transformation: Each convolutional layer is followed by a LeakyReLU activation function, which introduces non-linearity while mitigating vanishing gradient issues. Permutation operations (Permute) are applied before and after convolution to ensure proper dimensional alignment for temporal sequence processing.
  • Residual Connections and Feature Fusion: The output from the convolutional layers is merged with the original input via an addition operation (Residual + Add), preserving low-level information and enhancing gradient flow across layers. This design mitigates degradation issues in deep networks and promotes feature reuse.
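The residual block described in these bullets can be sketched in plain Python. The text does not specify the kernel size, dilation, channel counts, or LeakyReLU slope, so these are assumptions (the slope is set to 0.01). Zero left-padding stands in for the constant padding step: it keeps the output length equal to the input length and makes each output depend only on current and past steps. Dilation is what widens the receptive field.

```python
def leaky_relu(v, slope=0.01):
    return v if v >= 0 else slope * v


def permute(m):
    """Swap axes: time x channels <-> channels x time (the Permute step)."""
    return [list(col) for col in zip(*m)]


def causal_conv1d(x, weights, dilation=1):
    """Dilated causal Conv1D on a channels x time input.

    weights[o][i][k] maps input channel i to output channel o at tap k. The
    input is left-padded with zeros (constant padding) so the output keeps
    length T and y[:, t] depends only on x[:, <= t].
    """
    k_size = len(weights[0][0])
    pad = (k_size - 1) * dilation
    padded = [[0.0] * pad + list(row) for row in x]
    T = len(x[0])
    out = []
    for w_out in weights:
        row = []
        for t in range(T):
            acc = 0.0
            for i, w_in in enumerate(w_out):
                for k in range(k_size):
                    acc += w_in[k] * padded[i][t + k * dilation]
            row.append(acc)
        out.append(row)
    return out


def residual_block(x_tm, w1, w2, dilation=1):
    """Permute -> (Conv1D -> LeakyReLU) x 2 -> Permute back -> residual Add.

    x_tm is time x channels; w2 must output as many channels as x_tm has.
    """
    h = permute(x_tm)
    h = [[leaky_relu(v) for v in row] for row in causal_conv1d(h, w1, dilation)]
    h = [[leaky_relu(v) for v in row] for row in causal_conv1d(h, w2, dilation)]
    h = permute(h)
    return [[a + b for a, b in zip(hr, xr)] for hr, xr in zip(h, x_tm)]


window = [[1.0], [2.0], [3.0], [4.0]]  # S = 4 time steps, M = 1 channel
shift = [[[1.0, 0.0]]]                 # kernel 2: picks x[t - dilation]
ident = [[[0.0, 1.0]]]                 # kernel 2: picks x[t]
print(residual_block(window, shift, ident))  # [[1.0], [3.0], [5.0], [7.0]]
```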
Figure 3. The architecture of the Temporal Convolutional Network (TCN) in the SWGAN module.
Overall, the TCN module enables the generator to extract multi-scale temporal features from noisy or corrupted input segments, contributing to the robustness of sequence reconstruction in the adversarial setting. The SWGAN module and its integrated TCN structure together form a robust generative backbone that effectively captures the temporal and structural regularities of normal time series data, serving as a foundation for subsequent anomaly scoring.

3.1.4. Prediction Module and Anomaly Scoring

In addition to generative modeling, SW-TSAD incorporates a dedicated prediction module to capture temporal dependencies through forecasting. While the SWGAN module emphasizes distributional reconstruction, the prediction module provides an orthogonal perspective by learning temporal continuity, enabling complementary scoring for anomaly detection.
As visualized in Figure 2, the prediction module consists of two LSTM layers followed by a linear, fully connected layer. The predictor P takes multivariate time series data of a given window length as input and outputs predicted time series data for the target window length. Additionally, P is trained in parallel with the SWGAN module. During the testing phase, the prediction error is computed and incorporated into anomaly scoring.
Given an input time series $x_{i,1:S}$, $i = 1, \dots, N$, the prediction model $P(\cdot;\eta)$ with parameters $\eta$ predicts the values for time steps $[t_0+1, t_0+\tau]$ conditioned on time steps $[1, t_0]$, where $\tau$ is the number of time steps that $P(\cdot;\eta)$ is trained to predict and $t_0 + \tau = S$. In contrast to the generator, which reconstructs the target segment from noisy inputs, the predictor learns to forecast future observations from past context. That is, $\tilde{x}_{i,t_0+1:S} = P(x_{i,1:t_0};\eta)$. The time ranges $[1, t_0]$ and $[t_0+1, S]$ are referred to as the conditioning range and the target range, respectively. The predictor is trained using the standard Mean Squared Error (MSE) loss, defined as
$$\mathcal{L}_P = \frac{1}{N}\sum_{i=1}^{N}\left\| x_{i,t_0+1:S} - \tilde{x}_{i,t_0+1:S} \right\|^2$$
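The split into conditioning and target ranges, together with the loss $\mathcal{L}_P$, can be illustrated with a minimal plain-Python sketch. It assumes that each window is stored as a list of $S$ time steps, each a list of $M$ feature values; the LSTM predictor itself is not shown, and the helper names `split_window` and `predictor_mse` are hypothetical.

```python
from typing import List, Sequence, Tuple

Window = List[List[float]]  # time steps x features


def split_window(window: Window, t0: int) -> Tuple[Window, Window]:
    """Split x_{1:S} into the conditioning range [1, t0] and the target range [t0+1, S]."""
    if not 0 < t0 < len(window):
        raise ValueError("t0 must satisfy 0 < t0 < S")
    return window[:t0], window[t0:]


def predictor_mse(targets: Sequence[Window], predictions: Sequence[Window]) -> float:
    """L_P: squared error over each target window, averaged over the N windows."""
    if not targets or len(targets) != len(predictions):
        raise ValueError("need matching, non-empty lists of targets and predictions")
    total = 0.0
    for target, pred in zip(targets, predictions):
        # ||x_{i,t0+1:S} - x~_{i,t0+1:S}||^2, summed over time steps and features
        total += sum(
            (a - b) ** 2
            for step_true, step_pred in zip(target, pred)
            for a, b in zip(step_true, step_pred)
        )
    return total / len(targets)


# Usage with the setting of Section 4.1 (S = 7, t0 = 6, tau = 1) and M = 2 features
window = [[float(t), 2.0 * t] for t in range(7)]
conditioning, target = split_window(window, t0=6)
print(len(conditioning), target)                  # 6 [[6.0, 12.0]]
print(predictor_mse([target], [[[5.0, 12.0]]]))   # 1.0
```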
With the training objectives of the SWGAN and the prediction module clearly defined, both components are optimized in parallel during model training. This joint learning enables the model to capture the complementary structural and temporal features of normal time series patterns.
Once the training is complete, the optimized model parameters $\Theta^* = (\eta^*, \omega^*, \theta^*)$ are deployed to perform anomaly detection on test sequences $x$. Inspired by the DR score in MAD-GAN [32], SW-TSAD defines the anomaly score ($AD_{score}$) as a combination of the generator’s reconstruction error ($R_{score}$), the discriminator’s discrimination error ($D_{score}$), and the predictor’s prediction error ($P_{score}$). For each time step $t$, the anomaly score is computed as follows:
$$R_{score}(t) = \frac{1}{\tau}\left\| x_{t,t_0+1:S} - G(x_{t,t_0+1:S} + z;\theta^*) \right\|^2$$
$$D_{score}(t) = -D(x_{t,t_0+1:S};\omega^*) + 1$$
$$P_{score}(t) = \frac{1}{\tau}\left\| x_{t,t_0+1:S} - P(x_{t,1:t_0};\eta^*) \right\|^2$$
$$AD_{score}(t) = \alpha R_{score}(t) + \beta D_{score}(t) + \gamma P_{score}(t)$$
where $\alpha$, $\beta$, and $\gamma$ are weight parameters satisfying $\alpha + \beta + \gamma = 1$, used to balance $R_{score}(t)$, $D_{score}(t)$, and $P_{score}(t)$. These weights can be selected empirically based on validation performance. During the anomaly detection phase, anomalous points are identified by comparing their anomaly scores with a predefined threshold; the threshold can be set using various methods, and a point whose anomaly score exceeds it is classified as anomalous.
The introduction of the prediction module, along with the unified anomaly scoring mechanism, allows SW-TSAD to jointly exploit reconstruction, discrimination, and forecasting errors. This design enables the model to better capture both structural and temporal anomalies, ultimately enhancing detection performance across diverse time series patterns.
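The fusion in $AD_{score}(t)$ can be sketched for a single test window as follows. This is a minimal sketch, not the authors’ implementation: the generator, discriminator, and predictor outputs are passed in as precomputed values, $D_{score}$ is assumed to take the form $1 - D(\cdot)$, and the default weights are those reported in Section 4.1 ($\alpha = 0.35$, $\beta = 0.15$, $\gamma = 0.5$). The function names are hypothetical.

```python
import math
from typing import List, Sequence

Window = List[List[float]]  # tau time steps x M features


def squared_error(a: Window, b: Window) -> float:
    """Squared L2 distance between two windows of equal shape."""
    return sum((u - v) ** 2 for row_a, row_b in zip(a, b) for u, v in zip(row_a, row_b))


def ad_score(target: Window, reconstruction: Window, prediction: Window,
             critic_output: float, alpha: float = 0.35, beta: float = 0.15,
             gamma: float = 0.5) -> float:
    """Weighted fusion of reconstruction, discrimination, and prediction errors."""
    if not math.isclose(alpha + beta + gamma, 1.0):
        raise ValueError("alpha + beta + gamma must equal 1")
    tau = len(target)
    r_score = squared_error(target, reconstruction) / tau  # from G(x + z; theta*)
    d_score = 1.0 - critic_output                           # assumed form: -D(x; omega*) + 1
    p_score = squared_error(target, prediction) / tau      # from P(x_{1:t0}; eta*)
    return alpha * r_score + beta * d_score + gamma * p_score


def flag_anomalies(scores: Sequence[float], threshold: float) -> List[int]:
    """Label a score as anomalous (1) when it exceeds the threshold."""
    return [1 if s > threshold else 0 for s in scores]


normal = ad_score([[1.0, 1.0]], [[1.0, 0.0]], [[1.0, 1.0]], critic_output=1.0)
shifted = ad_score([[1.0, 1.0]], [[1.0, 1.0]], [[3.0, 1.0]], critic_output=0.0)
print(flag_anomalies([normal, shifted], threshold=1.0))  # [0, 1]
```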

3.1.5. SW-TSAD Workflow

To consolidate the design of SW-TSAD, this section summarizes the complete training workflow, which jointly optimizes the generative and predictive components. As detailed in Algorithm 1, the model operates in a hybrid optimization loop. In each training round, the predictor $P$ and discriminator $D$ are first updated using their respective loss functions (Equations (7) and (5)). These two components provide complementary learning signals: $P$ captures temporal continuity via forecasting, while $D$ focuses on structural regularity by distinguishing between real and generated sequences.
Algorithm 1 SWGAN-based Time Series Anomaly Detection Strategy.
1: Input: training set $X_{\text{train}}$; testing set $X_{\text{test}}$; time window length $S$; conditioning length $t_0$; target length $\tau$; batch size $m$; gradient penalty coefficient $\lambda$; weight parameters $(\alpha, \beta, \gamma)$; discriminator update steps $n_{\text{critic}}$; Adam optimizer parameters $(\phi, \psi_1, \psi_2)$; initialized parameters $\Theta = (\eta, \omega, \theta)$.
2: Output: anomaly score $AD_{score}$.
3: // Training Phase
4: while not converged do
5:     for $t = 1$ to $n_{\text{critic}}$ do
6:         for $i = 1$ to $m$ do
7:             Sample real subsequence $x_{i,1:S} \sim X_{\text{train}}$;
8:             Predict: $\tilde{x}_{i,t_0+1:S} = P(x_{i,1:t_0};\eta)$;
9:             Sample noise subsequence $z \sim \mathcal{N}(0, 1)$;
10:            Reconstruct: $\hat{x}_{i,t_0+1:S} = G(x_{i,t_0+1:S} + z;\theta)$;
11:            Compute discriminator loss $\mathcal{L}_D^{(i)}$ and predictor loss $\mathcal{L}_P^{(i)}$ using Equations (5) and (7);
12:        end for
13:        Compute gradients: $\nabla_\eta \mathcal{L}_P = \frac{1}{m}\sum_{i=1}^{m} \nabla_\eta \mathcal{L}_P^{(i)}$, $\nabla_\omega \mathcal{L}_D = \frac{1}{m}\sum_{i=1}^{m} \nabla_\omega \mathcal{L}_D^{(i)}$;
14:        Update predictor: $\eta \leftarrow \mathrm{Adam}(\eta, \nabla_\eta \mathcal{L}_P, \phi)$;
15:        Update discriminator: $\omega \leftarrow \mathrm{Adam}(\omega, \nabla_\omega \mathcal{L}_D, \phi, \psi_1, \psi_2)$;
16:    end for
17:    Compute generator loss $\mathcal{L}_G$ using Equation (6);
18:    Compute gradients: $\nabla_\theta \mathcal{L}_G = \frac{1}{m}\sum_{i=1}^{m} \nabla_\theta \mathcal{L}_G^{(i)}$;
19:    Update generator: $\theta \leftarrow \mathrm{Adam}(\theta, \nabla_\theta \mathcal{L}_G, \phi, \psi_1, \psi_2)$;
20: end while
21: Obtain the converged model $\Theta^* = (\eta^*, \omega^*, \theta^*)$;
22: // Inference Phase
23: for each test subsequence $x_{t,1:S} \in X_{\text{test}}$ do
24:    Predict: $\tilde{x}_{t,t_0+1:S} = P(x_{t,1:t_0};\eta^*)$;
25:    Sample noise subsequence $z \sim \mathcal{N}(0, 1)$;
26:    Reconstruct: $\hat{x}_{t,t_0+1:S} = G(x_{t,t_0+1:S} + z;\theta^*)$;
27:    Calculate anomaly scores $AD_{score}(t)$ using Equations (8)–(11);
28: end for
29: return $AD_{score}$
Subsequently, the generator $G$ is updated by minimizing the loss in Equation (6), encouraging the generation of realistic target segments that align with normal sequence dynamics. To improve adversarial stability, the discriminator is updated multiple times per generator iteration (denoted as $n_{\text{critic}}$), following the training scheme of WGAN-GP [39]. All components are optimized using the Adam optimizer. This coordinated learning strategy enables the model to progressively learn fine-grained temporal and structural features, laying the foundation for robust anomaly detection during inference.
After training converges, the optimized model parameters $\Theta^* = (\eta^*, \omega^*, \theta^*)$ are used to perform anomaly detection on unseen test sequences. For each test window, the predictor, generator, and discriminator yield forecasting, reconstruction, and discrimination errors, respectively, which are fused into the anomaly score defined in Equations (8)–(11). This scoring mechanism identifies abnormal time points from their deviations relative to the learned temporal and structural patterns.

3.2. Federated Time Series Anomaly Detection

In real-world scenarios, time series data are often distributed across multiple locations with strict privacy constraints, preventing the direct deployment of the previously introduced SW-TSAD. Thus, this section proposes FedSW-TSAD, which maintains the advantages of adversarial generation and temporal prediction while incorporating differential privacy mechanisms for secure and efficient collaboration.

3.2.1. Scenario and Architecture

FedSW-TSAD operates in a federated sensing setting where each client holds private multivariate time series data. To preserve privacy, only model updates are exchanged, enabling the collaborative construction of a global anomaly detector without sharing raw data. Formally, let a federation consist of $K$ clients. Each client $k$ owns a local dataset $D_k = \{x_k\}$, where $x_k \in \mathbb{R}^{T_k \times M}$, $T_k$ denotes the sequence length, and $M$ is the number of variables. The total number of time steps across all clients is $T_{\text{total}} = \sum_{k=1}^{K} T_k$. Global anomaly detection is achieved by minimizing a weighted sum of client-specific objectives:
$$\Theta_g^* = \arg\min_{\Theta_g} \sum_{k=1}^{K} \frac{T_k}{T_{\text{total}}} \mathcal{L}_k(\Theta_g; D_k)$$
where $\Theta_g = (\eta_g, \omega_g, \theta_g)$ denotes the global model parameters extended from Algorithm 1, and $\mathcal{L}_k$ is the loss function computed on client $k$.
The architecture of FedSW-TSAD is illustrated in Figure 4, which highlights the client-server interaction and privacy-preserving communication pipeline. Each communication round consists of two main steps: local update and global aggregation. In the local update phase, client $k$ performs several training steps on its private dataset $D_k$, resulting in updated parameters $\Theta_k$ and a corresponding model update $\Delta\Theta_k$. To protect local information, each client applies L2-norm-constrained differential privacy noise to its local update $\Delta\Theta_k$ before transmission:
$$\widetilde{\Delta\Theta}_k = \mathcal{P}(\Delta\Theta_k) + \mathcal{N}(0, \sigma^2 I)$$
where $\mathcal{P}(\cdot)$ denotes the preprocessing operator (regularization followed by norm clipping, detailed in Section 3.2.2), and $\sigma$ controls the noise magnitude.
In the global aggregation phase, the central server aggregates noisy updates using weighted averaging:
$$\Delta\Theta_g^{(r)} = \sum_{k=1}^{K} \frac{T_k}{T_{\text{total}}} \widetilde{\Delta\Theta}_k^{(r)}$$
where $r$ denotes the communication round.
The global model is updated accordingly and broadcast to all clients for the next round. This iterative process allows the global model to gradually integrate knowledge from heterogeneous client distributions while rigorously preserving local privacy. By decoupling data access from training, FedSW-TSAD supports secure and scalable anomaly detection across decentralized time series sources.
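The weighted aggregation and the global update can be written in a few lines of Python. In this minimal sketch, each client update is assumed to be flattened into a list of floats of equal length, the privacy operator is assumed to have been applied already, and the function names are hypothetical.

```python
from typing import List, Sequence


def aggregate_updates(noisy_updates: Sequence[List[float]], sizes: Sequence[int]) -> List[float]:
    """Delta Theta_g = sum_k (T_k / T_total) * noisy Delta Theta_k."""
    if not noisy_updates or len(noisy_updates) != len(sizes):
        raise ValueError("need one size T_k per client update")
    total = sum(sizes)
    aggregated = [0.0] * len(noisy_updates[0])
    for update, t_k in zip(noisy_updates, sizes):
        weight = t_k / total
        for j, value in enumerate(update):
            aggregated[j] += weight * value
    return aggregated


def apply_global_update(global_params: List[float], aggregated: List[float]) -> List[float]:
    """Theta_g^(r+1) = Theta_g^(r) + Delta Theta_g^(r)."""
    return [g + d for g, d in zip(global_params, aggregated)]


# Two clients holding T_1 = 3000 and T_2 = 1000 time steps
delta_g = aggregate_updates([[1.0, 0.0], [0.0, 1.0]], sizes=[3000, 1000])
print(delta_g)                                   # [0.75, 0.25]
print(apply_global_update([1.0, 1.0], delta_g))  # [1.75, 1.25]
```

Weighting by $T_k$ means a client with three times as much data pulls the global model three times as hard, which is the same weighting used in the global objective.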

3.2.2. Regularization and Differential Privacy Protection

To protect individual-level information in client updates, FedSW-TSAD incorporates $(\varepsilon, \delta)$-differential privacy via the standard Gaussian mechanism. Each client clips its update to bound its sensitivity and then injects noise calibrated to the desired privacy budget. This limits how much sample-level information the server, or anyone observing the global model, can infer from local updates. In addition, heterogeneous local datasets can produce noisy or overfitted updates that destabilize training, so FedSW-TSAD applies L2-norm regularization before the privacy operations. The result is a two-fold mechanism: regularization stabilizes local learning, while privacy-preserving perturbation protects the data.
L2-norm Regularization. As a first step, L2-norm regularization is applied to suppress excessive dependence on local updates and enforce smoother gradient trajectories. Let $\Delta\Theta$ denote the local parameter update after client-side training. The regularization objective is defined as the expected L2-norm of the update:
$$R_{\text{norm}} = \lambda_r \cdot \mathbb{E}\left[\|\Delta\Theta\|_2\right]$$
where $\lambda_r$ is a regularization strength coefficient.
Next, the local update is adjusted via gradient descent on this regularizer:
$$\Delta\Theta \leftarrow \Delta\Theta - \nabla_{\Theta} R_{\text{norm}}$$
The adjusted update $\Delta\Theta$ then serves as the input for subsequent privacy-preserving operations, such as gradient clipping and Gaussian noise injection.
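The regularization step can be illustrated on a flattened update vector. This sketch makes two assumptions that the text leaves open: the expectation is replaced by the single update at hand, and the norm is not squared, so the gradient of $\lambda_r \|\Delta\Theta\|_2$ with respect to $\Delta\Theta$ is $\lambda_r \Delta\Theta / \|\Delta\Theta\|_2$ (this equals the gradient with respect to $\Theta$, since the global parameters stay fixed during local training). The unit step size mirrors the update rule above, and `l2_regularize_update` is a hypothetical name.

```python
import math
from typing import List


def l2_regularize_update(delta: List[float], lambda_r: float, step: float = 1.0) -> List[float]:
    """One gradient-descent step on R_norm = lambda_r * ||delta||_2."""
    norm = math.sqrt(sum(v * v for v in delta))
    if norm == 0.0:
        return list(delta)  # the norm is not differentiable at 0; leave a zero update as is
    # gradient of lambda_r * ||delta||_2 is lambda_r * delta / ||delta||_2
    return [v - step * lambda_r * v / norm for v in delta]


shrunk = l2_regularize_update([3.0, 4.0], lambda_r=0.5)
print(shrunk)  # approximately [2.7, 3.6]; the norm drops from 5.0 to 4.5
```

Each step shortens the update by $\lambda_r$ in norm while keeping its direction. If $\lambda_r$ exceeded $\|\Delta\Theta\|_2$, a unit step would overshoot and flip the update’s sign, so a smaller step size would then be needed.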
Clipping and Noise Injection. To satisfy the sensitivity constraints required by differential privacy, each client applies L2 clipping with a dynamic threshold C:
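$$\Delta\Theta \leftarrow \frac{\Delta\Theta}{\max\left(1, \|\Delta\Theta\|_2 / C\right)}, \qquad C = \frac{1}{d_{\text{in}}}$$
where $d_{\text{in}}$ is the input dimensionality. This bounds the update norm by $C$ and limits the maximum possible influence of any individual sample. Finally, Gaussian noise drawn from $\mathcal{N}(0, \sigma^2 I)$ is added to achieve $(\varepsilon, \delta)$-differential privacy, with the standard deviation calibrated as
$$\sigma = \frac{C\sqrt{2\ln(1.25/\delta)}}{\varepsilon}$$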
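The clipping rule and the noise calibration can be sketched as below. The threshold $C$ is passed in directly rather than derived from $d_{\text{in}}$, and the function names are hypothetical. Note that the classical analysis of the Gaussian mechanism that yields $\sigma = C\sqrt{2\ln(1.25/\delta)}/\varepsilon$ is stated for $\varepsilon < 1$; budgets such as $\varepsilon = 2$ fall outside that range.

```python
import math
import random
from typing import List


def clip_update(delta: List[float], c: float) -> List[float]:
    """Scale delta by 1 / max(1, ||delta||_2 / C) so that its L2 norm is at most C."""
    norm = math.sqrt(sum(v * v for v in delta))
    scale = max(1.0, norm / c)
    return [v / scale for v in delta]


def gaussian_sigma(c: float, epsilon: float, delta: float) -> float:
    """Noise scale sigma = C * sqrt(2 ln(1.25 / delta)) / epsilon."""
    return c * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def add_gaussian_noise(delta: List[float], sigma: float, rng: random.Random) -> List[float]:
    """Add i.i.d. N(0, sigma^2) noise to every coordinate."""
    return [v + rng.gauss(0.0, sigma) for v in delta]


clipped = clip_update([3.0, 4.0], c=1.0)            # norm 5 -> scaled to norm 1
sigma = gaussian_sigma(c=1.0, epsilon=2.0, delta=1e-5)
noisy = add_gaussian_noise(clipped, sigma, random.Random(0))
print(clipped, round(sigma, 4))                      # [0.6, 0.8] 2.4224
```

Updates whose norm is already below $C$ pass through unchanged, so clipping only affects the largest updates, which are the ones that would otherwise dominate the sensitivity.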
For convenience, the regularization and clipping can be encapsulated by a unified operator $\mathcal{P}(\cdot)$:
$$\mathcal{P}(\Delta\Theta) = \mathrm{Clip}\left(\Delta\Theta - \nabla_{\Theta} R_{\text{norm}}\right)$$
Thus, the final privacy-preserving update transmitted to the server becomes
$$\widetilde{\Delta\Theta} = \mathcal{P}(\Delta\Theta) + \mathcal{N}(0, \sigma^2 I)$$
Through the combination of L2-norm regularization, sensitivity-aware clipping, and calibrated noise injection, FedSW-TSAD enforces robust and privacy-preserving parameter updates, helping the global model generalize across clients while limiting gradient leakage.

3.2.3. FedSW-TSAD Workflow

To complete the design of FedSW-TSAD, this section presents the federated training workflow, which extends the optimization strategy of SW-TSAD into a privacy-preserving collaborative setting. As detailed in Algorithm 2, the framework adopts a parameter-isolated architecture where raw data remain local and only differentially private model updates are exchanged between clients and the central server.
Algorithm 2 FedSW-TSAD: Federated SWGAN-based Time Series Anomaly Detection.
1: Input: time series dataset $D = \{D_k\}_{k=1}^{K}$; testing set $X_{\text{test}}$; initial global parameters $\Theta_g^{(1)} = (\eta_g^{(1)}, \omega_g^{(1)}, \theta_g^{(1)})$; communication rounds $R_{\max}$; local training epochs $E$; DP budget $(\epsilon, \delta)$; regularization weight $\lambda_r$; anomaly threshold $h$.
2: Output: final global model $\Theta_g^*$ and anomaly labels $\hat{y}$.
3: for each round $r = 1$ to $R_{\max}$ do
4:     Parameter Broadcast: server distributes $\Theta_g^{(r)}$ to all clients;
5:     for each client $k \in \{1, \dots, K\}$ in parallel do
6:         Initialize local model $\Theta_k^{(r)} \leftarrow \Theta_g^{(r)}$;
7:         Local Training: train $\Theta_k^{(r)}$ for $E$ epochs using Algorithm 1 (Training Phase);
8:         Compute local update: $\Delta\Theta_k = \Theta_k^{(r)} - \Theta_g^{(r)}$;
9:         Apply differential privacy: $\widetilde{\Delta\Theta}_k = \mathcal{P}(\Delta\Theta_k) + \mathcal{N}(0, \sigma^2 I)$;
10:        Upload $\widetilde{\Delta\Theta}_k$ to server;
11:    end for
12:    Secure Aggregation: $\Delta\Theta_g^{(r)} = \sum_{k=1}^{K} \frac{T_k}{T_{\text{total}}} \widetilde{\Delta\Theta}_k$;
13:    Update: $\Theta_g^{(r+1)} = \Theta_g^{(r)} + \Delta\Theta_g^{(r)}$;
14: end for
15: Anomaly Scoring: compute $AD_{score}(X_{\text{test}})$ using Algorithm 1 (Inference Phase);
16: Threshold Judgment: $\hat{y} \leftarrow \mathbb{I}\left(AD_{score}(X_{\text{test}}) > h\right)$;
17: return $\Theta_g^*$, $\hat{y}$
In each communication round, the server first broadcasts the current global model parameters $\Theta_g^{(r)}$ to all clients. Each client $k$ then performs local training for $E$ epochs on its private dataset $D_k$, obtaining an updated model and the corresponding parameter difference $\Delta\Theta_k^{(r)}$. To protect sensitive information, a privacy-preserving operator $\mathcal{P}(\cdot)$, which combines regularization and norm clipping, is applied to the update, followed by Gaussian noise injection to ensure $(\varepsilon, \delta)$-differential privacy. The resulting noisy update $\widetilde{\Delta\Theta}_k^{(r)}$ is transmitted to the server.
In the aggregation phase, the server performs weighted averaging over all received updates, with weights proportional to client data sizes, producing the aggregated update $\Delta\Theta_g^{(r)}$. The global model is then updated as $\Theta_g^{(r+1)} = \Theta_g^{(r)} + \Delta\Theta_g^{(r)}$ and redistributed to all clients. This iterative process continues until the global model converges.
After convergence, the optimized global parameters $\Theta_g^* = (\eta_g^*, \omega_g^*, \theta_g^*)$ are used to compute anomaly scores for test sequences, following the same multi-branch scoring strategy as in SW-TSAD. This federated framework ensures that the anomaly detection model benefits from diverse client data while rigorously preserving data privacy throughout the training process.

4. Results and Discussion

This section presents a comprehensive empirical evaluation of FedSW-TSAD on multivariate time series anomaly detection tasks. It is divided into four parts: (1) a description of the benchmark datasets, (2) a formal statement of the evaluation metrics and implementation guidelines, (3) an overview of the baseline models, and (4) a presentation and analysis of the comparison results with the baselines. Through this structured validation approach, the performance of the proposed model is empirically examined, and its competitive advantages are evaluated relative to existing methods.

4.1. Datasets and Experiment Setup

The evaluation involves four publicly available multivariate time series datasets, whose characteristics are summarized in Table 1. The first two datasets, Server Machine Dataset (SMD) [40] and Pool Server Metrics (PSM) [41], are collected from distributed sensing infrastructures in large-scale computing systems. Specifically, SMD includes real-time sensor readings from 28 monitored server nodes, covering performance indicators such as CPU usage and memory activity, with anomalies labeled based on operational event reports. Proposed by eBay, PSM comprises 26-dimensional system-level measurements recorded by monitoring sensors embedded in application servers. Additionally, two sensor-based satellite telemetry datasets from NASA are included: Soil Moisture Active Passive (SMAP) and the Mars Science Laboratory (MSL) [22]. These datasets provide remote sensing data for Earth and planetary monitoring.
To rigorously evaluate FedSW-TSAD’s performance in realistic non-IID federated environments, we designed a data partitioning strategy for each of the four real-world sensor datasets (PSM, MSL, SMD, and SMAP). For all experiments, we simulated five federated clients. The raw time series data from each dataset were divided into distinct, non-overlapping chronological segments of equal size, and each segment was assigned to a unique client. Because consecutive segments cover different operating periods, this time-based partitioning induces non-IID client distributions that reflect the inherent temporal dynamics of real-world systems.
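The chronological partitioning can be sketched as follows, assuming each dataset is a single time-ordered sequence of rows and that any remainder left after dividing the series into equal segments is discarded (the text does not say how a remainder is handled). `partition_chronologically` is a hypothetical helper.

```python
from typing import List, Sequence, TypeVar

Row = TypeVar("Row")


def partition_chronologically(series: Sequence[Row], num_clients: int = 5) -> List[List[Row]]:
    """Split a time-ordered series into equally sized, non-overlapping, consecutive segments."""
    seg_len = len(series) // num_clients
    if seg_len == 0:
        raise ValueError("series is shorter than the number of clients")
    # client k receives rows [k * seg_len, (k + 1) * seg_len); trailing rows are dropped
    return [list(series[k * seg_len:(k + 1) * seg_len]) for k in range(num_clients)]


clients = partition_chronologically(list(range(23)), num_clients=5)
print([c[0] for c in clients], len(clients[0]))  # [0, 4, 8, 12, 16] 4
```

Client 1 sees only the earliest operating period and client 5 only the latest, so any drift in the monitored system becomes a difference between client distributions.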
In all experiments, time series data are segmented using a sliding window approach with a window length $S = 7$, where the first $t_0 = 6$ steps serve as the conditioning input and the last $\tau = 1$ step is used for prediction. The stride is set to 1, resulting in $N = T - S + 1$ subsequences per series of length $T$. Min-max normalization is applied to ensure consistency across features:
$$\tilde{x} = \frac{x - \min(X_{\text{train}})}{\max(X_{\text{train}}) - \min(X_{\text{train}})}$$
where $\min(X_{\text{train}})$ and $\max(X_{\text{train}})$ denote the minimum and maximum values across all the training samples, respectively.
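Windowing and normalization can be sketched in plain Python. The minimum and maximum are computed per feature here, which is an assumption (the text only says they are taken over the training samples), and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

Row = List[float]


def sliding_windows(series: Sequence[Row], s: int = 7, stride: int = 1) -> List[List[Row]]:
    """Cut a series of length T into windows of length S; stride 1 gives N = T - S + 1."""
    return [list(series[i:i + s]) for i in range(0, len(series) - s + 1, stride)]


def fit_min_max(train: Sequence[Row]) -> Tuple[Row, Row]:
    """Per-feature minimum and maximum over the training rows."""
    columns = list(zip(*train))
    return [min(c) for c in columns], [max(c) for c in columns]


def min_max_scale(rows: Sequence[Row], mins: Row, maxs: Row) -> List[Row]:
    """Apply (x - min) / (max - min); constant features are mapped to 0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


series = [[float(t), 10.0 + t] for t in range(10)]   # T = 10, M = 2
print(len(sliding_windows(series)))                  # 4 = 10 - 7 + 1
mins, maxs = fit_min_max([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
print(min_max_scale([[5.0, 20.0], [15.0, 10.0]], mins, maxs))  # [[0.5, 0.5], [1.5, 0.0]]
```

Because the statistics come from the training split only, test values can fall outside $[0, 1]$, as the last row shows; this keeps test information out of preprocessing.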
Model implementation is based on the PyTorch 1.12.1 framework, with experiments conducted on a single NVIDIA RTX 3090 GPU. The SW-TSAD model is trained until convergence on each dataset in the centralized setting. For federated learning, FedSW-TSAD is deployed with $K = 5$ clients. Each client performs $E = 5$ local training epochs per communication round, with a total of $R_{\max} = 50$ global rounds. The batch size is set to $m = 1024$, and model optimization is performed using the Adam optimizer with the learning rate $\phi = 1 \times 10^{-4}$ and momentum parameters $(\psi_1, \psi_2) = (0.5, 0.999)$. To ensure privacy, we apply FedSW-TSAD with clipping and Gaussian noise. The privacy budget $(\epsilon, \delta)$ is set to $(2, 10^{-5})$, which represents a widely accepted level of differential privacy in practice. During inference, the final anomaly score $AD_{score}$ is computed by combining reconstruction, discrimination, and prediction components with empirically tuned weights: $\alpha = 0.35$, $\beta = 0.15$, and $\gamma = 0.5$.
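For reference, the settings listed in this subsection can be gathered into a single configuration object. This is an illustrative sketch: the field names are hypothetical, and `validate` simply encodes two constraints stated in the text ($t_0 + \tau = S$ and $\alpha + \beta + \gamma = 1$).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FedSWTSADConfig:
    # Windowing
    window_length: int = 7        # S
    cond_length: int = 6          # t0
    target_length: int = 1        # tau
    stride: int = 1
    # Federated training
    num_clients: int = 5          # K
    local_epochs: int = 5         # E
    rounds: int = 50              # R_max
    batch_size: int = 1024        # m
    # Adam optimizer
    learning_rate: float = 1e-4                # phi
    betas: Tuple[float, float] = (0.5, 0.999)  # (psi_1, psi_2)
    # Differential privacy budget
    dp_epsilon: float = 2.0
    dp_delta: float = 1e-5
    # Anomaly score weights
    alpha: float = 0.35           # reconstruction
    beta: float = 0.15            # discrimination
    gamma: float = 0.5            # prediction

    def validate(self) -> None:
        """Raise ValueError if the stated constraints are violated."""
        if self.cond_length + self.target_length != self.window_length:
            raise ValueError("t0 + tau must equal S")
        if abs(self.alpha + self.beta + self.gamma - 1.0) > 1e-9:
            raise ValueError("alpha + beta + gamma must equal 1")


FedSWTSADConfig().validate()  # the reported settings pass both checks
```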

4.2. Evaluation Metrics

Anomaly detection performance is assessed using precision (Pre), recall (Rec), and F1-score (F1), which are standard metrics for binary classification in imbalanced settings. The F1-score, the harmonic mean of precision and recall, is particularly suitable for time series anomaly detection, where datasets are typically highly imbalanced (normal instances significantly outnumber anomalous ones). In such cases, the F1-score provides a balanced measure by accounting for both false positives and false negatives, offering a more informative evaluation of detection performance than accuracy alone.
The F1-score is computed based on a tunable threshold applied to the anomaly scores. To estimate the performance upper bound, we follow the evaluation protocol in Su et al. [40] and conduct an exhaustive search over threshold values ranging from $1 \times 10^{-4}$ to 1, with 150 evenly spaced points. The best achievable score under this search is reported as F1-best, reflecting the model’s maximum potential under ideal threshold calibration. Additionally, a window-based evaluation criterion is adopted, where a window is considered anomalous if at least one of its constituent time points is identified as anomalous.
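The F1-best protocol and the window-based criterion can be sketched as follows. The sketch assumes that the 150 thresholds are linearly spaced between $10^{-4}$ and $1$ inclusive and that a point is flagged when its score exceeds the threshold; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def f1_score(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for binary labels."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def window_labels(point_labels: Sequence[int], s: int) -> List[int]:
    """A window of length s is anomalous if any of its time points is anomalous."""
    return [int(any(point_labels[i:i + s])) for i in range(len(point_labels) - s + 1)]


def f1_best(scores: Sequence[float], truth: Sequence[int],
            lo: float = 1e-4, hi: float = 1.0, num: int = 150) -> Tuple[float, float]:
    """Exhaustive search over num evenly spaced thresholds; returns (best F1, threshold)."""
    step = (hi - lo) / (num - 1)
    best_f1, best_h = 0.0, lo
    for k in range(num):
        h = lo + k * step
        f1 = f1_score([int(s > h) for s in scores], truth)
        if f1 > best_f1:
            best_f1, best_h = f1, h
    return best_f1, best_h


print(window_labels([0, 0, 1, 0, 0], s=2))   # [0, 1, 1, 0]
best, h = f1_best([0.1, 0.9, 0.2, 0.8], [0, 1, 0, 1])
print(best, 0.2 <= h < 0.8)                   # 1.0 True
```

One way to combine the two pieces is to pass thresholded point labels and ground-truth point labels through `window_labels` before computing the F1-score; since the threshold is chosen on the evaluation labels themselves, F1-best is an optimistic upper bound rather than a deployable operating point.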

4.3. Baselines

To validate the effectiveness of the proposed model across different deployment settings, baseline methods are categorized into non-federated (centralized) and federated learning-based approaches. This division enables a clear comparison between SW-TSAD and existing centralized methods, as well as between FedSW-TSAD and existing federated frameworks.
Non-federated baselines. This group includes traditional machine learning algorithms, deep generative models, and Transformer-based detectors commonly used in centralized anomaly detection pipelines. LOF [13] detects anomalies by evaluating the local density deviation of a point relative to its neighbors, offering robustness in high-dimensional, unsupervised settings. iForest [14] isolates anomalies via recursive partitioning of the feature space, enabling efficient and accurate detection without probabilistic assumptions. MADGAN [32], a GAN-based framework for multivariate time-series anomaly detection, models complex dependencies through adversarial learning and employs MLP modules to reduce computational cost. USAD [42] enhances anomaly detection by combining adversarial training with an encoder-decoder structure and a signal amplification mechanism. OmniAnomaly [40] incorporates temporal dependencies via stochastic variable modeling, achieving strong performance on real-world sensor time-series benchmarks. Autoformer [43] integrates series decomposition and autocorrelation mechanisms into a deep architecture to improve temporal pattern extraction and forecasting accuracy. Informer [44] introduces the ProbSparse self-attention and a generative decoder, significantly boosting efficiency in long-sequence modeling. FEDformer [45] combines Fourier-based decomposition with Transformer structures to enhance global pattern extraction and scalability. AT [46] proposes a Transformer-based anomaly detector leveraging anomaly-attention and discrepancy-aware learning to capture multi-scale dependencies and distinguish anomalous behaviors. FPT [47] adapts frozen pretrained Transformers from NLP and vision domains for time-series tasks, maintaining model structure while achieving competitive performance and offering theoretical insights through its connection to PCA.
Federated baselines. This group consists of models designed for or adapted to the federated setting, where training occurs across decentralized clients. Several Transformer-based models are extended to the federated setting via the FedAvg algorithm [7], including Autoformer (fl), Informer (fl), FEDformer (fl), AT (fl), and FPT (fl). In addition, DeepSVDD [36], a one-class classification method evaluated in the FedTADBench benchmark, and FedAnomaly [37], a variational autoencoder-based distributed detection framework, are included as federated baseline models.

4.4. Main Results

Table 2 and Table 3 present the evaluation results comparing the proposed models and baselines under centralized and federated settings across four benchmark datasets (PSM, SMAP, MSL, and SMD). In each setting, the best F1-score is marked in bold, and the second-best is underlined for clarity.
In the centralized setting, SW-TSAD achieves consistently strong results, outperforming all baselines on every dataset. For example, it reaches an F1-score of 97.95% on PSM, exceeding the next-best model, FPT, by a small margin of 0.88%. On SMAP, it achieves 80.81%, outperforming the best-performing traditional baseline, iForest, by 10.4 percentage points. SW-TSAD also combines high recall with high precision on MSL and SMD, achieving F1-scores of 88.35% and 93.50% and exceeding the best existing models by 3.71% and 1.48%, respectively. Although the margins over strong baselines such as FPT, AT, and OmniAnomaly are small, they suggest that integrating reconstruction and prediction modules contributes to a more robust and generalizable detector.
The effectiveness of the model is further validated in the federated learning framework. Under a five-client setting, FedSW-TSAD maintains an F1-score of 97.91% on the PSM dataset, only 0.04 percentage points below centralized training (97.95%). This is substantially better than the other federated learning methods, with an F1-score improvement of 25.55 percentage points over the best-performing alternative, DeepSVDD (72.36%). On the SMAP dataset, the proposed method sustains 100% recall in the federated setting and achieves an F1-score of 79.92%, a relative improvement of 48.9% over the federated baseline FedAnomaly (53.66%).
These results highlight the key strength of the proposed framework: while SW-TSAD shows moderate advantages, FedSW-TSAD demonstrates substantial improvement over alternatives. This improvement stems from two core components. First, combining prediction and reconstruction modules enhances anomaly coverage across diverse behaviors. Second, the Sobolev–Wasserstein constraint stabilizes GAN training, while L2-norm clipping regularizes all local modules, mitigating client inconsistency in federated updates. Together, these designs enable FedSW-TSAD to approach centralized performance while preserving privacy. This makes it well-suited for anomaly detection in distributed sensor networks and privacy-sensitive industrial IoT applications, where local heterogeneity and data protection requirements hinder the use of centralized methods.

5. Comprehensive Model Analysis

This section presents a comprehensive evaluation of FedSW-TSAD from multiple perspectives, including effectiveness, efficiency, robustness, and privacy. The analysis covers an ablation study, system performance, client instability, hyperparameter sensitivity, and the impact of differential privacy mechanisms, providing a well-rounded understanding of the model’s practical behavior in federated anomaly detection scenarios.

5.1. Ablation Study

To assess the impact of individual components on model performance, an ablation study was conducted by systematically removing key elements from the FedSW-TSAD architecture. The evaluation focused on the effects of excluding the Temporal Convolutional Network, differential privacy (DP), Sobolev–Wasserstein Constraint (SW Constraint), and Hybrid Anomaly Scoring. The experimental results obtained using four benchmark datasets are presented in Table 4.
  • The effect of TCN: Removing the Temporal Convolutional Network module (denoted as w/o TCN) significantly reduces the F1-score across all datasets. For instance, in the PSM dataset, the F1-score decreases from 97.91 to 93.30, while in the MSL dataset, it drops from 86.49 to 80.02. This confirms the importance of the TCN in modeling temporal dependencies essential for accurate anomaly detection.
  • The effect of DP: The removal of DP (w/o DP) leads to a slight improvement in the F1-score on certain datasets, such as PSM (from 97.91 to 98.31) and MSL (from 86.49 to 89.24). However, DP remains essential for privacy preservation in federated learning. These findings underscore the trade-off between privacy and performance, which must be carefully managed.
  • The effect of the Sobolev–Wasserstein constraint: Excluding the Sobolev–Wasserstein constraint (w/o SW constraint) results in a notable performance decline, particularly in the PSM dataset, where the F1-score drops from 97.91 to 94.01. This suggests that the SW constraint is crucial in enhancing feature alignment across clients, thereby improving overall model robustness in federated environments.
  • The effect of Hybrid Anomaly Scoring: To evaluate the contribution of the hybrid scoring mechanism, additional ablation settings were introduced during the inference stage. In the first variant, the prediction-based score $P_{score}$ was removed, resulting in a reconstruction-only detector denoted as (w/o Prediction Score). In the second variant, the SWGAN-based scores $R_{score}$ and $D_{score}$ were removed, simulating a purely prediction-based scheme, denoted as (w/o SWGAN Score). The results exhibit distinct trends across datasets with different anomaly types. On MSL and SMAP, where collective anomalies dominate, removing the SWGAN-based scores led to large F1-score drops of 18.57 and 26.32 percentage points, respectively, whereas removing the prediction score caused only minor degradation (3.01 and 5.20 points). This suggests that reconstruction-based detection is more effective at capturing structured anomalies that span multiple dimensions or time steps. Conversely, on PSM and SMD, datasets characterized by point anomalies and abrupt shifts, the prediction score played a more critical role: removing $P_{score}$ decreased the F1-score by 18.75 and 30.51 percentage points, respectively, whereas removing the SWGAN-based scores led to smaller declines (6.68 and 7.62 points). These findings highlight the complementary nature of the two scoring paradigms: while each offers benefits for specific anomaly types, their combination yields robust and consistently high detection performance across diverse scenarios.
The ablation study demonstrates that the TCN and the Sobolev–Wasserstein constraint are crucial for enhancing detection accuracy, while differential privacy introduces a moderate trade-off between utility and privacy (see Section 5.7). Additionally, the hybrid scoring mechanism significantly improves robustness: removing either the prediction-based or reconstruction-based score leads to substantial performance drops on datasets dominated by different anomaly types. These results highlight the necessity of each component and the importance of combining complementary signals in federated anomaly detection.

5.2. System Efficiency Analysis

To assess the computational overhead and potential for edge deployment, we conducted targeted experiments on the MSL dataset. Compared to the other datasets used in this study, MSL offers a moderate scale in both training and test set sizes and features the highest input dimensionality (55 sensors), making it a representative choice for evaluating runtime and memory efficiency under realistic multivariate input conditions. As shown in Table 5 and Figure 5, the predictor contains the majority of parameters—189.6K, representing over 84% of the model size. The total floating point operations (FLOPs) per round amount to 302.4 MFLOPs, which remains lightweight given the model’s modular architecture.
We further measured the runtime across three GPU platforms of varying capacity (Table 6 and Figure 6). Even with full-precision FP32 operations, the model achieved rapid execution: 3.2 s per round on an RTX 3060 and just 0.5 s on an RTX 4090, with stable memory usage at 2.1 GB across all devices. These figures suggest good scalability and suitability for edge GPUs.
Given the predictor’s parameter dominance, we applied two efficiency optimizations: 16-bit floating-point quantization and structured pruning at 50% sparsity (Table 7 and Figure 7). Quantization reduced the model size to 180K parameters and memory usage to 1.7 GB, with a modest F1 decline (from 86.49% to 83.95%). Pruning further compressed the model to 113K parameters and 1.1 GB of memory while retaining an F1-score of 82.71%. These results indicate that the model can be substantially compressed at a moderate cost in accuracy, with the F1-score falling by at most 3.78 percentage points.
Overall, the experiments confirm that the proposed model—despite its use of SWGAN, TCN, and LSTM modules—can maintain low computational costs. The applied compression techniques enable faster inference and reduced memory footprint, supporting deployment on resource-limited edge devices.

5.3. Sensitivity to the Number of Clients

The influence of client quantity on model effectiveness was examined by comparing four configurations: a centralized setup (all data on a single server) and federated settings with 5, 10, and 15 clients. Performance was evaluated using the precision, recall, and F1-score across all four datasets. The results are summarized in Table 8.
Overall, the centralized configuration yields the highest performance, with a gradual decline observed as the number of clients increases. On the PSM dataset, for instance, the F1-score drops from 97.95 (centralized) to 97.91 (5 clients), 94.97 (10 clients), and 92.12 (15 clients). A similar trend is observed across other datasets: on SMAP, the F1-score drops from 80.81 to 75.20; on MSL, from 88.35 to 81.38; and on SMD, from 93.50 to 87.66, as the number of clients increases from 1 (centralized) to 15.
This degradation reflects a key trade-off in federated learning: more clients offer improved data locality and privacy but introduce greater statistical heterogeneity and reduce the local data volume, thereby weakening generalization. Across the four datasets, the F1-score decline from the centralized to the 15-client configuration ranges from 5.61 percentage points (SMAP) to 6.97 percentage points (MSL), clearly illustrating this trend. These observations underscore the importance of selecting an appropriate client scale to balance performance and deployment constraints in real-world scenarios.
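The size of these drops can be checked directly from the F1-scores quoted in this subsection (centralized versus 15 clients). The short computation below reports both the absolute drop in percentage points and the relative drop; `f1_drops` is a hypothetical helper.

```python
from typing import Dict, Tuple

# F1-scores (%) as reported in Section 5.3: (centralized, 15 clients)
F1_RESULTS: Dict[str, Tuple[float, float]] = {
    "PSM": (97.95, 92.12),
    "SMAP": (80.81, 75.20),
    "MSL": (88.35, 81.38),
    "SMD": (93.50, 87.66),
}


def f1_drops(results: Dict[str, Tuple[float, float]]) -> Dict[str, Tuple[float, float]]:
    """Return {dataset: (absolute drop in points, relative drop in %)}."""
    return {
        name: (central - fed, 100.0 * (central - fed) / central)
        for name, (central, fed) in results.items()
    }


for name, (absolute, relative) in f1_drops(F1_RESULTS).items():
    print(f"{name}: {absolute:.2f} points ({relative:.2f}% relative)")
# PSM: 5.83 points (5.95% relative)
# SMAP: 5.61 points (6.94% relative)
# MSL: 6.97 points (7.89% relative)
# SMD: 5.84 points (6.25% relative)
```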

5.4. Robustness Evaluation

To evaluate the robustness of the proposed FedSW-TSAD model, we conducted experiments focusing on two key aspects: the model’s ability to handle noisy or corrupted client data and its tolerance to client dropout and intermittent communication during federated training. All robustness experiments were conducted on the SMD dataset because its large size and low anomaly rate make it a suitable testbed for evaluating performance under partial corruption or instability. The federated configuration used in these experiments consists of five clients, each trained on a non-overlapping partition of the SMD training data.
In real-world scenarios, client devices may generate noisy or partially corrupted time series data due to sensor faults or environmental interference. To simulate local data corruption, we randomly selected 20% of the clients (i.e., one of the five clients) and added zero-mean Gaussian noise to all of their training samples. The noise intensity was controlled by the standard deviation σ of the Gaussian distribution, set to 10%, 15%, or 20% of the average magnitude of the input features. The test data remained clean so that generalization robustness could be assessed. Table 9 presents the F1-scores under the different noise levels.
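A minimal NumPy sketch of this corruption protocol is shown below. It assumes that each client's training data is a 2-D array of shape (time steps, features) and interprets "average magnitude of the input features" as the mean absolute value over that client's array (a single scalar); a per-feature σ would be an equally plausible reading. The function name and seeding scheme are illustrative, not taken from the authors' code.

```python
import numpy as np


def corrupt_clients(client_data, noisy_fraction=0.2, noise_level=0.10, seed=0):
    """Add zero-mean Gaussian noise to a random subset of clients.

    client_data: list of (T, D) training arrays, one per client.
    noise_level: sigma as a fraction of the mean absolute feature value
                 (assumed interpretation of "average magnitude").
    Returns the corrupted list and the sorted indices of noisy clients.
    Test data is never passed in, so it stays clean.
    """
    rng = np.random.default_rng(seed)
    n_noisy = max(1, round(noisy_fraction * len(client_data)))
    noisy_ids = rng.choice(len(client_data), size=n_noisy, replace=False)
    out = [x.copy() for x in client_data]
    for i in noisy_ids:
        sigma = noise_level * np.mean(np.abs(out[i]))
        out[i] = out[i] + rng.normal(0.0, sigma, size=out[i].shape)
    return out, sorted(int(i) for i in noisy_ids)
```

With five clients and `noisy_fraction=0.2`, exactly one client is corrupted, matching the setup described above.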
As shown in Table 9, the model maintains relatively high detection performance under mild noise (10–15%), with F1-score decreases of 3.92 and 6.86 percentage points, respectively. At the 20% noise level, the F1-score falls to 80.08% (a drop of 13.09 points), indicating that the model tolerates moderate data corruption but degrades more sharply as the noise grows. We attribute this resilience to the regularized training of the Sobolev–Wasserstein GAN and to the federated aggregation mechanism, which together mitigate the influence of the corrupted client.
Federated learning systems often suffer from client unavailability due to network instability or device power constraints. To simulate this, we applied random dropout in each communication round: each client had a fixed probability (10%, 20%, 30%, or 40%) of becoming temporarily unavailable and skipping that round. Table 10 summarizes the model's performance under the different dropout rates.
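The dropout protocol can be sketched as a single FedAvg-style aggregation round in which each client independently skips the round with probability p. The following assumptions are not stated in the text: model parameters are a flat NumPy vector, aggregation weights are proportional to local sample counts as in FedAvg [7], and the global model is left unchanged if every client drops out. `local_update` is a hypothetical placeholder for a client's local training step.

```python
import numpy as np


def federated_round(global_w, clients, local_update, dropout_p, rng):
    """One aggregation round with independent random client dropout.

    clients: list of (data, n_samples) pairs.
    local_update: hypothetical callable (weights, data) -> new weights.
    Returns the new global weights and the ids of participating clients.
    """
    # Each client stays active with probability 1 - dropout_p.
    active = [i for i in range(len(clients)) if rng.random() >= dropout_p]
    if not active:  # every client dropped out: keep the current model
        return global_w, active
    updates, sizes = [], []
    for i in active:
        data, n = clients[i]
        updates.append(local_update(global_w.copy(), data))
        sizes.append(n)
    # Sample-size-weighted average over the clients that reported back.
    weights = np.asarray(sizes, dtype=float) / sum(sizes)
    new_w = sum(w * u for w, u in zip(weights, updates))
    return new_w, active
```

Because only participating clients contribute, a higher dropout rate reduces the effective data seen per round, which is consistent with the gradual F1 decline in Table 10.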
The results show that FedSW-TSAD tolerates moderate client dropout. With dropout rates up to 20%, the F1-score remains above 90% (90.98% at 20%, compared with 93.17% without dropout). Performance declines more noticeably at higher rates (88.45% at 30% and 83.47% at 40%), but the model still retains acceptable anomaly detection capability, indicating robustness in dynamic federated environments.

5.5. Hyperparameter Sensitivity Analysis

In this study, we conducted a hyperparameter sensitivity analysis to examine how the weights of different loss components influence model performance. We focused on three coefficients: α (discrimination loss), β (reconstruction loss), and γ (prediction loss), and designed two rounds of systematic experiments to explore their impact and determine an effective configuration.
In the first round, we fixed γ = 0.5 and maintained α + β = 0.5, varying the ratio of α to β to investigate how the relative weights of the discrimination and reconstruction components affect performance. The results, summarized in Table 11 and visualized in Figure 8, show that the F1-score generally increases with α up to α = 0.35 and β = 0.15 across the four datasets (PSM, SMAP, MSL, and SMD). This configuration yields the best F1-score on PSM (97.91%) and MSL (86.49%), ties for the best on SMAP (79.92%, the same value obtained for α between 0.20 and 0.35), and is close to the best on SMD (93.17%, versus 93.39% at α = 0.30), giving the highest average F1-score across the datasets. These outcomes indicate that giving more weight to the discrimination term (α) improves anomaly detection. Reducing the reconstruction term (β) to 0.15 does not impair performance and may help the model focus on learning more discriminative features, but reducing it further (β = 0.10 or 0.05) lowers the F1-score on PSM, SMAP, and MSL, so the reconstruction term should not be removed.
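To make the comparison in Table 11 explicit, the sketch below finds, for each dataset, the α value(s) with the highest F1-score, and the α that maximizes the F1-score averaged over the four datasets. It uses only the F1 columns of Table 11; weighting all datasets equally in the average is a choice made here for illustration, not a selection criterion stated by the authors.

```python
# F1-scores (%) from Table 11 (gamma fixed at 0.5, alpha + beta = 0.5).
ALPHAS = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45]
F1 = {
    "PSM":  [83.67, 89.82, 89.97, 92.36, 91.65, 96.58, 97.91, 94.40, 91.74],
    "SMAP": [74.66, 76.83, 76.83, 79.92, 79.92, 79.92, 79.92, 74.10, 64.31],
    "MSL":  [60.40, 60.22, 60.22, 65.29, 76.97, 77.33, 86.49, 70.66, 60.26],
    "SMD":  [92.10, 91.85, 92.31, 93.03, 93.16, 93.39, 93.17, 93.25, 93.09],
}


def best_alphas(scores):
    """Return every alpha that attains the maximum F1 (ties are kept)."""
    top = max(scores)
    return [a for a, s in zip(ALPHAS, scores) if s == top]


def best_mean_alpha(table):
    """Alpha with the highest F1 averaged (equal weights) over datasets."""
    means = [sum(col) / len(col) for col in zip(*table.values())]
    return ALPHAS[means.index(max(means))]


if __name__ == "__main__":
    for name, scores in F1.items():
        print(name, "best alpha:", best_alphas(scores))
    print("best alpha by mean F1:", best_mean_alpha(F1))
```

Keeping ties matters for SMAP, where four settings share the same F1-score, and the per-dataset view shows that SMD alone peaks at α = 0.30.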
Guided by these results, we fixed the ratio α : β = 7 : 3 (i.e., 0.35 : 0.15) and conducted a second round of experiments to assess how varying γ affects performance, as shown in Table 12 and visualized in Figure 9. We gradually increased γ from 0.17 to 0.67 to test different contribution levels of the prediction term. As γ increased moderately, F1-scores improved across all datasets, with the best results at γ = 0.50. This setting coincides with the best configuration of the first round and therefore reproduces its F1-scores: 97.91% on PSM, 79.92% on SMAP, 86.49% on MSL, and 93.17% on SMD. These findings emphasize the importance of balancing the three loss terms, as both overly low and overly high values of γ degrade performance.
Taken together, the two rounds of sensitivity analysis suggest that the configuration α : β : γ = 0.35 : 0.15 : 0.50 offers a reliable trade-off among the three loss components, yielding stable and competitive results across the four benchmark datasets and providing practical guidance for model tuning.
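If the three coefficients weight a single linear combination of the loss terms (an assumption made here for illustration; the symbols $\mathcal{L}_{\mathrm{dis}}$, $\mathcal{L}_{\mathrm{rec}}$, and $\mathcal{L}_{\mathrm{pre}}$ are placeholder names for the discrimination, reconstruction, and prediction losses, not the paper's own notation), the selected configuration can be written as below. It satisfies both the first-round constraint α + β = 0.5 and the second-round ratio α : β = 7 : 3.

```latex
\mathcal{L} = \alpha\,\mathcal{L}_{\mathrm{dis}} + \beta\,\mathcal{L}_{\mathrm{rec}} + \gamma\,\mathcal{L}_{\mathrm{pre}},
\qquad (\alpha, \beta, \gamma) = (0.35,\ 0.15,\ 0.50),
\qquad \frac{\alpha}{\beta} = \frac{0.35}{0.15} = \frac{7}{3},
\qquad \alpha + \beta = 0.50 .
```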

5.6. Privacy-Utility Trade-Off

To assess the practical impact of differential privacy (DP) on TSAD performance, we systematically evaluate the trade-off between privacy protection and model utility. The ablation results in Table 4 already show that DP noise lowers detection performance (e.g., removing DP raises the F1-score on SMD from 93.17% to 96.18%); this subsection quantifies how the privacy budget affects results under the (ϵ, δ)-DP framework.
Specifically, we fix δ = 10⁻⁵ and vary the privacy budget ϵ ∈ {1, 2, 3, 4}. For each setting, we calculate the corresponding noise multiplier σ and retrain the FedSW-TSAD model with identical optimization parameters. An additional baseline with ϵ = ∞ represents the non-private case. Precision, recall, and F1-score are evaluated on four benchmark datasets: PSM, SMAP, MSL, and SMD.
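The text does not state how σ is derived from (ϵ, δ). As one hedged possibility, the sketch below calibrates σ with the analytic Gaussian mechanism of Balle and Wang (2018) for a single release with L2 sensitivity 1, and then applies it to a client update after L2-norm clipping. Here σ is treated as a noise multiplier in the usual DP-SGD sense (noise standard deviation divided by the clipping bound). Two caveats apply: the calibration ignores composition over multiple communication rounds, for which a deployment would normally use a privacy accountant (e.g., Rényi DP), and `clip_norm` as well as all function names are hypothetical.

```python
import math

import numpy as np


def _std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gaussian_delta(sigma, eps):
    """Exact delta of the Gaussian mechanism with L2 sensitivity 1
    and noise std sigma (Balle & Wang, 2018, Theorem 8)."""
    a, b = 1.0 / (2.0 * sigma), eps * sigma
    return _std_normal_cdf(a - b) - math.exp(eps) * _std_normal_cdf(-a - b)


def noise_multiplier(eps, delta, tol=1e-6):
    """Smallest sigma (up to tol) with gaussian_delta(sigma, eps) <= delta.

    Single release only: composition over rounds is NOT accounted for.
    """
    lo, hi = 1e-3, 1.0
    while gaussian_delta(hi, eps) > delta:  # grow until feasible
        hi *= 2.0
    while hi - lo > tol:  # bisection: delta decreases as sigma grows
        mid = 0.5 * (lo + hi)
        if gaussian_delta(mid, eps) > delta:
            lo = mid
        else:
            hi = mid
    return hi


def privatize_update(update, clip_norm, sigma, rng):
    """Clip a flat update to L2 norm <= clip_norm, then add
    Gaussian noise with standard deviation sigma * clip_norm."""
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    return clipped + rng.normal(0.0, sigma * clip_norm, size=update.shape)


if __name__ == "__main__":
    for eps in (1, 2, 3, 4):
        print(f"eps={eps}: sigma={noise_multiplier(eps, 1e-5):.3f}")
```

A smaller ϵ yields a larger σ and therefore noisier updates, which matches the direction of the utility trend reported for Table 13; the exact σ values depend on the accountant actually used.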
As shown in Table 13, model performance improves consistently as ϵ increases, which corresponds to less DP noise and weaker privacy guarantees. With ϵ = 1, the average F1-score is only 85.79, 5.90 points below the non-private baseline of 91.69, reflecting a noticeable loss of utility. When the privacy budget is relaxed to ϵ = 4, the average F1-score rises to 91.25, only 0.44 points below the non-private baseline.

5.7. Case Study: Impact of Differential Privacy on Federated Learning

To evaluate how differential privacy influences the training dynamics of FedSW-TSAD, gradient updates were recorded over 10 consecutive epochs on the SMAP dataset with five federated clients. Figure 10 compares the gradient updates in the non-private setting (w/o DP) and the DP-protected setting using a grouped bar chart.
The statistical significance of the observed difference was assessed using a two-sample t-test. For two equally sized groups ($n_1 = n_2 = 10$), the degrees of freedom were computed as:
$$\mathrm{df} = n_1 + n_2 - 2 = 18$$
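The t-statistic was calculated as:
$$t = \frac{\bar{X}_{\mathrm{DP}} - \bar{X}_{\text{w/o DP}}}{\sqrt{\dfrac{s_{\mathrm{DP}}^{2}}{n_{\mathrm{DP}}} + \dfrac{s_{\text{w/o DP}}^{2}}{n_{\text{w/o DP}}}}}$$
where $\bar{X}$, $s^{2}$, and $n$ denote the mean, sample variance, and size of each group ($n_{\mathrm{DP}} = n_1$, $n_{\text{w/o DP}} = n_2$).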
With mean gradient update values of $\bar{X}_{\mathrm{DP}} = 2.13 \times 10^{-2}$ and $\bar{X}_{\text{w/o DP}} = 2.01 \times 10^{-2}$, the analysis yields a t-value of 2.37 and a p-value of 0.029. That is, under the null hypothesis of no difference in gradient updates between the two groups, the probability of observing a result at least as extreme is only 2.9%.
Since p < 0.05, the null hypothesis is rejected. To assess the strength of the observed difference, Cohen’s d was computed:
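$$d = \frac{\left|\bar{X}_{\mathrm{DP}} - \bar{X}_{\text{w/o DP}}\right|}{\sqrt{\dfrac{(n_{\mathrm{DP}} - 1)\,s_{\mathrm{DP}}^{2} + (n_{\text{w/o DP}} - 1)\,s_{\text{w/o DP}}^{2}}{n_{\mathrm{DP}} + n_{\text{w/o DP}} - 2}}} = 0.74$$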
The resulting effect size indicates a moderate impact of DP-induced noise on local gradients (d > 0.5). These findings show that DP measurably shifts the local gradient updates, while the utility results in Section 5.6 indicate that the model remains trainable with acceptable detection performance, supporting the feasibility of privacy-preserving training with FedSW-TSAD.
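The t-statistic and Cohen's d defined above can be reproduced from the per-epoch gradient-update values using only the standard library, as in the sketch below. The inputs are placeholders for the ten recorded values per group, which are not reported in the text. For equal group sizes the two statistics are linked by $d = |t|\sqrt{2/n}$, which offers a quick consistency check on reported values. A two-sided p-value for the Student t-test can be obtained with `scipy.stats.ttest_ind(x, y)`, whose default `equal_var=True` uses df = n1 + n2 − 2.

```python
import math
import statistics


def t_and_cohens_d(x, y):
    """Two-sample t-statistic (formula above), df = n1 + n2 - 2,
    and Cohen's d using the pooled standard deviation."""
    n1, n2 = len(x), len(y)
    m1, m2 = statistics.mean(x), statistics.mean(y)
    v1, v2 = statistics.variance(x), statistics.variance(y)  # sample variances
    t = (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    d = abs(m1 - m2) / pooled_sd
    return t, n1 + n2 - 2, d
```

Usage: pass the ten DP and ten non-DP gradient-update values as two lists, e.g. `t, df, d = t_and_cohens_d(dp_values, no_dp_values)`.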

6. Conclusions

Distributed sensor networks are increasingly deployed in industrial and environmental systems, where decentralized sensing and strict privacy constraints render centralized anomaly detection approaches impractical. To address these challenges, this study presents FedSW-TSAD, a federated learning framework tailored for time series anomaly detection in such settings. The framework integrates Sobolev–Wasserstein GANs to stabilize generative modeling, temporal convolutional networks for feature extraction, and a hybrid scoring mechanism to improve robustness under heterogeneous anomaly types. These components collectively address two core difficulties in federated TSAD: unstable training and limited generalization across clients.
Comprehensive experiments on four real-world sensor datasets demonstrate that FedSW-TSAD consistently outperforms both centralized and federated baselines in F1-score. Notably, with five clients it achieves F1-scores close to those of the centralized configuration while preserving privacy through L2-norm-constrained differential privacy. These results underscore the significance of designing anomaly detection models that align with the distributed nature of modern sensor networks, offering a scalable and privacy-aware solution for critical applications such as industrial IoT, predictive maintenance, remote diagnostics, and smart healthcare.
While FedSW-TSAD demonstrates strong performance across multiple datasets, several limitations warrant future exploration. First, the framework relies on fixed-size sliding windows for segmenting time series, which may be suboptimal for sequences with dynamic temporal patterns. Second, although the model simulates concept drift through time-based data partitioning, it lacks real-time adaptability to long-term distributional changes. To address these limitations, future work may incorporate adaptive windowing strategies based on signal characteristics and develop online adaptation mechanisms for sustained deployment. Furthermore, explainable anomaly detection techniques, such as attention-based visualizations, perturbation-based saliency analysis, or post hoc feature attribution, could enhance model interpretability and support trustworthy deployment in safety-critical environments.

Author Contributions

Conceptualization, X.Z. and H.Z.; methodology, X.Z.; software, X.Z.; validation, X.Z., H.Z., H.S. and B.Z.; formal analysis, X.Z.; investigation, X.Z. and H.Z.; resources, W.Z. and S.C.; data curation, X.Z.; writing—original draft preparation, X.Z.; writing—review and editing, H.Z., H.S., B.Z., W.Z. and S.C.; visualization, X.Z.; supervision, W.Z. and S.C.; funding acquisition, W.Z. and S.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the Shandong Provincial Natural Science Foundation Major Basic Research Program (Grant No. ZR2024ZD20), the Shandong Data Open Innovative Application Laboratory, and the National Natural Science Foundation of China (Grant No. 62072469).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. These datasets include the Server Machine Dataset (SMD) [40], Pool Server Metrics (PSM) [41], Soil Moisture Active Passive (SMAP) [22], and Mars Science Laboratory (MSL) [22].

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Wang, F.; Jiang, Y.; Zhang, R.; Wei, A.; Xie, J.; Pang, X. A Survey of Deep Anomaly Detection in Multivariate Time Series: Taxonomy, Applications, and Directions. Sensors 2025, 25, 190.
2. Yan, P.; Abdulkadir, A.; Luley, P.P.; Rosenthal, M.; Schatte, G.A.; Grewe, B.F.; Stadelmann, T. A Comprehensive Survey of Deep Transfer Learning for Anomaly Detection in Industrial Time Series: Methods, Applications, and Directions. IEEE Access 2024, 12, 3768–3789.
3. Zhang, Z.; Geng, Z.; Han, Y. Graph Structure Change-Based Anomaly Detection in Multivariate Time Series of Industrial Processes. IEEE Trans. Ind. Inform. 2024, 20, 6457–6466.
4. Nawaz, A.; Khan, S.S.; Ahmad, A. Ensemble of Autoencoders for Anomaly Detection in Biomedical Data: A Narrative Review. IEEE Access 2024, 12, 17273–17289.
5. Pinto, A.; Herrera, L.C.; Donoso, Y.; Gutierrez, J.A. Enhancing Critical Infrastructure Security: Unsupervised Learning Approaches for Anomaly Detection. Int. J. Comput. Intell. Syst. 2024, 17, 236.
6. Xu, R.; Miao, H.; Wang, S.; Yu, P.S.; Wang, J. PeFAD: A Parameter-Efficient Federated Framework for Time Series Anomaly Detection. In Proceedings of the 30th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Barcelona, Spain, 25–29 August 2024; pp. 3621–3632.
7. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, Ft. Lauderdale, FL, USA, 20–22 April 2017; Volume 54, pp. 1273–1282.
8. Hallaji, E.; Razavi-Far, R.; Saif, M.; Wang, B.; Yang, Q. Decentralized Federated Learning: A Survey on Security and Privacy. IEEE Trans. Big Data 2024, 10, 194–213.
9. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144.
10. Lucic, M.; Kurach, K.; Michalski, M.; Gelly, S.; Bousquet, O. Are GANs Created Equal? A Large-Scale Study. In Proceedings of the Advances in Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018; Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2018; Volume 31.
11. Wong, L.; Liu, D.; Berti-Equille, L.; Alnegheimish, S.; Veeramachaneni, K. AER: Auto-Encoder with Regression for Time Series Anomaly Detection. In Proceedings of the 2022 IEEE International Conference on Big Data (Big Data), Osaka, Japan, 17–20 December 2022; pp. 1152–1161.
12. Xu, M.; Zhou, Z.; Lu, G.; Tang, J.; Zhang, W.; Yu, Y. Sobolev Wasserstein GAN. arXiv 2020, arXiv:2012.03420.
13. Breunig, M.M.; Kriegel, H.P.; Ng, R.T.; Sander, J. LOF: Identifying Density-Based Local Outliers. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, TX, USA, 16–18 May 2000; pp. 93–104.
14. Liu, F.T.; Ting, K.M.; Zhou, Z.H. Isolation Forest. In Proceedings of the 8th IEEE International Conference on Data Mining (ICDM), Pisa, Italy, 15–19 December 2008; pp. 413–422.
15. León-López, K.M.; Mouret, F.; Arguello, H.; Tourneret, J.Y. Anomaly Detection and Classification in Multispectral Time Series Based on Hidden Markov Models. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–11.
16. Abdi, H.; Williams, L.J. Principal component analysis. Wiley Interdiscip. Rev. Comput. Stat. 2010, 2, 433–459.
17. Angiulli, F.; Pizzuti, C. Fast Outlier Detection in High Dimensional Spaces. In Lecture Notes in Computer Science, Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD), Helsinki, Finland, 19–23 August 2002; Springer: Berlin/Heidelberg, Germany, 2002; Volume 2431, pp. 15–27.
18. Schölkopf, B.; Williamson, R.C.; Smola, A.; Shawe-Taylor, J.; Platt, J. Support Vector Method for Novelty Detection. In Proceedings of the Advances in Neural Information Processing Systems, Denver, CO, USA, 29 November–4 December 1999; Volume 12, pp. 582–588.
19. Ren, H.; Xu, B.; Wang, Y.; Yi, C.; Huang, C.; Kou, X.; Xing, T.; Yang, M.; Tong, J.; Zhang, Q. Time-Series Anomaly Detection Service at Microsoft. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Anchorage, AK, USA, 4–8 August 2019; pp. 3009–3017.
20. Lin, S.; Clark, R.; Birke, R.; Schonborn, S.; Trigoni, N.; Roberts, S. Anomaly Detection for Time Series Using VAE-LSTM Hybrid Model. In Proceedings of the ICASSP 2020 – 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 4322–4326.
21. Chen, Z.; Chen, D.; Zhang, X.; Yuan, Z.; Cheng, X. Learning Graph Structures with Transformer for Multivariate Time Series Anomaly Detection in IoT. IEEE Internet Things J. 2021, 9, 9179–9189.
22. Hundman, K.; Constantinou, V.; Laporte, C.; Colwell, I.; Soderstrom, T. Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), London, UK, 19–23 August 2018; pp. 387–395.
23. Fadili, Y.; El Yamani, Y.; Kilani, J.; El Kamoun, N.; Baddi, Y.; Bensalah, F. An Enhancing Timeseries Anomaly Detection Using LSTM and Bi-LSTM Architectures. In Proceedings of the 2024 11th International Conference on Wireless Networks and Mobile Communications (WINCOM), Leeds, UK, 23–25 July 2024; pp. 1–6.
24. Park, S.; Jung, S.; Jung, S.; Rho, S.; Hwang, E. Sliding Window-Based LightGBM Model for Electric Load Forecasting Using Anomaly Repair. J. Supercomput. 2021, 77, 12857–12878.
25. Wieczorek, M.; Siłka, J.; Woźniak, M. Neural network powered COVID-19 spread forecasting model. Chaos Solitons Fractals 2020, 140, 110203.
26. Siłka, J.; Wieczorek, M.; Woźniak, M. Recurrent neural network model for high-speed train vibration prediction from time series. Neural Comput. Appl. 2022, 34, 13305–13318.
27. Hsieh, R.J.; Chou, J.; Ho, C.H. Unsupervised Online Anomaly Detection on Multivariate Sensing Time Series Data for Smart Manufacturing. In Proceedings of the 2019 IEEE 12th Conference on Service-Oriented Computing and Applications (SOCA), Kaohsiung, Taiwan, 18–21 November 2019; pp. 90–97.
28. Lee, Y.; Park, C.; Kim, N.; Ahn, J.; Jeong, J. LSTM-Autoencoder Based Anomaly Detection Using Vibration Data of Wind Turbines. Sensors 2024, 24, 2833.
29. Kingma, D.P.; Welling, M. Auto-Encoding Variational Bayes. arXiv 2013, arXiv:1312.6114.
30. He, S.; Du, M.; Jiang, X.; Zhang, W.; Wang, C. VAEAT: Variational Autoencoder with Adversarial Training for Multivariate Time Series Anomaly Detection. Inf. Sci. 2024, 676, 120852.
31. Zong, B.; Song, Q.; Min, M.R.; Cheng, W.; Lumezanu, C.; Cho, D.; Chen, H. Deep Autoencoding Gaussian Mixture Model for Unsupervised Anomaly Detection. In Proceedings of the International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 30 April–3 May 2018.
32. Li, D.; Chen, D.; Jin, B.; Shi, L.; Goh, J.; Ng, S.K. MAD-GAN: Multivariate Anomaly Detection for Time Series Data with Generative Adversarial Networks. In Lecture Notes in Computer Science, Proceedings of the Artificial Neural Networks and Machine Learning – ICANN 2019, Munich, Germany, 17–19 September 2019; Springer: Berlin/Heidelberg, Germany, 2019; Volume 11728, pp. 703–716.
33. Geiger, A.; Liu, D.; Alnegheimish, S.; Cuesta-Infante, A.; Veeramachaneni, K. TadGAN: Time Series Anomaly Detection Using Generative Adversarial Networks. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10–13 December 2020; pp. 33–43.
34. Nguyen, T.D.; Marchal, S.; Miettinen, M.; Fereidooni, H.; Asokan, N.; Sadeghi, A.R. DÏoT: A Federated Self-learning Anomaly Detection System for IoT. In Proceedings of the 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS), Dallas, TX, USA, 7–10 July 2019; pp. 756–767.
35. Xu, J.; Glicksberg, B.S.; Su, C.; Walker, P.; Bian, J.; Wang, F. Federated Learning for Healthcare Informatics. J. Healthc. Inform. Res. 2021, 5, 1–19.
36. Liu, F.; Zeng, C.; Zhang, L.; Zhou, Y.; Mu, Q.; Zhang, Y.; Zhang, L.; Zhu, C. FedTADBench: Federated Time-Series Anomaly Detection Benchmark. In Proceedings of the 24th IEEE International Conference on High Performance Computing and Communications (HPCC), Hainan, China, 18–20 December 2022; pp. 303–310.
37. Zhang, K.; Jiang, Y.; Seversky, L.; Xu, C.; Liu, D.; Song, H. Federated Variational Learning for Anomaly Detection in Multivariate Time Series. In Proceedings of the IEEE International Performance, Computing, and Communications Conference (IPCCC), Austin, TX, USA, 29–31 October 2021; pp. 1–9.
38. de Oliveira, J.A.; Gonçalves, V.P.; Meneguette, R.I.; de Sousa, R.T.; Guidoni, D.L.; Oliveira, J.C.; Rocha Filho, G.P. F-NIDS – A Network Intrusion Detection System based on federated learning. Comput. Netw. 2023, 236, 110010.
39. Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A.C. Improved Training of Wasserstein GANs. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30, pp. 5767–5777.
40. Su, Y.; Zhao, Y.; Niu, C.; Liu, R.; Sun, W.; Pei, D. Robust Anomaly Detection for Multivariate Time Series through Stochastic Recurrent Neural Network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Anchorage, AK, USA, 4–8 August 2019; pp. 2828–2837.
41. Abdulaal, A.; Liu, Z.; Lancewicki, T. Practical Approach to Asynchronous Multivariate Time Series Anomaly Detection and Localization. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Singapore, 14–18 August 2021; pp. 2485–2494.
42. Audibert, J.; Michiardi, P.; Guyard, F.; Marti, S.; Zuluaga, M.A. USAD: UnSupervised Anomaly Detection on Multivariate Time Series. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Virtual Event, 6–10 July 2020; pp. 3395–3404.
43. Wu, H.; Xu, J.; Wang, J.; Long, M. Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting. arXiv 2021, arXiv:2106.13008.
44. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. Proc. AAAI Conf. Artif. Intell. 2021, 35, 11106–11115.
45. Zhou, T.; Ma, Z.; Wen, Q.; Wang, X.; Sun, L.; Jin, R. FEDformer: Frequency Enhanced Decomposed Transformer for Long-Term Series Forecasting. arXiv 2022, arXiv:2201.12740.
46. Xu, J.; Wu, H.; Wang, J.; Long, M. Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy. In Proceedings of the Tenth International Conference on Learning Representations (ICLR), Virtual Event, 25–29 April 2022.
47. Zhou, T.; Niu, P.; Wang, X.; Sun, L.; Jin, R. One Fits All: Power General Time Series Analysis by Pretrained LM. In Proceedings of the Advances in Neural Information Processing Systems 36 (NeurIPS 2023), New Orleans, LA, USA, 10–16 December 2023; pp. 43322–43355.
Figure 1. An illustration of heterogeneous anomaly types across clients in federated time series anomaly detection. Red circles highlight anomalous regions. Each client retains its local time series data and independently trains a local model, which contributes to the global model through federated aggregation.
Figure 2. The architecture of the proposed SW-TSAD framework. SW-TSAD stands for time series anomaly detection using the Sobolev–Wasserstein GAN (SWGAN).
Figure 4. The architecture of the proposed FedSW-TSAD framework. FedSW-TSAD stands for federated time series anomaly detection using Sobolev–Wasserstein GAN.
Figure 5. FLOPs and parameter analysis of the proposed model on the MSL dataset. The predictor dominates in terms of both computation and parameter count.
Figure 6. Visualization of runtime performance of the model across different GPU platforms on the MSL dataset.
Figure 7. Visualization of the impact of quantization and pruning on model performance and resource consumption, based on the MSL dataset. Quantization and pruning significantly reduce resource requirements.
Figure 8. Visualization of model performance as a function of α and β with fixed γ = 0.5. The plots reveal a clear upward trend in F1-score with larger α values, supporting the importance of emphasizing the discrimination loss component.
Figure 9. Impact of varying γ on F1-score while holding α : β = 7 : 3 constant. The curves indicate that the model reaches optimal stability and accuracy around γ = 0.50, with lower or higher values leading to slight performance degradation.
Figure 10. Visualization of gradient updates with differential privacy (DP) and without DP (w/o DP) in the ablation experiment.
Table 1. Dataset statistics.
Dataset | Train | Test | Dimensions | Anomaly Rate (%)
PSM | 132,481 | 26,497 | 25 | 28
MSL | 58,317 | 73,729 | 55 | 11
SMAP | 135,183 | 427,617 | 25 | 13
SMD | 708,405 | 708,420 | 38 | 4
Table 2. Main results comparing SW-TSAD with centralized baseline models. "Pre," "Rec," and "F1" denote precision, recall, and F1-score, respectively, expressed as percentages.
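Model | PSM Pre | PSM Rec | PSM F1 | SMAP Pre | SMAP Rec | SMAP F1 | MSL Pre | MSL Rec | MSL F1 | SMD Pre | SMD Rec | SMD F1
LOF | 15.49 | 100.00 | 26.83 | 2.05 | 100.00 | 4.01 | 11.52 | 100.00 | 20.67 | 7.90 | 100.00 | 14.65
iforest | 19.27 | 47.93 | 27.49 | 54.33 | 100.00 | 70.41 | 79.28 | 85.93 | 82.47 | 77.98 | 88.60 | 82.95
OmniAnomaly | 75.99 | 84.96 | 80.22 | 50.06 | 100.00 | 66.72 | 69.74 | 80.28 | 74.64 | 87.40 | 97.16 | 92.02
USAD | 100.00 | 20.91 | 34.58 | 28.37 | 100.00 | 44.20 | 76.77 | 92.82 | 84.04 | 67.95 | 91.15 | 77.86
MADGAN | 82.04 | 80.34 | 81.18 | 23.05 | 100.00 | 37.47 | 80.78 | 86.34 | 83.47 | 63.05 | 89.59 | 74.01
Autoformer | 99.94 | 79.06 | 88.28 | – | – | – | 76.93 | 76.50 | 76.71 | 78.45 | 65.10 | 71.15
Informer | 97.29 | 80.59 | 88.15 | – | – | – | 79.79 | 74.73 | 77.18 | 90.28 | 75.24 | 82.08
FEDformer | 99.98 | 81.69 | 89.91 | – | – | – | 90.61 | 69.02 | 78.35 | 76.78 | 59.72 | 67.19
AT | 95.70 | 95.34 | 95.52 | – | – | – | 69.14 | 86.48 | 76.85 | 90.34 | 82.34 | 86.16
FPT | 98.36 | 95.82 | 97.07 | – | – | – | 81.10 | 80.35 | 80.72 | 87.60 | 80.79 | 84.06
SW-TSAD (Centralized) | 97.99 | 97.91 | 97.95 | 67.80 | 100.00 | 80.81 | 88.53 | 88.18 | 88.35 | 94.10 | 92.91 | 93.50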
Table 3. Main results comparing FedSW-TSAD with federated baseline models.
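Model | PSM Pre | PSM Rec | PSM F1 | SMAP Pre | SMAP Rec | SMAP F1 | MSL Pre | MSL Rec | MSL F1 | SMD Pre | SMD Rec | SMD F1
Autoformer(fl) | 97.77 | 78.88 | 86.64 | – | – | – | 84.09 | 65.57 | 72.66 | 74.92 | 82.30 | 77.23
Informer(fl) | 77.98 | 59.58 | 64.11 | – | – | – | 80.34 | 67.90 | 72.12 | 77.44 | 91.18 | 83.08
FEDformer(fl) | 76.69 | 58.54 | 62.64 | – | – | – | 79.16 | 66.95 | 71.36 | 76.64 | 89.58 | 81.66
AT(fl) | 87.02 | 83.57 | 84.63 | – | – | – | 81.77 | 69.40 | 73.93 | 87.02 | 83.57 | 84.63
FPT(fl) | 84.93 | 80.08 | 81.49 | – | – | – | 70.90 | 73.25 | 71.85 | 84.93 | 80.08 | 81.49
DeepSVDD | 96.57 | 64.41 | 72.36 | – | – | – | 77.69 | 69.37 | 72.26 | 86.01 | 87.02 | 85.77
FedAnomaly | – | – | – | 67.39 | 44.58 | 53.66 | 58.08 | 58.57 | 58.33 | – | – | –
FedSW-TSAD (5 clients) | 98.15 | 97.68 | 97.91 | 66.56 | 100.00 | 79.92 | 87.14 | 85.84 | 86.49 | 93.73 | 92.61 | 93.17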
Table 4. Ablation experiments results.
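Model | PSM Pre | PSM Rec | PSM F1 | SMAP Pre | SMAP Rec | SMAP F1 | MSL Pre | MSL Rec | MSL F1 | SMD Pre | SMD Rec | SMD F1
FedSW-TSAD (5 clients) | 98.15 | 97.68 | 97.91 | 66.56 | 100.00 | 79.92 | 87.14 | 85.84 | 86.49 | 93.73 | 92.61 | 93.17
w/o TCN | 95.40 | 91.28 | 93.30 | 60.12 | 95.09 | 73.67 | 79.83 | 80.21 | 80.02 | 88.16 | 87.33 | 87.74
w/o DP | 98.59 | 98.04 | 98.31 | 71.30 | 100.00 | 83.24 | 92.13 | 86.52 | 89.24 | 94.08 | 98.37 | 96.18
w/o SW constraint | 93.73 | 94.30 | 94.01 | 63.92 | 100.00 | 77.99 | 85.37 | 83.87 | 84.61 | 92.12 | 91.17 | 91.64
w/o Prediction Score | 74.10 | 84.96 | 79.16 | 59.65 | 100.00 | 74.72 | 94.56 | 74.72 | 83.48 | 100.00 | 45.63 | 62.66
w/o SWGAN Score | 92.32 | 90.16 | 91.23 | 44.25 | 100.00 | 61.35 | 43.04 | 99.95 | 60.17 | 81.39 | 90.16 | 85.55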
Table 5. The number of parameters and FLOPs for each model component on the MSL dataset. The predictor accounts for most of the parameters and computational cost.
Component | Number of Parameters
Generator | 22,345
Discriminator | 13,215
Predictor | 189,596
Total | 225,156
Computation Stage | FLOPs (MFLOPs)
Forward pass | 100.8
Backward pass | 201.6
Total | 302.4
Table 6. The runtime performance of the model across different GPU platforms on the MSL dataset. The model maintains low memory usage and high execution efficiency.
Metric | RTX 3060 | RTX 3090 | RTX 4090
FLOPs per round | 302.4 MFLOPs | 302.4 MFLOPs | 302.4 MFLOPs
GPU time per round | 3.2 s | 1.5 s | 0.5 s
GPU memory usage | 2.1 GB | 2.1 GB | 2.1 GB
Table 7. Impact of quantization and pruning on model performance and resource consumption on the MSL dataset. The model remains effective even with significant parameter reduction.
Configuration | F1 Score (%) | GPU Memory (GB) | Parameters (k)
Full model (FP32) | 86.49 | 2.1 | 225
LSTM Quantized (FP16) | 83.95 | 1.7 | 180
LSTM Quantized + Pruned (FP16, 50%) | 82.71 | 1.1 | 113
Table 8. Federated learning performance with different numbers of simulated clients.
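Number of Clients | PSM Pre | PSM Rec | PSM F1 | SMAP Pre | SMAP Rec | SMAP F1 | MSL Pre | MSL Rec | MSL F1 | SMD Pre | SMD Rec | SMD F1
Centralized | 97.99 | 97.91 | 97.95 | 67.80 | 100.00 | 80.81 | 88.53 | 88.18 | 88.35 | 94.10 | 92.91 | 93.50
5 clients | 98.15 | 97.68 | 97.91 | 66.56 | 100.00 | 79.92 | 87.14 | 85.84 | 86.49 | 93.73 | 92.61 | 93.17
10 clients | 95.21 | 94.75 | 94.97 | 64.56 | 97.00 | 77.52 | 84.53 | 83.26 | 83.90 | 90.92 | 89.83 | 90.37
15 clients | 92.35 | 91.91 | 92.12 | 62.63 | 94.09 | 75.20 | 81.99 | 80.77 | 81.38 | 88.19 | 87.14 | 87.66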
Table 9. Performance under varying levels of client data noise on the SMD dataset.
Model | Clean Data F1 (%) | Noisy Data F1 (%) | ΔF1 (%)
FedSW-TSAD (10% noisy) | 93.17 | 89.25 | 3.92
FedSW-TSAD (15% noisy) | 93.17 | 86.31 | 6.86
FedSW-TSAD (20% noisy) | 93.17 | 80.08 | 13.09
Table 10. Performance under different client dropout rates on the SMD dataset.
Dropout rate | F1 (%) | Precision (%) | Recall (%)
0.00 (FedSW-TSAD) | 93.17 | 93.73 | 92.61
0.10 | 92.59 | 93.15 | 92.04
0.20 | 90.98 | 91.85 | 90.12
0.30 | 88.45 | 89.60 | 87.33
0.40 | 83.47 | 85.24 | 81.77
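Table 10 does not describe how client dropout is simulated or how the server aggregates a partial round. A minimal sketch, assuming FedAvg-style averaging weighted by each client's sample count and independent per-client dropout with probability equal to the dropout rate; the function and argument names are hypothetical.

```python
import random
from typing import Dict, List


def fedavg_with_dropout(client_updates: Dict[str, List[float]],
                        client_sizes: Dict[str, int],
                        dropout_rate: float,
                        rng: random.Random) -> List[float]:
    """One aggregation round: each client independently drops out with
    probability `dropout_rate`; survivors are averaged, weighted by data size."""
    survivors = [c for c in client_updates if rng.random() >= dropout_rate]
    if not survivors:
        raise RuntimeError("all clients dropped out in this round")
    total = sum(client_sizes[c] for c in survivors)
    dim = len(next(iter(client_updates.values())))
    aggregate = [0.0] * dim
    for c in survivors:
        weight = client_sizes[c] / total  # share of the surviving data
        for i, value in enumerate(client_updates[c]):
            aggregate[i] += weight * value
    return aggregate


updates = {"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [5.0, 6.0]}
sizes = {"a": 100, "b": 100, "c": 200}
full_round = fedavg_with_dropout(updates, sizes, 0.0, random.Random(0))
partial_round = fedavg_with_dropout(updates, sizes, 0.3, random.Random(1))
```

Under this assumption, higher dropout rates mean each round aggregates fewer and less representative clients, which is one plausible reading of the gradual F1 decline in Table 10.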
Table 11. F1-scores under varying α and β settings with fixed γ = 0.5 on four benchmark datasets. As α increases from 0.05 to 0.35 while β decreases (keeping α + β = 0.5), the F1-scores generally improve across most datasets; increasing α further lowers the F1-scores on PSM, SMAP, and MSL. The best F1-score for each dataset is highlighted in bold. Overall, the best configuration is observed at α = 0.35, β = 0.15.
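Fixed γ = 0.5
α | β | PSM Pre | PSM Rec | PSM F1 | SMAP Pre | SMAP Rec | SMAP F1 | MSL Pre | MSL Rec | MSL F1 | SMD Pre | SMD Rec | SMD F1
0.05 | 0.45 | 82.33 | 85.06 | 83.67 | 59.56 | 100.00 | 74.66 | 43.27 | 99.95 | 60.40 | 91.75 | 92.45 | 92.10
0.10 | 0.40 | 95.27 | 84.96 | 89.82 | 62.38 | 100.00 | 76.83 | 43.09 | 99.95 | 60.22 | 91.26 | 92.45 | 91.85
0.15 | 0.35 | 91.38 | 88.60 | 89.97 | 62.38 | 100.00 | 76.83 | 43.09 | 99.95 | 60.22 | 92.18 | 92.45 | 92.31
0.20 | 0.30 | 96.45 | 88.60 | 92.36 | 66.56 | 100.00 | 79.92 | 48.48 | 99.95 | 65.29 | 93.61 | 92.45 | 93.03
0.25 | 0.25 | 90.24 | 93.11 | 91.65 | 66.56 | 100.00 | 79.92 | 62.58 | 99.95 | 76.97 | 93.89 | 92.45 | 93.16
0.30 | 0.20 | 97.17 | 95.99 | 96.58 | 66.56 | 100.00 | 79.92 | 63.06 | 99.95 | 77.33 | 94.18 | 92.61 | 93.39
0.35 | 0.15 | 98.15 | 97.68 | 97.91 | 66.56 | 100.00 | 79.92 | 87.14 | 85.84 | 86.49 | 93.73 | 92.61 | 93.17
0.40 | 0.10 | 97.24 | 91.73 | 94.40 | 58.85 | 100.00 | 74.10 | 54.65 | 99.95 | 70.66 | 93.04 | 93.47 | 93.25
0.45 | 0.05 | 89.53 | 94.07 | 91.74 | 47.39 | 100.00 | 64.31 | 43.13 | 99.95 | 60.26 | 94.05 | 92.14 | 93.09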
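Table 11 sweeps α and β while holding γ fixed, but this section does not restate which score component each weight multiplies. As a minimal sketch, assuming the three weights scale three anomaly-score components in a linear combination and sum to 1 (consistent with the constraint α + β = 0.5 at γ = 0.5), the fused score and the tabulated (α, β) grid can be written as follows. The component names `s1`, `s2`, `s3` and both functions are hypothetical.

```python
from typing import List, Tuple


def combined_score(s1: float, s2: float, s3: float,
                   alpha: float, beta: float, gamma: float) -> float:
    """Hypothetical linear fusion of three anomaly-score components."""
    return alpha * s1 + beta * s2 + gamma * s3


def alpha_beta_grid(gamma: float = 0.5, step: float = 0.05) -> List[Tuple[float, float]]:
    """Enumerate (alpha, beta) pairs with alpha + beta = 1 - gamma and both > 0."""
    budget = 1.0 - gamma        # weight left for alpha and beta
    n = round(budget / step)    # number of step-sized slices of that budget
    return [(round(k * step, 2), round(budget - k * step, 2)) for k in range(1, n)]


grid = alpha_beta_grid()        # the nine (alpha, beta) rows of Table 11
score = combined_score(0.2, 0.4, 0.9, *grid[6], gamma=0.5)  # grid[6] is (0.35, 0.15)
```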
Table 12. F1-scores under varying γ values with fixed α : β = 7 : 3 on four benchmark datasets. The best F1-score for each dataset is highlighted in bold. Increasing γ up to 0.50 generally improves performance, with γ = 0.50 yielding the most stable and accurate results across datasets.
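Fixed α : β = 7 : 3
α | β | γ | PSM Pre | PSM Rec | PSM F1 | SMAP Pre | SMAP Rec | SMAP F1 | MSL Pre | MSL Rec | MSL F1 | SMD Pre | SMD Rec | SMD F1
0.58 | 0.25 | 0.17 | 72.33 | 87.79 | 79.31 | 60.96 | 100.00 | 75.75 | 61.17 | 94.30 | 74.21 | 86.48 | 91.54 | 88.94
0.50 | 0.21 | 0.29 | 85.27 | 82.99 | 84.11 | 63.99 | 100.00 | 78.04 | 54.23 | 99.95 | 70.32 | 87.58 | 92.61 | 90.02
0.44 | 0.19 | 0.38 | 91.38 | 84.44 | 87.77 | 66.34 | 100.00 | 79.76 | 65.42 | 85.84 | 74.25 | 89.56 | 92.61 | 91.06
0.39 | 0.17 | 0.44 | 98.73 | 93.11 | 95.84 | 66.23 | 100.00 | 79.68 | 53.33 | 99.95 | 69.55 | 92.39 | 92.61 | 92.50
0.35 | 0.15 | 0.50 | 98.15 | 97.68 | 97.91 | 66.56 | 100.00 | 79.92 | 87.14 | 85.84 | 86.49 | 93.73 | 92.61 | 93.17
0.32 | 0.14 | 0.55 | 91.97 | 98.59 | 95.17 | 67.88 | 100.00 | 80.87 | 52.74 | 99.95 | 69.04 | 92.44 | 93.21 | 92.82
0.29 | 0.12 | 0.58 | 94.68 | 93.60 | 94.14 | 66.99 | 100.00 | 80.23 | 63.81 | 99.95 | 77.89 | 91.75 | 93.21 | 92.47
0.27 | 0.12 | 0.62 | 92.64 | 93.60 | 93.12 | 63.57 | 100.00 | 77.73 | 52.17 | 99.95 | 68.56 | 90.43 | 93.47 | 91.92
0.25 | 0.11 | 0.64 | 90.49 | 93.60 | 92.02 | 62.55 | 100.00 | 76.96 | 63.19 | 99.95 | 77.43 | 91.76 | 91.88 | 91.82
0.23 | 0.10 | 0.67 | 87.17 | 88.05 | 87.61 | 57.38 | 100.00 | 72.92 | 71.02 | 94.30 | 81.02 | 91.26 | 91.88 | 91.57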
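Table 12 keeps α : β = 7 : 3 while varying γ. Assuming, as the tabulated rows suggest, that the three weights sum to 1, α and β follow directly from γ: α = 0.7(1 − γ) and β = 0.3(1 − γ). Values computed this way agree with every tabulated α and β to within 0.01 (for example, γ = 0.38 gives α ≈ 0.434, listed as 0.44). The function name below is hypothetical.

```python
from typing import Tuple


def weights_from_gamma(gamma: float, ratio: Tuple[int, int] = (7, 3)) -> Tuple[float, float]:
    """Split the remaining weight 1 - gamma between alpha and beta in `ratio`.
    Assumes alpha + beta + gamma = 1 (inferred from the table, not stated)."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    a, b = ratio
    rest = 1.0 - gamma
    return rest * a / (a + b), rest * b / (a + b)


for gamma in (0.17, 0.38, 0.50, 0.67):
    alpha, beta = weights_from_gamma(gamma)
    print(f"gamma = {gamma:.2f} -> alpha = {alpha:.3f}, beta = {beta:.3f}")
```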
Table 13. F1-scores of FedSW-TSAD under varying privacy budgets ϵ, evaluated on four benchmark datasets. A larger ϵ implies weaker privacy but higher utility. The baseline ϵ = ∞ represents the non-private setting.
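Privacy budget ϵ | PSM Pre | PSM Rec | PSM F1 | SMAP Pre | SMAP Rec | SMAP F1 | MSL Pre | MSL Rec | MSL F1 | SMD Pre | SMD Rec | SMD F1 | Avg. F1
ϵ = 1 | 93.72 | 95.14 | 94.42 | 60.65 | 100.00 | 75.51 | 82.03 | 85.55 | 83.75 | 91.11 | 87.90 | 89.48 | 85.79
ϵ = 2 | 98.15 | 97.68 | 97.91 | 66.56 | 100.00 | 79.92 | 87.14 | 85.84 | 86.49 | 93.73 | 92.61 | 93.17 | 89.37
ϵ = 3 | 98.15 | 97.88 | 98.01 | 68.31 | 100.00 | 81.17 | 88.89 | 86.23 | 87.54 | 94.00 | 94.42 | 94.21 | 90.23
ϵ = 4 | 98.35 | 98.00 | 98.17 | 70.98 | 100.00 | 83.03 | 90.47 | 86.23 | 88.30 | 94.00 | 97.07 | 95.51 | 91.25
ϵ = ∞ | 98.50 | 98.00 | 98.23 | 71.30 | 100.00 | 83.24 | 92.00 | 86.50 | 89.16 | 94.00 | 98.37 | 96.13 | 91.69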
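This section reports utility at several values of ϵ but not the mechanism's internals. The abstract describes L2-norm-constrained noise injection on model updates; one common way to realise that is to clip each client update to a fixed L2 norm and add Gaussian noise. The sketch below assumes exactly that, using the classical Gaussian-mechanism calibration σ = Δ·sqrt(2 ln(1.25/δ))/ϵ with the L2 sensitivity Δ taken to be the clipping norm. `clip_norm`, `delta`, and all function names are hypothetical, since this section does not give them.

```python
import math
import random
from typing import List


def clip_l2(update: List[float], clip_norm: float) -> List[float]:
    """Rescale `update` so that its L2 norm is at most `clip_norm`."""
    norm = math.sqrt(sum(v * v for v in update))
    if norm <= clip_norm:
        return list(update)
    scale = clip_norm / norm
    return [v * scale for v in update]


def gaussian_sigma(epsilon: float, delta: float, sensitivity: float) -> float:
    """Classical Gaussian-mechanism noise scale. The guarantee is proven only
    for 0 < epsilon < 1; for larger budgets such as those in Table 13 it is a
    heuristic, and a tighter privacy accountant would normally be used."""
    if epsilon <= 0 or not 0 < delta < 1:
        raise ValueError("need epsilon > 0 and 0 < delta < 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def privatize_update(update: List[float], clip_norm: float, epsilon: float,
                     delta: float, rng: random.Random) -> List[float]:
    """Clip to L2 norm `clip_norm`, then add i.i.d. Gaussian noise per coordinate.
    Assumes the L2 sensitivity equals `clip_norm` (this depends on the adjacency notion)."""
    clipped = clip_l2(update, clip_norm)
    sigma = gaussian_sigma(epsilon, delta, sensitivity=clip_norm)
    return [v + rng.gauss(0.0, sigma) for v in clipped]


rng = random.Random(42)
noisy_update = privatize_update([3.0, 4.0], clip_norm=1.0, epsilon=2.0, delta=1e-5, rng=rng)
```

Because σ scales as 1/ϵ in this calibration, doubling ϵ halves the noise standard deviation, which is in line with the higher F1-scores observed at larger ϵ in Table 13.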
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhang, X.; Zhao, H.; Zhang, W.; Cao, S.; Sun, H.; Zhang, B. FedSW-TSAD: SWGAN-Based Federated Time Series Anomaly Detection. Sensors 2025, 25, 4014. https://doi.org/10.3390/s25134014

AMA Style

Zhang X, Zhao H, Zhang W, Cao S, Sun H, Zhang B. FedSW-TSAD: SWGAN-Based Federated Time Series Anomaly Detection. Sensors. 2025; 25(13):4014. https://doi.org/10.3390/s25134014

Chicago/Turabian Style

Zhang, Xiuxian, Hongwei Zhao, Weishan Zhang, Shaohua Cao, Haoyun Sun, and Baoyu Zhang. 2025. "FedSW-TSAD: SWGAN-Based Federated Time Series Anomaly Detection" Sensors 25, no. 13: 4014. https://doi.org/10.3390/s25134014

APA Style

Zhang, X., Zhao, H., Zhang, W., Cao, S., Sun, H., & Zhang, B. (2025). FedSW-TSAD: SWGAN-Based Federated Time Series Anomaly Detection. Sensors, 25(13), 4014. https://doi.org/10.3390/s25134014

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
