Negative Feedback Matters: Exploring Positive and Negative Correlations for Time Series Anomaly Detection

Jin, Yixuan; Liu, Xueting; Hu, Bing; Walker, Joojo; Wang, Ke; Wu, Wei; Zhong, Ting

doi:10.3390/electronics14102068

Open AccessArticle

Negative Feedback Matters: Exploring Positive and Negative Correlations for Time Series Anomaly Detection

by

Yixuan Jin

¹,

Xueting Liu

¹

,

Bing Hu

^2,*,

Joojo Walker

³

,

Ke Wang

⁴,

Wei Wu

⁵ and

Ting Zhong

¹

School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China

²

Zhengzhou Zhongyuan Environmental Protection Co., Ltd., Zhengzhou 450003, China

³

College of International Education, Chengdu University of Technology, Chengdu 610059, China

⁴

College of Electronics and information Engineering, Sichuan University, Chengdu 610065, China

⁵

Zhengzhou Aiwen Tech Co., Ltd., Zhengzhou 450047, China

^*

Author to whom correspondence should be addressed.

Electronics 2025, 14(10), 2068; https://doi.org/10.3390/electronics14102068

Submission received: 18 April 2025 / Revised: 9 May 2025 / Accepted: 10 May 2025 / Published: 20 May 2025

Download

Browse Figures

Versions Notes

Abstract

Recently, graph neural networks (GNNs) have demonstrated remarkable success in multivariable time series anomaly detection, particularly by explicitly modeling inter-variable relationships. However, to prevent the distinct pattern of one variable from introducing noise to unrelated variables, existing methods focus solely on leveraging positive correlations among neighbors for relationship modeling while neglecting the role of negative correlations. This limitation hinders their effectiveness in complex scenarios where both positive and negative dependencies are critical. To address this challenge, we propose PNGDN, a novel GNN framework that incorporates both positive and negative correlations to enhance anomaly-detection performance. Notably, PNGDN introduces a correlational graph structure learning module that simultaneously captures positive and negative dependencies. It quantitatively filters out spurious relationships based on the value of similarity, which serves as a unified threshold to screen both positive and negative correlations, allowing the model to focus on truly meaningful correlations among variables. Additionally, an attention-based information propagation mechanism ensures the efficient propagation of information under positive and negative correlations, facilitating accurate predictions for each variable. Extensive experiments on three benchmark time series anomaly detection datasets validate the superior performance of PNGDN.

Keywords:

time series; anomaly detection; graph neural networks; GNNs; negative correlation graph

1. Introduction

In multivariate time series data, malfunctions or attacks may result in data that significantly deviate from the normal regime, commonly referred to as anomalies. Discovering and flagging these deviations is often defined as multivariate time series (MTS) anomaly detection. Early and accurate detection of anomalies in multivariate time series data is critical for preventing operational disruptions and minimizing economic losses, including ensuring the safety of large-scale systems [1], detecting cyber-security intrusions [2], and preventing credit card fraud [3]. However, anomalies are inherently challenging to identify due to two key factors. First, they stem from rare events, making them sparsely represented in datasets and difficult to annotate comprehensively. Second, defining the entire spectrum of potential anomalous events is typically unfeasible, undermining the validity of supervised learning methods [4]. Consequently, unsupervised methods, which do not rely on labeled data, have emerged as the preferred approach for detecting MTS anomalies.

Traditional unsupervised techniques, such as linear-based methods [5], distance-based methods [6], density-based methods [7], classification-based methods [8,9], and distribution-based methods [10], often fail to incorporate temporal dependencies and have difficulty with high-dimensional variables, limiting their real-world applicability. Recent advancements in deep learning have introduced reconstruction-based techniques [11,12] and prediction-based methods [13,14], which leverage errors as indicators of anomalies. However, these methods frequently overlook explicit inter-variable relationships, hindering their ability to fully exploit inherent MTS correlations. Methods using graph neural networks (GNNs), such as the graph deviation network (GDN) [15] and graph learning with Transformer for anomaly detection (GTA) [16], address these limitations by capturing both temporal and spatial dependencies between variables, offering improved insights into their behaviors and interactions [17].

Nevertheless, after carefully analyzing the dataset characteristics in existing studies, we highlight the following challenges in the current application of GNNs for MTS: (1) Inability to handle relational noise: A subset of methods does not filter out relational noise but treats all relationships as a single correlation graph, which hinders the accurate modeling of inter-variable dependencies. (2) Focus on positive correlations only: In MTS, unique patterns in one variable may influence unrelated variables, thereby increasing computational cost. Hence, most existing approaches focus solely on modeling positive correlations between variables, disregarding the critical role of negative correlations. As shown in Figure 1, by excluding negative correlations, existing GNN-based methods miss important relational features, leading to lower prediction errors during anomalies, reducing anomaly detectability and weakening their effectiveness in complex scenarios.

To tackle the aforementioned challenges, we draw inspiration from CrossGNN, an early attempt to use signed variable correlation graphs in MTS forecasting, and propose the Positive and Negative correlation Graph Deviation Network (PNGDN), which decomposes inter-variable relationships into positive and negative correlation dependencies. These dependencies are modeled separately to capture their respective impacts on the time series, making the method well-suited for anomaly detection in complex scenarios. To better model positive and negative correlations and eliminate relational noise from unrelated variables, the method first utilizes a variable embedding module and an inter-variable correlation graph learning module to extract variable features and construct positive and negative correlation graphs with strong relationships. To address the differences in information propagation across positive and negative neighbors, we further apply an attention-based mechanism to each graph to generate the predicted values for each variable. This design amplifies prediction errors during anomalies and enhances interpretability. Finally, the anomaly detector uses prediction errors to identify anomalies in MTS. Experimental results demonstrate that PNGDN achieves superior performance on real-world datasets. Overall, our contributions are summarized as follows:

Motivated by the observed significance of both positive and negative correlations in time series data, we introduce PNGDN, a novel framework that models these inter-variable relationships comprehensively, mitigating relational noise and enhancing anomaly detection accuracy. This approach addresses the critical limitations of prior methods that rely solely on positive correlations, enabling a more accurate representation of dynamic dependencies.
We design a tailored message-passing strategy based on attention mechanisms, explicitly separating the effects of positive and negative correlations. This innovative approach amplifies prediction errors, significantly enhancing the model’s ability to distinguish anomalies.
Extensive evaluations on three real-world public datasets demonstrate that PNGDN consistently outperforms baseline models in anomaly detection. Furthermore, ablation studies and interpretability analyses further validate its effectiveness and practical applicability.
PNGDN shows strong practical potential for real-world scenarios requiring timely anomaly detection. In cyber–physical systems (CPSs), such as industrial automation and intelligent transportation, it captures complex sensor dependencies to enable accurate detection. Similarly, in environmental monitoring, it detects air quality anomalies by modeling correlations like those between PM2.5 and $O_{3}$ or wind speed [18,19].

2. Related Works

2.1. Anomaly Detection for Unsupervised Multivariate Time Series

Unsupervised MTS anomaly detection seeks to identify data points that deviate from the expected patterns of the data-generating process [20]. Traditional unsupervised MTS anomaly detection methods, including linear-based, distance-based, density-based, classification-based, and distribution-based approaches, have foundational value but face critical limitations. Linear-based methods, such as principal component analysis (PCA) [5], detect anomalies by projecting data onto orthogonal principal components and measuring deviations in the reduced-dimensional space. Distance-based methods, like K nearest neighbors (KNNs) [6], label anomalies by calculating the sum of distances to the k nearest neighbors and identifying points with the largest distance scores. Density-based methods, such as the local outlier factor (LOF) [7], identify anomalies by clustering data and flagging points in low-density clusters. Classification-based approaches, such as one-class support vector machines (OC-SVMs) [8] and support vector data description (SVDD) [9], define normality regions and score anomalies based on their distance from these boundaries. Distribution-based methods, such as the isolation distributional kernel (IDK) [10], rely on distributional similarity to isolate anomalies.

The emergence of deep learning has driven advancements in unsupervised MTS anomaly detection, with notable contributions in reconstruction-based models and prediction-based methods. Reconstruction-based models, including anomaly Transformer [1], autoencoders (AEs) [11], long short-term memory-based variational autoencoders (LSTM-VAEs) [12], and multivariate anomaly detection with generative adversarial networks (MAD-GANs) [21], excel at learning normal data patterns. Their reconstruction errors increase during anomalies, providing a reliable detection signal. Prediction-based methods, such as long short-term memory with non-parametric dynamic thresholds (LSTM-NDTs) [13], utilize forecasting models to predict normal data behavior. Discrepancies between predicted and actual values during anomalies yield larger prediction errors, effectively highlighting deviations. While, in real-time monitoring scenarios, frequent access to historical data is costly, and labeled anomalies are often scarce. To tackle distribution shifts under such constraints, When Model Meets New Normals (M2N2) [14] explores a test-time adaptation strategy that adjust models on-the-fly during inference to enhance model robustness.

It is crucial to highlight that the aforementioned traditional techniques fail to incorporate temporal dependencies and suffer significant performance degradation as data dimensionality increases, limiting their real-world applicability [22]. While deep learning methods advance anomaly detection, they often fall short in modeling thorough inter-variable relationships and mitigating the relational noise for complex MTS scenarios. This underscores the need for more sophisticated approaches like our proposed PNGDN model that not only address these gaps but also ensure robustness under real-time conditions.

2.2. Graph Neural Networks in Time Series Anomaly Detection

GNNs have emerged as powerful tools for capturing complex relationships among variables in time series, offering a robust foundation for real-world MTS modeling [23]. GNNs map variable dependencies to graph structures, thereby enabling effective information propagation and feature representation. Typically, GNN-based models assume that a variable’s state is influenced by its neighbors, leveraging graph convolutional networks (GCNs) [24] or graph attention networks (GATs) [25] to aggregate information. These methods explicitly model inter-variable relationships, advancing tasks such as time series prediction, anomaly detection, classification, and imputation.

To address the intricate dependencies in real-world scenarios, recent studies combine GNNs with temporal modeling frameworks to jointly capture spatial and temporal correlations. Early works primarily focused on forecasting [26], but recent advances highlight GNNs’ potential in anomaly detection. For instance, the GDN model [15] constructs graphs based on positively correlated variables, using attention mechanisms to model dependencies, and detect anomalies via prediction errors. Similarly, the GTA model [16] learns variable relationships using graph structures, employs Transformer-based temporal modeling, and detects anomalies via reconstruction errors. Additionally, CrossGNN [27] takes an early step in modeling variable homogeneity and heterogeneity via signed graphs for forecasting.

However, prior GNN-based approaches for MTS anomaly detection overlook this critical aspect: the role of negative correlations in variable dependencies. By focusing exclusively on positive correlations, these methods fail to fully capture the complexity of real-world scenarios, reducing their effectiveness. PNGDN addresses this limitation by constructing both positive and negative correlation graphs and integrating distinct information propagation mechanisms for each graph. This dual-correlation approach not only models thorough inter-variable relationships but also mitigates relational noise introduced by irrelevant variable patterns, significantly enhancing anomaly detection in complex MTS environments.

3. Methods

3.1. Problem Definition

In our work, MTS is defined as

X = {x^{(1)}, x^{(2)}, \dots, x^{(N)}}

and

x^{(i)} = {x_{1}^{(i)}, x_{2}^{(i)}, \dots, x_{T}^{(i)}}

, which denotes data collected from N variates over T time steps. At time step t, we use a sliding window of size L and stride S over historical MTS data to define the model input as

X_{t}^{in} \in R^{N \times L}

and the target output

{\hat{x}}_{t} \in R^{N}

, where

{\hat{x}}_{t}

and

x_{t}

represent the predictions and observations for all variables at time step t separately. Concretely, the input

X_{t}^{in}

can be expressed as defined in (1):

X_{t}^{in} = [x_{t - L}, x_{t - L + 1}, \dots, x_{t - 1}]

(1)

According to the standard unsupervised MTS anomaly detection framework, the training data are assumed to remain in normal states throughout, and the task objective is to determine whether a given

x_{t}

is an anomaly or not.

3.2. Overall Structure

The core idea of our proposed PNGDN model is to effectively leverage the typical positive and negative correlations among time series variables, making the model better suited for complex real-world scenarios. The detailed structure of PNGDN is illustrated in Figure 2. Specifically, it begins with a variable embedding module to capture each variable’s behaviors and characteristics. Next, the inter-variable correlation graph structure learning module picks strong positive and negative correlations and eliminates relational noise from irrelevant variables, constructing the inter-variable correlation graphs. The attention-based information propagation and data forecasting module is then employed, leveraging graph attention weights to capture the diverse influence of neighboring variables for variable representation and forecasting. Finally, the anomaly detector identifies anomalies by calculating the absolute error between predicted and actual values, thereby identifying the time and variables associated with the anomalies. This approach can amplify prediction errors under anomalous conditions, further enhancing anomaly-detection performance.

3.3. Variable Embedding

In the MTS task, most data originate from different devices that vary simultaneously over time. These variables may exhibit unique but interrelated characteristics. For instance, data measured from identical components of two similar devices tend to exhibit significant positive correlations, while data from components with opposite functions in the same device often show strong negative correlations. To capture these relationships, we introduce a multidimensional embedding vector of size h for each variable to represent its distinct behaviors and characteristics. These variable embeddings, denoted as

{vec}_{i} \in R^{h}, i \in 1, 2, \dots, N

, are initialized randomly and trained alongside the rest of the model using multivariate data within each sliding window. Through training, the embeddings

{vec}_{i}

serve as part of the GNN parameters, enabling the model to better capture variable-specific properties that are otherwise difficult to model through shared GNN weights alone. High absolute values of cosine similarity between embeddings indicate strong variable correlations.

In this model, variable embeddings serve two main purposes: (1) To determine typical dependencies and relational noise among variables and then form the inter-variable correlation graph. (2) To provide variable-specific information to enhance attention-based information propagation and data forecasting, helping the model to distinguish heterogeneous contributions from different neighbors during prediction.

3.4. Inter-Variable Correlation Graph Structure Learning

Most existing studies either focus solely on positive correlations among variables or fail to effectively filter out relational noise in MTS anomaly detection. As a consequence, the deviation between the predicted and actual values during anomalous intervals becomes insufficient, making anomalies harder to distinguish and ultimately degrading detection performance. To address these limitations, this module learns both positive and negative correlations among variables and generates the inter-variable correlation graph

G

. We represent this graph as

G = (V, E)

, where

V = {v_{1}, v_{2}, \dots \dots, v_{N}}

denotes the set of variable nodes,

v_{i}

is defined as the i-th variable node, and N is the total number of variables. The edge set refers to

E

, where

E_{j, i}^{+}

and

E_{j, i}^{-}

denote the edge set with positive correlations and negative correlations, respectively, for variable

v_{i}

.

In the process of developing the graph

G

, we firstly define the candidate correlation set

P_{i}

for variable

v_{i}

as all variables excluding itself, i.e.,

P_{i} = \{v_{1}, v_{2}, \dots \dots, v_{N}\} ∖ {v_{i}}

. To determine the dependencies for variable

v_{i}

, the similarity score

e_{j, i}

is computed between the embedding of node i and the embeddings of each node in

P_{i}

, which can be formulated as

e_{j, i} = \frac{{vec}_{i}^{⊤} {vec}_{j}}{∥ {vec}_{i} ∥ \cdot ∥ {vec}_{j} ∥} for j \in P_{i}

(2)

Then, we choose nodes with

K^{+}

maximum similarity scores as positively correlated neighbors, forming positive correlations, and nodes with

K^{-}

minimum similarity scores as negatively correlated neighbors, forming negative correlations. For variable

v_{i}

, its node sets can be represented as

V^{+} (v_{i})

for positively correlated neighbors and

V^{-} (v_{i})

for negatively correlated neighbors. The values of

K^{+}

and

K^{-}

are determined based on the dataset characteristics and are used to select variable pairs with sufficiently large absolute similarity, ensuring that both strong positive and strong negative correlations are retained while filtering out noisy or irrelevant relationships. This process is formally represented as

E_{j, i}^{+} = 1 \{j \in V^{+} (v_{i}) ({e_{k, i} : k \in P_{i}})\}

(3)

E_{j, i}^{-} = 1 \{j \in V^{-} (v_{i}) ({e_{k, i} : k \in P_{i}})\}

(4)

where

E_{j, i}^{+}

denotes the directed edges from node i to node j, which models the positive correlations from variable

v_{i}

to neighboring variable

v_{j}

as a directed edge. Similarly,

E_{j, i}^{-}

represents the directed edges capturing negative correlations from variable

v_{i}

to neighboring variable

v_{j}

.

3.5. Attention-Based Information Propagation and Data Forecasting

In this paper, we adopt a prediction-based method for anomaly detection in MTS data, assuming low prediction errors during normal periods and higher errors during anomalies. To enhance prediction accuracy and amplify errors for anomalies, we employ an attention-based mechanism to propagate and aggregate positive and negative correlation information based on the inter-variable correlation graph

G

. The resulting representation is then used to predict future data at time step t. The detailed process is described below.

Since the processes of information aggregation and propagation for positive and negative correlations follow the same logic, we take the case of positively correlated neighbors as an example for detailed explanation. We leverage the aforementioned attention-based mechanism as the feature extractor. Considering each variable’s unique properties, we concatenate the variable embedding

{vec}_{i}

with the appropriately transformed historical data

X_{t}^{in}

to combine temporal and spatial features simultaneously, which is outlined as follows:

q_{i} = {vec}_{i} \oplus W x^{(i)}

(5)

where ⊕ represents concatenation and the subscript t is disregarded. Then, the resulting combination

q_{i}

is processed together with the learning coefficient

a

for the attention mechanism, and the attention coefficients are computed by

LeakyReLU

. These coefficients, denoted as

α_{i, j}

are then normalized using the

softmax

function. The entire process can be represented as

α_{i, j}^{+} = softmax {LeakyReLU (a^{⊤} (q_{i} \oplus q_{j}))}

(6)

After obtaining the attention coefficients

α_{i, j}^{+}

, we can obtain the positive aggregated representation

z_{i}^{+}

for positively correlated neighbors. This can be expressed in the following form:

\begin{matrix} z_{i}^{+} = M P N N (x_{t}^{(i)}, {vec}_{i}, α_{i, j}^{+}, E_{j, i}^{+}) = R e L U (α_{i, i} W x^{(i)} + \sum_{j \in V^{+} (v_{i})} α_{i, j}^{+} W x^{(j)}) \end{matrix}

(7)

where

MPNN

denotes the message-passing neural network for positively correlated neighbors of variable node

v_{i}

.

Hence, for each variable node

v_{i}

,

Z_{i}

is defined as the aggregated representation of information for variable node

v_{i}

. The corresponding mathematical expressions are

\begin{matrix} {z_{i}}^{-} = M P N N (x_{t}^{(i)}, {vec}_{i}, α_{i, j}^{-}, E_{j, i}^{-}) = R e L U (α_{i, i} W x^{(i)} + \sum_{j \in V^{-} (v_{i})} α_{i, i}^{-} W x^{(j)}) \end{matrix}

(8)

Z_{i} = {z_{i}}^{+} + {z_{i}}^{-}

(9)

where

z_{i}^{-}

depicts the negative aggregated representation for variable node

v_{i}

.

Subsequently, we acquire the representations of all variables at time step t, denoted as

{Z_{1}, Z_{2}, \dots, Z_{N}}

, where the subscript t is omitted for simplicity. However,

Z_{i}

does not explicitly differentiate the inherent properties of individual variables. To bridge the gap between variable representations and the intrinsic characteristics of each variable, we incorporate variable-specific embeddings

{vec}_{i}

into the forecasting process. This allows the network to emphasize or suppress different dimensions in

Z_{i}

according to each variable’s unique preferences, effectively acting as a soft gating mechanism.

As a result, for each

Z_{i}

, the corresponding variable embedding

{vec}_{i}

is element-wise multiplied with

Z_{i}

(denoted ∘), and the result serves as input to stacked fully connected layers with input dimension h and output dimension N. This layer predicts the values of all variables at time t, represented as

{\hat{x}}_{t}

{\hat{x}}_{t} = f_{θ} ([{vec}_{1} \circ Z_{1}, {vec}_{2} \circ Z_{2}, \dots, {vec}_{N} \circ Z_{N}])

(10)

Finally, we define the mean squared error between the predicted output

{\hat{x}}_{t}

and the observed output

x_{t}

for each variable as the loss function, which is formulated as follows:

L_{M S E} = \frac{1}{T - L} \sum_{t = L + 1} {||{\hat{x}}_{t} - x_{t}||}_{2}^{2}

(11)

3.6. Anomaly Detector

For better performance in detecting and interpreting anomalies, we compute individual anomaly scores for each variable and aggregate them into a single anomaly score for each time step, enabling a clearer identification of which variables exhibit anomalies. The anomaly score is determined by comparing the predicted data with the observed true data at time t, calculating the absolute error for variable

v_{i}

as

{Dev}_{t}^{(i)}

. Furthermore, given the varying characteristics of different variables, their anomaly deviations may differ in magnitude. To prevent any single variable’s anomaly deviation from overshadowing others, we normalize the predicted values for each variable. This process is formalized as

{Dev}_{t}^{(i)} = |x_{t}^{(i)} - {\hat{x}}_{t}^{(i)}|

(12)

d_{i} (t) = \frac{{Dev}_{t}^{(i)} - {\bar{μ}}_{i}}{{\bar{σ}}_{i}}

(13)

where

{\bar{μ}}_{i}

and

{\bar{σ}}_{i}

are computed as the median and inter-quartile range of

{Dev}_{t}^{(i)}

over the temporal dimension, as these metrics are more robust to anomalies than mean and standard deviation.

To calculate the overall anomaly score

A_{t}

at time t, our model combines the deviations of all variables through a maximum function, effectively emphasizing the variables displaying anomalies. Also, a simple moving average [13] is applied to

A_{t}

to smooth the score and mitigate the impact of sharp spikes in normal data.

At last, if

A_{t}

at time step t surpasses the anomaly threshold, this time step is identified as anomalous. Although techniques like extreme value theory [28] can be employed to set the threshold, an adaptive threshold is used for simplicity. It is set as the maximum of

A_{t}

in different validation datasets, avoiding the introduction of additional hyperparameters, and aligns better with the real-time, unsupervised nature of MTS anomaly detection tasks.

4. Experiments

4.1. Datasets

We conducted experiments on three real-world publicly available datasets for MTS anomaly detection: Secure Water Treatment (SWaT) [29], Water Distribution (WADI) [30], and Server Machine Dataset (SMD) [31].

The SWaT and WADI datasets are widely used benchmarks in cyber–physical system (CPS) security research; both are provided by operational testbeds simulating real-world water infrastructures. The SWaT dataset originates from a six-stage water treatment system managed by the Singapore Public Utilities Board, incorporating PLCs, EtherNet/IP, and CIP protocols. It provides data collected from 51 sensors, supporting anomaly detection and attack validation studies. In contrast, the WADI dataset is collected from a three-stage water distribution system equipped with extensive pipeline networks and uses the Modbus/TCP protocol. It offers a larger and more complex dataset with 127 sensors, and presents greater modeling challenges due to its higher dimensionality. However, as both systems require 5–6 h to stabilize [32], we excluded the initial 21,600 unstable data points generated during this period.

The SMD is from a large Internet company, comprising multivariate time series data monitored from 28 server machines over five weeks. Each observation includes 38 metrics such as CPU load, network usage, and memory usage, sampled at one-minute intervals. Designed for entity-level anomaly detection, the SMD captures complex operational behaviors of server machines under real-world conditions, providing a benchmark for evaluating robustness in industrial monitoring scenarios. Given that data from the 28 servers were collected simultaneously, we opted to train and test models for each server individually. The performance metrics are aggregated through averaging validation results across all servers, enabling the selection of the optimal model. This approach ensures robustness and generalizability by tailoring the anomaly detection framework to the operational dynamics of distinct server environments while maintaining statistical reliability.

The statistics of the three datasets are summarized in Table 1. Due to the high frequency of the original sampling points in the three datasets resulting in large data volumes, we applied downsampling to reduce training time. Concretely, measurements were aggregated at 10-second intervals, with the median value taken as the processed data point. Similarly, the label for each interval was determined as the most frequent label within the 10-second window.

4.2. Baselines

We compare our model, PNGDN, with different baselines representing the main categories of anomaly detection methods: (1) linear-based methods (e.g., PCA [5]), which project data onto principal components and identify anomalies through deviations in lower-dimensional subspaces; (2) distance-based methods (e.g., KNN [6]), which calculate distances to the k nearest neighbors and detect anomalies via aggregated proximity scores; (3) ensemble-based methods (e.g., FB [33]), which train multiple detectors on random feature subsets to enhance diversity and detect anomalies by aggregating their outlier scores; (4) density-based methods (e.g., DAGMM [34]), which evaluate the density of time series data to identify anomalies; (5) reconstruction-based methods (e.g., AE [11], LSTM-VAE [12], MAD-GAN [21]), which encode subsequences of normal training time series data in latent space to model normal behavior and detect anomalies based on reconstruction errors; (6) prediction-based models (e.g., M2N2 [14] and GDN [15]), which learn predictive models to forecast future variable values from the current context window and identify anomalies using prediction errors.

4.3. Experiment Setup

Our model is implemented using PyTorch version 1.12.1 with CUDA 11.1 [35] and PyTorch Geometric library version 1.12.1 [36]. Training is conducted on a server equipped with a 12th Gen Intel(R) Core(TM) i5-12400F 2.50 GHz and a NVIDIA GeForce RTX 2070 GPU. For all three datasets, we use embedding vectors with length of 64 and hidden layers with 128 neurons. At the same time, the value of

K^{+}

is defined as 5 or 30 and

K^{-}

is defined as 5 or 10, respectively, where their values are adjusted based on the number of variables in every dataset. Moreover, the sliding window size is set to five for all datasets. In addition, we choose the Adam optimizer with a learning rate of

1 \times 10^{- 3}

for the training of our model. Also, we trained it for a maximum of 30 epochs, with early stopping applied when the performance does not improve for 10 consecutive epochs.

4.4. Performance Comparison and Discussion

We conducted experiments on three real-word datasets to evaluate our model, comparing its performance with the aforementioned eight baseline models in terms of precision (P), recall (R), and F1 -score（F1）, area under receiver operating characteristic curve (AUROC, ROC), and area under the precision-recall curve (AUPRC, PRC). The first three are calculated as follows:

P = \frac{TP}{TP + FP}, R = \frac{TP}{TP + FN}, F 1 = \frac{2 \cdot P \cdot R}{P + R}

(14)

where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively. Higher values across these three metrics indicate better performance. At the same time, the AUROC evaluates model performance across all possible thresholds, reducing sensitivity to any single decision point, while the AUPRC is particularly appropriate for assessing performance under class imbalance [37,38], ensuring fairer evaluation. The experimental results are presented in Table 2.

The results demonstrate that PNGDN achieves the best performance across three real-world datasets: SWaT, WADI, and SMD (except for precision on the WADI dataset). Specifically, on the SWaT dataset, which represents an industrial water treatment system, the F1-score of our model achieves 83%, significantly outperforming M2N2 (78%) and GDN (81%). This highlights PNGDN’s superior ability to capture both inter-variable correlations and temporal dependencies. It also achieves the highest AUROC (88%) and AUPRC (81%) on SWaT, indicating strong and stable anomaly discrimination performance even under class imbalance. On the WADI dataset, which contains higher dimensional data and a lower anomaly ratio (5.82%), PNGDN still maintains an F1-score of 54%, surpassing GDN (51%) and demonstrating robustness in high-dimensional sparse scenarios. In this case, PNGDN again leads with an AUROC of 79% and AUPRC of 49%, showing its ability to generalize under sparse anomaly conditions and to detect rare events with greater precision than GDN (78%/46%) and MAD-GAN (73%/32%). For the SMD, PNGDN obtains an F1-score of 60%, slightly outperforming MAD-GAN (57%), GDN (58%), and M2N2 (57%). Although the performance gap is narrower compared to that in SWaT and WADI, the results still underscore PNGDN’s adaptability to complex IT environments. Additionally, PNGDN achieves a competitive AUROC (86%) and the highest AUPRC (51%) on SMD, further confirming its effectiveness in handling subtle and imbalanced anomalies in large-scale server systems.

The main challenges of the SMD lie in two aspects: First, the presence of 38 metrics (e.g., CPU usage, network traffic) introduces complex interdependencies and noise. Second, most anomalies appear as short-term resource spikes, such as brief CPU peaks. These anomalies are less distinguishable compared to the sustained attacks observed in SWaT or WADI. As a result, they require more fine-grained temporal modeling. It is worth noting that the F1-score, AUROC, and AUPRC ranking across the three datasets (SWaT > SMD > WADI) aligns well with their intrinsic difficulty: SWaT features denser and clearer anomalies, WADI poses the greatest challenge due to its high dimensionality and sparsity, and SMD lies in between due to its heterogeneous metrics and low anomaly salience. These comprehensive results verify the generalizability of PNGDN across diverse and challenging MTS anomaly detection scenarios.

4.5. Ablation Study

We conducted several ablation experiments on the three datasets using modified models of PNGDN by removing specific modules to evaluate the necessity of each component. We mainly investigate the following variants: TOPK+ uses only positive correlation graphs constructed from the top

K^{+}

positively correlated neighbors to assess the effectiveness of the negative correlation graph. TOPK-all replaces the dynamic learned graph with a fully connected static graph linking all variables to examine the importance of the learned graph structure. Shared message-passing (SMP) retains the positive and negative correlation graphs but replaces their separate propagation mechanisms with a unified shared propagation mechanism to evaluate the impact of distinct attention-based information propagation. -EMB uses an attention mechanism without variable embeddings to analyze their unique roles in the information propagation and data forecasting process. -ATT disables the attention mechanism entirely, assigning equal weights to all neighbors during information propagation.

The results of the ablation experiments are shown in Figure 3. The detailed analysis of the results for each modified model is as follows:

TOPK+: As shown in the figure, TOPK+ maintains or slightly improves precision (e.g., 0.99 on SWaT and 0.89 on WADI) but suffers significant recall declines (4.0%, 9.1%, and 5.0% for SWaT, WADI, and SMD, respectively), resulting in overall F1-score reductions of 3.0–4.0%. The AUROC remains strong (0.88 on SWaT, 0.82 on WADI, and 0.85 on SMD), showing that despite the lower recall, TOPK+ still has a good ability to distinguish anomalies. The AUPRC shows a slight drop (0.80 on SWaT, 0.47 on WADI, and 0.46 on SMD), reflecting the impact of recall reductions. This decline in recall can be attributed to TOPK+’s reliance solely on positive correlation graphs, which overlooks negative correlations that may be crucial for modeling time series data, leading to a reduced sensitivity in anomaly detection.
TOPK-all: The static fully connected graph improves precision by 17.0% on WADI (0.86 vs. PNGDN 0.69) but collapses recall by 10.2%, leading to a 5.1% F1-score drop. On SWaT and SMD, both precision and recall degrade (e.g., precision drops 2.0% and 3.1%), with F1-scores declining 5.0% and 4.1%. The AUROC on SWaT drops from 0.88 to 0.86, and on WADI from 0.79 to 0.79, while AUPRC drops from 0.81 to 0.77 on SWaT and 0.45 on WADI. It indicates that the static graph introduces relational noise that hinders the model’s ability to detect anomalies, highlighting the necessity of selectively learning inter-variable dependencies.
SMP: The variant marginally increases WADI precision by 19.0% (0.88 vs. 0.69) but reduces precision on SWaT and SMD by 5.1% and 4.2%. Recall declines sharply across datasets (e.g., 11.0% on WADI), causing F1-score reductions up to 7.1%. The AUROC on SWaT drops from 0.88 to 0.84, and on WADI from 0.79 to 0.76, while AUPRC drops from 0.81 to 0.71 on SWaT and 0.42 on WADI. This indicates that, in the SMP mechanism, information from positive and negative correlations may cancel each other out, leading to the loss of critical information.
-EMB: Removing the variable embeddings from the attention mechanism reduces the average precision by 6.0% (e.g., SWaT: 0.93 vs. 0.99) and recall by 14.0% (e.g., WADI: 0.30 vs. 0.44), resulting in an 8.0–10.1% F1-score declines. The AUROC on SWaT drops from 0.88 to 0.83 and on WADI from 0.79 to 0.74. The AUPRC also decreases from 0.81 to 0.69 on SWaT and from 0.49 to 0.39 on WADI. This demonstrates that variable embeddings are essential to tailor the attention mechanism for the characteristics specific to the individual variable. Without ${vec}_{i}$ , the model loses its ability to selectively emphasize informative features for each variable, resulting in less effective detecting accuracy.
-ATT: Completely disabling the attention mechanism and assigning equal weights to all neighbors degrades both the precision (e.g., 27.3% drop on SWaT) and recall (e.g., 12.0% drop on WADI), with F1-scores plummeting by up to 15.2%. The AUROC drops from 0.88 to 0.80 on SWaT and from 0.79 to 0.71 on WADI, while the AUPRC decreases from 0.81 to 0.61 on SWaT and from 0.49 to 0.34 on WADI, demonstrating the importance of the attention mechanism for preserving model sensitivity and robustness.

5. Interpretability of Model

To analyze the interpretability of the variable embeddings and the inter-variable correlation graph in the PNGDN model, we make use of the t-SNE method [39] and the similarity of our variable embeddings to visualize the variable embeddings. First, to verify whether the variable embeddings effectively reveal behavioral patterns and correlations among variables, t-SNE is utilized to cluster similar variables, making their distributions easier to observe. Secondly, considering that the WADI dataset corresponds to a real-world system with seven distinct sensor categories, the similarity of the variable embeddings is applied to classify the variables. Different colors are assigned to the seven kinds of variables based on the similarities, ensuring that data points belonging to the same category are displayed in the same color, which facilitates the distinction of variable distributions and their correlations. Building on the above conditions, Figure 4 illustrates the effectiveness of variable embeddings and their roles in the inter-variable correlation graph with practical examples. Detailed interpretations are outlined as follows.

In Figure 4, variables from the same category cluster together as points of the same color, with positively correlated variables forming uniform-color clusters and negatively correlated variables appearing in different colors and spatially distant from the positive clusters. For example, the dark blue dashed line highlights sensors positively correlated with

2_F I C_101_S P

, such as

2_F I C_201_S P

and

2_F I C_101_P V

, which measure similar metrics in similar water tanks. Conversely, the red dashed line identifies negatively correlated sensors, such as

2_F I T_002_P V

, which are either located in tanks with opposing functions or measure opposite metrics. These results confirm that variable embeddings effectively reveal variable correlations in inter-variable correlation graphs.

In addition to the qualitative visualization, we further conduct a quantitative analysis to verify whether the variable embeddings preserve real-world dependency structures. Specifically, we compare the cosine similarity matrix derived from the learned variable embeddings with the Pearson correlation coefficient matrix computed directly from the raw MTS data in WADI dataset. As shown in Figure 5, the two matrices exhibit similar structural patterns despite differences in their absolute similarity magnitudes, indicating that the variable embeddings,

{vec}_{i}

, capture the true statistical correlations among variables. This strongly supports the interpretability and reliability of the variable embeddings for correlation graph construction. At the same time, it should be emphasized that the primary purpose of the variable embeddings is to capture the inherent characteristics of each variable and facilitate the construction of the inter-variable correlation graph. They are not designed to directly reflect or detect real-world anomalies.

Furthermore, the attention weights in the model, represented as the weights of different edges in the graph, indicate the contribution of various positively and negatively correlated neighbors in modeling variable behavior. To provide a more detailed interpretability analysis of this attention mechanism, we carry out a specific case with a known root cause. As described in the WADI dataset documentation, this anomaly was caused by the malicious activation of the

2_M C V_007

device, leading to water leakage anomalies before water reached consumers.

The model identified the variable

2_M C V_007_C O

as having the highest anomaly score during this attack, indicating that

2_M C V_007

was the compromised device. Furthermore, the variable called

2_F I C_401_P V

, shown in Figure 6 (below), was detected as the positively correlated neighbor of

2_M C V_007_C O

with the highest attention weight. By comparing the predicted values and observations of these two variables during the anomalous period in the plot, the anomaly can be further understood. Specifically, for

2_M C V_007_C O

, its predicted values, influenced by the positive correlation with

2_F I C_401_P V

, remained at relatively low normal levels for the most part, resulting in a large deviation from the ground truth data.

For negatively correlated neighbors of the variable

2_M C V_007_C O

, the variable with the highest attention weight detected by the model is

2 B_A I T_004_P V

. Comparing the predicted and actual values of them within the anomaly time segment shown in Figure 7 reveals that the negative correlation causes the prediction of

2_M C V_007_C O

to decrease after a brief increase and then remain at low normal levels. This amplifies the deviation between the true and predicted values of the variable, making the anomaly easier to detect.

Overall, the PNGDN model is capable of detecting anomalies by anomaly scores and influencing the predicted values of variables via correlation relationships and attention weights, thereby amplifying the predicted deviation during anomalies and providing better insight into how the true anomalies deviate from the normal values.

6. Conclusions

This paper proposed PNGDN, a novel multivariate time series anomaly detection model that simultaneously captures positive and negative correlations among variables. By integrating variable embeddings, an inter-variable correlation graph learning module, and an innovative attention-based information propagation mechanism, PNGDN overcomes the limitations of previous methods that relied solely on positive correlations. This dual-correlation approach not only enhances anomaly-detection performance but also improves interpretability by distinctly modeling the contributions of both positive and negative dependencies. To the best of our knowledge, this is the first time negative inter-variable correlations have been explicitly incorporated into GNN-based time series anomaly detection, enabling richer structural understanding and more robust decision-making.

Extensive experiments demonstrate that PNGDN consistently outperforms traditional methods and prior GNN-based models, establishing it as a robust solution for detecting anomalies in complex real-world scenarios. In practical terms, incorporating negatively correlated variables can amplify prediction errors when anomalies occur, further improving detection sensitivity. Also, by using the adaptive threshold strategy, the results of the AUROC and AUPRC confirms that the performance of our model is not affected by the distribution between train and test dataset and aligns better with the unsupervised nature of MTS anomaly detection tasks. Therefore, PNGDN holds strong potential for real-time monitoring in CPS such as water treatment, smart grids, and manufacturing, where data come from diverse sensors and the early detection of abnormal behaviors is essential for system reliability and safety. Specifically, in environmental monitoring, PNGDN identifies air quality anomalies by capturing intricate interactions among pollutants and weather factors, even when the relationships vary in direction. Also, in financial systems, such as credit card fraud detection, PNGDN can identify anomalous transactions by modeling complex correlations among user behavior features, including transaction amount, frequency, location, and device consistency. In the future, PNGDN may be used in more and more time series anomaly detection scenarios to help detect problems early and improve the reliability of important systems.

Author Contributions

Conceptualization, T.Z.; methodology, Y.J.; data curation, Y.J.; software, B.H.; validation, X.L.; formal analysis, Y.J. and X.L.; investigation, K.W.; resources, W.W.; writing—original draft preparation, Y.J.; writing—review and editing, X.L. and J.W.; visualization, Y.J.; supervision, B.H.; project administration, T.Z.; funding acquisition, T.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the Science and Technology Program of Xizang Autonomous Region, under Grant (No. XZ202401ZY0008).

Data Availability Statement

The authors confirm that the data supporting the findings of this study are available within the article. The experimental object of this study is three public time series anomaly detection datasets, the experimental environment is PyTorch version 1.12.1, and the simulation settings, such as embedding vectors, hidden layers, etc., can be found in Section 4.

Conflicts of Interest

Author Bing Hu was employed by the company Zhengzhou Zhongyuan Environmental Protection Co., Ltd. Author Wei Wu was employed by the company Zhengzhou Aiwen Tech Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Xu, J.; Wu, H.; Wang, J.; Long, M. Anomaly transformer: Time series anomaly detection with association discrepancy. arXiv 2021, arXiv:2110.02642. [Google Scholar]
Hong, J.; Liu, C.C.; Govindarasu, M. Integrated anomaly detection for cyber security of the substations. IEEE Trans. Smart Grid 2014, 5, 1643–1653. [Google Scholar] [CrossRef]
Nune, G.K.; Sena, P.V. Novel artificial neural networks and logistic approach for detecting credit card deceit. Int. J. Comput. Sci. Netw. Secur. (IJCSNS) 2015, 15, 21. [Google Scholar]
Jin, M.; Koh, H.Y.; Wen, Q.; Zambon, D.; Alippi, C.; Webb, G.I.; King, I.; Pan, S. A survey on graph neural networks for time series: Forecasting, classification, imputation, and anomaly detection. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 10466–10485. [Google Scholar] [CrossRef] [PubMed]
Shyu, M.L.; Chen, S.C.; Sarinnapakorn, K.; Chang, L. A novel anomaly detection scheme based on principal component classifier. In Proceedings of the IEEE Foundations and New Directions of Data Mining Workshop, Melbourne, FL, USA, 19–22 November 2003; IEEE Press: Piscataway, NJ, USA, 2003; pp. 172–179. [Google Scholar]
Angiulli, F.; Pizzuti, C. Fast outlier detection in high dimensional spaces. In Proceedings of the European Conference on Principles of Data Mining and Knowledge Discovery, Helsinki, Finland, 19–23 August 2002; Springer: Berlin/Heidelberg, Germany, 2002; pp. 15–27. [Google Scholar]
Breunig, M.M.; Kriegel, H.P.; Ng, R.T.; Sander, J. LOF: Identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, TX, USA, 16–18 May 2000; pp. 93–104. [Google Scholar]
Manevitz, L.M.; Yousef, M. One-class SVMs for document classification. J. Mach. Learn. Res. 2001, 2, 139–154. [Google Scholar]
Tax, D.M.; Duin, R.P. Support vector data description. Mach. Learn. 2004, 54, 45–66. [Google Scholar] [CrossRef]
Ting, K.M.; Xu, B.C.; Washio, T.; Zhou, Z.H. Isolation distributional kernel: A new tool for kernel based anomaly detection. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, in virtual, 6–10 July 2020; pp. 198–206. [Google Scholar]
Aggarwal, C.C. Outlier Analysis; Springer International Publishing: Berlin/Heidelberg, Germany, 2015; pp. 237–263. [Google Scholar]
Park, D.; Hoshi, Y.; Kemp, C.C. A multimodal anomaly detector for robot-assisted feeding using an lstm-based variational autoencoder. IEEE Robot. Autom. Lett. 2018, 3, 1544–1551. [Google Scholar] [CrossRef]
Hundman, K.; Constantinou, V.; Laporte, C.; Colwell, I.; Soderstrom, T. Detecting spacecraft anomalies using lstms and nonparametric dynamic thresholding. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 387–395. [Google Scholar]
Kim, D.; Park, S.; Choo, J. When model meets new normals: Test-time adaptation for unsupervised time-series anomaly detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 13113–13121. [Google Scholar]
Deng, A.; Hooi, B. Graph neural network-based anomaly detection in multivariate time series. In Proceedings of the AAAI Conference on Artificial Intelligence, in virtual, 2–9 February 2021; Volume 35, pp. 4027–4035. [Google Scholar]
Chen, Z.; Chen, D.; Zhang, X.; Yuan, Z.; Cheng, X. Learning graph structures with transformer for multivariate time-series anomaly detection in IoT. IEEE Internet Things J. 2021, 9, 9179–9189. [Google Scholar] [CrossRef]
Han, S.; Woo, S.S. Learning sparse latent graph representations for anomaly detection in multivariate time series. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 14–18 August 2022; pp. 2977–2986. [Google Scholar]
Luo, Y.; Liu, M.; Gan, J.; Zhou, X.; Jiang, M.; Yang, R. Correlation study on PM2.5 and O₃ mass concentrations in ambient air by taking urban cluster of Changsha, Zhuzhou and Xiangtan as an example. J. Saf. Environ. 2015, 15, 313–317. [Google Scholar]
Niu, M.; Zhang, Y.; Ren, Z. Deep learning-based PM2. 5 long time-series prediction by fusing multisource data—A case study of Beijing. Atmosphere 2023, 14, 340. [Google Scholar] [CrossRef]
Hawkins, D.M. Identification of Outliers; Springer: Berlin/Heidelberg, Germany, 1980; Volume 11. [Google Scholar]
Li, D.; Chen, D.; Jin, B.; Shi, L.; Goh, J.; Ng, S.K. MAD-GAN: Multivariate anomaly detection for time series data with generative adversarial networks. In Proceedings of the International Conference on Artificial Neural Networks, Munich, Germany, 17–19 September 2019; Springer: Berlin/Heidelberg, Germany, 2019; pp. 703–716. [Google Scholar]
Garg, A.; Zhang, W.; Samaran, J.; Savitha, R.; Foo, C.S. An evaluation of anomaly detection and diagnosis in multivariate time series. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 2508–2517. [Google Scholar] [CrossRef] [PubMed]
Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Yu, P.S. A comprehensive survey on graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 4–24. [Google Scholar] [CrossRef] [PubMed]
Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903. [Google Scholar]
Jin, G.; Liang, Y.; Fang, Y.; Shao, Z.; Huang, J.; Zhang, J.; Zheng, Y. Spatio-temporal graph neural networks for predictive learning in urban computing: A survey. IEEE Trans. Knowl. Data Eng. 2023, 36, 5388–5408. [Google Scholar] [CrossRef]
Huang, Q.; Shen, L.; Zhang, R.; Ding, S.; Wang, B.; Zhou, Z.; Wang, Y. Crossgnn: Confronting noisy multivariate time series via cross interaction refinement. Adv. Neural Inf. Process. Syst. 2023, 36, 46885–46902. [Google Scholar]
Siffer, A.; Fouque, P.A.; Termier, A.; Largouet, C. Anomaly detection in streams with extreme value theory. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 1067–1075. [Google Scholar]
Mathur, A.P.; Tippenhauer, N.O. SWaT: A water treatment testbed for research and training on ICS security. In Proceedings of the 2016 International Workshop on Cyber-Physical Systems for Smart Water Networks (CySWater), Vienna, Austria, 11–14 April 2016; pp. 31–36. [Google Scholar]
Ahmed, C.M.; Palleti, V.R.; Mathur, A.P. WADI: A water distribution testbed for research in the design of secure cyber physical systems. In Proceedings of the 3rd International Workshop on Cyber-Physical Systems for Smart Water Networks, Pittsburgh, PA, USA, 21 April 2017; pp. 25–28. [Google Scholar]
Su, Y.; Zhao, Y.; Niu, C.; Liu, R.; Sun, W.; Pei, D. Robust anomaly detection for multivariate time series through stochastic recurrent neural network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2828–2837. [Google Scholar]
Goh, J.; Adepu, S.; Junejo, K.N.; Mathur, A. A dataset to support research in the design of secure water treatment systems. In Proceedings of the Critical Information Infrastructures Security: 11th International Conference, CRITIS 2016, Paris, France, 10–12 October 2016; Revised Selected Papers 11. Springer: Berlin/Heidelberg, Germany, 2017; pp. 88–99. [Google Scholar]
Lazarevic, A.; Kumar, V. Feature bagging for outlier detection. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, Chicago, IL, USA, 21–24 August 2005; pp. 157–166. [Google Scholar]
Zong, B.; Song, Q.; Min, M.R.; Cheng, W.; Lumezanu, C.; Cho, D.; Chen, H. Deep autoencoding gaussian mixture model for unsupervised anomaly detection. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; Lerer, A. Automatic differentiation in pytorch. In Proceedings of the Conference and Workshop on Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
Fey, M.; Lenssen, J.E. Fast graph representation learning with PyTorch Geometric. arXiv 2019, arXiv:1903.02428. [Google Scholar]
Saito, T.; Rehmsmeier, M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 2015, 10, e0118432. [Google Scholar] [CrossRef] [PubMed]
Sørbø, S.; Ruocco, M. Navigating the metric maze: A taxonomy of evaluation metrics for anomaly detection in time series. Data Min. Knowl. Discov. 2024, 38, 1027–1068. [Google Scholar] [CrossRef]
Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]

Figure 1. The comparison of MTS prediction errors between the method using only the positive variable correlation graph (PG) and the method using both the positive and negative variable correlation graphs (PNG). In the anomaly time segment, the method using PNG shows higher prediction errors than the method using PG, facilitating MTS anomaly detection.

Figure 2. The framework of PNGDN. (a) Variable embedding module captures each variable’s unique characteristics. (b) Inter-variable correlation graph structure learning module learns variable similarities and selects strongly correlated nodes to build the graphs. (c) Attention-based information propagation and data forecasting module uses positive and negative correlation attention weights

α

, respectively, to forecast each variable, with thicker arrows indicating higher attention. (d) Anomaly detector identifies anomalies based on the absolute error between predicted and actual values.

Figure 2. The framework of PNGDN. (a) Variable embedding module captures each variable’s unique characteristics. (b) Inter-variable correlation graph structure learning module learns variable similarities and selects strongly correlated nodes to build the graphs. (c) Attention-based information propagation and data forecasting module uses positive and negative correlation attention weights

α

, respectively, to forecast each variable, with thicker arrows indicating higher attention. (d) Anomaly detector identifies anomalies based on the absolute error between predicted and actual values.

Figure 3. Performance with different ablation experiment results.

Figure 4. A t-SNE visualization of the variable embeddings from PNGDN on the WADI dataset, with node colors coded by seven different classes. The points enclosed by the dark blue dashed line and red dashed line represent variables positively and negatively correlated with the

2_F I C_101_S P

sensor.

Figure 4. A t-SNE visualization of the variable embeddings from PNGDN on the WADI dataset, with node colors coded by seven different classes. The points enclosed by the dark blue dashed line and red dashed line represent variables positively and negatively correlated with the

2_F I C_101_S P

sensor.

Figure 5. Comparison between (left) the data correlation matrix computed from the raw data in WADI dataset by Pearson correlation coefficient and (right) the cosine similarity matrix derived from our variable embeddings.

Figure 6. Comparison of predicted and actual values for the variable and its corresponding highest attention weight in positive correlations.

Figure 7. Comparison of predicted and actual values for the variable and its corresponding highest attention weight in negative correlations.

Table 1. Statistics of SWaT dataset, WADI dataset, and SMD.

Dataset	Variates	Train	Test	Anomalies
SWaT	51	496,801	449,920	12.21%
WADI	127	1,048,572	172,802	5.82%
SMD	38	708,406	708,421	4.16%

Table 2. Performance comparison between proposed PNGDN and baselines on three MTS datasets. Best in bold; second best underlined.

Method	P	R	F1	ROC	PRC	P	R	F1	ROC	PRC	P	R	F1	ROC	PRC
Method	SWaT					WADI					SMD
PCA	24.92	21.63	0.23	0.68	0.15	39.53	5.63	0.10	0.53	0.02	6.69	44.34	0.12	0.62	0.05
KNN	7.83	7.83	0.08	0.62	0.06	7.76	7.75	0.08	0.48	0.04	8.32	40.11	0.13	0.63	0.07
FB	10.17	10.17	0.10	0.63	0.12	8.60	8.60	0.09	0.49	0.04	7.98	26.65	0.12	0.61	0.08
AE	72.63	52.63	0.61	0.78	0.45	34.35	34.35	0.34	0.68	0.28	54.78	55.12	0.55	0.78	0.45
DAGMM	27.46	69.52	0.39	0.73	0.22	54.44	26.99	0.36	0.73	0.32	53.33	57.67	0.55	0.81	0.46
LSTM-VAE	96.24	59.91	0.74	0.83	0.72	87.79	14.45	0.25	0.68	0.22	56.71	56.51	0.56	0.82	0.48
MAD-GAN	98.97	63.74	0.75	0.88	0.76	41.44	33.92	0.37	0.73	0.32	57.42	55.83	0.57	0.83	0.50
GDN	95.45	70.36	0.81	0.85	0.76	75.38	38.66	0.51	0.78	0.46	56.59	58.82	0.58	0.84	0.49
M2N2	95.46	66.40	0.78	0.88	0.78	10.51	32.66	0.16	0.62	0.081	57.71	55.88	0.57	0.81	0.42
PNGDN	99.03	70.96	0.83	0.88	0.81	68.78	44.23	0.54	0.79	0.49	59.21	60.75	0.60	0.86	0.51

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Jin, Y.; Liu, X.; Hu, B.; Walker, J.; Wang, K.; Wu, W.; Zhong, T. Negative Feedback Matters: Exploring Positive and Negative Correlations for Time Series Anomaly Detection. Electronics 2025, 14, 2068. https://doi.org/10.3390/electronics14102068

AMA Style

Jin Y, Liu X, Hu B, Walker J, Wang K, Wu W, Zhong T. Negative Feedback Matters: Exploring Positive and Negative Correlations for Time Series Anomaly Detection. Electronics. 2025; 14(10):2068. https://doi.org/10.3390/electronics14102068

Chicago/Turabian Style

Jin, Yixuan, Xueting Liu, Bing Hu, Joojo Walker, Ke Wang, Wei Wu, and Ting Zhong. 2025. "Negative Feedback Matters: Exploring Positive and Negative Correlations for Time Series Anomaly Detection" Electronics 14, no. 10: 2068. https://doi.org/10.3390/electronics14102068

APA Style

Jin, Y., Liu, X., Hu, B., Walker, J., Wang, K., Wu, W., & Zhong, T. (2025). Negative Feedback Matters: Exploring Positive and Negative Correlations for Time Series Anomaly Detection. Electronics, 14(10), 2068. https://doi.org/10.3390/electronics14102068

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Negative Feedback Matters: Exploring Positive and Negative Correlations for Time Series Anomaly Detection

Abstract

1. Introduction

2. Related Works

2.1. Anomaly Detection for Unsupervised Multivariate Time Series

2.2. Graph Neural Networks in Time Series Anomaly Detection

3. Methods

3.1. Problem Definition

3.2. Overall Structure

3.3. Variable Embedding

3.4. Inter-Variable Correlation Graph Structure Learning

3.5. Attention-Based Information Propagation and Data Forecasting

3.6. Anomaly Detector

4. Experiments

4.1. Datasets

4.2. Baselines

4.3. Experiment Setup

4.4. Performance Comparison and Discussion

4.5. Ablation Study

5. Interpretability of Model

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI