Open Access Article

Peer-Review Record

An Interpretable Method for Anomaly Detection in Multivariate Time Series Predictions

Appl. Sci. 2025, 15(13), 7479; https://doi.org/10.3390/app15137479

by Shijie Tang^1,2,†, Yong Ding^1,3,4,† and Huiyong Wang^5,*,†

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 8 April 2025 / Revised: 17 June 2025 / Accepted: 27 June 2025 / Published: 3 July 2025

(This article belongs to the Special Issue Novel Insights into Cryptography and Network Security)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Paper Summary:

This paper proposes an interpretation method that extends anomaly detection beyond detection performance: it explains the detector's decisions, answering why data are marked as malicious and which type of attack is involved.

However, some comments to the authors should be addressed to improve the paper's quality:

1) Anomaly interpretation has already been studied, and many previous works started this trend. Specifically, the proposed gradient-based technique does not seem very novel, as many previous studies have used GNNs and reverse gradients. This raises the question of what novelty the paper offers compared with the state of the art. The authors should clarify this.

2) The authors briefly mention that the gradient model is used for the interpretability method because it is independent of the anomaly detection model. This statement is hard for non-expert readers to understand and needs further clarification, with examples showing to what extent the gradient is independent of the performance evaluation.

3) The implementation of the study is based only on one dataset (SWAT). However, from my experience, many of the domain-based detection models in ICS are usually evaluated using three famous and publicly available datasets, which are SWAT, HAI, and WADI. WADI and HAI datasets are much bigger in terms of training samples and number of features. The authors have to check this article (Kim, B.; Alawami, M.A.; Kim, E.; Oh, S.; Park, J.; Kim, H. A Comparative Study of Time Series Anomaly Detection Models for Industrial Control Systems. Sensors 2023, 23, 1310. https://doi.org/10.3390/s23031310) that provided a comparative study of various anomaly models using the SWAT and HAI datasets.

4) Many previous works have already developed anomaly interpretability methods using techniques such as GNN or reverse gradient. The authors claim that such techniques are risky because the internal structure of the model can be exposed. This claim is unclear without evidence. Further explanation with examples is needed to clarify this point. I also request that the authors provide direct comparisons with previous techniques in terms of performance, usability, and security.

5) Regarding the experiments and parameter settings: the authors did not investigate the effect of the amount of data (sample size) of the SWaT dataset on the performance evaluation, which is crucial for showing which training set size (i.e., 40, 60, 80, or the entire training set) would be sufficient to build a model. Also, no time-based calculations are provided to show whether the proposed method is efficient on ICS models. The amount of data samples and the time factor should also be evaluated on the three datasets (SWaT, HAI, and WADI) to validate the proposed method over diverse datasets. Without this, we cannot comprehensively compare the study with the state of the art.

Comments on the Quality of English Language

No problem with "Quality of English Language"!

Author Response

Reply to comment 1: Our method differs fundamentally from GNN and reverse gradient methods. The interpretability of a GNN depends on the inherent topological relationships of the graph structure (such as node and edge importance), while the reverse gradient method suppresses the model's sensitivity to certain features by reversing the gradients of specific layers or parameters. In contrast, our proposed method uses optimization algorithms to solve for the normal "reference" state of outliers, which not only captures the correlation between features through gradient optimization but also predicts how the system evolves after being attacked. We have revised points (2) and (3) of our main contributions to demonstrate our novelty.

Reply to comment 2: Our original explanation may have been unclear and caused ambiguity. We do not mean that gradients are independent of the performance evaluation, but that the interpreter uses model-agnostic analysis techniques to produce interpretations without relying on the structure of the original model. The interpreter's gradient update depends on the input, predicted output, and optimization function of the anomaly detection model, and does not trigger computation of the anomaly detection model's own gradients. We have revised the statement in the paper regarding 'independent of anomaly detection models'.
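
To make the idea in the replies above concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical black-box `anomaly_score` function standing in for the detector, and searches for a nearby "reference" state by gradient descent on the input, estimating gradients from score queries alone. The toy invariants, the penalty weight `lam`, the step size, and all function names are illustrative assumptions.

```python
def anomaly_score(x):
    """Hypothetical black-box detector score (an assumption, not from the paper).
    It is zero when x[1] == 2 * x[0], x[0] == x[2] and x[2] == 1."""
    return (x[1] - 2.0 * x[0]) ** 2 + (x[0] - x[2]) ** 2 + (x[2] - 1.0) ** 2


def numeric_grad(f, x, eps=1e-5):
    """Central-difference gradient that only queries f; no model internals."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2.0 * eps))
    return grad


def find_reference(x_anom, score, lam=0.1, lr=0.05, steps=500):
    """Minimise score(r) + lam * ||r - x_anom||^2 by gradient descent on r."""
    def objective(r):
        dist = sum((ri - xi) ** 2 for ri, xi in zip(r, x_anom))
        return score(r) + lam * dist

    r = list(x_anom)
    for _ in range(steps):
        g = numeric_grad(objective, r)
        r = [ri - lr * gi for ri, gi in zip(r, g)]
    return r


x_anom = [1.0, 5.0, 1.0]  # feature 1 should be about 2.0 under the toy invariants
reference = find_reference(x_anom, anomaly_score)
# Features that had to move furthest to reach the reference are the suspects.
deviation = [abs(a - r) for a, r in zip(x_anom, reference)]
suspect = max(range(len(deviation)), key=lambda i: deviation[i])
print("reference:", [round(v, 3) for v in reference])
print("most deviating feature index:", suspect)
```

Because the search only calls `anomaly_score`, the same loop would work with any detector that exposes a score for a given input, which is one way to read the "model-agnostic" claim.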

Reply to comment 3: Thank you for your valuable comments. We tried to run more experiments, but due to time constraints we were only able to add experiments on the WADI dataset.

Reply to comment 4: Accessing internal gradients and node data might expose the internal structure of the model. A model-agnostic interpretability method avoids this problem. The performance comparison is presented in Tables 3 and 4.

5) Regarding the experiments and parameter settings: the authors did not investigate the effect of the amount of data (sample size) of the SWaT dataset on the performance evaluation, which is crucial for showing which training set size (i.e., 40, 60, 80, or the entire training set) would be sufficient to build a model. Also, no time-based calculations are provided to show whether the proposed method is efficient on ICS models. The amount of data samples and the time factor should also be evaluated on the three datasets (SWaT, HAI, and WADI) to validate the proposed method over diverse datasets. Without this, we cannot comprehensively compare the study with the state of the art.

Reply: In Section 4.1, we have added explanations of the dataset sample size and the dataset split. We originally planned to add running-time comparisons to the algorithm comparison in Section 4.2, but we were unable to obtain the running times of Baseline 1 and Baseline 3 because they are too time-consuming. The experimental results we have obtained are shown in the table below.

Table 3. Comparison of running time on different datasets

Methods	SWaT (s)	WADI (s)
Baseline 1	*	*
Baseline 2	1294	1307
Baseline 3	30496	*
Baseline 4	1639	*
LIME	1251	134
SHAP	11	13
Our method	1968	4647

*: No valid data obtained
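
Running-time figures of the kind reported in the table above, and the data-fraction study the reviewer asks for, could be collected with a small harness along the following lines. This is only a sketch under stated assumptions: `dummy_explain` is a hypothetical stand-in for an interpretation method, the synthetic `samples` are not SWaT or WADI data, and the fractions are example values.

```python
import time


def time_explainer(explain, samples, fractions=(0.4, 0.6, 0.8, 1.0)):
    """Return {fraction: seconds} for running `explain` on a prefix of samples."""
    timings = {}
    for frac in fractions:
        subset = samples[: max(1, int(len(samples) * frac))]
        start = time.perf_counter()
        for x in subset:
            explain(x)
        timings[frac] = time.perf_counter() - start
    return timings


def dummy_explain(x):
    """Hypothetical stand-in for an interpretation method (assumption)."""
    return [abs(v) for v in x]


# Synthetic three-feature samples, used only to exercise the harness.
samples = [[float(i), float(i % 7), 1.0] for i in range(1000)]
timings = time_explainer(dummy_explain, samples)
for frac, seconds in timings.items():
    print(f"{int(frac * 100)}% of samples: {seconds:.6f} s")
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock intended for measuring elapsed intervals; for very slow baselines a per-run timeout would also be needed, which this sketch omits.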

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The authors present an important study of a gradient-based interpretability method for anomaly detection in industrial control systems, focusing on identifying which physical components have been compromised during an attack. I have some suggestions to improve the study:

The study would benefit from a stronger theoretical justification for using the product of gradient and input (equation 3) as the importance measure. A comparison with other gradient-based feature attribution methods (like SHAP or LIME) would strengthen the paper's positioning in the interpretability literature.
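
For readers unfamiliar with the measure under discussion, here is a minimal sketch of generic gradient-times-input attribution. The paper's Equation 3 is not reproduced in this record, so this shows only the general technique; the linear model, its weights, and the function names are assumptions chosen because the gradient is known in closed form. For a linear model without a bias term the attributions sum exactly to the model output, which is one commonly cited justification for the measure.

```python
def linear_score(w, x):
    """Toy linear model f(x) = w . x (an assumption for illustration)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def gradient_times_input(grad, x):
    """Element-wise product of the gradient df/dx and the input x."""
    return [g * xi for g, xi in zip(grad, x)]


weights = [0.5, -2.0, 0.0]   # hypothetical weights
x = [4.0, 1.5, 10.0]         # hypothetical input
grad = list(weights)         # for f(x) = w . x, df/dx equals w

attributions = gradient_times_input(grad, x)
# Rank features by the magnitude of their attribution, largest first.
ranking = sorted(range(len(x)), key=lambda i: -abs(attributions[i]))

print("attributions:", attributions)
print("sum of attributions:", sum(attributions))
print("model output:", linear_score(weights, x))
print("ranking:", ranking)
```

Note that the third feature has the largest raw value but zero attribution, because the output does not depend on it; the input magnitude alone does not determine importance.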

The Pearson correlation coefficient only captures linear relationships between features. Consider incorporating non-linear correlation measures (like mutual information or distance correlation).

For future discussion, creating a feedback loop where operator insights improve the interpretability model would be valuable.

Author Response

1. The authors present an important study of a gradient-based interpretability method for anomaly detection in industrial control systems, focusing on identifying which physical components have been compromised during an attack. I have some suggestions to improve the study:

Reply: In Section 4.2, comparative experiments with SHAP and LIME were added to the algorithm comparison.

2. The study would benefit from a stronger theoretical justification for using the product of gradient and input (equation 3) as the importance measure. A comparison with other gradient-based feature attribution methods (like SHAP or LIME) would strengthen the paper's positioning in the interpretability literature.

Reply: Thank you for your expert suggestion. We use a combination of KSG mutual information estimation and multivariate cross-correlation analysis to quantify the correlation between key dimensions. KSG mutual information is based on the k-nearest-neighbor framework, which can accurately capture nonlinear and non-monotonic dependencies; multivariate cross-correlation analysis, on the other hand, identifies dynamic correlation patterns with lagged effects through time-delay correlation detection. This two-part correlation evaluation framework not only overcomes the limitations of traditional linear correlation coefficients, but also comprehensively characterizes both instantaneous and delayed correlations in complex systems.
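
The contrast drawn in this reply can be illustrated with a short sketch. As a labeled simplification, it uses a plain histogram (plug-in) mutual information estimate instead of the KSG k-nearest-neighbor estimator the authors name, and a pairwise lag scan instead of a full multivariate cross-correlation analysis. The signals, bin count, and function names are all assumptions.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def histogram_mi(a, b, bins=8):
    """Plug-in mutual information (nats) from a 2-D histogram.
    Simpler and more biased than KSG; used here only for illustration."""
    def to_bins(v):
        lo, hi = min(v), max(v)
        return [min(int((x - lo) / (hi - lo) * bins), bins - 1) for x in v]

    ia, ib = to_bins(a), to_bins(b)
    n = len(a)
    joint, pa, pb = {}, {}, {}
    for i, j in zip(ia, ib):
        joint[(i, j)] = joint.get((i, j), 0) + 1
        pa[i] = pa.get(i, 0) + 1
        pb[j] = pb.get(j, 0) + 1
    return sum(c / n * math.log(c * n / (pa[i] * pb[j]))
               for (i, j), c in joint.items())


def best_lag(a, b, max_lag=10):
    """Delay (in samples) at which b correlates most strongly with a."""
    def corr_at(lag):
        return abs(pearson(a[: len(a) - lag], b[lag:]))
    return max(range(max_lag + 1), key=corr_at)


# Nonlinear dependence: y = x^2 on a symmetric grid.
xs = [-1.0 + 2.0 * i / 200 for i in range(201)]
ys = [x * x for x in xs]
print("Pearson:", pearson(xs, ys))
print("histogram MI (nats):", histogram_mi(xs, ys))

# Lagged dependence: b is a copy of a delayed by three samples.
rng = random.Random(0)
a = [rng.gauss(0.0, 1.0) for _ in range(300)]
b = [0.0] * 3 + a[:-3]
print("best lag:", best_lag(a, b))
```

On the quadratic example the Pearson coefficient is essentially zero by symmetry while the mutual information estimate is clearly positive, and the lag scan recovers the three-sample delay that a zero-lag correlation would miss.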

3. The Pearson correlation coefficient only captures linear relationships between features. Consider incorporating non-linear correlation measures (like mutual information or distance correlation).

For future discussion, creating a feedback loop where operator insights improve the interpretability model would be valuable.

Reply: Thank you for your valuable suggestion. We will consider the discussion in our future work.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

I have reviewed the revised manuscript and the authors' responses. They have adequately addressed all of my previous comments, and I have no further suggestions.
