A Malicious Program Behavior Detection Model Based on API Call Sequences
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The authors present a malicious program behaviour detection model based on API call sequences.
Please consider the following comments, which are intended to help me understand some of the decisions made and to suggest ideas for improving the work.
Regarding the contribution described in line 60, are the authors considering sharing the results in the form of a dataset? If so, I think it would be important to reference it and share it here.
The related work section analyses good and recent works, but I think the authors could benefit from analysing the following article: "Tieming Chen, Huan Zeng, Mingqi Lv, Tiantian Zhu, CTIMD: Cyber threat intelligence enhanced malware detection using API call sequences with parameters, Computers & Security, Volume 136, 2024, 103518, ISSN 0167-4048". Since it follows a similar base approach, the authors could explain how their work differs from it.
It is positive that the authors presented the experimental setup in terms of hardware, but they could improve it by also stating the operating system and the libraries, tools, and software versions used. This would also allow the authors to present the execution times and the resources used by each approach, so that the community can compare them.
Section 5 (Results) is well executed, but it could be improved by also reporting execution times and hardware resource usage.
In the Discussion section, the authors discuss the imbalance in the amount of data between the categories of the dataset, which was already presented in Table 2. I think the authors should explain why they chose to use an experimental dataset with this imbalance in the number of files. Do you have an idea of the number of extracted API sequences per category? Did you consider making the dataset more balanced?
Overall, the article is well written, follows a correct structure, is technically sound, and makes a good contribution to the field.
Comments on the Quality of English Language
Just some typos:
Line 80: pro-gram
Line 280: an-ti-virtualization
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
1. The term 'API call sequences' appears many times in the article and is inconsistent with the term 'API sequences'. Please revise the text to use one term consistently.
2. Could you add some details about the process followed by the malicious behavior sequence matching model?
3. The term 'deep learning detection model' is inconsistent with the term 'neural network model' in Fig. 3. Please unify them.
4. Please unify the font size across the experimental result plots in Fig. 2 and Fig. 5.
Comments on the Quality of English Language
As stated above.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
In this paper, the authors propose a detection model based on API call sequences that combines rule matching and deep learning methods to improve the performance of malicious program detection.
Experimental results show that the proposed detection model can effectively detect malicious samples and identify malicious program behaviours.
My suggestions regarding the improvement of the paper are as follows:
The Introduction is missing a short outline of the paper's content. This needs to be added.
At line 80, the term "pro-gram" needs to be replaced with "program".
At line 149, the term "de-duplication" needs to be replaced with "deduplication".
At line 217, the term "ins-tances" needs to be replaced with "instances".
At line 232, the term "fea-tures" needs to be replaced with "features".
At line 244, the term "perfor-mances" needs to be replaced with "performances".
At line 252, the term "con-volutional" needs to be replaced with "convolutional".
At line 290, the term "an-ti-virtualization" needs to be replaced with "anti-virtualization".
The findings are adequate and contribute experimental results on the use of the PrefixSpan (Prefix-projected Sequential pattern mining) algorithm to mine the frequent API sequences of several processes in a malicious program.
The article is well written and composed, and it presents an interesting study.
The methodology and the comparison of the results obtained by the different methods are also clear.
Future work is missing from the Conclusion. Perhaps the part of the text regarding future work should be moved from the Discussion to the Conclusion.
In summary, I suggest that some parts of the article be improved, namely the Introduction, Discussion, and Conclusion.
Comments for author File: Comments.pdf
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 4 Report
Comments and Suggestions for Authors
This study proposes a detection model based on API call sequences, using a two-step approach that combines rule matching and deep learning methods to improve the performance of malicious program detection.
First, the paper uses the PrefixSpan algorithm to mine the frequent API sequences of different processes of the same program in the malicious program dataset in order to construct a rule base of malicious behavior sequences. Then, the malicious behavior sequence matching model is used to match the API sequences under test, and the API sequences that fail to match are passed to the TextCNN deep learning detection model for further detection. Finally, the two models work together to detect program behavior.
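The two-stage pipeline summarised above can be illustrated with a minimal sketch. It assumes that each process's API trace is one row of the sequence database, that "frequent" means standard PrefixSpan support (gaps between APIs allowed), and that a rule matches when its APIs occur in order within the sequence under test. The rule-selection policy (keep patterns of length 3 or more), the toy traces, and `textcnn_stub` are hypothetical placeholders, not the authors' actual configuration or model.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Set, Tuple

Pattern = Tuple[str, ...]


def prefixspan(db: List[Sequence[str]], min_support: int) -> Dict[Pattern, int]:
    """Return every frequent subsequence (gaps allowed) and its support.

    Support is the number of sequences in `db` that contain the pattern.
    """
    results: Dict[Pattern, int] = {}

    def mine(prefix: Pattern, projected: List[Tuple[int, int]]) -> None:
        # `projected` is the projected database: (sequence id, suffix start) pairs.
        support: Dict[str, Set[int]] = defaultdict(set)
        for seq_id, start in projected:
            for api in db[seq_id][start:]:
                support[api].add(seq_id)
        for api in sorted(support):
            if len(support[api]) < min_support:
                continue
            new_prefix = prefix + (api,)
            results[new_prefix] = len(support[api])
            # Project each suffix just past the first occurrence of `api`.
            new_projected = []
            for seq_id, start in projected:
                seq = db[seq_id]
                for pos in range(start, len(seq)):
                    if seq[pos] == api:
                        new_projected.append((seq_id, pos + 1))
                        break
            mine(new_prefix, new_projected)

    mine((), [(i, 0) for i in range(len(db))])
    return results


def is_subsequence(pattern: Pattern, seq: Sequence[str]) -> bool:
    """True if the APIs of `pattern` occur in `seq` in order (gaps allowed)."""
    it = iter(seq)
    return all(api in it for api in pattern)


def detect(seq: Sequence[str], rules: List[Pattern],
           fallback: Callable[[Sequence[str]], str]) -> str:
    """Stage 1: rule matching; stage 2: pass unmatched sequences to a model."""
    if any(is_subsequence(rule, seq) for rule in rules):
        return "malicious (rule match)"
    return fallback(seq)


def textcnn_stub(seq: Sequence[str]) -> str:
    # Hypothetical placeholder standing in for the TextCNN classifier.
    return "undecided (TextCNN placeholder)"


# Toy per-process API traces of one malicious program (illustrative only).
db = [
    ["OpenProcess", "VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"],
    ["OpenProcess", "ReadFile", "VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"],
    ["RegOpenKeyExW", "OpenProcess", "VirtualAllocEx", "WriteProcessMemory"],
]
patterns = prefixspan(db, min_support=3)
# Hypothetical rule-base policy: keep frequent patterns of length >= 3.
rules = [p for p in patterns if len(p) >= 3]
print(rules)
print(detect(["OpenProcess", "Sleep", "VirtualAllocEx", "WriteProcessMemory"], rules, textcnn_stub))
print(detect(["CreateFileW", "WriteFile"], rules, textcnn_stub))
```

With a minimum support of 3, only the ordered pattern (OpenProcess, VirtualAllocEx, WriteProcessMemory) survives the length filter, so the first test trace is caught by the rule base and the second falls through to the model stage. Whether the paper's matching model uses gapped subsequences, contiguous n-grams, or another criterion is exactly the kind of detail Reviewer 2 asks the authors to clarify.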
Major Comments
References need to be provided accurately and at their first mention in the manuscript.
TextCNN [27] is cited in Table 4, although it is already mentioned in the Introduction section, where the citation should first appear.
I do not see a citation for the PrefixSpan algorithm.
Table 4. Comparison of evaluation results of the models mentioned above.
The results seem too high for a multiclass classifier with 8 classes.
The authors need to make sure that the model is not overfitting.
It seems that classes 1, 4, and 6 have fewer labeled samples than the other classes. Is class imbalance affecting the prediction accuracy?
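One common way to answer the imbalance question is to report per-class precision, recall, and F1 together with macro-averaged F1, since overall accuracy can stay high while minority classes are poorly recognised. The sketch below computes these metrics from label lists in plain Python; the toy labels are hypothetical and are not taken from the paper's results (libraries such as scikit-learn offer `classification_report` for the same purpose).

```python
from collections import Counter
from typing import Dict, List, Tuple


def per_class_report(y_true: List[int], y_pred: List[int]
                     ) -> Tuple[float, float, Dict[int, Tuple[float, float, float, int]]]:
    """Return accuracy, macro-F1, and per-class (precision, recall, F1, support)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the true class was t
            fn[t] += 1  # a sample of class t was missed
    report = {}
    for c in sorted(set(y_true) | set(y_pred)):
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = (precision, recall, f1, tp[c] + fn[c])
    accuracy = sum(tp.values()) / len(y_true)
    macro_f1 = sum(r[2] for r in report.values()) / len(report)
    return accuracy, macro_f1, report


# Hypothetical toy labels: a majority class 0 and a minority class 1.
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 90 + [1] * 2 + [0] * 8
accuracy, macro_f1, report = per_class_report(y_true, y_pred)
print(f"accuracy={accuracy:.2f} macro_F1={macro_f1:.2f}")
for c, (p, r, f1, n) in report.items():
    print(f"class {c}: precision={p:.2f} recall={r:.2f} F1={f1:.2f} support={n}")
```

In this toy case accuracy is 0.92 while the minority class has a recall of only 0.20, which is the kind of gap that per-class results for classes 1, 4, and 6 would reveal or rule out.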
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 3 Report
Comments and Suggestions for Authors
The authors have taken my suggestions and proposals for improving the paper into consideration.