Article
Peer-Review Record

Towards Explainable Quantum Machine Learning for Mobile Malware Detection and Classification

Appl. Sci. 2022, 12(23), 12025; https://doi.org/10.3390/app122312025
by Francesco Mercaldo 1,2,*, Giovanni Ciaramella 2,*, Giacomo Iadarola 2, Marco Storto 1, Fabio Martinelli 2 and Antonella Santone 1
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 27 October 2022 / Revised: 16 November 2022 / Accepted: 17 November 2022 / Published: 24 November 2022
(This article belongs to the Collection Innovation in Information Security)

Round 1

Reviewer 1 Report

The authors proposed using deep learning and quantum machine learning models to classify whether an application is malware or not. The classification result is also explained based on the CAM method. The convolutional neural network achieved the best performance, and the CAM method highlighted the regions the model attends to for the final classification.

The general approach is interesting. However, I still have some concerns:

1. The description and the introduction of quantum machine learning are too brief. It'd be better if the authors could provide a more detailed introduction to the quantum machine learning method, since the usage of quantum machine learning is claimed as the contribution and it is not a conventional field. More detailed mathematical equations and references could be included to illustrate how it works and what the major advantages and difficulties of applying the quantum method are.

2. It is reasonable to downsample the input images to the same size for a fair performance comparison among different methods. For now, it is difficult to see if it is necessary to use quantum machine learning methods for the malware detection task.

3. Since the images should be downsampled for the quantum machine learning method, it is reasonable to show the computational cost, including hardware requirements (memory, CPUs, etc) and running time compared with deep learning models.

4. Is there any evidence to prove that the CAM is meaningful? For example, can the CAM be mapped to the real code to check whether the highlighted blocks are related to the malware? It is shown that the CAM is useful to CNNs for model interpretation, but is it also true for quantum machine learning? Are there any references to support this? Also, the CAM method is efficient but not accurate; it is highly possible that the activation map includes noise.

Author Response

Reviewer #1
List of Comments and Suggestions.
Comment #1: The description and the introduction of quantum machine learning are too brief. It'd be better if the authors could provide a more detailed introduction to the quantum machine learning method, since the usage of quantum machine learning is claimed as the contribution and it is not a conventional field. More detailed mathematical equations and references could be included to illustrate how it works and what the major advantages and difficulties of applying the quantum method are.
Response: Following the reviewer's suggestion, we improved the background section on quantum computing. In particular, we added a mathematical treatment of qubits, illustrating the geometric interpretation given by the Bloch sphere. We also described the differences concerning measurement, the use of quantum registers, and Quantum Neural Network notions, with mathematical formulas to reinforce the concepts. Thank you for the suggestion aimed at improving the quality of the proposed manuscript.
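For the reader's convenience, the standard qubit state and the Bloch-sphere parametrization this response refers to can be stated as follows (textbook material, not excerpted from the revised manuscript):

```latex
% A single qubit is a normalized superposition of the basis states:
\[
  \lvert\psi\rangle = \alpha\lvert 0\rangle + \beta\lvert 1\rangle,
  \qquad \lvert\alpha\rvert^{2} + \lvert\beta\rvert^{2} = 1
\]
% Up to a global phase, two angles (theta, phi) locate the state on
% the Bloch sphere:
\[
  \lvert\psi\rangle = \cos\frac{\theta}{2}\,\lvert 0\rangle
    + e^{i\phi}\sin\frac{\theta}{2}\,\lvert 1\rangle,
  \qquad 0 \le \theta \le \pi,\; 0 \le \phi < 2\pi
\]
```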

Comment #2: It is reasonable to downsample the input images to the same size for a fair performance comparison among different methods. For now, it is difficult to see if it is necessary to use quantum machine learning methods for the malware detection task.
Response: We perfectly agree with the reviewer. To introduce a fair comparison, in the revised version of the paper we directly compared the convolutional models with the hybrid quantum ones. Unfortunately, we were not able to compare all models with the full quantum network due to the size of the images it requires. In particular, we performed new experiments that directly compare, at the same image size, the two CNN models (Standard-CNN and EfficientNet) with the hybrid quantum one.
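As an illustration of the kind of downsampling discussed here, a byte sequence can be reduced to a fixed-size grayscale vector by block averaging. This is a minimal sketch under assumed details: the 25-element target size comes from the response, but the exact image-generation pipeline used by the authors is not shown here, and `downsample_bytes` is a hypothetical helper.

```python
import numpy as np

def downsample_bytes(data: bytes, target_len: int = 25) -> np.ndarray:
    """Reduce a byte sequence to `target_len` grayscale values in [0, 1]
    by averaging contiguous blocks (a simple form of downsampling).
    Hypothetical illustration, not the authors' pipeline."""
    arr = np.frombuffer(data, dtype=np.uint8).astype(np.float64)
    # Pad (repeating the last byte) so the length divides into target_len blocks.
    pad = (-len(arr)) % target_len
    arr = np.pad(arr, (0, pad), mode="edge")
    blocks = arr.reshape(target_len, -1)
    return blocks.mean(axis=1) / 255.0

# Example: a synthetic 1024-byte "application" -> a 25x1 image.
img = downsample_bytes(bytes(range(256)) * 4, target_len=25)
print(img.shape)  # (25,)
```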

Comment #3: Since the images should be downsampled for the quantum machine learning method, it is reasonable to show the computational cost, including hardware requirements (memory, CPUs, etc) and running time compared with deep learning models.
Response: We perfectly agree with the reviewer. In the revised version of the manuscript, we added more details about the experimental analysis we performed, including the hardware used for running the experiments and the time required to train the (quantum and convolutional) models. Thank you again for the possibility to improve the proposed experimental analysis.

Comment #4: Is there any evidence to prove that the CAM is meaningful? For example, can the CAM be mapped to the real code to check whether the highlighted blocks are related to the malware? It is shown that the CAM is useful to CNNs for model interpretation, but is it also true for quantum machine learning? Are there any references to support this? Also, the CAM method is efficient but not accurate; it is highly possible that the activation map includes noise.
Response: We are really thankful to the reviewer for the opportunity to improve the proposed manuscript. Considering that malware samples belonging to the same family share the malicious payload (i.e., the malicious action), the idea behind the CAM application is to understand whether the CAMs of different samples belonging to the same family highlight similar areas in the images. Conversely, samples belonging to different families should exhibit different areas highlighted by the CAM. The rationale is to find areas of the images obtained from malware that are symptomatic of certain malicious behaviors: in this way, the security analyst can focus the manual effort on a reduced section of the application under analysis. We are aware that images obtained from application source code can be more informative from this point of view; in that case, the highlighted areas would be related to code snippets and, for this reason, more understandable to the security analyst. We added these sentences to the paper's discussion section. Thank you again for the opportunity to clarify this aspect of the proposed manuscript.

Reviewer 2 Report

This paper compares three CNN models and two quantum models to classify malware. It also provides explainability behind the model predictions by adopting the weighted CAM to highlight the areas of the image, obtained from the application, that are symptomatic of a certain prediction. Real-world experiments have been performed on a dataset of 8446 Android malicious and legitimate applications, obtaining interesting results.

Strengths:

1. The proposed method extends the previous work by tuning the models with the aim of empirically obtaining better detection performances.

2. It resorts to an algorithm aimed at highlighting the areas of the application under analysis, to provide explainability behind the model's decision.

Weakness:

1. The abstract has too many themes. I suggest explaining the key issue briefly. What is the main question the authors are trying to answer or explore?

2. In the introduction section, the authors should present the motivation and scope of the problem and investigate it. For example, why is “explaining” machine learning important for mobile malware detection and classification?

3. The introduction section has too many paragraphs, which makes it poorly structured. In addition, why do you select this particular framework to handle the question?

4. The figure captions are too simple; they should describe the main idea of the figures. For example, the caption of figure 3 should describe the workflow of the proposed method more clearly.

5. In experiments, I think the comparison with classic CNN models is insufficient. What is new and significant? I suggest comparing it with other state-of-the-art methods in the same field.

6. The title of section 4.2 is “Reliability of Classification using Grad-CAM.” However, why do the results evaluate the reliability? Perhaps you should describe it more clearly.

7. The experimental results need to be further discussed. I suggest presenting the principles and relationships shown by the results.

8. In the Conclusion and Future works section, you should state your conclusions as clearly as possible. Please note that this section is different from the Introduction. 

Author Response

Reviewer #2
List of Comments and Suggestions.

Comment #1: The abstract has too many themes. I suggest explaining the key issue briefly. What is the main question the authors are trying to answer or explore?
Response: We are really thankful for the opportunity to improve the paper's presentation. In the revised version of the paper, we shortened the abstract to better highlight the paper's contribution. Thank you again for your support.

Comment #2: In the introduction section, the authors should present the motivation and scope of the problem and investigate it. For example, why “Explaining” machine learning is important for mobile malware detection and classification?
Response: In the revised version of the paper, we highlighted why it is important to explain the reasons behind a classifier's decisions. Moreover, we better motivated the proposed contribution. Thank you a lot for your interesting suggestions.

Comment #3: The introduction section has too many paragraphs, which makes it poorly structured. In addition, why do you select this particular framework to handle the question?
Response: We are thankful for the opportunity to improve the proposed manuscript. We better structured the introduction section and explained why we resorted to the proposed framework. Thank you again for helping to improve the paper.

Comment #4: The figure captions are too simple; they should describe the main idea of the figures. For example, the caption of figure 3 should describe the workflow of the proposed method more clearly.
Response: We are thankful for the interesting observation: in the revised version of the manuscript, we added more details to all the captions of figures and tables, with particular regard to the caption of Figure 2, related to the overview of the proposed method. Thank you again for the opportunity to improve the proposed manuscript.

Comment #5: In experiments, I think the comparison with classic CNN models is insufficient. What is new and significant? I suggest comparing it with other state-of-the-art methods in the same field.
Response: Following the reviewer's suggestion, we added new models from the state of the art. In particular, we performed experiments using the VGG19, MobileNet, and EfficientNet models. To obtain a better comparison with the hybrid model, we also ran EfficientNet and Standard-CNN using an image dimension of 25x1. As a matter of fact, we were not able to compare the convolutional models with the full quantum model due to the image size the latter requires for the experiments. We discussed these aspects in the revised version of the manuscript.

Comment #6: The title of section 4.2 is “Reliability of Classification using Grad-CAM.” However, why do the results evaluate the reliability? Perhaps you should describe it more clearly.
Response: We perfectly agree with the reviewer. We renamed Section 4.2 from “Reliability of Classification using Grad-CAM” to “Discussion”, and we better explained how we use the results obtained by Grad-CAM, clearly describing how it works.

Comment #7: The experimental results need to be further discussed. I suggest presenting the principles and relationships shown by the results.
Response: In the revised version of the paper, we better presented the experimental analysis results. We also added other experiments with state-of-the-art deep learning models: in this way, the results section is reorganised and enriched with the new experimental results. Thank you again for your support.

Comment #8: In the Conclusion and Future works section, you should state your conclusions as clearly as possible. Please note that this section is different from the Introduction.
Response: We are thankful to the reviewer for the possibility to improve the paper presentation. In the revised version of the manuscript, we improved the Conclusion and Future works section, by adding more future research lines to differentiate this section from the introduction one. Thank you again for your help in improving the proposed manuscript.

Reviewer 3 Report

The authors discuss very interesting and important topics related to the safety of mobile devices. Although the Introduction begins well and ends by showing what the authors intend to present in the article, most of this chapter reads as a “State of the Art”.

Hence my suggestions.

Reorganize the article. Shorten the introduction. Move that part to a new “State of the Art” section, which should additionally be developed to present a larger number of works related to the subject and refer to them, showing in what respects the authors' work is better. It is true that in subsequent chapters the authors also refer to other sources, but this material must be presented differently so that it can be read and analyzed better.

Further parts present the methodology and the tests themselves. The descriptions are satisfying and promising. Certainly, the results obtained by the authors are interesting and can be used in further works or a specific implementation. The descriptions themselves could be more in-depth.

Technical attention for the tables: if you use a precision of 0.001, then 0.01 should be written as 0.010. This also applies to the other tables.

In my opinion, the fifth chapter should be incorporated into the discussion.

The authors do not use the IMRAD model, which makes their work difficult to read and analyze.

The authors work is interesting.

Author Response

Reviewer #3
List of Comments and Suggestions.

Comment #1: Reorganize the article. Shorten the introduction. Move that part to a new “State of the Art” section, which should additionally be developed to present a larger number of works related to the subject and refer to them, showing in what respects the authors' work is better. It is true that in subsequent chapters the authors also refer to other sources, but this material must be presented differently so that it can be read and analyzed better.
Response: We removed several sentences from the introduction section to better highlight the motivation behind the proposed manuscript. Moreover, we added recent works to the related work section, showing the contribution we introduce with the proposed manuscript.

Comment #2: Further parts present the methodology and the tests themselves. The descriptions are satisfying and promising. Certainly, the results obtained by the authors are interesting and can be used in further works or specific implementations. The descriptions themselves could be more in-depth.
Response: We are really thankful to the reviewer for the opportunity to improve the proposed manuscript. In the revised version of the paper, we added more descriptions related to the exploited models but also to quantum computing and quantum machine learning. Thank you again for your suggestions.

Comment #3: Technical attention for tables: If you use 0.001 precision, it should also be 0.010. It also applies to other tables.
Response: We apologize for this and we perfectly agree with the reviewer. We updated all numbers in the table according to the reviewer's suggestions. Thank you a lot for the suggestion and for the opportunity to improve the candidate manuscript.

Comment #4: In my opinion, the fifth chapter should be incorporated into the discussion.
Response: We are thankful for the suggestion. In the revised version of the manuscript, we take into account the reviewer's suggestion. Thank you again for the opportunity to improve the proposed manuscript presentation.

Comment #5: The authors do not use the IMRAD model, which makes them difficult to read and analyze their work.
Response: We are thankful for the opportunity to improve the manuscript's presentation. We reorganized the revised version of the paper following the IMRAD model: we introduced the Introduction, Methods, Results, and Discussion sections in this order. Thank you again for the suggestion.

Round 2

Reviewer 1 Report

Thanks a lot for the effort! However, there are still some questions that need to be addressed.

1. There are some reference errors in lines 507 and 591.

2. Please consider revising Lines 820 to 822, since there are two repeated sentences: "the idea behind the Grad-CAM application is ...."

3. It is unclear which model is used to generate Figures 8 to 11. From the Standard CNN or the Hybrid-QCNN?

4. The contribution seems weak since it looks like the quantum component is unnecessary. The standard CNN is efficient and accurate enough. So, why do we need a quantum machine learning framework for this task?

5. For the conventional CAM applied to the image classification, we can see that the highlighted region is related to the target label. For example, if the label is "dog", the area of the dog will be highlighted in the input image by CAM. We can directly know the activation map is correct visually, but it is unclear whether the CAM is still meaningful in the malware detection task. So my question is, how to prove that the highlighted area by CAM is correct for the malware detection task? Could the authors provide some evidence about it?


Author Response

Reviewer #1
List of Comments and Suggestions.

Comment #1: There are some reference errors in lines 507 and 591.
Response: We corrected the errors. Thank you for your help.

Comment #2: Please consider revising Lines 820 to 822 since there are two repeated sentences "the idea behind the  Grad-CAM application is ...."
Response: We removed the repeated sentences. Thank you for helping us improve the manuscript.

Comment #3: It is unclear which model is used to generate Figures 8 to 11. From the Standard CNN or the Hybrid-QCNN?
Response: We added details about the model used for the generation of Figures 8 to 11. In particular, Figures 8 and 9 were obtained by invoking the STANDARD_CNN model, while Figures 10 and 11 were obtained by using the Hybrid-QCNN one. Thank you for the possibility to improve the proposed manuscript.

Comment #4: The contribution seems weak since it looks like the quantum component is unnecessary. The standard CNN is efficient and accurate enough. So, why do we need a quantum machine learning framework for this task?
Response: Unfortunately, due to hardware limitations, we are not able to experiment with the quantum models (i.e., the Hybrid-QCNN and the QNN) using images with dimensions bigger than 25x1. However, when we train quantum and convolutional models using the same image dimensions, we can note that the quantum models obtain better performances. This result is highlighted in Figure 3, in particular by observing the results obtained from the EfficientNet (25x3) and Hybrid-QCNN networks, with an accuracy respectively equal to 0.251 (for the EfficientNet model) and 0.905 (for the Hybrid-QCNN one). The STANDARD_CNN model, with images of size 25x3, obtains an accuracy equal to 0.915, which is slightly higher than the one obtained by the Hybrid-QCNN; however, the STANDARD_CNN model was trained with a batch size of 64, while the Hybrid-QCNN used a batch size of 32. For these reasons, we observe that quantum machine learning can be promising for the malware detection task.
We added these sentences in the revised version of the manuscript. Thank you again for the possibility to better highlight the aim of the proposed manuscript.
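To make the hybrid-model idea concrete, here is a minimal sketch of the building block such networks rest on: a parametrized single-qubit rotation whose measured expectation value acts as a trainable "quantum neuron". This is our own NumPy illustration under stated assumptions, not the authors' Hybrid-QCNN; `quantum_neuron` is a hypothetical name.

```python
import numpy as np

def ry(theta: float) -> np.ndarray:
    """Single-qubit rotation matrix around the Y axis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def quantum_neuron(theta: float) -> float:
    """Prepare |0>, apply RY(theta), return the expectation of Pauli-Z.
    Analytically this equals cos(theta), giving a smooth, trainable
    nonlinearity, which is the role a quantum layer plays in a hybrid net."""
    state = ry(theta) @ np.array([1.0, 0.0])   # RY(theta)|0>
    z = np.array([[1.0, 0.0], [0.0, -1.0]])    # Pauli-Z observable
    return float(state @ z @ state)

print(quantum_neuron(0.0))  # 1.0 (state left in |0>)
```

In a hybrid architecture, classical layers feed angles like `theta` into such circuits and the measured expectations are passed on to further classical layers.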

Comment #5: For the conventional CAM applied to the image classification, we can see that the highlighted region is related to the target label. For example, if the label is "dog", the area of the dog will be highlighted in the input image by CAM. We can directly know the activation map is correct visually, but it is unclear whether the CAM is still meaningful in the malware detection task. So my question is, how to prove that the highlighted area by CAM is correct for the malware detection task? Could the authors provide some evidence about it?
Response: The application of Grad-CAM can be useful for locating, within the image under analysis, the bytes that the model considers symptomatic of the malicious payload. It can also be of interest to the security analyst studying malware families: since samples belonging to the same family share the same payload, the Grad-CAM outputs for these samples should highlight the same area of the image. Furthermore, it can be useful for identifying malware variants belonging to the same family: attackers develop new variants, by applying code-obfuscation techniques, to evade the signature-based detection provided by antimalware tools. Therefore, the highlighting of different areas among various samples of the same family can be symptomatic of a new variant of an existing malware family.
We added these sentences in the revised version of the manuscript. Thank you again for the opportunity to improve the proposed paper.
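The core Grad-CAM computation behind this discussion can be sketched in a few lines. This is a generic NumPy illustration of the published algorithm (channel weights are the spatially averaged gradients; the heatmap is the ReLU of the weighted sum of activation maps), not the authors' implementation:

```python
import numpy as np

def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM heatmap from a conv layer's activations (C, H, W) and the
    gradients of the class score w.r.t. those activations (same shape)."""
    # alpha_k: global-average-pool the gradients over the spatial axes.
    weights = gradients.mean(axis=(1, 2))              # shape (C,)
    # Weighted combination of the activation maps, then ReLU.
    cam = np.tensordot(weights, activations, axes=1)   # shape (H, W)
    cam = np.maximum(cam, 0.0)
    # Normalize to [0, 1] for visualization.
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# Toy example: 2 channels of 4x4 activation maps.
acts = np.stack([np.ones((4, 4)), np.zeros((4, 4))])
grads = np.stack([np.full((4, 4), 0.5), np.full((4, 4), -0.5)])
heatmap = grad_cam(acts, grads)
print(heatmap.shape)  # (4, 4)
```

For family analysis as described above, one would compare such heatmaps across samples of the same family and look for recurring highlighted regions.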

Reviewer 2 Report

The authors have addressed my concerns.

Author Response

Reviewer #2
List of Comments and Suggestions.

Comment #1: The authors have addressed my concerns.
Response: We are really thankful for the opportunity to improve the paper. Thank you again for your support.
