Android Malware Classification Using Optimized Ensemble Learning Based on Genetic Algorithms

Round 1
Reviewer 1 Report
In general, I believe this is good research with relevant and interesting conclusions. However, the same does not apply to the paper itself.
The introduction is badly written, very difficult to read, and riddled with typos and inaccuracies. While there is an attempt at an introduction, the narrative goes sideways and does not center the discussion on justifying what the problem with Android malware is and what the authors propose to fix it. A major rewriting of this section is needed.
For example, in lines 43 and 44 of page 1: "To identify and categorize Android malware, security researchers employ a variety of technical ways. These types of studies can be classified as static, dynamic, or mixed" is a very complicated phrase that uses the wrong terms ("technical ways" should be "approaches" or "alternatives"; "studies" would also be better as "approaches" or "alternatives").
Furthermore, there are some incomplete claims; for example, static analysis is more limited than simply not recognising zero-day vulnerabilities. The references to ML algorithms are fine, but they would fit better in the literature review/methods section than in the introduction.
Regarding the proposed approach, I think the way the authors present it is very complicated. If I understand correctly, they propose an ensemble method to classify malware/goodware that uses a random forest classifier to make the final decision (using the predictions of the rest of the ensemble as features). If this is the case, why not put it this way?
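The architecture the reviewer describes, base classifiers whose predictions become the input features of a random forest meta-classifier, is stacking. The following is a minimal sketch of that idea with scikit-learn; the base learners and synthetic data here are illustrative assumptions, not the authors' exact configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in for a permission-based malware/goodware dataset.
X, y = make_classification(n_samples=1000, n_features=50, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Base learners: their (cross-validated) predictions become the
# feature vector seen by the final estimator.
base = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("svm", SVC(probability=True, random_state=0)),
    ("dt", DecisionTreeClassifier(random_state=0)),
]

# The random forest acts as the meta-learner making the final decision.
stack = StackingClassifier(
    estimators=base,
    final_estimator=RandomForestClassifier(random_state=0),
)
stack.fit(X_tr, y_tr)
print("test accuracy:", round(stack.score(X_te, y_te), 3))
```

Presenting the method in these terms (base learners + RF meta-learner) would make the contribution immediately recognizable.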
Regarding feature selection and the features used: how is the information gain ratio used? Are the authors performing feature selection based on it? If so, this should be explicitly explained.
Along the same line, I believe the authors should make a bigger effort on variable selection. It is not possible to know which features are used (the entire list, at least conceptually [e.g., permissions, intents, etc.]), which are the most relevant, or to get a grasp of all the features used. I see permissions, but what else is being used?
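An explicit description of the kind the reviewer asks for could be accompanied by a short sketch. Below, mutual information (the quantity underlying information gain) is used to rank binary permission-style features and keep the top k; the data and the choice of scorer are assumptions for illustration, since scikit-learn does not expose the gain *ratio* directly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Stand-in for binary permission features (1 = permission requested).
X, y = make_classification(n_samples=500, n_features=30, n_informative=5,
                           random_state=0)
X = (X > 0).astype(int)

# Score every feature against the malware/goodware label,
# then keep the 10 highest-scoring features.
selector = SelectKBest(mutual_info_classif, k=10).fit(X, y)
top = np.argsort(selector.scores_)[::-1][:10]
print("most informative feature indices:", top)

X_reduced = selector.transform(X)
print("reduced shape:", X_reduced.shape)
```

Reporting the resulting ranked list (permission names, not just indices) would answer the reviewer's question about which features matter most.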
Finally, talking about metrics:
Where is AUC explained among the performance metrics? Either explain it or do not use it.
Why do the authors not provide both train and test metrics? I assume that Table 3 provides test figures, but the train figures are important (e.g., it would be relevant to see whether the system is generalising correctly or overfitting).
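The overfitting check the reviewer requests amounts to computing the same metric on both splits and comparing. A minimal sketch (synthetic data and a plain random forest as stand-ins for the paper's model): AUC is the area under the ROC curve, where 1.0 means perfect ranking of malware above goodware and 0.5 means chance.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# AUC on the training split vs. the held-out test split.
auc_train = roc_auc_score(y_tr, clf.predict_proba(X_tr)[:, 1])
auc_test = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])

# A large train-test gap is the classic symptom of overfitting.
print(f"train AUC {auc_train:.3f}  test AUC {auc_test:.3f}")
```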
I do believe this paper has relevant and sound technical content behind it, with potential for publication, but it must receive in-depth restructuring, correction, and probably rewriting. I am therefore going to recommend a major revision, in which the authors should reconsider the entire structure, fix typos and inconsistencies, and make an effort to make the manuscript more readable (and complete).
Typos and other editing problems:
There is a significant number of typographic errors. For instance, in the abstract, "continues" should be "continuous"; in line 35 (page 1), it should be "complex malware that utilises new methods" rather than "complex malware that utilizing new methods".
Strange writing in lines 43 and 44 of page 1: "To identify and categorize Android malware, security researchers employ a variety of technical ways."
Inconsistency in line 227: "parameters optimization of the base learner, which is the RF algorithm" — was RF not the meta-learner?
Why are some section headings in full capital letters and others in regular title case? Consistency is lacking.
In addition, the paper lacks thousands separators (commas) in numbers, which makes them very difficult to read.
The graphs are very unprofessional. The image quality is really poor, and they appear to have been made in a hurry.
Author Response
Comment #1
The introduction is badly written, very difficult to read, and riddled with typos and inaccuracies. While there is an attempt at an introduction, the narrative goes sideways and does not center the discussion on justifying what the problem with Android malware is and what the authors propose to fix it. A major rewriting of this section is needed. For example, in lines 43 and 44 of page 1: "To identify and categorize Android malware, security researchers employ a variety of technical ways. These types of studies can be classified as static, dynamic, or mixed" is a very complicated phrase that uses the wrong terms ("technical ways" should be "approaches" or "alternatives"; "studies" would also be better as "approaches" or "alternatives"). Furthermore, there are some incomplete claims; for example, static analysis is more limited than simply not recognizing zero-day vulnerabilities. The references to ML algorithms are fine, but they would fit better in the literature review/methods section than in the introduction.
Response #1
We highly appreciate the time and effort the reviewer spent reviewing our manuscript, and we sincerely appreciate the valuable comments and suggestions that have helped us improve our paper. A native English speaker has improved the language of the paper. The language editing certificate is attached below.
Comment #2
Regarding the proposed approach, I think the way the authors present it is very complicated. If I understand correctly, they propose an ensemble method to classify malware/goodware that uses a random forest classifier to make the final decision (using the predictions of the rest of the ensemble as features). If this is the case, why not put it this way?
Response #2
We thank the reviewer for his great comments. To make the proposed approach clear, we have explained it in Section 3 of the revised manuscript, as suggested by the reviewer.
Comment #3
Regarding feature selection and the features used: how is the information gain ratio used? Are the authors performing feature selection based on it? If so, this should be explicitly explained. Along the same line, I believe the authors should make a bigger effort on variable selection. It is not possible to know which features are used (the entire list, at least conceptually [e.g., permissions, intents, etc.]), which are the most relevant, or to get a grasp of all the features used. I see permissions, but what else is being used?
Response #3
We thank the reviewer for his useful comment. We have updated the revised manuscript by explaining in Section 3.1 how the Information Gain algorithm is used to select the important permissions; we have also explained that the classification of malware and goodware is based on the Android permissions used.
Comment #4
Where is AUC explained among the performance metrics? Either explain it or do not use it.
Response #4
We thank the reviewer for his comment. As suggested, we have updated the revised manuscript by explaining the AUC in Section 3.5.
Comment #5
Why do the authors not provide both train and test metrics? I assume that Table 3 provides test figures, but the train figures are important (e.g., it would be relevant to see whether the system is generalising correctly or overfitting).
Response #5
We thank the reviewer for his comment. We have updated the revised manuscript by adding the training metrics to Table 3.
Comment #6
There is a significant number of typographic errors. For instance, in the abstract, "continues" should be "continuous"; in line 35 (page 1), it should be "complex malware that utilises new methods" rather than "complex malware that utilizing new methods".
Strange writing in lines 43 and 44 of page 1: "To identify and categorize Android malware, security researchers employ a variety of technical ways."
Inconsistency in line 227: "parameters optimization of the base learner, which is the RF algorithm" — was RF not the meta-learner?
Why are some section headings in full capital letters and others in regular title case? Consistency is lacking.
In addition, the paper lacks thousands separators (commas) in numbers, which makes them very difficult to read. The graphs are very unprofessional: the image quality is really poor, and they appear to have been made in a hurry.
Response #6
We thank the reviewer for his comment. We have improved the quality of the images, and a native English speaker has improved the language of the paper. The language editing certificate is attached below.
Author Response File: Author Response.docx
Reviewer 2 Report
Reviews: Sustainability - MDPI
Manuscript Number: sustainability-1956473
# Major Strong Points
The paper follows the usual structure, which helps the reader follow it easily.
# Major Weak Points
While the paper aims to be very descriptive, it misses out on focusing on its own research, e.g., on stating a very concise and comprehensive goal.
- A native English speaker should review the paper. There are many grammatical errors in the manuscript; for instance, in line 1 of the abstract, "The continues increase of …" should use "continuous" instead of "continues".
- In Section 1 (Introduction), line 29, it should be 83.8% rather than 83.8 percent.
- The referencing style should be consistent; the authors should use a proper referencing style such as APA or IEEE.
- The authors should cite more recent journal articles rather than conference papers.
- The authors should clearly state the research contributions. These are currently very naïve and lack important details.
- The authors should summarize the literature review in some form of comparison table to draw conclusions.
- There are many existing works on ensemble-based Android malware detection and classification; I fail to understand the novelty of this work.
- The "Drebin" dataset chosen by the authors contains samples collected between August 2010 and October 2012, which is now obsolete. I recommend that the authors work with newer datasets such as CICAndMal2017, CICMalDroid 2020, etc.
- Figure 1, representing the flowchart, needs to be modified, since it lacks significant details.
- The figures are not clear. They should be 600 DPI.
- Some important references in this domain are missing and need to be added from the recent literature, e.g., "Garg, S., & Baliyan, N. (2019). A novel parallel classifier scheme for vulnerability detection in android. Computers & Electrical Engineering, 77, 12-26."
- Transitions between the different sections are missing, which makes the manuscript difficult to comprehend. The authors are advised to revise the storyline of the manuscript to enhance its readability.
- Section 4 (Experimental Evaluation) needs a more detailed discussion.
Author Response
Comment #1
- A native English speaker should review the paper. There are many grammatical errors in the manuscript; for instance, in line 1 of the abstract, "The continues increase of …" should use "continuous" instead of "continues".
Response #1
We highly appreciate the time and effort the reviewer spent reviewing our manuscript, and we sincerely appreciate the valuable comments and suggestions that have helped us improve our paper.
A native English speaker has improved the language of the paper. The language editing certificate is attached below.
Comment #2
- In Section 1 (Introduction), line 29, it should be 83.8% rather than 83.8 percent.
Response #2
We thank the reviewer for his comment. We have updated the revised manuscript as suggested.
Comment #3
- The referencing style should be consistent; the authors should use a proper referencing style such as APA or IEEE.
- The authors should cite more recent journal articles rather than conference papers.
Response #3
We thank the reviewer for his comment. We have updated the revised manuscript as suggested.
Comment #4
- The authors should clearly state the research contributions. These are currently very naïve and lack important details.
Response #4
We thank the reviewer for his comment. As suggested by the reviewer, we have updated the revised manuscript by clearly stating the research contributions.
Comment #5
- The authors should summarize the literature review in some form of comparison table to draw conclusions.
Response #5
We thank the reviewer for his comment. We have updated the literature review in the revised manuscript and added the recent literature suggested by the reviewer.
Comment #6
- There are many existing works on ensemble-based Android malware detection and classification; I fail to understand the novelty of this work.
Response #6
We thank the reviewer for his comment. As suggested by the reviewer, we have updated the revised manuscript by clearly stating the research contributions.
Comment #7
- The "Drebin" dataset chosen by the authors contains samples collected between August 2010 and October 2012, which is now obsolete. I recommend that the authors work with newer datasets such as CICAndMal2017, CICMalDroid 2020, etc.
Response #7
We thank the reviewer for his comment. As suggested by the reviewer, we have updated the revised manuscript by noting that the dataset used includes samples from newer datasets.
Comment #8
- Figure 1, representing the flowchart, needs to be modified, since it lacks significant details. The figures are not clear. They should be 600 DPI.
Response #8
We thank the reviewer for his comment. We have updated the revised manuscript by making the figures clearer.
Comment #9
- Some important references in this domain are missing and need to be added from the recent literature, e.g., "Garg, S., & Baliyan, N. (2019). A novel parallel classifier scheme for vulnerability detection in android. Computers & Electrical Engineering, 77, 12-26."
Response #9
We thank the reviewer for his comment. We have updated the literature review in the revised manuscript and added the recent literature suggested by the reviewer.
Comment #10
- Transitions between the different sections are missing, which makes the manuscript difficult to comprehend; the authors are advised to revise the storyline of the manuscript to enhance its readability. Section 4 (Experimental Evaluation) needs a more detailed discussion.
Response #10
We thank the reviewer for his comment. We have revised the manuscript as suggested. Moreover, a native English speaker has improved the language of the paper. The language editing certificate is attached below.
Author Response File: Author Response.docx
Round 2
Reviewer 2 Report
The authors have made the required changes as suggested.