Advanced Financial Fraud Malware Detection Method in the Android Environment
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The comments are as follows:
- The study relies solely on data from a single South Korean bank, risking regional overfitting to local user behaviors or financial ecosystems. The lack of validation across diverse financial environments (e.g., payment habits, app designs in other countries) undermines the model’s generalizability.
- Malware samples are not classified by family or attack type (e.g., banking trojans, ransomware), preventing evaluation of the model’s effectiveness against specific advanced threats. This limits actionable insights for targeted defense strategies in real-world scenarios.
- The fixed 10% undersampling rate lacks theoretical justification, with no comparison to alternatives like SMOTE, ensemble learning, or cost-sensitive approaches. Random sampling may discard critical malicious patterns, compromising model robustness.
- While excluding app names due to multilingual complexity, the authors omit alternatives like Unicode pattern analysis or language-agnostic semantic extraction. This oversight may neglect critical features (e.g., spoofed bank names in app titles).
- No ablation study validates the independent impact of proposed features (e.g., user age/gender statistics). Performance gains could stem from data scale rather than feature innovation, leaving their necessity unproven.
- The model remains untested against financial malware evasion tactics (code obfuscation, dynamic loading, zero-day exploits). This overestimates real-world detection capability, particularly against APT-level threats.
- Partial dataset disclosure (GitHub subset) and incomplete feature engineering details (e.g., age/gender normalization) hinder independent verification. Full pipeline documentation and anonymization protocols are needed.
none
Author Response
Please see the attachment
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
1) Abstract – Please clarify the meaning of the following statement: “Moreover, 92 datasets were compiled through daily training 16 to select the optimal model, with five ML algorithms used to evaluate the proposed approach.”
2) Please avoid presenting too many numerical results in the abstract.
3) In my opinion, the authors should compress the statement of contributions into 3-4 main innovations provided by this work.
4) There is a typo in Fig. 3: “hyperprameter search”.
5) The following paper on dynamic analysis is missing from the literature review. The authors should also discuss that incremental learning is essential in malware analysis, since new malware signatures must be continually incorporated into the ML model, e.g.:
Xu, Xiaohu, et al. "Advancing malware detection in network traffic with self-paced class incremental learning." IEEE Internet of Things Journal (2024).
6) Beyond malware detection, it would be very useful if the authors could at least discuss how the proposed framework could also be applied to classify malware types, so that appropriate countermeasures can be taken.
7) The authors should clarify whether the real data from Bank A will be released (in anonymized form) for reproducibility purposes.
8) In addition to the end-to-end malware detection pipeline, it would be useful if the authors could provide some statistical evaluation (e.g. histograms) of the features derived from static analysis, to give a snapshot of the challenges associated with the dataset.
9) In Sec. 5, as a potential avenue of research, the authors may also want to mention explainable AI for interpreting the results of the proposed ML-based detection pipeline, e.g. following:
Nascita, Alfredo, et al. "A Survey on Explainable Artificial Intelligence for Internet Traffic Classification and Prediction, and Intrusion Detection." IEEE Communications Surveys & Tutorials (2024).
Comments on the Quality of English Language
Can be further improved.
Author Response
Please see the attachment
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
No further comments.
Comments on the Quality of English Language
none
Reviewer 2 Report
Comments and Suggestions for Authors
The authors have satisfactorily addressed my previous concerns.