Article

From Hand-Crafted Features to Large Language Models: A Comparative Evaluation of Android Malware Detection Paradigms

by Egemen Taşkın 1,* and İbrahim Alper Doğru 2
1 Department of Information Security Engineering, Graduate School of Natural and Applied Sciences, Gazi University, Ankara 06560, Turkey
2 Department of Computer Engineering, Faculty of Technology, Gazi University, Ankara 06560, Turkey
* Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(11), 5600; https://doi.org/10.3390/app16115600
Submission received: 7 May 2026 / Revised: 31 May 2026 / Accepted: 2 June 2026 / Published: 3 June 2026

Abstract

The rapid evolution of Android malware and increasingly sophisticated obfuscation techniques challenge traditional detection systems. This study presents a unified comparative evaluation of three methodological paradigms for static Android malware detection: classical machine learning, Transformer-based architectures, and generative Large Language Models (LLMs). We construct a balanced dataset of 12,000 APKs from the AndroZoo repository and implement a fold-independent experimental pipeline featuring constraint-aware sequence selection for Transformers and structured LLM-driven feature distillation with parameter-efficient fine-tuning (LoRA). All evaluations employ stratified 5-fold cross-validation with statistical significance testing and comprehensive resource profiling. Classical models (e.g., Random Forest) achieve strong baselines (~0.975 F1-score) but exhibit limited contextual resilience. Distilled Transformers (RoBERTa: ~0.970 F1-score) deliver the best accuracy-latency trade-off for real-time screening. Zero-shot LLMs show only moderate performance (~0.74–0.84 F1-score), whereas integrating LLM-extracted semantic features with LoRA fine-tuning yields the highest accuracy reported here (Qwen3.5-27B: ~0.982 F1-score), together with cross-dataset generalization and structured interpretability. Hallucination analysis reveals a manageable 7.7% rate, and ablation confirms that hallucinations have minimal impact on downstream classification. We advocate a tiered deployment strategy: lightweight Transformers for high-throughput screening, complemented by fine-tuned LLMs for deep forensic analysis and explainable threat intelligence. This hybrid framework balances computational efficiency, detection robustness, and operational interpretability for modern Android security pipelines.
Keywords: Android malware detection; static analysis; transformer models; large language models; LLM-based feature extraction
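
The evaluation protocol named in the abstract (stratified 5-fold cross-validation scored by F1) can be illustrated with a minimal, dependency-free sketch. This is not the authors' pipeline: the helper names `stratified_kfold` and `f1_score`, the round-robin fold assignment, the seed, and the toy labels are all assumptions made for illustration only.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs that preserve class ratios in each fold.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin so folds get near-equal shares.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for f, fold in enumerate(folds) if f != i for idx in fold
        )
        yield train_idx, test_idx


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the positive (here: malware) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy balanced label set (assumed): 10 malware (1) and 10 benign (0) samples.
labels = [1] * 10 + [0] * 10
splits = list(stratified_kfold(labels, k=5, seed=42))
```

In a real experiment, a classifier would be trained on each `train_idx` partition and `f1_score` computed on the matching `test_idx` partition, with the five fold scores then averaged and compared across models.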
