Mitigating Metamorphic Malware Through Adversarial Learning Techniques
Abstract
1. Introduction
- 1. How does the fitness function used in the EA influence the capability of the proposed method to discover new evasive variants?
- 2. How diverse is the set of newly produced variants with respect to the range of their behavioural signatures and their structural similarity to the original malware?
- 3. How do the 63 mainstream antivirus products (i.e., antivirus engines) differ in their ability to detect the newly generated malware variants across each of the malware families evaluated?
- 4. Which well-known Machine Learning (ML) models improve most in the classification of metamorphic malware when trained with the newly produced mutants?
- 5. Can a transformer model such as BERT, which has been pretrained on large-scale NLP corpora, be applied in a transfer learning setting to enhance the classification of metamorphic malware when the newly generated mutants are incorporated into the training data?
2. Related Work
3. Methodology (Learning to Defend Against Metamorphic Malware Using Adversarial Samples)
- Step one: An understanding of the current samples is essential. This includes understanding what kind of data is required for analysis. For instance, the samples might comprise mobile malware, such as an Android malware source. The framework can also be adapted to other classes of malware, depending on the underlying platform in which the samples run, for example, desktop-focused threats, or to other operating systems, including iOS, where suitable intermediate representations and analysis tools are available.
- Step two: This step involves reverse engineering the malware into a form from which variants can be easily created, which requires a disassembly tool. Apktool [38] is employed in this work to disassemble Android applications (APK files) into their smali representation. Alternative disassembly utilities may be used depending on the platform on which the malware executes, for example, Portable Executable (PE) viewers (e.g., CFF Explorer [39]) for desktop-based malware.
- Step three: At this stage, the disassembled malware is modified under the guidance of a fitness function to produce new mutant variants. The smali representation is treated as editable code, and mutation operators are applied to introduce controlled transformations. Although this process can, in principle, be generalised to any code-like intermediate representation, the specific operators must be tailored to the syntax and semantics of the target platform. In this work, transformations were informed by properties of the malware, such as structural layout, behavioural profile, and its ability to evade existing detection engines. Other characteristics could equally be incorporated to guide the search for malicious and diverse mutants.
- Step four: The final stage employs the mutants produced in Step three to train machine-learning models to improve protection against future variants. This stage does not require designing new ML architectures; standard models with well-understood complexities can be employed. In this study, both feature-based and sequence-based classifiers were used, although alternative ML models could also be applied within the same framework.
3.1. Evolutionary Algorithm
| Algorithm 1 Evolutionary Algorithm [15] |
| 1: $P \leftarrow$ initialise_random_population() ▹ created by mutating original malware |
| 2: evaluate($P$) ▹ score each malware variant based on the chosen fitness measure |
| 3: $w \leftarrow \min(P)$ ▹ the worst member of the population |
| 4: while Maximum number of iterations not reached do |
| 5: $p \leftarrow$ select_parent($P$) |
| 6: $c \leftarrow$ mutate($p$) |
| 7: evaluate($c$) |
| 8: if $\mathrm{fitness}(c) > \mathrm{fitness}(w)$ then |
| 9: $P \leftarrow P \cup \{c\}$ ▹ add child to the population |
| 10: remove_worst($P$) ▹ remove least fit mutant |
| 11: $w \leftarrow \min(P)$ |
| 12: end if |
| 13: end while |
| 14: return $P$ |
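The steady-state loop of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: `steady_state_ea`, `tournament`, `toy_mutate`, and `toy_fitness` are hypothetical names, individuals are plain integers instead of disassembled applications, and higher fitness is assumed to be better. The default values follow the evolutionary parameters listed in this paper (tournament size 5, population size 20, 100 iterations).

```python
import random
from typing import Callable, List, Optional, Tuple, TypeVar

T = TypeVar("T")
Scored = Tuple[float, T]  # (fitness, individual)


def tournament(population: List[Scored], k: int, rng: random.Random) -> T:
    """Pick k individuals at random and return the fittest of them."""
    contestants = rng.sample(population, min(k, len(population)))
    return max(contestants, key=lambda scored: scored[0])[1]


def steady_state_ea(
    seed: T,
    mutate: Callable[[T, random.Random], T],
    fitness: Callable[[T], float],
    pop_size: int = 20,
    iterations: int = 100,
    k: int = 5,
    rng: Optional[random.Random] = None,
) -> List[Scored]:
    """Steady-state EA: one child per iteration, replacing the worst member."""
    rng = rng or random.Random()
    # Initial population: each member is a single mutation of the seed.
    population: List[Scored] = []
    for _ in range(pop_size):
        individual = mutate(seed, rng)
        population.append((fitness(individual), individual))
    for _ in range(iterations):
        parent = tournament(population, k, rng)
        child = mutate(parent, rng)
        child_score = fitness(child)
        # Index of the current worst member (lowest fitness).
        worst = min(range(len(population)), key=lambda i: population[i][0])
        # The child enters only if it beats the worst member,
        # so the population size stays constant.
        if child_score > population[worst][0]:
            population[worst] = (child_score, child)
    return population


# Toy stand-ins (assumptions for illustration only): integers mutated by +/-1,
# with fitness peaking at the arbitrary target value 10.
def toy_mutate(x: int, rng: random.Random) -> int:
    return x + rng.choice((-1, 1))


def toy_fitness(x: int) -> float:
    return -abs(x - 10)
```

Because a child is admitted only when it improves on the current worst member, the worst fitness in the population never decreases from one iteration to the next.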
3.2. Initialisation
3.3. Selection of Parent Malware
3.4. Smali-Level Mutation Operators and Execution Constraints
- 1. M1: Instruction reordering (control-flow no-op insertion). We inject inert control-flow blocks immediately after local-register declarations (e.g., .locals 3), using label patterns goto :goto_k, :cond_k, :goto_k and a benign const-string assignment. This perturbs basic-block layout and CFG structure without affecting semantics (instructionreordering).
- 2. M2: Garbage (junk) code insertion. Following a .locals line, we inject benign statements such as .line 0 and const-string v0, " " that alter the textual structure but not the program's logic (garbagecodeinsert).
- 3. M3: Variable/register renaming. We rename registers within method scope while respecting .locals bounds and type uses, preserving method signatures and avoiding register overflow (variablerename).
3.5. Fitness Functions (Quantitative Definitions)
On Single-Objective vs. Multi-Objective Optimisation
| Algorithm 2 Steady-State EA for Smali-level Malware Mutation |
| 1: Input: parent malware m; objective $f$; population size P; iterations T |
| 2: Output: validated mutants with fitness values and per-engine signatures |
| 3: Initialise population $\mathcal{P}$ by mutating m once; validate (rebuild/sign/align → emulator → Droidbox); keep only feasible individuals |
| 4: for $t = 1$ to T do |
| 5: Select parent $p \in \mathcal{P}$ (e.g., tournament) |
| 6: Sample operator from {M1, M2, M3} uniformly; apply to p to obtain x |
| 7: Validate x: rebuild/sign/align; emulator launch; Droidbox maliciousness |
| 8: if x infeasible then |
| 9: assign worst fitness and discard |
| 10: else |
| 11: evaluate $f(x)$; insert x into $\mathcal{P}$ with steady-state replacement |
| 12: end if |
| 13: end for |
| 14: return $\mathcal{P}$ |
4. Experiments and Discussion
4.1. Malware Samples for Evaluation
4.1.1. Ethical and Security Safeguards
4.1.2. Dataset Composition and Validation
4.2. Evolutionary Algorithm-Method
4.2.1. Evolutionary Parameters
Random Search Baseline
4.2.2. Collection of Relevant Metrics
- AV engine: The AV engine used in this work is VirusTotal, and the function DR(x) derives its value from it. VirusTotal is an online malware-scanning platform that aggregates the results of more than 70 antivirus engines, and it is used in this work to evaluate the evasiveness of the generated mutants. The value produced by this function is derived from the proportion of antivirus engines that flag a variant and is normalised to lie between 0 and 1, where 0 means the variant evaded all the antivirus engines and 1 means the variant was detected by all of them. The information regarding each mutant's evasion score is also retained. Although VirusTotal now aggregates over 70 antivirus/analysis engines, at the time our experiments were executed 63 engines were consistently exposed via our scanning workflow, and all DR computations were normalised to 63 for comparability across treatments.
- Collecting the behavioural trace: Strace [53] is employed to capture the runtime behaviour of each variant by logging its system calls. The primary activity of the application is triggered using MonkeyRunner [54] to emulate user interaction. The output produced by Strace consists of a log containing the process ID together with the invoked system calls and their arguments. This log is subsequently converted into a fixed-length vector in which each element corresponds to the frequency of a specific system call. As the analysis considers 251 system calls, the resulting vector contains 251 entries. The behavioural similarity BS(x) is then computed as the cosine similarity between the system-call frequency vector of a variant and that of its corresponding original malware sample.
- Libraries for structural similarity SS(x): To assess text-level similarity between the original malware samples and their mutants, the following metrics are employed:
  - Cosine similarity: Computes the cosine of the angle between two non-zero vectors.
  - Levenshtein distance: Measures the minimum number of deletions, insertions, or substitutions required to transform file A into file B [55].
  - FuzzyWuzzy: Performs approximate string matching to identify near-similar text patterns [56].

  For source-code-level similarity, we apply:
  - JPlag: Tokenises both programs and identifies matching token sequences from the largest to the smallest [44].
  - Sherlock: Generates signatures of both programs and computes similarity between these signatures [44].
  - Normalised Compression Distance (NCD): Estimates similarity by comparing compression lengths of files. Given the original malware m and a variant v, the compression distance is defined as [57]: $$\mathrm{NCD}_Z(m, v) = \frac{Z(mv) - \min\{Z(m), Z(v)\}}{\max\{Z(m), Z(v)\}}$$ where $Z(m)$ is the compressed size of file m under compressor Z and $Z(mv)$ is the compressed size of the concatenation of m and v. In this work, 7-Zip is used as the compression engine.
- Each of the similarity metrics produces a value between 0 and 1, where a score of 1 means the original malware and the variant are identical, and 0 means the original malware and the variant are completely different. The average of these metrics is computed, and that represents the structural similarity. These metrics were implemented using various Python 2.7 and glibc 2.23 libraries.
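The detection-rate and behavioural-similarity computations described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code used in the experiments: the function names are hypothetical, the per-engine verdicts are assumed to be already available as a mapping from engine name to a boolean, and the Strace log is assumed to have been reduced to a list of system-call names. A three-name system-call list stands in for the 251 calls considered in the paper.

```python
import math
from typing import List, Mapping, Sequence

N_ENGINES = 63  # number of engines used for normalisation in this work


def detection_rate(verdicts: Mapping[str, bool], n_engines: int = N_ENGINES) -> float:
    """Fraction of engines flagging the sample: 0.0 evades all, 1.0 detected by all."""
    if len(verdicts) != n_engines:
        raise ValueError("expected one verdict per engine")
    return sum(1 for flagged in verdicts.values() if flagged) / n_engines


def syscall_vector(calls: Sequence[str], syscall_names: Sequence[str]) -> List[int]:
    """Fixed-length frequency vector; calls outside syscall_names are ignored."""
    index = {name: i for i, name in enumerate(syscall_names)}
    vector = [0] * len(syscall_names)
    for call in calls:
        if call in index:
            vector[index[call]] += 1
    return vector


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; defined as 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Under these assumptions, BS(x) for a variant would be `cosine_similarity` applied to the frequency vector of the variant and that of its original sample. Returning 0.0 for an all-zero vector is a design choice of this sketch, since cosine similarity is undefined in that case.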
Structural Similarity Aggregation and Sensitivity
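As a rough illustration of the compression-based metric and of averaging several similarity scores into a single structural-similarity value, consider the sketch below. It makes two assumptions that differ from the experimental setup: Python's standard `zlib` compressor is swapped in for 7-Zip, and the distance is converted to a similarity as one minus the NCD so that 1 means identical, matching the convention stated for the other metrics. The function names are hypothetical.

```python
import zlib
from typing import Sequence


def compressed_size(data: bytes) -> int:
    """Compressed length in bytes (zlib at maximum level stands in for 7-Zip)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalised Compression Distance between two byte strings."""
    zx, zy = compressed_size(x), compressed_size(y)
    zxy = compressed_size(x + y)
    return (zxy - min(zx, zy)) / max(zx, zy)


def ncd_similarity(x: bytes, y: bytes) -> float:
    """Assumed conversion to a similarity in [0, 1], where 1 means identical."""
    return min(1.0, max(0.0, 1.0 - ncd(x, y)))


def structural_similarity(scores: Sequence[float]) -> float:
    """Unweighted mean of per-metric similarity scores, each in [0, 1]."""
    if not scores:
        raise ValueError("at least one similarity score is required")
    return sum(scores) / len(scores)
```

With a real compressor the NCD of a file with itself is small but not exactly zero, which is why the similarity is clamped to the unit interval before averaging.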
VirusTotal Temporal Stability and Reproducibility
Practical Runtime and Scalability Considerations
4.3. Evolutionary Algorithm—Results and Analysis
4.3.1. Influence of Fitness Function on Evasiveness of Evolved Mutants
- Dougalek: Figure 3 shows that all three fitness functions produce mutants that are consistently more evasive than the original malware. While 40.3% of AV engines fail to detect the original Dougalek sample, the best EA-generated variants evade 72%, 66.7%, and 67.3% of engines under DR(x), BS(x), and SS(x), respectively. Notably, even when the EA does not explicitly optimise for evasiveness (i.e., using BS(x) or SS(x)), it still yields highly evasive variants. The median detection-failure rates for EA-generated mutants range from 62.1% to 69.4%, depending on the objective. In contrast, the Random Search baseline achieves a median of 58.5%, demonstrating that the EA provides a measurable advantage.
- Droidkungfu: A similar pattern appears in Figure 4. The original sample evades 65% of engines, whereas the strongest mutants produced by the EA evade 94%, 82.1%, and 83% of engines for DR(x), BS(x), and SS(x), respectively. Median evasion rates for EA-generated mutants lie between 73.2% and 87.3%, while the Random Search baseline achieves only 63.1%, again showing that guided evolution offers a clear benefit.
- GGTracker: Figure 5 demonstrates that all objectives again produce mutants that outperform the original malware in terms of evasiveness. Only 38.3% of engines fail to detect the original GGTracker sample, while the best evolved variants evade 73.3%, 62.1%, and 62.1% of engines for DR(x), BS(x), and SS(x), respectively. The similarity of the boxplots for all three fitness functions indicates that substantial evasion can arise even when the EA primarily targets behavioural or structural diversification. From Figure 5, the median detection-failure rate for GGTracker mutants produced by the EA ranges from 61.4% to 66%, depending on the fitness function. In contrast, the Random Search baseline (RS(x)) attains a median rate of 61.4%. The similarity between RS(x) and the outcomes for BS(x) and SS(x) indicates that, for this family, these two objectives do not surpass the random baseline when used in an evolutionary setting. However, Figure 5 also shows that EA runs optimising DR(x) yield clearly higher evasiveness than RS(x).
4.3.2. Analysis of the Evasion Characteristics of the New Mutants
- Dougalek: Figure 6 shows the extent to which each engine fails against mutants when evolved under the three objectives. Under DR(x), fourteen engines, including AVG and Tencent, detect all mutants consistently, whereas seventeen engines, such as AVware and McAfee, fail to detect every mutant. For BS(x), nineteen engines (e.g., AVG and Fortinet) remain robust, while eleven engines (including GData and McAfee) fail on all mutants. Under SS(x), sixteen engines (e.g., Symantec Mobile and Fortinet) detect all variants, whereas seventeen engines, such as McAfee-GW and BitDefender, miss every mutant.
- Droidkungfu: As shown in Figure 7, three engines, including Avast Mobile and NANO Antivirus, detect all DR(x) mutants, while twelve engines (e.g., Fortinet and Kaspersky) fail to detect any of them. For BS(x), six engines (including Avast Mobile and NANO Antivirus) succeed on all mutants, whereas seven engines (e.g., Symantec Mobile and Tencent) fail completely. Under SS(x), seven engines such as AVG and Cyren detect all variants, while eleven engines (including Kaspersky and ZoneAlarm) miss every mutant.
- GGtracker: Figure 8 shows that nine engines (e.g., CAT-QuickHeal and DrWeb) detect all DR(x) mutants, whereas thirteen (such as Arcabit and BitDefender) fail consistently. Under BS(x), fifteen engines, including K7GW and Kaspersky, detect every mutant, while sixteen engines (e.g., AVware and TrendMicro) miss them all. Similarly, SS(x) yields sixteen engines (such as Avast and AVG) detecting all mutants, while eighteen (including Cyren and McAfee) fail across all samples.
- percentage of unique detection signatures,
- percentage of unique behavioural signatures,
- structural similarity.
- (i) Detection-signature diversity: A detection signature is represented as a 63-dimensional vector d, where $d_i = 1$ if engine i detects the variant and $d_i = 0$ otherwise. For each subset $S_{m,f}$, denoting variants from malware family m evolved under fitness objective f, we compute the proportion of unique detection signatures. Of the 76 malicious mutants generated across all classes and objectives, removal of duplicates yields 46 unique detection vectors. Results in Table 4 show that Droidkungfu produces the greatest uniqueness for DR(x) (90%) and BS(x) (89%), while Dougalek produces the highest uniqueness under SS(x) (50%). To visualise the dispersion of these signatures, we project all 63-dimensional vectors into two dimensions using t-SNE [58]. This method emphasises the preservation of local neighbourhood structure, which helps in interpreting the degree of diversity. Figure 9 illustrates the resulting clusters. Droidkungfu mutants form a distinguishable cluster in the upper-left region of the plot, whereas variants from Dougalek and GGTracker are more intermixed but still exhibit sub-clustering based on the generating fitness function.
- (ii) Behavioural-signature diversity: Each variant's behavioural signature is a system-call frequency vector b of length 251. Using the same process applied to detection signatures, we count unique behavioural vectors per subset. Table 5 shows that Dougalek achieves 100% uniqueness under both DR(x) and BS(x). GGTracker is most diverse under SS(x). Notably, across all experiments, only 33 out of 251 system calls ever appear with non-zero frequency, regardless of malware family or fitness function, yet these combinations still yield substantial behavioural differentiation. Overall, behavioural signatures show greater variance than detection signatures, though both exhibit clear and meaningful diversity.
- (iii) Structural diversity: Structural similarity scores between all pairwise combinations of mutants are computed, taking values between 0 (completely different) and 1 (identical). Heatmaps of these pairwise scores are shown in Figure 10, Figure 11 and Figure 12. The results indicate that for GGTracker, SS(x) produces the highest structural diversity. For Droidkungfu, BS(x) yields the most variation. For Dougalek, SS(x) again produces the most structurally distinct mutants. Across families, Droidkungfu consistently shows the greatest spread in structural differences, whereas Dougalek mutants exhibit the least structural variation overall.

Future direction: towards unified diversity metrics. In this paper, we examine diversity along three axes, detection-signature diversity, behavioural dispersion, and structural dispersion, using the quantitative measures already reported in Table 4 and Table 5, and Figure 10, Figure 11 and Figure 12. While these metrics are analysed separately to preserve methodological clarity, an interesting future direction would be to explore unified or composite diversity indices that combine these components into a single descriptive measure. Such an extension could enable more direct quantitative comparisons with evasiveness across objectives and families, but it lies outside the scope of the present analysis.

Quantitative metrics and visualisation. In addition to the t-SNE visualisations, our interpretation of diversity relies on the quantitative measures already reported above: (i) the proportion of unique 63-bit detection signatures, (ii) the mean pairwise behavioural distance based on (1−cosine) over syscall-frequency vectors, and (iii) the mean pairwise structural distance based on (1−SS). These numerical indicators form the basis of our comparative analysis across evolutionary treatments, while the t-SNE plots serve only as an illustrative projection of these underlying relationships rather than a source of statistical evidence.
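The first two quantitative indicators named above, the proportion of unique signatures and the mean pairwise (1−cosine) distance, can be computed along the following lines. This is a hedged sketch with hypothetical function names; signatures are assumed to be equal-length sequences of numbers, and short toy vectors would stand in for the 63-dimensional detection vectors and 251-dimensional behavioural vectors.

```python
import math
from itertools import combinations
from typing import Callable, Sequence


def percent_unique(signatures: Sequence[Sequence[int]]) -> float:
    """Percentage of distinct signature vectors within a subset of mutants."""
    if not signatures:
        raise ValueError("at least one signature is required")
    distinct = {tuple(signature) for signature in signatures}
    return 100.0 * len(distinct) / len(signatures)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; treated as 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def mean_pairwise_distance(
    vectors: Sequence[Sequence[float]],
    distance: Callable[[Sequence[float], Sequence[float]], float],
) -> float:
    """Average distance over all unordered pairs; 0.0 when fewer than two vectors."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(distance(a, b) for a, b in pairs) / len(pairs)
```

As a point of reference for the first indicator, 46 unique detection vectors out of 76 mutants corresponds to roughly 60.5% uniqueness overall.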
4.3.3. Explicit Diversity Metrics and Evolution Benefits Matrix
4.3.4. Contextualizing Our EA Method Within Existing Literature
4.4. Machine Learning—Method
4.4.1. Feature Extraction and Preprocessing Pipeline
4.4.2. Model Architectures and Training
Explanation of Computational Characteristics
4.5. Machine Learning—Results and Analysis
4.5.1. Enhancing Metamorphic Malware Classification with Evolved Mutants
4.5.2. Enhancing Malware Classification via Transfer Learning with BERT and the Evolved Mutants
4.5.3. Superior Cost-Efficiency Through Transfer Learning
4.5.4. Statistical Reporting for Diversity and Classification
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- CrowdStrike. CrowdStrike 2024 Global Threat Report. Available online: https://www.crowdstrike.com/en-us/resources/reports/crowdstrike-2024-global-threat-report/ (accessed on 15 September 2024).
- SOPHOS. Sophos 2024 Threat Report: Cyberthreats to Small Businesses Are Expanding Beyond Ransomware. Here’s What You Need to Know. Available online: https://assets.sophos.com/X24WTUEQ/at/wwf5phjtj9bjvmpqqsbfxc/sophos-2024-threat-report.pdf (accessed on 15 September 2024).
- SOPHOS. Sophos 2022 Threat Report: Gravitational Force of Ransomware Black Hole Pulls in Other Cyberthreats to Create One Massive, Interconnected Ransomware Delivery System. Available online: https://www.sophos.com/en-us/press/press-releases/2021/11/sophos-2022-threat-report#:~:text=Sophos%2C%20a%20global%20leader%20in,significant%20implications%20for%20IT%20security (accessed on 25 May 2022).
- Brezinski, K.; Ferens, K. Metamorphic Malware and Obfuscation: A Survey of Techniques, Variants, and Generation Kits. Secur. Commun. Netw. 2023, 2023, 8227751.
- Zuo, Z.H.; Zhu, Q.X.; Zhou, M.T. On the time complexity of computer viruses. IEEE Trans. Inf. Theory 2005, 51, 2962–2966.
- F-Secure. 2014 Mobile Threat Report. Available online: https://www.infopoint-security.de/medien/f_secure_mobile_threat_report_q1_2014_print_version.pdf (accessed on 19 July 2019).
- Maiorca, D.; Ariu, D.; Corona, I.; Aresu, M.; Giacinto, G. Stealth attacks: An extended insight into the obfuscation effects on Android malware. Comput. Secur. 2015, 51, 16–31.
- Hasan, R.; Biswas, B.; Samiun, M.; Saleh, M.A.; Prabha, M.; Akter, J.; Joya, F.H.; Abdullah, M. Enhancing malware detection with feature selection and scaling techniques using machine learning models. Sci. Rep. 2025, 15, 9122.
- Hawana, A.; Hassan, E.S.; El-Shafai, W.; El-Dolil, S.A. Enhancing malware detection with deep learning convolutional neural networks: Investigating the impact of image size variations. Secur. Priv. 2025, 8, e70000.
- Roy, S.; Bhanja, S.; Das, A. AndyWar: An intelligent android malware detection using machine learning. Innov. Syst. Softw. Eng. 2025, 21, 303–311.
- Alomar, A.; AlJarullah, A.; Abu-Ghazalah, S. Permissions-based Android malware detection using machine learning. Neural Comput. Appl. 2025, 37, 5255–5270.
- Wasif, M.S.; Miah, M.P.; Hossain, M.S.; Alenazi, M.J.; Atiquzzaman, M. CNN-ViT synergy: An efficient Android malware detection approach through deep learning. Comput. Electr. Eng. 2025, 123, 110039.
- Lunghi, D.; Simitsis, A.; Caelen, O.; Bontempi, G. Adversarial Learning in Real-World Fraud Detection: Challenges and Perspectives. In DEC ’23: Proceedings of the Second ACM Data Economy Workshop; Association for Computing Machinery: New York, NY, USA, 2023; pp. 27–33.
- Eiben, A.E.; Smith, J.E. What is an Evolutionary Algorithm? In Introduction to Evolutionary Computing; Springer: Berlin/Heidelberg, Germany, 2003; pp. 15–35.
- Babaagba, K.O.; Tan, Z.; Hart, E. Nowhere Metamorphic Malware Can Hide—A Biological Evolution Inspired Detection Scheme. In Proceedings of the Dependability in Sensor, Cloud, and Big Data Systems and Applications; Wang, G., Bhuiyan, M.Z.A., De Capitani di Vimercati, S., Ren, Y., Eds.; Springer: Singapore, 2019; pp. 369–382.
- Babaagba, K.O.; Tan, Z.; Hart, E. Improving Classification of Metamorphic Malware by Augmenting Training Data with a Diverse Set of Evolved Mutant Samples. In Proceedings of the 2020 IEEE Congress on Evolutionary Computation (CEC); IEEE: Piscataway, NJ, USA, 2020; pp. 1–7.
- Babaagba, K.O. Application of Evolutionary Machine Learning in Metamorphic Malware Analysis and Detection. Ph.D. Thesis, Edinburgh Napier University, Edinburgh, UK, 2022.
- Habib, F.; Shirazi, S.H.; Aurangzeb, K.; Khan, A.; Bhushan, B.; Alhussein, M. Deep Neural Networks for Enhanced Security: Detecting Metamorphic Malware in IoT Devices. IEEE Access 2024, 12, 48570–48582.
- Babaagba, K.O.; Tan, Z.; Hart, E. Automatic Generation of Adversarial Metamorphic Malware Using MAP-Elites. In Proceedings of the Applications of Evolutionary Computation: 23rd European Conference, EvoApplications 2020, Held as Part of EvoStar 2020, Seville, Spain, 15–17 April 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 117–132.
- Zhan, D.; Liu, X.; Bai, W.; Li, W.; Guo, S.; Pan, Z. GAME-RL: Generating Adversarial Malware Examples Against API Call Based Detection via Reinforcement Learning. IEEE Trans. Dependable Secur. Comput. 2025, 22, 5431–5447.
- Manju; Rana, C. Application of Deep Reinforcement Learning in Adversarial Malware Detection. In Deep Reinforcement Learning and Its Industrial Use Cases; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2024; Chapter 5; pp. 91–113.
- Lee, J.; Austin, T.H.; Stamp, M. Compression-based analysis of metamorphic malware. Int. J. Secur. Netw. 2015, 10, 124–136.
- Charoenthanakitkul, A.; Viboonsang, P.; Kosolsombat, S. Optimizing Malware Detection with Random Forest, XGBoost, LightGBM, and LLM-Reporting. In Proceedings of the 2025 IEEE International Conference on Cybernetics and Innovations (ICCI); IEEE: Piscataway, NJ, USA, 2025; pp. 1–5.
- Choudhary, S.P.; Vidyarthi, M.D. A Simple Method for Detection of Metamorphic Malware using Dynamic Analysis and Text Mining. Procedia Comput. Sci. 2015, 54, 265–270.
- Baysa, D.; Low, R.M.; Stamp, M. Structural entropy and metamorphic malware. J. Comput. Virol. 2013, 9, 179–192.
- Armoun, S.E.; Hashemi, S. A General Paradigm for Normalizing Metamorphic Malwares. In Proceedings of the 2012 10th International Conference on Frontiers of Information Technology; IEEE: Piscataway, NJ, USA, 2012; pp. 348–353.
- Zheng, M.; Lee, P.P.C.; Lui, J.C.S. ADAM: An Automatic and Extensible Platform to Stress Test Android Anti-virus Systems. In Proceedings of the Detection of Intrusions and Malware, and Vulnerability Assessment; Flegel, U., Markatos, E., Robertson, W., Eds.; Springer: Berlin/Heidelberg, Germany, 2013; pp. 82–101.
- Rastogi, V.; Chen, Y.; Jiang, X. DroidChameleon: Evaluating Android Anti-malware Against Transformation Attacks. In ASIA CCS ’13: Proceedings of the 8th ACM SIGSAC Symposium on Information, Computer and Communications Security; Association for Computing Machinery: New York, NY, USA, 2013; pp. 329–334.
- Nawaz, M.S.; Fournier-Viger, P.; Nawaz, M.Z.; Chen, G.; Wu, Y. Metamorphic Malware Behavior Analysis Using Sequential Pattern Mining. In Proceedings of the Machine Learning and Principles and Practice of Knowledge Discovery in Databases; Springer: Cham, Switzerland, 2021; pp. 90–103.
- Jha, A.K.; Vaish, A.; Patil, S. A Novel Framework for Metamorphic Malware Detection. SN Comput. Sci. 2022, 4, 10.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144.
- Li, Z.; Wang, P.; Wang, Z. FlowGANAnomaly: Flow-Based Anomaly Network Intrusion Detection with Adversarial Learning. Chin. J. Electron. 2024, 33, 58–71.
- Aydogan, E.; Sen, S. Automatic Generation of Mobile Malwares Using Genetic Programming. In Applications of Evolutionary Computation; Springer: Cham, Switzerland, 2015; pp. 745–756.
- Xu, W.; Qi, Y.; Evans, D. Automatically Evading Classifiers—A Case Study on PDF Malware Classifier. NDSS 2016, 2016, 1–15.
- Javaheri, D.; Lalbakhsh, P.; Hosseinzadeh, M. A Novel Method for Detecting Future Generations of Targeted and Metamorphic Malware Based on Genetic Algorithm. IEEE Access 2021, 9, 69951–69970.
- Bala, Z.; Zambuk, F.U.; Ya’u Imam, B.; Ya’u Gital, A.; Shittu, F.; Aliyu, M.; Abdulrahman, M.L. Transfer Learning Approach for Malware Images Classification on Android Devices Using Deep Convolutional Neural Network. Procedia Comput. Sci. 2022, 212, 429–440.
- Raza, A.; Qaisar, Z.H.; Aslam, N.; Faheem, M.; Ashraf, M.W.; Chaudhry, M.N. TL-GNN: Android Malware Detection Using Transfer Learning. Appl. AI Lett. 2024, 5, e94.
- APKTOOL. APKTOOL. Available online: http://ibotpeaches.github.io/Apktool (accessed on 26 February 2019).
- NTCore. CFF Explorer. Available online: https://ntcore.com/?page_id=388 (accessed on 10 June 2021).
- NTCore. Structure of a Smali. Available online: https://pysmali.readthedocs.io/en/latest/api/smali/language.html (accessed on 3 June 2020).
- The Honeynet Project. Droidbox. Available online: https://github.com/pjlantz/droidbox (accessed on 19 February 2019).
- Bakurov, I.; Murphy, A.; Ofria, C.; Banzhaf, W. A comparison of tournament and lexicase selection paradigms in regression problems: Error-based fitness versus correlation fitness. In GECCO ’25: Proceedings of the Genetic and Evolutionary Computation Conference; Association for Computing Machinery: New York, NY, USA, 2025; pp. 970–979.
- VTDOC. VirusTotal. Available online: https://developers.virustotal.com/reference#getting-started (accessed on 10 October 2023).
- Heres, D. Source Code Plagiarism Detection using Machine Learning. Ph.D. Thesis, Utrecht University, Utrecht, The Netherlands, 2017.
- Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 2002, 6, 182–197.
- García-Teodoro, P.; Gómez-Hernández, J.; Abellán-Galera, A. Multi-labeling of complex, multi-behavioral malware samples. Comput. Secur. 2022, 121, 102845.
- Zhou, Y.; Jiang, X. Android Malware Genome Project. Available online: http://www.malgenomeproject.org/ (accessed on 19 July 2019).
- Zhou, Y.; Jiang, X. Dissecting Android Malware: Characterization and Evolution. In Proceedings of the 2012 IEEE Symposium on Security and Privacy; IEEE: Piscataway, NJ, USA, 2012; pp. 95–109.
- F-Secure. Trojan: Android/DroidKungFu.C. Available online: https://www.f-secure.com/v-descs/trojan-android-droidkungfu-c.shtml (accessed on 19 July 2019).
- F-Secure. Trojan: Android/GGTracker.A. Available online: https://www.f-secure.com/v-descs/trojan_android_ggtracker.shtml (accessed on 19 July 2019).
- TRENDMICRO. ANDROIDOS_DOUGALEK.A. Available online: https://www.trendmicro.com/vinfo/us/threat-encyclopedia/malware/androidos_dougalek.a (accessed on 19 July 2019).
- Babaagba, K.O.; Tan, Z.; Hart, E. Improving Classification of Metamorphic Malware. Available online: https://github.com/KehindeOloye/Improving-Classification-of-Metamorphic-Malware.git (accessed on 3 February 2020).
- Linux.die.net. Strace(1)-Linux Man Page. Available online: https://linux.die.net/man/1/strace (accessed on 12 April 2023).
- Developers. UI/Application Exerciser Monkey. Available online: https://developer.android.com/studio/test/monkey (accessed on 12 April 2023).
- Arockiya Jerson, J.; Preethi, N. An Analysis of Levenshtein Distance Using Dynamic Programming Method. In Proceedings of the 3rd International Conference on Recent Trends in Machine Learning, IoT, Smart Cities and Applications; Gunjan, V.K., Zurada, J.M., Eds.; Springer: Singapore, 2023; pp. 525–532.
- Dhakal, A.; Poudel, A.; Pandey, S.; Gaire, S.; Baral, H.P. Exploring Deep Learning in Semantic Question Matching. In Proceedings of the 2018 IEEE 3rd International Conference on Computing, Communication and Security (ICCCS); IEEE: Piscataway, NJ, USA, 2018; pp. 86–91.
- Ragkhitwetsagul, C.; Krinke, J.; Clark, D. A comparison of code similarity analysers. Empir. Softw. Eng. 2018, 23, 2464–2519.
- Gove, R.; Cadalzo, L.; Leiby, N.; Singer, J.M.; Zaitzeff, A. New guidance for using t-SNE: Alternative defaults, hyperparameter selection automation, and comparative evaluation. Vis. Inform. 2022, 6, 87–97.
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015; pp. 1–13.
- Keras_Team. Keras. Available online: https://github.com/keras-team/keras (accessed on 21 August 2022).
- Arun S. Maiya. Ktrain. Available online: https://github.com/amaiya/ktrain (accessed on 19 June 2024).
- Jacob Devlin. Google’s Multi-lingual Bert Model. Available online: https://github.com/google-research/bert/blob/master/multilingual.md (accessed on 11 March 2020).
- Firat, E.E.; Swallow, B.; Laramee, R.S. PCP-Ed: Parallel coordinate plots for ensemble data. Vis. Inform. 2023, 7, 56–65.

| Parameters | Values |
|---|---|
| Selection | Tournament Selection, k = 5 |
| Population Size | 20 |
| Iterations | 100 |
| Fitness Function | Dougalek | Droidkungfu | GGtracker |
|---|---|---|---|
| DR(x) | 7 | 10 | 9 |
| BS(x) | 7 | 9 | 8 |
| SS(x) | 10 | 9 | 7 |

| Family | Metric | Mean | Std Dev | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|
| Dougalek | DR(x) | 0.697 | 0.012 | 0.688 | 0.705 |
| Dougalek | BS(x) | 0.624 | 0.043 | 0.589 | 0.659 |
| Dougalek | SS(x) | 0.663 | 0.011 | 0.655 | 0.670 |
| Dougalek | RS(x) | 0.415 | 0.018 | 0.412 | 0.418 |
| Droidkungfu | DR(x) | 0.880 | 0.024 | 0.862 | 0.897 |
| Droidkungfu | BS(x) | 0.729 | 0.065 | 0.680 | 0.778 |
| Droidkungfu | SS(x) | 0.810 | 0.009 | 0.803 | 0.817 |
| Droidkungfu | RS(x) | 0.366 | 0.017 | 0.363 | 0.369 |
| GGtracker | DR(x) | 0.681 | 0.028 | 0.661 | 0.701 |
| GGtracker | BS(x) | 0.610 | 0.032 | 0.585 | 0.635 |
| GGtracker | SS(x) | 0.616 | 0.034 | 0.593 | 0.639 |
| GGtracker | RS(x) | 0.384 | 0.036 | 0.378 | 0.391 |
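
This section does not state how the 95% confidence intervals were obtained. As a rough sketch, and purely as an assumption, a two-sided Student's t interval over the 10 EA runs listed in the complexity table lands close to the Dougalek DR(x) row. Other DR(x), BS(x) and SS(x) rows differ from this reading by a few thousandths, and the RS(x) intervals are much narrower, so the authors' actual procedure and sample sizes may well differ. The function name and the hard-coded critical value are illustrative choices.

```python
import math

# Two-sided 95% critical value of Student's t with 9 degrees of freedom.
T_CRIT_DF9 = 2.262

def t_interval(mean, std_dev, n=10, t_crit=T_CRIT_DF9):
    """Return (lower, upper) for mean +/- t_crit * std_dev / sqrt(n)."""
    half_width = t_crit * std_dev / math.sqrt(n)
    return mean - half_width, mean + half_width

# Dougalek, DR(x): mean 0.697, standard deviation 0.012 (values from the table).
lower, upper = t_interval(0.697, 0.012)
```
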

| % Unique Detection | Dougalek | Droidkungfu | GGtracker |
|---|---|---|---|
| Detection | 43 | 90 | 78 |
| Behavioural Similarity | 71 | 89 | 50 |
| Structural Similarity | 50 | 33 | 29 |

| % Unique Behaviour | Dougalek | Droidkungfu | GGtracker |
|---|---|---|---|
| Detection | 100 | 70 | 89 |
| Behavioural Similarity | 100 | 78 | 78 |
| Structural Similarity | 80 | 75 | 100 |

| Family–Objective | Evasiveness (Mean) | % Unique Detection | % Unique Behaviour | Structural Diversity |
|---|---|---|---|---|
| Dougalek–DR | 0.303 | 43 | 100 | Low–Moderate |
| Dougalek–BS | 0.376 | 71 | 100 | Moderate |
| Dougalek–SS | 0.337 | 50 | 80 | High |
| DroidKungFu–DR | 0.120 | 90 | 70 | Moderate |
| DroidKungFu–BS | 0.271 | 89 | 78 | High |
| DroidKungFu–SS | 0.190 | 33 | 75 | Moderate–High |
| GGTracker–DR | 0.319 | 78 | 89 | Moderate |
| GGTracker–BS | 0.390 | 50 | 78 | Moderate |
| GGTracker–SS | 0.384 | 29 | 100 | High |
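
The Evasiveness (Mean) column agrees, row for row, with one minus the mean fitness reported for the same family and objective in the statistics table earlier in this section. That relationship is inferred from the numbers and is not stated here, so the sketch below should be read as a consistency check under that assumption. The dictionary layout and variable names are illustrative.

```python
# Mean fitness per (family, objective), copied from the statistics table.
mean_fitness = {
    ("Dougalek", "DR"): 0.697, ("Dougalek", "BS"): 0.624, ("Dougalek", "SS"): 0.663,
    ("DroidKungFu", "DR"): 0.880, ("DroidKungFu", "BS"): 0.729, ("DroidKungFu", "SS"): 0.810,
    ("GGTracker", "DR"): 0.681, ("GGTracker", "BS"): 0.610, ("GGTracker", "SS"): 0.616,
}

# Evasiveness (Mean) values as listed in the summary table.
reported_evasiveness = {
    ("Dougalek", "DR"): 0.303, ("Dougalek", "BS"): 0.376, ("Dougalek", "SS"): 0.337,
    ("DroidKungFu", "DR"): 0.120, ("DroidKungFu", "BS"): 0.271, ("DroidKungFu", "SS"): 0.190,
    ("GGTracker", "DR"): 0.319, ("GGTracker", "BS"): 0.390, ("GGTracker", "SS"): 0.384,
}

# Assumed relationship: evasiveness = 1 - mean fitness, rounded to three decimals.
evasiveness = {key: round(1.0 - mean, 3) for key, mean in mean_fitness.items()}

# Keys where the assumed relationship does not reproduce the reported value.
mismatches = [key for key, value in evasiveness.items()
              if abs(value - reported_evasiveness[key]) > 1e-9]
```
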

| Hyper-Parameter | Value |
|---|---|
| LSTM Model | |
| Optimiser | Adam [59] |
| LSTM Layers | 2 |
| Neurons per LSTM Layer | 128 |
| Batch Size | 50 |
| Epochs | 3 |
| Binary Classification | |
| Loss Function | Binary Cross Entropy |
| Output Layer Activation | Sigmoid |
| Multi-class Classification | |
| Loss Function | Sparse Categorical Cross Entropy |
| Output Layer Activation | Softmax |
| BERT Model (via ktrain) | |
| Framework | ktrain (interface to Keras) |
| Pre-trained Model | Multilingual BERT, Cased |
| Batch Size | 50 |
| Epochs | 3 |
| Training Method | fit_onecycle |
| Data Loading | texts_from_folder |
| Preprocessing | BERT Tokenizer |
| Classifier Wrapper | text_classifier |
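
The two output configurations in the table above pair a sigmoid with binary cross entropy and a softmax with sparse categorical cross entropy. The following is a minimal pure-Python sketch of what each pairing computes for a single sample. It is not the Keras implementation; the function names, the clipping constant `eps`, and the example logits are illustrative assumptions.

```python
import math

def sigmoid(z):
    """Map a single logit to a probability for the positive class."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(y_true, p, eps=1e-12):
    """Loss for one sample with label y_true in {0, 1} and predicted probability p."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

def softmax(logits):
    """Turn a list of logits into class probabilities that sum to one."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_cross_entropy(class_index, probs, eps=1e-12):
    """Loss for one sample whose label is an integer class index."""
    return -math.log(max(probs[class_index], eps))

# Binary case: an uninformative logit of 0 gives p = 0.5.
binary_loss = binary_cross_entropy(1, sigmoid(0.0))

# Multi-class case with four classes and equal logits.
probs = softmax([0.0, 0.0, 0.0, 0.0])
multiclass_loss = sparse_categorical_cross_entropy(2, probs)
```

The "sparse" variant takes the label as an integer index, so no one-hot encoding of the class labels is needed.
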

| Model | Primary Use Case | Key Implementation Parameters | Theoretical Training Complexity | Practical Runtime Factor |
|---|---|---|---|---|
| Naive Bayes | Non-sequential classification | Features: 251 (system calls). Classes: 2 or 4. | | Feature Count (). Extremely fast. |
| LSTM | Sequential classification | Layers: 2. Neurons/Layer: 128. Batch: 50. Seq. Length: L. Epochs: 3. | per sequence | Sequence Length (L) & Architecture. Moderate. |
| BERT (Multilingual) | Deep contextual classification | Base Model: ∼110 M params. Batch: 50. Seq. Length: ≤512. Epochs: 3. | per layer | Sequence Length (L) Squared. Very high (GPU). |
| Evolutionary Algorithm (EA) | Hyperparameter/Feature Optimisation | Population: 20. Generations: 100. Runs: 10. Eval Time: 4 min. | ∼55.5 h (sequential) | Fitness Eval. Time & Population. Massive, requires parallelisation. |

| Models | 6020 Binary | 6020 Multiclass | 6050 Binary | 6050 Multiclass |
|---|---|---|---|---|
| NB | 0.91 | 0.81 | 0.92 | 0.80 |
| LSTM | 0.63 | 0.77 | 0.54 | 0.76 |

| Models | 6020 Binary | 6020 Multiclass | 6050 Binary | 6050 Multiclass |
|---|---|---|---|---|
| NB | 0.91 | 0.81 | 0.92 | 0.80 |
| LSTM | 0.63 | 0.77 | 0.54 | 0.76 |
| BERT | 0.93 | 0.77 | 0.90 | 0.50 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Babaagba, K.O.; Tan, Z. Mitigating Metamorphic Malware Through Adversarial Learning Techniques. Network 2026, 6, 22. https://doi.org/10.3390/network6020022

