MDPI - Publisher of Open Access Journals

25 pages, 1524 KiB

Open Access Article

Detecting Emerging DGA Malware in Federated Environments via Variational Autoencoder-Based Clustering and Resource-Aware Client Selection

by Ma Viet Duc, Pham Minh Dang, Tran Thu Phuong, Truong Duc Truong, Vu Hai and Nguyen Huu Thanh

Future Internet 2025, 17(7), 299; https://doi.org/10.3390/fi17070299 - 3 Jul 2025

Viewed by 385

Abstract

Domain Generation Algorithms (DGAs) remain a persistent technique used by modern malware to establish stealthy command-and-control (C&C) channels, thereby evading traditional blacklist-based defenses. Detecting such evolving threats is especially challenging in decentralized environments where raw traffic data cannot be aggregated due to privacy or policy constraints. To address this, we present FedSAGE, a security-aware federated intrusion detection framework that combines Variational Autoencoder (VAE)-based latent representation learning with unsupervised clustering and resource-efficient client selection. Each client encodes its local domain traffic into a semantic latent space using a shared, pre-trained VAE trained solely on benign domains. These embeddings are clustered via affinity propagation to group clients with similar data distributions and identify outliers indicative of novel threats without requiring any labeled DGA samples. Within each cluster, FedSAGE selects only the fastest clients for training, balancing computational constraints with threat visibility. Experimental results on the multi-zone DGA dataset show that FedSAGE improves detection accuracy by up to 11.6% and reduces energy consumption by up to 93.8% compared to standard FedAvg under non-IID conditions. Notably, the latent clustering perfectly recovers ground-truth DGA family zones, enabling effective anomaly detection in a fully unsupervised manner while remaining privacy-preserving. These findings demonstrate that FedSAGE is a practical and lightweight approach for decentralized detection of evasive malware, offering a viable solution for secure and adaptive defense in resource-constrained edge environments.

(This article belongs to the Special Issue Security of Computer System and Network)
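
The per-cluster selection step described in the abstract above can be illustrated with a minimal sketch. It is not the FedSAGE implementation: the function name `select_fastest_per_cluster`, the inputs `cluster_of` and `round_time`, and the parameter `k` are all hypothetical, and the cluster labels are assumed to come from an earlier clustering step that is not shown.

```python
from collections import defaultdict


def select_fastest_per_cluster(cluster_of, round_time, k=1):
    """Return the k fastest client ids from every cluster.

    cluster_of: dict client_id -> cluster label (hypothetical input)
    round_time: dict client_id -> seconds per local round (hypothetical input)
    """
    by_cluster = defaultdict(list)
    for client, label in cluster_of.items():
        by_cluster[label].append(client)

    selected = {}
    for label, clients in by_cluster.items():
        # Sort by measured time; break ties by client id for determinism.
        ranked = sorted(clients, key=lambda c: (round_time[c], c))
        selected[label] = ranked[:k]
    return selected


# Toy data: cluster 2 holds a single client, standing in for an outlier.
clusters = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2}
times = {"a": 4.0, "b": 2.5, "c": 1.0, "d": 3.0, "e": 9.0}
chosen = select_fastest_per_cluster(clusters, times, k=1)
```

In this sketch every cluster contributes at least one client, so a slow singleton cluster is still represented in training. Whether FedSAGE treats outlier clusters this way is an assumption here, not something the abstract states.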

37 pages, 3741 KiB

Open Access Article

Enhancing Malware Detection via RGB Assembly Visualization and Hybrid Deep Learning Models

by Esra Eroğlu Demirkan and Murat Aydos

Appl. Sci. 2025, 15(13), 7163; https://doi.org/10.3390/app15137163 - 25 Jun 2025

Viewed by 534

Abstract

Malicious software presents significant challenges in cybersecurity, leveraging rapidly evolving technologies to bypass traditional defense mechanisms. This research introduces a novel image-based malware classification framework that uses hybrid-model Convolutional Neural Networks to process RGB images generated from assembly code. We present MalevisAsm, an enriched dataset that merges MaleVis malware samples with benign files, and propose a hybrid deep learning model that combines EfficientNetB0 and DenseNet121 for robust feature extraction. The approach transforms Portable Executable files into assembly code, maps opcode transitions into three-channel images, and uses a fine-tuned CNN to classify malware families. Additionally, we implemented Uniform Manifold Approximation and Projection (UMAP), a contemporary nonlinear dimensionality reduction technique, to enhance the identification of previously unseen malware samples via binary classification. Our experiments achieve a top-tier accuracy of 98.45%, surpassing existing benchmarks on the MaleVis dataset. This research contributes to the field by integrating static binary analysis with advanced computer vision techniques, offering a scalable and effective solution for malware detection.

(This article belongs to the Special Issue New Advances in Computer Security and Cybersecurity)

25 pages, 2789 KiB

Open Access Article

Crypto-Ransomware Detection Through a Honeyfile-Based Approach with R-Locker

by Xiang Fang, Eric Song, Cheng Ning, Huseyn Huseynov and Tarek Saadawi

Mathematics 2025, 13(12), 1933; https://doi.org/10.3390/math13121933 - 10 Jun 2025

Viewed by 712

Abstract

Ransomware is a class of malware that aims to make computing resources unavailable, demanding a ransom to return control to users. Ransomware can be classified into two types: crypto-ransomware and locker ransomware. Crypto-ransomware employs strong encryption and prevents users’ access to the system. Locker ransomware makes access unavailable to users by locking either the boot sector or the user’s desktop. The proposed solution is an anomaly-based ransomware detection and prevention system consisting of pre- and post-encryption detection stages. The developed IDS is capable of detecting ransomware attacks by monitoring the usage of resources, triggered by anomalous behavior during an active attack. By analyzing the recorded parameters after recovery and logging any adverse effects, we were able to train the system for better detection patterns. The proposed solution allows for detection and intervention against the crypto and locker types of ransomware attacks. In previous work, the authors introduced a novel anti-ransomware tool for Windows platforms, known as R-Locker, which demonstrates high effectiveness and efficiency in countering ransomware attacks. The R-Locker solution employs “honeyfiles”, which serve as decoy files to attract ransomware activities. Upon the detection of any malicious attempts to access or alter these honeyfiles, R-Locker automatically activates countermeasures to thwart the ransomware infection and mitigate its impact. Building on our prior R-Locker framework, this work introduces a multi-stage detection architecture with resource–behavioral hybrid analysis, achieving cross-platform efficacy against evolving ransomware families not addressed previously.

(This article belongs to the Special Issue Recent Advances in Computational Intelligence Methodologies for Industries)

35 pages, 4844 KiB

Open Access Article

A Transductive Zero-Shot Learning Framework for Ransomware Detection Using Malware Knowledge Graphs

by Ping Wang, Hao-Cyuan Li, Hsiao-Chung Lin, Wen-Hui Lin and Nian-Zu Xie

Information 2025, 16(6), 458; https://doi.org/10.3390/info16060458 - 29 May 2025

Viewed by 517

Abstract

Malware continues to evolve rapidly, posing significant challenges to network security. Traditional signature-based detection methods often struggle to cope with advanced evasion techniques such as polymorphism, metamorphism, encryption, and stealth, which are commonly employed by cybercriminals. As a result, these conventional approaches frequently fail to detect newly emerging malware variants in a timely manner. To address this limitation, Zero-Shot Learning (ZSL) has emerged as a promising alternative, offering improved classification capabilities for previously unseen malware samples. ZSL models leverage auxiliary semantic information and binary feature representations to enhance the recognition of novel threats. This study proposes a Transductive Zero-Shot Learning (TZSL) model based on the Vector Quantized Variational Autoencoder (VQ-VAE) architecture, integrated with a malware knowledge graph constructed from sandbox behavioral analysis of ransomware families. The model is further optimized through hyperparameter tuning to maximize classification performance. Evaluation metrics include per-family classification accuracy, precision, recall, F1-score, and Receiver Operating Characteristic (ROC) curves to ensure robust and reliable detection outcomes. In particular, the harmonic mean (H-mean) metric from the Generalized Zero-Shot Learning (GZSL) framework is introduced to jointly evaluate the model’s performance on both seen and unseen classes, offering a more holistic view of its generalization ability. The experimental results demonstrate that the proposed VQ-VAE model achieves an F1-score of 93.5% in ransomware classification, significantly outperforming other baseline models such as LeNet-5 (65.6%), ResNet-50 (71.8%), VGG-16 (74.3%), and AlexNet (65.3%). These findings highlight the superior capability of the VQ-VAE-based TZSL approach in detecting novel malware variants, improving detection accuracy while reducing false positives. Full article

(This article belongs to the Collection Knowledge Graphs for Search and Recommendation)
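
The harmonic mean (H-mean) mentioned in the abstract above is commonly defined in the Generalized Zero-Shot Learning literature as H = 2·S·U / (S + U), where S and U are the accuracies on seen and unseen classes. A minimal sketch follows; the function name and the sample accuracies are illustrative and are not taken from the paper.

```python
def harmonic_mean(acc_seen, acc_unseen):
    """Harmonic mean of seen-class and unseen-class accuracy (both in [0, 1])."""
    total = acc_seen + acc_unseen
    if total == 0:
        # Both accuracies are zero; define the H-mean as zero as well.
        return 0.0
    return 2 * acc_seen * acc_unseen / total


# Illustrative values only: strong on seen classes, weaker on unseen ones.
h_balanced = harmonic_mean(0.9, 0.6)
h_collapsed = harmonic_mean(0.9, 0.0)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a model that ignores unseen classes entirely scores zero however well it does on seen classes, which is why the metric is used to judge generalization.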

23 pages, 10233 KiB

Open Access Article

Deep Defense Against Mal-Doc: Utilizing Transformer and SeqGAN for Detecting and Classifying Document Type Malware

by Gati Lother Martin, Sang-Min Lee, Jong-Hyun Kim, Young-Seob Jeong, Ah Reum Kang and Jiyoung Woo

Appl. Sci. 2025, 15(6), 2978; https://doi.org/10.3390/app15062978 - 10 Mar 2025

Viewed by 974

Abstract

The prevalence of non-executable malware is on the rise, presenting a major threat to users, including major public institutions and corporations. While extensive research has been conducted on detecting malware threats, there is a noticeable gap in studying document-type malware compared with executable files. The proposed model addresses this gap by detecting and classifying document-type malware families using script code, including the tags used to write documents and the script languages used to execute malicious functions. This script code offers insights into how the malware was constructed and how it operates on the victim’s system. Additionally, we leverage language models in our approach. Initially, we develop MalCode2Vec to learn associations between source code elements and represent them as numeric vectors. Subsequently, we design a Transformer-based model for document malware detection and family classification. Detection is conducted at both the stream and file levels. To address the class imbalance among malware families, we utilize a generative adversarial network to generate malware samples. Our experimental domain focuses on the Hangul (Korean) word processor, a tool notably used by North Korea in targeting the South Korean government.

25 pages, 2916 KiB

Open Access Article

Improving Cyber Defense Against Ransomware: A Generative Adversarial Networks-Based Adversarial Training Approach for Long Short-Term Memory Network Classifier

by Ping Wang, Hsiao-Chung Lin, Jia-Hong Chen, Wen-Hui Lin and Hao-Cyuan Li

Electronics 2025, 14(4), 810; https://doi.org/10.3390/electronics14040810 - 19 Feb 2025

Cited by 1 | Viewed by 946

Abstract

The rapid proliferation of ransomware variants necessitates more effective detection mechanisms, as traditional signature-based methods are increasingly inadequate. These conventional methods rely on manual feature extraction and matching, which are time-consuming and limited to known threats. This study addresses the escalating challenge of ransomware threats in cybersecurity by proposing a novel deep learning model, LSTM-EDadver, which leverages Generative Adversarial Networks (GANs) and Carlini and Wagner (CW) attacks to enhance malware detection capabilities. LSTM-EDadver innovatively generates adversarial examples (AEs) using sequential features derived from ransomware behaviors, thus training deep learning models to improve their robustness and accuracy. The methodology combines Cuckoo sandbox analysis with conceptual lattice ontology to capture a wide range of ransomware families and their variants. This approach not only addresses the shortcomings of existing models but also simulates real-world adversarial conditions during the validation phase by subjecting the models to CW attacks. The experimental results demonstrate that LSTM-EDadver achieves a classification accuracy of 96.59%. This performance was achieved using a dataset of 1328 ransomware samples (across 32 ransomware families) and 519 normal instances, outperforming traditional RNN, LSTM, and GCU models, which recorded accuracies of 90.01%, 93.95%, and 94.53%, respectively. The proposed model also shows significant improvements in F1-score, ranging from 2.49% to 6.64% compared to existing models without adversarial training. This advancement underscores the effectiveness of integrating GAN-generated attack command sequences into model training. Full article

(This article belongs to the Section Networks)

22 pages, 5047 KiB

Open Access Article

Attention-Based Malware Detection Model by Visualizing Latent Features Through Dynamic Residual Kernel Network

by Mainak Basak, Dong-Wook Kim, Myung-Mook Han and Gun-Yoon Shin

Sensors 2024, 24(24), 7953; https://doi.org/10.3390/s24247953 - 12 Dec 2024

Cited by 2 | Viewed by 1601

Abstract

In recent years, significant research has been directed towards the taxonomy of malware variants. Nevertheless, certain challenges persist, including the inadequate accuracy of sample classification within similar malware families, elevated false-negative rates, and significant processing time and resource consumption. Malware developers have effectively evaded signature-based detection methods. The predominant static analysis methodologies employ algorithms to convert the files. The analytic process is contingent upon the tool’s functionality; if the tool malfunctions, the entire process is obstructed. Most dynamic analysis methods necessitate the execution of a binary file within a sandboxed environment to examine its behavior. When executed within a virtual environment, the detrimental actions of the file might be easily concealed. This research examined a novel method for depicting malware as images. Subsequently, we trained a classifier to categorize new malware files into their respective classifications utilizing established neural network methodologies for detecting malware images. Through the process of transforming the file into an image representation, we have made our analytical procedure independent of any software, and it has also become more effective. To counter such adversaries, we employ a recognized technique called involution to extract location-specific and channel-agnostic features of malware data, utilizing a deep residual block. The proposed approach achieved remarkable accuracy of 99.5%, representing an absolute improvement of 95.65% over the equal probability benchmark. Full article

(This article belongs to the Special Issue Intelligence, Security, Trust and Privacy Advances in IoT, Bigdata and 5G Networks (2nd Edition))

19 pages, 1428 KiB

Open Access Article

Behavioral Analysis of Android Riskware Families Using Clustering and Explainable Machine Learning

by Mohammed M. Alani and Moatsum Alawida

Big Data Cogn. Comput. 2024, 8(12), 171; https://doi.org/10.3390/bdcc8120171 - 26 Nov 2024

Viewed by 1603

Abstract

The Android operating system has become increasingly popular, not only on mobile phones but also in various other platforms such as Internet-of-Things devices, tablet computers, and wearable devices. Due to its open-source nature and significant market share, Android poses an attractive target for malicious actors. One of the notable security challenges associated with this operating system is riskware. Riskware refers to applications that may pose a security threat due to their vulnerability and potential for misuse. Although riskware constitutes a considerable portion of Android’s ecosystem malware, it has not been studied as extensively as other types of malware such as ransomware and trojans. In this study, we employ machine learning techniques to analyze the behavior of different riskware families and identify similarities in their actions. Furthermore, our research identifies specific behaviors that can be used to distinguish these riskware families. To achieve these insights, we utilize various tools such as k-Means clustering, principal component analysis, extreme gradient boost classifiers, and Shapley additive explanation. Our findings can contribute significantly to the detection, identification, and forensic analysis of Android riskware. Full article

27 pages, 549 KiB

Open Access Article

Class Incremental Deep Learning: A Computational Scheme to Avoid Catastrophic Forgetting in Domain Generation Algorithm Multiclass Classification

by João Rafael Gregório, Adriano Mauro Cansian and Leandro Alves Neves

Appl. Sci. 2024, 14(16), 7244; https://doi.org/10.3390/app14167244 - 17 Aug 2024

Viewed by 1376

Abstract

Domain Generation Algorithms (DGAs) are algorithms present in most malware used by botnets and advanced persistent threats. These algorithms dynamically generate domain names to maintain and obfuscate communication between the infected device and the attacker’s command and control server. Since DGAs are used by many threats, it is extremely important to classify a given DGA according to the threat it is related to. In addition, as new threats emerge daily, classifier models tend to become obsolete over time. Deep neural networks tend to lose their classification ability when retrained with a dataset that is significantly different from the initial one, a phenomenon known as catastrophic forgetting. This work presents a computational scheme composed of a deep learning model based on CNN and natural language processing and an incremental learning technique for class increment through transfer learning to classify 60 DGA families and include a new family to the classifier model, training the model incrementally using some examples from known families, avoiding catastrophic forgetting and maintaining metric levels. The proposed methodology achieved an average precision of 86.75%, an average recall of 83.06%, and an average F1 score of 83.78% with the full dataset, and suffered minimal losses when applying the class increment. Full article

(This article belongs to the Special Issue Advanced Technologies in Data and Information Security III)

26 pages, 590 KiB

Open Access Article

SINNER: A Reward-Sensitive Algorithm for Imbalanced Malware Classification Using Neural Networks with Experience Replay

by Antonio Coscia, Andrea Iannacone, Antonio Maci and Alessandro Stamerra

Information 2024, 15(8), 425; https://doi.org/10.3390/info15080425 - 23 Jul 2024

Cited by 2 | Viewed by 1911

Abstract

Reports produced by popular malware analysis services showed a disparity in samples available for different malware families. The unequal distribution between such classes can be attributed to several factors, such as technological advances and the application domain that a computer virus seeks to infect. Recent studies have demonstrated the effectiveness of deep learning (DL) algorithms when learning multi-class classification tasks using imbalanced datasets. This can be achieved by updating the learning function such that correct and incorrect predictions performed on the minority class are more rewarded or penalized, respectively. This procedure can be logically implemented by leveraging the deep reinforcement learning (DRL) paradigm through a proper formulation of the Markov decision process (MDP). This paper proposes SINNER, i.e., a DRL-based multi-class classifier that approaches the data imbalance problem at the algorithmic level by exploiting a redesigned reward function, which modifies the traditional MDP model used to learn this task. Based on the experimental results, the proposed formula appears to be successful. In addition, SINNER has been compared to several DL-based models that can handle class skew without relying on data-level techniques. On three out of four datasets sourced from the existing literature, the proposed model achieved state-of-the-art classification performance.

(This article belongs to the Special Issue Machine Learning Approaches for Imbalanced Domains: Emerging Trends and Applications)
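
The idea in the abstract above, rewarding or penalizing predictions on minority classes more strongly, can be illustrated with a generic frequency-weighted reward. This is not SINNER's redesigned reward function, which the abstract does not specify; the weighting rule, the name `make_reward_fn`, and the toy labels are assumptions chosen for illustration.

```python
from collections import Counter


def make_reward_fn(labels):
    """Build a reward function whose magnitude shrinks with class frequency.

    Assumed rule: weight = (size of rarest class) / (size of this class),
    so the rarest class gets weight 1.0 and frequent classes get less.
    """
    counts = Counter(labels)
    smallest = min(counts.values())
    weight = {cls: smallest / n for cls, n in counts.items()}

    def reward(true_label, predicted_label):
        w = weight[true_label]
        # Positive reward for a correct prediction, negative otherwise.
        return w if predicted_label == true_label else -w

    return reward


# Toy imbalanced training set: 90 samples of one family, 10 of another.
train_labels = ["trojan"] * 90 + ["worm"] * 10
reward = make_reward_fn(train_labels)
r_minority_hit = reward("worm", "worm")
r_minority_miss = reward("worm", "trojan")
r_majority_hit = reward("trojan", "trojan")
```

Under this assumed rule, a mistake on the rare "worm" class costs nine times as much as a mistake on the common "trojan" class, which pushes an agent trained on these rewards to attend to the minority class.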

30 pages, 1318 KiB

Open Access Article

Malware Classification Using Dynamically Extracted API Call Embeddings

by Sahil Aggarwal and Fabio Di Troia

Appl. Sci. 2024, 14(13), 5731; https://doi.org/10.3390/app14135731 - 30 Jun 2024

Cited by 4 | Viewed by 2852

Abstract

Malware classification stands as a crucial element in establishing robust computer security protocols, encompassing the segmentation of malware into discrete groupings. Recently, the emergence of machine learning has presented itself as an apt approach for addressing this challenge. Models can undergo training employing diverse malware attributes, such as opcodes and API calls, to distill valuable insights for effective classification. Within the realm of natural language processing, word embeddings assume a pivotal role by representing text in a manner that aligns closely with the proximity of similar words. These embeddings facilitate the quantification of word resemblances. This research embarks on a series of experiments that harness hybrid machine learning methodologies. We derive word vectors from dynamic API call logs associated with malware and integrate them as features in collaboration with diverse classifiers. Our methodology involves the utilization of Hidden Markov Models and Word2Vec to generate embeddings from API call logs. Additionally, we amalgamate renowned models like BERT and ELMo, noted for their capacity to yield contextualized embeddings. The resultant vectors are channeled into our classifiers, namely Support Vector Machines (SVMs), Random Forest (RF), k-Nearest Neighbors (kNNs), and Convolutional Neural Networks (CNNs). Through two distinct sets of experiments, our objective revolves around the classification of both malware families and categories. The outcomes achieved illuminate the efficacy of API call embeddings as a potent instrument in the domain of malware classification, particularly in the realm of identifying malware families. The best combination was RF and word embeddings generated by Word2Vec, ELMo, and BERT, achieving an accuracy between 0.91 and 0.93. This result underscores the potential of our approach in effectively classifying malware. Full article

(This article belongs to the Collection Innovation in Information Security)

23 pages, 8085 KiB

Open Access Article

CSMC: A Secure and Efficient Visualized Malware Classification Method Inspired by Compressed Sensing

by Wei Wu, Haipeng Peng, Haotian Zhu and Derun Zhang

Sensors 2024, 24(13), 4253; https://doi.org/10.3390/s24134253 - 30 Jun 2024

Cited by 4 | Viewed by 1541

Abstract

With the rapid development of the Internet of Things (IoT), the sophistication and intelligence of sensors are continually evolving, playing increasingly important roles in smart homes, industrial automation, and remote healthcare. However, these intelligent sensors face many security threats, particularly from malware attacks. Identifying and classifying malware is crucial for preventing such attacks. As the number of sensors and their applications grow, malware targeting sensors proliferates. Processing massive malware samples is challenging due to limited bandwidth and resources in IoT environments. Therefore, compressing malware samples before transmission and classification can improve efficiency. Additionally, sharing malware samples between classification participants poses security risks, necessitating methods that prevent sample exploitation. Moreover, the complex network environments also necessitate robust classification methods. To address these challenges, this paper proposes CSMC (Compressed Sensing Malware Classification), an efficient malware classification method based on compressed sensing. This method compresses malware samples before sharing and classification, thus facilitating more effective sharing and processing. By introducing deep learning, the method can extract malware family features during compression, which classical methods cannot achieve. Furthermore, the irreversibility of the method enhances security by preventing classification participants from exploiting malware samples. Experimental results demonstrate that for malware targeting Windows and Android operating systems, CSMC outperforms many existing methods based on compressed sensing and machine or deep learning. Additionally, experiments on sample reconstruction and noise demonstrate CSMC’s capabilities in terms of security and robustness. Full article

(This article belongs to the Special Issue Compressed Sensing and Imaging Processing—2nd Edition)

26 pages, 2875 KiB

Open Access Article

Identifying Malware Packers through Multilayer Feature Engineering in Static Analysis

by Ehab Alkhateeb, Ali Ghorbani and Arash Habibi Lashkari

Information 2024, 15(2), 102; https://doi.org/10.3390/info15020102 - 9 Feb 2024

Cited by 4 | Viewed by 4600

Abstract

This research addresses a critical need in the ongoing battle against malware, particularly in the form of obfuscated malware, which presents a formidable challenge in the realm of cybersecurity. Developing effective antivirus (AV) solutions capable of combating packed malware remains a crucial endeavor. Packed malicious programs employ encryption and advanced techniques to obfuscate their payloads, rendering them elusive to AV scanners and security analysts. The introduced research presents an innovative malware packer classifier specifically designed to adeptly identify packer families and detect unknown packers in real-world scenarios. To fortify packer identification performance, we have curated a meticulously crafted dataset comprising precisely packed samples, enabling comprehensive training and validation. Our approach employs a sophisticated feature engineering methodology, encompassing multiple layers of analysis to extract salient features used as input to the classifier. The proposed packer identifier demonstrates remarkable accuracy in distinguishing between known and unknown packers, while also ensuring operational efficiency. The results reveal an impressive accuracy rate of 99.60% in identifying known packers and 91% accuracy in detecting unknown packers. This novel research not only significantly advances the field of malware detection but also equips both cybersecurity practitioners and AV engines with a robust tool to effectively counter the persistent threat of packed malware. Full article

(This article belongs to the Special Issue Advances in Cybersecurity and Reliability)

22 pages, 630 KiB

Open Access Article

Dynamic Malware Classification and API Categorisation of Windows Portable Executable Files Using Machine Learning

by Durre Zehra Syeda and Mamoona Naveed Asghar

Appl. Sci. 2024, 14(3), 1015; https://doi.org/10.3390/app14031015 - 25 Jan 2024

Cited by 13 | Viewed by 4946

Abstract

The rise of malware attacks presents a significant cyber-security challenge, with advanced techniques and offline command-and-control (C2) servers causing disruptions and financial losses. This paper proposes a methodology for dynamic malware analysis and classification using a malware Portable Executable (PE) file from the MalwareBazaar repository. It suggests effective strategies to mitigate the impact of evolving malware threats. For this purpose, a five-level approach for data management and experiments was utilised: (1) generation of a customised dataset by analysing a total of 582 malware and 438 goodware samples from Windows PE files; (2) feature extraction and feature scoring based on Chi2 and Gini importance; (3) empirical evaluation of six state-of-the-art baseline machine learning (ML) models, including Logistic Regression (LR), Support Vector Machine (SVM), Naive Bayes (NB), Random Forest (RF), XGBoost (XGB), and K-Nearest Neighbour (KNN), with the curated dataset; (4) malware family classification using VirusTotal APIs; and, finally, (5) categorisation of 23 distinct APIs from 266 malware APIs. According to the results, Gini’s method takes a holistic view of feature scoring, considering a wider range of API activities. The RF achieved the highest precision of 0.99, accuracy of 0.96, area under the curve (AUC) of 0.98, and F1-score of 0.96, with a 0.93 true-positive rate (TPR) and 0.0098 false-positive rate (FPR), among all applied ML models. The results show that Trojans (27%) and ransomware (22%) are the most risky among 11 malware families. Windows-based APIs (22%), the file system (12%), and registry manipulation (8.2%) showcased their importance in detecting malicious activity in API categorisation. This paper considers a dual approach for feature reduction and scoring, resulting in an improved F1-score (2%), and the inclusion of AUC and specificity metrics distinguishes it from existing research (Section Comparative Analysis with Existing Approaches). The newly generated dataset is publicly available in the GitHub repository (Data Availability Statement) to facilitate aspirant researchers’ dynamic malware analysis.

(This article belongs to the Section Computing and Artificial Intelligence)

27 pages, 20139 KiB

Open AccessArticle

A Streamlined Framework of Metamorphic Malware Classification via Sampling and Parallel Processing

by Jian Lyu, Jingfeng Xue, Weijie Han, Qian Zhang and Yufen Zhu

Electronics 2023, 12(21), 4427; https://doi.org/10.3390/electronics12214427 - 27 Oct 2023

Cited by 1 | Viewed by 1890

Abstract

Malware remains a significant threat in today's cyberspace. Worse, malware authors frequently use metamorphic techniques to create numerous variants, which places a heavy burden on malware researchers. Being able to classify these metamorphic malware samples into their corresponding families could accelerate malware analysis. Based on our comprehensive analysis, these variants are usually produced by modifying their assembly instruction sequences to a certain extent. Motivated by this finding, we present a streamlined and efficient framework for malware family classification named MalSEF, which leverages sampling and parallel processing to efficiently and effectively classify the vast number of metamorphic malware variants. First, it reduces the complexity of feature engineering by extracting a small portion of representative samples from the entire dataset and establishing a simple feature vector based on the Opcode sequences; then, it generates the feature matrix and conducts the classification task in parallel, utilizing multiple cores and a proactive recommendation scheme. Finally, its practicality is strengthened so that it can cope with the large volume of diversified malware variants on common computing platforms. Our comprehensive experiments conducted on the Kaggle malware dataset demonstrate that MalSEF achieves a classification accuracy of up to 98.53% and reduces time overhead by 37.60% compared to the serial processing procedure.

(This article belongs to the Special Issue AI-Driven Network Security and Privacy)

Search Results (70)
