A Transductive Zero-Shot Learning Framework for Ransomware Detection Using Malware Knowledge Graphs

Wang, Ping; Li, Hao-Cyuan; Lin, Hsiao-Chung; Lin, Wen-Hui; Xie, Nian-Zu

doi:10.3390/info16060458


by Ping Wang ¹,*, Hao-Cyuan Li ¹, Hsiao-Chung Lin ², Wen-Hui Lin ¹ and Nian-Zu Xie ¹

¹ Green Energy Technology Research Center, Faculty of Department of Information Management, Kun Shan University, Tainan 710303, Taiwan

² Department of Information Management, National Chin-Yi University of Technology, Taichung 411030, Taiwan

* Author to whom correspondence should be addressed.

Information 2025, 16(6), 458; https://doi.org/10.3390/info16060458

Submission received: 29 April 2025 / Revised: 22 May 2025 / Accepted: 27 May 2025 / Published: 29 May 2025

(This article belongs to the Collection Knowledge Graphs for Search and Recommendation)


Abstract

Malware continues to evolve rapidly, posing significant challenges to network security. Traditional signature-based detection methods often struggle to cope with advanced evasion techniques such as polymorphism, metamorphism, encryption, and stealth, which are commonly employed by cybercriminals. As a result, these conventional approaches frequently fail to detect newly emerging malware variants in a timely manner. To address this limitation, Zero-Shot Learning (ZSL) has emerged as a promising alternative, offering improved classification capabilities for previously unseen malware samples. ZSL models leverage auxiliary semantic information and binary feature representations to enhance the recognition of novel threats. This study proposes a Transductive Zero-Shot Learning (TZSL) model based on the Vector Quantized Variational Autoencoder (VQ-VAE) architecture, integrated with a malware knowledge graph constructed from sandbox behavioral analysis of ransomware families. The model is further optimized through hyperparameter tuning to maximize classification performance. Evaluation metrics include per-family classification accuracy, precision, recall, F1-score, and Receiver Operating Characteristic (ROC) curves to ensure robust and reliable detection outcomes. In particular, the harmonic mean (H-mean) metric from the Generalized Zero-Shot Learning (GZSL) framework is introduced to jointly evaluate the model’s performance on both seen and unseen classes, offering a more holistic view of its generalization ability. The experimental results demonstrate that the proposed VQ-VAE model achieves an F1-score of 93.5% in ransomware classification, significantly outperforming other baseline models such as LeNet-5 (65.6%), ResNet-50 (71.8%), VGG-16 (74.3%), and AlexNet (65.3%). These findings highlight the superior capability of the VQ-VAE-based TZSL approach in detecting novel malware variants, improving detection accuracy while reducing false positives.

Keywords: malware classification; deep learning models; zero-shot learning; VQ-VAE

1. Introduction

Ransomware threats have evolved substantially, progressing from basic file encryption schemes to highly sophisticated attacks targeting critical infrastructure and global supply chains. Recent industry reports underscore the increasing risks and financial losses incurred by enterprises due to ransomware incidents.

A notable example is the LockerGoga ransomware attack in 2019, which severely impacted Norsk Hydro, a major global aluminum producer. The attack disrupted operations across multiple international sites, leading to significant financial and operational losses. Such incidents highlight the vulnerabilities of industrial and manufacturing sectors and underscore the urgent need for more advanced and adaptive security strategies [1].

In 2021, the DarkSide ransomware group launched a high-profile attack against Colonial Pipeline, a key fuel supplier in the United States. The breach resulted in a temporary shutdown of operations, leading to fuel shortages and substantial economic disruption [2]. That same year, Brenntag, a global chemical distribution company, was also targeted by DarkSide. Attackers exfiltrated sensitive data and demanded a ransom of USD 7.5 million. The incident demonstrated not only the financial impact but also the operational fragility beyond traditional IT systems, ultimately compelling the company to pay the ransom [3].

Further evidence of ransomware’s cascading effects was seen in 2022, when Kojima Industries, a supplier of plastic parts and electronic components to Toyota, suffered a cyberattack believed to be ransomware-related. The breach disrupted Toyota’s domestic production lines, illustrating how ransomware can trigger widespread supply chain disruptions across interconnected industrial ecosystems [4].

Recent cybersecurity reports emphasize the persistent and escalating threat of ransomware to organizations worldwide. According to Trend Micro’s 2024 mid-year assessment, ransomware remains a top security concern, with critical sectors such as banking, technology, and government facing substantial impacts. Notably, in the first half of 2024, the LockBit ransomware variant was the most dominant, accounting for over half of all ransomware detections [5].

Check Point Research further reported a global surge in ransomware incidents, documenting over 1230 cases between May and October 2024, with North America disproportionately affected—representing 57% of all cases. In Q3 2024, organizations experienced an average of 1876 weekly attacks, marking a 75% year-over-year increase. These attacks result not only in direct financial losses and data breaches but also in long-term reputational damage for affected organizations [6].

This evolving threat landscape is exacerbated by the increased reliance on interconnected digital infrastructures, particularly in industrial settings. Modern ransomware increasingly employs object-oriented programming and code obfuscation techniques, enabling it to bypass traditional signature-based detection mechanisms [7,8]. Consequently, reactive security approaches are no longer adequate. There is a growing need for proactive, AI-driven solutions capable of anticipating and adapting to emerging ransomware variants, which are essential for establishing resilient cyber defense strategies.

Traditional machine learning models typically require large volumes of labeled data to train effectively, especially when malware attack patterns evolve continuously. When novel or mutated ransomware variants emerge, the performance of conventional models such as Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, and Temporal Convolutional Networks (TCNs) often deteriorates, leading to reduced detection accuracy and an elevated rate of false positives [9].

To identify such new or polymorphic threats, cybersecurity defenders have traditionally relied on a combination of techniques, including signature-based detection [8], behavioral analysis [9], and machine learning (ML)-based approaches [10,11]. These methods typically analyze binary samples collected from infected endpoints or sandboxes to extract behavioral signatures or static indicators of compromise.

Table 1 provides a comparative summary of three major approaches for malware detection, outlining their core features, typical application scenarios, and associated limitations. This comparison serves as a foundation for motivating the adoption of more flexible and adaptive frameworks, such as Zero-Shot Learning, in the face of rapidly evolving ransomware threats.

As shown in Table 1, the effectiveness of deep learning networks (DLNs) is highly dependent on the quality and relevance of the features extracted from collected malware samples. When feature engineering is inadequate or overly complex, DLNs may fail to capture critical behavioral patterns associated with emerging or obfuscated malware, thereby limiting their detection capabilities [13]. Consequently, it is essential to clearly define the rationale behind feature selection and its relevance to ransomware classification. Given that ransomware typically demonstrates distinct behaviors, features such as file encryption activities, system process manipulation, network scanning, and registry modifications were prioritized in this study. These behavioral indicators are key discriminative factors for distinguishing ransomware from benign applications.

Furthermore, conventional machine learning (ML) methods—particularly image-based approaches utilizing Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, and Temporal Convolutional Networks (TCNs)—often struggle to detect previously unseen or mutated ransomware variants. These models typically rely on supervised learning from labeled datasets containing known malicious binaries. As a result, they excel in recognizing known patterns but exhibit limited generalization when encountering novel or zero-day threats. This limitation underscores the need for advanced models capable of identifying new ransomware variants without requiring extensive labeled training data, such as those enabled by Zero-Shot Learning (ZSL) frameworks.

In our previous work [14], the developed classifier exhibited limitations in distinguishing between certain ransomware families—specifically LockBit and RagnarLocker. This misclassification was primarily attributed to the high degree of similarity in their attack command sequences, as extracted from Cuckoo Sandbox analysis (v2.0.6). The resulting overlap in feature representation significantly reduced the model’s discriminative power.

Motivated by advances in Zero-Shot Learning (ZSL) [15,16,17], the present study investigates the integration of ZSL techniques into deep learning architectures to enhance classification performance for previously unseen ransomware variants. By leveraging both textual semantic attributes and binary image features derived from malware samples, the ZSL framework enables a more expressive and generalizable feature space, thereby improving classification robustness under limited supervision.

As ransomware continues to evolve, its attack vectors become increasingly sophisticated and evasive, rendering traditional detection systems less effective. To address these challenges, this study proposes a more scalable solution that incorporates semi-supervised learning principles and ZSL, aiming to expand the model’s ability to recognize novel or mutated ransomware families without requiring extensive labeled data for each new threat.

The concept of ZSL was first introduced by Mark Palatucci et al. in 2009 [18]. This seminal work laid the foundation for models capable of identifying instances from previously unseen classes by mimicking human-like reasoning. For example, a model trained exclusively on images of horses and tigers may still recognize a zebra by leveraging shared semantic attributes—such as tiger-like stripes and a horse-like body shape—even without having encountered zebra images during training. This capability allows ZSL to bridge the gap between seen and unseen categories by utilizing semantic knowledge to generalize beyond the training data.

Recent research demonstrates that ZSL methods can outperform conventional Convolutional Neural Networks (CNNs) in tasks involving the recognition of novel or variant classes, particularly in dynamic or open-world environments where new instances frequently emerge. More specifically, ZSL enables models to transfer learned knowledge from previously encountered categories to infer labels for unseen classes. This is achieved through the use of auxiliary information, such as semantic attributes or textual descriptions, which serve as an intermediate representation linking known and unknown categories.

This research is driven by the pressing need for more robust malware detection mechanisms, particularly in the context of identifying malware variants through Transductive Zero-Shot Learning (TZSL) [17]. TZSL is a variant of the Zero-Shot Learning (ZSL) paradigm that leverages both labeled data from seen classes and unlabeled data from unseen classes during the training phase. Unlike Inductive ZSL (IZSL), which relies solely on labeled samples from known categories, TZSL better reflects practical scenarios in which partial or unlabeled information about emerging threats may be available.

In this study, we propose a ZSL-based model constructed on the Vector Quantized Variational Autoencoder (VQ-VAE) framework [19], specifically tailored for ransomware classification. By enabling the model to recognize and classify previously unseen malware variants, the VQ-VAE architecture addresses critical limitations of traditional machine learning approaches, which typically depend on large quantities of labeled training data.

The detection pipeline begins with the identification and feature extraction of unknown or variant malware samples. The dataset includes 1782 ransomware samples spanning 12 families (collected from GitHub) and 299 benign application programs, processed using analysis tools such as PETools (v1.9), IDA Pro (v7.7), and Cuckoo Sandbox (v2.0.6). Next, concept similarity analysis is performed to identify semantic relationships among malware families and their behavioral attributes, which are then used to construct a feature-engineered malware knowledge graph. This knowledge graph serves as an auxiliary structure that enhances classification by incorporating high-level semantic representations—such as behaviors extracted from sandbox analysis—to reduce misclassification and improve the detection of novel or polymorphic ransomware variants. In other words, knowledge graphs offer a framework for organizing the behavioral traits and contextual features of ransomware into a semantically connected structure. Through the analysis of inter-node relationships and the inference of common behavioral patterns, the system can derive meaningful associations among entities [20,21,22]. This process enables the model to infer potential classifications for samples lacking prior labels, thereby enhancing its ability to reason about unfamiliar threats.

To evaluate the effectiveness of the proposed model, five key performance indicators for malware classification were analyzed: per-family accuracy, precision, recall, F1-score, and Receiver Operating Characteristic (ROC) curves. In addition, this study introduces the harmonic mean (H-mean) metric as part of the Generalized Zero-Shot Learning (GZSL) evaluation framework, which accounts for both seen and unseen class accuracies to more accurately assess the model’s generalization performance.
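The H-mean mentioned above is conventionally computed as the harmonic mean of the seen-class and unseen-class accuracies, H = 2 · Acc_s · Acc_u / (Acc_s + Acc_u). The following minimal sketch illustrates that definition; the function name `h_mean`, the symbols Acc_s and Acc_u, and the numeric values are illustrative assumptions, not results from this study.

```python
def h_mean(acc_seen: float, acc_unseen: float) -> float:
    """Harmonic mean of seen-class and unseen-class accuracy (GZSL)."""
    if acc_seen + acc_unseen == 0:
        return 0.0  # both accuracies are zero; avoid division by zero
    return 2 * acc_seen * acc_unseen / (acc_seen + acc_unseen)


# Illustrative values only, not results from the paper.
balanced = h_mean(0.90, 0.90)   # equal accuracies: H equals that accuracy
lopsided = h_mean(0.95, 0.10)   # strong on seen, weak on unseen classes
```

Because the harmonic mean is dominated by the smaller of the two values, a model that performs well only on seen classes receives a low H-mean, which is why the metric is used to assess generalization.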

The primary contributions of this study are summarized as follows:

1. This research presents a ZSL model based on the VQ-VAE framework that is specifically designed for malware classification. By enabling the detection and classification of previously unseen ransomware variants, the proposed approach addresses a significant limitation in traditional machine learning techniques, which typically rely on large volumes of labeled data.
2. The proposed TZSL model based on VQ-VAE achieves an F1-score of 93.5% in detecting novel or variant ransomware samples. This significantly improves detection accuracy during the early download or infection phase, while also reducing false positive rates.
3. In contrast, traditional CNN-based models (LeNet-5, ResNet-50, VGG-16, and AlexNet) achieve lower F1-scores of 65.6%, 71.8%, 74.3%, and 65.3%, respectively, when classifying ransomware variants. These results highlight the superior generalization capability of the proposed TZSL model in handling previously unseen threats.

The remainder of this paper is organized as follows: Section 2 provides an overview of the Zero-Shot Learning (ZSL) approach and introduces the proposed ZSL-based model for recognizing previously unseen malware. Section 3 details the architecture of the proposed ZSL malware detection framework, including the integration of a similarity-based analysis for constructing the malware knowledge graph. Section 4 presents and discusses the experimental results, highlighting model performance across various evaluation metrics. Finally, Section 5 concludes this paper and outlines directions for future work.

2. Overview of Deep Learning Techniques and Their Applications in Malware Detection

This section reviews existing research in two key areas: (i) misclassification analysis in machine learning models for malware detection, and (ii) the application of ZSL to enhance model robustness in identifying novel or previously unseen malware variants.

2.1. Misclassification Analysis in Machine Learning Models

Deep learning-based classification errors remain a critical challenge in network security. To improve performance, researchers have explored strategies such as expanding the diversity of training datasets, incorporating adversarial training, and optimizing model architectures. These methods aim to enhance both accuracy and generalization. However, they also involve trade-offs, including increased computational demand and implementation complexity, which must be considered based on application scenarios and resource availability.

A. Addressing Misclassification in Deep Learning-Based Malware Detection

In the context of ransomware detection, incorrect predictions—particularly false positives and false negatives—pose substantial risks. Recent research has therefore emphasized the development of more resilient detection algorithms and post hoc correction techniques. Strengthening model generalization is viewed as a key solution, especially for emerging threats like zero-day or polymorphic ransomware. The following summarizes three widely adopted approaches to reducing such prediction inaccuracies:

Method 1: Enhancing Model Classification Capability through Semantic Embedding

Improving classification performance in malware detection can be effectively achieved by increasing model complexity through techniques such as ZSL with semantic embedding. This method incorporates category-related textual semantics as auxiliary information, enabling the model to measure semantic similarities between categories. By jointly leveraging semantic embeddings and visual features, the model can more accurately distinguish between known and previously unseen ransomware families.

A key advantage of this approach is the enhancement of classification accuracy. Semantic embedding facilitates a deeper understanding of inter-class relationships, thereby improving the model’s ability to recognize complex or variant ransomware behaviors. Moreover, the inclusion of semantic knowledge significantly boosts the model’s generalization capability, allowing it to effectively detect unseen or zero-day malware without prior exposure.

However, this method also has notable limitations. One primary drawback is the increased computational complexity; the integration of high-dimensional semantic features requires more intensive computing resources. In addition, the model’s effectiveness is highly dependent on the availability and quality of auxiliary information. Collecting and curating rich, category-specific semantic data can be time-consuming and resource-intensive, posing challenges for scalability and practical deployment.

Method 2: Utilizing Diverse Training Data with Augmentation Techniques

A fundamental strategy for enhancing model effectiveness in ransomware detection involves incorporating a diverse set of training samples, representing a wide range of ransomware variants. To address class imbalance, data augmentation techniques such as the Synthetic Minority Over-sampling Technique (SMOTE) can be employed to maintain a well-balanced data distribution. Exposure to varied threat patterns enables the model to learn discriminative features more effectively, thereby improving its capacity to distinguish between different ransomware families and enhancing overall classification performance.
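The core idea of SMOTE, interpolating between a minority-class sample and one of its nearest minority-class neighbours, can be sketched as follows. This is a simplified illustration rather than the reference algorithm or the procedure used in this study; the function name `smote_sketch`, the parameter defaults, and the 2-D feature vectors are assumptions, and the input must contain at least two samples.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical 2-D feature vectors for an under-represented family.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(minority, n_new=5)
```

Each synthetic point lies on the line segment between two real minority samples, so the augmented data stay within the region already occupied by that class.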

A key advantage of this approach is improved model robustness. Training on heterogeneous samples increases the model’s resilience and adaptability to novel or polymorphic threats. Additionally, augmenting data from underrepresented classes mitigates the risk of overfitting, ensuring a more balanced and generalizable learning process.

However, several challenges must be considered. Collecting and curating a sufficiently diverse and representative dataset is resource-intensive, often requiring significant effort in data acquisition, labeling, and preprocessing. Moreover, while augmentation techniques improve data variety, they may inadvertently introduce artifacts or distortions, which could compromise model performance if not properly validated. Thus, careful tuning and rigorous evaluation are essential to maintain data quality and model reliability.

Method 3: Incorporating Adversarial Attack Training

An effective strategy to enhance ransomware detection involves incorporating adversarial training, wherein the model is exposed to adversarially crafted samples that simulate evasion techniques used by real-world malware. By training with such examples, the model learns to recognize and resist malicious perturbations, thereby increasing its resilience against evolving cyber threats and improving its robustness in identifying sophisticated attack patterns.
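As a minimal sketch of this idea, the example below perturbs a sample with the Fast Gradient Sign Method (FGSM) and then updates the model on both the clean and the perturbed input. FGSM and the logistic-regression stand-in are assumptions chosen for brevity; the text above does not name a specific attack or model, and all weights, inputs, and step sizes are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability that x is malicious under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def bce_loss(w, b, x, y):
    """Binary cross-entropy for a single sample."""
    p = predict(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def sign(g):
    return (g > 0) - (g < 0)


def fgsm(w, b, x, y, eps):
    """Shift x by eps in the sign of the loss gradient with respect to x."""
    # For logistic regression, dLoss/dx_i = (p - y) * w_i.
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


def train_step(w, b, x, y, lr):
    """One gradient-descent update on a single (x, y) pair."""
    p = predict(w, b, x)
    w = [wi - lr * (p - y) * xi for wi, xi in zip(w, x)]
    return w, b - lr * (p - y)


# Hypothetical weights and a malicious sample (label 1).
w, b = [1.5, -2.0], 0.1
x, y = [0.8, 0.2], 1
x_adv = fgsm(w, b, x, y, eps=0.1)
loss_clean = bce_loss(w, b, x, y)
loss_adv = bce_loss(w, b, x_adv, y)

# Adversarial training: update on both the clean and the perturbed sample.
for sample in (x, x_adv):
    w, b = train_step(w, b, sample, y, lr=0.5)
loss_adv_after = bce_loss(w, b, x_adv, y)
```

The perturbed sample initially incurs a higher loss than the clean one; after the model is also trained on it, the loss on that perturbed sample decreases.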

A key advantage of this approach is its contribution to model security hardening. Models trained on adversarial samples exhibit greater resistance to manipulation, making them more reliable in real-world cybersecurity scenarios where attackers often deploy obfuscation or evasion techniques. Furthermore, this method encourages the model to learn generalizable representations, thus improving its adaptability in detecting novel or previously unseen ransomware variants.

However, this approach also presents several challenges (as summarized in Table 2). The generation and integration of adversarial examples significantly increase computational overhead, leading to higher processing and training resource demands. Moreover, the need for a large and diverse set of high-quality adversarial samples complicates the data preparation pipeline. Crafting, labeling, and validating these samples requires considerable time, expertise, and oversight to ensure their relevance and effectiveness without introducing unintended noise or bias.

B. Deep Learning-Based Misclassification in Malware Classification

Numerous studies have investigated the application of machine learning models for malware detection, with particular emphasis on misclassification challenges and adversarial robustness. Abusnaina et al. (2020) [25] proposed a fine-grained hierarchical deep learning approach tailored for malware classification in Internet of Things (IoT) environments. Their study evaluated the vulnerability of existing models to adversarial samples generated via graph embedding and augmentation techniques. To address these threats, they introduced a classifier named DL-FHMC, which demonstrated enhanced resilience, achieving a detection rate of 88.52% against adversarial malware samples.

Alam et al. (2021) [26] developed a deep learning-based malware classification framework utilizing multiple Convolutional Neural Network (CNN) architectures, including AlexNet, VGG-16, ResNet-50, and InceptionV3. Malware binaries were transformed into grayscale images and evaluated using the Malimg-dataset9010. Their best-performing model achieved a classification accuracy of 98.90% for known malware, highlighting the potential of CNNs in static malware image analysis.

Aslan and Yılmaz (2021) [27] introduced a novel deep learning framework aimed at classifying malware variants. Their architecture combined two pre-trained models to enhance classification accuracy. Evaluations conducted on the Malimg, Microsoft BIG 2015, and Malevis datasets showed that their proposed method significantly outperformed existing machine learning-based techniques, particularly in terms of accuracy and generalization.

More recently, Card, Aryal, and Gupta (2024) [28] examined the adversarial vulnerability of neural network-based malware classifiers in dynamic and online analysis environments. They trained a feedforward neural network (FFNN) to classify behavioral features and applied SHAP (SHapley Additive exPlanations) for model interpretability. Targeted adversarial attacks were then launched using the extracted explainable features. The results revealed a high evasion success rate in certain scenarios, underscoring the susceptibility of deep learning models to adversarial manipulation in real-world settings.

2.2. Zero-Shot Learning Model for Malware Detection

In the context of ZSL, two primary learning paradigms can be distinguished: Inductive Zero-Shot Learning (IZSL) and Transductive Zero-Shot Learning (TZSL).

IZSL trains the model exclusively on data from seen classes during the training phase. At inference time, the model is required to transfer its learned knowledge to classify entirely unseen classes, without access to any information from those categories during training. This setting reflects a more constrained and challenging scenario.

In contrast, TZSL leverages auxiliary information from unseen classes during training. This may include unlabeled data, semantic descriptions, or attribute vectors related to the unseen categories. By incorporating such information, TZSL facilitates a more informed learning process, allowing the model to better adapt to and recognize previously unseen classes with greater accuracy and generalization capability.

In summary, the primary distinction between the two approaches lies in their use of information from unseen categories. TZSL leverages additional data from unseen classes (such as unlabeled instances or semantic descriptors) during the training phase, whereas IZSL relies solely on labeled data from seen classes.

As a result, TZSL is potentially more effective in recognizing unseen categories, particularly in dynamic environments. However, its performance may also be influenced by the distributional characteristics of the unseen categories, which can affect the model’s generalization capability. Moreover, TZSL incorporates both visual and semantic embeddings from seen and unseen classes without requiring labeled information for the latter, enabling it to bridge the semantic gap more effectively.

System Architecture

A Zero-Shot Learning (ZSL) classifier is constructed by integrating both visual features (e.g., binary image representations of malware) and semantic features (e.g., attribute vectors or textual descriptions). These features are encoded and projected into a shared embedding space, denoted as E(I_i, T_i), where I_i represents the image-based features and T_i denotes the corresponding textual semantics.

Within this shared space, the model learns to establish meaningful correlations between visual and semantic modalities, thereby aligning representations across seen and unseen classes. This joint embedding strategy enables the classifier to generalize to previously unseen ransomware families. The overall architecture and feature fusion process are illustrated in Figure 1.

In the subsequent inference stage, the system utilizes the relational information encoded within the shared embedding space E(I_i, T_i) to predict the category of previously unseen malware images. Based on these learned associations, the model retrieves and outputs the corresponding semantic attributes. The inference process is illustrated in Figure 2.

Operational Steps for Zero-Shot Learning (ZSL) [15]

The execution process of a ZSL model typically involves the following key stages:

Dataset Construction:
A dataset comprising training categories and their corresponding attribute descriptions is first prepared. This dataset includes labeled images from seen classes along with textual descriptions representing each category’s semantic characteristics.
Attribute Description Extraction:
For each known category, attribute descriptions are extracted in textual form. These descriptions typically include human-interpretable features and behavioral characteristics, serving as the foundation for semantic reasoning.
Feature Extraction:
Visual features are extracted from training images using deep learning models (e.g., CNNs or VQ-VAE). These features are used to construct a visual embedding space that encodes the appearance or structure of malware samples.
Semantic Embedding and Alignment:
The extracted attribute descriptions are converted into semantic vectors (see Note 1), which map words or phrases into a high-dimensional semantic space. A contrastive pre-training model is then employed to jointly embed both semantic and visual features into a shared embedding space (as illustrated in Figure 1), enabling effective comparison and alignment across modalities.
Prediction and Classification:
During inference, the trained model projects the features of unseen malware samples into the shared embedding space. Based on similarity to known semantic vectors, the model predicts the most likely category for the previously unseen instance (see Figure 2).
Performance Evaluation:
The trained model is evaluated using a test dataset containing samples from unseen categories. Metrics such as classification accuracy, precision, recall, and H-mean are used to assess the model’s generalization ability under the GZSL setting.
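The alignment step in the list above can be illustrated with a symmetric contrastive loss over paired image and text embeddings. This is a minimal sketch under the assumption of a CLIP-style objective with cosine similarity and a temperature parameter; the source does not specify the exact contrastive formulation, and the function names and 2-D embeddings are hypothetical.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def cross_entropy_row(logits, target):
    """Softmax cross-entropy for one row, using a stable log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[target]


def contrastive_loss(image_emb, text_emb, temperature=0.1):
    """Symmetric contrastive loss: image i is the positive match of text i."""
    img = [normalize(v) for v in image_emb]
    txt = [normalize(v) for v in text_emb]
    n = len(img)
    # sim[i][j]: cosine similarity of image i and text j, divided by temperature.
    sim = [[sum(a * b for a, b in zip(img[i], txt[j])) / temperature
            for j in range(n)] for i in range(n)]
    img_to_txt = sum(cross_entropy_row(sim[i], i) for i in range(n)) / n
    txt_to_img = sum(
        cross_entropy_row([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return (img_to_txt + txt_to_img) / 2


# Hypothetical 2-D embeddings for three malware families.
images = [[1.0, 0.1], [0.1, 1.0], [-1.0, 0.2]]
texts_aligned = [[0.9, 0.0], [0.0, 0.8], [-0.7, 0.1]]
texts_shuffled = [texts_aligned[1], texts_aligned[2], texts_aligned[0]]

loss_aligned = contrastive_loss(images, texts_aligned)
loss_shuffled = contrastive_loss(images, texts_shuffled)
```

Minimizing such a loss pulls each image embedding toward its own textual description and away from the descriptions of other families, which is what makes similarity-based prediction in the shared space possible.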

Note 1:

Semantic embedding is a foundational concept in Natural Language Processing (NLP) that involves mapping words or phrases into numerical vector representations. Tools such as Word2Vec, developed by Google in 2013, are widely used to generate these embeddings. Word2Vec learns semantic relationships by analyzing co-occurrence patterns in large text corpora, based on the assumption that words appearing in similar contexts tend to have similar meanings.

Mathematical model of TZSL-based classifier [15,19]

Variable Definitions

X: Feature space, typically consisting of feature vectors extracted by the model.

Y: Label space, containing all possible class labels.

Y_seen: The set of classes encountered during the training phase.

Y_unseen: The set of classes not seen during training, where Y_seen ∩ Y_unseen = ∅.

S: Semantic space, containing descriptive information about categories, such as attribute vectors or textual descriptions.

f: X→S: A mapping function from the feature space to the semantic space.

Feature Extraction
For each training sample x ∈ X, extract its feature vector f(x).
Loss Function
A loss function, such as cross-entropy loss, is typically used to optimize the mapping function f.
Mapping Learning
Learn a mapping function f such that, for each sample x of a known class y ∈ Y_seen, the feature vector f(x) is accurately mapped to the corresponding semantic representation s ∈ S.
Class Prediction
For a sample x′ from an unseen class, extract its feature f(x′), find the closest semantic representation s′ in the semantic space S, and map it to the corresponding class in Y_unseen, i.e., $f: X' \overset{s'}{\to} Y_{unseen}$.
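The class prediction step can be read as a nearest-neighbour search in the semantic space, y′ = argmin over y ∈ Y_unseen of ‖f(x′) − s_y‖. The sketch below illustrates this under the assumption of Euclidean distance; the function name `predict_unseen`, the family labels, and the attribute vectors are hypothetical, and `f_x_prime` stands in for the output of the learned mapping f.

```python
import math


def predict_unseen(projected, unseen_prototypes):
    """Return the unseen class whose semantic vector is closest to f(x')."""
    return min(
        unseen_prototypes,
        key=lambda label: math.dist(projected, unseen_prototypes[label]),
    )


# Hypothetical semantic vectors s for two unseen families; the attribute
# order (encrypts files, scans network, edits registry) is illustrative.
unseen_prototypes = {
    "family_A": [1.0, 0.0, 1.0],
    "family_B": [1.0, 1.0, 0.0],
}
f_x_prime = [0.9, 0.2, 0.8]  # stand-in for the learned projection f(x')
label = predict_unseen(f_x_prime, unseen_prototypes)
```

Here the projected sample is closest to the semantic vector of `family_A`, so that label is returned even though no labeled sample of the family was used in training.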

2.3. Vector Quantized Variational Autoencoder Model

The autoencoder architecture, popularized for deep networks by Hinton and Salakhutdinov in 2006, has had a profound impact on the advancement of deep learning. This neural network model is designed to compress high-dimensional input data into a lower-dimensional latent representation, capturing the most salient features, and then reconstructing the original input from this compressed form.

An autoencoder consists of two primary components: an encoder and a decoder. The encoder transforms the input x into a lower-dimensional latent vector z, effectively summarizing the essential characteristics of the data. The decoder then attempts to reconstruct the original input x from the latent representation z, ensuring that critical information is retained during the compression process. These latent vectors serve as compact, informative representations that facilitate downstream tasks such as dimensionality reduction, feature extraction, and anomaly detection.

In 2017, researchers at DeepMind proposed the VQ-VAE, a variant of the standard variational autoencoder that incorporates vector quantization (VQ) techniques [30]. Unlike traditional VAEs, which rely on continuous latent variables, the VQ-VAE employs a discretized latent space, in which continuous encoder outputs are quantized to the nearest entry in a learned codebook. This discrete representation enables the model to capture more structured and interpretable features, making it particularly useful for tasks involving symbolic representations and data compression.

Key Components and Process

Encoder: The input vector x is processed by the encoder to produce a latent vector $z_{e}$ .
Codebook Search: The model searches a learned codebook, which represents the discrete latent space, to find the latent vector $z_{c}$ that is closest to $z_{e}$ . This is achieved by identifying the code vector $e_{k}$ with the minimum Euclidean distance to $z_{e}$ .
Decoder: The selected latent vector $z_{c}$ is then fed into the decoder to reconstruct the input data, denoted as $\hat{x}$ .
Loss Function: The reconstruction error is measured using a loss function, such as entropy, between the original input x and its reconstructed version $\hat{x}$ .

The process can be summarized as follows:

$$z_{e} = \mathrm{encoder}(x),$$

(1)

$$z_{c} = e_{k}, \quad k = \arg\min_{i} \| z_{e} - e_{i} \|_{2}$$

(2)

$$\hat{x} = \mathrm{decoder}(z_{c})$$

(3)

$$\mathrm{Loss} = \mathrm{Entropy}(x, \hat{x})$$

(4)
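The codebook search in Equation (2) can be illustrated with a minimal sketch. The function name `quantize` and the three-entry codebook are hypothetical values chosen for illustration; the encoder and decoder networks are omitted, and this is not the implementation used in the study.

```python
import math

def quantize(z_e, codebook):
    """Return (k, z_c): the index and entry of the codebook vector nearest to z_e."""
    # Euclidean distance from the encoder output to every codebook entry
    distances = [math.dist(z_e, e_i) for e_i in codebook]
    k = min(range(len(codebook)), key=distances.__getitem__)
    return k, codebook[k]

# Hypothetical 2-dimensional codebook with three entries
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
k, z_c = quantize([0.9, 1.2], codebook)
print(k, z_c)
```

In a trained VQ-VAE the codebook entries are learned jointly with the encoder and decoder; only the nearest-neighbour lookup is shown here.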

3. A ZSL-Based Classification Model for Malware Detection

3.1. Architecture Design for ZSL Models in Malware Identification

This study proposes a network malware classification architecture (as illustrated in Figure 3) specifically designed to detect novel malware variants and emerging threats. The system is structured to support four key functional components:

1. Malware Feature Database Construction—extraction and organization of representative features from malware samples to create a structured feature repository.

2. Ontology-Based Concept Similarity Analysis—application of semantic analysis techniques to evaluate conceptual relationships between known and unknown malware based on an ontology framework.

3. Codebook Matching and Vector Quantization—utilization of a codebook-based mechanism to perform discrete latent encoding and efficient feature lookup.

4. Malware Classification—integration of visual and semantic features within a ZSL framework to accurately classify both seen and unseen malware families.

In recent years, several studies have explored the application of Zero-Shot Learning (ZSL) for malware and vulnerability detection, including Malware-SMELL by Barros et al. (2022) [30] and Zero-Vuln by Sawadogo et al. (2023) [31]. While these works demonstrate the potential of ZSL in cybersecurity, notable gaps remain in feature representation, semantic integration, and model learning strategies.

In terms of feature encoding, Malware-SMELL employed ontology-based semantic vectors, whereas Zero-Vuln combined deep neural networks with few-shot samples for behavior modeling. In contrast, this study introduces a novel VQ-VAE-based encoder that transforms malware binaries into images and compresses them into discrete latent vectors, offering dimensionality reduction, semantic retention, and enhanced minority class recognition.

For semantic integration, prior studies relied on static attributes (e.g., CVE summaries). This work innovatively adopts a semantic graph to model attribute sharing and conceptual distances across ransomware families, enabling scalable reasoning within the ZSL inference process.

From a learning strategy perspective, this study departs from traditional inductive pipelines by employing a TZSL framework, which jointly leverages seen and unseen class information during training. Finally, the evaluation incorporates GZSL metrics, including H-mean, to better assess model generalization across both known and novel ransomware variants.

The detailed flowchart of the proposed model for malware classification is presented in Figure 4. As illustrated, the system incorporates a knowledge concept map constructed from API call behaviors, which provides a structured semantic representation of malware families. Additionally, concept lattices are employed to visualize and hierarchically organize behavioral attributes of ransomware families, thereby facilitating enhanced interpretability and more effective feature extraction.

To optimize performance, the VQ-VAE model parameters were fine-tuned using a grid search approach, aiming to minimize convergence loss during the training process.

Figure 4 outlines the operational phases of the proposed TZSL model, which consists of the following components:

(i): Data preprocessing;
(ii): Ontology construction;
(iii): Semantic knowledge concept mapping;
(iv): Model training and optimization;
(v): Model validation.

3.2. Knowledge Graph Construction and Inference on the Evolution of Ransomware Threats

This section outlines the methodology for constructing the ontology-driven knowledge graph, emphasizing the quantification of conceptual similarity between nodes and the assignment of semantic edge weights.

Step 1: Node Representation

Each node in the knowledge graph corresponds to either a ransomware family or a behavioral attribute. These nodes are encoded as semantic vectors, which are derived from ontological descriptions and associated malware features. This representation enables structured and interpretable modeling of malware semantics.

Step 2: Similarity Computation Using Cosine Similarity

To measure the conceptual similarity between nodes, cosine similarity is applied to their semantic vectors. Cosine similarity calculates the cosine of the angle between two vectors in a high-dimensional space. In general the value lies in [−1, 1]; for semantic vectors with non-negative components it yields a continuous similarity score ranging from 0 (no similarity) to 1 (complete similarity). This metric effectively captures the semantic alignment between ransomware families or behavioral attributes.

The similarity Sim(A,B) between nodes A and B is computed using the following formula:

$$\mathrm{Sim}(A, B) = \frac{v_{A} \cdot v_{B}}{\| v_{A} \| \, \| v_{B} \|}$$

(5)

where $v_{A}$ and $v_{B}$ are the semantic vectors of nodes A and B, respectively.

Step 3: Edge Weight Assignment

Edges in the knowledge graph are assigned continuous weights based on the computed cosine similarity scores, with values ranging from 0 to 1. These weights quantify the semantic strength of the relationship between connected nodes, enabling a more nuanced representation compared to binary edge assignment (i.e., the presence or absence of a connection).

Step 4: Edge Construction Based on Shared Attributes

An edge is established between two nodes if their similarity score exceeds a predefined threshold, indicating a significant degree of shared behavioral characteristics or attribute overlap. This thresholding mechanism ensures that only semantically meaningful connections are retained in the graph, reducing noise and enhancing interpretability.

Step 5: Pseudocode for Edge Construction

To formally describe the edge construction process, the following pseudocode outlines the algorithm used to generate weighted edges between semantically similar nodes:

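# nodes: list of semantic vectors (NumPy arrays) representing ransomware families or attributes
# threshold: minimum cosine similarity required to create an edge
import numpy as np

def cosine_similarity(u, v):
    # Cosine of the angle between u and v, as in Equation (5)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

edges = []  # weighted edges stored as (i, j, weight)
for i in range(len(nodes)):
    for j in range(i + 1, len(nodes)):
        similarity = cosine_similarity(nodes[i], nodes[j])
        if similarity >= threshold:
            edges.append((i, j, similarity))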

A Knowledge Graph for Inferring the Evolution of Ransomware Threats

In response to the increasing diversity and rapid evolution of ransomware attacks, traditional supervised learning models—which rely heavily on large volumes of labeled data—often struggle to adapt to newly emerging or mutated ransomware families. To address this limitation, the integration of knowledge graph-based reasoning and zero-shot inference offers a promising solution.

Knowledge graphs allow for the structured representation of behavioral characteristics and semantic attributes across ransomware families. By leveraging node similarity and shared attribute inference, the model can uncover latent semantic relationships, facilitating the interpretation of previously unlabeled samples. Meanwhile, zero-shot inference enables the classification of unseen categories by utilizing semantic embeddings learned from known classes, eliminating the need for extensive training data. The combination of these techniques significantly enhances the model’s ability to identify novel or variant ransomware, reduces false positives, and enables a more flexible and forward-looking detection mechanism. This is particularly beneficial in cybersecurity contexts characterized by zero-day threats and sparsely labeled data environments.

3.3. Mathematical Model of TZSL for Malware Classification

The mathematical model for ZSL in malware threat classification incorporates feature representation, attribute-based classification, and similarity measurement to effectively identify unseen malware variants. In the context of malware threat classification, TZSL can be mathematically modeled using several key components [15,16,17].

To enhance clarity and interpretability, the variable definitions used throughout the ZSL mathematical formulation are summarized as follows (Table 3):

A.: Feature space representation. Each malware sample x is represented as a feature vector $f(x) \in R^{n}$, where n is the number of features extracted from the malware using a sandbox (e.g., API calls, opcode sequences, or static analysis features). The training set consists of labeled instances from known classes (seen classes), while the test set includes instances from unknown classes (unseen classes). Let $C_{s}$ denote the set of seen classes and $C_{u}$ denote the set of unseen classes.
B.: Attribute-based representation. Each class can be described by a set of attributes. For instance, malware families can be characterized by attributes such as encrypted files, spread through e-mail, or disguised as legitimate software. These attributes form a semantic space A, where each class c∈C is associated with a vector of attributes $a_{c} \in R^{n}$.
C.: Learning mechanism. The model learns to predict attributes for unseen classes based on their feature representations. This can be expressed as follows:

$$f(x) = a_{p}$$

(6)

where f is the function mapping feature vectors to attribute vectors.

The similarity between the predicted attribute vector and the attribute vectors of unseen classes can be computed using the cosine similarity measure Sim:

$$\mathrm{Sim}(a_{p}, a_{u}) = \frac{a_{p} \cdot a_{u}}{\| a_{p} \| \, \| a_{u} \|}$$

(7)

D.: Classification decision. The predicted class for an unseen instance is determined by selecting the class with the highest similarity score:

$$c_{p} = \arg\max_{c \in C_{u}} \mathrm{Sim}(a_{p}, a_{u})$$

(8)

E.: Training and evaluation. The model can be trained using a loss function that minimizes the difference between predicted and actual attributes for seen classes, often using cross-entropy loss:

$$L = - \sum_{i = 1}^{n} y_{i} \cdot \log (y_{i})$$

(9)

where $y_{i}$ is the classification error for feature i between the predicted attribute $a_{p}$ and the actual attribute $a_{u}$, i.e.,

$$y_{i} = a_{p}(i) - a_{u}(i)$$

(10)

F.: Evaluation Metrics. The performance of the ZSL model can be evaluated using metrics such as accuracy, precision, recall, F1-score, and ROC curve analysis.
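The classification decision in Equations (7) and (8) can be sketched as follows. The family names, the three binary attributes, and the predicted vector `a_p` are hypothetical values chosen for illustration; the mapping function f that produces `a_p` is assumed to be already trained and is not shown.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v, as in Equation (7)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def predict_class(a_p, class_attributes):
    """Return the class whose attribute vector is most similar to a_p, as in Equation (8)."""
    return max(class_attributes, key=lambda c: cosine_similarity(a_p, class_attributes[c]))

# Hypothetical attributes: [encrypts files, spreads via e-mail, disguised as legitimate software]
unseen_attributes = {
    "family_A": [1.0, 0.0, 1.0],
    "family_B": [0.0, 1.0, 1.0],
}
a_p = [0.9, 0.2, 0.7]  # hypothetical predicted attribute vector f(x) for one sample
print(predict_class(a_p, unseen_attributes))
```

Because the decision depends only on the class attribute vectors, a new family can be added to `unseen_attributes` without retraining the mapping function.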

3.4. Generalized Zero-Shot Learning Model

Traditional ZSL assumes that all test instances belong exclusively to unseen classes—categories that are entirely absent during the training phase. While this assumption is useful for theoretical validation, it is misaligned with real-world applications. In practical scenarios such as malware detection, object recognition, or medical diagnostics, test samples frequently originate from a mixture of seen and unseen classes. This results in a critical limitation: standard ZSL models tend to be biased toward seen classes, as their training is constrained to only those categories.

To address this issue, the concept of GZSL was formally introduced by Chao et al. (2016) [32]. Their work critically examined the limitations of conventional ZSL frameworks and proposed a generalized evaluation protocol that more accurately reflects real deployment environments. GZSL enables the model to classify both seen and unseen categories during inference, thereby improving its ability to generalize across the entire label space.

Unlike standard ZSL, which restricts evaluation to unseen classes only, GZSL adopts a more realistic setting where both known and novel categories are present at test time. The model must determine, without prior knowledge of class membership, whether an input sample belongs to a previously seen or unseen category, making GZSL especially valuable for tasks such as zero-day malware classification or the identification of emerging entities in dynamic domains.

The typical GZSL workflow includes the following [33]:

Semantic space embedding: Each class (seen or unseen) is represented by a semantic vector, such as an attribute vector, word embedding, or ontology-based descriptor.
Visual-semantic mapping: A mapping function is trained to project input features (e.g., images, behavior logs) into the semantic space.
Inference: During testing, a sample is mapped to the semantic space and classified into the class whose semantic representation is most similar.

This evaluation setting compels the model to generalize beyond the training categories while avoiding overfitting to the seen classes. Given that models typically perform significantly better on seen categories, an H-mean is employed as a balanced metric to assess generalization performance across both seen and unseen classes. The H-mean mitigates the bias by penalizing large discrepancies between the two accuracy scores, offering a more representative evaluation of the model’s real-world applicability as follows:

$$H\text{-}mean = \frac{2 \times (Acc_{seen} \times Acc_{unseen})}{Acc_{seen} + Acc_{unseen}},$$

(11)

where Acc_seen is the average classification accuracy on seen classes, and Acc_unseen is the average classification accuracy on unseen classes.
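As a small illustration of Equation (11), the sketch below computes the H-mean for two hypothetical accuracy pairs; the numbers are not results from this study. With accuracies of 0.95 and 0.40 the arithmetic mean is 0.675, whereas the H-mean is about 0.563, which shows how the metric penalizes a gap between seen and unseen performance.

```python
def h_mean(acc_seen, acc_unseen):
    """Harmonic mean of seen and unseen accuracy, as in Equation (11)."""
    if acc_seen + acc_unseen == 0:
        return 0.0  # avoid division by zero when both accuracies are zero
    return 2 * acc_seen * acc_unseen / (acc_seen + acc_unseen)

# Hypothetical accuracies chosen only to illustrate the penalty on imbalance
balanced = h_mean(0.80, 0.80)
imbalanced = h_mean(0.95, 0.40)
print(round(balanced, 3), round(imbalanced, 3))
```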

Practically, the harmonic mean (H-mean) is widely used to measure the trade-off between a model’s performance on seen and unseen classes in the context of GZSL. A high H-mean indicates that the model performs well on both seen and unseen categories, which is essential for robust real-world deployment in dynamic threat environments such as malware or ransomware detection.

4. Experimental Results

This section evaluates the performance of the proposed TZSL-based model through a practical ransomware detection experiment. The experiment was carried out in a Linux-based environment, and the corresponding software dependencies are detailed in Table 4.

4.1. Case Studies

Step 1. Data preprocessing

The experimental dataset comprised 1782 ransomware samples from 12 distinct families, collected from Intezer Analyze (https://analyze.intezer.com/, accessed on 5 September 2024) [34] and the Malware Bazaar database (https://bazaar.abuse.ch/, accessed on 5 September 2024) [35] between February 2023 and March 2025, as illustrated in Figure 5. Initially, a total of 2950 samples associated with these malware families were downloaded.

After excluding files with preprocessing anomalies, a total of 1782 samples spanning 12 malware families were retained. The anomalies included (1) samples employing sandbox evasion techniques; (2) samples with insufficient API call data; and (3) samples that generated corrupted or unreadable output during analysis. Additionally, 299 benign application files were collected from the official Windows 7 update repository to serve as control samples. Of the known malware samples, 70% were allocated for training, 20% for validation, and the remaining 10% of variant samples were reserved for testing. The 12 ransomware families used in the experiments are listed in Table 5.

As shown in Table 5, the number of ransomware samples is highly imbalanced across the 12 experimental families. For instance, the WannaCry family contains only 32 samples, whereas the RedLineStealer family includes over 377 samples. Such class imbalance can adversely affect the training process and model performance, particularly for minority classes that are underrepresented.

Resampling Strategy

To mitigate this issue, the Synthetic Minority Over-sampling Technique (SMOTE) [36,37] was employed during the training phase. SMOTE generates synthetic instances by interpolating between existing samples in the minority class, thereby achieving class balance without relying on simple duplication. This method reduces model bias toward majority classes and enhances generalization across all categories. The use of SMOTE ensures that each ransomware family contributes more evenly to the learning process, thereby improving the model’s robustness and fairness in classification tasks.
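The interpolation step at the core of SMOTE can be sketched as below. This is a simplified illustration under stated assumptions, not the implementation used in the study: the function name `smote_sample`, the neighbour count `k`, and the four 2-dimensional minority vectors are hypothetical.

```python
import math
import random

def smote_sample(minority, k=2, rng=None):
    """Create one synthetic point between a random minority sample and one of its k nearest neighbours."""
    rng = rng or random.Random()
    x = rng.choice(minority)
    # Nearest neighbours of x within the minority class, excluding x itself
    neighbours = sorted((p for p in minority if p is not x), key=lambda p: math.dist(x, p))[:k]
    x_nn = rng.choice(neighbours)
    lam = rng.random()  # interpolation factor in [0, 1)
    return [xi + lam * (ni - xi) for xi, ni in zip(x, x_nn)]

# Hypothetical 2-dimensional minority-class feature vectors
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = [smote_sample(minority, k=2, rng=random.Random(seed)) for seed in range(5)]
print(synthetic)
```

Each synthetic point lies on the line segment between two real minority samples, which is why the method avoids exact duplicates.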

A.: Feature image

As illustrated in Figure 6, binary feature files can be visualized as grayscale images, where each element in the two-dimensional array corresponds to a pixel intensity ranging from 0 (black) to 255 (white) under the standard 8-bit grayscale convention. Higher values in the matrix indicate lighter shades, while lower values correspond to darker tones. In this experiment, ransomware executable files from various time periods and sources were transformed into Byteplot grayscale representations (Figure 6), which were subsequently used as input to the deep neural network.

To satisfy the input requirements of the TZSL classification model, which incorporates both semantic text and feature images, it was necessary to provide not only API-derived latent vectors but also binary image representations of malware samples. However, due to the frequent unavailability of executable malware binaries, direct image conversion was often infeasible. To address this limitation, output reports generated by Cuckoo Sandbox (in JSON format) were converted into 16 × 16 grayscale images, as illustrated in Figure 7, to fulfill the input constraints of the TZSL framework. An example visualization of the AgentTesla ransomware is presented in Figure 8.
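One possible way to turn a byte sequence into a 16 × 16 grayscale array is sketched below. The text does not specify how the JSON reports are mapped to pixels, so the zero-padding and truncation rule, the function name `bytes_to_grayscale`, and the sample JSON fragment are all assumptions made for illustration.

```python
def bytes_to_grayscale(data, width=16, height=16):
    """Map raw bytes to a height x width grid of pixel intensities (0-255), zero-padded or truncated."""
    size = width * height
    padded = list(data[:size]) + [0] * max(0, size - len(data))
    return [padded[row * width:(row + 1) * width] for row in range(height)]

# Hypothetical input: the bytes of a short JSON fragment
report = b'{"api": "NtCreateFile", "count": 3}'
image = bytes_to_grayscale(report)
print(len(image), len(image[0]), image[0][:4])
```

The resulting nested list can be passed to an image library or converted to an array before being fed to a network.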

B.: Auxiliary information: semantic descriptions

We employed Cuckoo v2.0.7 alongside a Python script tool named ‘Fuckoo’ to generate behavioral features in JSON format, which were subsequently stored as datasets in CSV format (Figure 9). These records encompass data from both normal applications and 35 ransomware families. The API calls captured by the Cuckoo Sandbox then serve as auxiliary input information for the ZSL model.

Step 2. Ontology construction

API call names were extracted from all JSON-formatted samples and analyzed using n-gram statistical methods. The frequency of each API across the dataset was computed, and a filtering threshold was applied (e.g., inclusion in more than 10 samples). Based on this criterion, a total of 274 representative API calls were selected through n-gram frequency analysis.
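The frequency-based filtering can be sketched for the unigram case as follows. The function name `select_frequent_apis`, the three sample traces, and the lowered threshold are hypothetical; the sketch counts in how many samples each API name appears and keeps those above the threshold.

```python
from collections import Counter

def select_frequent_apis(samples, min_samples=10):
    """Keep API names that occur in more than `min_samples` distinct samples."""
    document_frequency = Counter()
    for api_calls in samples:
        document_frequency.update(set(api_calls))  # count each API once per sample
    return sorted(api for api, n in document_frequency.items() if n > min_samples)

# Hypothetical API-call traces; a lower threshold is used for this tiny example
samples = [
    ["NtCreateFile", "NtWriteFile", "NtClose"],
    ["NtCreateFile", "RegOpenKeyExW"],
    ["NtCreateFile", "NtWriteFile", "NtWriteFile"],
]
print(select_frequent_apis(samples, min_samples=1))
```

Longer n-grams could be handled the same way by counting tuples of consecutive API names instead of single names.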

Data Visualization

To illustrate the behavioral relationships among API calls, a call graph was generated using the matplotlib.pyplot library (3.5.3), as shown in Figure 10. This visualization highlights the associations between Windows API functions invoked by ransomware during execution.

Step 3. Knowledge Concept Map

In this step, the goal is to clarify hierarchical relationships—such as superclass, subclass, and sibling class associations—among semantic categories. Behavioral characteristics extracted from malware samples are input into the Protégé ontology editor to construct a feature-based knowledge representation. These data are subsequently exported in XML format to ensure compatibility with the Concept Explorer tool, which is used to generate concept lattices that visualize semantic relationships among ransomware families.

Within the Protégé environment, class hierarchies define object properties, inter-class relationships, attribute constraints, and the shared semantics of various feature types. As illustrated in Figure 11 and summarized in Table 6, the resulting concept lattice provides a structured and hierarchical representation that enhances semantic understanding and supports the systematic analysis of behavioral traits exhibited by different ransomware families.

The generated ontology, visualized through concept lattices, facilitates the exploration of complex attribute structures and the identification of meaningful relationships within the dataset. Concept lattices are widely utilized in knowledge representation and data mining research because they model intricate semantic dependencies in a clear and interpretable manner. By leveraging such visualizations, researchers can uncover latent patterns and interdependencies, thereby improving the interpretability and analytical depth of the data.

Step 4: Construction of the Malware Codebook

The construction of the malware codebook, based on the Vector Quantized Variational Autoencoder (VQ-VAE) model, involved encoding all 274 API behavioral features associated with ransomware families. These features were organized as row entries (C1–C274) within the malware concept corpus. Following the VQ-VAE methodology, continuous feature spaces were transformed into a set of finite, discrete latent representations. Each API feature was encoded as a 768-dimensional discrete vector, which constituted the column vectors of the resulting codebook. This transformation enabled the normalization of feature distributions across the dataset.

Specifically, Figure 12 represents a 768 × 1782 codebook matrix, in which each of the 1782 malware samples, described by its set of 274 APIs, is embedded as a 768-dimensional semantic vector. The API names were semantically embedded using a BERT-based language model, and each column index (from 0 to 767) corresponds to one semantic dimension of the API embedding, forming the structure of the codebook.

Step 5. Model training and optimization

As shown in Figure 4, the optimal parameters of the VQ-VAE models were determined using a grid search strategy to minimize convergence loss during the training process. Grid search [15] is a systematic method for exhaustively exploring a predefined hyperparameter space to identify the best combination of hyperparameters for a given model.

The basic steps are as follows: First, the hyperparameter space must be defined, including the parameters to be optimized and their possible value ranges. For example, to optimize the regularization parameter α, one might define the search space as α∈{0.001, 0.01, 0.1}.

During training, the objective is to minimize the total loss function L(θ), where θ represents the trainable parameters of the TZSL model. The loss function incorporates a cross-entropy component and a regularization term to mitigate overfitting:

$$L(\theta) = - \frac{1}{N} \sum_{i = 1}^{N} y_{i} \cdot \log (y_{i}) + \alpha \cdot R(\theta),$$

(12)

where N is the number of samples and $y_{i}$ is the classification error between the predicted attribute $a_{p}$ and the actual attribute $a_{u}$. The term R(θ) denotes the regularization function (e.g., L₂ norm) used to prevent overfitting. The coefficient α balances the trade-off between model fitting and regularization.

In Equation (12), the total loss function L(θ) integrates the cross-entropy loss with a regularization term weighted by the coefficient α. The value of α is selected through grid search over the predefined range α ∈ {0.001, 0.01, 0.1}.

By exhaustively evaluating all combinations of hyperparameters and computing their respective performance metrics, the optimal configuration—i.e., the one yielding the best overall performance—was selected. The finalized architecture parameters are presented in Table 7.
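The exhaustive search described above can be sketched as follows. Only the α range {0.001, 0.01, 0.1} comes from the text; the `learning_rate` values, the function names, and the stand-in `validation_loss` surface are hypothetical and replace the actual training of the VQ-VAE model.

```python
from itertools import product

def grid_search(param_grid, evaluate):
    """Evaluate every hyperparameter combination and return (best_params, best_loss)."""
    names = list(param_grid)
    best_params, best_loss = None, float("inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        loss = evaluate(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss

# alpha range from the text; learning-rate values are hypothetical
param_grid = {"alpha": [0.001, 0.01, 0.1], "learning_rate": [0.0001, 0.001]}

def validation_loss(params):
    # Stand-in for training the model and measuring its validation loss
    return (params["alpha"] - 0.01) ** 2 + (params["learning_rate"] - 0.001) ** 2

best_params, best_loss = grid_search(param_grid, validation_loss)
print(best_params, best_loss)
```

The cost grows with the product of the value counts per hyperparameter, so the grid is usually kept small, as with the three α values here.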

Step 6. Model Validation

In the context of ransomware classification research, it is critically important to distinguish between GZSL and conventional machine learning classification paradigms. Traditional classification assumes that all ransomware families are observed during the training phase; consequently, the model is only expected to classify samples belonging to previously seen categories. This approach is appropriate for static environments where the dataset is complete and fully labeled.

However, in real-world scenarios, ransomware evolves rapidly, and zero-day variants frequently emerge without any prior labeled instances—rendering conventional models ineffective for timely and accurate threat detection.

In contrast, GZSL reflects a more realistic and challenging setting, where the test set comprises both seen and unseen classes. The model must leverage semantic knowledge to infer labels for previously unseen samples. This evaluation paradigm not only assesses the generalization capability of the model but also better aligns with practical cybersecurity requirements for identifying novel threats.

Therefore, distinguishing between GZSL and machine learning experimental designs is essential for accurately evaluating a model’s robustness and real-world applicability in dynamic threat landscapes. A comparative summary of these two paradigms is presented in Table 8.

Two case studies were conducted for model validation: a conventional machine learning classification experiment and a GZSL classification experiment, as described below.

Case I: Conventional Machine Learning Classification Experiment

The experiment was further divided into two parts: binary classification and multiclass classification.

Binary classification

The experimental dataset consists of ransomware and benign samples, encompassing 12 ransomware families (including known strains and their variants) and one benign category, for a total of 13 classes. In this setting, 10% of previously unseen ransomware samples—without labeled categories—were used as the test set to evaluate the performance of the VQ-VAE-based model. Evaluation metrics included accuracy, recall, and F1-score.

The results of the binary classification experiment are summarized in Figures 13–16 and Table 9.

Table 9 summarizes the performance metrics of the VQ-VAE binary classification model. During training, the model achieved an accuracy of 98.0% and a validation accuracy of 97.0%, with both precision and recall ranging from 98% to 99%. In the testing phase, the model maintained a high accuracy of 95.0%, with a slightly reduced precision of 92.0%, but achieved a perfect recall of 100.0%, resulting in an F1-score of 96.0%. These results highlight the model’s strong generalization capability and its effectiveness in accurately detecting ransomware samples.
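The reported test-phase F1-score can be checked against the reported precision and recall using the standard definition F1 = 2PR/(P + R). The short sketch below performs this arithmetic; the function name `f1_score` is chosen for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Test-phase precision (92.0%) and recall (100.0%) reported for the binary classifier
f1 = f1_score(0.92, 1.00)
print(f"{f1:.3f}")
```

The value is approximately 0.958, which rounds to the reported 96.0%.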

Multiclass Classification

The multiclass classification experiment utilized a dataset containing both ransomware and benign samples, comprising 12 ransomware families (including known variants) and one benign class, for a total of 13 categories. In this experiment, 10% of the unseen ransomware samples—those without labeled family categories—were designated as the test set to evaluate the performance of the VQ-VAE model. Evaluation metrics included accuracy, recall, and F1-score. The results of the multiclass classification are illustrated in Figure 17 and Figure 18.

A single ROC curve without class-specific delineation significantly limits the interpretability of the model’s performance in a multiclass classification context. To address this limitation, we incorporated per-family ROC curves and their corresponding Area Under the Curve (AUC) values. In addition, both macro-averaged and micro-averaged ROC metrics were computed to provide comprehensive insights into both class-wise and overall model performance. These results are summarized in Table 10.

Table 10 presents the ROC-AUC scores for each ransomware and benign family under the multiclass classification setting. The results indicate strong discriminative performance across both seen and unseen classes. Notably, families such as Dharma, Conti, and Chaos achieved perfect AUC scores of 1.

The inclusion of both macro-averaged and micro-averaged ROC metrics provides a comprehensive assessment of the model’s performance. Macro-averaging, which treats all classes equally, reflects the model’s balanced capability across families and yielded an AUC of 0.995. In contrast, micro-averaging—based on the aggregation of all prediction instances—emphasizes overall classification accuracy and produces an AUC of 0.996.
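The difference between the two averaging schemes can be made concrete with a small one-vs-rest sketch. The labels and class probabilities below are hypothetical, and the AUC is computed with the pairwise ranking definition rather than with any particular library, so this is an illustration and not the evaluation code used in the study.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_micro_auc(y_true, y_score, n_classes):
    """One-vs-rest AUC averaged per class (macro) and pooled over all decisions (micro)."""
    one_hot = [[int(y == c) for c in range(n_classes)] for y in y_true]
    per_class = [
        binary_auc([row[c] for row in one_hot], [row[c] for row in y_score])
        for c in range(n_classes)
    ]
    macro = sum(per_class) / n_classes
    micro = binary_auc(
        [v for row in one_hot for v in row],
        [v for row in y_score for v in row],
    )
    return per_class, macro, micro

# Hypothetical class probabilities for six samples and three classes
y_true = [0, 0, 1, 1, 2, 2]
y_score = [
    [0.7, 0.2, 0.1],
    [0.5, 0.4, 0.1],
    [0.2, 0.6, 0.2],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
    [0.2, 0.3, 0.5],
]
per_class, macro, micro = macro_micro_auc(y_true, y_score, 3)
print(per_class, round(macro, 3), round(micro, 3))
```

Macro-averaging gives each class the same weight regardless of its size, while micro-averaging pools every sample-class decision, so large classes dominate the micro value.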

These findings demonstrate that the proposed model delivers robust performance and generalization, even for previously unseen ransomware families within the GZSL framework. The consistent per-family AUC values underscore the effectiveness of semantic-guided feature learning in detecting novel and evolving threats.

Table 11 further highlights the VQ-VAE model’s per-family classification effectiveness during the testing phase. The results validate its strong capability in distinguishing individual ransomware families and demonstrate its reliability in real-world threat detection scenarios.

Table 12 illustrates the strong performance of the VQ-VAE model in the multiclass classification setting. During the training phase, the model achieved high accuracy (94.9%), along with balanced precision, recall, and F1-score values. Similar results were observed during the validation and testing phases (both at 93.5%), indicating a strong generalization capability and minimal risk of overfitting. These consistently high evaluation metrics underscore the robustness and effectiveness of the model in handling unseen data across multiple ransomware classes.

4.2. Performance Comparison

To establish a clearer benchmark for evaluating the proposed VQ-VAE-based Zero-Shot Learning (ZSL) approach, we conducted a comparative performance analysis using several state-of-the-art deep learning models. Specifically, ResNet-50, VGG-16, and AlexNet were evaluated under the same multiclass classification setting, using identical datasets and evaluation metrics.

Table 13 presents the per-family classification performance of the ResNet-50 model, reporting accuracy, precision, recall, F1-score, and AUC values. The results indicate that ResNet-50 achieves high AUC scores for most families (e.g., 0.998 for Conti, 0.9998 for Chaos), reflecting strong discriminative capabilities.

However, performance notably declined for certain ransomware families, including WannaCry, TrickBot, and RaccoonStealer, particularly in terms of precision and F1-score. This suggests potential limitations in ResNet-50’s ability to handle imbalanced or underrepresented classes. Furthermore, micro-averaged and macro-averaged AUC values were computed, yielding scores of 0.978 and 0.971, respectively, which further reflect the model’s overall classification capability.

Table 14 presents the comparative performance evaluation of the proposed VQ-VAE model against four baseline models: LeNet-5, ResNet-50, VGG-16, and AlexNet. The results demonstrate that the VQ-VAE model consistently outperforms traditional convolutional architectures across all key classification metrics. Specifically, VQ-VAE achieves the highest per-family accuracy (93.5%) and F1-score (93.5%), as well as macro-averaged (99.5%) and micro-averaged (99.6%) AUC values, indicating superior generalization across both seen and unseen ransomware families.

In contrast, CNN-based models such as LeNet-5 and AlexNet exhibit lower classification performance, with F1-scores of only 65.6% and 65.3%, respectively, and notably lower recall. Although VGG-16 shows relatively higher accuracy (82.3%), it is still outperformed by VQ-VAE in all metrics.

Regarding execution time, VQ-VAE maintains competitive efficiency (3.85 s) despite its advanced architecture, outperforming VGG-16 (13.01 s) and ResNet-50 (10.39 s), highlighting its practical applicability in real-time malware detection systems.

4.3. Generalized Zero-Shot Learning Experiment

To assess the generalization capability of the proposed model, a GZSL experiment was designed. In this setting, four ransomware families—WannaCry 2.0, TrickBot, Emotet, and RedLineStealer—were designated as unseen classes and were entirely excluded from the training phase. The remaining nine families were used for training and validation. During the testing phase, the model was evaluated on a mixed dataset comprising both seen and unseen samples, effectively simulating a realistic zero-day detection scenario.
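The data partition described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `gzsl_split`, the `(sample_id, family)` tuple layout, and the `seen_test_fraction` hold-out ratio are hypothetical, since the split ratio for seen families is not specified here.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Sample = Tuple[str, str]  # (sample_id, family); hypothetical layout

UNSEEN_FAMILIES = {"WannaCry 2.0", "TrickBot", "Emotet", "RedLineStealer"}


def gzsl_split(
    samples: Iterable[Sample],
    unseen_families: Set[str],
    seen_test_fraction: float = 0.2,  # assumed hold-out ratio for seen families
) -> Tuple[List[Sample], List[Sample]]:
    """Return (train, test) where unseen families appear only in the test set."""
    by_family: Dict[str, List[str]] = defaultdict(list)
    for sample_id, family in samples:
        by_family[family].append(sample_id)

    train: List[Sample] = []
    test: List[Sample] = []
    for family, ids in by_family.items():
        if family in unseen_families:
            # Unseen families are excluded from training entirely.
            test.extend((i, family) for i in ids)
        else:
            # Seen families contribute a held-out tail to the mixed test set.
            n_test = max(1, math.ceil(len(ids) * seen_test_fraction))
            train.extend((i, family) for i in ids[:-n_test])
            test.extend((i, family) for i in ids[-n_test:])
    return train, test
```

Keeping the unseen families out of `train` while mixing them with held-out seen samples in `test` is what distinguishes the GZSL setting from a conventional train/test split.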

Table 15 presents the per-family classification results of the proposed VQ-VAE-based model under the GZSL framework. For seen families, the model demonstrates strong performance, achieving an average F1-score of 88.38% and AUC of 92.05%. Notably, Dharma, Conti, and Cerber achieved near-perfect metrics across all evaluation criteria, confirming robust recognition of well-trained families. However, precision for Chaos was lower (48.69%) despite high recall, indicating potential false-positive tendencies.
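The combination of high recall and low precision noted for Chaos arises when samples from other families are assigned to that class. The sketch below shows how per-class precision, recall, and F1-score follow from a confusion matrix; the 3-class matrix and the helper name `per_class_metrics` are hypothetical and are not taken from Figure 20.

```python
from typing import Dict, List


def per_class_metrics(
    confusion: List[List[int]], labels: List[str]
) -> Dict[str, Dict[str, float]]:
    """Compute precision, recall, and F1 per class (rows = true, columns = predicted)."""
    metrics: Dict[str, Dict[str, float]] = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fp = sum(confusion[r][k] for r in range(len(labels))) - tp  # column minus diagonal
        fn = sum(confusion[k]) - tp                                 # row minus diagonal
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Hypothetical confusion matrix: half of class B is misclassified as class C.
labels = ["A", "B", "C"]
confusion = [
    [9, 1, 0],
    [0, 5, 5],
    [0, 0, 10],
]
metrics = per_class_metrics(confusion, labels)
```

In this toy case, class C recovers all of its own samples (recall of 1.0), but the five class B samples assigned to it reduce its precision to two thirds, which mirrors the false-positive pattern described for Chaos.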

For unseen families, performance varied significantly. While TrickBot and RedLineStealer yielded reasonable F1-scores (81.0% and 75.0%, respectively), WannaCry 2.0 and Emotet performed poorly, with F1-scores of 55.0% and 48.0%, respectively. This discrepancy reflects the challenges of semantic generalization for behaviorally ambiguous or underrepresented families. The overall average F1-score for unseen classes was 64.75%, demonstrating moderate generalization capacity.

As visualized in Figure 19 (ROC curve) and Figure 20 (confusion matrix), class imbalance and feature overlap remain key factors influencing performance, underscoring the importance of continued model refinement for GZSL deployment in real-world scenarios.

Unlike standard accuracy metrics, which tend to be biased toward seen classes due to their inclusion in the training data, the harmonic mean (H-mean) provides a more balanced evaluation by jointly considering performance on both seen and unseen classes. It penalizes large disparities between the two, making it particularly well suited for assessing the effectiveness of Generalized Zero-Shot Learning (GZSL) models.

Table 16 summarizes the GZSL evaluation results of the proposed VQ-VAE-based model. The model achieves a high classification accuracy of 91.40% for seen classes, indicating its effectiveness in learning and recognizing known ransomware families. However, the accuracy for unseen classes is notably lower at 59.25%, resulting in a harmonic mean (H-mean) of 71.89%. This performance gap highlights a common challenge in GZSL settings: the model tends to overfit to seen classes due to the imbalance in training data and the lack of labeled examples for unseen categories.
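For reference, the H-mean is the harmonic mean of the seen-class and unseen-class accuracies, H = 2 · Acc_seen · Acc_unseen / (Acc_seen + Acc_unseen). The short sketch below reproduces the value in Table 16 from the two reported accuracies; the function name `harmonic_mean` is chosen for illustration only.

```python
def harmonic_mean(acc_seen: float, acc_unseen: float) -> float:
    """Harmonic mean of seen and unseen accuracy, as used in GZSL evaluation."""
    if acc_seen + acc_unseen == 0:
        return 0.0
    return 2 * acc_seen * acc_unseen / (acc_seen + acc_unseen)


# Accuracies reported in Table 16 (percent).
h_mean = harmonic_mean(91.40, 59.25)
print(f"H-mean: {h_mean:.2f}%")
```

Because the harmonic mean is dominated by the smaller of its two arguments, the result sits well below the arithmetic mean of the two accuracies (75.33%), which is how the metric penalizes the gap between seen and unseen performance.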

The lower accuracy on unseen classes can be attributed to several factors. First, semantic representations derived from limited behavioral information may be insufficient to fully capture the unique characteristics of novel malware families. Second, unseen families often exhibit ambiguous or overlapping features with seen classes, increasing the likelihood of misclassification. Lastly, the absence of fine-tuning on unseen distributions limits the model’s adaptability. These observations suggest the need for enhanced semantic embedding quality, data augmentation strategies, or hybrid few-shot learning mechanisms to improve the detection of novel threats in real-world GZSL scenarios.

4.4. Practical Deployment Considerations

To enhance the practical applicability of the proposed framework, this section addresses key real-world implementation factors, including inference speed, memory requirements, and integration potential with existing security infrastructures. These considerations are essential for evaluating the feasibility of deploying the model in operational cybersecurity environments (see Table 17).

Inference Speed

To assess responsiveness, the average inference time was measured during the evaluation. The model processes each input sample within approximately 75–85 milliseconds on average (including image and semantic processing), indicating suitability for near real-time ransomware detection. This latency enables timely threat identification and response, which is crucial for minimizing the impact of ransomware attacks.
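A per-sample latency figure of this kind can be obtained with a simple timing loop. The sketch below is an assumed measurement harness, not the one used in this study: `measure_latency_ms` and `dummy_predict` are hypothetical names, and `dummy_predict` merely stands in for the model's inference call.

```python
import statistics
import time
from typing import Callable, Sequence, Tuple


def measure_latency_ms(
    predict: Callable[[Sequence[float]], object],
    samples: Sequence[Sequence[float]],
    warmup: int = 3,
) -> Tuple[float, float, float]:
    """Return (mean, median, worst-case) per-sample latency in milliseconds."""
    # Warm-up calls are excluded so that one-time initialization costs do not skew results.
    for sample in samples[:warmup]:
        predict(sample)

    timings = []
    for sample in samples:
        start = time.perf_counter()
        predict(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(timings), statistics.median(timings), max(timings)


def dummy_predict(features: Sequence[float]) -> int:
    """Placeholder for the real model call (hypothetical)."""
    return int(sum(features)) % 13


mean_ms, median_ms, worst_ms = measure_latency_ms(dummy_predict, [[1.0, 2.0, 3.0]] * 20)
```

Reporting the median and worst case alongside the mean helps show whether occasional slow samples would affect a near real-time deployment.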

Hardware Specifications

All experiments were conducted on a workstation equipped with an Intel Core i7-6700@3.40 GHz CPU, 32 GB of RAM, and an NVIDIA GTX 1080 GPU with approximately 3–3.5 GB of Video RAM. These hardware configurations were sufficient to support real-time inference without noticeable computational bottlenecks.

Minimum Hardware Requirements

For wider adoption in enterprise settings, the minimum system specifications required to maintain inference efficiency include at least an Intel Core i5-class CPU, 8 GB of RAM, and a GPU with a minimum of 4 GB of dedicated memory. These requirements ensure compatibility with mid-range hardware commonly found in corporate environments.

Deployment on Edge Devices

The feasibility of deploying the model on edge devices, which are often characterized by constrained computational resources, was also examined. While the current implementation is optimized for GPU-accelerated environments, ongoing work focuses on reducing memory and compute demands through model compression (e.g., pruning) and optimized inference runtimes such as TensorRT or ONNX Runtime. These efforts aim to enable effective deployment on resource-constrained platforms such as IoT gateways and endpoint security appliances, thereby extending the framework's real-world applicability.

5. Conclusions

This study proposed a TZSL model based on the VQ-VAE architecture for ransomware detection and the identification of novel or variant malware families, validated on a dataset comprising 12 ransomware families collected between 2023 and March 2025. Experimental results demonstrate that the proposed model achieves an F1-score of 93.5% in ransomware classification, outperforming conventional CNN-based baselines (LeNet-5, ResNet-50, VGG-16, and AlexNet) while exhibiting strong generalization capability across diverse ransomware categories.

The primary contributions of this work include the following: (1) the integration of semantic knowledge graphs and binary feature representations to enhance the learning process; (2) the design of a VQ-VAE-based codebook that discretizes malware behavioral features into compact latent embeddings; and (3) the application of formal concept lattice visualization to reveal hierarchical relationships among ransomware behaviors. Furthermore, a semi-supervised learning strategy was implemented, allowing the model to exploit both labeled and unlabeled samples, thereby bridging the semantic gap between seen and unseen classes.

While the proposed method shows strong generalization in controlled settings, its performance may vary across broader or evolving datasets. Additionally, the reliance on labeled semantic attributes and sandbox-derived features may limit adaptability in environments with sparse behavioral metadata. Future work will extend model testing to open-source datasets (e.g., EMBER [38], Malicia [39]) and evaluate adversarial robustness under obfuscation and concept drift scenarios.

Real-time deployment considerations, including performance profiling on different hardware configurations and lightweight model compression for edge devices, will also be examined. Lastly, comparative analyses with other state-of-the-art Zero-Shot Learning and Few-Shot Learning approaches will be conducted to further validate the performance advantages of the proposed solution.

Author Contributions

Conceptualization, P.W.; methodology, P.W.; resources, P.W.; formal analysis, H.-C.L. (Hsiao-Chung Lin); data curation, H.-C.L. (Hao-Cyuan Li); writing—original draft, P.W. and W.-H.L.; writing—review and editing, P.W.; software, H.-C.L. (Hao-Cyuan Li) and N.-Z.X.; validation, H.-C.L. (Hao-Cyuan Li) and W.-H.L.; visualization, H.-C.L. (Hsiao-Chung Lin) and N.-Z.X.; project administration, P.W.; funding acquisition, P.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Ministry of Science and Technology of Taiwan under grant no. NSTC 113-2410-H-168-003 and by the Green Energy Technology Research Center through the Featured Areas Research Center Program within the framework of the Higher Education Sprout Project of the Ministry of Education (MOE) of Taiwan under grant no. MOE 2000-109CC5-001.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Montalbano, E. Hit with ‘Severe’ LockerGoga Ransomware Attack, Norsk Hydro Press, 19 March 2019. Available online: https://securityledger.com/2019/03/norsk-hydro-hit-with-severe-lockergoga-ransomware-attack/ (accessed on 15 May 2024).
2. Wallix. What Happened in the Colonial Pipeline Ransomware Attack. Available online: https://www.wallix.com/what-happened-in-the-colonial-pipeline-ransomware-attack-2/ (accessed on 5 February 2025).
3. Abrams, L. Chemical Distributor Pays $4.4 Million to DarkSide Ransomware. Available online: https://www.bleepingcomputer.com/news/security/chemical-distributor-pays-44-million-to-darkside-ransomware/ (accessed on 7 February 2025).
4. Reuters. Toyota Suspends Domestic Factory Operations After Suspected Cyber Attack. 1 March 2022. Available online: https://www.reuters.com/business/autos-transportation/toyota-suspends-all-domestic-factory-operations-after-suspected-cyber-attack-2022-02-28/ (accessed on 7 February 2025).
5. Trend Micro. Ransomware Review: First Half of 2024. Available online: https://unit42.paloaltonetworks.com/unit-42-ransomware-leak-site-data-analysis/ (accessed on 10 September 2024).
6. Check Point Research. Global Cyber Attack Statistics for Q3 2024. 12 November 2024. Available online: https://www.informationsecurity.com.tw/article/article_detail.aspx?aid=11374 (accessed on 5 March 2025).
7. Olaimat, M.N.; Aizaini Maarof, M.; Al-rimy, B.A.S. Ransomware Anti-Analysis and Evasion Techniques: A Survey and Research Directions. In Proceedings of the 2021 3rd International Cyber Resilience Conference (CRC), Langkawi Island, Malaysia, 29–31 January 2021; pp. 1–6.
8. Beaman, C.; Barkworth, A.; Akande, T.D.; Hakak, S.; Khan, M.K. Ransomware: Recent advances, analysis, challenges and future research directions. Comput. Secur. 2021, 111, 102490.
9. Gyamfi, N.K.; Goranin, N.; Ceponis, D.; Čenys, H.A. Automated System-Level Malware Detection Using Machine Learning: A Comprehensive Review. Appl. Sci. 2023, 13, 11908.
10. Fidelis Security. Breaking Down Signature-Based Detection: A Practical Guide. Available online: https://fidelissecurity.com/threatgeek/network-security/signature-based-detection/ (accessed on 2 March 2025).
11. Chaudhary, S.; Garg, A. A Machine Learning Technique to Detect Behavior Based Malware. In Proceedings of the 2020 10th International Conference on Cloud Computing, Data Science & Engineering (Confluence), Noida, India, 29–31 January 2020; pp. 655–659.
12. Reddy, V.J.; Jyoshna, R.; Jyothi, B.; Poojari, R.J.; Kartheek, N.; Karthik, G.B.; Karthik, R. Behaviour Based Malware Detection using Machine Learning. Int. J. Res. Appl. Sci. Eng. Technol. 2023, 11, 706–7111.
13. Bensaoud, A.; Kalita, J. A Survey of Malware Detection Using Deep Learning. arXiv 2024, arXiv:2407.19153v1.
14. Wang, P.; Lin, H.C.; Chen, J.H.; Lin, W.H.; Li, H.C. Improving Cyber Defense Against Ransomware: A Generative Adversarial Networks-Based Adversarial Training Approach for Long Short-Term Memory Network Classifier. Electronics 2025, 14, 810.
15. Wang, W.; Duan, L.; En, Q.; Zhang, B. Context-sensitive Zero-shot Semantic Segmentation Model based on Meta-learning. Neurocomputing 2021, 465, 465–475.
16. Li, X.; Fang, M.; Feng, D.; Li, H.; Wu, J. Prototype Adjustment for Zero Shot Classification. Signal Process. Image Commun. 2019, 74, 242–252.
17. Wan, Z.; Chen, D.; Li, Y.; Yan, X.; Zhang, J.; Yu, Y.; Liao, J. Transductive Zero-Shot Learning with Visual Structure Constraint. arXiv 2019, arXiv:1901.01570.
18. Palatucci, M.; Pomerleau, D.; Hinton, G.; Mitchell, T.M. Zero-shot Learning with Semantic Output Codes. In Proceedings of the 22nd International Conference on Neural Information Processing Systems (NIPS’09), Vancouver, BC, Canada, 7–10 December 2009; pp. 1410–1418.
19. Hinton, G.E.; Salakhutdinov, R.R. Reducing the Dimensionality of Data with Neural Networks. Science 2006, 313, 504–507.
20. Wu, Y.; Liu, F.; Wan, L.; Wang, Z. Intelligent Fault Diagnostic Model for Industrial Equipment Based on Multimodal Knowledge Graph. IEEE Sens. J. 2023, 23, 26269–26278.
21. Han, H.; Wang, J.; Wang, X.; Chen, S. Construction and Evolution of Fault Diagnosis Knowledge Graph in Industrial Process. IEEE Trans. Instrum. Meas. 2022, 71, 1–12.
22. Peng, C.; Xia, F.; Naseriparsa, M.; Osborne, F. Knowledge Graphs: Opportunities and Challenges. Artif. Intell. Rev. 2023, 56, 13071–13102.
23. Haffar, R.; Domingo-Ferrer, J.; Sánchez, D. Explaining Misclassification and Attacks in Deep Learning via Random Forests. In Modeling Decisions for Artificial Intelligence (MDAI 2020); Torra, V., Narukawa, Y., Nin, J., Agell, N., Eds.; Lecture Notes in Computer Science 12256; Springer: Cham, Switzerland, 2020.
24. Dunmore, A.; Jang-Jaccard, J.; Sabrina, F.; Kwak, J. A Comprehensive Survey of Generative Adversarial Networks (GANs) in Cybersecurity Intrusion Detection. IEEE Access 2023, 11, 76071–76094.
25. Abusnaina, A.; Abuhamad, M.; Alasmary, H.; Anwar, A.; Jang, R.; Salem, S.; Nyang, D.; Muheisin, D. A Deep Learning-based Fine-grained Hierarchical Learning Approach for Robust Malware Classification. arXiv 2020, arXiv:2005.07145.
26. Alam, M.; Akram, A.; Saeed, T.; Arshad, S. DeepMalware: A Deep Learning based Malware Images Classification. In Proceedings of the 2021 International Conference on Cyber Warfare and Security (ICCWS), Islamabad, Pakistan, 23–25 November 2021; pp. 93–99.
27. Aslan, O.; Yılmaz, A.A. A New Malware Classification Framework Based on Deep Learning Algorithm. IEEE Access 2021, 9, 87936–87951.
28. Card, Q.; Aryal, K.; Gupta, M. Explainability-Informed Targeted Malware Misclassification. arXiv 2024, arXiv:2405.04010.
29. Tiu, E. Understanding Zero-Shot Learning—Making ML More Human. Available online: https://www.linkedin.com/pulse/understanding-zero-shot-learning-making-ml-more-human-chittibabu/ (accessed on 10 September 2024).
30. Barros, P.H.; Chagas, E.T.C.; Oliveira, L.B.; Queiroz, F.; Ramos, H.S. Malware-SMELL: A zero-shot learning strategy for detecting zero-day vulnerabilities. Comput. Secur. 2022, 120, 102785.
31. Sawadogo, Z.; Dembele, J.M.; Mendy, G.; Ouya, S. Zero-Vuln: Using deep learning and zero-shot learning techniques to detect zero-day Android malware. In Proceedings of the 2023 3rd International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME), Tenerife, Canary Islands, Spain, 19–21 July 2023.
32. Chao, W.L.; Changpinyo, S.; Gong, B.; Sha, F. An Empirical Study and Analysis of Generalized Zero-Shot Learning for Object Recognition in the Wild. In Proceedings of the Computer Vision–ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016.
33. Zhang, C.; Li, Z. Generalized zero-shot learning via discriminative and transferable disentangled representations. Neural Netw. 2025, 183, 106964.
34. Intezer. Intezer Analyze. Available online: https://analyze.intezer.com/ (accessed on 5 September 2024).
35. Malware Bazaar. Malware Bazaar Database. Available online: https://bazaar.abuse.ch/browse/ (accessed on 5 September 2024).
36. Tan, X.; Su, S.; Huang, Z.; Guo, X.; Zuo, Z.; Sun, Z.; Li, L. Wireless sensor networks intrusion detection based on SMOTE and the random forest algorithm. Sensors 2019, 19, 203.
37. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
38. Anderson, H.S.; Roth, P. EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models. arXiv 2018, arXiv:1804.04637v2.
39. Imdea Software Institute, Academia. Available online: https://software.imdea.org/~juanca/bibtex.html#driving (accessed on 17 May 2025).

Figure 1. Pre-training in the ZSL model [29].

Figure 2. ZSL model recognition for unseen categories.

Figure 3. TZSL model for malware classification.

Figure 4. Operational flow of the proposed TZSL-based malware classification model.

Figure 5. Ransomware sample collection from public malware databases.

Figure 6. Workflow for generating grayscale image representations from sandbox outputs.

Figure 7. Grayscale image generation from sandbox analysis reports of normal applications.

Figure 8. Visualization of a 16 × 16 grayscale image for the AgentTesla ransomware sample.

Figure 9. Information extracted from malware analysis using Cuckoo Sandbox.

Figure 10. API call relationship graph for AgentTesla ransomware.

Figure 11. Concept lattice illustrating hierarchical relationships among ransomware families based on shared semantic attributes.

Figure 12. API feature-based codebook generated using the VQ-VAE model (CSV format).

Figure 13. Training and validation accuracy for the VQ-VAE model (binary classification).

Figure 14. Convergence error training of the VQ-VAE model (binary classification).

Figure 15. ROC curve of the VQ-VAE model (binary classification).

Figure 16. Confusion matrix for VQ-VAE model (binary classification).

Figure 17. ROC curve of the VQ-VAE model (multiclass classification).

Figure 18. Confusion matrix for the VQ-VAE model (multiclass classification).

Figure 19. ROC curves for ransomware under the GZSL approach.

Figure 20. Confusion matrix of the VQ-VAE model on GZSL-based multiclass classification.

Table 1. Traditional techniques for malware detection.

Technique	Basic Theory	Function	Limitations
Signature-based detection [10]	Relies on known static patterns of malware retrieved from the sandbox offline to identify threats.	Fast and efficient for known threats; low false positive rate when signatures are accurate.	Ineffective against new or mutated malware that lacks signatures.
Behavioral analysis [11]	Dynamically analyzes malware behavior online via an SDN (software-defined network) controller.	This approach effectively detects and prevents infections from unknown malware through the analysis of predefined behavioral patterns configured in the SDN controller, assuming that the malware variants share behavioral features with known threats.	Relies on pattern update frequencies in SDN controllers; may be vulnerable to stealthy malware.
Machine learning [9,12]	Utilizes algorithms such as CNNs, RNNs, LSTMs, and GRUs to classify malware based on features extracted from collected binary files.	Can adapt to new threats through training; improves detection accuracy over time with sufficient labeled malware samples.	Requires large labeled datasets for training; may struggle with zero-day threats.

Table 2. Comparison of three solutions for addressing misclassification in deep learning networks.

Solution	Features	Advantages	Limitations
Increase model algorithm complexity [15,16,17,18,19]	Increases model algorithm complexity, e.g., using semantic embedding techniques.	Improves classification accuracy; enhances generalization ability.	Increased computational cost; requires more auxiliary information.
Use diverse training data [23]	Ensures training data covers various types of samples; uses sample augmentation algorithms.	Improves model robustness; reduces overfitting.	Increased data collection cost; may introduce noise.
Adversarial attack training [24]	Researches adversarial attacks and defense strategies to enhance the model’s recognition ability.	Improves model security; enhances generalization ability.	Increased computational cost; requires additional data.

Table 3. Variable Definitions for the ZSL Mathematical Model.

Symbol	Definition
$x$	A malware instance (sample) in the dataset.
$f(x) \in \mathbb{R}^{n}$	Feature vector representation of malware sample $x$, where $n$ is the feature dimension.
$C_{s}$	The set of seen classes (i.e., malware families available during training).
$C_{u}$	The set of unseen classes (i.e., malware families not present during training).
$a_{c} \in \mathbb{R}^{n}$	Attribute vector representing semantic features of class $c$.
$A$	Semantic space composed of class attribute vectors.
$f(\cdot)$	Mapping function from feature space to semantic attribute space.
$a_{p}$	Predicted attribute vector output by model for input $x$.
$a_{u}$	True attribute vector associated with an unseen class $c \in C_{u}$.
$\mathrm{Sim}(a_{p}, a_{u})$	Cosine similarity between predicted and true attribute vectors.
$c_{p}$	Predicted class label for an unseen instance.
$y_{i}$	Classification error at feature dimension $i$, i.e., $y_{i} = a_{p}(i) - a_{u}(i)$.
L	Loss function (cross-entropy) used during model training.
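To make the roles of the symbols in Table 3 concrete, the following sketch assigns an unseen-class label by comparing a predicted attribute vector with each unseen class's attribute vector using cosine similarity. It assumes that the predicted class is the one with the highest similarity; the attribute values and the helper names (`cosine_similarity`, `predict_unseen_class`) are hypothetical and serve only as an illustration.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Sim(a, b) = (a . b) / (||a|| * ||b||); returns 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_unseen_class(a_p: Sequence[float], unseen_attributes: Dict[str, Sequence[float]]) -> str:
    """Return the unseen class whose attribute vector a_u is most similar to a_p."""
    return max(unseen_attributes, key=lambda c: cosine_similarity(a_p, unseen_attributes[c]))


# Hypothetical 4-dimensional attribute vectors for two unseen families.
unseen_attributes = {
    "Emotet": [1.0, 0.0, 1.0, 0.0],
    "WannaCry 2.0": [0.0, 1.0, 1.0, 1.0],
}
a_p = [0.1, 0.8, 0.9, 0.7]  # assumed model output for one sample
c_p = predict_unseen_class(a_p, unseen_attributes)
```

Cosine similarity depends only on the direction of the vectors, so the comparison is insensitive to the overall scale of the predicted attributes.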

Table 4. Software package used in ransomware classification.

Item	Hardware/Software Installed
Host	CPU: Intel i7-6700, 3.4 GHz; RAM: 32 GB; GPU: NVIDIA GTX-2080 Ti ×2
Disassembly tools	1. PETools; 2. Exeinfo PE; and 3. Pestudio on Ubuntu v16.04
Feature library	Protégé v3.1.1
Sandbox	Cuckoo v2.0.7
Library Package	Python 3.12.4, TensorFlow 2.10.0, PyTorch 1.13.1, absl-py 0.7.1, OpenCV, NumPy 1.16.3, pandas 1.3.5, scikit-learn 1.0.2, and Matplotlib 3.5.3.

Table 5. Experiment families for ransomware.

Family	Date of First Detection	Download Date	# of Samples	Feature
Benign	N/A	02/2024	299	Clean software used for control comparison
Lokibot	06/2016	02/2023	215	Info-stealer with ransomware component; email phishing vector
Dharma	08/2016	02/2023	74	Uses RDP for infection vector; strong file encryption
Cerber	09/2016	06/2023	55	Offline encryption; voice ransom note; uses RSA + RC4
AgentTesla	06/2017	06/2023	107	.NET-based keylogger and info-stealer; spreads via malicious attachments
RaccoonStealer	04/2019	02/2024	94	Malware-as-a-Service; targets browser credentials, crypto wallets
Conti	12/2020	02/2024	37	Ransomware-as-a-Service; rapid file encryption and data exfiltration
Chaos	08/2021	02/2024	100	Fake ransomware; destroys files instead of encrypting them
Lockbit	01/2021	02/2024	67	Autonomous spreading via the Active Directory; fast encryption
TrickBot	10/2016	02/2025	242	Modular banking Trojan; used for later ransomware delivery (e.g., Ryuk, Conti)
Emotet	03/2017	02/2025	83	Email-based Trojan dropper; often delivers TrickBot or Ryuk
WannaCry 2.0	05/2017	02/2025	32	Encrypts files via EternalBlue exploit and demands Bitcoin; spreads via SMB protocol
RedLineStealer	03/2020	03/2025	377	Steals passwords, cookies, and cryptocurrency wallets; spreads via cracked software ads

Table 6. Descriptions of objects and attributes used in Figure 11.

Object	Form	Meanings
An implicit object	A blank circle	Symbolizing an abstract concept that may encompass further attributes.
The concept of common features	A circle shaded in blue and white	A concept that shares common features; the text inside each cell describes these shared features.
A real object	A white and black circle	A real object within the entire concept lattice.
A real object and common attributes	A blue and black circle	A real object within the entire concept lattice that simultaneously exhibits common attributes shared with others.

Table 7. Final architecture and hyperparameter settings for the proposed TZSL model.

Hyperparameters	Value
Codebook size	512
Embedding dimension	64
Regularization coefficient α	0.01
Batch size	64
Number of layers (encoder/decoder)	3
Optimizer	Adam
Latent vector update frequency	Each mini-batch
Dense (Softmax)	13 (normal: 1 + ransomware: 12)
Loss function	Categorical cross-entropy
The maximum number of epochs	20
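As a reading aid for the codebook size and embedding dimension in Table 7, the sketch below shows the generic vector quantization step of a VQ-VAE: an encoder output is replaced by its nearest codebook vector. It is a plain-Python illustration under assumed random codebook values, not the trained codebook or the implementation used in this study; `quantize` is a hypothetical helper name.

```python
import random
from typing import List, Sequence, Tuple

CODEBOOK_SIZE = 512   # from Table 7
EMBEDDING_DIM = 64    # from Table 7


def quantize(z: Sequence[float], codebook: Sequence[Sequence[float]]) -> Tuple[int, Sequence[float]]:
    """Return (index, codeword) of the codebook vector nearest to z (squared Euclidean distance)."""
    best = min(
        range(len(codebook)),
        key=lambda k: sum((zi - ci) ** 2 for zi, ci in zip(z, codebook[k])),
    )
    return best, codebook[best]


# Randomly initialized codebook (assumption; a trained codebook would be learned from data).
rng = random.Random(0)
codebook: List[List[float]] = [
    [rng.gauss(0.0, 1.0) for _ in range(EMBEDDING_DIM)] for _ in range(CODEBOOK_SIZE)
]

# A latent vector lying close to codeword 42 is mapped back to that codeword.
z = [c + 0.01 for c in codebook[42]]
index, codeword = quantize(z, codebook)
```

Each continuous latent vector is thus represented by a single integer index into the codebook, which is what makes the resulting embedding discrete and compact.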

Table 8. Comparison between GZSL and traditional machine learning classification paradigms.

Comparison Aspect	Generalized ZSL Experiment	Traditional Machine Learning Classification Experiment
Training data	Uses only a subset of malware families and a portion of goodware samples	Includes all malware families and all goodware samples
Testing data	Includes both seen and unseen malware families	Contains only samples from the same families seen during training
Objective	Evaluates the model’s ability to infer previously unseen malware families	Evaluates the model’s classification performance on previously seen families
Applicable scenario	Zero-day attacks; dynamic threat environments	Static, fully labeled environments

Table 9. Binary classification accuracy for the VQ-VAE model.

	Accuracy (%)	Precision (%)	Recall (%)	F1-Score (%)
Training	98.0	98.0	99.0	99.0
Validation	97.0	98.0	99.0	98.0
Testing	95.0	92.0	100.0	96.0

Table 10. Per-family ROC-AUC scores and macro-/micro-averaged ROC metrics for multiclass classification.

Family	AUC
Benign	0.999
Loki	0.978
Dharma	1.000
Lockbit	1.000
AgentTesla	0.978
Conti	1.000
Chaos	1.000
Cerber	0.998
RaccoonStealer	0.999
WannaCry 2.0	0.999
TrickBot	0.988
Emotet	0.998
RedLineStealer	0.989
Micro-averaged	0.996
Macro-averaged	0.995

Table 11. Per-family validation results of the VQ-VAE multiclass classification model during the testing phase.

Family	Accuracy (Per_Family) (%)	Precision (%)	Recall (%)	F1-Score (%)
Benign	100.0	84.4	100.0	91.6
Loki	86.8	75.0	86.8	80.5
Dharma	100.0	100.0	100.0	100.0
Lockbit	100.0	100.0	100.0	100.0
AgentTesla	71.1	87.1	71.1	78.4
Conti	100.0	100.0	100.0	100.0
Chaos	100.0	100.0	100.0	100.0
Cerber	94.7	92.3	94.7	93.5
RaccoonStealer	97.4	92.5	97.4	94.9
WannaCry 2.0	92.1	97.2	92.1	94.6
TrickBot	94.7	97.3	94.7	96.0
Emotet	92.1	94.6	92.1	93.3
RedLineStealer	86.8	100.0	86.8	93.0
Average value	93.5	93.9	93.5	93.5
Micro-averaged AUC	-	-	-	99.6
Macro-averaged AUC	-	-	-	99.5

Table 12. Multiclass classification accuracy for the VQ-VAE model.

	Accuracy (%)	Precision (%)	Recall (%)	F1-Score (%)
Training	94.9	95.2	94.9	94.9
Validation	93.5	94.3	93.5	93.6
Testing	93.5	93.9	93.5	93.5

Table 13. Per-family validation results of the ResNet-50 multiclass classification model.

Family	Accuracy (Per_Family) (%)	Precision (%)	Recall (%)	F1-Score (%)	AUC (%)
Benign	82.6	86.4	82.6	84.4	96.6
Loki	84.4	96.4	84.4	90.0	98.6
Dharma	81.4	88.0	81.5	84.6	98.4
Lockbit	100.0	53.8	100.0	70.0	99.8
AgentTesla	81.8	90.0	81.8	85.7	98.0
Conti	90.0	90.0	90.0	90.0	99.8
Chaos	100.0	90.0	100.0	94.7	99.9
Cerber	71.5	50.0	71.4	58.8	96.6
RaccoonStealer	57.1	80.0	57.1	66.7	93.7
WannaCry 2.0	8.3	33.3	8.3	13.3	97.2
TrickBot	66.7	14.3	66.7	23.5	93.2
Emotet	85.7	75.0	85.7	80.0	96.4
RedLineStealer	50.0	100.0	50.0	66.7	93.5
Average value	72.8	73.8	69.9	71.8	97.1
Micro-averaged	-	-	-	-	97.8
Macro-averaged	-	-	-	-	97.1

Table 14. Per-family validation results for performance comparison of baseline models.

Model	Accuracy (Per_Family) (%)	Precision (%)	Recall (%)	F1-Score (%)	Macro-Averaged AUC (%)	Micro-Averaged AUC (%)	Execution Time (s)
VQ-VAE	93.5	93.9	93.5	93.5	99.5	99.6	3.85
LeNet-5	63.3	73.2	63.3	65.6	95.1	97.4	3.93
ResNet-50	72.8	73.8	69.9	71.8	97.1	97.8	10.39
VGG-16	82.3	74.5	74.1	74.3	96.5	94.4	13.01
AlexNet	67.5	63.3	67.5	65.3	97.6	98.8	6.55

Table 15. Per-family classification performance under the GZSL evaluation framework.

Family	Accuracy (Per_Family) (%)	Precision (%)	Recall (%)	F1-Score (%)	AUC (%)
Benign (seen)	98.86	78.38	98.86	87.44	98.86
Loki (seen)	65.91	96.67	65.91	78.38	65.91
Dharma (seen)	100.00	100.00	100.00	100.00	100.00
Lockbit (seen)	92.78	99.19	92.78	95.87	92.78
AgentTesla (seen)	86.36	78.89	86.36	82.46	86.36
Conti (seen)	98.48	96.65	98.48	97.56	98.48
Chaos (seen)	98.86	48.69	98.86	65.24	98.86
Cerber (seen)	94.74	92.31	94.74	93.51	94.74
RaccoonStealer (seen)	92.42	97.60	92.42	94.94	92.42
Average (seen)	91.40	87.70	91.63	88.38	92.05
WannaCry 2.0 (unseen)	43.0	56.0	54.0	55.0	58.0
TrickBot (unseen)	81.0	79.0	83.0	81.0	83.0
Emotet (unseen)	40.0	51.0	46.0	48.0	49.0
RedLineStealer (unseen)	73.0	74.0	76.0	75.0	76.0
Average (unseen)	59.25	65.00	64.75	64.75	66.50

Table 16. H-mean results for GZSL-based multiclass classification using the VQ-VAE model.

Evaluation Metric	Value (%)
Seen class accuracy	91.40
Unseen class accuracy	59.25
Harmonic mean (H-mean)	71.89

Table 17. Key factors for practical deployment of the proposed model.

Metric	Details
Average inference time	75–85 ms per sample (with image + semantic input)
Hardware used	Intel Core i7-6700 CPU, 32 GB RAM, NVIDIA GTX 1080 GPU (3–3.5 GB)
Minimum hardware requirements	Intel Core i5 CPU, 8 GB RAM, 4 GB GPU VRAM
Edge deployment feasibility	Supported (under development using TensorRT/ONNX compression)

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
