1. Introduction
As network communication technologies continue to advance, organizations are increasingly confronted with escalating information security challenges. Among these challenges, ransomware poses a significant threat that organizations cannot ignore. According to the latest cybersecurity threat intelligence report from Cyble [
1], the number of global ransomware attacks has increased year by year, as illustrated in
Figure 1. In January 2025, the attack volume more than tripled compared to the same period in previous years. This trend highlights the critical importance of effective ransomware prevention. Training detection models on known ransomware samples improves classification accuracy. However, performance typically degrades when encountering ransomware variants, primarily due to the limited availability of representative training data. Nevertheless, ransomware variants are not entirely independent, since most originate from a small number of major ransomware families and preserve core behavioral and structural characteristics. Therefore, these variants can be classified as part of the same ransomware family. As a result, detection techniques that can identify unknown threats are essential to address the insufficient protection against variant ransomware.
Ransomware detection typically relies on monitoring network traffic or identifying characteristic features. Traditionally, signature-based detection [
2] compares unknown software against a database of signature patterns to determine if it matches any known ransomware. However, signature-based detection can only identify signature patterns already recorded in the database, making it difficult to detect or block new ransomware variants. As a result, anomaly detection techniques are frequently used to detect unknown ransomware, primarily through dynamic and static analysis. Dynamic analysis involves executing unknown programs in a simulated environment to observe whether they exhibit abnormal or malicious behavior. However, dynamic analysis typically requires extended monitoring time, making real-time detection difficult. Additionally, malware can employ various anti-analysis techniques to evade monitoring, thereby reducing the effectiveness of dynamic analysis [
3]. In contrast, static analysis does not require program execution and can directly examine malware’s file structure and code characteristics, enabling rapid detection before ransomware is executed.
As ransomware continues to evolve, training datasets struggle to capture all emerging families. Traditional classifiers are typically designed under the closed-set assumption, limiting their ability to accurately identify categories present only during training. Consequently, when encountering novel ransomware samples, these classifiers often misclassify them into the most similar known families. To enable models to recognize unknown samples, Open-Set Recognition (OSR) methods have recently become an important research focus. However, OSR-based static ransomware analysis faces three major challenges. First, ransomware datasets often exhibit a pronounced long-tailed distribution, leading models to overfit to families with abundant samples while neglecting those with scarce samples, resulting in class imbalance bias [
4]. Second, newly emerging ransomware families have limited samples in their early stages, which makes it difficult for models to effectively extract discriminative features from scarce training data. This challenge is known as the few-shot learning problem [
5]. Third, even if the issues related to long-tailed distributions and few-shot learning are mitigated, models must still be able to recognize unknown samples through out-of-distribution (OOD) detection [
6], allowing them to label unseen samples as unknown rather than erroneously assigning them [
7]. As Moreira et al. [
8] noted, if training data fails to sufficiently cover potential new patterns, the model’s generalization ability for unknown sample recognition will be limited, thus restricting its practical applicability.
To address the aforementioned issues, several studies have proposed targeted solutions for individual challenges. For example, Dina et al. [
9] utilized Focal Loss to dynamically adjust sample weights, reducing the impact of easy examples on the loss function and thereby enhancing the recognition of hard negatives. Although this approach mitigates bias induced by class imbalance, it remains constrained to a closed-set framework and lacks the capacity to identify samples from unknown classes. Zhu et al. [
10] enhanced classification stability by employing similarity learning and Center Loss to strengthen inter-family feature distinctions. However, this framework lacks a decision mechanism for recognizing unknown families, often misclassifying them as known families. On the other hand, Ji et al. [
11] used Model-agnostic Meta-learning (MAML) to obtain generalized initial parameters, enabling rapid convergence under conditions of limited samples. Nonetheless, this approach relies heavily on the similarity between training and testing data distributions and does not incorporate mechanisms to identify unknown samples, resulting in ineffective handling of novel families. Traditional OSR methods often rely on Extreme Value Theory (EVT) to estimate a rejection threshold, which is used to assess the probability that a sample belongs to a known class and decide whether to reject it. OpenMax is a representative example of such methods. However, under conditions of limited samples and long-tailed distributions, the fitting process for EVT often becomes inaccurate due to insufficient sample size or feature distribution shifts. This leads to unstable rejection thresholds, which in turn affect the overall reliability of recognition. Guo et al. [
12] employed Generative Adversarial Networks (GANs) to synthesize unknown samples, thereby helping the model learn rejection decisions and expanding the separation boundary between known and unknown samples. However, when tail family samples are extremely scarce, GANs struggle to capture representative latent distributions. The generated samples often suffer from mode collapse and distribution shift, resulting in a lack of diversity and potentially containing features of invalid PE structures, which further weakens the ability of the model to recognize unknown samples. Lu et al. [
13] proposed the DOMR framework, which uses meta-learning to simulate open-world scenarios, enhancing the model’s adaptability to unknown classes. Although this method shows promise in open-set recognition experiments, its effectiveness heavily depends on sufficient training samples. The authors also note that when the sample size per family is small, the model’s generalization ability significantly deteriorates.
In summary, previous research has predominantly focused on individual challenges, making it difficult to simultaneously address the demands of extreme data distributions, few-shot conditions, and unknown sample recognition. To address these limitations, this study introduces a unified framework titled Few-Shot Open-Set Ransomware Detection through Meta-learning and Energy-based Modeling (MEM), which integrates three complementary techniques: MAML, the Energy Function from Energy-based Model (EBM), and Focal Loss, each corresponding to one of the aforementioned challenges. First, MAML is employed to learn generalized initial parameters, enhancing the ability of the model to rapidly adapt to extremely limited samples. Second, the Energy Function quantifies the model’s confidence in its predictions, helping establish a stable and adjustable rejection threshold. Lastly, Focal Loss dynamically adjusts the loss contribution during fine-tuning to alleviate overfitting on head classes and increase the learning participation of tail samples. The integration of these three components enables the proposed framework to effectively handle data imbalance and unknown class challenges in open-set ransomware recognition under static conditions, thereby improving the model’s performance in OSR scenarios. The main contributions of this study are as follows:
Strengthening model learning stability under extreme data distributions by addressing the complexities of long-tailed and few-shot scenarios.
Enhancing the adaptability of models in open-set contexts to improve unknown sample recognition.
The remainder of this study is organized as follows.
Section 2 introduces static feature modeling and explains the theoretical foundations related to open-set recognition (OSR).
Section 3 provides a detailed description of the proposed three-stage recognition framework and its training process.
Section 4 presents the complete experimental design, the comparative methods, and the result analysis. Finally,
Section 5 summarizes the main contributions of this study and discusses potential directions for future research.
2. Preliminaries
This section presents the core technical foundations underlying MEM.
Section 2.1 describes the Portable Executable (PE) static features used as input, chosen because static analysis avoids the sandbox-evasion risks inherent in dynamic approaches and enables rapid, scalable screening.
Section 2.2 introduces MAML, the meta-learning backbone that enables few-shot adaptation.
Section 2.3 defines the Energy Function used to separate known from unknown samples.
Section 2.4 presents Focal Loss for mitigating class imbalance, and
Section 2.5 describes Center Loss for encouraging intra-class compactness.
2.1. Portable Executable Features
Moreira et al. [
8] proposed a comprehensive multi-PE structure feature combination analysis method that systematically integrates five categories of static features extracted from PE files: PE header fields, section metadata, section entropy, imported DLL and API information, and opcode sequences. This approach constructs a unified feature representation by combining structural, behavioral, and content-based characteristics and has demonstrated significant effectiveness in distinguishing ransomware families through static analysis. The integration of these complementary feature types enables the capture of both high-level behavioral patterns and low-level implementation details, thereby improving classification accuracy across diverse ransomware variants.
In the Windows operating system environment, ransomware primarily relies on Windows PE files as the vehicle for propagation and execution. Therefore, accurately parsing the PE file structures and extracting static features is a necessary prerequisite for effective static analysis and identification of malicious software. As illustrated in
Figure 2, the standard PE format consists of two major structural components: the header and the sections. The header provides metadata about the file’s overall properties and internal indexing, while the sections contain the actual code, data, and other resources. The header region can be further divided into three substructures: the DOS Header, the NT Header, and the Section Header. The DOS Header, located at the beginning of the file, identifies the file type and provides a pointer to the NT Header. The NT Header follows and includes the File Header and the Optional Header. The File Header contains basic attributes such as the target platform and the number of sections. The Optional Header specifies parameters such as the program entry point, memory layout, and execution settings. Within the Optional Header, the Data Directories field lists critical information such as table addresses and library references. Among them, the Import Address Table (IAT) records the Dynamic-Link Libraries (DLLs) and Application Programming Interface (API) names required at runtime, serving as a key reference for static behavioral analysis. Finally, the Section Header defines the starting location, size, and access permissions of each section both on disk and in memory. This structural information can be used to detect anomalies in section configuration, which may indicate malicious behavior.
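The header layout described above can be illustrated with a short parsing sketch in Python using only the standard `struct` module. The field offsets (the `MZ` magic at offset 0 and the 4-byte `e_lfanew` pointer at offset `0x3C`) follow the PE specification; the buffer below is a synthetic stand-in for a real file, not an actual sample.

```python
import struct

def locate_nt_header(data: bytes) -> int:
    """Return the file offset of the NT Header, or raise if not a PE file."""
    # The DOS Header begins with the 2-byte magic "MZ".
    if data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ magic")
    # e_lfanew, a 4-byte little-endian offset to the NT Header, sits at 0x3C.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    # The NT Header itself starts with the 4-byte signature "PE\0\0".
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("invalid e_lfanew: PE signature not found")
    return e_lfanew

# Synthetic 4 KB buffer standing in for a real file (illustration only).
buf = bytearray(4096)
buf[0:2] = b"MZ"
struct.pack_into("<I", buf, 0x3C, 0x100)   # e_lfanew -> 0x100
buf[0x100:0x104] = b"PE\x00\x00"
print(locate_nt_header(bytes(buf)))        # 256
```

A real extractor would continue from this offset into the File Header, Optional Header, and Section Headers in the same manner.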
The sections of a PE file serve as data blocks that store the program’s actual content, with each section holding different types of data depending on its purpose. For example, the .text section contains executable instructions, the .data section stores global variables, and the .rsrc section includes resources such as icons and strings. These section contents can be leveraged to extract opcode sequences, compute section entropy, and examine whether any section exhibits abnormal resource configurations, which may suggest the presence of hidden malicious behavior. All of the above structural and sectional information can be considered as potential sources of static features, contributing to ransomware family classification and the detection of novel variants. Accordingly, this study systematically extracts five categories of static features based on the PE file structure, which serve as inputs to the proposed model. The following subsections provide detailed explanations of each feature type:
Section 2.1.1 introduces the PE Header;
Section 2.1.2 focuses on section metadata;
Section 2.1.3 explains section entropy;
Section 2.1.4 covers DLL and API information; and
Section 2.1.5 presents opcode sequence analysis.
2.1.1. PE Header
The PE Header is located at the beginning of the file and contains essential layout information such as the program’s execution environment and internal structure. As shown in
Table 1, by analyzing the values of specific fields in the header, it is possible to capture the fundamental structural characteristics of a malicious executable and distinguish between different ransomware families. Studies have shown that PE header fields have a significant impact on malware classification. According to the findings of Moreira et al., the PE header is considered one of the most influential static features for ransomware detection.
2.1.2. Section Metadata
Every section is accompanied by structured metadata that defines its attributes, including its name, size, virtual address, and memory access permissions. Such metadata facilitates the reconstruction of a program’s memory layout and enables comparative analysis of section-level structures across different binaries, which can aid in identifying anomalous configurations. Irregularities in section names or sizes may suggest obfuscation techniques employed by malicious software to evade detection. Consequently, section attributes have been widely adopted as salient features in static analysis for identifying potentially suspicious or malicious samples [
13,
14].
2.1.3. Section Entropy
Beyond serving as a basis for understanding program structure and functionality, the content characteristics of PE file sections can also reveal potential anomalies. In the context of malware analysis, entropy-based methods are commonly employed to detect packed or encrypted portable executable files [
2,
9]. Sections that have been encrypted or compressed often exhibit high randomness, which can be quantitatively measured using entropy. Conversely, when a section contains highly regular content, such as repeating characters or zero-padding, it tends to have lower entropy values, whereas more randomized content yields higher entropy values.
Sections with abnormally high entropy may indicate that the program utilizes custom encryption or compression techniques to conceal its executable code, thereby evading signature-based detection by antivirus software. As such, entropy analysis has become a valuable auxiliary technique in static analysis for identifying suspicious section configurations and facilitating the preliminary screening of potentially malicious samples.
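As a rough illustration of this screening, Shannon entropy over a section's raw bytes can be computed as follows; the 8-bits-per-byte ceiling comes from the byte alphabet, and the example buffers are illustrative rather than drawn from real samples.

```python
import math
from collections import Counter

def shannon_entropy(section: bytes) -> float:
    """Shannon entropy of a byte sequence, in bits per byte (0.0 to 8.0)."""
    if not section:
        return 0.0
    counts = Counter(section)
    total = len(section)
    # log2(total / c) == -log2(p); summed with weight p = c / total.
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# Regular, zero-padded content -> minimal entropy; content in which every
# byte value occurs equally often -> maximal entropy.
padded = bytes(1024)                # all zeros
uniform = bytes(range(256)) * 4     # each byte value appears 4 times
print(shannon_entropy(padded))      # 0.0
print(shannon_entropy(uniform))     # 8.0
```

Sections of packed or encrypted malware typically score close to the 8.0 ceiling, which is what makes entropy useful for preliminary screening.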
2.1.4. Dynamic-Link Library and Application Programming Interface
The Import Table in a PE file lists the external DLLs and corresponding API functions that must be loaded at runtime. By examining the DLLs and system APIs imported by a given binary, it is possible to infer the likely behavioral intentions of a malware sample—such as file manipulation, network communication, process injection, or registry modification.
Since the list of imported DLLs and APIs reflects the malware’s dependency on specific operating system functionalities, it is often regarded as a static feature indicative of behavioral intent. Numerous studies have demonstrated that leveraging such import-related features for malware classification can significantly enhance detection accuracy [
13,
15].
2.1.5. Opcode Sequence
An opcode sequence is derived by disassembling the machine code in the executable sections of a binary, resulting in a series of low-level instructions that correspond to specific processor operations. Malware samples belonging to the same family often exhibit similar patterns or recurring fragments in their opcode usage. As a result, opcode sequence features can help determine whether an unknown sample is related to a known malware family.
Some studies [
13,
16,
17] have proposed using opcode frequency patterns to detect variants of known malware families, demonstrating that such approaches are effective in identifying previously unseen malicious samples. Therefore, opcode sequences not only serve as a useful indicator of family-level similarity in static analysis but are also frequently used to construct behavioral models to improve the detection of malware variants.
2.2. Model-Agnostic Meta-Learning
MAML [
18] is a meta-learning strategy that aims to find a set of highly transferable initial model weights, enabling the model to quickly adapt and achieve good performance when facing new tasks with only a small number of samples and gradient update steps.
MAML employs a two-layer optimization architecture consisting of an inner loop and an outer loop, as illustrated in
Figure 3. In each training iteration, the algorithm randomly selects several classification tasks from a task pool, such as identifying different malware families. For each task $\mathcal{T}_i$, the training data is divided into two subsets: a support set $\mathcal{S}_i$ for task-specific parameter tuning and a query set $\mathcal{Q}_i$ for evaluating the tuned performance. In the inner loop, the model uses a small number of labeled samples from $\mathcal{S}_i$ to perform several gradient descent iterations starting from the initial weights $\theta$, obtaining task-specific temporary parameters $\theta_i'$. Then, in the outer loop, $\mathcal{Q}_i$ is used for the outer-layer update: the algorithm computes the loss of the adapted model on the query set, backpropagates, and adjusts the initial weights, making the model more adaptable to new tasks in the next iteration.
Through a two-stage training mechanism of rapid inner-layer adjustment and robust outer-layer optimization, MAML can maintain the initial parameters at a highly adaptive initial position and demonstrate strong transferability and learning stability in multi-task learning and very few-shot scenarios.
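The two-loop procedure can be sketched on a toy one-parameter regression problem. The sketch below uses the first-order MAML approximation (the query-set gradient is taken at the adapted parameters rather than differentiated through the inner step), and all task and learning-rate choices are illustrative:

```python
import random

def grad(w, xs, ys):
    """d/dw of the mean squared error for the linear model y_hat = w * x."""
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)

def sample_task(rng):
    """A task is 'fit y = a * x' for a task-specific slope a in [1, 3]."""
    a = rng.uniform(1.0, 3.0)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(10)]
    return xs, [a * x for x in xs]

rng = random.Random(0)
w, inner_lr, outer_lr = 0.0, 0.1, 0.05

for _ in range(2000):
    xs, ys = sample_task(rng)
    support_x, support_y = xs[:5], ys[:5]   # for task-specific adaptation
    query_x, query_y = xs[5:], ys[5:]       # for evaluating the adaptation
    # Inner loop: one gradient step from the shared initialisation w.
    w_task = w - inner_lr * grad(w, support_x, support_y)
    # Outer loop (first-order approximation): update the initialisation
    # using the query-set gradient at the adapted parameters.
    w -= outer_lr * grad(w_task, query_x, query_y)

print(round(w, 2))   # settles near 2.0, the mean task slope
```

The learned initialisation sits at the point from which one support-set gradient step best fits whichever task is drawn, which is exactly the "highly adaptive initial position" described above.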
2.3. Energy Function
The Energy Function [
19] is a quantitative metric that measures a classification model’s confidence in an input sample. This method computes a scalar value, referred to as the Energy Value, by directly operating on the logits vector of the model output without applying Softmax normalization. The resulting value reflects the overall activation magnitude of the model output and serves as a proxy for the model’s confidence.
For an input sample $x$, the Energy Function is defined as Equation (1):

$$E(x) = -\log \sum_{k=1}^{K} e^{f_k(x)} \tag{1}$$

where $f_k(x)$ denotes the logit output corresponding to class $k$, and $K$ is the total number of classes. This formulation computes the negative log-sum-exp of the logits, capturing the model's overall response strength. When the model is highly confident in a particular class, the corresponding logit dominates the summation, resulting in a lower energy value, which indicates a clear prediction tendency. Conversely, when the logits are distributed more evenly across classes, the energy value increases, suggesting lower confidence in the prediction.
Previous studies have demonstrated that energy values can effectively distinguish in-distribution samples from out-of-distribution inputs, and can serve as a complementary mechanism for separating known and unknown samples.
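A minimal sketch of the energy computation and a hypothetical rejection rule follows; the threshold `tau` is illustrative, since in practice it would be calibrated (e.g., per known family) on validation data.

```python
import math

def energy(logits: list[float]) -> float:
    """Negative log-sum-exp of the logits; lower means more confident."""
    m = max(logits)  # subtract the max for numerical stability
    return -(m + math.log(sum(math.exp(z - m) for z in logits)))

# A confident prediction (one dominant logit) yields low energy; an
# ambiguous one (flat logits) yields high energy.
confident = [12.0, 0.5, -1.0, 0.3]
ambiguous = [1.1, 0.9, 1.0, 1.2]
print(energy(confident) < energy(ambiguous))   # True

# Hypothetical rejection rule: a sample whose energy exceeds the threshold
# tau is labelled "unknown" rather than assigned to a known family.
tau = -3.0
print("unknown" if energy(ambiguous) > tau else "known")   # unknown
```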
2.4. Focal Loss
Focal Loss [
20] is a loss function specifically designed to address class imbalance, derived from Cross-entropy Loss. In practical applications of ransomware detection, the sample sizes of different families vary significantly, resulting in a typical long-tail distribution. Traditional Cross-entropy Loss is easily dominated by the majority classes in this context, leading to poor learning performance for the minority classes. Cross-entropy Loss is the standard loss function for supervised classification tasks, measuring the difference between the model's predicted probability and the true label. For a single sample, its Cross-entropy Loss is defined as shown in Equation (2):

$$L_{CE} = -\sum_{c=1}^{C} y_c \log(p_c) \tag{2}$$

where $C$ is the total number of classes, $y_c$ is the true label, and $p_c$ is the model's predicted probability for the $c$-th class.
Focal Loss introduces a class balance factor and a focusing parameter on top of Cross-entropy Loss, enabling the model to handle imbalanced data more effectively. Its definition is shown in Equation (3):

$$L_{FL} = -\alpha_t (1 - p_t)^{\gamma} \log(p_t) \tag{3}$$

where $\alpha_t$ is the class balance factor, which adjusts the weights based on the sample size of different classes. When the number of samples in a certain class is sparse, $\alpha_t$ can be set to a higher value to strengthen the loss impact of that class. $\gamma$ is the focusing parameter, which controls the degree to which the model attends to difficult samples. When the model's prediction confidence $p_t$ for a sample is close to 1, the $(1 - p_t)^{\gamma}$ term significantly reduces the loss contribution of that sample, causing the model to focus its attention on samples that are difficult to classify correctly.
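For a single sample, the down-weighting behaviour can be sketched directly; the defaults `alpha = 0.25` and `gamma = 2.0` follow the original Focal Loss paper and are assumptions here, not values tuned for ransomware data.

```python
import math

def focal_loss(p_t: float, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Focal Loss for one sample: -alpha * (1 - p_t)**gamma * log(p_t),
    where p_t is the predicted probability of the true class."""
    return -alpha * (1 - p_t) ** gamma * math.log(p_t)

# An easy, confidently correct sample contributes almost nothing, while a
# hard sample keeps a substantial loss; gamma controls how sharply easy
# samples are down-weighted.
easy, hard = 0.95, 0.30
print(focal_loss(easy))
print(focal_loss(hard))
print(focal_loss(easy) < focal_loss(hard))   # True
```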
2.5. Center Loss
Center Loss [
21] is an auxiliary loss function specifically designed to address the feature dispersion problem in deep learning. In neural networks trained using only the Softmax loss function, the feature representations of samples within the same class are often too dispersed, which reduces the model’s classification accuracy and stability.
The core idea of Center Loss is to encourage samples of the same class to cluster in the feature space. Specifically, this method maintains a dynamically updated center point for each class and minimizes the distance between all samples of that class and its center. This mechanism effectively reduces intra-class distance, making the feature distribution of the same class more compact. For the detection sample set $\{(x_i, y_i)\}_{i=1}^{m}$, the mathematical definition of Center Loss is shown in Equation (4):

$$L_C = \frac{1}{2} \sum_{i=1}^{m} \left\| x_i - c_{y_i} \right\|_2^2 \tag{4}$$

where $x_i$ represents the feature vector of the $i$-th sample, and $c_{y_i}$ represents the center vector of the class $y_i$ to which the sample belongs. During training, the class center is dynamically updated based on the latest sample features of that class.
By minimizing the intra-class distance, the feature representations of the same class become more concentrated, and the boundaries between different classes become clearer. This not only improves the accuracy of classification but also enhances the stability and reliability of the model when faced with new samples.
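A minimal sketch of the loss together with a simplified center update follows; the exponential-moving-average update used here is an illustrative simplification of the update rule in the original Center Loss formulation, and all feature values are toy data.

```python
def center_loss(features, labels, centers):
    """Half the summed squared distance between each feature vector and the
    centre of its class (Equation (4))."""
    total = 0.0
    for x, y in zip(features, labels):
        c = centers[y]
        total += sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return 0.5 * total

def update_centers(features, labels, centers, lr=0.5):
    """Simplified centre update: move each class centre toward its samples."""
    for x, y in zip(features, labels):
        centers[y] = [ci + lr * (xi - ci) for xi, ci in zip(x, centers[y])]

features = [[1.0, 0.0], [0.8, 0.2], [-1.0, 0.1]]
labels = [0, 0, 1]
centers = {0: [0.9, 0.1], 1: [-1.0, 0.0]}
print(center_loss(features, labels, centers))   # small: clusters are compact
update_centers(features, labels, centers)       # centres drift toward samples
```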
5. Conclusions
As ransomware continues to evolve, new families emerge with very few available samples, making it challenging for detection systems to maintain accurate recognition. Furthermore, the uneven distribution of sample sizes across families leads to biased classification results, where models tend to overfit to families with abundant samples while neglecting those with scarce data. While existing open-set recognition methods have improved detection capabilities, most approaches remain dependent on sufficient training samples to establish reliable rejection boundaries and fail to simultaneously address data imbalance, limited sample availability, and unknown family identification. To mitigate these challenges, we introduce the MEM framework, which integrates MAML, an Energy Function, and Focal Loss to develop a unified open-set recognition system based on static Portable Executable feature analysis. MAML enables the model to rapidly adapt to emerging ransomware families, allowing effective classification even with only a few labeled samples available. Meanwhile, the Energy Function provides a stable rejection mechanism for identifying unknown samples, ensuring that unseen ransomware families are correctly excluded from known classifications. Additionally, Focal Loss is incorporated to reduce the impact of imbalanced data distributions, ensuring that minority families with scarce samples receive adequate recognition during training. While our proposed method demonstrates strong performance in detecting unknown ransomware, several limitations warrant further discussion. Regarding computational overhead, the meta-learning framework involves a nested optimization process, resulting in higher theoretical training complexity than standard DNN-based models. 
In terms of scalability, since the task sampling mechanism of MAML primarily depends on the number of classes per task rather than the total number of classes, the impact of increasing the total family count on training overhead is minimal. Although maintaining independent energy thresholds for each known family introduces a marginal storage requirement as the number of families grows, the inference efficiency remains high because the process only requires a single threshold comparison against the predicted class. Lastly, this study currently focuses on static feature analysis for ransomware. Future work will aim to extend this framework to all types of malware detection, thereby enhancing the generalization and versatility of the model in broader malicious software identification scenarios.