Zero-Day Ransomware Attack Detection Using Static Portable Executable Header Features

Venčkauskas, Algimantas; Jusas, Vacius; Barisas, Dominykas

doi:10.3390/app151910576

by Algimantas Venčkauskas, Vacius Jusas * and Dominykas Barisas

Department of Computer Science, Kaunas University of Technology, LT-51390 Kaunas, Lithuania

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(19), 10576; https://doi.org/10.3390/app151910576

Submission received: 4 September 2025 / Revised: 25 September 2025 / Accepted: 29 September 2025 / Published: 30 September 2025

(This article belongs to the Special Issue Privacy-Preserving and System Security Control Based on Machine Learning)

Abstract

Ransomware is one of the types of malware that most severely affects financial institutions, since they cannot afford to lose their data or experience long-term disruptions. It is therefore crucial for financial institutions to protect themselves from ransomware attacks. To counter zero-day ransomware attacks, that is, previously unseen attacks, we present a method that uses the static header features of portable executables. The method forms a comprehensive static feature set that includes the header fields of portable executables, the count of dynamic link libraries (DLLs), the DLL average, the DLL list, the function call average, and a measure of section content randomness. To keep the feature set compact, a threshold was applied to three feature sets: portable executable header, DLL features, and section randomness. To determine the DLL average usage, the Tanimoto coefficient was applied to measure DLL similarity. The same procedure was applied to determine the function call average. The Chi-square test was applied to measure the section content randomness of portable executables. A stacking classifier was used to evaluate the performance of the developed feature set. A publicly available dataset was used for the experiments. The results for the detection of zero-day attacks demonstrated averages of 97.15% accuracy, 98.06% recall, and 92.74% F-measure. Compared with other methods using the same dataset, the proposed method provided slightly better performance for many ransomware families.

Keywords: ransomware; zero-day attacks; portable executable (PE); static features; stacking classifier

1. Introduction

The world is highly dependent on Internet-based technologies. Malware attacks try to disrupt the normal functioning of the Internet-based world. Consequently, research on security solutions that protect such a world is ongoing.

Ransomware is one type of malware attack. Its goal is to encrypt files, making them useless. The victims are required to pay a ransom to regain access to the decrypted files. If the victims do not pay, they can lose all the data that was encrypted. Moreover, the attacker threatens to sell, auction, or publish the data on third-party sites if the ransom is not paid [1]. According to the 2023 Sophos report [2], 66% of 3000 organizations surveyed across 14 countries had been attacked by ransomware. The report revealed that the rate of ransomware attacks remained the same as in the previous year's report. The education sector was the most likely to have been attacked by ransomware in 2022, with 79% reporting an attack. The experience of our university confirms this finding. According to the 2024 Sophos report [3], 59% of 5000 organizations surveyed across 14 countries had been attacked by ransomware. Although the percentage of attacked organizations decreased, the number of surveyed organizations grew from 3000 to 5000. Therefore, this decrease cannot be counted as a true decrease.

Financial institutions are particularly frequent targets of ransomware attacks, according to a report from the cybersecurity company Sophos [4]. A survey [4] of 592 IT and cybersecurity leaders in the financial services sector found that 65% of them had been affected by ransomware, up from 64% in 2023. Ransomware attacks in the financial sector have increased over the past two years, with only 48% of respondents having been affected by ransomware in 2020. Financial institutions are targeted because they cannot afford to lose their data or experience long-term disruptions, since they are the backbone of the modern economy [5]. They provide a large range of services, varying from savings to investment opportunities. Financial institutions handle large volumes of financial transactions and sensitive user data that can be stolen, which could lead to identity theft and financial fraud. Their role is so fundamental that any disturbance to their operations can have a ripple effect across various sectors, affecting the daily lives of ordinary citizens. Therefore, financial institutions are subject to strict regulations [6], and a ransomware attack can lead to non-compliance with these regulations. It is thus crucial for financial institutions to protect themselves from ransomware attacks.

Initially, ransomware detection relied on file signatures to identify malicious files [7]. A signature is a sequence of bytes that uniquely represents a file. The signature is based on strings, code segments, hashes, and other patterns in the binary file. This method is widely adopted by antivirus software for its high precision and ease of implementation. However, it faces challenges, since altering a few lines of code within the file changes the signature of the file. Recycled malware can therefore easily escape signature-based detection. Creating signatures for modified or new malware is a time-consuming process, since it requires an analysis that can take several months. In response, researchers turned their attention to behavior-based detection to identify ransomware [8].

Behavior-based detection seeks to uncover the intended malicious actions of a file by analyzing it either at rest (static detection) or during execution (dynamic detection). Machine learning (ML) enables recognition of malicious files by building ML models and comparing file behavior to that of known benign and malicious files. To build ML models, features must be extracted. Static feature extraction approaches identify the static features of ransomware by analyzing its binary code [9]. Static detection does not require executing the ransomware code and is therefore considered a secure approach. The features commonly used for static detection include opcode n-grams, byte sequences, API calls, portable executable (PE) header information, and so on.

Dynamic feature extraction approaches require executing the ransomware code [10]. The features used for dynamic detection include system calls, API calls, changes in the entropy of input/output data buffers, and so on. Extracting dynamic features can be challenging because many types of ransomware try to evade detection during their operation. Furthermore, it is unclear how to determine the appropriate duration for extracting dynamic features, as behavioral changes may not manifest for some time. As a consequence, dynamic detection significantly increases the time required for detection in comparison with static detection.

Signature-based ransomware detection faces another problem: it may not be effective for detecting zero-day attacks, which are previously unseen threats that differ from the known signatures [11]. Zero-day ransomware attacks exploit new vulnerabilities that have become known but for which patches are not yet available. Such attacks present a serious threat to behavior-based approaches as well, since the training data becomes available only after the attack takes place. Therefore, behavior-based solutions are needed that can detect zero-day attacks using previously acquired knowledge.

Among behavior-based approaches, dynamic feature extraction requires executing the ransomware code in an isolated environment. Such an approach faces two challenges: a demand for significant time and computational resources, and the risk of ransomware leakage. The advantage of the static method is that it can detect ransomware before the program is launched, which allows for timely detection of potential threats, prevents system infection, and significantly saves time and resources [12]. Therefore, the objective of the current research is to develop a static feature extraction approach that can detect zero-day attacks. The proposed approach analyzes the portable executable (PE) header of ransomware samples and extracts static features from it, since feature extraction from the PE header is a relatively straightforward and fast process compared to the extraction of other static features [13]. ML is then applied to learn from the extracted static features.

The contributions of this paper are as follows:

Providing a systematic literature review of the methods using PE static features;
Forming a comprehensive static feature set that combines many different static features;
Creating and implementing a method based on PE static features that is capable of detecting zero-day ransomware attacks;
Applying the Chi-square test to measure the randomness of the content of a PE section, which, to the best of our knowledge, has not been done before. The Chi-square test showed better performance than the commonly used Shannon entropy;
Applying the Tanimoto coefficient to measure the similarity of the dynamic link libraries and function names, which, to the best of our knowledge, has not been done before in the field of ransomware. The formula of the Tanimoto coefficient does not involve calculating the square root of the sum of the squared values of a vector, so it is more computationally efficient than cosine similarity;
Using a stacking classifier, since an ensemble classifier performs better than a single classifier;
Providing the experimental results on the publicly available dataset and comparing them with available results of other similar methods.

The remainder of this paper is organized in the following way: a review of the existing methods using static PE features to detect ransomware and malware, in general, is provided in Section 2. Section 3 presents the proposed method. Section 4 considers its implementation by delivering the results of the experiment and provides a discussion of obtained results compared with related works. Finally, Section 5 draws conclusions.

2. Review of Related Work

Recent research has shown progress in detecting ransomware based on the PE file header. The typical approach involves extracting features from the ransomware PE header and classifying them using either ML or deep learning (DL) methods. Manavi and Hamzeh have published three works [14,15,16] in this field. Manavi and Hamzeh [14] proposed a method that extracted the first 1024 bytes of each PE file header and submitted them to an LSTM network for training. The dataset used in the paper included 1000 benign and 1000 ransomware samples. The authors declared that testing was performed on unseen samples; however, they did not explain how the unseen testing samples were obtained or why the testing was performed on unseen samples. The standard metrics of accuracy, recall, precision, and F-measure were measured, and their total values for the whole dataset were provided. The values are similar for all the metrics, at around 93%.

Manavi and Hamzeh [15] developed a method that constructs a graph with 256 nodes for each ransomware sample. The information of the graph was saved in a graph adjacency matrix with a size of 256 × 256. The features were extracted using the eigenvectors and eigenvalues of the matrix: the Power Iteration method was used to find the dominant eigenvector of the obtained matrix. The resulting vector was submitted to a Random Forest ensemble learning classifier for training. Three datasets were used for the experiments. The second dataset was the same as in [14]. A 10-fold cross validation was used in the research. The results obtained for the second dataset are almost the same as in [14], since the provided total values of the four standard metrics are around 93%. In both cases, the values exceed 93%.

Manavi and Hamzeh [16] presented a method that extracted a vector of 1024 bytes from the PE header of each executable file. Each byte of this vector has a value from 0 to 255. The vector is converted to a 32 × 32 grayscale image using a zigzag pattern to increase byte continuity. The image is submitted to a CNN for training, since CNNs are widely used for image recognition. Three datasets were used for the experiments. The first dataset was the same as in [14,15]. A 10-fold cross validation was used in the research. The results obtained for the first dataset are almost the same as in [14,15]. In all cases, the values exceed 93%, but they are very close. In addition, Manavi and Hamzeh compared the proposed network when applied to PE header bytes and when applied to whole-file bytes. The accuracy obtained was higher for the PE header bytes than for the whole-file bytes. This result indicates that the analysis of PE headers is preferable to the analysis of the whole file.

A method very similar to that of Manavi and Hamzeh [16] was presented by Moreira et al. [17]. The proposed method extracted a PE header of 1024 bytes and converted it to a color image using four distinct patterns: sequential, zigzag, spiral, and diagonal zigzag. For classification, the predefined Xception CNN model was used. Two datasets were used for the experiments. The first dataset was the same as in Manavi and Hamzeh [16], with 1000 benign and 1000 ransomware samples. The second dataset was built by the authors and contained 1134 benign and 1023 ransomware samples, grouped into 25 ransomware families. We would like to express our gratitude to Moreira et al. [17] for making this dataset publicly available. As Cen et al. [18] noticed in their survey, there is currently no standard dataset for ransomware classification, unlike in other fields such as image processing and malware classification. The dataset built by Moreira et al. [17] can become a standard dataset for ransomware classification. Moreira et al. [17] used 10-fold cross validation in their research. The accuracy obtained for the first dataset was 93.73%. It is a little higher than the accuracy obtained by Manavi and Hamzeh [16], but it still did not reach 94%.

Considering the four research works reviewed so far [14,15,16,17], we can make the following observation. All the authors used the same full 1024-byte vector of the PE file header, but different techniques for transforming this vector and different classification techniques. However, almost the same classification result was obtained in all cases. This observation leads us to conclude that the transformation and classification techniques are not important. The most important element is the initial feature vector.

Unlike Manavi and Hamzeh [14,15,16], Moreira et al. [17] considered zero-day ransomware attack detection. To detect zero-day attacks, Sgandurra et al. [19] evaluated the detection of each family in the considered dataset by separating all samples of one family for testing and using all remaining ransomware and goodware to train the model. Moreira et al. [17] noticed that this methodology for detecting zero-day attacks [19] is not correct: because the test set contains only ransomware samples, the recall value always equals the accuracy value. Such a result, where accuracy and recall values are equal, can be observed in all of Manavi and Hamzeh’s research studies [14,15,16]. Moreira et al. [17] improved this testing methodology by adding randomly selected goodware samples to the test set, in the same number as the samples present in the evaluated family. Zero-day attack detection was performed for the second dataset only. The obtained mean accuracy over all ransomware families in the zero-day attack detection mode was 93.84%, which is much less than the accuracy of 98.20% obtained without this mode. So, there is room for improvement in detecting zero-day ransomware attacks.
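The equality of accuracy and recall under the methodology of [19] follows directly from the standard metric definitions. As a short worked derivation, assume the test set contains only ransomware samples, so that there are no true negatives (TN) and no false positives (FP), only true positives (TP) and false negatives (FN):

```latex
\mathrm{Accuracy}
  = \frac{TP + TN}{TP + TN + FP + FN}
  \;\overset{TN = FP = 0}{=}\;
  \frac{TP}{TP + FN}
  = \mathrm{Recall}
```

Once benign samples are added to the test set, TN and FP are no longer forced to zero, so the two metrics can differ.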

Unlike the research works considered above [14,15,16,17], other research works [20,21,22,23,24] that investigated ransomware detection using the PE header for feature extraction did not use the whole 1024-byte vector. Vehabovic et al. [20] investigated a solution for ransomware detection and constructed a “minimalistic” ransomware dataset with 100–120 training samples per class. However, such a dataset can hardly be considered minimalistic, since Moreira et al. [17] constructed a dataset with 13–50 ransomware samples per class and did not call it “minimalistic”. The extracted feature sets, in contrast, were truly minimalistic. A total of four feature sets were created by forming vectors with 5, 7, 10, and 15 parameters. Each subsequent vector extended its predecessor by adding new parameters. The parameters were chosen after careful investigation of the fields of the File Header, Optional Header, and Section Header. Vehabovic et al. [20] declared early ransomware detection; however, their main attention was on ransomware detection. Surprisingly, the Random Forest and extreme gradient boost classifiers obtained an accuracy of more than 90% for a feature set with only five features. Finally, zero-day ransomware detection was performed according to the Sgandurra et al. [19] model, where one ransomware class is excluded from the training and used for the testing. The obtained results were not encouraging, since an accuracy higher than 90% was obtained for only four of the nine classes. Again, the best performing classifiers were Random Forest and extreme gradient boost.

Deng et al. [21] presented a method using 14 PE header static features to detect early ransomware attacks. The features were taken from two headers, the File Header and the Optional Header. The developed method was the first to apply deep reinforcement learning to PE header static features to detect ransomware. A dataset containing 27,118 goodware samples and 35,367 ransomware samples was built. This is a large dataset; however, it is imbalanced. An imbalanced dataset presents an issue for learning-based classifiers, since they are biased towards the majority class [25]. The mean accuracy of ransomware detection on this dataset was 97.9%. For early ransomware detection, testing was performed on unseen samples contained in a second dataset, while training was carried out on the first dataset. The second dataset included 688 ransomware samples and was built in 2016. The obtained mean accuracy for unseen ransomware samples was 99.3%. This result contradicts the results obtained in the works considered above [17,20], where the accuracy for unseen samples was lower than for trained samples. Moreover, the obtained mean accuracy is very high. This very optimistic result can be explained by the following reasons. Firstly, the dataset of unseen samples was constructed in 2016, whereas training was performed on more recent ransomware samples, so the unseen samples were more primitive than the training samples. Secondly, the training dataset was very large and imbalanced. Thirdly, Deng et al. [21] declared that they will explore zero-day ransomware detection in the future. This amounts to an admission by the authors that such a mode of testing is not suitable for detecting zero-day ransomware.

Cen et al. [22] proposed a method based on zero-shot learning for early detection of zero-day ransomware attacks. The developed method was the first to apply zero-shot learning to PE header static features to detect zero-day ransomware. The feature set included 87 features. The dataset used to assess the method was provided by Sgandurra et al. [19]. It included 942 benign and 582 ransomware samples, which makes it highly imbalanced. To detect zero-day ransomware, the training dataset included seven different ransomware classes, while the testing dataset included four varying ransomware classes. However, the algorithm used to vary the ransomware classes in testing is neither presented nor discussed. The mean accuracy of zero-day ransomware detection was 96.02%. The shortcomings of the presented approach are as follows. Firstly, the dataset used is outdated. Secondly, the dataset is highly imbalanced, yet cross-validation was not applied. By comparison, Zahoora et al. [26], who proposed a method using zero-shot learning to detect zero-day ransomware, used 5-fold cross validation. Thirdly, it is not clear whether the tested ransomware classes were actually varied.

Moreira et al. [23] introduced a method that combined various structural features extracted from PE header static features to detect zero-day ransomware attacks. The authors determined that N-gram features are unsuitable for detecting zero-day ransomware attacks. The publicly available dataset used for the experiments was obtained from [17]. The dataset was then augmented by adding samples from 15 recent ransomware families and 133 benign samples. This new part of the dataset was made public as well and was used for the testing of zero-day ransomware attacks. The mean accuracy for the training dataset was 98.41%. The mean accuracy for the testing dataset was 97.53%. The obtained accuracy values are sufficiently high; however, there is still room for improvement, since the feature set used is also quite large.

Yang et al. [24] provided a method to detect ransomware attacks. The feature set included all the features of N-gram, dynamic link libraries (DLLs), subsystem, and entropy. The model combining all the features used 492,253 features, which means that the feature set is very large. The dataset for the experiments was custom built and included 1200 ransomware samples and 1200 benign samples. Cross validation was used in testing ransomware samples. The mean accuracy of ransomware detection was 99.77%. The obtained accuracy is very high; however, several limitations of the presented approach can be noticed. Firstly, the very large number of features increases computation time, and a GPU is needed to ease the computational burden. Secondly, the approach is not oriented to zero-day attack detection. Several shortcomings of the presentation can be spotted as well. Firstly, the term “CNN” was mentioned only three times in the paper: in the title, in the section on related work, and in the section on system structure. The structure and parameters of the CNN were neither considered nor presented, “because of the complexity of network layers and structure, we used symbolic structure representation”. Secondly, the data availability statement, “No datasets were generated or analyzed during the current study”, is inaccurate, since the authors assembled 1200 samples of ransomware from 80 different families to construct a dataset for the experiments. Such withholding of information means that the experiments cannot be replicated.

To summarize the review, the related works can be grouped according to several indicators. One of the most distinguishing indicators is the formation of the feature set for classification. Four research works [14,15,16,17] used the same full 1024-byte vector of the PE file header, but different techniques for transforming this vector and different classification techniques. However, in all cases, almost the same classification result was obtained. This observation leads us to conclude that using a 1024-byte PE file header vector as a feature set is not a promising research direction. A more promising direction is to use various combinations of PE file header static features. Another challenging indicator is the dataset used to evaluate the proposed method. We and other researchers [5,10,18,27] have noticed that there is currently no standard dataset for ransomware classification. Therefore, many researchers [14,17,20,21,23,24] have developed their own datasets. However, results of different works cannot be compared correctly when the datasets are different. Only Moreira et al. [17,23] made their dataset publicly available. This dataset has the potential to become a benchmark dataset for ransomware detection; however, other researchers have not yet taken it up. We will support the use of the dataset provided by Moreira et al. [17,23]. Finally, zero-day ransomware attack detection [14,17,20,22,23] is an important indicator. The testing methodology for zero-day attacks proposed by Moreira et al. [17], in which randomly selected benign samples are added to the test set in the same number as the samples present in the evaluated family, is preferred, since this methodology ensures correct values of the accuracy and recall metrics.

3. Proposed Method

3.1. Method Overview

This section describes the design of the method. Building on the related works and techniques reviewed in the previous section, we propose a method to detect zero-day ransomware attacks using static analysis. Static analysis can be vulnerable to deceptive techniques such as packing, obfuscation, and polymorphism [18]. However, a comprehensive static analysis is less susceptible to deceptive techniques because it is difficult for malicious developers to apply these evasion techniques to multiple structural features [13]. Therefore, we constructed a comprehensive static feature set by combining several different static features into a single feature set. Section analysis is particularly relevant when dealing with obfuscated or packed binary files [28]. Examining the randomness of each section of a binary file is one of the methods for identifying such content. High randomness values are often associated with encryption and lossless compression functions; therefore, the randomness value is high when the section contains obfuscated or packed code [28].
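To make the idea of section randomness concrete, the following is a minimal sketch that compares a Chi-square statistic with Shannon entropy on raw section bytes. It assumes the statistic is computed on the byte histogram against a uniform distribution over the 256 byte values; how the method maps this statistic to its randomness feature is not specified in this section, so the function names and the sample data are illustrative only. Under this assumption, a smaller statistic means the content is closer to uniformly random.

```python
import math
import os
from collections import Counter


def chi_square_uniform(data: bytes) -> float:
    """Chi-square statistic of the byte histogram against a uniform distribution."""
    if not data:
        raise ValueError("section content is empty")
    expected = len(data) / 256  # expected count per byte value if uniform
    counts = Counter(data)
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0 for constant data, 8 for uniform data)."""
    if not data:
        raise ValueError("section content is empty")
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


# Hypothetical section contents, not taken from the dataset.
text_like = b"push ebp; mov ebp, esp; " * 512   # repetitive, low randomness
random_like = os.urandom(len(text_like))        # stands in for packed/encrypted data

for name, blob in [("text-like", text_like), ("random-like", random_like)]:
    print(name, round(chi_square_uniform(blob), 1), round(shannon_entropy(blob), 3))
```

For high-entropy data, Shannon entropy saturates near 8 bits per byte, whereas the Chi-square statistic still varies noticeably, which is one commonly cited reason for using it to separate compressed from encrypted content.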

Figure 1 demonstrates the workflow of the proposed static analysis method.

Firstly, valuable structural information is extracted from each PE sample. This information includes the PE header fields, imported DLL features, function calls, and the randomness of section contents. The PE header fields, DLL features, and function calls are included because they characterize the software itself. Section randomness is included to detect encrypted and packed malicious files and can therefore play an important role in identifying malicious behavior.

To improve the efficiency of the method, the initial feature set is reduced by applying thresholds, which eliminate features that carry little information. Several types of threshold were applied, depending on the feature set. A standard deviation (SD) threshold was applied to the PE header. A mean difference threshold was applied to the usage of DLLs in ransomware and benign software. A count difference threshold was applied to the usage of sections in ransomware and benign software. The resulting feature sets are then joined into a single comprehensive static feature set.

To measure the performance of the proposed approach, we employed common machine learning metrics used for binary classification. They are as follows: accuracy, precision, recall, and F-score. These metrics are defined in terms of four values: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). TP denotes the number of ransomware samples correctly classified as ransomware; TN represents the number of benign samples correctly classified as benign; FP indicates the number of benign samples incorrectly classified as ransomware; and FN represents the number of ransomware samples incorrectly classified as benign. The recall metric was used as the primary metric for model comparison since it is the most important measure for detecting ransomware. To reveal the variability of the performance results, a confidence level of 95% was chosen and a confidence interval was calculated.
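The four metrics can be written directly in terms of TP, TN, FP, and FN. The sketch below uses the standard definitions; the confidence interval helper is an assumption, since the exact interval procedure is not given here, and it uses a normal approximation (z = 1.96) over repeated runs. The function names and the example counts are hypothetical.

```python
import math
import statistics


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, precision, recall, and F-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}


def mean_ci95(values):
    """Mean and 95% confidence interval (normal approximation, an assumption)."""
    mean = statistics.mean(values)
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - half_width, mean + half_width


# Made-up counts for one test run: 50 ransomware and 50 benign samples.
print(binary_metrics(tp=45, tn=40, fp=10, fn=5))

# Made-up recall values from repeated runs.
print(mean_ci95([0.96, 0.97, 0.98, 0.97, 0.96]))
```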

We applied the zero-day approach to each feature set. The approach follows two rules: (1) no samples of the tested family are used during model training; (2) randomly selected benign samples are included in the test set in the same number as the samples of the family being tested. Therefore, when one family was separated for testing, the remaining ransomware families were used for training. When training was complete, the test set for the tested family was extended with randomly selected benign samples in the same number as the family being tested. Some ransomware families had fewer than 15 samples; for these, additional benign samples were added so that the test set contained no fewer than 30 samples, a size large enough to calculate the confidence interval [29]. As a result, some test sets are not completely balanced. To address this class imbalance, we performed ten iterations of stratified K-fold cross validation with K = 10. Following this, the assessment of each family was repeated ten times according to the rules of the zero-day approach, and the mean value of each measured metric was calculated.

To test a specific ransomware family, an equal number of randomly selected benign samples is added to the test dataset. The remaining ransomware families are used for training a classifier. A stacking classifier is used for classification.
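The two rules of the zero-day approach can be sketched as a leave-one-family-out split. This is an illustration under stated assumptions, not the authors' implementation: the function name, the sample identifiers, and the family names are hypothetical, and the sketch assumes that benign samples drawn for testing are held out of training, which the text does not state explicitly.

```python
import random


def zero_day_split(ransomware_by_family, benign, test_family,
                   min_test_size=30, seed=0):
    """Hold out one ransomware family and pad the test set with benign samples."""
    rng = random.Random(seed)

    # Rule 1: no sample of the tested family is used for training.
    test_ransomware = list(ransomware_by_family[test_family])
    train_ransomware = [
        sample
        for family, samples in ransomware_by_family.items()
        if family != test_family
        for sample in samples
    ]

    # Rule 2: add as many benign samples as ransomware samples, and more
    # when needed so that the test set has at least `min_test_size` samples.
    n_benign = max(len(test_ransomware), min_test_size - len(test_ransomware))
    test_benign = rng.sample(benign, n_benign)

    # Assumption: benign samples drawn for testing are kept out of training.
    held_out = set(test_benign)
    train_benign = [sample for sample in benign if sample not in held_out]

    # Label 1 = ransomware, label 0 = benign.
    train = [(s, 1) for s in train_ransomware] + [(s, 0) for s in train_benign]
    test = [(s, 1) for s in test_ransomware] + [(s, 0) for s in test_benign]
    return train, test


# Hypothetical sample identifiers; family names and sizes are made up.
families = {
    "FamilyA": [f"a{i}" for i in range(10)],
    "FamilyB": [f"b{i}" for i in range(40)],
}
benign_samples = [f"g{i}" for i in range(100)]

train_set, test_set = zero_day_split(families, benign_samples, "FamilyA")
print(len(train_set), len(test_set))
```

Repeating the split with different `seed` values and averaging the metrics corresponds to the repeated assessment of each family.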

Next, we introduce a dataset used for experimentation and we provide the details on each step of the method.

3.2. Dataset

The proposed method was evaluated using the dataset developed by Moreira et al. [23]. This dataset consists of two parts, training and testing. The training part includes 1023 ransomware and 1134 benign samples. The testing part includes 385 ransomware and 133 benign samples. The ransomware of the training part can be divided into 25 different families, and the ransomware of the testing part into 15 different families. The names and the number of samples of the ransomware families are provided in Table 1.

3.3. PE Header

PE is the standard file format for executable files in the Microsoft Windows OS. The PE format covers a wide range of file types, such as executables (.exe), device drivers (.sys), dynamic link libraries (.dll), and several others. A PE file contains many specific characteristics that make it possible to identify the file. Therefore, these characteristics can be used successfully to distinguish between benign and ransomware samples. The PE file contains two main parts: the header and the sections [30]. The header part includes the following structures: DOS Header, NT Headers, File Header, Optional Header, and Section Table. The PE header part has a fixed structure, whereas the section part is closely related to the function of the file [31]. We directly use fields from the PE header only. Initially, we selected 85 numerical features from the PE header. During experimentation, we observed that the values of some features are constant across ransomware and benign samples. If the values do not vary, they provide no information for classification. Features with constant values are redundant; they contribute only noise and should be removed. To identify the variability of the values, the standard deviation can be used, since it measures the dispersion of the values of a variable around its mean. A standard deviation of 0 indicates that all the values of the variable are equal. Such a variable carries no information that can be used to differentiate between samples. Therefore, we applied the SD threshold to remove the features with an SD equal to 0. Eight features were removed. The remaining features of the PE header are presented in Table 2.
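The SD threshold amounts to dropping every feature column whose standard deviation is 0. A minimal sketch follows, assuming the features are held as rows of numeric values. The function name and the values are hypothetical, and the three field names are standard PE header fields used only for illustration (`e_magic` is the same in every valid PE file, so it is a natural example of a constant feature).

```python
import statistics


def drop_constant_features(rows, names):
    """Remove feature columns whose standard deviation across samples is 0."""
    columns = list(zip(*rows))
    keep = [i for i, column in enumerate(columns)
            if statistics.pstdev(column) > 0]
    kept_names = [names[i] for i in keep]
    kept_rows = [[row[i] for i in keep] for row in rows]
    return kept_rows, kept_names


# Hypothetical feature matrix: one row per sample, values are made up.
feature_names = ["e_magic", "NumberOfSections", "SizeOfCode"]
samples = [
    [23117, 4, 4096],
    [23117, 6, 20480],
    [23117, 3, 1024],
]

filtered_rows, filtered_names = drop_constant_features(samples, feature_names)
print(filtered_names)  # ['NumberOfSections', 'SizeOfCode']
```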

3.4. DLL Features and Function Calls

The section part of the PE file contains many specific sections. One of these sections is the .idata section, which contains one entry for each imported dynamic link library (DLL). DLLs contain many subroutines that perform common actions. The subroutines are grouped according to the actions they perform and are joined into a specific DLL. DLLs have the specific property that they are loaded into memory whenever needed and released from memory when they are no longer needed. Therefore, DLLs are an important part of a program because they ensure efficient use of available memory and resources through dynamic linking. It is possible, though difficult to imagine, that a program would not use DLLs. Consequently, certain characteristics of a program can be determined from the set of DLLs it uses. Analysis of the import table of the .idata section can help identify ransomware.

A total of 576 different DLLs were found in the dataset used. This number is too large to treat every DLL as a separate feature, so we decided to form integrated characteristics of the DLLs. First, we counted the number of different DLLs used by each dataset sample. Next, we determined the similarity of the DLLs used between separate samples. A common measure of similarity between two vectors is the cosine index [32,33]. To measure the similarity between two sets of DLLs, the sets have to be converted into two vectors. A simple conversion can be used: an entry of the vector is assigned the value 1 if the corresponding DLL is present and the value 0 otherwise. When the vectors hold binary values, the cosine similarity index can be interpreted in terms of common attributes. A simple variant of cosine similarity can be applied to this scenario. This variant is called the Tanimoto coefficient [34], and it is defined as follows:

\mathrm{sim}(x, y) = \frac{x \cdot y}{x \cdot x + y \cdot y - x \cdot y}

(1)

The Tanimoto coefficient defines the ratio of the number of attributes shared by x and y to the number of attributes possessed by x or y. In this way, the similarity indices are calculated between a specific sample and all the remaining samples of the dataset, which yields a vector of similarity values for each sample. To measure the central tendency of this vector, the median is calculated, since the median is less affected by outliers and skewed data than the mean.
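For binary vectors, x · y is the number of shared DLLs and x · x + y · y − x · y is the number of DLLs in the union, so Formula (1) can be computed directly on sets of DLL names. The following sketch illustrates the per-sample median similarity under that reading. The function names, the three toy samples, and the convention of returning 0.0 for two empty sets are assumptions made for the example, not details taken from the method.

```python
from statistics import median


def tanimoto(a, b):
    """Tanimoto coefficient of two sets: shared items / items in either set."""
    union = len(a | b)
    # Assumption: two empty import lists are treated as having similarity 0.
    return len(a & b) / union if union else 0.0


def median_similarity(samples):
    """For each sample, the median Tanimoto similarity to all other samples.

    Requires at least two samples.
    """
    result = []
    for i, current in enumerate(samples):
        sims = [tanimoto(current, other)
                for j, other in enumerate(samples) if j != i]
        result.append(median(sims))
    return result


# Toy import lists (illustration only).
samples = [
    {"kernel32.dll", "user32.dll", "advapi32.dll"},
    {"kernel32.dll", "user32.dll"},
    {"kernel32.dll", "crypt32.dll"},
]
dll_similarity = median_similarity(samples)
print(dll_similarity)
```

Each sample ends up with one number, which is how a set of several hundred DLL names is condensed into a single feature.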

The last component of the DLL integrated characteristics is a list of DLLs selected according to a threshold criterion. The idea is to select the DLLs that are frequently used either by ransomware or by benign software, but not by both. DLLs frequently used by ransomware help to identify ransomware, and DLLs frequently used by benign software help to identify benign software. DLLs that are frequently used by both do not help to distinguish ransomware from benign software. To implement the idea, a matrix is formed in which the rows correspond to the samples and the columns correspond to the DLLs. An entry of the matrix is assigned the value 1 if the DLL is used by the sample in that row and the value 0 otherwise. This assignment is performed separately for ransomware and benign samples. A mean value is then calculated for every DLL, separately for ransomware and benign software. This value is a relative measure of the frequency of DLL use in the samples. The mean values obtained for the same DLL in ransomware and in benign software are then compared. The inclusion criterion is a mean difference higher than 0.1 between the use of the DLL in ransomware and in benign software; that is, 0.1 is the minimum difference in usage frequency that a DLL must exceed to be selected. The selected DLLs are provided in Table 3. A total of 40 DLLs were selected according to the defined threshold.
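The selection rule can be sketched as follows. The column mean of a 0/1 usage matrix equals the fraction of samples that import the DLL, so the sketch works on sets of DLL names instead of building the matrix explicitly. Taking the absolute value of the difference is an assumption, made so that DLLs typical of either class are selected. The function names and the toy samples are hypothetical.

```python
def usage_frequency(samples, dll):
    """Fraction of samples that import the given DLL (mean of a 0/1 column).

    Assumes the list of samples is not empty.
    """
    return sum(dll in sample for sample in samples) / len(samples)


def select_dlls(ransomware, benign, threshold=0.1):
    """Return the DLLs whose usage frequency differs by more than threshold."""
    all_dlls = set().union(*ransomware, *benign)
    selected = []
    for dll in sorted(all_dlls):
        # Assumption: the direction of the difference does not matter.
        diff = abs(usage_frequency(ransomware, dll) - usage_frequency(benign, dll))
        if diff > threshold:
            selected.append(dll)
    return selected


# Toy import lists (illustration only).
ransomware = [
    {"kernel32.dll", "crypt32.dll"},
    {"kernel32.dll", "crypt32.dll"},
    {"kernel32.dll"},
]
benign = [
    {"kernel32.dll", "gdi32.dll"},
    {"kernel32.dll"},
    {"kernel32.dll", "gdi32.dll"},
]
selected_dlls = select_dlls(ransomware, benign)
print(selected_dlls)
```

In the toy data, the DLL imported by every sample of both classes has a frequency difference of 0 and is therefore left out, while the two class-specific DLLs are kept.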

The DLLs considered are just names that group functions; the main operating entities are the functions themselves. A program can be distinguished from others by the functions that it calls to perform a specific action, and the list of function calls can reveal the behavior of the program. The collected list of function calls can therefore further improve the capability to distinguish between benign software and ransomware. A total of 9961 different function calls were identified. This number is far too large to treat every function call as a separate feature. We decided to form a single integrated feature of all function calls. We formed it in the same way as for the DLL names, using the Tanimoto coefficient to determine the similarity between the function calls of two separate samples.

3.5. Section Randomness

The section part of a PE file contains many sections, for example, the executable code section (.text), the resource handling section (.rsrc), the exception handling section (.pdata), and many others. The section part is closely related to the function of the file. Therefore, the number of sections varies quite substantially between files. Some sections, such as .text, .data, and .rsrc, are present in many files, whereas others, such as .ndata, .bss, and .itext, are present in only a few files. To measure the randomness of section contents, all the reviewed works [22,23,24] used Shannon entropy. However, several studies [35,36,37,38] have stressed that Shannon entropy does not make it possible to distinguish reliably between compressed and encrypted files, since both types of files demonstrate similar values. Other standard statistical measures that can be used to measure the randomness of information are the Chi-square test, the Kullback–Leibler distance, and serial byte correlation [39]. After comparing several methods, Davies et al. [39] concluded that the Chi-square test produced the highest accuracy. Moreover, Palisse et al. [36] and Arakkal et al. [38] used the Chi-square test instead of Shannon entropy to measure the randomness of file contents. For these reasons, we decided to measure the randomness of section contents using the Chi-square test.

The Chi-square (χ²) test is a statistical goodness-of-fit test. It measures how closely an observed distribution matches an expected distribution [40]. It is a non-parametric statistical test, which means that no assumptions are made about the distribution of the samples. The observed data sequence is considered discrete, and it is arranged in a frequency histogram over the byte values [0,255]. Formula (2) for calculating the Chi-square statistic is as follows:

\chi^{2} = \sum_{i = 0}^{255} \frac{(O_{i} - E_{i})^{2}}{E_{i}}

(2)

where O_i is the observed value and E_i is the expected value for byte value i.
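Formula (2) can be illustrated with a short Python sketch that computes the statistic for the raw bytes of a section. The sketch assumes that the expected distribution is uniform, so that E_i equals the section length divided by 256 for every byte value. This is the usual choice when testing for randomness, but it is an assumption here. The function name `chi_square_bytes` and the two test buffers are also hypothetical.

```python
from collections import Counter


def chi_square_bytes(data: bytes) -> float:
    """Chi-square statistic of a byte sequence against a uniform distribution."""
    if not data:
        raise ValueError("section content is empty")
    # Assumption: every byte value 0..255 is expected equally often.
    expected = len(data) / 256
    observed = Counter(data)  # iterating over bytes yields integers 0..255
    return sum((observed.get(i, 0) - expected) ** 2 / expected
               for i in range(256))


# A perfectly flat histogram and a constant buffer as two extreme cases.
flat = bytes(range(256)) * 4
constant = b"\x00" * 1024
chi_flat = chi_square_bytes(flat)
chi_constant = chi_square_bytes(constant)
print(chi_flat, chi_constant)
```

A value close to 0 means that the byte histogram is close to uniform, as expected for encrypted content, whereas structured content produces a large value.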

Our method identified 116 different sections in the training data. To be consistent with the zero-day testing methodology, only the sections of the training part of the dataset were considered. Some sections are present in only a few samples. We decided to apply a threshold value for the inclusion of sections in the feature set. The threshold value is the difference between the number of ransomware samples and the number of benign samples that contain the specific section. Since we are targeting zero-day ransomware, sections present in just a few samples are also of interest. Therefore, the threshold value was chosen to be 10. The selected sections are presented in Table 4. A total of 32 sections were selected according to the defined threshold. The chosen sections constitute only 28% of the initial set of sections.

It should be noted that we also checked the sections of the testing part of the dataset. Some new sections, which were not present in the training part, were observed. The sections of the testing part influence the overall frequency of section use. If the sections of both parts of the dataset were considered, the selected sections could differ slightly.

3.6. Choice of Classifier

Once the features are selected, the next step in the classification process is to choose a classifier. A common approach is to employ several typical ML and DL classifiers and to choose the best-performing one to detect new ransomware variants. However, many research works [23,41,42,43] have shown that ensemble learning techniques perform better than a single classifier. The performance of ensemble classifiers depends on the chosen base classifiers and on how they are combined to produce the final classification result. The three best-known ensemble learning methods [44] are bagging [45,46], boosting [47,48], and stacking [49,50]. We have chosen the stacking ensemble method since it is a generic framework that combines many ensemble methods, either homogeneous or heterogeneous. The stacking ensemble method employs two levels of learning: base learning and meta-learning. In base learning, the base classifiers are trained with the training dataset. After training, the base classifiers create a new dataset for the meta-classifier. The meta-classifier is then trained with the dataset formed by the base classifiers. The trained meta-classifier is used to classify the test set. The main difference between stacking and other ensemble methods is that stacking applies meta-level learning-based classification as the final classification. Maniriho et al. [50] used support vector machines (SVM), logistic regression (LR), and stochastic gradient descent (SGD) as base classifiers, and the CatBoost classifier as the meta-classifier. We start our stacking ensemble method with this combination of classifiers, since Maniriho et al. [50] were successful in detecting new malware attacks. Moreover, Hancock and Khoshgoftaar [51] reported in their review that the CatBoost classifier was the most successful when comparing CatBoost, LightGBM, SVM, and logistic regression in multi-class and binary classification tasks for identifying computer network attacks. Of this combination of classifiers, the least-known classifier is SGD. Therefore, we explore the possibility of replacing this classifier with other, more frequently used classifiers. These classifiers include Random Forest (RF) [23,33,42], gradient boosting (GB) [33,42], and K-nearest neighbor (KNN) [22,43].

4. Experiments and Discussion

4.1. Choice of the Base Classifiers

The classifiers are used at the last stage of the process. However, they form and deliver the final result of the assessment. Therefore, they have to be chosen first. As stated in the design of the method, the initial stacking classifier combines SVM, LR, and SGD as the base classifiers, with the CatBoost classifier as the meta-classifier. The weakest link of this set is the SGD classifier, since it is the least used classifier in the research literature. We explored the possibility of replacing it with other, more frequently used classifiers: RF, GB, and KNN. The CatBoost classifier possesses the features needed to serve as the meta-classifier [52], since it combines a collection of base learners. Furthermore, it is recognized as a classification algorithm capable of achieving excellent performance because it uses symmetric decision trees.

The results of the experiment to choose the base classifiers for the stacking classifier are provided in Table 5. The assessment metric is recall, since it reflects how well ransomware is detected. The feature set includes all the features considered: PE header, DLL count, DLL similarity average, DLL list, function similarity average, and Shannon entropy. Note that, in this experiment, the common cosine similarity was used as the similarity measure and Shannon entropy was used to measure the randomness of the sections. The confidence interval was not calculated.

As expected, Table 5 shows that the worst performance was obtained by the stacking classifier that included SVM, LR, and SGD. The performances of the other classifiers are quite similar, with only small differences. Nevertheless, the stacking classifier that includes SVM, LR, and RF showed slightly better results than the others. This stacking classifier is our choice for further experiments.
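A stacking classifier with SVM, LR, and RF as base classifiers can be sketched with scikit-learn as shown below. This is only an outline under several assumptions: scikit-learn's `GradientBoostingClassifier` stands in for the CatBoost meta-classifier so that the example has a single dependency, the data are synthetic and produced by `make_classification`, and all hyperparameters are library defaults or arbitrary illustrative values rather than the settings used in the experiments.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic two-class data standing in for the static feature set.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Level 1: the base classifiers.
base_classifiers = [
    ("svm", SVC(random_state=0)),
    ("lr", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
]

# Level 2: the meta-classifier is trained on the cross-validated outputs
# of the base classifiers.
# Assumption: gradient boosting is used here in place of CatBoost.
stack = StackingClassifier(
    estimators=base_classifiers,
    final_estimator=GradientBoostingClassifier(random_state=0),
    cv=5,
)
stack.fit(X_train, y_train)

y_pred = stack.predict(X_test)
recall = recall_score(y_test, y_pred)
print(f"recall on the held-out part: {recall:.3f}")
```

Replacing the `final_estimator` argument with a CatBoost classifier would bring the sketch closer to the configuration described in the text, at the cost of an additional dependency.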

4.2. Comparison Between Cosine Similarity and Tanimoto Coefficient to Choose a Similarity Measure

Cosine similarity measures the similarity between two vectors. It is a common similarity measure that is widely used in different fields. The formula for cosine similarity involves calculating the square root of the sum of the squared values of each vector, which requires additional computation. When the vectors contain binary values only, a simpler variant of cosine similarity, the Tanimoto coefficient, can be applied. The formula of the Tanimoto coefficient does not involve calculating the square root of the sum of the squared values of a vector. So, it is more computationally efficient than cosine similarity.
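The difference between the two formulas can be seen in a small Python sketch that computes both measures for the same pair of binary vectors. The function names and the two example vectors are assumptions made for the illustration, and both functions assume that neither vector is all zeros.

```python
import math


def dot(x, y):
    """Dot product of two equally long vectors."""
    return sum(a * b for a, b in zip(x, y))


def cosine_similarity(x, y):
    """Cosine similarity; needs one square root per vector norm."""
    return dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))


def tanimoto_coefficient(x, y):
    """Tanimoto coefficient; uses only dot products and a subtraction."""
    xy = dot(x, y)
    return xy / (dot(x, x) + dot(y, y) - xy)


# Two binary usage vectors: three DLLs each, two of them shared.
x = [1, 1, 1, 0]
y = [1, 1, 0, 1]
cos_xy = cosine_similarity(x, y)
tan_xy = tanimoto_coefficient(x, y)
print(cos_xy, tan_xy)
```

For this pair, cosine similarity is about 0.667 and the Tanimoto coefficient is 0.5. The two measures therefore differ in scale, but both equal 1 only for identical binary vectors and 0 for vectors with nothing in common.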

The results of the experiment comparing cosine similarity and the Tanimoto coefficient as measures of similarity among the DLLs used are shown in Table 6. The metric used for the assessment is recall. The feature set includes the following features: PE header, DLL count, DLL similarity average, DLL list, and function similarity average. We measure the effect of the similarity measure in the context of several feature groups, since this is the context in which the similarity measure will be used.

Table 6 shows that the average performance of the Tanimoto coefficient (97.62) is slightly lower than that of cosine similarity (97.65). However, the average margin of error of the Tanimoto coefficient (0.34) is smaller than that of cosine similarity (0.53). This means that the results are more stable when using the Tanimoto coefficient. We conclude that using the Tanimoto coefficient instead of cosine similarity yields practically the same performance with a smaller margin of error.

4.3. Comparison Between Shannon Entropy and Chi-Square Test to Measure Randomness

The experiment to compare the performance of Shannon entropy and the Chi-square test was performed on the testing part of the dataset. Since 7 of the 15 families have fewer than 15 samples, the test set was enlarged by adding benign samples so that it contains no fewer than 30 samples. The results of the experiment are shown in Table 7. The metric used for the assessment is recall. The feature set includes the following features: PE header, DLL count, DLL similarity average, DLL list, function similarity average, and section entropy. We measure the effect of the section randomness measure in the context of the whole feature set, since we are interested in how it fits in this context. The sections used are shown in Table 4. As a reminder, Table 4 includes only the sections from the training part of the dataset.

Table 7 shows that the average performance of the Chi-square test (96.27) is considerably higher than that of Shannon entropy (93.82). Moreover, the average margin of error of the Chi-square test (0.67) is considerably smaller than that of Shannon entropy (1.01). This means that the results are more stable when using the Chi-square test. We conclude that using the Chi-square test instead of Shannon entropy yields better performance and a smaller margin of error.

4.4. The Results of Ransomware Zero-Day Detection

In the previous subsections of Section 4, we justified the choices made in our proposed method using the results of different experiments. In this subsection, we provide the results of experiments applying the proposed method. The results are provided in the order in which the feature set is expanded. The results for the training part of the dataset are provided in Table 8. The upsampling method, which was introduced in Section 3.1, is first applied to the training part of the dataset. However, only 2 of the 25 families had fewer than 15 samples in the training part of the dataset. The recall metric is used. The feature set is expanded step by step, and the results of each expanded feature set are provided. For example, the second column includes only the results of the PE header. The third column provides the results of the feature set that includes the PE header and DLL count, and so on. In this way, we observe the contribution of each part of the feature set to the overall result. The last column shows the overall performance of the whole feature set. This procedure resembles an ablation experiment performed in the opposite direction: features are added rather than removed.

We can observe that the average recall value of the whole feature set over all families is lower than the average recall value of the feature set that does not include the section entropy calculation. This decrease was mostly influenced by three ransomware families: DoppelPaymer, Makop, and Zeppelin. As a reminder, the choice of the entropy calculation was made on the results of the testing part of the dataset. We acknowledge that the results of the feature sets for the training and testing parts do not show the same tendency. However, the results of the testing part of the dataset are more important than those of the training part, since they show the readiness of the method to detect unknown ransomware.

Next, we can observe that the average recall value of the whole feature set is slightly lower than the average recall value of the feature set that includes only the PE header. This is not the result we had hoped for. However, the variation in the values among different experiment trials is smaller in the last column (0.55) than in the PE header column (0.91). This is a positive outcome, since it shows that the results are more stable when the feature set includes more diverse features than when it contains only the PE header.

In addition, we can observe that expanding the feature set, except for the function average, does not improve the average value over all families. The average value remains almost stable; however, the recall values of specific ransomware families vary in different ways. Again, we reach the same conclusion as before: a feature set that includes more diverse features yields more stable values than a feature set that includes fewer features.

The results of the testing part of the dataset are shown in Table 9. The upsampling method, which was introduced in Section 3.1, is applied to the testing part of the dataset. In line with the proposed zero-day method, only the ransomware families of the training part were used for training when testing the ransomware families of the testing part. The sample size increase was particularly relevant for the testing part of the dataset, since 7 of the 15 families had fewer than 15 samples. Moreover, the benign samples added to the testing set were taken from the benign testing part, which was not used to train the classifier. Consequently, the precision of the testing part families can reach 100% only in rare cases.

We see that the recall metric shows very good results, since 100% is achieved for 12 of the 15 families. Considering the average value, the highest value is obtained when the feature set includes the features of the PE header, DLL count, DLL average, and DLL list. However, the smallest margin of error is obtained when the full feature set is applied. Furthermore, the margin of error is 0.00 for the family Clop in this case, even though the highest recall value of 100% is not obtained. Such a result shows the stability of the results. Therefore, we can conclude that the presented method fulfilled our expectations.

4.5. Discussion

The results for the dataset of our choice were presented in two papers [17,23]. However, Moreira et al. [17] used only the training part of the dataset. Moreover, they presented the recall results in a figure rather than in a table, and the figure does not include the confidence intervals. Therefore, we compare the results of the F-score and accuracy metrics for the training part of the dataset (Table 10). We compare the results of the F-score, accuracy, and recall metrics for the testing part of the dataset (Table 11). The F-score combines precision and recall directly. The accuracy metric combines them implicitly. So, recall is reflected in both of these metrics.
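The relation between these metrics can be made concrete with a short Python sketch that derives all four from the counts of a confusion matrix. Ransomware is treated as the positive class, and the F-score is assumed to be the balanced F1 variant. The function name and the counts in the example are hypothetical and are not taken from Table 10 or Table 11.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy from confusion-matrix counts.

    tp -- ransomware classified as ransomware
    fp -- benign software classified as ransomware
    fn -- ransomware classified as benign software
    tn -- benign software classified as benign software
    Assumes that tp + fp > 0 and tp + fn > 0.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Assumption: the F-score is the harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall,
            "f_score": f_score, "accuracy": accuracy}


# Hypothetical counts: 20 ransomware and 30 benign test samples.
metrics = classification_metrics(tp=19, fp=5, fn=1, tn=25)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts, recall is 0.95 while the F-score is about 0.86 and accuracy is 0.88. The example shows how false positives on benign samples lower the F-score and accuracy without affecting recall.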

We can observe that the average values obtained in [17] are much lower than those obtained in [23] and with our method. The average values of accuracy and F-score obtained in [23] and with our method are almost the same. However, our method has the distinguishing feature that it obtained 100% accuracy and 100% F-score values for 10 of the 25 families, whereas the method in [23] obtained 100% for a single ransomware family only. This result supports the trustworthiness of our proposed method, which provides stable results for 10 of the 25 ransomware families.

The results of the testing part of the dataset of our choice were presented in [23] only. We compare the results of F-score, accuracy, and recall metrics (Table 11).

We can observe that our method obtained a higher average recall value than the method used in [23]. Recall is the main characteristic showing the ability to detect ransomware, so this result demonstrates the value of our proposed method. The method in [23] obtained a 100% recall value for 13 of the 15 families. Our proposed method obtained a 100% recall value for 12 of these 13 families, but not for the family BlackBasta. The margin of error of our proposed method for the family BlackBasta is 4.29, which is quite high. The intermediate results of the experiment show that our proposed method obtained a 100% recall value for the family BlackBasta in 6 of 10 trials. Thus, our method is able to obtain a 100% recall value for this family; however, the result is not stable.

A further advantage of our proposed method over the method in [23] is that it uses a much smaller feature set. The compact feature set gives our method a performance advantage in real-time applications.

To summarize, the advantages of our proposed zero-day attack detection method are as follows:

A comprehensive feature set that combines many different static features;
The Tanimoto coefficient to measure the similarity between DLLs, since it is more computationally efficient than the commonly used cosine similarity;
The Chi-square test to measure section randomness, since it yielded better performance and a lower margin of error than the commonly used Shannon entropy;
A stacking classifier, since it is a generic framework that enables combining different classifiers and performs better than a single classifier.

Finally, we acknowledge one limitation of our method. The main goal of our method is to detect ransomware as early as possible. In this case, the recall metric is the most important one, and our method prevailed over the other methods in this metric. However, the average values of accuracy and F-score obtained with our method are lower than those of the method in [23]. This means that our method is less effective at distinguishing benign software from ransomware samples.

5. Conclusions

The main direction of research on ransomware detection is the behavior-based approach. Behavior-based approaches can be static, dynamic, or hybrid, the last combining static and dynamic analysis. The static approach requires the least amount of resources and is the most secure, since it does not require the direct execution of the ransomware code. A further advantage of the static approach is that it can detect ransomware before the program is launched, which allows for timely detection of potential threats, prevents system infection, and saves considerable time and resources. Therefore, we have developed a static feature extraction approach that can detect zero-day attacks. The proposed approach analyzes the PE header of ransomware samples and extracts static features, since feature extraction from the PE header is a relatively straightforward and fast process compared to the extraction of other static features. The approach forms a combined, comprehensive static feature set that includes the PE header fields, DLL count, DLL average, DLL list, function call average, and a measure of section content randomness. To determine the DLL average, the similarity of the DLLs used was measured between pairs of samples. To the best of our knowledge, this is the first application of the Tanimoto coefficient to measuring the similarity of DLLs in the field of ransomware detection. The formula of the Tanimoto coefficient does not involve calculating the square root of the sum of the squared values of a vector, so it is more computationally efficient than cosine similarity. The experiments performed allow us to conclude that using the Tanimoto coefficient instead of cosine similarity yields practically the same performance with a lower margin of error. The same procedure was applied to determine the function call average.

To the best of our knowledge, this is the first application of the Chi-square test to measuring the randomness of the content of a PE section. The experiments conducted allow us to conclude that using the Chi-square test instead of Shannon entropy yields better performance and a lower margin of error.

Once the feature set is constructed, machine learning is applied to learn the extracted static features. We have chosen a stacking ensemble method since it is a generic framework that combines many ensemble methods, either homogeneous or heterogeneous. The stacking ensemble method employs two levels of learning: base learning and meta-learning. As base classifiers, we used support vector machines, logistic regression, and Random Forest. The CatBoost classifier was used as the meta-classifier. The comparison of the obtained results showed that our method clearly outperformed the method that used a feature set in the form of an image and a convolutional neural network for classification. For many ransomware families, our method showed slightly better performance than the method that formed a combined feature set in a way similar to ours and used an ensemble method with soft voting for classification. However, our feature set was much smaller, since we applied threshold values to decide whether to include a feature in the feature set. The comparison of the results was performed on the same publicly available dataset.

Author Contributions

Conceptualization, V.J. and A.V.; Methodology, V.J.; Software, D.B.; Validation, V.J., D.B. and A.V.; Formal Analysis, V.J.; Investigation, V.J.; Resources, V.J. and D.B.; Data Curation, V.J. and D.B.; Writing—Original Draft Preparation, V.J.; Writing—Review and Editing, V.J.; Visualization, D.B.; Supervision, A.V.; Project Administration, A.V.; Funding Acquisition, A.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Economic Revitalization and Resilience Enhancement Plan “New Generation Lithuania” as part of the execution of Project “Mission-driven Implementation of Science and Innovation Programmes” (No. 02-002-P-0001).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset is publicly available [23].

Conflicts of Interest

The authors declare no conflicts of interest.

References

Ispahany, J.; Islam, R.; Islam, Z.; Khan, M.A. Ransomware Detection Using Machine Learning: A Review, Research Limitations and Future Directions. IEEE Access 2024, 12, 68785–68813.
Sophos. The State of Ransomware 2023. 2023. Available online: https://assets.sophos.com/X24WTUEQ/at/c949g7693gsnjh9rb9gr8/sophos-state-of-ransomware-2023-wp.pdf (accessed on 27 March 2025).
Sophos. The State of Ransomware 2024. 2024. Available online: https://assets.sophos.com/X24WTUEQ/at/9brgj5n44hqvgsp5f5bqcps/sophos-state-of-ransomware-2024-wp.pdf (accessed on 27 March 2025).
Mahendru, P. The State of Ransomware in Financial Services 2024. Available online: https://news.sophos.com/en-us/2024/06/24/the-state-of-ransomware-in-financial-services-2024/ (accessed on 27 August 2025).
Darem, A.A.; Alhashmi, A.A.; Alkhaldi, T.M.; Alashjaee, A.M.; Alanazi, S.M.; Ebad, S.A. Cyber Threats Classifications and Countermeasures in Banking and Financial Sector. IEEE Access 2023, 11, 125138–125158.
Akinbowale, O.E.; Klingelhöfer, H.E.; Zerihun, M.F. Analysis of cyber-crime effects on the banking sector using the balanced score card: A survey of literature. J. Financ. Crime 2020, 27, 945–958.
Albshaier, L.; Almarri, S.; Rahman, M.M.H. Earlier Decision on Detection of Ransomware Identification: A Comprehensive Systematic Literature Review. Information 2024, 15, 484.
Ferdous, J.; Islam, R.; Mahboubi, A.; Islam, Z. AI-Based Ransomware Detection: A Comprehensive Review. IEEE Access 2024, 12, 136666–136695.
Kim, S.; Yeom, S.; Oh, H.; Shin, D.; Shin, D. Automatic Malicious Code Classification System through Static Analysis Using Machine Learning. Symmetry 2021, 13, 35.
Alraizza, A.; Algarni, A. Ransomware Detection Using Machine Learning: A Survey. Big Data Cogn. Comput. 2023, 7, 143.
Por, L.Y.; Dai, Z.; Leem, S.J.; Chen, Y.; Yang, J.; Binbeshr, F.; Phan, K.Y.; Ku, C.S. A Systematic Literature Review on AI-Based Methods and Challenges in Detecting Zero-Day Attacks. IEEE Access 2024, 12, 144150–144163.
Soliman, K.; Sobh, M.; Bahaa-Eldin, A.M. Robust Malicious Executable Detection Using Host-Based Machine Learning Classifier. Comput. Mater. Contin. 2024, 79, 1419–1439.
Miraoui, M.; Ben Belgacem, M. Binary and multiclass malware classification of windows portable executable using classic machine learning and deep learning. Front. Comput. Sci. 2025, 7, 1539519.
Manavi, F.; Hamzeh, A. Static Detection of Ransomware Using LSTM Network and PE Header. In Proceedings of the 2021 26th International Computer Conference, Computer Society of Iran (CSICC), Tehran, Iran, 3–4 March 2021; pp. 1–5.
Manavi, F.; Hamzeh, A. A novel approach for ransomware detection based on PE header using graph embedding. J. Comput. Virol. Hacking Tech. 2022, 18, 285–296.
Manavi, F.; Hamzeh, A. Ransomware detection based on PE header using convolutional neural networks. ISC Int. J. Inf. Secur. 2022, 14, 181–192.
Moreira, C.C.; Moreira, D.C.; Sales, C., Jr. Improving ransomware detection based on portable executable header using xception convolutional neural network. Comput. Secur. 2023, 130, 103265.
Cen, M.; Jiang, F.; Qin, X.; Jiang, Q.; Doss, R. Ransomware early detection: A survey. Comput. Netw. 2024, 239, 110138.
Sgandurra, D.; Muñoz González, L.; Mohsen, R.; Lupu, E.C. Automated dynamic analysis of ransomware: Benefits, limitations and use for detection. arXiv 2016, arXiv:1609.03020.
Vehabovic, A.; Zanddizari, H.; Ghani, N.; Shaikh, F.; Bou-Harb, E.; Pour, M.S.; Crichigno, J. Data-centric machine learning approach for early ransomware detection and attribution. In Proceedings of the NOMS 2023-2023 IEEE/IFIP Network Operations and Management Symposium, Miami, FL, USA, 8–12 May 2023; pp. 1–6.
Deng, X.; Cen, M.; Jiang, M.; Lu, M. Ransomware early detection using deep reinforcement learning on portable executable header. Clust. Comput. 2024, 27, 1867–1881.
Cen, M.; Deng, X.; Jiang, F.; Doss, R. Zero-Ran Sniff: A zero-day ransomware early detection method based on zero-shot learning. Comput. Secur. 2024, 142, 103849.
Moreira, C.C.; Moreira, D.C.; Sales, C., Jr. A comprehensive analysis combining structural features for detection of new ransomware families. J. Inf. Secur. Appl. 2024, 81, 103716.
Yang, C.-C.; Hsu, J.-M.; Leu, J.-S.; Hsieh, W.-B. Ransomware detection with CNN and deep learning based on multiple features of portable executable files. J. Supercomput. 2025, 81, 680.
Shaukat, K.; Luo, S.; Varadharajan, V. A novel machine learning approach for detecting first-time-appeared malware. Eng. Appl. Artif. Intell. 2024, 131, 107801.
Zahoora, U.; Rajarajan, M.; Pan, Z.; Khan, A. Zero-day Ransomware Attack Detection using Deep Contractive Autoencoder and Voting based Ensemble Classifier. Appl. Intell. 2022, 52, 13941–13960.
Begovic, K.; Al-Ali, A.; Malluhi, Q. Cryptographic ransomware encryption detection: Survey. Comput. Secur. 2023, 132, 103349.
Husari, G.; Niu, X.; Chu, B.; Al-Shaer, E. Using Entropy and Mutual Information to Extract Threat Actions from Cyber Threat Intelligence. In Proceedings of the 2018 IEEE International Conference on Intelligence and Security Informatics (ISI), Miami, FL, USA, 8–10 November 2018; pp. 1–6.
Aityan, S.K. Confidence intervals. In Business Research Methodology: Research Process and Methods; Springer International Publishing: Cham, Switzerland, 2022; pp. 233–277. [Google Scholar] [CrossRef]
PE Format. Available online: https://learn.microsoft.com/en-us/windows/win32/debug/pe-format (accessed on 24 April 2025).
Maniriho, P.; Mahmood, A.N.; Chowdhury, M.J.M. A systematic literature review on Windows malware detection: Techniques, research issues, and future directions. J. Syst. Softw. 2024, 209, 111921. [Google Scholar] [CrossRef]
Vielberth, M.; Englbrecht, L.; Pernul, G. Improving data quality for human-as-a-security-sensor. A process driven quality improvement approach for user-provided incident information. Inf. Comput. Secur. 2021, 29, 332–349. [Google Scholar] [CrossRef]
Ayub, A.; Siraj, A.; Filar, B.; Gupta, M. Static-RWArmor: A Static Analysis Approach for Prevention of Cryptographic Windows Ransomware. In Proceedings of the 2023 IEEE 22nd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Exeter, UK, 1–3 November 2023; pp. 1673–1680. [Google Scholar] [CrossRef]
Anastasiu, D.C.; Karypis, G. Efficient identification of Tanimoto nearest neighbors. Int. J. Data Sci. Anal. 2017, 4, 153–172. [Google Scholar] [CrossRef]
Penrose, P.; Macfarlane, R.; Buchanan, W.J. Approaches to the classification of high entropy file fragments. Digit. Investig. 2013, 10, 372–384. [Google Scholar] [CrossRef]
Palisse, A.; Durand, A.; Le Bouder, H.; Le Guernic, C.; Lanet, J.-L. Data Aware Defense (DaD): Towards a Generic and Practical Ransomware Countermeasure. In NordSec2017, Proceedings of the 22nd Nordic Conference on Secure IT Systems, Tartu, Estonia, 8–10 November 2017; LNCS; Lipmaa, H., Mitrokotsa, A., Matulevičius, R., Eds.; Springer: Cham, Switzerland, 2017; Volume 10674, pp. 192–208. [Google Scholar] [CrossRef]
Davies, S.R.; Macfarlane, R.; Buchanan, W.J. Differential area analysis for ransomware attack detection within mixed file datasets. Comput. Secur. 2021, 108, 102377. [Google Scholar] [CrossRef]
Arakkal, A.; Pazheri Sharafudheen, S.; Vasudevan, A.R. Crypto-Ransomware Detection: A Honey-File Based Approach Using Chi-Square Test. In Information Systems Security. ICISS 2023; International Conference on Information Systems Security, LNCS 14424; Springer Nature: Cham, Switzerland, 2023; pp. 449–458. [Google Scholar] [CrossRef]
Davies, S.R.; Macfarlane, R.; Buchanan, W.J. Comparison of Entropy Calculation Methods for Ransomware Encrypted File Identification. Entropy 2022, 24, 1503. [Google Scholar] [CrossRef]
Pont, J.; Arief, B.; Hernandez-Castro, J.C. Why Current Statistical Approaches to Ransomware Detection Fail. In Information Security, Proceedings of the 23rd International Conference, ISC 2020, Bali, Indonesia, 16–18 December 2020; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2020; pp. 199–216. [Google Scholar] [CrossRef]
Yao, W.; Hu, L.; Hou, Y.; Li, X. A Lightweight Intelligent Network Intrusion Detection System Using One-Class Autoencoder and Ensemble Learning for IoT. Sensors 2023, 23, 4141. [Google Scholar] [CrossRef]
Ahmed, U.; Lin, J.C.-W.; Srivastava, G. Mitigating adversarial evasion attacks of ransomware using ensemble learning. Comput. Electr. Eng. 2022, 100, 107903. [Google Scholar] [CrossRef]
Mauri, L.; Damiani, E. Hardening behavioral classifiers against polymorphic malware: An ensemble approach based on minority report. Inf. Sci. 2025, 689, 121499. [Google Scholar] [CrossRef]
Li, Y.; Li, Z.; Li, M. A comprehensive survey on intrusion detection algorithms. Comput. Electr. Eng. 2025, 121, 109863. [Google Scholar] [CrossRef]
Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140. [Google Scholar] [CrossRef]
Zhou, Q.; Huang, C. A recommendation attack detection approach integrating CNN with Bagging. Comput. Secur. 2024, 146, 104030. [Google Scholar] [CrossRef]
Freund, Y.; Schapire, R.E. Game theory, on-line prediction and boosting. In Proceedings of the Ninth Annual Conference on Computational Learning Theory, Desenzano del Garda, Italy, 28 June–1 July 1996; ACM: New York, NY, USA, 1996; pp. 325–332.
Chauhan, N.R.; Dwivedi, R.K. Hybrid one-dimensional residual autoencoder and ensemble of gradient boosting for cloud IDS. Concurr. Comput. Pract. Exp. 2024, 36, e8088.
Wolpert, D.H. Stacked generalization. Neural Netw. 1992, 5, 241–259.
Maniriho, P.; Mahmood, A.N.; Chowdhury, M.J.M. MeMalDet: A memory analysis-based malware detection framework using deep autoencoders and stacked ensemble under temporal evaluations. Comput. Secur. 2024, 142, 103864.
Hancock, J.T.; Khoshgoftaar, T.M. CatBoost for big data: An interdisciplinary review. J. Big Data 2020, 7, 94.
Louk, M.H.L.; Tama, B.A. Tree-Based Classifier Ensembles for PE Malware Analysis: A Performance Revisit. Algorithms 2022, 15, 332.

Figure 1. The workflow of the proposed static analysis method.

Table 1. Distribution of different ransomware families.

Training Part of Dataset
ID	Family	Number of Samples	ID	Family	Number of Samples
1.	Avaddon	49	2.	Babuk	44
3.	BlackMatter	45	4.	Conti	48
5.	DarkSide	50	6.	Dharma	46
7.	DoppelPaymer	23	8.	Exorcist	19
9.	GandCrab	50	10.	LockBit	48
11.	Makop	35	12.	Maze	50
13.	MountLocker	14	14.	Nefilim	39
15.	NetWalker	50	16.	Phobos	50
17.	Pysa	38	18.	Ragnarok	43
19.	RansomeXX	13	20.	REvil	49
21.	Ryuk	48	22.	Stop	48
23.	Thanos	35	24.	WastedLocker	40
25.	Zeppelin	49
Testing Part of Dataset
1.	AvosLocker	50	2.	BianLian	11
3.	BlackBasta	32	4.	BlackByte	13
5.	BlackCat	50	6.	BlueSky	34
7.	Clop	46	8.	Hive	50
9.	HolyGhost	4	10.	Karma	13
11.	Lorenz	16	12.	Maui	3
13.	NightSky	14	14.	PlayCrypt	43
15.	Quantum	6

Table 2. The selected PE header features.

Section	Features
DOS Header	e_cblp, e_cp, e_crlc, e_cparhdr, e_minalloc, e_maxalloc, e_ss, e_sp, e_csum, e_ip, e_cs, e_lfarlc, e_ovno, e_oemid, e_oeminfo, e_lfanew
File Header	Machine, NumberOfSections, PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics
Optional Header	Magic, MajorLinkerVersion, MinorLinkerVersion, SizeOfCode, SizeOfInitializedData, SizeOfUninitializedData, AddressOfEntryPoint, BaseOfCode, BaseOfData, ImageBase, SectionAlignment, FileAlignment, MajorOperatingSystemVersion, MinorOperatingSystemVersion, MajorImageVersion, MinorImageVersion, MajorSubsystemVersion, MinorSubsystemVersion, SizeOfImage, SizeOfHeaders, CheckSum, Subsystem, DllCharacteristics, SizeOfStackReserve, SizeOfStackCommit, SizeOfHeapReserve, SizeOfHeapCommit
Directories ¹	EXPORT, IMPORT, RESOURCE, EXCEPTION, SECURITY, BASERELOC, DEBUG, COPYRIGHT, TLS, LOAD_CONFIG, BOUND_IMPORT, IAT, DELAY_IMPORT, COM_DESCRIPTOR

¹ Every directory contains two fields, VirtualAddress and Size.

Table 3. The selected DLLs.

Criterion	DLL Names
More popular in ransomware	rstrtmgr.dll, mpr.dll, netapi32.dll, iphlpapi.dll, ws2_32.dll, kernel32.dll
Used by benign only	api-ms-win-core-processthreads-l1-1-0.dll, api-ms-win-core-errorhandling-l1-1-0.dll, api-ms-win-core-sysinfo-l1-1-0.dll, api-ms-win-core-profile-l1-1-0.dll, api-ms-win-core-libraryloader-l1-2-0.dll, api-ms-win-core-synch-l1-2-0.dll, api-ms-win-core-com-l1-1-0.dll, api-ms-win-core-synch-l1-1-0.dll, api-ms-win-core-handle-l1-1-0.dll, api-ms-win-core-heap-l1-1-0.dll, api-ms-win-core-localization-l1-2-0.dll, api-ms-win-core-registry-l1-1-0.dll, api-ms-win-core-debug-l1-1-0.dll, api-ms-win-core-string-l1-1-0.dll, api-ms-win-core-rtlsupport-l1-1-0.dll, api-ms-win-core-heap-l2-1-0.dll, api-ms-win-core-file-l1-1-0.dll, api-ms-win-core-processenvironment-l1-1-0.dll, api-ms-win-eventing-provider-l1-1-0.dll, api-ms-win-core-processthreads-l1-1-1.dll, api-ms-win-security-base-l1-1-0.dll, api-ms-win-core-interlocked-l1-1-0.dll, api-ms-win-core-delayload-l1-1-0.dll, api-ms-win-core-delayload-l1-1-1.dll, api-ms-win-crt-string-l1-1-0.dll, api-ms-win-core-winrt-string-l1-1-0.dll
More popular in benign	version.dll, msvcrt.dll, comctl32.dll, comdlg32.dll, ntdll.dll, user32.dll, gdi32.dll, ole32.dll

Table 4. The selected sections.

Criterion	Section Names
More popular in ransomware	.rdata, .itext, .gfids, .tls
Used by ransomware only	.cdata, .ndata, .bss, /19, /32, /99, /112, /63, /46, /124, /80, .CRT, .symtab, /4, .text1, .v0rmpw, UPX2, .keys, .data1, .cfg
More popular in benign	.reloc, .rsrc, .data, .idata, .pdata, .text, UPX1, .didat

Table 5. The results of the experiment to choose the base classifiers.

Family	SVM + LR + SGD	SVM + LR + GB	SVM + LR + KNN	SVM + LR + RF
Avaddon	100	100	100	100
Babuk	100	100	100	100
BlackMatter	100	100	100	100
Conti	100	97.92	97.92	98.96
DarkSide	98.67	96	100	99.8
Dharma	100	100	100	100
DoppelPaymer	47.83	82.61	100	99.13
Exorcist	94.74	100	100	100
GandCrab	100	100	100	100
LockBit	100	100	100	100
Makop	65.71	60	60	67.43
Maze	100	100	100	100
MountLocker	100	100	100	100
Nefilim	100	92.31	97.44	97.44
NetWalker	100	100	100	99.8
Phobos	98	98	98	98
Pysa	100	100	100	100
Ragnarok	100	100	100	100
RansomeXX	100	100	100	96.15
REvil	100	100	100	100
Ryuk	100	100	81.25	100
Stop	100	100	100	100
Thanos	53.33	91.43	97.14	92.29
WastedLocker	100	100	100	100
Zeppelin	97.96	100	100	97.55
Average	94.25	96.73	97.27	97.86

If all combinations of classifiers received the same score, none is highlighted; otherwise, the best-performing one is shown in bold.

Table 6. The results of the experiment to compare cosine similarity and Tanimoto coefficient.

Family	Cosine Similarity: Recall (%)	Cosine Similarity: Confidence Interval (±)	Tanimoto Coefficient: Recall (%)	Tanimoto Coefficient: Confidence Interval (±)
Avaddon	100	0	100	0
Babuk	100	0	100	0
BlackMatter	100	0	100	0
Conti	98.75	0.32	98.75	0.32
DarkSide	100	0	100	0
Dharma	98.04	0.2	99.35	0.31
DoppelPaymer	97.83	0.94	96.96	1.29
Exorcist	100	0	100	0
GandCrab	99.8	0.19	100	0
LockBit	100	0	100	0
Makop	67.14	4.56	57.71	1.92
Maze	100	0	100	0
MountLocker	100	0	100	0
Nefilim	98.97	0.39	98.72	0.4
NetWalker	99.8	0.19	99.8	0.19
Phobos	99.2	0.31	98.2	0.19
Pysa	100	0	100	0
Ragnarok	100	0	100	0
RansomeXX	100	0	100	0
REvil	99.59	0.39	100	0
Ryuk	91.46	2.82	98.96	0.44
Stop	100	0	100	0
Thanos	96.29	1.03	98	0.82
WastedLocker	99.5	0.31	98.25	1.28
Zeppelin	94.9	1.56	95.92	1.36
Average	97.65	0.53	97.62	0.34

If the recall values obtained with both similarity measures are the same, neither is highlighted; otherwise, the better-performing one is shown in bold.
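
For reference, the two similarity measures compared in Table 6 can be written for binary import sets as shown below. This is a minimal sketch and not the authors' implementation: it assumes each executable is represented by the set of DLL names it imports, and the function names and the example DLL sets are hypothetical. How the per-sample similarities are aggregated into the DLL average feature is not covered here.

```python
import math


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient of two sets: |A ∩ B| / |A ∪ B|."""
    union = len(a | b)
    # Convention chosen for this sketch: two empty sets have similarity 0.
    return len(a & b) / union if union else 0.0


def cosine_binary(a: set, b: set) -> float:
    """Cosine similarity of the binary indicator vectors of two sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


# Hypothetical import lists, for illustration only.
sample = {"kernel32.dll", "mpr.dll", "ws2_32.dll"}
reference = {"kernel32.dll", "mpr.dll", "user32.dll", "gdi32.dll"}

print(round(tanimoto(sample, reference), 3))       # 2 shared / 5 distinct
print(round(cosine_binary(sample, reference), 3))  # 2 / sqrt(3 * 4)
```

On binary sets the Tanimoto coefficient is never larger than the cosine similarity, since |A ∪ B| ≥ max(|A|, |B|) ≥ sqrt(|A|·|B|); the two measures differ most when the sets share few elements or differ strongly in size.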

Table 7. The results of the experiment to compare Shannon entropy and Chi-square test.

Family	Shannon Entropy: Recall (%)	Shannon Entropy: Confidence Interval (±)	Chi-Square Test: Recall (%)	Chi-Square Test: Confidence Interval (±)
AvosLocker	99.1	0.19	100	0
BianLian	100	0	100	0
BlackBasta	95.31	0.43	94.06	0.39
BlackByte	71.15	1.6	87	5.01
BlackCat	100	0	100	0
BlueSky	100	0	100	0
Clop	99.02	0.2	95.65	0.43
Hive	100	0	88.5	0.93
HolyGhost	100	0	100	0
Karma	100	0	100	0
Lorenz	100	0	100	0
Maui	85	0	100	0
NightSky	75.36	9.87	100	0
PlayCrypt	100	0	100	0
Quantum	91	3.87	86.33	3.26
Average	93.82	1.01	96.27	0.67

If the recall values obtained with both randomness measures are the same, neither is highlighted; otherwise, the better-performing one is shown in bold.
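
The two randomness measures compared in Table 7 can be illustrated on raw bytes as follows. This is a sketch under stated assumptions rather than the authors' code: it treats a section as a plain byte string, tests it against a uniform distribution over the 256 byte values, and uses hypothetical function names. Any thresholds or the way the statistic is turned into a feature are not specified here.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0 = constant, 8 = uniform)."""
    n = len(data)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def chi_square(data: bytes) -> float:
    """Pearson chi-square statistic against a uniform byte distribution."""
    n = len(data)
    if n == 0:
        return 0.0
    expected = n / 256  # expected count of each byte value under uniformity
    counts = Counter(data)
    return sum((counts.get(v, 0) - expected) ** 2 / expected for v in range(256))


uniform = bytes(range(256)) * 4   # every byte value appears equally often
constant = bytes(1024)            # 1024 zero bytes

print(shannon_entropy(uniform), chi_square(uniform))
print(shannon_entropy(constant), chi_square(constant))
```

A chi-square value close to zero means the byte histogram is close to uniform, as expected for encrypted or well-compressed content, while entropy approaches its maximum of 8 bits per byte in the same situation.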

Table 8. The results of ransomware zero-day detection on the training part of the dataset (recall, %).

Family	PE Header	DLL Count	DLL Average	DLL List	Function Average	Section Entropy
Avaddon	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00
Babuk	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00
BlackMatter	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00
Conti	97.92 ± 0.00	97.29 ± 0.59	98.54 ± 0.59	97.5 ± 0.97	98.13 ± 0.7	98.12 ± 0.39
DarkSide	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00
Dharma	99.35 ± 0.62	99.35 ± 0.62	99.35 ± 0.62	99.78 ± 0.40	99.78 ± 0.40	99.57 ± 0.54
DoppelPaymer	98.70 ± 1.23	96.09 ± 2.81	92.61 ± 6.03	94.35 ± 2.43	98.26 ± 1.32	88.26 ± 4.52
Exorcist	100 ± 0.00	100 ± 0.00	100 ± 0.00	98.95 ± 1.30	100 ± 0.00	100 ± 0.00
GandCrab	99.80 ± 0.37	98.60 ± 0.57	100 ± 0.00	98.8 ± 0.61	99.40 ± 0.57	98.60 ± 0.57
LockBit	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00
Makop	66.29 ± 7.46	64.57 ± 6.35	62 ± 2.97	60.29 ± 0.53	68 ± 5.75	60.86 ± 0.81
Maze	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00
MountLocker	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00
Nefilim	99.23 ± 0.73	97.95 ± 0.64	98.97 ± 0.78	97.44 ± 0.00	96.67 ± 1.43	95.90 ± 1.46
NetWalker	99.80 ± 0.37	99.80 ± 0.37	99.80 ± 0.37	99.80 ± 0.37	99.80 ± 0.37	100 ± 0.00
Phobos	98.60 ± 0.57	98.60 ± 0.57	99.20 ± 0.61	98 ± 0.00	98 ± 0.00	98 ± 0.00
Pysa	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00
Ragnarok	99.77 ± 0.43	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00
RansomeXX	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00
REvil	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00
Ryuk	92.92 ± 5.42	93.13 ± 5.07	95.83 ± 4.87	99.79 ± 0.39	99.38 ± 0.83	99.58 ± 0.77
Stop	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00
Thanos	94.57 ± 2.56	96.57 ± 2.35	90.29 ± 12.85	95.71 ± 1.81	97.43 ± 1.47	98 ± 1.78
WastedLocker	100 ± 0.00	100 ± 0.00	97.75 ± 4.18	100 ± 0.00	99.75 ± 0.46	100 ± 0.00
Zeppelin	90.82 ± 2.9	95.92 ± 2.53	94.49 ± 2.77	97.14 ± 2.48	98.57 ± 2.27	92.86 ± 2.95
Average	97.51 ± 0.91	97.51 ± 0.90	97.15 ± 1.47	97.50 ± 0.45	98.13 ± 0.62	97.19 ± 0.55

If the obtained values are the same for all feature sets, none is highlighted; otherwise, the best-performing one is shown in bold.

Table 9. The results of ransomware zero-day detection on the testing part of the dataset (recall, %).

Family	PE Header	DLL Count	DLL Average	DLL List	Function Average	Section Entropy
AvosLocker	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00
BianLian	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00
BlackBasta	98.44 ± 1.30	95.63 ± 5.49	90.63 ± 5.20	99.38 ± 0.77	96.88 ± 3.97	98.44 ± 1.98
BlackByte	84.62 ± 7.39	87.69 ± 8.31	80.77 ± 8.86	86.15 ± 9.00	70 ± 1.43	74.62 ± 4.29
BlackCat	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00
BlueSky	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00
Clop	96.96 ± 0.89	95.22 ± 2.16	93.04 ± 2.47	95.43 ± 2.29	93.26 ± 2.86	97.83 ± 0.00
Hive	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00
HolyGhost	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00
Karma	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00
Lorenz	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00
Maui	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00
NightSky	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00
PlayCrypt	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00
Quantum	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00	100 ± 0.00
Average	98.67 ± 0.64	98.57 ± 1.06	97.63 ± 1.10	98.73 ± 0.80	97.34 ± 0.55	98.06 ± 0.42

If the obtained values are the same for all feature sets, none is highlighted; otherwise, the best-performing one is shown in bold.

Table 10. Comparison of the results of the training part of the dataset.

Family	Accuracy (%) [17]	Accuracy (%) [23]	Accuracy (%) Ours	F-Score (%) [17]	F-Score (%) [23]	F-Score (%) Ours
Avaddon	98.98 ± 0.26	99.49 ± 0.14	100 ± 0.00	99.01 ± 0.25	99.50 ± 0.13	100 ± 0.00
Babuk	99.55 ± 0.16	99.54 ± 0.16	99.89 ± 0.21	99.55 ± 0.15	99.55 ± 0.15	99.89 ± 0.21
BlackMatter	99.00 ± 0.16	99.33 ± 0.18	100 ± 0.00	99.02 ± 0.16	99.35 ± 0.18	100 ± 0.00
Conti	95.73 ± 0.24	95.73 ± 0.15	98.85 ± 0.35	95.60 ± 0.25	95.57 ± 0.15	98.85 ± 0.35
DarkSide	98.50 ± 0.13	99.80 ± 0.08	100 ± 0.00	98.51 ± 0.13	99.80 ± 0.08	100 ± 0.00
Dharma	97.61 ± 0.31	99.57 ± 0.15	99.78 ± 0.27	97.64 ± 0.30	99.57 ± 0.14	99.78 ± 0.27
DoppelPaymer	96.52 ± 0.98	98.91 ± 0.31	94.13 ± 0.26	96.31 ± 1.10	98.94 ± 0.31	93.61 ± 2.53
Exorcist	98.42 ± 0.77	98.68 ± 0.56	100 ± 0.00	98.39 ± 0.79	98.73 ± 0.54	100 ± 0.00
GandCrab	97.40 ± 0.22	99.20 ± 0.15	99.30 ± 0.28	97.39 ± 0.22	99.21 ± 0.14	99.29 ± 0.29
LockBit	99.06 ± 0.15	99.69 ± 0.13	100 ± 0.00	99.08 ± 0.14	99.69 ± 0.13	100 ± 0.00
Makop	72.86 ± 1.08	95.14 ± 0.48	80.29 ± 0.35	63.72 ± 1.75	94.95 ± 0.51	75.52 ± 0.54
Maze	96.10 ± 0.25	99.50 ± 0.18	99.90 ± 0.19	96.07 ± 0.26	99.51 ± 0.18	99.90 ± 0.18
MountLocker	73.67 ± 2.64	99.67 ± 0.36	99.64 ± 0.66	59.43 ± 5.57	99.66 ± 0.37	99.66 ± 0.64
Nefilim	91.67 ± 0.94	98.21 ± 0.19	97.95 ± 0.73	90.83 ± 1.09	98.20 ± 0.18	97.89 ± 0.77
NetWalker	98.60 ± 0.20	99.30 ± 0.20	100 ± 0.00	98.62 ± 0.20	99.32 ± 0.19	100 ± 0.00
Phobos	97.70 ± 0.22	98.30 ± 0.18	98.80 ± 0.25	97.72 ± 0.21	98.30 ± 0.17	98.79 ± 0.25
Pysa	98.82 ± 0.16	99.61 ± 0.14	100 ± 0.00	98.83 ± 0.16	99.61 ± 0.13	100 ± 0.00
Ragnarok	92.79 ± 0.67	99.19 ± 0.16	100 ± 0.00	92.33 ± 0.74	99.20 ± 0.15	100 ± 0.00
RansomeXX	97.33 ± 0.89	100 ± 0.00	100 ± 0.00	96.86 ± 1.04	100 ± 0.00	100 ± 0.00
REvil	96.84 ± 0.32	99.39 ± 0.10	100 ± 0.00	96.78 ± 0.34	99.39 ± 0.10	100 ± 0.00
Ryuk	88.02 ± 0.28	99.48 ± 0.17	99.69 ± 0.41	86.62 ± 0.34	99.49 ± 0.16	99.68 ± 0.42
Stop	93.65 ± 0.27	98.44 ± 0.19	99.90 ± 0.19	93.40 ± 0.29	98.44 ± 0.19	99.90 ± 0.19
Thanos	61.86 ± 1.77	90.00 ± 0.70	99 ± 0.89	37.82 ± 4.23	88.99 ± 0.82	98.97 ± 0.92
WastedLocker	97.13 ± 0.51	99.50 ± 0.18	99.88 ± 0.23	97.11 ± 0.52	99.51 ± 0.18	99.88 ± 0.23
Zeppelin	87.65 ± 1.04	94.39 ± 0.10	96.33 ± 1.53	85.94 ± 1.48	94.12 ± 0.10	96.13 ± 1.62
Average	93.84 ± 0.47	98.41 ± 0.20	98.53 ± 0.35	92.27 ± 0.66	98.36 ± 0.20	98.31 ± 0.38

The best performing results are highlighted separately according to the accuracy metrics and F-score metrics.

Table 11. Comparison of the results of the testing part of the dataset.

Family	Accuracy (%) [23]	Accuracy (%) Ours	F-Score (%) [23]	F-Score (%) Ours	Recall (%) [23]	Recall (%) Ours
AvosLocker	98.78 ± 0.18	96.60 ± 0.50	98.80 ± 0.17	96.72 ± 0.47	100.0 ± 0.00	100 ± 0.00
BianLian	98.33 ± 0.78	96.00 ± 1.24	97.86 ± 0.99	94.89 ± 1.53	100.0 ± 0.00	100 ± 0.00
BlackBasta	98.53 ± 0.34	96.56 ± 1.36	98.57 ± 0.33	96.61 ± 1.37	100.0 ± 0.00	98.44 ± 1.98
BlackByte	72.00 ± 0.81	84.00 ± 2.41	54.50 ± 0.85	80.05 ± 3.20	38.62 ± 0.55	74.62 ± 4.29
BlackCat	98.86 ± 0.18	96.40 ± 0.05	98.88 ± 0.18	96.53 ± 0.46	100.0 ± 0.00	100 ± 0.00
BlueSky	98.82 ± 0.28	98.24 ± 0.55	98.85 ± 0.27	98.27 ± 0.53	100.0 ± 0.00	100 ± 0.00
Clop	97.33 ± 0.26	95.54 ± 0.64	97.32 ± 0.27	95.65 ± 0.59	97.02 ± 0.36	97.83 ± 0.00
Hive	98.79 ± 0.18	96.60 ± 0.41	98.81 ± 0.18	96.72 ± 0.39	100.0 ± 0.00	100 ± 0.00
HolyGhost	97.87 ± 0.88	94.33 ± 1.86	93.23 ± 2.68	83.21 ± 4.96	100.0 ± 0.00	100 ± 0.00
Karma	98.80 ± 0.64	97.00 ± 1.45	98.67 ± 0.71	96.72 ± 1.56	100.0 ± 0.00	100 ± 0.00
Lorenz	98.81 ± 0.57	95.63 ± 1.28	98.85 ± 0.55	95.85 ± 1.20	100.0 ± 0.00	100 ± 0.00
Maui	97.60 ± 0.94	94.33 ± 1.61	90.44 ± 3.54	78.81 ± 5.47	100.0 ± 0.00	100 ± 0.00
NightSky	98.67 ± 0.75	95.00 ± 1.67	98.64 ± 0.76	94.99 ± 1.61	100.0 ± 0.00	100 ± 0.00
PlayCrypt	98.55 ± 0.22	95.35 ± 0.79	98.58 ± 0.22	95.57 ± 0.72	100.0 ± 0.00	100 ± 0.00
Quantum	98.00 ± 0.81	95.67 ± 1.32	95.50 ± 1.77	90.44 ± 2.76	100.0 ± 0.00	100 ± 0.00
Average	97.53 ± 0.38	95.15 ± 1.17	96.41 ± 0.58	92.74 ± 1.79	97.52 ± 0.06	98.06 ± 0.42

If the values of an individual metric are the same for both methods, neither is highlighted; otherwise, the better-performing method is highlighted for each metric separately.
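
Several families in Table 11 reach 100% recall while the F-score is noticeably lower (for example, Maui and HolyGhost, which have very few samples). The standard metric definitions show how this can happen: recall depends only on missed ransomware samples, whereas precision, and therefore the F-score, also falls with every benign file flagged as ransomware. The sketch below assumes the usual F1 definition; the function name and the confusion-matrix counts are hypothetical and are not taken from the paper's experiments.

```python
def detection_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Accuracy, precision, recall and F-score from confusion-matrix counts."""
    total = tp + fn + fp + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


# Hypothetical counts: all 3 ransomware samples are detected,
# but 2 of 30 benign files are also flagged.
m = detection_metrics(tp=3, fn=0, fp=2, tn=28)
print({k: round(v, 4) for k, v in m.items()})
```

With so few positive samples, two false alarms are enough to pull precision down to 0.6 and the F-score to 0.75 even though no ransomware sample is missed.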

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Venčkauskas, A.; Jusas, V.; Barisas, D. Zero-Day Ransomware Attack Detection Using Static Portable Executable Header Features. Appl. Sci. 2025, 15, 10576. https://doi.org/10.3390/app151910576
