POS: A Recognition Method for Packed Software in Opened-Set Scenario

Qian, Zhenghao; Liu, Fengzheng; He, Mingdong; Li, Bo; Li, Xuewu; Zhao, Chuangye; Fu, Gehua; Hu, Yifan; Liu, Hao

doi:10.3390/electronics14224450


by Zhenghao Qian ¹,*, Fengzheng Liu ¹,*, Mingdong He ¹, Bo Li ¹, Xuewu Li ¹, Chuangye Zhao ¹, Gehua Fu ¹, Yifan Hu ¹ and Hao Liu ²,*

¹ Information Center, Guangdong Power Grid Co., Ltd., Guangzhou 510050, China
² Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510555, China
* Authors to whom correspondence should be addressed.

Electronics 2025, 14(22), 4450; https://doi.org/10.3390/electronics14224450

Submission received: 23 October 2025 / Revised: 7 November 2025 / Accepted: 11 November 2025 / Published: 14 November 2025

(This article belongs to the Special Issue Recent Advances in Cybersecurity and Information Security)


Abstract

Malware plays a critical role in network attacks, making its analysis essential for ensuring network security. To evade detection, malware developers often use packing techniques to hide malicious code, making it difficult for analysts to identify the software’s true behavior. Software that has been packed is referred to as “packed software,” and network security analysts need to employ unpacking strategies to remove these protective layers and restore the software’s actual behavior. This process is crucial in preventing malware from bypassing traditional security mechanisms, as unpacking reveals the underlying code that can be analyzed for malicious intent. However, as malware evolves, packed software can vary greatly in its packing techniques, requiring analysts to stay ahead of emerging trends in obfuscation methods. Furthermore, new packing methods are frequently introduced, posing an ongoing challenge to existing detection systems. Existing packed software identification methods largely rely on known training sets, which can identify known types of packed software but struggle with the opened-set problem, where new or unknown packed software types are encountered. To address this issue, this paper introduces the problem of identifying packed software in both closed-set and opened-set scenarios and proposes an evaluation mechanism using known/unknown recall rates to assess the ability to recognize both types. The known recall rate evaluates the system’s ability to identify known types, while the unknown recall rate measures its ability to recognize new, unknown packed software. This dual approach helps bridge the gap between identifying familiar threats and detecting previously unseen ones, which is increasingly important as malware continues to evolve. Additionally, the paper proposes a strategy that simultaneously addresses both recognition problems, aiming to improve the overall performance of the identification system. 
Experimental results on a packed software dataset demonstrate that this strategy significantly improves the accuracy and comprehensiveness of identification, validating its effectiveness in practical applications.

Keywords: cyberspace security; packed software recognition; malware analysis; static analysis; opened-set recognition

1. Introduction

Malware [1,2,3,4,5,6,7,8,9,10] plays an important role in network attacks and has become one of the main threats to network security today. With the popularization of the Internet and the rapid development of information technology, network attacks have become increasingly diversified and stealthy [11,12,13,14], and malware itself has grown more concealed and diverse [15,16,17,18]. In recent years, malware developers have adopted an increasing number of adversarial techniques, such as code obfuscation and process hiding, to resist conventional analysis methods. Among them, packing technology has been widely used as a protective measure [19,20,21,22,23,24,25]. Packing increases the difficulty of malware analysis by encapsulating malicious code in encrypted or compressed data. Software that has been packed is called packed software, and it is crucial to identify and unpack it effectively. The protection provided by packing has therefore become a major challenge for network security analysts, and it is particularly important to develop effective methods for identifying packed software.

Although existing packed software recognition methods have achieved good results in some cases, most of them rely on existing training data to construct recognition models, using machine learning methods such as support vector machine (SVM) [26] and random forest (RF) [27] or deep network methods such as convolutional neural network (CNN) [28], long short-term memory network (LSTM), and VGG16 [29], and they focus on identifying known (seen) types of malicious software. Such strategies often overlook the unknown (unseen) types of malware that may be encountered in practical applications, and therefore cannot address the opened-set problem. The opened-set problem refers to the inability of a system to effectively identify unknown malware outside the training set, which creates security vulnerabilities. Therefore, researching how to effectively identify packed software in an opened-set scenario has become a key issue in improving the efficiency and accuracy of malware analysis.

To address this issue, this article clarifies the closed-set and opened-set recognition problems of packed software and designs a new evaluation metric, the known/unknown (seen/unseen) recall rate, to measure recognition ability on the two types. This metric considers not only the efficient identification of seen types but also the ability to identify unseen types, which traditional methods neglect in opened-set problems. This article also proposes a new multi-model recognition method that can simultaneously meet the recognition requirements of closed-set and opened-set packed software. Experimental verification on a packed software dataset shows that the proposed method has significant advantages in recognition accuracy and adaptability.

The contributions of this article are as follows.

1. By sorting and analyzing previous methods for identifying packed software, and considering the current context of malware detection, an overlooked problem in the field of packed software identification, namely opened-set identification, was discovered. The closed-set and opened-set identification problems of packed software were compared, analyzed, and clearly defined.
2. Through detailed discussion of the opened-set problem, three solutions for opened-set identification of packed software were identified, and a comprehensive analysis showed that the multi-model recognition scheme is suitable for meeting both closed-set and opened-set recognition requirements. A method for identifying unseen types of packed software based on the multi-class model scheme is proposed.
3. In order to effectively measure the ability of packed software recognition methods on the closed-set and opened-set recognition problems, an evaluation metric called seen/unseen recall rate was designed. More macroscopic metrics were further derived from these two metrics.
4. Through algorithm comparison experiments, parameter adjustment experiments, and scheme comparison experiments, the effectiveness of the proposed method was verified from three aspects.

The remainder of this article is organized as follows. Section 2 reviews related work. Section 3 presents the motivation and architecture of the proposed method. Section 4 presents and analyzes the experimental results. Section 5 discusses the limitations of the method. Section 6 summarizes the work and looks forward to future research directions.

Table 1 provides a list of abbreviations used throughout this paper.

Table 2. Summary of related work.

Method	Raw Data	Feature	Algorithm	Problem
Kim et al. [19]	Byte sequences	The first 15 bytes	SVM	PRC
Jung et al. [21]	Byte sequences	Byte entropy	GBoost	PRC
Li et al. [20]	CEGs	GNN	DNN	PRC
2-SPIFF [22]	File attribute+FCGs	File+FCG	SVM+KNN	PRC+PRO
Mondon et al. [24]	Assembly codes+CFGs	Code+CFG	SMOTE SVM	PRC
PackHero [25]	CGs	Node+Signatures	GMN+Cluster	PRC

2. Related Work

Packing is widely used by malware for anti-analysis, which seriously affects traditional malware detection methods. Researchers have therefore proposed various techniques to improve the efficiency and accuracy of packed software recognition. Existing work can be divided into two main categories: recognition methods based on feature analysis and recognition methods based on graph analysis. The following reviews these two types of methods and then summarizes and analyzes them.

2.1. Recognition Method Based on Feature Analysis

The recognition method based on feature analysis mainly extracts the byte sequence or other file features of malware and uses classification algorithms for recognition.

In 2019, Kim et al. [19] proposed a byte-based recognition method for packed software, using the first 15 bytes of the packed software as training data and building an effective packed software recognition model with SVM.

In 2020, Jung et al. [21] proposed a packed software recognition method based on byte sequences, which analyzes the byte sequences of malicious software, extracts their features, and classifies them. Unlike traditional signature detection methods, this method can recognize different packing tools, and experimental results show an accuracy of 91.6%. This method is particularly suitable for analyzing malware protected by different packing tools.

2.2. Recognition Method Based on Graph Analysis

The identification method based on graph analysis mainly analyzes and identifies malware by constructing graph structures such as control flow graphs (CFGs) or function call graphs (FCGs).

In 2019, Li et al. [20] proposed a method based on consistent executing graphs (CEGs) to identify the packing technique used by packed malware. This method maximizes semantic preservation and uses graph matching algorithms and graph kernel techniques for efficient recognition. The experimental results show that this method has good performance for packed software with complex graph structures, but it requires a significant amount of computational resources.

In 2021, Liu et al. [22] proposed 2-SPIFF, a two-stage packed software recognition method that combines FCGs and file attribute features. The first stage separates packed from non-packed files, and the second stage further distinguishes the packing tool or software. In the experiments, this method achieved a detection accuracy of 99.80% and a recognition accuracy of 98.49% for packing technology.

In 2024, Mondon and Lemos [24] proposed a string obfuscation detection method based on CFGs and string encryption analysis. They designed an efficient detector that can accurately identify string obfuscation in malware by combining assembly code features, control flow graphs, and directed graphs. The experimental results show that the method achieved an accuracy of over 90% in all evaluation metrics, making it particularly suitable for analyzing malware with string obfuscation.

In 2025, Di Gennaro et al. [25] proposed PackHero, a graph-based, scalable packed software recognition method. PackHero applies graph matching networks (GMNs) and clustering algorithms to call graphs (CGs) in order to identify programs protected by different packing tools. The experimental results show that PackHero achieves a macro-average F1 score of 93.7% with only 10 samples, improving to 98.3% with 100 samples. PackHero is particularly adept at recognizing virtualization packing technology, outperforming existing signature detection tools.

2.3. Summary of Related Work

The analysis above shows two main trends in existing methods for identifying packing technology: one analyzes byte sequences or file features, and the other builds a graph structure of the malware for identification. The former can identify packing technology types efficiently and accurately but copes poorly with complex packing techniques. The latter, by modeling the complex behavior of malicious software, can identify more types of packing techniques and demonstrates stronger adaptability, especially when facing new packing methods.

However, most existing research has focused on recognizing packing technology under closed-set conditions, and recognition in opened-set scenarios has not been fully explored. Only the 2-SPIFF method proposed by Liu et al. [22] has explored unseen packed software to some extent, and it does not solve the opened-set problem in depth. Future research therefore needs to pay more attention to improving the adaptability of models to unseen packing techniques, especially in opened-set environments. How to effectively identify newly emerging packing techniques remains an urgent open problem.

3. Motivation and Method

3.1. Problem Definition

In previous work on packed software detection, packed software recognition was defined as a supervised learning problem. This article extends that definition.

Packing technology, as a software protection technique, is not static but is constantly updated and changed as needed. The same is true of malware. Models built by previous packed software detection methods, which only target the labels in the training dataset, are therefore not suitable for real detection environments. Definitions 1 and 2 are introduced so that the method architecture takes the real detection situation into account.

Definition 1.

Packed software recognition on closed-set (PRC) problem. A packed software recognition problem in which the set of labels being tested is a subset of the labels in the training set. It belongs to the class of supervised learning problems. The labels in the training set are called seen labels.

Definition 2.

Packed software recognition on opened-set (PRO) problem. A packed software recognition problem in which the set of labels being tested is not a subset of the labels in the training set. It belongs to the class of unsupervised learning problems. The labels that are not in the training set are called unseen (unknown) labels.
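The two definitions reduce to a subset test on label sets. The following is a minimal sketch of that test; the function name `problem_type` and the example label lists are illustrative assumptions, not part of the original method.

```python
def problem_type(train_labels, test_labels):
    """Classify a recognition task as PRC or PRO from its label sets.

    Returns a pair (problem, unseen_labels), where unseen_labels is the
    set of test labels that never occur in the training set.
    """
    seen = set(train_labels)
    unseen = set(test_labels) - seen
    # Definition 1: test labels are a subset of training labels -> PRC.
    # Definition 2: at least one test label is not in training -> PRO.
    return ("PRC" if not unseen else "PRO", unseen)


# Hypothetical label sets, for illustration only.
print(problem_type(["Themida", "Enigma"], ["Enigma"]))
print(problem_type(["Themida", "Enigma"], ["Enigma", "eXPressor"]))
```

In the second call, `eXPressor` is reported as an unseen label, so the task falls under Definition 2.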

3.2. Solution Analysis

The previous packed software detection problems all belong to the PRC problem of Definition 1. The PRC problem assumes that the feature vector space is partitioned according to certain rules and that the packed software within one partition belongs to a specified label. Figure 1 shows a schematic diagram of such a partition, where the serial number represents the class number and each region shares a boundary with its neighbors without gaps. This is because, in general supervised learning problems, the label with the highest confidence is used as the predicted label, so each sample in the feature vector space lies in exactly one region or on the boundary of multiple regions (for example, when two labels have equal confidence). Below are three solutions to the opened-set recognition problem for packed software. Note that Figures 2, 3, and 4 show how each of the three schemes transforms the vector space of Figure 1; there is no direct causal relationship among them. Class 6 is not a special category but a relatively dense cluster in the high-dimensional space learned by the model, making it more prone to being enclosed by other classes.

3.2.1. Threshold Setting Solution

In order to detect unseen packed software, 2-SPIFF [22] first proposed a threshold-based unseen label recognition scheme, which adds gaps between the categories of Figure 1. When the highest confidence that the model assigns to a sample is below the threshold, the sample falls into a classification gap and is predicted as unseen packed software. Otherwise, it is judged to be seen packed software of the class with the highest confidence. As shown in Figure 2, the light-colored areas represent unseen packed software, while the dark-colored areas correspond to the multiple classes of seen packed software. The 2-SPIFF solution opened up ideas for solving the opened-set recognition problem for packed software. Two other solutions are introduced below.
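The thresholding rule can be sketched in a few lines. This is an illustrative sketch rather than the 2-SPIFF implementation: the function name, the confidence dictionary, and the threshold value 0.6 are all assumptions.

```python
UNSEEN = "unseen"


def predict_with_threshold(confidences, threshold):
    """Threshold-based open-set decision.

    confidences: dict mapping each seen label to the classifier's
    confidence for one sample (hypothetical values in [0, 1]).
    """
    best_label = max(confidences, key=confidences.get)
    # A top confidence below the threshold places the sample in a gap
    # between class regions, so it is reported as unseen.
    if confidences[best_label] < threshold:
        return UNSEEN
    return best_label


# Illustrative confidences and threshold only.
print(predict_with_threshold({"A": 0.90, "B": 0.10}, threshold=0.6))
print(predict_with_threshold({"A": 0.40, "B": 0.35, "C": 0.25}, threshold=0.6))
```

The first sample is assigned to class A, and the second falls below the threshold and is reported as unseen.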

3.2.2. Single Model Solution

The advantage of the threshold setting solution is that it can maintain high recognition accuracy for seen samples while providing recognition capability for unseen packed software. It is not an independent scheme designed for packed software of unseen classes, but an extension of the closed-set recognition scheme for packed software. The single model scheme considers all the packed software in the training set as a class of packed software, and uses the single class recognition model to realize the recognition of unseen packed software. As shown in Figure 3, the single model scheme does not rely on the assumption of multiple classification spaces, but only classifies the feature space into seen class space and unseen class space based on seen packed software.

3.2.3. Multi-Class Model Solution

Because the classes differ in distribution and density, the single model scheme judges some edge samples of seen classes as unseen. The multi-class model scheme extends the single model scheme by applying it to each class independently. On the one hand, more packed software of seen classes is retained in the area covered by the models; on the other hand, compared with the single model scheme, the regions between classes, where unseen packed software lies, are excluded from the seen space. As shown in Figure 4, the multi-class model scheme increases the number of models in order to divide seen and unseen classes more accurately. Different colors represent the different training data sources of the models. The subsequent method framework therefore adopts the multi-class model solution.
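The difference between the single model and multi-class model schemes can be seen on toy data. In the sketch below, a deliberately simple centroid-and-radius model stands in for a real one-class learner; `RadiusModel`, the `te` parameter, the nearest-centroid tie-break, and the two synthetic 2-D classes are assumptions made for illustration only.

```python
import math

UNSEEN = "unseen"


class RadiusModel:
    """Toy one-class model: accept points within a radius of the centroid.

    Stand-in for a real one-class learner. `te` is the fraction of the
    farthest training samples left outside the accepted region.
    """

    def __init__(self, te=0.1):
        self.te = te

    def fit(self, points):
        dim = len(points[0])
        self.center = [sum(p[d] for p in points) / len(points) for d in range(dim)]
        dists = sorted(math.dist(p, self.center) for p in points)
        keep = max(1, len(dists) - int(len(dists) * self.te))
        self.radius = dists[keep - 1]
        return self

    def distance(self, point):
        return math.dist(point, self.center)

    def accepts(self, point):
        return self.distance(point) <= self.radius


def multi_model_predict(models, point):
    """models: dict label -> fitted RadiusModel, one per seen class."""
    accepted = {lab: m.distance(point) for lab, m in models.items() if m.accepts(point)}
    if not accepted:
        return UNSEEN  # no per-class model claims the sample
    return min(accepted, key=accepted.get)  # assumption: nearest centroid wins


def grid(cx, cy):
    """25 synthetic points on a 5x5 grid centered at (cx, cy)."""
    return [(cx + 0.5 * i, cy + 0.5 * j) for i in range(-2, 3) for j in range(-2, 3)]


class_a, class_b = grid(0.0, 0.0), grid(10.0, 0.0)
single = RadiusModel().fit(class_a + class_b)  # single model scheme
models = {"A": RadiusModel().fit(class_a), "B": RadiusModel().fit(class_b)}

gap_point = (5.0, 0.0)  # lies between the two seen classes
print(single.accepts(gap_point))               # single model treats the gap as seen
print(multi_model_predict(models, gap_point))  # per-class models reject it
```

The point between the two classes is covered by the single model but rejected by every per-class model, which is the behavior the multi-class model scheme relies on.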

3.3. Proposed Method

In order to effectively solve the opened-set recognition problem for packed software, features with good classification performance and the multi-class model solution were adopted, as shown in Figure 5. The proposed method for recognizing packed software in the opened-set scenario is called POS. The POS framework is divided into four main steps:

1. File preprocessing. In the initial stage, the software to be analyzed is disassembled. Disassembly analysis tools such as IDA Pro expose the internal structure of the software and make it possible to extract the key disassembly code. From the function calls in this code, a directed graph is constructed that describes the call relationships between the software's functions. In this graph, each function name is a vertex, and each call is a directed edge from the caller function to the called function. The graph helps identify the basic architecture of the software and reveals the interactions between functions during program execution. This information is crucial for the subsequent feature extraction and pattern recognition, and provides valuable clues for analyzing software behavior and potential malicious code.
2. Feature extraction. The goal of this stage is to extract, from the disassembly code, vectors that effectively represent the characteristics of the software. Although the framework itself does not strictly restrict the feature classes, the same feature extraction method is used throughout to ensure consistency and comparability between detection methods. The extracted features fall into two classes: segment features and function call graph features. Segment features describe the distribution and structure of the different memory segments (such as code areas and data areas) in the software, which reflect the overall organization and execution process of the program. Function call graph features describe the relationships between functions in the function call graph, and reveal the behavior pattern of the program through the frequency, hierarchical structure, and call paths of function calls. Both classes of features are converted to numerical form and used as inputs to the machine learning models. Table 3 shows all the extracted features.
3. Multi-model training. A multi-class model training solution is adopted, in which each class of packed software is modeled by an independent single class recognition model. These single class models are trained separately, one for each seen class of packed software, so that each model can achieve its best performance on its own class. To train these models, the one class support vector machine (One Class SVM) algorithm was used, which can recognize a specific class of features and has strong discriminative ability for abnormal data (i.e., unseen samples). During training, a large number of labeled seen samples are input so that each model learns the typical features of its software class in the feature space. After training, each model can identify the corresponding class of packed software, providing the foundation for the final recognition.
4. Packed software recognition. In the recognition stage, the previously trained per-class models judge the software to be detected. The features of the software are input into each model, and each model decides whether the software belongs to its class. If at least one model determines that the software belongs to a seen class of packed software, the software is classified as that class. If no model recognizes it as a seen class, the software is judged to be an unseen sample. To ensure accuracy, the recognition process follows an ensemble learning approach: the judgments of the multiple single class models are considered together, which improves the robustness and accuracy of the final classification result. In this way, judgments can be made quickly across different classes of packed software, and seen and unseen classes can be effectively distinguished.
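Steps 1 and 2 above can be illustrated with a small function call graph. The sketch below builds a directed graph from (caller, callee) pairs and derives a few numeric features. The function names in `calls` and the four features chosen here are hypothetical; they are not the feature set of Table 3, and a real pipeline would obtain the call pairs from a disassembler.

```python
from collections import defaultdict


def build_fcg(calls):
    """Build a directed function call graph from (caller, callee) pairs.

    Vertices are function names; each edge points from the caller to
    the called function.
    """
    graph = defaultdict(set)
    for caller, callee in calls:
        graph[caller].add(callee)
        graph[callee]  # touching the key registers the callee as a vertex
    return dict(graph)


def fcg_features(graph):
    """Return [vertex count, edge count, max out-degree, max in-degree]."""
    in_degree = defaultdict(int)
    for callees in graph.values():
        for callee in callees:
            in_degree[callee] += 1
    n_edges = sum(len(callees) for callees in graph.values())
    max_out = max((len(callees) for callees in graph.values()), default=0)
    max_in = max(in_degree.values(), default=0)
    return [len(graph), n_edges, max_out, max_in]


# Hypothetical call pairs, for illustration only.
calls = [
    ("start", "unpack"),
    ("start", "decrypt"),
    ("unpack", "decrypt"),
    ("decrypt", "jump_oep"),
]
fcg = build_fcg(calls)
print(fcg_features(fcg))
```

A vector of this kind, concatenated with segment features, is the sort of numeric input that the per-class models in step 3 would be trained on.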

4. Experiments and Discussions

This section describes and discusses the experiments on POS. Section 4.1 introduces the hardware environment, experimental dataset, and experimental metrics used for the validation experiments. The following subsections present experiments and analysis of the overall method, the algorithms used, and the adjusted parameters. Finally, POS is compared with baseline methods, verifying from multiple aspects that the proposed method improves the ability to detect packed software under opened-set conditions.

4.1. Experimental Configuration

4.1.1. Experimental Equipment

The experiments were conducted on a laptop running Windows 11 Home, Version 25H2, equipped with an Intel Core i7-10710U CPU (6 cores, 12 threads, 1.10 GHz base, 4.70 GHz turbo, 384 KB L1 cache, 1.5 MB L2 cache, 12 MB L3 cache) and 16 GB RAM.

4.1.2. Dataset

In order to ensure the accuracy of the data annotation, all data were constructed with packing tools, taking software on a personal computer as input and producing packed software as output. This experiment uses 10 classes of packed software, whose numbers and quantities are shown in Table 4.

To evaluate the recognition of unseen classes, 5 classes were selected as seen classes and the other 5 as unseen classes in each experimental group. A total of 21 groups were formed in this way; the information for each group is shown in Table 5.

4.1.3. Experimental Metrics

In order to effectively measure the recognition of packed software, recall-based metrics are proposed. As described in Section 3.1, the opened-set problem of packed software involves both seen and unseen samples, and separate metrics are used for each. The seen recall $SR$ (Equation (1)) measures the recognition ability of the method on seen samples:

$$SR = \frac{TS}{TS + NS} \tag{1}$$

In Equation (1), $TS$ is the number of seen samples that were correctly predicted and $NS$ is the number of seen samples that were incorrectly predicted. Note that $NS$ does not count seen samples that were assigned to a different seen class: for example, a class 1 sample predicted as a class 2 sample is still included in $TS$.

The unseen recall $UR$ (Equation (2)) measures the method's ability to recognize unseen samples:

$$UR = \frac{TU}{TU + NU} \tag{2}$$

In Equation (2), $TU$ is the number of correctly predicted unseen samples and $NU$ is the number of incorrectly predicted unseen samples.

The average of $SR$, called $ASR$ (Equation (3)), evaluates the average performance on seen classes:

$$ASR = \frac{1}{n} \sum_{i=1}^{n} SR_{i} \tag{3}$$

The average of $UR$, called $AUR$ (Equation (4)), evaluates the average performance on unseen classes:

$$AUR = \frac{1}{n} \sum_{i=1}^{n} UR_{i} \tag{4}$$

The average metric $AR$ (Equation (5)) measures the overall average performance:

$$AR = \frac{ASR + AUR}{2} \tag{5}$$
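Equations (1) to (5) can be computed directly from predictions. The sketch below is one possible reading, under two assumptions that the text does not state explicitly: a seen sample counts toward TS whenever it is predicted as any seen class, and n in Equations (3) and (4) is the number of experimental groups. All labels and per-group values are made up for illustration.

```python
def seen_unseen_recall(true_labels, pred_labels, seen_classes):
    """Return (SR, UR) for one experimental group (Equations (1) and (2)).

    Assumes the group contains at least one seen and one unseen sample.
    """
    ts = ns = tu = nu = 0
    for truth, pred in zip(true_labels, pred_labels):
        pred_is_seen = pred in seen_classes
        if truth in seen_classes:
            # Assumption: any seen-class prediction counts toward TS.
            if pred_is_seen:
                ts += 1
            else:
                ns += 1
        elif pred_is_seen:
            nu += 1  # unseen sample mistaken for a seen class
        else:
            tu += 1  # unseen sample correctly rejected
    return ts / (ts + ns), tu / (tu + nu)


def average_recall(group_results):
    """group_results: list of (SR, UR) pairs. Returns (ASR, AUR, AR)."""
    n = len(group_results)
    asr = sum(sr for sr, _ in group_results) / n   # Equation (3)
    aur = sum(ur for _, ur in group_results) / n   # Equation (4)
    return asr, aur, (asr + aur) / 2               # Equation (5)


# Made-up labels: classes 1 and 2 are seen, classes 3 and 4 are unseen.
truth = [1, 1, 2, 2, 3, 3, 3, 4]
pred = [1, 2, 2, "unseen", "unseen", "unseen", 1, "unseen"]
sr, ur = seen_unseen_recall(truth, pred, seen_classes={1, 2})
print(sr, ur)
print(average_recall([(sr, ur), (0.9, 0.8)]))
```

Here three of the four seen samples are kept in the seen space and three of the four unseen samples are rejected, so both recalls are 0.75 for this group.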

4.2. Overall Analysis

Following the discussion in Section 3 and the multi-class model scheme in Section 3.2, the isolation forest (iForest) [30,31,32] algorithm was used to construct independent recognition models for the multiple classes of packed software. The proposed method was then implemented and measured using the metrics in Section 4.1.3.

The results of the experiments, using the dataset partitioning described in Section 4.1.2, are shown in Table 6. In all 21 experimental groups, the SR of POS stays above 89%, and the UR stays above 87%. At the macroscopic level, POS also maintains high ASR and AUR, with AR reaching 87.11%. These results demonstrate that POS performs well on both the PRC and PRO problems.

4.3. Algorithm Analysis

In order to comprehensively evaluate the effectiveness of the scheme, comparative experiments were conducted on the algorithms. The local outlier factor (LOF) detection method and one-class SVM [33] play a role similar to that of isolation forest and are therefore used as the comparison algorithms. Table 7 shows the experimental results. LOF's ability to recognize unseen classes is extremely poor: it focuses more on detecting samples of the trained class, which leads to overfitting. The metrics of one-class SVM are better than those of LOF but lower than those of POS.

4.4. Parameter Analysis

In a single class recognition model, there is an adjustable training parameter called training error (TE). TE sets the proportion of the current training samples that will be discarded, because treating every point as belonging to the current category causes overfitting during training. The TE value is therefore always a decimal in the open interval (0, 1).

In order to enhance the recognition ability of the proposed method for unseen classes, the TE of each class was analyzed. Specifically, TE values from 0 to 0.3 were tested with a step size of 0.01 to find the best TE. Figure 6 shows the relationship between AR and TE for the 10 classes, where the coordinates of the maximum AR value of each class are marked with circles and data labels. Table 8 shows all values and the corresponding TE for each class, with the maximum for each class in bold and underlined. Table 9 lists the TEs used in POS. Figure 6 and Table 8 show that most classes achieve their peak AR within a particular TE range. For Class 6 (Themida), several identical maximum values occur for 0.14 ≤ TE ≤ 0.18; consequently, the median of this interval is adopted as the TE value for POS.
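The selection rule described above (take the TE with the highest AR, and the median of the tied TE values when the maximum repeats) can be sketched as follows. The AR curve here is synthetic and only mimics the Class 6 situation; starting the grid at 0.01 rather than 0 is an assumption based on TE lying in an open interval.

```python
def select_te(ar_by_te):
    """Pick the TE with the highest AR; on ties, take the median tied TE.

    ar_by_te: dict mapping TE -> AR measured for one class. Tied values
    are assumed to form one contiguous interval. With an even number of
    ties, the upper of the two middle values is returned.
    """
    best_ar = max(ar_by_te.values())
    tied = sorted(te for te, ar in ar_by_te.items() if ar == best_ar)
    return tied[len(tied) // 2]


# TE grid with step 0.01 up to 0.30 (0 excluded by assumption).
te_grid = [i / 100 for i in range(1, 31)]

# Synthetic AR curve with a flat maximum for 0.14 <= TE <= 0.18.
toy_ar = {te: (90.0 if 0.14 <= te <= 0.18 else 80.0) for te in te_grid}
print(select_te(toy_ar))
```

For the flat maximum between 0.14 and 0.18, the rule returns the middle value of the interval.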

4.5. Comparative Analysis

In order to demonstrate the effectiveness of POS more intuitively, a comparison was made with other works. Since the methods of Kim et al. [19] and Jung et al. [21] assume the closed-set problem of packed software, the multi-class model solution in Section 3.2 is used to give them the ability to recognize opened-set samples. As 2-SPIFF [22] already uses a threshold setting solution, it is not adjusted. Tables 10, 11, and 12 present the differences between the compared methods and POS from several perspectives.

In Table 10, it can be clearly observed that the method proposed by Jung et al. [21] has the weakest ability to recognize seen classes. This is because its features are too weakly discriminative: entropy features ignore a large number of differences in byte order, so the recognition model judges most samples to be packed software of unseen classes. The other three methods perform better because their features are more discriminative, and their algorithms and strategies meet the requirement of simultaneously detecting seen and unseen classes. Among them, the method proposed by Kim et al. [19], which uses the first bytes, has good recognition ability for version-invariant packed software; it is in effect a disguised label matching scheme, which results in more completely correct recognition of seen classes.

Table 11 shows the recognition ability of each method for unseen classes. The method proposed by Kim et al. [19], being similar to label matching, exhibits overfitting when recognizing unseen classes, with only the two values 0 and 100 appearing across the 21 experiments. Although the method proposed by Jung et al. [21] has good recognition ability for unseen classes, it completely sacrifices the recognition ability for seen classes. POS is relatively balanced, without overfitting or sacrificing either recognition ability.

Table 12 shows the macroscopic results of four methods, and it can be seen that POS achieved the highest average recall rate, demonstrating the effectiveness of the method in compatibility with seen and unseen classes.

To investigate the poor performance of the POS method in certain groups, dimensionality-reduced data were analyzed. Principal Component Analysis (PCA) is employed to compress high-dimensional data and visualize its spatial distribution. As shown in Figure 7, most classes exhibit clear separability; however, some classes contain scattered points that overlap with others, which compromises the effectiveness of specific groupings.

4.6. Real Malware Analysis

In order to measure the ability of POS on real packed malware, 500 malware samples were collected from MALWAREBazaar (https://bazaar.abuse.ch/browse/). Using the method described in Section 4.1.2, the samples were packed with two different packers, Enigma (Malware 1) and eXPressor (Malware 2). POS identified the two groups of real packed malware with clear labels. Table 13 shows the results on the malware data. POS retains a certain ability to identify real packed malware.

4.7. Efficiency Analysis

To evaluate the efficiency of the POS method in practical applications, the average time cost of each stage was measured: feature extraction (4.161249 s/sample), model training (0.24 s/model), and detection (0.0004 s/sample). Consequently, in real-world environments, the theoretical time needed to detect one sample is approximately 4.16 s, almost all of which is spent on feature extraction.
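The per-sample figure follows directly from the stage costs. The sketch below adds them up and derives a rough throughput. The assumption of sequential processing on a single worker, and the function names, are illustrative and not taken from the experiments.

```python
# Average per-stage costs reported above (seconds).
FEATURE_EXTRACTION_S = 4.161249  # per sample
DETECTION_S = 0.0004             # per sample
TRAINING_S = 0.24                # per model; paid once, not per sample


def per_sample_latency():
    """Time for one sample to pass through feature extraction and detection."""
    return FEATURE_EXTRACTION_S + DETECTION_S


def samples_per_hour():
    """Rough throughput, assuming samples are processed one after another."""
    return 3600.0 / per_sample_latency()


if __name__ == "__main__":
    print(f"latency per sample: {per_sample_latency():.6f} s")
    print(f"throughput: {samples_per_hour():.0f} samples/hour")
    share = FEATURE_EXTRACTION_S / per_sample_latency()
    print(f"share spent on feature extraction: {share:.2%}")
```

Because feature extraction dominates, any speed-up effort would pay off there rather than in the detection stage.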

5. Limitations

This section injects several common binary obfuscation attacks into real-world data, following the descriptions of Lucas et al. [34,35]. Note that more effective attacks typically require access to the malware source code (or equivalent capabilities). Table 14 lists the attacks and their descriptions. Table 15 reports the effects of the changestr attack on three groups: the first group (unattacked baseline), the second group (attacked, with the attacked classes unseen during training), and the third group (attacked, with the attacked classes seen during training). The results show that POS is affected by the changestr attack, and applying the more sophisticated attacks from Table 14 would further degrade POS performance. Enhancing the robustness of POS against such obfuscation techniques is left for future work.
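The ASR and AUR changes shown in parentheses in Table 15 can be reproduced by subtracting the baseline from each attacked group. The following sketch does this; the dictionary layout and the function name `recall_change` are illustrative choices.

```python
# ASR and AUR values copied from Table 15 (percent).
BASELINE = {"ASR": 88.95, "AUR": 95.21}  # 1st group, unattacked
ATTACKED = {
    "2nd group (attacked classes unseen in training)": {"ASR": 87.89, "AUR": 94.02},
    "3rd group (attacked classes seen in training)": {"ASR": 86.64, "AUR": 94.02},
}


def recall_change(baseline, attacked):
    """Change in each metric, in percentage points, relative to the baseline."""
    return {name: round(attacked[name] - baseline[name], 2) for name in baseline}


if __name__ == "__main__":
    for group, values in ATTACKED.items():
        print(group, recall_change(BASELINE, values))
```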

6. Conclusions

Through an in-depth analysis of previous packed software recognition techniques, two key challenges in practical applications were identified: the Closed-set Packed Software Recognition (PRC) problem and the Open-set Packed Software Recognition (PRO) problem. These challenges not only affect recognition accuracy but also impose higher demands on the generalization capability of recognition systems. Specifically, the PRC problem deals with accurate classification within a known sample set, while the PRO problem addresses the handling of unseen class samples, particularly in the context of complex and diverse packed software. Three approaches to the PRO problem were analyzed and compared. After evaluating the strengths and weaknesses of each, a multi-class model recognition solution was chosen, as it effectively distinguishes between packed software classes and addresses the challenges posed by the PRO problem.
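To make the open-set decision concrete, the sketch below shows one generic way a set of per-class models can reject unseen classes: a sample that no per-class model accepts is labeled as unseen. This is not the POS implementation. The `CentroidDetector` class is a deliberately simple stand-in for whatever per-class model is actually used, and all names, radii, and feature values are hypothetical.

```python
import math


class CentroidDetector:
    """Toy per-class model: accepts points within `radius` of the class centroid."""

    def __init__(self, samples, radius):
        dims = len(samples[0])
        self.centroid = [sum(s[j] for s in samples) / len(samples)
                         for j in range(dims)]
        self.radius = radius

    def distance(self, x):
        return math.dist(x, self.centroid)

    def accepts(self, x):
        return self.distance(x) <= self.radius


def recognize(x, detectors, unseen_label="unseen"):
    """Return the closest accepting class, or `unseen_label` if none accepts."""
    accepted = [(det.distance(x), label)
                for label, det in detectors.items() if det.accepts(x)]
    if not accepted:
        return unseen_label
    return min(accepted)[1]


if __name__ == "__main__":
    detectors = {
        "UPX": CentroidDetector([[0.0, 0.0], [2.0, 0.0]], radius=2.0),
        "ASPack": CentroidDetector([[10.0, 10.0], [12.0, 10.0]], radius=2.0),
    }
    for sample in ([1.0, 0.5], [11.0, 10.0], [50.0, 50.0]):
        print(sample, "->", recognize(sample, detectors))
```

The acceptance radius plays the role of a per-class threshold: a tighter radius rejects more samples as unseen at the cost of seen-class recall, and a looser one does the opposite.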

To further evaluate how well this method handles the PRO problem, a new performance metric called the “unseen recall rate” (UR) was proposed. This metric measures how accurately the model identifies packed software of unseen classes. Verification and testing across multiple experimental scenarios show that the proposed method, POS, exhibits superior performance in opened-set recognition tasks: it effectively identifies unseen samples while maintaining a high recall rate. These results demonstrate the effectiveness and feasibility of the method in practical applications.
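As a rough illustration of how seen and unseen recall could be computed from predictions, the Python sketch below assumes that a recognizer returns either a seen class label or a sentinel value `UNSEEN`. The sentinel, the function names, and the exact counting rule are assumptions made for this example; the metric definitions in Section 4.1.3 take precedence.

```python
UNSEEN = "unseen"  # hypothetical sentinel meaning "not any training class"


def seen_recall(y_true, y_pred, seen_classes):
    """Percent of seen-class samples that were assigned their correct class."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t in seen_classes]
    if not pairs:
        return 0.0
    return 100.0 * sum(t == p for t, p in pairs) / len(pairs)


def unseen_recall(y_true, y_pred, seen_classes):
    """Percent of unseen-class samples that were flagged as UNSEEN."""
    preds = [p for t, p in zip(y_true, y_pred) if t not in seen_classes]
    if not preds:
        return 0.0
    return 100.0 * sum(p == UNSEEN for p in preds) / len(preds)


if __name__ == "__main__":
    seen = {"UPX", "ASPack"}
    y_true = ["UPX", "UPX", "ASPack", "Themida", "Themida", "Enigma"]
    y_pred = ["UPX", UNSEEN, "ASPack", UNSEEN, "UPX", UNSEEN]
    sr = seen_recall(y_true, y_pred, seen)
    ur = unseen_recall(y_true, y_pred, seen)
    print(f"SR = {sr:.2f}%, UR = {ur:.2f}%")
```

In this toy run, one seen sample is wrongly rejected and one unseen sample is wrongly absorbed into a seen class, so both rates come out at two thirds.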

In future work, existing methods will be further optimized, focusing on single-class recognition algorithms to improve UR. By refining model structure, algorithm strategies, and data preprocessing techniques, the goal is to enhance both robustness and accuracy, especially in handling unseen samples, thereby advancing the development of packed software recognition technology.

Author Contributions

Conceptualization, Z.Q.; Methodology, F.L.; Software, M.H. and H.L.; Validation, F.L.; Formal analysis, F.L.; Investigation, B.L. and H.L.; Resources, X.L. and H.L.; Data curation, C.Z. and H.L.; Writing—original draft, G.F. and H.L.; Writing—review & editing, G.F. and H.L.; Visualization, Y.H. and H.L.; Supervision, Z.Q. and F.L.; Project administration, Z.Q. and F.L.; Funding acquisition, Z.Q. and F.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by China Southern Power Grid’s major network-level scientific and technological project “Research and Application of Multi-dimensional Active Defense Technology for Digital Grid”, project number 037800KC24040002 (GDKJXM20240428).

Data Availability Statement

The personal software data used in this study cannot be made publicly available due to privacy concerns. However, the malware dataset can be accessed from MalwareBazaar (https://bazaar.abuse.ch/browse/).

Conflicts of Interest

Authors Z.Q., F.L., M.H., B.L., X.L., C.Z., G.F. and Y.H. were employed by the company Guangdong Power Grid Co., Ltd. The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Alshoulie, M.; Mehmood, A. Deep Learning Approaches for Malware Detection: A Comprehensive Review of Techniques, Challenges, and Future Directions. IEEE Access 2025, 13, 118652–118677.
2. Or-Meir, O.; Nissim, N.; Elovici, Y.; Rokach, L. Dynamic malware analysis in the modern era—A state of the art survey. ACM Comput. Surv. (CSUR) 2019, 52, 88.
3. Tahir, R. A study on malware and malware detection techniques. Int. J. Educ. Manag. Eng. 2018, 8, 20.
4. Liu, H.; Tian, Z.; Qiu, J.; Liu, Y.; Fang, B. Survey on Few-shot for Malware Detection. J. Softw. 2024, 35, 3785–3808.
5. Hafeth, A.A.; Abdullahi, A.I. An Efficient Malware Detection Method Using a Hybrid ResNet-Transformer Network and IGOA-Based Wrapper Feature Selection. Electronics 2025, 14, 2741.
6. Sherazi, S.N.A.; Qureshi, A. Hybrid Analysis Model for Detecting Fileless Malware. Electronics 2025, 14, 3134.
7. Kulkarni, S.S.; Di Troia, F. Robust Hashing for Improved CNN Performance in Image-Based Malware Detection. Electronics 2025, 14, 3915.
8. Tong, Y.; Liang, H.; Ma, H.; Zhang, S.; Yang, X. A Survey on Reinforcement Learning-Driven Adversarial Sample Generation for PE Malware. Electronics 2025, 14, 2422.
9. Miura, H.; Kimura, T.; Hirata, K. Modeling of Malware Propagation in Wireless Mobile Networks with Hotspots Considering the Movement of Mobile Clients Based on Cosine Similarity. Electronics 2025, 14, 3528.
10. Roy, A.; Di Troia, F. Discriminative Regions and Adversarial Sensitivity in CNN-Based Malware Image Classification. Electronics 2025, 14, 3937.
11. Liu, H.; Zhou, Y.; Fang, B.; Sun, Y.; Hu, N.; Tian, Z. PHCG: PLC Honeypoint Communication Generator for Industrial IoT. IEEE Trans. Mob. Comput. 2025, 24, 198–209.
12. Chen, K.; Lu, H.; Yao, Y.; Fang, B.; Liu, Y.; Tian, Z. Enhancing Container Security through Phase-Based System Call Filtering. IEEE Trans. Cloud Comput. 2025, 13, 983–994.
13. Ren, Y.; Xiao, Y.; Zhou, Y.; Zhang, Z.; Tian, Z. CSKG4APT: A Cybersecurity Knowledge Graph for Advanced Persistent Threat Organization Attribution. IEEE Trans. Knowl. Data Eng. 2023, 35, 5695–5709.
14. Wang, Z.; Zhou, Y.; Liu, H.; Qiu, J.; Fang, B.; Tian, Z. ThreatInsight: Innovating Early Threat Detection Through Threat-Intelligence-Driven Analysis and Attribution. IEEE Trans. Knowl. Data Eng. 2023, 36, 9388–9402.
15. Gubbi, K.I.; Saber Latibari, B.; Srikanth, A.; Sheaves, T.; Beheshti-Shirazi, S.A.; PD, S.M.; Rafatirad, S.; Sasan, A.; Homayoun, H.; Salehi, S. Hardware trojan detection using machine learning: A tutorial. ACM Trans. Embed. Comput. Syst. 2023, 22, 46.
16. Xie, B.; Liu, M. Dynamics stability and optimal control of virus propagation based on the e-mail network. IEEE Access 2021, 9, 32449–32456.
17. Bala, B.; Behal, S. AI techniques for IoT-based DDoS attack detection: Taxonomies, comprehensive review and research challenges. Comput. Sci. Rev. 2024, 52, 100631.
18. Zheng, R.; Wang, Q.; Lin, Z.; Jiang, Z.; Fu, J.; Peng, G. Cryptocurrency malware detection in real-world environment: Based on multi-results stacking learning. Appl. Soft Comput. 2022, 124, 109044.
19. Kim, Y.; Paik, J.Y.; Choi, S.; Cho, E.S. Efficient svm based packer identification with binary diffing measures. In Proceedings of the 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), Milwaukee, WI, USA, 15–19 July 2019; Volume 1, pp. 795–800.
20. Li, X.; Shan, Z.; Liu, F.; Chen, Y.; Hou, Y. A consistently-executing graph-based approach for malware packer identification. IEEE Access 2019, 7, 51620–51629.
21. Jung, B.; Bae, S.I.; Choi, C.; Im, E.G. Packer identification method based on byte sequences. Concurr. Comput. Pract. Exp. 2020, 32, e5082.
22. Liu, H.; Guo, C.; Cui, Y.; Shen, G.; Ping, Y. 2-SPIFF: A 2-Stage Packer Identification Method Based on Function Call Graph and File Attributes. Appl. Intell. 2021, 51, 9038–9053.
23. Alkhateeb, E.; Ghorbani, A.; Habibi Lashkari, A. Identifying malware packers through multilayer feature engineering in static analysis. Information 2024, 15, 102.
24. Mondon, P.; de Lemos, R. Detecting Cryptographic Functions for String Obfuscation. In Proceedings of the 2024 IEEE International Conference on Cyber Security and Resilience (CSR), London, UK, 2–4 September 2024; pp. 315–320.
25. Di Gennaro, M.; D’Onghia, M.; Polino, M.; Zanero, S.; Carminati, M. PackHero: A Scalable Graph-based Approach for Efficient Packer Identification. In Proceedings of the International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, Lausanne, Switzerland, 17–19 July 2024; Springer: Berlin/Heidelberg, Germany, 2025; pp. 253–274.
26. Li, J.; He, J.; Li, W.; Fang, W.; Yang, G.; Li, T. SynDroid: An adaptive enhanced Android malware classification method based on CTGAN-SVM. Comput. Secur. 2024, 137, 103604.
27. Lichy, A.; Bader, O.; Dubin, R.; Dvir, A.; Hajaj, C. When a RF beats a CNN and GRU, together—A comparison of deep learning and classical machine learning approaches for encrypted malware traffic classification. Comput. Secur. 2023, 124, 103000.
28. Akhtar, M.S.; Feng, T. Detection of malware by deep learning as CNN-LSTM machine learning techniques in real time. Symmetry 2022, 14, 2308.
29. Alzahrani, A.I.; Ayadi, M.; Asiri, M.M.; Al-Rasheed, A.; Ksibi, A. Detecting the presence of malware and identifying the type of cyber attack using deep learning and VGG-16 techniques. Electronics 2022, 11, 3665.
30. Zhai, Y.; Liu, D.; Cheng, Z.; Fang, S. A Novel Prognostic Model of the Degradation Malfunction Combining a Dynamic Updated-ARIMA and Multivariate Isolation Forest: Application to Radar Transmitter. Electronics 2022, 11, 1921.
31. Heigl, M.; Anand, K.A.; Urmann, A.; Fiala, D.; Schramm, M.; Hable, R. On the Improvement of the Isolation Forest Algorithm for Outlier Detection with Streaming Data. Electronics 2021, 10, 1534.
32. Fang, N.; Fang, X.; Lu, K. Anomalous Behavior Detection Based on the Isolation Forest Model with Multiple Perspective Business Processes. Electronics 2022, 11, 3640.
33. Zhao, Y.; Zhou, X.; Chen, L.; Mao, Y.; Yan, M. Research on Abnormal Radio Detection Method Combining Local Outlier Factor and One-Class Support Vector Machine. Electronics 2025, 14, 4055.
34. Lucas, K.; Pai, S.; Lin, W.; Bauer, L.; Reiter, M.K.; Sharif, M. Adversarial Training for Raw-Binary Malware Classifiers. In Proceedings of the 32nd USENIX Security Symposium (USENIX Security 23), Anaheim, CA, USA, 9–11 August 2023; pp. 1163–1180.
35. Lucas, K.; Lin, W.; Bauer, L.; Reiter, M.K.; Sharif, M. Training Robust ML-based Raw-Binary Malware Detectors in Hours, not Months. In Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, New York, NY, USA, 14–18 October 2024; CCS ’24. pp. 124–138.

Figure 1. Schematic diagram of feature space in closed-set recognition problem.

Figure 2. Schematic diagram of threshold setting solution.

Figure 3. Schematic diagram of single model solution.

Figure 4. Schematic diagram of multi-class model solution.

Figure 5. Framework of POS.

Figure 6. Relation of AR and TE.

Figure 7. PCA of features.

Table 1. List of abbreviations.

Abbreviation	Description	First Occurrence
SVM	Support vector machine	Section 1
RF	Random forest	Section 1
CNN	Convolutional neural network	Section 1
LSTM	Long short-term memory network	Section 1
CFG(s)	Control flow graph	Section 2.2
FCG(s)	Function call graph	Section 2.2
CEG(s)	Consistent executing graph	Section 2.2
GMN(s)	Graph matching network	Section 2.2
CG(s)	Call graph	Section 2.2
DNN	Deep neural network	Table 2
GBoost	Gradient Boosting	Table 2
KNN	K-nearest neighbors	Table 2
PRC	Packed software recognition on closed-set	Definition 1
PRO	Packed software recognition on opened-set	Definition 2
iForest	Isolation forest	Section 4.2
PCA	Principal Component Analysis	Section 4.5

Table 3. Features of POS.

Name	Description ¹
Number of vertices	-
Number of directed edges	-
Maximum penetration	-
Minimum output	-
Relationship between vertices and directed edges	>, <, or =
Difference between vertices and directed edges	The absolute value of the difference between the number of vertices and the number of directed edges
Maximum degree	-
Minimum degree	-
Number of connected parts	-
Number of root vertices	in-degree = 0 and out-degree ≠ 0
Number of leaf vertices	out-degree = 0 and in-degree ≠ 0
Number of isolated vertices	in-degree = out-degree = 0
Number of aggregated-call vertices	in-degree > out-degree > 0
Number of divided-call vertices	out-degree > in-degree > 0
Number of transferred-call vertices	out-degree = in-degree = 1
In-degree of the EP vertex	EP is the program entry point function
Out-degree of the EP vertex	-
Number of sections	The number of sections in software
Maximum segment size	-

¹ A dash (-) means the description is the same as the name.

Table 4. Dataset of packed software.

Class ID	Packing Techniques Name	Number
1	ASPack	588
2	Enigma	588
3	eXPressor	584
4	mpress	570
5	Nspack	587
6	Themida	581
7	UPX	590
8	VMProtect	575
9	WinUpack	584
10	ZProtect	585

Table 5. Groups of dataset.

Group ID	Train Classes	Group ID	Train Classes
1	1, 2, 3, 4, 5	12	3, 4, 5, 6, 7
2	1, 2, 3, 4, 6	13	3, 4, 5, 6, 8
3	1, 2, 3, 4, 7	14	3, 4, 5, 6, 9
4	1, 2, 3, 4, 8	15	3, 4, 5, 6, 10
5	1, 2, 3, 4, 9	16	4, 5, 6, 7, 8
6	1, 2, 3, 4, 10	17	4, 5, 6, 7, 9
7	2, 3, 4, 5, 6	18	4, 5, 6, 7, 10
8	2, 3, 4, 5, 7	19	5, 6, 7, 8, 9
9	2, 3, 4, 5, 8	20	5, 6, 7, 8, 10
10	2, 3, 4, 5, 9	21	6, 7, 8, 9, 10
11	2, 3, 4, 5, 10

Table 6. Result of POS (%).

Group ID	SR	UR	Group ID	SR	UR
1	92.28	99.91	12	87.82	99.91
2	90.39	100	13	90.51	99.61
3	89.76	94.22	14	89.33	94.55
4	92.45	99.91	15	87.62	99.78
5	91.27	91.3	16	90.37	99.61
6	89.56	99.44	17	89.19	87.04
7	90.36	99.91	18	87.48	99.65
8	89.73	94.1	19	89.55	90.3
9	92.42	99.83	20	87.84	99.57
10	91.24	89.44	21	86.83	91.71
11	89.53	99.4
ASR	89.79	AUR	96.62	AR	93.21

Table 7. Results of LOF, One-class-SVM and POS (%).

	LOF		One-Class-SVM		POS
Group ID	SR	UR	SR	UR	SR	UR
1	98.80	25.41	88.12	99.93	92.28	99.91
2	98.97	10.44	88.85	80.64	90.39	100.0
3	99.14	25.02	87.84	80.68	89.76	94.22
4	99.32	15.68	88.55	72.39	92.45	99.91
5	98.97	24.67	89.52	79.70	91.27	91.30
6	99.14	25.70	88.87	74.82	89.56	99.44
7	98.97	14.38	92.94	99.52	90.36	99.91
8	99.14	14.43	91.92	80.01	89.73	94.10
9	99.31	4.68	92.62	71.77	92.42	99.83
10	99.14	14.14	93.1	78.39	91.24	89.44
11	99.14	14.60	92.96	74.16	89.53	99.40
12	99.14	5.03	94.49	99.96	87.82	99.91
13	99.31	4.38	95.19	72.25	90.51	99.61
14	99.14	13.95	95.16	79.05	89.33	94.55
15	99.14	5.04	95.52	71.10	87.62	99.78
16	99.66	5.48	96.74	89.60	90.37	99.61
17	99.66	5.28	96.72	63.23	89.19	87.04
18	99.66	5.56	97.25	75.02	87.48	99.65
19	99.83	42.75	97.07	87.21	89.55	90.30
20	99.83	33.43	97.6	73.20	87.84	99.57
21	100.0	61.99	97.77	97.29	86.83	91.71
ASR	99.30		93.28		89.79
AUR	17.72		80.95		96.62
AR	58.51		87.11		93.21

Table 8. Relationships between AR and TE (%).

Class ID	0.01	0.02	0.03	0.04	0.05	0.06	0.07	0.08	0.09	0.1
1	55.83	56.04	69.37	75.17	80.18	81.77	92.22	93.4	95.85	97.43
2	66.27	66.72	75.13	87.33	87.35	87.46	87.25	87.98	90.8	90.59
3	61.1	65.37	71.1	80.46	81.12	84.71	84.48	84.85	86.39	87.02
4	56.3	59.11	67.3	72.05	75.8	80.84	88.17	88.88	88.52	89.81
5	66.75	83.83	94.75	95.01	95.36	95.4	95.2	94.41	93.8	92.46
6	54.36	59.31	72.64	74.81	81.12	81.26	86.6	87.74	87.75	91.37
7	58.74	60.79	66.03	66.5	67.52	70.98	72.19	71.84	72.41	75.02
8	55.89	76.05	81.71	82.29	95.6	97.42	97.06	96.28	96.28	95.42
9	59.48	66.83	69.46	69.47	69.26	71.41	73.56	76.43	78.94	81.57
10	55.3	59.27	62.9	76.53	78.35	80.55	84.78	86.44	89.19	89.95
Class ID	0.11	0.12	0.13	0.14	0.15	0.16	0.17	0.18	0.19	0.2
1	96.61	96.19	96.19	94.92	94.07	93.64	93.64	93.64	93.64	92.8
2	89.77	89.35	88.87	88.48	87.71	87.88	87.9	87.09	87.14	86.31
3	87.11	87.37	88.82	88.61	89.53	89.91	90.07	91.32	91.37	91.38
4	90.59	92.66	93.44	94.83	96.77	96.86	96.49	96.49	96.05	95.18
5	93.39	93.04	92.61	92.61	92.61	91.74	91.3	90.87	90	87.39
6	92.16	92.5	92.63	92.67	92.67	92.67	92.67	92.67	91.38	90.95
7	78.85	78.33	78.4	78.49	77.78	77.74	79.76	82.8	82.81	81.72
8	94.55	94.13	93.69	93.69	93.27	93.27	92.42	92.43	92.46	91.59
9	82.64	83.65	85.07	84.62	84.81	84.07	84.06	83.17	82.69	82.92
10	89.53	89.56	89.15	89.25	89.67	90.14	90.32	90.35	90.03	90.03
Class ID	0.21	0.22	0.23	0.24	0.25	0.26	0.27	0.28	0.29	0.3
1	90.68	90.25	90.25	90.25	90.25	88.56	88.14	88.14	88.14	87.71
2	85.94	85.94	87.52	87.53	87.59	87.6	86.36	86.37	85.95	88.11
3	90.09	90.09	90.09	89.67	88.39	88.39	87.98	86.7	86.28	85.43
4	94.3	94.3	93.42	93.42	92.98	92.54	91.67	90.35	90.35	89.04
5	87.39	87.39	87.39	86.52	85.22	84.35	83.04	83.04	83.04	83.04
6	90.52	90.52	90.52	88.79	86.64	86.64	85.78	85.78	85.34	84.05
7	80.99	81.19	81.01	81.91	82.09	81.56	81.87	82.27	81.93	81.62
8	91.16	89.87	87.3	86.87	86.44	85.58	85.14	84.73	84.73	84.31
9	82.72	82.98	82.86	81.76	81.93	82.19	82.27	81.96	81.17	80.83
10	89.2	88.35	88.36	88.36	87.51	86.67	84.97	83.68	83.26	82.4

Table 9. TEs of POS.

Class	1	2	3	4	5	6	7	8	9	10
TE	0.1	0.09	0.2	0.16	0.06	0.16	0.19	0.06	0.13	0.18
AR	97.43	90.8	91.38	96.86	95.4	92.67	82.81	97.42	85.07	90.35

Table 10. SR results of compared methods (%).

Group ID	Kim et al. [19]	Jung et al. [21]	2-SPIFF [22]	POS
1	100.0	1.10	99.66	92.28
2	80.27	0.00	98.97	90.39
3	99.14	0.00	99.49	89.76
4	100.0	0.00	98.97	92.45
5	99.14	0.00	99.66	91.27
6	100.0	0.00	95.03	89.56
7	100.0	2.9	99.66	90.36
8	99.14	0.33	100.0	89.73
9	100.0	0.47	99.48	92.42
10	99.31	0.54	100.0	91.24
11	100.0	0.40	96.21	89.53
12	99.66	0.60	99.66	87.82
13	99.65	1.77	99.48	90.51
14	99.48	0.41	99.48	89.33
15	99.66	1.10	99.66	87.62
16	77.72	0.48	99.31	90.37
17	94.32	2.54	99.31	89.19
18	82.44	0.13	99.66	87.48
19	75.6	0.00	98.97	89.55
20	62.20	0.00	98.8	87.84
21	61.23	2.51	98.46	86.83

Table 11. UR results of compared methods (%).

Group ID	Kim et al. [19]	Jung et al. [21]	2-SPIFF [22]	POS
1	100.0	99.67	46.21	99.91
2	100.0	99.80	54.55	100.0
3	100.0	99.88	30.46	94.22
4	0.00	99.88	59.82	99.91
5	100.0	98.81	31.19	91.30
6	0.00	99.84	48.13	99.44
7	100.0	100.0	51.78	99.91
8	100.0	100.0	51.32	94.10
9	0.00	99.43	56.28	99.83
10	100.0	98.11	48.17	89.44
11	0.00	99.49	69.12	99.40
12	0.00	100.0	51.27	99.91
13	0.00	100.0	33.36	99.61
14	100.0	98.37	28.26	94.55
15	0.00	100.0	54.87	99.78
16	100.0	100.0	50.56	99.61
17	100.0	93.76	36.16	87.04
18	100.0	100.0	41.59	99.65
19	100.0	98.91	48.89	90.30
20	100.0	100.0	50.65	99.57
21	100.0	100.0	57.52	91.71

Table 12. Macro results of compared methods (%).

	Kim et al. [19]	Jung et al. [21]	2-SPIFF [22]	POS
ASR	91.86	0.73	99.04	89.79
AUR	66.66	99.33	47.63	96.62
AR	79.26	50.03	73.33	93.21

Table 13. Macro results of packed malware (%).

	POS	POS vs. Malware 1	POS vs. Malware 2
ASR	89.79	88.29 (−1.50)	89.71 (−0.08)
AUR	96.62	95.29 (−1.33)	93.60 (−0.02)
AR	93.21	91.79 (−1.42)	91.38 (−1.83)

Table 14. Common obfuscation attacks.

Abbreviation	Description
changestr	Change strings
equiv	Equivalent instruction replacement
swap	Swap the registers being used
preserv	Preserve register function
reorder	Reorder instruction
disp	Displacement instruction
semnops	Add semantic NOP instruction

Table 15. Results of POS attacked by different obfuscation (%).

Attack	1st Group	2nd Group	3rd Group
ASR	88.95	87.89 (−1.06)	86.64 (−2.31)
AUR	95.21	94.02 (−1.19)	94.02 (−1.19)
AR	92.08	90.33 (−1.75)	90.33 (−1.75)

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
