SINNER: A Reward-Sensitive Algorithm for Imbalanced Malware Classification Using Neural Networks with Experience Replay
Abstract
1. Introduction
- It presents SINNER, a DRL-based classifier that leverages a reward function slightly modified with respect to the one proposed in [28].
- It provides an extended benchmark analysis that involves a state-of-the-art DL-based malware family classifier able to deal with class skew at the algorithm level.
2. Preliminaries
2.1. Malware Analysis
2.1.1. Static Analysis
2.1.2. Dynamic Analysis
2.2. Deep Reinforcement Learning
2.2.1. Deep Q-Network
2.2.2. Double Deep Q-Network
2.2.3. Dueling Network
2.2.4. Prioritized Experience Replay
3. Related Work
3.1. Deep Learning for API-Based Malware Classification
3.2. Imbalanced Multi-Class Malware Classification
3.3. Deep Reinforcement Learning for Malware Analysis
3.4. Motivation
4. Methodology
4.1. Environment Setting
- Training data provide the observation space S; therefore, each training sample represents an observation for a specific timestep t. Note that S ∈ ℝ^(m×n), with m the number of samples within the training set and n the number of features.
- The action space A consists of all known class labels. Therefore, given K classes, |A| = K, i.e., A = {0, 1, …, K − 1}.
- The reward function represents the main component of the proposed cost-sensitive approach according to the following formula:
- Finally, according to the definition of S, the state-transition probability is deterministic; thus, the agent advances from s_t to s_{t+1} as determined by the order of the samples within S.
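The ICMDP formulation above can be sketched as a minimal environment. This is an illustrative sketch, not SINNER's implementation: it assumes the common cost-sensitive scheme in which majority-class rewards are scaled by a factor λ (the exact reward used by SINNER is the one discussed in Section 4), and the class split and λ value below are arbitrary.

```python
import numpy as np

class ImbalancedClassificationEnv:
    """Minimal ICMDP sketch: each training sample is a state, each class
    label an action; transitions follow the order of the samples in S."""

    def __init__(self, X, y, minority_classes, reward_majority=0.1):
        self.X, self.y = X, y                  # observation space S (m x n)
        self.minority = set(minority_classes)
        self.lam = reward_majority             # scaled reward for majority classes
        self.t = 0

    def reset(self):
        self.t = 0
        return self.X[self.t]

    def step(self, action):
        label = self.y[self.t]
        correct = (action == label)
        # cost-sensitive reward: full magnitude for minority samples,
        # scaled magnitude lambda for majority samples
        mag = 1.0 if label in self.minority else self.lam
        reward = mag if correct else -mag
        self.t += 1
        done = self.t >= len(self.X)
        next_state = None if done else self.X[self.t]
        return next_state, reward, done
```

Because the transition is deterministic, an episode is simply one pass over the training set, with the cumulative reward dominated by the minority classes.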
4.2. Reward-Sensitive Training Analysis
4.3. Modified Reward Function
5. Experimental Setup
5.1. Approaches Used for Benchmark
- According to the literature review provided in Section 3, the following DL models are selected and combined with the cost-sensitive strategies proposed in [11,62], the working principle of which is shown in Table 2:
- LSTM [79]: This popular model belongs to the class of RNNs. The structure of this network comprises three gates in its hidden layers: an input gate, an output gate, and a forget gate. These entities form the so-called memory cell, which traces the data flow, i.e., remembers or forgets information over time. In this way, LSTMs can maintain long-term dependencies over sequential data. The LSTM used in our experiments has 100 units (size of hidden cells) connected to a final multi-class classification layer (K nodes, each with a softmax activation). Between these connections, there is a dropout layer that mitigates overfitting by randomly discarding neurons with a fixed probability.
- BiLSTM [80]: This differs from the aforementioned model in its adoption of a bidirectional layer, which feeds the input forward and backward to two separate recurrent nets, both of which are connected to the same output layer (with the same properties as the last LSTM layer).
- BiGRU [81]: Like the previous DL model, this method uses a bidirectional approach to analyze sequences in both directions, with the so-called GRU, an LSTM variant, as its main building block. The GRU has gating units (update and reset gates) that control the flow of information inside each unit without separate memory cells. The update gate helps the model determine how much past information (from previous time steps) must be passed to the future, whereas the reset gate determines how much of the past information is to be forgotten. In this case, the dropout rate is fixed so that a neuron is discarded with a probability of 0.3.
- TabNet [82]: This is a DL architecture specifically designed for tabular data. At each decision step, the model exploits a sequential attention mechanism to select the features useful for a specific prediction, according to the aggregated information collected (the aggregation over the feature dimension is realized by the attentive transformer component of the TabNet encoder). This property enhances the explainability of the model (because of the presence of a feature masking component, which is part of the TabNet decoder, i.e., the module delegated to reconstruct the features generated by the encoder). The hyperparameters were set according to the suggestions provided by the authors of the original paper.
Lastly, all the above DL models optimize the loss function using the Adam optimization algorithm, sampling mini-batches of 128, 64, and 1024 training samples for LSTM, the bidirectional models, and TabNet, respectively. The first three models were trained for 50 epochs, whereas TabNet was trained for 100 epochs.
- RTF [16]: This model consists of an ensemble of homogeneous (equivalent structure of base estimators) pre-trained transformer models. Each is fine-tuned to implement a sequence classification layer using a subset of training data obtained through a stratified (to retain the class distribution of the original set) bootstrap sampling technique. Each model in the ensemble generates a probability employed in a majority voting schema, which leads to a traditional bagging method (exploiting the robustness of such an algorithmic procedure with respect to class skew). BERT and CANINE (including the CANINE-C and CANINE-S variants) were evaluated as pre-trained models. The setting was the same as that proposed in the experimental evaluation of the original article (Table 7 in [16]).
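The update/reset gating described for the BiGRU baseline above can be written out numerically. The following is a minimal single-step sketch with randomly initialized weights; the input and hidden sizes are arbitrary assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, params):
    """One GRU step: the update gate z decides how much past information
    is carried forward; the reset gate r decides how much is forgotten
    when forming the candidate state."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = sigmoid(x @ Wz + h_prev @ Uz)              # update gate
    r = sigmoid(x @ Wr + h_prev @ Ur)              # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h_prev) @ Uh)  # candidate state
    return (1.0 - z) * h_prev + z * h_tilde        # new hidden state

rng = np.random.default_rng(0)
n_in, n_hid = 4, 3
params = [rng.normal(size=s) for s in
          [(n_in, n_hid), (n_hid, n_hid)] * 3]
h = np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):   # run a short input sequence
    h = gru_step(x, h, params)
```

A bidirectional layer simply runs two such recurrences, one over the sequence and one over its reversal, and concatenates the final hidden states before the classification layer.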
5.2. Datasets Selected for This Study
- APIs statically extracted from the PE structure of malware samples collected from two main providers, i.e., VirusShare (https://virusshare.com/, accessed on 16 May 2024) and VirusSample (https://www.virussamples.com/, accessed on 16 May 2024), which were labeled using the VirusTotal (https://www.virustotal.com/, accessed on 16 May 2024) engine [83]. These two datasets differ in the number of samples and malware families within each, but they share the same feature space size, as in [16].
- API sequences traced by dynamically analyzing each malware sample using the Cuckoo sandbox. These are collected in two different datasets, namely Catak [63] and Oliveira [84]. Note that while Catak represents the state of the art in the category of multi-class malware classification problems using APIs, the Oliveira dataset was released for binary classification problems. Therefore, only the malware samples contained in the latter dataset were used, and each was assigned to the malware family indicated by the VirusTotal service. In this assignment process, statistical units without an associated class and malware families with fewer than 100 samples were discarded [16].
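The filtering step described above (discarding unlabeled samples and families with fewer than 100 members) can be sketched as follows; the function and variable names are illustrative, not taken from the paper's code.

```python
from collections import Counter

def filter_families(samples, labels, min_count=100):
    """Keep only labeled samples whose malware family has at least
    `min_count` members; unlabeled samples (label None) are dropped."""
    counts = Counter(l for l in labels if l is not None)
    keep = {fam for fam, c in counts.items() if c >= min_count}
    return [(s, l) for s, l in zip(samples, labels)
            if l is not None and l in keep]
```

Applying this filter before the train/test split keeps the class set identical across splits and avoids families too small to evaluate reliably.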
5.3. Metrics
- The F1 score, the harmonic mean of precision (PREC) and true positive rate (TPR), defined as PREC = TP/(TP + FP) and TPR = TP/(TP + FN), i.e., F1 = 2 · PREC · TPR/(PREC + TPR). Specifically, the macro-averaged metric was examined because it assumes that each class has the same impact regardless of its skew [86].
- The area under the receiver operating characteristic curve (AUC), computed by identifying the surface below the graph that relates the false positive rate (FPR) to the TPR.
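The macro-averaged F1 described above weights every class equally, which is what makes it sensitive to minority-class performance. A minimal self-contained sketch of the computation (scikit-learn's `f1_score` with `average='macro'` yields the same quantity):

```python
def macro_f1(y_true, y_pred, classes):
    """Macro-averaged F1: per-class PREC = TP/(TP+FP), TPR = TP/(TP+FN),
    F1 = 2*PREC*TPR/(PREC+TPR), averaged with equal class weight."""
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        tpr = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * tpr / (prec + tpr) if prec + tpr else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)
```

Because the per-class scores are averaged unweighted, a classifier that ignores a rare family is penalized as heavily as one that ignores a common family.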
5.4. Setting of the Proposed Methodology
5.5. Hardware Settings and Implementation Details
6. Results and Discussion
6.1. Reward Influence Analysis
- The three algorithms that achieve the highest F1 score among all evaluated DRL configurations on the Catak dataset are dueling DDQN, dueling DQN with PER, and dueling DQN. This trio shares a key finding: the reward function used is that of Equation (11). With the same configuration, adopting Equation (7) results in a performance degradation that is more evident in the F1 score than in the AUC.
- Using the Oliveira dataset, the top three performers once more include dueling DQN and dueling DDQN, followed by dueling DDQN with PER. As before, the best results are obtained using Equation (11) as the reward; indeed, it is remarkable that the same three algorithms trained with Equation (7) achieve F1 scores that are half of those achieved using Equation (11). Similarly, using Equation (11) rather than Equation (7) improves the AUC.
- Using the PER technique for the VirusSample dataset brings benefits that are reflected in the performance achieved by the dueling DQN (which also performs effectively with the non-prioritized ER) and dueling DDQN algorithms. Dueling DQN with PER, adopting Equation (11) for reward-sensitive training, reaches an F1 close to 80%, outperforming the same algorithm configuration trained using Equation (7). An improvement in F1 scores is also found for the remaining two algorithms when using Equation (11) instead of Equation (7). In contrast, the opposite trend is shown by evaluating the AUC.
- The benefit achieved by introducing the revised reward formulation is confirmed for the VirusShare dataset, for which the top performers are given by the following three algorithms: dueling DDQN, dueling DQN, and dueling DDQN with PER.
6.2. Performance Comparison
- Table 4 reveals that the cost-sensitive strategies proposed in [11] and [62] benefit only the BiLSTM model when using the VirusShare dataset. In fact, the remaining three DL algorithms do not produce satisfactory performance, with AUC values (∼0.5) indicating that they performed random classifications. However, the classification metric scores obtained by the BiLSTM algorithm do not reach the state-of-the-art performance achieved by the RTF algorithm. In addition, BiLSTM is extremely disadvantageous in terms of required training time, which is the longest in this comparison regardless of the cost-sensitive strategy adopted. The proposed methodology outperforms all of the competitors in terms of F1 score and prediction time. Specifically, SINNER achieves an F1 score approximately 2% higher than that obtained by RTF, while requiring significantly less prediction time. On the other hand, RTF remains advantageous in terms of both training time and AUC.
- In contrast to the previous case, Table 5 indicates that the results achieved by BiGRU are comparable to those obtained by BiLSTM. In particular, the scores produced are comparable when adopting a specific cost-sensitive strategy; between the two alternatives, the one based on the custom loss function proposed in [62] performs better in both timing performance and F1 score. Furthermore, LSTM and TabNet combined with cost-sensitive strategies are again ineffective in this test. However, as mentioned previously, the bidirectional classifiers do not achieve performance comparable to RTF, which is the target of SINNER. In fact, the proposed methodology generated the F1 score closest to that achieved by RTF, with a shorter inference time, although its training time is ten times longer and its AUC lower.
- As shown in Table 6, SINNER underperforms when compared with the bidirectional DL models combined with the pair of cost-sensitive strategies and with RTF, which remains the state-of-the-art model on the Catak dataset, with impressive F1 scores and AUC values. Therefore, it appears that SINNER has difficulty learning from an observation space with a large number of variables (n).
- According to Table 7, five top performers are identified on the Oliveira dataset, namely LSTM (which joins the top classifiers for the first time, indicating the presence of a temporal relationship between the variables in each statistical unit, a likely condition since the dataset is extracted from a dynamic analysis process); BiLSTM and BiGRU leveraging the strategy proposed in [62]; RTF; and SINNER. In particular, the five algorithms obtained F1 values between 0.561 and 0.569. While RTF remains the most advantageous in terms of AUC, SINNER stands out with respect to timing performance, requiring the second-lowest training time (LSTM is the top performer for this particular metric) and the lowest prediction time, which, as in all the cases discussed above, is on the order of hundredths of a second.
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
ADASYN | Adaptive Synthetic Sampling Approach for Imbalanced Learning |
AI | Artificial Intelligence |
API | Application Programming Interface |
AUC | Area Under Receiver Operating Characteristic Curve |
AutoML | Automated Machine Learning |
AV | Anti-Virus |
BERT | Bidirectional Encoder Representations from Transformers |
BiLSTM | Bidirectional Long Short-Term Memory |
BiGRU | Bidirectional Gated Recurrent Unit |
CANINE | Character Architecture with No Tokenization In Neural Encoders |
CART | Classification and Regression Tree |
CNN | Convolutional Neural Network |
DL | Deep Learning |
DNN | Deep Neural Network |
DRL | Deep Reinforcement Learning |
DQN | Deep Q-Network |
DDQN | Double Deep Q-Network |
ER | Experience Replay |
FN | False Negative |
FP | False Positive |
FPR | False Positive Rate |
GRU | Gated Recurrent Unit |
ICMDP | Imbalanced Classification Markov Decision Process |
IoC | Indicator of Compromise |
IoT | Internet of Things |
LIME | Local Interpretable Model-Agnostic Explanations |
LSTM | Long Short-Term Memory |
MDP | Markov Decision Process |
ML | Machine Learning |
MLP | Multilayer Perceptron |
NoisyNet | Noisy Network |
NLP | Natural Language Processing |
PE | Portable Executable |
PER | Prioritized Experience Replay |
PREC | Precision |
PPO | Proximal Policy Optimization |
RELU | Rectified Linear Unit |
RL | Reinforcement Learning |
RNN | Recurrent Neural Network |
ROS | Random Oversampler |
RTF | Random Transformer Forest |
RUS | Random Undersampler |
SHAP | Shapley Additive Explanations |
TabNet | Deep Neural Network Architecture for Tabular Data |
TD | Temporal Difference |
T-link | Tomek Links |
TN | True Negative |
TP | True Positive |
TPR | True Positive Rate |
US | Unbalanced Scenario |
XAI | Explainable Artificial Intelligence |
YARA | Yet Another Recursive Acronym |
References
- Aboaoja, F.A.; Zainal, A.; Ghaleb, F.A.; Al-rimy, B.A.S.; Eisa, T.A.E.; Elnour, A.A.H. Malware detection issues, challenges, and future directions: A survey. Appl. Sci. 2022, 12, 8482. [Google Scholar] [CrossRef]
- Sibi Chakkaravarthy, S.; Sangeetha, D.; Vaidehi, V. A Survey on malware analysis and mitigation techniques. Comput. Sci. Rev. 2019, 32, 1–23. [Google Scholar] [CrossRef]
- Xu, L.; Qiao, M. Yara rule enhancement using Bert-based strings language model. In Proceedings of the 2022 5th International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE), Wuhan, China, 22–24 April 2022; pp. 221–224. [Google Scholar] [CrossRef]
- Coscia, A.; Dentamaro, V.; Galantucci, S.; Maci, A.; Pirlo, G. YAMME: A YAra-byte-signatures Metamorphic Mutation Engine. IEEE Trans. Inf. Forensics Secur. 2023, 18, 4530–4545. [Google Scholar] [CrossRef]
- Or-Meir, O.; Nissim, N.; Elovici, Y.; Rokach, L. Dynamic Malware Analysis in the Modern Era—A State of the Art Survey. ACM Comput. Surv. 2019, 52, 1–48. [Google Scholar] [CrossRef]
- Ucci, D.; Aniello, L.; Baldoni, R. Survey of machine learning techniques for malware analysis. Comput. Secur. 2019, 81, 123–147. [Google Scholar] [CrossRef]
- Liu, Y.; Wang, Y. A Robust Malware Detection System Using Deep Learning on API Calls. In Proceedings of the 2019 IEEE 3rd Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), Chengdu, China, 15–17 March 2019; pp. 1456–1460. [Google Scholar] [CrossRef]
- Vinayakumar, R.; Alazab, M.; Soman, K.P.; Poornachandran, P.; Venkatraman, S. Robust Intelligent Malware Detection Using Deep Learning. IEEE Access 2019, 7, 46717–46738. [Google Scholar] [CrossRef]
- Li, C.; Cheng, Z.; Zhu, H.; Wang, L.; Lv, Q.; Wang, Y.; Li, N.; Sun, D. DMalNet: Dynamic malware analysis based on API feature engineering and graph learning. Comput. Secur. 2022, 122, 102872. [Google Scholar] [CrossRef]
- Rabadi, D.; Teo, S.G. Advanced Windows Methods on Malware Detection and Classification. In Proceedings of the ACSAC ’20: 36th Annual Computer Security Applications Conference, Austin, TX, USA, 7–11 December 2020; pp. 54–68. [Google Scholar] [CrossRef]
- Alzammam, A.; Binsalleeh, H.; AsSadhan, B.; Kyriakopoulos, K.G.; Lambotharan, S. Comparative Analysis on Imbalanced Multi-class Classification for Malware Samples using CNN. In Proceedings of the 2019 International Conference on Advances in the Emerging Computing Technologies (AECT), Al Madinah Al Munawwarah, Saudi Arabia, 10 February 2020; pp. 1–6. [Google Scholar] [CrossRef]
- Lu, Y.; Shetty, S. Multi-Class Malware Classification Using Deep Residual Network with Non-SoftMax Classifier. In Proceedings of the 2021 IEEE 22nd International Conference on Information Reuse and Integration for Data Science (IRI), Las Vegas, NV, USA, 10–12 August 2021; pp. 201–207. [Google Scholar] [CrossRef]
- Kumar, K.A.; Kumar, K.; Chiluka, N.L. Deep learning models for multi-class malware classification using Windows exe API calls. Int. J. Crit. Comput.-Based Syst. 2022, 10, 185–201. [Google Scholar] [CrossRef]
- Oak, R.; Du, M.; Yan, D.; Takawale, H.; Amit, I. Malware Detection on Highly Imbalanced Data through Sequence Modeling. In Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security, Association for Computing Machinery, London, UK, 15 November 2019; pp. 37–48. [Google Scholar] [CrossRef]
- Ding, Y.; Wang, S.; Xing, J.; Zhang, X.; Qi, Z.; Fu, G.; Qiang, Q.; Sun, H.; Zhang, J. Malware Classification on Imbalanced Data through Self-Attention. In Proceedings of the 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Guangzhou, China, 29 December 2020–1 January 2021; pp. 154–161. [Google Scholar] [CrossRef]
- Demirkıran, F.; Çayır, A.; Ünal, U.; Dağ, H. An ensemble of pre-trained transformer models for imbalanced multiclass malware classification. Comput. Secur. 2022, 121, 102846. [Google Scholar] [CrossRef]
- Wang, H.; Singhal, A.; Liu, P. Tackling imbalanced data in cybersecurity with transfer learning: A case with ROP payload detection. Cybersecurity 2023, 6, 2. [Google Scholar] [CrossRef]
- Naim, O.; Cohen, D.; Ben-Gal, I. Malicious website identification using design attribute learning. Int. J. Inf. Secur. 2023, 22, 1207–1217. [Google Scholar] [CrossRef]
- Sewak, M.; Sahay, S.K.; Rathore, H. Deep reinforcement learning in the advanced cybersecurity threat detection and protection. Inf. Syst. Front. 2023, 25, 589–611. [Google Scholar] [CrossRef]
- Nguyen, T.T.; Reddi, V.J. Deep Reinforcement Learning for Cyber Security. IEEE Trans. Neural Networks Learn. Syst. 2021, 34, 3779–3795. [Google Scholar] [CrossRef] [PubMed]
- Kamal, H.; Gautam, S.; Mehrotra, D.; Sharif, M.S. Reinforcement Learning Model for Detecting Phishing Websites. In Cybersecurity and Artificial Intelligence: Transformational Strategies and Disruptive Innovation; Jahankhani, H., Bowen, G., Sharif, M.S., Hussien, O., Eds.; Springer: Berlin, Germany, 2024; pp. 309–326. [Google Scholar] [CrossRef]
- Shen, S.; Xie, L.; Zhang, Y.; Wu, G.; Zhang, H.; Yu, S. Joint Differential Game and Double Deep Q-Networks for Suppressing Malware Spread in Industrial Internet of Things. IEEE Trans. Inf. Forensics Secur. 2023, 18, 5302–5315. [Google Scholar] [CrossRef]
- Lin, E.; Chen, Q.; Qi, X. Deep Reinforcement Learning for Imbalanced Classification. Appl. Intell. 2020, 50, 2488–2502. [Google Scholar] [CrossRef]
- Yuan, F.; Tian, T.; Shang, Y.; Lu, Y.; Liu, Y.; Tan, J. Malicious Domain Detection on Imbalanced Data with Deep Reinforcement Learning. In Proceedings of the Neural Information Processing; Springer International Publishing: Berlin/Heidelberg, Germany, 2021; pp. 464–476. [Google Scholar] [CrossRef]
- Maci, A.; Santorsola, A.; Coscia, A.; Iannacone, A. Unbalanced Web Phishing Classification through Deep Reinforcement Learning. Computers 2023, 12, 118. [Google Scholar] [CrossRef]
- Maci, A.; Tamma, N.; Coscia, A. Deep Reinforcement Learning-based Malicious URL Detection with Feature Selection. In Proceedings of the 2024 IEEE 3rd International Conference on AI in Cybersecurity (ICAIC), Houston, TX, USA, 7–9 February 2024; pp. 1–7. [Google Scholar] [CrossRef]
- Maci, A.; Urbano, G.; Coscia, A. Deep Q-Networks for Imbalanced Multi-Class Malware Classification. In Proceedings of the 10th International Conference on Information Systems Security and Privacy—ICISSP, Roma, Italy, 26–28 February 2024; pp. 342–349. [Google Scholar] [CrossRef]
- Yang, J.; El-Bouri, R.; O’Donoghue, O.; Lachapelle, A.S.; Soltan, A.A.S.; Clifton, D.A. Deep Reinforcement Learning for Multi-class Imbalanced Training. arXiv 2022, arXiv:2205.12070. [Google Scholar] [CrossRef]
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef]
- Hasselt, H.V.; Guez, A.; Silver, D. Deep reinforcement learning with double Q-Learning. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence; AAAI Press: Washington, DC, USA, 2016; Volume 30, pp. 2094–2100. [Google Scholar] [CrossRef]
- Wang, Z.; Schaul, T.; Hessel, M.; van Hasselt, H.; Lanctot, M.; de Freitas, N. Dueling Network Architectures for Deep Reinforcement Learning. In Proceedings of the 33rd International Conference on Machine Learning. PMLR, New York, NY, USA, 20–22 June 2016; Volume 48, pp. 1995–2003. [Google Scholar] [CrossRef]
- Schaul, T.; Quan, J.; Antonoglou, I.; Silver, D. Prioritized Experience Replay. arXiv 2016, arXiv:1511.05952. [Google Scholar] [CrossRef]
- Fortunato, M.; Azar, M.G.; Piot, B.; Menick, J.; Osband, I.; Graves, A.; Mnih, V.; Munos, R.; Hassabis, D.; Pietquin, O.; et al. Noisy Networks for Exploration. arXiv 2019, arXiv:1706.10295. [Google Scholar] [CrossRef]
- Alkhateeb, E.; Ghorbani, A.; Habibi Lashkari, A. Identifying Malware Packers through Multilayer Feature Engineering in Static Analysis. Information 2024, 15, 102. [Google Scholar] [CrossRef]
- Gibert, D. PE Parser: A Python package for Portable Executable files processing. Softw. Impacts 2022, 13, 100365. [Google Scholar] [CrossRef]
- Yamany, B.; Elsayed, M.S.; Jurcut, A.D.; Abdelbaki, N.; Azer, M.A. A Holistic Approach to Ransomware Classification: Leveraging Static and Dynamic Analysis with Visualization. Information 2024, 15, 46. [Google Scholar] [CrossRef]
- Brescia, W.; Maci, A.; Mascolo, S.; De Cicco, L. Safe Reinforcement Learning for Autonomous Navigation of a Driveable Vertical Mast Lift. IFAC-PapersOnLine 2023, 56, 9068–9073. [Google Scholar] [CrossRef]
- Han, D.; Mulyana, B.; Stankovic, V.; Cheng, S. A Survey on Deep Reinforcement Learning Algorithms for Robotic Manipulation. Sensors 2023, 23, 3762. [Google Scholar] [CrossRef]
- Tran, M.; Pham-Hi, D.; Bui, M. Optimizing Automated Trading Systems with Deep Reinforcement Learning. Algorithms 2023, 16, 23. [Google Scholar] [CrossRef]
- Hu, Y.J.; Lin, S.J. Deep Reinforcement Learning for Optimizing Finance Portfolio Management. In Proceedings of the 2019 Amity International Conference on Artificial Intelligence (AICAI), Dubai, United Arab Emirates, 4–6 February 2019; pp. 14–20. [Google Scholar] [CrossRef]
- Yang, J.; El-Bouri, R.; O’Donoghue, O.; Lachapelle, A.S.; Soltan, A.A.S.; Eyre, D.W.; Lu, L.; Clifton, D.A. Deep reinforcement learning for multi-class imbalanced training: Applications in healthcare. Mach. Learn. 2023, 113, 2655–2674. [Google Scholar] [CrossRef]
- Chen, T.; Liu, J.; Xiang, Y.; Niu, W.; Tong, E.; Han, Z. Adversarial attack and defense in reinforcement learning-from AI security view. Cybersecurity 2019, 2, 11. [Google Scholar] [CrossRef]
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018; Available online: https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf (accessed on 18 January 2024).
- Wang, X.; Wang, S.; Liang, X.; Zhao, D.; Huang, J.; Xu, X.; Dai, B.; Miao, Q. Deep Reinforcement Learning: A Survey. IEEE Trans. Neural Netw. Learn. Syst. 2022, 35, 5064–5078. [Google Scholar] [CrossRef] [PubMed]
- Jang, B.; Kim, M.; Harerimana, G.; Kim, J.W. Q-Learning Algorithms: A Comprehensive Classification and Applications. IEEE Access 2019, 7, 133653–133667. [Google Scholar] [CrossRef]
- Zhang, H.; Yu, T. Taxonomy of Reinforcement Learning Algorithms. In Deep Reinforcement Learning: Fundamentals, Research and Applications; Dong, H., Ding, Z., Zhang, S., Eds.; Springer: Singapore, 2020; pp. 125–133. [Google Scholar] [CrossRef]
- Berman, D.S.; Buczak, A.L.; Chavis, J.S.; Corbett, C.L. A Survey of Deep Learning Methods for Cyber Security. Information 2019, 10, 122. [Google Scholar] [CrossRef]
- Kolosnjaji, B.; Zarras, A.; Webster, G.; Eckert, C. Deep Learning for Classification of Malware System Call Sequences. In Proceedings of the AI 2016: Advances in Artificial Intelligence; Springer International Publishing: Berlin/Heidelberg, Germany, 2016; pp. 137–149. [Google Scholar] [CrossRef]
- Meng, X.; Shan, Z.; Liu, F.; Zhao, B.; Han, J.; Wang, H.; Wang, J. MCSMGS: Malware Classification Model Based on Deep Learning. In Proceedings of the 2017 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), Nanjing, China, 12–14 October 2017; pp. 272–275. [Google Scholar] [CrossRef]
- Maniath, S.; Ashok, A.; Poornachandran, P.; Sujadevi, V.; A.U., P.S.; Jan, S. Deep learning LSTM based ransomware detection. In Proceedings of the 2017 Recent Developments in Control, Automation & Power Engineering (RDCAPE), Noida, India, 26–27 October 2017; pp. 442–446. [Google Scholar] [CrossRef]
- Cannarile, A.; Dentamaro, V.; Galantucci, S.; Iannacone, A.; Impedovo, D.; Pirlo, G. Comparing Deep Learning and Shallow Learning Techniques for API Calls Malware Prediction: A Study. Appl. Sci. 2022, 12, 1645. [Google Scholar] [CrossRef]
- Cannarile, A.; Carrera, F.; Galantucci, S.; Iannacone, A.; Pirlo, G. A Study on Malware Detection and Classification Using the Analysis of API Calls Sequences Through Shallow Learning and Recurrent Neural Networks. In Proceedings of the 6th Italian Conference on Cybersecurity (ITASEC22), CEUR Workshop Proceedings, Rome, Italy, 20–23 June 2022; Available online: https://ceur-ws.org/Vol-3260/paper9.pdf (accessed on 8 March 2024).
- Li, C.; Lv, Q.; Li, N.; Wang, Y.; Sun, D.; Qiao, Y. A novel deep framework for dynamic malware detection based on API sequence intrinsic features. Comput. Secur. 2022, 116, 102686. [Google Scholar] [CrossRef]
- Chanajitt, R.; Pfahringer, B.; Gomes, H.M.; Yogarajan, V. Multiclass Malware Classification Using Either Static Opcodes or Dynamic API Calls. In Proceedings of the AI 2022: Advances in Artificial Intelligence; Springer International Publishing: Berlin/Heidelberg, Germany, 2022; Volume 13728, pp. 427–441. [Google Scholar] [CrossRef]
- Maniriho, P.; Mahmood, A.N.; Chowdhury, M.J.M. API-MalDetect: Automated malware detection framework for windows based on API calls and deep learning techniques. J. Netw. Comput. Appl. 2023, 218, 103704. [Google Scholar] [CrossRef]
- Bensaoud, A.; Kalita, J. CNN-LSTM and transfer learning models for malware classification based on opcodes and API calls. Knowl.-Based Syst. 2024, 290, 111543. [Google Scholar] [CrossRef]
- Syeda, D.Z.; Asghar, M.N. Dynamic Malware Classification and API Categorisation of Windows Portable Executable Files Using Machine Learning. Appl. Sci. 2024, 14, 1015. [Google Scholar] [CrossRef]
- He, X.; Zhao, K.; Chu, X. AutoML: A survey of the state-of-the-art. Knowl.-Based Syst. 2021, 212, 106622. [Google Scholar] [CrossRef]
- Brown, A.; Gupta, M.; Abdelsalam, M. Automated machine learning for deep learning based malware detection. Comput. Secur. 2024, 137, 103582. [Google Scholar] [CrossRef]
- Qian, L.; Cong, L. Channel Features and API Frequency-Based Transformer Model for Malware Identification. Sensors 2024, 24, 580. [Google Scholar] [CrossRef]
- Yunan, Z.; Huang, Q.; Ma, X.; Yang, Z.; Jiang, J. Using Multi-features and Ensemble Learning Method for Imbalanced Malware Classification. In Proceedings of the 2016 IEEE Trustcom/BigDataSE/ISPA, Tianjin, China, 23–26 August 2016; pp. 965–973. [Google Scholar] [CrossRef]
- Akarsh, S.; Simran, K.; Poornachandran, P.; Menon, V.K.; Soman, K. Deep Learning Framework and Visualization for Malware Classification. In Proceedings of the 2019 5th International Conference on Advanced Computing & Communication Systems (ICACCS), Coimbatore, India, 15–16 March 2019; pp. 1059–1063. [Google Scholar] [CrossRef]
- Catak, F.O.; Ahmed, J.; Sahinbas, K.; Khand, Z.H. Data augmentation based malware detection using convolutional neural networks. PeerJ Comput. Sci. 2021, 7, e346. [Google Scholar] [CrossRef]
- Liu, J.; Zhuge, C.; Wang, Q.; Guo, X.; Li, Z. Imbalance Malware Classification by Decoupling Representation and Classifier. In Proceedings of the Advances in Artificial Intelligence and Security; Sun, X., Zhang, X., Xia, Z., Bertino, E., Eds.; Springer: Cham, Switzerland, 2021; pp. 85–98. [Google Scholar] [CrossRef]
- Bacevicius, M.; Paulauskaite-Taraseviciene, A. Machine Learning Algorithms for Raw and Unbalanced Intrusion Detection Data in a Multi-Class Classification Problem. Appl. Sci. 2023, 13, 7328. [Google Scholar] [CrossRef]
- Li, T.; Luo, Y.; Wan, X.; Li, Q.; Liu, Q.; Wang, R.; Jia, C.; Xiao, Y. A malware detection model based on imbalanced heterogeneous graph embeddings. Expert Syst. Appl. 2024, 246, 123109. [Google Scholar] [CrossRef]
- Xue, L.; Zhu, T. Hybrid resampling and weighted majority voting for multi-class anomaly detection on imbalanced malware and network traffic data. Eng. Appl. Artif. Intell. 2024, 128, 107568. [Google Scholar] [CrossRef]
- Fang, Z.; Wang, J.; Geng, J.; Kan, X. Feature Selection for Malware Detection Based on Reinforcement Learning. IEEE Access 2019, 7, 176177–176187. [Google Scholar] [CrossRef]
- Wu, Y.; Li, M.; Zeng, Q.; Yang, T.; Wang, J.; Fang, Z.; Cheng, L. DroidRL: Feature selection for android malware detection with reinforcement learning. Comput. Secur. 2023, 128, 103126. [Google Scholar] [CrossRef]
- Wang, Y.; Stokes, J.W.; Marinescu, M. Neural Malware Control with Deep Reinforcement Learning. In Proceedings of the MILCOM 2019 - 2019 IEEE Military Communications Conference (MILCOM), Norfolk, VA, USA, 12–14 November 2019; pp. 1–8. [Google Scholar] [CrossRef]
- Fang, Z.; Wang, J.; Li, B.; Wu, S.; Zhou, Y.; Huang, H. Evading Anti-Malware Engines with Deep Reinforcement Learning. IEEE Access 2019, 7, 48867–48879. [Google Scholar] [CrossRef]
- Wang, Y.; Stokes, J.; Marinescu, M. Actor Critic Deep Reinforcement Learning for Neural Malware Control. In Proceedings of the AAAI Conference on Artificial Intelligence. Association for the Advancement of Artificial Intelligence (AAAI), 2020, Hilton New York Midtown, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 1005–1012. [Google Scholar] [CrossRef]
- Song, W.; Li, X.; Afroz, S.; Garg, D.; Kuznetsov, D.; Yin, H. MAB-Malware: A Reinforcement Learning Framework for Blackbox Generation of Adversarial Malware. In Proceedings of the 2022 ACM on Asia Conference on Computer and Communications Security. Association for Computing Machinery, Nagasaki, Japan, 30 May–3 June 2022; pp. 990–1003. [Google Scholar] [CrossRef]
- Anderson, H.S.; Kharkar, A.; Filar, B.; Evans, D.; Roth, P. Learning to Evade Static PE Machine Learning Malware Models via Reinforcement Learning. arXiv 2018, arXiv:1801.08917. [Google Scholar] [CrossRef]
- Deng, X.; Cen, M.; Jiang, M.; Lu, M. Ransomware early detection using deep reinforcement learning on portable executable header. Clust. Comput. 2023, 27, 1867–1881. [Google Scholar] [CrossRef]
- Birman, Y.; Hindi, S.; Katz, G.; Shabtai, A. Cost-effective ensemble models selection using deep reinforcement learning. Inf. Fusion 2022, 77, 133–148. [Google Scholar] [CrossRef]
- Atti, M.; Yogi, M.K. Application of Deep Reinforcement Learning (DRL) for Malware Detection. Int. J. Inf. Technol. Comput. Eng. (IJITC) 2024, 4, 23–35. [Google Scholar] [CrossRef]
- Al-Fawa’reh, M.; Abu-Khalaf, J.; Szewczyk, P.; Kang, J.J. MalBoT-DRL: Malware Botnet Detection Using Deep Reinforcement Learning in IoT Networks. IEEE Internet Things J. 2024, 11, 9610–9629. [Google Scholar] [CrossRef]
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
- Graves, A.; Schmidhuber, J. Framewise phoneme classification with bidirectional LSTM networks. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, Montreal, QC, Canada, 31 July–4 August 2005; Volume 4, pp. 2047–2052. [Google Scholar] [CrossRef]
- Cho, K.; van Merrienboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. arXiv 2014, arXiv:1406.1078. [Google Scholar] [CrossRef]
- Arik, S.Ö.; Pfister, T. TabNet: Attentive Interpretable Tabular Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual Conference, 2–9 February 2021; Volume 35, pp. 6679–6687. [Google Scholar] [CrossRef]
- Düzgün, B.; Cayir, A.; Demirkiran, F.; Kahya, C.; Gençaydın, B.; Dag, H. New Datasets for Dynamic Malware Classification. 2021. Available online: https://www.researchgate.net/publication/356664607_New_Datasets_for_Dynamic_Malware_Classification (accessed on 16 May 2024).
- De Oliveira, A.S.; Sassi, R.J. Behavioral Malware Detection Using Deep Graph Convolutional Neural Networks. TechRxiv 2019. [Google Scholar] [CrossRef]
- Do, N.Q.; Selamat, A.; Krejcar, O.; Herrera-Viedma, E.; Fujita, H. Deep Learning for Phishing Detection: Taxonomy, Current Challenges and Future Directions. IEEE Access 2022, 10, 36429–36463. [Google Scholar] [CrossRef]
- Grandini, M.; Bagli, E.; Visani, G. Metrics for Multi-Class Classification: An Overview. arXiv 2020, arXiv:2008.05756. [Google Scholar] [CrossRef]
- Harris, C.R.; Millman, K.J.; van der Walt, S.J.; Gommers, R.; Virtanen, P.; Cournapeau, D.; Wieser, E.; Taylor, J.; Berg, S.; Smith, N.J.; et al. Array programming with NumPy. Nature 2020, 585, 357–362. [Google Scholar] [CrossRef]
- McKinney, W. Data Structures for Statistical Computing in Python. In Proceedings of the 9th Python in Science Conference, Austin, TX, USA, 28 June–3 July 2010; van der Walt, S., Millman, J., Eds.; pp. 56–61. [Google Scholar] [CrossRef]
- Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. 2015. Available online: https://www.tensorflow.org (accessed on 5 February 2024).
| Malware | m | | | | | |
|---|---|---|---|---|---|---|
| Family | US-1 | US-2 | US-3 | US-1 | US-2 | US-3 |
| Adware | 303 | 36 | ← | 0.689 | 0.987 | 0.942 |
| Backdoor | 800 | ← | ← | 0.261 | 0.044 | 0.042 |
| Downloader | 800 | ← 1 | ← | ↑ | ↑ | ↑ |
| Dropper | 713 | 446 | 179 | 0.292 | 0.079 | 0.189 |
| Spyware | 665 | 398 | 131 | 0.314 | 0.089 | 0.258 |
| Trojan | 800 | ← | ← | 0.261 | 0.044 | 0.042 |
| Virus | 800 | ← | ← | ↑ | ↑ | ↑ |
| Worms | 800 | ← | ← | ↑ | ↑ | ↑ |

(← and ↑ indicate a value repeated from the cell to the left and from the row above, respectively.)
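The per-class counts above correspond to progressively undersampled training sets (US-1 to US-3). A minimal sketch of per-class random undersampling is given below; the target counts and seed are illustrative, not the exact procedure used in the paper:

```python
import numpy as np

def undersample(X, y, targets, seed=0):
    """Randomly keep at most targets[c] samples of each class c."""
    rng = np.random.default_rng(seed)
    keep = []
    for c, cap in targets.items():
        idx = np.flatnonzero(y == c)
        if len(idx) > cap:
            # Draw a random subset without replacement for over-represented classes.
            idx = rng.choice(idx, size=cap, replace=False)
        keep.append(idx)
    keep = np.concatenate(keep)
    return X[keep], y[keep]

# Toy example: cap class 0 at 2 samples and class 1 at 3 samples.
X = np.arange(10).reshape(10, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
Xs, ys = undersample(X, y, {0: 2, 1: 3})
```

Applying this routine with decreasing caps yields increasingly skewed training sets, which is how the US-1/US-2/US-3 variants stress the cost-sensitive strategies.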
| Paper | Equation | Implementation Details and Description |
|---|---|---|
| A. Alzammam et al. [11] | | The computed class weight is passed as a parameter to the learning function of the model. Under this formulation, the fewer the samples of a class, the greater its weight.1 |
| S. Akarsh et al. [62] | | The custom categorical cross-entropy loss is the product of the traditional loss and the left-hand equation. The traditional loss does not consider class skew, whereas the weighting term decreases as the class size increases. In our setting, . |
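The equations referenced in the table are not reproduced in this extract. As an illustration of the inverse-frequency weighting that Alzammam et al. [11] describe, the sketch below uses the common balanced heuristic w_k = m / (K · m_k); this exact formula is an assumption, as is the Keras-style `class_weight` hand-off:

```python
import numpy as np

def balanced_class_weights(y, n_classes):
    """Inverse-frequency weights: rarer classes receive larger weights."""
    m = len(y)
    counts = np.bincount(y, minlength=n_classes)
    return m / (n_classes * counts)

y = np.array([0] * 8 + [1] * 2)       # class 1 is the minority
w = balanced_class_weights(y, 2)      # class 1 gets the larger weight
# In Keras, such weights would typically be supplied via
# model.fit(..., class_weight=dict(enumerate(w)))
```

The Akarsh et al. [62] variant instead folds an analogous per-class factor directly into the categorical cross-entropy loss, which is why it is described as a custom loss rather than a fit-time parameter.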
| Dataset | Technique | | | |
|---|---|---|---|---|
| | DQN | DDQN | Dueling | PER |
| Catak | ✗ | ✓ | ✓ | ✗ |
| VirusSample | ✓ | ✗ | ✓ | ✓ |
| Oliveira | ✓ | ✗ | ✓ | ✗ |
| VirusShare | ✗ | ✓ | ✓ | ✓ |
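The table above reports which value-based refinements were enabled per dataset. The core difference between DQN and Double DQN lies in the bootstrap target: DQN both selects and evaluates the next action with the target network, while Double DQN selects with the online network and evaluates with the target network. A minimal sketch (network outputs mocked as arrays, not the paper's implementation):

```python
import numpy as np

def dqn_target(r, q_next_target, gamma=0.99, done=False):
    """Vanilla DQN: max over the target network's own estimates."""
    return r if done else r + gamma * np.max(q_next_target)

def ddqn_target(r, q_next_online, q_next_target, gamma=0.99, done=False):
    """Double DQN: online net selects the action, target net evaluates it."""
    if done:
        return r
    a_star = int(np.argmax(q_next_online))
    return r + gamma * q_next_target[a_star]

q_online = np.array([1.0, 3.0])   # online net prefers action 1
q_target = np.array([2.0, 0.5])   # target net scores that action lower
# dqn_target(1.0, q_target)            -> 1.0 + 0.99 * 2.0 = 2.98
# ddqn_target(1.0, q_online, q_target) -> 1.0 + 0.99 * 0.5 = 1.495
```

The decoupled selection/evaluation in `ddqn_target` is what mitigates the overestimation bias of vanilla DQN, which explains why the two variants behave differently across the four datasets.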
| Algorithm | Cost-Sensitive Strategy | Training Time | Inference Time | F1-Score | AUC |
|---|---|---|---|---|---|
| LSTM | Alzammam et al. [11] | 355.892 | 2.319 | 0.009 | 0.500 |
| BiLSTM | | 11,166.234 | 29.632 | 0.519 | 0.857 |
| BiGRU | | 8840.091 | 26.029 | 0.015 | 0.500 |
| TabNet | | 1283.507 | 5.157 | 0.141 | 0.575 |
| LSTM | Akarsh et al. [62] | 296.317 | 2.338 | 0.098 | 0.500 |
| BiLSTM | | 11,122.994 | 29.627 | 0.676 | 0.795 |
| BiGRU | | 8745.446 | 26.149 | 0.015 | 0.500 |
| TabNet | | 1208.977 | 6.005 | 0.099 | 0.498 |
| RTF | None | 682.800 | 2.755 | 0.727 | 0.951 |
| SINNER | Equation (11) | 5087.307 | 0.044 | 0.744 | 0.832 |
| Algorithm | Cost-Sensitive Strategy | Training Time | Inference Time | F1-Score | AUC |
|---|---|---|---|---|---|
| LSTM | Alzammam et al. [11] | 188.601 | 1.202 | 0.014 | 0.500 |
| BiLSTM | | 1417.669 | 3.997 | 0.661 | 0.917 |
| BiGRU | | 1132.346 | 3.564 | 0.659 | 0.918 |
| TabNet | | 899.666 | 4.437 | 0.230 | 0.604 |
| LSTM | Akarsh et al. [62] | 143.860 | 1.215 | 0.129 | 0.500 |
| BiLSTM | | 1383.538 | 3.980 | 0.753 | 0.849 |
| BiGRU | | 1112.441 | 3.529 | 0.741 | 0.852 |
| TabNet | | 944.372 | 4.175 | 0.129 | 0.499 |
| RTF | None | 481.300 | 4.952 | 0.806 | 0.977 |
| SINNER | Equation (11) | 3605.696 | 0.021 | 0.791 | 0.864 |
| Algorithm | Cost-Sensitive Strategy | Training Time | Inference Time | F1-Score | AUC |
|---|---|---|---|---|---|
| LSTM | Alzammam et al. [11] | 92.839 | 0.798 | 0.186 | 0.556 |
| BiLSTM | | 362.934 | 1.764 | 0.541 | 0.737 |
| BiGRU | | 345.497 | 1.703 | 0.537 | 0.736 |
| TabNet | | 662.308 | 2.877 | 0.101 | 0.498 |
| LSTM | Akarsh et al. [62] | 91.102 | 0.798 | 0.178 | 0.547 |
| BiLSTM | | 352.377 | 1.785 | 0.515 | 0.721 |
| BiGRU | | 336.035 | 1.711 | 0.537 | 0.736 |
| TabNet | | 658.649 | 2.856 | 0.091 | 0.497 |
| RTF | None | 1626.000 | 4.952 | 0.615 | 0.882 |
| SINNER | Equation (11) | 1262.938 | 0.014 | 0.427 | 0.668 |
| Algorithm | Cost-Sensitive Strategy | Training Time | Inference Time | F1-Score | AUC |
|---|---|---|---|---|---|
| LSTM | Alzammam et al. [11] | 380.438 | 2.764 | 0.341 | 0.780 |
| BiLSTM | | 1217.654 | 4.006 | 0.400 | 0.782 |
| BiGRU | | 1051.595 | 4.411 | 0.410 | 0.790 |
| TabNet | | 3572.845 | 15.849 | 0.047 | 0.532 |
| LSTM | Akarsh et al. [62] | 373.986 | 2.656 | 0.561 | 0.735 |
| BiLSTM | | 1166.489 | 3.826 | 0.568 | 0.734 |
| BiGRU | | 1023.659 | 3.601 | 0.569 | 0.731 |
| TabNet | | 3428.129 | 17.121 | 0.137 | 0.516 |
| RTF | None | 8711.400 | 4.531 | 0.565 | 0.885 |
| SINNER | Equation (11) | 976.798 | 0.015 | 0.563 | 0.725 |
| Imbalanced Malware Classifier | Positive | Negative |
|---|---|---|
| LSTM | Combined with an appropriate loss-balancing strategy, the model performs adequately when the data exhibit strong temporal dependence, i.e., on a dynamic analysis-based dataset. In addition, its training and inference times are very competitive with those of the alternative DL solutions employed in the experiments. | Poor classification performance in 87.5% of the experiments involving this algorithm. |
| BiLSTM/BiGRU | The bidirectional models benefit from the cost-sensitive strategy proposed by Akarsh et al. | Poor timing performance; on three of the evaluated datasets, the classification scores fall far short of those achieved by competitors. |
| TabNet | In 2 out of 4 tests, it compares favorably with the other solutions in terms of training time. | Combined with either of the two evaluated cost-sensitive strategies, this algorithm was the worst classifier in all experiments. |
| RTF | State-of-the-art classification performance on three out of four datasets. | The model is an ensemble of transformers; the large number of parameters required by the underlying DL architectures can make it impractical in several application scenarios. Moreover, its state-of-the-art classification performance relies on dataset-specific hyperparameter tuning. |
| SINNER | Using the same hyperparameters on every tested dataset, the proposed solution approaches RTF in terms of F1-score and outperforms the top competitor on VirusShare. It achieves promising results on both static and dynamic analysis datasets. In addition, it is the fastest of the considered competitors at producing predictions. | Longer training time than RTF in 50% of the tests; lower AUC score than RTF. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Coscia, A.; Iannacone, A.; Maci, A.; Stamerra, A. SINNER: A Reward-Sensitive Algorithm for Imbalanced Malware Classification Using Neural Networks with Experience Replay. Information 2024, 15, 425. https://doi.org/10.3390/info15080425