A Malicious Code Detection Method Based on FF-MICNN in the Internet of Things
Abstract
1. Introduction
- Malicious code is converted into a grayscale image that serves as the input to the improved network model, which turns the malicious code detection task into an image classification task;
- The FF-MICNN algorithm is proposed. The opcode sequence features and grayscale image features are fused, and the improved CNN is used for detection.
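The byte-to-image conversion in the first contribution can be illustrated with a short Python sketch. It assumes that each byte of the binary becomes one pixel with intensity 0 to 255, that rows have a fixed width, and that oversized files are truncated to `max_rows` rows. `bytes_to_grayscale`, `width`, and `max_rows` are hypothetical names, and the concrete image size used by FF-MICNN is not stated in this section.

```python
from typing import List, Optional


def bytes_to_grayscale(data: bytes, width: int = 256,
                       max_rows: Optional[int] = None) -> List[List[int]]:
    """Map each byte of a binary file to one 8-bit grayscale pixel.

    Byte 0x00 becomes black (0) and 0xFF becomes white (255).
    `width` and `max_rows` are illustrative assumptions, not values from the paper.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = (len(data) + width - 1) // width          # ceiling division
    if max_rows is not None:
        rows = min(rows, max_rows)                   # truncate large files to a fixed height
    image = []
    for r in range(rows):
        row = list(data[r * width:(r + 1) * width])  # bytes -> ints in [0, 255]
        row += [0] * (width - len(row))              # zero-pad the last partial row
        image.append(row)
    return image


# Usage (hypothetical file name):
# with open("sample.bin", "rb") as f:
#     img = bytes_to_grayscale(f.read(), width=256, max_rows=256)
```

Fixing the width keeps consecutive bytes adjacent within a row, so local byte patterns become local texture that a convolution kernel can pick up.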
2. Related Work
3. FF-MICNN Algorithm
3.1. Data Processing
3.1.1. Grayscale Image Feature Extraction
3.1.2. Feature Extraction of Opcode Sequences
Algorithm 1. Opcode sequence extraction algorithm
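The steps of Algorithm 1 are not reproduced here. As a hedged illustration of what opcode sequence extraction typically involves, the sketch below takes the mnemonic from each line of a disassembly listing and counts opcode n-grams. The listing format (optional `label:` prefix, `;` comments), the `OPCODES` vocabulary, and the use of n-grams are assumptions for illustration, not details of the authors' algorithm.

```python
from collections import Counter
from typing import Iterable, List

# Hypothetical opcode vocabulary for illustration; a real extractor would use the
# full instruction set of the target architecture.
OPCODES = {"mov", "push", "pop", "call", "jmp", "jz", "jnz", "cmp",
           "xor", "add", "sub", "lea", "ret", "test"}


def extract_opcodes(asm_lines: Iterable[str]) -> List[str]:
    """Return the mnemonic of every instruction line, in program order."""
    sequence = []
    for line in asm_lines:
        code = line.split(";", 1)[0].strip().lower()  # drop assembler comments
        tokens = code.split()
        if tokens and tokens[0].endswith(":"):        # skip a leading label
            tokens = tokens[1:]
        if tokens and tokens[0] in OPCODES:           # ignore data directives (db, dd, ...)
            sequence.append(tokens[0])
    return sequence


def opcode_ngrams(sequence: List[str], n: int = 2) -> Counter:
    """Count contiguous n-grams of opcodes."""
    return Counter(tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1))


listing = ["start: push ebp", "mov ebp, esp ; prologue", "db 90h",
           "xor eax, eax", "pop ebp", "ret"]
ops = extract_opcodes(listing)   # ['push', 'mov', 'xor', 'pop', 'ret']
bigrams = opcode_ngrams(ops, 2)  # 4 distinct bigrams, each counted once
```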
3.1.3. Feature Fusion
Algorithm 2. Feature fusion algorithm
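The body of Algorithm 2 is likewise not available in this section. One simple way to fuse the two feature types into a single CNN input, assumed here purely for illustration, is to scale both to [0, 1], concatenate them, and zero-pad the result into a square matrix. `fuse_features` and this normalization scheme are hypothetical and may differ from the authors' fusion rule.

```python
import math
from typing import List, Sequence


def fuse_features(image: Sequence[Sequence[int]],
                  opcode_counts: Sequence[float]) -> List[List[float]]:
    """Concatenate normalized pixels and opcode frequencies into one square matrix."""
    pixels = [p / 255.0 for row in image for p in row]       # grayscale -> [0, 1]
    total = sum(opcode_counts)
    freqs = ([c / total for c in opcode_counts]              # relative frequencies
             if total else [0.0] * len(opcode_counts))
    fused = pixels + freqs
    side = math.isqrt(len(fused))
    if side * side < len(fused):
        side += 1                                            # smallest square that fits
    fused += [0.0] * (side * side - len(fused))              # zero-pad the tail
    return [fused[r * side:(r + 1) * side] for r in range(side)]


fused = fuse_features([[0, 255], [255, 0]], [1, 3])
# [[0.0, 1.0, 1.0], [0.0, 0.25, 0.75], [0.0, 0.0, 0.0]]
```

Reshaping to a square keeps the fused vector compatible with 2-D convolution; the price is that opcode features sit next to unrelated pixels in the grid.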
3.2. Model Building
3.2.1. Convolution Layer
3.2.2. Pooling Layer
3.2.3. Added Layer
3.2.4. Fully Connected Layer
3.3. Classification
4. Simulation and Analysis
4.1. Experimental Data
4.2. Evaluation Index
4.3. Parameter Setting
4.4. Experimental Simulation
4.4.1. Simulation Experiment of Opcode Sequence
4.4.2. Grayscale Image Simulation Experiment
4.4.3. Simulation Detection of Fused Features
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Shen, G.; Chen, Z.; Wang, H.; Chen, H.; Wang, S. Feature fusion-based malicious code detection with dual attention mechanism and BiLSTM. Comput. Secur. 2022, 119, 1–14.
- Trivikram, M.; Nir, N. Improving malicious email detection through novel designated deep-learning architectures utilizing entire email. Neural Netw. 2022; in press.
- Wang, Q.; Qian, Q. Malicious code classification based on opcode sequences and textCNN network. J. Inf. Secur. Appl. 2022, 67, 1–12.
- Hou, J.; Liu, F.; Lu, H.; Tan, Z.; Zhuang, X.; Tian, Z. A novel flow-vector generation approach for malicious traffic detection. J. Parallel Distrib. Comput. 2022, 169, 72–86.
- Malka, N. Estimation of the success probability of a malicious attacker on blockchain-based edge network. Comput. Netw. 2022; in press.
- Rasim, M.; Fargana, J.; Sabira, S. Image-based malicious Internet content filtering method for child protection. J. Inf. Secur. Appl. 2022, 65, 103123.
- Lara, K.; Divakaran, L. Predicting stock market returns from malicious attacks: A comparative analysis of vector autoregression and time-delayed neural networks. Decis. Support Syst. 2022, 51, 745–759.
- Marcus, B.; Marco, Z.; Daniela, O.; Andre, G. HEAVEN: A Hardware-Enhanced AntiVirus ENgine to accelerate real-time, signature-based malware detection. Expert Syst. Appl. 2022, 201, 117083.
- Wu, J.; Wang, W.; Huang, L.; Zhang, F. Intrusion detection technique based on flow aggregation and latent semantic analysis. Appl. Soft Comput. 2022, 127, 109375.
- Zhu, J.; Wu, Z.; Guan, Z. API Sequences Based Malware Detection for Android. In Proceedings of the Ubiquitous Intelligence & Computing & IEEE Intl Conf on Autonomic & Trusted Computing & IEEE Intl Conf on Scalable Computing & Communications & Its Associated Workshops, Beijing, China, 21 July 2016; pp. 673–676.
- Zhang, F.; Zhao, T. Malware Detection and Classification Based on N-Grams Attribute Similarity. In Proceedings of the 2017 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC), Guangzhou, China, 21–24 July 2017; pp. 793–796.
- Abhijit, Y.; Maninder, S. Malware detection based on opcode frequency. In Proceedings of the 2016 International Conference on Advanced Communication Control and Computing Technologies (ICACCCT), Pyeongchang, South Korea, 31 January–3 February 2016; pp. 646–649.
- Kang, B.; Yerima, S.Y.; Sezer, S.; McLaughlin, K. N-gram Opcode Analysis for Android Malware Detection. Int. J. Cyber Situat. Aware. 2016, 1, 231–254.
- Imran, M.; Afzal, M.T.; Qadir, M.A. Similarity-Based Malware Classification Using Hidden Markov Model. In Proceedings of the 2015 Fourth International Conference on Cyber Security, Cyber Warfare, and Digital Forensic (CyberSec), Jakarta, Indonesia, 29–31 October 2015; pp. 129–134.
- Siddiqui, M.; Wang, M.; Lee, J. Detecting Internet Worms Using Data Mining Techniques. J. Syst. Cybern. Inform. 2008, 6, 48–53.
- Moser, A.; Kruegel, C.; Kirda, E. Limits of Static Analysis for Malware Detection. In Proceedings of the Twenty-Third Annual Computer Security Applications Conference (ACSAC 2007), Miami Beach, FL, USA, 10–14 December 2007; pp. 421–430.
- Hisham, S.G.; Yousef, B.M.; Mohammed, A.A. Behavior-based features model for malware detection. J. Comput. Virol. Hacking Tech. 2016, 12, 59–67.
- Li, M.; Jia, X.; Wang, R.; Lin, D. A Feature Selection and Modelling Method for Malicious Code. Comput. Appl. Softw. 2015, 32, 266–271.
- Rong, F.; Zuo, Z.; Fang, Y. MACSPMD: Malicious Code Detection Based on Malicious API Call Sequence Pattern Mining. Comput. Sci. 2018, 45, 131–138.
- Ucci, D.; Aniello, L.; Baldoni, R. Survey of machine learning techniques for malware analysis. Comput. Secur. 2019, 81, 123–147.
- Davuluru, V.S.P.; Narayanan, B.N.; Balster, E.J. Convolutional Neural Networks as Classification Tools and Feature Extractors for Distinguishing Malware Programs. In Proceedings of the 2019 IEEE National Aerospace and Electronics Conference (NAECON), Dayton, OH, USA, 15–19 July 2019; pp. 273–278.
- Mohaisen, A.; Alrawi, O.; Mohaisen, M. AMAL: High-fidelity, behavior-based automated malware analysis and classification. Comput. Secur. 2015, 52, 251–266.
- Liu, Y.; Wang, Z.; Hou, Y. Malware visualization and automatic classification with enhanced information density. J. Tsinghua Univ. 2019, 59, 9–14.
- Wan, L.; Xia, J.; Zhu, Y.; Lv, Z. An Improved Semi-supervised Feature Selection Algorithm Based on Information Entropy. Stat. Decis. 2021, 17, 66–70.
- Han, X.; Qu, W.; Yao, X.X.; Guo, C.Y.; Zhou, F. Research on Malicious Code Variant Detection Method Based on Texture Fingerprint. J. Commun. 2014, 35, 125–136.
- Hashem, H.; Ali, H. Visual malware detection using local malicious pattern. J. Comput. Virol. Hacking Tech. 2019, 15, 1–14.
- Xiao, G.; Li, J.; Chen, Y.; Li, K. MalFCS: An effective malware classification framework with automated feature extraction based on deep convolutional neural networks. J. Parallel Distrib. Comput. 2020, 141.
- Chu, Q.; Liu, G.; Zhu, X. Visualization Feature and CNN Based Homology Classification of Malicious Code. Chin. J. Electron. 2020, 29, 154–160.
- Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. 2015, 61, 85–117.
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- Ke, J.; Lin, R.; Sharma, A. An Automatic Instrument Recognition Approach Based on Deep Convolutional Neural Network. Recent Adv. Electr. Electron. Eng. 2021, 14–16.
- Qiang, H.; Guo, Y.; Tian, L. Research on malicious code detection method based on deep belief network. Comput. Technol. Dev. 2019, 29, 93–97.
- Kumar, R.; Zhang, X.; Wang, W.; Khan, R.U.; Kumar, J.; Sharif, A. A Multimodal Malware Detection Technique for Android IoT Devices Using Various Features. IEEE Access 2019, 7, 64411–64430.
- Ren, W.; Zhai, L.; Jia, J.; Wang, L.; Zhang, L. Learning selection channels for image steganalysis in spatial domain. Neurocomputing 2020, 401, 10012–10026.
- Chechlinski, U.; Siemitkowska, B.; Majewski, M. A System for Weeds and Crops Identification-Reaching over 10 FPS on Raspberry Pi with the Usage of MobileNets, DenseNet and Custom Modifications. Sensors 2019, 19, 3787.
- Hamzeh, A.; Bakhshinejad, N. Parallel-CNN Network for Malware Detection. IET Inf. Secur. 2019, 14, 210–219.
- Gibert, D.; Mateu, C.; Planes, J. Using convolutional neural networks for classification of malware represented as images. J. Comput. Virol. Hacking Tech. 2019, 15, 15–28.
- Cui, Z.; Xue, F.; Cai, X.; Cao, Y.; Wang, G.; Chen, J. Detection of Malicious Code Variants Based on Deep Learning. IEEE Trans. Ind. Inform. 2018, 14, 3187–3196.
- Lang, D.; Ding, W.; Jiang, H.; Chen, Z. Malicious Code Classification Algorithm Based on Multi-feature Fusion. J. Comput. Appl. 2019, 39, 2333–2338.
- Xiu, Y.; Liu, J. Malware Detection Based on Opcode Sequence Frequency Vector and Behavior Feature Vector. Inf. Secur. Commun. Priv. 2016, 9, 97–101.
- Li, S.; Wang, C.; Shi, Y. Malicious Code Detection Based on Multi-feature Random Forest. Comput. Appl. Softw. 2020, 37, 328–333.
- Luo, S. Research on Deep Learning Malicious Code Analysis and Detection Technology. Ph.D. Thesis, Xinjiang University, Ürümqi, China, 2018.
Authors | Algorithm Description | Merits and Limitations |
---|---|---|
Zhu et al. [10] | A malicious code detection method using the API sequence of malicious code. | Can quickly detect malicious code; cannot detect newly emerged malicious code. |
Zhang et al. [11] | Uses attribute similarity to detect malicious code. | Can quickly analyze malicious code; cannot accurately analyze obfuscated code. |
Abhijit et al. [12] | A malicious code detection method based on frequency characteristics. | Has high accuracy and can detect wrong and missed code; consumes a large amount of resources. |
Kang et al. [13] | An n-gram opcode feature-based approach that uses machine learning to identify and categorize Android malware. | Supports automatic feature discovery without relying on prior expert or domain knowledge. |
Imran et al. [14] | A detection scheme based on a hidden Markov model and a discriminant classifier. | Relies heavily on the API sequence and requires a large amount of computation. |
Siddiqui et al. [15] | A worm prevention technique using a data mining framework. | Can improve the detection rate of new worms without using a large amount of data. |
Moser et al. [16] | A binary obfuscation scheme. | Easily evaded if the malicious code is packed or obfuscated. |
Authors | Algorithm Description | Merits and Limitations |
---|---|---|
Hisham et al. [17] | A behavior-based feature model that dynamically analyzes and evaluates a dataset of malicious code. | Can quickly detect malicious code; cannot detect newly emerged malicious code. |
Li et al. [18] | A detection method based on the semantic dynamic characteristics of malicious code. | Can detect malicious code, but not evasive malicious code. |
Rong et al. [19] | MACSPMD, a detection method that mines API call sequence patterns. | Outperforms similar algorithms in detecting unknown malicious code. |
Ucci et al. [20] | A scalable clustering method to identify and group malware samples. | Can identify and group similar malware programs with better accuracy. |
Phode et al. [21] | A model that predicts malware in executing files. | Determines the presence of malware before it executes a payload. |
Mohaisen et al. [22] | A behavior-based automated malware analysis and classification system. | Difficult to apply when the volume of malicious code data is very large. |
Authors | Algorithm Description | Merits and Limitations |
---|---|---|
Wan et al. [24] | A malicious code classification method based on analytic behavior. | Can improve algorithm performance while reducing data dimensionality. |
Han et al. [25] | A malicious code detection method based on the texture features of malicious code. | Cannot sufficiently eliminate the influence of manual intervention, nor can it achieve end-to-end detection. |
Hashemi et al. [26] | An image-based method to detect unknown malicious code. | Can be regarded as a flexible framework. |
Xiao et al. [27] | A strategy for selecting a deep learning model that fits malware visualization images. | Achieved a classification accuracy of 99.72%, higher than that of other classification methods. |
Serial Number | Malicious Code Family | Description | Sample Size |
---|---|---|---|
1 | Lollipop | — | 2478 |
2 | Ramnit | Contains the code for a powerful botnet | 1542 |
3 | Simda | Sophisticated family combining four types of malicious code: botnets, Trojans, backdoors, and password theft | 42 |
4 | Vundo | Trojans and worms | 474 |
5 | Tracur | Trojans | 751 |
6 | Gatak | Trojan horse | 1013 |
7 | Kelihos_ver3 | Encrypted P2P botnets | 2942 |
8 | Kelihos_ver1 | Bot | 398 |
9 | Obfuscator.ACY | Malicious code formed by a combination of four methods | 1228 |
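The family sizes in the table above are highly imbalanced (Simda has only 42 samples, Kelihos_ver3 has 2942), so splitting each family separately keeps every family represented in both sets at the 7:3 training/test ratio listed in the parameter settings. The sketch below assumes such a stratified split with a fixed random seed; the paper does not state how its partition was drawn, and `stratified_split` and the placeholder sample IDs are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Per-family sample counts copied from the dataset table.
FAMILY_SIZES: Dict[str, int] = {
    "Lollipop": 2478, "Ramnit": 1542, "Simda": 42, "Vundo": 474, "Tracur": 751,
    "Gatak": 1013, "Kelihos_ver3": 2942, "Kelihos_ver1": 398, "Obfuscator.ACY": 1228,
}

Pair = Tuple[str, str]  # (sample id, family label)


def stratified_split(samples_by_family: Dict[str, Sequence[str]],
                     train_ratio: float = 0.7,
                     seed: int = 0) -> Tuple[List[Pair], List[Pair]]:
    """Split each family separately so every family keeps the same train/test ratio."""
    rng = random.Random(seed)
    train: List[Pair] = []
    test: List[Pair] = []
    for family, samples in samples_by_family.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_ratio)
        train.extend((s, family) for s in shuffled[:cut])
        test.extend((s, family) for s in shuffled[cut:])
    return train, test


# Usage with placeholder sample IDs (hypothetical):
samples = {f: [f"{f}_{i}" for i in range(n)] for f, n in FAMILY_SIZES.items()}
train, test = stratified_split(samples)
print(len(train) + len(test))  # 10868
```

With these counts, Simda contributes 29 training and 13 test samples, whereas an unstratified random split could leave it with very few test samples.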
Actual \ Predicted | Real (positive) samples | Pseudo (negative) samples |
---|---|---|
Real (positive) samples | TP | FN |
Pseudo (negative) samples | FP | TN |
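The evaluation indices reported in the experiments (TPR, FPR, accuracy, and F1) follow from the four confusion-matrix counts using the standard definitions: TPR = TP/(TP + FN), FPR = FP/(FP + TN), accuracy = (TP + TN)/(TP + FN + FP + TN), and F1 is the harmonic mean of precision TP/(TP + FP) and TPR. Treating real samples as the positive class follows the matrix layout. How the paper aggregates these binary metrics over the nine families (for example, per-family one-vs-rest averaging) is not stated here, so this sketch covers only the binary case, and `detection_metrics` is a hypothetical helper.

```python
from typing import Dict


def detection_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Compute the evaluation indices from confusion-matrix counts."""
    tpr = tp / (tp + fn) if tp + fn else 0.0       # true positive rate (recall)
    fpr = fp / (fp + tn) if fp + tn else 0.0       # false positive rate
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    # F1: harmonic mean of precision and recall
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    return {"TPR": tpr, "FPR": fpr, "Accuracy": accuracy, "F1": f1}


m = detection_metrics(tp=90, fn=10, fp=5, tn=95)
# TPR = 0.9, FPR = 0.05, Accuracy = 0.925, F1 = 180/195 (about 0.923)
```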
Parameter | Value | Description |
---|---|---|
Input size | Variable | Fused features of malicious code |
Convolution kernel | 3 × 3 | Size of the convolution kernel |
Stride | 1 | Stride of the convolution kernel and pooling window |
Pooling window | 2 × 2 | Size of the pooling window |
Learning rate | 0.001 | Learning rate of FF-MICNN |
Iterations | 15 | Number of training iterations |
Activation function | ReLU | Activation function used by FF-MICNN |
Classifier | Softmax | Softmax regression classification model |
Dataset partition | 7:3 | Ratio of training data to test data |
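To make the layer settings in the table concrete, the following pure-Python sketch applies one 3 × 3 convolution with stride 1, ReLU, a 2 × 2 pooling window with stride 1, and softmax. It assumes a single channel, no padding, and max pooling (the pooling type is not given in the table); the kernel values are arbitrary, and the added and fully connected layers of FF-MICNN are omitted. Without padding, each layer shrinks an n × n input to (n - k)/s + 1 per side, so a 6 × 6 input becomes 4 × 4 after convolution and 3 × 3 after pooling.

```python
import math
from typing import List

Matrix = List[List[float]]


def conv2d(x: Matrix, k: Matrix, stride: int = 1) -> Matrix:
    """Valid (unpadded) 2-D cross-correlation, as computed by CNN convolution layers."""
    kh, kw = len(k), len(k[0])
    out_h = (len(x) - kh) // stride + 1
    out_w = (len(x[0]) - kw) // stride + 1
    return [[sum(x[i * stride + a][j * stride + b] * k[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(x: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in x]


def max_pool(x: Matrix, size: int = 2, stride: int = 1) -> Matrix:
    """Max pooling; stride 1 follows the table, although stride 2 is a common choice."""
    out_h = (len(x) - size) // stride + 1
    out_w = (len(x[0]) - size) // stride + 1
    return [[max(x[i * stride + a][j * stride + b]
                 for a in range(size) for b in range(size))
             for j in range(out_w)]
            for i in range(out_h)]


def softmax(z: List[float]) -> List[float]:
    m = max(z)                        # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


if __name__ == "__main__":
    x = [[float(6 * i + j) for j in range(6)] for i in range(6)]
    k = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]   # identity kernel: copies the centre pixel
    feat = max_pool(relu(conv2d(x, k)))
    print(len(feat), len(feat[0]))           # 3 3
    print(softmax([1.0, 2.0, 3.0]))
```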
Algorithm Name | TPR | FPR | Accuracy | F1 |
---|---|---|---|---|
Parallel-CNN [36] | 0.959 | 0.0326 | 0.9818 | 0.9805 |
CNN-Image [37] | 0.9387 | 0.0735 | 0.9406 | 0.9317 |
CNN+Image+Bat [38] | 0.9255 | 0.0311 | 0.9346 | 0.9221 |
FF-MICNN | 0.9614 | 0.0311 | 0.986 | 0.9817 |
Method | Accuracy | Description |
---|---|---|
Kaggle competition winner | 98% | Feature fusion of image features, opcodes, and headers |
Lang et al. | 87% | Feature fusion of opcode sequences and GIST features |
Xiu et al. | 96% | Feature fusion of opcode sequence frequency and behavior features |
Li et al. | 95.518% | Feature fusion of grayscale image and color features |
Luo | 95.76% | Feature fusion of grayscale image texture and operation frequency |
FF-MICNN | 98.6% | Feature fusion of truncated-scale grayscale images and opcode sequences |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, W.; Feng, Y.; Han, G.; Zhu, H.; Tan, X. A Malicious Code Detection Method Based on FF-MICNN in the Internet of Things. Sensors 2022, 22, 8739. https://doi.org/10.3390/s22228739