CSMC: A Secure and Efficient Visualized Malware Classification Method Inspired by Compressed Sensing
Abstract
1. Introduction
2. Related Work
- (1) The proposed method is a compressed-sensing-inspired malware classification method that integrates deep learning. It compresses malware samples before data sharing, model training, and family classification, so the volume of data involved in these processes is significantly reduced, which is particularly critical in resource-constrained sensor environments. Moreover, unlike most existing compressed-sensing-based methods, which merely use deterministic measurement matrices to compress data, the proposed method extracts malware family features during compression. This helps it outperform classical compressed-sensing-based methods in both classification accuracy and reconstruction quality.
- (2) The proposed method secures malware classification by preventing classification participants from accurately reconstructing the original malware samples from the compressed ones. This is particularly important in IoT environments, because sensors typically have weaker security defenses and are therefore more susceptible to security threats. With the proposed method, malware samples are not directly exposed to classification participants and are difficult for malicious attackers to exploit.
- (3) Experiments demonstrate that the proposed method outperforms many existing compressed-sensing-based and machine- or deep-learning-based methods in classification accuracy and reconstruction quality, both with and without noise. In IoT environments, devices are often deployed in settings where they may encounter electromagnetic interference or signal attenuation. Resistance to noise ensures that malware classification accuracy and efficiency are maintained even under these challenging conditions, thereby improving the overall security of sensors.
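
As a point of reference for contribution (1), the sketch below shows the classical compressed-sensing measurement step that the Gaussian baseline in the experiments builds on: a binary is read as a vector of 8-bit pixel values and projected through a random Gaussian measurement matrix. It is not the CSMC compression network itself. The function names (`bytes_to_pixels`, `gaussian_measure`), the 1/m variance scaling, the seed, and the toy byte string are all assumptions made for illustration.

```python
import math
import random


def bytes_to_pixels(blob: bytes) -> list[float]:
    """Interpret each byte as one grayscale pixel in [0, 255] (flattened image)."""
    return [float(b) for b in blob]


def gaussian_measure(x: list[float], ratio: float, seed: int = 0) -> list[float]:
    """Classical CS measurement y = Phi @ x with an m-by-n Gaussian matrix Phi.

    m = ceil(ratio * n); entries of Phi are drawn from N(0, 1/m).
    The seed only fixes Phi so that the result is reproducible.
    """
    n = len(x)
    m = max(1, math.ceil(ratio * n))
    rng = random.Random(seed)
    sigma = 1.0 / math.sqrt(m)
    y = []
    for _ in range(m):
        # Generate one row of Phi at a time to avoid storing the full matrix.
        row = [rng.gauss(0.0, sigma) for _ in range(n)]
        y.append(sum(p * v for p, v in zip(row, x)))
    return y


# Toy "sample": 200 bytes standing in for a real binary (hypothetical data).
sample = bytes(range(200))
pixels = bytes_to_pixels(sample)
for ratio in (0.25, 0.5):
    y = gaussian_measure(pixels, ratio)
    print(f"ratio={ratio}: {len(pixels)} pixels -> {len(y)} measurements")
```

Only the `m` measurements need to be shared or fed to a classifier, which is where the reduction in data volume comes from.
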
3. Proposed Method
3.1. Step One: Sample Compression
Algorithm 1 Sample Compression Algorithm
3.2. Step Two: Malware Classification
Algorithm 2 The Proposed Classification Network Algorithm
4. Experiment
4.1. Metric
- $TP_i$: The number of samples that belong to family i and are correctly predicted to be in family i.
- $TN_i$: The number of samples that do not belong to family i and are correctly predicted not to belong to family i.
- $FP_i$: The number of samples that do not belong to family i but are incorrectly predicted to be in family i.
- $FN_i$: The number of samples that belong to family i but are incorrectly predicted to belong to other families.
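
A minimal sketch of how the four per-family counts (true positives, true negatives, false positives, false negatives) can be tallied from true and predicted labels. The metric formulas themselves are not shown in this section, so the precision, recall, and F1 definitions below are the usual textbook ones and are an assumption about what is reported; `family_counts`, `precision_recall_f1`, and the toy labels are hypothetical.

```python
from typing import Sequence


def family_counts(y_true: Sequence[str], y_pred: Sequence[str], family: str) -> dict:
    """Tally TP, TN, FP, FN for one family in a one-vs-rest fashion."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == family and p == family:
            tp += 1
        elif t != family and p != family:
            tn += 1
        elif t != family and p == family:
            fp += 1
        else:  # t == family and p != family
            fn += 1
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


def precision_recall_f1(c: dict) -> tuple:
    """Textbook definitions; zero denominators are mapped to 0.0."""
    precision = c["TP"] / (c["TP"] + c["FP"]) if c["TP"] + c["FP"] else 0.0
    recall = c["TP"] / (c["TP"] + c["FN"]) if c["TP"] + c["FN"] else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels with made-up family names.
y_true = ["A", "A", "A", "B", "B", "C"]
y_pred = ["A", "A", "B", "B", "A", "C"]
counts = family_counts(y_true, y_pred, "A")
print(counts, precision_recall_f1(counts))
```
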
4.2. Experimental Setting and Dataset
4.3. Experiment Results
4.3.1. Feature Extraction Analysis
4.3.2. Classification and Reconstruction Performance
4.3.3. Malware Code Reverting
4.3.4. Noise Robustness
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
IoT | Internet of Things |
CSMC | Compressed Sensing Malware Classification |
API | Application Programming Interface |
CNN | Convolutional Neural Network |
CS | Compressed Sensing |
MDNBC | Multidimensional Naive Bayes Classification |
LSTM | Long Short-Term Memory |
PE | Portable Executable |
DRSN | Deep Residual Shrinkage Network |
PSNR | Peak Signal-to-Noise Ratio |
MSE | Mean Squared Error |
References
- [1] Cui, Z.; Xue, F.; Cai, X.; Cao, Y.; Wang, G.; Chen, J. Detection of malicious code variants based on deep learning. IEEE Trans. Ind. Inform. 2018, 14, 3187–3196.
- [2] Aslan, Ö.A.; Samet, R. A comprehensive review on malware detection approaches. IEEE Access 2020, 8, 6249–6271.
- [3] Ye, Y.; Li, T.; Adjeroh, D.; Iyengar, S.S. A survey on malware detection using data mining techniques. ACM Comput. Surv. 2017, 50, 1–40.
- [4] Ranveer, S.; Hiray, S. Comparative analysis of feature extraction methods of malware detection. Int. J. Comput. Appl. 2015, 120, 975–980.
- [5] Sun, B.; Li, Q.; Guo, Y.; Wen, Q.; Lin, X.; Liu, W. Malware family classification method based on static feature extraction. In Proceedings of the 2017 3rd IEEE International Conference on Computer and Communications (ICCC), Chengdu, China, 13–16 December 2017; pp. 507–513.
- [6] Khammas, B.M.; Monemi, A.; Ismail, I.; Nor, S.M.; Marsono, M.N. Metamorphic malware detection based on support vector machine classification of malware sub-signatures. TELKOMNIKA (Telecommun. Comput. Electron. Control) 2016, 14, 1157–1165.
- [7] Yuan, Z.; Lu, Y.; Wang, Z.; Xue, Y. Droid-sec: Deep learning in android malware detection. In Proceedings of the 2014 ACM Conference on SIGCOMM, Chicago, IL, USA, 17–22 August 2014; pp. 371–372.
- [8] Ahmed, M.; Afreen, N.; Ahmed, M.; Sameer, M.; Ahamed, J. An inception V3 approach for malware classification using machine learning and transfer learning. Int. J. Intell. Netw. 2023, 4, 11–18.
- [9] David, O.E.; Netanyahu, N.S. Deepsign: Deep learning for automatic malware signature generation and classification. In Proceedings of the 2015 International Joint Conference on Neural Networks (IJCNN), Killarney, Ireland, 12–17 July 2015; pp. 1–8.
- [10] Avci, C.; Tekinerdogan, B.; Catal, C. Analyzing the performance of long short-term memory architectures for malware detection models. Concurr. Comput. Pract. Exp. 2023, 35, 1.
- [11] Khammas, B.M. Ransomware detection using random forest technique. ICT Express 2020, 6, 325–331.
- [12] Wu, Y.; Shi, J.; Wang, P.; Zeng, D.; Sun, C. DeepCatra: Learning flow-and graph-based behaviours for Android malware detection. IET Inf. Secur. 2023, 17, 118–130.
- [13] Li, X.; Qiu, K.; Qian, C.; Zhao, G. An adversarial machine learning method based on opcode n-grams feature in malware detection. In Proceedings of the 2020 IEEE Fifth International Conference on Data Science in Cyberspace (DSC), Hong Kong, China, 27–30 July 2020; pp. 380–387.
- [14] García, D.E.; DeCastro-García, N.; Castañeda, A.L.M. An effectiveness analysis of transfer learning for the concept drift problem in malware detection. Expert Syst. Appl. 2023, 212, 118724.
- [15] AV-Atlas. Malware Statistics. Available online: https://portal.av-atlas.org/malware/statistics (accessed on 1 January 2020).
- [16] Donoho, D.L. Compressed sensing. IEEE Trans. Inf. Theory 2006, 52, 1289–1306.
- [17] Zhang, Y.; Wang, P.; Huang, H.; Zhu, Y.; Xiao, D.; Xiang, Y. Privacy-assured FogCS: Chaotic compressive sensing for secure industrial big image data processing in fog computing. IEEE Trans. Ind. Inform. 2020, 17, 3401–3411.
- [18] Knill, C.; Schweizer, B.; Sparrer, S.; Roos, F.; Fischer, R.F.; Waldschmidt, C. High range and Doppler resolution by application of compressed sensing using low baseband bandwidth OFDM radar. IEEE Trans. Microw. Theory Tech. 2018, 66, 3535–3546.
- [19] Wu, W.; Peng, H.; Wen, L.; Liu, Y.; Tong, F.; Li, L. A Secure and Efficient Data Transmission Method with Multi-level Concealment Function Based on Chaotic Compressive Sensing. IEEE Sens. J. 2023, 23, 19823–19841.
- [20] Wu, W.; Peng, H.; Tong, F.; Li, L. A Chaotic Compressed Sensing-Based Multigroup Secret Image Sharing Method for IoT With Critical Information Concealment Function. IEEE Internet Things J. 2022, 10, 1192–1207.
- [21] Wu, W.; Peng, H.; Tong, F.; Li, L. Novel secure data transmission methods for IoT based on STP-CS with multilevel critical information concealment function. IEEE Internet Things J. 2022, 10, 4557–4578.
- [22] Wang, Y.; Doleschel, S.; Wunderlich, R.; Heinen, S. Evaluation of digital compressed sensing for real-time wireless ECG system with bluetooth low energy. J. Med. Syst. 2016, 40, 170.
- [23] Zhang, L.Y.; Wong, K.W.; Zhang, Y.; Zhou, J. Bi-level protected compressive sampling. IEEE Trans. Multimed. 2016, 18, 1720–1732.
- [24] Martins, N.; Cruz, J.M.; Cruz, T.; Abreu, P.H. Adversarial machine learning applied to intrusion and malware scenarios: A systematic review. IEEE Access 2020, 8, 35403–35419.
- [25] Gopinath, M.; Sethuraman, S.C. A comprehensive survey on deep learning based malware detection techniques. Comput. Sci. Rev. 2023, 47, 100529.
- [26] Shalaginov, A.; Banin, S.; Dehghantanha, A.; Franke, K. Machine learning aided static malware analysis: A survey and tutorial. In Cyber Threat Intelligence; Springer: Berlin/Heidelberg, Germany, 2018; pp. 7–45.
- [27] Brown, A.; Gupta, M.; Abdelsalam, M. Automated machine learning for deep learning based malware detection. Comput. Secur. 2024, 137, 103582.
- [28] Ucci, D.; Aniello, L.; Baldoni, R. Survey of machine learning techniques for malware analysis. Comput. Secur. 2019, 81, 123–147.
- [29] Yi, T.; Chen, X.; Zhu, Y.; Ge, W.; Han, Z. Review on the application of deep learning in network attack detection. J. Netw. Comput. Appl. 2023, 212, 103580.
- [30] Azeez, N.A.; Odufuwa, O.E.; Misra, S.; Oluranti, J.; Damaševičius, R. Windows PE malware detection using ensemble learning. Informatics 2021, 8, 10.
- [31] Syeda, D.Z.; Asghar, M.N. Dynamic Malware Classification and API Categorisation of Windows Portable Executable Files Using Machine Learning. Appl. Sci. 2024, 14, 1015.
- [32] Akhtar, M.S.; Feng, T. Malware analysis and detection using machine learning algorithms. Symmetry 2022, 14, 2304.
- [33] Chen, Z.; Ren, X. An efficient boosting-based windows malware family classification system using multi-features fusion. Appl. Sci. 2023, 13, 4060.
- [34] Aditya, W.R.; Hadiprakoso, R.B.; Waluyo, A. Deep learning for malware classification platform using windows api call sequence. In Proceedings of the 2021 International Conference on Informatics, Multimedia, Cyber and Information System (ICIMCIS), Jakarta, Indonesia, 28–29 October 2021; pp. 25–29.
- [35] Qiu, J.; Zhang, J.; Luo, W.; Pan, L.; Nepal, S.; Xiang, Y. A survey of Android malware detection with deep neural models. ACM Comput. Surv. 2021, 53, 126.
- [36] Liu, Y.; Tantithamthavorn, C.; Li, L.; Liu, Y. Deep learning for Android malware defenses: A systematic literature review. ACM Comput. Surv. 2023, 55, 153.
- [37] Yadav, P.; Menon, N.; Ravi, V.; Vishvanathan, S.; Pham, T.D. A two-stage deep learning framework for image-based android malware detection and variant classification. Comput. Intell. 2022, 38, 1748–1771.
- [38] Li, W.; Bao, H.; Zhang, X.Y.; Li, L. Amdetector: Detecting Large-Scale and Novel Android Malware Traffic with Meta-Learning. In Proceedings of the International Conference on Computational Science, London, UK, 21–23 June 2022; Springer: Cham, Switzerland, 2022; pp. 387–401.
- [39] Fallah, S.; Bidgoly, A.J. Android malware detection using network traffic based on sequential deep learning models. Softw. Pract. Exp. 2022, 52, 1987–2004.
- [40] Nataraj, L.; Karthikeyan, S.; Jacob, G.; Manjunath, B.S. Malware images: Visualization and automatic classification. In Proceedings of the 8th International Symposium on Visualization for Cyber Security, Pittsburgh, PA, USA, 20 July 2011.
- [41] Arp, D.; Spreitzenbarth, M.; Hübner, M.; Gascon, H.; Rieck, K. Drebin: Effective and explainable detection of android malware in your pocket. In Proceedings of the Network and Distributed System Security, San Diego, CA, USA, 23–26 February 2014; Volume 14.
- [42] Nataraj, L.; Yegneswaran, V.; Porras, P.; Zhang, J. A comparative assessment of malware classification using binary texture analysis and dynamic analysis. In Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence, Chicago, IL, USA, 21 October 2011.
- [43] Yue, S.; Wang, T. Imbalanced malware images classification: A CNN based approach. arXiv 2017, arXiv:1708.08042.
- [44] Makandar, A.; Patrot, A. Malware class recognition using image processing techniques. In Proceedings of the 2017 International Conference on Data Management, Analytics and Innovation (ICDMAI), Pune, India, 24–26 February 2017.
- [45] Yajamanam, S.; Selvin, V.R.S.; Troia, F.D.; Stamp, M. Deep Learning versus Gist Descriptors for Image-based Malware Classification. In Proceedings of the 4th International Conference on Information Systems Security and Privacy (ICISSP), Madeira, Portugal, 22–24 January 2018.
- [46] Bhodia, N.; Prajapati, P.; Troia, F.D.; Stamp, M. Transfer learning for image-based malware classification. arXiv 2019, arXiv:1903.11551.
- [47] Cui, Z.; Du, L.; Wang, P.; Cai, X.; Zhang, W. Malicious code detection based on CNNs and multi-objective algorithm. J. Parallel Distrib. Comput. 2019, 129, 50–58.
Layer | Channel | Kernel | Stride | Padding | Activation | Annotation |
---|---|---|---|---|---|---|
0 | c | - | - | - | - | Input Layer |
1 | c | k = | 1,1 | same | Linear | Conv_2d |
2 | c | k = | 1,1 | same | Linear | Conv_2d |
3 | | k = | 2,2 | same | | Max_pool_2d |
4–10 | Residual_shrinkage_block (out_channel = ) | | | | | |
11–17 | Residual_shrinkage_block (out_channel = ) | | | | | |
18–24 | Residual_shrinkage_block (out_channel = ) | | | | | |
25 | Batch_normalization (decay = 0.9) | | | | | |
26 | Activation (ReLU) | | | | | |
27 | Global_avg_pool | | | | | |
28 | Fully_connected (activation = softmax) | | | | | |
29 | Momentum (learning_rate = 0.1, lr_decay = 0.1, decay_step = 20,000) | | | | | |
30 | Regression (loss = categorical crossentropy) | | | | | |
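
Two pieces of this configuration can be illustrated in isolation: the soft-thresholding function that residual shrinkage blocks apply to their feature maps, and the learning-rate decay implied by layer 29. This is a sketch under assumptions. In a deep residual shrinkage network the threshold is learned per channel, whereas here `tau` is passed in by hand, and the decay is assumed to be a smooth exponential, `base_lr * lr_decay ** (step / decay_step)`, since the table does not say whether a staircase schedule is used. Both function names are hypothetical.

```python
def soft_threshold(values, tau):
    """Shrink each value toward zero by tau; values with |x| <= tau become 0."""
    out = []
    for x in values:
        if x > tau:
            out.append(x - tau)
        elif x < -tau:
            out.append(x + tau)
        else:
            out.append(0.0)
    return out


def decayed_lr(step, base_lr=0.1, lr_decay=0.1, decay_step=20_000):
    """Exponential decay (assumed non-staircase) using the values from layer 29."""
    return base_lr * lr_decay ** (step / decay_step)


print(soft_threshold([-2.0, -0.5, 0.25, 1.5], tau=1.0))  # [-1.0, 0.0, 0.0, 0.5]
for step in (0, 10_000, 20_000, 40_000):
    print(step, decayed_lr(step))
```

Small activations, which are more likely to carry noise than signal, are zeroed out, while larger ones are kept with reduced magnitude.
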
Compression Ratio | Compression Method | Classification Accuracy | PSNR (dB) |
---|---|---|---|
0.05 | Gaussian | 0.41067 | 8.05946 |
0.05 | Logistic | 0.43733 | 10.79386 |
0.05 | Tent | 0.43999 | 9.96238 |
0.05 | CSMC | 0.97867 | 19.02935 |
0.1 | Gaussian | 0.57333 | 8.16376 |
0.1 | Logistic | 0.48133 | 11.08026 |
0.1 | Tent | 0.57466 | 10.18503 |
0.1 | CSMC | 0.98000 | 20.07452 |
0.3 | Gaussian | 0.73333 | 8.37789 |
0.3 | Logistic | 0.74667 | 11.57817 |
0.3 | Tent | 0.73733 | 10.5878 |
0.3 | CSMC | 0.98000 | 24.32906 |
0.5 | Gaussian | 0.72533 | 8.50674 |
0.5 | Logistic | 0.71733 | 12.00514 |
0.5 | Tent | 0.69067 | 10.88755 |
0.5 | CSMC | 0.97733 | 28.64138 |
Compression Ratio | Compression Method | Classification Accuracy | PSNR (dB) |
---|---|---|---|
0.05 | Gaussian | 0.96852 | 27.69984 |
0.05 | Logistic | 0.98333 | 28.57814 |
0.05 | Tent | 0.98611 | 28.49639 |
0.05 | CSMC | 0.97500 | 36.65134 |
0.1 | Gaussian | 0.97222 | 28.07057 |
0.1 | Logistic | 0.97685 | 29.14585 |
0.1 | Tent | 0.97222 | 28.96592 |
0.1 | CSMC | 0.98056 | 39.59088 |
0.3 | Gaussian | 0.97407 | 28.96270 |
0.3 | Logistic | 0.96667 | 30.90944 |
0.3 | Tent | 0.97222 | 30.37308 |
0.3 | CSMC | 0.98148 | 45.83466 |
0.5 | Gaussian | 0.96759 | 29.18425 |
0.5 | Logistic | 0.97407 | 32.00142 |
0.5 | Tent | 0.97407 | 31.10516 |
0.5 | CSMC | 0.97778 | 50.63212 |
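
The PSNR column in the two tables above measures reconstruction quality. A minimal sketch of the usual computation from the MSE follows, assuming 8-bit grayscale pixels with a peak value of 255. The peak value and the flattened-list image representation are assumptions, as this section does not state them, and the pixel values are made up.

```python
import math


def mse(original, reconstructed):
    """Mean squared error between two equally sized flattened images."""
    if len(original) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def psnr(original, reconstructed, peak=255.0):
    """PSNR in dB: 10 * log10(peak^2 / MSE); infinite for identical images."""
    err = mse(original, reconstructed)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)


original = [0, 64, 128, 255]
reconstructed = [10, 54, 138, 245]  # every pixel is off by 10, so MSE = 100
print(round(psnr(original, reconstructed), 2))  # 28.13
```

Because the scale is logarithmic, each 10 dB gain corresponds to a tenfold reduction in MSE.
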
Metric | F-Value | p-Value |
---|---|---|
Malimg Accuracy | 8.450406 | 0.002744 |
Drebin Accuracy | 1.660423 | 0.228034 |
Malimg PSNR | 35.495435 | 0.000003 |
Drebin PSNR | 17.364036 | 0.000116 |
Metric | H-Value | p-Value |
---|---|---|
Malimg Accuracy | 8.505155 | 0.036648 |
Drebin Accuracy | 5.379464 | 0.146028 |
Malimg PSNR | 13.786765 | 0.003210 |
Drebin PSNR | 10.213235 | 0.016838 |
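
The F- and H-values for Malimg accuracy can be reproduced from the first results table if the four compression methods are treated as groups and the four compression ratios as the observations within each group. The sketch below does this with a hand-written one-way ANOVA F statistic and a tie-corrected Kruskal–Wallis H statistic. The grouping, and the reading of the H-value as Kruskal–Wallis, are inferences from the numbers matching rather than something stated in this section. The p-values are omitted because they require the F and chi-square distributions (available, for example, in SciPy).

```python
from collections import Counter


def anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def kruskal_h(groups):
    """Kruskal-Wallis H using average ranks for ties and the standard tie correction."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    rank = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2  # mean of 1-based ranks i+1 .. j
        i = j
    rank_term = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    ties = Counter(pooled).values()
    correction = 1.0 - sum(t ** 3 - t for t in ties) / (n ** 3 - n)
    return h / correction


# Accuracy at ratios 0.05, 0.1, 0.3, 0.5, copied from the first results table.
malimg_accuracy = [
    [0.41067, 0.57333, 0.73333, 0.72533],  # Gaussian
    [0.43733, 0.48133, 0.74667, 0.71733],  # Logistic
    [0.43999, 0.57466, 0.73733, 0.69067],  # Tent
    [0.97867, 0.98000, 0.98000, 0.97733],  # CSMC
]
f_value = anova_f(malimg_accuracy)
h_value = kruskal_h(malimg_accuracy)
print(f_value, h_value)  # compare with 8.450406 and 8.505155 reported above
```

With four groups of four observations, the F statistic has (3, 12) degrees of freedom and the H statistic is compared against a chi-square distribution with 3 degrees of freedom.
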
Method | Technique | Input | Reconstructable | Accuracy |
---|---|---|---|---|
[42] | Machine Learning | Uncompressed | No | 0.9718 |
[43] | Deep Learning | Uncompressed | No | 0.9732 |
[44] | Machine Learning | Uncompressed | No | 0.8911 |
[1] | Deep Learning | Uncompressed | No | 0.9760 |
[45] | Machine Learning | Uncompressed | No | 0.97 |
[46] | Deep Learning | Uncompressed | No | 0.9480 |
[1] | Machine Learning | Uncompressed | No | 0.922 |
[1] | Machine Learning | Uncompressed | No | 0.9190 |
[1] | Machine Learning | Uncompressed | No | 0.9320 |
[1] | Machine Learning | Uncompressed | No | 0.9250 |
[1] | Deep Learning | Uncompressed | No | 0.9450 |
[47] | Deep Learning | Uncompressed | No | 0.976 |
This Paper | Deep Learning | Compressed | Yes | 0.98 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wu, W.; Peng, H.; Zhu, H.; Zhang, D. CSMC: A Secure and Efficient Visualized Malware Classification Method Inspired by Compressed Sensing. Sensors 2024, 24, 4253. https://doi.org/10.3390/s24134253