ConvProtoNet: Deep Prototype Induction towards Better Class Representation for Few-Shot Malware Classification
Abstract
1. Introduction
2. Related Work
2.1. Malware Classification
2.2. Malware Image
2.3. Few-Shot Learning
3. Method
3.1. Problem Definition
3.2. Training Strategy
3.3. Conversion from Malware to Images
Algorithm 1 Conversion from malware to images
Input: b: byte sequence of malware; l: length of the byte sequence; w: expected width of the square malware image
Output: square malware image x
1: Convert each byte in the byte sequence to an integer in the range [0, 255]
2: Compute the side length of the largest square that fits the sequence: s = ⌊√l⌋
3: Drop the last l − s² integers in the sequence
4: Reshape the integer sequence to a 2D square matrix of size s × s
5: Convert the 2D matrix to a gray-scale image x
6: if s < w then
7: Do up-sampling of x to size w × w
8: else
9: Do down-sampling of x to size w × w
10: return x
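The conversion in Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes the square side is the integer square root of the sequence length and that resampling uses nearest-neighbour index mapping (the interpolation method is not stated in the algorithm). The function name `malware_to_image` is hypothetical.

```python
import math
from typing import List


def malware_to_image(data: bytes, width: int) -> List[List[int]]:
    """Turn a raw byte sequence into a width x width gray-scale image."""
    if width <= 0:
        raise ValueError("width must be positive")
    # Step 1: iterating over bytes yields integers in [0, 255].
    values = list(data)
    # Step 2 (assumption): side of the largest square that fits the sequence.
    side = math.isqrt(len(values))
    if side == 0:
        raise ValueError("byte sequence is empty")
    # Step 3: drop the trailing integers that do not fit the square.
    values = values[: side * side]
    # Steps 4-5: reshape into a side x side matrix of gray levels.
    image = [values[r * side:(r + 1) * side] for r in range(side)]
    # Steps 6-9 (assumption): nearest-neighbour resampling. The same index
    # mapping up-samples when side < width and down-samples otherwise.
    idx = [i * side // width for i in range(width)]
    return [[image[r][c] for c in idx] for r in idx]
```

A call such as `malware_to_image(open(path, "rb").read(), 256)` would yield a 256 × 256 matrix of gray levels; both `path` and the width of 256 are placeholders chosen for illustration. A smoother interpolation (for example bilinear) could replace the index mapping without changing the rest of the pipeline.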
3.4. Model Architecture
3.4.1. Embedding Module
3.4.2. Convolutional Prototype Induction
3.4.3. Modified Cosine Similarity SoftMax Classifier Generator
4. Experimental Evaluation
4.1. Datasets and Preprocessing
4.1.1. Large PE Malware Dataset
4.1.2. VirusShare Malware Dataset
4.1.3. Drebin Dataset
4.2. Experiment Setup
4.3. Baseline Models
4.4. Validation across Different Datasets
4.5. Implementation Details
4.6. Experiment Results
5. Discussion and Analysis
5.1. Effectiveness of Malware Image
5.2. Why ConvProtoNet Works: Some Comparisons
5.2.1. With Hybrid Attention-Based Prototypical Network
5.2.2. With Prototype Network
5.2.3. With Induction Network
6. Future Work
7. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
1. Gandotra, E.; Bansal, D.; Sofat, S. Malware analysis and classification: A survey. J. Inf. Secur. 2014, 5, 56.
2. Ye, Y.; Li, T.; Adjeroh, D.; Iyengar, S.S. A survey on malware detection using data mining techniques. ACM Comput. Surv. (CSUR) 2017, 50, 41.
3. Raff, E.; Barker, J.; Sylvester, J.; Brandon, R.; Catanzaro, B.; Nicholas, C.K. Malware detection by eating a whole exe. In Proceedings of the Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018.
4. Kolosnjaji, B.; Zarras, A.; Webster, G.; Eckert, C. Deep learning for classification of malware system call sequences. In Australasian Joint Conference on Artificial Intelligence; Springer: Cham, Switzerland, 2016; pp. 137–149.
5. Pascanu, R.; Stokes, J.W.; Sanossian, H.; Marinescu, M.; Thomas, A. Malware classification with recurrent networks. In Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, Australia, 19–24 April 2015; pp. 1916–1920.
6. Wang, Y.; Yao, Q. Few-shot learning: A survey. arXiv 2019, arXiv:1904.05046.
7. Li, F.-F.; Fergus, R.; Perona, P. One-shot learning of object categories. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 594–611.
8. Sung, F.; Yang, Y.; Zhang, L.; Xiang, T.; Torr, P.H.; Hospedales, T.M. Learning to compare: Relation network for few-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1199–1208.
9. Snell, J.; Swersky, K.; Zemel, R. Prototypical networks for few-shot learning. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2017; pp. 4077–4087.
10. Gidaris, S.; Komodakis, N. Dynamic few-shot visual learning without forgetting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4367–4375.
11. Qi, H.; Brown, M.; Lowe, D.G. Low-shot learning with imprinted weights. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 5822–5830.
12. Ravi, S.; Larochelle, H. Optimization as a model for few-shot learning. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017.
13. Koch, G.; Zemel, R.; Salakhutdinov, R. Siamese Neural Networks for One-Shot image Recognition. In Proceedings of the ICML Deep Learning Workshop, Lille, France, 6–11 July 2015; Volume 2.
14. Geng, R.; Li, B.; Li, Y.; Ye, Y.; Jian, P.; Sun, J. Few-Shot Text Classification with Induction Network. arXiv 2019, arXiv:1902.10482.
15. Gao, T.; Han, X.; Liu, Z.; Sun, M. Hybrid attention-based prototypical networks for noisy few-shot relation classification. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, AAAI-19, Honolulu, HI, USA, 27 January–1 February 2019.
16. Nataraj, L.; Karthikeyan, S.; Jacob, G.; Manjunath, B. Malware images: Visualization and automatic classification. In Proceedings of the 8th International Symposium on Visualization for Cyber Security, Pittsburgh, PA, USA, 20 July 2011; p. 4.
17. Abou-Assaleh, T.; Cercone, N.; Keselj, V.; Sweidan, R. N-gram-based detection of new malicious code. In Proceedings of the 28th Annual International Computer Software and Applications Conference, COMPSAC 2004, Hong Kong, China, 28–30 September 2004; Volume 2, pp. 41–42.
18. Santos, I.; Laorden, C.; Bringas, P.G. Collective classification for unknown malware detection. In Proceedings of the International Conference on Security and Cryptography, Seville, Spain, 18–21 July 2011; pp. 251–256.
19. Anderson, B.; Storlie, C.; Lane, T. Improving malware classification: Bridging the static/dynamic gap. In Proceedings of the 5th ACM Workshop on Security and Artificial Intelligence, Raleigh, NC, USA, 19 October 2012; pp. 3–14.
20. Santos, I.; Penya, Y.K.; Devesa, J.; Bringas, P.G. N-grams-based File Signatures for Malware Detection. ICEIS (2) 2009, 9, 317–320.
21. Ye, Y.; Chen, L.; Wang, D.; Li, T.; Jiang, Q.; Zhao, M. SBMDS: An interpretable string based malware detection system using SVM ensemble with bagging. J. Comput. Virol. 2009, 5, 283.
22. Moskovitch, R.; Feher, C.; Tzachar, N.; Berger, E.; Gitelman, M.; Dolev, S.; Elovici, Y. Unknown malcode detection using opcode representation. In European Conference on Intelligence and Security Informatics; Springer: Berlin, Germany, 2008; pp. 204–215.
23. Islam, R.; Tian, R.; Batten, L.; Versteeg, S. Classification of malware based on string and function feature selection. In Proceedings of the 2010 Second Cybercrime and Trustworthy Computing Workshop, Ballarat, Australia, 19–20 July 2010; pp. 9–17.
24. Liu, L.; Wang, B. Malware classification using gray-scale images and ensemble learning. In Proceedings of the 2016 3rd International Conference on Systems and Informatics (ICSAI), Shanghai, China, 19–21 November 2016; pp. 1018–1022.
25. Ucci, D.; Aniello, L.; Baldoni, R. Survey of machine learning techniques for malware analysis. Comput. Secur. 2019, 81, 123–147.
26. Bhodia, N.; Prajapati, P.; Di Troia, F.; Stamp, M. Transfer Learning for Image-Based Malware Classification. arXiv 2019, arXiv:1903.11551.
27. Oliva, A.; Torralba, A. Modeling the shape of the scene: A holistic representation of the spatial envelope. Int. J. Comput. Vis. 2001, 42, 145–175.
28. Gibert, D. Convolutional Neural Networks for Malware Classification. Master’s Thesis, University Rovira i Virgili, Tarragona, Spain, 2016.
29. Nataraj, L.; Yegneswaran, V.; Porras, P.; Zhang, J. A comparative assessment of malware classification using binary texture analysis and dynamic analysis. In Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence, Chicago, IL, USA, 21 October 2011; pp. 21–30.
30. Choi, S.; Jang, S.; Kim, Y.; Kim, J. Malware detection using malware image and deep learning. In Proceedings of the 2017 International Conference on Information and Communication Technology Convergence (ICTC), Jeju, Korea, 18–20 October 2017; pp. 1193–1195.
31. Kim, H.J. Image-based malware classification using convolutional neural network. In Advances in Computer Science and Ubiquitous Computing; Springer: Singapore, 2017; pp. 1352–1357.
32. Kalash, M.; Rochan, M.; Mohammed, N.; Bruce, N.D.; Wang, Y.; Iqbal, F. Malware classification with deep convolutional neural networks. In Proceedings of the 2018 9th IFIP International Conference on New Technologies, Mobility and Security (NTMS), Paris, France, 26–28 February 2018; pp. 1–5.
33. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
34. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
35. Vu, D.L.; Nguyen, T.K.; Nguyen, T.V.; Nguyen, T.N.; Massacci, F.; Phung, P.H. A Convolutional Transformation Network for Malware Classification. arXiv 2019, arXiv:1909.07227.
36. Vinyals, O.; Blundell, C.; Lillicrap, T.; Wierstra, D. Matching networks for one shot learning. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2016; pp. 3630–3638.
37. Finn, C.; Abbeel, P.; Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; Volume 70, pp. 1126–1135.
38. Mishra, N.; Rohaninejad, M.; Chen, X.; Abbeel, P. A simple neural attentive meta-learner. arXiv 2017, arXiv:1707.03141.
39. Li, Z.; Zhou, F.; Chen, F.; Li, H. Meta-sgd: Learning to learn quickly for few-shot learning. arXiv 2017, arXiv:1707.09835.
40. Munkhdalai, T.; Yu, H. Meta networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; Volume 70, pp. 2554–2563.
41. Chen, Z.; Fu, Y.; Zhang, Y.; Jiang, Y.G.; Xue, X.; Sigal, L. Semantic feature augmentation in few-shot learning. arXiv 2018, arXiv:1804.05298.
42. Dixit, M.; Kwitt, R.; Niethammer, M.; Vasconcelos, N. Aga: Attribute-guided augmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7455–7463.
43. Wang, Y.X.; Girshick, R.; Hebert, M.; Hariharan, B. Low-shot learning from imaginary data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7278–7286.
44. Seok, S.; Kim, H. Visualized malware classification based-on convolutional neural network. J. Korea Inst. Inf. Secur. Cryptol. 2016, 26, 197–208.
45. Chen, W.Y.; Liu, Y.C.; Kira, Z.; Wang, Y.C.F.; Huang, J.B. A Closer Look at Few-shot Classification. arXiv 2019, arXiv:1904.04232.
46. Lin, M.; Chen, Q.; Yan, S. Network in network. arXiv 2013, arXiv:1312.4400.
47. Fang, Z.; Wang, J.; Li, B.; Wu, S.; Zhou, Y.; Huang, H. Evading anti-malware engines with deep reinforcement learning. IEEE Access 2019, 7, 48867–48879.
48. Total, V. Virustotal-Free Online Virus, Malware and Url Scanner. Available online: https://www.virustotal.com/en (accessed on 29 October 2019).
49. Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD; AAAI Press: Palo Alto, CA, USA, 1996; Volume 96, pp. 226–231.
50. Roberts, J.M. Virus Share. 2011. Available online: https://virusshare.com (accessed on 26 October 2019).
51. Sebastián, M.; Rivera, R.; Kotzias, P.; Caballero, J. AVclass: A Tool for Massive Malware Labeling. In International Symposium on Research in Attacks, Intrusions, and Defenses; Springer: Cham, Switzerland, 2016.
52. Arp, D.; Spreitzenbarth, M.; Hubner, M.; Gascon, H.; Rieck, K. Drebin: Efficient and Explainable Detection of Android Malware in Your Pocket. In Proceedings of the 21st Annual Network and Distributed System Security Symposium (NDSS), San Diego, CA, USA, 23–26 February 2014.
53. Michael, S.; Florian, E.; Thomas, S.; Felix, C.F.; Hoffmann, J. Mobilesandbox: Looking deeper into android applications. In Proceedings of the 28th International ACM Symposium on Applied Computing (SAC), Coimbra, Portugal, 18–22 March 2013.
54. Yajamanam, S.; Selvin, V.R.S.; Di Troia, F.; Stamp, M. Deep Learning versus Gist Descriptors for Image-based Malware Classification. In Proceedings of the ICISSP, Madeira, Portugal, 22–24 January 2018; pp. 553–561.
55. Ronen, R.; Radu, M.; Feuerstein, C.; Yom-Tov, E.; Ahmadi, M. Microsoft malware classification challenge. arXiv 2018, arXiv:1802.10135.
56. Wang, F.; Jiang, M.; Qian, C.; Yang, S.; Li, C.; Zhang, H.; Wang, X.; Tang, X. Residual attention network for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3156–3164.

Models | Filtered, 5-Way 5-Shot Acc. | Filtered, 5-Way 10-Shot Acc. | Filtered, 20-Way 5-Shot Acc. | Filtered, 20-Way 10-Shot Acc. | Unfiltered, 5-Way 5-Shot Acc. | Unfiltered, 5-Way 10-Shot Acc. | Unfiltered, 20-Way 5-Shot Acc. | Unfiltered, 20-Way 10-Shot Acc. |
---|---|---|---|---|---|---|---|---|
Gist + kNN [54] | 69.07 ± 2.81 | 75.18 ± 2.53 | 55.51 ± 1.60 | 61.82 ± 1.58 | 50.17 ± 1.69 | 57.90 ± 1.76 | 33.75 ± 0.82 | 41.47 ± 0.98 |
N-Gram + kNN [17] | 46.57 ± 2.18 | 49.37 ± 2.04 | 36.67 ± 1.09 | 38.30 ± 1.80 | 36.05 ± 2.03 | 41.64 ± 3.24 | 25.33 ± 1.46 | 31.22 ± 2.13 |
PE + kNN [26] | - | - | - | - | 57.18 ± 1.10 | 66.56 ± 1.00 | 41.77 ± 0.82 | 51.09 ± 0.84 |
pixel kNN | 63.60 ± 0.94 | 67.39 ± 0.96 | 42.49 ± 0.58 | 45.22 ± 0.42 | 45.11 ± 0.64 | 47.81 ± 0.67 | 16.87 ± 0.31 | 19.39 ± 0.30 |
InductionNet [14] | 68.08 ± 0.38 | 69.17 ± 0.19 | 42.09 ± 0.09 | 44.24 ± 0.18 | 45.09 ± 0.36 | 52.33 ± 0.35 | 28.13 ± 0.16 | 28.88 ± 0.15 |
SNAIL [38] | 73.81 ± 0.36 | 75.13 ± 0.35 | - | - | 52.87 ± 0.35 | 57.29 ± 0.34 | - | - |
Meta-SGD [39] | 75.54 ± 0.31 | 77.84 ± 0.31 | 44.85 ± 0.18 | 46.89 ± 0.18 | 60.47 ± 0.31 | 63.37 ± 0.31 | 37.06 ± 0.14 | 38.31 ± 0.16 |
ProtoNet [9] | 79.57 ± 0.22 | 82.63 ± 0.20 | 63.31 ± 0.12 | 65.00 ± 0.11 | 63.51 ± 0.16 | 66.73 ± 0.15 | 43.22 ± 0.08 | 47.58 ± 0.07 |
RelationNet [8] | 74.98 ± 0.23 | 77.28 ± 0.23 | 53.13 ± 0.12 | 57.48 ± 0.13 | 61.41 ± 0.16 | 60.05 ± 0.17 | 38.45 ± 0.08 | 39.50 ± 0.09 |
HABPN [15] | 80.19 ± 0.20 | 82.24 ± 0.21 | 56.34 ± 0.19 | 59.44 ± 0.18 | 58.78 ± 0.23 | 65.62 ± 0.15 | 40.32 ± 0.09 | 40.67 ± 0.08 |
ConvProtoNet | 83.34 ± 0.13 | 86.63 ± 0.13 | 68.56 ± 0.08 | 71.38 ± 0.08 | 68.07 ± 0.23 | 71.22 ± 0.15 | 48.61 ± 0.07 | 53.53 ± 0.08 |

Models | 5-Way 5-Shot Acc. | 5-Way 10-Shot Acc. | 20-Way 5-Shot Acc. | 20-Way 10-Shot Acc. |
---|---|---|---|---|
Gist + kNN [54] | 59.91 ± 2.85 | 69.28 ± 2.25 | 44.34 ± 1.32 | 52.99 ± 1.47 |
N-Gram + kNN [17] | 54.75 ± 2.80 | 57.08 ± 2.00 | 47.56 ± 2.34 | 47.88 ± 2.04 |
PE + kNN [26] | 70.58 ± 1.03 | 78.51 ± 0.92 | 56.58 ± 1.68 | 64.81 ± 1.88 |
pixel kNN | 51.06 ± 1.11 | 53.94 ± 1.24 | 33.24 ± 0.60 | 38.64 ± 0.63 |
InductionNet [14] | 54.10 ± 0.37 | 56.61 ± 0.37 | 36.33 ± 0.18 | 43.20 ± 0.18 |
SNAIL [38] | 54.60 ± 0.37 | 59.11 ± 0.35 | - | - |
Meta-SGD [39] | 65.15 ± 0.32 | 67.72 ± 0.31 | 41.79 ± 0.18 | 44.44 ± 0.17 |
ProtoNet [9] | 69.80 ± 0.35 | 76.78 ± 0.23 | 58.13 ± 0.12 | 63.42 ± 0.11 |
RelationNet [8] | 65.42 ± 0.22 | 67.53 ± 0.23 | 46.12 ± 0.11 | 49.05 ± 0.12 |
HABPN [15] | 67.70 ± 0.23 | 70.83 ± 0.23 | 49.78 ± 0.11 | 54.18 ± 0.12 |
ConvProtoNet | 75.59 ± 0.22 | 80.30 ± 0.45 | 59.17 ± 0.24 | 64.03 ± 0.23 |

Models | 5-Shot, 5-Way Acc. | 5-Shot, 10-Way Acc. |
---|---|---|
Gist + kNN [54] | 62.06 ± 2.00 | 50.71 ± 1.48 |
N-Gram + kNN [17] | 57.82 ± 2.07 | 48.88 ± 2.12 |
PE + kNN [26] | - | - |
pixel kNN | 33.40 ± 0.90 | 24.39 ± 0.61 |
InductionNet [14] | 42.87 ± 0.32 | 31.10 ± 0.18 |
SNAIL [38] | 47.95 ± 0.37 | 25.21 ± 0.16 |
Meta-SGD [39] | 54.01 ± 0.34 | 37.94 ± 0.14 |
ProtoNet [9] | 66.14 ± 0.24 | 51.05 ± 0.10 |
RelationNet [8] | 52.39 ± 0.23 | 40.37 ± 0.10 |
HABPN [15] | 56.90 ± 0.24 | 43.58 ± 0.09 |
ConvProtoNet | 68.58 ± 0.22 | 57.10 ± 0.09 |

Base Dataset | Tested on LargePE (Filtered) | Tested on LargePE (Unfiltered) | Tested on Microsoft | Tested on VirusShare | Tested on Drebin |
---|---|---|---|---|---|
LargePE (filtered) | 83.34 ± 0.13 | 63.19 ± 0.23 | 77.19 ± 0.15 | 70.02 ± 0.24 | 57.42 ± 0.24 |
LargePE (unfiltered) | 83.85 ± 0.19 | 68.07 ± 0.23 | 78.29 ± 0.15 | 71.87 ± 0.24 | 52.34 ± 0.21 |
VirusShare | 81.34 ± 0.21 | 63.47 ± 0.23 | 77.77 ± 0.16 | 75.59 ± 0.22 | 60.01 ± 0.21 |
Drebin | 74.61 ± 0.24 | 52.91 ± 0.24 | 73.69 ± 0.19 | 62.21 ± 0.25 | 68.58 ± 0.22 |

Measures | Filtered LargePE | Unfiltered LargePE | VirusShare | Drebin |
---|---|---|---|---|
Accuracy | 0.8334 | 0.6807 | 0.7559 | 0.6858 |
Precision | 0.8349 | 0.6893 | 0.7666 | 0.6966 |
Recall | 0.8225 | 0.6778 | 0.7545 | 0.6623 |
F1 Score | 0.8158 | 0.6684 | 0.7460 | 0.6494 |
AUC | 0.9568 | 0.8978 | 0.9349 | 0.8831 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Tang, Z.; Wang, P.; Wang, J. ConvProtoNet: Deep Prototype Induction towards Better Class Representation for Few-Shot Malware Classification. Appl. Sci. 2020, 10, 2847. https://doi.org/10.3390/app10082847