HertDroid: Android Malware Detection Method with Influential Node Filter and Heterogeneous Graph Transformer
Abstract
1. Introduction
2. Related Work
3. Preliminary
4. HertDroid Model
4.1. Architecture
4.2. Heterogeneous Graph Constructor
4.3. Android APK Transformer
4.3.1. DropAPI
4.3.2. Heterogeneous Multi-Component Attention
5. Experiments
5.1. Datasets
5.2. Baselines and Evaluation Criteria
- GCN [14] generates node embeddings for homogeneous graphs by aggregating graph structure and node features through graph convolutions.
- GAT [18] treats all nodes as the same type and computes attention weights over each node's neighbors.
- GATv2 [36] improves on GAT by replacing its static attention mechanism with dynamic attention, so the ranking of attention weights over a node's neighbors can change depending on the query node.
- GraphSAGE [37] learns node representations by aggregating information from each node's neighbors and recursively extending this aggregation to k-hop neighbors up to a fixed depth. Like GAT, it is a node embedding method designed for homogeneous graphs.
- HAN [38] is a heterogeneous graph neural network based on meta-paths and attention. It first applies node-level attention over the meta-path-based neighbors of each target node, and then applies semantic-level attention to weight the embeddings obtained from the different meta-paths.
- Metapath2vec [39] is a node representation method for HGs. It first specifies a meta-path, performs random walks guided by that meta-path to construct the heterogeneous neighborhood of each vertex, and finally computes node embeddings with the Skip-Gram model.
- HinDroid [20] constructs heterogeneous graphs from API calls and three types of relations and classifies apps with a multi-kernel support vector machine.
- DroidEvolver [41] extracts features based on API occurrence and decides whether an app is benign or malicious using a pool of online learning models, such as Passive Aggressive and Online Gradient Descent.
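The difference between static attention (GAT) and dynamic attention (GATv2) can be seen in a toy example. The sketch below uses scalar node features and hand-picked weights, which are illustrative assumptions rather than learned parameters. It follows the two published score functions: GAT computes LeakyReLU(a^T [W h_i ‖ W h_j]), so for a fixed query the score is monotone in a term that depends only on the neighbor, whereas GATv2 computes a^T LeakyReLU(W [h_i ‖ h_j]), which lets the neighbor ranking depend on the query.

```python
import math

NEG_SLOPE = 0.2  # LeakyReLU negative slope, as in the GAT paper


def leaky_relu(x, slope=NEG_SLOPE):
    return x if x > 0 else slope * x


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gat_score(q, k):
    # GAT: e = LeakyReLU(a^T [W h_i || W h_j]). With scalar features, W = 1
    # and a = (1, 1) this is LeakyReLU(q + k): for any fixed query q the
    # score increases with k, so every query ranks the neighbors alike.
    return leaky_relu(q + k)


def gatv2_score(q, k):
    # GATv2: e = a^T LeakyReLU(W [h_i || h_j]). Here W maps (q, k) to the
    # hidden units (q + k, k - q) and a = (1, -1); the nonlinearity sits
    # between W and a, so the neighbor ranking can depend on q.
    hidden = [leaky_relu(q + k), leaky_relu(k - q)]
    a = [1.0, -1.0]
    return sum(w * h for w, h in zip(a, hidden))


def ranking(score_fn, query, keys):
    """Neighbor indices sorted by attention weight, highest first."""
    weights = softmax([score_fn(query, k) for k in keys])
    return sorted(range(len(keys)), key=lambda j: -weights[j])


if __name__ == "__main__":
    keys = [0.0, 1.0]  # scalar features of two neighbors, j = 0 and j = 1
    for q in (2.0, -2.0):
        print(q, ranking(gat_score, q, keys), ranking(gatv2_score, q, keys))
    # GAT ranks [1, 0] for both queries; GATv2 gives [1, 0] then [0, 1].
```

The softmax does not change the ordering of the raw scores; it is included only so that the ranking is over attention weights, as in both models.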
5.3. Experiment Setup
Environment | Version | Environment Type |
---|---|---|
GPU | NVIDIA RTX4090 | Hardware |
Python | 3.8.10 | Programming language |
Androguard [26] | 3.3.5 | Decompilation tool |
PyTorch [43] | 1.12.1 | Python package |
PyTorch Geometric [28] | 2.1.0.post1 | Python package |
NetworKit [31] | 10.1 | Python package |
Pandas [44] | 1.5.2 | Python package |
Matplotlib [45] | 3.5.2 | Python package |
Neo4j [27] | 4.4.17 | Database |
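When reproducing the experiments, the Python-side versions in the table above can be compared against a local environment. This is a minimal sketch using the standard-library `importlib.metadata`; the PyPI distribution names are our assumption, and Neo4j (a database server) and the GPU are outside its scope.

```python
import platform
from importlib import metadata

# Versions from the experiment-setup table. The PyPI distribution names are
# our assumption; Neo4j (a database server) and the GPU are not checked here.
EXPECTED_PYTHON = "3.8.10"
EXPECTED = {
    "torch": "1.12.1",
    "torch_geometric": "2.1.0.post1",
    "androguard": "3.3.5",
    "networkit": "10.1",
    "pandas": "1.5.2",
    "matplotlib": "3.5.2",
}


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    # Try the name as given and with underscores replaced by hyphens.
    for candidate in (dist_name, dist_name.replace("_", "-")):
        try:
            return metadata.version(candidate)
        except metadata.PackageNotFoundError:
            continue
    return None


def check_environment(expected=EXPECTED):
    """Map each package name to (expected, installed or None, match flag)."""
    report = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        report[name] = (wanted, found, found == wanted)
    return report


if __name__ == "__main__":
    print("python", platform.python_version(), "expected", EXPECTED_PYTHON)
    for name, (wanted, found, ok) in check_environment().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name:16} expected {wanted:12} installed {found or 'missing':12} {status}")
```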
5.4. Experiment Results and Discussion
5.4.1. Comparison with Domain-Specific Baselines
5.4.2. Comparison with GNN Baselines
5.5. Components Evaluation
5.5.1. Importance of DropAPI
5.5.2. Importance of Using HG to Express Android APKs
5.5.3. Importance of Embedding Model for HG without Manually Setting Meta-Paths
5.6. Visualization
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Statcounter. Available online: https://gs.statcounter.com/os-market-share/mobile/worldwide/#yearly-2023-2023-bar (accessed on 10 February 2024).
2. Wu, Y.; Li, M.; Zeng, Q.; Yang, T.; Wang, J.; Fang, Z.; Cheng, L. DroidRL: Feature selection for android malware detection with reinforcement learning. Comput. Secur. 2023, 128, 103126.
3. Sharma, T.; Rattan, D. Malicious application detection in android—A systematic literature review. Comput. Sci. Rev. 2021, 40, 100373.
4. Pan, Y.; Ge, X.; Fang, C.; Fan, Y. A systematic literature review of android malware detection using static analysis. IEEE Access 2020, 8, 116363–116379.
5. Sharma, A.; Gupta, B.B.; Singh, A.K.; Saraswat, V. Orchestration of APT malware evasive manoeuvers employed for eluding anti-virus and sandbox defense. Comput. Secur. 2022, 115, 102627.
6. Şahin, D.Ö.; Kural, O.E.; Akleylek, S.; Kılıç, E. A novel permission-based Android malware detection system using feature selection based on linear regression. Neural Comput. Appl. 2021, 35, 4903–4918.
7. Arora, A.; Peddoju, S.K.; Conti, M. PermPair: Android malware detection using permission pairs. IEEE Trans. Inf. Forensics Secur. 2019, 15, 1968–1982.
8. Mohamad Arif, J.; Ab Razak, M.F.; Awang, S.; Tuan Mat, S.R.; Ismail, N.S.N.; Firdaus, A. A static analysis approach for Android permission-based malware detection systems. PLoS ONE 2021, 16, e0257968.
9. Shen, F.; Del Vecchio, J.; Mohaisen, A.; Ko, S.Y.; Ziarek, L. Android malware detection using complex-flows. IEEE Trans. Mob. Comput. 2018, 18, 1231–1245.
10. Pektaş, A.; Acarman, T. Deep learning for effective Android malware detection using API call graph embeddings. Soft Comput. 2020, 24, 1027–1043.
11. Li, D.; Zhao, L.; Cheng, Q.; Lu, N.; Shi, W. Opcode sequence analysis of Android malware by a convolutional neural network. Concurr. Comput. Pract. Exp. 2020, 32, e5308.
12. Tang, J.; Li, R.; Jiang, Y.; Gu, X.; Li, Y. Android malware obfuscation variants detection method based on multi-granularity opcode features. Future Gener. Comput. Syst. 2022, 129, 141–151.
13. Zhou, J.; Cui, G.; Hu, S.; Zhang, Z.; Yang, C.; Liu, Z.; Wang, L.; Li, C.; Sun, M. Graph neural networks: A review of methods and applications. AI Open 2020, 1, 57–81.
14. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907.
15. Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Yu, P.S. A comprehensive survey on graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 4–24.
16. Zhang, S.; Tong, H.; Xu, J.; Maciejewski, R. Graph convolutional networks: A comprehensive review. Comput. Soc. Netw. 2019, 6, 11.
17. Feng, P.; Yang, L.; Lu, D.; Xi, N.; Ma, J. BejaGNN: Behavior-based Java Malware Detection via Graph Neural Network. J. Supercomput. 2023, 79, 15390–15414.
18. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903.
19. Gao, H.; Cheng, S.; Zhang, W. GDroid: Android malware detection and classification with graph convolutional network. Comput. Secur. 2021, 106, 102264.
20. Hou, S.; Ye, Y.; Song, Y.; Abdulhayoglu, M. HinDroid: An intelligent android malware detection system based on structured heterogeneous information network. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 1507–1515.
21. Hei, Y.; Yang, R.; Peng, H.; Wang, L.; Xu, X.; Liu, J.; Liu, H.; Xu, J.; Sun, L. Hawk: Rapid android malware detection through heterogeneous graph attention networks. IEEE Trans. Neural Netw. Learn. Syst. 2021, 35, 4703–4717.
22. Ye, Y.; Hou, S.; Chen, L.; Lei, J.; Wan, W.; Wang, J.; Xiong, Q.; Shao, F. Out-of-sample node representation learning for heterogeneous graph in real-time android malware detection. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI), Macao, China, 10–16 August 2019.
23. Hu, Z.; Dong, Y.; Wang, K.; Sun, Y. Heterogeneous graph transformer. In Proceedings of the Web Conference 2020, Taipei, Taiwan, 20–24 April 2020; pp. 2704–2710.
24. Yang, X.; Yan, M.; Pan, S.; Ye, X.; Fan, D. Simple and efficient heterogeneous graph neural network. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; pp. 10816–10824.
25. Sun, Y.; Han, J.; Yan, X.; Yu, P.S.; Wu, T. PathSim: Meta path-based top-k similarity search in heterogeneous information networks. Proc. VLDB Endow. 2011, 4, 992–1003.
26. Androguard. Available online: https://github.com/androguard/androguard (accessed on 10 April 2023).
27. Neo4j. Available online: https://github.com/neo4j/neo4j (accessed on 13 April 2023).
28. Fey, M.; Lenssen, J.E. Fast graph representation learning with PyTorch Geometric. arXiv 2019, arXiv:1903.02428.
29. Akram, M.; Kahraman, C.; Zahid, K. Group decision-making based on complex spherical fuzzy VIKOR approach. Knowl. Based Syst. 2021, 216, 106793.
30. Riondato, M.; Kornaropoulos, E.M. Fast approximation of betweenness centrality through sampling. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining, New York, NY, USA, 24–28 February 2014; pp. 413–422.
31. Angriman, E.; van der Grinten, A.; Hamann, M.; Meyerhenke, H.; Penschuck, M. Algorithms for large-scale network analysis and the NetworKit toolkit. In Algorithms for Big Data: DFG Priority Program 1736; Springer Nature: Cham, Switzerland, 2023; pp. 3–20.
32. Misra, D. Mish: A Self Regularized Non-Monotonic Activation Function. In Proceedings of the British Machine Vision Conference, Virtual, 7–10 September 2020.
33. Allix, K.; Bissyandé, T.F.; Klein, J.; Le Traon, Y. AndroZoo: Collecting millions of android apps for the research community. In Proceedings of the 13th International Conference on Mining Software Repositories, Austin, TX, USA, 14–15 May 2016; pp. 468–471.
34. VirusShare. Available online: https://virusshare.com (accessed on 5 April 2023).
35. APKtool. Available online: https://github.com/iBotPeaches/Apktool (accessed on 10 April 2023).
36. Brody, S.; Alon, U.; Yahav, E. How attentive are graph attention networks? arXiv 2021, arXiv:2105.14491.
37. Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
38. Wang, X.; Ji, H.; Shi, C.; Wang, B.; Ye, Y.; Cui, P.; Yu, P.S. Heterogeneous graph attention network. In Proceedings of the World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 2022–2032.
39. Dong, Y.; Chawla, N.V.; Swami, A. metapath2vec: Scalable representation learning for heterogeneous networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 135–144.
40. Schlichtkrull, M.; Kipf, T.N.; Bloem, P.; Van Den Berg, R.; Titov, I.; Welling, M. Modeling relational data with graph convolutional networks. In Proceedings of the Semantic Web: 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, 3–7 June 2018; pp. 593–607.
41. Xu, K.; Li, Y.; Deng, R.; Chen, K.; Xu, J. DroidEvolver: Self-evolving android malware detection system. In Proceedings of the 2019 IEEE European Symposium on Security and Privacy (EuroS&P), Stockholm, Sweden, 17–19 June 2019; pp. 47–62.
42. Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. arXiv 2017, arXiv:1711.05101.
43. PyTorch. Available online: https://pytorch.org/ (accessed on 13 April 2023).
44. Pandas. Available online: https://pandas.pydata.org/ (accessed on 13 April 2023).
45. Matplotlib. Available online: https://matplotlib.org/ (accessed on 13 April 2023).
46. Chen, D.; Lin, Y.; Li, W.; Li, P.; Zhou, J.; Sun, X. Measuring and relieving the over-smoothing problem for graph neural networks from the topological view. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 3438–3445.
47. Zhang, C.; Song, D.; Huang, C.; Swami, A.; Chawla, N.V. Heterogeneous graph neural network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 793–803.
48. Xu, M. Understanding graph embedding methods and their applications. SIAM Rev. 2021, 63, 825–853.
49. Cai, H.; Zheng, V.W.; Chang, K.C.-C. A comprehensive survey of graph embedding: Problems, techniques, and applications. IEEE Trans. Knowl. Data Eng. 2018, 30, 1616–1637.
50. Goyal, P.; Ferrara, E. Graph embedding techniques, applications, and performance: A survey. Knowl. Based Syst. 2018, 151, 78–94.
51. Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
Node Type | Entity | Feature Encoding |
---|---|---|
App | Application | One-hot encoding |
API | Application Programming Interface | Bag-of-words of names |
Feature | Feature | Bag-of-words of names |
Permission | Permission | Bag-of-words of names |
So | Shared library file | Bag-of-words of names |
Intent | Intent | Bag-of-words of types and actions |
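The table specifies bag-of-words features over entity names but not how names are tokenized. The sketch below assumes a hypothetical scheme: names are split on non-alphanumeric characters and camelCase/ALL_CAPS boundaries, a vocabulary is built from training names, and each node gets a token-count vector. The function names `tokenize`, `build_vocabulary` and `bag_of_words` are illustrative, not taken from the paper.

```python
import re
from collections import Counter


def tokenize(name):
    """Split an entity name into lower-case word tokens.

    Hypothetical scheme: runs of non-alphanumeric characters separate
    tokens, and camelCase / ALL_CAPS words are split into their parts,
    e.g. 'getDeviceId' -> ['get', 'device', 'id'].
    """
    tokens = []
    for part in re.split(r"[^A-Za-z0-9]+", name):
        # Upper-case acronyms, capitalised or lower-case words, or digits.
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", part))
    return [t.lower() for t in tokens]


def build_vocabulary(names):
    """Assign a column index to every token seen in the training names."""
    vocab = {}
    for name in names:
        for tok in tokenize(name):
            vocab.setdefault(tok, len(vocab))
    return vocab


def bag_of_words(name, vocab):
    """Token-count vector for one name; out-of-vocabulary tokens are dropped."""
    vec = [0] * len(vocab)
    for tok, count in Counter(tokenize(name)).items():
        if tok in vocab:
            vec[vocab[tok]] = count
    return vec


if __name__ == "__main__":
    names = ["android.telephony.TelephonyManager.getDeviceId",
             "android.permission.READ_PHONE_STATE"]
    vocab = build_vocabulary(names)
    print(bag_of_words(names[0], vocab))  # [1, 2, 1, 1, 1, 1, 0, 0, 0, 0]
```

The same encoder could be applied to Permission, Feature, So and Intent names, with App nodes using one-hot vectors as listed in the table.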
Edge Type | Related Entities |
---|---|
DECLARES | App → Permission |
PASSES | App → Intent |
USES | App → Feature |
HAS | App → So |
CALLS | App → API |
INVOKES | API → API |
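Meta-path-based baselines such as HAN, Metapath2vec and HinDroid operate on manually specified meta-paths over a schema like the one above. As an illustration only (not HertDroid's construction), the sketch below stores typed edges keyed by (source type, relation, target type) and counts instances of the App-CALLS-API-CALLS⁻¹-App meta-path, i.e. the number of APIs two apps share, which equals one row of the commuting matrix A·Aᵀ used in PathSim-style measures [25]. The class and function names are hypothetical.

```python
from collections import defaultdict

# Relation -> (source node type, target node type), from the edge-type table.
SCHEMA = {
    "DECLARES": ("App", "Permission"),
    "PASSES": ("App", "Intent"),
    "USES": ("App", "Feature"),
    "HAS": ("App", "So"),
    "CALLS": ("App", "API"),
    "INVOKES": ("API", "API"),
}


class HeteroGraph:
    """Minimal typed edge store keyed by (src_type, relation, dst_type)."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add_edge(self, src, relation, dst):
        if relation not in SCHEMA:
            raise ValueError(f"unknown relation {relation!r}")
        src_type, dst_type = SCHEMA[relation]
        self.edges[(src_type, relation, dst_type)].add((src, dst))

    def neighbors(self, node, relation, reverse=False):
        """Targets of `node` along `relation`, or its sources if reverse."""
        src_type, dst_type = SCHEMA[relation]
        pairs = self.edges.get((src_type, relation, dst_type), set())
        if reverse:
            return {s for s, d in pairs if d == node}
        return {d for s, d in pairs if s == node}


def metapath_counts(graph, start, relation):
    """Count instances of the meta-path start -relation-> X <-relation- other.

    With relation="CALLS" this is App-API-App: for each app, the number of
    APIs it shares with `start` (one row of A @ A.T, where A is the App x API
    adjacency matrix). `start` itself appears with its own API count.
    """
    counts = defaultdict(int)
    for mid in graph.neighbors(start, relation):
        for other in graph.neighbors(mid, relation, reverse=True):
            counts[other] += 1
    return dict(counts)


if __name__ == "__main__":
    g = HeteroGraph()
    for app, api in [("app1", "getDeviceId"), ("app1", "sendTextMessage"),
                     ("app2", "sendTextMessage"), ("app2", "getDeviceId"),
                     ("app3", "openConnection")]:
        g.add_edge(app, "CALLS", api)
    g.add_edge("app1", "DECLARES", "SEND_SMS")
    print(metapath_counts(g, "app1", "CALLS"))  # app1 and app2 share 2 APIs
```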
Datasets | Number of Apps | Sources | Years |
---|---|---|---|
Benign set | 10,361 | Google Play Store, AppChina | 2018–2022 |
Malicious set | 11,043 | VirusShare | 2018–2022 |
Measurement | Description |
---|---|
Params | Number of parameters in a model |
Accuracy | (TP + TN)/(TP + TN + FP + FN) |
Precision | TP/(TP + FP) |
Recall | TP/(TP + FN) |
F1 | 2 × Precision × Recall/(Precision + Recall) |
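The four classification metrics in the table follow directly from confusion-matrix counts. The sketch below implements those definitions, assuming (as is common in malware detection, but not stated here) that malicious apps are the positive class; the counts in the example are illustrative, not results from the paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts.

    Assumes malicious apps are the positive class. Precision, recall and F1
    fall back to 0.0 when their denominators are zero.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Illustrative counts: 90 TP, 80 TN, 10 FP, 20 FN.
    print(classification_metrics(90, 80, 10, 20))
    # accuracy 0.85, precision 0.9, recall 9/11, F1 = 2TP/(2TP+FP+FN) = 6/7
```

For the Params row, the parameter count of a PyTorch model is commonly obtained as `sum(p.numel() for p in model.parameters())`.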
Model | Accuracy | Precision | Recall | F1 |
---|---|---|---|---|
HinDroid | 0.9420 | 0.9505 | 0.9536 | 0.9521 |
DroidEvolver | 0.9060 | 0.9264 | 0.9172 | 0.9218 |
HertDroid | 0.9700 | 0.9767 | 0.9735 | 0.9751 |
Model | Accuracy | Precision | Recall | F1 | Params |
---|---|---|---|---|---|
GraphSAGE | 0.9293 | 0.9394 | 0.9616 | 0.9504 | 11.34 M |
GAT | 0.9501 | 0.9504 | 0.9501 | 0.9502 | 5.61 M |
GATv2 | 0.7082 | 0.7105 | 0.9854 | 0.8257 | 23.92 M |
GCN | 0.9270 | 0.9292 | 0.9629 | 0.9458 | 5.67 M |
GraphSAGE-hete | 0.9425 | 0.9683 | 0.9609 | 0.9645 | 24.93 M |
GAT-hete | 0.9318 | 0.9316 | 0.9318 | 0.9317 | 21.21 M |
HAN | 0.9306 | 0.9445 | 0.9672 | 0.9557 | 6.80 M |
Metapath2vec | 0.6284 | 0.8768 | 0.6018 | 0.7137 | 22.77 M |
HertDroid | 0.9802 | 0.9841 | 0.9904 | 0.9873 | 10.23 M |
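Because F1 is the harmonic mean of precision and recall, the F1 column of both results tables can be recomputed from the other two columns. The sketch below transcribes those three columns and flags any row whose reported F1 deviates from 2PR/(P+R) by more than a generous tolerance that covers four-decimal rounding of P and R; accuracy cannot be checked this way without the per-class test counts.

```python
# (precision, recall, reported F1) transcribed from the two results tables.
REPORTED = {
    "HinDroid": (0.9505, 0.9536, 0.9521),
    "DroidEvolver": (0.9264, 0.9172, 0.9218),
    "HertDroid (domain-specific table)": (0.9767, 0.9735, 0.9751),
    "GraphSAGE": (0.9394, 0.9616, 0.9504),
    "GAT": (0.9504, 0.9501, 0.9502),
    "GATv2": (0.7105, 0.9854, 0.8257),
    "GCN": (0.9292, 0.9629, 0.9458),
    "GraphSAGE-hete": (0.9683, 0.9609, 0.9645),
    "GAT-hete": (0.9316, 0.9318, 0.9317),
    "HAN": (0.9445, 0.9672, 0.9557),
    "Metapath2vec": (0.8768, 0.6018, 0.7137),
    "HertDroid (GNN table)": (0.9841, 0.9904, 0.9873),
}


def f1_from(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def inconsistent_rows(rows=REPORTED, tol=5e-4):
    """Rows whose reported F1 differs from 2PR/(P+R) by more than `tol`.

    The tolerance allows for P and R themselves being rounded to 4 decimals.
    """
    return [name for name, (p, r, f1) in rows.items()
            if abs(f1_from(p, r) - f1) > tol]


if __name__ == "__main__":
    for name, (p, r, f1) in REPORTED.items():
        print(f"{name:34} reported {f1:.4f} recomputed {f1_from(p, r):.4f}")
    print("inconsistent:", inconsistent_rows())
```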
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Meng, X.; Li, D. HertDroid: Android Malware Detection Method with Influential Node Filter and Heterogeneous Graph Transformer. Appl. Sci. 2024, 14, 3150. https://doi.org/10.3390/app14083150
Chicago/Turabian Style: Meng, Xinyi, and Daofeng Li. 2024. "HertDroid: Android Malware Detection Method with Influential Node Filter and Heterogeneous Graph Transformer" Applied Sciences 14, no. 8: 3150. https://doi.org/10.3390/app14083150