A Network Scanning Organization Discovery Method Based on Graph Convolutional Neural Network
Abstract
1. Introduction
- To our knowledge, we are the first to construct a dataset covering 19 network scanning organizations, comprising 1,201,797 network scanning traffic records.
- We propose a network scanning organization discovery method based on GCN, which models the correlations between network scanning behaviors to identify network scanning organizations.
- We construct an attribute graph to represent network scanning behavior, smooth the feature matrix with a Laplace filter, extract deep features with a GCN, and finally apply a clustering algorithm to identify organizations.
- The effectiveness of the proposed method is demonstrated through experiments, with an identification accuracy of 83.41%.
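
As a minimal sketch of the smoothing step named in the third contribution, the snippet below applies a Laplacian smoothing filter to a node feature matrix. The filter form `H = I - kL` with the symmetrically normalized Laplacian, the added self-loops, and the names `laplacian_smooth`, `k`, and `steps` are assumptions made for illustration; the exact filter used in this work may differ.

```python
from math import sqrt


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a symmetric adjacency matrix with zero diagonal."""
    n = len(adj)
    # Assumption: add self-loops so that every node has degree >= 1.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def laplacian_smooth(adj, features, k=0.5, steps=1):
    """Apply X <- (I - k L) X `steps` times, where L = I - D^{-1/2} (A + I) D^{-1/2}."""
    n = len(adj)
    a_norm = normalized_adjacency(adj)
    # H = I - k L simplifies to (1 - k) I + k * A_norm.
    h = [[(1.0 - k) * (1.0 if i == j else 0.0) + k * a_norm[i][j] for j in range(n)]
         for i in range(n)]
    x = [row[:] for row in features]
    for _ in range(steps):
        # Plain matrix product H @ X, written out with lists.
        x = [[sum(h[i][m] * x[m][c] for m in range(n)) for c in range(len(x[0]))]
             for i in range(n)]
    return x


# Two connected hosts with one feature each.
smoothed = laplacian_smooth([[0, 1], [1, 0]], [[0.0], [2.0]], k=0.5, steps=1)
```

In this toy case the two feature values move from 0 and 2 to 0.5 and 1.5, that is, each node is pulled toward its neighbor. Larger `k` or more `steps` smooth more strongly.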
2. Related Work
2.1. Network Scanning Behavior Identification
2.2. Network Scanning Organization Discovery
3. Method
3.1. Attribute Graph Construction
3.2. Feature Extraction
3.3. Graph Embedding
3.4. Organization Discovery
Algorithm 1: Network scanning organization discovery based on GCN and K-means
Input: network scanning attribute graph G; number of clusters k.
Output: cluster centres and labels.
Process:
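
As an illustration only, and not the authors' listing, the clustering stage of Algorithm 1 could be sketched as below. It assumes the GCN has already produced one embedding vector per node; the function `kmeans`, its parameters `iterations` and `seed`, and the toy embeddings are hypothetical. It returns cluster centres and labels, matching the stated output of Algorithm 1.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(embeddings, k, iterations=100, seed=0):
    """Lloyd's algorithm: return (centres, labels) for a list of vectors."""
    rng = random.Random(seed)
    # Initialise the centres with k distinct input vectors.
    centres = [list(v) for v in rng.sample(embeddings, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each embedding joins its nearest centre.
        new_labels = [min(range(k), key=lambda c: squared_distance(v, centres[c]))
                      for v in embeddings]
        if new_labels == labels:
            break  # assignments are stable, so the centres are too
        labels = new_labels
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(embeddings, labels) if lab == c]
            if members:  # keep the old centre if a cluster becomes empty
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return centres, labels


# Hypothetical 2-D node embeddings forming two well-separated groups.
toy_embeddings = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centres, labels = kmeans(toy_embeddings, k=2)
```

In practice a library implementation such as scikit-learn's `KMeans` would normally replace this hand-written loop.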
Algorithm 2: Network scanning organization discovery based on GCN and spectral clustering
Input: network scanning attribute graph G; number of clusters k.
Output: cluster centres and labels.
Process:
Algorithm 3: Network scanning organization discovery based on GCN and DBSCAN
Input: network scanning attribute graph G; neighborhood radius; minimum number of neighbors.
Output: cluster centres and labels.
Process:
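
As an illustration only, and not the authors' listing, the density-based clustering stage of Algorithm 3 could be sketched as below. It assumes the GCN embeddings are already available. The names `dbscan`, `eps` (neighborhood radius), and `min_samples` (minimum number of neighbors, counting the point itself) are hypothetical. The sketch returns labels only; cluster centres could be taken as the mean of each cluster's members.

```python
def region_query(points, idx, eps):
    """Indices of all points within distance eps of points[idx] (itself included)."""
    p = points[idx]
    return [j for j, q in enumerate(points)
            if sum((a - b) ** 2 for a, b in zip(p, q)) <= eps * eps]


def dbscan(points, eps, min_samples):
    """Return one label per point; -1 marks noise, clusters are numbered from 0."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_samples:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is also a core point: keep expanding
    return labels


# Hypothetical 2-D embeddings: two dense groups and one isolated point.
toy_points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [50, 50]]
toy_labels = dbscan(toy_points, eps=1.5, min_samples=3)
```

Unlike K-means and spectral clustering, this procedure takes no cluster count; the number of clusters follows from `eps` and `min_samples`, and isolated points are reported as noise.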
4. Evaluation
4.1. Experiment Setup
4.2. Evaluation Results
- non-Graph: No node attribute graph is constructed; the attributes of the host devices and the attributes of the network connections are concatenated, and the resulting vector is used as the feature.
- Graph-G: Construct a node attribute graph with host devices as nodes and network connections as relationships.
- Graph-LG: Construct the node attribute graph as in Graph-G, then smooth the feature matrix using Laplace smoothing.
5. Discussion
5.1. Limitations
5.2. Future Work
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Entity Type | Entity Description and Example |
---|---|
Host Computer Equipment | Ip: IP address, e.g., xxx.xxx.xxx.34 |
Host Computer Equipment | Port: Port number, e.g., 42824 |
Host Computer Equipment | Whois: Information about the domain name associated with the source IP address |
Host Computer Equipment | Areacode: The code of the country where the source IP address is located, e.g., US |
Host Computer Equipment | City: The name of the city where the source IP address is located, e.g., San Francisco |
Host Computer Equipment | Isp: Name of the Internet service provider of the source IP address, e.g., Enes Koken |
Host Computer Equipment | Asn: Autonomous system number assigned to each ISP, e.g., 14061 |
Host Computer Equipment | Org: Name of the service provider or organization managing the source IP, e.g., DigitalOcean, LLC |
Host Computer Equipment | P: The name of the province or state where the source IP address is located, e.g., California |
Host Computer Equipment | Lat: Latitude where the source IP address is located, e.g., 37.775090 |
Host Computer Equipment | Lon: Longitude where the source IP address is located, e.g., −122.419640 |
Network Connection | Action: Type of request, e.g., Connect |
Network Connection | Transport Protocol: Type of transport protocol, e.g., TCP |
Network Connection | Utc: Universal standard time, e.g., 31 October 2023 11:57:00 p.m. |
Network Connection | Headers: Request headers stored as key-value pairs |
Network Connection | Headers_keys: Keys of the request headers |
Network Connection | Headers_values: Values of the request headers |
Network Connection | Method: Request method, e.g., GET |
Network Connection | Proto: Protocol type, e.g., HTTP/1.1 |
Network Connection | Uri: Uniform resource identifier indicating the path to the requested resource, e.g., /manage/account/login |
Network Connection | Pack_datagram: Hexadecimal representation of the packet |
Network Connection | Data_length: Length of the requested data |
Method | Clustering Algorithm | ACC | NMI | ARI | SI | CHI | DBI |
---|---|---|---|---|---|---|---|
non-Graph | K-means | 0.7913 | 0.3948 | 0.2639 | 0.4095 | 5546.8930 | 0.9796 |
non-Graph | Spectral | 0.7104 | 0.3133 | 0.2461 | 0.5213 | 3920.2341 | 0.8721 |
non-Graph | DBSCAN | 0.4233 | 0.2015 | 0.1971 | 0.1358 | 379.3614 | 1.3172 |
Graph-G | K-means | 0.8012 | 0.5542 | 0.4179 | 0.7167 | 23,914.2313 | 0.4496 |
Graph-G | Spectral | 0.7728 | 0.4854 | 0.4253 | 0.7741 | 9176.2561 | 0.6182 |
Graph-G | DBSCAN | 0.5144 | 0.2448 | 0.2175 | 0.3142 | 1052.6348 | 2.9178 |
Graph-LG | K-means | 0.8341 * | 0.6074 * | 0.5652 * | 0.7828 * | 31,034.5827 * | 0.4235 * |
Graph-LG | Spectral | 0.7543 | 0.5188 | 0.4732 | 0.7182 | 10,378.5347 | 0.6017 |
Graph-LG | DBSCAN | 0.5214 | 0.3437 | 0.2831 | 0.2876 | 987.4192 | 3.7206 |
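
For readers unfamiliar with the external metrics in the table above, the following sketch shows one common way to compute clustering accuracy (ACC) and the adjusted Rand index (ARI) from ground-truth and predicted labels. It is an assumption that the reported values follow these standard definitions. The function names are hypothetical, and the brute-force label matching is only practical for a handful of clusters; with 19 organizations, an assignment solver such as SciPy's `linear_sum_assignment` would be the usual choice.

```python
from collections import Counter
from itertools import permutations
from math import comb


def clustering_accuracy(true_labels, pred_labels):
    """Accuracy under the best one-to-one mapping of predicted to true labels."""
    true_ids = sorted(set(true_labels))
    pred_ids = sorted(set(pred_labels))
    overlap = Counter(zip(true_labels, pred_labels))
    # Pad with None so every predicted cluster can be mapped, possibly to nothing.
    padded = true_ids + [None] * max(0, len(pred_ids) - len(true_ids))
    best = max(sum(overlap[(t, p)] for t, p in zip(perm, pred_ids))
               for perm in permutations(padded, len(pred_ids)))
    return best / len(true_labels)


def adjusted_rand_index(true_labels, pred_labels):
    """ARI computed from the contingency table of the two labelings."""
    n = len(true_labels)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(true_labels, pred_labels)).values())
    sum_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())
    expected = sum_true * sum_pred / comb(n, 2)
    maximum = (sum_true + sum_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings put everything in one cluster
    return (sum_cells - expected) / (maximum - expected)


# Toy example: cluster ids are swapped relative to the ground truth.
truth = [0, 0, 0, 1, 1, 1]
swapped = [1, 1, 1, 0, 0, 0]
acc_swapped = clustering_accuracy(truth, swapped)
ari_swapped = adjusted_rand_index(truth, swapped)
```

Both metrics ignore the arbitrary numbering of clusters, so the swapped labeling in the toy example scores 1.0 on each.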
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Xue, P.; Dong, L.; Wang, C.; Huang, C.; Wang, J. A Network Scanning Organization Discovery Method Based on Graph Convolutional Neural Network. Information 2025, 16, 899. https://doi.org/10.3390/info16100899