Convolutional Graph Network-Based Feature Extraction to Detect Phishing Attacks
Abstract
1. Introduction
- Improved accuracy and adaptability to evolving phishing behaviors, achieved by using graph convolutional networks (GCNs) for feature extraction and combining them with support vector machines (SVMs).
- A new feature selection process based on Manhattan similarity and the neighborhood random walk (NRW) method, which allows the model to capture the relationships between features dynamically.
- The use of the hinge loss function together with similarity metrics to improve classification performance, strengthening the model's ability to distinguish legitimate sites from phishing sites.
2. Literature Review
3. Materials and Methods
3.1. Preprocessing
- x_norm is the normalized feature value;
- μ is the mean of the feature;
- x is the original (unnormalized) feature value;
- σ is the standard deviation of the feature.
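These definitions describe z-score normalization, x_norm = (x − μ)/σ, applied to each feature independently. The minimal NumPy sketch below assumes the features are the columns of a 2-D array and uses the population standard deviation. The guard that maps constant features (σ = 0) to zero is our own assumption, not something the paper specifies.

```python
import numpy as np


def zscore_normalize(X: np.ndarray) -> np.ndarray:
    """Column-wise z-score: x_norm = (x - mu) / sigma for each feature."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)        # mean of each feature (column)
    sigma = X.std(axis=0)      # population standard deviation (ddof=0) of each feature
    # Assumption: constant features (sigma == 0) become 0 instead of dividing by zero.
    safe_sigma = np.where(sigma == 0, 1.0, sigma)
    return (X - mu) / safe_sigma


# Usage: two features, the second one constant
X_demo = np.array([[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]])
Z_demo = zscore_normalize(X_demo)
```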
3.2. Extracting Features Based on a Graph Convolutional Network
- P_V contains the (transition) probabilities between the nodes corresponding to the data samples;
- A denotes the relations from the data-sample nodes to the feature nodes V;
- B denotes the connections from the feature nodes V back to the data-sample nodes;
- O is a zero block: the relation between feature nodes V in the matrix is null, because no feature node V is related to another feature node (or to itself).
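The four blocks match the augmented-graph transition matrix used in SA-Cluster [43], which can be written as P = [[P_V, A], [B, O]]. Here the first n rows and columns index data samples and the remaining m index feature nodes. The sketch below assembles such a matrix with NumPy and computes neighborhood random-walk (NRW) proximities R = Σ_{γ=1}^{l} c(1 − c)^γ P^γ, following SA-Cluster. It illustrates the idea under stated assumptions and is not the authors' code. The binary sample-to-feature incidence F, the single `attr_weight` on feature edges, the plain row normalization, and the values c = 0.15 and l = 3 are all hypothetical.

```python
import numpy as np


def augmented_transition_matrix(S, F, attr_weight=1.0):
    """Row-stochastic P = [[P_V, A], [B, O]] over n sample nodes and m feature nodes.

    S: (n, n) non-negative sample-to-sample edge weights with a zero diagonal.
    F: (n, m) binary sample-to-feature incidence (1 if the sample has the feature).
    attr_weight: hypothetical weight given to sample-to-feature edges.
    """
    S = np.asarray(S, dtype=float)
    F = np.asarray(F, dtype=float)
    m = F.shape[1]
    W = np.block([
        [S, attr_weight * F],        # sample rows: to samples (P_V block), to features (A block)
        [F.T, np.zeros((m, m))],     # feature rows: back to samples (B block); O is all zeros
    ])
    row_sums = W.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0    # isolated nodes keep an all-zero row
    return W / row_sums


def neighborhood_random_walk(P, c=0.15, length=3):
    """R = sum over g = 1..length of c * (1 - c)**g * P^g (SA-Cluster random-walk distance)."""
    R = np.zeros_like(P)
    P_power = np.eye(P.shape[0])
    for g in range(1, length + 1):
        P_power = P_power @ P        # P raised to the power g
        R += c * (1 - c) ** g * P_power
    return R


# Toy example: 3 samples and 2 binary features (hypothetical values)
S = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
F = np.array([[1, 0], [1, 1], [0, 1]])
P = augmented_transition_matrix(S, F)
R = neighborhood_random_walk(P)
n = S.shape[0]
sample_proximity = R[:n, :n]         # random-walk proximity between data samples
```

The sample-to-sample block of R is the part a feature-relationship or clustering step would typically read, since it mixes direct links with paths that pass through shared feature nodes.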
Algorithm 1: Phishing Detection with GNN and SVM
Input:
3.3. Phishing Detection with SVM
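The contributions state that the classifier relies on the hinge loss. As a hedged illustration, the sketch below trains a linear SVM by full-batch subgradient descent on the L2-regularized hinge loss (λ/2)‖w‖² + (1/N) Σ max(0, 1 − y_i(w·x_i + b)). It assumes labels in {−1, +1} (for example, +1 for phishing and −1 for legitimate) and inputs that are GCN embeddings. The regularization λ, the learning rate, and the epoch count are hypothetical. The kernel and solver used in the paper are not stated in this section, so the linear model is only an assumption.

```python
import numpy as np


def hinge_objective(w, b, X, y, lam):
    """(lam / 2) * ||w||^2 + mean(max(0, 1 - y * (X @ w + b)))."""
    margins = y * (X @ w + b)
    return 0.5 * lam * np.dot(w, w) + np.mean(np.maximum(0.0, 1.0 - margins))


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularized hinge loss; y holds -1/+1 labels."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1.0       # samples inside the margin or misclassified
        # Subgradient of the mean hinge term plus the gradient of (lam / 2) * ||w||^2
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w, b, X):
    """Label +1 where the decision value is non-negative, otherwise -1."""
    return np.where(np.asarray(X, dtype=float) @ w + b >= 0.0, 1, -1)


# Toy, linearly separable 2-D "embeddings" (hypothetical data)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 0.5, (20, 2)), rng.normal(-2.0, 0.5, (20, 2))])
y = np.array([1] * 20 + [-1] * 20)
w, b = train_linear_svm(X, y)
train_accuracy = np.mean(predict(w, b, X) == y)
```

A kernel SVM (for example, scikit-learn's `sklearn.svm.SVC`) minimizes the same hinge loss through its dual formulation; the linear primal version is shown here only because it fits in a few lines.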
3.4. Computational Complexity
4. Experiments
4.1. Dataset
- Lexical features: These features included the dash count, symbols, domain length, the presence of an IP address, and domain depth (the number of dots in the domain name).
- Content features: These features included well-formed HTML, the presence of an iframe, and a form with a URL, together with the references held in form and other elements through their src, href, and action attributes.
- Domain features: These features included the domain age (the number of seconds between the last update and the expiry date), certificate validity (verified dynamically via Rustls), and certificate reliability (computed from the certificate's validity period and whether its issuer is trusted).
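The lexical features listed above can be computed directly from the URL string. The following Python sketch uses only the standard library (`urllib.parse` and `ipaddress`). The function name, the dictionary keys, and the choice to count dashes over the whole URL are assumptions. The symbol feature is omitted because its exact definition is not given.

```python
import ipaddress
from urllib.parse import urlparse


def lexical_features(url: str) -> dict:
    """Hypothetical extractor for some of the lexical URL features in Section 4.1."""
    if "://" not in url:
        url = "http://" + url          # urlparse only fills the host when a scheme is present
    host = urlparse(url).hostname or ""
    try:
        ipaddress.ip_address(host)     # raises ValueError for ordinary domain names
        has_ip = 1
    except ValueError:
        has_ip = 0
    return {
        "dash_count": url.count("-"),      # assumption: dashes counted over the whole URL
        "domain_length": len(host),
        "has_ip_address": has_ip,
        "domain_depth": host.count("."),   # number of dots in the domain name
    }


ip_url = lexical_features("http://192.168.10.5/secure-login")
brand_url = lexical_features("paypal-account-verify.example.com/login")
```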
4.2. Experimental Setup
- The hidden-layer sizes (final: [64, 32]);
- The learning rate (final: 0.001);
- The dropout probability (final: 0.1);
- The alpha threshold for graph edge construction (final: 0.95);
- The batch size (final: 16);
- The number of epochs (final: 50).
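The following minimal NumPy sketch shows how the reported hyperparameters fit together: graph construction followed by a two-layer GCN forward pass with hidden sizes [64, 32] and dropout 0.1. Several parts are assumptions rather than the paper's definitions. An edge joins two samples when the normalized Manhattan similarity 1 − d_ij / max(d) reaches alpha = 0.95. Each layer applies the standard Kipf and Welling propagation rule H ← ReLU(Â H W) with Â = D^{-1/2}(A + I)D^{-1/2}. Weights are randomly initialized, and training with learning rate 0.001, batch size 16, and 50 epochs is left out.

```python
import numpy as np


def manhattan_similarity_graph(X, alpha=0.95):
    """Edge i-j when the normalized Manhattan similarity is >= alpha.

    Assumption: similarity = 1 - d_ij / max(d), where d_ij is the L1 distance.
    """
    X = np.asarray(X, dtype=float)
    d = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)   # pairwise L1 distances
    d_max = d.max()
    sim = 1.0 - d / d_max if d_max > 0 else np.ones_like(d)
    A = (sim >= alpha).astype(float)
    np.fill_diagonal(A, 0.0)         # self-loops are added during normalization
    return A


def normalize_adjacency(A):
    """A_hat = D^{-1/2} (A + I) D^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_forward(X, A_hat, weights, dropout=0.1, training=False, rng=None):
    """Stacked layers H <- ReLU(A_hat @ H @ W), with inverted dropout on each layer input."""
    if rng is None:
        rng = np.random.default_rng()
    H = np.asarray(X, dtype=float)
    for W in weights:
        if training and dropout > 0:
            mask = rng.random(H.shape) >= dropout
            H = H * mask / (1.0 - dropout)
        H = np.maximum(0.0, A_hat @ H @ W)
    return H


rng = np.random.default_rng(42)
X = rng.normal(size=(100, 12))       # 100 samples, 12 hypothetical features
hidden = [64, 32]                    # hidden-layer sizes reported in Section 4.2
dims = [X.shape[1]] + hidden
weights = [rng.normal(0.0, np.sqrt(2.0 / i), size=(i, o)) for i, o in zip(dims[:-1], dims[1:])]
A = manhattan_similarity_graph(X, alpha=0.95)
A_hat = normalize_adjacency(A)
embeddings = gcn_forward(X, A_hat, weights, dropout=0.1, training=False)
```

With alpha as high as 0.95, random toy data produces very few edges, so Â stays close to the identity; on real feature vectors the threshold controls how dense the sample graph becomes. The 32-dimensional rows of `embeddings` are the kind of input the SVM stage would receive.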
4.3. Evaluation Criteria
4.4. Results and Discussion
4.5. Limitations
- The dataset primarily includes phishing and legitimate URLs from publicly available sources, which may not fully capture new phishing patterns, zero-day attacks, or region-specific phishing campaigns.
- Our model relies on handcrafted lexical, content, and domain features, which phishing websites may deliberately disguise or manipulate to evade detection.
- Although the model was evaluated on a public dataset, its applicability to other domains or to multilingual phishing sites warrants further examination.
- GNN-based models can be difficult to interpret, which is a drawback in security-critical applications; more research is needed to improve their explainability for incident response.
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
SVM | Support Vector Machine |
GNN | Graph Neural Network |
APWG | Anti-Phishing Working Group |
IPSs | Intrusion Prevention Systems |
IDSs | Intrusion Detection Systems |
ML | Machine Learning |
ST | Scenario-Based Techniques |
DL | Deep Learning |
NB | Naïve Bayes |
DT | Decision Tree |
RF | Random Forest |
kNN | k-Nearest Neighbor |
CNN | Convolutional Neural Network |
RNN | Recurrent Neural Network |
MLP | Multilayer Perceptron |
GRU | Gated Recurrent Unit |
DNN | Deep Neural Network |
LSTM | Long Short-Term Memory |
GCN | Graph Convolutional Network |
FAR | False-Alarm Rate |
ForestPA | Forest Penalizing Attribute |
AC | Association Classification |
IAC | Intelligent Associative Classification |
GA | Genetic Algorithm |
Bi-LSTM | Bidirectional Long Short-Term Memory |
SI-BBA | Swarm Intelligence Binary Bat Algorithm |
ReLU | Rectified Linear Unit |
GAT | Graph Attention Network |
NRW | Neighborhood Random Walk |
ROC | Receiver Operating Characteristic |
AUC | Area Under the Curve |
PCA | Principal Component Analysis |
References
- Chen, Y.; Zhang, X.; Deng, H. Trust calibration of automated security IT artifacts: A multi-domain study of phishing-website detection tools. Inf. Manag. 2021, 58, 103394.
- Lokesh, G.H.; BoreGowda, G. Phishing website detection based on effective machine learning approach. J. Cyber Secur. Technol. 2021, 5, 1–14.
- Sadiq, A.; Ahmad, R.W.; Salah, K.; Jayaraman, R.; Yaqoob, I. A review of phishing attacks and countermeasures for the Internet of things-based smart business applications in Industry 4.0. Hum. Behav. Emerg. Technol. 2021, 3, 854–864.
- Alkawaz, M.H.; Alhassan, A.M.; Ismail, A.S. A comprehensive survey on identification and analysis of phishing website based on machine learning methods. In Proceedings of the 2021 IEEE 11th IEEE Symposium on Computer Applications & Industrial Electronics (ISCAIE), Penang, Malaysia, 3–4 April 2021.
- Deshpande, A.; Yadav, A.; Borkar, A.; Kale, S. Detection of phishing websites using machine learning. Int. J. Eng. Res. Technol. (IJERT) 2021, 10, 430–434.
- Zolfagharipour, L.; Kadhim, M.H.; Mandeel, T.H. Enhance the security of access to IoT-based equipment in fog. In Proceedings of the 2023 Al-Sadiq International Conference on Communication and Information Technology (AICCIT), Al-Muthana, Iraq, 4–6 July 2023.
- Das, S.; Nippert-Eng, C.; Camp, L.J. Evaluating user susceptibility to phishing attacks. Inf. Comput. Secur. 2022, 30, 1–18.
- Alkhalil, Z.; Hewage, C.; Nawaf, L.; Khan, I. Phishing attacks: A recent comprehensive study and a new anatomy. Front. Comput. Sci. 2021, 3, 563060.
- Chiew, K.L.; Yong, K.S.; Tan, C.L. A survey of phishing attacks: Their types, vectors, and technical approaches. Expert Syst. Appl. 2018, 106, 1–20.
- Petrič, G.; Roer, K. The impact of formal and informal organizational norms on susceptibility to phishing. Telemat. Inform. 2022, 67, 101766.
- Patil, R.R.; Kaur, G.; Jain, H.; Tiwari, A.; Joshi, S.; Rao, K.; Sharma, A. Machine learning approach for phishing website detection: A literature survey. J. Discrete Math. Sci. Cryptogr. 2022, 25, 817–827.
- Al-Hagery, M.A.; Abdalla Musa, A.I. Automated Credit Card Risk Assessment using Fuzzy Parameterized Neutrosophic Hypersoft Expert Set. Int. J. Neutrosophic Sci. (IJNS) 2025, 25, 93–103.
- Patil, S.; Dhage, S. A methodical overview on phishing detection along with an organized way to construct an anti-phishing framework. In Proceedings of the 2019 5th International Conference on Advanced Computing & Communication Systems (ICACCS), Coimbatore, India, 15–16 March 2019.
- Ozcan, A.; Catal, C.; Donmez, E.; Senturk, B. A hybrid DNN–LSTM model for detecting phishing URLs. Neural Comput. Appl. 2021, 34, 10821–10837.
- Zolfagharipour, L.; Kadhim, M.H. A Technique for Efficiently Controlling Centralized Data Congestion in Vehicular Ad Hoc Networks. Int. J. Comput. Networks Appl. 2025, 12, 267–277.
- Kambar, M.E.Z.N.; Esmaeilzadeh, A.; Kim, Y.; Taghva, K. A survey on mobile malware detection methods using machine learning. In Proceedings of the 2022 IEEE 12th Annual Computing and Communication Workshop and Conference (CCWC), Virtual Conference, 26–29 January 2022.
- Do, N.Q.; Selamat, A.; Krejcar, O.; Herrera-Viedma, E.; Fujita, H. Deep learning for phishing detection: Taxonomy, current challenges, and future directions. IEEE Access 2022, 10, 80795–80815.
- Anagora, R.A.R.; Rudini, R.; Taufiq, R.T.R.; Jubaedi, A.D.J.A.D.; Wirawan, R.W.R.; Putra, A.S. The Classification of Phishing Websites using Naive Bayes Classifier Algorithm. Int. J. Sci. Technol. Manag. 2022, 3, 553–562.
- Anupam, S.; Kar, A.K. Phishing website detection using support vector machines and nature-inspired optimization algorithms. Telecommun. Syst. 2021, 76, 17–32.
- Zhu, E.; Ju, Y.; Chen, Z.; Liu, F.; Fang, X. DTOF-ANN: An artificial neural network phishing detection model based on decision tree and optimal features. Appl. Soft Comput. 2020, 95, 106505.
- Zhu, E.; Chen, Z.; Cui, J.; Zhong, H. MOE/RF: A novel phishing detection model based on revised multi-objective evolution optimization algorithm and random forest. IEEE Trans. Netw. Serv. Manag. 2022, 19, 2400–2412.
- Assegie, T.A. K-nearest neighbor based URL identification model for phishing attack detection. Indian J. Artif. Intell. Neural Netw. (IJAINN) 2021, 1, 45–53.
- Alhamad, H.; Alzyadh, T.; Badawi, M.A. Detecting e-banking phishing website using C4.5 algorithm. Int. J. Comput. Sci. Netw. Secur. 2020, 20, 46–52.
- Pandey, P.; Prabhakar, R. An analysis of machine learning techniques (J48 & AdaBoost)-for classification. In Proceedings of the 2016 1st India International Conference on Information Processing (IICIP), Delhi, India, 12–14 August 2016.
- Alsariera, Y.A.; Elijah, A.V.; Balogun, A.O. Phishing website detection: Forest by penalizing attributes algorithm and its enhanced variations. Arab. J. Sci. Eng. 2020, 45, 10459–10470.
- Alqahtani, M. Phishing websites classification using association classification (PWCAC). In Proceedings of the 2019 International Conference on Computer and Information Sciences (ICCIS), Sakaka, Saudi Arabia, 3–4 April 2019.
- Al-Fayoumi, M.; Alwidian, J.; Abusaif, M. Intelligent association classification technique for phishing website detection. Int. Arab J. Inf. Technol. 2020, 17, 488–496.
- Al-Sarem, M.; Saeed, F.; Al-Mekhlafi, Z.G.; Mohammed, B.A.; Al-Hadhrami, T.; Alshammari, M.T.; Alreshidi, A.; Alshammari, T.S. An optimized stacking ensemble model for phishing websites detection. Electronics 2021, 10, 1285.
- Karabatak, M.; Mustafa, T. Performance comparison of classifiers on reduced phishing website dataset. In Proceedings of the 2018 6th International Symposium on Digital Forensic and Security (ISDFS), Antalya, Turkey, 22–25 March 2018.
- Lakshmanarao, A.; Rao, P.S.P.; Krishna, M.M.B. Phishing website detection using novel machine learning fusion approach. In Proceedings of the 2021 International Conference on Artificial Intelligence and Smart Systems (ICAIS), Coimbatore, India, 25–27 March 2021.
- Almousa, M.; Zhang, T.; Sarrafzadeh, A.; Anwar, M. Phishing website detection: How effective are deep learning-based models and hyperparameter optimization? Secur. Privacy 2022, 5, e256.
- Babagoli, M.; Aghababa, M.P.; Solouk, V. Heuristic nonlinear regression strategy for detecting phishing websites. Soft Comput. 2019, 23, 4315–4327.
- Kalabarige, L.R.; Rao, R.S.; Abraham, A.; Gabralla, L.A. Multilayer stacked ensemble learning model to detect phishing websites. IEEE Access 2022, 10, 79543–79552.
- Pavan, R.; Nara, M.; Gopinath, S.; Patil, N. Bayesian optimization and gradient boosting to detect phishing websites. In Proceedings of the 2021 55th Annual Conference on Information Sciences and Systems (CISS), Baltimore, MD, USA, 24–26 March 2021.
- Zaman, S.; Deep, S.M.U.; Kawsar, Z.; Ashaduzzaman; Pritom, A.I. Phishing Website Detection Using Effective Classifiers and Feature Selection Techniques. In Proceedings of the 2019 2nd International Conference on Innovation in Engineering and Technology (ICIET), Dhaka, Bangladesh, 23–24 December 2019.
- Priya, S.; Selvakumar, S.; Velusamy, R.L. Gravitational search-based feature selection for enhanced phishing websites detection. In Proceedings of the 2020 2nd International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), Bangalore, India, 5–7 March 2020.
- Roy, S.S.; Awad, A.I.; Amare, L.A.; Erkihun, M.T.; Anas, M. Multimodel phishing URL detection using LSTM, bidirectional LSTM, and GRU models. Future Internet 2022, 14, 340.
- Kumar, P.P.; Jaya, T.; Rajendran, V. SI-BBA–a novel phishing website detection based on swarm intelligence with deep learning. Mater. Today Proc. 2021, 45, 3741–3745.
- Kulkarni, A.D. Convolution Neural Networks for Phishing Detection. Computer Science Faculty Publications and Presentations, 2023, Paper 23. Available online: http://hdl.handle.net/10950/4224 (accessed on 1 July 2025).
- Yin, K.; Ye, B. Phishing scam detection for Ethereum based on community enhanced graph convolutional networks. In Proceedings of the International Conference on Neural Information Processing, Changsha, China, 20–23 November 2023; pp. 191–206.
- Huang, T.; Lin, D.; Wu, J. Ethereum account classification based on graph convolutional network. IEEE Trans. Circuits Syst. II: Express Briefs 2022, 69, 2528–2532.
- Chen, Z.; Huang, J.; Liu, S.; Long, H. Multiscale feature fusion and graph convolutional network for detecting Ethereum phishing scams. Electronics 2024, 13, 1012.
- Zhou, Y.; Cheng, H.; Yu, J.X. Graph clustering based on structural/attribute similarities. Proc. VLDB Endow. 2009, 2, 718–729.
- Nivaashini, M.; Soundariya, R.S. Deep stacked autoencoder based feature representation for phishing URLs detection. J. Adv. Res. Dyn. Control Syst. 2017, 9, 904–916.
- Gopi, R.; Sathiyamoorthi, V.; Selvakumar, S.; Manikandan, R.; Chatterjee, P.; Jhanjhi, N.Z.; Luhach, A.K. Enhanced method of ANN based model for detection of DDoS attacks on multimedia Internet of Things. Multimed. Tools Appl. 2022, 82, 15979–15993.
- Bilot, T.; Geis, G.; Hammi, B. PhishGNN: A phishing website detection framework using graph neural networks. In Proceedings of the 19th International Conference on Security and Cryptography, Lisbon, Portugal, 11–13 July 2022; pp. 428–435.
- Han, J.; Kamber, M.; Pei, J. Data Mining: Concepts and Techniques, 4th ed.; Morgan Kaufmann: San Francisco, CA, USA, 2022.
Ref. | Authors | Year | Detection Algorithm | Performance (%) |
---|---|---|---|---|
[29] | Karabatak, et al. | 2018 | Machine learning | ACC = 97.58 |
[26] | Alqahtani, Mohammed | 2019 | Phishing Website Association Classification (PWCAC) | ACC = 95.20 |
[32] | Babagoli, et al. | 2019 | Heuristic nonlinear regression strategy | ACC = 92.80 |
[35] | Zaman, Shihabuz, et al. | 2019 | Effective classifiers and feature selection techniques | ACC = 96.25 Precision = 97.1 Recall = 96.3 |
[27] | Al-Fayoumi, et al. | 2020 | Intelligent association classification technique | ACC = 85.36 Precision = 85.8 Recall = 85.7 F-score = 85.7 |
[36] | Priya, et al. | 2020 | Gravitational search-based feature selection | ACC = 95.53 TPR = 94.87 TNR = 96.05 |
[34] | Pavan, Rakesh, et al. | 2021 | Bayesian optimization and gradient boosting | ACC = 97.08 |
[28] | Al-Sarem, Mohammed, et al. | 2021 | Optimized stacking ensemble model | ACC = 97.02 Precision = 96.58 Recall = 98.08 F-score = 97.49 |
[30] | Lakshmanarao, et al. | 2021 | Novel machine learning fusion approach | ACC = 97 |
[31] | Almousa, May, et al. | 2022 | DL-based models and hyperparameter optimization | ACC = 94.5 |
[33] | Kalabarige, Lakshmana Rao, et al. | 2022 | Multilayer stacked ensemble learning model | ACC = 97.76 Precision = 97.34 Recall = 98.07 F-score = 97.70 |
[41] | Huang, et al. | 2022 | Graph convolutional network | ACC = 63.25 Precision = 75.23 Recall = 16.25 F-score = 26.73 |
[39] | Kulkarni, A.D. | 2023 | Convolutional neural networks | ACC = 86.5 |
[38] | Kumar, et al. | 2021 | Binary bat algorithm and neural network | ACC = 94.8 |
[40] | Yin, et al. | 2023 | Community-enhanced GCN-based detection model | ACC = 63.25 Precision = 75.23 Recall = 16.25 F-score = 26.73 |
[42] | Chen, et al. | 2024 | Graph convolutional network | ACC = 87.3 Precision = 87.8 Recall = 89.0 F-score = 88.4 |
Real class | Predicted: Yes | Predicted: No |
---|---|---|
Yes | TP | FN |
No | FP | TN |
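Standard metrics such as accuracy, precision, recall, and F-score, which appear in the comparison table, follow directly from the four cells of this confusion matrix. The small Python sketch below assumes the positive (phishing) class is the "Yes" row and that the false-alarm rate (FAR) means FP / (FP + TN). The function name and the counts in the usage line are hypothetical.

```python
def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Accuracy, precision, recall, F-score, and false-alarm rate from confusion-matrix cells."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0      # true-positive rate
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    far = fp / (fp + tn) if fp + tn else 0.0         # assumption: FAR = false-positive rate
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1, "far": far}


# Usage with hypothetical counts
metrics = classification_metrics(tp=90, fn=10, fp=5, tn=95)
```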
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Shakir, S.S.; Mohammad Khanli, L.; Emami, H. Convolutional Graph Network-Based Feature Extraction to Detect Phishing Attacks. Future Internet 2025, 17, 331. https://doi.org/10.3390/fi17080331
Shakir SS, Mohammad Khanli L, Emami H. Convolutional Graph Network-Based Feature Extraction to Detect Phishing Attacks. Future Internet. 2025; 17(8):331. https://doi.org/10.3390/fi17080331
Chicago/Turabian Style: Shakir, Saif Safaa, Leyli Mohammad Khanli, and Hojjat Emami. 2025. "Convolutional Graph Network-Based Feature Extraction to Detect Phishing Attacks" Future Internet 17, no. 8: 331. https://doi.org/10.3390/fi17080331
APA Style: Shakir, S. S., Mohammad Khanli, L., & Emami, H. (2025). Convolutional Graph Network-Based Feature Extraction to Detect Phishing Attacks. Future Internet, 17(8), 331. https://doi.org/10.3390/fi17080331