A Novel Logo Identification Technique for Logo-Based Phishing Detection in Cyber-Physical Systems
Abstract
1. Introduction
- We propose a novel logo-identification mechanism for logo-based phishing detection.
- The proposed mechanism uses the hue value ratio and the pixel density distribution of a logo, which together uniquely define its features, to identify well-known brands.
- A total of 21 brands with 48 different classes of logos are used for training the ML model.
- Among the evaluated models, the ensemble random forest algorithm achieves the best detection accuracy, at 87%.
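To make the notion of a hue value ratio concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that a logo is available as a list of (r, g, b) tuples, that hue is expressed in degrees (0 to 359) and that the ratio is the percentage of pixels carrying each hue; the function name `hue_ratios` and the sample pixels are hypothetical.

```python
import colorsys
from collections import Counter


def hue_ratios(pixels):
    """Return {hue in degrees: percentage of pixels} for (r, g, b) tuples.

    Assumption: 8-bit RGB input and a 0-359 degree hue scale; the paper's
    actual scale and normalisation may differ.
    """
    counts = Counter()
    for r, g, b in pixels:
        # colorsys expects channel values in [0, 1] and returns hue in [0, 1).
        h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        counts[int(round(h * 360)) % 360] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {hue: 100.0 * n / total for hue, n in sorted(counts.items())}


# Hypothetical 4-pixel "logo": three pure-red pixels and one pure-blue pixel.
sample = [(255, 0, 0), (255, 0, 0), (255, 0, 0), (0, 0, 255)]
ratios = hue_ratios(sample)
print(ratios)
```

For the sample above, red maps to hue 0 and blue to hue 240, so three quarters of the pixels fall on hue 0. Note that grey pixels also report hue 0 in this conversion, so a practical version would probably need to filter out low-saturation pixels first.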
2. Overview of Phishing Attacks
Classification of Various Phishing Attack Detection Mechanisms
- URL-scan-based approach [11]: These mechanisms scan a suspicious URL and parse various aspects of interest, evaluating their patterns to detect anomalies associated with phishing sites. The approach also discovers possible phishing URLs by constructing combinatorial variants of current legitimate URLs and determining whether they exist and are involved in phishing-related activity on the internet.
- Content-based approach: The content-based technique identifies keywords and patterns in the text of the received email and/or in the Uniform Resource Locator (URL), components and document object model of the website. The extracted contents of the web-page are then examined and used for detection.
- Heuristic-learning-based approach [2]: This approach detects anomalies in a website using a set of heuristics developed through long-term observation. These techniques extract a set of elements, such as text, images and URL-specific information, from a reported website. An anomaly is then detected by applying the defined heuristics together with the rules or thresholds obtained from the assimilating algorithms.
- Visual-similarity-based approach [12,13]: This approach computes a visual similarity score between web-pages and uses it to detect phishing. The elements used for calculating the score include logos, photos, font size and type, alignment and text location. The goal is to determine whether a reported site is visually similar to a related popular/legitimate site.
- Blacklisting approach [14]: These approaches maintain a database of previously detected phishing sites/links, URLs and domains. URLs reported by users and trusted third parties are used to create the blacklist. A newly arrived link or URL is checked against the blacklisted entries for the closest match. When a match with a high similarity ratio is identified, the link is flagged as a phishing web-site and added to the blacklist.
- Machine Learning (ML) and Hybrid Approach [15]: To enhance efficiency, these strategies integrate one or more of the aforementioned techniques as well as other detection techniques. ML-based hybrid techniques are becoming more popular as a result of their capacity to infer new prospective phishing patterns from existing ones. These strategies make use of a variety of algorithms to fine-tune classification and increase phishing detection precision.
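As an illustration of the blacklisting approach described in the list above, the following minimal sketch checks an incoming URL against a blacklist using a string similarity ratio. It is not taken from any of the cited systems: the use of `difflib.SequenceMatcher`, the threshold of 0.9, the helper names and the example URLs are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def best_blacklist_match(url, blacklist):
    """Return (closest blacklisted entry, similarity ratio in [0, 1])."""
    best_entry, best_ratio = None, 0.0
    for entry in blacklist:
        ratio = SequenceMatcher(None, url, entry).ratio()
        if ratio > best_ratio:
            best_entry, best_ratio = entry, ratio
    return best_entry, best_ratio


def check_url(url, blacklist, threshold=0.9):
    """Flag url if it is similar enough to a known entry, then record it.

    Assumption: threshold=0.9 is an arbitrary illustrative cut-off.
    """
    _entry, ratio = best_blacklist_match(url, blacklist)
    flagged = ratio >= threshold
    if flagged and url not in blacklist:
        blacklist.append(url)  # newly flagged URLs extend the blacklist
    return flagged


# Hypothetical blacklist using the reserved ".example" top-level domain.
blacklist = ["http://paypa1-login.example/verify"]
near_variant = check_url("http://paypa1-login.example/verify2", blacklist)
unrelated = check_url("https://www.wikipedia.org/", blacklist)
print(near_variant, unrelated, len(blacklist))
```

A character-level ratio is only one possible similarity measure; a deployed system would more likely compare normalised domains or URL tokens, and the threshold would need to be tuned against false positives.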
3. Logo-Based Phishing Detection
4. Literature Survey
5. Proposed Work
Algorithm 1: The algorithm for logo feature extraction.
Input: Set of logo images
Output: Set of feature dictionaries
Algorithm 2: The algorithm for training the ML models.
Input: Set of feature dictionaries
Output: Set of trained ML models
distinctHue: [0, 5, 6, 15, 30, 90, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 150]
hueCluster (repHue -> hueMembers):
4 -> [0, 5, 6]
15 -> [15]
30 -> [30]
90 -> [90]
102 -> [98, 99, 100, 101, 102, 103, 104, 105, 106, 107]
150 -> [150]
hueValue: 102
pixratio: 16.0
pixels: 596
relativeDistanceDistribution: [0.0: 0, 0.1: 0, 0.2: 19, 0.3: 30, 0.4: 21, 0.5: 20, 0.6: 2, 0.7: 4, 0.8: 2, 0.9: 2, 1.0: 0]
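The grouping of distinctHue values into hueCluster entries in the example above can be reproduced with a simple one-dimensional gap rule. This is a sketch under stated assumptions, not the authors' algorithm: it assumes that neighbouring hue values are merged when they differ by at most `max_gap` (5 here, chosen only because it reproduces the example) and that the representative hue is the rounded mean of the cluster members. The function name `cluster_hues` is hypothetical.

```python
def cluster_hues(distinct_hues, max_gap=5):
    """Group hue values; a gap larger than max_gap starts a new cluster.

    Assumptions: max_gap=5 and the rounded-mean representative are
    illustrative choices, not values stated in the paper.
    """
    clusters = []
    for hue in sorted(distinct_hues):
        if clusters and hue - clusters[-1][-1] <= max_gap:
            clusters[-1].append(hue)  # close enough to the previous member
        else:
            clusters.append([hue])    # start a new cluster
    # Python's round() rounds halves to the nearest even integer,
    # so a mean of 102.5 becomes 102.
    return {round(sum(c) / len(c)): c for c in clusters}


distinct_hue = [0, 5, 6, 15, 30, 90, 98, 99, 100, 101, 102,
                103, 104, 105, 106, 107, 150]
hue_cluster = cluster_hues(distinct_hue)
for rep_hue, members in hue_cluster.items():
    print(rep_hue, "->", members)
```

With these assumptions the sketch yields the same six clusters and representative hues as the example (4, 15, 30, 90, 102 and 150). This gap rule behaves like single-linkage agglomerative clustering with a distance cut-off; a different cut-off or linkage criterion would change the grouping.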
6. Experiment Overview
Dataset Description & Analysis
7. Results
8. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Ramzan, Z. Phishing attacks and countermeasures. In Handbook of Information and Communication Security; Stavroulakis, P., Ed.; Springer: Berlin/Heidelberg, Germany, 2010; pp. 433–448.
- Mishra, A.K.; Tripathy, A.K.; Saraswathi, S.; Das, M. Prevention of Phishing Attack in Internet-of-Things based Cyber-Physical Human System. In High Performance Vision Intelligence; Springer: Berlin/Heidelberg, Germany, 2020; pp. 15–32.
- Sahoo, B.; Rath, S.; Puthal, D. Energy efficient protocols for wireless sensor networks: A survey and approach. Int. J. Comput. Appl. 2012, 44, 43–48.
- Bhatt, P.; Thakker, B. A novel forecastive anomaly based botnet revelation framework for competing concerns in internet of things. J. Appl. Secur. Res. 2021, 16, 258–278.
- Varshney, G.; Misra, M.; Atrey, P.K. A survey and classification of web phishing detection schemes. Secur. Commun. Netw. 2016, 9, 6266–6284.
- Das, M.; Saraswathi, S.; Panda, R.; Mishra, A.K.; Tripathy, A.K. Exquisite Analysis of Popular Machine Learning–Based Phishing Detection Techniques for Cyber Systems. J. Appl. Secur. Res. 2021, 16, 538–562.
- Gangavarapu, T.; Jaidhar, C.; Chanduka, B. Applicability of machine learning in spam and phishing email filtering: Review and approaches. Artif. Intell. Rev. 2020, 53, 5019–5081.
- Halevi, T.; Memon, N.; Nov, O. Spear-phishing in the wild: A real-world study of personality, phishing self-efficacy and vulnerability to spear-phishing attacks. Phishing-Self-Effic. Vulnerability Spear-Phishing Attacks 2015, 2015.
- Bullee, J.W.; Montoya, L.; Junger, M.; Hartel, P. Spear phishing in organisations explained. Inf. Comput. Secur. 2017, 25, 1–21.
- Zuraiq, A.A.; Alkasassbeh, M. Phishing detection approaches. In Proceedings of the 2019 Second International Conference on New Trends in Computing Sciences (ICTCS), Amman, Jordan, 9–11 October 2019; pp. 1–6.
- Almeida, R.; Westphall, C. Heuristic Phishing Detection and URL Checking Methodology Based on Scraping and Web Crawling. In Proceedings of the 2020 IEEE International Conference on Intelligence and Security Informatics (ISI), Arlington, VA, USA, 9–10 November 2020; pp. 1–6.
- Medvet, E.; Kirda, E.; Kruegel, C. Visual-similarity-based phishing detection. In Proceedings of the Fourth International Conference on Security and Privacy in Communication Networks, Istanbul, Turkey, 22–25 September 2008; pp. 1–6.
- Jain, A.K.; Gupta, B.B. Phishing detection: Analysis of visual similarity based approaches. Secur. Commun. Netw. 2017, 2017, 1–21.
- Hara, M.; Yamada, A.; Miyake, Y. Visual similarity-based phishing detection without victim site information. In Proceedings of the 2009 IEEE Symposium on Computational Intelligence in Cyber Security, Nashville, TN, USA, 30 March–2 April 2009; pp. 30–36.
- Kumar, A.; Chatterjee, J.M.; Díaz, V.G. A novel hybrid approach of SVM combined with NLP and probabilistic neural network for email phishing. Int. J. Electr. Comput. Eng. 2020, 10, 486.
- Bozkir, A.S.; Aydos, M. LogoSENSE: A companion HOG based logo detection scheme for phishing web page and E-mail brand recognition. Comput. Secur. 2020, 95, 101855.
- Chiew, K.L.; Chang, E.H.; Tiong, W.K. Utilisation of website logo for phishing detection. Comput. Secur. 2015, 54, 16–26.
- Bianco, S.; Buzzelli, M.; Mazzini, D.; Schettini, R. Deep learning for logo recognition. Neurocomputing 2017, 245, 23–30.
- Yao, W.; Ding, Y.; Li, X. Deep learning for phishing detection. In Proceedings of the 2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom), Melbourne, VIC, Australia, 11–13 December 2018; pp. 645–650.
- Peng, T.; Harris, I.; Sawa, Y. Detecting phishing attacks using natural language processing and machine learning. In Proceedings of the 2018 IEEE 12th International Conference on Semantic Computing (ICSC), Laguna Hills, CA, USA, 31 January–2 February 2018; pp. 300–301.
- Ding, Y.; Luktarhan, N.; Li, K.; Slamu, W. A keyword-based combination approach for detecting phishing webpages. Comput. Secur. 2019, 84, 256–275.
- Rao, R.S.; Pais, A.R. Jail-Phish: An improved search engine based phishing detection system. Comput. Secur. 2019, 83, 246–267.
- Azeez, N.A.; Misra, S.; Margaret, I.A.; Fernandez-Sanz, L.; Abdulhamid, S.M. Adopting automated whitelist approach for detecting phishing attacks. Comput. Secur. 2021, 108, 102328.
- Lin, Y.; Liu, R.; Divakaran, D.M.; Ng, J.Y.; Chan, Q.Z.; Lu, Y.; Si, Y.; Zhang, F.; Dong, J.S. Phishpedia: A Hybrid Deep Learning Based Approach to Visually Identify Phishing Webpages. In Proceedings of the 30th USENIX Security Symposium (USENIX Security 21), Virtual Event, 11–13 August 2021; pp. 3793–3810.
- Butnaru, A.; Mylonas, A.; Pitropakis, N. Towards Lightweight URL-Based Phishing Detection. Future Internet 2021, 13, 154.
- Gupta, B.B.; Yadav, K.; Razzak, I.; Psannis, K.; Castiglione, A.; Chang, X. A novel approach for phishing URLs detection using lexical based machine learning in a real-time environment. Comput. Commun. 2021, 175, 47–57.
- Moedjahedy, J.; Setyanto, A.; Alarfaj, F.K.; Alreshoodi, M. CCrFS: Combine Correlation Features Selection for Detecting Phishing Websites Using Machine Learning. Future Internet 2022, 14, 229.
- Liu, R.; Lin, Y.; Yang, X.; Ng, S.H.; Divakaran, D.M.; Dong, J.S. Inferring Phishing Intention via Webpage Appearance and Dynamics: A Deep Vision Based Approach. In Proceedings of the 30th USENIX Security Symposium (USENIX Security 21), Virtual Event, 11–13 August 2021.
- Dou, Z.; Khalil, I.; Khreishah, A.; Al-Fuqaha, A.; Guizani, M. Systematization of Knowledge (SoK): A Systematic Review of Software-Based Web Phishing Detection. IEEE Commun. Surv. Tutor. 2017, 19, 2797–2819.
- Alabdan, R. Phishing Attacks Survey: Types, Vectors, and Technical Approaches. Future Internet 2020, 12, 168.
- Almomani, A.; Alauthman, M.; Shatnawi, M.T.; Alweshah, M.; Alrosan, A.; Alomoush, W.; Gupta, B.B. Phishing Website Detection With Semantic Features Based on Machine Learning Classifiers: A Comparative Study. Int. J. Semant. Web Inf. Syst. (IJSWIS) 2022, 18, 1–24.
- Jain, A.K.; Gupta, B. A survey of phishing attack techniques, defence mechanisms and open research challenges. Enterp. Inf. Syst. 2022, 16, 527–565.
- Ahn, J.S.; Lee, Y.K. Color distribution of a shade guide in the value, chroma, and hue scale. J. Prosthet. Dent. 2008, 100, 18–28.
- Bouguettaya, A.; Yu, Q.; Liu, X.; Zhou, X.; Song, A. Efficient agglomerative hierarchical clustering. Expert Syst. Appl. 2015, 42, 2785–2797.
- Qian, B.; Su, J.; Wen, Z.; Jha, D.N.; Li, Y.; Guan, Y.; Puthal, D.; James, P.; Yang, R.; Zomaya, A.Y.; et al. Orchestrating the development lifecycle of machine learning-based IoT applications: A taxonomy and survey. ACM Comput. Surv. (CSUR) 2020, 53, 1–47.
- Rajora, S.; Li, D.L.; Jha, C.; Bharill, N.; Patel, O.P.; Joshi, S.; Puthal, D.; Prasad, M. A comparative study of machine learning techniques for credit card fraud detection based on time variance. In Proceedings of the 2018 IEEE Symposium Series on Computational Intelligence (SSCI), Bangalore, India, 18–21 November 2018; pp. 1958–1963.
Sl. No. | Title | Authors | Year | Keynotes |
---|---|---|---|---|
1 | Utilisation of website logo for phishing detection | Chiew et al. [17] | 2015 | Used content-based image retrieval features from the Google image database and image similarity for phishing detection |
2 | Deep learning for logo recognition | Bianco et al. [18] | 2017 | CNN-based deep-learning model for logo detection |
3 | Detecting Phishing Attacks Using NLP and ML | Peng et al. [20] | 2018 | NLP-based approach that analyses the text and content of webpages to detect phishing attacks |
4 | Deep Learning for Phishing Detection | Yao et al. [19] | 2018 | Used both the URL and the logo to propose a phishing-detection mechanism |
5 | Jail-Phish: An improved search-engine-based phishing detection system | Rao and Pais [22] | 2019 | Used website contents, such as the logo, favicon, images and text, for phishing detection |
6 | A keyword-based combination approach for detecting phishing webpages | Ding et al. [21] | 2019 | A compound approach of search engine, heuristic rules and LR classifier is used to detect a phishing site |
7 | LogoSENSE: A companion HOG-based logo detection scheme for phishing brand recognition | Bozkir and Aydos [16] | 2020 | Used Histogram of Oriented Gradients to obtain a visual representation of the logo image and hence recognize phishing web pages |
8 | Adopting automated whitelist approach for detecting phishing attacks | Azeez et al. [23] | 2021 | Proposed an automated whitelist-based approach to detect phishing web-pages by using hyperlink parameters |
9 | Phishpedia: A Hybrid Deep Learning Based Approach to Visually Identify Phishing Webpages | Lin et al. [24] | 2021 | Presented a hybrid deep-learning system that matches logo images of the same brand with the website logo to detect phishing |
10 | Towards Lightweight URL-Based Phishing Detection | Butnaru et al. [25] | 2021 | Used only URL-based features to train ML algorithms and detect phishing |
11 | A novel approach for phishing URLs detection using lexical-based machine learning in a real-time environment | Gupta et al. [26] | 2021 | Used nine features of a URL to train ML algorithms and detect a phishing URL |
12 | CCrFS: Combine Correlation Features Selection for Detecting Phishing Websites Using Machine Learning | Moedjahedy et al. [27] | 2022 | Used a combined approach of correlation and recursive feature elimination to limit the number of features used for phishing detection |
Sl. No. | Title | Authors | Year | Keynotes |
---|---|---|---|---|
1 | A survey and classification of web phishing detection schemes | Varshney et al. [5] | 2016 | Presented a comprehensive analysis of phished website detection and outlined its advantages and disadvantages |
2 | Phishing Detection: Analysis of Visual Similarity Based Approaches | Jain and Gupta [13] | 2017 | Presented a comprehensive and comparative analysis of phishing attacks and detection using visual-similarity-based approaches |
3 | Systematization of Knowledge (SoK): A Systematic Review of Software-Based Web Phishing Detection | Dou et al. [29] | 2017 | Systematized study and analysis of phishing-detection techniques |
4 | SoK: A Comprehensive Reexamination of Phishing Research From the Security Perspective | Abhisha Das et al. | 2020 | Re-examined phishing research works, categorizing the existing works based on attack vectors and examining the properties and features used for phishing detection |
5 | Phishing Attacks Survey: Types, Vectors and Technical Approaches | Alabdan [30] | 2020 | A comprehensive survey of attack properties and various detection mechanisms |
6 | Exquisite Analysis of Popular Machine Learning–Based Phishing Detection Techniques for Cyber Systems | Meenakshi Das et al. [6] | 2021 | Performed an exquisite analysis of various machine-learning-based phishing-detection techniques, including the analysis and taxonomy used in various methods |
7 | Phishing Website Detection With Semantic Features Based on Machine Learning Classifiers: A Comparative Study | Almomani et al. [31] | 2022 | Classified the ML-based detection techniques and performed an experiment-based accuracy comparison of these techniques |
8 | A survey of phishing attack techniques, defence mechanisms and open research challenges | Jain and Gupta [32] | 2022 | Emphasized the distribution procedure of phishing attacks, highlighted the consequences of phishing threats and listed various challenges involved in phishing detection |
Class ID | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|---|
Class Name | apple | axis | boa | bob | boi | chase | dhl |
Class ID | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
Class Name | ebay | fb | fedex | gplay | hdfc | icici | |
Class ID | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
Class Name | ms | netflix | paypal | sbi | vodafone | yahoo | yes |
Parameters | LogoSENSE | Proposed Work |
---|---|---|
OS | Ubuntu 18.04 | Windows 10 |
Processor | Intel i7 | Intel i5 |
RAM size | 24 GB | 8 GB |
Dataset | Phishtank, Phishbank | Own |
Training Dataset Size | 3060 | 432 |
Testing Dataset Size | 864 | 106 |
No. of epochs | 1000 | 500 |
Number of brands | 15 | 21 |
Number of classes | 16 | 48 |
Overall accuracy | 93.5% | 87% |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Panda, P.; Mishra, A.K.; Puthal, D. A Novel Logo Identification Technique for Logo-Based Phishing Detection in Cyber-Physical Systems. Future Internet 2022, 14, 241. https://doi.org/10.3390/fi14080241