Machine Learning-Driven Detection of Cross-Site Scripting Attacks
Abstract
1. Introduction
- Client-side (Document Object Model (DOM)-based XSS);
- Server-side (persistent and non-persistent XSS).
- Create an ML-based model: We provide an ML model that improves the precision and effectiveness of XSS detection in web applications.
- Identify ideal features: To detect XSS attacks accurately while reducing false alarms, we identify the most informative features and data sources for ML model training.
- Assess the efficacy of ML-based detection systems: We evaluate the accuracy, efficiency, and reliability of the ML-based approach by comparing it with state-of-the-art detection techniques.
- Examine current methods: We briefly summarize the ML and deep learning (DL) algorithms that have been applied to XSS detection.
2. Related Work
3. Research Methodology
3.1. Data Collection
3.2. Preprocessing
3.3. Feature Selection
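The comparison table later in this article lists information gain (IG) as the feature selection method for the proposed models, with 25 features retained. As a minimal sketch of how IG ranking can work for discrete features, the code below scores each feature by the reduction in label entropy it provides and keeps the top k. The function names, the toy data, and the choice of k are assumptions for illustration; the authors' actual selection pipeline is not described here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """IG = H(labels) - sum over v of P(feature = v) * H(labels | feature = v)."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_features(columns, labels, k):
    """Rank feature columns by IG and return the names of the k best."""
    scores = {name: information_gain(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data (hypothetical): 1 = malicious, 0 = benign.
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "url_tag_script": [1, 1, 1, 0, 0, 0],  # separates the classes perfectly
    "html_tag_div": [1, 1, 0, 1, 0, 0],    # weakly informative
    "html_tag_meta": [1, 1, 1, 1, 1, 1],   # constant, carries no information
}
print(top_k_features(columns, labels, k=2))
```

Continuous features such as url_length would need to be binned before this discrete form of IG applies.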
3.4. Model Training
3.5. Evaluation
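For reference, the metrics named in Sections 3.5.1 to 3.5.5 have the following standard definitions in terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). These are the textbook forms, given here on the assumption that the reported scores follow them; the source formulas are not reproduced in this text.

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall}    &= \frac{TP}{TP + FN} \\
F_1              &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \\
\text{AUC}       &= \int_0^1 \text{TPR} \; d(\text{FPR}), \qquad
\text{TPR} = \frac{TP}{TP + FN}, \quad \text{FPR} = \frac{FP}{FP + TN}
\end{align}
```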
3.5.1. Accuracy
3.5.2. Precision
3.5.3. Recall
3.5.4. F1-Score
3.5.5. ROC-AUC
4. Results
4.1. Logistic Regression (LR)
4.2. Support Vector Machine (SVM)
4.3. Multi-Layer Perceptron (MLP)
4.4. Artificial Neural Networks (ANNs)
4.5. Convolutional Neural Networks (CNNs)
4.6. Extreme Gradient Boosting (XGBoost)
4.7. Decision Tree (DT)
4.8. Random Forest (RF)
4.9. Ensemble Model of MLP Classifier and RF
4.10. Ensemble Model of DTs, RF, and GB
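Sections 4.9 and 4.10 name two ensembles, but the rule used to combine the base models is not stated in this text. One common option is soft voting, in which each model's predicted probability of the malicious class is averaged and the result is thresholded. The sketch below illustrates that option only; `soft_vote`, its weights, and the sample probabilities are hypothetical and should not be read as the authors' confirmed method.

```python
def soft_vote(probabilities, weights=None, threshold=0.5):
    """Combine per-model P(malicious) values by weighted averaging.

    probabilities: one float in [0, 1] per base model.
    weights: optional non-negative weight per model (defaults to equal weights).
    Returns (averaged score, predicted label), where label 1 means malicious.
    """
    if weights is None:
        weights = [1.0] * len(probabilities)
    if len(weights) != len(probabilities):
        raise ValueError("weights and probabilities must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(w * p for w, p in zip(weights, probabilities)) / total
    return score, int(score >= threshold)


# Hypothetical outputs of an MLP and an RF for a single sample.
score, label = soft_vote([0.35, 0.80])
print(score, label)
```

Hard (majority) voting and stacking are alternative combination rules; which one applies here would need to be confirmed against the full article.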
5. Discussion
5.1. Comparison with Other State-of-the-Art Methods
5.2. Practical Implementation Challenges in Real-World Systems
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Sotnik, S.; Shakurova, T.; Lyashenko, V. Development Features Web-Applications. 2023. Available online: www.ijeais.org/ijaar (accessed on 13 June 2024).
2. Prasetio, D.A.; Kusrini, K.; Arief, M.R. Cross-site Scripting Attack Detection Using Machine Learning with Hybrid Features. J. Infotel 2021, 13, 1–6.
3. Bielova, N. Survey on JavaScript security policies and their enforcement mechanisms in a web browser. J. Log. Algebr. Program. 2013, 82, 243–262.
4. Dasgupta, D.; Akhtar, Z.; Sen, S. Machine learning in cybersecurity: A comprehensive survey. J. Def. Model. Simul. 2022, 19, 57–106.
5. Chaudhari, G.R.; Vaidya, M.V. A Survey on Security and Vulnerabilities of Web Application. 2014. Available online: www.ijcsit.com (accessed on 13 June 2024).
6. Parashar, P.; Srivastava, P. An Analysis of XSS Vulnerabilities and Prevention of XSS Attacks in Web Applications. Available online: https://www.researchgate.net/publication/371724261_An_Analysis_of_XSS_Vulnerabilities_and_Prevention_of_XSS_Attacks_in_Web_Applications (accessed on 3 January 2024).
7. Nir, O. “OWASP Top Ten 2023—The Complete Guide”, Reflectiz. Available online: https://www.reflectiz.com/blog/owasp-top-ten-2023/ (accessed on 9 October 2023).
8. Kaur, J.; Garg, U.; Bathla, G. Detection of cross-site scripting (XSS) attacks using machine learning techniques: A review. Artif. Intell. Rev. 2023, 56, 12725–12769.
9. Edgescan. Vulnerability Statistics Snapshot. January 2022. Available online: https://www.edgescan.com/january-2022-vulnerability-statistics-snapshot/ (accessed on 10 August 2023).
10. Erşahin, B.; Erşahin, M. Web application security. South Fla. J. Dev. 2022, 3, 4194–4203.
11. Awad, M.; Ali, M.; Takruri, M.; Ismail, S. Security vulnerabilities related to web-based data. Telkomnika (Telecommun. Comput. Electron. Control) 2019, 17, 852–856.
12. Habibi, G.; Surantha, N. XSS Attack Detection with Machine Learning and n-Gram Methods; Institute of Electrical and Electronics Engineers: Los Alamitos, CA, USA, 2020.
13. Sarker, I.H. Multi-aspects AI-based modeling and adversarial learning for cybersecurity intelligence and robustness: A comprehensive overview. Secur. Priv. 2023, 6, e295.
14. Stency, V.S.; Mohanasundaram, N. A Study on XSS Attacks: Intelligent Detection Methods. In Journal of Physics: Conference Series, Volume 1767, International E-Conference on Data Analytics, Intelligent Systems and Information Security & ICDIIS 2020, Pollachi, India, 11–12 December 2020; IOP Publishing Ltd.: Bristol, UK, 2021.
15. Marashdih, A.W.; Zaaba, Z.F.; Suwais, K.; Mohd, N.A. Web application security: An investigation on static analysis with other algorithms to detect cross site scripting. Procedia Comput. Sci. 2019, 161, 1173–1181.
16. Cheah, C.S.; Selvarajah, V. A Review of Common Web Application Breaching Techniques (SQLi, XSS, CSRF). In Proceedings of the 3rd International Conference on Integrated Intelligent Computing Communication & Security (ICIIC 2021), Bangalore, India, 6–7 August 2021.
17. Liu, M.; Zhang, B.; Chen, W.; Zhang, X. A Survey of Exploitation and Detection Methods of XSS Vulnerabilities. IEEE Access 2019, 7, 182004–182016.
18. Rodríguez, G.E.; Torres, J.G.; Flores, P.; Benavides, D.E. Cross-site scripting (XSS) attacks and mitigation: A survey. Comput. Netw. 2020, 166, 106960.
19. Hickling, J. What Is DOM XSS and Why Should You Care? Comput. Fraud Secur. 2021, 4, 6–10.
20. Panwar, P.; Mishra, H.; Patidar, R. An Analysis of the Prevention and Detection of Cross Site Scripting Attack. Int. J. Emerg. Trends Eng. Res. 2023, 11, 30–34.
21. Kascheev, S.; Olenchikova, T. The Detecting Cross-Site Scripting (XSS) Using Machine Learning Methods. In Proceedings of the 2020 Global Smart Industry Conference, GloSIC 2020, Chelyabinsk, Russia, 17–19 November 2020; Institute of Electrical and Electronics Engineers Inc.: Los Alamitos, CA, USA, 2020; pp. 265–270.
22. Mokbal, F.M.M.; Dan, W.; Xiaoxi, W.; Wenbin, Z.; Lihua, F. XGBXSS: An Extreme Gradient Boosting Detection Framework for Cross-Site Scripting Attacks Based on Hybrid Feature Selection Approach and Parameters Optimization. J. Inf. Secur. Appl. 2021, 58, 102813.
23. Thajeel, I.K.; Samsudin, K.; Hashim, S.J.; Hashim, F. Machine and Deep Learning-based XSS Detection Approaches: A Systematic Literature Review. J. King Saud Univ.—Comput. Inf. Sci. 2023, 35, 101628.
24. Banerjee, R.; Baksi, A.; Singh, N.; Bishnu, S.K. Detection of XSS in web applications using Machine Learning Classifiers. In Proceedings of the 2020 4th International Conference on Electronics, Materials Engineering and Nano-Technology, IEMENTech 2020, Kolkata, India, 2–4 October 2020; Institute of Electrical and Electronics Engineers Inc.: Los Alamitos, CA, USA, 2020.
25. Gogoi, B.; Ahmed, T.; Saikia, H.K. Detection of XSS Attacks in Web Applications: A Machine Learning Approach. Int. J. Innov. Res. Comput. Sci. Technol. 2021, 9, 1–10.
26. Stiawan, D.; Bardadi, A.; Afifah, N.; Melinda, L.; Heryanto, A.; Septian, T.W.; Idris, M.Y.; Subroto, I.M.; Budiarto, R. An Improved LSTM-PCA Ensemble Classifier for SQL Injection and XSS Attack Detection. Comput. Syst. Sci. Eng. 2023, 46, 1759–1774.
27. Kadhim, R.W.; Gaata, M.T. A hybrid of CNN and LSTM methods for securing web application against cross-site scripting attack. Indones. J. Electr. Eng. Comput. Sci. 2020, 21, 1022–1029.
28. Buz, B.; Gülçiçek, B.; Bahtiyar, Ş. A Hybrid Machine Learning Model to Detect Reflected XSS Attack. Balk. J. Electr. Comput. Eng. 2021, 9, 235–241.
29. Melicher, W.; Fung, C.; Bauer, L.; Jia, L. Towards a lightweight, hybrid approach for detecting DOM XSS vulnerabilities with machine learning. In Proceedings of the Web Conference 2021—Proceedings of the World Wide Web Conference, WWW 2021, Ljubljana, Slovenia, 12–16 April 2021; Association for Computing Machinery, Inc.: New York, NY, USA, 2021; pp. 2684–2695.
30. Lamrani Alaoui, R.; Habib Nfaoui, E. Cross Site Scripting Attack Detection Approach Based on LSTM Encoder-Decoder and Word Embeddings. 2023. Available online: www.ijisae.org (accessed on 13 June 2024).
31. Gupta, C.; Singh, R.K.; Mohapatra, A.K. GeneMiner: A Classification Approach for Detection of XSS Attacks on Web Services. Comput. Intell. Neurosci. 2022, 2022, 3675821.
32. Dawadi, B.R.; Adhikari, B.; Srivastava, D.K. Deep Learning Technique-Enabled Web Application Firewall for the Detection of Web Attacks. Sensors 2023, 23, 2073.
33. Tian, Z.; Luo, C.; Qiu, J.; Du, X.; Guizani, M. A Distributed Deep Learning System for Web Attack Detection on Edge Devices. IEEE Trans. Ind. Inf. 2020, 16, 1963–1971.
34. Chaudhary, P.; Gupta, B.B.; Chang, X.; Nedjah, N.; Chui, K.T. Enhancing big data security through integrating XSS scanner into fog nodes for SMEs gain. Technol. Forecast. Soc. Chang. 2021, 168, 120754.
35. Luo, C.; Tan, Z.; Min, G.; Gan, J.; Shi, W.; Tian, Z. A Novel Web Attack Detection System for Internet of Things via Ensemble Classification. IEEE Trans. Ind. Inf. 2021, 17, 5810–5818.
36. Odun-Ayo, I.; Toro-Abasi, W.; Adebiyi, M.; Alagbe, O. An implementation of real-time detection of cross-site scripting attacks on cloud-based web applications using deep learning. Bull. Electr. Eng. Inform. 2021, 10, 2442–2453.
37. Lei, L.; Chen, M.; He, C.; Li, D. XSS Detection Technology Based on LSTM-Attention. In Proceedings of the 2020 5th International Conference on Control, Robotics and Cybernetics, CRC 2020, Wuhan, China, 16–18 October 2020; Institute of Electrical and Electronics Engineers Inc.: Los Alamitos, CA, USA, 2020; pp. 175–180.
38. Tan, X.; Xu, Y.; Wu, T.; Li, B. Detection of Reflected XSS Vulnerabilities Based on Paths-Attention Method. Appl. Sci. 2023, 13, 7895.
39. Zhang, X.; Zhou, Y.; Pei, S.; Zhuge, J.; Chen, J. Adversarial Examples Detection for XSS Attacks Based on Generative Adversarial Networks. IEEE Access 2020, 8, 10989–10996.
40. Alaoui, R.L.; Nfaoui, E.H. Generative Adversarial Network-Based Approach for Automated Generation of Adversarial Attacks Against a Deep-Learning Based XSS Attack Detection Model. 2023. Available online: www.ijacsa.thesai.org (accessed on 13 June 2024).
41. Tariq, I.; Sindhu, M.A.; Abbasi, R.A.; Khattak, A.S.; Maqbool, O.; Siddiqui, G.F. Resolving cross-site scripting attacks through genetic algorithm and reinforcement learning. Expert Syst. Appl. 2021, 168, 114386.
42. Thajeel, I.K.; Samsudin, K.; Hashim, S.J.; Hashim, F. Dynamic feature selection model for adaptive cross site scripting attack detection using developed multi-agent deep Q learning model. J. King Saud Univ.—Comput. Inf. Sci. 2023, 35, 101490.
43. Van Den Bergh, D.; van Doorn, J.; Marsman, M.; Draws, T.; van Kesteren, E.-J.; Derks, K.; Dablander, F.; Gronau, Q.F.; Kucharský, Š.; Gupta, A.R.K.N.; et al. A tutorial on conducting and interpreting a bayesian ANOVA in JASP. Annee Psychol. 2020, 120, 73–96.
44. Omuya, E.O.; Okeyo, G.O.; Kimwele, M.W. Feature Selection for Classification using Principal Component Analysis and Information Gain. Expert Syst. Appl. 2021, 174, 114765.
45. Khyat, J.; Chitra, S. Feature Selection Methods for Improving Classification Accuracy-A Comparative Study. UGC Care Group I Listed J. 2020, 10, 1.
Author | No. of Features | Feature Types | No. of Benign Samples | No. of Malicious Samples | Total Number of Samples |
---|---|---|---|---|---|
Mokbal et al. [22] | 167 | URL, JS, and HTML | 100,000 | 38,569 | 138,569 |
Feature No. | Feature Name | Feature Description |
---|---|---|
1 | url_length | The length of the URL string in characters. |
2 | url_special_characters | The count of special characters (e.g., !, @, #, $) present in the URL. |
3 | url_tag_script | Binary indicator (0 or 1) representing whether the URL contains the <script> tag, which is commonly exploited in XSS attacks. |
4 | url_cookie | Binary indicator (0 or 1) representing whether the URL contains references to cookies, which may indicate potential security vulnerabilities. |
5 | url_number_keywords_param | The count of predefined keywords (e.g., signup, login, query) present as parameters in the URL. |
6 | url_number_domain | The count of domains referenced in the URL, which may indicate redirection or external linking. |
7 | html_tag_script | Binary indicator (0 or 1) representing whether the HTML content contains the <script> tag, which can execute JS code and potentially lead to XSS vulnerabilities. |
8 | html_tag_meta | Binary indicator (0 or 1) representing whether the HTML content contains the <meta> tag, which is used for metadata information and can be manipulated for malicious purposes. |
9 | html_tag_link | Binary indicator (0 or 1) representing whether the HTML content contains the <link> tag, which is used to define relationships between documents and can be exploited in XSS attacks. |
10 | html_tag_div | Binary indicator (0 or 1) representing whether the HTML content contains the <div> tag, which is commonly used for layout purposes and can be manipulated for XSS attacks. |
11 | html_tag_style | Binary indicator (0 or 1) representing whether the HTML content contains the <style> tag, which is used to define styles and can be manipulated to execute malicious code. |
12 | html_attr_background | Binary indicator (0 or 1) representing whether the HTML content contains the background attribute, which can be exploited for XSS attacks. |
13 | html_attr_href | Binary indicator (0 or 1) representing whether the HTML content contains the href attribute, commonly used for hyperlinks and can be manipulated for XSS attacks. |
14 | html_attr_src | Binary indicator (0 or 1) representing whether the HTML content contains the src attribute, commonly used to specify the source of external resources and can be manipulated for XSS attacks. |
15 | html_event_onmouseout | Binary indicator (0 or 1) representing whether the HTML content contains the onmouseout event attribute, which can execute JS code when the mouse leaves an element and may be exploited for XSS attacks. |
16 | js_file | Binary indicator (0 or 1) representing whether JS files are referenced in the HTML content, which may contain vulnerable code. |
17 | js_dom_location | Binary indicator (0 or 1) representing whether the JS code accesses the location object, which can manipulate the URL and may lead to XSS vulnerabilities. |
18 | js_dom_document | Binary indicator (0 or 1) representing whether the JS code accesses the document object, which represents the HTML document and can be manipulated for XSS attacks. |
19 | js_method_getElementsByTagName | Binary indicator (0 or 1) representing whether the JS code uses the getElementsByTagName() method, which retrieves elements by tag name and may be used in XSS attacks. |
20 | js_method_getElementById | Binary indicator (0 or 1) representing whether the JS code uses the getElementById() method, which retrieves an element by its ID and may be exploited for XSS attacks. |
21 | js_method_alert | Binary indicator (0 or 1) representing whether the JS code uses the alert() method, which displays an alert dialog box and may be used for XSS attacks. |
22 | js_min_length | The minimum length of JS strings in the code. |
23 | js_min_function_calls | The minimum number of function calls in the JS code. |
24 | js_string_max_length | The maximum length of JS strings in the code. |
25 | html_length | The length of the HTML content in characters. |
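To make the feature descriptions above concrete, the following sketch computes a small subset of them from a raw URL and an HTML string using only the Python standard library. It is an illustration under stated assumptions, not the extraction code used to build the dataset: the keyword set, the use of `string.punctuation` as the definition of "special characters", and the regular expressions are all hypothetical choices.

```python
import re
import string
from urllib.parse import urlparse, parse_qs

# Hypothetical keyword set; the table gives only "signup, login, query" as examples.
PARAM_KEYWORDS = {"signup", "login", "query"}


def has_tag(text, tag):
    """Return 1 if an opening <tag ...> occurs in the text (case-insensitive), else 0."""
    return int(re.search(rf"<\s*{re.escape(tag)}\b", text, re.IGNORECASE) is not None)


def has_attr(html, attr):
    """Return 1 if the attribute name followed by '=' occurs in the HTML, else 0."""
    return int(re.search(rf"\b{re.escape(attr)}\s*=", html, re.IGNORECASE) is not None)


def extract_features(url, html):
    """Compute a subset of the URL and HTML features listed in the table."""
    params = parse_qs(urlparse(url).query, keep_blank_values=True)
    return {
        "url_length": len(url),
        # Assumption: "special characters" means ASCII punctuation.
        "url_special_characters": sum(ch in string.punctuation for ch in url),
        "url_tag_script": has_tag(url, "script"),
        "url_cookie": int("cookie" in url.lower()),
        "url_number_keywords_param": sum(
            1 for name in params if name.lower() in PARAM_KEYWORDS
        ),
        "html_tag_script": has_tag(html, "script"),
        "html_tag_div": has_tag(html, "div"),
        "html_attr_href": has_attr(html, "href"),
        "html_event_onmouseout": has_attr(html, "onmouseout"),
        "html_length": len(html),
    }


# Hypothetical sample: a reflected payload in the URL and a small HTML fragment.
url = "http://example.com/search?query=<script>alert(document.cookie)</script>"
html = '<div><a href="/home" onmouseout="steal()">home</a></div>'
features = extract_features(url, html)
for name, value in features.items():
    print(name, value)
```

Regex matching on raw strings is a simplification; a production extractor would more likely parse the HTML and JS before computing the indicators.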
Model | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | TP (%) | FP (%) | TN (%) | FN (%) |
---|---|---|---|---|---|---|---|---|
LR | 98.28 | 99.38 | 94.32 | 96.79 | 99.78 | 0.22 | 94.32 | 5.68 |
SVMs | 98.53 | 99.21 | 97.84 | 98.52 | 99.22 | 0.78 | 97.84 | 2.16 |
MLP | 99.14 | 99.26 | 99.02 | 99.14 | 99.27 | 0.73 | 99.02 | 0.98 |
ANNs | 99.06 | 99.08 | 99.04 | 99.06 | 99.08 | 0.92 | 99.04 | 0.96 |
CNNs | 98.82 | 99.57 | 98.07 | 98.81 | 99.57 | 0.43 | 98.07 | 1.93 |
XGBoost | 99.62 | 99.70 | 99.54 | 99.62 | 99.70 | 0.30 | 99.54 | 0.46 |
DTs | 99.47 | 99.22 | 99.72 | 99.47 | 99.22 | 0.78 | 99.72 | 0.28 |
RF | 99.78 | 99.80 | 99.75 | 99.78 | 99.80 | 0.20 | 99.75 | 0.25 |
Ensemble model (MLP with RF) | 99.65 | 99.59 | 99.71 | 99.65 | 99.59 | 0.41 | 99.71 | 0.29 |
Ensemble model (DTs, RF with GB) | 99.76 | 99.74 | 99.77 | 99.76 | 99.74 | 0.26 | 99.77 | 0.23 |
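As a quick consistency check on the results table above, the F1-score can be recomputed from the reported precision and recall as their harmonic mean. The sketch below does this for a few rows copied from the table; the helper name `f1_from` is hypothetical, and small differences in the last digit are expected because the table values are already rounded to two decimals.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the results table, in percent.
reported = {
    "LR": (99.38, 94.32, 96.79),
    "SVMs": (99.21, 97.84, 98.52),
    "MLP": (99.26, 99.02, 99.14),
    "XGBoost": (99.70, 99.54, 99.62),
    "RF": (99.80, 99.75, 99.78),
}

for model, (p, r, f1) in reported.items():
    recomputed = f1_from(p, r)
    print(f"{model}: recomputed F1 = {recomputed:.2f}, reported = {f1:.2f}")
```

For these rows the recomputed values agree with the reported ones to within rounding.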
Author | Algorithms | Feature Selection Method | No. of Selected Features | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
---|---|---|---|---|---|---|---|
Mokbal et al. [22] (2021) | XGBoost | Hybrid (IG and SBS) | 30 | 99.59 | 99.50 | 99.01 | 99.27 |
Thajeel et al. [42] (2023) | DTs | Dynamic | 167 | 98.81 | 98.16 | 97.70 | 97.84 |
Tariq et al. [41,42] (2023) | Genetic algorithm, statistical inference, and reinforcement learning | - | 167 | 95.38 | 95.93 | 99.54 | 95.20 |
Our proposed models | RF | IG | 25 | 99.78 | 99.80 | 99.75 | 99.78 |
Our proposed models | DT and RF with GB | IG | 25 | 99.76 | 99.74 | 99.77 | 99.76 |
Our proposed models | MLP with RF | IG | 25 | 99.65 | 99.59 | 99.71 | 99.65 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Citation: Alhamyani, R.; Alshammari, M. Machine Learning-Driven Detection of Cross-Site Scripting Attacks. Information 2024, 15, 420. https://doi.org/10.3390/info15070420