Detection of SQL Injection Attack Using Machine Learning Techniques: A Systematic Literature Review

Abstract: SQL injection attacks, which usually occur when attackers modify, delete, read, or copy data from database servers, are among the most damaging web application attacks. A successful SQL injection attack can affect all aspects of security, including confidentiality, integrity, and data availability. SQL (structured query language) is used to represent queries to database management systems. Detection and deterrence of SQL injection attacks, for which techniques from different areas can be applied to improve detectability, is not a new area of research, but it is still relevant. Artificial intelligence and machine learning techniques have been tested and used to control SQL injection attacks, showing promising results. The main contribution of this paper is to cover relevant work related to different machine learning and deep learning models used to detect SQL injection attacks. With this systematic review, we aim to keep researchers up-to-date and contribute to the understanding of the intersection between SQL injection attacks and the artificial intelligence field.


Introduction
Most cyber-physical system (CPS) applications are safety-critical; misbehavior caused by random failures or cyber-attacks can considerably restrict their growth. Thus, it is important to protect CPS from being damaged in this way [1]. Current security solutions have been well-integrated into many networked systems including the use of middle boxes, such as antivirus protection, firewall, and intrusion detection systems (IDS). A firewall controls network traffic based on the source or destination address. It alters network traffic according to the firewall rules. Firewalls are also limited to their knowledge of the hosts receiving the content and the amount of state available. An IDS is a type of security tool that scans the system for suspicious activity, monitors the network traffic, and alerts the system or network administrator [2]. In this context, a number of frameworks and mechanisms have been suggested in recent papers.
In this paper, we have considered SQL injection attacks that target the HTTP/HTTPS protocol, which aim to pass through the web application firewall (WAF) and obtain unauthorized access to proprietary data. SQL injection belongs to the injection family of web attacks, wherein an attacker inserts inputs into a system to execute malicious statements. The victim system is usually not ready to process this input, typically resulting in data leakage and/or granting of unauthorized access to the attacker; in this case, the attacker can access and/or modify the data, affecting all aspects of security, including confidentiality, integrity, and data availability [3].
In an SQL injection, the attacker inserts an SQL statement into an exchange between a client and database server [3]. SQL (structured query language) is used to represent queries to database management systems (DBMSs). The maliciously injected SQL statement is designed to extract or modify data from the database server. A successful injection can result in authentication bypass and changes to the database by inserting, modifying, and/or deleting data, causing data loss and/or destruction of the entire database. Furthermore, such an attack could overrun and execute commands on the hosted operating system, typically leading to more serious consequences [4].
Thus, SQL injection attacks present a serious threat to organizations. A variety of research has been undertaken to address this threat, presenting various artificial intelligence (AI) techniques for detection of SQL injection attacks using machine learning and deep learning models [5]. AI techniques to facilitate the detection of threats are usually implemented via learning from historical data representing an attack and/or normal data. Historical data are useful for learning, in order to recognize patterns of attacks, understand detected traffic, and even predict future attacks before they occur [6].
SQL injection attackers and defenders must understand how SQL works to know how it can be misused [3]. To extract data from a database or modify the data, queries must be written in SQL and must follow a standard syntax, such as: "SELECT * FROM books WHERE author = 'MAHA'". This query returns all books authored by MAHA. Queries are submitted to the DBMS and are usually written through a web browser. For the query to be transmitted to the database server through the web browser, it has to be encoded in a long URL string, such as: http://www.xyz_website.com?QUERY=SELECT%20*%20FROM%20books%20WHERE%20author=7453.
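The encoding step can be illustrated with a minimal sketch based on Python's standard `urllib.parse` module. The helper name `encode_query` is hypothetical, and the choice to leave `*` and `=` unescaped is an assumption made only to mirror the style of the example URL.

```python
from urllib.parse import quote, unquote


def encode_query(sql: str) -> str:
    """Percent-encode an SQL string so that it can be carried in a URL."""
    # Assumption: '*' and '=' are left readable, as in the example URL;
    # spaces become %20 and single quotes become %27.
    return quote(sql, safe="*=")


sql = "SELECT * FROM books WHERE author = 'MAHA'"
encoded = encode_query(sql)
print(encoded)

# Decoding on the server side recovers the original query text.
print(unquote(encoded) == sql)
```

Because the server decodes the string before it reaches the DBMS, any SQL syntax an attacker places in the parameter survives the trip unchanged.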
What if the attacker adds to the previous SQL query? For example: "SELECT * FROM books WHERE author = 'MAHA' OR 1 = 1". As the statement 1 = 1 is always true, the query will return all books in the database, not just the books authored by MAHA.
The previous example might not represent a threat, especially if the stored list of books is not confidential. However, it could be applied to valuable data using different syntax, and if successful, it might return sensitive data, such as passwords, bank accounts, trade secrets, and personal data, which might be considered a privacy breach, among other consequences.
In some research, injecting code using 'OR' followed by a TRUE statement, such as 1 = 1, is called a "tautology" [7]. Methods other than tautology can be used, such as when an attacker intentionally injects an incorrect query to force the database server to return a default error page, which might contain valuable information that could help an attacker to understand the database to form a more advanced attack [7]. The SQL "UNION" operator can also be used to extract information, in addition to many other methods based on the same idea of misusing SQL syntax to extract or even update the data in the targeted database. This is how SQL injection works; the question then becomes: how does one detect this type of attack using deep learning methods?
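The effect of such a tautology can be reproduced with a small, self-contained sketch using Python's built-in `sqlite3` module. The table contents and the function names `find_books_unsafe` and `find_books_safe` are hypothetical. Because the vulnerable function wraps the input in quotes, the payload here uses the quoted variant `'1' = '1'` of the same tautology.

```python
import sqlite3

# In-memory database with a hypothetical "books" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [("Book A", "MAHA"), ("Book B", "MAHA"), ("Book C", "SARA")],
)


def find_books_unsafe(author):
    # Vulnerable: user input is concatenated directly into the SQL text.
    query = "SELECT title FROM books WHERE author = '" + author + "'"
    return [row[0] for row in conn.execute(query)]


def find_books_safe(author):
    # Parameterized: the input is bound as a value and never parsed as SQL.
    rows = conn.execute("SELECT title FROM books WHERE author = ?", (author,))
    return [row[0] for row in rows]


payload = "MAHA' OR '1' = '1"
print(find_books_unsafe("MAHA"))   # only the books by MAHA
print(find_books_unsafe(payload))  # every row, because the OR clause is always true
print(find_books_safe(payload))    # no rows: the payload is treated as an author name
```

The parameterized version is shown only for contrast; the studies reviewed here focus on detecting such inputs rather than on rewriting the application code.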
Deep learning is a machine learning and artificial intelligence method. It can be used to support the detection of SQL injection attacks by training a classifier to achieve the ability to recognize and therefore detect an attack. The classifier is trained using deep learning models and can be used to classify new data, such as traffic or data in log files. If the classifier is passive, it will alert the administrator; if it is active, it will prevent data from passing to the database server. The classifier can be trained to recognize and detect SQL injection attacks using three different learning methods [8].
The first is unsupervised learning, where features are extracted from unclassified data, i.e., data that are labelled as neither normal nor abnormal. Using information and Bayesian probability theory, the classifier detects hidden structures in the unclassified dataset. An unclassified dataset means that it is not known whether these data are normal or abnormal (malicious). Different techniques can be used in unsupervised learning, such as clustering and density estimation [8].
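As a rough illustration of clustering on unlabelled queries, the sketch below runs a two-cluster k-means in plain Python. The two features (number of `=` signs and number of `OR`/`UNION`/`--` tokens), the sample queries, and the fixed initialization are all assumptions chosen for the example and are not taken from the reviewed studies.

```python
def features(query):
    """Map a query to two hand-picked numeric features (an assumption)."""
    tokens = query.upper().split()
    suspicious = sum(tok in {"OR", "UNION", "--"} for tok in tokens)
    return (float(query.count("=")), float(suspicious))


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, iters=10):
    """Plain k-means with k = 2 and a deterministic initialization."""
    centroids = [points[0], points[-1]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min((0, 1), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


queries = [
    "SELECT * FROM books WHERE author = 'MAHA'",
    "SELECT title FROM books WHERE id = 7",
    "SELECT name FROM users WHERE id = 3",
    "SELECT * FROM books WHERE author = 'MAHA' OR 1 = 1",
    "SELECT * FROM users WHERE name = '' OR '1' = '1' --",
    "SELECT id FROM users WHERE id = 1 UNION SELECT password FROM users",
]
labels = two_means([features(q) for q in queries])
print(labels)
```

The algorithm only separates the queries into two groups; deciding which group is malicious still requires inspection or a small amount of labelled data.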
The second is supervised learning, whereby a labelled training dataset is used to train the classifier. As the input data are labelled, i.e., normal or abnormal, the output is known beforehand. Therefore, the process involves simple mapping between the input training data and the known output, followed by continuous modification of the algorithm and changing of the weights until an acceptable classification accuracy is achieved. Then, a test dataset is used to test the classifier; if the result is within an acceptable accuracy range, the classifier is ready to detect novel data, i.e., data not previously used in training or testing. The main drawback of supervised learning is the need to generate and label the training and testing data, which might consume processing time, especially for complex attacks. Supervised learning is categorized into classification and regression algorithms. The most common supervised learning algorithms include Bayesian networks, decision trees, support vector machines (SVMs), K-nearest neighbors, and neural networks. The third is semi-supervised learning, which uses a combination of supervised and unsupervised learning methods [8].
The main contribution of this paper is to provide a systematic review of the machine learning and deep learning solutions that are used to improve the detectability of SQL injection attacks. With this systematic review, we aim to keep researchers up-to-date and contribute to the understanding of the intersection between SQL injection attacks and artificial intelligence.
The paper is organized as follows. Section 1 is an introduction to SQL injection attacks and deep learning algorithms. In Section 2, we discuss related studies and consider previous systematic reviews. In Section 3, we present the research method and planning of the systematic review. In Section 4, we highlight the results and review all related studies. In Section 5, we present the discussion and answers to the research questions. Finally, in Section 6, we present our conclusions.
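The supervised workflow described above (train on labelled data, then classify unseen data) can be sketched with a tiny multinomial naive Bayes classifier written in plain Python. The training queries, the whitespace tokenization, and the Laplace smoothing are assumptions made for illustration; real studies use far larger datasets and richer features.

```python
import math
from collections import Counter


def tokenize(query):
    # Assumption: lower-cased whitespace tokens are used as features.
    return query.lower().split()


def fit(samples):
    """Count tokens per class from (query, label) pairs."""
    word_counts, class_counts, vocab = {}, Counter(), set()
    for text, label in samples:
        tokens = tokenize(text)
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return word_counts, class_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior (Laplace smoothing)."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


train = [
    ("SELECT * FROM books WHERE author = 'MAHA'", "normal"),
    ("SELECT title FROM books WHERE id = 7", "normal"),
    ("SELECT name FROM users WHERE id = 3", "normal"),
    ("SELECT * FROM books WHERE author = 'MAHA' OR 1 = 1", "attack"),
    ("SELECT * FROM users WHERE name = '' OR '1' = '1' --", "attack"),
    ("SELECT id FROM users WHERE id = 1 UNION SELECT password FROM users", "attack"),
]
model = fit(train)

# Queries that were not part of the training data.
print(predict(model, "SELECT name FROM users WHERE id = 5 OR 1 = 1"))
print(predict(model, "SELECT title FROM books WHERE id = 9"))
```

With only six training queries the model is a toy; the labelling cost mentioned above grows quickly once realistic traffic volumes are involved.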

Related Studies
In this section, four published systematic reviews were considered. Newer systematic reviews typically include both recent and older studies in the area under investigation. Therefore, all of the papers we considered were relatively recent. The first was published in 2017 [9] and covered previous primary studies on SQL injection attacks, techniques, and tools. In [9], forty-six primary studies related to SQL injection attacks, tools, and techniques were analyzed, in addition to the impact of the attack. We adopted the same methodology as that used in [9] due to its comprehensiveness and because it achieves satisfying results. In addition, this research is similar to that in [9] in terms of objectives, ideas, and the area of research.
Qiu et al. [10] provided a comprehensive review of using artificial intelligence in attacking and defending against security attacks, concentrating on the training and testing stages. In their study, they sorted technologies and applications of adversarial attacks in terms of natural language processing, cyberspace security, computer vision, and the physical world. Furthermore, the authors considered defense strategies in their research and proposed methods to deal with specific types of adversarial attack. Martins et al. [11] explored more than 15 papers that applied adversarial machine learning techniques used in intrusion and malware detection models. In their study, the authors summarized the most common adversarial attacks and defense mechanisms for intrusion and malware detection.
Muslihi et al. [12] conducted a review of more than 14 studies published using deep learning methods to detect SQL injection attacks, including CNN, LSTM, DBN, MLP, and Bi-LSTM. They also provided a comparison of methods in terms of objectives, techniques, features, and datasets. Muhammad et al. [13] reviewed and analytically evaluated the methods and tools that are commonly used to detect and prevent SQL injection attacks, considering a total of 82 studies. Their review results showed that most researchers focused on proposing approaches to detect and mitigate SQL injection attacks (SQLIAs) rather than evaluating the effectiveness of existing SQLIA detection methods.

Research Method
This systematic literature review was conducted in four main phases: (A) planning the systematic review; (B) conducting the review; (C) reporting the results; and (D) discussing the results. In the planning phase, research questions and the research strategy were set. Section 4 outlines the systematic review. We discuss our results in Section 5. Figure 1 is a representation of the phases of this research.

Planning the Systematic Review
Research Questions
Q1: What are the machine learning and deep learning methods used to detect SQL injection attacks?
Q2: How are SQL injection attack datasets generated using machine learning techniques?
Q3: How can machine learning be used to generate adversarial SQL injection attacks?
The first question was the main question decided upon before starting the review, whereas the second and third questions were added later after reviewing other systematic reviews covered in Section 4.

Research Strategy
The libraries used to retrieve the research papers were ACM, IEEE, Springer and Science Direct. The main search topics were SQL injection attacks and machine learning models. The search was configured to retrieve papers published between 2012 and 2021, and we retrieved conference papers, journal articles, and review articles. Some inclusion criteria were defined to select relevant papers among the publications retrieved at the time of the search. These criteria were used to decide which papers to review and which to discard and not include for further study.

Inclusion Criteria
• Papers related to SQL injection attacks;
• Papers that included our search keywords;
• Papers from the scientific databases ACM, IEEE, SpringerLink, and ScienceDirect; and
• Papers on the topic of machine learning and the security domain.

Exclusion Criteria
• Papers not covering machine learning techniques and SQL injection attacks;
• Papers published before 2012; and
• Papers that are not available in full-text format.

Conducting the Review
After filtering the retrieved studies according to the inclusion criteria, 36 studies were retained. The selected studies were reviewed, as they could provide answers to the research questions.
Q1: What are the machine learning and deep learning methods used to detect SQL injection attacks?
Many researchers have demonstrated the use of machine learning and deep learning algorithms to detect SQL injection attacks [14]. Hasan and Tarique [14] tested and compared 23 machine learning classifiers using MATLAB. They generated their own datasets, into which they injected abnormal SQL syntax. They checked and manually verified the SQL statements. A total of 616 SQL statements were used to train and test the classifiers. They used the following machine learning algorithms: "coarse k-NN, bagged trees, linear SVM, fine k-NN, medium k-NN, RUS boosted trees, subspace discriminant, boosted trees, weighted k-NN, cubic k-NN, linear discriminant, medium tree, subspace k-NN, simple tree, quadratic discriminant, cubic SVM, fine Gaussian SVM, cosine k-NN, complex tree, logistic regression, coarse Gaussian SVM, medium Gaussian, and SVM". The five best models in terms of accuracy were determined to be ensemble boosted, bagged trees, linear discriminant, cubic SVM, and fine Gaussian SVM.
Gao et al. [15] proposed a model called ATTAR to detect SQL injection attacks by analyzing web access logs to extract SQL injection attack features. The features were chosen based on access behavior mining and a grammar pattern recognizer. The main target of this model was detection of unknown SQL injection statements that had not been previously used in the training data. Five machine learning algorithms were used for training: naive Bayesian, random forest, SVM, ID3, and k-means. The experimental results showed that the accuracy of the models based on random forest and ID3 achieved the best results in detecting SQL injection attacks. We could not find what ATTAR stands for in [15].
Gandhi et al. [16] proposed a hybrid CNN-BiLSTM-based model for SQL injection attack detection. The authors presented a detailed comparative analysis of different types of machine learning algorithms used for detection of SQL injection attacks. The CNN-BiLSTM approach provided an accuracy of approximately 98%, compared with the other machine learning algorithms described.
Zhang [17] presented a machine learning classifier to detect SQL injection vulnerabilities in PHP code. Multiple machine learning algorithms were trained and evaluated, including random forest, logistic regression, SVM, multilayer perceptron (MLP), long short-term memory (LSTM), and a convolutional neural network (CNN). Zhang found that CNN provided the best precision of 95.4%.
Gi Li et al. [18] proposed an adaptive deep forest model (ADF) with the integration of the AdaBoost algorithm. AdaBoost stands for adaptive boosting, which is a statistical classification algorithm, and the deep forest model is a layered model based on a deep neural network. The adaptive deep forest model proposed in [18] achieved high efficiency, comparable to that of traditional machine learning models, such as decision trees, and a better performance compared with regular deep neural network models, such as RNN and CNN.
Uwagbole et al. [19] created a dataset using symbolic finite automata to train a classifier to detect SQL injection attacks. The generated data were labelled, and training was conducted in a supervised manner using two ML algorithms: a two-class support vector machine (TC SVM) and two-class logistic regression (TC LR). The generated models were evaluated using a receiver operating characteristic (ROC) curve.
Ahmed et al. [20] proposed an SQL injection detection method using an ensemble learning algorithm and natural language processing (NLP) to generate a bag-of-words model used to train a random forest classifier. Prediction was also considered in this research to improve the detection ability of the classifier. In this study, decision tree, naïve Bayes, SVM, and k-NN classification models were also trained to classify the same testing dataset, and their performances were compared with that of the proposed method. The experimental results showed that the proposed method achieved better accuracy, higher TPR, and lower FNR than the other four classifiers. Evaluation metrics were used to measure the performance of the classifier. The measurements were based on a confusion matrix, accuracy, precision, true-positive rate, false-positive rate, true-negative rate, false-negative rate, receiver operating characteristic curve, and area under the curve.
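The rates listed above all follow from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the counts passed in are hypothetical and are not results from [20].

```python
def confusion_metrics(tp, fp, tn, fn):
    """Derive common evaluation metrics from confusion-matrix counts.

    tp: attacks flagged as attacks      fp: normal queries flagged as attacks
    tn: normal queries passed as normal fn: attacks passed as normal
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "tpr": tp / (tp + fn),  # true-positive rate (recall)
        "fpr": fp / (fp + tn),  # false-positive rate
        "tnr": tn / (tn + fp),  # true-negative rate
        "fnr": fn / (fn + tp),  # false-negative rate
    }


# Hypothetical counts for a test set of 100 attacks and 100 normal queries.
metrics = confusion_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that TPR and FNR sum to one, as do TNR and FPR, so a method with a higher TPR necessarily has a lower FNR on the same test set.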
Tripathy et al. [21] created a dataset by gathering and combining a large number of smaller datasets. The generated dataset was labelled, and the learning model was supervised learning. They trained seven machine learning models: decision tree, AdaBoost, random forest, optimized linear, TensorFlow linear, deep ANN, and a boosted trees classifier. Then, they compared the seven algorithms in terms of performance and accuracy. The results showed that the random forest classifier outperformed all other classifiers and achieved an accuracy of 99.8%.
Chinmay and Kulkarni [22] proposed a novel approach to detection of SQL injection attacks using a human agent knowledge transfer (HAT) and TD machine learning algorithm. In this model, a machine learning agent acted as a maze game to differentiate between normal SQL queries and malicious SQL queries. If the incoming SQL query was an SQL injection attack query, then it gained more rewards and was deemed an SQL injection attack query before achieving the final state. This machine learning approach achieved an accuracy of 95%.
Makiou et al. [23] proposed a detection system based on two approaches. The first detection method was based on pattern matching, which is the same as a signature-based detection system whereby the classifier has a database of SQL attack signatures and only inspects the HTTP URL in an attempt to find a match. The second detection method used was based on machine learning techniques. To build this model, the authors collected malicious data and trained the classifier with these data by extracting the features representing attacks. The following algorithms were employed: SVM, naïve Bayes, and K-nearest neighbor. The performance of the classifier was measured using the total cost ratio (TCR).
Kar et al. [24] trained a support vector machine (SVM) to detect malicious SQL queries by modelling the WHERE clause of a query as an interaction network of tokens and computing the centrality of the nodes. Node centralities were used to quantify the degree of importance or centrality of a node in the network. The experimental results, obtained on a dataset collected from five web applications using some automated attack tools, confirmed that three of the centrality measures used in this study can effectively detect SQL injection attacks with minimal impact on performance.
Wang et al. [25] analyzed the existing SQL injection detection algorithms in an intelligent transportation system. The authors proposed a long short-term memory (LSTM)-based SQL injection attack detection method and a method of generating SQL injection samples to augment the dataset. This method can simulate SQL injection attacks and generate valid positive samples to solve the problem of overfitting caused by a lack of positive samples. The experimental results showed that the accuracy, precision, and F1 score of the proposed method were all above 92%.
Kamtuo and Soomlek [26] proposed a framework for SQL injection prevention via server-side scripting using machine learning and compiler platforms. Four machine learning models were trained on a dataset of 1100 samples of SQL commands: boosted decision tree, decision tree, support vector machine (SVM), and an artificial neural network. The results indicate that the decision tree algorithm achieved the highest prediction efficiency among the tested models.
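To make the idea of a token interaction network concrete, the following sketch builds an undirected graph from the tokens of a WHERE clause and computes degree centrality. It is an illustrative assumption rather than the construction used in [24]: tokens are linked when they occur within a sliding window of two positions, and only one centrality measure is shown.

```python
from collections import defaultdict


def token_graph(tokens, window=2):
    """Link each token to the distinct tokens that follow it within `window` positions."""
    graph = defaultdict(set)
    for i, tok in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != tok:
                graph[tok].add(tokens[j])
                graph[tokens[j]].add(tok)
    return graph


def degree_centrality(graph):
    """Fraction of the other nodes that each node is connected to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


where_clause = "author = 'MAHA' OR 1 = 1"
centrality = degree_centrality(token_graph(where_clause.split()))
for token, value in sorted(centrality.items()):
    print(token, round(value, 2))
```

A vector of such centrality values, one entry per token type, is the kind of numeric feature that could then be passed to an SVM.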
Sivasangari et al. [27] used the AdaBoost algorithm to detect SQL injection attacks. In this study, the data were converted into stumps, which were classified as weak stumps providing less weight to the output or strong stumps providing the highest weight in the overall output. The experimental result showed that the proposed algorithm accurately and effectively detected injection attacks.
Das et al. [28] proposed a method for classifying dynamic SQL queries as either attacks or normal based on a web profile prepared during the training phase. Naïve Bayes, SVM, and parse tree approaches were used for the classification process. The overall detection rate using the two datasets was 91% and 90%, respectively.
Kasim [29] designed a method to detect malicious SQL queries. Decision tree algorithms were used for the classification processes to detect different levels of SQL injection. The proposed model maintained an accuracy of more than 98% in detecting SQL injection attacks and an accuracy of 92% in classifying the level of attack as simple, unified, or lateral.
Tang et al. [30] presented a simple method for SQL injection attack detection based on an artificial neural network. First, a large amount of SQL injection data were analyzed to extract the relevant features. Then, a variety of neural network models, such as MLP and LSTM, were trained. The experimental results showed that the detection rate of MLP was better than that of LSTM.
Erdődi et al. [31] automated the process of exploiting SQL injection attacks through reinforcement learning agents. In this study, the problem was modelled as a Markov decision process. The experimental results show that reinforcement learning agents can be used in the future to perform security assessment and penetration testing.
Kar et al. [32] presented a detection method by modeling SQL queries as a graph of tokens and utilized the centrality measure of tokens to train single and multiple SVM classifiers. The system was tested using directed and undirected graphs with different SVM classifiers. The experimental results demonstrated that the proposed technique is able to effectively identify malicious SQL queries.
Solomon et al. [33] presented a model of a two-class support vector machine (TCSVM) to predict binary labelled outcomes concerning whether an SQL injection attack was positive or negative in a web request. This model intercepted web requests at the proxy level and applied ML predictive analytics to predict SQL injection attacks.
Mcwhirter et al. [34] presented a novel approach for classifying SQL queries. A gap-weighted string subsequence kernel algorithm was used to compute the similarity metric between the query strings. Then, the support vector machine was trained on the similarity metrics to determine whether the query string was normal or malicious. The proposed approach was evaluated using a number of datasets and achieved 92.48% accuracy.
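For readers unfamiliar with this kernel, the sketch below evaluates the textbook definition of the gap-weighted subsequence kernel by brute force: every pair of matching length-n subsequences contributes the decay factor raised to the sum of the two spans. It is not the implementation from [34]; the subsequence length `n`, the decay factor `lam`, and the normalization step are assumed values, and practical systems use a dynamic-programming formulation instead of enumeration.

```python
from itertools import combinations


def gap_weighted_kernel(s, t, n=2, lam=0.5):
    """Sum lam**(span in s + span in t) over all matching length-n subsequences."""
    total = 0.0
    for i in combinations(range(len(s)), n):
        sub_s = [s[k] for k in i]
        span_s = i[-1] - i[0] + 1
        for j in combinations(range(len(t)), n):
            if [t[k] for k in j] == sub_s:
                span_t = j[-1] - j[0] + 1
                total += lam ** (span_s + span_t)
    return total


def normalized_kernel(s, t, n=2, lam=0.5):
    """Scale the kernel so that a sequence compared with itself scores 1."""
    denom = (gap_weighted_kernel(s, s, n, lam) * gap_weighted_kernel(t, t, n, lam)) ** 0.5
    return gap_weighted_kernel(s, t, n, lam) / denom if denom else 0.0


# Character-level check on a classic small example.
print(gap_weighted_kernel("cat", "car"))  # only "ca" matches: 0.5 ** 4
print(gap_weighted_kernel("cat", "cat"))  # "ca", "at", and the gapped "ct"

# The same function also works on token lists (an assumption of this sketch).
benign = "SELECT * FROM books WHERE author = 'MAHA'".split()
attack = "SELECT * FROM books WHERE author = 'MAHA' OR 1 = 1".split()
print(normalized_kernel(benign, attack))
```

The resulting pairwise similarities form the kernel matrix on which an SVM can be trained.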
Mejia-Cabrera et al. [35] presented a new approach to the construction of a dataset with a NoSQL query database. Six classification algorithms were trained and evaluated to identify SQL injection attacks, which included: decision tree, SVM, random forest, k-NN, neural network, and multilayer perceptron. The experimental results showed that the last two algorithms obtained an accuracy of 97.6%.
Pathak et al. [36] trained a progressive neural network model with a naïve Bayesian classifier to successfully detect SQL injection attacks. Progressive neural networks were trained using parameters such as error-based, time-based, SQL query, and union-based SQL injection attacks. The proposed method achieved an accuracy of 97.897%.
Wang et al. [37] proposed a hybrid approach using tree-vector kernels in SVM to learn SQL statements. The authors used both the parse tree structure of SQL queries and the query value similarity characteristic to distinguish between malicious and benign queries. The results confirmed the benefit of incorporation to efficiently and accurately identify abnormal queries.
Fang et al. [38] proposed a tool based on LSTM neural networks and the word vectors of SQL tokens. According to the syntactic functions of the SQL queries, each query was converted into sequences of tokens to build an SQL word vector model. Then, the LSTM neural network was trained. The results of the experiment showed that the proposed tool achieved an accuracy of 98.60%.
Zhang et al. [39] proposed a deep learning-based approach to detect SQL injection attacks in network traffic. The proposed approach selected only the target features needed by the model to be trained using a deep belief network (DBN) model. The authors also employed test data to test the performance of different models, including LSTM, CNN, and MLP. According to the experimental results, DBN achieved an accuracy of 96%.
Priyaa et al. [40] proposed a framework that combined the EDADT (efficient data adaptive decision tree) algorithm and the SVM classification algorithm to detect SQL injection attacks. The employed dataset was created using the MovieLens dataset system for movie recommendations, which included user login and movie details. The experimental results showed that the proposed approach achieved an accuracy of 99.87%.
Joshi et al. [41] proposed a method for detecting SQL injection using the naïve Bayes machine learning algorithm. The authors applied a tokenization process to break the query into meaningful elements called tokens. Then, the list of tokens became an input for the further classification processes. The result of the naïve Bayes approach was analyzed using precision, recall, and accuracy.
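A tokenization step of this kind can be sketched with a small regular-expression tokenizer. The token classes below (string literals, comments, numbers, identifiers, operators) are an assumption for illustration and are not the tokenizer used in [41]; characters that match no class, such as an unterminated quote, are silently skipped.

```python
import re

# Alternatives are tried in order, so comments are matched before the '-' operator.
TOKEN_RE = re.compile(
    r"""
      '(?:[^']|'')*'                 # single-quoted string literal
    | --[^\n]*                       # line comment
    | \d+(?:\.\d+)?                  # integer or decimal number
    | [A-Za-z_][A-Za-z0-9_]*         # keyword or identifier
    | <>|<=|>=|!=|[=<>*(),;.+\-/]    # operators and punctuation
    """,
    re.VERBOSE,
)


def tokenize(query):
    """Split an SQL query into a list of token strings."""
    return TOKEN_RE.findall(query)


tokens = tokenize("SELECT * FROM books WHERE author = 'MAHA' OR 1 = 1")
print(tokens)
print(tokenize("id = 1 -- trailing comment"))
```

The token list can then be turned into counts or sequences, depending on the classifier that consumes it.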
Q2: How are SQL injection attack datasets generated using machine learning techniques?
Many researchers have developed and generated their own SQL injection datasets instead of using existing datasets [42]. Islam et al. [43] developed a training dataset for NoSQL injection to manually design important features using various supervised learning algorithms. In this study, the authors generated a dataset including approximately 75% benign and 25% injection queries, which was tested on a local server.
Appelt et al. [44] proposed automated testing techniques that generated SQL injection attacks, bypassing web application firewalls (WAFs). The authors developed SQL injection grammar based on existing SQL injection attacks, as well as an automated input generation technique to automatically generate attack payloads. Then, machine learning was used to efficiently generate additional payloads and new successful attacks with a high probability of bypassing the firewall.
Ross et al. [42] proposed a system consisting of three phases to generate data: traffic generation, capture, and preprocessing. In the traffic generation phase, the simulated normal and malicious traffic was generated from the scripts located on the traffic generation server. Then, the traffic was captured by the webapp server and at the Datiphy appliance. Finally, data preprocessing was achieved with bash shell scripts on the webapp server. The resulting data from preprocessing was imported into Weka, which is a machine learning framework that includes many ML tools. The data were processed into word vectors using the weak filter StringToVec. Then correlated feature selection was employed to reduce the number of features for efficient machine learning.
Liu et al. [45] proposed a tool called DeepSQLi to generate test cases for detection of SQL injection attacks using a deep learning model and sequence-of-words prediction. DeepSQLi used the neural language model, which can be trained to learn semantic features of SQL attacks to translate the test case (or user input) into a new test case. Therefore, DeepSQLi is able to generate SQL injection attacks that have not been captured by patterns in the training datasets. Siddiq et al. [46] proposed a learning-based SQL injection fix tool called SQLIFIX. This tool creates an abstraction of SQL injection code from a training dataset that consists of 14 projects and then clusters them using hierarchical clustering. The proposed approach generated correct solutions for 67.52% of cases for Java and 41.33% of correct solutions for PHP on an independent test set.
Naghmeh [47] proposed a model for the detection of SQLI attacks using artificial intelligence (AI) techniques. This model consisted of three main components: uniform resource locator (URL) generator to generate thousands of normal and malicious URLs; a URL classifier to classify all generated URLs as either normal or malicious; and a neural network (NN) model to detect whether a given URL was a malicious, or benign URL. The model was first trained and then evaluated by employing both benign and malicious URLs. URL classifiers were also used to convert all generated URLs into strings of logic (1 = malicious; 0 = benign).
Q3: How can machine learning be used to generate adversarial SQL injection attacks? Adversarial machine learning (AML) concerns the threat posed by an attacker who crafts inputs with the aim of having them incorrectly classified by the victim machine learning algorithm. Generating an adversarial SQL injection dataset starts with a target malicious query that is correctly detected. A set of mutation operators is then applied iteratively in order to generate new queries [48].
Demetrio and Valenza [48] developed a tool named WAF-A-MoLE to generate adversarial examples against web application firewalls (WAFs) by applying a set of syntactic mutations. The authors produced a dataset of SQL injection queries through an automatic procedure. To evaluate the effectiveness of the proposed tool, it was applied to different ML-based WAFs, which were assessed in terms of their robustness against WAF-A-MoLE.
Appelt et al. [49] proposed a black-box automated technique, named 4SQLi, for generating test inputs that could bypass security filters and result in executable SQL queries. This technique was based on a set of mutation operators that manipulated inputs to produce new test inputs to trigger SQLi attacks, making it possible to create inputs that contained new attack patterns and thus increasing the likelihood of generating a successful SQLi attack.
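The iterative mutation process described for Q3 can be sketched as a simple random search. This is an illustration under stated assumptions, not the implementation of WAF-A-MoLE or 4SQLi: `toy_detector` is a hypothetical stand-in for a detector that matches one literal pattern, the two mutation functions are generic syntactic rewrites, and the payload is assumed to contain no case-sensitive string literals, so that changing letter case does not change its meaning.

```python
import random
import re

def swap_case(payload, rng):
    """Randomly flip letter case; SQL keywords are case-insensitive."""
    return "".join(c.swapcase() if rng.random() < 0.5 else c for c in payload)

def space_to_comment(payload, rng):
    """Replace one space with an inline comment, which SQL treats as whitespace."""
    positions = [i for i, c in enumerate(payload) if c == " "]
    if not positions:
        return payload
    i = rng.choice(positions)
    return payload[:i] + "/**/" + payload[i + 1:]

MUTATIONS = [swap_case, space_to_comment]

def toy_detector(payload):
    """Hypothetical detector: flags one literal, lowercase attack pattern."""
    return re.search(r" or 1=1", payload) is not None

def evade(payload, detector, rng, max_rounds=200):
    """Mutate a detected payload until the detector no longer flags it."""
    current = payload
    for _ in range(max_rounds):
        if not detector(current):
            return current
        current = rng.choice(MUTATIONS)(current, rng)
    # Check the result of the final mutation before giving up.
    return current if not detector(current) else None

rng = random.Random(0)  # fixed seed so the search is repeatable
print(evade("1 or 1=1", toy_detector, rng))
```

A detector that relies on literal patterns fails against such rewrites, which is the weakness this kind of search is meant to expose; the queries it finds can then be added to a training set to harden the detector.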

Machine Learning and Deep Learning Techniques for Detection of SQL Injection Attacks (Related to Q1)
In this section, the results reported in Section 4 are discussed. In related studies, various algorithms and techniques can be used for detecting SQL injection attacks. Table 1 summarizes the algorithms under review, in addition to the employed datasets and evaluation methods.
Table 1 shows that most of the studies focused on using supervised machine learning to detect and classify SQL injection attacks; 89% of the studies used supervised learning, and 4% used unsupervised learning and mixed learning, whereas 3% used other types of learning, as shown in Figure 2.

Generating SQL Injection Attack Datasets Using Machine Learning Techniques (Related to Q2)
A high-quality training dataset is essential for machine learning and deep learning methods to achieve effective detection performance. It is difficult to identify suitable datasets with patterns to train classifiers in SQL injection attack research [30]. The results of the studies reviewed in Section 4 showed that, after automatically generating SQL injection attack payloads from different web applications, machine learning techniques can incrementally learn which payloads are passed or blocked by the firewalls and can be used to efficiently generate additional payloads with a high probability of bypassing the firewall. A total of 83% of the reviewed studies used datasets collected from public repositories and HTTP requests. The remaining 17% of the reviewed studies used datasets created by the authors using deep learning models that can be trained to learn the semantic features of SQL attacks to generate new test cases from user inputs.

Generating Adversarial SQL Injection Attacks Using ML Techniques (Related to Q3)
The results reported in Section 4 showed that adversarial SQL injection attacks can be generated using mutation operators, which are a set of operators that alter the syntax of the original payload without affecting its semantics. Such operators can be classified into three classes based on their purpose: behavior-changing, syntax-repairing, and obfuscating operators [49,50]. Table 2 provides a summary of the mutation operators.

Table 2. Summary of mutation operators (adopted from [50]).

| MO Class | MO Name | Description | Example |
|---|---|---|---|
| Behavior-Changing Operators | MO_or | Adds an OR clause to the input | Original input: "SELECT * FROM table WHERE id= ". The input changes the logic of the statement and turns it into: "SELECT * FROM table WHERE id = 1 OR 1 = 1" |
| | MO_and | Adds an AND clause to the input | |
| | MO_semi | Adds a semicolon followed by an additional clause | |
| Syntax-Repairing Operators | MO_par | Appends a parenthesis to a valid input | Original input: "SELECT * FROM table WHERE character = CHR(" + input + ")". The changed SQL statement: SELECT * FROM |
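The behavior-changing and syntax-repairing operators listed in Table 2 can be read as small string transformations applied to a user input before it is placed into a query. The sketch below is only illustrative: the function names and the `build_query` helper are hypothetical, and the specific clauses appended by `mo_and` and `mo_semi` are assumptions, since Table 2 states only that an AND clause or an additional clause is added.

```python
def mo_or(value):
    """Behavior-changing: append an always-true OR clause to the input."""
    return f"{value} OR 1 = 1"

def mo_and(value):
    """Behavior-changing: append an AND clause (the clause itself is assumed)."""
    return f"{value} AND 1 = 1"

def mo_semi(value):
    """Behavior-changing: append a semicolon and a further clause (assumed)."""
    return f"{value}; SELECT 1"

def mo_par(value):
    """Syntax-repairing: append a closing parenthesis to a valid input."""
    return f"{value})"

def build_query(value):
    """Hypothetical vulnerable query builder using string concatenation."""
    return "SELECT * FROM table WHERE id = " + value

# Applying MO_or to the input "1" reproduces the Table 2 example.
print(build_query(mo_or("1")))
```

Because each operator is a pure function on the input string, operators can be chained, for example `mo_or(mo_par(value))`, to build inputs that combine a syntax repair with a change in query logic.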

Conclusions
SQL injection attacks represent a major threat to web applications, with major implications for privacy and security. Machine learning and deep learning applications have achieved considerable success in detecting this type of web attack. In this study, we conducted a systematic literature review of 36 articles related to research on SQL injection attacks and machine learning techniques. We identified the most commonly used machine learning techniques to detect all types of SQL injection attacks. The review results showed that few studies used machine learning tools and methods to generate new SQL injection attack datasets. Similarly, the results showed that only a few studies focused on using mutation operators to generate adversarial SQL injection attack queries. In future work, we aim to cover the use of other machine learning and deep learning models to generate and detect SQL injection attacks, in addition to investigating the use of other AI techniques, such as generative adversarial networks (GANs), to generate adversarial SQL injection attacks.