Multimodel Phishing URL Detection Using LSTM, Bidirectional LSTM, and GRU Models

Abstract: In today’s world, phishing attacks are steadily increasing, resulting in individuals losing valuables, assets, personal information, etc., to unauthorized parties. In phishing, attackers craft malicious websites disguised as well-known, legitimate sites and send them to individuals to steal personal information and other related private details. Therefore, an efficient and accurate method is required to determine whether a website is malicious. Numerous methods have been proposed for detecting malicious uniform resource locators (URLs) using deep learning and machine learning techniques.


Introduction
Most of our daily activities are Internet-based, including communication, business, marketing, education, travel, and shopping. Therefore, with the massive growth of Internet usage, the likelihood of sharing personal information online has also grown rapidly, making sensitive information vulnerable to cybercrime. While the Internet has many benefits, it is also used by criminals to commit cybercrimes, including phishing. A phishing attack, an effective cybercrime, is a social engineering technique where a fraudulent message is sent through email, chat applications, or text to a victim under the pretense of arriving from a safe source; its main aim is to trick the recipient into revealing sensitive information. According to a phishing and fraud report, phishing attacks soared by 220% during the COVID-19 pandemic [1]. In a phishing attack, criminals attempt to steal private information, such as login credentials and financial details, from individuals for fraudulent use [2].
The first known instance of a phishing attack occurred in the mid-1990s, when a group of hackers or phishers called the warez community stole login credentials and personal information from AOL users [3]. In early 2000, attackers turned their attention to financial systems and launched an attack on E-Gold in June 2001 [4]. By 2003, phishers registered several domain names that resembled the names of legitimate commercial sites such as eBay and PayPal and sent mass mailings to customers asking them to visit the sites and provide their personal information and credit card details [5]. In 2020, Google registered 2.02 million malicious sites, a 19% increase over 2019. In 2021, CISCO's cybersecurity threat trend report stated that 90% of data breaches occur due to phishing. Phishing attacks are a major and serious issue worldwide, and the prevention of such attacks is becoming increasingly complicated [6]. Various strategies have been proposed to overcome phishing; among them, two methods stand out: traditional and nontraditional. The first includes legal, education and awareness, blacklist/whitelist, visual similarity, and search engine approaches; the second comprises techniques such as artificial intelligence (AI)-based, content-based, deep-learning- and machine-learning-based, heuristics-inspired, data mining, and fuzzy-rule-based techniques [7].
Blacklisting is the method most commonly used by modern browsers to detect phishing websites. However, this protection method fails to detect zero-day phishing sites [8,9]. Machine-learning-based techniques have also been used to detect phishing uniform resource locators (URLs). To detect a phishing website, URLs first need to be analyzed for feature extraction; then a training set is built using the extracted features along with their labels, followed by supervised machine-learning techniques [10]. In this study, we focus on deep learning. Recurrent neural networks (RNNs) are among the most common deep-learning techniques for the classification and prediction of sequential data [11]. This paper presents a classification method for detecting malicious URLs using long short-term memory (LSTM), bidirectional LSTM (Bi-LSTM), and gated recurrent unit (GRU) networks.
The research contributions of this study are highlighted in the following points:
• In all web-based malicious activities, users are required to click a URL; using this URL's information, we aim to develop a deep learning model that detects malicious URLs.
• The URL is padded into a fixed-length sequence; therefore, instead of a plain RNN, we propose LSTM-based architectures, namely LSTM, Bi-LSTM, and the gated recurrent unit (GRU), as RNNs are subject to the vanishing gradient problem. With its ability to better understand the URL input, Bi-LSTM has an edge in accuracy when detecting malicious URLs. The performance of the proposed LSTM, Bi-LSTM, and GRU models was analyzed using different metrics, such as precision, recall, F1 score, and accuracy.
• The architecture and working steps of each proposed algorithm are demonstrated in detail, and a detailed comparative performance analysis against other existing models is conducted.
• The proposed model can be used for real-time website detection.
The structure of the paper is organized as follows: Section 2 presents a literature review and related works. Section 3 provides a conceptual understanding, theoretical foundations, and mathematical models of the proposed mechanisms. Section 4 introduces and discusses the experimental results and outcomes. Finally, concluding remarks and future work are presented in Section 5.

Related Work
In [12], the authors proposed a novel generalized phishing detection system based on a binary-modified equilibrium optimizer with k-nearest neighbors (KNN). The system uses a binary version of a modified equilibrium optimizer (AV-BMEO) for feature selection and the k-nearest neighbor machine learning algorithm for classification.
Balogun et al. in [13] introduced a functional tree meta-learning mechanism for phishing site detection. A functional-tree-based (FT) model is highly effective for detecting phishing and legitimate websites with better accuracy, and it is recommended. In [14], researchers proposed a novel method to detect phishing URL websites using a self-attention convolutional neural network (CNN) algorithm. The authors used an imbalanced dataset and a generative adversarial network (GAN) deep learning model to produce data for the imbalanced dataset. Next, they combined the CNN deep learning model and multithread self-attention to build the classifier. Linear and nonlinear space-transformation-based methods have been used [15] for malicious URL detection via feature engineering. In that study, a two-stage distance metric technique was developed for linear transformation, and the Nyström approach for kernel approximation was introduced for both linear and nonlinear transformations.
A machine-learning-based predictive model to classify websites as phishing or legitimate was introduced in 2020 [16]. The authors of this study proposed a machine-learning-based system to detect phishing websites. Support vector machines, convolutional neural networks, and k-nearest neighbor machine learning algorithms were used to detect phishing websites. Haynes et al. proposed a lightweight URL-based phishing detection method using natural language processing transformers for mobile devices. They applied an artificial neural network (ANN) model to URL-based and HTML-based website features to distinguish malicious from legitimate URLs; the proposed method can only be used on mobile devices [17]. In 2018, the authors of [18] implemented a neural-network-based model to detect phishing websites. This model used neural-network-based classification with a simple and stable Monte Carlo algorithm.
In [19], a model for detecting spam emails was proposed, which is a hybrid system of neural networks. The proposed approach used a technique that automatically adds the number of emails to corpus datasets. Babagoli et al. [20] developed a model based on a heuristic-based regression approach combined with a decision tree algorithm and a wrapper feature selection approach to detect phishing websites. The authors of [21] proposed a model that uses heuristic features extracted from the website itself. The authors used eight different machine learning (ML) algorithms for the evaluation, and the principal component analysis random forest (PCA-RF) algorithm yielded the highest accuracy. Yasin and Abuhasan [22] developed an intelligent classification model for detecting phishing emails. The authors mainly used two models: knowledge discovery, which extracts the features from the given string, and data mining, which selects the best classification model. A Java program was used to extract the features from the email header and body, after which a data mining algorithm was applied to the extracted features to determine the algorithm with the best results.
In this study, we examined the potential of LSTM, Bi-LSTM, and GRU for detecting malicious URLs. Compared with RNNs, LSTM is a better choice: RNNs are difficult to train on a URL dataset because URLs exhibit long-term temporal dependencies, and the gradient of the loss function decays over time. By contrast, LSTM uses special units with additional memory cells and can retain information for a long time. LSTM therefore preserves URL information as it passes through its hidden layers, but it is unidirectional, whereas Bi-LSTM runs the URL training data in two directions: first from past to future, and second from future to past. LSTM has one hidden state to train URLs, whereas Bi-LSTM has two hidden states for training. Finally, GRU is similar to LSTM but has a less complex structure [23-25].

Proposed Models
The overall architecture of the proposed approach is shown in Figure 1. As illustrated in Figure 1, the proposed system has four main components: input URL, data preprocessing, training, and classification. The input data is a dataset of legitimate and malicious URLs collected from Kaggle. The next step is data preprocessing, in which we developed a character-embedding mechanism that encodes all the available characters in the input URL into numerical form. Further data preprocessing steps and mechanisms are briefly discussed in Section 4.2. In the next step, the training set is fed into the proposed deep learning model to train it to perform the desired task. Once our model is trained and evaluated, the last step is prediction. In this step, real-time URL data are passed to the model, which predicts the maliciousness or legitimacy of the given URL.

Proposed Model I: Long Short-Term Memory (LSTM)
Recurrent neural networks (RNNs) are a form of neural network well suited to processing sequential data. As RNNs process more steps, they are more susceptible to vanishing gradients than other neural networks [26]. LSTM- and GRU-based RNNs are methods for overcoming the challenges of simple RNNs [27]. LSTM, proposed in 1997 by Hochreiter and Schmidhuber, is an evolution of RNNs capable of learning long-term dependencies and remembering input information for a long period through gates [28]. It is composed of a cell state, an input gate, a forget gate, and an output gate [29]. The detailed architecture of LSTM is shown in Figure 2 [30].
In the LSTM model, the cell state C_t is the main chain for the forward data flow. Two updates must be performed. The first comes from the forget gate f_t, which decides which information to keep or forget from C_t. Data from h_{t−1} and information from x_t pass through the σ function, which yields values between 0 and 1; here, 1 represents keep, and 0 represents forget. The result from the forget gate is multiplied by the old cell state C_{t−1}, Equations (1) and (2). The second comes from the input gate i_t and the candidate memory cell C̃_t, which decide whether to add new information. In the input gate, a sigmoid layer determines whether a piece of new information should be added. In the candidate memory cell, the input from h_{t−1} and the current input pass through a hyperbolic tangent function. The result from the input gate is multiplied by the values of the candidate memory cell. Finally, the two values from the first and second steps are added to update the cell state [31].
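The gate logic described above can be sketched in a few lines of NumPy. This is a minimal illustration with small, randomly initialized weights; the dimensions and the single fused weight matrix are our assumptions for brevity, not the paper's trained configuration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W maps [h_prev; x_t] to the four gate pre-activations."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    H = h_prev.shape[0]
    f_t = sigmoid(z[0:H])           # forget gate f_t
    i_t = sigmoid(z[H:2*H])         # input gate i_t
    c_tilde = np.tanh(z[2*H:3*H])   # candidate memory cell
    o_t = sigmoid(z[3*H:4*H])       # output gate o_t
    c_t = f_t * c_prev + i_t * c_tilde  # cell-state update
    h_t = o_t * np.tanh(c_t)            # hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
H, D = 4, 8                         # hidden size, embedding size (illustrative)
W = rng.standard_normal((4*H, H + D)) * 0.1
b = np.zeros(4*H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.standard_normal((5, D)):  # 5 embedded URL characters
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape)  # (4,)
```

Because h_t is an output-gated tanh, every component of the hidden state stays strictly inside (−1, 1), which is what keeps the recurrence numerically stable over long URLs.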
Finally, the output gate determines the output value. First, the previous output and the current input pass through a sigmoid function; the result is then multiplied by the newly updated cell state after it passes through a tanh function. See Algorithm 1.

Proposed Model II: Bidirectional Long Short-Term Memory (Bi-LSTM)
In a Bi-LSTM network, information flows in two directions through backward and forward layers [30], whereas in regular long short-term memory there is only one direction of information flow, either a backward or a forward layer. In the Bi-LSTM model, the output layer can obtain information from the past and future states simultaneously. The general architecture of the Bi-LSTM is shown in Figure 3 [32,33]. See Algorithm 2.
where A_t^f = forward-layer output sequence, A_t^b = backward-layer output sequence, y_t = output vector, σ = the activation function used to merge A_t^f and A_t^b, W = weight matrix, and b = bias.
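The two-directional flow can be illustrated by running a recurrent pass forward and backward over the sequence and merging the outputs. This is a minimal NumPy sketch with random illustrative weights (separate weight matrices per direction, as in a real Bi-LSTM); it is not the paper's trained model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_pass(xs, W, b, H):
    """Run one LSTM direction over xs and collect the hidden state at every step."""
    h, c = np.zeros(H), np.zeros(H)
    outs = []
    for x_t in xs:
        z = W @ np.concatenate([h, x_t]) + b
        f, i, o = sigmoid(z[:H]), sigmoid(z[H:2*H]), sigmoid(z[3*H:])
        c = f * c + i * np.tanh(z[2*H:3*H])  # cell-state update
        h = o * np.tanh(c)                   # hidden state
        outs.append(h)
    return np.stack(outs)

rng = np.random.default_rng(1)
H, D, L = 4, 8, 6                            # illustrative sizes
W_f, W_b = (rng.standard_normal((4*H, H + D)) * 0.1 for _ in range(2))
b = np.zeros(4*H)
xs = rng.standard_normal((L, D))             # L embedded URL characters

A_f = lstm_pass(xs, W_f, b, H)               # forward layer: past -> future
A_b = lstm_pass(xs[::-1], W_b, b, H)[::-1]   # backward layer: future -> past
y = np.concatenate([A_f, A_b], axis=1)       # merged output sequence
print(y.shape)  # (6, 8)
```

At every position t, y_t concatenates A_t^f and A_t^b, so the classifier sees context from both the characters before and the characters after t.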

Proposed Model III: Gated Recurrent Unit (GRU)-Based RNN
The third model, the GRU-based RNN, is similar to regular LSTM. Whereas regular LSTM uses three gates, the GRU has only two gates, namely the "reset gate" and the "update gate" [34,35]. Here, r_t plays the role of the LSTM forget gate: it combines the previous hidden state and the current input and determines how much of the past information is discarded. A general representation of the GRU is shown in Figure 4. See Algorithm 3. The GRU model is implemented using Equations (11)-(14) [30,36,37].
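The reset/update logic of Equations (11)-(14) can be sketched as follows. The weights are random and illustrative, and bias terms are omitted for brevity; this is not the paper's trained configuration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wr, Wz, Wh):
    """One GRU step following Equations (11)-(14)."""
    hx = np.concatenate([h_prev, x_t])
    r_t = sigmoid(Wr @ hx)                  # reset gate, Eq. (11)
    z_t = sigmoid(Wz @ hx)                  # update gate, Eq. (12)
    h_tilde = np.tanh(Wh @ np.concatenate([r_t * h_prev, x_t]))  # Eq. (13)
    return (1 - z_t) * h_prev + z_t * h_tilde                    # Eq. (14)

rng = np.random.default_rng(2)
H, D = 4, 8                                 # illustrative sizes
Wr, Wz, Wh = (rng.standard_normal((H, H + D)) * 0.1 for _ in range(3))
h = np.zeros(H)
for x_t in rng.standard_normal((5, D)):     # 5 embedded URL characters
    h = gru_step(x_t, h, Wr, Wz, Wh)
print(h.shape)  # (4,)
```

Note that Equation (14) is a per-component convex combination of the old state and the candidate, so the GRU achieves gated memory with one state vector instead of the LSTM's separate cell and hidden states.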

Dataset
This section provides information regarding the dataset used to evaluate the phishing detection models proposed in this study. We used a dataset from Kaggle containing 450,176 URLs, comprising 345,738 legitimate and 104,438 phishing URLs (https://www.kaggle.com/datasets/siddharthkumar25/malicious-and-benign-urls, accessed on 11 September 2022). The numbers of the phishing and legitimate sites used for implementation are shown in Figure 5.


Data Preprocessing
In the data preprocessing part, the first step is to transform every character in the URL into numerical form. In this stage, the characters of the URL are converted into numbers using a character-level tokenization technique. Before passing these tokens to our deep learning model, we must ensure that the variable-length sequences all share the same length. For a fixed URL length L (in characters), if the given input URL is longer than L, the excess characters are removed; if it is shorter than L, an appropriate number of zeros is added before or after the characters in each row of the matrix. This sequence of numbers is then turned into an embedding using an embedding mechanism, and finally, the translated URL is passed into the proposed system layers.
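The tokenization, truncation, and zero-padding steps described above can be sketched in plain Python. The vocabulary construction, the sample URLs, and the fixed length L = 20 are illustrative assumptions, not the settings used in the paper's experiments.

```python
# Character-level tokenization and fixed-length padding for URLs (minimal sketch).
def build_vocab(urls):
    """Map every character seen in the corpus to a positive integer id."""
    chars = sorted({c for u in urls for c in u})
    return {c: i + 1 for i, c in enumerate(chars)}  # 0 is reserved for padding

def encode(url, vocab, L):
    """Tokenize one URL, truncating to L characters or right-padding with zeros."""
    tokens = [vocab.get(c, 0) for c in url[:L]]  # truncate if longer than L
    return tokens + [0] * (L - len(tokens))      # zero-pad if shorter than L

urls = ["http://example.com", "http://paypal-login.example.net/verify"]
vocab = build_vocab(urls)
L = 20
X = [encode(u, vocab, L) for u in urls]
print([len(row) for row in X])  # [20, 20]
```

An embedding layer would then map each integer id to a dense vector, producing the L x D input matrix consumed by the recurrent models.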

Result and Discussion
This section presents the experimental results for each proposed algorithm using different performance metrics.

Performance Metrics
To evaluate the proposed phishing URL detection method using different deep learning techniques, we used a set of evaluation metrics. One of the measurements used to analyze our work's performance is the confusion matrix (Table 1), which is one of the metrics mainly used to analyze and evaluate the performance of the URL detection mechanism. In Figure 8a-c, the blue curve indicates the training loss for all the methods, and the orange curve describes the validation loss. In Figure 8, the x-axis indicates the number of epochs, and the y-axis represents the loss. Table 2 shows the training accuracy achieved using the LSTM, Bi-LSTM, and GRU networks.
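From the confusion matrix counts, the precision, recall, F1 score, and accuracy used throughout this paper can be computed as follows. The counts below are illustrative, not the paper's experimental results.

```python
# Metrics derived from confusion matrix counts (TP, FP, FN, TN).
def metrics(tp, fp, fn, tn):
    precision = tp / (tp + fp)                          # of predicted phishing, how many were phishing
    recall = tp / (tp + fn)                             # of actual phishing, how many were caught
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    accuracy = (tp + tn) / (tp + fp + fn + tn)          # overall correct fraction
    return precision, recall, f1, accuracy

# Illustrative counts:
p, r, f1, acc = metrics(tp=90, fp=10, fn=10, tn=90)
print(round(p, 2), round(r, 2), round(f1, 2), round(acc, 2))  # 0.9 0.9 0.9 0.9
```

On an imbalanced URL dataset like the one used here, precision, recall, and F1 are more informative than accuracy alone, which is why all four are reported.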


Comparative Analysis
In the literature, several methods have been proposed for URL detection. A feed-forward neural network was employed to classify URLs as legitimate or malicious. One study [38] proposed a model to detect URLs using a feed-forward neural network. The malicious URL dataset used contained 48,006 legitimate website URLs. The trained model exhibited an accuracy of 97%. They performed feature extraction, which reduced 16 features to 2 features. Once the model was trained, to make it easy to test new links, they deployed a web app using the Python framework Flask. The user can enter a URL, and the model classifies whether it is malicious or legitimate.
Figure 9 shows a comparison of the accuracy of the different algorithms, and Table 3 shows extended information about precision, recall, and F1 score. In this figure, we compare the performance of other existing algorithms, such as logistic regression (LR) [32], XGBoost (XGB) [34], multinomial naive Bayes (MNB) [35,36], and k-nearest neighbor (KNN) [37,38]. As can be observed from Table 3, the accuracies obtained using LSTM, Bi-LSTM, GRU, LR, XGBoost, MNB, and KNN were 97%, 99%, 98%, 96%, 85%, 95.7%, and 92.4%, respectively.

Conclusions
Individuals, government organizations, and industries are always subject to phishing attacks. Attackers create a phishing website that imitates a legitimate site to steal personal information. This paper proposed deep learning techniques, namely long short-term memory (LSTM), bidirectional long short-term memory (Bi-LSTM), and the gated recurrent unit (GRU). The proposed models were tested on a public URL dataset. We used various performance metrics to evaluate the proposed approaches. The experimental results showed that Bi-LSTM produced the best results across all evaluation measures among the three proposed models, achieving an accuracy of 99.0%. In the future, we would like to use other deep learning algorithms to detect phishing websites using massive, imbalanced datasets.

Figure 1 .
Figure 1.Conceptual architecture of the three proposed models for detecting phishing URLs.

Figure 2 .
Figure 2. Detailed architecture of the long short-term memory (LSTM) model. In Figure 2, C_t = cell state, C_{t−1} = old cell state, f_t = forget gate, i_t = input gate, O_t = output gate, h_t = hidden state, tanh = hyperbolic tangent activation function, C̃_t = cell update, and σ = sigmoid activation function.


Algorithm 2: Bi-LSTM for phishing URL detection
Input: URL {x_1, x_2, x_3, ..., x_L}, L = length of URL
Output: Phishing or legitimate (Y)
Step 1: Begin
Step 2: For t = 1 to L do: compute the forward-layer output sequence A_t^f, Equation (8)
Step 3: End for
Step 4: For t = L to 1 do: compute the backward-layer output sequence A_t^b, Equation (9)
Step 5: End for
Step 6: Obtain Y by merging A_t^f and A_t^b using the sigmoid activation function
Step 7: End

Figure 3 .
Figure 3. Detailed structure of the bidirectional long short-term memory (Bi-LSTM) model.


Figure 4 .
Figure 4. Operational diagram representing the functions of the GRU-based RNN model.


Figure 5 .
Figure 5. Length of malicious and legitimate URLs.


Figure
Figure 7a-c shows the training and validation accuracy of the proposed LSTM, Bi-LSTM, and GRU models, respectively. The orange curves represent the validation accuracy, and the blue curves represent the training accuracy of the models.


Figure 7 .
Figure 7. (a) Training and validation accuracy for the LSTM model. (b) Training and validation accuracy for the Bi-LSTM model. (c) Training and validation accuracy for the GRU-based RNN algorithm.


Figure 8 .
Figure 8. (a) Training and validation loss for the LSTM model. (b) Training and validation loss for the Bi-LSTM model. (c) Training and validation loss for the GRU-based RNN model.

Figure 9 .
Figure 9. Accuracy comparison of the proposed models and other models.

Algorithm 1: Regular LSTM for phishing URL detection
Input: URL {x_1, x_2, x_3, ..., x_L}, L = length of URL
Output: Phishing or legitimate (Y)
Step 1: Begin
Step 2: For t = 1 to L do
Step 3: Calculate the value of f_t using Equation (1)
Step 4: Calculate an input gate value i_t, Equation (3)
Step 5: Calculate the candidate memory cell value C̃_t using another weight matrix, Equation (4)
Step 6: Calculate the cell state C_t, Equation (5)
Step 7: Compute the value of the output gate O_t, multiply O_t by tanh of C_t, and store the result in h_t, Equations (6) and (7)
Step 8: return (h_t, C_t)
Step 9: End for
Step 10: Y = SoftMax(h_1, h_2, ..., h_L)
Step 11: End

Algorithm 3: GRU-based RNN for phishing URL detection
Input: URL {x_1, x_2, x_3, ..., x_L}, L = length of URL
Output: Phishing or legitimate (Y) (Y = 0: legitimate, Y = 1: malicious)
Step 1: Begin
Step 2: For each URL x_1 ... x_L do
Step 3: Merge h_{t−1} and x_t and pass them through a sigmoid function; the result is stored in r_t, Equation (11)
Step 4: Compute z_t from h_{t−1} and x_t (with different biases and weights), Equation (12)
Step 5: Compute h̃_t by combining the new input x_t with the reset h_{t−1} and passing their output through a tanh function, Equation (13)
Step 6: Subtract z_t from a vector of all 1s and multiply the result by the previous hidden state
Step 7: Multiply the output of Step 4 (z_t) by the output of Step 5 (h̃_t)
Step 8: Combine the output of Step 6 with the output of Step 7 and store the result in h_t:
h_t = (1 − z_t) * h_{t−1} + z_t * h̃_t    (14)
where r_t = reset gate, z_t = update gate, h̃_t = intermediate memory, and h_t = output.

Table 1 .
Confusion matrix for malicious and legitimate classes.

Table 2 .
Comparison table for the accuracy of the proposed LSTM, Bi-LSTM, and GRU models.


Table 3 .
Performance comparison of the proposed models and other machine learning algorithms used for classifying legitimate and malicious URLs.
