Article

A Malicious URL Detection Framework Based on Custom Hybrid Spatial Sequence Attention and Logic Constraint Neural Network

1 School of Information Science and Technology, Hainan Normal University, Haikou 571158, China
2 Hainan Engineering Research Center for Virtual Reality Technology and Systems, Hainan Vocational University of Science and Technology, Haikou 571126, China
3 School of Foreign Languages, Hainan Normal University, Haikou 571158, China
* Authors to whom correspondence should be addressed.
Symmetry 2025, 17(7), 987; https://doi.org/10.3390/sym17070987
Submission received: 7 May 2025 / Revised: 13 June 2025 / Accepted: 18 June 2025 / Published: 23 June 2025
(This article belongs to the Section Computer)

Abstract

With the rapid development of the Internet, malicious URL detection has emerged as a critical challenge in cyberspace security. Traditional machine-learning techniques and subsequent deep-learning frameworks have shown limitations in handling the complex malicious URL data generated by contemporary phishing attacks. This paper proposes a novel detection framework, HSSLC-CharGRU (Hybrid Spatial–Sequential Attention Logically constrained neural network CharGRU), which balances high efficiency and accuracy while enhancing the generalization capability of detection frameworks. The core of HSSLC-CharGRU is a Gated Recurrent Unit (GRU) integrated with a Hybrid Spatial–Sequential Attention (HSSA) module. The framework incorporates symmetry concepts into its design: the HSSA module extracts URL sequence features across scales, reflecting multi-scale invariance, and the interaction between the GRU and HSSA modules provides functional complementarity and symmetry, enhancing model robustness. In addition, the LCNN module incorporates logical rules and prior constraints to regulate the pattern-learning process during feature extraction, reducing the model’s sensitivity to noise and anomalous patterns and enhancing the structural symmetry of the feature space. These logical constraints further improve the model’s generalization capability across diverse data distributions and strengthen its stability, adaptability, and robustness in handling complex URL patterns. In the experimental part, HSSLC-CharGRU shows excellent detection accuracy compared with current character-level malicious URL detection models.

1. Introduction

Malicious URL detection is crucial in cybersecurity. It safeguards user privacy, protects corporate data, and upholds the trustworthiness of online services. As cyberattacks grow more sophisticated, the concealment techniques of malicious URLs become harder to detect and their data patterns more complex. This demands continuous innovation and refinement of detection technologies.
In recent years, global malicious URL attacks have surged. Proofpoint’s 2024 Phishing Report [1] reveals that malicious URL attacks remain a major cybersecurity threat. The report indicates that 71% of employees admit to risky online behavior, such as reusing or sharing passwords and clicking on suspicious links. Proofpoint also thwarts around 66 million business email compromise attacks monthly. CertiK’s Hack3D: 2024 Security Report shows that malicious URL attacks caused $1.05 billion in losses in 2024, accounting for half of the annual stolen amount and 39.1% of all cyberattack incidents. Compared to 2023, both the number and impact of these attacks rose significantly in 2024. In the Web 3.0 space, malicious URL attacks were the most damaging, with 296 incidents. Notably, 96% of users are aware of the risks but still engage in these behaviors.
To counter severe cyberattacks via malicious URLs, many experts have conducted targeted research and introduced innovations in malicious URL detection. Shirazi et al. [2] proposed an adversarial autoencoder data synthesis method to boost machine-learning-based phishing detection by generating synthetic data. This method enriches training diversity and addresses data imbalance in real-world phishing datasets, improving the generalizability of detection models. Zhu E [3] developed CCBLA, a lightweight phishing detection model combining CNN, BiLSTM, and attention mechanisms for better efficiency and accuracy. Tsai Y D [4] explored ways to develop more generalizable malicious URL detection models by addressing data bias. Jeeva S C [5] presented an intelligent phishing URL detection method based on association rule mining to identify malicious URLs via feature association rules. This contribution adds interpretability to detection decisions, making it easier to understand model reasoning. Lee O V [6] described a malicious URL detection system integrating optimization techniques and machine-learning classifiers. By combining algorithmic optimization with detection tasks, this system improves model accuracy while reducing training time. AlEroud A [7] proposed a URL-based phishing attack method using generative adversarial deep neural networks to bypass existing detection and explore system vulnerabilities. This adversarial modeling contributes to the understanding of attack vectors that evade traditional systems. Remya S et al. [8] proposed an effective phishing URL detection method based on a residual multi-layer perceptron, validated experimentally.
Islam M S et al. [9] introduced WebGuardML, a machine-learning-based malicious URL detection method to protect users. It offers a scalable solution suitable for real-time user protection in web environments. Mia M [10] analyzed the credibility of phishing URL detection features across datasets using explainable AI. This work enhances interpretability by identifying consistent and trustworthy features across varied datasets. Mia M et al. [11] proposed a deep-learning-based multi-agent model for phishing attack detection, improving accuracy and robustness via agent collaboration. It introduces agent cooperation to mitigate single-model weaknesses and increase system resilience. Literature [12] proposed a BERT-based method to identify malicious URLs, overcoming traditional feature extraction and semantic understanding limitations. It demonstrates the strength of contextual embeddings in capturing the nuanced semantics of URL structures. Geyik B [13] used classification techniques in WEKA for phishing URL detection, offering an effective ML solution. This study validates traditional machine-learning frameworks as lightweight and accessible solutions. Taofeek A O [14] developed a new ML-based phishing detection method, solving feature selection and model optimization issues. It focuses on enhancing detection accuracy through refined input feature selection strategies. Liang Y et al. [15] proposed a robust malicious URL detection method based on self-defined step-width deep learning for accuracy and robustness in complex networks. Wu T et al. [16] presented a malicious URL detection model based on bidirectional gated recurrent units and attention mechanisms to capture long-sequence dependencies and focus on key features.
The model captures sequential patterns while selectively addressing critical input segments. Liu R [17] proposed a malicious URL detection method using a pre-trained language model-guided multi-level feature attention network for feature extraction and model generalization. The integration of pre-trained knowledge enhances the model’s ability to generalize across unseen data. Mahdaouy A E et al. [18] proposed DomURLs_BERT, a pre-trained BERT-based model for malicious domain and URL detection. It reinforces the utility of domain-specific BERT adaptations in URL analysis. Nowroozi E [19] analyzed adversarial attacks in malicious advertising URL detection frameworks to enhance robustness. This study provides practical insight into the vulnerabilities of URL detectors against adversarial inputs. Le H [20] proposed URLNet, a deep-learning-based URL representation learning method for malicious URL detection. By learning URL embeddings directly, this work advances the feature abstraction process for phishing classification. Literature [21] proposed a fast and accurate phishing detector based on CNN. This model emphasizes real-time performance while maintaining high accuracy through convolutional feature extraction. Afzal S et al. [22] proposed Urldeepdetect, a deep-learning method using semantic vector models to overcome traditional feature engineering limitations. It effectively replaces manual feature crafting by learning semantic representations of URLs end-to-end. Wang H [23] proposed a bi-LSTM malicious web page detection algorithm combining CNN and independent recursive neural networks. The hybrid structure captures both spatial and temporal patterns in malicious web content. Atrees M et al. [24] enhanced malicious URL detection by combining boosting methods and lexical features. This work shows how ensemble techniques and handcrafted lexical inputs can complement each other for robust detection. 
Vanhoenshoven F [25] explored ML-based malicious URL detection methods to adapt to changing URLs. Smadi S et al. [26] proposed a dynamic evolving neural network based on reinforcement learning to detect phishing emails. Literature [27] proposed a federated-learning-based malicious URL detection method to ease data privacy and security pressures. It enables collaborative model training without centralizing sensitive data, improving privacy-aware detection. Tong X et al. [28] proposed MM-ConvBERT-LMS, integrating multi-modal learning and pre-trained models for malicious web page detection. This comprehensive approach captures diverse signal modalities, enhancing model robustness in complex detection scenarios. Sabir B et al. [29] analyzed the reliability and robustness of ML-based phishing URL detectors in enhancing defense against adversarial attacks. It offers a benchmark for evaluating model stability under adversarial conditions, guiding future resilience improvements.
Currently, deep-learning-based malicious URL detection has demonstrated promising performance. However, challenges remain when dealing with complex and highly dynamic data patterns. On the one hand, attackers continuously evolve URL construction techniques—such as dynamic domain generation, feature obfuscation, cross-domain redirection, and context-aware evasion—making it difficult for traditionally static models to comprehensively capture and generalize these variations. On the other hand, existing models often lack the ability to effectively integrate contextual dependencies and multi-modal features (such as URL text and associated metadata), which leads to decreased accuracy and robustness when facing highly variable and stealthy malicious URLs. Moreover, most models lack adaptive updating mechanisms, limiting their ability to respond promptly to emerging threats in dynamic data environments.
To boost the detection efficiency and accuracy of malicious URL attacks and address generalization issues with complex sample data, this paper proposes a malicious URL detection method based on the HSSLC-CharGRU framework. The proposed model integrates HSSA to capture local and global patterns, combines LCNN to strengthen the logical constraints of URL structure, and performs cross-dataset generalization experiments alongside traditional dataset comparison experiments, making it highly robust to obfuscated and evolving malicious URLs.
The key contributions of this study are as follows:
(1)
The HSSA module in HSSLC-CharGRU uses multi-scale convolution to capture local features of different scales and models long-range dependencies in sequences via multi-head self-attention. This combination enables the model to process local and global information simultaneously, enhancing its ability to model complex patterns.
(2)
HSSLC-CharGRU includes an LCNN to ensure that its output is consistent with the actual business logic. Logic constraint loss enforces hard constraints on the model’s outputs to ensure that they reflect URL length, character distribution, and domain structure characteristics, improving the model’s detection performance in the face of complex malicious URL data.
(3)
Cross-dataset generalization tests show that HSSLC-CharGRU has strong generalization and robustness when facing complex malicious URL data across dataset boundaries.
The remainder of this paper is structured as follows: Section 2 presents the related technologies involved in HSSLC-CharGRU and provides an overview of these techniques. Section 3 details the workflow of HSSLC-CharGRU when processing malicious URL data. In Section 4, we conduct comparative experiments and cross-dataset evaluations to provide a more comprehensive assessment of the detection performance of HSSLC-CharGRU on malicious URLs, with the experimental results visually presented. Finally, Section 5 summarizes the work of this paper and discusses the potential application scenarios of HSSLC-CharGRU as well as future research directions.

2. Related Work

Cho et al. [30] proposed an encoder–decoder architecture based on Recurrent Neural Networks (RNNs) to learn phrase-level representations, which significantly enhanced sequence modeling and translation quality in statistical machine translation and laid the groundwork for the subsequent development of neural machine translation (NMT).
Zhang K et al. [31] addressed the common limitations of traditional phishing attack detection methods, which are often delayed and reactive. It proposes a proactive detection framework based on machine learning that learns latent feature patterns from large-scale data. This approach enables the system to identify malicious activities in advance before large-scale phishing campaigns are launched, thereby achieving more forward-looking and real-time phishing attack prevention.
To address the limitations of existing malicious URL detection methods in feature extraction and their limited effectiveness in recognizing complex URL patterns, Zhou J et al. [32] proposed an integrated detection framework that combines Convolutional Spatial Pyramid Pooling Convolution (CSPPC) with a Bidirectional Long Short-Term Memory (BiLSTM) network. The CSPPC module is employed to extract multi-scale spatial features, while the BiLSTM is used to model the contextual dependencies within the sequential features, thereby significantly improving detection accuracy and enhancing the system’s ability to identify complex and obfuscated malicious URLs.
Currently, most character-level malicious URL detection models are based on variants of recurrent neural networks. The advantages of the GRU lie in its simple structure and high computational efficiency, with the ability to effectively capture long-term dependencies in sequences while mitigating the vanishing gradient problem, making it well-suited for various time-series tasks. Therefore, this study builds its further innovations on the GRU.

2.1. Gated Recurrent Unit

The GRU (Gated Recurrent Unit), a variant of the Recurrent Neural Network (RNN), was proposed by Kyunghyun Cho et al. [30]. It is designed to handle long-range dependencies in sequential data. The GRU introduces two key mechanisms: the Update Gate and the Reset Gate. The Reset Gate is computed before the candidate hidden state, ensuring that the generation of the candidate hidden state properly uses the Reset Gate’s output. This controls information flow and improves the model’s ability to handle long sequences. The GRU effectively mitigates the vanishing and exploding gradient problems of traditional RNNs. Compared to the LSTM, which uses three gates, the GRU’s two-gate approach for regulating long-term information flow is more streamlined, with fewer parameters and higher computational efficiency. The overall structure of the GRU is shown in Figure 1.
In the GRU, the Update Gate determines how much past information to carry forward at time step t and how much information from time steps t − 1 and t to retain. Specifically, the calculation process of the Update Gate is as follows: first, multiply the input vector x_t at time step t by the weight matrix ω^(z) to obtain a linear transformation of the input. Then, perform a linear transformation on the hidden state h_{t−1}, which holds information from time step t − 1. Finally, add these two transformed terms together and feed them into a Sigmoid activation function, which restricts the result to a value between 0 and 1. The structure of the Update Gate is shown in Figure 2, and its calculation process is given in Equation (1).
z_t = σ(ω^(z) x_t + U^(z) h_{t−1})
In the GRU, the Reset Gate’s core function is to judge how much past information should be forgotten. Specifically, it first performs separate linear transformations on the hidden state h_{t−1} from the previous time step and on the input x_t of the current time step. Then, it adds these two results together and feeds them into the Sigmoid activation function. The output is an activation value between 0 and 1, which determines the extent to which past information is retained. The structure of the Reset Gate is shown in Figure 3, and its calculation process is given in Equation (2).
r_t = σ(ω^(r) x_t + U^(r) h_{t−1})
In the application of the Reset Gate, new memory content is produced. Specifically, the element-wise product (Hadamard product) of the Reset Gate r_t and U h_{t−1} is computed. Since r_t is a vector of values between 0 and 1, it measures the retention level of each element; a gate value of 0 for an element means its information is completely discarded. This element-wise product determines which past information to keep and which to forget.
h̃_t = tanh(ω x_t + r_t ∘ (U h_{t−1}))
In the final memory computation, the Update Gate is crucial, as it determines how information from the current memory content h̃_t and the previous time step h_{t−1} is integrated. Specifically, the activation result z_t of the Update Gate regulates the information flow in a gating manner. The element-wise product (Hadamard product) of z_t and h_{t−1} represents the information retained from the previous time step in the final memory. The information retained from the current memory content is then added to this, yielding the output of the Gated Recurrent Unit (GRU).
h_t = z_t ∘ h_{t−1} + (1 − z_t) ∘ h̃_t
The GRU does not erase past information over time. Instead, it selectively retains and passes on relevant details to the next unit. This allows it to make full use of all relevant information and effectively prevents the vanishing gradient problem.
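To make the gating arithmetic above concrete, the following is a minimal NumPy sketch of one GRU step. This is not the authors’ implementation; the weight names and toy dimensions are assumptions chosen for illustration, and biases are omitted for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One GRU step following Equations (1)-(3) and the final memory update.

    params holds the weight matrices W_z, U_z (Update Gate),
    W_r, U_r (Reset Gate), and W_h, U_h (candidate state)."""
    W_z, U_z, W_r, U_r, W_h, U_h = params
    z_t = sigmoid(W_z @ x_t + U_z @ h_prev)              # Update Gate, Eq. (1)
    r_t = sigmoid(W_r @ x_t + U_r @ h_prev)              # Reset Gate, Eq. (2)
    h_cand = np.tanh(W_h @ x_t + r_t * (U_h @ h_prev))   # candidate state, Eq. (3)
    return z_t * h_prev + (1.0 - z_t) * h_cand           # final memory

# Toy usage: 8-dim character embeddings, 4-dim hidden state, 5 characters.
rng = np.random.default_rng(0)
params = (rng.standard_normal((4, 8)), rng.standard_normal((4, 4)),
          rng.standard_normal((4, 8)), rng.standard_normal((4, 4)),
          rng.standard_normal((4, 8)), rng.standard_normal((4, 4)))
h = np.zeros(4)
for _ in range(5):
    h = gru_step(rng.standard_normal(8), h, params)
```

Because the final memory is a convex combination of h_{t−1} and the tanh-bounded candidate state, the hidden state stays within (−1, 1), which illustrates how the GRU retains information without unbounded growth.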

2.2. Hybrid Spatial–Sequential Attention

Li X et al. [33], inspired by the adaptive Receptive Field (RF) sizes of visual cortex neurons, proposed Selective Kernel Networks (SKNets) with Selective Kernel (SK) convolution. This approach uses soft attention for adaptive kernel selection, addressing the insufficient consideration of RF-size adaptability in existing CNNs and enhancing object recognition. Ashish Vaswani et al. [34] introduced Multi-Head Attention to solve the problem that single-head attention in Transformers averages over, and thus suppresses, the model’s ability to focus on information from different subspaces at different positions. This also alleviates the reduced effective resolution caused by averaged self-attention weights.
Drawing inspiration from these studies, this paper proposes a Custom HSSA that combines the benefits of both. The HSSA module integrates diverse convolutional kernel feature extraction with advanced self-attention techniques, capturing both local details and global correlations in URL data to enrich feature representation. Its design allows dynamic feature weight adjustment that enhances flexibility and accuracy in complex sequence data recognition. Additionally, HSSA optimizes information integration through multi-source feature fusion, improving generalization and stability in complex classification tasks. The structure of HSSA is illustrated in Figure 4.
The workflow of HSSA is as follows. First, convolution and feature aggregation are performed on the input feature map X ∈ R^{B×C×H×W}. Convolution operations with different kernel sizes k are used to obtain the feature maps F_k. Then, all feature maps generated by the convolutional kernels are summed to get the aggregated feature map U, as shown in Equations (4) and (5).
F_k = ReLU(BatchNorm(Conv_k(X)))
U = Σ_k F_k
Next, the aggregated feature map U is processed with global average pooling and fully connected mapping, yielding the global feature S and the convolutional feature map Out_conv. H and W represent the height and width of the feature map, respectively, while i and j denote the row and column indices, as shown in Equations (6) and (7).
S = (1 / (H × W)) Σ_{i=1}^{H} Σ_{j=1}^{W} U_{:,:,i,j}
Out_conv = FullyConnectedMapping(U)
The global feature S is first mapped to dimension d via a fully connected layer. Then, a weight ω_k is generated for each convolutional kernel and normalized to produce ω̃_k. Finally, the normalized weights are used to perform a weighted fusion with the convolutional feature maps to obtain the final output. The corresponding formulas are shown below, where γ represents a learnable parameter.
ω_k = DimensionAdjustments(Fc_k(S))
ω̃_k = exp(ω_k) / Σ_k exp(ω_k)
Out = Out_conv + γ · ω̃_k
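As a rough illustration of this aggregate-pool-weight-fuse flow, the NumPy sketch below fuses per-kernel feature maps with softmax weights derived from a globally pooled descriptor. The function names, the identity stand-in for the fully connected mapping, and the fusion of the weighted feature maps are assumptions for illustration, not the paper’s exact design.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def hssa_fuse(feature_maps, fc_weights, gamma=0.1):
    """Fuse per-kernel feature maps (assumed to already be
    ReLU(BatchNorm(Conv_k(X)))) using softmax-normalized kernel weights
    computed from a globally pooled descriptor."""
    U = np.sum(feature_maps, axis=0)          # Eq. (5): aggregation over kernels
    C = U.shape[0]
    S = U.reshape(C, -1).mean(axis=1)         # Eq. (6): global average pooling
    out_conv = U                              # identity stand-in for the FC mapping
    w = fc_weights @ S                        # one raw weight per kernel
    w_tilde = softmax(w)                      # normalized kernel weights
    weighted = sum(w_tilde[k] * fm for k, fm in enumerate(feature_maps))
    return out_conv + gamma * weighted        # residual-style weighted fusion

# Toy usage: two kernel sizes, 3 channels, 4x4 feature maps.
rng = np.random.default_rng(0)
maps = [np.maximum(rng.standard_normal((3, 4, 4)), 0) for _ in range(2)]
out = hssa_fuse(maps, fc_weights=rng.standard_normal((2, 3)))
```

The softmax over per-kernel weights is what lets the module emphasize one receptive-field scale over another on a per-input basis.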

2.3. Logically Constrained Neural Network

The LCNN integrates the length, character distribution, and domain structure logic constraints of malicious URLs directly into model training. It combines the forward propagation of multi-layer neural networks with logic loss calculation. The model processes input data through three fully connected layers and uses a logic loss function to ensure outputs meet specific logic constraints. The structure of LCNN is shown in Figure 5.
The network’s output layer uses a multi-task learning framework. The first dimension directly predicts URL length, while the others encode character distribution and domain structure features. These dimensions are optimized by a weighted loss function that balances the importance of different logic constraints. This enables coordinated learning across multiple tasks, ensuring logically consistent outputs and enhancing the model’s generalization and interpretability. Below are the detailed workflow, mathematical formulas, and explanations.
The input data first goes through a fully connected layer Fc1, which maps the input dimension from Input_dim to Hidden_dim. Non-linearity is introduced via the Tanh activation function, and the data is then passed to the second fully connected layer Fc2, which keeps the data dimension at Hidden_dim. ω^[n] and b^[n] are the weights and biases of layer n (where n is 1, 2, or 3) in Fc, and X is the input data.
a^[1] = tanh(ω^[1] X + b^[1])
The output of the second hidden layer is once more processed by an activation function. The data is fed into the third fully connected layer, Fc3, which maps the data dimension from Hidden_dim to Output_dim, resulting in the model’s final output.
a^[2] = tanh(ω^[2] a^[1] + b^[2])
a^[3] = ω^[3] a^[2] + b^[3]
The total logical loss is calculated by comparing the model’s output with the actual URL length, character distribution, and domain structure and then summing these differences with weights.
length_loss = mean((a^[3]_1 − url_lengths)²)
char_loss = mean((a^[3]_{char.shape[1]+1} − url_char_distributions)²)
domain_structure_loss = mean((a^[3]_{domain.shape[1]+1} − domain_structure)²)
logic_loss = α · length_loss + β · char_loss + γ · domain_structure_loss
In the model’s output, the dimension a^[3]_1 represents the predicted URL length, while a^[3]_{char.shape[1]+1} and a^[3]_{domain.shape[1]+1} correspond to the character distribution and domain structure features, respectively. The subscript domain refers to a 2D tensor with shape (batch_size, num_features); domain.shape[1] therefore indicates the number of features in its second dimension. The expression domain.shape[1] + 1 is the index offset used to locate the starting position of the domain structure features within the model’s output vector. Similarly, char.shape[1] + 1 refers to the character distribution feature length plus one and is typically used to identify where the domain structure features begin. This indexing strategy correctly segments the model’s output into URL length, character distribution, and domain structure components. Weights α, β, and γ are assigned to the URL length, character distribution, and domain structure constraints, respectively. The total logical loss is the weighted sum of the differences between the model’s output and the actual URL length, character distribution, and domain structure.
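A compact NumPy sketch of this weighted logic loss might look as follows. The output layout (length, then character distribution, then domain structure) follows the description above; the function name and slice-based indexing are our assumptions.

```python
import numpy as np

def logic_loss(outputs, url_lengths, char_dists, domain_feats,
               alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted sum of squared-error losses over the three logically
    constrained output segments. `outputs` has shape (batch, 1 + C + D):
    column 0 predicts URL length, the next C columns the character
    distribution, the last D columns the domain-structure features."""
    C = char_dists.shape[1]
    length_loss = np.mean((outputs[:, 0] - url_lengths) ** 2)
    char_loss = np.mean((outputs[:, 1:1 + C] - char_dists) ** 2)
    domain_loss = np.mean((outputs[:, 1 + C:] - domain_feats) ** 2)
    return alpha * length_loss + beta * char_loss + gamma * domain_loss

# Toy usage: outputs that exactly satisfy the constraints give zero loss.
lengths = np.array([10.0, 20.0])
chars = np.full((2, 3), 0.2)
domains = np.full((2, 2), 0.5)
outputs = np.concatenate([lengths[:, None], chars, domains], axis=1)
loss = logic_loss(outputs, lengths, chars, domains)
```

During training, this term would be added to the classification loss, so gradients push the network toward outputs that respect the URL-structure constraints.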

3. Methodology

In this study, we developed a novel hybrid model called HSSLC-CharGRU. It combines CharGRU, HSSA, and LCNN to enhance the analysis and prediction of complex malicious URL sequence data. The structure of HSSLC-CharGRU is shown in Figure 6. The processing workflow for malicious URL sequence data in this model is as follows (Table 1; some of the operators were already introduced in Section 2 and are not elaborated here):
In the initial stage, the input malicious URL sequence data is transformed into high-dimensional vectors through a character embedding layer. e_t is the output after the URL data is processed by the CharEmbedding layer.
e_t = CharEmbedding(Input)
Then, these vectors are fed into a GRU to identify character dependencies and extract comprehensive sequence features. Here, z_t is the output of the Update Gate in the GRU, h̃_t is the candidate state computed using the Reset Gate output r_t, and h_t is the output of the entire GRU module.
z_t = σ(ω^(z) e_t + U^(z) h_{t−1})
h̃_t = tanh(ω e_t + r_t ∘ (U h_{t−1}))
h_t = z_t ∘ h_{t−1} + (1 − z_t) ∘ h̃_t
To boost feature expressiveness, the model integrates an HSSA module, which combines convolution and self-attention mechanisms. This module performs weighted fusion on the GRU output to highlight key features and filter out less important information.
U = Σ_k ReLU(BatchNorm(Conv_k(h_t)))
Out_conv = FullyConnectedMapping(U)
ω̃_k = DimensionAdjustments(GlobalAveragePooling(U))
HSSA_out = Out_conv + γ · ω̃_k
Subsequently, the HSSA-processed features are fed into an LCNN module, which consists of three fully connected layers. The LCNN identifies complex feature relationships and generates final predictions. To ensure logical outputs, the model includes a logical loss calculation component. This component constrains the outputs based on preset criteria (URL length, character distribution, and domain structure) and optimizes multiple constraints via a weighted loss function. Here, V represents the output of the input data after being processed by the hidden layers, which consists of three fully connected layers and two Tanh activation operations. ULL denotes the URL length loss, LCD refers to the loss of character distribution, and LDN represents the loss of domain name structure.
V = ω^[3] tanh(ω^[2] tanh(ω^[1] HSSA_out + b^[1]) + b^[2]) + b^[3]
Output = ULL(V) + LCD(V) + LDN(V)
Finally, the model’s predictions are adjusted through an additional, fully connected layer to produce the final results. If extra logical constraints (URL length, character distribution, and domain structure) are provided, the model calculates the logical loss and uses it during training. This encourages the model to learn and generate logically consistent predictions. This design enables the HSSLC-CharGRU model to understand data’s inherent structure while ensuring prediction accuracy and interpretability, making it excel in analyzing complex sequence data.
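The end-to-end flow of this section can be sketched schematically in NumPy. This is a toy, untrained stand-in, not the authors’ implementation: the attention step replaces HSSA with plain dot-product attention over time steps, the logic losses are omitted, and the character-to-index hashing, dimensions, and weight names are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical dimensions: 64-symbol alphabet, 8-dim embeddings, 16-dim hidden.
EMB = rng.standard_normal((64, 8)) * 0.1
Wz, Uz = rng.standard_normal((16, 8)), rng.standard_normal((16, 16))
Wr, Ur = rng.standard_normal((16, 8)), rng.standard_normal((16, 16))
Wh, Uh = rng.standard_normal((16, 8)), rng.standard_normal((16, 16))
W1, W2 = rng.standard_normal((16, 16)), rng.standard_normal((16, 16))
W3 = rng.standard_normal((2, 16))            # 2-class (benign/malicious) head

def forward(url):
    ids = [hash(ch) % 64 for ch in url]      # toy char-to-index mapping
    h, hs = np.zeros(16), []
    for t in ids:                            # GRU over characters
        x = EMB[t]
        z = sigmoid(Wz @ x + Uz @ h)
        r = sigmoid(Wr @ x + Ur @ h)
        h_cand = np.tanh(Wh @ x + r * (Uh @ h))
        h = z * h + (1 - z) * h_cand
        hs.append(h)
    H = np.stack(hs)                         # (seq_len, 16)
    attn = softmax(H @ H[-1])                # stand-in for HSSA weighting
    ctx = attn @ H                           # attended sequence summary
    v = np.tanh(W2 @ np.tanh(W1 @ ctx))      # LCNN-style hidden layers
    return softmax(W3 @ v)                   # class scores

probs = forward("http://example.com/login")
```

In the actual framework, the attended representation would additionally be constrained by the logic loss during training; the sketch only shows the shape of the forward data flow.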

4. Experimental Results and Analysis

In this section, we conduct and report experiments to evaluate the performance of the HSSLC-CharGRU model comprehensively. This includes both comparative experiments and cross-dataset generalization tests. We detail the datasets used, the deep-learning models selected for comparison, the experimental settings, and the metrics employed for evaluation. Finally, we present and analyze the experimental results.

4.1. Datasets

To evaluate the HSSLC-CharGRU model’s performance, we use the following three datasets: the large Grambedding [35], the small PhishCrawl [36], and a cross-dataset setup that draws its training set from Grambedding and its test set from PhishCrawl, with the data partitioned into training and test sets at a 4:1 ratio.
In Grambedding, there are 960,083 samples in total, with 480,088 benign (“legitimate”) and 479,995 malicious URLs. The training set has 480,039 samples (240,001 malicious and 240,038 legitimate), and the test set has 160,014 samples (79,998 malicious and 80,016 legitimate).
PhishCrawl contains 101,833 samples, including 55,388 legitimate and 46,445 malicious URLs. Its training set has 81,464 samples (37,156 malicious and 44,308 legitimate), and the test set has 20,369 samples (9289 malicious and 11,080 legitimate). The cross-dataset setup has 101,852 samples, with 50,015 legitimate and 51,837 malicious URLs. The training set has 81,483 samples (40,726 malicious and 40,757 legitimate), and the test set has 20,369 samples (9289 malicious and 11,080 legitimate). Figure 7, Figure 8 and Figure 9 illustrate the URL data distributions in the Grambedding, PhishCrawl, and cross-dataset datasets, respectively.

4.2. Compare Models and Evaluation Metrics

To highlight HSSLC-CharGRU’s performance, we selected four advanced character-level neural network models and 12 evaluation metrics. The selected models and evaluation metrics are as follows:
  • CharGRU: A character-level GRU network.
  • CharLSTM: A character-level Long Short-Term Memory (LSTM) network.
  • CharCNN: A character-level Convolutional Neural Network (CNN) that uses convolutional kernels of sizes 3, 4, and 5, followed by a dropout layer and a fully connected layer.
  • CharCNNBiLSTM: This model combines a character-level CNN with a BiLSTM network. Features are extracted from the embedding layer using the CNN and then passed to the BiLSTM for further processing.
Accuracy = \frac{TP + TN}{TP + TN + FP + FN}
Accuracy represents the proportion of correctly predicted samples among the total number of samples. It reflects overall prediction performance but may be misleading when classes are imbalanced.
Precision = \frac{TP}{TP + FP}
Precision represents the proportion of true positives among the samples predicted as positive, reflecting the reliability of the model's positive predictions. High precision means fewer false positives.
Recall = \frac{TP}{TP + FN}
Recall represents the proportion of actual positive samples correctly predicted as positive, reflecting the model's ability to capture the positive class. High recall means fewer false negatives.
F1\ Score = \frac{2 \times Precision \times Recall}{Precision + Recall}
The F1 score is the harmonic mean of precision and recall, balancing the two and providing a comprehensive evaluation of model performance that is suitable for class-imbalance situations.
F1\_micro = \frac{2TP}{2TP + FP + FN}
The micro-averaged F1 score (F1_micro) pools the confusion-matrix counts over all samples, reflecting overall per-sample performance; because every sample is weighted equally, it is dominated by the majority class and is best read alongside F1_wted under class imbalance.
F1\_wted = \sum_{i=1}^{n} \frac{s_i}{S} \, F1_i
The weighted F1 score (F1_wted) averages the per-class F1 scores F1_i, weighting each class i by its sample count s_i relative to the total sample count S. It is suitable for scenarios involving class imbalance, as it better reflects the model's performance when classes are unevenly represented.
FDR = \frac{FP}{TP + FP}
The False Discovery Rate (FDR) represents the proportion of false positives among the samples predicted as positive. A low FDR means the model's positive predictions are more reliable.
FNR = \frac{FN}{TP + FN}
The False Negative Rate (FNR) represents the proportion of actual positive samples misclassified as negative. A low FNR means the model has a strong ability to capture the positive class.
FPR = \frac{FP}{FP + TN}
The False Positive Rate (FPR) represents the proportion of actual negative samples misclassified as positive. A low FPR means the model has a strong ability to identify the negative class.
DE = \frac{TP}{TP + FN} \times \frac{TN}{TN + FP}
Diagnostic Efficiency (DE) represents the model’s overall ability to correctly identify both positive and negative samples; high DE means that the model’s overall performance is better.
NPV = \frac{TN}{TN + FN}
Negative Predictive Value (NPV) represents the proportion of true negatives among the samples predicted as negative, reflecting the reliability of the model's negative predictions. A high NPV means fewer false negatives.
Specificity = \frac{TN}{FP + TN}
Specificity (SPC) represents the proportion of actual negative samples correctly predicted as negative. Reflecting the model’s ability to identify negative classes, high SPC means fewer false positives.
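For the binary case, all of the count-based metrics above follow directly from the four confusion-matrix entries. The sketch below computes them as fractions (multiply by 100 for the percentages reported later); F1_wted is omitted because it additionally needs per-class sample counts. The function name and example counts are our own.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the Section 4.2 metrics from binary confusion-matrix counts.
    Returns fractions, not percentages; assumes no denominator is zero."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "f1_micro": 2 * tp / (2 * tp + fp + fn),
        "fdr": fp / (tp + fp),
        "fnr": fn / (tp + fn),
        "fpr": fp / (fp + tn),
        "de": (tp / (tp + fn)) * (tn / (tn + fp)),
        "npv": tn / (tn + fn),
        "specificity": tn / (fp + tn),
    }

m = classification_metrics(tp=90, tn=80, fp=20, fn=10)
print(round(m["accuracy"], 3), round(m["fdr"], 3))  # 0.85 0.182
```

Note that in the binary, single-threshold case, the F1 score and F1_micro computed from pooled counts coincide; the experimental tables report them separately because they are averaged differently across classes.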

4.3. Experimental Setup and Results

We used the PyTorch framework (with CUDA 12.4) and Python 3.8 for all experiments, conducted on a server equipped with a 12th Gen Intel(R) Core(TM) i7-12700KF CPU and an NVIDIA GeForce RTX 4070 GPU. For all models, the number of epochs, batch size, and learning rate were set to 20, 256, and 0.001, respectively.
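With these settings fixed across all models, the configuration and the resulting per-epoch batching can be sketched as follows. `TRAIN_CONFIG` and `iter_batches` are our own illustrative names; the optimizer and any learning-rate schedule are not specified in the text, so none are assumed here.

```python
# Shared training hyperparameters for every compared model (Section 4.3).
TRAIN_CONFIG = {"epochs": 20, "batch_size": 256, "learning_rate": 1e-3}

def iter_batches(n_samples, batch_size):
    """Yield (start, end) index ranges that cover the dataset once per epoch;
    the final batch may be smaller than batch_size."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)

# The Grambedding training set of 480,039 samples gives 1876 steps per epoch.
steps_per_epoch = sum(1 for _ in iter_batches(480_039, TRAIN_CONFIG["batch_size"]))
```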

4.3.1. Comparative Experiments and Results

To demonstrate HSSLC-CharGRU’s malicious URL detection capabilities, we conducted comparative experiments using the Grambedding and PhishCrawl datasets. We divided the 12 evaluation metrics into two categories: major metrics (higher values indicate better performance) and minor metrics (lower values indicate better performance). FDR, FNR, and FPR are minor metrics, while the rest are major metrics. Detailed experimental data is shown in Table 2. To better visualize the results, we created the following plots: Figure 10 for results on the Grambedding dataset and Figure 11 for results on the PhishCrawl dataset.
As shown in Table 2, Figure 10 and Figure 11, the HSSLC-CharGRU model delivers excellent performance on both datasets; the blue bars in Figure 10 and Figure 11 visualize its results.
On Grambedding, it achieves an accuracy and F1 score of 97.79%, with an F1_micro of 97.92%, an FDR of 2.070%, and FNR/FPR of 2.075%. This indicates strong adaptability to varying malicious URL data distributions. Compared to CharGRU, the proposed model improves accuracy by 0.27 percentage points, while both the False Discovery Rate (FDR) and False Negative Rate (FNR) are reduced by approximately 0.4 percentage points.
On PhishCrawl, despite a slightly lower accuracy of 97.53%, the model attains an F1 score of 97.73%, matching its F1_wted score. With a low FPR of 2.359%, FDR of 2.194%, and SPC of 97.80%, it demonstrates effective control over misclassification costs in complex data.
Overall, HSSLC-CharGRU’s character-level modeling and dynamic weighting (evident in its F1_wted score) enable robust performance across different data distributions. It shows high stability on the precision-demanding Grambedding dataset and excellent practical value on the potentially class-imbalanced PhishCrawl dataset through optimized F1 metrics.

4.3.2. Generalization Experiments Across Datasets

In this section, we conduct cross-dataset generalization experiments using the Grambedding training set and the PhishCrawl test set. The experimental setup is similar to that in Section 4.3.1. Table 3 presents the results, and Figure 12 visualizes them.
Results on the cross-dataset show HSSLC-CharGRU's superiority. Its accuracy of 85.58% outperforms CharGRU (80.70%), CharLSTM (83.69%), CharCNN (75.61%), and CharCNNBiLSTM (75.35%). Its precision of 86.06% is higher than that of the other models. Recall and F1 score are 85.58% and 85.61%, showing stable predictive power, while F1_micro and F1_wted of 85.88% and 85.91% demonstrate robustness on imbalanced data.
Notably, FPR is only 1.375%, and SPC is 86.24%, indicating excellent misclassification control, especially in reducing false positives in complex malicious URL scenarios, thus enhancing practicality.
Overall, HSSLC-CharGRU’s character-level modeling and dynamic weighting ensure balanced performance across different data distributions, demonstrating strong generalization and adaptability in cross-dataset experiments.

5. Conclusions

This article introduces HSSLC-CharGRU, a novel malicious URL detection framework that integrates a custom Hybrid Spatial–Sequential Attention (HSSA) module and a Logically Constrained Neural Network (LCNN) with a character-level recurrent neural network variant (CharGRU). The HSSA module captures local and global dependencies in character-level URL sequences, enabling comprehensive and hierarchical feature extraction. The LCNN module introduces domain-specific logical constraints based on URL characteristics such as length patterns, character combinations, and domain structures, enhancing the interpretability of the model and reducing its sensitivity to adversarial patterns. Extensive experimental evaluations show that HSSLC-CharGRU consistently outperforms other state-of-the-art character-level neural network models in terms of detection efficiency and accuracy. The model exhibits superior generalization and robustness, as evidenced by the cross-dataset experiments, where it maintains more than 85% detection accuracy on datasets with different URL distributions and attack types. Practical applications of HSSLC-CharGRU will also be explored.
Possible use cases include integration into enterprise-grade security systems for real-time malicious URL detection, deployment in browsers and network security software to protect end users, and support for regulators for large-scale threat monitoring and governance. In addition, the versatility of the framework will be expanded by applying it to other areas that require robust sequence-based anomaly detection, such as API call sequence analysis, email phishing detection, and IoT device traffic monitoring.
Although HSSLC-CharGRU performed relatively well in the cross-dataset experiment, its detection accuracy of around 85% still leaves room for false positives, and reducing them will be a focus of future research. In future work, we aim to further optimize the model architecture and training strategy to improve cross-dataset performance. Possible directions include integrating more advanced attention mechanisms, exploring contrastive learning to improve the representation of rare malicious patterns, and leveraging transfer learning to adapt the framework to unseen or evolving attack types.

Author Contributions

J.Z. and K.Z. wrote the body of the manuscript, and B.Z., Y.Z., X.X., M.J. and X.L. reviewed the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the following: the Hainan Province Science and Technology Special Fund (Fund No. ZDYF2024GXJS034); Hainan Engineering Research Center for Virtual Reality Technology and Systems (Fund No. Qiong fa Gai gao ji [2023] 818); the Innovation Platform for Academicians of Hainan Province (Fund No. YSPTZX202036); the Education Department of Hainan Province (Fund No.Hnky2024ZD-24); the Sanya Science and Technology Special Fund (Fund No. 2022KJCX30).

Data Availability Statement

This article uses the PhishCrawl and Grambedding datasets, which can be obtained from the following links: https://www.sciencedirect.com/science/article/pii/S0167739X24003315 and https://web.cs.hacettepe.edu.tr/~selman/grambeddings-dataset/ (accessed on 24 March 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Proofpoint. State of the Phish 2024: Today’s Cyber Threats and Phishing Protection; Proofpoint Inc.: Sunnyvale, CA, USA, 2024. [Google Scholar]
  2. Shirazi, H.; Muramudalige, S.R.; Ray, I.; Jayasumana, A.P.; Wang, H. Adversarial autoencoder data synthesis for enhancing machine learning-based phishing detection algorithms. IEEE Trans. Serv. Comput. 2023, 16, 2411–2422. [Google Scholar] [CrossRef]
  3. Zhu, E.; Yuan, Q.; Chen, Z.; Li, X.; Fang, X. CCBLA: A lightweight phishing detection model based on CNN, BiLSTM, and attention mechanism. Cogn. Comput. 2023, 15, 1320–1333. [Google Scholar] [CrossRef]
  4. Tsai, Y.D.; Liow, C.; Siang, Y.S.; Lin, S.D. Toward more generalized malicious url detection models. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 24 February 2024; Volume 38, pp. 21628–21636. [Google Scholar]
  5. Jeeva, S.C.; Rajsingh, E.B. Intelligent phishing url detection using association rule mining. Hum.-Centric Comput. Inf. Sci. 2016, 6, 1–19. [Google Scholar] [CrossRef]
  6. Lee, O.V.; Heryanto, A.; Ab Razak, M.F.; Raffei, A.F.M.; Phon, D.N.E.; Kasim, S.; Sutikno, T. A malicious URLs detection system using optimization and machine learning classifiers. Indones. J. Electr. Eng. Comput. Sci. 2020, 17, 1210–1214. [Google Scholar] [CrossRef]
  7. AlEroud, A.; Karabatis, G. Bypassing detection of URL-based phishing attacks using generative adversarial deep neural networks. In Proceedings of the Sixth International Workshop on Security and Privacy Analytics, New Orleans, LA, USA, 18 March 2020; pp. 53–60. [Google Scholar]
  8. Remya, S.; Pillai, M.J.; Nair, K.K.; Subbareddy, S.R.; Cho, Y.Y. An Effective Detection Approach for Phishing URL Using ResMLP. IEEE Access 2024, 12, 79367–79382. [Google Scholar] [CrossRef]
  9. Islam, M.S.; Rahman, N.; Naeem, J.; Al Mamun, A.; Akter, F.; Jahan, S.; Rahman, S.; Yasar, S.; Omi, M.M.H. WebGuardML: Safeguarding Users with Malicious URL Detection Using Machine Learning. In Proceedings of the 26th International Conference on Computer and Information Technology (ICCIT), Cox’s Bazar, Bangladesh, 13–15 December 2023; IEEE: New York, NY, USA, 2023; pp. 1–6. [Google Scholar]
  10. Mia, M.; Derakhshan, D.; Pritom, M.M.A. Can Features for Phishing URL Detection Be Trusted Across Diverse Datasets? A Case Study with Explainable AI. In Proceedings of the 11th International Conference on Networking, Systems, and Security, Khulna, Bangladesh, 19–21 December 2024; pp. 137–145. [Google Scholar]
  11. Kaushik, P.; Rathore, S.P.S. Deep Learning Multi-Agent Model for Phishing Cyber-attack Detection. Int. J. Recent Innov. Trends Comput. Commun. 2023, 11, 680–686. [Google Scholar] [CrossRef]
  12. Su, M.Y.; Su, K.L. BERT-Based Approaches to Identifying Malicious URLs. Sensors 2023, 23, 8499. [Google Scholar] [CrossRef]
  13. Geyik, B.; Erensoy, K.; Kocyigit, E. Detection of phishing websites from URLs by using classification techniques on WEKA. In Proceedings of the 6th International Conference on Inventive Computation Technologies (ICICT), Coimbatore, India, 20–22 January 2021; IEEE: New York, NY, USA, 2021; pp. 120–125. [Google Scholar]
  14. Taofeek, A.O. Development of a Novel Approach to Phishing Detection Using Machine Learning. ATBU J. Sci. Technol. Educ. 2024, 12, 336–351. [Google Scholar]
  15. Liang, Y.; Wang, Q.; Xiong, K.; Zheng, X.; Yu, Z.; Zeng, D. Robust detection of malicious URLs with self-paced wide & deep learning. IEEE Trans. Dependable Secur. Comput. 2021, 19, 717–730. [Google Scholar]
  16. Wu, T.; Wang, M.; Xi, Y.; Zhao, Z. Malicious URL detection model based on bidirectional gated recurrent unit and attention mechanism. Appl. Sci. 2022, 12, 12367. [Google Scholar] [CrossRef]
  17. Liu, R.; Wang, Y.; Xu, H.; Qin, Z.; Liu, Y.; Cao, Z. Malicious url detection via pretrained language model guided multi-level feature attention network. arXiv 2023, arXiv:2311.12372. [Google Scholar]
  18. Mahdaouy, A.E.; Lamsiyah, S.; Idrissi, M.J.; Alami, H.; Yartaoui, Z.; Berrada, I. DomURLs_BERT: Pre-trained BERT-based Model for Malicious Domains and URLs Detection and Classification. arXiv 2024, arXiv:2409.09143. [Google Scholar]
  19. Nowroozi, E.; Mohammadi, M.; Conti, M. An adversarial attack analysis on malicious advertisement URL detection framework. IEEE Trans. Netw. Serv. Manag. 2022, 20, 1332–1344. [Google Scholar] [CrossRef]
  20. Le, H.; Pham, Q.; Sahoo, D.; Hoi, S.C. URLNet: Learning a URL representation with deep learning for malicious URL detection. arXiv 2018, arXiv:1802.03162. [Google Scholar]
  21. Wei, W.; Ke, Q.; Nowak, J.; Korytkowski, M.; Scherer, R.; Woźniak, M. Accurate and fast URL phishing detector: A convolutional neural network approach. Comput. Netw. 2020, 178, 107275. [Google Scholar] [CrossRef]
  22. Afzal, S.; Asim, M.; Javed, A.R.; Beg, M.O.; Baker, T. Urldeepdetect: A deep learning approach for detecting malicious urls using semantic vector models. J. Netw. Syst. Manag. 2021, 29, 1–27. [Google Scholar] [CrossRef]
  23. Wang, H.H.; Yu, L.; Tian, S.W.; Peng, Y.F.; Pei, X.J. Bidirectional LSTM Malicious webpages detection algorithm based on convolutional neural network and independent recurrent neural network. Appl. Intell. 2019, 49, 3016–3026. [Google Scholar] [CrossRef]
  24. Atrees, M.; Ahmad, A.; Alghanim, F. Enhancing Detection of Malicious URLs Using Boosting and Lexical Features. Intell. Autom. Soft Comput. 2022, 31, 1405. [Google Scholar] [CrossRef]
  25. Vanhoenshoven, F.; Nápoles, G.; Falcon, R.; Vanhoof, K.; Köppen, M. Detecting malicious URLs using machine learning techniques. In Proceedings of the 2016 IEEE Symposium Series on Computational Intelligence (SSCI), Athens, Greece, 6–9 December 2016; IEEE: New York, NY, USA, 2016; pp. 1–8. [Google Scholar]
  26. Smadi, S.; Aslam, N.; Zhang, L. Detection of online phishing email using dynamic evolving neural network based on reinforcement learning. Decis. Support Syst. 2018, 107, 88–102. [Google Scholar] [CrossRef]
  27. Khramtsova, E.; Hammerschmidt, C.; Lagraa, S.; State, R. Federated learning for cyber security: SOC collaboration for malicious URL detection. In Proceedings of the 2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS), Singapore, 29 November–1 December 2020; IEEE: New York, NY, USA, 2020; pp. 1316–1321. [Google Scholar]
  28. Tong, X.; Jin, B.; Wang, J.; Yang, Y.; Suo, Q.; Wu, Y. MM-ConvBERT-LMS: Detecting malicious web pages via multi-modal learning and pre-trained model. Appl. Sci. 2023, 13, 3327. [Google Scholar] [CrossRef]
  29. Sabir, B.; Babar, M.A.; Gaire, R.; Abuadbba, A. Reliability and robustness analysis of machine learning based phishing URL detectors. IEEE Trans. Dependable Secur. Comput. 2022; early access. [Google Scholar]
  30. Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078. [Google Scholar]
  31. Zhang, K.; Wang, H.; Chen, M.; Chen, X.; Liu, L.; Geng, Q.; Zhou, Y. Leveraging machine learning to proactively identify phishing campaigns before they strike. J. Big Data. 2025, 12, 124. [Google Scholar] [CrossRef]
  32. Zhou, J.; Zhang, K.; Bilal, A.; Zhou, Y.; Fan, Y.; Pan, W.; Peng, Q. An integrated CSPPC and BiLSTM framework for malicious URL detection. Sci. Rep. 2025, 15, 6659. [Google Scholar] [CrossRef]
  33. Li, X.; Wang, W.; Hu, X.; Yang, J. Selective kernel networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 510–519. [Google Scholar]
  34. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008. [Google Scholar]
  35. Bozkir, A.S.; Dalgic, F.C.; Aydos, M. GramBeddings: A new neural network for URL based identification of phishing web pages through n-gram embeddings. Comput. Secur. 2023, 124, 102964. [Google Scholar] [CrossRef]
  36. Do, N.Q.; Selamat, A.; Fujita, H.; Krejcar, O. An integrated model based on deep learning classifiers and pre-trained transformer for phishing URL detection. Future Gener. Comput. Syst. 2024, 161, 269–285. [Google Scholar] [CrossRef]
Figure 1. Structure of a GRU.
Figure 2. Structure of the Update Gate (in the red box).
Figure 3. Structure of the Reset Gate (in the red box).
Figure 4. Structure of the HSSA.
Figure 5. Structure of the LCNN.
Figure 6. Structure of the HSSLC-CharGRU.
Figure 7. Distribution of the Grambedding URL Data.
Figure 8. Distribution of the PhishCrawl URL Data.
Figure 9. Distribution of the Cross-Dataset URL Data.
Figure 10. Comparative results on the Grambedding dataset: major metrics (top) and minor metrics (bottom).
Figure 11. Comparative results on the PhishCrawl dataset: major metrics (top) and minor metrics (bottom).
Figure 12. Comparative results on the cross-dataset test: major metrics (top) and minor metrics (bottom).
Table 1. HSSLC-CharGRU work process.
Step 1 HSSLC-CharGRU uses the model to analyze malicious URLs
Input: Input sequence data x; optional additional information (url_lengths, url_char_distributions, domain_structures)
Output: Model output O (the predicted result for the URL)
1: function HSSLC_CharGRU(vocab_size, output_size, CharEmbedding_dim, hidden_dim, n_layers, drop_prob)
2:   CharEmbedding <- nn.CharEmbedding(vocab_size, CharEmbedding_dim)
3:   GRU <- nn.GRU(CharEmbedding_dim, hidden_dim, n_layers, dropout=drop_prob, batch_first=True, bidirectional=False)
4:   dropout <- nn.Dropout(drop_prob)
5:   fc_input_size <- hidden_dim * 2
6:   fc <- nn.Linear(fc_input_size, output_size)
7:   HSSA <- HSSA(channel=fc_input_size, embed_dim=fc_input_size)
8:   LCNN <- LCNN(input_dim=fc_input_size, hidden_dim=hidden_dim, output_dim=fc_input_size)
9:   return Module(CharEmbedding, GRU, dropout, fc, HSSA, LCNN)
10: function forward(x, url_lengths=None, url_char_distributions=None, domain_structures=None)
11:   device <- x.device
12:   E <- CharEmbedding(x).to(device) // [batch, seq_len, CharEmbedding_dim]
13:   B, _ <- GRU(E) // [batch, seq_len, hidden_dim * 2]
14:   B <- B.permute(0, 2, 1).unsqueeze(−1) // [batch, hidden_dim * 2, seq_len, 1]
15:   H <- HSSA(B) // [batch, hidden_dim * 2, seq_len, 1]
16:   H <- H.squeeze(−1).permute(0, 2, 1) // [batch, seq_len, hidden_dim * 2]
17:   P <- LCNN(H) // [batch, seq_len, hidden_dim * 2]
18:   O <- dropout(P)
19:   O <- fc(O[:, −1, :]) // [batch, output_size]
20:   if url_lengths is not None and url_char_distributions is not None and domain_structures is not None
21:     logic_loss <- LCNN.compute_logic_loss(O, url_lengths, url_char_distributions, domain_structures)
22:     return O, logic_loss
23:   return O
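To make the data flow of Table 1 easier to follow, the sketch below traces the tensor shapes through the forward pass. This is shape bookkeeping only, not an implementation of HSSA or LCNN; the GRU output width follows the hidden_dim * 2 annotation given in Table 1, and the function name and example sizes are our own.

```python
def forward_shapes(batch, seq_len, embed_dim, hidden_dim, output_size):
    """Shape bookkeeping for the Table 1 forward pass; each assignment
    mirrors the pseudocode line noted in its comment."""
    E = (batch, seq_len, embed_dim)        # line 12: CharEmbedding(x)
    B = (batch, seq_len, hidden_dim * 2)   # line 13: GRU output (per Table 1's annotation)
    B = (B[0], B[2], B[1], 1)              # line 14: permute(0, 2, 1) + unsqueeze(-1)
    H = B                                  # line 15: HSSA preserves the shape
    H = (H[0], H[2], H[1])                 # line 16: squeeze(-1) + permute(0, 2, 1)
    P = H                                  # line 17: LCNN keeps [batch, seq_len, hidden_dim * 2]
    O = (P[0], output_size)                # line 19: fc applied to the last time step
    return O

out = forward_shapes(batch=256, seq_len=200, embed_dim=128, hidden_dim=256, output_size=2)
print(out)  # (256, 2)
```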
Table 2. Results of malicious URL detection. Metrics with the best performance are highlighted in bold, and all performance metrics are expressed as percentages.
| Dataset | Name | Accuracy | Precision | Recall | F1 Score | F1_Micro | F1_Wted | FDR | FNR | FPR | DE | NPV | SPC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Grambedding | CharGRU | 97.52 | 97.52 | 97.52 | 97.52 | 97.53 | 97.52 | 2.460 | 2.461 | 2.461 | 95.13 | 97.53 | 97.53 |
| Grambedding | CharLSTM | 97.57 | 97.58 | 97.57 | 97.57 | 97.57 | 97.57 | 2.407 | 2.421 | 2.421 | 95.20 | 97.59 | 97.57 |
| Grambedding | CharCNN | 96.64 | 96.65 | 96.64 | 96.64 | 97.14 | 97.14 | 2.846 | 2.854 | 2.854 | 94.36 | 97.15 | 97.14 |
| Grambedding | CharCNNBiLSTM | 96.78 | 96.79 | 96.78 | 96.78 | 96.83 | 96.83 | 3.157 | 3.166 | 3.166 | 93.76 | 96.84 | 96.83 |
| Grambedding | HSSLC-CharGRU | 97.79 | 97.79 | 97.79 | 97.79 | 97.92 | 97.92 | 2.070 | 2.075 | 2.075 | 95.89 | 97.92 | 97.92 |
| PhishCrawl | CharGRU | 97.50 | 97.50 | 97.50 | 97.49 | 97.50 | 97.50 | 2.392 | 2.617 | 2.617 | 94.81 | 97.60 | 97.38 |
| PhishCrawl | CharLSTM | 97.17 | 97.17 | 97.17 | 97.17 | 97.24 | 97.24 | 2.725 | 2.834 | 2.834 | 94.40 | 97.27 | 97.16 |
| PhishCrawl | CharCNN | 96.96 | 96.96 | 96.96 | 96.96 | 97.51 | 97.51 | 2.464 | 2.542 | 2.542 | 94.97 | 97.53 | 97.45 |
| PhishCrawl | CharCNNBiLSTM | 96.74 | 96.75 | 96.74 | 96.74 | 96.89 | 96.88 | 3.056 | 3.203 | 3.203 | 93.68 | 96.94 | 96.79 |
| PhishCrawl | HSSLC-CharGRU | 97.53 | 97.53 | 97.53 | 97.52 | 97.73 | 97.73 | 2.194 | 2.359 | 2.359 | 95.32 | 97.80 | 97.64 |
Table 3. Results of generalization experiments across datasets. The best-performing metrics are highlighted in bold, and all performance metrics are expressed as percentages.
| Dataset | Name | Accuracy | Precision | Recall | F1 Score | F1_Micro | F1_Wted | FDR | FNR | FPR | DE | NPV | SPC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cross-dataset | CharGRU | 80.70 | 80.83 | 80.70 | 80.57 | 80.74 | 80.61 | 1.901 | 1.983 | 1.983 | 63.83 | 80.98 | 80.16 |
| Cross-dataset | CharLSTM | 83.69 | 84.22 | 83.69 | 83.50 | 83.76 | 83.57 | 1.540 | 1.699 | 1.699 | 68.16 | 84.59 | 83.00 |
| Cross-dataset | CharCNN | 75.61 | 75.84 | 75.61 | 75.32 | 76.26 | 75.91 | 2.310 | 2.465 | 2.465 | 55.66 | 76.89 | 75.34 |
| Cross-dataset | CharCNNBiLSTM | 75.35 | 75.35 | 75.35 | 75.34 | 75.41 | 75.41 | 2.477 | 2.478 | 2.478 | 56.51 | 75.22 | 75.21 |
| Cross-dataset | HSSLC-CharGRU | 85.58 | 86.06 | 85.58 | 85.61 | 85.88 | 85.91 | 1.402 | 1.375 | 1.375 | 74.21 | 85.97 | 86.24 |

Share and Cite

MDPI and ACS Style

Zhou, J.; Zhang, K.; Zheng, B.; Zhou, Y.; Xie, X.; Jin, M.; Liu, X. A Malicious URL Detection Framework Based on Custom Hybrid Spatial Sequence Attention and Logic Constraint Neural Network. Symmetry 2025, 17, 987. https://doi.org/10.3390/sym17070987

