A Multi-Angle Semantic Feature Fusion Method for Web User Behavior Anomaly Detection

Wang, Li; Xia, Mingshan; Li, Yakang; Xu, Jiahong; Hou, Fengyao; Qi, Fazhi

doi:10.3390/info16090807

by Li Wang ^1,2, Mingshan Xia ^1,2,*, Yakang Li ^1,2, Jiahong Xu ^3, Fengyao Hou ^1,2 and Fazhi Qi ^1,2,*

^1 Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
^2 Spallation Neutron Source Science Center (SNSSC), Dongguan 523803, China
^3 State Grid Tianjin Electric Power Company Dongli Power Supply Branch, Tianjin 300300, China
^* Authors to whom correspondence should be addressed.

Information 2025, 16(9), 807; https://doi.org/10.3390/info16090807

Submission received: 29 July 2025 / Revised: 5 September 2025 / Accepted: 15 September 2025 / Published: 17 September 2025

(This article belongs to the Section Information Security and Privacy)


Abstract

To address the increasing complexity of web user behavior anomaly detection and the loss of semantic information caused by relying solely on a single feature type, such as request semantics or request sequences, this study proposes a multi-angle semantic feature fusion approach for user behavior anomaly detection. The research is based on user sessions. First, the access sequence behavior within user sessions is analyzed, and an improved SimHash algorithm extracts sequence features that model browsing patterns. Second, a Transformer model with multi-head attention extracts semantic features from the content of user sessions, representing the semantics of user visits. Finally, an end-to-end model fuses the sequence and semantic features to detect user behavior anomalies. Experimental results show that the proposed model performs well and stably in detection accuracy and is effective at identifying anomalous users in real-world data. As the proportion of anomalous sessions increases, precision, recall, and F1-score improve, all reaching 99%. Even when anomalous sessions are scarce in the dataset, the model still achieves satisfactory detection results.

Keywords: user behavior; anomaly detection; semantic feature fusion; multi-angle

1. Introduction

With the rapid development of the Internet, the number of users of various online services has surged, and user behaviors have become more complex and diverse. However, some malicious activities, such as world wide web (Web) crawlers, injection attacks, and cross-site scripting (XSS) attacks, may infringe on user privacy and threaten network security. To protect user information, analyzing and understanding these large-scale, high-dimensional user behavior data are of great significance for many application scenarios, such as cyber security monitoring and financial fraud detection.

Web logs contain rich attribute information about user behaviors, such as user agent, access IP, access time, requested resources, and more. This provides sufficient informational support for research on Web log-based user behavior anomaly detection. Researchers can analyze user access behavior from multiple perspectives by parsing different fields in Web logs, and develop various user behavior anomaly detection models based on different processing methods and modeling algorithms. For example, by studying the statistical regularities of fields such as parameter length, completeness of features, and access sequence, models for detecting anomalies in Web requests have been constructed based on these statistical features [1,2,3]. However, these rule-based or single-pattern behavior models often lack semantic features of user access behaviors, making it difficult to fully reflect the high variability and complexity of user behaviors. Additionally, since web systems contain thousands of web pages, sequence analysis can result in request sequences of varying lengths; truncating or padding sequences may also influence detection results.

In recent years, rapid advancements have been made in semantic feature engineering and representation learning techniques. Researchers are now able to directly extract semantic information from user behavior data and build models based on this information for web anomaly detection. Semantic information effectively reflects deep user intentions and preferences underlying the content, making it a key element for understanding and characterizing complex user behaviors. In anomaly detection tasks, ensemble learning methods have demonstrated significant advantages. For example, Ref. [4] effectively improved model generalization and robustness by integrating predictions from multiple machine learning models. Ref. [5] proposed an ensemble model combining eXtreme Gradient Boosting (XGBoost) with support vector machines, Decision Trees, and naive Bayes, utilizing a hard voting mechanism to distinguish between normal and malicious traffic. Although these methods have somewhat improved detection performance, they often rely on a single type of feature and have limited ability to capture semantic information and contextual associations within behavior sequences, making it challenging to effectively handle increasingly complex multi-stage attack behaviors. On the other hand, while semantic information has potential in detection, current research in semantic-based user behavior modeling still exhibits notable deficiencies. Additionally, since user behavior data originates from diverse sources, including web browsing records, access sequences, and request content, how to achieve effective multi-source feature fusion and modeling remains a critical issue that requires further exploration.

To obtain a more comprehensive view of user behaviors, establishing multi-angle semantic features and integrating multi-view user behavior data for modeling has become a research hot spot, such as through multi-modal techniques. However, existing multi-modal learning primarily focuses on traditional multimedia data like images, videos, and speech. Although its application in network user behavior anomaly detection is still relatively limited, its research approach can serve as a valuable reference for this study. Therefore, this paper combines deep learning techniques with multi-modal approaches to propose a user behavior anomaly detection method based on multi-semantic feature fusion. The specific contributions of this work can be summarized as follows:

- A significant innovation of this Web anomaly behavior recognition method is its integration of behavior features and semantic features from multiple perspectives, enriching the semantic expression of user behaviors.
- The behavioral modality characterizes user behavior by analyzing user session sequences in Web traffic logs. An improved SimHash algorithm addresses the issue of inconsistent session lengths across users, enabling the model to capture behavioral patterns and regularities.
- The semantic modality employs a Transformer model enhanced with multi-head attention to mine semantic information from textual data, thereby improving the understanding of user intentions and the underlying meanings behind behaviors.
- An end-to-end Web abnormal user behavior detection model is constructed, which not only integrates behavior and semantic features but also uses a set of loss functions, including inner label loss, common space loss, and label loss, to minimize the mapping errors between the two modalities. This design allows the model to focus more on the fusion and complementarity of the two features during training, thereby enhancing the accuracy and robustness of anomaly detection.

The rest of the paper is organized as follows: Section 2 provides an in-depth summary of related work and its limitations in user behavior detection; Section 3 describes the research methods and process; Section 4 introduces the evaluation methods and comparative experiments, along with a detailed discussion of the results; and Section 5 concludes the paper with a summary of the work.

2. Related Works

Data mining analyzes user behavior patterns to identify potential security risks or anomalies. Some scholars have attempted to automatically discover and model normal user behavior patterns using sequence pattern mining techniques, applying these models to Web anomaly detection tasks with positive results [6,7,8,9]. Ref. [6] proposed a novel dynamic hidden semi-Markov model to simulate time-varying user behaviors, monitoring evolving behaviors of users and attackers over time. Ref. [7] employed statistical methods to extract user access behavior patterns, distinguishing normal from abnormal user activities. Ref. [8] used sequence pattern mining to uncover unique behavior patterns of individual users, discovering anomalous users through the Apriori-kl combined rule mining approach. Ref. [9] introduced the BADSM method, which employs adaptive behavior pruning techniques to mine sequence patterns from data streams efficiently and in a targeted manner, characterizing normal user behavior more accurately and achieving precise anomaly detection. However, although frequent sequence mining offers valuable insights into user behavior patterns, it typically only considers the co-occurrence of antecedent and subsequent behaviors, failing to fully capture the temporal evolution and contextual dependencies between behavior elements, which limits its effectiveness in representing the dynamic nature of real user behaviors. Additionally, Ref. [10] developed a method based on statistical features of application programming interface (API) attack access behavior patterns, providing protection against key security threats such as Economic Denial of Service (EDoS) attacks (open web application security project (OWASP) Top 4), unauthorized access attempts (OWASP Top 3), brute-force user authentication, and API abuse.

Moreover, similarity analysis is another commonly used method for detecting anomalous user behavior, primarily utilizing clustering techniques. Ref. [11] proposed a similarity-based clustering model for user anomaly detection, establishing prototype profiles of normal user behavior based on user access sequences. By calculating the similarity distance between observed user behavior and these prototype profiles, abnormal behaviors can be identified. Ref. [12] extracted textual content from web logs and computed the entropy of user behaviors. Using an optimized K-means clustering algorithm, they clustered users based on their entropy values to detect users exhibiting abnormal behavior. Ref. [13] analyzed Web log data using various clustering algorithms such as Gaussian Mixture Models (GMM), K-means, and Bayesian GMM (BGMM) to detect anomalies in user browsing behavior. Ref. [14] proposed a Web user anomaly detection method based on user similarity and fuzzy clustering, capable of efficiently detecting and locating anomalous users.

All these studies validate the effectiveness of clustering techniques for detecting browsing behavior anomalies; however, they all require pre-extracted user behavior features, and their performance heavily depends on whether these features are comprehensive.

With the development of machine learning and deep learning, many researchers have begun to focus on applying deep learning methods to user behavior anomaly detection. Ref. [15] built a graph neural network based on user access sequences, though this approach lacks consideration of the semantic features of user visits. Embedding techniques have shown promising potential in automatically extracting Web request features. Ref. [16] proposed a long short-term memory (LSTM) based approach for Web application anomaly detection, transforming raw request logs into eight-dimensional request vectors and connecting multiple requests from the same user in chronological order to form session vectors. These session vectors are then used as input to the LSTM for anomaly detection. Ref. [17] proposed a web user behaviors anomaly detection method called Web-based Learner (WebLearner), which can automatically integrate security analysis feedback and achieves a high level of accuracy. Ref. [18] introduced a deep forest-based Web application user behavior prediction method, addressing the problem of traditional learning methods requiring extensive hyperparameter tuning.

In the research on web anomaly detection, ensemble learning methods have demonstrated significant advantages. Ref. [4] proposed a web attack detection system that utilizes three independent deep learning models for classification. The final decision module combines these parallel outputs to obtain the ultimate detection result. Ref. [19] introduced a stacked ensemble-based classifier model for XSS attack detection, integrating stacking ensemble techniques with machine learning classifiers such as K-Means clustering, Random Forest, and Decision Tree as base learners. The final detection is performed using a logistic regression (LR) algorithm. Ref. [20] employed feature fusion techniques, first extracting semantic features of words and characters using a word-level convolutional neural network (CNN) and char-level CNN, respectively. These features are then merged to facilitate anomaly detection. However, due to the varying importance and weights of different features, straightforward feature concatenation may lead to information mixing or loss, potentially affecting detection accuracy.

Overall, current research mainly concentrates on behavior analysis based on a single pattern, primarily utilizing a single data source to establish either access sequence features or semantic features for modeling. There is a lack of comprehensive feature construction from multiple perspectives. Moreover, user request sequences vary in length, and truncation of sequences can affect feature accuracy, thereby impacting the effectiveness of user behavior detection.

3. The Proposed Model

This paper adopts a semantic-driven methodology, focusing on an in-depth analysis of the multi-layered semantic meanings involved in user interactions with Web services. This approach enables us to more accurately capture the intrinsic patterns of user behavior, thereby enhancing the precision and effectiveness of the analysis.

Web traffic logs contain rich semantic information across multiple dimensions, including Web request uniform resource locators (URLs) and access sequence behaviors. Among these, URLs reflect the semantic content of pages, while access sequences reveal users’ motivations and preferences. The proposed method uses the semantics of URLs and access sequences as core features to guide the detection model in learning complex user behavior patterns. This helps in identifying cases of semantic inconsistency. By cross-validating the semantics of access sequence behaviors with those of request URLs, the method complicates an attacker’s effort to deceive both semantic modes simultaneously. This dual verification mechanism effectively improves the accuracy of anomaly detection and provides an efficient means for analyzing the security of Web traffic.

3.1. Behavior Feature Extraction Algorithm

For web applications, one of the key features describing user behavior is the user access sequence. The user access sequence refers to a series of web page requests initiated by the same host over a period of time. Analyzing user access sequences is of great significance for detecting security threats and abnormal behaviors in web applications.

The user access sequence, or behavior feature, is represented by constructing a model of the user’s browsing behavior sequence to obtain a vector representation of the user behavior. Different users visiting a website may follow different sequences because their access processes involve different steps. For instance, User1 first visits A.html, then B.html, and subsequently C.html and D.html, so the request sequence for User1 can be represented as ABCD. Another user, User2, may have a sequence like ACDBCD. Usually, association analysis methods like Apriori [21,22,23] are employed to analyze such sequences and mine frequent patterns of user behavior. However, when malicious users exhibit behaviors similar to normal users, this approach becomes less effective at detection. Additionally, the varying lengths of access sequences can lead to information loss due to sequence truncation.

The SimHash algorithm, proposed by Moses Charikar, is a type of hash algorithm designed to preserve local data similarity. Its core idea is to use a cleverly designed hash mapping function such that similar data points in the original high-dimensional space remain close to each other after being projected onto a lower-dimensional space.

PageRank is a web ranking algorithm developed by Larry Page in 1996, which determines a webpage’s importance and authority based on link relationships between pages. A higher importance results in a higher ranking. In this context, the page weight is obtained using the PageRank method.
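As a hedged illustration of how such page weights could be obtained, the sketch below runs a plain power-iteration PageRank over a tiny link graph. The link graph, the damping factor of 0.85, the iteration count, and the function name `pagerank` are all assumptions made for illustration; the paper does not state how its PageRank weights were computed.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict page -> score; the scores sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the teleportation share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical site structure (not taken from the paper's dataset).
site = {
    "A.html": ["B.html", "C.html"],
    "B.html": ["C.html"],
    "C.html": ["A.html"],
    "D.html": ["C.html"],
}
weights = pagerank(site)
```

In this toy graph, `C.html` receives links from three pages and ends up with the largest weight, while `D.html`, which nothing links to, keeps only the teleportation share.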

Based on the SimHash algorithm, this paper proposes Algorithm 1, which vectorizes user behavior sequences. The input to this algorithm is a user request sequence, and the output is a behavioral feature vector generated from the sequence. During the computation, a hash value is generated for each element of the request sequence. For each bit of this hash, if the bit is 1, the corresponding component of the initial vector is increased by the PageRank-derived weight of that element; otherwise, it is decreased by this weight. Finally, the accumulated vector is reduced to a binary vector, with each positive component mapped to 1 and every other component mapped to 0.

Algorithm 1: Behavior Feature Extraction Algorithm

BehaviorHash(Input vector V):
    Initialize a vector V' = [0, 0, ..., 0] of length f
    Initialize an f-bit binary value S = [0, 0, ..., 0]
    For each element v in V:
        # Use a traditional hash to compute the f-bit binary signature b of v
        b = Hash(v)
        For i from 0 to f - 1:
            If b[i] == 1:
                V'[i] += weight of PR(v)
            Else:
                V'[i] -= weight of PR(v)
    For i from 0 to f - 1:
        If V'[i] > 0:
            S[i] = 1
        Else:
            S[i] = 0
    Return S  # the vectorized SimHash value

The modified SimHash algorithm is used to extract a behavioral feature vector for each user visit session sequence. This algorithm converts each request sequence into a vector of the same dimension, regardless of the actual length of the request sequence. This approach addresses the issue of varying request sequence lengths and facilitates subsequent user behavior analysis. The behavior feature extraction procedure is given in Algorithm 1.
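The weighted SimHash procedure can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: SHA-256 stands in for the unspecified "traditional hash", the page weights are made-up placeholders for PageRank scores, and the names `hash_bits`, `behavior_hash`, and `hamming` are hypothetical.

```python
import hashlib


def hash_bits(token, f=256):
    """Return f bits (0/1) taken from the SHA-256 digest of token."""
    if not 0 < f <= 256:
        raise ValueError("f must be between 1 and 256 when using SHA-256")
    value = int.from_bytes(hashlib.sha256(token.encode("utf-8")).digest(), "big")
    return [(value >> i) & 1 for i in range(f)]


def behavior_hash(sequence, page_weight, f=256):
    """Map a request sequence of any length to an f-bit behavior vector."""
    acc = [0.0] * f
    for page in sequence:
        bits = hash_bits(page, f)
        w = page_weight.get(page, 1.0)  # assumed default weight for unseen pages
        for i in range(f):
            # A 1-bit adds the page weight, a 0-bit subtracts it.
            acc[i] += w if bits[i] == 1 else -w
    # Reduce the accumulated vector to a binary signature.
    return [1 if a > 0 else 0 for a in acc]


def hamming(a, b):
    """Number of positions where two signatures differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical page weights standing in for PageRank scores.
weights = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
user1 = behavior_hash(list("ABCD"), weights)
user2 = behavior_hash(list("ACDBCD"), weights)
```

Both sessions yield 256-bit vectors although their lengths differ (4 and 6 requests), and `hamming(user1, user2)` gives one possible way to compare two sessions.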

3.2. Semantic Feature Extraction Model

For web applications, another key feature describing user behavior is the semantic content of user requests to web services. The extraction and representation of semantic features are crucial for understanding session information. By applying semantic models, we can extract key semantic information from sessions and encode it into semantic feature vectors, providing an intuitive and efficient means for in-depth analysis of user behavior.

This study employs a Transformer model enhanced with multi-head attention mechanisms to extract user access semantic features. The model architecture is shown in Figure 1, which includes the following layers: input layer, embedding layer, positional layer, self-attention layer, feed-forward layer, and feature layer. The calculation methods and representations for each layer are described below.

Input Layer: Receives the input sequence.
Embedded Layer: Converts discrete or unstructured data into continuous vector representations that the model can understand.
Positional Layer: Adds positional information to each position in the sequence. The specific calculation methods are as follows:

$PE(pos, 2i) = \sin\left(\frac{pos}{10000^{2i/d}}\right)$

(1)

$PE(pos, 2i + 1) = \cos\left(\frac{pos}{10000^{2i/d}}\right)$

(2)

Each word in the sentence has a corresponding position index $pos$. This word is mapped to a d-dimensional word vector. In this d-dimensional vector, the even dimensions (0, 2, 4, …) are indexed by $2i$, while the odd dimensions (1, 3, 5, …) are indexed by $2i + 1$.
In code implementation, instead of directly calculating the power $10000^{2i/d}$, the equivalent exponential and logarithmic form of its reciprocal is used: $10000^{-2i/d} = e^{-\frac{2i}{d} \ln(10000)}$.
Advantages of this form:
- It avoids the numeric overflow or underflow that direct exponentiation may cause.
- In many computing devices and libraries, implementations of exponential and logarithmic operations are typically faster than power operations.
Self-Attention Layer: The self-attention mechanism is designed to compute relevance weights between each word and other words in the input sequence, generating a representation vector for each word that incorporates contextual information.
Multi-Head Attention Function: To further enhance the ability of the self-attention mechanism to aggregate contextual information, multi-head attention is implemented, allowing the model to focus on different aspects of the context. The calculation formulas are as follows:

$\text{head}_{i} = \text{Attention}(Q W_{i}^{Q}, K W_{i}^{K}, V W_{i}^{V})$

(3)

$\text{MultiHead}(Q, K, V) = \text{Concat}(\text{head}_{1}, \text{head}_{2}, \dots, \text{head}_{h}) W^{O}$

(4)

where Q, K, and V represent the matrices formed by concatenating the q, k, and v vectors of the different words in the input sequence, $W_{i}^{Q}$, $W_{i}^{K}$, and $W_{i}^{V}$ are the projection matrices of the i-th head, and $W^{O}$ is the output projection matrix applied to the concatenated heads.
Multi-head attention calculates multiple attention representations in parallel within different subspaces and concatenates them, enhancing the model’s capacity to capture diverse positional relationships across different subspaces.
Feature Layer: The final output layer projects the model’s output into a two-dimensional subspace.
Loss Function: Mean Squared Error (MSE) is used, and all parameters in the model are optimized using Adam.
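The positional and multi-head attention layers described above can be illustrated with a small pure-Python sketch. It is only a sketch under assumptions: the attention function is the standard scaled dot-product form, which the text does not spell out; the dimensions (4 tokens, model width 8, 2 heads), the random weights, and all function names are hypothetical; and no training, feed-forward layer, or feature layer is included.

```python
import math
import random


def positional_encoding(seq_len, d):
    """Sinusoidal positional encoding, Equations (1)-(2); d is assumed even."""
    pe = [[0.0] * d for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(d // 2):
            # 10000^(-2i/d) computed as exp(-(2i/d) * ln(10000)).
            inv_freq = math.exp(-(2 * i / d) * math.log(10000.0))
            pe[pos][2 * i] = math.sin(pos * inv_freq)
            pe[pos][2 * i + 1] = math.cos(pos * inv_freq)
    return pe


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    m = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention (assumed standard form)."""
    d_k = len(q[0])
    scores = matmul(q, list(zip(*k)))  # q times k transposed
    attn = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(attn, v)


def multi_head(x, heads, w_o):
    """Equations (3)-(4): per-head attention, concatenation, output projection."""
    outputs = [attention(matmul(x, wq), matmul(x, wk), matmul(x, wv))
               for wq, wk, wv in heads]
    concat = [sum((out[t] for out in outputs), []) for t in range(len(x))]
    return matmul(concat, w_o)


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
seq_len, d_model, h = 4, 8, 2
d_head = d_model // h
pe = positional_encoding(seq_len, d_model)
emb = random_matrix(seq_len, d_model, rng)  # stand-in for learned embeddings
x = [[e + p for e, p in zip(er, pr)] for er, pr in zip(emb, pe)]
heads = [tuple(random_matrix(d_model, d_head, rng) for _ in range(3)) for _ in range(h)]
w_o = random_matrix(h * d_head, d_model, rng)
y = multi_head(x, heads, w_o)  # one contextual vector per input position
```

Each head works in its own `d_head`-dimensional subspace, and the concatenated result is projected back to the model width, so `y` has the same shape as `x`.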

3.3. MSFFusion Model

After the above processes, both semantic features and behavioral features are obtained. They are treated as two modalities, denoted by S and L, respectively: S represents the semantic modality, $S = \{s_{1}, s_{2}, \dots, s_{m}\}$, and L represents the behavioral modality, $L = \{l_{1}, l_{2}, \dots, l_{m}\}$. Figure 2 illustrates the fusion model, which takes these two types of features as input. The two feature types are processed separately by two CNNs to extract their respective features, and the model is then trained using three loss functions to achieve the classification goal.

The three specific loss functions are as follows:

1. Inner Label Loss
The goal of feature fusion is to discover the underlying unified semantic representation across modalities, i.e., to learn a common semantic space. The core objective of this shared space is to eliminate surface-level differences between data from different modalities and to reveal the intrinsic semantic relationships within the data. This ensures that samples close in semantic space have similar representations, while those with semantic differences are represented dissimilarly. This approach facilitates higher-level semantic understanding and analysis tasks. To learn features of web user behavior data, this research minimizes the modal discrepancy using Multi-Kernel Maximum Mean Discrepancy (MK-MMD). The goal is to find a function $\phi(\cdot)$ such that the behavior modality and semantic modality can be mapped into the same feature space, i.e., a Hilbert space.

$T_{1} = \text{MK-MMD}^{2} = \left\| E_{u}[\phi(S_{i})] - E_{v}[\phi(L_{i})] \right\|_{\mathcal{H}_{k}}^{2}$

(5)

$\mathcal{K} \triangleq \left\{ k = \sum_{i=1}^{m} \beta_{i} k_{i} : \beta_{i} \geq 0, \forall i \right\}$

(6)

where $\mathcal{H}_{k}$ is the reproducing kernel Hilbert space associated with the kernel $k$, and $k$ is drawn from the family $\mathcal{K}$ of non-negative combinations of the $m$ base kernels $k_{i}$ with coefficients $\beta_{i}$.
2. Common Space Loss
Common space loss has been used effectively in multimodal research and has been shown in the literature to improve the detection performance of models. Following Refs. [23,24], we incorporate this concept into our framework. Additionally, Ref. [25] demonstrated that features fused into a common space are the most effective for classification tasks. Building on the assumption that the two semantic features are projected into the same shared space, this paper introduces a common space loss function to optimize the shared feature representation. This loss $T_{2}$ encourages vectors of user behaviors belonging to the same class to be more compact, and those of different classes to lie farther apart, thereby enhancing the discriminability between different request categories.

$\begin{aligned} T_{2} = {} & \frac{1}{n^{2}} \sum_{i,j=1}^{n} \left( \log\left(1 + e^{\Gamma_{ij}}\right) - S_{ij}^{uv} \Gamma_{ij} \right) \\ & + \frac{1}{n^{2}} \sum_{i,j=1}^{n} \left( \log\left(1 + e^{\Phi_{ij}}\right) - S_{ij}^{uu} \Phi_{ij} \right) \\ & + \frac{1}{n^{2}} \sum_{i,j=1}^{n} \left( \log\left(1 + e^{\Theta_{ij}}\right) - S_{ij}^{vv} \Theta_{ij} \right) \end{aligned}$

(7)

where $u_{i}$ and $v_{j}$ denote the common-space representations of samples from the two modalities, $\Gamma_{ij} = \frac{1}{2} \cos(u_{i}, v_{j})$, $\Phi_{ij} = \frac{1}{2} \cos(u_{i}, u_{j})$, and $\Theta_{ij} = \frac{1}{2} \cos(v_{i}, v_{j})$. The cosine similarity $\cos(\cdot, \cdot)$ measures the angular similarity between two vectors. $S_{ij}^{uv} = \mathbb{1}\{u_{i}, v_{j}\}$, $S_{ij}^{uu} = \mathbb{1}\{u_{i}, u_{j}\}$, and $S_{ij}^{vv} = \mathbb{1}\{v_{i}, v_{j}\}$, where the indicator function $\mathbb{1}\{\cdot\}$ takes the value 1 if the two elements belong to the same class and 0 otherwise. The aim is to maximize the distances between representations of different classes while minimizing the distances between representations of the same class, both across and within modalities.
3. Label Loss
To reduce the distance between the true and predicted values, the Mean Absolute Error (MAE) function is used, where $f(\cdot)$ is the prediction function and $y_{i}$ represents the label of the i-th sample.

$T_{3} = T_{w} + T_{c} = \frac{1}{n} \sum_{i=1}^{n} \left| y_{i} - f(s_{i}) \right| + \frac{1}{n} \sum_{i=1}^{n} \left| y_{i} - f(l_{i}) \right|$

(8)

where $T_{w}$ and $T_{c}$ denote the classification losses of the semantic modality and the behavior modality, respectively.
4. Combined Loss
The final loss function combines the three losses defined in Equations (5), (7), and (8):

$T = T_{1} + T_{2} + T_{3}$

(9)
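To make the three loss terms concrete, the following sketch evaluates each of them on toy feature vectors. It rests on several assumptions that the text does not fix: Gaussian base kernels with arbitrarily chosen bandwidths and equal coefficients, a biased empirical MMD estimate obtained through the kernel trick, and hand-written two-dimensional features, labels, and predictions. All function names are hypothetical, and no gradient-based training is shown.

```python
import math


def gaussian_kernel(x, y, sigma):
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def multi_kernel(x, y, sigmas, betas):
    """Non-negative combination of base kernels, as in Equation (6)."""
    return sum(b * gaussian_kernel(x, y, s) for s, b in zip(sigmas, betas))


def mean_kernel(xs, ys, sigmas, betas):
    return sum(multi_kernel(x, y, sigmas, betas) for x in xs for y in ys) / (len(xs) * len(ys))


def mk_mmd2(s_feats, l_feats, sigmas=(0.5, 1.0, 2.0), betas=(1 / 3, 1 / 3, 1 / 3)):
    """Biased empirical estimate of the squared MK-MMD, Equation (5)."""
    return (mean_kernel(s_feats, s_feats, sigmas, betas)
            + mean_kernel(l_feats, l_feats, sigmas, betas)
            - 2.0 * mean_kernel(s_feats, l_feats, sigmas, betas))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def pair_term(xs, ys, labels):
    """One of the three sums in Equation (7)."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        for j in range(n):
            g = 0.5 * cosine(xs[i], ys[j])
            same = 1.0 if labels[i] == labels[j] else 0.0
            total += math.log(1.0 + math.exp(g)) - same * g
    return total / (n * n)


def common_space_loss(u, v, labels):
    return pair_term(u, v, labels) + pair_term(u, u, labels) + pair_term(v, v, labels)


def label_loss(y, pred_s, pred_l):
    """Equation (8): MAE of the semantic and the behavioral predictions."""
    n = len(y)
    t_w = sum(abs(a - b) for a, b in zip(y, pred_s)) / n
    t_c = sum(abs(a - b) for a, b in zip(y, pred_l)) / n
    return t_w + t_c


# Toy common-space features for three sessions (made-up values).
u = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]   # semantic modality
v = [[0.8, 0.2], [1.0, 0.1], [0.1, 0.9]]   # behavioral modality
labels = [0, 0, 1]
pred_s = [0.1, 0.2, 0.9]                    # hypothetical predictions
pred_l = [0.0, 0.3, 0.8]

t1 = mk_mmd2(u, v)
t2 = common_space_loss(u, v, labels)
t3 = label_loss(labels, pred_s, pred_l)
total = t1 + t2 + t3                        # Equation (9)
```

With these toy values the label loss is 0.3, and the common space loss is lower for the labels shown than for a mismatched labeling such as `[0, 1, 0]`, which reflects the intended pull of same-class representations toward each other.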

4. Experiments and Results

To validate the proposed method, experiments were conducted on publicly available standard datasets to evaluate its performance. The experimental environment runs on Windows 10, equipped with an i7 CPU, 64 GB of RAM, and an NVIDIA Quadro P600 graphics card.

4.1. Datasets

MACCDC (Mid-Atlantic Collegiate Cyber Defense Competition) [26] is a cybersecurity competition held in the United States aimed at undergraduate students specializing in network security. The MACCDC2012 dataset provides data from this competition, currently containing over two million records of malicious activities. As a widely recognized competition dataset, MACCDC2012 helps researchers and developers to develop and test new cybersecurity tools, algorithms, and models to more effectively identify, defend against, and respond to network attacks. Specific parameter information is shown in Table 1.

Therefore, this paper selects this dataset as attack samples. Normal user behavior data were collected from the intranet of the China Spallation Neutron Source (CSNS) between 21 March 2023 and 26 April 2023. The dataset contains approximately 200 million records, which were labeled using Apache-scalp v0.4. The malicious samples identified through data labeling, combined with the MACCDC2012 dataset, are used as attack samples, while the data labeled as normal are regarded as positive samples.

User sessions are distinguished by uid. According to reference [12], if the uid remains the same within a 30 min window, the activity is counted as belonging to the same user.
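One possible reading of this sessionization rule is sketched below. The assumption, which the text does not confirm, is that a session collects all requests of one uid that fall within 30 minutes of the session's first request; an inactivity-gap rule would be an equally plausible alternative. The record layout `(uid, timestamp, url)`, the sample records, and the function name `split_sessions` are hypothetical.

```python
from datetime import datetime, timedelta


def split_sessions(records, window=timedelta(minutes=30)):
    """Group (uid, timestamp, url) records into per-user sessions.

    Assumption: a session holds all requests of one uid that fall within
    `window` of that session's first request.
    """
    sessions = []
    open_session = {}  # uid -> index of that uid's current session
    for uid, ts, url in sorted(records, key=lambda r: (r[0], r[1])):
        idx = open_session.get(uid)
        if idx is None or ts - sessions[idx]["start"] > window:
            sessions.append({"uid": uid, "start": ts, "urls": []})
            idx = len(sessions) - 1
            open_session[uid] = idx
        sessions[idx]["urls"].append(url)
    return sessions


# Made-up log records for illustration.
records = [
    ("u1", datetime(2023, 3, 21, 10, 0), "/A.html"),
    ("u2", datetime(2023, 3, 21, 10, 5), "/A.html"),
    ("u1", datetime(2023, 3, 21, 10, 10), "/B.html"),
    ("u1", datetime(2023, 3, 21, 10, 45), "/C.html"),
]
sessions = split_sessions(records)
```

Here the third request of `u1` arrives 45 minutes after the first one, so it opens a second session for that user, giving three sessions in total.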

After data cleaning and preprocessing, this paper designed five different experimental scenarios, each with datasets containing varying proportions of abnormal and normal data for model validation. Details are shown in Table 2.

4.2. Evaluation Methods and Model Parameters

This paper evaluates the performance of each method using accuracy, precision, recall, and F1-score, and assesses the overall effectiveness of the models through accuracy and F1-score. Accuracy refers to the proportion of correct predictions (both positive and negative) made by the model out of all predictions. Whether the prediction is positive or negative, as long as it is correct, it contributes to the numerator of accuracy. Precision focuses on the samples predicted as positive, representing the ratio of truly positive samples within those predictions. It reflects the model’s accuracy in predicting positive samples. Recall measures the proportion of actual positive samples that are correctly identified by the model. It indicates the coverage of positive samples by the model. F1-score is the harmonic mean of precision and recall, balancing both metrics to evaluate the model’s performance comprehensively. Formulas are as follows:

$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$

(10)

$\text{Precision} = \frac{TP}{TP + FP}$

(11)

$\text{Recall} = \frac{TP}{TP + FN}$

(12)

$\text{F1-score} = \frac{2 \times (\text{Precision} \times \text{Recall})}{\text{Precision} + \text{Recall}}$

(13)
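Equations (10) to (13) translate directly into code. In the sketch below, TP, FP, TN, and FN are taken to be the usual confusion-matrix counts (true positives, false positives, true negatives, false negatives); the counts in the example are made up, and the function names are hypothetical. Denominators are assumed to be non-zero.

```python
def f1_score(precision, recall):
    """Equation (13): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, tn, fn):
    """Equations (10)-(13) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall, f1_score(precision, recall)


# Made-up counts for illustration.
accuracy, precision, recall, f1 = classification_metrics(tp=90, fp=10, tn=880, fn=20)

# Consistency check using the Scenario 1 precision and recall reported in Section 4.3.1.
f1_scenario1 = f1_score(0.9048, 0.9512)
```

Plugging the Scenario 1 precision (90.48%) and recall (95.12%) into Equation (13) gives roughly 92.74%, which agrees with the reported F1-score of 92.75% up to rounding of the inputs.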

Table 3 shows the initialization parameters of the model. The output length of SimHash is 256, and the semantic modality features also have 256 dimensions. To reduce randomness, improve generalization, and obtain more stable and reliable results, the average of five experimental runs is used as the final experimental result.

4.3. Experimental Results and Analysis

To comprehensively evaluate our method, this paper conducts experiments on both real datasets. The experimental results are shown below.

4.3.1. Comparison of Models Under Different Scenarios

This experiment aims to evaluate the performance of the proposed MSFFusion model under scenarios with varying ratios of abnormal sessions. As shown in Table 2, we designed five scenarios (Scenario 1–5) where the ratio of abnormal to normal sessions progressively changes from 1:20 (5k:100k) to 2:5 (40k:100k), thus simulating data distributions ranging from highly imbalanced to relatively balanced. The experimental results are shown in Table 4 and Figure 3.

The results from Table 4 and Figure 3 show that all three metrics (Precision, Recall, F1-Score) of the model increase monotonically from Scenario 1 to Scenario 5. The F1-Score rises steadily from 92.75% in Scenario 1 to 99.72% in Scenario 5. This trend indicates that the model can learn more robust and precise feature representations in scenarios where abnormal samples are more abundant.

Scenario 1 (Abnormal/Normal ∼ 1:20): Even in this most imbalanced scenario, the model achieved a commendable F1-Score of 92.75%, demonstrating its capability to handle highly imbalanced data and its strong baseline performance. It is worth noting that the Recall (95.12%) was higher than the Precision (90.48%), indicating that the model’s strategy in this scenario leaned towards capturing all potential anomalies as much as possible, albeit at the cost of a slight increase in false positives.

Scenarios 2 and 3 (ratios of 1:10 and 1:5): As the number of abnormal samples increased, Precision rose markedly relative to Scenario 1 (from 90.48% to above 95%), while Recall was somewhat lower (about 93.2–93.3%). The F1-Score nonetheless increased to roughly 93.7–93.9%. This indicates that the larger number of abnormal samples helped alleviate the class imbalance problem, giving the model more opportunities to learn anomalous patterns and yielding a clearer decision boundary.

Scenarios 4 and 5 (ratios of 3:10 and 2:5): When the ratio of abnormal to normal samples increased further to roughly 1:3, the model’s performance peaked, with all three metrics exceeding 98.7%. This shows that our model delivers its best performance in environments with a relatively balanced data distribution. The nearly identical values of Precision and Recall indicate that the model achieved an excellent balance between high precision (few false positives) and high recall (few missed detections), which is crucial for an anomaly detection system.

Based on our analysis, we attribute the steady performance improvement primarily to the following two reasons: (1) Learning Diverse Anomalous Patterns: more abnormal samples mean the model is exposed to a wider variety of anomalous behavior patterns (such as various attack vectors, crawler strategies, etc.), avoiding overfitting or incomplete understanding caused by an insufficient number of anomalous samples. (2) Mitigating Class Imbalance: traditional machine learning models are prone to bias towards the majority class (normal sessions) on highly imbalanced data. Although our model already performed well in Scenario 1, increasing the number of abnormal samples fundamentally reduced the learning difficulty, enabling the model to effectively distinguish between the two classes.

4.3.2. Performance Comparison of Semantic Features

The following experiments continue to use Scenario 5 and compare the proposed model against common baseline models in Web anomaly detection, including CNN [27], CNN-GRU [15], and RNN [28]. The experimental results are presented in Table 5.

As shown in Table 5, this experiment compared the performance of the proposed multi-modal fusion model (our method) with deep learning baseline models based on single semantic features under Scenario 5, where abnormal sessions account for 28.57% of all sessions (a relatively balanced data distribution). The results demonstrate that our method achieved the best performance on all three metrics (Precision, Recall, and F1-score), outperforming all comparative models and supporting its effectiveness.

Our model achieved a Precision of 99.72%, a Recall of 99.73%, and an F1-Score of 99.72%. This indicates that in Scenario 5, the model exhibits extremely low false negative and false positive rates, almost perfectly distinguishing between normal and abnormal sessions. Compared to the best baseline model (CNN, 95.80%), our model improved the F1-score by nearly 4 percentage points in absolute terms, representing a highly significant enhancement.
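The "nearly 4 percentage points" figure can be reproduced directly from Table 5. The sketch below also reports the same gain as a relative reduction in F1 error (100 minus F1), a derived quantity that does not appear in the paper and is shown only to distinguish absolute from relative improvement. The function name is hypothetical.

```python
# F1-scores (%) under Scenario 5, copied from Table 5.
F1_TABLE5 = {"CNN": 95.80, "CNN-GRU": 95.13, "RNN": 94.69, "Our Method": 99.72}


def gain_over_best_baseline(scores: dict[str, float], ours: str = "Our Method"):
    """Return (best baseline name, absolute gain in points, relative error reduction in %)."""
    baselines = {name: value for name, value in scores.items() if name != ours}
    best = max(baselines, key=baselines.get)
    absolute_gain = scores[ours] - baselines[best]
    # Share of the best baseline's remaining F1 error that the gain removes.
    relative_error_reduction = 100.0 * absolute_gain / (100.0 - baselines[best])
    return best, absolute_gain, relative_error_reduction


if __name__ == "__main__":
    best, gain, reduction = gain_over_best_baseline(F1_TABLE5)
    print(f"best baseline: {best}, gain: {gain:.2f} points, "
          f"error reduction: {reduction:.1f}%")
```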

Analysis of the baselines shows that the CNN model performed best (F1-Score: 95.80%). CNNs excel at capturing local spatial features (e.g., key characters in requests, n-gram patterns), which is highly effective for identifying abnormal requests with fixed attack payloads. Consequently, it achieved relatively high Precision (94.34%) and Recall (97.31%). The RNN model achieved the highest Recall (97.46%) but the lowest Precision (92.07%). RNNs and their variants (e.g., GRU) are designed for sequence modeling, enabling a better understanding of the sequential context of requests. This allows them to detect more complex attacks hidden within long sequences (high Recall), but their sensitivity also leads to more false positives (low Precision). The CNN-GRU hybrid model fell short of expectations. In theory, it combines the strengths of CNN (local feature extraction) and GRU (sequential dependency modeling), yet its performance in this experiment was slightly inferior to that of the pure CNN model. This may stem from overfitting due to increased model complexity, or from a failure to find the optimal hyperparameter configuration, which prevented its advantages from being fully realized.

The experimental results indicate that, unlike the baseline models in the table, which use only single semantic features (e.g., URL text), the method proposed in this paper fuses behavioral and semantic modalities. In a balanced data scenario, the features from these two modalities complement and corroborate each other, resulting in high decision confidence for the model. Furthermore, the improved SimHash and multi-head attention Transformer architecture employed by the model outperforms traditional CNN or RNN models in feature representation, enabling the learning of more robust and discriminative feature representations.

4.3.3. Performance Comparison of Behavioral Features

This experiment compared the performance of the proposed multi-modal fusion model (our method) against a series of deep learning baseline models based on sequential features under Scenario 5, in which abnormal samples account for 28.57% of all sessions. The experimental results are presented in Table 6. They again show that our method achieved the best performance on all three metrics (Precision, Recall, and F1-Score, each at least 99.72%), significantly outperforming all comparative models that rely solely on sequential features.

It is noteworthy that in the experimental results of Table 6, the RNN model emerged as the top performer among the baseline models in this specific scenario, which diverges from common expectations.

RNN (F1-Score: 92.39%): Its recall rate (96.23%) was the highest among all baselines, indicating that the RNN architecture is particularly effective at capturing anomalies with an extremely low missed detection rate. However, its precision (91.71%) was relatively low, meaning it has a higher false positive rate and tends to misclassify some normal sequences as abnormal.

CNN-GRU (F1-Score: 91.25%): As a hybrid model, its performance fell between that of CNN and RNN but failed to surpass the pure RNN model. This may suggest that, under this specific scenario and data distribution, the sequential modeling benefits brought by the GRU module were insufficient to compensate for the increased model complexity.

CNN (F1-Score: 90.52%): The pure CNN model exhibited the worst performance in both precision and recall among the baselines. This confirms CNN’s inherent limitation in handling long-range sequential dependencies. It may struggle to effectively understand complex attack patterns spanning extended time periods, resulting in both missed anomalies (low recall) and numerous false alarms (low precision).

The comparative experimental results reveal that analyzing request sequences alone is insufficient to distinguish between “behaviorally similar but intentionally different” requests. For example, a high-frequency access sequence to ‘/api/data’ could originate from either malicious crawlers or normal front-end page auto-loading activities. Single-modality sequence models cannot effectively differentiate such cases, leading to either high false positives (as with RNN) or high false negatives (as with CNN).

Our model only concludes an anomaly when both modalities yield abnormal judgments. This cross-verification mechanism significantly suppresses false positives (addressing RNN’s weakness) while simultaneously reducing missed detections through multidimensional information integration (solving CNN’s problem).
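The cross-verification rule stated above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each modality yields an anomaly score in [0, 1] and that a fixed threshold is applied to each; the class name, field names, and thresholds are hypothetical. The actual MSFFusion model fuses features end to end rather than thresholding two separate scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionScores:
    """Hypothetical per-modality anomaly scores in [0, 1] for one session."""
    behavior: float  # from the access-sequence (behavioral) view
    semantic: float  # from the request-content (semantic) view


def is_anomalous(scores: SessionScores,
                 behavior_threshold: float = 0.5,
                 semantic_threshold: float = 0.5) -> bool:
    """Flag a session only when both views judge it abnormal (logical AND)."""
    return (scores.behavior >= behavior_threshold
            and scores.semantic >= semantic_threshold)


if __name__ == "__main__":
    # High-frequency access to '/api/data' looks unusual as a sequence in both cases.
    auto_loading_page = SessionScores(behavior=0.9, semantic=0.1)
    malicious_crawler = SessionScores(behavior=0.9, semantic=0.8)
    print(is_anomalous(auto_loading_page))  # False
    print(is_anomalous(malicious_crawler))  # True
```

In this toy setting, the benign auto-loading session is not flagged because the semantic view disagrees, which is the false-positive suppression described in the text.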

Table 5 shows the experimental results based on semantic features in Scenario 5, while Table 6 shows the experimental results based on behavioral features. Comparing the two tables, the CNN achieves an F1-score of 95.80% with semantic features but only 90.52% with sequential features. The main reason is that semantic features typically better reflect the user’s actual intentions and deeper content information, and thus have higher discriminatory power in anomaly detection. In contrast, behavioral features mainly describe user operation patterns and action sequences. Although they can capture some abnormal behaviors, they may be limited by the diversity and similarity of behaviors.

4.3.4. Ablation Study

During model training, this paper employed three different loss functions to optimize the model’s learning effectiveness. To analyze the contribution of each loss function to the final results, an ablation study was conducted under the conditions of Scenario 5, and the results are shown in Table 7. In Table 7, L1 denotes the model without the modal discrepancy minimization loss, L2 denotes the model without the shared space loss, and L3 denotes the model trained with all the loss functions together. L3 achieves the best performance on all metrics, which indicates that the three loss functions complement one another and that their integration benefits the final detection performance. L3 improves substantially over L1 (F1-Score: 99.72% vs. 95.74%), validating the importance of the modal discrepancy loss, which reduces the differences between the features of the two modalities and achieves better feature fusion. L3 also outperforms L2 (F1-Score: 99.72% vs. 96.22%), indicating that introducing the discriminative loss for the shared two-modal space has a positive effect on guiding the learning of features from the fused behavior and semantic modalities, thereby helping to extract more discriminative feature representations.

4.3.5. Efficiency Analysis

It is noteworthy that compared to the single-modality baseline models (e.g., CNN, RNN) evaluated in this study, the proposed MSFFusion multi-modal fusion model required significantly longer training time, reaching 21.75 h, due to its more complex architecture. This is primarily attributed to the model’s inherent higher complexity: it must process two heterogeneous data modalities (semantic and behavioral) in parallel for feature extraction, and the employed improved SimHash algorithm, multi-head self-attention mechanism, and the cooperative optimization of multiple loss functions collectively introduce substantial computational overhead.

However, during the inference phase, the average detection time per session for our model was merely 0.65 ms, significantly outperforming the 0.89 ms reported by Ref. [29] and 5.1 ms by Ref. [28]. This is because, once trained, the model’s forward propagation process is solidified into a static computation graph. The complex feature fusion and decision-making process is completed efficiently in a single forward pass, requiring no iteration or post-processing.
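Per-session latency figures of this kind are usually obtained by timing many forward passes and averaging. The sketch below shows one way to do so and converts a latency into sequential throughput; `dummy_predict` is a hypothetical stand-in for the trained model, and the measurement protocol is an assumption, not the authors' procedure.

```python
import time
from typing import Callable, Sequence


def mean_latency_ms(predict: Callable[[str], bool],
                    sessions: Sequence[str],
                    warmup: int = 10) -> float:
    """Average wall-clock time per session, in milliseconds."""
    for session in sessions[:warmup]:  # warm-up calls are not timed
        predict(session)
    start = time.perf_counter()
    for session in sessions:
        predict(session)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / len(sessions)


def sessions_per_second(latency_ms: float) -> float:
    """Throughput of one sequential worker at the given per-session latency."""
    return 1000.0 / latency_ms


def dummy_predict(session: str) -> bool:
    """Hypothetical placeholder for the model's forward pass."""
    return len(session) % 2 == 0


if __name__ == "__main__":
    sample = ["GET /index.html"] * 1000
    print(f"measured: {mean_latency_ms(dummy_predict, sample):.4f} ms per session")
    for latency in (0.65, 0.89, 5.1):
        print(f"{latency} ms -> {sessions_per_second(latency):.0f} sessions/s")
```

At 0.65 ms per session, a single sequential worker would process roughly 1538 sessions per second, assuming no batching or parallelism.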

This characteristic of being “costly to train yet efficient to infer” indicates that the MSFFusion model is particularly suitable for real-world application scenarios that demand real-time detection but do not require frequent model retraining (e.g., Web Application Firewalls, WAFs). Once deployed, the trained model can handle massive concurrent requests at a speed comparable to other models while delivering high-precision anomaly detection. A key focus of future work will be to further optimize training efficiency, for instance, through techniques like knowledge distillation or model pruning, to reduce model complexity and training costs without compromising performance.

5. Conclusions

This paper proposes a multi-modal fusion-based approach for web anomaly detection using web traffic logs. The method extracts user access features from both semantic and behavioral perspectives and achieves more accurate detection through feature fusion. An improved SimHash algorithm is adopted to handle variable-length access sequences, enhancing the representational capability of user behavior patterns. Meanwhile, a Transformer-based semantic encoder is introduced to extract request-level semantic information, and effective fusion of dual-modal features is realized through a CNN architecture and multi-loss function design.

The experimental results demonstrate that as the proportion of abnormal samples increased from approximately 4.76% (Scenario 1) to 28.57% (Scenario 5), the model’s precision, recall, and F1-Score showed a clear overall upward trend, indicating that the model performs robustly across varying ratios of anomalous sessions. Furthermore, the multi-modal feature fusion strategy significantly enhanced detection performance, markedly outperforming single-modality approaches. Under Scenario 5, whether compared with baselines based on semantic features (Table 5) or behavioral features (Table 6), the MSFFusion model integrating dual-modal information exceeded 99.7% on all key metrics, significantly outperforming baseline models that relied solely on a single modality (CNN, CNN-GRU, RNN). These results support the core hypothesis that semantic and behavioral features provide complementary information. Through the fusion and cross-validation mechanism, the model effectively reduced the false positives and false negatives typical of single-modality models. Moreover, the proposed multi-modal loss function proved effective, with its components demonstrating complementarity. Although the multi-modal structure resulted in a longer training time (21.75 h), its inference efficiency was high, requiring only 0.65 milliseconds per session, which is faster than the results reported in existing studies. This indicates that the model has strong potential for practical application.

Although the model performs excellently in offline testing, it still faces challenges in computational efficiency for real-time detection and high-concurrency environments in practical deployment. Future work will focus on the following aspects: First, optimizing the model structure by investigating lightweight feature extraction and fusion mechanisms to improve online inference speed; second, exploring incremental learning mechanisms to enable the model to adapt to dynamically changing web access patterns; and third, further investigating the interpretability of multi-modal fusion to enhance the credibility and operability of detection results.

Author Contributions

The authors declare their individual contributions to this paper as follows: L.W. was responsible for conceiving and designing the study as well as collecting the data. M.X. performed analysis and interpretation of the results and drafted the initial manuscript. F.Q., Y.L., F.H., and J.X. modified the final manuscript. All authors subsequently reviewed the findings, results and approved the final manuscript prior to submission. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by a grant from the National Natural Science Foundation of China (no. 12105303, no. 12505226).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The author would like to extend sincere gratitude to the methodologies that have significantly contributed to and facilitated this research.

Conflicts of Interest

Author Jiahong Xu was employed by the State Grid Tianjin Electric Power Company Dongli Power Supply Branch. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

XSS	Cross-Site Scripting
Web	World Wide Web
XGBoost	eXtreme Gradient Boosting
EDoS	Economic Denial of Sustainability
OWASP	Open Web Application Security Project
API	Application Programming Interface
GMM	Gaussian Mixture Models
BGMM	Bayesian Gaussian Mixture Models
LSTM	Long Short-Term Memory
WebLearner	Web-based Learner
LR	Logistic Regression
CNN	Convolutional Neural Network
URLs	Uniform Resource Locators
MSE	Mean Squared Error
MK-MMD	Multi-Kernel Maximum Mean Discrepancy
MAE	Mean Absolute Error
MACCDC	Mid-Atlantic Collegiate Cyber Defense Competition
CSNS	China Spallation Neutron Source

References

Kruegel, C.; Vigna, G. Anomaly detection of web-based attacks. In CCS ’03, Proceedings of the 10th ACM Conference on Computer and Communications Security, Washington, DC, USA, 27–31 October 2003; Association for Computing Machinery: New York, NY, USA, 2003; pp. 251–261.
Kruegel, C.; Vigna, G.; Robertson, W. A multi-model approach to the detection of web-based attacks. Comput. Netw. 2005, 48, 717–738.
Robertson, W.; Vigna, G.; Kruegel, C.; Kemmerer, R.A. Using Generalization and Characterization Techniques in the Anomaly-Based Detection of Web Attacks. NDSS. 2006. Available online: https://api.semanticscholar.org/CorpusID:8266221 (accessed on 14 September 2025).
Luo, C.; Tan, Z.; Min, G.; Gan, J.; Shi, W.; Tian, Z. A Novel Web Attack Detection System for Internet of Things via Ensemble Classification. IEEE Trans. Ind. Inform. 2021, 17, 5810–5818.
Laiq, F.; Al-Obeidat, F.; Amin, A.; Moreira, F. DDoS Attack Detection in Edge-IIoT using Ensemble Learning. In Proceedings of the 2023 7th Cyber Security in Networking Conference (CSNet), Montreal, QC, Canada, 16–18 October 2023; pp. 204–207.
Xie, Y.; Tang, S. Online anomaly detection based on web usage mining. In Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops PhD Forum, Shanghai, China, 21–25 May 2012; pp. 1177–1182.
Zhang, Z.; Sun, R.; Wang, X.; Zhao, C. A situational analytic method for user behavior pattern in multimedia social networks. IEEE Trans. Big Data 2019, 5, 520–528.
Hong, L. Based on the user behavior characteristics of mining database anomaly detection model design. In Proceedings of the 2013 6th International Conference on Information Management, Innovation Management and Industrial Engineering, Xi’an, China, 23–24 November 2013; Volume 2, pp. 619–622.
Zhou, Y.; Wang, Y.; Ma, X. A user behavior anomaly detection approach based on sequence mining over data streams. In Proceedings of the 2016 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), Guangzhou, China, 16–18 December 2016; pp. 376–381.
Prinakaa, S.; Bavanika, V.; Sanjana, S.; Srinivasan, S.; Sarasvathi, V. A Real-Time Approach to Detecting API Abuses Based on Behavioral Patterns. In Proceedings of the 2024 8th International Conference on Cryptography, Security and Privacy (CSP), Osaka, Japan, 20–24 April 2024; pp. 24–28.
Hu, S.; Xiao, Z.; Rao, Q.; Liao, R. An anomaly detection model of user behavior based on similarity clustering. In Proceedings of the 2018 IEEE 4th Information Technology and Mechatronics Engineering Conference (ITOEC), Chongqing, China, 14–16 December 2018; pp. 835–838.
Gao, Y.; Ma, Y.; Li, D. Anomaly detection of malicious users’ behaviors for web applications based on web logs. In Proceedings of the 2017 IEEE 17th International Conference on Communication Technology (ICCT), Chengdu, China, 27–30 October 2017; pp. 1352–1355.
Paul, M.; Medhe, K. Using Machine Learning to Detect Anomalies in Internet Browsing Pattern of Users. In Proceedings of the 5th International Conference on Cyber Security & Privacy in Communication Networks (ICCS) 2019, Kurukshetra, India, 29–30 November 2019; Available online: https://ssrn.com/abstract=3511054 (accessed on 14 September 2025).
Xiao, R.; Su, J.; Du, X.; Jiang, J.; Lin, X.; Lin, L. Sfad: Toward effective anomaly detection based on session feature similarity. Knowl. Based Syst. 2019, 165, 149–156. Available online: https://api.semanticscholar.org/CorpusID:58005338 (accessed on 1 February 2019).
Modell, A.; Larson, J.; Turcotte, M.; Bertiger, A. A graph embedding approach to user behavior anomaly detection. In Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), Orlando, FL, USA, 15–18 December 2021; pp. 2650–2655.
Ma, H.; Wang, C.; Qi, H. Anomaly Behavior Detection for the Web Application Based on LSTM. In Proceedings of the 2021 IEEE Conference on Telecommunications, Optics and Computer Science (TOCS), Shenyang, China, 10–11 December 2021; pp. 553–559.
Gui, J.; Chen, Z.; Yu, X.; Lumezanu, C.; Chen, H. Anomaly detection on web-user behaviors through deep learning. In Security and Privacy in Communication Networks; Springer International Publishing: Cham, Switzerland, 2020; Available online: https://api.semanticscholar.org/CorpusID:229182996 (accessed on 12 December 2020).
Ma, C.-S.; Du, X.-R.; Lou, J.; Wang, M.-Q. A User Behavior Prediction Method for Web Applications Based on Deep Forest. J. Web Eng. 2025, 24, 39–56.
Perumal, S.; Sujatha, P.K. Stacking ensemble-based XSS attack detection strategy using classification algorithms. In Proceedings of the 2021 6th International Conference on Communication and Electronics Systems (ICCES), Coimbatore, India, 8–10 July 2021; pp. 897–901.
Le, H.; Pham, Q.; Sahoo, D.; Hoi, S.C. Urlnet: Learning a url representation with deep learning for malicious url detection. arXiv 2018, arXiv:1802.03162. Available online: https://api.semanticscholar.org/CorpusID:3670018 (accessed on 9 February 2018).
Li, H.; Ni, Y. Intrusion detection technology research based on apriori algorithm. Phys. Procedia 2012, 24, 1615–1620.
Zolotukhin, M.; Hämäläinen, T.; Kokkonen, T.; Siltanen, J. Analysis of http requests for anomaly detection of web attacks. In Proceedings of the 2014 IEEE 12th International Conference on Dependable, Autonomic and Secure Computing, Dalian, China, 24–27 August 2014; pp. 406–411.
Rathi, P.; Singh, N. An improved medoid clustering algorithm for intrusion detection using web usage mining technique. J. Posit. Sch. Psychol. 2022, 6, 10095–10108.
Zhen, L.; Hu, P.; Wang, X.; Peng, D. Deep supervised cross-modal retrieval. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 10386–10395.
Zeng, D.; Oyama, K. Learning joint embedding for cross-modal retrieval. In Proceedings of the 2019 International Conference on Data Mining Workshops (ICDMW), Beijing, China, 8–11 November 2019; pp. 1070–1071.
Yang, T.; Li, M.; Deng, H.; Wang, J. A sentence-bert-based model for expressing key features of hospital web logs. In Proceedings of the 2023 4th International Seminar on Artificial Intelligence, Networking and Information Technology (AINIT), Nanjing, China, 16–18 June 2023; pp. 467–470.
Ahir, D.; Shaikh, N. Analyzing Machine Learning Frameworks for Anomaly Detection on Web Server Log Data. In Proceedings of the 2025 International Conference on Emerging Smart Computing and Informatics (ESCI), Pune, India, 5–7 March 2025; pp. 1–7.
Wang, J.; Zhou, Z.; Chen, J. Evaluating cnn and lstm for web attack detection. In Proceedings of the 2018 10th International Conference on Machine Learning and Computing, Macau, China, 26–28 February 2018; ACM: New York, NY, USA, 2018; pp. 283–287.
Mac, H.; Truong, D.; Nguyen, L.; Nguyen, H.; Tran, H.A.; Tran, D. Detecting attacks on web applications using autoencoder. In Proceedings of the Ninth International Symposium on Information and Communication Technology, Da Nang, Vietnam, 6–7 December 2018; pp. 416–421.

Figure 1. Transformer model.

Figure 2. MSFFusion model.

Figure 3. Evaluation indicators of models in different scenarios.

Table 1. MACCDC2012 dataset field description.

Parameters	Type	Description
ts	Time	Request timestamp
uid	String	Unique connection ID that identifies the session
id	Record	A class that includes source host/source port and destination host/destination port numbers
trans_depth	Count	Length of the transmitted content
method	String	Request method: GET, POST, etc.
host	String	Service address in the request.
uri	String	Uniform Resource Identifier in the request
referrer	Count	The content of Referrer in the header
user_agent	String	User-agent in the web request header, representing the client content requesting the web service
request_body_len	Count	The actual size of uncompressed data transmitted from the client
response_body_len	Count	The actual size of uncompressed data transmitted from the server
status_code	Count	Status code returned from the server, usually including codes like 404, 302, etc.
status_msg	String	Message returned from the server; for example, the message for 404 indicates that the requested page was not found
info_code	Count	The server’s last response code, generally an 1xx informational response code
info_msg	String	The message from the server corresponding to the last 1xx response code
tags	Set	Indicators for various discovered attributes
username	Vector	If basic authentication is performed on the request, this is the username
password	Vector	If basic authentication is performed on the request, this is the password

Table 2. Experiment Sessions dataset.

Scenario	Abnormal Sessions	Normal Sessions
Scenario 1	5000	100,000
Scenario 2	10,000	100,000
Scenario 3	20,000	100,000
Scenario 4	30,000	100,000
Scenario 5	40,000	100,000

Table 3. Experiment parameters of MSFFusion Model.

Parameters	Value
Learning rate	0.01
Batch size	256
EPOCH	100

Table 4. Evaluation indicators of the MSFFusion model in different scenarios.

Scenario	Precision (%)	Recall (%)	F1-Score (%)
Scenario 1	90.48	95.12	92.75
Scenario 2	95.65	93.22	93.92
Scenario 3	95.15	93.27	93.68
Scenario 4	98.77	98.75	98.73
Scenario 5	99.72	99.73	99.72

Table 5. Performance comparison of semantic features.

Method	Precision (%)	Recall (%)	F1-Score (%)
CNN	94.34	97.31	95.80
CNN-GRU	93.31	97.01	95.13
RNN	92.07	97.46	94.69
Our Method	99.72	99.73	99.72

Table 6. Performance comparison of Behavioral features.

Method	Precision (%)	Recall (%)	F1-Score (%)
CNN	89.26	94.03	90.52
CNN-GRU	90.74	94.96	91.25
RNN	91.71	96.23	92.39
Our Method	99.72	99.73	99.72

Table 7. Ablation study results under Scenario 5.

Method	Precision (%)	Recall (%)	F1-Score (%)
L1	95.81	96.09	95.74
L2	96.94	96.53	96.22
L3	99.72	99.73	99.72

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
