Dark Web Traffic Classification Based on Spatial–Temporal Feature Fusion and Attention Mechanism

Li, Junwei; Pan, Zhisong

doi:10.3390/computers14070248


by Junwei Li ¹,² and Zhisong Pan ²,*

¹ Institute of Computer and Information Engineering, Xinxiang University, Xinxiang 453003, China

² Institute of Command Control Engineering, Army Engineering University, Nanjing 210007, China

* Author to whom correspondence should be addressed.

Computers 2025, 14(7), 248; https://doi.org/10.3390/computers14070248

Submission received: 12 May 2025 / Revised: 19 June 2025 / Accepted: 23 June 2025 / Published: 25 June 2025


Abstract

Research on classifying dark web traffic is limited, and the results of existing traffic classification methods on such traffic are unsatisfactory. To improve the prediction accuracy and classification precision of dark web traffic, a classification method (CLA) based on spatial–temporal feature fusion and an attention mechanism is proposed. When processing raw bytes, a CNN and an LSTM are combined to extract local spatial–temporal features from raw data packets, while an attention module is introduced to process key spatial–temporal data. The experimental results show that the model can effectively extract and use the spatial–temporal features of traffic data and use the attention mechanism to measure the importance of different features, thereby accurately predicting different types of dark web traffic. In comparative experiments, the accuracy, recall, and F1 score of the model are higher than those of the traditional methods it was compared with.

Keywords: network traffic classification; deep learning; spatial–temporal feature; dark web traffic; attention mechanism

1. Introduction

With the rapid development of the Internet, the dark web has become an important platform for illegal activities such as drug trafficking, arms trading, and personal information theft, posing a significant threat to social security [1]. However, due to the high anonymity and concealment of the dark web, traditional network monitoring and tracking technologies struggle to be effective, especially existing network traffic classification methods, which have a low accuracy when identifying and classifying dark web traffic.

Existing traffic classification methods mainly include port-based methods, deep packet inspection methods, statistical feature-based methods, machine learning-based methods, and a small number of deep learning-based methods [2]. The port-based method classifies network traffic through port mapping; it is simple and easy to implement but has become largely ineffective with the popularity of dynamic port numbers. The deep packet inspection method matches traffic against pre-extracted fingerprint features, which gives good classification results for unencrypted traffic but struggles with encrypted traffic, especially dark web traffic. The statistical feature-based method, starting from statistical theory, applies payload randomness tests and similar checks to traffic, but it is only suitable for simple traffic classification and cannot adapt to more complex traffic. The machine learning-based method has improved the performance of encrypted traffic classification but still relies heavily on manual feature selection, and the resulting algorithms lack universality and generalization.

Although some studies apply deep learning to traffic classification, they typically target only certain dimensions of the traffic and achieve limited performance improvements [3].

To improve the accuracy of dark web traffic prediction and classification and to lay the foundation for subsequent network management and security, this paper proposes a dark web traffic classification method (CLA) based on spatial–temporal feature fusion and an attention mechanism. When processing raw bytes, the method combines a CNN and an LSTM to extract local spatial–temporal features from raw data packets and introduces an attention mechanism to distinguish key spatial–temporal data.

Therefore, researching dark web traffic classification technology can further strengthen the identification of illegal online activities, effectively strengthen network management, and ensure network security and social stability.

The structure of this paper is as follows: Section 2 introduces relevant methods and technologies, Section 3 describes the experimental setup, Section 4 analyzes the experimental results, Section 5 discusses the existing problems and future research directions, and Section 6 summarizes the paper and draws conclusions.

2. Methods

2.1. Definition and Characteristics

According to the hierarchical division of the Internet’s architecture, network resources can be categorized into the Surface Web and the Deep Web. The Surface Web is indexed by traditional search engines (such as Google and Baidu) and relies on static web pages and open hyperlink mechanisms, constrained by the Robots protocol [4,5,6]. In contrast, the Deep Web requires dynamic interaction (such as database queries and permission authentication) to access, with an information storage capacity hundreds of times that of the Surface Web and a superior quality and thematic concentration in specialized fields. However, within the Deep Web, a portion of the network space is accessible only through encrypted protocols (such as Tor and I2P), which have anonymous communication and anti-censorship characteristics [7]. This is the dark web. Due to the inability of standard search engines and conventional web browsing methods to access dark web content, it can only be accessed through special software, configurations, or authorizations [8,9,10]. The dark web is difficult to detect by conventional network monitoring methods. These characteristics make the dark web a breeding ground for criminal and illegal activities, while also posing significant challenges to network security. Figure 1 shows the dark web Tor routing diagram.

2.2. Main Applications and Threats of the Dark Web

Various illegal activities occur on the dark web, including but not limited to drug trafficking, arms dealing, human trafficking, and malware dissemination [11,12,13]. These activities pose serious threats to national security, social stability, and personal privacy. Additionally, the dark web is frequently used for planning and executing cyberattack activities, such as DDoS attacks and phishing attacks [14]. Therefore, conducting classification research on dark web traffic to more effectively identify and block these illegal activities is an important topic in the field of cybersecurity.

2.3. Traditional Traffic Classification Technology

Traditional traffic classification technology mainly relies on basic information such as port numbers, IP addresses, and protocol types for classification. In a literature review, Li Junwei summarized the principles of traffic classification methods based on port numbers, deep packet inspection, statistical features, machine learning, and deep learning, and compared their main practices, advantages and disadvantages, applicable scenarios, and classification performance [15]. The review focused particularly on deep learning, providing a detailed comparison of mainstream convolutional neural networks such as 1D-CNN, 2D-CNN, and 3D-CNN, as well as methods like LSTM, from the perspectives of algorithms, datasets, and performance results. It serves as a useful entry point for beginners in the field of traffic classification.

However, in the dark web environment, the effectiveness of these methods is significantly reduced because dark web traffic typically uses encrypted communication and dynamic ports, making port- and IP-address-based classification methods difficult to apply. Additionally, dark web traffic often transmits through anonymous networks like Tor, further increasing the difficulty of traffic classification.

2.4. Traffic Classification Technology Based on Machine Learning

With the development of machine learning technology, its application in traffic classification has gradually increased. Researchers have begun to use machine learning algorithms such as Support Vector Machines (SVMs) and Random Forests to classify dark web traffic. These methods can improve classification accuracy to some extent by analyzing the statistical features and behavioral patterns of traffic. Kangseok Kim devised an anomaly detection technique based on hybrid unsupervised learning for a network intrusion detection system (NIDS), which proved highly effective in establishing objective thresholds for identifying anomalies, thereby enhancing the system’s reliability in detecting cyberattacks [16]. The model combines deep learning autoencoders (AEs) with traditional machine learning algorithms and was evaluated on the CICIDS2017 dataset. Compared to a single AE or traditional algorithms alone, both accuracy and recall were significantly improved. However, only accuracy and recall were reported, without comprehensive metrics such as the F1 score, which prevents a full evaluation of the model’s performance.

However, machine learning methods usually require extensive manual feature engineering, and their effectiveness in classifying complex and changing dark web traffic is still limited.

2.5. Traffic Classification Technology Based on Deep Learning

The rise of deep learning technology has brought new possibilities for dark web traffic classification. Deep learning models, especially Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), can automatically learn and extract complex features from traffic data, thereby achieving more precise traffic classification. In recent years, researchers have proposed various deep learning-based dark web traffic classification models, which have yielded good experimental results. Zhang Long proposed a network traffic classification framework that uses NetFlow data with a deep neural network (DNN) to build an end-to-end model that automatically learns traffic features; applied to two real-world datasets, it obtained an average classification accuracy of 95% [17]. The specific names of the datasets were not stated. Its classification accuracy significantly outperformed traditional classifiers such as SVM, random forest, and K-nearest neighbors, demonstrating the advantages of deep learning in feature extraction and classification efficiency.

Shi Dong proposed a network anomaly traffic detection algorithm (A-CAVE) based on a variational autoencoder (VAE), which combines the network input with a hidden vector in the decoder, enabling more precise and efficient classification results [18]. Its core idea is to learn the latent representation of normal traffic through the VAE and identify abnormal traffic using reconstruction errors. The dataset was not explicitly specified. Comparison with traditional methods (such as rule-based detection or SVM) showed that the VAE performed better in complex attack scenarios.

Mansoor G. Al-Thani proposed the Traffic Transformer, which employs a multi-head attention mechanism in lieu of the widely used recurrent architecture [19]. However, it has a high computational complexity, relies on large-scale data and computing power, and has a limited ability to model the spatial dynamic characteristics of traffic. Silu He proposed a new hypothesis: GDTi behaves macroscopically as a transmitting causal relationship (TCR) underlying traffic flow, which remains stable under dynamically changing traffic flow [20]. However, this approach struggles to dynamically capture the time-varying characteristics of complex interactions in traffic flow, and the model requires high-quality data, which may limit its generalization ability in scenarios with long-distance dependencies or noise. Suha Rabbani proposed a contrastive SSL model for stress assessment using ECG signals based on the SimCLR framework [21]. However, it failed to achieve end-to-end traffic classification, resulting in limitations in complex noise scenarios and cross-individual generalization capabilities.

2.6. The Logical Relationships of All Traffic Classification Methods

As of now, numerous studies on traffic classification have been conducted both domestically and internationally, focusing on different classification objects and methods and outputting classification results based on different application requirements. According to the chronological order and adopted technologies, traffic classification methods can be divided into three stages. First, flow classification based on traditional methods, including classification based on port numbers, deep packet inspection, statistical features, and host behavior characteristics. Second, traffic classification based on machine learning and online learning. Third, traffic classification based on deep learning, mostly using deep structures such as convolutional neural networks and recurrent neural networks. The overall research progress and content are shown in Figure 2.

The above traditional, machine learning-based, and deep learning-based traffic classification methods have achieved good classification performance in their respective application domains. However, these existing models are inadequate for dark web traffic because they do not fully capture spatial–temporal features. Traditional traffic classification methods based on port numbers or deep packet inspection can only be used in specific domains and generalize poorly. The classification performance of machine learning-based methods relies mainly on manual feature selection, which in turn depends on expert experience, making end-to-end traffic classification difficult. Deep learning-based methods produce many false positives on encrypted traffic because they cannot effectively combine the spatial–temporal features of the data. Some deep learning methods for network traffic classification either combine two of CNN, RNN, and LSTM or pair one of them with an attention mechanism. Although these models can handle large-scale traffic data and outperform traditional and machine learning methods in classification accuracy and efficiency, their performance metrics remain below those of the proposed CLA method, and they are inadequate for the prediction and classification of dark web traffic.

On the basis of these methods, this paper proposes a deep learning-based dark web traffic prediction and classification method called CLA. In summary, CLA considers integrating the temporal and spatial characteristics of dark web traffic, uses CNN and LSTM for the extraction of spatial–temporal features, realizes end-to-end dark web traffic prediction and classification, and greatly improves indicators such as classification efficiency and accuracy.

3. Experimental Setup

3.1. Data Preprocessing

Before building a dark web traffic classification model, it is necessary to collect, clean, and preprocess dark web traffic data [22,23,24]. Dark web traffic data collection is achieved through network sniffers and honeypots. The collected data often contains a large amount of noise and irrelevant information, so data cleaning is required to remove invalid data and abnormal traffic [25,26,27]. After data cleaning, the traffic data still needs to be standardized to meet the requirements of model input [28].

3.2. Dataset Construction

To build an effective dataset, we collect traffic information from multiple dark web sites, perform targeted data labeling, then clean the data to eliminate noise, and finally standardize the data to ensure the reliability and consistency of model training [29,30].

In a specially constructed dark web virtual environment, network traffic data generated in real time is captured using Wireshark 7.0.1. Professional annotation tools such as Labelme are used for bounding box annotation or entity recognition. After annotation, mutual review and verification are conducted to ensure annotation consistency. Finally, a dataset in pcap format is obtained. Moving forward, we will continue to expand the boundaries of our research, especially in the field of classifying dark web traffic, and we look forward to achieving more groundbreaking results.

The dataset raises no ethical or privacy concerns. To protect privacy and data security, all data in this paper were anonymized before being used for network traffic analysis and anomaly detection research.

Let the original traffic data be D and the noise data be N, then the cleaned data D_clean can be expressed as Formula (1), as follows:

D_{clean} = D - N

(1)

After normalizing the data, assuming a certain feature in the original data is x, with its maximum value as x_max and minimum value as x_min, then the normalized feature x′ is Formula (2), as follows:

x' = \frac{x - x_{min}}{x_{max} - x_{min}}

(2)
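As a minimal sketch of Formulas (1) and (2), the following Python snippet drops samples flagged as noise and then rescales a feature to the range [0, 1]. The function names, the noise predicate, and the handling of a constant feature (mapped to 0.0) are assumptions made for illustration; the paper does not specify them.

```python
from typing import Callable, List, Sequence


def remove_noise(samples: Sequence[float],
                 is_noise: Callable[[float], bool]) -> List[float]:
    """Formula (1): D_clean = D - N, i.e. keep samples not flagged as noise."""
    return [s for s in samples if not is_noise(s)]


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Formula (2): x' = (x - x_min) / (x_max - x_min)."""
    if not values:
        return []
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        # Assumption: a constant feature carries no information, so map it to 0.0.
        return [0.0 for _ in values]
    return [(x - x_min) / span for x in values]


if __name__ == "__main__":
    raw = [0, 64, 128, 255, -1]  # -1 stands in for an invalid reading (hypothetical)
    clean = remove_noise(raw, lambda v: v < 0)
    print(min_max_normalize(clean))
```

Guarding against a zero range matters in practice, since a byte position that is identical across all samples would otherwise cause a division by zero.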

3.3. Model Architecture Design

For the problem of dark web traffic classification, this paper proposes a model architecture based on deep learning—the CLA model. This model fully considers the spatial–temporal characteristics of dark web traffic, integrating a CNN and LSTM to form a new model architecture. Specifically, the CNN part considers the spatial characteristics of the traffic data, extracting local features from the data. Subsequently, the LSTM part focuses on the temporal characteristics of the traffic data, capturing the time series of the data. The CLA model is shown in Figure 3.

Meanwhile, to capture the global dependencies between byte data in packets, we introduce a multi-head self-attention module to mine the global dependencies within network flows and assign greater weights to critical data positions. This enables a better capture of complex spatial–temporal representations and enhances local spatial–temporal features. For input X_c, the multi-head attention module can be expressed as Formulas (3)–(6).

\mathrm{Attention}(Q_{i}, K_{i}, V_{i}) = \mathrm{Softmax}\left(\frac{Q_{i} K_{i}^{T}}{\sqrt{D_{K}}}\right) V_{i}

(3)

\mathrm{Softmax}(X_{ij}) = \frac{\exp(X_{ij})}{\sum_{k} \exp(X_{kj})}

(4)

O_{msa} = \mathrm{Multihead}(Q, K, V) = \mathrm{concat}(\mathrm{head}_{1}, \mathrm{head}_{2}, \dots, \mathrm{head}_{H}) W^{msa}

(5)

\mathrm{head}_{i} = \mathrm{Attention}(Q_{i}, K_{i}, V_{i}) = \mathrm{Attention}(X_{c} W_{Q_{i}}, X_{c} W_{K_{i}}, X_{c} W_{V_{i}})

(6)

Here, W_{Q_i} ∈ R^{M_b × D_K}, W_{K_i} ∈ R^{M_b × D_K}, W_{V_i} ∈ R^{M_b × D_v}, and W^{msa} ∈ R^{D_v × D_v} are parameter matrices, and Softmax(·) is a function that normalizes the elements column-wise. O_{msa} ∈ R^{H N_p × D_v} is the output of the multi-head self-attention module, and H represents the number of heads.
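The attention computation in Formulas (3), (5), and (6) can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the helper names are hypothetical, the heads are stacked along the position axis so that the output has H·N_p rows (one reading of the shapes given above; many implementations concatenate along the feature axis instead), and the softmax is applied over key positions for each query, which is the common convention and may differ from the indexing used in Formula (4).

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (n x m) and an (m x p) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over one list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Formula (3): Softmax(Q K^T / sqrt(D_K)) V."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    # Assumption: weights are normalized over key positions for each query.
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


def multi_head_self_attention(x_c: Matrix, w_q: List[Matrix], w_k: List[Matrix],
                              w_v: List[Matrix], w_msa: Matrix) -> Matrix:
    """Formulas (5)-(6); w_q, w_k, w_v hold one projection matrix per head."""
    stacked: Matrix = []
    for wq_i, wk_i, wv_i in zip(w_q, w_k, w_v):
        head_i = attention(matmul(x_c, wq_i), matmul(x_c, wk_i), matmul(x_c, wv_i))
        stacked.extend(head_i)  # assumed: heads stacked along the position axis
    return matmul(stacked, w_msa)


if __name__ == "__main__":
    eye = [[1.0, 0.0], [0.0, 1.0]]
    x = [[1.0, 0.0], [0.0, 1.0]]  # two positions, two features (toy input)
    print(multi_head_self_attention(x, [eye, eye], [eye, eye], [eye, eye], eye))
```

Each row of attention weights sums to 1, so positions with larger scores contribute more to the output, which is the sense in which the module assigns greater weights to critical data positions.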

Spatial and temporal features contain rich data information, but their contributions to dark web traffic prediction and classification are significantly different. Therefore, the model introduces an attention mechanism to effectively integrate and utilize these two types of features. In the spatial feature extraction module, a spatial attention module is used to dynamically weight multi-scale features, strengthen key spatial dimension information, and suppress irrelevant feature interference. In the temporal feature processing stage, the attention mechanism adaptively captures the temporal dependencies between data packets, assigns higher weights to important time nodes, and enhances the model’s perception ability of dynamic behavior patterns. This attention mechanism design achieves fine-grained feature selection and weight distribution, significantly enhancing the model’s feature recognition effectiveness in dark web traffic prediction and classification.

The convolutional neural network model consists of the following eight trainable layers: five convolutional layers and three fully connected layers, as well as three non-trainable pooling layers; the LSTM has three layers. Input represents the inputs to the neural network, which are grayscale images generated from encrypted traffic. Output represents the outputs of the neural network, which are the category labels of encrypted data. The network uses the rectified linear unit function (ReLU) for activation, with a dropout probability of 0.5 for neuron deactivation, the maximum pooling method, and the multi-head attention mechanism.
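One plausible way to produce the grayscale input is sketched below. The paper does not state the image dimensions or layout, so the 32 × 32 row-major arrangement, the zero-padding of short flows, and the function name are all assumptions; the 1024-byte length matches the sample length reported in the experimental section.

```python
from typing import List


def flow_to_grayscale(payload: bytes, side: int = 32) -> List[List[int]]:
    """Truncate or zero-pad a flow to side*side bytes and reshape it row by row.

    Each byte (0-255) becomes one pixel intensity. The 32 x 32 layout is an
    assumption for illustration, not a detail taken from the paper.
    """
    size = side * side
    fixed = payload[:size].ljust(size, b"\x00")
    return [list(fixed[r * side:(r + 1) * side]) for r in range(side)]


if __name__ == "__main__":
    image = flow_to_grayscale(b"\x16\x03\x01\x02\x00")  # hypothetical payload bytes
    print(len(image), len(image[0]), image[0][:6])
```

Fixing the input size this way lets flows of different lengths share one convolutional input shape.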

Through the full integration of the CNN and LSTM, the CLA model can consider both the temporal and spatial features of traffic data, effectively improving the accuracy of traffic data prediction and classification. Specifically, the model first performs a convolution operation on the traffic data through the CNN layer, extracting the local feature maps of the traffic.

Then, these feature maps are input into the LSTM layer, where the LSTM layer, through memory units and gating mechanisms, captures the time series features of the traffic data. Finally, the output of the LSTM layer is connected to a fully connected layer for classification decision making.

3.4. Model Training and Optimization

During model training, a cross-entropy loss function is specifically introduced, and the model is further optimized using the Adam optimizer. The cross-entropy loss function is given by Formula (7), as follows:

L(y, \hat{y}) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_{i} \log(\hat{y}_{i}) + (1 - y_{i}) \log(1 - \hat{y}_{i}) \right]

(7)
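Formula (7) can be computed directly, as the following sketch shows. The clipping constant `eps` and the function name are assumptions added to avoid taking the logarithm of zero; note that Formula (7) is the binary form of the loss, and a multi-class setup would typically use a categorical variant that the paper does not show.

```python
import math
from typing import Sequence


def binary_cross_entropy(y_true: Sequence[float], y_pred: Sequence[float],
                         eps: float = 1e-12) -> float:
    """Formula (7): mean of -[y log(p) + (1 - y) log(1 - p)] over N samples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite (assumed safeguard)
        total += y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return -total / len(y_true)


if __name__ == "__main__":
    labels = [1, 0, 1, 1]            # hypothetical true labels
    probs = [0.9, 0.2, 0.7, 0.6]     # hypothetical predicted probabilities
    print(binary_cross_entropy(labels, probs))
```

The loss approaches zero as predicted probabilities approach the true labels and grows without bound for confident wrong predictions.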

To make full use of the dataset and avoid the impact of specific data points on the results, this paper adopts five-fold cross-validation, conducting five rounds in each iteration. The dataset is randomly divided into five equal parts, with each part serving as testing data in turn in each round, while the remaining parts of the dataset serve as training data.
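The five-fold procedure can be sketched as an index generator. The paper does not say whether the split is stratified by class or how it is seeded, so this version assumes a plain shuffled split with a fixed seed; the function name is hypothetical.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 5,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of the k rounds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # random division of the dataset
    folds = [indices[i::k] for i in range(k)]     # k parts whose sizes differ by at most 1
    for i in range(k):
        test_idx = folds[i]                       # each part serves as test data in turn
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


if __name__ == "__main__":
    for round_no, (train, test) in enumerate(k_fold_indices(10), start=1):
        print(round_no, sorted(test))
```

Every sample appears in the test set exactly once across the five rounds, so no single subset dominates the reported result.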

To verify whether the network is overfitted, a test is conducted after each epoch of training. An epoch refers to training all training data once. After 10 epochs, both training accuracy and testing accuracy are stabilized. During the 20-epoch training process, the network does not show signs of overfitting. The testing accuracy maintains a high level in the later stages and slowly rises to stability.

Using the cross-entropy loss function, the results predicted by CLA are compared with the true labels of the traffic data, and the model parameters are adjusted based on the differences. The Adam optimizer is used to speed up convergence. Additionally, to further enhance the generalization ability of CLA, regularization is introduced to prevent overfitting and data augmentation is used to generate more samples for training. Furthermore, the training process is optimized by adjusting hyperparameters such as the learning rate and batch size.

4. Results

4.1. Experimental Environment and Dataset

To verify the effectiveness of the proposed model, experiments were conducted on a high-specification server, using a dataset containing various types of dark web activities. The dataset included different types of dark web traffic such as drug transactions, weapon sales, and malicious software dissemination. To protect privacy and data security, all data were anonymized. Specifically, the experimental environment was configured as follows: Intel Xeon E5-2620v4 processor, NVIDIA Tesla P100 GPU, 256 GB memory. The dataset contained 100,000 flow samples, each with a length of 1024 bytes, comprising 50% normal traffic and 50% malicious traffic. The dataset was obtained through collaboration with relevant research institutions and was rigorously filtered and preprocessed to ensure its representativeness and balance.

4.2. Experimental Procedures

For data preprocessing and cleaning, network sniffers and honeypot technologies were used to collect dark web traffic data, and invalid data and abnormal traffic were removed. For data standardization, the cleaned traffic data was standardized to meet the requirements of model input. For model construction, a convolutional neural network layer was built to extract local features from the traffic data, a long short-term memory network layer was built to capture its time series features, and the output of the LSTM layer was connected to a fully connected layer for classification decisions.

For model training, the deviation between the results predicted by CLA and the true labels of the traffic data was measured using the cross-entropy loss function, and the Adam optimizer was used to minimize it. For data augmentation, more training samples were generated by randomly transforming the original data to increase the training dataset size. Regularization terms were added to the loss function to prevent model overfitting.

Model evaluation was performed as follows:

Accuracy Evaluation: We calculated the accuracy of the model in different types of dark web traffic classification tasks.

Efficiency Evaluation: We evaluated the processing speed and robustness of the model when handling large-scale traffic data.

Ablation Study: We removed key components of the model one by one to assess the impact of each component on the model’s performance.

Comparative Study: We compared the proposed model with traditional traffic classification methods and machine learning-based traffic classification methods.

4.3. Experimental Results of Traffic Classification

The experimental results show that the proposed deep learning-based model performed excellently in dark web traffic classification tasks. The model’s classification accuracy for different types of dark web traffic reached over 90%, especially achieving a 93% and 95% accuracy in identifying drug trafficking and arms dealing, respectively. The experimental results also show the volume of different types of dark web traffic data in the dataset. Malware Spread was the highest, reaching three stars; both Drug Trafficking and Arms Dealing reached two stars; Human Trafficking was the lowest, with only one star; and the remaining Other Illegal Activities also accounted for two stars.

These results indicate that deep learning models can effectively capture complex features in dark web traffic, achieving high-precision classification. Specific experimental results are shown in Table 1.

It can be seen that when performing traffic identification and classification on existing dark web traffic datasets, the results mainly fall into the following five categories: Drug Trafficking, Arms Dealing, Malware Spread, Human Trafficking, and Other Illegal Activities. By visualizing the accuracy rates and data volumes of these categories, we find that Drug Trafficking is the easiest to identify, but not the most abundant in terms of data volume; conversely, Malware Spread has the largest data volume, but its accuracy rate is only moderate, as shown in Figure 4. Since there is no direct correlation between accuracy rates and data volumes, this, to some extent, reflects the reliability of the dark web traffic classification and identification algorithms.

Figure 5 shows the test results of the proposed method, which are the average values of 10 experiments rounded to integers.

4.4. Experimental Results of Contrast Experiment

We designed comparative experiments to compare the deep learning model with other traffic classification methods (such as Support Vector Machines, Random Forests, CNN, and LSTM). The experimental results show that the deep learning model had significant advantages in terms of classification accuracy and efficiency. Notably, the LSTM model demonstrated strong capabilities in processing time series data, making it suitable for dark web traffic, which has time-related characteristics.

When evaluating model performance, we used accuracy, recall, and F1 score as metrics. Given the numbers of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN), these metrics can be expressed as Formulas (8)–(11).

\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}

(8)

\mathrm{Recall} = \frac{TP}{TP + FN}

(9)

F1 = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}

(10)

Among them, precision is shown in Formula (11), as follows:

\mathrm{Precision} = \frac{TP}{TP + FP}

(11)
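Formulas (8)–(11) translate into a few lines of Python. The counts in the usage example are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention that the paper does not address.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute accuracy, recall, precision, and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0           # Formula (8)
    recall = tp / (tp + fn) if (tp + fn) else 0.0            # Formula (9)
    precision = tp / (tp + fp) if (tp + fp) else 0.0         # Formula (11)
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0    # Formula (10)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}


if __name__ == "__main__":
    # Hypothetical counts for one traffic class treated as the positive class.
    print(classification_metrics(tp=8, fp=2, tn=6, fn=4))
```

For a multi-class task such as the five traffic categories here, these counts are usually computed per class by treating that class as positive and all others as negative.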

In the four sets of comparative experiments designed, the accuracy, recall, and F1 of the Support Vector Machines and Random Forest algorithms based on machine learning in classifying dark web traffic are much lower than those of the traffic classification methods based on deep learning.

Among the three dark web traffic classification methods based on deep learning, CLA performs better than CNN and LSTM in terms of accuracy, recall, and F1. Figure 6 compares classification accuracy, Figure 7 compares recall, and Figure 8 compares the F1 score; in each case, the CLA model scores significantly higher than the other traffic classification methods.

4.5. Model Performance Evaluation

Table 2 compares five algorithms, namely SVM, Random Forest (RM), CNN, LSTM, and CLA, in terms of their usage scenarios, accuracy, recall, and F1 score for dark web traffic classification.

Apart from accuracy, the model also demonstrates a good performance in processing efficiency and robustness. When handling large-scale traffic data, the model can maintain a fast processing speed, meeting the requirements of real-time monitoring. Additionally, the model has a strong resistance to interference from noisy data and abnormal traffic, allowing it to operate stably in different network environments.

Table 3 presents a comparison and analysis of SVM, RM, CNN, LSTM, and CLA in terms of feature engineering requirements, performance, generalization, and real-time applicability, in addition to the following three classification performance metrics: Accuracy, Recall, and F1. SVM and RM are based on machine learning algorithms and require the most feature engineering. However, due to these features being specifically extracted for certain algorithms, their generalization performance is poor, and their performance is not superior to other methods. CNN, LSTM, and CLA can all achieve end-to-end traffic classification. Since they do not rely on manual feature selection, their feature engineering requirements are significantly less than SVM and RM, and their performance and generalization are better than SVM and RM. When analyzing real-time applicability, SVM and RM are better. CLA is slightly worse due to the additional attention mechanism, while CNN and LSTM are in the middle.

The above assessment shows that existing traffic classification methods have been studied only to a limited extent on dark web traffic and that their classification results are not very satisfactory. To improve the prediction accuracy and classification precision for dark web traffic, a classification method (CLA) based on spatial–temporal feature fusion and an attention mechanism was proposed. When processing raw bytes, a combination of a CNN and LSTM extracts local spatial–temporal features from raw data packets, while an attention module processes the key spatial–temporal data. The experimental results show that this model can effectively extract and utilize the spatial–temporal features of traffic data and use the attention mechanism to measure the importance of different features, thereby achieving accurate prediction of different types of dark web traffic. In comparative experiments, the accuracy, recall, and F1 score of this model are higher than those of the traditional methods.

Owing to its more comprehensive selection of spatial–temporal features and the targeted training enabled by the attention mechanism, CLA can effectively overcome the weaknesses of the other algorithms while maintaining superior performance.

Furthermore, comparative experiments and ablation experiments were designed and conducted, further proving the advanced performance of the CLA model in predicting and classifying dark web traffic.

5. Discussion

5.1. Research Summary

A deep learning-based dark web traffic classification model, CLA, was proposed. By combining a CNN and LSTM, the model can effectively extract and utilize the spatial and temporal features of traffic data, achieving high-precision classification. The experimental results show that the model performs well across the dark web traffic classification tasks and identifies illegal activities with high accuracy. Additionally, the model demonstrates good processing efficiency and robustness, meeting the requirements of real-time monitoring.
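To make the role of the attention step more concrete, the sketch below shows one common way of weighting a sequence of feature vectors (for example, per-time-step LSTM outputs) and pooling them into a single context vector. It is not the authors' implementation: dot-product scoring against a query vector is an assumed form of attention, and the names `softmax`, `attention_pool`, `hidden_states`, and `query` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Weight each time step by its similarity to `query` and sum.

    hidden_states: list of T feature vectors (assumed LSTM outputs).
    query: vector of the same dimension (assumed to be learned in a
           real model; fixed here for illustration).
    Returns (weights, context) where the weights sum to 1.
    """
    # Dot-product score per time step (an assumed scoring function).
    scores = [sum(h * q for h, q in zip(state, query)) for state in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Context vector: attention-weighted sum of the time-step features.
    context = [
        sum(w * state[d] for w, state in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return weights, context


if __name__ == "__main__":
    # Three toy time steps with two features each.
    states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    weights, context = attention_pool(states, query=[2.0, 0.0])
    print([round(w, 3) for w in weights], [round(c, 3) for c in context])
```

In this form, time steps whose features align with the query receive larger weights, which is the sense in which an attention module can "measure the importance of different features" before the final classification layer.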

CLA differs from the related Deep Packet and DeepFlow work in the following respects:

(1) Deep Packet: the best-known work is “Deep Packet: A Novel Approach for Encrypted Traffic Classification Using Deep Learning” [31], published in 2020 by Lotfollahi et al. in the journal Soft Computing. Its main network structures are a stacked autoencoder (SAE) and a CNN, and it does not incorporate attention mechanisms.
(2) DeepFlow: a frequently cited paper is “Network-Centric Distributed Tracing with DeepFlow: Troubleshooting Your Microservices in Zero Code” [32], co-authored by Professor Yin Xia’s team from the Department of Computer Science and Technology at Tsinghua University and the DeepFlow team at Yunshan Networks, and published in the SIGCOMM 2023 conference proceedings. The system provides zero instrumentation, full-stack coverage, and universal tagging for observability, greatly reducing the complexity of implementing observability in cloud-native applications.

In summary, the architectures adopted by these two works differ significantly from the CNN + LSTM + Attention model proposed in this paper, which, to some extent, supports the distinctiveness of CLA. Both works are nevertheless instructive, so they are discussed in the research status section and included in the reference list.

5.2. Research Prospects

Future research can consider introducing more types of traffic features, such as application layer protocol features and user behavior features, to further improve classification performance. At the same time, exploring the lightweight design of the model for deployment on resource-constrained devices is also an important direction for future research. Additionally, as dark web technologies continue to evolve, traffic classification models also need to be updated and optimized to address new challenges and threats. Therefore, future research should continuously pay attention to the development dynamics of dark web technologies, adjusting and optimizing the classification model in a timely manner to ensure its effectiveness and adaptability.

5.3. Future Research Directions

5.3.1. Introducing More Types of Traffic Features

To further improve classification performance, future research can consider introducing more types of traffic features. For example, application layer protocol features can provide information about the type of traffic application and communication content, while user behavior features can reflect the activity patterns and habits of users in the network. By combining these multi-dimensional traffic features, the model can more comprehensively capture the complex features of dark web traffic, improving the accuracy and robustness of classification.

5.3.2. Lightweight Design of the Model

With the development of the IoT and edge computing, more network devices need to possess the ability to classify and monitor traffic. However, these devices are often resource-constrained and cannot run complex deep learning models. Therefore, exploring lightweight design techniques such as model compression and pruning to deploy the model on resource-constrained devices is an important direction for future research.
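As a small illustration of what pruning means in practice, the sketch below zeroes out the smallest-magnitude weights of a layer. The paper names pruning only as a future direction and does not specify a method; unstructured magnitude pruning is an assumption chosen for simplicity, and the function name `magnitude_prune` and its `sparsity` parameter are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with smallest |value|.

    weights: flat list of floats (one layer's parameters, flattened).
    sparsity: fraction in [0, 1] of weights to remove.
    Assumed technique: unstructured magnitude pruning.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")

    n_drop = int(len(weights) * sparsity)
    if n_drop == 0:
        return list(weights)

    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(order[:n_drop])
    return [0.0 if i in dropped else w for i, w in enumerate(weights)]


if __name__ == "__main__":
    layer = [0.5, -0.01, 0.2, -0.9]
    print(magnitude_prune(layer, sparsity=0.5))
```

A pruned model would normally be fine-tuned afterwards and re-evaluated on accuracy, recall, and F1, since the acceptable sparsity level depends on how much classification performance can be traded for a smaller footprint.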

5.3.3. Addressing the Continuous Evolution of Dark Web Technologies

Dark web technologies are constantly evolving, with new anonymous network protocols and encryption technologies continuously emerging, posing new challenges to traffic classification. To address these challenges, future research needs to continuously monitor the development of dark web technologies and adjust and optimize the classification model in a timely manner. For example, new de-anonymization and traffic analysis techniques can be studied to address the encryption and anonymity challenges of dark web traffic. Additionally, emerging technologies such as federated learning and multimodal learning can be explored to improve the adaptability and generalization ability of traffic classification models.
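The federated learning direction can be illustrated with the aggregation step at its core: several sites train on their own traffic and share only model parameters, which a coordinator averages in proportion to each site's sample count. The sketch below shows this weighted averaging (the aggregation used in the FedAvg algorithm) under simplifying assumptions; the function name `federated_average` and the toy numbers are hypothetical, and the paper does not describe such a setup.

```python
def federated_average(client_weights, client_sizes):
    """Sample-weighted average of client parameter vectors.

    client_weights: list of flat parameter lists, one per client.
    client_sizes: number of local training samples per client.
    Illustrative only; a real system would also handle communication,
    client selection, and privacy protections.
    """
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(client_weights[0])
    return [
        sum(w[d] * n for w, n in zip(client_weights, client_sizes)) / total
        for d in range(dim)
    ]


if __name__ == "__main__":
    # Two hypothetical monitoring sites with different data volumes.
    site_a = [1.0, 2.0]
    site_b = [3.0, 4.0]
    print(federated_average([site_a, site_b], client_sizes=[1, 3]))
```

Weighting by sample count means that a site observing more traffic has more influence on the shared model, while raw packets never leave the site that captured them.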

6. Conclusions

This study explored dark web traffic classification using deep learning models, and the experimental results show promising performance. Traditional approaches to classifying web traffic are inadequate in the dark web context and fail to detect and track illicit activities efficiently, whereas advances in deep learning have opened new avenues for dark web traffic classification. This paper presents a novel deep learning model, CLA, that fuses spatial–temporal features to extract and leverage the spatial and temporal characteristics of traffic data, enabling precise classification. The experimental outcomes demonstrate the model’s strong performance in diverse dark web traffic classification tasks, with accuracy rates of 93% and 95% for identifying drug trafficking and arms sales, respectively. Moreover, in comparative studies, the model outperforms traditional methods in terms of accuracy, recall, and F1 score. Given the evolving nature of dark web technology, future investigations can incorporate adaptive learning strategies to enhance the model’s resilience and classification efficacy.

A further literature review indicates that research on traffic classification models based on machine learning methods such as SVM and RM has largely stalled. Studies based on CNNs mostly rely on converting traffic data into images, while those based on LSTM focus more on the temporal characteristics of traffic. Classification models that combine a CNN and LSTM without an attention mechanism perform slightly worse in comparison. We will continue to follow the latest developments in dark web traffic classification in our future work to ensure the fairness of model comparisons and the reliability of the research findings.

In the future, we plan to further optimize the model structure, introduce more data augmentation techniques, improve classification performance, and apply it to actual cybersecurity monitoring. Research on deep learning-based dark web traffic classification provides a new perspective and method for the cybersecurity field. By building efficient classification models, we can effectively identify and combat illegal activities on the dark web, ensuring cybersecurity and social stability. Future research should continue to explore new traffic features and model architectures to improve classification performance, while paying attention to the development dynamics of dark web technologies, and adjusting and optimizing the classification models in a timely manner to ensure their effectiveness and adaptability.

Author Contributions

J.L.: writing—original draft, visualization, methodology, investigation, formal analysis, conceptualization. Z.P.: supervision, visualization, investigation, suggestions, project administration. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China grant No. 61473149 and No. 62076251 and the Henan Province Young Core Teacher Development Program Project No. 2023GGJS161.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All correspondence and requests for materials should be addressed to Zhisong Pan.

Acknowledgments

The authors express their gratitude to Yanna Li and Jing Lu for their assistance with the experiments and their valuable discussions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Behdadnia, T.; Deconinck, G.; Ozkan, C.; Singelee, D. Encrypted Traffic Classification for Early-Stage Anomaly Detection in Power Grid Communication Network. In Proceedings of the 2023 IEEE PES Innovative Smart Grid Technologies Europe (ISGT EUROPE), Grenoble, France, 23–26 October 2023; pp. 1–6.
2. Adelipour, S.; Haeri, M. Privacy-Preserving Model Predictive Control Using Secure Multi-Party Computation. In Proceedings of the 2023 31st International Conference on Electrical Engineering (ICEE), Tehran, Iran, 9–11 May 2023; pp. 915–919.
3. Ishizawa, R.; Sato, H.; Takadama, K. From Multipoint Search to Multiarea Search: Novelty-Based Multi-Objectivization for Unbounded Search Space Optimization. In Proceedings of the 2024 IEEE Congress on Evolutionary Computation (CEC), Yokohama, Japan, 30 June–5 July 2024; pp. 1–8.
4. Yan, J.; Chen, J.; Zhou, Y.; Wu, Z.; Lu, L. An Uncertain Graph Method Based on Node Random Response to Preserve Link Privacy of Social Networks. KSII Trans. Internet Inf. Syst. 2024, 18, 147–169.
5. Peng, Y.; Liu, Q.; Tian, Y.; Wu, J.; Wang, T.; Peng, T.; Wang, G. Dynamic Searchable Symmetric Encryption with Forward and Backward Privacy. In Proceedings of the 2021 IEEE 20th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Shenyang, China, 20–22 October 2021; pp. 420–427.
6. Kabanov, I.S.; Yunusov, R.R.; Kurochkin, Y.V.; Fedorov, A.K. Practical cryptographic strategies in the post-quantum era. AIP Conf. Proc. 2018, 1936, 020021.
7. Guo, C.; Katz, J.; Wang, X.; Yu, Y. Efficient and Secure Multiparty Computation from Fixed-Key Block Ciphers. In Proceedings of the 2020 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 18–20 May 2020; pp. 825–841.
8. Zhao, J.; Jing, X.; Yan, Z.; Pedrycz, W. Network traffic classification for data fusion: A survey. Inf. Fusion 2021, 72, 22–47.
9. Xiang, B.; Zhang, J.; Deng, Y.; Dai, Y.; Feng, D. Fast Blind Rotation for Bootstrapping FHEs. In Advances in Cryptology—CRYPTO 2023; Handschuh, H., Lysyanskaya, A., Eds.; Springer: Berlin/Heidelberg, Germany, 2023.
10. Jain, A.; Jin, Z. Non-Interactive Zero Knowledge from Sub-exponential DDH. In Advances in Cryptology—EUROCRYPT 2021; Canteaut, A., Standaert, F.X., Eds.; Springer: Berlin/Heidelberg, Germany, 2021.
11. Gordon, S.D.; Starin, D.; Yerukhimovich, A. The More the Merrier: Reducing the Cost of Large-Scale MPC. In Advances in Cryptology—EUROCRYPT 2021, Lecture Notes in Computer Science; Canteaut, A., Standaert, F.X., Eds.; Springer: Cham, Switzerland, 2021; Volume 12697.
12. Kaur, P.; Kumar, N.; Singh, M. Biometric cryptosystems: A comprehensive survey. Multimed. Tools Appl. 2022, 82, 16635–16690.
13. Li, X.; Xu, L.; Zhang, H.; Xu, Q. Differential Privacy Preservation for Graph Auto-Encoders. Neurocomputing 2023, 521, 113–125.
14. Shiraly, D.; Eslami, Z.; Pakniat, N. Hierarchical Identity-Based Authenticated Encryption with Keyword Search over encrypted cloud data. J. Cloud Comput. 2024, 13, 112.
15. Li, J.; Pan, Z. Network Traffic Classification Based on Deep Learning. KSII Trans. Internet Inf. Syst. 2020, 14, 4246–4267.
16. Kim, K. An Effective Anomaly Detection Approach based on Hybrid Unsupervised Learning Technologies in NIDS. KSII Trans. Internet Inf. Syst. 2024, 18, 494–510.
17. Long, Z.; Jinsong, W. Network Traffic Classification Based on a Deep Learning Approach Using NetFlow Data. Comput. J. 2023, 66, 1882–1892.
18. Dong, S.; Su, H.; Liu, Y. A-CAVE: Network abnormal traffic detection algorithm based on variational autoencoder. ICT Express 2023, 9, 896–902.
19. Al-Thani, M.G.; Sheng, Z.; Cao, Y.; Yang, Y. Traffic Transformer: Transformer-based framework for temporal traffic accident prediction. AIMS Math. 2024, 9, 12610–12629.
20. He, S.; Luo, Q.; Du, R.; Zhao, L.; He, G.; Fu, H.; Li, H. STGC-GNNs: A GNN-based traffic prediction framework with a spatial–temporal Granger causality graph. Phys. A Stat. Mech. Its Appl. 2023, 623, 128913.
21. Rabbani, S.; Khan, N. Contrastive Self-Supervised Learning for Stress Detection from ECG Data. Bioengineering 2022, 9, 374.
22. NIST Standardization Team. CRYSTALS-Kyber: A Post-Quantum Public-Key Encryption and Key-Establishment Algorithm. NIST Spec. Publ. 2024, 800–802. Available online: https://github.com/srm1071/kyber/ (accessed on 22 June 2025).
23. Sheikh, A.; Singh, K.U.; Jain, A.; Chauhan, J.; Singh, T.; Raja, L. Lightweight Symmetric Key Encryption to Improve the Efficiency and Safety of the IoT. In Proceedings of the 2024 IEEE International Conference on Contemporary Computing and Communications (InC4), Bangalore, India, 15–16 March 2024; pp. 1–5.
24. Joshi, S.; Choudhury, A.; Minu, R.I. Quantum blockchain-enabled exchange protocol model for decentralized systems. Quantum Inf. Process. 2023, 22, 404.
25. Chia, P.H.; Desfontaines, D.; Perera, I.M.; Simmons-Marengo, D.; Li, C.; Day, W.-Y.; Wang, Q.; Guevara, M. KHyperLogLog: Estimating Reidentifiability and Joinability of Large Data at Scale. In Proceedings of the 2019 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 20–22 May 2019; pp. 350–364.
26. Shaw, S.; Dutta, R. Post-quantum secure compact deterministic wallets from isogeny-based signatures with rerandomized keys. Theor. Comput. Sci. 2025, 1035, 115–127.
27. Ge, J.; Shan, T.; Xue, R. Tighter QCCA-Secure KEM with Explicit Rejection. IEEE Trans. Inf. Forensics Secur. 2022, 17, 1789–1802.
28. Gurpur, S. Post-Quantum Cryptography: Preparing for the Quantum Threat. Comput. Fraud. Secur. 2024, 2024, 114–122.
29. Alexandru, A.B.; Pappas, G.J. Secure Multi-party Computation for Cloud-Based Control. Priv. Dyn. Syst. 2019, 16, 179–207.
30. Su, T.; Wang, J.; Hu, W.; Dong, G.; Gwanggil, J. Abnormal traffic detection for internet of things based on an improved residual network. Comput. Mater. Contin. 2024, 79, 4433–4448.
31. Lotfollahi, M.; Jafari Siavoshani, M.; Shirali Hossein Zade, R.; Saberian, M. Deep packet: A novel approach for encrypted traffic classification using deep learning. Soft Comput. 2020, 24, 1999–2012.
32. Shen, J.; Zhang, H.; Xiang, Y.; Shi, X.; Li, X.; Shen, Y.; Zhang, Z.; Wu, Y.; Yin, X.; Wang, J.; et al. Network-Centric Distributed Tracing with DeepFlow: Troubleshooting Your Microservices in Zero Code. ACM SIGCOMM Comput. Commun. Rev. 2023, 53, 1–27.

Figure 1. Tor routing diagram.

Figure 2. Research progress of network traffic classification.

Figure 3. CLA classification model.

Figure 4. Comparison of dark web traffic classification.

Figure 5. Model test results.

Figure 6. Accuracy of various traffic classification algorithms.

Figure 7. Recall of various traffic classification algorithms.

Figure 8. F1 of various traffic classification algorithms.

Table 1. Dark web traffic prediction.

Dark Web Traffic Category	Accuracy	Data Volume (Use ★ to Indicate the Amount of Data)
Drug Trafficking	93%	★★
Arms Dealing	95%	★★
Malware Spread	92%	★★★
Human Trafficking	91%	★
Other Illegal Activities	89%	★★

Table 2. Comparative analysis of experimental results.

Algorithms	Applicable Scenarios	Accuracy Advantage	Recall Advantage	F1 Advantage
SVM	High-dimensional data and limited sample classification	Stability depends on kernel function selection	Low (dependence on kernel function design)	Minimum (high precision but low recall)
RM	Small sample, linear and separable data	Performs well on high-dimensional data with balanced categories	Minimum (requires manual parameter adjustment for improvement)	Low (dependent on parameter optimization)
CNN	Image/spatial feature processing	Top in image classification	Medium (dependent on local feature capture)	Medium (high on image tasks, medium on sequence tasks)
LSTM	Time series data modeling	Higher in temporal tasks	High (capturing long-term dependencies)	Medium (high in temporal tasks, low in spatial tasks)
CLA	Multi-source information fusion	Highest (considering comprehensive spatial–temporal features)	Maximum (joint modeling for false negative reduction)	Overall performance is optimal

Table 3. Summary table of comparison approaches. The number of ★ indicates the strength of each column index.

Algorithms	Feature Engineering Requirements	Performance	Generalization	Real-Time Applicability
SVM	★★★	★	★	★★★
RM	★★★	★	★	★★★
CNN	★	★★	★★★	★★
LSTM	★	★★	★★★	★★
CLA	★	★★★	★★★	★

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
