CLSTM-MT (a Combination of 2-Conv CNN and BiLSTM Under the Mean Teacher Collaborative Learning Framework): Encryption Traffic Classification Based on CLSTM (a Combination of 2-Conv CNN and BiLSTM) and Mean Teacher Collaborative Learning

Qiu, Xiaozong; Yan, Guohua; Yin, Lihua

doi:10.3390/app15095089

by Xiaozong Qiu, Guohua Yan and Lihua Yin *

Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510006, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(9), 5089; https://doi.org/10.3390/app15095089

Submission received: 13 March 2025 / Revised: 10 April 2025 / Accepted: 29 April 2025 / Published: 3 May 2025

Abstract

The identification and classification of network traffic are crucial for maintaining network security, optimizing network management, and ensuring reliable service quality. These functions help prevent malicious activities, such as network attacks and illegal intrusions, while supporting the efficient allocation of network resources and enhancing user experience. However, the widespread use of traffic encryption technology, while improving data transmission security, also obscures the content of traffic, making it challenging to accurately classify and identify encrypted traffic. This limitation hampers both network security maintenance and further improvements in service quality. Therefore, there is an urgent need to develop an efficient and accurate encrypted traffic identification method. This study addresses three key challenges: First, existing methods fail to explore the potential relationship between flow load features and sequence features during feature extraction. Second, there is a need for approaches that can adapt to the diverse characteristics of different protocols, ensuring the accuracy and robustness of encrypted traffic identification. Third, traditional deep learning models need large amounts of labeled data, which are expensive to acquire. To overcome these challenges, we propose an encrypted traffic recognition method based on a CLSTM model (a combination of 2-conv CNN and BiLSTM) and Mean Teacher collaborative learning. This approach detects and integrates traffic load features with sequence features to improve the accuracy and robustness of encrypted traffic identification, while the Mean Teacher consistency constraint on unlabeled data reduces the model’s reliance on labeled data. Experimental results demonstrate that the CLSTM-MT collaborative learning method outperforms traditional methods in encrypted traffic identification and classification, achieving superior performance even with limited labeled data, thus addressing the high cost of data labeling.

Keywords:

encrypted traffic classification; convolutional neural network (CNN); Bidirectional Long Short-Term Memory (BiLSTM); semi-supervised learning; deep learning

1. Introduction

Network traffic has become an indispensable part of modern society. To protect users’ privacy and data security, traffic encryption technology is now widely used in data transmission. However, this also presents many new challenges for network traffic identification. Factors such as the diversity of encryption algorithms [1], data security and privacy requirements, and dynamically changing traffic patterns have increased the difficulty of identifying encrypted traffic.

Nowadays, mainstream encrypted traffic identification methods [2] mainly include feature-based methods, machine-learning-based methods, and deep-learning-based methods. Feature-based methods rely on the manual selection and extraction of explicit features. Although this approach is intuitive and easy to implement, its performance is easily affected by changes in the encryption algorithm, resulting in low recognition accuracy. Machine-learning-based methods can automatically learn the characteristics of traffic from data, which decreases the dependence on specific features and improves the flexibility of the model, but they still face the problems of complex model training and the high expense of data labeling when dealing with encrypted traffic. Deep-learning-based methods, in turn, rely heavily on manually annotated data during model training. In particular, when new encryption protocols or algorithms are encountered, the robustness and adaptability of existing models are often insufficient, and they cannot respond to new situations quickly and effectively. Therefore, there is an urgent need to develop a new method with high detection accuracy and low data annotation cost, as well as strong robustness and adaptability, to accurately identify encrypted application traffic.

The purpose of this study is to propose an encrypted application traffic identification method based on CLSTM (a combination of CNN and BiLSTM) and Mean Teacher collaborative learning, with the following aims.

Improve recognition accuracy: The strong feature extraction capabilities of CNN and BiLSTM, combined with the advantages of Mean Teacher in semi-supervised learning, are leveraged to enhance recognition accuracy.

Enhance robustness and adaptability: The consistency constraint on unlabeled data is used to improve the robustness and accuracy of the model to identify different types of encrypted traffic.

Reduce the cost of traffic labeling: Under the Mean Teacher framework, the overall training cost of the model is reduced by training on a small portion of labeled data together with a large portion of unlabeled data.

To achieve these objectives, we designed the following scheme:

Dataset preparation: Collect and preprocess encrypted traffic data from publicly available datasets.

CLSTM model design: Construct a feature extraction model combining CNN and BiLSTM.

Integration of the Mean Teacher framework: Integrate the Mean Teacher framework into the CLSTM model.

Experimental validation: Experiments are designed to compare the proposed method with existing methods, and its effectiveness is verified by ablation experiments.

Through the above methods, we aim to propose a novel approach that excels in the area of encrypted traffic identification, not only improving the accuracy of traffic recognition but also maintaining good robustness and adaptability while reducing the cost of traffic data labeling.

The rest of this article is organized as follows. Section 2 analyzes and summarizes the related work on encrypted traffic classification. Section 3 introduces the system architecture of CLSTM-MT, the data preprocessing module, and the classification process. Section 4 describes the experimental environment, the datasets, and the evaluation metrics, and then presents the evaluation and analysis of the experimental results. Finally, the article is summarized in Section 5.

2. Related Work

2.1. Rule-Based Methods

Rule-based methods [3] typically rely on the port number or protocol identifier of the traffic to identify the application, such as HTTP traffic on port 80 or HTTPS traffic on port 443. However, these methods are almost ineffective in encrypted traffic identification [4] because encryption protocols obscure port information and protocol flags.

2.2. Statistical-Feature-Based Methods

Statistical-feature-based methods perform classification by analyzing the statistical characteristics of traffic, such as the traffic packet size and time intervals. APPScanner [5] leverages the statistical features of packet sizes to train a random forest classifier. Huang et al. [6] rely entirely on statistical features derived from flow headers and use the K-Nearest Neighbor (KNN) algorithm for classification without depending on port numbers or the application-layer payload content. While these features are relatively apparent in non-encrypted traffic, encrypted data obscure the statistical features, making them less reliable in encrypted traffic. Therefore, this approach also faces challenges in the identification of encrypted traffic.

2.3. Machine-Learning-Based Methods

Machine-learning-based methods [7,8] use algorithms such as Support Vector Machines (SVMs) and random forests to learn traffic features. Earlier researchers used load data and statistical features [9,10], both of which are appropriate for identifying specific scenarios in complex traffic. Hao et al. [11] improved on the traditional SVM method and proposed an SVM network optimization method that calculates the feature weights and parameter values of each individual SVM binary classifier. Despite their good performance in non-encrypted traffic identification, these methods still face challenges in encrypted traffic identification due to the lack of clear feature patterns. Consequently, traditional traffic identification methods often perform poorly in encrypted traffic recognition.

2.4. Deep-Learning-Based Methods

A review of existing traffic identification methods reveals that traditional port- and protocol-based, statistical-feature-based, and machine-learning-based methods have significant limitations in encrypted traffic identification.

Deep-learning-based approaches [12] have already shown significant advantages in traffic classification. Yang et al. [13] first proposed a scheme using a convolutional neural network (CNN) to extract encrypted traffic features, and its classification performance is noticeably better than that of traditional machine learning methods. Huang et al. [14] converted the raw bytes of encrypted flows into images and encoded them using a masked autoencoder (MAE) [15], capturing traffic patterns through reconstruction. Liu et al. [16] mined the deep sequence characteristics of traffic through an in-depth analysis of its latent characteristics and proposed a multi-layer flow codec structure. Yao et al. [17] introduced an attention mechanism while extracting the time series characteristics of traffic through recurrent neural networks. Lin et al. [18] transformed the raw bytes of traffic into text tokens and employed models based on Transformer architectures [19].

Since data encryption does not alter the overall structure of the data flow, network traffic remains a sequence of data with a start and end point even after encryption. Therefore, convolutional neural network (CNN) models remain effective for classifying encrypted traffic. However, encryption can scramble request information in certain parts of the traffic protocol, making it difficult to identify features across different data segments. Thus, this study considers incorporating the recognition of temporal features in encrypted traffic to improve traditional deep learning models. In this domain, recurrent neural networks (RNNs) and their improved versions, such as BiLSTM, have shown better performance. Therefore, this article combines the spatial features extracted by the CNN with the temporal features extracted by the BiLSTM to learn the flow features.

2.5. Semi-Supervised Learning Methods

Additionally, we consider introducing semi-supervised learning methods [20,21,22] to reduce the dependency of deep learning models on labeled datasets. Mean Teacher [23] is a general framework for semi-supervised learning that enhances model robustness and adaptability by leveraging unlabeled data, providing a new solution for encrypted traffic identification. Shi et al. [24] designed a lightweight encrypted traffic classifier based on CNN. By using a semi-supervised learning framework to convert traffic data into grayscale images as input to the model, they improved the accuracy of MT-CNN while using only limited labeled data. Alam et al. [25] combined CNNs with autoencoders to develop unsupervised machine learning approaches for detecting anomalies in network traffic. Although these methods reduce the model’s dependency on labeled data to some extent, they do not consider the potential relationships between traffic payload features and sequence features.

Therefore, this study aims to propose a more effective encrypted traffic recognition method by combining the advantages of CLSTM (a fusion of 2-conv CNN and BiLSTM) and the Mean Teacher framework, addressing the limitations of traditional methods in encrypted traffic identification. Specifically, we first use the CNN and the BiLSTM to automatically learn feature representations from encrypted traffic data and then fuse them. The Mean Teacher framework is employed to strengthen the model’s generalization ability by leveraging a large amount of unlabeled data in addition to the limited labeled data. By detecting the fused traffic load features and sequence features, the accuracy and robustness of encrypted traffic recognition are improved while the model’s reliance on labeled data is reduced.

3. Methodology

3.1. System Architecture of CLSTM-MT

This section explains the system architecture of the CLSTM-MT model. The encrypted traffic classifier structure of the CLSTM-MT model is illustrated in Figure 1. The model is divided into two parts, namely the student model and the teacher model, which share the same network architecture but update their parameters in different ways. The CNN component of each model is composed of six layers: the first convolutional layer, the first pooling layer, the second convolutional layer, the second pooling layer, a flattening layer, and a fully connected Softmax layer. The BiLSTM component of both models uses two stacked bidirectional layers to process the sequential data of the traffic. Its parameters include the input dimension, the hidden layer dimension, and the number of layers. The fully connected layer maps the output to the categories required for the classification task.

The input data for the model consist of four npy feature files, representing the preprocessed traffic data. During training, min-max normalization is applied to map the data values from the range [0, 255] to [0, 1], and the output of the model is the classification result for each sample.

3.2. Data Preprocessing Module

As the first step of traffic classification, data preprocessing is critical to the effectiveness of model training. We processed the raw PCAP traffic data in accordance with the steps in Figure 2:

1. Session Segmentation: The purpose of segmentation is to split continuous PCAP streams into sessions. The two common units of traffic representation are sessions and flows. A session is the set of packets that share the same five-tuple (source IP address, source port, destination IP address, destination port, and protocol), with traffic in both directions included. A flow is very similar to a session but contains traffic in a single direction only, meaning that the source IP/port and destination IP/port cannot be swapped. In this study, continuous PCAP files are segmented by session, converting the incoming traffic files into per-session PCAP-format data. Each session is limited to a maximum of 100,000 packets and 100,000 bytes to facilitate subsequent uniform processing.
2. Feature Extraction: Before extracting features, we anonymize the packets and select the most useful features for traffic identification.
- Packet Anonymization: We anonymize the IP and MAC addresses of every packet in each session to prevent users’ private information from being divulged while preserving the usability of the traffic data.
- Feature Selection: We extract the following features: Sequence Length Feature (Sequence), the length of the sequence; Payload Feature (Payload), the first Byte_Num bytes of the payload from the first Packet_Num packets; and Statistical Feature (Statistic), 26-dimensional statistical features of the traffic (e.g., flag features, packet size features, protocol distribution features, etc.).

We set Packet_Num to 4 and Byte_Num to 256 because the first few packets of network traffic typically contain a large amount of key information, for example, TCP/UDP ports, protocol type, sequence numbers, and flags, which is very useful for traffic type identification. Extracting partial data from the first few packets rather than the entire data stream can significantly reduce the computational load while still retaining sufficient information for classification.
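The session definition and the payload feature described above can be illustrated with a minimal sketch. It assumes packets are already parsed into plain Python values; the names `session_key` and `payload_feature` are hypothetical, and zero-padding of short or missing payloads is an assumption, since the text does not state the padding rule.

```python
from typing import List, Tuple

PACKET_NUM = 4   # packets kept per session (value given in the text)
BYTE_NUM = 256   # payload bytes kept per packet (value given in the text)

Endpoint = Tuple[str, int]


def session_key(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                proto: str) -> Tuple[Endpoint, Endpoint, str]:
    """Direction-independent five-tuple key: both directions map to one session."""
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    return (min(a, b), max(a, b), proto)


def payload_feature(payloads: List[bytes],
                    packet_num: int = PACKET_NUM,
                    byte_num: int = BYTE_NUM) -> List[int]:
    """Fixed-length vector of packet_num * byte_num byte values (0..255).

    Short payloads and missing packets are zero-padded (an assumption).
    """
    feature: List[int] = []
    for i in range(packet_num):
        raw = payloads[i][:byte_num] if i < len(payloads) else b""
        feature.extend(raw)                          # bytes iterate as ints
        feature.extend([0] * (byte_num - len(raw)))  # pad up to byte_num
    return feature


# Both directions of one connection share a key, unlike a unidirectional flow.
key_fwd = session_key("10.0.0.1", 5000, "8.8.8.8", 443, "TCP")
key_rev = session_key("8.8.8.8", 443, "10.0.0.1", 5000, "TCP")

# A toy session with two packets; the remaining two slots are padded.
session = [b"\x16\x03\x01" + b"\xaa" * 300, b"\x17\x03\x03\x00\x10"]
vec = payload_feature(session)
print(key_fwd == key_rev, len(vec))
```

With the values from the text, every session yields 4 × 256 = 1024 payload values regardless of how many packets it actually contains, which is what allows uniform batching later.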

3. Data Normalization: To enable the model to better learn the underlying relationships in the data, we normalize the packet payload data, sequence data, and statistical data to the [0, 1] range using min-max normalization, where X_i is an original value, X_min and X_max are the minimum and maximum values of the feature, and X′_i is the normalized value:

X_{i}^{'} = \frac{X_{i} - X_{\min}}{X_{\max} - X_{\min}}

(1)

4. Feature Encoding: The processed features are converted into numerical form and saved as npy files to facilitate model processing.
5. Dataset Splitting: Finally, the preprocessed data are divided into training and validation sets according to the required proportions.
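As a small worked illustration of the normalization step in Equation (1), the following sketch applies min-max scaling to a list of values. The function name `min_max_normalize` is hypothetical, and returning zeros for a constant feature (where the denominator would be zero) is an assumption not covered by the text.

```python
from typing import List, Optional, Sequence


def min_max_normalize(values: Sequence[float],
                      x_min: Optional[float] = None,
                      x_max: Optional[float] = None) -> List[float]:
    """Scale values to [0, 1] following Equation (1).

    x_min / x_max default to the observed extremes; for raw payload bytes
    the fixed range 0..255 can be passed instead.
    """
    x_min = min(values) if x_min is None else x_min
    x_max = max(values) if x_max is None else x_max
    span = x_max - x_min
    if span == 0:
        # Constant feature: avoid division by zero (assumed convention).
        return [0.0 for _ in values]
    return [(v - x_min) / span for v in values]


payload_bytes = [0, 51, 255]
scaled = min_max_normalize(payload_bytes, 0, 255)
print(scaled)
```

Passing the fixed byte range for payload data keeps the scaling identical across sessions, whereas statistical features would typically use extremes computed on the training set.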

3.3. Model Design

(1): CNN architecture and its role in encryption application traffic identification

Through multi-layer convolution and pooling operations, the CNN gradually reduces the spatial dimensions of the encrypted traffic features while increasing the number of channels, and finally performs Softmax classification through the fully connected layer. It can therefore be used effectively to extract local features of encrypted traffic data. Specifically, the CNN first automatically extracts relevant features, such as the packet load and statistics of the traffic data, through the convolutional layers. The pooling layers then reduce the feature dimension while retaining the most important feature information, and the fully connected layer makes the final classification decision. The specific CNN model is shown in Figure 3:

During the convolution and pooling operations of the CNN, we let x_i ∈ R^k be the k-dimensional vector associated with the ith byte in a session or flow. A session or flow of length n can thus be expressed as an n × k matrix, where each row is the feature vector of one traffic byte. In this way, the entire session or flow is converted into a two-dimensional matrix that can be passed as input to the convolutional layer of the CNN. Specifically, the session or flow can be expressed in the following form:

x = [x_{1}, x_{2}, \dots, x_{n}]

(2)

Let x_{i:i+j} denote the traffic bytes from x_i to x_{i+j}, i.e., x_i, x_{i+1}, …, x_{i+j}. We define a filter w ∈ R^{m×k} that is applied to a window of m bytes in each convolution operation, thereby producing a new feature. For example, the feature c_i is generated by the following equation:

c_{i} = f (w \cdot x_{i : i + m - 1} + α)

(3)

In the above equation, f is an activation function and α denotes a bias term. During feature mapping, each filter is applied to every possible window of traffic bytes {x_{1:m}, x_{2:m+1}, …, x_{n−m+1:n}} to generate the feature map.

c = [c_{1}, c_{2}, \dots, c_{n - m + 1}]

(4)

In the expression above, c ∈ R^{n−m+1}. Subsequently, the feature map undergoes a max-over-time pooling operation in the pooling layer, which takes the maximum value ĉ = max{c} as the feature for that filter. After multiple convolution and pooling stages, a rich set of features is extracted. These features are fed into a Softmax layer for the final output.
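The sliding-window convolution of Equations (3) and (4), followed by max-over-time pooling, can be traced by hand with a small sketch. It assumes ReLU for the activation f (the text does not name it) and uses a single filter; the function names are hypothetical.

```python
from typing import Callable, List

Vector = List[float]


def relu(z: float) -> float:
    return max(0.0, z)


def conv_feature_map(x: List[Vector], w: List[Vector], bias: float,
                     f: Callable[[float], float] = relu) -> List[float]:
    """c_i = f(w . x_{i:i+m-1} + bias) for every window of m consecutive rows.

    x is an n x k matrix (one row per byte), w is an m x k filter.
    The result has n - m + 1 entries, as in Equation (4).
    """
    n, m = len(x), len(w)
    feature_map = []
    for i in range(n - m + 1):
        window = x[i:i + m]
        z = sum(w[r][c] * window[r][c]
                for r in range(m) for c in range(len(w[r])))
        feature_map.append(f(z + bias))
    return feature_map


def max_over_time(c: List[float]) -> float:
    """Pooling step: keep the strongest response of the filter."""
    return max(c)


# n = 5 bytes, k = 1 feature per byte, filter window m = 2.
x = [[0.0], [0.5], [1.0], [0.5], [0.0]]
w = [[1.0], [-1.0]]          # responds to a falling edge
c = conv_feature_map(x, w, bias=0.0)
c_hat = max_over_time(c)
print(c, c_hat)
```

Here the filter produces n − m + 1 = 4 responses, and pooling keeps only the largest one, so the pooled feature does not depend on where in the session the pattern occurs.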

Despite the fact that data encryption does not alter the overall structure of the data flow, making CNN models still applicable for classifying encrypted traffic, the encryption process may disrupt the request information in protocol-specific parts. This makes the identification of features within different segments of the data more challenging, thereby affecting the performance of the CNN model to some extent.

(2): BiLSTM Architecture and Its Role in Encrypted Application Traffic Identification

To address the limitations of convolutional neural networks (CNNs) in processing encrypted traffic, the model needs to be adapted to the unique characteristics of this traffic. The time series characteristics of encrypted traffic are represented by packet length series and arrival time series. Inter-arrival time sequences often exhibit linear trends, which can be considered as the superposition of a horizontal smoothing equation and a trend equation, forming a linear recursive sequence. In contrast, packet length sequences display nodular similarities. Since VPN encryption is achieved by connecting to specific websites and nodes, the packet lengths in sessions exhibit particular patterns, combined with the handshake mechanisms characteristic of encrypted traffic, forming a superposition sequence of horizontal smoothing equations and seasonal smoothing equations.

To leverage the preceding numerical features for predicting future data changes, we consider incorporating Bidirectional Long Short-Term Memory networks (BiLSTMs) to enhance the model. As an advanced form of recurrent neural networks (RNNs), BiLSTMs include memory cells within their network architecture that effectively capture and retain long-term dependencies. This makes BiLSTMs particularly suitable for handling time-series characteristics of encrypted traffic data, automatically learning temporal dependency features within the traffic data.

Specifically, BiLSTMs first read the traffic data sequence and process these data over each time step. At each of these time steps, BiLSTMs not only consider the current data but also integrate information from both preceding and subsequent contexts, thereby providing a more comprehensive understanding of the entire sequence. After multiple layers of processing, the final layer outputs a fixed-length vector that encapsulates the key features of the entire traffic sequence. These feature vectors are sent to a fully connected layer, resulting in a final classification decision.

By doing so, BiLSTMs can better capture the complex patterns and temporal dependencies in encrypted traffic, offering more accurate and robust traffic identification capabilities compared to traditional CNN methods. Additionally, the bidirectional nature of BiLSTMs allows them to simultaneously consider past and future contextual information during analysis, further enhancing the model’s expressiveness and predictive accuracy. The LSTM model architecture is shown in Figure 4:

During training, for a specific time step t, the mini-batch input is expressed as X_t ∈ R^{n×d}, where n is the number of sequence samples and d is the feature dimension at each step. In the BiLSTM architecture, let the forward and backward hidden states be {\vec{H}}_{t} ∈ R^{n×h} and {\overset{\leftarrow}{H}}_{t} ∈ R^{n×h}, respectively, where h is the number of hidden units.

The status update formula of the forward LSTM is shown below:

{\vec{H}}_{t} = φ (X_{t} W_{x h} + {\vec{H}}_{t - 1} W_{h h} + b_{h})

(5)

The status update formula of the backward LSTM is shown below:

{\overset{\leftarrow}{H}}_{t} = φ (X_{t} W_{x h}^{'} + {\overset{\leftarrow}{H}}_{t + 1} W_{h h}^{'} + b_{h}^{'})

(6)

Here, φ represents the activation function. W_xh is the weight matrix from the input to the hidden layer, and W_hh is the weight matrix from the hidden layer to itself; W′_xh and W′_hh are the corresponding weight matrices for the backward direction. b_h and b′_h are the bias terms. The hidden state H_t ∈ R^{n×2h} is the concatenation of the forward hidden state {\vec{H}}_{t} and the backward hidden state {\overset{\leftarrow}{H}}_{t}:

H_{t} = [{\vec{H}}_{t}; {\overset{\leftarrow}{H}}_{t}]

(7)
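The bidirectional recurrence of Equations (5) to (7) can be sketched in plain Python for a single sample (n = 1). Note that this follows the simplified update exactly as written in the equations, with φ assumed to be tanh; it omits the input, forget, and output gates of a full LSTM cell, and all names and weight values are illustrative only.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]
Params = Tuple[Matrix, Matrix, Vector]   # (W_xh, W_hh, b_h)


def step(x: Vector, h_prev: Vector, w_xh: Matrix, w_hh: Matrix,
         b: Vector) -> Vector:
    """phi(x W_xh + h_prev W_hh + b) with phi = tanh, for one sample."""
    out = []
    for j in range(len(b)):
        z = b[j]
        z += sum(x[i] * w_xh[i][j] for i in range(len(x)))
        z += sum(h_prev[i] * w_hh[i][j] for i in range(len(h_prev)))
        out.append(math.tanh(z))
    return out


def bidirectional(xs: List[Vector], fwd: Params, bwd: Params,
                  h: int) -> List[Vector]:
    """Return H_t = [forward_t ; backward_t] for every time step t."""
    steps = len(xs)
    forward: List[Vector] = [[] for _ in range(steps)]
    backward: List[Vector] = [[] for _ in range(steps)]
    state = [0.0] * h
    for t in range(steps):                 # Equation (5): left to right
        state = step(xs[t], state, *fwd)
        forward[t] = state
    state = [0.0] * h
    for t in reversed(range(steps)):       # Equation (6): right to left
        state = step(xs[t], state, *bwd)
        backward[t] = state
    return [forward[t] + backward[t] for t in range(steps)]  # Equation (7)


# d = 1 input feature, h = 2 hidden units, three time steps.
fwd = ([[0.5, -0.5]], [[0.1, 0.0], [0.0, 0.1]], [0.0, 0.0])
bwd = ([[0.3, 0.7]], [[0.2, 0.0], [0.0, 0.2]], [0.0, 0.0])
xs = [[0.2], [0.4], [0.6]]
H = bidirectional(xs, fwd, bwd, h=2)
print(len(H), len(H[0]))
```

Each H_t has 2h entries: the first h summarize the sequence up to step t, the last h summarize it from step t to the end, which is how both preceding and subsequent context reach every position.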

Two layers of BiLSTM are superimposed in our CLSTM-MT model, which can better enhance the learning ability of the network. The output of the first hidden state H_t⁽¹⁾ acts as the input of the second hidden state H_t⁽²⁾.

First Layer BiLSTM Output:

H_{t}^{(1)} = [{\vec{H}}_{t}^{(1)}; {\overset{\leftarrow}{H}}_{t}^{(1)}]

(8)

Second Layer BiLSTM Output:

H_{t}^{(2)} = [{\vec{H}}_{t}^{(2)}; {\overset{\leftarrow}{H}}_{t}^{(2)}]

(9)

Finally, the output layer uses H_t⁽²⁾ from the second layer to compute the output O_t:

O_{t} = {H_{t}}^{(2)} W_{h q} + b_{q}

(10)

Here, W_hq denotes the weight matrix that connects the hidden layer to the output layer, while b_q represents the bias term of the output layer. The variable q indicates the number of output classes. Because our BiLSTM model takes sequences as inputs and produces vectors as outputs, we take the final output O_t of the BiLSTM pathway and concatenate it with the features extracted by the CNN pathway. Ultimately, these combined features are fed into a fully connected layer for classification.

3.4. Mean Teacher Framework Integration

In a traffic classification task, acquiring a substantial amount of labeled data is both labor-intensive and time-consuming. The Mean Teacher framework is a semi-supervised learning approach that enhances model performance through the interaction between a teacher model and a student model. The parameters of the teacher are updated via an exponential moving average (EMA), whereas the student is trained with conventional gradient descent. Because the teacher remains stable, it can effectively guide the learning process of the student.

Specifically, during training, the student is updated according to a composite loss function, which comprises both the classification loss and the consistency loss. The consistency loss quantifies the discrepancy between the teacher and student predictions for the same input data. The objective is to keep the student model’s predictions close to those of the teacher model, thereby stabilizing the training process and enhancing generalization.

Given an input sample x, let f_s(x) denote the prediction generated by the student model, and let f_t(x) denote the prediction generated by the teacher model. The consistency loss L_cons can be defined as

L_{cons} (x) = d (f_{s} (x), f_{t} (x))

(11)

where d( , ) is a distance metric. In our implementation, we used MSE as the distance metric:

L_{cons} (x) = \frac{1}{N} \sum_{i = 1}^{N} {(f_{s} (x)_{i} - f_{t} (x)_{i})}^{2}

(12)

Here, N is the count of output dimensions.
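As a small numerical illustration of the MSE consistency loss, the sketch below compares one student prediction vector with one teacher prediction vector, averaging over the N output dimensions. The function name and the example probabilities are hypothetical.

```python
from typing import Sequence


def consistency_loss(student_out: Sequence[float],
                     teacher_out: Sequence[float]) -> float:
    """Mean squared difference over the N output dimensions of one sample."""
    if len(student_out) != len(teacher_out):
        raise ValueError("student and teacher outputs must have equal length")
    n = len(student_out)
    return sum((s - t) ** 2 for s, t in zip(student_out, teacher_out)) / n


# Predictions of both models for the same (unlabeled) input, N = 3 classes.
student = [0.7, 0.2, 0.1]
teacher = [0.6, 0.3, 0.1]
loss = consistency_loss(student, teacher)
print(round(loss, 6))
```

No label is needed to evaluate this term, which is why unlabeled sessions can still contribute to the student's gradient.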

The teacher model’s parameters are updated by using the EMA of the student model’s parameters. The EMA calculation is given by the equation below:

θ_{t}^{'} = α θ_{t - 1}^{'} + (1 - α) θ_{t}

(13)

where θ′_t is the teacher model parameter at time step t, θ′_t−1 is the teacher model parameter at the previous time step, θ_t is the student model parameter at time step t, and α is the smoothing coefficient, set to 0.9 in this study. We chose α = 0.9 because it strikes a balance between stability and responsiveness. A lower α (e.g., 0.5) would make the teacher model too sensitive to changes in the student model, potentially leading to instability. Conversely, a higher α (e.g., 0.99) might result in overly slow updates and weaken the teacher model’s guidance of the student model. After training is complete, we use the teacher model for validation.
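The effect of the smoothing coefficient can be seen with a toy version of the EMA update in Equation (13). In this sketch the parameters are flat lists of floats and the student is held constant for three steps; both simplifications, and the function name `ema_update`, are assumptions made for illustration.

```python
from typing import List


def ema_update(teacher: List[float], student: List[float],
               alpha: float = 0.9) -> List[float]:
    """theta'_t = alpha * theta'_{t-1} + (1 - alpha) * theta_t, element-wise."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def run(alpha: float, steps: int = 3) -> List[float]:
    """Track a student fixed at [1.0, 2.0] starting from a zero teacher."""
    teacher = [0.0, 0.0]
    student = [1.0, 2.0]
    for _ in range(steps):
        teacher = ema_update(teacher, student, alpha)
    return teacher


for alpha in (0.5, 0.9, 0.99):
    print(alpha, [round(v, 4) for v in run(alpha)])
```

After three steps the teacher has covered a fraction 1 − α³ of the distance to the student: about 87.5% for α = 0.5, 27.1% for α = 0.9, and 3.0% for α = 0.99, which matches the trade-off between responsiveness and stability described above.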

4. Experimental Evaluation

4.1. Experimental Setup

4.1.1. Data Preparation

To verify the robustness and applicability of the proposed encrypted application traffic identification method based on CLSTM and Mean Teacher collaborative learning across different traffic datasets, we selected two publicly available encrypted traffic datasets: ISCXVPN2016 and ISCXTOR2016 [26,27]. These datasets include a variety of common encrypted application traffic, covering different types of web applications. We categorized the traffic into 14 label categories according to the type of application it belongs to. During training, we divided the input dataset into training data, validation data, and test data at a ratio of 8:1:1. The training set was further divided into labeled and unlabeled data according to the labeled-to-unlabeled ratios used in the experiments (0.1:9.9, 0.5:9.5, 1:9, 1.5:8.5, and 2:8). This approach was used to assess the validity of encrypted traffic classification on VPN data. Table 1 and Table 2 show the classification of the ISCXVPN2016 traffic dataset and the ISCXTOR2016 traffic dataset, respectively.
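The 8:1:1 split and the subsequent labeled/unlabeled partition can be sketched as follows. The helper name `split_dataset` is hypothetical, and the plain random shuffle is an assumption: the text does not say whether the split was stratified by class.

```python
import random
from typing import Dict, List, Sequence


def split_dataset(samples: Sequence[int], labeled_fraction: float,
                  seed: int = 0) -> Dict[str, List[int]]:
    """Split 8:1:1 into train/val/test, then carve labeled data out of train.

    labeled_fraction is the labeled share of the training set, e.g. the
    ratio 1:9 corresponds to 0.10 and 0.1:9.9 corresponds to 0.01.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)     # unstratified shuffle (assumed)
    n = len(items)
    n_train = n * 8 // 10
    n_val = n // 10
    train = items[:n_train]
    n_labeled = max(1, round(len(train) * labeled_fraction))
    return {
        "labeled": train[:n_labeled],
        "unlabeled": train[n_labeled:],
        "val": items[n_train:n_train + n_val],
        "test": items[n_train + n_val:],
    }


parts = split_dataset(range(1000), labeled_fraction=0.10)
print({name: len(ids) for name, ids in parts.items()})
```

Only the `labeled` subset contributes to the classification loss, while both the labeled and unlabeled subsets can contribute to the consistency loss.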

4.1.2. Equipment Requirements

During the training phase, the Adam optimizer is used to train the model. The learning rate is set to 0.003 based on preliminary experiments, where this value provided a good balance between convergence speed and stability. We observed that smaller learning rates led to slower training, while larger rates often resulted in unstable training dynamics. The entire network is optimized over 50 epochs, a number determined by monitoring the validation loss; performance improvements plateaued around 50 epochs, indicating that further training would not significantly benefit the model. A batch size of 64 was chosen after evaluating several options. This size allowed us to use GPU memory effectively while maintaining a stable gradient estimate, which is crucial for model convergence. We trained with PyTorch 1.9.3 as the software environment, and with an Intel® Core™ i9-11900K @ 3.50 GHz (Intel Corporation, Santa Clara, CA, USA), 64 GB RAM (Kingston Technology Company, Inc., Fountain Valley, CA, USA), and an NVIDIA GeForce RTX 3090 (NVIDIA Corporation, Santa Clara, CA, USA) as the hardware environment.

4.1.3. Evaluation Metrics

To comprehensively assess the performance of the model, we used the following evaluation metrics: Accuracy (AC), Precision (PC), Recall (RC), and F1 Score (F1).

Accuracy (AC): The proportion of correctly classified samples out of the total number of samples.
Precision (PC): The proportion of correctly classified positive samples out of all predicted-positive samples.
Recall (RC): The proportion of correctly classified positive samples out of all actual positive samples.
F1 Score (F1): The harmonic mean of precision and recall.

The specific formulas for each metric are as follows:

AC = \frac{T P + T N}{T P + T N + F P + F N}

(14)

PC = \frac{T P}{T P + F P}

(15)

RC = \frac{T P}{T P + F N}

(16)

F 1 = \frac{2 \times P C \times R C}{P C + R C}

(17)

TP (True Positive): The number of samples that are correctly predicted as positive and are actually positive.

FP (False Positive): The number of samples that are incorrectly predicted as positive but are actually negative.

FN (False Negative): The number of samples that are incorrectly predicted as negative but are actually positive.

TN (True Negative): The number of samples that are correctly predicted as negative and are actually negative.
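The four metrics can be computed directly from the confusion counts, as in the following sketch. Recall uses TP / (TP + FN), in line with the verbal definition given above (correctly classified positives out of all actual positives). The function name, the zero-division convention, and the example counts are hypothetical; for the 14-class task, such counts would be obtained per class in a one-vs-rest manner, which is an assumption about the evaluation protocol.

```python
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 from binary confusion counts.

    Ratios with an empty denominator are reported as 0.0 (assumed convention).
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"AC": accuracy, "PC": precision, "RC": recall, "F1": f1}


# Toy counts for one class treated as "positive".
m = classification_metrics(tp=8, tn=80, fp=2, fn=10)
print({k: round(v, 4) for k, v in m.items()})
```

In this toy case accuracy is high (0.88) while recall is below 0.5, which illustrates why accuracy alone can be misleading when one class is rare.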

4.2. Experimental Results Compared to Baseline Models

We used five baseline models to validate the performance of our work. 1-conv CNN: The 1-conv CNN method uses a single convolutional layer to directly extract features from the processed NPY files and then classifies the encrypted traffic using these features. 2-conv CNN: The 2-conv CNN method is very similar to the 1-conv CNN, but it uses two convolutional layers and two pooling layers in the convolutional part to extract and pool features. 3-conv CNN: Unlike the 1-conv and 2-conv CNNs, the 3-conv CNN uses three convolutional layers and three pooling layers in the feature extraction part. The 1-conv CNN and the deeper convolutional networks (2-conv CNN and 3-conv CNN) are mainly used to extract local spatial features from traffic. By gradually increasing the number of convolutional and pooling layers, we are able to capture more local information in the data.

BiLSTM: Additionally, we used a Bidirectional Long Short-Term Memory (BiLSTM) network to extract features from the sequential data of the traffic to evaluate the influence of sequential features on the experimental results. CNN-BiLSTM: By combining the CNN and BiLSTM networks, we fused the local features and sequential data features of the traffic to validate the effectiveness of this combined network in traffic data processing. Encrypted traffic data have strong temporal characteristics. BiLSTM can effectively model these temporal relationships through its bidirectional propagation mechanism, thereby extracting time-dependent information from the traffic data. We found that when using the BiLSTM model alone, the model can leverage the sequential data characteristics of the traffic to improve classification accuracy. This indicates that temporal features are crucial for the encrypted traffic classification task.

As the number of convolutional layers increases, the model becomes more complex and fits the training data more closely. However, this added complexity can lead to overfitting: the model learns noise and irrelevant patterns from the training set and generalizes worse to unseen data. In our experiments, when we increased the number of convolutional layers from two to three, the training accuracy improved slightly, but the validation accuracy dropped significantly, which indicates that the model had started to overfit. Therefore, we selected the 2-conv CNN, which performed better overall among the three convolutional models, as part of our base model. Furthermore, when the BiLSTM network is added to take the traffic sequence data as input, we observe a significant improvement in performance. Consequently, we chose the combined 2-conv CNN and BiLSTM network as the base architecture for both the student and teacher models in the CLSTM-MT framework.

The results in Table 3 show that the 2-conv CNN + BiLSTM model combined with the Mean Teacher framework (CLSTM-MT) outperforms the baseline models at every proportion of labeled data. This may be because the model structure makes more effective use of limited labeled data for feature extraction and learning, while the Mean Teacher framework uses consistency regularization to exploit unlabeled data. Moreover, our model remains well ahead of the baselines when the proportion of labeled data is low (5% or less), which matters in resource-limited scenarios because it reduces the cost of data labeling while preserving useful classification accuracy. With 20% labeled data, the accuracy of CLSTM-MT reaches 97.84% on the ISCXVPN2016 dataset and 86.3% on the ISCXTOR2016 dataset. According to Table 3, CLSTM-MT also has the lowest FLOPs and parameter count among the compared models. In short, compared to the five baselines, our framework achieves much higher overall accuracy, precision, recall, and F1 score.
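The consistency regularization mentioned above can be illustrated with a small sketch of the two core Mean Teacher operations as described by Tarvainen and Valpola [23]: the teacher's weights are an exponential moving average (EMA) of the student's weights, and a consistency term penalizes disagreement between student and teacher predictions. This is an illustration under assumptions, not the authors' implementation: weights and logits are plain Python lists, and the values of `alpha` and `consistency_weight` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def consistency_loss(student_logits, teacher_logits):
    """Mean squared error between student and teacher class probabilities."""
    p_s = softmax(student_logits)
    p_t = softmax(teacher_logits)
    return sum((a - b) ** 2 for a, b in zip(p_s, p_t)) / len(p_s)


def ema_update(teacher_weights, student_weights, alpha=0.99):
    """Move each teacher weight toward the student weight (alpha is an assumed decay)."""
    return [alpha * t + (1.0 - alpha) * s
            for t, s in zip(teacher_weights, student_weights)]


def total_loss(supervised_loss, student_logits, teacher_logits, consistency_weight=1.0):
    """Supervised loss on labeled samples plus the weighted consistency term."""
    return supervised_loss + consistency_weight * consistency_loss(
        student_logits, teacher_logits)


# Toy values for illustration only.
teacher = ema_update([0.0, 0.0], [1.0, -1.0], alpha=0.9)
loss_same = consistency_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
loss_diff = consistency_loss([2.0, 0.5, -1.0], [0.0, 0.0, 0.0])
```

Because the consistency term needs no labels, it can be evaluated on unlabeled traffic, which is how the framework draws on unlabeled data.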

4.3. Compared with the Experimental Results of Other Advanced Models

To further evaluate our model, we also compared it with several other methods:

Rule-Based Method: FlowPrint [28];
Statistical-Feature-Based Method: AppScanner [5];
Machine-Learning-Based Method: SVM [11];
Semi-Supervised and Deep Learning-Based Method: MT-CNN [24].

For a fair comparison, all methods in this experiment used 20% labeled traffic. Table 4 shows that MT-CNN and CLSTM-MT achieve clearly better overall classification performance than the other methods. FlowPrint relies on predefined rule sets, which perform poorly on complex and variable encrypted traffic. Because the rules must be formulated and updated manually, they struggle to adapt to new or unknown traffic patterns, resulting in lower classification accuracy. AppScanner extracts statistical features from traffic for classification. While this method can identify traffic types to some extent, its ability to capture temporal sequence features is limited. SVMs have limited capability in processing non-linear data, particularly complex encrypted traffic data. This often necessitates extensive feature engineering, which increases the complexity and difficulty of parameter tuning. MT-CNN performs considerably better, but it focuses solely on converting network traffic into a graph for type identification, without considering the temporal sequence features of the traffic. This leads to lower recall, lower F1 scores, and relatively poorer classification performance compared to the proposed CLSTM-MT model.

4.4. Ablation Experiments and Results

To determine the contribution of each component of CLSTM-MT to the final model performance, we conducted ablation experiments. In each experiment, we removed one component from the CLSTM-MT model while keeping the other components unchanged.

According to the experimental results in Table 5, removing any single component reduces model performance. The accuracies quoted below are for 20% labeled data on ISCXVPN2016 and ISCXTOR2016, respectively:

Without 2-conv CNN: If the CNN is removed and only BiLSTM and the Mean Teacher architecture are used, the model accuracy drops to 85.6% and 80.4%. This indicates that the local information captured by the CNN during feature extraction is critical to the overall performance of the model.
Without BiLSTM: When BiLSTM is removed and only the CNN and the Mean Teacher architecture are used, the model accuracy is 91.2% and 82.2%. This shows that BiLSTM plays an important role in modeling the traffic sequence information.
Without the Mean Teacher architecture: Without the Mean Teacher semi-supervised learning architecture, the model accuracy decreases sharply to 62.7% and 56.6%. This suggests that the Mean Teacher architecture effectively leverages unlabeled data for learning and thus contributes most to model performance.

Table 5. Comparison of the results of the ablation experiment.

Model	ISCXVPN2016					ISCXTOR2016
Model	1%	5%	10%	15%	20%	1%	5%	10%	15%	20%
Ours	62.1%	80.3%	85.4%	93.4%	97.8%	41.0%	61.5%	73.4%	81.2%	86.3%
Ours w/o 2-conv CNN	51.1%	54.6%	69.4%	73.5%	85.6%	43.2%	51.8%	62.7%	71.5%	80.4%
Ours w/o BiLSTM	53.2%	62.3%	75.3%	81.1%	91.2%	41.4%	52.9%	65.3%	73.2%	82.2%
Ours w/o MT	32.4%	40.3%	52.9%	55.2%	62.7%	31.3%	37.6%	43.9%	51.0%	56.6%
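To make the size of each component's contribution explicit, the accuracy drops at 20% labeled data can be computed from the Table 5 values. The snippet below is only a worked arithmetic example; the variable and function names are hypothetical.

```python
# Accuracy (%) at 20% labeled data, copied from Table 5.
full = {"ISCXVPN2016": 97.8, "ISCXTOR2016": 86.3}
ablations = {
    "w/o 2-conv CNN": {"ISCXVPN2016": 85.6, "ISCXTOR2016": 80.4},
    "w/o BiLSTM": {"ISCXVPN2016": 91.2, "ISCXTOR2016": 82.2},
    "w/o MT": {"ISCXVPN2016": 62.7, "ISCXTOR2016": 56.6},
}


def accuracy_drop(full_scores, ablated_scores):
    """Percentage-point drop per dataset, rounded to one decimal place."""
    return {ds: round(full_scores[ds] - ablated_scores[ds], 1) for ds in full_scores}


drops = {name: accuracy_drop(full, scores) for name, scores in ablations.items()}
# drops["w/o 2-conv CNN"] -> {"ISCXVPN2016": 12.2, "ISCXTOR2016": 5.9}
# drops["w/o BiLSTM"]     -> {"ISCXVPN2016": 6.6,  "ISCXTOR2016": 4.1}
# drops["w/o MT"]         -> {"ISCXVPN2016": 35.1, "ISCXTOR2016": 29.7}
```

Removing Mean Teacher costs roughly 30 to 35 percentage points, several times more than removing either feature extractor.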

4.5. Analysis of Visual Experiment Results

In the domain of traffic classification, a confusion matrix is usually used to assess the classification outcomes of various traffic data, thereby evaluating the performance of the model. The performance of the CLSTM-MT model is described and illustrated by the confusion matrix in Figure 5. The prominent elements along the diagonal line signify that the CLSTM-MT model exhibits robust classification accuracy for each application, with minimal instances of misclassification.

Specifically, the horizontal axis represents the predicted category and the vertical axis represents the actual category. Each cell gives the proportion of samples of the actual category (row) that were predicted as the category in that column, so the entries on the main diagonal are the per-class recall. The dark diagonal entries indicate high recall for the individual classes, underscoring the model’s ability to distinguish between different types of traffic. The CLSTM-MT model thus achieves substantial accuracy across multiple categories, with only minor confusion between classes, which suggests good generalization and reliable performance in traffic classification tasks.
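The normalization described above can be sketched in a few lines: each row of raw counts is divided by its row total, so the diagonal holds the per-class recall. The 3-class counts below are hypothetical and only illustrate the computation; they are not taken from Figure 5.

```python
def row_normalize(confusion):
    """Divide each row by its sum: cell (i, j) becomes the fraction of
    class-i samples that were predicted as class j."""
    normalized = []
    for row in confusion:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


def per_class_recall(confusion):
    """The diagonal of the row-normalized matrix is the recall of each class."""
    return [row[i] for i, row in enumerate(row_normalize(confusion))]


# Hypothetical counts (rows: actual class, columns: predicted class).
counts = [
    [48, 1, 1],
    [2, 45, 3],
    [0, 5, 45],
]
recalls = per_class_recall(counts)
```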

Figure 6 shows how the accuracy and loss of the model evolve during training on the ISCXVPN2016 dataset. At epoch 1, the accuracy is 65.2% and the loss is 1.87. The accuracy then gradually converges to about 96%, and the loss converges to less than 0.1. These results indicate that our method reaches high accuracy and low loss, and that it performs well in encrypted application traffic identification when labeled data are scarce because annotation is costly.

5. Discussion

This study introduces a collaborative learning method based on CLSTM-MT to address the challenges in encrypted traffic identification. Through detailed experimental verification and analysis of the results, we demonstrate the significant advantages of this approach in improving the accuracy of encrypted traffic identification. The proposed model outperformed the baseline methods in accuracy, precision, recall, and F1 score.

5.1. Key Findings and Contributions

Combining CNN and BiLSTM: By integrating the powerful feature extraction capabilities of CNN and BiLSTM with the advantages of Mean Teacher in semi-supervised learning, the accuracy of encrypted traffic identification is significantly improved. This integration reduces the model’s reliance on labeled data, making it more practical for real-world applications where labeled data are scarce.
Consistency Constraint: The use of consistency loss ensures that the student model’s predictions are closely aligned with those of the teacher model. This constraint enhances the robustness and adaptability of the model, enabling effective identification and classification of encrypted traffic even when dealing with complex or anomalous patterns.
Theoretical and Practical Significance: Our approach has important theoretical significance and practical value for network security, traffic management, quality of service assurance, and other fields. It provides a reliable solution for recognizing and classifying encrypted traffic, which is critical for maintaining the integrity and security of network systems.

5.2. Limitations and Future Work

1. Computational Overhead:

Issue: Maintaining both a teacher and a student model increases computational complexity and memory usage, which can be a limitation in resource-constrained environments. Both models require forward propagation, the student additionally requires backpropagation, and both sets of parameters must be updated and stored, all of which consume computing resources and memory. On embedded or edge devices with limited computing power, this additional resource consumption may prevent the model from running properly or significantly reduce its operating efficiency, making it difficult to meet application requirements such as real-time performance.

Future Work: To mitigate this issue, we plan to apply model compression techniques such as pruning, quantization, and knowledge distillation. Pruning reduces the number of model parameters by removing unimportant connections or neurons, thereby decreasing computational complexity and the memory footprint. Quantization converts model parameters and computations from a high-precision representation to a low-precision one, reducing memory usage and computational load, typically with only a small loss in model performance. Knowledge distillation transfers the knowledge learned by the teacher model to the student model, enabling the student model to achieve good performance even at a smaller scale. By applying these techniques, we can reduce the size and complexity of both models so that they can be deployed more widely in resource-constrained environments, especially in real-time scenarios that demand efficient operation.
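As a rough illustration of two of the compression techniques named above, the sketch below applies magnitude pruning and symmetric 8-bit quantization to a flat list of weights. It is a toy example under assumptions, not part of the CLSTM-MT implementation: the function names, the 50% sparsity level, and the example weights are all hypothetical, and real deployments would use a deep learning framework's own compression tooling.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest absolute value."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    # Indices of the k smallest-magnitude weights.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric uniform quantization to integers in [-127, 127].

    Returns the integer codes and the scale needed to recover approximate values.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


# Hypothetical weights for illustration.
weights = [0.02, -0.5, 0.31, -0.04, 0.9, 0.001]
pruned = magnitude_prune(weights, sparsity=0.5)
codes, scale = quantize_int8(pruned)
restored = dequantize(codes, scale)
```

Pruning here keeps the three largest-magnitude weights, and the quantization error of each restored weight is bounded by half the scale.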

2. Data Requirements:

Issue: The Mean Teacher framework relies on a large amount of unlabeled data to effectively leverage its semi-supervised learning capabilities. If the amount of unlabeled data is insufficient, the benefits of the framework may not be fully realized. The core of semi-supervised learning lies in using the information contained in a large amount of unlabeled data to assist the model in learning. When the amount of unlabeled data is inadequate, the model has difficulty capturing sufficiently rich data distribution characteristics, resulting in its inability to fully explore the underlying patterns in the data, which in turn affects the model’s generalization ability and performance improvement. In practical application scenarios, it is sometimes difficult to obtain a large amount of high-quality unlabeled data, which limits the advantages of the Mean Teacher framework.

Future Work: We aim to use data augmentation techniques to generate additional labeled and unlabeled data. Data augmentation can expand the diversity of data by performing operations such as flipping, rotating, and adding noise to existing data, allowing the model to learn more abundant data features. Additionally, we will explore synthetic data generation methods to create realistic and diverse traffic patterns. This can not only increase the amount of data but also enrich the variety of data, helping the model better adapt to different real-world scenarios, improving the model’s generalization ability and mitigating the adverse impact of limited labeled data.
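Of the augmentation operations listed above, noise injection is the simplest to sketch. The example below perturbs a feature vector with Gaussian noise and clips the result, assuming (this is an assumption, not something stated here) that features have been normalized to the range [0, 1]. The function name, `sigma`, and the seeds are hypothetical.

```python
import random


def add_gaussian_noise(features, sigma=0.01, seed=None):
    """Return a noisy copy of the features, clipped to the assumed [0, 1] range."""
    rng = random.Random(seed)
    return [min(1.0, max(0.0, x + rng.gauss(0.0, sigma))) for x in features]


# Two differently perturbed views of the same (hypothetical) sample.
features = [0.0, 0.25, 0.5, 1.0]
view_a = add_gaussian_noise(features, sigma=0.05, seed=1)
view_b = add_gaussian_noise(features, sigma=0.05, seed=2)
```

In a Mean Teacher setup, two such views of one unlabeled sample are typically fed to the student and the teacher, and the consistency loss compares their predictions.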

3. Generalization Across Different Traffic Types:

Issue: Our model combines the CNN, BiLSTM, and Mean Teacher architectures to efficiently extract various features of encrypted traffic and to reduce the dependence on labeled data through semi-supervised learning. However, its ability to generalize to new types of encrypted traffic unseen during training still faces some challenges.

Training Data Diversity: If new types of encrypted traffic differ significantly from the types seen in training data, the model may struggle to classify them effectively. This is because the training process of the existing model depends on labeled data, and it cannot guarantee perfect performance on unseen traffic.

Role of Mean Teacher: Although we use the Mean Teacher semi-supervised learning framework to improve the model’s generalization ability, the model’s performance may still be impacted when the features of new encrypted traffic differ significantly from the training data.

Future Work: In practical applications, continuously updating the training data and incrementally learning new traffic types is an effective way to enhance generalization. We will investigate advanced architectures (e.g., transformer-based models) and regularization techniques (e.g., dropout, mixup) to further enhance the model’s ability to handle complex and diverse traffic patterns. Advanced architectures and regularization can improve the model’s robustness and adaptability to different traffic types.
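For readers unfamiliar with mixup, one of the regularization techniques mentioned above, the following sketch shows its core step: two training samples and their one-hot labels are blended with a coefficient drawn from a Beta distribution. This illustrates the general technique only and is not something the authors have implemented; the function name, the `alpha` value, and the example vectors are hypothetical.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two samples and their one-hot labels with lam ~ Beta(alpha, alpha)."""
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


# Hypothetical two-feature samples from two different classes.
rng = random.Random(0)
x, y, lam = mixup([0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], alpha=0.2, rng=rng)
```

The mixed label remains a valid probability distribution, so the usual cross-entropy loss can be applied to it unchanged.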

4. Real-Time Deployment Strategies:

Future Work: To address the computational overhead and memory footprint challenges, we will optimize the storage format of the CLSTM-MT framework and reduce its resource requirements using model compression techniques such as pruning, quantization, and knowledge distillation. The goal is efficient deployment in real-time settings such as edge computing and distributed systems. These strategies are expected to reduce the model’s dependence on memory and computing resources at runtime, supporting time-sensitive tasks such as network traffic classification.

In addition, we plan to validate the performance of the model on a wider range of real-time scenarios and larger datasets, further improving its practicality and robustness, providing a more efficient solution for encrypted traffic identification, and facilitating practical applications in areas such as network security, traffic management, and the quality of service control.

6. Conclusions

This study addresses the challenges in encrypted traffic identification and proposes a collaborative learning method based on CLSTM-MT. Through detailed experimental verification and analysis of the results, we demonstrate the significant advantages of this method in improving the accuracy of encrypted traffic identification. The experimental results indicate that the proposed method performs well in the task of encrypted application traffic identification and is significantly better than the baseline methods in accuracy, precision, recall, and F1 score. The major contributions of our study are as follows: By combining the powerful feature extraction capabilities of CNN and BiLSTM with the advantages of Mean Teacher in semi-supervised learning, the accuracy of encrypted traffic identification is improved and the model’s dependence on annotated data is significantly reduced. By using the consistency constraint on unlabeled data, the robustness and applicability of the model across various kinds of encrypted traffic are enhanced, so that encrypted traffic can be effectively identified and classified. The method has important theoretical significance and application value for network security, traffic management, and service quality control.

Author Contributions

Conceptualization, X.Q.; Methodology, X.Q.; Software, X.Q. and G.Y.; Validation, X.Q.; Formal analysis, X.Q.; Writing—original draft, X.Q.; Supervision, G.Y. and L.Y.; Project administration, X.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key R&D Program of China (No. 2022YFB3104100).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Chen, L.; Gao, S.; Liu, B.; Lu, Z. Research Status and Development Trends on Network Encrypted Traffic Identification. Netinfo Secur. 2019, 19, 19–25.
2. Rezaei, S.; Liu, X. Deep learning for encrypted traffic classification: An overview. IEEE Commun. Mag. 2019, 57, 76–81.
3. Madhukar, A.; Williamson, C. A longitudinal study of P2P traffic classification. In Proceedings of the 14th IEEE International Symposium on Modeling, Analysis, and Simulation, Monterey, CA, USA, 11–14 September 2006.
4. Yeganeh, S.H.; Eftekhar, M.; Ganjali, Y.; Keralapura, R.; Nucci, A. Cute: Traffic classification using terms. In Proceedings of the 2012 21st International Conference on Computer Communications and Networks (ICCCN), Munich, Germany, 30 July–2 August 2012; pp. 1–9.
5. Taylor, V.F.; Spolaor, R.; Conti, M.; Martinovic, I. AppScanner: Automatic Fingerprinting of Smartphone Apps from Encrypted Network Traffic. In Proceedings of the 2016 IEEE European Symposium on Security and Privacy (EuroS&P), Saarbruecken, Germany, 21–24 March 2016; pp. 439–454.
6. Huang, S.; Chen, K.; Liu, C.; Liang, A.; Guan, H. A statistical-feature-based approach to internet traffic classification using Machine Learning. In Proceedings of the 2009 International Conference on Ultra Modern Telecommunications & Workshops, St. Petersburg, Russia, 12–14 October 2009; pp. 1–6.
7. Perdisci, R.; Lee, W.; Jajodia, S. A deep-learning approach to detecting encrypted malicious web traffic. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014; pp. 1271–1280.
8. Chen, L.; Li, Y.; Jiang, X. A deep convolutional neural network approach for encrypted traffic classification. Comput. Commun. 2020, 155, 151–161.
9. Abbasi, M.; Shahraki, A.; Taherkordi, A. Deep Learning for Network Traffic Monitoring and Analysis (NTMA): A Survey. Comput. Commun. 2021, 170, 19–41.
10. Mirsky, Y.; Doitshman, T.; Elovici, Y.; Shabtai, A. Kitsune: An ensemble of autoencoders for online network intrusion detection. In Proceedings of the Network and Distributed System Security Symposium (NDSS), San Diego, CA, USA, 18–21 February 2018.
11. Hao, S.; Hu, J.; Liu, S.; Song, T.; Guo, J.; Liu, S. Improved SVM method for internet traffic classification based on feature weight learning. In Proceedings of the 2015 International Conference on Control, Automation and Information Sciences (ICCAIS), Changshu, China, 29–31 October 2015; pp. 102–106.
12. Wang, M.; Zheng, K.; Luo, D.; Yang, Y.; Wang, X. An Encrypted Traffic Classification Framework Based on Convolutional Neural Networks and Stacked Autoencoders. In Proceedings of the 2020 IEEE 6th International Conference on Computer and Communications (ICCC), Chengdu, China, 11–14 December 2020; pp. 634–641.
13. Yang, Y.; Kang, C.; Gou, G.; Li, Z.; Xiong, G. TLS/SSL Encrypted Traffic Classification with Autoencoder and Convolutional Neural Network. In Proceedings of the 2018 IEEE 20th International Conference on High Performance Computing and Communications, Exeter, UK, 28–30 June 2018; pp. 362–369.
14. Hang, Z.; Lu, Y.; Wang, Y.; Xie, Y. Flow-MAE: Leveraging Masked AutoEncoder for Accurate, Efficient and Robust Malicious Traffic Classification. In Proceedings of the 26th International Symposium on Research in Attacks, Intrusions and Defenses (RAID ’23), Hong Kong, China, 16–18 October 2023; Association for Computing Machinery: New York, NY, USA, 2023; pp. 297–314.
15. He, K.; Chen, X.; Xie, S.; Li, Y.; Dollár, P.; Girshick, R. Masked Autoencoders Are Scalable Vision Learners. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 15979–15988.
16. Liu, C.; He, L.; Xiong, G.; Cao, Z.; Li, Z. FS-Net: A flow sequence network for encrypted traffic classification. In Proceedings of the IEEE International Conference on Computer Communications (INFOCOM), Paris, France, 29 April–2 May 2019; pp. 1171–1179.
17. Yao, H.; Liu, C.; Zhang, P.; Wu, S.; Jiang, C.; Yu, S. Identification of Encrypted Traffic Through Attention Mechanism Based Long Short Term Memory. IEEE Trans. Big Data 2019, 8, 241–252.
18. Lin, X.; Xiong, G.; Gou, G.; Li, Z.; Shi, J.; Yu, J. ET-BERT: A contextualized datagram representation with pre-training transformers for encrypted traffic classification. In Proceedings of the WWW ’22: The ACM Web Conference 2022, Lyon, France, 25–29 April 2022.
19. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17), Long Beach, CA, USA, 4–9 December 2017; Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 6000–6010.
20. Fahad, A.; Almalawi, A.; Tari, Z.; Alharthi, K.; Al Qahtani, F.S.; Cheriet, M. SemTra: A semi-supervised approach to traffic flow labeling with minimal human effort. Pattern Recognit. 2019, 91, 1–12.
21. Zhao, R.; Zhan, M.; Deng, X.; Wang, Y.; Wang, Y.; Gui, G.; Xue, Z. Yet another traffic classifier: A masked autoencoder based traffic transformer with multi-level flow representation. Proc. AAAI Conf. Artif. Intell. 2023, 37, 5420–5427.
22. Zhou, G.; Guo, X.; Liu, Z.; Li, T.; Li, Q.; Xu, K. TrafficFormer: An Efficient Pre-trained Model for Traffic Data. In Proceedings of the 2025 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 12–14 May 2025; p. 101.
23. Tarvainen, A.; Valpola, H. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17), Long Beach, CA, USA, 4–9 December 2017; Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 1195–1204.
24. Shi, K.; Zeng, Y.; Ma, B.; Liu, Z.; Ma, J. MT-CNN: A Classification Method of Encrypted Traffic Based on Semi-Supervised Learning. In Proceedings of the GLOBECOM 2023 - 2023 IEEE Global Communications Conference, Kuala Lumpur, Malaysia, 4–8 December 2023; pp. 7538–7543.
25. Alam, S.; Alam, Y.; Cui, S.; Akujuobi, C.M. Unsupervised network intrusion detection using convolutional neural networks. In Proceedings of the 2023 IEEE 13th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 8–11 March 2023; pp. 712–717.
26. Draper-Gil, G.; Lashkari, A.H.; Mamun, M.S.I.; Ghorbani, A.A. Characterization of encrypted and VPN traffic using time-related features. In Proceedings of the 2nd International Conference on Information Systems Security and Privacy, Rome, Italy, 19–21 February 2016; pp. 407–414.
27. Lashkari, A.H.; Gil, G.D.; Mamun, M.S.I.; Ghorbani, A.A. Characterization of Tor Traffic using Time based Features. In Proceedings of the International Conference on Information Systems Security and Privacy, Porto, Portugal, 19–21 February 2017.
28. van Ede, T.; Bortolameotti, R.; Continella, A.; Ren, J.; Dubois, D.J.; Lindorfer, M.; Choffnes, D.; van Steen, M.; Peter, A. FlowPrint: Semi-Supervised Mobile-App Fingerprinting on Encrypted Network Traffic. In Proceedings of the Network and Distributed System Security Symposium, San Diego, CA, USA, 23–26 February 2020.

Figure 1. Overview of CLSTM-MT: The architecture of CLSTM-MT is divided into three components: the data preprocessing module, the model training module, and the model validation module. Raw PCAP traffic is processed by the data preprocessing module into four npy feature files. These files are then split in an 8:1:1 ratio into training, validation, and testing datasets. The training dataset is further segmented: a small part (1–20%) of labeled data is used as input for model training, and the large remaining portion of unlabeled data (80% to 99%) is used as input for model validation. This approach helps train our model effectively.

Figure 2. Data preprocessing module: The module consists of five parts, which are session segmentation, feature extraction, data normalization, feature coding, and dataset segmentation.

Figure 3. CNN model architecture diagram.

Figure 4. LSTM model architecture diagram.

Figure 5. Recall rate confusion matrix of the classification performance of ISCXVPN2016 traffic dataset by the CLSTM–Mean-Teacher model.

Figure 6. Convergence process of accuracy and loss on the ISCXVPN2016 dataset.

Table 1. ISCXVPN2016 dataset label.

Label	Class	Count
1	Aim_Chat	1340
2	Email	5000
3	FaceBook	5000
4	FTPS	5000
5	Hangout	5000
6	ICQ	823
7	Netflix	5000
8	SFTP	5000
9	Skype	5000
10	Spotify	5000
11	Torrent	5000
12	Vimeo	5000
13	VoipBuster	5000
14	Youtube	5000

Table 2. ISCXTOR2016 dataset label.

Label	Class	Count
1	Traffic	5000
2	Web Browsing	5000
3	Email	5000
4	Chat	5000
5	Streaming	5000
6	File Transfer	5000
7	VoIP	5000
8	P2P	5000

Table 3. The performance of each model under different proportions of labeled traffic.

Model	ISCXVPN2016					ISCXTOR2016					FLOPs (10⁶)	Params (10⁶)
Model	1%	5%	10%	15%	20%	1%	5%	10%	15%	20%	FLOPs (10⁶)	Params (10⁶)
1-conv CNN	16.8%	31.5%	47.6%	49.8%	52.9%	23.2%	27.1%	35.7%	41.3%	51.2%	285.7	5.7
2-conv CNN	27.2%	33.6%	50.8%	52.6%	54.1%	26.7%	30.3%	42.9%	47.5%	53.8%	65.3	5.2
3-conv CNN	26.7%	33.9%	46.6%	50.4%	53.2%	21.8%	31.2%	33.2%	45.6%	51.7%	93.2	8.5
BiLSTM	30.3%	41.2%	50.7%	51.1%	58.2%	24.2%	34.4%	38.8%	42.7%	55.4%	63.8	4.7
2-conv CNN + BiLSTM	32.4%	43.3%	52.9%	55.2%	62.7%	31.3%	37.6%	43.9%	51.0%	56.6%	71.9	10.9
CLSTM-MT	62.1%	80.3%	83.9%	93.4%	97.8%	41.0%	61.5%	73.4%	81.2%	86.3%	34.9	4.1

Table 4. Comparison with other advanced methods.

Method	ISCXVPN2016				ISCXTOR2016
Method	Accuracy	Precision	Recall	F1-Score	Accuracy	Precision	Recall	F1-Score
FlowPrint	0.6163	0.6697	0.6651	0.6673	0.5155	0.6778	0.6120	0.6337
AppScanner	0.6266	0.4864	0.5198	0.5030	0.5369	0.6920	0.6276	0.6582
SVM	0.6767	0.5152	0.5153	0.5150	0.6097	0.7190	0.5898	0.6480
MT-CNN	0.9329	0.9492	0.9173	0.9330	0.8124	0.9108	0.7538	0.8099
CLSTM-MT	0.9784	0.9788	0.9784	0.9783	0.8633	0.6778	0.7684	0.8336

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

