2.1. TLS Protocol-Based Network Traffic Classification
Transport Layer Security (TLS) is a cryptographic protocol that secures network communication and is the successor to SSL. Compared with SSL, TLS introduces improvements in three key areas: handshake mechanisms, cipher suites, and alert protocols. TLS ensures confidentiality, authentication, and data integrity by combining symmetric encryption (e.g., AES, DES), asymmetric encryption (e.g., RSA, ECC), and hash algorithms (e.g., SHA-256). It uses digital certificates for identity verification and key exchange protocols such as RSA or Diffie-Hellman to generate session keys.
In TLS-based communications, the process begins with a handshake to establish a secure session. The handshake involves several steps. First, the client sends a list of supported protocol versions and cipher suites, enabling the server to assess the client’s encryption capabilities. Then, the server selects appropriate cryptographic parameters and sends a digital certificate for identity verification. Next, both sides exchange keys via public-key cryptography to derive a shared session key. Finally, encrypted communication begins.
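As an illustration of the negotiated parameters described above, the following minimal Python sketch (using only the standard-library socket and ssl modules; the endpoint is a placeholder) performs a TLS handshake and inspects the negotiated protocol version, cipher suite, and server certificate.

```python
import socket
import ssl

HOST, PORT = "example.com", 443  # placeholder endpoint

# Default context: negotiates the highest TLS version both sides support
# and verifies the server certificate against the system trust store.
context = ssl.create_default_context()

with socket.create_connection((HOST, PORT)) as tcp_sock:
    # The handshake (ClientHello/ServerHello, certificate exchange, key exchange)
    # takes place inside wrap_socket().
    with context.wrap_socket(tcp_sock, server_hostname=HOST) as tls_sock:
        print("Negotiated version:", tls_sock.version())   # e.g., 'TLSv1.3'
        print("Cipher suite:", tls_sock.cipher())           # (name, protocol, key bits)
        print("Server certificate subject:", tls_sock.getpeercert().get("subject"))
```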
Encryption renders packet payloads effectively opaque, so traffic classification methods that rely on plaintext inspection become ineffective. As a result, traditional content-dependent classification methods struggle to identify encrypted malicious traffic and assess security threats. While encryption enhances data security, it also creates challenges for detection systems, reducing their ability to detect malicious activities. Bonfiglio et al. [
11] (2007) proposed a method for detecting Skype-encrypted traffic using statistical analysis of bitstream randomness and packet characteristics, such as length and inter-arrival time. However, this method relies on Skype-specific protocol signatures, which makes it brittle to protocol updates and reduces its accuracy on TCP/VPN traffic. Korczyński et al. [
12] (2014) proposed a Markov chain-based method for SSL/TLS traffic classification that generates probabilistic fingerprints from protocol message sequences. However, its reliance on specific SSL/TLS behaviors requires fingerprint updates whenever application behavior changes. Husák et al. [
13] (2016) developed a passive SSL/TLS fingerprinting method for HTTPS client identification. By correlating unencrypted TLS metadata with HTTP User-Agent strings, it achieved 95.4% accuracy. However, ambiguity in the mapping between fingerprints and User-Agent strings complicates identification. Shen et al. [
14] (2017) constructed a 2D feature graph from SSL/TLS certificates and application packet lengths to model application attributes, employing a second-order model for encrypted traffic recognition. However, feature drift occurs when applications or TLS implementations change, requiring fingerprint updates to sustain classification accuracy. Anderson et al. [
15] (2018) analyzed TLS handshake discrepancies and flow dynamics between benign and malicious traffic. They developed a hybrid approach integrating logistic regression and rule-based classification to detect encrypted threats and classify malware families. However, poor performance on specific families (e.g., Dridex) highlighted the limitations of current feature engineering methods.
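To make the fingerprinting idea of [12] concrete, the sketch below is an illustrative simplification rather than the authors' implementation: it estimates first-order transition probabilities over the TLS message types observed in known flows of an application, and scores a new flow's message sequence by its log-likelihood under that fingerprint.

```python
from collections import defaultdict
import math

def build_fingerprint(sequences):
    """Estimate first-order transition probabilities over TLS message types.

    sequences: list of message-type sequences, e.g.
        [["ClientHello", "ServerHello", "Certificate", "Finished"], ...]
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for prev, curr in zip(seq, seq[1:]):
            counts[prev][curr] += 1
    return {
        prev: {curr: n / sum(nexts.values()) for curr, n in nexts.items()}
        for prev, nexts in counts.items()
    }

def log_likelihood(fingerprint, seq, floor=1e-6):
    """Score a message sequence against a fingerprint (higher = better match)."""
    score = 0.0
    for prev, curr in zip(seq, seq[1:]):
        score += math.log(fingerprint.get(prev, {}).get(curr, floor))
    return score
```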
Shekhawat et al. [
16] (2019) used Zeek (developed by the Zeek Project, USA) to extract four-tuple logs (source/destination IP, port, protocol) from raw traffic and derived features like connection statistics, SSL handshake attributes, and X.509 certificate properties. Lucia et al. [
17] (2019) used TLS message size and direction as SVM/CNN inputs to model encrypted traffic’s spatial semantics, but their method overlooks TLS version differences, potentially affecting detection accuracy. Dai et al. [
18] (2019) also used Zeek to extract flow statistics, SSL handshake data, and certificate attributes, applying mutual information to select features and improve model performance. Hu et al. [
19] (2020) extracted plaintext fields from Client/Server Hello messages as features and used logistic regression for encrypted traffic detection. Ferriyan et al. [
20] (2022) proposed TLS2Vec, which converts Client/Server Hello plaintext fields into Word2Vec embeddings. Experimental results indicate near-perfect detection accuracy (F1 ≈ 1.0) for encrypted malicious traffic. Brzuska et al. [
21] (2022) analyzed TLS 1.3’s deployment, focusing on adoption rates, security, performance, and implementation. Their study, based on extensive network data, showed improvements in security and performance over previous versions. However, they found challenges like inconsistent protocol support, poor library compatibility, and regional adoption disparities. Xue et al. [
22] (2024) introduced a TLS handshake metadata-driven method for detecting proxy traffic. The method analyzes handshake size, timing, and directional patterns to identify obfuscated traffic. It provides high accuracy, passive monitoring, and protocol-agnostic detection. However, it faces challenges in TLS 1.3, multiplexed traffic, and UDP-based proxy environments.
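Several of the works above build features from handshake metadata such as packet sizes and directions. A minimal sketch of this kind of feature construction is given below, assuming a flow is available as (size, direction) pairs; the fixed length of 20 packets is an arbitrary illustrative choice, not a value taken from the cited papers.

```python
def signed_size_sequence(packets, max_len=20):
    """Encode a flow's early packets as signed sizes: positive = client-to-server,
    negative = server-to-client; pad or truncate to a fixed length.

    packets: iterable of (size_in_bytes, direction) with direction in {"c2s", "s2c"}.
    """
    seq = [size if direction == "c2s" else -size for size, direction in packets]
    seq = seq[:max_len]
    return seq + [0] * (max_len - len(seq))  # zero-pad to max_len

# Example: a toy TLS handshake (ClientHello out, ServerHello/Certificate in, ...)
flow = [(517, "c2s"), (1460, "s2c"), (1170, "s2c"), (126, "c2s"), (51, "c2s")]
print(signed_size_sequence(flow))
```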
2.2. Time-Series Algorithm-Based Network Traffic Classification Method
Time-series algorithms analyze the temporal patterns in network traffic data and extract meaningful features for application in various tasks. These algorithms are useful for traffic classification, anomaly detection, prediction, and malware detection. Their key advantages include the ability to capture dynamic dependencies, non-invasiveness, and high adaptability. Time-series analysis methods fall into two categories: one applies image processing to packet payloads and timestamps, while the other extracts statistical features from packet sizes and timestamps. Draper-Gil et al. [
23] (2016) introduced a time-series analysis method leveraging statistical features to enhance classification performance. Key features extracted include flow bytes per second and packet arrival intervals. Experimental results indicate that shorter flow timeouts (e.g., 15 s) generally improve accuracy, whereas longer timeouts benefit specific traffic types, such as VPN-Mail. Vu et al. [
24] (2018) introduced an LSTM-based deep learning approach that extracts temporal features from traffic flows to capture long-term dependencies. Shapira et al. [
25] (2019) developed FlowPic, a method that converts traffic temporal-size information into image representations for CNN-based classification. However, it only achieved 67.8% accuracy in classifying Tor-encrypted traffic, highlighting challenges with Tor-specific encryption.
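A simplified version of the statistical time-series features mentioned above (e.g., flow bytes per second and packet inter-arrival intervals) can be computed from per-packet timestamps and sizes as follows; the exact feature sets used in the cited works differ, so this is only an illustrative sketch.

```python
import statistics

def flow_time_series_features(timestamps, sizes):
    """Compute basic time-series features for one flow.

    timestamps: packet arrival times in seconds (sorted ascending).
    sizes: packet sizes in bytes, aligned with timestamps.
    """
    duration = max(timestamps[-1] - timestamps[0], 1e-9)
    inter_arrival = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return {
        "duration_s": duration,
        "bytes_per_s": sum(sizes) / duration,
        "packets_per_s": len(sizes) / duration,
        "iat_mean": statistics.mean(inter_arrival) if inter_arrival else 0.0,
        "iat_std": statistics.pstdev(inter_arrival) if inter_arrival else 0.0,
        "size_mean": statistics.mean(sizes),
        "size_std": statistics.pstdev(sizes),
    }
```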
Niu et al. [
26] (2022) proposed an enhanced LSTM model for APT malware detection by combining time-series and association rule features. The model achieved nearly 100% prediction accuracy and improved classification by 5–10% over traditional methods. The study also found that directly processing raw PCAP files results in inefficient data handling. Tang et al. [
27] (2023) introduced a novel method for time-series feature extraction in encrypted application traffic classification. This method focuses on analyzing sequences of null packets to identify critical behavioral traits, enabling effective traffic pattern recognition without traditional payload inspection. Koumar et al. [
28] (2023) proposed an SFTS (single flow time series)-based traffic classification approach by introducing 69 novel time-series features. Experimental results on 15 public datasets demonstrated up to a 5% improvement in classification accuracy on specific tasks. Koumar et al. [
9] (2024) proposed NetTiSA, a method that uses packet-size time-series analysis to extract 13 core features and adds 7 more for a stronger feature set. Testing on 25 classification tasks, ranging from small networks to 100 Gbps systems, showed that NetTiSA performs as well as or better than current methods. However, it slightly lags behind plaintext-based methods in tasks such as Tor detection and intrusion detection.
2.3. Network Traffic Classification Based on Feature Encoding
Encoding models are a crucial step in machine learning and deep learning, transforming input data into a format suitable for model processing. These models can be categorized into five types: basic encoding (e.g., one-hot), distributed embedding (e.g., Word2Vec [
29], ELMo [
30]), deep learning-based (e.g., BERT [
31]), sequence feature encoding (e.g., Transformer [
32]), and hybrid models (e.g., multimodal fusion). Traditional feature encoding commonly relies on one-hot encoding. However, for high-cardinality features, the resulting dimensionality can prevent downstream models from yielding satisfactory results, so alternative encoding methods are often considered for feature representation. Chen et al. [
33] (2019) proposed a CNN-based method for encrypted C&C traffic identification. By converting traffic bytes into numeric vectors via Word2Vec and leveraging server-independent features of malware C&C communication, their multi-window CNN extracts local features and inter-block relationships, achieving 91.07% identification accuracy. Li et al. [
34] (2020) introduced an HTTP traffic anomaly detection method using weighted Word2Vec segment vectors. It reduces training complexity through TF-IDF-weighted mapping and employs LightGBM-CatBoost algorithms for efficient detection. However, it shows limited adaptability to diverse encryption algorithms and may lack sensitivity to unknown or rare HTTP request patterns.
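To illustrate the distributed-embedding alternative to one-hot encoding, the sketch below trains a small Word2Vec model with gensim on hypothetical token sequences that stand in for tokenized traffic fields; the vocabulary and hyperparameters are placeholders, not those of the cited works.

```python
from gensim.models import Word2Vec

# Hypothetical "sentences": each flow is tokenized into protocol fields or byte n-grams.
flows = [
    ["tls1.2", "cipher_c02f", "ext_sni", "len_517"],
    ["tls1.3", "cipher_1301", "ext_sni", "len_512"],
    ["tls1.2", "cipher_c02f", "ext_alpn", "len_489"],
]

# Skip-gram Word2Vec: each token gets a dense 32-dimensional vector instead of
# a sparse one-hot vector whose dimensionality grows with the vocabulary size.
model = Word2Vec(sentences=flows, vector_size=32, window=3, min_count=1, sg=1, epochs=50)

vec = model.wv["cipher_c02f"]                      # dense embedding for one token
similar = model.wv.most_similar("tls1.2", topn=2)  # nearest tokens in embedding space
print(vec.shape, similar)
```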
Ferriyan et al. [
20] (2022) proposed TLS2Vec, a Word2Vec-based method for detecting malicious behaviors in encrypted traffic. By analyzing TLS handshake and payload characteristics with LSTM networks, it achieves traffic classification. Experiments show TLS2Vec attains 99.9% detection accuracy, outperforming non-Word2Vec methods. However, performance degrades on imbalanced classes and discretized payload lengths. Kholgh et al. [
35] (2023) developed PAC-GPT, a GPT-3-based framework for generating synthetic network traffic. It includes traffic and packet generators capable of producing both normal and malicious traffic scenarios. Current limitations include support only for ICMP/DNS protocols and poor performance on complex protocols. Tang et al. [
27] (2023) introduced an ELMo-encoding and LSTM-based method for encrypted traffic classification. In place of traditional one-hot encoding, ELMo maps tokens to fixed-length vectors and generates context-aware dynamic embeddings. Ali et al. [
36] (2024) leveraged BERT’s semantic feature extraction with MLP for efficient classification of imbalanced network traffic in intrusion detection. Dong et al. [
37] (2024) reviewed the significance of pre-training (e.g., BERT, GPT, XLNet) in encrypted traffic analysis. Self-supervised pre-training and Transformer architectures enable robust feature extraction; in particular, BERT's bidirectional encoder captures complex traffic patterns.
2.4. Deep Learning-Based Method for Network Traffic Classification
With advancements in deep learning, researchers have increasingly explored its application to encrypted traffic identification. For instance, Wang et al. [
38] (2017) proposed an end-to-end 1D-CNN method for encrypted traffic classification. Experimental results on the ISCX VPN-nonVPN dataset show that the 1D-CNN outperforms existing methods on multiple evaluation metrics, especially in VPN traffic classification. However, its performance on non-VPN traffic classification still needs improvement. Wang et al. [
39] (2017) introduced a 2D-CNN malicious traffic classification method. The method maps raw traffic to grayscale images and converts them to IDX format as model input. Validation on the USTC-TFC2016 dataset demonstrated an accuracy of 99.41%. A key advantage of this method is that it processes raw data directly, reducing reliance on manual feature engineering. Wu et al. [
40] (2018) proposed BotCatcher, a CNN-LSTM hybrid system for extracting spatio-temporal features of botnet traffic. While effective, the model's structural complexity (about 3M parameters) leads to long training and inference times and significant computational cost. Lotfollahi et al. [
41] (2020) developed a CNN-autoencoder framework for encrypted traffic classification. Experimental results show that it achieved recall of 0.98 and 0.94 for application and service identification, respectively. However, preprocessing steps such as Ethernet header removal and traffic filtering add to the implementation complexity.
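The end-to-end 1D-CNN idea of [38] can be sketched as follows in PyTorch; the layer sizes and the 784-byte input length are illustrative assumptions rather than the exact published architecture. Raw flow bytes are scaled to [0, 1] and passed through stacked 1D convolutions before a linear classification head.

```python
import torch
import torch.nn as nn

class Byte1DCNN(nn.Module):
    """Toy end-to-end 1D-CNN over the first 784 raw bytes of a flow."""
    def __init__(self, num_classes: int, input_len: int = 784):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=25, padding=12), nn.ReLU(), nn.MaxPool1d(3),
            nn.Conv1d(32, 64, kernel_size=25, padding=12), nn.ReLU(), nn.MaxPool1d(3),
        )
        with torch.no_grad():  # infer the flattened feature size from a dummy input
            flat = self.features(torch.zeros(1, 1, input_len)).numel()
        self.classifier = nn.Sequential(nn.Flatten(), nn.Linear(flat, num_classes))

    def forward(self, x):           # x: (batch, input_len) raw byte values in [0, 255]
        x = x.unsqueeze(1) / 255.0  # -> (batch, 1, input_len), scaled to [0, 1]
        return self.classifier(self.features(x))

logits = Byte1DCNN(num_classes=12)(torch.randint(0, 256, (8, 784)).float())
print(logits.shape)  # torch.Size([8, 12])
```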
Aceto et al. [
42] (2021) proposed a multimodal multi-task DL method for encrypted traffic classification, named Distiller. In their evaluation, the average accuracy and F1 score improved by 8.45% and training time was reduced by 41.7% compared with existing models. Li et al. [
43] (2022) designed a lightweight CNN-SIndRNN architecture for malicious TLS detection. While it effectively captures local patterns and long-range dependencies via a 1D-CNN and an enhanced SIndRNN, the method exhibits degraded performance on low-sample malware families (e.g., WannaCry) and requires improved generalization. Bader et al. [
44] (2022) proposed an improved DISTILLER-based model (MalDIST) that incorporates statistical features of data packets to enhance feature learning. This method was successfully applied to the detection and classification of malicious encrypted traffic, achieving 99.7% accuracy, precision, recall, and F1 score. Shekhawat et al. [
16] (2019) systematically compared the performance of SVM, Random Forest, and XGBoost in encrypted traffic analysis. Although RF and XGBoost achieve near-perfect accuracy (≈99%), their selected features are less interpretable and intuitive than those of SVM. Huo et al. [
45] (2023) proposed a multi-view collaborative classification model (MCC) based on semi-supervised learning. The model uses stream metadata features and TLS certificate features to build XGBoost and random forest classifiers. By employing a co-training strategy, it effectively enhances the detection of malicious behavior in encrypted traffic. Zhao et al. [
46] (2024) proposed a graph representation-based method for malicious TLS traffic detection (GCN-RF). By converting network traffic into graph structures and leveraging GCNs, the method improves detection performance. However, the use of GNNs also increases model complexity, resulting in longer training and inference times and higher computational cost. Guo et al. [
47] (2024) proposed an encrypted traffic classification method based on a low-dimensional second-order Markov (LDSM) matrix. By constructing a state transition matrix and using Gini gain for feature dimensionality reduction, the method reduces model complexity and computational overhead while improving classification efficiency and accuracy.
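The second-order Markov representation behind LDSM [47] can be roughly illustrated as follows (a simplified sketch, not the authors' construction): packet lengths are binned into discrete states, and transition frequencies over pairs of consecutive states form a matrix that is flattened into a feature vector before any dimensionality reduction.

```python
import numpy as np

def second_order_transition_matrix(lengths, bin_size=200, num_states=8):
    """Build a second-order transition matrix from a packet-length sequence.

    Each packet length is mapped to a discrete state; rows index the pair of
    the two previous states, columns index the next state.
    """
    states = [min(l // bin_size, num_states - 1) for l in lengths]
    matrix = np.zeros((num_states * num_states, num_states))
    for s1, s2, s3 in zip(states, states[1:], states[2:]):
        matrix[s1 * num_states + s2, s3] += 1
    row_sums = matrix.sum(axis=1, keepdims=True)
    return np.divide(matrix, row_sums, out=np.zeros_like(matrix), where=row_sums > 0)

features = second_order_transition_matrix([517, 1460, 1460, 126, 51, 1380, 1460]).flatten()
print(features.shape)  # (512,) before any dimensionality reduction (e.g., by Gini gain)
```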
In conclusion, this study combines SSL/TLS encryption features, time-series analysis, encoding techniques, and deep learning to propose a more accurate and efficient encrypted malicious traffic detection method. We use both encryption features and time-series features in our experiments. To effectively capture temporal dependencies in traffic data, we employ a Transformer-based XLNet encoding model for feature representation, which enhances the model's ability to understand sequence patterns. By integrating a convolutional neural network (CNN) and a recurrent neural network (RNN), the model extracts local temporal features and captures long-term dependencies, thereby improving detection accuracy and optimizing training and testing efficiency.
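A schematic sketch of such a pipeline is given below using PyTorch and the Hugging Face transformers library; the pretrained checkpoint, layer sizes, and the tokenization of traffic features are illustrative assumptions, not the implementation evaluated in this work. XLNet produces contextual embeddings of the tokenized flow, a 1D convolution extracts local temporal patterns, and an LSTM captures longer-range dependencies before classification.

```python
import torch
import torch.nn as nn
from transformers import XLNetModel, XLNetTokenizer

class XLNetCNNRNNClassifier(nn.Module):
    """Sketch: XLNet embeddings -> 1D-CNN (local patterns) -> LSTM (long-range) -> classifier."""
    def __init__(self, num_classes: int, pretrained: str = "xlnet-base-cased"):
        super().__init__()
        self.encoder = XLNetModel.from_pretrained(pretrained)     # hidden size 768
        self.conv = nn.Conv1d(768, 128, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(128, 64, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * 64, num_classes)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        local = torch.relu(self.conv(hidden.transpose(1, 2))).transpose(1, 2)
        _, (h_n, _) = self.lstm(local)
        pooled = torch.cat([h_n[0], h_n[1]], dim=-1)  # concatenate both LSTM directions
        return self.fc(pooled)

tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
batch = tokenizer(["tls1.2 cipher_c02f ext_sni len_517"], return_tensors="pt", padding=True)
print(XLNetCNNRNNClassifier(num_classes=2)(batch["input_ids"], batch["attention_mask"]).shape)
```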