Article

Unlocking Few-Shot Encrypted Traffic Classification: A Contrastive-Driven Meta-Learning Approach

1 Air Defense and Antimissile School, Air Force Engineering University, Xi’an 710051, China
2 Graduate School, Air Force Engineering University, Xi’an 710051, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(21), 4245; https://doi.org/10.3390/electronics14214245
Submission received: 8 September 2025 / Revised: 9 October 2025 / Accepted: 24 October 2025 / Published: 30 October 2025

Abstract

The classification of encrypted traffic is critical for network security, yet it faces a significant “few-shot” challenge as novel applications with scarce labeled data continuously emerge. This complexity arises from the high-dimensional, noisy nature of traffic data, making it difficult for models to generalize from few examples. Existing paradigms, such as meta-learning from scratch or standard pre-train/fine-tune methods, often fail in this scenario. To address this gap, we propose Contrastive Learning Meta-Flow (CL-MetaFlow), a novel two-stage learning framework that uniquely synergizes the strengths of contrastive representation learning and meta-learning adaptation. In the first stage, a robust feature encoder is pre-trained using supervised contrastive learning on known traffic classes, shaping a highly discriminative and metric-friendly embedding space. In the second stage, this pre-trained encoder initializes a Prototypical Network, enabling rapid and effective adaptation to new, unseen classes from only a few samples. Extensive experiments on a benchmark built from the ISCX-VPN-2016 and ISCX-Tor-2017 datasets demonstrate the superiority of our approach. Notably, in a five-way five-shot setting, CL-MetaFlow achieves a Macro F1-Score of 0.620, significantly outperforming from-scratch ProtoNet (0.384), a standard fine-tuning baseline (0.160), and strong pre-training counterparts like SimCLR+ProtoNet (0.545) and a re-implemented T-Sanitation (0.591). Our work validates that a high-quality, domain-adapted feature prior is the key to unlocking high-performance few-shot learning in complex network environments, providing a practical and powerful solution for real-world traffic analysis.

1. Introduction

1.1. The Challenge of Traffic Classification in the Encryption Era

Internet communication is undergoing a profound encryption revolution. Next-generation protocols, represented by TLS 1.3, QUIC, and encrypted DNS (DoH/DoT), have significantly enhanced the confidentiality and integrity of user data [1,2]. However, this advancement is a double-edged sword: while protecting legitimate users’ privacy, it also provides a natural sanctuary for malicious activities. Command and Control (C&C) communications of botnets, data exfiltration by ransomware, and the covert infiltration of Advanced Persistent Threats (APTs) are increasingly hidden within encrypted tunnels [3]. The effectiveness of traditional Deep Packet Inspection (DPI) techniques has been greatly diminished as they can no longer decrypt and inspect traffic payloads, leaving network operators and security analysts facing an unprecedented “network visibility” crisis.
In response, the research paradigm has shifted from payload analysis to analysis based on the external behavioral features of traffic. These methods do not rely on packet content but identify and classify traffic by analyzing a series of observable metadata—such as packet size sequences, inter-arrival times, handshake parameters, and statistical features like flow duration and byte counts. Deep learning has been applied successfully across many related domains [4,5,6,7,8]. Although methods based on machine learning, and especially deep learning, have achieved significant success in this area [9,10], they generally share a common prerequisite: the existence of a large-scale, diverse, and accurately labeled training dataset.

1.2. The Emergence of the Few-Shot Dilemma

In a dynamic and adversarial network environment, the “big data” assumption often fails to hold. Encrypted applications and services are emerging and iterating at an unprecedented rate. For instance, new VPN or proxy tools, niche encrypted chat software, unique communication protocols of IoT devices, and new C&C channels adopted by malware variants appear daily [11]. For these incrementally emerging novel classes, we face a typical few-shot dilemma: in their early stages, we can often only capture a very small number of traffic session samples (perhaps only a few dozen or even just a few). It is neither realistic nor timely to collect thousands of samples for each new application before training a model. Therefore, enabling models with the ability to “learn from a few samples,” i.e., few-shot encrypted traffic classification, has become a pressing frontier problem in this field.

1.3. Limitations of Existing Few-Shot Learning Paradigms

Currently, there are two mainstream paradigms for solving few-shot problems, but both expose their limitations when directly applied to the complex scenario of encrypted traffic.
  • Paradigm A: Standard Meta-Learning. Meta-learning methods like Prototypical Networks [12] and MAML [13] are designed to “learn to learn,” aiming to quickly adapt to new tasks by training on a large number of simulated few-shot tasks. However, these methods typically must learn a metric-friendly embedding space from scratch (tabula rasa). The feature space of encrypted traffic is notoriously high-dimensional, heterogeneous, and noisy, where the differences between classes can be very subtle (e.g., distinguishing two different VoIP applications). Without strong prior knowledge, requiring a meta-learning model to directly optimize a well-structured embedding space from raw features is an extremely challenging task, often leading to sub-optimal solutions.
  • Paradigm B: Pre-Training and Fine-Tuning. This paradigm first pre-trains a general-purpose feature encoder on a large dataset via self-supervised learning (e.g., contrastive learning [14]), and then fine-tunes it on downstream tasks. This approach can learn robust feature representations. However, the standard fine-tuning strategy (e.g., training a linear classifier on a few samples) is not optimized for the specific goal of “fast adaptation from few samples.” It is a generic adaptation method that may not fully unlock the potential of the pre-trained model in scenarios of extreme sample scarcity (K = 1 or K = 5).

1.4. Problem Statement and Key Challenges

Based on the foregoing analysis, the core problem this paper addresses is as follows: How can we build a model that can accurately classify encrypted traffic from novel, unseen application classes using only a handful of labeled examples? Solving this problem involves tackling three key challenges:
  • High-Dimensional, Noisy Feature Space: Encrypted traffic features are inherently high-dimensional and contain significant noise, making it difficult to learn a discriminative embedding space, especially with limited data.
  • The “Learn to Adapt” Dilemma: The model must not only learn robust features but also learn the meta-skill of adapting to new classes quickly and efficiently, a task for which standard fine-tuning is ill-suited.
  • Decoupling Representation and Adaptation: A single-stage training process often forces the model to learn feature representation and few-shot adaptation simultaneously, leading to sub-optimal performance. A clear decoupling strategy is needed.

1.5. Our Work and Contributions

In light of these limitations, a natural question arises: Can we unify the advantages of both paradigms, leveraging contrastive learning to obtain a structurally robust feature prior while using meta-learning to achieve optimal, rapid adaptation to few-shot tasks?
To answer this question, this paper proposes CL-MetaFlow, a two-stage framework that integrates supervised contrastive pre-training with meta-learning fine-tuning. Our core idea is to use the supervision available from a large number of known classes to “shape” an ideal feature space through contrastive learning. Then, within this high-quality space, a meta-learning algorithm can more easily and efficiently learn how to distinguish unknown classes.
The main contributions of this paper can be summarized as follows:
1.
A Domain-Adapted Framework for Few-Shot Learning: We propose CL-MetaFlow, a novel two-stage framework that synergizes supervised contrastive learning with meta-learning. This design is specifically tailored to the domain of encrypted traffic, effectively decoupling the complex processes of representation learning and few-shot adaptation.
2.
Systematic Multi-View Feature Analysis: We provide a detailed methodology and analysis of the multi-view feature fusion strategy. Our work clarifies how diverse, non-payload-based traffic features are processed and integrated to form a comprehensive and discriminative representation, a critical aspect often overlooked in the related literature.
3.
Rigorous Empirical Validation and Benchmarking: We conduct extensive experiments on a challenging, real-world benchmark. The results rigorously validate that CL-MetaFlow consistently and significantly outperforms a variety of established baselines, setting a new performance benchmark for this specific problem domain.
4.
In-Depth Ablation and Insight: Through a deeper analysis of our ablation studies, we not only quantify the contribution of each system component but also provide critical insights into the hierarchical importance of different feature views in encrypted traffic, revealing the underlying mechanisms of our framework’s success.

1.6. Paper Structure

The remainder of this paper is organized as follows. Section 2 reviews related work in encrypted traffic classification and few-shot learning. Section 3 provides preliminary definitions for the problem and task setup. Section 4 details the proposed CL-MetaFlow framework. Section 5 presents the experimental setup, results, and analysis. Finally, Section 6 concludes the paper and discusses future work.

2. Related Work

2.1. Machine Learning-Based Encrypted Traffic Classification

Research in this area has evolved from the early use of traditional machine learning models, such as C4.5 and SVM with expert-designed statistical features, to the current dominance of end-to-end or semi-end-to-end deep learning methods. For example, pioneering works like Deep Packet [9] and FS-Net [10] demonstrated the ability of deep learning models to learn directly from traffic sequences. More recent works, such as ET-BERT [15] and FSSL [16], have further explored leveraging pre-trained language models and federated learning to handle encrypted traffic. However, most of these works still focus on scenarios with sufficient labeled data and do not directly address the few-shot problem.
More recently, the field has been tackling new challenges. For instance, with the proliferation of protocols like QUIC [17], researchers are developing specialized models to handle its unique characteristics, which differ significantly from traditional TCP-based traffic. Another critical frontier is the adversarial robustness of ETC models. Studies like [18] show that deep learning models are vulnerable to carefully crafted adversarial examples, motivating the need for more robust classification paradigms. Recent work by Hu et al. [19] specifically explores adversarial learning techniques for detecting Tor malware traffic, demonstrating the potential of adversarial approaches in traffic analysis. Furthermore, some works like [20,21] have already begun to explore few-shot learning for malicious traffic detection, employing hierarchical feature learning or masked auto-encoders, which highlights the growing consensus on the importance of the few-shot problem that our paper addresses.

2.2. Few-Shot Learning in Cybersecurity

The application of few-shot learning (FSL) in cybersecurity is a rapidly growing field. The broader context of few-shot and zero-shot learning has been comprehensively reviewed by Chen et al. [22], who highlight the challenges and opportunities in applying these techniques to various domains including network analysis. While early works applied meta-learning to general-purpose intrusion detection, such as Meta-IDS [23] for automotive networks, more recent research has zoomed in on the specific challenges of network traffic analysis. For example, FS-IDS [24] and Xu et al. [25] proposed meta-learning frameworks for network intrusion detection, demonstrating the feasibility of adapting to new attack types with few examples. Lu et al. [26] further advanced this field by proposing a model-agnostic meta-learning approach for IoT security, while Li and Wang [27] introduced depthwise separable convolution techniques to improve few-shot network intrusion detection efficiency. Similarly, Zhou et al. [28] applied metric learning for malicious node detection in IoT networks. More specific to our context, Bovenzi et al. [29] have successfully used FSL to classify attack traffic in IoT environments. However, these works often focus on detecting distinctly anomalous attack traffic. Our work differentiates itself by tackling the more subtle task of fine-grained encrypted application classification, where inter-class similarity is much higher and a powerful, domain-adapted feature representation, as we propose, becomes paramount.

2.3. Contrastive Learning in Network Traffic Analysis

Contrastive learning, as a powerful branch of self-supervised learning, has been preliminarily explored for learning general representations of traffic data. Recently, some research has begun to explore combining contrastive learning with meta-learning [30]. For example, COLA [31] successfully combined graph contrastive learning with graph meta-learning in the graph learning domain to solve few-shot node classification problems. These studies inspired us to explore the potential of a similar “contrastive + meta” paradigm in the entirely different domain of encrypted traffic. Our work is the first successful application and validation of this idea in the field of encrypted traffic classification. A summary of the key related works is presented in Table 1.

3. Preliminaries

Few-shot learning is a problem setting where the goal is to build models that can generalize to new classes from only a few labeled examples. Meta-learning, or “learning to learn,” is a dominant paradigm for solving this problem. Instead of training a model to classify specific classes, meta-learning trains the model on a series of “meta-tasks” (or episodes), where each task simulates a few-shot scenario. This process enables the model to acquire a meta-knowledge of how to quickly adapt to new, unseen tasks, which is the core principle we leverage in this work.

3.1. Problem Definition: Encrypted Traffic Session

An encrypted traffic session or flow is defined as a series of packets with the same five-tuple (source IP, destination IP, source port, destination port, and protocol) over a period of time. Since the payload is encrypted, we cannot directly use its contents. Therefore, a session S is represented by its observable external features. In our work, S is a multi-view representation, $S = \{v_{\mathrm{flow}}, v_{\mathrm{temporal}}, v_{\mathrm{stat}}, v_{\mathrm{protocol}}\}$, where each view is a fixed-size vector summarizing a different aspect of the traffic behavior:
  • $v_{\mathrm{flow}}$: Flow View represents the behavioral “shape” of the communication, capturing features derived from the sequence of packet sizes and directions.
  • $v_{\mathrm{temporal}}$: Temporal View represents the “rhythm” of the communication, including not only inter-arrival times (IATs) but also features related to traffic bursts and periodicity.
  • $v_{\mathrm{stat}}$: Statistical View, a comprehensive feature vector containing over 20 aggregated statistical values, such as flow duration, total packets, packet size distribution, and payload entropy.
  • $v_{\mathrm{protocol}}$: Protocol View refers to observable parameters from the transport and handshake layers, such as TCP flag sequences and direction change patterns.
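For concreteness, the sketch below shows one way to hold this multi-view session representation in code. The 32-dimensional view size follows the encoder input dimension reported in Section 5.1; all names are illustrative rather than taken from the authors’ implementation.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class TrafficSession:
    """Multi-view representation of one encrypted session (Section 3.1).

    Each view is a fixed-size vector; the 32-dim size is an assumption
    carried over from the encoder input dimension in Section 5.1.
    """
    v_flow: np.ndarray      # packet-size/direction "shape" features
    v_temporal: np.ndarray  # IATs, burstiness, periodicity ("rhythm")
    v_stat: np.ndarray      # aggregated statistics (duration, entropy, ...)
    v_protocol: np.ndarray  # TCP flags, handshake/direction patterns

def make_dummy_session(dim: int = 32) -> TrafficSession:
    """Build a random session for smoke-testing downstream code."""
    rng = np.random.default_rng(0)
    return TrafficSession(*(rng.normal(size=dim).astype(np.float32)
                            for _ in range(4)))
```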

3.2. Few-Shot Encrypted Traffic Classification Task (N-Way K-Shot)

Our goal is to solve the few-shot classification problem, which follows the standard meta-learning setup [32], illustrated in Figure 1. We assume a large-scale dataset $D_{\mathrm{train}}$ with abundant labeled examples from a set of base classes $\mathcal{C}_{\mathrm{train}}$, and a small dataset $D_{\mathrm{test}}$ with only a few labeled examples from a set of novel classes $\mathcal{C}_{\mathrm{test}}$, where $\mathcal{C}_{\mathrm{train}} \cap \mathcal{C}_{\mathrm{test}} = \emptyset$.
An N-way K-shot task $\mathcal{T}$ is constructed from $D_{\mathrm{test}}$ and consists of the following:
  • A Support Set $S_{\mathcal{T}} = \{(x_i, y_i)\}_{i=1}^{N \times K}$, which contains N classes from $\mathcal{C}_{\mathrm{test}}$, with K labeled samples for each class.
  • A Query Set $Q_{\mathcal{T}} = \{(x_j^*, y_j^*)\}_{j=1}^{N \times Q}$, which contains $N \times Q$ unlabeled samples from the same N classes, used for evaluation.
The model’s task is to predict the labels of the samples in the query set $Q_{\mathcal{T}}$ given the support set $S_{\mathcal{T}}$.
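The episode construction above can be made concrete with a small sampler. This is a minimal sketch of the N-way K-shot protocol; the function name `sample_episode` and the `(x, class_id)` dataset format are our own illustrative choices, not the authors’ code.

```python
import random
from collections import defaultdict

def sample_episode(dataset, n_way=5, k_shot=5, n_query=20, seed=None):
    """Sample one N-way K-shot task: (support, query) lists of (x, label).

    `dataset` is assumed to be a list of (x, class_id) pairs drawn from
    the novel classes C_test; labels are remapped to 0..N-1 per episode.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, c in dataset:
        by_class[c].append(x)
    classes = rng.sample(sorted(by_class), n_way)
    support, query = [], []
    for new_label, c in enumerate(classes):
        samples = rng.sample(by_class[c], k_shot + n_query)
        support += [(x, new_label) for x in samples[:k_shot]]
        query += [(x, new_label) for x in samples[k_shot:]]
    return support, query
```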

4. The CL-MetaFlow Framework

To address the challenges of few-shot encrypted traffic classification, we designed CL-MetaFlow, a clear two-stage learning framework. Figure 2 illustrates its overall architecture.

4.1. Design Rationale: Why a Two-Stage Approach?

Our core argument is that single-stage learning paradigms face inherent bottlenecks in the complex task of few-shot encrypted traffic classification.
  • Challenge for Meta-Learning: Applying meta-learning directly to raw, multi-view encrypted traffic data forces the model to perform two difficult tasks simultaneously: (1) learn a meaningful feature representation from high-dimensional, noisy, and heterogeneous inputs, and (2) learn how to perform few-shot classification within that representation space. This makes the optimization process extremely difficult, often preventing the model from converging to an ideal embedding space where classes are naturally separated.
  • Inadequacy of Pre-training: While the traditional “pre-train and fine-tune” paradigm can provide good features through self-supervised learning, its fine-tuning process (e.g., training a linear classifier) is geared towards fitting large amounts of data and is not optimized for the core meta-learning problem of “how to most efficiently extract information from K samples.” This limits its fast adaptation capabilities when samples are extremely scarce.
Our solution, CL-MetaFlow, resolves this dilemma by decoupling the two objectives across two stages. Stage 1 has a single goal: to pre-train a powerful feature encoder $f_\theta$ using large-scale known-class data via supervised contrastive learning, such that its output feature space is inherently robust and metric-friendly. Stage 2 then has a clear task: within this already well-structured feature space, use a meta-learning algorithm to focus on learning how to most efficiently adapt to few-shot classes. This “shape the space, then learn to classify” strategy greatly reduces the difficulty of the downstream task.

4.2. Stage 1: Robust Representation Pre-Training via Supervised Contrastive Learning

In the first stage, we leverage the existing, category-rich meta-training set $D_{\mathrm{train}}$. The architecture consists of a set of view-specific encoders and projection heads. For each view $v \in \{\mathrm{flow}, \mathrm{temporal}, \mathrm{stat}, \mathrm{protocol}\}$, we instantiate a dedicated encoder $f_\theta^v$ and a projection head $g_\phi^v$. Each encoder is a two-layer MLP that maps the 32-dimensional view vector to a 256-dimensional feature embedding. The projection head, also an MLP, then maps this embedding to a 128-dimensional space where the contrastive loss is computed.
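The following PyTorch sketch mirrors the stated architecture: a two-layer MLP encoder (32 → 512 → 256 with ReLU, per Section 5.1) and a two-layer projection head (256 → 128) per view. The projection head’s hidden width and the L2 normalization before the dot-product loss are our assumptions, as the paper does not specify them.

```python
import torch
import torch.nn as nn

VIEWS = ["flow", "temporal", "stat", "protocol"]

class ViewEncoder(nn.Module):
    """Two-layer MLP: 32-dim view vector -> 256-dim embedding (Section 5.1)."""
    def __init__(self, in_dim=32, hidden=512, out_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, out_dim))
    def forward(self, x):
        return self.net(x)

class ProjectionHead(nn.Module):
    """Two-layer MLP: 256-dim embedding -> 128-dim projection.
    Hidden width (256) and L2 normalization are assumptions."""
    def __init__(self, in_dim=256, hidden=256, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, out_dim))
    def forward(self, h):
        # unit-norm projections so the dot product in the loss is cosine-like
        return nn.functional.normalize(self.net(h), dim=-1)

encoders = nn.ModuleDict({v: ViewEncoder() for v in VIEWS})
heads = nn.ModuleDict({v: ProjectionHead() for v in VIEWS})
```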
We use supervised contrastive loss (SupCon) [33]. Unlike traditional cross-entropy, which only considers the relationship between a sample and its correct class, SupCon leverages label information to define positive and negative pairs. For any given sample (anchor) in a batch, all other samples from the same class are treated as positives, while all samples from different classes are treated as negatives.
The loss function is defined as
$\mathcal{L}_{\mathrm{SupCon}} = -\sum_{i \in I} \log \left\{ \frac{1}{|P(i)|} \sum_{p \in P(i)} \frac{\exp(z_i \cdot z_p / \tau)}{\sum_{a \in A(i)} \exp(z_i \cdot z_a / \tau)} \right\}$
where $z_i$ is the embedding of anchor $i$ after passing through $f_\theta$ and $g_\phi$, $P(i)$ is the set of all positives for $i$ in the batch, $A(i)$ is the set of all other samples in the batch, and $\tau$ is a temperature hyperparameter. By minimizing this loss, the encoder $f_\theta$ is encouraged to learn a feature space where embeddings of same-class samples (regardless of their view) are clustered tightly together, while clusters of different classes are pushed far apart. This is precisely the ideal working environment for a Prototypical Network.
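To make the loss concrete, here is a compact PyTorch implementation of the supervised contrastive objective in the form given above (negative log of the positive-averaged softmax probability). It assumes L2-normalized projections and at least two samples per class in each batch; the function name and batch layout are illustrative.

```python
import torch
import torch.nn.functional as F

def supcon_loss(z: torch.Tensor, labels: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """Supervised contrastive loss over a batch of projections.

    z: (B, d) projections; labels: (B,) class ids. For each anchor i,
    P(i) = same-label samples excluding i; A(i) = all samples except i.
    """
    z = F.normalize(z, dim=-1)
    sim = z @ z.T / tau                                   # (B, B) scaled similarities
    B = z.size(0)
    self_mask = torch.eye(B, dtype=torch.bool, device=z.device)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    # softmax over A(i): exclude the anchor itself before normalizing
    sim = sim.masked_fill(self_mask, float("-inf"))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # average the softmax probability over positives, then take -log
    mean_pos_prob = (log_prob.exp() * pos_mask).sum(1) / pos_mask.sum(1).clamp(min=1)
    return -torch.log(mean_pos_prob.clamp(min=1e-12)).sum()
```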

4.3. Stage 2: Fast Meta-Learning Adaptation

After the pre-training stage, the projection head $g_\phi$ is discarded, and the robust feature encoder $f_\theta$ is retained. This encoder serves as the foundational backbone for the meta-learning stage, providing high-quality embeddings for few-shot adaptation.

4.3.1. Justification and Mechanism for Multi-View Feature Fusion

A critical step in this stage is the fusion of features from multiple views. Encrypted traffic is a complex phenomenon, and no single feature type can fully capture its behavior. The Flow view describes the “shape” of the communication, the Temporal view its “rhythm,” the Protocol view its “handshake etiquette,” and the Statistical view its “summary statistics.” Fusing these complementary perspectives is necessary to build a holistic and robust representation that is resilient to minor variations in traffic patterns.
As defined in Section 3.1, each traffic session S is represented by a set of views $\{v_{\mathrm{flow}}, v_{\mathrm{temporal}}, \ldots\}$. The encoder $f_\theta$ processes each view through its respective path, yielding a set of embedding vectors $\{z_{\mathrm{flow}}, z_{\mathrm{temporal}}, \ldots\}$ for each session. To create a single, unified representation for the downstream meta-learner, we employ a direct yet powerful fusion strategy: feature concatenation.
$z_{\mathrm{fused}} = \mathrm{concat}(z_{\mathrm{flow}}, z_{\mathrm{temporal}}, z_{\mathrm{stat}}, z_{\mathrm{protocol}})$
This concatenated vector $z_{\mathrm{fused}}$ serves as the final, rich representation of the traffic session. This approach is deliberately chosen for its robustness and efficiency, especially in a few-shot context. Firstly, it is a parameter-free operation, which introduces no additional trainable weights. This is crucial for preventing overfitting when the support set is extremely small (e.g., K = 1 or K = 5). More complex mechanisms like attention or gating would introduce new parameters that are difficult to train with scarce data. Secondly, concatenation fully preserves all information from each view, allowing the downstream Prototypical Network to leverage the complementary signals from different modalities to compute more discriminative class prototypes. Our experimental results (Section 5.3) validate that this straightforward strategy is highly effective. Consequently, the dimension of the fused embedding $z_{\mathrm{fused}}$ is the sum of the individual view-specific embedding dimensions. While this increases the feature dimensionality, the parameter-free nature of concatenation adds minimal computational overhead, and the subsequent distance-based classification in the Prototypical Network is efficient, making the trade-off between dimensionality and performance favorable.
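As a minimal illustration of this fusion step (with the 4 × 256 = 1024-dimensional result implied by the encoder sizes in Section 5.1; the function name is illustrative):

```python
import torch

def fuse_views(embeddings: dict) -> torch.Tensor:
    """Parameter-free fusion: concatenate the four 256-dim view embeddings
    into a single 1024-dim session representation (Section 4.3.1)."""
    order = ("flow", "temporal", "stat", "protocol")
    return torch.cat([embeddings[v] for v in order], dim=-1)
```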

4.3.2. Justification and Adaptation with Prototypical Networks

The choice of Prototypical Networks for the adaptation stage is a direct consequence of our two-stage design. After Stage 1, we have a high-quality feature encoder that maps traffic sessions into a metric-friendly embedding space, where samples of the same class are inherently close and samples of different classes are far apart. In such a well-structured space, a simple, non-parametric, distance-based classifier is not only sufficient but also optimal.
Prototypical Networks [12] provide exactly this mechanism. Instead of learning a complex, parameter-heavy classifier that is prone to overfitting on K-shot data, they perform classification based on a simple and intuitive principle: a query sample belongs to the class whose “prototype” (or centroid) it is closest to in the embedding space. This approach is highly data-efficient and robust, making it an ideal choice for the fast adaptation required in our framework. For any given N-way K-shot task $\mathcal{T}$ sampled from the novel classes $\mathcal{C}_{\mathrm{test}}$, we use the following steps:
1.
Prototype Calculation: We feed all $N \times K$ samples from the support set $S_{\mathcal{T}}$ into the pre-trained encoder and fusion layer to obtain their fused embeddings $z_{\mathrm{fused}}$. For each class $c \in \{1, \ldots, N\}$, we compute its prototype $p_c$ by averaging the embeddings of its K samples:
$p_c = \frac{1}{K} \sum_{(x_i, y_i) \in S_{\mathcal{T}},\, y_i = c} z_{\mathrm{fused},i}$
2.
Query Classification: For any query sample $x_j^*$, we compute the probability distribution over classes by applying a Softmax function to the negative squared Euclidean distance between its embedding $z^*_{\mathrm{fused},j}$ and all N class prototypes:
$p(y = c \mid x_j^*) = \frac{\exp(-d(z^*_{\mathrm{fused},j}, p_c))}{\sum_{n=1}^{N} \exp(-d(z^*_{\mathrm{fused},j}, p_n))}$
where $d(\cdot, \cdot)$ is the squared Euclidean distance.
3.
Loss and Optimization: The loss is the standard cross-entropy loss on the query set. During the meta-training phase, we perform this process on tasks from $D_{\mathrm{train}}$ and fine-tune the parameters of the encoder $f_\theta$. During the meta-testing phase, we use tasks from $D_{\mathrm{test}}$ to evaluate the final performance.
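The three steps above condense into a few lines of PyTorch. This sketch assumes the fused support and query embeddings have already been computed; `prototypical_episode` is an illustrative name, not the authors’ code.

```python
import torch
import torch.nn.functional as F

def prototypical_episode(support_z, support_y, query_z, query_y, n_way):
    """One Prototypical Network episode on fused embeddings (Section 4.3.2).

    support_z: (N*K, d); support_y: (N*K,); query_z: (N*Q, d); query_y: (N*Q,).
    Returns the query cross-entropy loss and predicted labels.
    """
    # 1. Prototype calculation: class-wise mean of support embeddings.
    prototypes = torch.stack([support_z[support_y == c].mean(0)
                              for c in range(n_way)])
    # 2. Query classification: softmax over negative squared Euclidean distances.
    logits = -(torch.cdist(query_z, prototypes) ** 2)      # (N*Q, N)
    # 3. Loss: standard cross-entropy on the query set.
    loss = F.cross_entropy(logits, query_y)
    return loss, logits.argmax(dim=1)
```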

4.4. Overall Algorithm

The complete pipeline of CL-MetaFlow is summarized in Algorithm 1.
Algorithm 1: The CL-MetaFlow Algorithm
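The algorithm listing in the original article is rendered as an image. As a substitute, the following plain pseudocode reconstructs the pipeline exactly as described in Sections 4.2 and 4.3, with epoch counts from Section 5.1; it is a paraphrase, not the original listing.

```text
Input:  base-class dataset D_train; N-way K-shot tasks over novel classes
Output: trained encoder f_theta; predictions on novel-class query sets

# Stage 1: supervised contrastive pre-training (Section 4.2)
initialize view encoders f_theta and projection heads g_phi
for epoch = 1 .. 200:
    for each labeled batch (x, y) from D_train:
        z <- g_phi(f_theta(x))                 # per-view projections
        update (theta, phi) to minimize L_SupCon(z, y)
discard g_phi; keep f_theta

# Stage 2: meta-learning adaptation (Section 4.3)
for epoch = 1 .. 100:
    sample an N-way K-shot task (S_T, Q_T) from D_train
    z_fused <- concat of the four view embeddings from f_theta
    p_c <- mean of support embeddings for each class c
    classify Q_T via softmax over negative squared distances to p_c
    update theta to minimize query cross-entropy loss

# Meta-testing: run the Stage 2 episode procedure on tasks from D_test
# without any parameter updates.
```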

5. Experiments

5.1. Experimental Setup

  • Dataset: We use the publicly available ISCX-VPN-2016 [34] and ISCX-Tor-2017 [35] datasets. After preprocessing and cleaning, our combined dataset contains approximately 25,000 traffic sessions distributed across 17 application classes. The class distribution is naturally imbalanced, reflecting real-world scenarios where some applications are far more common than others. This imbalance makes the classification task more challenging and realistic. We randomly select 12 classes as the meta-training set ($\mathcal{C}_{\mathrm{train}}$) and the remaining 5 as the meta-test set ($\mathcal{C}_{\mathrm{test}}$), ensuring no class overlap.
  • Evaluation Protocol: We evaluate the models on 5-way K-shot classification tasks, where K is set to 1, 5, 10, and 15. This setting is a standard and widely adopted benchmark in the few-shot learning literature, representing a challenging scenario where the model must distinguish between 5 novel classes with very limited support data. For each task, we sample 20 query instances per class for evaluation. All reported results are the average over 1000 independently generated test episodes to ensure statistical significance. We choose Macro F1-Score as the primary metric. The F1-Score is the harmonic mean of precision and recall, providing a more balanced measure than accuracy alone. Specifically, the Macro F1-Score computes the F1-Score for each class independently and then averages them, giving equal weight to each class. This is crucial for our imbalanced dataset, as it prevents the model’s performance on dominant classes from masking its poor performance on rare classes. To further ensure a comprehensive evaluation on the imbalanced data, we also report Balanced Accuracy in our detailed logs, which confirms the trends observed with the Macro F1-Score.
  • Baseline Methods: To provide a comprehensive evaluation, we compare our model against a wide range of representative baselines spanning different learning paradigms.
    – ProtoNet [12]: A classic metric-based meta-learning method that learns an embedding space from scratch.
    – MAML [13]: A classic optimization-based meta-learning method that learns a good model initialization for fast adaptation.
    – AE + ProtoNet: A baseline that first pre-trains an Autoencoder with a reconstruction loss, then uses the encoder as the backbone for a Prototypical Network. This tests if simple reconstructive pre-training is sufficient.
    – SimCLR + ProtoNet: A strong baseline using a powerful self-supervised method, SimCLR [14], for pre-training. This helps quantify the benefit of using supervised contrastive learning over unsupervised contrastive learning.
    – Meta-Baseline [36]: A powerful pre-training baseline that uses a standard classifier on pre-trained features and then performs nearest-neighbor classification on the support set.
    – CL(Fine-tune): This baseline uses our powerful SupCon pre-trained encoder but follows a standard fine-tuning procedure (training a new linear classifier on K-shot samples). This is crucial to validate the superiority of our meta-adaptation stage.
    – T-Sanitation (re-impl.) [21]: Our re-implementation of a recent SOTA method that uses a Masked Autoencoder (MAE) for contrastive pre-training, followed by few-shot adaptation. This provides a direct comparison with a state-of-the-art external approach.
  • Implementation Details: Our framework is implemented in PyTorch (version 2.1.2). For each view, we use a separate two-layer MLP as the encoder $f_\theta^v$. This MLP takes the 32-dimensional view vector as the input, passes it through a 512-dimensional hidden layer with ReLU activation, and outputs a 256-dimensional feature embedding. The projection head $g_\phi^v$ is also a two-layer MLP, mapping the 256-dim embedding to the final 128-dim projection space. The pre-training stage (Stage 1) uses the Adam optimizer with a learning rate of $1 \times 10^{-4}$ for 200 epochs. The meta-learning stage (Stage 2) also uses Adam with a learning rate of $1 \times 10^{-4}$ for 100 epochs. The contrastive loss temperature $\tau$ is a learnable parameter initialized to 0.07.
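To tie the protocol together, here is a sketch of the evaluation loop: 1000 episodes, 20 queries per class, Macro F1 averaged across episodes, using scikit-learn’s `f1_score`. The `model(support, queries)` interface and the reuse of the `sample_episode` sketch from Section 3.2 are our own assumptions.

```python
import numpy as np
from sklearn.metrics import f1_score

def evaluate(model, test_dataset, n_episodes=1000, n_way=5, k_shot=5, n_query=20):
    """Average Macro F1 over independently sampled test episodes (Section 5.1).

    `model(support, queries)` is assumed to return predicted query labels;
    `sample_episode` is the illustrative sampler from Section 3.2.
    """
    scores = []
    for ep in range(n_episodes):
        support, query = sample_episode(test_dataset, n_way, k_shot, n_query, seed=ep)
        y_true = [y for _, y in query]
        y_pred = model(support, [x for x, _ in query])
        # Macro F1: per-class F1 averaged with equal weight per class,
        # so rare classes count as much as dominant ones.
        scores.append(f1_score(y_true, y_pred, average="macro"))
    return float(np.mean(scores))
```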

5.2. Overall Performance Comparison

We compare CL-MetaFlow with several baselines across different K-shot settings. The detailed results for Macro F1-Score are shown in Table 2, and the performance trend is visualized in Figure 3.
The results clearly demonstrate the superiority of CL-MetaFlow. The key insights are as follows:
1.
The Necessity of Pre-training: CL-MetaFlow vastly outperforms from-scratch methods like ProtoNet and MAML (e.g., a 23.6% absolute F1-Score improvement over ProtoNet in the five-shot case). This confirms that learning from scratch is ineffective for complex traffic data; a strong feature prior is essential.
2.
The Superiority of Supervised Contrastive Pre-training: Our method significantly beats both reconstructive pre-training (AE + ProtoNet) and unsupervised contrastive pre-training (SimCLR + ProtoNet). The gap between CL-MetaFlow (0.620) and SimCLR+ProtoNet (0.545) in the five-shot setting highlights that leveraging base class labels during pre-training via SupCon is critical for shaping a more discriminative feature space tailored for classification.
3.
The Necessity of Meta-Learning Adaptation: The most thought-provoking comparison is with CL(Fine-tune). Despite using the same powerful pre-trained encoder, the traditional fine-tuning strategy leads to a catastrophic performance collapse due to severe overfitting. In contrast, non-parametric adaptation methods like Meta-Baseline (0.582) and our Prototypical Network-based approach (0.620) are far more robust and effective. This validates that a meta-learning adaptation strategy is superior for unlocking the potential of the pre-trained model.
4.
Excellent Data Efficiency: A remarkable finding is the data efficiency of our framework. CL-MetaFlow, when trained with only a single sample per class (K = 1, F1-Score ≈ 0.48), already outperforms a standard ProtoNet trained with fifteen samples per class (K = 15, F1-Score ≈ 0.39). This demonstrates a powerful generalization capability, indicating that the structured feature space allows the model to extract significantly more discriminative information from each scarce sample.

Comparison with State-of-the-Art Approaches

A direct quantitative comparison with recent state-of-the-art (SOTA) few-shot traffic classification models is challenging due to discrepancies in datasets and evaluation protocols. However, by re-implementing T-Sanitation [21], a recent SOTA method, on our benchmark, we can provide a direct comparison. As shown in Table 2, our CL-MetaFlow (0.620) outperforms the re-implemented T-Sanitation (0.591) in the five-shot setting. We attribute this advantage to our pre-training strategy. T-Sanitation employs a Masked Autoencoder (MAE), a self-supervised objective, whereas our use of supervised contrastive learning (SupCon) in Stage 1 is a deliberate choice to leverage available base class labels. We argue that this creates a more discriminative and metric-friendly embedding space than purely self-supervised objectives, directly benefiting the downstream Prototypical Network. This result suggests that for few-shot classification, a supervised pre-training objective on base classes is more effective than a purely unsupervised one.

5.3. Ablation Study and Deeper Analysis

To dissect the contribution of each component in our framework, we conduct a series of ablation studies. The primary goal is to answer two key questions: (1) How much does each feature view contribute to the final performance? (2) Is our chosen fusion and adaptation mechanism optimal?

5.3.1. Analysis of Feature View Contribution

To quantify the contribution of each of our four views and understand their interplay, we conducted ablation experiments by removing one view at a time. This process helps to reveal the hierarchy of feature importance for the task of few-shot encrypted traffic classification. The results on the five-way five-shot task are shown in Table 3.
The results lead to several key analytical insights:
  • All Views Provide Value: The removal of any single view results in a performance degradation, which confirms that our multi-view representation is effective and its components are complementary rather than redundant.
  • Primacy of Behavioral Patterns: The Flow view is unequivocally the most critical feature, with its removal causing a drastic drop of 0.0951 in Macro F1. This strongly indicates that the sequence of packet sizes and directions contains the most discriminative information.
  • Hierarchy of Importance: The performance drops reveal a clear hierarchy of feature importance: Flow > Temporal > Protocol > Statistical. This deep analysis validates our multi-view design and provides valuable insights for future feature engineering in encrypted traffic analysis.

5.3.2. Analysis of Feature Fusion Method

To validate our choice of simple feature concatenation as the fusion method, we conducted an additional experiment comparing it with a more complex attention-based fusion mechanism (‘CL-MetaFlow-Attn’). In this variant, a small attention network learns to assign weights to each view’s features before summing them. The results, shown in Table 4, indicate that while the attention mechanism performs competently, it does not offer a significant advantage over simple concatenation and introduces additional complexity. This suggests that in our two-stage framework, where the pre-trained features are already highly discriminative, a simple and parameter-free fusion method like concatenation is a more robust and efficient choice.
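For reference, one plausible form of the ‘CL-MetaFlow-Attn’ variant is sketched below. The paper does not specify the attention network’s architecture, so every detail here (a single linear scoring layer, softmax over views, weighted sum) is an assumption for illustration only.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Illustrative 'CL-MetaFlow-Attn' variant: a small network scores each
    view, and the fused vector is the attention-weighted sum of view
    embeddings. Architecture details are assumptions, not the paper's."""
    def __init__(self, dim=256):
        super().__init__()
        self.score = nn.Linear(dim, 1)   # one scalar score per view embedding

    def forward(self, views):            # views: (B, n_views, dim)
        weights = torch.softmax(self.score(views), dim=1)  # (B, n_views, 1)
        return (weights * views).sum(dim=1)                # (B, dim)
```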

5.3.3. Analysis of Non-Linear Adapter

To address the sensitivity of the classification head, we evaluated a non-linear adapter as an alternative to the simple distance-based classification in the Prototypical Network. Specifically, we replaced the distance calculation with a two-layer MLP that takes the query embedding and class prototype as input to predict a match score. The results, presented in Table 5, show that the non-linear adapter performs worse than our simpler approach. We hypothesize that this is due to overfitting on the small K-shot support set, as the MLP introduces additional parameters that are difficult to train with limited data. This finding reinforces our choice of a non-parametric, metric-based approach, which is more robust in a few-shot setting.
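A plausible form of this ablation is sketched below; the hidden width and the pairwise query-prototype scoring layout are our assumptions, since the paper specifies only “a two-layer MLP that takes the query embedding and class prototype as input.”

```python
import torch
import torch.nn as nn

class MLPAdapter(nn.Module):
    """Non-linear adapter ablation (Section 5.3.3): a two-layer MLP scores
    each (query, prototype) pair. The extra trainable parameters are what
    tend to overfit on small K-shot support sets."""
    def __init__(self, dim=1024, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, query_z, prototypes):  # (Q, d), (N, d) -> (Q, N) logits
        Q, d = query_z.shape
        N = prototypes.size(0)
        pairs = torch.cat([query_z[:, None, :].expand(Q, N, d),
                           prototypes[None, :, :].expand(Q, N, d)], dim=-1)
        return self.net(pairs).squeeze(-1)
```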

6. Conclusions

This paper addressed the challenging and increasingly important problem of few-shot encrypted traffic classification. We proposed, validated, and analyzed CL-MetaFlow, a two-stage learning framework designed to solve this problem by innovatively combining supervised contrastive learning for robust feature representation with meta-learning for rapid task adaptation. On a rigorous experimental benchmark built from real-world datasets, CL-MetaFlow’s performance significantly surpassed multiple mainstream baselines. Our work not only provides an effective solution to the community but, more importantly, reveals and validates a key insight: in complex classification tasks, building a high-quality, structured feature prior through pre-training for a downstream meta-learner is an effective path to high-performance few-shot learning.
It is important to acknowledge the dual-use nature of this technology. While intended for defensive purposes, such as enforcing network policies by detecting unauthorized application usage, the techniques could potentially be misused. We advocate for the responsible application of traffic classification, governed by strict legal and ethical frameworks, and encourage future research into privacy-preserving machine learning techniques.
Future work could explore several promising directions. One avenue is the integration of more advanced meta-learners, such as Relation Networks that learn a flexible metric, to potentially capture more complex class relationships. Another direction is to explore transductive meta-learning, which leverages the entire query set during inference to improve classification accuracy, a scenario that is highly practical for offline traffic analysis. Furthermore, a significant future direction is the integration of our framework with privacy-preserving paradigms. For example, a “Federated CL-MetaFlow” could be developed where the powerful feature encoder is pre-trained collaboratively across multiple organizations without sharing raw data, and then deployed locally for private, on-premise meta-adaptation to new, sensitive traffic classes.

Author Contributions

Conceptualization, Z.L. and S.-H.Y.; methodology, Z.L.; software, Z.L.; validation, Z.L., J.W. and Y.-F.S.; formal analysis, Z.L.; investigation, Z.L.; resources, J.W.; data curation, Z.L. and Y.-F.S.; writing—original draft preparation, Z.L.; writing—review and editing, S.-H.Y.; visualization, Z.L.; supervision, S.-H.Y.; project administration, S.-H.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are derived from the following publicly available resources: ISCX-VPN-2016 [34] and ISCX-Tor-2017 [35]. The datasets can be accessed through the Canadian Institute for Cybersecurity.

Acknowledgments

The authors would like to thank the support from the Command and Control teaching and research section.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Rescorla, E. The Transport Layer Security (TLS) Protocol Version 1.3; Technical Report, RFC 8446; Internet Engineering Task Force (IETF): Fremont, CA, USA, 2018.
  2. Iyengar, J.; Thomson, M. QUIC: A UDP-Based Multiplexed and Secure Transport; Technical Report, RFC 9000; Internet Engineering Task Force (IETF): Fremont, CA, USA, 2021.
  3. Aboaoja, F.A.; Zainal, A.; Ghaleb, F.A.; Al-Rimy, B.A.S.; Eisa, T.A.E.; Elnour, A.A.H. Malware detection issues, challenges, and future directions: A survey. Appl. Sci. 2022, 12, 8482.
  4. Song, Y.; Zhang, D.; Wang, J.; Wang, Y.; Wang, Y.; Ding, P. Application of deep learning in malware detection: A review. J. Big Data 2025, 12, 99.
  5. Zhang, D.; Song, Y.; Xiang, Q.; Wang, Y. IMCMK-CNN: A lightweight convolutional neural network with Multi-scale Kernels for Image-based Malware Classification. Alex. Eng. J. 2025, 111, 203–220.
  6. Wang, K.; Song, Y.; Xu, Y.; Quan, W.; Ni, P.; Wang, P.; Li, C.; Zhi, X. A novel automated neural network architecture search method of air target intent recognition. Chin. J. Aeronaut. 2025, 38, 103295.
  7. Li, C.; Wang, K.; Song, Y.; Wang, P.; Li, L. Air target intent recognition method combining graphing time series and diffusion models. Chin. J. Aeronaut. 2025, 38, 103177.
  8. Li, S.; Wang, J.; Song, Y.; Wang, S.; Wang, Y. A Lightweight Model for Malicious Code Classification Based on Structural Reparameterisation and Large Convolutional Kernels. Int. J. Comput. Intell. Syst. 2024, 17, 30.
  9. Lotfollahi, M.; Jafari Siavoshani, M.; Zade Shirazi, R.S.; Saberian, M. Deep packet: A novel approach for encrypted traffic classification using deep learning. Soft Comput. 2020, 24, 1999–2012.
  10. Liu, C.; He, L.; Xiong, G.; Cao, Z. FS-Net: A flow sequence network for encrypted traffic classification. In Proceedings of the IEEE INFOCOM 2019—IEEE Conference on Computer Communications, Paris, France, 29 April–2 May 2019.
  11. Rachit; Bhatt, S.; Ragiri, P.R. Security trends in Internet of Things: A survey. SN Appl. Sci. 2021, 3, 121.
  12. Snell, J.; Swersky, K.; Zemel, R. Prototypical networks for few-shot learning. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30.
  13. Finn, C.; Abbeel, P.; Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the International Conference on Machine Learning, PMLR, Sydney, Australia, 6–11 August 2017; pp. 1126–1135.
  14. Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 13–18 July 2020; pp. 1597–1607.
  15. Lin, X.; Xiong, G.; Gou, G.; Li, Z.; Shi, J.; Yu, J. ET-BERT: A contextualized datagram representation with pre-training transformers for encrypted traffic classification. In Proceedings of the ACM Web Conference 2022, Virtual, 25–29 April 2022; pp. 633–642.
  16. Jin, Z.; Liang, Z.; He, M.; Peng, Y.; Xue, H.; Wang, Y. A federated semi-supervised learning approach for network traffic classification. Int. J. Netw. Manag. 2023, 33, e2222.
  17. Joarder, Y.A.; Fung, C. Exploring QUIC security and privacy: A comprehensive survey on QUIC security and privacy vulnerabilities, threats, attacks and future research directions. IEEE Trans. Netw. Serv. Manag. 2024, 21, 6953–6973.
  18. Macas, M.; Wu, C.; Fuertes, W. Adversarial examples: A survey of attacks and defenses in deep learning-enabled cybersecurity systems. Expert Syst. Appl. 2024, 238, 122223.
  19. Hu, X.; Gao, Y.; Cheng, G.; Wu, H.; Li, R. An Adversarial Learning-based Tor Malware Traffic Detection Model. In Proceedings of the GLOBECOM 2022—2022 IEEE Global Communications Conference, Rio de Janeiro, Brazil, 4–8 December 2022; pp. 74–79.
  20. Peng, S.; Wang, L.; Shuai, W.; Song, H.; Zhou, J.; Yu, S.; Xuan, Q. Hierarchical Local-Global Feature Learning for Few-shot Malicious Traffic Detection. arXiv 2025, arXiv:2504.03742.
  21. Sun, J.; Zhang, B.; Li, H.; Yuan, L.; Chang, H. T-Sanitation: Contrastive masked auto-encoder-based few-shot learning for malicious traffic detection. J. Supercomput. 2025, 81, 727.
  22. Chen, J.; Mi, R.; Wang, H.; Wu, H.; Mo, J.; Guo, J.; Lai, Z.; Zhang, L.; Leung, V.C.M. A review of few-shot and zero-shot learning for node classification in social networks. IEEE Trans. Comput. Soc. Syst. 2024, 12, 1927–1941.
  23. Wang, H.Q.; Li, J.; Huang, D.H.; Tao, Y.D. Meta-IDS: Meta-Learning Automotive Intrusion Detection Systems with Adaptive and Learnable. Peer-Netw. Appl. 2025, 18, 152.
  24. Yang, J.; Li, H.; Shao, S.; Zou, F.; Wu, Y. FS-IDS: A framework for intrusion detection based on few-shot learning. Comput. Secur. 2022, 122, 102899.
  25. Xu, C.; Shen, J.; Du, X. A Method of Few-Shot Network Intrusion Detection Based on Meta-Learning Framework. IEEE Trans. Inf. Forensics Secur. 2020, 15, 3540–3552.
  26. Lu, C.; Wang, X.; Yang, A.; Liu, Y.; Dong, Z. A few-shot-based model-agnostic meta-learning for intrusion detection in security of internet of things. IEEE Internet Things J. 2023, 10, 21309–21321.
  27. Li, G.; Wang, M. A Meta-learning Approach for Few-shot Network Intrusion Detection Using Depthwise Separable Convolution. J. ICT Stand. 2024, 12, 443–470.
  28. Zhou, K.; Lin, X.; Wu, J.; Bashir, A.K.; Li, J.; Imran, M. Metric Learning-based Few-Shot Malicious Node Detection for IoT Backhaul/Fronthaul Networks. In Proceedings of the GLOBECOM 2022—2022 IEEE Global Communications Conference, Rio de Janeiro, Brazil, 4–8 December 2022; pp. 5777–5782.
  29. Bovenzi, G.; Di Monda, D.; Montieri, A.; Pescapè, A. Classifying attack traffic in IoT environments via few-shot learning. J. Inf. Secur. Appl. 2024, 83, 103762.
  30. Li, H.; Bai, Y.; Zhao, Y.; Xu, Y. MetaCL: A semi-supervised meta learning architecture via contrastive learning. Int. J. Mach. Learn. Cybern. 2024, 15, 227–236.
  31. Liu, H.; Feng, J.; Kong, L.; Tao, D.; Chen, Y.; Zhang, M. Graph Contrastive Learning Meets Graph Meta Learning: A Unified Method for Few-shot Node Tasks. In Proceedings of the ACM Web Conference 2024, Virtual, 13–17 May 2024.
  32. Vinyals, O.; Blundell, C.; Lillicrap, T.; Wierstra, D. Matching networks for one shot learning. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; Volume 29.
  33. Khosla, P.; Teterwak, P.; Wang, C.; Sarna, A.; Tian, Y.; Isola, P.; Maschinot, A.; Liu, C.; Krishnan, D. Supervised contrastive learning. In Proceedings of the Advances in Neural Information Processing Systems, Virtual, 6–12 December 2020; Volume 33, pp. 18661–18673.
  34. Gil, G.D.; Lashkari, A.H.; Mamun, M.; Ghorbani, A.A. Characterization of encrypted and VPN traffic using time-related features. In Proceedings of the 2nd International Conference on Information Systems Security and Privacy (ICISSP 2016), Rome, Italy, 19–21 February 2016; pp. 407–414.
  35. Lashkari, A.H.; Gil, G.D.; Mamun, M.S.; Ghorbani, A.A. Characterization of Tor traffic using time based features. In Proceedings of the International Conference on Information Systems Security and Privacy, Porto, Portugal, 19–21 February 2017; Volume 2, pp. 253–262.
  36. Chen, Y.; Liu, Z.; Xu, H.; Darrell, T.; Wang, X. Meta-Baseline: Exploring simple meta-learning for few-shot learning. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Virtual, 11–17 October 2021; pp. 9062–9071.
Figure 1. The meta-learning training and testing pipeline. The method follows a two-stage paradigm. Top (meta-training): A meta-learner is trained on numerous N-way K-shot classification tasks sampled from base classes ($\mathcal{C}_{\mathrm{train}}$) to acquire a generalizable learning strategy. The model’s parameters are optimized by minimizing prediction loss on the query sets. Bottom (meta-testing): The trained model is then evaluated on new, unseen classes ($\mathcal{C}_{\mathrm{test}}$). Given a small support set (e.g., 3-way 2-shot), its task is to classify samples in the query set without further model updates, assessing its few-shot generalization performance. The support set serves as the minimal labeled training data for each task, containing K examples per class. During meta-training, the model learns to extract discriminative features and compute class prototypes from these few samples. During meta-testing, the support set provides the only supervision for novel classes, forcing the model to generalize from extremely limited data. The query set contains unlabeled samples from the same N classes and serves a dual purpose: (1) In meta-training, it provides supervision signals to optimize the model’s few-shot learning capability by computing classification loss, enabling the meta-learner to learn how to best leverage limited support samples. (2) In meta-testing, it serves as the evaluation set to assess the model’s generalization performance on novel classes, where predictions are made solely based on the support set without any model parameter updates. This episodic training paradigm, where each episode simulates a complete few-shot task with distinct support and query sets, is the key mechanism that enables the model to acquire the meta-skill of fast adaptation.
Figure 2. The overall architecture of the CL-MetaFlow framework. Stage 1 (representation pre-training): On a large dataset of known classes ($D_{\mathrm{train}}$), a multi-view feature encoder is trained using supervised contrastive loss. This stage’s sole purpose is to shape a feature space where samples from the same class (e.g., YouTube) are pulled together, while samples from different classes (e.g., YouTube vs. Skype) are pushed apart. Stage 2 (meta-learning adaptation): The pre-trained encoder is frozen and used as a backbone for a Prototypical Network. The model learns to adapt to new, unseen classes by computing class prototypes from a few support samples and classifying query samples based on the nearest prototype in the embedding space.
Figure 3. Macro F1-Score comparison across different models on 5-way K-shot tasks.
Table 1. Summary of key related works in encrypted traffic classification.

| Work | Method | Focus | Limitation |
|---|---|---|---|
| Deep Packet [9] | CNN + SAE | Application identification | Requires large labeled dataset |
| FS-Net [10] | Flow sequence network | Application identification | Requires large labeled dataset |
| ET-BERT [15] | Transformer-based | General representation | Not optimized for few-shot |
| T-Sanitation [21] | MAE + FSL | Malicious traffic detection | Self-supervised, no label use |
| HLGF [20] | Hierarchical GNN | Malicious traffic detection | Complex, specialized architecture |
| CL-MetaFlow (Ours) | SupCon + ProtoNet | Few-shot application identification | – |
Table 2. Performance comparison (Macro F1-Score) on 5-way K-shot tasks. Best results are in bold.

| Model Framework | K = 1 Shot | K = 5 Shots | K = 10 Shots | K = 15 Shots |
|---|---|---|---|---|
| From Scratch (no pre-training) | | | | |
| ProtoNet | 0.350 | 0.384 | 0.387 | 0.391 |
| MAML | 0.346 | 0.439 | 0.458 | 0.463 |
| Pre-trained and Adapted Paradigms | | | | |
| AE + ProtoNet | 0.346 | 0.429 | 0.459 | 0.477 |
| SimCLR + ProtoNet | 0.421 | 0.545 | 0.563 | 0.578 |
| Meta-Baseline | 0.455 | 0.582 | 0.601 | 0.615 |
| CL(Fine-tune) | 0.104 | 0.160 | 0.198 | 0.224 |
| T-Sanitation (re-impl.) | 0.462 | 0.591 | 0.613 | 0.625 |
| CL-MetaFlow (Ours) | **0.479** | **0.620** | **0.632** | **0.650** |
Table 3. Ablation study on the contribution of different views (5-way 5-shot Macro F1-Score).

| Experiment Setting | Macro F1-Score | Drop (ΔF1) |
|---|---|---|
| Baseline (all views) | 0.6197 | – |
| Removing Flow view | 0.5246 | 0.0951 |
| Removing Temporal view | 0.5727 | 0.0470 |
| Removing Protocol view | 0.5848 | 0.0349 |
| Removing Statistical view | 0.6088 | 0.0109 |
Table 4. Comparison of feature fusion methods (5-way 5-shot Macro F1-Score).

| Fusion Method | Macro F1-Score |
|---|---|
| Concatenation (Ours) | 0.6197 |
| Attention-based fusion | 0.6152 |
Table 5. Comparison of adaptation methods (5-way 5-shot Macro F1-Score).

| Adapter Method | Macro F1-Score |
|---|---|
| Prototype distance (Ours) | 0.6197 |
| Non-linear MLP adapter | 0.5985 |