1. Introduction
Deep learning has brought revolutionary advances to computer vision, with widespread applications in object detection [1,2,3], image segmentation [4], and medical image analysis [5,6,7]. This remarkable success, however, relies heavily on large-scale, high-quality annotated datasets: benchmarks such as [8,9] have played a fundamental role in model training, yet constructing such datasets requires substantial human, material, and time resources. Moreover, in fields such as healthcare and finance, where privacy and regulatory constraints apply, data collection faces additional legal and ethical challenges. These practical factors significantly limit the broader application and development of deep learning.
To reduce the dependence on large amounts of labeled data, semi-supervised learning has emerged as an effective solution and has attracted increasing attention. Its core idea is to jointly train models on a small set of labeled samples and a large amount of unlabeled data, leveraging the underlying structural and distributional information in the unlabeled data to enhance performance. Previous studies have shown that, under a semi-supervised learning framework, models can achieve performance comparable to fully supervised methods using only a very limited number of labeled samples [10,11,12]. This makes semi-supervised learning particularly advantageous in scenarios where labeled data is scarce.
Current mainstream semi-supervised learning methods are primarily built on two theoretical foundations: consistency regularization and entropy minimization. In both frameworks, the effective utilization of unlabeled data is crucial. Methods based on consistency regularization [13,14,15,16,17] enhance model robustness by applying diverse data augmentations to unlabeled samples and constraining the model to produce consistent predictions across the augmented versions. Entropy minimization methods, on the other hand, typically rely on high-confidence thresholds to filter samples [18], encouraging the model to produce low-entropy (high-confidence) predictions. However, both families often treat all unlabeled samples equally, ignoring differences in sample reliability, which can lead to the accumulation of "confirmation bias" [19]: the model repeatedly reinforces its own incorrect predictions during iteration, ultimately compromising training effectiveness.
With the advancement of research, recent methods [20,21,22] have begun to combine entropy minimization with consistency regularization, achieving synergy through a pseudo-labeling mechanism [23,24,25]. Such frameworks typically apply both weak and strong augmentations to the same unlabeled image: the weakly augmented version is used to generate pseudo-labels, while the strongly augmented version is used to compute the consistency loss. Although these image augmentation-based approaches have achieved remarkable results [26], their augmentation operations remain largely limited to image-level preprocessing transformations (e.g., random cropping, color jittering, horizontal flipping) and fail to fully explore the potential of semantic perturbations at the feature level.
Furthermore, existing methods commonly suffer from a significant limitation: they only partially utilize unlabeled data. Most approaches incorporate only high-confidence unlabeled samples into training while discarding those with lower confidence to avoid potential negative interference, thereby undervaluing a large number of samples with higher uncertainty. Reference [27] introduced the idea of negative learning [28] within the entropy minimization framework in an attempt to include some low-confidence samples via pseudo-labeling, yet samples outside the threshold range are still excluded. Overall, traditional methods tend to discard low-confidence samples to prevent error accumulation, limiting the overall utilization efficiency of unlabeled data.
We argue that low-confidence samples contain unique supervisory signals. Unlike high-confidence samples, which explicitly indicate that a sample "belongs to a certain class," low-confidence samples can provide complementary information about which classes a sample "does not belong to." Such complementary labels [29,30,31] can serve as effective supervisory signals that steer the model away from incorrect predictions.
The core objective of contrastive learning is to learn common features among similar instances and to enhance distinctions between dissimilar instances by comparing examples [32,33,34,35,36]: similar samples are encouraged to converge in the feature space while dissimilar samples are pushed apart. Negative learning, in contrast, focuses on the valuable information contained in low-confidence samples; even when predictive confidence is low, such samples can still provide meaningful supervisory signals [27,37,38]. Based on this characteristic, negative learning can be effectively integrated with contrastive learning: by incorporating low-confidence samples into the construction of positive and negative sample pairs, data utilization becomes more comprehensive, enhancing the discriminative power and robustness of feature representations.
To address the aforementioned limitations, we develop a dual augmentation strategy. In addition to applying standard weak and strong augmentations at the image level, we introduce random perturbations in the feature space: applying two distinct Dropout masks to the feature representations of strongly augmented samples generates diversified feature views. This multi-level augmentation strategy effectively expands the perturbation space, enabling the model to learn more robust feature representations.
Furthermore, to tackle the under-utilization of low-confidence samples, we employ a complementary label supervision mechanism. By selecting high-quality complementary labels through reliable positive pair screening, these labels are used to guide the training of low-confidence samples, allowing the model to learn valuable knowledge from the perspective of “not belonging to certain classes.” This approach effectively avoids the information waste caused by simply discarding low-confidence samples while reducing the negative impact of noisy labels on model training.
Regarding threshold setting, a fixed threshold often fails to accurately reflect the model's training state, leading to low label utilization efficiency; dynamic threshold strategies [39,40] are therefore generally adopted. Inspired by [40], we employ a dynamic threshold rather than a fixed one, maximizing the utilization of unlabeled images. This also helps ensure the accuracy of the generated complementary labels, facilitating more effective model training.
The main contributions of this paper are summarized as follows:
We expand the perturbation space of images, thereby enhancing the model’s generalization capability.
Instead of simply discarding low-confidence samples, we fully exploit them through complementary labels.
2. Methods
Semi-supervised learning requires a small amount of labeled data and a large amount of unlabeled data for training. For a C-class classification problem, we define $\mathcal{X} = \{(x_b, y_b) : b \in (1, \ldots, B)\}$ as a batch of labeled samples, where $x_b$ and $y_b$ represent the training sample and its corresponding label, respectively. Let $\mathcal{U} = \{u_b : b \in (1, \ldots, \mu B)\}$ denote a batch of unlabeled samples, where μ indicates the relative size between $\mathcal{X}$ and $\mathcal{U}$.
2.1. Complementary Labels
Under the traditional paradigm of semi-supervised learning, unlabeled data is typically utilized by setting a confidence threshold—where predicted classes with high probability are taken as pseudo-labels—and training is performed with consistency regularization applied through strong and weak augmentations of the same image. However, this strategy has notable limitations: on the one hand, incorrect pseudo-labels may accumulate over iterative training rounds, leading to confirmation bias and thus degrading model performance; on the other hand, valuable information contained in low-confidence predictions is often overlooked and remains underutilized.
Unlike pseudo-labels that indicate which class a sample “belongs to,” complementary labels reflect which classes the sample “does not belong to.” This mechanism enables the model to effectively learn negative knowledge from the samples, thereby more accurately constraining the class boundaries in the semantic space. In the early stages of training, models tend to learn from easily classifiable samples. Therefore, it is a reasonable strategy to appropriately lower the confidence threshold during this phase to select more reliable pseudo-labels. The cross-entropy loss function encourages the model to make more confident predictions, thereby facilitating the steady extraction of useful information from unlabeled data.
However, predicting all unlabeled data at each training step would incur high computational costs. To alleviate this issue, a sliding average of the prediction confidence from the current time step can be used as an estimation baseline, updated dynamically via an exponential moving average. This approach reduces computational overhead while maintaining stability.
In terms of threshold setting, we do not employ a fixed threshold but instead adopt a dynamic threshold capable of adapting to different categories. Specifically, the global threshold is initially configured according to the number of classes and is subsequently updated dynamically via an exponential moving average:

$$\tau_t = \begin{cases} \frac{1}{C}, & t = 0 \\ \lambda \tau_{t-1} + (1-\lambda) \frac{1}{\mu B} \sum_{b=1}^{\mu B} \max(q_b), & \text{otherwise} \end{cases} \quad (1)$$

where C represents the number of classes, λ is a hyper-parameter controlling the EMA momentum, μB denotes the batch size of unlabeled data, and $q_b = p_m(y \mid \omega(u_b))$ represents the conditional probability predicted by the model on the weakly augmented view $\omega(u_b)$.
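For concreteness, a minimal PyTorch-style sketch of this global threshold update is given below; the function and tensor names (update_global_threshold, probs_weak, ema_lambda) are illustrative assumptions rather than identifiers from our implementation.

```python
import torch

def update_global_threshold(tau, probs_weak, ema_lambda=0.999, num_classes=10, step=0):
    """EMA update of the global confidence threshold (Equation (1)).

    tau:        current global threshold (scalar tensor)
    probs_weak: softmax outputs on weakly augmented unlabeled views, (mu*B, C)
    """
    if step == 0:
        return torch.tensor(1.0 / num_classes)       # initialize from class count
    batch_max = probs_weak.max(dim=1).values.mean()  # mean top-1 confidence
    return ema_lambda * tau + (1.0 - ema_lambda) * batch_max
```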
For class-specific threshold adjustment, which modulates the global threshold in a class-wise manner, we estimate the learning status of each category by computing the expectation of the model's predictions for that class:

$$\tilde{p}_t(c) = \begin{cases} \frac{1}{C}, & t = 0 \\ \lambda \tilde{p}_{t-1}(c) + (1-\lambda) \frac{1}{\mu B} \sum_{b=1}^{\mu B} q_b(c), & \text{otherwise} \end{cases} \quad (2)$$

The final threshold for each class is then obtained by normalizing the estimated learning status against the best-learned class and scaling the global threshold:

$$\tau_t(c) = \frac{\tilde{p}_t(c)}{\max_{c'} \tilde{p}_t(c')} \cdot \tau_t \quad (3)$$
Different thresholds are thus set according to the learning difficulty of each category in order to select reliable pseudo-labels. Meanwhile, low-confidence samples are still retained; since their corresponding pseudo-labels are considered unreliable, they are assigned a value of −1. For labeled samples, we directly use their ground-truth labels:

$$\hat{y}_b = \begin{cases} y_b, & x_b \in \mathcal{X} \\ \arg\max_c q_b(c), & \max_c q_b(c) \ge \tau_t(\arg\max_c q_b(c)) \\ -1, & \text{otherwise} \end{cases} \quad (4)$$
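Continuing the sketch above, the class-wise threshold modulation and pseudo-label assignment of Equations (2)–(4) could be realized as follows; the names remain illustrative.

```python
import torch

def assign_pseudo_labels(probs_weak, p_tilde, tau):
    """Class-adaptive thresholding (Equations (2)-(4)); -1 marks low confidence.

    probs_weak: (N, C) softmax outputs on weak views
    p_tilde:    (C,) EMA estimate of per-class learning status
    tau:        scalar global threshold
    """
    class_tau = p_tilde / p_tilde.max() * tau         # per-class threshold tau_t(c)
    conf, pred = probs_weak.max(dim=1)                # top-1 confidence and class
    return torch.where(conf >= class_tau[pred], pred,
                       torch.full_like(pred, -1))     # keep label or mark -1
```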
Based on the confidence level, unlabeled data can be categorized into two distinct groups: high-confidence and low-confidence data. The set of high-confidence samples comprises all labeled data points along with unlabeled samples whose prediction confidence exceeds the threshold:

$$\mathcal{H} = \{\, \phi_b \mid \hat{y}_b \neq -1 \,\} \quad (5)$$

Here, $\phi_b$ represents an input sample that includes both its weakly and strongly augmented views.
Within the overall training sample set Φ, samples not classified as high-confidence constitute the low-confidence subset, denoted as $\mathcal{L} = \Phi \setminus \mathcal{H}$. For each sample in this subset, complementary categories, i.e., those to which the sample does not belong, are identified based on its predicted probability distribution. Specifically, after ranking the predicted probabilities, conventional complementary label methods typically select only the lowest-probability category as the complementary label. In contrast, for high-confidence samples, all categories except the ground-truth one can be treated as complementary labels, allowing multiple improbable categories to be selected to form the complementary label set $\bar{Y}_b$:

$$\bar{Y}_b = \{\, c \mid \operatorname{rank}(q_b(c)) \le k,\; c \neq \hat{y}_b \,\} \quad (6)$$

Here, $\operatorname{rank}(q_b(c))$ orders the classes by ascending predicted probability, k denotes the number of improbable categories, and $\hat{y}_b$ indicates the position of the sample's label.
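A possible realization of Equation (6) as a boolean mask is sketched below; the bottom-k selection and the helper name complementary_labels are our illustrative choices.

```python
import torch

def complementary_labels(probs_weak, labels, k=7):
    """Mark the k least-probable classes of each sample (Equation (6)).

    probs_weak: (N, C) softmax outputs on weak views
    labels:     (N,) pseudo-labels, -1 for low-confidence samples
    Returns a boolean (N, C) mask; True marks a complementary class.
    """
    n, c = probs_weak.shape
    order = probs_weak.argsort(dim=1)                 # classes by ascending probability
    comp = torch.zeros(n, c, dtype=torch.bool)
    comp.scatter_(1, order[:, :k], True)              # k lowest-probability classes
    has_label = labels >= 0
    comp[has_label, labels[has_label]] = False        # never mark the assigned class
    return comp
```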
Based on complementary labels, we can construct reliable negative pairs. This process divides into two cases: (1) if both samples belong to the high-confidence set and the label of one sample is included in the complementary label set of the other, they form a reliable negative pair; (2) if one sample comes from the high-confidence set and the other from the low-confidence set, all complementary labels of the high-confidence sample are used to construct negative pairs with the low-confidence sample. Formally,

$$\mathcal{N} = \{ (\phi_i, \phi_j) \mid \phi_i, \phi_j \in \mathcal{H},\; \hat{y}_i \in \bar{Y}_j \text{ or } \hat{y}_j \in \bar{Y}_i \} \,\cup\, \{ (\phi_i, \phi_j) \mid \phi_i \in \mathcal{H},\; \phi_j \in \mathcal{L} \} \quad (7)$$
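A mask-based sketch of the two negative-pair rules in Equation (7); comp is the complementary-label mask from the previous sketch, and all names, as well as the treatment of every high/low pairing as negative, are assumptions for illustration.

```python
import torch

def negative_pair_mask(labels, comp):
    """Boolean (N, N) mask of reliable negative pairs (Equation (7)).

    labels: (N,) labels, -1 for low-confidence samples
    comp:   (N, C) complementary-label mask
    """
    high = labels >= 0
    lbl = labels.clamp(min=0)                         # safe index for -1 entries
    in_comp = comp[:, lbl]                            # [i, j]: label_j in comp set of i
    # Case 1: both high-confidence, one label in the other's complementary set.
    case1 = high.unsqueeze(0) & high.unsqueeze(1) & (in_comp | in_comp.T)
    # Case 2: exactly one of the pair is high-confidence.
    case2 = high.unsqueeze(1) ^ high.unsqueeze(0)
    return (case1 | case2) & ~torch.eye(len(labels), dtype=torch.bool)
```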
Inspired by supervised contrastive learning [15], reliable positive pairs are constructed under the assumption that different samples from the same class should exhibit similar representations. Accordingly, any two samples in the high-confidence set that belong to the same class are considered a positive pair. For two samples in the low-confidence set, if their weakly augmented views originate from the same original image, they are also regarded as belonging to the same class and thus form a positive pair:

$$\mathcal{P} = \{ (\phi_i, \phi_j) \mid \phi_i, \phi_j \in \mathcal{H},\; \hat{y}_i = \hat{y}_j \} \,\cup\, \{ (\phi_i, \phi_j) \mid \phi_i, \phi_j \in \mathcal{L},\; \omega(\phi_i) = \omega(\phi_j) \} \quad (8)$$
Here, $\omega(\cdot)$ is a defined function that returns the weakly augmented image, so $\omega(\phi_i) = \omega(\phi_j)$ indicates that the two views originate from the same original image.
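Correspondingly, a sketch of the positive-pair mask of Equation (8), assuming each view carries the index of the original image it was derived from (source_ids, an illustrative name):

```python
import torch

def positive_pair_mask(labels, source_ids):
    """Boolean (N, N) mask of reliable positive pairs (Equation (8)).

    labels:     (N,) labels, -1 for low-confidence samples
    source_ids: (N,) index of the original image each view was derived from
    """
    high = labels >= 0
    same_class = labels.unsqueeze(0) == labels.unsqueeze(1)
    case1 = same_class & high.unsqueeze(0) & high.unsqueeze(1)
    same_src = source_ids.unsqueeze(0) == source_ids.unsqueeze(1)
    case2 = same_src & ~high.unsqueeze(0) & ~high.unsqueeze(1)
    return (case1 | case2) & ~torch.eye(len(labels), dtype=torch.bool)
```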
2.2. Feature Perturbation
Image augmentation-based methods have emerged as a pivotal paradigm in semi-supervised learning. A representative work, Fixmatch, operates under a consistency regularization framework: it applies both weak and strong augmentations to unlabeled images, generating pseudo-labels from the weakly augmented views to supervise the predictions of the strongly augmented versions. Renowned for its simplicity and efficacy, Fixmatch has demonstrated exceptional performance in image classification tasks. Such methods typically rely on a set of predefined image-level transformations—such as random cropping, horizontal flipping, and color jittering—to augment the data distribution and thereby enhance the model’s generalization capability.
However, existing augmentation strategies suffer from several limitations: (1) the augmentations are predominantly confined to the input level, lacking direct perturbations in the feature space; (2) the diversity of perturbations is restricted to manually designed transformation sets, which may be insufficient to cover a broader spectrum of semantic-invariant variations; (3) only one strongly augmented view is generated for each unlabeled image, failing to fully exploit the feature-level diversity of the same sample under different perturbations, thereby limiting the exploration of deeper feature representations.
While retaining conventional image augmentations, we introduce an auxiliary feature-level perturbation mechanism. Specifically, we incorporate dropout layers into the network to construct a dual-branch feature perturbation pathway. The network weights are randomly initialized at the start of training, and a dropout rate of 0.5 is used during training, randomly deactivating a portion of the neurons in each forward pass. Pseudo-labels are obtained from the weakly augmented view of an unlabeled image; the corresponding strongly augmented view is then forwarded through the network, where it undergoes two independent feature dropout operations, yielding two distinct perturbed feature representations. The overall framework is illustrated in Figure 1.
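The dual-branch perturbation can be sketched in PyTorch as follows; for brevity the dropout point is placed at the encoder output, whereas our implementation applies it after the first block (see Section 3.1), and the module names are illustrative.

```python
import torch.nn as nn

class DualDropoutBranch(nn.Module):
    """Apply two independent dropout masks to the features of a strong view."""

    def __init__(self, encoder, classifier, p=0.5):
        super().__init__()
        self.encoder, self.classifier = encoder, classifier
        self.drop = nn.Dropout(p)      # a new mask is sampled on every call

    def forward(self, x_strong):
        feat = self.encoder(x_strong)
        f1 = self.drop(feat)           # first perturbed feature view
        f2 = self.drop(feat)           # second, independently masked view
        return self.classifier(f1), self.classifier(f2)
```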
2.3. The Proposed Method
We propose a semi-supervised learning method that integrates complementary labels and feature perturbations (CLFP). CLFP processes each unlabeled image $u_b$ using one weak and two strong augmentation operations, yielding the transformed views $\omega(u_b)$, $\Omega_1(u_b)$, and $\Omega_2(u_b)$; each strongly augmented image is generated by applying RandAugment [41] to the weakly augmented image. For labeled data, only weak augmentation is applied, producing $\omega(x_b)$. All augmented samples are fed into the network, where the two strongly augmented views $\Omega_1(u_b)$ and $\Omega_2(u_b)$ are further processed by a feature-assisted augmentation module that performs dual-branch feature enhancement, resulting in refined features $f_{b,1}$ and $f_{b,2}$. The model then employs a projection head to map the high-dimensional features into a lower-dimensional space, outputting the final feature representations. For pseudo-label generation, predictions from the weakly augmented view $\omega(u_b)$ are used to select samples whose confidence exceeds the threshold as pseudo-labels; high- and low-confidence regions are identified from the same threshold, enabling the generation of complementary labels. Negative pairs are constructed according to Equation (7), and positive pairs are formed using Equation (8) with the function $\omega(\cdot)$ that returns the weakly augmented image. The complete workflow of the CLFP framework is illustrated in Figure 2.
CLFP integrates three distinct loss terms: the supervised classification loss $\mathcal{L}_s$, the unsupervised classification loss $\mathcal{L}_u$, and the supervised contrastive loss $\mathcal{L}_c$.
The supervised classification loss $\mathcal{L}_s$ on dataset $\mathcal{X}$ is calculated as the cross-entropy between the ground-truth labels and the predicted outputs:

$$\mathcal{L}_s = \frac{1}{B} \sum_{b=1}^{B} H\big(y_b,\; p_m(y \mid \omega(x_b))\big) \quad (9)$$

where H(·,·) denotes the cross-entropy function.
The unsupervised classification loss $\mathcal{L}_u$ leverages pseudo-labels generated from weakly augmented images as supervision signals. It is computed as the cross-entropy between these pseudo-labels and the predictions from both the strongly augmented images and their feature-enhanced representations:

$$\mathcal{L}_u = \frac{1}{\mu B} \sum_{b=1}^{\mu B} \mathbb{1}(\hat{y}_b \neq -1) \sum_{j=1}^{2} \Big[ H\big(\hat{y}_b,\; p_m(y \mid \Omega_j(u_b))\big) + H\big(\hat{y}_b,\; p_m(y \mid f_{b,j})\big) \Big] \quad (10)$$

where $f_{b,j}$ denotes the perturbed feature representation derived from the j-th strongly augmented view.
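A sketch of this masked consistency loss; the four logits arguments correspond to the two strong views and their two perturbed feature branches, and the uniform averaging over the four terms is our assumption.

```python
import torch
import torch.nn.functional as F

def unsupervised_loss(pseudo, logits_s1, logits_s2, logits_f1, logits_f2):
    """Masked cross-entropy against pseudo-labels (Equation (10)).

    pseudo:   (N,) pseudo-labels, -1 for low-confidence samples
    logits_*: (N, C) logits for the two strong views and the two
              perturbed feature branches
    """
    mask = pseudo >= 0
    if not mask.any():
        return logits_s1.sum() * 0.0                  # zero loss, graph preserved
    target = pseudo[mask]
    branches = (logits_s1, logits_s2, logits_f1, logits_f2)
    return sum(F.cross_entropy(l[mask], target) for l in branches) / len(branches)
```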
The contrastive loss $\mathcal{L}_c$ is computed using the constructed positive and negative pairs. For each anchor feature $z_i$ with positive set $P(i)$ and negative set $N(i)$:

$$\mathcal{L}_c = -\sum_{i} \frac{1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp(z_i \cdot z_p / T)}{\sum_{a \in P(i) \cup N(i)} \exp(z_i \cdot z_a / T)} \quad (11)$$

where T is a temperature hyper-parameter.
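A masked SupCon-style sketch of Equation (11) using the pair masks built earlier; this is a minimal sketch under the stated assumptions, not the exact implementation.

```python
import torch

def contrastive_loss(z, pos_mask, neg_mask, temperature=0.1):
    """Supervised-contrastive loss over explicit pair masks (Equation (11)).

    z:        (N, D) L2-normalized projected features
    pos_mask: (N, N) boolean positive-pair mask
    neg_mask: (N, N) boolean negative-pair mask
    """
    sim = z @ z.T / temperature                        # pairwise similarities
    candidates = pos_mask | neg_mask                   # denominator P(i) U N(i)
    sim = sim.masked_fill(~candidates, float('-inf'))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_cnt = pos_mask.sum(dim=1).clamp(min=1)
    per_anchor = -(log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_cnt)
    return per_anchor[pos_mask.any(dim=1)].mean()      # anchors with >=1 positive
```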
3. Results
This section presents the experimental setup for CLFP in Section 3.1, followed by an evaluation of its effectiveness on the standard datasets CIFAR-10, STL-10, and SVHN, with comparative experiments against existing semi-supervised learning methods detailed in Section 3.2. Ablation studies in Section 3.3 demonstrate the effectiveness of each component of the proposed framework.
3.1. Experimental Settings
Datasets: (1) CIFAR-10 [42] is a balanced dataset widely used in computer vision. It consists of 10 classes with a total of 60,000 color images, of which 50,000 are allocated for training and 10,000 for testing; each image has a resolution of 32 × 32 pixels. (2) STL-10 [43] is more complex than CIFAR-10. While it also comprises 10 object classes, it contains 5000 labeled training images and 8000 labeled test images, along with a large set of 100,000 unlabeled images. The unlabeled subset includes samples not only from the 10 known classes but also from other unknown categories, adding to the learning challenge; all images are in RGB format at 96 × 96 pixels. (3) SVHN [44] (Street View House Numbers) is derived from real-world street-view images and consists of 10 classes corresponding to the digits 0 through 9. It includes a training set of over 70,000 images, a test set of more than 26,000 images, and an additional set of over 520,000 images that exhibits significant class imbalance; all images are 32 × 32 pixels in RGB color space.
Setting: All experiments employ WideResNet as the backbone network; the WideResNet-28-2 architecture [45] is used consistently across the CIFAR-10, STL-10, and SVHN datasets. Each training batch consists of B = 64 randomly sampled labeled examples and μ × 64 unlabeled examples, with the unlabeled ratio μ set to 7. The algorithm utilizes an Exponential Moving Average (EMA) strategy with a momentum coefficient of 0.999. Stochastic Gradient Descent (SGD) with a momentum of 0.9 is adopted as the optimizer. A cosine annealing learning rate schedule is applied to all datasets, formulated as $\eta_t = \eta_0 \cos\!\left(\frac{7\pi t}{16N}\right)$, where the initial learning rate $\eta_0$ is 0.03, t denotes the current training step, and the total number of training steps N is set to $300 \times 2^{10}$. In the complementary label generation process, the number of complementary categories k is set to 7 by default. Dropout is applied after the first block, with the dropout rate set to 0.5.
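For reference, the cosine schedule reconstructed above can be wrapped in a standard LambdaLR scheduler; the model below is only a placeholder for the WideResNet-28-2 backbone.

```python
import math
import torch

def cosine_schedule(optimizer, total_steps):
    """Cosine annealing: eta_t = eta_0 * cos(7*pi*t / (16*N))."""
    def lr_lambda(step):
        return math.cos(7.0 * math.pi * step / (16.0 * total_steps))
    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Usage: SGD with momentum 0.9, initial learning rate 0.03, N = 300 * 2^10 steps.
model = torch.nn.Linear(8, 10)                         # placeholder for WideResNet-28-2
opt = torch.optim.SGD(model.parameters(), lr=0.03, momentum=0.9)
sched = cosine_schedule(opt, total_steps=300 * 2**10)
```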
3.2. Comparative Experiments
In the comparative experiments, we selected a set of representative baseline methods in the field of semi-supervised learning for performance comparison, including Mean Teacher [14], MixMatch [46], ReMixMatch [47], Fixmatch [21], and Fixmatch+CCL [48]. All methods were evaluated consistently on the CIFAR-10, STL-10, and SVHN datasets.
We conducted comparative experiments on the CIFAR-10 dataset against several state-of-the-art semi-supervised learning methods, including Mean Teacher, MixMatch, ReMixMatch, Fixmatch, and Fixmatch+CCL, under varying labeled-data settings (40, 250, and 4000 labeled samples). As summarized in Table 1, the proposed CLFP consistently outperformed all baseline methods across all label-scarce scenarios. Notably, under the extremely low-label setting (40 labeled samples), CLFP achieved an accuracy of 94.91%, surpassing the previously best-performing method, Fixmatch (92.33%), by 2.58 percentage points. This significant improvement demonstrates CLFP's ability to leverage complementary labels and feature-level perturbations to extract meaningful information from unlabeled data, alleviating the constraints imposed by limited supervision. As the number of labeled samples increased to 250 and 4000, CLFP maintained its advantage, achieving accuracies of 94.85% and 95.50%, respectively. These results indicate that the proposed approach not only addresses label scarcity effectively but also enhances generalization under more abundant supervision, validating its robustness and scalability.
Unlike CIFAR-10, the STL-10 dataset contains unknown classes and exhibits inherent class imbalance; experimental results on this dataset further validate the strong generalization capability and robustness of our method. Following a labeled-data configuration similar to the CIFAR-10 experiments, we evaluated STL-10 under 4, 25, and 100 labeled samples per class (corresponding to 40, 250, and 1000 labeled samples in total) and conducted systematic comparisons with several leading semi-supervised learning approaches. As shown in Table 2, CLFP achieves the best performance across all labeled-data settings, with particularly notable gains in the medium-label regime (250 labeled samples). Under the extreme low-label setting (40 labeled samples), CLFP attains an accuracy of 62.86%, significantly outperforming Fixmatch (54.27%) and Fixmatch+CCL (55.70%). With 250 labeled samples, CLFP further improves to 91.27%, exceeding Fixmatch+CCL (88.03%) by an absolute margin of 3.24 percentage points, demonstrating its superior capability to extract useful information under limited supervision. Even under the relatively sufficient labeled-data setting (1000 labeled samples), CLFP maintains a leading accuracy of 92.58%, highlighting its consistent adaptability under varying degrees of supervision. These results collectively confirm that CLFP generalizes robustly across datasets with distinct characteristics, especially in challenging scenarios involving unknown classes and inherent data imbalance.
To comprehensively evaluate the generalization capability of the CLFP method, we conducted additional experiments on the relatively simple SVHN dataset, which exhibits more structured content and lower intra-class variation than CIFAR-10 and STL-10. For this evaluation, we adopted two labeled-data settings: 40 and 1000 labeled samples. Under the extremely low-label setting (40 labels, i.e., only 4 labeled samples per class), our method achieved a top accuracy of 97.13%, significantly outperforming Fixmatch (92.73%) and exceeding Fixmatch+CCL (96.23%) by 0.90 percentage points, demonstrating its ability to effectively leverage unlabeled data even under severely limited supervision. When the number of labeled samples increased to 1000, CLFP maintained a leading accuracy of 97.84%, exceeding the second-best method, Fixmatch (97.34%), by 0.50 percentage points. These results indicate that CLFP consistently enhances model performance across varying degrees of supervision, further validating its adaptability and scalability under different data scenarios. The experimental results are summarized in Table 3.
Based on systematic experiments on the CIFAR-10, STL-10, and SVHN datasets, the proposed CLFP method demonstrates consistently leading performance in semi-supervised learning tasks, with particularly pronounced advantages in extremely label-scarce scenarios. The core innovation lies in the complementary label and feature-level perturbation mechanisms, which effectively exploit the valuable information in low-confidence samples that traditional methods overlook. By constructing reliable positive and negative sample pairs combined with a dual-branch feature enhancement strategy, CLFP significantly improves the regularization effect of the model. Furthermore, by integrating complementary labels with feature-space perturbations, the model enhances feature diversity while maintaining semantic consistency, thereby achieving better generalization ability and more stable training convergence.
In Table 4, we compare the computational time of CLFP, Fixmatch, and Fixmatch+CCL. CLFP exhibits a notable increase in per-iteration runtime compared to Fixmatch, primarily due to the additional overhead introduced by its feature perturbation and complementary labeling mechanisms; however, this is accompanied by a significant improvement in Top-1 accuracy. It is worth noting that Fixmatch+CCL, which also incorporates complementary labels, achieves lower accuracy than Fixmatch, indicating that CLFP's increased computational cost is effectively translated into enhanced learning capability rather than being inherent to complementary labeling. Therefore, in scenarios where computational resources permit, CLFP achieves a practical balance between accuracy and efficiency.
It is noteworthy that CLFP performs well across datasets with different characteristics: it achieves significant improvements on complex datasets such as CIFAR-10 and STL-10, while also maintaining stable advantages on the relatively simple SVHN dataset. These results demonstrate the method's strong adaptability and robustness under different data characteristics and supervision levels. As shown in the training performance curves (Figure 3), CLFP achieves rapid convergence and the highest accuracy while maintaining low loss values, further verifying its effectiveness and stability. The significant results obtained by CLFP indicate that it provides an effective solution to label scarcity in semi-supervised learning through the innovative utilization of low-confidence samples and feature-level enhancement techniques.
3.3. Ablation Experiment
We conducted systematic ablation studies on the CIFAR-10 dataset (with 40 labeled samples) to validate the effectiveness of individual components in CLFP, while also evaluating key parameters.
Based on the CLFP baseline model, we conducted a systematic ablation study on its core components: removing the complementary label module, eliminating the dynamic thresholding mechanism, applying feature perturbations solely to the weakly augmented branch, and applying perturbations only to the two strongly augmented branches. The experimental results are summarized in Table 5.
Based on the ablation results presented in Table 5, the contribution of each component to the CLFP model's performance can be clearly determined. Under the extreme low-resource setting of CIFAR-10 (only 4 labeled samples per class), the complete CLFP model achieved an accuracy of 94.91%. Removing the complementary label module reduced performance by 1.15 percentage points (to 93.76%), indicating that this module is the core component for improving model performance and validating that low-confidence samples indeed contain useful information that aids learning. Eliminating the dynamic threshold mechanism decreased performance by 0.63 percentage points (to 94.28%), demonstrating the important role of adaptive threshold adjustment in maintaining pseudo-label quality. Regarding the feature perturbation strategy, applying feature perturbation only to the weak augmentation branch yielded an accuracy of 94.17%, while applying it only to the two strong augmentation branches yielded 93.05%; neither reached the performance of the complete model, with feature perturbation on the strong augmentation branches contributing more significantly. This indicates that the dual-branch feature perturbation strategy jointly enhances the diversity and robustness of feature representations through a complementary mechanism.
In summary, CLFP effectively exploits the valuable information in low-confidence samples by introducing complementary labels and enhancing model generalization through dual-branch feature perturbations, thereby achieving significant performance improvement in semi-supervised learning tasks.
To investigate the impact of the number of complementary labels k on model performance, we conducted an ablation study on its value. Keeping other parameters unchanged, we progressively increased k from 1 to 10 for comparative evaluation. The results, shown in Figure 4, demonstrate that the model achieves optimal performance at k = 7 (Top-1 accuracy: 94.91%), while deviations from this value lead to performance degradation.
4. Discussion
The CLFP method proposed in this paper introduces two key innovations within a semi-supervised learning framework: a complementary labeling mechanism and a feature perturbation strategy. In its implementation, the method first establishes a dynamic threshold to distinguish between high-confidence and low-confidence predictions. Based on high-confidence samples and their corresponding labels, the seven classes with the lowest prediction confidence are selected as complementary labels, thereby transforming low-confidence samples into effective supervisory signals that are incorporated into the training process. Furthermore, the method employs a feature perturbation strategy: pseudo-labels are generated from weakly augmented images, while feature-space perturbations are applied to two independently generated strongly augmented images. Consistency loss is then computed between the original and perturbed strong augmentations under the guidance of the pseudo-labels to enhance model robustness. Additionally, positive and negative sample pairs are constructed using high-confidence samples and their complementary labels, and a supervised contrastive loss is calculated to improve the discriminability of feature representations.
In the extreme low-resource scenario on the CIFAR-10 dataset with only 4 labels per class, CLFP demonstrates superior performance compared to Fixmatch and Fixmatch+CCL. The key lies in the synergistic effect between random perturbations in the feature space and the negative supervisory signals provided by complementary labels. Specifically, feature perturbation with a dropout rate of 0.5 forces the network to learn more generalizable and structurally noise-robust feature representations by randomly “dropping” half of the neurons, thereby enhancing the consistency of the model’s predictions on unseen data and indirectly improving pseudo-label quality.
Meanwhile, the dynamic threshold mechanism strictly filters high-confidence pseudo-labels in the early stages of training to avoid error accumulation caused by initial perturbations and noisy labels, while gradually releasing more unlabeled samples for learning as the model’s capability improves. This “perturbation-screening” closed-loop design effectively combines the two components, mitigating the common issue of confirmation bias in semi-supervised learning. As a result, the model can safely and efficiently leverage large-scale unlabeled data starting from extremely limited supervisory signals, achieving stable training and accuracy improvement.
The method demonstrates strong adaptability across diverse data characteristics (e.g., the class imbalance in the STL-10 dataset) and varying levels of supervision. Notably, on the relatively simple SVHN dataset, CLFP maintains its edge with limited annotations and still leads under the richer 1000-label setting. This indicates the method consistently delivers stable performance across tasks of varying difficulty.
The ablation experiments showed that the complementary label module yielded the largest single-component performance improvement (1.15%), underscoring the importance of fully leveraging the value of low-confidence unlabeled data. The dual-branch feature perturbation strategy achieved diversified optimization of the feature space by introducing perturbations at different augmentation levels. Notably, the feature perturbation contribution associated with the strong augmentation branches (1.86%) was significantly higher than that of the weak augmentation branch (0.74%), indicating that introducing feature-level perturbations on samples undergoing complex transformations yields more pronounced optimization effects.
Analysis of computational time reveals that CLFP requires approximately 0.5 s per iteration, a slight increase over Fixmatch. This additional cost stems primarily from the extra forward propagation and feature perturbation computations for the two strong augmentation paths. We consider this trade-off reasonable and efficient: first, compared to Fixmatch+CCL, which introduces complementary labels but sacrifices performance, CLFP's time overhead delivers clear accuracy gains, demonstrating effective use of computational resources to enhance model capability; second, despite the longer iteration time, CLFP typically converges within fewer training cycles, keeping overall training duration competitive.
Despite the positive outcomes achieved by CLFP, this study still has certain limitations. The current approach employs a relatively simple form of feature perturbation with fixed random dropout ratios. The hyper-parameters of the method (e.g., perturbation strength, k-value) still require manual configuration. CLFP has only been validated in image classification tasks and has not been extended to other tasks.
In conclusion, the proposed CLFP method addresses the challenge of limited labeled data in semi-supervised learning by innovatively combining complementary labeling and feature perturbation strategies. Extensive experiments demonstrate that CLFP outperforms existing mainstream methods across multiple benchmark datasets under various labeled-data configurations. The complementary labeling mechanism effectively leverages information from low-confidence samples, while the feature perturbation provides significant regularization benefits, offering new insights for semi-supervised learning research. The exceptional performance of CLFP confirms that fully exploiting the potential of unlabeled data with multi-level enhancement strategies can substantially improve the performance of semi-supervised learning.
5. Conclusions
This paper proposes an innovative semi-supervised learning approach named CLFP, which significantly enhances model performance in label-scarce scenarios through the introduction of a complementary labeling mechanism and a feature-level perturbation strategy. Our key contributions are twofold: first, we develop a learning framework based on complementary labels that effectively converts low-confidence samples—typically discarded by conventional methods—into valuable training signals, thereby fully unleashing the potential of unlabeled data; second, we design a dual-branch feature perturbation mechanism that extends beyond conventional image-level augmentations by incorporating perturbations in the feature space, with a dedicated dual-perturbation strategy applied to strongly augmented samples notably enhancing the model’s generalization capability.
While the current method has demonstrated superior performance, we acknowledge that there remains potential for further refinement. In future work, we will continue to explore more complex structured or adversarial perturbation strategies to generate more challenging and targeted feature representations, as well as develop adaptive methods for adjusting perturbation intensity and k-values. These approaches will be extended to more complex visual tasks such as detection and segmentation, and their scalability and efficiency will be further evaluated on large-scale datasets like ImageNet.