Meta-Data-Guided Robust Deep Neural Network Classification with Noisy Label
Abstract
1. Introduction
- We propose CoNet-MS, a cross dual-branch classifier framework enhanced with meta-learning, in which the two branches guide each other and adaptively relabel samples. CoNet-MS improves classifier accuracy by cross-guiding the dual branches and by reweighting labels with the meta-guided module in the upper branch. Two classifier networks with divergent initializations learn complementary features and therefore filter out different errors arising from noisy labels. Following the small-loss principle [13], whereby DNNs fit clean samples first, each branch treats its low-loss samples as high-confidence pseudo-clean ones. Cross-filtering these high-confidence samples alleviates error accumulation and mitigates overfitting to noisy labels; a minimal sketch of this cross-filtering step is given after this list. The meta-guided module in the upper branch further diversifies the two branches, enhancing the robustness of the framework.
- Furthermore, the meta-net module in the upper branch uses a small amount of clean data to dynamically adjust, within the loss function, the weights assigned to the observed label and to the pseudo-label produced by the upper classifier. In brief, the clean data implicitly guide the correction of the samples' pseudo-labels instead of participating directly in training.
- We experimentally demonstrate that the proposed meta-learning-enhanced cross dual-branch framework, CoNet-MS, improves the robustness of the model to label noise. Specifically, CoNet-MS achieves significant performance improvements over L2B. In addition, we provide a comprehensive comparison with state-of-the-art (SOTA) techniques [9,10,11]; the results show that our method performs better, highlighting the advantages of CoNet-MS for meta-learning and learning with noisy labels (LNL).
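To make the cross-filtering idea concrete, the following is a minimal PyTorch-style sketch of small-loss selection and sample exchange between two branches. It is an illustration under our own assumptions rather than the paper's exact implementation; the function and variable names (select_small_loss, cross_filter_step, remember_rate) are hypothetical.

```python
# Minimal sketch of small-loss cross-filtering between two branches.
# Illustrative only: names and details are assumptions, not the exact
# CoNet-MS implementation.
import torch
import torch.nn.functional as F


def select_small_loss(logits, labels, remember_rate):
    """Indices of the remember_rate fraction of samples with the smallest loss."""
    losses = F.cross_entropy(logits, labels, reduction="none")
    num_keep = max(1, int(remember_rate * labels.size(0)))
    return torch.argsort(losses)[:num_keep]


def cross_filter_step(net_a, net_b, opt_a, opt_b, x, y, remember_rate):
    """Each branch picks its high-confidence (small-loss) samples for the peer."""
    with torch.no_grad():
        idx_a = select_small_loss(net_a(x), y, remember_rate)  # trusted by branch A
        idx_b = select_small_loss(net_b(x), y, remember_rate)  # trusted by branch B

    # Branch A updates on the samples branch B trusts, and vice versa,
    # so that each branch's selection errors are not fed back to itself.
    loss_a = F.cross_entropy(net_a(x[idx_b]), y[idx_b])
    loss_b = F.cross_entropy(net_b(x[idx_a]), y[idx_a])

    opt_a.zero_grad(); loss_a.backward(); opt_a.step()
    opt_b.zero_grad(); loss_b.backward(); opt_b.step()
```

In the full framework, the upper branch's loss is additionally reweighted by the meta-guided module (Section 3.4), which helps keep the two branches from converging to identical behavior.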
2. Related Works
3. The Proposed CoNet-MS Framework and Its Main Modules
3.1. Problem Statement
3.2. The Overall Framework of CoNet-MS
3.3. Cross Dual-Branch Classifier Network Architecture
3.4. Meta-Guided Module: Reweighting Losses Between the Pseudo-Label and the Observed Label in Upper Branch
3.5. Training Pseudo-Codes
Algorithm 1 CoNet-MS
1: Input: …
2: Output: …
3: for … to … do
4:     SampleMiniBatch(…)
5:     SampleMiniBatch(…)
6:     for each sample in … do
7:         …
8:         Initialize learnable weights …
9:         Compute meta-net parameters … by Equation (7)
10:        Update hyperparameters … by Equations (9)–(11)
11:        Obtain … by Equation (4)
12:        Obtain … by Equation (5)
13:    end for
14:    Update … by Equation (12)
15:    Update … by Equation (13)
16: end for
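Because the mathematical symbols of Algorithm 1 are not reproduced above, the following first-order PyTorch-style skeleton sketches how one training epoch could be organized. It is our simplified reading only: the bi-level meta-update of Equation (7) and the schedules of Equations (9)–(13) are replaced by plain alternating updates, and all names (MetaNet, upper_branch_loss, train_epoch) are illustrative assumptions.

```python
# Simplified, first-order sketch of one CoNet-MS training epoch.
# The actual method uses a bi-level meta-update (Equation (7)) and dynamic
# schedules (Equations (9)-(13)); both are approximated here, so this is an
# illustration rather than the paper's algorithm.
import itertools

import torch
import torch.nn as nn
import torch.nn.functional as F


class MetaNet(nn.Module):
    """Tiny MLP mapping a per-sample loss value to a weight in (0, 1)."""

    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, per_sample_loss):
        return self.net(per_sample_loss.unsqueeze(1)).squeeze(1)


def upper_branch_loss(upper_net, meta_net, x, y_obs):
    """Weighted mix of observed-label and pseudo-label losses (lines 8-12)."""
    logits = upper_net(x)
    with torch.no_grad():
        pseudo = logits.argmax(dim=1)              # hard pseudo-labels (a simplification)
    loss_obs = F.cross_entropy(logits, y_obs, reduction="none")
    loss_pse = F.cross_entropy(logits, pseudo, reduction="none")
    w = meta_net(loss_obs.detach())                # per-sample weights from the meta-net
    return (w * loss_obs + (1.0 - w) * loss_pse).mean()


def train_epoch(upper_net, lower_net, meta_net,
                opt_upper, opt_lower, opt_meta,
                noisy_loader, clean_loader):
    clean_iter = itertools.cycle(clean_loader)
    for x, y_obs in noisy_loader:                  # line 4: noisy mini-batch
        x_meta, y_meta = next(clean_iter)          # line 5: clean (meta) mini-batch

        # Stand-in for Equation (7): tune the meta-net so that the reweighted
        # loss also fits the clean meta-batch (first-order approximation).
        opt_meta.zero_grad()
        upper_branch_loss(upper_net, meta_net, x_meta, y_meta).backward()
        opt_meta.step()

        # Line 14: update the upper-branch classifier with the reweighted loss.
        opt_upper.zero_grad()
        upper_branch_loss(upper_net, meta_net, x, y_obs).backward()
        opt_upper.step()

        # Line 15: update the lower-branch classifier; in the full framework it
        # trains on high-confidence samples exchanged with the upper branch.
        opt_lower.zero_grad()
        F.cross_entropy(lower_net(x), y_obs).backward()
        opt_lower.step()
```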
4. Experiments
4.1. Experimental Datasets
4.2. Baselines
- Conventional cross entropy: Training a classifier network directly on the noisy data; this serves as the most basic, lower-bound baseline and shows the effect of noisy labels on classification performance.
- Co-teaching [9]: Simultaneously training two classifier networks. Based on the small-loss principle, each network guides the other network to filter out the noisy data during training to reduce the influence of noisy samples.
- MetaLabelNet [10]: Alternately training a meta-network using meta-data and letting the meta-network generate soft pseudo-labels for the training samples to guide the updating of a classifier.
- L2B [11]: Using meta-data to reweight the loss function, adjusting each sample's influence and implicitly correcting labels.
4.3. Evaluation Metrics
- TP (true positive): the number of samples that are actually positive and predicted to be positive;
- FP (false positive): the number of samples that are actually negative but predicted to be positive;
- FN (false negative): the number of samples that are actually positive but predicted to be negative;
- TN (true negative): the number of samples that are actually negative and predicted to be negative. A minimal sketch of how accuracy and Macro F1 are derived from these counts is given after this list.
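Accuracy is the fraction of correctly predicted samples, and Macro F1 averages the per-class F1 scores computed one-vs-rest from the counts above. The snippet below is a plain NumPy implementation of these standard formulas (not code from the paper); the function name accuracy_and_macro_f1 is ours.

```python
# Standard accuracy and macro F1 from per-class TP/FP/FN counts (one-vs-rest).
import numpy as np


def accuracy_and_macro_f1(y_true, y_pred, num_classes):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    accuracy = float(np.mean(y_true == y_pred))

    f1_scores = []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
        recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0
        f1_scores.append(f1)

    macro_f1 = float(np.mean(f1_scores))
    return accuracy, macro_f1
```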
4.4. Experimental Settings
4.5. Performance Evaluation
Comparison with the State-of-the-Art Methods
- Firstly, CoNet-MS shows the best classification performance among all schemes under all tested noise rates, which demonstrates the superiority and robustness of our proposed scheme.
- L2B achieves the second-best test accuracy under all noise scenarios by adjusting the loss weights through meta-learning to counteract noisy labels. However, this alone is not sufficient to suppress the influence of noisy samples in the loss function, and a gap remains between the pseudo-label assigned to a sample and its ground-truth label. Our scheme makes the following improvement: it introduces a dual-network sample filtering mechanism that discards low-confidence samples and retains the more valuable ones at each update. Empirically, CoNet-MS achieves better classification accuracy than L2B at all noise rates, with an improvement of about 3% in test accuracy under the highest noise rate (80%), demonstrating the robustness of our scheme to high-ratio noise.
- Although MetaLabelNet uses part of the clean data, i.e., meta-data, to relabel the training samples, its meta-network is trained only on the clean dataset; it may therefore overfit these data and produce biased soft labels for the remaining training samples, which can degrade classifier performance. To address this problem, CoNet-MS uses the meta-net to reweight the loss between the pseudo-label and the observed label of each training sample while filtering samples through the cross dual-branch structure. As a result, the classification accuracy of the final model improves by nearly 20% over MetaLabelNet (from 49.98% to 69.11%) in the case of an 80% noise rate.
- The co-teaching method uses the small-loss principle to select high-confidence samples and achieves good noise immunity by letting two networks with different parameters filter potentially noisy samples for each other. However, relying solely on small loss to select training samples may leave residual noisy labels; worse, the abundant noisily labeled samples that are filtered out cannot be used at all, which can hurt the model. To address this issue, the proposed CoNet-MS implicitly discards noisy samples by dynamically weighting the loss function, reducing the impact of low-confidence samples when the network is updated. In addition, CoNet-MS performs label estimation, which further improves model accuracy by correcting sample labels.
- Unlike L2B, which relies solely on meta-weights, CoNet-MS further reduces confirmation bias by cross-validating pseudo-labels between two networks, leading to a 0.47% accuracy gain over L2B.
- Compared to co-teaching (73.85%), our method shows stronger resilience to high noise ratios, indicating that meta-learning enhances the precision of the sample selection of co-teaching.
- First of all, on the imbalanced dataset, CoNet-MS outperforms all baseline methods under both noise types. Specifically, under uniform noise, CoNet-MS achieves the highest accuracy of 74.23%, a 1.82% improvement over L2B. Meanwhile, with a pair-flipping noise rate of 40%, our scheme still achieves the best performance, i.e., 69.75% accuracy, but improves by only 0.51% relative to L2B, reflecting the particular challenge posed by noise between similar classes in an imbalanced dataset, which should be investigated thoroughly in future work.
- Secondly, compared with the results in Table 3 and Table 4, i.e., the test accuracy under uniform and pair-flipping noise on the balanced CIFAR-10 dataset, the results in Table 5 and Table 6 are generally lower. It is well known that an imbalanced training dataset degrades classification accuracy, even when all training labels are clean.
- As shown in Table 5, under uniform noise our method achieves a Macro F1 of 0.7234, surpassing co-teaching and L2B. This indicates that CoNet-MS not only improves overall accuracy but also better balances precision and recall across imbalanced classes. For pair-flipping noise (Table 6), CoNet-MS attains a Macro F1 of 0.6734, marginally exceeding L2B. The smaller Macro F1 gap (0.0076) compared with that of the uniform-noise scenario (0.0136) aligns with the observed accuracy trend. Specifically, noise-induced confusion between semantically close classes likely degrades both precision (more false positives) and recall (more false negatives), thereby limiting the Macro F1 gains.
4.6. Ablation Study
4.6.1. Comparison Between the Asymmetric CoNet-MS and Its Symmetric Variant CoNet-MBS
- The test accuracy of CoNet-MS exceeds that of CoNet-MBS at all noise rates. A key idea behind using a dual-branch classifier network for sample filtering is to exploit the difference in learning capability between the two networks for mutual supervision, thereby reducing label noise. If both networks adjust their weights with the same strategy simultaneously, they may interfere with each other's sample selection and produce redundant or erroneous sample judgments, reducing the number of valid samples selected and in turn degrading model performance. In contrast, if only one network performs the meta-learning weight adjustment, that network can better exploit its own features without being disturbed by the other network's adjustment.
- At low noise rates, the performance of the two models is very close, while at the highest noise rate CoNet-MS improves markedly over CoNet-MBS. This is because two networks adjusting their weights simultaneously may fit the training data in an overly complex way; at high noise rates this makes overfitting more likely and thus reduces the model's ability to generalize to unseen data.
4.6.2. Comparison of the Outputs of the Upper- and Lower-Branch Classifiers
- The test accuracy of CoNet-MS is the highest at all noise rates. This illustrates that the upper- and lower-branch classifiers learn different features and patterns from the data; when their predictions are averaged, this complementary information is combined to provide more comprehensive and accurate predictions (a minimal sketch of this fusion is given after this list).
- In addition, Figure 5 shows that the lower classifier attains higher test accuracy than the upper classifier at all noise rates. This may be because the upper branch embeds the meta-guided module, which makes its classifier's high-confidence samples more reliable. Through the exchange of high-confidence samples between the dual networks, the lower branch receives reliable samples from the upper branch, which improves the accuracy of the lower classifier.
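As a small illustration of the fusion discussed above, the snippet below averages the softmax outputs of the two branches at inference time. Whether probabilities or logits are averaged is not stated in this excerpt, so treat this as one plausible reading; the function name predict is ours.

```python
# Averaging the upper- and lower-branch predictions at test time.
# Averaging softmax probabilities is an assumption; the text only states
# that the two classifiers' predictions are averaged.
import torch
import torch.nn.functional as F


@torch.no_grad()
def predict(upper_net, lower_net, x):
    probs = (F.softmax(upper_net(x), dim=1) + F.softmax(lower_net(x), dim=1)) / 2
    return probs.argmax(dim=1)
```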
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Song, H.; Kim, M.; Park, D.; Shin, Y.; Lee, J.G. Learning from noisy labels with deep neural networks: A survey. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 8135–8153.
- Algan, G.; Ulusoy, I. Image classification with deep learning in the presence of noisy labels: A survey. Knowl.-Based Syst. 2021, 215, 106771.
- Liu, S.; Niles-Weed, J.; Razavian, N.; Fernandez-Granda, C. Early-learning regularization prevents memorization of noisy labels. Adv. Neural Inf. Process. Syst. 2020, 33, 20331–20342.
- Yao, Y.; Sun, Z.; Zhang, C.; Shen, F.; Wu, Q.; Zhang, J.; Tang, Z. Jo-SRC: A contrastive approach for combating noisy labels. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 5192–5201.
- Dgani, Y.; Greenspan, H.; Goldberger, J. Training a neural network based on unreliable human annotation of medical images. In Proceedings of the 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), Washington, DC, USA, 4–7 April 2018; pp. 39–42.
- Cheng, J.; Liu, T.; Ramamohanarao, K.; Tao, D. Learning with bounded instance and label-dependent label noise. In Proceedings of the International Conference on Machine Learning, Vienna, Austria, 13–18 July 2020; pp. 1789–1799.
- Jiang, L.; Zhou, Z.; Leung, T.; Li, L.-J.; Fei-Fei, L. MentorNet: Learning data-driven curriculum for very deep neural networks on corrupted labels. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 2304–2313.
- Arpit, D.; Jastrzębski, S.; Ballas, N.; Krueger, D.; Bengio, E.; Kanwal, M.S.; Maharaj, T.; Fischer, A.; Courville, A.; Bengio, Y.; et al. A closer look at memorization in deep networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 233–242.
- Han, B.; Yao, Q.; Yu, X.; Niu, G.; Xu, M.; Hu, W.; Tsang, I.; Sugiyama, M. Co-teaching: Robust training of deep neural networks with extremely noisy labels. Adv. Neural Inf. Process. Syst. 2018, 31, 8536–8546.
- Algan, G.; Ulusoy, I. MetaLabelNet: Learning to generate soft-labels from noisy-labels. IEEE Trans. Image Process. 2022, 31, 4352–4362.
- Zhou, Y.; Li, X.; Liu, F.; Wei, Q.; Chen, X.; Yu, L.; Xie, C.; Lungren, M.P.; Xing, L. L2B: Learning to bootstrap robust models for combating label noise. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 23523–23533.
- Algan, G.; Ulusoy, I. Meta soft label generation for noisy labels. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 7142–7148.
- Li, M.; Soltanolkotabi, M.; Oymak, S. Gradient descent with early stopping is provably robust to label noise for overparameterized neural networks. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Online Conference, 26–28 August 2020; pp. 4313–4324.
- Xia, X.; Liu, T.; Wang, N.; Han, B.; Gong, C.; Niu, G.; Sugiyama, M. Are anchor points really indispensable in label-noise learning? Adv. Neural Inf. Process. Syst. 2019, 32, 6835–6846.
- Ghosh, A.; Kumar, H.; Sastry, P.S. Robust loss functions under label noise for deep neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31.
- Wang, X.; Hua, Y.; Kodirov, E.; Clifton, D.A.; Robertson, N.M. IMAE for noise-robust learning: Mean absolute error does not treat examples equally and gradient magnitude’s variance matters. arXiv 2019, arXiv:1903.12141.
- Zhang, Z.; Sabuncu, M. Generalized cross entropy loss for training deep neural networks with noisy labels. Adv. Neural Inf. Process. Syst. 2018, 31, 8792–8802.
- Wang, Y.; Ma, X.; Chen, Z.; Luo, Y.; Yi, J.; Bailey, J. Symmetric cross entropy for robust learning with noisy labels. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 322–330.
- Zhang, C.; Bengio, S.; Hardt, M.; Recht, B.; Vinyals, O. Understanding deep learning (still) requires rethinking generalization. Commun. ACM 2021, 64, 107–115.
- Zhang, B.; Li, Y.; Tu, Y.; Peng, J.; Wang, Y.; Wu, C.; Xiao, Y.; Zhao, C. Learning from noisy labels with coarse-to-fine sample credibility modeling. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2022; pp. 21–38.
- Li, J.; Socher, R.; Hoi, S.C. DivideMix: Learning with noisy labels as semi-supervised learning. arXiv 2020, arXiv:2002.07394.
- Ghosh, A.; Lan, A. Contrastive learning improves model robustness under label noise. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 2703–2708.
- Karim, N.; Rizve, M.N.; Rahnavard, N.; Mian, A.; Shah, M. Unicon: Combating label noise through uniform selection and contrastive learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 9676–9686.
- Li, J.; Xiong, C.; Hoi, S.C. Learning from noisy data with robust representation learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 9485–9494.
- Tanaka, D.; Ikami, D.; Yamasaki, T.; Aizawa, K. Joint optimization framework for learning with noisy labels. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5552–5560.
- Shen, Y.; Sanghavi, S. Learning with bad training data via iterative trimmed loss minimization. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 5739–5748.
- Yu, X.; Han, B.; Yao, J.; Niu, G.; Tsang, I.; Sugiyama, M. How does disagreement help generalization against label corruption? In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 7164–7173.
- Bengio, Y.; Louradour, J.; Collobert, R.; Weston, J. Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning, Montreal, QC, Canada, 14 June 2009; pp. 41–48.
- Han, B.; Tsang, I.W.; Chen, L.; Celina, P.Y.; Fung, S.-F. Progressive stochastic learning for noisy labels. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 5136–5148.
- Li, J.; Wong, Y.; Zhao, Q.; Kankanhalli, M.S. Learning to learn from noisy labeled data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 5051–5059.
- Zhang, Z.; Zhang, H.; Arik, S.O.; Lee, H.; Pfister, T. Distilling effective supervision from severe label noise. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9294–9303.
- Zheng, G.; Awadallah, A.H.; Dumais, S. Meta label correction for noisy label learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual Conference, 2–9 February 2021; Volume 35, pp. 11053–11061.
- Finn, C.; Abbeel, P.; Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 1126–1135.
- Reed, S.; Lee, H.; Anguelov, D.; Szegedy, C.; Erhan, D.; Rabinovich, A. Training deep neural networks on noisy labels with bootstrapping. arXiv 2014, arXiv:1412.6596.
Notation | Description
---|---
… | The number of samples in the training set
… | The number of samples in the clean validation set
… | The batch size of training set
… | The batch size of clean validation set
… | Cross-entropy loss function
… | Pseudo-label of the sample
… | The number of classes in dataset
… | …
… | The high-confidence proportion of samples in the …-th epoch
… | The weight loss of the pseudo-label for …
… | The high-confidence dataset selected by …
… | The high-confidence dataset selected by …
… | Parameters of the meta-net
… | Parameters of the upper-branch classifier in the …-th epoch
Component | Computational Complexity
---|---
Cross dual-branch classifier network architecture | …
Meta-guided module | …
Noise Ratio (%) | 20 | 40 | 50 | 60 | 80 |
---|---|---|---|---|---|
Conventional Cross Entropy | 82.86 | 77.84 | 72.06 | 67.62 | 40.02 |
Co-teaching [9] | 85.88 | 81.02 | 75.84 | 70.28 | 43.88 |
MetaLabelNet [10] | 83.42 | 77.74 | 75.32 | 71.20 | 49.98
L2B [11] | 89.44 | 87.86 | 85.99 | 77.99 | 66.09 |
Ours (CoNet-MS) | 91.30 | 89.61 | 88.80 | 80.78 | 69.11
Noise Ratio (%) | 40 |
---|---|
Conventional Cross Entropy | 73.24 |
Co-teaching [9] | 73.85 |
MetaLabelNet [10] | 84.66 |
L2B [11] | 87.96 |
Ours (CoNet-MS) | 88.43 |
Method | Accuracy (%) | Macro F1 |
---|---|---|
Conventional Cross Entropy | 70.47 | 0.6852 |
Co-teaching [9] | 73.43 | 0.7123 |
L2B [11] | 72.41 | 0.7098 |
Ours (CoNet-MS) | 74.23 | 0.7234 |
Method | Accuracy (%) | Macro F1 |
---|---|---|
Conventional Cross Entropy | 62.90 | 0.5918 |
Co-teaching [9] | 67.49 | 0.6375 |
L2B [11] | 69.24 | 0.6658 |
Ours (CoNet-MS) | 69.75 | 0.6734 |