Noisy Label Learning for Gait Recognition in the Wild

Yuan, Shuping; Zheng, Jinkai; Li, Xuan; Sun, Yaoqi; Li, Wenchao; Gao, Ruilai; Omar, Mohd Hasbullah; Zhang, Jiyong

doi:10.3390/electronics14193752

Open AccessArticle

Noisy Label Learning for Gait Recognition in the Wild

by

Shuping Yuan

¹,

Jinkai Zheng

^2,3,*,

Xuan Li

²,

Yaoqi Sun

^3,4,

Wenchao Li

⁵,

Ruilai Gao

⁶,

Mohd Hasbullah Omar

⁶ and

Jiyong Zhang

^1,*

¹

School of Artificial Intelligence, Hangzhou Dianzi University, Hangzhou 310018, China

²

School of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, China

³

Lishui Institute of Hangzhou Dianzi University, Lishui 323000, China

⁴

School of Mathematics and Computer Science, Lishui University, Lishui 323000, China

⁵

School of Transportation and Vehicle Engineering, Wuxi University, Wuxi214105, China

⁶

School of Computing, Universiti Utara Malaysia, Kedah 06010, Kedah Darul Aman, Malaysia

^*

Authors to whom correspondence should be addressed.

Electronics 2025, 14(19), 3752; https://doi.org/10.3390/electronics14193752

Submission received: 22 August 2025 / Revised: 15 September 2025 / Accepted: 16 September 2025 / Published: 23 September 2025

Download

Browse Figures

Versions Notes

Abstract

Gait recognition, as a biometric technology with great potential, has been widely applied in numerous fields due to its unique advantages. Through in-depth research and the creation of in-the-wild gait datasets, gait recognition technology is progressively extending from laboratory settings to complex real-world scenarios, achieving notable advancements. However, the complexity of annotating gait data inevitably leads to labeling errors, known as noisy labels, which are one of the reasons for the suboptimal performance of in-the-wild gait recognition. To address these issues, this paper explores noisy label learning for in-the-wild gait recognition for the first time. We propose a plug-and-play gait recognition framework named Dynamic Noise Label Correction Network (DNLC). Specifically, it consists of two main parts: the dynamic class-center feature library and the label correction module, which can automatically identify and correct noisy labels based on the class-center feature library. In addition, we introduce the two-stage augmentation strategy to increase the diversity of the data and help reduce the impact of noisy labels. We integrated our proposed framework into five existing gait models and conducted extensive experiments on two widely used gait recognition datasets: Gait3D and CCPG. The results show that our framework increased the average Rank-1 accuracy of five methods by 10.03% and 6.45% on the Gait3D and CCPG datasets, respectively. These findings demonstrate the superior performance of our method.

Keywords:

gait recognition; real-world scenarios; noisy label learning; label correction; two-stage data augmentation; deep learning

1. Introduction

Gait, as a biometric feature with significant potential, reflects the walking patterns of pedestrians. Due to variations in movement and body shape, each individual’s gait is unique, allowing for the identification of target pedestrians in videos [1]. Compared to other biometric features, such as face, fingerprint, and iris, gait has distinct advantages, including remote accessibility, non-contact nature, and difficulty in disguise. Owing to the distinctive advantages of gait recognition technology, its applications have been successfully extended to various domains, including emotion recognition [2], health status assessment [3], and security protection [4].

Early gait recognition research focused almost entirely on laboratory environments, with CASIA-B [5] and OU-MVLP [6] as the standard benchmarks. Powered by deep learning, remarkable success has been achieved [7,8,9]. Notably, CSTL [10] reaches 98% Rank-1 on the CASIA-B. In recent years, gait recognition research has gradually shifted from laboratory environments to real-world scenarios. Lots of real-world gait datasets have been successively built, such as GREW [11] and Gait3D [12]. However, the above-mentioned gait recognition methods both showed significant performance degradation in real-world scenarios. In particular, the accuracy of the above-mentioned gait recognition methods on Gait3D has dropped by more than 40%.

To address this issue, researchers have adopted diverse strategic approaches. Some have focused on optimizing gait feature modeling. For instance, the MTSGait [13] method proposed by Zheng et al. effectively models gait patterns in real-world scenarios by concurrently learning spatial features and multi-level temporal features. In contrast to the MTSGait approach, which directly extracts features from video frames to construct representations, the DyGait [14] method focuses on analyzing differences between frames. It abstracts dynamically rich areas, such as the legs, to build spatiotemporal feature representations of human body motion. Meanwhile, other researchers have been exploring more efficient gait representation methods. For example, the XGait [15] method proposes to integrate the silhouette sequence and the gait parsing sequence to unleash the mutual information of these two gait representations. Despite the notable advancements made, there remains a disparity between the performance of current methods in real-world environments and superior performance in Laboratory settings.

After our detailed analysis, we find that these studies overlooked a fact, that is, the noise label. In unconstrained real-world environments, complex backgrounds and frequent identity switches [12] inevitably introduce widespread mislabeling during dataset construction. As illustrated in Figure 1, two types of noise are commonly introduced during the annotation of gait data. The first is identity confusion noise (Noise-I), where different pedestrians are mistakenly labeled with the same identity ID. The second is identity split noise (Noise-II), where the same pedestrian is incorrectly assigned to multiple different identity IDs. Research shows that noisy labels seriously damage the performance of deep learning-based models [16,17].

To this end, we propose a plug-and-play gait recognition framework named Dynamic Noise Label Correction Network (DNLC), which contains two main parts. The first part is the dynamic class-center feature library. It can adaptively aggregate similar instances and distill them into class-center features, which guide subsequent label correction. The second part is the label correction module. It can automatically discover and correct noisy labels by assessing the similarity between instance features and class-center features. Furthermore, we introduce a two-stage data augmentation strategy that learns the intrinsic features of the data rather than just memorizing noisy labels. Extensive experiments are conducted on two widely adopted challenging gait datasets, i.e., Gait3D and CCPG, which demonstrate the effectiveness of our proposed method.

In summary, the contributions of this paper are as follows:

To the best of our knowledge, we are the first to explore noisy label learning for gait recognition in the wild.
We propose a plug-and-play gait framework, named Dynamic Noise Label Correction Network (DNLC), to automatically discover and correct the noisy labels, which consists of the dynamic class-center feature library and the label correction module.
We introduce a new two-stage augmentation strategy, which can efficiently improve the model to learn robust gait features in noisy labels.
Extensive experiments demonstrate that our proposed method can effectively promote the performance of existing gait recognition methods. As a plug-and-play solution, the DNLC framework can be seamlessly integrated into existing gait recognition systems without the need for additional complex operations or techniques.

The organization of the remainder of this paper is as follows: Section 2 offers a comprehensive review of existing literature, and identifies the gaps in current research that our study aims to address. Section 3 provides an overview of the proposed Dynamic Noise Label Correction Network (DNLC) framework, detailing its architectural components and the underlying methodologies. In Section 4, we delve into the experimental setup, covering the datasets used, implementation details, and evaluation metrics. We then present the results of our extensive experiments, comparing the DNLC framework with state-of-the-art gait recognition methods under various conditions. Additionally, an ablation study is conducted to analyze the impact of different components of our framework on overall performance. Section 5 discusses the implications of our findings and suggests potential directions for future research. Finally, Section 6 concludes the paper with a summary of our key findings and contributions.

2. Related Works

2.1. Gait Recognition

Gait recognition technology has gradually evolved from controlled laboratory settings to real-world scenarios. Early research in gait recognition primarily focused on laboratory scenarios, where the primary objective was to directly extract gait features from silhouette sequences to establish representations. For instance, Chao et al. [7] introduced the GaitSet method, which treats gait sequences as unordered sets and employs max pooling to extract spatiotemporal information, enabling cross-view gait recognition. Fan et al. [8] proposed a new network called GaitPart that horizontally partitions gait silhouettes and designs global and local branches to collect more useful gait knowledge, significantly enhancing gait recognition performance. Lin et al. [9] used 3D convolutions to simultaneously extract spatial and temporal information and introduced a GLConv module to aggregate global and local features, further strengthening the model’s representation of gait features. These works were mainly conducted on laboratory datasets such as CASIA-B and OU-LP. Data collection in laboratory settings is often constrained, with issues such as fixed viewpoints, uniform walking patterns, and simple backgrounds. This has led to limitations in handling real-world noise data and complex backgrounds, despite these methods significantly improving gait recognition accuracy in laboratory scenarios.

In recent years, the research focus has gradually shifted to gait recognition in real-world scenarios. Gait recognition in the wild faces challenges such as arbitrary viewpoint changes, diverse walking patterns, pedestrian occlusions, and complex backgrounds. To address these challenges, researchers have proposed a series of methods. Zheng et al. [18] proposed a new gait representation, i.e., the gait parsing sequence, which can greatly improve recognition accuracy. Zheng et al. introduced the MTSGait [13] method, which effectively models the temporal patterns of gait while concurrently learning spatial features and hierarchical inter-frame features, offering a comprehensive approach to capturing dynamic gait characteristics. Wang et al. [14] proposed a dynamic gait recognition method that adapts to gait variations in different scenarios by dynamically adjusting model parameters. XGait [15] designed global and local cross-granularity modules to achieve efficient granularity alignment of silhouette and parsing sequences at both global and local levels.

Despite significant progress in handling diverse data and complex backgrounds in real-world scenarios, gait recognition methods for learning with noisy labels are still in their infancy. Currently, there are very few gait recognition methods that focus on learning with noisy labels. Yu et al. [19] proposed a noise-resistant cyclic network (CNTN) that refines the teaching process and reduces the memorization of noisy labels by imposing robust constraints of co-teaching, thereby improving model performance. However, this study mainly focused on laboratory scenarios. To the best of our knowledge, our study is the first to address gait recognition with noisy labels in real-world scenarios.

2.2. Noisy Label Learning

Deep neural networks (DNNs) have shown outstanding performance in various computer vision applications, such as image classification [20,21,22], object detection [23], and image processing [24,25]. However, their effectiveness is highly contingent upon the availability of large volumes of accurately annotated data [26]. Zhang et al. [27] demonstrated that the presence of noisy labels enables DNNs with a substantial number of parameters to overfit training datasets with any proportion of corrupted labels, thereby significantly diminishing the model’s generalization ability on test sets and leading to performance degradation. It has been reported that the range of labeling noise in real-world datasets spans from 8% to 38.5% [28,29]. The issue of noisy labels in datasets is becoming a more significant challenge in deep learning, especially in areas requiring accurate classification [30,31,32]. Current techniques for learning with noisy labels can be primarily categorized into four types [33]: First, robust supervised learning network architectures involve modeling a noise transition matrix to estimate the relationship between noisy and clean labels, constructing robust deep learning models based on the obtained noise transition matrix. Early papers explored adding noise adaptation layers on top of networks [34], while current research tends to design specialized network architectures [35,36] to mitigate the adverse effects of noise on model training. Second, robust regularization techniques aim to enhance model stability in the presence of label noise. Examples include Dropout and normalization [37,38], weight decay [39,40], and data augmentation [41]. Third, loss function adjustments typically follow two approaches: customizing loss functions [42,43,44], or adjusting the loss values of training samples before updating deep neural networks, such as label correction [45,46] and loss correction [47], to minimize the negative impact of noisy labels. Fourth, sample selection strategies carefully design selection mechanisms based on data characteristics to identify and retain more reliable samples, thereby fully exploiting sample information. Representative methods include co-training (Decoupling [48], co-teaching [49], co-teaching+ [50], and JoCoR [51]), as well as studies based on these methods [52]. These methods, starting from different perspectives, together form the research framework for learning with noisy labels, providing a diverse set of solutions to address this challenge.

Gait datasets in the real world are often subject to interference from complex backgrounds, changes in clothing, occlusions, and other factors, which inevitably introduce noisy labels during the data annotation process. By leveraging the latest techniques in the field of learning with noisy labels and integrating the characteristics of the gait recognition task, we propose a robust training framework that can effectively address the issue of noisy labels and enhance the performance of gait recognition in the wild.

3. Method

In this section, we first provide an overview of the proposed framework. Then, we describe the key components of our method, including the construction and update mechanism of the dynamic class-center feature library and the label correction module. Next, a two-stage gait data augmentation strategy is introduced. Finally, we present the details of model training and inference.

3.1. Overview

We denote the sample sequences in the gait dataset as

{\{(x_{i}, y_{i})\}}_{i = 1}^{n}

, where n is the number of gait silhouette sequences in the dataset,

x_{i}

represents the input silhouette sequence, and

y_{i} \in {1, \dots, K}

is its identity label. Here K is the total number of identities in the dataset, i.e., the number of classes. To address the noisy labels in the task of gait recognition in the wild, we design an efficient Dynamic Noise Label Correction Network (DNLC). As shown in Figure 2, given an input gait sequence

x_{i}

, we first adopt the corresponding data augmentation strategy according to the different training stages: Warm-up or formal training stage, to generate the augmented sequence

x_{i}^{'}

. Then, we input the sequence

x_{i}^{'}

into an existing gait model to obtain the gait feature

f_{i}

. Subsequently, in each iteration of the training, we compute the similarity vector

s_{i}^{k}

between the instance feature

f_{i}

and each class-center feature

c_{k}

. Next, we combine the classification probabilities

p_{i}

output by the gait model with

s_{i}

to realize the label correction. Finally, we utilize the corrected labels to progressively update the class-center features and model parameters.

3.2. The Dynamic Class-Center Feature Library

As illustrated in Figure 2, we construct and maintain a class-center feature library, which consists of a set of feature vectors

\{c_{1}, c_{2}, \dots, c_{K}\}

, where K denotes the total number of classes. Each

c_{k}

is the class centroid. In particular, we treat the library as a self-learning process. At the start of training, we take one random feature per class as the initial centroid. It is worth noting that the exact initial feature is not important, since the vector will be iteratively optimized online. In each iteration, we incorporate the features of each sample into the class-center features in a certain proportion. Specifically, the Exponential Moving Average (EMA) is employed. The specific process is as follows:

c_{{\hat{y}}_{i}} = m c_{{\hat{y}}_{i}} + (1 - m) f_{i},

(1)

where m represents the momentum coefficient, used to balance the influence of new and old knowledge.

{\hat{y}}_{i}

is the corrected label, which participates in subsequent loss calculations, as detailed in Section 3.3.

During the process, the movement of the class-center feature is synchronous. These class-center features are then added to the label correction to form a closed loop. Better centroids yield cleaner labels, and cleaner labels refine the centroids. Thus, a virtuous cycle is formed. Without manual tuning, the class-center features adaptively become robust and serve as a reliable prior for correcting noisy labels.

3.3. The Label Correction Module

Guided by the dynamic class-center feature library, the label correction module is proposed to automatically discover and correct noisy labels. During each iteration of training, we first compute the cosine similarity scores

S_{i}^{k}

between each instance feature

f_{i}

and the class-center feature

c_{k}

. This step measures the proximity of the sample feature to each class centroid. The specific operation is as follows:

s_{i}^{k} = \frac{exp (f_{i} \cdot \frac{c_{k}}{τ})}{\sum_{k = 1}^{K} exp (f_{i} \cdot \frac{c_{k}}{τ})},

(2)

where

τ

represents the temperature coefficient, which plays a crucial role in controlling the sharpness of the softmax distribution. Specifically,

τ

affects the spread of the probability distribution over the classes. A lower

τ

results in a more focused distribution, emphasizing the class with the highest score, whereas a higher

τ

leads to a more diffuse distribution, diminishing the relative differences between class probabilities.

Subsequently, we utilize the classifier to provide classification probabilities

p_{i} \in R^{K}

, where K is the number of classes. By combining the classification probabilities

p_{i}

and the computed similarities

s_{i} = {\{s_{i}^{k}\}}_{k = 1}^{K}

, we further generate the soft pseudo labels

g_{i}

:

g_{i} = λ p_{i} + (1 - λ) s_{i},

(3)

where

λ

serves as a tunable scaling factor to balance the influence of classification probabilities and similarity scores.

Comparing the soft label

g_{i}

with a predefined threshold T, if the highest score in

g_{i}

exceeds the threshold T, the corresponding class of that score is selected as the corrected label

{\hat{y}}_{i}

; otherwise, the original label

y_{i}

is retained.

{\hat{y}}_{i} = \{\begin{matrix} {argmax}_{j} g_{i}^{j}, & if \max_{j} g_{i}^{j} > T \\ y_{i}, & otherwise . \end{matrix}

(4)

3.4. The Two-Stage Augmentation Strategy

To increase diversity and complexity in the training process, thereby better simulating potential interferences and occlusions in real-world scenarios, we plan to implement a series of data augmentation measures on gait silhouette sequences. The purpose of these augmentation strategies is not only to prevent the model from memorizing noisy labels but, more importantly, to promote the model’s in-depth understanding of the essential features of the data. We propose a two-stage augmentation strategy (TSA). TSA will adopt a more aggressive data augmentation strategy during the warm-up phase and switch to a milder augmentation method during the formal training phase. We have employed a suite of data augmentation techniques tailored for gait silhouettes, including horizontal flipping, rotation, perspective transformation, and translation. These methods have been validated for their efficacy in enhancing recognition accuracy [53]. Through this phased and intensity-differentiated data augmentation approach, we expect the model to remain sensitive to clean samples while enhancing its robustness to noisy labels.

3.5. Training and Inference

Training. To optimize the performance, the refined labels

{\hat{y}}^{i}

are used to guide both the cross-entropy loss and triplet loss functions. The cross-entropy loss is as follows:

L_{ce} = - \frac{1}{N} \sum_{i = 1}^{N} \sum_{k = 1}^{K} {\hat{y}}_{i k} log (p ({\hat{y}}_{i} = k ∣ x_{i})) .

(5)

When calculating the triplet loss, the class-center vectors

x_{a}

serve as anchors. Samples from the same class are treated as positive samples

x_{p}

, while those from different classes are treated as negative samples

x_{n}

. The similarities

s_{ap}

and

s_{an}

between the anchor and the positive sample, and between the anchor and the negative sample, are then calculated.

s_{ap}^{i} = s (x_{a}, x_{p}^{i}) = \frac{x_{a} \cdot x_{p}^{i}}{∥x_{a}∥ ∥x_{p}^{i}∥};

(6)

s_{an}^{i} = s (x_{a}, x_{n}^{i}) = \frac{x_{a} \cdot x_{n}^{i}}{∥x_{a}∥ ∥x_{n}^{i}∥} .

(7)

Our goal is to bring samples from the same class closer together and push samples from different classes further apart in the feature space.

L_{tri} = \frac{1}{N} \sum_{i = 1}^{N} [max (0, τ (s_{an}^{i} - s_{ap}^{i}) + ϵ)],

(8)

where

ϵ

is a very small positive number, typically used for numerical stability to prevent division by zero. The value of

τ

is consistent with that in Equation (2).

Ultimately, our framework optimizes model performance through a composite loss function.

L = α L_{ce} + β L_{tri} .

(9)

where parameters

α

and

β

serve as loss weight factors.

Inference. During the inference phase, we follow the same configuration as existing methods [7].

4. Experiment

In Section 4.1 of this paper, we first introduce the datasets utilized. Subsequently, in Section 4.2, we outline the specific implementation details. Following this, in Section 4.3 and Section 4.4, we integrate our proposed DNLC framework into several existing gait recognition methods on the Gait3D and CCPG datasets to explore its effectiveness. We selected four methods for experimentation: GaitBase, a straightforward yet robust baseline approach; DyGait, which effectively focuses on dynamic information during human walking, fully leveraging temporal features. XGait, which explores the unique advantages of both silhouette and gait sequence features, and achieves multimodal fusion for the first time. Additionally, to verify the universality of our DNLC framework, we also applied it to the GaitSet method. Although GaitSet was initially developed based on laboratory scenarios, it was the first to treat silhouette sequences as unordered sets in the field of gait recognition. Currently, most gait recognition methods have adopted this input format. Therefore, the methods we chose cover both laboratory and real-world scenarios, encompassing both single and multimodal gait recognition approaches, which systematically and comprehensively validate the effectiveness and universality of DNLC. Subsequently, in Section 4.5, we conducted ablation studies, and further provided visualization analysis in Section 4.7.

4.1. Dataset

Gait3D. The Gait3D [12] dataset is a gait dataset based on real-world scenarios. It includes 4000 subjects, 25,309 sequences, and 3,279,239 frame images captured by 39 cameras in an unconstrained indoor environment, namely a large supermarket. Following the official training/testing strategy [12], 3000 subjects are selected for the training set, and another 1000 for the testing set. In the testing set, one sequence per subject is designated as the query, with the remaining sequences forming the gallery. The dataset features unique challenges such as 3D viewpoints, irregular walking speeds, and occlusions. These factors make gait recognition in the wild highly challenging. Despite rigorous label cleaning during its creation, we introduce artificial noise labels into the training set to simulate real-world scenarios. Considering the 3000 IDs in the testing set, we opt for pair-wise noise injection. The performance of the Gait3D dataset is evaluated using Rank-1, Rank-5, mean Average Precision (mAP), and mean Inverse Negative Penalty (mINP) [54].

CCPG. The CCPG dataset, recently introduced for gait recognition under clothing changes [55], consists of 200 subjects and 16,566 sequences. These sequences were captured using two outdoor and eight indoor cameras. The dataset offers a wide range of clothing variations, including 13 tops, 8 bottoms, and 5 different bags. It also covers various walking routes, such as turns, occlusions, and background changes. These factors make CCPG highly challenging. According to the official training/testing strategy [55], 100 subjects are used for training and another 100 for testing. During the testing phase, four clothing change scenarios are provided: full clothing change (CL-FULL), upper clothing change (CL-UP), lower clothing change (CL-DN), and carrying a backpack (CL-BG). To maintain consistency with the Gait3D dataset, we also introduce pair-wise noise into the CCPG dataset. The performance of the CCPG dataset is evaluated using Rank-1, mean Average Precision (mAP), and mean Inverse Negative Penalty (mINP).

4.2. Implementation Details

Inputs. For the Gait3D dataset, we directly use the official silhouette and parsing sequences as inputs for our proposed framework. For the CCPG dataset, since parsing data is not provided, we follow the method from the XGait [15] paper and use CDGNet to extract parsing information. To explore our framework’s label correction capability under different noise rates, we introduce paired noise [56] at various levels into both the Gait3D and CCPG datasets, with ratios of 10%, 20%, 30%, 40%, 45%, and 50%.

Settings. In this study, we standardized the preprocessing of silhouette and gait sequences from the Gait3D and CCPG datasets, adjusting their resolution to 64 × 44 pixels. As shown in Table 1, the GaitSet, GaitBase, DyGait, and XGait methods have been successfully implemented on the Gait3D dataset, and we strictly adhered to their original experimental configurations. Upon integrating these methods into our framework, we introduced a warm-up iteration count (warmup_iter), set to one-third of the total iterations, while maintaining all other settings consistent with the original methods on the Gait3D. On the CCPG dataset, only the XGait method has been successfully applied, so we retained the experimental settings consistent with the original paper. For the other three methods, we referred to their configurations in OpenGait [53] and adjusted the batch sizes according to the CCPG dataset paper [55] for training. Specific experimental details are shown in Table 1.

Apart from the parameters listed in the tables, there are no differences in other settings between the two datasets; that is, the same method employs identical settings across both datasets, including the optimizer, learning rate, and loss functions. The specific details are consistent with the original methods on the Gait3D. In Equations (2) and (8), we set the value of

τ

to 0.1. Both

λ

in Equation (3) and m in Equation (1) were assigned a value of 0.5, to balance the contributions of the two parts. The particular value for T in Equation (4) is 0.8. As for the loss weights

α

and

β

in Equation (9), they remain consistent with the original methods.

4.3. Experimental Results on Gait3D

To validate the effectiveness of our proposed Dynamic Noise Label Correction Network (DNLC), we conducted extensive experiments on the Gait3D dataset, comparing it with several state-of-the-art methods, including GaitSet [7], MTSGait [13], GaitBase [53], DyGait [14], and XGait [15].

As shown in Table 2, our framework consistently outperformed these methods across all noise rates, achieving significant improvements in key performance metrics. Specifically, for GaitSet-DNLC, MTSGait-DNLC, GaitBase-DNLC, DyGait-DNLC, and XGait-DNLC, we achieved average improvements in Rank-1 of 12.18%, 5.75%, 18.81%, 10.42%, and 2.97% across various noise rates, respectively. These results indicate that DNLC not only effectively mitigates the impact of noisy labels but also enhances overall performance on clean data. Moreover, as the noise rate increases, the magnitude of performance improvement also generally shows an upward trend. This superior performance is attributed to the dynamic label correction mechanism integrated into our framework, which effectively identifies and rectifies noisy labels during training. Therefore, our proposed gait recognition framework, DNLC, demonstrates potential for real-world gait recognition applications.

4.4. Experimental Results on CCPG

To further test the generalization ability of our framework, we conducted experiments on the CCPG dataset. This set introduces additional challenges, such as clothing changes and more complex environments. We compared our method with GaitSet, GaitBase, DyGait, and XGait, using the same noise levels as in Gait3D.

Table 3 shows that DNLC brings consistent gains across all backbones from 0.1 to 0.5 noise. This steady improvement highlights our label correction method. It adapts to the diverse and noisy nature of CCPG by identifying and removing identity-switch errors. Furthermore, the increased accuracy on clean data confirms that the framework not only removes noisy labels but also distills intrinsic gait features. These features remain sharp in real-world scenes.

Particularly, DyGait-DNLC and XGait-DNLC exhibit a “dip-then-rise” pattern. When the noise level is less than or equal to 0.2, the Rank-1 accuracy of these methods is slightly lower than the original methods, with a maximum decrease of 0.9 percentage points (pp). When the noise level is more than or equal to 0.3, the curves quickly reverse, leading to a final lead of 1.2 pp. We attributed this to the varying activation frequency of a fixed threshold across different noise intervals. In low-noise regions, DyGait and XGait already capture real-world features well. Labels are almost clean, so DNLC’s extra correction acts as mild regularization and briefly tightens the decision boundary. In high-noise regions, erroneous labels grow rapidly. The dynamic center library compresses noise through confidence-weighted momentum updates. Correction benefits soon outweigh regularization costs, and performance surpasses the baseline. This noise sensitivity suggests future work should introduce adaptive correction strength. The framework can then transition smoothly between clean and noisy scenes, avoiding loss in low noise and amplifying gains in high noise. Such noise-adaptive behavior is essential for reliable gait recognition in security and medical applications. This phenomenon can be attributed to the varying activation frequency of a fixed threshold across different noise intervals.

4.5. Ablation Study

To systematically evaluate the contributions of key components within the Dynamic Noise Label Correction Network (DNLC), we conducted a series of ablation studies on the Gait3D dataset with a noise rate of 0.5. In this section, we first validated the effectiveness of each critical component on Gait3D at a 50% label noise rate, including the label correction module (LCM) and the two-stage augmentation strategy (TAS). As shown in Table 4, we observed the specific impacts of each component on model performance. Subsequently, we further analyzed the combined effects of these components, revealing the synergistic benefits when they work together.

Effectiveness of TAS. Applying TAS alone raises GaitSet’s Rank-1 to 31.5% and XGait’s to 60.1%. The two-stage schedule—strong augmentation during warm-up, weak augmentation thereafter—suppresses noisy label overfitting regardless of the backbone origin. GaitSet, originally optimized for stable lighting and fixed viewpoints, learns sharper gait cues under aggressive augmentation, while XGait, already robust to viewpoint variation, uses the milder second stage to fine-tune decision boundaries without catastrophic forgetting. Consequently, TAS delivers consistent gains (+6.8% for GaitSet, +0.9% for XGait) without adding parameters.

Effectiveness of LCM. As shown in Table 4, equipping GaitSet with LCM lifts Rank-1 from 24.7% to 33.9%, mAP from 19.1% to 26.1%, and mINP from 10.6% to 14.9%. This 9.2% jump under 50% label noise demonstrates that the dynamic class-center memory can transfer corrective guidance across domains. GaitSet was engineered on clean laboratory sequences, yet the same network, without any architectural change, now learns to ignore background clutter and focus on gait-critical regions when LCM re-weights samples by their confidence. On the real-world-native XGait, LCM boosts Rank-1 from 59.2% to 62.5% and mAP from 49.8% to 52.6%, proving that even models already trained on noisy data still benefit from explicit label cleaning.

Synergy of LCM and TAS. Combining LCM and TAS yields the best results: GaitSet reaches 37.2% Rank-1, a further 3.3% beyond the sum of individual gain, and XGait peaks at 63.3%, setting a new record on Gait3D under 50% noise. The synergy stems from complementary mechanisms: LCM explicitly corrects identities, reducing label noise, while TAS hardens features against residual noise. This joint strategy bridges the gap between laboratory-born architectures and real-world data, confirming that label-level cleaning plus data-level robustness is essential for high-noise gait recognition.

4.6. Sensitivity Analysis

To assess the model’s sensitivity to specific hyperparameter values, we systematically adjusted several key hyperparameters and closely monitored the impact of these adjustments on model performance. Specifically, we initially conducted a parameter sensitivity analysis on the Gait3D dataset with a moderate noise level (0.3) using the GaitBase [53] model. GaitBase is a straightforward yet robust baseline. Subsequently, we modified the hyperparameters. Specifically, we adjusted the temperature coefficient

τ

from 0.1 to 0.01, 0.2, and 0.5; the label correction threshold T from 0.8 to 0.6; and both the label cleaning weight parameter

λ

and the momentum parameter m were adjusted from 0.5 to 0.8.

As shown in Table 5, the experimental results demonstrate that the model maintained high stability across different values of these hyperparameters, with no significant performance fluctuations observed. This stability reflects a certain degree of robustness in our model’s selection of hyperparameters, which is significant in practical applications as it reduces reliance on hyperparameter tuning. Furthermore, it suggests that the model may have converged to a relatively optimal solution, where minor changes in hyperparameters are not sufficient to cause significant performance changes.

4.7. Visualization

Feature Distribution. To evaluate the discriminability of the learned features, we randomly sampled eight identities from the Gait3D test set and visualized their feature distributions.

Figure 3 shows t-SNE embeddings projected onto a 2-D plane, with colors denoting identities. After introducing our DNLC framework, clusters tighten for every method. GaitSet moves from loose blobs to compact ellipses. GaitBase loses almost all cross-identity overlaps. DyGait pulls its scattered tails back to the center. Even XGait, pre-trained on wild data, now shows sharper boundaries at dynamic regions like shoulders and foot landings.

HeatMap. In this section, we visualize activation heatmaps on CPG before and after adding the DNLC framework. For this purpose, we have intentionally selected GaitBase and XGait as representatives of several methods in our visualizations. As a simple yet powerful baseline method, GaitBase relies solely on core components without the need for complex modules or techniques, thus effectively representing unimodal approaches. In contrast, XGait serves as a quintessential representative of multimodal methods, highlighting the distinction between the two.

Figure 4 shows that GaitBase initially highlights scattered background regions. After DNLC is applied, the activations quickly focus on the torso, shoulders, and foot-landing areas. This shift indicates that the network discards redundant backgrounds and captures gait-related dynamics. For XGait, although the original heatmaps already cover legs and foot contacts, they still include many irrelevant regions. XGait-DNLC refines these maps, concentrating activation on the shoulders and foot-landing points. The model now emphasizes shoulder swing and leg motion, integrating both spatial structure and temporal dynamics. Overall, DNLC guides the model toward essential gait cues and significantly improves both discriminability and robustness.

5. Discussion

In this study, we have conducted an in-depth analysis of the experimental results based on existing literature, discussed potential uncertainties, identified the limitations of our research, and proposed future research directions.

The Dynamic Noise Label Correction Network (DNLC) we proposed has demonstrated superior performance on the Gait3D and CCPG datasets compared to the current state-of-the-art methods. As shown in Table 2 and Table 3, DNLC outperforms methods such as GaitSet, GaitBase, DyGait, and XGait across various noise levels. This confirms our hypothesis regarding the importance of achieving noise-robust learning in real-world scenarios. Furthermore, our framework is both simple and effective. The integration of our framework into the model does not significantly increase the number of parameters.

It must be acknowledged that there are certain uncertainties and limitations. Firstly, the DNLC framework is predicated on the assumption of uniformly distributed noise across the entire dataset. However, noise distribution in real-world datasets is often non-uniform, which may impact the model’s generalization capability. Secondly, our current research focuses solely on label correction for gait silhouette sequences. To address these limitations, future work will concentrate on several directions: First, extending DNLC to datasets with non-uniform noise distributions. By further optimizing the model structure to enable adaptive learning rate schedules based on noise levels, we can enhance robustness against noise. Second, we can explore the possibility of noise label correction in other gait representations, such as parsing sequences and skeletal sequences.

6. Conclusions

In this study, we introduce the concept of “noise label self-correction” into real-world gait recognition tasks for the first time. We propose an end-to-end framework known as the Dynamic Noise Label Correction Network (DNLC). This approach differs from previous methods that were based on increasing model capacity or applying regularization to “passively dilute” noisy labels. The DNLC framework explicitly tracks the evolution of class prototypes during training through a dynamic class-center feature library. It also actively rewrites suspicious labels using a label correction module. Furthermore, a two-stage data augmentation strategy is employed to collaboratively suppress noisy memories at the data level. Our experiments on the Gait3D and CCPG datasets demonstrate the effectiveness of DNLC. It has achieved significant improvements over existing gait models such as GaitSet, MTSGait, GaitBase, DyGait, and XGait, across various noise rates. Notably, on the Gait3D dataset, when the noise rate reached 50%, DNLC increased the Rank-1 accuracy of the aforementioned methods by 12.5%, 7.5%, 19.6%, 11.5%, and 4.13% percentage points, respectively. These results highlight the plug-and-play nature of DNLC, which can seamlessly integrate into existing models, then enhance their performance without major architectural adjustments or parameter tuning. Nevertheless, our study has limitations. The DNLC framework assumes a uniform distribution of noise across the dataset, which may not hold in the real world. Future research can explore adaptive mechanisms to cope with non-uniform noise distributions, further enhancing the model’s generalization capability.

Author Contributions

Conceptualization, S.Y. and J.Z. (Jinkai Zheng); methodology, S.Y.; software, X.L.; validation, S.Y. and J.Z. (Jinkai Zheng); formal analysis, S.Y.; investigation, X.L., W.L. and R.G.; resources, J.Z. (Jiyong Zhang) and M.H.O.; data curation, S.Y.; writing—original draft preparation, S.Y.; writing—review and editing, S.Y. and J.Z. (Jinkai Zheng); visualization, S.Y.; supervision, J.Z. (Jinkai Zheng); project administration, Y.S., W.L. and R.G.; funding acquisition, Y.S., W.L., R.G. and J.Z. (Jiyong Zhang). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Pioneer and Leading Goose R&D Program of Zhejiang Province, grant number 2023C01046, and the Fundamental Research Funds for the Provincial Universities of Zhejiang, grant number GK259909299001-044. The APC was funded by the Pioneer and Leading Goose R&D Program of Zhejiang Province.

Data Availability Statement

The dataset used in this study for the Gait3D number of infections is available at: https://gait3d.github.io/#dataset, while CCPG can be found at: https://github.com/BNU-IVC/CCPG. Both them are accessed on December 2023. All other data sources are related to our model parameters and can be found in Section 4.1 in which we provide references from published articles and/or websites.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Wang, L.; Tan, T.; Ning, H.; Hu, W. Silhouette analysis-based gait recognition for human identification. IEEE Trans. Pattern Anal. Mach. Intell. 2003, 25, 1505–1518. [Google Scholar] [CrossRef]
Zou, Y.; He, N.; Sun, J.; Huang, X.; Wang, W. Occluded Gait Emotion Recognition Based on Multi-Scale Suppression Graph Convolutional Network. Comput. Mater. Contin. 2025, 82, 1255. [Google Scholar] [CrossRef]
Harris, E.; Khoo, I.; Demircan, E. A Survey of Human Gait-Based Artificial Intelligence Applications. Front. Robot. AI Ed. Pick. 2023, 8, 17. [Google Scholar] [CrossRef]
Wan, C.; Wang, L.; Phoha, V.V. A survey on gait recognition. ACM Comput. Surv. 2018, 51, 1–35. [Google Scholar] [CrossRef]
Tan, D.; Huang, K.; Yu, S.; Tan, T. Efficient Night Gait Recognition Based on Template Matching. In Proceedings of the 18th International Conference on Pattern Recognition (ICPR’06), Hong Kong, China, 20–24 August 2006; IEEE Computer Society: Washington, DC, USA, 2006; pp. 1000–1003. [Google Scholar]
Takemura, N.; Makihara, Y.; Muramatsu, D.; Echigo, T.; Yagi, Y. Multi-view large population gait dataset and its performance evaluation for cross-view gait recognition. IPSJ Trans. Comput. Vis. Appl. 2018, 10, 4. [Google Scholar] [CrossRef]
Chao, H.; He, Y.; Zhang, J.; Feng, J. GaitSet: Regarding Gait as a Set for Cross-View Gait Recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; pp. 8126–8133. [Google Scholar]
Fan, C.; Peng, Y.; Cao, C.; Liu, X.; Hou, S.; Chi, J.; Huang, Y.; Li, Q.; He, Z. GaitPart: Temporal Part-Based Model for Gait Recognition. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 14213–14221. [Google Scholar]
Lin, B.; Zhang, S.; Yu, X.; Chu, Z.; Zhang, H. Learning Effective Representations from Global and Local Features for Cross-View Gait Recognition. arXiv 2020, arXiv:2011.01461. [Google Scholar]
Huang, X.; Zhu, D.; Wang, H.; Wang, X.; Yang, B.; He, B.; Liu, W.; Feng, B. Context-sensitive temporal feature learning for gait recognition. In Proceedings of the IEEE/CVF International Conference on Compsuter Viion, Montreal, BC, Canada, 11–17 October 2021; pp. 12909–12918. [Google Scholar]
Zhu, Z.; Guo, X.; Yang, T.; Huang, J.; Deng, J.; Huang, G.; Du, D.; Lu, J.; Zhou, J. Gait recognition in the wild: A benchmark. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 14789–14799. [Google Scholar]
Zheng, J.; Liu, X.; Liu, W.; He, L.; Yan, C.; Mei, T. Gait Recognition in the Wild with Dense 3D Representations and A Benchmark. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 20196–20205. [Google Scholar]
Zheng, J.; Liu, X.; Gu, X.; Sun, Y.; Gan, C.; Zhang, J.; Liu, W.; Yan, C. Gait Recognition in the Wild with Multi-hop Temporal Switch. In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10–14 October 2022; pp. 6136–6145. [Google Scholar]
Wang, M.; Guo, X.; Lin, B.; Yang, T.; Zhu, Z.; Li, L.; Zhang, S.; Yu, X. DyGait: Exploiting Dynamic Representations for High-performance Gait Recognition. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 13378–13387. [Google Scholar]
Zheng, J.; Liu, X.; Zhang, B.; Yan, C.; Zhang, J.; Liu, W.; Zhang, Y. It Takes Two: Accurate Gait Recognition in the Wild via Cross-granularity Alignment. In Proceedings of the 32nd ACM International Conference on Multimedia, Melbourne, VIC, Australia, 28 October–1 November 2024; pp. 8786–8794. [Google Scholar]
Chen, J.; Deng, S.; Teng, D.; Chen, D.; Jia, T.; Wang, H. APPN: An Attention-based Pseudo-label Propagation Network for few-shot learning with noisy labels. Neurocomputing 2024, 602, 128212. [Google Scholar] [CrossRef]
He, Z.; Xu, J.; Huang, P.; Wang, Q.; Wang, Y.; Guo, Y. PLTN: Noisy label learning in long-tailed medical images with adaptive prototypes. Neurocomputing 2025, 645, 130514. [Google Scholar] [CrossRef]
Zheng, J.; Liu, X.; Wang, S.; Wang, L.; Yan, C.; Liu, W. Parsing is All You Need for Accurate Gait Recognition in the Wild. In Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, ON, Canada, 29 October–3 November 2023; pp. 116–124. [Google Scholar]
Yu, W.; Yu, H.; Huang, Y.; Cao, C.; Wang, L. CNTN: Cyclic Noise-tolerant Network for Gait Recognition. arXiv 2022, arXiv:2210.06910. [Google Scholar] [CrossRef]
Liu, H.; Zeng, S.; Deng, L.; Liu, T.; Liu, X.; Zhang, Z.; Li, Y.F. HPCTrans: Heterogeneous Plumage Cues-Aware Texton Correlation Representation for FBIC via Transformers. IEEE Trans. Circuits Syst. Video Technol. 2025. [Google Scholar] [CrossRef]
Liu, H.; Zhang, C.; Deng, Y.; Xie, B.; Liu, T.; Li, Y.F. TransIFC: Invariant cues-aware feature concentration learning for efficient fine-grained bird image classification. IEEE Trans. Multimed. 2023, 27, 1677–1690. [Google Scholar] [CrossRef]
Liu, H.; Zhou, Q.; Zhang, C.; Zhu, J.; Liu, T.; Zhang, Z.; Li, Y.F. MMATrans: Muscle movement aware representation learning for facial expression recognition via transformers. IEEE Trans. Ind. Inform. 2024, 20, 13753–13764. [Google Scholar] [CrossRef]
Garuda, N.; Prasad, G.; Dev, P.P.; Das, P.; Ghaderpour, E. CNNViT: A robust deep neural network for video anomaly detection. IET Conf. Proc. 2023, 2023, 13–22. [Google Scholar] [CrossRef]
Zhang, Y.; Zhou, D.; Chen, S.; Gao, S.; Ma, Y. Single-image crowd counting via multi-column convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 589–597. [Google Scholar]
Wang, W.; Dou, S.; Jiang, Z.; Sun, L. A fast dense spectral–spatial convolution network framework for hyperspectral images classification. Remote Sens. 2018, 10, 1068. [Google Scholar] [CrossRef]
Song, B.; Zhao, S.; Dang, L.; Wang, H.; Xu, L. A survey on learning from data with label noise via deep neural networks. Syst. Sci. Control Eng. 2025, 13, 2488120. [Google Scholar] [CrossRef]
Zhang, C.; Bengio, S.; Hardt, M.; Recht, B.; Vinyals, O. Understanding deep learning requires rethinking generalization. In Proceedings of the ICLR, Toulon, France, 24–26 April 2017. [Google Scholar]
Xiao, T.; Xia, T.; Yang, Y.; Huang, C.; Wang, X. Learning from massive noisy labeled data for image classification. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 2691–2699. [Google Scholar]
Song, H.; Kim, M.; Lee, J. SELFIE: Refurbishing Unclean Samples for Robust Deep Learning. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; Volume 97, pp. 5907–5915. [Google Scholar]
Liu, H.; Song, Y.; Liu, T.; Chen, L.; Zhang, Z.; Yang, X.; Xiong, N.N. TransSIL: A Silhouette Cue-Aware Image Classification Framework for Bird Ecological Monitoring Systems. IEEE Internet Things J. 2025. [Google Scholar] [CrossRef]
Liu, H.; Liu, T.; Chen, Y.; Zhang, Z.; Li, Y.F. EHPE: Skeleton cues-based gaussian coordinate encoding for efficient human pose estimation. IEEE Trans. Multimed. 2022, 26, 8464–8475. [Google Scholar] [CrossRef]
Deng, Y.; Ma, J.; Wu, Z.; Wang, W.; Liu, H. DSR-Net: Distinct selective rollback queries for road cracks detection with detection transformer. Digit. Signal Process. 2025, 164, 105266. [Google Scholar] [CrossRef]
Song, H.; Kim, M.; Park, D.; Shin, Y.; Lee, J. Learning From Noisy Labels with Deep Neural Networks: A Survey. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 8135–8153. [Google Scholar] [CrossRef]
Goldberger, J.; Ben-Reuven, E. Training deep neural-networks using a noise adaptation layer. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017. [Google Scholar]
Zhao, J.; Liu, X.; Zhao, W. Balanced and Accurate Pseudo-Labels for Semi-Supervised Image Classification. ACM Trans. Multim. Comput. Commun. Appl. 2022, 18, 145:1–145:18. [Google Scholar] [CrossRef]
Zhang, J.; Song, B.; Wang, H.; Han, B.; Liu, T.; Liu, L.; Sugiyama, M. BadLabel: A Robust Perspective on Evaluating and Enhancing Label-Noise Learning. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 4398–4409. [Google Scholar] [CrossRef]
Rusiecki, A. Batch Normalization and Dropout Regularization in Training Deep Neural Networks with Label Noise. In Proceedings of the International Conference on Intelligent Systems Design and Applications, Online, 13–15 December 2021; Volume 418, pp. 57–66. [Google Scholar]
Chen, Y.; Hu, S.X.; Shen, X.; Ai, C.; Suykens, J.A.K. Compressing Features for Learning with Noisy Labels. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 2124–2138. [Google Scholar] [CrossRef] [PubMed]
Chen, Y.; Jin, C.; Li, G.; Li, T.H.; Gao, W. Mitigating Label Noise in GANs via Enhanced Spectral Normalization. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 3924–3934. [Google Scholar] [CrossRef]
An, W.; Tian, F.; Shi, W.; Lin, H.; Wu, Y.; Cai, M.; Wang, L.; Wen, H.; Yao, L.; Chen, P. DOWN: Dynamic Order Weighted Network for Fine-grained Category Discovery. Knowl. Based Syst. 2024, 293, 111666. [Google Scholar] [CrossRef]
Zhang, L.; Wang, B.; Liang, P.; Yuan, X.; Li, N. Semi-supervised fault diagnosis of gearbox based on feature pre-extraction mechanism and improved generative adversarial networks under limited labeled samples and noise environment. Adv. Eng. Inform. 2023, 58, 102211. [Google Scholar] [CrossRef]
Fang, C.; Cheng, L.; Mao, Y.; Zhang, D.; Fang, Y.; Li, G.; Qi, H.; Jiao, L. Separating Noisy Samples From Tail Classes for Long-Tailed Image Classification with Label Noise. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 16036–16048. [Google Scholar] [CrossRef] [PubMed]
Chung, Y.; Lu, W.; Tian, X. Data Cleansing for Salt Dome Dataset with Noise Robust Network on Segmentation Task. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar] [CrossRef]
Liu, D.; Zhao, J.; Wu, J.; Yang, G.; Lv, F. Multi-category classification with label noise by robust binary loss. Neurocomputing 2022, 482, 14–26. [Google Scholar] [CrossRef]
Yu, X.; Zhang, S.; Jia, L.; Wang, Y.; Song, M.; Feng, Z. Noise is the fatal poison: A Noise-aware Network for noisy dataset classification. Neurocomputing 2024, 563, 126829. [Google Scholar] [CrossRef]
Chen, G.; Qin, H.; Huang, L. Recursive noisy label learning paradigm based on confidence measurement for semi-supervised depth completion. Int. J. Mach. Learn. Cybern. 2024, 15, 3201–3219. [Google Scholar] [CrossRef]
Zhang, Y.; Sugiyama, M. Approximating Instance-Dependent Noise via Instance-Confidence Embedding. arXiv 2021, arXiv:2103.13569. [Google Scholar]
Malach, E.; Shalev-Shwartz, S. Decoupling “when to update” from “how to update”. In Proceedings of the NIPS’17: 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 960–970. [Google Scholar]
Han, B.; Yao, Q.; Yu, X.; Niu, G.; Xu, M.; Hu, W.; Tsang, I.W.; Sugiyama, M. Co-teaching: Robust training of deep neural networks with extremely noisy labels. In Proceedings of the NIPS’18: 32nd International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018; pp. 8536–8546. [Google Scholar]
Yu, X.; Han, B.; Yao, J.; Niu, G.; Tsang, I.W.; Sugiyama, M. How does Disagreement Help Generalization against Label Corruption? In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; Volume 97, pp. 7164–7173. [Google Scholar]
Wei, H.; Feng, L.; Chen, X.; An, B. Combating Noisy Labels by Agreement: A Joint Training Method with Co-Regularization. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 13723–13732. [Google Scholar]
Lin, J.; Zhao, Y.; Wang, S.; Tang, Y. A robust training method for object detectors in remote sensing image. Displays 2024, 81, 102618. [Google Scholar] [CrossRef]
Fan, C.; Liang, J.; Shen, C.; Hou, S.; Huang, Y.; Yu, S. OpenGait: Revisiting Gait Recognition Toward Better Practicality. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 9707–9716. [Google Scholar]
Ye, M.; Shen, J.; Lin, G.; Xiang, T.; Shao, L.; Hoi, S.C.H. Deep Learning for Person Re-Identification: A Survey and Outlook. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 2872–2893. [Google Scholar] [CrossRef] [PubMed]
Li, W.; Hou, S.; Zhang, C.; Cao, C.; Liu, X.; Huang, Y.; Zhao, Y. An in-depth exploration of person re-identification and gait recognition in cloth-changing conditions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 13824–13833. [Google Scholar]
Nishi, K.; Ding, Y.; Rich, A.; Höllerer, T. Augmentation Strategies for Learning with Noisy Labels. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 8022–8031. [Google Scholar]

Figure 1. The A–E figures denote the true identity; Each dashed box encapsulates a set of contour images from different frames of the same sequence; edge color represents the assigned label. (a) Correct label: Three different views of the identity A are correctly labeled as the same identity. (b) Noise-I: Different pedestrians, i.e., the identities B, C, and D, are wrongly labeled as the same identity. (c) Noise-II: Different sequences of the same identity are wrongly labeled as different identities. (Best viewed by color).

Figure 2. Overview of the proposed DNLC framework, where the symbol “*” denotes element-wise multiplication. (Best viewed in color).

Figure 3. The visualization of feature distributions. Samples of the same color belong to the same person. (Best viewed in color).

Figure 4. (rgb) Extracted from walking-video sequences of subject ID 109 captured under different viewpoints. (a) Heatmaps for GaitBase [53]. (b) Heatmaps for GaitBase-DNLC. (c) Heatmaps for XGait [15]. (d) Heatmaps for XGait-DNLC. The different color blocks in the heatmap represent the varying degrees of attention the model pays to different features. A color leaning towards red indicates that the model is focusing on that area, with darker shades of red signifying higher levels of focus. Conversely, a color leaning towards blue signifies that the model is not focusing on those features. (Best viewed in color).

Table 1. Experimental settings on Gait3D and CCPG datasets.

Method	Gait3D				CCPG
Method	Total_Iter	Warmup_Iter	Milestone	Batch_Size	Total_Iter	Warmup_Iter	Milestone	Batch_Size
Gaitset	180 k	6 k	[30 k, 90 k]	[32, 4, 30]	80 k	3 k	[30 k, 60 k]	[16, 16, 30]
MTSGait	180 k	6 k	[30 k, 90 k]	[32, 4, 30]	-	-	-	-
GaitBase	60 k	2 k	[20 k, 40 k, 50 k]	[32, 4, 30]	80 k	3 k	[30 k, 60 k]	[16, 16, 30]
DyGait	160 k	2 k	[60 k, 120 k]	[8, 16, 30]	160 k	6 k	[60 k, 120 k]	[8, 16, 30]
XGait	120 k	4 k	[40 k, 80 k, 100 k]	[32, 2, 30]	120 k	4 k	[40 k, 80 k, 100 k]	[8, 8, 30]

Table 2. Comparison of the state-of-the-art gait recognition methods on Gait3D dataset. The five methods integrated with DNLC—GaitSet, MTSGait, GaitBase, DyGait, and XGait—achieved their respective best Rank-1 accuracies of 46.40%, 48.90%, 64.00%, 60.60%, and 81.30%.

Method	Noise Rate
	0				0.1				0.2				0.3				0.4				0.5
	Rank-1	Rank-5	mAP	mINP	Rank-1	Rank-5	mAP	mINP	Rank-1	Rank-5	mAP	mINP	Rank-1	Rank-5	mAP	mINP	Rank-1	Rank-5	mAP	mINP	Rank-1	Rank-5	mAP	mINP
GaitSet [7]	39.00	59.90	31.52	18.61	28.50	49.30	22.64	12.38	27.70	45.90	20.50	11.20	25.30	45.50	19.83	11.03	24.20	44.9	19.48	10.85	24.70	44.00	19.10	10.56
GaitSet-DNLC	46.40	66.50	36.26	21.60	42.30	61.60	32.51	18.79	41.40	60.80	30.89	17.67	38.40	58.10	29.00	16.58	36.80	56.80	27.74	15.90	37.20	57.40	27.91	15.95
MTSGait [13]	48.70	67.10	37.63	21.93	38.50	56.40	28.51	15.66	36.80	56.50	28.24	16.12	35.40	52.90	26.16	14.54	32.10	52.70	24.94	14.17	33.00	52.60	24.46	13.50
MTSGait-DNLC	48.90	69.11	38.73	23.35	45.40	64.70	34.98	20.87	42.30	61.90	32.05	18.39	41.20	62.20	31.64	18.56	40.70	58.90	30.31	17.51	40.50	59.70	30.20	17.27
GaitBase [53]	47.20	67.40	38.21	23.38	36.50	58.10	28.83	17.06	30.60	51.60	24.74	14.27	30.90	49.10	23.31	13.67	28.40	47.70	22.34	13.32	28.60	50.40	22.71	13.10
GaitBase-DNLC	64.00	79.50	53.64	35.51	55.30	75.10	45.88	28.49	51.01	70.20	41.97	25.80	49.20	67.70	39.81	24.67	47.50	68.10	39.01	23.92	48.20	66.90	38.29	22.70
DyGait [14]	51.30	68.70	42.11	22.09	40.70	60.40	31.58	15.80	37.9	55.50	28.30	13.95	35.90	53.10	26.35	12.54	34.40	52.10	26.35	12.77	34.80	53.30	25.55	12.51
DyGait-DNLC	60.60	78.40	52.07	29.12	51.20	70.70	42.23	22.18	47.90	67.00	39.11	20.50	46.90	66.00	37.37	19.72	44.60	66.20	34.85	17.85	46.30	63.50	35.91	18.79
XGait [15]	80.50	91.90	73.30	55.40	69.90	86.80	62.21	42.95	65.90	83.20	56.63	37.95	61.30	80.00	52.75	34.37	58.80	79.70	51.50	33.56	59.20	77.90	49.75	31.50
XGait-DNLC	81.30	93.10	74.12	56.63	72.20	88.20	64.65	45.68	67.60	85.50	59.54	40.85	65.80	84.30	56.63	38.07	63.20	81.30	56.79	36.24	63.33	80.89	53.49	34.73

Table 3. Comparison of the state-of-the-art gait recognition methods on CCPG dataset. CL-FULL, CL-UP, CL-DN, and CL-BG denote cloth changing, top-changing, pants-changing, and bag-carrying, respectively. The four methods integrated with DNLC—GaitSet, GaitBase, DyGait, and XGait—achieved their respective best Rank-1 accuracies of 63.497%, 70.408%, 37.630%, and 73.200%.

Method	Noise Rate
	0				0.1				0.2				0.3				0.4				0.5
	CL-FULL	CL-UP	CL-DN	CL-BG	CL-FULL	CL-UP	CL-DN	CL-BG	CL-FULL	CL-UP	CL-DN	CL-BG	CL-FULL	CL-UP	CL-DN	CL-BG	CL-FULL	CL-UP	CL-DN	CL-BG	CL-FULL	CL-UP	CL-DN	CL-BG
GaitSet [7]	57.006	60.737	61.628	64.857	42.579	48.379	47.985	51.505	34.447	37.866	39.274	44.392	28.951	33.549	34.940	37.913	26.720	30.337	31.635	34.599	25.417	30.173	30.094	34.069
GaitSet-DNLC	63.497	70.681	71.795	77.715	52.479	61.222	61.846	70.094	45.226	52.676	54.725	64.549	43.212	49.572	52.931	62.496	40.488	47.089	49.175	59.648	39.721	47.075	49.232	58.715
GaitBase [53]	65.245	69.260	71.857	73.574	51.082	55.708	59.009	63.225	42.744	46.457	51.727	56.329	38.318	42.517	43.722	51.467	34.649	39.353	41.247	47.174	35.085	39.046	42.016	47.072
GaitBase-DNLC	70.408	77.1336	76.970	81.561	55.331	60.443	63.860	70.937	46.063	52.039	55.267	62.689	41.918	47.207	51.037	58.676	37.556	44.126	46.634	53.647	37.363	42.356	46.207	52.104
DyGait [14]	40.575	48.085	47.116	56.249	31.895	38.966	38.015	46.786	29.139	33.790	34.640	43.552	16.790	22.670	20.712	28.795	16.644	120.668	20.993	26.804	17.260	21.720	22.314	29.542
DyGait-DNLC	37.630	47.518	47.964	61.361	29.390	40.287	38.970	52.605	26.871	34.749	35.862	48.671	23.318	31.329	32.208	44.580	22.695	28.805	30.157	41.206	21.279	28.213	29.270	40.547
XGait [15]	72.500	76.723	78.990	79.989	53.799	55.750	62.628	62.732	42.560	43.542	51.440	53.061	36.576	39.116	45.768	46.604	33.920	37.079	43.056	46.426	32.670	34.815	43.352	46.92
XGait-DNLC	73.200	77.616	80.723	81.855	52.531	57.101	63.415	65.208	42.817	45.252	53.269	55.709	37.235	40.318	47.519	50.829	35.692	38.920	45.028	48.051	33.773	36.158	45.009	48.717

Table 4. The accuracy (%) on the Gait3D subset with a 0.5 noise rate under different combinations of proposed modules. LCM and TAS denote the label correction module and the two-stage augmentation strategy. A check mark indicates that the corresponding module is included in the training configuration.

Baseline	TAS	LCM	GaitSet				XGait
Baseline	TAS	LCM	Rank-1	Rank-5	mAP	mINP	Rank-1	Rank-5	mAP	mINP
✓			24.70	44.00	19.10	10.56	59.20	77.90	49.75	31.50
✓	✓		31.50	50.90	24.39	13.94	60.11	78.56	50.70	32.36
✓		✓	33.90	54.70	26.09	14.86	62.50	80.10	52.62	33.78
✓	✓	✓	37.20	57.40	27.91	15.95	63.33	80.89	53.49	34.73

Table 5. The sensitivity analysis of the GaitBase [53] model under a 0.3 noise rate on the Gait3D subset. The original hyperparameter settings included a temperature coefficient

τ

of 0.1, a label correction threshold T of 0.8, a label cleaning weight parameter

λ

of 0.5, and a momentum parameter m of 0.5. In this study, we employed a controlled variable method, adjusting only one parameter at a time while keeping the other three parameters consistent with their original settings.

Table 5. The sensitivity analysis of the GaitBase [53] model under a 0.3 noise rate on the Gait3D subset. The original hyperparameter settings included a temperature coefficient

τ

of 0.1, a label correction threshold T of 0.8, a label cleaning weight parameter

λ

of 0.5, and a momentum parameter m of 0.5. In this study, we employed a controlled variable method, adjusting only one parameter at a time while keeping the other three parameters consistent with their original settings.

Hyperparameter		Rank-1	Rank-5	mAP	mINP
Original		49.20	67.7	39.81	24.67
Temperature Coefficient: $τ$	0.01	49.09	67.53	39.77	24.54
	0.2	49.17	67.72	39.90	24.48
	0.5	49.11	67.80	39.79	24.68
Label Correction Threshold: T	0.5	49.15	67.75	39.85	24.60
Label Cleaning Weight: $λ$	0.6	49.11	67.65	39.80	24.75
Momentum Parameter: m	0.6	49.18	67.81	39.84	24.70

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Yuan, S.; Zheng, J.; Li, X.; Sun, Y.; Li, W.; Gao, R.; Omar, M.H.; Zhang, J. Noisy Label Learning for Gait Recognition in the Wild. Electronics 2025, 14, 3752. https://doi.org/10.3390/electronics14193752

AMA Style

Yuan S, Zheng J, Li X, Sun Y, Li W, Gao R, Omar MH, Zhang J. Noisy Label Learning for Gait Recognition in the Wild. Electronics. 2025; 14(19):3752. https://doi.org/10.3390/electronics14193752

Chicago/Turabian Style

Yuan, Shuping, Jinkai Zheng, Xuan Li, Yaoqi Sun, Wenchao Li, Ruilai Gao, Mohd Hasbullah Omar, and Jiyong Zhang. 2025. "Noisy Label Learning for Gait Recognition in the Wild" Electronics 14, no. 19: 3752. https://doi.org/10.3390/electronics14193752

APA Style

Yuan, S., Zheng, J., Li, X., Sun, Y., Li, W., Gao, R., Omar, M. H., & Zhang, J. (2025). Noisy Label Learning for Gait Recognition in the Wild. Electronics, 14(19), 3752. https://doi.org/10.3390/electronics14193752

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Noisy Label Learning for Gait Recognition in the Wild

Abstract

1. Introduction

2. Related Works

2.1. Gait Recognition

2.2. Noisy Label Learning

3. Method

3.1. Overview

3.2. The Dynamic Class-Center Feature Library

3.3. The Label Correction Module

3.4. The Two-Stage Augmentation Strategy

3.5. Training and Inference

4. Experiment

4.1. Dataset

4.2. Implementation Details

4.3. Experimental Results on Gait3D

4.4. Experimental Results on CCPG

4.5. Ablation Study

4.6. Sensitivity Analysis

4.7. Visualization

5. Discussion

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI