Adaptive Intra-Class Variation Contrastive Learning for Unsupervised Person Re-Identification in Substation Worker Safety Monitoring

Liu, Lingzhi; Du, Zexu; Chang, Zhengwei; Zhang, Yi; Zhang, Linghao

doi:10.3390/electronics15112339


by Lingzhi Liu ¹,*, Zexu Du ¹, Zhengwei Chang ², Yi Zhang ¹ and Linghao Zhang ²

¹ Institute of Artificial Intelligence, China Electric Power Research Institute Co., Ltd., Beijing 100192, China

² Power Internet of Things Key Laboratory, State Grid Sichuan Electric Power Research Institute, Chengdu 610041, China

* Author to whom correspondence should be addressed.

Electronics 2026, 15(11), 2339; https://doi.org/10.3390/electronics15112339

Submission received: 15 May 2026 / Accepted: 25 May 2026 / Published: 28 May 2026

(This article belongs to the Special Issue AI for Industry)


Abstract

Ensuring safety compliance is paramount in substation operations. However, worker re-identification (Re-ID) remains highly challenging due to severe occlusions, high appearance similarity among uniformly dressed workers, and substantial illumination variations across shifts and environments. Moreover, the escalating cost of manual identity annotation in large-scale, multi-site surveillance systems calls for annotation-free approaches in practical deployment. In this paper, we propose AdaInCV (Adaptive Intra-Class Variation Contrastive Learning), an unsupervised Re-ID framework tailored for substation worker safety monitoring. After DBSCAN clustering, the proposed method quantifies the model’s learning capacity for each pseudo-cluster by measuring its intra-class feature variation, and adaptively selects training samples of appropriate difficulty throughout the learning process. To this end, two novel strategies are introduced. Adaptive Sample Mining (AdaSaM) progressively constructs reliable identity clusters while dynamically updating the memory dictionary. Adaptive Outlier Filtering (AdaOF) further exploits informative outlier samples, primarily caused by heavy occlusion or extreme illumination, as hard negatives to enhance contrastive representation learning. Extensive experiments on two widely used Re-ID benchmarks (Market-1501 and MSMT17), as well as an in-house Substation Worker Re-ID (SWRID) dataset, demonstrate that AdaInCV achieves state-of-the-art performance with comparable or fewer training epochs than existing methods, establishing a practical foundation for intelligent safety supervision in power grid operations.

Keywords:

unsupervised re-identification; contrastive learning; substation safety monitoring; curriculum learning; power grid

1. Introduction

Power grid infrastructure underpins modern industry and daily life, with high-voltage substations serving as critical nodes that regulate electricity transmission and distribution. Within these environments, field workers perform routine inspection and maintenance tasks in close proximity to energized equipment, where safety violations—including unauthorized entry into restricted zones or improper grounding operations—can result in severe equipment damage, widespread power outages, or fatal injuries. Ensuring worker safety therefore represents not only an operational priority but a societal imperative.

Modern substations are increasingly equipped with multi-camera surveillance networks and body-worn sensing devices that continuously generate large volumes of visual data. However, extracting actionable safety intelligence from these data remains difficult. Manual inspection of large-scale video streams is impractical, conventional rule-based video analytics systems suffer from high false alarm rates, and simple object detection-based approaches are prone to both false positives and missed detections in cluttered substation environments. More critically, existing methods cannot associate the same individual across multiple camera views. As a result, they fail to provide a coherent spatiotemporal understanding of worker behavior, particularly for safety events that are only partially observable from a single viewpoint. This gap between data availability and analytical capability motivates the development of more principled and robust automated approaches. From a deployment perspective, practical substation monitoring systems must also satisfy the hardware constraints of grid-edge infrastructure: low-power embedded processors and bandwidth-limited networks make cloud-offloaded inference impractical, so perception models must remain computationally lightweight. Furthermore, grid safety standards such as IEC 61850 [1] (substation communication and automation) and IEEE 1686 [2] (intelligent electronic device security) constrain the integration of third-party software into control environments, which favors minimal-footprint, annotation-free approaches that can be validated and certified without relying on proprietary labeled data pipelines.

Person re-identification (Re-ID) offers a compelling solution to this cross-camera association problem. Formally, Re-ID aims to retrieve all visual instances of a given individual across a distributed camera network. Unsupervised person Re-ID, which requires no manual identity annotations, is especially attractive for real-world industrial settings where labeled training data are difficult or costly to obtain. Despite rapid progress in general-purpose surveillance scenarios [3], applying unsupervised Re-ID to substation environments poses unique and underappreciated challenges. First, dense equipment layouts and confined workspaces introduce persistent, structured occlusion that creates substantial intra-class appearance variation for each worker. Second, mandated safety uniforms, including standardized helmets and protective clothing, severely reduce inter-person discriminability and make fine-grained recognition harder [4]. Third, substations span heterogeneous lighting conditions across outdoor, indoor, and underground zones, amplifying appearance inconsistency for the same individual across views. Together, these factors make substation Re-ID a significantly harder problem than existing benchmarks suggest.

Existing unsupervised Re-ID methods fall into two major paradigms. The first, unsupervised domain adaptation (UDA) [5,6,7,8,9], transfers knowledge from a labeled source domain to an unlabeled target domain through a two-stage pre-training and fine-tuning procedure. The second, pure unsupervised learning (USL) [10,11,12,13,14,15], directly learns discriminative representations from entirely unlabeled data, typically by assigning pseudo-labels via clustering and using them for contrastive or classification-based training. This work focuses on the USL paradigm, where representation quality is central to performance. A common design in recent USL methods is to maintain a memory dictionary of instance-level features for contrastive learning. Depending on how this dictionary is updated, existing approaches fall broadly into two groups: those that treat all pseudo-labeled samples within a cluster as equally reliable [8,13], and those that prioritize the hardest positive samples to enhance intra-class compactness [10,12]. Both strategies have inherent limitations. Equal-sample selection is susceptible to pseudo-label noise and often converges slowly. Hard-sample selection, while intuitively appealing, is particularly vulnerable in early training stages when pseudo-labels are unreliable. In such cases, the hardest samples are often the noisiest, and their inclusion can distort cluster representations, a problem that becomes more pronounced on large-scale and challenging datasets such as MSMT17 [16]. This trade-off between sample informativeness and reliability remains a central challenge in unsupervised Re-ID.

To address this trade-off, we draw on the principle of curriculum learning [17], which organizes training samples from easy to hard in accordance with the model’s evolving capability. Existing curriculum and self-paced learning strategies [18], however, operate at the dataset level and apply a uniform difficulty schedule to all training instances. This coarse granularity is ill-suited to unsupervised Re-ID, where clusters exhibit markedly different intra-class densities and difficulty profiles. We argue that sample selection should instead be performed adaptively at the cluster level, guided by a continuous estimate of the model’s current discriminative capability for each cluster.

In this work, we propose AdaInCV, an unsupervised Re-ID framework built around adaptive intra-class variation estimation. For each cluster, we quantify the model’s learning progress by measuring the similarity gap between the hardest and easiest positive pairs—a large gap indicates underdeveloped representations, while a small gap signals sufficient learning. This signal drives two novel components: (1) Adaptive Sample Mining (AdaSaM), which continuously calibrates the difficulty of samples selected for memory dictionary updates at the cluster level; and (2) Adaptive Outlier Filtering (AdaOF), which dynamically incorporates informative outliers—instances typically discarded by existing methods—as hard negatives to strengthen contrastive learning. Rather than treating outliers as noise to be eliminated, AdaOF leverages them selectively when the model is capable of learning from their challenging characteristics, such as severe occlusion or extreme illumination.

We validate AdaInCV on two large-scale public Re-ID benchmarks and on a newly collected in-house Substation Worker Re-ID (SWRID) dataset that reflects the real-world challenges described above. Experimental results demonstrate that AdaInCV achieves competitive performance on standard benchmarks while showing strong generalization to the demanding substation setting, supporting its practical potential for intelligent safety supervision in power grid operations. The main contributions of this work are as follows:

We identify adaptive, cluster-level sample difficulty estimation as a missing and critical capability in unsupervised Re-ID, and propose AdaInCV as a principled framework to address it.
We introduce AdaSaM, which performs cluster-aware difficulty-calibrated sample mining for memory updates, and AdaOF, which dynamically integrates informative outliers as hard negatives based on the model’s learning state.
Extensive experiments on MSMT17, Market-1501, and the SWRID dataset demonstrate the effectiveness and practical applicability of AdaInCV in both general and industrial-safety Re-ID scenarios.

2. Materials and Methods

2.1. Problem Formulation

Let $X = \{x_{1}, x_{2}, \dots, x_{N}\}$ denote an unlabeled dataset of worker images captured from multiple fixed cameras and body-worn recorders across a substation site, without identity annotations. The objective is to learn a feature extractor $f_{\theta}(\cdot)$ that maps images of the same worker, despite variations in viewpoint, occlusion level, and illumination, to nearby embeddings while keeping different identities separable. Such a formulation naturally supports safety monitoring applications by enabling cross-camera identity association, continuous activity tracking, and multi-view violation verification.

However, the highly heterogeneous visual conditions in substation environments lead to significant variations in intra-class feature distributions across pseudo-clusters. For instance, a cluster corresponding to a worker captured in well-lit outdoor areas may exhibit compact feature distributions, whereas clusters formed from mixed environments (e.g., outdoor substations and underground cable trenches) tend to be highly dispersed. Under such conditions, applying a uniform global training strategy is suboptimal, as it fails to account for cluster-specific learning dynamics. To address this issue, AdaInCV introduces a per-cluster adaptive mechanism that dynamically adjusts training difficulty according to the intra-class variation of each pseudo-cluster.

2.2. Pipeline Overview

As shown in Figure 1, the proposed framework incorporates an adaptive sample mining strategy inspired by curriculum learning, which enables the model to select samples with appropriate difficulty levels according to the learning status of each cluster. This mechanism improves the quality of cluster-specific feature updates.

Each training epoch consists of the following steps. (1) L2-normalized embeddings are computed for all images using the current student network. (2) DBSCAN clustering is performed to generate pseudo-labels. (3) The memory dictionary is initialized or updated using cluster centroid features. (4) For each mini-batch, reliable positive and negative samples are selected through the Adaptive Sample Mining and Adaptive Outlier Filter modules (as illustrated in Figure 1). The HybridNCE contrastive loss (Equation (2)) and the mean squared error (MSE) self-distillation loss (Equation (3)) are computed. (5) The student network is updated via back-propagation, while the teacher network is updated using an exponential moving average (EMA) strategy.

The centroid of the $k$-th cluster is defined as:

$$c_{k} = \frac{1}{|H_{k}|} \sum_{x_{i} \in H_{k}} f_{\theta}(x_{i}) \tag{1}$$

where $H_{k}$ denotes the set of samples belonging to cluster $k$.
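Equation (1) is a per-cluster mean over the features assigned to each pseudo-label. The sketch below is a minimal, dependency-free illustration that assumes features arrive as plain lists of floats and that un-clustered samples carry the label -1 (the usual DBSCAN convention for noise); the function name `cluster_centroids` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def cluster_centroids(features: Sequence[Sequence[float]],
                      labels: Sequence[int]) -> Dict[int, List[float]]:
    """Eq. (1): average the features of every sample assigned to cluster k.

    Samples labelled -1 (DBSCAN outliers, by assumption) belong to no H_k
    and are skipped.
    """
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for feat, label in zip(features, labels):
        if label < 0:
            continue
        if label not in sums:
            sums[label] = [0.0] * len(feat)
        # Accumulate element-wise sums for this cluster.
        sums[label] = [s + x for s, x in zip(sums[label], feat)]
        counts[label] += 1
    # Divide by |H_k| to obtain the centroid c_k.
    return {k: [s / counts[k] for s in sums[k]] for k in sums}
```

For example, features `[[1, 0], [0, 1], [1, 1], [5, 5]]` with labels `[0, 0, 1, -1]` yield centroids `{0: [0.5, 0.5], 1: [1.0, 1.0]}`; the outlier does not contribute.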

To enhance the diversity of negative samples, the proposed HybridNCE loss incorporates adaptively filtered outliers $I_{j}$ in addition to the cluster representations:

$$L_{hybrid} = -\log \frac{\exp\left( \langle q, \Phi^{+} \rangle / \tau \right)}{\sum_{k=1}^{n_{c}} \exp\left( \langle q, \Phi_{k} \rangle / \tau \right) + \sum_{j=1}^{n_{o}} \exp\left( \langle q, I_{j} \rangle / \tau \right)} \tag{2}$$

where $\Phi_{k}$ is the memory representation of the $k$-th cluster and $I_{j}$ is the $j$-th outlier retained by the filter according to the current model’s capability. For a query person image $q$, $\Phi^{+}$ denotes the feature of the positive cluster to which $q$ belongs. The temperature $\tau$ is empirically set to 0.05, and $\langle \cdot, \cdot \rangle$ denotes the inner product between two feature vectors, used to measure their similarity. $n_{c}$ is the number of clusters and $n_{o}$ is the number of un-clustered (outlier) instances retained as negatives.
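The loss in Equation (2) is a softmax cross-entropy whose denominator pools cluster logits and outlier logits. The following is a minimal sketch for a single query, assuming L2-normalized features given as Python lists; the helper names `dot`, `logsumexp`, and `hybrid_nce` are hypothetical, and the log-sum-exp form is used only for numerical stability.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product <a, b>; equals cosine similarity for L2-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) over a non-empty sequence."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def hybrid_nce(q: Sequence[float],
               cluster_memory: List[Sequence[float]],
               pos_index: int,
               outliers: List[Sequence[float]],
               tau: float = 0.05) -> float:
    """HybridNCE loss of Eq. (2) for one query feature q.

    cluster_memory holds the n_c entries Phi_k, pos_index selects Phi^+,
    and outliers holds the n_o features I_j used as extra negatives.
    """
    cluster_logits = [dot(q, phi) / tau for phi in cluster_memory]
    outlier_logits = [dot(q, o) / tau for o in outliers]
    # -log(exp(pos) / (sum over clusters + sum over outliers))
    return logsumexp(cluster_logits + outlier_logits) - cluster_logits[pos_index]
```

Adding an outlier that is moderately similar to the query enlarges the denominator and therefore raises the loss, which is how the retained outliers act as hard negatives.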

Inspired by MoCo [19], we adopt a teacher–student framework composed of a teacher $f_{\theta_{t}}(\cdot)$ and a student $f_{\theta_{s}}(\cdot)$. The student is optimized via back-propagation, while the teacher is updated as an EMA of the student. The model is trained in a self-supervised manner by minimizing the mean squared error (MSE) between the class probability distributions $P$ predicted by the two networks:

$$L_{mse} = \left\| P_{t}(x_{i}) - P_{s}(x_{i}) \right\|^{2} \tag{3}$$

where $P_{s}(x_{i})$ and $P_{t}(x_{i})$ denote the class probability distributions predicted by the student and teacher networks for image $x_{i}$, respectively.
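The self-distillation term, the combined objective of Equation (4), and the EMA teacher update are each a one-line computation. The sketch below treats the probability distributions as given vectors, because the text does not specify how $P$ is produced, and models network parameters as a flat list of floats; the teacher momentum of 0.999 is a placeholder assumption, since only the memory momentum is reported. All function names are hypothetical.

```python
from typing import List, Sequence


def mse_distillation(p_teacher: Sequence[float], p_student: Sequence[float]) -> float:
    """Eq. (3): squared L2 distance between teacher and student distributions."""
    return sum((t - s) ** 2 for t, s in zip(p_teacher, p_student))


def total_loss(l_hybrid: float, l_mse: float, lambda_m: float = 1.0) -> float:
    """Eq. (4): contrastive term plus the weighted self-distillation term."""
    return l_hybrid + lambda_m * l_mse


def ema_update(teacher: Sequence[float], student: Sequence[float],
               momentum: float = 0.999) -> List[float]:
    """Teacher <- momentum * teacher + (1 - momentum) * student, per parameter.

    The teacher momentum is not stated in the text; 0.999 is a placeholder.
    """
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]
```

With `lambda_m = 1.0`, as used in the experiments, both terms enter the objective with equal weight.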

The overall training objective combines contrastive learning and self-distillation:

$$L_{total} = L_{hybrid} + \lambda_{m} L_{mse} \tag{4}$$

where $\lambda_{m}$ is a balancing hyperparameter controlling the relative weight of the self-distillation loss $L_{mse}$ (Equation (3)). In our experiments, $\lambda_{m}$ is set to 1.0, as preliminary experiments showed that $L_{hybrid}$ and $L_{mse}$ contribute at comparable magnitudes, making equal weighting a natural and stable choice.

2.3. Adaptive Model Capability Acquisition

For a mini-batch containing $B$ identities, each with $K$ instances, we denote the normalized feature of the $k$-th sample in class $i$ as $F_{i}^{k} = f_{i}^{k} / \| f_{i}^{k} \|$. Existing memory update strategies exhibit complementary limitations: average-based methods (e.g., SpCL, CC) fail to emphasize hard samples in later stages, while hardest-sample strategies (e.g., ICE, HDCRL) are prone to noise in early training, when the model is not yet sufficiently discriminative. To address this trade-off, we introduce a curriculum-inspired adaptive similarity formulation that dynamically balances easy and hard positive pairs according to intra-class variation.

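From a geometric perspective, the hardest positive pair determines the radius of the largest hypersphere enclosing all samples of class $i$, reflecting the worst-case intra-class compactness. Its similarity is approximated by a log-sum-exp (soft minimum) over all $K^{2}$ pairwise similarities:

$$\mathrm{Sim}_{h,i}^{+} \approx -\tau \log \left( \sum_{n=1}^{K} \sum_{m=1}^{K} \exp\left( -\frac{F_{i}^{n} \cdot F_{i}^{m}}{\tau} \right) \right) \tag{5}$$

which tends to $\min_{n,m} F_{i}^{n} \cdot F_{i}^{m}$ as $\tau \to 0$.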

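Conversely, each sample forms a local hypersphere with its farthest positive, and the aggregation of these spheres characterizes easier intra-class relations. The least-hard positive similarity, capturing the best-case intra-class compactness, is defined as:

$$\mathrm{Sim}_{lh,i}^{+} \approx \tau \log \left( \sum_{n=1}^{K} \frac{1}{\sum_{m=1}^{K} \exp\left( -\frac{F_{i}^{n} \cdot F_{i}^{m}}{\tau} \right)} \right) \tag{6}$$

that is, a soft maximum over samples of each sample’s soft-minimum similarity to its positives, which tends to $\max_{n} \min_{m} F_{i}^{n} \cdot F_{i}^{m}$ as $\tau \to 0$.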
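Equations (5) and (6) are nested soft-min and soft-max operations over the $K \times K$ cosine-similarity matrix of one class. A minimal sketch, assuming per-class features as lists of floats and $\tau = 0.05$ as in Equation (2); the function name `hardest_and_least_hard` is hypothetical. Rewriting $1 / \sum_{m} \exp(\cdot)$ as $\exp(\text{soft-min}_{n} / \tau)$ turns Equation (6) into a log-sum-exp, which keeps the computation stable.

```python
import math
from typing import List, Sequence, Tuple


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _logsumexp(values: Sequence[float]) -> float:
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def hardest_and_least_hard(features: Sequence[Sequence[float]],
                           tau: float = 0.05) -> Tuple[float, float]:
    """Return (Sim_h, Sim_lh) of Eqs. (5) and (6) for the K features of one class."""
    F = [_normalize(f) for f in features]
    sims = [[sum(a * b for a, b in zip(fn, fm)) for fm in F] for fn in F]
    # Eq. (5): soft minimum over all K*K pairwise similarities.
    sim_h = -tau * _logsumexp([-s / tau for row in sims for s in row])
    # Eq. (6), inner part: soft minimum over m for each sample n (its farthest positive).
    per_sample = [-tau * _logsumexp([-s / tau for s in row]) for row in sims]
    # Eq. (6), outer part: soft maximum of those per-sample values.
    sim_lh = tau * _logsumexp([v / tau for v in per_sample])
    return sim_h, sim_lh
```

For $K$ identical features, $\mathrm{Sim}_{lh}$ evaluates to exactly 1 while $\mathrm{Sim}_{h} = 1 - \tau \ln K^{2}$, showing the small downward bias of the soft-minimum approximation.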

In practice, large intra-class variations (e.g., illumination changes, occlusion, or viewpoint shifts) often lead to unreliable hardest pairs, particularly in early training stages. In contrast, smaller variations indicate that the model is sufficiently discriminative and can benefit from harder samples. To adaptively balance these two regimes, we exploit the discrepancy between $\mathrm{Sim}_{h,i}^{+}$ and $\mathrm{Sim}_{lh,i}^{+}$ as a proxy for intra-class variation and define an adaptive weight via their harmonic mean. The harmonic mean is preferred over the arithmetic mean because it is more sensitive to asymmetry between the two terms. When intra-class variation is large, the similarity of the hardest pair decreases substantially, whereas the similarity of the least-hard pair may remain relatively high. In such cases, the arithmetic mean tends to overestimate cluster reliability. By contrast, the harmonic mean penalizes this imbalance by biasing the result toward the smaller value, thereby producing a lower weight $\alpha_{i}$. This conservative weighting strategy delays the emphasis on hard samples until the model has learned sufficiently discriminative representations:

$$h_{i} = \frac{2\, \mathrm{Sim}_{lh,i}^{+}\, \mathrm{Sim}_{h,i}^{+}}{\mathrm{Sim}_{lh,i}^{+} + \mathrm{Sim}_{h,i}^{+}}, \qquad \alpha_{i} = \begin{cases} h_{i}, & \mathrm{Sim}_{h,i}^{+} \geq 0 \\ 0, & \text{otherwise} \end{cases} \tag{7}$$

Finally, the weighted positive similarity is formulated as:

$$\mathrm{Sim}_{i}^{+} = \alpha_{i}\, \mathrm{Sim}_{h,i}^{+} + (1 - \alpha_{i})\, \mathrm{Sim}_{lh,i}^{+} \tag{8}$$

This formulation enables a smooth transition from easy-sample dominance in early training to hard-sample emphasis in later stages, thereby improving robustness against noisy positives while maintaining strong discriminative learning.
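Equations (7) and (8) reduce to a guarded harmonic mean followed by a convex combination. The sketch below is a minimal illustration with hypothetical names; returning 0 when both similarities are zero is an added guard for the undefined harmonic mean and is not stated in the text.

```python
def adaptive_weight(sim_h: float, sim_lh: float) -> float:
    """Eq. (7): harmonic mean of Sim_lh and Sim_h, set to 0 when Sim_h < 0."""
    if sim_h < 0:
        return 0.0
    denom = sim_lh + sim_h
    if denom <= 0:
        # Both similarities are zero: harmonic mean undefined (added guard).
        return 0.0
    return 2.0 * sim_lh * sim_h / denom


def weighted_positive_similarity(sim_h: float, sim_lh: float) -> float:
    """Eq. (8): alpha_i * Sim_h + (1 - alpha_i) * Sim_lh."""
    alpha = adaptive_weight(sim_h, sim_lh)
    return alpha * sim_h + (1.0 - alpha) * sim_lh
```

For a scattered cluster with $\mathrm{Sim}_{h} = 0.5$ and $\mathrm{Sim}_{lh} = 1.0$, the weight is $2/3$; for a compact cluster with $\mathrm{Sim}_{h} = 0.95$ and $\mathrm{Sim}_{lh} = 0.98$, the weight rises to about 0.965, so the result leans toward the hardest-pair similarity.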

2.4. Adaptive Sample Mining (AdaSaM)

Unlike instance-level memory dictionaries, our memory-based feature dictionary stores cluster-level representations, where each entry corresponds to a pseudo-label (i.e., a cluster). To adapt the memory update to intra-class variability, we measure the dispersion of samples within each cluster based on the weighted positive pair similarity defined in the previous section.

Specifically, we quantify the intra-class variation by normalizing the relative position of the current cluster similarity $\mathrm{Sim}_{i}^{+}$ between the hardest and least-hard positive similarities, yielding a difficulty score:

$$\mathrm{diff}_{i} = \frac{\mathrm{Sim}_{i}^{+} - \mathrm{Sim}_{h,i}^{+}}{\mathrm{Sim}_{lh,i}^{+} - \mathrm{Sim}_{h,i}^{+}} \tag{9}$$

This formulation provides an interpretable measure of intra-class compactness. When $\mathrm{diff}_{i} \approx 1$, the cluster exhibits tight and well-separated representations, indicating that the model is sufficiently discriminative and can benefit from emphasizing the hardest positive samples. In contrast, when $\mathrm{diff}_{i} \approx 0$, the cluster is highly scattered, typically due to occlusion, illumination variation, or viewpoint changes, suggesting that relying on hard samples may introduce noise into the memory.
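Equation (9) is a min-max normalization of $\mathrm{Sim}_{i}^{+}$ onto the interval spanned by the hardest and least-hard similarities. A minimal sketch with a hypothetical function name; raising an error when the two similarities coincide is an assumption, because the text does not define the score in that case.

```python
def difficulty_score(sim_i: float, sim_h: float, sim_lh: float,
                     eps: float = 1e-12) -> float:
    """Eq. (9): position of Sim_i^+ between Sim_h^+ (maps to 0) and Sim_lh^+ (maps to 1).

    Algebraically, when Sim_i^+ comes from Eq. (8) this equals 1 - alpha_i.
    """
    span = sim_lh - sim_h
    if abs(span) < eps:
        # Not defined in the text when both similarities coincide.
        raise ValueError("diff_i is undefined when Sim_lh == Sim_h")
    return (sim_i - sim_h) / span
```

A similarity halfway between the two bounds, e.g. $\mathrm{Sim}_{i}^{+} = 0.75$ with bounds 0.5 and 1.0, gives a score of 0.5.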

To address this, we design a difficulty-aware sampling strategy that adaptively selects the update feature according to the current cluster state. Instead of always using the hardest or average sample, we interpolate between them by ranking samples according to their similarity to the cluster centroid. Concretely, we define a selection coefficient:

$$\beta_{i} = \begin{cases} 1, & \text{if } \mathrm{round}\left( \mathrm{Sim}_{lh,i}^{+} / \mathrm{Sim}_{h,i}^{+} \right) = 1 \\ \mathrm{diff}_{i}, & \text{otherwise} \end{cases} \tag{10}$$

which determines the relative position of the selected sample in the ranked list.

The memory entry $\Phi_{i}$ of cluster $i$ is then updated via a momentum scheme:

$$\Phi_{i} \leftarrow m\, \Phi_{i} + (1 - m)\, q_{\mathrm{rank}_{i}}^{\beta_{i} \cdot N} \tag{11}$$

where $m$ is the memory momentum and $q_{\mathrm{rank}_{i}}^{\beta_{i} \cdot N}$ denotes the $(\beta_{i} \cdot N)$-th sample after sorting all $N$ instances in cluster $i$ by their similarity to the cluster centroid in descending order.

This mechanism enables a smooth transition from easy-sample-dominated updates in early training (low $\mathrm{diff}_{i}$) to hard-sample-focused updates as the model becomes more robust (high $\mathrm{diff}_{i}$). Consequently, it mitigates error accumulation caused by noisy hard samples while preserving the discriminative benefits of challenging examples in later stages.
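Equations (10) and (11) select one instance per cluster by rank and blend it into the memory. The sketch below is a minimal illustration with hypothetical names. Three details are assumptions rather than statements from the text: the $\beta_{i} \cdot N$ position is mapped to a 1-indexed rank with a ceiling and clamped to $[1, N]$; the ratio in Equation (10) is skipped when $\mathrm{Sim}_{h} = 0$; and Python's `round` uses round-half-to-even, which may differ from the rounding the authors used.

```python
import math
from typing import List, Sequence


def _cos(a: Sequence[float], b: Sequence[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def selection_coefficient(sim_h: float, sim_lh: float, diff: float) -> float:
    """Eq. (10): beta_i = 1 if round(Sim_lh / Sim_h) == 1, otherwise diff_i."""
    if sim_h != 0 and round(sim_lh / sim_h) == 1:
        return 1.0
    return diff


def select_update_feature(features: List[Sequence[float]],
                          centroid: Sequence[float],
                          beta: float) -> Sequence[float]:
    """Return the (beta * N)-th instance after sorting by centroid similarity (descending)."""
    ranked = sorted(features, key=lambda f: _cos(f, centroid), reverse=True)
    n = len(ranked)
    rank = min(n, max(1, math.ceil(beta * n)))  # 1-indexed; ceil + clamp is an assumption
    return ranked[rank - 1]


def momentum_update(memory_entry: Sequence[float], feature: Sequence[float],
                    m: float = 0.999) -> List[float]:
    """Eq. (11): Phi_i <- m * Phi_i + (1 - m) * q."""
    return [m * p + (1.0 - m) * q for p, q in zip(memory_entry, feature)]
```

With $\beta_{i} = 1$ the least centroid-similar (hardest) instance is chosen, with $\beta_{i} \to 0$ the most centroid-similar (easiest) one, and intermediate values interpolate along the ranked list.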

2.5. Adaptive Outlier Filter (AdaOF)

We further define a global difficulty indicator to characterize the overall learning status of the model. Specifically, the global difficulty score is computed as the average of all cluster-wise difficulty values:

$$\mathrm{diff}_{global} = \frac{1}{C} \sum_{i=1}^{C} \mathrm{diff}_{i} \tag{12}$$

where $C$ is the number of clusters.

This metric reflects the model’s current discriminative capability across all pseudo-labels and serves as a key signal for adaptive outlier utilization. In practice, clustering algorithms (e.g., DBSCAN) inevitably produce outliers, which often correspond to samples with extreme intra-class variations, such as severe occlusion, illumination inconsistency, or viewpoint changes. Rather than treating these samples as pure noise, we consider them as informative hard negatives that can enhance contrastive learning when appropriately incorporated.

To this end, we propose an adaptive outlier filtering (AdaOF) strategy guided by $\mathrm{diff}_{global}$. Specifically, we rank all samples (including outliers) according to their distance to cluster centroids in descending order. During early training stages, when $\mathrm{diff}_{global}$ is low and the model lacks robustness, we preferentially select samples that are farther from all clusters, as they are more likely to be reliable negative instances. As training progresses and $\mathrm{diff}_{global}$ increases, the selection criterion is gradually relaxed, allowing outliers to be incorporated from far to near, thereby increasing the diversity and hardness of negative samples.

To further stabilize training, we introduce a curriculum factor:

$$\gamma = \frac{\text{cur\_epoch}}{\text{total\_epoch}} \tag{13}$$

which modulates the influence of $\mathrm{diff}_{global}$. This design slows down outlier incorporation in early epochs, where intra-class variance is typically high, thus preventing noisy samples from prematurely contaminating the memory dictionary. As training progresses and the model becomes more robust, $\gamma$ increases, enabling a smoother and more reliable transition toward harder negative mining.

Notably, this curriculum factor is not limited to outlier handling; it is also integrated into the cluster-level difficulty estimation ($\mathrm{diff}_{i}$), forming a unified mechanism for progressive sample selection. This design is especially beneficial in scenarios with significant intra-class variability, while introducing negligible overhead for simpler distributions.
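The AdaOF procedure ranks outliers from far to near and admits more of them as $\gamma$ and $\mathrm{diff}_{global}$ grow, but the text does not give the exact admission rule. The sketch below therefore uses a hypothetical rule, keeping the first $\mathrm{round}(\gamma \cdot \mathrm{diff}_{global} \cdot n)$ ranked outliers, and measures each outlier's distance as one minus its cosine similarity to the nearest centroid; both choices, and all function names, are illustrative assumptions.

```python
import math
from typing import List, Sequence


def _cos(a: Sequence[float], b: Sequence[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def curriculum_factor(cur_epoch: int, total_epoch: int) -> float:
    """Eq. (13)."""
    return cur_epoch / total_epoch


def global_difficulty(diffs: Sequence[float]) -> float:
    """Eq. (12): mean of the cluster-wise difficulty scores."""
    return sum(diffs) / len(diffs)


def filter_outliers(outliers: List[Sequence[float]],
                    centroids: List[Sequence[float]],
                    diff_global: float,
                    gamma: float) -> List[Sequence[float]]:
    """Keep the farthest outliers first, admitting nearer ones as training matures.

    Hypothetical rule: keep round(gamma * diff_global * n) outliers,
    with the product clamped to [0, 1].
    """
    def distance_to_nearest_centroid(o: Sequence[float]) -> float:
        return min(1.0 - _cos(o, c) for c in centroids)

    ranked = sorted(outliers, key=distance_to_nearest_centroid, reverse=True)
    fraction = min(1.0, max(0.0, gamma * diff_global))
    return ranked[:round(fraction * len(ranked))]
```

Early in training (small $\gamma$) the filter returns few or no outliers, and those it returns are the ones farthest from every centroid, matching the far-to-near schedule described above.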

3. Results

3.1. Datasets

Market-1501 [20] consists of 32,668 annotated images of 1501 identities from 6 cameras, with 12,936 training images of 751 identities and 19,732 test images of 750 identities.

MSMT17 [16] is one of the largest publicly available Re-ID datasets, containing 126,441 images of 4101 identities from 15 cameras, with 32,621 training images and 93,820 testing images.

Substation Worker Re-ID (SWRID). To evaluate the proposed method in realistic industrial scenarios, we construct a dedicated dataset termed Substation Worker Re-ID (SWRID), collected at four substations. Each substation is equipped with 21 to 29 fixed high-definition (1080p) surveillance cameras deployed across outdoor transformer areas, indoor GIS rooms, and control buildings, for a total of 98 cameras. Data collection spans eight months, covering spring, summer, and autumn operating conditions, including varying natural illumination (daytime and dusk) and artificial lighting (night-shift indoor environments). All pedestrian crops are generated automatically using an off-the-shelf detector without any manual bounding box annotation.

We adopt a subject-disjoint training and testing protocol to rigorously assess cross-camera generalization. Specifically, 55 of the 79 identities (approximately 70%), drawn from all four substations, are used for training, yielding 8743 unlabeled images for unsupervised representation learning without access to identity annotations. The remaining 24 identities are reserved for evaluation, from which a gallery set of 2304 images (96 per identity) and a query set of 480 images (20 per identity) are constructed. Query images are sampled from camera views disjoint from the gallery. Training and evaluation identities are strictly non-overlapping, preventing any potential label leakage during unsupervised training. Note that SWRID is not publicly available due to the data security policies of State Grid Corporation of China, which limits direct reproducibility on this dataset. Researchers interested in accessing the dataset may contact the corresponding author to discuss formal data-sharing arrangements.

3.2. Implementation Details

Network and Training. We adopt ResNet-50 as the backbone encoder, initialized with parameters pre-trained on ImageNet. The network outputs 2048-dimensional L2-normalized feature representations via a global average pooling layer followed by batch normalization. The model is optimized using the Adam optimizer with a weight decay of $5 \times 10^{-4}$ and an initial learning rate of $3.5 \times 10^{-4}$. A linear warm-up strategy is applied during the first 10 epochs, with no subsequent learning rate decay. The model is trained for 70, 30, and 60 epochs on Market-1501, MSMT17, and SWRID, respectively. Each mini-batch contains 256 images, sampled from 16 pseudo-identities with 16 instances per identity. During training, images are first resized to 256 × 128, padded by 10 pixels, randomly cropped back to 256 × 128, randomly flipped horizontally (p = 0.5), and randomly erased (p = 0.5). During inference, only resizing to 256 × 128 is applied. In both cases, images are normalized using ImageNet mean and standard deviation (mean = [0.485, 0.456, 0.406], std = [0.229, 0.224, 0.225]).
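The batch composition (16 pseudo-identities × 16 instances = 256 images) and the warm-up schedule can be sketched as follows. This is a minimal stand-in, not the authors' sampler: it assumes outliers are labelled -1 and excluded, that clusters with fewer than 16 members are sampled with replacement, that an incomplete final group of identities is dropped, and that the warm-up ramps as `(epoch + 1) / 10` from a 0-indexed epoch. The function names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence


def pk_batches(labels: Sequence[int], num_ids: int = 16, num_instances: int = 16,
               seed: int = 0) -> Iterator[List[int]]:
    """Yield index batches of num_ids pseudo-identities x num_instances images."""
    rng = random.Random(seed)
    by_id: Dict[int, List[int]] = defaultdict(list)
    for idx, lab in enumerate(labels):
        if lab >= 0:  # skip outliers (label -1)
            by_id[lab].append(idx)
    ids = list(by_id)
    rng.shuffle(ids)
    # Drop the last group if it has fewer than num_ids identities (assumption).
    for start in range(0, len(ids) - num_ids + 1, num_ids):
        batch: List[int] = []
        for pid in ids[start:start + num_ids]:
            pool = by_id[pid]
            if len(pool) >= num_instances:
                batch.extend(rng.sample(pool, num_instances))
            else:  # small cluster: sample with replacement (assumption)
                batch.extend(rng.choices(pool, k=num_instances))
        yield batch


def warmup_lr(epoch: int, base_lr: float = 3.5e-4, warmup_epochs: int = 10) -> float:
    """Linear warm-up over the first 10 epochs, constant afterwards (no decay)."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    return base_lr
```

Sampling a fixed number of instances per pseudo-identity guarantees that every mini-batch contains $K = 16$ positives per class, which is what the intra-class statistics of Equations (5) and (6) are computed over.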

Clustering and Memory Update. Pseudo-labels are generated using DBSCAN [21], following commonly adopted settings in prior work. The neighborhood radius (eps) is set to 0.5, 0.7, and 0.65 for Market-1501, MSMT17, and SWRID, respectively. For SWRID, this value is selected following the same validation protocol as for the public benchmarks, reflecting its distinct cross-substation and cross-device data distribution. The exponential moving average (EMA) momentum for memory updates is fixed to m = 0.999, following standard practice in momentum-based representation learning.
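For readers without a clustering library at hand, the following is a plain DBSCAN over a precomputed pairwise distance matrix, using the per-dataset radii reported above. The distance metric used by the authors is not stated here, so the sketch accepts any precomputed matrix, and `min_samples = 4` is an assumed value that does not come from the text. Noise points receive the label -1.

```python
from collections import deque
from typing import List, Optional, Sequence

# Neighborhood radii reported in the text; min_samples is NOT reported (4 is assumed).
EPS_BY_DATASET = {"market1501": 0.5, "msmt17": 0.7, "swrid": 0.65}


def dbscan(dist: Sequence[Sequence[float]], eps: float, min_samples: int = 4) -> List[int]:
    """Standard DBSCAN on a precomputed distance matrix; noise is labelled -1."""
    n = len(dist)
    labels: List[Optional[int]] = [None] * n  # None = not yet visited
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if dist[i][j] <= eps]  # includes i itself
        if len(neighbors) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        labels[i] = cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            j_neighbors = [k for k in range(n) if dist[j][k] <= eps]
            if len(j_neighbors) >= min_samples:  # j is a core point: expand
                queue.extend(j_neighbors)
        cluster += 1
    return [lab if lab is not None else -1 for lab in labels]
```

Points that end up labelled -1 are exactly the un-clustered instances that the Adaptive Outlier Filter later ranks and selectively reuses as negatives.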

Considering the varying degrees of intra-class variation and outlier ratios across datasets, we incorporate the curriculum factor $\gamma = \text{cur\_epoch} / \text{total\_epoch}$ into the difficulty estimation to adaptively regulate the influence of hard samples during training. This factor is applied to the difficulty score $\mathrm{diff}_{i}$, enabling a progressive transition from conservative to more challenging sample selection.

Evaluation Metrics. Following standard person Re-ID protocols, we evaluate all methods using mean Average Precision (mAP) and Cumulative Matching Characteristic (CMC) accuracy at Rank-1, Rank-5, and Rank-10. mAP captures holistic retrieval quality, while CMC metrics reflect the probability of finding at least one correct match within the top-k results.
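Both metrics can be computed from a query-by-gallery distance matrix. The sketch below is a simplified illustration with a hypothetical function name: it omits the camera-based filtering of the standard Market-1501 and MSMT17 protocols, which also discards same-identity, same-camera gallery images, and it skips queries that have no true match in the gallery.

```python
from typing import Dict, Sequence, Tuple


def evaluate(dist: Sequence[Sequence[float]], query_ids: Sequence[int],
             gallery_ids: Sequence[int],
             topk: Tuple[int, ...] = (1, 5, 10)) -> Tuple[float, Dict[int, float]]:
    """Return (mAP, {k: CMC Rank-k}); simplified, without camera filtering."""
    cmc_hits = {k: 0 for k in topk}
    ap_sum, valid = 0.0, 0
    for qi, row in enumerate(dist):
        order = sorted(range(len(row)), key=lambda g: row[g])  # nearest first
        matches = [gallery_ids[g] == query_ids[qi] for g in order]
        if not any(matches):
            continue  # no true match in the gallery: query skipped
        valid += 1
        first = matches.index(True)
        for k in topk:
            if first < k:  # at least one correct match within the top-k
                cmc_hits[k] += 1
        hits, precisions = 0, []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precisions.append(hits / rank)  # precision at each correct match
        ap_sum += sum(precisions) / len(precisions)
    if valid == 0:
        raise ValueError("no query has a matching gallery identity")
    return ap_sum / valid, {k: cmc_hits[k] / valid for k in topk}
```

Average precision rewards every correct match according to its position, whereas CMC Rank-k only checks whether the first correct match appears within the top $k$, which is why the two metrics can move differently.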

Hardware. All experiments are conducted on a server equipped with 4 NVIDIA RTX 2080 Ti GPUs using the PyTorch framework (Version 1.11.0).

3.3. Comparison with State-of-the-Art Methods

We first evaluate the proposed method on widely used benchmarks, Market-1501 and MSMT17, with results summarized in Table 1. Our method achieves state-of-the-art performance among fully unsupervised learning (USL) methods, reaching 87.4% and 38.8% mAP on Market-1501 and MSMT17, respectively. It outperforms all existing USL methods on Market-1501 and all camera-agnostic USL methods on MSMT17. Compared with the baseline HDCRL, the proposed method improves mAP by 2.9% and 18.1%, respectively. In addition, it consistently surpasses existing contrastive learning-based USL approaches. For reference, we also include several UDA methods (marked with *) that leverage additional labeled source-domain data; while direct comparison is not strictly fair due to this extra supervision, our method nonetheless outperforms them, demonstrating that our adaptive contrastive learning strategy is competitive even against approaches with privileged access to labeled data. Compared with ISE, which requires generating auxiliary samples from cluster centroids, our method achieves superior performance without introducing extra synthetic samples. Unlike ICE [10] and CAP [11], our method does not exploit camera information; under this camera-agnostic setting, it achieves 38.8% mAP and 69.8% Rank-1 accuracy on MSMT17, clearly outperforming prior camera-agnostic methods.

To further validate its effectiveness in more challenging real-world scenarios, we conduct experiments on the SWRID dataset, as shown in Table 2. AdaInCV achieves 68.9% mAP and 80.2% Rank-1 accuracy, outperforming all baseline methods by a significant margin. Notably, the Hardest update strategy (ICE-style) yields the lowest performance on SWRID (41.3% mAP), with a more pronounced degradation than that observed on MSMT17. This observation supports our hypothesis that hardest-sample mining can be detrimental during early training stages, particularly for datasets with extreme intra-class variations caused by occlusion and multi-zone illumination. In contrast, the Adaptive strategy (without outlier filtering) already surpasses all fixed-strategy baselines. Further incorporating AdaOF leads to an additional 5.8% improvement in mAP, which can be attributed to the informative and discriminative cues contained in the high proportion of outlier samples in SWRID.

3.4. Ablation Study

The performance gains of Adaptive Intra-Class Variation Contrastive Learning (AdaInCV) mainly stem from the proposed Adaptive Sample Mining (AdaSaM) and Adaptive Outlier Filtering (AdaOF) strategies. To evaluate the contribution of each component, we conduct ablation studies on Market-1501 and MSMT17, as summarized in Table 3. For conciseness, Table 3 reports mAP and Rank-1, which are the primary indicators of overall retrieval quality and top-match accuracy respectively; these two metrics are most sensitive to changes in the memory update strategy and are therefore most informative for component-level analysis. The Rank-5 and Rank-10 trends are consistent with those observed in Table 1.

Among different memory update strategies, the proposed Adaptive method consistently outperforms CM, Hardest, and Linear. Specifically, CM updates the memory using all intra-class embedding features; Hardest selects the instance with the lowest cosine similarity to the query; Linear follows a curriculum learning paradigm with a fixed easy-to-hard progression; and Adaptive dynamically adjusts the sample selection strategy. As illustrated in Figure 2, these strategies differ in their sample selection mechanisms.

Even without outlier handling, AdaSaM yields the highest mAP among the update strategies, improving on CM by 1.1 and 3.0 percentage points on Market-1501 and MSMT17, respectively. Its advantage over Linear curriculum learning (87.0% vs. 86.1% and 35.7% vs. 34.3% mAP) suggests that fixed easy-to-hard schedules are suboptimal and that adaptive selection better matches the model’s learning dynamics. Although both the Linear and Adaptive strategies are inspired by curriculum learning, these results indicate that the Adaptive strategy selects more appropriate samples.

We attribute this to two opposing effects. In the early stages of training, selecting only the hardest samples may push the optimization in a biased direction, whereas focusing solely on easy samples provides too little signal to keep improving the model. Dynamically selecting samples that match the model’s current capability avoids both failure modes and keeps the optimization well directed and stable.

Notably, the Hardest strategy performs worst on MSMT17 (20.5% mAP, 42.8% Rank-1), although it remains competitive on Market-1501 (86.6% mAP), suggesting that selecting only the most difficult samples can mislead the model in early training, particularly on the more challenging dataset. In addition, directly incorporating all outliers slightly lowers mAP compared with AdaSaM alone (86.9% vs. 87.0% on Market-1501 and 35.5% vs. 35.7% on MSMT17), indicating that premature inclusion of noisy samples brings no benefit and can be detrimental. By contrast, AdaOF mitigates this issue through adaptive outlier integration, raising mAP to 87.4% and 38.8%, respectively.
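The ablation contrasts three ways of treating DBSCAN outliers: ignoring them, adding all of them, and adding them adaptively. The sketch below shows where such outliers could enter a cluster-level contrastive (InfoNCE) loss as extra negatives. The gate `adaptive_outlier_mask` is a hypothetical stand-in for AdaOF (the paper's rule is defined in Section 2 and may differ), and the temperature τ = 0.05 is an illustrative value.

```python
import numpy as np


def adaptive_outlier_mask(outlier_feats, centroids, threshold):
    """HYPOTHETICAL AdaOF-style gate, not the paper's exact rule.

    Keeps an outlier as a negative only if its highest cosine similarity to any
    cluster centroid is below `threshold`; outliers lying very close to a
    centroid are likely the same identity and would act as false negatives.
    """
    return (outlier_feats @ centroids.T).max(axis=1) < threshold


def cluster_nce_loss(query, centroids, label, outlier_feats=None,
                     include_mask=None, tau=0.05):
    """InfoNCE-style loss of one L2-normalized query against cluster centroids.

    The positive key is centroids[label]; the other centroids, plus any outlier
    features selected by `include_mask`, serve as negatives.
    """
    keys = centroids
    if outlier_feats is not None and include_mask is not None:
        keys = np.vstack([centroids, outlier_feats[include_mask]])
    logits = keys @ query / tau
    logits = logits - logits.max()  # numerical stability
    log_prob = logits[label] - np.log(np.exp(logits).sum())
    return float(-log_prob)
```

Raising `threshold` as training progresses would admit more outliers once the encoder matures, which is one way to avoid the premature inclusion penalized in the "All" row; this schedule is likewise an assumption.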

In terms of training efficiency, our method requires a comparable or smaller number of training epochs while achieving superior accuracy. We note that epoch count is used here as a proxy for convergence speed; learning curves are not reported in this study. As shown in Table 4, on Market-1501 our method achieves the best mAP (87.4%) after 70 epochs, the same budget as ISE. Notably, when trained for the same 50 epochs as ClusterContrast, our method already reaches a higher mAP (83.2%) than ClusterContrast’s converged result (82.6%). On the more challenging MSMT17 dataset, our method converges in 30 epochs, fewer than any compared method, while outperforming all compared unsupervised state-of-the-art methods in final accuracy. These results indicate that AdaInCV learns discriminative representations with improved training efficiency.
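Because epoch counts alone hide differences in epoch length, the sketch below converts Table 4 into total optimization steps. It assumes that "Iters" denotes iterations per epoch and ignores batch size and the per-epoch clustering cost, so it is only a rough budget comparison.

```python
# (epochs, iterations per epoch) as listed in Table 4.
# ASSUMPTION: "Iters" means iterations per epoch.
TABLE4 = {
    "Market-1501": {"SpCL": (50, 400), "ICE": (50, 400), "ClusterContrast": (50, 200),
                    "ISE": (70, 200), "HDCRL": (120, 200), "AdaInCV": (70, 200)},
    "MSMT17": {"SpCL": (50, 800), "ICE": (50, 400), "ClusterContrast": (50, 400),
               "ISE": (50, 400), "HDCRL": (120, 400), "AdaInCV": (30, 400)},
}


def total_iterations(table):
    """Multiply epochs by iterations per epoch for every method and dataset."""
    return {ds: {m: e * i for m, (e, i) in rows.items()} for ds, rows in table.items()}


totals = total_iterations(TABLE4)
for ds, rows in totals.items():
    # List methods from the smallest to the largest training budget.
    for method, steps in sorted(rows.items(), key=lambda kv: kv[1]):
        print(f"{ds:12s} {method:16s} {steps:6d}")
```

Under that assumption, AdaInCV's MSMT17 run uses 12,000 iterations versus 20,000 for ClusterContrast, ICE, and ISE, while its 87.4% Market-1501 run uses 14,000, the same as ISE and more than ClusterContrast's 10,000.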

3.5. Occlusion and Illumination Robustness

To systematically evaluate AdaInCV under varying levels of difficulty, we partition the SWRID test set into three subsets based on occlusion severity and illumination conditions: (a) low difficulty (≥60% body visibility under favorable lighting), (b) medium difficulty (20–60% body visibility or moderately constrained indoor environments), and (c) high difficulty (≤20% visibility and/or adverse lighting conditions). Table 5 compares AdaInCV with HDCRL across these subsets. Although ISE outperforms HDCRL overall on SWRID (Table 2), we select HDCRL for the subset-level comparison because it is a dynamic hybrid contrastive learning method designed specifically for hard-sample scenarios, making it the most directly comparable reference for a difficulty-stratified analysis.
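The subset definitions above can be expressed as a simple labeling rule. The sketch below is hypothetical: the `Sample` fields are assumed per-image annotations, and because the stated ranges share their endpoints (60% and 20% each appear in two bins), it assigns exactly 60% visibility to the low bin and exactly 20% to the medium bin.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    visibility: float                 # HYPOTHETICAL annotation: visible body fraction in [0, 1]
    adverse_lighting: bool            # HYPOTHETICAL annotation: adverse or extreme lighting
    constrained_indoor: bool = False  # HYPOTHETICAL annotation: moderately constrained indoor scene


def difficulty(s: Sample) -> str:
    """Map one test image to the low / medium / high difficulty subset."""
    # High: very low visibility and/or adverse lighting.
    if s.visibility < 0.2 or s.adverse_lighting:
        return "high"
    # Medium: partial visibility or a constrained indoor environment.
    if s.visibility < 0.6 or s.constrained_indoor:
        return "medium"
    # Low: mostly visible under favorable lighting.
    return "low"
```

Checking the high condition first ensures that adverse lighting alone places an image in the hardest subset, matching the "and/or" wording of criterion (c).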

AdaInCV’s Rank-1 advantage over HDCRL grows consistently with difficulty, with gains of +2.8, +9.6, and +20.1 percentage points on the low-, medium-, and high-difficulty subsets, respectively. This trend suggests that the proposed adaptive curriculum is particularly effective under severe occlusion and challenging lighting, where HDCRL’s accuracy drops most sharply (from 88.4% on the low-difficulty subset to 43.7% on the high-difficulty subset).
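The per-subset gains can be recomputed directly from the Rank-1 values in Table 5:

```python
# Rank-1 (%) from Table 5 as (HDCRL, AdaInCV) for each difficulty subset.
table5 = {
    "low": (88.4, 91.2),
    "medium": (71.3, 80.9),
    "high": (43.7, 63.8),
}

# Gain in percentage points, rounded to one decimal as reported in the table.
gains = {subset: round(ada - hd, 1) for subset, (hd, ada) in table5.items()}
print(gains)  # {'low': 2.8, 'medium': 9.6, 'high': 20.1}
```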

4. Discussion

The results across Market-1501, MSMT17, and SWRID consistently indicate that a key challenge in unsupervised person Re-ID lies in the large intra-class feature variation within pseudo-clusters. While this issue is relatively moderate in standard benchmarks such as Market-1501, it becomes more pronounced in MSMT17 due to its multi-camera and multi-condition setting, and is further exacerbated in SWRID by structured occlusion, appearance homogeneity, and extreme illumination variation.

The consistent improvements achieved by AdaInCV across all three datasets suggest that per-cluster adaptive curriculum learning is an effective and general way to control intra-class variance as conditions become progressively more challenging.

Failure Analysis of Hard-Sample Mining. The Hardest strategy exhibits a severe performance drop on SWRID (41.3% mAP for the ICE-style Hardest strategy vs. 68.9% for AdaInCV), indicating a critical failure mode. In substation scenarios, the “hardest” samples are often caused by illumination mismatch or occlusion rather than true semantic variation. Using such samples to update cluster centroids in early training distorts feature representations and propagates clustering errors. AdaInCV mitigates this issue by adaptively down-weighting unreliable clusters, delaying the influence of hard samples until more robust representations are learned.

Implications for Multi-View Safety Verification. A robust Re-ID model enables cross-camera retrieval of the same worker, addressing the limitations of single-view safety monitoring. Given a query from one camera, AdaInCV can retrieve corresponding instances from other views, facilitating more reliable multi-view analysis without requiring labeled data.
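A minimal sketch of this cross-camera retrieval step is shown below. It assumes embeddings from a trained encoder and per-image camera IDs from the surveillance metadata; `cross_camera_retrieve` is a hypothetical helper, not part of the authors' code. Gallery images from the query's own camera are excluded so that only other views are returned.

```python
import numpy as np


def cross_camera_retrieve(query_feat, query_cam, gallery_feats, gallery_cams, top_k=5):
    """Rank gallery images from other cameras by cosine similarity to the query.

    Returns (indices into the gallery, their cosine similarities), best first.
    """
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = g @ q
    candidates = np.flatnonzero(gallery_cams != query_cam)  # drop same-camera views
    ranked = candidates[np.argsort(-sims[candidates])]
    top = ranked[:top_k]
    return top, sims[top]
```

Filtering by camera before ranking means the top results are, by construction, other views of the scene, which is what a multi-view verification step would consume.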

Contribution of Adaptive Outlier Utilization. The adaptive inclusion of outliers in AdaOF not only improves contrastive learning but also emphasizes inherently ambiguous samples. These typically correspond to safety-critical scenarios (e.g., heavy occlusion or confined spaces), suggesting that the learned curriculum implicitly prioritizes difficult yet operationally important cases. A qualitative analysis of the visual characteristics and identity coverage of AdaOF-identified outliers is provided in Appendix A.

Limitations and Future Work. SWRID is limited to substations within a single region, and its generalization to diverse environments remains to be validated. Future work includes incorporating multi-modal data such as thermal imagery for illumination-invariant representation, and integrating Re-ID with downstream action recognition to enable automated safety violation detection.

5. Conclusions

We present AdaInCV, a fully unsupervised person Re-ID framework designed for substation worker safety monitoring. The key idea is to quantify intra-class feature variation within each pseudo-cluster after DBSCAN clustering, which implicitly reflects the degree of occlusion and illumination complexity associated with each identity. Based on this signal, AdaInCV performs cluster-level adaptive curriculum learning to control sample difficulty during training.
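One plausible way to quantify intra-class feature variation per pseudo-cluster is the mean cosine distance of its members to their normalized centroid, sketched below. This statistic is an assumption for illustration; the paper's own definition (Section 2) may differ. DBSCAN outliers (label -1) are skipped.

```python
import numpy as np


def intra_class_variation(features, labels):
    """Mean cosine distance of members to their L2-normalized cluster centroid.

    features: (n, d) L2-normalized embeddings.
    labels: (n,) DBSCAN labels, where -1 marks outliers (skipped).
    Returns {cluster_id: variation}; 0 means all members coincide.
    """
    variation = {}
    for cid in np.unique(labels):
        if cid == -1:
            continue
        members = features[labels == cid]
        centroid = members.mean(axis=0)
        centroid /= np.linalg.norm(centroid)
        variation[int(cid)] = float(np.mean(1.0 - members @ centroid))
    return variation
```

A higher value marks a cluster that the model has not yet consolidated, for instance because of occlusion or illumination changes; mapping it to a capability score (for example, one minus the normalized variation) would be a further, hypothetical design choice.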

This framework is realized through two complementary components: AdaSaM (Adaptive Sample Mining), which dynamically adjusts positive sample selection for memory updates, and AdaOF (Adaptive Outlier Filtering), which progressively incorporates informative outlier samples—primarily heavy-occlusion and extreme-illumination images—as hard negatives.

Extensive experiments on Market-1501, MSMT17, and the SWRID dataset demonstrate that AdaInCV achieves state-of-the-art mAP and Rank-1 accuracy among fully unsupervised learning (USL) methods. In particular, it yields substantial improvements under challenging conditions, including a +20.1 percentage-point gain in Rank-1 accuracy over HDCRL under high-occlusion and extreme-illumination settings. These results suggest that explicitly modeling cluster-wise difficulty is important for robust unsupervised Re-ID in complex real-world environments.

Overall, AdaInCV provides a practical solution for annotation-free worker re-identification in power grid scenarios, enabling reliable cross-camera identity association that can support downstream applications such as multi-view verification and safety monitoring.

Author Contributions

Conceptualization, L.L. and Z.D.; methodology, L.L. and Y.Z.; validation, L.L. and Z.C.; data curation, L.Z.; writing, L.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by State Grid Corporation of China Headquarters Project, ‘Research and Application of Key Technologies for Explainable Multi-object Recognition of On-site Operational Violations Based on Image-Text Models,’ under Grant 5700-202426249A-1-1-ZN.

Data Availability Statement

The SWRID dataset is subject to data security policies of State Grid Corporation of China. Requests for access may be directed to the corresponding author pending formal data-sharing agreement. Market-1501 and MSMT17 are publicly available at their respective project pages.

Acknowledgments

This research was supported by the Science and Technology Project of State Grid Corporation of China (5700-202426249A-1-1-ZN). The authors would like to thank all those who contributed to and assisted in the completion of this manuscript.

Conflicts of Interest

Authors Lingzhi Liu, Zexu Du and Yi Zhang are employed by the Institute of Artificial Intelligence, China Electric Power Research Institute Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AdaInCV	Adaptive Intra-Class Variation Contrastive Learning
AdaSaM	Adaptive Sample Mining
AdaOF	Adaptive Outlier Filter
Re-ID	Re-Identification
SWRID	Substation Worker Re-Identification Dataset
USL	Unsupervised Learning
UDA	Unsupervised Domain Adaptation
mAP	Mean Average Precision
EMA	Exponential Moving Average
DBSCAN	Density-Based Spatial Clustering of Applications with Noise
MSE	Mean Squared Error
NCE	Noise Contrastive Estimation

Appendix A. Qualitative Analysis of AdaOF: Outlier Appearance and Noise Assessment

Appendix A.1. Experimental Setup

To examine the visual characteristics of AdaOF-identified outliers and assess whether they constitute informative hard negatives or harmful noise, we conduct a qualitative analysis on Market-1501 (12,936 training images) using DBSCAN (ε = 0.5, min_samples = 4) with Jaccard distance on features extracted at Epoch 0 (ImageNet-pretrained ResNet-50, no Re-ID fine-tuning) and at convergence (final trained model). All hyperparameters are identical to those used during training.
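The clustering step in this setup can be approximated with scikit-learn's `DBSCAN` on a precomputed distance matrix, using the stated ε = 0.5 and min_samples = 4, as sketched below. The distance here is a simplified Jaccard distance between plain k-nearest-neighbour sets, not the k-reciprocal Jaccard distance with query expansion typically used in cluster-based Re-ID, and k = 20 is an arbitrary illustrative default.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def knn_jaccard_distance(features, k=20):
    """SIMPLIFIED Jaccard distance between k-nearest-neighbour sets.

    d(i, j) = 1 - |N_k(i) & N_k(j)| / |N_k(i) | N_k(j)|, where N_k includes the
    point itself. Ignores k-reciprocity and query expansion.
    """
    n = len(features)
    k = min(k, n)
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = feats @ feats.T
    knn = np.argsort(-sims, axis=1)[:, :k]
    member = np.zeros((n, n), dtype=bool)
    member[np.arange(n)[:, None], knn] = True  # row i marks N_k(i)
    inter = member.astype(np.int32) @ member.T.astype(np.int32)
    union = 2 * k - inter
    return 1.0 - inter / union


def cluster_with_outliers(features, eps=0.5, min_samples=4, k=20):
    """Run DBSCAN on the precomputed distance; label -1 marks outliers."""
    dist = knn_jaccard_distance(features, k=k)
    labels = DBSCAN(eps=eps, min_samples=min_samples,
                    metric="precomputed").fit_predict(dist)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    outlier_ratio = float(np.mean(labels == -1))
    return labels, n_clusters, outlier_ratio
```

Because DBSCAN marks every point outside a dense neighbourhood with -1, weak features (such as the ImageNet-pretrained ones at Epoch 0) naturally produce a large outlier ratio, which is the quantity tracked in the analysis below.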

Appendix A.2. Outlier Characteristics Across Training

As shown in Figure A1, at Epoch 0, DBSCAN assigns 87.8% of images (11,355/12,936) as outliers, consistent with the training log (11,346 outliers, 204 clusters). Visual inspection reveals that these samples are not corrupted or misdetected—they depict pedestrians under normal conditions, spanning diverse clothing, poses, and camera angles indistinguishable from the clustered population. Their outlier status reflects the inability of ImageNet-pretrained features to form identity-discriminative clusters, not any image-level defect. These samples carry genuine identity information awaiting feature maturation.

Figure A1. AdaOF outlier dynamics on Market-1501 over 80 training epochs.

After training, only 55 images (0.43%) remain as outliers—a 99.5% reduction. These final outliers exhibit a striking semantic regularity: nearly all depict individuals interacting with bicycles (riding, pushing, or parking), where occlusion of the lower body and viewpoint-dependent silhouette fragmentation defeat Jaccard-distance clustering. This is not random noise but a coherent, reproducible failure mode for appearance-based Re-ID.
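The outlier statistics reported in this appendix can be checked with a few lines of arithmetic:

```python
n_train = 12_936          # Market-1501 training images
outliers_epoch0 = 11_355  # DBSCAN outliers at Epoch 0
outliers_final = 55       # DBSCAN outliers at convergence

ratio_epoch0 = outliers_epoch0 / n_train
ratio_final = outliers_final / n_train
reduction = 1 - outliers_final / outliers_epoch0  # relative drop in outlier count

print(f"{ratio_epoch0:.1%}")  # 87.8%
print(f"{ratio_final:.2%}")   # 0.43%
print(f"{reduction:.1%}")     # 99.5%
```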

Appendix A.3. Identity-Level Analysis

Figure A2 presents a per-identity comparison: for each identity with a residual outlier at convergence, the clustered images of the same ground-truth identity show the person walking unobstructed. This supports two key properties of AdaOF: (1) no identity is wholly discarded, since each retains cluster coverage through its more discriminative views; (2) bicycle-occluded views are withheld from cluster prototypes and retained as hard negatives in the outlier pool, providing a contrastive signal without polluting the memory dictionary. The mAP improvement from 2.7% (Epoch 1) to 87.4% (convergence) is consistent with a memory dictionary that becomes more discriminative over training rather than accumulating noise.

Figure A2. Final Outliers Compared with Clustered Samples of the Same Ground-Truth Identity. The red boxes indicate outliers, while the blue boxes represent normally clustered image samples.

References

1. Solorio-García, M.Y.; Mata-López, W.A.; Álvarez-Flores, J.L.; Simón, J.; Castillo, V.H. Supporting Translation and Analysis of the Configuration of an Electrical Substation Automation System Based on the IEC 61850 2.0 Standard. Electricity 2026, 7, 15.
2. IEEE Std 1686-2022/Cor 1-2025 (Corrigendum to IEEE Std 1686-2022); IEEE Standard for Intelligent Electronic Devices Cybersecurity Capabilities—Corrigendum 1. IEEE: Piscataway, NJ, USA, 2026; pp. 1–13.
3. Ming, Z.; Zhu, M.; Wang, X.; Zhu, J.; Cheng, J.; Gao, C.; Yang, Y.; Wei, X. Deep learning-based person re-identification methods: A survey and outlook of recent works. Image Vis. Comput. 2022, 119, 104394.
4. Xiang, C.-Y.; Wu, X.; He, J.-Y.; Yuan, Z.; He, T. Person in Uniforms Re-Identification. ACM Trans. Multimed. Comput. Commun. Appl. 2024, 21, 61.
5. Chen, G.; Lu, Y.; Lu, J.; Zhou, J. Deep Credible Metric Learning for Unsupervised Domain Adaptation Person Re-identification. In Proceedings of the Computer Vision—ECCV 2020, Glasgow, UK, 23–28 August 2020; pp. 643–659.
6. Chen, H.; Lagadec, B.; Bremond, F. Enhancing Diversity in Teacher-Student Networks via Asymmetric branches for Unsupervised Person Re-identification. In Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2021; pp. 1–10.
7. Ge, Y.; Chen, D.; Li, H. Mutual mean-teaching: Pseudo label refinery for unsupervised domain adaptation on person re-identification. arXiv 2020, arXiv:2001.01526.
8. Ge, Y.; Zhu, F.; Chen, D.; Zhao, R. Self-paced contrastive learning with hybrid memory for domain adaptive object re-id. Adv. Neural Inf. Process. Syst. 2020, 33, 11309–11321.
9. Wu, Q.; Li, J.; Dai, P.; Ye, Q.; Cao, L.; Wu, Y.; Ji, R. Unsupervised Domain Adaptation on Person Reidentification via Dual-Level Asymmetric Mutual Learning. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 1371–1382.
10. Chen, H.; Lagadec, B.; Bremond, F. ICE: Inter-instance Contrastive Encoding for Unsupervised Person Re-identification. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 14940–14949.
11. Wang, M.; Lai, B.; Huang, J.; Gong, X.; Hua, X.-S. Camera-Aware Proxies for Unsupervised Person Re-Identification. Proc. AAAI Conf. Artif. Intell. 2021, 35, 2764–2772.
12. Cheng, D.; Zhou, J.; Wang, N.; Gao, X. Hybrid Dynamic Contrast and Probability Distillation for Unsupervised Person Re-Id. IEEE Trans. Image Process. 2022, 31, 3334–3346.
13. Dai, Z.; Wang, G.; Yuan, W.; Zhu, S.; Tan, P. Cluster Contrast for Unsupervised Person Re-identification. In Proceedings of the Computer Vision—ACCV 2022, Macao, China, 4–8 December 2022; pp. 319–337.
14. Wu, Y.; Huang, T.; Yao, H.; Zhang, C.; Shao, Y.; Han, C.; Gao, C.; Sang, N. Multi-Centroid Representation Network for Domain Adaptive Person Re-ID. Proc. AAAI Conf. Artif. Intell. 2022, 36, 2750–2758.
15. Zhang, X.; Li, D.; Wang, Z.; Wang, J.; Ding, E.; Shi, J.Q.; Zhang, Z.; Wang, J. Implicit Sample Extension for Unsupervised Person Re-Identification. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 7359–7368.
16. Wei, L.; Zhang, S.; Gao, W.; Tian, Q. Person Transfer GAN to Bridge Domain Gap for Person Re-identification. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 79–88.
17. Wang, X.; Chen, Y.; Zhu, W. A Survey on Curriculum Learning. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 4555–4576.
18. Kumar, M.; Packer, B.; Koller, D. Self-paced learning for latent variable models. Adv. Neural Inf. Process. Syst. 2010, 1, 1189–1197.
19. He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum Contrast for Unsupervised Visual Representation Learning. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 9726–9735.
20. Zheng, L.; Shen, L.; Tian, L.; Wang, S.; Wang, J.; Tian, Q. Scalable Person Re-identification: A Benchmark. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1116–1124.
21. Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD’96: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, OR, USA, 2–4 August 1996; AAAI Press: Washington, DC, USA, 1996; Volume 96, pp. 226–231.
22. Lee, G.; Lee, S.; Kim, D.; Shin, Y.; Yoon, Y.; Ham, B. Camera-Driven Representation Learning for Unsupervised Domain Adaptive Person Re-identification. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 11419–11428.
23. Liu, Z.; Liu, B.; Zhao, Z.; Chu, Q.; Yu, N. Dual-Uncertainty Guided Curriculum Learning and Part-Aware Feature Refinement for Domain Adaptive Person Re-Identification. In Proceedings of the ICASSP 2023—2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5.
24. Li, J.; Zhang, S. Joint Visual and Temporal Consistency for Unsupervised Domain Adaptive Person Re-identification. In Proceedings of the Computer Vision—ECCV 2020, Glasgow, UK, 23–28 August 2020; pp. 483–499.
25. Wang, D.; Zhang, S. Unsupervised Person Re-Identification via Multi-Label Classification. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 10978–10987.
26. Chen, H.; Wang, Y.; Lagadec, B.; Dantcheva, A.; Bremond, F. Joint Generative and Contrastive Learning for Unsupervised Person Re-identification. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 2004–2013.
27. Cho, Y.; Kim, W.J.; Hong, S.; Yoon, S.-E. Part-based Pseudo Label Refinement for Unsupervised Person Re-identification. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 7298–7308.
28. Peng, J.; Jiang, G.; Wang, H. Adaptive Memorization with Group Labels for Unsupervised Person Re-Identification. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 5802–5813.
29. Li, Y.; Tang, W.; Wang, S.; Qian, S.; Xu, C. Distribution-Guided Hierarchical Calibration Contrastive Network for Unsupervised Person Re-Identification. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 7149–7164.

Figure 1. Overview of the proposed framework. The model integrates Adaptive Sample Mining and an Adaptive Outlier Filter to construct reliable positive and negative pairs for contrastive learning.

Figure 2. Illustration of memory bank update mechanisms in the proposed method and existing mainstream approaches.

Table 1. Comparison of Re-ID methods on the Market-1501 and MSMT17 datasets.

Methods	Market-1501 mAP	Market-1501 Rank-1	Market-1501 Rank-5	Market-1501 Rank-10	MSMT17 mAP	MSMT17 Rank-1	MSMT17 Rank-5	MSMT17 Rank-10
CaCL * [22]	84.7	93.8	97.7	98.6	36.5	66.6	75.3	80.1
DUCL + PAFR * [23]	84.0	93.9	97.5	98.6	33.9	65.0	74.8	79.7
JVTC [24]	41.8	72.9	84.2	88.7	15.1	39.0	50.9	56.8
MMCL [25]	45.5	80.3	89.4	92.3	11.2	35.4	44.8	49.8
SpCL [8]	73.1	88.1	95.1	97.0	19.1	42.3	55.6	61.2
GCL [26]	66.8	87.3	93.5	95.5	21.3	45.7	58.6	64.5
ICE [10]	79.5	92.0	97.0	98.1	29.8	59.0	71.7	77.0
PPLR [27]	81.5	92.8	97.1	98.1	31.4	61.1	73.4	77.8
ISE [15]	84.7	94.0	97.8	98.8	35.0	64.7	75.5	79.4
ClusterContrast [13]	82.6	93.0	97.0	98.1	27.6	56.0	66.8	71.5
AdaMG [28]	84.6	93.9	97.9	98.9	38.0	66.3	76.9	80.6
DHCCN [29]	85.6	94.1	-	-	36.4	65.9	-	-
HDCRL (Baseline) [12]	84.5	93.5	97.6	98.6	20.7	43.8	55.1	60.1
AdaInCV	87.4	94.9	98.1	98.8	38.8	69.8	79.6	83.0

* denotes UDA approaches that utilize labeled source-domain data. Bold values indicate the best performance.

Table 2. Re-ID results on SWRID (camera-agnostic, fully unsupervised).

Method	mAP	Rank-1	Rank-5	Rank-10
SpCL [8]	48.3	64.7	80.1	85.9
ICE [10]	41.3	57.2	74.8	81.3
ClusterContrast [13]	55.7	70.4	84.2	89.0
ISE [15]	60.1	74.8	87.3	91.8
HDCRL [12]	57.4	72.1	85.6	90.4
Ours (AdaSaM Only)	63.1	76.5	89.2	93.1
Ours (AdaInCV Full)	68.9	80.2	91.4	94.8

Bold values indicate the best performance.

Table 3. Ablation study on Market-1501 and MSMT17 (camera-agnostic, fully unsupervised).

Update	Outliers	Market-1501 mAP	Market-1501 Rank-1	MSMT17 mAP	MSMT17 Rank-1
CM	None	85.9	93.7	32.7	61.8
Hardest	None	86.6	94.4	20.5	42.8
Linear	None	86.1	93.8	34.3	66.0
Adaptive	None	87.0	93.7	35.7	67.4
Adaptive	All	86.9	94.0	35.5	67.6
Adaptive	Adaptive	87.4	94.9	38.8	69.8

Bold values indicate the best performance.

Table 4. Comparison of epochs, iterations, and mAP in unsupervised person re-identification methods using contrastive learning.

Method	Market-1501 Epoch	Market-1501 Iters	Market-1501 mAP	MSMT17 Epoch	MSMT17 Iters	MSMT17 mAP
SpCL	50	400	73.1	50	800	19.1
ICE	50	400	79.5	50	400	29.8
ClusterContrast	50	200	82.6	50	400	27.6
ISE	70	200	84.7	50	400	35.0
HDCRL	120	200	84.5	120	400	20.7
AdaInCV	50/70	200	83.2/87.4	30	400	38.8

Bold values indicate the best performance.

Table 5. Rank-1 accuracy by difficulty subset on SWRID test set.

Methods	Low difficulty Rank-1	Medium difficulty Rank-1	High difficulty (occlusion + extreme lighting) Rank-1
HDCRL	88.4	71.3	43.7
AdaInCV	91.2 (+2.8)	80.9 (+9.6)	63.8 (+20.1)

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Liu, L.; Du, Z.; Chang, Z.; Zhang, Y.; Zhang, L. Adaptive Intra-Class Variation Contrastive Learning for Unsupervised Person Re-Identification in Substation Worker Safety Monitoring. Electronics 2026, 15, 2339. https://doi.org/10.3390/electronics15112339

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
