Article

Symmetry-Guided Prototype Alignment and Entropy Consistency for Multi-Source Pedestrian ReID in Power Grids: A Domain Adaptation Framework

1 Faculty of Data Science, City University of Macau, Macau 999078, China
2 School of Applied Science and Civil Engineering, Beijing Institute of Technology, Zhuhai 519087, China
3 College of Transportation Engineering, Tongji University, Shanghai 200070, China
4 School of Information Technology, Beijing Institute of Technology, Zhuhai 519087, China
5 College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150009, China
* Author to whom correspondence should be addressed.
Symmetry 2025, 17(5), 672; https://doi.org/10.3390/sym17050672
Submission received: 5 March 2025 / Revised: 8 April 2025 / Accepted: 24 April 2025 / Published: 28 April 2025

Abstract

This study proposes a multi-source unsupervised domain adaptation framework for person re-identification (ReID), addressing cross-domain feature discrepancies and label scarcity in electric power field operations. Inspired by symmetry principles in feature space optimization, the framework integrates (1) a Reverse Attention-based Feature Fusion (RAFF) module aligning cross-domain features using symmetry-guided prototype interactions that enforce bidirectional style-invariant representations and (2) a Self-Correcting Pseudo-Label Loss (SCPL) dynamically adjusting confidence thresholds using entropy symmetry constraints to balance source-target domain knowledge transfer. Experiments demonstrate 92.1% rank-1 accuracy on power industry benchmarks, outperforming DDAG and MTL by 9.5%, with validation confirming robustness in operational deployments. The symmetric design principles significantly enhance model adaptability to inherent symmetry breaking caused by heterogeneous power grid environments.

1. Introduction

Field operations in the power industry—such as inspection, maintenance, and emergency repairs—are essential for ensuring the safe and stable operation of power grids. These operations, however, take place in dynamic and complex environments, with personnel often working across widely dispersed regions. Real-time tracking of personnel locations and movements is crucial for on-site management, emergency response, and workload reporting. Yet, traditional methods like GPS and indoor positioning systems struggle to meet the required accuracy and coverage, particularly due to signal interference from terrain obstructions in field environments and the impracticality of deploying high-density infrastructure needed for indoor systems across expansive or isolated operational areas. While facial recognition provides precise identification, it depends on high-resolution cameras and often fails in outdoor settings with fluctuating lighting conditions. As a result, there is an urgent need for an innovative personnel identification solution tailored to the power industry.
Person re-identification has emerged as a promising technology for pedestrian identification in recent years. Unlike facial recognition, which requires high-precision frontal facial images that are often unavailable in dynamic field operations with varying camera angles, ReID overcomes these limitations by utilizing structural symmetry inherent in holistic body features (e.g., clothing patterns, gait cycles, and posture configurations), which remain identifiable even in low-resolution images or side-view captures. By learning comprehensive human features, including appearance and posture, ReID enables effective cross-camera matching of individuals [1]. Compared to facial recognition, ReID requires less sophisticated hardware and is more adaptable to varying angles and distances, making it a viable solution for personnel identification in the power sector. Significant advancements in deep learning have accelerated ReID’s progress, particularly through the development of neural networks that automatically learn discriminative appearance features, enhancing performance [2]. Additionally, metric learning has improved similarity calculations, bolstering the generalization ability of ReID models. With ample annotated data, supervised ReID algorithms have made considerable strides [3].
However, applying ReID technology to the power industry faces several obstacles. The diversity and complexity of field environments lead to cross-domain symmetry disruption between different operational sites (e.g., substations with varying equipment layouts and lighting conditions), making it costly to gather large-scale annotated data that account for these asymmetries. Furthermore, substantial differences in environments across substations hinder the generalization of models. To address the annotation challenge, unsupervised domain adaptation (UDA) has gained attention as a promising approach. UDA uses labeled data from source domains to train models and transfers them to unlabeled target domains in an unsupervised manner, easing the annotation burden. Multi-source domain adaptation (MUDA), which leverages data from multiple source domains, further bridges the gap between domains, improving model generalization.
In the realm of domain adaptation for ReID, many methods focus on generating pseudo-labels for the target domain and fine-tuning models [4,5,6]. Although substantial research has tackled the distribution gap between source and target domains, most existing methods focus on generic public datasets, with little consideration for symmetry corruption in domain-specific features encountered in power field operations. Consequently, these models’ generalization capabilities still need validation in real-world settings. Power field operations, with their unique characteristics, present a specific challenge. Additionally, the reliability of pseudo-label generation is crucial, as noisy labels can lead to the accumulation of prediction errors. Therefore, a robust mechanism for generating pseudo-labels in a multi-source domain framework is necessary.
This paper addresses the challenges of underutilized multi-source domain data and suboptimal performance in applying ReID to the power industry. Building on existing research in multi-source domain ReID, we propose a novel framework for multi-domain information fusion. The framework includes two key components. First, we introduce a Reverse Attention-based Feature Fusion (RAFF) module. The RAFF module leverages reverse attention to capture high-order correlations between features while restoring domain-invariant symmetry in identity attributes. It implements a contrastive reverse attention mechanism that distinguishes domain-specific asymmetries from essential identity features, enabling cross-domain knowledge transfer. RAFF effectively integrates domain-specific data from various sources, fostering deep fusion across domains. Second, to refine pseudo-labels during training, we propose a Self-Correcting Pseudo-Label Loss (SCPL) function, which employs symmetry-constrained optimization to suppress pseudo-label asymmetry. This function imposes tight constraints on label confidence distributions, allowing the network to mathematically certify label correctness and correct symmetry violation errors.
Our method achieves advanced performance on several large-scale ReID public datasets, demonstrating its superiority. We then apply the method to personnel re-identification in power field operations, testing it on a self-built power industry dataset. Comparative experiments with typical ReID models show that our method outperforms the baselines across multiple evaluation metrics. These results highlight its excellent generalization capability, offering a new solution for personnel identification in the power industry. The key innovations of this work are as follows:
  • The RAFF module recovers cross-domain feature symmetry between source and target domains, reducing the domain gap and improving the discriminability of pedestrian identity features.
  • The SCPL loss function implements symmetry-driven noise suppression to mitigate the impact of erroneous pseudo-labels.
  • The method achieves competitive performance in multi-source unsupervised domain adaptation (MUDA) on benchmark ReID datasets, outperforming most existing approaches.
  • ReID technology is applied to the power industry for the first time. Our method is validated on a self-constructed power industry dataset, demonstrating strong generalization ability.

2. Related Work

2.1. Supervised Person Re-Identification

Person re-identification matches probe images against a large gallery of detected pedestrian images, making it essentially an image retrieval problem. The primary challenge is learning discriminative features that can effectively distinguish between images of the same identity and those of different identities. However, in real-world applications, pedestrians may appear in multiple camera views, with variations in viewpoint, pose, lighting, and resolution, making it difficult to learn robust and discriminative features. Early ReID methods relied on handcrafted features, such as Histogram of Oriented Gradients (HOG) [7], Scale-Invariant Feature Transform (SIFT) [8], and HSV color features [9], and aligned query images with gallery images using metric learning [10,11]. However, as big data applications have become more prevalent, traditional handcrafted feature methods have shown limitations due to low accuracy and high computational costs.
Since 2014, deep learning methods have become the dominant approach in computer vision, with network architectures like VGG and Residual Neural Networks (ResNets) [12,13] achieving impressive performance on various datasets. Deep learning-based methods have also gradually replaced traditional handcrafted feature engineering in ReID, effectively addressing issues such as lighting, viewpoint, and occlusion, and significantly improving model accuracy. ReID methods based on deep learning can be divided into two categories: global feature methods and local feature methods. Global feature methods represent the entire image using global features. For example, Chen et al. [14] employed a Siamese network to encode two input images into feature vectors and used Euclidean distance to determine whether the images correspond to the same pedestrian. Liu et al. [15] drew inspiration from face recognition and used triplet loss to enhance the discriminative power of pedestrian features. Qian et al. [16] introduced a multi-scale deep representation learning model to capture discriminative information at various scales. In contrast, local feature methods focus on extracting features from different regions of the image to improve robustness. For example, Sun et al. [17] divided pedestrian images into several regions and learned features for each region separately. Chen et al. [18] applied attention mechanisms to guide the network to focus on key image regions, improving the representation of local features. Additionally, Wu et al. [19] proposed a graph neural network-based method that aligns pose points and constructs a graph model using metric distances for ReID.
In new scenarios, traditional methods struggle to bridge the distribution gap between source and target domains, often requiring annotations for the target domain. Unsupervised ReID methods, which do not require target domain annotations, have emerged as a promising solution for the practical application of ReID technology. The following section will review the progress made in unsupervised ReID research.

2.2. Research Progress in Unsupervised Person Re-Identification

Research in unsupervised ReID mainly focuses on two approaches: fully unsupervised ReID and unsupervised domain adaptation (UDA) ReID.
Fully unsupervised ReID aims to learn directly from completely unannotated target domain data. Common methods in this area often utilize pseudo-label generation techniques, such as clustering, to assign pseudo-labels to images and assist with network training. Research in this direction has concentrated on improving both pseudo-label generation and training strategies. Lin et al. [20] proposed the BUC method, which treats each image as a class and performs bottom-up clustering with regularization to improve the robustness of clustering results. Lin et al. [21] introduced the SSL method, which calculates image similarity and normalizes it to assign pseudo-labels based on similarity scores, thereby enhancing pseudo-label accuracy. Jin et al. [22] proposed the GDS method, which optimizes pseudo-label generation by introducing global distance distribution constraints. Other studies focus on optimizing training strategies, such as the MMCL method proposed by Wang et al. [23], which assigns multiple labels to each image and utilizes Wasserstein distance clustering to reduce the influence of noise during training.
UDA ReID, on the other hand, utilizes annotated source domain data to perform transfer learning, with the goal of improving performance on the target domain. Compared to fully unsupervised methods, UDA techniques generally yield better performance by leveraging source domain data, making them more suited to practical applications. UDA ReID methods can be categorized into image-based domain adaptation and feature-based domain adaptation. Image-based domain adaptation methods often use Generative Adversarial Networks (GANs) to align the styles of source and target domain images. For example, SPGAN [24] and PTGAN [25] use GANs to transform source domain images into the style of the target domain while preserving pedestrian identity information. PTGAN further ensures pixel-level matching through color consistency constraints during transformation. While these methods are effective in some cases, their performance typically deteriorates significantly when there is a substantial gap between the source and target domains.
Feature-based domain adaptation methods, which do not rely on GANs for image transformation, aim to close the domain gap by aligning features. This approach is widely used in UDA ReID. Yu et al. [26] proposed the MAR method, which generates pseudo-labels by calculating similarity scores between target and source domain data and uses this information to guide training. Fu et al. [27] introduced the SSG method, which assigns multi-scale pseudo-labels by incorporating human local features, effectively addressing the lack of supervision. Ge et al. [4] proposed the MMT method, which maximizes the utilization of known information through mutual mean-teaching, enhancing the robustness of feature learning by enabling mutual supervision between two networks. Zheng et al. [28] adopted a self-labeling approach from self-supervised learning, using dynamic programming for clustering to improve model performance.
Although these methods have made significant strides in UDA ReID, most are limited to a single source domain, which restricts their applicability. To overcome this limitation, Bai et al. [29] introduced multi-source domain adaptation (MUDA), which uses multiple source domains to mitigate the impact of inappropriate source domain selection on model performance. MUDA methods not only increase the amount of available data and improve model performance, but also eliminate the constraints associated with single-source dependency, making them more adaptable to real-world diversity. Notable studies in MUDA include Bai et al.'s [29] graph convolutional network method, Qi et al.'s [30] approach of establishing specific network paths for each source domain, Wang et al.'s [31] MSTNet method, and Chen et al.'s [32] multi-source adaptive meta-learning framework (MAM), which enhances multi-source domain generalization in ReID through a multi-source adaptive feature refinement module (MAFR) and meta-learning strategies. Other studies include Tian et al.'s [33] sample weighting-based unsupervised MUDA method, which uses adversarial learning to align domains and address training bias caused by sample size differences, and Tian et al.'s [34] feature fusion and pseudo-label refinement-based unsupervised MUDA method, both of which have demonstrated impressive performance in MUDA ReID tasks. Xian et al. [35] proposed a method that utilizes the diversity and consistency of multi-source data through self-training and inter-expert similarity distillation, maintaining feature diversity using representation decorrelation techniques. This method achieved state-of-the-art performance across multiple benchmark datasets.
Despite the success of unsupervised MUDA methods in ReID tasks on public datasets, most of these studies focus on general scenarios, leaving a gap in research for specific scenarios, such as those found in power field operations. There is little work addressing the challenges and practical needs of this domain. Adapting and optimizing advanced domain adaptation techniques for the power industry can significantly enhance model generalization and provide strong technical support for safety monitoring. Therefore, further exploration of the application of these methods to power field environments holds both theoretical and practical significance.

3. Methodology

3.1. General

In the unsupervised multi-source domain adaptive pedestrian re-identification task, we are given labeled datasets $\{D_m\}_{m=1}^{M}$ from $M$ source domains, where $D_m = \{(u_j^m, v_j^m)\}_{j=1}^{P_m}$ denotes the samples of the $m$-th source domain together with their ground-truth labels. Additionally, an unlabeled dataset $U = \{u_i^t\}_{i=1}^{Q}$ from the target domain is available. The objective of this study is to leverage both the labeled source-domain samples and the unlabeled target-domain samples to train a pedestrian re-identification model that generalizes well to the target domain. Note that, throughout this paper, the terms “dataset” and “domain” are used interchangeably.
To address the challenges inherent in this setting, we propose an unsupervised multi-source domain adaptation framework for pedestrian re-identification. As illustrated in Figure 1, the framework is composed of three main components: the feature extractor, the Reverse Attention-based Feature Fusion (RAFF) module, and the Self-Correcting Pseudo-Label Loss (SCPL) function. Rather than opting for the widely used DBSCAN-based pseudo-label generation method [36], our approach employs the SCPL and RAFF modules to dynamically optimize the learning of the feature extractor and generate high-quality pseudo-labels.
The backbone network is responsible for extracting domain-invariant feature representations from the input images. The backbone can be flexibly chosen from among popular convolutional neural network architectures, such as ResNet [12] or DenseNet.
Once the feature representations are obtained from the backbone network, the RAFF module is introduced in the network’s head structure. The RAFF module facilitates the transfer of identity features between multiple domains, aiming to minimize inter-domain differences. By propagating and aggregating features, the RAFF module enables the model to fully exploit complementary information from different domains, leading to more robust feature representations.
During training, the SCPL loss function is applied, using a maintained memory buffer from which both soft and hard pseudo-labels for the target domain are computed. This loss plays a critical role in mitigating the negative impact of noisy pseudo-labels during training. The design and optimization strategies of these key modules are detailed in the following sections.

3.2. Reverse Attention-Based Domain Merging Module

The proposed Reverse Attention-based Feature Fusion (RAFF) module facilitates a profound integration of pedestrian features across diverse domains through the transmission of domain-specific information. The cornerstone of this approach lies in enabling features to adaptively assimilate information from other domains, predicated on cross-domain relevance. As illustrated in Figure 2, the RAFF module’s specific implementation unfolds through the following sequential steps:
(1) Domain Prototype Computation
For a given domain d, the domain prototype is defined as the mean embedding of all sample features within that domain:

$$P_d = \frac{1}{n_d} \sum_{j=1}^{n_d} f_j \quad (1)$$

where $n_d$ denotes the number of samples of domain d in the mini-batch, and $f_j$ is the feature of the j-th sample in domain d.
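As a concrete illustration, the following minimal PyTorch sketch (our reconstruction; the paper provides no code, and the function name is ours) computes the prototype over a mini-batch:

```python
import torch

def domain_prototype(features: torch.Tensor, domain_ids: torch.Tensor, d: int) -> torch.Tensor:
    """Mean embedding of domain d's samples in the current mini-batch (Eq. (1)).

    features: (B, D) batch of pedestrian features; domain_ids: (B,) domain labels.
    """
    mask = domain_ids == d              # pick the n_d samples that belong to domain d
    return features[mask].mean(dim=0)   # P_d = (1 / n_d) * sum_j f_j
```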
(2) Reverse Attention Mechanism
The feature correlations of domain d with the other domains are computed using the reverse attention mechanism. First, the domain prototype and all pedestrian features are mapped into a new space:

$$Q = [f_1, \ldots, f_N]\, U_q, \qquad U_q \in \mathbb{R}^{D \times D}$$

$$[K, V] = [f_1, \ldots, f_N]\, U_{kv}, \qquad U_{kv} \in \mathbb{R}^{D \times 2D}$$

where $U_q$ and $U_{kv}$ are learnable projection matrices, $f_1, \ldots, f_N$ are the pedestrian features of all domains in the batch, and D is the feature dimension.
(3) Attention Weight Calculation
The attention weight vector $A_g$ is generated as follows:

$$A_g = \mathrm{Softmax}\!\left(\frac{q_g K^{\top}}{\sqrt{D}}\right), \qquad A_g \in \mathbb{R}^{1 \times B}$$

where $q_g$ is the query feature of domain g (the projection of its prototype), K is the key matrix, D is the feature dimension, and B is the batch size.
With the above formula, pedestrian features that are semantically closer to the domain prototype (i.e., whose domain style is similar to that of g) are given higher weights. However, this emphasis mainly reinforces the original domain style information of g and does not help to eliminate domain bias. Instead, we adopt the opposite strategy and force the network to focus on pedestrian features that are semantically distant from the domain prototype, as these usually carry different domain-specific information. Specifically, we generate a reverse attention map $\bar{A}_g$ by “inverting” the obtained attention map:

$$\bar{A}_g = \frac{\mathbf{1} - A_g}{\mathrm{sum}(\mathbf{1} - A_g)}, \qquad \bar{A}_g \in \mathbb{R}^{1 \times B}$$

where $\mathbf{1}$ is the all-ones vector. The resulting $\bar{A}_g$ achieves the opposite effect of $A_g$: pedestrian features whose domain-specific information differs more from that of domain g are assigned higher weights.
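The following sketch (a plausible PyTorch rendering under the assumptions that $q_g$ is the projected prototype of domain g and that attention is computed over all batch features; the class and method names are ours) shows the projections, the attention map, and its inversion together:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReverseAttention(nn.Module):
    """Project prototype and features with U_q / U_kv, compute A_g, then invert it."""

    def __init__(self, dim: int):
        super().__init__()
        self.U_q = nn.Linear(dim, dim, bias=False)       # query projection, D x D
        self.U_kv = nn.Linear(dim, 2 * dim, bias=False)  # key/value projection, D x 2D

    def forward(self, prototype: torch.Tensor, features: torch.Tensor):
        """prototype: (D,) domain prototype P_g; features: (B, D) all batch features."""
        q_g = self.U_q(prototype)                        # query feature of domain g
        k, v = self.U_kv(features).chunk(2, dim=-1)      # keys and values, (B, D) each
        a_g = F.softmax(q_g @ k.t() / k.size(-1) ** 0.5, dim=-1)  # A_g: similar styles win
        rev = (1.0 - a_g) / (1.0 - a_g).sum()            # inverted map: dissimilar styles win
        return rev, v                                    # rev: (B,), v: (B, D)
```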
Subsequently, pedestrian feature fusion is performed for domain g using $\bar{A}_g$, which integrates feature information from the different domains in proportion to how much each pedestrian feature diverges from domain g:

$$\hat{\Phi}_g = \bar{A}_g V = \sum_{g' \in G} \sum_{i=1}^{b} \bar{a}_i^{g'} v_i^{g'}$$

where $\bar{a}_i^{g'}$ denotes the reverse attention weight between the prototype of g and the i-th pedestrian feature of domain g′. In other words, $\hat{\Phi}_g$ is obtained by weighting pedestrian features from the different domains and summing them according to their divergence from g. Since it emphasizes the integration of knowledge with diverse styles, we refer to $\hat{\Phi}_g$ as the domain style centroid with respect to g: it represents the aggregated domain style information. Additionally, to keep $\hat{\Phi}_g$ stable, it is maintained with an exponential moving average (EMA):

$$(\hat{\Phi}_g)_n = (1 - \alpha)(\hat{\Phi}_g)_{n-1} + \alpha\, \hat{\Phi}_g$$

where n indexes the mini-batch. The recorded $\hat{\Phi}_g$ is used during the inference phase.
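A sketch of the fusion and EMA maintenance follows. Since the EMA buffer is recorded for inference, we detach it from the computation graph, and we initialize it from the first mini-batch; both choices are our assumptions, as the paper only states that an EMA is used:

```python
import torch

def update_style_centroid(ema_centroid, rev, values, alpha: float):
    """Fuse values into the per-batch style centroid, then track it with an EMA."""
    batch_centroid = rev @ values                   # hat{Phi}_g for the current mini-batch
    if ema_centroid is None:                        # first mini-batch: initialize the buffer
        return batch_centroid.detach()
    # (hat{Phi}_g)_n = (1 - alpha) * (hat{Phi}_g)_{n-1} + alpha * hat{Phi}_g
    return (1 - alpha) * ema_centroid + alpha * batch_centroid.detach()
```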
Finally, the domain style centroid $\hat{\Phi}_g$ is integrated into the original pedestrian features. This step is modeled with a basic multilayer perceptron (MLP):

$$h_g = \mathrm{MLP}([f_g, \hat{\Phi}_g])$$
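A minimal fusion head consistent with this equation (the hidden layout of the MLP is our choice; the paper does not specify it):

```python
import torch
import torch.nn as nn

class StyleFusionHead(nn.Module):
    """h_g = MLP([f_g, hat{Phi}_g]): mix each feature with the domain style centroid."""

    def __init__(self, dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, dim),   # concatenated [f_g, centroid] -> D
            nn.ReLU(inplace=True),
            nn.Linear(dim, dim),
        )

    def forward(self, f_g: torch.Tensor, centroid: torch.Tensor) -> torch.Tensor:
        """f_g: (B, D) pedestrian features; centroid: (D,) domain style centroid."""
        c = centroid.expand_as(f_g)                  # broadcast the centroid to the batch
        return self.mlp(torch.cat([f_g, c], dim=-1))
```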
In the pedestrian re-identification (ReID) task, $h_{x_i}$ represents the fused feature representation of pedestrian images from different camera views. Here, domain style encoders are applied to the features under each camera viewpoint to capture the environmental characteristics (e.g., variations in illumination, angle, and occlusion) of the different camera scenes. Using this feature fusion mechanism, the model can learn a representation that is robust across camera views. To further enhance recognition performance, a metric-based loss function is employed to optimize the feature space, ensuring that feature representations of the same pedestrian under different cameras become more compact and thereby improving the accuracy of cross-camera pedestrian matching:
$$L_{ml} = \sum_{i=1}^{B} \left\| h_{x_i} - \mu_{y_i} \right\|_2^2$$

Here, $\mu_{y_i}$ is a D-dimensional vector representing the feature center of pedestrian ID class $y_i$. To enhance the robustness of the model in cross-camera scenarios, a feature memory bank M is designed to store a class-level feature representation for each pedestrian ID. These features are dynamically updated through mini-batch processing during the training phase. For implementation, feature extraction is first performed on all pedestrian images from all cameras, and an initial feature center is created for each pedestrian ID. For unlabeled target-domain data (e.g., new camera scenes), the model's classification predictions are used as pseudo-labels to update the corresponding feature centers. This mechanism plays a crucial role in computing the within-class metric loss $L_{ml}$, as it enables the model to learn a more stable and discriminative feature representation.
$$\mu_k = \mathrm{Avg}\Big(\{\, h_{x_j} \mid y_j = k \,\} \cup \{\, h_{x_j} \mid \arg\max_c p_c(x_j) = k \,\}\Big)$$

where $p_j(x)$ denotes the confidence with which the model recognizes the input image x as the j-th pedestrian ID; that is, the center $\mu_k$ averages the fused features of labeled samples whose ground-truth ID is k together with unlabeled samples whose predicted (pseudo) ID is k. After each network parameter update, the feature memory bank M is updated according to the following rule to keep the feature center of each pedestrian ID accurate:

$$\mu_k^{\text{new}} = (1 - \beta)\, \mu_k^{\text{old}} + \beta\, h$$

where k is the identity tag of the pedestrian, and $\mu_k$ is the feature center of the k-th pedestrian ID stored in the feature memory bank M. Under the supervision of the intra-class metric loss $L_{ml}$, the feature representations of the same pedestrian ID maintain a compact distribution even when viewpoints vary across cameras.
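A sketch of the intra-class metric loss and the memory update is given below. Averaging the batch features of each ID before the momentum step, and averaging the loss over the batch, are our simplifications of the per-sample rule above:

```python
import torch

def intra_class_metric_loss(h, labels, memory):
    """L_ml: pull each fused feature toward the stored center of its pedestrian ID.

    h: (B, D) fused features; labels: (B,) true or pseudo IDs; memory: (K, D) centers.
    """
    centers = memory[labels]                        # mu_{y_i} for every sample in the batch
    return ((h - centers) ** 2).sum(dim=1).mean()   # batch mean; the paper's sum differs only by scale

@torch.no_grad()
def update_memory(memory, h, labels, beta: float):
    """mu_k <- (1 - beta) * mu_k + beta * h, applied after each parameter update."""
    for k in labels.unique():
        memory[k] = (1 - beta) * memory[k] + beta * h[labels == k].mean(dim=0)
```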
Overall, effective feature alignment and information fusion between different camera scenes are achieved using the proposed Reverse Attention-based Feature Fusion (RAFF) mechanism. Combined with the constraint of intra-class metric loss L m l , the model maintains strong pedestrian identity discrimination while enabling seamless feature fusion across different camera scenes. This capability is pivotal for enhancing the accuracy of cross-camera pedestrian re-identification.

3.3. Adaptive Reverse Cross-Entropy Loss

General unsupervised domain adaptation (UDA) approaches typically employ clustering algorithms to generate pseudo-labels for the entire unlabeled target dataset. However, due to the lack of constraints, pseudo-labels generated in this manner are often misassigned, introducing noise and degrading model performance. To address this issue, we propose a novel approach that assigns gradients to the pseudo-label generation process and constrains them with carefully designed rules, enabling the network to jointly optimize both predicted values and pseudo-labels.
First, we aim to obtain pseudo-labels that carry gradient flow. Leveraging the dynamically updated memory M introduced in Section 3.2, we derive both soft and hard labels in each batch by comparing pedestrian features with the class-level features (i.e., the average feature per identity):

$$q(y = k \mid x) = \frac{e^{h^{\top} \mu_k}}{\sum_{j=1}^{K} e^{h^{\top} \mu_j}}$$

$$\hat{y} = \arg\max_k\, q(y = k \mid x)$$

where $q(y \mid x)$ is the soft label, and $\hat{y}$ is the hard label. Soft labels obtained in this manner enable gradient backpropagation and can be supervised by the designed loss function. However, this generation process may accumulate errors due to noise in the pseudo-labels, so additional constraints are needed. A trained metric learning module contains rich information, which is crucial for preventing the model from failing in unlabeled target domains. Therefore, based on the predictions of the metric learning module, we constrain pseudo-label generation using the reverse cross-entropy loss:
$$L_{cre} = -\sum_{k=1}^{K} p_k(x)\, \log q(y = k \mid x)$$

where $p_k(x)$ denotes the confidence with which the metric learning module predicts pedestrian x as the k-th identity. This constraint mitigates overly aggressive pseudo-label generation and prevents error accumulation. However, applying the same rule to pseudo-labels that are usually accurate is inherently unfair. Ideally, the constraint should be adaptively adjusted to match the reliability of each pseudo-label. Since pseudo-labels are fundamentally derived from the pedestrian features extracted by the network, their accuracy is closely tied to the network's ability to distinguish between pedestrians. When the model can effectively distinguish a sample, the corresponding prediction should approximate a one-hot vector. Inspired by this observation, we design an adaptive reverse cross-entropy loss.
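Before formalizing the adaptive version, the label-generation and reverse cross-entropy pieces so far can be sketched as follows (a plausible PyTorch rendering; the `eps` guards and function names are ours):

```python
import torch
import torch.nn.functional as F

def pseudo_labels(h, memory):
    """Soft labels q(y|x) from feature-to-center similarity (kept differentiable),
    plus hard labels by argmax. h: (B, D) features; memory: (K, D) class centers."""
    logits = h @ memory.t()           # h . mu_k for every identity k
    q = F.softmax(logits, dim=1)      # soft label distribution q(y|x)
    return q, q.argmax(dim=1)         # hard pseudo-label y_hat

def reverse_cross_entropy(p, q, eps: float = 1e-8):
    """L_cre: constrain the soft labels q with the metric module's prediction p."""
    return -(p * (q + eps).log()).sum(dim=1).mean()
```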
First, we define a distributional metric that measures how the predicted values are distributed over the non-target classes:

$$\delta_j(x, \hat{y}) = \frac{p_j(x)}{1 - p_{\hat{y}}(x)}$$

where $\hat{y}$ is the generated hard label of pedestrian x in the target domain, and $p_j(x)$ denotes the confidence with which the model predicts x as the j-th category. We then define the adaptive reverse cross-entropy loss as follows:

$$L_{acre}(x, \hat{y}) = L_{cre} \cdot \exp\!\Big(\sum_{j \neq \hat{y}} \delta_j(x, \hat{y}) \log \delta_j(x, \hat{y}) \,/\, \tau\Big)$$

where τ is a temperature parameter used to amplify the effect of distributional differences. This loss adaptively adjusts the constraint strength according to the network's discriminative ability. Specifically, when the entropy of $\delta$ is low, the non-target probability mass concentrates on a few identities, indicating that the network is confusing the sample with specific other pedestrians; the constraint strength is therefore increased. Conversely, when the entropy is high, the network distinguishes the sample effectively, and the constraint strength is reduced accordingly.
Ultimately, the SCPL loss over a batch B is defined as follows:

$$L_{SCPL} = \sum_{i \in B} L_{acre}(x_i, \hat{y}_i)$$

Additionally, we employ the identity (cross-entropy) loss commonly used in ReID tasks:

$$L_{CE} = -\frac{1}{B} \sum_{i=1}^{B} \log p_{y_i}(x_i)$$

Finally, the loss function of the entire framework can be expressed as follows:

$$L = L_{CE} + L_{ml} + L_{SCPL}$$

$L_{ml}$ facilitates feature fusion between different domains. $L_{SCPL}$ uses the classifier's predictions to adaptively constrain pseudo-label generation and reduce the influence of noisy labels. $L_{ml}$ acts on both the source and target domains, while $L_{SCPL}$ acts only on the target domain.
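Putting the pieces together, a sketch of the SCPL term and the overall objective is given below. The per-sample weighting and the numerical `eps` guards are our implementation choices:

```python
import torch

def scpl_loss(p, q, y_hat, tau: float = 1.0, eps: float = 1e-8):
    """L_SCPL: reverse CE weighted per sample by exp(sum_j delta_j log delta_j / tau)."""
    l_cre = -(p * (q + eps).log()).sum(dim=1)          # per-sample reverse cross-entropy
    p_top = p.gather(1, y_hat.view(-1, 1))             # p_{y_hat}(x), shape (B, 1)
    delta = p / (1.0 - p_top + eps)                    # delta_j = p_j / (1 - p_{y_hat})
    delta = delta.scatter(1, y_hat.view(-1, 1), 0.0)   # exclude j == y_hat from the sum
    neg_entropy = (delta * (delta + eps).log()).sum(dim=1)
    weight = torch.exp(neg_entropy / tau)              # low entropy -> stronger constraint
    return (weight * l_cre).sum()                      # L_SCPL: sum over the batch

# Overall objective: L = L_CE + L_ml + L_SCPL
# loss = ce_loss + ml_loss + scpl_loss(p, q, y_hat)
```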

3.4. Optimization

Phase 1: Source model pre-training
In the first stage, the pedestrian re-identification model is trained by optimizing the identity loss (cross-entropy loss) and the triplet loss. Specifically, the cross-entropy loss is employed to optimize the classification task, while the triplet loss enhances the discriminative capability of the features by minimizing the distance between the anchor and positive samples while maximizing the distance between the anchor and negative samples.
Phase 2: Fine-tuning of domain adaptation
In the second stage, the re-identification model is further enhanced with a Reverse Attention-based Feature Fusion (RAFF) module and a metric learning module. As illustrated in Figure 1, the metric learning module is stacked on top of the backbone network. During this stage, both real and pseudo-labels provide supervised information. To optimize the pseudo-label generation process and mitigate the impact of noisy labels, we employ the adaptive reverse cross-entropy loss.

4. Experiments

4.1. Datasets and Evaluation Indicators

We evaluate the proposed method using four widely recognized public datasets for pedestrian re-identification: Market1501 [38], DukeMTMC-reID [39], CUHK03 [40], and MSMT17 [25]. Additionally, we introduce a custom dataset consisting of images of 80 electric field workers, which serves as the target domain. The performance of the algorithm is assessed using standard metrics, including mean average precision (mAP) and Cumulative Match Characteristic (CMC) at ranks 1, 5, and 10.
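For reference, the rank-k CMC score can be computed as in the following simplified sketch (it omits the same-camera and junk-image filtering used in full ReID evaluation protocols):

```python
import numpy as np

def cmc_rank_k(dist, query_ids, gallery_ids, k: int = 1) -> float:
    """Fraction of queries whose identity appears among the k nearest gallery entries.

    dist: (Q, G) pairwise distance matrix between query and gallery features.
    """
    order = np.argsort(dist, axis=1)   # nearest gallery images first
    hits = sum(q_id in gallery_ids[order[i, :k]] for i, q_id in enumerate(query_ids))
    return hits / len(query_ids)
```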

4.2. Implementation Details

We use ResNet50 [12] as the backbone network in both the pre-training and domain-adaptive fine-tuning stages. The experiments were conducted on two NVIDIA RTX 3090 GPUs. The Adam optimizer [41] was applied with a momentum factor of 0.9 and a weight decay of 0.0001. Input images were uniformly resized to 256 × 128 pixels. Data augmentation techniques included random horizontal flipping, random cropping, random rotation (±10°), and random color jitter. Random erasing was employed only during the fine-tuning stage.
Phase 1: Pre-training
The ResNet50 network is initialized with pre-trained ImageNet weights. Each training batch consists of 128 images from 16 identity categories (eight images per category). The learning rate follows a cosine annealing schedule, starting at 0.001 with a warm-up period of two epochs. Over the course of 60 training epochs, the learning rate gradually decays to 0.00001 of its initial value.
Phase 2: Domain Adaptive Fine-tuning
The ResNet50-RAFF network is initialized using the weights from the ResNet50 model. Each training batch contains 128 × (K + 1) images from K source domains and one target domain, with 16 identity categories selected per domain. The same cosine annealing strategy is used, starting with a learning rate of 0.001.
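The learning-rate schedule used in both phases can be reproduced with a sketch like the following; the linear warm-up shape and the multiplicative floor of 1e-5 are our reading of the description above:

```python
import math
import torch
from torch.optim.lr_scheduler import LambdaLR

def make_optimizer_and_scheduler(model, epochs: int = 60, warmup: int = 2,
                                 base_lr: float = 1e-3, floor: float = 1e-5):
    """Adam (beta1 = 0.9, weight decay 1e-4) with warm-up followed by cosine annealing."""
    opt = torch.optim.Adam(model.parameters(), lr=base_lr, weight_decay=1e-4)

    def lr_lambda(epoch: int) -> float:
        if epoch < warmup:                              # two-epoch linear warm-up
            return (epoch + 1) / warmup
        t = (epoch - warmup) / max(1, epochs - warmup)  # annealing progress in [0, 1]
        return floor + (1.0 - floor) * 0.5 * (1.0 + math.cos(math.pi * t))

    return opt, LambdaLR(opt, lr_lambda)                # call scheduler.step() once per epoch
```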

4.3. Comparison with State-of-the-Art Methods

This section systematically evaluates the performance of the proposed RCDL method in multi-source unsupervised domain adaptation (MSUDA) tasks across four widely adopted person re-identification datasets: Market-1501 (M), DukeMTMC-reID (D), CUHK03 (C), and MSMT17 (T). To comprehensively validate the effectiveness of our approach, we compare it with state-of-the-art single-source and multi-source domain adaptation methods. The evaluation metrics include mean average precision (mAP) and rank-1 accuracy, with detailed results presented in Table 1 and Table 2.
When targeting Market-1501 or DukeMTMC-reID, RCDL demonstrates notable advantages across diverse source domain combinations. Using Market-1501 as the target domain, RCDL achieves 82.4% mAP and 93.1% rank-1 accuracy under the D + T + C multi-source configuration. While its mAP and rank-1 accuracy slightly trail those of the leading multi-source method MSUDA [29], RCDL surpasses all other compared methods on both metrics. Notably, MSUDA relies on computationally intensive feature fusion strategies, whereas RCDL employs a lightweight architecture without sacrificing performance. Compared to single-source methods, RCDL improves mAP by 2.9% over the top-performing single-source method GLT [28], validating that multi-source adaptation effectively mitigates bias from individual source domains through complementary information fusion.
Using DukeMTMC-reID as the target domain, RCDL achieves 71.9% mAP and 81.8% rank-1 under the M + T + C configuration, not only attaining the best performance but also significantly surpassing earlier multi-source approaches. Further analysis reveals that conventional methods such as MASDF [51] suffer from inadequate modeling of cross-domain discrepancies. In contrast, RCDL enhances the discriminative power of target domain features via dynamic inter-domain relationship modeling and cross-domain prototype alignment.
MSMT17, characterized by complex lighting, resolution variations, and cluttered backgrounds, poses a more challenging target domain. Experiments indicate that single-source methods struggle on MSMT17, with mAP consistently below 30%. Multi-source methods, however, achieve substantial improvements by integrating knowledge from multiple domains. Specifically, RCDL achieves 34.1% mAP and 62.9% rank-1 under the M + D + C configuration, surpassing CDM [35] by 1.2% in mAP and outperforming the best single-source method GLT (27.7% mAP) by 6.4 percentage points, a 23.1% relative gain. Although its rank-1 slightly lags behind CDM [35] (62.9% vs. 63.8%), the superior mAP highlights RCDL's robustness in overall retrieval accuracy.
While RCDL excels in most scenarios, its rank-1 on Market-1501 marginally trails MSUDA [29], potentially due to MSUDA’s more extensive integration of detailed features. Future work could further optimize performance by incorporating fine-grained alignment modules. Additionally, the mAP on MSMT17 leaves room for improvement, necessitating exploration of advanced cross-domain noise suppression mechanisms.
In summary, the experimental results confirm RCDL’s competitiveness in multi-source adaptation tasks. Its superior generalization capability provides valuable insights for future research in domain-adaptive person re-identification.

4.4. Ablation Studies

To systematically evaluate the contribution of the proposed components, we conduct ablation experiments on the multi-source domain adaptation ReID task, using Market-1501 as the target domain. The results are summarized in Table 3. Our baseline follows a standard unsupervised domain adaptation (UDA) pipeline: ResNet50 pre-trained on labeled source domains (Market-1501 and DukeMTMC), pseudo-labels generated for the target domain using DBSCAN clustering on feature embeddings, and joint fine-tuning with source labels and target pseudo-labels using cross-entropy and triplet loss. This generic approach excludes domain alignment modules (RAFF) and pseudo-label correction (SCPL) to isolate the impact of our contributions.
Impact of the Reverse Attention-based Feature Fusion (RAFF) Module: We first isolate the RAFF module by removing the Self-Correcting Pseudo-Label (SCPL) loss; this configuration is labeled RAFF (without SCPL). The baseline model (without both RAFF and SCPL) achieves 72.3% mAP and 83.1% rank-1 accuracy. When only RAFF is applied, the model achieves 76.3% mAP and 88.1% rank-1, demonstrating its ability to align multi-source features through attention-driven fusion.
Effect of the Self-Correcting Pseudo-Label Loss (SCPL): Next, we retain only the SCPL loss while disabling RAFF, labeled as SCPL (without RAFF). This configuration achieved 74.9% mAP and 87.2% rank-1, indicating that SCPL effectively constrains pseudo-label noise, thereby enhancing the discriminability of cross-domain features.
Synergy between RAFF and SCPL: When both RAFF and SCPL are optimized simultaneously, the complete model achieves 82.4% mAP and 93.1% rank-1, surpassing the individual contributions of RAFF (76.3% mAP, 88.1% rank-1) and SCPL (74.9% mAP, 87.2% rank-1). This synergy arises because RAFF improves feature fusion through attention mechanisms, mitigating domain-specific feature biases, while SCPL ensures cross-domain semantic consistency by constraining pseudo-label noise.

4.5. Visualization of Experimental Results

To visualize the impact of domain fusion, we project the fused features onto a 2D space, as shown in Figure 3. At the early stages of training, the feature distributions across different domains appear scattered, with clear domain-specific differences. As training progresses, the features from different domains gradually converge and fuse. By the end of training, features from the same category become more compact in the feature space, while the deep fusion between domains is maintained. This demonstrates that the RAFF module effectively aligns and merges features from different domains.

4.6. Parametric Analysis

We also analyze the impact of two key hyperparameters, α and β. α controls the exponential sliding average update of domain style centers, while β governs the updating of class-level feature memories M. Experimental comparisons with different values of these hyperparameters are presented in Figure 4. The results show that α tends to have a larger value, while β is generally smaller. This indicates that the domain style center requires a larger update magnitude to provide sufficient gradient for the network, whereas the memory M benefits from a slower update to maintain stability, as it plays a role in the generation of pseudo-labels.

4.7. Further Validation on the Power Field Operator Dataset

To thoroughly assess the generalization and practicality of the proposed method, we conducted additional experiments using a self-constructed dataset of power field operators. This dataset contains image data collected from real power operation scenes, capturing variations in environmental conditions, shooting angles, and personnel postures.

4.7.1. Introduction to Datasets

The power field operator dataset used in this study is jointly authorized by a domestic provincial power company and a double first-class university. Due to industry confidentiality requirements, the data collector’s information and geographic identifiers have been desensitized, and access to the original data is restricted to authorized personnel in an encrypted environment. The dataset includes a total of 80 distinct operator identities, with each identity having 20 to 30 image samples. The dataset is named “PowerID80” and will be referred to as “P” in the subsequent sections of this paper.
To ensure effective model training and evaluation, we randomly split the dataset into a training set, a validation set, and a test set at a 6:1:3 ratio. The training set comprises 60% of the total dataset and is used to train the model to learn features of the different operators. The validation set, which accounts for 10%, is used for hyperparameter tuning and model evaluation during training to prevent overfitting. The test set, making up 30%, is used to evaluate the model’s generalization ability on unseen data. This division allows for comprehensive performance evaluation at different stages while maintaining a balanced data distribution.
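One plausible implementation of this split is sketched below. This is our reading of the protocol: the paper does not state whether the 6:1:3 split is by image or by identity, and standard ReID protocols sometimes split by identity instead; here each operator's images are divided across the three sets.

```python
import random
from collections import defaultdict

def split_power_id80(samples, seed: int = 0):
    """6:1:3 train/val/test split, per identity, so each operator appears in all sets.

    samples: list of (image_path, person_id) pairs.
    """
    random.seed(seed)
    by_id = defaultdict(list)
    for path, pid in samples:
        by_id[pid].append(path)
    train, val, test = [], [], []
    for pid, imgs in by_id.items():
        random.shuffle(imgs)
        n_tr, n_va = int(0.6 * len(imgs)), int(0.1 * len(imgs))
        train += [(p, pid) for p in imgs[:n_tr]]
        val += [(p, pid) for p in imgs[n_tr:n_tr + n_va]]
        test += [(p, pid) for p in imgs[n_tr + n_va:]]
    return train, val, test
```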

4.7.2. Experimental Settings

We use Market1501 and DukeMTMC-reID as the source domain datasets and the power field operator dataset as the target domain dataset. Model training and evaluation follow the same experimental setup as described in Section 4. Given the relatively small size of the power field operator dataset, we used a smaller batch size (8) and a larger number of training epochs (100) to maximize data utilization. Additionally, we monitored validation performance during training and applied early stopping to prevent overfitting. All experimental results are reported on the test set.

4.7.3. Experimental Results and Analysis

To validate the effectiveness of our method, we compared it with recent state-of-the-art methods on the Electric Power Operator dataset. All comparative experiments were conducted using the open-source code of each method, with results obtained on our dataset marked with an asterisk (*). As shown in Table 4, our method achieved competitive performance, with a mAP of 87.5% and a rank-1 accuracy of 92.1%, outperforming most existing works and demonstrating the effectiveness of our approach. Furthermore, compared to the MSUDA method, our method maintained a minimal performance gap (0.6% lower mAP and 1.5% lower rank-1). It is noteworthy that MSUDA remains the best-performing open-source method in this subfield, as little research has been published or made available since 2021. Although MSUDA achieved slightly higher accuracy, its graph neural network-based multi-source fusion framework requires more computational resources, limiting its deployment in resource-constrained scenarios. This indicates that, even without considering resource consumption, our method already surpasses most existing methods, further confirming its efficiency and practicality.
To provide a more intuitive demonstration of the algorithm’s effectiveness, Figure 5 presents several representative retrieval results. Each row shows the top ten retrievals for a specific query image, with correct matches highlighted by a green border and incorrect matches indicated by a red border. The results illustrate that the proposed algorithm can accurately recognize the same operator across various environments, showcasing its strong robustness and accuracy.
In summary, the unsupervised domain adaptive pedestrian re-identification method proposed in this paper demonstrates strong performance on a self-constructed dataset of electric power field operators, highlighting the algorithm’s generalization and practical potential. The dataset captures various challenges encountered in real operating scenarios within the electric power industry, and the algorithm’s performance is evaluated in a thorough and objective manner. Experimental results show that the proposed method effectively leverages complementary information from multi-domain data to learn robust feature representations. This approach is highly significant for improving the intelligence of power field operations, ensuring operator safety and enhancing the efficiency of inspection and repair processes.

5. Conclusions

In this paper, we introduce a novel symmetry-aware domain adaptive framework for the multi-source unsupervised pedestrian re-identification task, which has been validated on multiple public datasets, demonstrating significant performance improvements. The framework incorporates a cross-domain attention mechanism and an adaptive pseudo-label optimization strategy. It features the design of a cross-domain attention alignment module (RAFF) capable of preserving structural symmetry in cross-domain features and an adaptive reverse cross-entropy loss function (SCPL), achieving deep feature fusion through domain prototyping and an inverse attention mechanism. Additionally, it adaptively constrains the pseudo-label generation process by exploiting prediction distribution differences while incorporating symmetry-rectifying constraints.
Building on this, we apply the framework to the task of re-identifying electric field operators, using several typical ReID models on a self-constructed electric power industry dataset for comparison. The results show that the proposed method outperforms the benchmark model across various evaluation metrics, fully demonstrating its practicality and superiority in re-identifying personnel in the electric power industry.
Future research directions include model miniaturization and real-time adaptation, reducing computation and storage overheads through model compression and knowledge distillation techniques to cater to resource-constrained power operation scenarios. The introduction of active learning strategies could help minimize manual annotation costs while enhancing model performance. Moreover, the multi-source domain adaptation paradigm and core ideas (e.g., feature space alignment, pseudo-label optimization) proposed in this paper can be extended to tasks such as face recognition, vehicle re-identification, and fine-grained image classification. These advancements could provide new solutions for challenges like labeling scarcity and domain bias. It is expected that these research directions will lead to breakthroughs in intelligent applications, both within the power industry and in other domains.

Author Contributions

Conceptualization, J.H. and L.Z.; methodology, J.H. and L.Z.; software, X.Z. and T.X.; validation, K.W. and P.L.; formal analysis, X.L. and T.X.; investigation, J.H. and X.Z.; resources, K.W. and P.L.; data curation, X.Z. and X.L.; writing—original draft preparation, J.H. and L.Z.; writing—review and editing, L.Z. and X.L.; visualization, T.X. and X.Z.; supervision, L.Z.; project administration, L.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Key Area Project of the Guangdong Provincial Department of Education (No. 2023ZDZX1043), the R&D Plan Project in the Key Scientific Research Platform of Universities in Guangdong Province (No. 2022KSYS016), the Provincial Key Platform and Major Scientific Research Project of Guangdong Universities (No. 2023KCXTD044), and the Research Project on Financial Security-Oriented Intelligent Edge Computing Technology and Applications Based on National Cryptographic Standards (No. 2220004002460).

Institutional Review Board Statement

The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board of Zhuhai College, Beijing Institute of Technology (protocol code BITZH-BUS-2025-004; date of approval: 21 April 2025).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The public datasets used in this study, including Market1501, DukeMTMC-reID, CUHK03, and MSMT17, are widely available in the research community. Per confidentiality agreements, the research data are not publicly available. All participants provided informed consent.

Acknowledgments

The authors would like to express their gratitude to the Guangdong Provincial Department of Education for their support in advancing research on unsupervised adaptive pedestrian re-identification methods based on multiple source domains. We also acknowledge the contributions of our colleagues and collaborators who provided valuable insights and technical assistance throughout this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
RAFF: Reverse Attention-based Feature Fusion
SCPL: Self-Correcting Pseudo-Label Loss
ReID: Person Re-Identification
mAP: Mean Average Precision
CMC: Cumulative Matching Characteristics
EMA: Exponential Moving Average
Adam: Adaptive Moment Estimation
GPU: Graphics Processing Unit
UDA: Unsupervised Domain Adaptation
MUDA: Multi-source Unsupervised Domain Adaptation
MLP: Multi-Layer Perceptron

References

  1. Sun, Z.; Wang, X.; Zhang, Y.; Song, Y.; Zhao, J.; Xu, J.; Yan, W.; Lv, C. A comprehensive review of pedestrian re-identification based on deep learning. Complex Intell. Syst. 2024, 10, 1733–1768. [Google Scholar] [CrossRef]
  2. Zheng, L.; Yang, Y.; Hauptmann, A.G. Person re-identification: Past, present and future. arXiv 2016, arXiv:1610.02984. [Google Scholar]
  3. Ye, M.; Shen, J.; Lin, G.; Xiang, T.; Shao, L.; Hoi, S.C. Deep learning for person re-identification: A survey and outlook. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 2872–2893. [Google Scholar] [CrossRef] [PubMed]
  4. Ge, Y.; Chen, D.; Li, H. Mutual mean-teaching: Pseudo label refinery for unsupervised domain adaptation on person re-identification. arXiv 2020, arXiv:2001.01526. [Google Scholar]
  5. Song, L.; Wang, C.; Zhang, L.; Du, B.; Zhang, Q.; Huang, C.; Wang, X. Unsupervised domain adaptive re-identification: Theory and practice. Pattern Recognit. 2020, 102, 107173. [Google Scholar] [CrossRef]
  6. Yang, F.; Li, K.; Zhong, Z.; Luo, Z.; Sun, X.; Cheng, H.; Guo, X.; Huang, F.; Ji, R.; Li, S. Asymmetric co-teaching for unsupervised cross-domain person re-identification. Proc. AAAI Conf. Artif. Intell. 2020, 34, 12597–12604. [Google Scholar] [CrossRef]
  7. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; pp. 886–893. [Google Scholar]
  8. Lowe, D.G. Object recognition from local scale-invariant features. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece, 20–27 September 1999; pp. 1150–1157. [Google Scholar]
  9. Chen, L.; Chen, H.; Li, S.; Zhu, J. Person re-identification based on Weber local descriptor and color features. In Proceedings of the 2015 12th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), Chengdu, China, 18–20 December 2015; p. 33. [Google Scholar]
  10. Koestinger, M.; Hirzer, M.; Wohlhart, P.; Roth, P.M.; Bischof, H. Large scale metric learning from equivalence constraints. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 2288–2295. [Google Scholar]
  11. Liao, S.; Hu, Y.; Zhu, X.; Li, S.Z. Person re-identification by local maximal occurrence representation and metric learning. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 2197–2206. [Google Scholar]
  12. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  13. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  14. Chen, S.; Wang, H.; Jin, C.; Zhang, W. Person re-identification based on siamese network and re-ranking. J. Comput. Appl. 2018, 38, 3161–3166. [Google Scholar]
  15. Liu, J.; Zha, Z.-J.; Tian, Q.; Liu, D.; Yao, T.; Ling, Q.; Mei, T. Multi-scale triplet cnn for person re-identification. In Proceedings of the 24th ACM International Conference on Multimedia, Amsterdam, The Netherlands, 15–19 October 2016; pp. 192–196. [Google Scholar]
  16. Qian, X.; Fu, Y.; Jiang, Y.-G.; Xiang, T.; Xue, X. Multi-scale deep learning architectures for person re-identification. In Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5399–5408. [Google Scholar]
  17. Sun, Y.; Zheng, L.; Yang, Y.; Tian, Q.; Wang, S. Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline). In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 480–496. [Google Scholar]
  18. Chen, Y.; Wang, H.; Sun, X.; Fan, B.; Tang, C.; Zeng, H. Deep attention aware feature learning for person re-identification. Pattern Recognit. 2022, 126, 108567. [Google Scholar] [CrossRef]
  19. Wu, Y.; Bourahla, O.E.F.; Li, X.; Wu, F.; Tian, Q.; Zhou, X. Adaptive graph representation learning for video person re-identification. IEEE Trans. Image Process. 2020, 29, 8821–8830. [Google Scholar] [CrossRef]
  20. Lin, Y.; Dong, X.; Zheng, L.; Yan, Y.; Yang, Y. A bottom-up clustering approach to unsupervised person re-identification. Proc. AAAI Conf. Artif. Intell. 2019, 33, 8738–8745. [Google Scholar] [CrossRef]
  21. Lin, Y.; Xie, L.; Wu, Y.; Yan, C.; Tian, Q. Unsupervised person re-identification via softened similarity learning. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 3390–3399. [Google Scholar]
  22. Jin, X.; Lan, C.; Zeng, W.; Chen, Z. Global distance-distributions separation for unsupervised person re-identification. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Proceedings, Part VII, Glasgow, UK, 23–28 August 2020; pp. 735–751. [Google Scholar]
  23. Wang, D.; Zhang, S. Unsupervised person re-identification via multi-label classification. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10981–10990. [Google Scholar]
  24. Deng, W.; Zheng, L.; Ye, Q.; Yang, Y.; Jiao, J. Similarity-preserving image-image domain adaptation for person re-identification. arXiv 2018, arXiv:1811.10551. [Google Scholar]
  25. Wei, L.; Zhang, S.; Gao, W.; Tian, Q. Person transfer gan to bridge domain gap for person re-identification. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 79–88. [Google Scholar]
  26. Yu, H.-X.; Zheng, W.-S.; Wu, A.; Guo, X.; Gong, S.; Lai, J.-H. Unsupervised person re-identification by soft multilabel learning. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2148–2157. [Google Scholar]
  27. Fu, Y.; Wei, Y.; Wang, G.; Zhou, Y.; Shi, H.; Huang, T. Self-similarity grouping: A simple unsupervised cross domain adaptation approach for person re-identification. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6112–6121. [Google Scholar]
  28. Zheng, K.; Liu, W.; He, L.; Mei, T.; Luo, J.; Zha, Z.-J. Group-aware label transfer for domain adaptive person re-identification. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 5310–5319. [Google Scholar]
  29. Bai, Z.; Wang, Z.; Wang, J.; Hu, D.; Ding, E. Unsupervised multi-source domain adaptation for person re-identification. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 12914–12923. [Google Scholar]
  30. Qi, L.; Liu, J.; Wang, L.; Shi, Y.; Geng, X. Unsupervised generalizable multi-source person re-identification: A domain-specific adaptive framework. Pattern Recognit. 2023, 140, 109546. [Google Scholar] [CrossRef]
  31. Wang, H.; Hu, J.; Zhang, G. Multi-source transfer network for cross domain person re-identification. IEEE Access 2020, 8, 83265–83275. [Google Scholar] [CrossRef]
  32. Chen, Y.; Tang, Q.; Ma, H. Multi-source adaptive meta-learning framework for domain generalization person re-identification. Soft Comput. 2024, 28, 4799–4820. [Google Scholar] [CrossRef]
  33. Tian, Q.; Cheng, Y. Unsupervised multi-source domain adaptation for person re-identification via sample weighting. Intell. Data Anal. 2024, 28, 943–960. [Google Scholar] [CrossRef]
  34. Tian, Q.; Cheng, Y.; He, S.; Sun, J. Unsupervised multi-source domain adaptation for person re-identification via feature fusion and pseudo-label refinement. Comput. Electr. Eng. 2024, 113, 109029. [Google Scholar] [CrossRef]
  35. Xian, Y.; Peng, Y.X.; Sun, X.; Zheng, W.S. Distilling consistent relations for multi-source domain adaptive person re-identification. Pattern Recognit. 2025, 157, 110821. [Google Scholar] [CrossRef]
  36. Ester, M.; Kriegel, H.-P.; Sander, J.; Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, OR, USA, 2–4 August 1996; pp. 226–231. [Google Scholar]
  37. Fan, H.; Zheng, L.; Yan, C.; Yang, Y. Unsupervised person re-identification: Clustering and fine-tuning. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 2018, 14, 83. [Google Scholar] [CrossRef]
  38. Zheng, L.; Shen, L.; Tian, L.; Wang, S.; Wang, J.; Tian, Q. Scalable person re-identification: A benchmark. In Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1116–1124. [Google Scholar]
  39. Ristani, E.; Solera, F.; Zou, R.; Cucchiara, R.; Tomasi, C. Performance measures and a data set for multi-target, multi-camera tracking. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–10 & 15–16 October 2016; pp. 17–35. [Google Scholar]
  40. Li, W.; Zhao, R.; Xiao, T.; Wang, X. Deepreid: Deep filter pairing neural network for person re-identification. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 152–159. [Google Scholar]
  41. Kingma, D.P. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  42. Zhong, Z.; Zheng, L.; Luo, Z.; Li, S.; Yang, Y. Invariance matters: Exemplar memory for domain adaptive person re-identification. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 598–607. [Google Scholar]
  43. Zhai, Y.; Lu, S.; Ye, Q.; Shan, X.; Chen, J.; Ji, R.; Tian, Y. Ad-cluster: Augmented discriminative clustering for domain adaptive person re-identification. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9021–9030. [Google Scholar]
  44. Zhao, F.; Liao, S.; Xie, G.-S.; Zhao, J.; Zhang, K.; Shao, L. Unsupervised domain adaptation with noise resistible mutual-training for person re-identification. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Proceedings, Part XI, Glasgow, UK, 23–28 August 2020; pp. 526–544. [Google Scholar]
  45. Zhai, Y.; Ye, Q.; Lu, S.; Jia, M.; Ji, R.; Tian, Y. Multiple Expert Brainstorming for Domain Adaptive Person Re-Identification. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Proceedings of the Computer Vision—ECCV 2020, Part VII, Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; pp. 594–611. [Google Scholar]
  46. Yang, Q.; Yu, H.-X.; Wu, A.; Zheng, W.-S. Patch-based discriminative feature learning for unsupervised person re-identification. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3633–3642. [Google Scholar]
  47. Ge, Y.; Zhu, F.; Chen, D.; Zhao, R. Self-paced contrastive learning with hybrid memory for domain adaptive object re-id. Adv. Neural Inf. Process. Syst. 2020, 33, 11309–11321. [Google Scholar]
  48. Zheng, K.; Lan, C.; Zeng, W.; Zhang, Z.; Zha, Z.-J. Exploiting sample uncertainty for domain adaptive person re-identification. Proc. AAAI Conf. Artif. Intell. 2021, 35, 3538–3546. [Google Scholar] [CrossRef]
  49. Xian, Y.; Hu, H. Enhanced multi-dataset transfer learning method for unsupervised person re-identification using co-training strategy. IET Comput. Vis. 2018, 12, 1219–1227. [Google Scholar] [CrossRef]
  50. Yu, H.-X.; Wu, A.; Zheng, W.-S. Unsupervised person re-identification by deep asymmetric metric embedding. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 42, 956–973. [Google Scholar] [CrossRef]
  51. Wu, A.; Zheng, W.-S.; Guo, X.; Lai, J.-H. Distilled person re-identification: Towards a more scalable system. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 1187–1196. [Google Scholar]
  52. Li, J.; Zhang, S. Joint Visual and Temporal Consistency for Unsupervised Domain Adaptive Person Re-identification. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Proceedings of the Computer Vision—ECCV 2020, Part XXIV, Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; pp. 483–499. [Google Scholar]
  53. Zou, Y.; Yang, X.; Yu, Z.; Kumar, B.V.K.V.; Kautz, J. Joint Disentangling and Adaptation for Cross-Domain Person Re-Identification. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Proceedings of the Computer Vision—ECCV 2020, Part II, Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; pp. 87–104. [Google Scholar]
  54. Luo, C.; Song, C.; Zhang, Z. Generalizing Person Re-Identification by Camera-Aware Invariance Learning and Cross-Domain Mixup. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Proceedings of the Computer Vision—ECCV 2020, Part XV, Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; pp. 224–241. [Google Scholar]
  55. Xuan, S.; Zhang, S. Intra-inter camera similarity for unsupervised person re-identification. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 11921–11930. [Google Scholar]
Figure 1. An overview of the proposed framework, which includes three key components: (1) a feature extractor that extracts pedestrian features from various domains; (2) the Reverse Attention-based Feature Fusion (RAFF) module that facilitates message passing and fuses features from different domains, with identical colors representing the same domain and distinct shapes denoting different pedestrian IDs; (3) the Self-Correcting Pseudo-Label Loss (SCPL) that uses a memory bank to generate soft labels and pseudo-labels (hard labels) for the target domain. This framework aims to improve the discriminability of pedestrian features across domains by utilizing domain-specific information and reducing the impact of noisy pseudo-labels during training. The RAFF module focuses on transferring less relevant, domain-specific features, while the SCPL loss helps correct and reinforce pseudo-labels, enhancing the model’s generalization ability.
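To make the SCPL pathway in Figure 1 concrete, the following is a minimal PyTorch sketch of how a memory bank can produce soft labels and temper noisy hard pseudo-labels. It is an illustration under stated assumptions, not the paper's exact formulation: the confidence-weighting rule, the `temperature` and `alpha` values, and the name `scpl_loss` are all hypothetical.

```python
import torch
import torch.nn.functional as F

def scpl_loss(feats, pseudo_labels, memory, temperature=0.05, alpha=0.5):
    """Hedged sketch of a self-correcting pseudo-label loss.

    Assumptions: `memory` holds one L2-normalised centroid per pseudo-identity,
    shape (C, D); soft labels are softmax similarities to the memory; the
    hard-label term is scaled by the soft label's agreement with it, so
    low-confidence (likely noisy) pseudo-labels contribute less.
    """
    feats = F.normalize(feats, dim=1)
    logits = feats @ memory.t() / temperature           # (B, C) similarities
    soft = logits.softmax(dim=1)                        # soft labels from the memory bank
    hard_ce = F.cross_entropy(logits, pseudo_labels, reduction="none")
    # confidence: how strongly the memory agrees with the hard pseudo-label
    conf = soft.gather(1, pseudo_labels.unsqueeze(1)).squeeze(1).detach()
    # soft-label term keeps a training signal even when the hard label is doubted
    soft_ce = -(soft.detach() * logits.log_softmax(dim=1)).sum(dim=1)
    return (conf * hard_ce + alpha * soft_ce).mean()
```

Down-weighting the hard-label term by the memory's own agreement mirrors the caption's goal of correcting and reinforcing pseudo-labels rather than trusting them uniformly.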
Figure 2. RAFF overview: Pedestrian features from different domains are fused by the Reverse Attention-based Feature Fusion (RAFF) mechanism. For simplicity, N, B, and D are set to 2, 6, and 5, respectively, where N denotes the number of domains, B the number of pedestrian images per domain, and D the feature dimension. Domain prototypes are mapped to Q, while all pedestrian features are mapped to K and V. Domain style centroids are obtained from the inverse attention map A′ applied to V and are then updated by EMA. Finally, the fused pedestrian features h are obtained via an MLP for subsequent metric learning and pseudo-label generation.
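The following PyTorch sketch mirrors the data flow in the Figure 2 caption (prototypes → Q, pedestrian features → K and V, reverse attention → style centroids → EMA → MLP fusion). The complement-and-renormalise form of the reverse attention, the EMA momentum of 0.9, and the concatenation-based fusion are illustrative assumptions; the paper's exact operators may differ.

```python
import torch
import torch.nn as nn

class RAFFSketch(nn.Module):
    """Hedged sketch of Reverse Attention-based Feature Fusion (RAFF)."""

    def __init__(self, num_domains: int, dim: int, momentum: float = 0.9):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)  # domain prototypes -> Q
        self.k_proj = nn.Linear(dim, dim)  # pedestrian features -> K
        self.v_proj = nn.Linear(dim, dim)  # pedestrian features -> V
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.momentum = momentum
        # running domain style centroids, updated by EMA on each forward pass
        self.register_buffer("centroids", torch.zeros(num_domains, dim))

    def forward(self, feats, prototypes, domain_ids):
        # feats: (N*B, D) pedestrian features; prototypes: (N, D); domain_ids: (N*B,)
        q = self.q_proj(prototypes)                       # (N, D)
        k, v = self.k_proj(feats), self.v_proj(feats)     # (N*B, D)
        attn = torch.softmax(q @ k.t() / k.size(-1) ** 0.5, dim=-1)  # (N, N*B)
        # reverse attention A': complement of A, re-normalised, so it weights the
        # samples *least* aligned with the prototype, i.e. domain-specific style
        rev = 1.0 - attn
        rev = rev / rev.sum(dim=-1, keepdim=True)
        batch_centroids = rev @ v                         # (N, D) style centroids
        with torch.no_grad():
            self.centroids.mul_(self.momentum).add_((1 - self.momentum) * batch_centroids)
        # fuse each feature with its own domain's style centroid via the MLP
        return self.mlp(torch.cat([feats, self.centroids[domain_ids]], dim=-1))

# Toy sizes from the caption: N = 2 domains, B = 6 images, D = 5 dims
raff = RAFFSketch(num_domains=2, dim=5)
h = raff(torch.randn(12, 5), torch.randn(2, 5),
         torch.repeat_interleave(torch.arange(2), 6))     # h: (12, 5) fused features
```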
Figure 3. Visualization of feature distribution during training with Market-1501 as the target domain. Panels (a–c) correspond to the 1st, 10th, and 20th epochs, respectively. To enhance clarity, features from the top ten categories of each domain are selected, with different colors representing different categories. Solid circles represent all source domains, while squares represent the target domain.
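A figure like Figure 3 can be reproduced with a short t-SNE script. The sketch below (scikit-learn and matplotlib) is illustrative only: the function name and arguments are hypothetical, and selecting the top ten identities per domain is left to the caller.

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_feature_distribution(features, labels, is_target, epoch):
    """Project features to 2-D with t-SNE and plot them in the style of
    Figure 3: one colour per identity, circles for source-domain samples,
    squares for the target domain.

    features:  (num_samples, feature_dim) float array
    labels:    (num_samples,) integer identity labels
    is_target: (num_samples,) boolean mask for target-domain samples
    """
    emb = TSNE(n_components=2, init="pca", random_state=0).fit_transform(features)
    for marker, mask in (("o", ~is_target), ("s", is_target)):
        plt.scatter(emb[mask, 0], emb[mask, 1], c=labels[mask],
                    cmap="tab10", marker=marker, s=15)
    plt.title(f"Epoch {epoch}")
    plt.axis("off")
    plt.show()
```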
Figure 4. Sensitivity analysis of α and β.
Figure 5. Visualization of retrieval results of different algorithms on representative samples.
Table 1. Performance comparison of single-source and multi-source unsupervised domain adaptation methods on Market-1501 and DukeMTMC-reID target domains. M: Market-1501, D: DukeMTMC-reID, C: CUHK03, T: MSMT17. The mAP and rank-1 are reported in %.

| Methods | Reference | Source (→ Market-1501) | mAP | Rank-1 | Source (→ DukeMTMC) | mAP | Rank-1 |
|---|---|---|---|---|---|---|---|
| Single-Source UDA Methods | | | | | | | |
| ECN [42] | CVPR’19 | D | 43.0 | 75.1 | M | 40.4 | 63.3 |
| SSG [27] | ICCV’19 | D | 58.3 | 80.0 | M | 53.4 | 73.0 |
| MMCL [23] | CVPR’20 | D | 60.4 | 84.4 | M | 51.4 | 72.4 |
| ACT [6] | AAAI’20 | D | 60.6 | 80.5 | M | 54.5 | 72.4 |
| AD-Cluster [43] | CVPR’20 | D | 68.3 | 86.7 | M | 54.1 | 72.6 |
| NRMT [44] | ECCV’20 | D | 72.2 | 88.0 | M | 62.3 | 78.1 |
| MMT [4] | ICLR’20 | D | 71.2 | 87.7 | M | 65.1 | 78.0 |
| MEB [45] | ECCV’20 | D | 76.0 | 89.9 | M | 66.1 | 79.6 |
| MAR [26] | CVPR’19 | T | 40.0 | 67.7 | T | 48.0 | 67.1 |
| PAUL [46] | CVPR’19 | T | 40.1 | 68.5 | T | 53.2 | 72.0 |
| SpCL [47] | NIPS’20 | T | 77.5 | 89.7 | – | – | – |
| UNRN [48] | AAAI’21 | D | 78.1 | 91.9 | M | 69.1 | 82.0 |
| GLT [28] | CVPR’21 | D | 79.5 | 92.2 | M | 69.2 | 82.0 |
| Multiple-Source UDA Methods | | | | | | | |
| PUCL [49] | IET-CV’18 | D + C | 22.9 | 48.5 | M + C | 19.5 | 33.1 |
| DECAMEL [50] | TPAMI’19 | Multi. | 32.4 | 60.2 | – | – | – |
| MASDF [51] | CVPR’19 | Multi. | 33.5 | 61.5 | Multi. | 29.4 | 48.4 |
| MSUDA [29] | CVPR’21 | D + T + C | 86.0 | 94.8 | M + T + C | 68.9 | 82.1 |
| CDM [35] | PR’25 | D + T + C | 81.2 | 92.9 | M + T + C | 70.8 | 82.3 |
| RCDL (ours) | This work | D + T + C | 82.4 | 93.1 | M + T + C | 71.9 | 81.8 |
Table 2. Performance comparison of single-source and multi-source unsupervised domain adaptation methods on the MSMT17 target domain. M: Market-1501, D: DukeMTMC-reID, C: CUHK03. The mAP and rank-1 are reported in %. The results for MMT-dbscan* and SpCL*, marked with asterisks, are reproduced from those reported in [35].

| Methods | Reference | Source | mAP | Rank-1 |
|---|---|---|---|---|
| Single-Source UDA Methods | | | | |
| PTGAN [25] | CVPR’18 | D | 3.3 | 11.8 |
| ECN [42] | CVPR’19 | D | 10.2 | 30.2 |
| SSG [27] | ICCV’19 | D | 13.3 | 32.2 |
| MMCL [23] | CVPR’20 | D | 16.2 | 43.6 |
| JVTC [52] | ECCV’20 | D | 19.0 | 42.1 |
| NRMT [44] | ECCV’20 | D | 20.6 | 45.2 |
| DG-Net++ [53] | ECCV’20 | D | 22.1 | 48.8 |
| MMT [4] | ICLR’20 | D | 23.3 | 50.0 |
| GPR [54] | ECCV’20 | D | 24.3 | 51.7 |
| SpCL [47] | NIPS’20 | M | 26.8 | 53.7 |
| GLT [28] | CVPR’21 | D | 27.7 | 59.5 |
| Multiple-Source UDA Methods | | | | |
| MMT-dbscan* | ICLR’20 | M + D + C | 25.9 | 51.8 |
| SpCL* | NIPS’20 | M + D + C | 27.3 | 54.1 |
| CDM [35] | PR’25 | M + D + C | 32.9 | 63.8 |
| RCDL (ours) | This work | M + D + C | 34.1 | 62.9 |
Table 3. Ablation studies evaluating model components using Market-1501 as the target domain. M: Market-1501, D: DukeMTMC-reID, C: CUHK03, T: MSMT17. The mAP and rank-1 are reported in %.

| Method | mAP (D + T + C → M) | Rank-1 (D + T + C → M) | Notes |
|---|---|---|---|
| Baseline (w/o RAFF, w/o SCPL) | 72.3 | 83.1 | Baseline performance |
| RAFF (w/o SCPL) | 76.3 | 88.1 | Attention-driven feature fusion |
| SCPL (w/o RAFF) | 74.9 | 87.2 | Constraining pseudo-label noise |
| RCDL (RAFF + SCPL) | 82.4 | 93.1 | Optimal performance with synergy observed |
Table 4. Comparison of test results on the power field operator dataset. Target: PowerID80. The mAP and rank-1 are reported in %.

| Methods | Reference | Source | mAP | Rank-1 |
|---|---|---|---|---|
| MMT-dbscan * | ICLR’20 | D + C | 78.7 | 89.3 |
| MEB * | ECCV’20 | D + C | 80.2 | 91.1 |
| MSUDA * | CVPR’21 | D + C | 88.1 | 93.6 |
| Ours | This work | D + C | 87.5 | 92.1 |
