Person Re-Identification Under Non-Overlapping Cameras Based on Advanced Contextual Embeddings
Abstract
1. Introduction
- We conducted a systematic analysis of the key hyperparameters of CA-Jaccard [2], namely the number of neighbors (k1) and the expansion range (k2), providing an empirical basis for setting these parameters when applying the method to different datasets.
- We utilized Feature Vector Normalization (L2 normalization) and Feature Space Compression (PCA) to jointly enhance person re-identification performance. Together, they enable robust and efficient cross-camera feature matching while maintaining high accuracy.
- We leveraged Neighborhood Expansion Structure Optimization and Cross-camera Penalty Weight Tuning, which work together to improve the robustness and accuracy of cross-camera feature matching.
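The normalization and compression steps in the second contribution can be illustrated with a minimal NumPy sketch. It assumes the 768-dimensional TransReID embeddings are stacked row-wise in an array; the function names, the target dimension of 256, and the random stand-in features are hypothetical choices for illustration, not settings reported in this paper.

```python
import numpy as np


def l2_normalize(features: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row (one embedding per image) to unit Euclidean length."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    # eps guards against division by zero for an all-zero row.
    return features / np.maximum(norms, eps)


def fit_pca(features: np.ndarray, n_components: int):
    """Fit PCA on row vectors; return the mean and the top principal directions."""
    mean = features.mean(axis=0)
    # Rows of vt are orthonormal principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:n_components]


def compress(features: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project onto the principal subspace, then re-normalize so that Euclidean
    distance and cosine similarity produce the same neighbor ranking."""
    return l2_normalize((features - mean) @ components.T)


# Usage on random stand-in data (hypothetical; real inputs would be TransReID features).
rng = np.random.default_rng(0)
train = l2_normalize(rng.normal(size=(500, 768)))
gallery = l2_normalize(rng.normal(size=(100, 768)))
mean, comps = fit_pca(train, n_components=256)  # 256 is an illustrative choice only
gallery_c = compress(gallery, mean, comps)
print(gallery_c.shape)  # (100, 256)
```

Fitting the PCA basis once on training features and reusing it for query and gallery keeps all compressed vectors in the same coordinate system; the final re-normalization matters because, for unit vectors, squared Euclidean distance equals 2 minus twice the cosine similarity.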
2. Related Work
2.1. Paradigm Shift from Handcrafted Features to Deep Learning
2.2. Introduction and Development of the Transformer Architecture
2.3. Auxiliary Technologies and Specialized Research Directions
- Distance Metric and Re-ranking: The distance matching performed after feature extraction is equally crucial, because plain Euclidean distance cannot fully capture the semantic similarity between features. To address this, Zhong et al. [11] proposed k-reciprocal re-ranking, which encodes the k-reciprocal neighbors of each sample and refines the initial ranking with a Jaccard distance computed over these neighbor sets. The CA-Jaccard [2] method adopted in our study builds on this foundation by incorporating camera information into the Jaccard distance to tackle the domain shift across cameras.
- Data Augmentation: To enhance model robustness, data augmentation techniques are widely applied. Random Erasing, proposed by Zhong et al. [12], simulates occlusion by randomly erasing regions of an image. Meanwhile, methods based on Generative Adversarial Networks (GANs), such as PTGAN proposed by Wei et al. [13] and CamStyle proposed by Zhong et al. [14], generate samples in different styles to bridge the domain gap between datasets and between cameras, respectively.
- Unsupervised and Cross-Domain Learning: Due to the high cost of data annotation, unsupervised ReID has become a prominent research area. MMT (Mutual Mean-Teaching), proposed by Ge et al. [15], employs a framework in which two models learn from each other to effectively leverage unlabeled data. In the context of cross-domain adaptation, SPGAN, proposed by Deng et al. [16], and HH-ReID, proposed by Zhong et al. [17], are dedicated to preserving identity similarity during style transfer and addressing adaptation challenges with heterogeneous data modalities, respectively.
- Multi-Modal and Emerging Data Sources: To overcome the limitations of RGB imagery, research has begun to explore multi-modal fusion. Wu et al. [18] combined RGB and infrared data to handle scenes with insufficient illumination. More recently, Guo et al. [19] pioneered a ReID method based on LiDAR point clouds, which leverages 3D geometric structure to counteract appearance variations and has opened a new direction for the field.
- Video-based ReID: For video data, the research focus lies in effectively exploiting temporal information. Early works, such as that by McLaughlin et al. [20], used recurrent networks for sequence modeling. Subsequent methods such as TRL [21] and STMN [22] designed more elaborate spatiotemporal modeling mechanisms to capture dynamic cues.
- Lightweight Models: To deploy ReID models on resource-constrained edge devices, lightweight design has also garnered significant attention. The classic MobileNetV2 [23] architecture is widely used, while MetaGON [24], proposed by Peng et al., combines meta-learning and GANs to build an efficient domain generalization model specifically for edge devices.
- Camera-aware k-reciprocal Nearest Neighbors (CK-RNNs) [2]: Traditional k-RNNs consider only the mutual proximity of image features and ignore camera information, so same-camera samples, which share viewpoint and illumination, can crowd true cross-camera matches out of the neighbor set. CK-RNNs instead search for k-reciprocal neighbors separately on the intra-camera and inter-camera ranking lists, filtering out noisy neighbors caused by camera variations while ensuring that cross-camera samples still contribute to the neighbor overlap.
- Camera-aware Local Query Expansion (CLQE) [2]: This auxiliary module uses camera variation as a strong constraint. During the query expansion stage, it mines reliable samples among the relevant neighbors and assigns them higher weights in the overlap computation, which further reduces cross-camera mismatches.
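The neighbor-set reasoning behind k-reciprocal re-ranking [11] and CK-RNNs [2] can be sketched in a few lines of NumPy. This is a simplified reading under stated assumptions: k-reciprocal neighbors are searched separately within the sample's own camera and across other cameras (hypothetical parameters `k_intra` and `k_inter`), CLQE is replaced by the plain local query expansion of Zhong et al. [11] (averaging the encodings of the `k2` nearest samples), and the distance is the soft Jaccard ratio of minima to maxima over the resulting encodings. It is not the CA-Jaccard reference implementation, which additionally re-weights reliable neighbors.

```python
import numpy as np


def topk(dist_row: np.ndarray, candidates: np.ndarray, k: int) -> np.ndarray:
    """Return the k candidate indices closest to the query (fewer if not enough)."""
    return candidates[np.argsort(dist_row[candidates], kind="stable")][:k]


def camera_aware_reciprocal(dist, cams, k_intra, k_inter):
    """Binary encoding V with V[i, j] = 1 when j is a k-reciprocal neighbor of i,
    searched separately among same-camera and other-camera samples
    (assumption: a simplified reading of CK-RNNs)."""
    n = dist.shape[0]
    idx = np.arange(n)
    V = np.zeros((n, n))
    for same_cam, k in ((True, k_intra), (False, k_inter)):
        # Top-k list of every sample, restricted to one camera group.
        lists = []
        for i in range(n):
            mask = (cams == cams[i]) if same_cam else (cams != cams[i])
            lists.append(set(topk(dist[i], idx[mask], k).tolist()))
        # Keep j only if the relation is mutual: i is also in j's list.
        for i in range(n):
            for j in lists[i]:
                if i in lists[j]:
                    V[i, j] = 1.0
    return V


def local_query_expansion(V, dist, k2):
    """Plain local query expansion: average the encodings of each sample's
    k2 nearest neighbors (the sample itself included). CLQE's camera-aware
    weighting is not reproduced here."""
    nn = np.argsort(dist, axis=1, kind="stable")[:, :k2]
    return V[nn].mean(axis=1)


def jaccard_distance(V):
    """Soft Jaccard distance 1 - sum(min) / sum(max) between encoding rows.
    Builds an n x n x n array, so it only suits small examples."""
    mins = np.minimum(V[:, None, :], V[None, :, :]).sum(axis=2)
    maxs = np.maximum(V[:, None, :], V[None, :, :]).sum(axis=2)
    return 1.0 - mins / np.maximum(maxs, 1e-12)


# Toy usage: 8 random embeddings from 2 cameras (hypothetical data and k values).
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))
dist = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=2)
cams = np.array([0, 0, 1, 1, 0, 1, 0, 1])
V = camera_aware_reciprocal(dist, cams, k_intra=2, k_inter=2)
d_jaccard = jaccard_distance(local_query_expansion(V, dist, k2=2))
```

Splitting the search by camera group means a query's inter-camera neighbors are never displaced by its (typically closer) same-camera neighbors. In Zhong et al. [11] the final distance also mixes in the original distance with a weight λ; that step is omitted here.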
3. Proposed Method
3.1. System Framework
3.1.1. Feature Extraction and Optimization
3.1.2. Camera-Aware Re-Ranking
3.2. Stage One: Feature Extraction and Optimization
3.2.1. TransReID-Based Feature Extraction
3.2.2. Feature Space Optimization
L2 Normalization
PCA Compression
3.3. Stage Two: Camera-Aware Re-Ranking
4. Experiments
4.1. Experimental Setup
4.2. Experimental Results
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. He, S.; Luo, H.; Wang, P.; Wang, F.; Li, H.; Jiang, W. TransReID: Transformer-Based Object Re-Identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 15013–15022.
2. Chen, Y.; Fan, Z.; Chen, Z.; Zhu, Y. CA-Jaccard: Camera-Aware Jaccard Distance for Person Re-Identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; pp. 17532–17541.
3. Gray, D.; Tao, H. Viewpoint Invariant Pedestrian Recognition with an Ensemble of Localized Features. In Proceedings of the European Conference on Computer Vision (ECCV 2008), Marseille, France, 12–18 October 2008; LNCS 5302; Springer: Berlin/Heidelberg, Germany, 2008; pp. 262–275.
4. Farenzena, M.; Bazzani, L.; Perina, A.; Murino, V.; Cristani, M. Person re-identification by symmetry-driven accumulation of local features. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA, 13–18 June 2010; pp. 2360–2367.
5. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
6. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), San Diego, CA, USA, 20–26 June 2005; pp. 886–893.
7. Sun, Y.; Zheng, L.; Yang, Y.; Tian, Q.; Wang, S. Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline). In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; LNCS 11211; pp. 501–518.
8. Wang, G.; Yuan, Y.; Chen, X.; Li, J.; Zhou, X. Learning discriminative features with multiple granularities for person re-identification. In Proceedings of the ACM Multimedia Conference (ACM MM), Seoul, Republic of Korea, 22–26 October 2018; pp. 274–282.
9. Zhu, K.; Guo, H.; Zhang, S.; Wang, Y.; Liu, J.; Wang, J.; Tang, M. AAformer: Auto-aligned transformer for person re-identification. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 17307–17317.
10. Zhang, G.; Zhang, P.; Qi, J.; Lu, H. HAT: Hierarchical Aggregation Transformers for Person Re-Identification. In Proceedings of the 29th ACM International Conference on Multimedia (ACM MM 2021), Chengdu, China, 20–24 October 2021; pp. 516–525.
11. Zhong, Z.; Zheng, L.; Cao, D.; Li, S. Re-ranking person re-identification with k-reciprocal encoding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 3652–3661.
12. Zhong, Z.; Zheng, L.; Kang, G.; Li, S.; Yang, Y. Random Erasing Data Augmentation. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-20), New York, NY, USA, 7–12 February 2020; pp. 13001–13008.
13. Wei, L.; Zhang, S.; Gao, W.; Tian, Q. Person Transfer GAN to Bridge Domain Gap for Person Re-Identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 79–88.
14. Zhong, Z.; Zheng, L.; Zheng, Z.; Li, S.; Yang, Y. Camera Style Adaptation for Person Re-Identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 5157–5166.
15. Ge, Y.; Chen, D.; Li, H. Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-Identification. In Proceedings of the Eighth International Conference on Learning Representations (ICLR 2020), Virtual Event, 26–30 April 2020; Available online: https://openreview.net/forum?id=rJlnOhVYPS (accessed on 4 September 2025).
16. Deng, W.; Zheng, L.; Ye, Q.; Kang, G.; Yang, Y.; Jiao, J. Image-Image Domain Adaptation with Preserved Self-Similarity and Domain-Dissimilarity for Person Re-Identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 994–1003.
17. Zhong, Z.; Zheng, L.; Cao, D.; Li, S. Generalizing a Person Retrieval Model Hetero- and Homogeneously. In Computer Vision—ECCV 2018, Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Springer: Cham, Switzerland, 2018; pp. 172–188.
18. Wu, A.; Zheng, W.-S.; Yu, H.-X.; Gong, S.; Lai, J. RGB-Infrared Cross-Modality Person Re-Identification. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 5390–5399.
19. Guo, W.; Pan, Z.; Liang, Y.; Xi, Z.; Zhong, Z.; Feng, J.; Zhou, J. LiDAR-Based Person Re-Identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; pp. 17437–17447.
20. McLaughlin, N.; Martinez del Rincon, J.; Miller, P. Recurrent Convolutional Network for Video-Based Person Re-Identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1325–1334.
21. Dai, J.; Zhang, P.; Lu, H.; Wang, H. Video Person Re-Identification by Temporal Residual Learning. arXiv 2018, arXiv:1802.07918.
22. Eom, C.; Lee, G.; Lee, J.; Ham, B. Video-Based Person Re-Identification with Spatial and Temporal Memory Networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 12036–12045.
23. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 4510–4520.
24. Peng, X.; Zhao, H.; Zhang, X.; Zhang, C.; Chen, C. MetaGON: A Lightweight Pedestrian Re-Identification Domain Generalization Model Adapted to Edge Devices. IEEE Open J. Commun. Soc. 2024, 5, 690–699.
25. Yan, J.; Wang, Y.; Luo, X.; Tai, Y.W. FusionSegReID: Advancing person re-identification with multimodal retrieval and precise segmentation. arXiv 2025, arXiv:2503.21595.
26. Asperti, A.; Fiorilla, S.; Nardi, S.; Orsini, L. A review of recent techniques for person re-identification. Mach. Vis. Appl. 2025, 36, 25.
27. Zheng, L.; Shen, L.; Tian, L.; Wang, S.; Wang, J.; Tian, Q. Scalable Person Re-Identification: A Benchmark. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1116–1124.
28. Yang, Z.; Jin, X.; Zheng, K.; Zhao, F. Unleashing Potential of Unsupervised Pre-Training with Intra-Identity Regularization for Person Re-Identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 14298–14307.
29. Zhou, K.; Yang, Y.; Cavallaro, A.; Xiang, T. Omni-Scale Feature Learning for Person Re-Identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 3702–3712.
30. Zhu, K.; Guo, H.; Liu, Z.; Tang, M.; Wang, J. Identity-Guided Human Semantic Parsing for Person Re-Identification. In Computer Vision—ECCV 2020; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2020; Volume 12348, pp. 346–363.
31. Zhang, Z.; Lan, C.; Zeng, W.; Jin, X.; Chen, Z. Relation-Aware Global Attention for Person Re-Identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 3186–3195.
32. Li, H.; Wu, G.; Zheng, W.-S. Combined Depth Space Based Architecture Search for Person Re-Identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 6729–6738.
33. Zhang, A.; Gao, Y.; Niu, Y.; Liu, W.; Zhou, Y. Coarse-to-Fine Person Re-Identification with Auxiliary-Domain Classification and Second-Order Information Bottleneck. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 598–607.
34. Wang, P.; Zhao, Z.; Su, F.; Meng, H. LTReID: Factorizable Feature Generation with Independent Components for Long-Tailed Person Re-Identification. IEEE Trans. Multimed. 2022, 25, 4610–4622.
35. Jia, M.; Cheng, X.; Lu, S.; Zhang, J. Learning Disentangled Representation Implicitly via Transformer for Occluded Person Re-Identification. IEEE Trans. Multimed. 2022, 25, 1294–1305.
36. Wang, T.; Liu, H.; Song, P.; Guo, T.; Shi, W. Pose-Guided Feature Disentangling for Occluded Person Re-Identification Based on Transformer. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-22), Virtual Event, 22 February–1 March 2022; pp. 2540–2549.
37. Zhu, H.; Ke, W.; Li, D.; Liu, J.; Tian, L.; Shan, Y. Dual Cross-Attention Learning for Fine-Grained Visual Categorization and Object Re-Identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 4692–4702.
38. Zhang, G.; Zhang, Y.; Zhang, T.; Li, B.; Pu, S. PHA: Patch-Wise High-Frequency Augmentation for Transformer-Based Person Re-Identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 14133–14142.





| Item | Setting |
|---|---|
| CPU | Intel Core i7-13700 (2.10 GHz) |
| GPU | NVIDIA GeForce RTX 4090 |
| GPU Memory | 24 GB |
| OS | Ubuntu 23.10.1 |
| Python Version | 3.8.18 |
| PyTorch Version | 1.10.0 |
| CUDA Runtime Version | 11.2 |
| Batch Size | 64 |
| Epochs | 120 |
| Optimizer | Adam |
| Learning Rate | 0.00035 |
| Weight Decay | 5 × 10⁻⁴ |
| Image Size | 256 × 128 (H × W) |
| Feature Dimension | 768 |
| Transformer Type | ViT-Base |
CA-Jaccard (k1, k2) combinations on Market1501 (k1: number of neighbors; k2: expansion range; ε: DBSCAN radius).

| Methods | DBSCAN (ε) | mAP | Rank-1 |
|---|---|---|---|
| TransReID | - | 88.2% | 95.0% |
| TransReID + CA-Jaccard (k1 = 20, k2 = 4) | 0.5 | 93.57% | 94.95% |
| TransReID + CA-Jaccard (k1 = 25, k2 = 4) | 0.5 | 93.56% | 94.96% |
| TransReID + CA-Jaccard (k1 = 30, k2 = 4) | 0.5 | 93.42% | 94.77% |
| TransReID + CA-Jaccard (k1 = 20, k2 = 6) | 0.5 | 93.58% | 94.98% |
| TransReID + CA-Jaccard (k1 = 25, k2 = 6) | 0.5 | 93.51% | 94.86% |
| TransReID + CA-Jaccard (k1 = 30, k2 = 6) | 0.5 | 93.45% | 94.75% |

CA-Jaccard (k1, k2) combinations on MSMT17 (k1: number of neighbors; k2: expansion range; ε: DBSCAN radius).

| Methods | DBSCAN (ε) | mAP | Rank-1 |
|---|---|---|---|
| TransReID | - | 64.9% | 83.3% |
| TransReID + CA-Jaccard (k1 = 20, k2 = 4) | 0.5 | 68.76% | 83.28% |
| TransReID + CA-Jaccard (k1 = 25, k2 = 4) | 0.5 | 68.78% | 83.24% |
| TransReID + CA-Jaccard (k1 = 30, k2 = 4) | 0.5 | 68.79% | 83.24% |
| TransReID + CA-Jaccard (k1 = 20, k2 = 6) | 0.5 | 68.81% | 83.30% |
| TransReID + CA-Jaccard (k1 = 25, k2 = 6) | 0.5 | 68.79% | 83.24% |
| TransReID + CA-Jaccard (k1 = 30, k2 = 6) | 0.5 | 68.79% | 83.29% |
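Reading the two hyperparameter grids side by side is a small bookkeeping exercise. The sketch below copies the mAP values from the two sweep tables (the first baseline, 88.2%/95.0%, matches the Market1501 TransReID result and the second, 64.9%/83.3%, the MSMT17 one), keyed by the pair (number of neighbors, expansion range), written here as (k1, k2); the helper name `summarize` is hypothetical. On both datasets the best mAP falls at (k1, k2) = (20, 6), and the whole grid spans only 0.16 pp on Market1501 and 0.05 pp on MSMT17, which suggests that CA-Jaccard is fairly insensitive to these two settings within the tested range.

```python
# mAP (%) copied from the two sweep tables, keyed by (k1, k2) =
# (number of neighbors, expansion range); the DBSCAN column is 0.5 in every row.
market1501 = {(20, 4): 93.57, (25, 4): 93.56, (30, 4): 93.42,
              (20, 6): 93.58, (25, 6): 93.51, (30, 6): 93.45}
msmt17 = {(20, 4): 68.76, (25, 4): 68.78, (30, 4): 68.79,
          (20, 6): 68.81, (25, 6): 68.79, (30, 6): 68.79}
BASELINE_MAP = {"Market1501": 88.2, "MSMT17": 64.9}  # plain TransReID rows


def summarize(grid, baseline):
    """Return the best (k1, k2), its mAP gain over the baseline, and the grid's spread."""
    best = max(grid, key=grid.get)
    spread = max(grid.values()) - min(grid.values())
    return best, grid[best] - baseline, spread


for name, grid in (("Market1501", market1501), ("MSMT17", msmt17)):
    best, gain, spread = summarize(grid, BASELINE_MAP[name])
    print(f"{name}: best (k1, k2) = {best}, +{gain:.2f} pp mAP, spread {spread:.2f} pp")
```

The same small spread holds for Rank-1, which stays within 0.25 pp of the TransReID baseline on Market1501 and within 0.06 pp on MSMT17.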
| Methods | Reference | Market1501 mAP (%) | Market1501 Rank-1 (%) | MSMT17 mAP (%) | MSMT17 Rank-1 (%) |
|---|---|---|---|---|---|
| CNN-based methods | | | | | |
| OSNet [29] | ICCV2019 | 84.9 | 94.8 | 52.9 | 78.7 |
| ISP [30] | ECCV2020 | 84.9 | 94.2 | - | - |
| RGA-SC [31] | CVPR2020 | 88.4 | 96.1 | 57.5 | 80.3 |
| CDNet [32] | CVPR2021 | 86.0 | 95.1 | 54.7 | 80.3 |
| C2F [33] | CVPR2021 | 87.7 | 94.8 | - | - |
| LTReID [34] | TMM2022 | 86.9 | 94.7 | 58.6 | 81.0 |
| ViT-based methods | | | | | |
| TransReID [1] | ICCV2021 | 88.2 | 95.0 | 64.9 | 83.3 |
| DRL-Net [35] | TMM2021 | 86.9 | 94.7 | 55.3 | 78.4 |
| HAT [10] | ACM MM2021 | 89.5 | 95.6 | 61.2 | 82.3 |
| PFD [36] | AAAI2022 | 89.7 | 95.5 | 64.4 | 83.8 |
| DCAL [37] | CVPR2022 | 87.5 | 94.7 | 64.0 | 83.1 |
| AAformer [9] | TNNLS2023 | 88.0 | 95.4 | 65.6 | 84.4 |
| PHA [38] | CVPR2023 | 90.2 | 96.1 | 68.9 | 86.1 |
| MGN + UP-ReID [28] | CVPR2022 | 91.1 | 97.1 | - | - |
| TransReID + CA-Jaccard (Ours) | - | 93.6 | 95.0 | 68.8 | 83.3 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Chuang, C.-H.; Huang, T.-C.; Wang, C.-W.; Lo, J.-H.; Lin, C.-L. Person Re-Identification Under Non-Overlapping Cameras Based on Advanced Contextual Embeddings. Algorithms 2025, 18, 714. https://doi.org/10.3390/a18110714

