Dual-Stage Attribute Embedding and Modality Consistency Learning-Based Visible–Infrared Person Re-Identification
Abstract
1. Introduction
- We propose an attribute information embedding (AIE) module that learns fine-grained, modality-consistent information; this is the first exploration of fusing attribute information with token embeddings in a transformer backbone.
- We design an attribute embedding enhancement (AEE) module that embeds the attribute information a second time, so that the learned fine-grained discriminative features are not destroyed during training.
- To reduce modality differences, we design a modality consistency learning (MCL) loss that reduces the distribution discrepancy between the predictions for pedestrian images of the same identity.
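To make the third contribution concrete, the following is a minimal sketch of one way a prediction-level consistency term could be written. It is not the loss defined in Section 3.3, whose exact form is not reproduced here. The symmetric KL divergence and the names `softmax`, `kl_divergence`, and `modality_consistency_loss` are assumptions chosen for illustration, and the logits are made-up numbers.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def modality_consistency_loss(vis_logits, ir_logits):
    """Hypothetical consistency term (an assumption, not the paper's loss):
    symmetric KL between the identity predictions of a visible image and an
    infrared image that share the same identity label."""
    p_vis = softmax(vis_logits)
    p_ir = softmax(ir_logits)
    return 0.5 * (kl_divergence(p_vis, p_ir) + kl_divergence(p_ir, p_vis))


# Toy logits over three identities (illustrative values only).
same = modality_consistency_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
diff = modality_consistency_loss([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0])
```

In this sketch the term is zero when both modalities produce identical predictions and grows as the two predicted distributions drift apart, which is the behaviour a consistency loss of this kind is meant to penalize.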
2. Related Work
2.1. Single-Modality Person Re-Identification
2.2. Visible–Infrared Cross-Modality Person Re-Identification
3. Methods
3.1. Attribute Information Embedding Module
3.2. Attribute Embedding Enhancement Module
3.3. Modality Consistency Learning
3.4. Loss Function
4. Experiments
4.1. Datasets and Evaluation Metrics
4.2. Implementation Details
4.3. Comparison with State-of-the-Art Methods
4.4. Ablation Study
4.5. Visual Analysis
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Method | Venue | SYSU-MM01 All-Search Rank-1 | SYSU-MM01 All-Search mAP | SYSU-MM01 Indoor-Search Rank-1 | SYSU-MM01 Indoor-Search mAP | RegDB VIS to IR Rank-1 | RegDB VIS to IR mAP | RegDB IR to VIS Rank-1 | RegDB IR to VIS mAP |
---|---|---|---|---|---|---|---|---|---|
cmGAN | IJCAI-2018 | 26.97 | 27.80 | 31.63 | 42.19 | - | - | - | - |
D2RL | CVPR-2019 | 28.90 | 29.20 | - | - | 43.40 | 44.10 | - | - |
MSR | TIP-2019 | 37.35 | 38.11 | 39.64 | 50.88 | 48.43 | 48.67 | - | - |
Hi-CMD | CVPR-2020 | 34.94 | 35.94 | - | - | 70.93 | 66.04 | - | - |
AlignGAN | ICCV-2019 | 42.40 | 40.70 | 45.90 | 54.30 | 57.90 | 53.60 | 56.30 | 53.40 |
CMSP | IJCV-2020 | 43.56 | 44.98 | 48.62 | 57.50 | 65.07 | 64.50 | - | - |
cm-SSFT | CVPR-2020 | 47.70 | 54.10 | - | - | 65.40 | 65.60 | 63.80 | 64.20 |
XIV-ReID | AAAI-2020 | 49.92 | 50.73 | - | - | 62.21 | 60.18 | - | - |
DDAG | ECCV-2020 | 54.75 | 53.02 | 61.02 | 67.98 | 69.34 | 63.46 | 68.06 | 61.80 |
NFS | CVPR-2021 | 56.91 | 55.45 | 62.79 | 69.79 | 80.54 | 72.10 | 77.95 | 69.79 |
DFLN-ViT | TMM-2022 | 59.84 | 57.70 | 62.13 | 69.03 | - | - | - | - |
CM-NAS | ICCV-2021 | 61.99 | 60.02 | 67.01 | 72.95 | 84.54 | 80.32 | 82.57 | 78.31 |
CMTR | TMM-2023 | 62.58 | 61.33 | 67.02 | 73.78 | 80.62 | 74.42 | 81.06 | 73.75 |
SPOT | TIP-2022 | 65.34 | 62.25 | 69.42 | 74.63 | 80.35 | 72.46 | 79.37 | 72.26 |
MCLNet | ICCV-2021 | 65.40 | 61.98 | 72.56 | 76.58 | 80.31 | 73.07 | 75.93 | 69.49 |
SMCL | ICCV-2021 | 67.39 | 61.78 | 68.84 | 75.56 | 83.93 | 79.83 | 83.05 | 78.57 |
PMT | AAAI-2023 | 67.53 | 64.98 | 71.66 | 76.52 | 84.83 | 76.55 | 84.16 | 75.13 |
MPANet | CVPR-2021 | 70.58 | 68.24 | 76.74 | 80.95 | 83.70 | 80.90 | 82.80 | 80.70 |
MAUM | CVPR-2022 | 71.68 | 68.79 | 76.97 | 81.94 | 87.87 | 85.09 | 86.95 | 84.34 |
CMT | ECCV-2022 | 71.88 | 68.57 | 76.90 | 79.91 | 95.17 | 87.30 | 91.97 | 84.46 |
Ours | - | 73.27 | 74.57 | 78.80 | 83.68 | 93.42 | 88.61 | 92.25 | 87.02 |
Setting | SYSU-MM01 All-Search Rank-1 | SYSU-MM01 All-Search mAP | SYSU-MM01 Indoor-Search Rank-1 | SYSU-MM01 Indoor-Search mAP |
---|---|---|---|---|
Base | 48.57 | 47.36 | 67.86 | 61.55 |
Base + AIE | 59.21 | 60.95 | 70.24 | 71.48 |
Base + AIE + AEE | 70.58 | 71.66 | 75.95 | 81.84 |
Base + AIE + AEE + MCL | 73.27 | 74.57 | 78.80 | 83.68 |
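The tables above report Rank-1 accuracy and mean average precision (mAP). As a rough illustration of how these two retrieval metrics are computed from a query-by-gallery distance matrix, here is a simplified sketch. It assumes a single ranked gallery per query and ignores dataset-specific protocol details such as camera filtering or repeated gallery sampling; the function name `rank1_and_map` and the toy data are hypothetical.

```python
def rank1_and_map(dist, query_ids, gallery_ids):
    """Return (Rank-1, mAP) in percent for a query-by-gallery distance matrix.

    Simplified sketch: no camera filtering, one gallery shared by all queries.
    """
    hits, aps = 0, []
    for q, row in enumerate(dist):
        # Rank gallery items from most to least similar (smallest distance first).
        order = sorted(range(len(row)), key=lambda g: row[g])
        matches = [gallery_ids[g] == query_ids[q] for g in order]
        if not any(matches):
            continue  # queries with no true match in the gallery are skipped
        hits += int(matches[0])
        found, precisions = 0, []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                found += 1
                precisions.append(found / rank)  # precision at each true match
        aps.append(sum(precisions) / len(precisions))
    n = len(aps)
    if n == 0:
        return 0.0, 0.0
    return 100.0 * hits / n, 100.0 * sum(aps) / n


# Toy example: two queries, three gallery images (illustrative values only).
dist = [[0.1, 0.9, 0.4],
        [0.8, 0.5, 0.3]]
rank1, mean_ap = rank1_and_map(dist, query_ids=[1, 2], gallery_ids=[1, 2, 1])
```

In the toy example the first query retrieves a correct match at rank 1, while the second finds its only match at rank 2, so Rank-1 is 50% and mAP is 75%.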
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Cheng, Z.; Fan, H.; Wang, Q.; Liu, S.; Tang, Y. Dual-Stage Attribute Embedding and Modality Consistency Learning-Based Visible–Infrared Person Re-Identification. Electronics 2023, 12, 4892. https://doi.org/10.3390/electronics12244892