A Patch Information Supplement Transformer for Person Re-Identification
Abstract
1. Introduction
- We designed a Patch Pyramid Network (PPN) and combined it with a transformer for person re-ID, addressing both the global perturbation caused by spatial dimension segmentation and the weakness of fine-grained features.
- We proposed an identity information embedding (IDE) module that encodes identity information through learnable embeddings, effectively addressing the problem of learned feature bias.
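To make the "spatial dimension segmentation" mentioned in the first contribution concrete, the following minimal sketch shows how a ViT-style backbone cuts an image into non-overlapping patches. The 16-pixel patch size follows the ViT-B/16 baseline; the 256 × 128 input size and the function name `split_into_patches` are illustrative assumptions, not the authors' settings or code.

```python
def split_into_patches(image, patch=16):
    """Split a 2-D image (list of rows) into non-overlapping patch x patch tiles.

    Tiles are returned in row-major order. The height and width must be
    multiples of `patch`, as in a plain ViT patch embedding.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    tiles = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tile = [row[left:left + patch] for row in image[top:top + patch]]
            tiles.append(tile)
    return tiles


# Hypothetical 256 x 128 single-channel input (an assumed size, for illustration).
image = [[(r * 128 + c) % 256 for c in range(128)] for r in range(256)]
patches = split_into_patches(image)
print(len(patches))  # 16 rows of patches x 8 columns = 128 patches
```

Because each tile becomes a separate token, detail that straddles a tile border is split across tokens; this is the kind of segmentation effect the PPN is said to address.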
2. Related Work
2.1. Auxiliary Feature Representation Learning
2.2. Visual Transformer
3. Methodology
3.1. Baseline
3.2. Patch Pyramid Network
3.3. Identity Information Embedding Module
4. Experiments
4.1. Datasets
4.2. Experimental Strategy and Experimental Environment
4.3. Results of Backbones
4.4. Ablation Study of PPN
4.5. Ablation Study of IDE
4.6. Ablation Study of PIT
4.7. Comparison with State-of-the-Art Approaches
4.8. The Matching Visualization Results
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Zheng, W.S.; Gong, S.; Xiang, T. Associating Groups of People. In Proceedings of the British Machine Vision Conference, London, UK, 7–10 September 2009.
- Hirzer, M.; Beleznai, C.; Roth, P.M.; Bischof, H. Person re-identification by descriptive and discriminative classification. In Proceedings of the Image Analysis: 17th Scandinavian Conference, SCIA 2011, Ystad, Sweden, 9 May 2011; pp. 91–102.
- Khorramshahi, P.; Peri, N.; Chen, J.C.; Chellappa, R. The devil is in the details: Self-supervised attention for vehicle re-identification. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; pp. 369–386.
- Sun, Y.; Zheng, L.; Yang, Y.; Tian, Q.; Wang, S. Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline). In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 480–496.
- Zhang, Z.; Lan, C.; Zeng, W.; Jin, X.; Chen, Z. Relation-aware global attention for person re-identification. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; pp. 3186–3195.
- Luo, W.; Li, Y.; Urtasun, R.; Zemel, R. Understanding the effective receptive field in deep convolutional neural networks. arXiv 2017, arXiv:1701.04128.
- Luo, H.; Gu, Y.; Liao, X.; Lai, S.; Jiang, W. Bag of tricks and a strong baseline for deep person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–17 June 2019.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
- Wang, C.; Zhang, Q.; Huang, C.; Liu, W.; Wang, X. Mancs: A multi-task attentional network with curriculum sampling for person re-identification. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 365–381.
- Yao, H.; Zhang, S.; Hong, R.; Zhang, Y.; Xu, C.; Tian, Q. Deep representation learning with part loss for person re-identification. IEEE Trans. Image Process. 2019, 28, 2860–2871.
- Luo, H.; Jiang, W.; Zhang, X.; Fan, X.; Qian, J.; Zhang, C. AlignedReID++: Dynamically matching local information for person re-identification. Pattern Recognit. 2019, 94, 53–61.
- Zhang, X.; Luo, H.; Fan, X.; Xiang, W.; Sun, Y.; Xiao, W.; Jiang, W.; Zhang, C.; Sun, J. Alignedreid: Surpassing human-level performance in person re-identification. arXiv 2017, arXiv:1711.08184.
- Zhuang, Z.; Wei, L.; Xie, L.; Zhang, T.; Zhang, H.; Wu, H.; Ai, H.; Tian, Q. Rethinking the distribution gap of person re-identification with camera-based batch normalization. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 140–157.
- Sun, X.; Zheng, L. Dissecting person re-identification from the viewpoint of viewpoint. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 608–617.
- Wang, G.; Lai, J.; Huang, P.; Xie, X. Spatial-temporal person re-identification. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 8933–8940.
- Ye, M.; Shen, J.; Lin, G.; Xiang, T.; Shao, L.; Hoi, S.C. Deep learning for person re-identification: A survey and outlook. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 2872–2893.
- Zheng, L.; Shen, L.; Tian, L.; Wang, S.; Wang, J.; Tian, Q. Scalable person re-identification: A benchmark. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1116–1124.
- Ristani, E.; Solera, F.; Zou, R.; Cucchiara, R.; Tomasi, C. Performance measures and a data set for multi-target, multi-camera tracking. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–10 and 15–16 October 2016; pp. 17–35.
- Miao, J.; Wu, Y.; Liu, P.; Ding, Y.; Yang, Y. Pose-guided feature alignment for occluded person re-identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 542–551.
- Wei, L.; Zhang, S.; Gao, W.; Tian, Q. Person transfer gan to bridge domain gap for person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 79–88.
- Chang, X.; Hospedales, T.M.; Xiang, T. Multi-level factorisation net for person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2109–2118.
- Lin, J.; Ren, L.; Lu, J.; Feng, J.; Zhou, J. Consistent-aware deep learning for person re-identification in a camera network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5771–5780.
- Sarfraz, M.S.; Schumann, A.; Eberle, A.; Stiefelhagen, R. A pose-sensitive embedding for person re-identification with expanded cross neighborhood re-ranking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5771–5780.
- Liu, Y.; Li, D.; Wan, S.; Wang, F.; Dou, W.; Xu, X.; Li, S.; Ma, R.; Qi, L. A long short-term memory-based model for greenhouse climate prediction. Int. J. Intell. Syst. 2022, 37, 135–151.
- Liu, Y.; Song, Z.; Xu, X.; Rafique, W.; Zhang, X.; Shen, J. Bidirectional GRU networks-based next POI category prediction for healthcare. Int. J. Intell. Syst. 2022, 37, 4020–4040.
- Zhong, Z.; Zheng, L.; Kang, G.; Li, S.; Yang, Y. Random erasing data augmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 13001–13008.
- Li, Y.; He, J.; Zhang, T.; Liu, X.; Zhang, Y.; Wu, F. Diverse part discovery: Occluded person re-identification with part-aware transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 2898–2907.
- Liu, X.; Zhao, H.; Tian, M.; Sheng, L.; Shao, J.; Yi, S.; Yan, J.; Wang, X. Hydraplus-net: Attentive deep features for pedestrian analysis. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 350–359.
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 213–229.
- Jiang, Y.; Chang, S.; Wang, Z. Transgan: Two transformers can make one strong gan. arXiv 2021, arXiv:2102.07074.
- Li, X.; Hou, Y.; Wang, P.; Gao, Z.; Xu, M.; Li, W. Trear: Transformer-based rgb-d egocentric action recognition. IEEE Trans. Cogn. Dev. Syst. 2021, 14, 246–252.
- He, S.; Luo, H.; Wang, P.; Wang, F.; Li, H.; Jiang, W. Transreid: Transformer-based object re-identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 15013–15022.
- Wang, G.A.; Yang, S.; Liu, H.; Wang, Z.; Yang, Y.; Wang, S.; Yu, G.; Zhou, E.; Sun, J. High-order information matters: Learning relation and topology for occluded person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 6449–6458.
- Chen, H.; Lagadec, B.; Bremond, F. Ice: Inter-instance contrastive encoding for unsupervised person re-identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 14960–14969.
- Zhu, K.; Guo, H.; Zhang, S.; Wang, Y.; Huang, G.; Qiao, H.; Liu, J.; Wang, J.; Tang, M. Aaformer: Auto-aligned transformer for person re-identification. arXiv 2021, arXiv:2104.00921.
- Jia, M.; Cheng, X.; Lu, S.; Zhang, J. Learning disentangled representation implicitly via transformer for occluded person re-identification. arXiv 2022, arXiv:2107.02380.
- Ge, Y.; Zhu, F.; Chen, D.; Zhao, R. Self-paced contrastive learning with hybrid memory for domain adaptive object re-id. Adv. Neural Inf. Process. Syst. 2020, 33, 11309–11321.
- Wang, Z.; Zhu, F.; Tang, S.; Zhao, R.; He, L.; Song, J. Feature Erasing and Diffusion Network for Occluded Person Re-Identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 4754–4763.
- Zhang, X.; Li, D.; Wang, Z.; Wang, J.; Ding, E.; Shi, J.Q.; Zhang, Z.; Wang, J. Implicit Sample Extension for Unsupervised Person Re-Identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 7369–7378.
Dataset | #ID | #Cam | #Image | #Query | #Gallery |
---|---|---|---|---|---|
Market-1501 | 1501 | 6 | 32,668 | 3368 | 15,913 |
Duke-MTMC | 1812 | 8 | 36,411 | 2228 | 17,661 |
MSMT17 | 4101 | 15 | 126,441 | 11,659 | 82,161 |
Occluded-Duke | 1812 | 8 | 15,618 | 2210 | 17,661 |
Backbone | Market-1501 mAP | Market-1501 Rank-1 | Duke-MTMC mAP | Duke-MTMC Rank-1 | Occluded-Duke mAP | Occluded-Duke Rank-1 |
---|---|---|---|---|---|---|
SqueezeNet | 30.7 | 49.1 | 21 | 36.3 | 6.7 | 11.2 |
ShuffleNet | 34.1 | 56.9 | 25.2 | 41.1 | 9.5 | 14.8 |
DenseNet | 62.5 | 83.9 | 61.6 | 73.6 | 28.8 | 42.1 |
ResNet50 | 67.9 | 84.5 | 54.3 | 73 | 31.1 | 42.7 |
ResNet101 | 68.7 | 84.9 | 59.7 | 76.8 | 38.4 | 41 |
SEResNet50 | 73.7 | 88.4 | 60.1 | 77.6 | 32.7 | 42.8 |
SEResNet101 | 75.1 | 89.3 | 63 | 80.2 | 35.9 | 45.3 |
ViT-B/16 | 86.7 | 94.5 | 79.4 | 88.4 | 52.6 | 60.2 |
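The tables in this section report mAP and Rank-k accuracy. As a minimal sketch of how these two retrieval metrics are conventionally computed, the pure-Python example below scores each query from its ranked gallery list. The function names and toy identities are illustrative assumptions; the camera-based filtering of gallery samples used in standard re-ID protocols is omitted for brevity, and this is not the authors' evaluation code.

```python
def rank_k_and_ap(ranked_gallery_ids, query_id, ks=(1, 5, 10)):
    """Return ({k: hit}, average precision) for one query.

    `ranked_gallery_ids` lists gallery identities sorted from most to least
    similar to the query.
    """
    matches = [gid == query_id for gid in ranked_gallery_ids]
    # Rank-k hit: at least one correct identity among the top k results.
    hits = {k: any(matches[:k]) for k in ks}
    num_relevant = sum(matches)
    if num_relevant == 0:
        return hits, 0.0
    found, precision_sum = 0, 0.0
    for position, is_match in enumerate(matches, start=1):
        if is_match:
            found += 1
            precision_sum += found / position  # precision at this correct hit
    return hits, precision_sum / num_relevant


def evaluate(queries, ks=(1, 5, 10)):
    """Average Rank-k accuracy and AP over (query_id, ranked_gallery_ids) pairs."""
    totals = {k: 0 for k in ks}
    ap_sum = 0.0
    for query_id, ranked in queries:
        hits, ap = rank_k_and_ap(ranked, query_id, ks)
        for k in ks:
            totals[k] += hits[k]
        ap_sum += ap
    n = len(queries)
    return {k: totals[k] / n for k in ks}, ap_sum / n


# Toy example with made-up identities (not data from the paper).
queries = [
    ("a", ["a", "b", "a", "c"]),  # AP = (1/1 + 2/3) / 2
    ("b", ["c", "b", "a", "b"]),  # AP = (1/2 + 2/4) / 2
]
cmc, mean_ap = evaluate(queries, ks=(1, 2))
print(cmc, round(mean_ap, 4))
```

Rank-1 only checks the top result, whereas mAP rewards placing every correct gallery image early, which is why the two columns can move by different amounts between methods.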
Backbone | Extraction Module | mAP | Rank-1 | Rank-5 | Rank-10 |
---|---|---|---|---|---|
Baseline | √ | 60.9 | 81 | 90.2 | 92.9 |
+PPN | √ √ √ | 61.2 | 81.2 | 90.2 | 92.9 |
+PPN | √ √ √ √ | 61.9 | 81.8 | 90.3 | 93.0 |
+PPN | √ √ √ √ | 61.7 | 81.5 | 90.2 | 92.9 |
+PPN | √ √ √ √ √ | 62.5 | 82.1 | 90.5 | 93.1 |
Method | mAP | Rank-1 | Rank-5 | Rank-10 |
---|---|---|---|---|
Baseline | 86.7 | 94.5 | 98.2 | 99.0 |
Baseline + IDE | 88.4 | 95.3 | 98.5 | 99.1 |
Backbone | PPN | IDE | mAP | Rank-1 | Rank-5 | Rank-10 |
---|---|---|---|---|---|---|
Baseline | × | × | 60.9 | 81 | 90.2 | 92.9 |
Baseline | √ | × | 62.5 | 82.1 | 90.5 | 93.1 |
Baseline | × | √ | 62.8 | 81.9 | 90.4 | 92.9 |
PIT | √ | √ | 63.1 | 82.3 | 90.8 | 93.1 |
Methods | Market-1501 mAP | Market-1501 Rank-1 | Duke-MTMC mAP | Duke-MTMC Rank-1 | MSMT17 mAP | MSMT17 Rank-1 | Occluded-Duke mAP | Occluded-Duke Rank-1 |
---|---|---|---|---|---|---|---|---|
PCB (2018) [4] | 81.6 | 93.8 | 69.2 | 83.3 | - | - | 33.7 | 42.6 |
AlignedReID (2017) [12] | 79.3 | 91.2 | - | - | - | - | 37.3 | 51.4 |
PGFA (2019) [19] | 76.8 | 91.2 | 65.5 | 82.6 | - | - | 43.8 | 55.1 |
HONet (2020) [33] | 84.9 | 94.2 | 75.6 | 86.9 | - | - | 52.3 | 62.8 |
ICE (2021) [34] | 86.6 | 95.1 | 76.5 | 88.2 | 50.4 | 76.4 | - | - |
TransReID (2021) [32] | 88.0 | 94.7 | 81.2 | 90.1 | 63.9 | 82.7 | 55.6 | 62.8 |
PAT (2021) [27] | 88.0 | 95.4 | 78.2 | 88.8 | - | - | 53.6 | 64.5 |
AAformer (2021) [35] | 87.7 | 95.4 | 80.0 | 90.1 | 63.2 | 83.6 | 58.2 | 67.0 |
DRL-Net (2022) [36] | 86.9 | 94.7 | 76.6 | 88.1 | 55.3 | 78.4 | 50.8 | 65.0 |
SpCL (2020) [37] | 76.7 | 90.3 | 68.8 | 82.9 | 26.8 | 53.7 | - | - |
FED (2022) [38] | 85.9 | 94.8 | 78.0 | 89.4 | - | - | 55.8 | 67.4 |
ISE (2022) [39] | 87.8 | 95.6 | - | - | 51.0 | 76.8 | - | - |
Baseline | 86.7 | 94.5 | 79.4 | 88.4 | 60.9 | 81 | 52.6 | 60.2 |
Ours | 88.8 | 95.6 | 81.3 | 90.4 | 63.1 | 82.3 | 54.8 | 61.7 |
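The gain of the proposed method over its baseline can be read off the last two rows of the comparison table. The short sketch below simply subtracts the Baseline row from the Ours row for each dataset; the values are copied from the table, and the variable names are illustrative.

```python
# Baseline and "Ours" rows copied from the comparison table: (mAP, Rank-1) in %.
baseline = {
    "Market-1501": (86.7, 94.5),
    "Duke-MTMC": (79.4, 88.4),
    "MSMT17": (60.9, 81.0),
    "Occluded-Duke": (52.6, 60.2),
}
ours = {
    "Market-1501": (88.8, 95.6),
    "Duke-MTMC": (81.3, 90.4),
    "MSMT17": (63.1, 82.3),
    "Occluded-Duke": (54.8, 61.7),
}

# Per-dataset improvement in percentage points, rounded to one decimal place.
gains = {
    name: tuple(round(o - b, 1) for o, b in zip(ours[name], baseline[name]))
    for name in baseline
}
for name, (map_gain, rank1_gain) in gains.items():
    print(f"{name}: mAP +{map_gain}, Rank-1 +{rank1_gain}")
```

On these figures the mAP gain is between 1.9 and 2.2 points on every dataset, while the Rank-1 gain ranges from 1.1 to 2.0 points.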
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhu, L.; Jiang, C.; Wu, M. A Patch Information Supplement Transformer for Person Re-Identification. Electronics 2023, 12, 1997. https://doi.org/10.3390/electronics12091997