Dual Branch Attention Network for Person Re-Identification
Abstract
1. Introduction
- We design a residual attention block (CPSA) that extracts channel, position, and spatial attention features along different dimensions. It captures key fine-grained information, such as bags and shoes. We observe that the three attention components are complementary and adaptively capture the vital information in the input image.
- We observe that multi-branch networks learn more robust features than single-branch ones. Specifically, in addition to the global branch, we introduce a second branch after the fourth layer of ResNet-50. Each branch uses generalized mean pooling (GeM) [17]. The dual-branch strategy is important for fusing vital global and local information.
- In addition to ID loss, we use the complementary triplet loss and WRL loss to optimize DBA-Net. Each branch employs the same loss function. Triplet loss and WRL loss enhance intra-class compactness and inter-class separability in Euclidean space.
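To make the metric-learning term in the last contribution concrete, the following is a minimal sketch of a standard triplet loss on Euclidean distances. It is not the authors' implementation: the function names, the toy 2-D embeddings, and the margin of 0.3 are assumptions chosen for illustration, and the WRL loss is not shown.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two embeddings of equal length."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 0.3,  # assumed value, not taken from the paper
) -> float:
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative) + margin
    return max(gap, 0.0)


# Toy embeddings (hypothetical): the positive shares the anchor's identity.
anchor = [0.0, 0.0]
positive = [0.3, 0.4]       # distance 0.5 from the anchor
negative_far = [3.0, 4.0]   # distance 5.0 from the anchor
negative_near = [0.6, 0.0]  # distance 0.6 from the anchor

easy = triplet_loss(anchor, positive, negative_far)   # margin already satisfied
hard = triplet_loss(anchor, positive, negative_near)  # margin violated
print(easy, hard)
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, which is how the term pulls same-identity samples together and pushes different identities apart.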
2. Related Work
2.1. Attention Mechanisms in Person Re-ID
2.2. Loss Function for DBA-Net
3. Dual Branch Attention Network
3.1. Architecture of DBA-Net
3.2. CPSA Module
3.3. Dual Branch with Generalized Mean Pooling
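As a rough illustration of generalized mean pooling (GeM) [17], the sketch below pools the spatial activations of a single channel by taking the p-th root of the mean of the p-th powers. The function name, the exponent p = 3, the clamping constant, and the sample values are assumptions for illustration only; in practice p is often a learnable parameter and the operation is applied to every channel of a feature map.

```python
from typing import Sequence


def gem_pool(activations: Sequence[float], p: float = 3.0, eps: float = 1e-6) -> float:
    """Generalized mean of one channel's spatial activations."""
    if not activations:
        raise ValueError("activations must be non-empty")
    # Clamp to a small positive value so that the powers stay well defined.
    clamped = [max(a, eps) for a in activations]
    mean_of_powers = sum(a ** p for a in clamped) / len(clamped)
    return mean_of_powers ** (1.0 / p)


# Hypothetical activations of one channel after a ReLU.
channel = [0.0, 1.0, 2.0, 5.0]

avg_like = gem_pool(channel, p=1.0)   # p = 1 reduces to average pooling
gem_3 = gem_pool(channel, p=3.0)      # emphasizes larger activations
max_like = gem_pool(channel, p=50.0)  # large p approaches max pooling
print(avg_like, gem_3, max_like)
```

With p = 1 the result matches average pooling, and as p grows it approaches the maximum activation, so GeM interpolates between the two common pooling choices.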
3.4. Loss Function
4. Experiment
4.1. Dataset Description
4.2. Hyper-Parameter Experimental Analysis
4.3. Comparison with State-of-the-Art Methods
4.4. Ablation Experiment
4.5. CPS-Attention
5. Visualization and Analysis
5.1. Loss and CMC Curves
5.2. Visualization
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Ye, M.; Shen, J.; Lin, G.; Xiang, T.; Shao, L.; Hoi, S.C.H. Deep learning for person re-identification: A survey and outlook. IEEE Trans. Pattern Anal. Mach. Intell. 2021.
- Li, W.; Zhao, R.; Xiao, T.; Wang, X. DeepReID: Deep Filter Pairing Neural Network for Person Re-identification. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014.
- Schneider, S.; Taylor, G.W.; Kremer, S.C. Similarity Learning Networks for Animal Individual Re-Identification—Beyond the Capabilities of a Human Observer. In Proceedings of the 2020 IEEE Winter Applications of Computer Vision Workshops (WACVW), Snowmass, CO, USA, 1–5 March 2020.
- Yang, W.; Huang, H.; Zhang, Z.; Chen, X.; Huang, K.; Zhang, S. Towards Rich Feature Discovery with Class Activation Maps Augmentation for Person Re-Identification. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019.
- Chen, B.; Deng, W.; Hu, J. Mixed High-Order Attention Network for Person Re-Identification. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019.
- Chen, T.; Ding, S.; Xie, J.; Yuan, Y.; Chen, W.; Yang, Y.; Ren, Z.; Wang, Z. ABD-Net: Attentive but Diverse Person Re-Identification. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019.
- Si, J.; Zhang, H.; Li, C.-G.; Kuen, J.; Kong, X.; Kot, A.C.; Wang, G. Dual Attention Matching Network for Context-Aware Feature Sequence Based Person Re-Identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
- Li, W.; Zhu, X.; Gong, S. Harmonious Attention Network for Person Re-Identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
- Luo, H.; Jiang, W.; Gu, Y.; Liu, F.; Liao, X.; Lai, S.; Gu, J. A Strong Baseline and Batch Normalization Neck for Deep Person Re-Identification. IEEE Trans. Multimed. 2019, 22, 2597–2609.
- Tay, C.P.; Roy, S.; Yap, K.H. AANet: Attribute Attention Network for Person Re-Identifications. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019.
- Luo, W.; Li, Y.; Urtasun, R.; Zemel, R. Understanding the Effective Receptive Field in Deep Convolutional Neural Networks. In Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016.
- Zhang, Z.; Lan, C.; Zeng, W.; Jin, X.; Chen, Z. Relation-Aware Global Attention for Person Re-Identification. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual Conference, 14–19 June 2020; Available online: http://cvpr2020.thecvf.com/ (accessed on 27 August 2021).
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
- Chen, X.; Fu, C.; Zhao, Y.; Zheng, F.; Song, J.; Ji, R.; Yang, Y. Salience-Guided Cascaded Suppression Network for Person Re-Identification. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual Conference, 14–19 June 2020; Available online: http://cvpr2020.thecvf.com/ (accessed on 27 August 2021).
- Wang, X.; Hua, Y.; Kodirov, E.; Hu, G.; Garnier, R.; Robertson, N.M. Ranked List Loss for Deep Metric Learning. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019.
- Gong, Y.; Wang, L.; Li, Y.; Du, A. A Discriminative Person Re-Identification Model With Global-Local Attention and Adaptive Weighted Rank List Loss. IEEE Access 2020, 8, 203700–203711.
- Radenovic, F.; Tolias, G.; Chum, O. Fine-tuning CNN Image Retrieval with No Human Annotation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 41, 1655–1668.
- Zheng, L.; Shen, L.; Tian, L.; Wang, S.; Wang, J.; Tian, Q. Scalable Person Re-identification: A Benchmark. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015.
- Ristani, E.; Solera, F.; Zou, R.; Cucchiara, R.; Tomasi, C. Performance Measures and a Data Set for Multi-target, Multi-camera Tracking. In Proceedings of the European Conference on Computer Vision (ECCV)—2016 Workshops, Amsterdam, The Netherlands, 8–16 October 2016.
- Wang, G.; Yuan, Y.; Li, J.; Ge, S.; Zhou, X. Receptive Multi-Granularity Representation for Person Re-Identification. IEEE Trans. Image Process. 2020, 29, 6096–6109.
- Zheng, Z.; Yang, X.; Yu, Z.; Zheng, L.; Yang, Y.; Kautz, J. Joint Discriminative and Generative Learning for Person Re-Identification. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019.
- Lu, Y.; Wu, Y.; Liu, B.; Zhang, T.; Li, B.; Chu, Q.; Yu, N. Cross-Modality Person Re-Identification With Shared-Specific Feature Transfer. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual Conference, 14–19 June 2020; Available online: http://cvpr2020.thecvf.com/ (accessed on 27 August 2021).
- Hou, R.; Ma, B.; Chang, H.; Gu, X.; Shan, S.; Chen, X. Interaction-And-Aggregation Network for Person Re-Identification. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019.
- Zheng, F.; Deng, C.; Sun, X.; Jiang, X.; Guo, X.; Yu, Z.; Huang, F.; Ji, R. Pyramidal Person Re-IDentification via Multi-Loss Dynamic Training. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019.
- Sun, Y.; Zheng, L.; Yang, Y.; Tian, Q.; Wang, S. Beyond Part Models: Person Retrieval with Refined Part Pooling (and A Strong Convolutional Baseline). In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
- Wang, C.; Zhang, Q.; Huang, C.; Liu, W.; Wang, X. Mancs: A Multi-task Attentional Network with Curriculum Sampling for Person Re-Identification. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
- Xia, B.; Gong, Y.; Zhang, Y.; Poellabauer, C. Second-Order Non-Local Attention Networks for Person Re-Identification. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019.
- Chen, G.; Lin, C.; Ren, L.; Lu, J.; Zhou, J. Self-Critical Attention Learning for Person Re-Identification. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019.
- Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local Neural Networks. Available online: https://arxiv.org/abs/1711.07971 (accessed on 27 August 2021).
- Cao, Y.; Xu, J.; Lin, S.; Wei, F.; Hu, H. GCNet: Non-Local Networks Meet Squeeze-Excitation Networks and Beyond. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Korea, 27–28 October 2019.
- Fu, J.; Liu, J.; Jiang, J.; Li, Y.; Bao, Y.; Lu, H. Scene Segmentation With Dual Relation-Aware Attention Network. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 2547–2560.
- Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
- Zheng, Z.; Zheng, L.; Yang, Y. A Discriminatively Learned CNN Embedding for Person Re-Identification. ACM Trans. Multimed. Comput. Commun. Appl. 2018, 14, 1–20.
- Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Li, F.-F. ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009.
- Liu, H.; Feng, J.; Qi, M.; Jiang, J.; Yan, S. End-to-End Comparative Attention Networks for Person Re-Identification. IEEE Trans. Image Process. 2017, 26, 3492–3506.
- Zhou, S.; Wang, F.; Huang, Z.; Wang, J. Discriminative Feature Learning With Consistent Attention Regularization for Person Re-Identification. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019.
- Li, K.; Ding, Z.; Li, K.; Zhang, Y.; Fu, Y. Support Neighbor Loss for Person Re-Identification. In Proceedings of the 26th ACM International Conference on Multimedia, Seoul, Korea, 22–26 October 2018.
- Zeng, M.; Tian, C.; Wu, Z. Person Re-identification with Hierarchical Deep Learning Feature and efficient XQDA Metric. In Proceedings of the 2018 ACM Multimedia Conference, New York, NY, USA, 22–26 October 2018.
- Wang, G.; Yuan, Y.; Chen, X.; Li, J.; Zhou, X. Learning Discriminative Features with Multiple Granularities for Person Re-Identification. In Proceedings of the 26th ACM International Conference on Multimedia, Seoul, Korea, 22–26 October 2018.
- Song, C.; Huang, Y.; Ouyang, W.; Wang, L. Mask-Guided Contrastive Attention Model for Person Re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
- Wang, Y.; Wang, L.; You, Y.; Zou, X.; Chen, V.; Li, S.; Huang, G.; Hariharan, B.; Weinberger, K.Q. Resource Aware Person Re-identification across Multiple Resolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
- Zhong, Z.; Zheng, L.; Kang, G.; Li, S.; Yang, Y. Random Erasing Data Augmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017.
- Gu, H.; Fu, G.; Li, J.; Zhu, J. Auto-ReID+: Searching For A Multi-branch ConvNet For Person Re-Identification. Neurocomputing 2021, 435, 53–66.
- Jiao, S.; Pan, Z.; Hu, G.; Shen, Q.; Du, L.; Chen, Y.; Wang, J. Multi-scale and multi-branch feature representation for person re-identification. Neurocomputing 2020, 414, 120–130.
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017.
Dataset | Train IDs | Train Images | Test IDs | Test Images | Total Images | Cameras |
---|---|---|---|---|---|---|
Market-1501 | 751 | 12,936 | 750 | 19,732 | 32,668 | 6 |
DukeMTMC-ReID | 702 | 16,522 | 702 | 19,889 | 36,411 | 8 |
CUHK-03 | 767 | 7,356 | 700 | 6,732 | 14,088 | 2 |
Epoch | Lr | Epoch | Lr |
---|---|---|---|
1–10 | 5.0 × × 0.1 t | ||
11–50 | 5.0 × | 161–210 | 4.0 × |
51–90 | 1.0 × | 211–250 | 8.0 × |
91–160 | 2.0 × | 251–300 | 1.6 × |
Methods (Market-1501) | Rank-1 (%) | mAP (%) |
---|---|---|
AANet [9] (CVPR2019) | 93.9 | 83.4 |
IANet [23] (CVPR2019) | 94.4 | 83.1 |
SONA [27] (ICCV2019) | 95.6 | 88.8 |
AGW [1] (arXiv2020) | 95.1 | 87.8 |
ABD-Net [6] (ICCV2019) | 95.6 | 88.3 |
MGN [20] (ACM2018) | 95.7 | 86.9 |
CAM [4] (CVPR2019) | 94.7 | 84.5 |
DG-Net [21] (CVPR2019) | 94.8 | 86.0 |
BagTricks [10] (CVPR2019) | 94.5 | 85.9 |
GLWR [16] (IEEE Access 2020) | 95.5 | 88.5 |
Pyramid [24] (CVPR2019) | 95.7 | 88.2 |
Auto-ReID+ [44] (Neurocomputing2021) | 95.8 | 88.2 |
SCSN (4 stages) [14] (CVPR2020) | 95.7 | 88.5 |
Ms-Mb [45] (Neurocomputing2020) | 95.8 | 88.9 |
DBA-Net | 95.9 | 90.3 |
Methods (DukeMTMC-ReID) | Rank-1 (%) | mAP (%) |
---|---|---|
AANet [9] (CVPR2019) | 87.7 | 74.3 |
IANet [23] (CVPR2019) | 83.1 | 73.4 |
SONA [27] (ICCV2019) | 89.5 | 78.3 |
AGW [1] (arXiv2020) | 89.0 | 79.6 |
ABD-Net [6] (ICCV2019) | 89.0 | 78.6 |
MGN [20] (ACM2018) | 88.7 | 78.4 |
CAM [4] (CVPR2019) | 85.8 | 72.9 |
DG-Net [21] (CVPR2019) | 86.6 | 74.8 |
BagTricks [10] (CVPR2019) | 86.4 | 76.4 |
SCAL (spatial) [28] (ICCV2019) | 89.0 | 79.6 |
SCAL (channel) [28] (ICCV2019) | 88.9 | 79.1 |
GLWR [16] (IEEE Access 2020) | 90.7 | 81.4 |
Pyramid [24] (CVPR2019) | 89.0 | 79.0 |
Auto-ReID+ [44] (Neurocomputing2021) | 90.1 | 80.1 |
SCSN (4 stages) [14] (CVPR2020) | 91.0 | 79.0 |
Ms-Mb [45] (Neurocomputing2020) | 90.8 | 82.2 |
DBA-Net | 92.1 | 83.0 |
Methods (CUHK-03) | Rank-1 (%) | mAP (%) |
---|---|---|
SONA [27] (ICCV2019) | 79.9 | 77.3 |
AGW [1] (arXiv2020) | 63.6 | 62.0 |
RGA-SC [12] (CVPR2020) | 79.6 | 74.5 |
MGN [20] (ACM2018) | 68.0 | 66.0 |
CAM [4] (CVPR2019) | 66.6 | 64.2 |
BagTricks [10] (CVPR2019) | 58.8 | 56.6 |
SCAL (channel) [28] (ICCV2019) | 71.1 | 68.6 |
GLWR [16] (IEEE Access 2020) | 82.3 | 78.9 |
Pyramid [24] (CVPR2019) | 78.9 | 74.8 |
Auto-ReID+ [44] (Neurocomputing2021) | 78.1 | 74.2 |
Ms-Mb [45] (Neurocomputing2020) | 75.4 | 72.9 |
SCSN (4 stages) [14] (CVPR2020) | 84.7 | 81.0 |
DBA-Net | 86.4 | 83.2 |
Method | Market-1501 Rank-1 (%) | Market-1501 mAP (%) | DukeMTMC-ReID Rank-1 (%) | DukeMTMC-ReID mAP (%) | CUHK-03 Rank-1 (%) | CUHK-03 mAP (%) |
---|---|---|---|---|---|---|
B | 93.7 | 83.9 | 85.6 | 74.8 | 60.2 | 55.0 |
+GeM | 94.1 | 84.5 | 86.8 | 75.4 | 62.1 | 56.4 |
+Triplet | 95.1 | 87.9 | 89.9 | 80.0 | 63.8 | 62.7 |
+WRLL | 95.3 | 89.6 | 90.7 | 80.8 | 83.3 | 80.9 |
+CPSA | 95.6 | 90.0 | 90.9 | 82.0 | 85.2 | 82.4 |
DBA-Net | 95.9 | 90.3 | 92.1 | 83.0 | 86.4 | 83.2 |
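The tables above report Rank-1 accuracy and mean average precision (mAP). The following is a minimal sketch of how these two metrics are commonly computed, assuming each query's gallery ranking has already been reduced to a list of match flags. The function names and the toy rankings are hypothetical, and the same-camera filtering used by standard re-ID evaluation protocols is omitted.

```python
from typing import List, Sequence


def rank_k_hit(matches: Sequence[bool], k: int = 1) -> bool:
    """True if a correct identity appears within the top-k gallery results."""
    return any(matches[:k])


def average_precision(matches: Sequence[bool]) -> float:
    """Average of the precision values at each rank where a match occurs."""
    hits = 0
    precision_sum = 0.0
    for rank, is_match in enumerate(matches, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Toy ranked lists (hypothetical): True marks a gallery image that shares
# the query's identity, ordered from most to least similar.
queries: List[List[bool]] = [
    [True, False, True, False],
    [False, True, False, False],
]

rank1 = sum(rank_k_hit(q, k=1) for q in queries) / len(queries)
mean_ap = sum(average_precision(q) for q in queries) / len(queries)
print(f"Rank-1: {rank1:.3f}, mAP: {mean_ap:.3f}")
```

Rank-1 only checks the top result, while mAP rewards placing all correct matches early in the ranking, which is why the two columns can move by different amounts across methods.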
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Fan, D.; Wang, L.; Cheng, S.; Li, Y. Dual Branch Attention Network for Person Re-Identification. Sensors 2021, 21, 5839. https://doi.org/10.3390/s21175839