Person Search via Deep Integrated Networks
Abstract
1. Introduction
2. Related Work
3. Overview of the Deep Integrated Networks for Person Search
3.1. Pedestrian Detection Network
3.1.1. Sharing Convolutional Layers
3.1.2. Region Proposal Network
3.1.3. ROI Pooling and Classification Layer
3.2. Discriminative Feature Extraction via the Multi-Class Identification Network
3.3. Similarity Measure with a Learnable Distance Metric
4. Experimental Results
4.1. Experimental Setup and Dataset
4.2. Experiments with Various Features
4.3. Performance Metrics and Experimental Results
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
Sharing Convolutional Layer | Number of Kernels | Kernel Size | Convolution Stride | Max Pooling Size | Pooling Stride |
---|---|---|---|---|---|
Conv1 | 96 | 7 × 7 | 2 | 3 × 3 | 2 |
Conv2 | 256 | 5 × 5 | 2 | 3 × 3 | 2 |
Conv3 | 384 | 3 × 3 | 1 | -- | -- |
Conv4 | 384 | 3 × 3 | 1 | -- | -- |
Conv5 | 256 | 3 × 3 | 1 | -- | -- |
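The convolution and pooling strides in the table above determine how much the shared layers downsample the input. The following is a minimal sketch that multiplies those strides to get the cumulative stride of the stack; the names `SHARED_LAYERS` and `cumulative_stride` are illustrative and do not come from the paper, and padding is ignored because it does not affect the stride product.

```python
from math import prod

# (layer name, convolution stride, pooling stride or None), copied from the table above.
SHARED_LAYERS = [
    ("Conv1", 2, 2),
    ("Conv2", 2, 2),
    ("Conv3", 1, None),
    ("Conv4", 1, None),
    ("Conv5", 1, None),
]


def cumulative_stride(layers):
    """Return the product of all convolution and pooling strides."""
    return prod(conv * (pool or 1) for _, conv, pool in layers)


total_stride = cumulative_stride(SHARED_LAYERS)
print(total_stride)  # 16
```

Under these assumptions, one cell of the Conv5 feature map corresponds to a step of 16 pixels in the input image.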
Experiments | Experiment 1 | Experiment 2 | Experiment 3 |
---|---|---|---|
Network Configuration | Multi-class CNN | Two-class CNN | Two-class CNN |
Training class number | 306 | 2 | 2 |
Feature types | RGB image | Vertically concatenated RGB image | Vertically concatenated convolutional feature maps |
Number of training pedestrian images | 7508 | Different pairs: 87,363; same pairs: 29,226 | Different pairs: 87,363; same pairs: 29,226 |
Test class number | 168 | 2 | 2 |
Number of test pedestrian images | TSet 1: Probe 168, Gallery 168 | 28,224 combinations of 168 IDs (i.e., 168 × 168 = 28,224) | 28,224 combinations of 168 IDs (i.e., 168 × 168 = 28,224) |
Threshold | Precision Rate | Recall Rate | F1-Score |
---|---|---|---|
0.5 | 0.82 | 0.52 | 0.63 |
0.7 | 0.49 | 0.31 | 0.38 |
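The F1-score in each row above is the harmonic mean of the precision and recall rates. The sketch below recomputes it from the rounded table values as a consistency check; the function name `f1_score` and the `ROWS` list are illustrative, and small differences from the reported values are expected because the table inputs are themselves rounded.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (first-column value, precision, recall, reported F1), copied from the table above.
ROWS = [
    (0.5, 0.82, 0.52, 0.63),
    (0.7, 0.49, 0.31, 0.38),
]

for first_col, precision, recall, reported in ROWS:
    recomputed = f1_score(precision, recall)
    # The recomputed value should agree with the reported one to within rounding.
    print(first_col, round(recomputed, 3), reported)
```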
Experiment 1 | Feature Vector | Distance Metric | Top-1 | Top-5 | Top-10 | Top-20 |
---|---|---|---|---|---|---|
TSet 1 | FC6 | Euclidean | 42.9 | 66.7 | 77.4 | 89.9 |
TSet 1 | FC6 | Learned metric | 43.5 | 66.7 | 76.8 | 86.9 |
TSet 1 | FC7 | Euclidean | 37.5 | 60.7 | 73.8 | 88.1 |
TSet 1 | FC7 | Learned metric | 38.1 | 63.1 | 72.6 | 87.5 |
TSet 2 | FC6 | Euclidean | 48.2 | 72.6 | 81.9 | 89.9 |
TSet 2 | FC6 | Learned metric | 52.1 | 75.9 | 83.0 | 90.8 |
TSet 2 | FC7 | Euclidean | 48.5 | 69.6 | 78.3 | 88.4 |
TSet 2 | FC7 | Learned metric | 49.7 | 72.3 | 80.1 | 88.9 |
TSet 3 | FC6 | Euclidean | 57.6 | 78.3 | 85.0 | 89.9 |
TSet 3 | FC6 | Learned metric | 62.7 | 81.8 | 87.4 | 91.1 |
TSet 3 | FC7 | Euclidean | 53.1 | 74.6 | 82.4 | 88.5 |
TSet 3 | FC7 | Learned metric | 56.8 | 78.8 | 84.5 | 89.6 |
TSet 3 | GT-FC6 | Euclidean | 69.6 | 86.8 | 94.6 | 97.0 |
TSet 3 | GT-FC6 | Learned metric | 73.8 | 90.6 | 95.2 | 98.2 |
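The table above reports matching rates at several ranks. As a rough illustration of how such rates are commonly computed, the sketch below ranks gallery features by Euclidean distance to each probe and counts a hit when the correct identity appears among the first k results. This is an assumption about the evaluation protocol, not the authors' code: the function `top_k_rates` and the toy 2-D features are hypothetical, and a learned metric would replace `math.dist` with its own distance function.

```python
from math import dist


def top_k_rates(probe_feats, probe_ids, gallery_feats, gallery_ids, ks=(1, 5, 10, 20)):
    """Percentage of probes whose identity appears in the k nearest gallery items."""
    hits = {k: 0 for k in ks}
    for feat, pid in zip(probe_feats, probe_ids):
        # Gallery indices sorted from nearest to farthest (Euclidean distance).
        ranked = sorted(range(len(gallery_feats)),
                        key=lambda j: dist(feat, gallery_feats[j]))
        ranked_ids = [gallery_ids[j] for j in ranked]
        for k in ks:
            if pid in ranked_ids[:k]:
                hits[k] += 1
    return {k: 100.0 * hits[k] / len(probe_feats) for k in ks}


# Toy data (hypothetical): three identities, one gallery image each.
gallery_feats = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
gallery_ids = [0, 1, 2]
probe_feats = [(0.1, 0.0), (0.4, 0.0), (5.0, 5.1)]
probe_ids = [0, 1, 2]

rates = top_k_rates(probe_feats, probe_ids, gallery_feats, gallery_ids, ks=(1, 2))
print(rates)
```

In the toy data, the second probe lies closer to the wrong gallery item, so it counts as a miss at rank 1 and a hit at rank 2.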
Experiments | Input | Distance Function | Top-1 | Top-5 | Top-10 | Top-20 |
---|---|---|---|---|---|---|
Experiment 1 | FC6 | Learned Metric | 62.7 | 81.8 | 87.4 | 91.1 |
Experiment 2 | Concatenation of the probe and gallery of RGB images | Softmax | 17.3 | 44.04 | 58.9 | 72.6 |
Experiment 3 | Concatenation of the probe and gallery of convolutional feature maps | Softmax | 4.76 | 17.9 | 20.4 | 36.3 |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Chen, J.-C.; Wu, C.-F.; Chen, C.-H.; Lin, C.-R. Person Search via Deep Integrated Networks. Appl. Sci. 2020, 10, 188. https://doi.org/10.3390/app10010188