Age Should Not Matter: Towards More Accurate Pedestrian Detection via Self-Training
Abstract
1. Introduction
- STPD was constructed by extending the pedestrian dataset WSPD using self-training.
- To rigorously evaluate detection performance for “adults” versus “children,” we constructed a new evaluation dataset.
- The person detector pre-trained on STPD reduced the miss rates for both “adults” and “children” compared to the detector pre-trained on WSPD. Furthermore, we observed that self-training mitigated the gap in detection rates between the two groups.
- We investigated three possible causes of the age-related gap in detection rates: (i) the appearance of “adults” versus “children”; (ii) the quantity of “children” data; and (iii) the scale of the input images.
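The self-training step named in the first bullet can be illustrated with a minimal sketch: a teacher detector trained on the labeled source data scores candidate boxes on unlabeled images, and only sufficiently confident detections are kept as pseudo ground truth for retraining. All names below (`Detection`, `pseudo_label`, the 0.8 threshold) are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of the pseudo-labelling step in self-training.
# A teacher detector scores boxes on unlabelled images; only detections
# above a confidence threshold become pseudo ground truth for the student.
from dataclasses import dataclass
from typing import List

@dataclass
class Detection:
    box: tuple    # (x1, y1, x2, y2) in pixels
    score: float  # teacher confidence in [0, 1]

def pseudo_label(detections: List[Detection], threshold: float = 0.8) -> List[Detection]:
    """Keep only confident teacher detections as pseudo ground truth."""
    return [d for d in detections if d.score >= threshold]

# Example: two confident person boxes, one noisy low-score box.
teacher_out = [
    Detection((10, 20, 50, 120), 0.95),
    Detection((60, 25, 95, 118), 0.83),
    Detection((0, 0, 30, 30), 0.41),  # likely a false positive; discarded
]
kept = pseudo_label(teacher_out)
print(len(kept))  # 2
```

The threshold trades pseudo-label quantity against label noise; a stricter threshold yields fewer but cleaner pseudo annotations.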
2. Related Work
2.1. Detector
2.2. Pedestrian Detection
3. Self-Training
3.1. Problem
3.2. Solution
3.3. Experimental Settings
3.4. Evaluation Metric
3.5. Results
4. Analysis and Discussion
4.1. The Relationship between the Bias in the Quantity of Data and the Miss Rate
4.2. The Relationship between the Size of a Person’s Bounding Box and the Miss Rate
4.3. Appearance Difference
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Brandao, M. Age and Gender Bias in Pedestrian Detection Algorithms. arXiv 2019, arXiv:1906.10490.
- Minoguchi, M.; Okayama, K.; Satoh, Y.; Kataoka, H. Weakly Supervised Dataset Collection for Robust Person Detection. arXiv 2020, arXiv:2003.12263.
- Dalal, N.; Triggs, B. Histograms of Oriented Gradients for Human Detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014.
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot Multibox Detector. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
- Zhao, Q.; Sheng, T.; Wang, Y.; Tang, Z.; Chen, Y.; Cai, L.; Ling, H. M2Det: A Single-Shot Object Detector Based on Multi-Level Feature Pyramid Network. Proc. AAAI Conf. Artif. Intell. 2019, 33, 9259–9266.
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017.
- Wilson, B.; Hoffman, J.; Morgenstern, J. Predictive Inequity in Object Detection. arXiv 2019, arXiv:1902.11097.
- Zhou, B.; Lapedriza, A.; Khosla, A.; Oliva, A.; Torralba, A. Places: A 10 Million Image Database for Scene Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 1452–1464.
- Zoph, B.; Ghiasi, G.; Lin, T.-Y.; Cui, Y.; Liu, H.; Cubuk, E.D.; Le, Q. Rethinking Pre-training and Self-training. Adv. Neural Inf. Process. Syst. 2020, 33, 3833–3845.
| Dataset | Images | Bounding Boxes | Classes |
|---|---|---|---|
| Pascal VOC | 11,530 | 27,450 | 20 |
| MS COCO | 123,287 | 896,782 | 80 |
| OpenImages v5 | 1,743,042 | 14,610,229 | 600 |
| CityPersons | 5,000 | 35,016 | 2 |
| EuroCity Persons | 47,300 | 238,200 | 17 |
| Caltech Pedestrian | 250,000 | 350,000 | 2 |
| WSPD | 2,822,421 | 8,716,461 | 2 |
| STPD (Ours) | 3,461,024 | 9,739,996 | 1 |
| FA-INRIA (Ours) | 902 | 2,993 | 2 |
Annotation Type | Images | % |
---|---|---|
(i) Adult | 2687 | 53.7 |
(ii) Children | 169 | 3.4 |
(iii) Noise | 536 | 10.7 |
(iv) Multiple | 1608 | 32.2 |
| Age | Images | Bounding Boxes |
|---|---|---|
| Adult | 870 | 2,672 |
| Children | 151 | 321 |
| All | 902 | 2,993 |
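The per-age evaluation above rests on the miss rate, i.e., the fraction of ground-truth pedestrians a detector fails to find, and on the adult–child gap between those rates. The sketch below is a simplified illustration under the assumption that miss rate is 1 − recall; the paper's exact matching protocol (IoU thresholds, score sweeps) may differ, and the numbers plugged in are illustrative, not results from the paper.

```python
# Hedged sketch: per-group miss rate and the adult-child detection gap.
# Miss rate here is simply 1 - recall: missed ground-truth boxes divided
# by all ground-truth boxes for that group.
def miss_rate(num_gt: int, num_matched: int) -> float:
    """Fraction of ground-truth pedestrians the detector failed to find."""
    return 1.0 - num_matched / num_gt

# Illustrative match counts only (not results from the paper); the
# ground-truth box totals follow the FA-INRIA table above.
adult_mr = miss_rate(num_gt=2672, num_matched=2300)
child_mr = miss_rate(num_gt=321, num_matched=255)
gap = child_mr - adult_mr
print(round(adult_mr, 3), round(child_mr, 3), round(gap, 3))  # 0.139 0.206 0.066
```

A positive gap means children are missed more often than adults; the paper's goal is to shrink this gap while lowering both rates.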
| Dataset | Batch Size, Epochs |  |  |  |
|---|---|---|---|---|
| WSPD | 64, 10 | 13.9 | 23.1 | 4.6 |
| WSPD | 128, 10 | 13.8 | 21.2 | 3.2 |
| WSPD | 256, 10 | 13.1 | 19.2 | 3.1 |
| STPD (ours) | 64, 10 | 13.8 | 19.2 | 2.7 |
| STPD (ours) | 128, 10 | 13.4 | 19.2 | 2.9 |
| STPD (ours) | 256, 10 | 13.1 | 17.3 | 2.1 |
| Batch Size, Epochs | w/o Aug. | w/ Aug. | w/o Aug. | w/ Aug. | w/o Aug. | w/ Aug. |
|---|---|---|---|---|---|---|
| 64, 10 | 13.8 | 12.6 | 19.2 | 19.2 | 2.7 | 3.3 |
| 128, 10 | 13.4 | 12.1 | 19.2 | 19.2 | 2.9 | 3.6 |
| 256, 10 | 13.1 | 10.7 | 17.3 | 15.4 | 2.1 | 2.4 |
Input size of the image (pixels × pixels): 150, 300, or 600, repeated for each metric group.

| Batch Size, Epochs | 150 | 300 | 600 | 150 | 300 | 600 | 150 | 300 | 600 |
|---|---|---|---|---|---|---|---|---|---|
| 64, 10 | 14.9 | 14.1 | 14.4 | 17.3 | 15.4 | 15.4 | 1.2 | 0.7 | 0.5 |
| 128, 10 | 15.2 | 14.6 | 13.9 | 21.2 | 17.3 | 15.4 | 3.0 | 1.4 | 0.8 |
| 256, 10 | 14.1 | 14.7 | 13.4 | 21.2 | 17.3 | 15.4 | 3.6 | 1.3 | 1.0 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Kogure, S.; Watabe, K.; Yamada, R.; Aoki, Y.; Nakamura, A.; Kataoka, H. Age Should Not Matter: Towards More Accurate Pedestrian Detection via Self-Training. Comput. Sci. Math. Forum 2022, 3, 11. https://doi.org/10.3390/cmsf2022003011