A Two-Branch Network for Weakly Supervised Object Localization
Abstract
1. Introduction
- We present a two-branch network for weakly supervised object localization (WSOL), with an embedded self-attention mechanism that strengthens feature representation by relating different parts of an object to one another.
- We apply multi-scale detection, producing features at two scales, to improve localization performance.
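The self-attention mechanism named in the first contribution is not spelled out in this section. The sketch below assumes a formulation in the style of the self-attention GAN of Zhang et al. [21], written in NumPy for a single C×H×W feature map. The projection matrices `w_f`, `w_g`, `w_h` and the residual weight `gamma` are illustrative names, and the authors' branch may differ in detail.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(features, w_f, w_g, w_h, gamma):
    """SAGAN-style self-attention (illustrative sketch, not the authors' code).

    features: (C, H, W) feature map.
    w_f, w_g: (C_k, C) matrices acting as 1x1 convolutions (query/key).
    w_h:      (C, C) matrix acting as a 1x1 convolution (value).
    gamma:    scalar weight of the attention output in the residual sum.
    Returns the (C, H, W) output and the (N, N) attention map, N = H * W.
    """
    c, h, w = features.shape
    x = features.reshape(c, h * w)      # one column per spatial position
    f = w_f @ x                         # (C_k, N)
    g = w_g @ x                         # (C_k, N)
    v = w_h @ x                         # (C, N)
    scores = f.T @ g                    # scores[i, j] = f_i . g_j
    beta = softmax(scores, axis=0)      # column j: weights over input positions i
    o = v @ beta                        # o[:, j] = sum_i beta[i, j] * v[:, i]
    y = gamma * o + x                   # residual connection
    return y.reshape(c, h, w), beta
```

Each output position is a weighted sum over all spatial positions, which is how distant parts of an object can reinforce each other. With `gamma = 0` the block reduces to the identity, which is how SAGAN initializes it.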
2. Related Works
2.1. The CNN-Based Model for WSOL
2.2. Attention Mechanism in CNN-Based WSOL Methods
3. The Proposed Method
3.1. Framework Overview
Algorithm 1. Training process for our two-branch network
Input: training image, threshold t
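Algorithm 1 takes a threshold t as input, but its role is not described in this section. A common use of such a threshold in WSOL, for example in CAM [7], is to binarize a localization map and box the surviving region. The sketch below assumes that role. `box_from_map` is a hypothetical helper; it min-max normalizes the map first and, for brevity, boxes every pixel above t rather than only the largest connected component as CAM does.

```python
import numpy as np


def box_from_map(loc_map, t):
    """Hypothetical helper: box around pixels whose normalized score is >= t.

    loc_map: 2-D array (H, W) of localization scores.
    t:       threshold in [0, 1], applied after min-max normalization.
    Returns (x_min, y_min, x_max, y_max) as inclusive pixel indices,
    or None if no pixel passes the threshold.
    """
    m = loc_map.astype(float)
    rng = m.max() - m.min()
    # Min-max normalize; a constant map becomes all zeros.
    norm = (m - m.min()) / rng if rng > 0 else np.zeros_like(m)
    ys, xs = np.nonzero(norm >= t)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```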
3.2. The Detection Branch
3.3. The Self-Attention Branch
3.4. Objective Function
4. Experiments and Discussion
4.1. Experiment Setup
4.2. Experimental Results and Discussion
4.2.1. Performance on the CUB-200-2011 Dataset
4.2.2. Performance on the VOC2007 Dataset
4.2.3. Ablation Study
4.2.4. Result Analysis
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
1. Dietterich, T.G.; Lathrop, R.H.; Lozano-Pérez, T. Solving the multiple instance problem with axis-parallel rectangles. Artif. Intell. 1997, 89, 31–71.
2. Li, D.; Huang, J.B.; Li, Y.; Wang, S.; Yang, M.H. Weakly supervised object localization with progressive domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 3512–3520.
3. Wang, C.; Ren, W.; Huang, K.; Tan, T. Weakly supervised object localization with latent category learning. In Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland, 6–12 September 2014; pp. 431–445.
4. Cinbis, R.G.; Verbeek, J.; Schmid, C. Multi-fold MIL training for weakly supervised object localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 2409–2416.
5. Wan, F.; Liu, C.; Ke, W.; Ji, X.; Jiao, J.; Ye, Q. C-MIL: Continuation multiple instance learning for weakly supervised object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 2199–2208.
6. Wei, Y.; Shen, Z.; Cheng, B.; Shi, H.; Xiong, J.; Feng, J.; Huang, T. TS2C: Tight box mining with surrounding segmentation context for weakly supervised object detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 434–450.
7. Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2921–2929.
8. Jie, Z.; Wei, Y.; Jin, X.; Feng, J.; Liu, W. Deep self-taught learning for weakly supervised object localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1377–1385.
9. Zhang, Y.; Bai, Y.; Ding, M.; Li, Y.; Ghanem, B. Weakly-supervised object detection via mining pseudo ground truth bounding-boxes. Pattern Recognit. 2018, 84, 68–81.
10. Shen, Y.; Ji, R.; Wang, C.; Li, X.; Li, X. Weakly supervised object detection via object-specific pixel gradient. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 5960–5970.
11. Oquab, M.; Bottou, L.; Laptev, I.; Sivic, J. Is object localization for free? Weakly-supervised learning with convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 685–694.
12. Diba, A.; Sharma, V.; Pazandeh, A.; Pirsiavash, H.; Van Gool, L. Weakly supervised cascaded convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 914–922.
13. Zhang, X.; Wei, Y.; Feng, J.; Yang, Y.; Huang, T.S. Adversarial complementary learning for weakly supervised object localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 1325–1334.
14. Zhang, X.; Wei, Y.; Kang, G.; Yang, Y.; Huang, T. Self-produced guidance for weakly-supervised object localization. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 597–613.
15. Choe, J.; Shim, H. Attention-based dropout layer for weakly supervised object localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 2219–2228.
16. Wang, F.; Jiang, M.; Qian, C.; Yang, S.; Li, C.; Zhang, H.; Wang, X.; Tang, X. Residual attention network for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 3156–3164.
17. Zhu, Y.; Zhao, C.; Guo, H.; Wang, J.; Zhao, X.; Lu, H. Attention CoupleNet: Fully convolutional attention coupling network for object detection. IEEE Trans. Image Process. 2018, 28, 113–126.
18. Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual attention network for scene segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 3146–3154.
19. Liu, M.; Li, L.; Hu, H.; Guan, W.; Tian, J. Image caption generation with dual attention mechanism. Inf. Process. Manag. 2020, 57, 102178.
20. Zhou, Y.; Chen, Z.; Shen, H.; Liu, Q.; Zhao, R.; Liang, Y. Dual-attention focused module for weakly supervised object localization. arXiv 2019, arXiv:1909.04813.
21. Zhang, H.; Goodfellow, I.; Metaxas, D.; Odena, A. Self-attention generative adversarial networks. arXiv 2018, arXiv:1805.08318.
22. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 October 2016; pp. 21–37.
23. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
24. Cai, Z.; Fan, Q.; Feris, R.S.; Vasconcelos, N. A unified multi-scale deep convolutional neural network for fast object detection. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 October 2016; pp. 354–370.
25. Welinder, P.; Branson, S.; Mita, T.; Wah, C.; Schroff, F.; Belongie, S.; Perona, P. Caltech-UCSD Birds 200; Technical Report CNS-TR-2010-001; California Institute of Technology: Pasadena, CA, USA, 2010.
26. Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
27. Kim, D.; Cho, D.; Yoo, D.; Kweon, I.S. Two-phase learning for weakly supervised object localization. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 3534–3543.
28. Kantorov, V.; Oquab, M.; Cho, M.; Laptev, I. ContextLocNet: Context-aware deep network models for weakly supervised localization. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 October 2016; pp. 350–365.
29. Bilen, H.; Vedaldi, A. Weakly supervised deep detection networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26–30 June 2016; pp. 2846–2854.
30. Durand, T.; Mordan, T.; Thome, N.; Cord, M. WILDCAT: Weakly supervised learning of deep ConvNets for image classification, pointwise localization and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 642–651.
31. Zhu, Y.; Zhou, Y.; Ye, Q.; Qiu, Q.; Jiao, J. Soft proposal networks for weakly supervised object localization. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 1841–1850.
32. Tao, X.; Gong, Y.; Shi, W.; Cheng, D. Object detection with class aware region proposal network and focused attention objective. Pattern Recognit. Lett. 2018, 130, 353–361.
33. Li, H.; Liu, Y.; Ouyang, W.; Wang, X. Zoom out-and-in network with map attention decision for region proposal and object detection. Int. J. Comput. Vis. 2019, 127, 225–238.
34. Jiang, W.; Zhao, Z.; Su, F. Weakly supervised detection with decoupled attention-based deep representation. Multimed. Tools Appl. 2018, 77, 3261–3277.
35. Teh, E.W.; Rochan, M.; Wang, Y. Attention networks for weakly supervised object localization. In Proceedings of the British Machine Vision Conference (BMVC), York, UK, 19–22 September 2016; pp. 1–11.
36. Teh, E.W.; Guo, Z.; Wang, Y. Object localization in weakly labeled data using regularized attention networks. In Proceedings of the 2017 IEEE Visual Communications and Image Processing (VCIP), St. Petersburg, FL, USA, 10–13 December 2017; pp. 1–4.
37. Li, K.; Wu, Z.; Peng, K.C.; Ernst, J.; Fu, Y. Tell me where to look: Guided attention inference network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 9215–9223.
38. Hu, T.; Xu, J.; Huang, C.; Qi, H.; Huang, Q.; Lu, Y. Weakly supervised bilinear attention network for fine-grained visual classification. arXiv 2018, arXiv:1808.02152.
39. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
40. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
41. Deselaers, T.; Alexe, B.; Ferrari, V. Weakly supervised localization and learning with generic knowledge. Int. J. Comput. Vis. 2012, 100, 275–293.
42. Zhang, X. SPG. 2018. Available online: https://github.com/xiaomengyc/SPG (accessed on 16 December 2019).
43. Zhang, X. ACoL. 2018. Available online: https://github.com/xiaomengyc/ACoL (accessed on 3 December 2019).
44. Zhang, J.; Bargal, S.A.; Lin, Z.; Brandt, J.; Shen, X.; Sclaroff, S. Top-down neural attention by excitation backprop. Int. J. Comput. Vis. 2018, 126, 1084–1102.
45. Durand, T. wildcat.pytorch. 2017. Available online: https://github.com/durandtibo/wildcat.pytorch (accessed on 20 December 2019).
46. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9.
Methods | Top-1 Err. | Top-5 Err. |
---|---|---|
ADL [15] | 25.45 | - |
SPG# [14] | 25.18 | 7.65 |
ACoL# [13] | 24.14 | 7.13 |
ours | 22.13 | 5.63 |
Methods | Top-1 Err. | Top-5 Err. |
---|---|---|
ACoL# [13] | 29.51 | 14.62 |
SPG# [14] | 27.03 | 10.06 |
ours | 24.28 | 8.28 |
Methods | Top-1 Err. | Top-5 Err. |
---|---|---|
CAM [7] | 59.00 | - |
ACoL [13] | 54.08 | 43.49 |
SPG [14] | 53.36 | 42.28 |
ADL [15] | 46.96 | - |
ours | 51.22 | 41.55 |
Methods | Accuracy |
---|---|
ACoL# [13] | 61.27 |
Wildcat# [30] | 62.18 |
SPG# [14] | 77.43 |
MWP [44] | 79.30 |
CAM [7] | 80.80 |
c-MWP [44] | 85.10 |
ours | 82.20 |
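MWP and c-MWP are the excitation-backprop methods of Zhang et al. [44], which are evaluated with the pointing game: a hit is counted when the maximum point of a localization map falls on the target object within a small pixel tolerance, and accuracy is hits / (hits + misses). Assuming the Accuracy column above follows that protocol with bounding-box annotations, a minimal sketch of the per-image test follows. The function names are hypothetical, and the `tol=15.0` default is an assumption based on [44].

```python
import numpy as np


def pointing_hit(loc_map, gt_boxes, tol=15.0):
    """True if the arg-max of loc_map lies within `tol` pixels of any GT box.

    gt_boxes: iterable of (x_min, y_min, x_max, y_max), inclusive pixel coords.
    """
    y, x = np.unravel_index(np.argmax(loc_map), loc_map.shape)
    for x1, y1, x2, y2 in gt_boxes:
        dx = max(x1 - x, 0, x - x2)   # horizontal distance to the box (0 if inside)
        dy = max(y1 - y, 0, y - y2)   # vertical distance to the box (0 if inside)
        if np.hypot(dx, dy) <= tol:
            return True
    return False


def pointing_accuracy(hits):
    """Accuracy = #hits / (#hits + #misses), given one boolean per test case."""
    return sum(hits) / len(hits)
```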
Methods | Accuracy |
---|---|
Wildcat# [30] | 21.99 |
ACoL# [13] | 22.92 |
SPG# [14] | 23.78 |
ours | 30.65 |
Metric | Feature map scale 1 | Feature map scale 2 |
---|---|---|
Pointing localization top-1 err. | 26.57 | 24.28 |
Pointing localization top-5 err. | 11.11 | 8.28 |
IoU localization top-1 err. | 51.22 | 61.00 |
IoU localization top-5 err. | 41.55 | 53.12 |
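The IoU localization errors presumably follow the usual WSOL protocol of CAM [7], ACoL [13] and SPG [14]: an image counts as correctly localized at top-k when the ground-truth class is among the k highest-scoring classes and the box predicted for that class has IoU of at least 0.5 with the ground-truth box. A plain-Python sketch under that assumption, with hypothetical function names and boxes given as inclusive pixel coordinates:

```python
def iou(a, b):
    """IoU of two boxes (x_min, y_min, x_max, y_max), inclusive pixel coords."""
    iw = min(a[2], b[2]) - max(a[0], b[0]) + 1
    ih = min(a[3], b[3]) - max(a[1], b[1]) + 1
    if iw <= 0 or ih <= 0:
        return 0.0                      # no overlap
    inter = iw * ih
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


def loc_error(samples, k, thr=0.5):
    """Top-k IoU localization error in percent.

    samples: list of (ranked_classes, boxes_by_class, gt_class, gt_box), where
      ranked_classes lists predicted labels from most to least confident and
      boxes_by_class maps a label to the box predicted for that label.
    """
    wrong = 0
    for ranked, boxes, gt_cls, gt_box in samples:
        # Short-circuit: the box is only looked up if the class is in the top-k.
        ok = gt_cls in ranked[:k] and iou(boxes[gt_cls], gt_box) >= thr
        wrong += not ok
    return 100.0 * wrong / len(samples)
```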
Metric | Detection branch features | Self-attention branch features | Fusion features |
---|---|---|---|
Pointing localization top-1 err. | 24.43 | 24.36 | 24.28 |
Pointing localization top-5 err. | 8.42 | 8.42 | 8.28 |
IoU localization top-1 err. | 51.73 | 51.68 | 51.22 |
IoU localization top-5 err. | 42.26 | 42.17 | 41.55 |
Metric | VGG backbone | GoogLeNet backbone |
---|---|---|
Classification top-1 err. | 22.13 | 25.51 |
Classification top-5 err. | 5.63 | 8.42 |
Pointing localization top-1 err. | 24.28 | 27.14 |
Pointing localization top-5 err. | 8.28 | 10.44 |
IoU localization top-1 err. | 51.22 | 60.00 |
IoU localization top-5 err. | 41.55 | 50.98 |
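The classification rows report the standard top-1 and top-5 error: the percentage of images whose ground-truth label is not among the k highest-scoring classes. A NumPy sketch with a hypothetical `topk_error` helper:

```python
import numpy as np


def topk_error(scores, labels, k):
    """Top-k classification error (%) from a (num_images, num_classes) score matrix."""
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    # Indices of the k highest-scoring classes per image (order within top-k is irrelevant).
    topk = np.argsort(-scores, axis=1)[:, :k]
    correct = (topk == labels[:, None]).any(axis=1)
    return 100.0 * (1.0 - correct.mean())
```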
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Sun, C.; Ai, Y.; Wang, S.; Zhang, W. A Two-Branch Network for Weakly Supervised Object Localization. Electronics 2020, 9, 955. https://doi.org/10.3390/electronics9060955