Multitask Coupling Network for Occlusion Relation Reasoning
Abstract
1. Introduction
- (1) An MTCN model comprising two paths, an occlusion edge extraction path and an occlusion orientation prediction path, is proposed;
- (2) A low-level feature context integration module (LFCI) is proposed that uses a self-attention mechanism to weight pixels at different positions by importance and capture the local detail features of object contours;
- (3) To enlarge the receptive field of the backbone network's side output features, we propose a multipath receptive field block (MRFB);
- (4) To fuse the feature flows extracted at different scales in the network, we propose a bilateral complementary fusion module (BCFM);
- (5) Finally, to address the blurred contour detection caused by the severe imbalance between edge and non-edge pixels in the data samples, we propose an adaptive multitask loss function.
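The loss in contribution (5) targets the scarcity of edge pixels relative to non-edge pixels. As a minimal sketch of the general idea (a HED-style class-balanced binary cross-entropy, not the paper's exact adaptive formulation), each class's term can be reweighted by the opposite class's frequency, and the edge and orientation losses combined with scalar weights:

```python
import numpy as np

def class_balanced_bce(pred, target, eps=1e-7):
    """Class-balanced binary cross-entropy (HED-style sketch).

    Edge pixels are vastly outnumbered by non-edge pixels, so each
    class's term is reweighted by the opposite class's frequency.
    pred:   predicted edge probabilities in (0, 1)
    target: binary ground-truth edge map
    """
    pred = np.clip(pred, eps, 1.0 - eps)
    n_pos = target.sum()
    beta = (target.size - n_pos) / target.size  # weight for the rare edge class
    loss = -(beta * target * np.log(pred)
             + (1.0 - beta) * (1.0 - target) * np.log(1.0 - pred))
    return loss.mean()

# A multitask total could then weight the two paths' losses, e.g.:
# total = w_edge * class_balanced_bce(edge_pred, edge_gt) + w_orient * orient_loss
```

The weights `w_edge` and `w_orient` (and the balancing factor itself) are the kind of terms an adaptive scheme would tune during training rather than fix by hand.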
2. Related Work
3. MTCN Model
3.1. Network Architecture
3.1.1. Edge Extraction Path
3.1.2. Orientation Prediction Path
3.1.3. Submodule Structure
- (1)
- Low-level Feature Context Integration Module
- (2)
- BCFM
- (3)
- Multipath Receptive Field Block
- (4)
- Multitask Loss Function
4. Results and Discussion
4.1. Datasets and Implementation Details
- (1)
- Implementation Details
- (2)
- Evaluation Metrics
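The B- (boundary) and O- (occlusion orientation) scores reported below are ODS and OIS F-measures. As a simplified sketch of the distinction, assuming per-image precision/recall curves have already been computed against ground truth (real benchmarks aggregate matched true/false positive counts rather than averaging F directly):

```python
import numpy as np

def f_measure(p, r, eps=1e-12):
    # Harmonic mean of precision and recall.
    return 2.0 * p * r / (p + r + eps)

def ods_ois(precision, recall):
    """precision, recall: arrays of shape (n_images, n_thresholds).

    ODS: a single threshold fixed for the whole dataset.
    OIS: the best threshold chosen independently per image.
    """
    f = f_measure(precision, recall)
    ods = f.mean(axis=0).max()   # best dataset-wide threshold
    ois = f.max(axis=1).mean()   # best per-image threshold
    return ods, ois
```

Under this formulation OIS can never fall below ODS, which matches the pattern visible in the result tables.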
4.2. Ablation Experiments
4.3. Comparison Experiments
- (1)
- Performance Comparison on BSDS Dataset
- (2)
- Performance Comparison on PIOD Dataset
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
| LFCI | MRFB | BCFM | B-ODS/O-ODS |
|---|---|---|---|
| √ | √ |  | 62.1/59.6 |
| √ |  | √ | 56.2/57.6 |
|  | √ | √ | 60.5/58.8 |
| √ | √ | √ | 68.1/63.2 |
PIOD dataset:

| Method | B-ODS | B-OIS | B-AP | O-ODS | O-OIS | O-AP | FPS |
|---|---|---|---|---|---|---|---|
| SRF-OCC | 34.5 | 36.9 | 20.7 | 26.8 | 28.6 | 15.2 | 0.018 |
| DOC-HED | 50.9 | 53.2 | 46.8 | 46.0 | 47.9 | 40.5 | 19.6 |
| DOC-DMLFOV | 66.9 | 68.4 | 67.7 | 60.1 | 61.1 | 58.5 | 21.1 |
| DOOBNet | 73.6 | 74.6 | 72.3 | 70.2 | 71.2 | 68.3 | 25.8 |
| OFNet | 75.1 | 76.2 | 77.0 | 71.8 | 72.8 | 72.9 | 27.2 |
| MT-ORL † | 78.6 | 79.6 | 79.5 |  |  |  |  |
| MT-ORL ‡ | 76.1 | 77.0 | 76.1 |  |  |  | 32.3 |
| MT-ORL | 79.5 | 80.4 | 83.1 | 77.1 | 77.8 | 79.4 | 30.6 |
| Ours † | 79.2 | 79.8 | 80.1 |  |  |  |  |
| Ours ‡ | 76.9 | 77.6 | 77.3 |  |  |  | 35.4 |
| Ours | 80.6 | 81.3 | 85.2 | 78.5 | 78.3 | 81.6 | 33.2 |
PIOD dataset:

| Method | B-ODS | B-OIS | B-AP | O-ODS | O-OIS | O-AP | FPS |
|---|---|---|---|---|---|---|---|
| SRF-OCC | 34.5 | 36.9 | 20.7 | 26.8 | 28.6 | 15.2 | 0.018 |
| DOC-HED | 50.9 | 53.2 | 46.8 | 46.0 | 47.9 | 40.5 | 18.3 |
| DOC-DMLFOV | 66.9 | 68.4 | 67.7 | 60.1 | 61.1 | 58.5 | 18.9 |
| DOOBNet | 73.6 | 74.6 | 72.3 | 70.2 | 71.2 | 68.3 | 26.7 |
| OFNet | 75.1 | 76.2 | 77.0 | 71.8 | 72.8 | 72.9 | 28.3 |
| MT-ORL † | 78.6 | 79.6 | 79.5 |  |  |  |  |
| MT-ORL ‡ | 76.1 | 77.0 | 76.1 |  |  |  | 33.2 |
| MT-ORL | 79.5 | 80.4 | 83.1 | 77.1 | 77.8 | 79.4 | 31.6 |
| Ours † | 79.2 | 79.8 | 80.1 |  |  |  |  |
| Ours ‡ | 76.9 | 77.6 | 77.3 |  |  |  | 36.8 |
| Ours | 80.6 | 81.3 | 85.2 | 78.5 | 78.3 | 81.6 | 34.6 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Bao, S.; Huang, Y.; Xu, J.; Xu, G. Multitask Coupling Network for Occlusion Relation Reasoning. Electronics 2023, 12, 3303. https://doi.org/10.3390/electronics12153303