Multi-Scale Aggregation Stereo Matching Network Based on Dense Grouping Atrous Convolution
Abstract
1. Introduction
- (1) We propose a dense grouping atrous convolution pyramid pooling (DenseGASPP) module that reduces the holes (unsampled positions) left by atrous convolution, giving the network a large, dense receptive field that improves the accuracy of segmentation.
- (2) We introduce multi-scale cost aggregation in place of the 3D convolutions commonly used in conventional stereo matching networks, reducing mismatches in ill-posed regions.
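The "hole" problem named in contribution (1) can be made concrete with a 1-D toy model. The sketch below is not the authors' module: it assumes each listed rate is one 3-tap atrous layer applied in series and enumerates which input offsets can reach a single output. Because every 3-tap kernel already contains the zero offset, the extra shortcut paths of a densely connected stack reach no offsets beyond this set, so the toy model shows where holes can appear, not how strongly each offset is weighted. The rates (2,3), (4,5), (6,7) are the first configuration of the dilation ablation table; `reachable_offsets` and `holes` are hypothetical helper names.

```python
def reachable_offsets(dilations):
    """1-D input offsets that can reach one output position through a serial
    cascade of 3-tap atrous (dilated) convolutions.

    A layer with dilation d samples offsets {-d, 0, +d}; stacking layers adds
    one such choice per layer.
    """
    offsets = {0}
    for d in dilations:
        offsets = {o + t * d for o in offsets for t in (-1, 0, 1)}
    return offsets


def holes(dilations):
    """Offsets inside the receptive field that no path through the cascade reaches."""
    reach = reachable_offsets(dilations)
    half_width = sum(dilations)  # farthest offset reachable with 3-tap kernels
    return sorted(o for o in range(-half_width, half_width + 1) if o not in reach)


if __name__ == "__main__":
    # One rate repeated: only even offsets are ever sampled (gridding holes).
    print("rates 2,2,2:", holes([2, 2, 2]))
    # Grouped consecutive rates (2,3), (4,5), (6,7).
    print("rates 2..7:", holes([2, 3, 4, 5, 6, 7]))
```

Under these assumptions the repeated rate leaves every odd offset unsampled, while the grouped consecutive rates leave only the two outermost offsets (±26) unreachable.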
2. Related Work and Methods
2.1. Related Work
2.1.1. End-to-End Module for Binocular Stereo Matching
2.1.2. Atrous Convolution
2.1.3. Cost Aggregation
2.2. Methods
2.2.1. Network Architecture
2.2.2. DenseGASPP for Dense Feature Extraction
2.2.3. Multi-Scale Cost Aggregation for Reduced Mismatching
2.2.4. Disparity Regression
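Cost-volume networks such as GC-Net and PSM-Net commonly turn each pixel's cost curve into a disparity with the differentiable soft argmin, d̂ = Σ_d d · softmax(−c)_d, where c_d is the matching cost of candidate disparity d. The minimal per-pixel sketch below assumes this operator for the regression step; it illustrates the common technique in plain Python and is not taken from the paper.

```python
import math


def soft_argmin(costs):
    """Sub-pixel disparity from one pixel's matching costs.

    costs[d] is the cost of candidate disparity d (lower means a better match).
    Returns the sum over d of d * softmax(-costs)[d].
    """
    neg = [-c for c in costs]
    peak = max(neg)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in neg]
    total = sum(weights)
    return sum(d * w for d, w in enumerate(weights)) / total


if __name__ == "__main__":
    print(soft_argmin([4.0, 1.0, 1.0, 4.0]))  # two equal minima at 1 and 2
    print(soft_argmin([10.0, 0.0, 10.0]))     # sharp minimum at 1
```

Because the output is a weighted average, it yields sub-pixel estimates; the flip side is that a cost curve with two well-separated minima is averaged to a disparity between them.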
2.2.5. Loss Function
3. Experiments
3.1. Dataset
3.2. Evaluation Metrics
3.3. Implementation Details
3.4. Experimental Results
3.4.1. Performance Analysis of Stereo Matching
3.4.2. Accuracy Analysis of Disparity Map
3.5. Ablation Study
3.5.1. Ablation Study of DenseGASPP Module
3.5.2. Ablation Study of Multi-Scale Cost Aggregation
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Software/Hardware | Configuration |
---|---|
CPU | Intel Core i9-10920X |
GPU | RTX 6000 |
Operating system | Ubuntu 20.04 |
CUDA version | CUDA 10.0 |
Language | Python 3.7 |
Deep learning framework | PyTorch 1.3.0 |
Network Model | EPE (px) | >1 px (%) |
---|---|---|
PSM-Net | 1.09 | 12.1 |
GA-Net | 0.87 | 9.9 |
AANet | 0.86 | 9.3 |
DenseGASPP | 0.68 | 7.2 |
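EPE is the mean absolute difference, in pixels, between predicted and ground-truth disparity; the ">1 px" column is read here as the percentage of pixels whose error exceeds 1 px. The sketch below computes both on flattened maps. Treating ground truth ≤ 0 as invalid follows KITTI's convention for sparse ground truth and is an assumption, as is the strict "> threshold" comparison; `disparity_metrics` is a hypothetical name.

```python
def disparity_metrics(pred, gt, threshold=1.0):
    """End-point error (EPE) and bad-pixel rate for flattened disparity maps.

    pred, gt: equal-length sequences of disparities in pixels.
    Pixels whose ground truth is <= 0 are skipped as invalid (assumed
    KITTI-style sparse ground truth).
    Returns (mean absolute error in px, % of valid pixels with error > threshold).
    """
    errors = [abs(p - g) for p, g in zip(pred, gt) if g > 0]
    if not errors:
        raise ValueError("no valid ground-truth pixels")
    epe = sum(errors) / len(errors)
    bad = 100.0 * sum(e > threshold for e in errors) / len(errors)
    return epe, bad


if __name__ == "__main__":
    # The last pixel has no ground truth (0) and is ignored.
    print(disparity_metrics([1.0, 2.5, 3.0, 7.0], [1.0, 1.0, 3.5, 0.0]))
```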
Serial Number | Interval of Dilation Rate | Dilation Group 1 | Dilation Group 2 | Dilation Group 3 | Scene Flow EPE (px) | KITTI 2015 EPE (px) |
---|---|---|---|---|---|---|
1 | 1 | (2,3) | (4,5) | (6,7) | 0.86 | 0.69 |
2 | 2 | (2,3) | (5,6) | (8,9) | 0.93 | 0.77 |
3 | 2 | (3,4) | (6,7) | (9,10) | 0.92 | 0.75 |
4 | 3 | (2,3) | (6,7) | (10,11) | 0.98 | 0.71 |
5 | 4 | (2,3) | (7,8) | (12,13) | 1.02 | 0.94 |
6 | 5 | (2,3) | (8,9) | (14,15) | 1.14 | 0.96 |
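The dilation groups in this table follow a simple rule: each group holds two consecutive rates, and the next group starts "interval" above the last rate of the previous group (e.g., (2,3) then (5,6) for interval 2). The sketch below rebuilds each row from its first rate and interval and, assuming the six rates act as a serial stack of 3 × 3 atrous layers (the module's actual wiring is not reproduced), computes the per-axis receptive field RF = 1 + (k − 1)·Σd. The helper names are hypothetical.

```python
def dilation_groups(first_rate, interval, n_groups=3):
    """Rebuild the (a, a+1) dilation pairs listed in the ablation table.

    Each group holds two consecutive rates; the next group starts
    `interval` above the last rate of the previous group.
    """
    groups, a = [], first_rate
    for _ in range(n_groups):
        groups.append((a, a + 1))
        a = a + 1 + interval
    return groups


def receptive_field(rates, kernel=3):
    """Per-axis receptive field (px) of serially stacked atrous convolutions."""
    return 1 + sum((kernel - 1) * d for d in rates)


# (serial number, first rate, interval) for the six configurations in the table
CONFIGS = [(1, 2, 1), (2, 2, 2), (3, 3, 2), (4, 2, 3), (5, 2, 4), (6, 2, 5)]

if __name__ == "__main__":
    for no, first, interval in CONFIGS:
        groups = dilation_groups(first, interval)
        rates = [d for g in groups for d in g]
        print(no, groups, receptive_field(rates))
```

Under that assumption configuration 1 has the smallest receptive field (55 px) yet the lowest EPE on both datasets, so a wider receptive field alone does not explain the ranking.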
Dilation Rate | Scale of Cost Volume | Scene Flow EPE (px) | KITTI 2015 EPE (px) |
---|---|---|---|
(2,3) (4,5) (6,7) | 1/2, 1/4, 1/8 | 0.86 | 0.69 |
(2,3) (4,5) (6,7) | 1/3, 1/6, 1/12 | 0.67 | 0.44 |
(2,3) (4,5) (6,7) | 1/4, 1/8, 1/16 | 0.92 | 0.74 |
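To read the "Scale of Cost Volume" column, it helps to see how large each cost volume is. The sketch below assumes a 960 × 540 Scene Flow frame, a 192-pixel maximum disparity, and that a cost volume at scale 1/s stores max_disp/s disparity levels at 1/s of the input resolution, with sizes floored when they do not divide evenly; none of these settings are stated in the table, and `cost_volume_shape` is a hypothetical name.

```python
def cost_volume_shape(height, width, max_disp, scale):
    """(disparity levels, height, width) of a cost volume built at 1/scale
    resolution; sizes are floored when they do not divide evenly."""
    return max_disp // scale, height // scale, width // scale


# Assumed inputs: a 960x540 Scene Flow frame and a 192-pixel disparity range.
H, W, MAX_DISP = 540, 960, 192

if __name__ == "__main__":
    for scales in [(2, 4, 8), (3, 6, 12), (4, 8, 16)]:
        shapes = [cost_volume_shape(H, W, MAX_DISP, s) for s in scales]
        cells = sum(d * h * w for d, h, w in shapes)
        print(scales, shapes, f"{cells:,} cost entries")
```

Under these assumptions the 1/3, 1/6, 1/12 setting holds about 4.2 million cost entries in total, against about 14.2 million for 1/2, 1/4, 1/8 and about 1.8 million for 1/4, 1/8, 1/16.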
Network Model | Inference Time (s) | Scene Flow EPE (px) | KITTI 2015 EPE (px) |
---|---|---|---|
PSM-Net | 0.047 | 0.97 | 0.75 |
PSM-Net-D | 0.053 | 0.85 | 0.62 |
AANet | 0.095 | 0.87 | 0.68 |
AANet-D | 0.051 | 0.67 | 0.44 |
GASPP | 0.049 | 0.92 | 0.73 |
GASPP-D | 0.055 | 0.81 | 0.63 |
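One way to read this table is to express each "-D" variant relative to its base network. The sketch below only recomputes percentage changes from the numbers printed in the table; it makes no assumption about what the "-D" suffix changes internally, and `relative_change` is a hypothetical helper.

```python
# Rows copied from the table: (inference time in s, Scene Flow EPE, KITTI 2015 EPE)
RESULTS = {
    "PSM-Net": (0.047, 0.97, 0.75),
    "PSM-Net-D": (0.053, 0.85, 0.62),
    "AANet": (0.095, 0.87, 0.68),
    "AANet-D": (0.051, 0.67, 0.44),
    "GASPP": (0.049, 0.92, 0.73),
    "GASPP-D": (0.055, 0.81, 0.63),
}


def relative_change(before, after):
    """Percentage change from `before` to `after` (negative means a reduction)."""
    return 100.0 * (after - before) / before


if __name__ == "__main__":
    for base in ("PSM-Net", "AANet", "GASPP"):
        b, v = RESULTS[base], RESULTS[base + "-D"]
        time_pct, sf_pct, kitti_pct = (relative_change(x, y) for x, y in zip(b, v))
        print(f"{base}: time {time_pct:+.1f}%, Scene Flow EPE {sf_pct:+.1f}%, "
              f"KITTI EPE {kitti_pct:+.1f}%")
```

For the PSM-Net and GASPP pairs, the "-D" variants take roughly 12 to 13% more inference time for a 12 to 17% lower EPE; the AANet pair is both faster and more accurate in the table.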
Network Model | EPE (px) | >1 px (%) |
---|---|---|
PSM-Net | 1.09 | 10.3 |
PSM-Net-S | 0.97 | 10.2 |
GC-Net | 2.51 | 16.9 |
GC-Net-S | 0.98 | 10.8 |
GA-Net | 0.87 | 9.9 |
GA-Net-S | 0.88 | 9.3 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zou, Q.; Zhang, J.; Chen, S.; Gao, B.; Qin, J.; Dong, A. Multi-Scale Aggregation Stereo Matching Network Based on Dense Grouping Atrous Convolution. Appl. Sci. 2023, 13, 7033. https://doi.org/10.3390/app13127033
Zou Q, Zhang J, Chen S, Gao B, Qin J, Dong A. Multi-Scale Aggregation Stereo Matching Network Based on Dense Grouping Atrous Convolution. Applied Sciences. 2023; 13(12):7033. https://doi.org/10.3390/app13127033
Chicago/Turabian Style: Zou, Qijie, Jie Zhang, Shuang Chen, Bing Gao, Jing Qin, and Aotian Dong. 2023. "Multi-Scale Aggregation Stereo Matching Network Based on Dense Grouping Atrous Convolution" Applied Sciences 13, no. 12: 7033. https://doi.org/10.3390/app13127033
APA Style: Zou, Q., Zhang, J., Chen, S., Gao, B., Qin, J., & Dong, A. (2023). Multi-Scale Aggregation Stereo Matching Network Based on Dense Grouping Atrous Convolution. Applied Sciences, 13(12), 7033. https://doi.org/10.3390/app13127033