An Image Stereo Matching Algorithm with Multi-Spectral Attention Mechanism
Abstract
1. Introduction
- (1) A multi-spectral attention mechanism is introduced to weight the feature maps of different frequency components, helping the residual network capture fine details of the input images and improving recognition accuracy in complex scenes.
- (2) A coordinated attention mechanism is introduced in the pyramid pooling module to extract spatial features at different scales. By adaptively learning the positional relationships of the features at each scale, it enhances the feature-extraction capability.

The features extracted from the left and right images are concatenated using a shift-and-stitch method to construct a four-dimensional cost volume, which is then optimized by stacked hourglass networks and used for disparity calculation. The network architecture is illustrated in Figure 1.
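The shift-and-stitch construction of the cost volume can be illustrated with a minimal NumPy sketch. It assumes the concatenation scheme commonly used in pyramid stereo networks: for each candidate disparity d, the left feature at column x is paired with the right feature at column x - d. The function name `build_concat_cost_volume`, the `(C, H, W)` layout, and `max_disp` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def build_concat_cost_volume(left_feat, right_feat, max_disp):
    """Concatenation-based cost volume (illustrative sketch).

    left_feat, right_feat: feature maps of shape (C, H, W).
    Returns an array of shape (2C, max_disp, H, W).
    """
    c, h, w = left_feat.shape
    volume = np.zeros((2 * c, max_disp, h, w), dtype=left_feat.dtype)
    for d in range(max_disp):
        # Columns x < d have no counterpart in the right image and stay zero.
        volume[:c, d, :, d:] = left_feat[:, :, d:]
        # Right features are shifted by d so that column x holds right[x - d].
        volume[c:, d, :, d:] = right_feat[:, :, : w - d]
    return volume


# Toy example: the right view is the left view displaced by 2 pixels.
rng = np.random.default_rng(0)
left = rng.standard_normal((4, 3, 8)).astype(np.float32)
right = np.roll(left, -2, axis=2)  # right[x - 2] == left[x] for x >= 2
vol = build_concat_cost_volume(left, right, max_disp=4)
print(vol.shape)  # (8, 4, 3, 8)
```

In this toy case the left and right halves of the volume coincide at the disparity slice d = 2, which is the kind of agreement a subsequent aggregation network would learn to detect.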
2. Related Work
3. Proposed Method
3.1. Multi-Attention Feature-Extraction Network
3.1.1. Residual Network with Embedded Multi-Spectral Attention Mechanism
3.1.2. Spatial Pyramid Pooling Module with Embedded Coordinated Attention Mechanism
3.2. Cost Structure Construction
3.3. Cost Aggregation Network
3.4. Visual Disparity Calculation
4. Experimental Results and Analysis
4.1. Dataset and Experimental Parameters
4.1.1. Dataset
4.1.2. Experimental Environment and Parameter Settings
4.2. Algorithm Evaluation Metrics
4.3. Experimental Analysis
4.3.1. Ablation Experiment
4.3.2. Comparative Experimental Analysis on Three Major Public Datasets
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Mur-Artal, R.; Tardos, J.D. ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras. IEEE Trans. Robot. 2017, 33, 1255–1262.
- Goldberg, S.B.; Maimone, M.W.; Matthies, L. Stereo vision and rover navigation software for planetary exploration. In Proceedings of the IEEE Aerospace Conference, Big Sky, MT, USA, 9–16 March 2002; IEEE: Piscataway, NJ, USA, 2002.
- Li, H.; Xu, C.; Xiao, Q.; Xu, X. Visual navigation of an autonomous robot using white line recognition. In Proceedings of the IEEE International Conference on Robotics and Automation, Taipei, Taiwan, 14–19 September 2003; IEEE: Piscataway, NJ, USA, 2003.
- Scharstein, D.; Szeliski, R. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Comput. Vis. 2002, 47, 7–42.
- Ning, I. Research on Target Pose Measurement Technology Based on Monocular Vision. Ph.D. Thesis, Beijing Institute of Technology, Beijing, China, 2016.
- Knöbelreiter, P.; Pock, T. Learned collaborative stereo refinement. Int. J. Comput. Vis. 2021, 129, 2565–2582.
- Shahbazi, M.; Sohn, G.; Théau, J. High-density stereo image matching using intrinsic curves. ISPRS J. Photogramm. Remote Sens. 2018, 146, 373–388.
- Zhou, J.; Yu, C.; Chao, W. Binocular stereo matching algorithm based on labeled matching region correction. Pattern Recognit. Artif. Intell. 2020, 33, 11.
- Zbontar, J.; LeCun, Y. Stereo matching by training a convolutional neural network to compare image patches. J. Mach. Learn. Res. 2016, 17, 2287–2318.
- Seki, A.; Pollefeys, M. SGM-Nets: Semi-global matching with neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017.
- Mayer, N.; Ilg, E.; Hausser, P.; Fischer, P.; Cremers, D.; Dosovitskiy, A.; Brox, T. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
- Xu, H.; Zhang, J. AANet: Adaptive aggregation network for efficient stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020.
- Tankovich, V.; Hane, C.; Zhang, Y.; Kowdle, A.; Fanello, S.; Bouaziz, S. HITNet: Hierarchical iterative tile refinement network for real-time stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021.
- Tang, H. Research on Optimization of End-to-End Binocular Stereo Matching Algorithm Based on Convolutional Neural Network. Ph.D. Thesis, Inner Mongolia University, Hohhot, China, 2022.
- Qin, Z.; Zhang, P.; Wu, F.; Li, X. FcaNet: Frequency channel attention networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021.
- Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021.
- Chang, J.; Chen, Y. Pyramid stereo matching network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
- Kendall, A.; Martirosyan, H.; Dasgupta, S.; Henry, P.; Kennedy, R.; Bachrach, A.; Bry, A. End-to-end learning of geometry and context for deep stereo regression. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017.
- Yang, G.; Zhao, H.; Shi, J.; Deng, Z.; Jia, J. SegStereo: Exploiting semantic information for disparity estimation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
- Pang, J.; Sun, W.; Ren, J.S.; Yang, C.; Yan, Q. Cascade residual learning: A two-stage convolutional neural network for stereo matching. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy, 22–29 October 2017.
- Badki, A.; Troccoli, A.; Kim, K.; Kautz, J.; Sen, P.; Gallo, O. Bi3D: Stereo depth estimation via binary classifications. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020.
- Xu, G.; Cheng, J.; Guo, P.; Yang, X. ACVNet: Attention concatenation volume for accurate and efficient stereo matching. arXiv 2022, arXiv:2203.02146.

Network Setting | KITTI2015 Val Err (%) | Scene Flow End Point Err (px) |
---|---|---|
ResNet + SPP | 1.98 | 1.09 |
ResNet + SPP_CA | 1.92 | 1.083 |
ResNet_FCA + SPP_CA | 1.86 | 1.072 |

Algorithm | MANet | AANet [12] | GCNet [18] | DispNetC [11] | PSMNet [17] |
---|---|---|---|---|---|
EPE (px) | 1.072 | 1.43 | 2.51 | 1.68 | 1.09 |

Algorithm | All D1-bg (%) | All D1-fg (%) | All D1-all (%) | Noc D1-bg (%) | Noc D1-fg (%) | Noc D1-all (%) |
---|---|---|---|---|---|---|
DispNetC [11] | 4.32 | 4.41 | 4.34 | 4.11 | 3.72 | 4.05 |
GCNet [18] | 2.21 | 6.16 | 2.87 | 2.02 | 5.58 | 2.61 |
CRL [20] | 2.48 | 3.59 | 2.67 | 2.32 | 3.12 | 2.45 |
Bi3D [21] | 1.95 | 3.48 | 2.21 | 1.79 | 3.11 | 2.01 |
DFFNet [14] | 1.71 | 4.25 | 2.23 | - | - | - |
AANet [12] | 1.99 | 5.39 | 2.55 | 1.80 | 4.93 | 2.32 |
PSMNet [17] | 1.86 | 4.62 | 2.32 | 1.71 | 4.31 | 2.14 |
MANet | 1.81 | 4.26 | 2.22 | 1.67 | 3.88 | 2.03 |

Algorithm | >2 px NOC (%) | >2 px ALL (%) | >3 px NOC (%) | >3 px ALL (%) | >5 px NOC (%) | >5 px ALL (%) | Mean Error NOC (px) | Mean Error ALL (px) |
---|---|---|---|---|---|---|---|---|
SGMNet [10] | 3.60 | 5.15 | 2.29 | 3.50 | 1.60 | 2.36 | 0.7 | 0.9 |
DispNetC [11] | 7.38 | 8.11 | 4.11 | 4.65 | 2.05 | 2.39 | 0.9 | 1.0 |
GCNet [18] | 2.71 | 3.46 | 1.77 | 2.30 | 1.12 | 1.46 | 0.6 | 0.7 |
SegStereo [19] | 2.66 | 3.19 | 1.68 | 2.03 | 1.00 | 1.21 | 0.5 | 0.6 |
AANet [12] | 2.30 | 2.96 | 1.55 | 2.04 | 0.98 | 1.30 | 0.4 | 0.5 |
PSMNet [17] | 2.44 | 3.01 | 1.49 | 1.89 | 0.90 | 1.15 | 0.5 | 0.6 |
MANet | 2.26 | 2.87 | 1.39 | 1.82 | 0.83 | 1.09 | 0.5 | 0.5 |

Algorithm | >2 px NOC (%) | >2 px ALL (%) | >3 px NOC (%) | >3 px ALL (%) | >5 px NOC (%) | >5 px ALL (%) | Mean Error NOC (px) | Mean Error ALL (px) |
---|---|---|---|---|---|---|---|---|
SGMNet [10] | 22.09 | 25.70 | 15.31 | 18.97 | 10.39 | 13.55 | 3.0 | 3.8 |
DispNetC [11] | 24.13 | 26.54 | 16.04 | 18.15 | 8.39 | 9.88 | 2.1 | 2.3 |
GCNet [18] | 16.58 | 19.07 | 10.80 | 12.80 | 6.59 | 7.99 | 1.8 | 2.0 |
ACVNet [22] | 11.42 | 13.53 | 7.03 | 8.67 | 4.14 | 5.20 | 1.4 | 1.5 |
AANet [12] | 15.89 | 17.87 | 10.51 | 11.97 | 6.25 | 7.02 | 1.7 | 1.8 |
PSMNet [17] | 13.77 | 16.06 | 8.36 | 10.18 | 4.58 | 5.64 | 1.4 | 1.6 |
MANet | 11.93 | 14.32 | 6.86 | 8.70 | 3.61 | 4.75 | 1.3 | 1.5 |
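
The ">t px" and "Mean Error" columns in the tables above can be read with the help of a small sketch. It assumes the usual convention for threshold-based stereo metrics: the percentage of valid pixels whose absolute disparity error exceeds t pixels, and the mean absolute disparity error over valid pixels. The function name `disparity_errors` and the rule that a ground-truth value of 0 marks a missing measurement are assumptions made for illustration.

```python
import numpy as np


def disparity_errors(pred, gt, thresholds=(2.0, 3.0, 5.0)):
    """Return ({t: percent of valid pixels with error > t}, mean error in px)."""
    valid = gt > 0  # assumption: 0 encodes "no ground truth" at that pixel
    err = np.abs(pred[valid] - gt[valid])
    rates = {t: 100.0 * float(np.mean(err > t)) for t in thresholds}
    return rates, float(err.mean())


# Toy disparity maps; the third pixel has no ground truth and is ignored.
gt = np.array([[10.0, 20.0, 0.0, 40.0]])
pred = np.array([[10.5, 24.0, 7.0, 46.0]])
rates, mean_err = disparity_errors(pred, gt)
print(rates, mean_err)
```

Restricting `valid` further to non-occluded pixels would give the NOC variant of each column, while using all pixels with ground truth corresponds to ALL.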
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Quan, Z.; Wu, B.; Luo, L. An Image Stereo Matching Algorithm with Multi-Spectral Attention Mechanism. Sensors 2023, 23, 8179. https://doi.org/10.3390/s23198179