Deep-Learning-Based Stereo Matching of Long-Distance Sea Surface Images for Sea Level Monitoring Systems
Abstract
1. Introduction
2. Related Work
2.1. Sea Level Measurement
2.2. Deep-Learning-Based Stereo Matching
3. Sea Level Monitoring System
4. Transformer and Siamese Joint Sparse Matching Network
4.1. Sea Wave Detection Module
4.2. Sea Wave Matching Module
4.3. Disparity Initialization and Matching
5. Experiments
5.1. Implementation
5.2. Evaluation of Sea Wave Detection
5.3. Matching Mask
5.4. Matching Results
6. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Pompe, J.J.; Rinehart, J.R. Mitigating damage costs from hurricane strikes along the southeastern U.S. Coast: A role for insurance markets. Ocean Coast. Manag. 2008, 51, 782–788.
2. Villamayor, B.M.R.; Rollon, R.N.; Samson, M.S.; Albano, G.M.G.; Primavera, J.H. Impact of Haiyan on Philippine mangroves: Implications to the fate of the widespread monospecific Rhizophora plantations against strong typhoons. Ocean Coast. Manag. 2016, 132, 1–14.
3. Pugh, D.; Woodworth, P. Sea-Level Science: Understanding Tides, Surges, Tsunamis and Mean Sea-Level Changes; Cambridge University Press: Cambridge, UK, 2014.
4. Intergovernmental Oceanographic Commission. Manual on Sea Level Measurement and Interpretation; Volume I-Basic procedures; UNESCO: Paris, France, 1985.
5. Miguez, B.M.; Testut, L.; Wöppelmann, G. The Van de Casteele test revisited: An efficient approach to tide gauge error characterization. J. Atmos. Ocean. Technol. 2008, 25, 1238–1244.
6. Bunt, T.G. XI. Description of a new tide-gauge, constructed by Mr. TG Bunt, and erected on the eastern bank of the river Avon, in front of the Hotwell House, Bristol, 1837. Philos. Trans. R. Soc. Lond. 1838, 128, 249–251.
7. Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 3354–3361.
8. Menze, M.; Geiger, A. Object scene flow for autonomous vehicles. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3061–3070.
9. Koomey, J.; Berard, S.; Sanchez, M.; Wong, H. Implications of historical trends in the electrical efficiency of computing. IEEE Ann. Hist. Comput. 2010, 33, 46–54.
10. Zagoruyko, S.; Komodakis, N. Learning to compare image patches via convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 4353–4361.
11. Zbontar, J.; LeCun, Y. Computing the stereo matching cost with a convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1592–1599.
12. Luo, W.; Schwing, A.G.; Urtasun, R. Efficient deep learning for stereo matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 5695–5703.
13. Dosovitskiy, A.; Fischer, P.; Ilg, E.; Hausser, P.; Hazirbas, C.; Golkov, V.; Van Der Smagt, P.; Cremers, D.; Brox, T. Flownet: Learning optical flow with convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 2758–2766.
14. Mayer, N.; Ilg, E.; Hausser, P.; Fischer, P.; Cremers, D.; Dosovitskiy, A.; Brox, T. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4040–4048.
15. Gidaris, S.; Komodakis, N. Detect, replace, refine: Deep structured prediction for pixel wise labeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5248–5257.
16. Kendall, A.; Gal, Y. What uncertainties do we need in bayesian deep learning for computer vision? In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30.
17. Rao, Z.; He, M.; Dai, Y.; Zhu, Z.; Li, B.; He, R. Nlca-net: A non-local context attention network for stereo matching. APSIPA Trans. Signal Inf. Process. 2020, 9, e18.
18. Liang, Z.; Guo, Y.; Feng, Y.; Chen, W.; Qiao, L.; Zhou, L.; Zhang, J.; Liu, H. Stereo matching using multi-level cost volume and multi-scale feature constancy. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 300–315.
19. Spencer, R.; McGarry, C.; Harrison, A.; Vassie, J.; Baker, T.; Smithson, M.; Haranzogo, S.; Woodworth, P. The ACCLAIM programme in the South Atlantic and Southern oceans. Int. Hydrogr. Rev. 1993, 70, 7–21.
20. Meinig, C.; Stalin, S.E.; Nakamura, A.I.; González, F.; Milburn, H.B. Technology developments in real-time tsunami measuring, monitoring and forecasting. In Proceedings of the OCEANS 2005 MTS/IEEE, Washington, DC, USA, 17–23 September 2005; IEEE: Piscataway, NJ, USA, 2005; pp. 1673–1679.
21. Japan Meteorological Agency. Prediction of Tsunami. Available online: http://www.data.jma.go.jp/svd/eqev/data/tsunami/ryoteki.html (accessed on 11 September 2022).
22. Jin, D.; Lin, J. Managing tsunamis through early warning systems: A multidisciplinary approach. Ocean Coast. Manag. 2011, 54, 189–199.
23. Valentini, N.; Saponieri, A.; Damiani, L. A new video monitoring system in support of Coastal Zone Management at Apulia Region, Italy. Ocean Coast. Manag. 2017, 142, 122–135.
24. Benetazzo, A. Measurements of short water waves using stereo matched image sequences. Coast. Eng. 2006, 53, 1013–1032.
25. Gallego, G.; Yezzi, A.; Fedele, F.; Benetazzo, A. A variational stereo method for the three-dimensional reconstruction of ocean waves. IEEE Trans. Geosci. Remote Sens. 2011, 49, 4445–4457.
26. Gallego, G.; Yezzi, A.; Fedele, F.; Benetazzo, A. Variational stereo imaging of oceanic waves with statistical constraints. IEEE Trans. Image Process. 2013, 22, 4211–4223.
27. Wanek, J.M.; Wu, C.H. Automated trinocular stereo imaging system for three-dimensional surface wave measurements. Ocean Eng. 2006, 33, 723–747.
28. Brandt, A.; Mann, J.; Rennie, S.; Herzog, A.; Criss, T. Three-dimensional imaging of the high sea-state wave field encompassing ship slamming events. J. Atmos. Ocean. Technol. 2010, 27, 737–752.
29. Bergamasco, F.; Torsello, A.; Sclavo, M.; Barbariol, F.; Benetazzo, A. WASS: An open-source pipeline for 3D stereo reconstruction of ocean waves. Comput. Geosci. 2017, 107, 28–36.
30. Wedel, A.; Rabe, C.; Vaudrey, T.; Brox, T.; Franke, U.; Cremers, D. Efficient dense scene flow from sparse or dense stereo data. In Proceedings of the European Conference on Computer Vision, Marseille, France, 12–18 October 2008; Springer: Berlin/Heidelberg, Germany, 2008; pp. 739–751.
31. Liang, Z.; Feng, Y.; Guo, Y.; Liu, H.; Chen, W.; Qiao, L.; Zhou, L.; Zhang, J. Learning for disparity estimation through feature constancy. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2811–2820.
32. Pang, J.; Sun, W.; Ren, J.S.; Yang, C.; Yan, Q. Cascade residual learning: A two-stage convolutional neural network for stereo matching. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy, 22–29 October 2017; pp. 887–895.
33. Song, X.; Zhao, X.; Fang, L.; Hu, H.; Yu, Y. Edgestereo: An effective multi-task learning network for stereo matching and edge detection. Int. J. Comput. Vis. 2020, 128, 910–930.
34. Xu, H.; Zhang, J. Aanet: Adaptive aggregation network for efficient stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 1959–1968.
35. Yang, G.; Zhao, H.; Shi, J.; Deng, Z.; Jia, J. Segstereo: Exploiting semantic information for disparity estimation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 636–651.
36. Lipson, L.; Teed, Z.; Deng, J. Raft-stereo: Multilevel recurrent field transforms for stereo matching. In Proceedings of the 2021 International Conference on 3D Vision (3DV), London, UK, 1–3 December 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 218–227.
37. Teed, Z.; Deng, J. Raft: Recurrent all-pairs field transforms for optical flow. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020. Proceedings, Part II 16. pp. 402–419.
38. Wang, F.; Galliani, S.; Vogel, C.; Pollefeys, M. IterMVS: Iterative probability estimation for efficient multi-view stereo. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 8606–8615.
39. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; Volume 28.
40. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
41. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30.
42. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020. Proceedings, Part I 16. pp. 213–229.
43. Stewart, R.; Andriluka, M.; Ng, A.Y. End-to-end people detection in crowded scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2325–2333.
44. Ren, M.; Zemel, R.S. End-to-end instance segmentation with recurrent attention. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6656–6664.
45. Bolme, D.S.; Beveridge, J.R.; Draper, B.A.; Lui, Y.M. Visual object tracking using adaptive correlation filters. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 2544–2550.
46. Bertinetto, L.; Valmadre, J.; Henriques, J.F.; Vedaldi, A.; Torr, P.H. Fully-convolutional siamese networks for object tracking. In Proceedings of the Computer Vision–ECCV 2016 Workshops, Amsterdam, The Netherlands, 8–10 and 15–16 October 2016; Springer: Berlin/Heidelberg, Germany, 2016. Proceedings, Part II 14. pp. 850–865.
47. He, A.; Luo, C.; Tian, X.; Zeng, W. Towards a better match in siamese network based visual object tracker. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018.
48. Zheng, J.; Ma, C.; Peng, H.; Yang, X. Learning to track objects from unlabeled videos. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 13546–13555.
49. Yan, B.; Peng, H.; Wu, K.; Wang, D.; Fu, J.; Lu, H. Lighttrack: Finding lightweight neural networks for object tracking via one-shot architecture search. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 15180–15189.
50. Hu, W.; Wang, Q.; Zhang, L.; Bertinetto, L.; Torr, P.H. Siammask: A framework for fast online object tracking and segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 3072–3089.
51. Yang, Y.; Lu, C.; Li, Z. Long-Distance Sea Wave Sparse Matching Algorithm for Sea Level Monitoring System. J. Mar. Sci. Eng. 2023, 11, 391.
52. Yang, Y.; Lu, C. Long-distance sea wave extraction method based on improved Otsu algorithm. Artif. Life Robot. 2019, 24, 304–311.
53. Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 658–666.
54. Kuhn, H.W. The Hungarian method for the assignment problem. Nav. Res. Logist. Q. 1955, 2, 83–97.
55. Yang, Y.; Lu, C. A stereo matching method for 3D image measurement of long-distance sea surface. J. Mar. Sci. Eng. 2021, 9, 1281.
56. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
Layers | Detection | Detection | Matching | Matching | Matching
---|---|---|---|---|---
backbone | ResNet-50 | ResNet-50 | ResNet-50 | ResNet-50 | ResNet-50
layer 1 | conv2d [2048, 256] | conv2d [2048, 256] | conv2d [2048, 256] | conv2d [2048, 256] | conv2d [2048, 256]
layer 1 | flattening | flattening | | |
layer 1 | position embedding | position embedding | | |
layer 2 | Transformer | Transformer | depth-wise cross correlation | depth-wise cross correlation | depth-wise cross correlation
layer 3 | linear [256, 251] | linear [256, 256] | conv2d [256, 256] | conv2d [256, 256] | conv2d [256, 256]
layer 3 | | linear [256, 256] | conv2d [256, 10] | conv2d [256, 20] | conv2d [256, 3969]
layer 3 | | linear [256, 4] | | |
output | object classification | bounding box | match score | bounding box | mask
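
The cross-correlation layer listed for the Matching module can be illustrated with a small sketch. The code below is plain Python on nested lists, not the authors' implementation; the function name `depthwise_xcorr` and the toy feature values are assumptions made for illustration. It correlates each channel of a search feature map only with the same channel of a template feature map, which is what distinguishes the depth-wise form from a full cross-channel correlation.

```python
def depthwise_xcorr(search, template):
    """Correlate each channel of `search` with the same channel of `template`.

    search:   list of C channels, each an Hs x Ws nested list of numbers
    template: list of C channels, each an Ht x Wt nested list (Ht <= Hs, Wt <= Ws)
    returns:  list of C response maps of size (Hs - Ht + 1) x (Ws - Wt + 1)
    """
    if len(search) != len(template):
        raise ValueError("search and template must have the same channel count")
    responses = []
    for s_ch, t_ch in zip(search, template):
        hs, ws = len(s_ch), len(s_ch[0])
        ht, wt = len(t_ch), len(t_ch[0])
        response = []
        for y in range(hs - ht + 1):
            row = []
            for x in range(ws - wt + 1):
                # Sum of element-wise products over the template window.
                acc = 0.0
                for dy in range(ht):
                    for dx in range(wt):
                        acc += s_ch[y + dy][x + dx] * t_ch[dy][dx]
                row.append(acc)
            response.append(row)
        responses.append(response)
    return responses


# Hypothetical toy input: 2 channels, 3x3 search features, 2x2 template features.
search = [
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
]
template = [
    [[1, 0], [0, 1]],
    [[1, 1], [1, 1]],
]
responses = depthwise_xcorr(search, template)
```

Each channel yields its own response map, so the channel count is preserved and the later convolution layers can turn the per-channel responses into scores, boxes, or masks.
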
Images | Original Number | Neural Network Detection | Traditional Detection
---|---|---|---
14–20 km (10 images) | 218 | 181 | 150
8–14 km (33 images) | 615 | 528 | 497
4–10 km (24 images) | 823 | 723 | 587
total | 1656 | 1432 | 1234
ratio (%) | 100.00 | 86.47 | 74.52
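
The ratio row of the detection table can be reproduced from the per-range counts. The sketch below assumes the ratio is the total number of detected waves divided by the total original number, expressed as a percentage; the helper name `detection_ratio` is hypothetical.

```python
def detection_ratio(detected, original):
    """Return detected / original as a percentage rounded to two decimals."""
    if original <= 0:
        raise ValueError("original count must be positive")
    return round(100.0 * detected / original, 2)


# Counts copied from the table: (original, neural network, traditional).
counts = {
    "14-20 km": (218, 181, 150),
    "8-14 km": (615, 528, 497),
    "4-10 km": (823, 723, 587),
}

total_original = sum(c[0] for c in counts.values())
total_nn = sum(c[1] for c in counts.values())
total_traditional = sum(c[2] for c in counts.values())

nn_ratio = detection_ratio(total_nn, total_original)
traditional_ratio = detection_ratio(total_traditional, total_original)
print(nn_ratio, traditional_ratio)
```

Under that assumption the totals are 1656, 1432, and 1234, and the two ratios come out as 86.47 and 74.52, matching the table.
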
Images | Ground Truth Number | Well (IoU > 0.5) | Positive (IoU > 0.0) | Negative (IoU = 0)
---|---|---|---|---
14–20 km (13 image pairs) | 133 | 27 | 91 | 42
8–14 km (15 image pairs) | 150 | 15 | 133 | 17
4–10 km (25 image pairs) | 343 | 53 | 262 | 81
total | 626 | 95 | 486 | 140
ratio (%) | 100.0 | 15.0 | 77.6 | 22.4
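
The IoU thresholds in the column headers can be made concrete with a short sketch. It assumes axis-aligned boxes given as `(x_min, y_min, x_max, y_max)` for simplicity; the table may instead be based on pixel masks, and the function names `iou` and `mask_categories` are hypothetical. Since the Positive and Negative counts add up to the ground-truth total, Well is treated here as a subset of Positive.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def mask_categories(score):
    """Map an IoU score to the table's categories (Well is a subset of Positive)."""
    labels = []
    if score > 0.5:
        labels.append("well")
    if score > 0.0:
        labels.append("positive")
    else:
        labels.append("negative")
    return labels


# Hypothetical boxes: identical, half-overlapping, and disjoint.
same = mask_categories(iou((0, 0, 2, 2), (0, 0, 2, 2)))
partial = mask_categories(iou((0, 0, 2, 2), (1, 0, 3, 2)))
disjoint = mask_categories(iou((0, 0, 1, 1), (2, 2, 3, 3)))
```
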
No. | RANSAC + SURF Precision (%) | RANSAC + SURF Recall (%) | [51] Method Precision (%) | [51] Method Recall (%) | Proposed Method Precision (%) | Proposed Method Recall (%)
---|---|---|---|---|---|---
➀ | 72.7 | 28.6 | 100.0 | 92.6 | 76.4 | 46.4
➁ | 92.3 | 48.0 | 95.7 | 88.0 | 86.7 | 52.0
➂ | 100.0 | 22.0 | 92.5 | 98.0 | 93.3 | 28.0
➃ | 100.0 | 2.4 | 88.1 | 90.2 | 100.0 | 24.4
➄ | 81.8 | 9.6 | 99.0 | 100.0 | 87.3 | 67.4
➅ | 81.8 | 9.7 | 96.7 | 95.7 | 80.8 | 64.1
Average | 88.1 | 20.5 | 95.3 | 97.4 | 87.4 | 47.5
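
For readers unfamiliar with the two metrics in the table, the sketch below shows the standard definitions of precision and recall as percentages. It assumes the table uses these standard definitions; the function name and the match counts are hypothetical and are not taken from the experiments.

```python
def precision_recall(true_positive, false_positive, false_negative):
    """Return (precision, recall) as percentages rounded to one decimal.

    precision = TP / (TP + FP): share of reported matches that are correct
    recall    = TP / (TP + FN): share of true matches that were reported
    """
    reported = true_positive + false_positive
    actual = true_positive + false_negative
    precision = 100.0 * true_positive / reported if reported else 0.0
    recall = 100.0 * true_positive / actual if actual else 0.0
    return round(precision, 1), round(recall, 1)


# Hypothetical image pair: 8 correct matches, 2 wrong matches, 12 missed waves.
precision, recall = precision_recall(8, 2, 12)
print(precision, recall)
```

A method that reports only a few very reliable matches scores high on precision and low on recall, which is the pattern visible in several rows of the table.
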
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yang, Y.; Lu, C.; Li, Z. Deep-Learning-Based Stereo Matching of Long-Distance Sea Surface Images for Sea Level Monitoring Systems. J. Mar. Sci. Eng. 2024, 12, 961. https://doi.org/10.3390/jmse12060961