# A Physics-Driven CNN Model for Real-Time Sea Waves 3D Reconstruction

## Abstract

## 1. Introduction

#### Related Work

## 2. WASSfast CNN

#### 2.1. The WASSfast Pipeline

1. Feature detection and optical flow are used to extract a set of matching feature points between the left and right images.
2. Matches are triangulated to obtain a sparse 3D point cloud.
3. The spectrum at time $t$ (i.e., the current frame) is predicted from the previous estimate at time $t-1$ according to the linear dispersion relation.
4. The predicted spectrum is updated to fit the triangulated points obtained in step 2. This yields an estimate of the spectrum at time $t$, which is used when processing the frames at time $t+1$. The process then repeats from step 1.
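Step 3 can be illustrated with a minimal sketch. It makes three assumptions: deep-water linear dispersion $\omega(\mathbf{k})=\sqrt{g|\mathbf{k}|}$, a spectrum stored as the 2D FFT of the gridded elevation, and a known mean wave direction $\theta_0$. Each Fourier component is advanced by a phase rotation. The sign of that rotation depends on whether the component's wavenumber points along or against $\theta_0$, in the spirit of the phase rotation matrices mentioned in the Figure 3 caption. The names `phase_rotation` and `predict_spectrum` are hypothetical, and this is not the WASSfast implementation.

```python
import numpy as np

G = 9.81  # gravitational acceleration (m/s^2)

def phase_rotation(n, cell, dt, theta0):
    """Hypothetical phase-rotation matrix advancing an n x n spectrum by dt seconds.

    Assumes deep-water dispersion omega = sqrt(g |k|) and waves travelling
    roughly along theta0 (radians): components whose wavenumber points along
    theta0 rotate by exp(-i omega dt), the others by exp(+i omega dt).
    Components exactly perpendicular to theta0 break Hermitian symmetry, so
    callers should keep the real part of the inverse FFT.
    """
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=cell)        # angular wavenumbers (rad/m)
    kx, ky = np.meshgrid(k, k, indexing="xy")           # kx varies along columns (x)
    omega = np.sqrt(G * np.hypot(kx, ky))
    along = kx * np.cos(theta0) + ky * np.sin(theta0)   # projection on the mean direction
    sign = np.where(along >= 0.0, -1.0, 1.0)
    return np.exp(1j * sign * omega * dt)

def predict_spectrum(spectrum_prev, rotation):
    """Step 3: propagate the previous spectrum estimate to the current frame."""
    return spectrum_prev * rotation

if __name__ == "__main__":
    n, cell, dt = 256, 0.46, 1.0 / 7.0                  # grid, cell size (m), frame interval (s)
    x = np.arange(n) * cell
    k0 = 2.0 * np.pi * 8 / (n * cell)                   # one long-crested wave along x
    eta_prev = np.tile(np.cos(k0 * x), (n, 1))
    rot = phase_rotation(n, cell, dt, theta0=0.0)
    spec_t = predict_spectrum(np.fft.fft2(eta_prev), rot)
    eta_t = np.real(np.fft.ifft2(spec_t))               # predicted elevation at time t
```

For a single long-crested wave travelling along $\theta_0$, this reproduces the analytic solution $\cos(k_0 x - \omega\,\Delta t)$. The update of step 4 is not covered by the sketch.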

#### 2.2. Network Architecture

#### 2.3. Depth Completion Block

#### 2.4. Temporal Combiner Block

#### 2.5. Surface Reconstruction Block

## 3. Network Training

- Training data should be as heterogeneous as possible, covering different wave directions, sampling densities, frame rates, etc. Assembling and organizing such a vast set of WASS-processed data would require an impractical effort. Moreover, even then, there would be no guarantee of capturing enough conditions to avoid overfitting.
- WASS data partially suffer from depth quantization produced by the dense stereo approach. If used as training data without proper filtering, the CNN would probably learn to “simulate” the quantization effect along the image scanlines.
- A vast amount of data is needed to train a deep neural network without overfitting. We currently do not have enough data to ensure proper training. Data augmentation is only partially viable, since it is difficult to define image transformations that realistically simulate different view angles, wave directions, lighting conditions, etc.

#### 3.1. Loss Function

#### 3.2. Training Process

- The depth completion block alone (Figure 4) is trained on the whole dataset until convergence.
- The full WASSfast CNN (with the temporal combiner and surface reconstruction blocks) is then trained, initializing the depth completion block with the weights obtained in the first step.

- 50 epochs, sampling $d\sim U(0.15,0.20)$
- 50 epochs, sampling $d\sim U(0.10,0.15)$
- 70 epochs, sampling $d\sim U(0.05,0.10)$
- 70 epochs, sampling $d\sim U(0.03,0.05)$
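The staged schedule above is a form of curriculum learning: training starts from denser, easier inputs and progressively moves to sparser ones. The minimal sketch below makes two assumptions. First, the density $d$ is drawn once per training sample from the interval active at the current epoch. Second, sparse inputs are produced by keeping a random fraction $d$ of the grid cells. `density_range`, `sample_density`, and `sparsify` are hypothetical helpers.

```python
import random

# (epochs, d_min, d_max) for each stage of the schedule above
CURRICULUM = [(50, 0.15, 0.20), (50, 0.10, 0.15), (70, 0.05, 0.10), (70, 0.03, 0.05)]

def density_range(epoch, curriculum=CURRICULUM):
    """Return the (d_min, d_max) interval active at a 0-based epoch."""
    start = 0
    for n_epochs, d_min, d_max in curriculum:
        if epoch < start + n_epochs:
            return d_min, d_max
        start += n_epochs
    return curriculum[-1][1], curriculum[-1][2]  # stay on the last stage afterwards

def sample_density(epoch, rng):
    """Draw d ~ U(d_min, d_max) for one training sample (assumed per-sample draw)."""
    return rng.uniform(*density_range(epoch))

def sparsify(grid, d, rng):
    """Hypothetical sparsification: keep a random fraction d of the cells of a
    2D list and return (values, mask); dropped cells get value 0.0 and mask 0."""
    rows, cols = len(grid), len(grid[0])
    n_keep = max(1, round(d * rows * cols))
    kept = set(rng.sample(range(rows * cols), n_keep))
    values = [[grid[r][c] if r * cols + c in kept else 0.0 for c in range(cols)]
              for r in range(rows)]
    mask = [[1 if r * cols + c in kept else 0 for c in range(cols)] for r in range(rows)]
    return values, mask
```

A training loop would call `sample_density(epoch, rng)` for each sample and feed `sparsify(ground_truth, d, rng)` to the network.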

## 4. Experimental Results

#### 4.1. Comparison against Synthetic Data

#### 4.2. Comparison on Real Data

#### 4.2.1. Time Series Comparison

`G201810061000` and `G201810061700`, characterized by better contrast and exhibiting a higher number of triangulated points. The Pearson’s coefficient between WASS and WASSfast CNN is on the order of $0.98$, with the CNN mode performing better than the PU mode for these two sequences (exact coefficients are reported in Figure 10). Only in sequence `G201810061400` did the CNN mode perform slightly worse, although the difference is almost negligible.
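The correlation figures above are standard Pearson coefficients between two elevation time series. The minimal sketch below assumes both series are sampled at the same instants and at the same grid point.

```python
import math

def pearson(a, b):
    """Pearson's correlation coefficient between two equally long time series."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("series must have the same length (>= 2)")
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))   # un-normalized covariance
    va = sum((x - ma) ** 2 for x in a)                     # un-normalized variances
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)
```

For two sinusoids of equal frequency observed over an integer number of periods, the coefficient equals the cosine of their phase difference. A small timing mismatch between reconstructions therefore lowers the correlation even when the amplitudes agree.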

#### 4.2.2. Sea-Waves Spectrum

`G201810061400` shows a slightly different spectrum from the other two, which may explain why the Pearson’s correlation for that time series is slightly lower for the CNN mode.

`G20200916T010003` computed with the PU and CNN modes. It is interesting to observe that the PU mode suffers some energy loss in a certain portion of the spectrum, as reported in [15]. Note, for instance, that at $0.5$ Hz the top-left section of the spectrum is noisier than the bottom-right half. The new CNN approach does not seem to be affected by this problem, even though it uses the same principle as the PU mode for predicting the wave spectrum across frames. We believe that the final surface reconstruction part of the network compensates for this energy loss more effectively than the update step of the WASSfast PU mode. This behavior will be investigated further in future work.

#### 4.2.3. Qualitative Results

## 5. Conclusions

## Supplementary Materials

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

1. Benetazzo, A.; Barbariol, F.; Bergamasco, F.; Torsello, A.; Carniel, S.; Sclavo, M. Observation of extreme sea waves in a space-time ensemble. *J. Phys. Oceanogr.* **2015**, *45*, 2261–2275.
2. Benetazzo, A.; Barbariol, F.; Bergamasco, F.; Carniel, S.; Sclavo, M. Space-time extreme wind waves: Analysis and prediction of shape and height. *Ocean Model.* **2017**, *113*, 201–216.
3. Benetazzo, A.; Barbariol, F.; Bergamasco, F.; Bertotti, L.; Yoo, J.; Shim, J.S.; Cavaleri, L. On the extreme value statistics of spatio-temporal maximum sea waves under cyclone winds. *Prog. Oceanogr.* **2021**, *197*.
4. Filipot, J.F.; Guimaraes, P.; Leckler, F.; Horstmann, J.; Carrasco, R.; Leroy, E.; Fady, N.; Accensi, M.; Prevosto, M.; Duarte, R.; et al. La Jument Lighthouse: A real scale laboratory for the study of giant waves and their loading on marine structures. *Philos. Trans. R. Soc.* **2019**, *377*, 20190008.
5. Stringari, C.E.; Prevosto, M.; Filipot, J.F.; Leckler, F.; Guimarães, P.V. A New Probabilistic Wave Breaking Model for Dominant Wind-Sea Waves Based on the Gaussian Field Theory. *J. Geophys. Res. Ocean.* **2021**, *126*, e2020JC016943.
6. Douglas, S.; Cornett, A.; Nistor, I. Image-Based Measurement of Wave Interactions with Rubble Mound Breakwaters. *J. Mar. Sci. Eng.* **2020**, *8*, 472.
7. Zappa, C.J.; Banner, M.L.; Schultz, H.; Corrada-Emmanuel, A.; Wolff, L.B.; Yalcin, J. Retrieval of short ocean wave slope using polarimetric imaging. *Meas. Sci. Technol.* **2008**, *19*, 055503.
8. Young, I.R.; Rosenthal, W.; Ziemer, F. A Three-Dimensional Analysis of Marine Radar Images for the Determination of Ocean Wave Directionality and Surface Currents. *J. Geophys. Res.* **1985**, *90*, 1049–1059.
9. Nieto Borge, J.; Reichert, K.; Hessner, K. Detection of spatio-temporal wave grouping properties by using temporal sequences of X-band radar images of the sea surface. *Ocean Model.* **2013**, *61*, 21–37.
10. Jähne, B.; Klinke, J.; Waas, S. Imaging of short ocean wind waves: A critical theoretical review. *J. Opt. Soc. Amer. A Opt. Image Sci. Vis.* **1994**, *11*, 2197–2209.
11. Benetazzo, A.; Fedele, F.; Gallego, G.; Shih, P.C.; Yezzi, A. Offshore stereo measurements of gravity waves. *Coast. Eng.* **2012**, *64*, 127–138.
12. Bergamasco, F.; Torsello, A.; Sclavo, M.; Barbariol, F.; Benetazzo, A. WASS: An open-source pipeline for 3D stereo reconstruction of ocean waves. *Comput. Geosci.* **2017**, *107*, 28–36.
13. Gallego, G.; Yezzi, A.; Fedele, F.; Benetazzo, A. Variational stereo imaging of oceanic waves with statistical constraints. *IEEE Trans. Image Process.* **2013**, *22*, 4211–4223.
14. Guimarães, P.V.; Ardhuin, F.; Bergamasco, F.; Leckler, F.; Filipot, J.F.; Shim, J.S.; Dulov, V.; Benetazzo, A. A data set of sea surface stereo images to resolve space-time wave fields. *Sci. Data* **2020**, *7*, 145.
15. Bergamasco, F.; Benetazzo, A.; Yoo, J.; Torsello, A.; Barbariol, F.; Jeong, J.Y.; Shim, J.S.; Cavaleri, L. Toward real-time optical estimation of ocean waves’ space-time fields. *Comput. Geosci.* **2021**, *147*.
16. Bertero, M.; Boccacci, P. *Introduction to Inverse Problems in Imaging*; CRC Press: Boca Raton, FL, USA, 1998; ISBN 9780750304351.
17. Keller, J.B. Inverse Problems. *Am. Math. Mon.* **1976**, *83*, 107–118.
18. Shemdin, O.H.; Tran, H.; Wu, S. Directional Measurement of Short Ocean Waves With Stereophotography. *J. Geophys. Res.* **1988**, *93*, 13891–13901.
19. Shemdin, H.; Tran, H.M. Measuring short surface waves with stereophotography. *Photogram. Eng. Remote Sens.* **1992**, *93*, 311–316.
20. Banner, M.L.; Jones, I.S.F.; Trinder, J.C. Wavenumber spectra of short gravity waves. *J. Fluid Mech.* **1989**, *198*, 321–344.
21. Scharstein, D.; Szeliski, R. A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms. *Int. J. Comput. Vis.* **2002**, *47*, 7–42.
22. Benetazzo, A. Measurements of short water waves using stereo matched image sequences. *Coast. Eng.* **2006**, *53*, 1013–1032.
23. Wanek, J.M.; Wu, C.H. Automated trinocular stereo imaging system for three-dimensional surface wave measurements. *Ocean Eng.* **2006**, *33*, 723–747.
24. Gallego, G.; Benetazzo, A.; Yezzi, A.; Fedele, F. Wave Statistics and Spectra via a Variational Wave Acquisition Stereo System. In Proceedings of the ASME 2008 27th International Conference on Offshore Mechanics and Arctic Engineering, Estoril, Portugal, 15–20 June 2008; pp. 801–808.
25. Gallego, G.; Yezzi, A.; Fedele, F.; Benetazzo, A. A Variational Stereo Method for the Three-Dimensional Reconstruction of Ocean Waves. *Geosci. Remote Sens. IEEE Trans.* **2011**, *49*, 4445–4457.
26. Vieira, M.; Guimarães, P.; Violante-Carvalho, N.; Benetazzo, A.; Bergamasco, F.; Pereira, H. A low-cost stereo video system for measuring directional wind waves. *J. Mar. Sci. Eng.* **2020**, *8*, 831.
27. Benetazzo, A.; Bergamasco, F.; Yoo, J.; Cavaleri, L.; Kim, S.S.; Bertotti, L.; Barbariol, F.; Shim, J.S. Characterizing the signature of a spatio-temporal wind wave field. *Ocean Model.* **2018**, *129*, 104–123.
28. Guimarães, P.; Leckler, F.; Filipot, J.F.; Duarte, R.; Deeb, S.; Benetazzo, A.; Horstmann, J.; Carrasco, R. Extreme sea state measurements by stereo video system. *Int. J. Offshore Polar Eng.* **2019**, *3*, 2492–2497.
29. Pereira, H.; Violante-Carvalho, N.; Fabbri, R.; Babanin, A.; Pinho, U.; Skvortsov, A. An algorithm for tracking drifters dispersion induced by wave turbulence using optical cameras. *Comput. Geosci.* **2021**, *148*.
30. Benetazzo, A.; Cavaleri, L.; Ma, H.; Jiang, S.; Bergamasco, F.; Jiang, W.; Chen, S.; Qiao, F. Analysis of the effect of fish oil on wind waves and implications for air-water interaction studies. *Ocean Sci.* **2019**, *15*, 725–743.
31. Bergamasco, F.; Benetazzo, A.; Barbariol, F.; Carniel, S.; Sclavo, M. Multi-view horizon-driven sea plane estimation for stereo wave imaging on moving vessels. *Comput. Geosci.* **2016**, *95*, 105–117.
32. Schwendeman, M.; Thomson, J. Sharp-crested breaking surface waves observed from a ship-based stereo video system. *J. Phys. Oceanogr.* **2017**, *47*, 775–792.
33. Zhou, K.; Meng, X.; Cheng, B. Review of Stereo Matching Algorithms Based on Deep Learning. *Comput. Intell. Neurosci.* **2020**, *2020*.
34. Franke, R.; Nielson, G.M. Scattered Data Interpolation and Applications: A Tutorial and Survey. In *Geometric Modeling*; Hagen, H., Roller, D., Eds.; Springer: Berlin/Heidelberg, Germany, 1991; pp. 131–160.
35. Li, B.; Zhang, T.; Xia, T. Vehicle Detection from 3D Lidar Using Fully Convolutional Network. *arXiv* **2016**, arXiv:1608.07916.
36. Ma, F.; Karaman, S. Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018.
37. Zweig, S.; Wolf, L. InterpoNet, A brain inspired neural network for optical flow dense interpolation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
38. Uhrig, J.; Schneider, N.; Schneider, L.; Franke, U.; Brox, T.; Geiger, A. Sparsity Invariant CNNs. In Proceedings of the 2017 International Conference on 3D Vision (3DV), Qingdao, China, 10–12 October 2017; pp. 11–20.
39. Huang, Z.; Fan, J.; Cheng, S.; Yi, S.; Wang, X.; Li, H. HMS-Net: Hierarchical Multi-Scale Sparsity-Invariant Network for Sparse Depth Completion. *IEEE Trans. Image Process.* **2019**, *29*, 3429–3441.
40. Jaritz, M.; de Charette, R.; Wirbel, E.; Perrotton, X.; Nashashibi, F. Sparse and Dense Data with CNNs: Depth Completion and Semantic Segmentation. In Proceedings of the 2018 International Conference on 3D Vision (3DV), Verona, Italy, 5–8 September 2018; pp. 52–60.
41. Yan, Z.; Wang, K.; Li, X.; Zhang, Z.; Xu, B.; Li, J.; Yang, J. RigNet: Repetitive Image Guided Network for Depth Completion. 2021. Available online: https://arxiv.org/pdf/2107.13802.pdf (accessed on 20 September 2021).
42. Hu, M.; Wang, S.; Li, B.; Ning, S.; Fan, L.; Gong, X. PENet: Towards Precise and Efficient Image Guided Depth Completion. 2021. Available online: https://arxiv.org/pdf/2103.00783.pdf (accessed on 20 September 2021).
43. Shepard, D. A Two-dimensional Interpolation Function for Irregularly-spaced Data. In Proceedings of the 1968 23rd ACM National Conference, Las Vegas, NV, USA, 27–29 August 1968; ACM: New York, NY, USA, 1968; pp. 517–524.
44. Goodfellow, I.; Bengio, Y.; Courville, A. *Deep Learning*; MIT Press: Cambridge, MA, USA, 2016. Available online: http://www.deeplearningbook.org (accessed on 20 September 2021).
45. Brodtkorb, P.; Johannesson, P.; Lindgren, G.; Rychlik, I.; Rydén, J.; Sjö, E. WAFO—A Matlab Toolbox for the Analysis of Random Waves and Loads. In Proceedings of the Tenth International Offshore and Polar Engineering Conference, Seattle, WA, USA, 27 May–2 June 2000; Volume 3, pp. 343–350.
46. Zhao, H.; Gallo, O.; Frosio, I.; Kautz, J. Loss Functions for Image Restoration With Neural Networks. *IEEE Trans. Comput. Imaging* **2017**, *3*, 47–57.
47. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image Quality Assessment: From Error Visibility to Structural Similarity. *IEEE Trans. Image Process.* **2004**, *13*, 600–612.
48. Bengio, Y.; Louradour, J.; Collobert, R.; Weston, J. Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning, Montreal, QC, Canada, 14–18 June 2009; pp. 41–48.
49. Graves, A.; Bellemare, M.G.; Menick, J.; Munos, R.; Kavukcuoglu, K. Automated curriculum learning for neural networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 7–9 August 2017; pp. 1311–1320.
50. Narvekar, S.; Peng, B.; Leonetti, M.; Sinapov, J.; Taylor, M.E.; Stone, P. Curriculum learning for reinforcement learning domains: A framework and survey. *arXiv* **2020**, arXiv:2003.04960.

**Figure 1.** The WASSfast reconstruction pipeline. Input stereo frames are analyzed to extract a sparse set of corresponding feature points for triangulation. This creates a sparse 3D point cloud from which a gridded 3D surface is estimated. The original approach described in [15] is shown at the top, labeled “PU Mode”. At the bottom, the CNN mode described in this paper uses a CNN to reconstruct the surface directly with a learning-based approach.

**Figure 2.** The point discretization process used to prepare data for the subsequent WASSfast CNN. **Top-left**: initially, points are defined in the left (or right) camera reference system. **Top-right**: points are transformed to the mean sea-plane reference system, whose x–y plane lies on the mean sea plane and whose z axis points upward. **Bottom-right**: points are parallel-projected onto the regular grid defined on the mean sea plane. **Bottom-left**: a close-up of what happens when multiple points (a, b, c) fall in the same grid cell. A random point is chosen and its x–y coordinates are approximated by those of the grid cell center, so the entire grid cell takes the elevation value of the randomly chosen point.
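The discretization of Figure 2 can be sketched as follows. The sketch assumes the points have already been transformed into the mean sea-plane reference system (top-right panel) and that the grid is centered on the origin. The default grid and cell sizes mirror the synthetic-data parameter table, and `discretize` is a hypothetical name. Shuffling the points and then keeping the first one per cell implements the random choice described in the bottom-left panel.

```python
import numpy as np

def discretize(points, n=256, cell=0.46, rng=None):
    """Project sea-plane points (x, y, z) onto an n x n grid (hypothetical helper).

    Returns (elevation, mask): each occupied cell takes the z of one randomly
    chosen point among those falling in it; empty cells are 0 with mask 0.
    """
    rng = np.random.default_rng() if rng is None else rng
    pts = np.asarray(points, dtype=float)
    pts = pts[rng.permutation(len(pts))]              # random order -> random winner per cell
    half = n * cell / 2.0                             # grid assumed centered on the origin
    col = np.floor((pts[:, 0] + half) / cell).astype(int)
    row = np.floor((pts[:, 1] + half) / cell).astype(int)
    inside = (row >= 0) & (row < n) & (col >= 0) & (col < n)
    flat = row[inside] * n + col[inside]              # linear cell index of each point
    z = pts[inside, 2]
    cells, first = np.unique(flat, return_index=True)  # first (random) point in each cell
    elevation = np.zeros((n, n))
    mask = np.zeros((n, n), dtype=np.uint8)
    elevation.flat[cells] = z[first]
    mask.flat[cells] = 1
    return elevation, mask
```

`np.unique(..., return_index=True)` is used instead of a plain fancy-index assignment because NumPy does not guarantee which write wins when an index repeats.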

**Figure 3.** The WASSfast surface reconstruction CNN. The input is composed of three frames taken at times $t-1$, $t$, and $t+1$. Each frame is a $256\times 256\times 2$ tensor whose two channels contain the sparse elevation data and the validity mask. The phase rotation matrices ${\mathcal{P}}_{{\Delta}_{t}-1}$ and ${\mathcal{P}}_{{\Delta}_{t}+1}$ are assumed to be known from the sequence frame rate, wave propagation direction, etc. Frames ${\mathcal{I}}_{t-1}$ and ${\mathcal{I}}_{t+1}$ are processed in parallel by two depth completion blocks with shared weights. Then, the temporal combiner transports the surfaces ${\mathcal{S}}_{t-1}$ and ${\mathcal{S}}_{t+1}$ to time $t$. The predicted surfaces are multiplied by their original masks (${\mathcal{M}}_{t-1}$, ${\mathcal{M}}_{t+1}$) and merged with ${\mathcal{I}}_{t}$, creating new, denser data $(\overline{\mathcal{I}},\overline{\mathcal{M}})$. The result is then processed by the surface reconstruction block to produce the final surface ${\mathcal{O}}_{t}$.
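The mask-and-merge step in the Figure 3 caption can be sketched as follows. The caption does not specify the exact merge rule, so this version is an assumption: valid samples of $\mathcal{I}_t$ take precedence, and elsewhere the masked transported samples from $t-1$ and $t+1$ are averaged. `merge_frames` is a hypothetical name.

```python
import numpy as np

def merge_frames(i_t, m_t, s_prev, m_prev, s_next, m_next):
    """Hypothetical merge of the current sparse frame with two transported surfaces.

    i_t, m_t       : sparse elevation and validity mask at time t
    s_prev, s_next : surfaces from t-1 and t+1, already transported to time t
    m_prev, m_next : original validity masks of frames t-1 and t+1
    """
    num = s_prev * m_prev + s_next * m_next           # transported surfaces times their masks
    den = m_prev + m_next                             # how many transported samples per cell
    transported = np.divide(num, den, out=np.zeros_like(num, dtype=float), where=den > 0)
    merged = np.where(m_t > 0, i_t, transported)      # current-frame samples take precedence
    merged_mask = ((m_t > 0) | (den > 0)).astype(np.uint8)
    return merged, merged_mask
```

The merged mask is the union of the three masks, so the surface reconstruction block receives denser input than ${\mathcal{I}}_{t}$ alone.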

**Figure 4.** The depth completion block consists of a sequence of sparse convolution layers (see Figure 5) interleaved with ReLU activations.

**Figure 5.** The sparse convolution operation takes an input tensor composed of sparse data (in white) and the associated validity mask (in yellow). Data are convolved and then normalized to account only for the valid points encompassed by the kernel. The mask is dilated by a max-pooling operation and finally concatenated to the output.
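A single-channel sketch of the operation in Figure 5 follows, written after the sparsity-invariant convolution of Uhrig et al. The kernel sums the weighted valid samples in each window and divides by the number of valid samples, and the mask is dilated by a $k\times k$ max pooling. Stride 1, zero padding, and a scalar bias are assumptions. The explicit loops favor readability over speed.

```python
import numpy as np

def sparse_conv(x, mask, weights, bias=0.0, eps=1e-8):
    """Single-channel sparse convolution (stride 1, 'same' zero padding).

    x, mask : (H, W) sparse data and 0/1 validity mask
    weights : (k, k) kernel with odd k (applied as a cross-correlation, as in CNNs)
    Output  = sum(w * x * mask) / (number of valid pixels in the window) + bias,
    new mask = k x k max pooling of the input mask.
    """
    k = weights.shape[0]
    p = k // 2
    xm = np.pad(np.asarray(x, float) * mask, p)       # zero out invalid samples, then pad
    mp = np.pad(np.asarray(mask, float), p)
    h, w = xm.shape[0] - 2 * p, xm.shape[1] - 2 * p
    out = np.zeros((h, w))
    new_mask = np.zeros((h, w))
    for r in range(h):
        for c in range(w):
            win_x = xm[r:r + k, c:c + k]
            win_m = mp[r:r + k, c:c + k]
            out[r, c] = np.sum(weights * win_x) / (win_m.sum() + eps) + bias
            new_mask[r, c] = win_m.max()               # max pooling dilates the mask
    return out, new_mask
```

With an all-ones kernel, each output becomes the local mean of the valid samples, regardless of how many of them fall in the window. This is the normalization the caption refers to.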

**Figure 6.** Example of one synthetically generated scenario with network input at different densities $d=0.1, 0.2$ and the corresponding ${\overline{\mathcal{O}}}_{t}$ (**right**).

**Figure 7.** Top row: comparison of the mean absolute error (**left**) and peak signal-to-noise ratio (**right**) of the surface reconstructed with WASSfast CNN, SparseCNN, and IDW as the sampling density varies. Bottom row: frequency spectra of time series extracted from the grid center when reconstructing a synthetic sequence at different sampling densities.
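The two metrics plotted in Figure 7 can be computed as follows. This is a generic sketch. The paper does not state which peak value enters the PSNR, so the ground-truth peak-to-peak elevation range is assumed here.

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error, in the units of the elevations (e.g. meters)."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    return float(np.mean(np.abs(pred - gt)))

def psnr(pred, gt, peak=None):
    """Peak signal-to-noise ratio in dB.

    Assumption: `peak` defaults to the peak-to-peak range of the ground truth.
    """
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    if peak is None:
        peak = float(gt.max() - gt.min())
    mse = float(np.mean((pred - gt) ** 2))
    return float("inf") if mse == 0.0 else float(10.0 * np.log10(peak ** 2 / mse))
```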

**Figure 8.** Qualitative results of our CNN for sea-wave surface reconstruction. From **left** to **right**: sparse input data, IDW interpolation, output of the depth completion block, WASSfast CNN output ${\mathcal{O}}_{t}$, and ground truth ${\overline{\mathcal{O}}}_{t}$. Each row shows a different scenario with increasing sampling density. Note how the full network output (with the temporal combiner and an additional feed-forward CNN step) improves on the reconstruction of the depth completion block alone (SparseCNN), especially at high frequencies. Colorbar in meters.
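The IDW baseline in Figures 8 and 9 refers to inverse distance weighting in the spirit of Shepard's method. A minimal global variant is sketched below. It assumes a power of $p=2$, uses every valid sample for every cell, and measures distance in grid cells; the baseline's actual parameters are not given here.

```python
import numpy as np

def idw_fill(values, mask, power=2.0):
    """Fill an (H, W) sparse grid by inverse distance weighting (Shepard-style).

    Each empty cell becomes the weighted mean of all valid samples, with weights
    1 / distance**power (distance in grid cells); valid cells keep their value.
    Needs at least one valid sample; memory grows as H * W * n_samples.
    """
    values = np.asarray(values, dtype=float)
    rows, cols = np.nonzero(mask)
    samples = values[rows, cols]
    rr, cc = np.indices(values.shape)
    dist = np.hypot(rr[..., None] - rows, cc[..., None] - cols)   # (H, W, n_samples)
    empty = np.asarray(mask) == 0
    weights = 1.0 / dist[empty] ** power                           # dist > 0 away from samples
    out = values.copy()
    out[empty] = (weights @ samples) / weights.sum(axis=1)
    return out
```

Being a purely geometric average, IDW tends to smooth out short waves between samples. This fits the caption's remark that the learned model improves most at high frequencies.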

**Figure 9.** Surface reconstruction errors (in meters) on the synthetically generated data. From **left** to **right**: sparse input data, ground truth, IDW interpolation, SparseCNN, WASSfast CNN. Each row shows a different scenario with increasing sampling density.

**Figure 10.** Time series comparison between WASS, WASSfast PU and WASSfast CNN on the three sequences at the Gageocho ORS. Pearson’s correlation between each WASSfast mode and standard WASS is reported in the legends.

**Figure 11.** Frequency-spectra comparison between WASS, WASSfast PU, and WASSfast CNN. **Bottom-right**: the reconstructed area (red polygon) with the grid point used to extract the elevation time series.

**Figure 12.** Directional spectra sliced from the 3D spectrum $\mathcal{S}(K_x, K_y, \omega)$ at $\omega = 0.3, 0.4, 0.5$ Hz for record G20200916T01000. **Top row**: WASSfast PU mode; **bottom row**: WASSfast CNN mode.
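A directional slice like those in Figure 12 can be obtained from the 3D FFT of a stack of gridded surfaces. The sketch below is a simplified estimator with no tapering, segment averaging, or density normalization, and the function names are hypothetical. It is not necessarily how the spectra in the figure were computed.

```python
import numpy as np

def spectrum_3d(surfaces, cell, fps):
    """Power of the 3D FFT of a (T, H, W) stack of gridded surfaces.

    Returns (power, f, ky, kx) with f in Hz and ky, kx in rad/m, all shifted so
    that zero sits in the middle. No tapering, averaging, or density
    normalization is applied (simplifying assumption).
    """
    t, h, w = surfaces.shape
    power = np.abs(np.fft.fftshift(np.fft.fftn(surfaces - surfaces.mean()))) ** 2
    f = np.fft.fftshift(np.fft.fftfreq(t, d=1.0 / fps))
    ky = np.fft.fftshift(2.0 * np.pi * np.fft.fftfreq(h, d=cell))
    kx = np.fft.fftshift(2.0 * np.pi * np.fft.fftfreq(w, d=cell))
    return power, f, ky, kx

def slice_at(power, f, freq_hz):
    """(ky, kx) slice at the frequency bin closest to freq_hz.

    With numpy's sign convention, a wave travelling toward +x shows up at -kx
    in the positive-frequency slices.
    """
    return power[np.argmin(np.abs(f - freq_hz))]
```

In practice, a taper and averaging over several sub-records would be needed to reduce leakage and variance before comparing the two modes.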

**Figure 13.** Qualitative comparison of the surface grids reconstructed by WASS (**top**), WASSfast PU (**middle**), and WASSfast CNN (**bottom**) for record G201810061400.

Parameter | Value |
---|---|
Grid size | $256\times 256$ |
Grid cell size (m) | $0.46$ |
Frame rate (Hz) | 7 |
Significant wave height Hm0 (m) | Random uniform in range $[5.0\dots 8.0]$ |
Primary peak period Tp (s) | Random uniform in range $[7.2\dots 8.8]$ |
Spreading parameter Sp (deg) | Random uniform in range $[15.0\dots 22.0]$ |
Wave direction ${\theta}_{0}$ (rad) | Random uniform in range $[0\dots 2\pi ]$ |
Number of frames | 700 |

**Table 2.** The three stereo sequences used to compare the two WASSfast reconstruction modes against the original WASS pipeline.

Record Name | Time | Location | Rate | Length |
---|---|---|---|---|
G201810061000 | 6 October 2018, 10:00 UTC+9 | Gageocho ORS | $7.5$ Hz | 2000 frames |
G201810061400 | 6 October 2018, 14:00 UTC+9 | Gageocho ORS | $7.5$ Hz | 2000 frames |
G201810061700 | 6 October 2018, 17:00 UTC+9 | Gageocho ORS | $7.5$ Hz | 2000 frames |

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Pistellato, M.; Bergamasco, F.; Torsello, A.; Barbariol, F.; Yoo, J.; Jeong, J.-Y.; Benetazzo, A.
A Physics-Driven CNN Model for Real-Time Sea Waves 3D Reconstruction. *Remote Sens.* **2021**, *13*, 3780.
https://doi.org/10.3390/rs13183780

**Chicago/Turabian Style**

Pistellato, Mara, Filippo Bergamasco, Andrea Torsello, Francesco Barbariol, Jeseon Yoo, Jin-Yong Jeong, and Alvise Benetazzo.
2021. "A Physics-Driven CNN Model for Real-Time Sea Waves 3D Reconstruction" *Remote Sensing* 13, no. 18: 3780.
https://doi.org/10.3390/rs13183780