Research on a Measurement Method for the Ocean Wave Field Based on Stereo Vision



Introduction
Ocean waves represent a classic and important research direction in the field of physical oceanography [1]. Wave parameters are also important environmental input conditions [2]. The directional spreading of ocean waves plays an important role in various aspects of ocean engineering, such as wave-induced loads, nonlinear wave evolution, and wave breaking [3]. Moreover, consistent sea state time series are essential for building coastal protection or offshore structures [4]. Current wave observation methods can be divided into two categories [5]. The first is the "point" measurement method, represented by wave buoys, which requires a fixed mooring line and is troublesome to deploy and maintain [6]. The second is the "surface" measurement method, represented by synthetic aperture radar, which obtains wave data depending on the selection of inversion parameters and therefore has a low measurement accuracy. Owing to these problems in the existing wave observation methods, research on non-contact, mobile, and high-precision wave field measurement technology has become an important task for many countries [7].
Stereoscopic wave measurement is a technology that uses two photographic cameras to synchronously collect images of the sea surface. It can directly extract the three-dimensional spatiotemporal distribution of sea surface fluctuations through the principles of image matching and lens imaging, and from this it obtains the directional spectrum [8]. This technology can effectively make up for the deficiencies of the buoy and remote sensing observation methods. Therefore, the development of stereo vision wave measurement technology has important scientific significance and practical application value for the theoretical development of ocean waves and their application in ocean remote sensing and ocean engineering [9].
The direct linear transformation (DLT) theory was proposed by Abdel-Aziz in 1971; with it, several or even ten days are needed for each pair of images to be completely analyzed, which greatly lengthens the post-processing time [10]. In 2009, Vries et al. developed a stereo-photographic ocean wave measurement system and performed an experiment on the coast of Scheveningen [11]. They successfully measured the evolution of waves, and the measured sea surface area reached 1800 m² [12]. Furthermore, in 2011, Wanek developed a fully automated trinocular photography ocean wave observation system (ATSIS) [13]. Different from other photography systems, ATSIS uses three synchronized cameras arranged in a triangular configuration to form an image acquisition system, and it simulates the measurement scene through the previously recorded orientation of the camera frame in the laboratory in order to calibrate the external orientation elements. The ATSIS calibration process is cumbersome and the measurement area is limited by the camera frame; indeed, the ATSIS measurement area is only about 10 m² [14]. In 2010, Cai Zhenghan used three synchronous cameras, also in a triangular arrangement, to form an image acquisition system [15] and then obtained the coordinates of the control points by means of underwater markers. It was necessary to set the calibration points in the current pool and the overall operation was complex [16]. In 2011, Yu Heng of Henan University captured target images of the Yellow River model water flow vertically through a binocular system. By combining an improved Canny edge detection algorithm with a morphological connected-domain segmentation method, he extracted and identified the potential river width and the flow velocity of the Yellow River model [17]. In 2012, Jiang Wenzheng, through his digital photogrammetry wave measurement system, reported a good study on the feature matching of left and right images [18]. Moravec's corner detector is a corner detection operator based on gray variance that extracts the points with the maximum and minimum gray variance in four main directions as the feature points [19]. First, the Moravec corner points were extracted from the top layer of a pyramidal image of the left and right images. Next, the corner points extracted from the top-level images of the pyramid were matched based on the principle of a maximum cross-correlation measure, thus obtaining the top-level matching pair set. Each matching pair in this set was then refined in the corresponding lower-level area of the pyramid, again by matching based on a maximum cross-correlation measure. In this way, the matching sets of the left and right images were obtained. This feature matching method provides matching pairs as coordinates for the three-dimensional reconstruction of waves and thereby realizes sea surface wave measurement. However, the matched results obtained via this method contain many mismatches. Accordingly, complex algorithms need to be introduced in the future in order to eliminate the mismatches before accurate wave heights can be obtained in the three-dimensional reconstruction [20]. In 2014, Cui Hao et al. achieved the acquisition of wave crest coordinates based on contour extraction technology, which reduced the influence of the acquisition accuracy of the two-dimensional wave crest coordinates in non-contact wave photogrammetry, but only the wave crest was studied [21]. This paper presents a comprehensive wave measurement processing algorithm based on stereo vision.
(1) The matching part utilizes the gridding siftGPU method, wherein the common area of the left and right cameras is first determined and grid zoning of the common area is then carried out. The grid coordinates of the left camera are used to preliminarily locate the corresponding grid coordinates of the right camera, and GPU multithreading is used in order to realize fast matching of a large number of feature points. Compared with the conventional siftGPU, the matching time of the proposed approach can be controlled below 6 s while the number of matching points can be maintained above 20,000, an outcome that is evidently much better than that of the traditional siftGPU algorithm. (2) In this paper, the least squares method and wave theory are used to realize the sea surface base plane calibration, plane error point fitting, and wave parameter inversion. The designed method is simple and feasible. Compared with a traditional wave height meter, over a wave height range of 0.2 m-1 m, the measurement error is less than 10% and the period measurement error is less than 0.5 s. (3) The Hough transform and least squares method have been used herein to calculate the wave direction value. It was observed that, for a given wave image, the detection results and the original direction are basically the same.
The rest of this paper is organized as follows. Section 2 introduces the whole processing algorithm of wave measurement based on stereo vision. Section 3 presents the experimental tests and results. Finally, the conclusions and directions for future work are given in Sections 4 and 5.

Stereo Calibration
The purpose of calibration is to obtain a mapping model between a target in three-dimensional coordinate space and the two-dimensional image [22]. Essentially, the internal parameters, external parameters, and distortion coefficients obtained through this calibration are the basis of the image correction, parameter inversion, and three-dimensional reconstruction. This paper follows the method proposed by Zhang Zhengyou [23] in order to achieve binocular calibration. The specific steps are as follows: (1) Construct a checkerboard calibration board with black and white grids, wherein the side length of each grid is 10 cm and the grid pattern is 9 × 7. (2) Use the binocular camera to take 25 images of the calibration board from different angles. (3) Use the algorithm to identify the pixel coordinates of the corner points (i.e., the black and white grid intersections) in the images for matching. (4) Calculate the inversion in order to obtain the internal parameters, external parameters, and lens distortion coefficients of the binocular camera.
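The per-view homography estimation at the heart of Zhang's planar calibration can be sketched as follows. This is a minimal numpy illustration (plain DLT, with Hartley normalization omitted for brevity), not the calibration code used in the paper; the synthetic homography `true_H` and the 9 × 7 board of 10 cm squares are illustrative assumptions:

```python
import numpy as np

def estimate_homography(obj_pts, img_pts):
    """DLT estimate of the homography mapping planar board points (X, Y)
    to pixel coordinates (u, v) -- computed once per calibration image
    in Zhang's method, before solving for the camera intrinsics."""
    A = []
    for (X, Y), (u, v) in zip(obj_pts, img_pts):
        A.append([-X, -Y, -1, 0, 0, 0, u * X, u * Y, u])
        A.append([0, 0, 0, -X, -Y, -1, v * X, v * Y, v])
    # the homography is the null vector of A, i.e. the last right singular vector
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

# synthetic check: project a 9 x 7 grid of 10 cm squares through a known H
true_H = np.array([[800., 20., 3140.], [5., 820., 1580.], [0.001, 0.002, 1.]])
obj = np.array([[i * 0.10, j * 0.10] for i in range(9) for j in range(7)])
homo = np.hstack([obj, np.ones((len(obj), 1))]) @ true_H.T
img = homo[:, :2] / homo[:, 2:]
H = estimate_homography(obj, img)
print(np.allclose(H, true_H, atol=1e-3))  # → True
```

In the full method, the homographies of all 25 views constrain the internal parameter matrix, after which the external parameters and distortion coefficients are refined.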

Scale-Invariant Feature Transform (SIFT) Feature Extraction and Matching
The first part of the SIFT process is the production of a Gaussian scale space [24], in which each octave is computed by means of the convolution between a down-sampled image and different Gaussian kernels, giving L(x, y, σ) in Equation (1), where G(x, y, σ) is a variable-scale Gaussian kernel and I(x, y) is an input image:

L(x, y, σ) = G(x, y, σ) * I(x, y).  (1)

The second part constructs a difference of Gaussians (DoG) scale space [25]. The DoG is a close approximation of the Laplacian of Gaussian (LoG), as expressed in Equation (2):

D(x, y, σ) = L(x, y, kσ) − L(x, y, σ).  (2)

Compared with the LoG, the DoG has lower computational costs and extracts approximately the same stable features. The third part of SIFT contains the pixel and sub-pixel localization of a keypoint [26]. Each pixel in the DoG scale space is compared with its 26 neighboring pixels, and an extreme pixel is chosen as a candidate keypoint. Then, the localization of the sub-pixel keypoint is implemented using a quadratic approximation, which is computed using the second-order Taylor expansion.
The fourth part of the SIFT algorithm uses 128-dimensional descriptors [27] to conduct the feature matching process. The SIFT feature descriptors represent the image gradient magnitudes and orientations in a 4 × 4 sub-region around the candidate feature.
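Equations (1) and (2) and the 26-neighbor extremum test can be illustrated with a minimal numpy sketch; the separable blur stands in for G(x, y, σ) * I(x, y), and the blob image and scale values below are illustrative, not the paper's settings:

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian convolution G(x, y, sigma) * I(x, y) -- Eq. (1)."""
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2)); k /= k.sum()
    p = np.pad(img, r, mode="reflect")
    tmp = np.apply_along_axis(lambda row: np.convolve(row, k, "valid"), 1, p)
    return np.apply_along_axis(lambda col: np.convolve(col, k, "valid"), 0, tmp)

def dog_stack(img, sigmas):
    """Difference of Gaussians D = L(x, y, k*sigma) - L(x, y, sigma) -- Eq. (2)."""
    L = np.stack([gaussian_blur(img, s) for s in sigmas])
    return L[1:] - L[:-1]

def is_candidate(D, s, i, j):
    """A keypoint candidate is an extremum among its 26 neighbours in the stack."""
    patch = D[s - 1:s + 2, i - 1:i + 2, j - 1:j + 2]
    return D[s, i, j] in (patch.max(), patch.min())

# a single Gaussian blob: the DoG response is most negative at its centre
y, x = np.mgrid[0:32, 0:32]
img = np.exp(-((x - 16.0)**2 + (y - 16.0)**2) / (2 * 1.85**2))
D = dog_stack(img, [1.2, 1.6, 2.1, 2.8])
print(tuple(int(i) for i in np.unravel_index(D[1].argmin(), D[1].shape)))  # → (16, 16)
```

The sub-pixel refinement via the second-order Taylor expansion is omitted here for brevity.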
Unfortunately, this strategy requires high computational power and memory consumption, which limits the use of SIFT in real-time engineering applications [28]. Notably, the above steps of the original SIFT can be run on a GPU in a parallel fashion. It has been shown that the performance of SiftGPU is nearly real-time and that the quality of the features extracted by SiftGPU is approximately the same as that of the features extracted by the original SIFT. However, since GPU memory is limited, large images need to be down-sampled for size reduction before applying SIFT [29].
Due to these limitations of the GPU and of SIFT, a gridding siftGPU method is proposed in this work in order to extract and match features from large images more efficiently.

Gridding SiftGPU Feature Extraction and Matching
First, we used several pairs of wave images to perform image matching through traditional SIFT, determined the common area of the left and right images, and obtained the left image edge point (u_lmin, v_lmin) and the right image edge point (u_rmax, v_rmax). The common area of the left image extends from (u_lmin, v_lmin) to the image boundary, while that of the right image extends from (0, 0) to (u_rmax, v_rmax); the non-overlapping areas are cropped. Subsequently, we divided the common area of the left and right images into matching blocks, with the specific block size selected according to the actual resolution of the image. It was assumed that the pixel extent of a block in the left image was the same as that of the corresponding matching block in the right image. At the same time, the scene areas corresponding to the same matching block are not equal in size and the pixel coordinates do not have a simple one-to-one correspondence. The calculation method is illustrated in Figure 1. Considering the camera distortion [30], the corrected image coordinates satisfy

x_il = (u_il − x_0l)(1 + k r²),  y_il = (v_il − y_0l)(1 + k r²),

where (u_il, v_il) are the pixel coordinates of the left image, (x_0l, y_0l) are the center coordinates of the left image, k is the distortion parameter of the left and right cameras, and r is the distance between the pixel coordinates and the center coordinates, i.e., r² = (u_il − x_0l)² + (v_il − y_0l)². As shown in Figure 2, when it is assumed that the vector of OA is r = (x_il, y_il, f) and that o is the vector of OO', the normal vector of the plane spanned by them is n = r × o. The normal vector n expressed in the right camera coordinate system follows from R, the rotation matrix of the left camera, and N, the matrix corresponding to the normal n. Therefore, the right camera pixel coordinates can be obtained, where f is the focal length of the camera and (u_ir, v_ir) are the coordinates in the right camera.
Considering the distortion of the right camera, the same correction applies, where k is the distortion parameter of the right camera, r is the distance between the pixel coordinates and the center coordinates, and (x_0r, y_0r) are the center coordinates of the right image. In this way, the pixel coordinates of the left and right matching grids were obtained and the positions of the corresponding left and right windows were confirmed.
After determining the corresponding matching blocks in the left and right images, we then performed multi-threaded siftGPU accelerated matching on the matching grids in order to achieve fast and accurate matching.
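The gridding step can be sketched as follows, assuming the common-area geometry reported in the results section (left origin (580, 158), a 5700 × 3000 pixel common area, and 300 × 300 blocks); the per-block matcher here is a plain CPU nearest-neighbor ratio test standing in for the GPU implementation:

```python
import numpy as np

def grid_blocks(origin, size, block=300):
    """Split a common area (origin and size in pixels) into a dict mapping
    block index -> pixel window (x, y, w, h), mirroring the gridding step."""
    x0, y0 = origin; w, h = size
    blocks = {}
    for bi in range(int(np.ceil(w / block))):
        for bj in range(int(np.ceil(h / block))):
            blocks[(bi, bj)] = (x0 + bi * block, y0 + bj * block,
                                min(block, w - bi * block), min(block, h - bj * block))
    return blocks

def match_block(desc_l, desc_r, ratio=0.8):
    """Nearest-neighbour descriptor matching with Lowe's ratio test, run
    independently inside each pair of corresponding blocks (needs >= 2
    descriptors on the right side)."""
    matches = []
    for i, d in enumerate(desc_l):
        dist = np.linalg.norm(desc_r - d, axis=1)
        a, b = np.argsort(dist)[:2]
        if dist[a] < ratio * dist[b]:
            matches.append((i, a))
    return matches

# left common area starts at (580, 158); right one at (0, 0): both 5700 x 3000
left = grid_blocks((580, 158), (5700, 3000))
right = grid_blocks((0, 0), (5700, 3000))
print(len(left))  # → 190
```

Because each block pair is independent, the per-block matching is trivially parallel, which is what the GPU multithreading in the proposed method exploits.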

Sea Surface Base Plane Calibration
We then converted the image coordinates to camera coordinates; the specific method for this conversion is described as follows. Assume that the coordinates of the wave target point in the left camera coordinate system are (x_w, y_w, z_w) and that the coordinates of the corresponding image points in the left and right images are (u_l, v_l) and (u_r, v_r). According to the conversion relationship, the formula for converting the left and right image points to the left camera coordinate system can be written as the system M X = U, where A_l and A_r are the internal parameter matrices of the left and right cameras, respectively; R and T are the rotation and translation matrices of the binocular rig, which can be obtained by calibration and from which H_l and H_r follow; and M and U can be obtained from H_l, H_r, and the pixel coordinates. The camera coordinates of the wave space point (x_w, y_w, z_w) can then be obtained by solving this system. Hereafter, we converted the camera coordinates to the object coordinates; the specific method is as follows. Select the matching calm-sea images. The sea surface base plane equation is given as z = a x + b y + c, where a and b are the components of the normal vector in the X and Y directions and c is the average distance from the camera to the sea surface base plane.
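The least squares solution of the stacked projection equations (the M X = U system above) can be sketched as follows; the camera matrices and the test point are synthetic, and `triangulate` is an illustrative reimplementation rather than the paper's code:

```python
import numpy as np

def triangulate(H_l, H_r, uv_l, uv_r):
    """Linear least-squares triangulation: stack the projection equations of
    both cameras (the M X = U system) and solve for (xw, yw, zw)."""
    rows, rhs = [], []
    for H, (u, v) in ((H_l, uv_l), (H_r, uv_r)):
        # u = (H[0]·X + h14) / (H[2]·X + h34)  ->  (H[0] - u H[2])·X = u h34 - h14
        for coord, r in ((u, 0), (v, 1)):
            rows.append(H[r, :3] - coord * H[2, :3])
            rhs.append(coord * H[2, 3] - H[r, 3])
    X, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return X

# synthetic stereo rig: left camera at the origin, right camera 0.5 m to the right
K = np.array([[800., 0., 640.], [0., 800., 512.], [0., 0., 1.]])
H_l = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
H_r = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.], [0.]])])
P = np.array([0.3, -0.2, 5.0])                      # a wave point, in metres
proj = lambda H, X: (H @ np.append(X, 1.0))[:2] / (H @ np.append(X, 1.0))[2]
X = triangulate(H_l, H_r, proj(H_l, P), proj(H_r, P))
print(np.allclose(X, P))  # → True
```

With four equations in three unknowns, the least squares solve also absorbs small pixel-level inconsistencies between the two views.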
Through the transformation of the coordinate system, the coordinates of the feature points in the camera coordinate system are obtained as (x_i, y_i, z_i)ᵀ. The parameters (a, b, c) of the sea surface base plane can then be calculated using the least squares method. Normalizing the plane normal vector obtained from each image results in n = (a, b, −1)ᵀ / √(a² + b² + 1), where n is the unit normal vector of the sea surface base plane. A spherical coordinate transformation is performed on the normal vector, which is convenient for calculating the relationship between the image coordinate system and the object coordinate system.
In these formulas, θ and φ correspond to the spherical coordinates of the sea surface base plane normal.
Then, the rotation matrix for the conversion from the camera coordinate system to the object coordinate system is expressed as:

R_w = [ −sin φ, cos φ, 0;  cos θ cos φ, cos θ sin φ, −sin θ;  sin θ cos φ, sin θ sin φ, cos θ ]

Subsequently, based on the sea surface base plane equation, the conversion from the camera coordinate system (x_w, y_w, z_w) to the object coordinate system (x_c, y_c, z_c) is obtained.
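The base plane fit, normal normalization, and spherical angles can be sketched with numpy; the plane coefficients, noise level, and angle conventions below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def fit_base_plane(pts):
    """Least-squares fit of z = a x + b y + c to calm-sea feature points,
    returning the plane parameters, the unit normal of the base plane,
    and its spherical angles (theta, phi)."""
    A = np.column_stack([pts[:, 0], pts[:, 1], np.ones(len(pts))])
    (a, b, c), *_ = np.linalg.lstsq(A, pts[:, 2], rcond=None)
    n = np.array([a, b, -1.0]); n /= np.linalg.norm(n)
    theta = np.arccos(abs(n[2]))          # tilt of the normal from the camera z-axis
    phi = np.arctan2(n[1], n[0])          # azimuth of the tilt
    return (a, b, c), n, theta, phi

# noisy synthetic calm-sea points on the plane z = 0.02 x - 0.01 y + 10
rng = np.random.default_rng(0)
xy = rng.uniform(-5, 5, size=(20, 2))
z = 0.02 * xy[:, 0] - 0.01 * xy[:, 1] + 10 + rng.normal(0, 1e-4, 20)
(a, b, c), n, theta, phi = fit_base_plane(np.column_stack([xy, z]))
print(round(c, 2))  # → 10.0
```

Twenty points, as in the paper's base plane calibration, comfortably over-determine the three plane parameters, so small detection errors average out in the least squares solve.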

Wave Height and Period
The center point of the matching block was selected as the wave inversion point. In order to ensure the accuracy of the calculation, the center points of four matching blocks were selected for calculating the wave height, and the changes in the wave data over 10 min were recorded. Next, the spectrum was obtained by means of the Fourier transform and the main frequency range was selected. Following this, we selected the frequency band of 0-0.5 Hz, filtered out the high-frequency part, and performed an inverse Fourier transform in order to obtain the time-domain curve corresponding to the main frequency band. Then, we used the zero-crossing method to calculate the wave height and period.
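The pipeline above (FFT low-pass to the 0-0.5 Hz band, then zero-up-crossing statistics) can be sketched as follows; the synthetic 0.25 Hz record stands in for a measured surface elevation series and its parameters are illustrative:

```python
import numpy as np

def wave_height_period(eta, fs=2.0, fmax=0.5):
    """Low-pass the surface elevation record below fmax via FFT, then use
    the zero-up-crossing method to get mean wave height and period."""
    f = np.fft.rfftfreq(len(eta), d=1.0 / fs)
    spec = np.fft.rfft(eta - eta.mean())
    spec[f > fmax] = 0.0                               # keep the 0-0.5 Hz band
    x = np.fft.irfft(spec, n=len(eta))
    up = np.flatnonzero((x[:-1] < 0) & (x[1:] >= 0))   # zero up-crossings
    heights = [x[i:j].max() - x[i:j].min() for i, j in zip(up[:-1], up[1:])]
    periods = np.diff(up) / fs
    return float(np.mean(heights)), float(np.mean(periods))

# a 0.25 Hz, 0.5 m amplitude wave sampled at 2 Hz for 10 minutes, plus noise
rng = np.random.default_rng(1)
t = np.arange(0, 600, 0.5)
eta = 0.5 * np.sin(2 * np.pi * 0.25 * t) + 0.02 * rng.normal(size=t.size)
H, T = wave_height_period(eta)
print(round(T, 1))  # → 4.0
```

The 2 Hz sampling frequency and 10 min record length match the offshore test settings described later; each wave is taken between consecutive up-crossings, as in the standard zero-crossing definition.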

Wave Direction
The wave direction calculation method employed in this paper is as follows. The Hough transform is essentially a feature extraction method; in this work, it was performed on the left image. The specific steps are as follows:
1. Use a two-dimensional Gaussian function to detect the extreme points and obtain the coordinates of those extreme points (u_i, v_i).
2. Convert the planar coordinates (u_i, v_i) to polar coordinates (ρ, θ) by using ρ = u_i cos θ + v_i sin θ.
3. Calculate the maximum value with a two-dimensional accumulator A(ρ, θ). Since the camera resolution is M_1 × M_2, the parameter coordinate range is −√(M_1² + M_2²) ≤ ρ ≤ √(M_1² + M_2²) and 0 ≤ θ < π. The maximum value of the accumulator in the parameter coordinate system is (ρ̂, θ̂). Record all of the pixel coordinates on the corresponding line and carry out least squares fitting on them, so as to calculate the normal vector of the main peak line.
4. The normal vector of each straight line corresponds to two angles; thus, it is necessary to determine the direction of the wave motion. In this paper, true north and counterclockwise rotation are taken as the positive direction, and the wave movement direction is determined by judging the change in the intercept of the wave crest line calculated in subsequent frames.
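Steps 2 and 3 can be sketched with a minimal numpy Hough accumulator; the bin counts, image size, and the synthetic crest line below are illustrative assumptions:

```python
import numpy as np

def hough_peak(points, shape, n_rho=400, n_theta=180):
    """Accumulate rho = u cos(theta) + v sin(theta) for the detected crest
    points and return the (rho, theta) cell with the most votes."""
    diag = np.hypot(*shape)                      # |rho| is bounded by the diagonal
    thetas = np.linspace(0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((n_rho, n_theta), int)
    for u, v in points:
        rho = u * np.cos(thetas) + v * np.sin(thetas)
        idx = np.round((rho + diag) / (2 * diag) * (n_rho - 1)).astype(int)
        acc[idx, np.arange(n_theta)] += 1        # one vote per theta bin
    r, t = np.unravel_index(acc.argmax(), acc.shape)
    return r / (n_rho - 1) * 2 * diag - diag, thetas[t]

# crest points on the line v = u + 10, whose normal direction is theta = 135 deg
pts = [(u, u + 10) for u in range(0, 100, 2)]
rho, theta = hough_peak(pts, (128, 128))
print(round(np.degrees(theta)))  # → 135
```

The recovered θ̂ gives the normal of the dominant crest line; in the full method this is refined by least squares fitting of the pixels voting for the winning cell, and the two-fold angle ambiguity is resolved from the crest-line intercept in subsequent frames.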

Laboratory Verification
The relevant experiment was carried out in the China Shipbuilding Research Center. The experiment used two cameras with a resolution of 6280 × 3158 in order to obtain the image pairs (shown in Figure 3). The images were shot on 17 January 2022. Moreover, a calibration box was used to check the measurement accuracy of the system algorithm; the specific parameters are as follows:

Calibration Results
The calibration results for the internal and external parameters of the camera are given in Table 1. In the matching experiment, the common-area edge points were u_rmax = 5700 and v_rmax = 3000. Meanwhile, the gridding matching block size was 300 × 300 and the number of matching blocks was 190. Here, we chose the point (580, 158) in the left image and (0, 0) in the right image for verification; the matching results are provided in Figure 3. Following this, siftGPU and gridding siftGPU were used to match image 1 and image 2, and the matching results are shown in Figures 4-7. It can be seen from Figures 4-7 that the number of feature points matched by the gridding siftGPU algorithm is obviously higher than that matched by the siftGPU algorithm. The calculation results are shown in Table 2. By comparison, the matching time of the gridding siftGPU was 1/6 of that of the siftGPU, whereas the match quantity of the gridding siftGPU was 8 times that of the siftGPU.

Calibration Results for the Base Plane
Twenty feature points were selected in image 2 in order to calculate the base plane. The selected area is the ground feature in image 2, which is shown in Figure 8. The code numbers and coordinates of the selected feature points are shown in Table 3. Next, 14 points were selected in order to verify the measurement accuracy of the system. The obtained measurement results were compared with a tape measure (with a minimum scale of 1 mm) and the error was basically controlled within 1%. The results are shown in Tables 4 and 5.

Offshore Test Verification
The sea test was performed on the Qingdao Blue Valley No. 1 fixed platform on 19 January 2022. The images were captured simultaneously by two QHY26M industrial cameras, each with a Nikon 50 mm f/1.4D fixed-focus lens. The resolution of the QHY26M industrial camera was 6280 × 3158, the pixel size was 3.76 μm, and the height of the camera above the sea surface was about 10 m (relative to the tide level). The exposure time was set to 2 ms, the sampling frequency was 2 Hz, and the test position was the hatch of the platform. The test interval was between 14:00 and 16:00 and, in each observation, the data were collected for 10 min. The results of this work are compared with the data of an RBR wave height meter for verification. Furthermore, the test parameters are consistent with the laboratory verification parameters and the test pictures are shown in Figure 10. As shown in Figure 11, in the same time interval, the time-domain variation curve of the wave height measured at grid point (0, 0) with the proposed stereo vision wave measuring system is consistent with the time-domain variation trend of the wave height measured by the RBR wave height meter. Moreover, the measurement results at grid point (−320, 0) are consistent with those at grid point (0, 0) in the same time interval.
The calculation results are shown in Table 6. As shown in Table 7, relative to the results of the RBR wave height meter, the wave height error based on binocular stereo vision is less than 10% and the period error is less than 0.5 s. The wave direction diagrams are shown in Figure 12. As shown in Figure 12, the wave direction derived from the normal vector was basically consistent with the actual movement direction of the wave. The wave direction measurement error is ±10 degrees and the resolution is 1 degree. These findings imply that the measurement results are accurate. The calculation results are shown in Table 8. As is evident from Figure 13, the proposed method can effectively realize the three-dimensional reconstruction of an ocean wave field.
The wave height value of the 3D inversion is basically the same as the measurement result of the RBR wave height meter.

Discussion
Based on the results that were achieved by utilizing stereo vision in wave height measurement, the following conclusions have been drawn: (1) Relying on the actual sea trial measurement results, it can be confirmed that the stereo vision wave measurement method enables the accurate measurement of wave parameters (wave height, period, and wave direction) [30]. This is because stereo vision technology uses the parallax formula to calculate the distance to the measured object, while the height of the object can be measured through the corresponding coordinate conversion. Through the calculation of the elevation values of a fixed area, the wave height and the period of the sea wave can be analyzed. Furthermore, the wave direction can be detected using the normal vector of the wave crest line in the image; therefore, it is necessary to recognize and fit the line features. (2) In order to realize real-time operation of the whole processing algorithm, a rich set of wave field features, and accurate measurement results, it is necessary to use a high-resolution camera for the measurement. However, this results in an excessively long matching time and very high computer memory costs. In this regard, the gridding siftGPU algorithm proposed in this paper grids the common area of the left and right images without down-sampling, and the feature pairs are matched using GPU multithreading, hence the matching is faster.
At present, radar is used for onboard wave measurements; however, radar measurement is expensive and has a low level of accuracy [31]. The fixed-point measurement methods mainly include the ADCP and the wave height meter, both of which have high requirements regarding the water depth and are also troublesome to deploy and recover. The ADCP needs to be installed on the seabed, while the wave height meter requires mooring line fixation [32]. In addition, both of these methods can only realize single-point measurement, and the measurement results will degrade due to the process of wave breaking under strong wave conditions. In contrast, stereo vision wave measurement technology can realize non-contact and high-precision measurement. The main limitation of the approach presented in this paper is its high requirements on computer memory and the GPU. Besides this, the internal and external parameters of the camera need to be calibrated manually before the measurement is undertaken, and the calibration parameters need to be kept unchanged throughout the actual application.
Binocular stereo wave measurement technology is a promising field; nevertheless, there are still some problems that need to be solved in order to fully enable this technology for future engineering applications [33]. These include the system's motion compensation and its applicability under complex lighting conditions (rainy and foggy days). The pre-calibration part can form a standard library of the internal and external parameters of the camera, which can be called directly in practical applications, so as to adapt to different measurement distances. Under complex lighting conditions, the illumination adaptability of the algorithm needs to be further studied. The relationship between the illumination intensity and the best aperture and focal length can be trained by deep learning in order to realize all-weather automatic focusing [34]. The system also needs to consider motion compensation algorithms and hardware for the motion of the platform. Taking a ship as an example, roll and surge will not affect the measurement accuracy of the system, while the heave direction needs to be compensated for by integrating the results of acceleration measurements, together with roll and pitch. For yaw, a three-degree-of-freedom motion compensation platform needs to be developed for hardware compensation. Furthermore, with the development of intelligent ships and intelligent marine engineering, vision-based marine measurement technology will become more and more important.

Conclusions
This paper presents a comprehensive processing algorithm for wave measurement that is based on stereo vision.
(1) Compared with the conventional siftGPU, based on 6280 × 4210 image processing, the matching time can be controlled below 6 s and the number of matching points can be kept above 20,000, indicating much better performance than that of the traditional siftGPU algorithm. (2) In this paper, the least squares method and wave theory have been used to realize the sea surface base plane calibration, plane error point fitting, and wave parameter inversion. This design method is simple and feasible. Compared with the RBR wave height measurement, over a wave height range of 0.2 m-1 m, the measurement error is less than 10% and the period measurement error is less than 0.5 s. (3) In this paper, the Hough transform and least squares method were used to calculate the wave direction value. Because the RBR wave height meter does not have a wave direction measurement function, the detection results were compared with the original wave images; the detected and original directions are basically the same.

Figure 1. Gridding image and identification of the corresponding block.

Figure 2. Schematic diagram of the principle of image matching.

Figure 10. Illustration: (a) stereo vision ocean wave observation system; (b) RBR wave height meter.
Figure 11. The size of the field was X_min = −1500 mm, X_max = 1000 mm, Y_min = −1000 mm.

Figure 12. Illustration: (a) wave direction diagram of the first image frame; (b) wave direction diagram of the second image frame; (c) wave direction diagram of the third image frame; (d) wave direction diagram of the thirteenth image frame.

Figure 13. Illustration: (a) wave field 3D reconstruction of the first frame image; (b) wave field 3D reconstruction of the second frame image; (c) wave field 3D reconstruction of the third frame image; (d) wave field 3D reconstruction of the fourth frame image.

Table 2. Matching time and feature point numbers.

Table 3. Sea surface base plane calibration feature point coordinates.

Table 4. Measurement accuracy verification of calibration box image 1.

Table 5. Measurement accuracy verification of calibration box image 2.

Table 7. Data comparison for different measured coordinates.

Table 8. Measurement accuracy verification of wave direction.
A 3D reconstruction of the wave field is shown in Figure 13.