A Stereo Matching Method for 3D Image Measurement of Long-Distance Sea Surface

Tsunamis are among the most destructive natural disasters. Some proposed tsunami measurement and arrival prediction systems use a limited number of instruments to judge whether a tsunami has occurred and to forecast its arrival time, location and scale. Because the number of measurement instruments is limited, large prediction errors may occur. To address this problem, a long-distance tsunami measurement system based on the binocular stereo vision principle is proposed in this paper. The measuring range is 4–20 km from the system deployment site. This paper focuses on the stereo matching method for the proposed system. We propose a two-step matching method: it first performs fast sparse matching, and then completes high-precision dense matching based on the results of the sparse matching. A matching descriptor based on the physical features of sea waves is proposed to overcome the matching difficulty caused by the similarity of sea surface image textures. The relationship between disparity and the y coordinate is established to reduce the matching search range. Experiments were conducted on sea surface images with different shooting times and distances; the results verify the effectiveness of the presented method.


Introduction
The tsunami warning systems operated by the Tsunami Warning Center in the United States rely on teleseismic measurements [1,2]. Limited by the number of measurement instruments, the monitoring scope is restricted to the coast adjacent to the earthquake. The submarine cable system in Japan inputs the earthquake magnitude and hypocenter into pre-built simulation models to issue a tsunami warning [3,4]; missing inputs result in poor warning accuracy. The Great March 11th Earthquake in Japan provides a grim example of this: three minutes after the earthquake, the earliest tsunami warning predicted that the sea level height on shore in Iwate Prefecture would reach 3 m, but post-tsunami surveys showed that the average wave height was up to 11.8 m [5]. The German-Indonesian Tsunami Early Warning System takes an "end-to-end" approach to cover the complete tsunami warning chain; its working mechanism is similar to those of the submarine cable system and the tsunami warning systems in India and Australia [6-8].
Other methods include the rapid determination of sea level variations caused by tsunamis and tsunami parameter estimation from Global Navigation Satellite Systems, and tsunami detection and forecasting by radar on unconventional airborne craft [9-11]. Their sparseness and cost limit their capabilities. In this paper, a much smaller-scale and more flexible tsunami stereo measurement system is proposed; it scans the sea surface to increase measurement coverage and aims to measure sea level height in real time within its coverage area.
With advances in the field of computer vision, some scholars have turned towards reconstructing the 3D geometry of sea waves [12-15]. Stereo-based wave research became more common after a partially supervised 3D stereo system called the Wave Acquisition Stereo System (WASS) was proposed [16]. A novel video observational system relying on variational stereo techniques to reconstruct the 3D wave surface was later developed [17,18]. Other seminal reconstruction methods adopted local methods to compute the disparity map [13,16,19]. In 2017, Bergamasco et al. proposed an open-source pipeline for the 3D stereo reconstruction of ocean waves [20], specifically describing all the steps required to estimate dense point clouds from stereo images; their system is mounted 12 m above mean sea level and covers an area of 85 × 65 m². Currently, most stereo systems for wave acquisition are used to recover the fine-scale details of sea waves, and the 3D reconstruction scope is limited to the sea area near the system. However, the excellent advances in this field of research have given us the inspiration and confidence to build a novel long-distance stereo measurement system for tsunami measurement. With telephoto lenses, the measurement range of our proposed system is extended to 4-20 km from the deployment site, which meets the requirements of tsunami warning. Figure 1a shows the proposed system configuration, and Figure 1b shows the data processing flowchart of the system. First, we take sea surface images with the proposed stereo system, which has one camera on the left and one on the right; then we conduct stereo matching, calculate the sea level height from the matching results and, lastly, judge whether a tsunami is happening. Stereo matching is one of the key steps, and in this paper we focus on the stereo matching method of the proposed system.
To realize accurate tsunami measurement, we must reduce stereo matching errors to fewer than eight pixels (the reasons for this accuracy requirement are given in the Discussion Section) and keep the processing time below 1/24 s (real-time measurement). Sea surface images lack texture and sea water is non-still. Additionally, long-distance measurement suffers from a large disparity search range. For these reasons, stereo matching in this system is difficult. The existing stereo matching methods for wave reconstruction stereo systems can be classified into three categories: (1) local methods, which suffer from the delicate trade-off between the disparity window size (which influences the match localization accuracy) and the required surface smoothness [13,16,19]; (2) global methods, which are so computationally intensive that they are unlikely to be used in practice [17,18]; and (3) semi-global methods, which are based on well-packaged OpenCV library functions [21] and lose efficacy for long-distance sea surface image pairs [20,22]. In this paper, stereo matching is divided into two steps: (1) fast sparse sea wave matching using feature vectors to reduce the disparity search range, and (2) pixel-wise dense matching over a leaning cost volume to realize accurate matching. In our method, a decision tree is built to accomplish sparse matching based on the feature vector [23]. From the sparse matching result, we build a leaning cost volume and a penalty volume, and then use them to complete the dense matching with a simplified semi-global matching (SGM) algorithm [24]. The SGM algorithm suggests 16 search paths to provide good coverage of the 2D image. To increase the matching speed, in this paper the number of search paths is decreased from 16 to 8, since we conduct sparse matching first and build a penalty volume instead of using a constant penalty. The rest of the paper is organized as follows.
In Section 2, we introduce the whole stereo matching method: Section 2.2 presents the first step, fast sparse matching, and Section 2.3 presents the second step, high-precision dense matching. We also formulate the relationship between disparity d and the y coordinate, and build a leaning cost volume based on this relationship to reduce the time and memory consumption of dense matching. Section 3 presents the experimental results. Finally, in Section 4, the applicability to tsunami measurement and the limitations of the proposed method are discussed.

Method
Our proposed stereo matching method is described in distinct processing steps. It is a combination of sparse matching and dense matching. Some of the terms used may be unclear to a general reader; we therefore define them in Table 1.

Motivations
Recently, stereo matching using deep neural networks has achieved significant advances [25-27]. We are also working on stereo matching of sea surface images with a deep learning network. However, some difficulties remain unsolved, such as the lack of ground truth for supervised networks, the low accuracy of unsupervised network results and the time and memory consumption. Therefore, this paper focuses on stereo matching using traditional methods. A Siamese network was previously built to complete sparse stereo matching of sea surface images [28]; Figure 2a shows the matching result. However, it cannot be applied to tsunami measurement because of its high time consumption: processing one pair of stereo images takes more than one minute. The key to sparse matching is the selection and description of feature points. Long-distance sea surface images are low-texture images. Common feature point detectors, such as Harris, FAST, LoG and DoG, only detect isolated feature points and are not sensitive enough for low-texture images [29-32]. Thus, for sea surface images, we detect feature regions for sparse matching. An adaptive dynamic threshold method is used to detect sea waves as feature regions [33]. Common descriptors, such as SIFT, SURF, GLOH, DAISY and LIOP, cannot adapt to the feature region's size, which makes it difficult for them to describe the detected sea waves, because sea waves change randomly in shape, size and location [20,32,34-36]. Figure 2b shows one of the results of these methods (RANSAC+SURF [20,37]), where only well-characterized large waves are correctly matched. Therefore, we propose a new descriptor for sea waves to perform sparse matching, which we describe in detail in the next subsection.
The measurement of tsunamis by the proposed stereo system requires a matching error smaller than eight pixels. This is difficult to guarantee with sparse matching, since it is a region-to-region matching method; thus, the second step, dense matching, must be conducted. Dense matching comprises: (1) cost computation, e.g., SAD, MI and NCC [38]; and (2) cost aggregation, e.g., SGM, graph cuts and BF [24,38,39]. Little dense matching research has studied the construction of the cost volume. Previous algorithms assume that the disparity varies within a constant small range D_s, so that the algorithm can find the best match within limited memory space and computing time. However, for long-distance sea surface images, the disparity varies over a range of more than 600 pixels, and the traditional cost volume would consume more than 4 GB of memory. To solve this problem, this paper proposes a leaning cost volume based on the first-step sparse matching result. A detailed description is given in Section 2.3.

Feature Vector Definition
Wave matching can usually be done using the shape and grayscale (or color) of the waves. However, shape and grayscale are easily influenced by uncontrollable factors such as different extraction thresholds and different shooting angles. Furthermore, shape matching is computationally complex and time-consuming, which makes it difficult to guarantee a processing time of 1/24 s or less. Thus, we define the wave feature vector by combining the wave's barycenter, size, circularity, width, height, brightness and diagonal lengths to conduct sparse matching. Each element of the feature vector is explained below.
The epipolar constraint [40] is usually used to reduce the matching search region; the epipolar line can be calculated by RANSAC [37]. If the same sea wave is extracted with different thresholds, its extracted size, shape and edges will differ. Thus, compared with other points within the sea wave, the barycenter [41] is much more stable, and it is chosen to represent the sea wave's location.
Next, the diagonal lengths (at 45° and 135°), height and width reflect the sea wave's shape. Circularity measures the similarity of the sea wave to a circle. The brightness distribution can also be used as a discriminative feature, but it is not decisive, as shooting from different angles changes the brightness distribution. Figure 3 shows the features. The feature vector of sea wave i is defined as F_i = (f_i1, f_i2, ..., f_i8), where:
f_i1: the location of the barycenter, given as the x and y pixel coordinates;
f_i2: the size of the sea wave, given as the total number of pixels;
f_i3, f_i4: the width and height of the sea wave, given as the largest number of wave pixels in the horizontal and vertical orientations, respectively;
f_i5: the circularity of the sea wave, a number between 0 and 1; the larger it is, the more similar the wave is to a circle;
f_i6, f_i7: the lengths of the sea wave along the two diagonal orientations of 45° and 135°;
f_i8: the brightness of the sea wave, given as the sum of the grayscale values of all pixels of sea wave i.
In the following, the k-th feature of a calculated sea wave is written as f_k.
From one image pair, we extract two sea wave feature vector sets. As a result, for the proposed stereo system, the problem of sea wave matching can be converted to the problem of matching between two feature vector sets F L , F R .
F_L = {F_L^1, F_L^2, ..., F_L^n}, F_R = {F_R^1, F_R^2, ..., F_R^m},
where F_L and F_R are the sea wave feature vector sets of the left and right images, n and m are the numbers of sea waves extracted from the left and right images, and F_L^i is the feature vector of the i-th sea wave in the left image.
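As an illustration, the eight features above can be computed from a detected wave's binary mask with a few lines of NumPy. This is a hedged sketch: the paper does not give exact formulas for circularity or the diagonal lengths, so the versions below (isoperimetric circularity clipped to 1, pixel counts per diagonal) are our assumptions.

```python
import numpy as np

def wave_feature_vector(mask, gray):
    """Sketch of the eight-element sea wave feature vector described above.

    mask : 2D boolean array, True on the detected wave's pixels.
    gray : 2D grayscale image of the same shape.
    The circularity formula (4*pi*area / perimeter^2, clipped to 1) is an
    assumption; the paper does not state its exact definition.
    """
    ys, xs = np.nonzero(mask)
    f1 = (xs.mean(), ys.mean())                          # barycenter (x, y)
    f2 = xs.size                                         # size: pixel count
    f3 = np.bincount(ys - ys.min()).max()                # width: most pixels in one row
    f4 = np.bincount(xs - xs.min()).max()                # height: most pixels in one column
    pad = np.pad(mask, 1)
    interior = (pad[:-2, 1:-1] & pad[2:, 1:-1]
                & pad[1:-1, :-2] & pad[1:-1, 2:]) & mask
    perimeter = max(int(mask.sum() - interior.sum()), 1) # boundary pixel count
    f5 = min(4 * np.pi * f2 / perimeter**2, 1.0)         # circularity in [0, 1]
    f6 = np.bincount((xs + ys) - (xs + ys).min()).max()  # longest run on 45° diagonals
    f7 = np.bincount((xs - ys) - (xs - ys).min()).max()  # longest run on 135° diagonals
    f8 = int(gray[mask].sum())                           # brightness: summed grayscale
    return f1, f2, f3, f4, f5, f6, f7, f8
```

Because every element is derived from the pixel set of the extracted region rather than from its exact contour, the vector stays comparatively stable under small changes of the extraction threshold.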

Fast Matching by Decision Tree
To make judgments based on a global analysis of each feature, the decision tree classifier (DTC) [23] is chosen as the matching strategy. The DTC can be selectively robust to uncertain features, so it reduces the influence of the uncontrollable factors while also reducing the computational load.
The C4.5 system [42] uses the information gain ratio, Equation (4), to choose a split feature for each node. It can resist an uneven distribution of training samples across categories. Pessimistic Error Pruning (PEP) is used to avoid overfitting.
GainRatio(P, f_k) = InfoGain(P, f_k) / SplitInfo_{f_k}(P), (4)
where P represents the set of training samples and f_k is the k-th feature of the calculated sea waves. Please refer to [42] for the detailed definitions of InfoGain(P, f_k) and SplitInfo_{f_k}(P). The labels of the training data were determined by manual identification. We made 853 training samples, with 336 correct correspondences and 517 incorrect correspondences. Figure 4 shows the built decision tree. We traverse it from the root to a leaf to determine whether two waves are a correct match. Figure 5 shows two of the sparse matching results on sea surface images taken at 15:00 and 17:00, with the monitoring range being between 14 km and 20 km. For the image pairs at 15:00 and 17:00, n = 54 and 65 and m = 72 and 63 sea waves are extracted from the left and right images, respectively. A total of 25 and 26 pairs of sea waves are correctly matched by our method; the matching precisions are 100% and 92.3%. Matching accuracy is judged by visual inspection: if the same wave in the left and right images is recognized by our method as one wave, we consider the match correct. The computational complexity and running time are discussed in Section 3.2.
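To make the split criterion concrete, the following sketch computes the information gain ratio of Equation (4) for one candidate threshold on one feature. The toy feature values and labels are hypothetical, and this shows only the node-splitting criterion, not a full C4.5 implementation with PEP pruning.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(feature, labels, threshold):
    """Information gain ratio for splitting `feature` at `threshold`.

    InfoGain  = H(P) - sum_v (|P_v| / |P|) * H(P_v)
    SplitInfo = -sum_v (|P_v| / |P|) * log2(|P_v| / |P|)
    GainRatio = InfoGain / SplitInfo  (Equation (4) style)
    """
    left = feature <= threshold
    n, nl = labels.size, int(left.sum())
    if nl == 0 or nl == n:                 # degenerate split: no information
        return 0.0
    info_gain = entropy(labels) - ((nl / n) * entropy(labels[left])
                                   + ((n - nl) / n) * entropy(labels[~left]))
    split_info = -(nl / n) * np.log2(nl / n) - ((n - nl) / n) * np.log2((n - nl) / n)
    return info_gain / split_info

# Toy example: size differences of candidate wave pairs vs. match labels.
size_diff = np.array([0.1, 0.2, 0.3, 2.5, 3.0, 4.0])
is_match  = np.array([1,   1,   1,   0,   0,   0])
print(gain_ratio(size_diff, is_match, threshold=1.0))  # a perfect split gives 1.0
```

The split-information denominator is what lets C4.5 resist features that fragment the samples into many small, uneven subsets, unlike raw information gain.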

Relationship between Disparity and y Coordinates of Sea Surface Images
In our research, the monitoring distance is 4-20 km; at this range, the geodesic distance along the Earth's surface is approximately equal to the straight-line distance. For simplicity, in this manuscript we use the straight-line distance in Euclidean space instead of the distance in Riemannian space. To set up our user coordinate system (UCS), we place the camera at the coordinate origin (0, 0, 0), with axes X, Y and Z as shown in Figure 6. We define the angle between the Z axis and the shooting camera's normal direction as θ, and the Earth's surface as E. We assume the Earth is a sphere with center C = (0, H, 0) and radius R. The coordinate of target T in the UCS is (X, Y, Z), and the coordinate of its projected point t in the image plane is (x, y). According to Figure 6 and the pinhole camera model, the relationship between the target T and its image projection point t can be derived. Our UCS shares its origin with the pinhole camera coordinate system and is rotated by θ around the X axis. We take this pinhole camera to be the left camera, with rotation and translation R and T. For two well-calibrated and rectified cameras, we assume that the rotation and translation between the left and right cameras are I and (b, 0, 0)^T. Accordingly, we can formulate the relationship between the disparity d and the image coordinate y, as Equation (6) shows. In (6), y is the y coordinate of the target in the image plane, h (pixels) is half the image plane's height, f_x and f_y (pixels) are the focal lengths of the camera, d (pixels) is the disparity between the left and right images, b is the baseline length between the left and right cameras, R is the radius of the Earth, H is the distance between the Earth's center and our UCS origin and X is the X-axis coordinate in our UCS. From (6), we obtain the relationship between y and d.
Figure 7a illustrates this relationship; the bold line shows the relationship between d and y within the image plane.
For a sea surface image taken by our system, the disparity d increases as y increases. For a target T' = (X', Y', Z') whose sea surface height is not 0, the y coordinate of the projected target point will be smaller than the value calculated according to (6), which causes a bright region in the disparity map, as Figure 7b illustrates (a schematic disparity simulation for the case where the sea surface height changes). In Figure 7, (a) illustrates relationship (6), and (b) illustrates the disparity map when the sea wave height is not 0 (as for the target T' shown in Figure 6).
According to the relationship between d and y, we can drastically reduce the size of the cost volume: a much smaller leaning cost volume is built to accomplish dense matching of long-distance sea surface images, even though the disparity d varies over a wide range for this class of images. Its construction is introduced in Section 2.3.2.
For simplicity, we can also assume that the relationship between y and d is proportional. This still yields a relatively accurate result, but the disparity range of the matching search will be extended and the accuracy will be slightly lower due to the low-texture nature of the sea surface images. In this paper, we still recommend a one-time sparse matching for the same class of sea surface images to first obtain an exact leaning cost volume.

Build Leaning Cost Volume
The leaning cost volume can be built based on two facts: (1) the relationship between the disparity d and the y coordinate obeys (6); and (2) sparse matching has already been accomplished, which provides initial data for fitting the function in (6). This paper uses the least squares method [43] to fit the function.
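A minimal sketch of this fitting step: given the (y, d) pairs produced by sparse matching, a least squares fit yields the predicted center disparity for every image row. Here we use the linear simplification of the d-y relation mentioned above; fitting the full Equation (6) would replace `np.polyfit` with a nonlinear least squares solver over the camera parameters. The sample matches below are synthetic.

```python
import numpy as np

# Synthetic sparse matches: barycenter y of matched waves and their disparities.
y_coords    = np.array([120., 300., 480., 660., 840.])
disparities = np.array([ 35.,  98., 161., 224., 287.])

# Linear least squares fit d ≈ a*y + b (simplified stand-in for Equation (6)).
a, b = np.polyfit(y_coords, disparities, deg=1)

def predicted_disparity(y):
    """Centre of the leaning disparity search window at image row y."""
    return a * y + b

print(round(predicted_disparity(500.0), 1))
```

Each row of the leaning cost volume is then searched only within a small window around `predicted_disparity(y)` rather than over the full disparity range.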
To resist the luminance changes between the left and right images caused by different shooting angles, the descriptor used to compute the matching cost volume combines intensity, gradient and census [44] features. The matching cost of each pixel p (in the left image) is calculated from the difference between its feature descriptor and that of its suspected correspondence (in the right image), the suspected location being q = f(p) + d. The function f(p) calculates the suspected location of the matching point in the right image according to (6), and d is the disparity between the left and right pixel points caused by the unevenness of the sea surface. Equation (7) shows the calculation of the cost C(p, d) at point p. Here, I_p^L denotes the intensity of pixel p, ∇_x is the grayscale gradient in the x-axis direction and I_q^R is the intensity of the pixel in the right image corresponding to p; the disparity between them is f(p) + d. α balances the contributions of the color and gradient terms, τ_1 and τ_2 are truncation values, and σ_1 and σ_2 balance the contributions of the census, intensity and gradient terms. Hamming(census(p)^L, census(q)^R) denotes the Hamming distance between the vectors census(p)^L and census(q)^R, where census(p)^L is the census vector of pixel p in the left image and census(q)^R is the census vector of the corresponding pixel in the right image. The census transform is robust to overall luminance changes of the image, and an appropriate neighborhood window size can be chosen to generate census vectors of different lengths. Figure 8 demonstrates the cost volume: (a) is the common cost volume, in which all points in the d = 1 plane share the same disparity; it is usually adopted when the disparity varies within a small range D_s. (b) is the leaning cost volume proposed in this paper; its disparity plane is plane 2, which we obtain according to (6).
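The census component of the descriptor can be sketched as follows. The window size and bit ordering are implementation choices; the invariance to a global luminance shift is the property the text relies on.

```python
import numpy as np

def census_transform(img, win=2):
    """Census transform over a (2*win+1)^2 window.

    Each pixel becomes a bit vector recording whether each neighbour is darker
    than the centre; borders use edge padding. Returned shape: (H, W, bits)."""
    img = img.astype(np.int32)
    h, w = img.shape
    pad = np.pad(img, win, mode="edge")
    bits = []
    for dy in range(-win, win + 1):
        for dx in range(-win, win + 1):
            if dy == 0 and dx == 0:
                continue
            neigh = pad[win + dy : win + dy + h, win + dx : win + dx + w]
            bits.append(neigh < img)
    return np.stack(bits, axis=-1)

def hamming(census_l, census_r):
    """Per-pixel Hamming distance between two census bit volumes."""
    return np.count_nonzero(census_l != census_r, axis=-1)

# The transform compares only relative intensities, so a global luminance
# shift between the left and right images leaves the bit vectors unchanged:
img = np.tile(np.arange(8), (8, 1))          # horizontal intensity ramp
assert hamming(census_transform(img), census_transform(img + 50)).max() == 0
```

Changing `win` trades descriptor length (and discriminative power) against computation, matching the remark above about choosing the neighborhood window size.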
Although the disparity varies over a large range D_l, we can still search for the best match within the small range D_s. With the leaning cost volume, the large disparity change is calculated in advance according to (6), and we only need to search the interval d ∈ D_s (= [0, 20]) for the best matching results. This greatly reduces the memory and time consumption of dense matching for the tsunami measurement system.
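A sketch of how the leaning cost volume restricts the search, under assumptions: `f` stands for the fitted row-to-disparity function from the sparse matching step, and the per-pixel cost is reduced to a truncated absolute intensity difference standing in for the full Equation (7).

```python
import numpy as np

def leaning_cost_volume(left, right, f, D_s=20):
    """Cost volume whose disparity planes lean with image row y.

    Plane d stores the cost of the true disparity f(y) + d, so only D_s + 1
    planes are allocated instead of the full range D_l. The per-pixel cost
    here is a truncated absolute intensity difference, a stand-in for the
    intensity/gradient/census combination of Equation (7).
    """
    h, w = left.shape
    cost = np.full((h, w, D_s + 1), np.inf, dtype=np.float32)
    for y in range(h):
        base = int(round(f(y)))                  # leaning plane centre at row y
        for d in range(D_s + 1):
            disp = base + d                      # true left-right disparity
            if disp < 0 or disp >= w:
                continue
            diff = np.abs(left[y, disp:].astype(np.float32)
                          - right[y, :w - disp].astype(np.float32))
            cost[y, disp:, d] = np.minimum(diff, 30.0)
    return cost
```

Taking the argmin over the last axis recovers the residual disparity d ∈ [0, D_s]; the full disparity at a pixel is then f(y) + d.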

Fast Dense Matching
Pixel-wise cost calculation is generally ambiguous, and wrong matches can easily have a lower cost than correct ones due to low-texture areas, noise and so forth. Therefore, we need to add further constraints to remove wrong matches. Observing the sea surface, we find that the sea level height changes continuously; thus, we add the constraint that adjacent pixels must have similar disparities and define the following global energy, Equation (8) [38]:
E(D) = E_data(D) + E_smooth(D), (8)
where E_data(D) is the data term of the disparity map D, representing the sum of all pixels' costs ∑_p C(p, d_p); E_smooth(D) is the smoothness term; d_p is the disparity between pixel p in the left image and its suspected matching point in the right image; p' is a pixel adjacent to p; T[·] equals 1 if its argument is true and 0 otherwise; and N_p is the set of neighboring points of p. The multiplier u(p, p') can be interpreted as the penalty for a discontinuity between p and p'; in this paper, it combines the intensity difference and the spatial distance, as Equation (9) shows. It is inspired by the bilateral filter [39]: P_1 is the maximal penalty for a discontinuous pixel, and u(p, p') decreases as the intensity difference or spatial distance between p and p' increases, which preserves discontinuities at edges. σ_sp and σ_I balance the contributions of the intensity difference and spatial distance to the discontinuity penalty. The problem of dense matching is thus converted into finding a disparity map D that minimizes the global energy E(D). According to [24], this is an NP-hard problem. Graph cuts and SGM, proposed in [45] and [24] respectively, can approximately minimize the energy function in polynomial time.
In this paper, we utilize the semi-global matching algorithm to minimize the energy function, similarly to [24]. However, unlike [24], which adds a constant penalty P_1 for all pixels in the neighborhood of p when the disparity changes slightly (that is, by 1 pixel) and a larger constant penalty P_2 for all larger disparity changes, we compute a dynamic penalty for each pixel in the neighborhood of p based on spatial and intensity differences, as (9) shows. Thus, we first need to compute the penalty volume for the image.
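A hedged sketch of the dynamic penalty u(p, p'): Equation (9) is described as a bilateral-filter-style combination of spatial distance and intensity difference, and the Gaussian form below is our assumption consistent with that description, not the paper's exact formula. The parameter values are illustrative.

```python
import numpy as np

def penalty(I_p, I_q, du, dv, P1=10.0, sigma_sp=2.0, sigma_I=15.0):
    """Bilateral-style discontinuity penalty u(p, p') (assumed Gaussian form).

    Maximal penalty P1 for an identical, zero-offset neighbour; decays as the
    spatial offset (du, dv) or the intensity difference |I_p - I_q| grows,
    which weakens smoothing across likely wave edges.
    """
    space = (du * du + dv * dv) / (2.0 * sigma_sp**2)
    intensity = (float(I_p) - float(I_q))**2 / (2.0 * sigma_I**2)
    return P1 * np.exp(-(space + intensity))
```

A strong intensity contrast between p and p' (a likely wave edge) shrinks the smoothness penalty, so the optimizer is free to place a disparity discontinuity there, which is the edge-preserving behavior described above.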
The dimension of the penalty volume is N × H × W, where N depends on the neighborhood window size and W and H are the width and height of the image. Adjacent pixels share the same penalty, which can be exploited to reduce the size of the penalty volume in the N dimension, as Figure 9 shows for the 8-connected case with a window size of three. For a point p, all its neighborhood points in the 8 directions are marked (in gray). If we calculate the penalty volume starting from the top-left point of the image, the penalties of the adjacent points in the top-left corner of p (in dark gray) have already been calculated; in other words, the penalties in the r_1, r_2, r_3, r_4 directions are already available, so we only need to calculate and store the penalties of the remaining r_5, r_6, r_7, r_8 directions. For a point p = (x, y), its penalty in the (u, v) direction equals the penalty of the point p' = (x + u, y + v) in the (−u, −v) direction, which allows the penalties in the r_1, r_2, r_3, r_4 directions to be looked up from the previously visited pixels' penalty vectors. Here, r_i and (u, v) are two different forms of the adjacent directions; if r_i = (u, v), the relationship between i and (u, v) is given by i = g(u, v), where i is the index of direction r_i and wds is the window size of the neighborhood. For any pixel in the image (except those in the first row and first column), we only calculate its penalty in the last half of the directions; the penalty P(x, y) in the first half of the directions r_i can be looked up from the previously visited pixels' penalty vectors:
P(x, y)_{r_i} = P(x + u, y + v)_{r_j}, i = g(u, v), j = g(−u, −v).
After the penalty volume has been established, the minimization method is similar to [24]. Figure 10 summarizes all the processing steps of the proposed method, including sparse matching and dense matching.
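The direction-indexing bookkeeping can be illustrated for the 8-connected case. The enumeration used for g(u, v) below is one plausible row-major choice, not necessarily the paper's exact index function.

```python
# Half-penalty-volume bookkeeping for the 8-connected case (wds = 3): only the
# last four directions are stored at each pixel; the first four are read back
# from already-visited neighbours via P(x, y)[i] = P(x + u, y + v)[j] with
# i = g(u, v), j = g(-u, -v).
OFFSETS = [(-1, -1), (0, -1), (1, -1), (-1, 0),
           (1, 0), (-1, 1), (0, 1), (1, 1)]     # (u, v), row-major, centre skipped

def g(u, v):
    """Index i of direction r_i for the neighbour offset (u, v)."""
    return OFFSETS.index((u, v))

# Opposite offsets pair up across the enumeration: direction i at p is the
# same edge as direction 7 - i at the neighbour, so storing indices 4..7 per
# pixel halves the penalty volume without losing information.
for u, v in OFFSETS:
    assert g(u, v) + g(-u, -v) == 7
```

Each penalty value thus belongs to an undirected pixel-pair edge and is stored exactly once, at whichever endpoint is visited later in the raster scan.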
To complete dense matching, a leaning cost volume (size W × H × D_s) and a penalty volume (size N × H × W) first need to be built. Each element in the two volumes must be calculated; thus, the complexity is O(WHD_s + WHN). Then, we conduct cost aggregation similarly to [24]. The calculation of the smoothed cost (cost aggregation) along one direction requires O(D_s) steps at each pixel, and each pixel is visited exactly K (the number of aggregation paths) times, which results in a total complexity of O(KWHD_s).
With the leaning cost volume, the complexity of building the cost volume decreases from O(WHD_l) to O(WHD_s). For real long-distance sea surface images, D_s is approximately 20 and D_l approximately 600, so nearly 97% of the running time and memory consumption is saved. In the cost aggregation process, the penalty volume allows us to halve the number of aggregation paths (compared with [24]), which saves 50% of the running time.
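A back-of-envelope check of the quoted savings. The image resolution and the 2-byte cost cells are assumptions for illustration; the text itself states only D_l ≈ 600, D_s ≈ 20 and the >4 GB full-volume figure.

```python
# Assumed illustrative values: a 2448 x 2048 image with 16-bit matching costs.
W, H = 2448, 2048
D_l, D_s = 600, 20
bytes_per_cell = 2                       # e.g. a 16-bit cost value

full    = W * H * D_l * bytes_per_cell   # conventional cost volume
leaning = W * H * D_s * bytes_per_cell   # leaning cost volume

print(f"full volume:    {full / 2**30:.1f} GiB")   # exceeds the 4 GB noted above
print(f"leaning volume: {leaning / 2**30:.2f} GiB")
print(f"saved: {100 * (1 - D_s / D_l):.1f}%")      # matches the 'nearly 97%' figure
```

The saving ratio 1 − D_s/D_l is independent of the assumed resolution and cell size; only the absolute gigabyte figures depend on those assumptions.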

Experiment Results
In this section, we apply our method to sea surface images taken at different times, under different illuminance conditions and at different shooting locations. We show two kinds of experimental results: sparse matching results and dense matching results. The sea surface images were captured by our tsunami measurement system in three periods: 29 February-6 March 2016, 8-15 March 2017 and 18-23 August 2018.

Configuration of the Proposed System
The stereo system consists of two telephoto cameras that take the sea surface images; two Pan/Tilt/Zoom (PTZ) heads that rotate in the pan and tilt directions to adjust each camera's line of sight; one console panel for manually inputting control signals to the PTZ heads; and two client computers that control the photography by adjusting the camera parameters and receive the images captured by the cameras. There is also one central server that communicates with the client computers and sends them the necessary photography commands. Figure 11 shows the system configuration.

Sparse Matching Results
To evaluate the performance of the sparse matching method, we compare the matching results of the proposed method with those of the RANSAC+SURF method and the Euclidean distance method. Precision, recall and runtime are the three main evaluation terms, with precision and recall defined as
precision = nTP / (nTP + nFP), recall = nTP / (nTP + nFN),
where nTP and nFP are the numbers of correctly and wrongly detected correspondences, respectively, and nFN is the number of correct correspondences that are not detected.
To illustrate the comparative results intuitively, we show the matching results of six representative image pairs taken at different times in Figure 12. From the matching results, we find that the RANSAC+SURF algorithm generates stable and correct matches only in regions where features are obvious and distinctive, such as mountains or large sea waves. Our proposed method matches more than 90% of the sea waves correctly and stably, and among the three methods it correctly matches the largest number of sea waves.
We also conducted a quantitative comparison on different image pairs; the performance of each method is shown in Table 2. The average precision of the RANSAC+SURF algorithm was 88.4%, showing that sea waves can be matched correctly. However, its average recall was 29.1%, meaning that many sea waves are missed by the algorithm, which degrades the final result. The average precision of our proposed method is 95.3%, which is sufficient for the second-step dense matching. The ground truth was established by manual checking. From the comparison results, we can conclude that more than 90% of sea waves can be matched stably and correctly using the feature vectors, regardless of the illumination conditions. This solves the problem of the absence of obvious feature points in sea surface images during the sparse matching process.
The precision of Euclidean distance matching is 5.3% higher than that of the RANSAC+SURF method, which indicates that the feature vector defined in this paper is effective. Meanwhile, the recall of our method is 39.7% higher than that of the Euclidean distance method. Thus, we can conclude that the decision tree we built is more suitable for sea surface image matching.
We also conducted a more general comparison on sea surface image pairs with the RANSAC+SURF and Euclidean distance methods. Figure 13 compares the matching results of the RANSAC+SURF, Euclidean distance and proposed methods on 367 pairs of sea surface images taken in 2016, 2017 and 2018. The left column shows the numbers of correctly matched sea waves for the three matching methods. The horizontal axis is the image number, ordered by shooting time, and the vertical axis is the number of correctly matched sea waves; each point (t, n) on a line indicates that there are n correctly matched sea waves in the t-th sea surface image. The blue line represents the ground-truth sea wave number, counted by manual checking; the purple line represents the RANSAC+SURF matching results, the green line the Euclidean distance matching results and the red line the number of waves correctly matched by our proposed method. Most of the time, the red line is higher than the green and purple lines, meaning that our proposed method matches the most sea waves most of the time. The right column shows the matching precision of the three methods. The precision of our proposed method and that of Euclidean distance matching are close in many cases, with our method slightly better in some; in many cases, our proposed method is also better than the RANSAC+SURF method.

Computational Complexity
To judge whether two sea waves match, we traverse the decision tree from the root to a leaf. The maximum number of comparisons is the depth of the decision tree, six, and the minimum is two. There are nine comparison paths in the decision tree, and the average number of comparisons is 4.25, nearly half the length of the feature vector. The Euclidean distance method and the normalized cross correlation (NCC) used with SURF/SIFT must traverse the whole feature vector; thus, compared with these methods, the decision tree built in this paper saves half the time. Figure 14 shows the comparison results. The red line represents our proposed method's runtime, the green line the runtime of Euclidean distance matching and the purple line the runtime of the RANSAC+SURF method (with NCC used to calculate similarity). The runtimes of our proposed method and Euclidean distance matching are very close; observing the partial original running time data (see the small table in the upper-right corner), our proposed method is 1-2 ms faster than the Euclidean distance matching method in some cases. Our method is far faster than the RANSAC+SURF algorithm: since RANSAC+SURF is point-to-point matching, the number of extracted feature points is much greater than our method's sea wave count, and its feature vector length of 128 is much longer than that of our proposed feature vector. Figure 15 shows one of the dense matching results for sea surface images using the leaning cost volume. The images were taken at 17:00, with a monitoring distance of 8-14 km; (a) and (b) are the left and right images, and (c) is the disparity map of (a) and (b). The intensity of each pixel in (c) represents the disparity of that pixel between the left and right images.
Higher intensity represents larger disparity, and the stripe-like areas in the bottom left and top right corners are disparity-missing areas caused by the occlusion between the left and right images. Comparing the original images (a,b) with the disparity map (c), we can see that, consistent with the conclusion drawn in Section 2.3, the intensity where a sea wave is present is larger than the background intensity. Since there is no planar structure on sea surface images, there is rarely an area with uniform disparity, which is also consistent with (c).
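The early-exit behaviour of the decision tree can be sketched in a few lines. The 8-dimensional feature layout, the comparison order and the thresholds below are illustrative assumptions, not the paper's actual descriptor or tree; the point is only that rejection usually happens after far fewer comparisons than a full-vector distance.

```python
import numpy as np

FEATURE_DIM = 8  # assumed descriptor length, for illustration only

def euclidean_match(f1, f2, thresh=0.5):
    """Baseline: always touches all FEATURE_DIM components."""
    comparisons = FEATURE_DIM
    return bool(np.linalg.norm(f1 - f2) < thresh), comparisons

def tree_match(f1, f2, tol=0.2):
    """Early-exit tree sketch: reject as soon as one feature differs
    too much, accept after a maximum depth of six comparisons."""
    order = [0, 3, 1, 5, 2, 4, 6, 7]  # assumed most-discriminative-first order
    for depth, i in enumerate(order, start=1):
        if abs(f1[i] - f2[i]) > tol:
            return False, depth       # rejected early
        if depth == 6:                # maximum depth, like the paper's tree
            return True, depth
    return True, len(order)

a = np.full(FEATURE_DIM, 0.5)
print(tree_match(a, a + 0.05))       # similar wave: (True, 6)
print(tree_match(a, a + 0.5))        # dissimilar wave: (False, 1)
print(euclidean_match(a, a + 0.05))  # always costs 8 comparisons
```

A dissimilar pair is rejected at the first comparison, while the Euclidean baseline always pays the full vector length, which is the saving quantified above.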

Dense Matching Results
To validate the dense matching results, we choose a line on the left image (a). For each pixel on the line, we compute its matching pixel on the right image according to the disparity map (c). Figure 16 shows the matching results, with each pair of matched pixels connected by a line. The matching accuracy is checked manually. A black line represents a correct match; a red line represents an incorrect match, where the two linked points break the disparity continuity criterion. The matching accuracy in Figure 16 is 87.0%. The relationship between y and d shown in Figure 7a (Section 2.3.1) is consistent with the result in Figure 16, in which the change in the lines' slopes follows the change in y. We also conducted dense matching experiments on sea surface images taken at other times and locations. Figure 17 shows these dense matching results: column (a) was taken by the left camera, column (b) by the right camera and column (c) is the disparity map. The irregular striped areas in the disparity maps are non-overlapping areas of the left and right camera fields of view. Rows 1 and 2 were taken 18-23 August 2018 from Fukuoka Institute of Technology, with a monitoring distance of 8-14 km; rows 3 and 4 were taken 8-15 March 2017 from the same site, with a monitoring distance of 4-10 km; and rows 5 and 6 were taken 29 February-6 March 2016 from Fukuoka Kenritsu Suisan High School, with a monitoring distance of 14-20 km. Due to the lack of ground truth, it is difficult to evaluate the dense matching accuracy, and manual point-by-point inspection would be costly; thus, we can only perform a general accuracy check such as that in Figure 16. Limited by this paper's length, we only make a general evaluation of the final results here.
According to the conclusion of Section 2.3.1, the disparity increases in areas containing a sea wave, coast or mountains, producing whitish areas on the disparity map. This phenomenon is consistent with our experimentally derived disparity maps. We therefore tentatively conclude that our dense matching method is valid; in the future, we will focus on obtaining ground truth to evaluate the dense matching results accurately.
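The manual line check described above can be approximated automatically: given the disparity map, pixel (x, y) in the left image matches (x − d(x, y), y) in the right image, and a match is flagged when the disparity jumps sharply between horizontal neighbours. A minimal sketch, where the 3-pixel jump tolerance is an assumption standing in for the disparity continuity criterion:

```python
import numpy as np

def check_line_matches(disparity, row, max_jump=3):
    """Along one image row, pair pixel x with x - d(x) in the right image
    and flag neighbour pairs whose disparity jump exceeds max_jump pixels
    (our stand-in for the disparity continuity criterion)."""
    d = disparity[row].astype(float)
    right_x = np.arange(d.size) - d      # matched x-coordinates in the right image
    ok = np.abs(np.diff(d)) <= max_jump  # continuity check between neighbours
    return right_x, ok.mean()

# Toy disparity row with one discontinuity between columns 2 and 3
disp = np.zeros((2, 6), dtype=int)
disp[1] = [10, 11, 11, 20, 12, 12]
right_x, acc = check_line_matches(disp, 1)
print(acc)  # 0.6: 3 of the 5 neighbour pairs are continuous
```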

Discussion and Conclusions
We know that the wavelength of a tsunami wave is very long, and the rise in sea surface height is usually small at the place where the tsunami originates. According to the speed equation c = √(gh), where g is the acceleration of gravity and h is the water depth, a tsunami moves fast in deep sea areas and slows down near the coast. Thus, even a small offshore rise in sea surface height of only 20 cm could be caused by a tsunami [46], and it has the potential to produce a large rise near the coast: as the tsunami slows down, its wavelength becomes shorter while the wave energy is conserved, resulting in a significant increase in sea surface height near the coast.
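The shallow-water speed relation can be checked numerically; the depths below are only illustrative examples, not values from the paper:

```python
import math

G = 9.81  # acceleration of gravity, m/s^2

def tsunami_speed(depth_m):
    """Shallow-water wave speed c = sqrt(g*h), in m/s."""
    return math.sqrt(G * depth_m)

# Example depths: open ocean vs a coastal shelf
print(round(tsunami_speed(4000)))  # 198 m/s (over 700 km/h) in 4000 m of water
print(round(tsunami_speed(50)))    # 22 m/s in 50 m of water

# Rough travel time over the system's 20 km range at a 50 m average depth
print(round(20_000 / tsunami_speed(50) / 60))  # 15 min
```

These rough numbers are consistent with the 15-30 min arrival window discussed next.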
Depending on the subsea topography, a tsunami 20 km away takes about 15-30 min to reach the coast. If our proposed stereo system can detect an abnormal sea surface rise of 20 cm or more, we will have the opportunity to provide coastal people with 15-20 min of escape time before the tsunami comes ashore. The proposed system has two long-focal-length (1140 mm) cameras with an acquisition rate of 30 fps and a resolution of 1920 × 1080 pixels, deployed 27 m apart. The uncertainties in the calibration processing are controlled [47]. Thus, to realize real-time measurement for tsunami warning, the stereo matching error must be smaller than eight pixels and the running time of stereo matching must be shorter than 1/24 s. Furthermore, the stereo matching method must be capable of processing stereo images under different lighting conditions.
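To get a feel for why a few-pixel matching error matters at these ranges, the pinhole stereo relation Z = fB/(d·p) can be evaluated with the system parameters above; note that the 4.5 µm pixel pitch is purely an assumption for illustration, as the sensor pitch is not stated here:

```python
# Back-of-envelope stereo geometry, a sketch only. f and B follow the
# system description; the pixel pitch is an assumed value.
f = 1.140        # focal length, m
B = 27.0         # baseline, m
pitch = 4.5e-6   # assumed sensor pixel pitch, m

def disparity_px(z_m):
    """Disparity (pixels) of a point at range z_m."""
    return f * B / (z_m * pitch)

def range_m(d_px):
    """Range (m) recovered from a disparity of d_px pixels."""
    return f * B / (d_px * pitch)

for z in (4_000, 10_000, 20_000):
    d = disparity_px(z)
    dz = range_m(d - 8) - z   # range bias caused by an 8-pixel matching error
    print(f"{z // 1000} km: {d:.0f} px disparity, 8-px error -> {dz:.0f} m range bias")
```

Under these assumptions the disparity at 10 km is several hundred pixels, and the range bias from an 8-pixel error grows quadratically with distance, which motivates keeping the matching error within that bound.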
To verify the effectiveness of the proposed method on sea surface images under different lighting conditions, we conducted three groups of experiments, acquiring sea surface images from three different locations over three weeks, and performed sparse and dense matching on them. The results show that our method correctly matches more than 90% of the sea waves on the sea surface images (see Figure 13), regardless of the distance to the surface, shooting angles and sea state conditions. The running time of sparse matching ranges from 0 to 400 ms as the number of correctly matched waves ranges from 0 to 140 under different conditions (see Figures 13 and 14); for most stereo images, the running time is less than 40 ms. Without considering the time consumed by dense matching, this meets the requirement for real-time monitoring. We did not record the running time of dense matching because it is much longer than that of sparse matching: we need to traverse the leaning cost volume K times (where K is the number of search paths), giving a time complexity of O(K·W·H·D_s). This is the dominant cost and prevents our method from outputting matching results in real time. In the future, we will focus on accelerating dense matching with parallel and matrix operations. By manually checking one of the dense matching results, the dense matching accuracy is 87.0%, which meets the accuracy requirement of a tsunami warning.
Indeed, the tsunami problem is very serious and many factors can affect the system. For example, sea surface elevation is affected by tidal waves, typhoon waves, tsunami waves, etc., so sea surface changes alone are not sufficient to indicate a tsunami. In our project, we note that the wavelength of a tsunami wave is much longer than that of typhoon-induced waves, so when the sea surface height changes only in a small area, we consider it to be caused by typhoons, sea breezes, etc. The change in sea surface height caused by tides follows a predictable time pattern, so when the sea surface height is abnormal over a wide area and the tidal factor is excluded, we consider a tsunami to have occurred. Furthermore, our project includes many other research subjects, such as 24 h image capture of the long-distance sea surface at 4-20 km, image measurement in bad weather such as rain and snow, stereo mapping for binocular stereo vision, calculation of sea level height, determining the presence or absence of a tsunami and estimating its arrival time. Limited by the paper length, here we only introduce the stereo matching algorithm. To achieve high-precision, fast stereo matching for the proposed tsunami measurement system, we proposed a two-step sea surface image matching method based on feature vectors and the leaning cost volume.
To reduce the computational load caused by the large disparity range, sparse matching is first performed on the sea surface stereo images. Experiments on multiple groups of sea surface images show that the feature-vector-based sparse matching achieves an average precision of 95.3% and recall of 94.1%, outperforming the RANSAC+SURF method; this accuracy is sufficient for the second-step dense matching.
We formulated the relationship between the disparity d and the y coordinate. To reduce the computational load of dense matching, we constructed the leaning cost volume based on the sparse matching results. During this process, a dynamic penalty method is adopted and the penalty volume is calculated before minimizing the energy function. The final dense matching results validate our conclusions. Besides the time consumption, another limitation is the high requirement on image quality: sparse matching can be conducted only when there are sea waves on the sea surface and the waves are captured by the stereo system. To address this insufficiency, we are considering fusing point-to-point sparse matching.
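The energy minimization over the cost volume is in the spirit of semi-global matching. As a rough sketch of one aggregation path, with fixed penalties P1/P2 standing in for the paper's dynamic penalty volume (an intentional simplification):

```python
import numpy as np

def aggregate_path(cost, P1=1.0, P2=8.0):
    """SGM-style cost aggregation along one left-to-right path.
    cost: (H, W, D) matching cost volume. P1 penalizes a one-level
    disparity change, P2 a larger jump; both are fixed constants here,
    unlike the dynamic penalty volume used in the paper."""
    H, W, D = cost.shape
    agg = np.empty_like(cost, dtype=float)
    agg[:, 0] = cost[:, 0]
    for x in range(1, W):
        prev = agg[:, x - 1]                         # (H, D)
        prev_min = prev.min(axis=1, keepdims=True)   # (H, 1)
        # hypotheses: keep disparity, shift by +/-1 (cost P1), jump (cost P2)
        up = np.pad(prev[:, :-1], ((0, 0), (1, 0)), constant_values=np.inf) + P1
        down = np.pad(prev[:, 1:], ((0, 0), (0, 1)), constant_values=np.inf) + P1
        best = np.minimum(np.minimum(prev, up), np.minimum(down, prev_min + P2))
        agg[:, x] = cost[:, x] + best - prev_min     # subtract to keep values bounded
    return agg
```

Summing such aggregations over K search directions and taking the per-pixel argmin over D yields the disparity map, which is where the O(K·W·H·D_s) traversal cost mentioned earlier comes from.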
Unlike traditional sea surface stereo matching methods [13,16,19,20], we are the first to perform stereo matching on long-distance sea surface images (4-20 km). Although the method presented here still has shortcomings, we hope that this attempt opens up new and exciting possibilities for wave measurement in tsunami warning.
Author Contributions: Conceptualization, methodology and validation, Y.Y.; methodology implementation and supervision, C.L.; writing-original draft preparation, Y.Y.; writing-review and editing, C.L.; sea surface image resources, C.L., Y.Y. and other members of Lulab. All authors have read and agreed to the published version of the manuscript.

Funding: This research was funded by JSPS KAKENHI Grant Number JP17K01331 and the MEXT-Supported Program for the Strategic Research Foundation at Private Universities S1311050.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.