Improving Accuracy and Robustness of Space-Time Image Velocimetry (STIV) with Deep Learning

Abstract: Image-based river flow measurement methods have been attracting attention because of their ease of use and safety. Among the image-based methods, the space-time image velocimetry (STIV) technique is regarded as a powerful tool for measuring the streamwise flow because of its high measurement accuracy and robustness. However, depending on the shooting environment, such as stormy weather or nighttime, the conventional automatic analysis methods may produce incorrect values, which has been an obstacle to building a real-time measurement system. In this study, we tried to solve this problem by incorporating deep learning, which has been successful in the field of image analysis in recent years, into the STIV method. Case studies on three datasets indicated that deep learning can improve the efficiency of the STIV method and that its performance can be improved continuously by learning additional data. The proposed method is suitable for building a real-time measurement system because it has no tuning parameters that need to be adjusted to the shooting conditions, and its calculation speed is fast enough for real-time measurement.


Introduction
The frequent occurrence of flood disasters in recent years has increased the importance of accumulating basic hydrological data such as rainfall, water level, and river discharge, which are fundamental to river disaster countermeasures [1,2]. From a labor and cost standpoint, non-contact automated observation systems are indispensable for continuously collecting accurate data on water levels and river flow at multiple points, even during large floods [3]. However, unlike water level observation, automatic observation of river flow during floods has not yet been established, and in Japan, flood flow is generally measured manually using floats [4,5]. To overcome this situation, momentum toward unmanned and labor-saving flow observation has been growing in Japan in recent years, as shown by projects such as the Innovative River Technology Project [6] led by the Ministry of Land, Infrastructure, Transport and Tourism (MLIT). In this project, comparative research has been conducted toward the practical application of recently proposed flow measurement methods such as ADCP (Acoustic Doppler Current Profiler) [7][8][9], SVR (Surface Velocity Radar) [10,11], and image analysis methods [12][13][14]. The project is currently underway on three first-class rivers. Of these methods, the image-based method offers significant economic advantages because it can use existing camera equipment installed for river monitoring. Furthermore, video images taken from a drone can be used to measure a variety of river surface flows at arbitrary locations.
Image analysis methods include particle tracking velocimetry (PTV) [15], large-scale particle image velocimetry (LSPIV) [16,17], the optical flow method [18], and space-time image velocimetry (STIV) [19][20][21][22][23]. Among these, STIV, which is the subject of this research, has recently been put to practical use in flow observation because of its robust measurement and efficient calculation, and the software Hydro-STIV [24], developed mainly by the authors, is now available. In STIV, a search line of arbitrary length is set along the mainstream direction in the image, and the flow velocity is calculated from the gradient of the striped pattern that appears in the space-time image (STI), which is generated by stacking the image intensity along the search line in the time direction. An advantage of STIV over other methods is that, because it focuses on the time- and line-averaged streamwise velocity component, it can analyze images taken at depression angles as small as about two degrees. On the other hand, STIV requires the mainstream direction to be known so that the search line can be set along it. However, since discharge measurements depend on the velocity component perpendicular to the cross section, the search-line direction can in most cases be determined from the cross-section setting, except at complex curved reaches.
So far, various methods have been proposed for automatically detecting the pattern gradient, such as the gradient tensor method [19], QESTA [25], and the masked 2D Fourier spectra method [21,26]. These methods have been used in field surveys of river flow, but problems occasionally occur when the observation environment is unfavorable, such as in stormy weather or at night. In such cases, the pattern gradient has to be adjusted manually. To build a reliable real-time measurement system [27], it is important to develop a robust pattern gradient detection method that does not require manual parameter adjustment. To this end, we used deep learning, which has achieved remarkable results in image analysis tasks such as pattern recognition in recent years, to improve the robustness of pattern gradient detection and to fully automate the process without manual parameter adjustment.

Image Analysis Procedure of STIV
STIV is an image analysis method for estimating surface velocities in the flow direction by analyzing videos usually taken obliquely from the riverbank, as shown in Figure 1a [25,26]. First, search lines with a constant physical length are usually set parallel to the flow direction at regular intervals on the orthorectified image, as shown in Figure 1b. Next, a space-time image (STI) is generated for each search line by stacking the sequential image intensity distribution along the search line over time. As a result, the STI displays inclined patterns indicating the streamwise surface flow velocity at the search line location.
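The STI construction described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the Hydro-STIV implementation: it assumes the frames have already been orthorectified and converted to grayscale in an array of shape (T, rows, cols), it samples the search line with nearest-pixel lookup, and the function name `build_sti` is hypothetical.

```python
import numpy as np

def build_sti(frames, start, end, n_samples):
    """Stack grayscale intensities sampled along a search line over time.

    frames: array of shape (T, rows, cols), assumed orthorectified and grayscale.
    start, end: (row, col) pixel coordinates of the search-line end points.
    Returns an STI of shape (T, n_samples): time runs down the rows and
    position along the search line runs across the columns.
    """
    rows = np.linspace(start[0], end[0], n_samples)
    cols = np.linspace(start[1], end[1], n_samples)
    # Nearest-pixel sampling keeps the sketch short; bilinear sampling is a common alternative.
    r = np.clip(np.rint(rows).astype(int), 0, frames.shape[1] - 1)
    c = np.clip(np.rint(cols).astype(int), 0, frames.shape[2] - 1)
    return frames[:, r, c]
```

A surface feature advected at a constant speed along the line appears in the returned array as a straight inclined stripe, which is the pattern whose gradient STIV measures.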
From the gradient φ of the stripe pattern that appears in the STI as the trajectory of the ripples, the flow velocity v is obtained by the following equation, where the proportional coefficient k represents the conversion between pixels and real scale:

$$v = k \tan\phi \tag{1}$$

In recent research, field observations [28] and direct numerical simulation (DNS) [29] have shown that an STI contains textures generated by turbulence-generated ripples advected with the surface velocity, which correspond to the flow (flow signal), as well as dispersive wave components traveling in the positive or negative direction. Figure 2 shows an example of the original STI and its two-dimensional Fourier transform image. In the STI obtained from a river flow video (Figure 2a), textures with different gradients are superposed, which makes it difficult to detect the actual pattern corresponding to the surface flow. In contrast, the Fourier transform image of the STI (Figure 2b) clearly separates these components into different locations in the image. Since the linear texture passing through the origin corresponds to the flow signal component, the Fourier transform image is useful for determining the pattern gradient corresponding to the surface velocity averaged over the search line and the measurement time. As described above, the STI texture contains various patterns generated not only by turbulence-related ripples but also by dispersive waves propagating in all directions, as well as various types of noise from other causes. Therefore, the measurement accuracy of STIV depends on how accurately it can detect the texture pattern gradients associated only with the surface flow signal.
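Equation (1) is a one-line conversion once k is known. The sketch below assumes, for illustration, that φ is measured from the time axis of the STI and that k is the ground distance per pixel along the search line divided by the time interval per frame; the function name, pixel size, and frame rate are hypothetical.

```python
import math

def stiv_velocity(phi_deg: float, k: float) -> float:
    """Equation (1): v = k * tan(phi).

    phi_deg: stripe angle in degrees, assumed measured from the STI time axis,
             so 0 deg is a stationary pattern and angles above 90 deg give v < 0.
    k: assumed to be (metres per pixel) / (seconds per frame), giving v in m/s.
    """
    return k * math.tan(math.radians(phi_deg))

# Hypothetical setup: 0.05 m per pixel along the search line, 30 frames per second.
k = 0.05 * 30.0
print(f"{stiv_velocity(45.0, k):.2f} m/s")  # 1.50 m/s
```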

Gradient Tensor Method
This method calculates the gradient tensor of an STI and estimates the pattern angle from it [19]. The method consists of the following steps. First, the STI is divided into a number of overlapping blocks (windows), and the pattern angle φ is calculated for each block using Equations (2) and (3) (Figure 3a):

$$\tan 2\phi = \frac{2J_{xt}}{J_{tt} - J_{xx}} \tag{2}$$

$$J_{pq} = \iint_{\text{window}} \frac{\partial g(x,t)}{\partial p}\,\frac{\partial g(x,t)}{\partial q}\,dx\,dt \tag{3}$$

g(x, t) in Equation (3) is the image intensity of the STI at (x, t), and p and q each take either x or t.
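Next, the coherency C (pattern clarity) of each block is calculated using Equation (4) (Figure 3b):

$$C = \frac{\sqrt{(J_{tt} - J_{xx})^2 + 4J_{xt}^2}}{J_{xx} + J_{tt}} \tag{4}$$

Finally, a histogram of the pattern angles calculated in the first step is created (Figure 3c), and the average angle weighted by the coherency is calculated using Equation (5), where N is the number of blocks:

$$\bar{\phi} = \frac{\sum_{i=1}^{N} C_i\,\phi_i}{\sum_{i=1}^{N} C_i} \tag{5}$$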

This was the first automatic estimation method developed for STIV, and it estimates the pattern angle accurately for videos recorded under good shooting conditions. For videos recorded under poor conditions, however, a reasonable analysis requires careful adjustment of parameters such as the window size (MX, MT), the window step width (LX, LT), the coherency threshold (CT), and the histogram range to be averaged (HR). In this study, these parameters were set as follows: (MX, MT) = (30 pix, 30 pix), (LX, LT) = (10 pix, 10 pix), CT = 0.0, and HR was set to a range up to 70% of the maximum value, which gives reasonable results in most ordinary cases.
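A compact NumPy sketch of the block-wise procedure in Equations (2) to (5) follows, using the parameter values given above as defaults. Assumptions: the STI is indexed as `sti[t, x]`, derivatives are central differences from `np.gradient`, window sums stand in for the integral in Equation (3), and the histogram-range (HR) selection is omitted, so every block above the coherency threshold enters the weighted mean. The quadrant is resolved with `arctan2` using both signs flipped relative to Equation (2) (same ratio), which places the angle along the stripes rather than 90° away from them. The function name is hypothetical.

```python
import numpy as np

def gradient_tensor_angle(sti, mx=30, mt=30, lx=10, lt=10, ct=0.0):
    """Coherency-weighted STI pattern angle in degrees, following Equations (2)-(5).

    sti: 2D array indexed sti[t, x] (time down the rows, search line across the columns).
    mx, mt: window size (pixels); lx, lt: window step (pixels); ct: coherency threshold.
    """
    g = np.asarray(sti, dtype=float)
    g_t, g_x = np.gradient(g)  # central differences along t (axis 0) and x (axis 1)
    angles, weights = [], []
    for t0 in range(0, g.shape[0] - mt + 1, lt):
        for x0 in range(0, g.shape[1] - mx + 1, lx):
            win = (slice(t0, t0 + mt), slice(x0, x0 + mx))
            gx, gt = g_x[win], g_t[win]
            jxx, jtt, jxt = np.sum(gx * gx), np.sum(gt * gt), np.sum(gx * gt)  # Equation (3)
            if jxx + jtt == 0.0:
                continue  # textureless block
            # Equation (2); arctan2 picks the quadrant that points along the stripes.
            phi = 0.5 * np.degrees(np.arctan2(-2.0 * jxt, jxx - jtt))
            coh = np.sqrt((jtt - jxx) ** 2 + 4.0 * jxt ** 2) / (jxx + jtt)  # Equation (4)
            if coh > ct:
                angles.append(phi % 180.0)
                weights.append(coh)
    if not weights:
        return float("nan")
    # Equation (5); a plain weighted mean misbehaves if block angles straddle 0/180 deg.
    return float(np.average(angles, weights=weights))
```

For an ideal advected pattern g = f(x - vt), J_xt is negative for v > 0 and the coherency of every block is close to 1, so the result approaches arctan(v) in degrees.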

Fourier Predominant Angular Analysis Method
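This method uses Equation (6) to calculate the integral of the radial intensities for each angle θ in the STI's Fourier transform image and adopts the angle with the largest integral value as the pattern angle of the STI (Figure 4):

$$F(\theta) = \int_{0}^{r_{\max}} G(\theta, r)\,dr \tag{6}$$

where r_max is the largest radius used in the integration.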
Here, G(θ, r) is the intensity of the Fourier transform image at (θ, r) in polar coordinates. This works because, as mentioned above, the flow signal pattern appears as a linear peak through the origin of the STI's Fourier transform image. Note that the peak angle θ in the Fourier transform image and the gradient angle φ of the original STI stripe pattern are orthogonal. Although fast and robust angle estimation is possible in many cases, these peak structures may be mis-detected when stationary noise and gravity waves are dominant. Furthermore, under strong blur, the peak structure does not appear clearly, and the result may fluctuate greatly depending on the range of the high-pass filter applied to the Fourier transform image as pre-processing. Therefore, in previous research [21,26], the maximum angle was not adopted directly but was used as a filter before executing the gradient tensor method.
In this study, the high-pass filter applied as pre-processing was set to 1% of the STI size, which gives reasonable results in most ordinary cases.
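The Fourier predominant angle of Equation (6) can be sketched as follows. This is an illustration under stated assumptions: the STI is square (so that the orthogonality noted above holds) and indexed as `sti[t, x]`; the high-pass filter is taken to be a zeroed disc around the spectrum origin whose radius is 1% of the STI size (at least one pixel); G(θ, r) is sampled bilinearly along rays; and θ is measured from the time-frequency (row) axis, so that the stripe angle is φ = θ - 90° modulo 180°. The function names are hypothetical.

```python
import numpy as np

def _bilinear(img, rows, cols):
    """Bilinear interpolation of img at fractional (row, col) positions inside the image."""
    r0, c0 = np.floor(rows).astype(int), np.floor(cols).astype(int)
    fr, fc = rows - r0, cols - c0
    return (img[r0, c0] * (1 - fr) * (1 - fc) + img[r0 + 1, c0] * fr * (1 - fc)
            + img[r0, c0 + 1] * (1 - fr) * fc + img[r0 + 1, c0 + 1] * fr * fc)

def fourier_max_angle(sti, d_theta=0.5, hp_frac=0.01, r_step=0.25):
    """Stripe angle (deg) from the predominant direction of the STI's 2D amplitude spectrum."""
    g = np.asarray(sti, float)
    spec = np.abs(np.fft.fftshift(np.fft.fft2(g - g.mean())))
    h, w = spec.shape
    cy, cx = h // 2, w // 2
    yy, xx = np.ogrid[:h, :w]
    spec[np.hypot(yy - cy, xx - cx) < max(1.0, hp_frac * min(h, w))] = 0.0  # high-pass
    thetas = np.arange(0.0, 180.0, d_theta)
    th = np.radians(thetas)[:, None]
    r = np.arange(0.0, min(h, w) // 2 - 1, r_step)[None, :]  # rays stay inside the image
    # Equation (6) on a grid: sum G(theta, r) along each ray. The amplitude spectrum of a
    # real image is point-symmetric, so rays over 0-180 deg cover every line through the origin.
    F = _bilinear(spec, cy + r * np.cos(th), cx + r * np.sin(th)).sum(axis=1) * r_step
    theta_max = thetas[np.argmax(F)]
    return float((theta_max - 90.0) % 180.0)  # spectral peak is orthogonal to the stripes
```

Bilinear sampling is a deliberate choice here: with nearest-pixel lookup, several neighboring rays hit the same spectral peak pixel and tie, which biases the argmax by a few degrees.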

Outline of Deep Learning
Starting with the success of AlexNet [30] in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012, research and development of deep learning methods for various image analysis tasks has been vigorously pursued. In particular, convolutional neural networks (CNNs) have advanced remarkably in image analysis tasks. A CNN has a multilayer structure consisting of convolutional layers that extract image features and pooling layers that compress the extracted features. For details, refer to references [30][31][32][33][34].

Application of CNN to an STI Pattern Gradient Detection Problem
In STIV, the flow velocity v is obtained by Equation (1) from the gradient φ of the stripe pattern appearing in the STI. Hence, the automatic detection of STI pattern gradients is achieved by using a CNN to accurately approximate the following nonlinear function f(I), which calculates the gradient φ of the stripe pattern from the STI information $I \in \mathbb{R}^{Ch \times H \times W}$:

$$\phi = f(I) \tag{7}$$

Here, Ch is the number of STI channels (3 for an RGB image and 1 for a grayscale image), H is the height of the STI, i.e., the number of image frames that make up the STI, and W is the width of the STI, i.e., the number of pixels that make up the search line.
In the CNN approximation of f(I), it would be natural to train the network as a regression problem that outputs the correct pattern gradient value for the input STI. In this study, however, we divide the range from 0° to 180° into N classes of a fixed angular increment and train the CNN as a classification problem that estimates the corresponding gradient class from the input STI. In other words, the CNN approximates the classification probability distribution p of the pattern gradient given the image intensity information I, as shown in the following equation, and the estimated gradient φ̂ ∈ {φ_1, φ_2, ..., φ_N} is the class that gives the maximum probability:

$$\hat{\phi} = \underset{\phi_i \in \{\phi_1, \ldots, \phi_N\}}{\arg\max}\; p(\phi_i \mid I) \tag{8}$$
This allows not only the estimation of gradient values, but also the output of the estimated probability distribution over the classes, the evaluation of confidence intervals for the estimated gradient values, and ensemble estimation. When multiple preprocessing methods are applied, the gradient with the highest classification probability (confidence level) is adopted for each STI. Furthermore, we use the two-dimensional Fourier transform image [35] of the STI as input to the CNN instead of the original STI because, as described in Section 2.1, the Fourier transform image makes it easier to identify the pattern gradient corresponding to the average surface velocity. Figure 5 shows the process of pattern gradient detection by the CNN. First, the STI generated from the search line is resized and normalized (Figure 5a). Next, a 2D Fourier transform is performed and a high-pass filter is applied to remove static components such as obstacles (Figure 5b). The preprocessed image is input into the CNN, which outputs the classification probability distribution of the pattern gradient. Ensemble estimation is performed using multiple images in which the center of the Fourier image is magnified, because the linear pattern produced by turbulence-generated surface features, which corresponds to the surface-averaged velocity, passes through the origin (Figure 5c). Finally, the velocity is calculated from Equation (1) using the estimated gradient φ̂ with the maximum probability instead of φ.
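The classification and ensemble procedure above (Equation (8) together with the preprocessing in Figure 5) might look roughly like the following sketch. Several details are assumptions not given in the text: nearest-neighbour resizing, min-max normalisation, a high-pass filter that zeroes a small disc around the spectrum origin, magnification factors of 1, 2, and 4 implemented by cropping the centre and repeating pixels, class centres at 0.0°, 0.5°, ..., 179.5°, and a hypothetical `model` callable standing in for the trained CNN (which the text says ends in a softmax layer). Keeping the argmax class of whichever input gives the highest confidence is one reading of the ensemble description.

```python
import numpy as np

CLASS_STEP = 0.5                    # degrees per class (assumed centres 0.0, 0.5, ..., 179.5)
N_CLASSES = int(180 / CLASS_STEP)   # 360 classes

def softmax(logits):
    """Convert final-layer logits into class probabilities."""
    z = np.asarray(logits, float)
    e = np.exp(z - z.max())         # subtract the max for numerical stability
    return e / e.sum()

def preprocess(sti, size=128, hp_frac=0.01):
    """Resize to size x size, normalise to [0, 1], return a high-pass-filtered Fourier image."""
    g = np.asarray(sti, float)
    ri = np.linspace(0, g.shape[0] - 1, size).round().astype(int)
    ci = np.linspace(0, g.shape[1] - 1, size).round().astype(int)
    g = g[np.ix_(ri, ci)]                               # nearest-neighbour resize
    g = (g - g.min()) / (np.ptp(g) + 1e-12)
    spec = np.abs(np.fft.fftshift(np.fft.fft2(g - g.mean())))
    yy, xx = np.ogrid[:size, :size]
    radius = np.hypot(yy - size // 2, xx - size // 2)
    spec[radius < max(1.0, hp_frac * size)] = 0.0       # drop the lowest (static) frequencies
    return spec / (spec.max() + 1e-12)

def zoom_center(img, factor):
    """Crop the central 1/factor of the image and enlarge it back by pixel repetition."""
    h, w = img.shape
    ch, cw = h // factor, w // factor
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = img[top:top + ch, left:left + cw]
    return np.repeat(np.repeat(crop, factor, axis=0), factor, axis=1)

def estimate_angle(sti, model, factors=(1, 2, 4)):
    """Ensemble over centre-magnified Fourier images; keep the most confident argmax (Eq. (8)).

    `model` is a hypothetical callable returning N_CLASSES probabilities for a 2D image.
    """
    spec = preprocess(sti)
    best_angle, best_conf = float("nan"), -1.0
    for f in factors:
        p = np.asarray(model(zoom_center(spec, f)), float)
        i = int(np.argmax(p))                           # class with the maximum probability
        if p[i] > best_conf:
            best_angle, best_conf = i * CLASS_STEP, float(p[i])
    return best_angle, best_conf
```

Magnifying the centre does not change line directions through the origin, so all ensemble members describe the same angle while emphasising lower spatial frequencies.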

Generation of Synthetic Dataset
For the training and validation datasets, artificial STI data of size 128 × 128 were generated based on Perlin noise [36]. Specifically, 300 base images were generated by weighting and averaging the pixel values of three to five stripe pattern images. The images were generated by randomly selecting the number of Perlin noise nodes that set the gradient from the range of 2 to 48. To obtain STIs with various image intensity distributions, 100 images were randomly selected from these base images and rotated by angles drawn from a normal distribution centered on the target angle with a standard deviation σ of 1.5°; their pixel values were then weighted, averaged, and normalized to obtain the final artificial STI (Figure 6). The dataset consists of pairs of the Fourier transform image of each artificial STI and the corresponding gradient angle (correct label). Since the artificial STI is square, the peak angle of the Fourier transform image and the gradient angle of the original STI stripe pattern are orthogonal.
The dataset is divided into 360 classes from 0° to 180° in 0.5° increments. Each class contains 100 images for training and 10 images for accuracy verification. The total amount of data in the dataset is 360 × 100 = 36,000 training pairs and 360 × 10 = 3600 images for accuracy verification.
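A simplified generator in the spirit of this procedure is sketched below. It departs from the described procedure in ways worth stating plainly: instead of rotating pre-generated base images, it renders each of three to five stripe layers directly at an angle drawn from a normal distribution around the target (σ = 1.5°), and it uses one-dimensional Perlin-style gradient noise across the stripes with 2 to 48 nodes. The layer weights, the noise construction, and the function names are assumptions for illustration.

```python
import numpy as np

def perlin_1d(u, n_nodes, rng):
    """1D Perlin-style gradient noise at positions u in [0, 1] with n_nodes intervals."""
    grads = rng.uniform(-1.0, 1.0, n_nodes + 1)        # random slope at each node
    p = np.asarray(u, float) * n_nodes
    i0 = np.clip(np.floor(p).astype(int), 0, n_nodes - 1)
    f = p - i0                                          # position within the interval
    s = f ** 3 * (f * (6.0 * f - 15.0) + 10.0)          # fade curve 6f^5 - 15f^4 + 10f^3
    n0 = grads[i0] * f                                  # contribution of the left node
    n1 = grads[i0 + 1] * (f - 1.0)                      # contribution of the right node
    return n0 + s * (n1 - n0)

def stripe_image(phi_deg, size, n_nodes, rng):
    """Stripes at phi_deg from the time (row) axis, i.e. lines x - t*tan(phi) = const."""
    t, x = np.mgrid[:size, :size] / (size - 1.0)
    phi = np.radians(phi_deg)
    u = x * np.cos(phi) - t * np.sin(phi)               # constant along each stripe
    u = (u - u.min()) / (u.max() - u.min())
    return perlin_1d(u, n_nodes, rng)

def synthetic_sti(target_deg, size=128, sigma=1.5, rng=None):
    """Weighted average of 3-5 noisy stripe layers around target_deg, normalised to [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    n_layers = int(rng.integers(3, 6))
    weights = rng.uniform(0.5, 1.0, n_layers)           # assumed layer weights
    img = np.zeros((size, size))
    for wgt in weights:
        angle = rng.normal(target_deg, sigma)           # angular jitter around the label
        img += wgt * stripe_image(angle, size, int(rng.integers(2, 49)), rng)
    img /= weights.sum()
    return (img - img.min()) / (img.max() - img.min())
```

A training pair would then be the Fourier transform image of `synthetic_sti(angle)` together with the class index of `angle` in 0.5° steps.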

Deep Neural Network Structure and Learning Configurations
In this study, we adopted a network structure that applies GAP (Global Average Pooling) [31] to a multilayer CNN. This network structure has been used in many image classification tasks; the GAP layer, placed at the end of the multilayer CNN feature extraction, compresses the two-dimensional information of each channel into a scalar value, which reduces the number of model weights and helps prevent overfitting. The learning conditions are listed in Table 1 and the network parameters after tuning are shown in Table 2. The input is a Fourier transform image of the STI resized to 128 × 128 pixels after grayscale conversion, and the output is a classification probability distribution over 360 classes covering 0° to 180° in 0.5° increments. As shown in Equation (1), tan φ and k are used to convert the angle into velocity, so for the same angular resolution, the larger φ and k are, the coarser the velocity resolution. For example, for a flow velocity of 1.0 m/s at 45°, the resolution is 1.0 ± 0.017 m/s, and for a flow velocity of 1.0 m/s at 70°, it is 1.0 ± 0.028 m/s, which is sufficient for practical use. In Table 2, "Conv" is the convolution layer, "MaxPool" is the max pooling layer, "Dropout" is the dropout layer, and "Dense" is the fully connected layer (with 360 units, the same as the number of classes). "Res I/O" and "Ch I/O" indicate how the size of the input data and the number of channels are converted in each layer. "Kernel" is the kernel (filter) size in the convolution and max pooling layers. Leaky ReLU is used as the activation function in the convolution layers, and the softmax function is used in the fully connected layer.
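The velocity-resolution figures above can be checked with a short calculation. With k back-calculated from Equation (1) for a given velocity and angle, the sketch below evaluates the velocity change caused by one 0.5° class step, both as a forward difference and with the linearisation Δv ≈ k sec²φ Δφ, which can be rewritten as Δv/v ≈ 2Δφ/sin 2φ. The two give about 0.0176 and 0.0175 m/s at 45° and about 0.0278 and 0.0272 m/s at 70°, close to the values quoted in the text. The function names are hypothetical.

```python
import math

def step_forward(v_ms, phi_deg, d_phi_deg=0.5):
    """Velocity change for one angular class step, by forward difference of Equation (1)."""
    k = v_ms / math.tan(math.radians(phi_deg))  # k implied by v = k tan(phi)
    return k * (math.tan(math.radians(phi_deg + d_phi_deg)) - math.tan(math.radians(phi_deg)))

def step_linear(v_ms, phi_deg, d_phi_deg=0.5):
    """Linearised step: dv = v * 2*dphi / sin(2*phi), from dv = k sec^2(phi) dphi."""
    return v_ms * 2.0 * math.radians(d_phi_deg) / math.sin(math.radians(2.0 * phi_deg))

for phi in (45.0, 70.0):
    print(f"{phi:.0f} deg: forward {step_forward(1.0, phi):.4f} m/s, "
          f"linear {step_linear(1.0, phi):.4f} m/s")
```

The same relation explains the ±3° remark in the validation on categorized STIs: `step_linear(1.0, 45.0, 3.0)` is about 0.105 m/s, roughly 10% of the velocity, and the fraction grows as φ moves away from 45°.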

Application to Synthetic STI Dataset
The results of applying the method to the artificial STIs (the 3600 accuracy-verification images described in Section 4.1.1) are shown in Figure 7 (the red line in the figure indicates the CNN-estimated angle). Keeping in mind that the peak angle in the Fourier transform image and the gradient angle of the stripe pattern in the original STI are orthogonal, we can confirm good identification accuracy. Because the training data were generated with rotation angles drawn from a normal distribution with a standard deviation σ of 1.5° around the target angle, we confirmed that the identification accuracy was close to 99% within tolerances of ±1.0° to ±1.5°, where differences cannot be identified visually because of the blurring caused by the angular spread (Table 3).
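Accuracy within a tolerance, as reported in Table 3, can be computed along the lines of the sketch below. It assumes that stripe angles are compared modulo 180° (so 179.5° and 0.5° differ by 1.0°), which matches a label range of 0° to 180°; the function names and sample values are hypothetical and are not data from this study.

```python
import numpy as np

def angular_error(pred_deg, true_deg):
    """Absolute angle difference with 180-degree periodicity (0 and 180 deg are the same line)."""
    d = (np.asarray(pred_deg, float) - np.asarray(true_deg, float)) % 180.0
    return np.minimum(d, 180.0 - d)

def accuracy_within(pred_deg, true_deg, tol_deg):
    """Fraction of estimates whose angular error is at most tol_deg."""
    return float(np.mean(angular_error(pred_deg, true_deg) <= tol_deg))

# Hypothetical estimates and labels, for illustration only.
pred = [10.0, 20.0, 30.0, 179.0]
true = [10.5, 22.0, 30.0, 0.5]
for tol in (1.0, 1.5):
    print(tol, accuracy_within(pred, true, tol))
```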

Application to Categorized STI Dataset
To verify the applicability of the method to STIs obtained from real river videos, we applied it to characteristic STI patterns (Figure 8) that often produce erroneous values with the existing automatic analysis methods. For details of the shooting conditions under which each STI pattern appears, refer to Fujita et al. (2020) [26]. STI patterns that are difficult to interpret visually were excluded from the validation so that the results could be compared with visual readings. The following verification uses the CNN trained on the artificial STIs in the previous section; because its angular resolution is limited to 0.5° increments, the resolution was improved by combining it with the existing Fourier predominant angular analysis [26]. Specifically, we performed the Fourier predominant angular analysis in 0.1° increments within ±2.0° of the CNN angle estimate and used the result as the CNN estimate. Figure 9 compares the angles estimated by the CNN and by the conventional automatic analysis method (the gradient tensor method) with the manually confirmed pattern angles. Even for STI patterns on which the conventional gradient tensor method produces outliers, the CNN shows good estimation accuracy of about ±3° (black dotted line in Figure 9). As mentioned earlier, the velocity accuracy depends on φ and k, but an angular error within ±3° corresponds to a velocity error of at most 10% to 15% in most cases.
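The refinement step just described (a 0.1° Fourier predominant-angle search within ±2.0° of the CNN estimate) can be sketched as below. The details are assumptions for illustration: the STI is square and indexed as `sti[t, x]`, the high-pass filter zeroes a disc of radius 1% of the STI size (at least one pixel) around the spectrum origin, G(θ, r) is sampled bilinearly, and each candidate stripe angle φ is scored along the orthogonal spectral direction θ = φ + 90°. The CNN estimate is simply passed in as a number, and the function name is hypothetical.

```python
import numpy as np

def refine_angle(sti, phi_coarse_deg, half_range=2.0, step=0.1, hp_frac=0.01, r_step=0.25):
    """Refine a coarse stripe angle by a fine Fourier predominant-angle search around it."""
    g = np.asarray(sti, float)
    spec = np.abs(np.fft.fftshift(np.fft.fft2(g - g.mean())))
    h, w = spec.shape
    cy, cx = h // 2, w // 2
    yy, xx = np.ogrid[:h, :w]
    spec[np.hypot(yy - cy, xx - cx) < max(1.0, hp_frac * min(h, w))] = 0.0  # high-pass
    phis = phi_coarse_deg + np.arange(-half_range, half_range + step / 2, step)
    th = np.radians(phis + 90.0)[:, None]      # spectral line is orthogonal to the stripes
    r = np.arange(0.0, min(h, w) // 2 - 1, r_step)[None, :]
    rows, cols = cy + r * np.cos(th), cx + r * np.sin(th)
    # Bilinear sampling of the amplitude spectrum along each candidate ray.
    r0, c0 = np.floor(rows).astype(int), np.floor(cols).astype(int)
    fr, fc = rows - r0, cols - c0
    vals = (spec[r0, c0] * (1 - fr) * (1 - fc) + spec[r0 + 1, c0] * fr * (1 - fc)
            + spec[r0, c0 + 1] * (1 - fr) * fc + spec[r0 + 1, c0 + 1] * fr * fc)
    scores = vals.sum(axis=1)                  # discrete version of Equation (6)
    return float(phis[np.argmax(scores)] % 180.0)
```

Restricting the search to a narrow window around the CNN class keeps the Fourier analysis from jumping to an unrelated peak, such as one produced by stationary noise or gravity waves, while still recovering sub-class resolution.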

Application to River Flow Measurement
To verify the applicability of the method to real-time measurements in a real river, we compared the CNN estimates (combined with the Fourier predominant angular analysis, as described in the previous section) with the results of the conventional gradient tensor method, the Fourier predominant angular analysis method (FTMaxAngle), manual analysis, and LSPIV analysis using the Fudaa-LSPIV software (Version 1.7.3, available as free software from EDF and Irstea, Paris, France) [37,38]. Assuming a real-time measurement system, the conventional STIV methods used a common set of parameters for all cases (the parameter values are described in Section 2.2), without case-by-case adjustment. In contrast, the LSPIV parameters had to be adjusted for each case because it is difficult to analyze the various cases with common parameters. The settings of the LSPIV analysis are given in Table 4.

Table 4. The details of the settings of the LSPIV analysis. Velocities were calculated as the average of the velocities of all frames after filtering out those with correlation coefficients less than 0.5.

Figures 10-13 show the measurement conditions for the cases in which the STI is classified as normal, shadow, light, and wavy, respectively [26]. In the normal case, the water surface is fairly flat and the flow velocity is nearly constant, so the STI exhibits a linear parallel pattern without significant noise. In the shadow case, part of the search line crosses the area shaded by the bridge, and the STI shows a non-uniform texture. The light case was recorded at night, and reflections from a streetlight make part of the STI too bright, making it difficult to detect the texture that carries the flow signal. In the wavy case, boil vortices actively interact with the water surface.
In this case, the combination of the dispersive waves caused by the boil vortices and the advection of water surface features produces a complex pattern in the STI.

The results are shown in Figures 14 and 15 and Table 5. In the normal case, STIV and LSPIV show good consistency. However, in the shadow and light cases, where the videos were taken under unfavorable conditions, the velocities analyzed by LSPIV are unstable and underestimated, especially on the shore opposite the camera (the left bank side in Figures 14 and 15), where the image resolution is lower. In the wavy case, LSPIV is again consistent with STIV, as in the normal case, but shows the same tendency to underestimate on the opposite shore (the right bank side in Figure 15). These results are discussed in a later section. As shown in Figures 14 and 15, the CNN corrects most of the outliers produced by the conventional STIV methods, yielding robust estimates that are consistent with the manual analysis. However, in the wavy case, there are some STIs for which the CNN estimates differ from the manual estimates. An example is shown in Figure 16, in which linear peaks due to gravity waves appear. These patterns resemble those produced by turbulence (the flow signal) but have different, coexisting slopes.
In such cases, these peaks can be mistaken for the flow signal corresponding to the mean surface velocity. This occurs because the training data artificially generated with Perlin noise did not include such patterns.

To correct these false detections, 500 STIs showing patterns similar to the false-detection cases were collected from actual river videos, and the CNN was additionally trained on them. Figure 17 shows examples of the STIs used as additional training data. Manually confirmed angles were used as the labels (ground truths).
Because this additional dataset is considerably smaller than the 36,000 STIs artificially generated with Perlin noise, the artificial STIs and the STIs collected from actual river videos were used for training in a 4:1 ratio by random sampling with replacement. Figure 18 shows the results of re-estimating the false-detection cases of Figure 16 with the CNN after the additional training. Figure 18 shows that the estimates are greatly improved by additionally learning cases in which multiple peaks coexist in the STI.
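One possible reading of the 4:1 mixing of artificial and real-river STIs with sampling "with duplication" (i.e., with replacement) is sketched below at the level of a single training batch. The function name, the per-batch granularity, and the use of Python's `random.choices` are illustrative assumptions; the paper does not specify how its sampler was implemented.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def mixed_training_batch(artificial: Sequence[T], real: Sequence[T],
                         batch_size: int, ratio: Tuple[int, int] = (4, 1),
                         seed: Optional[int] = None) -> List[T]:
    """Hypothetical sampler: draw a batch in which artificial and real-river
    STIs appear in the given ratio. Each pool is sampled with replacement, so
    the small real pool (e.g., 500 STIs) can be reused as often as needed."""
    rng = random.Random(seed)
    n_art = batch_size * ratio[0] // (ratio[0] + ratio[1])
    n_real = batch_size - n_art
    batch = rng.choices(artificial, k=n_art) + rng.choices(real, k=n_real)
    rng.shuffle(batch)  # interleave the two sources within the batch
    return batch


if __name__ == "__main__":
    artificial = [f"art{i}" for i in range(36000)]
    real = [f"real{i}" for i in range(500)]
    batch = mixed_training_batch(artificial, real, batch_size=50, seed=0)
    print(sum(s.startswith("real") for s in batch))  # prints 10
```

Sampling with replacement lets the 500 real STIs keep a fixed share of each batch even though the artificial pool is 72 times larger.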

Discussions
In the adverse cases shown in Figures 13 and 14, the STIV method using the CNN was shown to be stable without any parameter adjustment. Comparison with the LSPIV results in these cases clearly demonstrates the robust measurement performance of STIV. On the other hand, several recent studies have addressed the uncertainty introduced into image velocimetry methods by adverse conditions, especially in PIV-based methods [39,40]. The parameter sensitivity control and velocity correction methods proposed in these studies may stabilize and improve the LSPIV results relative to the present LSPIV cases, but further examination is necessary. Nevertheless, as far as streamwise flow measurement for discharge estimation is concerned, the proposed STIV technique yielded more favorable results than the conventional LSPIV technique. As shown in Figures 16-18, although machine learning methods such as deep learning are vulnerable to patterns not seen during training, one of their major advantages is that accuracy can be continuously improved by learning additional data. As data accumulate through continuous observation, further improvements in accuracy can be expected from the enlarged training set. It may also be possible to build a CNN that is automatically optimized for each observation point by learning the patterns unique to that point. As for calculation speed, after training, the CNN took about 0.5-1 s per STI to estimate an angle on a consumer-level computer (a CPU supporting the AVX instruction set is required, but most recent CPUs meet this requirement). Although this is slower than the gradient tensor method or the Fourier predominant angular analysis method (less than 0.1 s per STI), it is fast enough for a real-time measurement system, since typical real-time systems measure at intervals of about 10 min to 1 h.
The most time-consuming part of the STIV process is STI generation (less than 10 s per STI in most cases for a 15-s video at 30 fps), and the total STIV measurement time is within 1 to 2 min in most cases. Finally, although a direct comparison may be difficult, LSPIV took about 10 times longer than STIV in the present analysis.
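The real-time argument above is a simple time budget: the time to generate and analyze all STIs must fit within the measurement interval. The sketch below uses the per-STI figures quoted in the text (under 10 s for STI generation, up to 1 s for CNN inference, a 10-min interval) but assumes a hypothetical count of 8 search lines processed sequentially; the actual number of search lines and any parallelism depend on the site and implementation.

```python
from dataclasses import dataclass


@dataclass
class TimingBudget:
    n_search_lines: int      # hypothetical number of STIs per measurement
    sti_generation_s: float  # time to build one STI from the video
    cnn_inference_s: float   # time for the CNN to estimate one angle
    interval_s: float        # time between consecutive measurements

    def total_s(self) -> float:
        """Worst case: STIs are generated and analyzed one after another."""
        return self.n_search_lines * (self.sti_generation_s + self.cnn_inference_s)

    def fits(self) -> bool:
        """True if one full measurement finishes before the next one starts."""
        return self.total_s() <= self.interval_s


if __name__ == "__main__":
    video_s, fps = 15, 30
    print("time samples per STI:", video_s * fps)  # prints 450
    budget = TimingBudget(n_search_lines=8, sti_generation_s=10.0,
                          cnn_inference_s=1.0, interval_s=10 * 60)
    print(budget.total_s(), budget.fits())  # prints 88.0 True
```

Even this sequential worst case leaves most of a 10-min interval unused, which is why the slower CNN inference (compared with the gradient tensor method) does not limit real-time operation.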

Conclusions
In this study, we applied a deep learning method to STIV's STI pattern gradient detection process and verified its effectiveness in realizing a real-time measurement system using STIV. It was confirmed that the new method can provide favorable flow velocity estimations with no parameter adjustment even when the conventional methods yield erroneous results and manual adjustment is required. Unlike conventional methods, the new method has the advantage of being able to continuously improve accuracy by learning from further examples as training data. Although the deep learning method takes time to learn the data, the inference calculation speed in the actual operation after learning is fast, and it is very suitable for real-time measurement systems.