Article

Deep Learning-Based Enhancement for Surface Velocity Measurements in Tidal Estuaries

1 Department of Civil and Disaster Prevention Engineering, National United University, Miaoli 360302, Taiwan
2 Department of Civil Engineering, Diponegoro University, Semarang 50275, Indonesia
* Authors to whom correspondence should be addressed.
Water 2026, 18(4), 468; https://doi.org/10.3390/w18040468
Submission received: 12 January 2026 / Revised: 3 February 2026 / Accepted: 9 February 2026 / Published: 11 February 2026

Abstract

Accurate estimation of river surface velocity is essential for hydrological monitoring and flood management. However, conventional Large-Scale Particle Image Velocimetry (LSPIV) is often affected by errors arising from inaccurate Region of Interest (ROI) delineation and interference from floating objects or vessels. To overcome these limitations, this study integrates LSPIV with two deep learning models, SegNet and YOLOv8, to enable automated ROI segmentation and vessel detection. SegNet performs real-time identification of water body regions, while YOLOv8 detects and removes vessel intrusions within the ROI, thereby enhancing the precision of velocity estimation. Six field experiments were conducted to assess the performance of the proposed system. The deep learning-enhanced LSPIV achieved Root Mean Square Error (RMSE) values ranging from 0.048 to 0.11 m/s and Normalized RMSE (NRMSE) values between 3.53% and 10.34%, with coefficients of determination (R2) exceeding 0.895 when compared with Acoustic Doppler Current Profiler (ADCP) measurements. SegNet-based ROI segmentation reduced RMSE by up to 0.046 m/s and NRMSE by up to 3.44%, and improved R2 by up to 0.012, while image enhancement further improved segmentation accuracy under varying illumination conditions. Moreover, YOLOv8 successfully detected all vessel intrusions observed in this study, thereby reducing the discrepancies between LSPIV and ADCP-derived velocities from 0.032–0.345 m/s to 0.022–0.314 m/s. Overall, the integration of LSPIV with SegNet and YOLOv8 establishes a highly automated and accurate framework for river surface velocity estimation, demonstrating strong potential for real-time hydrological monitoring and flood risk assessment.


1. Introduction

Monitoring river velocity is fundamental to effective river management. As the key variable for discharge estimation, velocity underpins reservoir operation, agricultural irrigation, and urban water supply planning. Velocity observations further capture flood-wave propagation in near real time, supporting early warning and evacuation. Traditional float and current-meter techniques are limited to small rivers or localized measurements. Among established methods, the Acoustic Doppler Current Profiler (ADCP) provides high-precision, contact-based observations, whereas Large-Scale Particle Image Velocimetry (LSPIV) offers a non-intrusive and flexible alternative that has gained increasing prominence in both research and practice. Non-contact measurement techniques do not disturb the observed target and are therefore generally regarded as providing estimates that are closer to the true values than those obtained using contact-based methods. Consequently, such techniques have been increasingly adopted across a broad range of applications, including flow measurements in bridge engineering [1,2].
LSPIV, introduced in 1998 as an adaptation of conventional Particle Image Velocimetry (PIV) for natural rivers [3], enables instantaneous measurement of surface velocity fields over areas of several hundred square meters, with typical errors within 10% [4,5,6]. Numerous field studies have confirmed strong agreement between LSPIV-derived surface velocities and ADCP measurements, demonstrating sufficient accuracy for operational monitoring [7,8,9]. Early implementations relied on fixed cameras [10,11], later expanding to handheld cameras and smartphones [12,13,14,15]. More recently, the integration of LSPIV with unmanned aerial vehicles (UAVs) has provided high mobility, enabling monitoring in remote sites and during flash-flood events [16,17,18].
In parallel, rapid advances in deep learning have transformed computer vision. Since the breakthrough of Convolutional Neural Networks (CNNs) in 2012 [19,20], four primary application domains have emerged: image classification, semantic segmentation, object detection, and instance segmentation. Representative semantic segmentation models include U-Net, SegNet, and DeepLab v3+, introduced in 2015, 2017, and 2018, respectively [21,22,23]. DeepLab v3+ generally achieves the highest accuracy but with greater computational cost, while SegNet and U-Net offer lower computational demands at the expense of reduced precision. In object detection, the “You Only Look Once” (YOLO) framework marked a milestone by integrating localization and classification into a single-pass model, enabling real-time detection [24]. Successive YOLO versions, from YOLOv3 and YOLOv4 to YOLOv7 and YOLOv8, have further advanced accuracy and efficiency [25,26,27,28].
Although LSPIV has demonstrated satisfactory accuracy and flexibility, its application in real river environments remains subject to several challenges. Accurate delineation of the region of interest (ROI) is critical for reliable LSPIV-based velocity estimation; however, fixed or manually defined ROI may inadvertently include near-bank stagnant zones or exposed shallow areas resulting from water level fluctuations [5,29]. In addition, non-hydrodynamic features within the camera field of view, such as vessels or intruding animals and birds, can introduce errors in velocity estimation [4,10]. These limitations hinder the implementation of LSPIV for long-term and fully automated monitoring.
In response to these practical challenges, recent research has increasingly integrated deep learning techniques into image-based river velocity measurement approaches such as LSPIV, Space–Time Image Velocimetry (STIV), and optical flow methods to enhance both accuracy and automation [30]. For instance, CNNs have been embedded within STIV frameworks to automatically identify flow-texture orientations, thereby improving robustness under complex flow conditions [31,32,33,34]. Similarly, the combination of optical flow algorithms with deep learning enables velocity estimation without manual feature extraction, demonstrating both accuracy and practical feasibility [35,36]. The Recurrent All-Pairs Field Transforms (RAFT) model further advances optical flow applications by reliably reconstructing velocity fields under varying illumination and viewing conditions [37,38,39]. Although still emerging, the application of deep learning to non-contact, image-based river velocity measurements exhibits strong potential for improving accuracy while reducing the need for human intervention.
Within this context, STIV and optical flow techniques have already benefited from deep learning integration, whereas LSPIV has yet to be systematically coupled with such methods. To address this gap, the present study integrates LSPIV with two deep learning models—SegNet for semantic segmentation and YOLOv8 for object detection—to enhance surface velocity estimation. In contrast to previous studies that directly integrate deep learning models into STIV or optical flow-based velocity estimation, this study adopts an alternative integration strategy by applying deep learning to key preprocessing stages of LSPIV (Table 1). Specifically, deep learning is employed for automated ROI identification and vessel interference detection, thereby enhancing the automation level and robustness of LSPIV-based velocity estimation under complex flow conditions while preserving the physical interpretability inherent to the cross-correlation principles of LSPIV. Accordingly, this study does not seek to develop or modify deep learning model architectures; instead, it focuses on the systematic integration of established deep learning models into the LSPIV-based river surface velocity measurement framework to improve automation and practical applicability. The main contributions of this study are as follows:
(1)
Automated ROI detection: For tidal rivers with highly variable water extents, SegNet automatically delineates the ROI, within which LSPIV is applied for velocity computation.
(2)
Automated intrusion detection: YOLOv8 identifies intruding objects (e.g., vessels), enabling the exclusion of affected regions from analysis and thereby reducing estimation bias.
(3)
Adaptation to field conditions: The robustness and feasibility of the proposed approach are assessed under representative field conditions to evaluate its real-world applicability.

2. Materials and Methods

2.1. Model Architecture

This study develops a deep learning-enhanced LSPIV framework for river surface velocity measurement (Figure 1). In this framework, YOLOv8 is applied to detect intruding objects, while SegNet segments the water body. The intrusion-free water body region is defined as the ROI for subsequent LSPIV analysis. Concurrently, the water boundary extracted by SegNet is treated as the water level line, from which the water level is derived using a virtual gauge. Accurate water level estimation is critical for reliable surface velocity calculation. The LSPIV procedure incorporates the collinearity equation, image matching, and spatial intersection, with collinearity parameters obtained from a tri-axial accelerometer. By integrating SegNet and YOLOv8, the framework enables flow velocity estimation within an automatically adjusted ROI that adapts to water level variations, while eliminating interference from intruding objects. The program was implemented in MATLAB (R2024b) and executed on a workstation equipped with an Intel i7-11700 2.5 GHz CPU, 32 GB DDR4 RAM, and an NVIDIA GeForce RTX 3060 12 GB GPU.
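For illustration, the core of this processing chain can be expressed as a minimal MATLAB sketch. The file, variable, and model names below are hypothetical, and the sketch assumes pretrained networks loaded from disk together with the Deep Learning and Computer Vision Toolboxes; it is not the authors' implementation.

```matlab
% Minimal sketch of the Figure 1 processing chain (names are hypothetical).
S = load('segnetWater.mat');   segnetModel = S.net;       % trained SegNet (assumed file)
D = load('yolov8Vessel.mat');  vesselDet   = D.detector;  % trained YOLOv8 (assumed file)

roi    = semanticseg(frame, segnetModel) == "water";      % water-body mask
bboxes = detect(vesselDet, frame);                        % vessel boxes [x y w h]
for k = 1:size(bboxes, 1)                                 % blank out vessel regions
    b = round(bboxes(k, :));
    rows = max(1, b(2)) : min(size(roi, 1), b(2) + b(4));
    cols = max(1, b(1)) : min(size(roi, 2), b(1) + b(3));
    roi(rows, cols) = false;
end
% LSPIV cross-correlation (Section 2.5) is then evaluated only at grid
% points where roi is true, so the ROI tracks the water level and
% automatically excludes intruding vessels.
```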

2.2. Study Site

The field study was conducted on the Tamsui River, a major river system in northern Taiwan. Originating from the confluence of the Dahan and Xindian Rivers, the Tamsui flows through Taipei City before discharging into the Taiwan Strait. With a total length of approximately 159 km, it is vital to the region’s water supply, transportation, and flood control [40]. The research focused on a section near Taipei Bridge, where hydrological conditions are strongly influenced by tidal dynamics, urban runoff, and seasonal rainfall. This site was selected because of its frequent surface velocity fluctuations and high vessel traffic, both of which pose challenges to achieving accurate LSPIV measurements. Such conditions provide a rigorous test environment for evaluating the performance of deep learning–enhanced LSPIV. A camera was mounted at the Tamsui River water level gauge station to capture images encompassing the full river width (Figure 2). The river width varies from approximately 360 m at high tide to about 200 m at low tide.

2.3. Data Collection

The Water Resources Agency of Taiwan (Tenth River Management Branch, Ministry of Economic Affairs) conducts annual full-tide measurements of the Tamsui River using an ADCP. The ADCP records cross-sectional velocity and discharge at 30 min intervals and was employed in this study to validate LSPIV-derived surface velocity.
LSPIV observations were carried out from 2019 to 2024, with six field experiments conducted on 17 July 2019; 5 July 2020; 25 June 2021; 29 June 2022; 18 June 2023; and 20 June 2024. During each campaign, image acquisition was performed between 9:00 a.m. and 5:00 p.m. at 10 min intervals. Each acquisition consisted of three image sets, with five consecutive frames per set captured at a 0.05 s interval. The image resolution was 1624 × 1234 pixels.
To generate training datasets for SegNet and YOLOv8, an additional field survey was conducted on 22 December 2024, from 9:00 a.m. to 5:00 p.m. Images were captured every 30 min at 20 fps, with five frames collected per interval, yielding 85 raw images. These images were augmented using MATLAB’s Image Augmenter toolbox through rotation, shearing, and color adjustment, resulting in 500 images for SegNet training. In parallel, 76 vessel-containing images were collected and similarly augmented to produce 500 images for the YOLOv8 training dataset. For the SegNet training dataset, images were manually annotated in MATLAB by labeling each pixel as either “water” or “non-water,” enabling the model to learn river extents and bank boundaries. For the YOLOv8 training dataset, vessel targets were manually annotated in MATLAB using bounding boxes. Only a single class (vessel) was defined to specifically identify and remove interference objects.
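As a concrete illustration of this augmentation step, the following MATLAB sketch uses imageDataAugmenter with hypothetical parameter ranges; the exact rotation, shear, and color settings used in the study are not reported, and the folder name is a placeholder.

```matlab
% Sketch of the augmentation pipeline (parameter ranges are illustrative).
imds = imageDatastore('rawTrainingImages');      % the 85 raw frames (assumed folder)
aug  = imageDataAugmenter( ...
    'RandRotation', [-10 10], ...                % random rotation (degrees)
    'RandXShear',   [-5 5], ...                  % random horizontal shear (degrees)
    'RandYShear',   [-5 5]);                     % random vertical shear (degrees)
augimds = augmentedImageDatastore([240 320], imds, 'DataAugmentation', aug);
% Color adjustment can be added by wrapping the datastore in a transform()
% call that applies jitterColorHSV to each image.
```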

2.4. Deep Learning Models

2.4.1. SegNet Model

SegNet was adopted as the semantic segmentation model in this study based on considerations of segmentation accuracy, computational cost, and compatibility with the MATLAB environment. Compared with models such as DeepLab, SegNet requires relatively lower computational and memory resources, making it suitable for applications that demand efficient processing. In comparison with U-Net, SegNet provides a more streamlined architecture and a mature, stable implementation in MATLAB, thereby supporting reliable integration into the proposed workflow.
SegNet comprises two main components: an encoder that extracts features using convolutional layers with downsampling, and a decoder that reconstructs spatial details through upsampling and stored pooling indices [22]. Unlike conventional architectures that discard spatial information during downsampling, SegNet preserves these indices, enabling accurate boundary reconstruction during decoding [41].
In this study, the MATLAB built-in SegNet model was employed. The network consists of 59 layers and 66 connections, with both the encoder and decoder containing eight convolutional layers. Input images were set to 320 × 240 pixels. Convolutional filters measured 3 × 3 pixels, with 64 filters per layer and a stride of 1. Batch Normalization layers were configured with a mean and variance decay factor of 0.1. Training was performed with a maximum of 100 epochs and a batch size of 10.
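Under the configuration described above, the training setup can be sketched in MATLAB as follows. Folder names are placeholders, and an encoder depth of 4 is assumed so that each side contains eight 3 × 3 convolutional layers with 64 filters, consistent with the text.

```matlab
% Sketch of the SegNet training setup (folder names are placeholders).
imds   = imageDatastore('trainImages');                    % 320 x 240 input frames
pxds   = pixelLabelDatastore('trainLabels', ["water" "nonwater"], [1 2]);
lgraph = segnetLayers([240 320 3], 2, 4);                  % depth 4 -> eight 3x3 conv layers per side
opts   = trainingOptions('adam', ...
    'MaxEpochs', 100, ...
    'MiniBatchSize', 10, ...
    'Plots', 'training-progress');
net = trainNetwork(combine(imds, pxds), lgraph, opts);
```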

2.4.2. YOLOv8 Model

YOLO is a real-time object detection framework that performs classification and localization in a single network pass, in contrast to multi-stage detection methods that rely on region proposals [24,42]. This one-stage design enables efficient and accurate image analysis, making YOLO well suited for continuous monitoring of rapidly varying river conditions.
YOLOv8 consists of three modules: Backbone, Neck, and Head. The Backbone extracts features via convolutional layers and C2f modules, while the Spatial Pyramid Pooling Fast (SPPF) module captures multi-scale information. The Neck integrates features across resolutions using a Path Aggregation Network—Feature Pyramid Network (PAN-FPN) structure with upsampling and concatenation. The Head adopts an anchor-free design, directly performing bounding box regression and classification, thereby improving generalization and inference speed.
MATLAB provides a built-in YOLOv8 model for direct analysis. When retraining is required, MATLAB invokes the Python package, limiting the parameters adjustable within MATLAB. In this study, the YOLOv8-m model was used, with an input size of 320 × 240 pixels, an initial learning rate of 0.001, a maximum of 100 epochs, and a batch size of 10.
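For reference, applying the detector in MATLAB can be sketched as below. The class name yolov8ObjectDetector is provided by the Computer Vision Toolbox YOLO v8 support package, the single "vessel" class follows retraining, and the threshold value is illustrative.

```matlab
% Sketch of vessel detection with the medium YOLOv8 model (threshold is
% illustrative; retraining runs through the Python backend as noted above).
det = yolov8ObjectDetector('yolov8m');                     % medium variant
[bboxes, scores] = detect(det, frame, 'Threshold', 0.5);   % vessel boxes and confidences
annotated = insertObjectAnnotation(frame, 'rectangle', bboxes, scores);
```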

2.5. LSPIV

The collinearity equation, derived from the principle of perspective projection, defines the mapping between real-world object coordinates and their image plane counterparts. When lens distortion is considered during camera calibration, the equations are expressed as Equations (1) and (2) [6], where the left-hand side represents image coordinates and the right-hand side denotes real-world coordinates. The distortion corrections Δu and Δv are specified in Equations (3) and (4) [8].
$$u_n - u_c - \Delta u = -f\,\frac{r_{11}(x_n - x_c) + r_{12}(y_n - y_c) + r_{13}(z_n - z_c)}{r_{31}(x_n - x_c) + r_{32}(y_n - y_c) + r_{33}(z_n - z_c)} \tag{1}$$

$$v_n - v_c - \Delta v = -f\,\frac{r_{21}(x_n - x_c) + r_{22}(y_n - y_c) + r_{23}(z_n - z_c)}{r_{31}(x_n - x_c) + r_{32}(y_n - y_c) + r_{33}(z_n - z_c)} \tag{2}$$

$$\Delta u = (u - C_u)\left(q_1 r^2 + q_2 r^4\right) + p_1\left(r^2 + 2(u - C_u)^2\right) + 2p_2(u - C_u)(v - C_v) \tag{3}$$

$$\Delta v = (v - C_v)\left(q_1 r^2 + q_2 r^4\right) + p_2\left(r^2 + 2(v - C_v)^2\right) + 2p_1(u - C_u)(v - C_v) \tag{4}$$
The perspective center, denoted as (xc, yc, zc), represents the camera position in real-world coordinates, while (uc, vc) specifies its location in the image plane. The focal length f defines the distance from the perspective center to the image plane. The rotation matrix elements rij are derived from the three rotation angles: azimuth (α), tilt (τ), and roll (θ). The object point coordinates in real-world space are expressed as (xn, yn, zn), and their conjugate image coordinates are denoted as (un, vn). The principal point, or image center, is represented by (Cu, Cv). Lens distortion is accounted for using the radial distortion coefficients q1 and q2, the tangential distortion coefficients p1 and p2, and the radial distance r from the image center.
In the space intersection method, conventional LSPIV applies the back-intersection approach, in which the parameters of the collinearity equations are solved using the coordinates of ground control points. Even when the interior orientation parameters (uc, vc, f, q1, q2, p1, and p2) are obtained through laboratory calibration, at least three control points are still required to determine the six exterior orientation parameters (xc, yc, zc, α, τ, and θ). For river surface velocity estimation, however, a two-dimensional forward intersection method can be applied. In this case, the exterior parameter zc is obtained from elevation measurements, the tilt angle (τ) is determined using a tri-axial accelerometer, and the remaining four parameters are assumed to be zero.
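To make the forward-intersection step concrete, a minimal sketch is given below: it inverts Equations (1) and (2) for a point on the water surface plane z = W. The sign conventions and the struct layout are assumptions for illustration, not the authors' code.

```matlab
% Sketch: project an undistorted image point (u, v) onto the water surface
% plane z = W (sign conventions assumed to follow Eqs. (1) and (2)).
function XY = image2world(u, v, cam, W)
% cam: struct with fields f, Cu, Cv, xc, yc, zc, and R (3x3 rotation matrix
% built from azimuth, tilt, and roll; only tilt is nonzero in this study).
dImg = [-(u - cam.Cu); -(v - cam.Cv); cam.f];    % viewing ray in the camera frame
dWld = cam.R' * dImg;                            % rotate the ray into the world frame
t    = (W - cam.zc) / dWld(3);                   % scale factor to reach the plane z = W
XY   = [cam.xc + t*dWld(1), cam.yc + t*dWld(2)]; % world X, Y on the water surface
end
```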
A tri-axial accelerometer provides a novel means of estimating the camera orientation [9]. When the accelerometer lies horizontally, the gravitational acceleration vector (g) points vertically downward, resulting in X- and Y-axis components of 0 g, while the Z-axis component equals g. When the accelerometer is tilted, gravitational acceleration is distributed across all three axes, with values dependent on orientation. These components are directly related to spatial attitude and can be expressed as follows:
$$\tan\kappa_1 = \frac{A_x}{\sqrt{A_y^2 + A_z^2}} \tag{5}$$

$$\tan\beta_1 = \frac{A_y}{\sqrt{A_x^2 + A_z^2}} \tag{6}$$

$$\tan\gamma_1 = \frac{A_z}{\sqrt{A_x^2 + A_y^2}} \tag{7}$$
where Ax, Ay, and Az denote the gravitational acceleration components along the X-, Y-, and Z-axes, respectively. The angles κ1, β1, and γ1 represent the orientation of the object relative to the horizontal plane, with κ1 and β1 corresponding to the τ and θ angles in the collinearity equations.
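A direct numerical sketch of Equations (5) to (7), with hypothetical accelerometer readings, is:

```matlab
% Sketch of Eqs. (5)-(7) with hypothetical static accelerometer readings.
Ax = 0.12;  Ay = -0.05;  Az = 0.99;              % gravity components (g)
kappa1 = atand(Ax / sqrt(Ay^2 + Az^2));          % corresponds to the tilt angle tau
beta1  = atand(Ay / sqrt(Ax^2 + Az^2));          % corresponds to the roll angle theta
gamma1 = atand(Az / sqrt(Ax^2 + Ay^2));          % inclination of the z-axis
```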
Image matching is a critical step in LSPIV, where surface velocity is derived by tracking the displacement of tracer particles on the water surface across successive frames. The process begins by defining an Interrogation Area (IA) in the initial frame (e.g., t = 0 s) centered on the target point. In the subsequent frame (e.g., t = 0.05 s), a Search Area (SA) is specified to locate the displaced target. When the flow direction is unknown, the SA is extended symmetrically around the IA; if the flow direction is known, the SA is extended preferentially along the flow to enhance computational efficiency. The IA is then iteratively shifted within the SA, and the location with the highest similarity is identified as the match.
Similarity was evaluated using the correlation coefficient method [43,44], which ranges from 0 (no similarity) to 1 (perfect similarity). The correlation coefficient (cc) is defined as follows:
$$cc = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n}\left(A_{ij} - \bar{A}\right)\left(B_{ij} - \bar{B}\right)}{\sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n}\left(A_{ij} - \bar{A}\right)^2\,\sum_{i=1}^{m}\sum_{j=1}^{n}\left(B_{ij} - \bar{B}\right)^2}} \tag{8}$$
where Aij denotes the gray value of the pixel in row i, column j of the IA in the first frame, Bij expresses the corresponding pixel in the second frame, $\bar{A}$ represents the mean gray value of all pixels in the first IA, and $\bar{B}$ indicates the mean gray value of all pixels in the second IA.
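A minimal sketch of this matching step is given below. It assumes a 32 × 32 pixel IA, a search margin of 16 pixels on each side, and an assumed ground scale; normxcorr2 (Image Processing Toolbox) computes the same correlation coefficient as Equation (8).

```matlab
% Sketch: match a 32x32 IA from frame 1 within a 64x64 SA in frame 2
% (window sizes, margins, and the ground scale are illustrative).
g1 = im2double(rgb2gray(frame1));
g2 = im2double(rgb2gray(frame2));
IA = g1(r0:r0+31, c0:c0+31);                     % IA around the target point
SA = g2(r0-16:r0+47, c0-16:c0+47);               % SA extended 16 px on each side
cc = normxcorr2(IA, SA);                         % correlation coefficient map (Eq. (8))
[~, peak] = max(cc(:));
[pr, pc]  = ind2sub(size(cc), peak);
dy = (pr - 31) - 17;                             % row displacement of the IA (px)
dx = (pc - 31) - 17;                             % column displacement of the IA (px)
v  = hypot(dx, dy) * metresPerPixel / 0.05;      % speed for the 0.05 s frame interval
                                                 % (metresPerPixel: assumed ground scale)
```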

2.6. Virtual Water Gauge

The virtual water gauge integrates real-world elevation data into the computational domain by mapping elevation values to their corresponding image coordinates. Using the SegNet model to identify the waterline, the intersection between the waterline and the virtual gauge provides the image space Y coordinate of the water level. The corresponding water level elevation is then determined through the virtual gauge method [45], as expressed in Equation (9):
$$W = \frac{W_2 - W_1}{Y_2 - Y_1}\left(Y - Y_1\right) + W_1 \tag{9}$$
where W1 denotes the elevation of the levee crest (m), W2 expresses the elevation of the levee toe (m), W represents the water level elevation (m), Y1 and Y2 indicate the v-axis coordinates of the crest and toe of the levee in the image (pixels), and Y is the v-axis coordinate of the detected waterline (pixels).
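As a worked example of Equation (9), with hypothetical survey values:

```matlab
% Worked sketch of Eq. (9) (all values are hypothetical).
W1 = 8.5;   Y1 = 120;    % levee crest: elevation (m), image v-coordinate (px)
W2 = 1.2;   Y2 = 840;    % levee toe:   elevation (m), image v-coordinate (px)
Y  = 660;                % SegNet-detected waterline v-coordinate (px)
W  = (W2 - W1)/(Y2 - Y1) * (Y - Y1) + W1;        % water level elevation = 3.025 m
```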

2.7. Evaluation Metrics

2.7.1. Metrics for Deep Learning Models

The performance of the SegNet and YOLOv8 models was evaluated using standard metrics. SegNet results were reported in terms of Loss and Mean Accuracy [46,47], while YOLOv8 performance was assessed using Loss and mean Average Precision (mAP) [48,49]. The respective formulations are summarized below:
SegNet:
$$\text{SegNet Loss} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c}\log\left(\hat{y}_{i,c}\right) \tag{10}$$

$$\text{Mean Accuracy} = \frac{1}{C}\sum_{c=1}^{C}\frac{TP_c}{Total_c} \tag{11}$$

YOLOv8:

$$\text{YOLO Loss} = \lambda_{box}L_{box} + \lambda_{cls}L_{cls} + \lambda_{dfl}L_{dfl} \tag{12}$$

$$L_{box} = 1 - IoU\left(b_{pred}, b_{gt}\right) \tag{13}$$

$$L_{cls} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log\hat{y}_i + \left(1 - y_i\right)\log\left(1 - \hat{y}_i\right)\right] \tag{14}$$

$$L_{dfl} = -\frac{1}{N}\sum_{i=1}^{N} q_i\log\left(p_i\right) \tag{15}$$

$$mAP = \frac{1}{C}\sum_{c=1}^{C}AP_c \tag{16}$$

$$AP_c = \sum_{j=1}^{M}P_j\,\Delta R_j \tag{17}$$

$$P_j = \frac{TP}{TP + FP} \tag{18}$$

$$\Delta R_j = \frac{TP}{TP + FN} \tag{19}$$
where N denotes the number of pixels, C the number of classes, yi,c the ground-truth label of pixel i for class c, $\hat{y}_{i,c}$ the predicted probability, TPc the number of correctly classified pixels in class c, and Totalc the total number of pixels in class c. For YOLOv8, λbox, λcls, and λdfl are weighting coefficients, IoU denotes intersection-over-union, bpred and bgt are the predicted and ground-truth bounding boxes, yi is the true label, $\hat{y}_i$ the predicted probability, qi the target distribution, pi the predicted distribution, APc the average precision for class c, M the number of sampling points, Pj the precision at point j, and ΔRj the change in recall. TP, FP, and FN represent true positives, false positives, and false negatives, respectively.

2.7.2. Statistical Errors for Surface Velocity Measurement

The accuracy of LSPIV-derived surface velocities was validated against reference velocities measured by ADCP. Three error metrics were used: Root Mean Square Error (RMSE), Normalized RMSE (NRMSE), and the coefficient of determination (R2) [50,51]. RMSE reflects the average magnitude of prediction errors, with larger discrepancies weighted more heavily. NRMSE normalizes RMSE relative to the observed velocity range, facilitating comparisons across datasets. The R2 statistic quantifies the proportion of variance in the ADCP data explained by LSPIV predictions.
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(V_i^{LSPIV} - V_i^{ADCP}\right)^2} \tag{20}$$

$$NRMSE = \frac{RMSE}{V_{max} - V_{min}} \times 100\% \tag{21}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(V_i^{LSPIV} - V_i^{ADCP}\right)^2}{\sum_{i=1}^{n}\left(V_i^{ADCP} - \bar{V}^{ADCP}\right)^2} \tag{22}$$
where $V_i^{LSPIV}$ and $V_i^{ADCP}$ represent the surface velocities from LSPIV and ADCP, respectively; n denotes the number of samples; Vmax and Vmin express the maximum and minimum velocities measured by LSPIV; and $\bar{V}^{ADCP}$ is the mean ADCP velocity.
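These three metrics reduce to a few lines of MATLAB; vLSPIV and vADCP are assumed to be paired velocity vectors of equal length.

```matlab
% Sketch of Eqs. (20)-(22); vLSPIV and vADCP are paired velocity vectors.
rmse  = sqrt(mean((vLSPIV - vADCP).^2));                   % Eq. (20)
nrmse = rmse / (max(vLSPIV) - min(vLSPIV)) * 100;          % Eq. (21), in percent
r2    = 1 - sum((vLSPIV - vADCP).^2) ...
          / sum((vADCP - mean(vADCP)).^2);                 % Eq. (22)
```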
The Taylor diagram [52] provides a statistical visualization framework for simultaneously evaluating multiple simulations against observational data within a single plot. It integrates the correlation coefficient (R), standard deviation (σ), and RMSE in a polar coordinate system, offering a comprehensive depiction of the agreement and deviations between model outputs and observations.
In this study, six experimental cases were analyzed; thus, both the standard deviation and RMSE were normalized in the Taylor diagram. The normalized standard deviation (NSD) was calculated by dividing the standard deviations of river surface velocity obtained from LSPIV and deep learning-based LSPIV by that derived from the ADCP. Similarly, the normalized RMSE (δ) was determined by dividing the RMSE of each method by the ADCP-derived standard deviation of river surface velocity.
$$NSD = \frac{\sigma_{LSPIV}}{\sigma_{ADCP}} = \frac{\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(V_i^{LSPIV} - \bar{V}^{LSPIV}\right)^2}}{\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(V_i^{ADCP} - \bar{V}^{ADCP}\right)^2}} \tag{23}$$

$$\delta = \frac{RMSE}{\sigma_{ADCP}} \tag{24}$$

$$R = \sqrt{R^2} \tag{25}$$
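Continuing the sketch above, the normalized statistics plotted in the Taylor diagram follow directly; note that MATLAB's std uses the (n − 1) denominator of Equation (23) by default.

```matlab
% Sketch of Eqs. (23)-(25), continuing from the rmse and r2 computed above.
nsd   = std(vLSPIV) / std(vADCP);                % normalized standard deviation
delta = rmse / std(vADCP);                       % RMSE normalized by ADCP std
R     = sqrt(r2);                                % correlation coefficient
```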

3. Results

3.1. Training and Validation Results of the SegNet Model

The training and validation outcomes of the SegNet model for detecting the ROI on river surface images are presented in this section. A total of 500 images were used, with 70% allocated for training and 30% for validation. Model performance was assessed using loss and mean accuracy metrics to provide a comprehensive evaluation of the learning process.
As shown in Figure 3a, the training mean accuracy (orange line) began at approximately 30%, reflecting a high initial error rate. Accuracy improved steadily during training, reaching 92%, indicating that SegNet effectively learned from the training data as its parameters were optimized. In contrast, the validation mean accuracy (blue line) exhibited fluctuations throughout training, which is typical given differences between training and validation datasets. These variations do not suggest substantial differences in performance between datasets.
Regarding loss (Figure 3b), the training loss decreased consistently from 1.5 to 0.5 over the course of training, demonstrating reduced prediction error. The validation loss followed a similar pattern, further confirming comparable performance between training and validation.

3.2. Training and Validation Results of the YOLOv8 Model

The YOLOv8 model exhibited training and validation patterns similar to those of SegNet, suggesting robust performance for both approaches. A total of 500 images were used, with the same 70/30 training–validation split.
For mAP (Figure 3c), the training curve rose steadily from 45% to 95%, while the validation mAP started at 35% and reached 95% by the end of training, showing consistent performance across datasets. Training loss (Figure 3d) began at 4.75, a higher starting point compared with SegNet, reflecting initial challenges in prediction accuracy. Nevertheless, the training loss decreased progressively to 0.7, indicating effective learning. Validation loss also started at 4.75 and converged to 0.7, with smaller oscillations than SegNet. This stability may be attributed to YOLO’s relatively simpler object detection task, which promoted more consistent performance across both datasets.

3.3. Comparison Between Deep Learning-Based LSPIV and ADCP

To assess the accuracy of the deep learning-based LSPIV, the derived velocity fields and contour maps were first examined for plausibility and subsequently compared with ADCP measurements. Figure 4 presents the river surface velocities obtained from the deep learning-based LSPIV at two-hour intervals between 8:00 a.m. and 6:00 p.m. on 5 July 2020. In each subfigure, the upper panel shows the velocity fields, while the lower panel displays the geometrically corrected velocity contours. The results demonstrate that the deep learning-based LSPIV effectively captures both ebb and flood tidal stages.
At 2:00 p.m. (Figure 4d), the velocity vectors within the vessel region at the upper-left corner were successfully removed (yellow boxes), confirming the effective performance of YOLOv8 in vessel detection. In addition, the velocity fields at 4:00 p.m. and 6:00 p.m. (Figure 4e,f) contain 22 rows of velocity vectors, whereas 23 rows were observed at other times. This reduction corresponds to a declining water level that exposed the sandbar along the upper image boundary; this non-water area was excluded from the ROI segmented by SegNet.
All velocities were transformed into real-world coordinates using collinearity equations, and the contours were interpolated within the SegNet-derived ROI. As shown in Figure 4, the Y-axis extent of the contours varies temporally, with a maximum of 360 m (Figure 4b) and a minimum of 190 m (Figure 4f), indicating that SegNet successfully identified dynamic changes in the water surface associated with water level fluctuations. Finally, the velocities along the X-axis (−10 m ≤ x ≤ 10 m) were averaged at 1 m intervals and compared with the corresponding ADCP measurements.
A comparison between deep learning-based LSPIV and ADCP measurements across six dates highlights the accuracy and reliability of these methods for estimating river surface velocity (Figure 5). The deep learning-based LSPIV successfully captured the overall velocity patterns, including ebb tide (positive velocity) and flood tide (negative velocity). ADCP measurements were acquired every 30 min, while LSPIV measurements were taken every 10 min from 8:00 a.m. to 6:00 p.m., except in 2019 when LSPIV data were collected at 30 min intervals.
To quantitatively assess the influence of individual deep learning modules on LSPIV-based velocity estimation, multiple statistical metrics, including the RMSE, NRMSE, and R2, were employed for comparative evaluation (Table 2). The LSPIV–ADCP configuration without deep learning was used as the baseline. The configuration denoted as “LSPIV with SegNet” was adopted to evaluate the contribution of automated region-of-interest segmentation, whereas “LSPIV with YOLOv8” was used to independently assess the effect of vessel interference detection and removal. The integrated configuration, referred to as “DL-ADCP,” incorporates both components to examine whether their performance gains are additive.
The results show that incorporating SegNet alone substantially reduced RMSE and NRMSE across all experiments, with RMSE decreasing from 0.054–0.131 m/s for the baseline to 0.051–0.111 m/s. The introduction of YOLOv8 alone further reduced estimation errors, although the magnitude of improvement was smaller, indicating that vessel interference removal still contributes to enhanced velocity estimation. When SegNet and YOLOv8 were jointly applied (DL-ADCP), the lowest RMSE (0.048–0.110 m/s) and NRMSE (3.529–10.341%) were consistently obtained across all experiments, with R2 values generally exceeding 0.895. These results demonstrate that the contributions of SegNet and YOLOv8 are complementary rather than redundant.

4. Discussion

4.1. Impact of SegNet-Detected ROI on River Surface Velocity Measurements

This section evaluates the performance of traditional LSPIV and deep learning-based LSPIV through comparison with ADCP measurements across six experimental cases. In the figures, the traditional LSPIV–ADCP comparison is denoted as LSPIV–ADCP, while the deep learning-based LSPIV–ADCP comparison is denoted as DL–ADCP. Figure 6 presents the results in terms of RMSE, NRMSE, and R2, illustrated using bar charts and a Taylor diagram.
Figure 6a,b demonstrate that deep learning-based LSPIV consistently outperformed traditional LSPIV in all six experiments, indicating that the use of a dynamic ROI enhances the accuracy of surface velocity estimation. In contrast, the fixed ROI used in traditional LSPIV included shallow water regions with near-zero velocity, which inflated the overall error in velocity calculations. Figure 6c further shows that, despite these limitations, traditional LSPIV results still exhibited strong correlations with ADCP measurements, with all R2 values exceeding 0.88. Nevertheless, employing a dynamic ROI to isolate flowing-water regions yielded more accurate results.
As illustrated in Figure 6d, the DL–ADCP results (blue points) consistently aligned more closely with the reference line (NSD = 1.0) and the overall reference position compared with the LSPIV–ADCP results (red points). Although both approaches achieved correlation coefficients exceeding 0.94, the dynamic ROI identified by SegNet demonstrated superior agreement with ADCP measurements. These findings indicate that integrating SegNet-based ROI detection prior to LSPIV analysis substantially enhances the accuracy of river surface velocity estimation relative to conventional fixed-ROI LSPIV.
The outcomes of this study are consistent with previous research [53,54,55]. A traditional image-processing method was used to delineate flowing regions in river imagery and establish an adaptive ROI, reducing the mean NRMSE of velocity estimates from 11.06% to 4.87% [53]. Another study defined the ROI based on the spatial distribution of detected surface particles, yielding a 65% reduction in RMSE for LSPIV-derived velocities [54]. Similarly, a deep learning model (YOLOv8) for dynamic ROI selection achieved a difference of only 3% between LSPIV and ADCP velocities in a single event [55]. In the present study, across six experiments, two cases showed reductions in RMSE exceeding 40%, while four cases demonstrated differences of less than 5% between LSPIV and ADCP when a dynamic ROI was applied.

4.2. The Impact of Lighting and Shadows on SegNet’s ROI Detection Accuracy

The performance of SegNet under varying environmental conditions was evaluated through a combination of visual inspection and statistical error analysis. This analysis aimed to assess the influence of lighting-induced inaccuracies in the detected ROI on river surface velocity estimation. Figure 7 presents, from left to right, the original image, the segmentation mask predicted by SegNet on the original image, the enhanced image, and the corresponding segmentation mask generated from the enhanced image. In the segmentation results, water areas are shown in blue, riverbanks in green, and the manually annotated waterline (red solid line) is used as the reference.
To mitigate the effects of surface glare and non-uniform illumination on waterline identification by SegNet, an image enhancement procedure was applied prior to analysis. Local contrast enhancement was performed using contrast-limited adaptive histogram equalization (CLAHE), with a clip limit of 0.01 and a tile grid size of 8 × 8, to suppress strong surface reflections and enhance flow-related texture features. In addition, global brightness normalization was conducted by computing the mean image luminance and adjusting the overall brightness toward a mid-range level to achieve balanced illumination. All image enhancement procedures were implemented using built-in MATLAB image processing functions and were consistently applied across the entire image dataset.
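The enhancement settings above can be reproduced with built-in functions. The sketch below applies CLAHE to the luminance channel and shifts the mean brightness toward mid-gray, which is one plausible reading of the global normalization described; it is illustrative rather than the authors' exact procedure.

```matlab
% Sketch of the enhancement step: CLAHE on the luminance channel followed
% by a global brightness shift toward mid-gray (0.5).
lab = rgb2lab(frame);
L   = lab(:,:,1) / 100;                          % luminance rescaled to [0, 1]
L   = adapthisteq(L, 'ClipLimit', 0.01, 'NumTiles', [8 8]);
lab(:,:,1) = L * 100;
enhanced = lab2rgb(lab);
enhanced = enhanced + (0.5 - mean(enhanced(:))); % normalize mean brightness
enhanced = min(max(enhanced, 0), 1);             % clip to the valid range
```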
Two representative scenarios were examined. The first occurred at 6:00 p.m. on 5 July 2020, when the camera was oriented toward the setting sun, generating extensive glare patches across the water surface. Under these conditions, reduced water levels exposed nearshore sandbanks, which exhibited glare-like reflections under direct illumination. As a result, the SegNet-detected boundary (i.e., the interface between water and bank) deviated substantially from the actual waterline. Following glare reduction, the predicted waterline and ROI exhibited improved consistency with the reference. The second scenario was recorded at 8:00 a.m. on 25 June 2021, under overcast conditions. In this case, the transition between riverbank and water was indistinct, leading SegNet to underestimate the waterline elevation. Brightness and contrast adjustments enhanced the image quality, allowing SegNet to more accurately delineate the waterline.
River surface velocities derived from deep learning-based LSPIV using both original and enhanced images were compared against ADCP measurements. Evaluation was conducted using statistical metrics including RMSE, NRMSE, R2, and Taylor diagram analysis. The results indicate that enhanced images yielded velocity estimates marginally closer to ADCP measurements than those from the original images, although the overall differences remained small. As shown in Figure 8, RMSE, NRMSE, and R2 values were nearly identical for both cases. Improvements were primarily evident during specific time windows, particularly in early morning and late afternoon, when insufficient illumination or direct glare compromised Closed-Circuit Television (CCTV) recordings. Nevertheless, the Taylor diagram demonstrates that enhanced images consistently provided superior agreement with ADCP data.
Because LSPIV is an optical technique, illumination variability exerts a strong influence on ROI detection and velocity estimation. Previous studies support this conclusion. A 10–15% variation in light intensity produced velocity deviations exceeding 0.2 m/s [14]. Glare removal and contrast correction reduced velocity residuals by approximately 30% [56]. Brightness fluctuations in synthetic river videos induced optical-flow biases of up to 0.25 m/s, particularly under backlit conditions [57]. Low-light image enhancement further improved segmentation accuracy by 10–25% [58]. Relative to these studies, the image preprocessing applied in the present work reduced LSPIV-derived river surface velocity RMSE by approximately 0.02 m/s and NRMSE by about 1%. The comparatively smaller improvement can be attributed to the use of daily experimental datasets, where enhancements observed during morning and afternoon periods were diluted by other intervals. Nevertheless, the findings confirm that image preprocessing effectively mitigates the impact of illumination variability on ROI segmentation accuracy in deep learning models.

4.3. Investigation of YOLO’s Capability in Detecting Vessels

The Tamsui River’s sufficient water depth supports regular commercial vessel activity. During the experiments, a vessel equipped with an ADCP was also employed for river velocity measurements and occasionally entered the ROI. Accordingly, this section evaluates the performance of YOLOv8 in detecting vessels entering the ROI and assesses the potential influence of such intrusions on LSPIV-based river surface velocity estimation.
Under the conditions examined in this study, YOLOv8 successfully detected all vessel intrusions within the ROI (Figure 9); no false positives or missed detections were observed for the datasets considered. The figure illustrates four representative cases captured at 12:00 p.m. on 25 June 2021; 1:00 p.m. and 4:00 p.m. on 18 June 2023; and 8:00 a.m. on 20 June 2024. This high accuracy is primarily attributed to the strong visual contrast between the vessels and the surrounding water, which enables YOLOv8 to identify and delineate vessels effectively. Nevertheless, this performance is limited to the scenarios investigated: under more complex environmental conditions, such as dense floating debris, strong shadow effects, or large variations in vessel scale, false or missed detections may still occur. In the figure, detected vessels are enclosed by yellow bounding boxes, with confidence scores displayed above each box. For instance, at 8:00 a.m. on 20 June 2024, a confidence score of 0.8243 indicates that YOLOv8 was 82.43% confident that the detected object was a vessel.
The detection results are consistent with those reported in previous studies [59,60]. An improved YOLOv8-based model, YOLO-MRS, was developed to identify 14 types of maritime vessels, achieving accuracy rates between 75.1% and 98.9% [59]. Similarly, another enhanced YOLOv8 variant for maritime vessel detection, YOLO-SEA, achieved an average accuracy of 88.2% [60]. The perfect detection accuracy obtained in this study can be attributed to the lack of vessel classification and the relatively simple riverine environment. Unlike in oceanic conditions, river vessels are not partially obscured by waves, thereby reducing detection errors.
The second row of Figure 9 presents the river surface velocity field derived from LSPIV after excluding the vessel regions identified by YOLOv8, while the third row provides a magnified view of the vessel area. These results demonstrate that labeling and filtering vessel regions allow LSPIV to effectively avoid areas of interference. Further comparisons among LSPIV, DL-LSPIV, and ADCP measurements at four time points corresponding to vessel intrusions are shown in Figure 10. Negative values denote flood-tide conditions, whereas positive values represent ebb tides. The comparison reveals that conventional LSPIV exhibits substantial deviations in regions affected by vessels, as it does not account for their interference. In contrast, the DL-LSPIV method, which integrates YOLOv8-based vessel detection and filtering, yields higher accuracy by excluding vessel-influenced zones, resulting in a more reliable representation of river surface velocities.
In the early development of LSPIV, most studies focused on mitigating error sources identified in Kim's 2006 doctoral dissertation [4,12], while the impact of physical intrusions received limited attention. Wave shoaling and breaking caused by vessels can substantially increase hydrodynamic stress, thereby altering surface ripple patterns across the river [61]. Current practices typically address physical intrusions by manually selecting frames prior to LSPIV analysis to minimize interference [62,63,64]. Therefore, the deep learning-based approach proposed in this study, which automatically detects and filters physically intrusive regions, offers a novel and efficient means to enhance the automation and accuracy of LSPIV analyses for river surface velocity estimation.

4.4. Limitations and Future Work

This study integrates SegNet and YOLOv8 into the LSPIV framework to improve the accuracy and robustness of river surface velocity estimation. SegNet is employed for real-time identification of the ROI, while YOLOv8 is used to detect and exclude vessels intruding into the ROI. By combining these techniques, the reliability of LSPIV-based velocity analysis is substantially enhanced.
Compared with fully deep learning-based velocity estimation approaches, the proposed hybrid framework offers several practical advantages. First, LSPIV is based on well-established cross-correlation theory with clear physical interpretation, ensuring result consistency and interpretability across a wide range of flow conditions. Second, deep learning is applied only to tasks where it demonstrates clear strengths, namely ROI segmentation and interference object detection, thereby reducing dependence on large training datasets. Finally, this hybrid design improves robustness and transferability under complex environmental conditions while preserving the operational reliability of LSPIV, making the framework suitable for long-term, real-world river monitoring. Because the proposed framework operates at the image preprocessing stage, it can be readily extended to other image-based velocity measurement techniques, such as STIV and optical flow methods.
However, several limitations remain. The image dataset used for model training was obtained from a single source and exhibited limited variability in weather and environmental conditions. Although experiments were conducted between 08:00 and 18:00, all data were collected at a single station during the summer season, which may constrain the generalizability of SegNet and YOLOv8 to other rivers, seasons, or extreme environmental conditions [65,66]. In addition, varying illumination and shadow effects can occasionally cause SegNet to misclassify the ROI, for instance, failing to detect narrow flow channels or confusing riverbank shadows with open water [67,68]. Although image enhancement improved ROI segmentation accuracy in this study, it also increased the overall complexity of the automated workflow. Similarly, YOLOv8 demonstrates reduced performance when detecting vessels or floating debris of significantly different scales, particularly when targets are small or the background is visually complex [69,70]. Uncertainty may also arise in waterline identification, particularly under strong surface reflections, shallow inundation, or rapid water level fluctuations, where the land–water interface becomes ambiguous. In addition, camera installation geometry and viewing orientation can influence measurement accuracy; deviations from assumed camera pose or imperfect geometric calibration may propagate errors into the velocity analysis [6].
Future research can address these limitations in two main directions. First, model validation should be extended to additional river reaches to enhance applicability and generalization under diverse hydrological and environmental conditions. Transfer learning could also be employed to enable faster adaptation of the proposed framework to new monitoring sites [71,72]. Second, incorporating more advanced segmentation and detection models or adopting multimodal analysis could further improve the accuracy of ROI identification and interference object detection [69,73,74,75,76]. Moreover, given its automated nature, the proposed system could be integrated with onsite imaging sensors for real-time data transmission and analysis, supporting applications such as flood early warning and river surveillance [77,78]. These advancements would enhance the reliability, adaptability, and automation of the proposed approach under broader and more challenging operational conditions.

5. Conclusions

This study integrates LSPIV with deep learning models, SegNet and YOLOv8, to achieve automated ROI segmentation and vessel detection within the ROI, thereby enhancing the accuracy of LSPIV-based river surface velocity estimation. The performance improvements resulting from the incorporation of SegNet and YOLOv8 were systematically evaluated, and four major conclusions were drawn as follows:
(1)
The integration of SegNet and YOLOv8 significantly improved the accuracy of LSPIV-derived surface velocities. Across six experimental cases, the RMSE between the deep learning-enhanced LSPIV system and ADCP measurements ranged from 0.048 m/s to 0.11 m/s, with NRMSE values between 3.53% and 10.34%, and coefficients of determination (R2) exceeding 0.895.
(2)
When using SegNet alone for automatic ROI segmentation, the RMSE between LSPIV and ADCP measurements was reduced by 0.003–0.046 m/s compared with conventional LSPIV. Correspondingly, NRMSE decreased by 0.24–3.44%, while R2 increased by 0.001–0.012, demonstrating the model’s effectiveness in improving flow velocity estimation accuracy.
(3)
SegNet-based water body segmentation was found to be sensitive to variations in illumination, occasionally resulting in misclassification along ROI boundaries. Nevertheless, adjusting image brightness and contrast effectively mitigated these errors. After correction, the RMSE decreased by 0.001–0.018 m/s, NRMSE decreased by 0.09–1.06%, and R2 increased by 0.001–0.006 across six experimental cases.
(4)
The deep learning-based YOLOv8 detector effectively identified and excluded all vessels intruding into the ROI under the conditions examined in this study. In four test cases, the velocity difference between LSPIV and ADCP decreased from 0.032–0.345 m/s to 0.022–0.314 m/s following vessel filtering.
In summary, this study presents a novel framework that integrates LSPIV with SegNet and YOLOv8 to enable automated ROI identification and real-time vessel interference detection. The proposed approach enhances the level of automation in LSPIV while substantially improving the accuracy and reliability of river surface velocity measurements. Furthermore, the framework is readily extendable to STIV and optical flow techniques, thereby supporting the long-term and operational deployment of image-based measurement methods for river monitoring under practical field conditions.

Author Contributions

Conceptualization, W.-C.L. and W.-C.H.; methodology, W.-C.L., W.-C.H. and W.W.; software, W.-C.H. and W.W.; validation, W.-C.L., S. and W.-C.H.; formal analysis, W.W.; investigation, W.-C.L. and S.; resources, W.-C.L.; data curation, W.W. and W.-C.H.; writing—original draft preparation, W.-C.H. and W.W.; writing—review and editing, W.-C.L. and S.; visualization, W.-C.H. and W.W.; supervision, W.-C.L. and S.; project administration, W.-C.L. and W.-C.H.; funding acquisition, W.-C.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially supported by the National Science and Technology Council (NSTC), Taiwan (under grant number MOST 114-2625-M-239-001) and the World Class University (Visiting Professor) Program, Universitas Diponegoro 2025/2026.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Chen, G.B.; Chen, W.L.; Kim, K.C. An investigation of aerodynamic characteristics of leading-edge jet and trailing-edge suction set in the bridge main girder. Phys. Fluids 2025, 37, 075175. [Google Scholar] [CrossRef]
  2. Chen, G.; Chen, W.L.; Chen, C.; Gao, D.; Meng, H.; Kim, K.C. Experimental and coupled model investigation of an active jet for suppressing vortex-induced vibration of a box girder. J. Fluids Struct. 2024, 127, 104119. [Google Scholar] [CrossRef]
  3. Fujita, I.; Muste, M.; Kruger, A. Large-scale particle image velocimetry for flow analysis in hydraulic engineering applications. J. Hydraul. Res. 1998, 36, 397–414. [Google Scholar] [CrossRef]
  4. Muste, M.; Fujita, I.; Hauet, A. Large-scale particle image velocimetry for measurements in riverine environments. Water Resour. Res. 2008, 44, W00D19. [Google Scholar] [CrossRef]
  5. Le Coz, J.; Hauet, A.; Pierrefeu, G.; Dramais, G.; Camenen, B. Performance of image-based velocimetry (LSPIV) applied to flash-flood discharge measurements in Mediterranean rivers. J. Hydrol. 2010, 394, 42–52. [Google Scholar] [CrossRef]
  6. Liu, W.C.; Huang, W.C.; Young, C.C. Uncertainty analysis for image-based streamflow measurement: The influence of ground control points. Water 2023, 15, 123. [Google Scholar] [CrossRef]
  7. Bechle, A.J.; Wu, C.H.; Liu, W.C.; Kimura, N. Development and application of an automated river–estuary discharge imaging system. J. Hydraul. Eng. 2012, 138, 327–339. [Google Scholar] [CrossRef]
  8. Huang, W.C.; Young, C.C.; Liu, W.C. Application of an automated discharge imaging system and LSPIV during typhoon events in Taiwan. Water 2018, 10, 280. [Google Scholar] [CrossRef]
  9. Liu, W.C.; Huang, W.C. Development of a three-axis accelerometer and large-scale particle image velocimetry (LSPIV) to enhance surface velocity measurements in rivers. Comput. Geosci. 2021, 155, 104866. [Google Scholar] [CrossRef]
  10. Hauet, A.; Creutin, J.D.; Belleudy, P. Sensitivity study of large-scale particle image velocimetry measurement of river discharge using numerical simulation. J. Hydrol. 2008, 349, 178–190. [Google Scholar] [CrossRef]
  11. Tsubaki, R.; Fujita, I.; Tsutsumi, S. Measurement of the flood discharge of a small-sized river using an existing digital video recording system. J. Hydro-Environ. Res. 2011, 5, 313–321. [Google Scholar] [CrossRef]
  12. Kim, Y.; Muste, M.; Hauet, A.; Krajewski, W.F.; Kruger, A.; Bradley, A. Stream discharge using mobile large-scale particle image velocimetry: A proof of concept. Water Resour. Res. 2008, 44, W09502. [Google Scholar] [CrossRef]
  13. Dramais, G.; Le Coz, J.; Camenen, B.; Hauet, A. Advantages of a mobile LSPIV method for measuring flood discharges and improving stage–discharge curves. J. Hydro-Environ. Res. 2011, 5, 301–312. [Google Scholar] [CrossRef]
  14. Lewis, Q.W.; Rhoads, B.L. Resolving two-dimensional flow structure in rivers using large-scale particle image velocimetry: An example from a stream confluence. Water Resour. Res. 2015, 51, 7977–7994. [Google Scholar] [CrossRef]
  15. Le Boursicaud, R.; Pénard, L.; Hauet, A.; Thollet, F.; LeCoz, J. Gauging extreme floods on YouTube: Application of LSPIV to home movies for the post-event determination of stream discharges. Hydrol. Process. 2016, 30, 90–105. [Google Scholar] [CrossRef]
  16. Liu, W.C.; Lu, C.H.; Huang, W.C. Large-scale particle image velocimetry to measure streamflow from videos recorded from unmanned aerial vehicle and fixed imaging system. Remote Sens. 2021, 13, 2661. [Google Scholar] [CrossRef]
  17. Koutalakis, P.; Zaimes, G.N. River flow measurements utilizing UAV-based surface velocimetry and bathymetry coupled with sonar. Hydrology 2022, 9, 148. [Google Scholar] [CrossRef]
  18. Schlobies, K.; Välimäki, J.M.; Takala, T.E.; Kärkkäinen, M.; Kuzmin, A.; Lotsari, E.S. Combining hydroacoustics and large-scale particle image velocimetry: Flow dynamics at Hiitolanjoki River restoration site in Southeast Finland. J. Hydrol. Reg. Stud. 2025, 58, 102291. [Google Scholar] [CrossRef]
  19. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the 26th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; Available online: https://dl.acm.org/doi/10.5555/2999134.2999257 (accessed on 3 December 2012).
  20. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [Google Scholar] [CrossRef]
  21. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Munich, Germany, 5–9 October 2015. [Google Scholar] [CrossRef]
  22. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495. [Google Scholar] [CrossRef]
  23. Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder–decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar] [CrossRef]
  24. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar] [CrossRef]
  25. Redmon, J.; Farhadi, A. YOLO v3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar] [CrossRef]
  26. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar] [CrossRef]
  27. Sharma, A.; Kumar, V.; Longchamps, L. Comparative performance of YOLOv8, YOLOv9, YOLOv10, YOLOv11 and Faster R-CNN models for detection of multiple weed species. Smart Agric. Technol. 2024, 9, 100648. [Google Scholar] [CrossRef]
  28. Khanam, R.; Asghar, T.; Hussain, M. Comparative performance evaluation of YOLOv5, YOLOv8, and YOLOv11 for solar panel defect detection. Solar 2025, 5, 6. [Google Scholar] [CrossRef]
  29. Jolley, M.J.; Russell, A.J.; Quinn, P.F.; Perks, M.T. Considerations when applying Large-Scale PIV and PTV for determining river flow velocity. Front. Water 2021, 3, 709369. [Google Scholar] [CrossRef]
  30. Tlhomole, J.B.; Hughes, G.O.; Zhang, M.; Piggott, M.D. From PIV to LSPIV: Harnessing deep learning for environmental flow velocimetry. J. Hydrol. 2025, 649, 132446. [Google Scholar] [CrossRef]
  31. Watanabe, K.; Fujita, I.; Iguchi, M.; Hasegawa, M. Improving accuracy and robustness of space–time image velocimetry (STIV) with deep learning. Water 2021, 13, 2079. [Google Scholar] [CrossRef]
  32. Gao, L.; Zhang, Z.; Chen, L.; Li, H. River surface space–time image velocimetry based on dual-channel residual network. Appl. Sci. 2025, 15, 5284. [Google Scholar] [CrossRef]
  33. Liu, R.; He, D.; Li, N.; Pu, X.; Jin, J.; Wang, J. Estimation of river velocity and discharge based on video images and deep learning. Appl. Sci. 2025, 15, 4865. [Google Scholar] [CrossRef]
  34. Huang, Y.; Chen, H.; Huang, K.; Chen, M.; Wang, J.; Liu, B. Optimization of space-time image velocimetry based on deep residual learning. Measurement 2024, 232, 114688. [Google Scholar] [CrossRef]
  35. Khalid, M.; Pénard, L.; Mémin, E. Optical flow for image-based river velocity estimation. Flow Meas. Instrum. 2019, 65, 110–121. [Google Scholar] [CrossRef]
  36. Jyoti, J.S.; Medeiros, H.; Sebo, S.; McDonald, W. River velocity measurements using optical flow algorithm and unoccupied aerial vehicles: A case study. Flow Meas. Instrum. 2023, 91, 102341. [Google Scholar] [CrossRef]
  37. Teed, Z.; Deng, J. RAFT: Recurrent all-pairs field transforms for optical flow. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020. [Google Scholar] [CrossRef]
  38. Lagemann, C.; Lagemann, K.; Mukherjee, S.; Schröder, W. Generalization of deep recurrent optical flow estimation for particle-image velocimetry data. Meas. Sci. Technol. 2022, 33, 094003. [Google Scholar] [CrossRef]
  39. Kriščiūnas, A.; Čalnerytė, D.; Akstinas, V.; Meilutytė-Lukauskienė, D.; Gurjazkaitė, K.; Barauskas, R. Framework for UAV-based river flow velocity determination employing optical recognition. Int. J. Appl. Earth Obs. Geoinf. 2024, 134, 104154. [Google Scholar] [CrossRef]
  40. Liu, W.C.; Hsu, M.H.; Kuo, A.Y. Modelling of hydrodynamics and cohesive sediment transport in Tanshui River estuarine system. Mar. Pollut. Bull. 2002, 44, 1076–1088. [Google Scholar] [CrossRef] [PubMed]
  41. Eerapu, K.K.; Lal, S.; Narasimhadhan, A.V. O-SegNet: Robust encoder and decoder architecture for objects segmentation from aerial imagery data. IEEE Trans. Emerg. Top. Comput. Intell. 2022, 6, 556–567. [Google Scholar] [CrossRef]
  42. Tomas, J.P.Q.; Tupas, J.E.E.; Soniel, M.T.; Caruz, C.H.M.E.; Babar, D.B. Real-time detection of floating debris in waterways using YOLOv8. In Proceedings of the 2024 14th International Workshop on Computer Science and Engineering (WCSE), Phuket Island, Thailand, 19–21 June 2024. [Google Scholar] [CrossRef]
  43. DeWitt, B.A.; Wolf, P.R. Elements of Photogrammetry with Applications in GIS; McGraw-Hill: Columbus, OH, USA, 2001; pp. 334–341. [Google Scholar]
  44. Akbarpour, F.; Fathi-Moghadam, M.; Schneider, J. Application of LSPIV to measure supercritical flow in steep channels with low relative submergence. Flow Meas. Instrum. 2020, 72, 101718. [Google Scholar] [CrossRef]
  45. Fernandes, F.E.; Nonato, L.G.; Ueyama, J. A river flooding detection system based on deep learning and computer vision. Multimed. Tools Appl. 2022, 81, 40231–40251. [Google Scholar] [CrossRef]
  46. Lu, Y.; Liang, X.; Li, F.W.B. Multi-scale dual-branch fully convolutional network for hand parsing. arXiv 2019, arXiv:1905.10100. [Google Scholar] [CrossRef]
  47. Fujii, H.; Tanaka, H.; Ikeuchi, M.; Hotta, K. X-net with different loss functions for cell image segmentation. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Nashville, TN, USA, 19–25 June 2021. [Google Scholar] [CrossRef]
  48. Wu, Z.; Wang, X.; Jia, M.; Liu, M.; Sun, C.; Wu, C.; Wang, J. Dense object detection methods in RAW UAV imagery based on YOLOv8. Sci. Rep. 2024, 14, 18019. [Google Scholar] [CrossRef]
  49. Bai, L.; Xu, W.H. Improved printed circuit board defect detection scheme. Sci. Rep. 2025, 15, 2389. [Google Scholar] [CrossRef] [PubMed]
  50. Irving, K.; Kuemmerlen, M.; Kiesel, J.; Kakouei, K.; Domisch, S.; Jähnig, S.C. Data descriptor: A high-resolution streamflow and hydrological metrics dataset for ecological modeling using a regression model. Sci. Data 2018, 5, 180224. [Google Scholar] [CrossRef]
  51. Wijaya, F.; Liu, W.C.; Suharyanto; Huang, W.C. Comparative assessment of different image velocimetry techniques for measuring river velocities using unmanned aerial vehicle imagery. Water 2023, 15, 3941. [Google Scholar] [CrossRef]
  52. Taylor, K.E. Summarizing multiple aspects of model performance in a single diagram. J. Geophys. Res. Atmos. 2001, 106, 7183–7192. [Google Scholar] [CrossRef]
  53. Yeh, M.T.; Chung, Y.N.; Huang, Y.X.; Lai, C.W.; Juang, D.J. Applying adaptive LS-PIV with dynamically adjusting detection region approach on the surface velocity measurement of river flow. Comput. Electr. Eng. 2019, 74, 466–482. [Google Scholar] [CrossRef]
  54. Alongi, F.; Pumo, D.; Nasello, C.; Nizza, S.; Ciraolo, G.; Noto, L.V. An automatic ANN-based procedure for detecting optimal image sequences supporting LS-PIV applications for rivers monitoring. J. Hydrol. 2023, 626, 130233. [Google Scholar] [CrossRef]
  55. La Salandra, M.; Colacicco, R.; Panza, S.; Fumai, G.; Dellino, P.; Capolongo, D. RivAIr: A custom-designed UAV-based sensor for real-time water area segmentation and surface velocity estimation. Int. J. Appl. Earth Obs. Geoinf. 2025, 142, 104720. [Google Scholar] [CrossRef]
  56. Perks, M.T.; Dal Sasso, S.F.; Hauet, A.; Jamieson, E.; Le Coz, J.; Pearce, S.; Peña-Haro, S.; Pizarro, A.; Strelnikova, D.; Tauro, F.; et al. Towards harmonisation of image velocimetry techniques for river surface velocity observations. Earth Syst. Sci. Data 2020, 12, 1545–1559. [Google Scholar] [CrossRef]
  57. Bodart, G.; Le Coz, J.; Jodeau, M.; Hauet, A. Synthetic river flow videos for evaluating image-based velocimetry methods. Water Resour. Res. 2022, 58, e2022WR032251. [Google Scholar] [CrossRef]
  58. Jingchun, Z.; Eg Su, G.; Shahrizal Sunar, M. Low-light image enhancement: A comprehensive review on methods, datasets and evaluation metrics. J. King Saud Univ. Comput. Inf. Sci. 2024, 36, 102234. [Google Scholar] [CrossRef]
  59. Yu, C.; Yin, H.; Rong, C.; Zhao, J.; Liang, X.; Li, R.; Mo, X. YOLO-MRS: An efficient deep learning-based maritime object detection method for unmanned surface vehicles. Appl. Ocean Res. 2024, 153, 104240. [Google Scholar] [CrossRef]
  60. Deng, H.; Wang, S.; Wang, X.; Zheng, W.; Xu, Y. YOLO-SEA: An enhanced detection framework for multi-scale maritime targets in complex sea states and adverse weather. Entropy 2025, 27, 667. [Google Scholar] [CrossRef]
  61. Fleit, G.; Baranya, S. LSPIV analysis of ship-induced wave wash. Exp. Fluids 2022, 63, 160. [Google Scholar] [CrossRef]
  62. Daigle, A.; Bérubé, F.; Bergeron, N.; Matte, P. A methodology based on particle image velocimetry for river ice velocity measurement. Cold Reg. Sci. Technol. 2013, 89, 36–47. [Google Scholar] [CrossRef]
  63. Detert, M. How to avoid and correct biased riverine surface image velocimetry. Water Resour. Res. 2021, 57, e2020WR027833. [Google Scholar] [CrossRef]
  64. Bodart, G.; Le Coz, J.; Jodeau, M.; Hauet, A. Quantifying and reducing the operator effect in LSPIV discharge measurements. Water Resour. Res. 2024, 60, e2023WR034740. [Google Scholar] [CrossRef]
  65. Bao, J.; Chen, Y.; Renteria, L.; Barnes, M.; Forbes, B.; McKever, S.; Goldman, A.; Scheibe, T.; Stegen, J. Monitoring river flow status using low-cost wildlife camera and image segmentation artificial intelligence. Environ. Model. Softw. 2025, 194, 106715. [Google Scholar] [CrossRef]
  66. Lopez-Fuentes, L.; Rossi, C.; Skinnemoen, H. River segmentation for flood monitoring. In Proceedings of the 2017 IEEE International Conference on Big Data, Boston, MA, USA, 11–14 December 2017. [Google Scholar] [CrossRef]
  67. Cao, H.; Tian, Y.; Liu, Y.; Wang, R. Water body extraction from high spatial resolution remote sensing images based on enhanced U-Net and multi-scale information fusion. Sci. Rep. 2024, 14, 16132. [Google Scholar] [CrossRef] [PubMed]
  68. Sun, D.; Gao, G.; Huang, L.; Liu, Y.; Liu, D. Extraction of water bodies from high-resolution remote sensing imagery based on a deep semantic segmentation network. Sci. Rep. 2024, 14, 14604. [Google Scholar] [CrossRef] [PubMed]
  69. Chen, L.; Zhu, J. Water surface garbage detection based on lightweight YOLOv5. Sci. Rep. 2024, 14, 6133. [Google Scholar] [CrossRef]
  70. Yu, Z.; Liang, H.; Ye, O.; Zhang, Y. ESOD-YOLOv8: Small object detection enhanced with auto-disturbance rejection convolution. Expert Syst. Appl. 2026, 296, 129046. [Google Scholar] [CrossRef]
  71. Vandaele, R.; Dance, S.L.; Ojha, V. Deep learning for automated river-level monitoring through river-camera images: An approach based on water segmentation and transfer learning. Hydrol. Earth Syst. Sci. 2021, 25, 4435–4453. [Google Scholar] [CrossRef]
  72. Chen, W.; Nguyen, K.A.; Lin, B.S. Deep learning and optical flow for river velocity estimation: Insights from a field case study. Sustainability 2025, 17, 8181. [Google Scholar] [CrossRef]
  73. Zhao, F.; Liu, Y.; Wang, J.; Chen, Y.; Xi, D.; Shao, X.; Tabeta, S.; Mizuno, K. Riverbed litter monitoring using consumer-grade aerial-aquatic speedy scanner (AASS) and deep learning based super-resolution reconstruction and detection network. Mar. Pollut. Bull. 2024, 209, 117030. [Google Scholar] [CrossRef]
  74. Moghimi, A.; Welzel, M.; Celik, T.; Schlurmann, T. A comparative performance analysis of popular deep learning models and segment anything model (SAM) for river water segmentation in close-range remote sensing imagery. IEEE Access 2024, 12, 52067–52085. [Google Scholar] [CrossRef]
  75. Zhang, C.; Yue, J.; Fu, J.; Wu, S. River floating object detection with transformer model in real time. Sci. Rep. 2025, 15, 9026. [Google Scholar] [CrossRef] [PubMed]
  76. Wang, Z.; Lyu, H.; Guo, Y.; Zhou, S.; Zhang, C. How to use general AI for task-specific applications: A case study of monitoring water level trends with river cameras. Environ. Model. Softw. 2025, 192, 106550. [Google Scholar] [CrossRef]
  77. Ran, Q.H.; Li, W.; Liao, Q.; Tang, H.L.; Wang, M.Y. Application of an automated LSPIV system in a mountainous stream for continuous flood flow measurements. Hydrol. Process. 2016, 30, 3014–3029. [Google Scholar] [CrossRef]
  78. Peña-Haro, S.; Carrel, M.; Lüthi, B.; Hansen, I.; Lukes, R. Robust image-based streamflow measurements for real-time continuous monitoring. Front. Water 2021, 3, 766918. [Google Scholar] [CrossRef]
Figure 1. Framework of deep learning-enhanced LSPIV for surface velocity analysis.
Figure 2. The location of the study site in the Tamsui River of northern Taiwan.
Figure 3. Training and validation performance of the deep learning models SegNet and YOLOv8: (a) mean accuracy of SegNet, (b) loss of SegNet, (c) mAP of YOLOv8, and (d) loss of YOLOv8.
Figure 4. Surface velocity fields and contours derived from deep learning-based LSPIV at two-hour intervals between 8:00 a.m. and 6:00 p.m. on 5 July 2020.
Figure 5. Comparison between deep learning-based LSPIV results and ADCP observations on (a) 17 July 2019, (b) 5 July 2020, (c) 25 June 2021, (d) 29 June 2022, (e) 18 June 2023, and (f) 20 June 2024.
Figure 6. Comparison of error metrics for DL–ADCP and LSPIV–ADCP: (a) RMSE, (b) NRMSE, (c) R2, and (d) Taylor diagram.
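Panel (d) of Figure 6 uses a Taylor diagram [52], which summarizes each velocity series by three linked statistics: its standard deviation, its correlation with the ADCP reference, and the centered root-mean-square difference between the two. As a minimal sketch of how those three quantities relate, assuming simple paired samples (the values below are hypothetical, not from this study):

```python
# Statistics displayed on a Taylor diagram (Taylor, 2001): the standard
# deviation of the compared series, its correlation with the reference,
# and the centered RMS difference linking the two.
import numpy as np

def taylor_stats(model, ref):
    """Return (std of model series, correlation with ref, centered RMSD)."""
    model, ref = np.asarray(model), np.asarray(ref)
    sigma_m, sigma_r = model.std(), ref.std()
    corr = np.corrcoef(model, ref)[0, 1]
    # Centered RMSD: E'^2 = sigma_m^2 + sigma_r^2 - 2*sigma_m*sigma_r*corr
    crmsd = np.sqrt(sigma_m**2 + sigma_r**2 - 2 * sigma_m * sigma_r * corr)
    return sigma_m, corr, crmsd

# Hypothetical paired surface velocities (m/s):
sigma, r, e = taylor_stats([0.52, 0.61, 0.70, 0.66], [0.50, 0.63, 0.68, 0.64])
print(f"std = {sigma:.3f} m/s, corr = {r:.3f}, centered RMSD = {e:.3f} m/s")
```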
Figure 7. Performance comparison of SegNet segmentation on original and enhanced images.
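The "enhanced images" in Figure 7 are contrast-adjusted frames intended to stabilize SegNet segmentation under varying illumination. The study's exact enhancement pipeline is given in the main text; purely as an illustrative sketch, the snippet below applies CLAHE to the luminance channel of a frame. The method choice, parameters, and file names here are assumptions, not the configuration used in this work.

```python
# Illustrative pre-processing sketch: local contrast enhancement before
# SegNet segmentation. CLAHE, its clipLimit/tileGridSize values, and the
# file names are assumptions for demonstration only.
import cv2

def enhance_frame(bgr_frame):
    """Apply CLAHE to the luminance channel of a BGR video frame."""
    lab = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2LAB)   # separate luminance from color
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    l_eq = clahe.apply(l)                              # equalize local contrast
    return cv2.cvtColor(cv2.merge((l_eq, a, b)), cv2.COLOR_LAB2BGR)

frame = cv2.imread("frame_0001.png")                   # hypothetical frame file
enhanced = enhance_frame(frame)
cv2.imwrite("frame_0001_enhanced.png", enhanced)
```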
Figure 8. Statistical performance evaluation of original versus enhanced images: (a) RMSE, (b) NRMSE, (c) R2, and (d) Taylor diagram.
Figure 9. River surface velocity fields derived from LSPIV analysis after vessel positions and regions were identified and filtered using YOLOv8. The yellow flow field denotes the flood tide, whereas the green flow field represents the ebb tide.
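As a hedged sketch of the vessel-filtering step illustrated in Figure 9, the snippet below runs a YOLOv8 detector (via the ultralytics package) on a frame and invalidates LSPIV vectors whose grid positions fall inside detected bounding boxes. The weight file name, the simple box-masking rule, and the velocity-field layout are assumptions; the trained model and filtering criteria used in this study may differ.

```python
# Sketch of vessel masking prior to LSPIV averaging. The weights file
# ("vessel_yolov8.pt") is hypothetical; u, v are float arrays of velocity
# components on a pixel-coordinate grid (grid_x, grid_y).
import numpy as np
from ultralytics import YOLO

model = YOLO("vessel_yolov8.pt")  # hypothetical fine-tuned vessel detector

def mask_vessel_vectors(frame, u, v, grid_x, grid_y):
    """Set LSPIV vectors inside detected vessel boxes to NaN."""
    results = model(frame, verbose=False)
    for box in results[0].boxes.xyxy.cpu().numpy():  # (x1, y1, x2, y2) in pixels
        x1, y1, x2, y2 = box
        inside = (grid_x >= x1) & (grid_x <= x2) & (grid_y >= y1) & (grid_y <= y2)
        u[inside] = np.nan  # exclude vessel-contaminated vectors
        v[inside] = np.nan
    return u, v
```

Masked vectors can then be excluded (e.g., with np.nanmean) when computing the cross-sectional mean velocity compared against ADCP in Figure 10.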
Figure 10. Comparison of river surface mean velocities obtained from traditional LSPIV, deep learning-based LSPIV, and ADCP measurements during vessel intrusion within the ROI.
Table 1. Comparison of image-based river surface velocity measurement approaches and proposed framework.

Approach | Core Velocity Estimation | Role of Deep Learning | Automation Level | Physical Interpretability | Suitability for Long-Term Monitoring
Conventional LSPIV | Cross-correlation | Not used | Low–Moderate (manual ROI) | High | Moderate
DL-enhanced STIV | Texture orientation/DL-assisted | Direct velocity-related inference | Moderate–High | Moderate | Moderate
DL-based optical flow | End-to-end or hybrid flow estimation | Direct velocity estimation | High | Low–Moderate | Limited by generalization
DL-assisted LSPIV (proposed framework) | Cross-correlation | Preprocessing (ROI segmentation, vessel detection) | High | High | High
Table 2. Comparison of river surface velocity estimation performance of LSPIV using different deep learning configurations.

Configuration | Statistical Error | 17 July 2019 | 5 July 2020 | 25 June 2021 | 29 June 2022 | 18 June 2023 | 20 June 2024
LSPIV–ADCP | RMSE (m/s) | 0.104 | 0.131 | 0.054 | 0.111 | 0.13 | 0.092
LSPIV–ADCP | NRMSE (%) | 7.621 | 12.208 | 3.964 | 8.175 | 9.556 | 6.745
LSPIV–ADCP | R2 | 0.983 | 0.889 | 0.984 | 0.985 | 0.97 | 0.962
LSPIV with SegNet | RMSE (m/s) | 0.058 | 0.111 | 0.051 | 0.066 | 0.104 | 0.056
LSPIV with SegNet | NRMSE (%) | 4.213 | 10.353 | 3.735 | 4.735 | 7.638 | 3.887
LSPIV with SegNet | R2 | 0.987 | 0.895 | 0.986 | 0.986 | 0.973 | 0.974
LSPIV with YOLOv8 | RMSE (m/s) | 0.101 | 0.130 | 0.053 | 0.105 | 0.128 | 0.088
LSPIV with YOLOv8 | NRMSE (%) | 7.563 | 12.198 | 3.924 | 8.002 | 9.233 | 6.342
LSPIV with YOLOv8 | R2 | 0.983 | 0.89 | 0.984 | 0.985 | 0.971 | 0.966
DL–ADCP | RMSE (m/s) | 0.056 | 0.111 | 0.051 | 0.063 | 0.099 | 0.048
DL–ADCP | NRMSE (%) | 4.088 | 10.341 | 3.727 | 4.647 | 7.258 | 3.529
DL–ADCP | R2 | 0.987 | 0.895 | 0.986 | 0.986 | 0.973 | 0.977