1. Introduction
As one of the key hydrological tasks, the acquisition of river data provides critical decision-making support for flood monitoring and prevention. Real-time, efficient, and accurate measurement of river flow velocity and discharge has become an indispensable means to mitigate the severity of flood disasters. However, natural rivers exhibit complex diversity due to variations in their underlying terrain, and the surrounding natural environment also impedes the implementation of traditional measurement methods. Therefore, exploring efficient and accurate non-contact measurement technologies holds great practical significance [1].
Commonly used image-based non-contact flow velocity and discharge measurement methods include Particle Image Velocimetry (PIV) [2], Large-Scale Particle Image Velocimetry (LSPIV) [3], Space-Time Image Velocimetry (STIV) [4], and the optical flow method, among others. As a traditional non-contact measurement technique, PIV captures flow field information by photographing tracer particles in the target fluid and analyzing their motion trajectories through particle extraction followed by image matching and other procedures. On the basis of PIV, Fujita et al. [3] proposed LSPIV, which utilizes natural floating objects in rivers to replace artificially injected tracer particles, thereby avoiding environmental impacts to a certain extent. However, the performance of LSPIV depends on the types and quantities of floating objects in rivers, and it also suffers from poor real-time performance. To address this issue, Fujita et al. [4] put forward STIV. This method sets velocity measurement lines along the flow direction, plots space-time images by recording the variation of grayscale on the measurement lines over time, and calculates flow velocity according to the texture angles in the space-time images. Compared with LSPIV, STIV is significantly faster to compute. However, this method is highly sensitive to the relative angle between the velocity measurement line and the flow direction, which makes it difficult to achieve high-precision velocity measurement in actual river scenarios with complex flow regimes. The optical flow method breaks free from the limitation of velocity measurement lines. Based on the assumptions of brightness consistency and temporal continuity, it estimates the motion of object images between adjacent frames by leveraging pixel changes and their correlations in sequential images over the time dimension. Horn and Schunck [5] added the global smoothness assumption to the basic hypotheses and proposed the classic Horn-Schunck (H-S) optical flow method. The global smoothness assumption of the H-S method postulates that the optical flow field of the entire image should maintain global smoothness, with abrupt changes in optical flow only permitted at object boundaries. This method computes the optical flow value at each pixel in the image, but it involves complex computations and consumes considerable time. Lucas and Kanade [6] introduced the spatial consistency assumption and proposed the Lucas-Kanade (L-K) optical flow method. The L-K method assumes that pixels within the same local neighborhood of an image have similar motion directions and magnitudes. Nevertheless, targets often move rapidly and discontinuously, which violates the small motion constraint and thus introduces substantial errors into the measurements of the L-K method. To remedy this defect, Bouguet [7] proposed the pyramid-based L-K optical flow method, which systematically integrates image pyramids with the affine L-K optical flow method. This integration ensures that fast-moving objects still satisfy the small motion constraint when represented in small-sized images. Farnebäck [8] proposed a method that approximates each neighborhood between adjacent frames using a quadratic polynomial, thereby deriving an approach to estimate the displacement field based on polynomial expansion coefficients. This method not only addresses the accuracy deficiency of traditional methods when dealing with large-displacement and complex motion scenarios but also guarantees favorable stability. Noting that optical flow research had mostly focused on accuracy while neglecting time complexity, and therefore could not meet the real-time processing requirements of general field scenarios, Kroeger et al. [9] proposed a fast optical flow calculation method based on Dense Inverse Search (DIS). By adopting inverse search to find block correspondences, multi-scale aggregation to generate dense displacement fields, and variational refinement, this method achieves a significant reduction in time complexity while effectively guaranteeing prediction accuracy.
In recent years, with the rapid development of deep learning technology, the introduction of neural networks has facilitated optical flow analysis under complex conditions. Dosovitskiy et al. [10] proposed FlowNet, the first Convolutional Neural Network (CNN)-based optical flow prediction model, which enables high-precision real-time optical flow estimation through training on synthetic datasets. Ilg et al. [11] put forward FlowNet 2.0. It stacks multiple FlowNetC and FlowNetS sub-networks, which feature explicit correlation and a simple encoder-decoder architecture, respectively, and introduces FlowNetSD, a sub-network specifically optimized for small displacements, thereby achieving a significant improvement in estimation accuracy. Huang et al. [12] proposed FlowFormer, an optical flow estimation model based on the Transformer architecture, which achieves high precision and strong generalization in optical flow prediction by constructing a 4D cost volume and performing efficient encoding and decoding. Xu et al. [13] presented the GMFlow model; by abandoning numerous iterative refinement steps and directly modeling pixel correspondences via global matching, while addressing issues such as occlusion and out-of-boundary pixels, this model achieves a balance between large displacement handling capability, accuracy, and efficiency. However, the natural river environment is complex, and the motion states of non-rigid water flow are highly variable [14]. Both traditional methods and deep learning-based optical flow models focus on pixel-level processing and are unable to conduct physical-level analysis of water flow, which means the accuracy of these methods in estimating water flow displacement still needs further improvement.
According to the Universal Approximation Theorem [15], neural networks can approximate any continuous function on a compact domain to arbitrary accuracy. Building on this, Raissi et al. [16] proposed Physics-Informed Neural Networks (PINNs), which incorporate physical prior knowledge into the training of neural networks. They used Fully Connected Neural Networks (FCNNs) to approximate the solutions of partial differential equations (PDEs), including the Schrödinger equation, Allen-Cahn equation, Navier-Stokes equations, and Korteweg-de Vries equation, and obtained promising results. Subsequently, PINNs have been widely applied in various fields. Some studies have proposed novel methods for applying PINNs to fluid velocity measurement. These methods achieve the reconstruction of velocity fields using limited observational data while preserving the physical consistency of the predicted results. Fan et al. [17] proposed a PINN-based pressure field reconstruction method. By embedding the Navier–Stokes equations into the loss function, they demonstrated how to solve for pressure data from velocity fields measured by PIV. Hasanuzzaman et al. [18] proposed a PINN-based data enhancement method for PIV measurements. By embedding the physical constraints of the Reynolds-averaged Navier–Stokes equations, they successfully reconstructed the velocity field of turbulent boundary layers using domain boundary data. Zhang et al. [19] proposed a hybrid framework, OF-PhyNet, based on optical flow and PINN. This framework extracts the motion characteristics of targets using the H–S optical flow method, and embeds the two-dimensional shallow water equations (SWEs) and the continuity equation into the model to provide physical constraints, thereby achieving robust reconstruction of river surface flow fields. These methods based on PINNs and fluid mechanics knowledge have achieved satisfactory results in the reconstruction of physical fields. However, most of them first obtain measured data from the measurement scenario through traditional approaches and then employ such data for model training, which renders the model performance dependent on data quality to a certain extent. Meanwhile, the use of such labeled data renders the trained models highly specific to a particular problem or scenario [20]. The need to acquire new data and retrain the model under novel scene conditions further limits the generalization performance of these methods.
To address the limitations of existing methods, an image-based flow measurement method combining optical flow and PINN is proposed. This method introduces the convection–diffusion equation to provide physical constraints, ensuring that the model predictions conform to fluid laws. Meanwhile, optical flow information derived from multi-scenario images is adopted to replace the measured data of a single scene, which eliminates the dependence of model performance on measured data and overcomes the limitation of generalization ability, thereby realizing label-free measurement of flow velocity and discharge.
2. Method
2.1. Overview of PINNs
PINNs were originally designed to solve and discover partial differential equations [16]. Consider the following classical form of partial differential equations:
$$\mathcal{F}\big(u(x,t)\big) = 0,$$
$$\mathcal{B}\big(u(x,t)\big) = 0,$$
$$\mathcal{I}\big(u(x,t)\big) = 0.$$
In the equations, $x$ and $t$ represent the spatial coordinates and time coordinates, respectively, while $u$ denotes the system state. $\mathcal{F}$ is a partial differential equation containing several differential operators, $\mathcal{B}$ stands for the boundary condition (BC), and $\mathcal{I}$ is the initial condition (IC). This set of equations is capable of describing most physical problems [21], including the wave equation, heat conduction equation, and Poisson equation.
In the classical framework, PINNs are used to solve the target values of specific systems of equations in a designated domain, and the equation solution $u$ is approximated by the model output $\hat{u}$. To make the model output fit the solution space described by the equations as closely as possible, the loss function of the model consists of two components:
$$\mathcal{L} = w_f \mathcal{L}_f + w_d \mathcal{L}_d,$$
whose weights are represented by $w_f$ and $w_d$, respectively. $\mathcal{L}_f$ and $\mathcal{L}_d$ denote the residuals of the partial differential equation and the data fitting loss, respectively, which are given by the following equations:
$$\mathcal{L}_f = \frac{1}{N_f}\sum_{i=1}^{N_f}\left|\mathcal{F}\big(\hat{u}(x_{f,i}, t_{f,i})\big)\right|^2,$$
$$\mathcal{L}_d = \frac{1}{N_d}\sum_{i=1}^{N_d}\left|\hat{u}(x_{d,i}, t_{d,i}) - u_{d,i}\right|^2.$$
In the equations, $N$ denotes the number of collocation points, and the subscript $i$ indicates the $i$-th collocation point. $\hat{u}$ represents the output of the neural network. $(x_f, t_f)$ denotes the collocation points for the partial differential equation, $(x_d, t_d)$ denotes the collocation points under corresponding data conditions, and $u_d$ denotes the labeled data of the corresponding collocation points.
In the process of equation solving, $(x_d, t_d)$ can be divided into boundary condition collocation points $(x_b, t_b)$ and initial condition collocation points $(x_0, t_0)$, with $w_b$ and $w_0$ denoting the weights of the losses from these two types of collocation points, respectively. Furthermore, $\mathcal{L}_d$ can be expressed as:
$$\mathcal{L}_d = \frac{w_b}{N_b}\sum_{i=1}^{N_b}\left|\hat{u}_{b,i} - u_{b,i}\right|^2 + \frac{w_0}{N_0}\sum_{i=1}^{N_0}\left|\hat{u}_{0,i} - u_{0,i}\right|^2.$$
In the equations, the labeled data at the boundary condition collocation points and initial condition collocation points are denoted by $u_b$ and $u_0$, respectively, and the corresponding model outputs are denoted by $\hat{u}_b$ and $\hat{u}_0$, respectively.
For equation discovery, the purpose is to infer the parameters of the partial differential equation by fitting observed data. Therefore, $(x_d, t_d)$ denotes the observation collocation points $(x_o, t_o)$ for data acquisition, and $u_d$ denotes the data $u_o$ obtained from the corresponding observation points. Furthermore, $\mathcal{L}_d$ can be expressed as:
$$\mathcal{L}_d = \frac{1}{N_o}\sum_{i=1}^{N_o}\left|\hat{u}(x_{o,i}, t_{o,i}) - u_{o,i}\right|^2.$$
Starting from randomly initialized parameters, the model receives inputs for forward propagation to compute predicted values and the initial loss. Subsequently, backpropagation is performed to calculate the gradients of the loss with respect to the model’s learnable parameters based on automatic differentiation. The gradient descent method is used to adjust the parameters in the direction of decreasing loss to achieve residual minimization. Through iterative training, the model can ultimately approximate the state space described by the system of partial differential equations.
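As a concrete illustration of the general PDE, boundary-condition, and initial-condition form used in this section, consider the one-dimensional heat conduction equation, one of the examples named above. The thermal diffusivity $\alpha$, the unit interval, the final time $T$, and the particular boundary and initial data are assumptions chosen purely for illustration; they are not taken from the original work.

```latex
\begin{aligned}
\mathcal{F}:\quad & \frac{\partial u}{\partial t} - \alpha \frac{\partial^2 u}{\partial x^2} = 0,
  && x \in (0,1),\ t \in (0,T], \\
\mathcal{B}:\quad & u(0,t) = u(1,t) = 0,
  && t \in (0,T], \\
\mathcal{I}:\quad & u(x,0) = \sin(\pi x),
  && x \in [0,1].
\end{aligned}
% Exact solution: u(x,t) = e^{-\alpha \pi^2 t} \sin(\pi x).
% Check: u_t = -\alpha \pi^2 u and u_{xx} = -\pi^2 u, so u_t - \alpha u_{xx} = 0.
```

For this example, the PDE residual evaluated on a network output $\hat{u}$ would be $\partial \hat{u}/\partial t - \alpha\, \partial^2 \hat{u}/\partial x^2$ at the interior collocation points, while the data-fitting term would compare $\hat{u}$ with $0$ on the boundary and with $\sin(\pi x)$ at $t = 0$.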
2.2. Convection–Diffusion Equation
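As an important branch of partial differential equations, the convection–diffusion equation describes the diffusive motion of a certain physical quantity (such as concentration or temperature) in a fluid under transport, and is widely used in the field of fluid mechanics. In a two-dimensional field, the convection–diffusion equation without sources and chemical reactions has the following form:
$$\frac{\partial c}{\partial t} + \nabla \cdot (\mathbf{u} c) = D \nabla^2 c \tag{11}$$
In the equation, $c$ denotes the transported scalar field, $\mathbf{u} = (u, v)$ is the velocity vector, $D$ is the diffusion coefficient of $c$, $\nabla$ is the vector differential operator, and $\nabla^2$ is the Laplace operator.
As one of the basic conservation equations in fluid mechanics, the continuity equation describes the law of mass conservation for fluids in motion. In its two-dimensional form, the continuity equation is expressed as follows:
$$\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \mathbf{u}) = 0 \tag{12}$$
In the equation, $\rho$ denotes the fluid density.
It is usually assumed that the fluid is incompressible, and thus $\rho$ is a constant. In this case, $\partial \rho / \partial t = 0$ and $\rho$ can be factored out of the divergence operation, yielding the following result:
$$\nabla \cdot (\rho \mathbf{u}) = \rho \nabla \cdot \mathbf{u} \tag{13}$$
At this point, the continuity equation can be transformed into:
$$\rho \nabla \cdot \mathbf{u} = 0 \tag{14}$$
Since $\rho$ is not zero, the following result can be obtained:
$$\nabla \cdot \mathbf{u} = 0 \tag{15}$$
By the product rule, $\nabla \cdot (\mathbf{u} c) = c \nabla \cdot \mathbf{u} + \mathbf{u} \cdot \nabla c$. Substituting Equation (15) into Equation (11), the following result can be obtained:
$$\frac{\partial c}{\partial t} + \mathbf{u} \cdot \nabla c = D \nabla^2 c \tag{16}$$
Expanding it, the two-dimensional convection–diffusion equation for incompressible fluids without sources and chemical reactions can be obtained:
$$\frac{\partial c}{\partial t} + u \frac{\partial c}{\partial x} + v \frac{\partial c}{\partial y} = D \left( \frac{\partial^2 c}{\partial x^2} + \frac{\partial^2 c}{\partial y^2} \right) \tag{17}$$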
For the convection–diffusion equation, the forward problem is to determine the scalar concentration field given the velocity field, initial conditions, and boundary conditions. Conversely, the inverse problem is to determine the velocity field given the corresponding scalar concentration field.
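The inverse problem can be made concrete with a small sketch. At a single point the expanded equation gives one linear equation in the two unknowns (u, v), so the velocity is not identifiable from that point alone. The sketch below assumes, purely for illustration, that the velocity is identical at two nearby points whose gradients are not parallel, and solves the resulting 2x2 system by Cramer's rule. The function names `solve_velocity` and `make_point` and all numerical values are hypothetical, and this is not the solution procedure of the proposed method, which uses a neural network instead.

```python
def solve_velocity(p1, p2, D):
    """Solve c_t + u*c_x + v*c_y = D*(c_xx + c_yy) for (u, v) from two points.

    Each point is a tuple (c_x, c_y, c_t, c_xx, c_yy). The velocity is assumed
    (for illustration only) to be the same at both points.
    """
    # Rearranged per point: u*c_x + v*c_y = D*(c_xx + c_yy) - c_t
    a11, a12, r1 = p1[0], p1[1], D * (p1[3] + p1[4]) - p1[2]
    a21, a22, r2 = p2[0], p2[1], D * (p2[3] + p2[4]) - p2[2]
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("gradients are parallel: velocity is not identifiable")
    # Cramer's rule for the 2x2 linear system
    u = (r1 * a22 - a12 * r2) / det
    v = (a11 * r2 - a21 * r1) / det
    return u, v


def make_point(c_x, c_y, c_xx, c_yy, u, v, D):
    """Build synthetic derivatives that satisfy the equation for a known (u, v)."""
    c_t = D * (c_xx + c_yy) - u * c_x - v * c_y
    return (c_x, c_y, c_t, c_xx, c_yy)


D = 0.1                      # illustrative diffusion coefficient
true_u, true_v = 0.8, -0.2   # illustrative ground-truth velocity
p1 = make_point(1.0, 0.5, 0.3, -0.1, true_u, true_v, D)
p2 = make_point(-0.4, 1.2, 0.0, 0.2, true_u, true_v, D)
u, v = solve_velocity(p1, p2, D)
print(round(u, 6), round(v, 6))
```

When the two gradients are parallel the determinant vanishes and the function raises an error, which is a small-scale view of why additional constraints are needed to stabilize the inverse problem.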
2.3. Improvement of Model Generalization
Different from forward problems whose outputs are usually unique and deterministic, inverse problems infer the system inputs, internal parameters or structural characteristics that lead to such outputs from the known system outputs or observation data, and thus generally exhibit Hadamard ill-posedness.
In classical PINN frameworks, this ill-posedness is conventionally addressed by a composite loss function comprising the PDE residual $\mathcal{L}_f$ and a data-fitting term $\mathcal{L}_d$, where $\mathcal{L}_d$ ensures that the model predictions conform to prescribed boundary conditions and initial conditions. In practical measurements, these boundary and initial conditions are replaced by observational data [19].
Current research predominantly acquires data through conventional methods and employs it as the labeled data for model training. However, this practice restricts the prediction accuracy of the model to the measurement data obtained from conventional algorithms. Furthermore, labeled data collected under specific scenarios confine the model to settings consistent with the observations, depriving the network of generalizability across different scenarios and necessitating retraining when handling different conditions [22].
To address this issue, the proposed method omits the explicit data-fitting loss $\mathcal{L}_d$, thereby relieving the model from data-fitting constraints and allowing its outputs access to an open solution space. Meanwhile, to enable the model to perceive environmental changes, its inputs are adjusted to image grayscale gradients instead of the spatiotemporal coordinates used in standard PINNs. To re-stabilize the solution, multi-scenario, unlabeled optical-flow gradient data are incorporated into the model training process.
Previous studies have shown that neural networks inherently tend to learn low-frequency and smooth functions during training [23], which aids model convergence. Except for localized turbulence near obstacles, the velocity field is predominantly smooth and continuous at the spatial scales captured by riverbank cameras. This training tendency acts as an implicit regularizer, penalizing high-frequency components and suppressing non-physical oscillatory solutions in the inferred velocity field, thereby contributing to the stability of the derived results.
For a single river scene, the PDE residual alone admits infinitely many velocity fields. However, across different rivers with varying environmental conditions, the underlying physical laws remain invariant. Consequently, their corresponding sets of admissible solutions heuristically tend to intersect in a region that contracts progressively as scenario diversity increases. By exposing the network to gray-gradient arrays extracted from diverse river channels during training, the model is compelled to learn a generalizable mapping from brightness transport features to velocity vectors that holds universally across scenarios. Each additional training scenario functions as an independent physical realization that prunes away non-generalizable branches of the solution space; this cross-scene training objective implicitly penalizes overfitting to scenario-specific observational biases, as such memorization would produce scene-specific velocity fields that violate the underlying physics when applied to other training rivers, thereby increasing the overall PDE residual loss across the training set. In this paradigm, the diversity of training data acts as an implicit regularizer that distributes physical constraints throughout the learned parameter space, progressively narrowing the volume of admissible solutions without requiring labeled observational data for model training. The model thereby acquires the capacity to adapt to previously unseen river scenes by extracting the relationship between optical-flow gradients and surface velocity, effectively substituting cross-scene empirical constraints for conventional observational data that are typically indispensable in standard PINN frameworks.
2.4. Loss Function
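Corpetti et al. [24] proposed that the image brightness is approximately proportional to the vertical integral of mass density when processing cloud images, i.e., $I \propto \int \rho \, \mathrm{d}z$. For 2D images, the depth $z$ is constant, thus yielding $I \propto \rho$. Subsequent studies [25,26] have demonstrated the reasonableness of introducing this assumption into natural river environments. Hence, Equation (17) can be transformed and expanded into:
$$\frac{\partial I}{\partial t} + u \frac{\partial I}{\partial x} + v \frac{\partial I}{\partial y} = D \left( \frac{\partial^2 I}{\partial x^2} + \frac{\partial^2 I}{\partial y^2} \right) \tag{18}$$
In the equation, $\partial I / \partial x$, $\partial I / \partial y$ and $\partial I / \partial t$ denote the first-order partial derivatives of the grayscale $I$ in the $x$, $y$ and $t$ directions, respectively, and $\partial^2 I / \partial x^2$ and $\partial^2 I / \partial y^2$ denote the second-order partial derivatives of the grayscale $I$ in the $x$ and $y$ directions, respectively. $\partial I / \partial x$, $\partial I / \partial y$, $\partial I / \partial t$, $\partial^2 I / \partial x^2$ and $\partial^2 I / \partial y^2$ are denoted as $I_x$, $I_y$, $I_t$, $I_{xx}$ and $I_{yy}$, which are calculated using the following difference schemes: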
In the neural network model proposed in this paper, the input is set as a five-dimensional array representing the five partial derivatives for calculation, i.e., $(I_x, I_y, I_t, I_{xx}, I_{yy})$. The model is designed to output a two-dimensional array representing the velocities in the $x$ and $y$ directions of the computational domain, i.e., $(u, v)$.
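To make the five-dimensional input array concrete, the sketch below computes the five grayscale derivatives at one pixel from three consecutive frames. The difference schemes used by the authors are not reproduced in this text, so second-order central differences with unit pixel spacing and unit frame interval are assumed here; the function names `gray_gradients` and `frame` and the synthetic brightness field are hypothetical.

```python
def gray_gradients(prev, curr, nxt, row, col):
    """Return (I_x, I_y, I_t, I_xx, I_yy) at pixel (row, col).

    prev, curr, nxt are three consecutive grayscale frames given as lists of
    rows. Central differences with unit pixel spacing and unit frame interval
    are an assumption made for this sketch, not the scheme of the paper.
    """
    i_x = (curr[row][col + 1] - curr[row][col - 1]) / 2.0
    i_y = (curr[row + 1][col] - curr[row - 1][col]) / 2.0
    i_t = (nxt[row][col] - prev[row][col]) / 2.0
    i_xx = curr[row][col + 1] - 2.0 * curr[row][col] + curr[row][col - 1]
    i_yy = curr[row + 1][col] - 2.0 * curr[row][col] + curr[row - 1][col]
    return (i_x, i_y, i_t, i_xx, i_yy)


def frame(t, size=5):
    """Synthetic frame I(x, y, t) = x^2 + 3y + 2t, with x = column and y = row."""
    return [[x * x + 3.0 * y + 2.0 * t for x in range(size)] for y in range(size)]


# Derivatives at the centre pixel of the middle frame.
features = gray_gradients(frame(0), frame(1), frame(2), row=2, col=2)
print(features)
```

For this quadratic test field the central differences are exact, so the result matches the analytical derivatives at x = 2: I_x = 2x = 4, I_y = 3, I_t = 2, I_xx = 2, and I_yy = 0.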
The predictive results of the model depend on the pixel grayscale values and their gradients, which are affected by illumination. However, in natural river environments, variations in light intensity are far more drastic than in indoor settings, leading to severe fluctuations in image grayscale values and a significantly higher likelihood of outliers in such regions. In this scenario, a loss function formulated as Mean Squared Error (MSE) squares each residual, so the training process is more easily dominated by outliers, causing the model to tend toward fitting a small number of anomalous points rather than adhering to the underlying physical laws. To enable the model to handle such conditions robustly, the loss function of the proposed model adopts the form of Mean Absolute Error (MAE), since MAE is more robust to lighting-induced errors and outliers [27]. Therefore, the model is trained by minimizing the following physics-based loss function:
$$\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} \left| I_{t,i} + u_i I_{x,i} + v_i I_{y,i} - D \left( I_{xx,i} + I_{yy,i} \right) \right| \tag{24}$$
In the equation, the subscript $i$ denotes the $i$-th array, and $N$ denotes the total number of arrays.
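The robustness argument for MAE can be checked numerically. The following sketch evaluates the residual of the brightness transport equation on a small synthetic batch containing one outlier and compares how much of the total loss the outlier accounts for under MAE and under MSE. The helper names, the diffusion coefficient, and the residual magnitudes (0.5 for ordinary samples, 10 for the outlier) are illustrative assumptions rather than values from the paper.

```python
def residual(features, velocity, D):
    """Residual I_t + u*I_x + v*I_y - D*(I_xx + I_yy) for one input array."""
    i_x, i_y, i_t, i_xx, i_yy = features
    u, v = velocity
    return i_t + u * i_x + v * i_y - D * (i_xx + i_yy)


def mae_loss(batch, velocities, D):
    """Mean absolute residual over the batch."""
    return sum(abs(residual(f, w, D)) for f, w in zip(batch, velocities)) / len(batch)


def mse_loss(batch, velocities, D):
    """Mean squared residual over the batch."""
    return sum(residual(f, w, D) ** 2 for f, w in zip(batch, velocities)) / len(batch)


def sample(target_residual):
    """Synthetic array whose residual equals target_residual for velocity (1, 0)."""
    return (1.0, 0.0, target_residual - 1.0, 0.0, 0.0)


D = 0.1  # illustrative diffusion coefficient
batch = [sample(0.5) for _ in range(20)] + [sample(10.0)]  # last entry is the outlier
velocities = [(1.0, 0.0)] * len(batch)

mae = mae_loss(batch, velocities, D)
mse = mse_loss(batch, velocities, D)
outlier = abs(residual(batch[-1], velocities[-1], D))
mae_share = outlier / (mae * len(batch))       # outlier's fraction of the summed MAE
mse_share = outlier ** 2 / (mse * len(batch))  # outlier's fraction of the summed MSE
print(mae_share, mse_share)
```

In this toy batch the single outlier contributes half of the MAE total but more than 95% of the MSE total, which mirrors the qualitative argument in the text.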
2.5. Network Model Design
Multilayer Perceptrons (MLPs) are one of the most widely used types of neural networks [28]. They are capable of learning high-order feature representations from data through multi-layer nonlinear transformations, and their basic units consist of neurons. As shown in Figure 1a, for the output signals $x_1, x_2, \ldots, x_n$ from $n$ different neurons in the previous layer, the neuron receives them through corresponding input data interfaces, performs a weighted summation of these signals with respective weights, introduces a bias term through a bias data interface, and then processes the aggregated result through an activation function. Finally, the neuron transmits the result through an output data interface. The model can be expressed as:
$$y = f\left( \sum_{i=1}^{n} w_i x_i + b \right) \tag{25}$$
In Equation (25), $x_i$ denotes the input term, $w_i$ denotes the weight, $b$ denotes the bias term, $f$ denotes the activation function, and $y$ denotes the output term.
Compared with single-layer perceptrons, MLPs perform better in fitting complex functions. When an MLP contains a sufficient number of hidden-layer neurons, it can approximate any continuous nonlinear function with arbitrary precision. An MLP usually consists of an input layer, one or more hidden layers, and an output layer, with neurons in each layer connected by weights and biases. For an input $\mathbf{x}$, the $l$-th hidden layer has a hidden variable $\mathbf{h}^{(l)}$. The $L$-layer deep neural network can be expressed as:
$$\mathbf{h}^{(0)} = \mathbf{x}, \qquad \mathbf{h}^{(l)} = f\left( \mathbf{W}^{(l)} \mathbf{h}^{(l-1)} + \mathbf{b}^{(l)} \right), \quad l = 1, \ldots, L-1, \qquad \mathbf{y} = \mathbf{W}^{(L)} \mathbf{h}^{(L-1)} + \mathbf{b}^{(L)}$$
In the formula, $\mathbf{W}^{(l)}$ and $\mathbf{b}^{(l)}$ denote the weight matrix and bias of the $l$-th neural network layer, respectively, which are the trainable parameters of the neural network [29].
Figure 1b shows the MLP architecture adopted in this paper. In the process of architecture selection, this study prioritizes the overall accuracy of predictions, supplemented by the correlation between predicted results and ground truth. In iterative experiments, the baseline MLP model consists of one hidden layer with 100 neurons. The adjustment priority between network depth and the number of neurons per layer is first determined through preliminary testing; based on this, the network depth and the number of neurons per layer are sequentially adjusted to gradually increase model complexity and approach better results. The final MLP employed in this study comprises three hidden layers, each with 1000 neurons, capturing the coupling relationship between optical flow gradients and flow velocity through sufficient parameter space, thereby fitting the nonlinear fluid characteristics represented by the convection–diffusion equation. Due to its low computational cost and ability to effectively alleviate the vanishing gradient problem [30], the ReLU function is selected as the baseline activation function for testing in this paper. In subsequent tests, it is evaluated alongside Sigmoid and Tanh, which are also widely used in neural network applications. The final activation function is determined after systematic testing and comparison.
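The layer-by-layer computation of an MLP with five inputs, three hidden layers, and two outputs can be sketched in plain Python as follows. This is only an illustrative forward pass: the hidden width is reduced from 1000 to 16 to keep the example fast, ReLU is used because it is the baseline activation named above (the final choice is determined by the authors' tests), and the He-style random initialization, the seed, and all function names are assumptions.

```python
import random


def relu(values):
    """Element-wise ReLU activation."""
    return [max(0.0, a) for a in values]


def dense(weights, bias, inputs):
    """Affine map W @ x + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Random weights with He-style scaling (an assumption) and zero biases."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def mlp_forward(layers, x):
    """Apply ReLU hidden layers followed by a linear output layer."""
    h = x
    for weights, bias in layers[:-1]:
        h = relu(dense(weights, bias, h))
    weights, bias = layers[-1]
    return dense(weights, bias, h)


rng = random.Random(0)
sizes = [5, 16, 16, 16, 2]  # 5 derivatives in, three hidden layers, (u, v) out
layers = [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]

# Example input array (I_x, I_y, I_t, I_xx, I_yy); the values are arbitrary.
velocity = mlp_forward(layers, [4.0, 3.0, 2.0, 2.0, 0.0])
print(velocity)
```

With untrained random weights the printed pair has no physical meaning; the point is only the shape of the mapping from a five-element derivative array to a two-element velocity.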
To date, there is no unified theory to guide the design of an appropriate neural network [31]. Under suitable conditions, increasing the complexity of a neural network can exert a positive effect on model performance, yet this effect is not linear. When the model complexity exceeds a certain threshold, the performance gains will diminish [32]. Therefore, the goal of network design shifts to maintaining as low a complexity and computational cost as possible while ensuring the network achieves the desired accuracy. This principle generally helps to develop artificial intelligence models with fast learning speeds and excellent predictive ability, while avoiding the problem of overfitting. It should be noted, however, that the network used in this paper is not the optimal solution to this problem, but rather the best choice obtained through the testing process.
2.6. Model Training
The Adam optimizer is used to minimize the residual loss constituted by partial differential equations. Based on the first-order and second-order moments of gradients, the Adam optimizer calculates an adaptive update step size for each parameter, which addresses the issues of difficult learning rate adjustment and slow convergence in traditional Stochastic Gradient Descent (SGD). During the training process, the model output and corresponding residual loss are first calculated through forward propagation. Then, the gradients are cleared, and automatic differentiation technology is used to compute the gradients of the residual loss with respect to the model parameters. Subsequently, the Adam optimizer calculates an adaptive step size based on the gradients and moments, and updates the model parameters to minimize the residual loss. Both the initial learning rate and the maximum number of training epochs are determined through comparative testing: starting from the smallest candidate values, they are progressively increased, and the final selection is made based on the corresponding model prediction performance. Through this systematic comparison, the optimal configuration was identified as an initial learning rate of and a maximum of 7500 training epochs.
Figure 2 illustrates the schematic diagram of the PINN framework proposed in this paper for solving river surface flow velocities by combining the convection–diffusion equation. To ensure sufficient diversity of the data and enable the model to fit flow characteristics under different conditions, multiple sets of continuous frame images of rivers are adopted. This multi-scenario training strategy not only improves the model’s adaptability to diverse hydraulic conditions but also serves as an implicit regularization mechanism. Specifically, by learning from diverse illumination environments, the model becomes less sensitive to scene-specific noise and avoids overfitting to local outliers in individual scenarios. Meanwhile, the collocation points used for training can be randomly selected on the images, and the number of collocation points can also be specified arbitrarily. The model training steps are as follows:
(1) For a series of continuous frame images of rivers used for training, the collocation points and their corresponding grayscale data are obtained and preprocessed into input arrays;
(2) Define the network architecture;
(3) Initialize the network parameters;
(4) Compute the outputs via the neural network;
(5) Compute the loss function based on the inputs and outputs;
(6) Update the neural network parameters;
(7) Repeat Steps 4 to 6 until the specified number of iterations is reached.
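The training steps listed above can be sketched on a deliberately simplified toy problem. To stay self-contained, the MLP is replaced by two learnable scalars (a single uniform velocity), automatic differentiation is replaced by a hand-derived subgradient of the MAE residual, and the input arrays are synthetic. The Adam update follows the standard formulation with its usual default moment coefficients; the learning rate of 0.005, the 2000 iterations, the diffusion coefficient, and all names are illustrative assumptions and not the configuration used in the paper.

```python
import random


def adam_step(params, grads, state, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update using bias-corrected first and second moment estimates."""
    state["t"] += 1
    t = state["t"]
    updated = []
    for k, (p, g) in enumerate(zip(params, grads)):
        state["m"][k] = beta1 * state["m"][k] + (1 - beta1) * g
        state["v"][k] = beta2 * state["v"][k] + (1 - beta2) * g * g
        m_hat = state["m"][k] / (1 - beta1 ** t)
        v_hat = state["v"][k] / (1 - beta2 ** t)
        updated.append(p - lr * m_hat / (v_hat ** 0.5 + eps))
    return updated


def loss_and_grads(params, data, D):
    """MAE of the transport residual and its subgradient w.r.t. (u, v)."""
    u, v = params
    loss = grad_u = grad_v = 0.0
    for i_x, i_y, i_t, i_xx, i_yy in data:
        r = i_t + u * i_x + v * i_y - D * (i_xx + i_yy)
        sign = (r > 0) - (r < 0)
        loss += abs(r)
        grad_u += sign * i_x
        grad_v += sign * i_y
    n = len(data)
    return loss / n, [grad_u / n, grad_v / n]


# Step 1: build synthetic input arrays consistent with a known velocity.
rng = random.Random(0)
D, true_u, true_v = 0.1, 0.5, -0.3
data = []
for _ in range(100):
    i_x, i_y = rng.uniform(-1, 1), rng.uniform(-1, 1)
    i_xx, i_yy = rng.uniform(-1, 1), rng.uniform(-1, 1)
    i_t = D * (i_xx + i_yy) - true_u * i_x - true_v * i_y
    data.append((i_x, i_y, i_t, i_xx, i_yy))

# Steps 2-3: the "model" is two scalars, initialized to zero.
params = [0.0, 0.0]
state = {"t": 0, "m": [0.0, 0.0], "v": [0.0, 0.0]}
initial_loss, _ = loss_and_grads(params, data, D)

# Steps 4-7: forward pass, loss, parameter update, repeat.
for _ in range(2000):
    loss, grads = loss_and_grads(params, data, D)
    params = adam_step(params, grads, state, lr=0.005)

final_loss, _ = loss_and_grads(params, data, D)
print(params, initial_loss, final_loss)
```

Because the MAE subgradient does not shrink near the minimum, the parameters settle into a small oscillation around the true velocity whose size scales with the learning rate; a decaying learning rate would reduce it.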
2.7. Projection Transformation
To determine the mapping relationship between real-world coordinates and pixel coordinates, at least four horizontally coplanar calibration points are arranged on both banks of the river. A coordinate system is established with one of the calibration points as the origin, and the real-world coordinates of all calibration points are determined. Meanwhile, the corresponding pixel coordinates of these calibration points can be acquired from the captured images.
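By correlating the coordinates of calibration points in different coordinate systems, the projection transformation matrix $\mathbf{H}$ can be derived based on the principle of projective transformation. The correspondence between real-world coordinates $(X, Y)$ and captured image coordinates $(x, y)$ can then be expressed as:
$$s \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \mathbf{H} \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}, \qquad \mathbf{H} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix}$$
where $s$ is a nonzero scale factor.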
According to the projection transformation matrix and the real-world coordinates of the velocity measurement points, the pixel coordinates of these points can be obtained.
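A minimal sketch of this calibration step is given below. It estimates a 3x3 projective transformation from exactly four point correspondences by fixing the bottom-right entry to 1 and solving the resulting eight linear equations with Gaussian elimination, then maps a real-world point to pixel coordinates. The normalization h33 = 1, the function names, and the calibration coordinates are assumptions made for illustration; the original work does not state how the matrix is computed.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("calibration points are degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def estimate_homography(world_pts, pixel_pts):
    """Return H (with h33 = 1) mapping world (X, Y) to pixel (x, y)."""
    a, b = [], []
    for (X, Y), (x, y) in zip(world_pts, pixel_pts):
        # x = (h11*X + h12*Y + h13) / (h31*X + h32*Y + 1), likewise for y
        a.append([X, Y, 1, 0, 0, 0, -x * X, -x * Y])
        b.append(x)
        a.append([0, 0, 0, X, Y, 1, -y * X, -y * Y])
        b.append(y)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(H, point):
    """Map a point through H, including the perspective division."""
    X, Y = point
    s = H[2][0] * X + H[2][1] * Y + H[2][2]
    x = (H[0][0] * X + H[0][1] * Y + H[0][2]) / s
    y = (H[1][0] * X + H[1][1] * Y + H[1][2]) / s
    return (x, y)


# Hypothetical calibration data: world coordinates in metres, pixels in image.
world = [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0), (0.0, 5.0)]
pixels = [(100.0, 400.0), (500.0, 420.0), (420.0, 200.0), (150.0, 190.0)]
H = estimate_homography(world, pixels)
print(apply_homography(H, (5.0, 2.5)))  # pixel position of a mid-channel point
```

The same function can be used in the opposite direction by swapping the two point lists, which corresponds to mapping pixel coordinates back to real-world coordinates.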
Meanwhile, river images captured by cameras show different characteristics due to factors such as camera installation height and actual shooting angle. To ensure the prediction accuracy of the model, the projective transformation principle is used to convert the captured view to a bird’s-eye view, so that the flow velocity calculation can be carried out under a unified standard.
A region containing the actual velocity measurement points is selected on the river image, and the size and coordinates of the bird’s-eye view are defined. The selected region can then be converted into a bird’s-eye view using projective transformation, and the coordinates of the velocity measurement points in the bird’s-eye view can be calculated via the corresponding transformation matrix.
For a series of bird’s-eye view images obtained by processing the images of the river to be measured, the partial derivative array at the coordinates of the velocity measurement point is calculated frame by frame based on the coordinates of the velocity measurement point in the bird’s-eye view. This array is input into the trained model to obtain the frame-by-frame velocity array at the velocity measurement point in the bird’s-eye view, and the frame-by-frame average velocity is derived by calculating the mean value. The transformation relationships among the bird’s-eye view image coordinate system, the captured image coordinate system, and the real-world coordinate system can be obtained via the projective transformation matrix, further yielding the velocity array of the velocity measurement point in the real-world coordinates. The magnitude of this array is calculated and divided by the time interval between two adjacent frames, thereby enabling the calculation of the actual surface flow velocity of the river at the corresponding velocity measurement point.
2.8. Calculation of Total Discharge and Mean Flow Velocity
After obtaining the surface flow velocities at all velocity measurement points, the total discharge and mean flow velocity of the river can be calculated using the velocity-area method. As shown in Figure 3, let the velocity at the $i$-th measurement point be $v_i$, and $n$ be the number of measurement points. The distance between the $i$-th and the $(i+1)$-th measurement points is $b_i$, and the distances from the two end measurement points to the river banks are $b_0$ and $b_n$, respectively. The water depth at the $i$-th measurement point is $h_i$, and the water depths at the two river banks are $h_0$ and $h_{n+1}$, respectively.
The cross-sectional area $A_i$ between a velocity measurement point and the river bank, as well as between the $i$-th and $(i+1)$-th velocity measurement points, can be approximated by Equation (30):
$$A_i = \frac{h_i + h_{i+1}}{2} \, b_i, \quad i = 0, 1, \ldots, n \tag{30}$$
The partial surface flow velocity corresponding to the cross-section $A_i$ is:
$$V_i = \begin{cases} \alpha_1 v_1, & i = 0 \\ \dfrac{v_i + v_{i+1}}{2}, & 1 \le i \le n-1 \\ \alpha_2 v_n, & i = n \end{cases} \tag{31}$$
In the formula, $\alpha_1$ and $\alpha_2$ are the bank coefficients of the two river banks, respectively.
The partial vertical flow velocity can be obtained from the partial surface flow velocity:
$$\bar{V}_i = k V_i \tag{32}$$
In the formula, $k$ is the surface velocity coefficient, which is provided by the hydrological station.
The total river discharge can be obtained using the velocity-area method:
$$Q = \sum_{i=0}^{n} A_i \bar{V}_i \tag{33}$$
The mean flow velocity of the river can be obtained from the total discharge and the cross-sectional area:
$$\bar{V} = \frac{Q}{\sum_{i=0}^{n} A_i} \tag{34}$$
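The velocity-area calculation described in this section can be sketched as a short function. The indexing convention (one partial section between each bank and its nearest measurement point, plus one between each pair of adjacent points), the trapezoidal area approximation, the function name `velocity_area`, and the example numbers are assumptions chosen for illustration; they are not measured values from either case study.

```python
def velocity_area(surface_v, spacing, depth, bank_coeffs, k):
    """Return (total discharge, mean velocity) by the velocity-area method.

    surface_v   : n surface velocities at the measurement points (m/s)
    spacing     : n + 1 widths: bank to point 1, point 1 to 2, ..., point n to bank (m)
    depth       : n + 2 depths: first bank, the n points, second bank (m)
    bank_coeffs : (alpha_first, alpha_last) bank coefficients
    k           : surface velocity coefficient
    """
    n = len(surface_v)
    if len(spacing) != n + 1 or len(depth) != n + 2:
        raise ValueError("expected n + 1 spacings and n + 2 depths")
    alpha_first, alpha_last = bank_coeffs
    total_q = 0.0
    total_area = 0.0
    for i in range(n + 1):
        # Trapezoidal area of the i-th partial section
        area = 0.5 * (depth[i] + depth[i + 1]) * spacing[i]
        if i == 0:
            v_surface = alpha_first * surface_v[0]        # section next to first bank
        elif i == n:
            v_surface = alpha_last * surface_v[-1]        # section next to second bank
        else:
            v_surface = 0.5 * (surface_v[i - 1] + surface_v[i])
        total_q += area * k * v_surface  # surface velocity scaled to vertical mean
        total_area += area
    return total_q, total_q / total_area


# Toy cross-section: two measurement points, triangular bank sections.
q, v_mean = velocity_area(
    surface_v=[1.0, 1.0],
    spacing=[1.0, 2.0, 1.0],
    depth=[0.0, 1.0, 1.0, 0.0],
    bank_coeffs=(0.5, 0.5),
    k=1.0,
)
print(q, v_mean)
```

In this toy case the partial areas are 0.5, 2.0, and 0.5 square metres and the partial velocities are 0.5, 1.0, and 0.5 m/s, so the discharge is 2.5 cubic metres per second over a total area of 3.0 square metres.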
3. Experiments and Evaluation Indicators
This section presents two case studies for testing the network proposed in this paper. To assess the model’s adaptability across different scenarios, the optical flow gradient data used for training were collected from multiple independent river environments, and the two case-study sites were excluded from these training sources. Case 1 adopts the Liancheng Hydrological Station in Dali City, Yunnan Province as the experimental site, serving as the validation scenario for architecture selection and hyperparameter tuning. Case 2 uses data captured at the Gaoqiao Hydrological Station in Chuxiong City, Yunnan Province, which constitutes a held-out test scenario to evaluate the model’s performance in a different river environment. In addition, Case 1 and Case 2 compare the estimation performance of the proposed method with that of existing image-based flow measurement methods in an artificial river and a natural river, respectively.
3.1. Experiments
3.1.1. Artificial River Scenario in Dali City
The river at the Liancheng Station in Dali City is an artificial river channel. Based on years of measurement experience at this hydrological station, the left and right bank coefficients of the river are both 0.8, the surface velocity coefficient is 0.82, and the channel width is 11.9 m. In general, relatively smooth artificial river channels are characterized by a regular shape and low roughness, which ensures stable flow conditions.
Figure 4 shows the captured images in this scenario. Through frame extraction processing of the captured video, 325 frame images are obtained.
Four control points A, B, C, and D are selected on both sides of the river channel. Among them, the line connecting A and D serves as both the river cross-section line and the velocity measurement line. Taking point D as the starting point, current-meter measurements are taken every 1 m over the range of 2–11 m from point D, and these values are used as the vertical average flow velocity at the corresponding velocity measurement points.
3.1.2. Natural River Scenario in Chuxiong City
The river at Gaoqiao Station in Chuxiong City is a natural channel with irregular riverbanks and significant variations in cross-sectional flow velocity. Based on the measurement experience of the hydrological station, the left and right bank coefficients of this river are 0.8 and 0.7, respectively, the surface velocity coefficient is 0.97, and the channel width is 24.9 m. During the measurement period, the river was filmed, and 351 frame images were obtained after frame extraction and screening. Constrained by actual shooting conditions, the images captured in this scenario suffer from severe perspective distortion, so this case also tests the applicability of the proposed method to such challenging scenes.
Figure 5 shows the captured footage of this scene.
Four control points A, B, C, and D are selected on both sides of the river channel. The line connecting E and F serves as both the river cross-section line and the velocity measurement line. Taking point F as the starting point, current-meter measurements are taken every 2 m over the range of 6.9–22.9 m from point F, and these values are used as the vertical average flow velocity at the corresponding velocity measurement points.
3.2. Evaluation Indicators
In current river flow measurements, although the results obtained by current meters contain some margin of error, these errors are relatively small. Therefore, the results from current meters are generally regarded as the corresponding reference values. The rotor current meter used in the experiments of this study complies with the Chinese national standard GB/T 11826-2019 “Rotating-element current-meters” [33], and its measurement uncertainty is within a controllable range. Meanwhile, the distribution of sampling points as well as the data collection process also conforms to the relevant specifications, which ensures that the collected data can adequately represent the velocity distribution across the entire river cross-section. Consequently, the measurement results are acceptable. Thus, in the comparative tests conducted, the measurement results from the current meter are used as the ground truth. The evaluation of model performance is based on the array $y = (y_1, \dots, y_n)$ obtained from the current meter measurements, the array $\hat{y} = (\hat{y}_1, \dots, \hat{y}_n)$ obtained from different image-based flow measurement methods at the n velocity measurement points, as well as the total cross-sectional discharge and mean flow velocity calculated by these methods.
This paper adopts the following indicators to evaluate model performance. The first one is the root mean square error (RMSE) of the measured data from each method. RMSE quantifies the deviation between different methods and the ground truth to assess their overall accuracy, which is calculated by Equation (35):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2} \tag{35}$$
For individual velocity measurement points, relative error is used to quantify the measurement accuracy and characterize the error of local measurement results. Absolute error represents the difference between the measured values of different methods and the ground truth, while relative error is calculated as the ratio of absolute error to the ground truth and expressed in percentage form. Relative error eliminates the influence of dimensions and numerical magnitudes, and can better reflect the deviation degree between measured values and true values. Meanwhile, relative error is also applied to evaluate the measurement accuracy of total cross-sectional discharge and mean flow velocity for different methods.
Furthermore, the standard deviation (SD) of absolute measurement error is introduced to evaluate the dispersion degree of the measurement error distribution. The SD value quantitatively reflects the fluctuation and stability of the measurement results, which is used to characterize the measurement uncertainty and reliability.
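Finally, the Pearson correlation coefficient is used to measure the strength and direction of the linear relationship between the measured values of different methods and the ground truth, so as to assess their consistency. The Pearson correlation coefficient r is calculated by Equation (36):
$$r = \frac{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)\left(\hat{y}_i - \bar{\hat{y}}\right)}{\sqrt{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2}\, \sqrt{\sum_{i=1}^{n} \left(\hat{y}_i - \bar{\hat{y}}\right)^2}} \tag{36}$$
where $y_i$ and $\hat{y}_i$ are the current-meter value and the estimated value at the i-th velocity measurement point, and $\bar{y}$ and $\bar{\hat{y}}$ are their respective means.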
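The four indicators described in this subsection can be computed together, as in the following minimal Python sketch. The function name and dictionary keys are hypothetical. The sketch also assumes the sample (n − 1) form of the standard deviation, which the text does not specify, and that relative error is taken with respect to the current-meter value.

```python
import math
from typing import Dict, List, Sequence, Union


def evaluate(
    reference: Sequence[float], estimate: Sequence[float]
) -> Dict[str, Union[float, List[float]]]:
    """Compare estimated velocities with current-meter reference values."""
    n = len(reference)
    if n < 2 or len(estimate) != n:
        raise ValueError("need two equal-length arrays with at least 2 points")
    if any(r == 0 for r in reference):
        raise ValueError("relative error is undefined for a zero reference value")

    # Absolute error at each velocity measurement point.
    abs_err = [abs(e - r) for r, e in zip(reference, estimate)]

    # Root mean square error over all points.
    rmse = math.sqrt(sum(a * a for a in abs_err) / n)

    # Relative error in percent, point by point.
    rel_err = [100.0 * a / abs(r) for a, r in zip(abs_err, reference)]

    # Standard deviation of the absolute error (sample form, an assumption).
    mean_abs = sum(abs_err) / n
    sd = math.sqrt(sum((a - mean_abs) ** 2 for a in abs_err) / (n - 1))

    # Pearson correlation coefficient between reference and estimate.
    mean_r = sum(reference) / n
    mean_e = sum(estimate) / n
    cov = sum((r - mean_r) * (e - mean_e) for r, e in zip(reference, estimate))
    var_r = sum((r - mean_r) ** 2 for r in reference)
    var_e = sum((e - mean_e) ** 2 for e in estimate)
    denom = math.sqrt(var_r * var_e)
    pearson = cov / denom if denom > 0 else float("nan")

    return {
        "rmse": rmse,
        "relative_error_pct": rel_err,
        "sd_abs_error": sd,
        "pearson_r": pearson,
    }
```

The same relative-error expression applies to scalar quantities such as total discharge and mean flow velocity: 100 × |estimate − reference| / |reference|.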
5. Discussion
Image-based flow measurement algorithms such as STIV and optical flow have been widely applied to river surface velocity measurement due to their simplicity and non-intrusive nature. With the advancement of deep learning technology, neural network models have also brought new momentum to the development of these methods. However, all these methods measure flow velocity by capturing and processing pixel variations in images. Because they incorporate little physical knowledge, it is difficult for them to characterize the complex physical properties of water flow, which leads to unsatisfactory measurement accuracy. Meanwhile, the lack of river datasets poses challenges for training deep learning methods that depend heavily on labeled data, and the inherent black-box nature of traditional neural network models reduces the credibility of model decisions. Against this background, the proposed flow measurement algorithm, which integrates optical flow with a PINN trained without labels, provides a new way to estimate river surface velocity. Compared with existing methods, it improves the accuracy of river flow velocity and discharge estimation and offers a degree of physical support for the measurement results. Nevertheless, like all river flow velocity and discharge measurement methods, it has its own merits and limitations.
Environmental factors such as illumination may affect the performance of the algorithm. Compared with the STIV method that measures flow velocity along a single velocity line, the proposed method calculates the river flow velocity at individual velocity measurement points and may be more susceptible to environmental noise. Since explicit image processing methods (e.g., Gaussian filtering, histogram equalization) alter the distribution of pixel gradients, which violates the proportionality assumption between image grayscale and actual concentration, this study is conducted directly on images after projection transformation and grayscale conversion. To enhance the robustness of the method against noise, the loss function is set to MAE, complemented by a multi-scenario training strategy. The experimental scenarios cover typical operational conditions for river monitoring, encompassing both controlled artificial channels with regular geometries and stable flows, and natural rivers with irregular boundaries, variable flow regimes, and moderate perspective distortions. While these scenarios represent common field conditions encountered in hydrometric practice, the applicability of the proposed method under more extreme environmental conditions—such as intense illumination variations, heavy sediment loads, or nighttime monitoring—remains to be fully validated. Future research will focus on developing noise-resistant preprocessing methods that preserve the grayscale-concentration correspondence, as well as conducting comprehensive tests in challenging environments to further expand the operational envelope of the algorithm.
Meanwhile, the measurement accuracy of a single velocity measurement point remains a critical indicator for all flow measurement methods based on the velocity-area principle. Although the proposed method achieves a smaller RMSE and exhibits better overall accuracy compared with traditional methods, lens distortion gives rise to large projection transformation errors near the two river banks, resulting in poor performance of the method at the velocity measurement points in these regions. In addition, the measurement of distances between field calibration points and the selection of calibration points on captured images can also give rise to measurement errors during the projection transformation process. Therefore, follow-up research will focus on standardizing the projection transformation procedure to reduce the adverse impacts of such issues on measurement accuracy.
In addition, it should be noted that the method proposed in this paper is methodologically different from existing deep learning methods for river flow velocity measurement. Traditional deep learning methods usually require a large amount of labeled training data, and are typically trained on public datasets before being transferred to river scenarios for testing. Most existing PINN-based methods need labeled data from the target scenario for training, which limits their cross-scenario applicability. In contrast, the method proposed in this paper is trained without labels, which is an important difference from existing methods. In a direct comparison with supervised architectures, it would therefore be difficult to distinguish whether performance differences stem from the model architecture itself or from differences in training data. How to compare deep learning methods of different paradigms under rigorous conditions is thus an important direction for subsequent research.
6. Conclusions
To address the lack of a physical basis and the low accuracy of traditional image-based flow measurement algorithms in estimating river surface velocity and discharge, this paper proposes an estimation method based on optical flow and physics-informed neural networks. The method introduces the convection–diffusion equation on the basis of optical flow and uses deep learning to solve for flow velocity. While improving measurement accuracy, it provides a degree of physical interpretability for the obtained results, which is more consistent with the time-varying motion laws of rivers.
As a label-free, data-driven approach, the proposed method eliminates the dependence of traditional supervised learning on labeled true flow velocity values by incorporating fluid mechanics equations into the loss function and introducing multi-scenario data. Standard PINNs take spatiotemporal coordinates as inputs and calculate partial derivatives through automatic differentiation. Because image data are discrete, the method proposed in this paper instead uses gray gradients calculated by a finite difference scheme as the input features of the network, directly establishing a mapping between pixel gradients and flow velocity. Meanwhile, to avoid some of the limitations of traditional PINNs, the boundary-condition and initial-condition constraints are omitted from the loss function. Instead, data collected from different river scenarios are used to train the model to enhance its generalization ability.
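To make the idea of finite-difference gray gradients feeding a physics-based loss concrete, the following Python sketch evaluates a label-free residual on three consecutive grayscale frames. It is an illustration under stated assumptions, not the network or loss used in this paper. The transport equation is assumed to take the form I_t + u·I_x + v·I_y = D·∇²I, the gradients use central differences, and all function names, the synthetic test pattern, and the parameter `diffusion` are hypothetical. In the actual method a network would predict (u, v) from the gradients, whereas here a candidate velocity is supplied directly to show what the residual measures.

```python
import math
from typing import List, Sequence, Tuple

Frame = List[List[float]]  # grayscale image as rows of pixel values


def gray_gradients(
    prev: Frame, curr: Frame, nxt: Frame, row: int, col: int
) -> Tuple[float, float, float, float]:
    """Central-difference (I_t, I_x, I_y, Laplacian) at an interior pixel."""
    i_t = (nxt[row][col] - prev[row][col]) / 2.0
    i_x = (curr[row][col + 1] - curr[row][col - 1]) / 2.0
    i_y = (curr[row + 1][col] - curr[row - 1][col]) / 2.0
    lap = (
        curr[row][col + 1] + curr[row][col - 1]
        + curr[row + 1][col] + curr[row - 1][col]
        - 4.0 * curr[row][col]
    )
    return i_t, i_x, i_y, lap


def residual_mae(
    frames: Sequence[Frame], u: float, v: float, diffusion: float = 0.0
) -> float:
    """Mean absolute residual of the assumed transport equation.

    u, v are candidate velocities in pixels per frame (x and y directions).
    """
    prev, curr, nxt = frames
    rows, cols = len(curr), len(curr[0])
    total, count = 0.0, 0
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            i_t, i_x, i_y, lap = gray_gradients(prev, curr, nxt, r, c)
            total += abs(i_t + u * i_x + v * i_y - diffusion * lap)
            count += 1
    return total / count


def synthetic_frame(t: float, u: float, v: float, size: int = 12) -> Frame:
    """Smooth pattern translated with velocity (u, v); hypothetical test data."""
    return [
        [math.sin(0.3 * (x - u * t)) + math.cos(0.2 * (y - v * t))
         for x in range(size)]
        for y in range(size)
    ]


frames = [synthetic_frame(t, 0.4, -0.2) for t in (0, 1, 2)]
mae_true = residual_mae(frames, 0.4, -0.2)  # velocity used to build the frames
mae_zero = residual_mae(frames, 0.0, 0.0)   # wrong candidate velocity
print(mae_true < mae_zero)
```

No velocity labels enter the computation: the residual is small only when the candidate velocity is consistent with how the gray levels actually move between frames, which is what allows such a term to serve as a training signal.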
Two test cases are employed to evaluate the performance of the model, and the effectiveness of the proposed method is verified through experiments in the artificial river channel and the natural river environment. In experiments under both scenarios, the measurement method presented in this paper achieves the smallest RMSE and exhibits good correlation with the measurement results of the current meter. Meanwhile, in the measurement of total discharge and average flow velocity, the proposed method is closer to the values measured by the current meter and realizes effective monitoring of river surface flow velocity and discharge.
Compared with traditional algorithms, the method proposed in this paper improves the accuracy of river flow velocity and discharge measurement to a certain extent, demonstrating its application potential in hydrometric measurements. Future research will focus on optimizing the performance of the algorithm under various environmental conditions and reducing the influence of lens distortion on measurement results, aiming to achieve high-precision estimation of river flow velocity and discharge in complex environments.