Optical Tracking Velocimetry (OTV): Leveraging Optical Flow and Trajectory-Based Filtering for Surface Streamflow Observations

Nonintrusive image-based methods have the potential to advance hydrological streamflow observations by providing spatially distributed data at high temporal resolution. Due to their simplicity, correlation-based approaches have until recent been preferred to alternative image-based approaches, such as optical flow, for camera-based surface flow velocity estimate. In this work, we introduce a novel optical flow scheme, optical tracking velocimetry (OTV), that entails automated feature detection, tracking through the differential sparse Lucas-Kanade algorithm, and then a posteriori filtering to retain only realistic trajectories that pertain to the transit of actual objects in the field of view. The method requires minimal input on the flow direction and camera orientation. Tested on two image data sets collected in diverse natural conditions, the approach proved suitable for rapid and accurate surface flow velocity estimations. Five different feature detectors were compared and the features from accelerated segment test (FAST) resulted in the best balance between the number of features identified and successfully tracked as well as computational efficiency. OTV was relatively insensitive to reduced image resolution but was impacted by acquisition frequencies lower than 7–8 Hz. Compared to traditional correlation-based techniques, OTV was less affected by noise and surface seeding. In addition, the scheme is foreseen to be applicable to real-time gauge-cam implementations.


Introduction
Streamflow observations are of paramount importance in hydrological modelling and engineering practice [1][2][3]. Monitoring streamflow velocity facilitates estimation of river discharge and enables the comprehension of complex phenomena, such as erosion dynamics and sediment transport [4]. Traditionally, flow velocity measurement relies on pointwise intrusive approaches, such as acoustic doppler current profilers and impeller flowmeters [5,6]. Alternatively, standard remote methods include radars and ultrasonic flowmeters [7]. Most of these methods are expensive and some of them require time-consuming experimental campaigns and the presence of qualified personnel. Intrusive and highly user-assisted technology cannot be adopted to monitor abrupt phenomena, such as flash floods, and large flood events that may be risky for personnel and equipment [8].
Close-range remote sensing through optical imagery has the potential to advance streamflow measurement by affording low-cost observations that are distributed (rather than pointwise) in space. Also, inexpensive optical equipment can be either installed at permanent measurement stations or mounted onboard unmanned aerial vehicles (UAVs), thus enabling frequent monitoring of hydrological processes. Many diverse approaches entail the extraction of hydraulic information from image data. For instance, in [9], water surface roughness was shown to influence total surface reflectance and to positively correlate with flow velocity. Thermal image data have been used to estimate flow depth and surface velocity towards an integrated and flexible river discharge measurement system [10]. Alternatively, optical image data of the stream water surface can be analyzed with different algorithms to reconstruct the surface flow velocity field. Such velocity maps can then be complemented with information on the bathymetry to provide streamflow estimations. Large scale particle image velocimetry (LSPIV) and particle tracking velocimetry (PTV) are correlation-based algorithms that are popularly used to obtain the surface flow velocity field from image data. Both methods revolve around the concept of particle, that is, a group of pixels either representing a specific object or a pattern, which functions as a tracer and whose position can be tracked in an image sequence. More specifically, LSPIV applies the principles of the classical particle image velocimetry (PIV) fluid dynamics laboratory technique to outdoor environments. PIV enables the estimation of the instantaneous flow velocity field of seeded fluids [11][12][13]. LSPIV was originally introduced by [14] and is based on a high-speed cross-correlation scheme between an interrogation area (IA) in a first image and IAs within a search region (SR) in a second image. Each image is divided into a grid of IAs and the cross-correlation coefficients between IAs and SRs are computed. The location of the maximum value of the cross-correlation coefficient in consecutive frames yields displacement vectors, and can be determined at sub-pixel accuracy using fitting schemes [13]. Based on camera acquisition frequency, it is then possible to estimate the instantaneous velocity from the displacement vectors. On the other hand, PTV consists of particle identification and tracking [15]. Firstly, images are processed to enhance the appearance of particles in the field of view and the location of the particles' centroid is recovered. Then, the centroid of the detected particles is identified in subsequent images to reconstruct the particle trajectory. Several algorithms have been developed for PTV analysis, with cross-correlation being most commonly implemented for both particle detection and tracking [15][16][17].
Due to their simplicity, correlation-based approaches have until recent been preferred to alternative image-based approaches. In [18], LSPIV has been found to underestimate surface velocities with respect to PTV in case of both highly and poorly seeded surfaces. Conversely, PTV combined with trajectory-based filtering has led to more accurate surface flow velocity estimations in both settings. Towards a more automated PTV approach, in [19], a novel nearest-neighbor PTV approach has been introduced (PTV-Stream) that affords the identification and tracking of features of any shape (rather than of round-shaped particles as in traditional correlation-based PTV). However, despite the versatility of PTV in scarcely seeded streams, both LSPIV and PTV are severely influenced by sunlight reflections and require a priori knowledge of several parameters, including tracer dimension and velocity, and the average surface flow velocity direction.
Besides LSPIV and PTV, image-based methods also entail a large number of algorithms that exhibit great accuracy even in the absence of tracers and in case of non-stationary patterns, such as optical flow approaches. Such methods aim at computing the image approximate 2D motion field from patterns of image intensity [20]. The underlying assumption to the implementation of optical flow is image brightness constancy, that is, even if objects depicted in images may change position in a short interval of time, the image reflectivity and illumination should remain constant [21]. Typically, optical flow is applied to autonomous robotics [22], whereby the motion of rigid bodies is determined, but it has also been adopted in climatological and geophysical applications [23][24][25] and to estimate the discrepancy between consecutive images [26]. The non-rigid motion structure and dynamic variations of fluid flow pose serious challenges to the implementation of optical flow, which have partially been addressed through diverse strategies [27][28][29][30][31][32][33][34].
Optical flow methods include many diverse approaches that share the following similarities: (i) Images are prefiltered to enhance their signal-to-noise ratio; (ii) basic measurements (such as derivatives) are computed from images; and (iii) such measurements are then integrated to extract the 2D flow field. Differential techniques focus on the computation of the velocity field from spatio-temporal derivatives of image intensity. They entail both global, that minimize a global energy functional such as the Horn-Schunck method [20], and local, that optimize local energy expressions schemes [35]. Global methods lead to dense flow fields whereby velocity information are extracted at each image pixel. Such high information density is however mirrored by high computational times. Conversely, local schemes tend to be more robust to noise and more computationally efficient. Several studies have explored the trade-off between quality of the computed flow and computational times by proposing fast and robust local methods [36].
Both global and local optical flow approaches have been demonstrated for fluid flow monitoring, and in several laboratory applications these methodologies have been adopted to estimate the velocity field of fluid flows [37,38]. Regarding optical flow in environmental settings, in [39], an altered version of the global approach by [40] has been applied to extract the surface flow velocity field of a river from two video frames. Similarly, in [41], the velocity field of a dam break flow and of an aerated stepped spillway flow has been estimated. Compared to classical PIV, the optical flow approach led to results at higher spatial resolution even if longer computational times were required. In [42,43], global optical flow has been exploited to reconstruct instant surface currents from either satellite-or Unmanned Aerial Vehicle (UAV)-captured images.
Among local techniques, the Lucas-Kanade algorithm has been frequently adopted in several scientific fields due to its reduced sensitivity to noise with respect to alternative first-order techniques [21]. Different from matching methods, differential algorithms are generally better at estimating subpixel displacements, which is an asset in case of the low velocity fields encountered in natural flows. For instance, in [44,45], the Lucas-Kanade algorithm enables subpixel accuracy in estimating glacier flow and sea ice drift. Similarly, in [46], objects drifting on a lake surface are tracked from images analyzed with the Lucas-Kanade approach, and in [47], tracer particles are deployed and tracked onto the surface of a water stream. Further, the Lucas-Kanade algorithm has also been instrumental to investigate high-velocity fields in case of skimming flows above a stepped chute [48] and a flash flood event [49].
Motivated by the robustness of the Lucas-Kanade algorithm and by its independence on regularly-shaped tracers, in this work, we propose a differential local optical flow-based approach, optical tracking velocimetry (OTV), that combines (i) feature detection (testing five different algorithms); (ii) tracking through the pyramidal Lucas-Kanade algorithm; and (iii) trajectory-based filtering as in [18], to estimate the surface flow velocity field of natural streams. OTV does not rely on the deployment of tracers in the field of view, encompasses automated feature identification and tracking, and only retains reliable trajectories, which can be related to the transit of physical objects in the field of view, thus minimizing the probability of fake trajectories and allowing for uncertainty estimation. The objectives of our work entail assessing (i) if OTV is suitable for streamflow measurement from image data in diverse hydrological conditions; (ii) the robustness of OTV in case of high variance between successive frames; and (iii) the performance of OTV with respect to four alternative correlation-based approaches.
Specifically, to experiment with varying hydrological settings, we apply OTV to the analysis of videos captured in the Brenta River in case of abundant and artificial tracer seeding and in the Tiber River during a moderate flood event with natural floating material. Towards the establishment of a robust OTV procedure, five different feature detector estimators are selected and applied to the analysis of experimental videos that present highly diverse hydraulic regimes and image appearances. Such analyses aim at replicating adverse environmental conditions, where rapidly changing hydroclimatic conditions may invalidate the assumption on the constancy of image brightness that underlies optical flow. The effect of the variability between consecutive images is simulated by testing OTV on image sequences at low frame acquisition frequency and reduced frame resolution. Besides providing insight on the robustness of the procedure, these tests also shed light on the possibility of implementing OTV in real-time on image captured through gauge-cams in natural rivers. Finally, the performance of OTV is validated against velocity estimations from four alternative correlation-based techniques.

Case Studies and Methodology
We executed streamflow observations by capturing videos of the surface of two natural rivers in Italy and processing image sequences with OTV, an optical flow-based procedure. The approach is demonstrated in case of feature detection through five different algorithms. Identified features are then tracked by the pyramidal Lucas-Kanade approach, and reliable trajectories are retained by applying the trajectory-based filtering procedure developed in [18]. Based on the extracted trajectories, surface flow velocity estimations are computed and then compared to values obtained from four alternative correlation-based approaches.  Experiments aimed at replicating ideal conditions for streamflow measurements with correlationbased optical approaches. Namely, woodchips were continuously deployed from the upstream side of the bridge throughout the entire cross-section. Particle diameter ranged within 10 to 15 pixels. Particles moved by less than 6 pixels between consecutive frames. On average, throughout an image sequence, particle density was approximately equal to 1.5 × 10 −4 particles/total pixels (assuming an average particle diameter of 12 pixels). Except for areas without tracers, a number of five to eight particles were on average present in interrogation windows of 32 × 32 pixels. The portable telescopic apparatus developed in [50] was installed on the downstream side of the bridge. The apparatus featured a GoPro Hero 4 Black edition camera oriented with its axis orthogonal to the water surface. A metre stick was placed on the right-side stream bank at the same level of the water surface for image photometric calibration. The camera was set to full HD (1920 × 1080 pixels) resolution and at 50 Hz acquisition frequency. A video was recorded for over 4 minutes, and 12 video sequences of 20 s each and subsampled to 25 Hz were extracted out of the original footage. Analyzed images were 1430 × 1080 pixels in resolution (corresponding to a field of view of 7.1 × 5.3 m 2 ) and a bottom right area of 552 × 375 pixels displaying vegetation was masked with a black patch. Original RGB images were converted to grayscale intensity by eliminating hue and saturation information and retaining the luminance. To emphasize lighter particles against a dark background, images were gamma corrected to darken midtones [51], Figure 1b.

A Moderate Flood Event in The Tiber River
A video recorded at the gauge-cam station on the Tiber River underneath the bridge "Ponte del Foro Italico" in the city of Rome, Italy [52,53], on February 2015, at 7:40 a.m., was analyzed through OTV, Figure 1c. The gauge-cam station acquires one-minute long videos every ten minutes through a Mobotix FlexMount S15 weatherproof internet protocol camera. The camera comprises two optical modules with independent sensors and lenses, whereby only footage from the right-side L25 lens (82 • angle of view and 4 mm focal length), which captured an area of 16.15 × 12.11 m 2 , was retained for the analyses. The optical axis is perpendicular to the water surface thus eliminating the requirement of orthorectification. Also, the gauge-cam station features a system of two <20 mW green (532 nm wavelength) lasers installed at 50 cm on both sides of the camera, Figure 1d. The lasers are activated for 20 s at the beginning of each video, and generate highly visible reference points located 1 m apart on the water surface. This laser system enables image photometric calibration. The frame acquisition frequency of the gauge-cam station is automatically adjusted based on external illumination conditions and, thus, varies for each captured clip. The overall frequency of the sequence of 410 images was estimated to be 6.95 Hz as a weighted average of each clip's frequency. Such a relatively low frequency was sufficient to apply image-based streamflow measurement [18].
During the video, the transit of debris and vegetation was observable on the river surface due to a moderate flood. Generally, floating material tended to clump together to form objects on the order of 100 × 100 pixels. On average, material had a frame-by-frame displacement of less than 15 pixels. Over the entire image sequence, particle density was estimated to be 2.2 × 10 −6 particles/total pixels (assuming an average particle diameter of 100 pixels). Large scale floating material tended to transit on the right hand side of images. Image resolution was set to 1024 × 768 pixels for both optical modules. Images were affected by barrel distortion due to the wide-angle L25 lens. Fisheye distortion was removed using the Adobe Photoshop "Lens correction" filter. Also, frames were further trimmed to 865 × 530 pixels to remove highly distorted borders, Figure 1e.

Feature Detectors
Feature identification revolves around three main phases. First, points of interest are detected based on distinctive locations inside images. For instance, corners and junctions are frequently considered candidate features [54]. Then, the neighborhood of the feature is described as a vector and finally matched among different images, whereby the matching consists of computing a distance between vectors. All of these phases are affected by noise and, depending on the selection of the neighborhood, may require diverse computational times. Outdoors, image noise is considerable and feature identification plays a key role in the computation of trajectories and flow velocity. Also, even if several comparisons among feature detectors are available in the literature [55], changing illumination and the presence of fluid flow in streamflow images pose serious challenges that have been rarely investigated. Thus, in order to provide reliable guidelines for image-based streamflow observations, we assessed the performance of five different feature detectors that are highly popular in computer vision.
The good features to track (GFTT) approach identifies features based on their change of appearance between subsequent images. In this method, good features are the ones that can be more reliably tracked in image sequences, thus circumventing the requirement to define corners and points of interest in images [56]. Further, GFTT assumes an affine motion model, where features can be identified not only if they translate but also if images display deformation. The scale invariant feature transform (SIFT) is an efficient object recognition system that transforms images into large collections of local feature vectors. Features are invariant to translation, scaling, rotation, and moderately invariant to illumination changes and affine projections [57]. Even if not as computationally efficient as alternative algorithms, this approach has proved successful in numerous computer vision applications [58]. The features from accelerated segment test (FAST) directly addresses computation efficiency by a machine learning approach that yields high quality features [59]. The algorithm classifies image corners by considering a circle of sixteen pixels around a corner candidate and evaluating the intensity of each pixel. The speeded-up robust features (SURF) stems from previous feature detectors and improves their scale and in-plane rotation invariance [60]. Similarly, the oriented FAST and rotated binary robust independent elementary features (ORB) detector builds on FAST and exhibits orientation invariance and higher robustness to noise [61].
The robustness of the feature detection approach in case of high variance between successive frames was investigated by subsampling the image sequences of a representative video on the Brenta River at 12.5 Hz and 8.3 Hz. Reducing the frame frequency is crucial to minimize computational times, thus enabling real-time and continuous image processing that is of paramount importance to capture the evolving dynamics of hydrological processes. However, lower acquisition frequencies also yield an increased difference between successive frames, thus invalidating the assumption of intensity constancy. While the Lucas-Kanade approach can generally be adjusted to frequency to partially mitigate this issue, we identified the minimum frequency at which OTV led to meaningful results. Similar to frame acquisition frequency, image resolution greatly impacts computational times and the suitability of the image processing method for real-time operation. To this end, we subsampled the resolution of the image sequences to 720 × 953 pixels and 576 × 763 pixels. We then ran OTV with the five different feature detectors on the subsampled frame sequences.
To fully inspect the performance of the detectors, we also evaluated their effectiveness at frame-by-frame feature identification on a consistent representative image sequence on the Brenta River. In our test, we computed the average number of features identified by each detectors over the entire frame sequence. In addition, to verify if identified features also pertained to actual objects, we compared the number of detected features to the total number of trajectories. Specifically, we set the maximum number of trackable features among subsequent frames to 20,000. We then summed the total number of trajectories output by the method for different feature detectors. Results were also benchmarked to features randomly selected in the field of view, where the maximum number of randomly identifiable features in each frame was again set to 20,000. Similar tests have already been adopted in the literature to compare the performance of different detectors and inform their selection [62]. Our test was executed on full HD images recorded at 25 Hz.

Tracking Approach
The Lucas-Kanade algorithm is a sparse first-order differential technique that computes spatio-temporal derivatives of image intensity. The method assumes that image intensity, I(x, t), a function of both position, x = (x, y), and time, t, is conserved. Intensity constancy in image translation can be formulated as [20]. More generally, the optical flow equation, dI(x, t)/dt = 0, is expressed as a gradient constraint equation, where ∇I(x, t) and ∂I(x, t) ∂t indicate partial space and time derivatives and ∇I · v denotes the dot product. According to [63], the residual function between images can be solved by the weighted least squares principle, where weights are assigned to image subsets. The weighting procedure gives more importance to pixels in the center of the subset.
In case of natural streamflow videos, images may display large pixel motion, which may lead to inaccurate optical flow computation. To this end, we used the pyramidal Lucas-Kanade approach where image resolution can be subsampled up to four levels [64]. Optical flow is then computed at the lower image resolution and then propagated up to the original image. Basically, the initial guess for pixel displacement is refined in a recursive fashion at higher resolution.
The algorithm was implemented in the open source computer vision (OpenCV) library in C++. We used the pyramidal approach up to the fourth level with a search window of 15 × 15 pixels in size. Features identified through the five detectors were used as input to the method and then matched in successive images. The iterative Lucas-Kanade algorithm was run until a maximum number of 20 iterations or when the search window moved by less than 0.03 pixels. Features were filtered out if the relative gradient matrix had a minimum eigenvalue less than 0.001 [64].

Trajectory-Based Filtering
Once feature trajectories were determined as in Sections 2.2.1 and 2.2.2, they were post-processed through a filtering procedure developed in [18] for PTV results. Similar to [65], the procedure aims at retaining trajectories for velocity estimation only if they exhibit minimum variance of length and angle of travel between successive images, and if the detected features proceed for a minimum length in the field of view. The camera field of view is assumed to be directed with its width along the river cross-section. Also, detected features are supposed to move forward (that is, their trajectories should be fairly orthogonal to the river cross-section). These simple assumptions aim at reducing "false" trajectories that may bias velocity results. In fact, slanted trajectories with respect to the stream cross-section were frequently found to pertain to agglomerated features that separate while moving in the field of view [66].
The filtering constraints can be summarized as follows. When two successive frames (that is, frames k and k − 1) are considered, feature position vectors X k and X k−1 should satisfy the following condition: whereby j and i are unit vectors orthogonal and parallel to the river cross-section, respectively. Equation (2) is verified between each pair of frames during the analysis. Trajectories are further selected based on their length and slant. Namely, we only retain trajectories slanting up to 80 • from the river cross-section and that extend for at least 20% of the height of the field of view (H). The first condition is imposed through whereby X end and X start indicate the feature position vectors in the final and first images of the trajectory, respectively. The length of the trajectory is validated through To minimize computational time, filtering through Equations (3) and (4) is implemented as soon as tracking of a selected feature through the Lucas-Kanade algorithm is ended. Threshold values in Equations (2)-(4) can be opportunely adjusted to account for the specific experimental conditions. Streamflow velocity for each trajectory (V) is then determined as where ∆t is the time interval between the final and first images of the trajectory. Time-averaged cross-sectional profiles obtained with OTV and the five different feature detectors were tested for statistical significance through the analysis of variance (ANOVA), with level of significance, p, set to p < 0.05.

Alternative Algorithms
Video data were processed with alternative correlation-based algorithms that are frequently implemented in hydrological studies. Specifically, image sequences were analyzed with LSPIV, PTV, and with the recently introduced PTV-Stream approach. PTV analyses were, in turn, applied both without ("Unfiltered PTV") and with ("Filtered PTV") the trajectory-based filtering procedure. Details on the procedures and implementation of LSPIV and PTV can be found in [18], while the PTV-Stream approach as well as its implementation on the Brenta and Tiber Rivers are presented in [19]. On the Brenta River, LSPIV was executed on sequences subsampled at 12.5 Hz and adopting the 1-2, 2-3 sequencing style. Standard direct cross-correlation was computed by setting the interrogation area to 32 × 32 pixels and the grid size to 16 × 16 pixels. The 2 × 3-point Gaussian fit was used as sub-pixel displacement peak estimator. Upper and lower thresholds (u ± 3σ u , with u the mean velocity for each analyzed frame pair and σ u its standard deviation) were applied to velocity results. Consistent parameters were set to analyze the video of the Tiber River.
PTV entailed particle identification through cross-correlation with a symmetric Gaussian kernel (intensity grayscale level set to 130, standard deviation to 12 pixels, and correlation threshold set to 0.5). Further, particle tracking was implemented using cross-correlation by interrogation area (interrogation area set to 20 pixels, cross-correlation threshold to 0.4, and neighbor similarity percentage to 20%). Analysis of the Tiber River image sequence with PTV encompassed consistent approach and parameters. The intensity grayscale level was set to 90 while all other parameters were consistent with the analysis on the Brenta River. PTV-Stream was implemented on videos of the Brenta and Tiber Rivers by setting the luminance threshold for particle identification to ±50 and the particle area threshold to 35 pixels. Frame-by-frame changes in particle areas were set to less than 20% or larger than 120% and the search area was set to 20 × 100 pixels.

Velocity Data Extraction and Comparison
To offer a comparison among different image-based techniques for data collected on the Brenta River, the field of view was divided into eight parallel vertical regions 100 pixels-wide each.
Such regions were selected in the center of the stream to exclude areas close to river banks, where tracers were scarce. Average velocities and standard deviations inside regions were computed for each technique. Specifically, velocities from OTV, PTV, and PTV-Stream were computed by averaging values obtained from Equation (5)  Image-based data were validated against velocities measured with an OTT Hydromet C2 current meter captured 3 cm below the water surface and at four locations along the stream cross-section up to 3.5 m from the right-side river bank. Twelve replicates of the current meter measurements were performed at each location; average velocities (v b ) and standard deviations (σ b ) are reported in Table 1. Average velocity and standard deviation data on the Tiber River were obtained by averaging over the entire LSPIV time-averaged map and over the trajectories estimated with OTV, PTV, and PTV-Stream. Results were compared to measurements from an RVM20 speed surface radar installed next to the gauge-cam station that recorded an average surface velocity of 2.33 m/s during the experiment.

Results
In this Section, we report results for the Brenta and Tiber Rivers. For each case study, we first present velocimetry results obtained by analyzing video data with OTV. Then, we illustrate the performance of the different feature detectors and tracking approach. Finally, we compare velocity estimations to results from LSPIV, PTV, and PTV-Stream.

Assessment of OTV through Controlled Outdoor Tests in the Brenta River
OTV enabled successful extraction of the surface flow velocity of the Brenta River in moderate flow conditions. This is remarkable since experimental videos featured homogeneously and continuously seeded surfaces, which are inherently advantageous to correlation-based LSPIV and PTV rather than to optical flow [34]. Nonetheless, a considerable number of trajectories (on the order of 1000) were computed through OTV in each image sequence. All retained trajectories covered more than 20% of the field of view and were several orders of magnitude more numerous than trajectories obtained with the filtered PTV approach illustrated in [18]. Therefore, OTV was not severely affected by the absence of continuous patterns in images nor by noise due to water reflections, and it may be suitable for streamflow measurements in challenging environmental conditions.  (5). Left-hand plots display all trajectories identified by the method, while plots on the right-side only report trajectories that satisfy the filtering constraints. All techniques successfully reconstruct the surface flow velocity profile in the stream reach, with faster trajectories found in the center of the stream and slower ones close to the banks.

Average Surface Streamflow Velocity
Plots of unfiltered data report numerous low-velocity trajectories close to the left-side bank of the stream, where the stream bed is rich in permanent vegetation.
FAST-based OTV successfully identifies and tracks features located at the bottom of images in the shadowy area below the stream bridge, dashed rectangle in Figure 1b. This suggests that FAST is robust to noise in the illumination conditions of the water surface. SIFT-based OTV proves instead the most affected by changes in illumination conditions since many trajectories only start above the bridge shadow. SURF and GFTT-based OTV show several filtered trajectories close to the right-side stream bank, where the transit of artificially-deployed wood chips was rare. Thus, such detectors identify features that do not necessarily exhibit a higher contrast with respect to the background, which is a crucial property in case of unseeded water surfaces. For a representative image sequence, in Figure 3, trajectories retained after the filtering procedure are reported for each vertical region and all feature detectors. Consistent with Figure 2, SURF and GFTT present a higher number of trajectories close to the right-side stream bank as compared to alternative algorithms. The SIFT technique yields the highest number of trajectories in three vertical bands out of eight, while the FAST algorithm leads to the lowest number of trajectories in seven vertical regions. Average surface flow velocity estimations obtained by OTV with the five feature detectors for the 12 image sequences are reported in Figure 4. In Figure 4a, OTV results are illustrated for all the detectors along with current meter data (black dashed line). The root mean square error (RMSE) values between OTV results and the current meter data are also reported. In Figure 4b, the coefficient of determination (R 2 ) values computed between results from each feature detector and the FAST method are showed. Average velocities are in good agreement among all procedures, with an average of 0.47 m/s. Such a value is slightly above the current meter measurements close to the center of the stream, which were taken at a depth of a few centimeters below the water surface. All the approaches also exhibit similar standard deviations (error bars in Figure 4a) and R 2 values. By analyzing the time-averaged cross-sectional profiles in Figure 4a through ANOVA, the difference among the techniques did not result statistically significant, thus supporting that all feature detectors captured the general behavior of the flow.

Subsampled Video Acquisition Frequency
On average, particles moved by less than 6 pixels between consecutive frames in the Brenta River. Minimum and maximum velocities estimated with all detectors were 0.157 m/s and 0.716 m/s, respectively. By subsampling the acquisition frequency to 12.5 Hz and 8.3 Hz, frame-by-frame displacement increased to 12 and 18 pixels for the maximum velocity and from to 2.5 and 3.8 pixels for the minimum velocity. In Figure 5, time-averaged cross-sectional profiles are reported for the five detectors and the three acquisition frequencies. Further, RMSE values between OTV results and current meter data and R 2 values computed with respect to the results at 25 Hz are showed. Specifically, a 1430 × 1080 pixels image sequence at frequencies of 25 Hz, 12.5 Hz, and 8.3 Hz was analyzed. The average velocity over the cross-section generally decreases up to a minimum of 0.44 m/s in case of the ORB detector and at the frequency of 8.3 Hz. On the other hand, the average standard deviation is not affected by frequency reduction, and therefore, even if the number of trajectories decreases, the filtering procedure is successful at retaining only reliable trajectories. Average velocities obtained with different feature detectors at the same frequency were not statistically different. Even if, at low frequencies, average velocities tended to decrease, differences between average velocities obtained with consistent feature detectors on frames subsampled at different frequencies were again not statistically significant (p >> 0.05). Below the frequency of 7-8 Hz, OTV led to a sharp decrease in average velocities well below realistic values.

Subsampled Image Resolution
In Figure 6, time-averaged cross-sectional profiles are reported for the five detectors and three image resolutions: 1080 × 1430 pixels, 720 × 953 pixels, and 576 × 763 pixels. Specifically, a 25 Hz image sequence was sampled at 1430 × 1080 pixels, 720 × 953 pixels, and 576 × 763 pixels in resolution. RMSE values between OTV results and current meter data and R 2 values computed with respect to the results at 1080 × 1430 pixels resolution are also showed in Figure 6. In the analysis of lower-resolution images, input parameters to the Lucas-Kanade algorithm and the trajectory-based filtering procedure were consistent with the ones adopted for images at the original resolution. The approach is minimally affected by decreased resolution and average velocities and standard deviations are consistent with values obtained from full HD images. No statistically significant differences are observable among average velocities obtained from consistent feature detectors on images at different resolutions. Table 2 reports the average number of features identified in individual frames with the five different detectors in a representative image sequence on the Brenta River. Also, the total number of tracked features in the entire image sequence are presented (we set the maximum number of frame-by-frame detectable features to 20,000). The ORB detector is the most efficient at sensing features in images, with almost twice the number of features detected with FAST. On the other hand, SIFT and GFTT are the least efficient at identifying features but yield the highest number of tracked trajectories in the image sequence. Thus, such approaches accurately identify objects moving on the water surface whose trajectories were realistic and retained through the filtering procedure. Randomly selected features also lead to a high number of tracked features retained in the filtering phase. Generally, the ORB detector results as the most affected by noise since a large number of the detected features are filtered out from the computation (below the threshold of 20,000). The FAST technique instead leads to an approximately consistent number of identified and then tracked features. Also, this is the most computationally efficient technique that took less than a fifth of the time requested by the SIFT-based approach to process the image sequence. Randomly selecting features in the field of view also proves more computationally expensive than the FAST detector. On average, PTV techniques led to the identification of less than 70 features per frame. Such a value was further decreased upon trajectory-based filtering.   Figure 7, time-averaged cross-sectional profiles obtained with OTV are averaged with respect to the 12 image sequences and compared to alternative algorithms and the benchmark current meter measurements. RMSE and R 2 values computed between each method and the current meter data are also reported. The behavior of OTV techniques is generally consistent among different detectors. Generally, OTV velocity estimates are slightly higher than benchmark measurements since the current meter was deployed at 3 cm below the water surface. Consistent with PTV data, OTV largely overestimates water surface velocity at 2 m from the right-side bank, where stream bed irregularities may have influenced measurements with the current meter. Filtered PTV data exhibit the lowest standard deviation probably due to the fact that much less trajectories are retained in the computation with respect to OTV and unfiltered PTV. Unfiltered PTV instead shows the highest standard deviation, and therefore, cross-correlation with a gaussian mask implemented in PTVLab [16] may not be ideal to detect features on water surfaces in outdoor conditions. The recently introduced PTV-Stream technique is in general agreement with OTV and PTV data but leads to higher standard deviations. At 1 m from the right-side stream bank, only one trajectory is computed with PTV-Stream and, therefore, the standard deviation is null. Finally, results from LSPIV are generally lower than benchmark measurements as illustrated in [18].

Feature Detector Performance
In Figure 8, R 2 values computed between each approach and the benchmark data obtained with the current meter are illustrated. FAST-based OTV results exhibit the highest similarity to reference data (R 2 = 0.83). Good performance is also achieved with ORB-based OTV (R 2 = 0.74) and the filtered PTV (R 2 = 0.70). The other PTV and OTV approaches present R 2 values approximately equal to 0.60. LSPIV results yield R 2 = 0.49. Figure 9 displays unfiltered and filtered trajectories generated with OTV and the five different feature detectors for the video on the Tiber River. The filtering procedure successfully eliminates the influence of stationary reflections in the field of view by removing the low-velocity trajectories located on the left of the field of view. On the other hand, the high velocity area is consistently identified through all detectors on the right hand side of the image sequence. This region was also rich in floating sediments, and several objects (debris, floating vegetation, etc.) could be observed over the entire duration of the video. Since many trajectories do not extend for the entire field of view, it can be noted that OTV is influenced by the bridge shadow that created two regions at different illumination parallel to the image width, dashed rectangle in Figure 1e.  Table 3 reports all the values of average velocity and standard deviations obtained with the different velocimetry approaches for the video of the moderate flood in the Tiber River. All methods severely underestimate the actual velocity and this is probably due to the variable frame acquisition frequency, that considerably influences velocity estimation. Remarkably, the performance of techniques that involve the trajectory-based filtering procedure is much better than LSPIV and unfiltered PTV (Unf. PTV in Table 3). Thus, it can be inferred that, in challenging conditions, imposing a few constraints on the general behavior of the flow may be highly beneficial for velocity estimations. The low standard deviation observed for PTV-Stream can be explained by the fact that only a meagre number of trajectories (slightly more than 20) are retained by the filtering procedure for velocity computation. In case of filtered PTV (Filt. PTV in Table 3), slightly less than 70 trajectories are retained for the computation against an average of more than 175,000 in case of OTV.

Suitability of OTV for Streamflow Observations
OTV has proved efficient at providing information on the surface flow kinematics in outdoor conditions based on video footage captured in two different experimental settings. In case of highly seeded surfaces, where optical flow schemes do not typically outperform correlation-based methods, the combination of automated feature detectors, sparse Lucas-Kanade, and trajectory-based filtering yielded accurate velocity estimations, generally in agreement with benchmark manual data. In case of the moderate flood on the Tiber River, the scheme was severely affected by unstable acquisition frequency and velocities were largely underestimated. In experiments on the Brenta River, optical flow-based velocities displayed very low standard deviations, thus supporting the evidence that the variable frequency of the footage of the Tiber River may have biased the computation of surface velocity. In both sets of experiments, the method generated up to several thousands of trajectories even in regions of the field of view where the transit of actual objects could not be visually observed. This fact is of fundamental importance in case of difficult-to-access environments, where deploying tracers may be hampered. As expected, the need for a trajectory-based filtering procedure in outdoor conditions was essential to guarantee that objects rather than stationary water reflections or noise due to illumination were detected and tracked.
In both data sets, the different feature detectors led to consistent results. The FAST approach proved as the most efficient due to its balance between identified and effectively tracked and filtered features. Computational times required by the FAST implementation were generally shorter than other techniques, even with respect to the random selection of features in the field of view. The SIFT and ORB algorithms were more significantly affected by changes in illumination, which are typical in natural settings. The SIFT approach was successful at identifying numerous features but then also led to increased processing times. Interestingly, the SURF and GFTT detectors recovered many features in regions that did not present very high contrast. This fact could be advantageous in case of mirror-like uniform water surfaces and low flow regimes, where the transit of floaters is much scarcer.
The methodology proved relatively insensitive to image resolution while acquisition frequency was found to be a key control to its performance. In experiments on the Brenta River, a frame acquisition frequency lower than 7-8 Hz led to a sharp decrease in velocity estimation. Similar findings have also been observed in PTV-Stream implementations [19]. Of course, the selection of the frame acquisition frequency is related to the size of the field of view and to flow velocity. Therefore, based on our findings, a preliminary analysis on subsampled video sequences may be beneficial to inform the installation of permanent gauge-cams or instrumentation in the field.

Comparison to Alternative Velocimetry Algorithms
OTV results were generally in good agreement with the filtered PTV method and PTV-Stream. The improved performance of procedures that involve filtering of the trajectories confirms that this is a crucial phase to retain reliable trajectories and, therefore, generate accurate velocity estimations. In addition, the filtering procedure only requires minimal information that can be easily gathered by a preliminary inspection of the field of view. Different from filtered PTV and PTV-Stream, OTV allowed for generating a much more significant number of trajectories, which is beneficial to estimate the uncertainty of velocity estimations, and it also required less input since the detection and tracking is fully automated. Another advantage with respect to traditional PTV is the independence of OTV on the shape and size of the tracers that allows for improved performance in case of naturally occurring floaters [34,39]. Compared to LSPIV and traditional unfiltered PTV, OTV led to more robust velocity estimations with lower standard deviations. Our findings demonstrate that traditional correlation-based methods suffer from noise, dishomogeneously seeded surfaces, and from the irregularities that are typical of outdoor settings. While such methods may be efficiently implemented in controlled settings, OTV may instead be a valid alternative in complex settings.
OTV was also the most computational efficient among the methods presented in this work. Processing an image sequence of the Brenta River required approximately 2 min on a personal computer (×86_64 architecture with an AMD Ryzen 7 1700X eight-core processor, CPU ranging from 2.2 GHz to 3.4 GHz, 2 threads per core, and total memory of 16.4 Gb) against the 17 min (with a deviation of 7 min on an ASUS laptop with an Intel Core i7 2670QM Processor, CPU ranging from 2.2 GHz to 3.1 GHz, 2 threads per core) taken by PTV-Stream, which is a custom-made procedure developed in Java. The computational times required for PTV and LSPIV processing with PTVLab and PIVLab are two orders of magnitude longer than OTV.

Criticalities and Future Developments
The efficiency and robustness of OTV support its implementation for real-time streamflow observations through gauge-cams. We foresee that processing units may be integrated in prototypes similar to the one installed on the Tiber River to generate spatially distributed velocity data at high temporal resolution. Of course, extensive calibration and optimization of the algorithms would be necessary to enable fast computation on simple processing units. In addition, further efforts are necessary to validate the performance of the feature detectors and the tracking scheme in even more diverse flow regimes and illumination settings. For instance, generating results in case of extremely scarce illumination or at night is still an open problem. In some other instances, the presence of image subregions that display highly different illumination, such as, for instance, underneath bridges, may lead to shorter trajectories (that is, the tracking is typically interrupted when the feature crosses two areas where the average intensity is remarkably different). Similarly, rapid changes in the illumination may introduce several problems in recovering long and reliable trajectories. On the other hand, preliminary tests on mirror-like surfaces have highlighted that the absence of objects visible with a naked eye is not crucial to apply OTV. In fact, the feature detectors successfully identify minimal changes in the intensity of the water surface that can be regarded as tracers.
OTV results were developed both in case of shallow depth (the Brenta River water level was 0.41 m deep at the time of the experiments) and rather deep stream (the water level was greater than 7 m in the Tiber River at the time of the experiment). In the Brenta River, bottom-reflected radiance was non-negligible; however, it did not affect the accuracy of surface flow velocity measurements. On the other hand, in the Tiber River, higher turbidity and depth should enhance the visibility of surface tracers and flow velocity estimation. Nonetheless, in this latter case, the irregular frequency of image sequences controlled the accuracy of the method. Interestingly, when the bottom-reflected radiance is dominant (that is, especially in shallow streams) image data could also be utilized for stream depth retrieval [67], thus leading to a flexible and integrated discharge measurement approach.
Establishing standard protocols and guidelines for the implementation of OTV in different settings is another important issue to be addressed in the future. Even if innovative observational techniques and approaches are blossoming in hydrology [68], many of these multidisciplinary methods are not validated and taken to the maturity of traditional techniques. In this respect, very few initiatives are currently aiming at leveraging the potential of new methods through the coordinated harmonization of protocols and techniques [69].
Another criticality and future endeavor entails the computation of flow discharge and the establishment of rating curves from OTV. Even if some proof-of-concept experiments have been conducted to demonstrate discharge estimation from image-based methods [70,71], this is still an open problem in hydrology. We foresee that the integration of OTV and the methodologies by [72,73] for the estimation of mean velocity in open channels may be beneficial for advancing current practice in remote hydrological observations.

Conclusions
A novel optical flow scheme, optical tracking velocimetry (OTV), was introduced for remote streamflow observations in natural settings. The approach encompasses automated feature detection, tracking through the differential sparse Lucas-Kanade algorithm, and trajectory-based filtering from a priori known information on the flow direction. The methodology was tested on two diverse image data sets: a set of videos collected on the Brenta River, where artificial seeding was provided, and the footage of a moderate flood captured by a gauge-cam on the Tiber River. Average surface flow velocity estimations on the Brenta River were in good agreement with the current meter measurements for all the five feature detectors. Velocity results for the Tiber River were instead more severely influenced by unstable frame acquisition frequency.
Different from alternative correlation-based approaches, OTV led to a number of trajectories on the order of several thousands. Out of the tested feature detectors, the FAST algorithm proved the most computationally efficient and stable in terms of number of features identified and then tracked. OTV was only mildly affected by low image resolution and led to acceptable results even in case of rather low image frequencies (7-8 Hz with respect to an average surface flow velocity on the order of 0.4-0.5 m/s). The methodology highly benefited from a posteriori trajectory-based filtering that only retains trajectories pertaining to actual objects transiting in the field of view. OTV is inherently suited for real-time implementations on gauge-cams, even if further validation and standardization of protocols are necessary in a wide array of hydrological settings.