1. Introduction
Event-Based Imaging Velocimetry (EBIV) [1,2,3,4] has recently gained increasing attention as a novel non-intrusive, whole-field flow velocity measurement technique. In EBIV, tracer particles are illuminated by a laser sheet and their motion, assumed to follow the underlying flow [5], is captured by an event camera. The flow velocity is then estimated from the recorded particle events. As the enabling sensor, event cameras [6,7,8,9] respond only to logarithmic brightness changes at individual pixels and output asynchronous events with microsecond temporal resolution, far surpassing conventional frame-based cameras. This acquisition paradigm allows particle motion to be captured clearly under extremely high-speed or challenging illumination conditions, highlighting the strong potential of EBIV for resolving complex flow dynamics [10,11]. In practice, fully unlocking this capability relies on estimation algorithms that can effectively convert sparse, asynchronous events into accurate and dense velocity fields. The difficulty arises from the sparsity and velocity dependence of the event data, as well as the inherent complexity and multiscale structure of realistic flow fields [12,13]. These factors lead to two essential challenges for EBIV: achieving high accuracy for each individual velocity estimate and obtaining sufficiently dense spatial measurements across the field of view. In other words, the main difficulty for EBIV algorithms is to simultaneously achieve a high dynamic velocity range (DVR) and a high dynamic spatial range (DSR) [14,15].
EBIV algorithms can be classified by their use of pseudo-frames. (1) The first strategy converts events into pseudo-frames, such as voxel grids [1,16], time surfaces [17], or binary particle maps [11], and then applies conventional frame-based velocity estimators, including cross-correlation [1], optical flow [17,18], or deep neural networks [16,19], to compute displacements over a time interval. (2) The second strategy operates directly on raw events without reconstructing pseudo-frames. It can be implemented by tracking individual event trajectories, e.g., triplet matching [20] or Kalman-filter-based trajectory reconstruction [21], or by aligning groups of events through methods such as local plane fitting [22] or motion-compensation approaches (e.g., contrast maximization [23,24] and projection concentration maximization [4]). These frameless approaches generally yield more accurate flow estimates than pseudo-frame methods, albeit at higher computational cost. As a result, existing EBIV methods are generally capable of achieving a satisfactory DVR for practical flow measurements.
Across both algorithmic families, one can find window-based estimators as well as dense per-pixel methods, each exhibiting a distinct dynamic spatial range. We argue that these differences primarily stem from how the high-dimensional flow field is represented. Window-based estimators rely on a locally uniform-motion model, in which a single velocity vector describes each interrogation window; this suppresses intra-window variations and restricts the achievable DSR [1,5]. Dense per-pixel methods instead assign a velocity vector to every pixel [25], offering high spatial resolution but suffering from ill-posedness, most notably the aperture problem, which makes them sensitive to noise and heavily dependent on regularization. These limitations suggest that a more expressive yet compact parameterization may be essential for next-generation velocimetry algorithms. Similar challenges related to high-precision localization and robustness under severe speckle noise have also been widely investigated in optical imaging and sensing, motivating the development of compact representations and learning-based mitigation strategies [26,27]. Indeed, models with richer degrees of freedom, such as rotational [28] or affine motion representations [29], outperform the uniform-motion assumption, while reduced-order models demonstrate clear advantages over dense per-pixel formulations [30,31]. Building on this trend, representing the velocity field as a continuous coordinate-based mapping is an idea with a long history [32], but it has recently been revitalized by the emergence of neural radiance fields (NeRFs) [33] and more general implicit neural representations [34,35,36,37]. These implicit neural flow models (Figure 1) offer a highly flexible and compact way to represent complex velocity fields. Recent advances in PIV have demonstrated that implicit neural representations (INRs) can outperform traditional cross-correlation and optical-flow methods by enabling continuous velocity parameterization, consistent regularization, and effective incorporation of physical constraints [36,38]. These successes naturally motivate extending INR-based formulations to event-based imaging velocimetry, making them a promising, and largely unexplored, direction for EBIV.
Motivated by the success of INRs, we propose INR-VG to achieve a higher dynamic spatial range. Specifically, INR-VG integrates an INR with voxel-grid-based event encoding for event-based imaging velocimetry. In INR-VG, the flow field is modeled as a continuous function in space, enabling expressive and resolution-independent velocity estimation. INR-VG conceptually advances beyond feed-forward optical flow models because it does not require large-scale pre-training, performs test-time optimization directly on each event sequence, and provides high-resolution continuous outputs that can yield gradient fields such as vorticity for downstream analysis. The main contributions of this work are described below.
The INR is introduced for event-based imaging velocimetry, enabling a continuous and expressive representation of the underlying flow fields.
We investigate how INR parameterization and event recording conditions affect measurement accuracy, providing insights for EBIV system design and configuration.
Extensive synthetic and real-world experiments demonstrate the superior accuracy, spatial resolution, and robustness of our INR-VG over existing EBIV approaches.
The remainder of this paper is organized as follows.
Section 2 details the proposed INR-VG method.
Section 3 presents experimental evaluations on synthetic and real datasets, demonstrating the method’s performance across varying measurement conditions. Finally,
Section 4 concludes the paper with a discussion of limitations and future research directions.
2. Methodology
Our INR-VG aims to estimate a dense, continuous flow field directly from sparse, asynchronous event streams. We assume that the underlying flow can be well represented by a continuous function [36], and that sufficient tracer particles (of appropriate size) move through the flow to generate enough events for accurate velocity field reconstruction. As illustrated in Figure 2, the overall pipeline consists of three key components. First, the asynchronous events are converted into a time-sliced voxel grid, providing a structured representation of the event data. Second, the underlying flow field is modeled with an implicit neural representation, which encodes the velocity field as a continuous, differentiable function over spatial coordinates, enabling dense, high-resolution predictions at arbitrary locations. Third, a voxel consistency objective is formulated to enforce agreement between the predicted flow and the observed event distributions. This objective jointly constrains temporal alignment and spatial continuity, and the entire process is optimized end-to-end via gradient-based methods. Implementation details are provided at the end of this section.
2.1. Voxel Grid Construction for Events
Event cameras output asynchronous events in the form of $e_k = (\mathbf{x}_k, t_k, p_k)$, where $\mathbf{x}_k = (x_k, y_k)$ denotes the pixel location, $t_k$ is the timestamp, and $p_k \in \{-1, +1\}$ represents the polarity of the brightness change [1]. Due to their asynchronous and sparse nature, these events do not form conventional image frames, making them incompatible with standard synchronous processing techniques commonly used in frame-based vision. To address this, several approaches have been proposed to encode events into synchronous, structured representations by aggregating events over small spatial and temporal intervals. Examples include the time surface [39], which captures the most recent event timestamp at each pixel; the voxel grid [40], which discretizes the observation window into uniform temporal slices and accumulates events within each voxel; and the event spike tensor (EST) [41], which stacks events along both temporal and polarity dimensions to form a multi-channel tensor. Among these, we adopt the voxel grid representation due to its simplicity and effectiveness:
$$V(\mathbf{x}, c) = \sum_{e_k \in \mathcal{E}_c} p_k \, k_s(\mathbf{x} - \mathbf{x}_k) \, k_t\!\left(c - t_k^*\right), \qquad (1)$$

where $\mathcal{E}_c$ denotes the set of events within the $c$-th temporal slice, $t_k^*$ is the event timestamp normalized to the temporal-slice axis, and $k_s$ and $k_t$ are interpolation kernels for distributing events onto the spatial–temporal grid. By partitioning the observation window into uniformly spaced time slices and accumulating events in each slice, each voxel encodes the local spatiotemporal distribution of events. Stacking these slices along the temporal axis produces a structured representation, while enabling direct warping via the predicted velocity field between different time slices.
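As a concrete illustration, the accumulation above can be sketched in a few lines of NumPy. The function name `events_to_voxel_grid`, the linear (triangular) temporal kernel, and the nearest-neighbour spatial kernel are illustrative choices; the paper's exact interpolation kernels may differ.

```python
import numpy as np

def events_to_voxel_grid(xs, ys, ts, ps, num_slices, height, width):
    """Accumulate polarity-weighted events into a (num_slices, height, width)
    voxel grid, with linear interpolation along the temporal axis and a
    nearest-neighbour kernel in space (an illustrative sketch)."""
    voxel = np.zeros((num_slices, height, width), dtype=np.float64)
    # Normalize timestamps onto the continuous slice axis [0, num_slices - 1].
    t0, t1 = ts.min(), ts.max()
    tn = (ts - t0) / max(t1 - t0, 1e-9) * (num_slices - 1)
    lo = np.floor(tn).astype(int)
    hi = np.minimum(lo + 1, num_slices - 1)
    w_hi = tn - lo           # triangular kernel weights split between the
    w_lo = 1.0 - w_hi        # two neighbouring temporal slices
    # Unbuffered accumulation so repeated pixel indices add up correctly.
    np.add.at(voxel, (lo, ys, xs), ps * w_lo)
    np.add.at(voxel, (hi, ys, xs), ps * w_hi)
    return voxel
```

Because the two temporal weights sum to one, the total signed event count is preserved across the grid, which keeps the representation consistent regardless of the number of slices.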
2.2. Implicit Neural Representation of the Flow Field
We represent the continuous flow field using an implicit neural representation, which maps 2D spatial coordinates $\mathbf{x} = (x, y)$ to flow velocities $\mathbf{u} = (u, v)$. To enhance expressiveness, each coordinate is mapped into a higher-dimensional space using Fourier feature encoding [38]:

$$\gamma(\mathbf{x}) = \left[ \cos(2\pi \mathbf{B}\mathbf{x}), \; \sin(2\pi \mathbf{B}\mathbf{x}) \right]^{\mathsf{T}}, \qquad (2)$$

where $\mathbf{B} \in \mathbb{R}^{m/2 \times 2}$ is a random Gaussian matrix whose elements are sampled from a zero-mean normal distribution governed by the variance parameter $\sigma$. As a result, the encoding $\gamma(\mathbf{x})$ maps spatial coordinates to an $m$-dimensional space suitable for modeling high-frequency flow variations. The variance parameter $\sigma$ controls the frequency spectrum of the encoding, with larger values emphasizing low-frequency patterns and smaller values capturing high-frequency variations.
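A minimal sketch of the Fourier feature encoding is given below. The convention of dividing the random matrix by $\sigma$ (so that larger $\sigma$ yields lower encoded frequencies, matching the behaviour described above) is an assumption on our part; other implementations multiply by $\sigma$ instead.

```python
import numpy as np

def fourier_features(coords, B):
    """gamma(x) = [cos(2*pi*B x), sin(2*pi*B x)]: map (N, 2) coordinates to an
    (N, m) encoding, where m = 2 * B.shape[0] random frequencies are used."""
    proj = 2.0 * np.pi * coords @ B.T
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=1)

rng = np.random.default_rng(0)
sigma = 3.0  # assumed convention: larger sigma -> lower encoded frequencies
B = rng.standard_normal((64, 2)) / sigma
enc = fourier_features(rng.random((5, 2)), B)  # encode five random points
```

Since $\cos^2 + \sin^2 = 1$ for each frequency row, every encoded point has the same norm, which helps keep the subsequent MLP optimization well conditioned.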
The flow field $\mathbf{u}_\theta(\mathbf{x})$ is parameterized by a fully connected multilayer perceptron (MLP) with three hidden layers, which provides sufficient expressive capacity while maintaining stable optimization and computational efficiency for test-time optimization. The network takes the $m$-dimensional encoded coordinate as input, processes it through the three hidden layers, and outputs a two-dimensional velocity vector in the final layer. All hidden layers use the Gaussian Error Linear Unit (GELU) activation function, providing smooth nonlinear mappings that facilitate the representation of continuous velocity fields and their spatial derivatives [36,42]. We note that other commonly used activation functions (e.g., ReLU or SiLU) are also applicable in our framework and lead to comparable performance. To this end, the parameterized velocity field is implemented as $\mathbf{u}_\theta(\mathbf{x}) = \mathrm{MLP}_\theta(\gamma(\mathbf{x}))$, where $\theta$ denotes the tunable parameters of the MLP.
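The coordinate-to-velocity MLP can be sketched as follows. The hidden widths (64 each) and the He-style initialization are placeholders, and the tanh approximation of GELU stands in for the exact formulation.

```python
import numpy as np

def gelu(x):
    """Tanh approximation of the Gaussian Error Linear Unit."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def init_mlp(dims, rng):
    """He-style random initialization for consecutive layer widths in dims."""
    return [(rng.standard_normal((din, dout)) * np.sqrt(2.0 / din), np.zeros(dout))
            for din, dout in zip(dims[:-1], dims[1:])]

def mlp_velocity(encoded, params):
    """Forward pass: (N, m) encoded coordinates -> (N, 2) velocities.
    GELU on the hidden layers, linear output layer."""
    h = encoded
    for W, b in params[:-1]:
        h = gelu(h @ W + b)
    W, b = params[-1]
    return h @ W + b

rng = np.random.default_rng(0)
params = init_mlp([128, 64, 64, 64, 2], rng)  # three hidden layers (widths assumed)
velocities = mlp_velocity(rng.standard_normal((10, 128)), params)
```

In the actual method the forward pass would be written in PyTorch so that gradients with respect to both $\theta$ and the input coordinates are available through automatic differentiation.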
The velocity field is thus represented as a continuous function $\mathbf{u}_\theta(\mathbf{x})$. This design choice is particularly well suited for event-based imaging velocimetry. First, the continuous coordinate-to-velocity mapping enables flow estimation at arbitrary spatial resolutions. Velocities can be queried on any desired grid without explicit interpolation or super-resolution procedures, thereby decoupling the output resolution from the discretization of the event voxel grid. This property is essential for achieving a high dynamic spatial range under varying measurement conditions. Second, the differentiability of $\mathbf{u}_\theta$ allows direct evaluation of spatial derivatives such as the velocity gradient $\nabla \mathbf{u}_\theta$ (e.g., via automatic differentiation), which are required for downstream physical analysis including vorticity and strain-rate estimation. Compared with finite-difference schemes applied to discrete velocity fields, this formulation avoids numerical artifacts and improves the accuracy of derivative-based quantities. Overall, the continuous representation provided by the INR does not simply yield another event-based optical flow method; it reformulates EBIV as a continuous inverse problem rather than a discrete motion estimation task.
2.3. Voxel Consistency Objective and Training
The neural network for the INR is trained by enforcing voxel consistency across consecutive event slices. Specifically, given two temporally adjacent voxel grids $V_1$ and $V_2$, the flow field $\mathbf{u}_\theta$ predicts a velocity vector for each spatial coordinate, and the data term measures the "brightness" misalignment after symmetric bidirectional warping [43,44],

$$\mathcal{L}_{\mathrm{data}} = \int_\Omega \left( \hat{V}_1(\mathbf{x}) - \hat{V}_2(\mathbf{x}) \right)^2 d\mathbf{x}, \qquad (3)$$

where $\hat{V}_1(\mathbf{x}) = V_1\!\left(\mathbf{x} - \tfrac{\Delta t}{2}\,\mathbf{u}_\theta(\mathbf{x})\right)$ and $\hat{V}_2(\mathbf{x}) = V_2\!\left(\mathbf{x} + \tfrac{\Delta t}{2}\,\mathbf{u}_\theta(\mathbf{x})\right)$ denote the spatially warped voxel grids, and $\Delta t$ represents the temporal interval between the two voxel grids. The warping operation encourages the two voxel grids to be consistent under the estimated velocity field. In practice, since event data are only available on discrete voxel grids, the integral in Equation (3) is approximated by a finite summation over voxel coordinates:

$$\mathcal{L}_{\mathrm{data}} \approx \frac{1}{|\Omega_V|} \sum_{\mathbf{x} \in \Omega_V} \left( \hat{V}_1(\mathbf{x}) - \hat{V}_2(\mathbf{x}) \right)^2, \qquad (4)$$

where $\Omega_V$ denotes the set of spatial coordinates of the voxel grid. This discrete formulation corresponds to a Riemann-sum approximation of the integral.
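The Riemann-sum data term can be sketched as below: both voxel slices are warped toward the common mid-time by half the displacement each, and the squared mismatch is averaged over the grid. The bilinear sampler and the sign convention of the warps are illustrative assumptions.

```python
import numpy as np

def bilinear_sample(img, xs, ys):
    """Sample a 2-D array at real-valued coordinates with bilinear
    interpolation (coordinates are clamped to the valid interior)."""
    H, W = img.shape
    x0 = np.clip(np.floor(xs).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(ys).astype(int), 0, H - 2)
    dx = np.clip(xs - x0, 0.0, 1.0)
    dy = np.clip(ys - y0, 0.0, 1.0)
    return ((1 - dx) * (1 - dy) * img[y0, x0] + dx * (1 - dy) * img[y0, x0 + 1]
            + (1 - dx) * dy * img[y0 + 1, x0] + dx * dy * img[y0 + 1, x0 + 1])

def data_term(V1, V2, u, v, dt=1.0):
    """Discrete voxel-consistency data term: warp both slices toward the
    common mid-time by half the displacement each, then average the
    squared mismatch over all grid coordinates."""
    H, W = V1.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    warped1 = bilinear_sample(V1, xs - 0.5 * dt * u, ys - 0.5 * dt * v)
    warped2 = bilinear_sample(V2, xs + 0.5 * dt * u, ys + 0.5 * dt * v)
    return np.mean((warped1 - warped2) ** 2)
```

For a pattern translated by the true displacement, the two warped slices coincide and the loss vanishes; any flow error increases the mismatch, which is the signal minimized during test-time optimization.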
To regularize the flow, we penalize spatial variations using a smoothness term:

$$\mathcal{L}_{\mathrm{smooth}} = \int_\Omega \left\| \nabla \mathbf{u}_\theta(\mathbf{x}) \right\|_F^2 \, d\mathbf{x}. \qquad (5)$$

Here, the gradient $\nabla \mathbf{u}_\theta$ is obtained directly from the differentiable representation $\mathbf{u}_\theta$ via automatic differentiation. Unlike the data term, the regularization is defined on the continuous velocity field and does not require voxelized measurements. Therefore, the integral in Equation (5) is approximated by uniformly sampling spatial coordinates $\mathbf{x}_i$ over the image domain,

$$\mathcal{L}_{\mathrm{smooth}} \approx \frac{1}{N} \sum_{i=1}^{N} \left\| \nabla \mathbf{u}_\theta(\mathbf{x}_i) \right\|_F^2. \qquad (6)$$
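The sampling-based approximation of the smoothness term can be illustrated with a toy affine flow model, for which the Jacobian is constant and the integral has the closed form $\|A\|_F^2$. In the actual method the Jacobian of the MLP would come from automatic differentiation; all names here are illustrative.

```python
import numpy as np

# Toy affine "flow model" u(x) = A x + b with constant Jacobian A. Over the
# unit square the smoothness integral equals ||A||_F^2 exactly, so the
# Monte-Carlo estimate can be checked in closed form.
rng = np.random.default_rng(0)
A = np.array([[0.5, -1.0],
              [2.0, 0.25]])

def flow_jacobian(x):
    """Analytic Jacobian of the affine model (placeholder for autodiff)."""
    return A

samples = rng.random((4096, 2))  # uniform coordinates in [0, 1]^2
l_smooth = np.mean([np.sum(flow_jacobian(x) ** 2) for x in samples])
```

Because the regularizer is evaluated at freely chosen coordinates rather than on the voxel grid, the sampling density can be adjusted independently of the event resolution.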
The total objective combines these two terms:

$$\mathcal{L} = \mathcal{L}_{\mathrm{data}} + \lambda \, \mathcal{L}_{\mathrm{smooth}}, \qquad (7)$$

where the smoothness weight $\lambda$ balances alignment accuracy and flow smoothness. The optimal parameters $\theta^*$ are then obtained by minimizing the total objective, i.e., $\theta^* = \arg\min_\theta \mathcal{L}(\theta)$. This formulation enforces both voxel consistency and spatial continuity while remaining fully differentiable for gradient-based optimization. Hence, the MLP is trained using the Adam optimizer for 1000 epochs, with the learning rate scheduled by cosine annealing over the entire training, smoothly decaying from its initial value to zero [45]. This training strategy is empirically verified to provide stable and satisfactory convergence in our preliminary experiments.
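The cosine-annealed schedule decays the learning rate from its initial value to zero over the run. A minimal sketch follows, with an assumed initial rate of 1e-3 for illustration (the paper's exact value is not reproduced here).

```python
import numpy as np

def cosine_annealed_lr(epoch, total_epochs, lr0):
    """Cosine annealing: smoothly decay from lr0 at epoch 0 to zero at the end."""
    return 0.5 * lr0 * (1.0 + np.cos(np.pi * epoch / total_epochs))

# Assumed initial learning rate of 1e-3, annealed over 1000 epochs.
schedule = [cosine_annealed_lr(e, 1000, 1e-3) for e in range(1001)]
```

In PyTorch the same schedule is available as `torch.optim.lr_scheduler.CosineAnnealingLR` wrapped around the Adam optimizer.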
2.4. Implementation Details
Based on the parameter study in Section 3, the default settings are adopted throughout this work: the encoding dimension $m$ and the variance parameter $\sigma$ are set to the values identified in that study, and the smoothness weight $\lambda$ is fixed to 10.0. Taking the two additional derivative networks and the gradient storage required by test-time optimization into consideration, the overall memory consumption is approximately 1017 MB for a 256 × 256 input and 15.12 GB for a 1280 × 720 input. Similarly, the total computational cost scales linearly with the number of optimization epochs, reaching roughly 53.5 GFLOPs per training epoch for the larger input. All experiments are implemented in PyTorch (v1.13.1+cu117) and executed on an NVIDIA GeForce RTX 3090 GPU. For reproducibility, the complete implementation, along with all experimental configurations and scripts, is publicly available at https://github.com/yongleex/INR-VG (accessed on 3 February 2026). This enables straightforward testing of INR-VG on diverse real-world EBIV data under different experimental conditions. With these settings, the learning process takes approximately 5 s for synthetic event data at a spatial resolution of 256 × 256 px², and around 1 min for data at a resolution of 1280 × 720 px². These runtimes reflect a test-time optimization process and are not intended for real-time deployment, but rather for accurate and robust flow measurement under challenging event conditions.
3. Experiments
To comprehensively evaluate the proposed INR-VG method, both synthetic and real-world particle event data are used. Due to the difficulty of obtaining reliable ground-truth velocity fields in real EBIV experiments, extensive synthetic datasets are employed to systematically analyze accuracy, robustness, and parameter sensitivity under controlled conditions. Specifically, the synthetic test event streams (Figure 3) are generated for arbitrary flow fields by first creating particle image sequences using a particle image generator [5,19], which are subsequently converted into event streams through an event simulator [46,47]. Meanwhile, the real events (Figure 4) are recorded using two independent EBIV setups, including a solid-body rotation flow and a water-tank flow [1]. Furthermore, the proposed INR-VG framework is benchmarked against five representative baselines formed by combining two event representations (EST and voxel grid) with three estimation modules, namely iterative Lucas–Kanade (ILK) [48], Farnebäck optical flow (OF) [49], and INR [36]; the sixth combination, INR with the voxel grid, is the proposed INR-VG. Finally, consistent with the evaluation criteria commonly adopted in PIV [5,19,50], the root mean square error (RMSE) and the average endpoint error (AEE) are used to quantify the performance [51,52].
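For reference, the two metrics can be computed as follows. The component-wise RMSE convention shown here is one common choice, and the paper's exact averaging may differ.

```python
import numpy as np

def rmse(u_est, u_gt):
    """Root mean square error over all velocity components (one common convention)."""
    return np.sqrt(np.mean((u_est - u_gt) ** 2))

def aee(u_est, u_gt):
    """Average endpoint error: mean Euclidean norm of the per-pixel flow
    residual. Both inputs have shape (H, W, 2) for the (u, v) components."""
    return np.mean(np.linalg.norm(u_est - u_gt, axis=-1))

# Tiny usage example with a constant residual of (3, 4) px per vector.
gt = np.zeros((4, 4, 2))
est = np.zeros((4, 4, 2))
est[..., 0], est[..., 1] = 3.0, 4.0
```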
Based on the experimental configuration described above, the influence of key hyperparameters in INR-VG, including the variance parameter $\sigma$, the network width, and the smoothness weight $\lambda$, is first investigated. The effects of imaging conditions, such as particle density, particle size, and noise ratio, are also analyzed. Next, visualized comparisons against the baselines are performed using synthetic events, followed by statistical evaluations across 3000 synthetic event datasets with three different underlying flow fields [53]. Finally, our INR-VG is further validated on real-world EBIV scenarios, demonstrating its ability to generate dense and practically usable velocity fields.
3.1. On the Hyper-Parameters of INR-VG
Figure 5 presents the measurement errors under different combinations of the encoding dimension $m$ and the variance parameter $\sigma$ for the three test cases (Figure 3). For a fixed $\sigma$, the error generally decreases as $m$ increases, particularly when $m$ is small, indicating that insufficient encoding capacity leads to underfitting of complex and multi-scale flow structures. Once $m$ is sufficiently large, further increasing it yields marginal improvement, suggesting that the capacity of the MLP is already sufficient to represent the complexity of the underlying flow field. In contrast, the influence of $\sigma$ is different: the error first decreases and then increases with $\sigma$, reaching its minimum within an intermediate range. A small $\sigma$ results in an encoding dominated by high-frequency components, which limits the ability to capture the low-frequency content of the spatial velocity field, while an excessively large $\sigma$ produces an over-smoothed encoding and fails to robustly capture complex high-frequency flow structures [36,38]. Based on these observations and prior reports, these intermediate values of $m$ and $\sigma$ are adopted in the subsequent experiments.
Figure 6a analyzes the influence of the smoothness weight $\lambda$ on measurement accuracy using the same three event streams (Figure 3). When $\lambda$ is small, the measurement errors are nearly unchanged; as $\lambda$ further increases, the error tends to rise. As expected, INR-VG remains effective even when $\lambda$ is small or set to zero, since the INR implicitly enforces smoothness (smooth predictions with respect to spatial coordinates) [36]. However, an excessively large $\lambda$ leads to over-smoothing, resulting in degraded accuracy. Additionally, the rotational flow consistently yields lower errors than the cellular flow, which in turn outperforms the sinusoidal flow, indicating that the measurement accuracy is also closely related to the characteristics of the underlying flow. Overall, the method is relatively robust to variations of $\lambda$, and we adopt $\lambda = 10.0$ as the default setting given the unavoidable noise present in practical measurements.
Figure 6b further investigates the convergence behavior of INR-VG by varying the number of training epochs. Both RMSE and AEE decrease consistently as the number of epochs increases, demonstrating stable convergence of the test-time optimization process. Notably, the error curves become nearly flat around 1000 epochs, suggesting that the optimization has essentially converged. While additional training beyond this point can still yield minor accuracy improvements, the reduction is marginal compared to the increased computational cost. Therefore, 1000 training epochs are adopted in this work, as they provide a practical speed–accuracy trade-off, delivering near-converged accuracy while avoiding excessive computational overhead.
3.2. Effect of Imaging Conditions
Event data serve as the direct inputs to the EBIV algorithm, and their quality, and thus the achievable accuracy, is strongly influenced by the underlying imaging conditions. To examine this effect, we synthesize particle event data under different recording settings, including the particle density, the particle diameter, and the noise ratio R. This analysis provides insights into how imaging conditions affect INR-VG performance and offers practical guidance for constructing high-accuracy measurement conditions.
Figure 7a shows the effect of particle seeding density, measured in particles per pixel (ppp). As the density increases, all three flow fields exhibit a consistent trend in which the error first decreases and then increases. Low densities provide too few events to reliably record motion cues, whereas excessively high densities introduce particle overlap and matching ambiguity, both of which degrade estimation accuracy. Among the flow fields, the sinusoidal and cellular flows show stronger sensitivity to density variation, while the simple rotational flow remains relatively stable due to its spatially smooth velocity field. Overall, the minimum errors are consistently observed at an intermediate particle density across all tested flow cases, in good agreement with conventional PIV recommendations [5].
Figure 7b presents the influence of particle diameter. Overall, measurement errors tend to increase as the particle diameter grows. A clear error jump occurs around a diameter of 1.0 px across all flow scenarios, with errors significantly lower below 1.0 px than above it. The minimum errors are achieved at a diameter of ∼0.5 px. This behavior differs markedly from conventional PIV, where the optimal particle diameter is typically around 2.2 px [5]. We argue that this difference arises from the characteristics of event cameras: for a particle of 0.5 px diameter, there is approximately a 50% chance that two adjacent pixels will fire events, enabling sufficient motion capture. These observations suggest that, in EBIV, smaller tracer particles than those commonly used in PIV are recommended to achieve higher measurement accuracy.
Figure 7c illustrates the effect of noise. As expected, errors increase gradually as the noise ratio rises from 0% to 20%. Nonetheless, across all three flow fields, the error growth remains modest even at the highest noise level, implying that INR-VG could be particularly advantageous for practical EBIV measurements in the presence of event noise.
These results establish practical EBIV operating guidelines, with the best performance obtained at intermediate particle densities and sub-pixel particle diameters. Outside these ranges, accuracy degrades smoothly rather than failing abruptly, indicating a broad and robust operating range.
3.3. Comparison on Synthetic Events
Figure 8 shows the measured velocity vector fields with endpoint error (EPE) backgrounds for the three synthetic event streams (Figure 3). The errors primarily occur in flow regions with high streamline curvature or low velocity magnitude. Specifically, the high-curvature regions of the sinusoidal flow conflict with the locally uniform-motion assumption of window-based estimators, while low-velocity regions, such as the center of the rotational flow, generate too few events to reliably capture motion. The results demonstrate that the INR representation can accommodate high-curvature flows while effectively "interpolating" in regions with sparse data. Across all three synthetic cases, the proposed INR-EST and INR-VG methods produce substantially smaller errors than the classical optical-flow baselines. Between the two event pseudo-frames (EST and VG), the EPE maps are very similar, with no significant differences.
Table 1 summarizes the quantitative RMSE and AEE results corresponding to
Figure 8. Consistent with visual observations, INR-VG achieves the lowest errors across all flow fields, slightly outperforming INR-EST, which demonstrates the clear advantage of our INR-VG in EBIV measurement.
To systematically evaluate the performance of INR-based methods on large-scale event data, we conduct extensive statistical analyses using synthetic event streams generated from a publicly available dense flow dataset [53]. For each flow category, 1000 event streams are synthesized, enabling statistically meaningful comparisons across different methods. Specifically, three representative flow fields with increasing complexity are considered: uniform flow, a backward-facing step (backstep) flow, and DNS turbulence. This diversity in flow complexity allows a clear evaluation of the effectiveness of INR-based methods under increasingly challenging conditions.
Figure 9 presents the box plots of RMSE and AEE from the statistical analysis. Across all three categories, the INR-based methods consistently achieve the lowest errors among all compared approaches. The average error is smallest for the uniform flow, increases for the backstep flow, and rises further on the most challenging DNS turbulence. This progressive error increase is directly related to the differences in flow characteristics. Specifically, the backstep flow contains large low-velocity regions where only a limited number of events are generated, leading to increased measurement difficulty, as also evidenced by the numerous outliers observed in the ILK-based methods. In contrast, the DNS turbulence features abundant small-scale structures, which cannot be fully captured by particles with finite seeding density, thereby limiting the achievable accuracy. Nevertheless, compared with all baseline methods, the INR-based approaches consistently exhibit statistically significant accuracy advantages and improved robustness across all tested conditions. Overall, these results also reflect the intrinsic challenges of EBIV measurements in flows characterized by large regions of extremely low-velocity motion or fine-scale flow structures, both of which inherently limit particle event generation and accurate velocity measurement. In such cases, the estimation problem becomes fundamentally under-constrained due to insufficient event support, rather than limited by a specific reconstruction algorithm. Consequently, improving performance in these regimes primarily depends on increasing particle seeding density or event generation within practical experimental limits, which lies beyond the scope of algorithmic design.
3.4. Evaluation on Real Event Data
Figure 10 presents the measurement results on two real-world event recordings (Figure 4). For the real rotational flow, the ILK-based and INR-based methods show good agreement with the ground-truth solid-body rotation, whereas the optical-flow-based methods fail to recover the flow structure near the center and boundary regions. A closer inspection reveals that ILK-EST and ILK-VG exhibit a few outliers in the central region, visible as bright spots in the background magnitude map. Quantitative results in Table 2 further confirm that the INR-based variants achieve the lowest RMSE and AEE, indicating superior accuracy and robustness. For the turbulent water-tank flow, the limitations of the baseline methods become more apparent. ILK-EST and ILK-VG produce more clustered outliers, OF-EST and OF-VG fail to recover meaningful flow structures, and INR-EST shows noticeable inconsistencies in the lower region of the flow field. This behavior reveals an easily detectable failure mode of INR-based methods: when it occurs, the voxel grids are mismatched over a large area rather than producing isolated or clustered velocity outliers. In contrast, only INR-VG yields results that are consistent with the observed event patterns (
Figure 4). Overall, these results show that INR-VG offers the most robust and reliable performance on real-world event data, benefiting from its continuous flow representation and effective noise suppression via voxel-grid encoding. Moreover, the continuous formulation enables direct access to spatial velocity gradients (e.g., vorticity and strain-rate tensors) without additional post-processing, as illustrated in [36].
4. Conclusions
This work presents INR-VG, a novel event-based imaging velocimetry algorithm that leverages implicit neural representations to model the latent flow field directly in continuous coordinate space, enabling dense velocity measurements from sparse event data. Specifically, spatial coordinates are encoded using Fourier feature embedding and mapped to flow velocities via a multilayer perceptron, with network parameters optimized through test-time optimization by minimizing a voxel-grid-based event alignment loss. Parameter experiments indicate that the adopted configuration of the encoding dimension $m$, the variance parameter $\sigma$, and the smoothness weight $\lambda$ provides an effective setting for the proposed INR-VG. In addition, imaging condition experiments suggest that a moderate particle concentration and a particle diameter of around 0.5 px are recommended for achieving high EBIV measurement accuracy. Extensive tests on synthetic event cases and the real rotational flow show that INR-VG achieves competitive measurement accuracy with consistently low errors. More importantly, INR-VG delivers reliable and robust velocity fields on two challenging real-world measurements, whereas the competing methods fail to do so. Moving forward, extending INR-based models to the temporal dimension, such as time-resolved EBIV, may further enhance their applicability to dynamic fluid flows. From a computational perspective, exploring acceleration strategies, including multi-resolution or coarse-to-fine training schemes, could substantially improve efficiency while preserving reconstruction accuracy. In addition, incorporating uncertainty estimation mechanisms, for example through ensemble-based inference or test-time perturbation strategies, would enable quantitative confidence assessment of the reconstructed flow fields, thereby further supporting robust and reliable measurements in scientific and engineering applications.