1. Introduction
In the design phase of infrastructure projects involving riverine systems, such as marine energy farms, power plants, and bridge construction, predicting turbulence statistics of the river flow is essential. Computational models such as Large-Eddy Simulation (LES) provide high-fidelity flow physics; however, they incur considerable computational cost, limiting rapid and frequent exploration of operating scenarios, including changes in inflow Boundary Conditions (BC), tidal array configurations, and environmental factors. The engineering quantities of interest for turbulent river flows are generally time-averaged statistics; obtaining them via LES often requires lengthy integrations and many instantaneous realizations, leading to substantial computational and storage overhead.
To address the computational cost of high-fidelity modeling of natural rivers, we aim to implement a machine-learning (ML) model that enables three-dimensional (3D), field-scale river-flow modeling at a fraction of the cost of LES. To this end, we have developed a Convolutional Neural Network (CNN) surrogate model that maps a finite set of instantaneous LES snapshots to the corresponding time-averaged flow field, i.e., the first- and second-order turbulence statistics, which constitute the mean flow velocity field and the Reynolds stress tensor, respectively. The framework is purely data-driven: it learns the spatial relationship between instantaneous LES fields and the corresponding time-averaged target fields. As a test case, we selected the Piscataqua River (see Figure 1), the main river flowing through Portsmouth, New Hampshire, which marks the border between New Hampshire and Maine before discharging into the Gulf of Maine at Portsmouth Harbor.
Conventionally, turbulent river flow modeling relies primarily on Reynolds-Averaged Navier–Stokes (RANS) models and, more recently, the LES method, both of which are computationally expensive, especially when dealing with the highly complex geometry of natural rivers at high Reynolds numbers [
1,
2]. The prohibitive computational cost of these methods, especially as the complexity of flow dynamics and riverine geometry increases, has motivated the development of Reduced-Order Modeling (ROM) strategies, which are grounded in data-driven approaches. Since 2010, CNN and autoencoder-based architectures have been employed to learn compact latent representations of high-dimensional flow fields and to reconstruct turbulent velocity fields [
3,
4]. Studies show that machine-learned inflow generators based on autoencoder-type CNNs can reproduce the spatio-temporal evolution of turbulent channel flows and maintain fully developed turbulence at a fraction of the cost of direct simulations [
5,
6]. Recent convolutional autoencoders combined with convolutional LSTM networks provide efficient models for 3D turbulence and pass stringent physics-based statistical tests [
4,
7]. Moreover, convolutional autoencoders and LSTM-based ROMs have been shown to outperform Proper Orthogonal Decomposition (POD) methods and have become a common choice for non-intrusive unsteady flow modeling [
8]. These models confirm the ability to reconstruct turbulent velocity fields and develop coherent structures through learned latent dynamics.
More recently, transformer-based architectures have extended this paradigm by directly learning mappings between function spaces or long-range spatio-temporal dependencies, thus improving stability and long-horizon predictive capacity in turbulent regimes [
1,
9]. Despite these advances, the majority of studies have been validated primarily on canonical configurations, such as low-Reynolds-number turbulent channel flows with geometrically simple domains and periodic or otherwise idealized BCs [
4,
5]. Similarly, foundational studies in ML-assisted turbulence modeling have focused on controlled settings that isolate specific modeling challenges [
6,
10,
11]. While such configurations are essential for methodological development, they generally underrepresent the geometric irregularity, boundary heterogeneity, and multiscale hydrodynamic interactions characteristic of natural riverine systems. Therefore, the demonstrated success of ML-based surrogates in canonical flows does not automatically translate to predictive reliability in environmentally complex domains. These early studies should instead be viewed as methodological stepping stones that establish algorithmic feasibility while leaving open the question of robustness under realistic environmental complexities.
Past investigations of tidal flow dynamics within the Piscataqua River [
12,
13] have documented several salient features, including a pronounced tidal asymmetry that modulates river flow during ebb and flood events, nonlinear evolution of tidal waveforms, and substantial variability in current magnitude under changing sea-level conditions. These dynamics reflect strong nonlinear hydrodynamic interactions and intense sensitivity to boundary forcing.
Recently, complementary feasibility and site-selection studies [
14] have adopted a hybrid strategy that combines regional-scale hydrodynamic modeling with targeted LES to explain turbulence-dependent processes and evaluate sensitivity to turbulence content and other inflow conditions. While LES provides a physically consistent solution of energy-containing turbulent flow structures, its computational cost grows rapidly with Reynolds number and grid resolution [
15], limiting its practical use for systematic parametric or ensemble exploration of BCs [
16,
17]. This limitation is especially consequential in tidal river systems, where engineering assessments and sea-level rise projections require repeated simulations in a wide range of forcing scenarios to quantify uncertainty and system sensitivity [
18]. The surrogate ML framework presented here is motivated by this methodological bottleneck: approximating brute-force LES-resolved mean-flow statistics while reducing computational expense enough to enable tractable BC sensitivity analyses and engineering-scale ensemble studies of real-life waterways.
Attempts to extend ML-based surrogates to model realistic turbulent flows have mostly focused on atmospheric boundary layers and urban canopy configurations [
19,
20], while riverine environments remain comparatively underexplored. Unlike canonical boundary-layer flows, natural rivers exhibit pronounced spatial heterogeneity due to multiscale geometric constrictions, bathymetric variability, and flow-structure interactions. Flow accelerates near contractions, diverges around bridge piers, and generates complex wake flow structures and substantial gradients in second-order turbulence statistics [
21,
22,
23]. These quantities may vary by orders of magnitude between upstream, shear-layer, and wake regions. Such variability poses a fundamental challenge for surrogate ML modeling: without appropriate normalization, sampling strategies, or loss weighting, models may become biased toward energetically dominant regions or suffer from unstable gradient dynamics during the training process.
Another limitation concerns generalization across boundary conditions. Many ML surrogates are trained under a single inflow specification and evaluated under identical or weakly perturbed conditions. However, practical river engineering applications require robustness to significant variations in discharge, velocity profiles, and turbulence intensity. Cross-boundary evaluations remain scarce, and studies that explicitly test out-of-distribution conditions report significant performance degradation when training and testing distributions differ [
24,
25]. This limitation raises concerns regarding the deployment of surrogate ML models in real-world riverine hydrodynamics scenarios.
In parallel with advances in data-driven ML-based turbulence modeling, considerable effort has been devoted to embedding physical constraints into machine learning architectures. Physics-informed neural networks (PINNs) [
26,
27] and related loss-augmentation strategies enforce conservation laws directly within training objectives and have shown promise in hydrodynamic applications [
28,
29]. In the context of CNN-based river-flow prediction, Zhang et al. [
30] incorporated a divergence-free constraint into the loss function and reported only modest improvements in the predicted flow field compared with a purely data-driven mapping network.
Nonetheless, most demonstrations remain confined to lower-dimensional or idealized configurations, including simplified geometries or depth-averaged flow representations. Applications to fully three-dimensional, geometrically complex river domains remain scarce. More broadly, state-of-the-art ML surrogates have demonstrated strong performance in reproducing canonical turbulence datasets, including channel and isotropic flows [
31]. Yet significant challenges persist when extending these approaches to strongly heterogeneous natural systems. In particular, existing models struggle with (i) spatially variable turbulence intensities spanning multiple orders of magnitude, (ii) generalization across substantially different boundary conditions, and (iii) stable training when targeting second-order turbulence statistics. Recent efforts in hydrodynamic and energy-systems modeling have leveraged reduced-order ML surrogates to improve computational efficiency and cross-boundary robustness [
28,
32]. While effective for predicting bulk quantities such as water surface elevation, discharge, or power output, these models typically do not reconstruct fully resolved three-dimensional turbulent flow fields. Consequently, they do not explicitly capture wake–bathymetry interactions, fine-scale turbulence structure, or large spatial variability in Reynolds stresses characteristic of realistic riverine environments.
This study addresses these gaps by curating LES data from a real-life riverine system with realistic planform geometry and bathymetry, and by systematically evaluating preprocessing and training strategies to stabilize surrogate learning under strongly heterogeneous, three-dimensional hydrodynamics.
This paper is organized as follows. In Section 2, we describe the methodology, followed by the results and discussion in Section 3. Finally, in Section 4, we summarize the findings of the study.
2. Methodology
The high-fidelity data used to train the model were obtained from our in-house hydrodynamics model, the Virtual Flow Simulator (VFS-Geophysics), which employs a wall model to represent near-bed dynamics. The solver produced three-dimensional snapshots of the velocity components u, v, and w in the x, y, and z directions, respectively, from which the first- and second-order turbulence statistics are computed by time averaging. The first-order statistics, i.e., the mean flow velocity field, comprise the time-averaged velocity components $\bar{u}$, $\bar{v}$, and $\bar{w}$, while the second-order statistics, i.e., the Reynolds stresses, comprise the components $\overline{u_i' u_j'}$ of the Reynolds stress tensor. The dataset included both the instantaneous unsteady turbulence and the time-averaged fields (i.e., mean flow and turbulence intensities).
This richness in spatial and temporal characteristics, however, comes with its own challenges. Namely, the second-order statistics vary by nearly two orders of magnitude across the domain. For instance, near the inlet region, turbulence intensity could be moderate, whereas around wall-mounted infrastructures or near mixing layers within the river, Reynolds stresses could spike sharply. This heterogeneity proved to be a major obstacle to ML model training, as unscaled datasets led to unstable optimization and biased the loss function toward high-magnitude regions.
To mitigate the impact of this heterogeneity, we normalized the data. Then, to test the model’s generalization capability, we split the training and test data in two ways: first, into instantaneous and time-averaged fields within one BC, to assess model stability; and second, by BC, to test cross-boundary generalization across different flow regimes within the same river system. The rationale is that instantaneous fields capture unsteady turbulence, whereas time-averaged fields capture the steadier mean flow and turbulence intensities. Accordingly, we considered two main scenarios for our surrogate models: (i) a mean flow velocity predictor, which is trained with instantaneous velocity as input and time-averaged velocity as target for the river inflow (RF) boundary condition and tested with instantaneous velocity as input and time-averaged velocity magnitude as target for the straight channel and log-law profile inflow boundary conditions; and (ii) a TKE predictor, which is trained with instantaneous velocity as input and turbulence kinetic energy (TKE) as target for the RF boundary condition and tested with the same input and target quantities for the straight channel and log-law profile inflow boundary conditions. Further details on the boundary conditions and their implementation are provided in the subsequent sections. It is worth mentioning that we deliberately chose a similar BC assignment for the training and testing scenarios to keep the experiment as controlled as possible.
We then discretized the computational domain of our study area to computational nodes in the x, y, and z directions, respectively. The time step of the computations was set equal to 0.05 s to ensure that the Courant–Friedrichs–Lewy number stays less than 1.0 at all times. It should be noted that, given the computational resources at our disposal, this resolution was chosen to capture the turbulent structures of interest while remaining computationally feasible.
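To make the relation between instantaneous snapshots and the target statistics described at the start of this section concrete, the following minimal sketch computes the mean velocity, the Reynolds stresses, and the TKE at a single grid node from a short time series. The function name and the per-node, pure-Python layout are illustrative assumptions; the actual post-processing in the solver is not described here.

```python
from statistics import fmean


def turbulence_statistics(u, v, w):
    """Time-averaged statistics at one grid node (illustrative helper).

    u, v, w: equal-length sequences of instantaneous velocity samples.
    Returns (means, stresses, tke).
    """
    n = len(u)
    if n == 0 or not (n == len(v) == len(w)):
        raise ValueError("u, v, w must be non-empty and equally long")

    # First-order statistics: time-averaged velocity components.
    means = (fmean(u), fmean(v), fmean(w))

    # Fluctuations about the time average, e.g. u' = u - mean(u).
    fluct = [[x - m for x in comp] for comp, m in zip((u, v, w), means)]

    # Second-order statistics: the six independent Reynolds stresses.
    names = "uvw"
    stresses = {}
    for i in range(3):
        for j in range(i, 3):
            products = [a * b for a, b in zip(fluct[i], fluct[j])]
            stresses[names[i] + names[j]] = fmean(products)

    # Turbulence kinetic energy is half the trace of the stress tensor.
    tke = 0.5 * (stresses["uu"] + stresses["vv"] + stresses["ww"])
    return means, stresses, tke
```

In practice the same averaging would be applied at every node of the 3D grid, typically with an array library rather than Python lists.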
2.1. Hydrodynamic Model
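The hydrodynamics model solves the spatially filtered Navier–Stokes equations for incompressible flow in non-orthogonal generalized curvilinear coordinates. In compact tensor notation, with repeated indices implying summation, the equations are expressed as follows [33]:

$$J \frac{\partial U^j}{\partial \xi^j} = 0,$$

$$\frac{1}{J}\frac{\partial U^i}{\partial t} = \frac{\xi^i_l}{J}\left[ -\frac{\partial}{\partial \xi^j}\left(U^j u_l\right) + \frac{1}{\rho}\frac{\partial}{\partial \xi^j}\left(\mu \frac{g^{jk}}{J}\frac{\partial u_l}{\partial \xi^k}\right) - \frac{1}{\rho}\frac{\partial}{\partial \xi^j}\left(\frac{\xi^j_l\, p}{J}\right) - \frac{1}{\rho}\frac{\partial \tau_{lj}}{\partial \xi^j} \right],$$

where $J = \left|\partial(\xi^1, \xi^2, \xi^3)/\partial(x_1, x_2, x_3)\right|$ is the Jacobian of the geometric transformation from the Cartesian to the curvilinear coordinate system, $U^i = (\xi^i_m / J)\, u_m$ is the contravariant volume flux with $\xi^i_l = \partial \xi^i / \partial x_l$, $u_i$ is the $i$-th filtered velocity component in Cartesian coordinates, $\mu$ is the dynamic viscosity of the fluid (i.e., water), $g^{jk} = \xi^j_l \xi^k_l$ is the contravariant metric tensor, $\rho$ is the background density (i.e., water density), $p$ is the pressure, and $\tau_{ij}$ is the subgrid-scale stress tensor.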
The subgrid-scale stresses are modeled with the dynamic Smagorinsky model and are defined as [34,35]:

$$\tau_{ij} - \frac{1}{3}\tau_{kk}\delta_{ij} = -2 \mu_t \bar{S}_{ij}, \qquad \mu_t = \rho\, C_s \Delta^2 \left|\bar{S}\right|,$$

where $\mu_t$ is the eddy viscosity, $\bar{S}_{ij}$ is the filtered strain-rate tensor, $\delta_{ij}$ is the Kronecker delta, $C_s$ is the Smagorinsky constant, and $\left|\bar{S}\right| = \sqrt{2 \bar{S}_{ij} \bar{S}_{ij}}$. The filter size ($\Delta$) is defined as the cube root of the cell volume. More details about the hydrodynamic solver can be found elsewhere [
36,
37]. Finally, it should be noted that the LES results for river flow within the study area were validated against field data measured with an Acoustic Doppler Current Profiler (ADCP), as reported by others [
14].
Importantly, the arbitrarily complex geometry of the river bathymetry is resolved using an immersed boundary method [
38,
39]. The boundary conditions of the hydrodynamics computations of the study area consist of a no-slip condition for solid surfaces (e.g., river banks, bed, and surfaces of bridge piers). Since the computations lacked sufficient resolution to capture the viscous sublayer near the solid surfaces, we employed a wall model to account for the hydrodynamic effects of the wall on the flow. The free surface of the river is described using the rigid-lid assumption. A Neumann boundary condition is imposed at the outlet cross-plane, while the inlet cross-plane of the computational domain is prescribed using various inlet boundary conditions [
40], as follows:
Straight channel: This boundary condition is calculated by simulating the flow in a 20-m-long straight channel extension from the inlet cross-plane of the study area. The cross-plane of the background mesh along the length of the straight channel is identical to the study area’s inlet cross-plane. In a separate simulation, we used periodic boundary conditions in the streamwise direction to obtain fully developed turbulent flow in the channel. The instantaneous flow field from a cross-plane in the middle of this channel was saved and later fed into the study area as the ‘straight channel’ inflow boundary conditions.
River inflow: This boundary condition is obtained from a separate LES that modeled the flow field of the 600-m-long reach of the river located immediately upstream of the study area. The instantaneous flow field at the outlet cross plane of this precursor simulation was saved and then imposed at the inlet of the study area as the “river inflow” boundary conditions. Since this boundary condition is derived from site-specific river bathymetry upstream of the study site, the so-obtained inflow constitutes a more realistic inlet condition marked by strong vertical shears, lateral variations, and turbulence intensities that are relatively more consistent with those observed in the field.
Log-law profile: This boundary condition imposes a velocity profile, $u(z)$, at the inlet cross-plane that follows the classic law-of-the-wall form:

$$u(z) = \frac{u_*}{\kappa} \ln\left(\frac{z}{z_0}\right),$$

where $u_*$ is the friction velocity, $\kappa$ is the von Kármán constant [41], and $z_0$ is the roughness length.
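As an illustration of the log-law inflow, the sketch below evaluates a rough-wall logarithmic profile, u(z) = (u*/κ) ln(z/z0), at a given height above the bed. The value κ = 0.41 and the choice to return zero velocity at or below the roughness length are assumptions made for this example; the friction velocity and roughness length used in the simulations are not stated here.

```python
import math

KAPPA = 0.41  # commonly quoted von Karman constant (assumed value)


def log_law_velocity(z, u_star, z0, kappa=KAPPA):
    """Streamwise velocity at height z above the bed from the log law.

    z      : height above the bed [m]
    u_star : friction velocity [m/s]
    z0     : roughness length [m]
    """
    if z0 <= 0.0:
        raise ValueError("roughness length must be positive")
    if z <= z0:
        # Below the roughness length the log law is not meaningful;
        # this sketch simply returns zero velocity there.
        return 0.0
    return (u_star / kappa) * math.log(z / z0)


# Hypothetical values, for illustration only.
profile = [log_law_velocity(z, u_star=0.05, z0=0.001) for z in (0.01, 0.1, 1.0)]
```

The profile increases monotonically with height, as expected for a logarithmic boundary layer.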
2.2. Network Architecture
The NN surrogate model maps the instantaneous LES-computed flow field database of the Piscataqua River to its corresponding time-averaged quantities. Instead of running hundreds of thousands of LES time steps—depending on the turbulent flow and eddies’ frequency range—to obtain the statistically-converged mean flow quantities, the surrogate model learns the relationship between a small number of instantaneous snapshots and their ensemble-mean representation. Two complementary model configurations were developed to explore the effect of input dimensionality on predictive capability:
Single-channel network: This architecture receives a single instantaneous scalar field, typically the velocity magnitude or the streamwise velocity component u, and predicts its time-averaged counterpart. The objective of this configuration is to evaluate whether the instantaneous spatial structures in a single velocity component alone contain sufficient information for the network to infer the statistically converged mean flow field. It provides a minimal, computationally inexpensive baseline.
Tri-channel network: In the second approach, the instantaneous velocity vector (u, v, w) is supplied as a three-channel input tensor. This configuration allows the model to capture cross-component correlations and the combined influence of streamwise, spanwise, and vertical motions on the time-averaged flow field. Since the magnitudes of u, v, and w can vary greatly across the river domain, especially near bridge-induced wakes, a cube-root transformation is applied to reduce this disparity prior to normalization.
Both models are implemented as compact 3D CNNs following an encoder-decoder topology (see Figure 2). Convolutions were performed in three spatial dimensions to preserve volumetric context within each full-domain sample. The 1-channel and 3-channel variants share the same architectural backbone, differing only in the number of input channels and in the inclusion of the cube-root preprocessing step. The end goal of both models is to establish a scalable framework that can learn the statistical imprint of turbulence from a limited set of instantaneous flow fields and serve as an efficient data-driven surrogate for brute-force LES.
2.2.1. Encoder
The encoder progressively compresses the spatial domain while enhancing the representation of turbulence features. Each encoding stage consists of a 3D convolution followed by a nonlinear activation (Rectified Linear Unit, ReLU). The network begins with a small number of filters that increase with depth to capture features at multiple spatial scales:
where
for the single-channel model and
for the tri-channel model. Each convolution uses a kernel size of
, a stride of
, and padding of
. This kernel size allows the network to aggregate information from neighboring voxels while progressively reducing the spatial resolution. The stride of 3 was empirically selected as a compromise between preserving sufficient spatial detail (e.g., wake gradients) and maintaining GPU efficiency for 3D data. Finally, to maintain numerical stability, a 2-voxel padding is applied to ensure that receptive fields overlap smoothly between layers.
We used the ReLU activation function to introduce nonlinearity while preserving nonzero gradients for positive activations. This choice improves training efficiency and reduces the likelihood of vanishing-gradient issues in deeper 3D convolutional architectures.
From the perspective of fluid dynamics, the encoder functions as a hierarchy of spatial filtering operations that become progressively coarser. As network depth increases, the receptive field expands, enabling the learned representations to capture flow structures at multiple spatial scales. These scales range from localized velocity gradients and wake regions to broader flow patterns determined by channel geometry and river-scale circulation.
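The progressive coarsening performed by the encoder can be checked with the standard output-size formula for a strided convolution, out = floor((n + 2p - k)/s) + 1. The sketch below applies it along one axis. The stride of 3 and padding of 2 follow the text, whereas the kernel width of 5, the 243-node axis, and the four stages are hypothetical values chosen only for illustration.

```python
def conv_out_size(n, kernel, stride, padding):
    """Output length of a strided convolution along one axis (dilation 1)."""
    return (n + 2 * padding - kernel) // stride + 1


def encoder_sizes(n, kernel, stride, padding, stages):
    """Axis length after each of `stages` identical convolution layers."""
    sizes = [n]
    for _ in range(stages):
        sizes.append(conv_out_size(sizes[-1], kernel, stride, padding))
    return sizes


# Hypothetical 243-node axis and 5-wide kernel; stride 3 and padding 2
# are the values quoted in the text.
print(encoder_sizes(243, kernel=5, stride=3, padding=2, stages=4))
# -> [243, 81, 27, 9, 3]
```

With these assumed values each stage reduces the axis length by a factor of three, which is the sense in which the encoder acts as a hierarchy of increasingly coarse spatial filters.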
2.2.2. Decoder
The decoder mirrors the encoder, using 3D transpose convolutions (i.e., deconvolutions) to upsample the compressed latent representation back to the original spatial resolution. The feature channels are arranged in reverse order:
Each transpose-convolution layer uses a stride of
to upsample the compressed feature maps. The first three decoder layers use a kernel size of
, while the final output layer uses a kernel size of
to recover the required output dimensions. Unlike the encoder, which uses uniform
padding, the decoder uses layer-specific padding values to match the target spatial resolution after upsampling. Specifically, the decoder padding values are
,
,
, and
for the four transpose-convolution layers, respectively. ReLU activations are applied after each convolutional and transpose-convolutional layer, including the output layer.
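The need for layer-specific decoder padding follows from the output-size relation of a transpose convolution, out = (n - 1)s - 2p + k (plus any output padding). The sketch below evaluates this relation and solves it for the padding that reaches a desired output length. The kernel width of 5 and the axis lengths are hypothetical; only the stride of 3 is taken from the text.

```python
def deconv_out_size(n, kernel, stride, padding, output_padding=0):
    """Output length of a transpose convolution along one axis (dilation 1)."""
    return (n - 1) * stride - 2 * padding + kernel + output_padding


def padding_for_target(n_in, n_target, kernel, stride):
    """Padding that maps n_in to n_target, or None if no integer padding fits."""
    excess = (n_in - 1) * stride + kernel - n_target
    if excess < 0 or excess % 2:
        return None
    return excess // 2


# Hypothetical chain that undoes a 243 -> 81 -> 27 -> 9 -> 3 encoder.
sizes = [3]
for _ in range(4):
    sizes.append(deconv_out_size(sizes[-1], kernel=5, stride=3, padding=1))
print(sizes)  # -> [3, 9, 27, 81, 243]
```

When `padding_for_target` returns `None`, the target length cannot be reached with that kernel and stride, which is one reason a decoder may need a different kernel size in its final layer.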
Figure 3 summarizes the above-mentioned configurations of the network. It should be noted that the present framework predicts statistically converged flow quantities non-iteratively; therefore, prediction errors do not recursively propagate across temporal states as in autoregressive surrogate models.
Architectural hyperparameters and the symmetric encoder–decoder arrangement were determined through preliminary numerical experiments designed to balance reconstruction accuracy, preservation of turbulent flow structures, numerical stability, GPU memory usage, and computational efficiency. Smaller kernels and strides enhanced the retention of fine spatial details but significantly increased memory consumption and training costs. In contrast, larger strides or overly deep architectures led to excessive smoothing and reduced reconstruction quality. Although it lacks direct skip connections (as in a full U-Net), the combination of wide kernels and overlapping receptive fields helps retain both coarse flow organization and moderate-scale spatial variations within the learned feature space.
2.2.3. Training Procedure
The training dataset comprised 21 instantaneous LES realizations from two of the three inflow boundary-condition scenarios: 10 snapshots from the Straight-channel case (timesteps 105 to 195, sampled every 10 timesteps) and 11 from the Log-law case (timesteps 180 to 200, sampled every 2 timesteps). For model testing, 10 snapshots from the River-flow case (timesteps 105 to 195, sampled every 10 timesteps) were reserved. In each scenario, a single LES time-averaged velocity field was used as the supervisory target, establishing a many-to-one mapping between instantaneous flow realizations and the corresponding statistically averaged flow field.
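The snapshot counts quoted above follow directly from the stated sampling intervals, as the short enumeration below shows. The dictionary keys are illustrative labels, not identifiers from the actual data pipeline.

```python
# Snapshot indices per inflow case, with inclusive end points as in the text.
snapshots = {
    "straight_channel": list(range(105, 196, 10)),  # training
    "log_law": list(range(180, 201, 2)),            # training
    "river_flow": list(range(105, 196, 10)),        # testing
}

n_train = len(snapshots["straight_channel"]) + len(snapshots["log_law"])
n_test = len(snapshots["river_flow"])
print(n_train, n_test)  # -> 21 10
```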
During this process, we noticed that training the NN on raw LES data is prone to instability, most likely because the underlying turbulence quantities span several orders of magnitude. In the Piscataqua River LES dataset, near-inlet velocities were fairly uniform (≈0.5–1 m s⁻¹), whereas regions close to the mixing layers exhibited strong shear and velocity spikes exceeding 3–4 m s⁻¹. Second-order quantities, such as Reynolds stresses and TKE, exhibited even greater variability. To mitigate these disparities and promote balanced gradient propagation during training, we applied several preprocessing and normalization strategies before feeding data to the network.
The most effective stabilization step was a cube-root transform applied to the relevant input or target field. For a signed velocity quantity $q$,

$$\tilde{q} = \operatorname{sign}(q)\,|q|^{1/3}.$$

This nonlinear mapping compresses the dynamic range without destroying the relative ordering of values. Because the transformation is odd-symmetric, it preserves the sign of velocity fluctuations and therefore the directionality of motion. In the untransformed data, extreme gradients near mixing layers dominated the loss function, leading the network to overlook quieter regions, such as mid-channel or near-bed areas. After the cube-root transform, the distribution of magnitudes across the domain became more homogeneous, allowing the optimizer to treat each region of the river with comparable weight. For non-negative targets such as TKE, the transform simplifies to $\tilde{q} = q^{1/3}$. We empirically verified that inverting the transform after prediction (i.e., cubing the outputs) restores the variables’ physical scale with negligible numerical bias.
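A minimal sketch of the signed cube-root transform and its inverse is given below, written for scalar values to keep it self-contained. The function names are illustrative, and an array-based implementation would apply the same operations element-wise.

```python
import math


def cbrt_signed(q):
    """Sign-preserving cube root: sign(q) * |q|**(1/3)."""
    return math.copysign(abs(q) ** (1.0 / 3.0), q)


def cbrt_inverse(q_t):
    """Inverse of the transform: cubing restores the physical scale."""
    return q_t ** 3


# The transform compresses the dynamic range while keeping sign and ordering.
values = [-4.0, -0.5, 0.0, 0.5, 4.0]
transformed = [cbrt_signed(q) for q in values]
recovered = [cbrt_inverse(t) for t in transformed]
```

Here a ratio of 8 between the largest and smallest nonzero magnitudes shrinks to a ratio of 2 after the transform, and cubing recovers the original values up to floating-point round-off.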
Following the cube-root transform, the transformed input or target field was standardized using the training-set statistics:

$$\hat{q} = \frac{\tilde{q} - \mu_{\tilde{q}}}{\sigma_{\tilde{q}}},$$

where $\tilde{q}$ is the cube-root-transformed field, and $\mu_{\tilde{q}}$ and $\sigma_{\tilde{q}}$ are the mean and standard deviation computed over all training samples. The same transformation parameters were applied to validation and test data to prevent information leakage. This normalization ensures that the transformed fields contribute more consistently to the total loss and avoids scale-imbalance effects during the training procedure. It also speeds convergence of gradient-based optimization by centering the data around zero.
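The standardization step can be sketched as follows. The key point is that the mean and standard deviation are fitted on the training data only and then reused unchanged for validation and test data. The use of the population standard deviation and the helper names are assumptions of this example.

```python
from statistics import fmean, pstdev


def fit_standardizer(train_values):
    """Mean and (population) standard deviation of the training data."""
    mu = fmean(train_values)
    sigma = pstdev(train_values)
    if sigma == 0.0:
        raise ValueError("training data have zero variance")
    return mu, sigma


def standardize(values, mu, sigma):
    """Apply training-set statistics to any split (train, validation, test)."""
    return [(x - mu) / sigma for x in values]


def destandardize(values, mu, sigma):
    """Map network outputs back to the cube-root-transformed scale."""
    return [x * sigma + mu for x in values]


train = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu, sigma = fit_standardizer(train)
test_scaled = standardize([9.0, 1.0], mu, sigma)  # reuses training statistics
```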
The study area, i.e., the LES domain of computational cells, is computationally demanding for direct full-volume processing in GPU memory. During development, tiled processing was evaluated as a possible memory-management strategy. For the present grid, a natural candidate tile size is cells because it spans the full vertical direction while partitioning the streamwise and spanwise directions. However, tiled processing increases the number of subdomains and requires an additional reconstruction step to average overlapping predictions. Therefore, the final workflow did not treat tile size or overlap ratio as model parameters, and tiled processing is left as a future extension for larger domains or higher-resolution simulations.
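Although tiling was not used in the final workflow, the reconstruction step mentioned above can be illustrated in one dimension. The sketch below generates overlapping tile positions along an axis and averages the tile predictions wherever they overlap. The tile length, overlap, and function names are hypothetical.

```python
def tile_starts(n, tile, overlap):
    """Start indices of overlapping tiles of length `tile` covering range(n)."""
    if tile >= n:
        return [0]
    step = tile - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than the tile length")
    starts = list(range(0, n - tile, step))
    starts.append(n - tile)  # last tile sits flush with the end of the axis
    return starts


def blend_tiles(n, starts, tile_predictions):
    """Average per-tile predictions back onto an axis of length n."""
    total = [0.0] * n
    count = [0] * n
    for start, pred in zip(starts, tile_predictions):
        for k, value in enumerate(pred):
            total[start + k] += value
            count[start + k] += 1
    return [t / c for t, c in zip(total, count)]


starts = tile_starts(10, tile=4, overlap=2)   # -> [0, 2, 4, 6]
tiles = [[float(s)] * 4 for s in starts]      # dummy per-tile predictions
blended = blend_tiles(10, starts, tiles)
```

In three dimensions the same bookkeeping is repeated along each tiled axis, which is the additional cost that motivated processing the full volume directly.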
These preprocessing operations, especially the cube-root scaling, proved crucial for achieving stable convergence. Training runs performed without this preprocessing exhibited instability and abnormal losses in the first few epochs, whereas the transformed or preprocessed datasets exhibited smooth, monotonic loss decay. In qualitative terms, cube-root preprocessing enabled the network to see both energetic regions of the flow and quiescent zones on an equal footing, yielding spatially coherent predictions across the river’s study area.
The model was trained using the AdamW optimizer [
42] with an initial learning rate of
and a decoupled weight-decay coefficient of
. A StepLR scheduler was used with a step size of 400 epochs, a multiplicative decay factor of
, and a batch size of 2 due to memory constraints.
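For reference, a step-decay schedule of this kind multiplies the learning rate by a fixed factor every `step_size` epochs. The sketch below reproduces that rule in plain Python. The step size of 400 epochs follows the text, while the base learning rate and decay factor shown are placeholder values, not the ones used in this study.

```python
def step_lr(epoch, base_lr, step_size=400, gamma=0.5):
    """Learning rate at a given epoch under a step-decay schedule."""
    return base_lr * gamma ** (epoch // step_size)


# Placeholder base_lr and gamma, for illustration only.
schedule = [step_lr(e, base_lr=1e-3) for e in (0, 399, 400, 800)]
```

The rate stays constant within each 400-epoch window and drops only at the window boundaries.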
The loss function used for the reported results was defined as the mean squared error (MSE) between the predicted and reference time-averaged fields obtained from LES. Although the implementation was developed with the capability to incorporate additional physics-based penalty terms, such as divergence-free constraints following physics-informed formulations [
30], these terms were not activated in the final training configuration. This choice was motivated by the previous work of Zhang et al. [
30], where the addition of a divergence-free constraint produced only limited gains relative to the unconstrained data-driven CNN surrogate. The final model was therefore trained using only the data-driven MSE objective, which provided satisfactory predictive accuracy while avoiding the additional computational cost of repeatedly evaluating physics residuals during optimization. Accordingly, all results reported in this study correspond to the MSE-based CNN surrogate model. As a future extension, the currently inactive physics-based penalty terms, including divergence-free and momentum-residual constraints, can be incorporated into the training objective to evaluate their impact on physical consistency, predictive accuracy, and cross-boundary-condition generalization.
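The structure of the training objective can be summarized by the sketch below: a data-driven MSE term plus an optional, weighted physics penalty that is switched off by setting its weight to zero. The function names and the way the penalty is passed in are assumptions of this example and do not reproduce the actual implementation.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and equally long")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def total_loss(pred, target, physics_penalty=0.0, weight=0.0):
    """MSE plus an optional weighted physics residual (inactive when weight=0)."""
    return mse(pred, target) + weight * physics_penalty


# With weight=0 the objective reduces to the purely data-driven MSE.
loss = total_loss([1.0, 2.0, 3.0], [1.0, 2.0, 5.0], physics_penalty=7.0)
```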
The numerical experiments conducted during the training process revealed several consistent behaviors and practical lessons regarding the network architecture and preprocessing strategies. These findings can be summarized as follows:
Single-variable network: We consider the ML model trained with only one input variable, e.g., the instantaneous streamwise velocity u, as a baseline, owing to its simplicity. Despite its limited input, this configuration demonstrated that the spatial coherence of an instantaneous u field contains information about long-term mean structures, such as the high-velocity core, shear layers near the banks, and wake regions behind bridge piers. Because the input field was relatively smooth compared with multi-component data, convergence was stable even without special transformations. However, the network’s predictions occasionally underrepresented recirculation zones and failed to reproduce fine-scale secondary flows in highly curved or sheared river sections. This shortfall is consistent with the physical limitation of single-component information. In other words, the instantaneous streamwise velocity component, u, alone cannot encode the divergence-free condition or momentum conservation, which limits the accuracy of the predicted target, i.e., the mean flow field.
Tri-variable network: Supplying the 3D instantaneous velocity field, (u, v, w), provided a richer physical basis. In principle, the three components carry the anisotropic signatures of turbulence, allowing the surrogate to infer coupling between streamwise and cross-stream motions. In practice, however, raw multi-component training proved unstable: gradients exploded within the first few epochs, and the optimizer quickly diverged. This instability stemmed from the fact that u, v, and w span different magnitude ranges and dynamic behaviors: u dominates the bulk flow, whereas v and w fluctuate around lower means but exhibit intense local bursts near shear layers. Employing the cube-root transformation equalized these magnitudes and effectively “linearized” the tails of the distribution, enabling the network to learn balanced filters across all components. After transformation, the tri-channel network produced more accurate reconstructions of turbulence-intense regions, particularly the shear layers along the banks and the wake recovery zones behind the Memorial and Sarah-Mildred Bridges (see Figure 1). In general, the predictions preserved the large-scale topology better than those of the one-channel model.
A comparison of the two approaches highlights some key insights, as follows:
Information content vs. stability: Adding input channels increased representational capacity but required more robust normalization and regularization. The baseline single-variable network, trained on fewer variables, was easier to stabilize but less expressive.
Importance of preprocessing: Pre-training data transformation of the multi-channel network exerted a greater influence on convergence than architectural depth or number of filters. Without cube-root scaling, even more complicated network structures failed to converge.
Physical consistency of learned features: Visualization of latent features revealed that deeper layers in both ML models responded to coherent flow structures such as jet cores and vortex–shear interfaces, suggesting that the networks learned physically meaningful encodings rather than purely statistical correlations.
Cross-boundary robustness: Both models generalized reasonably well when the test scenarios (of river flow) resembled the training one, but deteriorated under a strong distribution change. This limitation underscores the need for domain adaptation or transfer learning techniques to develop surrogates across hydrodynamic regimes.
Overall, our numerical experiments confirmed that cube-root preprocessing plays a key role in stabilizing training when the target fields exhibit strong spatial heterogeneity, particularly for second-order turbulence quantities such as TKE. The final surrogate framework captures the essential topology of the mean flow field while remaining computationally lightweight. The findings also reinforce a broader lesson that the preprocessing choices grounded in physical reasoning can be as important as architectural factors in determining ML model performance.
Furthermore, several challenges were identified during training and evaluation phases that warrant further investigation:
Heterogeneity: Orders-of-magnitude differences in second-order turbulence statistics made raw training challenging.
Dead-end trial: Training directly on Reynolds stresses (i.e., second-order turbulence statistics), without transformation, caused exploding gradients.
Key breakthrough: Cube-root preprocessing of TKE reduced skewness and stabilized the learning process.
Cross boundary condition generalization: Performance degraded significantly when inflow profiles, used as inlet boundary conditions, differed strongly.
We measured model performance using the mean squared error (MSE) between the predicted and reference time-averaged target fields, including mean velocity and TKE. MSE was chosen for its direct penalization of deviations in physical magnitude and its smooth, well-behaved gradients. For an instantaneous input $x$ and LES-computed mean flow target $y$, the loss is obtained as:
$$\mathcal{L}_{\mathrm{MSE}} = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{y}_i - y_i \right)^2$$
where $\hat{y}$ is the ML model’s predicted mean flow quantity, and N is the number of voxels per batch. The loss was computed in the normalized space (after cube-root and standardization transforms), and predictions were converted to the physical scale for visualization and error metric calculations.
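The loss evaluation in normalized space and the conversion back to physical units can be sketched as follows. This is an illustrative NumPy version under stated assumptions: the function names are hypothetical, the target is a synthetic TKE-like field, and the "network output" is mocked by adding noise to the normalized target.

```python
import numpy as np

def fit_stats(target):
    """Mean and standard deviation of the cube-rooted target."""
    root = np.cbrt(target)
    return float(root.mean()), float(root.std())

def to_normalized(field, mean, std):
    """Cube root followed by standardization."""
    return (np.cbrt(field) - mean) / std

def to_physical(normalized, mean, std):
    """Undo standardization, then cube to return to physical units."""
    return (np.asarray(normalized) * std + mean) ** 3

def mse(pred, ref):
    """Mean squared error over all voxels."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    return float(np.mean((pred - ref) ** 2))

# Synthetic, strongly skewed target (assumed for illustration only).
rng = np.random.default_rng(1)
target = rng.lognormal(mean=-4.0, sigma=1.5, size=(8, 8, 4))

mean, std = fit_stats(target)
y = to_normalized(target, mean, std)
y_hat = y + 0.05 * rng.standard_normal(y.shape)  # mock network output

loss = mse(y_hat, y)                          # loss in normalized space
physical_pred = to_physical(y_hat, mean, std) # used for plots and error metrics
print("normalized-space MSE:", loss)
```

Evaluating the loss after the transforms keeps large-magnitude cells from dominating the gradient, while the reported error metrics remain in physical units.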
Herein, the ML models were trained using the AdamW optimizer with a step-based learning-rate scheduler. AdamW builds on the Adam optimization framework [43] by decoupling weight-decay regularization from the gradient-based update [42], while the scheduler progressively adjusted the learning rate to support stable convergence.
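To make the decoupling concrete, the following is a minimal NumPy sketch of one AdamW update combined with a step-based learning-rate rule. All hyperparameter values (`step_size`, `gamma`, `weight_decay`, the base learning rate) are assumptions chosen for illustration, since the source does not report them, and the toy quadratic objective stands in for the CNN loss.

```python
import numpy as np

def step_lr(base_lr, epoch, step_size=30, gamma=0.5):
    """Step schedule: multiply the learning rate by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)

def adamw_step(theta, grad, m, v, t, lr,
               beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=1e-2):
    """One AdamW update; weight decay acts on theta directly, not through grad."""
    theta = theta * (1.0 - lr * weight_decay)      # decoupled weight decay
    m = beta1 * m + (1.0 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad ** 2      # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                 # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy problem (assumed): minimize sum((theta - 1)^2).
theta = np.zeros(3)
m = np.zeros(3)
v = np.zeros(3)
initial_loss = float(np.sum((theta - 1.0) ** 2))
for epoch in range(300):
    lr = step_lr(0.05, epoch, step_size=100, gamma=0.5)
    grad = 2.0 * (theta - 1.0)
    theta, m, v = adamw_step(theta, grad, m, v, t=epoch + 1, lr=lr)
final_loss = float(np.sum((theta - 1.0) ** 2))
print("initial loss:", initial_loss, "final loss:", final_loss)
```

In classical Adam with L2 regularization, the decay term would be added to `grad` and rescaled by the adaptive denominator; here it is applied to the parameters separately, which is the distinction the text refers to.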
Each training run consisted of 80–120 epochs, depending on the complexity of the river flow scenario and the type of the input vector. The training process was monitored by tracking the validation loss and by regular visual inspection of the predicted mean flow field (e.g., every 5 epochs). Early stopping terminated training if the validation loss did not improve for 20 epochs. No explicit dropout or batch-normalization layers were used. Training was performed using mini-batches with a batch size of 2, with the available full-domain training samples randomly shuffled before mini-batch construction. In addition, a cube-root transformation was applied to the target variables, followed by standardization. These preprocessing and training procedures reduced scale imbalance within the target fields, helped mitigate potential small-batch training instability, and improved convergence stability without requiring spatial tiling in the final workflow.
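The early-stopping rule and the shuffled mini-batch construction described above can be sketched in plain Python. The class and function names are hypothetical, and the `min_delta` tolerance is an added assumption (the source states only the 20-epoch patience and the batch size of 2).

```python
import random

class EarlyStopping:
    """Signal a stop once the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=20, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta      # assumed tolerance; 0.0 means any decrease counts
        self.best = float("inf")
        self.stale_epochs = 0

    def update(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience

def shuffled_minibatches(n_samples, batch_size=2, seed=None):
    """Shuffle sample indices, then split them into consecutive mini-batches."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i:i + batch_size] for i in range(0, n_samples, batch_size)]

# Usage with a made-up validation-loss history and a short patience for brevity.
stopper = EarlyStopping(patience=3)
history = [1.0, 0.9, 0.95, 0.93, 0.92]
decisions = [stopper.update(loss) for loss in history]
print(decisions)
print(shuffled_minibatches(5, batch_size=2, seed=0))
```

With a batch size of 2, an odd number of samples leaves a final batch of one sample; whether that batch is kept or dropped is a design choice not specified in the source.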
3. Results and Discussion
In this section, we examine the predictive performance of the trained surrogate CNN model qualitatively and quantitatively for the mean-flow velocity fields (first-order turbulence statistics) and the turbulence kinetic energy field (second-order turbulence statistics). It should be noted that the CNN model was first trained using the LES-computed mean flow field data associated with the “river inflow” boundary conditions. For brevity, this section focuses on validation (test) cases, i.e., the river flow field with the “log-law profile” and “straight channel” boundary conditions, which the CNN model did not encounter during training.
In Figure 4, we plot the LES-computed and CNN-predicted contours of mean velocity magnitude within the study area at the water surface. Figure 4a,b correspond to the LES-computed velocity magnitude obtained using the log-law and straight-channel boundary conditions, respectively, while panels (c) and (d) show the corresponding CNN predictions. Overall, the surrogate CNN model successfully reproduces the low- and high-velocity regions throughout the study area. In particular, the model accurately identifies zones of high-velocity flow where the river narrows or near the inner bends, as indicated by the red and orange regions in the color maps. The predicted mean velocity fields closely resemble the LES-computed mean flow patterns under both the “straight channel” and “log-law profile” boundary conditions, demonstrating that the trained model generalizes well to inflow configurations that differ from those used during training (i.e., “river flow” boundary condition).
In addition, Figure 5 provides a more detailed quantitative comparison of velocity magnitude at the six representative cross-sections (A–F) within the study area (see Figure 1 for the locations of cross-sections). The solid and dashed curves denote the LES results and CNN predictions, respectively. As shown, the ML-predicted velocity magnitude distributions closely match those of the LES, capturing both the magnitudes and peaks of the mean velocity field. The agreement is particularly strong in sections characterized by smooth shear layers and gradual velocity transitions. In regions with stronger lateral variability, minor discrepancies appear near the channel boundaries, where sharp gradients and complex secondary flow structures are present.
Figure 6 further examines the model performance under the log-law profile. As seen, despite the change in the inflow specification, the surrogate model continues to reproduce the LES profiles with high accuracy across all sections. The predicted velocity distributions capture the primary lateral gradients and peak velocity regions associated with the channel geometry, indicating that the model retains its ability to represent the dominant flow features under unseen boundary conditions. Compared with the straight-channel case, slightly larger deviations are observed in sections exhibiting stronger asymmetry or sharper velocity gradients, particularly near the channel boundaries. The observed discrepancies are consistent with the increased complexity introduced by the log-law inflow, which induces more pronounced shear and non-uniform momentum distribution. Nevertheless, the CNN predictions remain well aligned with the LES reference profiles, demonstrating robust generalization of the surrogate model across different inlet boundary conditions.
Figure 7, Figure 8 and Figure 9 illustrate the predictive performance of the surrogate model for the time-averaged velocity magnitude at mid-depth. Figure 7 presents the spatial distribution of the velocity field for both inlet boundary conditions. The CNN predictions closely reproduce the LES velocity patterns across the river reach, capturing the high-velocity core and the gradual velocity decay toward the channel banks. In addition, the model accurately represents the redistribution of momentum induced by channel curvature, with localized acceleration zones forming along the outer banks of bends.
Figure 8 shows the velocity profiles for the straight-channel inlet condition at mid-depth of the river. The predicted profiles show strong agreement with the LES results, capturing both the magnitudes and the overall shapes of the mid-depth lateral velocity distributions. As expected, this agreement is especially pronounced in sections with smooth velocity gradients, while minor discrepancies appear in regions of sharper shear near the channel boundaries.
Figure 9 further evaluates the model under the log-law profile as the inlet boundary condition. Despite the increased complexity associated with non-uniform inflow profiles, the CNN predictions remain well aligned with the LES reference data. The model successfully captures the primary velocity gradients and peak locations across the channel, although slightly larger deviations are observed in sections exhibiting stronger asymmetry and sharper gradients.
We now turn to the CNN predictions for the second-order turbulence statistics, namely, the turbulence kinetic energy. Figure 10, Figure 11 and Figure 12 present the predictive performance of our model for the TKE field at the water surface. Figure 10 compares the spatial distribution of TKE obtained from LES and CNN predictions for the two unseen inlet boundary conditions: log-law profile and straight-channel inflow. The CNN model appears to capture the large-scale spatial organization of turbulence intensity and, more specifically, the peaks of TKE along regions of strong shear and flow acceleration. However, compared with the mean velocity field, the TKE predictions exhibit an elevated background TKE pattern, indicating a reduced ability to resolve localized turbulent structures and sharp spatial variations. We argue that such discrepancies are due to the elevated turbulence content (i.e., high background turbulence intensity) of the “river flow” inlet boundary conditions, with which the ML model was trained.
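For reference, TKE is half the trace of the Reynolds stress tensor, so it depends on the fluctuations about the time-averaged velocity rather than on the mean itself. The sketch below shows how both statistics could be obtained from a stack of instantaneous snapshots. The array layout `(T, 3, ...)` and the function name are assumptions for illustration, and an LES solver would typically accumulate these statistics on the fly rather than store every snapshot.

```python
import numpy as np

def mean_and_tke(snapshots):
    """Time-averaged velocity and TKE from snapshots of shape (T, 3, ...).

    Axis 0 indexes time instants; axis 1 holds the components (u, v, w).
    """
    snapshots = np.asarray(snapshots, dtype=float)
    mean = snapshots.mean(axis=0)                    # mean (u, v, w), shape (3, ...)
    fluct = snapshots - mean                         # fluctuations u', v', w'
    normal_stresses = (fluct ** 2).mean(axis=0)      # <u'u'>, <v'v'>, <w'w'>
    tke = 0.5 * normal_stresses.sum(axis=0)          # half the trace
    return mean, tke

# Two snapshots at a single grid point: u fluctuates, v and w are steady.
demo = np.array([
    [[1.0], [0.0], [2.0]],
    [[3.0], [0.0], [2.0]],
])
demo_mean, demo_tke = mean_and_tke(demo)
print(demo_mean.ravel(), demo_tke)
```

Because TKE is quadratic in the fluctuations, a few rare high-amplitude events can contribute a large share of its value, which is one reason it is harder to infer from a small number of snapshots than the mean velocity.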
A quantitative comparison of the LES-computed and CNN-predicted TKE at the free surface (see Figure 11) provides a more precise assessment throughout the study area. As seen, the CNN predictions generally follow the overall trends of the LES profiles, with discrepancies more noticeable than those observed in the velocity field. The CNN model captures the locations of major peaks; however, it tends to misrepresent the magnitude of localized TKE peaks and smooths out smaller-scale fluctuations, especially in regions with strong lateral gradients.
Figure 12 further evaluates the ML model under the log-law profile inlet boundary condition. As seen, the predicted TKE distributions across the cross sections resemble those of the LES prediction, although with larger deviations in sections exhibiting strong asymmetry and intermittent turbulent flow structures. Overall, these results indicate that while the surrogate model effectively captures the large-scale behavior of the TKE field, it has limited capability to resolve fine-scale turbulence features.
In Figure 13, we present the predictive performance of the surrogate model for the TKE field at mid-depth of the river flow. As seen, the ML model has successfully captured the overall spatial organization of the turbulence kinetic energy within the study area. However, similar to the results at the free surface, the predicted field appears to slightly underestimate the background TKE within the river, indicating a limited ability to resolve fine-scale turbulence content of the river.
On the other hand, as seen in Figure 14 and Figure 15, the transverse profiles of the TKE at mid-depth show that the ML model has successfully captured the trend in the TKE distribution across the river. Nevertheless, discrepancies are more evident in regions with sharp gradients and localized peaks. The model tends to misrepresent the magnitude of peak TKE values, reflecting the difficulty of learning highly intermittent turbulence features from instantaneous inputs. We should note that the ML model’s predictions have successfully reproduced the general TKE distributions, including the relative positioning of high- and low-energy regions across the channel.
Deviations are observed in sections with pronounced asymmetry and strong localized turbulence. These results suggest that while the CNN predictions capture the general spatial distribution of high- and low-TKE regions, the relatively low coefficient of determination ($R^2$) values demonstrate limited quantitative accuracy in TKE prediction. As a result, the current TKE outputs should be regarded as qualitative or semi-quantitative indicators of large-scale turbulence organization, rather than as fully reliable predictions of local turbulence intensity.
To further assess the predictive performance of the trained neural network models, we used standard regression error metrics for both test cases, i.e., the straight-channel and log-law profile boundary conditions. Table 1 presents the error analysis for the velocity magnitude predictions (MLUavg), while Table 2 summarizes the performance of the ML model for TKE predictions (MLTKE). The error analysis of the ML model includes the average loss, measured as mean squared error (MSE), root mean squared error (RMSE), and mean absolute error (MAE), as well as the coefficient of determination ($R^2$). It is worth mentioning that the present surrogate framework is deterministic and does not explicitly quantify predictive uncertainty, and that the above-mentioned assessments do not fully characterize predictive confidence for safety-critical or risk-informed hydrodynamic applications.
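The four metrics listed above follow standard definitions, which can be sketched as a small helper. The function name and the dictionary keys are hypothetical; the sketch evaluates the metrics over all cells of a field and is not tied to the specific values reported in Tables 1 and 2.

```python
import numpy as np

def regression_metrics(pred, ref):
    """MSE, RMSE, MAE, and coefficient of determination over all cells."""
    pred = np.asarray(pred, dtype=float).ravel()
    ref = np.asarray(ref, dtype=float).ravel()
    err = pred - ref
    mse = float(np.mean(err ** 2))
    ss_res = float(np.sum(err ** 2))                   # residual sum of squares
    ss_tot = float(np.sum((ref - ref.mean()) ** 2))    # variance of the reference field
    return {
        "MSE": mse,
        "RMSE": mse ** 0.5,
        "MAE": float(np.mean(np.abs(err))),
        "R2": 1.0 - ss_res / ss_tot,
    }

# Small hand-checkable example: only the last cell is off, by 1.
metrics = regression_metrics(pred=[1.0, 2.0, 3.0, 5.0], ref=[1.0, 2.0, 3.0, 4.0])
print(metrics)
```

Because the coefficient of determination normalizes the squared error by the variance of the reference field, a field with small absolute values and small variance can yield low MSE and a low coefficient of determination at the same time.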
As seen in Table 1, the MLUavg error analysis results indicate strong predictive capability across both boundary-condition scenarios. The RMSE and MAE remain relatively small, and the coefficient of determination exceeds 0.95 in both cases, suggesting that the model successfully captures the dominant spatial variability of the time-averaged velocity field even when tested under boundary conditions different from those used during training. The slightly improved performance observed with the log-law profile boundary condition, reflected in lower MSE, RMSE, and MAE values, as well as a higher $R^2$ score, indicates that the learned mapping generalizes well to this inflow configuration.
Similar to the time-averaged velocity prediction results in Table 1, the TKE prediction also performs better for the log-law profile boundary condition than for the straight-channel case, as shown in Table 2. Specifically, compared with the straight-channel case, the log-law case reduces the MSE by approximately 28.0%, the RMSE by 15.3%, and the MAE by 19.5%, while increasing the $R^2$ value by approximately 44.2%. This moderately improved performance under the log-law profile suggests that, although TKE prediction remains more challenging than mean-velocity prediction, the surrogate retains partial generalization capability across different inflow specifications. Nevertheless, the MLTKE error analysis (Table 2) exhibits larger prediction errors and substantially lower $R^2$ values than the MLUavg error analysis. Although the MSE, RMSE, and MAE values for TKE are relatively small in absolute magnitude, the substantially lower $R^2$ values indicate that the model explains less of the spatial variance of the TKE field than it does for the time-averaged velocity field. This behavior suggests that the CNN captures the broad spatial organization of TKE but has difficulty reproducing localized high-gradient regions and peak values. This is consistent with previous CNN-based turbulence-prediction studies. Santoni et al. [44] reported larger prediction errors for TKE than for velocity fields when using a CNN and attributed this difference to the stronger nonlinearities of turbulent-flow quantities. They also suggested that additional instantaneous snapshots or deeper architectures may be required for improved TKE reconstruction. Similarly, Zhang et al. [30] showed that second-order turbulence statistics in riverine flows can be highly heterogeneous, differing by nearly two orders of magnitude across the domain. They also noted that relative or normalized error metrics can be inflated when reference quantities are very small in magnitude. Therefore, the lower quantitative accuracy of the present TKE predictions reflects both model limitations and the inherent difficulty of learning small-magnitude, intermittent, and spatially heterogeneous second-order turbulence statistics from limited instantaneous inputs.
Several factors likely contribute to this behavior. First, TKE exhibits sharper spatial gradients and localized high-magnitude extrema, especially within shear layers, wake regions, and areas of intense turbulence production. These localized extrema are difficult to reconstruct from a limited number of instantaneous LES snapshots. Second, the encoder–decoder architecture may introduce smoothing because strided convolutions reduce the spatial resolution of feature maps, improving computational efficiency but potentially suppressing small-scale, high-frequency turbulent structures. Third, the MSE loss penalizes errors across the entire computational domain, which tends to favor spatially smooth predictions that minimize the average error rather than preserve localized extrema. Consequently, the model captures the large-scale spatial distribution of TKE but does not consistently reproduce the magnitude of localized extrema.
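The smoothing mechanism attributed to strided convolutions can be illustrated with a one-dimensional toy example. Stride-2 averaging followed by nearest-neighbor upsampling is used here as a simple stand-in; it is an assumption for illustration and not the learned kernels or decoder of the actual network.

```python
import numpy as np

def downsample_stride2(signal):
    """Stride-2 averaging: a stand-in for a strided convolution with a uniform 2-tap kernel.

    Assumes an even-length 1D signal.
    """
    return 0.5 * (signal[0::2] + signal[1::2])

def upsample_nearest(signal):
    """Nearest-neighbor upsampling by a factor of 2."""
    return np.repeat(signal, 2)

# A single sharp peak on an otherwise zero background.
field = np.zeros(16)
field[7] = 1.0

reconstructed = upsample_nearest(downsample_stride2(field))
print("original peak:", field.max(), "reconstructed peak:", reconstructed.max())
print("sum preserved:", np.isclose(field.sum(), reconstructed.sum()))
```

In this toy case the integrated quantity is preserved while the peak is halved and spread over two cells, which mirrors the behavior described above: large-scale distribution retained, localized extrema attenuated. Learned decoders and skip connections can recover part of this loss.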
Insufficient temporal sampling may further contribute to the observed discrepancies in localized TKE extrema. Because TKE is derived from turbulent fluctuations, it is more sensitive to rare and intermittent flow events than the mean velocity field. A limited set of instantaneous snapshots may not fully capture the temporal variability of peak turbulence regions, causing the model to learn dominant spatial patterns while missing less frequent high-intensity events. Thus, the observed TKE discrepancies likely result from a combination of architectural smoothing, MSE-based loss averaging, and limited temporal sampling. Future work may explore peak-aware loss functions (such as weighted MSE, percentile-weighted losses, or hybrid approaches that assign greater penalties to cells with large values), increased snapshot sampling, and architectures incorporating skip connections [37] to improve the prediction of localized TKE extrema.
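As one possible form of the weighted MSE mentioned above, the sketch below assigns each cell a weight that grows linearly with the magnitude of the reference value. The weighting rule, the parameter `alpha`, and the function name are assumptions made for illustration; the source does not specify a particular peak-aware loss.

```python
import numpy as np

def peak_weighted_mse(pred, ref, alpha=4.0):
    """Weighted MSE whose per-cell weights increase with |ref| / max|ref|.

    alpha = 0 recovers the ordinary MSE; larger alpha penalizes errors at
    high-magnitude cells more strongly.
    """
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    scale = float(np.max(np.abs(ref)))
    if scale > 0.0:
        weights = 1.0 + alpha * np.abs(ref) / scale
    else:
        weights = np.ones_like(ref)
    return float(np.sum(weights * (pred - ref) ** 2) / np.sum(weights))

# Two predictions with the same plain MSE: one misses the peak, one misses the background.
ref = np.array([0.0, 0.0, 0.0, 1.0])
miss_peak = np.array([0.0, 0.0, 0.0, 0.5])
miss_background = np.array([0.5, 0.0, 0.0, 1.0])

print("plain MSE:", peak_weighted_mse(miss_peak, ref, alpha=0.0),
      peak_weighted_mse(miss_background, ref, alpha=0.0))
print("weighted :", peak_weighted_mse(miss_peak, ref),
      peak_weighted_mse(miss_background, ref))
```

Under plain MSE the two predictions are indistinguishable, whereas the weighted variant ranks the missed peak as the larger error. A trade-off of such weighting is that background accuracy may degrade if `alpha` is set too high.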
We should note that in the multivariate model, cube-root preprocessing of the input vector was shown to be crucial for stable convergence and improved reconstruction of energetic zones, despite its detrimental effect on predicting local peak velocity magnitudes and TKE. Single-variable models, on the other hand, demonstrated that even limited information from instantaneous velocity fields can yield physically consistent mean-flow estimates.
Figure 16 shows the evolution of the training and testing losses over the course of model training on a logarithmic scale. Both losses decrease rapidly during the early epochs, indicating effective initial learning, and then gradually level off as training progresses. As expected for unseen data, the testing loss remains higher than the training loss throughout training; however, the testing curve follows the same decreasing trend as the training curve and does not exhibit late-epoch divergence. By the final epoch, the training and testing losses reach approximately
and
, respectively, corresponding to a final generalization gap of approximately
. Although this gap indicates that the model performs better on the training data than on unseen test data, the stable test loss behavior suggests that the model does not suffer from severe overfitting and retains acceptable generalization capability.
The computational advantage of the proposed surrogate model was assessed by comparing the cost of generating LES-based reference fields with the cost of the CNN-based prediction workflow. In the present study, each LES-based reference result required tens of thousands of CPU-hours on AMD EPYC CPU nodes, whereas the complete CNN training procedure required approximately 30 GPU-hours on an NVIDIA A100 GPU node, followed by sub-second inference for each predicted time-averaged field. The trained surrogate model can therefore provide near-instantaneous predictions once the offline training is complete. The resulting acceleration is particularly valuable for repeated evaluations, sensitivity studies, and scenario testing, where performing a separate LES simulation for each new condition would be computationally prohibitive. Because CPU-hours and GPU-hours are not hardware-equivalent units, this comparison should be interpreted as a workflow-level reduction in allocated computational cost rather than a direct processor-to-processor floating-point speed-up.