Evaluation of a Near-Wall-Modeled Large Eddy Lattice Boltzmann Method for the Analysis of Complex Flows Relevant to IC Engines

: In this paper, we compare the capabilities of two open source near-wall-modeled large eddy simulation (NWM-LES) approaches regarding prediction accuracy, computational costs and ease of use to predict complex turbulent ﬂows relevant to internal combustion (IC) engines. The applied open source tools are the commonly used OpenFOAM, based on the ﬁnite volume method (FVM), and OpenLB, an implementation of the lattice Boltzmann method (LBM). The near-wall region is modeled by the Musker equation coupled to a van Driest damped Smagorinsky-Lilly sub-grid scale model to decrease the required mesh resolution. The results of both frameworks are compared to a stationary engine ﬂow bench experiment by means of particle image velocimetry (PIV). The validation covers a detailed error analysis using time-averaged and root mean square (RMS) velocity ﬁelds. Grid studies are performed to examine the performance of the two solvers. In addition, the differences in the processes of grid generation are highlighted. The performance results show that the OpenLB approach is on average 32 times faster than the OpenFOAM implementation for the tested conﬁgurations. This indicates the potential of LBM for the simulation of IC engine-relevant complex turbulent ﬂows using NWM-LES with computationally economic costs.


Introduction
Due to the complex turbulent nature of internal combustion (IC) engine flows, their accurate prediction is a major challenge to numerical and experimental investigations. Additional difficulties arise from the interconnection of multiphysical processes, including multiphase flow phenomena, heat transfer and chemical reactions. Each process features different time and length scales, often varying in orders of magnitude, which further increases the complexity.
A particularly important turbulent flow structure for the analysis of IC engines is the intake jet [1]. This high-speed flow over the valves is critical in generating a charge motion, which is commonly referred to as a tumble motion. The tumble breakdown in engine compression results in turbulent structures, which dominate the mixing, ignition and combustion processes and in turn, the engine efficiency and pollutant emissions. Therefore, it is necessary to understand the complex processes in turbulent IC engine intake flows to improve the combustion performance and reduce cycle-to-cycle variability [2][3][4][5][6][7].
Optically accessible research engines enable the detailed investigation and visualization of the processes inside IC engines, often with simplified geometries for numerical validation. High-speed laser diagnostics have long been utilized in engine experiments and have provided more insight into the turbulent structures present in engine flows [8][9][10][11]. Simplified flow bench setups with steady state or transient operation are common tools to optimize cylinder head or intake port geometries and have been used to investigate the intake flow in industrial and scientific research [12,13]. Recent studies examined intake phenomena using magnetic resonance velocimetry (MRV) in a steady water flow bench [1] or low-speed particle image velocimetry (PIV) in a steady air flow bench [14,15]. The data from these experiments have been used as validation for large eddy simulation (LES) approaches [14][15][16][17][18].
However, high-speed PIV data for a flow bench with a realistic engine geometry is limited. Therefore, numerical simulations are another essential tool for the analysis of IC engine flows. In particular, the 3D flow data and turbulence structures obtained with LES offer data which are nearly impossible to obtain in experimental investigations. The choice of LES instead of commonly used Reynolds-averaged Navier-Stokes (RANS) approaches is justified by its ability to resolve the intrinsic unsteady flow motion resulting from the moving valves and pistons. The study of unsteady phenomena such as cycle-to-cycle variability, misfire and knock are especially important factors influencing the geometric design and the operating conditions [2]. The use of LES, which is known to be computationally expensive, is often favored with moderate Reynolds numbers (10, 000 < Re < 30, 000) and relatively small regions of interest. However, the fast prediction of accurate and detailed LES results to accelerate design cycles is still a challenge due to the increased number of cells and time required to generate adequate statistics when comparison with RANS approaches.
LES studies of fired and non-fired engine cases including moving piston and valves are reported by many researchers, e.g., [3][4][5][6][7]. Most of these numerical studies are focused on the analysis of cycle-to-cycle variations of in-cylinder flow fields and its influence on the mixing dynamics, combustion and pollutant emission. In this respect, it is worth mentioning that systematic evaluation studies of different LES approaches and models under engine-like operating conditions are rarely reported in the literature. This is mainly because of the considerable numerical effort required to carry out LES of many engine cycles of fired and non-fired cases with moving piston/valves. Furthermore, the complexity of in-cylinder flows impedes in-depth studies of individual processes and model evaluation. Therefore, it is useful to reduce the complexity of the engine configuration and evaluate LES approaches and numerical models by means of simplified flow bench configurations that represent most of the flow and mixing phenomena relevant to IC engines.
The aforementioned studies are based on traditional discretization methods like the finite volume method (FVM). In recent years, an alternative approach called the lattice Boltzmann method (LBM) has gained increasing attention in research and industry. LBM is useful in a wide range of applications, e.g., thermal flow simulations [19,20] or flows in complex geometries [21,22], due to its highly efficient parallel algorithm [23,24]. Such efficiency offers a high potential in reducing computation times for the simulation of high Reynolds number flows using DNS or LES approaches, which is usually a bottleneck in the field of turbulent flow simulations.
Qualitative and quantitative comparisons were made to estimate the capabilities of LBM based implementations for the simulation of turbulent flows in comparison with FVM-based implementations. In 2014 Kajzer et al. [25] evaluated the performance differences between an LBM and FVM implementation for the simulation of homogeneous isotropic turbulence. They found that in particular the scalability of LBM methods and the adaptivity for computations on graphics processing units (GPU) lead to a significant performance advantage compared with the tested FVM implementation in OpenFOAM. Two years later Pasquali et al. [26] showed that the calculation of the external aerodynamics of a car also benefits from the use of LBM on GPUs. A further comparison between LBM and FVM depicted that the higher grid resolution obtained by LBM leads to more resolved vortex structures in the outer layer of turbulent channel flows [27]. Barad et al. [28] compared a higher order finite difference method (FDM) with LBM in a software framework that uses the same Cartesian mesh structure. They showed that for the simulation of airframe noise the LBM implementation is 15 times faster than the higher order FDM scheme at a similar accuracy.
Most of the LBM studies related to engine flows that have been conducted to date deal with the injection process. Therefore, a multiphase approach is chosen to simulate spray formation, bubble break up and flow induced cavitation. A summary of these studies can be found in the book of Montessori and Falcucci [29]. A moving valve/pistion arrangement was simulated by Dorschner et al. [30] using the parameter-free Karlin-Bösch-Chikatamarla (KBC) collision operator. They showed that the results are in good agreement with a DNS reference.
In contrast to all these previous contributions, we focus on the implementation of open source near-wall-modeled LES (NWM-LES) to achieve fast and accurate results, which is relevant to users in academia and industry. Therefore, a recent version of the established FVM-based implementation of OpenFOAM is compared with OpenLB [31], an open source LBM framework. To get a fair comparison, we solve the same target equation in both software frameworks, including an explicit sub-grid scale (SGS) modeling and the use of a wall function. Furthermore, we do not limit the grid to a certain amount of cells or type of mesh elements so that each implementation can show its advantages. The grid generation process is also taken into account to compare the time spent on pre-processing. This is one of the first studies where in-house-conducted experimental data are used to validate two open source implementations. Moreover, a detailed error analysis of both methods covering the grid convergence of time-averaged and root mean square (RMS) velocity fields in the context of engine flows is a novelty. Additionally, a performance analysis compares the solver runtime of each implementation that is needed to calculate the statistics for the different grids. The comparison in the theory section aims to highlight the differences between both discretization methods. In addition, the differences in the implementation of wall-modeled LES in LBM and FVM are described. The NWM-LES implementation in LBM is ongoing research due to the complex boundary treatment in LBM. As a result, a new LBM wall function approach is proposed which extends the previous approaches [32,33] to curved boundaries.
This paper is organized as follows: Section 2 introduces the applied modeling approaches and shows the differences and similarities using LBM or FVM. Next, the experimental and numerical setup is described in Section 3. The related results using NWM-LES obtained with OpenFOAM and OpenLB for different grid resolutions are presented and compared to the PIV results in Section 4. Finally, Section 5 summarizes the results and draws a conclusion.

Filtered Navier-Stokes Equations
The filtered incompressible Navier-Stokes equations consist of the continuity equation and the momentum equation according Leonard's decomposition [34] which reads where Greek indices obey the Einstein notation, u α is the filtered velocity, p is the filtered pressure field, T SGS αβ is the SGS stress tensor, ρ is the density and ν is the viscosity. This set of equations can be closed by using a linear eddy viscosity hypothesis for the SGS stress tensor where ν SGS is the SGS viscosity that can be modeled by an SGS viscosity model (see Section 2.2). In Equation (2) no volume force is applied and will not be considered hereafter.

Finite Volume Method
In dealing with the FVM of incompressible fluid flow, the discretization process of the balance laws of fluid motion can be divided into two steps: (1) the spatial and temporal discretization of the solution domain and (2) the discretization of the spatial and temporal terms in the Navier-Stokes equations [35]. Then, the partial differential equations can be converted into a corresponding set of algebraic equations and solved numerically. Additionally, nonlinearities in the Navier-Stokes equations and the pressure-velocity coupling require some special numerical treatment. The second-order solution procedure employed in the open source C++ library OpenFOAM 2.4.0, which is used in the present LES study, is briefly outlined in the following. A detailed description can be found, e.g., in [36][37][38].
In the standard FVM framework of OpenFOAM, the continuum space and time domain are divided into a finite number of discrete regions called control volumes (CV) and time intervals, respectively. Thereby, the CVs completely bound the solution domain and the solution variables, such as velocity and pressure, are colocated at the cell centroids of the CVs [39]. In contrast to a staggered grid arrangement, this allows an arbitrary topology of CVs, e.g., hexahedrons, tetrahedrons, prisms, pyramids or general polyhedrons, which has significant advantages in the discretization of complex solution domains.
Several approximation schemes and solution procedures are available in the OpenFOAM framework to discretize and solve the Navier-Stokes equations. In this study, the standard pimpleFOAM solver of OpenFOAM 2.4.0 is applied, which is based on a merged PISO [40]-SIMPLE [41] algorithm for the coupling of pressure and velocity. Thereby, the governing equations are numerically solved in a segregated manner using a momentum predictor, pressure solver and momentum corrector. This iterative solution procedure is applied with a second-order implicit backward-differencing scheme for the time integration. Regarding spatial terms, a low-dissipative second-order flux-limiting differencing scheme is employed for the convection terms and a conservative scheme is used for the Laplacian and gradient terms. The resulting systems of linear equations are iteratively solved using a geometric agglomerated algebraic multigrid solver. Thereby, convergence of the overall procedure is obtained if all normalized residuals are smaller than 10 −4 . Validation and verification studies of this specific solution procedure for LES of complex fluid flows relevant to IC engines are provided in [18,42,43].

Lattice Boltzmann Method
The lattice Boltzmann equation discretizes the velocity space of the kinetic Boltzmann equation to a discrete set of lattice velocities c i , i = 0, 1, ..., q − 1. Common velocity sets to recover the three-dimensional incompressible Navier-Stokes equations are D3Q15, D3Q19 and D3Q27. The present work uses a discrete velocity D3Q19 set, which is given by The descriptor set is chosen due to the higher computation performance and the lower memory demand in the used LBM implementation. However, higher errors due to a violation of the rotational invariance are taken into account in comparison with a D3Q27 stencil [44]. The filtered lattice Boltzmann equation without external forces is given by where f i is the filtered particle distribution function at discrete lattice position x LB and time step t LB . The filtered collision operator Ω i is implemented by a single-relaxation time model proposed by Bhatnagar, Gross and Krook [45]. It can be written as where τ ef f is the effective relaxation time towards the filtered discrete particle distribution function at equilibrium state f eq i , ρ is the filtered lattice density and u the filtered velocity field. The collision operator satisfies the conservation of mass and momentum. The particle distribution function equilibrium is described by a low Mach number truncated Maxwell-Boltzmann distribution where ω i are the lattice weights obtained by the Gauss-Hermite quadrature [46,47], c s = 1/ √ 3 is the speed of sound of the lattice and δ αβ is the Kronecker operator. The moments of the particle distribution functions f i yield macroscopic flow quantities. The density ρ LB , the momentum ρ LB u LB and momentum flux Π are obtained by the zeroth, first and second moments, which are given by The lattice effective kinematic viscosity of the fluid ν ef f is connected to the effective relaxation time τ ef f as follows Assuming a simplified isothermal equation of state the filtered lattice pressure is related to the filtered density by Finally, the lattice Boltzmann algorithm is divided in 2 steps: the collision step and the streaming step. The local collision step is represented by the right-hand side of Equation (5) and the subsequent streaming step is associated with the left-hand side of Equation (5).

Sub-Grid Scale Modeling
The introduced eddy viscosity ν SGS in Equation (3) is estimated by an SGS viscosity model, which can be generally written as where C M is a model coefficient, ∆ grid is the grid filter and D M a model-related operator. The present work uses a Smagorinsky-Lilly model [48], where the model operator is defined as where S αβ is the filtered strain rate. The literature values for the Smagorinsky-Lilly model constant C M are in the range of C M = 0.065...0.24 [49,50]. For a complex turbulent flow, a Smagorinsky-Lilly constant of C M = 0.1 is a common choice [51]. The Smagorinsky-Lilly model suffers from a too dissipative behavior in the near-wall region [42,52]. One possibility to prevent this aspect is the introduction of a damping function that reduces the SGS viscosity depending on the wall distance. The van Driest damping function [53] can be incorporated in the grid filter ∆ grid by where y is the wall distance, A + = 26 is the van Driest parameter, C ∆ = 0.158 is a model constant and κ = 0.41 is the von Kármán constant [54]. The dimensionless wall distance y + in Equation (15) is defined as where u τ = T w ρ is the friction velocity and T w the wall shear stress.

SGS Model for Finite Volume Method
In the FVM framework of OpenFOAM, the SGS viscosity ν SGS is calculated explicitly for each time step using the resolved velocity field. Then, the turbulent and molecular diffusion contributions are combined into an effective stress tensor by means of the Boussinesq approximation as where ν ef f represents the effective viscosity. For the sake of computational efficiency, the velocity gradient and transposed velocity gradient terms in Equation (17) are treated separately. Thereby, the velocity gradient term is treated implicitly as a diffusion, while the transposed velocity is treated as an explicit source term. The latter is therefore calculated using the velocity at the previous iteration. Further information on the implementation of eddy viscosity turbulence in OpenFOAM can be found, e.g., in [55].

SGS Model for Lattice Boltzmann Method
Eddy viscosity models are often introduced in LBM by adding turbulent viscosity to the molecular viscosity [56], which results in an effective viscosity A consistent approach to implement eddy viscosity models in LBM was derived by Malaspinas and Sagaut [57]. They presented that due to the connection between lattice viscosity and lattice relaxation time (see Equation (11)), the relaxation time is also divided in a molecular and SGS contribution where is the eddy contribution. The filtered strain rate S LB αβ in the SGS operator formulation in Equation (14) can be obtained by a finite difference scheme or locally in the LBM framework by where Π neq αβ is the second moment of the non-equilibrium parts of the particle distribution function, which can be calculated according to Equation (10) . This implicit relation of the effective relaxation time τ ef f and the filtered strain rate S LB αβ can be replaced by an explicit expression for the Bhatnagar-Gross-Krook (BGK) collision operator by a local method proposed by Malaspinas and Sagaut [57]. This explicit expression for determining the effective relaxation time τ ef f is given by

Wall Function Approach
In contrast to a near-wall resolved LES, the NWM-LES requires additional effort to model the effects occurring in the boundary layer. However, NWM-LES allows grid spacing up to y + = 200, which results in a significantly smaller amount of grid points. In the present work, we use the idea of Werner and Wengle [58], which describes an instantaneous connection between the wall shear stress and the velocity. This consideration only applies for averaged quantities and therefore a RANS hypothesis is assumed for the boundary node. A fully developed turbulent boundary layer can be described by the Musker profile [59] which reads u + = 5.424 arctan 2.0y + − 8.15 16.7 + log 10 (y + + 10.6) 9.6 (y + 2 − 8.15y + + 86.0) 2 − 3.5072790194. (22) This empirical formulation is based on a logarithmic law and is able to describe the turbulent boundary layer from the viscous sublayer (y + ≥ 1). The solution of the implicit function (22) requires an iterative scheme.

Wall Function for Finite Volume Method
Analogous to wall-shear stress models often used in the context of RANS, the boundary condition of the SGS viscosity ν SGS is corrected for each time step by means of a wall function in the NWM-LES approach of OpenFOAM. The numerical procedure can be divided into two steps. First, the friction velocity u τ is approximated iteratively according to Musker's wall function. Thereby, the Newton-Raphson method is applied to find the root of the wall function. Then, in the second step, ν SGS at the wall is calculated as where y p denotes the wall distance of the cell centroid and u the stream-wise velocity. It is important to note that a boundary condition of ν SGS based on Musker's wall function is not available in OpenFOAM and was therefore added to the standard framework. A detailed description of the wall function approach in OpenFOAM including comprehensive validation studies in turbulent channel and impinging flows were provided by the authors in [60].

Wall Function for Lattice Boltzmann Method
The implementation of wall functions in the context of LBM is not straightforward due to numerous boundary scheme approaches. The idea of the wall model approach applied in this work was proposed by Malaspinas and Sagaut [32] and adapted to the BGK collision operator by Haussmann et al. [33]. They validated the wall function scheme using a bi-periodic turbulent channel flow. We adapt this scheme to curved boundaries using a curved link-wise instead of a wet-node boundary scheme. Our proposed algorithm is parted in two steps: the curved boundary approach and a velocity correction step according to the wall function. For better comprehension, the used indexing convention for the following two paragraphs is depicted in Figure 1.

Curved Boundary Step
In the present work, we use the curved boundary approach proposed by Bouzidi et al. [61]. This approach is an extension of a half-way bounce back scheme and characterized as precise, stable and computationally efficient for the simulation of turbulent flows [62]. The interpolated bounce-back approach uses a linear interpolation based on the dimensionless distance q B , which is defined as: Without altering the streaming step for boundary cells the unknown particle distribution function after the streaming step f¯i(x LB f , t LB + 1) can be replaced by where indexī indicates the particle distribution function in the opposite direction of index i. For q B = 0.5 the approach is equal to the half-way bounce back boundary condition.

Velocity Correction
Step The velocity correction step is used to correct the velocity in the particle distribution functions at node position x LB f according the wall function. Firstly, the distance to the boundary y LB 1 is defined in the discrete normal direction c n . Accordingly, the distance from the neighbor fluid node at position x LB n to the boundary is given by Due to the fact that the wall profile uses only the stream-wise velocity component, a local stream-wise unit vector e s is obtained by Subsequently, the stream-wise component u LB 2 of u LB n is calculated by The boundary distance y LB 2 and the stream-wise velocity component u LB 2 are inserted in the Musker profile Equation (22) to obtain the averaged wall shear stressT LB w . Therefore, the solution of the implicit equation is approximated by the Newton method. Afterwards, the stream-wise component u LB 1 ofũ LB f is calculated by the Musker profile Equation (22) using the boundary distance y LB 1 and the averaged wall shear stressT LB w . Then, the velocityũ LB f of the first fluid is computed bỹ Finally, the particle distribution function at node position x LB f is corrected as follows where superscript * denotes the quantities calculated after Equation (25). This means only the velocity is altered according the wall function, while the density and the non-equilibrium parts are preserved.

Setup of the IC Engine Test Case
In this work, a flow bench setup of an IC engine was chosen as a benchmark for the numerical comparison. With this setup, the intake flow with the focus on the intake jet over the valves into the cylinder can be examined in a realistic engine geometry and at the same time the overall complexity can be reduced compared with a real engine. The optically-accessible single cylinder engine at TU Darmstadt (Darmstadt Engine, [10]) was converted into a steady-state flow bench by removing and replacing the piston with an outlet channel open to ambient conditions (see Figure 2). As opposed to the previous flow bench studies of Freudenhammer et al. [63] in which the same spray-guided cylinder head geometry was fitted in a continuous water flow configuration for MRV measurements, the flow bench of the present study uses dry air and allows for instantaneous flow measurements. For this configuration, the cylinder liner was extended and the outlet channel geometry was optimized by means of unsteady RANS to suppress recirculation of the flow. For added simplicity to the engine geometry, the spark plug and fuel injector were replaced by flat plugs; but otherwise, the four-valve spray-guided pent-roof cylinder head (AVL) and fused-silica cylinder liner with a bore of 86 mm as well as the intake system remained unchanged. Figure 2 shows a diagram of the intake system and engine geometry of the flow bench experiment. As indicated by the red boxes, the flow bench has three optical access sections. The first section (I) represents the standard engine optical access which was fitted to the new flow bench extension (experimental sections II and III). Experimental section II allows for the characterization of the flow inside the flow bench extension for the verification of the flow structures present. Finally, experimental section III allows optical access and flow validation of the outflow through the bottom of the flow bench via a flat fused-silica plate and movable mirror. Intake valves were positioned at a fixed valve lift of 9.21 mm corresponding to 270 • CA (crank angles before top dead center) and exhaust valves were kept closed, thus mimicking the intake flow during regular engine operation. The flow bench experiment was conducted under controlled boundary conditions for consistent operation. Two mass flow controllers (Bronkhorst) were used to set a defined mass flow of 94.1 kg h , which corresponds to the respective instantaneous mass flow at 270 • CA under normal engine operation with a speed of 800 rpm and intake pressure of 0.95 bar. Since the instantaneous mass flow of engine operation is not available, the velocities in the intake jet were compared and matched such that the phase-averaged velocity (average of 400 cycles at 270 • CA) in motored engine operation matched the average velocities of the flow bench near the intake valve. As indicated in Figure 2, two absolute pressure sensors (P in,1 , P in,2 , PAA-M8cool HB, Keller) measured the static pressure and two PT100 temperature sensors (ϑ in,1 , ϑ in,2 ) measured the temperature of the flow within the intake pipe. Additionally, two more absolute pressure sensors (P out,1 , P out,2 , PMP4070, Kistler) measured the static pressure inside the flow bench. Table 1 summarizes the experimental boundary conditions.

Experimental Setup
High-speed PIV was used to measure the in-cylinder flow velocity field in the valve plane (VP, z = −19 mm) (see Figure 3). The uncertainty of velocity measurements by means of PIV depends on parameters such as the optical setup defined by imaging optics, camera and light sheet as well as tracer properties, the PIV algorithm and the flow itself. Common approaches to estimate the uncertainty as a function of different influencing variables employ artificial PIV images generated by Monte Carlo simulations [64]. Newer methods use the actual experimental data to estimate the uncertainty [65][66][67] and have been validated by a benchmark experiment [68].
The commercial software DaVis estimates the uncertainty based upon a correlation statistics approach [67]. In this study, the time-averaged uncertainty of the instantaneous velocity magnitudes is approximately 3% to 6% (normalized to the global maximum velocity range of 35 m s −1 ). This uncertainty range is valid for the jet region and lower velocity regions below the valves. Near the exhaust side cylinder walls, where the intake jet is curved downwards due to the influence of the walls, the normalized uncertainties increase to a maximum of 10%. This approach considers random errors inherent to the correlation process for particle images. Therefore, the reported uncertainties apply for instantaneous velocity fields and propagate to RMS velocity values, but are reduced to 1% for the time-averaged velocity, since most of the 25,000 pairs of particle images are uncorrelated to each other.
Other sources of error introduce a bias in the velocity calculation. Sharp gradients in the flow, e.g., at the edge of the intake jet, are underestimated due to the spatial averaging of the PIV algorithm. Acknowledging reported uncertainty assessments [64], this normalized error is assumed to be on the order of 3% to 9% for the jet region in instantaneous velocity fields and is slightly lower in the time-averaged velocity field due to the non-stationary jet position. The spatial average of the normalized uncertainty due to flow gradients amounts to 1%.
Additional systematic errors stem from the non-zero light sheet thickness and strong out-of-plane velocity components, which are detected as in-plane components due to the camera's perspective. This error is zero in the center, increases linearly to the edges of the field-of-view and can amount to more than 10% [64]. However, if the averaged out-of-plane velocity component is zero this error source is statistically zero. In the central tumble plane this assumption is justified, but less so in the valve plane, where mean out-of-plane velocity components exist. The uncertainty due to perspective errors was calculated with the time-averaged LES flow field, which provides all three velocity components. This normalized uncertainty contribution amounts to up to 10% locally and to 0.2% in the spatial average. Altogether, the spatially averaged accumulated normalized uncertainty of the time-averaged PIV measurements within this work is estimated to be 1%.

Numerical Setup
The fluid domain is depicted in Figure 4 in a clip representation. The inlet patch is colored in blue and the outlet patch in red. In contrast to the experimental setup, both the inflow and outflow regions are shortened in order to reduce the computational effort. The reduction of the inflow length to 2.62D is justified by the applied inlet boundary condition (see Section 3.3). For the estimation of the outflow length as 1.88D, the tumble flow area and the integral time scale were considered to ensure that the influence of the flow upstream is negligible. y x Figure 4. Clip representation of the simulation geometry with (x,y)-plane coordinate system. The boundary contains inlet (blue), outlet (red) and wall patches (metallic).

Boundary Conditions and Initial Conditions
The initial and boundary conditions play an important role in LES, because they mainly influence the time for statistically independent results. The inlet condition is especially challenging; the often used approaches that assume random fluctuations are not sufficient. The result is an energy signal, which equally distributes the energy in the wave number regime. Therefore, in the present study, we apply a digital filter-based operation proposed by Klein et al. [69]. This approach is able to reproduce prescribed Reynolds stresses. Considering the measured mass flow in the experiment and assuming a plug flow profile, the Dirichlet condition for the time-averaged inflow velocity is given by The superimposed fluctuations use an integral length of L = 0.25D according to the work of Ries et al. [42]. The calculation of the prescribed Reynolds stress tensor u α u β in , taking the hypothesis of homogeneous isotropic turbulence into account, reads where I = 0.06 is the turbulence intensity. The outlet condition is a free outflow condition, where the Dirichlet pressure condition is set to p out = 0 Pa.

Finite Volume Method
In the case setup of OpenFOAM, no-slip conditions are utilized for the velocity and the zero Neumann condition is used for the kinematic pressure at the solid walls. Furthermore, the wall function approach is employed for the SGS viscosity at the walls. At the outlet, a velocity inlet/outlet boundary condition is used to allow back-flow of air from downstream. Thereby, the incoming fluid velocity is obtained by the internal cell value, while the zero Neumann condition is employed in the case of outflow. Finally, as mentioned above, synthetic turbulent inflow conditions are employed at the inflow based on the digital filter method of Klein et al. [69].

Lattice Boltzmann Method
As previously mentioned in Section 2.3.2, the boundary conditions in the LBM are a critical challenge, especially in turbulent flows, where both accuracy and stability are important. The inflow condition is realized by a non-local regularized approach (see boundary scheme BC4 in [70]). The used inflow velocity is obtained by the digital filter approach, which is bilinear interpolated and mapped to each cell position. The outflow condition uses a wet-node equilibrium condition. Every particle distribution in each boundary cell before the regular collision occurs is replaced by where ρ LB out is the prescribed lattice density and u LB (x LB + c n , t LB ) the velocity of the neighbor cell in the normal direction. It is noteworthy that boundary approaches that also reconstruct the non-equilibrium part (e.g., BC3 and BC4 in [70]) show stability issues for this flow configuration.
The flow field is initialized by the equilibrium distribution f eq i (ρ LB , u LB ), where ρ LB = 1 and u LB = 0. Then, the velocity at the inflow is increased at the inlet for t = 0.05 s and is updated until the considered mass flow is reached. This procedure results in non-equilibrium parts of the particle distribution function that are adjusted according to the velocity field.

Statistics
The flow field is assumed to be statistically stationary after t ss = 0.5 s. After this start-up time, sampling is started within the LBM and FVM frameworks. The statistics are used to calculate the time-averaged velocity u and the RMS velocity u RMS , which can be calculated by where u 2 α is the time-averaged square of the velocity. The averaging time for u RMS is calculated according to the engineering correlation proposed by Ries et al. [42] as where RMS is the desired maximum sampling error. Inserting a sampling error RMS = 0.025, the averaging time is calculated as t av = 2.524 s.

Finite Volume Method
An adaptive time stepping technique is applied in the OpenFOAM setup in order to ensure a Courant-Friedrichs-Lewy (CFL) number smaller than one. Thereby, the time-averaged velocities are defined as where ∆t n is the time step at t n and N t the total number of time steps within t av . Analogously, the time-averaged square of the velocity u 2 α is given by

Lattice Boltzmann Method
Due to the use of fixed time steps, ensemble averaging is applied. The time-averaged velocity u α is given as where number N e is the number of independent ensembles. In the same way the time-averaged square of the velocity u 2 α is evaluated as Assuming Taylor's hypothesis of frozen turbulence and a spatial decorrelation distance of two integral length scales L, the number of independent ensembles N e is calculated by This results in 800 independent ensembles.

Grid Configurations
Both OpenFOAM and OpenLB are evaluated with three different grids in this work. There are certain differences between the grid structures, see Figure 5.
OpenLB uses a uniform Cartesian mesh without grid refinement. The fluid cells are indicated by checking if each grid point is inside or outside the geometry. The resulting grid is not volume conservative. In contrast, body-fitted meshes are favored by OpenFOAM. Therefore, prisms and polyhedral mesh elements are applied to reconstruct the geometry shape and preserve the volume. Furthermore, refinement layers are used to resolve more scales, especially near the wall. A detailed comparison between the three grid configurations used: low resolution (LR), medium resolution (MR) and high resolution (HR) for both OpenFOAM and OpenLB can be found in Table 2.
Acoustic scaling ∆t ∝ ∆x is used for the presented OpenLB configuration, which provides a constant compressibility error with respect to the incompressible Navier-Stokes equations but in return, requires less computational increase for smaller grid spacing than diffusive scaling. The application of acoustic scaling leads to a constant lattice Mach number Ma LB = 0.026 for the OpenLB setups. The resulting compressibility error is assumed to be sufficiently small. In terms of the inlet diameter D, Cartesian grid resolutions of N = 53, 77 and 111 are generated, approximately tripling the cell number in every configuration consisting of OpenFOAM meshes. Due to the adaptive time step and grid refinement in OpenFOAM, the size of the displayed grid spacing ∆x and time step ∆t is spaceand time-averaged, respectively.

Results of the IC Engine Test Case
In this section, PIV and LES results of the in-cylinder fluid flow are analyzed. At first, the ability of LBM and FVM to predict characteristic features of engine flows is assessed. Then, predicted time-averaged and RMS velocity profiles at several locations downstream of the valve are compared with each other and with the high-speed PIV measurements. Subsequently, the prediction accuracy of both numerical techniques are evaluated based on error analysis. Finally, the computational cost of OpenLB and OpenFOAM is appraised in terms of meshing and simulation performance.

Characterization of the In-Cylinder Flow
The absence of the plane normal components is due to the two-dimensional PIV measurement data (see Section 3. Characterized by strong flow/wall interaction processes, the turbulent flow inside IC engines features very complex flow and mixing dynamics. Considering Figure 6, some of the complex types of flow relevant to IC engines can also be found in the flow bench configuration; namely, (I) boundary layer flow, (II) impingement/stagnation, (III) wall-jets, (IV) flow separation/reattachment and (V) the so-called tumble flow. By comparing the LES results with the PIV measurements in Figure 6, it appears that LBM as well as FVM are able to properly predict such flow types. Furthermore, it can be clearly seen that predictions of LBM and FVM are quite similar to each other and also generally similar to the PIV measurements. This confirms the validity of LBM and FVM for such a fluid flow application.
The complex physics of engine flows are further analyzed and highlighted in Figure 7, which shows a snapshot of turbulent structures in the vicinity of the valve visualized by means of the Q-criterion [71]. Thereby, iso-surfaces of Q = 7 × 10 −7 are colored by the magnitude of the instantaneous velocity.
As is visible in Figure 7, a highly turbulent flow is generated around the intake valve. This gas stream separates from the valve and initiates large-scale turbulent structures, which cascade into smaller ones until they dissipate further downstream. Such a complex disintegration process is essential in the context of IC engine flows since it influences the mixing and subsequent flow pattern inside the combustion chamber. It is nearly impossible to capture these three dimensional turbulent scales experimentally. However, as seen in Figure 7, it can be well represented by means of LBM and FVM techniques.

Validation of In-Cylinder Fluid Flow
For further comparison, the magnitudes for the two-dimensional time-averaged velocity | U | and RMS velocity |U RMS | are plotted over three lines positioned at y = −7 mm, −12 mm and −17 mm, see Figure 8. The magnitude for the two-dimensional RMS velocity vector is again obtained from the two in-plane components For these three lines, each grid configuration of both solvers and the PIV results are presented in Figure 9.
It can be seen that the highest grid resolutions HR LBM and HR FVM agree well with the trends of the PIV results. Furthermore, the different convergence behaviors in the near-wall region are observable. Due to the used grid refinement, the wall jet can be described more precisely by LR FVM and MR FVM compared with LR LBM and MR LBM . In contrast, OpenLB is able to predict the transition area of the tumble flow to the right-side wall jet more accurately than OpenFOAM, even with the lower resolutions LR LBM and MR LBM .

Prediction Accuracy
The prediction accuracy of the numerical results calculated by OpenLB and OpenFOAM is compared with each other by means of the PIV measurement data. Therefore, we introduce the normalized absolute error nAE as the error criterion. The nAE for variable φ at position x is defined as where φ sim is the simulated data and φ PIV is the PIV measurement data, which is used as the reference value. The normalization is obtained by the interval length of the experimental data. The nAE | U | , concerning the time-averaged velocity | U |, is depicted for the three different grid resolutions LR, MR and HR in Figure 10. The region of interest is in accordance with the experimental data (VP, see Figures 3 and 8).
For the low grid resolution LR, both OpenFOAM and OpenLB show the largest errors in the jet region (see Figure 10a,b. Also, the tumble flow prediction accuracy is diminished in comparison with higher grid resolutions. It can be observed that especially the approximation of the jet region in LR LBM is worse than LR FVM , due to the larger grid spacing in the near-wall region around the valve. In contrast, the medium grid resolution exhibits in both cases that the error in the jet region is reduced (see Figure 10c,d. The high deviation region at the starting point of the jet is related to a shifted separation point of the boundary layer on the valve surface. Overall, the flow field of MR FVM approximates the PIV measurement data better than MR LBM , which is due to the higher accuracy in the jet and tumble flow range. In Figure 10e,f, the error maps for the highest grid resolution are presented. The error maps for HR LBM and HR FVM are in good agreement with each other and to the PIV measurement. Both the jet and the tumble flow region are well predicted. Again, it is noticeable that the highest deviation in the jet region is related to a shifted separation point. For the RMS velocity |U RMS |, high errors are more spread over the jet region compared with the mean velocity | U | errors, reaching into the tumble region as the fluctuation due to turbulent kinetic energy is amplified by the velocity (see Figure 11). For both LR and MR, OpenFOAM is able to predict the turbulent velocity fluctuations in the jet region better than OpenLB as a result of the graded mesh (see Figure 11a-d). For the same reason, OpenLB shows much smaller errors in the tumble region, while OpenFOAM suffers from greater errors at MR FVM and LR FVM (see Figure 11a,c). Similar to the | U | error map, |U RMS | is in good agreement with the PIV measurements for HR LBM and HR FVM given in Figure 11e,f.
A global error criterion can be defined by the arithmetic mean of the normalized error nAE φ . This normalized mean absolute error nMAE φ is given by where N PIV is the number of experimental data points in the plane. Figure 12 depicts the normalized mean absolute error of the mean velocity nMAE | U | and the RMS velocity nMAE |U RMS | . It can be seen that the nMAE decreases with an increasing number of cells. Both errors for the time-averaged velocity and the RMS velocity are lower than nMAE | U | < 0.08 and nMAE |U RMS | < 0.15, respectively, which is satisfactory for such coarse meshes. The errors for the highest resolution are very similar, which is also indicated by the corresponding error maps (see Figure 10e,f). The convergence order for the OpenLB and OpenFOAM configurations is lower than the first order. This diminished convergence order can be justified by the experimental reference data, where the estimated PIV measurement uncertainty is nMAE | U | = 0.01. Another source of error for the RMS velocity, besides the uncertainty of the PIV data, is the sampling error RMS = 0.025. This may also affect the convergence order of nMAE |U RMS | .    Besides the accuracy, the computational costs are a key factor to analyze the suitability of a 575 numerical method. Therefore, the runtime of the mesh generation and the solver was evaluated on a 576 single node which consists of two dodeca-core Intel Xeon processors E5-2680 v3 that support AVX2.

577
The node provides 64 GB main memory. The use of a single node for estimating runtime performance 578 was chosen because the parallel scalability is not in the scope of this study. The estimation of parallel 579 scalability requires extensive testing due to the strong influence by the cells per core ratio, the load 580 balancing method and the connection between the nodes of the cluster system. Comprehensive studies 581 that deal with the parallel scalability of OpenFOAM and OpenLB can be found in [43,72,73].

Computational Cost
Besides the accuracy, the computational costs are a key factor to analyze the suitability of a numerical method. Therefore, the runtime of the mesh generation and the solver was evaluated on a single node which consists of two dodeca-core Intel Xeon processors E5-2680 v3 that support AVX2. The node provides 64 GB main memory. The use of a single node for estimating runtime performance was chosen because the parallel scalability is not in the scope of this study. The estimation of parallel scalability requires extensive testing due to the strong influence by the cells per core ratio, the load balancing method and the connection between the nodes of the cluster system. Comprehensive studies that deal with the parallel scalability of OpenFOAM and OpenLB can be found in [43,72,73].

Meshing Performance
Due to the straightforward approach in the case of OpenLB, the grid generation is fully automatic and does not require any additional preparation steps. On the contrary, the meshing process for FVM is very time-consuming if the grid is manually obtained. Internal OpenFOAM tools can drastically reduce the effort, but require an experienced user. This study uses the built-in OpenFOAM meshing tool snappyHexMesh. Still, writing a script for grid generation for a complex geometry can take several days. Nevertheless, we only take the runtime for the mesh generation into account. The meshing time is estimated by t core,mesh = N core t node,mesh , where t node,mesh is the runtime on the node and N core the amount of used cores. The comparison of the meshing time for the three different resolutions is represented in Figure 13. t core,mesh in core seconds It can be observed that the meshing time in OpenLB is more than doubled each time the higher grid resolution is used. In contrast, the results of OpenFOAM show a certain overhead with the high resolution grid, i.e. The grid generation for OpenFOAM takes considerably more time with each increase in resolution. This can be justified by the complex meshing procedure, which consists of a castellation, snapping and adding layers step including several optimization cycles. Overall, the meshing time in OpenLB is on average 424 times shorter than in OpenFOAM. In the case of a static mesh, this performance benefit is not decisive. However, the use of a moving mesh, e.g., if piston motions are taken into account, requires several mesh updates in one cycle. Therefore, a fast grid creation process, such as that with OpenLB, can be essential in the context of engine relevant flows. The suitability of LBM for describing moving boundaries has been demonstrated in extensive comparisons for different moving boundary methods, e.g., in [74,75].

Simulation Performance
For the comparison of the simulation performance difference for each grid, a runtime metric is introduced. At t ss = 0.5 s, the beginning of the statistics computation, the runtime tracker is started. The tracked runtime t node,solver is divided by the according past simulation time t sim,solver and scaled with the number of cores N core . This core time t core,solver is written as t core,solver = N core t node,solver t sim,solver .
This means that the runtime metric calculates the core hours for one second of simulation time including the additional time for processing the turbulence statistics. The direct comparison of each grid resolution is justified by the comparable accuracy, see Section 4.3. The performance results for the three different grid resolutions are presented in Figure 14.
The bar chart reveals that the simulations obtained by OpenLB are significantly faster than the OpenFOAM simulations. The resulting performance factor can be determined by dividing t core,solver for the corresponding grid resolutions. If each grid configuration is taken into account, the mean performance factor for OpenLB to OpenFOAM can be estimated as 32.03. It is noteworthy that the performance factor varies greatly between the different grid resolutions, 21.76 for LR and 46.49 for HR. Additional quantities are introduced to further investigate the variance of the performance factor. The mean cells per core (MCPc) are given by Another performance metric are the cell updates per core and second (CUPcs), which are defined as CUPcs = N grid t core,solver ∆t .  Both quantities MCPc and CUPcs are listed for the three grid configurations in Table 3. The solvers show a contrary behavior, while OpenLB benefits from an increased MCPc and almost doubles the CUPcs from LR to HR, OpenFOAM has a decrease of about 45 percent. Consequently, OpenFOAM seems to be less affected by the used MCPc. The reasons for the different behavior can be manifold and range from the influence of the load balancing method to cache effects and communication effects. A detailed discussion of these influencing factors can be found in [72,76,77].

Conclusions and Outlook
The purpose of this paper is to evaluate NWM-LES for complex turbulent flows using LBM by comparison with FVM simulations and PIV experiments. Thereby, a van Driest damped Smagorinsky model coupled to the Musker equation was used to model the turbulent boundary layer. Both LBM and FVM NWM-LES approaches were outlined in detail. Three different grid resolutions were used to simulate an engine relevant in-cylinder flow with the open source frameworks OpenLB and OpenFOAM. Characteristic flow features of the in-cylinder flow were highlighted and compared side-by-side. In addition to the quantitative comparison, the errors of the tested grid configuration were calculated against a highly precise PIV measurement and analyzed in detail. It was shown that the matching grid configurations of both numerical methods had similar errors. Surprisingly, OpenLB requires only slightly more cells than OpenFOAM to produce the same accuracy, although no grid refinement was used. This can be justified by the chosen region of interest, which is remote from the wall, and also the incorporated near-wall treatment based on the wall function approach. The time-averaged and the RMS velocity at the highest grid resolution for OpenLB and OpenFOAM were in good agreement with the PIV measurement (nMAE | U | < 0.038 and nMAE |U RMS | < 0.098, respectively). The performance estimation revealed that the meshing process in OpenLB was 424 times faster and the simulation process approximately 32 times faster for the investigated setup. These significant performance differences in meshing and solver runtime indicate that LBM is a valuable and viable alternative to FVM in simulating IC engine relevant flows with NWM-LES. In particular, the fast grid generation process in OpenLB further reduces computational costs, if moving meshes are applied. The faster calculation speed for NWM-LES using LBM is advantageous to address industrial applications and to enable "overnight" calculations that previously took weeks. Therefore, faster design cycles and operating condition tests are feasible. The performance advantage can also be used to provide more precise results in the same time and finally paves the way for near-wall-resolved LES in the future [78].
Nevertheless, LBM still needs additional research to gain the maturity of NWM-LES with FVM. The applied equilibrium wall function approach based on Musker's law of the wall is strictly speaking only valid in fully developed turbulent boundary layers. In contrast, turbulent boundary layers of complex turbulent flows deal with pressure gradients, separation and recirculation, variable physical properties, compressibility effects and many more. Therefore, a further step is to implement a generalized wall function such as [79][80][81] in OpenLB that is able to model these flow features. In addition, the simple SGS model employed in this study can be replaced by more advanced turbulent models, e.g., models based on dynamic procedures [82], the scale similarity hypothesis [83] or wall-adapted SGS models [52], which have shown an increased accuracy for IC engine flows [2,42]. If reactive turbulent flows are considered, further investigations have to be done. In this respect, a modeling approach based on detailed chemistry with a large number of species as well as tabulated chemistry is a challenging task, especially for LBM due to the high memory requirements [84]. However, given the benefits of the mesh generation and computation time reductions shown in this work, LBM is a promising alternative to FVM in IC engine and many other complex turbulent flow applications in the future.   integral length scale Ma LB lattice Mach numbeṙ m in massflow into the flow bencḣ m out massflow out of the flow bench N resolution N core number of cores N e number of independent ensembles N grid number of grid cells N PIV number of PIV data points N t total number of time steps within t av p filtered pressure p LB filtered lattice pressure p out pressure at the numerical outflow P in, 1 absolute pressure at pressure sensor inlet 1 P in,2 absolute pressure at pressure sensor inlet 2 P out, 1 absolute pressure at pressure sensor outlet 1 P out,2 absolute pressure at pressure sensor outlet 2 q B dimensionless distance Q Q-criterion Re Reynolds number S filtered strain rate tensor wall distance y LB 1 lattice distance from the the node at position x LB f distance to the boundary y LB 2 lattice distance from the the node at position x LB n to the boundary y + dimensionless wall distance y p wall distance of the cell centroid Greek