Development and Validation of a Machine Learned Turbulence Model †

: A stand ‐ alone machine learned turbulence model is developed and applied for the solution of steady and unsteady boundary layer equations, and issues and constraints associated with the model are investigated. The results demonstrate that an accurately trained machine learned model can provide grid convergent, smooth solutions, work in extrapolation mode, and converge to a correct solution from ill ‐ posed flow conditions. The accuracy of the machine learned response surface depends on the choice of flow variables, and training approach to minimize the overlap in the datasets. For the former, grouping flow variables into a problem relevant parameter for input features is desirable. For the latter, incorporation of physics ‐ based constraints during training is helpful. Data clustering is also identified to be a useful tool as it avoids skewness of the model towards a dominant flow feature.


Introduction
Engineering applications encounter complex flow regimes involving turbulent and transition (from laminar to turbulent) flows, which encompass a wide range of length and time scales that increase with the Reynolds number (Re). Direct Numerical Simulations (DNS) require grid resolutions small enough to resolve the entire range of turbulent scales and are beyond the current computational capability. Availability of high-fidelity DNS and experimental datasets is fueling the emergence of machine learning tools to improve accuracy, convergence and speed-up of turbulent flow predictions [1,2]. Machine learning tools depend on neural networks to identify the correlation between input and output features, and have been used in different ways for turbulent flow predictions, such as direct field estimation, estimation of turbulence modeling uncertainty, or advance turbulence modeling.
In the direct field estimation approach, the entire flow field is predicted using a ML approach, i.e., the flow field is the desired output feature. For example, Milano and Koumoutsakos [3] estimated mean flow profile in the turbulent buffer-layer region using Burger's equation and channel flow DNS results as training datasets. Hocevar et al. [4] predicted the scalar concentration spectra in an airfoil wake using experimental datasets as training datasets. Jin et al. [5] estimated unsteady velocity distribution in the near wake (within four diameters) of a 2D circular cylinder using laminar solutions as training datasets and surface pressure distribution as input feature. Obiols-Sales et al. [6] developed Computational Fluid Dynamics (CFD) Network CFDNet-a physical simulation and deep learning coupled framework to speed-up the convergence of CFD simulations. For this purpose, a convolution neural network was trained to predict the primary variables of the flow. The neural network was used as intermediate step in the flow prediction, i.e., CFD simulations are solved as a warmup preprocessing step, then the neural network is used to infer the steady state solution, and following that CFD simulations are performed to correct the solution to satisfy the desired convergence constraints. The method was applied for range of canonical flows such as channel flow, flow over ellipse, airfoil, cylinder, where the results were encouraging. For the turbulence model uncertainty assessment, the desired output feature is the error in a CFD solution due to turbulence modeling. For example, Edeling et al. [7] used velocity data from several boundary layer flow experiments with variable pressure gradients to evaluate the k-ε model coefficient ranges. Then, the variation in model coefficients were used to estimate the uncertainty in Reynolds averaged Navier Stokes (RANS) solution. 
Ling and Tempelton [8] compared several flow features predicted by k-ε RANS with DNS/LES results for canonical flows (e.g., duct flow, flow over a wavy wall, square cylinder etc.) and estimated errors in RANS predictions due to ν T < 0, ν T isotropy, and stress non-linearity.
Simulations on coarser grids require a model for the turbulent stresses (τ). The stress terms account for the effect of unresolved (or subgrid) turbulent flow on the mean (or resolved) flow for RANS (or LES) computations. The primary question for turbulence modeling is how turbulent stresses are correlated with flow parameters or variables? A review of the literature as summarized in Appendix A Tables A1 and A2 shows that machine learning has been used to either augment physics-based models to improve their predictive capability [9][10][11][12][13][14][15][16][17][18][19][20][21][22] or develop standalone turbulence models [23][24][25][26][27][28][29][30]. The details of the input and output features and test and validation cases used in the studies are listed in the Tables, and the salient points of the studies are discussed below.
Parish and Duraisamy [9] analyzed DNS of plane channel flow to estimate the response surface of TKE production multiplier (β) as a function of four turbulent flow variables. The model was used to argument k-ω model. In a follow-on study [10] experimental data for wind turbine airfoils was used to adjust the Spalart-Allmaras (SA) RANS model ν T production as a function of five turbulent flow parameters. The β function was reconstructed using an artificial neural network to minimize the difference between the experimental data and SA model results. The models were used for aposteriori tests, where it showed significant improvement over the standard RANS models and was found to be robust even for unseen geometries and flow conditions. He et al. [11] developed a similar approach, wherein adjoint equations were derived for SA model solution error due to ν T production. The SA model predictions were compared with the experimental data at selected locations during runtime. Then, a solution of β was obtained using the adjoint equation to minimize discrepancy between the predictions and experimental data. The approach was applied for several canonical test cases and encouraging results were reported. Yang and Xiao [12] extended the work of Duraisamy et al. [9,10] to train a correction term for the transition model time-scale. Their model was trained using DNS datasets for flow over an airfoil at different angle of attack using both random forest and artificial neural network and using six different flow variables as input features. The trained correction term was implemented in a 4-Equation transition model, and applied for aposteriori tests involving unseen flow conditions (both interpolation and extrapolation mode) and geometry. The study reported good agreement for the transition location, validating the efficacy of such models.
Ling et al. [13] used six different DNS/LES canonical flow results to obtain coefficients of a non-linear stress formulation consisting of ten (10) terms involving non-linear combinations of the of rate-of-strain (S) and rotation tensors (Ω). The model coefficients were trained using a deep neural network as a function of five invariants of S and Ω tensors. The trained model coefficient map was applied for both apriori and aposteriori tests, including unseen geometry. The study reported that the ML model performs better than both the linear and non-linear physics-based models and performed reasonably well for unseen geometries and flow conditions. Xiao and coworkers [14,15] trained a response surface of the errors in k-ε RANS turbulent stress predictions as a function of ten flow features using random forest regression. The model was validated for apriori tests for two sets of cases, one where the test flow and training flow were similar, and second for unseen geometry and flow conditions. It was reported that the model performed better for the former case. Wu et al. [15] investigated metrics to quantify the similarity between the training flows and the test flow that can be used to provide guidelines for selecting suitable training flows to improve the prediction of such models. Wang et al. [16] extended the above approach for compressible flows. A model was trained using DNS datasets with 47 flow features obtained using combination of S, Ω, ∇k, and ∇T, inspired by Ling et al. [13]. The model was validated for apriori tests for flat-plate boundary layer simulations. The study identified that the machine learned model predictions depend significantly on the closeness to the training dataset.
Wu et al. [17] developed a model to address the ill-conditioned solutions predicted by machine learned model during aposteriori tests (i.e., small errors in the modeled Reynolds stresses results in large errors in velocity predictions). For this purpose, a model was trained to account for the errors in k-ε RANS model stress predictions (both linear and nonlinear terms) using seven turbulent flow features. The model was applied as an aposteriori test for flows involving slightly different geometry than the training case. Yin et al. [18] investigated the role of the feature selection and grid topology on the unsmoothness of the solution and large prediction errors reported in the above [17] study. They trained a model using 47 input features, inspired by Ling et al. [13]. The model was applied for apriori and aposteriori tests, wherein for the latter machine learned turbulent stresses were frozen. The study concluded that unsmoothness of the solution was primarily due to grid topology issues which results in discontinuities in the input features.
Yang et al. [19] used neural networks to train a wall-model for LES. The model was trained using three sets of input features: (1) wall parallel velocity (u || ) and d; (2) u || , d and grid aspect ratio; and (3) u || , d, grid aspect ratio and ∇p. The model was coupled with a Lagrangian dynamic Smagorinsky model and applied for channel flow over a wide range of flow conditions Re τ = 10 3 to 10 10 . The study reported that inclusion of additional flow features such as grid aspect ratio and pressure gradient does not show significant improvement in the predictions.
Weatheritt and Sandberg [20] used symbolic regression to derive an analytic formulation for turbulent stress anisotropy. The model was trained using hybrid RANS/LES solutions and a regression map for the model coefficients were obtained as a function of rate-of-strain and rotation tensor magnitudes. The anisotropy formulation was used along with the k-ω model, and the model showed very encouraging results for both apriori and aposteriori tests including unseen geometries. Jian et al. [21] used a deep neural network to train a regression map of the model coefficients for an algebraic RANS model. The model was trained using a single parameter |S|k/ε as the input feature. The model was validated for both apriori and aposteriori tests, including extrapolation mode, i.e., Re τ larger than those in the training dataset. The study reported that the ML model performed better than the non-linear physics-based models due to its capability to better capture the large stressstrain misalignment and strong stress anisotropy in the near-wall region. Xie et al. [22] used neural networks to train model coefficients of a mixed subgrid stress/thermal flux model. The model trained compressible isotropic turbulence flow using six flow features, and it was reported that the machine learned model performs better than the physics-based models.
Comparatively limited efforts have been made to develop standalone machine learned turbulence models. Schmelzer et al. [23] used a symbolic regression approach to infer the model coefficients of an algebraic RANS model. The model was trained using DNS datasets using invariants of S and Ω tensors. The model was applied for unseen flow conditions, where the machine learned model performed better than the k-ω RANS. Fang et al. [24] Energies 2021, 14, 1465 4 of 34 developed a model for turbulent shear stress τ uv using DNS of channel flow. The model used a deep neural network trained using mean flow gradients and Re τ as key input parameters, and non-slip boundary condition and spatial non-locality were enforced during training. The model was applied for aposteriori tests involving unseen flow conditions, and it was reported that the model worked better than the model proposed by Ling et al. [16] due to the use of Re τ and boundary condition enforcement. Zhu et al. [25] developed a regression map of RANS turbulent eddy viscosity using a radial basis function network using SA RANS solutions for flow over NACA 0012 and RAE2822 airfoils at different angles of attack using eight input features based on flow, gradient and wall distance. The flow domain was separated into near-wall, wake, and far-field regions, and a different model was trained in each region. The study reported that using wall-distance as weights helped during training, but training using log-transformation of dataset did not help. The model was applied for aposteriori tests for the training geometry but for unseen angles of attack (flow condition), and good predictions were reported for the lift/drag coefficients, and skin friction distributions.
King et al. [26] developed a subgrid stress model using DNS of isotropic and sheared turbulence using velocity, pressure, S and grid parameters as input features. The model was applied for apriori tests for the isotropic and sheared turbulence test cases on coarse grid, where it performed better than the dynamic LES model. Gamahara and Hattori [27] used a feed-forward neural network to train a regression map of LES turbulent stresses. For this purpose, DNS results for plane channel flow for Re τ = 180 to 800 were filtered on up to four times (in each direction) coarser grids, and the filtered flow field was used as input features. The study used four different sets of input features involving S, Ω, walldistance, and velocity gradients, and models were validated for apriori tests for unseen flow conditions. The study reported that the best model was predicted when u and d were used as input features. Further, it was reported that the machine learned models are more accurate than the similarity models, because of their ability to learn the non-linear functional relation between the resolved flow field and the subgrid stresses better than those prescribed in the physics-based model. Zhou et al. [28] used a similar approach to develop LES subgrid-scale model. The model was trained for isotropic decaying turbulence using gradients of filtered velocity and filter width (∆). Yuan et al. [29] also developed an LES model using deconvolution neural network with isotropic decaying turbulence datasets for training. The model was trained using filtered velocity as the input parameter and validated for both apriori and aposteriori tests. Maulik et al. [30] used an artificial neural network to train a regression map of subgrid term for 2D turbulence using primary variables and its gradients as input features. The model was validated for both apriori and aposteriori tests. 
The ML model provided good predictions; however, it was reported that some measure of aposteriori error must be considered during optimal model selection for greater accuracy.
Some recent studies [31][32][33] have investigated the development of machine learning models for turbulent flow predictions by incorporating physics-based constrained during the training. These models provide an important direction to the applicability of the machine learning approach, but thus far they have used only for direct field estimation.
Overall, a review of the literature shows that (1) Machine learning has been primarily used for RANS model augmentation, where either turbulence production is adjusted, or nonlinear stress components are added to linear eddy viscosity term. Limited effort has been made to develop a stand-alone model, except for some recent effort focusing on modeling of subgrid stresses for LES. (2) Studies have used wide range input flow features for machine learned model training.
There is a consensus that combining flow features into physically relevant flow feature is desirable, as this helps incorporate physics in machine-learning. Use of a large numbers of input features have been found to be helpful to some extent as it allows output features to be uniquely identified in the different flow regimes. However, they introduce additional sources of inaccuracy. For example, unsmooth solutions have been reported due to inaccurate calculation in the input features involving higher-order formulations of the derivative terms. (3) Machine learned models have been applied for both apriori and aposteriori tests for both unseen geometry and flow conditions, including Re extrapolation mode. The model in general perform well for unseen flow conditions, but issues have been reported for unseen geometries. In general, the machine learned models are most accurate when the test flow has similar complexity to the training flow. (4) Studies have reported issues during training due to overlap in the output features in different flow regimes. It has been tacked by using more input features, as discussed above, and by segregating the flow domain into regions with similar flow characteristics, such as near-wall, wake and far-field regions, and train separate models in each region.
In summary, there are open issues such as, "What is the best way to use machine learning for turbulence model development?" Should machine learning be used to augment an existing model or to develop a stand-alone model. The model augmentation approach builds on a baseline physics-based model; thus, it has some inherent robustness, especially when the model is used in extrapolation mode or for unseen geometry. However, this approach undercuts the advantage of machine learning. If it is expected that neural networks can accurately learn the errors in a RANS model and provide a universal model for the errors, then there is no reason to believe that same approach cannot provide a model for Reynolds stresses. Second, none of the studies in the literature have validated the machine learned model in a similar fashion as the physics-based model, i.e., how does it perform when started from ill-posed flow conditions, i.e., when simulation is started from an arbitrary initial condition, or does it provide grid convergence, or can independently query output in contiguous regions result in kinks in the solution. Apart from the above questions, there are additional issues regarding the best practices for optimizing machine learning approach itself [31].
The objective of this paper is to investigate the predictive capability of a stand-alone machine learned turbulence model and shed light on some of the above issues and constraints. To achieve these objectives, a DNS/LES database is curated for incompressible boundary layer solution available in the literature (i.e., channel flow and flat-plate boundary layer solution at zero pressure gradient), and a DNS database has been generated for oscillating channel flow. The datasets are used to train a ML response surface for the turbulent stress. The model is validated for apriori and aposteriori tests, and predictions are compared with DNS and RANS model results. The preliminary results for the channel flow case have been presented in ASME conference paper [32], and those of oscillating channel flow case has been presented in SC20 conference paper [33]. The results from the above publications have been further refined and are presented herein.
The novel aspects of this study include: development of a stand-alone RANS model, which has not received same level of attention as the RANS model augmentation; the effect of physics-based constraints during the training of a model is investigated, which has not been done before; the machine learned model is applied for an unsteady flow, thus far none of the studies have applied and tested machine learned model for unsteady flows; and the ability of the machine learned model to adapt to ill-posed initial flow conditions, as expected in a typical CFD simulation, and their ability to provide grid independent solution as expected for a RANS model are investigated.
The following section provides an overview of the DNS/LES datasets curated or procured in this research. Section 2 provides an overview of the machine learning approach. Sections 3 and 4 focuses on development and validation of the machine learned model for boundary layer flows and oscillating channel flow case, respectively. Finally, some key conclusions and future work are discussed in Section 5.

Machine Learning Approach
The basic framework of the deep learning neural network is described in Figure 1, inspired from LeCun et al. [34]. The network consists of various layers, including the first input layer comprising of input features, the last output layer comprising of the required output features, and multiple hidden layers comprising of features units obtained using linear combination of feature units from the previous layer. Each layer, excluding the input layer, has input features (z) and output features (y). Let's say for the pth layer with m feature units, the input and output features are z 1 to z m and y 1 to y m , respectively. The input feature units for the pth layer is obtained from the linear combination of the n output feature units in the layer above or (p − 1)th layer, as below: where superscript (p) is used to represent the pth layer, i varies from 1 to m and j varies from 1 to n, and w ij (p) are the unknown weights that need to be estimated. Thus, there are m neurons that connect the (p − 1)th and pth layer. Note that for the first hidden layer z vector are the input features, and for the last output layer y vector are the output features. Also note that some of the features in the hidden layers can be dropped depending on the threshold of weights permitted. Usually in deep learning applications, the input features in the input layer are scaled to vary in-between 0 and 1, and similarly the weights are positive and normalized such that ∑ w ij (p) = 1. For each hidden and output layers, the output features are obtained from the input features on that layer as where f (..) is a pre-defined non-linear analytic activation function. The most common functions are Rectilinear Linear Units (ReLU) function provides a linear dependency on the input features, exponential Linear Units (ELU) are same as ReLU for non-negative inputs, but use exponential modulation for negative inputs, hyperbolic tangent and logistic functions modulate both the positive and negative inputs. 
As pointed out by Lee et al. [35], ReLU is the most commonly used function for training deep neural networks because they do not suffer from vanishing gradients and they are fast to train. However, they ignore the negative values and hence lose information, which is referred to as "dying ReLU". ELU on the other hand can capture information for the negative inputs, so they used ELU for training their physics-informed neural network. put layer. For the hidden layers a backpropagation approach is used, where the derivative information is computed based on information from the layer ahead starting from the output layer, Equation (5), as below: and then Equations (6) and (7) are used to obtain   [34]. The above example shows a neural network with one input, two hidden and output layers. Each of the circles represent a feature in a layer, let's say i features in the input layer, k features in the hidden layer and m features in the output layer. The variable "x" are the input parameters, and "z" and "y" are the inputs and outputs, respectively, of both hidden and output layers. The features on the extreme right are the true values (T) used for training the network. The forward arrows represent the calculation of the features on the next layer based on linear combination of features on previous layer using the (positive) Figure 1. Block diagram summarizing machine learning approach inspired from LeCun et al. [34]. The above example shows a neural network with one input, two hidden and output layers. Each of the circles represent a feature in a layer, let's say i features in the input layer, k features in the hidden layer and m features in the output layer. The variable "x" are the input parameters, and "z" and "y" are the inputs and outputs, respectively, of both hidden and output layers. The features on the extreme right are the true values (T) used for training the network. 
The forward arrows represent the calculation of the features on the next layer based on linear combination of features on previous layer using the (positive) weight matrix (w ij ). Broken arrows show backpropagation of the network prediction error to readjust the weights. The network prediction error (E) is computed using a predefined cost function comparing the network output with the true values. The derivatives of the errors are computed to adjust the weight matrix, where α is learning rate.
The unknown weight matrix in each layer is estimated using backpropagation approach, as described below. The network is first initialized with constant zero weight for each layer and the error in the network prediction are obtained by comparing them with training dataset or true values (T) using a user specified analytic cost function, such as L 2 norm For the above defined analytic cost function, the derivatives of the errors in the output layer with respect to the output feature is and the derivative of the errors with respect to the input features is where, f is a known analytic function from Equation (3). Since the input features of a layer are linearly related to the output features of the previous layer as shown in Equation (1), the derivative of the errors with respect to the weights are computed as Lastly, the weights in each layer are adjusted as w ij where α is learning rate, and subscript it represents the iteration level. Commonly available ML softwares provide optimizers which dictate the learning rate. For example, adaptive moment estimation (ADAM) optimizer [36] available in Keras application programming interface (API) requires a user specified initial learning rate, but the rate is adaptively adjusted during training. Note that in deep neural networks "iterations" refer to running a subset of the data (batch) forward and backwards through the network, whereas "epoch" refers to running all the training data forward and backward through the network. Thus, epochs are not same as iterations, unless the entire dataset is the batch size. Further note that the terms on RHS is known from Equations (5) and (6) for the output layer. For the hidden layers a backpropagation approach is used, where the derivative information is computed based on information from the layer ahead starting from the output layer, Equation (5), as below: and then Equations (6) and (7) are used to obtain ∂E ∂w ij (p−1) . 
In the physics-informed machine learning (PIML) approach [37] the cost function is modified to include the residual in the governing equations (R); thus, Equation (4) is modified to Note that Equation (5) remains changed.

Test Cases and Database for Model Training
Two different text cases have been considered in this study, steady and unsteady channel flow. For the former, the mean flow solution describes the inner layer of a flat-plate boundary layer (with zero pressure gradient) at a fixed location on the plate. This case is a fundament test case for turbulence model development and several DNS studies are available for a range of flow conditions. The flow pattern for this case reduces to a onedimensional (1D) problem. For the second test case, the inner boundary layer undergoes unsteadiness due to prescribed pressure gradient pulse. The mean flow pattern for this case reduces to an unsteady 1D problem, which adds another level of complexity to the first test case.

Governing Equation
This test case focuses on the simulation of the inner layer of the turbulent boundary layer under zero-pressure gradient at a fixed location on a flat-plate. The governing equations for such flow condition can be derived from the incompressible Navier-Stokes equations under the assumption that the flow is 2D and steady, and streamwise gradients are negligible compared to those along the wall normal direction (refer to Appendix B for derivation) as below: where y is the coordinate direction normal to the wall and u is the ensembled averaged streamwise velocity. Note that the correlation between streamwise and wall-normal turbulent velocity fluctuations (τ uv = u v , which are the shear stresses) is the only unknown quantity that needs to be modeled. Also note that although the above equations are valid for flows with zero pressure gradient, the above equation includes a body force term (F), which can be misconstrued as pressure-gradient term. This term is added because the simulations are performed for flow between two flat-plates; thus, a body force is required to balance the momentum loss due to wall friction and achieve a steady state. The body force term F(τ w ) is a function of wall shear stress τ w expected at the simulated flat-plate location. The wall shear stress can be expressed in terms of friction velocity u τ , and is a fixed input parameter: and where H is the half channel height. Note that the simulated flow conditions in a channel flow case can be changed simply by changing the wall friction value (or the applied body force term). Further note that a time-derivative term in the governing equation is a pseudo time derivative, and used as a residual term and provides a measure of the solution convergence. A steady state solution is achieved as the time derivative term approaches zero. Commonly used linear URANS models express the turbulent shear stress as [38] τ uv = ν T ∂u ∂y (14) where, ν T is an unknown turbulent eddy viscosity. 
The one-equation URANS model requires solution of an additional transport equation for turbulent kinetic energy (k) and turbulent eddy viscosity is obtained as The turbulent length scale is prescribed as = κd 1 − e −y + /26 (17) where von-Karman constant κ = 0.41 and d is distance from the wall. Refer to Warsi [38] for details for the modeling. Similar to the streamwise velocity equation (Equation (11)), the time derivative can be perceived as a residual term for steady simulations, and a steady state is achieved as term approaches zero. The time derivate term provides the time-accurate solution for URANS simulations.

DNS/LES Database
A DNS/LES database is curated to train a ML response surface for τ uv . As summarized in Table A3, the database includes, 21 channel flow DNS cases with Reynolds numbers ranging from Re τ = 109 to 5200 [39][40][41][42][43][44][45]; 18 flat-plate boundary layer with zero-pressure gradient DNS cases with Reynolds numbers ranging from Re θ = 670 to 6500 [46][47][48]; and 14 flat-plate boundary layer with zero-pressure gradient LES cases with Re θ = 670 to 11,000 [49,50]. The database contained around 20,000 data points of which 3.4% were in the sub-layer, 5.7% in buffer layer and 90.9% in the log-layer or the outer layer. Note that all the datasets are for flat-plate boundary layer with zero-pressure gradient, where the channel flow cases represent only the inner-layer.

ML Model Training and Refinement Using Apriori Tests
ML model is developed using the cost function in Equation (4), which is referred to as data-driven machine learned (DDML) model, and cost function including the residual in the governing equations, i.e., Equation (10), which is referred to as physics-informed machine learned (PIML) model. For the PIML model training, the residual of the governing equation is obtained using the integral form of the governing equation (Equation (11)) at steady state as below: where subscript 'i' denotes the solution at the epoch level.
The DDML model is trained used a multilayer perceptron (MLP) neural network consisting of two dense hidden layers, each with 512 neurons with ReLU activations, and a linear activation on the output layer. ADAM optimizer is used during training with a mean absolute error loss function, generating 276,000 trainable parameters. To mitigate over-fitting during training, each layer has a 20% dropout [51]. The PIML model is trained using an eight-layer deep neural network with each hidden layer having 20 neurons and a hyperbolic tangent activation function. The model is optimized using an L-BFGS quasi-Newton full-batch gradient-based optimization method and iterated for 2400 epochs. For the model training, the dataset is not separated into training and test sets; but rather 70% of the dataset is chosen randomly. Thus, it is possible that there could be an overlap in the datasets used for training and apriori tests. A parametric study was also performed by choosing 80% and 100% of datasets, which did not show much effect on the model predictions in apriori tests. Further note that the number of layers and neurons used for training DDML and PIML are different. For DDML training, a parametric study was performed using more layers and different number of neurons. The tests revealed that increasing the number of layers increases the training time, but does not necessarily improve the training accuracy. Increasing the number of neurons in lieu of layers was found to be computationally efficient and helped in reducing the training error. The numbers of layers and neurons used in the study are found to provide optimal model in terms of computational cost and accuracy. The optimal width and depth of neural network architecture also depends on the amount of data available for training the networks [52]. A shallow depth network is partially due to comparable small training dataset. 
For the PIML training a similar test was not performed, and the training set-up was the same as that used by [37]. Overall, although the numbers of layers and neurons differed between the PIML and DDML model training, the neural network architecture for both provided the needed accuracy.
The machine learned turbulence model seeks to obtain a response surface of the shear stress τ_uv, which is the output feature. The input features are the flow parameters. Referring to Figure 1, if a training batch uses N data points and each data point has M flow parameters, then the total number of input features is x_i : i = N × M, and the total number of output features is y_m : m = N. The batch size was 128 for this case. Note that even though the M flow parameters at each data point are related, during training they are treated as independent of each other. The PIML model may leverage the correlation between the input features at a data point, as the feature set is expected to satisfy the governing equations.
Considering all the possible flow variables, the response surface of the shear stress τ_uv is expected to have the following functional form: Instead of providing the flow features independently, they are grouped or nondimensionalized into physically relevant parameters. The velocity derivative is a key term, which captures the rate-of-strain in the flow. It is commonly accepted that turbulent stresses can be modeled by extending Stokes' law of friction (for viscous stresses), except that the linear assumption is debatable [38]. Thus, the velocity derivative normalized by the centerline velocity U_c and channel height H is used as an input parameter. Figure 2 provides an overview of the correlation of τ_uv with respect to the key input feature ∂u/∂y.
The flow viscosity ν is one of the fundamental bulk fluid properties and dictates one of the key non-dimensional parameters for boundary layer flow, the Reynolds number (Re). Re is usually defined based on global properties, such as channel height H and centerline velocity U_c, Re = U_c H/ν. This bulk flow parameter has little significance for the local stresses and thus is not a suitable input parameter for machine learning. Rather, a Reynolds number based on wall distance, Re_d = U_c d/ν, or based on local velocity and wall distance, Re_l = ud/ν, seems a better choice, as these also incorporate local parameters. Note that for channel flow datasets, the global parameters ν and u_τ are related via the near-wall velocity gradient and do not provide any additional information about the flow. Thus, when training a model using just the channel flow datasets, u_τ is not required. However, when the datasets include both flat-plate boundary layer and channel flow datasets, u_τ provides additional identifying information (as discussed below).
A preliminary apriori analysis is performed using just the channel flow DNS datasets to evaluate which Re formulation (Re_d or Re_l) is better, and also to investigate how the inclusion of higher-order terms of the rate-of-strain affects the model. For this purpose, the DDML model was trained using the following four sets of input parameters: As shown in Figure 3a, the stresses obtained using the input feature set (Re_d, ∂u/∂y) are around 8% underpredictive, whereas those obtained using (Re_l, ∂u/∂y) compare very well and are only 2% overpredictive. An accurate prediction by the latter is not surprising, as the local Reynolds number Re_l = u+y+ is a physical quantity that separates the flow regimes, i.e., Re_l < 25 is the sub-layer; 25 ≤ Re_l < 418 is the buffer-layer; and Re_l ≥ 418 is the log-layer. The advantage of Re_l over Re_d is also evident in Figure 3b,c, where the former shows a better collapse of the dataset in the buffer layer. Further, including the higher-order rate-of-strain term as an input feature deteriorates the model performance, and the stresses are overpredicted by 9%. However, using it as a weighting parameter improves the results, and the L2 norm error for this case is estimated to be 2.9 × 10⁻⁴, slightly better than the L2 norm error of 5.2 × 10⁻⁴ obtained using the feature set (Re_l, ∂u/∂y).
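The Re_l thresholds quoted above map naturally to a small classifier; a sketch using the stated regime boundaries:

```python
def flow_regime(u_plus, y_plus):
    """Classify a near-wall point by the local Reynolds number
    Re_l = u+ * y+, using the thresholds stated in the text:
    Re_l < 25 sub-layer, 25 <= Re_l < 418 buffer-layer, Re_l >= 418 log-layer."""
    re_l = u_plus * y_plus
    if re_l < 25:
        return "sub-layer"
    elif re_l < 418:
        return "buffer-layer"
    return "log-layer"
```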
When the entire channel and flat-plate database is considered, an additional parameter is needed to demarcate the inner and outer boundary layers. The demarcation can be achieved using a combination of y+ = d u_τ/ν and ∂u/∂y, where large y+ and ∂u/∂y → 0 corresponds to the start of the outer layer. Thus, a model was trained using the following set of input parameters: In addition, different weighting functions, ML-1 through ML-4 as listed below, were used during training of the DDML response surface, as the data show significant overlap (as pointed out in Figure 2) at the intersection of the inner and outer layers:
ML-1: No weighting
ML-2: Weighted using the curvature of the profiles
ML-3: τ_uv levels were expanded to separate out the curves
ML-4: Not weighted for τ_uv ≤ 10⁻³ but weighted for τ_uv > 10⁻³
Weighting using the higher-order rate-of-strain term was also considered. But, similar to the earlier test, this weighting did not show significant improvement over ML-1. This is expected, as a neural network should be able to learn the dependency on higher-order forms of the input parameters by itself. Thus, such weighting is not considered further.
An apriori analysis for a range of channel flow conditions in Figure 4 shows that the DDML response surface obtained without any weighting (ML-1) is consistently underpredictive. The surface obtained using the profile curvature as a weighting function (ML-2) results in waviness in the profiles and is not recommended. DDML using τ_uv weighting improves the results, and ML-4 provides somewhat better results than ML-3. The model shows some limitations in the prediction of the peak stress for Re_τ = 1000, which needs to be investigated further. The PIML model shows the best predictions among all the models for the entire range of Re_τ, and compares very well with DNS, including for Re_τ = 1000. In addition, PIML removes the trial-and-error approach to training the model.
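The ML-4 weighting idea (unit weight at or below τ_uv = 10⁻³, larger weight above) can be sketched as sample weights; the linear ramp below is an assumption, as the text does not give the exact weighting function:

```python
import numpy as np

def ml4_sample_weights(tau_uv, threshold=1e-3):
    """ML-4-style sample weights: unit weight at or below the threshold,
    larger weight above it. The linear ramp tau/threshold is an assumed
    form; the paper does not specify the exact weighting function."""
    tau = np.abs(np.asarray(tau_uv, dtype=float))
    w = np.ones_like(tau)
    above = tau > threshold
    w[above] = tau[above] / threshold  # assumed ramp above the threshold
    return w
```

Such weights would typically be passed to the training loop as per-sample loss weights, emphasizing the high-stress region where the curves overlap.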

Oscillating Plane Channel Flow
Oscillating channel flow is a canonical test case used in the literature to understand turbulent flow physics and validate the predictive capability of LES models. Scotti and Piomelli [53] performed DNS and LES of oscillating channel flow to study the effect of pressure gradients on the modulation of the viscous sublayer, the turbulent stresses and the topology of the coherent structures. In this test case, an unsteady pressure pulse (body force term) is applied along the streamwise direction, which introduces periodic unsteadiness in the flow.

Governing Equation
The governing equation for the mean flow for this case can be obtained similarly to the inner boundary layer equations discussed in Appendix B, as below: where the forcing term is the prescribed pressure pulse. The above equation is derived under the assumptions that the mean flow is 2D in nature and that the wall-normal gradients are more dominant than the streamwise gradients. Both of these assumptions are valid for this case. Because the DNS is performed using a periodic domain, the mean flow variation along the streamwise direction is assumed negligible, and the velocity changes occur in time. Thus, this test case represents how the boundary layer flow, at a fixed location on a flat plate, varies in time as the free-stream pressure gradient is varied. Further, note that the dP₀/dx term corresponds to the F(τ_w) forcing required to obtain a steady state solution for flow between two flat plates, which is the baseline flow. The term (dP₀/dx)α₀cos(ωt) accounts for the variation of the free-stream pressure gradients. Similar to the steady inner-boundary layer case, the closure of the above equation requires modeling of the shear stress τ_uv. The physics-based one-equation URANS model described in Section 3.1.1 is valid for this case as well.
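Assuming the pulse has the form dP₀/dx (1 + α₀ cos(ωt)) implied by the two forcing terms above, it can be encoded as below (parameter values are illustrative, not the paper's):

```python
import numpy as np

def forcing(t, dp0dx=1.0, alpha0=0.5, omega=2 * np.pi):
    """Prescribed streamwise pressure pulse: the steady baseline dP0/dx
    plus the sinusoidal modulation (dP0/dx)*alpha0*cos(omega*t).
    dp0dx, alpha0, and omega here are illustrative stand-in values."""
    return dp0dx * (1.0 + alpha0 * np.cos(omega * t))
```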

DNS Database
The DNS for this case is performed using an in-house parallel pseudo-spectral solver, which discretizes the incompressible Navier-Stokes equations using Fast Fourier Transforms (FFT) along the homogeneous streamwise and spanwise directions and Chebyshev polynomials in the wall-normal direction. The solver is parallelized using a hybrid OpenMP/MPI approach, scales up to 16K processors on up to 1 billion grid points, and has been extensively validated for LES and DNS studies [54,55]. DNS were performed for three different (high, medium, and low) streamwise pressure pulses, as used in [53]. The simulations were performed using a domain size of 3π × 2 × π along the streamwise, wall-normal and spanwise directions, respectively, on a grid consisting of 192 × 129 × 192 points. The domain size is consistent with those used in [52], but the grids are finer. The details of the simulation set-up are provided in Table A4. The simulations were performed using periodic boundary conditions along the streamwise and spanwise directions, and no-slip walls at y = ±1.
The DNS results were validated in a previous study [56] for the prediction of the alternating (AC) and mean (DC) components of the mean flow against the results presented in [53]. The AC components were obtained by decomposing the normalized planar-averaged velocity profile at every one-eighth cycle using a Fast Fourier Transform at each wall-normal location. The DNS results were found to be in good agreement with the available benchmark results. The low frequency pulse resulted in re-laminarization and transition behavior, which is a challenging case for RANS predictions. The high frequency case was primarily driven by the pressure gradient, and turbulence levels were small. The medium frequency case provides a compromise between the two extremes and is used in this study.
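The AC/DC decomposition via FFT at each wall-normal location can be sketched as below, using a synthetic profile with one known harmonic in place of the DNS data:

```python
import numpy as np

# Synthetic stand-in: velocity sampled at 8 phases per cycle at each of
# ny wall-normal points; a real dataset would come from the DNS.
ny, nphase = 129, 8
phase = 2 * np.pi * np.arange(nphase) / nphase
y = np.linspace(-1.0, 1.0, ny)
# DC part (1 - y^2) plus one AC harmonic of amplitude 0.3
u = 1.0 - y[:, None] ** 2 + 0.3 * np.cos(phase)[None, :]

spec = np.fft.rfft(u, axis=1) / nphase
dc = spec[:, 0].real           # mean (DC) component at each y
ac1 = 2 * np.abs(spec[:, 1])   # amplitude of the first harmonic (AC)
```

With only 8 phase samples per cycle, only the first few harmonics are resolvable; the DNS decomposition works the same way at each wall-normal point.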
The variation of the wall shear stress and the streamwise and wall-normal velocities for the medium frequency case is shown in Figure 5. The positive pressure gradient generates adverse flow conditions and decelerates the flow (decreasing the wall shear stress magnitude), and the negative pressure gradient generates favorable flow conditions which accelerate the flow (increasing the wall shear stress). The peak wall shear (and velocity) is observed around the 3/4th point of the cycle (t = 0.75). The wall-normal velocity shows that the ejection events are subdued during the first part of the oscillation cycle (descending leg), i.e., at t = 0.25 the ejection events are limited to within the quarter channel and diminished.

ML Model Training Using Apriori Tests
For training of the ML turbulence model, the 3D DNS dataset was processed to obtain an unsteady 1D dataset. Due to the use of a periodic boundary condition along the streamwise direction, the simulation domain is expected to move in time [57], as demonstrated in Figure 6a. Further, the results at any time step represent multiple (one per grid point in the streamwise-spanwise plane, or 192 × 192) realizations of the turbulent flow field expected at that oscillation phase. The solutions were averaged over these realizations to obtain the mean 1D flow along the wall-normal (y) direction. The 1D solutions at every 100th time step, or 1/400th of a pressure oscillation cycle, were used to generate a 2D y-time map of the mean flow, as shown in Figure 6b. Solutions were collected over three pressure oscillation cycles, resulting in 129 × 1201 (around 155,000) data points. For this case, k is also added as an unknown turbulence quantity, and the machine learned model seeks to obtain a regression map with two outputs (τ_uv, k). Based on the experience of the boundary layer case, the input flow parameters for this case are identified to be the time-varying and steady mean flow quantities as below: where u_τ(t) is the local friction velocity and u_τ0 is the baseline wall friction corresponding to dP₀/dx. Only the DDML model is trained for this case, as temporal derivatives could not be computed during training. The model was trained using a three-layer-deep neural network with 512 neurons in each hidden layer with ReLU activation functions. The final layer was a linear fully connected layer.
The model was trained for 200 epochs using the ADAM optimizer with a batch size of 128 and a learning rate of 0.01. During training, the L2 norm error dropped by an order of magnitude in the first 25 epochs, and the error drop stalled thereafter.
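The plane-averaging and subsampling used to build the y-time map can be sketched as follows; the array sizes are reduced stand-ins for the 192 × 129 × 192 DNS grid:

```python
import numpy as np

# Synthetic stand-in for the DNS field: (nx, ny, nz) snapshots in time.
nx, ny, nz = 8, 129, 8      # the actual DNS grid was 192 x 129 x 192
rng = np.random.default_rng(1)

def plane_average(u3d):
    """Average one 3D snapshot over the homogeneous x-z planes, giving a
    1D mean profile along the wall-normal (y) direction."""
    return u3d.mean(axis=(0, 2))

# Keep every 100th step to assemble the 2D y-time map described above.
snapshots = [rng.random((nx, ny, nz)) for _ in range(500)]
y_time_map = np.stack([plane_average(s) for s in snapshots[::100]], axis=1)
```

Each column of `y_time_map` is one averaged wall-normal profile; stacking the kept steps over three oscillation cycles yields the 129 × 1201 map described in the text.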
The oscillating channel flow involves a wide range of turbulence regimes, unlike the boundary layer case which only has sub-, buffer- and log-layer regimes. As shown in Figure 7, in this case the turbulence is most prominent toward the end of the pressure-oscillation cycle, and varies significantly along the wall-normal direction. Thus, a ML model trained using the entire dataset may be skewed toward the more prominent features. For example, for the boundary layer case the models perform much better in the log-layer than in the buffer-layer, as 91% of the datasets were in the log-layer region. The Birch clustering algorithm from sklearn was used to cluster the datapoints into unique flow regimes. The clusters were determined automatically using a non-dimensional threshold of 0.3, which resulted in 412 clusters with (min, max, mean, std) numbers of points of (1, 10,700, 376, 937), as shown in Figure 7. Note that the clustering preserves the periodicity of the data.
Several ML models were trained by randomly sampling various percentages of points within each cluster, from 0.25% to 100% of the total dataset. As shown in Figure 8b-d, training using just one datapoint per cluster (0.25% of the total) results in errors of 35%. The error level reduced to 28% when four points were used per cluster (1% of the total). The error levels were around 12% when 10% of the dataset (either 10 points or 10% of each cluster) was used. Similar error levels were obtained when the model was trained using all the datapoints. In addition, training using 10% of the data points required around eight times smaller computational cost compared to using all of the datapoints. This indicates that an intelligently sampled dataset can generate reliable machine learned models at a lower computational cost.
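The clustering-plus-sampling procedure can be sketched with sklearn's Birch implementation; the synthetic features, resulting cluster count, and 10% fraction below are illustrative stand-ins for the actual ~155,000-point dataset:

```python
import numpy as np
from sklearn.cluster import Birch

rng = np.random.default_rng(2)
X = rng.random((2000, 3))   # stand-in feature vectors

# Threshold 0.3 as in the text; n_clusters=None keeps the raw subclusters
# instead of merging them in a final global clustering step.
labels = Birch(threshold=0.3, n_clusters=None).fit_predict(X)

def sample_per_cluster(labels, frac=0.10, min_pts=1):
    """Randomly keep `frac` of the points in each cluster (at least
    min_pts), so no single flow regime dominates the training set."""
    keep = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        n = max(min_pts, int(frac * idx.size))
        keep.append(rng.choice(idx, size=n, replace=False))
    return np.concatenate(keep)

train_idx = sample_per_cluster(labels, frac=0.10)
```

Because every cluster contributes at least one point, rare regimes (e.g., the relaminarizing phase) survive even aggressive downsampling.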

Aposteriori Tests of the ML Model
For the aposteriori tests, the machine learned turbulent shear stress (and TKE) regression map obtained in the pre-processing step above was coupled with the parallel pseudo-spectral solver used for the DNS of oscillating channel flow (discussed in Section 3.2.2). Figure 9 provides a schematic diagram demonstrating the coupling of the machine learned turbulence model with the CFD solver. As shown, the stress (or TKE) regression map is queried every time step using the flow predictions at that iteration level, and the stresses are used to advance the solution to the next time step. Both test cases considered in this study are 1D in nature, i.e., the mean streamwise velocity varies only along the wall-normal direction; thus, the simulations were performed using only two points in the streamwise and spanwise directions.
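The query-then-advance coupling in Figure 9 can be sketched as below; this is a schematic 1D explicit solver with an assumed sign convention and a dummy closure standing in for the trained network, not the authors' pseudo-spectral implementation:

```python
import numpy as np

def advance(u, tau_uv, dt, dy, forcing, nu):
    """One explicit Euler step of a 1D mean-flow equation, using the
    ML-queried shear stress profile tau_uv (sign convention assumed)."""
    un = u.copy()
    un[1:-1] += dt * (forcing
                      - (tau_uv[2:] - tau_uv[:-2]) / (2 * dy)
                      + nu * (u[2:] - 2 * u[1:-1] + u[:-2]) / dy ** 2)
    un[0] = un[-1] = 0.0   # no-slip walls at y = +/-1
    return un

def run(query_model, u0, dt, dy, forcing, nu, nsteps):
    """Coupled loop: each step queries the regression map with the
    current flow state, then advances the solver with the stresses."""
    u = u0.copy()
    for _ in range(nsteps):
        tau_uv = query_model(u)   # ML turbulence model queried per step
        u = advance(u, tau_uv, dt, dy, forcing, nu)
    return u

# Usage with a dummy closure in place of the trained network:
ny = 65
y = np.linspace(-1.0, 1.0, ny)
u = run(lambda u: np.zeros_like(u), np.zeros(ny),
        dt=1e-4, dy=y[1] - y[0], forcing=1.0, nu=1e-3, nsteps=200)
```

The only ML-specific piece is `query_model`; in the actual study this is the trained regression map evaluated on the local flow features.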

Steady Plane Channel Flow
The ML trained turbulent shear stress response surface (generated in the previous section) was used in the solution of the boundary layer equation (Equation (11)). The equations were solved on a domain y = [−1,1] with no-slip boundary conditions (u = 0) at y = ±1. The simulations were performed using 49 (coarse), 65 (medium), and 97 (fine) grid points. Two sets of simulations were performed for this case. For the first set, the simulations were started from a converged channel flow RANS solution obtained using the one-equation model; the second set of simulations was started from an ill-conditioned velocity field. The DDML and PIML model simulations were performed using the input feature set Re_l, y+, ∂u/∂y (as shown in Equation (21)). Further, the DDML simulations considered the ML-3 and ML-4 weightings, which provided the best predictions in the apriori tests. Figure 10a shows the solution convergence on all three grids obtained using DDML when the solution is started from the RANS solution. The convergence time history shows a kink in the residual early in the simulation, but eventually the solution shows steady convergence. The convergence is fastest for the fine grid and slowest for the coarse grid. On the coarse grid the residual converges to around 2 × 10⁻⁶, whereas for both the medium and fine grid simulations the residual keeps dropping, even below 1.2 × 10⁻⁶ after 30 thousand (30K) pseudo-timestep iterations. The PIML results in Figure 10b show a similar convergence.

Figure 11 compares the results obtained using DDML and PIML (after 30K pseudo timesteps) with the DNS and RANS results. Among the DDML models, the model obtained using the ML-4 weighting performed better than that obtained using ML-3. This is in contrast to the apriori tests, where ML-3 performed better than ML-4. Note that the one-equation model underpredicts the velocity at the center compared to the DNS, and both the DDML and PIML models resolve this issue. The DDML model predictions improve with grid refinement, with the largest improvement obtained between the coarse and medium grids. However, the results show oscillations near the peak, which are clearly evident in the turbulent shear stress predictions and are more prominent for the fine grid. The oscillations suggest that the "query output" jumps between the database curves. For the boundary layer equations, one of the physical constraints is that the shear stress must satisfy C1 continuity, i.e., both the stress and its derivative are continuous in space (along the y direction). Node/point-based queries do not satisfy this constraint, which is probably the cause of the oscillations.
To resolve the stress oscillation issue in the DDML model predictions, solution smoothness was enforced by implementing a region-based query, i.e., the τ_uv output at node j is taken as the average of the outputs obtained for the input parameter sets (Re_l, y+, ∂u/∂y) at that node and its two neighboring nodes j − 1 and j + 1.

As shown in Figure 11, the above averaging approach eliminates the oscillations, and the results improve with grid refinement, consistent with grid convergence.
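A minimal sketch of the region-based query, assuming equal weights for the node and its two neighbors (the paper's exact weights could not be recovered from the source):

```python
import numpy as np

def smoothed_query(model, features):
    """Region-based query: the stress at interior node j is the average of
    the model outputs for the feature sets at nodes j-1, j, j+1 (equal
    weights assumed). End nodes keep their point-based value."""
    tau = model(features)            # point-based query, shape (n,)
    out = tau.copy()
    out[1:-1] = (tau[:-2] + tau[1:-1] + tau[2:]) / 3.0
    return out

# Dummy model standing in for the trained response surface:
feats = np.linspace(0.0, 1.0, 9).reshape(-1, 1)
tau = smoothed_query(lambda f: f[:, 0] ** 2, feats)
```

Because neighboring queries share two of three nodes, the smoothed output cannot jump between database curves at adjacent points, which is the C1-motivated fix described above.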
The PIML model predictions show only marginal improvement with grid refinement, and the results do not show oscillations like those of the DDML model. Further, when averaging was applied, it significantly deteriorated the performance of the model. In general, the PIML model performs better than the DDML model, especially in the lower log-layer region, i.e., y+ ≈ 30 to 40.
In the next test set, the simulations were started from an ill-conditioned initial velocity profile. This profile is much steeper than that of the turbulent boundary layer, as shown in Figure 12 (It = 0 plot), and the associated set of input features is not expected to be present in the DNS/LES database; thus, the model operates in extrapolation mode. As shown in Figure 12a, the DDML model predictions converge to the correct sub-layer and log-layer profiles, but show significant differences in the buffer and lower log-layer (20 ≤ y+ < 80). The good prediction of the sub- and log-layer is very encouraging, and suggests that the extrapolation issue can be addressed by incorporating physics constraints during the query process, such that the query recognizes that the inputs are out of the bounds of the database and provides a better educated guess of the output. As shown in Figure 12b, the PIML model converges slowly (as shown in the convergence plot in Figure 10b) to the DNS profile. This suggests that a well-trained machine learned model can converge to the correct solution.
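One simple way to make the query recognize out-of-bounds inputs, in the spirit of the physics-constrained query suggested above, is to clip the input features to the bounds of the training database before querying (a sketch; the `model` interface and the bounds are hypothetical):

```python
import numpy as np

def guarded_query(model, x, lo, hi):
    """Query the regression map with an out-of-distribution guard:
    features outside the training database bounds [lo, hi] are clipped
    to the nearest in-bounds value, giving an educated guess rather
    than a raw extrapolation. Returns the output and an in-bounds flag."""
    x = np.asarray(x, dtype=float)
    in_bounds = bool(np.all((x >= lo) & (x <= hi)))
    return model(np.clip(x, lo, hi)), in_bounds
```

The flag can be used to log or down-weight extrapolated queries.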

Oscillating Plane Channel Flow
Three sets of simulations were performed for this case using 65 grid points in the wall-normal direction. The simulations were performed using the DDML model trained with the input feature set shown in Equation (24), using 10% of the datasets (i.e., either 10 points or 10% of each cluster). For set #1, the DDML model was used in a priori mode, i.e., simulations were performed using the one-equation URANS model, and the local flow predictions were used to query the regression map. For set #2, the DDML model was used in a posteriori mode, where the simulations were started from a fully converged channel flow velocity profile corresponding to Re_τ = 350 obtained using RANS. For set #3, both the URANS and DDML simulations were started from an ill-posed initial flow condition, i.e., $u_{t=0} = U_c(1 - d^8)$.
The DDML model predictions from sets #1 and #2 are compared with the DNS and URANS predictions in Figure 13. The URANS model performs quite well for the mean flow, but predicts a significantly diffused shear stress whose peak values are underpredicted by 9%. The largest error is obtained for the turbulent kinetic energy, for which the peak values are underpredicted by 60%. The DDML model predictions (in a priori mode) show significantly better shear stress predictions than the URANS model. The predictions have errors on the order of 3-5% for the mean velocity and shear stress, and the peak TKE is overpredicted by 15%. Since the DDML model uses the velocities and derivatives predicted by the URANS model, the improved prediction by the former can be attributed to its ability to learn the non-linear correlation between the turbulent stresses and the rate-of-strain. The DDML model also works very well in a posteriori mode, where the mean velocity, shear stress and k all compare within 8% of the DNS. Note that the simulations used a (wall-normal) grid size two times smaller and a time step size 10 times larger than those of the DNS; thus, the predictions are considered reasonably accurate.
As shown in Figure 14, for the simulations started from the ill-posed initial flow condition, the URANS solutions show large differences from the DNS during the early part of the simulation, but the solution slowly recovers, and the solutions for the second and third cycles are very similar to those of the earlier case. This is because the flow is primarily driven by the pressure gradient, and the flow adapts to the pressure variations. The DDML model adjusts to the ill-posed initial flow condition much faster than the URANS model, its results compare very well with the DNS, and the prediction errors are similar to those of set #2.
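The ill-posed initial profile of set #3 can be constructed as follows (a sketch; it assumes d is the wall-normal coordinate normalized by the channel half-height, d ∈ [−1, 1], and U_c is a nominal centerline velocity, neither of which is fixed by the text):

```python
import numpy as np

# Ill-posed initial condition u(t=0) = U_c * (1 - d^8) on the
# 65-point wall-normal grid used in the simulations.
N = 65
U_c = 1.0                      # hypothetical centerline velocity scale
d = np.linspace(-1.0, 1.0, N)  # assumed normalized wall-normal coordinate
u0 = U_c * (1.0 - d**8)        # much steeper near the walls than a turbulent profile
```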

Conclusions and Future Work
This study investigates the ability of neural networks to train a stand-alone turbulence model and the effects of input parameter selection and training approach on the accuracy of the machine learned regression map. To achieve these objectives, a DNS/LES database was curated and/or developed for steady and unsteady boundary layer flows, for which the mean flow simplifies to a 1D steady and unsteady problem, respectively, and closure of the governing equations requires modeling of the turbulent shear stress. The database was used to train data-driven and physics-informed machine learned turbulence models. For the latter, the residual of the governing equation was incorporated in the cost function during model training. The models were validated in a priori and a posteriori tests, including ill-posed flow conditions.
Overall, the results demonstrate that machine learning can help develop a stand-alone turbulence model. Moreover, an accurately trained model can provide grid convergent, smooth solutions, work well in extrapolation mode, and converge to a correct solution from ill-posed flow conditions. The accuracy of the machine learned response surface depends on:
1. The choice of input parameters. Feature engineering was used to find the optimal input features for the neural network training. It was identified that grouping flow variables into a problem-relevant parameter improves the accuracy of the model. For example, a model trained using a Re based on the local flow velocity and wall distance is more accurate than a model trained using a Re based on the global flow. Furthermore, higher-order functions of an input variable, such as the square of the rate-of-strain along with the rate-of-strain itself, do not help improve the accuracy of the map. However, they may be used as a weighting function to reduce the overlap in the datasets; and
2. How the database is weighted to minimize the overlap between the datasets. This requires a trial-and-error method to come up with an appropriate weighting function.
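As an illustration of grouping flow variables into problem-relevant parameters, raw variables can be combined into nondimensional inputs such as a local Reynolds number based on the local velocity and wall distance (the exact grouping, and the `s_plus` feature below, are hypothetical):

```python
import numpy as np

def input_features(u, y, nu, dudy):
    """Build nondimensional input features from raw flow variables:
    a Reynolds number from the local velocity and wall distance, and a
    nondimensional rate-of-strain (illustrative grouping only)."""
    re_local = u * y / nu
    s_plus = dudy * y / np.maximum(u, 1e-12)  # guard against u = 0 at the wall
    return np.stack([re_local, s_plus], axis=-1)
```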
A better way to improve the accuracy of the regression surface is to include physical constraints in the loss function during training, which is referred to as the PIML approach. However, it is not straightforward to incorporate physical constraints during training due to issues in the calculation of the derivatives, such as temporal derivatives, for unsteady problems. Data clustering is also identified to be a useful tool to improve the accuracy of the machine learned model and reduce computational cost, as it avoids skewness of the model towards a dominant flow feature.
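The PIML idea of adding the governing equation residual to the cost function can be sketched as a composite loss (the weighting value and the interface are hypothetical; in practice, evaluating the residual is where the derivative difficulties noted above arise):

```python
import numpy as np

def piml_loss(tau_pred, tau_true, residual, weight=0.1):
    """Composite PIML training loss: mean squared data mismatch plus the
    mean squared residual of the governing equation evaluated with the
    predicted stress. `weight` balances the two terms."""
    data_term = np.mean((tau_pred - tau_true) ** 2)
    physics_term = np.mean(residual ** 2)
    return data_term + weight * physics_term
```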
Herein, machine learning was applied to cases that are very similar to the training datasets, which limits the applicability of the model and does not sufficiently challenge the robustness of the machine learning approach. Ongoing work is focusing on the generation of a larger database encompassing steady and unsteady boundary layer flows, including separated flow regimes. A model trained using such a database will help in the development of a more generic turbulence model for boundary layer flows. It is expected that data clustering will be very helpful for training such a model due to the presence of a wide range of turbulence characteristics. On a final note, the machine learned models were found to be extremely computationally expensive compared to the physics-based model (the former was around two orders of magnitude more expensive). This was because of the added cost associated with the ML query at every iteration for every grid point. This research primarily focused on the accuracy of the model, and efforts were not made to improve the efficiency of the ML model query. However, practical application of ML models will require investigation of approaches to improve the computational efficiency of the run-time ML query.

[Appendix A table: summary of prior machine learned turbulence modeling studies, including He et al. [11] (adjoint equations for solution error; β distribution to minimize error), Ling et al. [13] (coefficients of non-linear stress terms g_n(λ_1, λ_2, λ_3, λ_4, λ_5)), and Wang et al. [14] (stress prediction error; channel flow, Re = 1000 to 10^10; including grid aspect ratio and pressure gradient did not improve results); validation cases include C_L and C_P for wind turbine airfoils S805, S809 and S814 at Re = 10^6 to 3 × 10^6 with Spalart-Allmaras RANS.]

Appendix B. Simplification of Navier-Stokes Equation
The mass conservation equation for the incompressible Navier-Stokes equations for three-dimensional flows in Cartesian coordinates is:

$$\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} + \frac{\partial w}{\partial z} = 0 \quad \text{(A1)}$$

where u, v and w are the mean velocities along the streamwise (x), wall-normal (y) and spanwise (z) directions, respectively. For 2D flows, the velocity and the gradients along the spanwise direction are assumed to be negligible, i.e., $w = 0$ and $\partial()/\partial z = 0$. In addition, the flow gradients along the streamwise direction are assumed to be negligible compared to those along the wall-normal direction. The mass conservation equation then simplifies to:

$$\frac{\partial v}{\partial y} = 0 \quad \text{(A2)}$$

which, combined with the no-penetration condition at the wall, gives $v = 0$ across the layer (A3). For the above 2D flows, the only momentum equation of interest is that along the streamwise direction:

$$\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} + v\frac{\partial u}{\partial y} + w\frac{\partial u}{\partial z} = -\frac{1}{\rho}\frac{\partial p}{\partial x} + \nu\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2}\right) + \frac{\partial \tau_{uu}}{\partial x} + \frac{\partial \tau_{uv}}{\partial y} + \frac{\partial \tau_{uw}}{\partial z} \quad \text{(A4)}$$

Using the assumptions for 2D flows, only the time derivative term survives on the left-hand side. Among the second derivative terms, $\partial^2 u/\partial y^2$ dominates $\partial^2 u/\partial x^2$ based on the gradient assumption; thus, the latter is neglected. $\tau_{uu}$, $\tau_{uv}$ and $\tau_{uw}$ are the unknown turbulent stresses due to the correlations between streamwise turbulent fluctuations, streamwise and wall-normal fluctuations, and streamwise and spanwise fluctuations, respectively. The term involving $\tau_{uw}$ can be dropped because of the 2D assumption. The term involving $\tau_{uu}$ can also be dropped, assuming that $\tau_{uu}$ and $\tau_{uv}$ have similar magnitudes and that gradients along the streamwise direction are negligible compared to those along the wall-normal direction. Lastly, p is the pressure in the flow, and only the streamwise pressure gradient appears in the equation. For flat-plate boundary layer flows, the streamwise pressure gradient inside the boundary layer is driven by the pressure gradient outside the boundary layer. Thus, for inner boundary layer flows, the pressure gradient can be specified as a forcing term F(t) that accounts for the free-stream pressure gradients.
Thus, the unsteady simplified boundary layer equation is:
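A minimal explicit (FTCS) update for the surviving terms of the derivation above, du/dt = F(t) + ν ∂²u/∂y² + ∂τ_uv/∂y, can be sketched as follows (a sketch only; the actual solver, grid stretching and boundary treatment are not specified here):

```python
import numpy as np

def step_boundary_layer(u, tau_uv, F, nu, dy, dt):
    """Advance the simplified unsteady boundary layer equation one
    explicit time step: du/dt = F + nu*d2u/dy2 + d(tau_uv)/dy.
    End values (wall/free-stream) are held fixed as Dirichlet data."""
    un = u.copy()
    d2u = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dy**2   # wall-normal diffusion
    dtau = (tau_uv[2:] - tau_uv[:-2]) / (2.0 * dy)   # turbulent stress gradient
    un[1:-1] = u[1:-1] + dt * (F + nu * d2u + dtau)
    return un
```

Note that an explicit step like this is stable only for time steps of roughly dt ≲ dy²/(2ν), which is a standard constraint on the diffusion term.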