Tracking Turbulent Coherent Structures by Means of Neural Networks

The behaviour of individual flow structures has become a relevant subject of study in turbulent flows as the computational power needed to make their study feasible has become available. In particular, high instantaneous Reynolds Stress events have been found to dominate the behaviour of the logarithmic layer. In this work, we present a viability study in which two machine learning solutions are proposed to reduce the computational cost of tracking such structures in large domains. The first one is a Multi-Layer Perceptron (MLP); the second one uses Long Short-Term Memory (LSTM). Both methods take the structures' geometrical features as inputs and predict those same features in future time steps. Some of the tested Multi-Layer Perceptron architectures outperformed the tested LSTM architectures, producing lower errors in the predictions and achieving higher accuracy in relating the structures across consecutive time steps.


Introduction
In this paper, the terms "turbulent structures", "Qs", and "sweeps and ejections" are used interchangeably.
Turbulence is probably the open problem in physics with the most applications in daily life. Wall-bounded turbulence is present in many aspects of our lives, and it is responsible for up to 5% of the CO2 dumped by humanity every year [1]. Completely understanding turbulence is perhaps too ambitious a goal, from which we are still years away, so we should focus on improving models so that they can simulate the behaviour of flows in a reliable and fast way. Because we want to understand these flows first, no modelling can be used, so we have to use the Direct Numerical Simulation (DNS) technique, which relies on no model other than the Navier-Stokes equations. On the other hand, DNS is restricted to simplified geometries. The most successful of these idealised flows is the turbulent Poiseuille channel, where the fluid is confined between two parallel plates and the flow is driven by pressure. However, the computational cost is extremely high, and supercomputers are, and will remain, the right technological tool for the task. Since the 1990s, their power has increased exponentially, and DNS has grown accordingly. Since the seminal work of Kim, Moin, and Moser [2], the main control parameter, the friction Reynolds number Re_τ, has increased continuously [2][3][4][5][6][7][8].
In order to understand and model turbulence, the main mechanism over the last 100 years has been the Reynolds decomposition, together with improved scaling laws. In recent years, however, we have seen a shift towards structures. Since the seminal work of Chong [9] and others, we have been able to identify the basic coherent structures of turbulent flows. However, the interaction of these structures is a highly nonlinear problem. Their behaviour, or even the precedence (cause-effect) [10] of some of these structures, is still an open problem. The idea of this work is to create a code able to identify and follow the different structures of the flow as the flow is simulated, which avoids the need to store large databases.
Traditionally, the structure-following problem has focused on hairpins: groups of U-shaped dissipative structures that form near the wall and move towards the outer region. Head and Bandyopadhyay [11] identified groups of structures at 45 degrees to the wall, producing a model that Adrian et al. [12] and del Álamo et al. [13] evolved into the vortex clusters model, identifying hairpin clusters whose lifetime was longer than their characteristic one. With the developments in computational power of the last decade, the Reynolds number of large DNS simulations has risen, and the existence of these hairpins in large Reynolds number flows has been questioned by authors such as Jiménez et al. [14].
Instead of looking for hairpins, the idea is to use the momentum transfer model of sweeps and ejections: high instantaneous Reynolds Stress events in the flow where the momentum transfer is large. Lozano-Durán et al. [15] found that these events are representative of the momentum transfer in the logarithmic layer, and tracked them through their lifetimes to find that some of them become large, attach to the wall, and extend across the logarithmic layer, being responsible for most of the total momentum transfer [10]. Figure 1 contains a representation of a sweep-ejection pair extracted from a DNS simulation. However, a big problem appears: typically, one can expect millions of structures like these per step. Because the time step is around 30 s in the largest simulations of turbulence, the code has to be able to identify the vortexes and relate them with the ones of the previous step within this brief time. Machine Learning (ML) is the strategy followed here. This technique has entered most scientific areas for a vast set of data processing tasks. Under the term ML we find a broad range of algorithms and modelling tools, including Artificial Neural Networks (ANNs). It was proved in the 1980s that ANNs are universal approximators [16,17]. ANNs can be used for fitting, classification, clustering, or time series prediction, among other classical problems in Applied Mathematics. They are able to recognise patterns in data and, in some sense, allow us to "mathematise the phenomenology". Despite their bad reputation as "black boxes", ANNs are a good mathematical solution for approximating functions when no explicit expression can be obtained by other techniques. The use of neural networks in turbulence research has developed rapidly in recent years.
The need for DNS methods to produce complete information in the whole domain at each step grants an abundance of available training data; moreover, the problems associated with fluids, such as the difficulty of measuring experimental data precisely or the high computational cost of simulations, are similar to the challenges solved with neural networks in other fields. Good examples of this are the work of Srinivasan et al. in predicting boundary layer behaviour by means of convolutional and recurrent neural networks [18], the work of Park and Choi in reconstructing data for flow control by means of convolutional neural networks [19], the work of Ling et al. in using ANNs to estimate the Reynolds Stress anisotropy tensor to augment the results provided by Reynolds Averaged Navier-Stokes (RANS) models [20], and other ANN approaches to the closure problem (modelling the time-averaged value of the Reynolds Stress tensor) [21,22].
ANNs have also been used to obtain successful results in other fields of fluid dynamics where DNS data are not available, using experimental results for training [23][24][25]. Solutions to the closure problem that use ANNs to predict the Reynolds Stress tensor have also been proposed with experimental data as training, such as that of Singh et al., who use an ANN to augment the Spalart-Allmaras turbulence model [26]. Brunton et al. [27] and Duraisamy et al. [28] have made detailed reviews of the state of the art of Machine Learning in turbulence.
In this article, we perform a viability study of a novel method to identify and track these structures with the aid of neural networks. The goal is to understand the particular life of every structure and to identify the most energetic structures of the flow in an efficient way. Section 2 describes the computational materials and methods used in this work: Section 2.1 describes the DNS numerical model used to obtain the fluid fields, Section 2.2 describes the process used for fluid structure identification and tracking, and Sections 2.3 and 2.4 detail the neural network solutions to the problem. Section 3 presents the results of our analysis. Section 4 presents the conclusions.

DNS Numerical Model
In this work, we simulate a Poiseuille (pressure-driven) plane channel flow. Figure 2 contains a schematic description of the channel used for the DNS simulation and the reference system used. The streamwise, wall-normal, and spanwise coordinates are denoted by x, y, and z respectively, and the corresponding velocity components are U, V, and W or, using index notation, U_i for i = 1, 2, 3. Statistically averaged quantities are denoted by an overbar, whereas fluctuating quantities are denoted by lowercase letters, i.e., U = Ū + u = Ū_1 + u_1. The governing equations of the system are the incompressible Navier-Stokes equations (1) and (2).

These equations have been solved using the LISO code, which has been validated on many occasions and successfully employed to run some of the largest simulations of turbulence [3,29,30]. The code is an evolution of the classic one of [2], but uses five-point compact finite differences in the y direction, with fourth-order consistency and extended spectral-like resolution [31]. A third-order semi-implicit Runge-Kutta scheme, specially developed for this kind of flow [32], has been used for the temporal discretisation. The wall-normal grid spacing is adjusted to keep the resolution approximately constant in terms of the local isotropic Kolmogorov scale η = (ν³/ε)^(1/4), at ∆y = 1.5η, where ε is the dissipation rate. In wall units, ∆y⁺ varies from 0.3 at the wall up to ∆y⁺ ≈ 12 at the centerline. Table 1 provides details of the simulation.

In this simulation, high Reynolds Stress events are recognised by means of quadrant analysis, following the approach proposed by Wallace et al. [33]. This approach consists of studying turbulent regions with high instantaneous Reynolds Stress in the different quadrants, depending on the signs of u and v.
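The grid-spacing rule above can be illustrated numerically. The sketch below is purely illustrative: the viscosity and dissipation values are placeholders, not taken from the actual simulation.

```python
import numpy as np

def kolmogorov_scale(nu, eps):
    """Local isotropic Kolmogorov length scale eta = (nu^3 / eps)^(1/4)."""
    return (nu**3 / eps) ** 0.25

def target_spacing(nu, eps, factor=1.5):
    """Wall-normal spacing that keeps Delta_y = factor * eta."""
    return factor * kolmogorov_scale(nu, eps)

# Dissipation typically peaks near the wall, so eta (and hence the
# allowed spacing) grows towards the centerline. Illustrative values:
nu = 1e-4
eps_near_wall, eps_centerline = 1e-2, 1e-4
dy_wall = target_spacing(nu, eps_near_wall)
dy_center = target_spacing(nu, eps_centerline)
```

This reproduces the qualitative behaviour described in the text: the spacing the rule allows is smallest near the wall, where dissipation is strongest.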
The structures with the highest instantaneous u and v are found in Q2 (ejections, u < 0, v > 0) and Q4 (sweeps, u > 0, v < 0).
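The sign criterion above can be sketched as a simple classification of the fluctuating velocity fields. This is a minimal illustration using only the quadrant signs; the Reynolds Stress threshold used in the actual detection (Section 2.2) is omitted here.

```python
import numpy as np

def classify_quadrants(u, v):
    """Return an integer field: 2 for ejections (u < 0, v > 0),
    4 for sweeps (u > 0, v < 0), and 0 elsewhere (Q1/Q3)."""
    q = np.zeros(u.shape, dtype=int)
    q[(u < 0) & (v > 0)] = 2   # Q2: ejections
    q[(u > 0) & (v < 0)] = 4   # Q4: sweeps
    return q

# Four sample points, one per quadrant:
u = np.array([-1.0, 0.5, 0.3, -0.2])
v = np.array([0.4, -0.7, 0.1, -0.3])
labels = classify_quadrants(u, v)   # [2, 4, 0, 0]
```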
According to Lozano-Durán and Jiménez [10], these structures are not only responsible for the momentum transfer inside the logarithmic layer, but are also the primitive structures that originate the hairpin vortex clusters, having a key role in the drag generation of turbulent flows.

Tracking Sweeps and Ejections in Turbulent Flows
Sweeps and ejections are coherent regions of the flow where the instantaneous pointwise Reynolds Stress is larger than a threshold established on the basis of percolation. Percolation theory is the study of the connected components of random graphs, and it was first applied to turbulent structure identification by Moisy and Jiménez [34]. If the limit is too permissive, a sponge-like structure forms, spanning the whole domain; if it is too restrictive, only very small structures form. Between those, there is a value of the limit where the ratio between the volume of the largest structure and the total volume of all the detected structures rises sharply. This is the value selected for the criterion. The limit was set as a factor of the standard deviation of the considered parameter (the instantaneous Reynolds Stress) to mitigate the dependence of the probability of selection on the distance to the wall, as del Álamo et al. did for vorticity [13] and Lozano-Durán et al. for the Reynolds Stress [15].
Once the limit for the desired value has been properly established, all of the points in the flow can be evaluated against the designed criterion, obtaining a subset of the points of the domain where the relevant magnitude (in this case, the instantaneous Reynolds Stress) is significant.
Subsequently, the coherent structures can be aggregated from the obtained points into a set of three-dimensional structures in a single time step, classifying them into sweeps and ejections. Figure 3 presents a schematic 2D representation of this step. This figure shows a section of a slice of the domain, where (a) marks, in black, all points fulfilling the criterion and (b) contains all of the aggregated structures, each one filled in a different colour. An efficient method to perform this step has been described in [35]. In order to reduce the huge number of structures, those with a volume of less than 30 wall units have been removed.

Finally, the temporal evolution of the sweeps and ejections must be reconstructed from the available information. This process does not have a deterministic solution, as there is no known analytic law defining the motion of sweeps and ejections. Several traditional programming (as opposed to machine learning) solutions have been used for this problem, such as the one used by Lozano-Durán and Jiménez, which computes the intersection of the structures in two consecutive time steps [10], and the one proposed by Muelder and Ma, which projects each structure onto the following time step and uses the projection as an initial guess to aggregate the structures in that step [36]. Figure 4 shows the sweeps and ejections found in a section of the domain in two consecutive saved DNS time steps, with each independent structure coloured differently. The objective of this step is to correlate them between the two time steps, so that the lifetime of the structures can be reconstructed later.

Once the full process is performed, the lifetime of the structures can be reconstructed for further study. Figure 5 shows several steps of the life of an ejection, extracted from the DNS simulation described in Section 2.1, representing the shape and distance to the wall of the ejection through several steps of its lifetime.
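A minimal 2D sketch of the aggregation step follows: points fulfilling the criterion are marked True in a boolean mask, and 4-connected groups of True points are merged into one labelled structure each. This is only an illustration; the actual code works in 3D and uses the efficient method of [35], and the grid below is a made-up example.

```python
import numpy as np
from collections import deque

def label_structures(mask):
    """Label 4-connected components of a 2D boolean mask via BFS.
    Returns (labels array, number of structures found)."""
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue                      # already part of a structure
        current += 1
        labels[start] = current
        queue = deque([start])
        while queue:
            i, j = queue.popleft()
            for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                if (0 <= ni < mask.shape[0] and 0 <= nj < mask.shape[1]
                        and mask[ni, nj] and not labels[ni, nj]):
                    labels[ni, nj] = current
                    queue.append((ni, nj))
    return labels, current

# Toy mask with two separate groups of points fulfilling the criterion:
mask = np.array([[1, 1, 0, 0],
                 [0, 1, 0, 1],
                 [0, 0, 0, 1]], dtype=bool)
labels, n_structures = label_structures(mask)   # n_structures == 2
```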

Prediction of Geometrical Features of Sweeps and Ejections by Means of Multi Layer Perceptrons (MLP)
The tracking problem described in the previous section, understood as the correlation of structures from one time step to the next, presents itself as a good opportunity for a neural network solution, as the classical programming solution is computationally expensive: it requires point-by-point access to a very large dataset in both memory-contiguous and non-contiguous directions. Furthermore, the lack of any analytical description of the behaviour of sweeps and ejections encourages the use of neural networks, as they have been proven to be universal approximators [17].
The problem must be formulated in terms of a series of inputs to provide to the network and a series of expected outputs. In this case, the geometrical features of the structures have been selected as both input and output. That way, no extra information regarding the flow needs to be passed to the routines that construct the structures, thus maximising the computational efficiency of the whole process. These geometrical features are:
• Centre of mass position (three components).
• Bounding box position, as given by the corners with minimum and maximum values for the three coordinates (six components).
• Volume (one component).
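The ten components listed above can be packed into a single feature vector per structure. The sketch below is illustrative: it assumes a structure is given as an array of point coordinates and takes the point count as a proxy for volume (i.e., uniform grid-cell volumes), which need not match the paper's exact computation.

```python
import numpy as np

def geometric_features(points):
    """Build the 10-component feature vector of one structure.
    points: (N, 3) array of x, y, z coordinates of its points."""
    centre = points.mean(axis=0)       # centre of mass (3 components)
    box_min = points.min(axis=0)       # bounding-box min corner (3)
    box_max = points.max(axis=0)       # bounding-box max corner (3)
    volume = np.array([len(points)])   # volume as cell count (1)
    return np.concatenate([centre, box_min, box_max, volume])

pts = np.array([[0.0, 0.0, 0.0],
                [2.0, 0.0, 0.0],
                [1.0, 3.0, 0.0]])
features = geometric_features(pts)     # shape (10,)
```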
The objective of the neural network is to obtain the values of this set of features at the time step following the one used as input, so that the structure closest to the prediction, among all of the ones present in the next time step, can be selected as the continuation of the initial structure. Figure 6 shows the proposed implementation of ANNs in the process of tracking Qs in a DNS simulation: the ANN provides an estimation of the geometrical features of the set of Qs at the next evaluated time step. The predictions can then be compared with the geometrical features obtained for the Qs in that time step in order to extract the correspondence between the structures in both time steps.
Additionally, notice that the idea of the presented algorithm is not to foresee the full characteristics of the Qs at t_{i+n} from time t_i, something that is extremely difficult if at all possible, but to identify the structures obtained in step t_{i+n} with the ones of t_i.
Multi-Layer Perceptrons (MLP) are among the simplest artificial neural networks. They consist of a series of nodes, or neurons, organised in fully connected layers. A node in an intermediate layer is connected to the nodes of the previous layer and produces a result by applying a function (the activation function) to the values obtained from those nodes. The activation function is applied to the sum of each incoming connection multiplied by a weight, plus an overall bias; the weights and biases of the full network are the parameters that must be adjusted by a training process (trainable parameters). In the present work, the rectified linear unit (ReLU) was used as the neurons' activation function. Figure 7 contains a schematic representation of a neuron.

Figure 6. Scheme of the proposed tracking process: structure identification in the DNS steps, ANN prediction of the structures' features, and minimum-difference matching, yielding the correspondence between G_{t_{i−n}} and G_{t_i} and between G_{t_i} and G_{t_{i+n}}.

The training process is an optimisation of the trainable parameters of the network, in which a loss function representative of the difference between the outputs produced by the network and the expected outputs of a known series of cases is minimised. This training process is controlled by the hyperparameters, such as the learning rate, which must be adjusted to ensure that the training is successful.
The dataset used for training, validating, and testing all of the networks in this work has been constructed from 994 non-consecutive time steps of the DNS simulation described in Section 2.1, with a separation of 4.5 CFL between each of them, in which the sweeps and ejections have been tracked using classical computing techniques. A series of sequences have been obtained, each one representing the full lifetime of a sweep or an ejection; a sequence spanning n time steps yields a set of n − 1 data points, each consisting of an input-output pair representing the evolution of the sweep or ejection from one time step to the next. This dataset has been split into three groups:
• 70% of all the available data has been used for training.
• 20% for validation, evaluating the value of the loss function and checking consistency with the values obtained in the training dataset.
• 10% for testing, examining the actual accuracy of the predictions.
Table 2 contains the number of data points (evolution of a structure from one time step to the next one) and the number of sequences (total structure lifetime) for each of the groups.

For this work, a number of different network topologies have been trained and their performance evaluated on the validation dataset. All of the evaluated network topologies present input and output layers with 10 neurons each, matching the shape of the selected geometrical features. Between the input and output layers, a series of n hidden layers with u_i neurons, i = 1, · · · , n, are present. Figure 8 contains a representation of the multi-layer perceptron used.
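The 70/20/10 split above can be sketched as follows. This is a hedged illustration: it assumes the split is performed over whole sequences, so that a structure's full lifetime stays inside a single set, and the sequence count used below is illustrative, not the paper's actual figure.

```python
import numpy as np

def split_sequences(n_sequences, fractions=(0.7, 0.2, 0.1), seed=0):
    """Shuffle sequence indices and split them into
    training / validation / testing groups."""
    idx = np.random.default_rng(seed).permutation(n_sequences)
    n_train = round(fractions[0] * n_sequences)
    n_val = round(fractions[1] * n_sequences)
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])

# Illustrative count; the real dataset sizes are given in Table 2.
train, val, test = split_sequences(1000)
```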

MLPs can have their architectures described by a simple matrix equation, with input vector x and output vector y. For a three-layer MLP (two hidden layers and the output layer), its matrix representation is:

y = f_3(W_{3,2} f_2(W_{2,1} f_1(W_{1,0} x + b_1) + b_2) + b_3),

where f_j is the activation function for layer j, W_{j,k} is the weights matrix from layer k to layer j, and b_j is the bias vector for layer j.
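The matrix form described above can be transcribed directly into code. The sketch below is illustrative: the hidden-layer sizes are arbitrary, not one of the trained topologies, and the weights are random rather than trained.

```python
import numpy as np

def relu(x):
    """Rectified linear unit, f(x) = max(0, x)."""
    return np.maximum(0.0, x)

def mlp_forward(x, weights, biases, activations):
    """Evaluate y = f3(W3 @ f2(W2 @ f1(W1 @ x + b1) + b2) + b3)
    for an arbitrary number of layers."""
    y = x
    for W, b, f in zip(weights, biases, activations):
        y = f(W @ y + b)
    return y

rng = np.random.default_rng(0)
sizes = [10, 16, 16, 10]   # input, two hidden layers, output (10-in/10-out as in the text)
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes, sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]
# ReLU in the hidden layers; linear output layer (an assumption here).
out = mlp_forward(np.ones(10), weights, biases, [relu, relu, lambda z: z])
```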
As a metric of network performance, we use the mean squared error (MSE) of the predictions made by each network against the results present in the validation dataset. Table 3 lists the evaluated network topologies, described by the number of hidden layers n and the neurons in each layer u_i, together with the MSE produced by their predictions in the validation dataset once converged. The IDs contain the number of hidden layers and the number of neurons of the first hidden layer (or the first two hidden layers, in the cases where the first is equal). In order to ensure that no overfitting was happening, the networks were trained for the same number of additional epochs as they took to reach a stable value of the loss function (MSE). During these training epochs, the validation mean squared error was monitored for any upward trend. None of the studied networks presented any overfitting, which was made possible by the very high ratio of data to trainable parameters.

A further sensitivity analysis with respect to initial conditions was performed because of the similarity of the mean squared error values for networks MLP_1_10, MLP_2_10, and MLP_3_10_10. Figure 9 contains the result of training these three architectures from 20 different random starting points. Not only are the minimum error values similar for these three MLPs, but their responses to different initial parameter conditions do not present enough differences to establish that any of the three analysed architectures performs better.

Prediction of Geometrical Features of Sweeps and Ejections by Means of Recurrent Neural Networks
A Recurrent Neural Network (RNN) is a neural network architecture that presents context-awareness by considering its previous state in the computation of its output. That feature makes it especially suited to predicting time-series data, as opposed to the input-to-output mode of operation of the Multi-Layer Perceptron. However, RNNs are difficult to train, as they present the problem of vanishing gradients: the gradients in the later layers restrict the learning rates upstream in the network, which can prevent the network from learning any further. Long Short-Term Memory (LSTM) networks, developed by Hochreiter and Schmidhuber [37], use a gating mechanism to control the dynamics of the recurrent connections and avoid the vanishing gradients problem. This gating mechanism was later improved by Gers et al. [38], who included the forget gate in the LSTM units. For this problem, due to the non-Markovian nature of turbulence, LSTMs can be hypothesised to present an advantage with respect to MLPs, as a successful capture of the long-term dynamics of the structures could improve the prediction of their positions.
LSTM units consist of a cell and three gates: input, output, and forget. The cell contains the information of the LSTM unit, and the gates regulate its information flow. The input gate takes the information from the input vector and the previous state of the whole LSTM layer to modify the cell state, the output gate regulates the flow of information downstream, and the forget gate regulates the information that stays in the LSTM cell [37].
The output value y(t) of an LSTM unit with a forget gate is calculated as follows from the value of the n_in inputs x_j(t) and the previous state of the other n_u units in the layer, y_j(t − 1):

y_f(t) = f_f(w_f · x(t) + v_f · y(t − 1) + b_f),
y_in(t) = f_in(w_in · x(t) + v_in · y(t − 1) + b_in),
y_ou(t) = f_ou(w_ou · x(t) + v_ou · y(t − 1) + b_ou),
c(t) = y_f(t) c(t − 1) + y_in(t) g(w_c · x(t) + v_c · y(t − 1) + b_c),
y(t) = y_ou(t) h(c(t)),

where y_f(t) is the value of the forget gate, y_in(t) the value of the input gate, y_ou(t) the value of the output gate, and c(t) the internal state of the LSTM unit; w are the weight vectors of dimension n_in of the inputs for each gate, v are the weight vectors of dimension n_u of the other units in the layer for each gate, and b are the biases for each gate; f_f, f_in, and f_ou are the activation functions for each gate, g is the input squashing function, and h is the output squashing function, which, according to Gers et al., may not be needed [39]. Note that the LSTM layers feed the outputs of all their units in the previous step as inputs to all other units in the layer and, therefore, the number of trainable parameters per layer grows with the second power of the number of LSTM units per layer.
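The gate computations for one time step of a whole LSTM layer can be sketched as below. This is an illustration under common assumptions: the gate activations f_f, f_in, f_ou are taken as sigmoids and both squashing functions g and h as tanh, and the dimensions and random parameters are placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, y_prev, c_prev, p):
    """One time step for a layer of n_u LSTM units with forget gates.
    p holds, for each of f (forget), i (input), o (output), c (cell):
    W* (n_u, n_in) input weights, V* (n_u, n_u) recurrent weights,
    b* (n_u,) biases."""
    y_f = sigmoid(p["Wf"] @ x + p["Vf"] @ y_prev + p["bf"])    # forget gate
    y_in = sigmoid(p["Wi"] @ x + p["Vi"] @ y_prev + p["bi"])   # input gate
    y_ou = sigmoid(p["Wo"] @ x + p["Vo"] @ y_prev + p["bo"])   # output gate
    # New cell state: retained memory plus gated, squashed input.
    c = y_f * c_prev + y_in * np.tanh(p["Wc"] @ x + p["Vc"] @ y_prev + p["bc"])
    y = y_ou * np.tanh(c)   # output squashing h = tanh
    return y, c

rng = np.random.default_rng(0)
n_in, n_u = 10, 4           # 10 inputs as in the text; 4 units for illustration
p = {}
for gate in "fioc":
    p["W" + gate] = 0.1 * rng.standard_normal((n_u, n_in))
    p["V" + gate] = 0.1 * rng.standard_normal((n_u, n_u))
    p["b" + gate] = np.zeros(n_u)
y, c = lstm_step(np.ones(n_in), np.zeros(n_u), np.zeros(n_u), p)
```

Note how each gate mixes the input weights (W), the recurrent weights over all units of the layer (V), and a bias, which is exactly why the parameter count grows quadratically with the number of units per layer.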
Several different LSTM configurations of n layers with u_i units per layer have been tested. The LSTM layers have been placed between an input of dimension 10 and an output layer consisting of 10 neurons, so that the output is compatible with the dataset dimensions. Table 4 lists the tested LSTM configurations and the mean squared error that they produced in the validation dataset once converged. For the LSTM networks, overfitting was evaluated in the same way as in the previous case: monitoring the mean squared error in the validation dataset for a number of epochs once the loss function (MSE) in the training dataset had stabilised. In this case, all of the evaluated networks presented some degree of overfitting and, therefore, a dropout rate of 20% was introduced for the LSTM units, meaning that the internal state of the LSTM during training is reset, on average, in one of every five training data points. After the introduction of the dropout, the networks were retrained to obtain the values presented in Table 4, and no overfitting was observed.

Results and Discussion
Once the networks had converged and it was assured that no overfitting was occurring, the next step consisted of relating the training metric (mean squared error) to the functionality that the networks are intended to serve. To do that, an environment similar to the operational one was set up for testing. Each prediction made by the networks is compared against all of the structures present in the next time step, for every step of every sequence:
1. Obtain the prediction of the geometrical features of the structure in the next time step using the network.
2. Obtain the norm of the difference between the prediction and the feature vectors of all the structures present in the next time step.
3. The minimum value of that norm corresponds to the most similar structure present in the following time step. If they belong to the same sweep or ejection, the prediction is correct; otherwise, it is not.
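Steps 2 and 3 above can be sketched as a nearest-neighbour search in feature space. The function and values below are illustrative; the Euclidean norm is assumed, as the text does not fix a specific norm.

```python
import numpy as np

def match_structure(prediction, next_step_features):
    """Find the structure in the next time step closest to the prediction.
    prediction: (10,) predicted feature vector.
    next_step_features: (n_structures, 10) feature vectors found
    in the next time step.
    Returns (index of the best match, norm of the difference)."""
    norms = np.linalg.norm(next_step_features - prediction, axis=1)
    best = int(np.argmin(norms))
    return best, float(norms[best])

# Toy example: three candidate structures, the second being closest.
prediction = np.zeros(10)
candidates = np.stack([np.full(10, 5.0),
                       np.full(10, 0.1),
                       np.full(10, 2.0)])
best, dist = match_structure(prediction, candidates)   # best == 1
```

The prediction counts as correct when the selected candidate belongs to the same sweep or ejection as the tracked structure, which is how the success rates in Section 3 are computed.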
This process was applied to a subset of 2000 sequences from the testing dataset in order to evaluate the overall performance of the different networks. Figure 10 shows the success rate of the predictions of the different networks, compared to their MSE. One network of each group was further tested using the same method on the full testing dataset in order to obtain further insight into the results produced by the networks. For the Multi-Layer Perceptrons, MLP_1_10 was used, obtaining an overall success rate of 98.74% for all of the predictions made in the testing dataset. For the LSTM networks, LSTM_2_50 was chosen, obtaining an overall success rate of 67.59%. Figure 11 contains the distributions of MLP_1_10's prediction results with respect to (a) the norm of the difference between the feature vectors of the prediction and the selected structure, (b) the structure volume, and (c) the distance to the wall, y⁺, of the structure's centre of mass. Figure 12 contains the same information for LSTM_2_50's predictions. Regarding MLP_1_10, the predictions are better for structures with lower volume, but there is no clear trend in the accuracy of the prediction as a function of the distance to the wall. As expected, the precision of the method is higher for a lower norm of the difference. Appendix A presents the coefficients of the tested network.
Overall, none of the tested LSTM architectures could match the error of the best MLP architecture or its precision. LSTM_2_50 presents an overall lower success rate in the predictions, and it does not overcome the problems that MLP_1_10 presented with structures of higher volume.
The present work, due to its relatively low Reynolds number and the size of its dataset, should be understood as a viability study for the incorporation of machine learning techniques into fluid structure tracking problems. For the practical introduction of the described methods in large Reynolds number channels (up to Re_τ = 10,000, with meshes consisting of 8 × 10¹⁰ points [40]), the method must provide a clear execution-time advantage while maintaining the precision of existing methods.
In summary, only some of the tested multi-layer perceptrons have achieved said requirements in terms of precision. In terms of computational load, multi-layer perceptrons also present the advantage of not needing the history of the structure to generate the prediction, potentially saving the computational cost of reconstructing the full lifetimes of all the structures present at each analysed step.

Conclusions
In this work, we have compared the performance of several MLP and LSTM artificial neural networks in predicting the future geometrical features of the sweeps and ejections of a turbulent channel flow at a relatively low friction Reynolds number, Re_τ = 500. Several multi-layer perceptron and recurrent (LSTM) neural network architectures were trained to predict the geometrical features of both kinds of structures at step n + 1 using the knowledge gained at step n. Different benchmarks show that the tested MLPs proved to generate better predictions than the tested LSTM networks. However, that does not mean that no LSTM could produce accurate results; more data, more computational time for training, and further experimentation with the network topology would probably be needed.
However, the solutions obtained with some of the tested MLP architectures were satisfactory, producing accurate predictions with a relatively simple architecture and, therefore, fast computations. Having proven the viability of this approach, the functionality of the network could be extended to cover the cases in which the presented neural networks would not be enough: structure creation, structures merging to form larger structures, structures splitting into smaller child structures, and structures dissipating and dying. These additions would equate this method to the one used by Lozano-Durán et al. [10], which comes at the computational cost of checking the fluid domain and comparing for overlap between consecutive steps. The development of a faster method is a necessity, because we intend to perform analyses of the sweeps and ejections at larger Reynolds numbers and/or bigger boxes for different flows.
Finally, the next step for the algorithm is to train it to follow the small vortexes with shorter lives, and to reduce its memory needs in order to be able to handle medium to high Reynolds numbers.

Data Availability Statement:
The raw data that support the findings of this study are available from the corresponding author upon reasonable request. One point statistics can be downloaded from the web page of our group: http://personales.upv.es/serhocal/.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:
ANN Artificial Neural Network
DNS Direct Numerical Simulation
LSTM Long Short-Term Memory
ML Machine Learning
MLP Multi-Layer Perceptron
MSE Mean Squared Error
RANS Reynolds Averaged Navier-Stokes
ReLU Rectified Linear Unit
RNN Recurrent Neural Network

Appendix A
The coefficients of the tested network follow the matrix representation given in Section 2.3, where f_1 and f_2 are the Rectified Linear Unit (ReLU) function f(x) = max(0, x), x is the input feature vector scaled to the range (0, 1), and the weight matrices and bias vectors are as follows: