Article

Sequential and Parallel Algorithms to Compute Turbulent Coherent Structures

by Sergio Gandía-Barberá 1, Andres Cremades 2, Ricardo Vinuesa 2, Sergio Hoyas 1,* and María Jezabel Pérez-Quiles 1

1 Instituto Universitario de Matemática Pura y Aplicada, Universitat Politècnica de València, 46022 València, Spain
2 FLOW, Engineering Mechanics, KTH Royal Institute of Technology, SE-100 44 Stockholm, Sweden
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(21), 3325; https://doi.org/10.3390/math12213325
Submission received: 12 September 2024 / Revised: 21 October 2024 / Accepted: 22 October 2024 / Published: 23 October 2024
(This article belongs to the Section E: Applied Mathematics)

Abstract

The behavior of turbulent flows remains a significant unsolved problem in physics. Recently, a large amount of effort has been directed toward understanding the non-linear interactions of the different flow structures in order to address this challenge. In this paper, different implementations of one exact method for identifying these structures are analyzed: two sequential algorithms and a parallelizable one, developed to handle large-scale data efficiently. The new parallel algorithm offers significant advantages in handling the computational demands of large simulations, providing a more scalable solution for future research.

1. Introduction

Turbulence is arguably the most significant unsolved problem in physics, with widespread applications in everyday life. It is estimated that wall-bounded turbulence accounts for up to 5% of global CO2 emissions each year [1]. While fully understanding turbulence may still be an overly ambitious goal that remains far from being achieved [2,3], improving models to reliably and efficiently simulate flow behavior is a more attainable objective [4]. It should be noted that even an existence and uniqueness theorem for the solution of the Navier–Stokes equations governing fluid flows has yet to be established [5].
To gain a fundamental understanding of turbulent flows, Direct Numerical Simulation (DNS) is widely recognized as one of the most important techniques available today: it involves solving the Navier–Stokes equations [6] without any modeling [7,8]. However, the use of DNS is restricted to simplified geometries due to its immense computational cost [9]. Among these idealized flows, turbulent Poiseuille channel flows, where the flow is confined between two parallel plates and driven by pressure, have been the most successful. Supercomputers have been essential tools for performing these simulations, and the exponential increase in computational power since the 1990s has allowed DNS to advance accordingly. The friction Reynolds number, $Re_\tau = h u_\tau / \nu$, the primary control parameter, has been steadily increased since the pioneering work of Kim, Moin, and Moser [7]; see [10] and references therein.
In this definition, h is the semi-height of the channel, $\nu$ is the kinematic viscosity, and $u_\tau$ is the friction velocity, defined as the square root of the wall shear stress divided by the density. The friction velocity $u_\tau$ is a fictitious velocity, meaning it does not physically exist but serves as a fundamental scaling parameter for wall-bounded flows [11].
As a result of this increase in size, immense databases have been created. For instance, each velocity field from Hoyas and Oberlack [10] is close to 1 terabyte (TB) in size. Extracting trustworthy and valuable results from these hundreds of terabytes is becoming increasingly difficult.
Over the past century, the primary approach to understanding and modeling turbulence has been through Reynolds decomposition and the refinement of scaling laws [12]. Various techniques and a great deal of ingenuity have been employed [13,14] over the years, but even classical scaling laws continue to be debated. One example is the flow behavior in the so-called logarithmic layer, first described by Von Kármán [15] nearly 90 years ago. Although this concept has been widely applied throughout the last century, it remains a subject of significant debate today; see [16] and references therein.
Several new ideas have emerged, such as the use of symmetry theory [17,18], but a complete understanding has yet to be achieved. In recent years, a shift toward studying turbulent structures has been observed [19]. Structures were first described experimentally as streamwise streaks [20] and Reynolds-stress events [21]. Other structures closely related to the Reynolds stress are the Q-events [22], which are defined as regions of strong momentum flux. From another point of view, in 1990, Chong et al. [23] provided a mathematical definition of vortices, roughly identifying them as regions with more vorticity than shear. Nevertheless, the interaction of these structures remains a highly non-linear problem, and understanding their behavior or establishing cause-effect relationships between them is still an open question [2]. However, it is widely recognized that structures can be key to understanding turbulence [3,24], and causality is being explored [25,26].
This work focuses on the development of a code for identifying various structures within a turbulent flow. As one of the critical points in turbulence research is the dynamic behavior of these structures, a code capable of identifying these structures in very short times or even in situ (during simulation) is needed, thereby eliminating the necessity of storing large datasets. Thus, since this paper focuses on extracting individual features from the flow, the term “structure” is used to indicate a coherent region of the flow that can be defined in any of the aforementioned ways. The formal definition of these structures is provided below.
A significant challenge encountered in extracting these coherent regions is the large amount of RAM required. Several methods for defining structures are discussed in the literature, and in each case, a Boolean three-dimensional (3D) field is obtained. This field must be analyzed to extract the individual features of the flow; after the analysis, a field similar to the Boolean one is produced, but with the truth value replaced by the identifier of the structure to which each point belongs. The mesh required for DNS is extremely large [9,10], and simulations involving meshes on the order of $10^{11}$ points are now possible. Consequently, RAM on the order of hundreds of gigabytes (GB) is essential. Efficient parallel methods are critical for handling such vast datasets, as memory becomes the primary bottleneck.
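As a rough estimate (assuming, for illustration only, 4-byte integer identifiers per mesh point), a single labeled field on a mesh of $10^{11}$ points already requires
$10^{11} \times 4\,\mathrm{B} = 4 \times 10^{11}\,\mathrm{B} \approx 400\,\mathrm{GB}$,
in addition to the memory needed for the velocity components from which the Boolean field is computed.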
This paper presents an analysis of one method with two different ways of defining structures, which differ in how the edges of the structures are connected. These variants are fundamentally equivalent and are highly efficient for small problems, while also being suitable for very large datasets [3,27].
In addition, the parallelization of this method is demonstrated. The basic idea is to use the capabilities of the Message Passing Interface (MPI) and Hierarchical Data Format (HDF5) libraries to distribute the data across the computer, using a sequential routine to complete the procedure. These algorithms, along with definitions of structures and the method used to process the data, are explained in the Methods section. In the Results section, the three algorithms are compared and discussed. The main findings are summarized in the conclusions.
Finally, this paper aligns with the United Nations’ sustainable development goals 7, 9, and 13, as improved knowledge of turbulence can positively impact energy use and stimulate innovation [28]. It is estimated that approximately 15% of the world’s energy consumption occurs near the surface of wall-bounded flows [2]. Reducing this energy usage (Goal 7) would contribute to addressing the climate emergency (Goal 13) and promote technological advancements and knowledge (Goal 9).

2. Materials and Methods

In this work, three different ways to obtain the structures of a turbulent flow are presented. The cases were tested on several DNSs of channel flow at various friction Reynolds numbers (Figure 1). In each case, a structured mesh was used. The method can be applied to any other flow: trivially if the mesh is structured, and easily if an array containing the connections of every point in the mesh is provided. Details of the different simulations are given in Table 1 and in the Results section. The streamwise, wall-normal, and spanwise coordinates are denoted by x, y, and z, respectively. The data were computed at several Reynolds numbers within a computational box of length $L_x$, height $L_y = 2$, and width $L_z$, with periodicity in the streamwise and spanwise directions.
The corresponding velocity components are U, V, and W or, using index notation, $U_i$. Quantities statistically averaged in time, x, and z are denoted by an overbar, $\overline{U}$, whereas fluctuating quantities are denoted by lowercase letters, i.e., $U = \overline{U} + u$. Primes are reserved for intensities, $u' = \overline{uu}^{1/2}$.
The Navier–Stokes equations have been solved using the LISO code, which has successfully been employed to run some of the largest turbulence simulations to date [9,31] and to test several theories [16,18]. The code uses the same strategy as [7] but employs a seven-point compact finite difference scheme in the y direction with fourth-order consistency and extended spectral-like resolution [32]. The temporal discretization is a third-order semi-implicit Runge–Kutta scheme [33]. The wall-normal grid spacing is adjusted to keep the resolution at $\Delta y = 1.5\eta$, i.e., approximately constant in terms of the local isotropic Kolmogorov scale $\eta = (\nu^3/\epsilon)^{1/4}$. A code similar to the one used here, including the energy equation, is explained in [34], and a critical study of the convergence of the statistics is presented in [35].
Several definitions of turbulent flow structures are found in the literature, depending on the effect to be studied. Streamwise streaks [20] and Reynolds-stress events [21] were the first to be described. The streamwise streaks are defined as flow regions of slowly moving fluid elongated in the direction of the mean flow. As a result, their definition only involves u and w. They are thought to play an important role in turbulence production near the wall and in transporting momentum to the outer region of the flow. These streaks are defined as:
$u(x, y, z) < 0$,   (1)
$u^2(x, y, z) + w^2(x, y, z) > \alpha\, u_\tau$,   (2)
where $\alpha$ is the percolation index [36,37]. For the values of $Re_\tau$ used here, $\alpha = 4$. It should be noted that if $u(x, y, z) > 0$ is taken in Equation (1), the definition changes and the structure is referred to as a high-velocity streak. The importance of the percolation index is discussed below.
Intense Reynolds stress structures or Q-events are defined by:  
$|\tau(x, y, z)| > \beta\, u'(y)\, v'(y)$,   (3)
where $\tau(x, y, z) = u(x, y, z)\, v(x, y, z)$ is the instantaneous point-wise tangential Reynolds stress and $\beta$ is the percolation index for $uv$-structure identification. In the examples below, $\beta = 1.40$. Depending on the signs of u and v, there are four possible Q-events. As shown in [3], the most significant events are ejections ($u < 0$, $v > 0$) and sweeps ($u > 0$, $v < 0$). In this case, only u and v are involved, since the definition focuses on the main shear of the flow, from the streamwise direction to the wall-normal one.
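As an illustration of Equations (1)–(3), both Boolean fields can be built with a few array operations. The following sketch assumes NumPy arrays of fluctuations u, v, w with shape (nx, ny, nz) and wall-normal intensity profiles u_rms, v_rms of length ny; the array names are illustrative, and this is not the authors' Fortran implementation.

import numpy as np

def streak_mask(u, w, u_tau, alpha=4.0):
    # Low-velocity streaks, Eqs. (1)-(2): u < 0 and u^2 + w^2 > alpha * u_tau.
    return (u < 0.0) & (u**2 + w**2 > alpha * u_tau)

def q_event_mask(u, v, u_rms, v_rms, beta=1.40):
    # Q-events, Eq. (3): |u v| > beta * u'(y) v'(y), broadcasting the
    # wall-normal intensity profiles along the second (y) axis.
    return np.abs(u * v) > beta * (u_rms * v_rms)[None, :, None]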
The definition of vortices [23] is based on the analysis of the velocity tensor. The key point is finding regions of the flow where rotation is larger than shear. Vortices appear close to ejections [36] and are part of the proposed cycle to sustain turbulence involving the streaks [23]. Since they carry high levels of enstrophy, vortices can be considered dissipative structures.
The velocity field $U_i$ can be locally expressed as $U_i = A_{ij} x_j$, where Einstein's summation convention is used and $A_{ij}$ is the velocity gradient tensor. Following [23], the three invariants of this tensor, P, Q, and R, are given by:
$P = -A_{ii}$, $\quad Q = \tfrac{1}{2}\left(P^2 - S_{ij} S_{ji} - W_{ij} W_{ji}\right)$, and $\quad R = \tfrac{1}{3}\left(-P^3 + 3 P Q - S_{ij} S_{jk} S_{ki} - 3 W_{ij} W_{jk} S_{ki}\right)$,
where $S_{ij} = (A_{ij} + A_{ji})/2$ is the rate-of-strain tensor and $W_{ij} = (A_{ij} - A_{ji})/2$ is the rate-of-rotation tensor. The discriminant D of $A_{ij}$ is given by:
$D = \tfrac{27}{4} R^2 + Q^3$.
In the regions where D is positive, a conjugate pair of eigenvalues for A i j appears, indicating a region with closed or spiral patterns associated with strong vorticity.
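The point-wise evaluation of the discriminant can be sketched as follows. Here np.gradient is used as a simple stand-in for the compact finite differences of the solver, and the grid vectors x, y, z as well as the velocity arrays are assumed to be given; this is an illustration, not the production code.

import numpy as np

def vortex_mask(U, V, W, x, y, z):
    # Velocity gradient tensor A[i, j] = dU_i / dx_j at every mesh point.
    A = np.empty((3, 3) + U.shape)
    for i, Ui in enumerate((U, V, W)):
        A[i, 0], A[i, 1], A[i, 2] = np.gradient(Ui, x, y, z, edge_order=2)
    S = 0.5 * (A + A.transpose(1, 0, 2, 3, 4))    # rate-of-strain tensor
    Wr = 0.5 * (A - A.transpose(1, 0, 2, 3, 4))   # rate-of-rotation tensor
    P = -np.einsum('ii...->...', A)
    Q = 0.5 * (P**2 - np.einsum('ij...,ji...->...', S, S)
                    - np.einsum('ij...,ji...->...', Wr, Wr))
    R = (1.0 / 3.0) * (-P**3 + 3.0 * P * Q
                       - np.einsum('ij...,jk...,ki...->...', S, S, S)
                       - 3.0 * np.einsum('ij...,jk...,ki...->...', Wr, Wr, S))
    D = (27.0 / 4.0) * R**2 + Q**3
    return D > 0.0    # Boolean field marking the vortical regions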
Applying any of these definitions, or any other, to identify coherent structures yields a Boolean field of the same size as the velocity field. This array contains a true value (1) if the point belongs to a structure and false (0) otherwise. The main contribution of this paper starts here: an algorithm is needed to obtain the individual structures. In Figure 1, several long low-velocity streaks (pink) are shown. These structures are very close to the wall, while the Q-structures (light blue) are present across the whole channel.
One very important point in these problems is the aforementioned percolation index, i.e., the constants in Equations (2) and (3). If these values are close to 0, a single structure with a porous shape appears; on the contrary, if the percolation index is too large, almost all structures are removed from the flow. In the literature [38], the criterion for choosing a percolation index is to achieve a volume ratio below 10% while maintaining an object identification ratio close to unity. This approach ensures that the identified structures represent a sufficient population and are spaced adequately to yield reliable statistical outcomes. On the other hand, this requires a very demanding study in which many different configurations must be tested.
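A percolation analysis of this kind can be sketched with SciPy's connected-component labeling (6-connectivity by default in 3D). The quantities monitored here, the largest-object volume fraction and the number of objects, are one possible choice inspired by the criteria of [38]; the array names are illustrative.

import numpy as np
from scipy import ndimage

def percolation_diagram(u, w, u_tau, alphas):
    # For each candidate threshold, record the volume fraction of the largest
    # object and the number of identified objects.
    results = []
    for alpha in alphas:
        mask = (u < 0.0) & (u**2 + w**2 > alpha * u_tau)
        labels, n_obj = ndimage.label(mask)        # 6-connectivity by default
        if n_obj == 0:
            results.append((alpha, 0.0, 0))
            continue
        sizes = np.bincount(labels.ravel())[1:]    # volume of every object
        results.append((alpha, sizes.max() / sizes.sum(), n_obj))
    return results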
The first point in defining a structure is to decide the connections among the different points of the structure. Figure 2 summarizes the two alternatives. As turbulence is a 3D problem, this point has to be considered in the structure’s definition. Note that any other way of defining any coherent structure is also valid. Here, these two definitions are employed to show how the algorithm works for two different cases.

2.1. Direct Queue

In the first case, Figure 2a, two points belong to the same structure if a path joins them following only the three coordinate axes. Paths running through the diagonals are also allowed in the 26-connectivity case, Figure 2b. In both cases, the results are similar: the second stencil produces fewer and slightly larger structures, but this effect is almost entirely restricted to the smallest structures.
The main idea in both cases is illustrated in the flow diagram presented in Figure 3. The algorithm takes a Boolean array, $w_k$, as input and produces an output array of the same size that stores the identified structures. The simplest way to organize the output is by replacing the true values with a unique identifier for each vortex. As will be shown later, this approach is easily parallelized. Moreover, identifying a specific structure can be done by simply reading one array in memory.
The code iterates over the mesh points. When a point with $w_k(x, y, z) = 1$ is found, the queue subalgorithm begins. A queue is created, with this point as the first entry. From there, the algorithm checks every possible direction, adding points to the queue if $w_k(x_1, y_1, z_1) = 1$, where $(x_1, y_1, z_1)$ are the coordinates of the new point. The search in that direction stops once $w_k(x_1, y_1, z_1) = 0$. To prevent adding the same point multiple times, $w_k(x_1, y_1, z_1)$ is set to 0 when the point is added to the queue.
Algorithm 1, the 3D queue method, describes the procedure to obtain the structures directly using these two stencils (a compact Python sketch of the same idea is given after the pseudocode). The main points of this algorithm are:
  • Lines 2–7. Define an array with the different directions one may follow. Six directions are needed for the first case and twenty-six for the second one.
  • Lines 8–15. Initialize memory and start checking all points. If $w_k$ is 0, the rest of the loop is skipped.
  • Lines 16–20. This point is the starting point of a vortex. Initialize the queue and three indices:
    np is the length of the vortex;
    nq is the point to be studied;
    sq is the index of the points added to the queue.
  • Line 20. The loop ends when all the points in the queue have been analyzed.
  • Lines 20–35. Main loop:
    Lines 21–22. Save this point in the structure register;
    Line 24. Change the Boolean to 0. This 0 avoids revisiting this point but seriously harms the chances of using OpenMP;
    Lines 26–32. Follow every possible direction, adding points to the queue. The Boolean is set to zero after including the point.
Algorithm 1 3D queue method
1:Input: Boolean matrix wk, dimensions nx, ny, nz, flag
2:Output: Integer matrix nodes, dimensions nx, ny, nz
3:if flag == 1 then
4:    Define dirs as a 6 × 3 matrix with the directions:
(1, 0, 0), (0, 1, 0), (0, 0, 1), (−1, 0, 0), (0, −1, 0), (0, 0, −1)
5:else
6:    Initialize dirs as a 26 × 3 matrix containing every possible direction
7:end if
8:ldirs = size of the first dimension of dirs
9:Initialize queue as a 3 × p_dim integer array of zeros
10:Initialize ind as 1
11:for k = 2 to ny-1 do
12:    for j = 1 to nz do
13:      for i = 1 to nx do
14:         if wk(i, j, k) == 0 then
15:           continue
16:         end if
17:         Assign queue(:,1) = [i; j; k]
18:         Initialize np = 1
19:         Initialize nq = 1
20:         Initialize sq = 1
21:         while nq <= sq do
22:           Assign vrtini = queue(:, nq)
23:           Assign nodes(vrtini) = ind
24:           Increment np by 1
25:           Assign wk(vrtini(1), vrtini(2), vrtini(3)) = 0
26:           for ld = 1 to ldirs do
27:              Compute vrt based on dirs(ld,:) and vrtini
28:              while vrt(3) != -1 and wk(vrt(1), vrt(2), vrt(3)) == 1 do
29:                Increment sq by 1
30:                Assign queue(:,sq) = vrt
31:                Assign wk(vrt(1), vrt(2), vrt(3)) = 0
32:                Update vrt to the next position in the same direction
33:              end while
34:           end for
35:           Increment nq by 1
36:         end while
37:         Increment ind by 1
38:      end for
39:    end for
40:end for
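A compact Python sketch of the same queue idea is given below (6-connectivity only, without the periodicity treatment). The array names are illustrative, and this is not the Fortran03 code used for the timings; unlike Algorithm 1, which follows each direction until the Boolean turns to 0, the sketch simply enqueues the immediate neighbors, but the resulting labeling is the same.

import numpy as np
from collections import deque

# 6-connectivity stencil (the "direct queue" case of Figure 2a)
DIRS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def label_structures(wk):
    """Replace the true values of the Boolean field wk by a structure identifier."""
    wk = wk.copy()
    nodes = np.zeros(wk.shape, dtype=np.int64)
    nx, ny, nz = wk.shape
    ind = 0
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                if not wk[i, j, k]:
                    continue
                ind += 1                       # new structure found
                queue = deque([(i, j, k)])
                wk[i, j, k] = False            # mark as visited
                while queue:
                    p = queue.popleft()
                    nodes[p] = ind
                    for di, dj, dk in DIRS:
                        q = (p[0] + di, p[1] + dj, p[2] + dk)
                        if (0 <= q[0] < nx and 0 <= q[1] < ny
                                and 0 <= q[2] < nz and wk[q]):
                            wk[q] = False      # avoid enqueueing a point twice
                            queue.append(q)
    return nodes, ind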
After this algorithm, one obtains all the individual structures of the field. To analyze them, a second set of routines was produced, including, for example, one algorithm that saves these data as a structure array in Matlab (or its equivalent in any other language) with the following fields (a minimal sketch of building such a record is given after the list):
  • pointsx: Array containing the x-coordinate;
  • pointsy: Array containing the y-coordinate;
  • pointsz: Array containing the z-coordinate;
  • maxx: Maximum value of x;
  • maxy: Maximum value of y;
  • maxz: Maximum value of z;
  • minx: Minimum value of x;
  • miny: Minimum value of y;
  • minz: Minimum value of z;
  • nx: Problem size in x direction;
  • ny: Problem size in y direction;
  • nz: Problem size in z direction.
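A sketch of how such a per-structure record could be assembled from the labeled field is shown below; the field names follow the list above, and the use of a plain Python dictionary instead of a Matlab structure array is purely illustrative.

import numpy as np

def extract_records(nodes):
    # nodes is the integer field produced by the labeling algorithm.
    # (One full scan per identifier: simple, but not efficient for large fields.)
    nx, ny, nz = nodes.shape
    records = []
    for ident in range(1, int(nodes.max()) + 1):
        px, py, pz = np.nonzero(nodes == ident)
        if px.size == 0:
            continue                           # identifier removed by a merge
        records.append({
            "pointsx": px, "pointsy": py, "pointsz": pz,
            "maxx": px.max(), "maxy": py.max(), "maxz": pz.max(),
            "minx": px.min(), "miny": py.min(), "minz": pz.min(),
            "nx": nx, "ny": ny, "nz": nz,
        })
    return records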
This algorithm does not account for the periodicity of the box in the x and z directions. A 2D example is shown in Figure 4. Creating a separate routine to merge these vortices is more efficient than incorporating the code directly into Algorithm 1. The z-direction version of this code is presented in Algorithm 2. This algorithm uses information stored in the structure, such as vortex(1).nz. Additionally, this algorithm is iterative, as new merges may occur after some structures have been combined.
This second algorithm has three very different parts:
  • Lines 3–14. Identify the structures that contain at least one point at the domain's left or right boundary in z;
  • Lines 17–39. Check whether, for the same value of y, there is a point in common for any left-right pair;
  • Lines 40–47. The algorithm ends here if no structures to join are found (line 41); otherwise, the structures are joined (line 44), the right ones are removed from the list, and the function is called again. A simplified, field-based sketch of this periodicity treatment is given after the pseudocode.
Algorithm 2 Check Vortex Periodicity in the z-Direction
1:Input: Array of vortex structures vortex and its length n
2:Output: Modified array of vortex structures with periodicity in z checked
3:Initialize indl = 0
4:Initialize indr = 0
5:for ix = 1 to n do
6:    if vortex(ix).minz == 1 then
7:      indl = indl + 1
8:      left(indl) = ix
9:    end if
10:    if vortex(ix).maxz == vortex(ix).nz then
11:      indr = indr + 1
12:      right(indr) = ix
13:    end if
14:end for
15:Initialize join = 0
16:Initialize ind = 1
17:for ii = 1 to indl do
18:    pyl = vortex(left(ii)).pointsy
19:    pyz = vortex(left(ii)).pointsz
20:    I = find(pyz == 1)
21:    pyl = pyl(I)
22:    for jj = 1 to indr do
23:      if left(ii) == right(jj) then
24:         continue
25:      end if
26:      pyr = vortex(right(jj)).pointsy
27:      pyz = vortex(right(jj)).pointsz
28:      I = find(pyz == vortex(1).nz)
29:      pyr = pyr(I)
30:      isContained = any(ismember(pyl, pyr))
31:      if isContained then
32:         join = 1
33:         IL(ind) = left(ii)
34:         IR(ind) = right(jj)
35:         ind = ind + 1
36:         Print "join left(ii) and right(jj)"
37:      end if
38:    end for
39:end for
40:if join == 0 then
41:    return
42:end if
43:for ii = 1 to ind - 1 do
44:    vortex(IL(ii)) = joinVortex(vortex(IL(ii)), vortex(IR(ii)))
45:end for
46:Remove vortex(IR) from the list
47:Recursively call checkPeriodicityz(vortex)
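The same periodicity treatment can also be written directly on the labeled field rather than on the record list. The following sketch merges identifiers across the periodic z boundary and iterates until no further merges occur, mirroring the recursion of Algorithm 2; it is a simplified variant, not the Matlab routine.

import numpy as np

def merge_periodic_z(nodes):
    # Merge identifiers across the periodic z boundary by comparing the first
    # and last z-planes of the labeled field; iterate until no merge remains.
    changed = True
    while changed:
        changed = False
        first, last = nodes[:, :, 0], nodes[:, :, -1]
        touching = (first != 0) & (last != 0) & (first != last)
        pairs = set(zip(last[touching].tolist(), first[touching].tolist()))
        for a, b in pairs:
            nodes[nodes == a] = b      # relabel the wrapped structure
            changed = True
            break                      # recompute the planes after each merge
    return nodes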

2.2. A Parallel Algorithm

The previous algorithms were implemented sequentially because it is possible to add points to the structure in any direction. However, to tackle large problems, it is imperative to use parallel computing.
One initial approach would be to utilize shared-memory libraries, with OpenMP being the most popular option [39]. In this framework, all processors have access to the entire memory. However, this is inappropriate in this case, as processor A can access $w_k(x, y, z)$ and change its value to 0, which would stop the calculations of any other processor traversing that point. Unfortunately, this could also be true for GPU architectures, as they rely on a similar shared-memory paradigm.
The proposed approach involves utilizing the Message Passing Interface (MPI) [40]. With MPI, each processor has access solely to its own memory, enabling efficient parallel computation. The data can be managed through the parallel input/output library HDF5 [41]. Consequently, segments of the entire array $w_k$ can be distributed to the processors, as illustrated in Figure 5. The maximum number of processors, np, is the number of planes in y, which is typically large, thus avoiding memory problems. Each processor then uses Algorithm 1 to find its local structures and writes them to disk.
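A minimal sketch of this distribution step with mpi4py and h5py is shown below, assuming an h5py build with parallel HDF5 support; the file name "field.h5", the dataset name "wk", and the (nx, ny, nz) ordering are illustrative assumptions, not the authors' setup.

import h5py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, nproc = comm.Get_rank(), comm.Get_size()

# Each rank reads only its own contiguous block of y-planes.
with h5py.File("field.h5", "r", driver="mpio", comm=comm) as f:
    ny = f["wk"].shape[1]
    y0 = rank * ny // nproc
    y1 = (rank + 1) * ny // nproc
    wk_local = f["wk"][:, y0:y1, :]

# The local block is then labeled with the sequential routine (Algorithm 1),
# e.g. the label_structures sketch given above.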
The key point is that each processor starts numbering its structures from a large offset, $n_{par}$, which must be fixed a priori. This situation is represented in the lower part of Figure 5, where two blocks come from two different processors: the blue one starts at 1 and the brown one at $n_{par}$. A sequential algorithm, Algorithm 3, reads the intersection planes and identifies the common structures, renumbering those of the second block. This procedure is very fast, as it only needs to read two planes. The chosen implementation uses a recursive algorithm, as several disjoint structures of Processor 2 can be linked to the same structure of Processor 1. Thus, whenever two structures are identified as the same (both are non-zero at the same coordinates and the index is not the same), the code stops searching and calls itself again (a vectorized, non-recursive sketch of this step is given after the pseudocode).
Algorithm 3 Join Slices
1:Input: Plane1, Plane2
2:Output: Plane1, Plane2
3:[nx, nz] = size(plane1)
4:join = 0
5:for kk = 1 to nz do
6:    for ii = 1 to nx do
7:      if (plane1(ii, kk) · plane2(ii, kk)≠ 0) and (plane1(ii, kk) - plane2(ii, kk)≠ 0) then
8:         ind1 = plane1(ii, kk)
9:         ind2 = plane2(ii, kk)
10:         plane2(plane2 == ind2) = ind1
11:         join = 1
12:         break
13:      end if
14:    end for
15:    if join == 1 then
16:      break
17:    end if
18:end for
19:if join == 1 then
20:    (plane1, plane2) = joinSlices(plane1, plane2)
21:end if
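The renumbering step of Algorithm 3 can be sketched in a vectorized, non-recursive way: every structure identifier of the second plane is mapped to the identifier of the first-plane structure it touches. This relies on the two processors using disjoint identifier ranges, which the offset $n_{par}$ guarantees; it is a sketch, not the Fortran routine.

import numpy as np

def join_slices(plane1, plane2):
    # Give every structure of plane2 the identifier of the plane1 structure it
    # touches.  Disjoint identifier ranges prevent the relabeling from cascading.
    overlap = (plane1 != 0) & (plane2 != 0) & (plane1 != plane2)
    mapping = {}
    for i, k in np.argwhere(overlap):
        mapping.setdefault(int(plane2[i, k]), int(plane1[i, k]))
    for old, new in mapping.items():
        plane2[plane2 == old] = new
    return plane1, plane2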

3. Results

The data used in this section are outlined in Table 1, together with a reference to the paper in which they were used. They were obtained as explained in the Methods section. The largest case, P2000, needs 60 GB of memory if the Boolean field is already given; if this Boolean has to be computed, the memory requirement is close to 200 GB in the most favorable case. This number rises to 1.5 TB for the largest simulations available today [9,10], to which this code will be applied in future works. Several Reynolds numbers and meshes were used to account for the growing complexity of the problem. In every case, the resolution is adequate [35,42].
The main results of the algorithm, including the timings, are given in Table 2. The algorithm has been implemented in Fortran 2003. Two different computers were used: all cases up to and including P1000 ran on an Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20 GHz with 128 GB of RAM, while the P2000 case ran on three nodes with Intel Xeon Platinum 8174 processors and 780 GB of RAM each.
The validation of the code has two parts. The first one concerns the computation of the different physical magnitudes that define the structures; this validation was carried out in previous works, particularly [3,27]. The second one involves the procedure that goes from the Boolean field to the individual structures.
The first verification was made using several Boolean fields prepared ad hoc. Once the code was carefully checked, the first test was performed using the data of the smallest case, for which it is very easy to see all the connections and check every result. In every case, the parallel algorithm gave exactly the same values as the sequential one; not a single point was lost.
As shown in Table 2, the Sequential 26 case is slightly slower because there are more connections to follow. This larger computational stencil leads to a smaller number of structures, but they are far larger [36].
To further demonstrate the algorithm's scaling and to avoid working with dimensional quantities, the strong scaling of the algorithm has been computed. To do so, the time of the sequential case is taken as a reference, and all times are normalized by this value, yielding Table 3. As can be seen, the values in this table are very close to the number of processors, indicating excellent strong scaling.
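In other words, the entries of Table 3 are the strong-scaling speedups $S_{n_p} = T_1 / T_{n_p}$, where $T_1$ is the sequential (6-connectivity) time of Table 2; for the P1000 case on four processors, for instance, $S_4 = 42.5/7.79 \approx 5.46$.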
Finally, the P2000 case was tested on a different machine. The memory required for this case exceeds 200 GB, making it impossible to run the sequential algorithm. The results, shown in Table 4, are excellent: the algorithm exhibits near-perfect strong scaling, and the observed over-scaling is likely due to the smaller size of the local structures on each processor. Given the high scalability of the code, it can certainly be used to obtain the turbulent structures of a flow while the simulation is running.

4. Conclusions

This study has explored three different methods for identifying coherent structures within turbulent flows as they are simulated, thereby addressing a key obstacle in DNS: the immense data storage requirements. While effective for small-scale problems, sequential algorithms cannot be used for large problems due to memory and time constraints.
The parallel algorithm proposed in this study offers a promising solution to this challenge. Leveraging multiple processors significantly accelerates the process of identifying and analyzing turbulent structures within large datasets. The algorithm has shown excellent strong scaling for small to large problems.
This parallel method improves computational efficiency and makes it feasible to handle the massive data generated by DNS, enabling more detailed and extensive studies of turbulence. In particular, the authors will continue using this algorithm to understand the problems in the accuracy of DNS and the dynamics of turbulent flows.

Author Contributions

Conceptualization, S.G.-B. and S.H.; methodology, S.G.-B. and S.H.; software, A.C., S.G.-B. and S.H.; validation, M.J.P.-Q., S.G.-B. and S.H.; formal analysis, S.G.-B.; investigation, S.G.-B.; resources, M.J.P.-Q. and S.H.; data curation, S.H.; writing—original draft preparation, S.G.-B. and S.H.; writing—review and editing, A.C., R.V., M.J.P.-Q., S.G.-B. and S.H.; visualization, A.C. and S.H.; supervision, M.J.P.-Q.; project administration, M.J.P.-Q.; funding acquisition, R.V., S.H. and M.J.P.-Q. All authors have read and agreed to the published version of the manuscript.

Funding

S.H. and M.J.P.-Q. acknowledge the project PID2021-128676OB-I00 by MCIN/AEI/10.13039/501100011033 and by “ERDF A way of making Europe”, by the European Union (SH). R.V. acknowledges the financial support from ERC grant ‘2021-CoG-101043998, DEEPCONTROL’. The views and opinions expressed are, however, those of the authors only and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them.

Data Availability Statement

The data and codes are available upon reasonable request by emailing Sergio Hoyas.

Acknowledgments

S.H. is grateful to their students of “Introduction to Supercomputation” in the Master of Aerospace Engineering for their invaluable questions and inputs.

Conflicts of Interest

The authors declare no conflicts of interest.

Use of Artificial Intelligence

AI or AI-assisted tools were not used in drafting any aspect of this manuscript.

Abbreviations

The following abbreviations are used in this manuscript:
DNS     Direct Numerical Simulation
GB      Gigabyte
HDF5    Hierarchical Data Format
MPI     Message Passing Interface
RAM     Random Access Memory
TB      Terabyte

References

  1. Jiménez, J. Near-wall turbulence. Phys. Fluids 2013, 25, 101302. [Google Scholar] [CrossRef]
  2. Jiménez, J. Coherent structures in wall-bounded turbulence. J. Fluid Mech. 2018, 842, P1. [Google Scholar] [CrossRef]
  3. Cremades, A.; Hoyas, S.; Deshpande, R.; Quintero, P.; Lellep, M.; Lee, W.J.; Monty, J.P.; Hutchins, N.; Linkmann, M.; Marusic, I.; et al. Identifying regions of importance in wall-bounded turbulence through explainable deep learning. Nat. Commun. 2024, 15, 3864. [Google Scholar] [CrossRef]
  4. Eivazi, H.; Tahani, M.; Schlatter, P.; Vinuesa, R. Physics-informed neural networks for solving Reynolds-averaged Navier–Stokes equations. Phys. Fluids 2022, 34, 075117. [Google Scholar] [CrossRef]
  5. Farwig, R. From Jean Leray to the millennium problem: The Navier–Stokes equations. J. Evol. Equations 2021, 21, 3243–3263. [Google Scholar] [CrossRef]
  6. Pope, S.B. Turbulent Flows; Cambridge University Press: Cambridge, UK, 2000. [Google Scholar]
  7. Kim, J.; Moin, P.; Moser, R. Turbulence statistics in fully developed channels flows at low Reynolds numbers. J. Fluid Mech. 1987, 177, 133–166. [Google Scholar] [CrossRef]
  8. Canuto, C.; Hussaini, M.Y.; Quarteroni, A.M.; Thomas, A., Jr. Spectral Methods in Fluid Dynamics; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012. [Google Scholar]
  9. Hoyas, S.; Oberlack, M.; Alcántara-Ávila, F.; Kraheberger, S.V.; Laux, J. Wall turbulence at high friction Reynolds numbers. Phys. Rev. Fluids 2022, 7, 014602. [Google Scholar] [CrossRef]
  10. Hoyas, S.; Oberlack, M. Turbulent Couette flow up to Reτ = 2000. J. Fluid Mech. 2024, 987, R9. [Google Scholar] [CrossRef]
  11. von Kármán, T. Mechanische Ähnlichkeit und Turbulenz. Nachr. Ges. Wiss. Göttingen 1930, 1930, 58–76. [Google Scholar]
  12. Reynolds, O. An experimental investigation of the circumstances which determine whether the motion of water shall be direct or sinuous, and of the law of resistance in parallel channels. Proc. R. Soc. Lond. 1883, 174, 935–982. [Google Scholar]
  13. Kolmogorov, A.N. Local structure of turbulence in an incompressible fluid at very high Reynolds numbers. Dokl. Akad. Nauk SSSR 1941, 30, 9–13. [Google Scholar] [CrossRef]
  14. Kolmogorov, A.N. Dissipation of energy in isotropic turbulence. Dokl. Akad. Nauk SSSR 1941, 32, 19–21. [Google Scholar]
  15. von Kármán, T.; Howarth, L. On the statistical theory of isotropic turbulence. Proc. R. Soc. Lond. Ser. A-Math. Phys. Sci. 1938, 164, 192–215. [Google Scholar]
  16. Nagib, H.; Vinuesa, R.; Hoyas, S. Utilizing indicator functions with computational data to confirm nature of overlap in normal turbulent stresses: Logarithmic or quarter-power. Phys. Fluids 2024, 36, 075145. [Google Scholar] [CrossRef]
  17. Oberlack, M. Symmetrie, Invarianz und Selbstähnlichkeit in der Turbulenz; Shaker Aachen: Düren, Germany, 2000. [Google Scholar]
  18. Oberlack, M.; Hoyas, S.; Kraheberger, S.V.; Alcántara-Ávila, F.; Laux, J. Turbulence Statistics of Arbitrary Moments of Wall-Bounded Shear Flows: A Symmetry Approach. Phys. Rev. Lett. 2022, 128, 024502. [Google Scholar] [CrossRef]
  19. Lozano-Durán, A.; Flores, O.; Jiménez, J. The three-dimensional structure of momentum transfer in turbulent channels. J. Fluid Mech. 2012, 694, 100–130. [Google Scholar] [CrossRef]
  20. Kline, S.J.; Reynolds, W.C.; Schraub, F.A.; Runstadler, P.W. The structure of turbulent boundary layers. J. Fluid Mech. 1967, 30, 741–773. [Google Scholar] [CrossRef]
  21. Lu, S.S.; Willmarth, W.W. Measurements of the structure of the Reynolds stress in a turbulent boundary layer. J. Fluid Mech. 1973, 60, 481–511. [Google Scholar] [CrossRef]
  22. Hamilton, J.; Kim, J.; Waleffe, F. Regeneration mechanisms of near-wall turbulence structures. J. Fluid Mech. 1995, 287, 317–348. [Google Scholar] [CrossRef]
  23. Chong, M.; Perry, A.; Cantwell, B. A general classification of three-dimensional flow fields. Phys. Fluids A 1990, 2, 765–777. [Google Scholar] [CrossRef]
  24. Atzori, M.; Vinuesa, R.; Lozano-Durán, A.; Schlatter, P. Intense Reynolds-stress events in turbulent ducts. Int. J. Heat Fluid Flow 2021, 89, 108802. [Google Scholar] [CrossRef]
  25. Lozano-Durán, A.; Arranz, G. Information-theoretic formulation of dynamical systems: Causality, modeling, and control. Phys. Rev. Res. 2022, 4, 023195. [Google Scholar] [CrossRef]
  26. Osawa, K.; Jiménez, J. Causal features in turbulent channel flow. arXiv 2024, arXiv:2405.15674. [Google Scholar]
  27. Gandía-Barberá, S.; Alcántara-Ávila, F.; Hoyas, S.; Avsarkisov, V. Stratification effect on extreme-scale rolls in plane Couette flows. Phys. Rev. Fluids 2021, 6, 034605. [Google Scholar] [CrossRef]
  28. Sánchez-Roncero, A.; Garibo-i Orts, Ò.; Conejero, J.A.; Eivazi, H.; Mallor, F.; Rosenberg, E.; Fuso-Nerini, F.; García-Martínez, J.; Vinuesa, R.; Hoyas, S. The sustainable development goals and aerospace engineering: A critical note through artificial intelligence. Results Eng. 2023, 17, 100940. [Google Scholar] [CrossRef]
  29. Gandía-Barberá, S.; Hoyas, S.; Oberlack, M.; Kraheberger, S. The link between the Reynolds shear stress and the large structures of turbulent Couette-Poiseuille flow. Phys. Fluids 2018, 30, 041702. [Google Scholar] [CrossRef]
  30. Hoyas, S.; Jiménez, J. Reynolds number effects on the Reynolds-stress budgets in turbulent channels. Phys. Fluids 2008, 20, 101511. [Google Scholar] [CrossRef]
  31. Avsarkisov, V.; Hoyas, S.; Oberlack, M.; García-Galache, J. Turbulent plane Couette flow at moderately high Reynolds number. J. Fluid Mech. 2014, 751, R1. [Google Scholar] [CrossRef]
  32. Lele, S.K. Compact finite difference schemes with spectral-like resolution. J. Comput. Phys. 1992, 103, 16–42. [Google Scholar] [CrossRef]
  33. Spalart, P.R.; Moser, R.D.; Rogers, M.M. Spectral methods for the Navier–Stokes equations with one infinite and two periodic directions. J. Comput. Phys. 1991, 96, 297–324. [Google Scholar] [CrossRef]
  34. Lluesma-Rodríguez, F.; Álcantara Ávila, F.; Pérez-Quiles, M.; Hoyas, S. A code for simulating heat transfer in turbulent channel flow. Mathematics 2021, 9, 756. [Google Scholar] [CrossRef]
  35. Hoyas, S.; Vinuesa, R.; Schmid, P.; Nagib, H. Sensitivity study of resolution and convergence requirements for the extended overlap region in wall-bounded turbulence. Phys. Rev. Fluids 2024, 9, L082601. [Google Scholar] [CrossRef]
  36. Lozano-Durán, A.; Jiménez, J. Time-resolved evolution of coherent structures in turbulent channels: Characterization of eddies and cascades. J. Fluid Mech. 2014, 759, 432–471. [Google Scholar] [CrossRef]
  37. Bae, H.J.; Lee, M. Life cycle of streaks in the buffer layer of wall-bounded turbulence. Phys. Rev. Fluids 2021, 6, 064603. [Google Scholar] [CrossRef]
  38. del Álamo, J.C.; Jiménez, J.; Zandonade, P.; Moser, R. Self-similar vortex clusters in the turbulent logarithmic region. J. Fluid Mech. 2006, 561, 329–358. [Google Scholar] [CrossRef]
  39. Dagum, L.; Menon, R. OpenMP: An industry standard API for shared-memory programming. IEEE Comput. Sci. Eng. 1998, 5, 46–55. [Google Scholar] [CrossRef]
  40. Snir, M.; Otto, S.W.; Walker, D.W.; Dongarra, J.J.; Huss-Lederman, S. MPI: The Complete Reference; MIT Press: Cambridge, MA, USA, 1996. [Google Scholar]
  41. The HDF Group. Hierarchical Data Format, version 5; The HDF Group: Champaign, IL, USA, 1997. [Google Scholar]
  42. Vinuesa, R.; Hites, M.; Wark, C.; Nagib, H. Documentation of the role of large-scale structures in the bursting process in turbulent boundary layers. Phys. Fluids 2015, 27, 105107. [Google Scholar] [CrossRef]
Figure 1. Sketch of the geometry of a turbulent channel showing Q-events (light blue) and low-velocity streaks (pink). The flow goes from the left to the right, and the wall is at y / h = 0 . The flow is periodic in the x and z directions. Only half of the channel is shown to increase visibility.
Figure 2. Computational stencils for the definition of structures. (a) Main axis stencil. (b) Main axis and diagonal stencil.
Figure 3. Flow diagram of the main algorithm. The input is the Boolean array wk. The output is another array, named nodes, containing the structures.
Figure 4. Several distinct structures are identified in an x-slice, i.e., in a y z plane. Note that structure v1 is influenced by the periodicity in the z direction. The walls are located at y = 1 and y = 201 . The colors used to differentiate the structures become darker from left to right and bottom to top. Due to periodicity, the right portion of v1 appears green.
Figure 5. Parallel computing of the structures, distributing the data in blocks. This method requires a sequential algorithm to join the different structures that the processors have found.
Table 1. Summary of cases. The number of points is given in physical space for every direction. L x and L z stand for the lengths of the computational domain in x and z, respectively. The friction Reynolds number is given in the name of every case. The main memory needed is given in GB for the cases in which the Boolean is given or it has to be computed.
Case     nx     ny    nz     Lx/h   Lz/h   Mem B (GB)   Total (GB)   Reference
P125S    192    201   96     2π     π      0.01         0.04         [3]
P125B    384    201   288    8π     3π     0.21         0.62         [29]
P250     768    251   1152   8π     6π     0.83         2.48         [27]
P550     1536   251   1152   8π     3π     1.65         4.96         [16]
P1000    3072   383   2304   8π     3π     10.10        30.30        [16]
P2000    6144   633   4608   8π     3π     66           200          [30]
Table 2. Absolute times in seconds. The first two columns result from the sequential algorithm for the 6D and 26D cases. The other columns correspond to 2, 4, 8, 16, and 32 processors, always for the 6D case.
Case     P1-6D   P1-26D   P2       P4        P8        P16       P32
P125S    0.03    0.08     0.0178   0.00855   0.00388   0.00166   0.00125
P125B    0.21    0.43     0.0875   0.0487    0.02461   0.00995   0.007368
P250     2.35    4.9      0.99     0.5       0.23      0.11      0.0774
P550     4.96    10.03    2.06     1.1       0.53      0.24      0.16
P1000    42.5    84.6     14.6     7.79      4.21      1.95      1.16
Table 3. Performance results across different processor counts based on the results for the sequential case.
Case     P1     P2     P4     P8      P16     P32
P125S    1.00   2.12   4.42   9.74    22.77   30.24
P125B    1.00   2.40   4.31   8.53    21.11   28.50
P250     1.00   2.37   4.70   10.22   21.36   30.36
P550     1.00   2.33   4.35   9.04    19.96   29.94
P1000    1.00   2.91   5.46   10.10   21.79   36.64
Table 4. Performance results for case P2000 across different processor counts.
Case               P16     P32     P64     P128
P2000 (seconds)    12.75   5.34    2.18    0.88
P2000 (scaling)    16.00   38.20   93.58   231.82
