# Efficient and Scalable Initialization of Partitioned Coupled Simulations with preCICE

## Abstract

## 1. Introduction

## 2. Two-Level Initialization

## 3. R-Tree-Based Mesh Filtering

We use the `rtree` data structure from Boost.Geometry [22] with the R*-tree insertion strategy, which stores spatial data in a hierarchical data structure. The R*-tree insertion strategy aims at minimizing the area, margin, and overlap of tree nodes, which leads to good average query performance (compared to linear and quadratic R-trees) at a higher construction cost [23]. The insertion strategy can be parameterized with the maximum and minimum number of elements per tree node; we use the default values of Boost.Geometry (16 and $0.3\cdot 16$), similarly to [12]. The R*-tree and the R-tree have the same complexities: an insertion complexity of $O(\log n)$ and a query complexity of $O(\log n)$ for $n$ spatially distributed data points [24]. As meshes are, in general, immutable in preCICE, we choose the R*-tree because its higher construction cost pays off in terms of improved lookup performance. The Boost.Geometry data structure provides k-nearest-neighbor queries, which the nearest-neighbor mapping and the nearest-projection mapping require, in $O(k\log n)$. The data structure also allows for bounding-box queries, which the radial-basis-function mapping requires, in $O(\log n)$.

The **nearest-neighbor data mapping** searches for the nearest vertex in ${V}^{S}$ for each vertex of ${V}^{O}$. This requires inserting all vertices of ${V}^{S}$ into the R-tree, resulting in a complexity of $O({n}_{v}^{S}\log {n}_{v}^{S})$. Afterwards, we query for the nearest neighbor of each vertex of ${V}^{O}$ at a total cost of $O({n}_{v}^{O}\log {n}_{v}^{S})$.
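
The semantics of this mapping can be illustrated with a minimal Python sketch. It deliberately replaces the R*-tree with a brute-force linear scan, so each query costs $O({n}_{v}^{S})$ instead of $O(\log {n}_{v}^{S})$; the function names and the sample coordinates are hypothetical and not part of preCICE.

```python
import math

def nearest_neighbor_indices(search_vertices, origin_vertices):
    """For each origin vertex, return the index of the closest search vertex.

    Brute-force stand-in for the R-tree lookup (assumption for illustration).
    """
    indices = []
    for v_o in origin_vertices:
        best = min(range(len(search_vertices)),
                   key=lambda i: math.dist(search_vertices[i], v_o))
        indices.append(best)
    return indices

def map_consistent(search_values, indices):
    """Consistent mapping: copy the value of the nearest search vertex."""
    return [search_values[i] for i in indices]

# Hypothetical 2D sample data
search_vertices = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
search_values = [10.0, 20.0, 30.0]
origin_vertices = [(0.1, 0.2), (1.4, -0.1), (1.6, 0.3)]

indices = nearest_neighbor_indices(search_vertices, origin_vertices)
mapped = map_consistent(search_values, indices)
print(indices)  # [0, 1, 2]
print(mapped)   # [10.0, 20.0, 30.0]
```

Because the index list depends only on the geometry, it can be computed once during initialization and reused in every time step, which is why the lookup cost matters for initialization rather than for the run time of the coupled simulation.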

The **nearest-projection data mapping** searches for the nearest primitive (vertex, edge, or triangle) of ${V}^{S}$ for each vertex of ${V}^{O}$. We use a cascading scheme: for $d=3$ (see Figure 4), the mapping first attempts to find an orthogonal projection onto a triangle; if this fails, it tries to compute an orthogonal projection onto an edge; and finally, if this also fails, it selects the closest vertex. For $d=2$, the scheme starts by projecting onto an edge.

- we fetch the k nearest primitives of degree m of vertex ${v}^{S}$,
- we sort them by ascending distance to ${v}^{S}$,
- we process the list of primitives and calculate interpolation weights ${w}_{1},\dots ,{w}_{m}$ for data interpolation from the primitive's vertices to the projection point on the respective plane or line,
- we terminate and **return** the current primitive if all weights are non-negative, ${w}_{i}\ge 0\;\forall i=1,\dots ,m$,
- we **return** the nearest projection of degree $m-1$ of vertex u if no valid projection of degree m could be found.
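
The cascade listed above can be sketched for the case $d=2$, where the primitives of degree $m=2$ are edges and the fallback of degree $m-1$ is the nearest vertex. This is a minimal illustration under stated assumptions: the candidate edges are found by a linear scan instead of an R-tree query, edges are assumed to have non-zero length, and all names (`edge_weights`, `nearest_projection`, the default `k=4`) are hypothetical rather than taken from preCICE.

```python
import math

def edge_weights(p, a, b):
    """Interpolation weights (w_a, w_b) of the orthogonal projection of p onto line ab."""
    ab = (b[0] - a[0], b[1] - a[1])
    ap = (p[0] - a[0], p[1] - a[1])
    t = (ap[0] * ab[0] + ap[1] * ab[1]) / (ab[0] ** 2 + ab[1] ** 2)
    return (1.0 - t, t)

def segment_distance(p, a, b):
    """Distance from p to the closest point of the segment ab."""
    t = min(1.0, max(0.0, edge_weights(p, a, b)[1]))
    closest = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
    return math.dist(p, closest)

def nearest_projection(p, vertices, edges, k=4):
    """Return a list of (vertex index, weight) pairs for point p."""
    # Fetch the k nearest edges, sorted by ascending distance (linear scan here).
    candidates = sorted(
        edges,
        key=lambda e: segment_distance(p, vertices[e[0]], vertices[e[1]]),
    )[:k]
    # Accept the first edge whose weights are all non-negative.
    for i, j in candidates:
        w_i, w_j = edge_weights(p, vertices[i], vertices[j])
        if w_i >= 0.0 and w_j >= 0.0:
            return [(i, w_i), (j, w_j)]
    # No valid edge projection: fall back to the nearest vertex (degree m-1).
    nearest = min(range(len(vertices)), key=lambda i: math.dist(p, vertices[i]))
    return [(nearest, 1.0)]

def interpolate(stencil, values):
    """Apply the weights of a stencil to vertex data."""
    return sum(w * values[i] for i, w in stencil)

# Hypothetical search mesh: three vertices connected by two edges
vertices = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
edges = [(0, 1), (1, 2)]
values = [10.0, 20.0, 30.0]

inside = nearest_projection((0.25, 0.2), vertices, edges)
outside = nearest_projection((-0.5, 0.3), vertices, edges)
print(inside)                       # [(0, 0.75), (1, 0.25)]
print(outside)                      # [(0, 1.0)]
print(interpolate(inside, values))  # 12.5
```

The second query lies beyond the end of the mesh, so every edge yields a negative weight and the sketch falls back to the closest vertex instead of extrapolating.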

## 4. Performance Results

#### 4.1. Test Case Description

We compare version `1.5.2` of preCICE with the current version `2.2.0`. Both were extended with additional time-measuring commands and are available in the public preCICE git repository (https://github.com/precice/precice/tree/performance-paper).

#### 4.2. Hardware Description

#### 4.3. Performance Analysis

#### 4.3.1. Strong Scaling

#### 4.3.2. Weak Scaling

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

1. Cinquegrana, D.; Vitagliano, P.L. Validation of a new fluid-structure interaction framework for non-linear instabilities of 3D aerodynamic configurations. J. Fluids Struct. **2021**, 103, 103264.
2. Naseri, A.; Totounferoush, A.; González, I.; Mehl, M.; Pérez-Segarra, C.D. A scalable framework for the partitioned solution of fluid–structure interaction problems. Comput. Mech. **2020**, 66, 471–489.
3. Totounferoush, A.; Naseri, A.; Chiva, J.; Oliva, A.; Mehl, M. A GPU Accelerated Framework for Partitioned Solution of Fluid-Structure Interaction Problems. In Proceedings of the 14th WCCM-ECCOMAS Congress 2020, online, 11–15 January 2021; Volume 700.
4. Jaust, A.; Weishaupt, K.; Mehl, M.; Flemisch, B. Partitioned coupling schemes for free-flow and porous-media applications with sharp interfaces. In Finite Volumes for Complex Applications IX - Methods, Theoretical Aspects, Examples; Klöfkorn, R., Keilegavlen, E., Radu, F.A., Fuhrmann, J., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 605–613.
5. Revell, A.; Afgan, I.; Ali, A.; Santasmasas, M.; Craft, T.; de Rosis, A.; Holgate, J.; Laurence, D.; Iyamabo, B.; Mole, A.; et al. Coupled hybrid RANS-LES research at the university of manchester. ERCOFTAC Bull. **2020**, 120, 67.
6. Bungartz, H.J.; Lindner, F.; Mehl, M.; Scheufele, K.; Shukaev, A.; Uekermann, B. Partitioned fluid-structure-acoustics interaction on distributed data: Coupling via preCICE. In Software for Exascale Computing - SPPEXA 2013–2015; Bungartz, H.J., Neumann, P., Nagel, E.W., Eds.; Springer: Cham, Switzerland, 2016.
7. Lindner, F.; Mehl, M.; Uekermann, B. Radial basis function interpolation for black-box multi-physics simulations. In Proceedings of the VII International Conference on Coupled Problems in Science and Engineering (CIMNE), Rhodes Island, Greece, 12–14 June 2017; pp. 50–61.
8. Mehl, M.; Uekermann, B.; Bijl, H.; Blom, D.; Gatzhammer, B.; Zuijlen, A. Parallel coupling numerics for partitioned fluid-structure interaction simulations. Comput. Math. Appl. **2016**, 71, 869–891.
9. Scheufele, K.; Mehl, M. Robust multisecant Quasi-Newton variants for parallel Fluid-Structure simulations - and other multiphysics applications. SIAM J. Sci. Comput. **2017**, 39, S404–S433.
10. Haelterman, R.; Bogaers, A.; Uekermann, B.; Scheufele, K.; Mehl, M. Improving the performance of the partitioned QN-ILS procedure for fluid-structure interaction problems: Filtering. Comput. Struct. **2016**, 171, 9–17.
11. Uekermann, B. Partitioned Fluid-Structure Interaction on Massively Parallel Systems. Ph.D. Thesis, Department of Informatics, Technical University of Munich, Munich, Germany, 2016.
12. Lindner, F. Data Transfer in Partitioned Multi-Physics Simulations: Interpolation and Communication. Ph.D. Thesis, University of Stuttgart, Stuttgart, Germany, 2019.
13. Lindner, F.; Totounferoush, A.; Mehl, M.; Uekermann, B.; Pour, N.E.; Krupp, V.; Roller, S.; Reimann, T.; Sternel, D.C.; Egawa, R.; et al. ExaFSA: Parallel Fluid-Structure-Acoustic Simulation. In Software for Exascale Computing - SPPEXA 2016–2019; Springer: Cham, Switzerland, 2020; pp. 271–300.
14. Wolf, K.; Bayrasy, P.; Brodbeck, C.; Kalmykov, I.; Oeckerath, A.; Wirth, N. MpCCI: Neutral interfaces for multiphysics simulations. In Scientific Computing and Algorithms in Industrial Simulations; Springer: Cham, Switzerland, 2017; pp. 135–151.
15. Joppich, W.; Kürschner, M. MpCCI - A tool for the simulation of coupled applications. Concurr. Comput. Pract. Exp. **2006**, 18, 183–192.
16. Slattery, S.; Wilson, P.; Pawlowski, R. The data transfer kit: A geometric rendezvous-based tool for multiphysics data transfer. In Proceedings of the International Conference on Mathematics & Computational Methods Applied to Nuclear Science & Engineering (M&C 2013), Sun Valley, ID, USA, 5–9 May 2013; pp. 5–9.
17. Plimpton, S.J.; Hendrickson, B.; Stewart, J.R. A parallel rendezvous algorithm for interpolation between multiple grids. J. Parallel Distrib. Comput. **2004**, 64, 266–276.
18. Duchaine, F.; Jauré, S.; Poitou, D.; Quémerais, E.; Staffelbach, G.; Morel, T.; Gicquel, L. Analysis of high performance conjugate heat transfer with the openpalm coupler. Comput. Sci. Discov. **2015**, 8, 015003.
19. Tang, Y.H.; Kudo, S.; Bian, X.; Li, Z.; Karniadakis, G.E. Multiscale universal interface: A concurrent framework for coupling heterogeneous solvers. J. Comput. Phys. **2015**, 297, 13–31.
20. Thomas, D.; Cerquaglia, M.L.; Boman, R.; Economon, T.D.; Alonso, J.J.; Dimitriadis, G.; Terrapon, V.E. CUPyDO - An integrated Python environment for coupled fluid-structure simulations. Adv. Eng. Softw. **2019**, 128, 69–85.
21. De Boer, A.; van Zuijlen, A.; Bijl, H. Comparison of conservative and consistent approaches for the coupling of non-matching meshes. Comput. Methods Appl. Mech. Eng. **2008**, 197, 4284–4297.
22. Boost. Boost Library. Available online: http://www.boost.org/ (accessed on 15 April 2021).
23. Beckmann, N.; Kriegel, H.P.; Schneider, R.; Seeger, B. The R*-tree: An efficient and robust access method for points and rectangles. In Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data, Atlantic City, NJ, USA, 23–25 May 1990; pp. 322–331.
24. Guttman, A. R-trees: A dynamic index structure for spatial searching. In Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data, Boston, MA, USA, 18–21 June 1984; pp. 47–57.
25. Geuzaine, C.; Remacle, J.F. Gmsh: A 3-D finite element mesh generator with built-in pre-and post-processing facilities. Int. J. Numer. Methods Eng. **2009**, 79, 1309–1331.

**Figure 1.** Two-level initialization scheme: the first level exchanges bounding boxes and establishes the communication channels between partner processes. The master process of solver B gathers the bounding boxes from the other processes of solver B (I) and sends them to the master process of solver A (II). The master process of solver A broadcasts the received bounding boxes to all other processes of solver A (III). Each process of A compares the received set of bounding boxes to its own bounding box to find the partner processes in solver B (IV). The complete set of sent bounding boxes is drawn in black, while the green boxes represent the subset relevant for the respective process of solver A. The filtered set of bounding boxes is communicated to the processes of solver B via master communication, such that not only the processes of solver A, but also the processes of solver B know their potential communication partners (V).
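
The comparison in step IV of Figure 1 amounts to an overlap test between axis-aligned bounding boxes. The following Python sketch shows one way to express it, assuming boxes are given as `(min_corner, max_corner)` tuples and that touching boxes count as overlapping; the names `boxes_overlap` and `find_partner_ranks` and the sample boxes are hypothetical and not taken from preCICE.

```python
def boxes_overlap(box_a, box_b):
    """Return True if two axis-aligned boxes intersect in every dimension."""
    (a_min, a_max), (b_min, b_max) = box_a, box_b
    return all(a_lo <= b_hi and b_lo <= a_hi
               for a_lo, a_hi, b_lo, b_hi in zip(a_min, a_max, b_min, b_max))

def find_partner_ranks(own_box, remote_boxes):
    """Filter the broadcast boxes of solver B down to the ranks relevant for this process."""
    return [rank for rank, box in sorted(remote_boxes.items())
            if boxes_overlap(own_box, box)]

# Bounding box of one process of solver A (hypothetical values)
own_box = ((0.0, 0.0), (1.0, 1.0))
# Bounding boxes received from the processes of solver B, keyed by rank
remote_boxes = {
    0: ((0.5, 0.5), (2.0, 2.0)),    # overlaps
    1: ((1.5, 0.0), (3.0, 1.0)),    # disjoint in x
    2: ((-1.0, -1.0), (0.0, 0.2)),  # touches at x = 0
}

partners = find_partner_ranks(own_box, remote_boxes)
print(partners)  # [0, 2]
```

Only the ranks in `partners` need a communication channel to this process, which is the filtered set fed back to solver B in step V.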

**Figure 2.** Two-level initialization scheme: the second level exchanges mesh partitions between partner processes to identify the exact list of data that have to be communicated during the simulation. Each process of solver B directly communicates its mesh partition to the relevant partner processes of solver A (I) that have been identified in level one. Each process of solver A compares its own mesh partition to the received mesh partitions and identifies which data must be communicated during run time (II). The complete received mesh partitions are drawn in black, while the parts that actually have to be communicated are drawn in green.

**Figure 3.** Comparison of the nearest-neighbor mapping (**top**) and the nearest-projection mapping (**bottom**). Both mappings are consistent and map from the search mesh ${M}^{S}$ to the origin mesh ${M}^{O}$. The nearest-projection mapping prevents extrapolation by mapping to the closest vertex for ${v}_{1}^{O}$ and ${v}_{4}^{O}$. The vertices ${v}_{2}^{O}$ and ${v}_{3}^{O}$ are projected onto the edges $({v}_{1}^{S},{v}_{2}^{S})$ and $({v}_{2}^{S},{v}_{3}^{S})$, resulting in interpolation.

**Figure 4.** Spatial lookup for nearest-projection mapping: the selection zones of the cascading algorithm used by the nearest-projection mapping for a single triangle ABC. If the orthogonal projection of the original data point lies outside of ABC, projections onto the edges are considered before finally projecting onto vertices. The image on the right shows the order in which the zones are considered.

**Figure 6.** Test configuration using ASTE. Participant B on the left reads the mesh structure and physical data from a file ${B}_{\mathrm{in}}$. Participant A on the right reads the mesh structure from the file ${A}_{\mathrm{in}}$. B and A initialize communication, then data are transferred from ${M}_{B}$ to ${M}_{B}^{\prime}$, mapped to ${M}_{A}$ using a nearest-projection mapping, and finally written to a file ${A}_{\mathrm{out}}$.

**Figure 7.** Strong scalability measurements: Total initialization time comparison between the two-level approach and the previously used one-level scheme. A mesh with mesh width 0.0005 and 628,898 vertices (Table 1, M4) is used for conducting the analysis.

**Figure 8.** Strong scalability measurements: Comparison of run times for mapping computation (**left**) and mesh communication (**right**) between the two-level and the one-level scheme. The mesh M4 is used for conducting the analysis.

**Figure 9.** Strong scalability measurements: Breakdown of the mapping computation time of the two-level approach. The mesh M4 is used for conducting the analysis.

**Figure 10.** Strong scalability measurements: Initialization time breakdown for the two-level initialization approach using mesh M4. Only parts significantly contributing to the runtime are depicted: (1) bounding box comparison and feedback (Figure 1, levels IV and V); (2) mesh communication (Figure 2, level II); (3) bounding box communication (Figure 1, levels II and III); (4) computation of the nearest-projection mapping (Figure 2, level II).

**Figure 11.** Strong scalability measurements: Initialization time breakdown for the two-level initialization approach using mesh M5 for solver A and mesh M6 for solver B. Only parts significantly contributing to the runtime are depicted: (1) bounding box comparison and feedback (Figure 1, levels IV and V); (2) mesh communication (Figure 2, level II); (3) bounding box communication (Figure 1, levels II and III); (4) computation of the nearest-projection mapping (Figure 2, level II).

**Figure 12.** Strong scalability measurements: Breakdown of the mapping computation time of the two-level approach. The mapping projects vertices from mesh M5 onto connectivity information of mesh M6.

**Figure 13.** Weak scalability measurements: Total initialization time comparison between the two-level and the one-level schemes. The core distribution and the mesh information are given in Table 3.

**Figure 14.** Weak scalability measurements: Comparison of mapping computation (**left**) and mesh communication (**right**) time between the two-level and the one-level scheme. The core distribution and the mesh information are given in Table 3.

**Figure 15.** Weak scalability measurements: Initialization time breakdown for the two-level initialization scheme. Only algorithmic parts with significant contributions to the initialization runtime are depicted: (1) bounding box comparison and feedback (Figure 1, levels IV and V); (2) mesh communication (Figure 2, level II); (3) bounding box communication (Figure 1, levels II and III); (4) computation of the nearest-projection mapping (Figure 2, level II). The core distribution and the mesh information are given in Table 3.

**Table 1.** Strong scalability measurements: Meshes for the wind turbine blade at varying mesh resolutions (mesh width). The mesh width indicates the average edge length used to construct the surface mesh.

Mesh ID | Mesh Width | Number of Vertices | Number of Triangles |
---|---|---|---|
M4 | 0.0005 | 628,898 | 1,257,391 |
M5 | 0.0004 | 1,660,616 | 3,321,140 |
M6 | 0.0003 | 2,962,176 | 5,924,260 |

**Table 2.** Number of triangles of mesh M4 and mesh M6 that are discarded due to the vertex-based partitioning (in absolute and relative values). The original mesh M4 contains 1,257,391 triangles and the original mesh M6 contains 5,924,260 triangles.

Total Cores | Cores B | Cores A | M4 Lost Triangles | M4 Rel [%] | M6 Lost Triangles | M6 Rel [%] |
---|---|---|---|---|---|---|
768 | 192 | 576 | 44,244 | 3.52 | 90,195 | 1.52 |
1536 | 384 | 1152 | 62,003 | 4.93 | 154,358 | 2.61 |
3072 | 768 | 2304 | 86,085 | 6.85 | 246,937 | 4.17 |
6144 | 1536 | 4608 | 118,313 | 9.41 | 376,108 | 6.35 |
12,288 | 3072 | 9216 | 165,085 | 13.13 | 552,797 | 9.33 |
24,576 | 6144 | 18,432 | 230,556 | 18.34 | 797,042 | 13.45 |
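
The relative values in Table 2 follow directly from the absolute counts and the triangle totals stated in the caption. A short Python check, with the hypothetical helper name `relative_loss`, reproduces the first and last rows:

```python
TRIANGLES_M4 = 1_257_391  # total triangles of mesh M4 (from the caption)
TRIANGLES_M6 = 5_924_260  # total triangles of mesh M6 (from the caption)

def relative_loss(lost, total):
    """Share of discarded triangles in percent, rounded to two decimals."""
    return round(100.0 * lost / total, 2)

first_row = (relative_loss(44_244, TRIANGLES_M4), relative_loss(90_195, TRIANGLES_M6))
last_row = (relative_loss(230_556, TRIANGLES_M4), relative_loss(797_042, TRIANGLES_M6))
print(first_row)  # (3.52, 1.52)
print(last_row)   # (18.34, 13.45)
```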

**Table 3.** Weak scalability measurements: Meshes for the wind turbine blade at varying mesh resolution (mesh width). The mesh width indicates the average edge length used to construct the surface mesh. The total number of cores is approximately proportional to the number of mesh vertices. The available CPU cores are distributed with a 1:3 ratio between the solvers.

Mesh ID | Mesh Width | #Vertices (Total) | Cores (Total) | Cores (B) | Cores (A) | #Vertices per Core (B) | #Vertices per Core (A) |
---|---|---|---|---|---|---|---|
M1 | 0.0025 | 25,722 | 104 | 26 | 78 | 989 | 330 |
M2 | 0.0010 | 165,009 | 720 | 192 | 528 | 859 | 312 |
M3 | 0.00075 | 330,139 | 1344 | 336 | 1008 | 982 | 328 |
M4 | 0.0005 | 628,898 | 2496 | 624 | 1872 | 1007 | 336 |
M5 | 0.0004 | 1,660,616 | 6144 | 1536 | 4608 | 1081 | 361 |
M6 | 0.0003 | 2,962,176 | 12,288 | 3072 | 9216 | 964 | 321 |

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Totounferoush, A.; Simonis, F.; Uekermann, B.; Schulte, M. Efficient and Scalable Initialization of Partitioned Coupled Simulations with preCICE. *Algorithms* **2021**, *14*, 166.
https://doi.org/10.3390/a14060166
