A Block Iteration with Parallelization Method for the Greedy Selection in Radial Basis Functions Based Mesh Deformation

Abstract: The greedy algorithm is one of the important point selection methods in radial basis function based mesh deformation. However, for large-scale meshes, the conventional greedy selection is time-consuming and results in performance penalties. To accelerate the computational procedure of the point selection, a block iteration with parallelization method is proposed in this paper. By the block iteration method, the computational complexities of three steps in the greedy selection are all reduced from $O(n^3)$ to $O(n^2)$. In addition, the parallelization of two steps in the greedy selection distributes the boundary points over sub-cores, efficiently accelerating the procedure. Specifically, three typical models, a three-dimensional undulating fish, the ONERA M6 wing and a three-dimensional super-cavitating hydrofoil, are taken as test cases to validate the proposed method, and the results show that it achieves a 17.41 times performance improvement compared with the conventional method.


Introduction
Simulation based on computational fluid dynamics (CFD) is an effective solution for various problems in aerospace engineering, ocean engineering [1], etc. Among the methods adopted in CFD, mesh deformation plays a significant role. To improve the quality and precision of the grid after the mesh deformation operation, researchers have developed two different mesh deformation strategies: connectivity methods and point-by-point methods.
One typical instance of the connectivity methods is the spring analogy [2]. It assumes that every edge has all the characteristics of a spring, forming a mesh connected by springs, and it is widely applied in the aerodynamic field. Subsequently, Farhat et al. [3] improved its robustness by including a torsional condition. However, in large-scale mesh systems, the spring analogy method is expensive because it requires the connectivity of the whole mesh. The linear elastic analogy [4][5][6], in which each mesh cell is abstracted into an elastic solid, solves the mesh deformation through a partial differential equation. This method is robust for large-scale mesh deformation but is still computationally expensive.
In point-by-point methods, each node is displaced independently and has no relation with other nodes [7]. These methods can therefore provide an identical solution for both structured and unstructured meshes. Later, Liu et al. [8] proposed a method which propagates the deformation from boundary points to internal points by barycentric interpolation. Witteveen [9] proposed a method in which the boundary points affect the deformation of the internal points through inverse distance weighted interpolation. In addition, Luke et al. [10] decreased the computational cost of the aforementioned methodology via tree-code optimization. Although the point-by-point methods increase the efficiency of computation, they do not perform well in complex mesh systems and are not able to preserve the mesh orthogonality. Fortunately, one of the point-by-point methods, radial basis function interpolation, does not have these problems.
In 2007, Boer et al. [11] put forward the application of radial basis function (RBF) interpolation in mesh deformation. It improves on the original point-by-point methods by preserving mesh orthogonality while maintaining good adaptability to all mesh types. The main idea is to calculate the mesh motion by interpolating the displacements of the boundary points to the internal points. The research by Boer et al. [11], Sheng et al. [12] and Bos et al. [13] showed that different RBFs adopted in the mesh deformation lead to different results. Jakobsson and Amoignon [14] proposed a point coarsening method for gradient-based aerodynamic shape optimization. By conducting a variety of test cases, they gave some suggestions about how to choose these RBFs. However, it has been demonstrated that, even with the best radial basis function, the computational cost still grows rapidly as the mesh scale increases. This is due to the calculation redundancy caused by interpolating from unnecessary boundary points.
To eliminate the calculation redundancy, Rendall et al. [15] proposed a greedy algorithm that reduces the boundary points used for interpolation, with the remaining boundary points selected as a smaller set of control points. Based on a displacement error criterion, the greedy algorithm effectively removes the redundant boundary points that have a smaller impact on the displacement interpolation. Meanwhile, it preserves the mesh quality as well as the conventional RBF mesh deformation does. In an important contribution, Michler [16] directionalized the point selection by selecting a different set of points in each direction using a greedy selection method. However, it is found that, as the quantity of control points increases, the computational complexity of most greedy point selection processes is approximately $O(n^4)$, which significantly influences the computational efficiency of the whole process [17].
To solve this problem, in the last decade, researchers have developed some effective methods by analysing each separate step of the greedy algorithm. Gillebaart et al. [18] adopted an adaptive method to reduce the frequency of greedy point selection. Skala et al. [19] proposed an incremental approach to effectively accelerate the matrix inversion step. Based on it, Selim et al. [20] presented an improved method using lower-upper (LU) decomposition. Afterwards, Fang et al. [21] provided an effective method based on recurrence Cholesky decomposition. In 2017, Kedward et al. [22] proposed a form of data reduction that still includes all the points. Researchers have also considered the feasibility of parallel computation in the greedy algorithm. Gerhold et al. [23] parallelized RBF mesh deformation. Then, Rendall et al. [24] proposed a parallel approach applied to multi-bladed rotors. In addition, Li et al. [17] and Fang et al. [21] proposed parallel computation methods based on the master-slave architecture.
Nevertheless, all of the aforementioned methods focus only on some specific steps, without an overall optimization of the greedy algorithm. Meanwhile, because the steps depend on one another in sequence, only some parts can be well parallelized. Therefore, this paper proposes a method combining block iteration with parallel computing. This method provides an optimization scheme for each step in the point selection procedure and can significantly reduce the time cost of the whole RBF mesh deformation. Specifically, the main contributions are summarized as follows:

• A block iteration method is developed by analyzing the mathematical characteristics of the greedy algorithm. With the application of block iteration, the specific steps that admit iteration can be greatly optimized. Their computational complexity can be reduced from $O(n^3)$ to $O(n^2)$.

• The parallelization is accomplished by analyzing the data dependency of the whole procedure. Steps that can be parallelized achieve good speedups because of the low communication cost.

• Our block iteration with parallelization method is first validated on the three-dimensional undulating fish and the ONERA M6 wing, which are both $10^6$-cell mesh models. To validate the efficiency of the method on a large-scale mesh, we adopt a three-dimensional super-cavitating hydrofoil model with 11 million cells. All three models obtain an effective improvement via the proposed method.

The rest of this paper is organized as follows. Section 2 recalls the methodology of RBF mesh deformation and the conventional greedy algorithm. Section 3 presents the fundamental theory of block iteration and the implementation of parallel computing in detail. Section 4 presents the experimental results and discusses how they can be explained in light of previous studies. Finally, conclusions are given in Section 5.

RBF Mesh Deformation with Greedy Algorithm
The use of point-by-point mesh deformation in CFD is divided into two processes. The first process solves the predefined equation coefficients of the boundary points from their actual displacements. The second process obtains the estimated displacements of the internal points from the resulting coefficient values. Together they update the grid points after the boundary motion in one time step. In this section, the formulation of RBF mesh deformation is introduced first, followed by an explanation of how the greedy algorithm is adopted in RBF mesh deformation to improve efficiency.

RBF Mesh Deformation
For a grid model, all points can be divided into two types. One is the internal point, which fills the internal space of the grid model, and the other is the boundary point, which lies on the grid boundary and attaches to the rim of the moving object. The RBF mesh deformation interpolation can be written as follows:

$$s(x) = \sum_{j=1}^{N_b} \lambda_j\, \varphi\left(\left\| x - x_{b_j} \right\|\right), \tag{1}$$

where $x$ is the coordinate of an arbitrary point, $x_{b_j}$ is the coordinate of the $j$th boundary point, $N_b$ is the total quantity of boundary points, $\lambda_j$ is the weight coefficient, $\varphi$ is the selected radial basis function, $\|x\|$ is the Euclidean norm of vector $x$ and $s(x)$ is the estimated displacement of point $x$. The displacements of the moving boundary points, which follow the rim of the moving object, are known, and the displacements of the static boundary points are the zero vector. Therefore, the displacements of the boundary points can be calculated as follows:

$$s(x_b) = \Delta x_b = \Phi_{b,b}\, \lambda, \tag{2}$$

$$\Phi_{b,b} = \begin{bmatrix} \varphi(\|x_{b_1} - x_{b_1}\|) & \cdots & \varphi(\|x_{b_1} - x_{b_{N_b}}\|) \\ \vdots & \ddots & \vdots \\ \varphi(\|x_{b_{N_b}} - x_{b_1}\|) & \cdots & \varphi(\|x_{b_{N_b}} - x_{b_{N_b}}\|) \end{bmatrix}, \tag{3}$$

where $s(x_b)$ and $\Delta x_b$ are the estimated displacements and actual displacements of the boundary points, $\Phi_{b,b}$ is an $N_b \times N_b$ matrix which stores the RBF evaluations, and $\lambda$ is the $N_b$-dimensional coefficient vector that contains the $\lambda_x$, $\lambda_y$ and $\lambda_z$ values. Thus, we obtain the value of $\lambda$:

$$\lambda = \Phi_{b,b}^{-1}\, \Delta x_b. \tag{4}$$

After calculating the value of $\lambda$, we obtain the estimated displacement of the internal points by:

$$s(x_{in_i}) = \sum_{j=1}^{N_b} \lambda_j\, \varphi\left(\left\| x_{in_i} - x_{b_j} \right\|\right), \tag{5}$$

where $x_{in_i}$ is the $i$th internal point and $s(x_{in_i})$ is the estimated displacement of $x_{in_i}$. Therefore, the matrix formulation can be described as:

$$s(x_{in}) = \Phi_{in,b}\, \lambda, \tag{6}$$

$$\Phi_{in,b} = \begin{bmatrix} \varphi(\|x_{in_1} - x_{b_1}\|) & \cdots & \varphi(\|x_{in_1} - x_{b_{N_b}}\|) \\ \vdots & \ddots & \vdots \\ \varphi(\|x_{in_{N_{in}}} - x_{b_1}\|) & \cdots & \varphi(\|x_{in_{N_{in}}} - x_{b_{N_b}}\|) \end{bmatrix}, \tag{7}$$

where $\Phi_{in,b}$ is an $N_{in} \times N_b$ matrix, and $N_{in}$ is the total quantity of internal points.

For the formulation above, the radial basis function $\varphi$ can take various expressions. In general, these are divided into two categories, compact functions and global functions, as presented in Table 1. Compact functions limit the range of the interpolation with a support radius. Global functions operate on all internal points of the grid. The distinction between the two categories is explained by Boer et al. [11].
Following the research by Bos et al. [13], the radial basis function we have chosen is the thin plate spline (TPS), which generates the highest mesh quality of the updated grid.
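The two processes described above (solve for the coefficients on the boundary, then evaluate on the internal points) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the point sets are random placeholders, the helper names are hypothetical, and a compactly supported Wendland C2 function is swapped in for the TPS so that the small system is safely invertible without extra terms.

```python
import numpy as np

def wendland_c2(dist, radius):
    """Compact Wendland C2 function (assumed here instead of TPS); zero beyond radius."""
    theta = np.clip(dist / radius, 0.0, 1.0)
    return (1.0 - theta) ** 4 * (4.0 * theta + 1.0)

def rbf_matrix(targets, sources, radius):
    """Matrix of phi(||x_i - x_j||) for every target/source pair."""
    diff = targets[:, None, :] - sources[None, :, :]
    return wendland_c2(np.linalg.norm(diff, axis=2), radius)

def deform(x_b, dx_b, x_in, radius):
    """Solve Phi_bb @ lam = dx_b, then return lam and Phi_inb @ lam."""
    phi_bb = rbf_matrix(x_b, x_b, radius)
    lam = np.linalg.solve(phi_bb, dx_b)        # one column per direction (x, y, z)
    return lam, rbf_matrix(x_in, x_b, radius) @ lam

# Hypothetical data: 30 boundary points, 200 internal points in the unit cube.
rng = np.random.default_rng(0)
x_b = rng.random((30, 3))
dx_b = 0.01 * rng.standard_normal((30, 3))
x_in = rng.random((200, 3))

lam, dx_in = deform(x_b, dx_b, x_in, radius=1.0)
# The interpolant reproduces the prescribed boundary displacements.
boundary_residual = np.abs(rbf_matrix(x_b, x_b, 1.0) @ lam - dx_b).max()
```

Solving the linear system directly, rather than forming the inverse explicitly, is a choice made for this sketch only; the block iteration described later works with the inverse itself.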

Greedy Selection
A large-scale mesh contains a huge quantity of boundary points, which results in a long computation time. In fact, only part of the boundary points have an essential impact on the displacement of the internal points. In other words, redundant points bring unnecessary time cost. To solve this problem, Rendall et al. proposed the greedy algorithm for RBF mesh deformation [15], which is presented as follows:

$$\alpha = \Phi_{c,c}^{-1}\, \Delta x_c, \tag{8}$$

$$s(x_{in}) = \Phi_{in,c}\, \alpha, \tag{9}$$

where $\Delta x_c$ is the actual displacement of the control points, $\alpha$ is the coefficient vector of the control points, $\Phi_{c,c}$ is an $N_c \times N_c$ matrix, $N_c$ is the total quantity of control points and $\Phi_{in,c}$ is an $N_{in} \times N_c$ matrix. The greedy selection is the process of selecting the proper control points. Its purpose is to pick out the representative points that affect the estimated displacements most. The detailed process is as follows:

1. Choose an arbitrary point from the boundary points to initialize the set of control points;
2. Solve the predefined equation coefficients of the control points by Equation (8);
3. Obtain the estimated displacements of the moving boundary points:

$$\Delta x_b^* = \Phi_{b,c}\, \alpha, \tag{10}$$

where $\Phi_{b,c}$ is an $N_b \times N_c$ matrix and $\Delta x_b^*$ is the estimated displacement of the boundary points;
4. Obtain the boundary errors of all the unselected boundary points:

$$\varepsilon = \Delta x_b^* - \Delta x_b, \tag{11}$$

where $\varepsilon$ is the boundary error;
5. Compare the largest boundary error with the predefined threshold, and determine whether it is larger than the threshold or not:

$$\max_i \left\| \varepsilon_i \right\|_2 > \xi_{tol}, \tag{12}$$

where $\|x\|_2$ is the Euclidean norm of $x$ and $\xi_{tol}$ is the user-defined error tolerance. If yes, put the point which has the largest boundary error into the set of control points;
6. If not, end the selection;
7. Repeat from Step 2 to Step 6.

Greedy selection effectively reduces the quantity and redundancy of the boundary points involved in the computation compared with the original RBF mesh deformation. However, this algorithm generates additional time cost for the RBF mesh deformation process. The extra time is mainly spent in the point selection loop, and the cost increases rapidly with the quantity of control points [17]. To accelerate this procedure, a block iteration method and a parallel computing method are proposed in Section 3.
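The seven steps above can be sketched as a short loop. The code below is an illustrative, unoptimized version of the conventional selection, in which every matrix is rebuilt from scratch in each loop; the function names, the Wendland C2 basis function (used in place of TPS for simplicity) and the test data are all assumptions made for this sketch.

```python
import numpy as np

def phi(dist, radius=1.0):
    """Wendland C2 basis function (assumed here for illustration)."""
    theta = np.clip(dist / radius, 0.0, 1.0)
    return (1.0 - theta) ** 4 * (4.0 * theta + 1.0)

def rbf_matrix(a, b):
    """Matrix of phi(||a_i - b_j||)."""
    return phi(np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2))

def greedy_select(x_b, dx_b, tol, first=0):
    """Conventional greedy selection: all matrices are rebuilt in every loop."""
    selected = [first]                                        # step 1
    while True:
        x_c, dx_c = x_b[selected], dx_b[selected]
        alpha = np.linalg.solve(rbf_matrix(x_c, x_c), dx_c)   # step 2
        dx_est = rbf_matrix(x_b, x_c) @ alpha                 # step 3
        err = np.linalg.norm(dx_est - dx_b, axis=1)           # step 4
        err[selected] = 0.0          # control points are interpolated exactly
        worst = int(np.argmax(err))
        if err[worst] <= tol:                                 # steps 5 and 6
            return selected, err
        selected.append(worst)                                # step 5, then repeat

# Hypothetical boundary: 150 points with a smooth sinusoidal displacement in y.
rng = np.random.default_rng(0)
x_b = rng.random((150, 3))
dx_b = np.zeros_like(x_b)
dx_b[:, 1] = 0.05 * np.sin(2.0 * np.pi * x_b[:, 0])

selected, err = greedy_select(x_b, dx_b, tol=1e-3)
```

Each pass through the loop repeats work done in the previous pass, since only one control point has been added; this repeated work is what the block iteration of Section 3 removes.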

Block Iteration with Parallelization
To identify where the additional time cost is generated, we ran the conventional greedy selection of RBF mesh deformation under different conditions, as presented in Figure 1. The time allocation of the conventional greedy selection can be divided into six parts, as presented in Table 2. As shown in Figure 1, parts b, d and e consume the majority of the time, and each of them occupies more than 15% of the total. Part b in particular takes approximately 50% of the time. Thus, optimizing parts b, d and e can efficiently reduce the time cost of the greedy selection in RBF mesh deformation. The remainder of this section introduces the block iteration and parallel computing methods, which handle these problems efficiently. The methods in Sections 3.1.1-3.1.3 and 3.2 optimize parts a, b, d and e, respectively.

The Iterative Feasibility of Constructing Φ c,c
To find ways of reducing the time, we first dissect the construction process of $\Phi_{c,c}$. In terms of Equation (3), the matrix $\Phi_{c,c}$ can be unfolded as follows:

$$\Phi_{c,c} = \begin{bmatrix} \varphi(\|x_{c_1} - x_{c_1}\|) & \varphi(\|x_{c_1} - x_{c_2}\|) & \cdots & \varphi(\|x_{c_1} - x_{c_{N_c}}\|) \\ \varphi(\|x_{c_2} - x_{c_1}\|) & \varphi(\|x_{c_2} - x_{c_2}\|) & \cdots & \varphi(\|x_{c_2} - x_{c_{N_c}}\|) \\ \vdots & \vdots & \ddots & \vdots \\ \varphi(\|x_{c_{N_c}} - x_{c_1}\|) & \varphi(\|x_{c_{N_c}} - x_{c_2}\|) & \cdots & \varphi(\|x_{c_{N_c}} - x_{c_{N_c}}\|) \end{bmatrix}. \tag{13}$$

According to the characteristics of the Euclidean norm, $\varphi(\|x_{c_i} - x_{c_j}\|) = \varphi(\|x_{c_j} - x_{c_i}\|)$ and $\varphi(\|x_{c_i} - x_{c_i}\|) = \varphi(0)$. Thus, writing $\varphi_{i,j} = \varphi(\|x_{c_i} - x_{c_j}\|)$, we get:

$$\Phi_{c,c} = \begin{bmatrix} \varphi(0) & \varphi_{1,2} & \cdots & \varphi_{1,N_c} \\ \varphi_{1,2} & \varphi(0) & \cdots & \varphi_{2,N_c} \\ \vdots & \vdots & \ddots & \vdots \\ \varphi_{1,N_c} & \varphi_{2,N_c} & \cdots & \varphi(0) \end{bmatrix}. \tag{14}$$

Equation (14) shows that $\Phi_{c,c}$ is a real symmetric matrix and that its diagonal elements are all equal. From the beginning to the end of the greedy selection, each loop adds one point to the set of control points and never removes any. Therefore, we can establish a link between the $(i-1)$th and $i$th loops:

$$\Phi_i = \begin{bmatrix} \Phi_{i-1} & r \\ r^T & \varphi(0) \end{bmatrix}, \tag{15}$$

where

$$r = \begin{bmatrix} \varphi(\|x_{c_1} - x_{c_i}\|) & \varphi(\|x_{c_2} - x_{c_i}\|) & \cdots & \varphi(\|x_{c_{i-1}} - x_{c_i}\|) \end{bmatrix}^T.$$

This establishes a relationship between $\Phi_{i-1}$ and $\Phi_i$.
In other words, the construction procedure of $\Phi_{c,c}$ in the greedy selection satisfies the fundamental precondition for iteration. In each loop, constructing $\Phi_i$ only requires the vector $r$ and $\varphi(0)$ to be calculated, rather than the whole $i \times i$ matrix, as presented in Algorithm 1. Summed over the whole selection, the original computational complexity of this process is approximately $\frac{1}{3}N_c^3$, whereas the optimized computational complexity is approximately $N_c^2$. Therefore, for constructing $\Phi_{c,c}$, the computational complexity is reduced from $O(n^3)$ to $O(n^2)$ by block iteration.

14: $P_{main}$ sends $\alpha$ and $x_c$ to $P^{(i)}$
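The bordered structure linking $\Phi_{i-1}$ and $\Phi_i$ can be checked numerically. The sketch below grows the matrix by one row and one column per loop and compares the result with a full rebuild; the helper name `grow_phi_cc`, the Wendland C2 basis function and the random control points are assumptions made for illustration.

```python
import numpy as np

def phi(dist, radius=1.0):
    """Wendland C2 basis function (assumed for illustration); phi(0) = 1."""
    theta = np.clip(np.asarray(dist, dtype=float) / radius, 0.0, 1.0)
    return (1.0 - theta) ** 4 * (4.0 * theta + 1.0)

def grow_phi_cc(phi_prev, x_c_prev, x_new):
    """Build Phi_i from Phi_{i-1}: only r and phi(0) are evaluated."""
    r = phi(np.linalg.norm(x_c_prev - x_new, axis=1))    # (i-1,) vector r
    top = np.hstack([phi_prev, r[:, None]])              # [Phi_{i-1}, r]
    bottom = np.append(r, float(phi(0.0)))[None, :]      # [r^T, phi(0)]
    return np.vstack([top, bottom])

rng = np.random.default_rng(2)
x_c = rng.random((6, 3))                                 # hypothetical control points

phi_cc = np.full((1, 1), float(phi(0.0)))                # Phi_1 = [phi(0)]
for i in range(1, len(x_c)):
    phi_cc = grow_phi_cc(phi_cc, x_c[:i], x_c[i])

# Reference: rebuild the full matrix from scratch.
full = phi(np.linalg.norm(x_c[:, None, :] - x_c[None, :, :], axis=2))
```

Note that `np.hstack` and `np.vstack` copy the old matrix on every call, so this sketch saves RBF evaluations but not memory traffic; an implementation aiming at the stated complexity would more likely write into a preallocated array.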

The Iterative Feasibility of Inversion
The inversion of $\Phi_{c,c}$ occupies almost half of the time cost, so optimizing the inversion is one of the essential parts of this work. In Section 3.1.1, we separated $\Phi_i$ into a $2 \times 2$ block matrix. The research by Lu et al. [25] provides several effective methods to invert a $2 \times 2$ block matrix. Considering that $\Phi_i$ is a real symmetric matrix, we write its inverse as follows:

$$\Phi_i^{-1} = Q_i = \begin{bmatrix} M_{i-1} & \eta \\ \eta^T & p \end{bmatrix}, \tag{18}$$

where $Q_i$ is an $i \times i$ matrix, $M_{i-1}$ is an $(i-1) \times (i-1)$ matrix, $\eta$ is an $(i-1)$-dimensional vector and $p$ is a real number. In addition, we formulate the equation:

$$\Phi_i\, Q_i = E_i, \tag{19}$$

where $E_i$ is the $i \times i$ identity matrix. Thus, we obtain the equation set:

$$\begin{cases} \Phi_{i-1} M_{i-1} + r\,\eta^T = E_{i-1}, \\ \Phi_{i-1}\,\eta + p\,r = 0, \\ r^T M_{i-1} + \varphi(0)\,\eta^T = 0^T, \\ r^T \eta + \varphi(0)\,p = 1, \end{cases} \tag{20}$$

where $E_{i-1}$ is the $(i-1) \times (i-1)$ identity matrix, and $0$ is the $(i-1)$-dimensional zero vector. The solution is:

$$p = \frac{1}{\varphi(0) - r^T \Phi_{i-1}^{-1} r}, \qquad \eta = -p\,\Phi_{i-1}^{-1} r, \qquad M_{i-1} = \Phi_{i-1}^{-1} + p\,\Phi_{i-1}^{-1} r\, r^T \Phi_{i-1}^{-1}. \tag{21}$$

In addition, we define:

$$\beta = \Phi_{i-1}^{-1} r, \qquad b = \frac{1}{\varphi(0) - r^T \beta}, \tag{22}$$

where $\beta$ is an $(i-1)$-dimensional vector and $b$ is a real number. Combining Equations (21) and (22), we obtain the solution:

$$\Phi_i^{-1} = \begin{bmatrix} \Phi_{i-1}^{-1} + b\,\beta\beta^T & -b\,\beta \\ -b\,\beta^T & b \end{bmatrix}. \tag{23}$$

Equation (23) establishes a link between $\Phi_i^{-1}$ and $\Phi_{i-1}^{-1}$. Therefore, the inversion can also be carried out iteratively. In each loop, the values to be calculated are $\beta$, $b$ and $\Phi_{i-1}^{-1} + b\,\beta\beta^T$, rather than the inverse of the whole $i \times i$ matrix, as presented in Algorithm 1. In general, matrix inversion adopts Gauss elimination or LU factorization, whose computational complexity is approximately $O(n^3)$ [26]. For the block iteration method, the computational complexity decreases to $O(n^2)$ [19].
In addition, for some RBFs such as TPS, the initial matrix is $\Phi_0 = [\varphi(0)] = [0]$, which is singular and has no inverse. Therefore, for these kinds of RBF, at the beginning of the loop we should invert the initial $2 \times 2$ matrix by a classical method such as LU factorization.
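The iterative inversion can be illustrated with the standard bordering (Schur complement) update for a symmetric matrix that grows by one row and one column. The sketch below takes $\beta = \Phi_{i-1}^{-1} r$ and $b = 1/(\varphi(0) - r^T\beta)$, which is one sign convention that yields the $\Phi_{i-1}^{-1} + b\beta\beta^T$ block named in the text; the function name `grow_inverse` and the random test matrix are hypothetical.

```python
import numpy as np

def grow_inverse(inv_prev, r, phi0):
    """Inverse of [[A, r], [r^T, phi0]] from inv_prev = A^{-1}, in O(n^2) work."""
    beta = inv_prev @ r                         # (i-1,) vector, O(n^2)
    b = 1.0 / (phi0 - r @ beta)                 # scalar; needs phi0 != r^T beta
    top_left = inv_prev + b * np.outer(beta, beta)
    return np.block([
        [top_left,           -b * beta[:, None]],
        [-b * beta[None, :],  np.array([[b]])],
    ])

# Hypothetical symmetric positive definite matrix standing in for Phi_i.
rng = np.random.default_rng(3)
m = rng.standard_normal((5, 5))
a = m @ m.T + 5.0 * np.eye(5)

inv_prev = np.linalg.inv(a[:4, :4])             # inverse from the previous loop
inv_new = grow_inverse(inv_prev, a[:4, 4], a[4, 4])
reference = np.linalg.inv(a)                    # full O(n^3) inversion
```

The update divides by $\varphi(0) - r^T\beta$, so it cannot start from a singular $1 \times 1$ matrix such as the TPS case $[\varphi(0)] = [0]$; this matches the remark above that the first $2 \times 2$ matrix is inverted by a classical method.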

The Iterative Feasibility of Constructing Φ b,c
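Similar to Section 3.1.1, combining Equations (7) and (10), the matrix $\Phi_{b,c}$ is given by:

$$\Phi_{b,c} = \begin{bmatrix} \varphi(\|x_{b_1} - x_{c_1}\|) & \cdots & \varphi(\|x_{b_1} - x_{c_{N_c}}\|) \\ \vdots & \ddots & \vdots \\ \varphi(\|x_{b_{N_b}} - x_{c_1}\|) & \cdots & \varphi(\|x_{b_{N_b}} - x_{c_{N_c}}\|) \end{bmatrix}. \tag{24}$$

For each loop in the greedy selection, the quantity of boundary points is constant. However, the number of control points in the $k$th loop is one more than in the $(k-1)$th. Therefore, we establish a link between the $(k-1)$th and $k$th loops:

$$\Phi_k = \begin{bmatrix} \Phi_{k-1} & \tau \end{bmatrix}, \qquad \tau = \begin{bmatrix} \varphi(\|x_{b_1} - x_{c_k}\|) & \cdots & \varphi(\|x_{b_{N_b}} - x_{c_k}\|) \end{bmatrix}^T. \tag{25}$$

Thus, the process of constructing $\Phi_{b,c}$ can also be carried out iteratively. In each loop, constructing $\Phi_k$ only requires the vector $\tau$ to be calculated, rather than the whole $N_b \times k$ matrix, as presented in Algorithm 1. In this process, under the condition that $N_b > N_c$, the original computational complexity is approximately $\frac{\sigma}{2}N_c^3$, where $\sigma = N_b / N_c$, and the optimized computational complexity is $\sigma N_c^2$. In other words, the computational complexity of this process decreases from $O(n^3)$ to $O(n^2)$.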
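As a rough check on these complexity figures, the counts can be worked out by tallying RBF evaluations per loop. This is a sketch under the assumption that one evaluation of $\varphi$ is the unit of cost and that the conventional method rebuilds the full $N_b \times k$ matrix in loop $k$; the exact counting convention may differ in lower-order terms.

```latex
\begin{align*}
\text{conventional:}\quad
  & \sum_{k=1}^{N_c} N_b\,k \;=\; \frac{N_b\,N_c\,(N_c+1)}{2}
    \;\approx\; \frac{\sigma}{2}\,N_c^{3},
    \qquad \sigma = \frac{N_b}{N_c}, \\
\text{block iteration:}\quad
  & \sum_{k=1}^{N_c} N_b \;=\; N_b\,N_c \;=\; \sigma\,N_c^{2}.
\end{align*}
```

Under this counting the ratio of the two costs is about $N_c/2$. For $N_c = 489$ this predicts roughly 1/245, which is of the same order as the 1/231 measured for part d in Section 4.1.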

The Feasibility of Parallel Computing
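To optimize the multiplication of $\Phi_{b,c}$ and $\alpha$, we dissect Equation (10):

$$\Delta x_b^* = \Phi_{b,c}\,\alpha = \begin{bmatrix} \sum_{j=1}^{N_c} \varphi(\|x_{b_1} - x_{c_j}\|)\,\alpha_j \\ \vdots \\ \sum_{j=1}^{N_c} \varphi(\|x_{b_{N_b}} - x_{c_j}\|)\,\alpha_j \end{bmatrix}.$$

For the $i$th element of $\Delta x_b^*$:

$$\Delta x_{b_i}^* = \sum_{j=1}^{N_c} \varphi(\|x_{b_i} - x_{c_j}\|)\,\alpha_j. \tag{28}$$

According to Section 3.1.3 and Equation (28), $x_{b_i}$ and $x_{b_j}$, as well as $\Delta x_{b_i}^*$ and $\Delta x_{b_j}^*$, have no data dependency on each other. Therefore, parallelization can be adopted both in constructing $\Phi_{b,c}$ and in multiplying $\Phi_{b,c}$ by $\alpha$.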

Implementation of Parallelization
Considering that every boundary point is data-independent of the others, the boundary points are equally distributed over the sub-processors. Thus, for the whole process of greedy selection, Sections 3.1.1 and 3.1.2 are implemented on the main processor, and Sections 3.1.3 and 3.2.1 are implemented on the sub-processors, as presented in Algorithm 1, where $P_{main}$ and $P^{(i)}$ refer to the main processor and the sub-processors.
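The division of labour described above can be mimicked on a single machine. In the sketch below, each worker owns a fixed chunk of boundary points, appends one column $\tau$ to its local block of $\Phi_{b,c}$ whenever the main side announces a new control point, and returns its slice of $\Phi_{b,c}\alpha$. Threads from Python's standard library stand in for the sub-processors purely for illustration; the source does not specify the communication layer used on the cluster, and the function names and data here are hypothetical.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def phi(dist, radius=1.0):
    """Wendland C2 basis function (assumed for illustration)."""
    theta = np.clip(dist / radius, 0.0, 1.0)
    return (1.0 - theta) ** 4 * (4.0 * theta + 1.0)

def local_update(x_b_chunk, phi_bc_chunk, x_new):
    """Worker side: append column tau for the new control point."""
    tau = phi(np.linalg.norm(x_b_chunk - x_new, axis=1))
    return np.hstack([phi_bc_chunk, tau[:, None]])

def local_estimate(phi_bc_chunk, alpha):
    """Worker side: this chunk's rows of Phi_bc @ alpha."""
    return phi_bc_chunk @ alpha

n_workers = 4
rng = np.random.default_rng(1)
x_b = rng.random((100, 3))                      # hypothetical boundary points
x_c = x_b[[3, 40, 77]]                          # control points, added one by one
alpha = rng.standard_normal((3, 3))             # placeholder coefficients

chunks = np.array_split(x_b, n_workers)         # equal distribution of x_b
local_phi = [np.empty((len(c), 0)) for c in chunks]

with ThreadPoolExecutor(max_workers=n_workers) as pool:
    for x_new in x_c:                           # main side "sends" x_new
        local_phi = list(pool.map(local_update, chunks, local_phi,
                                  [x_new] * n_workers))
    parts = list(pool.map(local_estimate, local_phi, [alpha] * n_workers))

dx_est = np.vstack(parts)                       # gathered on the main side

# Serial reference for comparison.
serial = phi(np.linalg.norm(x_b[:, None, :] - x_c[None, :, :], axis=2)) @ alpha
```

Because each worker only ever needs the new control point and the current $\alpha$, the data exchanged per loop is small compared with the local work, which is consistent with the low communication cost claimed for these steps.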

Results and Discussion
In this paper, the efficiency of the proposed method is compared with that of the conventional greedy selection on three cases: three-dimensional undulating fish motion, Office National d'Etudes et de Recherches Aerospatiales (ONERA) M6 wing deformation and a three-dimensional super-cavitating hydrofoil. The time consumption of greedy selection is composed of six parts, as presented in Table 2. To test the parallel computational efficiency, we use a high performance computing (HPC) cluster that consists of more than one hundred computing nodes. Every node has an Intel Xeon 2.1 GHz E5-2620 CPU (United States of America) that consists of 12 cores and 16 GB total memory.

Mesh Quality Results
We define a quality metric for each of the mesh cells to assess the mesh quality after deformation. The proposed method aims to improve the computational efficiency, and its simulation results should theoretically be the same as the original results. Therefore, we only need to check whether the results of the block iteration with parallelization method are equal to the original results. To make the mesh deformation results more intuitive, we adopt the two-dimensional undulating fish model shown in Figure 2, in which the mesh is discretized with triangular cells. Knupp [27] proposed size and shape as the two main metrics for a triangular cell. The size metric judges whether a cell is too large or too small compared to the reference, which is the area of the cell before the deformation. The size metric is expressed as:

$$f_{size} = \min\left(\gamma, \frac{1}{\gamma}\right), \tag{29}$$

where $\gamma$ is the ratio of the area of the deformed cell to the area of the original cell. $f_{size} = 1$ if and only if the physical triangle has the same area as the reference triangle, and $f_{size} = 0$ if and only if the physical triangle is degenerate. The shape metric measures distortion with respect to an equilateral reference triangle. For two-dimensional models, a triangular cell contains three nodes with coordinates $(x^{(k)}, y^{(k)})$, $k = 0, 1, 2$.
The Jacobian matrix $A^{(k)}$ is formulated as:

$$A^{(k)} = \begin{bmatrix} x^{(k+1)} - x^{(k)} & x^{(k+2)} - x^{(k)} \\ y^{(k+1)} - y^{(k)} & y^{(k+2)} - y^{(k)} \end{bmatrix}, \tag{30}$$

with the node indices taken modulo 3. Let $\lambda^{(k)}_{i,j}$, $i, j = 1, 2$, be the $ij$th component of the $k$th metric tensor $(A^{(k)})^T A^{(k)}$. For any $k$, a node-independent shape metric is expressed as:

$$f_{shape} = \frac{\sqrt{3}\,\nu^{(k)}}{\lambda^{(k)}_{1,1} + \lambda^{(k)}_{2,2} - \lambda^{(k)}_{1,2}}, \tag{31}$$

where $\nu^{(k)} = \det(A^{(k)})$. $f_{shape} = 1$ if and only if the physical triangle is equilateral, and $f_{shape} = 0$ if and only if the physical triangle is degenerate. In addition, the value of $f_{shape}$ is independent of a uniform scaling of the element. To measure both size and shape simultaneously with different weights, a size-shape quality metric for triangles is adopted in this paper, considering that the shape metric plays a more important role than the size metric:

$$f_{size\text{-}shape} = \sqrt{f_{size}}\; f_{shape}. \tag{32}$$

Using the metrics above, we test the two-dimensional undulating fish model, controlled by Equation (33). The results are shown in Table 3. Original means the original greedy selection. Block means that the computation only adopts the block iteration method. BwithP means that both the block iteration and the parallelization methods are adopted, using 12 cores. We ran the undulation from time = 0.00T to time = 0.50T and time = 1.00T. The results show that the proposed methods obtain the same mesh quality as the original one. In addition, according to the average $f_{size\text{-}shape}$, the mesh cells remain of good quality.
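The size and shape metrics can be evaluated for a single triangle as follows. This is a sketch under stated assumptions rather than the paper's code: it uses the algebraic triangle metrics in the form commonly attributed to Knupp ($f_{size} = \min(\gamma, 1/\gamma)$ and $f_{shape} = \sqrt{3}\,\nu/(\lambda_{11} + \lambda_{22} - \lambda_{12})$), and it combines them as $\sqrt{f_{size}}\,f_{shape}$, a weighting assumed here because it gives the shape metric the larger influence. All function names are hypothetical.

```python
import numpy as np

def signed_area(tri):
    """Signed area of a triangle given as three (x, y) nodes."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    return 0.5 * ((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))

def f_size(tri, ref_tri):
    """min(gamma, 1/gamma), gamma = deformed area / reference area."""
    gamma = signed_area(tri) / signed_area(ref_tri)
    return 0.0 if gamma <= 0.0 else min(gamma, 1.0 / gamma)

def f_shape(tri, k=0):
    """Shape metric built from the Jacobian at node k (the value is node independent)."""
    p = np.asarray(tri, dtype=float)
    a = np.column_stack([p[(k + 1) % 3] - p[k], p[(k + 2) % 3] - p[k]])
    nu = np.linalg.det(a)
    lam = a.T @ a                                # metric tensor
    denom = lam[0, 0] + lam[1, 1] - lam[0, 1]
    return 0.0 if nu <= 0.0 or denom <= 0.0 else np.sqrt(3.0) * nu / denom

def f_size_shape(tri, ref_tri):
    """Assumed combination: the square root damps the size term."""
    return np.sqrt(f_size(tri, ref_tri)) * f_shape(tri)

h = np.sqrt(3.0) / 2.0
equilateral = [(0.0, 0.0), (1.0, 0.0), (0.5, h)]
right = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
doubled = [(0.0, 0.0), (2.0, 0.0), (1.0, 2.0 * h)]   # equilateral, 4x the area

shape_equilateral = f_shape(equilateral)
shape_right = f_shape(right)
size_doubled = f_size(doubled, equilateral)
quality_doubled = f_size_shape(doubled, equilateral)
```

The doubled triangle shows the intended behaviour: its shape metric stays at 1 because uniform scaling does not distort the cell, while the size metric drops because the area has changed.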

Three-Dimensional Undulating Fish
The three-dimensional undulating fish is shown in Figure 3a. The fish undulates in the centre of a 20 L × 5 L × 5 L tank, as presented in Figure 3b. The mesh consists of 55,879 internal points and 5646 moving boundary points. Kern et al. [28] proposed the outline geometry of the fish. The motion of the three-dimensional fish is controlled by the sinusoidal function referred to by Carling et al. [29], in which y is the displacement of the mid-line profile, s is the arc length along the mid-line of the fish body, t is the time elapsed from the beginning and T is the undulating period. For the motion, we compare the selection procedure of the first 0.0001 unit of time.
For the deformation of the three-dimensional fish, the distribution of control points is shown in Figure 4. The convergence history of the boundary errors is shown in Figure 5. Thus, we can see that the block iteration with parallelization method gives the same result as the conventional greedy selection. The time comparison is shown in Table 4 and Figure 6. Origin means the conventional greedy selection. Block iteration means that the computation only adopts the block iteration method. Block with parallel means that both the block iteration and the parallelization methods are adopted, using 12 cores. We can see that the time costs of parts a, b and d are reduced by 94.4% on average via the block iteration method. In addition, the parallel speedup ratio of parts d and e is 10.37 on average, calculated from Table 4.
For the deformation with $\xi_{tol} = 0.00015$, 489 control points are selected. The block iteration time cost of part a is about 1/146 of the conventional one, which supports the claim that the computational complexity is reduced from approximately $\frac{1}{3}N_c^3$ to $N_c^2$. For part d, the block iteration time is 1/231 of the conventional time cost, reduced from $\frac{\sigma}{2}N_c^3$ to $\sigma N_c^2$. As with parts a and d, the time cost of part b is reduced from $O(N_c^3)$ to $O(N_c^2)$. Part c is not optimized, so its time cost is almost the same as the conventional one. However, the time cost of part f increases because of the additional variables introduced and the parallel communication time. For the deformation with $\xi_{tol} = 0.0001$, 618 control points are selected. The computational costs of block iteration for parts a, b and d are 1/191, 1/161 and 1/302 of the conventional ones, reduced from $O(N_c^3)$ to $O(N_c^2)$.

ONERA M6 Wing
For the deformation of the ONERA M6 wing, 560 control points are selected under the condition that $\xi_{tol} = 0.0005$. The distribution of boundary errors on the wing surface is shown in Figure 8. The time comparison is shown in Table 5 and Figure 9. We can see that the time costs of parts a, b and d are reduced by more than 95% on average via the block iteration method. In addition, the parallel speedup of parts d and e is 10.33 on average. For the deformation with $\xi_{tol} = 0.00055$, 528 control points are selected. The block iteration time cost of part a is about 1/148 of the conventional one, which supports the claim that the computational complexity is reduced from approximately $\frac{1}{3}N_c^3$ to $N_c^2$. For part d, the block iteration time is 1/259 of the conventional time cost, reduced from $\frac{\sigma}{2}N_c^3$ to $\sigma N_c^2$. As with parts a and d, the time cost of part b is reduced from $O(N_c^3)$ to $O(N_c^2)$. The time costs of parts c and f behave as in Section 4.1. For the deformation with $\xi_{tol} = 0.0005$, 560 control points are selected. The computational costs of block iteration for parts a, b and d are 1/168, 1/110 and 1/278 of the conventional ones, reduced from $O(N_c^3)$ to $O(N_c^2)$. The parallel speedup ratio is shown in Figure 10. We can see that parallelism efficiently reduces the time cost of parts d and e. However, as the quantity of processors increases, the parallel efficiency declines rapidly.

Three-Dimensional Super-Cavitating Hydrofoil
The results in Sections 4.1 and 4.2 are both obtained on $10^5$-scale meshes. To test the performance of the proposed method on a large-scale mesh, we use the model of a three-dimensional super-cavitating hydrofoil. The three-dimensional super-cavitating hydrofoil is shown in Figure 11a. The initial surface mesh connecting the outside flow field with the hydrofoil is shown in Figure 11b. The mesh consists of $1.1 \times 10^7$ cells, 10,309,112 internal points and 25,702 moving boundary points. The hydrofoil rotates axially inside the mesh system. For the motion, we compare the selection procedure of the first 0.0001 time unit.
For the deformation with $\xi_{tol} = 10^{-5}$, 444 control points are selected. As shown in Table 6 and Figure 12a, the block iteration time cost of part a is about 1/155 of the conventional one, which supports the claim that the computational complexity is reduced from approximately $\frac{1}{3}N_c^3$ to $N_c^2$. For part d, the block iteration time is 1/216 of the conventional time cost, reduced from $\frac{\sigma}{2}N_c^3$ to $\sigma N_c^2$. As with parts a and d, the time cost of part b is reduced from $O(N_c^3)$ to $O(N_c^2)$. The time cost of part c is almost unchanged. However, part f generates additional time cost compared with the conventional method. The parallel speedup ratio is shown in Figure 12b. We can see that, as the quantity of processors increases, the parallel efficiency declines rapidly, the same as the results in Section 4.2.

Conclusions
This paper proposes a block iteration with parallelization method to improve the efficiency of the greedy selection in RBF mesh deformation. The block iteration method is based on the fact that the selection procedure can be carried out iteratively. It reduces the computational complexity of matrix construction and inversion from $O(n^3)$ to $O(n^2)$ in the process of the greedy selection. The parallelization addresses the load imbalance problem of the selection process and improves the efficiency of data reduction, based on the independence of the boundary points. By adopting the block iteration with parallelization method, the efficiency of greedy selection in RBF deformation is greatly improved while the data accuracy is kept. Three models are tested with the proposed method and validate its good performance. The model tests show that the total speedup ratio achieves 17.35 on average via the block iteration with parallelization method.

Table 2. Time allocation of conventional greedy selection. a. Construct $\Phi_{c,c}$ in Equation (8); b. Solve the value of $\Phi_{c,c}^{-1}$ in Equation (8); c. Calculate the product of $\Phi_{c,c}^{-1}$ and $\Delta x_c$ in Equation (8); d. Construct $\Phi_{b,c}$ in Equation (10); e. Calculate the product of $\Phi_{b,c}$ and $\alpha$ in Equation (10); f. Others.

Figure 1. Time allocation of conventional greedy selection in different experiment setups. The specific explanation of the experiment setup in this figure is presented in Section 4.

Algorithm 1. Block iteration and parallel computing in greedy selection.
Require: $x_b$, $\Delta x_b$ and $\xi_{tol}$
Ensure: $x_c$ and $\Delta x_c$
1: Equally distribute $x_b^{(i)}$ to $P^{(i)}$ and put $x_c$ in $P_{main}$
2: $x_c = [x_{b_i}]$ // Select the initial control point from the boundary points randomly
3: Calculate $\Phi_{c,c}$, $\Phi_{c,c}^{-1}$ and $\alpha$, and make $\Phi_{temp1} = \Phi_{c,c}$, $\Phi_{temp1}^{-1} = \Phi_{c,c}^{-1}$
4: $P_{main}$ sends $\alpha$ and $x_c$ to $P^{(i)}$
5: Calculate $\Phi_{b,c}^{(i)}$, and make …

Figure 2. The mesh around the two-dimensional undulating fish.

Figure 3. Three-dimensional fish and initial mesh of its cubic tank.

Figure 6. Time comparison of three-dimensional fish based on Table 4.

Figure 8. The distribution of boundary errors on the wing surface.

Figure 10. Speedup ratio and parallel efficiency of ONERA M6 wing. (a) Part d, $\xi_{tol} = 0.0005$. (b) Part e, $\xi_{tol} = 0.0005$. (c) Part d, $\xi_{tol} = 0.00055$. (d) Part e, $\xi_{tol} = 0.00055$.

Table 1. Radial basis functions ($\theta = \|x\| / r$, where $\|x\|$ is the Euclidean norm and $r$ is the support radius).

Table 3. Mesh deformation results for the undulating fish using different methods, $\xi_{tol} = 0.0001$.

Table 4. Time comparison of three-dimensional fish (value of $\xi_{tol}$ is in the brackets, number of cores is 12).