Numerical Evaluation of Stable and OpenMP Parallel Face-Based Smoothed Point Interpolation Method for Geomechanical Problems

Yang, Tianxiao; Qin, Jiayu; Xu, Nengxiong; Mei, Gang; Qin, Yan

doi:10.3390/math14010007

Open Access Article


by Tianxiao Yang¹,², Jiayu Qin¹,²,*, Nengxiong Xu¹,²,*, Gang Mei¹,² and Yan Qin¹,²

¹ School of Engineering and Technology, China University of Geosciences (Beijing), Xueyuan Road 29, Beijing 100083, China

² Engineering and Technology Innovation Center for Risk Prevention and Control of Major Project Geosafety, Ministry of Natural Resources of the People’s Republic of China, Beijing 100083, China

* Authors to whom correspondence should be addressed.

Mathematics 2026, 14(1), 7; https://doi.org/10.3390/math14010007

Submission received: 14 November 2025 / Revised: 12 December 2025 / Accepted: 17 December 2025 / Published: 19 December 2025

(This article belongs to the Section E1: Mathematics and Computer Science)


Abstract

Compared with the finite element method (FEM), the meshfree smoothed point interpolation method (SPIM) provides a more accurate stiffness and is insensitive to mesh distortion, which gives it high potential for solving engineering problems. In this study, an effective simulation program based on the face-based SPIM was developed and applied to geomechanical problems. To make the SPIM program more reliable for large-scale and nonlinear problems, a line search algorithm, an adaptive sub-step method, and an OpenMP parallel design were adopted to improve convergence, stability, and computational efficiency. The results of a slope stability analysis agree with those of the Bishop method, which confirms the correctness of the SPIM program. Moreover, the SPIM program shows asymptotic quadratic convergence and satisfactory stability, even when the slope is in a critical state. In addition, for large-scale examples, the OpenMP parallel program achieves a speedup ratio of 6~8 on a computing platform with 20 CPU cores, and the maximum speedup ratio for a single load step reaches 14.50. Finally, future work on developing the face-based SPIM simulation program is discussed.

Keywords:

smoothed point interpolation method; accelerated convergence strategies; OpenMP parallel design; geomechanics

MSC:

74S05; 74L10; 65N30

1. Introduction

In recent years, the meshfree smoothed point interpolation method (SPIM) has developed rapidly [1,2,3,4,5,6,7,8]. Compared with the finite element method (FEM), the SPIM provides a more accurate stiffness and is insensitive to mesh distortion [9,10], so it can overcome the “overly-stiff” phenomenon and the mesh dependence of the FEM. However, the SPIM is used far less often than the FEM. One important reason is that, unlike the FEM, the SPIM has no commercial software or open-source programs available, which seriously limits its application. Thus, developing an effective SPIM simulation program can help to broaden the application of the SPIM to engineering problems.

To this end, an effective simulation program based on the face-based SPIM [11,12] was developed and has been applied to geomechanical problems (e.g., the deformation of rock and soil masses) [13,14]. However, simulating the deformation of rock and soil masses still poses challenges. (1) Compared with metal components, rock and soil masses are usually large in geometric size, so the calculation model needs more nodes and elements, which causes serious computational efficiency problems and memory bottlenecks. To address these problems, the research team proposed an efficient compressed storage strategy for the stiffness matrix and introduced the PARDISO solver library to solve the system equations of the face-based SPIM, which improved the computational efficiency and relieved the memory bottleneck [13]. (2) The deformation of rock and soil masses may be highly nonlinear, which affects the convergence and stability of the simulation program, especially in a critical state. To improve convergence, the research team examined the performance of the Aitken method [15] in accelerating the convergence of the SPIM [13], but the convergence still needs to be improved. Thus, several effective strategies are proposed in this paper to improve the computational efficiency, convergence, and stability of the simulation program.

The first issue we address is the convergence and stability of the simulation program. Since the face-based SPIM is an implicit method, the incremental method and the Newton–Raphson iterative method are usually used to solve the governing equations [16]. The iteration can generally be divided into a local iteration and a global iteration: the local iteration is the stress integration algorithm, and the global iteration enforces the balance between internal and external forces. The stress integration algorithm computes the stress increment for a given strain increment, which requires an iterative process for the plastic correction [17]. However, the stress integration algorithm may fail to converge even for simple ideal elastic–plastic models, particularly in regions where the yield surface has high curvature [18]. For the global iteration, convergence is very sensitive to the size of the increment step, especially in a critical state, so choosing appropriate increment steps for a model whose response is not known in advance can improve the stability of the simulation program. To improve convergence and stability, this paper designs accelerated convergence strategies based on the classical line search algorithm and the adaptive sub-step method, and it improves these methods with respect to the selection of the local and global iteration step sizes [19].

Another issue we address is the computational efficiency of the simulation program. An effective way to improve efficiency is to parallelize the program. Taking the FEM as an example, researchers have proposed parallel FEM implementations based on the Central Processing Unit (CPU) [20,21,22,23,24,25] and the Graphics Processing Unit (GPU) [26,27,28,29] and achieved satisfactory results. For example, Oh and Hong [22] parallelized an FEM code using the Open Multi-Processing (OpenMP) library and obtained a satisfactory acceleration: for a 3D cube under dynamic load with 1,000,000 elements, the parallelized program running on 24 cores was 12 times faster than the original serial code. Cecka [27] established and analyzed various methods for assembling and solving the sparse linear systems of the FEM using the Compute Unified Device Architecture (CUDA); the results showed that single-precision GPU code can achieve a speedup ratio of 30 or more compared with a single-core CPU using double precision. Compared with GPU parallelization, CPU parallelization is easier to implement [30]. To the best of our knowledge, CPU parallelization of the SPIM has not yet been reported. Thus, this paper presents, for the first time, an OpenMP parallelization analysis and design of the SPIM program, including a feasibility analysis, the parallel implementation, and performance testing.

The rest of the paper is organized as follows: Section 2 gives a brief introduction to the face-based SPIM. Section 3 introduces several accelerated convergence strategies. Section 4 presents the OpenMP parallelization implementation of the SPIM. Section 5 provides the numerical test results. Section 6 includes the discussion and future work. Section 7 presents the conclusions.

2. Brief Introduction to the SPIM

Based on the generalized smoothing Galerkin weak form [10], the governing equations of the SPIM can be expressed as follows:

\int_{Ω} δ {(\overset{⌢}{ε} (u))}^{T} D (\overset{⌢}{ε} (u)) d Ω - \int_{Ω} δ u^{T} b d Ω - \int_{Γ} δ u^{T} t_{p} d Γ = 0,

(1)

where $\overset{⌢}{ε}$ is the smoothing strain field, $u$ is the standard displacement field obtained by the point interpolation method (PIM) [31], $D$ is the material matrix, $b$ is the body force vector, and $t_{p}$ is the surface force vector. In fact, the governing equations of the SPIM are similar to those of the FEM. The difference is that the SPIM uses a smoothing strain field $\overset{⌢}{ε}$ obtained over smoothing domains, which can be constructed from the background mesh of the FEM. For the commonly used tetrahedral meshes, a smoothing domain is formed for each triangular facet by connecting the three nodes of the facet with the centroids of the tetrahedra adjacent to it. When the triangular facet lies on the boundary of the calculation model, the smoothing domain is tetrahedral; otherwise, it is hexahedral [32]. The smoothing strain field $\overset{⌢}{ε}$ is expressed as follows:

{\overset{⌢}{ε}}_{k} = \frac{1}{V_{k}} \int_{Ω_{k}} ε^{h} (x) d Ω = \frac{1}{V_{k}} \int_{Γ_{k}} L_{n} (x) u (x) d Γ,

(2)

where $ε^{h}(x) = L u(x)$ is the standard strain field obtained by the PIM with the differential operator $L$, $V_{k}$ is the volume of the smoothing domain $Ω_{k}$ bounded by $Γ_{k}$, $L_{n}(x)$ is the matrix of the components of the unit outward normal on $Γ_{k}$, which takes the place of $L$ once the divergence theorem is applied, and $1/V_{k}$ represents the smoothing operator.

Furthermore, the discretized form of the smoothing strain field $\overset{⌢}{ε}$ can be expressed as follows:

{\overset{⌢}{ε}}_{k} = \sum_{i \in N_{n}} {\overset{⌢}{B}}_{i} (x) {\bar{u}}_{i},

(3)

where ${\overset{⌢}{B}}_{i}(x)$ is the smoothing strain matrix, ${\bar{u}}_{i}$ is the nodal displacement vector, and $N_{n}$ denotes the set of interpolation nodes associated with the smoothing domain; details of the node selection schemes can be found in Reference [33]. In this paper, the T4 node selection scheme is used to search for interpolation nodes. The expression of ${\overset{⌢}{B}}_{i}(x)$ is given as follows:

{\overset{⌢}{B}}_{i} (x) = [\begin{matrix} {\overset{⌢}{b}}_{i x} (x) & 0 & 0 \\ 0 & {\overset{⌢}{b}}_{i y} (x) & 0 \\ 0 & 0 & {\overset{⌢}{b}}_{i z} (x) \\ 0 & {\overset{⌢}{b}}_{i z} (x) & {\overset{⌢}{b}}_{i y} (x) \\ {\overset{⌢}{b}}_{i z} (x) & 0 & {\overset{⌢}{b}}_{i x} (x) \\ {\overset{⌢}{b}}_{i y} (x) & {\overset{⌢}{b}}_{i x} (x) & 0 \end{matrix}],

(4)

and the discrete formula of ${\overset{⌢}{b}}_{il}(x)$ is given as follows:

{\overset{⌢}{b}}_{i l} (x) = \frac{1}{V_{k}} \sum_{m = 1}^{N_{s}} [\sum_{n = 1}^{N_{g}} w_{n} ϕ_{i} (x_{m, n}) n_{l}^{(m)}] A_{m} (l = x, y, z)

(5)

where $N_{s}$ is the number of boundary triangular facets of each smoothing domain: $N_{s} = 6$ for an inner smoothing domain and $N_{s} = 4$ otherwise. $N_{g}$ is the number of Gauss points on each boundary triangular facet and is set to 1 in this paper, $w_{n}$ is the Gauss weight, $ϕ_{i}(x_{m,n})$ is the shape function of node $i$ evaluated at the $n$-th Gauss point $x_{m,n}$ of facet $m$, $n_{l}^{(m)}$ is the $l$-th component of the unit outward normal vector of facet $m$, and $A_{m}$ is the area of facet $m$.
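The Python code below is a minimal sketch of Equations (4) and (5) for the simplest case: a single tetrahedral domain with linear shape functions and one Gauss point per facet. The Gauss point is the facet centroid with weight 1, so $ϕ_{i}$ equals 1/3 at the three nodes of a facet and 0 at the opposite node. These simplifications, and the helper names `smoothed_b` and `smoothed_B`, are illustrative assumptions. The PIM shape functions and the T4 node selection used in the paper are not reproduced.

```python
def sub(p, q):
    return [p[i] - q[i] for i in range(3)]


def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def smoothed_b(nodes):
    """b[i][l] of Eq. (5) for one tetrahedral domain (N_s = 4, N_g = 1).

    Assumes linear shape functions, so at a facet centroid phi_i = 1/3 for
    the facet's nodes and 0 for the opposite node; Gauss weight w = 1.
    """
    p0, p1, p2, p3 = nodes
    volume = abs(dot(sub(p1, p0), cross(sub(p2, p0), sub(p3, p0)))) / 6.0
    b = [[0.0, 0.0, 0.0] for _ in range(4)]
    for opp in range(4):                      # facet m lies opposite node `opp`
        facet = [i for i in range(4) if i != opp]
        pa, pb, pc = (nodes[i] for i in facet)
        nA = cross(sub(pb, pa), sub(pc, pa))  # vector of length 2 * A_m
        if dot(nA, sub(nodes[opp], pa)) > 0:  # flip so the normal points outward
            nA = [-c for c in nA]
        for i in facet:
            for l in range(3):
                # w * phi_i(centroid) * n_l * A_m = 1 * (1/3) * nA[l] / 2
                b[i][l] += nA[l] / 6.0
    return [[v / volume for v in row] for row in b]


def smoothed_B(bi):
    """6 x 3 matrix of Eq. (4), Voigt order (xx, yy, zz, yz, xz, xy)."""
    bx, by, bz = bi
    return [[bx, 0.0, 0.0], [0.0, by, 0.0], [0.0, 0.0, bz],
            [0.0, bz, by], [bz, 0.0, bx], [by, bx, 0.0]]
```

Take the unit tetrahedron with vertices (0, 0, 0), (1, 0, 0), (0, 1, 0), and (0, 0, 1). For it, the sketch recovers the exact gradients of the linear shape functions, for example (−1, −1, −1) for the first node. This is expected, because the boundary integral in Equation (2) is exact for linear fields.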

By substituting the standard displacement field $u$ and the smoothing strain $\overset{⌢}{ε}$ into Equation (1) and performing the variation, the discretized governing equation $K \bar{u} = f$ is obtained. Here, the stiffness matrix $K$ is assembled from the contributions of the $N$ smoothing domains, and $f$ is the force vector:

K_{i j} = \sum_{k = 1}^{N} {K_{i j}}^{(k)}, \quad {K_{i j}}^{(k)} = \int_{Ω_{k}} {\overset{⌢}{B}}_{i}^{T} D {\overset{⌢}{B}}_{j} d Ω = V_{k} {\overset{⌢}{B}}_{i}^{T} D {\overset{⌢}{B}}_{j},

(6)

f = \int_{Γ_{t}} {Φ_{t}}^{T} t_{p} d Γ + \int_{Ω} {Φ_{i}}^{T} b d Ω,

(7)

where $Φ_{t}$ and $Φ_{i}$ represent the shape function vectors [31].
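The sketch below computes the sub-stiffness matrix $K^{(k)} = V_{k} {\overset{⌢}{B}}^{T} D {\overset{⌢}{B}}$ of one smoothing domain from Equation (6). It assumes an isotropic linear elastic $D$ in the Voigt order of Equation (4), with engineering shear strains. The function names are hypothetical, and so is the input format: the rows ${\overset{⌢}{b}}_{i}$ from Equation (5) plus the volume $V_{k}$.

```python
def isotropic_D(E, nu):
    """6 x 6 isotropic elasticity matrix, Voigt order (xx, yy, zz, yz, xz, xy)."""
    lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))
    mu = E / (2.0 * (1.0 + nu))
    D = [[0.0] * 6 for _ in range(6)]
    for i in range(3):
        for j in range(3):
            D[i][j] = lam + (2.0 * mu if i == j else 0.0)
        D[i + 3][i + 3] = mu          # engineering shear strains
    return D


def domain_stiffness(b, volume, D):
    """Sub-stiffness K^(k) = V_k * B^T D B of Eq. (6) for one smoothing domain.

    b: list of rows [b_ix, b_iy, b_iz] (Eq. (5)), one per interpolation node.
    Returns a dense 3n x 3n matrix with DOFs ordered (u_1x, u_1y, u_1z, u_2x, ...).
    """
    n = len(b)
    B = [[0.0] * (3 * n) for _ in range(6)]          # B = [B_1, B_2, ..., B_n]
    for i, (bx, by, bz) in enumerate(b):
        Bi = [[bx, 0, 0], [0, by, 0], [0, 0, bz],
              [0, bz, by], [bz, 0, bx], [by, bx, 0]]  # Eq. (4)
        for r in range(6):
            for c in range(3):
                B[r][3 * i + c] = Bi[r][c]
    DB = [[sum(D[r][s] * B[s][c] for s in range(6)) for c in range(3 * n)]
          for r in range(6)]
    return [[volume * sum(B[s][r] * DB[s][c] for s in range(6))
             for c in range(3 * n)] for r in range(3 * n)]
```

A quick consistency check is that a rigid-body translation of all nodes produces zero nodal forces. This follows because the rows ${\overset{⌢}{b}}_{i}$ of a domain sum to zero.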

3. Accelerated Convergence Strategies

As stated in the Introduction, the Newton–Raphson method is used for both the local and the global iterations. However, the Newton–Raphson method converges only when the initial estimate lies within its radius of convergence, and in both iterations its convergence is very sensitive to the increment size. To improve the convergence of both iterations, efficient acceleration strategies are proposed that incorporate the line search method [34] and the adaptive sub-step method [19].

3.1. Local Iteration

3.1.1. Stress Integration Algorithm

This paper uses the closest point projection method (CPPM) for stress integration [18]. The CPPM is a fully implicit method with high accuracy. Its core idea is elastic prediction followed by plastic correction (Figure 1). If the elastic trial stress does not exceed the yield surface, the stress is updated directly without correction; otherwise, the trial stress must be corrected back to the yield surface. The plastic correction is usually performed with the Newton–Raphson iterative method, as follows:

(1) Set the initial values: before the iteration starts, the stress is initialized to the elastic trial stress $σ^{B}$, and the plastic parameter $Δλ$ is set to 0. Then, we have

σ_{(0)} = σ^{B} = σ^{A} + D^{e} Δ ε, Δ λ_{(0)} = 0 .

(8)

In this paper, the path-independent strategy is used, in which the strain increment $Δε$ is measured from the converged state of the previous increment step. The path-independent strategy describes the strain path more reasonably and avoids the pseudo-unloading phenomenon [35].

(2) Check convergence:

F_{(k)} = F (σ_{(k)}) .

(9)

If $F_{(k)} < 10^{-6}$, the iteration has converged and stops; otherwise, proceed to step (3).

(3) Calculate the stress residual $r_{(k)}$, the plastic parameter increment $dΔλ_{(k)}$, and the stress increment $Δσ_{(k)}$:

r_{(k)} = σ_{(k)} - σ^{T r i a l} + Δ λ_{(k)} D^{e} b_{(k)},

(10)

d Δ λ_{(k)} = \frac{F_{(k)} - a_{(k)}^{T} Q_{(k)}^{- 1} r_{(k)}}{a_{(k)}^{T} Q_{(k)}^{- 1} D^{e} b_{(k)}},

(11)

Δ σ_{(k)} = - Q_{(k)}^{- 1} (r_{(k)} + d Δ λ_{(k)} D^{e} b_{(k)}),

(12)

where $Q = I + Δλ D_{e} \frac{\partial b}{\partial σ}$, $σ^{Trial}$ is the new elastic trial stress, $I$ is the identity matrix, and $a$ and $b$ are the first derivatives of the yield function and the plastic potential function with respect to stress, respectively.

(4) Update the stress $σ_{(k+1)}$ and the plastic parameter $Δλ_{(k+1)}$:

σ_{(k + 1)} = σ_{(k)} + Δ σ_{(k)}, \quad Δ λ_{(k + 1)} = Δ λ_{(k)} + d Δ λ_{(k)},

(13)

and then return to step (2).
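Steps (1) to (4) can be sketched in Python as follows. To keep the example self-contained, it works in a two-dimensional principal-stress space with associated flow ($b = a$), so that $\partial b / \partial σ$ is the Hessian of the yield function. The yield function is a hypothetical von Mises-type function $F = \sqrt{σ_{1}^{2} - σ_{1}σ_{2} + σ_{2}^{2}} - σ_{y}$. The following are illustrative assumptions, not the modified Mohr–Coulomb model used in the paper:

- this yield function;
- the 2 × 2 elasticity matrix;
- the extra residual-norm check added to the convergence test.

```python
import math


def yield_F(s, sy):
    """Hypothetical von Mises-type yield function in 2D principal-stress space."""
    return math.sqrt(s[0] ** 2 - s[0] * s[1] + s[1] ** 2) - sy


def grad_F(s):
    q = math.sqrt(s[0] ** 2 - s[0] * s[1] + s[1] ** 2)
    return [(2 * s[0] - s[1]) / (2 * q), (2 * s[1] - s[0]) / (2 * q)]


def hess_F(s):
    q = math.sqrt(s[0] ** 2 - s[0] * s[1] + s[1] ** 2)
    g = [2 * s[0] - s[1], 2 * s[1] - s[0]]
    gg = [[2.0, -1.0], [-1.0, 2.0]]
    return [[gg[i][j] / (2 * q) - g[i] * g[j] / (4 * q ** 3) for j in range(2)]
            for i in range(2)]


def mv(A, x):
    return [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]


def inv2(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]


def cppm(sigma_A, d_eps, De, sy, tol=1e-6, max_iter=50):
    """Closest point projection, steps (1)-(4); associated flow (b = a)."""
    de = mv(De, d_eps)
    trial = [sigma_A[0] + de[0], sigma_A[1] + de[1]]    # Eq. (8)
    sigma, dlam = list(trial), 0.0
    if yield_F(trial, sy) < tol:                        # elastic: no correction
        return sigma, dlam
    for _ in range(max_iter):
        F = yield_F(sigma, sy)                          # Eq. (9)
        a = grad_F(sigma)
        Deb = mv(De, a)                                 # D^e b with b = a
        r = [sigma[i] - trial[i] + dlam * Deb[i] for i in range(2)]  # Eq. (10)
        if abs(F) < tol and math.hypot(r[0], r[1]) < tol:  # residual check added
            return sigma, dlam
        H = hess_F(sigma)                               # db/dsigma
        Q = [[(1.0 if i == j else 0.0)
              + dlam * sum(De[i][k] * H[k][j] for k in range(2))
              for j in range(2)] for i in range(2)]
        Qinv = inv2(Q)
        Qr, QDb = mv(Qinv, r), mv(Qinv, Deb)
        ddlam = ((F - (a[0] * Qr[0] + a[1] * Qr[1]))
                 / (a[0] * QDb[0] + a[1] * QDb[1]))     # Eq. (11)
        sigma = [sigma[i] - (Qr[i] + ddlam * QDb[i]) for i in range(2)]  # Eqs. (12)-(13)
        dlam += ddlam                                   # Eq. (13)
    raise RuntimeError("plastic correction did not converge")
```

For the elasticity matrix used in the check (rows 200, 100 and 100, 200), the return happens to be radial, so the iteration converges after a single correction. Other matrices exercise the full $Q$ term.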

When the path-independent strategy and the CPPM are used for stress integration, the consistent tangent stiffness matrix $D^{epc}$ must be derived. It relates a change in the strain increment to the resulting change in the stress increment:

d Δ σ = D^{e p c} d Δ ε,

(14)

where

D^{e p c} = R - \frac{R b a^{T} R}{a^{T} R b}, \quad R = Q^{- 1} D^{e} .

The consistent tangent stiffness matrix preserves the asymptotic quadratic convergence rate of the Newton–Raphson method. It also avoids the pseudo-unloading caused by the continuum tangent stiffness matrix.
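One way to check an implementation of $D^{epc}$ uses the identity $a^{T} D^{epc} = a^{T}R - (a^{T}Rb)\,a^{T}R/(a^{T}Rb) = 0$. In other words, stress increments produced by the consistent tangent stay tangent to the yield surface. The sketch below assembles $D^{epc}$ from given $Q^{-1}$, $D^{e}$, $a$, and $b$ using plain lists. The function names and the numeric inputs in the check are arbitrary choices for illustration.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def consistent_tangent(Qinv, De, a, b):
    """D^epc = R - (R b a^T R) / (a^T R b) with R = Q^{-1} D^e (Eq. (14))."""
    n = len(a)
    R = matmul(Qinv, De)
    Rb = [sum(R[i][k] * b[k] for k in range(n)) for i in range(n)]   # R b
    aR = [sum(a[k] * R[k][j] for k in range(n)) for j in range(n)]   # a^T R
    denom = sum(a[i] * Rb[i] for i in range(n))                      # a^T R b
    return [[R[i][j] - Rb[i] * aR[j] / denom for j in range(n)] for i in range(n)]
```

The rank-one correction removes exactly the component of $R$ that would push the stress off the yield surface. Also, with $Q^{-1} = I$ and associated flow ($b = a$), the result is symmetric.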

However, the plastic correction itself may fail to converge: even for the ideal elastoplastic model, convergence problems may occur in regions of high curvature of the yield surface [18]. To improve the convergence of the plastic correction, this paper employs the line search method and the adaptive sub-step method.

3.1.2. Line Search Method

The Newton–Raphson method converges well only when the initial estimate lies within its radius of convergence. Moreover, it always takes the full iteration step, so a steady decrease of the function value cannot be guaranteed. To ensure a steady decrease, the line search method can be used to find a step size that decreases the function value. In the line search method, the step size can be determined by exact or inexact methods [36,37]. An exact method searches for the optimal step size along the search direction, which is very time-consuming. Thus, an inexact method is generally adopted, in which a step size is accepted as soon as it yields a sufficient decrease of the function value. In this paper, the simple backtracking algorithm is used, in which the step size must satisfy the following condition:

f (x_{k} + α d_{k}) \leq f (x_{k}) + c α \nabla f {(x_{k})}^{T} d_{k},

(15)

where $f(x_{k})$ is the objective function, $d_{k}$ is the descent direction, $α$ is the step size, and $c$ is a dimensionless parameter, usually set to $10^{-4}$. As shown in Figure 2, if the step size $α$ lies within $(α_{1}, α_{2})$ or $(α_{3}, α_{4})$, the function value decreases.
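The backtracking rule of Equation (15) can be sketched as follows. Only $c = 10^{-4}$ comes from the text. The reduction factor ρ = 0.5, the cap on the number of backtracks, and the function name are assumptions.

```python
def backtracking(f, grad, x, d, alpha0=1.0, c=1e-4, rho=0.5, max_backtracks=30):
    """Shrink alpha until Eq. (15) (sufficient decrease) holds."""
    fx = f(x)
    slope = sum(g * di for g, di in zip(grad(x), d))     # grad f(x_k)^T d_k
    if slope >= 0.0:
        raise ValueError("d is not a descent direction")
    alpha = alpha0
    for _ in range(max_backtracks):
        x_new = [xi + alpha * di for xi, di in zip(x, d)]
        if f(x_new) <= fx + c * alpha * slope:           # Eq. (15)
            return alpha, x_new
        alpha *= rho                                     # backtrack
    raise RuntimeError("no step satisfying Eq. (15) was found")
```

Consider $f(x) = (x_{1} - 1)^{2} + 10x_{2}^{2}$, started at (0, 1) along the steepest-descent direction. The steps 1, 0.5, 0.25, and 0.125 are rejected, and α = 0.0625 is accepted.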

In the CPPM, the stress residual $r$ is used to determine whether the plastic correction is complete; it is expressed as follows:

r = σ - σ^{T r i a l} + Δ λ D_{e} b .

(16)

To apply the line search method [34], the Euclidean norm of the stress residual $r$ is chosen as the objective function. Moreover, the iteration step size can be controlled solely by modifying the plastic parameter $Δλ$ with the line search parameter $ζ$:

Q^{- 1} = {(I + ζ Δ λ D_{e} \frac{\partial b}{\partial σ})}^{- 1} .

(17)

The initial value of the line search parameter $ζ$ is usually set to 1, which means that the standard Newton–Raphson step is tried first. Suppose that the residual norm at iteration $k + 1$ does not decrease sufficiently compared with that at iteration $k$, i.e.,

∥r (σ_{(k + 1)})∥ = ∥r (σ_{(k)} + ζ_{(k)} d σ_{(k)})∥ > ∥r (σ_{(k)})∥ - β ζ_{(k)} |\frac{\partial ∥r (σ_{(k)})∥}{\partial σ_{(k)}} d σ_{(k)}|,

(18)

where $∥•∥$ is the Euclidean norm, $ζ_{(k)}$ is the line search parameter at iteration $k$, and $β$ is a fixed parameter, set to $β = 10^{-4}$ in this paper.

If Equation (18) is satisfied, the results of iteration $k + 1$ are discarded, and the line search is activated by replacing $ζ_{(k)}$ with $ρ ζ_{(k)}$, where $ρ$ is a reduction factor with $0 < ρ < 1$. Iteration $k + 1$ is then restarted with the smaller step size. If Equation (18) is not satisfied, a proper step size has been found, and the line search parameter is kept unchanged in the following iterations. If Equation (18) is satisfied again in a later iteration, the line search parameter is reduced further, and so on until convergence is achieved. The number of searches is usually limited. When it exceeds the maximum number, which is set to 5 in this paper, the line search fails.

This paper also considers another case: the plastic correction converges with the Newton–Raphson method, but very slowly. To address this, when the Newton–Raphson method has not converged after 10 iterations, the line search parameter $ζ_{(k)}$ is enlarged by a factor of 2 for the next iteration.
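The ζ control described above can be sketched on a scalar damped Newton iteration:

- reduce ζ by ρ whenever Equation (18) holds;
- allow at most 5 searches per iteration;
- enlarge ζ by 2 once 10 iterations have passed without convergence.

This is a simplification. Here ζ scales the whole Newton step of a generic residual $r(x)$, whereas Equation (17) applies it to $Δλ$ inside $Q$. For a Newton direction, the directional derivative of $|r|$ equals $-|r|$, so Equation (18) reduces to $|r_{new}| > (1 - βζ)|r_{old}|$. The value ρ = 0.5, the cap ζ ≤ 1 after enlargement, and all names are assumptions.

```python
def damped_newton(r, dr, x0, rho=0.5, beta=1e-4, max_search=5,
                  slow_after=10, enlarge=2.0, tol=1e-10, max_iter=100):
    """Scalar Newton iteration with the zeta control of Section 3.1.2.

    Returns (x, converged). converged is False when the line search fails,
    which is the point where the adaptive sub-step method would take over.
    """
    x, zeta = x0, 1.0                        # zeta = 1: try the full Newton step first
    for k in range(max_iter):
        rx = r(x)
        if abs(rx) < tol:
            return x, True
        if k >= slow_after:                  # slow convergence: enlarge zeta
            zeta = min(1.0, enlarge * zeta)  # assumption: never exceed a full step
        dx = -rx / dr(x)
        for _ in range(max_search + 1):      # first trial plus up to 5 searches
            x_new = x + zeta * dx
            # Eq. (18) for a Newton direction: reject unless |r| drops enough
            if abs(r(x_new)) <= (1.0 - beta * zeta) * abs(rx):
                break
            zeta *= rho                      # discard and retry with a smaller step
        else:
            return x, False                  # line search failed
        x = x_new                            # accepted; zeta is kept for later iterations
    return x, False
```

Take $r(x) = \arctan x$ with $x_{0} = 3$. The undamped Newton iteration overshoots and diverges, whereas the sketch accepts ζ = 0.25 in the first iteration and then converges to the root at 0.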

3.1.3. Adaptive Sub-Step Method

When the line search method fails, this paper further adopts an adaptive sub-step strategy to ensure convergence. In this strategy, the strain increment is divided into several parts, and the stress update is performed for each part in turn. In this paper, the strain increment $Δε$ is divided into $m$ uniform parts:

Δ ε = Δ ε_{1} + Δ ε_{2} + \dots + Δ ε_{m} = \sum_{k = 1}^{m} α_{k} Δ ε .

(19)

In this paper, the parameter $α_{k}$ is constant and equal to $1/m$. The number of parts $m$ is adjusted dynamically, starting from 1. When the line search fails, the current result is discarded, $m$ is doubled, and the plastic correction is restarted. If convergence still fails, $m$ continues to be doubled until convergence is achieved.

Let $σ_{n}$ be the stress before the update and $σ_{n+1}$ the stress after the update. In the adaptive sub-step strategy, the stress is updated as follows:

σ_{n} \to σ_{n + 1 / m} \to σ_{n + 2 / m} \to \dots \to σ_{n + 1} .

(20)

For each sub-step, the plastic correction is conducted using the Newton–Raphson method together with the line search method.
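The sub-step logic of Equations (19) and (20) amounts to a small driver around the single-step update. In the sketch below, `local_update` is a hypothetical callback standing in for the CPPM with line search. It returns the updated stress, or `None` when the line search fails. The limit on the number of doublings is an assumption.

```python
def substep_update(sigma_n, d_eps, local_update, max_doublings=10):
    """Adaptive sub-stepping of Eqs. (19)-(20) with alpha_k = 1/m.

    local_update(sigma, d_eps_part) is a hypothetical single-step stress update
    (e.g., CPPM with line search) returning the new stress, or None on failure.
    """
    m = 1
    for _ in range(max_doublings + 1):
        sigma, ok = list(sigma_n), True
        part = [e / m for e in d_eps]                # alpha_k * d_eps
        for _ in range(m):                           # sigma_n -> sigma_{n+1/m} -> ...
            new_sigma = local_update(sigma, part)
            if new_sigma is None:                    # line search failed
                ok = False
                break
            sigma = new_sigma
        if ok:
            return sigma, m
        m *= 2                                       # discard the result, double m
    raise RuntimeError("adaptive sub-stepping did not converge")
```

Suppose the callback only converges for strain parts no larger than 0.3. A unit strain increment then fails with m = 1 and m = 2 and succeeds with m = 4.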

For the adaptive sub-step strategy, the consistent tangent stiffness matrix in Equation (14) is not appropriate, because it describes a single return mapping, whereas the final stress now results from $m$ successive ones. Thus, a consistent tangent stiffness matrix conforming to the adaptive sub-step strategy is adopted [19]. Applying the chain rule over the $m$ sub-steps gives:

\frac{d Δ σ_{n + 1}}{d Δ ε} = [\sum_{i = 1}^{m} α_{i} \prod_{j = m}^{i} L_{n + j / m}] D_{e},

(21)

where

L_{n + k / m} = {(Q^{- 1} - \frac{Q^{- 1} D_{e} b a^{T} Q^{- 1}}{a^{T} Q^{- 1} D_{e} b})}_{n + k / m} .

(22)

For example, when $m = 1$, we have $α_{1} = 1$, and Equation (21) reduces to

\frac{d Δ σ_{n + 1}}{d Δ ε} = α_{1} L_{n + 1} D_{e} = {(R - \frac{R b a^{T} R}{a^{T} R b})}_{n + 1},

which is consistent with Equation (14).
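Equation (21) can be evaluated in two ways. One is directly, as a sum of matrix products. The other is the equivalent recursion $T_{k} = L_{n+k/m}(T_{k-1} + α_{k} D_{e})$ with $T_{0} = 0$. The recursion follows from applying the chain rule one sub-step at a time and needs only $m$ matrix products. The sketch below implements both so that they can be compared. The matrices $L_{n+k/m}$ are given inputs, and the function names are hypothetical.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def tangent_direct(Ls, De):
    """Eq. (21): [sum_i alpha_i * L_m L_{m-1} ... L_i] D_e with alpha_i = 1/m."""
    m, n = len(Ls), len(De)
    S = [[0.0] * n for _ in range(n)]
    for i in range(m):                       # sub-step i+1 (0-based index i)
        P = identity(n)
        for j in range(m - 1, i - 1, -1):    # multiply L_m, then L_{m-1}, ..., L_i
            P = matmul(P, Ls[j])
        S = [[S[r][c] + P[r][c] / m for c in range(n)] for r in range(n)]
    return matmul(S, De)


def tangent_recursive(Ls, De):
    """Same result via T_k = L_k (T_{k-1} + alpha_k D_e), T_0 = 0."""
    m, n = len(Ls), len(De)
    T = [[0.0] * n for _ in range(n)]
    for L in Ls:                             # sub-steps 1, 2, ..., m in order
        T = matmul(L, [[T[r][c] + De[r][c] / m for c in range(n)] for r in range(n)])
    return T
```

With a single sub-step, both functions return $L_{n+1} D_{e}$, which matches the $m = 1$ case above.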

3.2. Global Iteration

For the global iteration, the simulation program may fail to converge when the increment is large, especially in the critical state, and convergence is very sensitive to the increment size. Thus, selecting an appropriate increment size can improve the convergence and stability of the simulation program. In ABAQUS, the adaptive sub-step scheme for the global iteration works as follows:

- If an increment converges within a reasonable number of iterations, the increment size remains unchanged.
- If two consecutive increments converge within a small number of iterations (≤4), the size of the subsequent increments is increased by 50%.
- If an increment does not converge after many iterations (≥16), 25% of the current increment size is taken as the new size.

If the increment size falls below a specified minimum, the analysis is judged to be nonconvergent.

In this paper, a new adaptive sub-step method for global iteration is designed and can be expressed as follows:

(1) In the elastic stage, there is no convergence problem, so a larger increment size is allowed. To pass through the elastic stage quickly, the initial step size is determined by an elastic trial calculation, with the goal of making the initial step as large as possible while the model remains in the elastic stage. First, the trial step size R0 is initialized to 0.1. Second, the stress of each smoothing domain is calculated using the elastic model. Once the stress of any smoothing domain exceeds the yield surface, the model is judged to be in the plastic stage, and the initial step size R1 is set to R0. Otherwise, if the model is still in the elastic stage, R0 is increased by 0.1 and the check is repeated until the model enters the plastic stage.

(2) Once the initial step size R1 has been determined, the subsequent increment size R2 remains unchanged as long as the number of iterations within an increment is less than N.

(3) When the number of iterations within an increment exceeds N (N is set to 20 in this paper), the increment is judged to be nonconvergent. The increment size is then reduced to 50% of R2, and the iteration is restarted. If the increment still does not converge after the size has been reduced 5 times, the analysis is judged to be nonconvergent.

Compared with ABAQUS, the SPIM simulation program does not enlarge the increment size. The reason is that a large increment may place the trial stress far from the yield surface, which increases the time needed for stress correction. Moreover, the number of iterations does not reliably indicate the degree of nonlinearity, so enlarging the increment size in the plastic stage is not appropriate. If the next, larger increment fails to converge, it must be recalculated, which reduces computational efficiency. Another difference is that the elastic trial calculation effectively reduces the computation time of the elastic stage.
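Steps (1) to (3) can be sketched as a load-stepping driver. The callbacks `is_elastic(load)` and `solve_increment(start, size)` are hypothetical stand-ins for the SPIM solver; `solve_increment` returns the number of global iterations used. The load is expressed as a fraction of the total. The plastic-stage step size `plastic_step` (R2) is passed in as an input, because the text does not state how it is chosen. Following the wording of step (1) literally, R1 is the first trial value at which yielding is detected.

```python
def run_load_steps(is_elastic, solve_increment, plastic_step, total=1.0,
                   trial_inc=0.1, n_max=20, max_cuts=5):
    """Global adaptive stepping, steps (1)-(3); returns (converged, load, sizes)."""
    # (1) Elastic trial calculation: grow R0 by 0.1 until any domain yields.
    k = 1
    while k * trial_inc < total and is_elastic(k * trial_inc):
        k += 1
    r1 = min(k * trial_inc, total)           # literal reading: R1 = first yielding R0

    load, sizes, step = 0.0, [], r1
    while load < total - 1e-9:
        size = min(step, total - load)
        for cut in range(max_cuts + 1):
            if solve_increment(load, size) <= n_max:   # (2) converged within N
                break
            if cut == max_cuts:                        # reduced 5 times: give up
                return False, load, sizes
            size *= 0.5                                # (3) halve and restart
        load += size
        sizes.append(size)
        # After the elastic step use R2 = plastic_step; afterwards keep the
        # (possibly reduced) size and never enlarge it.
        step = plastic_step if len(sizes) == 1 else size
    return True, load, sizes
```

Because the size is never enlarged after a cutback, a single difficult increment fixes the step size for the rest of the analysis. This matches the trade-off discussed above.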

4. OpenMP Parallel Design

OpenMP is a parallel programming model that makes it easy to transform serial code into parallel code [38]. It works as follows:

(1) Developers mark parallel regions with OpenMP directives. For example, the most commonly used directive is “#pragma omp parallel for”, which is placed before a for loop.

(2) The compiler analyzes these directives and generates multithreaded parallel code.

(3) At run time, the OpenMP runtime system creates and manages the threads and distributes the workload among them.

(4) In parallel regions, multiple threads execute code simultaneously and share variables in memory.

Therefore, data competition (data races) and data dependence must be avoided; for example, two threads must not update a shared variable at the same time. Thus, before the SPIM simulation program is parallelized, it is necessary to determine whether each main module can be parallelized, that is, whether it involves data competition or data dependence.

The flowchart of the SPIM simulation program is shown in Figure 3. The program mainly includes the following modules: constructing the smoothing domains, assembling the system stiffness matrix, imposing the boundary conditions, solving the system equations, and solving the stress and strain. The following part analyzes whether each of these modules is suitable for parallelization.

(1) Constructing the smoothing domain. The smoothing domains of the face-based SPIM are constructed as follows. First, the centroid of each tetrahedron is determined. Then, a loop over all edges of each tetrahedron is conducted, and the two nodes of each edge together with the centroid of the tetrahedron form a new triangular facet. According to the edge–face topology, the new triangular facet is stored as a boundary facet of the smoothing domains of the original faces connected to the edge. Here, an original face is a triangular facet of the tetrahedron, not a newly generated one. Finally, after the loop over all tetrahedra is completed, a loop over the triangular facets on the model boundary is conducted, and each boundary triangular facet is stored as part of its own smoothing domain.

As can be seen from the above, constructing the smoothing domains involves data competition, for the following reason. Each inner smoothing domain spans two adjacent tetrahedral elements. If these two elements are handled by two different threads, array access conflicts occur when the new triangular facets are stored as boundary facets of the same smoothing domain. Moreover, assigning adjacent tetrahedral elements to the same thread is difficult. Thus, constructing the smoothing domains is not suitable for parallelization. Instead, to improve the efficiency of this step, an efficient construction algorithm [32] is adopted in the simulation program, which effectively reduces the algorithmic complexity.

(2) Assembling the system stiffness matrix. This part mainly consists of calculating the sub-stiffness matrix of each smoothing domain and assembling the sub-stiffness matrices into the system stiffness matrix. The concrete steps are as follows. First, a loop over all smoothing domains is conducted. Then, the smoothing strain matrix of each smoothing domain is calculated using Equations (4) and (5). Next, the sub-stiffness matrix is calculated using Equation (6). Finally, the sub-stiffness matrix is assembled into the system stiffness matrix. More details about assembling the system stiffness matrix can be found in Reference [13].

Since each smoothing domain is independent, the smoothing strain matrix and the sub-stiffness matrix can be calculated in parallel using OpenMP. However, when the sub-stiffness matrices are assembled into the system stiffness matrix A, two smoothing domains handled by different threads may share the same node index, which leads to array access conflicts during the accumulation. Nevertheless, because assembling the system stiffness matrix occupies a large share of the solution time, this region is still parallelized. To avoid data competition, a sleep-waiting lock is adopted; see Algorithm 1.

Algorithm 1: Pseudocode of assembling the system stiffness matrix using the OpenMP instruction
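The locking pattern described above can be illustrated in Python. Here `concurrent.futures` threads and a `threading.Lock` stand in for OpenMP threads and an OpenMP lock (`omp_set_lock`/`omp_unset_lock` in C/C++). This is an illustration, not a reproduction of Algorithm 1. The per-domain computation runs without the lock, and only the accumulation into the shared global matrix is serialized. Because of Python's global interpreter lock, the sketch shows the structure rather than a real speedup. The names and the dictionary-based sparse storage are assumptions.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def assemble_parallel(domain_dofs, local_stiffness, n_threads=4):
    """Assemble sum_k K^(k) into a shared sparse matrix {(row, col): value}.

    domain_dofs[k] lists the global DOF indices of smoothing domain k;
    local_stiffness(k) returns its dense sub-stiffness matrix.
    """
    K = {}
    lock = threading.Lock()

    def work(k):
        dofs = domain_dofs[k]
        Ke = local_stiffness(k)              # independent per domain: no lock
        with lock:                           # shared accumulation: one thread at a time
            for a, I in enumerate(dofs):
                for b, J in enumerate(dofs):
                    K[(I, J)] = K.get((I, J), 0.0) + Ke[a][b]

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(work, range(len(domain_dofs))))   # list() re-raises errors
    return K
```

A single global lock is the simplest correct choice. Finer-grained alternatives, such as per-row locks or atomic updates of individual entries, reduce contention at the cost of more bookkeeping.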

(3) Imposing boundary conditions. Displacement boundary conditions must be imposed to restrict rigid body motions. Fortunately, the shape functions of the SPIM possess the Kronecker delta property, so displacement boundary conditions can be imposed as easily as in the FEM. In this paper, the penalty function method is adopted to impose the boundary conditions [39]. The penalty function method only requires modifying the diagonal elements of the system stiffness matrix and the corresponding components of the force vector, which is easy to implement; see Equation (23).

K_{i i} \Rightarrow α K_{i i}, \quad F_{j} \Rightarrow \begin{cases} α K_{i i} {\bar{u}}_{i}, & j = i \\ F_{j}, & j \neq i \end{cases},

(23)

where $i$ is the index of a constrained degree of freedom, $K_{ii}$ is the original diagonal element of the stiffness matrix, and $α$ is the penalty factor, which is much larger than the elements of the stiffness matrix. $F_{j}$ denotes the components of the force vector, of which only the $i$-th is modified, and ${\bar{u}}_{i}$ is the prescribed displacement. Usually, $α$ can be set to $α = 10^{4} \sim 10^{8} \times {(K_{ii})}_{max}$, where ${(K_{ii})}_{max}$ is the largest diagonal element.

Since the indexes of the boundary nodes in the system stiffness matrix do not change, a preprocessing function is designed to improve the efficiency of imposing boundary conditions. It determines in advance the positions to be modified in the stiffness matrix and the force vector and stores them in a dedicated array. When the system stiffness matrix is assembled, these positions can be retrieved quickly, which greatly reduces the calculation time. Thus, parallelization is not necessary for imposing boundary conditions in the SPIM simulation program.
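The preprocessing idea can be sketched for a stiffness matrix stored in compressed sparse row (CSR) form. The positions of the constrained diagonal entries in the CSR data array are located once, and Equation (23) then touches only those positions. The CSR layout and the function names are assumptions; the paper's compressed storage format [13] is not reproduced here. In the sketch, α is used as a dimensionless multiplier of $K_{ii}$, as Equation (23) applies it.

```python
def diag_positions(indptr, indices, fixed_dofs):
    """Preprocessing: position of K_ii in the CSR data array for each fixed DOF i."""
    pos = {}
    for i in fixed_dofs:
        for p in range(indptr[i], indptr[i + 1]):
            if indices[p] == i:
                pos[i] = p
                break
        else:
            raise ValueError(f"row {i} stores no diagonal entry")
    return pos


def apply_penalty(data, F, pos, u_bar, alpha=1e8):
    """Eq. (23): K_ii <- alpha * K_ii and F_i <- alpha * K_ii * u_bar_i (in place)."""
    for i, p in pos.items():
        data[p] *= alpha                     # data[p] now holds alpha * K_ii
        F[i] = data[p] * u_bar[i]
```

As a check, take a 2 × 2 system with the first degree of freedom prescribed as 0.01. Solving the modified system gives about 0.0100000025 for that degree of freedom, which shows that the penalty term dominates the constrained row.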

(4) Solving the system equations. As stated in the Introduction, to enhance the calculation efficiency, the research team introduced the PARDISO solution library to solve the system equations for the face-based SPIM. The test results show that when the number of nodes is 13,710, the time to solve the system equations is only 3.077 s, which is satisfactory. Thus, OpenMP parallelization is not considered when solving the system equations. More details about solving the system equations can be seen in Reference [13].

(5) Solving the strain and stress. In this part, the strain increment is first obtained by the nodal displacement increment, and then the stress field of the calculation model can be obtained using the stress integration algorithm. As introduced in Section 3, the stress integration algorithm needs to be solved iteratively. When the iterations do not converge, the line search method and adaptive sub-step method are needed to enhance the convergence, which will occupy a lot of solving time, especially when the nonlinearity degree is high.

Similar to assembling the system stiffness matrix, solving the strain and stress is carried out based on each smoothing domain independently, without data competition and data dependence, which is suitable for parallelization. The pseudocode of solving the strain and stress using the OpenMP instruction can be seen in Algorithm 2.

Algorithm 2: Pseudocode of solving the strain and stress using the OpenMP instruction
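Each smoothing domain updates only its own stress, so this loop needs no lock. The sketch below shows that structure with `ThreadPoolExecutor.map`, which returns results in input order. It is not a reproduction of Algorithm 2. The one-dimensional elastic–perfectly plastic update is a hypothetical stand-in for the CPPM. As before, Python threads illustrate the structure rather than OpenMP performance.

```python
from concurrent.futures import ThreadPoolExecutor


def update_domain(state):
    """Hypothetical 1D elastic-perfectly plastic update for one smoothing domain."""
    sigma_old, d_eps, E, sy = state
    trial = sigma_old + E * d_eps            # elastic prediction
    return max(-sy, min(sy, trial))          # return to |sigma| <= sy


def update_all(states, n_threads=4):
    """Each task reads and returns only its own domain's data: no lock is needed."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(update_domain, states))   # results keep input order
```

Since no two tasks write to the same location, the parallel result is identical to a serial loop over the domains.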

5. Numerical Tests

In this section, the correctness, convergence, stability, and OpenMP parallel performance of the SPIM simulation program are verified. To test the correctness, convergence, and stability, a classical simplified heterogeneous slope model is employed, and the strength reduction method is used to conduct the slope stability analysis. To test the OpenMP parallel performance, a large-scale slope model and an underground gas storage model are employed.

5.1. Correctness

This paper uses a classical heterogeneous slope model, whose geometry and mesh are shown in Figure 4. The mesh consists of 1356 nodes, 3925 tetrahedral elements, and 9153 smoothing domains. To simulate plane strain conditions, the displacement boundary conditions are set as follows: the bottom of the model is fixed in all three directions, the lateral boundaries are fixed in their normal directions, and the top surface is free. The modified Mohr–Coulomb (MC) model [40] is employed, and the calculation parameters are listed in Table 1.

To conduct the slope stability analysis, three failure criteria are used: nonconvergence of the calculation program (Criterion 1), the displacement mutation criterion (Criterion 2), and the equivalent plastic zone penetration criterion (Criterion 3). The safety factor of 0.43 obtained by the Bishop method is taken as the baseline. First, the safety factor of the simplified slope model calculated by Criterion 1 is 0.54. For Criterion 2, the Z-displacement of Point A and the X-displacement of Point B are selected as feature points, and their displacement curves under different reduction factors are shown in Figure 5. When the reduction factor reaches 0.43, the displacements at Points A and B change abruptly; thus, the safety factor determined by Criterion 2 is 0.42. For Criterion 3, the contours of the equivalent plastic zone under different reduction factors are shown in Figure 6. At a reduction factor of 0.42, the equivalent plastic zone is not yet penetrated; at 0.43, it is penetrated. Thus, the safety factor determined by Criterion 3 is 0.42.
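Criterion 2 can be automated by scanning the displacement history of a feature point for the first abrupt jump. The paper does not state how the mutation is detected, so the sketch below assumes a hypothetical rule: a jump is flagged when the displacement increment exceeds `jump_ratio` (here 5) times the preceding increment, and the safety factor is the last reduction factor before that jump.

```python
from typing import Optional, Sequence


def safety_factor_by_mutation(factors: Sequence[float],
                              displacements: Sequence[float],
                              jump_ratio: float = 5.0) -> Optional[float]:
    """Criterion 2 sketch: last reduction factor before the first abrupt jump.

    A jump is flagged when the displacement increment between two consecutive
    reduction factors exceeds `jump_ratio` (a hypothetical threshold) times the
    preceding increment. Returns None if no jump is found.
    """
    for i in range(2, len(factors)):
        prev_inc = abs(displacements[i - 1] - displacements[i - 2])
        inc = abs(displacements[i] - displacements[i - 1])
        if prev_inc > 0.0 and inc > jump_ratio * prev_inc:
            return factors[i - 1]
    return None
```

The threshold is a tuning choice; in practice it should be checked against the plotted curves, such as those in Figure 5.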

The safety factor calculated by Criterion 1 is therefore larger than those determined by Criteria 2 and 3. In general, Criteria 2 and 3 give safety factors closer to those of the limit equilibrium method than Criterion 1 does, which is consistent with the findings in [41]. Moreover, the safety factor of 0.42 determined by Criteria 2 and 3 differs from the Bishop value of 0.43 by only about 2%. These numerical results verify the correctness of the simulation program.

5.2. Convergence and Stability

As discussed above, two accelerated convergence strategies are adopted to improve the convergence and stability of the simulation program. To verify their effect, two sets of mechanical parameters, corresponding to reduction factors of 0.42 and 0.54, are selected; they represent different degrees of nonlinearity, with the latter being higher. The incremental step size R2 is set to 0.04, 0.08, 0.1, and 0.2 for the reduction factor of 0.42, and to 0.01, 0.02, 0.04, and 0.08 for the reduction factor of 0.54.
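The text does not spell out how R2 is reduced when a load increment fails. A common scheme, assumed here, drives the load factor from R1 to 1.0 in steps of R2 and halves the current increment whenever the global iteration does not converge. `solve_step` is a hypothetical callback that runs the global Newton iteration for a target load factor, reports whether it converged, and is assumed to restore the last converged state on failure.

```python
from typing import Callable, List


def run_load_steps(solve_step: Callable[[float], bool], r1: float, r2: float,
                   r2_min: float = 1e-4, cut: float = 0.5) -> List[float]:
    """Drive the load factor from r1 to 1.0 in increments of r2.

    `solve_step(lam)` is a hypothetical callback that runs the global Newton
    iteration at load factor `lam` and returns True on convergence. A failed
    increment is multiplied by `cut` (halved by default) and retried.
    """
    if not solve_step(r1):
        raise RuntimeError("initial load step did not converge")
    lam, inc = r1, r2
    history = [lam]
    while lam < 1.0:
        target = lam + inc
        if target > 1.0 - 1e-9:          # snap the last step onto lam = 1 (avoids round-off)
            target = 1.0
        if solve_step(target):
            lam = target
            history.append(lam)
        else:
            inc *= cut                   # reduce R2 and retry from the last converged state
            if inc < r2_min:
                raise RuntimeError("load increment fell below r2_min")
    return history
```

This version never enlarges the increment again after a cut; an adaptive scheme could also grow R2 after several quickly converging steps.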

When the reduction factor is 0.42, the initial step size R1 is 0.2, and the incremental step size R2 remains constant in all cases. The average numbers of iterations for the four cases are 6.67, 7.09, 7.22, and 8.20, respectively, and the maximum number of iterations is 10 (Figure 7). When the reduction factor is 0.54, the initial step size R1 is 0.1, and R2 does not remain constant in all cases: when R2 is 0.01, 0.04, or 0.08, the analysis converges only after R2 is reduced. The average numbers of iterations are 7.81, 8.19, 9.21, and 9.86, respectively, and the maximum number of iterations is 19 (Figure 8). Thus, the analysis program maintains good convergence and stability. In addition, for the reduction factor of 0.42, the changes in the absolute values of the maximum unbalanced force (MaxUF) and the maximum displacement increment (MaxDU) within the last increment step are listed in Table 2. From Table 2, it can be seen that the rate of convergence is quadratic even for large load increments.
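The convergence rate can be quantified from an iteration history such as the MaxUF or MaxDU columns of Table 2. With errors e_k, the observed order is p_k = ln(e_{k+1}/e_k) / ln(e_k/e_{k-1}); p close to 2 indicates quadratic convergence (the number of correct digits roughly doubles per iteration), whereas p close to 1 with a nearly constant ratio e_{k+1}/e_k indicates linear convergence. A minimal sketch, assuming a positive, strictly decreasing history:

```python
import math
from typing import List, Sequence


def observed_orders(errors: Sequence[float]) -> List[float]:
    """Observed convergence orders p_k = ln(e_{k+1}/e_k) / ln(e_k/e_{k-1}).

    `errors` must be positive and strictly decreasing (e.g. |MaxUF| per iteration).
    p close to 2 indicates quadratic convergence; p close to 1 indicates linear.
    """
    orders = []
    for k in range(1, len(errors) - 1):
        e_prev, e_k, e_next = errors[k - 1], errors[k], errors[k + 1]
        orders.append(math.log(e_next / e_k) / math.log(e_k / e_prev))
    return orders
```

The first few iterations are usually outside the asymptotic range, so the last several values of p are the meaningful ones.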

In this paper, the line search method is used to improve the convergence of the local iteration. To verify this, the mechanical parameters for the reduction factor of 0.54 are selected, and R2 is set to 0.08. Two integration points are selected, and the convergence of their local iterations is examined. When a local iteration does not converge, the line search method reduces the step size, after which the local iteration converges. When a local iteration converges but requires many iterations, the line search method enlarges the step size, which accelerates convergence (Figure 9).
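Figure 2 illustrates the backtracking idea. The sketch below applies it to a scalar Newton iteration: the full Newton step is tried first and halved until the residual magnitude decreases. The scalar residual `r` and derivative `dr` are hypothetical stand-ins for the local stress-integration residual and its Jacobian, and only step reduction is shown; the step enlargement used for slowly converging points is not modeled. For r(x) = arctan(x) starting from x0 = 2, the undamped Newton iteration diverges, whereas the backtracking version converges to the root x = 0.

```python
import math
from typing import Callable, Tuple


def newton_backtracking(r: Callable[[float], float], dr: Callable[[float], float],
                        x0: float, tol: float = 1e-10, max_iter: int = 50,
                        alpha_min: float = 1e-4) -> Tuple[float, int]:
    """Scalar Newton iteration with a backtracking line search on |r|.

    The full Newton step (alpha = 1) is tried first and halved until the
    residual magnitude decreases or alpha falls below alpha_min.
    Returns (solution, number of iterations).
    """
    x = x0
    for it in range(max_iter):
        res = r(x)
        if abs(res) < tol:
            return x, it
        dx = -res / dr(x)                 # full Newton step
        alpha = 1.0
        while abs(r(x + alpha * dx)) >= abs(res) and alpha > alpha_min:
            alpha *= 0.5                  # backtrack: shorten the step
        x += alpha * dx
    raise RuntimeError("local iteration did not converge")


# Usage: plain Newton diverges for arctan from x0 = 2; backtracking converges.
root, iters = newton_backtracking(math.atan, lambda x: 1.0 / (1.0 + x * x), 2.0)
```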

5.3. OpenMP Parallel Performance

As discussed in Section 4, an OpenMP feasibility analysis of the SPIM simulation program was conducted, and two modules, assembling the stiffness matrix and solving the strain and stress, were parallelized. To verify the performance of the OpenMP parallel program, two large-scale models are employed: a real slope model and a simplified underground gas storage model. The computing platform uses an Intel Core i5-13500HX CPU, which has 14 physical cores (6 performance cores and 8 efficient cores) and 20 hardware threads.

5.3.1. Real Slope Model

Case 1 is a heterogeneous slope model, which is composed of 11,292 nodes, 52,600 tetrahedral elements, and 110,066 smoothing domains, and its geometry and mesh model can be seen in Figure 10. The lithology of the slope is gneiss with different degrees of weathering, and its calculation parameters are shown in Table 3.

For Case 1, the safety factor of the real slope model calculated by Criterion 1 is 1.30. To test the OpenMP parallel performance more thoroughly, the following test schemes are designed. (1) The reduction factor of the slope is set to 1.00 and 1.30, which correspond to two sets of mechanical parameters with different degrees of nonlinearity. The larger the reduction factor, the higher the degree of nonlinearity and the longer the solving time. (2) For each set of mechanical parameters, the incremental step size R2 is set to 0.01, 0.02, and 0.04; the smaller the R2, the longer the solving time.

Table 4 lists the solving time and speedup ratio of the different test schemes. The results indicate the following: (1) when the reduction factor is 1.00, the speedup ratio decreases gradually as R2 increases, from 7.70 to 7.14 and 6.04; (2) when the reduction factor is 1.30, the speedup ratio likewise decreases from 8.08 to 7.78 and 6.46; (3) the average speedup ratios for reduction factors of 1.00 and 1.30 are 6.96 and 7.44, respectively; (4) for the same R2, the larger the reduction factor, the larger the speedup ratio. These results show that the OpenMP parallel method effectively improves the computational efficiency of the simulation program, achieving a maximum speedup ratio of 8.08. Moreover, the speedup ratio tends to increase as the amount of calculation and the degree of nonlinearity increase.
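The speedup ratios in Table 4 follow from S = T_serial / T_parallel, and dividing by the number of threads p gives the parallel efficiency E = S / p. The sketch below reproduces the rows for the reduction factor of 1.30; normalizing the efficiency by p = 20 (the hardware thread count of the test platform) is an assumption, since the CPU mixes two kinds of cores.

```python
from statistics import mean


def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup ratio S = T_serial / T_parallel."""
    return t_serial / t_parallel


def efficiency(s: float, n_threads: int) -> float:
    """Parallel efficiency E = S / p."""
    return s / n_threads


# (unaccelerated, accelerated) solving times in seconds, Table 4, reduction factor 1.30
times_rf130 = [(2132, 264), (1175, 151), (640, 99)]
ratios = [round(speedup(ts, tp), 2) for ts, tp in times_rf130]
print(ratios)                                  # [8.08, 7.78, 6.46]
print(round(mean(ratios), 2))                  # 7.44
print(round(efficiency(max(ratios), 20), 3))   # 0.404
```

An efficiency of about 0.4 at the best speedup is consistent with only two modules being parallelized while the equation solve and other steps remain outside the OpenMP regions.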

To examine this further, the solving times of the last 10 loading steps before and after parallelization are compared. Figure 11 shows the results for the reduction factor of 1.00. When R2 is 0.01, the minimum, maximum, and average speedup ratios are 8.67, 14.50, and 11.89, respectively; when R2 is 0.02, they are 11.00, 13.00, and 12.36; and when R2 is 0.04, they are 10.67, 14.00, and 12.55. Figure 12 shows the results for the reduction factor of 1.30. When R2 is 0.01, the minimum, maximum, and average speedup ratios are 8.25, 13.00, and 10.39; when R2 is 0.02, they are 11.00, 13.50, and 11.60; and when R2 is 0.04, they are 10.67, 11.33, and 11.00. The speedup ratios of individual loading steps are therefore higher than the overall speedup ratios in Table 4, reaching a maximum of 14.50.

5.3.2. Simplified Underground Gas Storage Model

Case 2 is a simplified underground gas storage model, whose geometry and mesh are shown in Figure 13. From top to bottom, the lithology of the model is mudstone, salt rock, and mudstone, and the calculation parameters are listed in Table 5. The following test schemes are designed: (1) the width of the calculation model is set to 40 m, 80 m, and 120 m (Models 1 to 3); (2) for each model, R2 is set to 0.01, 0.02, and 0.04. The mesh size is the same for all three models, and the model details are listed in Table 6.

Table 7 lists the solving time and speedup ratio of the different test schemes. The results indicate the following: (1) the average speedup ratios over the three R2 values are 7.41, 7.53, and 7.78 for Models 1, 2, and 3, respectively; (2) for each model, the speedup ratio decreases gradually as R2 increases; (3) for the same R2, the speedup ratio tends to increase with the amount of calculation. These results show that the OpenMP parallel method effectively improves the computational efficiency of the simulation program, with average speedup ratios of up to 7.78 and a maximum speedup ratio of 8.11 (Model 3, R2 = 0.01).

To examine this further, the solving times of the last 10 loading steps before and after parallelization are compared for R2 = 0.01 (Figure 14). For Model 1, the minimum, maximum, and average speedup ratios are 8.00, 11.00, and 9.05, respectively; for Model 2, they are 8.50, 11.33, and 9.88; and for Model 3, they are 8.56, 12.57, and 9.73. As in Case 1, the speedup ratios of individual loading steps are higher than the overall speedup ratios, reaching a maximum of 12.57.

6. Discussion and Future Work

To improve the performance of the SPIM simulation program, this paper adopts the line search method and the adaptive sub-step method to improve convergence and stability, and carries out an OpenMP parallel design of the program. The test results show that the program has asymptotic quadratic convergence and satisfactory stability. In addition, the speedup ratio of the OpenMP parallel program reached 14.50 for a single load step on a computing platform with 20 hardware threads.

However, the calculation efficiency of the SPIM simulation program still needs to be improved. As described in Section 4, two modules, assembling the stiffness matrix and solving the strain and stress, have been parallelized. Because of the data competition in assembling the stiffness matrix, this paper adopts a sleep-waiting lock, which ensures that at most one thread can enter the data competition region at any time. However, threads waiting on the lock leave computing resources idle. To avoid this, researchers usually partition the background mesh with multi-color dyeing (graph-coloring) algorithms, so that elements of the same color share no nodes and can be processed by different threads simultaneously [42,43]. In FEM, the nodes required for Gaussian point interpolation are limited to the vertices of the element containing the Gaussian point. In the face-based SPIM, by contrast, each interior smoothing domain is constructed from adjacent mesh elements, the nodes associated with a smoothing domain may belong to multiple mesh elements, and the interpolation nodes therefore cross background element boundaries, which complicates the coloring process. Thus, designing a multi-color dyeing algorithm suited to the face-based SPIM is necessary.
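A minimal sketch of greedy multi-color partitioning, assuming each smoothing domain is given simply as the list of nodes in its support (however those nodes are gathered from neighboring elements): two domains conflict if they share a node, and each domain takes the smallest color not already used by a conflicting domain. Domains of one color can then be assembled in parallel without a lock (in C/C++, one `omp parallel for` loop per color), processing the colors one after another. This is a generic greedy coloring, not the algorithm of [42,43]; because SPIM supports are larger than element vertex sets, more domains conflict, so more colors and less parallelism per color can be expected.

```python
from typing import Dict, List, Sequence


def color_domains(domains: Sequence[Sequence[int]]) -> List[int]:
    """Greedy multi-color partition of smoothing domains.

    domains[d] lists the support nodes of smoothing domain d. Two domains that
    share a node get different colors, so all domains of one color write to
    disjoint entries of the global stiffness matrix.
    """
    node_to_domains: Dict[int, List[int]] = {}
    for d, nodes in enumerate(domains):
        for n in nodes:
            node_to_domains.setdefault(n, []).append(d)

    colors = [-1] * len(domains)
    for d, nodes in enumerate(domains):
        used = {colors[o] for n in nodes for o in node_to_domains[n] if colors[o] >= 0}
        c = 0
        while c in used:              # smallest color not used by a conflicting domain
            c += 1
        colors[d] = c
    return colors


def is_conflict_free(domains: Sequence[Sequence[int]], colors: Sequence[int]) -> bool:
    """Check that no node is shared by two different domains of the same color."""
    owner: Dict[tuple, int] = {}
    for d, nodes in enumerate(domains):
        for n in set(nodes):
            key = (colors[d], n)
            if owner.get(key, d) != d:
                return False
            owner[key] = d
    return True
```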

Another option is to parallelize the SPIM simulation program on the GPU. GPU parallel design has already been applied successfully to geotechnical problems to achieve high-performance computing. For example, Chen et al. accelerated an SPH simulation of granular flow on the GPU, achieving a computational efficiency 160 times that of a single CPU [44]. Dong et al. proposed a GPU-based parallelization strategy for the MPM; when simulating the interaction between a structural element and soil, the GPU speedup ratio reached a maximum of 30 in single precision, compared with 20 in double precision [45]. There are also many studies on GPU implementations of the FEM [26,27,28,29,46]. However, GPU parallelization of the SPIM has not yet been reported, which makes it a direction worthy of study.

7. Conclusions

To improve the performance of the SPIM simulation program, the line search algorithm, the adaptive sub-step method, and the OpenMP parallel method are adopted to enhance convergence, stability, and computational efficiency. Three cases are used for verification, with the following results: (1) In the slope stability analysis, the safety factor calculated by the SPIM program is 0.42, which is close to the value of 0.43 calculated by the Bishop method, indicating the correctness of the program. (2) The SPIM program effectively resolves non-convergence and slow convergence in the local iteration. For the global iteration, the program achieves asymptotic quadratic convergence and satisfactory stability, even when the slope is in a critical state. (3) For the two large-scale models, the SPIM program achieves speedup ratios of 6 to 8 on a computing platform with 20 hardware threads, and the maximum speedup ratio for a single load step reaches 14.50.

However, the calculation efficiency of the SPIM program still needs to be improved. For example, a multi-color dyeing algorithm suitable for face-based SPIM should be established to avoid the data competition problem encountered in assembling the stiffness matrix. Moreover, the GPU parallelization analysis and design for the SPIM program require further research.

Author Contributions

Conceptualization, J.Q. and N.X.; Methodology, J.Q.; Software, T.Y.; Validation, T.Y., J.Q., and N.X.; Formal analysis, T.Y.; Investigation, T.Y. and G.M.; Resources, N.X.; Data curation, T.Y.; Writing—original draft, T.Y.; Writing—review and editing, J.Q., N.X., and Y.Q.; Visualization, T.Y.; Supervision, N.X.; Project administration, J.Q. and Y.Q.; Funding acquisition, N.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China (2023YFB4005500) and the Natural Science Foundation of China (42407243).

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

The authors would like to thank the editor and the reviewers for their contributions.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

SPIM	Smoothed Point Interpolation Method
FEM	Finite Element Method
OpenMP	Open Multi-Processing
CPU	Central Processing Unit
GPU	Graphics Processing Unit
CUDA	Compute Unified Device Architecture
PARDISO	Parallel Direct Sparse Solver
MC	Mohr–Coulomb
CPPM	Closest Point Projection Method
GPM	Generalized Smoothing Galerkin Weak Form
MaxUF	Maximum Unbalanced Force
MaxDU	Maximum Displacement Increment

References

[1] Liu, G. AG space theory and a weakened weak (W2) form for a unified formulation of compatible and incompatible methods: Part I: Theory. Int. J. Numer. Methods Eng. 2010, 81, 1093–1126.
[2] Zhou, L.; Ren, S.; Meng, G.; Ma, Z. Node-based smoothed radial point interpolation method for electromagnetic-thermal coupled analysis. Appl. Math. Model. 2020, 78, 841–862.
[3] Tootoonchi, A.; Khoshghalb, A.; Liu, G.R.; Khalili, N. A cell-based smoothed point interpolation method for flow-deformation analysis of saturated porous media. Comput. Geotech. 2016, 75, 159–173.
[4] Liu, G. AG space theory and a weakened weak (W2) form for a unified formulation of compatible and incompatible methods: Part II: Applications to solid mechanics problems. Int. J. Numer. Methods Eng. 2010, 81, 1127–1156.
[5] Feng, S.; Cui, X.; Li, A.; Xie, G. A face-based smoothed point interpolation method (FS-PIM) for analysis of nonlinear heat conduction in multi-material bodies. Int. J. Therm. Sci. 2016, 100, 430–437.
[6] Khoshghalb, A.; Shafee, A.; Tootoonchi, A.; Ghaffaripour, O.; Jazaeri, S. Application of the smoothed point interpolation methods in computational geomechanics: A comparative study. Comput. Geotech. 2020, 126, 103714.
[7] Khoshghalb, A.; Shafee, A. Does the upper bound solution property of the Node-based Smoothed Point Interpolation Methods (NSPIMs) hold true in coupled flow-deformation problems of porous media? Comput. Geotech. 2021, 133, 104016.
[8] El Yazidi, Y.; Charkaoui, A.; Zeng, S. Finite element solutions for variable exponents double phase problems. Numer. Algorithms 2025.
[9] Chen, J.; Wu, C.; Yoon, S.; You, Y. A stabilized conforming nodal integration for Galerkin mesh-free methods. Int. J. Numer. Methods Eng. 2001, 50, 435–466.
[10] Liu, G. A generalized gradient smoothing technique and the smoothed bilinear form for Galerkin formulation of a wide class of computational methods. Int. J. Comput. Methods 2008, 5, 199–236.
[11] Liu, G.; Zhang, G. Edge-based smoothed point interpolation methods. Int. J. Comput. Methods 2008, 5, 621–646.
[12] Feng, S.; Cui, X.; Chen, F.; Liu, S.; Meng, D. An edge/face-based smoothed radial point interpolation method for static analysis of structures. Eng. Anal. Bound. Elem. 2016, 68, 1–10.
[13] Qin, J.; Xu, N.; Mei, G. Designing an efficient smoothed point interpolation method for modeling slope deformation. Eng. Comput. 2023, 40, 1175–1194.
[14] Qin, J.; Xu, N.; Mei, G. Comparative study of face-based smoothed point interpolation method and linear finite element method for elastoplastic and large deformation problems in geomaterials. Eng. Anal. Bound. Elem. 2024, 169, 105969.
[15] Chow, Y.; Kay, S. On the Aitken acceleration method for nonlinear problems. Comput. Struct. 1984, 19, 757–761.
[16] Kim, N.H. Introduction to Nonlinear Finite Element Analysis; Springer: New York, NY, USA, 2014.
[17] Simo, J.; Hughes, T. Computational Inelasticity; Springer: New York, NY, USA, 2013.
[18] Huang, J.; Griffiths, D. Return Mapping Algorithms and Stress Predictors for Failure Analysis in Geomechanics. J. Eng. Mech. 2009, 135, 276–284.
[19] Perez-Foguet, A.; Rodriguez-Ferran, A.; Huerta, A. Consistent tangent matrices for substepping schemes. Comput. Methods Appl. Mech. Eng. 2001, 190, 4627–4647.
[20] Wu, C. A multicolour SOR method for the finite-element method. J. Comput. Appl. Math. 1990, 30, 283–294.
[21] Pantalé, O. Parallelization of an object-oriented FEM dynamics code: Influence of the strategies on the Speedup. Adv. Eng. Softw. 2005, 36, 361–373.
[22] Oh, S.E.; Hong, J.W. Parallelization of a finite element Fortran code using OpenMP library. Adv. Eng. Softw. 2017, 104, 28–37.
[23] Jarzebski, P.; Wisniewski, K.; Taylor, R. On parallelization of the loop over elements in FEAP. Comput. Mech. 2015, 56, 77–86.
[24] Rao, A. MPI-based parallel finite element approaches for implicit nonlinear dynamic analysis employing sparse PCG solvers. Adv. Eng. Softw. 2005, 36, 181–198.
[25] Guo, X.; Lange, M.; Gorman, G.; Mitchell, L.; Weiland, M. Developing a scalable hybrid MPI/OpenMP unstructured finite element model. Comput. Fluids 2015, 110, 227–234.
[26] Komatitsch, D.; Michéa, D.; Erlebacher, G. Porting a high-order finite-element earthquake modeling application to NVIDIA graphics cards using CUDA. J. Parallel Distrib. Comput. 2009, 69, 451–460.
[27] Cecka, C.; Lew, A.J.; Darve, E. Assembly of finite element methods on graphics processors. Int. J. Numer. Methods Eng. 2011, 85, 640–669.
[28] Dziekonski, A.; Sypek, P.; Lamecki, A.; Mrozowski, M. Generation of large finite-element matrices on multiple graphics processors. Int. J. Numer. Methods Eng. 2013, 94, 204–220.
[29] Fu, Z.; Lewis, T.J.; Kirby, R.M.; Whitaker, R.T. Architecting the finite element method pipeline for the GPU. J. Comput. Appl. Math. 2014, 257, 195–211.
[30] Nahar, D.; Phaye, M.D.; Garg, R.; George, A.S.; K, C. A Comparative Analysis of OpenMP and Cuda in Real-Time Applications. In Proceedings of the 2025 8th International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 23–25 July 2025; pp. 1127–1132.
[31] Liu, G.; Gu, Y. An Introduction to Meshfree Methods and Their Programming; Springer: Berlin/Heidelberg, Germany, 2015.
[32] Li, Y.; Yue, J.; Niu, R.; Liu, G. Automatic mesh generation for 3D smoothed finite element method (S-FEM) based on the weaken-weak formulation. Adv. Eng. Softw. 2016, 99, 111–120.
[33] Chen, L.; Nguyen-Xuan, H.; Nguyen-Thoi, T.; Zeng, K.; Wu, S. Assessment of smoothed point interpolation methods for elastic mechanics. Int. J. Numer. Methods Biomed. Eng. 2010, 26, 1635–1655.
[34] Jeremić, B. Line search techniques for elasto-plastic finite element computations in geomechanics. Commun. Numer. Methods Eng. 2001, 17, 115–125.
[35] Dodds, R.H. Numerical techniques for plasticity computations in finite element analysis. Comput. Struct. 1987, 26, 767–779.
[36] Prudente, L. A Wolfe Line Search Algorithm for Vector Optimization. ACM Trans. Math. Softw. 2019, 45, 1–23.
[37] Shi, Z.-J.; Shen, J. Memory gradient method with Goldstein line search. Comput. Math. Appl. 2007, 53, 28–40.
[38] Chandra, R.; Menon, R.; Dagum, L.; Kohr, D.; Maydan, D.; McDonald, J. Parallel Programming in OpenMP; Morgan Kaufmann: Burlington, MA, USA, 2000.
[39] Liu, G.; Karamanlidis, D. Meshfree methods: Moving beyond the finite element method. Appl. Mech. Rev. 2003, 56, B17–B18.
[40] Abbo, A.; Sloan, S. A smooth hyperbolic approximation to the Mohr-Coulomb yield criterion. Comput. Struct. 1995, 54, 427–441.
[41] Liu, J.; Luan, M.; Zhao, S.; Yuan, F.; Wang, J. Discussion on criteria for evaluating stability of slope in elastoplastic FEM based on shear strength reduction technique. Rock Soil Mech. 2005, 26, 1345–1348. (In Chinese)
[42] Zhao, H.; Lin, H.; Dong, Z. Composing in parallel the stiffness matrixes of FEM with the theory of multi-color dyeing. Mech. Eng. 2005, 27, 1.
[43] Krysl, P. Parallel assembly of finite element matrices on multicore computers. Comput. Methods Appl. Mech. Eng. 2024, 428, 117076.
[44] Chen, J.-Y.; Lien, F.-S.; Peng, C.; Yee, E. GPU-accelerated smoothed particle hydrodynamics modeling of granular flow. Powder Technol. 2020, 359, 94–106.
[45] Dong, Y.; Wang, D.; Randolph, M.F. A GPU parallel computing strategy for the material point method. Comput. Geotech. 2015, 66, 31–38.
[46] Kiran, U.; Sharma, D.; Gautam, S. A GPU-based framework for finite element analysis of elastoplastic problems. Computing 2023, 105, 1673–1696.

Figure 1. Illustration of elastic prediction and plastic correction.

Figure 2. Illustration of backtracking algorithm.

Figure 3. The flowchart of the SPIM program.

Figure 4. Geometry and mesh model of the heterogeneous slope.

Figure 5. Displacement change curve of feature points: (a) Z-displacement of Point A; (b) X-displacement of Point B.

Figure 6. Contour of the equivalent plastic zone under different reduction factors: (a) 0.42, (b) 0.43, (c) 0.44.

Figure 7. Convergence at different R2 when the reduction factor is 0.42: (a) R2 = 0.04; (b) R2 = 0.08; (c) R2 = 0.1; (d) R2 = 0.2.

Figure 8. Convergence at different R2 when the reduction factor is 0.54: (a) R2 = 0.01; (b) R2 = 0.02; (c) R2 = 0.04; (d) R2 = 0.08.

Figure 9. Convergence of the local iteration: (a) Integration Point 1; (b) Integration Point 2.

Figure 10. Geometry and mesh model of a real slope model.

Figure 11. Solving time and speedup ratio of the last 10 loading steps when the reduction factor is 1.00: (a) R2 is 0.01, (b) R2 is 0.02, (c) R2 is 0.04.

Figure 12. Solving time and speedup ratio of the last 10 loading steps when the reduction factor is 1.30: (a) R2 is 0.01, (b) R2 is 0.02, (c) R2 is 0.04.

Figure 13. Geometry and mesh model of a simplified underground gas storage model.

Figure 14. Solving time and speedup ratio of the last 10 loading steps for Model 1∼Model 3 when R2 is 0.01: (a) Model 1, (b) Model 2, (c) Model 3.

Table 1. Calculation parameters of simplified slope model.

Layers	Elasticity Modulus (E/MPa)	Poisson Ratio ( $ν$ )	Cohesion (c/kPa)	Friction Angle ( $φ /^{\circ}$ )	Unit Weight ( $γ$ /kN· $m^{- 3}$ )
1	100	0.3	18.8	12	29.4
2	100	0.3	18.8	5	9.8
3	100	0.3	18.8	40	29.4

Table 2. Changes in the absolute value of the MaxUF and the MaxDU within the last increment step.

Number of Iterations	MaxUF (N), R2 = 0.1	MaxDU (m), R2 = 0.1	MaxUF (N), R2 = 0.2	MaxDU (m), R2 = 0.2
1	23,349.9	0.0162963	46,700.4	0.0303808
2	17,573.8	0.00210613	31,920.4	0.00646306
3	5202.2	0.000281799	9773.64	0.0017516
4	160.466	7.73369 $\times 10^{- 6}$	1148.43	0.000143174
5	1.90035	8.08061 $\times 10^{- 8}$	40.5877	2.45711 $\times 10^{- 6}$
6	0.0302254	1.1964 $\times 10^{- 9}$	1.30748	2.93889 $\times 10^{- 8}$
7	0.000481839	1.64635 $\times 10^{- 11}$	0.0219553	4.96496 $\times 10^{- 10}$
8	7.54453 $\times 10^{- 6}$	2.29069 $\times 10^{- 13}$	0.000471364	7.30844 $\times 10^{- 12}$
9	1.10085 $\times 10^{- 7}$	3.19883 $\times 10^{- 15}$	8.17335 $\times 10^{- 6}$	1.17029 $\times 10^{- 13}$
10	-	-	1.50496 $\times 10^{- 7}$	1.75795 $\times 10^{- 15}$

Table 3. Calculation parameters of a real slope model.

Layers	Elasticity Modulus (E/MPa)	Poisson Ratio ( $ν$ )	Cohesion (c/kPa)	Friction Angle ( $φ /^{\circ}$ )	Unit Weight ( $γ$ /kN· $m^{- 3}$ )
1	3928.8	0.16929	140	33.0	24.0
2	4719.2	0.16812	168	39.6	25.0
3	5679.7	0.17350	2010	47.5	27.0

Table 4. Solving time and speedup ratio of different test schemes.

Reduction Factor	R2	Unaccelerated (Serial)	Accelerated (OpenMP)	Speedup Ratio
1.00	0.01	1456 s	189 s	7.70
1.00	0.02	821 s	115 s	7.14
1.00	0.04	477 s	79 s	6.04
1.30	0.01	2132 s	264 s	8.08
1.30	0.02	1175 s	151 s	7.78
1.30	0.04	640 s	99 s	6.46

Table 5. Calculation parameters of a simplified underground gas storage model.

Layers	Elasticity Modulus (E/MPa)	Poisson Ratio ( $ν$ )	Cohesion (c/kPa)	Friction Angle ( $φ /^{\circ}$ )	Unit Weight ( $γ$ /kN· $m^{- 3}$ )
1	10,000.2	0.27	1000	35.0	26.5
2	17,584.6	0.27	1000	30.0	23.0
3	10,000.2	0.27	1000	35.0	26.5

Table 6. Details of calculation models for a simplified underground gas storage model.

Model	Number of Nodes	Number of Tetrahedral Elements	Number of Smoothing Domains
Model 1	6913	26,695	58,038
Model 2	9734	42,787	90,638
Model 3	14,546	67,349	141,186

Table 7. Solving time and speedup ratio of different underground gas storage models.

Model	R2	Unaccelerated (Serial)	Accelerated (OpenMP)	Speedup Ratio
Model 1	0.01	1599 s	200 s	7.80
Model 1	0.02	902 s	122 s	7.39
Model 1	0.04	514 s	73 s	7.04
Model 2	0.01	2786 s	355 s	7.85
Model 2	0.02	2037 s	264 s	7.72
Model 2	0.04	913 s	130 s	7.02
Model 3	0.01	6741 s	831 s	8.11
Model 3	0.02	6130 s	786 s	7.78
Model 3	0.04	3076 s	413 s	7.45

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Yang, T.; Qin, J.; Xu, N.; Mei, G.; Qin, Y. Numerical Evaluation of Stable and OpenMP Parallel Face-Based Smoothed Point Interpolation Method for Geomechanical Problems. Mathematics 2026, 14, 7. https://doi.org/10.3390/math14010007
