Methods for Solving Finite Element Mesh-Dependency Problems in Geotechnical Engineering—A Review

: The instabilities of soil specimens in laboratory or soil made geotechnical structures in ﬁeld are always numerically simulated by the classical continuum mechanics-based constitutive models with ﬁnite element method. However, ﬁnite element mesh dependency problems are inevitably encountered when the strain localized failure occurs especially in the post-bifurcation regime. In this paper, an attempt is made to summarize several main numerical regularization techniques used in alleviating the mesh dependency problems, i.e., viscosity theory, nonlocal theory, high-order gradient and micropolar theory. Their fundamentals as well as the advantages and limitations are presented, based on which the combinations of two or more regularization techniques are also suggested. For all the regularization techniques, at least one implicit or explicit parameter with length scale is necessary to preserve the ellipticity of the partial differential governing equations. It is worth noting that, however, the physical meanings and their relations between the length parameters in different regularization techniques are still an open question, and need to be further studied. Therefore, the micropolar theory or its combinations with other numerical methods are promising in the future.


Introduction
Natural and artificial geotechnical structures play an essential role in our lives. Granular soils, whether as the main construction materials or the foundation of geotechnical structures, determine, to an extent, their failure mechanisms. Many disasters that affect our lives are linked to geotechnical failures, such as landslides, instabilities of side slope of high embankments or dams, collapse of excavated tunnel surfaces, and uneven displacements of buildings and roads. Most geotechnical hazards can be identified as examples of progressive failure caused by the occurrence and development of severe strain localization [1][2][3][4][5][6][7]. Accordingly, to numerically reproduce the failure phenomena, such as strain localizations among others, has long been an important and extensively researched topic.
The conventional limit equilibrium method is a very old means of stability analysis. It was initially performed by Hultin and Pettersson in 1916 [8]. A safety factor Fs should be estimated to judge the stability of slopes, as is still widely used in geotechnical engineering nowadays. For example, based on the model tests results for a retaining wall, conducted serially in Cambridge from 1962 to 1974, Leśniewska [9] systematically studied shear band patterns, formations, and mechanism using the limit equilibrium method. However, doing so requires so many assumptions that only very experienced engineers can make a reasonable prediction. In short, the limit equilibrium method is no more an accurate method than any other for analyzing structural failure [10].
The rapid progression of computing technology has greatly increased the efficiency and accuracy of calculations in geotechnical engineering. To achieve solutions, numerical difficulties of DEM for modeling a real scale structure (e.g., dam, slope, tunnel, foundation, etc.) containing large numbers of particles. By contrast, FEM is more efficient and less expensive when modeling large scale geostructures, which are just needed to be divided by fine or coarse mesh. As the mesh-free particle method, SPH and MPM are accurate and promising to simulate the high nonlinear and large deformation problems [28][29][30]. However, it is unavoidable to cause the increase in cost. Indeed, the mesh-free methods have overcome many problems encountered with mesh based method, such as the mesh dependency problems of FEM. But they are more complex and time consuming than the conventional FEM to simulate the boundary value problems in geotechnical engineering. Therefore, FEM is still a good choice only after dealing with the mesh dependency problems faced with the strain localization problems.
To intuitively compare the some important widely used numerical techniques, their main advantages and limitations are summarized in Table 1. Frankly, it is unable to cover all the numerical methods as the new and advanced numerical methods have been being consecutively developed by researchers in the world.

FDM
It is a versatile numerical method, and especially simple to be implemented and perform well for simple geometries.
Compared with FEM, FDM is not adaptable to higher-dimensional and irregular geometries.

FEM
With high computing efficiency; It is versatile for many problems of engineering and mathematical physics; Being able to reproduce the failures of specimens or model tests in laboratory as well as the large geotechnical structures in field.
Pathological or non-unique solutions in post-failure regime for stain softening problems; Cracks can only propagate along the element rather than natural path; Severe mesh dependent problems in strain localized regions; FVM It is relatively simple to be implemented; FVM is a natural choice for fluid dynamics problems, because the conservation of mass, momentum and energy is always sustained; It only needs to do flux evaluation for the cell boundaries.
Even the corner of interest can be accurately examined by refining the mesh of the corner, but the higher-order elements cannot be solved using FVM.

BEM
Compared with FEM, there is no need for discretizing the domain under consideration into the elements besides the boundary; It saves the meshing efforts, and the system matrices are smaller.
Matrices are fully populated, with complex and frequency-dependent coefficients, which deteriorate the efficiency of the solution; Singularities may arise in the solutions

X-FEM
Powerful for discontinuous problems in mechanics, such as: crack growth, complex fluid, interface and so on; Allows simulation of initiation and propagation of a crack along an arbitrary path without the requirement of remeshing.
It needs to add the specific functions into FEM, e.g., the Heaviside to describe the discontinuities; Time stepping needs to be small enough to capture crack propagation; Hard to localize the initial fracture.
DDA DDA method has an important advantage of fast convergence with unconditional numerical stability compared to the DEM; It has several strengths for stability problems in jointed rock masses.
DDA can only be used to deal with the problems about discontinuous deformation or discontinuous motion; It has serious limitations for larger scale and faster moving problems.

FCM (NMM)
FCM is more suitable for modeling intersecting cracks; It is able to predict the entire failure process from crack propagation to sliding; It can well capture both the continuous and discontinuous problems in a unified framework; The switch from continuum to discontinuum is not automatically handled but it needs parameters to control the computation to change an equilibrium quasistatic system to a kinematic system;

DEM
Micromechanics and interactions between grains at particle-scale can be closely traced and studied; It is well understood from a point of view of physical characteristics.
High computing cost and high demand for the hardware equipment; It seems impossible to model the failure of large geotechnical structures containing countless particles.

PD
The crack path in PD is a natural outcome of the simulation without requiring to describe crack topology; Compared with X-FEM, meshless methods or other partition of unity methods, it does not require techniques to smooth the normals of the crack surfaces in order to avoid erratic crack paths.
For conventional PD, it is unable to model continuous problems; The radii of horizons are required to be constant, otherwise spurious wave reflections shall emerge and ghost forces between material points will deteriorate the results.

SPH
Capable of simulating high nonlinear and large deformation, such as slope slide, large settlement and complex large-deformed free surface flows, etc.; More efficient and smoother than MPM for large-scale free surface flows.
High computing cost and high demand for the hardware equipment. The explicit solver requires a small amount of time increment.

MPM
No tensile instability that is annoying in SPH; The formulations and boundary conditions are as simple as FEM; Capable of simulating high nonlinear and large deformation, such as slope slide, large settlement, as well as fluid flow, etc.; Faster convergence speed than SPH.
High computing cost and time-consuming, and high demand for the hardware equipment; Compared with SPH, the particles on fluid free surface are relatively rough and messy.
Besides the widely used classical models, e.g., Mises model, Mohr-Coulomb model, Drucker-Prager model, Duncan-Chang model and Cam-Clay model etc. Nowadays, more and more constitutive models have been proposed to specially describe the behavior of soils, and some important models [31][32][33][34][35][36] are listed as examples in Table 2. Table 2. Constitutive models and geotechnical issues.

Constitutive Models Geotechnical Issues
Nor-Sand model Based on the second axiom of critical state theory, sand behaviors such as the attribute of normality and realistic dilation rates are obtained in the model. Nor-sand model has been used to model plane strain element tests, triaxial tests, cyclic shear tests, static liquefaction and post-liquefaction strength etc. Geotechnical structures, such as passive walls have also been modelled by the model [37].
Hypoplastic model The Hypoplastic model, with relatively simple formulations, can consider the critical state of granular materials and it is even simpler than MCC for simulating clay. The hypoplastic model has been used to describe the mechanics behaviors of overconsolidated clays in laboratory and the strain softening behavior of soil in practice. For instance, it has been used in a hydro-mechanical coupled analysis for the long-term behavior of rainfall induced landslide [38,39].

Severn-Trent sand model
Severn-Trent sand model, considering bonding surface and kinematic-hardening, can describe well the sand behavior. For example, it can reproduce well the compression triaxial tests and extension triaxial tests of Toyoura sand. The model has also been successfully used to model the cantilevered wall-supported excavations [40,41].

Unified hardening model
Based on and developed from MCC, the proposed unified constitutive model can be used to describe well both clay and sand behaviors with hardening parameter independent on stress path. The unified mode has been widely used in geotechnical engineering. It has been developed to describe the stress dilation and stress contraction, hardening and softening, overconsolidation, stress induced anisotropy, inherited anisotropy, creep, asymptotic state, rotational hardening, crushing and other characteristics of soils [42].

Simsand model
The shear contraction and strain hardening of loose sand, and shear dilatancy and strain softening of dense sand can be modelled by this simple critical-state based model. It has been used to model the behaviors of Hostun sand and Toyoura sand in laboratory as well as failures of passive wall [36,43]. It should be noted that all these models are based on classical continuum mechanics theory. Although the interested regions have been finely divided to obtain more reasonable solutions, numerical and analytical solutions for strain localization using classical continuum mechanics-based models are known to suffer from serious mesh dependent problems. The pathological solutions are caused by the loss of ellipticity (hyperbolicity) of governing field equations for static (dynamic) problems. Accordingly, non-localized methods are needed to rectify the mesh dependency problems of conventional FEM. The regularization techniques introduced in this paper are to remove or relieve the spurious mesh dependency in numerical simulation of strain localization phenomena. In most cases, the regularization techniques alleviate mesh dependency problems in simulating strain localization phenomena by incorporating at least one implicit or explicit intrinsic parameter with length scale. The length scales incorporated in the models, usually characterizing the microstructures of material, manage to define the width of the strain-localized regions. The main regularization methods include viscosity technique, nonlocal technique, gradient technique, and micropolar technique, etc. However, the advantages and disadvantages of these important regularization approaches are rarely systematically summarized and compared in the past. Accordingly, a comprehensive review of these regularization approaches is necessary for a deep understanding of their differences and for the selection of the appropriate regularization method in different cases.
This paper comprehensively summarizes the typical regularization approaches in dealing with mesh dependency in numerical finite element analysis, mainly including viscosity theory, nonlocal theory, high-gradient theory, and micropolar theory. Moreover, the advantages and disadvantages of each regularization technique are systematically introduced and compared.

Viscosity Theory
Viscosity regularization approach is used to describe different responses at some time by means of introduced viscosity. In fact, soils and granular materials demonstrate obvious rate-dependent behaviours, which can certainly be explained by viscosity. For example, strain rate within the shear band is larger than that outside it, and when the difference is obvious enough, shear bands will occur. The viscosity can control the different and accordingly the localized region. Needleman [44] argued that even without clear internal parameters for the dimension of length in the classical viscoplastic model, rate-dependent constitutive models implicitly introduce a length scale into the governing equations, at which the incremental equilibrium equations for quasistatic problems remain elliptic and wave speeds for dynamic problems remain real, even in the presence of strain softening (one main reason for strain localization, among others). Then the pathological mesh sensitivity associated with numerical solutions of localization problems for rate-independent solids is eliminated. In this way, introducing the viscosity into the elastoplastic model with strain softening behaviour is able, in some degree, to reduce the mesh dependency of finite element solutions. It is thus not surprising that the fluid in saturated granular model can greatly affect the degree of mesh sensitivity [45].
Viscoplasticity is a theory in continuum mechanics that describes the rate-dependent inelastic behaviour of solids. Herein, take a one-dimensional viscoplastic model for instance as shown in Figure 1. The elastic response of viscoplastic materials can be represented by a Hookean spring element. Rate-dependence can be represented by a nonlinear dashpot element in a manner similar to viscoelasticity. Plasticity can be accounted for by adding a sliding frictional element. Their roles playing in the viscoplastic model are described by the formulae in Figure 1.
inelastic behaviour of solids. Herein, take a one-dimesionalviscoplastic model for instance as shown in Figure 1. The elastic response of viscoplastic materials can be represented by a Hookean spring element. Rate-dependence can be represented by a nonlinear dashpot element in a manner similar to viscoelasticity. Plasticity can be accounted for by adding a sliding frictional element. Their roles playing in the viscoplastic model are described by the formulae in Figure 1. Overstress theory was firstly adopted in the models of Perzyna [46,47] or Duvaut & Lions [48]. In these models, the stress is allowed to increase beyond the rate-independent yield surface upon application of a load and then allowed to relax back to the yield surface over time. The yield surface is usually assumed to be rate-independent in such models. Rate-dependency was initially introduced to describe mesh sensitivity for localization problems in metal, as by Needleman [44], Shawki and Clifton [49], and Wu and Freund [50]. Later, it was applied to deal with the instability and strain localization phenomena of saturated porous media [51], of concrete and rock fractures [52][53][54], and of dilatancy materials and clay [55][56][57]. Building on the work of Sluys and de Borst [53,54], Wang et al. [58] introduced a consistency viscoplastic model in which the viscosity was implemented by means of a rate-dependent yield surface, which was proven to have a faster global convergence than the overstress viscoplastic models. Based on viscoplastic models proposed by Perzyna [46,47] and Duvaut and Lions [48], Dias [59] also proposed a simple Figure 1. Illustration of viscoplastic model: (a) spring element; (b) dashpot element; (c) sliding frictional element.
Overstress theory was firstly adopted in the models of Perzyna [46,47] or Duvaut & Lions [48]. In these models, the stress is allowed to increase beyond the rate-independent yield surface upon application of a load and then allowed to relax back to the yield surface over time. The yield surface is usually assumed to be rate-independent in such models. Rate-dependency was initially introduced to describe mesh sensitivity for localization problems in metal, as by Needleman [44], Shawki and Clifton [49], and Wu and Freund [50]. Later, it was applied to deal with the instability and strain localization phenomena of saturated porous media [51], of concrete and rock fractures [52][53][54], and of dilatancy materials and clay [55][56][57]. Building on the work of Sluys and de Borst [53,54], Wang et al. [58] introduced a consistency viscoplastic model in which the viscosity was implemented by means of a rate-dependent yield surface, which was proven to have a faster global convergence than the overstress viscoplastic models. Based on viscoplastic models proposed by Perzyna [46,47] and Duvaut and Lions [48], Dias [59] also proposed a simple model for viscous regularization of elastoplastic constitutive laws with softening. This model, when tested in a problem with slip-driven softening (von Mises material) as well as in a problem with decohesion-driven softening (Cam-Clay model), exhibited its capability to regularize the solution.
With the regularization of viscosity, mesh dependency problems in strain localized regions are significantly alleviated, allowing shear band thickness to be predicted and specified. For example, in Equation (1) the localized strain rate distribution along a onedimensional bar with several elements was implicitly expressed by the internal length scale l in the consistency model proposed by Wang et al. [58], where β is a small constant that represents the cut off value of the relative strain rate at the edge of the shear band, G is the shear modulus and c g = √ G /ρ is the elastic shear wave speed, m is the viscosity parameter, and h is the strain softening parameter. Wang et al. [58] found that the smaller value of the internal length scale l and the imperfection size w determined the shear band thickness (i.e., L = min[l, w]). In their numerical examples, they observed that on mesh refinement, the shear band thickness converged to the material length scale l as defined in Equation (1). Clearly, the thickness of shear band would decrease when the viscosity m decreased or when the absolute value of softening parameter |h| increased (h is a negative value). If the imperfection size w was taken into consideration, it was observed, the imperfection size dominated the shear band thickness when it was smaller than material length scale (w < l). In contrast, if the imperfection size exceeded the material length scale, the influence of the imperfection would disappear, and the material length scale determined the shear band thickness.
The main advantage of viscosity regularization is that it does not need any additional global discretization, because it requires only supplementary operations at the local level in constitutive models, whose implementation in common nonlinear finite element packages is very simple. Furthermore, it works equally well for both the decohesion failure mechanism and the slip-driven softening failure mechanism. Its main disadvantage is the need to add an artificial feature, viscosity, to describe the material behaviour when it does not exhibit rate-dependency. Therefore its applicability is obviously limited to transient loading conditions, and the regularizing effect rapidly decreases for slow loading rates or when approaching the rate-independent limit.

Nonlocal Theory
Modern nonlocal elastic constitutive models of the integral type-using weighted spatial averages-first saw use in the 1960s, motivated by homogenization of the atomic theory of Bravais lattices. By means of nonlocal approaches, researchers managed to describe the damage and dislocation phenomena in crystals on a scale comparable to the range of interatomic forces. Using nonlocal models, researchers managed to reproduce the dispersion of short elastic waves and enhance descriptions of interactions between crystal defects such as vacancies, interstitial atoms, and dislocations or even reproduce the electromagnetic wave characteristics of composite media [60][61][62][63][64][65][66][67][68]. What's more, the bending, buckling and free vibration of nanobeams, nanowires, nonuniform carbon nanotube at nano-scale, have been studied via nonlocal models these years [69][70][71][72]. Plastic nonlocal models were first proposed as a way of describing the stress field at a fracture front [73][74][75]. However, Eringen's formulation did not mean to serve as a localization limiter, and the averaging operator was applied to the total strain tensor, which could lead to spurious instabilities. Later, nonlocal plasticity theory was improved and initially introduced to describe strain localization phenomena of softening materials by Bažant et al. [76,77]. After these initial attempts, a comprehensive number of relevant contributions rapidly emerged [78][79][80][81][82][83][84][85][86][87][88][89][90][91][92][93][94].
Nonlocal regularization has been proven to reduce mesh sensitivity when simulating the damage behavior of ductile materials with microdefects and strain localization phenomena caused by strain softening.
The derivation of a nonlocal approach starts from the choice of the variable to be enhanced by nonlocality. Typical choices are, among others, the regularization of variables related to kinematics (such as the strain tensor), the regularization of internal state variables (such as scalar measurements of the amount of plastic strain or damage) or the regularization of thermodynamic forces power-conjugated with internal state variables (for instance, the elastic energy release rate in damage models). Faced with this wide range of possibilities, deciding which term is more effective is difficult. Indeed, the choice of the nonlocal variable depends on the kind of material to be modelled, as well as on the nature of the problem to be solved. In the particular case of elastoplastic damaging ductile solids, internal degradation, which is closely related to the localization phenomena, is usually chosen as an internal nonlocalized variable. After the nonlocal variable is chosen, its nonlocal counterpart can be expressed, in an integral-type formulation, by means of the spatially weighted averaging integral. For example, the spatial average of the magnitude of plastic strain ε p at location x has been illustrated as Figure 2, and the formulations are presented as follows, Sustainability 2022, 14, 2982 The angle brackets < > denote the averaging operator, with ε p and <ε p >, the local and nonlocal internal variables, respectively. V is a finite volume of the body that is dictated by one constitutive parameter, generally called intrinsic length l with a dimension of length. Vr has approximately but not exactly the same meaning as the representative volume in the statistical theory of heterogeneous materials. α(x) is the weighting function that defines the averaging and s the general coordinated vector. Because numerical computations show much better convergence if the weighting function is smooth, the error density function (normal distribution function) has been suggested as the suitable form of the weighting function [76], The angle brackets < > denote the averaging operator, with ε p and <ε p >, the local and nonlocal internal variables, respectively. V is a finite volume of the body that is dictated by one constitutive parameter, generally called intrinsic length l with a dimension of length. V r has approximately but not exactly the same meaning as the representative volume in the statistical theory of heterogeneous materials. α(x) is the weighting function that defines the averaging and s the general coordinated vector. Because numerical computations show much better convergence if the weighting function is smooth, the error density function (normal distribution function) has been suggested as the suitable form of the weighting function [76], In which, for one, two and three dimensions 2D : |x| 2 = x 2 + y 2 , k = 2 3D : |x| 2 = x 2 + y 2 + z 2 , k = 6 √ π 1/3 (8) l is the characteristic length, a material property that defines the diameter of the representative volume (a line segment, circle, or sphere), and is determined pursuant to the condition that the representative volume has the same volume as the normal distribution function extending to infinity (x, y, z are the Cartesian coordinates). For numerical finite element computations, only those elements whose integration points are distributed in the domain of 2l around x need to be included in the sum using the Gauss integration method. For those elements outside the domain, the error density function α is negligible. As for the strain localization problems caused by softening, the nonlocal average should simply be applied to those variables controlling strain softening.
Nonlocal approaches work well for both types of failure mechanisms (mode I: decohesion; mode II: slip-driven). For total stress-strain relations (without decomposition into elastic and plastic parts) the nonlocal approach is computationally more efficient than the gradient models discussed in the next section. A definite disadvantage of current nonlocal formulations is that they are at odds with existing numerical strategies [95]. Gradient models, for their part, are much more amenable to an efficient numerical implementation by preserving their favorable property of containing an internal length scale [96]. Another disadvantage is that the consistency condition results in an integral-differential equation instead of an algebraic equation which can be solved locally.

High-Order Gradient Theory
Generally speaking, gradient models and nonlocal models belong to a common theoretical category, with the gradient model a particular nonlocal model. Gradient models can be derived from nonlocal models by expanding the kernel of the integral employed in the averaging procedure for the inelastic strains. The gradient theory has been widely used as a very effective tool for regularizing finite element solutions so as to study strain localization phenomena in geotechnical engineering. Gradient approach was first used for rigid plastic material to analyze persistent slip bands [97,98] and shear bands [99] in metals. Vardoulakis and Aifantis [100,101] used the second-order gradient theory in studying the heterogeneous deformation in the granular media. They modified a flow theory and the yield function by incorporating high-order gradient, and the incorporation of an appropriate length scale allowed them to capture the shear band thickness reasonably. To gain a better understanding of its application, the paper of Vardoulakis and Sulem [102] should be referred to. Ever since, many other researchers have also contributed greatly to this area. Chambon et al. [103] proposed a local second-order gradient model for dealing with localization phenomena; Borja [104] obtained a finite element solution for the shear band evolution using the deformation gradient to map between stress tensor; Voyiadjis and Song [105] used the gradient theory to capture strain localization when considering micro-interactions between granular grains. More than just the application of high-gradient approach in a single-phase solid, it was also used to study the strain localization of saturated porous media. Chikazawa et al. [106] used a gradient-dependent viscoplastic constitutive model to study the strain localization of water saturated soils and found strain localization to be highly dependent on strain gradient; Dorgan and Voyiadjis [107] used the second-order gradient theory in the kinematic hardening by introducing an internal length scale. They found the thermos-elastic Helmholtz free energy function to be dependent on those internal variables and their second-order gradients. Even so, the internal length has not yet clear physical meaning, being merely a mathematical method. Now, with a view to explain the mechanism of gradient continuum theory (secondorder generally suffices), we revisit the gradient plasticity formulations proposed by de Borst et al. [96,[108][109][110], in which they restricted the yield function to second-order derivatives so that the yield function was also dependent on the Laplacian of a hardening parameter in addition to the hardening parameter itself [96,[108][109][110][111], Compared with nonlocal theory, a distinct advantage of gradient plasticity is that the consistency condition yields a partial differential equation instead of an integral differential equation, where n T , h, and g are given by In which g is a positive gradient influence coefficient with the dimension of force [111]. For g = 0, the classical plastic flow theory is obtained. The enhanced gradient theory aims at preserving the well-posedness of the governing equations for materials that do not comply with the material stability requirement. When a softening relation between stresses and strains (h < 0) is assumed or when non-associated plastic flow is postulated as reproducing an experimental response in soil, the tangential stiffness matrix D ep becomes non-symmetric, leading to an inclination of instability. For strain softening materials (h < 0), the gradient term seen in Equation (13) can act as a stabilizer and guarantee ellipticity of the governing partial differential Equation (10) after the onset of plastic deformation. For example, in a one-dimensional problem (de Borst et al., 1993), the gradient influence coefficient g is expressed by a strain softening parameter and an internal length parameter l, For strain hardening materials, the Laplacian term with g > 0 is also demonstrably able to smooth the solution. Similar observations can also be obtained for the general cases of three-dimensional continua [96,112]. The width of localized zones in strain localization problems, as measured by the evolution of plastic strain, was estimated analytically by a constant w = 2πl in a one-dimensional localization problem as shown in Figure 3 (pure tension of a bar with length L; [111,112]). That is to say the thickness can be related to a length parameter but not the element size. Gradient plasticity theory has proven to be highly versatile for describing localiza of deformation in a continuum media and also to be computational with good efficie The regularization of the gradient approach is effective for both mode I (decohesion) mode II (frictional slip) failures. A disadvantage of the approach is the introduction o additional variable at global level. What's more, the determination of the gradient v bles is not an easy task. Importantly, the gradient terms disappear from the constitu equations if a homogeneous state of strain and stress is analyzed, and although grad terms are negligible if strains vary slowly in the pre-peak regime of strain softening p lems, they exert a significant influence in the presence of strain localization (in the p peak regime). Because higher-order continuum models have no effect for homogen deformations, additional parameters of high-order continuum models cannot be m ured directly from elementary tests such as uniaxial or triaxial tension or compres tests. However, a semi-inverse manner can be used by fitting the experimental resul different types of tests in the post-peak regime.

Micropolar Theory
Micropolar theory is one of the most important regularization approaches, which a more physical meaning than a wholly mathematical technique when compared other regularization approaches aforementioned (e.g., nonlocal and high-gradient). M researchers [43,96,108,113-126] have used micropolar theory as a regularization appr for analyzing strain localization phenomena, and it has played a positive role to allev or even solve mesh dependency problems by preserving the ellipticity or hyperbolici governing partial differential equations for boundary value problems. Gradient plasticity theory has proven to be highly versatile for describing localization of deformation in a continuum media and also to be computational with good efficiency. The regularization of the gradient approach is effective for both mode I (decohesion) and mode II (frictional slip) failures. A disadvantage of the approach is the introduction of an additional variable at global level. What's more, the determination of the gradient variables is not an easy task. Importantly, the gradient terms disappear from the constitutive equations if a homogeneous state of strain and stress is analyzed, and although gradient terms are negligible if strains vary slowly in the pre-peak regime of strain softening problems, they exert a significant influence in the presence of strain localization (in the post-peak regime). Because higher-order continuum models have no effect for homogeneous deformations, additional parameters of high-order continuum models cannot be measured directly from elementary tests such as uniaxial or triaxial tension or compression tests. However, a semi-inverse manner can be used by fitting the experimental results of different types of tests in the post-peak regime.

Micropolar Theory
Micropolar theory is one of the most important regularization approaches, which has a more physical meaning than a wholly mathematical technique when compared with other regularization approaches aforementioned (e.g., nonlocal and high-gradient). Many researchers [43,96,108,[113][114][115][116][117][118][119][120][121][122][123][124][125][126] have used micropolar theory as a regularization approach for analyzing strain localization phenomena, and it has played a positive role to alleviate or even solve mesh dependency problems by preserving the ellipticity or hyperbolicity of governing partial differential equations for boundary value problems. Micropolar theory (also called Cosserat theory) is a generalized classical continuum theory. According to Ristinmaa [127], one of the oldest theories belonging to this class of models is the centennial couple stress theory originally proposed by Voigt in 1887 and later developed by the Cosserat brothers [128], who removed the connection between the rotational field and the displacement gradients. Couple stress theory (constrained Cosserat theory) considers the possibility of body couples existing in the interior of the body and of surface couples existing on the surface of the body. Because of the relative complexity, however, it received little attention. Investigations into Cosserat theory saw an uptick in the early 1960s with the work of, notably, Mindlin [129] and Koiter [130]. Ever since, Cosserat theory has also been called micropolar theory, a terminology in vogue at that time that has also been adopted in this manuscript. Interests in the applications of micropolar theory began to increase in the mid-1970s when specialists in geotechnical engineering began to link micropolar kinematics and strain localization phenomenon. Finite element calculations using micropolar theory with independent rotations began with Mülhaus [119] and de Borst and Sluys [115]. After that, more and more micropolar constitutive models have been implemented and adopted to analyze the shear localization phenomena of other microstructural problems by FEM.
In classical continuum mechanics, the strain tensor can be decomposed into a symmetric part (the stretch tensor) and an antisymmetric part (the spin tensor) regardless of whether it is the Green-Lagrangian strain tensor or the Eulerian strain tensor. The classical spin tensor generally corresponds to the macro-rotation caused by differences in displacement gradients. However, in practical cases, the onset and evolution of shear bands is closely related to grain rotation as well as the translational deformations, which has also been confirmed by experimental results [131,132]. Unlike in classical continuum mechanics theory, which accounts for only macro-rotations, micropolar theory takes into account the independent micro-rotations of material points, as seen in Figure 4 (a plane element having four material points). For example, for a plane strain element, there are three degrees of freedom (two translational and one rotational). The micro rotations will cause the micro curvatures and the corresponding energetically-conjugated couple stresses in the micro element surfaces. Consequently, the theorem of conjugate shearing stress is no more satisfied as shown in Figure 5. Mülhaus [119] and de Borst and Sluys [115]. After that, more and more micropolar con stitutive models have been implemented and adopted to analyze the shear localization phenomena of other microstructural problems by FEM.
In classical continuum mechanics, the strain tensor can be decomposed into a sym metric part (the stretch tensor) and an antisymmetric part (the spin tensor) regardless o whether it is the Green-Lagrangian strain tensor or the Eulerian strain tensor. The classica spin tensor generally corresponds to the macro-rotation caused by differences in displace ment gradients. However, in practical cases, the onset and evolution of shear bands is closely related to grain rotation as well as the translational deformations, which has also been confirmed by experimental results [131,132]. Unlike in classical continuum mechan ics theory, which accounts for only macro-rotations, micropolar theory takes into accoun the independent micro-rotations of material points, as seen in Figure 4 (a plane elemen having four material points). For example, for a plane strain element, there are three de grees of freedom (two translational and one rotational). The micro rotations will cause the micro curvatures and the corresponding energetically-conjugated couple stresses in the micro element surfaces. Consequently, the theorem of conjugate shearing stress is no more satisfied as shown in Figure 5.  Mülhaus [119] and de Borst and Sluys [115]. After that, more and more micropolar constitutive models have been implemented and adopted to analyze the shear localization phenomena of other microstructural problems by FEM.
In classical continuum mechanics, the strain tensor can be decomposed into a symmetric part (the stretch tensor) and an antisymmetric part (the spin tensor) regardless of whether it is the Green-Lagrangian strain tensor or the Eulerian strain tensor. The classical spin tensor generally corresponds to the macro-rotation caused by differences in displacement gradients. However, in practical cases, the onset and evolution of shear bands is closely related to grain rotation as well as the translational deformations, which has also been confirmed by experimental results [131,132]. Unlike in classical continuum mechanics theory, which accounts for only macro-rotations, micropolar theory takes into account the independent micro-rotations of material points, as seen in Figure 4 (a plane element having four material points). For example, for a plane strain element, there are three degrees of freedom (two translational and one rotational). The micro rotations will cause the micro curvatures and the corresponding energetically-conjugated couple stresses in the micro element surfaces. Consequently, the theorem of conjugate shearing stress is no more satisfied as shown in Figure 5.  In micropolar theory, rotational degrees of freedom are independent of the displacement field and are considered on constitutive level and in balance equations. From a material point of view, these rotations can be considered as the rotations of grains or aggre- In micropolar theory, rotational degrees of freedom are independent of the displacement field and are considered on constitutive level and in balance equations. From a material point of view, these rotations can be considered as the rotations of grains or aggregates. Accordingly, each grain or aggregate has additional rotational degrees of freedom besides the translational degrees of freedom. Thus, there are six degrees of freedom (three translational and three rotational) in three-dimensional problems as Equation (15) and three (two translational and one rotational) in two-dimensional problems as Equation (16) for each material point, 3D : u = u x u y u z ω x ω y ω z 2D : u = u x u y ω z where ω x , ω y , and ω z are the micro-rotations in the x, y, and z directions. These microrotations will cause the micro-curvatures and the corresponding energetically-conjugated couple stresses in the micro-element surfaces. What's more, the theorem of conjugate shearing stress is no more satisfied. For 3D problems, the generalized stress and strain vectors in micropolar theory are augmented, σ = σ xx σ yy σ zz σ xy σ yx σ yz σ zy σ zx σ xz where σ xy = σ yx , σ xz = σ zx , σ zy = σ yz , and m ij are the coupled stress components (m ii are the torsion ones and m ij (i = j) are the bending moments). κ ij are the gradients of microrotations ω j in direction i. Of the two new incorporated micro length scale parameters l c and l t , l c is the length scale parameter related to bending couple stress, and l t is related to torsion couple stress. When the microstructure is considered, a typical strain localization problem such as the relation of shear band to microstructures can be reasonably predicted to leave the thickness of shear band to be specified. At the same time, the high-order terms guarantee the ellipticity (for static problems) of the governing partial differential equations, especially in the post-bifurcation regime, and then the mesh dependent problems can be obviously alleviated.
Take the 2D problems for example, for the constitutive relation of elastic materials, the stress is linearly related to elastic strain by the elastic stiffness matrix: In which where Lamé constant λ = 2Gυ/(1 − 2υ), G and υ are the conventional shear modulus and Poisson's ratio, respectively, and G c is the micropolar shear modulus affecting the asymmetric degree of shear stress. According to de Borst et al. [96] the strain and stress invariants within the framework of the micropolar theory can be formulated as: where . e p ij is the plastic deviatoric strain rate tensor, . κ p ij is the plastic micro-curvature rate tensor, and s ij is the deviatoric stress tensor. The summation convention with respect to repeated indices has been adopted. Furthermore, the deviatoric stress q is updated by new stress invariant √ 3J 2 . For numerical convenience, the choices a 1 = a 2 = 1/4, a 3 = 1/2 and b 1 = b 2 = 1/3, b 3 = 2/3 have been used in majority cases [96,115,133].
Micropolar theory can yield efficient and fully mesh-independent solutions for static problems as well as for dynamic problems. In analyzing the problems of frictional slip failure mode (mode II failure type) involving a high localized shear band, the micropolar approach seems to be a particularly natural framework, being easily implemented and physically meaningful. However, a disadvantage of the micropolar continuum theory is that the rotational degrees of freedom are activated only under shear loading. Numerical results suggested that for failure problems in which decohesion played a prevailing role (mode I failure type), rotational degrees of freedom became inactive and microcurvatures remained zero, as did the energetically-conjugated couple stresses. That is to say, when decohesion rather than frictional slip is the predominant failure mode, the regularization effect of micropolar theory is generally too weak to preserve the ellipticity of the boundary value problems. Instead, for tensile loadings in which decohesion is the main cause of structure failure, nonlocal models [134] are very effective at keeping the boundary value problem elliptic. It is worth noting that strain localization in dry and saturated specimens has been studied experimentally by many researchers on loose sand as well as dense sand, demonstrating that frictional slip (mode II failure mode) is the dominant failure phenomenon [135][136][137]. It is also the main failure mechanism for geostructures in reality. There is no doubt, then, that micropolar theory can be used to analyze strain localization problems in geomaterials.

Discussions
In this paper, four main numerical regularization techniques, i.e., viscosity theory, nonlocal theory, high-order gradient and micropolar theory, are fundamentally demonstrated. Inevitably, each regularization method has its disadvantages and limitations, and in some cases in which a single regularization method does not work well, a regularization method combining at least two regularization approaches might be efficient. In general, the combination of viscosity with another regularization technique has seen wide adoption. For instance, Wang et al. [58,138] proposed a model regularized by both rate dependency (viscoplasticity) and plastic gradient that was effective for both quasistatic and dynamic problems when dealing with mesh dependency problems. What's more, interactions between these two methods in controlling shear band thickness were also discussed. Oka [55,139] proposed a gradient-dependent elastoviscoplastic model for clay to study the strain localization problems and deformation mode. Based on a typical plastic constitutive model proposed by de Borst et al. [109] that featured both rate and gradient dependence for strain localization analysis, Zhang [140,141] predicted the internal length scale of the combined model for general cases and illustrated the interactions between different length scale parameters for rate dependency models and gradient plastic models from a mathematical point of view using a one-dimensional example. Liu and Yang [45] proposed a coupled Boit-Cosserat model produced by combining both Biot's theory (the pore pres-sure dissipation is rate-dependent) and micropolar theory with a view to simulate strain localization phenomena caused by strain softening in saturated porous media. Numerical results demonstrated the developed models' abilities to maintain the well-posedness of boundary value problems while incorporating strain softening behavior, as well as the capability to reproduce the strain localization phenomena in geotechnical structures.
From the above descriptions of the four main regularization techniques, it can be seen that the domain of the strain localized region is closely related to the internal length scale, however, the meanings of different length scales and their relations to the thickness of shear band are not identical. For viscoplastic model, take the consistency model proposed by Wang et al. [58], the shear modulus, shear wave speed, viscosity parameter as well as the softening parameter, etc. are believed to decide the thickness of shear band, and these factors can be related to the cut off value of the strain rate at the edge of the shear band by an implicit parameter with length scale. What's more, under the approximate form of the relation, the implicit length scale is able to denote the thickness of shear band. For nonlocal theory, certain internal variables closely related to the strain localization are averaged in a nonlocal finite volume to reach the regularization effectiveness, and the finite representative volume decided by the parameter with length scale is believed to be the damaged region and strain localized region. In this way, the size of the strain localized region is controlled by the internal length scale and the chosen weighting function. For gradient theory, a special case of the nonlocal theory, the gradient term can be denoted by the softening parameter and the internal length scale, then it can be thought to reflect the fact that below the certain size scale the interaction between the microstructural carriers of the deformation is nonlocal, resulting in the thickness of shear band decided by the internal length scale. For a micropolar model, the independent grains' rotations result in the couple stresses, therefore, in 2D problems the internal length scale is naturally regarded as the bending length between grains or aggregates for granular materials. Thus, the thickness of shear band can be predicted by the value of the internal length parameter as the relation between the grain size and the shear band thickness in experimental tests.
The respective and common features of the discussed regularization techniques are summarized in Table 3, based on which researchers can choose them more reasonably and scientifically according to their features. Furthermore, one technique maybe not enough to fit all problems, so the micropolar technique and its combination with other numerical techniques are favored in prospective. Table 3. Features of each regularization technique and promising prospective.

Viscosity approach
Being able to consider the rate-dependent behaviors of soils; It can be used to model both the slip-driven failure and the decohesion-driven failure of soils; It only requires operations at the local level in constitutive models; The implementation is very simple. However, it is not appropriate for modelling the rate-independent materials.
The abilities to maintain well-posedness of boundary value problems while considering strain softening behavior; Some implicit or explicit parameters with micro-length scale are contained in the models; With excellent performances in dealing with mesh dependency problems in FEM; The thickness of strain localized region is closely influenced by the internal length scale of each regularization technique; Besides the micropolar theory, of which the length scale can be regarded as the mean grain size, it is still an open question for other regularization techniques to physically relate the length parameters to grain size.

Nonlocal approach
It works well for both the decohesion-driven and the slip-driven failure modes; It is computationally more efficient than the high gradient models; However, the choice of nonlocal variable is not an easy task, and it is affected by specific problems; The integral-differential equation is more complex than algebraic equation. Table 3. Cont.

Numerical Techniques Respective Features Common Features
High-gradient approach Gradient models can be derived from nonlocal models by expanding the kernel of the integral; It has good efficiency in computation; It is effective for both decohesion-driven and slip-driven failures; However, it introduces an additional variable at global level, and the determination of the gradient variables is not easy.

Micropolar approach
Micropolar theory takes into account the independent micro-rotations of material points, so it has a more physical meaning than a wholly mathematical technique when compared with other regularization approaches aforementioned; It can efficiently and fully obtain the mesh-independent solutions for static problems as well as for dynamic problems; The thickness of shear band can be controlled by the length parameter as the cases it is influenced by the mean grain size in laboratory tests; However, when decohesion rather than frictional slip is the predominant failure mode, the regularization effect of micropolar theory is generally too weak to preserve the ellipticity of the boundary value problems.

Micropolar approach and its combinations with other numerical methods
In present paper, the micropolar theory, considering particles' rotations and moments, is favored and believed to have more wide applications in the future. Besides the combinations of different regularization techniques discussed above. For further study, the combinations of micropolar theory with other numerical techniques, such as DEM, SPH, MPM, FCM, etc., can play more significant roles. For example, the macro rotation in strain localized regions can be described by micropolar technique, but for the most concerned regions or the interactions between particles can be analyzed by DEM.

Conclusions
In conclusion, with the capability of overcoming the numerical difficulties and dealing with the mesh dependency problems, the regularization approaches, including viscosity effect, nonlocal theory, high-order gradient and micropolar theory, have always been adopted to reproduce and track the onset and evolution of the strain localization phenomena in geotechnical engineering in finite element analysis. No matter which regularization method is adopted, at least one explicit or implicit internal length scale parameter must be generally incorporated into the constitutive model. In the research of many scientists aforementioned, the internal length scale parameters have been hypothetically related to the microstructure, with random constants distributed within a certain range of the ratio of a structure's typical dimension, internal defection, or even interactions, etc., indicating that the physical meanings of internal length scales and their interrelations have not been obtained a common sense until now. Accordingly, further investigation of these internal length scales in each regularization approach is still a matter of great urgency and significance.
At last, the argument of Tejchman [125] is also favored in present paper, the micropolar approach is more suitable, from a physical sense, for modeling of shear zones in granular materials than other models that seek to capture strain localization in a proper manner (e.g., the nonlocal, high-order gradient, and viscous models), because it takes into account grains' rotations and couple stresses, which can be experimentally observed during shearing (even though these remain negligible during homogeneous deformation). What's more, its developments or combinations with other numerical methods are believed to be promising in geotechnical engineering in the future.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.