1. Introduction
Coarse-grained soils (CGSs) are defined as soils in which particles with diameters ranging from 0.075 to 60 mm constitute more than 50% of the total mass [1, 2, 3]. They are widely used in the construction of geotechnical structures such as dams and embankments, as well as various permeable structures and facilities, due to their favorable engineering properties, including high compaction performance, large permeability coefficient, and high shear strength. Permeability is a fundamental soil property that plays a critical role in determining the stability and functionality of geotechnical structures. It is commonly characterized by the permeability coefficient, the accurate determination of which has been a longstanding focus and challenge in both academic and engineering communities [4, 5, 6]. The permeability coefficient of soils is typically obtained through experimental testing [7, 8, 9].
For CGSs in particular, the permeability coefficient is likewise determined through experimental testing [10, 11, 12]. However, permeability tests for CGSs present challenges such as large specimen size, difficulties in specimen preparation, and long testing durations. Consequently, extensive efforts have been made to develop predictive formulas for the permeability coefficient. Existing studies indicate that key factors influencing the permeability of CGSs include particle size distribution [13, 14], particle shape [15, 16, 17], pore characteristics [18, 19], mineral composition [20, 21], temperature [22, 23], and fluid properties [24]. For a given type of CGS (i.e., soils with similar genesis and mineral composition, which are generally considered to be of the same type), the particle shape tends to be relatively consistent. Under such conditions, particle size distribution and pore characteristics become the two most critical factors governing permeability [25]. Therefore, in principle, the permeability coefficient of any given CGS should be computable from parameters describing its particle size distribution and pore characteristics. Based on these parameters, numerous empirical models for predicting the permeability coefficient of CGSs have been developed by researchers worldwide. Notable examples include the Terzaghi model [26], the Shahabi model [27], the modified Hazen model [28], and the Chapuis model [29]. Empirical models exhibit distinct advantages in predicting the permeability coefficient of coarse-grained soils, primarily due to their engineering applicability and rapid assessment capabilities. Such models, exemplified by the Kozeny–Carman equation, establish explicit mathematical relationships between permeability and particle size characteristics (e.g., effective particle size D10) as well as void ratio (e). These models require only a limited set of readily obtainable soil parameters, making them particularly suitable for on-site engineering decision-making. By leveraging statistical correlations derived from extensive laboratory test data, such models serve as efficient tools for the preliminary selection of coarse-grained fill materials and the early assessment of seepage risks, especially when applied to soils with relatively uniform gradation (e.g., well-graded sandy soils) and simple pore structures. However, the limitations of empirical models become particularly pronounced under complex conditions. First, the oversimplified physical assumptions hinder the accurate representation of multiscale pore connectivity in continuously graded coarse-grained soils. For instance, these models often neglect the nonlinear influence of particle angularity on tortuosity, leading to permeability coefficient prediction errors exceeding 50%. Second, model parameter sensitivity is highly dependent on boundary conditions. When soil gradation deviates significantly from standard ranges (e.g., uniformity coefficient Cu > 20) or when fine particles are interspersed, traditional empirical models may yield prediction errors spanning two to three orders of magnitude. More critically, these empirical formulas are fundamentally fits to specific datasets. They neither elucidate the intrinsic physical mechanisms governing permeability evolution nor generalize to novel engineering materials, such as recycled aggregate-stabilized soils, which limits their broader applicability.
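To make concrete the kind of closed-form relationship such empirical models encode, the following minimal sketch evaluates two textbook expressions: a Hazen-type formula in D10 and the Kozeny–Carman equation for uniform spheres. The function names, the default Hazen coefficient, and the water properties are illustrative assumptions. The specific forms used in the cited models [26, 27, 28, 29] may differ.

```python
def hazen_k(d10_mm: float, c: float = 1.0) -> float:
    """Hazen-type estimate: k [cm/s] = c * D10^2, with D10 in mm.

    The coefficient c is empirical; c = 1.0 is an assumed default,
    not a value taken from the paper.
    """
    return c * d10_mm ** 2


def kozeny_carman_k(d_eff_m: float, e: float,
                    rho: float = 1000.0,   # fluid density [kg/m^3], assumed water
                    g: float = 9.81,       # gravitational acceleration [m/s^2]
                    mu: float = 1.0e-3     # dynamic viscosity [Pa*s], assumed water near 20 C
                    ) -> float:
    """Kozeny-Carman estimate for uniform spheres of diameter d_eff_m.

    k [m/s] = (rho * g / mu) * (d^2 / 180) * e^3 / (1 + e)
    where e is the void ratio. The factor 180 combines the specific
    surface of spheres (6/d) with a Kozeny-Carman constant of 5.
    """
    return (rho * g / mu) * (d_eff_m ** 2 / 180.0) * e ** 3 / (1.0 + e)


if __name__ == "__main__":
    print(hazen_k(0.5))                 # D10 = 0.5 mm
    print(kozeny_carman_k(1.0e-3, 0.6)) # d = 1 mm, e = 0.6
```

Both expressions depend on a single representative grain size and a single pore parameter, which illustrates why they are quick to apply and also why they struggle with widely graded soils whose behavior is not captured by one characteristic diameter.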
Given the mechanistic limitations and generalization bottlenecks of empirical models in predicting the permeability of complex CGSs, machine learning offers a novel paradigm for addressing the challenges of multifactor coupling. By autonomously extracting high-dimensional nonlinear interactions between gradation curve morphology and pore characteristics, data-driven models not only overcome the parameter sensitivity thresholds of traditional formulas but also leverage implicit feature mapping to capture key physical mechanisms, such as preferential seepage pathways. Compared to empirical equations that rely on manually imposed assumptions, machine learning optimizes feature weights adaptively and employs nonlinear function approximation, enabling cross-scale predictions from discrete particle size distributions to macroscopic permeability behavior even under limited specimen conditions. Furthermore, its superior prediction robustness significantly enhances the accuracy of permeability coefficient inversion for highly heterogeneous CGSs, establishing a technically rigorous yet practically viable approach for engineering applications.
In the field of permeability coefficient prediction for CGSs, various machine learning approaches have been systematically explored and validated for their effectiveness. The random forest algorithm was first employed to integrate the fractal dimension of the particle size distribution curve and pore tortuosity parameters, enabling a nonlinear mapping of the permeability coefficient for widely graded gravelly sands, with prediction accuracy improved by 62% compared to the traditional Hazen model [30]. A Bayesian-optimized support vector regression model has demonstrated its capability to stably capture the implicit relationships between particle shape parameters and permeability, even under small-specimen conditions, exhibiting significant advantages in extreme gradation scenarios with a high uniformity coefficient (Cu > 15) [31]. To address high-dimensional feature coupling, a deep neural network architecture was designed with a dual-channel structure incorporating a gradation encoder and a physics-informed loss function, successfully decoupling the influence of particle contact force chain evolution on seepage pathways [32]. Additionally, the XGBoost algorithm has been embedded into a transfer learning framework, enabling knowledge migration across large-scale engineering databases and overcoming the limitations imposed by data scarcity in individual projects [33]. The attention-based long short-term memory (LSTM) network has been applied to capture the dynamic response of permeability coefficients over time, reducing prediction errors to 8.3% in coarse-grained subgrade soils under cyclic loading conditions [34]. Recent studies further confirm that graph convolutional networks (GCNs), incorporating discrete element simulation data, can effectively characterize the topological features of particle spatial arrangements, enhancing the accuracy of quantifying the contribution of three-dimensional pore structures to permeability to 92% [35, 36, 37, 38].
In this investigation, 64 sets of CGS specimens were designed based on a negatively exponential continuous gradation equation, and computational fluid dynamics–discrete element method (CFD-DEM) simulations were conducted to model permeability tests and determine the permeability coefficient of each specimen under different porosity and gradation conditions. A permeability coefficient database comprising 256 datasets was constructed after correlation analysis and data preprocessing. Three classes of machine learning approaches (neural network models, regression models, and tree-based models) were employed to predict permeability coefficients, with a particle swarm optimization (PSO) algorithm introduced for global hyperparameter optimization. Four-fold cross-validation was implemented to enhance data utilization and mitigate the impact of random data partitioning on model evaluation. Model performance was assessed using multiple evaluation metrics, including the coefficient of determination (R2), root mean square error (RMSE), mean absolute error (MAE), and 95% uncertainty interval width (U95), and sensitivity analysis was conducted to quantify the contribution of each input variable to the prediction outcomes. Finally, the advantages of machine learning methods in predicting the permeability behavior of CGSs were validated through comparisons with traditional empirical models and laboratory test results.
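The evaluation protocol described above can be sketched as follows. This is a minimal, library-free illustration, not the authors' implementation. The function names are hypothetical, and the U95 expression (1.96 times the root of the squared error standard deviation plus the squared RMSE) is a definition commonly used in the machine learning literature, assumed here because the text does not state the formula.

```python
import math
import random


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, MAE and U95 for paired observations and predictions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(err * err for err in errors)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)     # total sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(err) for err in errors) / n
    mean_err = sum(errors) / n
    sd = math.sqrt(sum((err - mean_err) ** 2 for err in errors) / n)
    # Assumed definition of U95; the paper does not give its formula.
    u95 = 1.96 * math.sqrt(sd ** 2 + rmse ** 2)
    return {"R2": 1.0 - ss_res / ss_tot, "RMSE": rmse, "MAE": mae, "U95": u95}


def k_fold_indices(n_samples, k=4, seed=0):
    """Yield (train_indices, validation_indices) for each of k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal, disjoint folds
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


if __name__ == "__main__":
    # With 256 samples and k = 4, each fold validates on 64 and trains on 192.
    for train_idx, val_idx in k_fold_indices(256, k=4):
        print(len(train_idx), len(val_idx))
```

Each sample serves as validation data exactly once, so averaging the metrics over the four folds reduces the dependence of the reported scores on any single random partition.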
6. Conclusions
In this investigation, CGS specimens were designed using a negatively exponential continuous gradation equation, and a permeability coefficient database was constructed by integrating CFD-DEM simulated seepage tests. The predictive performance of three machine learning models—Back-Propagation Neural Network (BPNN), Gaussian Process Regression (GPR), and Random Forest (RF)—was systematically compared with that of traditional empirical models. Grey relational analysis (GRA) was employed to quantify the contribution of input parameters, while particle swarm optimization (PSO) combined with four-fold cross-validation enhanced the models’ generalization capability, thereby demonstrating the advantages of data-driven methods in evaluating the permeability of coarse-grained soils. The main conclusions are summarized as follows:
(1) Grey relational analysis revealed that the gradation parameters α, β, maximum particle size (dmax), and porosity (n) significantly influence the permeability coefficient (grey relational coefficient R > 0.6). Consequently, these four parameters should be jointly employed as input variables, obviating the need for dimensionality reduction.
(2) Among the models evaluated, BPNN achieved superior accuracy (R2 = 0.99, MAE = 1.5, RMSE = 2.9, U95 = 0.4), demonstrating its robust nonlinear mapping capability to accurately capture higher-order interactions affecting the permeability coefficient. In contrast, GPR exhibited the lowest accuracy (R2 = 0.92) due to limited kernel adaptability and noise sensitivity, while RF (R2 = 0.97) encountered challenges in modeling the smooth variations of continuous variables.
(3) Compared to the traditional empirical model (R2 = 0.9031), the BPNN’s predictive accuracy improved by 10.13%, with high consistency observed between training and validation results. These findings confirm that data-driven methods effectively overcome the limitations imposed by theoretical assumptions and enhance the reliability of permeability assessments for coarse-grained soils, offering an efficient solution for the intelligent prediction of geotechnical parameters.
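For readers unfamiliar with grey relational analysis, the screening step summarized in conclusion (1) can be illustrated with the standard Deng formulation. The sketch below is an assumption-laden illustration rather than the authors' procedure: min-max normalization, a distinguishing coefficient of 0.5, and the function names are all choices made here, and the sample values are synthetic.

```python
def min_max(seq):
    """Scale a sequence to [0, 1]; a constant sequence cannot be scaled."""
    lo, hi = min(seq), max(seq)
    if hi == lo:
        raise ValueError("constant sequence cannot be min-max normalized")
    return [(v - lo) / (hi - lo) for v in seq]


def grey_relational_grades(reference, factors, rho=0.5):
    """Return the grey relational grade of each factor w.r.t. the reference.

    reference: list of target values (e.g. permeability coefficients)
    factors:   dict mapping a factor name to a list of the same length
    rho:       distinguishing coefficient (0.5 is the conventional choice)
    """
    ref = min_max(reference)
    # Absolute differences between normalized reference and each factor.
    deltas = {
        name: [abs(r - x) for r, x in zip(ref, min_max(seq))]
        for name, seq in factors.items()
    }
    all_deltas = [d for ds in deltas.values() for d in ds]
    d_min, d_max = min(all_deltas), max(all_deltas)
    if d_max == 0.0:  # every factor coincides with the reference
        return {name: 1.0 for name in deltas}
    # Grey relational coefficient per sample, averaged into a grade.
    return {
        name: sum((d_min + rho * d_max) / (d + rho * d_max) for d in ds) / len(ds)
        for name, ds in deltas.items()
    }


if __name__ == "__main__":
    k_values = [1.0, 2.0, 3.0, 4.0]                  # synthetic reference
    inputs = {"proportional": [10.0, 20.0, 30.0, 40.0],
              "scrambled": [4.0, 1.0, 3.0, 2.0]}     # synthetic factors
    print(grey_relational_grades(k_values, inputs))
```

A factor that tracks the reference sequence closely receives a grade near 1, while a weakly related factor receives a lower grade. A threshold such as the 0.6 quoted above is then applied to decide which inputs to retain.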
The innovation of this study lies in the integration of CFD-DEM simulated datasets based on negatively exponential continuous gradation to capture realistic seepage behavior of coarse-grained soils, combined with the use of grey relational analysis to ensure the physical interpretability and relevance of input variables. Furthermore, a particle swarm optimization (PSO) strategy coupled with k-fold cross-validation was implemented to enhance the generalization capability of machine learning models. The systematic comparison of BPNN, RF, and GPR further reveals the superior performance of BPNN in modeling complex nonlinear relationships, offering a more accurate and data-efficient approach for permeability prediction than existing empirical or machine learning methods.
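The PSO strategy mentioned above follows the usual global-best update rule. The sketch below shows that rule on a simple stand-in objective. In a hyperparameter search, the objective would instead return the cross-validated error of a model trained with the candidate settings. The inertia weight, acceleration coefficients, swarm size, and function name are assumptions for illustration, not values reported in the study.

```python
import random


def pso_minimize(objective, bounds, n_particles=30, n_iter=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective(position) within box bounds using global-best PSO.

    bounds: list of (low, high) tuples, one per dimension.
    Returns (best_position, best_value).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # global best so far

    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the new position to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


if __name__ == "__main__":
    # Stand-in objective with a known minimum at (3, -1).
    def bowl(p):
        return (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2

    print(pso_minimize(bowl, [(-10.0, 10.0), (-10.0, 10.0)]))
```

Pairing the swarm search with k-fold cross-validation means each candidate is scored on held-out data, which discourages hyperparameters that merely fit one particular training split.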
Limitations and Future Scopes
Despite the promising results achieved in this study, several limitations should be acknowledged. First, the permeability coefficient database was derived entirely from CFD-DEM simulations, which, although capable of capturing pore-scale flow behavior, may not fully replicate the heterogeneity and boundary conditions of real-world CGSs. Second, the model inputs were limited to gradation parameters and porosity; other potentially influential factors such as particle shape, angularity, and fabric structure were not considered. Moreover, the machine learning models were trained and validated on a relatively constrained parameter space, which may limit their generalizability when extrapolated to soils with markedly different gradation characteristics. Future research should focus on incorporating experimental datasets to validate and enhance the robustness of the predictive models, exploring additional microstructural descriptors as input features, and extending the framework to accommodate dynamic seepage conditions or coupled hydro-mechanical processes for broader geotechnical applications.