1. Introduction
Karst regions account for over one-third of China’s land area and contain abundant hydropower resources [1]. In recent years, an increasing number of large-scale hydraulic and hydroelectric projects [2,3,4,5], including pumped storage power stations, have been constructed in areas with extensive karst development. The soluble rock formations in karst geology often exhibit pronounced heterogeneity and anisotropy [6,7,8], while the groundwater storage conditions and transfer pathways are highly complex. Additionally, the temporal and spatial distribution of groundwater in these regions is highly uneven, and the permeability characteristics of the rock masses show significant spatial variability. Such complex geological environments frequently lead to hydraulic engineering accidents, such as reservoir and dam leakage, dam foundation instability, slope deformation, and even dam failure [9,10,11,12,13,14]. To ensure the safe construction and reliable operation of infrastructure projects in karst areas, it is crucial to thoroughly analyze the seepage distribution in the project area during the planning phase.
The permeability coefficient is one of the key hydrogeological parameters. Common methods for obtaining it include theoretical analysis, experimental approaches, and inversion techniques [15,16,17]. With the development and application of numerical methods and optimization algorithms, inversion methods have become the mainstream approach for determining the permeability coefficient. Various methods have been proposed both domestically and internationally for permeability-coefficient inversion, including the forward–backward algorithm [18], the extended-head potential function method [19], sequential quadratic programming [20], and methods integrating permeability tensor theory with sequential-quadratic-programming inversion techniques [21]. However, these traditional inversion methods require numerous iterations of forward and inverse seepage models, which makes them time-consuming and computationally inefficient, and their parameter-estimation accuracy is relatively low. Improving the computational efficiency of parameter inversion is therefore worthwhile, especially for preliminary investigations of engineering areas.
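For orientation, permeability-coefficient inversion is commonly posed as a bounded least-squares problem. The formulation below is a generic sketch and not necessarily the objective used in the cited works; the symbols $\mathbf{k}$ (vector of permeability coefficients), $h_i^{\mathrm{calc}}$ and $h_i^{\mathrm{obs}}$ (computed and measured water levels at borehole $i$), $n$ (number of boreholes), and the bounds $\mathbf{k}^{\min}$, $\mathbf{k}^{\max}$ are notation assumed here for illustration.

```latex
\min_{\mathbf{k}} \; F(\mathbf{k})
  = \sum_{i=1}^{n} \left[ h_i^{\mathrm{calc}}(\mathbf{k}) - h_i^{\mathrm{obs}} \right]^{2},
\qquad
\mathbf{k}^{\min} \le \mathbf{k} \le \mathbf{k}^{\max}
```

Under this reading, every evaluation of $F(\mathbf{k})$ requires one forward seepage computation to obtain $h_i^{\mathrm{calc}}(\mathbf{k})$, which is why iterative schemes that evaluate $F$ many times become expensive.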
In recent years, surrogate models based on intelligent algorithms have emerged as an approach to improving inversion efficiency and have been widely used. Surrogate models based on algorithms such as Support Vector Machines [22], BP Neural Networks [23], Extreme Learning Machines [24], Particle Swarm Optimization [25], and Multivariate Adaptive Regression Splines [26] have been progressively established. Surrogate-model methodologies generate training samples through finite-element forward models and use the nonlinear fitting capability of these algorithms to establish the mapping between permeability coefficients (input variables) and borehole water levels (output variables), thereby providing rapid solutions in place of traditional seepage forward models. While surrogate-model methodologies enhance computational efficiency, they essentially remain traditional “forward problem-solving” parameter-inversion techniques. Building on surrogate models, integrating optimization algorithms to invert for optimal parameters has emerged as a new approach in parameter inversion, and scholars both domestically and internationally have conducted extensive research on this topic. Xu Li [27] proposed an inversion model combining Extreme Learning Machine (ELM) and Genetic Algorithm (GA), which showed superior prediction accuracy and computational efficiency compared with traditional methods. Qian Wuwen [28] combined differential evolution algorithms and reduced-order models to solve the inverse problem of seepage fields, enhancing the prediction and analysis of seepage-field characteristics. He Yiyang and colleagues [29] used inverse analysis methods to investigate the time-dependent variation of the permeability coefficient of composite geomembranes in practical engineering, helping to assess the performance and durability of the geomembranes more accurately.
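The surrogate-model workflow described above (sample permeability coefficients, run the forward model, fit a fast approximator, query it instead of the forward model) can be sketched as follows. This is a minimal illustration under stated assumptions: `toy_forward_model` is a hypothetical analytic stand-in for a finite-element seepage solve, the parameter ranges in `BOUNDS` are invented, and an inverse-distance-weighted nearest-neighbour interpolator is swapped in for the learning algorithms cited in the text purely to keep the example dependency-free.

```python
import math
import random


def toy_forward_model(k):
    """Hypothetical stand-in for a finite-element seepage solve.

    Maps two permeability coefficients (m/s) to one borehole water level (m).
    """
    k1, k2 = k
    return 100.0 + 5.0 * math.log10(k1 / k2) + 2.0 * math.log10(k1 * 1e6)


def sample_log_uniform(rng, low, high):
    """Draw a permeability value uniformly in log10 space."""
    return 10.0 ** rng.uniform(math.log10(low), math.log10(high))


def build_training_set(n_samples, bounds, seed=0):
    """Run the forward model on randomly sampled permeability coefficients."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        k = tuple(sample_log_uniform(rng, lo, hi) for lo, hi in bounds)
        samples.append((k, toy_forward_model(k)))
    return samples


def knn_surrogate(samples, k_query, n_neighbors=5):
    """Predict a water level by inverse-distance weighting in log10(k) space."""
    q = [math.log10(v) for v in k_query]
    scored = []
    for k, level in samples:
        d = math.dist(q, [math.log10(v) for v in k])
        if d == 0.0:
            return level  # exact hit on a training sample
        scored.append((d, level))
    scored.sort(key=lambda item: item[0])
    nearest = scored[:n_neighbors]
    weights = [1.0 / d for d, _ in nearest]
    return sum(w * lv for w, (_, lv) in zip(weights, nearest)) / sum(weights)


BOUNDS = [(1e-7, 1e-4), (1e-7, 1e-4)]  # assumed permeability ranges, m/s
training = build_training_set(400, BOUNDS)

k_test = (3e-6, 2e-5)
predicted = knn_surrogate(training, k_test)
exact = toy_forward_model(k_test)
print(f"surrogate: {predicted:.2f} m, forward model: {exact:.2f} m")
```

Once the training set exists, each surrogate query costs a pass over the stored samples rather than a new forward solve, which is the source of the efficiency gain. In practice the forward solves used to build the training set dominate the cost.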
However, it has been observed that nonlinear modeling of hydrogeological permeability coefficients exhibits issues such as low prediction accuracy, poor robustness, slow convergence, limited generalization capability, and a tendency to converge to local minima [30]. To address these issues, scholars both domestically and internationally have introduced deep learning and other advanced machine learning models, which possess strong nonlinear modeling capabilities, high prediction accuracy, the ability to handle complex data types, and the capacity for large-scale data processing. Gaur, H. [31] proposed a novel integral formulation for structural-mechanics analysis that effectively solves linear and nonlinear parameter-inversion problems by employing neural networks as a regression tool. Liu, B.K. [32] proposed a five-step surrogate-model computational framework that systematically addresses stochastic multi-scale issues in composite design and enhances computational efficiency through machine learning; applied to nano-composites, it produced results that align well with experimental data, validating its effectiveness in designing new complex nano-composites. Khalil, Z.H. [33] presented a multilayer perceptron (MLP) model for crop yield prediction using satellite-image time series, employing NDVI histogram transformation for information integrity, analyzing various activation functions, and demonstrating the model’s ability to accurately predict winter crop yields in Iraq up to nine weeks in advance, outperforming traditional methods. Chao, Q. [34] proposed a hybrid model-driven and data-driven approach for assessing the health status of axial piston pumps, validating its effectiveness under various health conditions through the establishment of a physical-flow loss model and a support vector data description (SVDD) model. Deep learning and other advanced models may achieve higher predictive performance on certain tasks, but they typically require larger datasets, greater computational resources, and more complex hyperparameter tuning, which can be burdensome for the parameter inversion of permeability coefficients in engineering areas. At the same time, there are very few models for the inversion of geological permeability coefficients in karst areas. Therefore, it is necessary to seek a more suitable algorithm.
In response to these challenges, this study adopts the Random Forest (RF) algorithm to develop a surrogate model for seepage analysis. The Random Forest algorithm offers high accuracy, strong noise resistance, and excellent capability in handling high-dimensional data. It also effectively prevents overfitting, demonstrates strong adaptability in regression tasks, and exhibits good robustness [35]. Compared with deep learning or other advanced machine learning models, RF has lower computational complexity and better interpretability, making it particularly suitable for small-sample data. Furthermore, to improve optimization efficiency, this paper introduces a swarm intelligence optimization algorithm, the Secretary Bird Optimization Algorithm (SBOA), into the optimization process to establish an RF–SBOA intelligent inversion model for permeability coefficients. The feasibility and effectiveness of the proposed model are validated through a case study of the C-pumped storage power station.
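The general idea of coupling a random-forest surrogate with a search algorithm can be illustrated with the sketch below. It is not the RF–SBOA implementation of this paper. The forest is a simplified pure-Python version (bootstrap resampling, one randomly chosen feature per split), plain random search is used as a stand-in for the SBOA, and `forward_levels` is a hypothetical linear function standing in for the finite-element model. All names, bounds, and the two-borehole setup are assumptions made for illustration.

```python
import random


def forward_levels(k):
    """Hypothetical stand-in for a finite-element solve.

    k holds log10 permeability coefficients of two strata; returns the
    water levels (m) at two imaginary boreholes.
    """
    a, b = k
    return [112.0 + 7.0 * a - 5.0 * b, 90.0 + 5.0 * a + 7.0 * b]


def sse(ys):
    """Sum of squared deviations from the mean."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def grow_tree(rows, rng, min_leaf=3):
    """Regression tree; each split examines one randomly chosen feature."""
    mean = sum(y for _, y in rows) / len(rows)
    if len(rows) < 2 * min_leaf:
        return mean  # leaf: predict the mean target
    f = rng.randrange(len(rows[0][0]))
    rows = sorted(rows, key=lambda r: r[0][f])
    best = None
    for i in range(min_leaf, len(rows) - min_leaf + 1):
        if rows[i - 1][0][f] == rows[i][0][f]:
            continue  # cannot split between equal feature values
        score = sse([y for _, y in rows[:i]]) + sse([y for _, y in rows[i:]])
        if best is None or score < best[0]:
            best = (score, i)
    if best is None:
        return mean
    i = best[1]
    threshold = 0.5 * (rows[i - 1][0][f] + rows[i][0][f])
    return (f, threshold,
            grow_tree(rows[:i], rng, min_leaf),
            grow_tree(rows[i:], rng, min_leaf))


def predict_tree(node, x):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if x[f] <= threshold else right
    return node


def fit_forest(X, y, n_trees=20, seed=0):
    """Fit each tree on a bootstrap resample of the training rows."""
    rng = random.Random(seed)
    rows = list(zip(X, y))
    forest = []
    for _ in range(n_trees):
        boot = [rows[rng.randrange(len(rows))] for _ in rows]
        forest.append(grow_tree(boot, rng))
    return forest


def predict_forest(forest, x):
    """Average the predictions of all trees."""
    return sum(predict_tree(tree, x) for tree in forest) / len(forest)


def invert(forests, observed, bounds, n_iter=3000, seed=1):
    """Random search (stand-in for SBOA) minimizing the squared level misfit."""
    rng = random.Random(seed)
    best_k, best_misfit = None, float("inf")
    for _ in range(n_iter):
        k = [rng.uniform(lo, hi) for lo, hi in bounds]
        misfit = sum((predict_forest(fr, k) - h) ** 2
                     for fr, h in zip(forests, observed))
        if misfit < best_misfit:
            best_k, best_misfit = k, misfit
    return best_k, best_misfit


BOUNDS = [(-7.0, -4.0), (-7.0, -4.0)]  # assumed log10(k) ranges, k in m/s
rng = random.Random(42)
X = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(150)]
H = [forward_levels(x) for x in X]
forests = [fit_forest(X, [h[j] for h in H], seed=j) for j in range(2)]

k_true = [-5.5, -4.8]
observed = forward_levels(k_true)  # synthetic "measured" levels
k_est, misfit = invert(forests, observed, BOUNDS)
print("estimated log10(k):", [round(v, 2) for v in k_est])
```

In this sketch the optimizer only ever queries the forests, so the forward model is called solely to build the 150 training samples and the synthetic observation. A swarm optimizer such as the SBOA would replace `invert` while leaving the surrogate interface unchanged.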
5. Conclusions
This paper establishes an intelligent inversion model for geological permeability coefficients based on the RF–SBOA. The model successfully addresses the inversion problem of geological permeability coefficients for the C-pumped storage power station, reveals the overall distribution characteristics of its natural seepage field, and confirms that the seepage flow and seepage gradient under normal water-storage conditions are reasonable.
Compared with the CART, BP, and SVR models, the RF model provides borehole water-level predictions that are closer to the finite-element computed values and yields the lowest values of the performance evaluation metrics. This indicates that the RF model has superior predictive accuracy and generalization capability. It can therefore serve as an alternative to the finite-element forward model for seepage calculations, significantly reducing the need for extensive finite-element forward computations.
The introduction of the SBOA enhances the RF model’s global search capability and optimization efficiency, improving its ability to identify the optimal geological permeability coefficients. The inversion results for borehole water levels are reasonable, with the maximum absolute and relative errors consistent with engineering experience. In comparison with the RF–PSO, RF–GWO, and RF–SSA surrogate models, the RF–SBOA demonstrates higher optimization efficiency, stronger global search capability, and lower errors. The calculated distribution of the natural seepage field is consistent with the general patterns observed in mountainous seepage fields. With the dam body and curtain grouting taken into account, the seepage flow and seepage gradient during the operational period under normal water-storage conditions meet regulatory standards.
The introduction of the RF–SBOA improves the efficiency of the inversion process and offers strong potential for future applications in engineering geological inversion. However, there is still room for improvement. Firstly, this study only employs data from a single borehole in the project area to model and invert the optimal geological permeability coefficient. Determining how to utilize the relationships between different boreholes in modeling requires further research. Secondly, while the accuracy of the RF–SBOA is sufficient for permeability-coefficient inversion in engineering areas, whether it has broader applicability in other fields, such as material parameter inversion in static and dynamic calculations, still needs further investigation and validation.