Article

Distribution Characteristics and Hazard Assessment of Ground Collapse in the Mining Activity Areas of the Turpan–Hami Basin

1
School of Earth Science and Engineering, Hebei University of Engineering, Handan 056038, China
2
State Key Laboratory of Lithospheric and Environmental Coevolution, Institute of Geology and Geophysics, Chinese Academy of Sciences, Beijing 100029, China
3
College of Earth and Planetary Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
4
School of Civil Engineering, Qingdao University of Technology, Qingdao 266033, China
*
Authors to whom correspondence should be addressed.
Appl. Sci. 2026, 16(7), 3354; https://doi.org/10.3390/app16073354
Submission received: 13 February 2026 / Revised: 23 March 2026 / Accepted: 26 March 2026 / Published: 30 March 2026

Abstract

The Turpan–Hami Basin, a critical energy hub in northwestern China, is plagued by frequent ground collapses induced by extensive mining over karst geology, threatening ecology and safety. Current hazard assessment methods, mainly single linear or traditional machine learning models, fail to capture the complex nonlinear interactions inherent to this coupled geo-mining environment. This study addresses this gap by establishing a multi-dimensional “Geology-Mining-Hydrology-Environment” index system comprising 14 critical factors—including lithology, goaf distribution, mining intensity, and their interaction terms. A coupled gradient boosting decision tree and logistic regression (GBDT-LR) model, optimized for the multi-factor coupling characteristics of ground collapse in arid mining basins, was applied for the hazard assessment. The results reveal a distinct spatial pattern of “core agglomeration with multi-level gradient differentiation.” Extremely high-hazard areas, covering 9.21% of the area, are concentrated in the core mining areas northwest of Turpan and southwest of Hami, while high-hazard areas (4.63%) form surrounding belts. The GBDT-LR model (AUC = 0.871) demonstrated significantly superior performance over a single logistic regression model (AUC = 0.813), proving its enhanced capability to identify high-hazard areas by modeling complex factor interactions. This work provides an essential scientific foundation for implementing zonal hazard management and prioritizing disaster prevention projects in key areas of the basin.

1. Introduction

In areas with intensive mining activities, the combined effects of underground extraction and surface disturbance have led to an increasing trend in the frequency and severity of geological disasters. As an important subtype of geological disasters, ground collapse can be further categorized into two typical forms: goaf collapse and karst collapse. The former occurs when overlying strata, having lost effective support after large-scale extraction of underground ore bodies, undergo caving, bending, and fracturing, which ultimately propagates to the surface, forming collapse pits or continuous subsidence basins [1]. The latter refers to the instability and collapse of overlying soil caused by changes in groundwater dynamics in areas underlain by soluble limestone [2,3]. Current disaster hazard assessment methods are primarily classified into three categories: qualitative, semi-quantitative, and quantitative. Qualitative assessment describes the disaster development level, distribution patterns, and triggering factors through field investigations and historical data analysis [4]. Semi-quantitative assessment quantitatively evaluates hazard by integrating geological environmental conditions and disaster characteristics, employing methods such as the Analytic Hierarchy Process and entropy weight method [5,6,7,8]. Quantitative assessment utilizes GIS technology, coupled with methods like the certainty factor method, weight of evidence model, and logistic regression model, to achieve disaster zoning and prediction [9,10].
In geological disaster assessment, evaluation systems typically involve three core concepts: susceptibility, hazard, and risk. Hazard focuses on the occurrence probability and potential intensity of the disaster itself; it is determined by objective factors such as the disaster-forming environment and disaster-causing factors and serves as the basis for disaster assessment. Risk combines hazard with the vulnerability and exposure of hazard-bearing bodies, focusing on the probability of losses caused by disasters. Most current studies on ground collapse in mining areas first carry out hazard assessment to identify disaster-prone areas and then proceed to risk analysis. This study focuses on the hazard assessment of ground collapse in the mining activity areas of the Turpan–Hami Basin, aiming to identify disaster-prone areas and provide a foundation for subsequent regional disaster prevention and control.
The Turpan–Hami Basin is a significant energy base in China, with a long history of oil, gas, and coal resource exploitation. Prolonged mining activities have created extensive goaf areas, providing conditions for goaf collapse, while localized Carboniferous–Permian limestone distributions form a potential basis for karst collapse [11]. As development activities continue, the scale and frequency of ground collapse in this region show an increasing trend [12]. These collapses not only damage surface morphology and the ecological environment [13] but may also affect regional water resource balance [14], posing a serious threat to resident safety and mining production [15]. In recent years, research on ground collapse in this region has primarily focused on disaster distribution characteristics and hazard assessment methodologies. Studies indicate that collapse disasters are concentrated in mining activity areas such as coal mines and oil–gas fields and are closely related to complex geological structures, fractured lithology, and high-intensity engineering activities [16]. In terms of assessment methodology, studies have combined remote sensing monitoring technologies with data-driven comprehensive evaluation models and have introduced various machine learning models [17]. However, in regions like the Turpan–Hami Basin with active geological structures and fractured rock masses, existing methods face challenges: conventional statistical models inadequately analyze the coupled disaster-triggering mechanisms of mining activities and geological background, resulting in limited prediction accuracy [16]. Therefore, there is a need to develop assessment methods capable of integrating multi-source data and comprehensively quantifying both natural and anthropogenic factors.
Logistic regression (LR) and gradient boosting decision tree (GBDT) models are widely utilized in hazard assessment studies for geological disasters. LR has clear principles but offers limited capability in characterizing complex nonlinear relationships and is relatively sensitive to data distribution [18]. In contrast, GBDT, as an ensemble learning algorithm, can effectively capture complex nonlinear relationships and interaction effects among factors, impose less stringent requirements on data distribution, and can provide feature importance to enhance interpretability [19,20,21]. Comparing and integrating multiple models to improve assessment accuracy has become an important trend [22,23].
To reveal the distribution patterns of ground collapse disasters and achieve accurate hazard assessment in the mining activity areas of the Turpan–Hami Basin, this study proposes an innovative approach. First, a comprehensive “Mining Activity Geohazard Potential Index (G)” is constructed to systematically quantify the driving contribution of mining activities. Second, a parallel strategy employing both the “LR model” and the “GBDT-LR coupled model” is adopted to conduct the hazard assessment and perform comparative validation. Regarding the selection of evaluation factors, this study establishes and employs a multi-dimensional “Geology-Mining-Hydrology-Environment” evaluation index system. This system encompasses foundational geological factors such as elevation, slope, and lithology, while also prioritizing core mining activity factors, including goaf distribution, mining intensity, and depth. It simultaneously incorporates hydrological factors like groundwater level change and environmental factors such as vegetation cover. This comprehensive framework covers key elements of both natural background and anthropogenic disturbances, aiming to overcome the shortcomings of traditional assessments, which often inadequately quantify human-driven factors and suffer from limited dimensionality.

2. Study Area Overview

As shown in Figure 1, the Turpan–Hami Basin is located in the northwest of China and is elongated in the east–west direction. It is bounded by the Bogda–Harlik Mountains to the north and faces the Jueluotag Mountains to the south, with a total area of approximately 5.3 × 10⁴ km² [11,24]. The area features diverse landform types, with an overall low-lying topography surrounded by mountains. The climate is temperate continental and characterized by scarce precipitation, intense evaporation, and significant temperature variations. Water resources are scarce, with surface water primarily supplied by snowmelt from the surrounding mountainous areas. Rivers are mostly seasonal, often forming endorheic lakes or dry saline–alkali lands within the basin [11].
The frequent occurrence of ground collapse disasters in this region is closely related to its abundant mineral resources and high-intensity development activities. The basin has a long history of petroleum, natural gas, and coal resource exploitation, and significant progress has been made in recent years in the exploration of metallic minerals in its southern margin, leading to the discovery of numerous deposits of Cu, Pb, Zn, and others [25]. Consequently, the widely distributed and diverse mineral resources intertwined with various extraction methods in the Turpan–Hami Basin create a complex geo-engineering background that induces ground collapse, making such disasters a recurrent issue.

3. Data Sources and Research Methods

3.1. Data Sources

The ground collapse point dataset used in this study was obtained through remote sensing interpretation. The inventory is limited to goaf collapse induced by mining activities. The interpretation, based on multi-temporal high-resolution remote sensing imagery and InSAR surface deformation monitoring data, provided a preliminary screening of potential geological hazards in the study area by integrating disaster-predisposing background information. Remote sensing interpretation inherently involves positional errors and qualitative uncertainties. The dataset therefore primarily serves large-scale preliminary hazard assessment and pattern analysis, and the precise locations and attributes of the points require verification through more detailed field investigations. The subsequent analysis in this paper is conducted with these limitations in mind. The sources of information for the contributing factors are listed in Appendix A Table A1; slope and aspect were calculated from the study area’s DEM data in ArcGIS (version 10.8).
A small fraction (<10%) of collapse points were not verified in the field due to harsh terrain and limited accessibility in some mining areas, leading to inherent positional errors within ±30 m (consistent with the spatial resolution of the remote sensing imagery used). This uncertainty has a limited impact on the model results and subsequent analysis for the following reasons:
  • Over 90% of collapse points were accurately confirmed via field investigations, ensuring the reliability of the core dataset for hazard assessment; the positional deviation of the remaining unverified points is constrained to ±30 m, which is consistent with the spatial resolution of the remote sensing imagery.
  • This study focuses on large-scale spatial distribution patterns and multi-factor coupling mechanisms of ground collapse in the Turpan–Hami Basin (study area ≈ 5.3 × 10⁴ km²), where collapse clusters typically span several square kilometers. Compared with this analytical spatial scale, the ±30 m positional deviation is negligible and will not interfere with the identification of overall distribution laws.
  • The GBDT-LR model relies on the aggregate spatial correlation between multiple geographic factors and collapse occurrence, rather than the precise coordinates of individual points. Thus, minor positional errors in a small subset of samples do not alter the identified disaster-causing mechanisms or degrade the model’s predictive performance.
A total of 350 ground collapse points were obtained in this study. To balance the data for model training, 350 non-collapse points were randomly selected within the study area as control samples. Both the training set and the validation set maintain a 1:1 ratio of collapse to non-collapse samples, which avoids class imbalance throughout the modeling process.
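The balanced sampling described above can be sketched as a per-class (stratified) split. This is only an illustration under stated assumptions: the point identifiers are placeholders, the function name `stratified_split` is hypothetical, and the 0.7 training fraction is taken from the 7:3 ratio reported in Section 3.2.1.

```python
import random

def stratified_split(positives, negatives, train_fraction=0.7, seed=42):
    """Split each class separately so both subsets keep the 1:1 class ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for label, samples in ((1, positives), (0, negatives)):
        shuffled = list(samples)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train += [(s, label) for s in shuffled[:cut]]
        test += [(s, label) for s in shuffled[cut:]]
    return train, test

# Placeholder IDs standing in for the 350 collapse and 350 non-collapse points.
collapse_ids = [f"c{i}" for i in range(350)]
control_ids = [f"n{i}" for i in range(350)]
train_set, test_set = stratified_split(collapse_ids, control_ids)
```

Splitting each class separately is what keeps the 1:1 ratio in both subsets; a single shuffle over all 700 points would only preserve it approximately.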

3.2. Research Methods

3.2.1. Preliminary Screening and Selection of Models

To ensure the reliability and suitability of the assessment models, this study first screened multiple models using 5-fold cross-validation on the training dataset, which was obtained from a 7:3 train–test split. Screening covered mainstream machine learning algorithms, including random forest (RF), extreme gradient boosting (XGBoost), and the gradient boosting decision tree–logistic regression coupled model (GBDT-LR). The preliminary screening results are shown in Table 1. The AUC values in Table 1 are derived from cross-validation on the training dataset, whereas the final model performance in the subsequent hazard assessment is evaluated on an independent validation dataset. The difference between the two sets of AUC values arises from the different sizes and sample compositions of the two datasets and reflects normal generalization error. Considering model performance, mechanism suitability, and application requirements, this study ultimately selected the GBDT-LR coupled model as the core assessment tool. The AUC differences among RF, XGBoost, and GBDT-LR are all less than 0.006, so the accuracy difference has no substantial impact on regional hazard assessment. This study needs to determine the direction of action of the 14 disaster-causing factors, calculate quantitative contribution weights, and verify the reliability of the results through a dual-method analysis. GBDT-LR is therefore better suited to the core analysis needs of this study; RF and XGBoost cannot readily support such directional quantitative analysis and dual-method verification.
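For readers unfamiliar with the screening metric, the sketch below shows one way to compute AUC and to generate 5-fold index splits using only the standard library. The function names and the six toy scores are hypothetical; the study's actual tooling is not stated in the text.

```python
def auc_score(labels, scores):
    """AUC as the chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, valid_idx) for k contiguous folds; shuffle the data beforehand."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, valid
        start += size

# Toy check with made-up scores: 8 of the 9 positive/negative pairs are ranked correctly.
toy_auc = auc_score([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.7, 0.3, 0.2])
folds = list(k_fold_indices(490, k=5))
```

In a screening run, each candidate model would be fitted on the `train` indices of every fold and scored with `auc_score` on the matching `valid` indices, and the five scores averaged.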

3.2.2. GBDT-LR

GBDT can automatically capture nonlinear relationships and higher-order interaction effects among contributing factors. Its objective function expression is
$$ L(\theta) = -\sum_{i=1}^{n} \left[ y_i \ln p_i + (1 - y_i) \ln (1 - p_i) \right] + \Omega(\theta) \tag{1} $$
where $L(\theta)$ represents the objective function of the GBDT model, $\theta$ denotes the model’s parameters (including tree structure, leaf node weights, etc.), $n$ is the total number of samples, $y_i$ is the true disaster state of the i-th sample (1 indicates ground collapse occurrence, and 0 indicates non-occurrence), $p_i$ is the predicted collapse probability for the i-th sample, and $\Omega(\theta)$ is the regularization term used to control model complexity and prevent overfitting.
LR is a linear model used for handling binary classification problems. Its core mechanism involves mapping the output of a linear combination to the interval (0, 1) via a sigmoid function, representing the probability that a sample belongs to a certain category [26]. The specific mathematical expression is
$$ \hat{y} = \frac{1}{1 + e^{-(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n)}} \tag{2} $$
where $\hat{y}$ represents the predicted probability, $x_1, x_2, \ldots, x_n$ are independent variables, and $\theta_0, \theta_1, \theta_2, \ldots, \theta_n$ are the model’s parameters.
Parameter estimation for the logistic regression model typically employs the maximum likelihood estimation method. First, the likelihood term for each sample is calculated based on the model:
$$ p_1(\hat{x}; \hat{\theta})^{y_i} \, p_0(\hat{x}; \hat{\theta})^{(1 - y_i)} \tag{3} $$
where $p_1$ and $p_0$ represent the probability of a sample belonging to the positive and negative classes, respectively, with $p_0 = 1 - p_1$. The likelihood terms of all samples are then multiplied to construct the likelihood function, which is logarithmically transformed and negated to obtain the negative log-likelihood function:
$$ L(\hat{\theta}) = \sum_{i=1}^{N} \left[ -y_i \ln p_1(\hat{x}; \hat{\theta}) - (1 - y_i) \ln \left( 1 - p_1(\hat{x}; \hat{\theta}) \right) \right] \tag{4} $$
Minimizing Equation (4), which is equivalent to maximizing the likelihood, yields the parameters [27].
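As an illustration of how minimizing Equation (4) yields the parameters, the following sketch fits a one-variable logistic regression by plain gradient descent on the negative log-likelihood. The optimizer, learning rate, and six toy samples are assumptions made for this example; the source does not state which solver was used.

```python
import math

def sigmoid(z):
    """Numerically safe logistic function (Equation (2))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict(theta, x):
    """Predicted probability for one sample; theta[0] is the intercept."""
    return sigmoid(theta[0] + sum(t * v for t, v in zip(theta[1:], x)))

def negative_log_likelihood(theta, X, y):
    """Equation (4): sum of -y ln p - (1 - y) ln(1 - p) over all samples."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(predict(theta, xi), eps), 1.0 - eps)
        total += -yi * math.log(p) - (1 - yi) * math.log(1.0 - p)
    return total

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Gradient descent; the gradient of Equation (4) is the sum of (p - y) * x."""
    theta = [0.0] * (len(X[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(theta)
        for xi, yi in zip(X, y):
            err = predict(theta, xi) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        theta = [t - lr * g / len(X) for t, g in zip(theta, grad)]
    return theta

# Made-up, deliberately non-separable toy data: one normalized factor, binary outcome.
X_toy = [[0.1], [0.3], [0.4], [0.6], [0.8], [0.9]]
y_toy = [0, 0, 1, 0, 1, 1]
theta_hat = fit_logistic(X_toy, y_toy)
nll_start = negative_log_likelihood([0.0, 0.0], X_toy, y_toy)
nll_end = negative_log_likelihood(theta_hat, X_toy, y_toy)
```

The fitted slope `theta_hat[1]` is positive here because larger factor values coincide with more collapse labels in the toy data, and `nll_end` is lower than `nll_start`.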
In the implementation, to remove the effect of differing units and meet the LR requirement for discrete explanatory variables, the seven continuous factors (elevation, slope, aspect, rainfall, NDVI, groundwater level change, and distance to goaf) were first divided into 10 equal intervals. Ten intervals were chosen because the continuous factors in the Turpan–Hami Basin show strong spatial heterogeneity and wide ranges of variation; this number balances the retention of information in the raw data against the practical interpretability of subsequent hazard zoning. It avoids excessive information loss from overly coarse discretization (e.g., 5 intervals) as well as overfitting and noise from overly fine division (e.g., 15 intervals). Consistent with previous studies in which the discretization level of continuous factors was matched to the disaster sample size and model type [28,29,30], this 10-interval discretization is more suitable for the logistic regression component than direct use of continuous data. Notably, all SHAP analyses in this study were conducted on the original continuous data, and the threshold characteristics revealed by the SHAP bar plots and scatter plots are generally consistent and mutually supportive, which rules out artifacts caused by data discretization. The remaining ordinal contributing factors kept their original classification standards. After grading and categorization, all contributing factors were numerically encoded and normalized. Finally, a logistic regression model that accounts for class balance was employed to output the probability of ground collapse occurrence.
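The equal-interval discretization described above can be sketched as follows. The slope values are invented for illustration, and the function name `equal_interval_bins` is hypothetical; the study's own binning was presumably carried out in its GIS or modeling workflow.

```python
def equal_interval_bins(values, n_bins=10):
    """Assign each value to one of n_bins equal-width classes, numbered 1..n_bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1] * len(values)  # a constant factor carries no information
    width = (hi - lo) / n_bins
    bins = []
    for v in values:
        idx = int((v - lo) / width) + 1
        bins.append(min(idx, n_bins))  # the maximum value belongs to the last class
    return bins

# Hypothetical slope values in degrees, spanning 0 to 50.
slopes = [0, 4.9, 5, 12.5, 49.9, 50]
slope_classes = equal_interval_bins(slopes)  # [1, 1, 2, 3, 10, 10]
```

With a 0 to 50 range, each class is 5 degrees wide, so 4.9 falls in class 1 and 5 opens class 2.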
This study determined the optimal hyperparameters for the model through 5-fold grid search cross-validation. This coupling strategy of “nonlinear models capturing complex relationships + linear models optimizing output interpretability” has been proven effective in disaster research. In similar research, combining time series decomposition with a Particle Swarm Optimization-based BP model significantly improved the accuracy of landslide displacement prediction. Its core logic is consistent with the GBDT-LR coupling approach of this study, as both overcome the limitations of single models through multi-model complementarity [31]. To achieve effective integration of GBDT and LR, the basic contributing factors were first subjected to classification and quantization processing. Then, new disaster-associated features were constructed by combining the collapse probability output from GBDT. Based on the integrated features, the LR model was retrained, and its mathematical expression is
$$ z_i^{coup} = \beta_0^{coup} + \sum_{k} \beta_k^{coup} F_{ik} \tag{5} $$
where $z_i^{coup}$ represents the linear predictor of the coupled model for the i-th sample, $\beta_0^{coup}$ is the intercept term of the coupled model, $\beta_k^{coup}$ is the regression coefficient corresponding to the k-th feature, and $F_{ik}$ is the k-th feature of the i-th sample. In summary, this method consists of two main steps: First, GBDT is used to capture the complex nonlinear relationships among factors, and then, the extracted high-order features are fed into the LR model for training, achieving a fusion of “nonlinear feature mining” and “linear result output.” This strategy retains GBDT’s capability to capture complex disaster-causing interactions while utilizing LR to provide clear quantitative results for the contribution parameters and association strength of contributing factors. It thus balances prediction accuracy and causal interpretability. This method has been proven effective in enhancing prediction accuracy in multi-factor coupling scenarios [32].
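The two coupling steps above can be sketched in pure Python. This is a toy stand-in, not the study's implementation: the boosted decision stumps replace a full GBDT library, the feature set $F$ is assumed to be the basic factors plus the GBDT collapse probability (one reading of the description above), and the 20 samples are synthetic, with collapse occurring only in a middle band of the first factor so that a purely linear model cannot capture it.

```python
import math

def sigmoid(z):
    """Logistic function, clamped to avoid overflow in exp."""
    return 1.0 / (1.0 + math.exp(-max(min(z, 30.0), -30.0)))

def fit_stump(X, residuals):
    """Find the one-split regression stump with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X})[:-1]:  # keeps both sides non-empty
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    return best[1:]

def fit_boosting(X, y, rounds=20, shrinkage=0.5):
    """Step 1 (toy GBDT): each stump fits the log-loss residuals y - p."""
    stumps, scores = [], [0.0] * len(X)
    for _ in range(rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, t, lv, rv = fit_stump(X, residuals)
        stumps.append((j, t, shrinkage * lv, shrinkage * rv))
        scores = [s + shrinkage * (lv if row[j] <= t else rv) for row, s in zip(X, scores)]
    return stumps

def boosting_probability(stumps, row):
    """Collapse probability predicted by the boosted stumps for one sample."""
    return sigmoid(sum(lv if row[j] <= t else rv for j, t, lv, rv in stumps))

def fit_logistic(F, y, lr=0.5, steps=3000):
    """Step 2: gradient-descent LR on the integrated features (Equation (5))."""
    beta = [0.0] * (len(F[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for row, yi in zip(F, y):
            err = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], row))) - yi
            grad[0] += err
            for k, v in enumerate(row):
                grad[k + 1] += err * v
        beta = [b - lr * g / len(F) for b, g in zip(beta, grad)]
    return beta

# Synthetic samples: two normalized factors; collapse only for mid-range x1.
X = [[0.05 + 0.1 * i, x2] for i in range(10) for x2 in (0.2, 0.8)]
y = [1 if 0.3 < row[0] < 0.7 else 0 for row in X]

stumps = fit_boosting(X, y)
F = [row + [boosting_probability(stumps, row)] for row in X]  # factors + GBDT probability
beta = fit_logistic(F, y)
coupled = [sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], row))) for row in F]
```

The LR coefficients in `beta` stay directly readable, while the third feature carries the nonlinear band effect learned by the boosted stumps. Other GBDT-LR variants feed one-hot leaf indices instead of the probability into the LR stage; the text does not indicate that this variant was used.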

3.2.3. SHAP Analysis Method

To accurately quantify the contribution intensity of each contributing factor to ground collapse hazard and reveal potential inter-factor relationships, this study introduced the SHAP (SHapley Additive exPlanations) analysis method (as shown in Figure 2). This method decomposes the model prediction results, attributing the collapse probability of each sample to the independent contributions and interaction effects of individual contributing factors. It has been widely applied in model interpretation for geological hazard assessment [33,34]. Unlike traditional dependence plots, this study employed a bar chart to visually present the mean SHAP value for each contributing factor. A larger absolute value of the SHAP value on the vertical axis indicates a more significant influence of the factor on the collapse hazard. A positive value indicates that the factor has a positive driving effect on collapse, while a negative value signifies a suppressing effect (as shown in Appendix B Figure A1). Through this analysis, not only were the contribution weights of factors such as elevation, slope, and aspect clarified, but the coupling amplification effect of the “lithology × mining intensity and depth” interaction term was also quantified. Compared to traditional dependence plots, this visualization method retains the capability of the SHAP method to characterize nonlinear relationships while simplifying the comparative interpretation of factor importance. It provides quantitative support for hazard zoning verification.
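To make the attribution idea concrete, the sketch below computes exact Shapley values for a three-factor toy model by averaging marginal contributions over all factor orderings. The model `toy_hazard`, its coefficients, and the all-zero baseline are invented for illustration; SHAP implementations use faster model-specific algorithms rather than this brute-force enumeration.

```python
from itertools import permutations

def shapley_values(model, x, baseline):
    """Exact Shapley attribution of model(x) - model(baseline) over all feature orderings."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = model(current)
        for j in order:
            current[j] = x[j]          # switch factor j from baseline to its actual value
            value = model(current)
            phi[j] += value - previous  # marginal contribution of factor j in this ordering
            previous = value
    return [p / len(orderings) for p in phi]

def toy_hazard(factors):
    """Hypothetical score with a lithology x mining interaction term."""
    lithology, mining, rainfall = factors
    return 0.1 * rainfall + 0.2 * mining + 0.4 * lithology * mining

phi = shapley_values(toy_hazard, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
# phi is approximately [0.2, 0.4, 0.1] for lithology, mining, rainfall
```

The 0.4 interaction effect is shared equally between lithology and mining, which is how an interaction can raise the attributed weight of both factors, and the three values sum to the difference between the prediction and the baseline.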

3.2.4. Geohazard Potential Index (G) for Mining Intensity and Depth

Existing studies mostly consider mining intensity or mining depth separately, and discrete mining data are difficult to incorporate into basin-wide collapse assessment; a spatially continuous mining hazard index that couples these two core factors has been lacking. To quantitatively characterize the “potential triggering intensity” of mining activities in the Turpan–Hami Basin on ground collapse disasters from both mining intensity and mining depth perspectives and to overcome the difficulties arising from missing data or inconsistent scales in some mining areas during the assessment process, this study proposes the “Geohazard Potential Index” (G). The index extends the theory of coupling between disaster-causing intensity and the disaster-forming environment and is constructed to address the distortion of single-factor evaluation in composite mining areas [35,36,37,38]. It rests on the core hypothesis that “higher mining intensity and shallower goaf areas lead to a higher hazard of ground collapse.” It couples annual production (static/dynamic) with depth (buffer area) to achieve comparable quantification across different mining methods and mining areas with varying depths. Specifically, mining intensity acts as the core driving force for ground collapse (numerator), as high-intensity mining significantly disturbs the initial stress equilibrium of rock masses and increases the risk of overburden failure and ground collapse [35]; mining depth serves as the inhibitory buffering factor (denominator), since the self-weight stress of the overlying strata buffers and disperses the propagation of mining-induced damage as depth increases, thus reducing the degree of disaster manifestation [36].
The coupling form of G = I/D directly reflects the geological law that “the greater the mining intensity and the shallower the depth, the higher the disaster potential”, which makes up for the defect that traditional studies only consider a single factor of mining intensity or depth [37,38].
This study uses the planned mining intensity from the “Overall Plan for Mineral Resources in Xinjiang Uygur Autonomous Region (2021–2025)” as the mining intensity (I, in units of 10⁴ t·a⁻¹). To convert the discrete planning data into a continuous spatial layer, we digitized the spatial boundaries of each mining area using ArcGIS and then generated continuous buffer zones around these boundaries. Within the buffer zones, we spatially integrated the mining intensity (I) with the corresponding mining depth (D) to calculate the G-index values for all grid cells in the study area, forming the continuous spatial layer of the G-index. For mining areas with both open-pit and underground extraction, the two methods were treated as two distinct “sub-mining sources,” denoted separately as $I_{open}$ and $I_{under}$, and were not simply summed. This was to ensure the physical reasonableness of subsequent hazard superposition. The mining depth was determined based on stratigraphic and lithological predictions for the Turpan–Hami Basin: the median value of the depth interval was taken for underground mining (e.g., 150 m for an interval of 100–200 m). Open-pit mining depth was set to 10 m to highlight its high hazard, while mixed mining was assigned separate values ($D_{open}$ = 10 m, $D_{under}$ = the median of the underground interval).
$$ G_{open} = \frac{I_{open}}{D_{open}} \tag{6} $$
$$ G_{under} = \frac{I_{under}}{D_{under}} \tag{7} $$
$$ G_{total} = G_{open} + G_{under} \tag{8} $$
The unit of G is 10⁴ t·a⁻¹·m⁻¹, which can be understood as the annual mining load, in units of ten thousand tons, borne per meter of overlying strata. The index converts discrete mining point data into a continuous spatial index covering the whole study area, which solves the problem that discrete mining data are difficult to integrate into the 14-factor coupled assessment system. Because G is inversely proportional to depth, it rises sharply and nonlinearly as depth decreases. Conversely, for deep mining, where the overlying rock mass is thicker and the surface is less sensitive to collapse, G naturally attenuates. This property is consistent with field understanding: “shallow goafs are highly prone to triggering ground collapse, while deep mining disturbances mainly result in internal rock mass movement.” Thus, G can serve as one of the core driving factors in regional ground collapse hazard assessment.
For open-pit mines, the mining depth is uniformly set to Dopen = 10 m. This setting is a rational simplification for two reasons: On the one hand, it is a prudent treatment for mining hazard assessment to avoid the omission of high-hazard areas in open-pit mining areas; on the other hand, the core research object of this study is ground collapse induced by underground goaf roof instability, and open-pit mine collapse is a secondary research object in this study. It should be noted that this unified setting has certain subjectivity and simplification, as it does not reflect the differences in the actual mining depth of individual open-pit mines and may overestimate the hazard level of some open-pit mines, which is a methodological limitation of this study. This limitation has no essential impact on the core conclusions of this study, since all analyses are centered on underground goaf collapse with parameters determined by actual survey data.
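A minimal sketch of the G calculation for a single mining area follows. The function name, argument names, and the intensity values are hypothetical; only the formula, the 10 m open-pit depth, and the use of the interval midpoint for underground depth come from the description above.

```python
def geohazard_potential(intensity_open=0.0, intensity_under=0.0,
                        depth_under_range=None, depth_open=10.0):
    """Return G_total = I_open / D_open + I_under / D_under, in 10^4 t per year per meter.

    Intensities are in 10^4 t per year; depths are in meters.
    """
    g_open = intensity_open / depth_open if intensity_open else 0.0
    g_under = 0.0
    if intensity_under:
        if depth_under_range is None:
            raise ValueError("underground mining needs a depth interval")
        d_under = sum(depth_under_range) / 2.0  # midpoint of the depth interval
        g_under = intensity_under / d_under
    return g_open + g_under

# Hypothetical mixed mine: 50 open-pit, 300 underground at 100 to 200 m depth.
g_mixed = geohazard_potential(50.0, 300.0, (100.0, 200.0))              # 5.0 + 2.0 = 7.0
# The same underground output at 400 to 600 m depth gives a much smaller index.
g_deep = geohazard_potential(intensity_under=300.0, depth_under_range=(400.0, 600.0))  # 0.6
```

The two sub-sources are divided by their own depths before being added, which is why a modest open-pit output can dominate the index of a mixed mine.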

3.2.5. Coefficient of Variation (CV) for Groundwater Level Change

In the quantification process of the contributing factor “groundwater level change,” this study cites the 2005–2022 groundwater level data from Wang et al. (2025) [39]. First, the groundwater level data for the Turpan–Hami Basin from 2005 to 2022 were extracted from this dataset. The coefficient of variation (CV) was then employed to characterize the intensity of water-level fluctuations from January 2005 to December 2022 (a total of 216 months), as shown in Figure 3 (the main administrative divisions of the Turpan–Hami Basin are Hami City and Turpan City). The coefficient of variation can eliminate the influence of dimensionality, focusing on reflecting the degree of dispersion of groundwater levels over the time series, and can integrate the comprehensive characteristics of its long-term monotonic decline and seasonal oscillation to a certain extent, achieving an objective quantitative characterization of this factor. As a secondary disaster-causing factor with a low weight in this study, this quantification method is sufficient to meet the basic research needs of this mining-induced ground collapse hazard assessment. Its expression is
$$ CV_i = \frac{\sigma_i}{\mu_i} \times 100\% \tag{9} $$
In the equation, $\sigma_i$ represents the standard deviation of the 216 months of groundwater level data at the i-th disaster point, and $\mu_i$ is the corresponding mean value. A larger CV value indicates more intense fluctuation in groundwater levels at that point, which may lead to more significant effective stress mutation, subsurface erosion, or buoyancy effects, thereby contributing more substantially to ground collapse disasters.
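The CV calculation can be sketched with the standard library. The four water-level readings are invented, and the use of the population standard deviation is an assumption, since the text does not say whether the population or sample form was applied.

```python
import statistics

def coefficient_of_variation(levels):
    """CV in percent: standard deviation divided by the mean, times 100."""
    mean = statistics.fmean(levels)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    # Assumption: population standard deviation (pstdev), not the sample form.
    return statistics.pstdev(levels) / abs(mean) * 100.0

# Hypothetical groundwater levels in meters; a real series would have 216 monthly values.
cv_toy = coefficient_of_variation([10.0, 12.0, 8.0, 10.0])  # about 14.14 %
```

Because the standard deviation is divided by the mean, the result is unitless and can be compared across points whose absolute water levels differ.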

4. Results

4.1. Spatial Distribution Characteristics of Disasters

The distribution of ground collapse disasters in the Turpan–Hami Basin is shown in Figure 4. From the figure, it can be observed that ground collapse disasters within the study area exhibit a distinct multi-core cluster aggregation pattern. This distribution state is closely related to mining activities in the region, representing a typical spatial response under mining disturbance [40].
In Region 1 of Figure 4, located in the northwestern part of Turpan City and the surrounding areas of Shanshan County, the collapse points exhibit characteristics of contiguous and dense distribution. This area contains the core operational areas of several coal mines, such as Qiquanhu Lake and Aydingkol Lake. Long-term underground mining has formed extensive goaf areas. Under continuous mining disturbance, the distribution of collapse points is extremely concentrated, with inter-point distances often less than several hundred meters. These collapse points are not only densely distributed within the main mining areas but also spread contiguously to the surrounding regions, showing a significant spatial aggregation effect. Extending southeast to the eastern side of Toksun County, the collapse points shift to a locally concentrated distribution pattern. This area corresponds to small and medium-sized mining areas such as Ke’erjian. Limited by the mining scale, although the collapse points still maintain aggregation characteristics, their distribution range is noticeably contracted.
In Region 2 of Figure 4, located in the southwestern part of Hami City and adjacent areas, the collapse points are differentiated into multiple dense clusters of varying sizes. Their distribution covers composite oil–gas and coal mining areas such as Dananhu Lake and Sha’erhu Lake. Influenced by the superimposed disturbances from both oil–gas extraction and coal mining, the spatial distribution of mining disturbances is relatively dispersed, yet the collapse points within each cluster are compactly distributed.
By comparison, in the eastern plains, southern lowland areas, and northern high-altitude mountainous regions of the study area, where large-scale mining activities are currently absent and rock formations have not undergone substantial engineering disturbance, the distribution of collapse points is extremely sparse. Most areas show only sporadic records, and there are even large blank areas with no collapse points.

4.2. Analysis of Factors Influencing the Spatial Distribution of Disasters

Ground collapse is the result of the combined effects of regional basic geological conditions and external triggering factors. These basic geological conditions primarily refer to static backgrounds such as topography, stratigraphic lithology, and geological structure, while external triggering factors include dynamic processes such as mining activities, rainfall, earthquakes, and groundwater level changes—all of which collectively shape the overall spatial distribution of ground collapse [4,41]. Combining the literature and the geological environmental characteristics of the Turpan–Hami Basin, this study selected 13 contributing factors, as shown in Figure 5: elevation, slope, aspect, lithology, distance to water systems/faults/roads, PGA, rainfall, goaf distribution, mining intensity and depth, groundwater level change, and NDVI. Additionally, the interaction term “lithology × mining intensity and depth” was introduced. Thereby, a multi-dimensional “Geology-Mining-Hydrology-Environment” evaluation index system was constructed [42].
The disaster formation process of ground collapse in the Turpan–Hami Basin results from the synergistic interaction between geological background and anthropogenic activities. Analyzing the weights and interaction mechanisms of the contributing factors can provide targeted evidence for regional disaster prevention and mitigation. In terms of factor weight, the factors with relatively large proportions include elevation, distance to goaf, and rainfall, among others (Figure 6). It is noteworthy that the contribution weight of the “lithology × mining intensity and depth” interaction term surpasses that of some single factors, revealing a coupling amplification effect between specific geological conditions and human activities. This is similar to the findings of Wang et al. [43], who took the Shiyangpo DCL as an example and revealed complex interactions between heavy rainfall and the engineering geological properties of loess slopes. These interactions trigger a chain of disaster events whose destructive power far exceeds that of individual landslides or debris flows.
Elevation is one of the primary inducing factors for ground collapse, with the highest weight proportion. Areas with lower elevation typically have relatively higher groundwater levels, showing a general correlation with the higher occurrence probability of ground collapse [44,45]. The Turpan–Hami Basin contains numerous coal mine goaf areas, and its topography—surrounded by mountains with a low-lying central basin—creates a strong correlation between elevation, goaf distribution, and groundwater conditions. This gives elevation a significant regulatory role.
The importance of distance to goaf is second only to that of elevation, and it acts as a direct influencing factor for goaf collapse. The formation of goafs through underground mining increases the hazard of ground collapse, and the degree of influence depends on the distance from the goaf [46,47]. Long-term mining activities in the Turpan–Hami Basin have formed extensively distributed goaf areas. The rock masses surrounding these goafs are in a state of stress imbalance. The closer a site is to a goaf, the stronger the transmission of stress disturbance and strata deformation, and the higher the risk of collapse [48]. The subsidence effect of the overlying rock strata above goafs is most significant within 5.4 km and gradually attenuates with increasing distance [49]. This closely matches the “centrally clustered around goafs” distribution of collapse points in the study area, making distance to goaf the second major controlling factor.
Although rainfall is scarce in the Turpan–Hami Basin, its triggering effect during extreme events places it third in the weight ranking of disaster-causing factors. The Turpan–Hami Basin is an arid inland area with an average annual rainfall of only 1.1~37.7 mm (relatively concentrated in the north and scarce in the south), and extreme rainfall is an important triggering factor for ground collapse [16]. In arid regions, rock and soil masses remain in a dry state for extended periods, making their shear strength highly sensitive to moisture. A small amount of rainfall can disrupt the critical equilibrium. This triggering effect becomes particularly significant when superimposed with factors such as goaf areas and groundwater level changes [50,51].
The interaction term is a composite contributing factor constructed through quantitative coupling calculations [43]. Its core lies in integrating lithology, which characterizes the foundational geological conditions (reflecting the inherent properties of the rock mass to resist disturbance and failure), with mining intensity and depth, which characterize the intensity of anthropogenic disturbance (quantifying the load scale and influence depth of mining activities). This combination forms a comprehensive indicator capable of reflecting the synergistic effects between the two. In mining activity areas, a higher mining intensity leads to a greater likelihood of ground collapse disasters, as high-intensity mining more severely undermines the stability of the underground rock mass. When the mining depth is greater, the spatial extent of surface collapse may be relatively limited, but its depth, severity, and time lag are often more significant [46]. As a crucial energy core area in northwestern China, the Turpan–Hami Basin has experienced long-term large-scale composite mining, and the induced mining stress disturbance is quantified by the Geohazard Potential Index (G) constructed in this study. To verify the rationality of the G index, this study conducted a spatial coupling comparison between the mining intensity and depth hazard potential index G distribution map (Figure 5k) and the kernel density distribution map of ground collapse disaster points (Figure 4). The results show a high degree of consistency between the two, demonstrating that the G index can effectively reflect the intrinsic relationship between mining activities and ground collapse. The mechanical properties of different rock formations in the study area vary significantly (Table 2), resulting in different sensitivities of rock mass to mining disturbance. A single factor cannot reflect this “lithology + mining intensity and depth” matching relationship. Lithology alone reveals only the inherent stability potential of the rock mass and not the extent of impact from mining activities. Mining intensity and depth alone indicate only the intensity of anthropogenic disturbance and not the varying tolerance of different rock formations to such disturbance. The interaction term captures the core mechanism of “geological background control–anthropogenic activity triggering.” It fills the gap left by single factors in adequately representing nonlinear coupling effects, thereby establishing itself as the seventh major controlling factor.
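As a minimal sketch of how such an interaction term might be computed, the snippet below multiplies a graded lithology code by a graded mining intensity and depth index, cell by cell. The 1 to 5 grade scale, the sample grids, and the function name `interaction_term` are illustrative assumptions; the exact coupling formula is not given in the text.

```python
import numpy as np

def interaction_term(lithology_grade, mining_grade):
    """Cell-wise product of two graded factor rasters (hypothetical coupling rule)."""
    lith = np.asarray(lithology_grade, dtype=float)
    mine = np.asarray(mining_grade, dtype=float)
    if lith.shape != mine.shape:
        raise ValueError("factor rasters must share the same grid shape")
    return lith * mine

# Assumed grades: 1 = hard rock or weak disturbance, 5 = soft rock or strong disturbance.
lithology = np.array([[1, 3], [5, 5]])
mining = np.array([[1, 2], [2, 5]])
coupled = interaction_term(lithology, mining)
print(coupled)
```

Under this assumed rule, a cell scores high only when soft rock and strong mining disturbance coincide, which is the matching relationship a single factor cannot express.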

4.3. Hazard Assessment Results

Based on the triggering mechanisms and geological environmental characteristics of ground collapse in the Turpan–Hami Basin, the 14 evaluation indicators (including the interaction term) are categorized into four dimensions: Geological Foundation Factors (elevation, slope, aspect, distance to faults, lithology, and PGA), Mining Activity Factors (distance to goaf, distance to roads, mining intensity and depth, and the lithology × mining intensity and depth interaction term), Hydrogeological Factors (distance to water systems, groundwater level change, and rainfall), and Ecological Environment Factors (NDVI).
Continuous indicators (e.g., elevation, rainfall) were graded using equal interval or natural breaks methods, while discrete indicators (e.g., lithology, slope aspect) were processed via classification coding. All indicators underwent encoding and standardization to eliminate the effects of dimensionality differences and imbalanced data distributions [27,52]. This process fully considered the unique geological environmental characteristics of the Turpan–Hami Basin, such as the superposition effect of widely distributed weak rock formations and high-intensity mining activities. By quantifying interaction terms, it achieved a precise characterization of the composite disaster mechanism of “geological background control–human activity triggering.” This aligns with the principles for constructing a geological hazard assessment index system proposed by Van et al. [53], which emphasize simultaneously accounting for the synergistic effects of natural factors and human activities.
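The grading and standardization steps can be illustrated with a short sketch. It assumes five equal-width classes and z-score standardization; the class count, the sample elevations, and the helper names `equal_interval_grade` and `zscore` are hypothetical, and the natural breaks alternative is not shown.

```python
import numpy as np

def equal_interval_grade(values, n_classes=5):
    """Assign grades 1..n_classes using equal-width bins over the observed range."""
    v = np.asarray(values, dtype=float)
    edges = np.linspace(v.min(), v.max(), n_classes + 1)
    # Interior edges only, so the maximum value falls in the top class.
    return np.digitize(v, edges[1:-1], right=False) + 1

def zscore(values):
    """Standardize to zero mean and unit variance (requires non-constant input)."""
    v = np.asarray(values, dtype=float)
    return (v - v.mean()) / v.std()

elevation = np.array([0.0, 250.0, 500.0, 750.0, 1000.0])  # made-up values in metres
grades = equal_interval_grade(elevation, n_classes=5)
scaled = zscore(elevation)
print(grades, scaled)
```

Grading puts factors on a common ordinal scale, while standardization removes unit differences so that no factor dominates the model simply because of its numeric range.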
Model selection and construction must align with the corresponding disaster characteristics. Considering that ground collapse disasters in the Turpan–Hami Basin are subject to both linear influences from single factors (e.g., the negative correlation between distance to goaf and collapse probability) and nonlinear interaction effects among multiple factors, this study employed GBDT-LR analysis. Based on the assessment results from the GBDT-LR model, spatial statistics were performed using ArcGIS to generate a ground collapse hazard zoning map for the study area (Figure 7). The area proportions for each hazard level are as follows: extremely low-hazard area: 63.44%; low-hazard area: 13.83%; medium-hazard area: 8.88%; high-hazard area: 4.63%; and extremely high-hazard area: 9.21%.
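For readers unfamiliar with the coupling, the following sketch shows one common way to build a GBDT-LR model with scikit-learn: the GBDT assigns each sample to one leaf per tree, the leaf indices are one-hot encoded, and LR is fitted on the encoded features. The synthetic data, hyperparameters, and half-and-half split are assumptions for illustration and do not reproduce the configuration used in this study.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
# Synthetic stand-in for the factor table: 14 columns and a binary collapse label.
X = rng.normal(size=(600, 14))
y = ((X[:, 0] * X[:, 1] + X[:, 2]) > 0).astype(int)  # label with a built-in interaction

# Separate subsets for the two stages, to limit leakage from stage 1 into stage 2.
X_gbdt, X_lr, y_gbdt, y_lr = train_test_split(X, y, test_size=0.5, random_state=0)

# Stage 1: the GBDT learns nonlinear splits; each sample lands in one leaf per tree.
gbdt = GradientBoostingClassifier(n_estimators=30, max_depth=3, random_state=0)
gbdt.fit(X_gbdt, y_gbdt)

def leaf_indices(model, data):
    # apply() returns shape (n_samples, n_estimators, 1) for binary problems.
    return model.apply(data)[:, :, 0]

# Stage 2: one-hot encode the leaf indices and fit LR on these high-order features.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(leaf_indices(gbdt, X_gbdt))
lr = LogisticRegression(max_iter=1000)
lr.fit(encoder.transform(leaf_indices(gbdt, X_lr)), y_lr)

# Predicted collapse probability for the first five samples.
proba = lr.predict_proba(encoder.transform(leaf_indices(gbdt, X[:5])))[:, 1]
print(proba)
```

Each one-hot column corresponds to a root-to-leaf path, that is, a conjunction of split conditions on several factors, which is why the LR stage can weight factor combinations rather than single factors only.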
The extremely high-hazard areas exhibit a distribution pattern of “two major core patches + scattered points,” covering a total area of approximately 4882.42 km². These are identified in this study as areas with a high frequency of collapse occurrences. The first core area is located in the core coal mining area northwest of Turpan City. As shown in Figure 5j, the lithology in this region is predominantly soft rock formations. The synergistic effect of multiple factors, including the rock fragmentation effect induced by branching faults within the fault areas and groundwater level changes, leads to a significant increase in collapse hazard. Research indicates that in shallow-buried goaf areas, fluctuations in groundwater level can reduce the effective stress of the overlying rock strata, thereby exacerbating the collapse hazard [54,55,56,57]. The second core area is the compound mining area for oil–gas fields and coal mines southwest of Hami City. In this region, coal mine goaf areas and oil–gas well networks are interwoven, leading to stress imbalance in the overlying rock and a high density of collapse pits. The core mechanism may be attributed to the mutual superposition and interference of stress fields from different engineering disturbances.
The high-hazard areas are primarily distributed in a “belt-like pattern” surrounding the extremely high-hazard areas, covering a total area of approximately 2455.05 km² and concentrated within a range of about 0.5–3 km from the periphery of the extremely high-hazard areas. This spatial extent aligns with the quantitative pattern of ‘engineering disturbance-induced surface instability,’ indicating that the impact of mining disturbance on surrounding rock masses attenuates in a gradient manner with increasing distance. It confirms a significant diffusion effect of mining-induced stress beyond the periphery of goaf areas, which generates a regional collapse hazard through stress transmission and superposition with local goafs. This hazard attenuates as the distance increases [48].
The medium-hazard areas exhibit a “patchy and scattered distribution,” covering a total area of approximately 4706.44 km², primarily located around small-scale mining areas in the northern and eastern parts of the basin. Mining activity intensity in this region is relatively low, and the rock formations are comparatively hard. The collapse hazard mainly originates from localized, small-scale goafs.
The low- and extremely low-hazard areas are distributed across the eastern plains, southern lowland areas, and northern high-altitude mountainous regions of the basin, with a total area of approximately 40,956.07 km². This area is distant from goaf areas and has sparse faults, hard rock layers, and stable groundwater levels, resulting in the rare occurrence of collapse disasters.
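The reported areas and percentages can be cross-checked with a few lines of arithmetic. The sketch below takes the four area figures quoted above as input and assumes the study-area total equals their sum; the variable names are illustrative.

```python
# Areas in km² as quoted in the text for each hazard class.
areas_km2 = {
    "extremely high": 4882.42,
    "high": 2455.05,
    "medium": 4706.44,
    "low + extremely low": 40956.07,
}

total = sum(areas_km2.values())
# Share of each class in percent, rounded to two decimals.
shares = {level: round(100 * area / total, 2) for level, area in areas_km2.items()}

for level, share in shares.items():
    print(f"{level}: {share}%")
```

The computed shares reproduce the 9.21%, 4.63%, and 8.88% figures. The combined low and extremely low share comes out at 77.28%, slightly above the 77.27% obtained by adding the separately rounded 63.44% and 13.83%, a difference attributable to rounding.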

5. Discussion

5.1. Quantitative Analysis with SHAP

Based on the prediction results of the GBDT-LR coupled model, this study employed SHAP analysis to deconstruct the model’s prediction logic. A bar chart (Appendix B Figure A1) was used to quantitatively analyze the contribution intensity and action priority of each contributing factor. Combined with the analysis of the coupling effect of the “lithology × mining intensity and depth” interaction term, it provides a key basis for the in-depth interpretation of multi-factor synergistic disaster formation patterns and the mechanistic validation of the model results.
Regarding the influencing factors of ground collapse in the Turpan–Hami Basin, previous research has predominantly focused on aspects such as tectonic structure, lithology, and mining activities [58,59]. While confirming these primary factors, this study further elucidates the quantitative nonlinear influence pattern of the elevation factor based on SHAP values. As shown in Appendix B Figure A1a1,a2, the interval-averaged SHAP bar plot (a1) demonstrates that elevation exerts a significant positive effect on ground collapse within the range of 600~1500 m, with this positive effect peaking at 1000~1200 m (corresponding to the interval [626.20, 1416.40] m) at a mean SHAP value of 0.537. The sample-level SHAP scatter plot (a2) further verifies this nonlinear trend: Samples in the 1000~1200 m interval exhibit the highest positive SHAP values, while the positive effect gradually weakens as elevation exceeds 1500 m and turns negative at higher altitudes. This identifies 1000~1200 m as the most sensitive interval for the elevation factor, likely resulting from the synergistic interaction between topographic conditions and other contributing factors (e.g., hydrogeological conditions or engineering activities) within this specific altitude range, thereby exacerbating the occurrence of ground collapse.
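An interval-averaged SHAP summary of this kind can be sketched as follows, assuming per-sample SHAP values for one factor have already been computed. The elevation samples, SHAP values, interval edges, and the helper name `interval_mean_shap` are made up for illustration and are not the study data.

```python
import numpy as np

def interval_mean_shap(feature_values, shap_values, edges):
    """Mean SHAP value of one feature inside each [edges[i], edges[i+1]) interval."""
    x = np.asarray(feature_values, dtype=float)
    s = np.asarray(shap_values, dtype=float)
    means = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (x >= lo) & (x < hi)
        # Empty intervals are reported as NaN rather than zero.
        means.append(s[mask].mean() if mask.any() else np.nan)
    return np.array(means)

# Made-up elevation samples (m) and SHAP values, for illustration only.
elev = np.array([700, 900, 1050, 1150, 1300, 1700])
shap_elev = np.array([0.1, 0.3, 0.5, 0.6, 0.2, -0.4])
edges = [600, 1000, 1200, 1500, 2000]
means = interval_mean_shap(elev, shap_elev, edges)
most_sensitive = int(np.nanargmax(means))  # index of the interval with the highest mean
print(means, most_sensitive)
```

The interval with the largest mean marks the most sensitive range of the factor, while the sign of each mean indicates whether that range raises or lowers the predicted hazard.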
An analysis combining the distance-to-goaf distribution map (Figure 5f) and Appendix B Figure A1f1,f2 in the Turpan–Hami Basin shows that the SHAP values of the hazard factor “distance to goaf” exhibit an obvious nonlinear variation with ground collapse hazard. Within a relatively close range to the goaf, the SHAP values are positive, indicating that this region is highly sensitive to disasters, where the stress disturbance effect of the goaf is the strongest. With a gradual increase in distance, the SHAP values decrease rapidly and turn negative, implying that the driving effect of the goaf on the collapse hazard is no longer significant and may even exert an inhibitory effect in this range. This spatial distribution and distance effect assessment framework is highly consistent with relevant studies worldwide. In multi-factor hazard assessment studies within a GIS environment, Kim et al. [60], based on nine influencing factors (including “distance to mining roadways”), employed frequency ratio models, logistic regression models, and integrated GIS analysis. Their work verified that this distance factor is a key element influencing ground collapse hazard. Furthermore, the frequency ratio data indicated that collapse hazard exhibits a significant decay trend with increasing distance from mining roadways (core areas associated with goafs), a pattern fundamentally consistent with the regularity revealed by the SHAP dependence plots in this study.
As shown in Appendix B Figure A1d1, within the low rainfall range (approximately 0–8 mm), rainfall mainly exhibits a negative contribution, meaning that rainfall of this magnitude helps maintain the stability of the rock and soil mass, thereby suppressing collapse. When rainfall exceeds the critical threshold and enters the medium rainfall range (approximately 8–27 mm), it transforms into a strong positive driving factor and peaks at around 21 mm, indicating that moderate precipitation is most likely to induce geotechnical instability. However, when rainfall further increases to an extremely high level (>27 mm), the growth rate of its positive contribution begins to weaken, suggesting that excessive rainfall may trigger a certain saturation effect or physical limit. Appendix B Figure A1d2 further confirms that the positive driving effect of rainfall on hazard remains strong within approximately 10–23 mm, with samples in this range exhibiting the highest positive SHAP values; with increasing rainfall, the contribution of this feature to the model’s predictive value generally presents an inverted U-shaped trend of “first negative, then positive, and finally weakening”. This nonlinear pattern is consistent with existing studies, confirming that rainfall is a key factor driving the occurrence of disasters: its influence shows an inhibitory effect under low rainfall conditions, transforms into a strong positive driving effect in the medium rainfall range, and the growth rate of the positive contribution tends to flatten under extremely high rainfall conditions [61,62,63].
As shown in Appendix B Figure A1n1,n2, the interaction term between lithology and mining intensity and depth reveals the extreme sensitivity arising from the coupling of geological conditions and human activities. The influence of this feature on the model presents a nonlinear, monotonically increasing trend. As shown in Appendix B Figure A1n1, as the interaction index rises from 1.0 to 5.0, the SHAP value increases significantly. In the low index range, the SHAP value is at its lowest level and even negative, while in the high index range, the SHAP value can reach approximately 5.0. As shown in Appendix B Figure A1n2, samples in the range of approximately 15–25 exhibit a strong positive promoting effect on hazard. This indicates that when specific lithologies are subjected to high-intensity mining disturbances, the destructive effect is significantly amplified, reflecting that the coupling mechanism of “lithology–mining intensity and depth” is one of the core driving forces of ground collapse.

5.2. Comparison Between GBDT-LR and LR

The core requirement for ground collapse hazard assessment is to balance prediction accuracy with mechanistic interpretability, and the differences in modeling methods directly affect the reliability of the assessment results. This study preliminarily selected the GBDT-LR coupled model through multi-model screening and conducted comparative validation against the single LR model, with accuracy verification performed using ROC curves and AUC values (Figure 8). The results show that the GBDT-LR coupled model achieved an AUC value of 0.871, representing a 5.8 percentage point improvement over the single LR model (AUC = 0.813). The essence of this difference lies in the models’ varying capabilities to capture interaction effects among contributing factors: as a linear model, the single LR model struggles to characterize the nonlinear coupling relationships among multiple factors, such as “weak rock formation + high mining intensity + groundwater fluctuation,” leading to a relatively conservative hazard classification for complex mining areas. In contrast, the GBDT-LR coupled model captures high-order nonlinear interaction features through its GBDT component and then outputs interpretable coefficients and odds ratios via the LR component. This approach retains the fitting advantages of ensemble models while meeting the normative requirement of “traceable causation” in geological hazard assessment. Its advantages are fundamentally consistent with the conclusions of Merghadi et al. [34]: hybrid models integrating tree-based ensemble algorithms with linear models (such as GBDT-LR) significantly outperform single linear models in capturing nonlinear relationships among contributing factors and improving the accuracy of geological hazard assessment. This superiority originates from the synergistic effect of nonlinear feature extraction by tree-based algorithms and linear result interpretation by LR models, providing strong support for the combined approach of “nonlinear modeling + linear interpretation.”
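To make the AUC comparison concrete, the sketch below computes AUC as the probability that a randomly chosen collapse sample receives a higher score than a randomly chosen non-collapse sample. The labels and scores are toy values and `auc_score` is a hypothetical helper; only the two AUC figures in the last line come from the text.

```python
def auc_score(y_true, y_score):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    # Ties between a positive and a negative count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy labels and scores, for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
lr_scores = [0.9, 0.4, 0.6, 0.5, 0.3, 0.2]
coupled_scores = [0.9, 0.7, 0.6, 0.5, 0.3, 0.2]
auc_lr = auc_score(y_true, lr_scores)
auc_coupled = auc_score(y_true, coupled_scores)

# Difference between the two AUC values reported in the text, in percentage points.
gain_points = round((0.871 - 0.813) * 100, 1)
print(auc_lr, auc_coupled, gain_points)
```

In the toy data, the single model ranks one collapse sample below a non-collapse sample, which lowers its AUC, and the last line confirms the reported 5.8 percentage point difference.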
To comprehensively evaluate the model performance and address the limitations of relying solely on AUC, we employed multiple performance metrics (accuracy, precision, recall, F1 score, and kappa coefficient) for comparative analysis. The detailed results are shown in Table 3.
As shown in Table 3, the GBDT-LR coupled model exhibits balanced and excellent performance across all metrics. Compared with the single LR model, the GBDT-LR coupled model achieves significant improvements in accuracy (+5.3%), recall (+5.7%), precision (+4.4%), F1 score (+5%), and kappa coefficient (+10.5%), indicating that the coupled model has stronger overall predictive ability and robustness. Although the standalone GBDT model shows a slightly higher accuracy and recall, the GBDT-LR coupled model maintains a higher AUC value and achieves a better balance between precision and recall (reflected by the F1 score), which makes it more suitable for hazard assessment in the study area. Therefore, the higher computational cost and model complexity associated with the coupled approach are well justified for achieving more accurate and reliable hazard zoning.
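The five metrics can all be derived from a binary confusion matrix, as the following sketch shows. The counts are invented for illustration and are not taken from Table 3; `binary_metrics` is a hypothetical helper.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, F1, and Cohen's kappa from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Cohen's kappa: agreement beyond what chance alone would produce.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((fn + tn) / total) * ((fp + tn) / total)
    expected = p_yes + p_no
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
    }

# Invented counts: 40 true positives, 10 false positives, 10 false negatives, 40 true negatives.
m = binary_metrics(tp=40, fp=10, fn=10, tn=40)
print(m)
```

Because kappa discounts chance agreement, it can change more than accuracy between two models evaluated on the same samples, so the two metrics are best read together.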
To verify the interpretive advantage of the GBDT-LR coupled model and clarify the ground collapse hazard-forming mechanism in the study area, this section presents the comprehensive LR coefficients of the model and conducts an in-depth analysis. The coefficients are derived from the weighted average of LR coefficients of high-order features extracted by GBDT, which can explicitly quantify the comprehensive contribution direction and intensity of each core disaster-causing factor (including interaction terms) to ground collapse hazard.
As shown in Table 4, the GBDT-LR model exhibits significant interpretive advantages compared to the standalone LR model. The latter can only capture linear relationships between factors and cannot reflect complex nonlinear interactions, while the GBDT-LR model can explicitly identify the role of each factor through comprehensive LR coefficients, providing a quantitative basis for hazard mechanism interpretation.
First, the distance to the goaf is the most critical promotive factor with a comprehensive coefficient of 0.167, which confirms that mining disturbance is the primary inducer of ground collapse hazard in the study area. The 222 matched high-order features indicate that the impact of the goaf on collapse presents a complex gradient effect: The collapse hazard decreases significantly as the distance from the goaf increases, and the hazard is the highest within 500 m of the goaf. This finding is consistent with the mechanical principle of goaf roof collapse—closer to the goaf, the surface disturbance caused by mining activities is stronger, and the stability of the overlying strata is lower.
Second, the lithology–mining intensity interaction term has a positive coefficient of 0.029, revealing the coupled amplification effect of geological background and human activities on collapse hazard. The study area is dominated by soft rock strata such as Quaternary loose sand and silt. An increase in mining intensity will further reduce the shear strength of soft rock, leading to a significant increase in collapse hazard. This result is consistent with the research of Ma et al. [64] on mining-induced collapse in arid basins, verifying the universality of the “lithology + mining intensity” coupled disaster-forming mechanism.
Third, elevation, slope, and NDVI exert weak inhibitory effects (absolute coefficient < 0.01), which are closely related to the geological and geographical characteristics of the Turpan–Hami Basin. Low-altitude areas (elevation < 800 m) are dominated by alluvial-proluvial plains with loose strata, which are more susceptible to disturbance, while high-altitude areas are dominated by bedrock with higher integrity, thus inhibiting collapse hazard. Steep slopes (>15°) have better rock mass integrity due to less human disturbance, and high NDVI values indicate good vegetation coverage, which can enhance soil cohesion and reduce the possibility of collapse. These regulatory effects reflect the comprehensive influence of natural factors on collapse hazard, supplementing the understanding that “mining-induced collapse is jointly affected by human activities and natural factors.”
Finally, the aspect-related coefficient is close to 0, indicating that its impact on collapse hazard is negligible in the study area. This may be attributed to the relatively uniform distribution of solar radiation and precipitation in the arid region, where differences in slope aspect have no significant effect on strata stability. This result can provide a reference for factor screening in future collapse hazard assessments in similar regions—unnecessary factors can be excluded to simplify the model without reducing assessment accuracy.
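Since LR coefficients act on the log-odds scale, they can be converted to odds ratios by exponentiation. The sketch below does this for the two coefficients quoted above. It assumes the comprehensive coefficients can be read like ordinary LR coefficients per one-unit change in the encoded factor, which is a simplification because they are weighted averages over GBDT-derived features.

```python
import math

# Comprehensive coefficients quoted in the text.
coefficients = {
    "distance to goaf": 0.167,
    "lithology x mining intensity interaction": 0.029,
}

# Odds ratio = exp(coefficient): multiplicative change in the odds of collapse
# per one-unit increase in the encoded factor (assumed interpretation).
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio = {ratio:.3f}")
```

Under this reading, the goaf-related coefficient corresponds to roughly an 18% increase in the odds per unit and the interaction term to roughly a 3% increase, while a coefficient near zero, as for aspect, gives an odds ratio close to 1.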

5.3. Disaster Formation Mechanisms

This section integrates the identification of contributing factors, the quantitative analysis of SHAP values, and the prediction results of the GBDT-LR model from this study with existing research on geological disaster formation mechanisms [65,66] to discuss systematically the disaster formation mechanisms of ground collapse in the Turpan–Hami Basin. The core mechanism is a multi-factor synergistic disaster process characterized by “geological background control–anthropogenic disturbance triggering–environmental factor modulation.” The interactions and superimpositions of these factors collectively determine the occurrence, development, and spatial distribution characteristics of ground collapse, which aligns well with the regional uniqueness of the arid climate and composite mining activities in the study area.
Geological background conditions form the fundamental prerequisite for the occurrence of ground collapse disasters, with the mechanical property differences among lithologies playing a crucial controlling role. The soft rock formations developed in the Turpan–Hami Basin (e.g., mudstone, sandy mudstone) exhibit low compressive strength, poor cementation, weak rock mass integrity, and insufficient resistance to disturbance [67]. They are prone to softening and disintegration under minor external disturbances, providing the material basis for ground collapse. In contrast, hard rock formations (e.g., sandstone, conglomerate) possess intact rock mass structures and excellent mechanical properties, enabling them to effectively withstand external disturbances, resulting in a significantly lower probability of collapse occurrence. In addition, fault structures are well developed in the study area. Fault areas may compromise the integrity of the rock mass and weaken the stability of the rock and soil body, thereby indirectly increasing the likelihood of collapse occurrence.
Anthropogenic mining activities are the core triggering factor for ground collapse disasters and the primary driving force behind the concentrated distribution of collapse disasters in the study area. Long-term, large-scale underground mining has created extensive goaf areas, disrupting the stress equilibrium of the underground rock mass. This causes the overlying strata of the goaf to lose support, leading to bending and deformation under their own weight, ultimately triggering ground collapse [46,47,68]. Furthermore, increases in mining intensity and depth exacerbate this state of stress imbalance. When the mining depth surpasses a critical threshold and the mining intensity exceeds the bearing capacity limit of the rock mass, the scale and frequency of collapse increase significantly [69].
Environmental factors play a significant regulatory role in the occurrence of ground collapse disasters, with the regulatory effects of groundwater dynamics and rainfall being the most prominent. Groundwater is a crucial component of the rock–soil mass, and fluctuations in its level directly alter the water content and pore water pressure of that mass. A rising water table saturates the rock–soil mass, significantly reducing its strength and shear resistance, making the rock mass prone to softening and disintegration. Conversely, a declining water table generates seepage forces that carry away fine particles within the rock–soil mass, increasing porosity, loosening the structure, and further compromising rock mass stability. This aligns with the findings of Ding et al. [14] regarding the relationship between groundwater and land subsidence in the Turpan–Hami Basin. The regulatory role of rainfall is primarily manifested in two aspects: short-term triggering and long-term accumulation. Short-term heavy rainfall infiltrates the subsurface via slope runoff, increasing the water content of the rock–soil mass, reducing rock strength, and triggering potential collapse points [70]. The cumulative effect of long-term rainfall persistently weakens the stability of the rock–soil mass, creating conditions for collapse. Furthermore, rainfall interacts with groundwater dynamics, further amplifying the disaster-causing effect [44]. Additionally, vegetation cover (NDVI) exerts a certain inhibitory effect on ground collapse by enhancing the integrity and erosion resistance of the rock–soil mass [71,72]. This also explains the relatively sparse distribution of collapse points in areas with better vegetation cover within the study area.
The disaster formation mechanism of ground collapse in the Turpan–Hami Basin is not the result of a single factor, but rather a comprehensive process of multi-factor synergy. This process is characterized by geological background (lithology and fault structures) serving as the foundation, anthropogenic mining activities (goaf areas, mining intensity and depth, and their coupling effects with rock formations) acting as the core triggering factors, and environmental factors such as groundwater, rainfall, and vegetation cover functioning as regulatory factors. This mechanism not only reveals the occurrence patterns of ground collapse in the study area but also provides a scientific basis for the subsequent precise prevention and control of ground collapse disasters and the optimization of hazard zoning. Furthermore, it validates the rationality and comprehensiveness of the multi-dimensional “geology-mining-hydrology-environment” contributing factor system constructed in this study. In addition, the constructed “index system–optimized GBDT-LR adaptive application” methodology may provide useful reference and exploratory insights for disaster hazard assessment in other arid mining regions with similar geological backgrounds, and its universality awaits further validation with external datasets and cross-regional applications.

6. Conclusions

This study systematically reviewed the geological environmental background of the mining activity areas in the Turpan–Hami Basin. By integrating field surveys and multi-source data and based on previous research findings, it summarized five types of disaster formation mechanisms, identified thirteen contributing factors, and added the “lithology × mining intensity and depth” interaction term to construct a multi-dimensional hazard assessment index system for ground collapse disasters. Subsequently, the LR and GBDT-LR coupled models were employed to conduct hazard assessment and validation, and an in-depth analysis of the spatial distribution patterns of disasters and the coupling effects of mechanisms was performed. The main conclusions are as follows.
Ground collapse in the mining activity areas of the Turpan–Hami Basin is controlled by the coupling of multiple factors and exhibits significant spatial differentiation and zonal aggregation. The extremely high-hazard areas, accounting for 9.21%, are concentrated in the coal mining area northwest of Turpan City and the composite oil–gas–coal mining area southwest of Hami City. The high- and medium-hazard areas together account for 13.51% and are distributed primarily in belts surrounding the extremely high-hazard areas or scattered around secondary goaf areas. The low- and very-low-hazard areas, comprising 77.28%, extensively cover areas with minimal mining activity, such as the eastern plains, southern lowlands, and northern high mountainous regions.
Different hazard levels are dominated by distinct combinations of contributing factors. The extremely high-hazard areas exhibit high-intensity synergy among the “geology–mining–hydrology” dimensions: mining-related factors (goaf distribution, mining intensity, and depth), geological factors (lithology, distance to faults), and hydrological factors (groundwater level change) are superimposed. Among these, the “lithology × mining intensity and depth” interaction term, which represents the coupling of geology and mining activity, is particularly important. Its importance exceeds the independent effects of the single factors, indicating quantitatively that multi-factor synergy is the fundamental mechanism driving the sharp increase in hazard. The high- and medium-hazard areas are primarily controlled by partial combinations of “mining” factors with one or two factors from other dimensions (e.g., hydrological or geological), with reduced superposition intensity. The low- and very-low-hazard areas generally exhibit low contribution values for factors in all dimensions.
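As an illustration of what an interaction term such as “lithology × mining intensity and depth” can look like numerically, the following minimal sketch multiplies two min-max-normalized factor columns. The exact construction used in this study is not restated here, so the product form, the function names, and the sample values are all assumptions made for illustration only.

```python
def min_max(values):
    """Rescale a list of numbers to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def interaction_term(lithology_score, mining_score):
    """Element-wise product of two normalized factor columns (assumed form)."""
    a = min_max(lithology_score)
    b = min_max(mining_score)
    return [x * y for x, y in zip(a, b)]


# Hypothetical grid cells: lithology weakness rank (1 = hard ... 5 = loose)
# and a mining intensity-and-depth score in arbitrary units.
lithology = [1, 3, 5, 5]
mining = [0.0, 2.0, 2.0, 4.0]
print(interaction_term(lithology, mining))
```

Under this assumed form, the term is large only where weak rock and intensive mining coincide, which is the qualitative behavior the conclusion above attributes to geology–mining coupling.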
In terms of assessment methodology, the GBDT-LR coupled model employed in this study outperforms the single LR model in both accuracy and the ability to characterize disaster mechanisms. The coupled model achieved an AUC of 0.871, a 5.8-percentage-point improvement over the single LR model (AUC = 0.813). It captures the complex nonlinear interactions among contributing factors more accurately and refines the spatial boundaries of the medium-to-high hazard areas so that they align better with field investigation results, thereby providing a more reliable method for hazard assessment in arid mining basins.
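For readers who wish to reproduce the AUC comparison, the sketch below computes AUC as the probability that a randomly chosen collapse sample receives a higher score than a randomly chosen non-collapse sample, and then repeats the percentage-point arithmetic for the two reported AUC values. The function name and the toy labels and scores are hypothetical; only 0.871 and 0.813 come from this study.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    A tie between a positive and a negative sample counts as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = collapse, 0 = non-collapse) and model scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print(roc_auc(labels, scores))

# Gain of the coupled model over the single LR model, in percentage points.
auc_gain_pp = round((0.871 - 0.813) * 100, 1)
print(auc_gain_pp)
```

The pairwise form is quadratic in the number of samples and is chosen here for clarity rather than speed.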
Using the GBDT-LR coupled model and GIS spatial analysis, this study reveals the spatial distribution characteristics, hazard zoning, and dominant mechanisms of ground collapse in the Turpan–Hami Basin. It validates the importance of multi-factor synergy in disaster formation and further confirms the composite disaster mechanism of “geological foundation control with mining activity triggering.” The findings provide spatial evidence and quantitative support for regional disaster prevention and mitigation planning and for mining safety regulation.

Author Contributions

Conceptualization, B.Z.; methodology, T.W. and B.Z.; software, T.W.; validation, T.W., C.J. and B.Z.; formal analysis, N.L., C.J., Y.L. and B.Z.; investigation, J.Y. and Y.Z.; resources, Y.Z., J.Y. and T.W.; data curation, J.Y. and T.W.; writing—original draft preparation, T.W.; writing—review and editing, T.W., Y.L. and B.Z.; visualization, T.W.; supervision, N.L., C.J., Y.L. and B.Z.; project administration, N.L., Y.L., S.S. and B.Z.; funding acquisition, N.L., Y.L., S.S. and B.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Third Xinjiang Scientific Expedition Program (Grant No. 2022xjkk1305), the National Natural Science Foundation of China (Grant Nos. 42307247 and 42402298), and the Youth Innovation Promotion Association Foundation of the Chinese Academy of Sciences (Grant No. 2023073).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request. However, some data (particularly the ground collapse inventory) are not publicly available due to confidentiality requirements of the Third Xinjiang Comprehensive Scientific Expedition Project. Other supporting data (such as the DEM and geological maps) are available from the cited public sources or from the authors.

Acknowledgments

We thank Hao Luo and Songyan Li from the Institute of Geology and Geophysics, Chinese Academy of Sciences, for their guidance on aspects such as the selection of machine learning models. We also extend our thanks to Fengjiao Tang and Xiao Lu from the Institute of Geology and Geophysics, Chinese Academy of Sciences, for their guidance on data collection and data processing.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
GBDT: Gradient Boosting Decision Tree
LR: Logistic Regression
GIS: Geographic Information System
SHAP: SHapley Additive exPlanations
G: Geohazard Potential Index
CV: Coefficient of Variation
DEM: Digital Elevation Model
NDVI: Normalized Difference Vegetation Index
PGA: Peak Ground Acceleration
ROC: Receiver Operating Characteristic
AUC: Area Under the Curve
BP: Back Propagation
RF: Random Forest
SVM: Support Vector Machine
RBF: Radial Basis Function
WoE: Weights of Evidence Model
IV: Information Value Model
EWM: Entropy Weight Method

Appendix A

Table A1. Data Sources for Contributing Factors.
| Data Name | Data Source |
| --- | --- |
| Elevation (DEM) | ASTER GDEM data (30 m resolution) from the Geospatial Data Cloud website |
| Rainfall | WorldClim dataset: https://www.worldclim.org/data/worldclim21.html (accessed on 5 February 2026) |
| NDVI | [73] Didan, K. (2015). MOD13A3 MODIS/Terra Vegetation Indices Monthly L3 Global 1km SIN Grid V006 [Data set]. NASA Land Processes Distributed Active Archive Center. https://doi.org/10.5067/MODIS/MOD13A3.006 (accessed on 5 February 2026) |
| PGA | [74] GB 18306-2015; Seismic Ground Motion Parameters Zonation Map of China (Scale 1:4,000,000). Standards Press of China: Beijing, China, 2015. |
| Water System | https://www.openstreetmap.org/ (accessed on 5 February 2026) |
| Roads | Open-source geospatial data provided by the OpenStreetMap project |
| Faults | [75] Qi, S. (2022). Geological structure database of Qinghai Tibet Plateau. National Tibetan Plateau/Third Pole Environment Data Center. https://doi.org/10.11888/SolidEar.tpdc.272224 (accessed on 5 February 2026) |
| Lithology | [76] Qi, S. (2021). Engineering geological petrofabric database of Qinghai Tibet Plateau. National Tibetan Plateau/Third Pole Environment Data Center. https://data.tpdc.ac.cn/zh-hans/data/34828cc5-11ec-4e1f-916d-86d5598f09bb (accessed on 5 February 2026) |
| Groundwater Level Change | [39] Wang, M.; Yao, J.; Chang, H.; Liu, R.; Xu, N.; Liu, Z.; Gong, H.; Zheng, H.; Wang, J.; Guo, X.; Cao, Y.; Zhao, Y.; Lu, H. Monthly groundwater level grid dataset of China region (2005–2022). National Tibetan Plateau/Third Pole Environment Data Center (http://data.tpdc.ac.cn) (accessed on 5 February 2026) |

Appendix B

Figure A1. SHAP dependence plots for contributing factors. (a1,a2) DEM; (b1,b2) slope; (c1,c2) aspect; (d1,d2) rainfall; (e1,e2) NDVI; (f1,f2) distance to goaf; (g1,g2) groundwater level change; (h1,h2) distance from the water system; (i1,i2) distance to fault; (j1,j2) PGA; (k1,k2) lithology; (l1,l2) mining intensity and depth; (m1,m2) distance from highway; (n1,n2) rock–mining interaction. Color explanation: Colors tending toward red indicate a more pronounced promoting effect of the factor on ground collapse, while colors tending toward blue indicate a more pronounced inhibiting effect.

References

  1. Whittaker, B.N.; Reddish, D.J. Subsidence: Occurrence, Prediction, and Control; Developments in Geotechnical Engineering; Elsevier Science Pub. Co.: Amsterdam, The Netherlands; New York, NY, USA, 1989.
  2. Gutiérrez, F.; Cooper, A.H.; Johnson, K.S. Identification, Prediction, and Mitigation of Sinkhole Hazards in Evaporite Karst Areas. Environ. Geol. 2008, 53, 1007–1022.
  3. Delle Rose, M.; Parise, M. Karst Subsidence in South-Central Apulia, Southern Italy. Int. J. Speleol. 2002, 31, 181–199.
  4. Chen, K.; Chen, L.; Zhang, Z.; Chang, J.; He, Q.; He, Q.; Fang, H.; Liu, Z.; Huang, Y.; Zhao, M. Susceptibility and Risk Assessment of Geological Disasters in Xinjiang Based on Empirical Investigation. J. Eng. Geol. 2023, 31, 1156–1166. (In Chinese)
  5. Gacu, J.; Kantoush, S.; Candelario, R.; Falculan, J.; Moaje, K.V.; Famaran, M.J.; Nepomuceno, M.; Ebon, J.A.; Parungao, R.; Ignacio, R.; et al. Integrated Multi-Hazard Risk Assessment Under Compound Disasters Using Analytical Hierarchy Process (AHP). Heliyon 2025, 11, e43173.
  6. Kucuker, D.M.; Cedano Giraldo, D. Assessment of Soil Erosion Risk Using an Integrated Approach of GIS and Analytic Hierarchy Process (AHP) in Erzurum, Turkiye. Ecol. Inform. 2022, 71, 101788.
  7. Jena, R.; Pradhan, B.; Beydoun, G.; Nizamuddin; Ardiansyah; Sofyan, H.; Affan, M. Integrated Model for Earthquake Risk Assessment Using Neural Network and Analytic Hierarchy Process: Aceh Province, Indonesia. Geosci. Front. 2020, 11, 613–634.
  8. Jiang, J.; Shen, W.; Wang, Y. A Risk Assessment Approach for Road Collapse along Tunnels Based on an Improved Entropy Weight Method and K-Means Cluster Algorithm. Ain Shams Eng. J. 2024, 15, 102805.
  9. Park, I.; Lee, J.; Saro, L. Ensemble of Ground Subsidence Hazard Maps Using Fuzzy Logic. Cent. Eur. J. Geosci. 2014, 6, 207–218.
  10. Park, J.H.; Kang, J.; Kang, J.; Mun, D. Machine-Learning-Based Ground Sink Susceptibility Evaluation Using Underground Pipeline Data in Korean Urban Area. Sci. Rep. 2022, 12, 20911.
  11. Zhi, D.; Li, J.; Yang, F.; Chen, X.; Wu, C.; Wang, B.; Zhang, H.; Hu, J.; Jin, J. Whole Petroleum System in Jurassic Coal Measures of Taibei Sag in Tuha Basin, NW China. Pet. Explor. Dev. 2024, 51, 519–534.
  12. Sha, Y. Ground Subsidence Detection of Beiquan Mining Area Based on Stacking Time-Series InSAR. Geomat. Sci. Technol. 2020, 8, 60–67.
  13. Bell, F.G.; Stacey, T.R.; Genske, D.D. Mining Subsidence and Its Effect on the Environment: Some Differing Examples. Environ. Geol. 2000, 40, 135–152.
  14. Ding, Q.; Zhang, J.; Zhang, H.; Huang, J.; Sun, Y.; Bai, F.; Tu, Z.; Li, J. Study on Groundwater Dynamics and Its Relationship with Land Subsidence in Turpan Basin. Earth Sci. 2025, 50, 737–751. (In Chinese)
  15. Guo, W.; Zhao, G.; Bai, E.; Ma, C.; Nie, X.; Chen, J.; Zhang, H. Research Status and Prospect on Cultivated Land Damage at Surface Subsidence Basin Due to Longwall Mining in the Central Coal Grain Compound Area. J. China Coal Soc. 2023, 48, 388–401.
  16. Niu, L.; Gulibostan, T.; Wang, T.; Feng, Z. Analysis on Distribution Characteristics and Main Controlling Factors of Geological Disasters in Xinjiang. West-China Explor. Eng. 2023, 11, 38–41. (In Chinese)
  17. Wang, Y.; Feng, G.; Li, Z.; Luo, S.; Wang, H.; Xiong, Z.; Zhu, J.; Hu, J. A Strategy for Variable-Scale InSAR Deformation Monitoring in a Wide Area: A Case Study in the Turpan–Hami Basin, China. Remote Sens. 2022, 14, 3832.
  18. Brenning, A. Spatial Prediction Models for Landslide Hazards: Review, Comparison and Evaluation. Nat. Hazards Earth Syst. Sci. 2005, 5, 853–862.
  19. Chen, J.; Huang, G.; Chen, W. Towards Better Flood Risk Management: Assessing Flood Risk and Investigating the Potential Mechanism Based on Machine Learning Models. J. Environ. Manag. 2021, 293, 112810.
  20. Ma, M.; Wang, T.; Yang, J.; Chen, Z.; Wang, J.; Liu, R.; Miao, X. XAI-Driven Flood Risk Assessment: Integrating Machine Learning and Hydrological Model. Geosci. Front. 2026, 17, 102244.
  21. Yang, K.; Niu, R.; Song, Y.; Dong, J.; Zhang, H.; Chen, J. Dynamic Hazard Assessment of Rainfall-Induced Landslides Using Gradient Boosting Decision Tree with Google Earth Engine in Three Gorges Reservoir Area, China. Water 2024, 16, 1638.
  22. Feng, W.; Tang, Y.; Hong, B. Landslide Hazard Assessment Methods along Fault Zones Based on Multiple Working Conditions: A Case Study of the Lixian–Luojiabu Fault Zone in Gansu Province (China). Sustainability 2022, 14, 8098.
  23. Shi, L.; Gong, H.; Chen, B.; Shao, Z.; Zhou, C. Land Subsidence Simulation Considering Groundwater and Compressible Layers Based on an Improved Machine Learning Method. J. Hydrol. 2025, 656, 133008.
  24. Mao, X.; Li, J.; Zhang, H. Zircon U-Pb SHRIMP Ages from the Late Paleozoic Turpan-Hami Basin, NW China. J. Earth Sci. 2014, 25, 924–931.
  25. Master Plan for Mineral Resources of Xinjiang Uygur Autonomous Region (2016–2020). Available online: https://zrzyt.xinjiang.gov.cn/xjgtzy/c106587/201809/d017e4f6584344b8a30ad060cda51835.shtml (accessed on 4 May 2025).
  26. Ma, S.; Shao, X.; Xu, C. Landslide Susceptibility Mapping in Terms of the Slope-Unit or Raster-Unit, Which Is Better? J. Earth Sci. 2023, 34, 386–397.
  27. Hosmer, D.W., Jr.; Lemeshow, S.; Sturdivant, R.X. Introduction to the Logistic Regression Model. In Applied Logistic Regression, 3rd ed.; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2013; pp. 1–33.
  28. Zhang, Y.C.; Qin, S.W.; Zhai, J.J.; Li, G.J.; Peng, S.Y.; Liu, X.; Chen, J.J. Susceptibility Assessment of Debris Flow Based on GIS and Weight Information for the Changbai Mountain Area. Hydrogeol. Eng. Geol. 2018, 45, 150–158. (In Chinese)
  29. Yang, D.H.; Zhu, J.Y.; Liu, S.; Ma, B.; Dai, X.S. Comparative Analyses of Susceptibility Assessment for Landslide Disasters Based on Information Value, Weighted Information Value and Logistic Regression Coupled Model in Luoping County, Yunnan Province. Chin. J. Geol. Hazard Control 2023, 34, 43–53. (In Chinese)
  30. Zeng, T.; Jin, B.; Glade, T.; Xie, Y.Y.; Li, Y.; Zhu, Y.H.; Yin, K.L. Assessing the Imperative of Conditioning Factor Grading in Machine Learning-Based Landslide Susceptibility Modeling: A Critical Inquiry. CATENA 2024, 236, 107732.
  31. Gao, D.; Li, K.; Cai, Y.; Wen, T. Landslide Displacement Prediction Based on Time Series and PSO-BP Model in Three Georges Reservoir, China. J. Earth Sci. 2024, 35, 1079–1082.
  32. Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232. Available online: http://www.jstor.org/stable/2699986 (accessed on 10 February 2026).
  33. Wang, Y.; Chen, J.; Li, S.; Zhang, P.; Chen, X.; Ma, S.; Ouyang, H.; Bian, H.; Mao, T.; Zhang, Z.; et al. A Conditional Probability-Based Model for Mountainous Geological Hazard Susceptibility Assessment. Appl. Sci. 2025, 15, 12653.
  34. Merghadi, A.; Yunus, A.P.; Dou, J.; Whiteley, J.; ThaiPham, B.; Bui, D.T.; Avtar, R.; Abderrahmane, B. Machine Learning Methods for Landslide Susceptibility Studies: A Comparative Overview of Algorithm Performance. Earth-Sci. Rev. 2020, 207, 103225.
  35. Fan, L.M. Study on Coal Mining Intensity and Geological Disasters in Yushenfu Area. China Coal 2014, 40, 52–55. (In Chinese)
  36. Lan, T.W.; Zhang, Z.J.; Yuan, Y.N. Study on Evaluation Method of Mine Geological Dynamic Environment and Classification of Rock Burst Mine Types. Coal Geol. Explor. 2023, 51, 104–113. (In Chinese)
  37. Li, Z.; Wu, G.Y.; Si, S.J. Value Method of Sub-index for Rock Burst Hazard Assessment Based on Mining Depth and Tectonic Stress. Coal Sci. Technol. 2024, 52, 38–47. (In Chinese)
  38. Yang, C.C. Optimization of Stope Structural Parameters and Engineering Application of Sublevel Filling Method in a Mine. Min. Res. Dev. 2019, 39, 15. (In Chinese)
  39. Wang, M.; Yao, J.; Chang, H.; Liu, R.; Xu, N.; Liu, Z.; Gong, H.; Zheng, H.; Wang, J.; Guo, X.; et al. Underground Well Water Level Observation Grid Dataset from 2005 to 2022. Sci. Data 2025, 12, 728.
  40. Yi, X.; Shang, Y.; Meng, H.; Meng, Q.; Shao, P.; Ahmed, I. Regional Landslide Hazard and Risk Assessment Considering Landslide Spatial Aggregation and Hydrological Slope Units. Appl. Sci. 2025, 15, 8068.
  41. Hu, Y.; Zhang, Z.; Lin, S. Evaluation of Landslide Susceptibility in Ili Valley, Xinjiang Based on the Coupling of WOE Model and Logistic Regression. J. Eng. Geol. 2023, 31, 1350–1363. (In Chinese)
  42. Belenguer-Plomer, M.A.; Barrilero, O.; Saameño, P.; Mendes, I.; Lazzarini, M.; Albani, S.; El Beyrouthy, N.; Al Sayah, M.; Rueche, N.; Edjossan-Sossou, A.M.; et al. Remote Sensing as a Sentinel for Safeguarding European Critical Infrastructure in the Face of Natural Disasters. Appl. Sci. 2025, 15, 8908.
  43. Wang, X.; Hu, S.; Lian, B.; Wang, J.; Zhan, H.; Wang, D.; Liu, K.; Luo, L.; Gu, C. Formation Mechanism of a Disaster Chain in Loess Plateau: A Case Study of the Pucheng County Disaster Chain on August 10, 2023, in Shaanxi Province, China. Eng. Geol. 2024, 331, 107463.
  44. Zhu, N.; Liu, F.; Sun, D. Mechanisms of Groundwater Damage to Overlying Rock in Goaf. Processes 2024, 12, 936.
  45. Yi, S.; Zhang, Y.; Yi, H.; Li, X.; Wang, X.; Wang, Y.; Chu, T. Study on the Instability Activation Mechanism and Deformation Law of Surrounding Rock Affected by Water Immersion in Goafs. Water 2022, 14, 3250.
  46. Li, Z.; Wu, G.; Si, S.; Gao, X.; Liu, X.; Zhang, D.; Zhang, C.; Liu, J. Valuation Method for Sub-indicators of Rock Burst Risk Based on Mining Depth and Tectonic Stress. Coal Sci. Technol. 2024, 52, 38–47. (In Chinese)
  47. Li, Z.; Liu, P.; He, S.; Gu, F.; Li, Z.; Huang, X.; Wang, L. Research on the Collapsing Pattern of Overburden Rock and Pore Development Characterization in the Mining Hollow Area. Sci. Rep. 2025, 15, 31979.
  48. Tan, X.; Chen, W.; Wang, L.; Qin, C. Spatial Deduction of Mining-Induced Stress Redistribution Using an Optimized Non-Negative Matrix Factorization Model. J. Rock Mech. Geotech. Eng. 2023, 15, 2868–2876.
  49. Diddle, B.; Agioutantis, Z.; Maldonado Esguerra, E.; Romero Benitez, J.D.; Parra Valencia, M. Prediction of Dynamic and Final Vertical and Horizontal Movements Due to Longwall Mining. Rock Mech. Rock Eng. 2025, 58, 4403–4420.
  50. Yang, G.; Chen, Y.; Liu, X.; Yang, R.; Zhang, Y.; Zhang, J. Stability Analysis of a Slope Containing Water-Sensitive Mudstone Considering Different Rainfall Conditions at an Open-Pit Mine. Int. J. Coal Sci. Technol. 2023, 10, 64.
  51. Lu, Y.; Jin, C.; Wang, Q.; Li, G.; Han, T. Deformation and Failure Characteristic of Open-Pit Slope Subjected to Combined Effects of Mining Blasting and Rainfall Infiltration. Eng. Geol. 2024, 331, 107437.
  52. Liu, H.; Motoda, H. Feature Selection for Knowledge Discovery and Data Mining; Springer Science & Business Media: Cham, Switzerland, 2012.
  53. Van Westen, C.J.; Castellanos, E.; Kuriakose, S.L. Spatial Data for Landslide Susceptibility, Hazard, and Vulnerability Assessment: An Overview. Eng. Geol. 2008, 102, 112–131.
  54. Luo, H.; Xu, Q.; Jiang, Y.; Meng, R.; Pu, C. The Prediction Method of Large-Scale Land Subsidence Based on Multi-Temporal InSAR and Machine Learning. Earth Sci. 2024, 49, 1736–1745. (In Chinese)
  55. Noël, C.; Fryer, B.; Baud, P.; Violay, M. Water Weakening and the Compressive Brittle Strength of Carbonates: Influence of Fracture Toughness and Static Friction. Int. J. Rock Mech. Min. Sci. 2024, 177, 105736.
  56. Zhai, G.; Dai, J.; Chen, G.; Pan, Z.; Liang, C.; Liu, Z.; Jiang, X. Mechanism Mode and Prevention and Control Measures of Karst Collapses Induced by Foundation Pit Excavation. Carbonates Evaporites 2024, 39, 88.
  57. Guzy, A.; Witkowski, W.T. Land Subsidence Estimation for Aquifer Drainage Induced by Underground Mining. Energies 2021, 14, 4658.
  58. Qian, L.; Zang, S. Differentiation Rule and Driving Mechanisms of Collapse Disasters in Changbai County. Sustainability 2022, 14, 2074.
  59. Wang, W.; Wang, C.; Zhang, L.; Yan, Y.; Wang, L.; Guo, J. Monitoring Mining-Induced Subsidence from Satellite Imagery Using Transformer-Based Deep Learning Trained on Gridded Subsidence Measurements. J. Environ. Manag. 2025, 394, 127536.
  60. Kim, K.-D.; Lee, S.; Oh, H.-J.; Choi, J.-K.; Won, J.-S. Assessment of Ground Subsidence Hazard near an Abandoned Underground Coal Mine Using GIS. Environ. Geol. 2006, 50, 1183–1191.
  61. Zhu, S.; Wu, L.; Peng, J. An Improved Chebyshev Semi-Iterative Method for Simulating Rainfall Infiltration in Unsaturated Soils and Its Application to Shallow Landslides. J. Hydrol. 2020, 590, 125157.
  62. Song, Z.; Li, X.; Lizárraga, J.J.; Zhao, L.; Buscarnera, G. Spatially Distributed Landslide Triggering Analyses Accounting for Coupled Infiltration and Volume Change. Landslides 2020, 17, 2811–2824.
  63. Yin, Z.; Qin, G.; Guo, L.; Tang, X.; Wang, J.; Li, H. Coupling Antecedent Rainfall for Improving the Performance of Rainfall Thresholds for Suspended Sediment Simulation of Semiarid Catchments. Sci. Rep. 2022, 12, 4816.
  64. Ma, T.; Tang, F.; Zhang, F. Multi-scale soil moisture dynamics in arid mined Loess Plateau arise from crack evolution and microstructure collapse. Sci. Rep. 2025, 15, 44524.
  65. Huang, X.; Li, X.; Li, H.; Duan, S.; Yang, Y.; Du, H.; Xiao, W. Study on the Movement of Overlying Rock Strata and Surface Movement in Mine Goaf under Different Treatment Methods Based on PS-InSAR Technology. Appl. Sci. 2024, 14, 2651.
  66. Bai, J.; Dou, L.; Li, X.; Ma, X.; Lu, F.; Han, Z. Evolution Laws of Stress–Energy and Progressive Damage Mechanisms of Surrounding Rock Induced by Mining Disturbance. Appl. Sci. 2023, 13, 7759.
  67. Li, Y.; Guo, S.; Zheng, B.; Zou, Y.; Song, S.; Zhang, Y.; Yan, J.; Waqar, M.F.; Zada, K.; Qi, S.; et al. Analysis of Ancient Rongcharong Landslide Dam Failure Events in the Suwalong Reach of the Upper Reaches of the Jinsha River. J. Earth Sci. 2025, 36, 2005–2022.
  68. Xiao, H.; Guo, G.; Liu, W. Hazard Degree Identification and Coupling Analysis of the Influencing Factors on Goafs. Arab. J. Geosci. 2017, 10, 68.
  69. Yan, Z.; Su, C.; Zhang, G.; Xu, Z.; Yang, X.; Zhang, C.; Sheng, C. Analysis on the Distribution and Formation Mechanism of Ground Collapse in Gypsum Mining Area in Yingcheng of Hubei Province. Chin. J. Geol. Hazard Control 2025, 36, 57–64. (In Chinese)
  70. Luo, Y.; Yang, L.; Xing, Y. Mechanisms of Karst Ground Collapse Under Groundwater Fluctuations: Insights from Physical Model Test and Numerical Simulation. Water 2025, 17, 3588.
  71. He, X.; Lang, Q.; Zhang, J.; Zhang, Y.; Jin, Q.; Xu, J. Interpretable Machine Learning for Explaining and Predicting Collapse Hazards in the Changbai Mountain Region. Sensors 2025, 25, 1512.
  72. Tao, G.; Guo, L.; Xiao, H.; Chen, Q.; Nimbalkar, S.; Feng, S.; Wu, Z. Assessment of Vegetation Cover and Rainfall Infiltration Effects on Slope Stability. Appl. Sci. 2025, 15, 9831.
  73. Didan, K. MOD13A3 MODIS/Terra Vegetation Indices Monthly L3 Global 1km SIN Grid V006 [Data Set]; NASA Land Processes Distributed Active Archive Center: Sioux Falls, SD, USA, 2015.
  74. GB 18306-2015; Seismic Ground Motion Parameters Zonation Map of China (Scale 1:4,000,000). Standards Press of China: Beijing, China, 2015.
  75. Qi, S. Geological Structure Database of Qinghai Tibet Plateau; National Tibetan Plateau/Third Pole Environment Data Center: Beijing, China, 2022.
  76. Qi, S. Engineering Geological Petrofabric Database of Qinghai Tibet Plateau; National Tibetan Plateau/Third Pole Environment Data Center: Beijing, China, 2021; Available online: https://data.tpdc.ac.cn/zh-hans/data/34828cc5-11ec-4e1f-916d-86d5598f09bb (accessed on 5 February 2026).
Figure 1. Location map of the study area in the Turpan–Hami Basin.
Figure 2. Flowchart of GBDT-LR coupled model for hazard assessment.
Figure 3. Groundwater level map of the main administrative divisions in the Turpan–Hami Basin.
Figure 4. Kernel density distribution map of ground collapse disaster points in the Turpan–Hami Basin.
Figure 5. Distribution maps of contributing factors in the Turpan–Hami Basin. (a) Elevation distribution map; (b) slope distribution map; (c) aspect distribution map; (d) rainfall distribution map; (e) NDVI distribution map; (f) distance to goaf distribution map; (g) distance to water system distribution map; (h) distance to fault distribution map; (i) PGA distribution map; (j) lithology type distribution map; (k) mining intensity and depth distribution map; (l) distance to road distribution map.
Figure 6. Ring chart of disaster-causing factors.
Figure 7. Hazard zoning map based on the GBDT-LR model.
Figure 8. ROC curves: (a) GBDT-LR ROC curve; (b) LR ROC curve.
Table 1. Preliminary screening results of models.
| Tier | Model | Mean AUC | AUC Std Dev |
| --- | --- | --- | --- |
| First Tier | RF | 0.9507 | 0.0123 |
| First Tier | XGBoost | 0.9498 | 0.0089 |
| First Tier | GBDT-LR Coupled | 0.9447 | 0.0163 |
| Second Tier | SVM with RBF Kernel | 0.9157 | 0.0236 |
| Third Tier | WoE-LR Coupled | 0.8945 | 0.0173 |
| Third Tier | IV-LR Coupled | 0.8945 | 0.0173 |
| Third Tier | EWM-LR Coupled | 0.8945 | 0.0173 |
| Third Tier | NaiveBayes-LR Coupled | 0.8945 | 0.0173 |
Table 2. Classification of lithology.
| Rock Category | Uniaxial Saturated Compressive Strength, UCS (MPa) | Representative Rock Types |
| --- | --- | --- |
| Hard Rock Formation | UCS > 60 | Unweathered or slightly weathered granite, gneiss, diorite, quartzite, limestone or conglomerate with siliceous cementation. |
| Harder Rock Formation | 60 ≥ UCS > 30 | Slightly weathered hard rocks; unweathered or slightly weathered welded tuff, marble, dolomite, limestone, slate, sandstone with calcareous cementation; magmatic rocks with relatively coarse crystalline grains, etc. |
| Weaker Rock Formation | 30 ≥ UCS > 15 | Highly weathered hard rocks; slightly weathered hard rocks; sandstone and conglomerate with calcareous cementation. |
| Weak Rock Formation | 15 ≥ UCS > 5 | Schist; phyllite; mudstone; coal; sandstone and conglomerate with argillaceous cementation. |
| Loose Rock Formation | UCS ≤ 5 | Completely weathered rocks of various types, poorly lithified rocks, and Quaternary loose deposits. |
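The strength thresholds in Table 2 can be expressed as a simple lookup. The following is a minimal sketch that maps a uniaxial saturated compressive strength value (in MPa) to its rock category; the function name is hypothetical, and the boundary handling follows the inequalities in the table (for example, exactly 60 MPa falls in the harder rock class).

```python
def rock_category(ucs_mpa):
    """Return the Table 2 rock category for a UCS value given in MPa."""
    if ucs_mpa < 0:
        raise ValueError("UCS must be non-negative")
    if ucs_mpa > 60:
        return "Hard Rock Formation"
    if ucs_mpa > 30:
        return "Harder Rock Formation"
    if ucs_mpa > 15:
        return "Weaker Rock Formation"
    if ucs_mpa > 5:
        return "Weak Rock Formation"
    return "Loose Rock Formation"


# Hypothetical sample values, chosen to include the class boundaries.
for value in (75.0, 60.0, 30.0, 15.0, 5.0):
    print(value, rock_category(value))
```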
Table 3. Comprehensive performance metrics of different models.
| Model | AUC | Accuracy | Recall | Precision | F1 Score | Kappa Coefficient |
| --- | --- | --- | --- | --- | --- | --- |
| LR | 0.813 | 0.757 | 0.819 | 0.729 | 0.771 | 0.514 |
| GBDT-LR Coupled | 0.871 | 0.810 | 0.876 | 0.773 | 0.821 | 0.619 |
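The metrics in Table 3 are linked by standard definitions, which allows a quick consistency check. The sketch below recomputes the F1 score from the reported precision and recall, and recomputes Cohen's kappa from the reported accuracy. The kappa shortcut assumes an evenly balanced set of collapse and non-collapse samples; that balance is an assumption made here for illustration and is not stated in the table. The function names are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def kappa_balanced(accuracy):
    """Cohen's kappa for two classes, assuming a 50/50 class split.

    With balanced true classes the chance agreement is 0.5 regardless of
    how the predictions are distributed, so kappa = (acc - 0.5) / 0.5.
    """
    expected = 0.5
    return (accuracy - expected) / (1 - expected)


# Values reported in Table 3.
reported = {
    "LR": {"accuracy": 0.757, "recall": 0.819, "precision": 0.729},
    "GBDT-LR Coupled": {"accuracy": 0.810, "recall": 0.876, "precision": 0.773},
}

for name, m in reported.items():
    f1 = round(f1_score(m["precision"], m["recall"]), 3)
    kappa = round(kappa_balanced(m["accuracy"]), 3)
    print(name, f1, kappa)
```

Under the balance assumption, the recomputed F1 scores agree with the reported values to three decimals, and the recomputed kappa values agree with the reported ones to within 0.001, a gap that rounding of the reported accuracy can account for.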
Table 4. The coefficients of the LR component in the GBDT-LR coupled model.
| Disaster-Causing Factor (Ranked by Factor Weight in this Study) | Comprehensive LR Coefficient | Impact Direction | Number of Matched High-Order Features |
| --- | --- | --- | --- |
| Elevation-related | −0.001 | Inhibitory | 435 |
| Distance to goaf-related | 0.167 | Promotive | 222 |
| Rainfall-related | −0.006 | Inhibitory | 157 |
| Slope-related | −0.002 | Inhibitory | 325 |
| NDVI-related | −0.002 | Inhibitory | 381 |
| Aspect-related | 0.000 | Promotive | 386 |
| Lithology × mining intensity interaction | 0.029 | Promotive | 14 |
| Other auxiliary factors | 0.033 | Promotive | 49 |
Note: “Number of Matched High-Order Features” represents the total number of high-order features automatically extracted and matched by the GBDT module for each original disaster-causing factor in the GBDT-LR coupled model, which are finally input into the LR module for training. High-order features are nonlinear combined features generated by GBDT to characterize the complex nonlinear relationship between each factor and ground collapse. A larger value indicates a more complex nonlinear correlation between the corresponding factor and ground collapse, while a smaller value indicates a relatively simple nonlinear relationship.
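To make the idea of GBDT-generated high-order features in the note to Table 4 more concrete, the sketch below shows one common way of coupling GBDT and LR: each sample is routed through every tree, the index of the leaf it reaches is one-hot encoded, and the concatenated encoding is passed to a logistic regression. This is not necessarily the implementation used in this study. The two single-split trees, their thresholds, the feature names, and the LR weights are all hypothetical stand-ins for a trained model.

```python
import math


# Two hypothetical single-split trees standing in for a trained GBDT.
# Each maps a sample (a dict of factor values) to a leaf index.
def tree_goaf(sample):
    # Leaf 0: within 500 m of a goaf (hypothetical threshold); leaf 1: farther.
    return 0 if sample["dist_goaf_m"] < 500 else 1


def tree_interaction(sample):
    # Leaf 0: weak geology-mining coupling; leaf 1: strong coupling.
    return 0 if sample["litho_x_mining"] < 0.5 else 1


TREES = [tree_goaf, tree_interaction]
LEAVES_PER_TREE = [2, 2]

# Hypothetical LR parameters, one weight per leaf indicator.
WEIGHTS = [1.2, -1.2, -0.8, 0.8]
BIAS = 0.0


def leaf_one_hot(sample):
    """Concatenate one-hot encodings of the leaf reached in each tree."""
    features = []
    for tree, n_leaves in zip(TREES, LEAVES_PER_TREE):
        leaf = tree(sample)
        features.extend(1.0 if i == leaf else 0.0 for i in range(n_leaves))
    return features


def collapse_probability(sample):
    """Logistic regression applied to the leaf-indicator features."""
    x = leaf_one_hot(sample)
    z = BIAS + sum(w * xi for w, xi in zip(WEIGHTS, x))
    return 1.0 / (1.0 + math.exp(-z))


near = {"dist_goaf_m": 200, "litho_x_mining": 0.9}
far = {"dist_goaf_m": 3000, "litho_x_mining": 0.1}
print(leaf_one_hot(near), collapse_probability(near))
print(leaf_one_hot(far), collapse_probability(far))
```

Because each leaf corresponds to a conjunction of split conditions along its path, a leaf indicator acts as a combined, nonlinear feature, while the final LR layer keeps the coefficients interpretable.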
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wang, T.; Jin, C.; Liang, N.; Li, Y.; Song, S.; Ying, J.; Zhao, Y.; Zheng, B. Distribution Characteristics and Hazard Assessment of Ground Collapse in the Mining Activity Areas of the Turpan–Hami Basin. Appl. Sci. 2026, 16, 3354. https://doi.org/10.3390/app16073354
