Article

Integrating SBAS-InSAR and Machine Learning for Enhanced Landslide Identification and Susceptibility Mapping Along the West Kunlun Highway

1
School of Traffic and Transportation Engineering, Xinjiang University, Urumqi 830017, China
2
Xinjiang Key Laboratory of Green Construction and Maintenance of Transportation Infrastructure and Intelligent Traffic Control, Urumqi 830017, China
3
School of Mechanical Engineering, Xinjiang University, Urumqi 830046, China
*
Authors to whom correspondence should be addressed.
Appl. Sci. 2026, 16(1), 120; https://doi.org/10.3390/app16010120
Submission received: 22 November 2025 / Revised: 16 December 2025 / Accepted: 17 December 2025 / Published: 22 December 2025
(This article belongs to the Special Issue Geological Disasters: Mechanisms, Detection, and Prevention)

Abstract

Landslide risk assessment along high-altitude transportation corridors is critical for infrastructure resilience. This study presents an integrated framework combining Small Baseline Subset Interferometric Synthetic Aperture Radar (SBAS-InSAR) deformation data and machine learning (ML) to systematically identify and assess landslide susceptibility along the entire 245.5 km West Kunlun Highway. We first compiled a landslide inventory through visual interpretation and SBAS-InSAR analysis. Subsequently, fourteen causative factors were selected to construct and compare six ML models: random forest (RF), K-nearest neighbours (KNN), artificial neural network (ANN), gradient boosting decision trees (GBDT), support vector machine (SVM), and logistic regression (LR). Research findings indicate that along the Hotan–Kangziva Highway in the Western Kunlun Mountains, there exist 21 potential risk points for small-scale landslides, 12 for medium-scale landslides, and 5 for large-scale landslides, with hazard identification accuracy reaching 80%. The random forest model demonstrated outstanding performance, classifying areas with 5.10%, 4.55% and 4.96% probability as extremely high, high and medium susceptibility, respectively. This work provides a robust methodology and a high-accuracy assessment tool for landslide risk management in the data-scarce Western Kunlun Mountains.

1. Introduction

1.1. Background

The Xinjiang Uygur Autonomous Region of China is vast in expanse, covering a total area of approximately 1.66 million square kilometres. Its topography is characterised by the distinctive “three mountain ranges flanking two basins” configuration, with the Western Kunlun Mountains forming the southernmost natural barrier of the region. From a spatial perspective, the main body of this mountain range lies at the junction of Xinjiang and Tibet. To the northwest, it connects with the Pamir Plateau tectonic belt; to the southwest, it links with the Karakoram orogenic belt; to the east, it transitions into the Central Kunlun Mountains; and to the south, it forms a geological contact with the Northern Tibetan Plateau. Geologically, its core structural unit—the Western Kunlun intermediate uplift zone—predominantly comprises Precambrian metamorphic formations and Hualixi-period granitic rocks. Its northern flank is governed by the Kegang Fault and the Karak Rock-Crust Fault Zone, while its southern margin is bounded by the Kangxiwa Super-Rock-Crust Fault [1]. The highly developed and complex geological structures of the Kunlun Mountains render the region geologically fragile, creating conditions conducive to geological hazards such as rockfalls, landslides, and debris flows [2]. Consequently, the long-distance highway corridors traversing the Kunlun region remain exposed to the risk of slope-related disasters. For instance, landslides along the Kunlun Pass-Qingshuihe section of the Qinghai–Tibet Plateau are widely dispersed, posing threats over extended stretches of the route and presenting high natural disaster risks [3]. However, current research on landslide hazard identification and susceptibility assessment within China has predominantly focused on the southwestern region [4,5,6,7,8] and the China–Pakistan Economic Corridor [9,10,11], with studies on high-altitude mountainous areas remaining relatively scarce. 
As a high-altitude, long-distance strategic corridor linking Xinjiang and Tibet, the Western Kunlun Mountains present a critical challenge: developing a suitable technical framework for landslide susceptibility assessment in this region. Such an approach would provide essential reference points for researchers and road management authorities.

1.2. Brief Literature Review and Summary

Susceptibility assessment holds significant importance as a crucial means for preventing and mitigating regional geological hazards. Traditional methods for assigning weights to assessment indicators have primarily focused on subjective weighting approaches, which rely on expert judgement to determine indicator weights. Examples include the analytic hierarchy process [12,13,14] and the evidence weighting method [15,16,17]. However, assessment methods based on subjective weights may lead to substantial discrepancies in the weights assigned by experts, thereby causing bias in the results. In recent years, the emergence of machine learning techniques has enabled researchers to derive more objective weightings. For landslide hazard identification and risk assessment, researchers typically commence by screening conditioning factors and optimising dataset samples to ensure assessment reliability [18]. Subsequently, comparing and selecting evaluation methods and models yields results better suited to the study area. Commonly employed methods and models include Support Vector Machines (SVM) [19,20,21,22], Random forests (RFs) [21,23,24,25], extreme gradient boosting (XGB) [22,26], artificial neural networks (ANNs) [27,28], and logistic regression (LR) [21,26,29]. These models effectively mitigate the interference of subjective weighting, though their precision largely depends on mathematical quantification methods, the dataset of the study area, and the selection of conditioning factors. For instance, Wei Yingdong et al. combined random forest (RF), logistic regression (LR), and gradient boosting decision tree (GBDT) models with interferometric synthetic aperture radar (InSAR) technology to propose an enhanced landslide vulnerability assessment method [21]. 
For landslide-prone study areas with sparse or no samples, combining unsupervised learning strategies with few-shot learning methods and feature-based domain adaptation approaches has been demonstrated to enhance the transferability of landslide susceptibility models [30,31]. Hong Haoyuan compared the performance of five ensemble models in landslide susceptibility modelling, selecting 15 environmental factors for evaluation. Results confirmed that NDVI, rock properties, and elevation are critical factors in the tested models [32]. Therefore, exploring a more comprehensive and holistic vulnerability assessment model to enhance predictive accuracy and applicability holds significant practical importance for the fields of geotechnical engineering and geological hazard management.
In summary, existing studies on landslide susceptibility assessment primarily focus on enhancing evaluation accuracy from multiple dimensions, including assessment models, study area datasets, and evaluation factor selection, providing valuable insights for researchers. However, current research lacks the development of integrated assessment framework models. Furthermore, studies on the regional applicability of machine learning-based susceptibility assessments have predominantly centred on areas such as Southwest China and the China–Pakistan Economic Corridor. Research on high-altitude regions, such as the Western Kunlun Mountains, remains scarce.
To address the aforementioned issues, this study focuses on Hotan–Kangziva Highway in the Western Kunlun Mountains of Xinjiang, China. It innovatively combines visual interpretation with SBAS-InSAR technology, screening 14 assessment factors to systematically construct six mainstream machine learning susceptibility assessment models (SVM, RF, KNN, ANN, LR, and GBDT). A systematic model screening and validation process is provided, culminating in corresponding prevention and mitigation measures and recommendations. This study has adapted and enhanced a specific landslide hazard assessment system for high-altitude mountainous regions, thereby filling a gap in systematic research on landslide hazards along highways in the Western Kunlun Mountains for the first time. It provides a viable approach for disaster mitigation along mountainous transport corridors.

2. Overview of the Study Area

This case study examines the construction project for the Hotan–Kangziva Highway within China’s Xinjiang Uygur Autonomous Region. The route connects with the S210 Aksai Chin Desert Highway and ultimately links to the G219 National Highway, with a total length of 245.481 km (Figure 1). The study area is located on the northern slope of the Kunlun Mountains, at the northwestern edge of the Qinghai–Tibet Plateau, and has complex geological conditions. It spans two major tectonic units: the Western Kunlun Fold Belt and the Tarim Platform. Intense tectonic movements have folded and uplifted the rock layers in the Kunlun mountain troughs and formed faults. As a result, landslides are frequent [33], particularly after earthquakes, when loose materials and unstable slopes are prone to further failure. Long-term slope failures have severely impacted local economic development and threatened the safety of residents.

3. Methods

3.1. Establishment of a Landslide Prone Area Evaluation System

To effectively identify landslide hazards along the Hotan–Kangzwa Highway corridor and assess their susceptibility, this study employs optical remote sensing imagery and SBAS-InSAR technology for landslide detection. Fourteen conditioning factors were screened, and six models (SVM, RF, KNN, ANN, LR, and GBDT) were established to evaluate the accuracy of susceptibility assessment. Corresponding landslide prevention and mitigation measures and recommendations were ultimately proposed. The specific methodological workflow, illustrated in Figure 2, comprises three key technical stages: Stage 1: monitoring ground deformation using optical remote sensing imagery and SBAS-InSAR technology to identify detailed and reliable slope deformation zones; Stage 2: screening key assessment indicators and constructing six machine learning prediction frameworks to evaluate susceptibility assessment accuracy; Stage 3: analysing susceptibility assessment results to propose targeted prevention and mitigation strategies for different regions.

3.2. Accuracy of Landslide Hazard Identification

To further determine whether deformation hazard zones detected via SBAS-InSAR within the study area correspond to landslide zones, this research employed optical remote sensing imagery from the Sentinel-2 satellite. By interpreting characteristic landslide indicators in the optical imagery, a detailed landslide identification analysis was conducted on the calibrated deformation hazard zones. The investigation also covered landslides and rockfalls along the highway and in its surrounding areas. Accuracy was tested using confusion matrices, with accuracy (ACC) serving as the key metric: a higher value indicates more reliable identification.
$$\mathrm{ACC} = \frac{TP + TN}{Total} \times 100\%$$
where $TP$ denotes true positives, $TN$ denotes true negatives, and $Total$ represents all samples.
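As a minimal sketch of the accuracy calculation, the following Python function computes ACC from confusion-matrix counts. The function name and the example counts are illustrative assumptions and are not values from this study.

```python
def accuracy_percent(tp: int, tn: int, fp: int, fn: int) -> float:
    """Return ACC = (TP + TN) / Total * 100, with Total = TP + TN + FP + FN."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total * 100.0


# Hypothetical counts, chosen only to illustrate the arithmetic.
example_acc = accuracy_percent(tp=45, tn=40, fp=10, fn=5)
print(f"ACC = {example_acc:.1f}%")
```

Because ACC counts both correctly identified landslide zones and correctly rejected non-landslide zones, it should be read alongside the full confusion matrix when the two classes are strongly imbalanced.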

3.3. Method for Correlation Analysis of Conditioning Factors

The Pearson correlation coefficient (PCC), formally described as the Pearson product-moment correlation coefficient, represents a parametric statistical measure that quantifies linear dependence between two continuous variables [34]. This dimensionless index assumes values within the closed interval [−1, +1], where both the sign and magnitude convey critical information about the linear relationship’s directional tendency and effect size [35]. The coefficient’s computation involves covariance normalisation through the product of standard deviations from both variables, as expressed in the following mathematical formula:
$$r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}$$
$r = 1$ indicates a perfect positive correlation, meaning there is a perfect positive linear relationship between the two variables. $r = -1$ indicates a perfect negative correlation, meaning the two variables have a perfect negative linear relationship. $r = 0$ indicates no linear correlation, meaning there is no linear relationship between the two variables. In practical applications, the Pearson correlation coefficient is often used to assess the linear relationship between variables. For example, in evaluating the susceptibility of geological disasters, the Pearson correlation coefficients between different factors can be calculated to check their independence, so that highly correlated factors are not selected together and the evaluation results remain scientifically valid and reliable.
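The coefficient can be computed directly from its definition. The sketch below is a plain-Python illustration; the function name and the sample values are assumptions introduced here and do not come from the study's dataset.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance normalised by the product of standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    denom = math.sqrt(var_x * var_y)
    if denom == 0:
        raise ValueError("a constant variable has no defined correlation")
    return cov / denom


# Hypothetical factor values for four sample cells (illustrative only).
slope = [5.0, 12.0, 25.0, 38.0]
relief = [10.0, 30.0, 55.0, 90.0]
print(round(pearson_r(slope, relief), 3))
```

In a factor-screening workflow, this function would be applied to every pair of conditioning factors, and pairs with a large absolute coefficient would be examined for redundancy.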
Multicollinearity diagnostics evaluate intercorrelations among independent variables in regression analysis, enabling researchers to eliminate redundant predictors and improve the precision of parameter estimation [36]. Before model construction, systematic assessment of variable associations is crucial to identify and remove highly correlated factors.
Multicollinearity diagnostic parameters [37]: (1) Tolerance (TOL): tolerance is the share of the variance of a variable that is not explained by the other variables, that is, one minus the coefficient of determination obtained when that variable is regressed on the others. When TOL is greater than 0.1, the conditioning factors are considered independent of each other and there is no severe collinearity issue [33]; (2) Variance inflation factor (VIF): defined as the reciprocal of tolerance (VIF = 1/TOL), this metric quantifies the inflation of coefficient variance caused by variable interdependencies [38]. VIF values below 10 indicate negligible collinearity, permitting variable retention in regression models [39].
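The relationship between the two diagnostics can be sketched as follows. The helper names are hypothetical, the coefficient of determination is assumed to have been obtained beforehand by regressing one factor on the remaining factors, and the thresholds are the ones quoted above (TOL > 0.1, VIF < 10).

```python
def tolerance_and_vif(r_squared: float):
    """Return (TOL, VIF) for a factor whose regression on the other factors gives r_squared."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must lie in [0, 1)")
    tol = 1.0 - r_squared   # TOL = 1 - R^2
    vif = 1.0 / tol         # VIF = 1 / TOL
    return tol, vif


def passes_collinearity_check(r_squared: float, tol_min: float = 0.1, vif_max: float = 10.0) -> bool:
    """Apply the screening thresholds: keep the factor if TOL > 0.1 and VIF < 10."""
    tol, vif = tolerance_and_vif(r_squared)
    return tol > tol_min and vif < vif_max


# Hypothetical R^2 values, for illustration only.
for r2 in (0.20, 0.50, 0.95):
    tol, vif = tolerance_and_vif(r2)
    print(f"R^2={r2:.2f}  TOL={tol:.2f}  VIF={vif:.2f}  keep={passes_collinearity_check(r2)}")
```

Since VIF is the reciprocal of TOL, the two thresholds describe the same boundary (TOL = 0.1 corresponds to VIF = 10), so reporting both is a matter of convention rather than an additional test.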

3.4. Machine Learning Model

3.4.1. Support Vector Machine (SVM)

Support vector machines (SVMs) represent a classical supervised machine learning algorithm extensively employed in landslide susceptibility prediction [40]. Its core principle involves mapping input data into a high-dimensional feature space via kernel functions, then identifying the optimal hyperplane within this space that maximises the inter-class margin. This approach enables modelling and forecasting the non-linear relationship between geological environmental factors and landslide occurrence. In achieving global optimisation, SVM solves a convex quadratic programming problem to simultaneously minimise classification error and maximise the geometric margin between support vectors and the hyperplane [41]. Its objective function is defined as follows:
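$$\begin{aligned} \min_{\omega, b, \xi} \quad & \frac{1}{2}\|\omega\|_2^2 + C\sum_{i=1}^{l} \xi_i \\ \text{s.t.} \quad & y_i\left(\langle \omega, \varphi(x_i) \rangle + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, 2, \ldots, l \end{aligned}$$
where $C$ is the penalty factor, $x_i$ is the feature vector of the $i$-th sample, $y_i$ is the sample label, $l$ is the number of training samples, $\varphi$ is the feature mapping function, $\omega$ is the normal vector of the hyperplane, $b$ is the bias term, and $\xi_i$ is the slack variable.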
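To make the roles of the penalty factor and the slack variables concrete, the sketch below evaluates the soft-margin objective for a given hyperplane. It assumes a linear kernel (the feature mapping is the identity) and the conventional sign for the bias term; the function name and the toy samples are hypothetical.

```python
def soft_margin_objective(w, b, samples, labels, c):
    """Evaluate 0.5 * ||w||^2 + C * sum(slack) for a linear soft-margin SVM.

    Each slack is the smallest value satisfying y_i * (<w, x_i> + b) >= 1 - slack_i
    and slack_i >= 0, i.e. max(0, 1 - y_i * (<w, x_i> + b)).
    """
    slacks = []
    for x, y in zip(samples, labels):
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        slacks.append(max(0.0, 1.0 - margin))
    regulariser = 0.5 * sum(wi * wi for wi in w)
    return regulariser + c * sum(slacks), slacks


# Toy data: labels are +1 (landslide) and -1 (non-landslide); values are illustrative.
toy_samples = [[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]]
toy_labels = [1, -1, 1]
objective, slacks = soft_margin_objective([1.0, 0.0], 0.0, toy_samples, toy_labels, c=1.0)
print(objective, slacks)
```

Only the third sample lies inside the margin and therefore receives a non-zero slack; increasing C would penalise that violation more heavily relative to the margin width.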

3.4.2. Random Forest (RF)

Random forest (RF) is an ensemble learning algorithm based on decision trees [42], employed for prediction and classification tasks before and after disasters such as landslides. Multiple training subsets are generated through bootstrap sampling, with a decision tree trained independently for each subset. During node splitting, the optimal splitting feature is selected from a randomly chosen subset of features. This dual randomisation mechanism for both data and features enhances the diversity of individual decision trees, effectively suppressing overfitting. RF aggregates the predictions of all decision trees through majority voting or averaging (classification predictions favour the majority category, while regression predictions take the mean of each tree’s forecast) [43]. This integrates the predictive capabilities of multiple decision trees, thereby enhancing the model’s accuracy in forecasting the spatial distribution of landslides. During the splitting process, for each selected feature, the decision tree uses information gain to determine the optimal split. The information gain is calculated as follows:
$$IG(x_k, t) = H(t) - \sum_{i} \frac{|D_i|}{|D|} H(D_i)$$
In Equation (4), $IG(x_k, t)$ denotes the information gain obtained by splitting the current node $t$ using the feature $x_k$, where $H(t)$ is the information entropy of node $t$ and $D_i$ is a subset generated by splitting on feature $x_k$. The summation runs over all child subsets $D_i$ generated by the split, where $|D_i|$ and $|D|$ represent the number of samples in subset $D_i$ and in the original node $t$, respectively, and $H(D_i)$ is the entropy of that subset. The algorithm selects the feature that maximises $IG$ for the split decision.
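The information gain of a candidate split can be sketched in a few lines of Python. The function names and the toy labels are assumptions for illustration; entropy is computed with base-2 logarithms, which is a common convention but is not stated in the text.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a non-empty list of class labels."""
    n = len(labels)
    if n == 0:
        raise ValueError("labels must not be empty")
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent_labels, child_label_groups):
    """IG = H(parent) - sum(|D_i| / |D| * H(D_i)) over the non-empty child subsets."""
    n = len(parent_labels)
    weighted_child_entropy = sum(
        len(group) / n * entropy(group) for group in child_label_groups if group
    )
    return entropy(parent_labels) - weighted_child_entropy


# Toy node: 1 = landslide, 0 = non-landslide (illustrative labels only).
parent = [1, 1, 0, 0]
print(information_gain(parent, [[1, 1], [0, 0]]))  # split separates the classes
print(information_gain(parent, [[1, 0], [1, 0]]))  # split leaves the classes mixed
```

A split that separates the two classes completely recovers all of the parent entropy, whereas a split that leaves each child as mixed as the parent yields no gain.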

3.4.3. K-Nearest Neighbours (KNN)

K-nearest neighbours (KNN) is an instance-based learning algorithm employed in landslide susceptibility assessment to perform classification predictions based on the proximity of samples in the feature space of geological environmental factors. For a sample requiring prediction, the k nearest samples in the training set are identified using a distance metric (such as Euclidean distance); majority voting over the category labels of these k samples then determines the landslide category of the predicted sample [44]. Through neighbour selection and voting, KNN can capture complex non-linear relationships among geological features.
For distance measurement, suppose there are two sample points $U = (u_1, \ldots, u_n)$ and $V = (v_1, \ldots, v_n)$; for $1 \le i \le n$, the Euclidean distance between $U$ and $V$ is calculated as follows:
$$d_{ED}(U, V) = \sqrt{\sum_{i=1}^{n} (u_i - v_i)^2}$$
In Equation (5), $U$ and $V$ represent two sample vectors in the multi-dimensional feature space defined by the landslide conditioning factors. Their components $u_i$ and $v_i$ correspond to the normalised values of the $i$-th conditioning factor for each sample. $d_{ED}(U, V)$ is the Euclidean distance between them, serving as the similarity metric for the KNN algorithm.
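The neighbour search and majority vote can be illustrated with a short, dependency-free sketch. The function names, the value of k, and the normalised toy samples are assumptions made for this example only.

```python
import math
from collections import Counter


def euclidean_distance(u, v):
    """Euclidean distance between two equally long feature vectors."""
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))


def knn_predict(train_x, train_y, query, k=3):
    """Label the query by majority vote among its k nearest training samples."""
    order = sorted(range(len(train_x)),
                   key=lambda i: euclidean_distance(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy training set with two normalised factors; 1 = landslide, 0 = non-landslide.
train_x = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9], [0.85, 0.75]]
train_y = [0, 0, 1, 1, 1]
print(knn_predict(train_x, train_y, [0.8, 0.8], k=3))
print(knn_predict(train_x, train_y, [0.15, 0.15], k=3))
```

Because the vote depends on raw distances, the conditioning factors need to be on comparable scales, which is why the components are described as normalised values.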

3.4.4. Artificial Neural Network (ANN)

Artificial neural networks (ANNs) constitute a multi-layered, non-linear supervised learning model that emulates the signal transmission mechanisms of biological neurons. They excel at processing complex, non-linear geological environmental data and can uncover latent disaster mechanisms. When handling intricate datasets, ANNs employ a multi-layer perceptron structure to effectively model the complex mapping relationships between multi-source environmental factors—such as topography and geology—and the probability of landslide occurrence [45]. Its commonly used formula is as follows:
$$y_i = f\left(\sum_{j=1}^{n} w_{ij} x_j + b_i\right)$$
where $y_i$ is the output of the $i$-th hidden layer neuron, $f$ is the activation function, $n$ is the number of input layer neurons, $w_{ij}$ is the weight from the $j$-th input layer neuron to the $i$-th hidden layer neuron, $x_j$ represents the input from the $j$-th neuron, and $b_i$ is the bias value for the $i$-th hidden layer neuron.
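A single hidden-layer forward pass following this formula might look like the sketch below. The sigmoid activation, the function names, and the weights are assumptions chosen for illustration; the text does not state which activation function was used.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic activation, mapping any real input to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def hidden_layer_output(weights, biases, inputs, activation=sigmoid):
    """Compute y_i = f(sum_j w_ij * x_j + b_i) for every hidden neuron i.

    weights[i][j] is the weight from input neuron j to hidden neuron i.
    """
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Two hidden neurons fed by three normalised input factors (illustrative values).
weights = [[0.4, -0.2, 0.1], [-0.3, 0.5, 0.2]]
biases = [0.0, 0.1]
print(hidden_layer_output(weights, biases, [0.5, 0.8, 0.3]))
```

Stacking several such layers, with the outputs of one layer used as the inputs of the next, gives the multi-layer perceptron structure described above.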

3.4.5. Gradient Boosting Tree (GBDT)

Gradient boosting decision tree (GBDT) is an ensemble learning algorithm based on the Boosting framework. In landslide and other disaster assessments, it constructs highly accurate landslide prediction models by capturing complex non-linear relationships among various environmental and geological factors. Its core logic involves generating multiple weak decision trees, each fitted to the residuals of the preceding model, progressively correcting bias. The predictions from all weak trees are then weighted and aggregated to produce the output of a strong learner. Key parameters include the number of trees, the depth of each tree, and the learning rate. Optimising these parameters directly influences the model’s ability to characterise the complex non-linear relationships among landslide factors.
Let the training samples be $x_i$. The initial model $F_0$ is as follows:
$$F_0(x_i) = \arg\min_{c} \sum_{i=1}^{n} L(y_i, c)$$
Here, $x_i$ represents the training samples and $F_0(x_i)$ denotes the initial prediction of the GBDT model for the training sample $x_i$. The term $\arg\min$ finds the constant value $c$ that minimises the summation of the loss function $L(y_i, c)$ across all samples, where $y_i$ is the true label. This constant $c$ serves as the starting point for the subsequent boosting iterations. By using the generated decision tree to fit the gradient descent direction of the loss function, the optimal fitting value $r_j$ for the $j$-th round is obtained:
$$r_j = \arg\min \sum_{i=1}^{n} L\left(y_i, F_{j-1}(x_i) + h_j(x_i)\right)$$
In the presented formulation, Equation (7) defines the optimisation process for the $j$-th iteration, where $r_j$ represents the optimal parameters for the weak learner $h_j(x_i)$. This learner is fitted by minimising the loss between the true label $y_i$ and the combined prediction of the previous model $F_{j-1}(x_i)$ and $h_j(x_i)$ itself, which is equivalent to fitting the pseudo-residuals. Subsequently, Equation (8) expresses the final model $F_M(x)$ as the summation of the ensemble model states $F_{j-1}(x)$ from all $M$ iterations. It is noted that this formulation effectively aggregates the contributions of all weak learners built during the boosting process. The loss function is used to update the model, and the final prediction is calculated as follows:
$$F_M(x) = \sum_{j=1}^{M} F_{j-1}(x)$$
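The boosting loop can be illustrated with a deliberately small sketch. It rests on several assumptions that are not taken from the text: a squared-error loss (so the pseudo-residual is simply the true value minus the current prediction and the optimal initial constant is the mean), one-split regression trees (stumps) on a single feature as weak learners, and the standard additive form in which the final prediction is the initial constant plus the learning-rate-scaled sum of the weak learners. All function names and data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree (stump) to the residuals by least squares."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # the largest value cannot split the data
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all feature values identical: predict the mean residual
        mean = sum(residuals) / len(residuals)
        return xs[0], mean, mean
    return best[1], best[2], best[3]


def fit_gbdt(xs, ys, n_trees=50, learning_rate=0.1):
    """Gradient boosting with squared-error loss on a single feature."""
    f0 = sum(ys) / len(ys)  # constant that minimises the squared-error loss
    predictions = [f0] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # For squared error, the pseudo-residuals are y - F_{j-1}(x).
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return {"f0": f0, "learning_rate": learning_rate, "stumps": stumps}


def predict_gbdt(model, x):
    """Initial constant plus the shrunken sum of all stump outputs."""
    correction = sum(lv if x <= t else rv for t, lv, rv in model["stumps"])
    return model["f0"] + model["learning_rate"] * correction


# Toy data: one factor value per sample, label 1 = landslide (illustrative only).
model = fit_gbdt([1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 1.0, 1.0])
print(predict_gbdt(model, 1.0), predict_gbdt(model, 4.0))
```

The three parameters named above appear directly in the sketch: `n_trees` sets the number of boosting rounds, the stump fixes the tree depth at one, and `learning_rate` controls how much of each correction is applied per round.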

3.4.6. Logistic Regression (LR)

Logistic regression (LR) is a classical linear probabilistic classification model employed in disaster assessments such as landslides to establish quantitative relationships between multiple geological environmental factors—including topography, lithology, and rainfall—and the probability of landslide occurrence. Its core methodology involves constructing a linear combination of independent variables, which is then mapped through a sigmoid function to convert the linear score into a probability value between 0 and 1, thereby representing the likelihood of landslide occurrence within a specific area of the study region. The key algorithm involves parameter optimisation based on maximum likelihood estimation. Through iterative solution, the model determines the prediction probabilities for landslide and non-landslide samples, thereby establishing the weighting coefficients for each factor within the linear combination. Specifically, the probability of geological disaster occurrence is formulated as follows:
$$\log\left(\frac{P}{1 - P}\right) = a_0 + a_1 X_{1j} + a_2 X_{2j} + \cdots + a_n X_{nj}$$
Here, $P$ denotes the probability of landslide occurrence. On the right-hand side, $a_0$ is the intercept, and $a_1, a_2, \ldots, a_n$ are the coefficients of the logistic regression model corresponding to the $n$ selected conditioning factors $X_{1j}, X_{2j}, \ldots, X_{nj}$. $X_{nj}$ represents the value of the $n$-th factor at the $j$-th level.
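Inverting the logit gives the probability itself, which is what is mapped in a susceptibility assessment. The sketch below shows that step; the function name, the intercept, and the coefficients are hypothetical and are not fitted values from this study.

```python
import math


def landslide_probability(intercept, coefficients, factor_values):
    """Map the linear score a_0 + sum(a_k * X_k) to a probability with the sigmoid."""
    if len(coefficients) != len(factor_values):
        raise ValueError("one coefficient is required per conditioning factor")
    logit = intercept + sum(a * x for a, x in zip(coefficients, factor_values))
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical coefficients for two factors (illustrative only).
p = landslide_probability(-1.0, [0.8, -0.5], [2.0, 1.0])
print(round(p, 3))
```

A positive coefficient raises the log-odds of landslide occurrence as the factor value increases, and a negative coefficient lowers them, which is what makes the fitted coefficients interpretable as factor weights.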

4. Evaluation Indicator Selection and Analysis

4.1. Data Collection and Processing

In evaluating geological disaster susceptibility, multi-source data are acquired in different ways and geological disaster points occur at different times, making it challenging to ensure that all conditioning factors refer to the same period. Geological faults and elevation do not change significantly in the short term, whereas the normalised difference vegetation index (NDVI) and rainfall may vary significantly between years. Therefore, this study sought to avoid large time gaps between data sources during data collection. In addition, the dataset includes information such as road classification and existing national disaster sites, aiming to provide reference material for highway management authorities and policymakers. The specific sources of the multi-source data are as follows:
Geological disaster data: The geological disaster point data used in this paper were accumulated up to 2023 and come from the Resource and Environmental Science and Data Center (https://www.resdc.cn/Default.aspx (accessed on 30 June 2025)).
Seismic zone and geological fault data: The seismic zone and geological fault data used in this paper are from the Digital Geological Map Space Database (https://www.cgs.gov.cn/ (accessed on 30 June 2025)) of the China Geological Survey in 2003.
NDVI data: The NDVI data used in this paper are from the Landsat8 OLI TIRS satellite digital product in August 2023, downloaded from the Geospatial Data Cloud (http://www.gsclound.cn/ (accessed on 30 June 2025)).
Transportation network and water system data: The data for national highways, expressways, provincial highways, county roads and water systems were downloaded from the Water Economy Universal Map Downloader and date from 2020.
Elevation data: The elevation data used in this paper are derived from NASA’s 12.5 m elevation DEM, which is downloaded from the Water Economy Universal Map Downloader.
Rainfall data: The rainfall data used in this paper come from the National Earth System Science Data Center (http://www.geodata.cn (accessed on 30 June 2025)), part of the National Science and Technology Infrastructure Platform, with a resolution of 1 km × 1 km. Since the geological disaster site data were accumulated over multiple years, this study uses the average annual rainfall for Hotan City from 2020 to 2023 to analyse the spatial distribution characteristics of geological disasters along the Hotan–Kangxiwa Highway.
Geological lithology: The geological lithology data used in this paper are from the Geographical Remote Sensing Ecology Network (http://gisrs.cn/ (accessed on 30 June 2025)).

4.2. Selection and Analysis of Evaluation Indicators

Based on existing data and extensive literature research, combined with the selection principles for geological hazard susceptibility conditioning factors, fourteen factors were screened: distance from roads, slope gradient, slope aspect, curvature, geology, NDVI, distance from faults, distance from watercourses, undulation, rainfall, land use, topography, soil properties, and rock properties (Figure 3). Correlation analysis indicates these factors are independent, meeting the criteria for conditioning factor selection. Specifically, distance from roads and distance from faults reflect the impact of human engineering activities on the geological environment; slope gradient, aspect, and curvature describe variations in topography and landforms, while geology, rock properties, and soil reveal the composition and characteristics of subsurface materials. The NDVI, as an indicator of vegetation cover, reflects surface stability, whereas rainfall and distance from watercourses are closely related to hydrological conditions. Land use and topography reveal surface utilisation patterns and natural forms. Building upon this research, a geological hazard susceptibility conditioning factor system for the Hotan–Kangxiwa Highway has been established. Comprehensive analysis and processing of these factors enable more scientifically grounded prediction and assessment of geological hazard occurrence, providing reliable scientific basis for prevention and emergency management. Furthermore, this system can guide risk assessment and decision-making during highway design and construction processes, thereby enhancing road safety and sustainability.

4.2.1. Distance from the Road

To accurately assess the impact of road construction on the surrounding geological environment, this study employed ArcMap 10.8.x software’s distance tool to calculate the distance between geological hazard sites and the nearest road. The results were categorised into five intervals: (1) 0–500 m, (2) 500–1000 m, (3) 1000–1500 m, (4) 1500–2000 m, and (5) over 2000 m.
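The reclassification into five distance intervals can be sketched outside a GIS as a simple lookup. The constant and function names are hypothetical, and the handling of values that fall exactly on a break (assigned here to the lower class) is an assumption, since the interval boundaries are not specified as open or closed.

```python
import bisect

# Upper bounds of classes 1-4 in metres; class 5 covers everything above 2000 m.
ROAD_DISTANCE_BREAKS_M = [500, 1000, 1500, 2000]


def classify_distance(distance_m: float, breaks=ROAD_DISTANCE_BREAKS_M) -> int:
    """Return the 1-based class index of a distance, using (lower, upper] intervals."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return bisect.bisect_left(breaks, distance_m) + 1


# Hypothetical distances from hazard sites to the nearest road, in metres.
for d in (120.0, 500.0, 750.0, 2600.0):
    print(d, classify_distance(d))
```

The same helper could be reused for any other factor that is divided by fixed breaks, by passing a different list of break values.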

4.2.2. Gradient

To quantitatively analyse the driving mechanisms of terrain gradients on landslide hazards, this study employed the Natural Breaks method to construct a terrain gradient spectrum, dividing the study area into six characteristic gradient intervals: (1) 0–10°, (2) 10–20°, (3) 20–30°, (4) 30–40°, (5) 40–50°, and (6) greater than 50°.

4.2.3. Slope Direction

To investigate the relationship between aspect and geological hazard occurrence in detail, this study employed ArcMap 10.8.x software to extract aspect maps for the study area. Aspect data were reclassified into eight primary directional intervals: (1) North (0–22.5°), (2) Northeast (22.5–67.5°), (3) East (67.5–112.5°), (4) Southeast (112.5–157.5°), (5) South (157.5–202.5°), (6) Southwest (202.5–247.5°), (7) West (247.5–292.5°), and (8) Northwest (292.5–337.5°). Additionally, the final northward interval extends from 337.5° to 360°, ensuring complete coverage of the azimuth circle. Generally, areas with lower light levels exhibit stronger soil retention capacity due to higher vegetation coverage. However, in certain circumstances, these regions may also face increased landslide risks owing to the added weight of vegetation.
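The eight-direction reclassification, including the wrap-around of the northern sector, can be expressed compactly. This is a sketch with hypothetical names; assigning a value that lies exactly on a sector boundary to the next sector clockwise is an assumption, and flat cells (which ArcGIS aspect rasters encode as -1) are not handled here.

```python
ASPECT_CLASSES = [
    "North", "Northeast", "East", "Southeast",
    "South", "Southwest", "West", "Northwest",
]


def classify_aspect(aspect_deg: float):
    """Return (class index 1-8, class name) for an aspect in degrees from north."""
    if not 0.0 <= aspect_deg <= 360.0:
        raise ValueError("aspect must lie in [0, 360]; flat cells need separate handling")
    # Shifting by 22.5 degrees aligns each 45-degree sector with a multiple of 45,
    # so that 337.5-360 and 0-22.5 both fall into the first (North) sector.
    index = int(((aspect_deg + 22.5) % 360.0) // 45.0)
    return index + 1, ASPECT_CLASSES[index]


for angle in (10.0, 90.0, 200.0, 350.0):
    print(angle, classify_aspect(angle))
```

Treating the two northern sub-intervals as one class keeps the number of categories at eight while still covering the full azimuth circle.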

4.2.4. Curvature

In this study, ArcMap 10.8.x software was employed to calculate the topographic curvature of the study area. Based on the results, curvature values were categorised into three intervals: (1) negative curvature (<0°), (2) zero curvature (0°), and (3) positive curvature (>0°). These classifications reflect variations in terrain undulation. Areas of negative curvature typically serve as sites for water convergence and material accumulation, potentially increasing erosion and landslide risks. Regions of zero curvature exhibit relatively flat topography, presenting a lower probability of geological hazards. Conversely, areas of positive curvature manifest as protruding terrain features. Whilst possessing favourable drainage characteristics, such areas remain susceptible to water erosion under conditions of high rainfall.

4.2.5. Geology

Based on lithological characteristics, thickness, and fault distribution within the strata, this study categorised the stratigraphic data of the study area to identify strata types exhibiting varying degrees of stability. These strata types were further subdivided into distinct intervals to facilitate in-depth analysis. Statistical analysis was conducted by overlaying this stratigraphic data with known geological hazard sites, aiming to reveal the relationship between distinct stratigraphic characteristics and the frequency of geological hazard occurrence.

4.2.6. NDVI

This study employs the normalised difference vegetation index (NDVI) as a key parameter for assessing landslide hazard potential. Utilising the natural breakpoint method, the study area is categorised into five characteristic vegetation cover levels: (1) less than 0, (2) 0–0.25, (3) 0.25–0.5, (4) 0.5–0.75, and (5) 0.75–1. This classification enables more precise analysis and evaluation of the relationship between vegetation coverage and the probability of geological hazard occurrence.

4.2.7. Distance from Fault

In this study, the Distance Analysis tool within ArcMap 10.8.x software was employed to calculate the precise distances from various points within the study area to the nearest fault. Based on this distance data, the study area was further subdivided into five zones: (1) 0–1 km, (2) 1–2 km, (3) 2–3 km, (4) 3–4 km, and (5) over 4 km. Each interval represents areas at varying distances from the fault line, with regions closer to the fault anticipated to exhibit a higher probability of geological hazards occurring.

4.2.8. Distance from Watercourse

In this study, the Distance tool within ArcMap 10.8.x software was employed to calculate the distance from different locations within the study area to the nearest watercourse. These distance results were then reclassified into five intervals: (1) 0–500 m, (2) 500–1000 m, (3) 1000–1500 m, (4) 1500–2000 m, and (5) over 2000 m. This classification facilitates a more nuanced analysis of watercourse influence on geological hazard susceptibility.

4.2.9. Amplitude

In this study, the terrain analysis tools within ArcMap 10.8.x software were employed to calculate the topographic undulation within the study area. Based on these calculations, the terrain undulation was categorised into five intervals: (1) 0–25 m, (2) 25–50 m, (3) 50–75 m, (4) 75–100 m, and (5) over 100 m. These classifications aim to precisely depict the spatial distribution of how varying terrain undulation affects geological hazards.

4.2.10. Rainfall

This study employed the equidistant method to categorise rainfall within the study area, enabling precise assessment of geological hazard occurrence frequency under varying precipitation conditions. Precipitation levels were categorised into six intervals: (1) 0–25 mm, (2) 25–50 mm, (3) 50–75 mm, (4) 75–100 mm, (5) 100–125 mm, and (6) exceeding 125 mm. This classification facilitated a detailed analysis of the relationship between rainfall intensity and geological hazard occurrence.

4.2.11. Land Use

This study employed the equidistance method to delineate land use types within the study area, enabling a detailed analysis of how different land use patterns influence geological hazard susceptibility. Land use types were categorised into six primary classes: (1) water bodies, (2) cultivated land, (3) built-up areas, (4) bare land, (5) glaciers, and (6) grasslands. This classification facilitates a clearer understanding of geological hazard occurrence patterns across diverse land use contexts.

4.2.12. Landform

Landform types constitute one of the key factors in analysing geological hazard susceptibility, as they directly reflect the physical morphology and geological structure of the Earth’s surface. These characteristics are crucial in influencing surface processes and geological activity. Landform types encompass mountains, hills, plains, basins, and plateaus. Each type, owing to its unique topographical characteristics and formation processes, exerts differing influences on hydrological dynamics, vegetation cover, soil erosion, and human activities, thereby shaping the occurrence patterns and frequency of geological hazards. In geological hazard research, accurately understanding and classifying landform types enables scientists and decision-makers to better assess regional geological hazard risks, formulate more effective disaster prevention and mitigation measures, and inform land use and urban planning.

4.2.13. Soil

Soil type, as one of the key factors in analysing geological hazard susceptibility, plays a significant role in determining soil stability, permeability, and erosion rates. Different soil types possess distinct physical and chemical properties, which influence water retention and drainage capacity, soil particle cohesion, and the support provided for vegetation growth. These factors indirectly affect the occurrence of geological hazards such as landslides, soil erosion, and debris flows.

4.2.14. Rock Type

Lithology constitutes a critical factor that cannot be overlooked in the assessment of geological hazard susceptibility. This is because distinct lithologies, owing to their unique chemical and physical properties—such as mineral composition, structure, and texture—determine how rocks respond to environmental factors including climate and surface water.

5. Results

5.1. Landslide Hazard Identification Results

This study utilised SBAS-InSAR technology to detect 15 deformation hazard zones. Through optical remote sensing image analysis, 23 landslide areas, 6 potential landslide zones, and 14 deformation zones caused by other factors were confirmed, as shown in Table 1. The identification accuracy of landslide hazards calculated based on Equation (1) in Table 1 was 80%. To further validate the hazard points identified through visual interpretation and SBAS-InSAR analysis, researchers conducted detailed field surveys, as illustrated in Figure 4. Table 2 presents the comparative list of hazard points documented during field investigations. Figure 5 depicts the spatial distribution of identified landslide areas, potential landslide zones, and other deformation zones. It is worth noting that the InSAR monitoring period in this study spans 2022–2023, whereas the land use and road data originate from 2020. Following meticulous field investigations in July 2023, researchers observed that no new roads were constructed between 2020 and 2023 due to adverse geological conditions and inadequate preparation for highway engineering projects. Consequently, the temporal discrepancy between land use and road data does not impact the final susceptibility assessment outcomes.
This study used SBAS-InSAR technology to monitor the surface deformation of the Hotan–Kangxiwa Highway construction project, with the monitoring period running from January 2022 to December 2023. The research area along the highway has complex topography and geomorphology, especially at several key points, such as near reservoirs and in Hotan City. The InSAR results show that the rate of surface deformation along the satellite line of sight (LOS) varied between sections. The overall LOS deformation rate ranged from −163 mm/yr to 49 mm/yr, indicating diverse surface deformation. Green areas in Figure 6 represent stable regions, while red areas indicate deformed regions.
The area around Hotan City and its counties is relatively stable, with slight deformation within small areas, which may be related to local construction activities and groundwater dynamics. The section from Wulawati Reservoir to Toman shows significant deformation activity, especially in the areas surrounding the reservoir, possibly due to changes in water level and their impact on the surrounding geological structure. The surface deformation rate is higher from Pangnazi Reservoir to Puxia District, indicating potential groundwater flow or other geological activity. The final section, from Kangxiwa to the connection with the G219 National Highway, is largely stable, although localised areas show minor deformation.
Monitoring results indicate active surface deformation along the Hotan–Kangxiwa Highway in certain areas, particularly around reservoirs. It is recommended that detailed geological surveys and risk assessments be conducted in regions with significant deformation to implement appropriate preventive measures, ensuring the long-term stability and safety of highway projects. Additionally, regular InSAR monitoring can effectively track trends in surface deformation, providing scientific evidence for timely disaster warnings.
Among the 15 deformation hazard areas detected by SBAS-InSAR technology, 23 landslide areas, six potential landslide areas and 14 deformation areas caused by other reasons were confirmed through optical remote sensing image analysis.

5.2. Correlation Analysis of Conditioning Factors

The lower the Pearson correlation coefficient, the weaker the correlation. When the VIF values for multicollinearity are all below 10 and the TOL values are all significantly greater than 0.1 [23,32], this indicates that these condition factors are mutually independent. Following careful screening and removal of evaluation factors that did not meet the requirements, the final 14 evaluation factors (Figure 7 and Table 3) were deemed suitable for model construction.
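The screening thresholds above can be illustrated with a small sketch. It covers only the simplified case of two conditioning factors, where the R² obtained by regressing one factor on the other equals the squared Pearson coefficient, so that TOL = 1 − r² and VIF = 1/TOL. With 14 factors, each VIF would instead come from regressing one factor on all the others. The function names and the sample values are hypothetical and are not taken from the study's data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def tolerance_and_vif(r):
    """TOL and VIF for the two-factor case, where R^2 = r^2."""
    tol = 1.0 - r ** 2
    return tol, 1.0 / tol


def passes_screening(r, vif_max=10.0, tol_min=0.1):
    """Apply the VIF < 10 and TOL > 0.1 thresholds quoted in the text."""
    tol, vif = tolerance_and_vif(r)
    return vif < vif_max and tol > tol_min


# Hypothetical values for two factors sampled at the same five cells.
slope_deg = [5.0, 12.0, 20.0, 31.0, 44.0]
relief_m = [10.0, 20.0, 45.0, 50.0, 90.0]
r = pearson_r(slope_deg, relief_m)
print(round(r, 3), tolerance_and_vif(r), passes_screening(r))
```

Under these thresholds a pair of factors with |r| of about 0.95 or more would be flagged, since 1 − 0.95² is already below 0.1.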

5.3. Establishment of a Vulnerability Assessment Model and Parameter Determination

5.3.1. Support Vector Machine (SVM)

The study area was divided into 61,207 grid cells, each with a resolution of 10 m × 10 m. Of these, 7307 grid cells experienced landslides. To address sample imbalance, random downsampling was employed to construct a balanced dataset, matching positive samples (7307 landslide cells) with negative samples (7307 non-landslide cells) at a 1:1 ratio. Both positive and negative samples were assigned binary classification labels (1/0), forming a training matrix comprising 14,614 observation units. Based on the dataset of 14,614 grid cells (landslides and non-landslides), the data were partitioned into a training set (10,230 units) and a validation set (4384 units) at a 7:3 ratio to ensure model generalisation capability validation. Model parameters were optimised using the training set, with model performance evaluated via the validation set’s confusion matrix. The final step involved inputting the global spatial data into the model to generate a susceptibility probability field. This continuous probability value was then segmented into five risk-level intervals using the natural breakpoint classification method.
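The sample construction described above (random downsampling to a 1:1 ratio, followed by a 7:3 split) can be sketched as follows. The labels are synthetic placeholders that reproduce only the cell counts quoted in the text. The function name `balanced_split` and the random seed are assumptions, and the split shown is a simple random one, so the class balance within each subset is only approximate.

```python
import random


def balanced_split(labels, train_fraction=0.7, seed=42):
    """Downsample negatives to match positives, then split the indices 7:3."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    # Random downsampling: keep as many non-landslide cells as landslide cells.
    kept_negatives = rng.sample(negatives, len(positives))
    selected = positives + kept_negatives
    rng.shuffle(selected)
    n_train = round(train_fraction * len(selected))
    return selected[:n_train], selected[n_train:]


# Synthetic labels with the counts quoted in the text:
# 61,207 grid cells in total, of which 7307 are landslide cells.
labels = [1] * 7307 + [0] * (61207 - 7307)
train_idx, val_idx = balanced_split(labels)
print(len(train_idx), len(val_idx))  # 10230 4384
```

The resulting subset sizes match the 10,230 training units and 4384 validation units reported for the 14,614-cell balanced dataset.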

5.3.2. Random Forest (RF)

The classification principle is analogous to that of support vector machines, but utilises a dataset comprising 61,207 grid cells selected from both landslide and non-landslide areas, with 7307 grid cells representing landslides. Parameter settings are detailed in Table 4, while experimental results are presented in Table 5.

5.3.3. K-Nearest Neighbours (KNN)

The number of neighbours, k, is a crucial parameter in constructing the KNN model. To determine the optimal k value, this study compared the impact of different k values on model performance. The experimental results in Figure 8 indicate that the model achieves its highest accuracy when k = 14; that is, the model correctly classifies the highest proportion of cases at this k value, giving the best overall performance.
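The search over k can be illustrated with a minimal sketch: a plain KNN classifier is scored on a held-out set for each candidate k, and the value with the highest accuracy is kept. The data below are synthetic two-factor points generated purely for illustration, and the function names, candidate range and tie-breaking rules are assumptions, not the study's configuration.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )[:k]
    # On a tied vote, Counter returns the class that was encountered first,
    # i.e. the tied class whose nearest member is closest to x.
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def select_k(train_X, train_y, val_X, val_y, candidates):
    """Return the candidate k with the highest validation accuracy."""
    scores = {}
    for k in candidates:
        hits = sum(
            knn_predict(train_X, train_y, x, k) == y for x, y in zip(val_X, val_y)
        )
        scores[k] = hits / len(val_y)
    # If several k values tie, the first candidate tested is kept.
    best = max(scores, key=scores.get)
    return best, scores


# Synthetic data: two overlapping clusters standing in for the two classes.
rng = random.Random(1)


def make_points(n, centre, label):
    return [((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label) for _ in range(n)]


data = make_points(60, 0.0, 0) + make_points(60, 2.0, 1)
rng.shuffle(data)
train, val = data[:84], data[84:]  # 7:3 split of 120 points
train_X, train_y = zip(*train)
val_X, val_y = zip(*val)
best, scores = select_k(train_X, train_y, val_X, val_y, range(1, 22, 2))
print(best, scores[best])
```

Odd candidate values are used here only to limit tied votes in a two-class problem; the k = 14 reported in the text shows that even values were also considered in the study.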

5.3.4. Artificial Neural Network (ANN)

The model parameters and experimental results are presented in Table 6. Accuracy denotes the proportion of correctly predicted samples relative to the total sample size, with higher values being preferable. Precision represents the proportion of positive predictions that are actually positive samples, where higher values are desirable. Recall indicates the proportion of positive samples correctly predicted as positive, with higher values being advantageous. The F1-score represents a composite evaluation metric synthesising precision and recall, calculated as their harmonic mean. While both precision and recall are desirable, they often exhibit conflicting trade-offs. Consequently, the F1-score is frequently employed to comprehensively assess classifier performance. Its value ranges from 0 to 1, with values closer to 1 indicating superior performance. However, evaluation must be contextualised to specific scenarios, and it is commonly used to gauge improvements in model effectiveness. Experimental results indicate that the final model achieved an accuracy of 90.59% on the test set, with a precision (composite) of 90.49%, a recall (composite) of 90.59%, and an F1-score (composite) of 0.90. The model demonstrates favourable performance.
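The four metrics defined above can be computed directly from the counts in a binary confusion matrix. The sketch below is illustrative only. The counts are hypothetical rather than the values behind Table 6, and the "composite" figures reported in the text may be averages over both classes, which this binary version does not reproduce.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 for the positive (landslide) class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 8 true positives, 2 false positives,
# 4 false negatives and 6 true negatives.
metrics = classification_metrics(tp=8, fp=2, fn=4, tn=6)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts precision (0.80) exceeds recall (about 0.67), and the F1-score (about 0.73) lies between the two but closer to the lower value, which is the behaviour expected of a harmonic mean.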

5.3.5. Gradient Boosting Decision Tree (GBDT)

The parameter settings and evaluation results of the gradient-boosted tree model are presented in Table 7. Experimental findings indicate that the final model achieved an accuracy of 91.69%, precision (overall) of 91.92%, recall (overall) of 91.69%, and an F1-score (overall) of 0.92 on the test dataset. The model demonstrates favourable performance.

5.3.6. Logistic Regression (LR)

This study employed stratified random sampling to process landslide data within the study area, dividing 248 historical landslide sites into training and validation sets at a 7:3 ratio. Following iterative parameter optimisation, results indicated that the significance test values (Sig.) for most variables fell below the 0.01 threshold, confirming the model’s statistical significance. Regression coefficient analysis indicates that logarithmically transformed coefficient values exhibit a positive correlation with disaster occurrence probability. Among all evaluated factors, water systems contributed most significantly, while the slope parameter within topographic elements demonstrated the highest contribution (Table 8). This corroborates the close association between slope stability and shear stress distribution—as slope angles increase, soil shear stress exceeding critical values triggers slippage. However, when topographic gradient surpasses specific thresholds, the absence of a free-face actually reduces the probability of instability. The foregoing analysis provides further corroboration for the reliability of the assessment results.

5.4. Results and Accuracy Evaluation of Landslide Susceptibility Assessment

This study integrated the original landslide catalogue data for the Hotan–Kangxiwa Highway with landslide disaster points identified using SBAS-InSAR deformation monitoring, resulting in 248 disaster points. The study area was divided into 61,207 grid cells, each with a resolution of 10 m × 10 m, for the landslide susceptibility analysis. Among these, landslide disaster points occupied 7307 grid cells. The other, non-landslide data were resampled to the same resolution using the ArcMap resampling tool for subsequent data processing, statistics, and analysis.
Given that the study area encompasses a 2 km buffer zone on either side of the Hotan–Kangxiwa Highway, all “landslide” samples were first identified during the landslide hazard identification process. “Non-landslide” samples were randomly selected within the widened buffer zone along the highway. To account for the influence of proximity effects, this study excluded areas within 500 m of identified landslide samples before selecting these non-landslide samples from the widened buffer research zone. In the constructed model, landslide grid cells are considered positive samples (marked as “1”), while non-landslide units serve as negative samples (marked as “0”). The model aims to output a landslide susceptibility index between 0 and 1, which reflects the probability of landslide occurrence.
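The negative-sampling rule described above (random selection, excluding locations within 500 m of a known landslide) can be sketched with simple rejection sampling. This is an illustration under stated assumptions: coordinates are taken to be projected and expressed in metres, a rectangle stands in for the highway buffer zone, and the function name, seed and sample locations are hypothetical.

```python
import random
from math import hypot


def sample_non_landslide(landslide_xy, bounds, n_samples,
                         exclusion_m=500.0, seed=0, max_tries=100_000):
    """Draw random points at least exclusion_m away from every landslide."""
    xmin, ymin, xmax, ymax = bounds
    rng = random.Random(seed)
    samples = []
    tries = 0
    while len(samples) < n_samples:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place enough non-landslide samples")
        x, y = rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)
        # Reject candidates that fall inside any 500 m exclusion circle.
        if all(hypot(x - lx, y - ly) >= exclusion_m for lx, ly in landslide_xy):
            samples.append((x, y))
    return samples


# Hypothetical landslide locations inside a 4 km x 4 km window.
landslides = [(1000.0, 1000.0), (3000.0, 2500.0)]
negatives = sample_non_landslide(landslides, (0.0, 0.0, 4000.0, 4000.0), n_samples=20)
print(len(negatives))  # 20
```

In a GIS workflow the same effect would typically be obtained by erasing a 500 m buffer around the landslide polygons from the sampling area before generating random points.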
To train and verify the model, 70% of the selected sample data is used for training, while the remaining 30% is used to test the model’s predictive ability. Holding out part of the data in this way allows the model’s ability to generalise to unseen data to be checked.
The model’s output was processed using the natural break classification method, which is widely used in geographic information systems to maximise the differences between classes while minimising the variance within them. According to the natural break classification, the study area was divided into five levels: extremely low susceptibility, low susceptibility, moderate susceptibility, high susceptibility, and extremely high susceptibility (as shown in Figure 9). This classification helps identify and visualise the spatial distribution of landslide disasters and potential risk areas.
The quality of a model is typically evaluated by observing how closely its ROC curve approaches the upper left corner of the chart; the closer the curve is to the upper left corner, the better the model’s performance. Additionally, the AUC value is a quantitative measure, providing scores between 0 and 1, where 1 indicates perfect predictive ability and 0.5 represents random guessing. In our study, six models were tested: support vector machine (SVM), random forest (RF), K-nearest neighbours (KNN), artificial neural network (ANN), gradient boosting decision tree (GBDT), and logistic regression (LR). The AUC values for these models were 0.87, 0.98, 0.97, 0.96, 0.96, and 0.88, respectively (as shown in Figure 10). These results indicate that all models demonstrated good predictive capabilities for landslide susceptibility. Notably, the random forest (RF) model performed best, with an AUC value of 0.98, while the support vector machine (SVM), despite having the lowest AUC value (0.87), still exhibited practical predictive power. These findings have practical implications for selecting appropriate models for landslide susceptibility assessments, since the AUC value provides a single metric for evaluating the overall performance of each model.
Based on the results of the random forest (RF) model, the study area was divided into five susceptibility zones; the area of each zone and its proportion of the entire study area are shown in Figure 9. Most of the study area falls within the extremely low-risk zone, which accounts for 78.22% of the total and suggests a relatively low risk of geological disasters in these regions. The extremely high-risk and high-risk zones are smaller, accounting for 5.10% and 4.55% of the study area, respectively, but they warrant particular attention and may require enhanced monitoring and preventive measures to reduce potential disaster risks. The medium-risk and low-risk zones account for 4.96% and 7.16%, respectively. Although the probability of geological disasters in these areas is lower, they should not be overlooked, and appropriate early warning and planning measures are also necessary. This zoning can help the government and relevant departments to allocate resources and manage risks more effectively, to ensure that sufficient safety measures are taken in high-risk areas, and to optimise the utilisation and development strategies of low-risk areas.
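The AUC can be read as the probability that a randomly chosen landslide cell receives a higher susceptibility score than a randomly chosen non-landslide cell, with ties counted as one half. The sketch below computes it by direct pairwise comparison, which is adequate for small samples; the labels and scores are hypothetical and the function name is an assumption.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical susceptibility scores for two landslide and two stable cells.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, scores))  # 0.75
```

Three of the four positive-negative pairs are ordered correctly here, giving 0.75; a model that assigned every cell the same score would give 0.5.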

6. Discussion

This case study addresses the Hotan–Kangxiwa Highway construction project in the Western Kunlun Mountains, which is affected by adverse geological conditions, and establishes an integrated workflow for landslide hazard identification and susceptibility assessment. The approach combines optical remote sensing image interpretation and SBAS-InSAR technology with the screening of 14 evaluation factors and the use of machine learning models. It provides a technical reference for highway surveying and construction projects in the Western Kunlun Mountains and can help enhance the safety of transport infrastructure within the region.
Regarding the accuracy of susceptibility assessment models, the six machine learning models employed in this study demonstrated satisfactory evaluation outcomes (all AUC values exceeding 0.85) and are all suitable for application in susceptibility assessments within the Western Kunlun Mountains region [46]. Among these, the random forest, neural network, gradient boosting, and KNN models achieved AUC values exceeding 0.9. Research indicates that these four models may be particularly well-suited for landslide susceptibility assessment in the Western Kunlun Mountains [27]. It is noteworthy, however, that certain AUC values in this study (0.98 for the random forest model and 0.97 for the KNN model) are exceptionally high for natural modelling problems. This may stem from spatial autocorrelation leakage and does not necessarily indicate superior modelling performance over other models. Given that statistical evaluation of result discrepancies exceeds the scope of this study, future research will employ McNemar’s test and spatial cross-validation [47] to ascertain models’ true generalisation capabilities and compare the assessment performance of all six models.
Furthermore, to standardise dataset accuracy for model computations, this study prioritised time-coordinated, high-resolution datasets (e.g., 12.5 m digital elevation models) when collecting evaluation factors. However, owing to the harsh environmental conditions and limited monitoring infrastructure along the Hotan–Kangxiwa Highway in the Western Kunlun Mountains, existing datasets exhibit certain deficiencies in data completeness and precision. Consequently, this study acknowledges that upsampling low-resolution data to higher resolutions may introduce pseudo-accuracy, thereby affecting assessment precision. Yuli Wang et al. demonstrated that integrating the frequency ratio (FR) model into the Light Gradient Boosting Machine (LightGBM) model yields the composite FR-LightGBM model, a robust modelling approach capable of generating high-resolution landslide hazard maps for risk assessment [48]. Future research should adopt multi-resolution or sub-grid frameworks, such as employing scale-aware machine learning algorithms to explicitly model covariate error propagation; validation should utilise landslide polygons generated at the same resolution as the coarsest prediction factor used in forecasting [49,50].
During the data collection process for this study, the focus was on linking susceptibility indices with road classification information and existing national disaster location datasets, aiming to provide reference material for regional highway management authorities and policymakers [51]. To further integrate these findings into the National Geological Disaster Statistics Bureau’s outcomes, future research will advance the generation of three policy-use metrics: high-vulnerability road mileage, population within 2 km widening buffer zones, and county-level GDP within high-vulnerability zones. These metrics can directly inform future disaster mitigation budget prioritisation, enabling more scientifically grounded investment in mitigation policies and measures.

7. Conclusions

This study focuses on landslide hazard identification and susceptibility mapping along the Hotan–Kangxiwa Highway in the western Kunlun Mountains. Utilising optical remote sensing and interferometric synthetic aperture radar techniques, typical landslide hazards within the region were identified. Support vector machines (SVMs), random forest (RF), k-nearest neighbour (KNN), artificial neural network (ANN), gradient-boosted decision tree (GBDT), and logistic regression (LR) models were employed to evaluate regional landslide susceptibility. Optimised landslide susceptibility values were obtained by integrating velocity morphology analysis results with vulnerability assessment grades. The core conclusions of this study are as follows:
(1) The study employed optical remote sensing and InSAR techniques to process Sentinel-1A radar imagery, yielding time-series results of surface deformation within the study area of the Western Kunlun Mountains, China, from 2022 to 2023. Radar line-of-sight deformation rates ranged from −163 mm/yr to 49 mm/yr. The findings demonstrate that InSAR technology, owing to its high sensitivity, offers significant advantages for identifying geological hazards along roadways in the Western Kunlun Mountains.
(2) The study confirmed the technical feasibility of synergistically identifying potential landslide hazards along highways by integrating optical remote sensing and InSAR technologies. This study identified 21 potential small-scale landslide risk points, 12 medium-scale risk points, and 5 large-scale risk points along the Hotan–Kangxiwa Highway, achieving an accuracy rate of 80% in hazard identification. The research revealed that surface deformations ranging from subtle to pronounced all pose potential threats to highway engineering safety.
(3) To analyse environmental factors associated with landslide occurrence within the study area, fourteen evaluation factors were selected: road networks, elevation, slope aspect, slope gradient, curvature, geology, faults, NDVI, undulation, precipitation, land use, relative elevation difference, landform, soil, and lithology. Among these, water systems exhibited the highest importance in model evaluation. All six machine learning models employed in this study achieved favourable evaluation results (AUC > 0.85). Among these, the random forest (RF), K-nearest neighbours (KNN), artificial neural network (ANN), and gradient boosting decision tree (GBDT) models may be more suitable for landslide susceptibility assessment in the Western Kunlun Mountains (AUC > 0.9). Future research will continue to focus on further demonstrating and enhancing the models’ true generalisation capabilities to provide a more accurate and comprehensive landslide susceptibility assessment framework.
(4) Taking the random forest model as an example, a detailed analysis was conducted on the landslide susceptibility assessment results for the Hotan–Kangxiwa Highway in the Western Kunlun Mountains. The random forest model’s susceptibility assessment indicates areas classified as extremely prone, prone, moderately prone, less prone, and extremely less prone accounted for 5.10%, 4.55%, 4.96%, 7.16%, and 78.22% of the total area, respectively. This study emphasises core prevention and control measures tailored to different susceptibility zones: in high-risk areas, the focus lies on hazard avoidance and emergency evacuation planning; whereas in low-risk zones, the emphasis is on ecological conservation and data accumulation.

Author Contributions

Conceptualisation, X.D. and L.X.; methodology, X.D.; formal analysis, X.D.; investigation, L.X., X.S. and S.L.; resources, X.D.; data curation, X.D.; writing—original draft preparation, X.S., L.X. and D.H.; writing—review and editing, X.D., X.S., S.L. and D.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research is a part of the phase results of the Xinjiang Key R&D Program Projects (grant number: 2022B03033-1), the Xinjiang Uygur Autonomous Region “Dr. Tianchi” Project, National Natural Science Foundation of China Regional Project (grant number: 52562045), and the general project of the National Innovation Training Program of Xinjiang University (grant number: 202410755073).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Acknowledgments

We extend our gratitude to Fuerhaiti Ainiwaer of China Xinjiang Naba Expressway Development Co., Ltd., for his support of this research endeavour.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wang, L.; Xu, S.; Liu, X.; Wu, Z.; Chen, Y.; Wang, W. The characteristics and mechanism of earthquake disasters on permafrost sites induced by the west of Kunlun Mountaion Pass 8.1 earthquake in 2001. Cold Reg. Sci. Technol. 2024, 226, 104267.
  2. Zhang, Z.; Zhang, T.; Yu, X.; LÜ, Q.; Lai, R.; Jia, J.; Liu, X. Zonation of disaster environments of collapse, landslide and debris flow geologic hazards and their formation mechanisms in Xinjiang. J. Eng. Geol. 2023, 31, 1129–1144.
  3. Fei, D.; Liu, F.; Zhou, Q.; Chen, Q.; Wu, L. Risk analysis of landslide and debris flow disasters along the Qinghai-Tibet Railway. Arid Zone Geogr. 2016, 39, 345–352.
  4. Du, J.; Glade, T.; Woldai, T.; Chai, B.; Zeng, B. Landslide susceptibility assessment based on an incomplete landslide inventory in the Jilong Valley, Tibet, Chinese Himalayas. Eng. Geol. 2020, 270, 105572.
  5. Guo, C.; Xu, Q.; Dong, X.; Li, W.; Zhao, K.; Lu, H.; Ju, Y. Geohazard recognition and inventory mapping using airborne LiDAR data in complex mountainous areas. J. Earth Sci. 2021, 32, 1079–1091.
  6. Cai, J.; Zhang, L.; Dong, J.; Guo, J.; Wang, Y.; Liao, M. Automatic identification of active landslides over wide areas from time-series InSAR measurements using Faster RCNN. Int. J. Appl. Earth Obs. Geoinf. 2023, 124, 103516.
  7. Liang, R.; Dai, K.; Xu, Q.; Pirasteh, S.; Li, Z.; Li, T.; Wen, N.; Deng, J.; Fan, X. Utilizing a single-temporal full polarimetric Gaofen-3 SAR image to map coseismic landslide inventory following the 2017 Mw 7.0 Jiuzhaigou earthquake (China). Int. J. Appl. Earth Obs. Geoinf. 2024, 127, 103657.
  8. Jiang, S.; Li, J.; Ma, G.; Rezania, M.; Huang, J. Stochastic hazard assessment framework of landslide blocking river by depth-integrated continuum method and random field theory. Landslides 2025, 22, 393–411.
  9. Lin, K.; Jiapaer, G.; Yu, T.; Zhang, L.; Liang, H.; Chen, B.; Ju, T. Identification of potential landslides in the Gaizi Valley section of the Karakorum Highway coupled with TS-InSAR and landslide susceptibility analysis. Remote Sens. 2024, 16, 3653.
  10. Su, X.; Zhang, Y.; Meng, X.; Yue, D.; Ma, J.; Guo, F.; Zhou, Z.; Rehman, M.; Khalid, Z.; Chen, G.; et al. Landslide mapping and analysis along the China-Pakistan Karakoram Highway based on SBAS-InSAR detection in 2017. J. Mt. Sci. 2021, 18, 2540–2564.
  11. Chen, X.; Cui, P.; You, Y.; Cheng, Z.; Khan, A.; Ye, C.; Zhang, S. Dam-break risk analysis of the Attabad landslide dam in Pakistan and emergency countermeasures. Landslides 2017, 14, 675–683.
  12. Kavus, Y.; Taskin, A. Assessment of landslides induced by earthquake risk of Istanbul: A comprehensive study utilizing an integrated DFS-AHP and DFS-EDAS approach. Soil Dyn. Earthq. Eng. 2025, 191, 109285.
  13. Zhao, B.; Wang, Y.; Li, W.; Su, L.; Lu, J.; Zeng, L.; Li, X. Insights into the geohazards triggered by the 2017 Ms 6.9 Nyingchi earthquake in the east Himalayan syntaxis, China. CATENA 2021, 205, 105467.
  14. Costache, R.; Tin, T.; Arabameri, A.; Crăciun, A.; Ajin, R.S.; Costache, I.; Towfiqul Islam, A.; Abba, S.I.; Sahana, M.; Avand, M.; et al. Flash-flood hazard using deep learning based on H2O R package and fuzzy-multicriteria decision-making analysis. J. Hydrol. 2022, 609, 127747.
  15. Paul, G.; Alejandra, H. Landslide susceptibility index based on the integration of logistic regression and weights of evidence: A case study in Popayan, Colombia. Eng. Geol. 2021, 280, 105958.
  16. Khanna, K.; Martha, T.; Roy, P.; Kumar, K.V. Effect of time and space partitioning strategies of samples on regional landslide susceptibility modelling. Landslides 2021, 18, 2281–2294.
  17. Guo, Z.; Shi, Y.; Huang, F.; Fan, X.; Huang, J. Landslide susceptibility zonation method based on C5.0 decision tree and K-means cluster algorithms to improve the efficiency of risk management. Geosci. Front. 2021, 12, 101249.
  18. Guo, Z.; Tian, B.; Zhu, Y.; He, J.; Zhang, T. How do the landslide and non-landslide sampling strategies impact landslide susceptibility assessment?—A catchment-scale case study from China. J. Rock Mech. Geotech. Eng. 2024, 16, 877–894.
  19. Huang, Y.; Zhao, L. Review on landslide susceptibility mapping using support vector machines. CATENA 2018, 165, 520–529.
  20. Su, Y.; Chen, Y.; Lai, X.; Huang, S.; Lin, C.; Xie, X. Feature adaptation for landslide susceptibility assessment in “no sample” areas. Gondwana Res. 2024, 131, 1–17.
  21. Sun, D.; Gu, Q.; Wen, H.; Xu, J.; Zhang, Y.; Shi, S.; Xue, M.; Zhou, X. Assessment of landslide susceptibility along mountain highways based on different machine learning algorithms and mapping units by hybrid factors screening and sample optimization. Gondwana Res. 2023, 123, 89–106.
  22. Pyakurel, A.; Dahal, B.; Gautam, D. Does machine learning adequately predict earthquake induced landslides? Soil Dyn. Earthq. Eng. 2023, 171, 107994.
  23. Wei, Y.; Qiu, H.; Liu, Z.; Huangfu, W.; Zhu, Y.; Liu, Y.; Yang, D.; Kamp, U. Refined and dynamic susceptibility assessment of landslides using InSAR and machine learning models. Geosci. Front. 2024, 15, 101890.
  24. Merghadi, A.; Yunus, A.; Dou, J.; Whiteley, J.; ThaiPham, B.; Bui, D.; Avtar, R.; Abderrahmane, B. Machine learning methods for landslide susceptibility studies: A comparative overview of algorithm performance. Earth Sci. Rev. 2020, 207, 103225.
  25. Yang, C.; Liu, L.; Huang, F.; Huang, L.; Wang, X. Machine learning-based landslide susceptibility assessment with optimized ratio of landslide to non-landslide samples. Gondwana Res. 2023, 123, 198–216.
  26. Han, Y.; Semnani, S. Important considerations in machine learning-based landslide susceptibility assessment under future climate conditions. Acta Geotech. 2025, 20, 475–500.
  27. Feng, H.; Miao, Z.; Hu, Q. Study on the Uncertainty of Machine Learning Model for Earthquake-Induced Landslide Susceptibility Assessment. Remote Sens. 2022, 14, 2968.
  28. Al-Najjar, H.; Pradhan, B. Spatial landslide susceptibility assessment using machine learning techniques assisted by additional data created with generative adversarial networks. Geosci. Front. 2021, 12, 625–637.
  29. Meng, S.; Shi, Z.; Li, G.; Peng, M.; Liu, L.; Zheng, H.; Zhou, C. A novel deep learning framework for landslide susceptibility assessment using improved deep belief networks with the intelligent optimization algorithm. Comput. Geotech. 2024, 167, 106106.
  30. Wang, H.; Wang, L.; Zhang, L. Transfer learning improves landslide susceptibility assessment. Gondwana Res. 2023, 123, 238–254.
  31. Kong, L.; Feng, W.; Yi, X.; Xue, Z.; Bai, L. Enhanced landslide susceptibility mapping in data-scarce regions via unsupervised few-shot learning. Gondwana Res. 2025, 138, 31–46.
  32. Hong, H. Landslide susceptibility assessment using locally weighted learning integrated with machine learning algorithms. Expert Syst. Appl. 2024, 237, 121678.
  33. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning. In Springer Texts in Statistics; Springer: New York, NY, USA, 2013; Volume 103, pp. 1–426. [Google Scholar] [CrossRef]
  34. Mukaka, M.M. Statistics corner: A guide to appropriate use of correlation coefficient in medical research. Malawi Med. J. 2012, 24, 69–71. [Google Scholar] [PubMed]
  35. Hinkle, D.; Wiersma, W.; Jurs, S. Applied Statistics for the Behavioral Sciences, 5th ed.; Houghton Mifflin Harcourt: London, UK, 2003; Available online: http://catalog.hathitrust.org/api/volumes/oclc/50716608.html (accessed on 5 October 2025).
  36. Intarat, K.; Yoomee, P.; Hussadin, A.; Lamprom, W. Assessment of landslide susceptibility in the intermontane basin area of northern Thailand. Environ. Nat. Resour. J. 2024, 22, 158–170. [Google Scholar] [CrossRef]
  37. O’Brien, R. A Caution Regarding Rules of Thumb for Variance Inflation Factors. Qual. Quant. 2007, 41, 673–690. [Google Scholar] [CrossRef]
  38. Chen, W.; Yang, Z. Landslide susceptibility modeling using bivariate statistical-based logistic regression, naïve Bayes, and alternating decision tree models. Bull. Eng. Geol. Environ. 2023, 190. [Google Scholar] [CrossRef]
  39. Fox, J.; Monette, G. Generalized Collinearity Diagnostics. J. Am. Stat. Assoc. 1992, 87, 178–183. [Google Scholar] [CrossRef]
  40. Zhou, L.; Yan, P.; Li, X.; Liu, T.; Liu, Z.; Jia, W. Research on prediction model of high geothermal tunnels temperature based on CNN-SVM. Energy Build. 2025, 347, 116285. [Google Scholar] [CrossRef]
  41. Wang, B.; Qiu, W.; Hu, X.; Wang, W. A rolling bearing fault diagnosis technique based on recurrence quantification analysis and Bayesian optimization SVM. Appl. Soft Comput. 2024, 156, 111506. [Google Scholar] [CrossRef]
  42. Wang, G.; Zhou, H.; Hu, Q. Failure detection for deep-sea mining lifting systems based on a hybrid LSTM-RF model. Ocean Eng. 2025, 335, 121772. [Google Scholar] [CrossRef]
  43. Zhao, R. Intention recognition method for spatial non-cooperative target based on improved Random Forest. Adv. Space Res. 2025, in press. [Google Scholar] [CrossRef]
  44. Chen, X.; He, D.; Feng, Q.; Yang, X.; Luo, Q. Robust privacy-preserving KNN for smart healthcare with participant dropout resilience. J. Inf. Secur. Appl. 2025, 94, 104225. [Google Scholar] [CrossRef]
  45. Akbarpoor, S.; Rezazadeh, M.; Ghiassi, B.; Khayatian, F.; Poologanathan, K.; Sefat, H.; Corradi, M. A new bond-slip model for NSM FRP systems using cement-based adhesives through artificial neural networks (ANN). Constr. Build. Mater. 2024, 427, 136034. [Google Scholar] [CrossRef]
  46. Pham, B.T.; Pradhan, B.; Bui, D.T.; Prakash, I.; Dholakia, M.B. A Comparative Study of Different Machine Learning Methods for Landslide Susceptibility Assessment: A Case Study of Uttarakhand Area (India). Environ. Model. Softw. 2016, 84, 240–250. [Google Scholar] [CrossRef]
  47. Lu, F.; Zhang, G.; Wang, T.; Ye, Y.; Zhao, Q. Geographically Weighted Random Forest Based on Spatial Factor Optimization for the Assessment of Landslide Susceptibility. Remote Sens. 2025, 17, 1608. [Google Scholar] [CrossRef]
  48. Wang, Y.; Ling, Y.; Chan, T.O.; Awange, J. High-Resolution Earthquake-Induced Landslide Hazard Assessment in Southwest China Through Frequency Ratio Analysis and LightGBM. Int. J. Appl. Earth Obs. Geoinf. 2024, 131, 103947. [Google Scholar] [CrossRef]
  49. Petschko, H.; Brenning, A.; Bell, R.; Goetz, J.; Glade, T. Assessing the quality of landslide susceptibility maps – case study Lower Austria. Nat. Hazards Earth Syst. Sci. 2014, 14, 95–118. [Google Scholar] [CrossRef]
  50. Yang, Y.; Peng, S.; Huang, B.; Xu, D.; Yin, Y.; Li, T.; Zhang, R. Multi-Scale Analysis of the Susceptibility of Different Landslide Types and Identification of the Main Controlling Factors. Ecol. Indic. 2024, 168, 112797. [Google Scholar] [CrossRef]
  51. Hassani, H.; Marvian Mashhad, L.; Stewart, S.; MacFeely, S. Integrating GIS and Official Statistics Using GISINTEGRATION. AppliedMath 2025, 5, 166. [Google Scholar] [CrossRef]
Figure 1. Geographical location of the study area.
Figure 2. Research flowchart.
Figure 3. Spatial distribution of rating factors (slope, rainfall, soil, road distance, slope direction, curvature, geology, NDVI, fault distance, water system distance, undulation, land use, topography, and lithology).
Figure 4. Researchers conducting field investigations at potential hazard points on the roadside slope of the Hotan–Kangziva highway, which was under construction at the time (photographed on 13 July 2023).
Figure 5. Highway hazard identification diagram.
Figure 6. Average deformation rate of surface deformation.
Figure 7. Pearson correlation coefficient heat map.
Figure 8. Comparison of different nearest neighbour measures.
Figure 9. Model susceptibility diagram.
Figure 10. ROC curves of six models (SVM, RF, KNN, ANN, GBDT, and LR).
Table 1. Confusion matrix for SBAS-InSAR identification accuracy.
Category | SBAS-InSAR Detects Landslide | SBAS-InSAR Detects No Landslide
Optical remote sensing confirms landslide | 20 | 3
Optical remote sensing confirms no landslide | 5 | 12
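The counts in Table 1 can be turned into summary metrics with a few lines of arithmetic. The sketch below is illustrative only: it reads the four cells as 20, 3, 5 and 12 (the reading consistent with the 80% identification accuracy reported in the abstract) and treats optical remote sensing as the reference. The function name `confusion_metrics` is hypothetical and not part of the authors' workflow.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Return accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,   # share of all sites classified correctly
        "precision": tp / (tp + fp),     # share of InSAR detections that were confirmed
        "recall": tp / (tp + fn),        # share of confirmed landslides that InSAR detected
    }


# Counts as read from Table 1 (an interpretation of the table layout).
metrics = confusion_metrics(tp=20, fn=3, fp=5, tn=12)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under this reading, accuracy is (20 + 12) / 40 = 0.80, which matches the figure quoted in the abstract.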
Table 2. Comparison table of hazardous areas identified during field surveys.
Order Number | Disaster Type | Place | Interpretation Sign
1 | slide | Moyuhe channel in Laiika Township, Hotan County, Xinjiang Province | There are traces of fresh debris flows on the surface, and the channels contain large amounts of plant residue mixed with silt.
2 | hill-creep | Fujicun, Langru Township, Hotan County, Xinjiang Province | The gullies are apparent, and much silt and gravel are piled up in the riverbed. The surface soil is washed away in some areas, and the vegetation coverage is significantly reduced.
3 | hill-creep | Kumarat Village, Langru Township, Hotan County, Xinjiang Province | The direction of surface water flow was obvious, the soil was wet, sediment had been deposited, and the surface vegetation was eroded.
4 | hill-creep | Kumarat Village, Langru Township, Hotan County, Xinjiang Province | The channel is wide, the soil is loose, there are traces of running water on the surface, and large stones and branches are scattered.
5 | hill-creep | Kumarat Village, Langru Township, Hotan County, Xinjiang Province | Irregular grooves on the surface, mixed soil and stones, broken vegetation, and local water accumulation indicate strong surface erosion.
6 | hill-creep | Miti Zi Village, Langru Township, Hotan County, Xinjiang Province | The thick layer of silt on the ground contains large rocks, sparse vegetation and traces of erosion, indicating that this area has experienced a violent debris flow event.
7 | hill-creep | Fujicun, Langru Township, Hotan County, Xinjiang Province | The mountain is obviously exposed, with scattered rocks and developing cracks. The terrain is steep and the surface soil is loose, which is typical of a collapse-prone area.
8 | slide | Yingawati Village, Saiybag Township, Moyu County, Xinjiang | Large areas of the slope are exposed, the rock fault is clearly visible, and a large amount of debris is piled up at the bottom of the slope. The surrounding vegetation is sparse, which makes further collapse likely.
9 | slide | Yingawati Village, Saiybag Township, Moyu County, Xinjiang | The terrain is steep, the surface rock has partially loosened and there are signs of new collapse, and the surrounding terrain favours further downslope movement of material.
10 | hill-creep | Yingawati Village, Saiybag Township, Moyu County, Xinjiang | There are clear signs of movement on the ground, with topsoil layers washed away and large rocks and tree fragments left behind.
11 | slide | Yingawati Village, Saiybag Township, Moyu County, Xinjiang | There are many fresh cracks on the slope, and some local rock layers have fallen off, indicating that the slope is unstable and may continue to move.
12 | slide | Yingawati Village, Saiybag Township, Moyu County, Xinjiang | Surface cracks are widespread, the soil has shown signs of sliding, vegetation is badly damaged, and the terrain conditions indicate a high risk of landslide.
13 | hill-creep | Yingawati Village, Saiybag Township, Moyu County, Xinjiang | The rock face is exposed, and many of the rocks have loosened. Signs of downward material movement are visible, indicating the potential for continued erosion and collapse.
14 | hill-creep | Yingawati Village, Saiybag Township, Moyu County, Xinjiang | The terrain is steep, with many rock collapses and obvious traces of water flow between the rock layers, which may have exacerbated the collapse.
15 | hill-creep | Puka Village, Saiybag Township, Moyu County, Xinjiang | The recent collapse on the hillside caused a large amount of loose material to accumulate at the foot of the slope, and cracks in the upper rock layer can be seen.
16 | hill-creep | Puka Village, Saiybag Township, Moyu County, Xinjiang | Cracks are developing, the slope is unstable, some rocks have begun to slide, the surface soil is obviously eroded, and the collapse risk is high.
17 | hill-creep | Yingawati Village, Saiybag Township, Moyu County, Xinjiang | The cracks between the rock layers have widened, and the local slope has been obviously lowered. The accumulation at the foot of the slope has increased, indicating that there is a potential for further collapse.
18 | hill-creep | Yingawati Village, Saiybag Township, Moyu County, Xinjiang | New cracks and rock slips were observed on the slope, and the steep terrain increased the risk of collapse, requiring continuous monitoring.
19 | hill-creep | Kusui Village, Saiyibag Township, Moyu County, Xinjiang | Surface cracks are widely distributed, the soil has been showing signs of sliding, vegetation damage is serious, and the topographic conditions indicate a high risk of landslide.
20 | hill-creep | Urawati Village, Saiybag Township, Moyu County, Xinjiang | Ground subsidence has occurred in many areas, cracks have developed longitudinally along the slope, the soil is loose, and the upper soil is obviously separated from the lower soil.
21 | hill-creep | Urawati Village, Saiyibag Township, Moyu County, Xinjiang | Obvious long strip cracks are visible on the surface; the prolonged rainy season has left the soil wet and unstable.
22 | hill-creep | Kulamuyiqi Village, Saiybag Township, Moyu County, Xinjiang | Local ground has begun to collapse, indicating a high potential landslide risk.
23 | hill-creep | Urawati Village, Saiybag Township, Moyu County, Xinjiang | The slope soil has horizontal cracks, some soil blocks have fallen off, and the water flow erosion is serious at the bottom of the slope.
Table 3. Multicollinearity diagnostics for the disaster factor variables (VIF: variance inflation factor; TOL: tolerance).
Disaster Factor Variable | VIF | TOL
Distance from the road | 1.46 | 0.685
Slope direction | 1.062 | 0.941
Falling gradient | 7.818 | 0.128
Curvature | 1.003 | 0.997
Geology | 1.274 | 0.785
Crack | 1.165 | 0.858
River system | 1.307 | 0.765
NDVI | 1.000 | 0.998
Relief amplitude | 7.189 | 0.139
Rainfall | 6.2 | 0.161
Land use | 1.158 | 0.864
Landform | 3.735 | 0.268
Soil | 8.07 | 0.124
Lithology | 8.273 | 0.121
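Tolerance is the reciprocal of the variance inflation factor (TOL = 1/VIF), so the two columns of Table 3 can be cross-checked against each other. The sketch below does this for the VIF values as read from the table. The cut-off `VIF_LIMIT = 10` is a commonly used rule of thumb and is an assumption here, since this section does not state the threshold the authors applied.

```python
# VIF values as read from Table 3 (an interpretation of the flattened table cells).
vif = {
    "Distance from the road": 1.46,
    "Slope direction": 1.062,
    "Falling gradient": 7.818,
    "Curvature": 1.003,
    "Geology": 1.274,
    "Crack": 1.165,
    "River system": 1.307,
    "NDVI": 1.000,
    "Relief amplitude": 7.189,
    "Rainfall": 6.2,
    "Land use": 1.158,
    "Landform": 3.735,
    "Soil": 8.07,
    "Lithology": 8.273,
}

VIF_LIMIT = 10.0  # assumed rule-of-thumb cut-off, not taken from the paper


def tolerance(vif_value):
    """Tolerance is defined as the reciprocal of the VIF."""
    return 1.0 / vif_value


# Factors that would be flagged as strongly collinear under the assumed cut-off.
flagged = [name for name, value in vif.items() if value >= VIF_LIMIT]

for name, value in vif.items():
    print(f"{name}: VIF = {value:.3f}, TOL = {tolerance(value):.3f}")
print("Flagged factors:", flagged)
```

Under that assumed cut-off none of the fourteen factors is flagged, although lithology, soil and falling gradient have the largest VIF values.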
Table 4. Random forest model parameter configuration.
Parameter Name | Parameter Value
Maximum tree depth | 10
Minimum number of cases in the parent node | 500
Minimum number of cases in a child node | 100
Number of nodes | 193
Number of terminal nodes | 126
Depth | 7
Table 5. Sample accuracy table.
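Sample | Actual Measurement | Forecast 0 | Forecast 1 | Correct Percentage
Training | 0 | 30,727 | 2658 | 92.00%
Training | 1 | 2463 | 13,024 | 84.10%
Training | Overall percentage | 67.90% | 32.10% | 89.50%
Inspection | 0 | 7738 | 711 | 91.60%
Inspection | 1 | 583 | 3303 | 85.00%
Inspection | Overall percentage | 67.50% | 32.50% | 89.50%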
Table 6. Summary table of neural network model parameter settings and evaluation results.
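Name | Parameter Name | Parameter Value
Model parameter configuration | Data preprocessing | None
Model parameter configuration | Training set proportion | 0.8
Model parameter configuration | Hidden layer neuron configuration | (100)
Model parameter configuration | Activation function | relu
Model parameter configuration | Weight optimisation method | adam
Model parameter configuration | L2 regularisation coefficient | 1.0 × 10⁻⁴
Model parameter configuration | Initial learning rate | 0.001
Model parameter configuration | Learning rate optimisation method | constant
Model parameter configuration | Minibatch size | auto
Model parameter configuration | Maximum number of iterations | 200
Model parameter configuration | Optimisation tolerance | 1.0 × 10⁻⁴
Model evaluation performance | Accuracy rate | 90.592%
Model evaluation performance | Precision (overall) | 90.492%
Model evaluation performance | Recall rate (overall) | 90.592%
Model evaluation performance | F1-score | 0.905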
Table 7. Summary table of gradient boosting tree model parameter settings and evaluation results.
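Name | Parameter Name | Parameter Value
Model parameter configuration | Data preprocessing | None
Model parameter configuration | Training set proportion | 0.8
Model parameter configuration | Error term penalty coefficient | 1.0
Model parameter configuration | Kernel | rbf
Model parameter configuration | Kernel function coefficient value | 0.01
Model parameter configuration | Multi-class decision function | ovr
Model parameter configuration | Model convergence parameter | 0.001
Model parameter configuration | Maximum number of iterations | 2000
Model evaluation performance | Accuracy rate | 91.687%
Model evaluation performance | Precision (overall) | 91.921%
Model evaluation performance | Recall rate (overall) | 91.687%
Model evaluation performance | F1-score | 0.918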
Table 8. Logistic regression model evaluation results (B denotes the regression coefficient for each factor in the model, S.E. represents the standard error, Wald indicates the chi-square value, and sig denotes significance).
Factor | B | S.E. | Wald | Sig.
Distance from the road | 0.417 | 0.008 | 2628.773 | 0
Slope direction | 0.072 | 0.005 | 252.712 | 0
Falling gradient | 0.326 | 0.02 | 254.453 | 0
Curvature | −0.084 | 0.011 | 56.475 | 0
Geology | 0.103 | 0.004 | 712.282 | 0
Crack | 0.497 | 0.007 | 4826.786 | 0
River system | 0.759 | 0.013 | 3386.304 | 0
NDVI | 0 | 0 | 0.098 | 0.754
Relief amplitude | −0.113 | 0.027 | 16.803 | 0
Rainfall | 0.123 | 0.019 | 42.48 | 0
Land use | −0.037 | 0.014 | 7.374 | 0.007
Landform | −0.327 | 0.019 | 309.228 | 0
Soil | 0.416 | 0.012 | 1111.022 | 0
Lithology | −0.302 | 0.017 | 307.505 | 0
Constant | −8.42 | 0.129 | 4251.67 | 0
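The quantities in Table 8 are linked by standard logistic regression relations: the Wald chi-square is (B / S.E.)², exp(B) gives the odds ratio for a one-unit increase in a factor, and the predicted probability is 1 / (1 + e^(−z)) with z the constant plus the weighted sum of factor values. The sketch below illustrates these relations with coefficients as read from the table. The function names and the all-zero input are hypothetical, and because this section does not state how the factor values were coded or scaled, the probability call is an illustration only. Wald values recomputed from the rounded B and S.E. differ slightly from the tabulated ones.

```python
import math

# Coefficients (B) as read from Table 8; an interpretation of the flattened cells.
INTERCEPT = -8.42
COEFFICIENTS = {
    "Distance from the road": 0.417, "Slope direction": 0.072,
    "Falling gradient": 0.326, "Curvature": -0.084, "Geology": 0.103,
    "Crack": 0.497, "River system": 0.759, "NDVI": 0.0,
    "Relief amplitude": -0.113, "Rainfall": 0.123, "Land use": -0.037,
    "Landform": -0.327, "Soil": 0.416, "Lithology": -0.302,
}


def wald_statistic(b, se):
    """Wald chi-square for a single coefficient: (B / S.E.) squared."""
    return (b / se) ** 2


def odds_ratio(b):
    """Multiplicative change in the odds for a one-unit increase in the factor."""
    return math.exp(b)


def landslide_probability(values):
    """Logistic probability for a dict of factor values (missing factors count as 0)."""
    z = INTERCEPT + sum(b * values.get(name, 0.0) for name, b in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-z))


# Land use row: B = -0.037, S.E. = 0.014 (tabulated Wald is 7.374).
print(f"Wald (land use): {wald_statistic(-0.037, 0.014):.3f}")
print(f"Odds ratio (river system): {odds_ratio(0.759):.3f}")

# Hypothetical input: every factor at 0, so only the constant contributes.
baseline = landslide_probability({})
print(f"Baseline probability: {baseline:.6f}")
```

A near-zero coefficient with a high significance value, as for NDVI, leaves the odds essentially unchanged, which is why its odds ratio is 1.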
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Dai, X.; Song, X.; Xing, L.; Han, D.; Li, S. Integrating SBAS-InSAR and Machine Learning for Enhanced Landslide Identification and Susceptibility Mapping Along the West Kunlun Highway. Appl. Sci. 2026, 16, 120. https://doi.org/10.3390/app16010120