1. Introduction
Landslides are among the most common natural disasters in mountainous regions worldwide and pose significant threats to the ecosystem [
1]. Landslides are defined as the downslope movement of debris, soil, and rock under the force of gravity and can be classified by the material involved (mud, rock, soil, or debris) and the type of movement (topple, flow, or slide) [
2]. The factors leading to landslides are a combination of tectonics, geomorphology, and climate change, which culminate in a critical slope evolution [
3,
4]. Other triggering factors contribute to landslides depending on the specific features of the area. Natural variables such as rainfall, rapid snowmelt, and earthquakes, as well as anthropogenic activities such as housing construction and irrigation, can play a role in the occurrence of landslides [
5]. While landslides are often regarded as a natural process, their occurrence has largely been influenced by anthropogenic activity [
6]. In recent years, rapidly growing populations, a surge in infrastructure development, and settlement growth in the mountainous regions of developing countries have increased the probability of landslides, leading to an alarming increase in landslide-related fatalities [
7].
The “China-Pakistan Economic Corridor” (CPEC), a significant project under the “One Belt and One Road” initiative, is centered around connecting Pakistan and China via the Karakoram Highway (KKH). The KKH was constructed from 1974 to 1978 and commenced operation in 1979. The highway encompasses most of the route of the CPEC. However, this vital route faces challenges due to high mountainous terrain, abundant loose debris, and heavy rainfall, which trigger frequent and severe geological catastrophes such as glacier debris flows, rockfalls, landslides, debris and soil slips, and avalanches [
8]. Determining landslide probabilities along the KKH is a complex process influenced by limited data availability, technical limitations, and harsh environments. Since its completion, the reputation of the KKH has been marred by various geohazards [
9]. Specifically, earthquake-induced landslides in 2005 caused considerable damage to the highway [
10]. Enormous rockslides and rock avalanches have occurred, with over 115 incidents reported since 1987 [
11]. Moreover, in 2010, a landslide blocked the Hunza River, inundating 19 km of the highway with a loss of 20 lives and damaging 350 houses [
12]. The geological conditions along the KKH pose additional challenges, including fragile and weathered rock masses, varying climates, low and high terrains, diverse stratigraphy, and local variations in tectonic motion. Due to these factors, the study region has become a geohazard laboratory. Improving the precision of landslide susceptibility mapping (LSM) along the KKH is therefore imperative for mitigating the risks posed by these natural hazards.
Recently, remote sensing (RS) and geographic information system (GIS) technologies have advanced remarkably. GIS spatial analysis tools and RS-derived data have made landslide susceptibility mapping more effective and its assessments more accurate. Here, comprehensive landslide inventory data and knowledge of landslide conditioning factors are crucial for both data-driven spatial modeling and knowledge-based approaches [
13]. Researchers have conducted numerous studies using bivariate analyses to quantify the spatial correlations between landslides and specific factors that influence their dispersion [
14,
15,
16,
17]. Several other studies have applied knowledge-based spatial approaches to produce natural risk vulnerability maps, such as fuzzy logic models [
18,
19], the analytical hierarchy process (AHP) [
20], and the evidential belief function [
21], as well as data-driven spatial approaches such as support vector machines [
22,
23,
24], logistic regression methods [
24,
25], artificial neural network (ANN) models [
26,
27,
28], alternating decision tree (ADTree) [
29], principal component analysis (PCA) [
30], deep belief network (DBN) [
31], decision tree [
25,
32], superposable neural networks [
33], and naïve Bayes [
34]. Expertise-based models often encounter challenges due to their reliance on expert opinions, which can introduce biases [
35,
36].
The primary strengths of probabilistic and machine learning (ML) approaches lie in their objective statistical foundation, consistency, capacity for precisely analyzing the factors influencing landslide development, and capacity to be updated. In this respect, researchers are continuously seeking new and more robust algorithms that can generalize across different spatial scales [
37,
38]. Deep learning (DL) algorithms, which are specifically developed for large datasets but have seen limited application thus far, need to be implemented and evaluated in this context. Currently, DL models, particularly recurrent neural networks (RNNs) and convolutional neural networks (CNNs), have demonstrated remarkable success across various applications, making them well suited for handling big data [
39]. RNNs, like other DL models, comprise a loss function, learnable parameters, and layers [
40]. CNNs, in contrast, include convolutional and pooling layers and process only the current input, whereas RNNs consider both earlier inputs and the present input [
41]. CNNs have proven effective in tasks like semantic segmentation and object detection [
7]. Conversely, RNNs show superior performance in tasks such as image recognition, characterization, and sequential data analysis, including time series spatial data [
42]. Despite the acceptable results achieved by CNNs and RNNs in various domains, their true efficiency and capabilities in landslide modeling and large-scale landslide susceptibility mapping (LSM) on big data have not been thoroughly analyzed [
13]. A few deep learning models have been utilized for natural hazard vulnerability mapping, including landslide susceptibility and flash flood mapping [
43,
44,
45]. However, these studies employed different deep learning models separately, and their relative performance has not yet been evaluated.
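The distinction drawn above between the two architectures can be illustrated with a minimal NumPy sketch. It is not the architecture used in this study: the single-unit recurrent cell, the two-tap kernel, and all weights are hypothetical values chosen only to show that a recurrent state carries information from earlier inputs, whereas a convolution output depends only on the inputs inside its local window.

```python
import numpy as np

def rnn_final_state(x, w_in=0.5, w_rec=0.9):
    """Run a one-unit recurrent cell over x and return the last hidden state.

    The weights are hypothetical; the point is that h depends on every
    earlier input through the recurrent term w_rec * h.
    """
    h = 0.0
    for x_t in x:
        h = np.tanh(w_in * x_t + w_rec * h)
    return float(h)

def conv1d_last_output(x, kernel):
    """Return the last output of a valid 1D convolution (cross-correlation).

    Only the final len(kernel) inputs fall inside the window.
    """
    k = len(kernel)
    return float(np.dot(x[-k:], kernel))

# Two sequences that differ only in their first element.
seq_a = np.array([1.0, 0.0, 0.0, 0.2, 0.4])
seq_b = np.array([-1.0, 0.0, 0.0, 0.2, 0.4])
kernel = np.array([0.3, 0.7])  # hypothetical filter weights

conv_a = conv1d_last_output(seq_a, kernel)
conv_b = conv1d_last_output(seq_b, kernel)
rnn_a = rnn_final_state(seq_a)
rnn_b = rnn_final_state(seq_b)

print("conv outputs:", conv_a, conv_b)  # identical: window excludes element 0
print("rnn states:  ", rnn_a, rnn_b)    # different: state remembers element 0
```

In practice, stacked convolutional layers enlarge the receptive field, so the contrast is one of mechanism rather than an absolute limit on context.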
In recent years, interferometric synthetic aperture radar (InSAR) methods have gained wide acceptance as tools for landslide monitoring and mapping. Over the past two decades, RS techniques, particularly InSAR, have demonstrated substantial potential across different fields, including the study of landslide deformation [
46] and groundwater extraction [
47]. Persistent scatterer InSAR (PS-InSAR) has proven useful for the automatic mapping of slow-moving landslides using spatial statistical techniques, the detection of individual landslides, the delineation of extended unstable regions, the redefinition of the limits of historical landslides, the detection of landslides through multitemporal analysis of SAR imagery, and the verification of the terrain elements causing slope deformation [
48]. In areas prone to frequent and rapid large landslides, RS provides a solution through surveys and advanced detection methods [
49]. These techniques can greatly aid in assessing and creating landslide inventory maps. Various InSAR methods have been used effectively to map slope displacement [
50], including SBAS-InSAR for assessing areas of land displacement [
51], the D-InSAR technique for landslide monitoring and land deformation measurement [
51,
52], the coherent pixels technique [
53], the SqueeSAR approach for measuring surface motion [
51], interferometric point target analysis [
54], StaMPS for evaluating displacement in densely vegetated regions [
55,
56], and the PS-InSAR method for computing landslide movement. These approaches have been applied to detecting and mapping landslide events, as described in [
54,
57,
58].
In this study, a combination of optical RS analysis and the InSAR technique is utilized to identify landslides and create an updated landslide inventory. The main goals are as follows: (1) mapping all types of landslides along the KKH and estimating displacement maps to detect new landslides, identify unstable areas, and redefine the boundaries of previously identified landslides based on the deformation model; (2) generating a landslide susceptibility map using state-of-the-art ML and deep learning (DL) models, including random forest (RF), XGBoost, recurrent neural networks, and convolutional neural networks; (3) comparing the performance of these advanced ML and DL models in landslide susceptibility mapping; (4) assessing the significance and relationships of the environmental and anthropogenic factors influencing landslides and their role in evaluating landslide susceptibility in the study area; and (5) determining the most accurate susceptibility model based on precision and the area under the curve (AUC) value. Although the KKH faces significant landslide threats every summer, previous research has not adequately addressed the issue. The landslide susceptibility map produced in this study will therefore aid urban planning and disaster reduction efforts in the area. Moreover, the final InSAR-based landslide inventory will assist in tracking high-risk areas to minimize future hazards and fatalities. It should be emphasized that no previous studies have applied RNNs and CNNs for LSM along the KKH. As the first study to utilize and compare these ML and DL models for LSM in this region, it contributes substantially to the scientific literature.
4. Discussion
The current study utilized RS techniques, such as optical RS and InSAR, for risk assessment and landslide mapping along the KKH [
63,
99,
100]. This study took advantage of the multi-azimuth observations provided by the descending and ascending Sentinel-1 datasets, allowing for more extensive monitoring of surface displacement. The PS-InSAR and SBAS-InSAR techniques effectively captured regions with high deformation rates in most areas along the KKH. Additionally, the comprehensive landslide inventory presented in this study includes the latest landslides, ensuring that the database is up to date and its information valuable. Processing Sentinel-1 data from June 2021 to June 2023 with the InSAR technique identified 24 new potential landslides and allowed some existing landslides to be redefined. This updated landslide inventory was then utilized to create a landslide susceptibility model, which investigated the link between landslide occurrences and the causal variables. By combining the findings from PS-InSAR, SBAS-InSAR, and field investigations, the inventory was updated with landslides that have the potential for future failure and pose risks for the region, contributing to improved landslide susceptibility mapping.
The selected landslide influencing factors were used to construct CNN-2D and RNN architectures for comparison with the XGBoost and RF methods. The LSMs were validated and compared based on the area under the receiver operating characteristic curve (AUROC) and accuracy. CNNs are known for their ability to efficiently extract spatial features through weight sharing and local connections, making them a promising method for landslide modeling [
101]. Earlier research has shown that combining CNNs with additional statistical approaches can produce better accuracy in landslide susceptibility modeling than using CNNs alone [
102]. According to [
103], CNN models are a superior tool for landslide modeling owing to their strong results and higher accuracy in spatial landslide forecasting. The outcomes also reveal that both DL and traditional ML algorithms give excellent precision in a variety of fields, such as landslide assessment and earth science studies throughout the world [
104], which is in line with our findings: the area under the ROC curve (AUC) of the four models ranges from 75.37% to 82.56%.
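As an illustration of the two validation metrics named above, the following is a minimal sketch of how the area under the ROC curve and the accuracy can be computed from predicted susceptibility scores. The labels, scores, and the 0.5 classification threshold are hypothetical; the AUC is computed through its rank interpretation (the probability that a randomly chosen landslide cell scores higher than a randomly chosen non-landslide cell), which is one of several equivalent formulations and is not necessarily the implementation used in this study.

```python
import numpy as np

def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a pair.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]
    neg = scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def accuracy_score(labels, scores, threshold=0.5):
    """Share of cells classified correctly at a fixed threshold (assumed 0.5)."""
    labels = np.asarray(labels, dtype=bool)
    predicted = np.asarray(scores, dtype=float) >= threshold
    return float((predicted == labels).mean())

# Hypothetical validation samples: 1 = landslide, 0 = non-landslide.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

auc = auc_score(labels, scores)       # 8 of 9 pairs ranked correctly
acc = accuracy_score(labels, scores)  # 4 of 6 cells classified correctly
print(f"AUC = {auc:.4f}, accuracy = {acc:.4f}")
```

The pairwise formulation is quadratic in the number of samples, so for large validation sets a sort-based routine is preferable.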
In the experiments, the proposed CNN and RNN models demonstrated enhanced predictive capability compared with the popular XGBoost and RF classifiers. Specifically, CNN-2D attained the highest AUC value of 0.825 on the validation set, indicating its effectiveness in improving prediction performance and its promise as an approach for future research. Various statistical and machine learning approaches have been compared and applied for spatial landslide forecasting, including AHP and Scoops3D [
105], frequency ratio (FR) and weight of evidence [
106], the weighted overlay technique and AHP [
9], random forest [
63], support vector classification (SVC) [
107], and XGBoost [
100], but DL techniques, such as CNNs, provide powerful improvements by automatically learning representations from raw data, making them valuable in various fields, including landslide susceptibility assessment. The experimental findings highlighted that CNN-2D outperformed the RNN DL approach as well as the classical XGBoost and RF ML techniques. Furthermore, the suggested data representation techniques offer an innovative approach to handling raw landslide data. By exploiting the power of DL methods and combining them with other approaches, there is enormous potential to advance landslide susceptibility analysis in the future. The proposed 2D CNN structure includes convolutional, max pooling, and dropout layers. Overfitting is a common issue when utilizing a 2D CNN in LSM. To address it, each convolutional layer is followed by a dropout layer, which temporarily discards network units during training with a certain probability. This helps to improve classification accuracy and enhance the model’s generalization capability.
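The dropout mechanism described above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the layer used in this study: it assumes the "inverted" dropout variant, in which the retained units are rescaled during training so that no adjustment is needed at inference, and the drop probability of 0.5 and the feature-map shape are hypothetical.

```python
import numpy as np

def dropout(activations, drop_prob, rng, training=True):
    """Inverted dropout (assumed variant).

    During training, each unit is zeroed with probability drop_prob and the
    survivors are scaled by 1 / (1 - drop_prob) to keep the expected
    activation unchanged. At inference the input passes through untouched.
    """
    if not training or drop_prob == 0.0:
        return activations
    keep_prob = 1.0 - drop_prob
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
feature_map = np.ones((4, 8, 8))  # hypothetical (channels, height, width)

dropped = dropout(feature_map, 0.5, rng)                    # training pass
inference = dropout(feature_map, 0.5, rng, training=False)  # inference pass

print("fraction of units kept:", (dropped > 0).mean())
print("values after dropout:", np.unique(dropped))
```

Because a different random subset of units is silenced at every training step, the network cannot rely on any single unit, which is the source of the regularizing effect.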
Different LCFs influence landslide triggers and relate to each other, making the selection of appropriate variables crucial for building an accurate landslide susceptibility model. The aim is to construct models with reduced noise and greater forecast ability. Before analyzing landslide susceptibility, it is essential to evaluate the forecasting potential of each contributing factor. To attain this, efforts are made to select the most relevant and impactful factors. Multicollinearity analysis is employed to evaluate correlations between the LCFs. In this study, 15 landslide conditioning variables were chosen as independent factors for evaluating landslide susceptibility, and the results are presented in
Table 3. The variance inflation factor (VIF) was used to test the multicollinearity between these factors. Among the selected factors, rainfall had the highest VIF score of 4.892, while aspect had the lowest VIF score of 1.017. The tolerance (TOL) values ranged from 0.204 to 0.982. The outcomes revealed that there is no significant multicollinearity among the chosen variables, allowing all variables to be integrated into the models. It is worth noting that landslides can still occur in areas with significant vegetation due to rainfall and other external forces.
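The multicollinearity test above can be illustrated with a minimal sketch. For factor j, the VIF is 1 / (1 − R_j²), where R_j² is obtained by regressing factor j on all the other factors, and the tolerance is its reciprocal (for instance, the reported maximum VIF of 4.892 corresponds to the reported minimum TOL of 0.204). The three synthetic factors below are hypothetical stand-ins for the 15 conditioning factors, and the least-squares implementation is an assumption about how such values may be computed, not the software used in this study.

```python
import numpy as np

def vif_and_tolerance(X):
    """Return (VIF, TOL) for each column of X.

    Each column is regressed on the remaining columns plus an intercept;
    VIF_j = 1 / (1 - R_j^2) and TOL_j = 1 / VIF_j.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vif = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        residual = y - A @ coef
        r_squared = 1.0 - (residual @ residual) / ((y - y.mean()) ** 2).sum()
        vif[j] = 1.0 / (1.0 - r_squared)
    return vif, 1.0 / vif

# Synthetic, hypothetical factors: elevation is built to correlate with slope.
rng = np.random.default_rng(42)
slope = rng.normal(size=500)
rainfall = rng.normal(size=500)
elevation = 0.8 * slope + 0.6 * rng.normal(size=500)

X = np.column_stack([slope, rainfall, elevation])
vif, tol = vif_and_tolerance(X)
for name, v, t in zip(["slope", "rainfall", "elevation"], vif, tol):
    print(f"{name:10s} VIF = {v:.3f}  TOL = {t:.3f}")
```

A commonly used rule of thumb treats VIF values above 10 (or, more conservatively, 5) and TOL values below 0.1 as signs of problematic collinearity; all values reported above fall inside these limits.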
Despite the beneficial effect of the selected factors in evaluating landslide susceptibility, the current research has several limitations. One major limitation is data availability. This research relied on data of limited diversity, with a focus on historical landslide data, which constrained the comprehensiveness of the analysis [
60]. Another limitation is that the resolution of the input data was inconsistent during the data preparation phase, which has been a prevalent issue in past investigations [
108,
109]. Terrain condition factors were derived from a 12.5 m resolution DEM, while variables related to geological conditions were based on a 1:500,000 scale geological map. All factor layers were resampled to a 12.5 m resolution in ArcGIS 10.8 to ensure data consistency and computational convenience. The model performance obtained in this study indicates that this resampling was feasible. Furthermore, because of restricted data availability, we examined several types of landslides with varying triggering conditions over a given period. While some investigators have previously adopted this approach, investigating distinct types of landslides separately would be more in line with practical conditions [
110,
111].
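The resampling of coarser factor layers to the common 12.5 m grid mentioned above can be sketched as follows. The study performed this step in ArcGIS 10.8 and does not state the interpolation method; the nearest-neighbor scheme, the 50 m source resolution, and the class codes below are assumptions for illustration only. Nearest-neighbor assignment is shown because it preserves categorical codes such as lithology classes.

```python
import numpy as np

def resample_nearest(grid, src_res, dst_res):
    """Resample a raster to a new cell size by nearest-neighbor lookup.

    Each target cell takes the value of the source cell that contains its
    center, so categorical codes are never averaged or interpolated.
    """
    grid = np.asarray(grid)
    rows = int(round(grid.shape[0] * src_res / dst_res))
    cols = int(round(grid.shape[1] * src_res / dst_res))
    ratio = dst_res / src_res
    r_idx = np.minimum(((np.arange(rows) + 0.5) * ratio).astype(int),
                       grid.shape[0] - 1)
    c_idx = np.minimum(((np.arange(cols) + 0.5) * ratio).astype(int),
                       grid.shape[1] - 1)
    return grid[np.ix_(r_idx, c_idx)]

# Hypothetical geological class codes on an assumed 50 m grid.
geology = np.array([[1, 2],
                    [3, 4]])

fine = resample_nearest(geology, src_res=50.0, dst_res=12.5)
print(fine.shape)  # each 50 m cell becomes a 4 x 4 block of 12.5 m cells
print(fine)
```

Resampling to a finer grid aligns the layers but adds no information: the effective detail of the geological layer remains that of its source map.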