Landslide Susceptibility Evaluation Based on Potential Disaster Identification and Ensemble Learning

Catastrophic landslides have much more frequently occurred worldwide due to increasing extreme rainfall events and intensified human engineering activity. Landslide susceptibility evaluation (LSE) is a vital and effective technique for the prevention and control of disastrous landslides. Moreover, about 80% of disastrous landslides had not been discovered ahead and significantly impeded social and economic sustainability development. However, the present studies on LSE mainly focus on the known landslides, neglect the great threat posed by the potential landslides, and thus to some degree constrain the precision and rationality of LSE maps. Moreover, at present, potential landslides are generally identified by the characteristics of surface deformation, terrain, and/or geomorphology. The essential disaster-inducing mechanism is neglected, which has caused relatively low accuracies and relatively high false alarms. Therefore, this work suggests new synthetic criteria of potential landslide identification. The criteria involve surface deformation, disaster-controlling features, and disaster-triggering characteristics and improve the recognition accuracy and lower the false alarm. Furthermore, this work combines the known landslides and discovered potential landslides to improve the precision and rationality of LSE. This work selects Chaya County, a representative region significantly threatened by landslides, as the study area and employs multisource data (geological, topographical, geographical, hydrological, meteorological, seismic, and remote sensing data) to identify potential landslides and realize LSE based on the time-series InSAR technique and XGBoost algorithm. The LSE precision indices of AUC, Accuracy, TPR, F1-score, and Kappa coefficient reach 0.996, 97.98%, 98.77%, 0.98, and 0.96, respectively, and 16 potential landslides are newly discovered. Moreover, the development characteristics of potential landslides and the cause of high landslide susceptibility are illuminated. The proposed synthetic criteria of potential landslide identification and the LSE idea of combining known and potential landslides can be utilized to other disaster-serious regions in the world.


Introduction
Accompanied by global climate change, extreme rainfall increase, urban expansion, and human engineering activity intensification, catastrophic landslides have much more frequently occurred around the world, caused enormous losses to human lives and properties, and seriously hindered the sustainable development of nations and society. Thus, an imperious demand is suggested for the valid prevention and control of landslides to effectively mitigate the massive damage to people and engineering infrastructures. Landslide susceptibility evaluation (LSE) can predict the dangerous region where landslides may occur [1]; thus, it has become a crucial technique for disaster prevention and attracted great attention from the worldwide scientists [2][3][4][5][6][7]. Moreover, about 80% of these disastrous landslides have not been discovered in advance [8], called as potential or hidden landslides, which causes disaster control measures to miss the best time. Therefore, the early identification of potential landslides and accurate evaluation of landslide susceptibility play an essential role in control and mitigation of landslide hazards. This work suggests new synthetic criteria of potential (active) landslide identification to improve the accuracy of landslide recognition. Also, this work conducts LSE by integrating both the potential and known landslides to enhance the precision and rationality of LSE.
There have been a large number of excellent studies in LSE, and the LSE methods in these studies generally fall in two categories: knowledge-driven methods and data-driven methods [23,24]. The knowledge-driven LSE methods mainly include analytic hierarchy process (AHP) [5,25,26], fuzzy-AHP [27], fuzzy-relation AHP [28], fuzzy logic [29,30], fuzzy comprehensive evaluation [31], and fuzzy unordered rule induction [32] methods. The datadriven methods primarily consist of logistic regression [33], frequency ratio [34], weights of evidence [35], Information value [36], shallow machine learning (e.g., support vector machine [37], artificial neural network [38], random forest [39], and decision tree [40]), and deep learning (e.g., convolutional neural network [41], deep neural network [42], recurrent neural network [43], and deep belief network [44]) methods. Moreover, some ensemble methods were employed in LSE, including the combination of ant colony optimization and deep belief network [45], the ensemble of a radial basis function neural network, random subspace, attribute selected classifier, cascade generalization, and dagging [46], bagging based reduced error pruning trees [47] and so on. However, the vast majority of present studies conducted LSE based on optical images and known landslides and neglected the serious threat posed by potential landslides. In recent years, a few studies became to involve InSAR technique into LSE and mainly include the following four aspects. (1) The surface deformation characteristics extracted by InSAR technique were employed to validate the LSE results acquired from optical images and known landslides [48,49]. (2) LSEs were conducted by combining InSAR-detected surface deformation and other influencing factors. The factors of surface deformation, lithology, topography, land use and so on were used together as the input parameters to evaluate landslide susceptibility [50][51][52]. (3) The surface deformation features detected by InSAR technique were adopted to improve the LSE results obtained from optical images and known landslides [53][54][55] or to refine the LSE map acquired from a physical model [56]. (4) Landslide susceptibility was assessed according to the potential landslides instead of the historical landslides [57,58]. The aspects (1) and (2) actually conducted LSE according to known landslides or active known landslides. The aspect (3) virtually employed surface deformation derived by InSAR technique to recognize 3 of 26 potential landslides. To our knowledge, only one study carried out LSE using both potential and historical landslides. Kontoes et al. [59] employed the characteristics of ground deformation detected by multi-temporal InSAR interferometry technique, geomorphology, slope angle, and land cover (vegetation deterioration) to determine the active landslides from 1992 to 2010. Both the past landslides and the identified active landslides constituted the landslide inventory, and the method of weights of evidence (WoE) was used to generate the LSE map. These studies on LSE have made significant contributions to improve the accuracy.
Despite great progress achieved in early identification of potential landslides and in assessment of landslide susceptibility, there are still two problems restricting the accuracy and rationality. (1) The present studies generally employed surface deformation or the combination of surface deformation, geomorphological, and topographical features to recognize potential landslides. The essential disaster-triggering mechanism has been neglected, which leads to the relatively low identification accuracy and relatively high false alarm and may cause a huge waste of time, money, and manpower on field investigation to validate the fake landslides. The Ministry of Natural Resources, PRC indicated that the precisions of potential landslide recognition via InSAR, optical image, and expert interpretation techniques are 69.9%, 62.8%, 44%, and 29.8%, respectively in Sichuan Province, Hubei Province, Shanxi Province, and Chongqing City that are vulnerable to landslides. This is because the landslide-influencing factors contain not only the controlling factors, such as topography, geology, and geomorphology but also the triggering factors, including earthquakes, precipitation, and human engineering activity. Landslides occur under the joint effect of the controlling and inducing factors. Thus, the synthetic criteria integrating slope deformation, disaster-controlling mechanism, and disaster-inducing mechanism are necessary to improve the accuracy of potential landslide recognition. (2) The vast majority of present studies adopted optical images and historical landslides to perform LSE, omitted the development and threat of potential landslides, and to a certain degree limited the precision, rationality, and practicability of LSE maps. The historical and potential landslides both reflect landslide activity and susceptibility. Therefore, a complete landslide inventory, including the two types of landslides, can improve the accuracy, rationality, and practicability of LSE maps. Focusing on the above two problems, this work makes two improvements. (1) Synthetic criteria are established to improve the accuracy of potential landslide identification, and the criteria are composed of the characteristics of ground deformation, geology, topography, geomorphology, environment, earthquake, rainfall, and human engineering activity. (2) Landslide susceptibility is assessed according to both the potential and historical landslides to improve the precision and rationality of a LSE map.
This work adopts the multisource data including geological, topographical, geographical, hydrological, meteorological, seismic, and remote sensing data and employs the time-series InSAR technique, slope unit segmentation, and machine learning XGBoost algorithm to identify potential landslides and to conduct LSE in the development region in Chaya County, Tibet. This region is characterized by intense neotectonic movement [60], developed faults [61], steep topography, fragile geological environment [62], and diverse human engineering activities and suffers from frequent landslide activity. Moreover, some potential landslides have not been discovered in this region because the steep relief and high elevation have brought great difficulties to field survey. Thus, the region is selected as the study area in this work. Furthermore, the cause of landslide high-susceptibility, including the function of geology, topography, hydrology, meteorology, earthquake, and human activity, are illuminated. The proposed synthetic criteria for hidden landslide identification and the LSE idea of combining potential and historical landslides can also be applied to other disaster-intensive areas.

Study Area
The study area ( Figure 1) consists of the development areas of Jitang Town, Yanduo Town, Xiangdui Town, Rongzhou Township, and Kagong Township in Chaya County, Tibet and covers an area of 3380.73 km 2 . The region is situated in the India-Eurasia subduction collision zone, featuring intensive crust uplift [60] and relative frequent earthquakes that were mainly reflected in the succession activities of old faults [63]. The area is characterized by complicated tectonics and developed faults and is dominated by brittle fractures with the overall tectonic trend from the northwest to southeast (1:200,000 geological map; [61]). Strata from the Proterozoic to Cenozoic are exposed, including the strata in Neoproterozoic (Pt), Carboniferous age (C), Permian age (P), Triassic age (T), Jurassic age (J), Cretaceous age (K), Paleogene age (E), and Quaternary age (Q) (1:200,000 geological map; [62]). Lithology is dominant in the rock group of weak mudstone and shale and the rock assemblage of quartz sandstone, siltstone, and volcanics [62].

Study Area
The study area ( Figure 1) consists of the development areas of Jitang Town, Yanduo Town, Xiangdui Town, Rongzhou Township, and Kagong Township in Chaya County, Tibet and covers an area of 3380.73 km 2 . The region is situated in the India-Eurasia subduction collision zone, featuring intensive crust uplift [60] and relative frequent earthquakes that were mainly reflected in the succession activities of old faults [63]. The area is characterized by complicated tectonics and developed faults and is dominated by brittle fractures with the overall tectonic trend from the northwest to southeast (1:200,000 geological map; [61]). Strata from the Proterozoic to Cenozoic are exposed, including the strata in Neoproterozoic (Pt), Carboniferous age (C), Permian age (P), Triassic age (T), Jurassic age (J), Cretaceous age (K), Paleogene age (E), and Quaternary age (Q) (1:200,000 geological map; [62]). Lithology is dominant in the rock group of weak mudstone and shale and the rock assemblage of quartz sandstone, siltstone, and volcanics [62].   A large number of mountains are distributed in the study area featuring high mountains, deep valleys, and rugged topography, i.e., a deep-cutting alpine and gorge landform. The high elevation is a remarkable feature that changes from 2920 to 5584 m, with the average of 4254 m, and the lowest position is situated in the outlet of the Lancang river. The region is typical of plateau temperate semi-arid monsoon climate [64], and rainfall concentrates from June to September. Water systems are developed, and the Lancang, Maiqu, Wangbuqu, Sequ rivers run through the area. Along both sides of the rivers, many slopes were excavated due to road construction [62]. Human engineering activity is diverse and mainly composed of slope cutting and house building, road construction, hydropower construction, mining activity, and agricultural activity [62]. The national roads G214 and G349 and the provincial highways S203 and S303 wind through the region.
Therefore, the study area is depicted by intensive crust movement [60], developed fault tectonics [61], fragile geological environment, highly weathered rocks [62], deepcutting alpine and gorge landform, seasonal rainfall, developed drainage system, and diverse human engineering activity, which has created favorable conditions for landslide occurrence and development.

Historical Landslide Inventory
A specific survey on geological hazards has been conducted in Chaya County at a scale of 1:50,000 by Bureau of Geology and Mineral Exploration and Development of Tibet Autonomous Region. Some landslides in Chaya County are characterized by large scales, high elevations, deeply cut surfaces, and deep sliding surfaces [65]. The historical landslide inventory is provided by Bureau of Geology and Mineral Exploration and Development of Tibet Autonomous Region. Fifty-nine historical landslides occurred in the study area, and the main disaster-controlling and disaster-triggering characteristics are shown in Supplementary Figure S1. These characteristics provide a basis for the identification criteria of potential landslides. With regard to the geological characteristics (Supplementary Figure S1a), these landslides mainly occurred in the engineering rock group of weak mudstone and shale and in the rock assemblage of relatively hard quartz sandstone, siltstone, and volcanics. Landslides also developed in the rock group of relatively hard sandstone and limestone and in the rock assemblage of weak sandstone, slate, and conglomerate. These rock groups are characterized by developed joints and severe weathering and thus contribute to landslide development [62]. In addition, 64.4% of landslides were situated within 3 km of the faults (Supplementary Figure S1b) due to the fractured rock mass and developmental fissures in the vicinity of the faults and shear zones [66,67]. Moreover, the fault strike controlled the path of rivers and the direction of valleys and influenced river erosion and terrain incision to the slopes [68,69]; thus, faults have played an important role in landslide occurrence in the study area [62]. Regarding the topographic features, landslides were concentrated in the regions with elevations of 3000-4000 m (Supplementary Figure S1c) and were mainly situated on the slopes with slope angles of 20 • -30 • . Moreover, all the landslides featured the slope angle larger than 10 • (Supplementary Figure S1d). As for the environmental characteristics, river scouring and erosion and groundwater level fluctuation had a critical impact on landslide occurrence, and 67.8% of landslides were distributed within 500 m of the rivers (Supplementary Figure S1e). As to the disaster-triggering mechanism, landslide occurrence in the study area was closely associated with earthquake events; for example, M s 6.1 Changdu earthquake on 12 August 2013, triggered new landslides in the study area [62]. In addition, 72.73% of landslides were induced during the rainy season from June to September, and 77.97% occurred in the region with high cumulative rainfall of 1200-1300 mm from 23 April 2018, to 26 December 2019 (Supplementary Figure S1f). Furthermore, 64.41% landslides were located within 500 m of the roads and 47.46% were within 200 m of the roads (Supplementary Figure S1g).

Multisource Data
Eight sets of multisource data (Table 1) are adopted in this work to identify potential landslides and to conduct LSE. (1) Sentinel-1A SAR images are used to detect surface deformation based on SBAS-InSAR technique. Forty-nine ascending SAR images are employed to extract deformation displacement and velocity. (2) Setinel-2 multispectral images are adopted to extract normalized differential vegetation index (NDVI) and land use. (3) Google Earth images are employed to interpret and refine road networks and water systems. Google Earth images and Mapbox images are used to extract the microgeomorphological and macroscopical deformation characteristics of landslides, including trailing edges, front edges, cracks, collapses, terrace scarps, gullies and so on. (4) SRTM DEM data are adopted to establish the topographic factors, including elevation, slope angle, aspect, curvature, surface roughness, surface cutting depth, relief amplitude, elevation gradient, topographic wetness index (TWI). (5) Geological maps are used to construct the geological factors consisting of strata (lithology) and distance to fault. (6) Geographic data of road networks and drainage systems, refined by Google Earth images, are adopted to extract the factors of distance to road and distance to river. (7) Seismic data with the magnitude above 3.0 and within 400 km of the study area are used to establish the seismic factors of peak ground acceleration (PGA) and kernel density of earthquake distribution that reflects the distribution concentration of earthquake events. (8) CHIRPS rainfall data are employed to build the meteorological factor of cumulative rainfall. Therefore, according to the surface deformation, geoenvironmental, seismic, rainfall, and human engineering activity characteristics derived from the multisource data, potential landslides can be identified, and landslide susceptibility can be evaluated.

Methods
The technical route is shown in Figure 2. First, the geoenvironmental factors and disaster-triggering factors are derived from the multisource data of geology, topography, geography, meteorology, earthquake, and remote sensing data. These factors consist of two classes of indices: the identification indices of potential landslides and the evaluation indices of landslide susceptibility. Second, the surface deformation is extracted from Setinel-1 SAR images. Third, potential landslides are recognized by the characteristics of surface deformation, lithology, fault tectonics, topography, micro-geomorphology, water system, earthquake, rainfall, and human engineering activity. Specifically, potential landslides are identified based on the ground deformation, disaster-controlling, and disaster-inducing indices of deformation velocity, engineering rock group, distance to fault, slope angle, landslide micro-geomorphology, distance to river, PGA, cumulative rainfall, and distance to road. Fourth, landslide susceptibility is evaluated by combining potential landslides and historical landslides based on the LSE indices via slope unit segmentation and XGBoost algorithms. The LSE map generated from the XGBoost algorithm is compared to the ones produced from the support vector machine (SVM) and convolutional neural network (CNN) algorithms. Fifth, the cause of landslide high-susceptibility, including the action of geology, topography, meteorology, and human activity, are illuminated.
1 SAR images. Third, potential landslides are recognized by the characteristics of surf deformation, lithology, fault tectonics, topography, micro-geomorphology, water syste earthquake, rainfall, and human engineering activity. Specifically, potential landslides identified based on the ground deformation, disaster-controlling, and disaster-induc indices of deformation velocity, engineering rock group, distance to fault, slope ang landslide micro-geomorphology, distance to river, PGA, cumulative rainfall, and dista to road. Fourth, landslide susceptibility is evaluated by combining potential landsli and historical landslides based on the LSE indices via slope unit segmentation a XGBoost algorithms. The LSE map generated from the XGBoost algorithm is compared the ones produced from the support vector machine (SVM) and convolutional neural n work (CNN) algorithms. Fifth, the cause of landslide high-susceptibility, including action of geology, topography, meteorology, and human activity, are illuminated.

Establishment of Geoenvironmental and Disaster-Inducing Factors
According to the cause of landslide development in the study area (Section 3.1 two types of influencing factors related to landslide development and occurrence are tablished, including disaster-controlling factors and disaster-triggering factors (Table  All the

Establishment of Geoenvironmental and Disaster-Inducing Factors
According to the cause of landslide development in the study area (Section 3.1.1), two types of influencing factors related to landslide development and occurrence are established, including disaster-controlling factors and disaster-triggering factors ( Table 2). All the factors, except PGA, are employed as the initial evaluation indices of landslide susceptibility. The factors of engineering rock group, distance to fault, slope angle, distance to river, PGA, cumulative rainfall, and distance to road are adopted as the identification indices of potential landslides.
The disaster-controlling factors are composed of the geological, topographic, environmental characteristics that control landslide occurrence and development. The disastertriggering factors consist of the meteorological, seismic, and human engineering activity features that induce landslides.  The factors marked by "*" indicate the ones that pass the inspection of multicollinearity and are utilized in landslide susceptibility evaluation. The abbreviations include: EVC = Elevation variation coefficient; TWI = Topographic wetness index; PGA = peak ground acceleration; NDVI = Normalized difference vegetation index; I = Hard intrusive rock series; II = Rock group of relatively weak schist and gneiss; III-1 = Rock assemblage of relatively hard limestone, quartz sandstone, and volcanics; III-2 = Rock assemblage of weak sandstone, slate, and conglomerate; III-3 = Rock group of relatively weak volcanics and clasolite; III-4 = Rock series of relatively hard sandstone and limestone; III-5 = Rock group of weak mudstone and shale; III-6 = Rock assemblage of relatively hard quartz sandstone, siltstone, and volcanics; IV-1 = Rock group of loose ice water deposit boulder, gravel, and clay; IV-2 = Rock assemblage of loose alluvial-diluvial sand, gravel, and clay. See Figure 1 for the meaning of the stratum symbols.

Surface Deformation Observed by SBAS-InSAR Technique
SBAS-InSAR technique [70] selects multiple main images to generate small-baseline image sets and conduct interference. The minimum norm criterion and singular value decomposition (SVD) method are employed to connect various image sets and extract surface deformation [71]. The short time and space baselines in each image set ensure the coherence of differential interference image pairs, increase the time sampling rate and spatial density of coherence, minimize the error caused by atmospheric phase delay, and guarantee the accuracy and continuity of surface deformation measurements [70].
Suppose in the study area, N SAR images were shot at the N times t 0 , t 1 , · · · , t N-1 , respectively, and M ( N 2 ≤ M ≤ (N − 1) N 2 ) interference image pairs are generated. The differential interference phase δϕ k (x, y) of the kth differential interferogram in Pixel (x, y) is generated from the SAR images shot at the times t a and t b (t a < t b ) and is calculated in Equation (1) [70,72,73].
in which δϕ k,t a (x, y) and δϕ k,t b (x, y) are the differential interference phases at the times t a and t b , respectively; d k,t a (x, y) and d k,t b (x, y) are the displacements in the SAR line of sight (LOS) at the times t a and t b , respectively, which is relative to the displacement at the initial time t 0 ; δϕ k,topo (x, y), δϕ k,atom (x, y), and δϕ k,noi (x, y) are the phases of topography, atmosphere, and noise, respectively; and λ represents the sensor wavelength. Physically, a phase is the product of velocity and time; thus, δϕ k (x, y) can be expressed in Equation (2) [70,72].
in which v i is the deformation velocity at the time t i . All the unwrapping phases are expressed in Equation (3), in which A is a M × N matrix [70,72]. The SVD algorithm is used to calculate Equation (3) and to obtain the least square solution of the velocity V in the minimum norm sense [70,72]. Then, atmospheric and noise phases are removed by high-frequency and low-frequency filtering, and the displacement is calculated according to the integral of velocity over time [70,72]. Finally, geocoding is carried out to acquire surface deformation measurements [70,72].
Moreover, R-index [74,75] is employed to quantitatively evaluate the visibility of deformation in LOS due to the special high and steep terrain in the study area. According to the topography, incident angle of sight, and satellite azimuth, the study area is divided into five visibility classes: good visibility, low sensitivity, foreshortening, layover, and shaded regions. The foreshortening and layover regions are characterized by geometric distortion, and the deformation in these regions cannot be effectively observed by the satellite. The deformation in the shaded areas is completely undetectable. The good visibility and low sensitivity areas are both attributed to visible areas. The surface deformation in the good visibility and low sensitivity regions can be detected by the InSAR technique without geometric distortion; thus, the two classes of regions are adopted to mask the SBAS-InSAR deformation measurements and to extract the effective deformation.

Identification Criteria of Potential Landslides
According to the cause of landslides in the study area (Section 3.1.1), new synthetic criteria for potential landslide identification are suggested, which consist of surface deformation, disaster-controlling, and disaster-inducing characteristics. Disastercontrolling characteristics include engineering rock groups, fault tectonics, topography, micro-geomorphology, and drainage systems. Disaster-inducing features are composed of earthquakes, rainfall, and human engineering activity. A slope is a moving potential landslide if it meets the following 8 criteria (Figure 3).
(1) The LOS deformation velocity V LOS of the slope is larger than two times the standard deviation σ of the Setinel-1 LOS deformation velocity [18,76], and the value of σ is 5 mm/year [18]. The standard deviation indicates the uncertainty of a velocity value [76]; thus, a slope is confidently moving if |V LOS | >10 mm/year.
(2) The surface deformation of the slope features spatial continuity. The deformation is spatially continuous when a minimum of 2 × 2 adjacent pixels have velocities more than 2σ [77]. (3) The lithology is characterized by soft rocks or highly weathered and fractured hard rocks. In the study area, the four classes of rock groups make for landslide development: the rock group of weak mudstone and shale, the rock assemblage of relatively hard quartz sandstone, siltstone, and volcanics, the rock group of relatively hard sandstone and limestone, and the rock assemblage of weak sandstone, slate, and conglomerate. In other types of engineering rock groups, no landslide occurred in the past. (4) The slopes within 3 km of faults feature a high occurrence rate of landslides. The rock mass appears more broken when it is closer to the fault, which avails landslide occurrence [62]. (5) The slope angle is larger than 10 • . (6) The slopes within 500 m of rivers are typical of a high frequency of landslide occurrence. The impacts of water erosion, hydrostatic pressure, and hydrodynamic pressure are remarkable when slopes are close to water systems [62,78,79]. (7) The slope possesses obvious landslide micro-geomorphologic characteristics, e.g., free surfaces, gullies, cracks, slide terraces, and collapses. Moreover, the slope features surface cover changes, e.g., vegetation destruction or bare land expansion. (8) The surface deformation of the slope is triggered by definite factors. There is a significant correlation between the deformation displacement or velocity and the variation of the inducing factors. e.g., PGA, cumulative rainfall, or distance to road or building. The correlation is quantitatively measured by the Pearson correlation coefficient passing a significance test under the significance level of 0.05 [77].

XGBoost Algorithm for Landslide Susceptibility Evaluation
XGBoost, under the full name of Extreme Gradient Boosting [80], is a refined and distributed open-source library of Gradient Boosting techniques and is regarded as one of machine learning algorithms featuring high precisions and high computational efficiencies. XGBoost generates a strong landslide susceptibility prediction model from an ensemble of weak prediction models (decision trees) using a gradient descent algorithm; thus, it can effectively overcome overfitting [80]. It employs a stage-wise fashion and in each iteration, produces a weak prediction model by decreasing the residual and optimizing the loss function value [80]. The objective function is composed of a loss function of gradient boosting and a regularization term (Equation 4) [80]. The loss function is expanded by a second-order Taylor series and is solved by the Newton's method [80].

XGBoost Algorithm for Landslide Susceptibility Evaluation
XGBoost, under the full name of Extreme Gradient Boosting [80], is a refined and distributed open-source library of Gradient Boosting techniques and is regarded as one of machine learning algorithms featuring high precisions and high computational efficiencies. XGBoost generates a strong landslide susceptibility prediction model from an ensemble of weak prediction models (decision trees) using a gradient descent algorithm; thus, it can effectively overcome overfitting [80]. It employs a stage-wise fashion and in each iteration, produces a weak prediction model by decreasing the residual and optimizing the loss function value [80]. The objective function is composed of a loss function of gradient boosting and a regularization term (Equation 4) [80]. The loss function is expanded by a second-order Taylor series and is solved by the Newton's method [80].
in which t is the number of decision trees, n is sample number, y i and y i are the real and predicted values, respectively, f t (x i ) is the prediction value of the tth decision tree, and the function l(·) is the loss of single sample. The regularization term is defined in Equation (5) using L2 norm and reflects the complexity of a tree [81]. The smaller is the value of Ω( f ), the lower is the complexity of the tree, and then the stronger is the generalization ability of the tree .
in which T and ω are the number and weights of tree nodes, respectively, γ is the regularization coefficient of the number of tree nodes, and λ is the regularization coefficient of the weights of tree nodes.
In this work, a surface curvature watershed algorithm [81] is employed to segment the study area into 170,188 slope units according to the DEM data. A total of 1136 slope units, with the same number of landslide and non-landslide samples, are randomly split into 70% of training samples and 30% of testing samples to train the XGBoost model and to evaluate the model precision, respectively. Moreover, the LSE indices are inspected in multicollinearity by the two parameters of Tolerance (TOL) and Variance Inflation Faction (VIF) [82]. The indices with strong correlations are eliminated to improve the precision of LSE.

Identification of Potential Landslides
The visibility analysis based on R-index is shown in Figure 4. The good visibility and low sensitivity regions (visible regions), with an area of 2071.16 km 2 , occupy 62.34% of the whole study area, and 86.44% of historical landslides occurred in the two types of regions. The study area features high elevations and a deep-cutting alpine and canyon landform. The special topography and landform may cause geometric distortion in the SAR interference result. The effective deformation can be extracted only in the good visibility and low sensitivity areas without geometric distortion. Thus, the displacement measurement extracted by SBAS-InSAR technique is masked by the R-Index visibility value, and the displacement and velocity in the good visibility and low sensitivity regions are retained ( Figure 5). According to the proposed synthetic criteria, 25 active landslides are identified, in which 16 ones are newly discovered as potential landslides ( Figure 5). The identified active landslides are validated by field survey, 3D UAV images, and 3D mapbox images and all exhibit obvious deformation signs including cracks, collapses, gullies, fragmented rock mass and so on. Seven validation examples are shown in Figure 6, and the concrete deformation signs and features are shown in Supplementary Figures S2-S4. Moreover, the validation of 25 identified active landslides is shown in Supplementary Figure S5. ( Figure 5). According to the proposed synthetic criteria, 25 active landslides are identified, in which 16 ones are newly discovered as potential landslides ( Figure 5). The identified active landslides are validated by field survey, 3D UAV images, and 3D mapbox images and all exhibit obvious deformation signs including cracks, collapses, gullies, fragmented rock mass and so on. Seven validation examples are shown in Figure 6, and the concrete deformation signs and features are shown in Supplementary Figures S2-S4. Moreover, the validation of 25 identified active landslides is shown in Supplementary Figure S5.      Figure S6. Sixty-eight false alarm areas are not identified as active landslides according to the suggested criteria, and these areas can be divided into three classes. (1) 11.76% are ground subsidence caused by human engineering activity, e.g., road construction, building construction, and mine exploitation, rather than mass movement down a slope.
(2) 47.06% are actually noises due to the inherent shortcoming of InSAR technique. These fake deformations have not been related to any landslide-inducing factor and cannot reflect surface deformation. (3) 41.18% do not exhibit landslide micro-geomorphological features, without free surfaces, and some fake areas are characterized by hard intrusive rocks, e.g., monzonitic granite; thus, the topographic and lithology preconditions of landslide development are absent. Therefore, the proposed criteria can effectively reduce the false alarm.

Landslide Susceptibility Evaluation
The initially established evaluation indices are shown in Figure 7. The multicollinearity analysis result is shown in Supplementary Table S3. There is collinearity among the factors of slope angle, surface roughness, surface cutting depth, elevation variation coefficient, and relief amplitude. As mentioned above, slope angle is very important for landslide development; thus, the other four factors of surface roughness, surface cutting depth, elevation variation coefficient, and relief amplitude are eliminated. There is no collinearity among the remained 13 factors, so they are employed as the assessment indices of landslide susceptibility.    The LSE map is shown in Figure 8, the susceptibility statistics is shown in Table 3, and the accuracy assessment is shown in Table 4. 92% of potential and known landslides are situated in the high and very high susceptibility regions that occupy 21.85% of the whole study area. The values of the precision indices of AUC, Accuracy, TPR, F1-score, and Kappa coefficient reach 0.996, 97.98%, 98.77%, 0.98, and 0.96, respectively.

Comparison with the SVM and CNN Algorithms
SVM [83] and CNN [84] are classical machine learning and deep learning algorithms, respectively; thus, in this work, LSE results produced by the XGBoost, SVM (Figure 9a), and CNN (Figure 9b) algorithms are compared. The principle of SVM is to calculate a hyperplane and to transform linear inseparability in a low-dimensional space to linear separability in a high-dimensional space by the hyperplane [83]. In this work, radial basis functions are adopted as the kernel function, and the values of the parameters C and γ are set as 10 and 0.1, respectively. CNN employs convolutional operations to replace the traditional matrix multiplication operation and generally consists of an input layer, convolutional layers, active layers, pooling layers, and full connected layers [84]. In this work, two convolutional layers and two pooling layers are used to extract sematic features. The function of cross entropy [85] is adopted as the loss function, and Adam optimizer [86] is utilized to update network weight values and to facilitate convergence speed by momentum and a self-adaptative learning rate. As shown in Figure 10, the number proportions of landslides falling in the very high susceptibility regions generated by the XGBoost, SVM, and CNN algorithms are 72%, 68%, and 53%, respectively. The landslide number proportions in the high and very high susceptibility (HVHS) areas produced by The distribution characteristics of various susceptibility levels are illuminated as follows. The very low and low susceptibility regions are primarily located on the high or very high mountains with elevations above 4000 m, and thus human engineering activity is few in these regions. In addition, these regions mainly feature relatively flat topography with slope degrees lower than 20 • and are dominated by quartzy sandstones, silty claystones, and limestones. Moreover, these areas suffer from relatively little influence from river erosion, precipitation, and earthquakes because of relatively far distances from rivers, low cumulative rainfall, and small earthquake occurrence densities. Therefore, the slopes in these regions are relatively stable.
The medium susceptibility areas are mainly situated on the high or medium-high mountains with elevations between 3500 m and 4500 m and are characterized by relatively steep relief with main slope degrees between 20 • and 40 • . The lithology is dominant in sandstones, claystones, slates, and siltstones with shales. Moreover, these regions are to some degree influenced by river erosion and human engineering activity, and some slopes gradually lost stability.
The high and very high susceptibility regions are primarily situated on high mountains or alpine and gorge regions with elevations lower than 4000 m. The topography is steep with slope degrees from 20 • to 60 • . The lithology is dominated by slates, metasandstones, phyllites, and flysch and features schistosity and mylonitization. In addition, these regions are cut by multiple faults including the Lancangjiang fault zone, Zuotongcun fault, and Chuanqiucuo fault. Moreover, these areas are seriously influenced by abundant precipitation (with cumulative rainfall larger than 1140 mm) and are relatively close to roads and rivers. Therefore, the slopes became instable and moving under the combined action of steep relief, fragmented rock mass, developed faults, active tectonic movement, abundant rainfall, intense river undercutting, and intensive human engineering activity.

Comparison with the SVM and CNN Algorithms
SVM [83] and CNN [84] are classical machine learning and deep learning algorithms, respectively; thus, in this work, LSE results produced by the XGBoost, SVM (Figure 9a), and CNN (Figure 9b) algorithms are compared. The principle of SVM is to calculate a hyperplane and to transform linear inseparability in a low-dimensional space to linear separability in a high-dimensional space by the hyperplane [83]. In this work, radial basis functions are adopted as the kernel function, and the values of the parameters C and γ are set as 10 and 0.1, respectively. CNN employs convolutional operations to replace the traditional matrix multiplication operation and generally consists of an input layer, convolutional layers, active layers, pooling layers, and full connected layers [84]. In this work, two convolutional layers and two pooling layers are used to extract sematic features. The function of cross entropy [85] is adopted as the loss function, and Adam optimizer [86] is utilized to update network weight values and to facilitate convergence speed by momentum and a self-adaptative learning rate. As shown in Figure 10, the number proportions of landslides falling in the very high susceptibility regions generated by the XGBoost, SVM, and CNN algorithms are 72%, 68%, and 53%, respectively. The landslide number proportions in the high and very high susceptibility (HVHS) areas produced by the XGBoost, SVM, and CNN algorithms are 92%, 81%, and 85%, respectively. In general, XGBoost has the highest precision in various accuracy indices ( Figure 11). For example, the AUC values of the LSEs by the XGBoost, SVM, and CNN algorithms are 0.996, 0.93, and 0.952, respectively, and the Accuracy values are 97.98%, 92.96%, and 88.56%, respectively. Therefore, XGBoost outperforms the other two algorithms in LSE in the study area.

Comparison with the LSE Map Generated from Known Landslides
The LSE map generated from both potential and known landslides is compared with the one produced only from known landslides (Figure 12). In the later map, 8.62% and 10.34% of historical landslides fall in the low susceptibility and medium susceptibility regions, respectively. Furthermore, in the later map, 43.75% and 6.25% of potential landslides are situated in the low susceptibility and medium susceptibility regions, respectively. Therefore, the LSE map based on known landslides cannot involve the threat from potential landslides and thus, restrict the precision and rationality of a LSE map.

Comparison with the LSE Map Generated from Known Landslides
The LSE map generated from both potential and known landslides is compared with the one produced only from known landslides (Figure 12). In the later map, 8.62% and 10.34% of historical landslides fall in the low susceptibility and medium susceptibility regions, respectively. Furthermore, in the later map, 43.75% and 6.25% of potential landslides are situated in the low susceptibility and medium susceptibility regions, respectively. Therefore, the LSE map based on known landslides cannot involve the threat from potential landslides and thus, restrict the precision and rationality of a LSE map.

Cause Characteristics of Active Landslides
The cause characteristics of active landslides include disaster-controlling features and disaster-inducing features. The disaster-controlling characteristics of active landslides mainly include 4 aspects ( Figure 13 and Table S1). (1) All the active landslides are located on the steep mountains with high elevations. The overwhelming majority occurred on the high and precipitous slopes with slope angles larger than 20° and elevations higher than 3000 m. Five landslides are situated on the slopes with slope angles larger than 30°. (2) 96% are distributed in the rock group of weak mudstone and shale or in the rock assemblage of relatively hard quartz sandstone, siltstone, and volcanics. One landslide occurred in the rock group of relatively hard sandstone and limestone. Thus, soft rocks and fractured and easily weathered hard rocks are important disaster-causing conditions in the study area. (3) 44% are situated within 2 km of the faults, 24% are within 1 km of the faults, and one landslide is directly cut by the Zuotongcun fault. Joints, fissures, and cracks are developed, rock mass is broken, and loose solid materials are deposited in the vicinity of the faults [62]. These provide abundant material sources, create a favorable condition for rainwater infiltration and physical weathering and have a remarkable influence on slope stability in the study area [62,87,88]. (4) 68% are distributed within 500 m of the rivers, 52% are within 100 m of the rivers, and 8 landslides are eroded by the rivers at the front

Cause Characteristics of Active Landslides
The cause characteristics of active landslides include disaster-controlling features and disaster-inducing features. The disaster-controlling characteristics of active landslides mainly include 4 aspects ( Figure 13 and Table S1). (1) All the active landslides are located on the steep mountains with high elevations. The overwhelming majority occurred on the high and precipitous slopes with slope angles larger than 20 • and elevations higher than 3000 m. Five landslides are situated on the slopes with slope angles larger than 30 • . (2) 96% are distributed in the rock group of weak mudstone and shale or in the rock assemblage of relatively hard quartz sandstone, siltstone, and volcanics. One landslide occurred in the rock group of relatively hard sandstone and limestone. Thus, soft rocks and fractured and easily weathered hard rocks are important disaster-causing conditions in the study area. (3) 44% are situated within 2 km of the faults, 24% are within 1 km of the faults, and one landslide is directly cut by the Zuotongcun fault. Joints, fissures, and cracks are developed, rock mass is broken, and loose solid materials are deposited in the vicinity of the faults [62]. These provide abundant material sources, create a favorable condition for rainwater infiltration and physical weathering and have a remarkable influence on slope stability in the study area [62,87,88]. (4) 68% are distributed within 500 m of the rivers, 52% are within 100 m of the rivers, and 8 landslides are eroded by the rivers at the front edges. In the study area, fluvial erosion is strong, and, for example, the specific erosion values along the Lancang trunk river attain 100-500 t/km 2 .a [62]. A number of high and steep slopes formed under strong river down-cutting and erosion, the front edges started moving, and thus pull-type landslides developed [62]. Moreover, the hydrodynamic pressure caused by groundwater seepage decreased the effective stress and shear strength on the potential sliding surface and further destroyed slope stability [62].  Table 2 for the meaning of engineering rock groups.

Cause of Landslide High-Susceptibility
Geographically, the HVHS regions can be divided into three sub-regions (Figure 8): the area in Jitang and Yanduo Towns (I1, I2, I3, and I4 regions), the common boundary  Table 2 for the meaning of engineering rock groups.
The disaster-inducing characteristics of active landslides are primarily embodied as the functions of earthquakes, precipitation, and human engineering activity (Table S2). In the study area, landslides triggered by human activity are closely related to road construction, e.g., the construction of National Highways G214 and G349 and Provincial Highway S203 ( Figure 13) [62]. These constructions, accompanied by explosion, caused slope excavation and destroyed slope balance [62]. There are mainly three disaster-triggering features. (1) Precipitation is the foremost factor inducing active landslides in the study area. Twentythree landslides were moving under the action of rainfall, in which 12 ones were induced mainly by rainfall, 5 by the coupled action of rainfall and road construction, 5 by the combined action of earthquakes and precipitation, and 1 by the common function of earthquakes, precipitation and human engineering activity. (2) Human engineering activity, especially road construction, plays a critical role in the development of active landslides. The deformation of 7 landslides was triggered by road construction. In addition to the 6 ones relevant to road construction mentioned in Point (1), one landslide was creeping under the coupled action of earthquakes and road construction. (3) Neotectonic movement and intensive geologic agent have a significant influence on the movement of active landslides. The deformation of 8 landslides was induced by earthquakes. Except the 7 ones induced by the combined action, one landslide was triggered primarily by earthquakes.
Therefore, in the study area, tectonic movement, developed faults, weak and easily sliding strata, highly weathered and fractured rock mass generated a mass of loose solid materials and created a large number of fissures and cracks. Then rainwater infiltrated into the cracks [89,90], rivers eroded the front edge, groundwater fluctuated to generate hydrodynamic pressure, and slope feet were excavated and exploded under road construction. Thus, the shear strength decreased, the weak sliding surface formed, the slope balance was destroyed, and the slopes became to move and creep down the high and steep topography and developed into an active landslide.

Cause of Landslide High-Susceptibility
Geographically, the HVHS regions can be divided into three sub-regions ( Figure 8): the area in Jitang and Yanduo Towns (I1, I2, I3, and I4 regions), the common boundary area of Kagong Township and Yanduo Town (I5 and I6 regions), and the region in Rongzhou Township and Xiangdui Town (I7, I8, and I9 regions). The cause of landslide high-susceptibility in the study area is shown in Figure 14.
According to field investigation, the HVHS area in Jitang and Yanduo Towns is situated in the river valley deeply cut by the Lancang river or in the downstream of the Maiqu river. Thus, a large amount of high and steep slopes were generated under the strong river erosion. Sandstone, siltstone, mudstone, and limestone were exposed in the region [62]. The lithology features poor mechanical property, weak structural surfaces, and secondary fissures [62]. The Lancangjiang fault zone traversed the region, featuring the fault fracture zone in a width of 150-200 m and abundant cataclasites [62]. Loose colluvial and proluvial deposits were distributed along two sides of the river valleys [62]. Moreover, the HVHS area is located along the Chaya sections of National Highway G214 and Provincial Highway S203, and this area is characterized by the most frequent and concentrated human activities in Chaya County. The construction of the highway S203 cut and exploded the slopes and destroyed slope stability. Therefore, the HVHS region was generated by the combined action of a developed fault zone, fractured rock mass, high and steep topography, strong river erosion, and frequent human engineering activity. area is located along the Chaya sections of National Highway G214 and Provincial Highway S203, and this area is characterized by the most frequent and concentrated human activities in Chaya County. The construction of the highway S203 cut and exploded the slopes and destroyed slope stability. Therefore, the HVHS region was generated by the combined action of a developed fault zone, fractured rock mass, high and steep topography, strong river erosion, and frequent human engineering activity. The HVHS area in the common boundary region of Kagong Township and Yanduo Town is distributed in the river valley deeply cut by the Lancang river, and the slope feet were scoured by the Lancang and Sequ rivers to loose stability. The area features developed faults, e.g., the Lancangjiang fault zone and Shela fault, and abundant loose residual sediments were generated due to seriously ruptured rock mass, developed cracks and cleavages, and intensive weathering [62]. Furthermore, the human engineering activity in the region included road construction, house building, hydropower construction, and farmland cultivation and irrigation. Thus, the HVHS area was caused by the common function of developed faults, broken rock mass, loose deposits, fluvial abrasion, and human engineering activity.
The HVHS region in Rongzhou Township and Xiangdui Town is located along the Maiqu river and its tributaries. The lithology was mainly composed of sandstone, siltstone, and mudstone, and the Chuanqiucuo and Zuotongcun faults spread over the area [62]. A mass of loose deposits were developed due to fractured rock mass and strong weathering [62], and high and steep slopes prevailed in the region. Surface runoff generated by rainfall converged into channels and led to intensive washing on the slope feet. The HVHS area in the common boundary region of Kagong Township and Yanduo Town is distributed in the river valley deeply cut by the Lancang river, and the slope feet were scoured by the Lancang and Sequ rivers to loose stability. The area features developed faults, e.g., the Lancangjiang fault zone and Shela fault, and abundant loose residual sediments were generated due to seriously ruptured rock mass, developed cracks and cleavages, and intensive weathering [62]. Furthermore, the human engineering activity in the region included road construction, house building, hydropower construction, and farmland cultivation and irrigation. Thus, the HVHS area was caused by the common function of developed faults, broken rock mass, loose deposits, fluvial abrasion, and human engineering activity.
The HVHS region in Rongzhou Township and Xiangdui Town is located along the Maiqu river and its tributaries. The lithology was mainly composed of sandstone, siltstone, and mudstone, and the Chuanqiucuo and Zuotongcun faults spread over the area [62]. A mass of loose deposits were developed due to fractured rock mass and strong weathering [62], and high and steep slopes prevailed in the region. Surface runoff generated by rainfall converged into channels and led to intensive washing on the slope feet. The human engineering activity primarily consisted of road construction, hydropower development, agricultural activity, and house building. Therefore, the HVHS region was generated under the coupling function of soft rocks, fault tectonics, broken rock mass, loose solid deposits, alpine canyon landform, river deep-cutting, precipitation, and human activity.

Conclusions
LSE is a vital and effective technique for the prediction, prevention, and control of catastrophic landslides and becomes an essential support to the sustainable development of nations and society. Moreover, about 80% of disastrous landslides have not been discovered in advance [8], and the discovery of these hidden landslides has become a worldwide challenge due to good concealment and high and steep relief. This work proposes new synthetic criteria to improve the identification accuracy of potential landslides, and the criteria include surface deformation, disaster-controlling features, and disaster-inducing characteristics. Furthermore, this work integrates historical landslides and potential landslides to improve the precision and rationality of LSE. The synthetic criteria of potential landslide identification and the idea of combining known landslides and potential landslides can be utilized to other disaster-serious regions.
This work selects Chaya County, a representative region significantly threatened by landslides, as the study area and employs multisource data (geological, topographical, geographical, hydrological, meteorological, seismic, and remote sensing data) to identify potential landslides and realize LSE by the SBAS-InSAR technique and XGBoost algorithm. Three main conclusions are drawn as follows.
(1) The proposed synthetic criteria integrate the characteristics of deformation, geology, topography, geomorphology, environment, earthquake, rainfall, and human engineering activity. According to the criteria, 25 active landslides are identified, among which 16 ones are newly discovered as potential landslides. In the study area, tectonic movement, weak strata, and fractured rock mass generated abundant cleavages and cracks and created numerous loose deposits that tended to move down the steep slopes under the action of external forces. Under the coupled function of strong river erosion, earthquake ground motion, rainwater infiltration, hydrodynamic pressure, and road and building construction, the shear strength decreased, the slope became moving, and an active landslide occurred. (2) A LSE map is generated by slope unit segmentation and the XGBoost algorithm.
92% of the potential and known landslides are situated in the HVHS regions that occupy 21.85% of the whole study area. The values of the precision indices of AUC, Accuracy, TPR, F1-score, and Kappa coefficient reach 0.996, 97.98%, 98.77%, 0.98, and 0.96, respectively. Moreover, XGBoost outperforms the representative machine learning algorithm of SVM and the deep learning algorithm of CNN in the study area. (3) The HVHS region is situated in the river valley, suffering from strong river erosion, and features high and steep topography. The region was cut by various faults, and a large amount of cataclasites and loose deposits were generated and are distributed along two sides of the river valleys. Rainwater washed the slope feet and penetrated through the loose soils and broken rocks into the slope bodies. Moreover, the slope feet in the HVHS region were relaxed and excavated by the construction of the national or provincial highways. Therefore, the HVHS was caused by the coupled action of a developed fault zone, ruptured rock mass, high and steep relief, intensive river erosion, concentrated rainfall, and frequent human engineering activity.
The future research can be conducted on 2 aspects: (1) automatic identification of active landslides over an extensive region, and (2) an implication of landslide susceptibility on seismic hazard assessment. First, in this work, active landslides are recognized in an interactive computer-aided mode. There may be numerous active landslides occurring over a wide region; thus, automatic recognition of active landslides can to a great degree improve the identification efficiency and provide an effective support for landslide prevention and control. Second, ground motion prediction has been a significant technique for seismic hazard and risk assessment and provides a crucial clue on earthquake risk mitigation [91]. In a region with intensive crustal movement, a landslide susceptibility map can be employed to predict the permanent ground displacement (PGD) triggered by an earthquake [2]. The PGD, combined with the transient ground displacement, is a vital basis for earthquake risk evaluation and control [2,91].
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/ijerph192114241/s1, Figure S1: Distribution characteristics of the historical landslides in the study area; Figure S2: Deformation characteristics and triggering factors of AL-1 Landslide; Figure S3: Deformation monitoring and inducing factors of AL-21 landslide; Figure S4: Macroscopical deformation signs of two potential landslide examples (AL-10 and AL-20); Figure S5: Validation of 25 identified active landslides via field survey [62], UAV images [62], and 3D Mapbox images; Figure S6: False alarm regions generated by SBAS-InSAR technique superimposed on (a) Deformation velocity map and (b) Google Earth images shot on 2 February 2015; 16 March 2015; 7 November 2020; and 11 November 2020, respectively; Figure S7: Three examples of false alarms; Table S1: Geoenvironmental features of 25 active landslides; Table S2: Correlation of active landslide deformation with various triggering factors; Table S3 Collinearity relation among various disaster-controlling and disaster-triggering factors.