1. Introduction
Landslides are downslope movements of soil or rock materials along sliding planes that occur under the influence of gravitational forces [1]. This form of slope instability initiates when the force of the material’s weight exceeds the internal shear resistance of the slide materials [2]. Landslides are among the deadliest and most common natural geohazards in mountainous regions across the globe [3]. According to the International Disaster Database, landslides comprised more than 4.9% of all natural disaster events and caused 1.3% of natural disaster fatalities between 1990 and 2015 [4].
The topographical, hydrological, and geo-environmental settings of southern Peru give rise to several types of landslides (e.g., rockfalls, rock and soil slides, debris flows, and shallow and deep-seated landslides) that endanger human lives, damage infrastructure, and cause economic instability and landscape degradation. In general, landslides are triggered by several factors, such as heavy precipitation, earthquakes, and volcanic and anthropogenic activities [5]. The frequency and impact of landslides are expected to increase in the future due to urbanization, highway construction, and deforestation [4,5,6]. Although landslides cannot be prevented, their impacts can be mitigated by developing spatial susceptibility models, which can be used in risk zonation and mitigation management [5,7,8].
Landslide susceptibility mapping/modeling (LSM) deals with the prediction of the probability of the occurrence of landslides in an area based on past landslides at different geo-locations with similar topographical, hydrological, and geo-environmental factors [6,9,10]. LSM approaches can be broadly categorized as being either qualitative or quantitative. Qualitative susceptibility maps are prepared based on geomorphological and field mapping by domain experts, whereas quantitative susceptibility maps rely on statistical models that explicitly identify the relationship between past landslides and geo-environmental factors. The advantages of quantitative approaches are that they are not subjective and produce repeatable results with higher accuracy [11]. Recently, GIS-based probabilistic models have also been developed for the kinematic susceptibility of landslides [12,13].
In the past few decades, LSM has been widely used as one of the most effective tools in landslide hazard management worldwide [7,14,15,16]. However, the accurate prediction of landslides is challenging due to their complex nature [10,16,17,18]. Successful LSM campaigns typically consist of three phases: preparation of a landslide and non-landslide inventory (i.e., training dataset), identification of relevant landslide influencing factors (LIFs) (e.g., topographical, hydrological, and geo-environmental), and implementation of appropriate prediction methods [16,19,20,21,22,23,24].
In recent years, several machine learning (ML) algorithms have been successfully used in LSM, such as linear discriminant analysis (LDA), mixture discriminant analysis (MDA), k-nearest neighbors (KNN), support vector machine (SVM), artificial neural networks (ANN), boosted logistic regression (BLR), bagged CART (BC), random forest (RF), rotation forest (RTF), and C5.0 [20,25,26,27,28,29,30,31,32,33]. ML algorithms offer several advantages over conventional statistical methods, such as the ability to learn the complex relationship between dependent and independent variables, proficiency in big data handling, geostatistical analysis, and the ability to update the developed model in the future [34,35,36,37,38]. Numerous studies have evaluated the performance of ML methods to identify suitable methods for their study areas [39]. It should be noted that the performance of ML methods shows considerable variability between study sites due to differences in the complexity of each area, training datasets, availability of data summarizing LIFs, and ML implementation approaches. Therefore, the performance evaluation of different ML methods is recommended for different sites for accurate LSM [31].
The success of ML models highly depends on appropriate training data, optimal variable selection, and hyper-parameter optimization [40]. Important and weakly correlated variables are typically selected using feature selection (FS) methods. FS methods are broadly categorized as filter-based, wrapper, and embedded methods [41]. Filter-based FS methods use statistical measures (e.g., correlation, entropy, and mutual information) to obtain the importance of given variables [41] and have been successfully used in LSM. Linear correlation, rank correlation, information gain (IG), gain ratio (GR), and relief-F (R_F) are common filter-based FS methods. The advantages of filter-based FS over the wrapper and embedded methods are that they are computationally efficient, reliable, and non-biased towards specific models [41]. Some studies have also highlighted the utility of ensemble FS (EFS), where the selection of variables results from multiple FS methods using majority voting [42,43].
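The majority-voting idea behind EFS can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `majority_vote_selection`, the top-k voting rule, and the rankings themselves are hypothetical and are not taken from this study or from the cited works.

```python
from collections import Counter


def majority_vote_selection(rankings, k):
    """Return features found in the top-k of a strict majority of rankings.

    rankings: dict mapping an FS method name to a list of feature names,
              ordered from most to least important.
    """
    votes = Counter()
    for ordered in rankings.values():
        votes.update(ordered[:k])      # each method casts one vote per top-k feature
    needed = len(rankings) // 2 + 1    # strict majority of the FS methods
    return sorted(f for f, n in votes.items() if n >= needed)


# Hypothetical rankings, for illustration only (not results from this study).
rankings = {
    "information_gain": ["slope", "twi", "elevation", "aspect"],
    "gain_ratio": ["slope", "elevation", "twi", "aspect"],
    "relief_f": ["twi", "aspect", "slope", "elevation"],
}

selected = majority_vote_selection(rankings, k=2)
print(selected)
```

Here a variable is retained only if at least two of the three hypothetical methods place it in their top two, which is one simple way of reconciling rankings that disagree.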
Recently, a few studies have explored the utility of ensemble ML models for LSM and reported improvement in accuracy and generalization over individual ML models [42,44,45,46]. The application of ensemble learning is well-exploited in several fields, including data mining [47,48] and biological sciences [49], but comparatively less explored in geohazard applications, particularly in LSM [42,44]. To the best of our knowledge, the application of EFS and ensemble ML models together for landslide susceptibility prediction is rarely discussed in the literature. A few recent studies, such as Kadavi et al. [50], Arabameri et al. [51], and Fang et al. [42], used ensemble ML models for LSM. However, these studies do not investigate the impact of the number of LIFs on the performance of ML models, which is vital in obtaining important LIFs for regional- or global-scale mapping. Additionally, these prior studies evaluated the performance of ML models in relatively small geographical areas (i.e., ~100–400 km²) of tropical–subtropical climatic conditions and will not necessarily produce a similar performance in an arid climatic region at a regional scale.
To the best of our knowledge, the performance evaluation of a wide range of ML models and their ensemble for regional LSM in an arid climatic condition has not been presented in the literature. Therefore, this study attempts to highlight the utility of the ensemble approach of feature selection and ML models for regional LSM in the arid mountainous terrain of southern Peru using remotely sensed data and GIS. The objectives of this paper are three-fold: (a) We evaluate the performance of diverse sets of ML models (LDA, MDA, BC, BLR, KNN, ANN, SVM, RF, RTF, and C5.0) for LSM. (b) We evaluate the performance of different ensemble ML models developed in this study. (c) We investigate the impact of the number of LIFs derived using EFS on ML performance and their utility in developing robust ML models for regional LSM. From a practical perspective, the identified suitable LIFs coupled with robust ML models developed in this study should be useful in developing mitigation strategies to reduce the landslide impact in the area.
4. Discussion
Feature selection is an important step in machine learning that aims to remove redundant and less useful variables to reduce the potential for overfitting and improve generalization. We used an EFS method derived using the Chi-square, gain ratio, and relief-F methods to select the most important LIFs in LSM for our study area. The EFS reduces the uncertainty in selecting the best possible variables as different feature selection methods may rank the variables in different orders of importance, as seen in this study (Figure 7). Among twenty-four derived LIFs (elevation, aspect, slope, profile curvature, TPI, TRI, TWI, STI, SPI, SRR, rainfall, stream density, direct radiation, direct duration radiation, NDVI, lithology, hydrogeology, geomorphology, LULC, soil type, distance from faults, earthquake magnitude, distance from roads, and distance from epicenter), the slope, direct radiation, TWI, profile curvature, and direct duration radiation were the top five LIFs ranked by the EFS method in this study (Figure 7).
Direct radiation and direct duration radiation (solar radiation) are rarely used in LSM [99,100] but were found to be important for landslide susceptibility prediction in this study. The relevance of solar radiation is interpreted to be linked with the cold-arid climatic condition of the area, where the amount of solar radiation plays a significant role in evapotranspiration and vegetation growth and minimizes frost action, which improves slope stability [99]. Direct radiation and direct duration radiation are negatively correlated with slope steepness, indicating a strong association between low solar radiation areas and high slope angles (Figure 6). The relevance of solar radiation can also be illustrated using frequency ratio plots, where the areas of relatively low direct radiation and low to moderate direct duration radiation show a very high frequency of landslides (
Figure 8).
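For readers unfamiliar with frequency ratio plots, the following Python sketch shows the conventional frequency ratio calculation: the share of landslides falling in a class divided by the share of the study area occupied by that class. It is an assumption that this conventional definition matches the plots referred to above; the class names and pixel counts are hypothetical.

```python
def frequency_ratio(class_pixels, landslide_pixels):
    """Frequency ratio per class: (landslide share) / (area share).

    class_pixels:     dict of class name -> number of pixels in that class.
    landslide_pixels: dict of class name -> number of landslide pixels in it.
    """
    total_pixels = sum(class_pixels.values())
    total_landslides = sum(landslide_pixels.values())
    ratios = {}
    for cls, n_pixels in class_pixels.items():
        landslide_share = landslide_pixels.get(cls, 0) / total_landslides
        area_share = n_pixels / total_pixels
        ratios[cls] = landslide_share / area_share
    return ratios


# Hypothetical counts for three direct-radiation classes (illustration only).
class_pixels = {"low": 500, "moderate": 300, "high": 200}
landslide_pixels = {"low": 60, "moderate": 30, "high": 10}

fr = frequency_ratio(class_pixels, landslide_pixels)
for cls, value in fr.items():
    print(f"{cls}: FR = {value:.2f}")
```

A ratio above 1 means that a class hosts proportionally more landslides than its areal extent would suggest, and a ratio below 1 means proportionally fewer.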
The slope had a negative correlation with TWI, indicating less moisture content (and reduced vegetation growth) along steep slopes and correspondingly more frequent landslides. The frequency of landslides was noted to be higher within the low- to moderate-elevation areas, as these regions mostly consist of relatively soft rock [111]. This was also confirmed by the geomorphology and lithology layers, indicating the highest landslide frequency within the areas of alluvial terraces and colluvial deposits. The frequency plots of these LIFs also confirmed that these regions have a higher frequency of landslides (Figures 8 and S2). The distance to faults, earthquake magnitude, and distance to earthquake epicenter were among the least important variables, as earthquake-triggered landslides usually occur in the vicinity of active geologic faults and their impacts are likely to be limited within a certain distance [111]. Similarly, distance to roads was among the least important variables in this study, which could again be because the influence of engineering practices is limited to a certain distance and may therefore be negligible for regional landslide susceptibility.
We evaluated the performance of ten different ML models in regional LSM using different sets of LIFs derived from the EFS method. Table 5 and Figure 9 show that the LDA, MDA, KNN, and ANN did not improve their sensitivity values when the number of LIFs increased. Relative to other models, the SVM, RF, and RTF have shown a slight improvement in their performance when developed using a greater number of LIFs. The RF and RTF improved their performance by ~7–8% when the number of LIFs increased from 5 to 24. The RF also achieved the best accuracy statistics in previous LSM studies [55,
111]. The C5.0 showed a larger improvement in performance than the other models when developed using a greater number of LIFs. It is interesting to note that the C5.0 underperformed the other models when developed using the top 5 LIFs but outperformed other models when developed using ≥15 LIFs. Specifically, the sensitivity value of C5.0 increased from 0.64 to 0.81 when the number of LIFs increased from 5 to 15 or more. However, we suggest that the performance of ML models when using a small number of LIFs is more important as it reduces the risk of overfitting and may result in improved generalization performance.
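The accuracy statistics compared throughout this section (sensitivity, specificity, OA, and AUC) follow standard definitions, which the sketch below spells out. The labels, scores, and 0.5 probability threshold are hypothetical choices for illustration; they are not the data or the settings used in this study.

```python
def accuracy_statistics(y_true, y_pred):
    """Sensitivity, specificity, and overall accuracy from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),   # landslides correctly predicted
        "specificity": tn / (tn + fp),   # non-landslides correctly predicted
        "overall_accuracy": (tp + tn) / len(y_true),
    }


def auc(y_true, scores):
    """AUC as the probability that a landslide sample outscores a
    non-landslide sample (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical test labels (1 = landslide) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

stats = accuracy_statistics(y_true, y_pred)
stats["auc"] = auc(y_true, scores)
print(stats)
```

Sensitivity, specificity, and OA depend on the chosen threshold, whereas AUC summarizes the ranking quality across all thresholds, which is one reason the four statistics do not always move together.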
A few ML models may improve their performance marginally at the cost of higher model complexity. For example, the RF and RTF can provide ~3–4% better accuracy statistics (sensitivity, specificity, AUC, and OA) than the KNN and ANN, but only when approximately five times as many LIFs are considered (24 vs. 5). Interestingly, the C5.0 yielded ~9% better sensitivity than the KNN and ANN when the number of LIFs increased to 15 or higher. There is often a tradeoff between a model’s performance and its complexity (i.e., models developed using a higher number of variables (LIFs)) [112]. The common drawbacks of complex models include issues related to overfitting, interpretability, generalization, and computation cost [112]. It is challenging to comment on the performance of ML models solely based on statistical inferences if they exhibit slight differences in their performance, as most of them have a similar ability to represent complex non-linear relationships [55]. Furthermore, the minor differences in accuracy statistics on LSM at a regional scale may not be practically significant in the spatial domain.
Different ensemble ML models using a suitable combination of best-performing individual ML models were developed in this study. The ensemble ML models should reduce the problem of overfitting and improve the generalization over the individual models [42,50]. The KNN + RTF, KNN + ANN, and ANN + RTF were among the best-performing ensemble models when developed using the top five LIFs. However, the experimental results did not exhibit notable improvement in the accuracy statistics of ensemble models over their individual models (Tables 5 and 7 and Figures 9 and 10), likely due to the high correlations among individual models (
Table 6). In general, the performance of the ensemble models falls somewhere between the performance of the individual models used in ensemble development. For example, the sensitivity value of KNN + C5.0 (0.70) was lower than the sensitivity value of KNN (0.72) but higher than that of C5.0 (0.64) when developed using the top five LIFs. Conversely, the accuracy statistics of KNN + C5.0 were slightly lower than those of C5.0 and slightly higher than those of KNN when developed using a greater number of LIFs (≥15). This could be due to the performance of KNN decreasing as the number of LIFs increased, whereas C5.0 improved its performance as the number of LIFs increased. In other words, the KNN and C5.0 showed an opposite response to the number of LIFs.
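The link between inter-model correlation and limited ensemble gain can be illustrated with a small Python sketch. It assumes, purely for illustration, that two models are combined by averaging their predicted probabilities; this section does not state the combination rule actually used, and the probabilities below are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ensemble(*model_probs):
    """Average the predicted probabilities of several models per sample."""
    return [sum(p) / len(p) for p in zip(*model_probs)]


# Hypothetical landslide probabilities from two models on five samples.
knn_probs = [0.90, 0.20, 0.70, 0.40, 0.10]
c50_probs = [0.80, 0.30, 0.60, 0.50, 0.20]

ensemble_probs = average_ensemble(knn_probs, c50_probs)
r = pearson(knn_probs, c50_probs)

print("ensemble:", [round(p, 2) for p in ensemble_probs])
print("correlation between members:", round(r, 3))
```

Under this averaging assumption, every ensemble probability lies between the two member probabilities, so when the members are highly correlated the ensemble has little room to correct the errors of either one.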
Most of the ensemble ML models yielded very good accuracy statistics (sensitivity ≥ 0.70, specificity ≥ 0.80, AUC ≥ 0.86, and OA ≥ 78%) using the top five LIFs in this study due to the well-distributed training dataset and suitable LIFs. Moreover, the consistent performance of different models supports the reliability of the derived results and their subsequent utilization in susceptibility categorization (Figures 11, 12 and 13). This can be further supported by Figure 15, which displays the robustness of developed ensemble models in accurately predicting landslide susceptibility within the vicinity of major communities. The spatial statistics of susceptibility categories indicate that 2–3% and 10–12% of the total study area fell within the “very high” and “high” susceptibility categories, respectively (Figure 14), which are predominantly characterized by barren steep slopes, low solar radiation, low to moderate elevation, and sedimentary deposits.
Around 80% of the historical landslide points (i.e., 1168 out of 1460) fell within the moderate to very high landslide susceptibility categories derived from the ensemble models. The remaining ~20% of the historical landslide points fell within the regions of very low to low susceptibility. This could be due to the significant difference between the spatial resolution of the remotely sensed data used in preparing the landslide inventory (approximately 1 m or finer) and that of most of the LIFs (~30 m). Some of the landslides present in the inventory may not cover sufficient spatial extent to be predicted by ML models from LIFs of coarser resolution. This could be further attributed to uncertainty in training data and LIFs, ML’s ability to learn the complex non-linear relationship between the historical landslides and LIFs, and spatial prediction at a regional scale.
This study has some limitations that should be considered in future work. We used more non-landslide samples than landslide samples, which may influence the performance of different ML models. Different proportions of landslide and non-landslide samples and their influence on ML performance in LSM at a regional scale could be explored in future studies.
Suitable LIFs are crucial in obtaining an accurate landslide susceptibility map using any ML model. A few LIFs may provide added value when their multi-temporal series are considered in the context of pre- and post-landslide conditions. In this study, we used a single-year NDVI and land use/landcover map in characterizing the landslide susceptibility of the area, which may produce a landslide detection model rather than a susceptibility model. It would be interesting to see the utility of multi-temporal NDVI and land use/landcover in assessing pre-landslide susceptibility and post-landslide occurrences. Additionally, we used a ten-year average annual precipitation map in assessing the impact of precipitation on landslide susceptibility. Future studies could consider using multi-temporal and extreme precipitation events in characterizing landslide susceptibility.
The correlation among ML models is crucial in finding the ideal combination of ML models to obtain robust ensemble ML models. Most of the ML models implemented in this study were highly correlated with one another, which limited the development of the ensemble models. Ideally, the candidate models for an ensemble should be weakly correlated. We recommend exploring a wide range of ML models when selecting candidate models for an optimal ensemble and assessing them in LSM.
Regarding the validation of the models, we assessed the performance of ML models based on a training and testing data split. However, an intensive iteration-based cross-validation approach can be considered in future studies to assess the robustness of ML models. This could provide further information about the robustness and generalization potential of the models.
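A repeated k-fold scheme is one way to realize such an iteration-based cross-validation. The sketch below is a minimal, library-free illustration; `fit_and_score` is a hypothetical callable standing in for training a model on the training indices and returning an accuracy statistic on the held-out indices, and the fold count and number of repeats are arbitrary.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for one shuffled k-fold split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]   # k nearly equal folds
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def repeated_cross_validation(n_samples, k, fit_and_score, repeats=1):
    """Collect one score per fold, reshuffling the data for each repeat."""
    scores = []
    for repeat in range(repeats):
        for train, test in k_fold_indices(n_samples, k, seed=repeat):
            scores.append(fit_and_score(train, test))
    return scores


# Placeholder scorer (hypothetical): returns the held-out fraction instead
# of a real accuracy statistic, so that the sketch runs without a model.
def held_out_fraction(train, test):
    return len(test) / (len(train) + len(test))


scores = repeated_cross_validation(10, k=5, fit_and_score=held_out_fraction,
                                   repeats=3)
print(len(scores), "scores, mean =", sum(scores) / len(scores))
```

The spread of the collected scores across folds and repeats, rather than a single train/test figure, is what would indicate how stable each model is.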
5. Conclusions
The identification of landslide-prone areas can be valuable for land use planners or disaster management agencies to aid in the process of appropriately allocating resources to forecast and mitigate landslide impacts. We derived a regional landslide susceptibility of the Colca-Camana watershed in the south of Peru using an ensemble approach of feature selection and machine learning (ML) models. The ensemble feature selection successfully identified the most important landslide influencing factors (LIFs) (e.g., slope, direct radiation, topographical wetness index, profile curvature, and direct duration radiation) to predict the landslide susceptibility in the area.
We evaluated the performance of ten individual ML models using different sets of LIFs ranked by ensemble feature selection. The k-nearest neighbors (KNN) (sensitivity = 0.72, specificity = 0.82, area under curve (AUC) = 0.86, overall accuracy (OA) = 78%) and artificial neural network (ANN) (sensitivity = 0.71, specificity = 0.85, AUC = 0.87, OA = 79%) outperformed other models when developed using the top five LIFs. The RF, RTF, and C5.0 outperformed other models when developed using all 24 LIFs (sensitivity = 0.76–0.81, specificity = 0.87, AUC = 0.90–0.93, OA = 82–84%). Among ensemble ML models, the ensembles of KNN and rotation forest (KNN + RTF), KNN + ANN, and ANN + RTF models outperformed other models using the top five LIFs (sensitivity = 0.72–0.73, specificity = 0.83–0.84, AUC = 0.86, OA = 79%). The ensemble models did not show significant improvement in their statistical performance but should reduce the uncertainty in the spatial prediction of susceptibility over the individual models. The accuracy statistics of different ML models using all LIFs showed small improvements, but arguably not enough to justify the additional complexity introduced by including more LIFs. This supports the robustness of the proposed ensemble approach for obtaining reliable landslide susceptibility maps at a regional scale.
The susceptibility maps derived using ensemble models suggested that approximately 2–3% and 10–12% of the total study area fell within the “very high” and “high” landslide susceptibility categories, respectively. These regions are mainly characterized by barren steep slopes of low to moderate elevation, southerly slope aspects, low solar radiation, low topographical wetness, and loose sedimentary deposits. The landslide susceptibility maps of the area derived in this study have the potential to be used by policymakers to develop an effective mitigation strategy to reduce the landslide risk for the sustainable development of the area.