1. Introduction
Landslides are among the most devastating natural hazards, significantly impacting human lives, infrastructure, and the environment [
1,
2]. These geomorphological processes are triggered by natural and anthropogenic factors, including intense rainfall, earthquakes, deforestation, and unplanned construction activities [
3]. Globally, landslides cause substantial economic losses and fatalities, particularly in mountainous and hilly terrains of Asia, South America, and Europe [
4]. In India, landslides predominantly occur in the Himalayan region, Western Ghats, and Northeastern states, affecting thousands of people annually [
5,
6]. The increasing frequency of extreme weather events due to climate change has further exacerbated landslide occurrences, highlighting the urgent need for effective landslide susceptibility mapping (LSM) and risk mitigation strategies [
7,
8].
In India, the Western Ghats, one of the eight “hottest” biodiversity hotspots, is highly susceptible to landslides due to its steep slopes, heavy monsoon rains, and human interventions [
9]. Kerala, a state with a rugged terrain and high population density, has witnessed frequent landslides, particularly during the monsoon season. Kozhikode district, located in northern Kerala, is highly prone to landslides due to its complex geological setting, high annual rainfall, and land-use changes [
10,
11,
12]. Recent landslide events in the district, particularly in Wayanad and nearby highland areas, have resulted in severe casualties and infrastructure damage. The increasing vulnerability of Kozhikode to landslides underscores the necessity for scientific assessment and early warning systems to mitigate risks and ensure sustainable land-use planning [
13,
14].
Remote Sensing (RS) and Geographic Information Systems (GIS) have emerged as indispensable tools for landslide susceptibility mapping (LSM), enabling the generation of accurate spatial datasets crucial for hazard prediction and mitigation [
15,
16]. In recent years, the adoption of advanced machine learning (ML) algorithms—particularly the random forest (RF) classifier—has significantly improved LSM outcomes due to their capacity to model complex and nonlinear relationships among diverse environmental variables [
17]. The interaction of RS, GIS, and ML enhances the predictive accuracy of LSM by incorporating a wide array of predictive factors, including topographic, hydrological, geological, and anthropogenic factors [
18]. Several recent studies across various districts in Kerala, such as Idukki, Wayanad, and Pathanamthitta, have successfully demonstrated the application of GIS-integrated ML models for landslide prediction and risk zoning. These studies typically utilized terrain parameters derived from high-resolution DEMs, satellite indices like NDVI and MSI, and categorical inputs such as land use, lithology, and soil type to train classifiers like RF, SVM, and ANN [
19,
20,
21]. Despite these advancements, a noticeable gap persists in the systematic application of such integrated approaches to the Kozhikode district, which has also experienced frequent landslide events in recent decades. Therefore, this study aims to fill this regional gap by employing a robust data-driven methodology that combines the RF algorithm with high-resolution satellite imagery and GIS-based spatial analysis to delineate landslide-prone zones in Kozhikode. This integrated approach is expected to offer actionable regional disaster preparedness and risk reduction insights.
This research aims to develop a comprehensive landslide susceptibility map for Kozhikode, Kerala, by integrating ML-based RF methodology with remote sensing and GIS techniques. This study utilizes multiple predictive factors, including geology, slope, drainage, drainage density, land use and land cover (LULC), and spectral indices such as NDVI and SAVI. Additionally, historical landslide data were collected and incorporated into the model to enhance its predictive accuracy by providing insights into past landslide occurrences and their spatial distribution. By leveraging high-resolution remote sensing data, geospatial analysis, and historical landslide records, the study aims to identify and classify landslide-prone areas with greater precision. The application of RF, a robust ensemble ML algorithm, improves the model’s reliability by analyzing complex spatial relationships between landslide-controlling factors. The findings of this research will contribute to improved landslide hazard assessment, early warning systems, and sustainable land management practices in the region.
2. Study Area
Kozhikode district, located in northern Kerala, spans approximately 2270 km², with diverse topography ranging from coastal plains to steep highlands in the Western Ghats (
Figure 1). The Arabian Sea borders the district to the west and hilly terrains to the east, where landslides are a significant concern. The region experiences a tropical monsoon climate, receiving an average annual rainfall of 3266 mm, primarily during the Southwest Monsoon (June–September). The combination of intense rainfall, steep slopes, and deforestation makes Kozhikode highly vulnerable to landslides, especially in areas like Kuttiady, Koduvally, and Thamarassery. Geologically, the district comprises Precambrian crystalline rocks, including charnockites, khondalites, and laterites, widespread in the midland and highland regions [
22,
23]. The lateritic soil formations and deeply weathered rock layers make the terrain highly unstable, particularly on steep slopes. The district has a well-defined drainage network, with major rivers such as the Kuttiady, Korapuzha, and Chaliyar contributing to erosion and sediment transport. Drainage density and slope variations are crucial in landslide occurrences, as poorly drained or highly erodible areas are more prone to slope failures. Kozhikode has witnessed several landslide incidents, particularly in the hilly eastern regions. Heavy monsoon rainfall, unregulated land-use changes, deforestation, and infrastructure development have increased landslide susceptibility. With the increasing frequency of extreme weather events, scientific landslide susceptibility mapping is crucial to mitigate risks and develop effective disaster management strategies [
24]. The region has a rich cultural and economic history, contributing significantly to Kerala’s development. However, rapid urbanization and infrastructure expansion in landslide-prone zones have heightened the need for scientific assessments.
4. Results and Discussion
4.1. Importance and Contribution of Predictive Factor Variables
The RF model utilized in this study incorporated eight conditioning factors to map landslide susceptibility across the study area: Slope, Aspect, Geology, Drainage Density, Stream Order, Land Use and Land Cover (LULC), Moisture Stress Index (MSI), and Normalized Difference Vegetation Index (NDVI). Each variable plays a critical role in landslide initiation and propagation, either directly influencing terrain stability or indirectly modifying surface runoff and infiltration. In RF classification, the model assigns relative importance to each input based on how frequently and effectively it contributes to decision-tree splits. While exact importance values could not be extracted from Weka due to technical limitations, domain understanding of the predictive factors and empirical knowledge were used to infer variable influence. Generally, Slope, Geology, and Drainage Density are among the most dominant factors, as they directly control surface movement, soil saturation, and gravitational energy. Stream Order and MSI reflect hydrological behavior, while NDVI and LULC represent surface vegetation and land cover stability. Aspect contributes by influencing microclimate and weathering patterns on specific slopes.
4.1.1. Stream Order and Its Influence on Landslide Susceptibility
Stream order reflects the hierarchical arrangement of streams within a drainage basin and is closely tied to geomorphological processes influencing slope stability [
40]. In the Western Ghats’ rugged terrain, lower-order streams (such as first- and second-order) typically occur in steep headwater regions, which are more prone to landslides due to concentrated overland flow, toe erosion, and shallow soil depth (
Figure 3a and
Table 2). Stream orders were extracted from the DEM-derived stream network and classified using Strahler’s method into five categories. The analysis revealed that first- and second-order streams are strongly associated with landslide activity, primarily due to their location in high-relief areas where concentrated flow initiates shallow slips and debris flows. These zones often exhibit active erosion, undercutting, and rapid hydrological response during intense rainfall. The RF model effectively captured this relationship, with lower stream orders consistently contributing more strongly to landslide classification. Conversely, fourth- and fifth-order streams are generally located in flatter terrain and floodplains, with minimal landslide incidence.
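As a rough illustration of Strahler's ordering rule, the sketch below assigns orders on a small, hypothetical stream network. The node names and the `upstream` dictionary are assumptions made for the example; the study itself derived stream orders from the DEM in a GIS environment.

```python
def strahler(node, upstream):
    """Return the Strahler order of `node`.

    `upstream` maps a node to the list of segments draining into it
    (hypothetical structure). Headwater segments have order 1; where two
    or more tributaries of the same highest order meet, the order rises by 1.
    """
    children = upstream.get(node, [])
    if not children:
        return 1
    orders = [strahler(child, upstream) for child in children]
    top = max(orders)
    return top + 1 if orders.count(top) >= 2 else top


# Hypothetical network: two headwaters join at "a"; "a" and headwater "b" join at "outlet".
network = {"outlet": ["a", "b"], "a": ["a1", "a2"]}
print(strahler("a", network), strahler("outlet", network))
```

Here segment "a" becomes second order, while the outlet stays second order because only one second-order tributary reaches it.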
4.1.2. Drainage Density and Its Influence on Landslide Susceptibility
Drainage density (DD), defined as the total length of streams per unit area, is a key indicator of surface runoff behavior, terrain dissection, and subsurface infiltration capacity [
41]. In the Western Ghats region, high drainage density typically signifies steep slopes, impermeable bedrock, and intense rainfall–runoff interactions, all of which can exacerbate slope instability and landslide occurrences. In this study, drainage density was derived from stream networks using the line density tool in ArcGIS and classified into five categories (
Figure 3b and
Table 2). Areas with high drainage density (above 3.33 km/km²) exhibited a strong correlation with landslide events, particularly in steep, forested catchments where surface runoff rapidly concentrates. The intersecting network of streams also contributes to undercutting and toe erosion, triggering slope failures. The RF model assigned substantial weight to drainage density, frequently selecting it as a decision node in classification trees. High-density stream zones, especially along concave valley flanks and escarpments, were strongly associated with past landslide occurrences.
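The definition of drainage density (total stream length per unit area) can be written as a short sketch. The segment lengths and the 1 km² cell below are hypothetical; only the 3.33 km/km² threshold comes from the text.

```python
def drainage_density(stream_lengths_km, area_km2):
    """Total stream length divided by area, in km per square km."""
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    return sum(stream_lengths_km) / area_km2


# Hypothetical 1 km^2 cell crossed by three stream segments.
dd = drainage_density([1.2, 0.9, 1.5], 1.0)
is_high = dd > 3.33  # threshold for the highest class reported in the text
print(round(dd, 2), is_high)
```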
4.1.3. Slope and Its Influence on Landslide Susceptibility
Slope is a primary terrain factor in landslide occurrences as it directly affects the gravitational force acting on slope materials [
42]. In this study, the slope was derived from a high-resolution DEM and classified into five classes (
Figure 3c and
Table 2). Areas with slopes greater than 50% were observed to be the most susceptible to landslides due to the high gravitational stress acting on soil and weathered rock materials. These zones typically correspond to hilly terrains, escarpments, and deeply incised valleys. Conversely, gently sloping and nearly flat areas showed minimal landslide activity due to stable terrain and minimal gravitational force. The RF model recognized slope as a key factor, frequently used in decision trees to separate landslide and non-landslide zones, reflecting its dominant influence. Areas with steep slopes also coincided with low vegetation cover and poorly drained soil, increasing landslide potential.
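Because slope is reported here in percent, it may help to see how percent slope relates to elevation gradients and to degrees. The following is a minimal sketch; the gradient values are hypothetical and the function names are illustrative, not part of the study's workflow.

```python
import math


def slope_percent(dz_dx, dz_dy):
    """Percent slope from the elevation gradients (rise over run) in x and y."""
    return 100.0 * math.hypot(dz_dx, dz_dy)


def percent_to_degrees(percent):
    """Convert a percent slope to an angle in degrees."""
    return math.degrees(math.atan(percent / 100.0))


# Hypothetical gradients: 0.3 m/m east-west and 0.4 m/m north-south.
s = slope_percent(0.3, 0.4)
print(round(s, 1), round(percent_to_degrees(s), 2))
```

Under this conversion, the 50% threshold used for the most susceptible class corresponds to an angle of roughly 26.6 degrees.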
4.1.4. Aspect and Its Influence on Landslide Susceptibility
Aspect, or the directional orientation of a slope, significantly affects local microclimates, solar radiation, wind exposure, and soil moisture, all of which can influence slope stability [
43] (
Figure 3d and
Table 2). In this study, aspect was derived from the DEM and classified into ten directional categories, including flat terrain. Slopes facing south and southeast generally receive more intense solar radiation, leading to increased weathering, desiccation of soils, and reduced cohesion, all of which enhance susceptibility to landslides. These aspects also support sparser vegetation cover, which further reduces slope stability. In contrast, north-facing slopes tend to be cooler and retain more moisture and vegetation, reducing landslide likelihood. The RF model captured this variability, with south- and southeast-facing slopes contributing more prominently to the model’s classification of landslide-prone zones. Flat and gently sloping areas, regardless of direction, were found to be relatively stable.
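One common way to obtain ten aspect categories is to use a flat class, eight compass sectors of 45°, and north split across the two ends of the 0–360° range. The sketch below assumes that convention, with a negative value marking flat cells; the study's exact class boundaries are given in Table 2 and may differ.

```python
def aspect_class(aspect_deg):
    """Map an aspect angle (degrees clockwise from north) to a compass class.

    Assumption: negative values mark flat cells, and each compass sector
    spans 45 degrees centred on its direction.
    """
    if aspect_deg < 0:
        return "Flat"
    labels = ["North", "Northeast", "East", "Southeast", "South",
              "Southwest", "West", "Northwest", "North"]
    index = int(((aspect_deg % 360) + 22.5) // 45)
    return labels[index]


for angle in (-1, 10, 135, 180, 350):
    print(angle, aspect_class(angle))
```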
4.1.5. Geology and Its Influence on Landslide Susceptibility
The geology of the Western Ghats region in Kerala significantly influences landslide dynamics due to ancient crystalline rock formations undergoing intensive weathering under high rainfall conditions. The study area comprises four major lithological units. In the Western Ghats, even geologically competent rocks such as Charnockites and Migmatites become vulnerable to landslides due to intense chemical weathering, fracturing, and water infiltration resulting from persistent monsoonal precipitation [
44]. Charnockites, although considered strong, are often deeply jointed and altered along slopes, making them prone to failure when saturated. Migmatites and Felsic Granophyres, with foliated structures and altered mineral zones, also exhibit susceptibility, particularly on steep slopes and areas with inadequate vegetation anchoring (
Figure 3e and
Table 2). The Coastal and Sedimentary units have limited or no spatial presence in the study area and, thus, were not significant contributors to the RF model. The model’s output reflects the high landslide susceptibility in regions underlain by weathered charnockite and migmatite formations, particularly along escarpments and steeply inclined valleys where hydrological triggers accentuate structural weaknesses.
4.1.6. Land Use/Land Cover (LULC) and Its Influence on Landslide Susceptibility
Land use and land cover (LULC) are vital in modulating slope stability by affecting infiltration capacity, vegetation anchoring, runoff velocity, and erosion rates [
45]. In the Western Ghats region of Kerala, which is ecologically sensitive and experiences intense monsoonal precipitation, changes in LULC directly influence landslide susceptibility. The LULC map was prepared using multi-temporal satellite data and classified into the following five categories (
Figure 3f and
Table 2). Among these, “Hilly Areas with Dense Vegetation” showed a higher landslide susceptibility despite the vegetative cover, primarily due to steep slopes, unstable underlying geology, and water saturation during rainfall events. While dense vegetation on low to moderate slopes tends to stabilize them by binding the soil, vegetation on steeper terrain may not fully mitigate landslide risk, especially when root systems are shallow or soil layers become saturated. Conversely, river/stream areas are less prone to landslides but can indirectly contribute to instability by undercutting banks and increasing erosion at the base of slopes. The RF model assigned moderate to high importance to LULC, particularly flagging transition zones between hilly vegetated terrain and developed/agricultural lands as critical susceptibility areas.
4.1.7. Moisture Stress Index (MSI) and Its Influence on Landslide Susceptibility
The MSI, derived from remote sensing data, is an essential hydrological indicator reflecting the moisture content in surface soil and vegetation. In landslide-prone environments like the Western Ghats, high MSI values generally correspond to low moisture, while low MSI values reflect areas with high soil moisture, which may be more susceptible to failure due to increased pore-water pressure and loss of soil cohesion [
46,
47]. For this study, MSI was classified into five ranges (
Figure 3g and
Table 2). The analysis showed that areas with low MSI values (0.48–0.77), indicating high surface moisture, coincide with many historical landslide locations. These zones are often located on mid- to upper slopes, where water accumulation due to rainfall and slow drainage leads to saturation of the weathered soil/rock layer, increasing the risk of slope failure. In contrast, higher MSI values, representing dry or well-drained areas (such as ridge tops or flat terrain), were less associated with landslide occurrences. The RF model ranked MSI as a significant predictor, particularly in combination with slope and geology, indicating that moisture retention capacity and its interaction with terrain factors is crucial in landslide generation.
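MSI is commonly computed as the ratio of shortwave-infrared to near-infrared reflectance, which is why lower values indicate wetter surfaces. The sketch below assumes that formulation; the reflectance values are hypothetical, and only the 0.48–0.77 range is taken from the text.

```python
def msi(swir, nir):
    """Moisture Stress Index as the SWIR / NIR reflectance ratio."""
    if nir == 0:
        return float("nan")
    return swir / nir


def in_low_msi_class(value, lower=0.48, upper=0.77):
    """True when the value falls in the low-MSI (high-moisture) range from the text."""
    return lower <= value <= upper


# Hypothetical reflectances for a moist, vegetated pixel.
value = msi(0.15, 0.30)
print(round(value, 2), in_low_msi_class(value))
```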
4.1.8. NDVI and Its Influence on Landslide Susceptibility
The NDVI is a widely used remote sensing indicator quantifying vegetation density and health. NDVI values range from negative to positive, where positive values indicate healthy, dense vegetation and negative values represent bare soil, water bodies, or non-vegetated areas. In landslide studies, NDVI is used to assess vegetative cover, which plays a crucial role in stabilizing slopes, enhancing soil cohesion, and mitigating erosion [
48,
49]. For this study, NDVI was classified into five categories (
Figure 3h and
Table 2). In the Western Ghats, the areas with very dense vegetation (NDVI > 0.35) are typically found in forested regions that play an essential role in stabilizing slopes and preventing landslides. These areas are often located in the lower and mid-slope zones, where thick root systems help maintain soil structure and resist erosion. On the other hand, regions with sparse vegetation or bare soil (NDVI < 0.22) are more vulnerable to landslides due to the absence of root structures, which would otherwise help in soil retention and water absorption. These zones tend to be found on steep slopes and disturbed landscapes (e.g., agricultural land, road cuts, or deforested regions). The RF model heavily weighted NDVI as a predictive factor, recognizing that vegetation cover acts as a protective layer, especially in areas with steep slopes and high rainfall.
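For reference, NDVI is the normalized difference of near-infrared and red reflectance. The sketch below applies the two thresholds quoted above (0.35 and 0.22); the intermediate label and the reflectance values are assumptions for illustration, and the full five-class scheme is the one in Table 2.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    if total == 0:
        return float("nan")
    return (nir - red) / total


def vegetation_label(value):
    """Coarse labels based on the two thresholds quoted in the text."""
    if value > 0.35:
        return "very dense vegetation"
    if value < 0.22:
        return "sparse vegetation or bare soil"
    return "intermediate cover"  # hypothetical label for the remaining range


# Hypothetical reflectances for a forested pixel and a road cut.
print(vegetation_label(ndvi(0.50, 0.10)), "|", vegetation_label(ndvi(0.25, 0.20)))
```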
4.2. Data Preprocessing for RF Classification
Data preprocessing is crucial in preparing the input data for the RF classification. It involves transforming and standardizing the predictive factors, preparing training and testing datasets, and ensuring that the data is correctly formatted for input into the Weka software (Version 3.8.6). This section outlines the steps followed for preprocessing the data before running the RF classification.
4.2.1. Predictive Factor Standardization
The predictive factors (Stream Order, Drainage Density, Slope, Aspect, Geology, LULC, MSI, and NDVI) were initially obtained from various sources (
Table 3). These layers were processed and standardized to ensure consistency and compatibility for analysis in Weka. The main preprocessing tasks involved reclassifying, normalizing, and ensuring that all layers had the same resolution and alignment.
These reclassifications were implemented using the Reclassify tool in ArcGIS 10.8. All predictive factor layers were resampled to a uniform spatial resolution (30 m) and projected to a common coordinate system.
4.2.2. Landslide Inventory and Sampling Strategy
A total of 231 landslide locations were collected from Bhukosh (GSI portal), representing past landslide events in the study area. For training the RF model, an equal number of 231 non-landslide points were randomly generated from regions with no known landslide activity. This balanced dataset approach enhances model accuracy and prevents classification bias. A total of 462 points formed the base of the attribute table for training and testing the RF classifier.
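The balanced sampling idea can be sketched as follows. The coordinates are placeholders and the helper name is hypothetical; in the study, the non-landslide points were generated in the GIS from areas with no known landslide activity.

```python
import random


def build_balanced_samples(landslide_pts, candidate_pts, seed=1):
    """Pair every landslide point (label 1) with a randomly drawn
    non-landslide point (label 0) so both classes have the same size."""
    rng = random.Random(seed)
    non_landslide = rng.sample(candidate_pts, len(landslide_pts))
    return [(pt, 1) for pt in landslide_pts] + [(pt, 0) for pt in non_landslide]


# Placeholder coordinates: 231 inventory points and a pool of stable candidates.
landslides = [(i, i) for i in range(231)]
candidates = [(1000 + i, 1000 + i) for i in range(5000)]
samples = build_balanced_samples(landslides, candidates)
print(len(samples))
```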
4.2.3. Data Formatting for Weka
The combined dataset was exported as a .CSV file from ArcGIS. This .CSV file was then converted into an ARFF file using Weka’s built-in file loader or a simple text editor. Each row represented a point, and each column corresponded to a predictive factor or the target class (1 for landslide, 0 for non-landslide) (
Table 4). Data points with null or missing values were inspected and removed to maintain the integrity of the training data, with care taken to retain class balance. To validate the model’s performance, the dataset was split into training and testing subsets: 70% of the dataset (323 points) was used for training and 30% (139 points) for testing. This division ensured that the RF model was both trained on sufficient data and tested robustly on unseen data.
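To make the formatting step concrete, the sketch below writes a tiny table in ARFF layout and reproduces the 70/30 split sizes. The attribute names and row values are hypothetical, and the split is a plain shuffled split rather than the stratified procedure handled inside Weka.

```python
import random


def to_arff(relation, attributes, rows):
    """Build ARFF text from (name, type) attribute pairs and data rows."""
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        lines.append(f"@attribute {name} {kind}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


def shuffle_split(rows, train_fraction=0.7, seed=1):
    """Shuffle the rows and cut them into training and testing subsets."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical attributes; the last one is the target class (1 = landslide).
attributes = [("slope", "numeric"), ("ndvi", "numeric"), ("landslide", "{0,1}")]
rows = [(35.0, 0.21, 1), (4.0, 0.40, 0)] * 231  # 462 placeholder rows
arff_text = to_arff("landslide_samples", attributes, rows)
train, test = shuffle_split(rows)
print(len(train), len(test))  # 323 139
```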
4.3. RF Classification Using Weka and Model Performance Evaluation
The RF algorithm was implemented using the open-source ML platform Weka, due to its robustness, ease of use, and suitability for classification problems involving both categorical and continuous data. RF is an ensemble learning method based on constructing a multitude of decision trees during training and outputting the class that is the mode of the classes (classification) of the individual trees.
4.3.1. RF Model Setup in Weka
After preparing the ARFF file from the sampled dataset (as described in the previous section), the classification was carried out in Weka with the following settings. Classifier used: RF from the “trees” group in Weka. Number of trees (iterations): 100 (the default, which provides stable performance). Attributes considered at each split: √n (the square root of the total number of attributes, where n = 8 predictive factor variables). Random seed: set to 1 for reproducibility.
4.3.2. Training and Testing Process
The model was trained on 70% of the dataset (323 samples) and tested on the remaining 30% (139 samples). Weka automatically handled the randomization and stratified sampling to maintain class balance in both sets (
Table 5). Upon completion of the classification, Weka generated several performance indicators. The key performance metrics are provided below:
Accuracy indicates the proportion of correctly classified instances. The Kappa statistic assesses the model performance over random chance, with values > 0.8 considered excellent. Precision and Recall measure the balance between false positives and false negatives. AUC-ROC (Area Under the Curve–Receiver Operating Characteristic) measures the model’s capability to distinguish between the classes, where values close to 1 denote high discrimination power.
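These metrics can all be derived from a binary confusion matrix, as in the sketch below. The counts used in the example are hypothetical and are not the study's results, which are reported in Table 5.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and Cohen's kappa from a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "kappa": kappa}


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
print({name: round(value, 3) for name, value in metrics.items()})
```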
4.3.3. Importance of Predictive Factor Variables
Weka also outputs the relative importance of input variables based on their contribution to tree splits. In this study, variables such as slope, geology, MSI, and NDVI had higher importance values, highlighting their stronger influence on landslide occurrences in the study area (
Table 6). The following weights can be visualized or normalized to produce a landslide susceptibility map using a raster calculator in ArcGIS.
4.4. Generation of Landslide Susceptibility Index (LSI)
The landslide susceptibility levels were generated by integrating the predictive factor variables with their corresponding importance weights derived from the RF model. Following the classification process in Weka, the relative importance values assigned to each variable were utilized to construct a spatially explicit prediction model in ArcGIS 10.8. All predictive factors, initially prepared and reclassified into categorical raster formats, were standardized to a common resolution and projection to ensure spatial consistency across the dataset. To derive the landslide susceptibility level map, the relative importance of each predictive factor variable, as determined from the RF output, was used as a weight in a raster-based multi-criteria integration. These weights were normalized so that their cumulative sum equaled 1 (or 100%). Each predictive factor raster was then multiplied by its respective normalized weight using the Raster Calculator in ArcGIS, and the weighted rasters were summed. This overlay analysis resulted in a continuous Landslide Susceptibility Index (LSI), where each pixel value represented the cumulative weighted contribution of all influencing factors.
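The weighted overlay described here amounts to LSI = Σ wᵢ·xᵢ with the weights wᵢ summing to 1. A minimal sketch on small nested-list rasters follows; the factor names, raw weights, and cell values are hypothetical, and the rasters are assumed to be rescaled to the 0–1 range beforehand.

```python
def normalize_weights(weights):
    """Rescale raw importance values so that they sum to 1."""
    total = sum(weights.values())
    return {name: value / total for name, value in weights.items()}


def weighted_overlay(rasters, weights):
    """Sum each factor raster multiplied by its normalized weight, cell by cell."""
    w = normalize_weights(weights)
    names = list(rasters)
    n_rows = len(rasters[names[0]])
    n_cols = len(rasters[names[0]][0])
    return [[sum(w[name] * rasters[name][r][c] for name in names)
             for c in range(n_cols)]
            for r in range(n_rows)]


# Hypothetical 1 x 2 rasters, already scaled to 0-1, and hypothetical raw weights.
rasters = {"slope": [[1.0, 0.0]], "ndvi": [[0.0, 1.0]]}
raw_weights = {"slope": 3.0, "ndvi": 1.0}
lsi = weighted_overlay(rasters, raw_weights)
print(lsi)  # [[0.75, 0.25]]
```

Because the weights sum to 1 and the inputs lie between 0 and 1, the resulting index also stays within 0 and 1.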
4.5. Generating the Composite Susceptibility Index
The output generated using the Raster Calculator in ArcGIS 10.8 represents the LSI, where each pixel holds a continuous value between 0 and 1. These values indicate the relative likelihood of landslide occurrence, with higher values corresponding to areas of greater susceptibility. The Landslide Susceptibility Maps (LSMs) derived from the RF model are presented in
Figure 4.
Figure 4a illustrates a binary classification map, categorizing the study area into landslide-prone zones (assigned a value of 1 and shown in dark shade) and non-landslide zones (assigned a value of 0 and shown in light shade), based on the RF model’s classification output. On the other hand,
Figure 4b displays a continuous susceptibility surface generated from the RF model’s probability estimates. This map visualizes the spatial distribution of landslide susceptibility across the landscape, with pixel values ranging from low to high. In the context of the Western Ghats region of Kerala, the highest susceptibility values are predominantly observed in the rugged and steep terrain of the northern and southwestern parts of the study area. These areas are characterized by heavy monsoonal rainfall, deeply weathered complex rock formations, and dense vegetation cover, all contributing significantly to slope instability. In contrast, lower susceptibility values are found in the relatively flat, less dissected, and geologically stable regions toward the central and coastal portions of the study area.
4.6. Reclassifying the LSI into Landslide Susceptibility Levels
The Landslide Susceptibility Index (LSI) derived from the RF model was classified into five susceptibility zones to interpret spatial risk variation across the study area. The final Landslide Susceptibility Map (LSM) highlights the relative likelihood of landslides, ranging from very low to very high susceptibility levels (
Figure 5 and
Table 7). The analysis reveals that the largest share of the area (49.57%) falls under the Very Low Susceptibility category, covering approximately 1125.42 km² and indicating stable terrain with minimal landslide risk. The Low Susceptibility zone constitutes 602.97 km², or 26.56%, primarily located in the gently undulating plains and coastal borderlands where slope and geomorphic activity are minimal. On the other end of the spectrum, the High and Very High Susceptibility zones account for 9.84% (223.52 km²) and 7.99% (181.34 km²) of the area, respectively. These zones are concentrated along the rugged, steeply sloping terrains of the Western Ghats, where factors such as deeply weathered rocks, high rainfall intensity, and dense vegetation significantly increase landslide risk. The Moderate Susceptibility zone covers 136.75 km² (6.02%), forming transitional areas between lowland stability and upland vulnerability.
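The reported percentages follow directly from the zone areas, as the short check below illustrates. The areas are those quoted in the text; small differences in the last decimal place can arise from rounding.

```python
# Zone areas in km^2 as quoted in the text.
areas_km2 = {
    "Very Low": 1125.42,
    "Low": 602.97,
    "Moderate": 136.75,
    "High": 223.52,
    "Very High": 181.34,
}


def area_shares(areas):
    """Percentage of the total area covered by each susceptibility zone."""
    total = sum(areas.values())
    return {zone: 100.0 * area / total for zone, area in areas.items()}


shares = area_shares(areas_km2)
for zone, share in shares.items():
    print(f"{zone}: {share:.2f}%")
print(f"Total area: {sum(areas_km2.values()):.2f} km^2")
```

The five zones add up to the district area of about 2270 km², and the High and Very High zones together cover a little under 18% of it.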
4.7. Practical Implications of Landslide Susceptibility Level
The Landslide Susceptibility Map generated in this study offers critical insights that can be directly applied in land-use planning, infrastructure development, and disaster risk reduction strategies. High and very high susceptibility zones identified through the RF model should be prioritized for detailed geotechnical investigations, slope stabilization measures, and early warning systems. Urban planners and policymakers can use this map to avoid high-risk zones for future construction and development projects, thereby minimizing human and economic losses. Furthermore, the Landslide Susceptibility Level Map (LSLM) can assist disaster management authorities in identifying vulnerable areas, optimizing evacuation routes, and allocating resources more effectively during emergencies.
4.8. Validation Using Landslide Points
To assess the predictive performance of the Landslide Susceptibility Level Map (LSLM) generated using the RF model, a validation was carried out using the 231 historical landslide points obtained from the Bhukosh portal. These points, which represent known landslide occurrences within the study area, were spatially overlaid on the classified LSM. The validation employed both visual comparison and quantitative assessment. Spatial overlay analysis revealed that a substantial proportion of the landslide events were located within the High and Very High Susceptibility zones, indicating strong agreement between observed events and model predictions. To quantify the model’s predictive accuracy, a Receiver Operating Characteristic (ROC) curve was generated, and the Area Under the Curve (AUC) was calculated (
Figure 6). The AUC-ROC analysis is a widely accepted statistical method for evaluating classification model performance, particularly in binary classification problems like landslide prediction [
50,
51,
52,
53]. The model achieved an AUC value of 0.890, signifying excellent predictive capability and high discriminatory power between landslide-prone and stable areas. The bold red line represents the main ROC curve with an Area Under the Curve (AUC) value of 0.890, indicating high model accuracy. The dashed diagonal line denotes the line of no-discrimination (random guess). The two faint red lines represent additional curves automatically generated by the software, likely from resampling or internal validation iterations. While not confidence intervals, they provide a visual indication of model variability. The figure demonstrates the model’s ability to effectively distinguish between landslide and non-landslide areas. The high AUC score, coupled with the spatial concentration of known landslide points within higher susceptibility zones, confirms the robustness and reliability of the RF-based landslide susceptibility model in capturing spatial patterns of landslide occurrence in the Western Ghats region.
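For readers unfamiliar with the statistic, the AUC can be read as the probability that a randomly chosen landslide location receives a higher susceptibility score than a randomly chosen stable location. The sketch below computes it by direct pairwise comparison; the scores are hypothetical and unrelated to the reported value of 0.890.

```python
def auc_from_scores(positive_scores, negative_scores):
    """AUC as the share of (landslide, non-landslide) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical susceptibility scores at landslide and non-landslide points.
print(auc_from_scores([0.9, 0.4], [0.6, 0.1]))  # 0.75
```

A value of 0.5 corresponds to the diagonal line of no discrimination, and 1.0 to perfect separation of the two classes.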
5. Discussion
The present study demonstrates the effectiveness of integrating multiple predictive factors, geospatial variables, and ML algorithms for landslide susceptibility mapping (LSM) in the ecologically sensitive and geomorphologically complex terrain of the Western Ghats, Kerala. The application of the RF algorithm in this study has yielded a highly accurate prediction model, as indicated by the Area Under the ROC Curve (AUC) value of 0.890, suggesting a strong agreement between predicted susceptible zones and actual landslide occurrences. Unlike conventional statistical models such as Logistic Regression or the Bivariate Frequency Ratio, which are limited by linearity and variable independence assumptions, the RF model is non-parametric and can handle nonlinear relationships and complex interactions among multiple predictive factor variables. This strength is evident in the model’s ability to identify subtle patterns of terrain instability influenced by a combination of slope, aspect, drainage density, geological conditions, and vegetation cover (NDVI), all of which are highly heterogeneous across the study area.
The predictive factor importance ranking derived from the RF model highlights slope, geology, and drainage density as the most influential predictors of landslide occurrence, which aligns with previous findings from similar terrain settings in the Western Ghats and the Himalayan regions [
54,
55,
56,
57,
58,
59,
60]. However, this study differs in its inclusion of moisture-sensitive indices such as MSI and vegetation greenness (NDVI), which provided added insights into hydrological triggers and terrain cover conditions, both critical to understanding the landslide dynamics in monsoon-influenced terrains. The spatial pattern of landslide susceptibility indicates a high concentration of susceptible areas in the northern and southwestern sectors of the study region, which are characterized by steep slopes, deeply weathered hard rock formations, and high annual precipitation. These findings confirm that not just slope angle, but the weathering profile of underlying geology, particularly Charnockites and Migmatites, and rain-induced saturation of topsoil contribute significantly to slope failures. Notably, the coastal plains and central zones, with their flatter gradients and sedimentary formations, showed very low to low susceptibility, reinforcing the geomorphic control over landslide occurrences.
In comparison with earlier studies in Kerala, which predominantly relied on knowledge-driven approaches like the Analytical Hierarchy Process (AHP), this data-driven RF approach provides a more objective and replicable method of classification. Moreover, because the workflow relies on widely used GIS software (ArcGIS 10.8) and the freely available Weka toolkit, the methodology is accessible to local planners and disaster managers who may not have access to specialized modeling platforms. The LSM produced here is statistically robust and spatially relevant, providing actionable insights for land-use planning, infrastructure development, and risk mitigation strategies. Future research may benefit from incorporating temporal rainfall intensity, soil depth, and landslide velocity data, which could further improve the temporal resolution of predictions and support early warning systems.
5.1. Model Transferability and Regional Comparisons
While the RF-based LSM model demonstrated high accuracy in the Western Ghats region, its performance may vary when applied to areas with different geomorphic, climatic, and land-use characteristics. For instance, studies in the Eastern Himalayas and the Nilgiris have also employed RF models with comparable AUC scores ranging from 0.87 to 0.94 [
61,
62], affirming the robustness of RF across diverse terrains. However, variations in lithology, rainfall regimes, and anthropogenic influences may alter the importance ranking of input variables, so the model should be calibrated locally before it is applied to a different region. While slope and geology were dominant in this study, other areas may exhibit stronger control from land-use changes, seismicity, or hydrological parameters. Stakeholders should therefore validate the model locally before relying on its outputs. Future multi-regional comparative studies could further evaluate the generalizability of the RF framework and support the development of region-specific susceptibility models for broader disaster risk management applications.
5.2. Limitations and Future Scope
Despite the high accuracy achieved by the RF model, the study is limited by the availability of high-resolution temporal rainfall and soil depth data, which are crucial for capturing short-term triggering factors. The landslide inventory used, although comprehensive, is restricted to recorded events and may not reflect all historical or unreported occurrences. Because dynamic parameters such as land-use changes over time and anthropogenic disturbances (e.g., road cuts, quarrying) were not included, certain high-risk areas may be underrepresented. Another key limitation is the absence of rainfall intensity–duration thresholds, which are critical for accurately modeling landslide triggers in monsoon-dominated regions like Kerala. To address this, future work should focus on integrating near-real-time satellite-based rainfall data and ground-based weather station data to improve temporal prediction accuracy. Real-time monitoring using remote sensing platforms could also be incorporated to enable early warning capabilities.
Furthermore, evaluating and comparing the performance of advanced machine learning algorithms such as XGBoost, Support Vector Machines, or deep learning models may provide additional insights into model robustness and transferability. Ultimately, coupling susceptibility models with early warning systems, temporal change detection, and on-ground validation will enhance the practical applicability of landslide risk reduction strategies for planners, disaster managers, and policymakers.
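As an illustration of the rainfall intensity–duration thresholds noted above as a limitation, such thresholds are commonly expressed as a power law, I = αD^(−β), where I is the mean rainfall intensity and D is the event duration. The sketch below checks whether a rainfall event exceeds such a threshold. The coefficients `alpha` and `beta` used in the example are placeholders and are not calibrated for Kerala or taken from this study. The function names are likewise hypothetical.

```python
def id_threshold(duration_h, alpha, beta):
    """Threshold mean intensity (mm/h) for a given duration (h): I = alpha * D**(-beta)."""
    if duration_h <= 0:
        raise ValueError("duration must be positive")
    return alpha * duration_h ** (-beta)


def exceeds_threshold(total_rain_mm, duration_h, alpha, beta):
    """True if the event's mean intensity reaches or exceeds the threshold curve."""
    threshold = id_threshold(duration_h, alpha, beta)  # also validates the duration
    mean_intensity = total_rain_mm / duration_h
    return mean_intensity >= threshold


# Placeholder coefficients for illustration only (an assumption, not calibrated values).
ALPHA, BETA = 10.0, 0.5
print(exceeds_threshold(total_rain_mm=24.0, duration_h=4.0, alpha=ALPHA, beta=BETA))
```

In practice, the coefficients would need to be fitted to dated landslide events and the corresponding rainfall records for the region, which is the data gap identified in this section.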
6. Conclusions
This study demonstrates the application of the RF machine learning algorithm to map landslide susceptibility in the geologically complex and rainfall-prone Western Ghats region of Kerala. Eight predictive factors (Stream Order, Drainage Density, Slope, Aspect, Geology, LULC, NDVI, and MSI) were used as predictors in the model. The RF classifier effectively captured the nonlinear relationships among these variables, producing an LSI map with high predictive accuracy. The resulting map was reclassified into five susceptibility levels, revealing that approximately 17.82% of the study area falls under the high to very high susceptibility classes, primarily in steep, weathered, and high-rainfall terrains. Validation using 231 landslide inventory points from Bhukosh and AUC-ROC analysis yielded an accuracy of 0.890, indicating strong model performance. This research highlights the potential of RF-based models integrated with GIS for reliable landslide hazard zonation in data-limited mountainous regions. The findings can aid local authorities and planners in prioritizing high-risk zones for mitigation strategies, infrastructure planning, and early warning system development.