Review Reports - Landslide Susceptibility Mapping Optimization for Improved Risk Assessment Using Multicollinearity Analysis and Machine Learning Technique

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This study takes the Doti District of Nepal as its study area and integrates multicollinearity analysis (e.g., PCA) with machine learning methods such as Random Forest and Permutation Importance. It covers landslide inventory construction, susceptibility modeling, and risk assessment, and produces a regional landslide risk map that provides technical support for local landslide disaster management. However, the paper still has the following issues:

L43, Introduction section: Only 15 references are cited, which is too few and leaves gaps in coverage. The section does not systematically review landslide research in the Doti District or in comparable Himalayan mountain areas. It also does not elaborate on the theoretical and practical significance of the research or address the particular characteristics of the Doti District, so the regional relevance of the work is hard to see.
L105, Figure 1c: It is recommended to clearly mark the landslide extent in the figure so that the location of each landslide is unambiguous. In addition, the fourth photo in the figure is not sharp enough and should be replaced with a high-resolution image.
L109: Was the landslide inventory work verified through field surveys? What are the selection criteria for non-landslide points?
L380: Vulnerability was calculated based on the indicators of the United Nations Office for Disaster Risk Reduction (UNDRR) (e.g., farmland, buildings), but the method for determining the indicator weights was not explained.
L441: The font size of the text in Figure 4 is too small to read comfortably. It is also recommended to add borders to all subfigures in Figure 6 so that the charts have a uniform appearance.
L475: The AUC of the Permutation-Weighted model reaches 95%, but its accuracy is only 69%. These two values appear inconsistent (a high AUC usually corresponds to excellent classification performance, so the accuracy would be expected to be close to the AUC level).
L548: It is recommended to add landslide extent labels in Figures 8d, 8e, and 8f to facilitate the interpretation of the verification results.
L552: The discussion section lacks depth: key results are not explained in light of data limitations and regional characteristics, and there is no systematic comparison with similar studies. Model uncertainty is not discussed at all; data, parameter, and spatial uncertainty are not analyzed.
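
A minimal sketch related to the L475 comment, assuming (this is not stated in the manuscript) that accuracy was computed at a fixed 0.5 cut-off. AUC depends only on how the scores rank landslide versus non-landslide points, while accuracy depends on the chosen threshold, so a shifted score distribution is one possible source of such a gap. The labels and scores are invented for illustration.

```python
def auc(labels, scores):
    """Rank-based AUC: share of (positive, negative) pairs ordered correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold):
    """Share of points classified correctly when score >= threshold means 'landslide'."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Invented data: the ranking is nearly perfect, but most scores sit below 0.5.
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.25, 0.35, 0.40, 0.45, 0.60]

auc_value = auc(labels, scores)               # 0.96
acc_default = accuracy(labels, scores, 0.5)   # 0.6
acc_tuned = accuracy(labels, scores, 0.25)    # 0.9
print(auc_value, acc_default, acc_tuned)
```

Reporting the threshold used for the accuracy figure, or accuracy at a threshold tuned on the training data, would let readers judge whether the two metrics are really in conflict.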

Author Response

The author response is attached below.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This study demonstrates that replacing human subjectivity and basic statistical methods with advanced statistical analysis, machine learning, and socio-economic integration can produce more effective tools for landslide risk management. The authors empirically validate this approach in the context of Doti District, western Nepal.

The article presents an interesting application with strong potential for replication in other parts of the world. Therefore, the work fits well within the scope of the journal.

However, there are some aspects that deserve attention and revision. I will point them out by section.

Figures, Equations, Graphs, and Tables

In the importance graphs (Figure 4d–e), standard deviation (SD) values are mentioned but not shown graphically.

Some equations lack explanatory legends. For example, in Eq. 5 the meaning of each symbol only appears later in the text.

Although the article mentions PCA (Eq. 3), it does not show how eigenvalues and variances were calculated, nor the percentage of variance explained by the first components.
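
As an illustration of the information requested here, the following sketch computes eigenvalues and the percentage of variance explained per component. It assumes NumPy is available and uses a synthetic three-factor matrix in place of the manuscript's data; the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for scaled causal factors (rows = samples, columns = factors).
base = rng.random((200, 1))
X = np.hstack([base, 0.9 * base + 0.1 * rng.random((200, 1)), rng.random((200, 1))])

Xc = X - X.mean(axis=0)                       # center each factor
cov = np.cov(Xc, rowvar=False)                # covariance matrix of the factors
eigenvalues = np.linalg.eigvalsh(cov)[::-1]   # eigvalsh returns ascending order; reverse it
explained_ratio = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(explained_ratio)

for i, (ev, r, c) in enumerate(zip(eigenvalues, explained_ratio, cumulative), start=1):
    print(f"PC{i}: eigenvalue={ev:.4f}  explained={r:.1%}  cumulative={c:.1%}")
```

A table of this form (eigenvalue, explained variance, cumulative variance) would address the comment directly.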

Introduction

The authors clearly describe the limitations of existing models (multicollinearity, subjective weighting, lack of integration between physical and social factors). However, they never explicitly formulate a research hypothesis or question. Doing so would show readers what is actually being tested and allow the results and discussion to return to that hypothesis — which does not happen in the final text (the conclusion merely reaffirms the success of the method, but not the initial hypothesis).
Suggestion: make the research question or hypothesis explicit.

The introduction presents several references (Lee et al., Kumar et al., Selamat et al., etc.), but they appear as a chronological list of studies without organizing the conceptual field. It is not clear what each study contributes or how the current work advances beyond them.

Methodology

a) The article states that data are “available upon request,” but provides no scripts, data links, or public repository (e.g., GitHub, Zenodo).
Suggestion: Given the strong potential for replication (I would personally use this in the classroom), the authors should make the complete workflow and Python code (as a notebook) publicly available to ensure reproducibility and transparency, a growing requirement in open science.

b) Selection of causal factors: The choice of 12 variables is coherent, but the article does not explain why other potential variables (e.g., detailed lithology, daily rainfall intensity, NDMI, distance to geological faults) were excluded.
Suggestion: justify the exclusion criteria and data limitations.

c) Normalization of variables: The use of min–max scaling is mentioned, but there is no verification of normal distribution (skewness/kurtosis), which is important for PCA. Equations 23–25 present weighted overlay models, but there is no discussion of possible nonlinear relationships between causative factors.
Suggestion: include an exploratory analysis (boxplots, histograms) to show whether normalization is appropriate.
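
A minimal sketch of the kind of exploratory check meant in (c), using invented values for one hypothetical factor. It also illustrates that min-max scaling is a linear transform and therefore leaves skewness unchanged; the log transform is shown only as one possible alternative, not as the authors' method.

```python
import math
import statistics


def skewness(values):
    """Population skewness: third central moment divided by the cubed standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * std ** 3)


def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical right-skewed factor, e.g. distance to stream in metres.
distance = [5, 8, 10, 12, 15, 20, 30, 60, 150, 400]

raw_skew = skewness(distance)
scaled_skew = skewness(min_max(distance))              # same as raw_skew
log_skew = skewness([math.log(v) for v in distance])   # closer to symmetric

print(f"raw={raw_skew:.2f}  min-max={scaled_skew:.2f}  log={log_skew:.2f}")
```

Histograms or boxplots of each factor before and after scaling would convey the same information visually.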

d) Interpretation of multicollinearity: The article mentions “perfect multicollinearity r=1 between Geology and LULC” but does not discuss the physical causes (why would land use be 100% correlated with geology?).
Suggestion: discuss whether this r=1 results from a coding error or a spatial data limitation.
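
One way to run the check suggested in (d) is to cross-tabulate the two layers at the sample points. The sketch below uses invented class codes, and the helper name `classes_within` is hypothetical. Note also that a Pearson r computed on nominal class codes depends on how the classes happen to be numbered, so a cross-tabulation is usually easier to interpret than the correlation itself.

```python
from collections import defaultdict

# Hypothetical class codes sampled at the same points from the two categorical layers.
geology = [1, 1, 2, 2, 3, 3, 4, 4]
lulc = [10, 10, 20, 20, 30, 30, 40, 40]


def classes_within(keys, values):
    """Map each class in `keys` to the set of classes in `values` that co-occur with it."""
    table = defaultdict(set)
    for k, v in zip(keys, values):
        table[k].add(v)
    return dict(table)


lulc_by_geology = classes_within(geology, lulc)
geology_by_lulc = classes_within(lulc, geology)

# Element-wise identical samples would point to a duplicated or mislabelled input layer.
identical_layers = geology == lulc
# A one-to-one class mapping means one layer is effectively a recoding of the other.
one_to_one = all(len(s) == 1 for s in lulc_by_geology.values()) and all(
    len(s) == 1 for s in geology_by_lulc.values()
)

print(identical_layers, one_to_one)
```

If either flag is true on the real data, a coding or data-preparation problem is more likely than a physical relationship.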

Results

e) Geomorphological interpretation is superficial.
The article confirms that “slope is the dominant factor” but does not explain the physical reasoning (relationship with soil type, rainfall, tectonic structure, etc.).
Suggestion: enrich the discussion with a process-based geomorphological interpretation.

f) No statistical comparison between models.
The comparison between AUCs (95%, 93%, 90%) is purely descriptive; a statistical test (e.g., DeLong test) should be used to confirm whether differences are significant.
Suggestion: apply a significance test or present confidence intervals.
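
As a sketch of the second option (confidence intervals), the code below substitutes a paired percentile bootstrap for the DeLong test: both models are evaluated on the same resampled points and the distribution of the AUC difference is summarised. The validation data are synthetic and both models are invented; `bootstrap_auc_diff` is a hypothetical helper.

```python
import random


def auc(labels, scores):
    """Rank-based AUC: share of (positive, negative) pairs ordered correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_diff(labels, scores_a, scores_b, n_boot=1000, seed=0):
    """95% percentile interval for AUC(a) - AUC(b), resampling the same points for both models."""
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:  # AUC is undefined unless both classes are present
            continue
        diffs.append(auc(y, [scores_a[i] for i in idx]) - auc(y, [scores_b[i] for i in idx]))
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]


# Synthetic validation set: 30 landslide and 30 non-landslide points.
data_rng = random.Random(42)
labels = [i % 2 for i in range(60)]
scores_a = [0.3 * y + 0.7 * data_rng.random() for y in labels]
scores_b = [0.1 * y + 0.9 * data_rng.random() for y in labels]

lower, upper = bootstrap_auc_diff(labels, scores_a, scores_b)
print(f"95% CI for AUC difference: [{lower:.3f}, {upper:.3f}]")
```

An interval that excludes zero suggests the AUC difference is unlikely to be explained by sampling variation alone; an interval that includes zero means the ranking of the models is not established by this validation set.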

g) Validation through Google Earth imagery is insufficient.
Suggestion: use an independent quantitative validation (e.g., confusion matrix using field validation points, or cumulative gain curve analysis).

Author Response

The author response to the reviewer's comments and suggestions is attached in the file below.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

Landslide susceptibility mapping identifies areas prone to landslides, providing essential information for risk mitigation and land-use planning. In this manuscript, an improved approach to landslide susceptibility mapping is developed through the integration of geospatial modeling with multi-criteria decision analysis. This method tackles major challenges in LSM by incorporating advanced multicollinearity assessment and machine learning techniques. I have several suggestions for the authors to further improve the quality of this manuscript:

It is suggested to specify the data format and structure of the landslide inventory and to explain whether on-site verification was conducted to ensure the reliability of the mapped landslide locations.
The introduction provides insufficient discussion of the application of ML methods in LSM and lacks logical coherence. Emerging physics-informed ML methods are transforming LSM research by improving model generalization and maintaining physical consistency in predictions. It is suggested to discuss these advanced approaches in the introduction.

DOI: 10.1007/s11440-023-01841-4

DOI: 10.1007/s10462-021-09967-1

DOI: 10.1016/j.jrmge.2025.06.030

DOI: 10.1016/j.jrmge.2024.08.005

DOI: 10.1007/s11771-024-5687-3

DOI: 10.1002/gj.4902

The accuracy of machine learning models depends closely on the quality and quantity of the data. The authors should therefore describe the data collection and preprocessing procedures in more detail.
For the collinearity matrix, what is the threshold that defines the boundary for unacceptable levels of correlation?
All abbreviations should be explicitly defined upon their first appearance in the manuscript.
The authors are encouraged to conduct a model uncertainty analysis.
The description of the analysis is unclear, and the software or packages used are not specified. The authors are encouraged to share their code on GitHub, if possible, to allow others to reproduce and adapt their methodology.
Please clearly explain the potential implications or limitations of applying this algorithm to a different region or country.
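
To make the threshold question above concrete, the sketch below flags factor pairs whose absolute Pearson correlation meets a stated cut-off. The value 0.7 is an assumed rule of thumb, not a threshold taken from the manuscript, and the factor values are invented.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


THRESHOLD = 0.7  # assumed rule-of-thumb cut-off, not taken from the manuscript

# Invented values for three continuous factors at eight sample points.
factors = {
    "slope": [5, 12, 20, 28, 35, 41, 50, 58],
    "elevation": [600, 750, 900, 1200, 1350, 1500, 1800, 2100],
    "ndvi": [0.62, 0.30, 0.55, 0.41, 0.70, 0.25, 0.48, 0.52],
}

flagged = []
for a, b in combinations(factors, 2):
    r = pearson(factors[a], factors[b])
    if abs(r) >= THRESHOLD:
        flagged.append((a, b, round(r, 3)))

print(flagged)
```

Whatever cut-off the authors used, stating it next to the collinearity matrix would make the factor-selection step reproducible.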

Author Response

The author response to Reviewer 3 is in the attached file below.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

Dear authors, thank you for this well-designed and clearly articulated study, which makes a meaningful contribution to the field of geospatial hazard modeling. The manuscript is well structured, with results backed by thorough statistical analysis and insightful interpretation. However, I have a few recommendations to improve the paper's clarity and overall impact:

1- For the introduction, the background is thorough, yet it could be strengthened by referencing some of the more recent regional applications of machine learning in landslide studies, such as publications from 2023-2024, to demonstrate the novelty of your approach within a broader framework.

2- The methodology is clearly explained, but it would be helpful to provide further clarification on several implementation specifics, such as hyperparameter settings, as well as the software or libraries used for PCA and Random Forest training. This would improve the reproducibility of your workflow.

3- The visual quality of the optimization curves, hazard maps, and risk maps needs to be improved. Increasing resolution and ensuring standardized legends and color scales will make them easier to interpret.

4- Consider including a summary table that compares the performance of the various models evaluated, showcasing mean performance (± standard deviation) for each. This would make comparisons between Random Forest, Permutation, and hybrid methods more transparent.

5- The discussion should include a brief section on computational efficiency, specifically addressing training time and resource utilisation, as well as potential expansions to other geomorphological settings to demonstrate the scalability of the proposed framework.
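
For the summary table suggested in point 4, a minimal sketch of the layout is given below. The per-fold AUC values are invented placeholders; real numbers would come from the authors' cross-validation runs.

```python
from statistics import mean, stdev

# Hypothetical per-fold AUC values for each model (placeholders, not reported results).
fold_auc = {
    "Random Forest": [0.91, 0.93, 0.92, 0.94, 0.90],
    "Permutation": [0.94, 0.95, 0.96, 0.95, 0.95],
    "Hybrid": [0.89, 0.90, 0.91, 0.90, 0.90],
}

# One row per model: name, mean AUC, sample standard deviation across folds.
rows = [(name, mean(values), stdev(values)) for name, values in fold_auc.items()]
table = "\n".join(f"{name:<15}{m:.3f} ± {s:.3f}" for name, m, s in rows)

print(f"{'Model':<15}AUC (mean ± SD)")
print(table)
```

The same layout extends to other metrics (accuracy, F1) by adding one column per metric.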

Author Response

The author response to the comments from Reviewer 4 is in the attached file below.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The paper has met the publication requirements after revisions and is approved for publication.

Reviewer 2 Report

Comments and Suggestions for Authors

The authors addressed all the issues raised.