Article
Peer-Review Record

Explainable Boosting Machines for Slope Failure Spatial Predictive Modeling

Remote Sens. 2021, 13(24), 4991; https://doi.org/10.3390/rs13244991
by Aaron E. Maxwell 1,*, Maneesh Sharma 2 and Kurt A. Donaldson 2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 23 October 2021 / Revised: 4 December 2021 / Accepted: 7 December 2021 / Published: 8 December 2021
(This article belongs to the Section Remote Sensing in Geology, Geomorphology and Hydrology)

Round 1

Reviewer 1 Report

An excellent manuscript that contributes significantly to the SoA. Recommend publishing.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 2 Report

What does this study contribute to the remote sensing community beyond providing data? Does the resolution of the LiDAR DEM affect the conclusions? Does the size of the landslides matter to the conclusions? What is the influence of using a pre-event versus a post-event DEM on this work? Since all of the factors considered are calculated directly from the DEM, the effect of DEM variability on this work should be addressed. Please validate your AUC in a region without training data; I would like to see the accuracy of AI landslide prediction without a locally trained model.
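The validation requested here (AUC in a region that contributed no training data) can be sketched as a leave-one-region-out loop. This is only an illustration under stated assumptions: the records, region names, and the `fit_centroid_model` stand-in classifier are hypothetical and are not the authors' data or method; any real classifier could replace the stand-in.

```python
def auc(labels, scores):
    """Rank-based AUC: chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroid_model(X, y):
    """Hypothetical stand-in classifier: project onto the direction from the
    absence-class mean to the failure-class mean. Not the paper's model."""
    n_feat = len(X[0])
    means = {}
    for cls in (0, 1):
        rows = [x for x, label in zip(X, y) if label == cls]
        means[cls] = [sum(r[j] for r in rows) / len(rows) for j in range(n_feat)]
    w = [m1 - m0 for m1, m0 in zip(means[1], means[0])]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))


def leave_one_region_out_auc(records):
    """records: list of (region, features, label). Returns {held_out_region: AUC}."""
    results = {}
    for held_out in sorted({region for region, _, _ in records}):
        train = [(f, y) for region, f, y in records if region != held_out]
        test = [(f, y) for region, f, y in records if region == held_out]
        score = fit_centroid_model([f for f, _ in train], [y for _, y in train])
        results[held_out] = auc([y for _, y in test], [score(f) for f, _ in test])
    return results


# Made-up records: (region, [slope_deg, profile_curvature], 1 = failure / 0 = pseudo-absence)
records = [
    ("A", [35.0, 0.2], 1), ("A", [30.0, 0.1], 1), ("A", [10.0, 0.0], 0), ("A", [12.0, -0.1], 0),
    ("B", [28.0, 0.3], 1), ("B", [33.0, 0.0], 1), ("B", [8.0, 0.1], 0), ("B", [15.0, 0.0], 0),
    ("C", [25.0, 0.1], 1), ("C", [18.0, 0.2], 1), ("C", [20.0, 0.0], 0), ("C", [5.0, -0.2], 0),
]
aucs = leave_one_region_out_auc(records)
for region, value in aucs.items():
    print(f"held out {region}: AUC = {value:.2f}")
```

Each reported AUC comes from a model that never saw samples from the held-out region, which is the situation the comment asks about.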

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 3 Report

The authors of the paper describe Explainable Boosting Machines (EBM) and evaluate their performance in predicting the probability of slope failures. EBM performance is compared to that of other machine learning and regression alternatives by using data from several study areas. The authors show, for their study data, that EBM is capable of producing similar or improved results to common alternative approaches. The authors then show that the EBM model can reveal quantitative importance of input variables, which is an interesting measure. More transparency with input data is needed before conclusions can be interpreted from variable importance.

General Comments

There is occasional mention of how the work sits within a series of works from the same authors. The present work does not require a paragraph describing the previous works (lines 66-84), nor occasional mention throughout that the works are from the same authors. The previous works contribute knowledge that helped inform the present study, and should rightfully be cited by this work, but I think that they should be treated as literature and not as an active part of the present study. Doing so will help to reduce the work’s length and simplify its message.

Sources of training data are provided, but I do not feel that the data are described adequately. I think that readers would like to see quantities describing the mapped slope failures and each study area so that they can compare the datasets to other datasets. For example, the authors could provide a table with the average and standard deviation of slope (at 2 m resolution), and perhaps profile curvature, for each study area (all terrain within the area) and for each mapped slope failure. I think that there should also be a comparison of such statistics for slope failures mapped as points versus slope failures mapped as polygons. Are the statistics for slope failures initially mapped as polygons different from those initially mapped as points? Maps of all, or portions, of each study area could be shown with slope failure points as a way of adding transparency.
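The kind of summary table suggested above could be assembled along the following lines. This is a minimal sketch: the area names, mapping types, and slope values are invented for illustration and do not come from the manuscript.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize(samples):
    """samples: iterable of (study_area, mapping_type, slope_deg).
    Returns {(study_area, mapping_type): (n, mean, sample standard deviation)}."""
    groups = defaultdict(list)
    for area, kind, slope in samples:
        groups[(area, kind)].append(slope)
    return {
        key: (len(vals), mean(vals), stdev(vals) if len(vals) > 1 else float("nan"))
        for key, vals in sorted(groups.items())
    }


# Hypothetical slope samples in degrees; "terrain" stands for all cells in the study area.
samples = [
    ("Area 1", "point", 30.0), ("Area 1", "point", 34.0),
    ("Area 1", "polygon", 27.0), ("Area 1", "polygon", 29.0), ("Area 1", "polygon", 31.0),
    ("Area 1", "terrain", 12.0), ("Area 1", "terrain", 18.0),
    ("Area 2", "point", 22.0), ("Area 2", "point", 26.0),
    ("Area 2", "terrain", 9.0), ("Area 2", "terrain", 15.0),
]

table = summarize(samples)
print(f"{'Study area':<12}{'Type':<10}{'n':>4}{'Mean':>8}{'SD':>8}")
for (area, kind), (n, avg, sd) in table.items():
    print(f"{area:<12}{kind:<10}{n:>4}{avg:>8.2f}{sd:>8.2f}")
```

Placing the point-mapped, polygon-mapped, and whole-terrain rows side by side for each study area makes the comparison asked for in the comment directly readable.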

The concept of variable importance is introduced in section 4.3, but I think that it is equivalent to the interpretability described in section 2.1. If the two items are the same thing, then I think that a connection should be made between the two in section 2.1. Furthermore, partial and conditional variable importance were difficult for me to understand in their current context. I think that section 2.1 needs to introduce these concepts independently of the results and discussion. In general, section 2.1 could benefit from something saying “EBM is more interpretable, and this is why…”.
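For context on the interpretability point, EBMs are generally described as generalized additive models with optional pairwise interaction terms. The form below is the standard textbook statement and is not taken from the manuscript; the symbols ($g$, $\beta_0$, $f_j$, $f_{ij}$) are the usual notation and are assumptions here.

```latex
g\bigl(\mathbb{E}[y]\bigr) = \beta_0 + \sum_{j} f_j(x_j) + \sum_{i < j} f_{ij}(x_i, x_j)
```

Because each $f_j$ depends on a single predictor, it can be plotted directly as that predictor's contribution to the prediction, and one common importance summary is the mean absolute value of $f_j(x_j)$ over the training samples. That would be one way to connect the interpretability of section 2.1 with the variable importance of section 4.3.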

Specific Comments

Line

 

48, 87

Avoid stating the same thing twice, particularly without punctuation.

59-60

Sentence is misplaced. The current position divides two related sentences.

66-77

Condense content and add to section 3.2

75-77; 113-116; 195-197

Run-on sentences. Lack of proper punctuation makes the sentence’s meaning unclear.

78-84

Content does not seem relevant to current study.

103

Are GAMs more interpretable than black box ML methods, or are the equations that they produce more interpretable? Implementation of GAMs seems to be of similar complexity to implementation of ML methods.

180-184

I do not completely agree with this sentence. Other works do exist, for example:

Chang, K.-T., Merghadi, A., Yunus, A.P., et al. Evaluating scale effects of topographic variables in landslide susceptibility models using GIS-based machine learning techniques. Sci Rep 9, 12296 (2019). https://doi.org/10.1038/s41598-019-48773-2

250-252

Agencies tend to map landslides within areas containing their interests, and do not always do a good job of mapping entire areas. For example, my experience with DOT slope failure points is that they tend to be concentrated along highways. Does your methodology account for this when selecting absence points? Just because a landslide was not mapped does not mean that the location is not a landslide with characteristic landslide topography. Additionally, landslides are more frequently mapped in developed areas with reworked terrain. Perhaps the landslide points actually sample parameters of human change and not landslide change. If these properties apply to your slope failure dataset, I suggest removing the affected points and focusing your analysis on the best of the dataset. With tens of thousands of data points, there should be no challenge building a

254-257

Is the model sensitive to the proportion of slope failure points to pseudo absence samples in the testing dataset? Slope failures can be both infrequent and frequent, and the frequency of slope failures can influence confusion matrix metrics.
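The dependence on class proportion can be illustrated with a short worked example. It assumes a hypothetical classifier with fixed sensitivity and specificity of 0.9 (values chosen for illustration only) and varies the ratio of failure to pseudo-absence samples in the test set.

```python
def confusion_metrics(n_pos, n_neg, sensitivity, specificity):
    """Expected confusion-matrix metrics for a classifier with fixed
    sensitivity and specificity, given the class counts in the test set."""
    tp = sensitivity * n_pos
    fn = n_pos - tp
    tn = specificity * n_neg
    fp = n_neg - tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (n_pos + n_neg),
        "f1": 2 * precision * recall / (precision + recall),
    }


# Same hypothetical classifier, two test-set compositions.
balanced = confusion_metrics(n_pos=500, n_neg=500, sensitivity=0.9, specificity=0.9)
rare = confusion_metrics(n_pos=50, n_neg=950, sensitivity=0.9, specificity=0.9)

for name, m in (("balanced 1:1", balanced), ("rare 1:19", rare)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Recall and specificity are unchanged between the two test sets, while precision and F1 fall sharply when failures are rare, so metrics of that kind are only comparable across studies when the class proportions are reported.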

295-296

With bilinear interpolation, or not?
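The distinction behind this question can be sketched as follows. The example assumes that raster values are being extracted at point locations, with positions expressed in fractional (row, column) units measured between cell centers; the tiny grid and function names are hypothetical and do not represent the authors' workflow.

```python
import math


def sample_nearest(grid, row, col):
    """Value of the cell whose center is closest to the fractional (row, col) position."""
    r = min(max(int(math.floor(row + 0.5)), 0), len(grid) - 1)
    c = min(max(int(math.floor(col + 0.5)), 0), len(grid[0]) - 1)
    return grid[r][c]


def sample_bilinear(grid, row, col):
    """Distance-weighted average of the four cell centers surrounding (row, col).
    Requires a grid of at least 2 x 2 cells; positions are clamped to the grid."""
    row = min(max(row, 0.0), len(grid) - 1.0)
    col = min(max(col, 0.0), len(grid[0]) - 1.0)
    r0 = min(int(math.floor(row)), len(grid) - 2)
    c0 = min(int(math.floor(col)), len(grid[0]) - 2)
    dr, dc = row - r0, col - c0
    top = grid[r0][c0] * (1 - dc) + grid[r0][c0 + 1] * dc
    bottom = grid[r0 + 1][c0] * (1 - dc) + grid[r0 + 1][c0 + 1] * dc
    return top * (1 - dr) + bottom * dr


# Made-up 2 x 2 slope raster in degrees.
slope = [[10.0, 20.0],
         [30.0, 40.0]]

print(sample_nearest(slope, 0.25, 0.25))   # value of the closest cell
print(sample_bilinear(slope, 0.25, 0.25))  # blend of the four neighbors
```

For a point that does not fall on a cell center the two approaches return different values, which is why stating the extraction method matters for reproducibility.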

449

Labeling only plot (a) gives the plot a confusing appearance and is not necessary, considering that adequate room already exists above (b)-(c). Please label all axes.

533; 542

Labeling only plot (a) gives the plot a confusing appearance and is not necessary, considering that adequate room already exists above (b). Please label all axes.

533

Plot (b) is the only one with “Slp” not being the most important variable. In fact, “Slp” falls to near the bottom of the list. Why is this the case? Could there be bias in the training data (adding the tables and transparency described above could help to see)? Are slope failures in the Cumberland Plateau and Mountains area just very different?

Comments for author File: Comments.pdf

Author Response

Please see the attachment

Author Response File: Author Response.pdf
