This study explores two modeling issues that may cause uncertainty in landslide susceptibility assessments when different sampling strategies are employed. The first issue is that the attributes extracted within a landslide inventory polygon can vary when samples are taken from different locations with diverse topographic conditions. The second issue is the mixing problem in landslide inventories: landslide areas detected from remotely sensed data generally include both source and run-out features unless the run-out portion is removed manually with auxiliary data. To this end, the effects of different statistical sampling strategies and of run-out on random forests (RF)-based landslide susceptibility modeling are examined for the 2009 Typhoon Morakot event in southern Taiwan. Because the constructed models can exhibit an extremely high false alarm error or missing error, this study integrates cost-sensitive analysis with RF to adjust the decision boundary and reduce these errors. Experimental results indicate that, compared with a logistic regression model, RF with the hybrid sampling strategy generally performs better, achieving an overall accuracy above 80% and a kappa coefficient above 0.7, and that higher accuracies are obtained when the run-out is treated as an independent class or combined with the non-landslide class. Cost-sensitive analysis significantly improved the prediction accuracy, by 5% to 10%. Therefore, run-out should be separated from the landslide source and labeled as an individual class when preparing a landslide inventory.
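The decision-boundary adjustment and the two accuracy measures mentioned above can be illustrated with a minimal sketch. It assumes a binary landslide/non-landslide setting and the standard minimum-expected-cost rule for turning class probabilities (such as those output by an RF) into labels; this is not necessarily the cost-sensitive procedure used in the study. The function names, the cost values, and the toy probabilities are all hypothetical and chosen only for illustration.

```python
# Minimal sketch (assumption: binary case, minimum-expected-cost rule).
# All names, costs, and probabilities below are hypothetical.

def cost_threshold(cost_false_alarm, cost_miss):
    """Probability threshold that minimizes expected misclassification cost.

    A cell is labeled landslide when p * cost_miss >= (1 - p) * cost_false_alarm,
    i.e. when p >= cost_false_alarm / (cost_false_alarm + cost_miss).
    """
    return cost_false_alarm / (cost_false_alarm + cost_miss)


def classify(probs, threshold):
    """Label 1 (landslide) when the predicted probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probs]


def accuracy_and_kappa(y_true, y_pred):
    """Overall accuracy and Cohen's kappa from two label lists."""
    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement from the marginal label frequencies.
    expected = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 1.0
    return observed, kappa


# Toy landslide probabilities, as a classifier might output them (made up).
probs = [0.9, 0.45, 0.35, 0.3, 0.2, 0.1]
y_true = [1, 1, 1, 0, 0, 0]

# Default boundary (equal costs) versus a boundary shifted because a
# missed landslide is assumed to cost three times a false alarm.
default_pred = classify(probs, cost_threshold(1, 1))
shifted_pred = classify(probs, cost_threshold(1, 3))

print(accuracy_and_kappa(y_true, default_pred))
print(accuracy_and_kappa(y_true, shifted_pred))
```

In this toy case, raising the cost of a miss lowers the threshold, so fewer landslide cells are missed at the price of more false alarms. Real cost ratios would need to be chosen from the application, not from this example.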
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.