# Exploring Influence of Sampling Strategies on Event-Based Landslide Susceptibility Modeling


## Abstract


## 1. Introduction

## 2. Materials and Methods

#### 2.1. Study Site and Data

^{2}, the Laonong River watershed, located within the Kaoping River watershed in southern Taiwan, is selected as the study site (Figure 1). Elevations in the study site range from 258 to 1666 m above sea level, as measured from the digital elevation model (Figure 2a) edited by Chiang et al. [94]. The mean slope is 25.84°, with a standard deviation of 11.98°. According to the geological and soil maps published by the Central Geological Survey of Taiwan, the study area contains three geological formations and four soil types. The geological formations are the Lushan, Sanhsia, and Toukoshan formations (Figure 2b), and the main soil types (agriculture- and geology-based classification) are alluvium, colluvium, lithosol, and loam (Figure 2c).

^{2}) with only two rainfall gauge stations located inside the study site. Similar assumptions have been made in other studies [57,77,95,96,97].

#### 2.2. Developed Procedure

#### 2.2.1. Sampling Strategies and Analytical Schemes

#### 2.2.2. Random Forests

_{j}) is the entropy of the subset for a specific landslide factor, computed by Equation (1), and IG(a) denotes the information gain of that landslide factor. For continuous data, the Gini index is used to calculate the information gain, as described in Equations (4) and (5), where C is a split point that divides the numeric values of a landslide factor into two parts, and $N_1$ and $N_2$ are the numbers of samples with a ≤ C and a > C, respectively.
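The split criteria described above can be illustrated with a minimal sketch. It assumes the standard ID3-style definitions of entropy and information gain for a categorical factor and the CART-style weighted Gini impurity for a continuous factor; the paper's exact Equations (1) to (5) are not reproduced here, and all function names and toy data are hypothetical. A random forest [98] would apply such a criterion at every node of every tree, on a random subset of factors, which this sketch does not show.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())


def information_gain(values, labels):
    """IG(a) for a categorical factor a: parent entropy minus the
    size-weighted entropy of the subsets S_j that share the value a_j."""
    n = len(labels)
    subsets = {}
    for v, y in zip(values, labels):
        subsets.setdefault(v, []).append(y)
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())


def best_gini_split(values, labels):
    """Scan candidate split points C of a continuous factor and return the
    (C, score) pair minimizing N1/N * Gini(a <= C) + N2/N * Gini(a > C)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_c, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical values cannot be separated by a threshold
        c = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbors
        left = [y for _, y in pairs[:i]]   # the N1 samples with a <= C
        right = [y for _, y in pairs[i:]]  # the N2 samples with a > C
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_c, best_score = c, score
    return best_c, best_score


# Hypothetical toy data: slope (degrees) and geology for six pixels.
slope = [10, 12, 15, 30, 35, 40]
geology = ["A", "A", "B", "B", "C", "C"]
label = ["N", "N", "N", "L", "L", "L"]  # N: non-landslide, L: landslide source

print(best_gini_split(slope, label))               # (22.5, 0.0)
print(round(information_gain(geology, label), 3))  # 0.667
```

A lower weighted Gini impurity corresponds to a larger gain, so the split at 22.5° that perfectly separates the toy classes scores 0.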

#### 2.2.3. Cost-Sensitive Analysis

$C_{00} = C_{11} = 0$, Equation (7) is equivalent to Equation (8), and a threshold p* can be defined according to Equation (10), derived from Equation (9), such that the classifier labels an instance x as positive if P(1|x) ≥ p* [99].
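Under the condition that correct predictions cost nothing ($C_{00} = C_{11} = 0$), the standard result of Elkan [99] gives the threshold p* = C_{10}/(C_{10} + C_{01}), where C_{10} is the cost of a false positive and C_{01} the cost of a false negative. The sketch below assumes this is what Equation (10) expresses. It also shows the general minimum-expected-cost rule for the three-class case (Model-1). The cost matrix that up-weights missed run-out samples by a factor C is a hypothetical illustration, not the exact matrix used in this study, and all function names are hypothetical.

```python
def elkan_threshold(c10, c01):
    """Threshold p* = C10 / (C10 + C01), valid when C00 = C11 = 0.
    c10: cost of predicting 1 when the true class is 0 (false positive)
    c01: cost of predicting 0 when the true class is 1 (false negative)"""
    return c10 / (c10 + c01)


def classify_binary(p1, c10, c01):
    """Label x as positive (1) if P(1|x) >= p*, otherwise 0."""
    return 1 if p1 >= elkan_threshold(c10, c01) else 0


def min_expected_cost(probs, cost):
    """Return the class i minimizing sum_j cost[i][j] * P(j|x),
    where cost[i][j] is the cost of predicting i when the truth is j."""
    expected = [sum(c * p for c, p in zip(row, probs)) for row in cost]
    return min(range(len(expected)), key=expected.__getitem__)


def run_out_cost_matrix(weight, n_classes=3, run_out=2):
    """Hypothetical cost matrix: 0 on the diagonal and 1 elsewhere, except
    that missing a true run-out sample (column run_out) costs `weight`."""
    cost = [[0.0 if i == j else 1.0 for j in range(n_classes)]
            for i in range(n_classes)]
    for i in range(n_classes):
        if i != run_out:
            cost[i][run_out] = float(weight)
    return cost


# Classes: 0 = non-landslide, 1 = landslide source, 2 = run-out.
probs = [0.5, 0.1, 0.4]  # hypothetical class probabilities for one pixel
print(min_expected_cost(probs, run_out_cost_matrix(1)))  # 0 (same as argmax)
print(min_expected_cost(probs, run_out_cost_matrix(5)))  # 2 (run-out)
print(elkan_threshold(1, 4))                             # 0.2
```

With unit costs the rule reduces to picking the most probable class; raising the run-out weight to 5 shifts the decision boundary so that the same pixel is labeled run-out, which is the kind of trade-off between run-out omission and commission errors explored with the cost setting.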

#### 2.2.4. Accuracy Assessment and Mapping

## 3. Results

#### 3.1. Topographic Characteristics

^{2}. The numbers of extracted samples are listed in Table 4. Notably, the number of samples in Table 4 without an area constraint is smaller than the number of polygons in Table 1. This is because the preprocessing step converts the inventory polygons into pixel-based samples, using different sampling strategies to extract the corresponding landslide factors; after the vector-to-raster conversion, some source and run-out samples extracted with the maximum, median, and minimum sampling operators fall outside the study area and therefore have no data. These unavailable samples were discarded. Figure 5a shows the difference in average slope obtained by subtracting the run-out class from the source class. A larger polygon area clearly makes the source and run-out classes easier to distinguish. Moreover, the results of the centroid and median sampling operators vary more with polygon size. To maintain both sufficient distinguishability and a sufficient number of samples, samples were extracted from inventory polygons with an area of at least 1000 m² for further analysis, and samples from smaller polygons were also extracted for comparison. Figure 5b shows that the source and run-out classes are harder to distinguish for small polygons (area < 1000 m²). Under both constraints, the average slope of the source areas is larger than that of the run-out class. Figure 5c presents the standard deviations corresponding to Figure 5b. Together, Figure 5b,c indicate that the run-out class has lower slopes and greater variation, which is reasonable because these samples include landslide trails and deposits and therefore mix different terrain conditions.
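The polygon-to-sample conversion discussed above can be sketched as follows. This is a minimal illustration under explicit assumptions: each inventory polygon is given as a list of raster pixels carrying an elevation value; the maximum, median, and minimum operators are assumed to select the pixel with the highest, median, and lowest elevation; and the centroid operator is approximated by the pixel nearest the mean pixel position. Pixels without data (for example, outside the study area) are dropped, and polygons that yield no valid pixel are ignored, as described above. All names and values are hypothetical.

```python
def sample_polygon(pixels, operator):
    """Pick one representative pixel from a polygon.
    pixels: list of dicts with 'row', 'col', and 'elev' (None = no data).
    operator: 'max', 'med', 'min', or 'centroid' (assumed semantics)."""
    valid = [p for p in pixels if p["elev"] is not None]
    if not valid:
        return None  # e.g., the polygon fell outside the study area
    if operator == "max":
        return max(valid, key=lambda p: p["elev"])
    if operator == "min":
        return min(valid, key=lambda p: p["elev"])
    if operator == "med":
        ordered = sorted(valid, key=lambda p: p["elev"])
        return ordered[(len(ordered) - 1) // 2]  # lower median, an actual pixel
    if operator == "centroid":
        r = sum(p["row"] for p in valid) / len(valid)
        c = sum(p["col"] for p in valid) / len(valid)
        return min(valid, key=lambda p: (p["row"] - r) ** 2 + (p["col"] - c) ** 2)
    raise ValueError(f"unknown operator: {operator}")


def extract_samples(polygons, operator, min_area=1000.0):
    """Apply an area constraint (m^2) and a sampling operator to each
    polygon; polygons that yield no valid pixel are skipped."""
    samples = []
    for poly in polygons:
        if poly["area"] < min_area:
            continue
        pixel = sample_polygon(poly["pixels"], operator)
        if pixel is not None:
            samples.append(pixel)
    return samples


def px(row, col, elev):
    return {"row": row, "col": col, "elev": elev}


# Hypothetical inventory: one large polygon, one small, one without data.
polygons = [
    {"area": 1500.0, "pixels": [px(0, 0, 820.0), px(1, 0, 790.0), px(2, 0, 760.0)]},
    {"area": 400.0, "pixels": [px(5, 5, 900.0)]},
    {"area": 2000.0, "pixels": [px(9, 9, None)]},
]
# Under a Max-Min strategy, 'max' would be applied to source polygons
# and 'min' to run-out polygons.
print(extract_samples(polygons, "max"))  # [{'row': 0, 'col': 0, 'elev': 820.0}]
print(extract_samples(polygons, "min"))  # [{'row': 2, 'col': 0, 'elev': 760.0}]
```

Returning an actual pixel for the median (the lower median) keeps every sample tied to a real raster cell, so its landslide factors can be read directly.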

#### 3.2. Modeling Performance

^{2} are extracted by various sampling operators, is further considered. Figure 7 shows that the Max–Min sampling strategy, in which the maximum and minimum operators are used to extract the source and run-out samples, respectively, outperforms the other combinations when all classes are taken into account (the corresponding confusion matrix is given in Table 5). This finding is consistent with Figure 6, which indicates that the maximum and minimum operators perform better for the source and run-out classes, respectively. Table 5 also shows that RF outperforms logistic regression [25,26,27], an algorithm commonly used in related studies.
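The accuracy measures reported in Table 5 follow from the confusion matrix in the usual way. The sketch below assumes rows are predictions and columns are ground truth (the layout of Table 5) and computes the overall accuracy, Cohen's kappa, UA (per-row ratio), and PA (per-column ratio); applied to the random forests matrix, it reproduces the reported 83.4% and 0.7182. The function name is hypothetical.

```python
def accuracy_report(matrix):
    """matrix[i][j]: number of samples predicted as class i whose
    ground truth is class j."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_tot = [sum(row) for row in matrix]                              # predicted totals
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # reference totals
    diag = [matrix[i][i] for i in range(k)]
    oa = sum(diag) / n
    pe = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2  # chance agreement
    return {
        "oa": oa,
        "kappa": (oa - pe) / (1 - pe),
        "ua": [diag[i] / row_tot[i] for i in range(k)],  # 1 - commission error
        "pa": [diag[i] / col_tot[i] for i in range(k)],  # 1 - omission error
    }


# Random forests result of Table 5 (order: source, run-out, non-landslide).
rf = [[408, 1, 69],
      [1, 101, 25],
      [56, 50, 506]]
report = accuracy_report(rf)
print(round(report["oa"], 3), round(report["kappa"], 4))  # 0.834 0.7182
print([round(v, 3) for v in report["ua"]])  # [0.854, 0.795, 0.827]
print([round(v, 3) for v in report["pa"]])  # [0.877, 0.664, 0.843]
```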

^{2}) to the 2010, 2011, and 2015 landslide records. Table 6 lists the numbers of extracted source and run-out samples. In addition, non-landslide samples were extracted randomly, with their number set equal to the sum of the source and run-out samples.
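The non-landslide sampling rule can be sketched as a simple random draw without replacement. This assumes that candidate pixels are all study-area pixels outside the mapped source and run-out polygons; the grid, the landslide mask, and the function name are hypothetical, and the 2010 counts from Table 6 are used only to size the draw.

```python
import random


def sample_non_landslide(candidates, n_source, n_runout, seed=None):
    """Randomly draw n_source + n_runout distinct non-landslide pixels."""
    k = n_source + n_runout
    if k > len(candidates):
        raise ValueError("not enough non-landslide pixels to sample")
    rng = random.Random(seed)  # seeded generator for reproducible draws
    return rng.sample(candidates, k)


# Hypothetical 100 x 100 grid in which the first 20 rows are mapped landslides.
landslide = {(r, c) for r in range(20) for c in range(100)}
candidates = [(r, c) for r in range(100) for c in range(100)
              if (r, c) not in landslide]

# 2010 record in Table 6: 361 source and 550 run-out samples.
non_landslide = sample_non_landslide(candidates, 361, 550, seed=42)
print(len(non_landslide))  # 911
```

Sampling without replacement guarantees that no pixel is counted twice, and fixing the seed makes a given training set reproducible.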

#### 3.3. Landslide Susceptibility Mapping

## 4. Discussion

## 5. Conclusions

^{2}) with RF can outperform logistic regression, yielding values higher than 80%, 0.7, 0.79, and 0.66 for the overall accuracy, kappa, UA, and PA, respectively. In addition, comparisons that treated the run-out area as part of the landslide or non-landslide class showed that the run-out area should be assigned to the non-landslide class if treating run-out as an independent class (Model-1) does not yield acceptable results. Cost-sensitive analysis was also used to adjust the decision boundary and improve Model-1 performance, increasing the run-out PA by 9%. Verification with later landslide events further indicates that cost-sensitive analysis can improve the UA of the landslide source class by 5% to 10%. On the basis of these analyses, it is suggested that run-out be included as an individual class in landslide inventories to construct more reliable and flexible susceptibility models.

## Author Contributions

## Funding

## Conflicts of Interest

## References

- NDPPC. Disaster Response Disposition Report of Typhoon Morakot; National Disaster Prevention and Protection Commission: New Taipei City, Taiwan, 2009; p. 74.
- Mondini, A.C.; Chang, K.-T.; Yin, H.-Y. Combining multiple change detection indices for mapping landslides triggered by typhoons. Geomorphology **2011**, 134, 440–451.
- Mondini, A.C.; Chang, K.-T. Combining spectral and geoenvironmental information for probabilistic event landslide mapping. Geomorphology **2014**, 213, 183–189.
- Deng, Y.C.; Tsai, F.; Hwang, J.H. Landslide characteristics in the area of Xiaolin Village during Morakot typhoon. Arab. J. Geosci. **2016**, 9, 332.
- Tsai, F.; Hwang, J.-H.; Chen, L.-C.; Lin, T.-H. Post-disaster assessment of landslides in southern Taiwan after 2009 Typhoon Morakot using remote sensing and spatial analysis. Nat. Hazards Earth Syst. Sci. **2010**, 10, 2179–2190.
- Tsou, C.-Y.; Feng, Z.-Y.; Chigira, M. Catastrophic landslide induced by Typhoon Morakot, Shiaolin, Taiwan. Geomorphology **2011**, 127, 166–178.
- Wu, C.-H.; Chen, S.-C.; Chou, H.-T. Geomorphologic characteristics of catastrophic landslides during typhoon Morakot in the Kaoping Watershed, Taiwan. Eng. Geol. **2011**, 123, 13–21.
- Chen, Y.-C.; Chang, K.-T.; Lee, H.-Y.; Chiang, S.-H. Average landslide erosion rate at the watershed scale in southern Taiwan estimated from magnitude and frequency of rainfall. Geomorphology **2015**, 228, 756–764.
- Chang, K.-T.; Chiang, S.-H.; Chen, Y.-C.; Mondini, A.C. Modeling the spatial occurrence of shallow landslides triggered by typhoons. Geomorphology **2014**, 208, 137–148.
- Dai, F.C.; Lee, C.F.; Ngai, Y.Y. Landslide risk assessment and management: An overview. Eng. Geol. **2002**, 64, 65–87.
- Van Westen, C.J.; van Asch, T.W.J.; Soeters, R. Landslide hazard and risk zonation—Why is it still so difficult? Bull. Eng. Geol. Environ. **2006**, 65, 167–184.
- Van Westen, C.J.; Castellanos, E.; Kuriakose, S.L. Spatial data for landslide susceptibility, hazard, and vulnerability assessment: An overview. Eng. Geol. **2008**, 102, 112–131.
- Brabb, E.E. Innovative approaches to landslide hazard mapping. In Proceedings of the 4th International Symposium on Landslides, Toronto, ON, Canada, 16–21 September 1984.
- Clerici, A.; Perego, S.; Tellini, C.; Vescovi, P. A GIS-based automated procedure for landslide susceptibility mapping by the condition analysis method: The Baganza valley case study (Italian Northern Apennines). Environ. Geol. **2006**, 50, 941–961.
- Wan, S. A spatial decision support system for extracting the core factors and thresholds for landslide susceptibility map. Eng. Geol. **2009**, 108, 237–251.
- Yilmaz, I. Comparison of landslide susceptibility mapping methodologies for Koyulhisar, Turkey: Conditional probability, logistic regression, artificial networks, and support vector machine. Environ. Earth Sci. **2010**, 61, 821–836.
- Goetz, J.N.; Guthrie, R.H.; Brenning, A. Integrating physical and empirical landslide susceptibility models using generalized additive models. Geomorphology **2011**, 129, 376–386.
- Kang, K.; Ponomarev, A.; Zerkal, O.; Huang, S.; Lin, Q. Shallow landslide susceptibility mapping in Sochi Ski-Jump area using GIS and numerical modelling. ISPRS Int. J. Geo-Inf. **2019**, 8, 148.
- Park, D.W.; Nikhil, N.V.; Lee, S.R. Landslide and debris flow susceptibility zonation using TRIGRS for the 2011 Seoul landslide event. Nat. Hazards Earth Syst. Sci. **2013**, 13, 2833–2849.
- Van Westen, C.J.; Lulie, G.F. Analyzing the evolution of the Tessina landslide using aerial photographs and digital elevation models. Geomorphology **2003**, 54, 77–89.
- Van Westen, C.J.; Soeters, R.; Sijmons, K. Digital geomorphological landslide hazard mapping of the Alpago area, Italy. Int. J. Appl. Earth Obs. Geoinf. **2000**, 2, 51–59.
- Bai, S.; Lu, G.; Wang, J.; Zhou, P.; Ding, L. GIS-based rare events logistic regression for landslide-susceptibility mapping of Lianyungang, China. Environ. Earth Sci. **2011**, 62, 139–149.
- He, H.; Hu, D.; Sun, Q.; Zhe, L.; Liu, Y. A landslide susceptibility assessment method based on GIS technology and an AHP-weighted information content method: A case study of Southern Anhui, China. ISPRS Int. J. Geo-Inf. **2019**, 8, 266.
- Iovine, G.G.R.; Greco, R.; Gariano, S.L.; Pellegrino, A.D.; Terranova, O.G. Shallow-landslide susceptibility in the Costa Viola mountain ridge (southern Calabria, Italy) with considerations on the role of causal factors. Nat. Hazards **2014**, 73, 111–136.
- Lee, C.-T.; Huang, C.-C.; Lee, J.-F.; Pan, K.-L.; Lin, M.-L.; Dong, J.J. Statistical approach to storm event-induced landslide susceptibility. Nat. Hazards Earth Syst. Sci. **2008**, 8, 941–960.
- Oh, H.-J.; Lee, S. Cross-application used to validate landslide susceptibility maps using a probabilistic model from Korea. Environ. Earth Sci. **2011**, 64, 395–409.
- Su, X.; Chen, J.; Bao, Y.; Han, X.; Zhan, J.; Peng, W. Landslide susceptibility mapping using logistic regression analysis along the Jinsha River and its tributaries close to Derong and Deqin County, Southwestern China. ISPRS Int. J. Geo-Inf. **2018**, 7, 438.
- Nefeslioglu, H.A.; Sezer, E.; Gokceoglu, C.; Bozkir, A.S.; Duman, T.Y. Assessment of landslide susceptibility by decision trees in the metropolitan area of Istanbul, Turkey. Math. Probl. Eng. **2010**, 2010, 901095.
- Saito, H.; Nakayama, D.; Matsuyama, H. Comparison of landslide susceptibility based on a decision-tree model and actual landslide occurrence: The Akaishi Mountains, Japan. Geomorphology **2009**, 109, 108–121.
- Tsai, F.; Lai, J.-S.; Chen, W.W.; Lin, T.-H. Analysis of topographic and vegetative factors with data mining for landslide verification. Ecol. Eng. **2013**, 61, 669–677.
- Gorsevski, P.V.; Jankowski, P. Discerning landslide susceptibility using rough sets. Comput. Environ. Urban Syst. **2008**, 32, 53–65.
- Wan, S.; Lei, T.C.; Chou, T.Y. A novel data mining technique of analysis and classification for landslide problems. Nat. Hazards **2010**, 52, 211–230.
- Su, C.; Wang, L.; Wang, X.; Huang, Z.; Zhang, X. Mapping of rainfall-induced landslide susceptibility in Wencheng, China, using support vector machine. Nat. Hazards **2015**, 76, 1759–1779.
- Choi, J.; Oh, H.-J.; Won, J.-S.; Lee, S. Validation of an artificial neural networks model for landslide susceptibility mapping. Environ. Earth Sci. **2010**, 60, 473–483.
- Kanungo, D.P.; Arora, M.K.; Sarkar, S.; Gupta, R.P. A comparative study of conventional, ANN black box, fuzzy and combined neural and fuzzy weighting procedures for landslide susceptibility zonation in Darjeeling Himalayas. Eng. Geol. **2006**, 85, 347–366.
- Lee, S.; Ryu, J.-H.; Lee, M.-J.; Won, J.-S. The application of artificial neural networks to landslide susceptibility mapping at Janghung Korea. Math. Geol. **2006**, 38, 199–220.
- Melchiorre, C.; Matteucci, M.; Azzoni, A.; Zanchi, A. Artificial neural networks and cluster analysis in landslide susceptibility zonation. Geomorphology **2008**, 94, 379–400.
- Nourani, V.; Pradhan, B.; Ghaffari, H.; Sharifi, S.S. Landslide susceptibility mapping at Zonouz Plain, Iran using genetic programming and comparison with frequency ratio, logistic regression, and artificial neural network models. Nat. Hazards **2014**, 71, 523–547.
- Pradhan, B.; Lee, S. Landslide susceptibility assessment and factor effect analysis: Backpropagation artificial neural networks and their comparison with frequency ratio and bivariate logistic regression modeling. Environ. Model. Softw. **2010**, 25, 747–759.
- Pradhan, B.; Lee, S. Regional landslide susceptibility analysis using back-propagation neural network model at Cameron Highland, Malaysia. Landslides **2010**, 7, 13–30.
- Pradhan, B.; Lee, S.; Buchroithner, M.F. A GIS-based back-propagation neural network model and its cross-application and validation for landslide susceptibility analyses. Comput. Environ. Urban Syst. **2010**, 34, 216–235.
- Song, K.-Y.; Oh, H.-J.; Choi, J.; Park, I.; Lee, C.; Lee, S. Prediction of landslides using ASTER imagery and data mining models. Adv. Space Res. **2012**, 49, 978–993.
- Yilmaz, I. Landslide susceptibility mapping using frequency ratio, logistic regression, artificial neural networks and their comparison: A case study from Kat landslides (Tokat-Turkey). Comput. Geosci. **2009**, 35, 1125–1138.
- Bui, D.T.; Pradhan, B.; Lofman, O.; Revhaug, I.; Dick, O.B. Spatial prediction of landslide hazards in Hoa Binh province (Vietnam): A comparative assessment of the efficacy of evidential belief functions and fuzzy logic models. Catena **2012**, 96, 28–40.
- Gemitzi, A.; Falalakis, G.; Eskioglou, P.; Petalas, C. Evaluating landslide susceptibility using environmental factors, fuzzy membership functions and GIS. Glob. NEST J. **2011**, 13, 28–40.
- Chalkias, C.; Ferentinou, M.; Polykretis, C. GIS supported landslide susceptibility modeling at regional scale: An expert-based fuzzy weighting method. ISPRS Int. J. Geo-Inf. **2014**, 3, 523–539.
- Lee, S. Application and verification of fuzzy algebraic operators to landslide susceptibility mapping. Environ. Geol. **2007**, 52, 615–623.
- Zhu, A.-X.; Wang, R.; Qiao, J.; Qin, C.-Z.; Chen, Y.; Liu, J.; Du, F.; Lin, Y.; Zhu, T. An expert knowledge-based approach to landslide susceptibility mapping using GIS and fuzzy logic. Geomorphology **2014**, 214, 128–138.
- Oh, H.-J.; Pradhan, B. Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallow landslides in a tropical hilly area. Comput. Geosci. **2011**, 37, 1264–1276.
- Pradhan, B.; Sezer, E.A.; Gokceoglu, C.; Buchroithner, M.F. Landslide susceptibility mapping by neuro-fuzzy approach in a landslide-prone area (Cameron Highlands, Malaysia). IEEE Trans. Geosci. Remote Sens. **2010**, 48, 4164–4177.
- Vahidnia, M.H.; Alesheikh, A.A.; Alimohammadi, A.; Hosseinali, F. A GIS-based neuro-fuzzy procedure for integrating knowledge and data in landslide susceptibility mapping. Comput. Geosci. **2010**, 36, 1101–1114.
- Kavzoglu, T.; Sahin, E.K.; Colkesen, I. Selecting optimal conditioning factors in shallow translational landslide susceptibility mapping using genetic algorithm. Eng. Geol. **2015**, 192, 101–112.
- Wan, S. Entropy-based particle swarm optimization with clustering analysis on landslide susceptibility mapping. Environ. Earth Sci. **2013**, 68, 1349–1366.
- Raia, S.; Alvioli, M.; Rossi, M.; Baum, R.L.; Godt, J.W.; Guzzetti, F. Improving predictive power of physically based rainfall-induced shallow landslide models: A probabilistic approach. Geosci. Model Dev. **2014**, 7, 495–514.
- Merghadi, A.; Abderrahmane, B.; Bui, D.T. Landslide susceptibility assessment at Mila Basin (Algeria): A comparative assessment of prediction capability of advanced machine learning methods. ISPRS Int. J. Geo-Inf. **2018**, 7, 268.
- Su, Q.; Zhang, J.; Zhao, S.; Wang, L.; Liu, J.; Guo, J. Comparative assessment of three nonlinear approaches for landslide susceptibility mapping in a coal mine area. ISPRS Int. J. Geo-Inf. **2017**, 6, 228.
- Rossi, M.; Guzzetti, F.; Reichenbach, P.; Mondini, A.C.; Peruccacci, S. Optimal landslide susceptibility zonation based on multiple forecasts. Geomorphology **2010**, 114, 129–142.
- Guzzetti, F. Landslide Hazard and Risk Assessment. Ph.D. Thesis, University of Bonn, Bonn, Germany, 2005.
- Mondini, A.C.; Guzzetti, F.; Reichenbach, P.; Rossi, M.; Cardinali, M.; Ardizzone, F. Semi-automatic recognition and mapping of rainfall induced shallow landslides using optical satellite images. Remote Sens. Environ. **2011**, 115, 1743–1757.
- Mondini, A.C.; Marchesini, I.; Rossi, M.; Chang, K.-T.; Pasquariello, G.; Guzzetti, F. Bayesian framework for mapping and classifying shallow landslides exploiting remote sensing and topographic data. Geomorphology **2013**, 201, 135–147.
- Wang, X.; Niu, R. Spatial forecast of landslides in three gorges based on spatial data mining. Sensors **2009**, 9, 2035–2061.
- Aksoy, B.; Ercanoglu, M. Landslide identification and classification by object-based image analysis and fuzzy logic: An example from the Azdavay region (Kastamonu, Turkey). Comput. Geosci. **2012**, 38, 87–98.
- Dou, J.; Chang, K.-T.; Chen, S.; Yunus, A.P.; Liu, J.-K.; Xia, H.; Zhu, Z. Automatic case-based reasoning approach for landslide detection: Integration of object-oriented image analysis and a genetic algorithm. Remote Sens. **2015**, 7, 4318–4342.
- Stumpf, A.; Kerle, N. Object-oriented mapping of landslides using random forests. Remote Sens. Environ. **2011**, 115, 2564–2577.
- Wang, X.; Niu, R. Landslide intelligent prediction using object-oriented method. Soil Dyn. Earthq. Eng. **2010**, 30, 1478–1486.
- Guzzetti, F.; Mondini, A.C.; Cardinali, M.; Fiorucci, F.; Santangelo, M.; Chang, K.T. Landslide inventory maps: New tools for an old problem. Earth Sci. Rev. **2012**, 112, 42–66.
- Petschko, H.; Brenning, A.; Bell, R.; Goetz, J.; Glade, T. Assessing the quality of landslide susceptibility maps—Case study Lower Austria. Nat. Hazards Earth Syst. Sci. **2014**, 14, 95–118.
- Ardizzone, F.; Cardinali, M.; Carrara, A.; Guzzetti, F.; Reichenbach, P. Impact of mapping error on the reliability of landslide hazard maps. Nat. Hazards Earth Syst. Sci. **2002**, 2, 3–14.
- Elith, J.; Burgman, M.A.; Regan, H.M. Mapping epistemic uncertainties and vague concepts in predictions of species distribution. Ecol. Model. **2002**, 157, 313–329.
- Mosleh, A. Hidden sources of uncertainty: Judgment in the collection and analysis of data. Nucl. Eng. Des. **1986**, 93, 187–198.
- Brenning, A. Spatial prediction models for landslide hazards: Review, comparison and evaluation. Nat. Hazards Earth Syst. Sci. **2005**, 5, 853–862.
- Guzzetti, F.; Reichenbach, P.; Ardizzone, F.; Cardinali, M.; Galli, M. Estimating the quality of landslide susceptibility models. Geomorphology **2006**, 81, 166–184.
- Wang, H.; Liu, G.; Xu, W.; Wang, G. GIS-based landslide hazard assessment: An overview. Prog. Phys. Geogr. **2005**, 29, 548–567.
- Dai, F.C.; Lee, C.F. Landslide characteristics and slope instability modeling using GIS, Lantau Island, Hong Kong. Geomorphology **2002**, 42, 213–228.
- Hussin, H.Y.; Zumpano, V.; Reichenbach, P.; Sterlacchini, S.; Micu, M.; van Westen, C.; Balteanu, B. Different landslide sampling strategies in a grid-based bi-variate statistical susceptibility model. Geomorphology **2016**, 253, 508–523.
- Dou, J.; Paudel, U.; Oguchi, T.; Uchiyama, S.; Hayakawa, Y.S. Shallow and deep-seated landslide differentiation using support vector machines: A case study of the Chuetsu Area, Japan. Terr. Atmos. Ocean. Sci. **2015**, 26, 227–239.
- Yilmaz, I. The effect of the sampling strategies on the landslide susceptibility mapping by conditional probability and artificial neural networks. Environ. Earth Sci. **2010**, 60, 505–519.
- Suzen, M.L.; Doyuran, V. Data driven bivariate landslide susceptibility assessment using geographical information systems: A method and application to Asarsuyu catchment, Turkey. Eng. Geol. **2004**, 71, 303–321.
- Poli, S.; Sterlacchini, S. Landslide representation strategies in susceptibility studies using weights-of-evidence modelling technique. Nat. Resour. Res. **2007**, 16, 121–134.
- Simon, N.; Crozier, M.; de Moiste, M.; Rafek, A.G. Point based assessment: Selecting the best way to represent landslide polygon as point frequency in landslide investigation. Electron. J. Geotech. Eng. **2013**, 18, 775–784.
- Belgiu, M.; Dragut, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. **2016**, 114, 24–31.
- Du, P.; Samat, A.; Waske, B.; Liu, S.; Li, Z. Random forest and rotation forest for fully polarized SAR image classification using polarimetric and spatial features. ISPRS J. Photogramm. Remote Sens. **2015**, 105, 38–53.
- Rodriguez-Galiano, V.F.; Ghimire, B.; Rogan, J.; Chica-Olmo, M.; Rigol-Sanchez, J.P. An assessment of the effectiveness of a random forest classifier for land-cover classification. ISPRS J. Photogramm. Remote Sens. **2012**, 67, 93–104.
- Chan, J.C.; Beckers, P.; Spanhove, T.; Borre, J.V. An evaluation of ensemble classifiers for mapping Natura 2000 heathland in Belgium using spaceborne angular hyperspectral (CHRIS/Proba) imagery. Int. J. Appl. Earth Obs. Geoinf. **2012**, 18, 13–22.
- Shao, Y.; Campbell, J.B.; Taff, G.N.; Zheng, B. An analysis of cropland mask choice and ancillary data for annual corn yield forecasting using MODIS data. Int. J. Appl. Earth Obs. Geoinf. **2015**, 38, 78–87.
- Abdel-Rahman, E.M.; Mutanga, O.; Adam, E.; Ismail, R. Detecting Sirex noctilio grey-attacked and lightning-struck pine trees using airborne hyperspectral data, random forest and support vector machines classifiers. ISPRS J. Photogramm. Remote Sens. **2014**, 88, 45–59.
- Shang, X.; Chisholm, L.A. Classification of Australian native forest species using hyperspectral remote sensing and machine-learning classification algorithms. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. **2014**, 7, 2481–2489.
- Mellor, A.; Boukir, S.; Haywood, A.; Jones, S. Exploring issues of training data imbalance and mislabeling on random forest performance for large area land cover classification using the ensemble margin. ISPRS J. Photogramm. Remote Sens. **2015**, 105, 155–168.
- Lagomarsino, D.; Tofani, V.; Segoni, S.; Catani, F.; Casagli, N. A tool for classification and regression using random forest methodology: Applications to landslide susceptibility mapping and soil thickness modeling. Environ. Model. Assess. **2017**, 22, 201–214.
- Desai, A.; Jadav, P.M. An empirical evaluation of adaboost extensions for cost-sensitive classification. Int. J. Comput. Appl. **2012**, 44, 34–41.
- Tsai, F.; Lai, J.-S.; Lu, Y.-H. Land-cover classification of full-waveform LiDAR point cloud with volumetric texture measures. Terr. Atmos. Ocean. Sci. **2016**, 27, 549–563.
- Witten, I.H.; Frank, E.; Hall, M.A. Data Mining: Practical Machine Learning Tools and Techniques, 3rd ed.; Morgan Kaufmann: San Francisco, CA, USA, 2011.
- Highland, L.M.; Bobrowsky, P. The Landslide Handbook—A Guide to Understanding Landslides; U.S. Geological Survey Circular: Reston, VA, USA, 2008; Volume 1325, p. 129.
- Chiang, S.-H.; Chang, K.-T.; Mondini, A.C.; Tsai, B.-W.; Chen, C.-Y. Simulation of event-based landslides and debris flows at watershed level. Geomorphology **2012**, 138, 306–318.
- Chang, K.-T.; Chiang, S.-H.; Hsu, M.-L. Modeling typhoon- and earthquake-induced landslides in a mountainous watershed using logistic regression. Geomorphology **2007**, 89, 335–347.
- Wang, L.-J.; Sawada, K.; Moriguchi, S. Landslide susceptibility analysis with logistic regression model based on FCM sampling strategy. Comput. Geosci. **2013**, 57, 81–92.
- Yilmaz, C.; Topal, T.; Süzen, M.L. GIS-based landslide susceptibility mapping using bivariate statistical analysis in Devrek (Zonguldak-Turkey). Environ. Earth Sci. **2012**, 65, 2161–2178.
- Breiman, L. Random forests. Mach. Learn. **2001**, 45, 5–32.
- Elkan, C. The foundations of cost-sensitive learning. In Proceedings of the 17th International Joint Conference on Artificial Intelligence, Seattle, WA, USA, 4–10 August 2001.
- Heckmann, T.; Gegg, K.; Gegg, A.; Becht, M. Sample size matters: Investigating the effect of sample size on a logistic regression susceptibility model for debris flows. Nat. Hazards Earth Syst. Sci. **2014**, 14, 259–278.

**Figure 2.** Landslide factors used in this study: (**a**) digital elevation model, (**b**) geological formations, (**c**) soil types, (**d**) line features, and (**e**) normalized difference vegetation index (NDVI) image.

**Figure 4.** Sampling strategy procedure used to extract landslide causative factors to prepare the training and check datasets; Max, Med, and Min represent the maximum, median, and minimum operators, respectively.

**Figure 5.** Slope distributions of the source and run-out areas using different sampling operators: (**a**) difference in average slope obtained by subtracting the run-out area from the source area under different area constraints, (**b**) average slope for polygons with an area ≥1000 m² and <1000 m², and (**c**) standard deviation of (b).

**Figure 6.** Quantitative evaluations of Model-1 using different sampling operators, with and without the area-size constraint. UA: user’s accuracy; PA: producer’s accuracy; N: non-landslide class; L: landslide source class; R: run-out class.

**Figure 7.** Quantitative evaluations of Model-1 using the hybrid sampling strategies and the area constraint (≥1000 m²). For example, Centroid–Max indicates that the source and run-out samples are extracted by the centroid and maximum operators, respectively.

**Figure 8.** Quantitative evaluations of treating the run-out area as the landslide source (Model-2) and non-landslide (Model-3) classes, compared with Model-1 using the Max–Min sampling strategy.

**Figure 9.** Quantitative evaluations of Model-1 using different cost settings to balance the run-out’s omission and commission errors. C: cost weight in the cost matrix; OA: overall accuracy; Kappa: kappa coefficient.

**Figure 10.** Generated landslide susceptibility maps: (**a**) Model-1 with cost setting (C = 5), (**b**) Model-2, (**c**) Model-3, (**d**) difference obtained by subtracting (c) from (a), (**e**) zoomed-in Model-2 result, and (**f**) zoomed-in Model-3 result.

| | Landslide Source | Run-Out |
|---|---|---|
| Number of polygons | 5336 | 1080 |
| Area of largest polygon (m²) | 250,493 | 385,381 |
| Area of smallest polygon (m²) | 10 | 13 |
| Total polygon area (m²) | 10,389,920 | 6,349,667 |
| Mean polygon area (m²) | ~1947 | ~5879 |
| Standard deviation of polygon area (m²) | ~8066 | ~23,355 |

| Original Data | Used Factor (Raster Format) |
|---|---|
| DEM | Aspect, curvature, elevation, slope |
| Geology map | Geology |
| Soil map | Soil |
| Fault map | Distance to fault |
| River map | Distance to river |
| Road map | Distance to road |
| Satellite imagery | NDVI |

| Scheme | Number of Classes | Description |
|---|---|---|
| Model-1 | 3 | Run-out is an individual class |
| Model-2 | 2 | Run-out belongs to the landslide source class |
| Model-3 | 2 | Run-out belongs to the non-landslide class |

| Constraint on Area Size | Source | Run-Out |
|---|---|---|
| ≥1000 m² | 1345 | 445 |
| ≥750 m² | 1638 | 536 |
| ≥500 m² | 2078 | 638 |
| ≥250 m² | 2943 | 821 |
| ≥100 m² | 4049 | 981 |
| No constraint | 4656 | 1020 |

**Table 5.** Confusion matrix of Model-1 using the Max–Min sampling strategy without the cost setting (rows: prediction results; columns: ground truth).

(a) Random Forests

| Prediction | Source | Run-Out | Non-Landslide | UA |
|---|---|---|---|---|
| Source | 408 | 1 | 69 | 0.854 |
| Run-out | 1 | 101 | 25 | 0.795 |
| Non-landslide | 56 | 50 | 506 | 0.827 |
| PA | 0.877 | 0.664 | 0.843 | |

Overall Accuracy = 83.4%; Kappa = 0.7182

(b) Logistic Regression

| Prediction | Source | Run-Out | Non-Landslide | UA |
|---|---|---|---|---|
| Source | 361 | 1 | 81 | 0.815 |
| Run-out | 1 | 97 | 43 | 0.688 |
| Non-landslide | 103 | 54 | 476 | 0.752 |
| PA | 0.776 | 0.638 | 0.793 | |

Overall Accuracy = 76.7%; Kappa = 0.6059

| Year | Source | Run-Out |
|---|---|---|
| 2010 | 361 | 550 |
| 2011 | 296 | 520 |
| 2015 | 205 | 85 |

**Table 7.** Prediction of Model-1 using the later-record samples. AUC: area under the receiver operating characteristic curve.

| Year | Non-Landslide UA | Non-Landslide PA | Non-Landslide AUC | Source UA | Source PA | Source AUC | Run-Out UA | Run-Out PA | Run-Out AUC |
|---|---|---|---|---|---|---|---|---|---|
| 2010 | 0.654 | 0.698 | 0.721 | 0.613 | 0.839 | 0.932 | 0.697 | 0.451 | 0.814 |
| 2011 | 0.652 | 0.688 | 0.717 | 0.567 | 0.855 | 0.92 | 0.706 | 0.442 | 0.795 |
| 2015 | 0.819 | 0.734 | 0.857 | 0.732 | 0.878 | 0.909 | 0.649 | 0.565 | 0.843 |

(a) Without cost setting

| Year | Model | AUC | Non-Landslide UA | Non-Landslide PA | Source UA | Source PA |
|---|---|---|---|---|---|---|
| 2010 | Model-2 | 0.869 | 0.917 | 0.729 | 0.549 | 0.834 |
| 2010 | Model-3 | 0.912 | 0.935 | 0.822 | 0.656 | 0.856 |
| 2011 | Model-2 | 0.871 | 0.929 | 0.708 | 0.514 | 0.851 |
| 2011 | Model-3 | 0.897 | 0.936 | 0.801 | 0.608 | 0.848 |
| 2015 | Model-2 | 0.881 | 0.9 | 0.748 | 0.713 | 0.883 |
| 2015 | Model-3 | 0.908 | 0.912 | 0.824 | 0.781 | 0.888 |

(b) With cost setting

| Year | Model | AUC | Non-Landslide UA | Non-Landslide PA | Source UA | Source PA |
|---|---|---|---|---|---|---|
| 2010 | Model-2 | 0.869 | 0.898 | 0.84 | 0.652 | 0.759 |
| 2010 | Model-3 | 0.905 | 0.902 | 0.885 | 0.722 | 0.756 |
| 2011 | Model-2 | 0.868 | 0.904 | 0.82 | 0.605 | 0.76 |
| 2011 | Model-3 | 0.891 | 0.907 | 0.865 | 0.671 | 0.757 |

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Lai, J.-S.; Chiang, S.-H.; Tsai, F. Exploring Influence of Sampling Strategies on Event-Based Landslide Susceptibility Modeling. *ISPRS Int. J. Geo-Inf.* **2019**, *8*, 397.
https://doi.org/10.3390/ijgi8090397

**AMA Style**

Lai J-S, Chiang S-H, Tsai F. Exploring Influence of Sampling Strategies on Event-Based Landslide Susceptibility Modeling. *ISPRS International Journal of Geo-Information*. 2019; 8(9):397.
https://doi.org/10.3390/ijgi8090397

**Chicago/Turabian Style**

Lai, Jhe-Syuan, Shou-Hao Chiang, and Fuan Tsai. 2019. "Exploring Influence of Sampling Strategies on Event-Based Landslide Susceptibility Modeling" *ISPRS International Journal of Geo-Information* 8, no. 9: 397.
https://doi.org/10.3390/ijgi8090397