On the Importance of Train–Test Split Ratio of Datasets in Automatic Landslide Detection by Supervised Classification
Institute of Geodesy and Geoinformatics, Wroclaw University of Environmental and Life Sciences, 50-375 Wroclaw, Poland
* Author to whom correspondence should be addressed.
Remote Sens. 2020, 12(18), 3054; https://doi.org/10.3390/rs12183054
Received: 31 July 2020 / Revised: 14 September 2020 / Accepted: 15 September 2020 / Published: 18 September 2020
(This article belongs to the Special Issue Remote Sensing of Natural Hazards)
Many automatic landslide detection algorithms are based on supervised classification of various remote sensing (RS) data, particularly satellite images and digital elevation models (DEMs) derived from Light Detection and Ranging (LiDAR). Machine learning methods require both training and testing data to produce and evaluate classification results. Collecting good-quality landslide ground truth to train classifiers and detect landslides in other regions is challenging and has a significant impact on classification accuracy. This raises the following research question: What is the appropriate training–testing dataset split ratio in supervised classification to effectively detect landslides in a testing area based on DEMs? We investigated this question for both the pixel-based approach (PBA) and object-based image analysis (OBIA), using random forest (RF) classification in both. The experiments were performed in the vicinity of Rożnów Lake in the Outer Carpathians, the most landslide-affected area in Poland. Based on the accuracy assessment, we found that the training area should be similar in size to the testing area. We also found that OBIA performs slightly better than PBA when the number of training samples is significantly lower than the number of testing samples. To increase detection performance, we combined the intersection of the OBIA and PBA results with median filtering and the removal of small elongated objects. This yielded an overall accuracy (OA) of 80% and an F1 score of 0.50. The results are compared with those of other landslide detection studies and discussed.
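The post-processing and accuracy measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 3x3 filter window, the edge handling, and the order of operations are assumptions made for illustration, and the removal of small elongated objects is omitted. The confusion counts at the end are hypothetical, chosen only to show how an OA of 0.80 can coexist with an F1 score of 0.50 when landslide cells are a minority class.

```python
from typing import List, Tuple

Mask = List[List[int]]  # 1 = landslide, 0 = non-landslide (assumed encoding)


def intersect(a: Mask, b: Mask) -> Mask:
    """Keep a cell only where both classifications flag a landslide."""
    return [[x & y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def median_filter_3x3(mask: Mask) -> Mask:
    """Binary 3x3 median filter; border cells reuse the nearest valid cell."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), rows - 1)
                    cc = min(max(c + dc, 0), cols - 1)
                    total += mask[rr][cc]
            # The median of nine 0/1 values is 1 when at least five are 1.
            out[r][c] = 1 if total >= 5 else 0
    return out


def confusion(pred: Mask, truth: Mask) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) with landslide as the positive class."""
    tp = fp = fn = tn = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def overall_accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Share of all cells that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the landslide class."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical counts (NOT taken from the study): 20 landslide cells out of 100.
example_oa = overall_accuracy(tp=10, fp=10, fn=10, tn=70)
example_f1 = f1_score(tp=10, fp=10, fn=10)
```

Because OA counts correctly classified non-landslide cells while F1 does not, the two measures can diverge considerably on imbalanced data, which is one reason both are worth reporting together.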
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
MDPI and ACS Style
Pawluszek-Filipiak, K.; Borkowski, A. On the Importance of Train–Test Split Ratio of Datasets in Automatic Landslide Detection by Supervised Classification. Remote Sens. 2020, 12, 3054. https://doi.org/10.3390/rs12183054
AMA Style
Pawluszek-Filipiak K, Borkowski A. On the Importance of Train–Test Split Ratio of Datasets in Automatic Landslide Detection by Supervised Classification. Remote Sensing. 2020; 12(18):3054. https://doi.org/10.3390/rs12183054
Chicago/Turabian Style
Pawluszek-Filipiak, Kamila; Borkowski, Andrzej. 2020. "On the Importance of Train–Test Split Ratio of Datasets in Automatic Landslide Detection by Supervised Classification" Remote Sens. 12, no. 18: 3054. https://doi.org/10.3390/rs12183054