Evaluation of Different Machine Learning Methods and Deep-Learning Convolutional Neural Networks for Landslide Detection

There is a growing demand for detailed and accurate landslide maps and inventories around the globe, but particularly in hazard-prone regions such as the Himalayas. Most standard mapping methods require expert knowledge, supervision and fieldwork. In this study, we use optical data from the RapidEye satellite and topographic factors to analyze the potential of machine learning methods, i


Introduction
Mass movements such as landslides are a major natural hazard in mountainous regions all over the world [1]. Although landslides mostly occur locally, they can cause extensive damage to natural and human infrastructures at different scales in hilly and mountainous areas [2]. Along with the physical damage, landslides have a direct long-term economic and social impact on an area of human habitation [3]. Several studies have investigated mitigation strategies, and landslide susceptibility mapping is increasingly applied [4][5][6][7][8]. Hazard and risk mapping generally aim to identify and spatially delineate hazard-prone areas, while analyzing the potential risk from that in a targeted study [9]. To carry out a risk analysis, it is necessary to analyze the occurrence, characteristics, and impacts of past hazard events, to relate them to the present situation and to generate predictions for the future. Several studies have been carried out to this end with various knowledge-based methods [10,11], with some also including several machine learning (ML) methods [12][13][14][15][16].
Almost all of the ML methods that have been used to analyze the potential risk of landslides heavily depend on an inventory data set of the spatial extent of known landslides or at least one characterizing GPS point location per known landslide in the target study area. ML methods require such data sets for both training and validation steps [17]. Although some knowledge-based methods are independent of the existence of landslide inventory data sets for the generation of hazard and risk maps, the resulting maps require accuracy assessment and sensitivity analysis steps and, therefore, need accurate inventory data sets [18]. Thus, it is crucial to have access to an accurate landslide inventory data set at all stages when monitoring, modeling and mapping landslide risk. Moreover, landslides often trigger emergency situations if they occur in the vicinity of human habitations or infrastructures, e.g., power lines, roads, bridges and settlement areas, which means there is often time pressure to detect and delineate landslides affecting certain areas to carry out tasks such as timely support planning and crisis responses [19,20]. Although some advanced field surveying techniques exist, e.g., laser rangefinder binoculars along with a GPS receiver [21], gaining access to the affected areas and conducting field surveys are too difficult or dangerous in most cases [22]. Thus, Earth observation (EO) data, including very high resolution (VHR) images, are widely considered as the most accessible data providing critical up-to-date information necessary for supporting humanitarian response [23].
Analysis and classification of EO data for extracting mass movements, land displacements and landslides have a long history in the remote sensing domain. We may distinguish two main approaches for the classification and information extraction from satellite images, namely object-based and pixel-based. Image analysis based on objects has become more widespread [24], yet pixel-based approaches still dominate. Both object-based and pixel-based image analysis have been integrated with different ML methods and used in various applications [25]. Generally, ML methods are considered effective methods for remote sensing applications with emphasis on image classification and object recognition [26]. Different ML methods and classifiers have already been used in landslide detection studies. Recurrent neural network (RNN) and multilayer perceptron neural network (MLP-NN) methods typically require an input data set from sources like orthophotos or LiDAR-derived data. This also applies to the use of textural features for landslide detection as in the study by Mezaal et al. [27]. A maximum likelihood ratio (MLR) and an artificial neural network (ANN) were compared as automatic landslide detection methods using multi-spectral advanced spaceborne thermal emission and reflection radiometer (ASTER) images by Danneels et al. [28]. In another study, Bui et al. [29] used five different ML methods for landslide prediction models in the case study of the Son La hydropower basin in Vietnam, namely support vector machines (SVM), multi-layer perceptron neural networks (MLP Neural Nets), radial basis function neural networks (RBF Neural Nets), kernel logistic regression (KLR), and logistic model trees (LMT). The authors tested the mentioned ML methods at the pixel level. For this case study, MLP Neural Nets and SVM achieved the best overall accuracies. Some other studies integrate ML methods with mathematical theories to address their limitations and to improve the overall accuracy of landslide detection. Mezaal et al. [9] optimized the performance of ML methods in landslide detection by using Dempster-Shafer theory (DST) based on the probabilistic output from object-based SVM, K-nearest neighbor (KNN) and RF methods.
More recently, deep-learning methods and, above all, convolutional neural networks (CNNs) have achieved fairly good results in various image analysis tasks in computer vision [30,31]. CNNs have also been used for VHR image classification and segmentation [32,33], semantic segmentation [34], scene annotation using different high spatial resolution and aerial images [35], and object detection [36]. The majority of studies involving CNNs aim to detect objects and, particularly, search for distinct objects such as vehicles, roads and airplanes [37]. Well-known examples are the community-based crowd-sourced approaches that use a large number of labeled images for object recognition [38], such as classifying whether an image contains a cat or a dog. The prerequisite for applying CNN methods is a training data set of labeled data. Such a supervised learning approach usually achieves good results if enough training samples exist [39]. CNNs apply supervised machine learning, where lots of labeled data are used to push learnable, i.e., adaptive, feature extractor filters to minimize a loss function [40]. The performance of CNNs strongly depends on the availability of larger training datasets, optimized network architectures and faster GPUs [41].
While CNNs have reached good accuracies for object recognition in aerial images, only a few studies exist that use deep-learning methods and CNNs for landslide detection. Yu et al. [42] used a CNN together with an improved region growing algorithm (RSG_R) for landslide detection. They trained their CNN method on a set of landslide images, extracted the discriminant information such as area and boundary of the landslides with the RSG_R algorithm, and reported a high detection accuracy for identifying landslide characteristics. Ding et al. [43] evaluated their CNN method for landslide detection from GF-1 images with four spectral bands and 8 m spatial resolution for Shenzhen, China. Their automated landslide detection method achieved a landslide detection rate of 72.5%, a false positives rate of 10.2%, and an overall accuracy of 67%. This literature review shows that our study is not the first to use CNNs for landslide detection, but that the potential of CNNs in this field has not been fully explored yet. In the remainder of this article, we apply CNN methods to landslide detection based on optical satellite imagery from the RapidEye sensor. We compare the results from the CNN methods with those from three state-of-the-art ML methods, namely ANN, SVM and RF. By using the spectral information of the RapidEye images both separately and along with topographic factors, we illustrate the performance of each method and the impact of the spectral and topographic factors on landslide detection. The areas identified as having landslide occurrences are then validated using common remote sensing and GIS validation metrics and the mean intersection-over-union (mIOU) validation method from computer vision.

Overview of the Study Area
Our case study area lies in the southern part of the Rasuwa district in Nepal along a highway that connects Nepal to China (see Figure 1). The district has an area of about 1544 km². The elevation in the region ranges from 734 m a.s.l. to 4050 m a.s.l. The land cover is predominantly forest, followed by shrub land and agriculture, grassland and rural inhabited areas. The study area is located in the higher Himalayas and is considered to be one of the most landslide-prone regions along the Trishuli River, with mainly quartzite, schist, and gneiss rock formations. The Main Central Thrust (MCT) lies near the boundary of the Rasuwa district. The MCT is a subduction zone where the Asian and Indian plates collide, making this an earthquake-prone area. The climate is sub-tropical and humid with cooler temperatures in higher areas, particularly in the mid hills zone above 2500 m. Orographic monsoon precipitation brings rain to the area, and the annual average rainfall is 691 mm, most of which occurs during the monsoon season. Some of the known landslides affect built-up areas and have already caused casualties in several villages. Some households were entirely abandoned, and people are forced to live in shelters provided by non-governmental organizations. Agricultural land is severely affected by landslide occurrences. Moreover, landslides have dammed the Trishuli River at several locations and have caused damage to hydropower plant project sites that are planned in the study area. Landslides have also affected the bridges and roads of the main transport corridor between Nepal and China.
Several studies have been carried out analyzing landslide hazard and risk for this case study area, some of which are presented in Table 1. Several of them used GNSS-based field survey data for landslide mapping, to carry out hazard and risk analysis and runoff assessment. However, field surveys are very challenging as most of the villages have no road connection, and many landslide-affected areas are difficult to reach. It is often dangerous to trek along the steep hillsides. Further studies have been carried out by governmental organizations or private companies, but such reports were not accessible for our study.

Overall Methodology
The spectral information from RapidEye images was used in combination with some conditioning topographic factors to evaluate the performance of three ML methods, namely ANN, SVM and RF, and of differently structured CNNs at the pixel level for the detection of landslide-affected areas. The workflow of this study is as follows:

• Designing two different training data sets: (a) spectral information only, and (b) spectral information combined with topographic factors.
• Applying the ANN, SVM and RF methods for landslide detection based on both training data sets and validating their performance for the study area.
• Generating CNN sample patches with multiple window sizes, from small to large.
• Developing a data augmentation approach for increasing the number of training samples used for the CNNs.
• Structuring CNNs with different layer depths with regard to the range of input window sizes to determine the most efficient CNN setting.
• Testing and validating the performance of each method by using multiple parameters.

The descriptions and the experimental results of this workflow are organized in the following sections. Further explanations and discussions regarding the impact of using different input data sets and methods can be found in the conclusion section.

Data
RapidEye is a constellation of five Earth-observing satellites that deliver sun-synchronous imagery with 5 m spatial resolution. The height of their orbits is 680 km, and the swath width is 77 km with a 5-day revisit period [47]. Three cloud-free RapidEye satellite images from 28 November 2015 were acquired to cover the study area. We used four out of the five spectral bands of RapidEye, namely blue (440-510 nm), green (520-590 nm), red (630-685 nm), and near-infrared (760-850 nm). In addition, the normalized difference vegetation index (NDVI), which is a very common and widely used ratio [48], was calculated from the near-infrared and the red spectral bands.
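The NDVI mentioned above follows the standard definition (NIR − Red) / (NIR + Red). The following is a minimal sketch of that calculation, assuming the two bands are available as equally sized 2-D lists of reflectance values; the function name `ndvi` and the sample values are illustrative and not taken from the study.

```python
def ndvi(nir, red):
    """Return the NDVI for two equally sized 2-D band arrays (lists of rows)."""
    out = []
    for nir_row, red_row in zip(nir, red):
        row = []
        for n, r in zip(nir_row, red_row):
            total = n + r
            # Guard against division by zero, e.g. for no-data pixels.
            row.append((n - r) / total if total != 0 else 0.0)
        out.append(row)
    return out


# Made-up reflectance values for a 2 x 2 image.
nir_band = [[0.50, 0.40], [0.30, 0.0]]
red_band = [[0.10, 0.40], [0.10, 0.0]]
result = ndvi(nir_band, red_band)
```

Values close to 1 indicate dense vegetation, while bare surfaces such as fresh landslide scars tend toward values near zero.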
Surface topography is considered to be one of the main factors for ground instabilities in hilly and mountainous areas. Steep slopes usually bear the highest probability for landslides, although other factors, such as the geology, can play an even more important role. Within our study area, the geology does not vary significantly and, therefore, topographic factors such as slope and aspect play the key role in the occurrence of landslides. Topography also influences other factors such as wind, sunlight and precipitation [1]. Plan curvature is a particular topographic factor with relevance to landslides that can easily be derived from a digital elevation model (DEM). Earlier studies revealed that hillsides with planar plan curvature are generally more prone to landslides [49]. Although more conditioning topographical factors are used in the literature, we consider slope, aspect and plan curvature to be the most important ones and want to limit the complexity of the training data set for the different methods tested (see Figure 2). All these topographical factors were derived from a 5 m resolution DEM from 2016 acquired from the ALOS sensor of the Japan Aerospace Exploration Agency (JAXA).
An extensive field survey to locate landslide areas in the Rasuwa district in the higher Himalayas was carried out over two months in the summer of 2018 using a GPS device (Garmin eTrex 20x) (see Figure 3). The GPS polygons that refer to the landslide areas were later manually enhanced using the satellite images for visual inspection and plausibility checks. The final landslide inventory data set was created using the GPS data from the field survey, correcting or deleting instances, and eventually adding landslide areas clearly visible in the image but not mapped in the field. This was done in the Geographic Information System ArcGIS 10.3. This integration results in a precise and reliable landslide inventory, while eliminating false positives that occurred in several previous studies when solely using remote sensing-based approaches.
As mentioned, we used two different data sets. We first trained our methods using the four original spectral bands plus the NDVI. Then, the topographic factors slope, aspect and plan curvature were added to both the training and testing data sets.
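To illustrate how a topographic factor such as slope can be derived from a DEM, the sketch below computes slope in degrees with simple central differences on a grid with 5 m cells. This is an assumption for illustration only: the text does not state which algorithm or GIS tool was used, and common GIS tools typically use a larger 3 × 3 stencil. The function name `slope_degrees` and the sample DEM are hypothetical.

```python
import math


def slope_degrees(dem, cell_size=5.0):
    """Slope in degrees for the interior cells of a DEM (central differences).

    Border cells are left as None because they lack a full neighborhood.
    """
    rows, cols = len(dem), len(dem[0])
    slope = [[None] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Elevation change per meter in the column and row directions.
            dz_dx = (dem[i][j + 1] - dem[i][j - 1]) / (2 * cell_size)
            dz_dy = (dem[i + 1][j] - dem[i - 1][j]) / (2 * cell_size)
            slope[i][j] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return slope


# Made-up DEM: a plane rising 5 m per 5 m cell from left to right.
dem = [[0, 5, 10], [0, 5, 10], [0, 5, 10]]
slope = slope_degrees(dem)
```

For the sample plane, the rise equals the run, so the central cell has a slope of 45 degrees.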

Random Forest (RF)
The RF method is based on multiple decision trees. It was introduced by [50] and has been used in a wide range of remote sensing applications [51]. As the RF method uses the training data set to create multiple deep decision trees, it is less sensitive to the over-fitting problem caused by complex datasets than a single decision tree. Each decision tree of the RF predicts an output, and each output is weighted by the number of votes that it receives. The majority voting on an output and a degree of convergence in fitting result in the final classification [52]. The cited literature and others report that RF yields good results for satellite image classification. Therefore, RF is considered to be one of the most effective non-parametric ensemble learning methods in image analysis [53], and RF was chosen as an ML method for landslide detection in our study. For training the method, 3500 random points were prepared from the landslide polygons in both training zones using the random point tool in ArcMap 10.3. To avoid sampling the same pixel more than once, a minimum distance of 5 m between the generated points was enforced, corresponding to the spatial resolution of RapidEye. We used 100 trees with a single randomly selected split variable to grow the trees.
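The voting step described above can be sketched as follows. This is a minimal illustration of majority voting only, not of tree construction; the function name `majority_vote` and the vote counts are made up, with 100 votes chosen to match the 100 trees used in the study.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the label predicted by most trees and its share of the votes."""
    counts = Counter(tree_predictions)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(tree_predictions)


# Hypothetical predictions of 100 trees for a single pixel.
votes = ["landslide"] * 63 + ["non-landslide"] * 37
label, share = majority_vote(votes)
```

The vote share can be read as a simple confidence value for the pixel, which is one way to interpret the weighting of outputs by votes.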

Support Vector Machines (SVM)
The SVM [54] is a machine learning method that maps the dataset of the problem into a higher-dimensional space through a non-linear transformation, where an optimal hyperplane is created for separating the dataset features. The optimal hyperplane is found when the separating margins between the defined classes are maximal [55]. The training samples that lie closest to the hyperplane and define these margins are called support vectors. The SVM method has been used for data classification and regression analysis in several domains. It has also been used for landslide detection [29]. The resulting SVM classifications are affected by the choice of the kernel function (e.g., polynomial, sigmoid, and radial basis function (RBF)) [9]. In our study, we applied the widely used RBF kernel [55], and a gamma parameter (γ) of 0.9 was found to yield the most accurate landslide detections. Both training zones were used for training the SVM with the same data set used for the RF method.
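As a brief illustration of the role of γ, the sketch below evaluates the RBF kernel in its common form K(x, y) = exp(−γ‖x − y‖²) with γ = 0.9. It is an assumption that the software used in the study defines γ in exactly this way; the function name `rbf_kernel` and the feature vectors are made up.

```python
import math


def rbf_kernel(x, y, gamma=0.9):
    """RBF kernel value exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Made-up two-feature samples (e.g. two normalized band values).
same = rbf_kernel([0.2, 0.5], [0.2, 0.5])
near = rbf_kernel([0.2, 0.5], [0.3, 0.5])
far = rbf_kernel([0.2, 0.5], [1.2, 0.5])
```

The kernel value is 1 for identical samples and decays toward 0 with distance; a larger γ makes this decay faster, so each training sample influences a smaller neighborhood.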

Artificial Neural Network (ANN)
ANNs mimic the functioning of the human brain, and they can find solutions for complex nonlinear problems by discovering their patterns [56]. In the present study, an ANN with a multilayer perceptron (MLP) architecture was used and trained with the backpropagation algorithm (BPA), which is the most common algorithm for training ANNs. The number of hidden layer units of any MLP depends on the complexity of the problem [57]. In our case, the feed-forward network was trained with the same training input data set and a hidden layer of 30 neurons. The initial weights are selected randomly. Then, the difference between the output values and the expected ones is obtained across all observations. This cycle of feeding signals forward, comparing outputs and back-propagating errors is repeated for every terrain unit until the mean-square error stabilizes at an adequately low level [58]. In this way, all weights that were randomly selected at first are updated by the backward process during each cycle to minimize the error.
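The forward pass and the error measure described above can be sketched for a network with one hidden layer of 30 neurons. This is a minimal sketch under stated assumptions: the sigmoid activation, the weight range, the eight input features and the random samples are all illustrative choices that the text does not specify, and the backward weight update is omitted.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    # One weight row per unit; the last entry of each row is the bias.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]


def forward_layer(weights, inputs):
    return [
        sigmoid(sum(w * x for w, x in zip(row[:-1], inputs)) + row[-1])
        for row in weights
    ]


def mlp_forward(hidden_w, output_w, features):
    """Feed one sample forward and return the single output value."""
    hidden = forward_layer(hidden_w, features)
    return forward_layer(output_w, hidden)[0]


def mean_square_error(predictions, targets):
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


rng = random.Random(0)
hidden_w = init_layer(8, 30, rng)   # 8 hypothetical input layers, 30 hidden neurons
output_w = init_layer(30, 1, rng)   # one output: landslide likelihood
samples = [[rng.random() for _ in range(8)] for _ in range(4)]
targets = [1, 0, 1, 0]
preds = [mlp_forward(hidden_w, output_w, s) for s in samples]
mse = mean_square_error(preds, targets)
```

Backpropagation would then adjust `hidden_w` and `output_w` along the gradient of this error and repeat the cycle until the error stabilizes.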

Convolutional Neural Network (CNN)
CNNs have become a hot topic in computer vision and image processing and have introduced state-of-the-art results for these domains [59]. As multi-layer feed-forward neural networks, CNNs can obtain effective feature representations of an image, which makes it possible for these networks to recognize the visual patterns in the image without human-designed complex rules [43]. CNNs have a specific architecture, where each so-called hidden layer typically contains convolutional and pooling layers, whereby the convolutional layer is considered to be the main building block of any CNN. The original input image is convolved with a set of trainable kernels that scan across the entire input image, resulting in a group of feature maps. Each feature map results from the convolution of the kernel with its corresponding local region on the original input image. Moreover, an elementwise non-linear activation function (e.g., sigmoid, ReLU, hyperbolic tangent) is applied to the results of a convolutional layer for non-linearity amplification [60]. The pooling layer is usually computed immediately after a convolutional layer and is used to down-sample the output of the convolutional layer to generate a condensed set of feature maps. Max-pooling is the most common and widely used pooling layer, which makes it possible to keep only the maximal values of the feature maps. The max-pooling is considered to be the main operation of any CNN. It reduces the spatial size of the feature maps significantly and, consequently, the computation volume for the next layers to be processed. The main operations performed in any CNN can be summarized by the following equation [59]:

$$O^{l} = P\left(\sigma\left(O^{l-1} * W^{l} + b^{l}\right)\right) \qquad (1)$$

where $O^{l-1}$ represents the output feature map of the previous layer, which is the input to the $l$th layer, $W^{l}$ and $b^{l}$ indicate the weights and biases of the layer, respectively, that convolve $O^{l-1}$ by the linear convolution $*$, and $\sigma(\cdot)$ denotes the non-linearity function applied to the output of the convolutional layer. These steps are often followed by a pooling operation, which is represented by $P$ in Equation (1).
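The sequence of convolution, non-linearity and pooling described above can be sketched for a single-channel feature map. The sketch assumes ReLU as the non-linearity, no padding, a stride of 1 and 2 × 2 max-pooling; the kernel values and the sample image are made up, and a real CNN would sum over several input channels and learn its kernels.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Sliding-window product without padding.

    As in most CNN libraries, the kernel is not flipped (cross-correlation).
    """
    k = len(kernel)
    out_rows = len(image) - k + 1
    out_cols = len(image[0]) - k + 1
    return [
        [
            sum(image[i + u][j + v] * kernel[u][v] for u in range(k) for v in range(k))
            + bias
            for j in range(out_cols)
        ]
        for i in range(out_rows)
    ]


def relu(fmap):
    """Elementwise non-linearity: negative values become zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Keep the maximum of each non-overlapping size x size block."""
    return [
        [
            max(fmap[i + u][j + v] for u in range(size) for v in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


def layer(prev_output, weights, bias):
    """One layer: pooling applied to the activated convolution output."""
    return max_pool(relu(conv2d_valid(prev_output, weights, bias)))


# Made-up 6 x 6 input and 3 x 3 kernel.
image = [[float((r * 6 + c) % 7) for c in range(6)] for r in range(6)]
kernel = [[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]]
out = layer(image, kernel, bias=0.5)
```

With a 6 × 6 input and a 3 × 3 kernel, the convolution yields a 4 × 4 map, which the pooling step condenses to 2 × 2.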

Multiple Input Window Size CNNs
In this study, multiple input window sizes were used for landslide detection. Two input window sizes, 32 × 32 and 48 × 48 pixels, were considered as our large input window sizes, and 12 × 12, 16 × 16, and 22 × 22 pixels were used as three different versions of the small ones (see Figure 4). These input window sizes were selected by cross-validation from a wide range of candidate sizes, from smaller than 12 × 12 up to 64 × 64. Multiple input window sizes were used because of the complexity of the shape and size of the features. There are some quite large landslides and several very small ones, often with quite different shapes. Some are elongated and potentially thin and can look more like an unsealed road than a landslide. There are different aspects, slopes and flow directions in the study area, and a single landslide may include different aspects. Most landslides exhibit a mixture of topographic features, which makes them difficult to recognize. Zhang et al. [59] used different sizes of input windows and developed a similar approach for the detection of complex-shaped objects in urban areas.
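Cutting sample patches of different window sizes around the same location can be sketched as follows. This is an illustrative sketch only: the function name `extract_patch`, the centering convention and the synthetic 64 × 64 layers are assumptions, and the study generated its patches in a dedicated software environment.

```python
def extract_patch(layers, row, col, size):
    """Cut a size x size window around (row, col) from every input layer.

    Returns None if the window would extend beyond the image.
    """
    half = size // 2
    top, left = row - half, col - half
    n_rows, n_cols = len(layers[0]), len(layers[0][0])
    if top < 0 or left < 0 or top + size > n_rows or left + size > n_cols:
        return None
    return [
        [band_row[left:left + size] for band_row in band[top:top + size]]
        for band in layers
    ]


# Five synthetic 64 x 64 layers standing in for R, G, B, NIR and NDVI.
layers = [
    [[b * 10000 + r * 100 + c for c in range(64)] for r in range(64)]
    for b in range(5)
]
patches = {a: extract_patch(layers, 32, 32, a) for a in (12, 16, 22, 32, 48)}
```

Each entry of `patches` has the shape a × a × 5 in layer-first order, so the same landslide location is seen with progressively more surrounding context as a grows.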

Different Layer Depth CNNs
Structuring the finest architecture and the optimal layer depth for any specific application of a CNN is an ongoing discussion in the deep learning domain [63]. In this study, to account for multiple input window sizes, we structured two CNNs with different layer depths (see Figure 5). A four-layer depth CNN was structured and trained separately with each of our input window sizes of training sample patches, using the four original spectral bands (R, G, B, NIR) and the NDVI. Since we fed this CNN with five-layer images, the input sample patch had a × a × 5 units (where a is 12, 16, 22, 32, or 48). In another training process, we also used three topographic layers, namely slope, aspect and plan curvature, in addition to this spectral information, to train the method. In this case, the CNN was fed by input sample patches with a × a × 8 units, where a × a is the size of one layer of the sample patches, and 8 is the number of different layers used for the analysis. We used 40 feature maps, hence the number of a × a × 5 × 12 different weights were trained during the first hidden layer of the four-layer depth CNN by using input window sizes of a × a. As a result, 12 feature maps with (a − 4) × (a − 4) × 1 units were obtained through a convolution layer with a kernel size of 5.
A max-pooling layer of 2 × 2 was used immediately after the first hidden layer, which reduced the units to [(a − 4)/2] × [(a − 4)/2] × 1 in the same number of feature maps from the previous layer. The resulting feature maps were then used as input data for the second convolution layer. Consequently, convolution with a kernel size of 3 led to 20 feature maps with [(a − 4)/2 − 2] × [(a − 4)/2 − 2] × 1 units. We selected the kernel sizes and the number of feature maps with consideration of the used input window sizes and the spread of our target objects. The last convolution layer was used with a kernel size of 3, which resulted in 40 feature maps with [(a − 4)/2 − 4] × [(a − 4)/2 − 4] × 1 units. The processes were executed in Trimble's eCognition software environment with the CNN implementation based on the Google TensorFlow library. During the training process, the gradients for each weight were calculated in each hidden layer, i.e., estimated using backpropagation. Moreover, a statistical gradient descent function was used to optimize the weights. We found that the following variables resulted in the best object detection rate in our study: a learning rate of 0.0001, 6000 training steps, and a batch size of 50.
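The feature-map sizes stated above follow from simple arithmetic on the window size a. The sketch below reproduces that arithmetic for the two large window sizes, assuming convolutions without padding and with a stride of 1, which is what the (a − 4) result for a kernel size of 5 implies; the function name `feature_map_sizes` is hypothetical.

```python
def feature_map_sizes(a):
    """Side length of the feature maps after each step of the four-layer CNN."""
    conv1 = a - 4        # kernel size 5, no padding: a - (5 - 1)
    pool1 = conv1 // 2   # 2 x 2 max-pooling halves each side
    conv2 = pool1 - 2    # kernel size 3: side - (3 - 1)
    conv3 = conv2 - 2    # kernel size 3 again
    return {"conv1": conv1, "pool1": pool1, "conv2": conv2, "conv3": conv3}


sizes = {a: feature_map_sizes(a) for a in (32, 48)}
```

For a = 32 the side lengths are 28, 14, 12 and 10, and for a = 48 they are 44, 22, 20 and 18.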
Only the two large input window sizes of training sample patches were used for the deeper, seven-layer depth CNN (D-CNN). In this case, the CNN was prepared using the same structure, with a convolution layer with a kernel size of 5 as the first layer, followed by further convolution layers with a kernel size of 3 and max-pooling layers of 2 × 2. This D-CNN was also trained with input sample patches of a × a × 5 and a × a × 8 units.

Results
The described ML and CNN methods, using all mentioned parameters, were trained on two training zones of the study area and tested on another zone. For all tests, we removed those detected landslide objects which were smaller than 70 pixels to account for geometric inaccuracies between the fieldwork samples and the satellite imagery. For the CNN methods, the optimal thresholds were used. The statistical analysis (e.g., minimum, maximum, sum, mean and standard deviation) of the resulting landslide detection maps is presented in Table 2. As described earlier, the main goals were (a) to compare the ML methods ANN, SVM and RF to CNNs and (b) to investigate the impact of the input window size and the layer depth of a CNN on the accuracy of landslide detection. The samples resulting from multiple input window sizes for two non-landslide areas and two landslide areas are presented in Figure 6.
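The removal of detected objects smaller than 70 pixels can be sketched as a connected-component filter on a binary detection mask. This is a minimal sketch under assumptions: pixels are grouped with 4-connectivity, and the function name `remove_small_objects` and the sample mask are made up; the text does not state how objects were delineated.

```python
from collections import deque


def remove_small_objects(mask, min_pixels=70):
    """Keep only connected groups of 1-pixels with at least min_pixels members."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    cleaned = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search collects one connected object.
            component, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) >= min_pixels:
                for y, x in component:
                    cleaned[y][x] = 1
    return cleaned


# Made-up mask: one 10 x 10 object (100 pixels) and one 3 x 3 object (9 pixels).
mask = [[0] * 20 for _ in range(20)]
for r in range(10):
    for c in range(10):
        mask[r][c] = 1
for r in range(15, 18):
    for c in range(15, 18):
        mask[r][c] = 1
cleaned = remove_small_objects(mask)
```

In the sample mask, the 100-pixel object is kept and the 9-pixel object is discarded.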

Results
The described ML and CNN methods, with all of the parameters mentioned, were trained on two zones of the study area and tested on a third zone. For all tests, we removed detected landslide objects smaller than 70 pixels to account for geometric inaccuracies between the fieldwork samples and the satellite imagery. For the CNN methods, the optimal thresholds were used. Summary statistics (e.g., minimum, maximum, sum, mean and standard deviation) of the resulting landslide detection maps are presented in Table 2. As described earlier, the main goals were (a) to compare the ML methods ANN, SVM and RF with CNNs and (b) to investigate the impact of the input window size and the layer depth of a CNN on the accuracy of landslide detection. The samples resulting from multiple input window sizes for two non-landslide areas and two landslide areas are presented in Figure 6.

A total of 20 landslide maps were generated from all of the methods and parameters used (Figure 7). For the ML methods ANN, SVM and RF, we first used five spectral layers, namely the RapidEye bands R, G, B and NIR plus the NDVI, and called the resulting maps ANN_5, SVM_5 and RF_5. We then used eight layers, comprising these five spectral layers plus the three topographical layers slope, aspect and plan curvature, and called the resulting maps ANN_8, SVM_8 and RF_8. Six landslide maps in total were therefore created with ML methods. More parameters were varied for the CNN methods: in addition to the five- and eight-layer training data sets, different input window sizes and depths were used. In CNN_{p,q}, the index p is the size of the convolution input window and q is the number of input layers used for training. CNN refers to the four-layer-deep CNN, and D-CNN refers to the deeper seven-layer CNN. Figure 8 shows an enlargement of two sub-areas of the test area to illustrate some differences in the identified landslides for different types of input data layers and methodologies.
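The removal of detected objects smaller than 70 pixels can be sketched as a connected-component filter on a binary detection mask. The following is a minimal pure-Python illustration, not the implementation used in the study: the function name `remove_small_objects`, the nested-list mask format and the use of 4-connectivity are assumptions made for this example, since the text does not state how objects were delineated.

```python
from collections import deque


def remove_small_objects(mask, min_size=70):
    """Return a copy of a binary mask without connected objects below min_size pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill collects the pixels of one connected object.
            seen[r][c] = True
            queue = deque([(r, c)])
            component = []
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                # 4-connectivity is an assumption of this sketch.
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Keep the object only if it reaches the minimum size.
            if len(component) >= min_size:
                for y, x in component:
                    out[y][x] = 1
    return out


# Toy mask: one 10 x 10 object (100 pixels) and one 5 x 5 object (25 pixels).
demo = [[0] * 30 for _ in range(12)]
for y in range(1, 11):
    for x in range(1, 11):
        demo[y][x] = 1
for y in range(2, 7):
    for x in range(20, 25):
        demo[y][x] = 1

filtered = remove_small_objects(demo, min_size=70)
print(sum(map(sum, demo)), sum(map(sum, filtered)))  # prints: 125 100
```

In this toy case the 25-pixel object is dropped and the 100-pixel object is kept. With 8-connectivity, diagonally touching detections would merge into one object, so the choice of connectivity can change which objects fall below the threshold.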

Accuracy Assessment
In this section, we outline accuracy assessment methods that are widely used in the remote sensing and computer vision domains. We used them to evaluate the performance of the applied ML and CNN methods by analyzing the conformity between the landslide inventory data set and the outputs of these methods. Consequently, any uncertainty in the distribution, location and boundaries of the areas mapped as landslides in the inventory data set affects the results of the accuracy assessment [9].

Quantitative Methods
The comparison of accuracies is based on three kinds of classified pixels, namely true positives (TP), false positives (FP) and false negatives (FN) [31,41]. TPs are pixels that were correctly detected as landslide areas. FPs are pixels that were classified as landslide areas but are not landslides according to the inventory data set. FNs are inventory landslide areas that were not recognized as such by the applied method (see Figure 9). The corresponding area statistics for these three cases are given in Table 3 for the different methods and parameters. To summarize the total areas of TPs, FPs and FNs, three measures were used, namely precision, recall and F1. Precision (P) is the proportion of the classified landslide area that is really landslide. Recall (R) is the proportion of the actual (field-measured) landslide area that was classified as landslide in the images. The well-known F1 measure expresses the balance between these two measures. The three measures are defined in Equations (2)-(4), respectively.

$$\mathrm{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} \quad (2)$$

$$\mathrm{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} \quad (3)$$

$$\mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \quad (4)$$

Table 3 summarizes the quantitative results of the ML and CNN landslide detection approaches. For our case study area, CNN_{16,5} achieved the highest precision of 83.31%, closely followed by RF_5 and RF_8 with precision values of 81.95% and 80.9%, respectively. CNN_{22,5} produced the lowest FN value and the best recall (92.85%); however, its precision and F1 values were lower than those of CNN_{16,5}, which yielded the highest F1 value of 87.8%.
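As a small illustration of how precision, recall and F1 follow from the TP, FP and FN totals, the measures can be computed as below. This is a sketch under stated assumptions: the function name `detection_metrics` is hypothetical, the input totals are invented for the example and are not values from Table 3, and the denominators are assumed to be non-zero.

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive and false-negative totals."""
    precision = tp / (tp + fp)  # share of the detected area that is really landslide
    recall = tp / (tp + fn)     # share of the inventory landslide area that was detected
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1


# Hypothetical totals (pixels or area units); NOT taken from Table 3.
tp, fp, fn = 800.0, 200.0, 100.0
p, r, f1 = detection_metrics(tp, fp, fn)
print(f"precision={p:.4f} recall={r:.4f} F1={f1:.4f}")
# prints: precision=0.8000 recall=0.8889 F1=0.8421
```

Because F1 is a harmonic mean, it stays close to the lower of the two values, which is why a method with very high recall but modest precision does not necessarily reach the highest F1.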

Mean Intersection-over-Union (mIOU)
The mIOU is a validation metric that measures the accuracy of a predictor's output on a particular data set. It is widely used in computer vision, particularly in object detection challenges [64]. It is a general metric: any method that produces bounding polygons can be validated with mIOU against a precise inventory data set of target polygons (see Figure 10). It is the mean of the quantity defined in Equation (5):

$$\mathrm{IoU} = \frac{\text{Area of Overlap}}{\text{Area of Union}} \quad (5)$$

The mIOU value of each resulting landslide map is reported in Table 3. Based on these values, CNN_{16,5} yielded the best landslide detection results with the highest mIOU of 78.26%, followed by 70.62% and 66.9% obtained with CNN_{16,8} and RF_8, respectively.
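The ratio of overlap to union can be illustrated on rasterized masks. The sketch below is an assumption-laden example rather than the evaluation code of the study: the function names, the nested-list mask format and the choice to average over a list of (detected, reference) pairs are all hypothetical, because the text does not specify over which units the mean is taken.

```python
def intersection_over_union(detected, reference):
    """IoU of two binary masks given as equally sized lists of lists of 0/1."""
    overlap = 0
    union = 0
    for det_row, ref_row in zip(detected, reference):
        for d, g in zip(det_row, ref_row):
            if d and g:
                overlap += 1  # pixel lies in both the detection and the inventory
            if d or g:
                union += 1    # pixel lies in at least one of the two
    return overlap / union if union else 0.0


def mean_iou(pairs):
    """Mean IoU over (detected, reference) mask pairs (averaging unit is an assumption)."""
    return sum(intersection_over_union(d, g) for d, g in pairs) / len(pairs)


def block(rows, cols, r0, r1, c0, c1):
    """Helper: rows x cols mask with ones in rows r0..r1-1 and columns c0..c1-1."""
    return [[1 if r0 <= r < r1 and c0 <= c < c1 else 0 for c in range(cols)]
            for r in range(rows)]


reference = block(6, 7, 1, 5, 1, 5)  # 4 x 4 inventory landslide
detected = block(6, 7, 1, 5, 2, 6)   # same size, shifted one column to the right

iou = intersection_over_union(detected, reference)
print(iou)  # overlap 12, union 20 -> 0.6
print(mean_iou([(detected, reference), (reference, reference)]))  # (0.6 + 1.0) / 2 = 0.8
```

When the overlap and union are counted on the same pixels as TP, FP and FN, the overlap equals TP and the union equals TP + FP + FN, so IoU penalizes both kinds of error at once and is never larger than F1.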

Discussion
This study shows that selecting appropriate methods and parameters is important. It is not as simple as comparing, for instance, ML methods with CNNs in general. There are multiple options for designing a CNN, and a CNN will not automatically outperform other methods, as popular science articles and magazines may imply. Also, for the same method, different training strategies influence the results. In this study, we used two different training data sets, different numbers of input layers and different depths of CNNs. First, we focused only on the spectral information. This five-layer training data set gave more accurate results than the eight-layer training data set, which also contained three topographic layers. Almost all of the applied methods (except for CNN_{32}) yielded better results when using only the spectral information (four original bands plus NDVI). Therefore, in this study the topographical information did not improve the results. This was somewhat unexpected, and it is unclear whether topographical information could improve the overall classification accuracy in other study areas with other settings. Although the topographical information slightly reduced the overall accuracy, it was very helpful for distinguishing settlement areas from landslide areas, which have similar spectral behavior (see Figure 11). The slope layer was especially useful here because most landslides occur on steep slopes, which are unlikely to be settlement areas. Thus, most of the eight-layer-based results clearly distinguished settlement areas from landslide areas. However, since almost all landslides are located in steep areas, all of the ML and CNN methods overestimated landslides in steep areas when the eight layers were used. In our case study, this drawback outweighed the advantage of improved settlement detection, since there were only relatively few and small settlement areas.

For CNNs, the effects of different input window sizes were evaluated. For any given input window size, the two training data sets yielded similar results (see Figure 12), whereas different input window sizes resulted in different accuracies. Unfortunately, these differences are not systematic. For instance, increasing the input window size from CNN_{12} to CNN_{16} improved the accuracy, but further increases led to lower overall accuracies and especially lower mIOU values. This is presumably because the input windows are based on random points distributed within the landslide polygons of the inventory data, and some of these points lie close to the border of a landslide area. The share of non-landslide area inside a window therefore grows as the input window size increases. Nevertheless, increasing the depth from four to seven layers improved the performance of the larger input window sizes for both training data sets (see Figure 13). Although the deeper D-CNN was limited to input window sizes of 32 and 48, it significantly improved the accuracies for these two window sizes compared with the four-layer CNN. Most notably, the mIOU value for the five-layer data set and a window size of 32 increased from 48.42% (CNN_{32,5}) to 63.66% (D-CNN_{32,5}).
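The presumed effect of larger input windows can be made concrete with a toy raster. The sketch below counts the share of non-landslide pixels inside windows of the five sizes used in this study around a single sample point. Everything else is an assumption for illustration: the 20 x 20 pixel square landslide, the 100 x 100 scene, the sample point at the landslide center and the function name `non_landslide_fraction` are hypothetical.

```python
def non_landslide_fraction(mask, center, size):
    """Share of non-landslide pixels in a size x size window around center (row, col)."""
    r0, c0 = center
    half = size // 2
    total = 0
    background = 0
    for r in range(r0 - half, r0 - half + size):
        for c in range(c0 - half, c0 - half + size):
            total += 1
            if mask[r][c] == 0:
                background += 1
    return background / total


# Toy inventory raster: a 20 x 20 pixel landslide inside a 100 x 100 pixel scene.
scene = [[0] * 100 for _ in range(100)]
for r in range(40, 60):
    for c in range(40, 60):
        scene[r][c] = 1

# Window sizes as used for the CNNs in this study; the sample point is hypothetical.
fractions = {s: non_landslide_fraction(scene, (50, 50), s) for s in (12, 16, 22, 32, 48)}
for size, frac in fractions.items():
    print(size, round(frac, 3))
# prints: 12 0.0 / 16 0.0 / 22 0.174 / 32 0.609 / 48 0.826 (one pair per line)
```

Even for a point at the landslide center, the windows of 32 and 48 pixels in this toy case are dominated by non-landslide pixels, although the sample is labeled as landslide. A point close to the landslide border would reach that situation at smaller window sizes.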

Conclusions
The increasing availability of VHR remotely sensed imagery opens many options for landslide mapping and for producing and updating landslide inventories. Landslide mapping is still a challenging task due to the complexity of the factors triggering landslides and the many forms, sizes and shapes landslides can take. Landslide inventory data sets are traditionally generated by field surveys and visual interpretation of satellite images. However, such surveys are time-consuming, expensive and often dangerous. There are also some semi-automated and case-based automatic methods for landslide detection and even classification [65]. Recently, ML and deep-learning methods, particularly CNNs, have been shown to be powerful for object detection in images if a huge number of training samples exist. In this study, we analyzed and compared current ML and CNN methods for landslide detection in a case study in the higher Himalayas. We considered different training data sets and parameters. Our experiments revealed that CNNs do not necessarily outperform ANN, SVM and RF; they do so only in ideal cases. How can a user know in advance which CNN structure is ideal? In our case, CNNs based only on spectral information with a 16-pixel input window yielded the best results. As described, we expected that adding topographic layers would improve the accuracy, but this was not the case. We also observed that larger input window sizes for the same CNN structure tended to decrease accuracies. Conversely, deeper CNN layer structures can be beneficial. However, how can users know this beforehand? We conclude that CNNs have a high potential for landslide detection, but users should not be misled by results from crowdsourcing campaigns in computer vision (e.g., deciding whether an image contains a cat).
On the positive side, deep-learning object detection methods require less human supervision than traditional methods and can be easily transferred to other regions by retraining the model with related training data; however, the accuracy these methods would achieve when transferred to landslide extraction on a global scale is still unclear. CNNs may be the most efficient for the recognition of complex image patterns and for semantic classification. At the pixel level, however, CNNs and deep-learning methods seem to be problematic for deriving exact landslide borders. In future work, we aim to develop an object-based CNN method for landslide detection. Instead of CNN input windows based on random points, which in this study yielded moderate accuracies, especially for larger input windows, we want to use object segments with precise boundaries and define the optimal input window size based on a hierarchical patch dynamics paradigm [66] or on simpler approaches such as the bounding box of a landslide. However, such a methodology will require even more training data. Augmentation strategies may help, as in this study, but their effects on the results are currently not fully understood.

Figure 1. A true color composite of RapidEye bands 3/2/1 acquired on 18 June 2015, illustrating the geographic location of the study area.

Figure 3. Field photographs showing landslide areas during field survey in the Rasuwa district.
Remote Sens. 2018, 10, x FOR PEER REVIEW 9 of 23
... the study area and a single landslide may include different aspects. Most landslides exhibit a mixture of topographic features, which makes them difficult to recognize. Zhang et al. [59] used different sizes of input windows and developed a similar approach for the detection of complex shape objects in urban areas.

Figure 4. The multiple CNN input window sizes for (a) landslide and (b) non-landslide areas.

3.6.2. Augmentation of the Training Data Set
Deep-learning methodologies, and CNNs in particular, need a huge number of sample patches for efficient training [61] in the form of a so-called labeled training dataset [62]. Preparing such large training datasets can be problematic in practice. The accuracy of the training dataset is also vital, and only a training dataset with a high quality and quantity of samples will yield accurate results. Thus, the size of the training dataset plays a crucial role in the results of CNNs. Therefore, some augmentation techniques have been developed to artificially multiply an existing training dataset. Although the potential impacts of different data augmentation techniques on the results are not clear from the literature, they are believed to improve the training performance of CNNs. These techniques are also called data distortion since they use particular deformations to increase the volume of the dataset. A deformation applied to a training dataset can be a translation, a rotation, a random mirroring of the image or a window shift. All of these techniques have particular pros and cons [41]. In this study, because of the dispersion of the landslides and their sizes, shapes and directions, we used a random window shifting technique. Applying this data augmentation technique increases the size of our training dataset from the randomly selected 3500 points that were also used for our ML methods
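Random window shifting can be sketched as follows: for each sample point, several input windows are cut at small random offsets from the point, so that one labeled point yields several training patches. This is only an illustrative sketch. The function name `shifted_windows` and the parameters `n_copies`, `max_shift` and `seed` are hypothetical, and the text does not state the shift range or the number of copies per point used in the study.

```python
import random


def shifted_windows(center, size, n_copies, max_shift, shape, seed=0):
    """Return top-left corners of randomly shifted size x size windows around a point."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    rows, cols = shape
    r0, c0 = center
    corners = []
    for _ in range(n_copies * 50):  # bounded number of attempts
        if len(corners) == n_copies:
            break
        # Shift the window center by up to max_shift pixels in each direction.
        top = r0 + rng.randint(-max_shift, max_shift) - size // 2
        left = c0 + rng.randint(-max_shift, max_shift) - size // 2
        # Keep only windows that lie completely inside the image.
        if 0 <= top <= rows - size and 0 <= left <= cols - size:
            corners.append((top, left))
    return corners


# One hypothetical sample point in a 100 x 100 pixel image, 16 x 16 input windows.
windows = shifted_windows(center=(50, 50), size=16, n_copies=5, max_shift=4,
                          shape=(100, 100))
print(len(windows), windows)
```

Each returned corner defines one patch `image[top:top + size, left:left + size]` that inherits the label of the original point. Keeping `max_shift` small relative to the window size is a design choice of this sketch, intended to keep the shifted patches representative of the labeled location.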

Figure 5. The CNN architectures with (a) seven-layer depth (D-CNN) and (b) four-layer depth (CNN), each trained separately with five spectral layers and with eight layers comprising the spectral and the topographical layers. Input window sizes of 32 × 32 and 48 × 48 were used for D-CNN, and window sizes of 12 × 12, 16 × 16, 22 × 22, 32 × 32 and 48 × 48 were used for CNN.

... in the same number of feature maps from the previous layer. The resulting feature maps were then used as input data for the second convolution layer. Consequently, convolution with a kernel size of 3 led to 20 feature maps with $\left[\frac{a-4}{2}-2\right] \times \left[\frac{a-4}{2}-2\right] \times 1$ units. We selected the kernel sizes and the number of feature maps with consideration of the input window sizes used and the spread of our target objects. The last convolution layer was used with a kernel size of 3, which resulted in 40 feature maps with ...

Figure 6. An illustration of convolution input window sizes from (a) two non-landslide areas and (b) two landslide areas.

Figure 7. Landslide detection results using different ML and CNN methods, training datasets and parameters. In any CNN_{p,q} and ML_q, the index p corresponds to the size of the convolution input window, and q indicates the number of input layers that were used for training.

Figure 8. Enlarged maps of two different sub-areas from the test area. Landslide detection results are overlaid on the inventory data. In any CNN_{p,q} and ML_q, the index p corresponds to the size of the convolution input window, and q indicates the number of input layers that were used for training.

Figure 9. Inventory map with GPS-measured landslide polygons optimized by manual detection and correction using RapidEye satellite images (upper row). The three sample areas a, b and c in the lower row illustrate true positives (TP), false positives (FP) and false negatives (FN).

Figure 10. Illustration of the area of overlap (a) and area of union (b) for a detected landslide as compared to the corresponding area from the inventory data set.

Figure 12. The influence of multiple input window sizes on the F1 measure (left) and mIOU (right) of the CNN method for both the 5- and 8-layer training datasets.

Table 1. Landslide inventories prepared as part of different studies.

Table 2. Statistical analysis of the landslide detection results: the number of polygons extracted as landslide areas, the minimum and maximum sizes of the landslide areas, the total detected area, and the mean and standard deviation values.

Table 3. Landslide detection results for the test zone for the ML and CNN methods trained with two different training datasets of five and eight layers. For CNNs, multiple input window sizes and layer depths were applied. Accuracies are stated as precision, recall, F1-measure, and mIOU.