Landslide Detection Using Multi-Scale Image Segmentation and Different Machine Learning Models in the Higher Himalayas

Landslides represent a severe hazard in many areas of the world. Accurate landslide maps are needed to document the occurrence and extent of landslides and to investigate their distribution, types, and the pattern of slope failures. Landslide maps are also crucial for determining landslide susceptibility and risk. Satellite data have been widely used for such investigations—next to data from airborne or unmanned aerial vehicle (UAV)-borne campaigns and Digital Elevation Models (DEMs). We have developed a methodology that incorporates object-based image analysis (OBIA) with three machine learning (ML) methods, namely, the multilayer perceptron neural network (MLP-NN) and random forest (RF), for landslide detection. We identified the optimal scale parameters (SP) and used them for multi-scale segmentation and further analysis. We evaluated the resulting objects using the object pureness index (OPI), object matching index (OMI), and object fitness index (OFI) measures. We then applied two different methods to optimize the landslide detection task: (a) an ensemble method of stacking that combines the different ML methods for improving the performance, and (b) Dempster–Shafer theory (DST), to combine the multi-scale segmentation and classification results. Through the combination of three ML methods and the multi-scale approach, the framework enhanced landslide detection when it was tested for detecting earthquake-triggered landslides in Rasuwa district, Nepal. PlanetScope optical satellite images and a DEM were used, along with the derived landslide conditioning factors. Different accuracy assessment measures were used to compare the results against a field-based landslide inventory. All ML methods yielded the highest overall accuracies ranging from 83.3% to 87.2% when using objects with the optimal SP compared to other SPs. However, applying DST to combine the multi-scale results of each ML method significantly increased the overall accuracies to almost 90%. Overall, the integration of OBIA with ML methods resulted in appropriate landslide detections, but using the optimal SP and ML method is crucial for success.


Introduction
Landslides represent a significant threat to human life, natural resources, infrastructure, and properties in mountainous areas [1].A landslide is defined as the movement of a mass of debris, rocks, or slope failures, which occurs during rainfall, runoff, rapid snowmelt, earthquakes, and volcanic eruptions [2,3].As well as the physical impacts on the environment, landslides also have adverse consequences for the economy of local communities [4,5].Landslides can occur for a range of reasons; for instance, they can be triggered by earthquake shocks, heavy rainfall, or road construction in hilly areas [6,7].Despite some progress being obtained through scientific studies, landslide susceptibility modeling and mapping pose significant challenges for land-use planners and policymakers [8,9].Regardless of the type of methodology applied for landslide susceptibility mapping, reliable inventory data sets play an essential role in this process.A landslide inventory data set, including precise boundaries, spatial locations, and distributions, can be produced by conducting field surveys using the global positioning system (GPS), which is an expensive and, in some cases, dangerous approach due to the rough topography and instability [10,11].Therefore, Earth observation (EO) products are considered a low cost and useful data source for landslide inventory data set production [12].The two main approaches of object-based and pixel-based classification methods have been used for the classification of satellite imagery and information extraction from EO data.Based on improvements in the fields of computer vision and image processing of the last two decades, object-based image analysis (OBIA) has become more widespread [13].OBIA is a relatively new sub-discipline of geographic information science (GIScience) and makes it possible to produce useful geographic information based on the partitioning of EO data into meaningful image objects applicable for the class or feature of interest [14].OBIA is a knowledge-driven approach, which-by mimicking human perception-tries to group a set of contiguous pixels into meaningful objects through a segmentation process that represents corresponding features in an image [15,16].Compared to pixel-based approaches, which depend on the digital number (DN) of pixels, OBIA integrates and employs spectral information (e.g., color) and spatial properties (e.g., size and shape), along with textural data and contextual information (e.g., association with neighboring objects) [17], to classify objects into desired classes.
In OBIA, image segmentation is an essential pre-requisite for classification/feature extraction and further analysis with geographic information systems (GIS) [17,18].The segmentation process controls the accuracy of further image analysis steps, such as classification and object detection [19].In other words, the segmentation procedure has a considerable influence on further processes [20,21], and incorrect segmentation usually results in over-segmentation and under segmentation errors [22].Therefore, defining the optimal parameters for object definition plays an essential role in detecting landslides through the image segmentation process.The optimal scale parameter (SP) should be considered in defining and generating meaningful segments/objects for segmentation [23].Although segmentation and primary object definition are never considered perfect, it is possible to use spectral and spatial indexes to obtain the optimal scale parameter (SP) for segmentation.Besides, many landslides that we can detect with EO data have a multi-scale character: along with their various sizes, they are composites of different entities, such as landslide bodies and affected areas, which are usually defined as the landslide area [1].An optimal scale out of multiple scales results in less internal heterogeneity concerning particular parameters compared to the adjacent areas [15].
Since landslides in the real world come in a wide range of shapes and sizes and are embedded within different land cover types, expert knowledge plays a vital role in the accuracy of landslide detection with conventional rule-based approaches in OBIA.However, determining the appropriate thresholds to group segments into landslide classes based on each landslide diagnostic parameter is a difficult task.Another challenge is that the conventional approaches are mostly time-consuming and labor-intensive, and are often criticized because of their weakness regarding transferability [24].Currently, OBIA has been integrated with different machine learning (ML) methods and used in various applications [25].Generally, ML methods are considered valid methods for remote sensing (RS) applications with an emphasis on image classification and object recognition [26].Different ML methods and classifiers have already been integrated with OBIA and used for extracting landslides in different studies.For example, the ML method of support vector machines (SVMs) was used by [24] to classify the segments employed to extract forested landslides.The authors trained their semi-automatic method using old and densely vegetated landslides and derived their extent using LiDAR products.Their method was then tested in the Flemish Ardennes (Belgium) and resulted in landslide extraction accuracies of almost 70%.In another study, [27] used the same ML method, but applied the RBF kernel along with OBIA to propose an automatic landslide extraction approach for rainfall-induced landslides on Madeira Island.Furthermore, [28] integrated the K-means clustering method with both pixel-based and OBIA approaches to compare their performance in landslide detection.The integrated approaches were implemented using very high-resolution (VHR) remotely sensed images for their case study area of the San Juan La Laguna, Guatemala.The comparative study revealed that the integration of the K-means clustering method with OBIA was able to identify most of the landslides with less false positives compared to the pixel-based approach.
Although using single ML methods provides acceptable accuracies in landslide extraction and modeling, the combination of two or more ML methods has achieved higher accuracies [29].Chen et al. [30] applied an ensemble method to stack the weights of evidence (WoE) and evidential belief function (EBF) methods with a logistic model tree (LMT) ML classifier for landslide susceptibility mapping.Their results proved that the prediction capability of the ensemble methods was better than that of single methods.
Moreover, to improve image classification and feature extraction, there are relevant probability concepts such as Dempster-Shafer theory (DST).The DST has been applied to classifier models to find the best match between the inventory data set and the resulting classification [31].This probability concept has been used in RS data fusion [32] and landslide susceptibility mapping [33] to deal with uncertainty associated with the results.The DST has been successfully used for combining classifiers in a wide range of applications, such as target identification and object tracking [34,35].In the field of landslide detection, Mezaal et al. [12] used the DST to enhance the results of the integration of OBIA with various ML methods, including SVM, random forest (RF), and K-nearest neighbor (KNN).The DST method performed well in landslide detection in their tropical study area.These pieces of evidence from previously published papers motivated us to apply the DST probability concept to improve the ML classification accuracy through integration with different classifiers.Therefore, in the present study, we integrate the widely used ML methods of logistic regression (LR), the multilayer perceptron neural network (MLP-NN), and RF with OBIA for landslide detection, based on optical data and topographic factors resulting from PlanetScope satellite images and Digital Elevation Model (DEM) data, respectively.To improve the performance of the applied ML methods, the ensemble method of stacking is used to combine them and produce a new result.The optimal scale for image segmentation is derived using the estimation of scale parameters (ESP2) tool [23].Multiple scales are selected using interval values based on the optimal scale.The maps resulting from the multi-scale segmentation by each ML method are then fused using DST to demonstrate the advantages of working in a multi-scale environment.All resulting landslide detection maps are then validated using standard RS accuracy metrics and the validation method of receiver operating characteristics (ROC).

Study Area
In April 2015, an earthquake with a magnitude of 7.8 M struck Nepal, killing almost 9000 people and injuring nearly 22,000 [36].The epicenter of the earthquake was located in the east of Gorkha district, and its hypocenter was at a depth of nearly 8.2 km [37].Due to the magnitude of the earthquake, several landslides occurred across Nepal, especially in the east of the Gorkha district.
The study area of this research is located in the southern part of Rasuwa district (see Figure 1), along the Trishuli river.The study area covers around 17,100 ha, and the elevation ranges between 700 and 2038 m above the mean sea level (MSL).The land use largely consists of forested areas, grassland, agricultural land, and rural areas.The study area is located in the higher Himalayas and, due to its rough terrain, is very susceptible to landslide occurrence.According to the Köppen climate classification scheme, the study area falls under a sub-tropical and humid climate with cooler temperatures, and its annual average precipitation is nearly 691 mm.Before this study, some investigations by [38][39][40] and [1] were carried out to extract landslide locations in Rasuwa district.However, the main focus of the present study is evaluating the accuracy and performance of the proposed method of landslide detection compared to the conventional ML methods.temperatures, and its annual average precipitation is nearly 691 mm.Before this study, some investigations by [38][39][40] and [1] were carried out to extract landslide locations in Rasuwa district.However, the main focus of the present study is evaluating the accuracy and performance of the proposed method of landslide detection compared to the conventional ML methods.

Overall Workflow
In this study, we used PlanetScope multispectral images [41] in four bands (Blue, Green, Red, and NIR), along with topographic factors, for landslide detection.The overall workflow (see Figure 2) of this study is as follows: I.
Preparing multispectral images, the spectral index, and topographic derivatives for modeling; II.
Generating multi-scale segments using the ESP2 tool and scale value intervals; III.
Segmentation analysis and evaluation; IV.
Training ML and stacking methods on multi-scale datasets at the object level; V.
Fusing multi-scale ML results using DST; VI.
Applying different accuracy assessment metrics.

Overall Workflow
In this study, we used PlanetScope multispectral images [41] in four bands (Blue, Green, Red, and NIR), along with topographic factors, for landslide detection.The overall workflow (see Figure 2) of this study is as follows: I.
Preparing multispectral images, the spectral index, and topographic derivatives for modeling; II.
Generating multi-scale segments using the ESP2 tool and scale value intervals; III.
Segmentation analysis and evaluation; IV.
Training ML and stacking methods on multi-scale datasets at the object level; V.
Fusing multi-scale ML results using DST; VI.
Applying different accuracy assessment metrics.

Datasets
One of the most critical datasets for landslide detection and prediction is an appropriate landslide inventory, which influences further analyses [42,43].In this case study, a landslide inventory map for the Rasuwa district was obtained from multiple sources, including GPS data from an extensive field survey and manually extracted data from satellite imagery.The satellite images used in this study were taken from the PlanetScope constellation of Planet Labs Company.PlanetScope includes more than 120 satellites that have been operating since 2014 and provide multispectral images with a 3 m spatial resolution and daily revisit time in four bands (Table 1).Due

Datasets
One of the most critical datasets for landslide detection and prediction is an appropriate landslide inventory, which influences further analyses [42,43].In this case study, a landslide inventory map for the Rasuwa district was obtained from multiple sources, including GPS data from an extensive field survey and manually extracted data from satellite imagery.The satellite images used in this study were taken from the PlanetScope constellation of Planet Labs Company.PlanetScope includes more than 120 satellites that have been operating since 2014 and provide multispectral images with a 3 m spatial resolution and daily revisit time in four bands (Table 1).Due to the rugged topography of the study area, some landslide events could not be registered by the field survey, so satellite images were used to identify and digitize them.Subsequently, all types of data were converted to the polygon format in QGIS 3.8.The total number and area of landslide events that were mapped were 194 and 64120 ha, respectively.Moreover, the minimum, maximum, and standard deviation of the area of the landslide polygons were 0.07, 16, and 2.17 ha, respectively.Along with PlanetScope images (Table 1), the normalized difference vegetation index (NDVI) index-which is widely applied in landslide modeling [1,7,44]-was calculated from the NIR and Red bands to be used in landslide detection.Th probability of landslide occurrence is highly dependent on the surface topography; in other words, hilly and mountainous areas have the highest probability of landslide occurrence [1].Our study area is located within the Himalayan fold and thrust zone of central Nepal.This zone resulted from a collision of the Indian Plate with the Eurasian plate.The precipitation amount of this area varies based on the tropical climatic conditions in the monsoon season, which has more rainfall compared to the summer season [45].Although geological formations and precipitation are also important for landslide modeling, these factors do not change significantly in this study area due to its small size, which is why they were not considered in this study.Therefore, only satellite images and topographic factors were used, and all topographic derivatives, such as elevation, slope, aspect, curvature, and the topographic wetness index (TWI), were calculated based on a 5 m resolution DEM.The selection of topographical factors related to landslide occurrences depends on the landslide type, characteristics of the study area, and scale of the analysis.However, there is no standard approach for the selection of landslide conditioning factors.In the present study, six topographical landslide conditioning factors, namely, slope, slope aspect, curvature, plan and profile curvature, and altitude, were generated from a 5 m resolution DEM acquired from the Japanese aerospace exploration agency JAXA ALOS sensor.One of the most critical factors controlling slope stability is the slope angle.The slope angle is regularly used in landslide detection and susceptibility studies [46].In the study area, the slope map was prepared using a DEM, and the slope ranged from 0.05 • to 75.26 • .The slope aspect was considered a topographical conditioning factor representing the slope direction [47].The slope aspect factor mainly affects the hydrological system through evapotranspiration and, consequently, affects vegetation.Our aspect layer was classified, as this layer comprised sharp differences and changes.As such, the slope aspect of the area was divided into nine classes, namely, north, northwest, northeast, east, south, southeast, southwest, west, and flat.
We extracted the plane curvature from the DEM.The plane curvature represents the curvature of the contour line formed by the intersection of the surface with the horizontal plane [48].The convergence and divergence of water in downhill flow are influenced by plane curvature [49].The plane curvature represents the rate of change of aspect, in which positive values indicate convex curvature, zero denotes low change, and negative values indicate concave curvature.The values range from −56 to 74.14.The profile curvature is also crucial for the water flow speed variation from higher to lower areas [7].The conditioning factor of altitude is another instability factor in our region, and landslide events at higher altitudes are usually influenced by gravity.Altitude influences topographical attributes and the Earth's surface, which accounts for spatial variability in precipitation, soil thickness, erosion, and vegetation types [50].In this study area, the elevation ranges from 734 to 4050 m above the mean sea level.
The landslide areas are usually not vegetated, especially due to recent landslide events [7].Therefore, we used the normalized difference vegetation index (NDVI) to better distinguish between landslide and non-landslide areas.This index is useful for our study area, which is covered by forest.In this study, the NDVI map was derived from PlanetScope imagery from 28 November 2015 (see Equation ( 1)).
The NDVI was calculated from the near-infrared and the red spectral bands.The NDVI values vary from −0.44 to 0.72 in the study area.
The TWI is another important topographical conditioning factor within the morphometric conditions of the terrain, as it evaluates the cumulative flow rate upstream with the slope angle [51].The TWI is calculated by Equation ( 2): where α is the specific catchment area (m) and β is the slope.All conditioning factors are represented in Figure 3.

Multi-Scale Image Segmentation
The demand for high-resolution (HR) and VHR satellite imagery, together with the resulting volume, variety, and velocity, as well as the rapid development and progress of EO technologies, provide significant challenges for the RS community with respect to the complexity of image understanding (IU) [52].Therefore, OBIA, as an almost new paradigm in EO data analysis and image processing, has attracted more and more attention in the RS community and has been applied in several applications, such as image classification and object extraction [17].OBIA is a knowledge-driven approach that aims to produce meaningful objects by using geometric and spectral characteristics, such as size, shape, texture, color, and contextual information, to present a better IU based on the real world [13].In OBIA, segmentation is a crucial step [53], which considerably influences further analyses and results.The way that segmentation parameters (e.g., scale, shape, and compactness) are selected impacts the quality of the image objects [54].In this study, multiresolution segmentation (MRS) was applied for the segmentation of our PlanetScope image (see Figure 4).MRS is a bottom-up segmentation technique, which is based on the pairwise region-merging approach [12].In this case, smaller detected objects were merged with larger ones, considering various parameters of scale, color, and shape (i.e., compactness and smoothness).These parameters are usually selected by trial-and-error, which requires expert knowledge and is a time-consuming and uncertain task [12,55].Although some automatic techniques, like the Taguchi optimization method [55], have been introduced for defining the optimal parameters for the segmentation process, the process of detecting optimal objects is still a challenging task, mostly because of the diversity in the sizes and shapes of the target features [56].
The SP is considered the most crucial parameter in the MRS.In this study, the optimal SP was derived using the ESP2 tool, developed by [23].This automated tool is a technique for selecting the SPs, which delivers three distinct scales using MRS, implemented in the eCognition Developer software.The ESP2 tool resulted in SP values at three different scale levels which, for our case, were 112 for scale level 1, 242 for scale level 2, and 1142 for scale level 3.The optimal SP value of the second level was selected along with two other scales based on an interval of 50 for further analysis and landslide detection.In pixel-based image classification, thematic accuracy assessment is the conventional method applied to evaluate the performance of an image classifier and the accuracy of a produced map [57], based on the proportion of correctly and incorrectly classified pixels, which is a count-based classification accuracy index [21].In OBIA, on the other hand, processing units are image objects, which are two-dimensional polygon features.Therefore, it is necessary to evaluate these objects with reference objects to assess the performance of the applied segmentation method [22].
In this regard, a variety of indices and metrics have been proposed, which are either area-based or location-based [22].The accuracy of the segmentation using the selected scales was evaluated and presented in Section 3.3.2.

Segmentation Accuracy Evaluation
Although we used the ESP2 tool to determine the optimal SP, there is no standard method to select segmentation parameters or assess the accuracy of the resulting objects [22].The segmentation parameters are usually modified based on the desired target features, and they are challenging to implement and cannot appropriately address errors like under-and over-segmentation [22].Therefore, we applied two indices that enabled us to evaluate the resulting objects using the multiresolution segmentation technique, along with the ESP2 tool and the corresponding interval values.These indices were the Object Pureness Index (OPI) on the one hand, and the Object Matching Index (OMI) on the other hand.OPI is a measure used to assess the integrity of the object in terms of spectral characteristics, and it is based on the standard deviation (SD) of multispectral bands because there is a robust positive correlation between the applied bands.The reason that SD is selected instead of each band's mean value is that mean values vary among these bands, while SD values are very close to each other, especially for integrated objects like vegetated areas and water bodies.The other considered measure, OMI, evaluates the spatial match between the reference object and the image objects.The mathematical explanation of these measures is stated in Equation ( 3): where SDB, SDG, and SDR stand for the SD of the Blue, Green, and Red bands, respectively, and Max SD stands for the maximum SD value among these three bands.The OPI values close to 1 indicate that spectral variance is very low and the object is pure.In comparison, OPI values close to zero indicate that there is significant variance among the resulting objects, which usually happens when large-scale values are applied in the segmentation process.OPI alone is not adequate for selecting segmentation parameters, because it only evaluates objects in terms of spectral features, and does not address the spatial matching of objects.Therefore, the OMI (see Equation ( 4)) was also applied to evaluate our segmentation: where R is a reference object and S is a segmented object.An OMI value equal to 1 shows a perfect match between R and S, while values less than 1 indicate over-segmentation and values greater than 1 indicate under-segmentation.The Object Fitness Index (OFI) is calculated (as per Equation ( 5)) to

Segmentation Accuracy Evaluation
Although we used the ESP2 tool to determine the optimal SP, there is no standard method to select segmentation parameters or assess the accuracy of the resulting objects [22].The segmentation parameters are usually modified based on the desired target features, and they are challenging to implement and cannot appropriately address errors like under-and over-segmentation [22].Therefore, we applied two indices that enabled us to evaluate the resulting objects using the multiresolution segmentation technique, along with the ESP2 tool and the corresponding interval values.These indices were the Object Pureness Index (OPI) on the one hand, and the Object Matching Index (OMI) on the other hand.OPI is a measure used to assess the integrity of the object in terms of spectral characteristics, and it is based on the standard deviation (SD) of multispectral bands because there is a robust positive correlation between the applied bands.The reason that SD is selected instead of each band's mean value is that mean values vary among these bands, while SD values are very close to each other, especially for integrated objects like vegetated areas and water bodies.The other considered measure, OMI, evaluates the spatial match between the reference object and the image objects.The mathematical explanation of these measures is stated in Equation ( 3): where SD B , SD G , and SD R stand for the SD of the Blue, Green, and Red bands, respectively, and Max SD stands for the maximum SD value among these three bands.The OPI values close to 1 indicate that spectral variance is very low and the object is pure.In comparison, OPI values close to zero indicate that there is significant variance among the resulting objects, which usually happens when large-scale values are applied in the segmentation process.OPI alone is not adequate for selecting segmentation parameters, because it only evaluates objects in terms of spectral features, and does not address the spatial matching of objects.Therefore, the OMI (see Equation ( 4)) was also applied to evaluate our segmentation: where R is a reference object and S is a segmented object.An OMI value equal to 1 shows a perfect match between R and S, while values less than 1 indicate over-segmentation and values greater than 1 indicate under-segmentation.The Object Fitness Index (OFI) is calculated (as per Equation ( 5)) to obtain a single index that can present the optimal object generation scale based on OPI and OMI, among other scales: OFI values close to 1 point to the optimal segmentation scale.Resulting values less and greater than 1 indicate over-and under-segmentation, respectively.

Multilayer Perceptron Neural Network (MLP-NN)
ML algorithms have been widely used in various scientific fields, especially in the earth sciences, to overcome complex problems [58,59].ML is considered a subdivision of artificial intelligence, which imitates the human brain's performance in problem-solving and decision making [60].In doing so, it uses a variety of algorithms, like artificial neural network (ANN), in the learning process [59].MLP-NN is an ANN method that has been widely used in geohazard modeling [1].The performance of this method is affected by variables like the model's structure, the type of applied activation functions, and which weight updating method is used [61].In general, MLP-NN consists of an input layer, one or more hidden layers, and an output layer.In geohazard modeling, like landslide susceptibility modeling, the input layer includes neurons that are the same as the landslide affecting factors, and the number of hidden layers depends on the training data [62] and the complexity of the problem [1].In this study, the backpropagation algorithm (BPA), which is the primary training method employed in neural networks, was used for updating weights; for two hidden layers, 24 neurons for each layer were allocated.To run the method, initial weights, which are randomly chosen by the BPA, are allocated to each neuron, and the method is then continually optimized based on the error rate between the output and expected values until the error rate is stabilized (see Equation ( 6)): where W denotes the vector of weights, X is the input vector of features of objects, and b is the bias.Additionally, the sigmoid activation function, which was used in the present study, can be explained by Equation (7): where f (z) is the output of the activation function, which ranges from 0 to 1.

Logistic Regression (LR)
Logistic regression is one of the multivariate analysis methods that allow us to create a multivariate regression connection among a set of independent variables and one dependent variable [63].LR is a powerful method for predicting the presence of an event by fitting the best linear model based on independent variable values, and it is a commonly used statistical method applied in landslide susceptibly analysis [64].One of the significant advantages of LR is that, in this method, the independent variables can be both discrete and continuous, or a mix of these variable types can be used [65].In this case, the dependent variable was introduced as a binary value of 0 and 1, whereby 0 showed the absence of a landslide event and 1 indicated the presence of a landslide.The mathematical definition of LR is defined by Equation (8) [63]: where p, in this case, is the probability of a landslide occurring, which varies between 0 and 1.Furthermore, z is the linear combination, which fits a linear equation to independent variables (landside conditioning factors), as shown by Equation ( 9) [63]: where b 0 , b i , and x i are intercepts of the method, coefficient, and independent variables, respectively.

Random Forest (RF)
RF is a powerful supervised ML method, which was proposed by [66] and has been widely used in RS and GIS applications, such as image classification [67] and landslide susceptibility mapping [1].This method is based on decision trees and operates by constructing a multitude of decision trees during the training process, which makes it less sensitive to over-fitting issues [1].In the RF method, each decision tree generates outputs, and then outputs weights, which are derived from the votes, are dedicated.The advantages of RF are that it is easy to apply because it requires fewer parameters and it yields a higher accuracy compared to other ML methods due to the bagging process [68].Additionally, it can deal with high-dimensional and complex data structures [69].Due to its simplicity and better performance, we selected this method as another algorithm for landslide detection in this study.

Stacking Machine Learning (ML) Methods
The main idea behind ensemble methods in ML is to improve the performance of a final method, by combining various methods to build a powerful learner to predict or classify a set of data [70].The strength of using ensemble methods is that we can decrease the variance by combining several single methods, which, individually, do not yield an excellent performance [70].There are three main ensemble machine learning methods: bagging, boosting, and stacking.In this case, the stacking method was used.In stacking, there are two levels, namely, level 0 and level 1.In level 0, single methods make a set of predictions based on training data, and these outputs (predictions) are then used as inputs for a meta-learner, which is a single method, to make a new set of predictions [71].Therefore, in this case, ML methods such as LR, MLP-NN, and RF were trained as single methods in level 0, and LR was then used as our meta-learner in level 1 to make the final predictions.

Integration of MLP-NN and OBIA for Landslide Detection
In conventional landslide detection and susceptibility mapping using machine learning algorithms, analyses are usually performed on an n-dimensional raster dataset or, to be specific, calculations are at the pixel level, and the pixel values in each layer are processed.Applying ML algorithms to large or very high-resolution datasets at a pixel level requires robust computing systems and is time-consuming, in particular, when the spatial extent of the study area is significant.Meanwhile, in OBIA, analyses are carried out at an object level [53], and processes like classification are relatively fast compared to pixel-based methods because, instead of processing each pixel in a set that shares the same values, the mean value of the object that entails a set of similar pixels will be analysed.In addition to the mean values of objects in each layer (e.g., image bands, DEM, NDVI, etc...), other statistical and geometrical characteristics, such as the SD, mode, shape index, position, and texture, can be calculated and used in classification and feature extraction [53].Therefore, in this integrated approach, we first produced image objects through a multiresolution segmentation process by using different scale factors and then calculated the statistical and geometrical properties for each object.To train ML methods, objects that had more than a 75% overlapping area with training inventory polygons were selected as the training dataset.Finally, the methods were trained and applied on test data, which were converted to a comma-separated values (CSV) format in Python using sklearn's ML package.

Dempster-Shafer Theory (DST)
The concept of DST is made based on a frame of discernment and known as a belief function (Bel) that is derived from Bayesian probability theory (BPT).It obtained its name from the extensions and clarifications presented by Shafer and is considered a great approach to integrating spatial data [72].The DST is an effective method for modeling imprecision and uncertainty assessment.The DST is transformed from events to proposition and an event set to proposition set, which defines the concept of the basic probability assignments (bpa) function, Bel, and the plausibility function (Pl), and determines the one-to-one relationship of proposition and set.Therefore, the DST can translate the proposition uncertainty into a set uncertainty [73].In this theory, displaying information requires two essential functions, namely, Bel and the Pl, which derive the lower bound value for a known probability function and the upper bound value for an unknown probability function.The differentiation between Bel and Pl illustrates the uncertainty of the knowledge about the objective proposition.
The DST provides an extension of the probability framework for assessing the uncertainty of any imprecision event of the probability P(M l ) that the alternative method M l , l = 1, ..., n is correct.The lower bound indicates the degree of knowledge or belief that supports M l and represents Bel (M l ).In comparison, the upper bound indicates the probability of M l , and is called plausibility Pl (M l ) [74] (see the Equations ( 10) and ( 11)): where the summation is obtained over all sets B∈ 2 8 with B ⊆ A in the definition of Bel and the summation in that of Pl is taken over all B∈ 2 8 with B ∩A 0 in which the set of 8 is mutually exclusive and collectively exhaustive hypotheses, and the power set of 8 is denoted by 2 8 .The Bel is the summation of all masses directly assigned by a set of hypothesis A, while the plausibility sums all masses not assigned to the complement of the hypothesis A. An uncertainty interval of Bel(A), Pl(A) that Bel (A) ≤ Pl(A) can be defined; its length is a measure of the imprecision of knowledge about the uncertainty of set A [75].
From a general point of view, unlike a probabilistic theory that allocates a mass to the individual elementary events, the theory of evidence, or the bpa, makes m(A) on the set A of the P(z), power sets of the space Z event, i.e., on a set of results rather than a single elementary event.
In more detail, m(A) expresses the degree of belief that a specific element x belongs to the set A only, and not to any subset of A. The bpa that assigns a mass in the range of [0, 1] to each subset A satisfies the following requirement, specified by Equation ( 12): If n data sources are available, probability masses m i (B j ) must be defined for each data source i with 1 ≤ i ≤ n and for all sets, B j ∈ 2 8 .The DST allows the combination of these probability masses from resulting landslide detection maps and the training inventory data set to compute a combined probability mass for each set.The composition rule in the proposed DST is based on mathematical theory, which is the basis of the combination of mass functions m i obtained from n sources of information given in Equations ( 13) and ( 14): where K denotes the degree of conflict given in Equation ( 15): Fusion of Multi-Scale Results via DST Multi-scale segmentation resulted in different object sizes, which were used for training the ML methods.Therefore, based on multi-scale segmentation, different landslide detection maps were produced from each ML method.The areas that were classified as landslides using different scales and one ML method were grouped by the fusion level analysis (FLA) technique [12] and fused into a class of landslide area based on the DST.Several Bels were combined in the DST within the same frame, which made it possible to harmonize the landslide detected areas from different scales.The probability of uncertainty in the detected landslide areas can be derived from Bel and Pl.Therefore, the resulting areas detected as landslides in three scales (i.e., 198, 248, and 298) were combined by fusing the DST with the training inventory dataset.The accuracy of the landslide detected areas based on each scale was assessed using a confusion matrix and, accordingly, the Bels were estimated by a precision function [12].The DST combined the majority of landslide detected areas, which were closer to the inventory dataset, and then assigned them to the class of landslide area based on the DST.

Accuracy Assessment and Comparison
In this section, we outline the most common accuracy assessment methods, which were used to validate the performance of the applied ML methods and the improvements made by using the stacking and DST methods.The accuracy assessment was made by comparing the resulting landslide detection maps with the landslide inventory dataset.The accuracy assessment was conducted using a confusion matrix, precision, recall F1 measure, and the receiver operating characteristics (ROC), to determine the accuracy of the landslides detected by each method.The user accuracy was calculated based on dividing correctly mapped objects by the total number of classified objects.In this regard, the study area was divided into two classes called landslide areas and non-landside areas, and for each class, this measure was calculated.We used the confusion matrix based on a comparison of the inventory dataset and the resulting maps based on a pixel-based environment.The Kappa coefficient was derived from the confusion matrix [76], and the coefficient was calculated using Equation ( 16): where θ 1 denotes the ratio of correctly detected areas, whereas θ 2 denotes the proportion of agreement by randomness [12].
The ROC is a graphical plot used to evaluate the validity of a method that predicts the location of the occurrence of events by comparing the probabilistic map of the event with a reference map [77].This assessment method has been applied in many fields, in particular, in the Geosciences [78].The ROC measure is based on three metrics: true positive (TP), false positive (FP), and false-negative (FN) (see Figure 5 and Equations ( 17)-( 19)).In this case, the landslide events that were correctly detected were TPs, areas that were incorrectly classified as landslides were FPs, and FNs represented landslide areas that were not detected.Using these metrics, the three parameters of Precision, Recall, and F1 measure could be calculated to assess the results.Precision shows the proportion of landslide events that were detected, Recall indicates how many of the inventory landslide events were detected, and the F1 measure was used to balance the Precision and Recall.

Image Segmentation
The MRS parameters and results of ten applied scales were chosen by interval values based on the ESP2 tool scale value of 248.The NDVI and topographic derivatives, along with PlanetScope images, were used as the conditioning factors in our image segmentation procedure, to improve the segmentation results.Next, to evaluate the accuracy of our segmentation results, 30% of the inventory dataset of landslide events were randomly selected for use in the OPI, OFI, and OMI indices.The applied parameters and layer weights, and the segmentation evaluation results are presented in Table 2.For the accuracy assessment of our segmentation results, we selected image objects whose centroids were within the reference objects.According to Table 2, based on the OFI index results of 30% of the applied samples, the segmentation with a scale of 248 provides the best convergence, while

Image Segmentation
The MRS parameters and results of ten applied scales were chosen by interval values based on the ESP2 tool scale value of 248.The NDVI and topographic derivatives, along with PlanetScope images, were used as the conditioning factors in our image segmentation procedure, to improve the segmentation results.Next, to evaluate the accuracy of our segmentation results, 30% of the inventory dataset of landslide events were randomly selected for use in the OPI, OFI, and OMI indices.The applied parameters and layer weights, and the segmentation evaluation results are presented in Table 2.

Table 2.
Parameters used for the multiresolution segmentation (MRS) and segmentation evaluation results.For the accuracy assessment of our segmentation results, we selected image objects whose centroids were within the reference objects.According to Table 2, based on the OFI index results of 30% of the applied samples, the segmentation with a scale of 248 provides the best convergence, while scales lower and higher than scale 248 are associated with over-segmentation and under-segmentation errors, respectively.The variation of the resulting values from OPI, OMI, and OFI are shown in Figure 6.The segmentation results in Figure 7 show the over-and under-segmentation errors.In Figure 7a, due to a small SP, the landslide event is over-segmented, while in Figure 7c, the scale is relatively large, resulting in under-segmentation, which merged a non-landslide area with a landslide area.Figure 7b shows the best segmentation result compared to other scales.The main reason that three scale results are used for landslide detection is to evaluate the impact of different scales on the landslide detection accuracy.Since, in OBIA, processing units are image objects, other features (e.g., geometric, textural, and spectral) of objects can be calculated and used in the classification or object extraction.Therefore, in this case, we calculated characteristics such as the shape index, mean brightness, SD of NDVI, length, compactness, density, and contrast grey level of the co-occurrence matrix (GLCM), to be used in MLP-NN algorithms.In order to train MLP-NN for each segmentation result, the objects that overlapped with training data (polygons) were chosen as training objects.Subsequently, the trained algorithm was applied to test objects.
Remote Sens. 2019, 11, x FOR PEER REVIEW 16 of 28 scales lower and higher than scale 248 are associated with over-segmentation and undersegmentation errors, respectively.The variation of the resulting values from OPI, OMI, and OFI are shown in Figure 6.The segmentation results in Figure 7 show the over-and under-segmentation errors.In Figure 7a, due to a small SP, the landslide event is over-segmented, while in Figure 7c, the scale is relatively large, resulting in under-segmentation, which merged a non-landslide area with a landslide area.Figure 7b shows the best segmentation result compared to other scales.The main reason that three scale results are used for landslide detection is to evaluate the impact of different scales on the landslide detection accuracy.Since, in OBIA, processing units are image objects, other features (e.g., geometric, textural, and spectral) of objects can be calculated and used in the classification or object extraction.Therefore, in this case, we calculated characteristics such as the shape index, mean brightness, SD of NDVI, length, compactness, density, and contrast grey level of the co-occurrence matrix (GLCM), to be used in MLP-NN algorithms.In order to train MLP-NN for each segmentation result, the objects that overlapped with training data (polygons) were chosen as training objects.Subsequently, the trained algorithm was applied to test objects.scales lower and higher than scale 248 are associated with over-segmentation and undersegmentation errors, respectively.The variation of the resulting values from OPI, OMI, and OFI are shown in Figure 6.The segmentation results in Figure 7 show the over-and under-segmentation errors.In Figure 7a, due to a small SP, the landslide event is over-segmented, while in Figure 7c, the scale is relatively large, resulting in under-segmentation, which merged a non-landslide area with a landslide area.Figure 7b shows the best segmentation result compared to other scales.The main reason that three scale results are used for landslide detection is to evaluate the impact of different scales on the landslide detection accuracy.Since, in OBIA, processing units are image objects, other features (e.g., geometric, textural, and spectral) of objects can be calculated and used in the classification or object extraction.Therefore, in this case, we calculated characteristics such as the shape index, mean brightness, SD of NDVI, length, compactness, density, and contrast grey level of the co-occurrence matrix (GLCM), to be used in MLP-NN algorithms.In order to train MLP-NN for each segmentation result, the objects that overlapped with training data (polygons) were chosen as training objects.Subsequently, the trained algorithm was applied to test objects.

Landslide Detection using ML and Stacking Methods
The training image objects of each segmentation scale, along with their characteristics, were stored in a CSV file to train ML methods in the Python environment.Results for all scales vary between 0 and 1, and objects that had values greater than 0.5 were selected as landslide events for all ML methods.The results in Figure 8 show that the size of objects has a significant impact on the outputs.For example, the results of ML and stacking methods of image objects with a scale of 298 indicate larger areas as landslides than the other scales' results.However, based on the OPI and OMI indices, differences in the results were expected, because the OPI and OMI values of scales 198 and 248 are close to each other and are associated with an over-segmentation error.Meanwhile, for the scale 298, the values of those indices-OMI, in particular-show an under-segmentation error, which resulted in larger image objects.Additionally, the methods' parameters were the same for each segmentation scale.According to the results, which are represented in Table 3, the RF method achieved the highest user accuracy of all ML methods, with an accuracy of up to 90.07% when detecting non-landslides using the optimal SP.This resulting accuracy was followed by the MLP-NN and LR methods, with almost an 89% and 88.2% accuracy, respectively, again using the optimal SP.For this SP, the stacking method slightly improved the user accuracy to 90.44%.However, combining multi-scale results using

Results of Fusion and Optimization using DST
To optimize the landslide detection using the ML method and obtain the best result, we used DST to combine the results of each method's landslide detection output at different scales.Therefore, the predicted landslides acquired from each method were calculated using a confusion matrix, and a CSV file was employed to fuse the most probable landslide events in QGIS 3.8, which resulted in a single and accurate landslide inventory map for each method (Figure 8).
According to the results, which are represented in Table 3, the RF method achieved the highest user accuracy of all ML methods, with an accuracy of up to 90.07% when detecting non-landslides using the optimal SP.This resulting accuracy was followed by the MLP-NN and LR methods, with almost an 89% and 88.2% accuracy, respectively, again using the optimal SP.For this SP, the stacking method slightly improved the user accuracy to 90.44%.However, combining multi-scale results using the DST increased the user accuracy for all applied methods by up to 9%.In the ROC analysis, the area under the curve (AUC) is a metric that evaluates each method's performance in distinguishing between classes.In our case, these classes are landslide areas and non-landslide areas [79].AUC values close to 1 indicate the reliability of a method's results; however, values close to 0.5 show a method's poor performance.Table 4 presents the quantitative assessment accuracy of each method based on the testing inventory dataset.The ROC validation results showed that the DST-stacking results achieved the highest AUC value of 0.965, whereas the LR method, based on the scale of 180 with an AUC value of 0.88, presented the lowest accuracy in landslide detection (see Figure 9 and Table 5).Furthermore, for each result, the precision, recall, and F1 measures were also calculated.To visually present the improvement of the results using the optimal scale, stacking, and DST, we enlarged a specific area of a landslide event (see Figure 10).This figure illustrates that the ML and  To visually present the improvement of the results using the optimal scale, stacking, and DST, we enlarged a specific area event (see Figure 10).This figure illustrates that the ML and stacking results based on objects at a scale of 298 are much more prone to FPs.Although the multi-scale approach resulted in different TP, FP, and FN areas, the DST was helpful for merging the TP areas of different scales and considerably improved the landslide detection accuracy.For instance, the result of stacking at a scale of 298 resulted in considerably more FPs, but this was not the case for the other scale levels, namely 248 and 198.However, as the DST combined the majority of TP areas, the FPs of the scale 298 were removed from the DST results.
The present study aimed to evaluate the performance of commonly used ML methods in landslide detection in an object-based environment.As shown in Figure 11, our validation results proved that all methods obtained the highest accuracy when using objects with a scale of 248, compared to lower and higher scales.However, applying the DST to combine the results of multiple scales for each method improved the overall accuracy of each method.Regarding the applied ML methods in this study, the stacking method yielded the highest accuracy at any scale compared to using single ML methods.The RF method performed best among the single ML methods, whereas the LR method achieved the lowest accuracies.Therefore, using the optimal SP and ML method is crucial for landslide detection in an object-based environment.We used the seismic-induced landslides as a case instant to evaluate the proposed methodology.However, this method can be applied for other types of landslides, and the outcome may depend on the resolution of the applied satellite imagery and the optimal scale parameter of segmentation.The limitation of this work is that the methodology is outperformed for a high-vegetated area, which was helpful to differentiate landslide areas from non-landslide ones.Therefore, it cannot reach the same accuracies for areas with less vegetation cover.

Conclusions
We have developed a methodology that incorporates OBIA with three machine learning methods, namely, logistic regression (LR), the multilayer perceptron neural network (MLP-NN), and random forest (RF), for landslide detection.Our multi-scale methodology identifies the optimal scale parameters (SP) and uses them for multi-scale segmentation and further analysis.The presented landslide mapping study showed that the integrated method improves the performance and accuracy.Most notably, the stacking method for landslide detection outperformed every single ML method.Furthermore, the validation results show that using DST could optimize and improve the outcomes of all applied methods.However, it should be noted that the results of an object-based ML method strongly depend on the segmentation quality.Therefore, optimal segmentation parameters result in a higher segmentation accuracy and, consequently, better results.Although there are no standard methods for selecting segmentation parameters and for accuracy assessment, we used the two measures of OPI and OMI to identify ideal segmentation parameters in terms of spectral and spatial quality.Therefore, as a result, both measures were combined to create the OFI, which allowed us to identify the best segmentation parameters, as well as to identify over-segmentation and undersegmentation errors.Therefore, we believe that a challenge for object-based ML methods is improving the segmentation accuracy, which requires new reliable automatic methods dealing with intra-class heterogeneity and inter-class homogeneity.This study shows that using high-resolution satellite imagery data does not guarantee a good accuracy.Several measures and parameters should be identified based on the target object detection or classification.In this regard, we used an ensemble method of stacking and DST to enhance the landslide detection results based on multi-scale segmentation.Different accuracy assessment results proved that the performance of landslide detection can be increased using these two methods.Moreover, all resulting maps yielded the highest accuracies using the optimal SP.Therefore, finding the optimal SP for the applied satellite image and ML method for the classification is crucial for accurate landslide detection.Based on the results of the present study, our future work will focus more on improving both segmentation and classification using relative mathematical and probability concepts, such as the central limit theorem (CLT) and fuzzy set theory (FST).

Conclusions
We have developed a methodology that incorporates OBIA with three machine learning methods, namely, logistic regression (LR), the multilayer perceptron neural network (MLP-NN), and random forest (RF), for landslide detection.Our multi-scale methodology identifies the optimal scale parameters (SP) and uses them for multi-scale segmentation and further analysis.The presented landslide mapping study showed that the integrated method improves the performance and accuracy.Most notably, the stacking method for landslide detection outperformed every single ML method.Furthermore, the validation results show that using DST could optimize and improve the outcomes of all applied methods.However, it should be noted that the results of an object-based ML method strongly depend on the segmentation quality.Therefore, optimal segmentation parameters result in a higher segmentation accuracy and, consequently, better results.Although there are no standard methods for selecting segmentation parameters and for accuracy assessment, we used the two measures of OPI and OMI to identify ideal segmentation parameters in terms of spectral and spatial quality.Therefore, as a result, both measures were combined to create the OFI, which allowed us to identify the best segmentation parameters, as well as to identify over-segmentation and under-segmentation errors.Therefore, we believe that a challenge for object-based ML methods is improving the segmentation accuracy, which requires new reliable automatic methods dealing with intra-class heterogeneity and inter-class homogeneity.This study shows that using high-resolution satellite imagery data does not guarantee a good accuracy.Several measures and parameters should be identified based on the target object detection or classification.In this regard, we used an ensemble method of stacking and DST to enhance the landslide detection results based on multi-scale segmentation.Different accuracy assessment results proved that the performance of landslide detection can be increased using these two methods.Moreover, all resulting maps yielded the highest accuracies using the optimal SP.Therefore, finding the optimal SP for the applied satellite image and ML method for the classification is crucial for accurate landslide detection.Based on the results of the present study, our future work will focus more on improving both segmentation and classification using relative mathematical and probability concepts, such as the central limit theorem (CLT) and fuzzy set theory (FST).

Figure 1 .
Figure 1.The study area and a false color composite image of the orthorectified analytical scene of PlanetScope satellites and photographs of landslide events in Rasuwa district.

Figure 1 .
Figure 1.The study area and a false color composite image of the orthorectified analytical scene of PlanetScope satellites and photographs of landslide events in Rasuwa district.

Figure 2 .
Figure 2. The flowchart of the applied methodology of the current study.

Figure 2 .
Figure 2. The flowchart of the applied methodology of the current study.

Figure 4 .
Figure 4. Pseudo-color image of a landslide event (a) and segmented landslides (b).

Figure 4 .
Figure 4. Pseudo-color image of a landslide event (a) and segmented landslides (b).

Figure 5 .
Figure 5. Illustration of accuracy assessment measures.

Figure 5 .
Figure 5. Illustration of accuracy assessment measures.

Figure 6 .
Figure 6.Changes of the object pureness index (OPI), object matching index (OMI), and object fitness index (OFI) at different scales.Scale 248, where all lines cross, turned out to be the best segmentation scale.

Figure 7 .Figure 6 .
Figure 7. Image segmentation with different results; the black colored area represents a landslide event, and blue polygons are image objects.Images (a-c) are segmentation results with scales of 198, 248, and 298, respectively.

Figure 6 .
Figure 6.Changes of the object pureness index (OPI), object matching index (OMI), and object fitness index (OFI) at different scales.Scale 248, where all lines cross, turned out to be the best segmentation scale.

Figure 7 .Figure 7 .
Figure 7. Image segmentation with different results; the black colored area represents a landslide event, and blue polygons are image objects.Images (a-c) are segmentation results with scales of 198, 248, and 298, respectively.

Figure 9 .
Figure 9. Receiver operating characteristics (ROC) analysis and area under the curve (AUC) values for ML, Stacking, and DST results.

Figure 9 .
Figure 9. Receiver operating characteristics (ROC) analysis and area under the curve (AUC) values for ML, Stacking, and DST results.

Figure 10 .
Figure 10.Enlarged maps showing the capabilities of DST to reduce both false positive and false negative areas in landslide detection through multi-scale ML and stacking methods.

Figure 10 .
Figure 10.Enlarged maps showing the capabilities of DST to reduce both false positive and false negative areas in landslide detection through multi-scale ML and stacking methods.

Figure 11 .
Figure 11.Changes of AUC, precision, recall, and F1 measure values for each method and scale.

Figure 11 .
Figure 11.Changes of AUC, precision, recall, and F1 measure values for each method and scale.

Table 2 .
Parameters used for the multiresolution segmentation (MRS) and segmentation evaluation results.

Table 3 .
Confusion matrices based on different methods and the landslide inventory data set.

Table 4 .
Quantitative accuracy assessment for ML and Stacking methods, as well as DST results.

Table 5 .
The accuracy assessment results based on the area under the curve (AUC).

Table 5 .
The accuracy assessment results based on the area under the curve (AUC).