Improving Spatial Agreement in Machine Learning-Based Landslide Susceptibility Mapping

: Despite yielding considerable degrees of accuracy in landslide predictions, the outcomes of di ﬀ erent landslide susceptibility models are prone to spatial disagreement; and therefore, uncertainties. Uncertainties in the results of various landslide susceptibility models create challenges in selecting the most suitable method to manage this complex natural phenomenon. This study aimed to propose an approach to reduce uncertainties in landslide prediction, diagnosing spatial agreement in machine learning-based landslide susceptibility maps. It ﬁrst developed landslide susceptibility maps of Cox’s Bazar district of Bangladesh, applying four machine learning algorithms: K-Nearest Neighbor (KNN), Multi-Layer Perceptron (MLP), Random Forest (RF), and Support Vector Machine (SVM), featuring hyperparameter optimization of 12 landslide conditioning factors. The results of all the four models yielded very high prediction accuracy, with the area under the curve (AUC) values range between 0.93 to 0.96. The assessment of spatial agreement of landslide predictions showed that the pixel-wise correlation coe ﬃ cients of landslide probability between various models range from 0.69 to 0.85, indicating the uncertainty in predicted landslides by various models, despite their considerable prediction accuracy. The uncertainty was addressed by establishing a Logistic Regression (LR) model, incorporating the binary landslide inventory data as the dependent variable and the results of the four landslide susceptibility models as independent variables. The outcomes indicated that the RF model had the highest inﬂuence in predicting the observed landslide locations, followed by the MLP, SVM, and KNN models. Finally, a combined landslide susceptibility map was developed by integrating the results of the four machine learning-based landslide predictions. The combined map resulted in better spatial agreement (correlation coe ﬃ cients range between 0.88 and 0.92) and greater prediction accuracy (0.97) compared to the individual models. The modelling approach followed in this study would be useful in minimizing uncertainties of various methods and improving landslide predictions.


Introduction
Due to the destructive potential of landslides, this natural phenomenon poses a serious threat to human life, property, and the environment in the areas in which they occur [1,2]. Access to continuous and accurate information on landslide occurrence is essential for managing the risk to this unpredictable hazard [2,3]. Mapping landslide susceptibility is a widely conceived approach to estimating the likelihood of occurrence of this complex natural phenomenon [1,[3][4][5]. The development of remote sensing technologies in the last few decades enables researchers to map landslide susceptibility more efficiently, due to the availability of high spatial and temporal resolution data [3,4,6]. For instance, high-resolution remote sensing (satellite imagery) data are used to develop various thematic layers explaining the topography, land cover, geology, and hydrology, which are essential parameters for predicting landslides [4,7]. Remote sensing techniques are also useful in developing accurate landslide inventory maps [3,6].
Along with the quality of available data, the choice of appropriate methodology is essential for developing reliable susceptibility maps [2]. During the last several decades, many landslide susceptibility models have been developed based on the geographic information system (GIS) and remote sensing technology [8]. Examples of such models include the weights-of-evidence [9,10], multivariate regression analysis [10,11], analytical hierarchy process [12], and the evidential belief function [13]. Applications of various machine learning algorithms in landslide susceptibility mapping (LSM) have evolved in recent decades. As a widely applicable method in data mining, the K-Nearest Neighbor (KNN) algorithm made early appearances in landslide prediction [14,15]. The Logistic Regression (LR) [2,12] and Support Vector Machine (SVM) [8,16] models also gained much popularity as adaptive systems for LSM [15]. Artificial neural networks in the form of a Multi-Layer Perceptron (MLP) were also used for this task [17]. More recently, evidence from various studies indicates that ensembles such as the Random Forest (RF) model can improve machine learning-based landslide prediction [1,18]. However, the outcomes of landslide susceptibility mapping could be subject to considerable uncertainties due to errors and variability in model choice, data used, system understanding, weighting factors, and human judgment [19,20].
Since the access to accurate landslide prediction maps is the prerequisite to decision-makers, the results must be carefully analyzed and critically reviewed before disseminating to support the end-users [9]. While developing landslide susceptibility maps, challenges may arise in (i) measuring the accuracy of a susceptibility assessment [21], and (ii) selecting an "optimal" combination of methods for susceptibility assessments [22]. Most of the validation processes of LSM consist of two steps: simulating landslide susceptibility and comparing the predicted results with the observed landslide locations [1,9]. Validation techniques must possess qualities such as reliability, robustness, degree of fitting, and prediction skill [21]. However, the performance evaluation of most of the LSMs was carried out based on the testing datasets [9,23]. Thus, a similar performance of multiple models at the testing landslide locations does not ascertain the same degree of agreement in terms of spatial predicted patterns [9].
Whilst many recent studies applied various combinations of machine learning algorithms to map landslide susceptibility [23][24][25][26], pixel-wise agreement in landslide prediction between various methods is inadequately understood. The resultant spatial heterogeneity in landslide prediction with different techniques creates uncertainties in LSM [9,19,27]. To address this challenge, this study aimed to propose a method to reduce uncertainties in landslide prediction. Therefore, it evaluated the extent of agreement of landslide prediction maps generated by applying four different machine learning algorithms. A combined landslide prediction map was developed by integrating the results of these four models. The study was carried out in Cox's Bazar district of Bangladesh ( Figure 1).

Materials and Methods
The study was conducted in three stages. First, landslide susceptibility maps (LSMs) of the study area were developed using four machine learning algorithms: K-Nearest Neighbor (KNN), Multi-Layer Perceptron (MLP), Random Forest (RF), and Support Vector Machine (SVM). Second, the extent of spatial agreement of predicted patterns in LSMs was assessed by estimating the pixelwise correlation of landslide probabilities obtained using various methods. Finally, an LSM was developed combining results from the four machine learning models ( Figure 2). This study also estimated population exposure to landslide by overlaying gridded population layers of the year 2020, collected from WorldPop [29] and UNHCR [30], on LSMs.

Study Area
This study addressed the Cox's Bazar district, which is located in the south eastern region of Bangladesh ( Figure 1a). The study area lies between latitude 20°53'46.7" N and 21°14'29.8" N, and longitude 92°02'08.2" E and 92°18'27.0" E. It is comprised of seven (out of eight) sub-districts (locally termed as Upazilas) of Cox's Bazar district (Figure 1b). The low-lying areas such as Kutubdia subdistrict, part of Maheshkhali sub-district, and Saint Martin's island ( Figure 1b) were not considered in this study. The study area is diverse and unique, both in terms of ecosystem services and biodiversity, and currently, an epitome of global geopolitics as it is accommodating over one million Rohingya refugees. It is characterized by relatively high elevation land (mean elevation is 18 m), compared to the rest of the country. At present, approximately a total of 3.4 million people inhabit 1869 km 2 of land (estimated using data from WorldPop [29] and UNHCR [30]).
The area receives the annual mean precipitation of 4288 mm [31]. The heavy rainfall triggers both flash floods and landslides in this area [11,12]. The majority of the historical landslides in

Materials and Methods
The study was conducted in three stages. First, landslide susceptibility maps (LSMs) of the study area were developed using four machine learning algorithms: K-Nearest Neighbor (KNN), Multi-Layer Perceptron (MLP), Random Forest (RF), and Support Vector Machine (SVM). Second, the extent of spatial agreement of predicted patterns in LSMs was assessed by estimating the pixel-wise correlation of landslide probabilities obtained using various methods. Finally, an LSM was developed combining results from the four machine learning models ( Figure 2). This study also estimated population exposure to landslide by overlaying gridded population layers of the year 2020, collected from WorldPop [29] and UNHCR [30], on LSMs.

Study Area
This study addressed the Cox's Bazar district, which is located in the south eastern region of Bangladesh ( Figure 1a). The study area lies between latitude 20 • 53 46.7" N and 21 • 14 29.8" N, and longitude 92 • 02 08.2" E and 92 • 18 27.0" E. It is comprised of seven (out of eight) sub-districts (locally termed as Upazilas) of Cox's Bazar district (Figure 1b). The low-lying areas such as Kutubdia sub-district, part of Maheshkhali sub-district, and Saint Martin's island ( Figure 1b) were not considered in this study. The study area is diverse and unique, both in terms of ecosystem services and biodiversity, and currently, an epitome of global geopolitics as it is accommodating over one million Rohingya refugees. It is characterized by relatively high elevation land (mean elevation is 18 m), compared to the rest of the country. At present, approximately a total of 3.4 million people inhabit 1869 km 2 of land (estimated using data from WorldPop [29] and UNHCR [30]).
Remote Sens. 2020, 12, 3347 4 of 23 that are highly susceptible to landslides. The Kutupalong camp is considered as the most densely populated refugee settlement area in the world, where around 75,000 people live per km 2 [31,33]. Any catastrophic landslide will cause significant damage to human lives and assets. Hence, an accurate assessment of landslide susceptibility is paramount for developing a plan for landslide risk management.

Landslide Inventory Mapping
Landslide inventory mapping is one of the essential steps for landslide prediction and susceptibility mapping. This study utilized the landslide inventory map developed by Ahmed, Rahman, Sammonds, Islam, and Uddin [11] (Figure 3). They developed the latest landslide inventory map of the Cox's Bazar district by retrieving the historical landslide information from newspapers and various organizations and later verified those with global positioning system (GPS) and reconnaissance surveys. This study also used information about landslide movement type, its distribution and style, rate of flow, damage, the volume of displacement, material, and the reason for movement by preparing a landslide investigation form collected from Ahmed, Rahman, Sammonds, Islam, and Uddin [11]. A total of 1262 sample locations were used, where the number of landslide and non-landslide locations was 670 and 592, respectively. To develop the models, it is necessary to obtain non-landslide cells (where landslides did not occur). From the existing literature, Huang et al. [34] identified three methods for obtaining non-landslide grid cells: (i) the seed cell procedure; (ii) randomly selecting non-landslide locations from the landslide free areas; and (iii) non-landslide locations selected in areas with a slope lower than 2°. This study followed the second approach to select random locations within the study area, where landslides did not occur. These cells provided the models with the necessary data during the training stage [35,36]. The sample locations were split The area receives the annual mean precipitation of 4288 mm [31]. The heavy rainfall triggers both flash floods and landslides in this area [11,12]. The majority of the historical landslides in Bangladesh occurred in this region [11]. For instance, a major landslide triggered by heavy rain in June 2017 killed at least 156 people in the south eastern hilly region of Bangladesh where the study area is located [32]. Unplanned urbanization, rapid growth of population, hill cutting, and deforestation are associated with the recent increase in landslide hazards [11,31]. Notably, Rohingya refugee camps, especially the Kutupalong camp of Ukhia sub-district ( Figure 3) is located in areas that are highly susceptible to landslides. The Kutupalong camp is considered as the most densely populated refugee settlement area in the world, where around 75,000 people live per km 2 [31,33]. Any catastrophic landslide will cause significant damage to human lives and assets. Hence, an accurate assessment of landslide susceptibility is paramount for developing a plan for landslide risk management.

Landslide Inventory Mapping
Landslide inventory mapping is one of the essential steps for landslide prediction and susceptibility mapping. This study utilized the landslide inventory map developed by Ahmed, Rahman, Sammonds, Islam, and Uddin [11] (Figure 3). They developed the latest landslide inventory map of the Cox's Bazar district by retrieving the historical landslide information from newspapers and various organizations and later verified those with global positioning system (GPS) and reconnaissance surveys. This study also used information about landslide movement type, its distribution and style, rate of flow, damage, the volume of displacement, material, and the reason for movement by preparing a landslide investigation form collected from Ahmed, Rahman, Sammonds, Islam, and Uddin [11]. A total of 1262 sample locations were used, where the number of landslide and non-landslide locations was 670 and 592, respectively. To develop the models, it is necessary to obtain non-landslide cells (where landslides did not occur). From the existing literature, Huang et al. [34] identified three methods for obtaining non-landslide grid cells: (i) the seed cell procedure; (ii) randomly selecting non-landslide locations from the landslide free areas; and (iii) non-landslide locations selected in areas with a slope lower than 2 • . This study followed the second approach to select random locations within the study area, where landslides did not occur. These cells provided the models with the necessary data during the training stage [35,36]. The sample locations were split into two classes: (i) 60% locations (52% landslide and 48% non-landslide locations) were used to train the machine learning-based landslide prediction models, and (ii) 40% testing locations (54% landslide and 46% non-landslide locations) were employed to evaluate the performance of the machine learning models ( Figure 2). Remote Sens. 2020, 12, x FOR PEER REVIEW 5 of 23 into two classes: (i) 60% locations (52% landslide and 48% non-landslide locations) were used to train the machine learning-based landslide prediction models, and (ii) 40% testing locations (54% landslide and 46% non-landslide locations) were employed to evaluate the performance of the machine learning models ( Figure 2).

Landslide Conditioning Factor
The performance of LSMs depends on the choice of landslide conditioning factors. Numerous studies on LSM have been conducted based on machine learning techniques [1,16,18,23,26,37,38], with various combinations of landslide conditioning factors being used. However, the selection of factors should be (i) based on their degree of affinity with landslide locations, (ii) measurable, (iii) non-redundant, and (iv) based on the knowledge of geomorphological characteristics of the area under study [2]. Based on the knowledge obtained from the literature, as well as, expert knowledge on the study area, a total of 12 variables were selected in this current study (Table 1). Areas with an elevation of less than 5 m, as well as, waterbodies and sandy sea beach areas (waterbody and restricted in Figure 4) were excluded from the LSMs [39].

Landslide Conditioning Factor
The performance of LSMs depends on the choice of landslide conditioning factors. Numerous studies on LSM have been conducted based on machine learning techniques [1,16,18,23,26,37,38], with various combinations of landslide conditioning factors being used. However, the selection of factors should be (i) based on their degree of affinity with landslide locations, (ii) measurable, (iii) non-redundant, and (iv) based on the knowledge of geomorphological characteristics of the area under study [2]. Based on the knowledge obtained from the literature, as well as, expert knowledge on the study area, a total of 12 variables were selected in this current study (Table 1). Areas with an elevation of less than 5 m, as well as, waterbodies and sandy sea beach areas (waterbody and restricted in Figure 4) were excluded from the LSMs [39].   Topographical and hydrological parameters including aspect, elevation, slope, curvature, and Stream Power Index (SPI) are important factors that limit the density and spatial extent of landslides [2,37,38,40]. Raster maps of aspect, elevation, curvature, slope, and SPI were derived at 30-m spatial resolution from the Advanced Land Observing Satellite (ALOS) Digital Elevation Model (DEM) [28] (Figure 4a-e). Elevation influences landslides primarily by affecting different biophysical parameters and anthropogenic activities. Only a limited number of studies, conducted on a specific basin, found that landslides occur at certain elevations [41]. Elevation can determine the spatial variability of landslides because it is affected by geological tectonics [37]. It can also influence the occurrence of landslides by impacting other causative factors such as slope, curvature, and SPI [42]. Aspect, indicating the direction of slope [43], indirectly influences the distribution of landslide locations by affecting the general physiographic trend of the area and/or the main precipitation direction [37,40]. Slope angle is considered as one of the most influential factors for the occurrence of landslides, as it affects the concentration of moisture and the level of pore pressure, as well as, controls regional hydraulic continuity [2,40]. All of these processes influence slope instability [8]. Curvature is also considered as a landslide influencing factor that directly controls the velocity of water flow, delimiting erosion [8,40]. SPI also determines the erosion potential of the surface [43] and is considered as an essential predictor of landslides [37,38]. Areas with high SPI values indicate a higher erosion potential, while negative values suggest no predicted erosion [44,45]. In this study, a layer of SPI was derived using the following equations in GIS: where A s and β indicates the specific catchment area (m 2 /m) and slope gradient, respectively [43]. It is widely conceived that various geological factors significantly influence the occurrence of landslides, as these factors often lead to a difference in strength and permeability of rocks and soils [2]. This study considered the three geological factors of surface geology, soil type, and soil texture (Figure 4i-k). Digital geologic and geophysical data of Bangladesh were collected from the U.S. Geological Survey [46]. The surface geology map of the study area includes a total of 11 classes: water (H 2 O), Bhuban formation (Miocene, Tb), Dupi Tila formations undivided (QTdd), valley alluvium and colluvium (ava), Girujan clay (Pleistocene and Neogene, QTg), Tipam Sandstone (Neogene, Tt), Boka Bil formation (Neogene, Tbb), beach and dune sand (csd), marsh clay and peat (ppc), Dupi Tila formation (Pleistocene and Pliocene, QTdt), and Dihing formation (Pleistocene and Pliocene, QTdi) (Figure 4i). Primary-level parameters such as soil type and soil texture are essential predictors of landslides. These parameters determine the amount of moisture content indicating the degree of stability of the soil [25,37,47,48]. Soil type and soil texture data were collected from the Bangladesh Agricultural Research Council [49].
Other anthropogenic, environmental, and locational factors considered in this study include distance to stream, land cover, normalized difference vegetation index (NDVI), and distance to road (Figure 4f-h,l). The land cover and NDVI maps of the year 2020 were prepared using Landsat satellite images based on the Google Earth Engine Platform. The land cover map was developed applying a supervised classification technique with the Random Forest algorithm. In the case of southern Bangladesh, a recent study demonstrated that this method has a higher classification accuracy compared to other land cover classification techniques [50]. The land cover map contains five classes: bare land, built-up area, crop land, vegetation, and waterbody ( Figure 4g). Proximity to roads explains the locations of landslides, as the artificial and natural slopes adjacent to a road are sensitive to this hazard [51]. Road-cuts, excavation, and additional load can induce anthropogenic instability of the soil, promoting landslides [2,5]. A layer of distance to road network was developed using the Euclidian distance algorithm. Likewise, the location of areas with respect to natural drainage channels can also demonstrate the locations of landslides [11], as streams may change the stability of an area by eroding the slopes [5,51]. In this study, distance to stream networks was derived from the ALOS DEM. Again, by applying the Euclidian distance algorithm, a map of distance to stream was generated.

Multi-Collinearity Analysis of Landslide Conditioning Factors
The selected landslide causative factors could be subject to multi-collinearity; hence, it is necessary to estimate the correlation of independent variables before modelling landslide susceptibility [8].
To eliminate the factors susceptible to multi-collinearity, this study determined variance inflation factors (VIF) [53] of 12 selected landslide conditioning factors using R [54]. VIF is a well-known method to determine the multi-collinearity of landslide conditioning factors [8,55]. A VIF value of a variable exceeding 5 indicates potential serious multicollinearity [53,55]. In this study, the selected landslide conditioning factors yielded VIF values < 2.8, indicating the absence of potential multi-collinearity (Table 1).

Pre-Processing
Using the binary locations (landslide and non-landslide), values of the selected 12 conditioning factors were extracted in a geographic information system (GIS) environment. As evident in Table 1, eight were continuous variables, while the remaining four variables had discrete characteristics. In order to represent discrete (categorical) variables semantically, they must be considered as a composite feature (where the number of generated binary features and the number of categories are equal). These discrete variables were encoded using a one-hot encoding scheme [31], implying that multiple binary features were generated to represent a single discrete feature. The number of one-hot encoded features depends on the number of variable classes. For instance, there are 11 categories in the geology variable. If a landslide location was found in a geology class, a value 1 was encoded to the class, while the other 10 classes were encoded as 0. This data pre-processing method was applied for all other discrete variables. For each variable, mean and standard deviation were calculated. The mean of each variable was then subtracted from the corresponding value in a variable and divided by the standard deviation. This reduces training time since optimization routines have a smaller parameter space to traverse.

Hyperparameter Optimization
Hyperparameter optimization can improve the accuracy of machine learning algorithm-based models. The process aims to select the optimal hyperparameter values according to the evaluation index [56]. Three approaches are frequently used for optimizing hyperparameters: grid search, random search, and Bayesian optimization [57]. This current study applied the grid search technique along with 5-fold cross-validation on the training set to perform hyperparameter optimization. Hyperparameters that provide the best performance were chosen for final training and testing samples of respective machine learning models. For instance, the optimal number of neighbors of five in the KNN (Table 2) indicates that values of landslide conditioning factors corresponding to a landslide location were compared against the values of landslide predictors of five other sample locations, to obtain the most reliable prediction. Table 2 summarizes the hyperparameters, their search range and optimal values of the four models. The KNN algorithm classifies an instance (landslide or non-landslide) that is mostly represented within its (k) neighbors. The parameter k is often a small positive integer [58]. The proximity between the samples is measured using a distance metric. The distance metric indicates how similar or different are the profiles of conditioning factors for any given two samples. Data points with similar conditioning factors will have a small feature distance between them. Though the model is simple in terms of hyperparameters, it becomes computationally expensive as the number of samples becomes large. The landslide susceptibility associated with a certain set of values of conditioning factors is determined by calculating its distance to each training data point (in high-dimensional feature space). The k nearest data points are used to determine the landslide susceptibility. The dominant susceptibility class within those k nearest neighbors (i.e., the class with the highest number of members in the k members) becomes the class membership of the new data point [14,59].
where f is the activation function, ω j k is the weight of kth neuron in layer j, a j−1 k is the activation of neuron k in layer j − 1 (the previous layer), j is the layer index, i is the neuron index, and n is the number of neurons in layer j.

(3) Random Forest (RF)
Random forest (RF) is considered as a powerful ensemble-learning method that can be applied for classification, regression, and unsupervised learning [18]. This method has been widely applied in landslide susceptibility mapping [18,56,60]. Ensemble models generally train several weak learners and then take their aggregated outputs to obtain more reliable predictions. The RF algorithm builds weak learners in the form of decision trees. It estimates the mean of outputs of the individual weak learners, as shown in Equation (3). Each weak learner (b) corresponds to a function f b (x). The RF uses bootstrap aggregating where the weak learners train parallelly [31].
whereF(x) is the ensembled prediction from weak learners, B is the total number of weak learners, b is the weak learner index, and f b (x) is the function for bth weak learner.
(4) Support Vector Machine (SVM) Support Vector Machine (SVM) is also a widely used machine learning algorithm in landslide susceptibility mapping [8,16,23,26]. This supervised learning method separates the classes with a decision surface that maximizes the margin of class boundaries [8]. The training locations, closest to the optimal hyperplane, are called support vectors [23]. Suppose each sample location has M number of features, the objective of the SVM algorithm is to find a hyperplane in M dimensional feature space that separates the samples of different classes. A hyperplane is R (M−1) dimensional in R M . A hyperplane in R 2 is a line, a hyperplane in R 3 is a plane, and so on. This hyperplane functions as a decision boundary, which determines the label of a sample (i.e. landslide or non-landslide). The margin around the hyper lane indicates that value exceeding 1 denotes a positive sample (landslide), and a value equal to −1 denotes a negative sample (non-landslide). If X (X 1 , X 2 , . . . . . . . . . , X n ) is the vector of landslide affecting factor and Y j (Y 1 , Y 2 ) is the vector of landslide (1) or non-landslide (0) event, the optimal hyperplane can be found by solving Equation (4) [26].
where k is the offset from the origin of the hyperplane, n is the total number of factors that affects landslide, α i is the positive real constant, and k(X, X i ) is the Kernel function. To classify the binary events (landslide or non-landslide), the condition to solve Equation (4) was assumed as below: where w is the weight vector and ϕ(x i ) is the total number of factors that affects landslide.

Performance Evaluation Methods
The performance of landslide susceptibility models was evaluated using a well-known method called receiver operating characteristic (ROC) curve and subsequent area under the curve (AUC) [1,11,23,31,43]. The ROC curves were developed using the 40% sample testing data. The ROC curve indicates the performance of a binary classifier system, representing sensitivity as a function of the false positive rate (1-specificity).
The sensitivity of a model is the ratio of the number of true positives to the sum of the number of true positives and false negatives. The specificity is the ratio of the number of true negatives to the sum of the number of true negatives and false positives. The ROC curve can be developed by plotting sensitivity in the y-axis against the cumulative distribution function of the false positive rate in the x-axis. The estimated AUC value can be categorized as poor (0.5-0.6), average (0.6-0.7), good (0.7-0.8), very good (0.8-0.9), and excellent (0.9-1) [1,43,60]. Besides, various statistical indices such as overall accuracy, precision, recall, and F1-score were estimated by developing a confusion matrix [1,11,43].

Evaluation of Spatial Agreement and Optimizing Prediction Map
To evaluate the inter-model agreeability, a pixel-wise agreement between two machine learning algorithms was estimated. Therefore, Pearson's correlation coefficient was estimated for a total of six possible combinations of machine learning model-based landslide susceptibility maps. Here, Pearson's correlation coefficient indicates the covariance of landslide predictions, obtained by using two algorithms, divided by the product of their standard deviations. The correlation coefficient can range from +1 to −1, where values zero as indicating no agreement and ±0.29 as low degree, ±0.30-±0.49 as moderate degree, ±0.50 to <±1 as high degree, and ±1 as perfect agreement [60].
Following the evaluation of spatial agreement, an optimized landslide prediction map was developed combining susceptibility maps generated by applying the four machine learning algorithms.
The combined map was developed by following a methodology proposed by Rossi, Guzzetti, Reichenbach, Mondini, and Peruccacci [22], where they established a logistic regression (LR) model. The LR model included binary landslide and non-landslide locations as the dependent variable and the results of the four landslide susceptibility models as the independent variables. The obtained regression coefficients were incorporated in Equation (6) [43] in GIS to derive the probability (P) of landslides in the study area.
where z is the linear combination of independent variables which was estimated using the following equation: where θ 0 is the intercept of the model, θ i (i = 1, 2, . . . , n) indicates the regression coefficient of independent variables, and x i (i = 1, 2, . . . , n) represents the n number of independent variables. Validation of the resultant combined model was performed by developing the ROC curve by using the 40% testing data. Figure 5 shows landslide susceptibility maps of Cox's Bazar district developed by applying the four machine learning algorithms. The generated landslide probability maps were classified into five categories each by applying the Jenks natural breaks classification method in GIS: (i) very low (0-0.1), (ii) low (0.11-0.3), (iii) medium (0.31-0.5), (iv) high (0.51-0.85), and (v) very high (0.86-1). As evident in Figure 6, the proportion of landslide susceptible area varied from one model to another. Among all methods, the SVM resulted in the highest proportion of area (38.7%) susceptible to the landslide of 'high' and 'very high' severity, while the Random Forest (RF) algorithm yielded a relatively lower proportion (23.1%) of landslide susceptible area. Likewise, the ratio of the population exposed to 'high' and 'very high' landslide susceptible zones varied for different algorithms. For all the four methods, the percentage of landslide exposed population ranged between 34% to 48% ( Figure 6). proportion (23.1%) of landslide susceptible area. Likewise, the ratio of the population exposed to 'high' and 'very high' landslide susceptible zones varied for different algorithms. For all the four methods, the percentage of landslide exposed population ranged between 34% to 48% ( Figure 6).    Blue dots indicate the proportion of people exposed to 'high' and 'very high' susceptible zones.

Evaluation of Models' Performance
To evaluate the performance of various landslide susceptibility models, a performance matrix was derived using the test samples (40% of the total data) ( Table 3). The performance evaluation indices indicated a very high prediction accuracy of all the models. In terms of overall accuracy, the RF classifier resulted in the highest accuracy (96.63%), followed by the MLP (95.45%), SVM (94.06%), and KNN (90.69%). However, the overall accuracy is a universal metric, hence, it does not indicate which specific classes were being inaccurately classified. To obtain further insights into the agreement between the observed and modelled locations (landslide and non-landslide)-precision, F1-score, and recall were estimated ( Table 3). The RF classifier achieved the best accuracy with respect to all performance indicators. The MLP followed closely and consistently in terms of all indicators. In relation to the estimated AUC values, the RF classifier yielded the highest accuracy (0.962), followed by the MLP (0.960), SVM (0.935), and KNN (0.927) (Figure 7). The relatively greater values of

Evaluation of Models' Performance
To evaluate the performance of various landslide susceptibility models, a performance matrix was derived using the test samples (40% of the total data) ( Table 3). The performance evaluation indices indicated a very high prediction accuracy of all the models. In terms of overall accuracy, the RF classifier resulted in the highest accuracy (96.63%), followed by the MLP (95.45%), SVM (94.06%), and KNN (90.69%). However, the overall accuracy is a universal metric, hence, it does not indicate which specific classes were being inaccurately classified. To obtain further insights into the agreement between the observed and modelled locations (landslide and non-landslide)-precision, F1-score, and recall were estimated ( Table 3). The RF classifier achieved the best accuracy with respect to all performance indicators. The MLP followed closely and consistently in terms of all indicators. In relation to the estimated AUC values, the RF classifier yielded the highest accuracy (0.962), followed by the MLP (0.960), SVM (0.935), and KNN (0.927) (Figure 7). The relatively greater values of performance indicators of RF and MLP can be attributed to their ability to learn complex relationships between geospatial characteristics of an area and the occurrence of landslides [17,18,60]. Table 3. Performance evaluation indicators of the machine learning based landslide susceptibility models.

Model
Overall Accuracy

Spatial Agreement of Various Methods
This study developed a correlation matrix by comparing pixel-wise landslide probabilities between various methods (Figure 8) to evaluate the extent of agreement of one landslide susceptibility model over another. Although the values of AUC were very similar for various methods (Figure 7), a substantial difference in the agreement was observed in LSMs obtained using the different techniques. Overall, the correlation coefficient ranges from 0.69 to 0.85 (Figure 8). The combinations of SVM-RF resulted in the highest degree of the agreement, while the KNN-SVM yielded the lowest agreement.

Aggregated Landslide Susceptibility Mapping
Since spatial heterogeneity in landslide prediction exists between the different machine learningbased approaches, an aggregated susceptibility map combining the outputs of all algorithms would minimize the uncertainty of individual methods. In this study, a regression-based approach was adopted. A multivariate logistic regression (LR) was established incorporating the binary landslide inventory data as the dependent variable and the results of the four landslide susceptibility models as independent variables. The outcome of the LR model is summarized in Table 4. Among the four

Spatial Agreement of Various Methods
This study developed a correlation matrix by comparing pixel-wise landslide probabilities between various methods (Figure 8) to evaluate the extent of agreement of one landslide susceptibility model over another. Although the values of AUC were very similar for various methods (Figure 7), a substantial difference in the agreement was observed in LSMs obtained using the different techniques. Overall, the correlation coefficient ranges from 0.69 to 0.85 (Figure 8). The combinations of SVM-RF resulted in the highest degree of the agreement, while the KNN-SVM yielded the lowest agreement. The estimated LR coefficients of the four models and intercept were incorporated in Equation (6) to derive the combined LSM. Again, the resultant aggregated map was categorized into five classes applying the Jenks Natural Break algorithm (Figure 9). The combined susceptibility map yielded the highest AUC value (0.965) compared to the single susceptibility forecasts (Figure 7). About 26.8% of the total study area was within the 'high' and 'very high' landslide susceptible zones, where approximately 21.7% of total population inhabit ( Figure 6). In respect to spatial agreement, the combined LSM resulted in greater spatial agreement with the all four models, with the correlation coefficient ranging between 0.85 and 0.92 ( Figure 8).

Aggregated Landslide Susceptibility Mapping
Since spatial heterogeneity in landslide prediction exists between the different machine learning-based approaches, an aggregated susceptibility map combining the outputs of all algorithms would minimize the uncertainty of individual methods. In this study, a regression-based approach was adopted. A multivariate logistic regression (LR) was established incorporating the binary landslide inventory data as the dependent variable and the results of the four landslide susceptibility models as independent variables. The outcome of the LR model is summarized in Table 4. Among the four models, the MLP, RF, and SVM were statistically significant (p-value < 0.05). The coefficient of determinants (R 2 ) of 0.80 indicates a very good model performance. In relation to the estimated regression coefficients, the RF model had the highest degree of agreement with the landslide inventory, followed by the MLP, SVM, and KNN. The pattern of influence of various models in predicting landslides corresponds to their level of accuracy in terms of their respective AUC values (Figure 7). The estimated LR coefficients of the four models and intercept were incorporated in Equation (6) to derive the combined LSM. Again, the resultant aggregated map was categorized into five classes applying the Jenks Natural Break algorithm (Figure 9). The combined susceptibility map yielded the highest AUC value (0.965) compared to the single susceptibility forecasts (Figure 7). About 26.8% of the total study area was within the 'high' and 'very high' landslide susceptible zones, where approximately 21.7% of total population inhabit ( Figure 6). In respect to spatial agreement, the combined LSM resulted in greater spatial agreement with the all four models, with the correlation coefficient ranging between 0.85 and 0.92 (Figure 8). The extent of landslide susceptible areas varies in different sub-districts (Upazila) of Cox's Bazar. Teknaf Upazila is the most susceptible, where more than 8% of the total study area was susceptible to landslides of 'high and 'very high' severity ( Figure 9b). A substantial proportion of area (7% of the study area) in Ukhia sub-district was also susceptible. The Rohingya refugee camps in this area were located within high and very high landslide susceptible zones. Various recent studies also found that changes in the geomorphological, hydrological, and anthropogenic environments due to the Rohingya influx caused their settlement areas vulnerable to landslides [11,31].

Discussion
The spatial disagreement in prediction among various techniques creates challenges in selecting the most suitable susceptibility map for managing landslide hazards [9,21,22]. The current study The extent of landslide susceptible areas varies in different sub-districts (Upazila) of Cox's Bazar. Teknaf Upazila is the most susceptible, where more than 8% of the total study area was susceptible to landslides of 'high and 'very high' severity ( Figure 9b). A substantial proportion of area (7% of the study area) in Ukhia sub-district was also susceptible. The Rohingya refugee camps in this area were located within high and very high landslide susceptible zones. Various recent studies also found that changes in the geomorphological, hydrological, and anthropogenic environments due to the Rohingya influx caused their settlement areas vulnerable to landslides [11,31].

Discussion
The spatial disagreement in prediction among various techniques creates challenges in selecting the most suitable susceptibility map for managing landslide hazards [9,21,22]. The current study seeks to address this challenge by estimating the extent of spatial agreement, as well as, proposing a method to combine landslide susceptibility maps and thus incorporating the valid results of various models. The study focused on the Cox's Bazar district of Bangladesh, which is well known as being vulnerable to landslide disasters [11,12,31]. First, LSMs were developed by applying four machine learning algorithms-K-Nearest Neighbor (KNN), Multi-Layer Perceptron (MLP), Random Forest (RF), and Support Vector Machine (SVM)-featuring hyperparameter optimization. In comparison to the existing studies on LSM of Cox's Bazar district [11,12,31], this current study employed up-to-date data of landslide conditioning factors. In addition, it restricted low-lying areas (waterbodies and elevation < 5 m) in susceptibility mapping, otherwise, the resultant maps would have been prone to overestimation of landslide susceptible zones, as was the case of some recent studies. While evaluating the models' performance, all of them yielded very high prediction accuracy, with the AUC values ranging between 0.927 to 0.962. The results of various recent studies have also ascertained that different machine learning-based models yielded high accuracy in predicting landslides [1,15,17,26,27,61].
This study hypothesized that different susceptibility models can result in different LSMs, despite incorporating similar landslide inventory data. The assessment of spatial agreement between various models revealed spatial heterogeneity in landslide predictions, with the estimated pixel-wise correlation coefficients of landslide probability between various models ranging from 0.69 to 0.85. The spatial distribution of landslide susceptibility obtained in this study was also different than that of a recent study conducted in Cox's Bazar district of Bangladesh [11]. This highlights the uncertainty in landslide predictions of various models, despite their considerable prediction accuracy in terms of the AUC values. Most of the existing studies on machine learning-based LSM had a major focus on identifying the most suitable method for predicting this natural phenomenon [1,8,18,23,60], while little attention has been given in analyzing uncertainties resulting from the spatial disagreement in landslide prediction [9]. The current study is the first case study-based contribution to investigate this major gap in the existing literature.
This study further developed a combined LSM by integrating the results of the four machine learning-based landslide predictions, adopting a method proposed by Rossi, Guzzetti, Reichenbach, Mondini, and Peruccacci [22]. The result indicates an improvement in landslide prediction accuracy. Existing studies, which applied multiple machine learning algorithms to map landslide susceptibility, mainly evaluated different methods based on quantitative measures [8,16,18]. Whilst quantitative measures of model fit are useful, they are not conclusive in determining the efficacy and reliability of susceptibility assessment [22]. A combined landslide susceptibility map that this study developed would help to minimize the uncertainties of individual methods.

Conclusions
Predicting a complex natural phenomenon such as a landslide is a challenging task and subject to considerable uncertainties. An accurate prediction of landslides is the prerequisite for managing this hazard. In this study, the spatial association in landslide prediction between various machine learning-based models was analyzed to quantify the spatial agreement of predicted landslide susceptibility. By addressing uncertainties in various models, this study also developed a landslide susceptibility map combining the outcomes from various models. The results indicate an improvement in landslide prediction compared to the individual models.
Despite achieving an improved result in landslide prediction, this study has some limitations that could be addressed in future results. Landslide inventory data used in this study was developed based on various secondary sources and validated through fieldwork [11]. Scarcity of data, including detailed landslide inventory on the study area, made it difficult to model a landslide more accurately. Accuracy of the LSM results depended on input parameters used, particularly the DEM. The ALOS DEM of 30-m resolution used in this study had a low root-mean-square error (1.78 m) in vertical accuracy and was considered to be the most accurate freely available DEM [43,62]. However, for future research, high-resolution DEM could be employed to improve the existing landslide susceptibility modelling frameworks.
This study is an attempt to integrate results of multiple machine learning-based landslide susceptibility models to minimize uncertainties and improve landslide predictions. The modelling framework used in this study could be transferred to other landslide-susceptible regions. Landslide susceptibility maps can enable urban planners in identifying suitable areas for urban development [63]. The combined landslide susceptibility map of Cox's Bazar district could be useful to policymakers and practitioners in sequencing and prioritizing interventions in managing landslides. The proposed model is an advancement in the existing landslide susceptibility models that intends to predict landslides more accurately. The results of this model could be utilized in improving the existing landslide early warning system [11], to strengthen landslide disaster risk mitigation strategies to support for the resilient future of inhabitants of the study area.