Landslide and Wildfire Susceptibility Assessment in Southeast Asia Using Ensemble Machine Learning Methods

Abstract: Southeast Asia (SEA) is a region affected by landslides and wildfires; however, few studies have modeled susceptibility to the two hazards together for this region, and the intersection and the uncertainty of the two hazards are rarely assessed. This paper therefore studies the intersection of landslide and wildfire susceptibility and the spatial uncertainty of the susceptibility maps. Reliable landslide and wildfire susceptibility maps are necessary for disaster management and land use planning. This work used three advanced ensemble machine learning algorithms, RF (Random Forest), GBDT (Gradient Boosting Decision Tree) and AdaBoost (Adaptive Boosting), to assess the landslide and wildfire susceptibility for SEA. A geo-database was established with 2759 landslide locations, 1633 wildfire locations and 18 predictor variables in total. The performances of the models were assessed using the overall classification accuracy (ACC), Precision, the area under the ROC (receiver operating characteristic) curve (AUC) and confusion matrix values. The results showed that RF performed best in both landslide (ACC = 0.81, Precision = 0.78 and AUC = 0.89) and wildfire (ACC = 0.83, Precision = 0.83 and AUC = 0.91) susceptibility modeling, followed by GBDT and AdaBoost. The overall superiority of RF over the other models indicates that it is potentially an efficient model for landslide and wildfire susceptibility mapping. The landslide and wildfire susceptibility maps were obtained using the RF model. This paper also conducted an overlay analysis of the two hazards. The uncertainty of the susceptibility was further assessed using the coefficient of variation (CV). Additionally, the distance to roads is relatively important for both hazards: it is the most important factor for landslides and the second most important for wildfires. The results of this paper are useful for understanding the overall hazard susceptibility situation and indicate that RF is a robust model for hazard susceptibility assessment in SEA.


Introduction
Southeast Asia is one of the most natural disaster-prone regions in the world; numerous natural hazards such as floods, earthquakes and heatwaves happen here every year [1][2][3][4]. The landscape in mainland SEA is characterized by mountainous areas [5]. Landslides are a main geological hazard in SEA, with damaging impacts on the safety of life and property [6][7][8]. Additionally, wildfires are frequent in SEA, particularly in Indonesia, and the forests of SEA are increasingly at risk of fire [9,10]. The extent and severity of wildfires have increased with climate variability, and the fire susceptibility in SEA has increased [11][12][13]. Wildfires can cause various impacts, including a loss of biodiversity, loss of assets and damage to natural resources and agricultural areas [14,15].
Hazard susceptibility maps are crucial for determining the most susceptible areas where hazards are likely to occur, and multiple studies have implemented landslide and wildfire susceptibility assessment [8,16,17,18]. Research about susceptibility modeling is susceptibility map, so that the susceptibility map can provide reliable scientific information that can support the identification of hazard-prone areas [52,53]. To overcome the above limitations, landslides and wildfires were studied together for SEA based on machine learning models, and the spatial uncertainty of the susceptibility maps was evaluated to ensure the reliability of the model.
In this paper, three advanced ensemble machine learning techniques, namely RF, GBDT and AdaBoost, were employed to construct the susceptibility models for landslides and wildfires using the geological, topographical and meteorological conditioning factors in SEA. The input datasets were randomly and repeatedly partitioned into a 70% training set and a 30% testing set. The models were compared using accuracy statistics, and the most reliable and suitable model was chosen for generating susceptibility maps for SEA. This work overlaid the two natural hazards to find the intersection between them. Then, the uncertainty of the landslide and wildfire susceptibility maps was analyzed. The main objectives of this paper are to develop a suitable model for landslide and wildfire susceptibility assessment, to generate reliable landslide and wildfire susceptibility maps and to evaluate the uncertainty of the susceptibility maps. The results of this paper could support decision-makers and planners in making suitable future development schemes.

Materials and Methods
The materials used in this paper mainly involve the conditioning factors and the hazard inventories. Machine learning methods should be combined with the historical hazard inventory and the conditioning factors to develop a rational model for the susceptibility assessment. Three advanced ensemble machine learning methods were employed (i.e., GBDT, RF and AdaBoost).

Study Area
Southeast Asia is located on several plates and lies between the Indian and Pacific Oceans, covering about 4.5 million km² (Figure 1). SEA includes 11 countries: Brunei, Cambodia, East Timor, Indonesia, Laos, Malaysia, Myanmar, Philippines, Singapore, Thailand and Vietnam. Southeast Asia is one of the most disaster-prone areas in the world [54], where earthquakes, volcanic eruptions, tsunamis and seasonal typhoons occur. The vast majority of SEA falls within the warm, humid tropics with plentiful rainfall. The climates in SEA are dominated by the tropical monsoon climate, tropical dry and wet monsoon climates and tropical rainforest climate. SEA is 80% mountains and hills, and the region has high temperatures and abundant rainfall most of the time [55,56].
As a consequence, landslides are prone to happening. Landslides, usually triggered by heavy rains and seismic activities, are very common in SEA, where a large number of mountains and steep landscapes exist. Because Indonesia and the Philippines are located in an active seismic area, deadly landslides happen there almost every year. The landslide disaster that occurred in SEA in 2006 was regarded as the deadliest one worldwide, resulting in 1126 deaths [9].
Vegetation fires are a common phenomenon in many regions in the world, including SEA [57]. Wildfires in SEA are destructive. They happen particularly frequently in Indonesia (Sumatra and Borneo) [9]. The impact of wildfires is not limited to the regions where fires happen. The ashes from fires can spread widely to other places, causing tremendous immediate and long-term effects. The 1997 Indonesian wildfire was one of the worst forest fires in recorded history, engulfing over 12 million acres of Indonesia [9,58].

Data Preparation
The quantitative assessment of landslide and wildfire susceptibility requires two kinds of data: landslide and wildfire occurrences, and geo-environmental and anthropogenic predisposing factors.

Hazard Inventories
Disaster inventories, showing the locations of existing disasters, include landslide and fire inventories. The MODIS Collection 6 fire archive was used in this paper for wildfire susceptibility model construction [57,59]. MODIS hotspots have been adopted by many researchers for susceptibility mapping [60,61]. The fire products were downloaded from the Fire Information for Resource Management System (FIRMS). This product provides confidence levels of fire pixels, and we discarded all fires with a confidence level below 100%. For this research, we collected fire data from 2010 to 2019, and the total number of fire points was 1633.
Landslide inventories were obtained from the Global Landslide Catalogue (GLC) established by Kirschbaum [62]. The GLC was compiled from online disaster databases, scientific reports and other available sources, and mainly consists of landslides triggered by rainfall [63]. A total of 2759 landslide events within SEA with a spatial accuracy of 1 km or better were selected.
Remote Sens. 2021, 13, 1572
Unbalanced numbers of positive and negative samples lead to poor model performance [64]; thus, we randomly sampled nonhazard points in the disaster-free zones, in the same number as the hazard points, to ensure the reliability of the susceptibility assessment. The whole hazard dataset, including the positive and negative samples (i.e., the presence and absence of the landslide or fire), was divided into a training dataset and a testing dataset.
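The balanced sampling described above can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' code: the grid, the hazard cells and the helper name `sample_balanced` are hypothetical, and the "disaster-free zone" is approximated simply as every grid cell without a recorded hazard.

```python
import random

def sample_balanced(hazard_cells, candidate_cells, seed=0):
    """Pair the hazard cells with an equal number of random hazard-free cells."""
    rng = random.Random(seed)
    hazard_set = set(hazard_cells)
    # Assumption: any candidate cell without a recorded hazard is "disaster-free".
    free_cells = [c for c in candidate_cells if c not in hazard_set]
    negatives = rng.sample(free_cells, len(hazard_cells))
    # Label 1 = hazard present, 0 = hazard absent.
    return [(c, 1) for c in hazard_cells] + [(c, 0) for c in negatives]

# Toy 20 x 20 grid with five hypothetical hazard cells.
grid = [(r, c) for r in range(20) for c in range(20)]
hazards = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]
samples = sample_balanced(hazards, grid)
```

The result contains as many absence points as presence points, which is the 1:1 ratio the text relies on.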
Before assessing the hazard susceptibility, the conditioning factors and disaster inventories were prepared in the database. All the data used for extracting the conditioning factors for landslides and wildfires, as well as the hazard inventories, are listed in Table 1. Conditioning factors are crucial for hazard susceptibility modeling and mapping. The factors were selected considering the characteristics of SEA and data availability, with reference to the literature [8,17,25,26,72]. In this paper, 18 conditioning factors in total were chosen for the hazard susceptibility assessment, namely elevation, slope, aspect, plan curvature, profile curvature, TWI (Topographic Wetness Index), SPI (Stream Power Index), NDVI (Normalized Difference Vegetation Index), annual mean precipitation, annual mean maximum temperature, soil moisture, wind speed, lithology, land use, distance to roads, distance to rivers, distance to faults and distance to urban areas (Figure 2). The temperature, precipitation, wind speed and NDVI factors are annual mean values. The separate conditioning factors used for landslides and wildfires are listed in Table 2. A GIS-based database containing the 18 factors and the hazard inventories was generated. All thematic layers are projected to the UTM zone 50N coordinate system with the WGS 1984 datum, and the spatial resolution is 1 km × 1 km. The factors include categorical (i.e., lithology and land use) and continuous data. Some factors are pertinent to both hazards, and some are relevant to only one hazard (Table 2) and were, thus, only incorporated into the models for which they were relevant. The relationship between hazard and nonhazard with the conditioning factors is shown in Supplementary Figures S1 and S2.

Multicollinearity Test for the Conditioning Factors
Multicollinearity checking of the conditioning factors is necessary in susceptibility mapping studies, since multicollinearity may disturb the prediction and cause errors in the results [73,74]. Multicollinearity happens when input variables are highly correlated, which can cause erroneous modeling [17]. Variance Inflation Factors (VIF) and Tolerance (TOL) were used to detect and quantify the multicollinearity [17,75]. A VIF > 10 or a TOL < 0.1 denotes a multicollinearity problem [76,77]. No multicollinearity was found among the factors selected for each hazard (Supplementary Tables S1 and S2). The low VIF and high TOL values of the factors indicated that the conditioning factors for each hazard were properly selected.
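For reference, the two diagnostics are conventionally defined from an auxiliary regression of each factor on all the other factors. The notation below is the standard textbook form and is an assumption here, since the paper does not print the formulas; $R_j^2$ denotes the coefficient of determination obtained when factor $j$ is regressed on the remaining factors.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}, \qquad
\mathrm{TOL}_j = 1 - R_j^2 = \frac{1}{\mathrm{VIF}_j}
```

Under these definitions, $\mathrm{VIF}_j > 10$ corresponds to $R_j^2 > 0.9$, that is, $\mathrm{TOL}_j < 0.1$, so the two rules of thumb flag the same factors.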

Methods
Three different ensemble machine learning methods (AdaBoost, GBDT and RF) were adopted for our research. The scheme of the hazard susceptibility modeling and mapping is presented in Figure 3, and the main procedures are described as follows.
(1) Firstly, we constructed a spatial database collecting the basic environmental data, as well as the landslide and fire inventories.
(2) Secondly, we used the prepared data to extract conditioning factors from the environmental data for landslide and wildfire susceptibility modeling separately. Then, a multicollinearity test on those factors was performed using VIF and TOL.
(3) Thirdly, we randomly partitioned the dataset into a training dataset and a testing dataset. The dataset was first shuffled and then split randomly into training (70%) and testing (30%) data in Python [8,64,78]. The target class value is 1 if the sample is disaster-positive (i.e., a hazard point); otherwise, the class value is 0. The models were run 30 times with different hazard data combinations using AdaBoost, GBDT and RF, and, every time, the input data were split into 70% for training and 30% for testing. After developing the models, the model accuracy was evaluated and the models were compared using AUC, Precision, ACC and confusion matrix statistics.
(4) Next, the predictive capabilities of the models were compared, and the best-performing model was used to generate the susceptibility maps for the two hazards. Then, we carried out an overlay analysis to evaluate the susceptibility of the two hazards. Additionally, we computed the CV to assess the uncertainty of the results. The susceptibility map was intersected with the uncertainty map using a matrix-based method to assess the reliability of the best model. Finally, the relative importance of every conditioning factor for each hazard was obtained.
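The repeated shuffle-and-split step in (3) might look roughly like the following. It is a sketch using only the Python standard library; the generator name `repeated_splits`, the seed and the toy samples are illustrative assumptions, and the paper does not state which splitting routine was actually used.

```python
import random

def repeated_splits(samples, n_runs=30, train_frac=0.7, seed=0):
    """Yield (train, test) pairs from n_runs independent shuffles."""
    rng = random.Random(seed)
    n_train = round(train_frac * len(samples))
    for _ in range(n_runs):
        shuffled = samples[:]          # copy so the input order is untouched
        rng.shuffle(shuffled)
        yield shuffled[:n_train], shuffled[n_train:]

# Toy dataset: (sample id, class label) with label 1 = hazard, 0 = nonhazard.
samples = [(i, i % 2) for i in range(100)]
splits = list(repeated_splits(samples))
```

Each of the 30 runs would then train one model on the first element of the pair and score it on the second, so every accuracy statistic is available as a distribution over runs rather than a single number.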
Machine learning methods should be combined with the historical hazard inventory and geo-environmental factors to develop a rational model for the susceptibility assessment. Three advanced ensemble machine learning methods were employed for hazard susceptibility mapping: AdaBoost, GBDT and RF. All the algorithms were implemented in Python. The three machine learning methods are described as follows.

AdaBoost
AdaBoost was introduced by Freund and Schapire [79]. AdaBoost is the most popular boosting approach; it applies an adaptive resampling technique and has enhanced predictive capability, as it controls bias and variance [45,51]. It is an iterative algorithm dealing with binary classification. The basic idea of AdaBoost is to train several different weak classifiers on the training dataset and then combine these weak classifiers to form a strong classifier. Weak classifiers with low error rates account for a larger proportion of the final classifier. First, weights are assigned to the instances in the training dataset; then, the weights are updated during the iterations by reducing the weights of the samples classified correctly in the last iteration and increasing the weights of the samples classified wrongly [44]. When optimal weights have been assigned, the learning process ends to obtain the best performance of the base classifier [44]. AdaBoost adapts to the errors of the weak classifiers and forms a stronger final classifier by combining them; the accuracy of the strong classifier depends on those weak classifiers. This process can reduce the bias and variance and, thus, improve the classification ability [51]. AdaBoost is sensitive to outliers, so outliers may affect the accuracy of the final learner. In this paper, we set the maximum number of iterations to 500 and the learning rate to 0.8, and we used the classification performance on the sample set as the weight of the weak classifier.
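To make the reweighting step concrete, here is a minimal sketch of one round of the classic discrete AdaBoost update for labels in {-1, +1}. It is an illustration under stated assumptions rather than the configuration used in the paper: the learning rate of 0.8 is not modeled, the function name `adaboost_round` is hypothetical, and the weighted error is assumed to lie strictly between 0 and 1.

```python
import math

def adaboost_round(weights, y_true, y_pred):
    """One AdaBoost reweighting step; returns the classifier vote and new weights."""
    # Weighted error of the current weak classifier.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    # Vote of this weak classifier: larger when the error is smaller.
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Shrink weights of correct samples, grow weights of misclassified ones.
    new = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(new)
    return alpha, [w / total for w in new]

y_true = [1, 1, -1, -1, 1]
y_pred = [1, -1, -1, -1, 1]        # the second sample is misclassified
alpha, weights = adaboost_round([0.2] * 5, y_true, y_pred)
```

After the update, the single misclassified sample carries half of the total weight, which is what forces the next weak classifier to focus on it.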

Gradient Boosting Decision Tree
GBDT is an ensemble machine learning method combining multiple decision trees based on the Boosting concept [46]. It continuously improves the prediction accuracy through iterations. In each iteration, a new decision tree is established in the gradient direction that reduces the residuals [80]. The basic idea of GBDT is to build several weak classifiers and finally combine them to form a strong classifier after multiple iterations. Each iteration improves the previous results and reduces the residuals of the previous model. The overall performance of GBDT becomes better because the errors of the decision trees compensate for each other [48]. GBDT belongs to Boosting ensemble learning. Compared with the AdaBoost algorithm, GBDT improves the model by calculating the negative gradients instead of adjusting the weights of the misclassified samples [81]. Different from the AdaBoost algorithm, the weak classifiers of GBDT are dependent and connected [82]. The prediction results are finalized based on the sum of the weak classifiers. GBDT is one of the best algorithms for fitting real distributions, has a strong generalization ability and can be used for classification problems [81]. Taking the data into account, the maximum number of iterations was set to 100 and the learning rate to 1.
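The residual-fitting loop can be illustrated with a deliberately small sketch. It makes several simplifying assumptions that are not from the paper: a single input feature, one-split regression stumps as weak learners, and squared-error loss (for which the negative gradient is just the residual), whereas a classification GBDT would normally use a log-loss. The names `fit_stump` and `gradient_boost` are hypothetical.

```python
def fit_stump(xs, residuals):
    """Best single-threshold regression stump under squared error."""
    best = None
    for thr in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    _, thr, lm, rm = best
    return lambda x: lm if x <= thr else rm

def gradient_boost(xs, ys, n_iter=5, learning_rate=0.5):
    """Additive model: each stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)
    preds, stumps, losses = [base] * len(xs), [], []
    for _ in range(n_iter):
        # Negative gradient of the squared-error loss = residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
        losses.append(sum((y - p) ** 2 for y, p in zip(ys, preds)))
    return (lambda x: base + learning_rate * sum(s(x) for s in stumps)), losses

predict, losses = gradient_boost([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
```

The recorded losses shrink from one iteration to the next, which is the "each iteration reduces the residuals of the previous model" behavior described above; the final prediction is the sum of the base value and all scaled stumps.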

Random Forest
RF [83] is an ensemble learning method that aggregates a set of CART decision trees for classification and prediction. RF is the most common bagging model, combining bagging ensemble learning and the random subspace method, which makes it resistant to overfitting [8]. In the classification problem, different subsamples of the dataset and different subsamples of the features are used to train several decision trees, and majority voting defines the class [75]. Subsamples of the dataset are produced using the bootstrap resampling method [64]. RF is one of the most frequently used machine learning algorithms in multiclassification and prediction research [77]. The input data for RF do not need to be scaled, transformed or modified, and RF can resist outliers in the predictive variables and automatically deal with missing values [8]. In the process of classification, RF can also estimate the importance of every input factor using the Information Gain (IG), Gain Rate, Gini Index or the chi-square test.
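The three ingredients named above (bootstrap resampling, random feature subsets and majority voting) can be sketched independently of any tree learner. The helper names are hypothetical, and the subset size of 4 is only an assumption, chosen as roughly the square root of the 18 factors; note also that common implementations draw the feature subset at every split rather than once per tree.

```python
import random
from collections import Counter

def bootstrap(samples, rng):
    """Draw len(samples) items with replacement (one bag per tree)."""
    return [rng.choice(samples) for _ in samples]

def random_feature_subset(n_features, k, rng):
    """Pick k distinct feature indices (random subspace method)."""
    return sorted(rng.sample(range(n_features), k))

def majority_vote(votes):
    """Return the class label predicted by most trees."""
    return Counter(votes).most_common(1)[0][0]

rng = random.Random(42)
data = list(range(10))
bag = bootstrap(data, rng)
features = random_feature_subset(n_features=18, k=4, rng=rng)
label = majority_vote([1, 0, 1, 1, 0])
```

Here three of the five hypothetical trees vote for class 1, so the forest predicts a hazard; the share of trees voting 1 can also be read as a susceptibility score between 0 and 1.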

Factor Importance
The mean decrease Gini Index of RF was used to determine the relative importance of the variables. The factor importance is of vital importance for identifying the contribution of each factor and, hence, determining their role in the model [8]. The random forest technique was utilized to evaluate the relative importance of all the conditioning factors [8,73]. Natural hazards are generally influenced by multiple conditioning factors. Therefore, measuring the relative importance of the conditioning factors is crucial for understanding the hazard risk patterns [84,85].
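As a small worked example of the quantity behind the mean decrease Gini, the sketch below computes the impurity decrease of a single binary split; in a forest, the importance of a factor is obtained by accumulating such decreases over all splits on that factor and averaging over trees. The function names and the toy labels are illustrative assumptions.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2

def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted

parent = [1, 1, 1, 1, 0, 0, 0, 0]
weak_split = gini_decrease(parent, [1, 1, 1, 0], [1, 0, 0, 0])
perfect_split = gini_decrease(parent, [1, 1, 1, 1], [0, 0, 0, 0])
```

A factor whose splits separate hazard from nonhazard samples cleanly yields decreases close to the parent impurity, while an uninformative factor yields decreases near zero.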

Model Performance and Accuracy Assessment
In this paper, hazard susceptibility modeling and mapping is a classification problem, with binary outcomes of the presence and absence of hazards, meaning that measurements assessing the model performance by evaluating the prediction results and accuracy are important [17]. In the literature, the overall accuracy (ACC), precision and the area under the ROC curve (AUC) are considered the main metrics by which to evaluate the overall results [8,17,86]. In addition, the confusion matrix is also implemented as a further metric to evaluate the model performances quantitatively and graphically.
In classification problems, each class is either positive (hazard) or negative (nonhazard). ACC is a statistical index of the model's overall performance, defined as the percentage of correctly classified samples (Equation (1)). Precision can be regarded as a measure of exactness: it indicates the percentage of samples predicted as positive that are actually positive (Equation (2)). The AUC is a useful accuracy statistic for the susceptibility analysis [8,87]. The 30% testing data that were not used in the model establishment process were employed to evaluate the model capability. ACC depicts the proportion of correctly classified samples of both hazard occurrence and nonoccurrence, ranging from 0 to 1, and the larger the AUC value, the better the model performs. The AUC can be interpreted as reflecting the model performance as follows: excellent (0.9-1), very good (0.8-0.9), good (0.7-0.8), medium (0.6-0.7) and poor (0.5-0.6) [17,88]. In addition, the contingency (four-fold) plot summarizing the numbers of TP, TN, FP and FN is used to graphically evaluate the model predictive ability [89,90]; a larger proportion of TP and TN indicates a better model.
The mathematical expressions for ACC (Equation (1)) and Precision (Equation (2)) are:
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \quad (1)$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \quad (2)$$
where ACC is the overall accuracy, and Precision denotes the fraction of true hazard instances among the samples classified as positive. TP (true positive) and TN (true negative) are the numbers of hazard and nonhazard samples that are correctly classified, respectively. Conversely, FP (false positive) is the number of nonhazard samples falsely classified as hazard, and FN (false negative) is the number of hazard samples falsely classified as nonhazard.
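The definitions of ACC and Precision given above, together with a rank-based reading of the AUC, can be checked on a toy example. This is a sketch only: the labels, scores and the 0.5 decision threshold are made-up illustration values, and the function names are not taken from the paper.

```python
def confusion(labels, scores, threshold=0.5):
    """Count TP, TN, FP, FN for scores thresholded into 0/1 predictions."""
    tp = tn = fp = fn = 0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 0 and y == 0:
            tn += 1
        elif pred == 1 and y == 0:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    return tp / (tp + fp)

def auc(labels, scores):
    """Probability that a random hazard sample outranks a random nonhazard one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
tp, tn, fp, fn = confusion(labels, scores)
acc, prec, area = accuracy(tp, tn, fp, fn), precision(tp, fp), auc(labels, scores)
```

In this toy case ACC and Precision are both 2/3, while the AUC is 8/9 because the AUC depends only on how the scores rank the samples, not on the chosen threshold.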

Results
The results of this paper primarily include comparing the performances of the susceptibility models, generating susceptibility maps for landslides and wildfires, evaluating the uncertainty of the susceptibility maps and analyzing the relative importance of the conditioning factors for landslides and wildfires.

Evaluation of the Models
The accuracy measures of the models are listed in Table 3. A higher AUC means a higher model accuracy. As can be seen from Table 3, the AUC values of all three models are greater than 0.8, indicating that the three machine learning models performed well in the landslide and wildfire susceptibility modeling. The three accuracy measures give the same order of model performance. Compared with RF and GBDT, AdaBoost has the lowest accuracy values for landslides (ACC 0.77, Precision 0.75 and AUC 0.86) and wildfires (ACC 0.74, Precision 0.72 and AUC 0.81). In addition, AdaBoost performed better for landslides than for wildfires, while GBDT and RF performed better for wildfires. Overall, RF performed well in the landslide susceptibility assessment and excellently in the wildfire susceptibility assessment. The results demonstrated that RF exhibited the best performance among the three ensemble machine learning methods for the two hazards, indicating that the RF model is more predictive than GBDT and AdaBoost in landslide and wildfire susceptibility modeling and mapping. The gap between the RF performances for landslides and wildfires may be caused by the difference in data quality. The RF model exhibited good generalization capability.
The contingency plot of the confusion matrix statistics confirmed that RF is the best model among the three machine learning models (Figure 4). Figure 4 shows the specific percentages of TN, TP, FN and FP, giving a more detailed accuracy evaluation. FP refers to the nonhazard locations that are wrongly predicted as hazard, while FN refers to the hazard locations that are wrongly predicted as nonhazard. TP and TN are the samples that are correctly predicted. In Figure 4, TN and TP account for large portions in all the models for landslides and wildfires, while the mispredicted samples (FP and FN) account for smaller parts. RF has the fewest FN and FP, demonstrating the high accuracy of the RF model.

Susceptibility Maps
After obtaining the best model for the two hazards, the best-performing RF model was applied to produce the susceptibility maps of the two hazards (Figure 5). The Natural Breaks classification method was employed to categorize each susceptibility map into five groups: very low, low, moderate, high and very high [8,17,77]. The intervals of very low, low, moderate, high and very high for landslide susceptibility were 0-0.13, 0.14-0.28, ... The susceptibility maps show a fairly good spatial distribution of the highly susceptible areas. Most of the hazard points are within the areas with very high susceptibility, indicating the reliability of the susceptibility maps. The landslide susceptibility is high in the Philippines, the west of Myanmar, the middle of Vietnam and Laos and the southwestern parts of Indonesia near the Indian Ocean. Wildfires need dry environmental conditions. The wildfire susceptibility in the Philippines is relatively low. Wildfires are relatively severe in Myanmar and are also serious in the northern area of Laos. Cambodia also has high wildfire susceptibility. Additionally, the wildfire susceptibilities in Sumatra and Borneo of Indonesia are relatively high.
As previously reported, the susceptibility maps were divided into five classes. Here, we aggregated the classes to make the intersection map clear and distinguishable; namely, the low class includes low and very low, while the high class contains high and very high. Subsequently, an intersection operation between the landslide and wildfire susceptibility maps was performed. A multihazard susceptibility map (Figure 6) combining landslides and wildfires was derived by intersecting the three aggregated susceptibility classes of landslides and wildfires. The resulting map shows the combinations of low, moderate and high susceptibility between landslides and wildfires. The majority of the regions have low susceptibility to both landslides and wildfires. Landslide susceptibility is high in the Philippines, the middle of Vietnam and the southern coast of Indonesia, while the wildfire susceptibility there is low. The regions in the middle and east parts of Indonesia and northwestern Cambodia show a high susceptibility to wildfires but a low susceptibility to landslides. The west of Myanmar is highly susceptible to both wildfires and landslides. The bar chart, which summarizes the areas of the different combination classes, shows that the areas with low susceptibility to both hazards account for the majority, followed by areas with high landslide susceptibility and low wildfire susceptibility, while areas where both hazard susceptibilities are high make up the smallest proportion, confirming the statistical analyses on the maps.
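The aggregation and intersection steps can be illustrated on a handful of cells. This sketch assumes the two maps are available as aligned lists of class names per cell; the class lists below are invented for illustration, and a real workflow would operate on raster layers in a GIS.

```python
from collections import Counter

# Merge the five Natural Breaks classes into three, as described in the text.
AGGREGATE = {"very low": "low", "low": "low", "moderate": "moderate",
             "high": "high", "very high": "high"}

def overlay(landslide_classes, wildfire_classes):
    """Combine two five-class maps cell by cell into (landslide, wildfire) pairs."""
    return [(AGGREGATE[l], AGGREGATE[w])
            for l, w in zip(landslide_classes, wildfire_classes)]

landslide = ["very high", "low", "moderate", "very low"]
wildfire = ["low", "very low", "high", "very high"]
combined = overlay(landslide, wildfire)
area_by_class = Counter(combined)   # cell counts per combination class
```

Counting the cells in each of the nine possible combinations, as `area_by_class` does, is the kind of summary a bar chart of combination-class areas would display.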

Uncertainty of the RF Model
To assess the uncertainty of the susceptibility maps generated by the RF, the coefficient of variation (CV) was computed. The CV is a measure of uncertainty of the model [52,53]. In this work, the uncertainty map interacted with the susceptibility to delineate the uncertainty within each susceptibility class. In this way, the reliability of the RF model was assessed. The uncertainty is high when the CV exhibits high scores, while the lower the score, the better are the results of the model. The CV values were subdivided into five classes using the Natural Breaks method, and the first two classes were considered as low uncertainties while the last two were classified as high uncertainties, while the third class was a medium uncertainty. As can be seen from Figure 7, most of the region showed low uncertainty in the susceptibility map. Areas with high uncertainty were only identified where the landslide susceptibility was low. Most of the landslide points (about 93.95%) fell into the class with low uncertainty and high susceptibility, confirming how the RF model correctly identified the potential landslide areas with a low uncertainty level. In addition, none of the landslides fell into the high susceptibility class with high uncertainty. Only 1.27% of the landslides fell into the low susceptibility class (with low and medium uncertainty). Few regions show high uncertainty and those regions with high uncertainty were mainly distributed in the low landslide susceptibility category.

Only pixels characterized by low wildfire susceptibility showed high variability (Figure 8). Areas with high wildfire susceptibility and medium or high uncertainty were hard to find. The majority of the fire points fell into the highly wildfire-susceptible class with low uncertainty (91.06%), demonstrating the reliability of the wildfire susceptibility map obtained by the RF model. About 0.24% of the fire points fell into the low susceptibility classes with medium and high uncertainty. About 4.72% of the fire points fell into the high susceptibility class with medium uncertainty, and only 1.90% were in the high susceptibility class with high uncertainty. Most of the highly susceptible areas showed a low uncertainty level.
The graphs presented in Figure 9 confirmed the uncertainty maps. Figure 9a shows that the largest areas were those with low uncertainty for landslides. Therefore, highly landslide-susceptible areas have low values of uncertainty. Figure 9b likewise shows that most of the areas had low uncertainty, supporting the reliability of the RF model in identifying the potential wildfire areas, although low wildfire susceptibility with medium and high uncertainty also accounted for a large portion.
In general, the uncertainty maps, along with the susceptibility maps, gave stronger support for evaluating the landslide- and wildfire-susceptible regions of SEA. Figure 9 presents the area of each combination class. The landslide and wildfire uncertainty maps have similar characteristics. Low uncertainty dominated for both landslides and wildfires, while there were more pixels with high uncertainty in the low susceptibility class of wildfires. For landslides, low susceptibility with low uncertainty accounted for the largest area, while, for wildfires, low susceptibility with medium uncertainty was the largest. Although both maps have good reliability in the highly susceptible areas, the landslide susceptibility map was more reliable than the wildfire map in the low susceptibility class.

Factor Contribution Analysis
The relative importance of each conditioning factor for the susceptibility modeling was obtained using RF. The results revealed that the distance to roads, distance to faults and precipitation are the three most powerful factors for predicting landslides (Figure 10a), while the SPI and land use had the least importance for landslide susceptibility modeling. Earthquakes and rainfall are frequent in SEA, so it is reasonable that the distance to faults and precipitation are important: earthquakes rupture faults and, combined with the impact of precipitation, make landslides more likely to occur. Wildfire susceptibility is mostly determined by the distance to urban areas, distance to roads and slope, while the TWI and elevation are the least important factors (Figure 10b). Comparing the two hazards, the distance to roads has great importance for both the landslide and wildfire susceptibility assessments. Additionally, the distance to rivers, precipitation and slope also contribute strongly to both hazards. However, elevation and the TWI are of medium importance for landslides, while these two factors have the least importance for wildfires.

Discussion
Landslide and wildfire susceptibility mapping is an active research topic. It is critical for the prediction and reduction of future landslide and wildfire occurrences. The primary purpose of the present study was to produce satisfactory hazard susceptibility maps for SEA using ensemble machine learning approaches. This section discusses the factor contributions for the two hazards, the model comparison and the sampling methods for nonhazard samples.

Contribution of Driving Factors
The analysis in this work showed that the distance to roads, distance to faults and precipitation were the most effective factors in determining the landslide potential in SEA. The relative importance of the conditioning factors in landslide modeling depends on the characteristics of the study area [91]. The landslide inventory consisted of rainfall-induced landslides, so precipitation played a relatively important role in the landslide susceptibility modeling. There are various external triggers for landslide occurrences, such as heavy rain, earthquakes, volcanic activity, deforestation, road construction and other natural or human-made processes [8,30,92,93]. Landslides are generally regarded as natural occurrences, but studies have shown that they are frequently initiated by anthropogenic activities [29,30,94].
Many natural factors govern fire occurrences, such as fuel type, topography, vegetation, climate and drought [13,57,95]. Besides natural triggers, most of the fires in SEA are initiated by humans [27,57]. As the relative importance shows (Figure 10), the distance to roads plays a major role in both landslides and wildfires. Moreover, the distance to urban areas is the most important factor for wildfires. Both factors are related to human activities and have a direct relationship with susceptibility (Figures S1 and S2), which may indicate that human activity affects the occurrence of landslides and wildfires.
Vast regions in SEA are undergoing a transformation process due to human interference [12], and these changes may create a hazard-prone environment. Anthropogenic factors are crucial for landslide and wildfire occurrences.

Comparison between the Ensemble Machine Learning Methods
The applied approaches, RF, GBDT and AdaBoost, are prevailing ensemble machine learning algorithms in the field of data mining [17,30,96]. The results showed that the RF model outperformed the other two models in both landslide and wildfire susceptibility modeling, and the strong potential of the RF model has been supported by other related research [8,49,90]. AdaBoost, GBDT and RF are all tree-based algorithms that combine several single classifiers to improve accuracy. The differences between RF and the other two models lie mainly in the framework and procedure [77]. RF is based on the Bagging idea, while GBDT and AdaBoost belong to the Boosting family [35]. This may indicate that the Bagging strategy is more suitable for landslide and wildfire susceptibility modeling in SEA. In general, RF can produce more accurate results for landslides and wildfires in SEA. The RF algorithm is believed to be the most suitable ensemble machine learning model when dealing with classification issues [77].
In addition, RF demonstrates robust and accurate performance on complicated data containing noisy variables [75]. The RF model can maintain high accuracy in the presence of outliers in the predictors. The randomness of RF is reflected in two aspects: one is the random selection of training samples, and the other is the random selection of the split attribute at each node [17]. Considering the model performance and efficiency, RF is suitable for the susceptibility assessment in SEA.
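The two sources of randomness mentioned above can be sketched as follows. This is only an illustration of the sampling steps, not a Random Forest implementation: the sample size, the seed and the square-root rule for the number of candidate features are assumptions, while 18 is the total number of predictors reported for the geo-database.

```python
import random


def bootstrap_sample(n_samples, rng):
    """Draw n_samples row indices with replacement (the Bagging step)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def candidate_features(n_features, k, rng):
    """Pick k distinct feature indices to evaluate at one tree node."""
    return rng.sample(range(n_features), k)


rng = random.Random(0)  # fixed seed so the illustration is repeatable
n_samples = 10  # toy sample size, not the size of the real inventory
n_features = 18  # total predictors in the geo-database
k = int(n_features ** 0.5)  # sqrt rule of thumb: an assumption, not from the paper

rows = bootstrap_sample(n_samples, rng)
out_of_bag = sorted(set(range(n_samples)) - set(rows))
features = candidate_features(n_features, k, rng)
```

Each tree would be trained on its own `rows`, and a fresh `features` subset would be drawn at every split; the rows left in `out_of_bag` are not seen by that tree.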

Comparison of Different Sampling Strategies
Two techniques have been applied in the literature to generate nonhazard data as negative samples [64]: (1) the buffer-controlled sampling (BCS) method and (2) the target space exteriorization sampling (TSES) method. BCS is a widely used method for sampling absence data for hazard susceptibility [97,98]. In BCS, the negative samples are randomly generated in a buffer zone that lies a certain distance away from the hazard locations. In this work, we devised four strategies for sampling the negative samples according to the BCS method and the two-buffer method proposed by Zhu et al. [99] (Table 4). In the two-buffer method, the inner buffer around a hazard location serves as a round polygon simulating the area where the hazard happens, and the outer buffer surrounding the hazard point delimits the area in which nonhazard points are sampled. Random points were sampled between the inner and outer buffers as nonhazard observations.
The ACC, Precision and AUC were used to evaluate the model performance under the proposed sampling methods. As shown in Table 5, strategy IV always had the best accuracy, regardless of the model, for both landslides and wildfires. Using different buffer distances to generate negative samples influences the results differently depending on the model and the hazard type. Nowicki Jessee et al. [100] tested a range of buffer radii and found that their model was not sensitive to the buffer radius. Our results differed, which may be due to differences in the models and in the extent of the study area.
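The two-buffer idea can be sketched as drawing random points from the ring between the inner and outer buffers. The sketch assumes planar (projected) coordinates in metres; the radii, the rejection of points that fall inside another hazard's inner buffer and all function names are illustrative assumptions rather than the settings of the four strategies in Table 4.

```python
import math
import random


def sample_between_buffers(x, y, inner_radius, outer_radius, rng):
    """Draw one point uniformly from the ring between the two buffers."""
    # Sampling r^2 uniformly keeps the point density even over the ring area.
    r = math.sqrt(rng.uniform(inner_radius ** 2, outer_radius ** 2))
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return x + r * math.cos(theta), y + r * math.sin(theta)


def sample_negatives(hazards, inner_radius, outer_radius, rng, max_tries=100):
    """One nonhazard point per hazard, outside every hazard's inner buffer."""
    negatives = []
    for hx, hy in hazards:
        for _ in range(max_tries):
            px, py = sample_between_buffers(hx, hy, inner_radius, outer_radius, rng)
            clear = all(
                math.hypot(px - ox, py - oy) >= inner_radius for ox, oy in hazards
            )
            if clear:
                negatives.append((px, py))
                break
    return negatives


# Two made-up hazard locations and hypothetical buffer radii in metres.
rng = random.Random(42)
hazards = [(0.0, 0.0), (1500.0, 0.0)]
negatives = sample_negatives(hazards, 500.0, 2000.0, rng)
```

Varying `inner_radius` and `outer_radius` is one way to explore how sensitive a model is to the buffer distances, which is the question the sampling strategies address.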
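For reference, the three measures can be computed from binary labels and predicted scores as in the sketch below. The labels, scores and the 0.5 decision threshold are made-up values for illustration, and the AUC is computed with the pairwise ranking formulation, which is equivalent to the area under the ROC curve.

```python
def confusion(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels coded as 1 and 0."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """ACC = (TP + TN) / all samples."""
    tp, fp, tn, fn = confusion(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def precision(y_true, y_pred):
    """Precision = TP / (TP + FP); 0.0 if nothing was predicted positive."""
    tp, fp, _, _ = confusion(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0


def auc(y_true, scores):
    """Share of (hazard, nonhazard) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = hazard, 0 = nonhazard) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc_value = accuracy(y_true, y_pred)
precision_value = precision(y_true, y_pred)
auc_value = auc(y_true, scores)
```

ACC and Precision depend on the chosen threshold, whereas the AUC depends only on how the scores rank hazard against nonhazard samples.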

Limitations and Future Works
The present paper also encountered some limitations. In this work, the ensemble machine learning methods were run 30 times (see Figures S3-S8 for the ROC plots), using a different combination of training and testing sets each time. The final susceptibility maps from the RF model were the average of the results. The reliability was assessed by intersecting the susceptibility map with the uncertainty map obtained from the CV. Most of the regions have low uncertainty for both landslides and wildfires (Figure 9), and all the highly susceptible regions have low uncertainty. However, there are still a few areas of low susceptibility with high uncertainty, especially for wildfires. Reducing the uncertainty in those areas deserves further effort. In addition, the CV values could not account for uncertainties arising from the conditioning factors [53].
SEA has a tropical climate, which is mostly hot and humid with high temperatures and high annual precipitation [55,56]. Intense precipitation can cause landslides, and high temperatures can provide a dry environment for fire occurrences. SEA has often been affected by weather-related natural disasters [55]. Since climate change can intensify extreme weather, weather-related hazards may become more frequent in the future, and the variability of the climate will pose a great challenge for susceptibility modeling. It is essential to know the spatial distribution of susceptible areas for disaster prevention. Climate change should be considered when assessing the regional susceptibility to landslides and wildfires. In addition, wildfires are influenced by vegetation types and vegetation phenology, which are governed by the seasons [101]. Vegetation types should be considered when assessing wildfire susceptibility in future works. Models of wildfire susceptibility have been found to perform differently in different seasons [101]. Thus, the seasonality of wildfires should also be considered in future works. Furthermore, wildfires adversely affect landslide susceptibility, as they can increase the predisposition of the terrain to instability [37]. Future works should examine the deeper relationship between landslides and wildfires.

Conclusions
This paper developed landslide and wildfire susceptibility models for SEA using three well-accepted ensemble learning methods (AdaBoost, GBDT and RF).
(1) This research compared the model performance using various measures and found that RF is the best model for both landslide and wildfire susceptibility modeling and mapping. The separate susceptibility maps for landslides and wildfires were then generated using the best-performing RF model, in which the majority of actual hazard points fell within the very highly susceptible areas.
(2) The resulting maps of each hazard were overlaid to develop the intersection map, and the regions that were highly susceptible to both landslides and wildfires accounted for a small portion.
(3) The CV was used to evaluate the uncertainty of the spatial distribution of landslide and wildfire susceptibility. In general, the uncertainty was low, and there was no high-level uncertainty in the highly susceptible areas for either landslides or wildfires.
(4) The factor importance analysis showed that the distance to roads and distance to faults were, relatively, the two most important factors for landslide susceptibility. For wildfires, the distance to urban areas was the most important, followed by the distance to roads and slope.
This study focused on the uncertainty of the susceptibility maps, while the uncertainty of the conditioning factors was not considered. The model performance needs to be improved to obtain higher accuracy and lower uncertainty in the future. Additionally, the deeper relationship between landslides and wildfires should be studied in future works. The highly reliable landslide and wildfire susceptibility maps provide an understanding of the two hazards in SEA and may therefore help to raise awareness of the highly susceptible zones. Susceptibility maps can help to identify regions that are susceptible to landslides and wildfires, and the results can be useful for land use planning and disaster prevention and mitigation.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/rs13081572/s1, Figure S1: The relationship between the conditioning factors and landslides. Figure S2: The relationship between the conditioning factors and wildfires. Figure S3: ROC curves for landslide susceptibility using AdaBoost. Figure S4: ROC curves for wildfire susceptibility using AdaBoost. Figure S5: ROC curves for landslide susceptibility using GBDT. Figure S6: ROC curves for wildfire susceptibility using GBDT. Figure S7: ROC curves for landslide susceptibility using RF. Figure S8: ROC curves for wildfire susceptibility using RF. Table S1: Multicollinearity analysis of the landslide conditioning factors. Table S2