Landslide Susceptibility Mapping Using Statistical Methods along the Asian Highway, Bhutan

: In areas prone to frequent landslides, the use of landslide susceptibility maps can greatly aid in the decision-making process of the socio-economic development plans of the area. Landslide susceptibility maps are generally developed using statistical methods and geographic information systems. In the present study, landslide susceptibility along road corridors was considered, since the anthropogenic impacts along a road in a mountainous country remain uniform and are mainly due to road construction. Therefore, we generated landslide susceptibility maps along 80.9 km of the Asian Highway (AH48) in Bhutan using the information value, weight of evidence, and logistic regression methods. These methods have been used independently by some researchers to produce landslide susceptibility maps, but no comparative analysis of these methods with a focus on road corridors is available. The factors contributing to landslides considered in the study are land cover, lithology, elevation, proximity to roads, drainage, and fault lines, aspect, and slope angle. The validation of the method performance was carried out by using the area under the curve of the receiver operating characteristic on training and control samples. The area under the curve values of the control samples were 0.883, 0.882, and 0.88 for the information value, weight of evidence, and logistic regression models, respectively, which indicates that all models were capable of producing reliable landslide susceptibility maps. In addition, when overlaid on the generated landslide susceptibility maps, 89.3%, 85.6%, and 72.2% of the control landslide samples were found to be in higher-susceptibility areas for the information value, weight of evidence, and logistic regression methods, respectively. From these ﬁndings, we conclude that the information value method has a better predictive performance than the other methods used in the present study. The landslide susceptibility maps produced in the study could be useful to road engineers in planning landslide prevention and mitigation works along the highway. presents a comparative performance of the IV, WOE, and LR models in developing landslide susceptibility maps along a road corridor in Bhutan. The LSM assessment on the studied road corridor showed the probability of landslide occurrence in the southern mid-region The inﬂuence of each class of causal factor is evident from the IV and models. The highest contributors from


Introduction
A landslide is defined as "the movement of a mass of rock, earth, or debris down a slope", and they are classified according to the type of slope movement (fall, topple, spread, flow, slide), type of material involved (rock, earth, debris), and the speed of movement [1,2]. Even though gravity is the essential contributor to the occurrence of a landslide event, other triggering factors, such as an earthquake, rainfall, flood, or human intervention [3], significantly increase the likelihood of landslide occurrence. Landslides constitute a major geological hazard and pose considerable risks to the livelihood and lives of

Study Area
For this paper, a road corridor of 80.9 km, averaging 5 km in width, along the Asian Highway-48 (AH-48) stretch from Phuentsholing to Chukha Thegchhen zam (bridge), covering an area of 237.28 km 2 , was considered (Figure 1a). The AH-48 is a major trade route connecting the capital city of Thimphu and rest of Bhutan to India, its main economic partner. The study area is located between 26 • 50 34 N and 27 • 5 20 N latitude and 89 • 23 29 E and 89 • 33 12 E longitude. The highway ascends in altitude from 216 to 2159 m above Mean Sea Level (MSL) over a distance of 40 km from Phuentsholing ( Figure 1b) and runs through a geological formation consisting of moderate to highly weathered phyllites [44,45]. During the monsoon season with an average rainfall of 1663.4 mm per year [46], this section of road is subjected to frequent landslides of varying magnitude at a number of locations ( Figure 2).
Often, the landslides at a site along the road corridor are shallow in nature and occur mainly during the monsoon season [47]. These landslides are caused due to deforestation and toe cutting on the fragile slope during the road construction process. The landslides at Sorchen and Jumja [48] are most critical, necessitating realignment of the road length as an avoidance strategy in 1992 [44,49]. In 2017, the highest 24-h rainfall of 285.4 mm in the study area [50] triggered a major landslide at three sites, leading to road closure for many days. This affected the transport of goods and people between Phuentsholing and other parts of Bhutan, causing major economic losses and disrupting everyday life.

Landslide Inventory
Landslide distribution-or inventory mapping-is the fundamental information required in determining the size and features of a landslide [3,51]. Since a landslide inventory in Bhutan is not available, the technical report of the Bhutan Land Cover Assessment 2010, National Soil Services Center (NSSC), and Policy and Planning Division (PPD) [52] was used as the primary guide for field visits, as well as digitized satellite imagery from the publicly available data source Google Earth. From this report, in which a class of "degraded land" with subclass "landslide" showed landslide areas, and by interpreting Google Earth images, 120 landslides totaling an area of 2.77 km 2 were identified. This was obtained by digitizing landslides ranging from 122 to 304,625 m 2 area in the Google Earth environment and verified with news reports and field visit. Sixty percent of landslides by area were aligned with the Ministry of Agriculture and Forests, Royal Government of Bhutan (MoAF, RGoB) [52] report, while the rest were obtained from Google Earth images, which were likely more recent and active landslides areas. Figure 3 shows the distribution of landslides in the study area.

Causative Factor Selection
In the study, the slope failure susceptibility due to underlying causative factors [8] was considered. Land cover, lithology, elevation, proximity to roads, drainage, fault lines, slope aspect, and slope angle were considered as causal factors based on literature [53] and the availability of data for the study area. Each causal factor was mapped and divided into the several equal interval classes described in the legend in Figure 4.
The percentage of landslides against each class of causal factor is illustrated in Figure 5. The geomorphic factors of elevation, slope aspect, and slope angle were derived from corrected DEM of 30 m resolution obtained from the Ministry of Work and Human Settlement, Bhutan. The highway altitude climbs from 216 to 2159 m above MSL over a distance of 40 km (Figure 1b) with diverse climatic conditions, hydrology, and geology. Hence, an elevation map with eight classes of 300 m intervals was produced. The slope aspect indicates the saturation of the slope with water and heat, which affects soil, rock, and vegetation types [11]. The south-facing aspect has a higher frequency (30%) of landslides in the nine proposed classes. The angle of slope in degrees was divided into six classes of equal intervals of 10 • , with 26.59% of landslides falling in the slope angle class of 30 • -40 • . The lithological and fault details were digitized from a 1:50,000 geological map of Bhutan from Long et al. [45].
The study area includes eleven classes of lithology under the three main zones of the greater Himalayan zone, lesser Himalayan zone, and Paro formation, and seven classes of distance to faults. The lithology consists mainly of amphibolite, quartzite, schist, slate, phyllite, marble, paragneiss, limestone, dolostone, and granite dating to the Paleoproterozoic to Ordovician Era [45]. The details of the lithology and the ages of geological formations are given in Table 1. The number of landslides is relatively higher in the Phuentsholing formation (Pzph) region, where the more commonly found lithologies are slate and phyllite. The AH-48 highway is routed over three active main faults: Shumar thrust (ST), Main Central thrust (MCT), and the Southern Tibetan Detachment (STD), plus a few other minor fault lines and folds [45,54]. Landslide density is significantly higher closer to fault lines, with 35% of overall landslides being within 0-500 m of them. Furthermore, one of the major contributors to landslides is land cover [55,56]. Vegetation aids in protecting a slope whereas bare soil or sparse vegetation agitates the occurrence of a landslide [57,58]. Land cover maps from the MoAF/RGoB [52] report were used to derive seven simplified broad land cover categories to meet the aim of the study. Traffic intensity and the cutting of steep slopes during road construction are another factor influencing landslide occurrence [59][60][61]. The proximity of drainage streams also results in saturating the area and causing landslides [62]. Drainage and road maps were acquired from the Bhutan Geospatial portal website [63], and thematic layers were created with six equidistant buffers of 100 m each.   Table 1, and (h) land cover.

Methodology
An inventory of 120 landslide areas and various factors, such as land cover, lithology, elevation, proximity to roads, drainage, and fault lines, slope aspect, and slope angle, was created and converted into 30 × 30 m grid cells in ArcGIS 10.6 to suit the DEM resolution. An 80% training sample equally proportioned between the landslide and non-landslide pixels was generated randomly and was used to create the spatial associations between occurrences of landslides and the causal factors [64][65][66]. The control sample of 20% was used to validate and verify ( Table 2) the accuracy of the models. The three methodological approaches of IV, WOE, and LR were used to derive the relationship between causal factors and landslide occurrence. The landslide susceptibility indices (LSIs) were generated to produce an LSM. LSIs were divided into the five classes of very low, low, moderate, high, and very high susceptibility using the Jenks natural break classification method [24,53,67]. The Jenks natural break classification method is used to determine the arrangement of values into different classes by minimizing and maximizing each class's deviation from the class mean and other groups' means, respectively [12,42,68]. To validate the performance of the methods, the area under the curve (AUC) of the receiver operating characteristic (ROC) was calculated. The control sample was overlaid on the LSM to examine the predictive capability for future landslides. Additionally, a correlation test was performed to assess the level of similarity between LSMs produced using the models. IBM Statistical Product and Service Solutions 25 (SPSS 25) was used for data management and validation. The flowchart in Figure 6 shows the data sources, thematic layers, and methodology applied in the study.

Information Value Method
The IV model, a simple statistical method for mapping areas vulnerable to landslides by determining the influence of each class of causal factors on landslide occurrence, was found suitable in a number of studies and has been implemented in numerous landslide hazard assessment studies [28,29,42,69]. In this model, the information value I i for a class i in a thematic layer considering 80% as the training sample is given by: where S i represents the number of pixels of the class containing a landslide, N i represents the total number of pixels in the class, S is the total number of pixels with a landslide in the layer, and N is the total number of pixels in the layer. After deriving the information values for each class of causal factor, raster maps were overlaid in a GIS environment. The LSIs were determined by averaging the information value of the causal factors as: where M is the number of factors considered. The total information value of factors contributing to landslide occurrence is obtained from Equation (2). The influence of these factors on landslide occurrence is lesser if the LSI value is lower, and vice versa.

Weights of Evidence Model
The WOE method based on Bayes' theorem was used to find the spatial association between the location of a landslide and a set of contributing factors. Numerous studies have been conducted using WOE, emphasizing its theoretical background and application. A detailed mathematical description and formulation can be found in the literature [61,70,71]. The weights of the causal factor for the presence and absence of a landslide are determined by: where W + and W − are the WOE when a binary predictor factor is present or absent, respectively, P is the probability, B is the presence of the desired class of landslide causal factor, B is the absence of the desired class of landslide causal factor, D is the presence of a landslide, and D is the absence of a landslide. A simplified form of Equation (3) above can be expressed as [41]: where LS in % and nonLS in % are the proportions of landslide pixels and non-landslide pixels in the class, respectively. LS out % and nonLS out % are the proportions of landslide pixels and non-landslide pixels outside the same class, respectively. W + implies a positive correlation and W − indicates a negative correlation between the causal factor and the occurrence of a landslide [40,41,70]. The weight contrast is given by C = (W + − W − ), and its magnitude represents the measure of spatial association between the class of the causal factor and the occurrence of a landslide [72][73][74]. The LSI is derived by aggregating the contrasts of all the factors, with higher values of LSI indicating a higher likelihood of landslide occurrence.

Logistic Regression Model
An LR model is a regression analysis technique used to determine the likelihood of a binary dependent variable from several independent variables. It is used to predict the presence of an outcome based on the values of predictor variables. The method does not require normally distributed data, and the variable can be continuous, discrete, or any combination of both types [75].
In order to perform the LR method on a sample, it is necessary to check for collinearity between the variables [40,76]. If the tolerance (TOL) <0.1 and variance inflation factor (VIF) >10, this indicates multicollinearity between independent variables [77]. The dependent variable (landslide variable) and the independent variable (causal factor) were modeled using the LR application in SPSS 25 to determine the coefficient of each factor. The 80% training sample consisting of an equal proportion of landslide pixels and causal factor pixels was imported into SPSS. The frequency ratio of each class of causal factor was derived as the percentage of landslides against the percentage of the area of the class. The logistic function was applied to causal factors to constrain the values between 0 and 1, where zero indicates the probability of 0% landslide occurrence and one indicates the probability of 100%, according to the equation: where P is the probability of landslide occurrence, and z is the landslide causal factors assumed to be a linear combination of the causal factors X i (i = 1, 2, 3, . . . , n), where and B i is the regression coefficient of landslide causal factors.

Validation of the Models
The AUC-ROC graphically illustrates the performance of a binary classifier for the false positive rate (1-specificity) against the true positive rate (sensitivity). The AUC represents the performance of the success rate and prediction of the model against the occurrence of a landslide [78]. A landslide predicted in an existing landslide area is a true positive outcome, whereas a landslide predicted in a non-existing landslide area is a false positive [79]. Rasyid et al. [12] defined the diagnostic values as "0.5-0.6 (fail), 0.6-0.7 (poor), 0.7-0.8 (fair), 0.8-0.9 (good), and 0.9-1.0 (excellent)" in the AUC curve. The AUC success rate of each model was derived for the LSI computed for each model and the training samples, while the rate of prediction was verified using a control sample. In addition, the training and control samples were overlaid on the LSMs to assess predictive performance. The training sample covered a small area of higher susceptibility classes, whereas the landslide data would lie in higher classes when overlaid on the LSM [12,66]. Correlation coefficients ranging from 0 (no correlation) to 1 (strong relationship) were derived from the correlation analyses between a pair of LSMs developed from the IV, WOE, and LR models.

Results
Three LSMs were produced using the IV, WOE, and LR models with 80% training samples. The detailed calculations and values of each class of causal factors derived using the three methods are given in Table A1. In each of the models, the LSIs of the classes under different causal factors corresponding to each pixel in the map were divided into five susceptibility levels and mapped as shown in Figure 7.

Information Value Method
The information values of each class under causal factors were determined using Equation (1) and by overlaying the causal factor map on the landslide distribution map. As indicated in Table A1, the IV is directly proportional to the slope angle. A slope angle of >50 degrees had the highest IV value of 0.483 in the class. All southern exposure aspects had higher IV in classes, with a maximum IV of 0.267 on the southern aspect. The 300-600 m elevation class had the highest IV, followed by the 0-300 m and 600-900 m elevation classes. The least IV value was at a higher altitude of >2100 m. The highest IV for the distance to road factor was 0.141 in the 200-300 m class, with classes over 300 m having lower IV values. In contrast to distance to road, the distance to drainage factor showed no specific relationship of its proximity with the occurrence of a landslide. However, the IV value for the closest distance class, 0-100 m, was 0.071, which indicates a higher likelihood of occurrence of landslides within the class, but all other classes had insignificant IVs. As shown in Table A1, the proximity to a fault line suggests a greater likelihood of landslide occurrence. The 0-500 m class had the highest IV of 0.285, whereas the >3000 m class had the lowest value of −0.657. For lithology factors, Pzph had the highest IV of 0.515, followed by PCs with 0.382 and pCp with 0.314, suggesting a greater probability of landslide occurrence in these classes. Among all factor classes, the land cover bare area class had the highest IV of 1.562, and, as expected, the forest and built-up area class had the minimum IV. To create the LSI, the IV of the classes under different causal factors corresponding to each pixel in the map was derived using Equation (2).

Weights of Evidence Model
In this model, the weights and contrast were determined using Equations (5) and (6)

Logistic Regression Model
The TOL and VIF calculated for this study (Table 3) were more than 0.39 and less than 2.6, respectively, indicating that there was no multicollinearity between variables. We could, therefore, use all the variables for LR analysis. The forward step-wise LR analysis shows that all factors with an estimated logistic coefficient have a significance value of less than 0.05. This indicates that all the independent variables have an influence on the landslide occurrence variable (Table 4), and, therefore, all causal factors were considered for analysis.  The highest regression coefficient was computed for land cover at 3.882, followed by aspect, lithology, slope angle, distance to fault lines, elevation, distance to road, and distance to drainage. The Z value or LSI was computed in a GIS environment using Equation (8), as follows:

Validation
For the IV, WOE, and LR models, the AUC-success rates were 0.873, 0.872, and 0.866, while the prediction rates were 0.883, 0.882, and 0.880, respectively. From this, we conclude that all the models are suitable methods for generating LSMs (Figure 8), although the IV model shows a slightly better predictive performance.  Figure 9 shows that the overlaid training sample covered 33.2%, 27%, and 16.2% of the study region for IV, WOE, and LR, respectively, in the higher susceptibility classes. When overlaying the training landslide and control landslide samples on the LSM, the area coverage above high susceptibility was 87.4% and 89.3% in the IV model, 84.1% and 85.6% in the WOE model, and 69.6% and 72.2% in the LR model. The IV and WOE models appear to be more reliable, since their values have lesser disparity. From the correlation analysis, the correlation coefficients at a significance level of 0.01 (two-tailed) were 0.699 or greater, with the highest coefficient of 0.845 between WOE and LR (Table 5).

Discussion
LSMs were generated from the IV, WOE, and LR models considering the relationship between causal factors and landslides. The LSMs were compared for predictive performance using AUC-ROC by overlaying the training and control samples on LSMs and performing correlation coefficient tests. Higher susceptibility was observed in the extreme southern and northern parts of the focus area, where human settlements, steeper slopes, and more fault lines can be found. Factors such as roads and landcover were found to have a high correlation with the occurrence of landslides, as found by [80]. From the LR model, it was found that the land cover contributes the most to landslide occurrence out of all the factors, followed by aspect, lithology, slope angle, distance to fault lines, elevation, distance to road, and distance to drainage. As shown in Table A1, the most significant contributors to landslide occurrence from each factor were slope angle of >50, southern exposure aspect, 300-600 m elevation, 200-300 m distance to road, 0-100 m distance to drainage, 0-500 m distance to fault lines, lithology group Pzph (Phuentsholing formation), and bare land cover.
The northern region of the study area had the greatest slope angle, and the lowest slope angle was in the south (Figure 4). Landslide occurrence was directly proportional to the slope angle, which agrees with the general conclusion of expecting fewer landslides on gentler slopes because of lower associated shear stresses [11]. This also conforms to the findings of other research [81]. The south, southeast, and southwest had the highest LSIs in aspect classes in the present study. This may be due to the high humidity in the south-facing aspect [11]. Field knowledge also dictates that south-facing slopes are warm, wet, and forested. Most of the study area falls within the elevation class of >2100 m above mean sea level, which is less susceptible to landslide occurrence. In our study, we observed that elevation classes 300-600 m and 0-300 m had a higher influence on landslide occurrence, which gradually decreased as elevation increased. Other studies have also demonstrated similar findings [82]. The stability and resistance to weathering processes of rocky cliffs at higher altitudes may explain this phenomenon [60]. All classes with a distance less than 300 m from the road had a higher influence on landslide occurrence than distances over 300 m. However, the highest LSI was in the 200-300 m class rather than a distance closer to a road. This causal factor contributes to landslides when the slope is eroded from the bottom of the slope [83]. Various studies have shown different patterns depending on the interval of the class. Some studies have shown higher susceptibility closer to roads [81,84,85], while others presented similar findings to the current study [78,86,87]. The results for the occurrence of landslide and distance to drainage suggest that the 0-100 m class is the main contributor. The influencing factor here could be the degree of saturation of material in the area and streams eroding the slopes [25]. The study area has three main fault lines in the south and one fault line in the north. Under the distance to faults factor, the 0-500 m class has the most impact on the probability of landslides, while other classes contribute comparatively less. A landslide occurs closer to a fault because of fractures in the rock masses [36,88], as concluded in other studies [69,89]. The influence of lithology on landslide occurrence is highest where the class consists of phyllite, slate, and schist, with the highest value of LSI [40]. From the results, the bare area land cover class has the highest influence on the occurrence of landslides in the classes considered. Bare areas lacking vegetation are more prone to landslides because vegetation helps prevent erosion through an anchoring effect [90].
The AUC of 0.8 and higher suggests that all three models performed well and produced reliable LSMs. The IV model had the highest AUC, while the WOE and LR models had slightly lower values. An LSM developed for the Zigui-Badong area near the Three Gorges Reservoir in China found that the IV model performs better than the LR model, although the deviation was small [91]. Ozdemir and Altural [40] have also reported that the WOE model has a higher AUC than the LR model. By contrast, a study conducted in the city of Mizunami, Japan found that the LR model has a better predictive performance than the WOE model [84]. The correlation analysis indicated that the LSM generated from the WOE and LR models had similar distribution patterns, whereas the LSM by the IV method was less similar to other methods ( Table 5). When landslide pixels were overlaid onto the LSM, it showed that the LSM produced with the IV model was better, since it predicted a higher percentage of landslides of higher susceptibility in both the training and control groups (Figure 9). Figure 10 shows some of the existing landslide photographs overlaid onto the LSM produced with the IV method, with most of the landslides lying in areas with high landslide susceptibility.

Conclusions
This paper presents a comparative performance of the IV, WOE, and LR models in developing landslide susceptibility maps along a road corridor in Bhutan. The LSM assessment on the studied road corridor showed that the probability of landslide occurrence is greater in the southern area, with the mid-region being more stable. The influence of each class of causal factor is evident from the IV and WOE models. The highest contributors to landslide occurrence from each factor were slope angle of >50, southern exposure aspect, 300-600 m elevation, 200-300 m distance to road, 0-100 m distance to drainage, 0-500 m distance to fault lines, lithology group Pzph (Phuentsholing formation), and bare area in land cover. As a whole, the LR model has greater advantage in identifying the influence of causal factors on the probability of a landslide. In order from most to least influential, the factors were: land cover, aspect, lithology, slope angle, distance to fault lines, elevation, distance to road, and distance to drainage.
The LSMs generated by the three methods were evaluated for suitability by deriving the AUC and overlaying the actual landslide map on the LSM. The correlation coefficients for all the models were greater than 0.699, indicating strong correlation. The WOE and LR models were determined to be similar with a correlation coefficient of 0.85. The validation showed the success rate and prediction rate of all the models to be suitable. The comparative test of the models showed that the IV model was better than the WOE and LR models. We therefore recommend the IV as the most suitable method to predict future landslides. However, its performance can be improved by using higher-resolution images and large-scale maps.
We conclude that the methods considered in the present study can be suitably performed to study areas in Bhutan to develop an accurate and reliable LSM. The LSM thus developed may be used to support planning and decision-making for landslide prevention and mitigation activities during the construction, operation, and maintenance of the road network in Bhutan.
Funding: This research received no external funding.