Landslide Detection and Susceptibility Mapping by AIRSAR Data Using Support Vector Machine and Index of Entropy Models in Cameron Highlands, Malaysia

Since landslide detection that combines AIRSAR data with GIS-based susceptibility mapping has rarely been conducted in tropical environments, the aim of this study is to compare and validate the support vector machine (SVM) and index of entropy (IOE) methods for landslide susceptibility assessment in the Cameron Highlands area, Malaysia. For this purpose, ten conditioning factors were prepared and landslides were detected using AIRSAR data, WorldView-1 and SPOT 5 satellite images. A spatial database was generated containing a total of 92 landslide locations, obtained by matching observed and detected landslides, which was divided into training (80%; 74 landslide locations) and validation (20%; 18 landslide locations) datasets. The root mean square error (RMSE) between the observed and detected landslides was only 16.3%, which is fairly acceptable. The validation process was performed using statistical measures and the area under the receiver operating characteristic curve (AUROC). The validation results indicated that the SVM model had the highest sensitivity (88.9%), specificity (77.8%), accuracy (83.3%), Kappa (0.663) and AUROC (84.5%), followed by the IOE model. Overall, the SVM model applied to detected landslides is considered a promising technique that could be tested and used for landslide susceptibility assessment in tropical environments.
Remote Sens. 2018, 10, 1527; doi:10.3390/rs10101527 www.mdpi.com/journal/remotesensing


Introduction
Natural disasters, such as landslides, floods, earthquakes, hurricanes, soil erosion and tsunamis, cause enormous damage to property and human life. Among them, landslides are known as one of the most important natural disasters worldwide [1] and are responsible for at least 17% of all natural hazard fatalities [2].
In Southeast Asia, landslides are among the most common disasters because of the region's particular climatic conditions, mountainous terrain and socioeconomic circumstances [3]. Torrential rainfall, which produces heavy mudflows, is the main trigger of landslides and of the damage they cause in the Cameron Highlands area, Malaysia [4]. Pradhan et al. (2010) reported that during 2006-2009, numerous landslides in the Cameron Highlands were triggered by torrential rainfall, causing millions of dollars of property losses as well as many fatalities [4]. Although few landslides have occurred in residential areas, many landslides in the Cameron Highlands have occurred along roads and highways because of human interference (anthropogenic factors) combined with triggering factors such as heavy rainfall. This means that human activity has created the conditions for landslide occurrence by disturbing the stability of natural slopes (as opposed to artificial slopes) [5]. In recent years, numerous landslides and mudflows have occurred in the Cameron Highlands, resulting in enormous socio-economic damage. Landslide susceptibility, hazard and risk assessment is hampered by the lack of reliable landslide inventory maps. Landslide susceptibility assessment depends on accurate landslide information and on easily accessible, continuous risk data [6]. Therefore, accurate susceptibility mapping can provide key information for a wide variety of users [7]. In the Cameron Highlands, landslide mapping is difficult because dense vegetation covers landslides and the weather is frequently cloudy and rainy [8]. Consequently, it is essential to obtain reliable landslide susceptibility maps using accurate data and new techniques in tropical areas, for purposes such as implementing landslide mitigation measures [9].
In recent years, radar has added a new dimension to disaster management research by providing precise and real-time information [10]. Synthetic aperture radar (SAR) is an active remote sensing system that collects data day and night, regardless of weather conditions. SAR data have been applied in natural hazard research, either independently or in combination with data obtained from other remote sensing sensors [11,12]. The combination of optical and SAR data can also be used for geohazard identification and susceptibility mapping and is especially popular in landslide studies [13][14][15][16][17]. Remote sensing is the foundation of landslide inventory maps and related thematic maps. Previous studies have demonstrated the potential of remote sensing data for extracting causal factors and identifying landslide-prone areas [18][19][20][21].
There is no standard procedure for producing landslide susceptibility maps [22]. Recently, remote sensing data, together with data from other sources and highly developed geographic information systems (GIS), have made it possible to prepare the various thematic layers responsible for landslide occurrence in a region [23][24][25][26][27]. During the last few decades, the feasibility and effectiveness of GIS and remote sensing technologies for landslide susceptibility modeling have been demonstrated [28][29][30][31][32]. Currently, a variety of GIS-based methods that require less input data are being used for landslide susceptibility modeling.
Earlier landslide studies in the Cameron Highlands have taken three different forms: (i) landslide detection using remote sensing data only [8,32]; (ii) landslide susceptibility mapping [4,5,40]; and (iii) landslide detection using remote sensing data combined with susceptibility mapping using other techniques [30]. For example, Shahabi and Hashim (2015) detected landslides in the Cameron Highlands and prepared a landslide susceptibility map using three multi-criteria decision-making models: the analytical hierarchy process (AHP), weighted linear combination (WLC) and spatial multi-criteria evaluation (SMCE) [30]. Although landslide detection from remote sensing data and susceptibility mapping have each been studied individually in various case studies, few studies have considered both in tropical areas such as the Cameron Highlands. Therefore, the current study differs from earlier studies in the Cameron Highlands in its use of remote sensing data, namely AIRSAR (C-, L- and P-band images) and optical satellite images, for landslide detection to obtain an accurate landslide inventory map, and in its application of a machine learning algorithm (SVM) and a bivariate statistical model, the index of entropy (IOE), for landslide susceptibility mapping of the Cameron Highlands.

Description of the Study Area
The study area is located between latitudes 4°24′37″N and 4°33′19″N and longitudes 101°20′21″E and 101°26′50″E, covering an area of 38.4 km² (Figure 1). The geomorphology of the area is characterized by rugged topography, with hill ranges from 840 to over 2100 m a.s.l. The Bertam and Telom Rivers are the main drainage features in this area; their valleys and tributaries mainly flow from north-northwest to south-southeast [40]. Annual rainfall ranges from 2500 to 3000 mm, falling mostly in March and May and in November and December. The average daytime and nighttime temperatures are 24 °C and 14 °C, respectively, which places the area in a moderate climate category. The Cameron Highlands are usually cloud-covered throughout the year. Tropical forest, tea plantations and temperate vegetable and flower farms are the major land uses in the study area [4]. Geologically, megacrystic biotite granites are the most common rocks of the central mountain chain in Peninsular Malaysia, while schist, phyllite, slate and limestone form a significant part of the lithology of the Cameron Highlands [81]. Figure 2 shows the geological map of the study area. In the Cameron Highlands, most landslides have occurred when the maximum daily rainfall reached about 208 mm [82].

Landslide Inventory Map
It is difficult to map landslides in tropical mountainous environments because dense vegetation obscures landslides soon after they occur [8]. To obtain important information about landslide locations, remote sensing data, such as aerial photographs and optical satellite images (OSI), are required [19]. Landslide information derived from remotely sensed images is mainly associated with the vegetation, morphology and hydrologic conditions of the region [83]. In this study, digital aerial photographs (DAP) at scales of 1:10,000 to 1:50,000 covering a 25-year period, WorldView-1 satellite imagery (March 2013), AIRSAR data (November 2004), published reports and field surveys were interpreted to extract the landslide inventory map.
The black-and-white digital aerial photographs (acquired from the Malaysian Surveying and Mapping Department archives), with a spatial resolution of 0.54 m, were taken during 1981-2006. To detect the landslides that have occurred in the study area, six digital aerial images were used, and only one block adjustment was required; four stereo models were then formed.
WorldView-1 satellite data with a resolution of 0.46 m were used to detect landslides and to validate the landslide inventory map. The AIRSAR data (40 MHz, 10-km swath width, 5-m slant-range resolution) were collected in November 2004 during the PacRim1 campaign. A DEM (digital elevation model) with a resolution of 10 m, combined with the C-, L- and P-band images, was compared with landslide features obtained from the digital aerial photographs and WorldView-1 satellite imagery. Although the data sources had different scales, all data were mapped at a common resolution of 10 m × 10 m to remove the effect of scale on the detection and validation process. Previous studies have successfully used resolutions such as 10 m and 20 m for landslide detection and modeling [4,8].
Table 1 lists the main characteristics of the AIRSAR DEM and WorldView-1 satellite imagery used for landslide detection in the study area.
In the study area, dense forest canopies, cloudy and rainy weather conditions and harsh topography made comprehensive fieldwork impossible. Therefore, field investigations (ground control points, GCPs) were limited to easily accessible locations, such as along roads and highways, in residential areas and on low-elevation slopes. Figure 3 shows landslide types according to the landslide classification of Varnes et al. (1978) [84].
In order to identify landslides, three techniques were employed: (i) overlaying landslide vector layers onto the DEMs and AIRSAR raster images; (ii) classifying the images using ENVI 4.8 software; and (iii) separating landslides from other land cover types using the segmentation tool in eCognition software [85]. The spectral values of the C-, L- and P-bands and the average slope of the area were used in segmentation as information about the groups of pixels inside landslide boundaries. In addition, the quality of the optical satellite images and AIRSAR data was assessed using the root mean square error (RMSE) [85,86].
Note that the observed landslides are points, whereas the detected landslides are polygons. In this study, each polygon was converted to a point using the "polygon to point" tool in ArcGIS, representing a pixel of 10 m × 10 m. Consequently, the final landslide inventory map was converted to grid format with a cell size of 10 m. A total of 92 landslides were considered, of which 74 (80%) were selected for training the models and the remaining 18 (20%) were used for validation. Based on the landslide inventory, the landslide area in the Cameron Highlands is 6.27 km², accounting for 4.05% of the entire study area. The maximum, mean and minimum landslide areas are 0.123, 0.017 and 0.003 km², respectively. In addition to the landslide locations, 92 non-landslide (stable) locations were randomly selected using the "Create Random Points" tool in ArcGIS for SVM modeling in WEKA 3.7.12 software. These points were then randomly divided into 80% (74 locations) and 20% (18 locations).
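The conversion of detected landslide polygons to 10 m grid points and the random 80/20 split described above can be sketched as follows. This is a minimal illustration, assuming polygons are given as lists of projected (x, y) vertices in metres and using the area-weighted centroid; the ArcGIS tool used in the study may place points differently, and the function names, grid origin and random seed are hypothetical.

```python
import random


def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon [(x, y), ...] via the shoelace formula."""
    twice_area = 0.0
    cx = cy = 0.0
    n = len(vertices)
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


def to_cell(x, y, origin_x, origin_y, cell_size=10.0):
    """(row, col) of the grid cell containing (x, y); origin is the grid's top-left corner."""
    col = int((x - origin_x) // cell_size)
    row = int((origin_y - y) // cell_size)  # rows count downward from the top edge
    return row, col


def split_train_validation(items, train_fraction=0.8, seed=42):
    """Shuffle a copy of the items and split it into training and validation sets."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical usage: 92 landslide cell IDs split into training and validation sets.
train, validation = split_train_validation(range(92))
print(len(train), len(validation))  # 74 18
```

The same split function would be applied separately to the 92 non-landslide points, so that both classes keep the 74/18 proportion.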

Landslide Geodatabase
The DEM image was warped to the given ground control points in order to georeference the data. The image was then resized to match the sample area, and the pixel size was changed to a new resolution of 2 m. The resulting DEM heights of the whole sample area were then compared with heights from the DAP, DEM, GPS and GCPs in the assessment and correction processes. The sequence of procedures and the resulting images are shown in Figure 4. For the single C-band VV image, the file opened in ENVI 4.8 software was converted to sigma naught (σ°) and then to decibels (dB). This converted C-VV polarization data was then combined with the L- and P-band polarimetric data into a single file, which was masked, corrected with antenna pattern correction (APC) and georeferenced. The L- and P-band files were opened in ENVI 4.8 software using the POLSAR Tools menu. The Stokes matrix files were decompressed (synthesized) into six wavelength-polarization files: (1) L-band HH, HV and VV polarizations and (2) P-band HH, HV and VV polarizations. After processing, the synthesized L- and P-band data were combined with the converted C-VV data into a new single file [87].
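The last step of the C-VV conversion described above, from linear sigma naught to decibels, is the standard power-to-decibel transform. The sketch below assumes the σ° values have already been calibrated; the AIRSAR-specific calibration from raw data to σ°, done in ENVI in the study, is not reproduced, and the function name is hypothetical.

```python
import math


def sigma0_to_db(sigma0_values):
    """Convert linear backscatter coefficients (sigma naught) to decibels: 10 * log10(sigma0).

    Non-positive values (e.g. masked or no-data pixels) have no logarithm and become NaN.
    """
    return [10.0 * math.log10(v) if v > 0 else float("nan") for v in sigma0_values]


# Hypothetical calibrated sigma-naught values for four pixels.
print(sigma0_to_db([1.0, 0.1, 0.01, 0.0]))
```

A tenfold drop in linear backscatter corresponds to a 10 dB decrease, which is why the dB scale is convenient for comparing the C-, L- and P-band images.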

Landslide Conditioning Factors
It is important to extract relevant landslide conditioning factors to construct a spatial database [88]. In this study, ten conditioning factors, namely slope, aspect, soil, lithology, NDVI, land cover, rainfall, distance to fault, distance to river and distance to road, were used to construct a spatial database in ArcGIS. These conditioning factors are described in Table 2.
As mentioned above, a digital elevation model (DEM) with a 10-m pixel size was produced from the AIRSAR DEM (Table 2), from which slope, aspect and distance to river were extracted. The slope, aspect and distance to river factors were then classified into five, nine and seven classes, respectively, using the natural breaks classification scheme [30] (Table 3). Distance to fault and lithology were derived from the 1:63,300 scale geological map and classified into six and two classes, respectively (Table 2). Distance to road was calculated from the topographic map using 50-m buffer zones, determined from the proximity of the observed landslides to roads, and was classified into five classes. Soil types were obtained from the 1:25,000 scale soil map and classified into two classes (Table 3).
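Natural breaks classification chooses class boundaries that minimise the variance within classes. A minimal sketch of the exact Fisher-Jenks optimisation by dynamic programming is shown below; it assumes that the GIS natural breaks (Jenks) option optimises the same criterion, so the breaks it reports may differ slightly in practice. The cost grows quadratically with the number of values, so for a full raster one would typically classify a sample of pixel values. The function name and input values are hypothetical.

```python
def jenks_breaks(values, n_classes):
    """Upper bound of each class for an optimal natural-breaks (Fisher-Jenks) classification.

    Minimises the total within-class sum of squared deviations by dynamic programming.
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and the number of values")
    # Prefix sums give the squared-deviation cost of any contiguous run in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(data):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(i, j):
        """Sum of squared deviations of data[i:j] from its mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[k][j]: minimal cost of splitting data[:j] into k classes
    # prev[k][j]: start index of the last class in that optimal split
    best = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    prev = [[0] * (n + 1) for _ in range(n_classes + 1)]
    best[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = best[k - 1][i] + cost(i, j)
                if candidate < best[k][j]:
                    best[k][j] = candidate
                    prev[k][j] = i
    # Walk back from the full data set to recover each class's upper bound.
    breaks = []
    j = n
    for k in range(n_classes, 0, -1):
        breaks.append(data[j - 1])
        j = prev[k][j]
    return breaks[::-1]


# Hypothetical slope values (degrees) split into three classes.
print(jenks_breaks([1, 2, 3, 10, 11, 12, 20, 21, 22], 3))  # [3, 12, 22]
```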
Land cover was extracted from the SPOT 5 satellite image using maximum-likelihood classification, verified by field survey, and classified into eight classes. The NDVI map was also extracted from the SPOT 5 satellite image and classified into ten classes. Historical rainfall data for the last 30 years were compiled, and the average annual rainfall map was produced by kriging in ArcGIS and classified into ten classes (Tables 2 and 3). The flowchart designed in this study for landslide susceptibility mapping and spatial data is shown in Figure 5. It includes four parts: (1) landslide conditioning factors (data collection); (2) the landslide inventory map obtained by overlaying the observed and detected landslides; (3) landslide susceptibility mapping with the SVM and IOE models; and (4) model analysis and comparison using statistical measures, AUROC and statistical tests (Friedman and Wilcoxon signed-rank tests).
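NDVI is computed per pixel from red and near-infrared reflectance; for SPOT 5 these are usually bands B2 (red) and B3 (near-infrared), although the band assignment used in the study is not stated here. The text also does not say how the NDVI was divided into ten classes, so the equal-interval binning below is purely a hypothetical example, as are the function names and values.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel: (NIR - Red) / (NIR + Red)."""
    denominator = nir + red
    if denominator == 0:
        return float("nan")  # undefined, e.g. for no-data pixels
    return (nir - red) / denominator


def equal_interval_class(value, vmin, vmax, n_classes=10):
    """Class number 1..n_classes of value on equal-width intervals over [vmin, vmax]."""
    index = int((value - vmin) / (vmax - vmin) * n_classes)
    return min(max(index, 0), n_classes - 1) + 1  # clamp so vmax falls in the top class


# Hypothetical reflectances for one vegetated pixel.
value = ndvi(0.45, 0.05)
print(value, equal_interval_class(value, -1.0, 1.0))
```

NDVI lies in [−1, 1]; dense vegetation gives values close to 1, which is why vegetation-covered landslide scars are hard to separate from surrounding forest.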

Support Vector Machine
One of the most popular machine learning algorithms is SVM, a supervised binary classifier based on the structural risk minimization principle [89,90]. In classification, SVM separates a given training dataset with a hyper-plane that maximizes the distance between the two classes, known as the maximal margin hyper-plane [89]. In other words, the aim of SVM is to find an n-dimensional hyper-plane that separates the two classes with the maximum gap [91,92]. For training vectors $\mathbf{x}_i$ ($i = 1, \ldots, n$) with class labels $y_i \in \{-1, +1\}$, the separating hyper-plane must satisfy [93]:
$$y_i\left(\mathbf{w}\cdot\mathbf{x}_i+b\right)\ge 1$$
where $\mathbf{w}$ is the normal vector of the hyper-plane and $b$ is a constant. Introducing Lagrange multipliers $\lambda_i \ge 0$, the cost function can be expressed as:
$$L=\frac{1}{2}\lVert\mathbf{w}\rVert^2-\sum_{i=1}^{n}\lambda_i\left(y_i\left(\mathbf{w}\cdot\mathbf{x}_i+b\right)-1\right)$$
For the non-separable case, slack variables $\xi_i \ge 0$ are introduced [94], and the constraint is relaxed to:
$$y_i\left(\mathbf{w}\cdot\mathbf{x}_i+b\right)\ge 1-\xi_i$$
A parameter $\nu \in (0, 1)$, which controls the misclassification, is then introduced into the cost function [95]. In addition, a kernel function $K(\mathbf{x}_i, \mathbf{x}_j)$ is used to obtain a nonlinear decision boundary [94]. In this study, the radial basis function (RBF) was selected as the kernel function because of its robustness, as reported in previous studies [96,97]. The RBF (Gaussian) kernel is expressed as:
$$K(\mathbf{x}_i,\mathbf{x}_j)=\exp\left(-\gamma\lVert\mathbf{x}_i-\mathbf{x}_j\rVert^2\right),\quad \gamma>0$$
where $\gamma$ is the kernel parameter [95].
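The role of the RBF kernel is easiest to see in the decision function of a trained SVM, which scores a pixel by its kernel similarity to the support vectors. The sketch below assumes the support vectors, the products λ_i·y_i and the bias b have already been obtained by training (in the study this was done in WEKA 3.7.12, which is not reproduced here); all names and values are hypothetical.

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def svm_decision(x, support_vectors, dual_coefs, bias, gamma):
    """f(x) = sum_i (lambda_i * y_i) * K(x_i, x) + b for an already trained SVM."""
    return sum(c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, dual_coefs)) + bias


def predict(x, support_vectors, dual_coefs, bias, gamma):
    """1 = landslide, 0 = non-landslide, from the sign of the decision function."""
    return 1 if svm_decision(x, support_vectors, dual_coefs, bias, gamma) >= 0 else 0


# Hypothetical trained model on two scaled conditioning factors.
support_vectors = [[0.2, 0.8], [0.7, 0.1]]
dual_coefs = [1.0, -1.0]  # lambda_i * y_i: first vector is a landslide, second is stable
print(predict([0.25, 0.75], support_vectors, dual_coefs, 0.0, gamma=1.0))
```

A larger γ makes each support vector's influence more local, so γ (together with the ν or cost parameter) is normally tuned on the training data.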

Index of Entropy
Entropy indicates the degree of uncertainty of a system [98]. The entropy of landslides indicates the extent to which various factors influence landslide occurrence [99,100]. The entropy value can be used to calculate the objective weights of the index system [2,101]. The index of entropy (IOE) allows the weight of each conditioning factor ($W_j$) to be estimated using the following equations [99]:
$$P_{ij}=\frac{b}{a}$$
$$(P_{ij})=\frac{P_{ij}}{\sum_{i=1}^{S_j}P_{ij}}$$
where $a$ and $b$ are the percentages of the study area and of the landslides, respectively, that fall in class $i$ of factor $j$, and $(P_{ij})$ is the probability density. $H_j$ and $H_{j\max}$ are entropy values (Equations (9) and (10)):
$$H_j=-\sum_{i=1}^{S_j}(P_{ij})\log_2(P_{ij}) \tag{9}$$
$$H_{j\max}=\log_2 S_j \tag{10}$$
$$I_j=\frac{H_{j\max}-H_j}{H_{j\max}},\quad I_j\in[0,1] \tag{11}$$
$$W_j=I_j\,\bar{P}_j,\quad \bar{P}_j=\frac{1}{S_j}\sum_{i=1}^{S_j}P_{ij} \tag{12}$$
where $S_j$ is the number of classes, $I_j$ is the information coefficient (Equation (11)) and $W_j$ is the corresponding weight value of the factor (Equation (12)).
The final weight values for each parameter are shown in Table 3. The landslide susceptibility map is then generated in ArcGIS by applying Equation (13):
$$Y_{IOE}=\sum_{i=1}^{n}\frac{z}{m_i}\times C\times W_j \tag{13}$$
where $Y_{IOE}$ is the sum of all classes; $i$ is the number of the parameter map (1, 2, ..., n); $z$ is the number of classes in the parameter map with the greatest number of classes; $m_i$ is the number of classes within parameter map $i$; $C$ is the value of the class after secondary classification; and $W_j$ is the weight of a parameter [51]. This summation represents the various levels of landslide susceptibility [101].
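The IOE weighting and the final summation described above can be sketched as follows. This is a minimal illustration, assuming the per-class area and landslide percentages of each factor have already been tabulated (for example by a GIS cross-tabulation) and that W_j is the information coefficient multiplied by the mean of P_ij; the function names and inputs are hypothetical.

```python
import math


def ioe_factor_weight(area_pct, landslide_pct):
    """Index-of-entropy weight W_j for one conditioning factor.

    area_pct[i] and landslide_pct[i] are the percentages (a and b) of the study
    area and of the landslides that fall in class i of the factor.
    """
    if len(area_pct) < 2:
        raise ValueError("need at least two classes (H_jmax = log2(1) = 0)")
    p = [b / a for a, b in zip(area_pct, landslide_pct)]  # P_ij = b / a
    total = sum(p)
    density = [pi / total for pi in p]  # (P_ij), the probability density
    h = -sum(d * math.log2(d) for d in density if d > 0)  # H_j, taking 0 * log 0 = 0
    h_max = math.log2(len(p))  # H_jmax = log2 S_j
    info = (h_max - h) / h_max  # information coefficient I_j
    return info * (total / len(p))  # W_j = I_j * mean(P_ij)


def ioe_score(class_values, n_classes, weights):
    """Y_IOE for one pixel: sum over factor maps of (z / m_i) * C * W_j."""
    z = max(n_classes)  # number of classes in the map with the most classes
    return sum(z / m * c * w for c, m, w in zip(class_values, n_classes, weights))


# Hypothetical factor with three classes: landslides concentrated in the first class.
print(ioe_factor_weight([20.0, 50.0, 30.0], [60.0, 30.0, 10.0]))
```

A factor whose landslides are spread in proportion to class area gets I_j = 0 and hence zero weight, while a factor that concentrates landslides in a few classes gets a larger weight.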

Statistical-Based Measures
Statistical index-based measures are used to evaluate and compare the performance of machine learning models. In this study, sensitivity (recall), specificity, precision (positive predictive value, PPV), accuracy, root mean square error (RMSE) and Cohen's Kappa were used. These measures are defined from four possible outcomes: true positive (TP), false positive (FP), true negative (TN) and false negative (FN). TP and FN are the numbers of landslide pixels that are correctly classified as landslide and incorrectly classified as non-landslide, respectively, while TN and FP are the numbers of non-landslide pixels that are correctly classified as non-landslide and incorrectly classified as landslide, respectively [102]. Hence, sensitivity (recall) is the number of correctly classified landslides divided by the total number of actual landslides, while specificity is the number of correctly classified non-landslides divided by the total number of actual non-landslides [102]. Precision (PPV) is the proportion of pixels predicted as landslides that are actual landslides. Accuracy is the proportion of landslide and non-landslide pixels that are correctly classified [103]. Kappa measures the reliability of the landslide models [103]; it ranges from −1 (unreliable) to 1 (reliable) [60]. Values of ≤0, 0-0.2, 0.2-0.4, 0.4-0.6, 0.6-0.8 and 0.8-1 indicate poor, slight, fair, moderate, substantial and almost perfect agreement, respectively, between the estimation (the model) and the observation (the reality) [104]. RMSE measures the error between the observed and estimated values of the models [103]; the smaller the RMSE, the better the performance of the landslide model [105]. The measures are computed as:
$$\text{Sensitivity}=\frac{TP}{TP+FN},\qquad \text{Specificity}=\frac{TN}{TN+FP},\qquad \text{PPV}=\frac{TP}{TP+FP}$$
$$\text{Accuracy}=\frac{TP+TN}{N},\qquad \text{Kappa}=\frac{P_{obs}-P_{exp}}{1-P_{exp}}$$
$$P_{obs}=\frac{TP+TN}{N},\qquad P_{exp}=\frac{(TP+FN)(TP+FP)+(FP+TN)(FN+TN)}{N^2}$$
where $N = TP + TN + FP + FN$ is the total number of training pixels and $P_{obs}$ is the proportion of pixels that are correctly classified.
A Kappa value close to 1 indicates near-perfect agreement between the model and reality. In contrast, a Kappa value close to 0 indicates that the agreement is no better than chance, and in the worst case a negative Kappa indicates agreement worse than chance. The value is meaningful only when the categories of the two maps represent the same kind of data with the same data classes [106]. Therefore, the Kappa index value was also considered as evidence of the similarity between the two landslide susceptibility maps.
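The confusion-matrix measures described above can be computed directly from the four counts. In the sketch below the counts are hypothetical, chosen for 18 landslide and 18 non-landslide validation points; the function name is also an assumption.

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV, accuracy and Cohen's Kappa from confusion-matrix counts."""
    n = tp + fp + tn + fn
    p_obs = (tp + tn) / n  # observed agreement (equal to accuracy)
    # Agreement expected by chance, from the actual and predicted class totals.
    p_exp = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "accuracy": p_obs,
        "kappa": (p_obs - p_exp) / (1 - p_exp),
    }


# Hypothetical counts for 18 landslide and 18 non-landslide validation points.
metrics = classification_metrics(tp=16, fp=4, tn=14, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts the sketch prints a sensitivity of 0.889, specificity of 0.778, PPV of 0.800, accuracy of 0.833 and Kappa of 0.667. The first three match the SVM values reported in the abstract, while the Kappa differs slightly from the reported 0.663, so the study's actual counts or Kappa computation may differ.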
The RMSE is computed as:
$$RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_{predicted}-X_{actual}\right)^2}$$
where $n$ is the total number of samples in the training or validation dataset, $X_{predicted}$ is the value predicted by the landslide susceptibility model and $X_{actual}$ is the actual (observed) value in the training or validation dataset.
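A direct implementation of the RMSE is shown below. The sketch assumes predictions (for example susceptibility probabilities) and actual values (1 = landslide, 0 = non-landslide) are paired lists; the same function could compare observed and detected landslide values, although how those were encoded in the study is not specified. The function name and data are hypothetical.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between paired predicted and actual values."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical model probabilities against observed labels.
print(round(rmse([0.9, 0.2, 0.7, 0.4], [1, 0, 1, 0]), 3))  # 0.274
```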

ROC Curve Analysis
The receiver operating characteristic (ROC) curve has been widely applied in landslide susceptibility studies to evaluate model performance. It is a standard tool plotted with sensitivity on the y-axis and 100 − specificity (%) on the x-axis [62,105]. The area under the ROC curve (AUC) is commonly used to evaluate the performance of landslide models [107]. It typically ranges from 0.5 to 1: an ideal model has an AUC of 1, and a model no better than random has an AUC of 0.5 [102]. The AUC is computed using the following equation:
$$AUC=\frac{\sum TP+\sum TN}{P+N}$$
where TP is the number of landslides that are correctly classified, TN is the number of non-landslides that are correctly classified, P is the total number of landslides and N is the total number of non-landslides.
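The sketch below builds the empirical ROC curve by sweeping a threshold over the susceptibility values and integrates it with the trapezoidal rule, which is one standard way to obtain the AUC; it is not necessarily the exact procedure used in the study. Scores, labels and function names are hypothetical.

```python
def roc_points(scores, labels):
    """Empirical ROC curve as (1 - specificity, sensitivity) points, lowering the threshold.

    labels: 1 = landslide, 0 = non-landslide. Tied scores are processed together.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both landslide and non-landslide samples")
    tp = fp = 0
    points = [(0.0, 0.0)]
    k = 0
    while k < len(pairs):
        threshold = pairs[k][0]
        while k < len(pairs) and pairs[k][0] == threshold:
            if pairs[k][1] == 1:
                tp += 1
            else:
                fp += 1
            k += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2.0 for (x0, y0), (x1, y1) in zip(points, points[1:]))


scores = [0.9, 0.3, 0.5, 0.1]  # hypothetical susceptibility values
labels = [1, 1, 0, 0]
print(auc_trapezoid(roc_points(scores, labels)))  # 0.75
```

The result equals the probability that a randomly chosen landslide cell receives a higher susceptibility value than a randomly chosen non-landslide cell, with ties counted as one half.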

Statistical Tests (Friedman and Wilcoxon)
The core of this section is the comparison of the performance of two or more machine learning classifiers on multiple datasets using statistical tests. The aim is to determine whether these techniques differ statistically in performance without relying on the variance of their results. Hence, it is assumed that the compiled results obtained from the machine learning classifiers in this study provide reliable estimates. All classifiers were evaluated using the same random samples. Statistically, two or more classifiers can be compared using either parametric or non-parametric methods. D'Arco et al. (2012) stated that parametric tests are suitable when the data are normally distributed with equal variances [108]. Additionally, Derrac et al. (2011) reported that non-parametric tests are free from such statistical assumptions. Moreover, Demšar (2016) explicitly stated that non-parametric tests, such as the Friedman and Wilcoxon signed-rank tests, are safer and give stronger results than parametric tests, since they do not assume normal distributions or homogeneity of variance.
For this reason, the Friedman [109] and Wilcoxon signed-rank [110] tests were used in this study to assess the significance of differences between the models. The null hypothesis of both tests is that there is no difference between the performances of the landslide models at a significance level of α = 0.05 (5%). A decision is then made based on the p-value: if the p-value is less than 0.05, the null hypothesis is rejected, meaning that there is a significant difference between the models; otherwise, it is not rejected [62]. However, when the Friedman test indicates a significant difference among all the models being compared, it does not show which pairs of models differ, so its result alone is not reliable for comparing individual models [62]. Therefore, the Wilcoxon signed-rank test was used to assess the statistical significance of systematic pairwise differences between the landslide models. In this test, the p-value and z-value are used to evaluate the significance of differences between the landslide susceptibility models: when the p-value is less than 0.05 and the z-value lies outside the critical values of z (−1.96 and +1.96), the null hypothesis is rejected, and the performance of the susceptibility models is significantly different [62][63][64].
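The two tests can be sketched in plain Python as below. This is an illustrative implementation, assuming paired performance values per sample for each model; the Wilcoxon test uses the large-sample normal approximation without tie correction (rough for small samples), and the Friedman statistic would be compared against a chi-square distribution with k − 1 degrees of freedom. In practice a statistics package would normally be used; function names and data are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks 1..n in ascending order, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def friedman_statistic(table):
    """Friedman chi-square statistic; rows are blocks (e.g. samples), columns are models."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


def wilcoxon_signed_rank(x, y):
    """Wilcoxon signed-rank test (normal approximation). Returns (W+, z, two-sided p)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are discarded
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p = 2 * (1 - Phi(|z|))
    return w_plus, z, p


# Hypothetical per-sample performance of two models.
svm_scores = [0.86, 0.81, 0.90, 0.77, 0.84, 0.88]
ioe_scores = [0.80, 0.79, 0.85, 0.78, 0.80, 0.83]
print(wilcoxon_signed_rank(svm_scores, ioe_scores))
print(friedman_statistic([list(pair) for pair in zip(svm_scores, ioe_scores)]))
```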

Landslide Detection Using AIRSAR and Optical Satellite Images
The classified section of the segmented image overlaid onto the old landslide map is shown in Figure 6a-c. According to this figure, the detected landslides agree relatively well with the polygons classified as (observed) landslides. Two procedures were used to validate the locations of the detected landslides: the first used WorldView-1 satellite images and digital aerial photographs, and the second was a field survey. Field observations were carried out to check the locations of the landslides shown in the old landslide map (Figure 6a). The results showed that landslides are difficult to identify on the ground because of their small size, the absence of visible traces and the covering of scars by dense vegetation (Figure 6b). The time difference between the production of the map (2004) and the field observation (2015) could be another reason for this difficulty (Figure 6c). The landslide features obtained from the WorldView-1 satellite images were overlaid onto the C-, L- and P-band images, and the landslide inventory map was validated against the WorldView-1 satellite images and digital aerial photographs. The C-, L- and P-band images were used in the UTM reference system for the landslide features. The final compiled landslide inventory map in this study is shown in Figure 7. It should be noted that the 92 landslide locations used for landslide modeling in this study were selected by overlaying the detected and observed landslides. Finally, the RMSE between the detected and observed landslides was 0.163 (16.3%), which is a reasonable result.
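The text does not spell out how the RMSE between detected and observed landslides was computed. As a minimal sketch, assuming both inventories are rasterized to aligned 0/1 masks (1 = landslide cell), the RMSE is the square root of the mean squared cell-wise difference; for binary masks, its square equals the fraction of cells where the two maps disagree. The function name and the example masks are hypothetical.

```python
import math
from typing import Sequence


def rmse(detected: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square error between paired detected and observed values."""
    if len(detected) != len(observed) or not detected:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(d - o) ** 2 for d, o in zip(detected, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical 0/1 landslide masks (1 = landslide cell), flattened to lists.
detected_mask = [1, 0, 1, 1, 0, 0, 1, 0]
observed_mask = [1, 0, 1, 0, 0, 0, 1, 1]
print(round(rmse(detected_mask, observed_mask), 3))  # 0.5 (2 of 8 cells disagree)
```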

Model Analysis and Results
The performance of the SVM and IOE models on the training and validation datasets is shown in Table 4. In the training phase, the SVM model had the highest sensitivity (94.6%), meaning that 94.6% of the landslide pixels were correctly classified as landslides, followed by the IOE model (87.8%). The SVM model also had the highest specificity (87.8%), meaning that 87.8% of the non-landslide pixels were correctly classified as non-landslides, followed by the IOE model (79.2%). In addition, the SVM model had a higher accuracy (91.2%), Kappa (0.883) and AUC (89.6%) than the IOE model.
Overall, both the SVM and IOE models were successfully trained, although the SVM model fitted the training data more accurately than the IOE model. The landslide susceptibility index was then calculated for every pixel in the study area with each model to produce the landslide susceptibility maps.
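The sensitivity, specificity, accuracy and Kappa values in Table 4 are conventionally computed from a 2 × 2 confusion matrix of landslide (positive) and non-landslide (negative) points. The sketch below uses the standard definitions, with Cohen's kappa comparing the observed agreement with the agreement expected by chance from the matrix marginals; the function name and the counts are hypothetical and are not the study's data.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    sensitivity: float
    specificity: float
    accuracy: float
    kappa: float


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> Metrics:
    """Statistical measures for a landslide / non-landslide classification."""
    n = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)  # share of landslide points classified as landslide
    specificity = tn / (tn + fp)  # share of non-landslide points classified as non-landslide
    accuracy = (tp + tn) / n      # observed agreement p_o
    # expected chance agreement p_e from the row and column marginals
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e)  # Cohen's kappa
    return Metrics(sensitivity, specificity, accuracy, kappa)


# Hypothetical counts for 50 landslide and 50 non-landslide points.
m = binary_metrics(tp=40, fn=10, tn=35, fp=15)
print(f"{m.sensitivity:.3f} {m.specificity:.3f} {m.accuracy:.3f} {m.kappa:.3f}")
# 0.800 0.700 0.750 0.500
```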

Model Validation and Comparison
After model construction, the models were validated on the validation dataset using the area under the ROC curve, the Kappa index and the statistical evaluation measures (Table 4). The results showed that both landslide models had a high predictive capability for the spatial prediction of landslides in the study area. The SVM model had the highest sensitivity (88.9%), meaning that 88.9% of the landslide pixels were correctly classified as landslides, followed by the IOE model (77.8%). The SVM model also had the highest specificity (77.8%), meaning that 77.8% of the non-landslide pixels were correctly classified as non-landslides. In addition, the SVM model had the highest accuracy (0.833), Kappa (0.663) and AUC (0.845), followed by the IOE model with values of 0.750, 0.613 and 0.826, respectively. Overall, both the SVM and IOE models were successfully validated, with the SVM model showing the greater predictive power.
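The AUC values above can be interpreted through the rank (Mann-Whitney) form of the ROC area: the probability that a randomly chosen landslide point receives a higher susceptibility score than a randomly chosen non-landslide point. A minimal sketch of that calculation follows; the function name and scores are hypothetical, and for binary labels it should agree with library routines such as scikit-learn's `roc_auc_score`.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via its Mann-Whitney interpretation.

    labels: 1 = landslide, 0 = non-landslide. Ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one landslide and one non-landslide point")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical susceptibility scores for four validation points.
print(roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))  # 0.75
```

The double loop is O(n·m), which is adequate for validation sets of tens of points; a sort-based version would be needed for full rasters.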

LSM by SVM Model
In this study, the radial basis function (RBF) was used as the kernel function, and two-class SVM models were first trained to build the landslide susceptibility map using ArcGIS. According to Yao et al. (2008), two-class SVMs can produce a more accurate susceptibility map than one-class SVMs. The SVM model was trained using the training data [100]. The suggested values of the two main parameters of the model, the penalty parameter C and the kernel parameter γ, were 0.8 and 0.5, respectively.
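To make the roles of the two parameters concrete: with an RBF kernel, γ controls how quickly the similarity between two conditioning-factor vectors decays with distance, while C caps the dual coefficients of the support vectors during training. The pure-Python sketch below evaluates the decision function of an already-trained two-class SVM; the support vectors, coefficients and bias are hypothetical, and training itself (for example with scikit-learn's `SVC(kernel="rbf", C=0.8, gamma=0.5)`) is not shown. It does not reproduce the ArcGIS workflow used in the study.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float = 0.5) -> float:
    """Radial basis function kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def svm_decision(x, support_vectors, dual_coefs, bias, gamma=0.5):
    """Two-class SVM decision value f(x) = sum_i a_i * K(x_i, x) + b.

    dual_coefs[i] is alpha_i * y_i with y_i in {-1, +1}; for a soft-margin
    SVM trained with penalty C, each alpha_i lies in [0, C].
    """
    return sum(a * rbf_kernel(sv, x, gamma)
               for sv, a in zip(support_vectors, dual_coefs)) + bias


# Hypothetical trained model: one landslide (+1) and one non-landslide (-1)
# support vector in a two-factor space, with |alpha_i * y_i| = C = 0.8.
svs = [[0.0, 0.0], [3.0, 3.0]]
coefs = [0.8, -0.8]
for point in ([0.2, 0.1], [2.9, 3.1]):
    f = svm_decision(point, svs, coefs, bias=0.0)
    print(point, "landslide" if f > 0 else "non-landslide")
```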
Figure 8 shows the landslide susceptibility map produced by the SVM model. The map was reclassified into four susceptibility classes (low, moderate, high and very high) using the natural breaks method. According to the SVM-derived landslide susceptibility map, the very high susceptibility zone covers about 39.78% (15.27 km²) of the total area, the high susceptibility zone 27.41% (10.52 km²), the moderate susceptibility zone 14.92% (5.72 km²) and the low susceptibility zone 17.89% (6.86 km²) (Figure 9). Note that the gray shading in Figure 8 is only a hillshade showing low and high elevations over the study area; it was not analyzed.
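The natural breaks reclassification chooses class limits that keep similar susceptibility values together. A minimal sketch, assuming the Fisher-Jenks criterion (minimise the total within-class sum of squared deviations) solved exactly by dynamic programming, is given below; GIS software typically applies a Jenks-type optimisation to a sample of the values, so its reported breaks may differ. The function names and example values are hypothetical, and the exact O(k·n²) search is only practical for a sample of pixel values, not a full raster.

```python
from bisect import bisect_left
from typing import List, Sequence

LABELS = ["low", "moderate", "high", "very high"]


def natural_breaks(values: Sequence[float], k: int) -> List[float]:
    """Upper bound of each of k classes minimising the total within-class
    sum of squared deviations (Fisher-Jenks, exact dynamic programming)."""
    x = sorted(values)
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of values")
    # prefix sums give the sum of squared deviations of x[i:j] in O(1)
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(x):
        s[i + 1] = s[i] + v
        s2[i + 1] = s2[i] + v * v

    def ssd(i: int, j: int) -> float:
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: minimal cost of splitting x[:j] into c classes
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    start = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    start[c][j] = i  # last class is x[i:j]
    # walk back from the full range to recover each class's upper bound
    bounds = []
    j = n
    for c in range(k, 0, -1):
        bounds.append(x[j - 1])
        j = start[c][j]
    return bounds[::-1]


def classify(value: float, bounds: Sequence[float]) -> str:
    """Label a susceptibility value using four class upper bounds."""
    idx = min(bisect_left(bounds, value), len(bounds) - 1)
    return LABELS[idx]


# Hypothetical susceptibility values forming four clusters.
sample = [1, 2, 3, 10, 11, 12, 20, 21, 22, 30, 31, 32]
breaks = natural_breaks(sample, 4)
print(breaks)                # [3, 12, 22, 32]
print(classify(11, breaks))  # moderate
```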

LSM by the IOE Model
The landslide susceptibility index was calculated by summing the products of the factor weights and the secondarily reclassified conditioning factors, as given by Equation (23).
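Equation (23) is not reproduced in this section. In the IOE literature, the susceptibility index is commonly written as Y = Σ_j (z / m_j) × C_j × W_j, where z is the number of classes in the factor map with the most classes, m_j is the number of classes of factor j, C_j is the pixel's class value after secondary classification for factor j, and W_j is the factor weight. The sketch below assumes that form; the function name and inputs are hypothetical.

```python
from typing import Sequence


def ioe_susceptibility(class_values: Sequence[float],
                       n_classes: Sequence[int],
                       weights: Sequence[float]) -> float:
    """IOE susceptibility index of one pixel, assuming
    Y = sum_j (z / m_j) * C_j * W_j with z = max_j m_j."""
    z = max(n_classes)
    return sum((z / m) * c * w for c, m, w in zip(class_values, n_classes, weights))


# Hypothetical pixel: factor A has 5 classes, factor B has 10.
print(ioe_susceptibility(class_values=[3, 7], n_classes=[5, 10], weights=[1.0, 2.0]))  # 20.0
```

The z / m_j term rescales factors with fewer classes so that a factor is not down-weighted simply because it was divided into fewer classes.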
For soil, the classes most associated with landslides based on (P_ij) were the alluvium-colluvium and Serong series, with high values of 0.507 and 0.492, respectively. For lithology, the highest (P_ij) value (0.512) belongs to metamorphic rocks. The (P_ij) values for the NDVI index indicated that the classes of −0.144 to 0.012 and 0.641 to 0.809 were prone to landslide occurrence, with values of 0.136 and 0.135, respectively (Table 5). For land cover, the (P_ij) values were highest in the agricultural area and settlement classes, with values of 0.180 and 0.129, respectively. For rainfall, the highest (P_ij) values (0.276 and 0.105) corresponded to the 2765-2781 and 2755-2764 mm/year classes, respectively.
For distance to faults, the (P_ij) value decreased as the distance to faults increased. The classes of 0-50 m and 50-100 m had high correlations with landslide occurrence, with (P_ij) values of 0.257 and 0.169, respectively. Distance to road and distance to drainage showed the same pattern as distance to fault: the (P_ij) value decreased as the distance to these features increased. Most of the landslides were located in the 0-50 m and 50-100 m classes of distance to drainage, with (P_ij) values of 0.160 and 0.153, and in the same classes of distance to road, with values of 0.302 and 0.173. Furthermore, according to the W_j values of the IOE model, rainfall (1.753) had the highest influence on landslide susceptibility, followed by distance to drainage (1.670), soil (1.172) and lithology (1.127), while the other factors were much less significant for landslide susceptibility assessment in the region. It should also be kept in mind that landslide conditioning factors may differ between regions, so factors that suit this study area may not fit other areas [111,112].

Based on the results of the index of entropy (IOE) model, we reclassified the landslide susceptibility map into four categories (low, moderate, high and very high) using the natural breaks approach (Figure 10). Note that the gray shading in Figure 10 is only a hillshade showing low and high elevations over the study area; it was not analyzed. According to the IOE-derived map, 11.41% (4.38 km²) of the study area lies in the low landslide susceptibility zone, while the moderate and high susceptibility zones cover 15.16% (5.82 km²) and 42.28% (16.24 km²) of the total area, respectively. The very high landslide susceptibility zone occupies 31.15% (11.96 km²) of the study area (Figure 9).
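The W_j weights discussed above come from the entropy of the class-wise landslide densities. A minimal sketch, assuming the usual IOE formulation (a frequency ratio per class, its normalised density (P_ij), entropy H_j, maximum entropy log2 of the number of classes, information coefficient I_j, and W_j = I_j × mean frequency ratio), is shown below; the function name and the example fractions are hypothetical and are not values from Table 5.

```python
import math
from typing import Sequence


def ioe_weight(area_frac: Sequence[float], slide_frac: Sequence[float]) -> float:
    """Index-of-entropy weight W_j of one conditioning factor.

    area_frac[i]: fraction of the study area in class i (all > 0).
    slide_frac[i]: fraction of landslide pixels falling in class i.
    The factor needs at least two classes.
    """
    fr = [s / a for s, a in zip(slide_frac, area_frac)]  # frequency ratio per class
    total = sum(fr)
    density = [f / total for f in fr]                     # (P_ij), sums to 1
    h = -sum(p * math.log2(p) for p in density if p > 0)  # entropy H_j
    h_max = math.log2(len(density))                       # H_jmax = log2(number of classes)
    info = (h_max - h) / h_max                            # information coefficient I_j
    return info * total / len(fr)                         # W_j = I_j * mean frequency ratio


# Landslides spread exactly like the area: no information, weight 0.
print(ioe_weight([0.5, 0.5], [0.5, 0.5]))  # 0.0
# All landslides in one of two equal-area classes: I_j = 1, mean ratio 1.
print(ioe_weight([0.5, 0.5], [1.0, 0.0]))  # 1.0
```

A factor whose landslides are concentrated in a few classes has low entropy and therefore a high I_j, which is why factors such as rainfall and distance to drainage can receive large weights.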
The prediction accuracy of the SVM and IOE models was evaluated using the area under the ROC curve (AUROC) based on the training dataset (success rate curve) and the validation dataset (prediction rate curve), together with the Friedman and Wilcoxon signed-rank statistical tests. Figure 11 compares the AUC of the two models for the training and validation datasets. The success rate curves indicated that the landslide susceptibility maps produced by the SVM and IOE models from the existing landslide occurrences had a good prediction capability, with the SVM model achieving a higher AUC (0.889) than the IOE model (0.825) (Figure 11a). The prediction rate curves, computed with the validation landslides as a proxy for future landslides, confirmed that the landslide susceptibility maps had high prediction accuracy. Again, the SVM model showed the higher prediction accuracy (AUC = 0.885), followed by the IOE model (AUC = 0.806) (Figure 11b).
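Success and prediction rate curves are commonly built by ranking pixels from most to least susceptible and plotting the cumulative share of landslide pixels against the cumulative share of the study area. The sketch below assumes that construction and integrates the curve with the trapezoidal rule; passing training landslides gives a success rate curve and passing validation landslides gives a prediction rate curve. The names and inputs are hypothetical, and ties in susceptibility are broken by input order.

```python
from typing import List, Sequence, Tuple


def rate_curve(susceptibility: Sequence[float],
               is_landslide: Sequence[int]) -> Tuple[List[float], List[float], float]:
    """Cumulative area fraction (x), cumulative landslide fraction (y) and area under the curve."""
    n = len(susceptibility)
    total_slides = sum(is_landslide)
    if n == 0 or total_slides == 0:
        raise ValueError("need pixels and at least one landslide pixel")
    # most susceptible pixels first (the stable sort keeps input order for ties)
    order = sorted(range(n), key=lambda i: susceptibility[i], reverse=True)
    x, y = [0.0], [0.0]
    hits = 0
    for rank, i in enumerate(order, start=1):
        hits += is_landslide[i]
        x.append(rank / n)
        y.append(hits / total_slides)
    # trapezoidal integration of y over x
    auc = sum((x[t] - x[t - 1]) * (y[t] + y[t - 1]) / 2 for t in range(1, len(x)))
    return x, y, auc


# Hypothetical four-pixel map; the only landslide sits in the most susceptible pixel.
_, _, area = rate_curve([0.9, 0.8, 0.2, 0.1], [1, 0, 0, 0])
print(area)  # 0.875
```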
In addition to the AUROC, two statistical tests, the Friedman and Wilcoxon signed-rank tests, were applied to validate the landslide models. The results of the Friedman test are shown in Table 6. The average rankings (AR) of the SVM and IOE models were 2.01 and 1.65, respectively. The chi-square statistic was 35.286 with a significance level of 0.000, indicating a significant difference; however, the Friedman test alone was not considered appropriate for judging the relative performance of the models. To address this limitation, the Wilcoxon signed-rank test was used to assess the pairwise difference between the SVM and IOE models at the 5% significance level (Table 7). Statistically, if the null hypothesis of no difference between the two landslide models is rejected at the 5% significance level, the performances of the two models are considered to be different.
Tien Bui et al. (2016) reported that when the p-value is below 0.05 (5%) and the z-value lies outside the critical values of −1.96 and +1.96, the performances of the two models can be considered significantly different [72]. According to Table 7, there was a statistically significant difference between the two landslide susceptibility models (p-value = 0.000, z-value = −10.235, significance = yes).
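The decision rule above (p < 0.05 and z outside ±1.96) can be illustrated with the large-sample normal approximation of the Wilcoxon signed-rank test. The sketch assumes paired values (for example, the two models' susceptibility values at the same validation points), discards zero differences, gives tied absolute differences their average rank and omits the tie correction of the variance, so its output may differ slightly from statistical packages such as `scipy.stats.wilcoxon`. The function name and sample values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Wilcoxon signed-rank test for paired samples (normal approximation).

    Returns (W+, z, two-sided p). Zero differences are dropped and tied
    absolute differences receive their average rank; the variance is not
    corrected for ties.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and abs(d[order[end + 1]]) == abs(d[order[start]]):
            end += 1
        for t in range(start, end + 1):
            ranks[order[t]] = (start + end) / 2 + 1  # average 1-based rank
        start = end + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the standard normal
    return w_plus, z, p


# Hypothetical paired values where the first model is higher at every point.
w, z, p = wilcoxon_signed_rank(list(range(1, 11)), [0] * 10)
significant = p < 0.05 and abs(z) > 1.96
print(round(w, 1), round(z, 3), significant)  # 55.0 2.803 True
```

The sign of z depends on which model is passed as `x`, which is why the decision rule uses its absolute value.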

Discussion
On the one hand, dense vegetation and cloudy, rainy weather make landslide detection a challenging task in the vast and inaccessible mountainous terrain of tropical environments. On the other hand, few studies worldwide have prepared landslide susceptibility maps using landslides detected from remote sensing data [113,114]. For example, Gorsevski et al. (2016) used LiDAR data to detect landslides in Cuyahoga Valley National Park, Ohio, and then generated a susceptibility map using an artificial neural network model [113].
In the current study, the landslide inventory map was obtained through a detection process using DAP, WorldView-1 satellite imagery and AIRSAR data. The spectral values of the pixels that represent landslides can be differentiated from those of their surroundings. However, the spatial resolution of the AIRSAR DEM is low (10 m), whereas that of the WorldView-1 satellite imagery is high (0.46 m). The composite image presented is one example among several composite images produced. In general, because of its low resolution and the rough topography, it is difficult to differentiate the various land cover types using the AIRSAR DEM. The findings show that the landslide features obtained from the WorldView-1 satellite images, when overlaid onto the C-, L- and P-band images, could detect landslides precisely. The validation of the detected landslides showed, through the RMSE value, that their locations agreed with the ground control points and the observed landslides. Furthermore, our findings confirm that, in regions where landslides are difficult to identify, landslide detection from remote sensing data is a reasonable solution. Cheng et al. (2011) stated that remote sensing imagery plays a significant role in landslide inventory, susceptibility and hazard mapping through the detection process [115]. Metternicht et al. (2005) also described the role of GIS and RS in landslide detection for the spatial prediction of landslides [116].
In this study, a total of 92 landslides were selected through the detection process, and their locations were checked against the observed landslides for the spatial prediction of landslides in Cameron Highlands, Malaysia. Landslide susceptibility maps were produced using a machine learning algorithm, the support vector machine, and a statistical method, the index of entropy. Ten conditioning factors were used for landslide modeling: slope, aspect, soil, lithology, NDVI, land cover, rainfall, distance to fault, distance to river and distance to road. The models were evaluated with statistical criteria including sensitivity, specificity, accuracy, Kappa and AUROC on the training (goodness-of-fit) and validation (predictive performance) datasets. The results indicated that the SVM model had a higher goodness-of-fit and predictive performance than the IOE model. Additionally, evaluating the landslide susceptibility maps of the two models with the AUROC and two statistical tests, the Friedman and Wilcoxon signed-rank tests, showed that the SVM model outperformed the IOE model. As a benchmark soft-computing model, SVM can perform well among the many models used for the spatial prediction of landslides [74,96,117,118]. The advantage of SVM over IOE can be attributed to its robustness and its ability to reduce over-fitting and the effect of noise in the modeling process, which increases prediction accuracy.

Conclusions
Landslides are very dangerous and destructive disasters all over the world. Therefore, landslide detection is very important for governments and local residents in any country. Cameron Highlands, Malaysia, is prone to landslides because of its heavy rainfall and mountainous setting. Landslides have frequently occurred in this area following heavy rainfall, particularly in inaccessible areas where field work is difficult to carry out. Hence, combining optical and SAR data is proposed as a technical strategy for identifying landslides in tropical environments.
The comparison between the detected landslides and the observed landslides (landslide inventory map) revealed the strong capability of WorldView-1 images and AIRSAR data to detect very small, rainfall-induced landslides, with an acceptable RMSE of 0.163 (16.3%). Based on the results of the IOE model, rainfall has the highest influence on landslide occurrence, followed by distance to river, soil, lithology, land cover, slope angle, aspect, distance to road, distance to fault and NDVI. The analysis and validation of the model results using statistical-based measures and AUROC showed that SVM outperformed the IOE model. Additionally, the validation results showed that more than 80% of the total landslide pixels were correctly classified by the IOE and SVM models, indicating the predictive power of these models in the study area.
Additionally, the success and prediction rate curves showed that the SVM model had greater predictive power for both existing and future landslides. The current research showed that the C-, L- and P-band images of AIRSAR data can provide acceptable coherence in the study area. For future work, we propose combining landslide detection with GIS-based susceptibility mapping using higher-resolution and more accurate satellite images. The information provided by landslide susceptibility maps could help planners and engineers make better decisions about landslide prevention, mitigation and avoidance.

Figure 1 .
Figure 1. Location of the study area in Cameron Highlands, Peninsular Malaysia; (a) Landsat ETM+ mosaic image of Peninsular Malaysia; (b) the shaded relief map of Cameron Highlands derived from a 30-m ASTER GDEM modified from Razak [8]. The rectangular area is the actual study area.

Figure 2 .
Figure 2. Geological map of part of the Cameron Highlands region (the study area).

Figure 3 .
Figure 3. Field photographs of recent landslides and landslide types: (a) a shallow translational rockslide, (b) a shallow translational slide at the roadside, (c) a rotational slide and (d) a deep-seated rotational slide. The arrow shows the direction of movement.

Figure 4 .
Figure 4. Sequence of the AIRSAR DEM process: (a) opened C-band DEM, (b) converted DEM to actual height values, (c) masked DEM image, (d) geo-referenced, (e) resized DEM to the size of research site and (f) resized sample area.

Figure 5 .
Figure 5. Flowchart of the procedure for landslide susceptibility mapping. IOE, index of entropy.

Figure 6 .
Figure 6. The AIRSAR composite image overlaid onto the old landslide map: (a) the segmented and classified AIRSAR images overlaid with the old landslide map (landslides in red polygons), (b) detected landslides and (c) comparison of detected and observed landslides.

The comparison between Figures 6 and 7 confirms that the landslides detected from AIRSAR data are broadly consistent with the old landslides in the study area. Note that the gray shading in Figure 7 only shows elevation as a hillshade over the study area.

Figure 7 .
Figure 7. Final landslide inventory map: (a) a translational slide, (b) a shallow translational rockslide, (c) a rotational slide and (d) a translational slide.

Figure 8 .
Figure 8. Landslide susceptibility map produced by the SVM model.

Figure 9 .
Figure 9. Histograms representing the distribution of observed landslides falling into various susceptibility classes of landslide susceptibility mapping (LSM) extracted from SVM and IOE models: (a) area (km²) of landslides that occurred; (b) percentage (%) of landslides that occurred.

Figure 10 .
Figure 10. Landslide susceptibility map produced by the IOE model.

Figure 11 .
Figure 11. Success and prediction accuracy rate curves of the SVM and IOE models used in landslide susceptibility mapping.

Table 1 .
Characteristics of AIRSAR DEM data and WorldView-1 satellite imagery used in the research.

Table 2 .
Factors used in susceptibility assessment, data sources and associated factor classes for landslide susceptibility mapping in Cameron Highlands.

Table 3 .
Landslide influencing factors and their classes.

Table 4 .
Model performance on the training and validation datasets for the SVM and IOE models.

Table 5 .
Spatial relationship between each landslide conditioning factor and landslide occurrence for the SVM and IOE models.

Table 6 .
Average ranking of the two landslide susceptibility models using the Friedman test.

Table 7 .
Performance of the two landslide susceptibility models using the Wilcoxon signed-rank test.