A Novel Integrated Approach of Relevance Vector Machine Optimized by Imperialist Competitive Algorithm for Spatial Modeling of Shallow Landslides

This research aims at proposing a new artificial intelligence approach (namely RVM-ICA) which is based on the Relevance Vector Machine (RVM) and the Imperialist Competitive Algorithm (ICA) optimization for landslide susceptibility modeling. A Geographic Information System (GIS) spatial database was generated from Lang Son city in Lang Son province (Vietnam). This GIS database includes a landslide inventory map and fourteen landslide conditioning factors. The suitability of these factors for landslide susceptibility modeling in the study area was verified by the Information Gain Ratio (IGR) technique. A landslide susceptibility prediction model based on RVM-ICA and the GIS database was established by training and prediction phases. The predictive capability of the new approach was evaluated by calculations of sensitivity, specificity, accuracy, and the area under the Receiver Operating Characteristic curve (AUC). In addition, to assess the applicability of the proposed model, two state-of-the-art soft computing techniques including the support vector machine (SVM) and logistic regression (LR) were used as benchmark methods. The results of this study show that RVM-ICA with AUC = 0.92 achieved a high goodness-of-fit based on both the Remote Sens. 2018, 10, 1538; doi:10.3390/rs10101538 www.mdpi.com/journal/remotesensing Remote Sens. 2018, 10, 1538 2 of 27 training and testing datasets. The predictive capability of RVM-ICA outperformed those of SVM with AUC = 0.91 and LR with AUC = 0.87. The experimental results confirm that the newly proposed model is a very promising alternative to assist planners and decision makers in the task of managing landslide prone areas.


Introduction
Landslides are phenomena of large ground movements that are mainly caused by either climatic or geophysical factors. These natural hazards often occur along one or several slip surfaces due to shear displacement of soil and rock [1,2]. Besides climatic and geophysical factors, environmental changes caused by humans, which alter the landforms have also been recognized as triggering factors of landslides [3]. A landslide is a mass movement of slope-forming materials where the shear stress on a slope is higher than the shear strength of the slope-forming materials [4].
Despite the fact that landslides occur frequently worldwide, the mechanism, the scale, and the risk of landslides in many regions have not been thoroughly explored and understood [5][6][7][8][9][10][11]. Research on landslide susceptibility and mitigation in many developing countries, especially in Vietnam, is still limited. This fact has been demonstrated through a high number of human casualties and economic losses caused by landslide incidences [12].
Earthquake and rainfall are widely recognized as the two main causes of landslide [13]. In Vietnam, due to its typical geographic and climatic features, rainfall is the dominant factor of landslide occurrence [14]. In recent years, because of the global climate change, extreme weather phenomena, such as torrential rainfalls and tropical rainstorms, appear frequently [15,16]. Therefore, it is of urgent need to construct a landslide susceptibility model with particular consideration of rainfall as a triggering factor. This model may provide immense help to assess the likelihood of landslide occurrence in the future and to establish appropriate strategies for hazard mitigation [17,18].
Some studies were conducted on engineering geology, slope stability, and landsliding by researchers between the 1960s and 1970s. Some of these studies were done by consulting engineering geologists which were not published [19]. However, during this decade some papers were published on engineering geologic, landslide and slope stability. The 1970s was graphically landslide-based where conditioning/controlling factor effects on landslide occurrence were known such as slope exposure [19]. The U.S. Geological Survey undertook extensive regional mapping of landslide deposits, analysis of the costs of damage produced by landsliding, preparation of regional slope maps, while the preparation of slope stability maps were the most important productions in the 1970s [20,21]. On the other word, studies of landslide susceptibility and hazard mapping are referred to 1970s by Nilsen et al. [19]; and Kienholz [22,23].
A susceptibility map for a region can significantly decrease the negative effects of landslides. Based on such a map, intact areas that are potentially damaged by future landslides can be clearly detected. Accordingly, landslide hazard mitigation and prevention can be achieved either by constructing landslide defense structures or by relocation of the local communities and industrial facilities in highly susceptible areas to safer areas. In practice, the establishment of a susceptibility map is the most important step in landslide hazard management [18,24]. This is because this map is an effective tool to quantify the level of landslide hazard reflected by the spatial likelihood of the landslide occurrences with respect to various conditions of topography, geography, vegetation, climate, and land-use [25][26][27].
In recent years, the advancements of GIS technology have paved the way for various data-driven landslide susceptibility models. GIS has been extensively used and demonstrated its feasibility and effectiveness for many types of natural hazard mapping including rainfall-induced shallow landslides [26,[28][29][30]. The GIS database, which consists of a historical landslide inventory and a set of landslide conditioning factors, is first established. Statistical and machine learning models are then employed to generalize a decision boundary that distinguishes landslide areas from the entire study region.
Among these, SVM is also a powerful method for landslide modeling [6]. The SVM method uses statistical learning theory to find an optimum linear hyper-plane to separate two classes of landslide and non-landslide. In this study, the nonlinear data is converted into the linearly separable data in a high-dimensional feature space using the radial basis function (RBF) kernel function (KF) [66]. Recent works reported by Reichenbach, et al. [67] and Huang and Zhao [68] point out an increasing trend of applying GIS-based machine learning models for landslide susceptibility mapping. Although a variety of models have been proposed, the investigation of other advanced machine learning approaches used for tackling the problem of interest is very necessary. It is because data sets collected in different study regions have unique characteristics. Thus, a model may perform well in one region but may not be suitable for another region. This fact has been clearly demonstrated in the comparative works of Pham, et al. [69], Wang, et al. [70], and Bui,et al. [45].
In addition, when applying machine learning models, an important issue is how to appropriately determine their hyper parameters. The task of hyper parameter setting is widely known as the model selection problem [71]. This is a crucial task because hyper parameters affect the learning process and therefore strongly influence its generalization capability. If the model parameters are not set properly, the constructed machine learning models suffer from either over fitting or under fitting, which clearly undermines their prediction processes.
It is also to be noted that the problem of model selection is far from being trivial. The first challenge is that a machine learning method may have many hyper parameters. The second challenge is that the hyper parameters are searched in continuous ranges; thus, there are infinite parameter combinations. To tackle these challenges, many researchers have resorted to metaheuristic algorithms such as the Artificial Bee Colony [72], the Differential Flower Pollination [73], the Particle Swarm Optimization [74], the Symbiotic Organisms Search [75], the Firefly Algorithm [76], and the Genetic Algorithm [77]. The integration of machine learning and metaheuristic algorithms has been demonstrated to be an effective tool for solving complex problems including landslide spatial prediction [78,79].
In this research context, the current study aims at proposing a new artificial intelligent method for landslide spatial modeling that hybridizes the Relevance Vector Machine (RVM) [80] and the Imperialist Competitive Algorithm (ICA) [81]. RVM is a powerful machine learning method, which operates on the theory of Bayesian inference. Successful implementations of RVM in various fields have been reported by previous studies [82]; nevertheless, its application in rainfall-induced shallow landslide modeling is still very limited. Therefore, our research work is an attempt to fill this gap in the literature. It is of note that the learning phase of RVM requires the determination of the Radial Basis Function width as a hyper parameter. For determining this parameter of RVM, the ICA metaheuristic method is employed.
To the best of our knowledge, none of the previous studies have investigated this integration of RVM and ICA for spatial prediction of rainfall-induced shallow landslide as well as of other natural hazards. In addition, a GIS database for the study area of Lang Son district in the northern part of Vietnam was established. The performance of the new hybrid model, named as RVM-ICA, was compared with those of the SVM and LR. The receiver operating characteristic (ROC) curve and other statistical indicators (sensitivity, specificity, and accuracy) were selected as performance measurement metrics.
Monsoon winds (seasonal reversing winds) affect the study area by altering the regional precipitation [85]. The average annual rainfall varies from 1200 to 1600 mm. The typical rainy season is from May to September. Based on historical records, high amount of rainfall is the main landslide triggering factor in the study area. According to meteorological data analyses, during the past 20 years, the annual average temperature ranged from 17 to 22 • C. The highest and lowest average monthly temperatures are 27.5 • C in July and 12.5 • C in January, respectively. Additionally, the annual average relative humidity in the study area is between 80% and 85%. The highest humidity is recorded in August while the lowest humidity is observed in December [85].

Landslide Inventory Map
For the purpose of landslide spatial modeling, information regarding the landslide inventory and landslide influencing factors must be collected from the study area. The landslide inventory contains landslide events that occurred in the past. The landslide affecting factors cover a variety of aspects including geomorphological, geological, hydrological, and climatic factors which influence the likelihood of landslide occurrences.
In the study area, shallow soil slides and debris flows triggered by rainfall are the subject of interest. The reason is that no earthquake-induced landslides have been recorded in the region. It is of note that a small numbers of rock fall events were removed from the inventory due to their rarity, insignificant damage, and also of a different mechanism. The landslide inventory map used in this study was constructed by three means:  [86,87]. (iii) Some recent landslide locations were identified during field works by Nguyen et al. [88].
The rationale of the landslide prediction model is based on the assumption that the factors that triggered landslide events in the past will cause the landslide occurrences in the future [79]. Thus, determining the landslide conditioning factors is the critical issue for constructing an accurate landslide susceptibility map [89]. In this study, a total of 101 landslide locations were collected to establish the landslide inventory map (see Figures 1 and 2). They are soil mixed boulder slides, which occurred during the last 15 years. These landslide locations are obtained from three projects carried out in the previous research work Bui, Pradhan, Revhaug, Nguyen, Pham, and Bui [85]. It is proper to note that the 101 landslide locations are divided into two groups: group 1 including 69 locations is used for model training and group 2 consisting of 32 locations is employed for model validation. The number of pixels in the two groups of data is 2410 and 1045. In other words, the training set occupies~70% of the data and the validation set consists of~30% of the data. To complete the data set used for machine learning model establishment, data samples belonging to non-landslide locations are randomly sampled from the GIS database with the help of the ArcGIS 10.2 package.

Landslide Conditioning Factors
Furthermore, fourteen landslide conditioning factors, used in this study, are categorized into two groups of geo-morphometrical and geographical factors. The geo-morphometrical factors include slope degree, slope length, aspect, curvature, elevation, topographic wetness index (TWI), stream power index (SPI), sediment transport index (STI), valley depth, and toposhade. The geo-environmental factors comprise lithology, land use, soil type, and distance to faults. The selection of these factors is based on a literature reviewing process [8,45], a previous analysis by Nguyen et al. [88], and the availability of data in the study region.
To retrieve the geo-morphometrical factors, a digital elevation model (DEM) was constructed from the National Topographic Maps at scale of 1:5000 for Lang Son city. The resolution of the DEM is 5 m × 5 m and size of the DEM is 2567 and 2954 pixels. In this study, a raster resolution of 20 m was used for all the conditioning factors in the landslide modeling process. Based on the constructed DEM, slope degree, slope length, aspect, curvature, elevation, TWI, SPI, STI, valley depth, and toposhade were extracted using the ArcGIS 10.2 software package. Continuous values of these factors (except aspect) were reclassified into classes using Jenks Natural Break optimization method [67] available in ArcGIS 10. 2, as suggested and explained by Hung et al. [90].
Slope angle is the major parameter for the slope constancy analysis [91]. Slope angle is very frequently used in landslide susceptibility studies [92]. The slope map was generated and classified into six classes ( Figure 3a, Table 1). Slope length has been considered an important factor in landslide activity since longer slope lengths increase the potential of erosive agents to dislodge and transport materials downslope [93]. The slope length was prepared with five classes (Figure 3b, Table 1).
The slope aspect describes the direction of the slope and has an important effect on the rainfall, wind, and sunlight exposure. Consequently, this factor has been frequently applied in landslide susceptibility analyses by many scholars [94]. The aspect map was built with nine classes including flat, north, northeast, east, southeast, south, south-west, west, northwest ( Figure 3c, Table 1). Surface curvature at any point is the arc of a line. It is formed by the intersection of the surface with a specific alignment passing through this point. The value of the curvature can be either below, above, or equal to 0.05, representing the concave, convex, or flat shaped curvatures, respectively [95]. A curvature map was generated in five categories ( Figure 3d, Table 1). Elevation is one of the important factors in the occurrence of landslides. The height area affects the loading on the slope and thus enhances the chances of landslides if the sliding plain has dip (orientation) towards the open excavation [96].
The elevation map was classified into seven classes ( Figure 3e, Table 1). The topographic wetness index (TWI) is a hydrological factor that is affected by the slope and the specific catchment area of a watershed [97]. Moreover, the probability of landslide occurrence will be decreased when the TWI increases [98]. The TWI is calculated as follows: where α is cumulative upslope area drainage through a point (per unit contour length) and β is the angle of the slope at the point. Accordingly, the TWI map for the study area was classified into six classes ( Figure 3f, Table 1). The stream power index (SPI) demonstrates the erosion power of the stream in a region [99]. These hydrological factors are defined based on the specific catchment area (A S ) which is proportional to the discharge (Q) in a watershed. It is computed as follows: where A s is the specific catchment area, and β is the local slope gradient (in degree) [100]. The SPI map was prepared and categorized into six classes ( Figure 3g, Table 1). In addition, the sediment transport index (STI) indicates the amount of sediment transported by overland flow. This hydrological factor is based on the catchment evolution erosion theories and the transport capacity limiting sediment flux [101,102]. The STI is calculated from the following formula [103]: where As is the specific catchment area (m 2 /m) and β is the slope gradient [100]. In this study, the STI map was divided into six classes ( Figure 3h, Table 1).
The valley depth (VD) factor is defined as the difference in elevation between the pixel and the upstream ridge; this factor has a major contribution in the slope instability assessment [104]. The VD map for this study was constructed with six classes (Figure 3i, Table 1).
The toposhape as one of the important conditioning factors in occurrence of landslide in the current study was created and categorized into ten classes namely ridge, saddle, flat, ravine, convex hillside, saddle hillside, slope hillside, concave hillside, inflection hillside, and unknown hillside as suggested in [105,106] (Figure 3j, Table 1). Land use has a significant influence on slope instability and is widely considered in landslide susceptibility [107]. The land use map of the study area with seven classes (Figure 3k, Table 1) was constructed based on the land use status map at 1:50,000 scale; this land use map was provided by the local authority [88]. The dominant land use type in the study area is the forest land (43.4%) in which 35.7% and 7.7% of the land belong to productive and protective forests, respectively. Additionally, paddy land covers 21.5% of the total study area, followed by barren land (20.4%) and residential areas (6.9%), respectively. No.
Lithology plays a very important role in the landslide occurrences. Soft and weathered rocks are more vulnerable than hard unjointed rocks thus lithological units have different vulnerability degrees to landslides [109,110]. In this study, the lithology map is compiled based on four tiles of the Geological and Mineral Resources Map (GMRM) of Vietnam at the scale of 1:50,000 [84] in ArcGIS 10.2.
The study area consists of various types of lithological units (Figure 3m, Table 1).
Moreover, distance to faults is one of the most important affecting factors of landslides as slope may fail along the faults depending on the nature and orientation of the faults [111]. The distance to faults map with five categories (Figure 3n) was constructed from the fault lines of the lithological data with the help of ArcGIS 10.2 (Table 1). Table 1 and Figure 3 show all fourteen conditioning factors and their classifications used in the current study.

Relevance Vector Machine
Relevance Vector Machine (RVM), proposed by Tipping [112], is a popular machine learning technique, which can construct nonlinear classification models with probabilistic outcomes. This algorithm is essentially a supervised pattern classification problem relying on a training set of feature vectors X = {x n } N n=1 with the outputs C = {c n } 2 n=1 . Herein, C 1 and C 2 denote the landslide and non-landslide data samples, respectively. In landslide modeling, the task of interest is to establish a nonlinear classification model that assigns the set of feature vectors into two decision regions, which corresponds to the two aforementioned class labels. The problem at hand is non-trivial due to the multivariate, complex, and uncertain natures of landslide prediction.
Thus, RVM is deemed very suitable for modeling the task of interest. Similar to the concept of SVM, RVM first maps training data samples from the original input space to a higher dimensional space (called feature space). Accordingly, a hyperplane used for discriminating the data can be constructed in the feature space. The concept of RVM is demonstrated in Figure 4.

Kernel Mapping
Conditioning Factor x 1

Non-Landslide
Nonlinear decision boundary Hyper-plane constructed by RVM ) (y  = Without the loss of generality, the outputs C i are assigned with two possible outcomes: 0 for no landslide occurrence and 1 for landslide occurrence. The conditional distribution of landslide occurrence given information of the landslide predisposing factors and the probabilistic model is provided as follows: where σ(y) = 1 1+e −y represents a logistic sigmoid function; and y denotes a linearly-weighted sum of M basis functions: where φ represents a Gaussian radial basis function. Its functional formula is stated as follows: where r represents the width of the radial basis function (RBF). In this method, weights must be determined using a prior distribution function over the vector to formulate a Bayesian training criterion. Additionally, to prevent an over-fitting problem (very large values of w), achieving small weights in order to make a smooth classification boundary is the main objective [80,114]. This can be attained by assigning a zero-mean Gaussian distribution on the weight as follows [79]: where α is a vector of independent hyper-parameters; each element α m of the vector dictates how far its associated weight W m may vary from zero. Additionally, a sparse weight prior distribution is attained by assigning a different variance parameter for each weight [112]. Thus, the prior distribution over w is written in the following form [79]: Given initial values of the hyper-parameter α and p(w|C, α) ∝ P(C|w)p(w|α), the most probable weight µ can be found by maximizing the penalized negative log-likelihood function over w [79]: where A = diag (α 1 , α 2 , . . . , α m ). Furthermore, the least squares algorithm (LSA) is applied on weights iteratively and then the Laplace approximation procedure to solve the above optimization problem is conducted; the most probable weight µ and language its covariance Σ are computed as follows [79]: The remarkable result when the optimization process terminates is that many elements of the hyper parameter vector α approach infinity; therefore, the weight vector w only has a few non-zero elements, which are considered as relevant vectors [112]. After finishing the training process, the vector of model weight w is then used to predict the posterior of the class label C i given an input vector x utilizing Equation (4).
It is of note that the performance of RVM is dependent on the setting of the parameter r, which is the RBF basis width. This parameter strongly influences the smoothness of the classification boundary and therefore affects the model predictive capability [79]. A too smooth boundary may under-fit the data; meanwhile, a too rough boundary may lead to over-fitting [71]. The process of selecting a model parameter appropriately is widely known as a model selection problem. Since model selection can be formulated as an optimization task, we employed Imperialist Competitive Algorithm (ICA) optimization in this study as powerful metaheuristic to solve the model selection problem for RVM. The ICA algorithm is described in the subsequent part of the study.

Imperialist Competitive Algorithm (ICA)
The Imperialist Competitive Algorithm (ICA), proposed by Atashpaz-Gargari and Lucas [81], is inspired from the field of human social evolution. This algorithm belongs to the group of swarm intelligence, which can effectively deal with continuous optimization problems [115]. As other metaheuristics, ICA is specifically designed for solving optimization problems in which no exact algorithm can be applied in polynomial time, also called NP-hard problem. The task of finding an optimal value for the machine learning model can be classified as an NP-hard problem since interaction between the model structure and the training data is very complex and there is an infinite number of possible values of the tuning parameter. Since successful applications of ICA have been widely observed [116], this algorithm can be helpful to assist the training phase of RVM by selecting an optimal width of the radial basis function.
ICA is a population-based (metaheuristic optimization) stochastic search inspired by imperialistic competition [81]. This algorithm attempts to mimic the social policy of imperialism in the real world. When an empire rises, it dominates more colonies and takes advantage of their sources; if one empire falls, other empires will compete to take its possessions. In ICA, individuals within the population represent countries and they interact with each other to form empires that possess colonies ( Figure 5). ICA commences with an initial population and a pre-specified objective function. Based on the objective function value, the most powerful countries are chosen as imperialists and the others are colonies of them. The algorithm then simulates the competition among imperialists in order to acquire more colonies. The best imperialist typically has more chance to occupy more colonies. A population of ICA is illustrated in Figure 6.
it is noted that α represents a uniformly distributed random number, generated as follows After colonies have been assigned to each imperialist, these colonies move towards their corresponding imperialists. These movements of colonies are demonstrated in Figure 6. In this figure, it is noted that α represents a uniformly distributed random number, generated as follows [117]: where θ denotes a constant variable; typically, θ is greater than 1 (e.g., 1.5). S is the distance between the colony and the imperialist. During the competition process, the weakest empire gradually loses its colonies and other powerful empires attempt to obtain them. Moreover, the colonies will exchange their positions when they are more powerful than their relevant imperialist [118]. The empire that has no colonies will collapse and eventually the most powerful empire will dominate all other empires and represents the final optimal solution for the optimization problem at hand.

Statistical-Based Measures
Statistical index-based methods are applied to evaluate and compare the performance of a new proposed model with other soft computing benchmark models. In this study, sensitivity (recall), specificity, precision (positive predictive value (PPV)), and accuracy were utilized. To better understand the definition and formulation the above mentioned criteria, four types of possible consequences including True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN) are necessary to be described [119]. The TP and FP are defined as the proportion of the number of pixels that are correctly classified as landslide and non-landslide, respectively. Meanwhile, TN and FN are the number of pixels classified correctly and incorrectly as non-landslide, respectively [4]. Basically, the goodness-of-fit (using training dataset) and prediction power (using validation dataset) of the landslide susceptibility models are evaluated based on these statistical measures.
Moreover, sensitivity (recall), specificity, and accuracy are also commonly employed performance measurement metrics. These three metrics are computed as follows [

Receiver Operating Characteristic (ROC) Curve
The ROC curve is also widely used to assess the predictive power and quality of probabilistic classification models. The global performance of a landslides model is evaluated by the area under the ROC curve (AUROC) [121]. It is to be noted that the ROC is constructed based on the sensitivity (true positivity rate) and the specificity (false negative rate) results. The AUROC can be computed as: To quantify the global performance, AUROC that varies between 0.5 and 1 is also used. The closer the AUC is to one, the more accurate the prediction of the model is. Accordingly, The AUROC values of 0.5-0.6, 0.6-0.7, 0.7-0.8, 0.8-0.9, and 0.9-1 indicate insufficient, moderate, good, very good, and excellence performance, respectively [84].

GIS Database, Training and Validation Datasets
In order to construct a machine learning based model for shallow landslide prediction, it is necessary to establish a GIS database. The GIS database used in this study is illustrated in Figure 7. The locations of the past shallow landslide occurrences, maps of topographic feature, land use data, and the map of the geological features are processed and integrated into a single database. In total, fourteen input variables are employed to model the status of the rainfall-induced shallow landslide. It is worth noting that data was acquired, processed, and integrated via ArcGIS and IDRISI Selva packages. Consequently, for producing the susceptibility maps, the susceptible indexes of all the pixels of the area were calculated by the RVM inference process, the SVM, and the LR models. The two SVM hyper parameters of the kernel width (γ = 0.7) and the regularization parameter (C = 0.5) were selected to implement the SVM prediction model. Thereafter, these pixels with susceptible indexes were converted to GIS data using the ArcGIS 10.2 software (ESRI, California, USA) and a supplementary module developed in C ++ by the authors. To construct the shallow landslide prediction model, information of 6908 pixels within the map of the studied region was collected. Among them, 3453 pixels were associated with landslide occurrences. Since we formulate the landslide modeling as a supervised learning task, landslide occurrence data (or pixels) are assigned to the class label of C 1 = 1; and the label of non-landslide data are denoted as C 2 = 0. In addition, the whole data set is divided into two subsets: Training Set (70%) used for model construction and Validation Set (30%) employed for model testing. In this study, to facilitate the modeling process, all continuous and discreet condition factors were normalized and converted from categorical classes into continuous values within the range of 0.01 and 0.99 by the frequency ratio method described in Tien Bui et al. [85].

The Proposed Model Structure
The overall structure of the RVM-ICA model which is a combination of RVM and ICA algorithms is shown in Figure 8. RVM serves as a pattern recognition algorithm to distinguish data samples with a landslide label from data samples with a non-landslide label. This machine learning method was employed to generalize a classification boundary from the collected data set. This classification boundary is capable of recognizing the status of landslide occurrences in the study area. The prediction results of RVM were then used to construct a landslide susceptibility map for the city of Lang Son. As aforementioned, the model construction of RVM necessitates an appropriate setting of the RBF basis width. Therefore, in this study, we employed ICA to fine-tune this parameter of RVM and achieve the most desirable prediction performance.
At the first iteration, the RBF basis width of RVM is produced randomly between 0 and 1. The RVM model associated with this basis width value is established with the data from the training set. To better evaluate the quality of a RVM model, we further divide the training set into two data groups: Data for model construction (70%), is denoted as group 1 and data for model prediction (30%) is denoted as group 2. The data in Group 1 is used to train the RVM model; the data in group 2 serves as unknown patterns. In the model evaluation of RVM, the following objective function is used; where CSR Group1 and CSR Group2 denote classification success rates of the two groups of interest.
= + Figure 8. The proposed RVM-ICA used for landslide spatial modeling.
The purpose of data separation in the training set is to alleviate the over-fitting problem. It is of note that one may simply identify the most suitable RBF basis width parameter by considering the model prediction performance on the whole the training set. However, the prediction results on the training set alone are not good indicators of the model generalization due to an over-fitting issue. The over-fitting generally occurs when a model learns the training data very well but with the data poorly predicts outside of the training set.
Therefore, the data in group 2 is utilized to penalize the over-fitted model and identify a good value for the RBF basis width. Since the objective function has been defined, the ICA operation can start. During the searching process, ICA gradually recognizes suitable values of the tuning parameter and discards inappropriate ones. When the stopping criterion is satisfied, the ICA operation stops; accordingly, a desirable tuning parameter has been identified. The optimized RVM-ICA model is re-trained with the whole training set and the prediction outcome on the validation set can be obtained.

Factor Selection Using Information Gain Ratio (IGR)
In landslide prediction, it is necessary to preliminarily investigate the relevancy of the influencing factors to ensure that the essential factors are selected and the irrelevant factors are removed from further analysis [122]. In this study, Information Gain (IGR) algorithm, which is a commonly-used feature selection method [91], was selected for expressing the prediction power of the aforementioned landslide conditioning factors. The IGR, as a filtering approach, is an entropy-based method that only considers important factors on landslide occurrence. This method aims at ranking subsets of features based on the results of information gain entropy in a decreasing order [123]. Consider C as a training dataset composed of n input samples which is the number of samples in the training dataset C, belonging to the class label (landslide, non-landslide locations). The IGR for a specify landslide conditioning factor (CF) and training data (C) is obtained with the following equation [63,124]: The results of the factor evaluation based on IGR are shown in Table 2. It can be observed that slope is the most significant factor for shallow landslide modeling in this study because it has the highest value of average predictive ability (0.601 0.  023 0.001), respectively. It can be observed that slope is the most significant factor for shallow landslide modeling in this study because it has the highest value of average predictive ability, followed by STI, aspect, SPI, TWI, land use, curvature, toposhade, lithology, elevation, slope length, distance to faults, soil type, and valley depth, respectively.

Training and Validation Process
In this study, based on the training dataset, the modeling of landslide occurrences was performed and analyzed. Based on the optimization outcome of ICA, the optimal RVM hyper parameter, which is the basis width, was found to be 0.979. Therefore, the proposed model was trained with the basis width of 0.979. The training result of the proposed model is reported in Table 3. It can be observed that the proposed model has very high values of sensitivity (87%), specificity (96.6%), as well as accuracy (91.3%). The training result with the collected dataset demonstrates that the learning phase of the proposed RVM-ICA has been achieved successfully, reflected by a goodness-of-fit between the actual and predicted class labels.
Additionally, the proposed model was evaluated using the validation dataset to assess its prediction power in shallow landslide modeling. The validating results of the proposed RVM-ICA are reported in Table 4 and Figure 9. It can be seen that the proposed model also obtained high values of sensitivity (84.1%), specificity (90.6%), and accuracy (87.1%). Moreover, the predictive results of the ROC curve analysis pointed out that the proposed model has a very high value of AUC (0.92) (see Figure 9). The observed validation results are in line with the training outcomes. In summary, both the training and validating phases indicate that the proposed model achieved a very good predictive capability for shallow landslide modeling.

Construction of the Susceptibility Map
Since RVM-ICA has achieved a very good predictive result with the data set collected from the study area, this hybrid intelligent model was employed to prepare a landslide susceptibility map for the city of Lang Son (Vietnam). The landslide susceptibility map for the study area ( Figure 10) was constructed with five susceptible classes including very low (40%), low (20%), moderate (15%), high (15%), and very high (10%) [125,126]. These labels were determined on the basis of reclassification of susceptible indexes of all pixels in the map.
To validate the reliability of the produced susceptibility map, the landslide inventory map, which includes the actual landslide locations, was overlaid with the newly constructed map. The graphic curve [127] was plotted with the percentage of the landslide pixels on the y-axis and the percentage of pixels of susceptible classes which were sorted from high to low susceptible indexes. It can be seen from the graphic curve that most of the actual landslide pixels were detected in high and very high classes whereas very few actual landslide pixels were located in low and very low classes. The AUC was obtained with the models for the whole study area (Figure 10). These results depicted that the produced landslide susceptibility map established by the proposed RVM-ICA model is highly reliable for practical landslide hazard management.

Model Comparison
Since the proposed model has been newly introduced for landslide prediction, its predictive capability should be compared with other well-known existing methods. In this study, two benchmark models including the LR and SVM were selected for comparison. The results of the model comparison are reported in Tables 3 and 4, and Figure 8. It can be shown that the sensitivity, specificity, accuracy, and AUC of the proposed model are better than those of the other benchmark models (SVM and LR) in both training and validation phases. According to the results, the proposed model outperformed the SVM and LR models in shallow landslide prediction.

Discussion
Landside modelling has been studied based more on qualitative and quantitative methods throughout the world. Among these, machine learning and evolutionary algorithms have been more favored [4,124]. The aim of the introduction of these techniques is to use an accurate method to prepare a reliable landslide susceptibility map. Basically, in the current study we present a new state-of-the-art evolutionary/optimization algorithm of the imperialistic competitive algorithm (ICA) based relevance vector machine (RVM) namely ICA-RVM for spatial prediction of landslides in Lang Son city. Additionally, two machine learning algorithms including LR and SVM were used for comparison of the new proposed model.
The ICA-RVM had not been explored for spatial prediction of landslides. We selected fourteen condition factors for the modelling process. The information gain ratio was used to determine the most important factors for the modelling process. Results indicated that slope angle was the most significant factor for landslide occurrence in the study area. The analysis results are reasonable because slope is well-known as the most important affecting factor for landslide occurrences [128]. The outcomes of IGR-based analysis also show that all factors are relevant to landslide modeling as the average predictive ability of all factors is high; therefore, all fourteen influencing factors were selected for landslide modeling in this study.
Results analysis concluded that the new proposed model had the highest goodness-of-fit and performance according to the training and validation datasets, respectively (see Tables 3 and 4). However, the ICA-RVM (accuracy = 87.1%) model can significantly enhance the performance of LR (accuracy = 79.8%) and SVM (accuracy = 84.6%). The validity of landslide susceptibility maps prepared using the new hybrid model and two state-of-the-art algorithms, LR and SVM, was checked by the validation dataset and designing the ROC curve ( Figure 8). The results of this figure indicated that the ICA-RVM outperformed and outclassed the LR and SVM as benchmark algorithms. The reason for the improvement in prediction accuracy of the proposed method is that RVM-ICA is a hybrid model which inherits the advantages of both machine learning and metaheuristic approaches [72]. Additionally, the proposed model used the RVM model as the classifier which has many advantages compared with others machine learning approaches. The first advantage is that RVM has only one hyper parameter; this significantly eases its model selection phase. The second advantage is that the resulting model constructed by RVM generally has fewer support vectors. This means that the model of RVM is less complex than the standard SVM [78]; therefore, RVM is less prone to over fitting than SVM. Moreover, the results which demonstrate that the SVM model outperforms the LR model for landslide prediction in the present study are in agreement with the findings in previous studies [129].

Concluding Remark
In this study, RVM-ICA, which is a combination of two advanced computational intelligence methods of RVM and ICA, was proposed for spatial landslide modeling with the case study of Lang Son city (northern part of Vietnam). This area has experienced many serious landslides in recent years during the monsoon season. The proposed model was constructed and validated with a dataset generated from the GIS database of the study area. The data set includes 101 historical landslide locations and fourteen landslide influencing factors. The predictive capability of the proposed model was verified by the employment of the area under the ROC curve (AUROC), and three statistical indexes (sensitivity, specificity, and accuracy).
In addition, two benchmark models including SVM and LR were used to compare and validate the applicability of the new shallow landslide prediction model. Analysis results showed that the new hybrid model obtained a very good training performance (accuracy = 91.3%) and a desired predictive outcome with the validating data set (accuracy = 87.1%). More specially, the performance of the proposed model is significantly better than those of the two benchmark models (SVM and LR). These facts indicate that the hybrid machine learning model is a very promising alternative for shallow landslide modeling. In short, the hybrid approach of the advanced machine learning classifier and the metaheuristic optimization method is capable of producing an accurate landslide susceptibility map. One significant advantage of the hybrid RVM-ICA model is that the model parameter selection phase is carried out automatically with the help of the metaheuristic algorithm. This means that the training and prediction phases of the proposed model can be performed autonomously with minimum human intervention. Thus, the model can be used by local authorities without much domain knowledge in machine learning. This advantage and the experimental outcome confirm that RVM-ICA is very helpful for planners and decision makers in the task of landslide hazard management in the study areas. The designed hybrid model can be applied for the same purposes to the other cities with the same weather and topographic conditions; however, caution and careful verification are very necessary.