An Ensemble Broad Learning System (BLS) for Evaluating Landslide Susceptibility in Taiyuan City, Northern China

Landslides are common and highly destructive geological hazards that threaten human lives and property around the world every year. In this study, a novel ensemble broad learning system (BLS) was proposed for evaluating landslide susceptibility in Taiyuan City, Northern China. Ensemble learning models based on the classification and regression tree (CART) and support vector machine (SVM) algorithms were applied for comparison with the BLS-AdaBoost model. First, a total of 114 landslide locations were identified and randomly divided into two parts, namely 70% for model training and the remaining 30% for model validation. Twelve landslide conditioning factors were selected for mapping landslide susceptibility. Subsequently, three models, namely CART-AdaBoost, SVM-AdaBoost and BLS-AdaBoost, were constructed and used to map landslide susceptibility. The frequency ratio (FR) was used to assess the relationship between landslides and the different influencing factors. Finally, the three models were validated and compared on the basis of both statistical index-based evaluations and ROC curve-based evaluations. The results showed that the integrated model with BLS as the base learner achieved the highest AUC value of 0.889, followed by the integrated models that used CART (AUC = 0.873) and SVM (AUC = 0.846) as the base learners. In general, the BLS-based integrated learning methods are effective for evaluating landslide susceptibility. The application of BLS and the integrated BLS model to landslide susceptibility is currently limited, and this study is one of the first efforts in this direction. BLS and its improvements have the potential to provide a more powerful approach to assessing landslide susceptibility.


Introduction
A landslide is a significant movement of soil, rock and debris. As a common geological hazard that no country can ignore [1], landslides not only endanger people's lives and property, but can also have a massive impact on society. Data from several database sites indicate that, in China, the economic losses from landslides exceed CNY 2 billion, and more than 400 people lose their lives in landslides annually. Furthermore, with the advancement of technology, urbanization, and climate change, landslides are becoming increasingly frequent. Consequently, landslide susceptibility mapping (LSM) has received increasing attention as an important part of coping with the damage caused by landslides [2]. LSM encompasses the modeling and mapping of a slope's sensitivity to forecast future landslide occurrences and evaluate their probabilities. By utilizing the results of prediction, policymakers and local agencies can categorize geographical surfaces into regions characterized by varying degrees of stability and instability, thereby offering invaluable support for effective landslide risk management and urban development [3].
In recent decades, numerous approaches have been suggested for the preparation of landslide susceptibility maps. With the advent of quantitative research on geological hazards, statistical methods were first used for landslide susceptibility assessments. Methods such as analytic hierarchy process [4], bivariate and multivariate statistical approaches [5][6][7], logistic regression [8] and the multivariate adaptive regression spline [9][10][11] have been widely used by scholars for mapping landslide susceptibility. Hybrid methods, including frequency ratio bivariate statistical models, weights-of-evidence bivariate statistical models, bivariate statistic-based kernel logistic regression (KLR) models and the bivariate model of the Dempster-Shafer evidential belief function (EBF), have also been proposed and widely applied to analyses of landslide susceptibility [12][13][14]. As landslides are a complex geological hazard, a single statistically based approach cannot work well in all areas. Therefore, more sophisticated machine learning algorithms were considered.
Nowadays, a variety of machine learning methods are being utilized to prepare landslide susceptibility maps, including support vector machines (SVMs), decision trees [15], K-nearest neighbors (KNN) [16], Extremely Randomized Tree (EXT) [17], Bayesian classifiers, artificial neural networks [18], the adaptive neuro-fuzzy inference system (ANFIS) [19] and convolutional neural networks (CNNs) [20]. Among these, deep learning, a newer machine learning method proposed to overcome the limitations of conventional machine learning algorithms, has been verified to perform well in evaluating landslide susceptibility [21].
In areas with fewer landslide data, better results can also be achieved through a transfer learning-based approach [22]. While deep learning, which relies on deep architectures and gradient descent, can improve the accuracy of predictions, it also requires a significant amount of data and time to train the model, and evaluations of landslide susceptibility are no exception. To address this issue, Chen proposed a novel learning system known as the Broad Learning System (BLS) [23], which does not rely on deep structures. The BLS achieves its task of prediction by varying the number of mapped features and enhancement nodes instead of the number of network layers, and by solving for a pseudo-inverse instead of using gradient descent. This approach allows for high training efficiency and good classification performance. The BLS has been shown to perform similarly to methods such as stacked autoencoders, deep belief networks, deep Boltzmann machines and multilayer perceptrons on MNIST data, while having the fastest training speed [24]. BLS has been used as a promising approach in many fields, such as license plate recognition, predicting short-term wind speed, identifying wind turbine faults and industrial process fault diagnostics [25][26][27][28]. The fast training speed and good predictions of BLS also make it highly promising for evaluating landslide susceptibility.
In addition to the aforementioned methods, the development of reliable landslide susceptibility maps through ensemble learning and optimization algorithms has emerged as a crucial area of research in landslide prevention. Various algorithms, including Gradient Boosting Decision Tree (GBDT) [29], natural gradient boosting, COA-MLP and SFO-MLP, among others, have demonstrated their effectiveness in the field of evaluating landslide susceptibility. These advanced algorithms have shown promising results and hold great potential for improving the precision and dependability of landslide susceptibility maps.
In this study, an ensemble learning model based on BLS was introduced for preparing landslide susceptibility maps. Ensemble learning models based on the classification and regression tree (CART) and SVM were created for comparison with the BLS-AdaBoost method. The entire workflow for this study is shown in Figure 1. Firstly, various data were processed using ENVI 5.3 and ArcGIS 10.2 software. Secondly, all landslide conditioning factors were evaluated using the random forest method. The landslide factors used in the study were also selected at this step. Thirdly, three models, namely CART-AdaBoost, SVM-AdaBoost and BLS-AdaBoost, were constructed and used to map landslide susceptibility. In the final step, all integration methods were evaluated and compared using statistical indices and ROC curves.

Study Area
Taiyuan City is situated in the central region of Shanxi Province, northern China, covering approximately 6988 km². It lies between longitudes 111°30′ and 113°09′ E, and latitudes 37°27′ and 38°25′ N. Taiyuan experiences a warm temperate continental monsoon climate with an average annual precipitation of up to 390 mm, typical of the interior of the continent. The Fen River and the Hu Yu River flow through Taiyuan City, forming the main surface water system.
The landforms in Taiyuan are diverse and complex, with mountains, hills, plains, basins and valleys all present. The mountains cover 4528 km², accounting for 64.79% of the total area, and are mainly composed of Carboniferous and Permian sandstones, shales and Quaternary loess. The hills cover 904 km², accounting for 12.94%; the plains cover 1093 km², accounting for 15.64%; the basins cover 279 km², accounting for 3.99%; and the valleys cover 184 km², accounting for 2.63%. The terrain in the area is undulating, with elevations ranging from 760 m to 2708 m.
In particular, the mountainous area in the west of Taiyuan consists mainly of Carboniferous and Permian sandstone, shale and Quaternary loess. The terrain is intricate, and the natural geological conditions are unfavorable. Combined with relatively frequent human activities such as mining, these conditions cause landslides to occur frequently in Taiyuan.

Data Used
A landslide inventory map is necessary for analyzing landslide susceptibility. To ensure the accuracy of the landslide inventory map, new methods based on satellite and ground-based remote sensing have been developed to enhance the reliability of these maps. In this study, a comprehensive landslide inventory map was created by integrating historical landslide information, high-resolution remote sensing imagery and extensive field observation data. In total, 114 landslide locations were identified, and their centroids are depicted in Figure 1. Additionally, 114 reasonable points were selected from the study area as negative samples using GIS, resulting in a dataset with both landslide and non-landslide points. All these landslide points and non-landslide points were divided into two subsets, where 70% of the points were randomly selected for model training and the remaining 30% were used for model validation. The locations of all points in the dataset are shown in Figure 2.
The selection of the influencing factors is crucial for evaluating landslide susceptibility [30]. The selected factors directly affect the performance of the model. However, there is no general standard or guideline on how to select appropriate control variables [31]. In this study, considering previous research and the geological and environmental characteristics, 12 landslide conditioning factors were selected to assess the landslide susceptibility. These landslide conditioning factors can be divided into three categories. The first category reflects the geological and environmental conditions, including the normalized difference vegetation index (NDVI) and the distance to rivers. The second category reflects the topographic and geomorphic conditions, including the elevation, slope aspect, slope angle, plan curvature, stream power index (SPI), topographic wetness index (TWI), terrain roughness index, terrain relief and surface cutting depth. The third category reflects human activity, namely the distance to roads.
Each landslide conditioning factor was processed from the available data, and the final output was a raster image with a resolution of 30 m × 30 m. All the landslide conditioning factors used in this study are presented in Figure 3. The raw remote sensing images were subjected to radiometric calibration, atmospheric correction and fusion using ENVI 5.3 software to generate usable images. The TWI, SPI, surface cutting depth and terrain relief were calculated with the raster calculator. Moreover, the distances from roads and water were derived using the buffer function. Other factors were acquired directly through the respective tools provided by ArcGIS.
Slope aspect is a critical factor that can significantly impact a slope's stability due to its influence on solar radiation and exposure to moisture. To accurately represent the actual terrain conditions, slope aspect was derived from a digital elevation model (DEM) and categorized into nine levels at 45-degree intervals. If the input raster data represented a flat area, it was assigned to the flat level.
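The nine-level aspect classification described above can be sketched as follows. This is an illustration only: the class names, the convention that a negative aspect value flags a flat cell, and the placement of north across 337.5°-22.5° are assumptions borrowed from common GIS practice, not details stated in the text.

```python
import numpy as np

# Class names are illustrative; the text only states nine levels at 45-degree intervals.
ASPECT_LABELS = ["Flat", "North", "Northeast", "East", "Southeast",
                 "South", "Southwest", "West", "Northwest"]

def classify_aspect(aspect_deg):
    """Map aspect in degrees (clockwise from north) to a class index 0..8.

    Assumption: negative values mark flat cells, as in many GIS aspect rasters.
    """
    aspect = np.asarray(aspect_deg, dtype=float)
    # Shift by 22.5 degrees so that the north sector spans 337.5-360 and 0-22.5.
    sector = np.floor(((aspect + 22.5) % 360.0) / 45.0).astype(int) + 1
    return np.where(aspect < 0, 0, sector)

classes = classify_aspect([-1, 0, 44, 90, 180, 350])
print([ASPECT_LABELS[c] for c in classes])
```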
Elevation has been identified as one of the primary factors affecting landslides. Higher elevations generally correspond to steeper slopes and less vegetation cover, leading to higher rates of erosion and greater susceptibility to landslides [32,33].
Slope is a critical factor affecting the occurrence and distribution of landslides. A steeper slope increases the downslope component of the gravitational force acting on the soil and rock mass, making it more likely to become unstable and slide. Numerous studies have provided evidence supporting the significance of slope as a critical factor influencing landslide susceptibility [34]. Slope was reclassified into five categories: <3 ...
The topographic wetness index (TWI) is a measure of the influence of topography on the underlying hydrological processes, and it is commonly used to evaluate the susceptibility of slopes to landslides. Higher TWI values indicate greater soil saturation and a higher likelihood of landslides [35]. In the study area, the TWI values ranged from 2.5 to 30.5, and five categories of TWI were identified: <5, 5-7, 7-12, 12-17 and >17.
Surface cutting depth refers to the difference in elevation between the ridge and valley of the surface terrain [36]. Studies have shown that surface cutting depth has a significant impact on landslides' occurrence. A greater surface cutting depth indicates a larger amplitude of relief and more significant differences in terrain elevation, which can increase the potential for landslides due to factors such as soil erosion and weathering [37]. The surface cutting depths were divided into five categories: <3.5, 3.5-6, 6-9.5, 9.5-12.5 and >12.5.
Plan curvature is a terrain parameter that measures the change in a slope's orientation along a surface. Numerous studies have investigated the relationship between plan curvature and landslides, with many finding a significant correlation between the two [38,39]. The values of plan curvature were reclassified into five categories: <−0.7, −0.7 to −0.3, −0.3 to 0.02, 0.02 to 0.6 and >0.6.
According to existing research, terrain relief is a significant factor affecting landslides. High terrain relief implies a steep slope, and steep slopes have a higher likelihood of experiencing landslides [40]. The terrain relief map was reclassified into five divisions: <7, 7-11.5, 11.5-18.5, 18.5-28 and >28.
The distance to roads is also an important factor affecting landslides. Slopes closer to roads are more prone to landslides due to human activities such as road construction and vehicle traffic, which can alter the natural state of slopes and increase their susceptibility to landslides [41]. The distance to roads was reclassified into five categories: <800 m, 800-1600 m, 1600-2400 m, 2400-3200 m and >3200 m.
The distance to rivers is another important factor affecting landslides. Some studies have shown that slopes near rivers are more prone to landslides because the erosive effects of rivers may cause the slope to become unstable [42]. The distance to rivers was categorized into five groups: <500 m, 500-1000 m, 1000-1500 m, 1500-2000 m and >2000 m.
SPI, which is a measure of erosion by flowing water, is also considered an important factor affecting slopes' stability. The values of SPI were reclassified into five groups: <−3, −3 to 1.5, 1.5-2.5, 2.5-4.5 and >4.5.
NDVI, serving as an indicator of vegetation cover and health, can contribute to a slope's stability and lower the risk of landslides in regions characterized by high NDVI values [9]. Therefore, NDVI was selected as an indicator for evaluating the susceptibility of slopes to landslides. The NDVI values were reclassified into five categories: <0.15, 0.15-0.23, 0.23-0.28, 0.28-0.36 and >0.36.
The terrain roughness index is considered to be an influential variable in landslide susceptibility modeling [43]. Rougher terrain tends to have steeper slopes and less cohesive soils, making the soil more likely to become unstable. The values of the terrain roughness index were reclassified into five groups, i.e., <1.01, 1.01-1.03, 1.03-1.11, 1.11-1.16 and >1.16.
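As an illustration of the reclassification step used for the continuous factors, the sketch below assigns TWI values to the five classes listed in the text (<5, 5-7, 7-12, 12-17, >17). The function name and the handling of values that fall exactly on a class boundary are assumptions; the study itself performed this step in ArcGIS.

```python
import numpy as np

# Interior class boundaries for TWI as listed in the text.
TWI_BREAKS = [5.0, 7.0, 12.0, 17.0]

def reclassify(values, breaks):
    """Return class indices 1..len(breaks)+1 for each value.

    Assumption: a value equal to a boundary is placed in the upper class.
    """
    return np.digitize(values, breaks, right=False) + 1

twi = np.array([2.5, 5.0, 6.9, 11.0, 16.0, 30.5])
print(reclassify(twi, TWI_BREAKS))
```

The same helper could be reused for the other factors by swapping in their boundary lists.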

Training and Validation Datasets
To assess the model's performance, the samples generated by processing were separated into two distinct sets: a training set and a test set. The dataset consisted of a total of 228 points, comprising both landslide points and non-landslide points. These points were randomly divided into two subsets, with 70% allocated to the training set and 30% to the test set. Each data point was labeled as "1" for landslide points and "0" for non-landslide points within the dataset.
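A minimal sketch of such a random 70/30 split is given below. The random seed and variable names are illustrative assumptions; only the sample counts (114 landslide and 114 non-landslide points) and the split ratio come from the text.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen only for reproducibility

n_landslide, n_non_landslide = 114, 114
# Label 1 marks landslide points, label 0 marks non-landslide points.
labels = np.concatenate([np.ones(n_landslide, dtype=int),
                         np.zeros(n_non_landslide, dtype=int)])

indices = rng.permutation(labels.size)
n_train = int(round(0.7 * labels.size))  # 0.7 * 228 = 159.6, rounded to 160
train_idx, test_idx = indices[:n_train], indices[n_train:]

print(len(train_idx), len(test_idx))
```

With 228 points, this yields 160 training samples and 68 test samples.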

Broad Learning System (BLS)
The BLS is a model based on the Random Vector Functional-Link Neural Network (RVFLNN), which does not rely on deep structures. The RVFLNN was first proposed by Pao and Takefuji in 1994 [44]. The basic structure of the RVFLNN is illustrated in Figure 4, where $X \in \mathbb{R}^{m \times n}$ represents the input data, $h$ represents the hidden nodes and $\omega_h$ represents the weight between the enhancement layer and the input layer. The collection of all these weights can be denoted as $W_h = [\omega_{h_1}, \omega_{h_2}, \cdots, \omega_{h_k}]$, and the collection of all these enhancement nodes can be denoted as $H = [h_1, h_2, \cdots, h_k]$, where $k$ is the number of enhancement nodes. The combination of $H$ with $X$, as the input to the prediction process, is denoted as $A$. $Y \in \mathbb{R}^{m \times c}$ represents the output, and $W_o$ represents the weight between $A$ and the output layer. Therefore, the RVFLNN can be formulated as

$$Y = A W_o \tag{1}$$

where $A = [X \mid H]$.

The structure of the BLS is based on the RVFLNN, with some improvements to the input. The BLS no longer uses the input data $X$ to obtain the enhancement layer and the output directly. Instead, the input $X \in \mathbb{R}^{m \times n}$ is first processed to obtain the feature nodes, and then the enhancement nodes are obtained from the feature nodes. The feature nodes and enhancement nodes collectively form the input $A$ to the prediction process. The process from $X$ to the mapped features and the process from the mapped features to the enhancement nodes can be expressed by the following equations, respectively:

$$Z_i = \phi\left(X W_{e_i} + \beta_{e_i}\right), \quad i = 1, 2, \cdots, n \tag{2}$$

$$H_j = \xi\left(Z^n W_{h_j} + \beta_{h_j}\right), \quad j = 1, 2, \cdots, m \tag{3}$$

In these formulae, $n$ groups of mapped features make up the feature node layer, denoted as $Z^n = [Z_1, Z_2, \cdots, Z_n]$. The enhancement layer, denoted as $H^m = [H_1, H_2, \cdots, H_m]$, consists of a set of $m$ enhancement nodes. The weights ($W_{e_i}$, $W_{h_j}$) and biases ($\beta_{e_i}$, $\beta_{h_j}$) corresponding to the mapped features and the enhancement nodes are randomly generated, and $\phi$ and $\xi$ denote the respective mapping functions.
The weight between the output and $A$ is defined as $W_n$, and the output results can be formulated as

$$Y = [Z^n \mid H^m] W_n = A W_n \tag{4}$$

where $W_n$ is the result that needs to be obtained by training and is the key to making predictions. $W_n$ can be quickly calculated by the following equation:

$$W_n = [Z^n \mid H^m]^{+} Y = A^{+} Y \tag{5}$$

where $A^{+}$ denotes the pseudo-inverse of $A$.
In this study, 160 data samples, each containing 12 impact factor values, were used as input to the model, denoted by $X$. The feature nodes generated from the input were denoted by $Z$, and the enhancement nodes generated from the feature nodes were denoted by $H$. Together, they constituted the prediction input $A$. The landslide conditions of the 160 samples were encoded as 0 and 1 to form the target output $Y$. The $A^{+}$ required for the prediction was eventually obtained by calculating the pseudo-inverse. Figure 5 illustrates the architecture of the BLS model.
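The training procedure described in this section (random feature nodes, random enhancement nodes, and output weights obtained from a pseudo-inverse) can be sketched in a few lines of NumPy. This is a hedged illustration rather than the configuration used in the study: the node counts, the linear feature mapping, the tanh enhancement activation, the ridge parameter `lam` used to approximate the pseudo-inverse, and the synthetic data are all assumptions.

```python
import numpy as np

def bls_fit(X, Y, n_groups=5, nodes_per_group=4, n_enhance=20, lam=1e-3, seed=0):
    """Fit a basic BLS: random feature/enhancement nodes, ridge-solved output weights."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Randomly generated weights and biases for the n groups of feature nodes.
    We = [rng.standard_normal((d, nodes_per_group)) for _ in range(n_groups)]
    be = [rng.standard_normal(nodes_per_group) for _ in range(n_groups)]
    Z = np.hstack([X @ W + b for W, b in zip(We, be)])  # linear mapping (assumption)
    # Randomly generated weights and biases for the enhancement nodes.
    Wh = rng.standard_normal((Z.shape[1], n_enhance))
    bh = rng.standard_normal(n_enhance)
    H = np.tanh(Z @ Wh + bh)                            # tanh activation (assumption)
    A = np.hstack([Z, H])
    # Ridge-regularized approximation of the pseudo-inverse solution W = A^+ Y.
    Wn = np.linalg.solve(lam * np.eye(A.shape[1]) + A.T @ A, A.T @ Y)
    return {"We": We, "be": be, "Wh": Wh, "bh": bh, "Wn": Wn}

def bls_predict(model, X):
    """Rebuild A = [Z | H] for new inputs and apply the trained output weights."""
    Z = np.hstack([X @ W + b for W, b in zip(model["We"], model["be"])])
    H = np.tanh(Z @ model["Wh"] + model["bh"])
    return np.hstack([Z, H]) @ model["Wn"]

# Synthetic stand-in for 160 training samples with 12 conditioning factors.
rng = np.random.default_rng(1)
X_demo = rng.standard_normal((160, 12))
y_demo = (X_demo[:, 0] > 0).astype(float)               # synthetic 0/1 labels
model = bls_fit(X_demo, y_demo[:, None])
pred = (bls_predict(model, X_demo)[:, 0] > 0.5).astype(float)
print("training accuracy:", np.mean(pred == y_demo))
```

Because only the output weights are solved for, training reduces to a single linear solve, which is the source of the speed advantage noted above.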

Support Vector Machine (SVM)
SVM is a widely used supervised classification method in the field of landslide susceptibility analysis [45]. The key to SVM is finding the hyperplane that separates the data into different classes. The initial data are mapped into a feature space of higher dimensions, and an optimal hyperplane is then sought in this space. The effectiveness of the hyperplane is assessed using the margin, which is defined as the distance between the hyperplane and the nearest points from each class in the high-dimensional feature space. A larger margin represents a more suitable hyperplane.
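To make the notion of margin concrete, the following sketch computes the geometric margin of a given hyperplane for a toy two-class dataset. It works directly in the input space with a linear hyperplane (no kernel mapping), and the data and function name are illustrative assumptions.

```python
import numpy as np

def geometric_margin(w, b, X, y):
    """Smallest signed distance y_i * (w . x_i + b) / ||w|| over all samples."""
    return float(np.min(y * (X @ w + b) / np.linalg.norm(w)))

# Toy data: class -1 lies on x1 = 0, class +1 lies on x1 = 3.
X = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]])
y = np.array([-1, -1, 1, 1])

margin_centered = geometric_margin(np.array([1.0, 0.0]), -1.5, X, y)  # hyperplane x1 = 1.5
margin_offset = geometric_margin(np.array([1.0, 0.0]), -0.5, X, y)    # hyperplane x1 = 0.5
print(margin_centered, margin_offset)
```

Both hyperplanes separate the classes, but the centered one has the larger margin and is therefore the preferred choice.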

Classification and Regression Trees (CART)
CART is a method based on decision trees. The algorithm comprises two primary steps: generation of the decision trees and pruning. The generation of decision trees is a recursive process that constructs a binary tree. At each node, the data are split according to rules, creating two subsets with the highest category purity. This process continues by dividing the resulting subsets using different rules [46]. However, the recursive process of generating regression trees often leads to excessively large decision trees, which can hinder the model's generalization ability. Therefore, the pruning process becomes crucial to ensure that the model retains only the most important information (i.e., it selectively retains the nodes that explain the largest deviations) [47].
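The node-splitting rule can be illustrated with a short sketch that searches one feature for the threshold giving the purest pair of subsets. The Gini impurity is used here as the purity measure, which is the usual choice for CART classification; the toy slope values and labels are hypothetical.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a set of 0/1 labels."""
    if labels.size == 0:
        return 0.0
    p = np.bincount(labels, minlength=2) / labels.size
    return float(1.0 - np.sum(p ** 2))

def best_split(x, y):
    """Return (threshold, weighted Gini) of the best binary split x <= t on one feature."""
    best_t, best_g = None, np.inf
    values = np.unique(x)
    # Candidate thresholds are midpoints between consecutive distinct values.
    for t in (values[:-1] + values[1:]) / 2.0:
        left, right = y[x <= t], y[x > t]
        g = (left.size * gini(left) + right.size * gini(right)) / y.size
        if g < best_g:
            best_t, best_g = float(t), g
    return best_t, best_g

slope = np.array([2.0, 4.0, 6.0, 15.0, 20.0, 30.0])  # hypothetical slope angles
label = np.array([0, 0, 0, 1, 1, 1])                 # hypothetical landslide labels
threshold, impurity = best_split(slope, label)
print(threshold, impurity)
```

A full tree applies this search recursively to each resulting subset and over all features before pruning.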

Adaptive Boosting (AdaBoost)
AdaBoost is widely recognized as one of the most renowned boosting algorithms [48]. It is an ensemble method based on a linear superposition model. The input data, X, are first fed into a base learner for classification, where misclassified data are given a larger weight and correctly classified data are given a smaller weight. The input data X are then reweighted according to the weights obtained and fed into the next base learner, so that the misclassified data receive more attention in the next round of prediction. This process is repeated until a certain number of base learners have been obtained. Finally, the predictions of all base learners are combined as a linear superposition of model instances, weighted according to their performance on the training set. All base learners involved in the training and prediction process should predict better than random guessing on the training set. Figure 6 provides an illustration of the aforementioned process.
In this study, three integrated models were constructed for evaluating landslide susceptibility using CART, SVM and BLS as the base learners of the AdaBoost algorithm. The predictions of the base learners for landslide events were combined linearly to generate the outputs of the integrated models.
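The reweighting and linear combination steps of AdaBoost can be sketched as follows. For brevity, the base learner here is a simple threshold stump rather than CART, SVM or BLS, labels are coded as -1/+1 instead of 0/1, and the toy data are hypothetical; the sketch shows the discrete AdaBoost update, which may differ in detail from the variant used in the study.

```python
import numpy as np

def stump_predict(X, j, t, s):
    """Threshold stump: s * (+1 if X[:, j] > t else -1)."""
    return s * np.where(X[:, j] > t, 1, -1)

def fit_stump(X, y, w):
    """Return (weighted error, feature, threshold, sign) of the best stump."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                err = np.sum(w[stump_predict(X, j, t, s) != y])
                if err < best[0]:
                    best = (err, j, t, s)
    return best

def adaboost_fit(X, y, n_rounds=3):
    w = np.full(y.size, 1.0 / y.size)           # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        err, j, t, s = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * np.log((1 - err) / err)   # weight of this base learner
        pred = stump_predict(X, j, t, s)
        w = w * np.exp(-alpha * y * pred)       # misclassified samples gain weight
        w /= w.sum()
        ensemble.append((alpha, j, t, s))
    return ensemble

def adaboost_predict(ensemble, X):
    # Linear superposition of the base learners' predictions.
    score = sum(a * stump_predict(X, j, t, s) for a, j, t, s in ensemble)
    return np.sign(score)

# Toy 1-D problem that no single stump can solve: positives lie in the middle.
X_toy = np.arange(10, dtype=float).reshape(-1, 1)
y_toy = np.where((X_toy[:, 0] >= 3) & (X_toy[:, 0] <= 6), 1, -1)
ensemble = adaboost_fit(X_toy, y_toy, n_rounds=3)
print(adaboost_predict(ensemble, X_toy))
```

In this toy case, each individual stump misclassifies some points, yet the weighted combination of three stumps classifies all ten points correctly.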

Selection of Landslide Conditioning Factors
The evaluation and selection of landslide conditioning factors are critical steps in evaluating landslide susceptibility [49]. Random Forest, one of the most commonly used methods in data mining and feature selection, can provide variable importance measurement scores when analyzing data [50]. The variable importance (VI) of each factor was obtained by Random Forest in this study. A VI value greater than 0 for a landslide conditioning factor indicated that the factor contributed to the prediction. Moreover, a higher VI value indicated a greater contribution of the factor. A factor with a VI value of 0 may have a negative impact on the prediction. VI can be obtained by summing, for each evaluation factor, the change in the Gini index before and after splitting over all nodes of the trees, and then normalizing the result.

Statistical Index-Based Evaluations
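When mapping landslide susceptibility, accurately evaluating the model's performance is crucial for ensuring the reliability of the resulting maps. In this study, we chose a set of statistically based evaluation metrics to compare the predictions of different models, including the positive and negative predictive rates, sensitivity, specificity and accuracy. The positive and negative predictive rates (PPR and NPR) represent the accuracy of predicting landslide and non-landslide events, respectively, based on the classification of the pixels. Sensitivity measures the proportion of landslide pixels that are classified correctly, while specificity measures the proportion of non-landslide pixels that are classified correctly. Accuracy represents the overall correctness of the resulting models for classifying both landslide and non-landslide pixels. They can be expressed by the following equations:

$$\text{Positive predictive rate} = \frac{TP}{TP + FP} \tag{6}$$

$$\text{Negative predictive rate} = \frac{TN}{TN + FN} \tag{7}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{8}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{9}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{10}$$

where TP (true positive) and TN (true negative) are the numbers of correctly classified landslide and non-landslide pixels, FP (false positive) is the number of non-landslide pixels classified as landslides, and FN (false negative) is the number of landslide pixels classified as non-landslides.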
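A minimal sketch of these five indices, computed from confusion-matrix counts, is given below. Here TP and TN denote correctly classified landslide and non-landslide pixels, and FP and FN denote misclassified non-landslide and landslide pixels. The function name `evaluate` and the counts in the usage example are hypothetical and are not taken from the study.

```python
def evaluate(tp, tn, fp, fn):
    """Return the statistical indices for one confusion matrix."""
    return {
        "ppr": tp / (tp + fp),           # positive predictive rate
        "npr": tn / (tn + fn),           # negative predictive rate
        "sensitivity": tp / (tp + fn),   # landslide pixels found
        "specificity": tn / (tn + fp),   # non-landslide pixels found
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


if __name__ == "__main__":
    # Hypothetical counts for a validation set of 70 pixels.
    for name, value in evaluate(tp=30, tn=28, fp=7, fn=5).items():
        print(f"{name}: {value:.3f}")
```

Because accuracy is a weighted average of sensitivity and specificity, it always lies between the two when all three are computed on the same dataset.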

Receiver Operating Characteristic (ROC)
The ROC is commonly used in evaluating landslide susceptibility and in other research areas. The ROC curve is obtained by plotting the false positive rate on the X-axis and the true positive rate on the Y-axis. The ROC curve reflects the change in the performance of a binary classifier when the threshold value is adjusted. The area under the ROC curve (AUC) is a measure of the model's reliability. The AUC can be calculated using the following formula, where P and N represent the number of landslides and the number of non-landslides, respectively.
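One common way to compute the AUC is the rank-based (Mann-Whitney) form: the fraction of the P × N landslide/non-landslide pairs in which the landslide sample receives the higher score, with ties counted as one half. The sketch below uses that standard definition, which is not necessarily the exact expression used by the authors; the function name `roc_auc` and the example scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Rank-based AUC; labels are 1 (landslide) or 0 (non-landslide)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0      # landslide ranked above non-landslide
            elif sp == sn:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))  # divide by P * N


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.1]   # hypothetical susceptibility scores
    print(roc_auc(labels, scores))  # 3 of the 4 pairs are ordered correctly
```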

Relationships between Landslides and the Related Factors
By calculating the frequency ratio (FR), the probability of landslides occurring within the different classes of each influencing factor can be obtained. FR values effectively indicate the correlation between landslides and the influencing factors. Table 1 displays the FR values for all influencing factors examined in this study. Values greater than 1 indicate a stronger correlation and a higher probability of a landslide occurring [51].

The FR values in the study area were higher than 1 for elevations of 941-1193 m and 1193-1416 m. This is because the high-elevation areas in the study area are generally covered by vegetation, while the low-elevation areas have a flat topography. Therefore, landslides are frequent in the ranges of 941-1193 m and 1193-1416 m. The FR values were 1.94 and 1.24 for distances to rivers of <500 m and 500-1000 m, respectively. When the distance to rivers was greater than 1000 m, the FR value was less than 1. The topographic changes caused by river erosion can affect the initiation of landslides [52]. Regarding the distance to roads, a maximum FR value of 1.37 was achieved when the distance to roads was less than 800 m, which suggests that the FR tends to decrease with distance from roads. This can be attributed to the destruction of the slope's stability caused by road construction [53].

For slope aspect, the FR value was greater than 1 for five directions: east, southeast, south, southwest and northwest. The smallest FR value of 0.55 was found for north-facing slopes, which indicated that they are less prone to landslides. Regarding SPI, the FR values were greater than 1 for the ranges of 1.5-2.5, 2.5-4.5 and >4.5, and reached a maximum value of 6.15 within the 2.5-4.5 range. For TWI, the FR values were greater than 1 for the categories of <5 and 5-7, which means that landslides occur more frequently in these areas. For NDVI, the FR values in the ranges of <0.15 and 0.15-0.23 were greater than 1. The minimum FR value of 0.14 was obtained for NDVI values greater than 0.36. This is mainly because vegetation can reinforce the soil by increasing cohesion to some extent [54]. Therefore, areas with dense vegetation are less prone to landslides.

For slope, gentle slopes do not develop enough gravity-driven shear stress to produce landslides, whereas steep slopes lack sufficient soil thickness, so landslides mostly occurred in the range of 3-13, where the FR value was greater than 1. Moreover, for plan curvature, the FR values for the ranges of −0.7 to −0.3 and −0.3 to 0.03 were greater than 1. The FR values of surface cutting depth and terrain relief were greater than 1 in the ranges of 3.5-9.5 and 7-28, respectively. The FR values of the terrain roughness index were greater than 1 in the range of 1.01-1.11. These findings are in accordance with previous results [55,56].
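The frequency ratio of a class is commonly computed as the share of landslides falling in that class divided by the share of the study area occupied by that class. The sketch below follows that standard definition; the function name `frequency_ratio` and the counts in the example are hypothetical and do not reproduce the values in Table 1.

```python
def frequency_ratio(landslides_in_class, pixels_in_class):
    """FR per class = (landslide share in class) / (area share of class)."""
    total_landslides = sum(landslides_in_class.values())
    total_pixels = sum(pixels_in_class.values())
    ratios = {}
    for name, pixels in pixels_in_class.items():
        landslide_share = landslides_in_class.get(name, 0) / total_landslides
        area_share = pixels / total_pixels
        ratios[name] = landslide_share / area_share
    return ratios


if __name__ == "__main__":
    # Hypothetical counts for a two-class "distance to rivers" factor.
    landslides = {"<500 m": 60, ">=500 m": 40}
    pixels = {"<500 m": 3000, ">=500 m": 7000}
    for name, fr in frequency_ratio(landslides, pixels).items():
        print(f"{name}: FR = {fr:.2f}")
```

In this toy example the first class holds 60% of the landslides on 30% of the area, so its FR is about 2, which marks it as more landslide-prone than average; the second class has an FR below 1.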

Selection of Landslide Conditioning Factors
The importance scores for all 12 landslide conditioning factors obtained by Random Forest are shown in Table 2. The importance of all landslide conditioning factors was greater than 0, indicating that they all contributed to the prediction. Elevation achieved the highest importance score of 0.219, followed by NDVI (VI = 0.198), TWI (VI = 0.103), curvature (VI = 0.099), distance to rivers (VI = 0.087), distance to roads (VI = 0.073), SPI (VI = 0.059), slope aspect (VI = 0.046), terrain roughness index (VI = 0.042), slope (VI = 0.023), surface cutting depth (VI = 0.023) and terrain relief (VI = 0.021). As all factors contributed to the prediction, all 12 factors were used for evaluating landslide susceptibility in this study.

After determining the landslide conditioning factors for predicting landslides, the next step was to create a landslide susceptibility map using the following three steps. Firstly, the study area was transformed into pixels, and the value of each conditioning factor was assigned to each pixel to serve as input for the prediction of landslides. Secondly, models were constructed using the prepared training data, and these models were applied to predict the landslide susceptibility for the entire study area. The results predicted by the models were considered to be the landslide susceptibility index (LSI). Lastly, the LSI values for the entire study area were input into ArcGIS to generate the landslide susceptibility maps.

In this study, three ensemble learning models were utilized for mapping landslide susceptibility: CART-AdaBoost (the adaptive boosting model with CART-based learners), SVM-AdaBoost (the adaptive boosting model with SVM-based learners) and BLS-AdaBoost (the adaptive boosting model with BLS-based learners). The LSI values predicted by the three integrated learning models were categorized into five classes (very low, low, moderate, high and very high) using the natural break method. The BLS-AdaBoost model resulted (Figure 7) in very low (20.5%), low (19.1%), moderate (19.9%), high (22.1%) and very high (18.5%) landslide susceptibility classes. The SVM-AdaBoost model resulted (Figure 8) in very low (18.3%), low (22.1%), moderate (19.3%), high (23.2%) and very high (17.1%) landslide susceptibility classes. The CART-AdaBoost model resulted (Figure 9) in very low (14.0%), low (32.6%), moderate (29.3%), high (15.8%) and very high (8.2%) landslide susceptibility classes.

Validation of the Landslide Susceptibility Maps
In this study, we evaluated the three models for predicting landslide pixels using statistical evaluation indices. The ensemble model based on BLS showed the best performance in predicting landslide pixels (sensitivity = 94.3%), followed by the ensemble models based on SVM (sensitivity = 91.4%) and CART (sensitivity = 88.6%). For predicting non-landslide pixels, the BLS-AdaBoost model (specificity = 93.3%) showed the best performance, followed by the SVM-AdaBoost model (specificity = 90.0%) and the CART-AdaBoost model (specificity = 87.1%). In terms of overall accuracy, the best performance was found for BLS-AdaBoost (accuracy = 87.1%), followed by SVM-AdaBoost (accuracy = 84.3%) and CART-AdaBoost (accuracy = 82.9%).
The results of the ROC-AUC analysis, as another evaluation method, are displayed in Figure 10. The ROC curves of the CART-AdaBoost, SVM-AdaBoost and BLS-AdaBoost models for the validation dataset are shown in green, red and blue, respectively. BLS-AdaBoost (AUC = 0.889) had the highest AUC value among the three ensemble models, followed by CART-AdaBoost (AUC = 0.873) and SVM-AdaBoost (AUC = 0.846).

Discussion
Mapping landslide susceptibility, as a means of preventing and managing landslides, provides important guidance for urban planning and for geological hazard prevention and control [57]. Various approaches have been used to assess landslide susceptibility. Decision trees, SVM and other methods are well established and are considered to be effective in both landslide prevention and other fields. CART, as a classical decision tree, is a straightforward technique that is easy to explain. However, it is overly simplistic for capturing the complexities of real-world scenarios [47] and it has limited applications in landslide susceptibility analyses. Nonetheless, improved trees that use CART as the base learner have been shown to be effective for solving complex problems [58]. SVM is a more powerful learning machine, and numerous improvements, such as FR-SVM, the fruit fly optimization algorithm and ensemble learning, have been proposed to enhance its performance [59][60][61]. The BLS is a simple and efficient system that was proposed by Chen and has been applied in several fields, but it has seen little use in landslide susceptibility analyses to date. The extremely fast training speed, the relatively high reliability and the good performance of the BLS and the integrated BLS in other areas indicate their potential for landslide applications. In this study, we applied the BLS and the integrated BLS model to evaluate landslide susceptibility, and we compared the performance of the integrated BLS with integrated learning systems that used CART and SVM as the base learners. The resulting landslide susceptibility maps provide decision-making guidance for disaster control, urban planning, etc., and may help researchers and decision-makers to reduce the impact of landslide hazards on people's lives.
In this study, a total of 12 influencing factors were used for predictions based on the results of the evaluated importance of the factors. All three integrated models performed well on the validation dataset, especially the integrated model with BLS as the base learner, which achieved the highest AUC value of 0.889, an improvement over the integrated models that used CART (AUC = 0.873) and SVM (AUC = 0.846) as the base learners. These findings indicate that the BLS and BLS-based integrated learning methods are effective for evaluating landslide susceptibility. Moreover, the integrated BLS-based model required only slightly more time than the BLS, and the ensemble BLS model can be readily implemented on a pre-trained BLS network [62].

Conclusions
As a type of geological disaster that causes great loss of human life, landslides are a major research focus worldwide. To evaluate landslide susceptibility, decision trees and SVM are commonly used, simple and efficient machine learning models. The AdaBoost algorithm has been proven to enhance the performance of decision trees and SVM for evaluating landslide susceptibility. Accordingly, integrated models that use decision trees or SVM as base learners have demonstrated strong performance in evaluating landslide susceptibility. In this study, we applied the BLS and proposed an integrated BLS-based learner, which outperformed the integrated learners based on the decision tree and SVM in terms of both statistically based evaluations and ROC curve-based evaluations. The results suggest that the ensemble BLS has considerable potential for analyzing landslide susceptibility, and the resulting landslide susceptibility maps may provide strong support for disaster prevention, construction planning and other relevant fields. The conclusions of this study can be summarized as follows:

(2) In total, 12 landslide impact factors were identified and assessed on the basis of their VI values. The most important impact factor was elevation, followed by NDVI, TWI, curvature, distance to rivers, distance to roads, SPI, slope aspect, terrain roughness index, slope, surface cutting depth and terrain relief.

(3) The three models (CART-AdaBoost, SVM-AdaBoost and BLS-AdaBoost) were evaluated and compared by statistical methods and AUC. All three methods had good results, but the ensemble BLS method proposed in this study outperformed the other two methods.

Figure 1. Workflow of the methodology in this study: (a) determining the landslide locations; (b) data processing; (c) model training; (d) model evaluation; (e) mapping landslide susceptibility.

Study Area and Data Used

Study Area
Taiyuan City is situated in the central region of Shanxi Province, northern China, covering approximately 6988 km². It spans between longitude 111°30′ and 113°09′ E, and latitude 37°27′ and 38°25′ N. Taiyuan experiences a warm temperate continental monsoon climate, typical of the interior of the continent, with an average annual precipitation of up to 390 mm. The Fen River and the Hu Yu River flow through Taiyuan City, forming the main surface water system. The landforms in Taiyuan are diverse and complex, with mountains, hills, plains, basins and valleys all present. The mountains cover 4528 km², accounting for 64.79% of the total area, and are mainly composed of Carboniferous and Permian sandstones, shales and Quaternary loess. The hills cover 904 km², accounting for 12.94%; the plains cover 1093 km², accounting for 15.64%; the basins cover 279 km², accounting for 3.99%; and the valleys cover 184 km², accounting for 2.63%. The terrain is undulating, with elevations ranging from 760 m to 2708 m. The mountainous area in the west of Taiyuan, in particular, consists mainly of Carboniferous and Permian sandstone, shale and Quaternary loess. The terrain is intricate, and the natural geological conditions are unfavorable. Combined with relatively frequent human activities such as mining, these conditions make landslides frequent in Taiyuan.

Figure 2. Map of the study area and distribution of landslide points.

The weight $W_h$ and the bias $\beta$ of the enhancement nodes are initialized randomly. Only the weight $W_o$ in the whole model needs to be obtained by training. The network typically selects the non-linear activation function $\xi$ as either $\mathrm{sig}(x)$ or $\tanh(x)$.
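To make the training scheme concrete, the following sketch builds a very small BLS-style network in plain Python: the feature-node and enhancement-node weights are drawn at random and kept fixed, and only the output weights are obtained by ridge-regularized least squares. Several choices here are assumptions rather than details from the study: the feature mapping is a plain random linear map (the original BLS typically refines it further), the activation is tanh, and the names `bls_features`, `bls_fit` and `bls_predict`, the node counts and the regularization value `lam` are hypothetical.

```python
import math
import random


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(row) for row in zip(*a)]


def solve(m, rhs):
    """Solve m @ w = rhs by Gaussian elimination with partial pivoting."""
    n, k = len(m), len(rhs[0])
    aug = [list(m[i]) + list(rhs[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            aug[r] = [v - f * p for v, p in zip(aug[r], aug[col])]
    sol = [[0.0] * k for _ in range(n)]
    for i in range(n - 1, -1, -1):  # back substitution
        for j in range(k):
            s = aug[i][n + j] - sum(aug[i][c] * sol[c][j] for c in range(i + 1, n))
            sol[i][j] = s / aug[i][i]
    return sol


def bls_features(x, w_e, b_e, w_h, b_h):
    """Return A = [Z | H]: feature nodes Z and enhancement nodes H."""
    z = [[v + b for v, b in zip(row, b_e)] for row in matmul(x, w_e)]
    h = [[math.tanh(v + b) for v, b in zip(row, b_h)] for row in matmul(z, w_h)]
    return [zr + hr for zr, hr in zip(z, h)]


def bls_fit(x, y, n_feature=3, n_enhance=5, lam=1e-3, seed=0):
    rng = random.Random(seed)

    def rand(rows, cols):
        return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

    # Random, untrained weights and biases for both node groups.
    params = (rand(len(x[0]), n_feature), rand(1, n_feature)[0],
              rand(n_feature, n_enhance), rand(1, n_enhance)[0])
    a = bls_features(x, *params)
    at = transpose(a)
    gram = matmul(at, a)
    for i in range(len(gram)):
        gram[i][i] += lam  # ridge term keeps the system well posed
    w_o = solve(gram, matmul(at, y))  # the only trained weights
    return params, w_o


def bls_predict(params, w_o, x):
    return matmul(bls_features(x, *params), w_o)


if __name__ == "__main__":
    x = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
    y = [[0.0], [1.0], [1.0], [0.0], [0.5]]  # hypothetical targets
    params, w_o = bls_fit(x, y)
    print(bls_predict(params, w_o, x))
```

Because the hidden weights are never updated, fitting reduces to solving one linear system, which is consistent with the fast training attributed to the BLS.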

Figure 5. Structural diagram of the BLS network.

Figure 6. Structure of the Adaptive Boosting ensemble model.

Table 1. The spatial relationships among landslide conditioning factors.

Table 2. Variable importance of the landslide conditioning factors using the Random Forest method.