Landslide Geo-Hazard Risk Mapping Using Logistic Regression Modeling in Guixi, Jiangxi, China

Abstract: Reliable prediction of landslide occurrence is important for hazard risk reduction and prevention. Taking Guixi in northeast Jiangxi as an example, this research aimed to conduct such a landslide risk assessment using a multiple logistic regression (MLR) algorithm. Field-investigated landslides and non-landslide sites were converted into polygons. We randomly generated 50,000 sampling points to intersect these polygons, and the intersected points were divided into two parts, a training set (TS) and a validation set (VS), in a ratio of 7 to 3. Thirteen geo-environmental factors, including elevation, slope, and distance from roads, were employed as hazard-causative factors, which were intersected by the TS to create the random point (RP)-based dataset. The next step was to compute the certainty factor (CF) of each factor to constitute a CF-based dataset. MLR was applied to the two datasets for landslide risk modeling. The probability of landslides was then calculated in each pixel, and risk maps were produced. The overall accuracy of these two models versus the VS was 91.5% and 90.4% with a Kappa coefficient of 0.814 and 0.782, respectively. The RP-based MLR modeling achieved more reliable predictions and its risk map seems more plausible for providing technical support for implementing disaster prevention measures in Guixi.


Introduction
Landslides are among the most frequent geo-hazards, especially in south China, and they have brought severe damage and losses to human society as well as to the environment [1][2][3]. According to incomplete statistics, from the beginning of the 21st century to the present, landslide hazards in China have caused as many as 1200 casualties and an estimated economic loss of 5-10 billion yuan, which further restricts socioeconomic development in the hazard-affected regions [2]. It is therefore essential to conduct accurate and reliable risk prediction for landslide prevention and early warning.
For this purpose, it is necessary to understand the physical process, or rather, the mechanism of landslides. In general, studied landslides are presumably triggered by precipitation, which may cause landslides when the rainfall is intense, soil resistance is relatively small, and the slope inclination is relatively high. The landslides cover a larger area if the slip length is longer [3][4][5][6][7]. However, the works of Zhang et al. [8], Ou et al. [9], and Zhou et al. [10] also revealed the important role of road construction and housing development in landslide events. Hence, it is critical to consider geological conditions, environmental conditions, and human activities in landslide risk assessment.
With the advent of the big data era, traditional geological methods have gradually become unable to meet current needs to deal with the more and more abundant big geodata. Artificial intelligence (AI) has long been used in scientific research but it was further promoted by machine learning (ML) and especially by deep learning (DL) techniques because of the capacity for processing hyper-dimensional data and for accurate prediction [11]. AI has not yet been widely applied in geoscience; it may therefore be of critical importance to develop and apply relevant algorithms or adapt the available algorithms to tackle big geodata for ore mineralization prediction or geo-hazard risk assessment to serve our society [12,13]. This will clearly be a meaningful exploratory experiment [13]. Landslide risk prediction is one of the important tasks for ML application in environmental geoscience.
As early as the 1970s, developed countries started to study the risk of geo-hazards, mainly landslides. In 1999, the United Nations General Assembly proposed an initiative on "the international strategy for disaster reduction" to reduce social vulnerability and the risk of natural disasters. In response to this initiative and based on previous research, countries in North America and Europe carried out studies on geo-hazard risk and its relationship with land use, making great progress in the domain of hazard mitigation and risk reduction [14]. For example, Lumb [15] analyzed the relationship between the rock-soil types of the slope mass and the occurrence of landslides. Collison et al. [16] suggested that slope stability is associated with vegetation coverage in tropical regions. Evans [17] revealed that rainfall distribution in mountainous areas is affected to a certain extent by the elevation of the slope. Mark et al. [18] used data on 1500 landslides to study the relationship between shallow landslide frequency and terrain, and concluded that the steeper the slope, the greater the probability of a landslide.
Studies have also been undertaken to determine the concrete landslide causative factors. For example, Jiang and Eastman [19] used genetic algorithms to optimize the hazard-causative factors in landslide susceptibility mapping; and in terms of information entropy, Wan et al. [20] harnessed a data mining technique to determine the key controlling factors and their thresholds to provoke landslide hazards. At present, there are in fact a number of methods for landslide risk zoning, e.g., analytic hierarchy process (AHP), Fuzzy comprehensive evaluation, information quantity, certainty factor (CF) approach, and multiple logistic regression (MLR) analysis [21][22][23][24]. However, there are problems with the single evaluation model, e.g., its limitation in objectively determining the weight of the causative factors, subjective disturbances to the processing of variables in the modeling process, and negligence of the auto-correlation of the causative factors. It is thus a challenge to objectively, quantitatively, and accurately conduct hazard risk assessment at a regional scale [25,26]. Machine learning algorithms have also been utilized for landslide research [8,10], and a series of models are available, such as support vector machines (SVMs) [27], artificial neural networks (ANNs) [28], random forests (RFs) [29], boosted regression trees (BRTs) [30,31], and limit learning machine (LLM) [32]. These algorithms have been reported to have high accuracy for landslide risk assessment, though they have clear disadvantages such as a complex modeling process, unstable model performance, and weak explanatory capability [33,34]. For example, it is difficult to select the penalty coefficient c and kernel function g for SVM models, and to determine the number of neuron nodes in the hidden layer of backpropagation (BP) neural networks, etc. [35]. 
At present, the selection of these parameters lacks a systematic method and largely relies on the experience of researchers, which brings great uncertainty and difficulty for machine learning modeling. The combination of the MLR method based on GIS and some quantitative evaluation methods may have great potential in geo-hazard risk zoning and assessment. For example, Xu et al. [36] evaluated the landslide susceptibility provoked by an earthquake based on GIS and certainty coefficient analysis, and Miao et al. [37] conducted the susceptibility analysis of Lanzhou landslide hazard based on GIS and a MLR model. Qian et al. [38] and Ou et al. [9] used a combination of binary MLR and an information value (IV) approach to study the geo-hazard risk of Renhe watershed and Jiangxi, respectively. It is seen that MLR has a high potential for this kind of risk assessment. For this reason, this paper intends to utilize MLR to establish the functional relation between the landslide hazard (probability) and its geo-environmental factors, and to produce a risk map to provide technical support for disaster reduction and prevention action taking Guixi as an example.

Study Area
Situated in the northeast of Jiangxi, Guixi is a county covering a surface area of 2292 km², extending between 116°57′43″E and 117°28′06″E and between 27°50′53″N and 28°37′33″N (Figure 1). Dominated by the humid subtropical monsoon climate, Guixi is characterized by a hot summer and mild winter, plentiful precipitation, and four distinct seasons. The mean annual temperature is 18.2 °C and the average annual precipitation is 1790 mm with a maximum of 2761.2 mm and a minimum of 1056.8 mm. The rainfall period is mainly concentrated in March to July, with an accumulated average of 1229 mm, accounting for 68.7% of the annual precipitation. However, it is less humid in the period from September to January. The terrain is complex in the study area, with an elevation ranging from 20 to 1504 m and a slope from 0 to 73°. Guixi is generally high in altitude in the south as a part of the Wuyishan Mountains (Mts), which are composed of granitic massif, and low in the north as a plain, with Red Basin and hills in the center. Xinjiang is the main river in the central part of Guixi flowing from east to west, 60 km of which pass through the study area. The basin encompasses more than 50 km² in surface and includes 11 tributaries, e.g., Baita and Luotang Rivers from the south, and Yingshi, Taqiao, and Sili Rivers from the north.
Landslides are the most widespread and harmful geo-hazard in Guixi. According to field investigation, they occur frequently in spring and summer, especially in June and July; they are mostly small in scale and distributed on both sides of the roads. Most of the landslides have a clear sliding surface, i.e., the soil-rock interface, and are shallow slides (<10 m, mostly <3 m in thickness).

Data
The selection of landslide causative factors based on field observation is the first key step in risk zoning and assessment. In combination with the results of other authors [8][9][10][39][40][41][42][43][44], the following geo-environmental factors were obtained and prepared in order to achieve our objectives: elevation, slope, aspect, proximity to roads, proximity to rivers, proximity to faults, lithology of strata, land cover, average annual precipitation, accumulated average precipitation of March-July, May-July, and March-June, and the normalized difference vegetation index (NDVI) standing for the condition of vegetation development. The sources of these data are listed in Table 1 and details of their quantification are presented in the following sections.
In the study area, 273 landslides were recorded based on the field survey, and they were divided randomly into two parts in a ratio of 7:3: a training set (TS, 191 landslides) and a validation set (VS, 82 landslides). Additionally, 514 non-landslide points (i.e., no risk) were identified and selected in areas where the slope is less than 3°, e.g., urban areas or cropland where there is a low probability of landslide occurrence. A total of 360 non-landslide points were added to the TS for the subsequent risk modeling and 154 points were incorporated into the VS for verification and validation purposes.
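The 7:3 partition described above can be sketched as follows. This is only an illustration: the integer IDs are placeholders for the surveyed sites, and the seed and the rounding rule are assumptions, not details given in the text.

```python
import random

def split_7_3(items, train_ratio=0.7, seed=42):
    """Shuffle a copy of `items` and split it into training and validation parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # seed is an arbitrary assumption
    n_train = round(len(shuffled) * train_ratio)
    return shuffled[:n_train], shuffled[n_train:]

# Placeholder IDs standing in for the 273 landslides and 514 non-landslide points.
landslide_ids = range(273)
non_landslide_ids = range(514)

ls_train, ls_valid = split_7_3(landslide_ids)
nl_train, nl_valid = split_7_3(non_landslide_ids)

print(len(ls_train), len(ls_valid))   # sizes of the landslide TS and VS
print(len(nl_train), len(nl_valid))   # sizes of the non-landslide TS and VS
```

With rounding to the nearest integer, the split sizes agree with the counts reported above (191/82 and 360/154).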

Geo-Environmental Factors
The geological and environmental factors involved in the study are presented in the following.

Elevation
Elevation is a fundamental characteristic of landform. A number of studies have revealed that there is an obvious relationship between geo-hazards and elevation [45][46][47][48][49][50], as the latter is an important indicator of the potential energy of a slope. The higher the elevation of the slope, the greater its potential energy.

Slope
Slope is a critical environmental factor that affects the occurrence of geo-hazards as it determines the possibility of slope mass displacement, or rather, soils in the weathered crust that have vertically different layers. The slope indirectly affects the relative stability between the layers and this determines, to a large extent, the mechanism of the incline deformation and formation of tension joints or fissures that make the slope massif unstable [51,52]. Slope may also influence rainwater permeation and water erosion. However, landslides may occur only when the slope reaches certain thresholds, e.g., 28-38° [53], in natural conditions. Generally, bank slopes of large rivers, lakes (reservoirs), seas, and ditches, open front hillsides, railways, highways, and slope-cutting for building, etc., are all prone to landslides (Figure 2b).

Aspect
Different aspects result in different exposure to sunlight, which brings about differences in vegetation coverage, evapotranspiration, soil erosion, and especially, a different weathering process. These will, in turn, influence the slope runoff, rainfall penetration, the physical and mechanical properties of slope mass, and finally, the stability of the incline [8,54]. Aspects are presented in Figure 2c.

Lithology
Stratigraphic lithology is an important factor for all geo-hazards. On the one hand, lithology determines the development of joints and fractures of rocks under stress or tension. On the other hand, if the lithological characteristics of a given slope are different from those of the underlying rocks, there will be a high potential to form an interbedding or unconformity surface favorable for landslides. Other conditions being equal, different lithologies may have different resistance to landslide events, e.g., rigid granitic massif would be more resistant than shale and mudstone. Instead of using AHP as in Roccati et al. [24], we followed Zhang et al. [8] in assigning a proneness weight to each category of lithology shown in Table 2. The assignment rule is that the higher the value, the higher the tendency of the lithology to landslides.

Proximity to Faults
The influence of the fault structure on the landslide mainly lies in that whether it is a normal or a reverse fault, a certain width of the fracture zone is generated on both sides of it. In areas where fractures or joint fissures are well developed, dense, weak, and vulnerable zones may be formed [37]. Thus, the nearer to the faults, the higher susceptibility to landslide; and the bigger the scale of the faults, the greater the impacts they exert. In this study, the distance to faults was calculated by a buffer operation within GIS, and the fault buffers were defined as 0-30 m, 30-60 m, 60-90 m, 90-120 m, and >120 m for small scale faults (<10 km in length), which were assigned a weight value of 13, 10, 7, 4, and 1, respectively, and 0-60 m, 60-120 m, 120-180 m, 180-240 m, and >240 m for bigger faults (≥10 km in length), which were assigned a weight value of 20, 15, 10, 5, and 1, respectively [8,10] (Figure 3c).
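As an illustration of the buffer-to-weight assignment described above, the following sketch maps a distance to a fault onto the listed weights. The buffer limits and weights are those given in the text; the function name and the treatment of distances that fall exactly on a buffer limit are assumptions.

```python
import bisect

# Upper limits (m) of the first four buffers and the five weights from the text.
SMALL_FAULT = ([30, 60, 90, 120], [13, 10, 7, 4, 1])     # faults < 10 km long
LARGE_FAULT = ([60, 120, 180, 240], [20, 15, 10, 5, 1])  # faults >= 10 km long

def fault_weight(distance_m, fault_length_km):
    """Return the proneness weight for a pixel at `distance_m` from a fault."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    limits, weights = LARGE_FAULT if fault_length_km >= 10 else SMALL_FAULT
    # Assumption: a distance equal to a limit belongs to the nearer buffer.
    return weights[bisect.bisect_left(limits, distance_m)]

print(fault_weight(45, 6))     # second buffer of a small fault
print(fault_weight(100, 25))   # second buffer of a large fault
print(fault_weight(500, 25))   # beyond the outermost buffer
```

The same lookup pattern would apply to the road buffers, given their own limits and weights.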

Proximity to Roads
Slope cutting and excavation for road construction and housing development destroyed the stability of the incline, provoking landslides [8,10,24]. In terms of our understanding about the impacts of road construction on landslides in the field, it was noted that distance plays a key role in such hazard events, i.e., the closer to the road, the higher the propensity of landslide. Thus, the distance to roads was regarded as a critical factor which was calculated by a buffering operation. The road buffers were defined with intervals of 0-30 m, 30-60 m, 60-90 m, and 90-120 m, and assigned, respectively, a weight value of 10, 8, 4, and 1 [8,10] (Figure 3a).

Precipitation
A large number of landslides show that precipitation is the first important cause of their occurrence. There are generally two cases: either sufficient precipitation duration or intensity [37]. The rainy season in southern China goes from March to July. During this period, there is usually a lot of rainfall and heavy periods of rainfall occur in June-July of each year, and landslide hazards frequently take place in these two months. Therefore, the annual mean precipitation, the accumulated average precipitation of March-July, March-June, and May-July of the past 20 years were obtained from 40 ground weather stations, and then calculated and interpolated with the inverse distance weighting (IDW) method into a grid. These precipitation layers were selected for our risk modeling to investigate which months' combination of rainfall would best reveal its role in the landslide events.
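For readers unfamiliar with IDW, a minimal sketch of the interpolation is given below. The power parameter of 2 and the station coordinates and rainfall values are illustrative assumptions; the text does not state the settings actually used.

```python
import math

def idw(stations, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    stations: iterable of (x, y, value) tuples, e.g. rainfall in mm.
    power:    distance exponent (2.0 is a common default, assumed here).
    """
    num = den = 0.0
    for sx, sy, value in stations:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value          # the point coincides with a station
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

def idw_grid(stations, xs, ys, power=2.0):
    """Interpolate onto a regular grid; returns rows ordered as `ys`."""
    return [[idw(stations, x, y, power) for x in xs] for y in ys]

# Two hypothetical stations with March-July rainfall totals (mm).
stations = [(0.0, 0.0, 1000.0), (10.0, 0.0, 2000.0)]
grid = idw_grid(stations, xs=[0.0, 2.0, 5.0], ys=[0.0])
print(grid)
```

Nearer stations dominate the estimate, so a cell midway between two stations receives the mean of their values.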

Land Cover Types
Land use refers to all activities that humans impose on land for survival and development. This can be not only production activities such as planting crops, trees, pasture, construction of factories, roads, etc., but also non-production activities such as construction of scenic tourist areas and nature reserves. With increasing demand from human beings, land use has become more and more intensive, resulting in a decline in woodlands/forests and grasslands in some regions or countries [55]. Such land use and change may exert a significant impact on soil erosion, especially when there is concentrated rainfall [55,56]. The development of slash-and-burn agriculture makes land prone to landslide hazards. There are at present six land cover types in the study area, namely farmlands, water bodies (rivers and lakes), built-up areas, forests, shrubs, and bare land, and these were allocated a proclivity weight of 0, 0, 0, 1, 4, and 10, respectively [8,10] (Figure 2a). Among the categories with a non-zero weight, forest cover has the least proclivity while bare land without vegetation protection is the most vulnerable category when the other conditions are the same.
The land cover map was produced from Landsat TM images dated 31 May (Spring) and 7 November 2010 (Autumn) based on the approach proposed by Wu et al. [57].

NDVI
The NDVI (normalized difference vegetation index) reveals information about vegetation cover, e.g., vegetation vigor, density, biomass, and its root system. Roots are able to hold soil and stabilize the hillside mass, and reduce the impact of precipitation on the soil. The level of vegetation coverage, in particular forest cover, reflects the degree of vegetation protection of the land surface. Under the same conditions, areas with greater vegetation coverage are less prone to landslide hazards.
The NDVI of the study area was generated from Landsat 5 TM images acquired in late autumn (late October and early November) in the five-year period 2005-2010. After atmospheric correction using the COST model [55,58], an averaged five-year NDVI was produced to represent the natural vegetation (e.g., forests, woodlands, and shrublands) as croplands were harvested and herbaceous vegetation became withered at that time. This mean NDVI image ranges from −0.297 to 0.823 (Figure 2d).
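The NDVI computation and multi-year averaging can be sketched per pixel as below. For Landsat 5 TM, the near-infrared and red reflectances correspond to bands 4 and 3; the reflectance values in the example are invented for illustration, and returning NaN for a zero denominator is an assumption.

```python
def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red) for one pixel."""
    total = nir + red
    if total == 0:
        return float("nan")   # assumption: undefined pixels are flagged as NaN
    return (nir - red) / total

def mean_ndvi(yearly_pairs):
    """Average the NDVI of one pixel over several acquisitions."""
    values = [ndvi(nir, red) for nir, red in yearly_pairs]
    return sum(values) / len(values)

# Hypothetical (NIR, red) surface reflectances of a forest pixel in two autumns.
print(mean_ndvi([(0.40, 0.10), (0.50, 0.10)]))
```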

MLR Model
Based on the linear regression in which the dependent variable is binomial, the MLR model applies a logistic function to solve the nonlinear problem, and this conversion has made MLR highly successful in the field of machine learning. For this reason, it has been increasingly drawing the attention of geographers, biologists, and environmentalists in recent decades thanks to its ability to analyze spatially explicit causes and predict the potential changes and events in the ecosystem [55]. It has hence been widely applied in deforestation [59,60], land use change [61][62][63], and landslide prediction [64]. Within MLR, independent variables can be continuous or discrete and do not necessarily meet the requirement of normal distribution. Supposing that the probability of occurrence of geo-hazards is $P$, and the probability of non-occurrence is $1 - P$, then $\ln[P/(1 - P)]$ is taken as the dependent variable, the hazard-causative factors $x_i$ ($i = 1, 2, 3, \ldots, n$) as the independent variables, and the MLR model can be expressed as a linear regression in Equation (1):
$$\ln\left(\frac{P}{1 - P}\right) = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n \qquad (1)$$
The probability of landslide occurrence can be computed with Equation (2):
$$P = \frac{\exp(\alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n)}{1 + \exp(\alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n)} \qquad (2)$$
where $P$ is the calculated probability of landslide occurrence, $x_i$ ($i = 1, 2, 3, \ldots, n$) are the geo-environmental factors as independent predictive variables, $\alpha_i$ ($i = 1, 2, 3, \ldots, n$) are the coefficients of these predictive variables, and $\alpha_0$ is a constant.
After calculation of the probability of landslide occurrence, the landslide risk zoning map can be created.
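As a sketch of how a pixel's probability follows from a fitted model, the function below evaluates the logistic form in which the log-odds ln[P/(1 − P)] is a linear combination of the factors. The coefficient and factor values in the example are hypothetical and are not the fitted values of this study.

```python
import math

def landslide_probability(alpha0, alphas, factors):
    """P = exp(z) / (1 + exp(z)) with z = alpha0 + sum(alpha_i * x_i)."""
    if len(alphas) != len(factors):
        raise ValueError("one coefficient is needed per factor")
    z = alpha0 + sum(a * x for a, x in zip(alphas, factors))
    # Two algebraically equal forms, chosen to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Hypothetical constant and coefficients for (road weight, slope in degrees).
p = landslide_probability(-3.0, [0.25, 0.08], [8, 15])
print(round(p, 3))
print(math.log(p / (1 - p)))   # recovers the linear predictor z
```

Applying this function to every pixel of the stacked factor layers yields the probability surface from which the risk map is derived.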

Modeling Procedures
For landslide risk zoning, it is necessary to determine the potential role of the geo-environmental factors in the landslide event, in other words, to understand which factors have a greater effect on this hazard occurrence than others. In this research, the prediction and assessment of landslide hazards in Guixi were conducted using two MLR models, i.e., the CF-based and RP-based MLR models.

CF Calculation
Proposed by Shortliffe and Buchanan for certainty reasoning in medical diagnosis in 1975 [65], the CF was redefined and improved by Heckerman in 1986 [66]. It is a probability function [67], which is used to examine the tendency of factors governing the occurrence of an event. As the evaluation process is relatively simple and accurate, it is widely used in geo-hazard risk assessment [68][69][70][71][72]. The premise is to assume that the geo-hazards that have occurred and those that will probably take place in the future are under the same geological and environmental conditions. The calculation formula is shown in Equation (3):
$$CF = \begin{cases} \dfrac{PP_a - PP_s}{PP_a\,(1 - PP_s)}, & PP_a \ge PP_s \\[2ex] \dfrac{PP_a - PP_s}{PP_s\,(1 - PP_a)}, & PP_a < PP_s \end{cases} \qquad (3)$$
where $PP_a$ is the conditional probability, or rather, the ratio of the area or number of landslide hazards appearing in subset $a$ to the area of subset $a$. $PP_s$ is the probability of occurrence of the whole event, and in our case, it can be represented as the ratio of the landslide area or number occurring in the whole study area to the surface area of the study area.
According to Equation (3), the CF varies from −1 to 1. The closer the CF is to 1, the greater the possibility of a landslide hazard; the closer it is to −1, the smaller that possibility. When CF is 0, it indicates the same probability of landslide occurrence and non-occurrence.
Superimposing the training samples (landslides and non-landslides) on each factor layer that has been divided into subsets, the CF approach was applied to calculate the CF value of each subset of the 13 geo-environmental factors. The CF-based dataset was hence produced and the areas and numbers of the subsets of a part of the factors are presented in Table 3.
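A minimal sketch of the CF calculation for one subset is given below, assuming the piecewise form commonly used in the landslide literature (difference between the subset probability PPa and the prior PPs, normalized differently depending on which is larger). The probabilities in the example are invented, and returning 0 when the denominator vanishes is an assumption.

```python
def certainty_factor(ppa, pps):
    """Certainty factor of a factor subset.

    ppa: landslide density in the subset (conditional probability).
    pps: landslide density in the whole study area (prior probability).
    """
    if ppa >= pps:
        denom = ppa * (1.0 - pps)
    else:
        denom = pps * (1.0 - ppa)
    if denom == 0.0:
        return 0.0            # assumption: no information when undefined
    return (ppa - pps) / denom

# Hypothetical densities: a subset twice as landslide-prone as the whole area,
# and one half as prone.
print(certainty_factor(0.02, 0.01))
print(certainty_factor(0.005, 0.01))
```

A subset with no recorded landslides yields −1, and a subset whose density equals the prior yields 0.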

RP Resampling
Within GIS, 50,000 RPs were generated and used to intersect samples (i.e., landslide event polygons for training), and 1858 points were retained in the polygons (0 or 1) in the dependent variable layer, in which 589 points fell in the landslide areas and 1269 points in the no-risk areas. These points were superimposed and intersected with each geo-environmental factor layer to obtain the values corresponding to the landslides and no-risk areas. Thus, an integrated dataset including the training set, i.e., the RP-based dataset, was produced.
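The random-point resampling can be illustrated outside a GIS with a ray-casting point-in-polygon test. This is a sketch under simplifying assumptions: the two square polygons, the bounding box, and the seed are invented, and real landslide polygons would come from the field survey.

```python
import random

def point_in_polygon(x, y, polygon):
    """Ray-casting test; `polygon` is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                      # edge crosses the ray's level
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def sample_points(n_points, bbox, labelled_polygons, seed=0):
    """Keep the random points that fall in a polygon, tagged with its label."""
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bbox
    kept = []
    for _ in range(n_points):
        x, y = rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)
        for label, polygon in labelled_polygons:
            if point_in_polygon(x, y, polygon):
                kept.append((x, y, label))
                break
    return kept

# Hypothetical polygons: label 1 = landslide, label 0 = no-risk area.
polygons = [(1, [(0, 0), (1, 0), (1, 1), (0, 1)]),
            (0, [(5, 5), (7, 5), (7, 7), (5, 7)])]
kept = sample_points(50_000, (0, 0, 10, 10), polygons)
print(len(kept), sum(label for _, _, label in kept))
```

Each retained point would then be used to read the value of every factor layer at its location.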

MLR Modeling
The RP-based and CF-based datasets were input into SPSS 20.0 for binary MLR analysis, where landslide events were treated as the dependent variable and the geo-environmental factors as independent (predictive) variables. The first and second landslide risk models, i.e., RP-based MLR-1 and CF-based MLR-2, were thus obtained. The regression coefficient of each factor was considered as its contribution weight to the landslide event.
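To illustrate what the binary MLR fit estimates, the sketch below fits the constant and coefficients by plain gradient ascent on the log-likelihood. It is a stand-in for explanation only and is not the estimation procedure of SPSS; the toy samples, learning rate, and iteration count are assumptions.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(coef, row):
    """Probability for one sample; coef = [alpha0, alpha1, ..., alphan]."""
    return sigmoid(coef[0] + sum(c * v for c, v in zip(coef[1:], row)))

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Gradient ascent on the mean log-likelihood of a binary logistic model."""
    n, k = len(X), len(X[0])
    coef = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for row, target in zip(X, y):
            err = target - predict(coef, row)
            grad[0] += err
            for j, v in enumerate(row):
                grad[j + 1] += err * v
        coef = [c + lr * g / n for c, g in zip(coef, grad)]
    return coef

# Toy samples: two factors rescaled to [0, 1]; 1 = landslide, 0 = no risk.
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.4], [0.4, 0.3],
     [0.6, 0.7], [0.7, 0.5], [0.8, 0.9], [0.9, 0.8]]
y = [0, 0, 0, 1, 0, 1, 1, 1]
coef = fit_logistic(X, y)
print([round(c, 2) for c in coef])
```

A positive coefficient raises the predicted probability as the factor increases, which is why the coefficients can be read as contribution weights.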

Risk Mapping
After modeling, the two models were applied back to the predictive variables to calculate the landslide hazard risk of the whole study area. Landslide risk zoning was conducted based on the calculated probability, which was further graded into different risk levels, e.g., when the probability is 0-0.2, the zone is considered stable (no risk), 0.2-0.4 low, 0.4-0.6 moderate, 0.6-0.8 high, and 0.8-1.0 extremely high risk.
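The grading of probabilities into risk levels can be sketched as a simple lookup. The class limits are those listed above; assigning a probability that falls exactly on a limit to the higher class is an assumption.

```python
import bisect

LIMITS = [0.2, 0.4, 0.6, 0.8]
LEVELS = ["no risk", "low", "moderate", "high", "extremely high"]

def risk_level(probability):
    """Map a landslide probability in [0, 1] to one of the five risk levels."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # bisect_right places a value equal to a limit in the higher class (assumption).
    return LEVELS[bisect.bisect_right(LIMITS, probability)]

for p in (0.05, 0.35, 0.5, 0.75, 0.95):
    print(p, risk_level(p))
```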

Verification and Reliability Analysis
Overall accuracy (OA) and the Kappa coefficient (KC) [8,11,57,73,74] were the metrics used to evaluate the accuracy and reliability of the landslide risk models produced by MLR modeling using the RP- and CF-based datasets against the VS. Following the interpretation of Landis and Koch [74], a KC of 0.81-1.00 indicates "almost perfect" prediction reliability [8].
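Both metrics follow from a two-class confusion matrix, as the sketch below shows. The counts in the example are hypothetical: they were chosen only to sum to the 82 landslide and 154 non-landslide validation samples and are not taken from the paper's results.

```python
def oa_and_kappa(tp, fn, fp, tn):
    """Overall accuracy and Cohen's Kappa from a 2x2 confusion matrix.

    tp/fn: observed landslides predicted as landslide / non-landslide.
    fp/tn: observed non-landslides predicted as landslide / non-landslide.
    """
    n = tp + fn + fp + tn
    po = (tp + tn) / n                                   # observed agreement
    pe = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2  # chance agreement
    return po, (po - pe) / (1.0 - pe)

# Hypothetical counts for a validation set of 82 landslides and 154 non-landslides.
oa, kappa = oa_and_kappa(tp=73, fn=9, fp=11, tn=143)
print(round(oa, 3), round(kappa, 3))
```

Kappa is lower than OA because it discounts the agreement expected by chance given the class proportions.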

Multicollinearity Diagnosis
Collinearity means that two predictive variables are linearly correlated with each other in a regression model; if more than two variables are involved, it is called multicollinearity. When it is severe, the regression estimates become unreliable because the independent variables no longer carry independent information about the dependent variable. MLR modeling is susceptible to such collinearities among the predictive variables. Tolerance (TOL) and variance inflation factor (VIF) are two criteria for diagnosing multicollinearity [75,76]. A TOL < 0.1 or a VIF > 10 indicates strong multicollinearity among the independent variables [77].
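For reference, the two diagnostics are linked through the coefficient of determination $R_j^2$ obtained by regressing predictor $j$ on all the other predictors. These are the standard definitions; the notation $R_j^2$ is introduced here for illustration and does not appear in the original text.

```latex
\mathrm{TOL}_j = 1 - R_j^2,
\qquad
\mathrm{VIF}_j = \frac{1}{\mathrm{TOL}_j} = \frac{1}{1 - R_j^2}
% Example: R_j^2 = 0.90 gives TOL_j = 0.10 and VIF_j = 10,
% so the two thresholds quoted above describe the same condition.
```

Because VIF is the reciprocal of TOL, a predictor that fails one criterion fails the other as well.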

Results
The intermediate processing and final results are presented in Tables 4-7 and Figure 4 in this section for further discussion.

Note: β, regression coefficient of each predictive factor in the model; SE, standard error; Wald, Wald chi-square value; df, degrees of freedom; Sig, significance.

Table 6. Overall accuracy (OA) and Kappa coefficient (KC) of the two MLR models versus the validation set (VS).

Table 7. Share of field-observed landslides in each predicted risk zone.
Risk level             MLR-1     MLR-2
No risk                4.76%     6.60%
Low risk               3.66%     1.83%
Moderate risk          4.76%     2.93%
High risk              9.89%     6.60%
Extremely high risk    76.92%    82.05%

The two MLR models are presented in Tables 4 and 5, respectively, where the estimated coefficients of the predictive factors (independent variables) are demonstrated. These models were utilized to compute the probability of landslide risk for the whole study area. The observation accuracy of the two modeling results is illustrated in Table 6. The two obtained models MLR-1 and MLR-2 are expressed in Equations (4) and (5).
It is worth mentioning that, from the coefficients in Tables 4 and 5, the odds ratio (OR) [55,62] of each geo-environmental factor can be calculated as exp(β), which allows the likelihood contribution of one factor to the hazard event to be compared with that of another.
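The conversion from regression coefficients to odds ratios can be sketched as below. The factor names and β values are hypothetical placeholders, not the coefficients of Tables 4 and 5.

```python
import math

def odds_ratios(betas):
    """OR = exp(beta): multiplicative change in the odds per unit of a factor."""
    return {name: math.exp(beta) for name, beta in betas.items()}

# Hypothetical coefficients for three factors.
hypothetical_betas = {"proximity_to_roads": 0.40, "slope": 0.10, "ndvi": -0.25}
ratios = odds_ratios(hypothetical_betas)
for name, ratio in ratios.items():
    print(name, round(ratio, 3))
```

An OR above 1 means that the odds of a landslide increase with the factor, and an OR below 1 means that they decrease.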
Evaluated versus the VS, the OA and KC are 91.5% and 0.814 for MLR-1 and 90.4% and 0.782 for MLR-2 (Table 6). Based on the interpretation of Landis and Koch [74], the modeling result of MLR-1 reaches "almost perfect" agreement while that of MLR-2 is "substantial", though the observation accuracy of the latter appears to be slightly higher as more field-observed landslides are located in the predicted high and extremely high risk zones (Table 7 and Figure 4).

Collinearity
The results of the collinearity diagnosis are demonstrated in Table 8. For model MLR-1, all rainfall factors had a TOL < 0.1 and a VIF > 10, which means that they are strongly correlated with one another and that it is not necessary to use all of them for MLR modeling. However, for model MLR-2, the TOL was >0.1 with VIF < 10, indicating that there was no serious multicollinearity problem among the independent factors for CF-based MLR modeling.
The two maps show that the areas of Guixi susceptible to landslide hazard are largely spread alongside the roads. The high risk and extremely high risk areas are mainly distributed in the towns of Wenfang, Jintun, and Zhangping in the south and Baitian in the north of the study area. The central part of Guixi is a relatively flat plain with a low risk of landslides.
As seen in Tables 4 and 5, the regression coefficients of the geo-environmental factors obtained from MLR analysis indicate their order of importance. For example, the importance of roads, slope, and rainfall is well illustrated in both MLR-1 and MLR-2. Road construction and housing development destroy the original stability of the slope massif, triggering the downward slide of the overlying geological strata and the weathered crust along the bedding surface or the downhill slide of the slope massif on the unconformity surface with the underlying rocks. Hence, these human activities form a key part of the landslide disaster. Precipitation is considered the most important triggering factor of this kind of disaster, especially in south China.

Digitization and Weight Assignment
In order to model and predict the landslide risk in Guixi by the MLR algorithm, we employed 13 geological and environmental factors either as continuous or discrete variables, together with field observation data. The non-numeric factors (e.g., geological strata, faults, roads, and rivers) were first digitized and then allocated a propensity weight, either per lithological category or per proximity buffer around the linear features. This processing made data-driven approaches possible and successful in the landslide risk modeling of this study.

Importance of the Variables in Landslide Events
As revealed in Tables 4 and 5, the three most important geo-environmental factors of Model 1 (MLR-1) are the proximity to roads, the accumulated average precipitation of March-July, and the slope, and those of Model 2 (MLR-2) are the slope, the proximity to roads, and the accumulated average precipitation of March-June. Both models show a broadly similar correlation between landslide events and the hazard-causative factors such as roads, slope, and accumulated average rainfall of March-June or March-July. The accuracy of the RP-based Model 1 seems to be higher than that of the CF-based Model 2 (Table 6). As already mentioned, most of the observed landslides were distributed along the roads or behind the houses, and resulted directly from human activities, i.e., road construction and housing development. This is the same as what we discovered in Jiangxi [8][9][10].
Collinearity analysis reveals that, for RP-based MLR modeling, one of the rainfall-related factors is enough, e.g., annual average precipitation or accumulated average precipitation of March-July or May-July. As demonstrated in Table 4, the accumulated average precipitation of March-July is more important than the other rainfall factors, and hence can be selected for final risk mapping.

Lowering the Threshold of Slope
We noted that 234 landslides, taking up 85.9% of the total, took place in zones with a slope ranging from 3° to 21°. Yet, Fan et al. [53] suggested a sliding threshold of 28°-38° for natural landslides. This means that the threshold has been lowered by the intervention of human activities. This finding is also supported by other parallel studies [8][9][10] in the same province.

Comparison with the Results of Others
Another study [8] employed an RF algorithm to predict the landslide risk in Guixi and achieved the mapping work with an OA of 91.23% and a KC of 0.82. These values are close to those of Model 1 of this study, indicating that RP-based MLR modeling and prediction can achieve results as satisfactory as those of the RF algorithm. Ou et al. [9] conducted a comprehensive regional-scale study on landslide hazard mapping in Jiangxi using MLR based on an information value (IV) approach and achieved an accuracy of 86%. This is also comparable to what we have achieved. In future work, a comparative analysis is to be conducted using SVM and BRT algorithms, as [31] reported that these algorithms may lead to relevant risk prediction and evaluation.

Conclusions
The prediction of landslide geo-hazards is a complex nonlinear process. This study has not only indicated ways of processing geo-environmental data, especially non-digital data, for quantitative analysis, but has also shown that even the same modeling algorithm may reach different results if the sampling scheme is different. It is therefore necessary to select a reasonable sampling scheme to achieve reliable risk prediction. The RP-based MLR approach seems more reasonable than the CF-based one and is recommended for extension to similar landslide hazard prediction elsewhere. The prediction and its results may provide technical support and relevant advice to decision-makers for disaster reduction and prevention management in the study area.
This study also illustrates the unfavorable role of anthropogenic activities in causing such geological disasters, confirming the results of Zhang et al., Ou et al. and Zhou et al. [8][9][10]. Man exploits natural resources, modifies the land surface, and, at the same time, directly breaks the original balance of the landscape. This is the origin of the artificial landslides that are related to the road system development and urbanization. Apart from a better and more comprehensive geological survey before road construction and housing development, man has to think of optimal land use planning to minimize the man-made disasters.

Data Availability Statement:
The data that support the findings of this study are available from the corresponding author, Weicheng Wu, upon reasonable request.