Effects of Seismogenic Faults on the Predictive Mapping of Probability to Earthquake-Triggered Landslides

Abstract: The seismogenic fault is crucial for the spatial prediction of co-seismic landslides, e.g., in logistic regression (LR) analysis that considers influence factors. On one hand, earthquake-induced landslides are usually densely distributed along the seismogenic fault; on the other hand, different sections of the seismogenic fault may have distinct landslide-triggering capabilities because of their different mechanical properties. However, how the features of a fault influence the mapping of landslide occurrence probability remains unclear. Relying on the landslide data of the 2013 Lushan, China Mw 6.6 earthquake, this study attempted to further address this issue. We quantified the seismogenic fault effects on landslides in three modes: the distance effect, the effect of different fault segments, and the combined effect of the two. Four possible cases were taken into consideration: zoning the study area vertical and parallel to the fault (case 1), zoning the study area only vertical to the fault (case 2), zoning the study area only parallel to the fault (case 3), and without such study-area zonations (case 4). Using the LR model, predictive landslide probability maps were prepared for these four cases. The model also fully considered other influencing factors of earthquake landslides, including elevation, slope, aspect, topographic wetness index (TWI), peak ground acceleration (PGA), lithology, rainfall, distance from the epicenter, distance from the road, and distance from the river. Then, cross-comparisons and validations were conducted on these maps. For the training datasets, the results show that the success rates of earthquake-triggered landslides for the first three cases were 85.1%, 84.2%, and 84.7%, respectively, while that of the model for case 4 was only 84%. For the testing datasets, the prediction rates of the four LR models were 84.45%, 83.46%, 84.22%, and 83.61%, respectively, as indicated by comparing the test dataset and the landslide probability map. This means that the effects of the seismogenic fault, which are represented by study-area zonations vertical and parallel to the fault proper, are significant to the predictive mapping of earthquake-induced landslides.


Introduction
Large earthquakes can trigger a large number of co-seismic landslides, especially in mountainous areas, and the damage caused by co-seismic landslides is much higher than that caused by the earthquake itself [1]. Many studies have been conducted on several aspects of earthquake-induced landslides, such as landslide inventories [2][3][4][5], the spatial distribution of landslides [6,7], and landslide susceptibility assessment [8][9][10]. Among them, landslide susceptibility mapping is one of the most important tasks for understanding the patterns of earthquake-triggered landslides and predicting where co-seismic landslides are prone to occur, which is of great significance for disaster prevention and mitigation [11]. To date, many methods have been applied to landslide susceptibility assessment; they can be classified into qualitative approaches based on expert experience [12] and quantitative approaches based on statistical analysis [13,14]. The statistical analysis method is widely used for landslide susceptibility assessment because of its objective evaluation results and high prediction accuracy. It should be noted that different statistical analysis methods have their own advantages and limitations [10][15][16][17]. Some studies [18,19] have shown that the logistic regression model is relatively effective in regional landslide prediction, and it has been verified in many earthquake events [20][21][22].
It is widely accepted that the distribution pattern of earthquake-triggered landslides is controlled by many natural influence factors [13,23], among which the effect of the seismogenic fault is particularly prominent. For instance, observations have shown that co-seismic landslides tend to be distributed along seismogenic faults [24]. Thus, this effect should be fully considered in the predictive mapping of seismic landslides. Considering the distance to the seismogenic fault, Kamp et al. used the multicriteria evaluation method to map landslide susceptibility in the area affected by the 2005 Kashmir earthquake, assuming varied effects of segments along a seismogenic fault [25]. Xu et al. conducted a landslide susceptibility assessment with a combination of eight influence factors using the weight index method. Their results showed that the prediction rates of the weight index modeling exceeded 90% when the factors of the seismogenic fault were included [21]. Investigations demonstrated significantly different distribution patterns of landslides along different sections of the causative fault of the 2010 Haiti earthquake [26]. Gorum et al. suggested that the spatial pattern of co-seismic landslides caused by the 2008 Wenchuan M7.9 earthquake was influenced by the style of faulting of the Yingxiu-Beichuan fault, which ruptured in that event, implying a segmentation effect of the fault [27]. Despite all this, studies on this issue remain relatively few, and most of them considered only the distance vertical to the fault.
Taking the 2013 Lushan earthquake as an example, this work attempted to quantify the fault effects on the predictive mapping of co-seismic landslides using the logistic regression (LR) model. In the analysis, 12 landslide influence factors were chosen, with a focus on the effect of the seismogenic fault, which is represented by relative positions obtained by zoning the study area vertical and parallel to the fault. Four scenarios were considered: zoning the study area vertical to the fault, zoning the study area parallel to the fault, both combined, and neither. LR analysis of these scenarios yielded four predictive maps of landslide occurrence probability. Then, the success rates of the predictions were calculated and compared among these maps.

Brief Description of the 2013 Lushan Earthquake
The 20 April 2013 Lushan, Sichuan, China M6.6 earthquake occurred on the southern section of the Longmen Shan fault zone, which is the boundary between the Tibetan plateau to the west and the Sichuan basin to the east (Figure 1). Its epicenter was located southwest of the 2008 Wenchuan M7.9 event that ruptured this fault zone. The Longmen Shan area is covered with the Late Triassic, which is underlain by a complex lithologic combination, including the Paleozoic passive edge sequence, the Proterozoic granite block, slight exposures of insufficiently consolidated Cenozoic sediments, and the thick Triassic-Eocene foreland basin succession [28,29]. The study area is crisscrossed by mountains and has complex landforms. The terrain is characterized by high elevations in the north, west, and south, and generally low terrain in the east. The elevation ranges from 213 to 7143 m, with an average of 2851.06 m.
Field investigations showed that this major shock did not produce any surface rupture; only secondary ground breaks were visible on the ground. The relocated aftershocks, focal mechanism solutions, and distribution of seismic intensities suggest that its causative structure is likely a hidden reverse fault that strikes NE and dips NW at an angle of about 38 degrees. The upper tip of this fault lies at a depth of about 9 km, above which deformation can be explained by a fault-propagation anticline model [30].

Method and Data
This work used the LR model to carry out a quantitative analysis of the effects of the seismogenic fault on landslide prediction. To map the landslide probability in the study area, the work was divided into four parts: (1) database construction, which included a landslide inventory of the 2013 Lushan earthquake and an influence factor database covering three aspects, i.e., topography, geology, and seismology; (2) model construction: in the LR analysis, four cases were assumed: with the effect of the distance vertical to the fault, with the effect of the distance parallel to the fault, with the combined effect of both, and without such fault effects. Combined with the other 10 influence factors, the LR analysis yielded predictive landslide probability maps; (3) sampling strategy: selecting samples for training and testing the LR models; and (4) model validation: the landslide occurrence probability maps were compared, and AUC values were used to evaluate the accuracy of the four maps. Figure 2 shows the flow chart of these efforts.

Logistic Regression Model
LR is a useful tool for analyzing the probability of landslide occurrence, in which the dependent variable is categorical (e.g., presence or absence) and the explanatory (independent) variables are categorical, numerical or both [32].
An LR model usually defines a two-class nominal variable, such as untriggered and triggered landslides, coded as 0 and 1, respectively. The relationship between the probability of landslide occurrence and the independent variables can be written as:

$$z = a + \sum_{j=1}^{n} b_j x_j \tag{1}$$

$$f(z) = \frac{1}{1 + e^{-z}} \tag{2}$$

where f(z) is the probability of landslide occurrence, i.e., when z tends to positive infinity, f(z) tends to 1, and when z tends to negative infinity, f(z) tends to 0; a is a constant (the intercept of the model), n is the number of independent variables, b_j (j = 1, 2, 3, ..., n) are the regression coefficients of the model, and x_j (j = 1, 2, 3, ..., n) are the independent variables (i.e., the influence factors of landslides). The values of these independent variables were rasterized and derived using GIS technology. For each sample point (the samples selected in Section 3.3), we extracted the values at its location from the 10 m × 10 m rasters derived from the influence factor maps and recorded them in the attribute table of the point feature class.
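As a minimal sketch of how the linear predictor z and the logistic function f(z) combine for a single sample, consider the following. The intercept, coefficients, and factor values are illustrative placeholders only; they are not the coefficients fitted in this study, and the function names are hypothetical.

```python
import math

def linear_predictor(intercept, coefficients, values):
    """z = a + sum_j b_j * x_j for one sample."""
    if len(coefficients) != len(values):
        raise ValueError("coefficients and values must have the same length")
    return intercept + sum(b * x for b, x in zip(coefficients, values))

def landslide_probability(z):
    """f(z) = 1 / (1 + exp(-z)), arranged to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Illustrative values only (assumed, not taken from the fitted models)
a = -6.0
b = [0.05, 2.0]   # hypothetical coefficients for slope (degrees) and PGA (g)
x = [30.0, 0.4]   # hypothetical factor values at one sample point
z = linear_predictor(a, b, x)
p = landslide_probability(z)
```

With these placeholder numbers, z is negative, so the resulting probability is well below 0.5, which mirrors the low absolute probabilities typical of areal landslide prediction.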
To test the goodness of fit of the LR model, this work applied the Hosmer-Lemeshow test, a statistical goodness-of-fit test commonly used for prediction models. Essentially, it is a chi-square test for grouped data, i.e., the data are divided into 10 equal subgroups [33]. With g denoting the number of groups, the test statistic in this work follows a chi-square distribution with g-2 degrees of freedom. A significant test result shows that the model is not a good fit, and a nonsignificant result indicates a good fit.
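The test can be sketched in a few lines. The study itself used SPSS; the code below is only an illustration that assumes the commonly used form of the statistic, H = sum over groups of (O_k - E_k)^2 / (N_k * pbar_k * (1 - pbar_k)), where O_k is the observed number of landslide samples, E_k the sum of predicted probabilities, N_k the group size, and pbar_k the mean predicted probability in group k. The function names are hypothetical, and the chi-square tail probability uses a closed form that holds only for even degrees of freedom (which covers g = 10).

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this sketch supports positive even df only")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # (x/2)^i / i!
        total += term
    return math.exp(-half) * total

def hosmer_lemeshow(y_true, p_pred, g=10):
    """Return (statistic, p_value) for g groups ordered by predicted probability."""
    pairs = sorted(zip(p_pred, y_true), key=lambda t: t[0])
    n = len(pairs)
    stat = 0.0
    for k in range(g):
        group = pairs[k * n // g:(k + 1) * n // g]   # near-equal group sizes
        n_k = len(group)
        if n_k == 0:
            continue
        observed = sum(y for _, y in group)
        expected = sum(p for p, _ in group)
        mean_p = expected / n_k
        denom = n_k * mean_p * (1.0 - mean_p)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat, chi2_sf_even_df(stat, g - 2)
```

A p-value above the chosen significance level (0.05 in this work) would be read as no evidence of poor fit.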

Landslide Inventory of the 2013 Lushan Earthquake
The 2013 Lushan earthquake triggered a large number of landslides, dominated by five types: rock collapses, rock slides, soil collapses, deep-seated landslides, and large-scale rock avalanches. Although several landslide inventories of this earthquake are available [34][35][36], most of them are incomplete. Xu et al. interpreted aerial photographs and satellite images to generate a more detailed co-seismic landslide inventory for the 2013 Lushan earthquake, which has been used in later research [8,11,37]. In their work, the pre-earthquake satellite images included SPOT-5, SPOT-4, Pleiades, ZY02C, and ZY-3; the post-earthquake data were aerial photographs and satellite images, including RapidEye multispectral images (5 m resolution) and ZY-3 panchromatic (2.1 m resolution) and multispectral (5.8 m resolution) images. They also validated the preliminary landslide inventory through detailed field investigations, which reduced omissions and errors. Finally, a complete landslide inventory of the Lushan earthquake was prepared, containing 21,054 landslides with a total area of 17.156 km², of which 14,483 landslides were larger than 100 m² (Figure 3) [7]. Overall, the landslides were distributed along a NE trend and were somewhat scattered.
According to the spatial distribution of these landslides, a study area of about 3923.918 km² was determined for this work, within which the peak ground acceleration (PGA) is greater than 0.08 g (Figure 3a). To keep the landslide count objective, clustered and coalesced landslides were divided into individual landslides, expressed as multiple polygons with clear boundaries and spatial locations, rather than being mapped as a single polygon (Figure 3b).

Influence Factors
Firstly, seismogenic faults are thought to play a very important role in inducing earthquake landslides. Both the distance effect and the segment effect of the seismogenic fault exert significant control on landslides. In terms of the distance effect, landslides are mostly distributed along the seismogenic fault [2,38,39]. Regarding the segment effect, the ability to trigger landslides differs significantly between segments because of the different nature of the seismogenic fault and the amount of energy released. For example, in the Wenchuan earthquake, the southwestern section of the seismogenic fault had a stronger landslide-triggering capability than the northeastern section [2]. Landslides also showed significantly different distribution patterns along different sections of the fault in the Haiti earthquake [26]. Thus, this study considered the effect of the seismogenic fault on landslides in three modes: the fault distance effect, the fault segmentation effect, and the combined effect of both.
The causative fault of the 2013 Lushan earthquake remains unclear, though some studies suggested it is a hidden reverse fault. We collected several reported focal mechanism solutions (http://www.cea-igp.ac.cn/tpxw/266824.shtml, http://earthquake.usgs.gov) and the parameters of this fault [40][41][42][43][44], and chose an approximate average of 214° as the strike of the possible seismogenic fault. To reflect the distance effect, the study area was divided into 16 zones by lines parallel to the fault at an interval of 4 km on either side of it, so that each zone has a distinct distance to the fault (Figure 4j); the zones are numbered 1 to 16 from NW to SE. The epicenter is located between zones 6 and 7. To reflect the effect of fault segmentation, the study area was divided into 12 zones with a spacing of 4 km along the fault, numbered 1 to 12 from SW to NE, each containing a small segment of the fault (Figure 4k). In this zonation, the epicenter is also located between zones 6 and 7.
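The two zonations can be illustrated with a small coordinate rotation. The sketch below assumes projected coordinates in metres and a reference point on the fault trace (here taken, as an assumption, to be the epicenter, so that it sits on the boundary between zones 6 and 7 in both zonations). The function name, the reference point, and the zone-edge offsets are hypothetical; only the 214° strike, the 4 km spacing, the zone counts, and the numbering directions come from the text.

```python
import math

def fault_zone_indices(x, y, x0, y0, strike_deg=214.0, width_m=4000.0,
                       n_across=16, n_along=12,
                       across_min_m=-24000.0, along_min_m=-24000.0):
    """Return (across_zone, along_zone) as 1-based indices, or None when outside.

    across_zone: strips parallel to the fault, numbered NW to SE.
    along_zone:  strips perpendicular to the fault, numbered SW to NE.
    """
    az = math.radians(strike_deg)
    ux, uy = math.sin(az), math.cos(az)   # strike unit vector (east, north); points SW for 214 deg
    dx, dy = x - x0, y - y0
    along_ne = -(dx * ux + dy * uy)       # metres along the fault, positive toward NE
    across_se = -dx * uy + dy * ux        # metres off the fault, positive toward SE

    def zone(offset, lower, count):
        index = math.floor((offset - lower) / width_m) + 1
        return index if 1 <= index <= count else None

    return (zone(across_se, across_min_m, n_across),
            zone(along_ne, along_min_m, n_along))
```

Under these assumed offsets, a point 5 km to the SE of the fault and 2 km to the NE along strike from the reference point falls in zone 8 of the fault-parallel zonation and zone 7 of the along-fault zonation.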
Landslides caused by earthquakes usually occur in areas with similar elevations, and the directionality of earthquake-related fault slip and the propagation of seismic waves may affect the occurrence of landslides. Therefore, we selected elevation, slope, and aspect as control factors for earthquake-triggered landslides. Among them, the aspect was divided into nine categories. As one of the most important ground motion parameters for inducing earthquake landslides, PGA is often used as an important factor in the evaluation of seismic landslide risk. The PGA data are from the USGS (United States Geological Survey) (https://earthquake.usgs.gov).
Rivers are one of the important factors affecting landslides. They mainly act by weakening the resistance at the toe of a slope and by enlarging its free face through erosion, both of which affect slope stability [45]. Roads are another important influence on the local natural geographical environment. Because the undercutting action of rivers or roads may affect the stability of natural slopes, we chose the distance from the river and the distance from the road as influence factors.
The effect of rainfall and topographic wetness on landslides is mainly reflected as follows: a large amount of rainwater infiltrates, saturating the soil and rock layers on the slope and even accumulating on the impermeable layer below the slope, which increases the weight of the sliding body, reduces the shear strength of the soil and rock layers, and finally leads to a landslide. The TWI (topographic wetness index) quantitatively indicates how terrain controls the spatial distribution of soil moisture. Therefore, this paper selected the 2014 annual rainfall data and TWI as influence factors of landslides.
The sources of the other factors are as follows: (a) to (d) and (f) to (h) are derived from the DEM (digital elevation model, 10 m resolution) using GIS (3D surface), (e) (PGA data) is from the USGS (https://earthquake.usgs.gov), and (i) (lithology) is from a 1:200,000 geological map of the study area. Because this area experienced strong tectonic deformation in different ages, the geotechnical characteristics of the same lithology differ greatly between ages; by contrast, the lithological properties (such as cohesion and internal friction angle) of strata of the same age differ little. Therefore, referring to previous studies [7,46], we classified the lithology based on stratigraphic ages. Here, the lithology of the strata in the study area was divided into 12 categories according to the stratigraphic chronology (Figure 4i), including Q (Quaternary), N&E (Neogene and Paleogene), K (Cretaceous), J (Jurassic), T (Triassic), P&C (Permian and Carboniferous), D (Devonian), S (Silurian), O (Ordovician), Sn (Sinian), Pre-Sn (Pre-Sinian), and intrusive rocks. The detailed lithology information is described in a previous study [7].
Using GIS technology, all 12 influencing factor maps were transformed into a raster format with a grid cell size of 10 m × 10 m. These influencing factor maps are shown in Figure 4.

Sampling Strategy
Using the median elevation of each landslide, we roughly separated the source and deposit areas: the part of a landslide above this value is the source, while the part below it is the deposit (Figure 5).
In this work, we aimed to map a landslide occurrence probability that predicts the areal extent of landslides. In other words, we wanted the resulting probability to correlate with spatial extent (e.g., areas labeled 5% probability of landsliding will contain about 5% landsliding by area) [47]. Therefore, based on this criterion, we used GIS to randomly select points throughout the study area (200 points/km²). Figure 5 shows the distribution of co-seismic landslides near the epicenter, where the samples that fall into a source area are the landslide samples and the rest are nonlandslide samples. Finally, in the whole study area, 1841 landslide samples and 782,874 nonlandslide samples were chosen as the training dataset. Next, we randomly selected points throughout the study area (100 points/km²); 888 landslide samples and 38,352 nonlandslide samples were chosen as the test dataset.
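The median-elevation split and the sampling density can be sketched as follows. This is only an illustration with hypothetical function names; in particular, assigning cells exactly at the median to the deposit is an assumption, since the text only distinguishes elevations above and below that value.

```python
from statistics import median

def split_source_deposit(cell_elevations):
    """Split one landslide's raster cell elevations by the median elevation."""
    threshold = median(cell_elevations)
    source = [z for z in cell_elevations if z > threshold]    # above the median
    deposit = [z for z in cell_elevations if z <= threshold]  # at or below (assumed)
    return threshold, source, deposit

def expected_point_count(area_km2, points_per_km2):
    """Approximate number of random points for a given area and sampling density."""
    return round(area_km2 * points_per_km2)

threshold, source, deposit = split_source_deposit([1200, 1250, 1300, 1350])
n_training = expected_point_count(3923.918, 200)
```

For the study area of about 3923.918 km², a density of 200 points/km² corresponds to roughly 784,784 points, which is close to the 784,715 training samples reported (1841 + 782,874).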

LR Analysis
We constructed four LR models: (1) model 1, containing the zonations of the study area both parallel and vertical to the fault; (2) model 2, containing only the zonation vertical to the fault, which represents the effect of fault segmentation; (3) model 3, containing only the zonation parallel to the fault, which represents the distance effect; and (4) model 4, without these zonations.
Using SPSS software, we estimated the regression coefficients of the four LR models from the 784,715 training samples. For continuous variables, a positive regression coefficient means a positive correlation, i.e., as the independent variable increases, the probability of landslide occurrence becomes greater. For each categorical variable, the last category was used as the reference category (i.e., the coefficient of the last category is 0).
To test the goodness of fit of the logistic regression models, we performed the Hosmer-Lemeshow test (Table 1). All the P-values are greater than 0.05 (α), so the test is nonsignificant, which indicates that the models all fit well. Generally, the higher the probability of landslide occurrence is, the greater the regression coefficient is. Based on the regression coefficients (Table 2 and Figure 6), we can explain the statistical relationship between each influence factor and the occurrence of landslides.
For all four models, slope, PGA, and rainfall are positively correlated with the occurrence of landslides, whereas TWI, distance to the epicenter, and distance to the river are negatively correlated with it. This is because co-seismic landslides usually occur along drainages and on both sides of roads.
The lithology of Quaternary is the least likely to trigger landslides. For models 1 and 2, the Ordovician stratum is the most prone to landsliding. For models 3 and 4, the lithology of intrusive rock has the highest LR coefficients.
In the zones parallel to the fault, the regression coefficients of zones 1-13, which together are 52 km wide, are much higher than those of zones 14-16 (Figure 6). In the zones vertical to the fault, on both sides of the epicenter (located between zones 6 and 7), the regression coefficients decrease with increasing distance.
Figure 6. LR coefficients of the influence factor "fault" for three models. (a) LR coefficients of "vertical to fault" for model 1, (b) LR coefficients of "parallel to fault" for model 1, (c) LR coefficients of "vertical to fault" for model 2, (d) LR coefficients of "parallel to fault" for model 3.

Mapping Landslide Probability
According to the LR coefficients (Figure 6, Table 1), the Raster Calculator tool of the GIS platform was used to create a Map Algebra expression that outputs a raster. Using Formula (1), the Z values were calculated from each influencing factor layer (Figure 3). Based on Formula (2) and Z, the probability of landslide occurrence (P values) was calculated. Figure 7a-d shows the landslide probability distribution maps. The closer the probability value is to 0, the less likely a landslide is to occur; the closer it is to 1, the more likely a landslide is to occur. The highest probability values for the four models are 0.151, 0.150, 0.131, and 0.138, respectively.
Based on the natural breakpoint classification method, the landslide occurrence probability was ranked into five classes: (1) very low (0 to 0.18%), (2) low (0.18% to 0.77%), (3) moderate (0.77% to 1.78%), (4) high (1.78% to 3.55%), and (5) very high (3.55% to 15%). Figure 7e-h shows the probability class maps of the study area for the four models. For each probability class, its area in the study area was calculated for the four models (Table 3). Apparently, the higher the probability of landslide occurrence, the smaller its area.

Table 3. Comparison of predicted probability area and observed area ratio of landslides for four models. Columns: P-value class; predicted probability area (km²); observed area ratio of landslides.

A more quantitative check on the predicted probabilities is to examine the area ratio of observed landslides within a class of predicted probabilities (i.e., the ratio of landslide area to the total area of this class). As shown in Table 3, there is good consistency between the predicted probability area and the observed area ratio of the landslides (expressed as percentages). For example, for model 1, the predicted area of the very high probability class (P-value class 3.55%-15%) is 6.26 km², which is relatively small, while the observed area ratio of landslides in this class is 5.96%, which is relatively large. Such a result is reasonable and consistent with the real situation.
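The class assignment and the observed area ratio check can be sketched as below. The break values are the five class limits quoted in the text; the function names are hypothetical, and the sketch assumes equal-area raster cells so that counting cells is equivalent to measuring area. Treating each upper bound as inclusive is also an assumption.

```python
from bisect import bisect_left

BREAKS = [0.0018, 0.0077, 0.0178, 0.0355]   # upper bounds of the first four classes
LABELS = ["very low", "low", "moderate", "high", "very high"]

def probability_class(p):
    """Map a predicted probability to one of the five classes."""
    return LABELS[bisect_left(BREAKS, p)]

def observed_ratio_by_class(cells):
    """cells: iterable of (predicted_probability, is_landslide) for equal-area cells.

    Returns, per class, the share of cells that are observed landslides,
    or None for a class that contains no cells.
    """
    totals = {label: 0 for label in LABELS}
    slides = {label: 0 for label in LABELS}
    for p, is_landslide in cells:
        label = probability_class(p)
        totals[label] += 1
        slides[label] += int(is_landslide)
    return {label: (slides[label] / totals[label] if totals[label] else None)
            for label in LABELS}

demo = observed_ratio_by_class([(0.001, False), (0.001, False),
                                (0.05, True), (0.05, False)])
```

If the model is well calibrated in the areal sense, the observed ratio in each class should fall within, or close to, that class's probability range.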

Evaluation of Mapping Results from LR Analysis
ROC (receiver operating characteristic) analysis provides tools to differentiate two classes, established through a diagnostic test, in an optimal manner [48]. It is widely used to evaluate the quality of deterministic and probabilistic detection and forecast systems [49]. The ROC curve plots the false positive rate (1 - specificity) on the X-axis against the true positive rate (sensitivity) on the Y-axis. The area under the ROC curve (AUC) can be used to compare models quantitatively; for a model that performs at least as well as random guessing, it ranges from 0.5 to 1.0. An AUC larger than 0.7 indicates good performance, while an AUC close to 0.5 indicates performance no better than random classification.
The ROC results for the training and validation datasets were calculated in this work, and the resultant ROC curves are shown in Figure 8a,b, respectively. For the training dataset, all four LR models have a good success rate, with AUCs of 0.851, 0.842, 0.847, and 0.840 for models 1 to 4, respectively; for the validation dataset, the corresponding AUCs are 0.8445, 0.8346, 0.8422, and 0.8361.
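For readers unfamiliar with how such AUC values are obtained, the following is a minimal, self-contained Python sketch of an ROC curve and its area computed by the trapezoidal rule. It is illustrative only: the function names are hypothetical, and it is not the software used to produce Figure 8.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by lowering the decision threshold.

    scores -- predicted probabilities
    labels -- 1 for landslide, 0 for nonlandslide
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    # Sort samples from the highest to the lowest predicted probability.
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Process tied scores together so a tie gives one diagonal segment.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

Computed this way, the AUC equals the fraction of (landslide, nonlandslide) sample pairs in which the landslide sample receives the higher predicted probability, with ties counted as one half.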
As can be seen from Figure 4, for regression models 1 and 3, which take into account the distance to the fault, the regression coefficients show a significant hanging-wall effect. In other words, most of the co-seismic landslides tend to occur on the hanging wall of the seismogenic fault. Because the causative structure of the Lushan earthquake is presumably a reverse fault, the seismic shaking on its hanging wall might be stronger than that on the footwall, thus triggering more slope failures. This effect has been observed elsewhere. For example, in the 2005 Kashmir earthquake [60], more than one-third of the landslides were distributed within 1 km of the active fault and concentrated on the hanging-wall side of the seismogenic fault. In addition, landslide density may differ between sections of the fault, because the sections have distinct landslide-triggering capabilities that depend on multiple conditions. For regression models 1 and 2 in this work, which take into account the zonation of the study area vertical to the fault, the regression coefficients on either side of the epicenter decrease with distance, implying features of both point-source and line-source ruptures (Figure 6). Other events, such as the 2017 Jiuzhaigou, Sichuan earthquake [4], also suggest that fault segmentation may affect the pattern of co-seismic landslides.
At present, most landslide hazard assessments based on the LR model use equal numbers of landslide points and nonlandslide points to construct training sample datasets. Such sampling inflates the proportion of landslide samples relative to the region as a whole, so that the resulting landslide occurrence probability index spreads across the full range from 0 to 1. This is inconsistent with the real situation: in general, landslides triggered by an earthquake account for less than 1% of the study area. Moreover, landslide spatial assessments obtained in this way for different regions are not comparable. In order to obtain the true probability of landslides, this work randomly selected points throughout the whole study area (200 points/km²). Although the numbers of landslide samples and nonlandslide samples differ, this imbalance represents the real situation of landslide occurrence in the study area. In addition, the probability predicted by logistic regression is affected by the class balance of the dataset (the ratio of landslide to nonlandslide points) [61]. When the class distribution in the dataset does not match that of the actual population, the predicted probabilities are biased. Our method therefore ensures that the ratio of landslide points to nonlandslide points matches the true occurrence probability of landslides, which minimizes the sampling bias. To test the predicted probabilities, we compared them with the percentage of observed landslide area within each range of predicted probabilities. The results show a good relationship between the predicted probability and the observed landslide percentage, which is an empirical estimate of spatial extent (Table 3). In other words, the probability obtained in this work is related to the spatial extent of landslides, and the evaluation results are acceptable.
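The effect of class balance on LR probabilities described above can be illustrated with the standard intercept offset for outcome-dependent sampling in logistic regression. The sketch below is not part of the workflow of this study, which avoids the problem by sampling at the true ratio. It only shows, under the assumption that landslide and nonlandslide points are each sampled at random within their class, how much a probability fitted on a resampled dataset overstates the population-scale probability. The function names and the 1% prevalence in the usage note are illustrative assumptions.

```python
import math


def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def rescale_probability(p_sampled, sample_fraction, true_fraction):
    """Convert a probability fitted on a resampled dataset to population scale.

    p_sampled       -- probability predicted by an LR model trained on the sample
    sample_fraction -- share of landslide points in the training sample
    true_fraction   -- share of landslide points in the whole study area
    Only the intercept of the LR model is affected by this kind of sampling,
    so the correction is a constant shift on the log-odds scale.
    """
    shifted = logit(p_sampled) - logit(sample_fraction) + logit(true_fraction)
    return 1.0 / (1.0 + math.exp(-shifted))
```

For example, with a balanced training sample (`sample_fraction = 0.5`) and an assumed true landslide share of 1% (`true_fraction = 0.01`), a predicted value of 0.5 corresponds to a population-scale probability of 0.01, which shows why indices from balanced samples cannot be read directly as occurrence probabilities.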

Conclusions
Taking the Lushan earthquake as an example, this work examined the influence of seismogenic faults on the predictive mapping of landslide probability using logistic regression (LR). We constructed four LR models to relate 12 influence factors to landslide probability maps: model 1 (zoning the study area vertical and parallel to the fault), model 2 (zoning the study area vertical to the fault), model 3 (zoning the study area parallel to the fault), and model 4 (without such study-area zonations). We then prepared predictive landslide probability maps by LR analysis with these models. Cross-comparisons and validations of these maps show that, for the training set, the AUCs of the first three models were 0.851, 0.842, and 0.847, respectively, while that of the model without considering the effect of the seismogenic fault (model 4) was only 0.840. For the validation set, the AUCs of the four LR models were 0.8445, 0.8346, 0.8422, and 0.8361, respectively. This means that the effects of the seismogenic fault, which are represented by study-area zonations vertical and parallel to the fault proper, are significant to the predictive mapping of earthquake-induced landslides, which permits enhancing the prediction accuracy by 2% compared to the model that only considers the distance from the fault. In addition, we randomly selected points throughout the study area (200 points/km²) in this work. Although the numbers of landslide samples and nonlandslide samples differ, this imbalance represents the real situation of landslide occurrence in the study area, and with a training dataset built from such samples, the result of the LR analysis can represent the true probability of landslides.
Author Contributions: C.X. proposed the research concept, organized the landslide interpretation, and provided basic data. X.S. designed the framework and wrote the manuscript. S.M. participated in the writing and data analysis. Q.Z. participated in data analysis.