1. Introduction
Landslides are catastrophic natural hazards that frequently pose major societal, economic, and environmental risks on an international scale [1]. According to the EM-DAT report [2], 21,412 landslides occurred worldwide between 1900 and 2014, resulting in 38,521,499 fatalities, with 7,229,487,068 people affected and total direct economic losses exceeding $2.7 trillion.
Landslide susceptibility assessment has long been recognized as a useful tool for landslide hazard management, supporting land use planning and better decision making in landslide-prone areas [3]. It is generally based on heuristic, statistical, or deterministic models [4,5,6,7,8]. Heuristic models are subjective and highly susceptible to the analyst's expectations of the results [9,10]. Deterministic models have been reported to achieve higher accuracy, but are limited by the difficulty of obtaining a detailed landslide database [11,12]. Statistical models are the most widely used because of their simplicity and high efficiency [13,14]. Many remarkable studies have been carried out along these lines, laying a solid foundation for landslide susceptibility mapping. However, most previous studies treat the relationship between triggering factors and landslide occurrence as a fixed effect within an area, whereas the influence of a given factor may in fact change with location. The uncertainty arising from this spatially varying relationship remains a scientific challenge.
The second law of geography states that the relationship between variables in spatial data commonly varies over space [15], a property known as spatial non-stationarity. Given that landslide susceptibility assessment relies heavily on spatial data, the relationship between influencing factors and landslide susceptibility may also be spatially non-stationary. Previous studies, e.g., [16], have likewise suggested that the parameters governing the occurrence of a natural disaster do not have the same importance in different parts of an area. The existence of spatial non-stationarity indicates that the average relationships fitted to the whole study area by traditional models might be inappropriate, since they do not accurately fit local conditions [17]. These spatially non-stationary characteristics of the data make landslide susceptibility assessment with traditional models difficult.
Geographically weighted regression (GWR), the most popular form of local regression, shows great capability in dealing with spatially non-stationary relationships [18]. It allows the relationship between dependent and independent variables to vary over space, with the regression coefficients calculated separately for each spatial zone [19]. This method has been applied in various fields such as social economics, geography, and meteorology [20,21,22]. However, applications of the GWR model to geological hazard susceptibility assessment have not yet been reported.
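As a minimal illustrative sketch of the local regression idea (not the implementation used in this study), the following Python code fits a separate weighted least-squares regression at each location. The function name `gwr_coefficients`, the use of unit centroids as `coords`, the Gaussian distance weighting, and the fixed `bandwidth` are all assumptions made for illustration.

```python
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """Fit one weighted least-squares regression per location.

    coords    : (n, 2) array of unit centroids (hypothetical input).
    X         : (n, p) explanatory variables; an intercept column is added here.
    y         : (n,) response variable.
    bandwidth : Gaussian kernel bandwidth, in the same units as coords (assumed fixed).
    Returns an (n, p + 1) array holding the local coefficients of each location.
    """
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])
    betas = np.empty((n, Xd.shape[1]))
    for i in range(n):
        # Distances from calibration location i to every observation.
        d = np.linalg.norm(coords - coords[i], axis=1)
        # Closer observations receive larger weights.
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        XtW = Xd.T * w  # equivalent to Xd.T @ diag(w)
        betas[i] = np.linalg.solve(XtW @ Xd, XtW @ y)
    return betas
```

When the underlying relationship is spatially constant, every row of the returned array is the same; spatial non-stationarity shows up as variation of the coefficients from row to row.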
One difficulty limiting the application of the GWR model in landslide susceptibility assessment is the expression of spatial proximity. Spatial proximity is the basic input and core problem of the GWR model, and how adequately it is expressed at different locations directly affects the modeling ability of the model [23,24]. Spatial proximity is the distance relationship between two units in space: the closer the units, the greater their mutual influence. The key to determining spatial proximity is segmenting the study area into map units that effectively express the spatial adjacency relationships among landslide data. The resulting units should satisfy the GWR requirements of good internal homogeneity and between-unit heterogeneity. The segmentation methods commonly used in previous GWR studies fall into two major kinds, i.e., administrative units [25] and grid units [26]. Administrative units are mostly used for social and economic issues, and their boundaries do not accord with the neighborhood characteristics of landslide data; as such, they are rarely used in geological hazard assessment. The grid unit is a popular mapping unit for susceptibility assessment since it is easily obtained, but it is not associated with the geological environment. The slope unit is a relatively new mapping unit for evaluating landslide susceptibility. It is generated according to hydrological theory and is the watershed area bounded by drainage lines (valley lines) and water divide lines (ridge lines) [27]. It is the basic topographic unit in which geological hazards occur, it has higher internal homogeneity and between-unit heterogeneity than the grid unit, and it is closely related to geological environment conditions. In this sense, the slope unit provides an alternative means of expressing spatial proximity in the GWR model for landslide susceptibility assessment.
Two other key issues for the GWR model are the elimination of multicollinearity and the establishment of the kernel function. Previous studies, e.g., [28], indicated that GWR is highly susceptible to multicollinearity between explanatory variables: collinearity between pairs of explanatory variables, or multicollinearity among more than two variables, often leads to problems such as unstable parameter estimates and counterintuitive parameter signs. These problems are significant here because the complicated conditions under which landslides occur make correlations between explanatory variables highly likely. The kernel function uses the distances between observations and calibration units to place greater emphasis on observations that are closer in space [28]. The choice of kernel function type and the determination of its bandwidth are crucial to the spatial proximity modeling of GWR.
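To make the role of the kernel type and its bandwidth concrete, the sketch below implements two kernels that are commonly used in the GWR literature, the Gaussian and the bi-square. They are shown only as examples and are not necessarily the kernel adopted in this study; the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(d, bandwidth):
    """Weights decay smoothly with distance and never reach exactly zero."""
    d = np.asarray(d, dtype=float)
    return np.exp(-0.5 * (d / bandwidth) ** 2)

def bisquare_kernel(d, bandwidth):
    """Weights fall to zero at the bandwidth, so more distant units are excluded."""
    d = np.asarray(d, dtype=float)
    w = (1.0 - (d / bandwidth) ** 2) ** 2
    return np.where(d < bandwidth, w, 0.0)
```

A small bandwidth makes the fit very local and sensitive to noise, whereas a large bandwidth pushes the local estimates toward a single global regression, which is why the bandwidth has to be chosen together with the kernel type.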
In this study, we propose a spatial proximity-based geographically weighted regression (S-GWR) model for landslide susceptibility assessment. The model addresses the spatial non-stationarity of landslide susceptibility assessment by means of GWR. First, we generate slope units to establish spatial adjacency. Then, the variance inflation factor (VIF) method [29] and the principal component analysis (PCA) method [30] are adopted to eliminate multicollinearity, and the kernel function is determined according to the characteristics of the landslide data. Finally, we choose Qingchuan County, Sichuan Province, China, as the study area to validate the applicability of the model, and compare it with the grid-unit GWR model and other evaluation models.
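The multicollinearity step can be sketched as follows, under stated assumptions. The VIF of a factor is 1 / (1 − R²), where R² comes from regressing that factor on all the others, and PCA replaces correlated factors with uncorrelated component scores. The function names, the standardization of the inputs, and the number of retained components are illustrative choices, not details taken from this study.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (n samples, p factors)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress factor j on an intercept plus all remaining factors.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

def pca_scores(X, n_components):
    """Project standardized X onto its leading principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T
```

A commonly quoted rule of thumb treats a VIF above about 10 as a sign of serious multicollinearity. Because the component scores are mutually uncorrelated, their VIF values are close to 1.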
2. Study Area
The study area is Qingchuan County, in the transitional region between the Sichuan Basin and the Western Sichuan Plateau. This area has long been recognized as one of the most landslide-prone areas of China [31]. It lies between 32°12′ and 32°56′ N and between 104°36′ and 105°38′ E, covering a total area of 3217 km². Elevation ranges from approximately 500 m to 3820 m, with the northwestern part higher than the southeastern part. The slope gradient reaches a maximum of about 80°, with a mean value of 38°.
The tectonic and geological settings of the area are complex. Because of neotectonic activity, soft and hard lithologies commonly alternate. About eight types of lithological outcrop occur throughout the study region (as shown in Figure 1), including sedimentary rocks (limestone, sandstone, and conglomerate) of Cambrian to Jurassic age, magmatic rock (granite), metamorphic rocks (shale, schist, and gneiss) of Cambrian to Jurassic age, and unconsolidated Quaternary loess sediments. Two main active faults cross the area: the Pingwu–Qingchuan fault, which lies in the north and crosses the whole territory, and the Yingxiu–Beichuan fault, a thrust fault of the Longmenshan fault belt dipping 60°–70° to the NW. The Bailong, Qingzhu, and Qiaozhuang rivers flow through the area. Their discharges are approximately 525, 30, and 40 m³ s⁻¹, respectively, and they serve as the main channels for draining atmospheric precipitation and groundwater.
5. Conclusions
In this paper, a spatial proximity-based geographically weighted regression (S-GWR) model is proposed for assessing landslide susceptibility. The model addresses the spatial non-stationarity of the relationship between landslide factors and landslide occurrence, which has usually been neglected in previous landslide susceptibility assessment studies. To express spatial proximity properly, slope units are adopted. Multicollinearity in the data is eliminated through the VIF method and principal component analysis.
The Qingchuan area in China, following the 2008 Ms 8.0 Wenchuan earthquake, is selected as a case study to illustrate the effect and validity of the proposed S-GWR model. The results show that the four influencing factors in the S-GWR model (the geological, slope shape, hydrographic, and aspect factors) all have noticeable spatially non-stationary effects on landslide susceptibility. In quantitative testing, the missing rate of S-GWR is 12.37%, much lower than the 20.96% of G-GWR, and the AUC values of the ROC curve and success rate curve of S-GWR (0.859 and 0.850, respectively) are both higher than those of G-GWR (0.837 and 0.827, respectively). In addition, the AUC values of the ROC curve and success rate curve of the GWR models are higher than those of the ANN, I, and OLS models, and each model is more accurate with slope units than with grid units of similar cell size.
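For readers who wish to compute comparable indicators, the following sketch shows one way to obtain an ROC AUC and a missing rate from predicted susceptibility scores. It assumes that the missing rate is the share of observed landslide units scored below a classification threshold; this definition, the default threshold of 0.5, and the function names are assumptions for illustration and may differ from the procedure used in this study.

```python
import numpy as np

def roc_auc(labels, scores):
    """AUC as the probability that a landslide unit outranks a stable unit."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]
    neg = scores[~labels]
    # Compare every landslide unit with every stable unit; ties count one half.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

def missing_rate(labels, scores, threshold=0.5):
    """Share of landslide units scored below the threshold (assumed definition)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    return np.mean(scores[labels] < threshold)
```

A lower missing rate and a higher AUC both indicate that fewer observed landslides are assigned to low-susceptibility classes.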
Our study verifies the importance of considering spatial non-stationarity and the applicability of the GWR model in landslide susceptibility assessment. It also suggests that slope units better express the spatial relationships among landslide data and make the evaluation results more accurate.