1. Introduction
Landslides depend on the complex interaction of several static and dynamic factors. Surface characteristics such as geomorphology, soil, land cover, and geology are considered static, whereas factors that trigger the mass movement are considered dynamic [1,2,3]. Although multiple factors play a significant role in landslide occurrence, it is usually a single dynamic factor that triggers a landslide event [4]. Landslides triggered by rainfall are known for their shallow depth, between 0.3 and 2 m in thickness, and for their great potential to cause significant damage to people and property [1,4].
The study of shallow rainfall-induced landslides is particularly important because global climate change is expected to influence regional precipitation patterns such as precipitation intensity and distribution [5,6]. Storms with high-intensity and long-duration rainfall have a high potential to trigger rapidly moving soil masses due to changes in pore water pressure and seepage forces [7,8,9]. The literature describes two distinct failure mechanisms for shallow rainfall-induced landslides. The first mechanism is based on the reduction of hydraulic conductivity in the weathering profile and the increase of its density with depth [4,10,11]. In this scenario, the percolation rate lags behind the rainfall rate, creating a perched water flow that is parallel to the slope. The undrained conditions lead to increased pore pressure and reduced shear strength, which results in slope failure [4]. The second mechanism describes the advancement of water from the surface of the slope while the material is still unsaturated. At this point, reduced suction results in failure in the form of a rigid mass [4,11].
Rainfall events can be analyzed to define statistical or empirical correlations of rainfall intensity and duration with shallow landslide occurrence. This relationship is often expressed as a mathematical law that defines a rainfall threshold, based on the assumption that past relationships between rain and landslides remain valid in the future. When conditions exceed the threshold, landslides should be expected [12]. Caine (1980) described the relationship linking rainfall intensity ($I$) and duration ($D$) as a power law ($I = \alpha D^{-b}$, where $I$ is the mean rainfall intensity, $D$ is the duration of the rainfall event, $\alpha$ is the scaling constant, and $b$ defines the slope).
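As a rough illustration of how such an intensity-duration threshold is applied, the sketch below checks whether a rainfall event plots on or above a power-law threshold, assuming the form I = alpha * D^(-b). The default values (alpha = 14.82, b = 0.39, with I in mm/h and D in h) are the global threshold commonly attributed to Caine (1980); the function name and the sample events are hypothetical and are not part of this study.

```python
def exceeds_id_threshold(intensity_mm_h, duration_h, alpha=14.82, b=0.39):
    """Return True when the mean intensity lies on or above I = alpha * D**(-b).

    alpha and b default to the global values commonly attributed to
    Caine (1980); both are assumptions here, not parameters of this study.
    """
    if duration_h <= 0:
        raise ValueError("duration must be positive")
    threshold = alpha * duration_h ** (-b)
    return intensity_mm_h >= threshold


# Hypothetical events: a short intense storm and a long drizzle.
short_storm = exceeds_id_threshold(20.0, 1.0)
long_drizzle = exceeds_id_threshold(1.0, 24.0)
```

Note that this check uses rainfall alone, which is exactly the limitation discussed in the following paragraphs: it carries no information about antecedent soil moisture.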
Following this methodology, several studies around the world have used rainfall thresholds to define or assess the prediction of shallow landslide occurrence [13,14,15,16,17,18,19,20]. Nevertheless, these rainfall thresholds do not consider antecedent soil moisture conditions on the ground. It is well established that, even though rainfall is a triggering factor, it is not the sole cause of slope instability, as it is increased pore pressure that generates shallow landslides [4]. Antecedent soil moisture conditions significantly influence shallow landslide initiation because the spatial distribution of soil moisture content and pore water pressure controls the dynamics of shear strength and effective stress. Water pressure within the soil pores expands the pore space and reduces the frictional forces between soil particles, triggering slope instability [9].
Furthermore, physically based models that simulate the soil’s hydrological dynamics after rainfall have demonstrated that rainfall alone is not adequate to identify instability and that antecedent soil moisture conditions contribute substantially to this phenomenon [21,22,23,24]. Rainfall intensity-duration thresholds can indicate precipitation as a precursor of shallow landslide activity, but antecedent soil moisture conditions significantly influence shallow landslide initiation because gravity drainage becomes negligible when soil water content falls below the soil's field capacity [25]. Moreover, the spatial distribution of soil moisture content and pore water pressure controls the dynamics of shear strength and effective stress: water pressure within the soil pores expands the pore space and reduces the frictional forces between soil particles, triggering slope instability [9].
Additionally, rainfall intensity-duration thresholds do not provide information about the soil wetness profile with depth. Regardless of the intensity and duration of the rainfall, shallow landslides are influenced by antecedent soil moisture conditions. For example, a substantial precipitation episode within a dry period is not necessarily more likely to trigger shallow landslides than a low-intensity rainfall within a wet period [26]. Furthermore, damp antecedent conditions are likely to cause debris flows of greater magnitude during or following a given rainstorm [19]. As shallow landslide hazards are related to the interaction of static conditions and the temporal distribution of triggering factors, defining a methodology that accounts for “pre-event” or antecedent soil moisture conditions can significantly improve the accuracy of rainfall-triggered shallow landslide hazard modeling.
The present study centers on shallow rainfall-triggered landslides as described in the first failure mechanism. Machine-learning methods are used to develop a mathematical algorithm that presents the minimum amount of antecedent soil moisture and rainfall accumulation that results in a shallow landslide event. This leads to the development of an index value that serves as guidance within susceptible areas in the Continental United States.
Susceptible areas that represent “where” shallow landslides happen are defined using a comprehensive landslide inventory and static factors as defined in Hong (2007) and Cullen (2016). “When” shallow landslides happen is defined as the space-time variation of antecedent soil moisture and rainfall distribution. Although the system is built to help stakeholders foresee potential shallow landslide areas days in advance, factors that represent temporal-spatial vulnerability, such as “who” will be affected, are not considered in this work. Nevertheless, it is expected that the system can facilitate the decision-making processes used to assess risk.
3. Results
AMSR-E and TRMM are used in this work as proxies to learn and explore the feasibility of a system that can serve as a guide for antecedent moisture and rainfall triggers of shallow landslides. However, as AMSR-E and TRMM stopped working in October 2011 and April 2015, respectively, a solution that works for the future is needed. The following is a summary of the results obtained with each set of satellite products.
3.1. Shallow Landslide Index—AMSR-E/TRMM
Logistic regression, depicted in Equation (4), calculates the probability of the outcome being an event or a non-event. The estimated coefficient of each independent variable represents the rate of change in the “log odds” of the outcome.
These coefficients are estimated via the Maximum Likelihood Estimation (MLE) method, which finds the coefficients that make the log of the likelihood function as large as possible or, equivalently, −2 times the log of the likelihood function as small as possible. Then, the Z-factor for the logistic regression model becomes:
In Equation (4), P, the probability of a shallow landslide event, tends towards 1 (event) as Z in Equation (5) increases and towards 0 (non-event) as Z decreases. Hence, any variable that is directly proportional to the probability of a shallow landslide event should have a positive coefficient in Equation (5), and vice versa.
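The behavior of P with respect to Z can be illustrated with a minimal sketch, assuming that Equation (4) is the standard logistic function P = 1 / (1 + e^(-Z)) and that Z is a linear combination of the predictors. The coefficient values and variable names below are hypothetical placeholders and are not the fitted coefficients of Equation (5).

```python
import math


def linear_predictor(intercept, coefficients, values):
    """Z = b0 + sum(b_i * x_i); names and numbers are illustrative only."""
    return intercept + sum(coefficients[name] * values[name] for name in coefficients)


def event_probability(z):
    """Standard logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical coefficients (NOT the fitted values of Equation (5)).
coefficients = {"slope": 0.08, "sli": 0.45, "land_cover": -0.10}
pixel = {"slope": 25.0, "sli": 6.0, "land_cover": 9.0}

z = linear_predictor(-4.0, coefficients, pixel)
p = event_probability(z)  # tends to 1 as z grows, to 0 as z decreases
```

In this sketch a positive coefficient (slope, SLI) raises Z and therefore P, while a negative coefficient (land cover, where larger values mean denser vegetation) lowers them.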
Soil type and land cover are categorical variables, which means that numerical values are assigned to represent each category. For soil type, increasing values correspond to larger drainage rates for each category as described in the HWSD database for “soil drainage” and “texture class” [33]; for land cover, decreasing values represent decreased vegetation cover as described in the SHARE database [28]. For example, on a dummy scale ranging from 1 to 9, tree-covered areas are assigned a 9 and bare soils are assigned a 1. Hence, a positive coefficient indicates that the occurrence of an event is positively related to that variable, whereas a negative coefficient indicates a negative relationship with the occurrence of an event.
As described above, the Hold-Out method is used for validation: the data were divided randomly in a 70%–30% ratio into “model obtaining” and “validation” subsets, respectively. Validation results indicate that this model predicts the highest number of cases correctly, with 89.0% accuracy.
Table 3 shows the confusion matrix for this model.
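The validation procedure described above can be sketched as follows. This is a minimal, generic illustration of a random 70%–30% hold-out split and of how accuracy is read from a confusion matrix at a given cut-off; the function names, the fixed random seed, and the toy inputs are assumptions and do not reproduce the study's data or its SPSS workflow.

```python
import random


def hold_out_split(records, train_fraction=0.7, seed=0):
    """Randomly split records into training and validation subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seed fixed only for repeatability
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def confusion_matrix(labels, probabilities, cutoff=0.5):
    """Count true/false positives/negatives for event (1) vs non-event (0)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for observed, probability in zip(labels, probabilities):
        predicted = 1 if probability >= cutoff else 0
        if predicted == 1 and observed == 1:
            counts["tp"] += 1
        elif predicted == 1 and observed == 0:
            counts["fp"] += 1
        elif predicted == 0 and observed == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def accuracy(counts):
    """Share of cases classified correctly."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())
```

For example, `hold_out_split(range(100))` yields 70 training records and 30 validation records, and `accuracy(confusion_matrix(labels, probabilities))` gives the proportion of correctly classified cases at the 0.5 cut-off.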
The AUC, as a measure of model performance, summarizes the trade-off between true and false positive proportions. Here, the AUC represents accuracy without sensitivity to changes in class distribution. The resulting AUC for the 0.5 cut-off value is 0.927 for the training set and 0.89 for the validation set; hence, the 0.5 cut-off value is the selected threshold for the event and non-event decision.
Figure 4 below shows the AUC curves for the training and the validation datasets.
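For readers unfamiliar with the metric, the AUC can be computed directly from predicted probabilities as the fraction of event/non-event pairs in which the event receives the higher score (ties counting one half). The sketch below uses this pairwise definition; it is a generic illustration with made-up scores, not the procedure or data used to produce the reported values.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random event outranks a random non-event."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy example with hypothetical scores.
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Because the statistic depends only on the ranking of events relative to non-events, it does not change when the proportion of events in the sample changes, which is the property referred to above.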
Equation (5) above is then incorporated in a Python subroutine that calculates the SLI for each of the 900,000 pixel points. For each pixel, the algorithm incrementally tries SLI values from 0 up to the value that causes Equation (4) to provide an output equal to 1, that is, the value at which the “probability” of the event becomes equal to 1. This value then represents the minimum amount of water, in terms of antecedent soil moisture and rainfall, for that pixel.
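The incremental search described above might look like the following sketch. Several details are assumptions made only for illustration: Z is split into a static part (intercept plus static predictors) and an SLI term with a single coefficient, the logistic function is the standard one, the search grid runs from 0 to 13 in fixed steps, and, because a logistic probability only approaches 1 asymptotically, a `target` probability close to 1 stands in for "equal to 1". None of these names or values come from the original subroutine.

```python
import math


def event_probability(z):
    """Standard logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def critical_sli(static_z, sli_coefficient, target=0.99, step=0.1, sli_max=13.0):
    """Smallest SLI on the grid whose event probability reaches `target`.

    static_z        -- hypothetical intercept plus static terms for the pixel
    sli_coefficient -- hypothetical coefficient of the SLI term
    Returns None when the target is not reached within [0, sli_max].
    """
    if step <= 0:
        raise ValueError("step must be positive")
    n_steps = int(math.floor(sli_max / step + 1e-9))
    for i in range(n_steps + 1):
        sli = i * step  # multiply instead of accumulating to limit rounding drift
        if event_probability(static_z + sli_coefficient * sli) >= target:
            return sli
    return None
```

Applying `critical_sli` to every pixel, each with its own `static_z`, would yield a map of critical values; pixels that return `None` never reach the target within the search range.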
3.2. Shallow Landslide Index—SMAP/GPM
Because the satellites have been in operation for only a short period, seasonal trends are not identified as of the time of this work. Nonetheless, because SMAP and GPM are expected to be functional in almost “real-time”, seasonal averaging becomes unnecessary, as antecedent root-soil moisture and rainfall can be obtained for analysis as soon as seven days prior to date. It is important to keep in mind that this work uses not just soil moisture but root-soil moisture, which encompasses the volumetric soil moisture for a 1 m soil column. As of the writing of this work, this SMAP product has a mean latency of seven days [34].
Three different time intervals (10-day, 7-day, and 3-day) are tested in logistic regression using SPSS software [35]. The significance of the relationship between the dependent variable and the combination of independent variables is expressed by the statistical significance of the model chi-square, as seen in Table 4.
As the significance of all models is <0.001, less than or equal to the level of significance of 0.05, all variables are deemed significant in all models. The null hypothesis that there is no difference between the model with only a constant and the model with independent variables is rejected. Therefore, the existence of a relationship between the independent variables and the dependent variable is supported. In addition, the predictor ranking or variable importance, shown in
Figure 5, is also significant for understanding the influence of each variable on the model.
The resulting variable relevance is consistent with the known conceptual basis of rainfall-induced shallow landslides. Mechanisms that involve soil profiles, pore pressures, seepage forces, and topography are reflected in these results, as slope, antecedent soil moisture, and rainfall rank amongst the most significant variables in the model. Soil type and its properties are also involved in the development of shallow landslides and come third in importance in the model. Soil properties such as composition relate to the amount of soil moisture and the cohesion among particles that influence landslides. Specifically, pore water pressures have a strong effect on slope stability and can trigger shallow landslides or slope failures, particularly in unstable soils subject to heavy rainfall. Land cover characteristics, such as tree roots that provide stabilization and other hydrological and mechanical influences, are also related to landslides; the model assigns this variable almost the same importance as soil type.
The area under the curve (AUC), as a measure of model performance, represents accuracy without sensitivity to changes in class distribution. Based on the AUC, the 0.2 cut-off value is the selected threshold for the event and non-event decision, as shown in Figure 6.
Although logistic regression has no close analogue to the coefficient of determination R², the pseudo R², or Nagelkerke R² (which ranges from 0 to 1), describes the goodness of fit of each logistic model. In this case, the values for all validation models are relatively high (78.7%, 79.6%, and 76.8% for the 10-day, 7-day, and 3-day models, respectively), indicating a strong relationship between the predictors and the predicted value.
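For reference, the Nagelkerke statistic rescales the Cox and Snell pseudo R² so that its maximum is 1. The sketch below computes it from the log-likelihoods of the constant-only model and the fitted model; the function name and the example log-likelihood values are hypothetical and are not taken from the models reported here.

```python
import math


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke pseudo R-squared from two log-likelihoods and sample size n.

    ll_null  -- log-likelihood of the constant-only model (must be < 0)
    ll_model -- log-likelihood of the model with predictors
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if ll_null >= 0:
        raise ValueError("ll_null must be negative")
    cox_snell = 1.0 - math.exp((2.0 / n) * (ll_null - ll_model))
    max_cox_snell = 1.0 - math.exp((2.0 / n) * ll_null)
    return cox_snell / max_cox_snell


# Hypothetical log-likelihoods for a sample of 100 cases.
example_r2 = nagelkerke_r2(-69.3, -30.0, 100)
```

A value of 0 corresponds to a model that fits no better than the constant-only model, and a value of 1 to a model that reproduces every observed outcome.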
The resulting equations for the models are:
Each equation is then incorporated in a Python subroutine that calculates the SLI for each of the 900,000 pixels.
3.3. SLI Application
The SLI is built to include the effect of the initial moisture content of the soil and the rainfall depth accumulated over a certain period d (days). This period is assumed to be the number of days after which the effect of accumulated rainfall, taking the initial moisture into consideration, will trigger the shallow landslide. Ideally, as a dynamic system, the model will automatically retrieve information directly from SMAP to account for current soil moisture conditions, and rainfall forecasts for up to 10 days can be included to calculate the current SLI.
Figure 7 below shows the applied AMSR-E/TRMM SLI and the applied SMAP/GPM SLI. In both maps, each pixel contains a color index value. This value is the critical SLI that will turn Equation (5) equal to 1; in other words, it is the minimum value of soil moisture content and rainfall accumulation over a 10-day period that is necessary to trigger a shallow landslide. These maps can serve as a baseline to determine shallow landslide hazards. Once the current daily moisture and the forecasted rainfall depth expected in the next 10 days are available, the system can calculate the expected SLI value. This value, in turn, indicates the possibility of a shallow landslide occurrence if it is equal to or greater than the critical SLI value.
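The comparison between the expected and the critical SLI can be sketched as a per-pixel test. The snippet assumes that both maps are available as equally sized grids (lists of rows) and that missing pixels are marked with `None`; how the expected SLI is derived from current soil moisture and forecasted rainfall is not reproduced here, and the function name is hypothetical.

```python
def flag_hazard(expected_sli, critical_sli):
    """Flag pixels whose expected SLI is equal to or greater than the critical SLI.

    Both arguments are grids given as lists of rows. A pixel with a missing
    value (None) in either grid yields None instead of True/False.
    """
    flags = []
    for expected_row, critical_row in zip(expected_sli, critical_sli):
        row = []
        for expected, critical in zip(expected_row, critical_row):
            if expected is None or critical is None:
                row.append(None)
            else:
                row.append(expected >= critical)
        flags.append(row)
    return flags


# Hypothetical 2 x 2 grids.
expected = [[5.0, 2.0], [None, 7.0]]
critical = [[4.0, 3.0], [1.0, 7.0]]
hazard = flag_hazard(expected, critical)
```

Keeping missing pixels as `None` rather than `False` avoids reporting an area as safe when no soil moisture or rainfall information is available for it.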
4. Discussion
4.1. Comparing AMSR-E/TRMM and SMAP/GPM
Both models, AMSR-E and SMAP, are built with the same static variables, defined and prepared as described in Section 2. Soil moisture and rainfall estimates differ between the models, as they are retrieved from different satellites. Regrettably, there is no overlap between AMSR-E and SMAP: AMSR-E was discontinued in October 2011 and SMAP was launched in January 2015. Even though TMPA information is still being produced, TRMM stopped functioning in April 2015 and GPM took over in 2015; an unbiased comparison is therefore limited, as the sample sizes of each instrument are very different. In this study, for example, seven years of AMSR-E and TRMM data are used versus only nine months of SMAP and GPM information.
Nevertheless, it is important to highlight the instruments’ characteristics to form an understanding of their performance relative to each other. In the case of soil moisture, on the one hand, AMSR-E’s daily root-zone soil moisture product is derived by assimilating the C-band LPRM/AMSR-E/Aqua surface soil moisture retrievals into the 2-Layer Palmer Water Balance Model using a one-dimensional, 30-member Ensemble Kalman filter (EnKF). This approach optimally combines soil moisture information derived from the model forecast and the satellite retrieval; it then extrapolates surface soil moisture retrievals into deeper root-zone soil moisture predictions. AMSR-E did this at a 0.25-degree spatial resolution.
SMAP, on the other hand, provides estimates of root-zone soil moisture for the first 1 m of the soil column using an ensemble Kalman filter (EnKF) that merges SMAP observations with NASA’s Catchment land surface model. This land surface model is driven by surface meteorological forcing data, which include precipitation, and represents surface processes such as the vertical transfer between the surface and root-zone reservoirs; the model then interpolates and extrapolates the satellite observations in time and space. The model and the products are compared to various in situ observations, where the model proves of superior quality [34].
In the case of rainfall, GPM builds on TRMM, expanding the spatial footprint and improving the spatial resolution from 0.25 degrees to 0.1 degrees. In addition, GPM improves on TRMM with the Dual-frequency Precipitation Radar and the multi-channel microwave imager, which provide higher sensitivity than TRMM.
Here we present the statistical analysis of both model performances, AMSR-E 10-day and SMAP 10-day.
Table 5 shows the descriptive statistics for 3837 random pixels that were selected for evaluation.
A t-test that measures the significance of the difference between the two models is performed, as shown in Table 6.
The significance of the difference in variance between the two models is evaluated with the F-test, which simply divides the two variances as shown in Equation (9). F-critical at a 95% confidence level is equal to 1.054. As the calculated F is greater than F-critical, it is concluded that the variances differ significantly, making the AMSR-E model significantly different from the SMAP model.
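The variance-ratio test can be illustrated with a short sketch. It assumes that the statistic is the ratio of the two sample variances with the larger variance in the numerator, which is a common convention but is not stated in the text, and it takes the critical value as an input rather than deriving it from the F distribution. The sample values are made up for illustration.

```python
from statistics import variance


def variance_ratio(sample_a, sample_b):
    """F statistic as larger sample variance divided by the smaller one."""
    var_a = variance(sample_a)
    var_b = variance(sample_b)
    larger, smaller = max(var_a, var_b), min(var_a, var_b)
    if smaller == 0:
        raise ValueError("F is undefined when a sample has zero variance")
    return larger / smaller


def variances_differ(sample_a, sample_b, f_critical):
    """True when the calculated F exceeds the supplied critical value."""
    return variance_ratio(sample_a, sample_b) > f_critical


# Hypothetical SLI samples from two maps.
map_a = [1.0, 2.0, 3.0, 4.0, 5.0]
map_b = [2.0, 4.0, 6.0, 8.0, 10.0]
f_value = variance_ratio(map_a, map_b)
```

With the critical value quoted above, `variances_differ(map_a, map_b, 1.054)` would report a significant difference for these toy samples; note that a critical value depends on the sample sizes, so 1.054 applies to the 3837-pixel samples of this study rather than to a five-value example.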
These findings show that both SLI maps, AMSR-E 10-day and SMAP 10-day, are significantly different and should not be used interchangeably.
4.2. Challenges and Limitations
It is important to stress the limitations of this work. The landslide record on which this system was built does not provide time-of-event information. Therefore, daily averages of antecedent root-soil moisture and rainfall are used. Furthermore, antecedent root-soil moisture itself is a model approximation that was not tested in this study. GPM and SMAP each have their limitations and uncertainties, and consequently, both have a direct effect on the SLI. Confidence in the information was assumed based on the success of the model described in the literature review.
First and foremost, the accuracy of any learning algorithm depends on the accuracy of the landslide inventory and other datasets used [20,36]. Cullen (2016) demonstrated that buffer and threshold techniques provide a solution for the characterization of static parameters when studying shallow landslides over a vast domain; nevertheless, inventories are prone to heterogeneous reporting and lack information regarding the specific time of the event. In the case of this work, having information regarding the time of the event could have a significant impact on the SLI modeling. GPM, for example, provides rainfall information every 3 h; a landslide inventory that provides the time of the event could prove very useful to improve performance.
Second, given SMAP’s biases and ongoing recalibration with the newly available Modern-Era Retrospective analysis for Research and Applications (MERRA-2) reanalysis, a comparison between the information used to build this system and the recalibrated data could be useful to determine accuracy. It is also important to note that the L4_SM product uncertainties vary dynamically and geographically. Driest areas, for example, are associated with low values of uncertainty given that the deeper layer of soil moisture is mostly constant. High uncertainty values are found in southern China where root-soil moisture is known to be highly variable but SMAP observations cannot be incorporated.
SMAP requires information regarding the L-band brightness temperature climatology in order to determine observation-minus-forecast biases; this climatological information is derived from the Soil Moisture Ocean Salinity (SMOS) mission, which does not provide good quality information in areas where radio-frequency interference (RFI) is high. Furthermore, no SMAP brightness temperature is assimilated in mountainous areas such as the Rocky and Andes mountains or near large water bodies such as those in northern Canada and the Amazon and Congo rivers. When SMAP data are not available, the L4_SM product provides global soil moisture estimates based on information provided by the model and forcing data [37]. Consequently, caution should be exercised when incorporating these readings into the SLI. It is likely that as the SMAP record continues to grow and more data become available, better certainty can be acquired for further implementation of root-soil moisture in the SLI.
5. Conclusions
This work introduces the Shallow Landslide Index (SLI). The index is intended to be an indicator of antecedent root-soil moisture and rainfall accumulation as a representation of total water volume over a 1 km² pixel area. This index can serve as guidance for the assessment of shallow landslide hazards within susceptible areas in the Continental United States. AMSR-E and TRMM information is used at first as a proxy for model development, from which the findings are as follows:
The AMSR-E model predicts the highest number of cases correctly at 92.7% accuracy.
The RMSE between the resulting SLI and the actual events is 0.83 on a scale from 1 to 13.
The resulting index map is useful for understanding hazardous areas, as antecedent soil moisture conditions and rainfall are taken into consideration. Nevertheless, as AMSR-E is no longer functional, current and future guidance is not possible.
AMSR-E and TRMM are used in this work to learn and explore the feasibility of a system that can serve as a guide for antecedent moisture and rainfall triggers of shallow landslides. Nevertheless, as AMSR-E and TRMM stopped working in October 2011 and April 2015, respectively, a solution that works for the future is presented. New functional satellites, SMAP and GPM, are used to retrieve daily modeled root-soil moisture and rainfall, respectively. The SLI is modeled for three different time intervals (10-day, 7-day, and 3-day), and the results are as follows:
Slope is the variable with the most influence on the model, followed by soil moisture content and rainfall in the form of the SLI; soil type and land cover follow in importance in the three models.
The pseudo R² (Nagelkerke R²) fit of the logistic regression for each model (10-day, 7-day, and 3-day) indicates a strong relationship (78.7%, 79.6%, and 76.8%, respectively) between the predictors and the prediction.
The optimal cut-off value for these logistic regression models as indicated by the AUC is 0.2.
The RMSE is used to understand the difference between the events and the predicted SLI value for the three models. As the RMSE is scale dependent, RMSE values of 1.08, 0.84, and 0.97 are considered a low error on the SLI scale of 1–13.
Comparing AMSR-E’s performance to SMAP’s is not possible even though both models are built with the same predefined static variables. There is no overlap between AMSR-E and SMAP. In addition, the sample sizes of each instrument are very different: seven years of AMSR-E and TRMM versus nine months of SMAP and GPM. Nevertheless, the t-test for a significant difference in means and the F-test for a significant difference in variance both indicate significant differences.
The SLI is intended to give stakeholders the capability to foresee volumetric water conditions for susceptible locations 10 and 7 days in advance, facilitating the decision-making process to determine shallow landslide hazard, vulnerability, and risk. Nevertheless, several challenges should be resolved to “fine-tune” the model. These encompass the introduction of a “time of event” parameter into shallow landslide inventories, the re-evaluation of SMAP root-soil moisture observations after brightness temperature is re-introduced into the L4_SM model, and the investigation of other physical parameters, given adequate computing power. Future work will include testing against deterministic infiltration methods for specific events in order to determine how SLI capabilities could be improved, the implementation of the automated system, and the development of the SLI for other areas where in situ or deterministic methods are not viable.