Empirical and Physically Based Thresholds for the Occurrence of Shallow Landslides in a Prone Area of Northern Italian Apennines

Abstract: Rainfall thresholds define the conditions leading to the triggering of shallow landslides over wide areas. They can be empirical, exploiting past rainfall data and landslide inventories, or physically based, integrating slope physical–hydrological modeling and stability analyses. In this work, these two types of thresholds were compared, using data acquired in Oltrepò Pavese (Northern Italian Apennines), to evaluate their reliability. Empirical thresholds were reconstructed from rainfall and landslide triggering events collected from 2000 to 2018. The same rainfall events were implemented in a physically based model of a representative test site, considering different antecedent pore-water pressures, chosen according to the analysis of hydrological monitoring data. The thresholds were validated using an external dataset (August 1992–August 1997). Soil hydrological conditions have a primary role in predisposing or preventing slope failures. In the Oltrepò Pavese area, cold and wet months are the most susceptible periods, due to the persistence of saturated or close-to-saturation soil conditions. The lower the pore-water pressure at the beginning of an event, the higher the amount of rain required to trigger shallow failures. Physically based thresholds are more reliable than empirical thresholds in discriminating the events which could or could not trigger slope failures. The latter yield a significant number of false positives, because they neglect the antecedent soil hydrological conditions. These results represent a fundamental basis for the choice of the best thresholds to be implemented in a reliable early-warning system.
… respective thresholds are characterized by true negatives (TN) of 100 ± 0% and by false positives (FP) of 0 ± 0%, confirming the capability of these models to distinguish events able or unable to trigger shallow landslides. Moreover, the TRIGRS/0 threshold assessed well the conditions which could not trigger slope instabilities, as testified by a TN of 93 ± 1% and an FP of 7 ± 1%. Instead, the empirical threshold showed a lower ability to distinguish triggering from non-triggering events. Its TN was 76 ± 3%, while its FP was 24 ± 3%. In other words, this threshold overestimated the conditions able to trigger shallow landslides, classifying 24 ± 3% of real non-triggering events as able to cause shallow landsliding (false positives).


Introduction
Shallow landslides are slope instabilities of a mass of soil and/or debris, which can involve the most superficial colluvial layers down to around 2.0 m from ground level. Although they involve small volumes (10¹-10⁵ m³) of soil, they can be densely distributed across small catchments [1] and can affect slopes close to urbanized areas, causing significant damage to cultivations and infrastructure, and sometimes the loss of human lives [2].
Rainfall is generally the main triggering factor [3]. Rainfall features leading to shallow landslides and the consequent temporal probability of occurrence at regional scale are generally estimated by means of rainfall thresholds, defined for different geological, geomorphological, and environmental settings [4]. These thresholds represent the main tool to estimate the daily or hourly level of hazard across a territory prone to shallow landslides or to implement early-warning systems [5], representing the lower bound of rainfall conditions that caused the triggering of shallow landslides [3,6]. They are expressed as curves which separate the rainfall conditions leading to shallow slope failures from the ones where stability is maintained, sometimes with different associated probabilities of occurrence and with uncertainties related to the possible incompleteness of the input data required to define the thresholds [5,7]. The most widespread type of rainfall threshold is the empirical one. These thresholds are reconstructed through the statistical analysis of empirical distributions of rainfall conditions that presumably resulted in the triggering of shallow landslides in a particular test site [8]. Estimating these thresholds requires comparing a multi-temporal inventory of shallow-landslide events with rainfall parameters measured at several points of the study area (e.g., at rain gauges) during the same time span. Several authors proposed different methods for the estimation of empirical rainfall thresholds in different contexts all over the world [4,8–20]. In all cases, two different rainfall parameters were considered to build up boundary thresholds, namely cumulated event rainfall vs. rainfall duration or mean rainfall intensity vs. rainfall duration.
The use of only easily measurable rainfall data and the reconstruction based on the analysis of real past events, whether or not they triggered shallow landslides, make empirical thresholds a reliable tool to estimate the temporal probability of occurrence of shallow landslides at a large scale (catchment, regional, and national) [4,5]. However, their effectiveness is sometimes limited for different reasons. First, the shape of the thresholds is affected by the following: (i) the availability and quality of rainfall and landslide information across the analyzed study area [21,22]; and (ii) the correct definition of the real rainfall features responsible for slope failures during a particular triggering event, which is generally hampered by the lack of precise information about the moment of shallow landslide occurrence during that event [4,5]. Moreover, these types of thresholds do not take into account the unsaturated/saturated flow processes and the hydromechanical conditions of soils at the beginning of a particular rainfall event. The mechanical processes which lead to shallow slope failures are in fact related to rainwater flows and water accumulation in the subsurface, which cause an increase in pore-water pressure and a consequent decrease in soil shear strength [22–26].
To overcome these limitations, rainfall thresholds can be estimated by means of a physically based model that assesses the link between the rainfall features, the soil hydromechanical conditions before a rainfall event, and the shear strength response of the soils during rainwater infiltration. In this case, the deterministic model estimates the response of the typical geological-geomorphological frame prone to shallow landsliding to a particular rainfall event, defined by the same parameters that are generally used for the reconstruction of an empirical threshold (cumulated event rainfall vs. rainfall duration; mean rainfall intensity vs. rainfall duration). This response is represented by the trend in time of the slope safety factor (Fs) during the modeled event. Triggering conditions are then represented by the rainfall patterns which cause Fs to drop below 1 (unstable conditions). Instead, if Fs stays higher than 1, shallow failures are not modeled (stable conditions). Some attempts were made to build up reliable physically based thresholds in some areas prone to shallow landslides, such as Italian alpine catchments [26], catchments of the Central Italian Apennines [27], hilly catchments of Southern Italy [20,28–31], western hilly and mountainous settings of the United States [32], and Chinese areas susceptible to shallow landsliding [33,34].
The main limitations of physically based thresholds are related to the most important disadvantages of the deterministic methods [35]: (i) they require a significant amount of geotechnical, mechanical, and hydrological parameters for model simulation; and (ii) they require the reconstruction of boundary conditions which best represent the real soil and slope behaviors. Integration of meteorological measurements (e.g., rainfall) and hydrological soil parameters (e.g., pore-water pressure and water content) could help in obtaining a better insight into the quantitative effects of antecedent soil conditions on the triggering mechanism of shallow landslides. Thus, field monitoring allows us to improve the calibration of the physically based models used to reconstruct rainfall thresholds [23,32–34].
This paper aims to reconstruct and compare empirical and physically based rainfall thresholds for the occurrence of shallow landslides in a susceptible area of the Northern Italian Apennines (Figure 1). The main objectives of this work can be summarized as follows: (i) assessing empirical thresholds through the analysis of time series of rainfall data and of shallow-landslide inventories for the identification of the triggering and non-triggering events; (ii) calibrating a physically based model by comparing monitored and simulated soil hydrological parameters at a test-site slope, which can be assumed to be representative of the typical geological, geomorphological, and environmental settings prone to shallow landsliding in the study area; (iii) assessing physically based thresholds through the application of the calibrated deterministic model to the representative test site for different rainfall events; (iv) comparing the two types of estimated thresholds and verifying their predictive capabilities through different inventories of occurred shallow landslides not used for the threshold reconstruction. The considered rainfall events corresponded to the ones that occurred in the 2000-2018 period and to other synthetic rainfalls characterized by strong average intensities and limited durations, which are not typical of the current climate of Oltrepò Pavese. However, their probability of occurrence may increase in the future due to the effects of climate change, which could cause an increase in very intense and short-duration rainfalls in Northern Italy, where the study area is located [36,37].

The Study Area
The study area is the hilly sector of Oltrepò Pavese (265 km² wide, Figure 1), which corresponds to the northern termination of the Italian Apennines. It is characterized by a complex geological and geomorphological setting [38–40] (Figure 1c). In the northern part of the area, the bedrock is composed of sandstones and conglomerates overlying marls and evaporitic deposits. In this sector, superficial soils, derived from bedrock weathering, are mostly clayey or clayey-sandy silts. Their thickness, measured in micro-boreholes and in trenches, ranges between a few tens of centimeters and 2 m. Hillslopes are steep, with an average slope angle between 15° and 20° and maximum values up to 35°. The central and southern parts of the study area are instead characterized by calcareous and marly flysches, alternating with sandstones, marls, and mélanges with a peculiar block-in-matrix fabric at the outcrop scale. Due to the different lithology of the bedrock, superficial soils have a clayey or a silty clayey texture. Their thickness generally exceeds 1 m, mostly ranging between 1.5 and 2 m from ground level, as measured in micro-boreholes and in trenches. Hillslopes have a medium steepness, with a typical slope angle of 8°-15°.
The slope elevation ranges between 60 and 500 m a.s.l. According to Köppen's classification, the climatic regime of the Oltrepò Pavese area is temperate/mesothermal, with a mean yearly temperature of 12 °C and an average yearly rainfall amount between 700 and 1000 mm, increasing from western to eastern sectors and from northern to southern sectors.
The area is highly prone to shallow landsliding [24,39]. Several triggering events have occurred in Oltrepò Pavese since the 1970s [38–40]. In the last 10 years, more than 2500 shallow landslides (Figure 1b) occurred in this area as a consequence of several rainfall triggering events during the winter and spring months. Most of the shallow landslides are classified as complex phenomena, starting as roto-translational slides and evolving into flows [41]. They are generally 10-70 m wide and 10-500 m long. Sliding surfaces are generally located at 1 m in depth [24]. Rainfall-induced shallow landslides affect medium-steep and steep slopes, with a slope angle of at least 8°-10°.
An integrated hydrometeorological monitoring station was installed on 27 March 2012 in a test-site slope located near the village of Montuè (red circle in Figure 1), in the northeastern part of Oltrepò Pavese. This test site (Figure 2) is representative of the typical geological and geomorphological settings of the Oltrepò Pavese areas most prone to shallow landslides for the following reasons: (i) the presence of triggering zones of past shallow landslides; (ii) its position in areas with medium-high susceptibility to shallow landslides according to previous studies [1,24]; and (iii) the typical geomorphological (hillslopes with medium-high thickness) and lithological features (clayey and silty soils) of the sectors most prone to shallow landsliding in the study area. In this station, rainfall amounts are measured through a rain gauge with an accuracy of 0.1 mm. Soil water content is measured by means of Time Domain Reflectometer (TDR) probes, with an accuracy of 0.01-0.02 m³/m³. Soil pore-water pressure is measured through a combination of tensiometers, with an accuracy of 1.5-2.0 kPa and measuring values higher than −100 kPa, and heat dissipation (HD) sensors, with an accuracy of 1.5-2.0 kPa and a measurement range down to −10⁵ kPa.

Reconstruction of the Empirical Thresholds
Empirical rainfall thresholds were reconstructed by using the CTRL-T tool, written in the open-source R language and freely available at: http://geomorphology.irpi.cnr.it/tools/rainfall-eventsand-landslides-threshold. A detailed description of the algorithm is reported by Melillo et al. [8]. Figure 3 illustrates the logical framework of this method to assess the empirical rainfall threshold for a set of rain gauges and a multi-temporal shallow landslide inventory.

1. Identification of distinct rainfall events along the hourly time series of each rain gauge. Distinct events are separated by dry periods, i.e., significant time spans without rain, whose length depends on the climatic features of the area. The length of a dry period separating two distinct events depends on the time required for the soil to dry out and on the season, namely a cold period with low temperatures and limited evapotranspiration (Cc) and a warm one with high temperatures and significant evapotranspiration (Cw). The length in months of Cc and Cw was calculated for the study area following the procedure described in Melillo et al. [8], based on the application of the monthly soil water balance (MSWB) model [42]. The average monthly potential evapotranspiration PET (Figure 4a) of the study area was estimated from the data acquired from 2000 to 2018 at the meteorological stations of the study area (Figure 1b). The average monthly real evapotranspiration RET (Figure 4a) was then estimated, considering a maximum field capacity of 208 mm/m, which is typical of the soil types (clayey and silty soils) and of the land uses (shrubs, woods, and grapevines) of the study area. For each month, RET was divided by PET, obtaining the monthly aridity index AI [43]. Cw was then the period when the soil exhibits a water deficit (RET<PET, AI<1), from May to September (Figure 4b). Conversely, Cc was the period when RET>PET and AI>1, from October to April (Figure 4b). The total amount of RET for the Cw period was then divided by the total amount of RET for the Cc period, obtaining an R index equal to 2.1. R is defined as the factor of difference between the length of the dry periods (i.e., the time span between two different rainfall events) in the Cw and Cc periods. The dry intervals used for the definition of the rainfall events were then the following (Table 1): (i) for the definition of isolated rainfall events, the dry period P1 was 3 and 6 h in Cw and Cc, respectively; (ii) for the definition of the sub-events, the dry period P2 was 6 and 12 h in Cw and Cc, respectively; (iii) for the definition of a rainfall event, the dry period P4 was 24 and 48 h in Cw and Cc, respectively. According to Melillo et al. [8,44], irrelevant rainfall sub-events (P3) with a cumulated amount less than or equal to 1 mm had to be excluded in the calculation of the final events.

2. Linking rainfall data to shallow landslide events. For each shallow failure, the related rain gauge was selected within a circular buffer with a radius Rad of 10 km centered on the landslide location. This radius was chosen according to the morphology of the study area (no significant variations in slope height, which could influence rainfall amount) and to the density of rain gauges in the study area (an average of one gauge per 13 km²).

3. Estimation of rainfall conditions leading to shallow landslide triggering. For each event in the inventory, the algorithm estimates possible rainfall conditions (in terms of duration and cumulated rainfall amount) leading to slope failure. This allows us to account for a possible inaccuracy in the estimation of the rain features triggering a landslide due to the distance between the slope failure and the related rain gauge. A weight, W, was assigned according to the inverse square distance between the rain gauge and the landslide (d⁻²), the cumulated rainfall amount (E), and the rainfall mean intensity (I) (Equation (1)):
$$W = d^{-2} \cdot E \cdot I \qquad (1)$$
Furthermore, a parameter, k, assumed equal to 0.84, allowed us to take into account the antecedent soil moisture condition depending on the amount of rain fallen in the previous days.

4. Reconstruction of the rainfall threshold, based only on events triggering shallow landslides. Moreover, for each event, only the rainfall condition with the highest W value was selected. The threshold is defined as a power law curve which relates the cumulated rainfall amount (E) and the duration (D) of the events (Equation (2)):
$$E = (\alpha \pm \Delta\alpha) \cdot D^{(\omega \pm \Delta\omega)} \qquad (2)$$
where α is the intercept of the curve; ω is the slope of the power law curve; and Δα and Δω are the uncertainties of α and ω, respectively. The threshold was defined by means of a frequentist method for reconstructing a 5% exceedance probability threshold, according to Brunetti et al. [13]. The fitting parameters of the curve and the related uncertainties were estimated through the calculation of thresholds for 5000 synthetic series of rainfall events. These series contained the same number of rainfall events that triggered landslides, but selected randomly with replacement, according to a bootstrap technique [45]. Analysis of these series allowed us to estimate the final threshold, which had α and ω corresponding to the mean values of the different bootstrap thresholds, with their respective uncertainties (Δα and Δω).
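The bootstrap estimation described in step 4 can be sketched as follows. This is a minimal, simplified Python illustration and not the CTRL-T implementation (which is written in R): it assumes that the 5% exceedance threshold can be approximated by shifting the least-squares fit in log-log space downward by 1.645 standard deviations of the residuals (a Gaussian assumption), and the function names `fit_threshold` and `bootstrap_threshold` are hypothetical.

```python
import math
import random
import statistics


def fit_threshold(durations, rainfalls, z=1.645):
    """Fit E = alpha * D**omega in log-log space and return (alpha_5, omega).

    Assumption: residuals are Gaussian, so the 5% exceedance curve is the
    least-squares line lowered by z = 1.645 residual standard deviations.
    """
    x = [math.log10(d) for d in durations]
    y = [math.log10(e) for e in rainfalls]
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    omega = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    log_alpha50 = my - omega * mx
    residuals = [yi - (log_alpha50 + omega * xi) for xi, yi in zip(x, y)]
    sigma = statistics.stdev(residuals)
    return 10 ** (log_alpha50 - z * sigma), omega


def bootstrap_threshold(durations, rainfalls, n_boot=5000, seed=0):
    """Return (alpha, delta_alpha, omega, delta_omega) from bootstrap resampling.

    Each synthetic series has the same size as the input and is drawn
    randomly with replacement; the final parameters are the means of the
    bootstrap fits and the uncertainties are their standard deviations.
    """
    rng = random.Random(seed)
    n = len(durations)
    alphas, omegas = [], []
    while len(alphas) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        d = [durations[i] for i in idx]
        if len(set(d)) < 2:  # degenerate resample: slope is undefined
            continue
        alpha, omega = fit_threshold(d, [rainfalls[i] for i in idx])
        alphas.append(alpha)
        omegas.append(omega)
    return (statistics.fmean(alphas), statistics.stdev(alphas),
            statistics.fmean(omegas), statistics.stdev(omegas))
```

Calling `bootstrap_threshold(D, E)` with the durations (h) and cumulated rainfalls (mm) of the triggering events returns the four parameters of the power law; the default of 5000 resamples follows the number quoted in the text.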
Water 2020, 12, x FOR PEER REVIEW
Figure 3. Flowchart of the methodology adopted for the reconstruction of the empirical thresholds.
The empirical threshold for the study area was assessed through this procedure, using hourly rainfall measurements collected from January 2000 to December 2018 by a network of 19 rain gauges (blue circles in Figure 1), with a resolution (Gs) of 0.1 mm. The shallow landslide inventory for the same time span grouped the spatial and temporal information of 143 triggering events. The spatial resolution of these events was about 1 km². For 44% of the events, the exact triggering hour was known, while for the remaining 56%, only the part of the day (generally, a 6 h interval) when slope failure occurred was identified. Within the landslide inventory, 30 events (11% of the inventory) were located by using information related to field surveys, 155 events (55% of the inventory) by means of aerial or satellite images [1,24], and 96 events (34% of the inventory) from newspapers and online chronicles.
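The separation of an hourly rain-gauge series into distinct rainfall events (step 1 of the procedure) can be illustrated with a short Python sketch. It is a deliberate simplification of CTRL-T, not a port of it: only the final dry period P4 (24 h in the warm period Cw, May to September, and 48 h in the cold period Cc) is applied, the sub-event rules P1-P3 are ignored, and the season is taken, as an assumption, from the last wet hour before the gap. The function names `split_events` and `summarize` are hypothetical.

```python
from datetime import datetime, timedelta

WARM_MONTHS = {5, 6, 7, 8, 9}  # Cw: May to September, as stated in the text


def dry_gap_hours(timestamp):
    """P4 dry period (h) separating two events: 24 h in Cw, 48 h in Cc."""
    return 24 if timestamp.month in WARM_MONTHS else 48


def summarize(event):
    """Return start, end, duration D (h) and cumulated rainfall E (mm)."""
    start, end = event[0][0], event[-1][0]
    duration_h = int((end - start) / timedelta(hours=1)) + 1
    total_mm = sum(mm for _, mm in event)
    return {"start": start, "end": end, "D_h": duration_h,
            "E_mm": round(total_mm, 1)}


def split_events(series, min_rain=0.1):
    """Split chronological hourly (datetime, mm) records into rainfall events.

    Hours below min_rain (the 0.1 mm gauge resolution) are treated as dry.
    A new event starts when the number of dry hours since the last wet hour
    reaches the seasonal P4 value.
    """
    events, current = [], []
    for timestamp, mm in series:
        if mm < min_rain:
            continue
        if current:
            last_wet = current[-1][0]
            dry_hours = (timestamp - last_wet) / timedelta(hours=1) - 1
            if dry_hours >= dry_gap_hours(last_wet):
                events.append(current)
                current = []
        current.append((timestamp, mm))
    if current:
        events.append(current)
    return [summarize(event) for event in events]
```

With this rule, the same 30 h dry spell splits a July series into two events but leaves a January series as a single event, which mirrors the longer drying time assumed for the cold period.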

Reconstruction of the Physically Based Thresholds
The adopted procedure for the reconstruction of the physically based thresholds is composed of a series of consecutive steps (Figure 5):

1. Identification of the representative test site. Montuè was chosen as the test site because it exhibits the typical geological and geomorphological settings prone to shallow landsliding in the study area (Figure 2a). The typical soil profile is shown in Figure 2b and described in detail in Bordoni et al. [24]. Test-site soils are low-plasticity clayey-sandy silts with a thickness mostly between 0.5 and 1.5 m. From the ground surface down to 0.7 m, the upper soil layer (US) is characterized by a high carbonate content (15%), as soft concretions, and a unit weight in the order of 16.7-17.0 kN/m³. Below this level, the lower soil layer (LS), from 0.7 to 1.1 m from ground level, has a carbonate content similar to that of the US but a higher unit weight, around 18.6 kN/m³. Between 1.1 and 1.3 m from ground level, there is a layer (CAL) characterized by a significant increase in carbonate content, up to 35.3%. The weathered bedrock (WB), which is composed of sand and poorly cemented conglomerates, is at 1.3 m from the ground surface. US and LS are characterized by similar friction angles, equal to 31° and 33°, respectively, and by nil effective cohesion. Instead, the CAL horizon has a lower friction angle (26°) but a higher effective cohesion (29 kPa). Saturated hydraulic conductivity (Ks) is in the order of 10⁻⁵ m/s, except for the CAL level, which is characterized by a saturated hydraulic conductivity of 10⁻⁷ m/s. Soil water characteristic curves (SWCCs) of the soil layers, fitted through Van Genuchten's [46] model, are quite similar [47], with wetting paths having saturated (θs) and residual (θr) water contents of 0.37-0.40 … In the test-site slope, an integrated meteorological and hydrological monitoring station has been operating since 27 March 2012 [24]. The station measures meteorological parameters (rainfall depth, air temperature and humidity, air pressure, net solar radiation, wind speed and direction), together with soil pore-water pressure, at depths of 0.2, 0.6, and 1.2 m from ground level, and soil water content, at depths of 0.2, 0.4, 0.6, 1.0, 1.2, and 1.4 m from ground level. Further details on this monitoring station are reported in Bordoni et al. [24,47].
2. Physically based model, used to simulate the hydrological and mechanical responses of the slope to different rainfall events. The well-established physically based methodology named TRIGRS [48] was used. It has already been applied successfully for modeling the occurrence of these phenomena [1,49–54]. This physically based model considers the method outlined by Srivastava and Yeh [55] and Iverson [56] to explain shallow landslide triggering in relation to rainwater infiltration in both saturated and unsaturated soil conditions, assuming an impermeable basal boundary at a finite depth and a simple runoff-routing scheme. The model assesses the pore-water pressure and the slope safety factor (Fs) at different moments of a rainfall event, as a result of rainwater infiltration.
A 5 × 5 m digital elevation model (DEM), derived from LiDAR data acquired in 2008 and 2010 by the Italian Ministry for Environment, Land, and Sea, provided the topographic basis for the study area and was used to derive the slope angle and the flow direction maps required by the model. The hydrological parameters required by the hydrological model of TRIGRS were Ks, θs, θr, and the ξ parameter, which represents the fitting parameter of Gardner's [57] SWCC equation. ξ was estimated from the λ and n fitting parameters of Van Genuchten's model through the method proposed by Ghezzehei et al. [58] (Equation (3)):
$$\xi = 1.3\,\lambda\,n \qquad (3)$$
The hydrological boundary condition of the model corresponded to the presence of a low-permeability soil level, which limits the infiltration of rainwater and causes the rise of a perched water table during the most intense rainfall events. As demonstrated by Bordoni et al. [24], this can be assumed to be the main triggering mechanism of shallow failures in the study area. This water table developed at the CAL layer, due to its lower permeability than the overlying soil levels, at about 1.1-1.2 m from ground level. The perched water table could rise up to 0.8-1.0 m from ground level, in the LS layer, during the most intense rainfall events. TRIGRS modeled the water table depth in the soil profile and the corresponding pore-water pressure profiles during a rainfall event, considering also the water table depth at the beginning of a modeled event, which was estimated from the most superficial measured pore-water pressure (in the US soil layer; ρUS) [59] (Equation (4)), where γw is the water unit weight (9.8 kN/m³), and β is the slope angle.
In the TRIGRS model, an infinite slope stability analysis is coupled with the hydrological model to compute Fs at different time instants, points, and depths in the analyzed area (Fs(z, t)), considering its change over time during a rainfall event due to the change in pore-water pressure ρ(z, t) (Equation (5)):
$$Fs(z,t) = \frac{\tan\varphi'}{\tan\beta} + \frac{c' - \rho(z,t)\,\tan\varphi'}{\gamma\, z\, \sin\beta\, \cos\beta} \qquad (5)$$
where φ' is the soil shear strength angle, c' is the effective cohesion, γ is the soil unit weight, and z is the soil depth.
3. Table 2 lists the soil hydrological and geotechnical values of the TRIGRS input parameters. Since the US and LS layers had similar values of the different required parameters (Ks, θs, θr, ξ, φ', c', and γ), a uniform soil profile was assumed, and a unique set of input values was then inserted in TRIGRS. In regard to the parameters of the SWCCs (θs, θr, ξ), the values of the wetting path were taken into account, consistent with the modeling of the rainwater infiltration process [22,24]. The sliding surface depth (z) was assumed equal to 1.0 m, according to the typical depths observed in past shallow failures.

4.
Corroboration of the physically based model. The modeled pore-water pressures at 1.2 m from ground level, a depth very close to the typical observed sliding surface, were compared with the measured values for different rainfall events. The reliability of the model fit was evaluated with the root mean square error (RMSE) statistical index (Equation (6)):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n_{tot}} \sum_{i=1}^{n_{tot}} \left(\rho_{o,i} - \rho_{m,i}\right)^{2}} \qquad (6)$$
where ρ_o,i is the observed pore-water pressure at the ith hour of the considered rainfall event, ρ_m,i is the pore-water pressure estimated by the model at the same ith hour of the same event, and n_tot is the number of observations, which corresponds to the duration of the rainfall event (in hours).
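The RMSE computation can be sketched as follows, assuming hourly observed and modeled pore-water pressures are available as two equally long sequences. The function name and the sample values are hypothetical and serve only to show the arithmetic.

```python
import math

def rmse(observed, modeled):
    """Root mean square error between observed and modeled series (same units)."""
    if len(observed) != len(modeled) or len(observed) == 0:
        raise ValueError("series must be non-empty and of equal length")
    n_tot = len(observed)  # number of hourly observations in the event
    squared_errors = ((o - m) ** 2 for o, m in zip(observed, modeled))
    return math.sqrt(sum(squared_errors) / n_tot)

# Hypothetical hourly pore-water pressures (kPa) for a 4-hour event.
observed = [-10.0, -8.0, -5.0, -2.0]
modeled = [-9.0, -8.0, -6.0, -2.0]
print(f"RMSE = {rmse(observed, modeled):.3f} kPa")
```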

5.
Modeling slope safety factor (Fs) for different rainfall events. Once both the implementation and validation had been completed, the physically based model was used to estimate Fs of the test site for rainfall scenarios corresponding to those identified by the CTRL-T algorithm during the reconstruction of rainfall events. Synthetic rainfalls with average intensities of 50, 75, and 100 mm/h and durations between 1 and 12 h were also modeled. A modeled rainfall event was considered a triggering event for shallow landslides if Fs dropped below 1 (unstable conditions) in the sectors of the test site where shallow landslide source areas typically developed in the past, namely the sectors with a slope angle higher than 25°. If Fs stayed higher than 1 (stable conditions) across the whole test site, the rainfall event was not considered a triggering event. Each event was modeled with different initial pore-water pressures representative of the typical antecedent conditions before landslide triggering, in particular at the depth where sliding surfaces typically developed in the test site (1.0 m).

6.
Reconstruction of the rainfall thresholds. The method used for the reconstruction of the physically based thresholds was the same as that applied for the empirical ones. In this case, only rainfall scenarios leading to shallow-landslide triggering according to the model were considered. As for the empirical thresholds, the physically based ones had fitting parameters corresponding to the mean values of the different bootstrap thresholds, with their respective uncertainties. Different rainfall thresholds could be reconstructed, according to the different initial pore-water pressure conditions used in modeling the rainfall events.

From the geomorphological point of view, the test site has steep slopes (steepness higher than 15° and mostly between 26° and 35°) and is east-facing. The slope elevation ranges from 170 to 210 m a.s.l. The land use is mainly grass and shrubs, passing to a woodland of black robust trees at the bottom of the slope. Shallow landslides occurred on this slope in response to two events: (i) on 27 and 28 April 2009, as a consequence of an extreme rainfall event characterized by 160 mm of cumulated rain in 62 h; (ii) between 28 February and 2 March 2014, as a consequence of an event of 68.9 mm in 42 h. The source areas of these shallow landslides have a slope angle higher than 25°, especially between 30° and 35°. The failure surfaces are located at a depth of around 1.0-1.2 m from ground level.
An integrated meteorological and hydrological monitoring station was installed in the test-site slope on 27 March 2012 and is still functioning [24]. The station measures meteorological parameters (rainfall depth, air temperature and humidity, air pressure, net solar radiation, wind speed and direction), together with soil pore-water pressure, at depths of 0.2, 0.6, and 1.

Flowchart of the methodology adopted for the reconstruction of the physically based thresholds.

Corroboration of the Reconstructed Threshold
Empirical and physically based thresholds estimated for the Oltrepò Pavese area were compared in order to evaluate the predictive capability of both models, as well as their advantages and limitations. Their validation was carried out with a dataset of events that took place between August 1992 and August 1997. For this time span, rainfall data were collected at 3 rain gauges (black circles in Figure 1), while the location and triggering time of shallow landslides were taken from local newspapers and fire brigade reports.
The CTRL-T tool was also used to reconstruct the rainfall events for this dataset, with the same input parameters listed in Table 1. For the empirical thresholds, the position of each event on the graph, below or above the defined thresholds, was evaluated. For the physically based thresholds, as was done for their definition, the identified rainfall conditions were used as input to the TRIGRS model to assess slope Fs at the representative test site. Each event was thus classified as able or not able to trigger shallow landslides based on the Fs values in the sectors typically affected by shallow landslides. Then, its position on the graph, below or above the defined physically based thresholds, was evaluated.
Validating the physically based thresholds requires the pore-water pressure at the depth of the observed sliding surfaces (1.0 m). For this reason, time series of pore-water pressure at 1.0 m were modeled with the HYDRUS-1D [60] software. This model was chosen because it can reliably simulate long time series of soil hydrological parameters influenced by alternating dry and rainy periods [60]. HYDRUS-1D was implemented for each of the 3 meteorological stations included in the validation dataset. Soil hydrological properties (Table 2) and physical and hydrological boundary conditions corresponded to those used for the application of the TRIGRS model. The meteorological inputs required by the model were rainfall amounts and air temperatures, the latter being used to model evapotranspiration through Hargreaves et al.'s [61] equation. The modeled time spans started from a markedly dry period of the year, corresponding to 1 August 1992. In Oltrepò Pavese, early August is characterized every year by dry soil conditions, which are steady along depth, due to low rainfall and high evapotranspiration rates in the previous summer months (June-July). In particular, an initial pore-water pressure of −993 kPa was assumed, according to the field measurements reported in Bordoni et al. [24,47]. This modeling approach has already been used to estimate time series of soil hydrological parameters in other Italian regions prone to shallow landslides [53,62], where it gave a good estimation of the initial pore-water pressure conditions of a triggering event.
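As a rough illustration of a temperature-based evapotranspiration estimate of this kind, the sketch below uses the widely cited 1985 Hargreaves-Samani form, ET0 = 0.0023 (T_mean + 17.8) (T_max − T_min)^0.5 R_a. This is an assumption: the text only cites Hargreaves et al. [61], and the variant implemented in the study may differ. The function name and the daily inputs are hypothetical, and R_a (extraterrestrial radiation, expressed as equivalent evaporation in mm/day) is taken as a given input rather than computed from latitude and date.

```python
import math

def hargreaves_et0(t_mean_c, t_max_c, t_min_c, ra_mm_day):
    """Reference evapotranspiration (mm/day), 1985 Hargreaves-Samani form.

    ra_mm_day is the extraterrestrial radiation expressed as equivalent
    evaporation (mm/day); temperatures are in degrees Celsius.
    """
    # Guard against inconsistent input where t_max < t_min.
    t_range = max(t_max_c - t_min_c, 0.0)
    return 0.0023 * (t_mean_c + 17.8) * math.sqrt(t_range) * ra_mm_day

# Hypothetical daily inputs (not measured values from the study area).
summer = hargreaves_et0(t_mean_c=24.0, t_max_c=30.0, t_min_c=18.0, ra_mm_day=16.5)
winter = hargreaves_et0(t_mean_c=2.0, t_max_c=5.0, t_min_c=-1.0, ra_mm_day=4.0)
print(f"summer ET0 = {summer:.2f} mm/day, winter ET0 = {winter:.2f} mm/day")
```

Under these placeholder inputs the summer estimate is roughly an order of magnitude larger than the winter one, consistent with the seasonal contrast that drives soil drying in warm months.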
Statistical indexes were then calculated to evaluate the predictive capabilities of both types of thresholds for the validation dataset. Considering a rainfall threshold as a binary classifier of rainfall conditions leading to shallow landslides, its performance can be assessed by computing a 2 × 2 a posteriori contingency table [15]. Each rainfall event is classified as positive (rainfall conditions above the threshold) or negative (rainfall conditions below the threshold), and the classification is true or false depending on whether it agrees with the observed occurrence or nonoccurrence of shallow landslides. Accordingly, the following indexes can be defined [17,63]: true positives (TP), i.e., rainfall conditions exceeding the threshold and causing shallow landslides; false positives (FP), i.e., rainfall conditions exceeding the threshold but without real triggering of shallow landslides; true negatives (TN), i.e., rainfall conditions below the threshold and without shallow landslide occurrence; false negatives (FN), i.e., rainfall conditions below the threshold but causing shallow landslides.
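The four indexes can be computed as in the sketch below. It assumes that TP and FN are expressed as percentages of the triggering events and TN and FP as percentages of the non-triggering events, which is how the percentages reported for the validation appear to be normalized (each pair sums to 100%). The function name and the event list are hypothetical.

```python
def contingency_rates(events):
    """Compute TP, FN, TN, FP percentages for a rainfall threshold.

    events: iterable of (above_threshold, landslide_observed) boolean pairs.
    TP and FN are normalized by the number of triggering events, TN and FP
    by the number of non-triggering events; a pair is None if its
    denominator is zero.
    """
    tp = fp = tn = fn = 0
    for above, landslide in events:
        if above and landslide:
            tp += 1
        elif above:
            fp += 1
        elif landslide:
            fn += 1
        else:
            tn += 1
    triggering = tp + fn
    non_triggering = tn + fp
    return {
        "TP": 100.0 * tp / triggering if triggering else None,
        "FN": 100.0 * fn / triggering if triggering else None,
        "TN": 100.0 * tn / non_triggering if non_triggering else None,
        "FP": 100.0 * fp / non_triggering if non_triggering else None,
    }

# Hypothetical dataset: 20 triggering events (19 above the threshold) and
# 100 non-triggering events (24 above the threshold).
events = (
    [(True, True)] * 19 + [(False, True)] * 1
    + [(True, False)] * 24 + [(False, False)] * 76
)
print(contingency_rates(events))
```

Returning `None` when a class is empty mirrors the situation in which no triggering events exist for a given initial condition, so that TP and FN cannot be calculated.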

Reconstructed Rainfall Events
A total of 5231 rainfall events were identified by exploiting the database for the period of 2000-2018. Seventy-two percent of the events were detected in the cold season, while only 28% of the events were identified in warm periods, confirming the higher aridity index of warm months calculated for the study area (Figure 4b). In cold periods, the duration of the events ranged between 1 and 280 h, while cumulated amounts ranged between 1.0 and 285.0 mm. About 80% of the events were characterized by a duration lower than 50 h, and cumulated amounts were lower than 50 mm (Figure 6a,b). Instead, in warm periods, duration and cumulated amounts of the events ranged between 1 and 67 h and between 1.0 and 155.2 mm, respectively. Ninety-five percent of the events had a duration lower than 30 h and cumulated amounts lower than 50 mm (Figure 6a,b).
The intensity of the events was classified according to Alpert et al.'s [64] classification. According to the duration and cumulated amount of rainfalls, 87% of the events were classified as light (A), light-moderate (B) or moderate-heavy (C1) (Figure 6c). These events represent the typical rainfalls due to regional frontal systems, which characterize cold months in Northern Italy. Only 13% of the events were heavy (C2) or heavy-torrential (D1) (Figure 6c). Similar results were obtained for warm months, when light (A) and light-moderate (B) were more abundant (Figure 6c). These rainfalls are caused by local convective storms, which are typical of the warm months in Northern Italy. In warm periods, more intense rainfalls were less probable (6% of the total events). Triggering events of shallow landslides occurred more frequently during cold months (Figure 7). Eighty-four percent of the events occurred between January and April and between October and December, with the highest amount in March (20%). Only 16% of landslide triggering events occurred in warm months, between May and September, with a higher percentage in August (10%).
In cold months, 93% of the total triggering events were light-moderate (B), moderate-heavy (C1), or heavy (C2) rainfalls, with duration between 4 and 105 h and cumulated amount between 30.4 and 133.7 mm (Figure 6d-f). Instead, in warm months, triggering rainfalls were mostly (87%) heavy-torrential (D1) or torrential (D2) rainfalls, characterized by duration between 5 and 15 h and cumulated amounts between 106.8 and 155.2 mm (Figure 6d-f).

Pore-Water Pressure Distribution at Sliding Surface Depth
The characterization of pore-water pressure distribution during different seasons at the typical depth of the sliding surface was based on monitoring data acquired at 1.2 m from ground level, in the test-site slope, in the period March 2012-December 2018. On average, the soil horizon remained unsaturated throughout the year, reaching minimum pore-water pressures in warm summer months (down to −993 kPa), when strong evapotranspiration was not compensated by the limited rainfall. During cold months, the soil horizon re-wetted due to greater rainwater infiltration and lower evapotranspiration. In these time spans, pore-water pressure rose to values close to 0 kPa, indicating conditions close to saturation in this soil level. In several periods of the cold season, pore-water pressure reached positive values (up to 12 kPa), especially when several rainfall events were separated by short dry periods.
The distribution of the measured pore-water pressure values (Table 3 and Figure 8) is approximately Gaussian, as indicated by a skewness very close to 0 and by the Shapiro-Wilk test, whose W_S-W statistic did not allow us to reject the null hypothesis of gaussianity at the 95% confidence level (W_S-W = 0.92; p-value = 0.06). The first quartile of this distribution was −846 kPa, while the third was −20 kPa.
Monitoring data provided information on triggering events that occurred in cold periods. Bordoni et al. [24] showed that during the observed event of 28 February-2 March 2014 at the test-site slope, pore-water pressure was about 0 kPa at the beginning of the rainfall event that triggered the shallow landslide. This information alone is not enough to characterize exhaustively the antecedent hydrological conditions immediately before a landslide-triggering rainfall in the study area. To analyze a wider range of soil hydrological conditions leading to shallow landslide triggering, and to test the effect of initial pore-water pressure on the definition of a threshold, physically based thresholds were estimated by modeling the soil response to different rainfall events, starting from an initial pore-water pressure of −20, −10, or 0 kPa. For simplicity, they are named TRIGRS/-20, TRIGRS/-10, and TRIGRS/0, respectively. In this way, a significant share of the typical pore-water pressure values at depths of 1.0-1.2 m was covered in the definition of the physically based thresholds, since the third quartile of the measured values was equal to −20 kPa.

Comparison between Measured and Estimated Pore-Water Pressure at Sliding Surface Depth
Figure 9 compares measured and modeled pore-water pressure trends at the typical sliding surface depth (1.0-1.2 m from ground level) in the representative test-site slope for the rainfall events reported in Table 4. The selected events represented typical rainfall scenarios occurring in the study area during the analyzed time span and were characterized by initial pore-water pressure conditions similar to those chosen for the reconstruction of the physically based thresholds.
Despite the different features of the tested events, the pore-water pressure trend modeled through the physically based method (TRIGRS model) reproduced the field measurements reliably during each analyzed rainfall event. Differences between measured and estimated values were always lower than 2 kPa at the analyzed soil depth, and RMSE values of 0.1-1.2 kPa confirmed the reliability of these simulations. At the end of each rainfall event, the modeled pore-water pressure was generally higher than the measured one, except for the event of 28 February-2 March 2014. Although the model results were in very good agreement with the measured values, modeling errors in pore-water pressure trends could be linked to the simplified representation of soil hydrological features in the TRIGRS model. In particular, TRIGRS does not consider a layered soil profile, which forces the use of average values of the required soil parameters across the analyzed soil profile.

Reconstruction of Empirical and Physically Based Thresholds
Rainfall thresholds reconstructed with different methodologies are shown in Figure 10. All of these functions were characterized by a low uncertainty of the two fitting parameters (0.2-1.9 for α, 0.01-0.04 for ω). However, the equations of the reconstructed thresholds differed markedly from each other. Average values of the α parameter ranged between 11.2 and 225.0, while mean values of the ω parameter ranged between 0.08 and 0.30. The empirical threshold and the physically based threshold with an initial pore-water pressure of −20 kPa (TRIGRS/-20) were steeper than the other two functions, as shown by significantly higher values of the ω parameter (0.25-0.30 against 0.08-0.12, respectively). The empirical threshold had the lowest value of the intercept α (11.2 ± 0.2). Among the physically based thresholds, the higher the initial pore-water pressure used to reconstruct the threshold, the lower the value of α. The α parameter of TRIGRS/0 was about 5 times and 10 times lower than the values for the thresholds TRIGRS/-10 and TRIGRS/-20, respectively.
The practical effects of these differences are clearer when the cumulated amount of rain able to trigger shallow landslides is calculated for different rainfall durations (between 10 and 50 h), based on the defined thresholds (Table 5). For the same duration, the amount of rainfall able to trigger shallow landslides was lower for the empirical threshold than for the physically based thresholds. Using the TRIGRS/0 threshold, the critical cumulated rainfall increased by 5.3-10.5 mm with respect to the empirical threshold, for the same rainfall duration. For the other physically based thresholds, the increase of the critical cumulated amount was more significant. Considering the TRIGRS/-20 threshold, the critical cumulated rain was about 22-25 times higher than that defined by the empirical threshold, for the same duration. Considering the TRIGRS/-10 threshold, the rainfall required to trigger shallow landslides was about 6-9 times higher than that defined by the empirical threshold, for the same duration.
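The calculation behind such a comparison can be sketched as follows, assuming the thresholds are power laws of the form E = α·D^ω, with E the cumulated rainfall (mm) and D the duration (h), so that α acts as the intercept and ω as the slope in log-log space. The function names are hypothetical, and the α and ω values are round placeholders chosen for easy checking, not the fitted parameters of any threshold in this study.

```python
def critical_rainfall(duration_h, alpha, omega):
    """Cumulated rainfall (mm) on the threshold curve E = alpha * D**omega."""
    if duration_h <= 0:
        raise ValueError("duration must be positive")
    return alpha * duration_h ** omega

def exceeds_threshold(cumulated_mm, duration_h, alpha, omega):
    """True if a rainfall event plots on or above the threshold curve."""
    return cumulated_mm >= critical_rainfall(duration_h, alpha, omega)

# Placeholder parameters (NOT fitted values from the study).
ALPHA, OMEGA = 10.0, 0.5

for duration in (4, 16, 25):
    e_crit = critical_rainfall(duration, ALPHA, OMEGA)
    print(f"D = {duration:2d} h -> critical E = {e_crit:.1f} mm")

# Hypothetical event: 45 mm in 16 h, compared with the placeholder threshold.
print(exceeds_threshold(45.0, 16, ALPHA, OMEGA))
```

Evaluating the same durations with the parameters of each threshold gives the critical cumulated amounts that are compared across thresholds.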
For the empirical thresholds, it is important to highlight that 26.2% of the rainfall events which did not cause real triggering of shallow landslides (green circles in Figure 10a) were located above the defined thresholds (false positives). In contrast, the percentage of rainfall events modeled as not able to trigger landslides but located above the thresholds was lower than 0.5% for all types of physically based thresholds. Considering the only triggering event for which the initial pore-water pressure at the depth of the sliding surface was also known (28 February-2 March 2014 event at the test site, 68.9 mm of rain in 42 h), both the empirical threshold and the TRIGRS/0 threshold correctly identified this rainfall scenario as a triggering event, since it was located above the defined thresholds (blue square in Figure 10a,d).
To verify the reliability of a rainfall threshold, its effectiveness in distinguishing rainfall events able or not able to trigger shallow landslides must be quantified. This could not be done for either the empirical or the physically based thresholds using only the database of events already used to build these models. A direct comparison between the reliability of the different types of thresholds was not possible, because the methods used to reconstruct them produce different kinds of output. In particular, in the definition of each physically based threshold, all the modeled events whose Fs was lower than 1.0 potentially represented a triggering event. In contrast, in the database of the triggering events that occurred between 2000 and 2018 and were used as input to build the different thresholds, the initial pore-water pressure at the depth of the sliding surface was measured only for the event of 28 February-2 March 2014 monitored at the test site. Thus, an initial pore-water pressure cannot be linked to all the events, which prevents quantifying the predictive capability of the different thresholds in identifying triggering or non-triggering events.
For these reasons, the validation and the evaluation of the predictive capability of the thresholds were performed by using an external database of rainfall and shallow-landslide events available for another period.

Validation of the Reconstructed Thresholds
For the validation period of August 1992-August 1997, 488 rainfall events were identified (Figure 11a). Twenty of these events represented conditions able to trigger shallow landslides in the study area. The triggering events occurred in the cold period of the year, especially in November and in February-March and were classified as light-moderate (B), moderate-heavy (C1), or heavy (C2) rainfalls according to Alpert et al.'s [64] classification, with a duration between 9 and 218 h and cumulated amount between 38.0 and 129.4 mm.
During the modeled time span, evapotranspiration rates (Figure 11b) ranged between 0 and 12 mm/day. Values close to 0 mm/day occurred in cold winter months, while the highest rates occurred in dry summer months, when warmer conditions promoted evapotranspiration.
The pore-water pressure trend at the typical depths of the sliding surfaces (Figure 11c) reproduced the typical hydrological behavior of the soil layers at the same depth, as inferred from field data at the test-site slope during the monitoring period since March 2012 [24,47]. These soil layers reached the driest condition during the warm months of the year, especially between June and October, when a few thunderstorms were separated by prolonged periods without rain and with significant evapotranspiration rates. The first significant rainfall events of October-November, characterized by at least 30 mm of rain in 24 h, caused a slight increase in pore-water pressure. A more evident increase was observed in the following wet period, between November and January, when rainfall events of at least 20-30 mm/day were rather close to each other and evapotranspiration rates were limited (<1 mm/day) (Figure 11c). During cold and wet months, pore-water pressure generally remained higher than −20 kPa, reaching saturated conditions during other important rainfall events of at least 20 mm/day. Saturated conditions and the development of positive pressures (corresponding to the formation of a perched water table) were most probable until the end of March. In April, pore-water pressure began to decrease to values lower than −20 kPa, due to an increase in evapotranspiration rates (about 4-5 mm/day) and in the number of dry days between rainfalls. Until the end of June, however, very intense events of at least 50 mm of rain in 12 h caused a transient increase in pore-water pressure up to about −10 kPa. From the end of June to the beginning of July, pore-water pressure decreased very fast towards the driest soil conditions.
The distribution of the modeled values of pore-water pressure (Table 6 and Figure 12) was similar to that observed in the field since March 2012, confirming the reliability of the model in representing the real soil hydrological conditions. The main differences between monitored and modeled distributions concerned the lowest value of pore-water pressure (−993 and −1483 kPa for monitored and modeled trends, respectively) and the first quartile (−846 and −484 kPa for monitored and modeled trends, respectively), together with the degree of gaussianity, which was not shown by the distribution of the modeled values (skewness of −1.18; W_S-W statistic of the Shapiro-Wilk test of 0.71, p-value < 0.01; Table 6 and Figure 12). In contrast, the third quartile of the distribution of the modeled pore-water pressure was −22 kPa, very similar to that of the monitored values (−20 kPa). These results support the choice of considering initial pore-water pressures of −20 kPa or higher for the reconstruction of the physically based thresholds.
The modeled pore-water pressure at the sliding surface depth at the beginning of a triggering event in the validation period was always around 0 kPa, which agrees with the initial conditions of the monitored triggering event of 28 February-2 March 2014 at the test-site slope [24].
The pore-water pressure at the beginning of each identified rainfall was linked to the corresponding reconstructed rainfall scenario, so that rainfall events with a given initial pore-water pressure could be related to the correct physically based threshold. For the validation of the empirical thresholds, all the rainfall events considered for the validation of each physically based threshold (TRIGRS/-20, TRIGRS/-10, and TRIGRS/0) were used, in order to make the comparison between the validation phases of all the defined thresholds homogeneous (Figure 13). Table 7 lists the results of the validation phase.
The empirical and TRIGRS/0 thresholds correctly identified nearly all the rainfall events able to trigger shallow landslides (true positives), as shown by TP values of 95 ± 2% and 100 ± 0% and by FN values of 5 ± 2% and 0 ± 0%, respectively. TP and FN indexes were not calculated for TRIGRS/-10 and TRIGRS/-20, because no events triggered shallow landslides starting from initial pore-water pressures of −10 or −20 kPa. Instead, the reliability of these thresholds in identifying non-triggering rainfall events was assessed by means of TN and FP values. For events with initial pore-water pressure conditions of −20 or −10 kPa, the respective thresholds are characterized by TN of 100 ± 0% and by FP of 0 ± 0%, confirming the capability of these models in distinguishing events able to trigger or not trigger shallow landslides. Moreover, the TRIGRS/0 threshold identified well the conditions which could not trigger slope instabilities, as shown by TN of 93 ± 1% and by FP of 7 ± 1%. The empirical threshold, by contrast, was less able to distinguish triggering from non-triggering events. Its TN was 76 ± 3%, while its FP was 24 ± 3%. In these terms, the empirical threshold overestimated the conditions able to trigger shallow landslides, classifying 24 ± 3% of real non-triggering events as able to cause shallow landsliding (false positives).
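The validation indexes quoted above can be illustrated with a minimal sketch. It assumes, consistently with the reported values (TP + FN = 100% and TN + FP = 100%), that TP and FN are normalised by the number of triggering events and TN and FP by the number of non-triggering events. The `Event` class, the function name, and the sample events are hypothetical and are not taken from the study dataset; the ± ranges reported in the text are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Event:
    exceeds_threshold: bool   # rainfall point lies above the threshold
    landslide_observed: bool  # shallow landslides were reported


def validation_rates(events):
    """Return TP, FN, TN and FP as percentages.

    TP and FN are normalised by the number of triggering events,
    TN and FP by the number of non-triggering events.
    An index is None when its event class is empty.
    """
    tp = sum(e.exceeds_threshold and e.landslide_observed for e in events)
    fn = sum(not e.exceeds_threshold and e.landslide_observed for e in events)
    tn = sum(not e.exceeds_threshold and not e.landslide_observed for e in events)
    fp = sum(e.exceeds_threshold and not e.landslide_observed for e in events)

    def pct(part, total):
        return 100.0 * part / total if total else None

    return {
        "TP": pct(tp, tp + fn), "FN": pct(fn, tp + fn),
        "TN": pct(tn, tn + fp), "FP": pct(fp, tn + fp),
    }


# Made-up sample: 2 triggering and 4 non-triggering events.
sample = [
    Event(True, True), Event(True, True),
    Event(False, False), Event(False, False), Event(False, False),
    Event(True, False),
]
rates = validation_rates(sample)
```

When a subset contains no triggering events, as for the events evaluated against TRIGRS/-10 and TRIGRS/-20, the sketch returns `None` for TP and FN instead of a percentage.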

Discussion
Rainfall thresholds can be considered a fundamental tool for assessing the hazard posed by slope instabilities and for building reliable early warning systems for their prediction [65].
One of the major challenges in establishing effective thresholds is obtaining a threshold able to correctly distinguish the triggering scenarios (true positives) from the events which cannot cause the development of slope failures (true negatives), while avoiding numerous erroneous alerts for rainfall conditions that could not cause real instabilities (false positives).
Rainfall thresholds addressing these issues are mostly reconstructed through an empirical/statistical approach, exploiting past inventories of events able or not able to trigger shallow landslides [4,5]. The uncertainties and limitations of these thresholds (i.e., availability and quality of rainfall data and landslide information, correct definition of the triggering times, and neglect of the antecedent soil hydrological conditions) have led researchers to explore physically based procedures, which assess the link between rainfall features, soil hydromechanical conditions before a rainfall event, and the shear strength response of soils during rainwater infiltration.
Mirus et al. [32] and Fusco et al. [30] performed robust comparisons between these two approaches, in steep slopes covered by colluvial soils derived from glacial and till deposits in the coastal area of the Northwestern United States [32], and in steep slopes covered by thick pyroclastic deposits in Southern Italy [30], respectively. The present paper compares empirical and physically based rainfall thresholds estimated for a wide area of Northern Italy (Oltrepò Pavese), significantly prone to shallow landslides and representative of the typical geological, geomorphological, and environmental features of the Apennine area [66].
Empirical thresholds for the study area were reconstructed in the usual way, from a long multi-temporal inventory (2000-2018) of rainfall events that did or did not trigger shallow landslides. The second type of threshold was instead estimated through a physically based slope model (TRIGRS), representative of the real geological and geomorphological conditions where shallow landslides develop in the study area. This model makes it possible to couple the monitoring of soil hydrological responses at the atmosphere-soil interface, the modeling of slope hydrological responses, and the slope stability analysis. In this way, rainfall thresholds were estimated by considering not only rainfall attributes, but also the typical antecedent soil hydrological conditions.
Monitoring data acquired over more than seven years (March 2012-October 2019) [24,45] and modeled data for a five-year period (August 1992-August 1997) show how soil pore-water pressure varies in the deep soil horizons where the sliding surface could form. Both datasets confirm that the soil pore-water pressure regime is linked to the seasonal and interannual meteorological variability, with similar trends during warm/dry and cold/wet months across different years. Unsaturated conditions are typical of the warm months, especially between May and September, when pore-water pressure rose to values close to 0 kPa only after the most intense events, of at least 50 mm of rain fallen in at least 12 h. After re-wetting events in the first weeks of autumn (at least 30 mm of rain fallen in at least 24 h), the coldest period of the year, which lasts from October to April, is characterized by pore-water pressure closer to saturated conditions, generally between −20 and −10 kPa. Nil or positive values, corresponding to the formation of a perched water table, develop when further strong rainfall events, of at least 20 mm/day, or prolonged rainy periods affect the study slope.
This seasonal hydrological behavior explains why shallow-landslide-triggering events occurred mostly during the cold months between October and April. Antecedent pore-water pressure close to 0 kPa in the soil horizons where shallow landslides develop, combined with further heavy rainfall events or a prolonged rainy period (duration between 4 and 105 h, with cumulated amount between 30.4 and 133.7 mm), is the typical scenario that induces widespread slope instabilities in the study area. This scenario is also consistent with the triggering conditions monitored during the 28 February-2 March 2014 event, shown in Bordoni et al. [23].
Conversely, in the warm months between May and September, only heavy-torrential (D1) or torrential (D2) rainfalls, according to the classification of Alpert et al. [64] (duration between 5 and 15 h and cumulated amounts between 106.8 and 155.2 mm), have the potential to trigger shallow failures, and only when they are preceded by other rainfalls that raise the pore-water pressure to around −20 kPa.
Triggering conditions in the cold months of the study area are similar to those identified in other contexts all over the world which, like the study area, are characterized by a cold and wet season [32,67-71]. Instead, triggering events of the warm months have features similar to those commonly occurring in the coastal zones of the Mediterranean region [16,28,30,51,63,72,73], when strong convective thunderstorms affect those areas, especially at the end of summer (September) or in the first weeks of autumn (October-November).
The differences in triggering conditions and the significant effect of soil hydrological conditions at the beginning of a rainfall event influence the reconstructed thresholds for the occurrence of shallow landslides. Comparing the different physically based thresholds shows that, for the same event duration, the drier the soil is, the larger the amount of rain required to trigger a landslide. For a given rainfall duration, the cumulated amount able to trigger shallow landslides with an initial pore-water pressure of −20 kPa is about 20-25 times higher than that required with an initial pore-water pressure of 0 kPa. This amount decreases for an initial pore-water pressure of −10 kPa, although it is still 6-8 times higher than that obtained for an initial pore-water pressure of 0 kPa. This estimation matches the datasets of triggering events analyzed for the study area, where rainfalls able to trigger shallow landslides were more severe, in terms of cumulated amount (higher than 100 mm), when they occurred in periods with soil in unsaturated conditions. Instead, the amount of rain able to trigger shallow landslides decreased significantly, by a factor of more than 3 in some cases, when the soil was saturated.
The empirical threshold is very close to the physically based one estimated for an initial pore-water pressure of 0 kPa. For the same duration, the triggering cumulated rainfall for an initial pore-water pressure of 0 kPa is 5.3-10.5 mm higher than that estimated by the empirical threshold. This is in agreement with comparisons between physically based and empirical thresholds performed in other areas prone to shallow landsliding worldwide [28,30,32].
In the dataset used to validate the reconstructed thresholds, triggering events occurred only with pore-water pressure equal to 0 kPa. Both the empirical threshold and the TRIGRS/0 threshold correctly identified rainfall events able to trigger shallow landslides (TP of 95% or higher, FN of 5% or lower), although only the TRIGRS/0 threshold recognized all the triggering events. However, the empirical threshold significantly overestimated the rainfall conditions able to trigger shallow landslides, as testified by FP = 24 ± 3%. Instead, the TRIGRS/0 threshold worked well in assessing the conditions which could not trigger slope instabilities, strongly limiting the false positives (FP = 7 ± 1%).
These results confirm the fundamental role played by the soil hydrological conditions at the beginning of a rainfall event in the development of shallow slope failures. All the false positives identified by the empirical threshold correspond to rainfall events that occurred when the soil was not completely saturated, mostly (90%) when pore-water pressure was lower than −10 kPa (Figure 14). These results are also confirmed by an event that occurred on 21 October 2019, when a strong thunderstorm hit the northern portion of the study area, in particular close to rain gauges 3 and 6 (Figure 1). In total, 118 mm of rain fell in 24 h, with a peak of 97 mm of cumulated rain in 6 h, between 5:00 p.m. and 11:00 p.m. local time. These rainfall conditions lie above the empirical thresholds, but they did not trigger any shallow failures, because the pore-water pressure at the beginning of the rainfall was −800 kPa, as measured by the monitoring station in the study area.
The empirical threshold overestimates triggering events, producing false positives for rainfall conditions similar to those that provoked observed shallow failures, but occurring with initial soil conditions drier than those of the real triggering events. Thus, physically based thresholds that also take into account the antecedent soil conditions, in terms of pore-water pressure, can represent an improvement both in objectively predicting shallow-landslide occurrence and in limiting false positives.
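The difference between the two kinds of threshold can be sketched as follows. This is only an illustration under stated assumptions: thresholds are written in the power-law form E = alpha * D^gamma that is common for cumulated rainfall-duration thresholds, all coefficients are hypothetical and not those estimated in this work, and the rule that assigns an event to a pore-water pressure class (nearest class, no threshold for soils drier than −25 kPa) is an assumed simplification.

```python
# Hypothetical power-law thresholds E = alpha * D**gamma (E in mm, D in h).
# None of these coefficients come from the paper.
EMPIRICAL = (10.0, 0.40)
PHYSICAL = {          # keyed by initial pore-water pressure class (kPa)
    0: (12.0, 0.40),
    -10: (84.0, 0.40),
    -20: (270.0, 0.40),
}


def threshold_mm(params, duration_h):
    """Cumulated rainfall (mm) on the threshold curve for a given duration."""
    alpha, gamma = params
    return alpha * duration_h ** gamma


def pressure_class(initial_pwp_kpa):
    """Assumed binning rule: nearest class, or None when drier than -25 kPa."""
    if initial_pwp_kpa < -25.0:
        return None
    return min(PHYSICAL, key=lambda c: abs(c - initial_pwp_kpa))


def empirical_alert(rain_mm, duration_h):
    """Rainfall-only check: antecedent soil conditions are ignored."""
    return rain_mm >= threshold_mm(EMPIRICAL, duration_h)


def physical_alert(rain_mm, duration_h, initial_pwp_kpa):
    """Check against the threshold of the matching pore-water pressure class."""
    cls = pressure_class(initial_pwp_kpa)
    if cls is None:       # soil too dry for any modeled threshold
        return False
    return rain_mm >= threshold_mm(PHYSICAL[cls], duration_h)


# Event of 21 October 2019 as reported in the text: 118 mm in 24 h,
# with an initial pore-water pressure of -800 kPa.
alert_empirical = empirical_alert(118.0, 24.0)
alert_physical = physical_alert(118.0, 24.0, -800.0)
```

With these illustrative coefficients, the rainfall-only check raises an alert for the October 2019 thunderstorm, whereas the check conditioned on the antecedent pore-water pressure does not, which mirrors the false positive described above.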
Reconstructed rainfall thresholds for the Oltrepò Pavese area were then compared with duration (D) and cumulated amount (E) thresholds defined for other Italian areas (Figure 15). Regional and national thresholds in Italy [8,74-76] were derived by using an empirical approach similar to that adopted for the empirical thresholds of the Oltrepò Pavese area. Thresholds for Oltrepò Pavese were also compared to a world threshold defined by Innes [77] for the occurrence of debris flows.
Physically based thresholds obtained for antecedent pore-water pressure equal to −20 kPa (TRIGRS/-20) or −10 kPa (TRIGRS/-10) are located above all the other thresholds, in agreement with the need for a higher amount of rain to trigger shallow landslides in unsaturated soil conditions. The TRIGRS/0 threshold and the empirical thresholds are located close to each other, with the former slightly above the empirical curves. The physically based threshold reconstructed for completely saturated soils (TRIGRS/0) intercepts all the other considered thresholds (at world, Italian, and regional scale) at an event duration of 40 h, whereas the empirical thresholds intercept the world threshold and some regional ones (Sicily, Italian Alps) at the same duration. Moreover, both these thresholds show a lower steepness, which implies that the rainfall amount required to trigger shallow landslides for events shorter than 40 h is higher than that given by the compared world, Italian, and regional thresholds. Instead, for events longer than 40 h, the TRIGRS/0 threshold and the empirical thresholds lie below the other thresholds, so the cumulated amount able to trigger shallow failures is lower than that given by the other analyzed thresholds. The Oltrepò Pavese area is, then, more susceptible to shallow landsliding for long-duration events [64]. For short and medium events [64], the amount of rainfall able to trigger shallow landslides is higher, which reduces the proneness of the territory to such events.
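The crossing behavior described above follows directly from the different steepness of the curves. As a hedged illustration, assuming power-law thresholds E = alpha * D^gamma, two curves cross at D* = (alpha_a / alpha_b)^(1 / (gamma_b − gamma_a)). The coefficients below are hypothetical: the local curve is built so that it crosses the reference curve at 40 h, the duration mentioned in the text, and it does not reproduce any of the published thresholds.

```python
def crossing_duration(alpha_a, gamma_a, alpha_b, gamma_b):
    """Duration (h) at which alpha_a*D**gamma_a equals alpha_b*D**gamma_b."""
    if gamma_a == gamma_b:
        raise ValueError("parallel thresholds never cross")
    return (alpha_a / alpha_b) ** (1.0 / (gamma_b - gamma_a))


def cumulated_mm(alpha, gamma, duration_h):
    """Cumulated rainfall (mm) on a power-law threshold at a given duration."""
    return alpha * duration_h ** gamma


# Hypothetical reference threshold (steeper curve).
REF_ALPHA, REF_GAMMA = 9.0, 0.50
# Hypothetical local threshold (flatter curve); its intercept is chosen
# so that the two curves meet at 40 h.
LOCAL_GAMMA = 0.30
LOCAL_ALPHA = REF_ALPHA * 40.0 ** (REF_GAMMA - LOCAL_GAMMA)

d_cross = crossing_duration(LOCAL_ALPHA, LOCAL_GAMMA, REF_ALPHA, REF_GAMMA)

# Below the crossing the flatter local curve requires more rain than the
# reference curve; above the crossing it requires less.
local_short = cumulated_mm(LOCAL_ALPHA, LOCAL_GAMMA, 10.0)
ref_short = cumulated_mm(REF_ALPHA, REF_GAMMA, 10.0)
local_long = cumulated_mm(LOCAL_ALPHA, LOCAL_GAMMA, 100.0)
ref_long = cumulated_mm(REF_ALPHA, REF_GAMMA, 100.0)
```

A flatter curve with a higher intercept therefore implies lower proneness for short events and higher proneness for long events, which is the pattern reported for the Oltrepò Pavese thresholds.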
Based on the results of this paper, the main contributions of this work and of the reconstructed thresholds are as follows: (i) empirical and physically based thresholds for a representative area of the Italian Apennines; (ii) different physically based thresholds according to different soil hydrological conditions, considering rainfall scenarios already measured in the study area; (iii) implementation of a physically based slope model that couples the monitoring of soil hydrological responses at the atmosphere-soil interface, the modeling of slope hydrological responses, and the slope stability analysis; (iv) robust evaluation of the thresholds' predictive capability through a dataset different from that used in the reconstruction of the models; and (v) determination of advantages and constraints in the use of empirical or physically based thresholds.

Conclusions
The reconstruction of reliable thresholds with a high predictive capability is very important, especially if their implementation in an early warning system is proposed. Empirical and physically based thresholds were estimated for the Oltrepò Pavese hilly area, which can be assumed to be representative of the geological, geomorphological, and land-use settings prone to shallow landslides across the whole Northern Apennines. The thresholds were evaluated by quantifying their predictive capability through the comparison between modeled and real triggering or non-triggering conditions, identified in a validation dataset covering a five-year time span.
The soil hydrological conditions at the beginning of a rainfall event play a fundamental role in determining whether that rainfall is able to trigger shallow landslides. The lower the pore-water pressure is at the beginning of an event, the higher the amount of rainfall required to trigger shallow failures. When shallow landslides occur as a consequence of rain fallen on previously saturated soil (nil pore-water pressure), as in the study area, physically based thresholds are more reliable in discriminating the events which could or could not trigger slope failures. Despite a good capability in correctly identifying the triggering conditions, empirical thresholds, based only on rainfall data and neglecting the antecedent soil hydrological conditions, provide a significant number of false positives. These are events similar to the ones that provoked observed shallow failures, but with initial soil conditions drier than those corresponding to the real triggering events.
The main conclusions of this work can be summarized as follows:
• The antecedent soil hydrological conditions have a primary role in predisposing or preventing shallow slope instability during a rainfall event. This should be taken into account especially in contexts characterized by seasonal hydrological behaviors and, in particular, during periods when the initial pore-water pressure conditions are more favorable to the triggering of shallow landslides. In fact, in the Oltrepò Pavese area, the cold and wet months between October and April are the most susceptible period of the year, due to the permanence of saturated or close-to-saturation soil conditions;
• The most promising approach for developing an early warning system based on rainfall thresholds seems to be the reconstruction of physically based thresholds for the typical initial pore-water pressure conditions leading to slope instabilities. These tools can be further supported by the monitoring of soil hydrological behaviors and by slope stability analyses for different rainfall scenarios. However, to confirm that the physically based thresholds are more effective than the empirical ones, the threshold exceedances need to be compared against existing early warning criteria and against further landslide occurrences over future time spans, as suggested in [12];
• Physically based models of a representative test site and hydrological monitoring data may not always be available for a susceptible area. In such a context, empirical thresholds can represent a precautionary approach that identifies the triggering conditions in a reliable way, in the awareness that they can give many false positives, especially for rainfall events similar to those provoking shallow landslides but occurring in dry periods;
• Physically based thresholds are reconstructed based on physical simulation of slope stability, according to well-defined geotechnical, mechanical, and hydrological soil parameters. To take into account the intrinsic variability of these parameters, even within small areas, probabilistic models will be applied to reconstruct this type of threshold in future developments.
Funding: This work has been carried out in the frame of the ANDROMEDA project, which has been supported by Fondazione Cariplo, grant no. 2017-0677.