An Update on Rainfall Thresholds for Rainfall-Induced Landslides in the Southern Apuan Alps (Tuscany, Italy) Using Different Statistical Methods

: The southern Apuan Alps (Italy) are prone to rainfall-induced landslides. A first attempt to calculate rainfall thresholds was made in 2006 using non-statistical and repeatable methods for the 1975–2002 period. This research aims to update, validate, and compare the results of that attempt through different statistical approaches. Furthermore, a new dataset of rainfall and landslides from 2008 to 2016 was collected and analyzed by reconstructing the rainfall events via an automatic procedure. To obtain the rainfall thresholds in terms of the duration–intensity relationship, we applied three different statistical methods for the first time in this area: logistic regression (LR), quantile regression (QR), and least-squares linear fit (LSQ). The updated rainfall thresholds, obtained through statistical methods and related to the 1975–2002 dataset, resulted in little difference from the ones obtained with non-statistical methods and have similar efficiency values among themselves. The best one is provided by the LR, with a landslide probability of 0.55 (efficiency of 89.8%). The new rainfall thresholds, calculated by applying the three statistical methods on the dataset from 2008–2016, are similar to the 1975–2002 ones, except for the LR threshold, which exhibits a higher slope. This result confirms the validity of the thresholds obtained with the old database.


Introduction
The need to experiment on approaches and methodologies to identify an empirical relationship between rainfall and landslides lies in the objective difficulty of identifying and quantifying the numerous factors that contribute to the initiation of landslides.If this is already difficult at slope scale, it becomes almost impossible at a basin or regional scale.On the other hand, public bodies that manage forecasting, prevention, and warning plans must often monitor large areas rather than single slopes.For this reason, the physically based approaches e.g., [1][2][3][4][5] are not easily applicable and are less common than the black-box approaches, in which the rainfall parameters (e.g., duration, intensity, cumulated event rainfall, antecedent rainfall) are used to find the relationships with landsliding, associating the return time of landslides to that of rainfall.
In general, the activation of landslides includes several preparatory factors (rainfall, geotechnical, hydrogeological, geological, and morphological features, as well as soil humidity, land use, etc.).Even the triggering factors may be different, however rainfall is recognized as the most common factor [6].For these reasons, it is rare, if not impossible, to have a lot of data available for analyzing the stability conditions of slopes over a large area (e.g., using classic deterministic approaches with the factor of safety).Shallow landslides have a strong relationship with rainfall, especially if the rainfall is intense [7][8][9].Indeed, they usually involve a thin soil cover (less than 2 m thick) on steep slopes [10,11], without interference from aquifers.On the contrary, large landslides (several meters thick and covering large areas) are often related to the behaviour of water tables, whose relationship with rainfall is usually more delayed and uncertain [12].This is the motivation for why the target of the empirical approach is superficial landslides, showing an immediate and stronger relationship with rainfall.Nevertheless, shallow landslides represent very hazardous phenomena in relation to their speed of activation, impact power, difficulty to foresee the location, and high velocity [8,[13][14][15][16][17][18][19].
Since the well-known first attempt by [20], the calculation of critical thresholds for initiating rainfall-induced landslides became a widely used approach worldwide by using different methodologies and increasingly sophisticated statistical techniques, as shown in [6,.On the other hand, these methodologies can be reasonably suitable for predicting shallow landslide phenomena due to the general availability of rainfall data, which today are often characterized by long time series.Indeed, the good performance of these methodologies is based on the accessibility of rainfall data for many years, associated with the availability of landslide event information in terms of location and activation time.
However, the applicability of these approaches is, in our opinion, inversely proportional to the extension of the area on which they are applied.Indeed, the more the geological, morphological, hydrogeological, geotechnical, and land use features of a certain area change, the more the territory responds to rainfall change.This leads to greater uncertainty in identifying rainfall thresholds that activate superficial landslides.This specific reason suggests the necessity to apply these methods to homogeneous areas, which usually implies that the area being studied is small.
In this case, the chosen area is in the southern Apuan Alps (northwestern Tuscany, Italy).The Apuan Alps are well known not only for their beautiful environment, presence of well-known marble quarries, and proximity to popular tourist beaches (e.g., Versilia beach), but also because they are highly prone to shallow landslides and floods [14,[43][44][45].From this point of view, it is strategic for local authorities to be able to use warning tools based on weather forecasting and rainfall observation.

Settings, Criticisms, and Goals
The southern Apuan Alps are located in northwestern Tuscany (Italy) as seen in Figure 1.The region has one of the highest rainfall values in Italy, reaching a mean annual precipitation of about 3000 mm due to regular heavy rainstorms [44,46].These peculiarities are mainly linked to the geographical location as well as the morphological and physiographic characteristics of the Apuan Alps.They are indeed a chain of almost 2000 m in altitude and are close and parallel to the coastline, thus intercepting the atmospheric disturbances of Atlantic origin.One of the most well-known downpours hit the southern portion of Apuan Alps on 19 June 1996, inducing hundreds of shallow landslides, debris torrents, debris flows, and floods, resulting in heavy damage to buildings, infrastructures, and the overall economy, with this incident resulting in 14 deaths (Figure 2) [14,46].This area underwent numerous other damaging events in the past, among which the most important occurred in 1636, 1774, 1885, and 1902 [44].In addition to the triggering cause (heavy rainstorms), there are indeed many other predisposing causes including the spreading of poorly permeable rocks (particularly schists, phyllites, and metasandstones) which lie under more permeable rocks (marbles and limestones), the spread of superficial soil cover (mainly formed by silty sand and sandy silt), the high steepness of slopes (30-45 • ) [14], the proximity of the mountain range to the sea (which favors low run-off times), and the abandonment of agro-forestry practices [14].The typical mass movement activated in these conditions is a very-to-extremely rapid shallow landslide, represented by an initial debris slide that evolves into a debris flow (according to the [47] classification), which then flows into the hydrographic network and brings large quantities of solid material, including tree trunks, with it.This generates a very high destructive power when torrential debris flows interact with the anthropogenic environment, which is what occurred in the June 1996 event.The typical mass movement activated in these conditions is a very-to-extremely rapid shallow landslide, represented by an initial debris slide that evolves into a debris flow (according to the [47] classification), which then flows into the hydrographic network and brings large quantities of solid material, including tree trunks, with it.This generates a very high destructive power when torrential debris flows interact with the anthropogenic environment, which is what occurred in the June 1996 event.The typical mass movement activated in these conditions is a very-to-extremely rapid shallow landslide, represented by an initial debris slide that evolves into a debris flow (according to the [47] classification), which then flows into the hydrographic network and brings large quantities of solid material, including tree trunks, with it.This generates a very high destructive power when torrential debris flows interact with the anthropogenic environment, which is what occurred in the June 1996 event.
Moreover, the main feature of these events is the activation of many shallow landslides with high density (tens per square kilometers) almost simultaneously.The difficulty of predicting their location and activation time makes these phenomena very dangerous, incentivizing the scientific community to identify actions that can define alert systems.Given the lack of availability of geotechnical and hydrogeological data to define the stability conditions governing the slopes over large areas, one of the most used approaches is the quantification of rainfall quantities capable of activating shallow landslides, thus analyzing the causes rather than the effects [37,45,[48][49][50][51].
With particular reference to the southern Apuan Alps, Ref. [27] already elaborated critical threshold curves with non-statistical methodologies analyzing the rainfall and shallow landslide data from 1975 to 2002 considering the raingauge of Retignano (420 m) using a non-statistical and repeatable approach, as seen in Figure 1.To update and compare the outcomes of that work, the rainfall and landslide data collected by [27] from 1975 to 2002 were analyzed using the most recent statistical approaches.Moreover, a new rainfall dataset was collected from the same raingauge over a period from 2008 to 2016 (relevant events did not occur during the 2003-2007 period, and then the research was interrupted in 2016 due to lack of resources), collecting information on shallow landslides activation, to compare the results with the 1975-2002 dataset.Therefore, the important goal of this research is the updating of the old manual rainfall thresholds with modern statistical techniques and their validation using a more recent dataset in the same area.

Datasets
Considering one of the aims of this research, namely the updating/comparison of the critical threshold curves of [27], it was considered fundamental to use the same raingauge (Retignano, 420 m) as the source of the rainfall data (Figure 1).At that time, the Retignano raingauge was the only one in the considered zone with a significant time series and the possibility of analyzing rainfall events lasting less than 24 h, being equipped by a pluviograph from 1975 until 1996, and only becoming electronic afterwards.On the other hand, rainstorms with durations of less than 24 h are common in the Apuan area and sometimes induce landsliding.For example, the June 1996 flood lasted just 12-13 h [14], but other events lasting only some hours activated landslides causing deaths [44].
In relation to the goals of this research, the calculation, updating, and validation of the rainfall thresholds for the area under study were carried out considering two different rainfall and landslide datasets: • the first one is the same used by [27], namely a selection of 167 rainfall events happened during the 1975-2002 period, divided into 139 rainfall events that did not initiate shallow landslides (hereafter NSLE: no shallow landslide events) and 28 events that initiated at least one shallow landslide (hereafter SLE: shallow landslide events).These events were manually selected based on the response of the pluviographic chart, considering the start time and the end time of the rainfall for each event.For example, events with high intensity (20-30 mm/h) and low duration (1-2 h) or low intensity (2-4 mm/h) and high duration (40-50 h), as well as intermediate cases;

•
the second dataset, from 2008 to 2016, was obtained by applying an automatic reconstruction of the rainfall events using the algorithm developed by [52], obtaining a total of 567 events.In order to compare this new dataset with that relating to 1975-2002, the former was further manipulated to exclude all those events of too low intensity and/or duration to be relevant for deriving the rainfall thresholds.The updated dataset resulted in 183 rainfall events, of which 152 did not activate shallow landslides (NSLE) and 31 activated shallow landslides (SLE).
For each rainfall event (both NSLE and SLE), the following information was collected: date, starting and ending time, duration (D), cumulated rainfall (E), and rainfall intensity (I = E/D).

Rainfall Thresholds
The most used rainfall thresholds in scientific literature, as used in this work, are based on the rainfall intensity -duration (ID) and the rainfall cumulated -duration (ED) relationships [6,28,31,37].To obtain the rainfall ID thresholds using the two exposed datasets, three different statistical methods implemented in R software (R version 4.3.0(21 April 2023 ucrt) were used: logistic regression (LR), quantile regression (QR), and leastsquares linear fit (LSQ).Some other methods for threshold computation are found in the literature based on machine learning techniques, particularly the support vector machine (SVM) method [53] and methods based on artificial neural networks (ANN) [54,55].We decided not to use them either because they are too complex or because a large amount of data is required to train the model.All three employed statistical methods allow for the obtaining of threshold curves with a general form of power law [20]: (I: rainfall intensity (mm/h); D: duration (h), which are straight lines in log-log axis; α: intercept on the ordinate axis corresponding to 1 h; β: threshold curve slope).
LR models (see also [45]) are a particular form of generalized linear models in which the value of the output continuous variable, given the set of predictors (in this case duration and intensity), can vary between 0 and 1.It is interpreted as probability p that a rainfall of duration D and intensity I can trigger a landslide.This is definable by the equation: (I': Log 10 (I) (mm/h); D ′ : Log 10 (D) (h); βi with i = 0, 1, 2: parameters of the regression coefficients).
The determination of the rainfall thresholds with LR requires the use of both types of rainfall events, attributing a posteriori probability p = 1 to the SLE events and p = 0 to the NSLE events.
QR and LSQ were used by several authors to determine ID and ED thresholds [31,34,37,56,57] considering only the SLE events as input data.In the LSQ method, a linear dependence of type I' = α + βD' is assumed and the calculation of α and β parameters is carried out by minimizing the sum of the square deviations between the measured I' and the fit-forecasted value, keeping the detected D' values fixed.In this way, it is possible to determine the threshold TLSQ-50 corresponding to the fiftieth Gaussian percentile of the SLE events.Moreover, the standard deviation (σ) is obtained, and it is therefore possible to determine the lower percentiles by translating the intercept α using σ.To obtain several thresholds, we define a probability p e (called exceedance probability threshold [34]) corresponding to a straight line below which, on average, p e •N events are found, where N is the number of SLE events.The α value corresponding to the chosen p e is calculated, translating the TLSQ_50 line according to the procedure described in [56].
Since the LSQ method allows for the calculation of different thresholds with different α values keeping β unchanged, it properly represents the general trend of the distribution of events, but, in some cases, does not obtain good thresholds for low p e values.For this reason, the QR method was used as well.Once p e is chosen, the QR method calculates the α and the β of the threshold below which, on average, p e •N events are found.QR is considered a robust technique and a valid alternative to the LSQ especially when outliers are present.QR also uses only the SLE events as input data, but unlike LR and LSQ methods, in QR both the α and the β values depend on the p e value.For this reason, the obtained thresholds exhibit different slopes.
Since the methods used for determining the thresholds are probabilistic and not deterministic, it is necessary to carry out a statistical check.This validation allows us to evaluate the possibility of their concrete use in an alert system and to get a choice criterion for an optimal threshold.
The validation of the rainfall thresholds was pursued using the approach suggested by [58], using the number of TP (true positive), TN (true negative), FP (false positive), and FN (false negative) to calculate the following skill scores: • POD (probability of detection): TP/(TP + FN) [range 0, 1-optimal value 1];

Processing of the 1975-2002 Dataset
As explained previously, the 1975-2002 dataset includes 167 rainfall events, 28 of which induced shallow landslides.The main rainstorms in terms of landslides number occurred in 1984, 1992, 1994, 1996, 1998, and 2000.An analysis of the event distribution in relation to seasons show that they occurred in autumn 40% of the time [27].
Using the 1975-2002 rainfall and landslide dataset [27], the rainfall thresholds for the southern Apuan Alps were recalculated using the three fit methods (LR, QR, LSQ) by means of R software [60].The values of α and β were obtained using the three methods along with their 95% confidence intervals, and are reported in Tables 1-3.The processing of the rainfall and landslide dataset produced the contingency and skill score tables shown in Tables 4-6 and Figures 3-5.The best results of the three statistical methods and the lower threshold of [27] are reported in Table 7.The graph shown in Figure 6 allows for a visual comparison between the best thresholds obtainable with the three aforementioned methods and the two thresholds plotted by [27].).Blue squares: events that did not trigger landslides (NSLE); red lozenges: events that triggered landslides (SLE).
Table 5. Contingency table (TP, FN, FP, TN) and skill scores (POD, POFD, POFA, Ef, HK) of the thresholds obtained using QR method.Threshold showing the maximum efficiency value is highlighted in red (TQR_60 shows the same results but is calculated for a higher probability).).Blue squares: events that did not trigger landslides (NSLE); red lozenges: events that triggered landslides (SLE).
Table 7. Summary of the contingencies (TP, FN, FP, TN) and skill scores (POD, POFD, POFA, Ef, HK) of the SLE and NSLE forecasts obtained from the three best thresholds calculated on the 1975-2002 dataset using LR, LSQ, QR statistical techniques and the lower rainfall threshold plotted by [27] (blue in Figure 6).In this case, both the number of correct forecasts (28 out of 28 for SLE) and the number of false alarms (FP) are particularly high.The comparison between the thresholds obtained by [27] using a manual fitting and those obtained in this study by statistical calculation on the same rainfall and landslide dataset  was carried out by evaluating the efficiency of the lower threshold only of [27] (straight blue line in Figure 6).The upper threshold (red line in Figure 6) was not considered in the comparison due to the scarcity of rainstorms that induced many landslides (type A events according to [27]).From the position of the lower threshold and joining the events of type A and type B (rainstorms that induced a few landslides) events in a single category of landslide events (SLE), the contingency and skill scores table were calculated (Table 7).Figure 6 shows the comparison between the thresholds of [27] and those obtained using statistical methodologies.It is evident the very low position of the low threshold (blue line) of such Author, which, on the other hand, was considered conservative.Indeed, many NSLE events (blue squares) fall above the threshold.The implication and the consequence of its low position is highlighted in Table 7, showing the total accuracy of SLE's prediction while also showing the fact that it generates a notable number of false alarms (56 events).This characteristic clearly affects the efficiency (Ef) of the threshold, which is around 66%. Table 7. Summary of the contingencies (TP, FN, FP, TN) and skill scores (POD, POFD, POFA, Ef, HK) of the SLE and NSLE forecasts obtained from the three best thresholds calculated on the 1975-2002 dataset using LR, LSQ, QR statistical techniques and the lower rainfall threshold plotted by [27] (blue in Figure 6).In this case, both the number of correct forecasts (28 out of 28 for SLE) and the number of false alarms (FP) are particularly high.The comparison between the thresholds obtained by [27] using a manual fitting and those obtained in this study by statistical calculation on the same rainfall and landslide dataset  was carried out by evaluating the efficiency of the lower threshold only of [27] (straight blue line in Figure 6).The upper threshold (red line in Figure 6) was not considered in the comparison due to the scarcity of rainstorms that induced many landslides (type A events according to [27]).From the position of the lower threshold and joining the events of type A and type B (rainstorms that induced a few landslides) events in a single category of landslide events (SLE), the contingency and skill scores table were calculated (Table 7).Figure 6 shows the comparison between the thresholds of [27] and those obtained using statistical methodologies.It is evident the very low position of the low threshold (blue line) of such Author, which, on the other hand, was considered conservative.Indeed, many NSLE events (blue squares) fall above the threshold.The implication and the consequence of its low position is highlighted in Table 7, showing the total accuracy of SLE's prediction while also showing the fact that it generates a notable number of false alarms (56 events).This characteristic clearly affects the efficiency (Ef) of the threshold, which is around 66%. alyzed dataset, determine contingency values very differently from the lower threshold of [27].The results (Table 7) show that the three thresholds have very similar efficiency values, differing from each other by around 0.1%.The best of the three thresholds is that obtained through the LR method, with a landslide probability of 0.55.It guarantees an efficiency of 89.8%, with a correct forecast of 150 events out of a total of 167.

Processing of the 2008-2016 Dataset
Through the application of the algorithm determined in [52], the automatic reconstruction of the rainfall events was allowed to collect 567 main rainstorms that occurred in the southern Apuan Alps from 2008 to 2016.The events show the following distribution: 21% occurred in winter (December, January, February); 30% in spring (March, April, May); 22% in summer (June, July, August); and 27% in autumn (September, October, November) (Figure 7).The rainfall thresholds obtained through statistical methods, being higher in the analyzed dataset, determine contingency values very differently from the lower threshold of [27].The results (Table 7) show that the three thresholds have very similar efficiency values, differing from each other by around 0.1%.The best of the three thresholds is that obtained through the LR method, with a landslide probability of 0.55.It guarantees an efficiency of 89.8%, with a correct forecast of 150 events out of a total of 167.

Processing of the 2008-2016 Dataset
Through the application of the algorithm determined in [52], the automatic reconstruction of the rainfall events was allowed to collect 567 main rainstorms that occurred in the southern Apuan Alps from 2008 to 2016.The events show the following distribution: 21% occurred in winter (December, January, February); 30% in spring (March, April, May); 22% in summer (June, July, August); and 27% in autumn (September, October, November) (Figure 7).
An archive search was carried out for the 567 rainfall events in order to identify the number of shallow landslides triggered, with a total of 93 being found.Many of them were triggered during the same rainstorm, such as the 9 failures activated by the 25 October 2011 thunderstorm or the 11 failures triggered during the 5 January 2014 event.Therefore, a total of 31 rainfall events which induced at least one shallow landslide (SLE) were obtained.
The seasonal distribution of the SLEs in the 2008-2016 period follows what was already observed in [27] during the 1975-2002 period, with a higher frequency in autumn (from September to November, 42%).However, a high landslide hazard of 39% was also recognized in winter (December-January-February).In spring and summer, the number of SLE decreases to values of 18% and 1%, respectively (Figure 8).An archive search was carried out for the 567 rainfall events in order to identif number of shallow landslides triggered, with a total of 93 being found.Many of them triggered during the same rainstorm, such as the 9 failures activated by the 25 Oct 2011 thunderstorm or the 11 failures triggered during the 5 January 2014 event.There a total of 31 rainfall events which induced at least one shallow landslide (SLE) wer tained.
The seasonal distribution of the SLEs in the 2008-2016 period follows what wa ready observed in [27] during the 1975-2002 period, with a higher frequency in aut (from September to November, 42%).However, a high landslide hazard of 39% was recognized in winter (December-January-February).In spring and summer, the num of SLE decreases to values of 18% and 1%, respectively (Figure 8). Figure 9 shows all 567 events automatically reconstructed by the algorithm, in ing many events characterized by low values of I and D (blue events under the disc nating green line, obtained according to the method explained in [37]).Using the w set of 567 rainfall events within the 2008-2016 period, the values of the efficiency o thresholds are biased.Computing the thresholds efficiencies, it is noted that the val approximately 95% for the best one is due to the high number of true negative events ( To get around the problem of the presence of rainfall events with too low of an inte and duration, making them insignificant for the calculation of the rainfall thresholds,  An archive search was carried out for the 567 rainfall events in order to ide number of shallow landslides triggered, with a total of 93 being found.Many of th triggered during the same rainstorm, such as the 9 failures activated by the 25 2011 thunderstorm or the 11 failures triggered during the 5 January 2014 event.T a total of 31 rainfall events which induced at least one shallow landslide (SLE) tained.
The seasonal distribution of the SLEs in the 2008-2016 period follows wha ready observed in [27] during the 1975-2002 period, with a higher frequency in (from September to November, 42%).However, a high landslide hazard of 39% recognized in winter (December-January-February).In spring and summer, the of SLE decreases to values of 18% and 1%, respectively (Figure 8). Figure 9 shows all 567 events automatically reconstructed by the algorithm ing many events characterized by low values of I and D (blue events under the nating green line, obtained according to the method explained in [37]).Using th set of 567 rainfall events within the 2008-2016 period, the values of the efficien thresholds are biased.Computing the thresholds efficiencies, it is noted that the approximately 95% for the best one is due to the high number of true negative eve To get around the problem of the presence of rainfall events with too low of an and duration, making them insignificant for the calculation of the rainfall thresho events were removed from the dataset.Only SLEs and NSLEs above the discri line of the equation I = 8.173 × D −0.547 (green line in Figure 9) were considered val computation of skill scores and contingency tables.This choice discards the eve Figure 9 shows all 567 events automatically reconstructed by the algorithm, including many events characterized by low values of I and D (blue events under the discriminating green line, obtained according to the method explained in [37]).Using the whole set of 567 rainfall events within the 2008-2016 period, the values of the efficiency of the thresholds are biased.Computing the thresholds efficiencies, it is noted that the value of approximately 95% for the best one is due to the high number of true negative events (TN).To get around the problem of the presence of rainfall events with too low of an intensity and duration, making them insignificant for the calculation of the rainfall thresholds, these events were removed from the dataset.Only SLEs and NSLEs above the discriminating line of the equation I = 8.173 × D −0.547 (green line in Figure 9) were considered valid for the computation of skill scores and contingency tables.This choice discards the events with an intensity of less than 10 mm/h in 1 h for short duration events and 1 mm/h rainfall in 100 h for long-duration events.In this way, the original data set consisting of 567 events (relating to the 2008-2016 period) was reduced to 183 events (Figure 10).
As for the first dataset (1975-2002), the three fit methods (LR, QR, LSQ) were applied even to the 2008-2016 dataset, computing the thresholds using the R software (Figures 11-13).The values of α and β obtained using the three methods, together with their 95% confidence intervals, are reported in Tables 8-10.
an intensity of less than 10 mm/h in 1 h for short duration events and 1 mm/h rainfall in 100 h for long-duration events.In this way, the original data set consisting of 567 events (relating to the 2008-2016 period) was reduced to 183 events (Figure 10).[52].Green line separates events considered significant (above the green line) from events that instead cause a bias in the results (below the green line) due to an increase of the number of TN in the contingency tables and, therefore, a considerable increase of the efficiency values (Ef).Blue squares: events that did not trigger landslides (NSLE); red lozenges: events that triggered landslides (SLE).[52].Green line separates events considered significant (above the green line) from events that instead cause a bias in the results (below the green line) due to an increase of the number of TN in the contingency tables and, therefore, a considerable increase of the efficiency values (Ef).Blue squares: events that did not trigger landslides (NSLE); red lozenges: events that triggered landslides (SLE).As for the first dataset (1975-2002), the three fit methods (LR, QR, LSQ) were applied even to the 2008-2016 dataset, computing the thresholds using the R software (Figures 11-13).The values of α and β obtained using the three methods, together with their 95% confidence intervals, are reported in Tables 8-10.Tables 11-13 show the contingency tables and skill scores obtained.The most efficient thresholds are reported in Table 14, in which the efficiencies of the thresholds before and after the data selection using the discriminating line are compared.The lines shown in Figure 14 allows the visual comparison of the three thresholds with the best efficiency.14B obtained with the three different statistical methods.In this case there is a strong similarity between the QR and LSQ thresholds, reminiscent of the 1975-2002 period.Blue squares: events that did not triggered landslides (NSLE); red lozenges: events that triggered landslides (SLE).
The rainfall thresholds calculated by applying the three statistical methods on the dataset of 183 events produced results (Table 14B) which, in terms of Ef, are between 0.84 and 0.88, with 162 out of 183 events correctly predicted.This proves to be comparable to the results of the thresholds obtained by updating the work of [27] and shown in Table 4.

Combination of 2008-2016 SLE and NSLE Events and the Thresholds Calculated on the Dataset Related to the 1975-2002 Period
After the calculation of the rainfall threshold curves through the three fit methodologies (LR, QR, LSQ) on the respective datasets (1975-2002 and 2008-2016 periods), the last statistical processing carried out in this work aimed to validate the thresholds calculated regarding the 1975-2002 period by joining them with a different dataset (in this case, the most recent data of the 2008-2016 period).Using the R software, the 2008-2016 dataset was combined with the parameters of the threshold curves calculated on the 1975-2002 dataset, obtaining a new series of three graphs (Figures 15-17) and contingency tables (Tables 15-17).
Table 15.Contingency table (TP, FN, FP, TN) and skill scores (POD, POFD, POFA Ef, HK) for the dataset of the 2008-2016 period and the rainfall thresholds calculated regarding the 1975-2002 dataset obtained using the LR method.Maximum efficiency value is reached with the curve highlighted in red relating to a landslide probability of 0.1 (Ef = 0.858).NA: not available.14B obtained with the three different statistical methods.In this case there is a strong similarity between the QR and LSQ thresholds, reminiscent of the 1975-2002 period.Blue squares: events that did not triggered landslides (NSLE); red lozenges: events that triggered landslides (SLE).
The rainfall thresholds calculated by applying the three statistical methods on the dataset of 183 events produced results (Table 14B) which, in terms of Ef, are between 0.84 and 0.88, with 162 out of 183 events correctly predicted.This proves to be comparable to the results of the thresholds obtained by updating the work of [27] and shown in Table 4          As in the previous paragraphs, in this new comparison between the obtained rainfall thresholds, the three best thresholds in terms of efficiency and accuracy in forecasting SLE and NSLE events were highlighted in Table 18 and in Figure 18.As in the previous paragraphs, in this new comparison between the obtained rainfall thresholds, the three best thresholds in terms of efficiency and accuracy in forecasting SLE and NSLE events were highlighted in Table 18 and in Figure 18.Blue squares: events that did not triggered landslides (NSLE); red lozenges: events that triggered landslides (SLE).
The validation results made by combining the rainfall thresholds of the 1975-2002 period with the dataset of 183 rainfall events relating to the 2008-2016 period highlighted in Table 18 and Figure 18 are comparable to the results obtained previously for the rainfall thresholds calculated on their respective input data shown in Tables 7 and 14B.The thresh-  The validation results made by combining the rainfall thresholds of the 1975-2002 period with the dataset of 183 rainfall events relating to the 2008-2016 period highlighted in Table 18 and Figure 18 are comparable to the results obtained previously for the rainfall thresholds calculated on their respective input data shown in Tables 7 and 14B.The thresholds of the 1975-2002 period applied on the NSLE and SLE events of the 2008-2016 have Ef values between 0.85 and 0.86, with a maximum of 158 correctly predicted events, and a QR threshold at the fiftieth percentile of exceedance probability.

Discussion
In order to obtain the rainfall thresholds for shallow landslide occurrence in an area prone to landslides (like the southern Apuan Alps), in this study we used three different statistical procedures (LR, QR, LSQ).We compared the results and then further compared them with old threshold curves obtained by [27] using non-statistical and repeatable approaches.Two different datasets were analyzed: a first rainfall event dataset concerning the 1975-2002 period (the same used by [27]) and one concerning the 2008-2016 period.In both datasets, the rainfall events were classified into two categories: events inducing shallow landslides (SLE) and events not inducing shallow landslides (NSLE).LSQ and QR techniques allowed us the ability to assess the ID rainfall thresholds for several exceeding rainfall probabilities, and we specifically show the results for exceeding probabilities of 1%, 5%, 10%, 15%, 20%, and 50%).The ID thresholds for the range of landslide occurrence probability from 5% to 90% were individuated using the LR method on the same datasets, and we specifically show the landslide probability results of 5%, 10%, 30%, 70%, 90%).
The results obtained using both analyzed datasets highlight the validity of the three statistical techniques used to define the thresholds.We recognized some small differences between the three statistical methods applied, both within the two different datasets (1975-2002 and 2008-2016) and between the two datasets themselves.The most significant difference has to do with the LR method.The rainfall thresholds show different slopes for the 1975-2002 dataset (β = −0.60)and 2008-2016 dataset (β = −0.82).This difference could be attributed to the different techniques applied to reconstruct the rainfall events.The hypothesis that climatic conditions may have changed can be ruled out by the fact that there does not appear to be a significant difference between the thresholds obtained with the other two methods.However, it is interesting to explore this issue [61,62] since climatic conditions are changing, including rainfall and extreme events [63], and constant verification and updating of thresholds will be necessary.
The comparison between old rainfall thresholds of [27], obtained using empirical techniques, and new ones, obtained by LR, QR, and LSQ statistical approaches using the same dataset, highlight how the latter are always included between the upper and lower thresholds of [27], but the differences seem to not be particularly relevant.
Regardless of the methodology used, when elaborating rainfall thresholds inducing shallow landslides in a given area, some aspects are essential.The first one is the exact knowledge of the characteristics of the rainfall event (start and end time, duration, and cumulated rainfall).This depends on the availability of a dense monitoring network that can cover the study area and is equipped with efficient automatic instruments that are able to record the rainfall data continuously.If this is not possible (e.g., raingauge far from the area, imperfect functioning of the instruments, etc.), the results are probably affected by errors.A second fundamental aspect is the information about the effects of the rainfall, obviously in terms of landslides activated.For recent events, specific field surveys could be carried out, however for past events this information is often lacking and may be obtained by indirect sources such as archive research.On the other hand, for statistical analysis it is important to cover large periods of time with many events in order to obtain stronger and more reliable relationships to separate NSLEs and SLEs based on the characteristics of the rainfall.Archive research may not be efficient.For example, we can obtain vague or inaccurate information on landslide time or location, or even find no evidence of landslides despite the fact that they did occur (simply because they did not provide any news, having not interfered with human activity or infrastructures).The lack of accuracy of rainfall datasets and landslides is the main limitation for this type of study.
However, there is another important constraint, namely the characteristics of the area, from different points of view: geological, geomorphological, land use, climate, and so on.Shallow landslides can be triggered in many landscapes, but in different ways depending on the conditions mentioned.Therefore, the choice of a homogeneous area such as the southern Apuan Alps is fundamental for having a reliable database.

Conclusions
Many methods for the rainfall thresholds individuation are used worldwide to determine the rainfall characteristics (in terms of duration, intensity, cumulated rainfall, and other similar parameters) which can initiate shallow landslides.Nevertheless, the great experience and numerous attempts of the many researchers who have dedicated themselves to this type of research cannot avoid the fact that different approaches can produce results that are not always comparable.Furthermore, the reliability of the databases used significantly affects the results obtained.Despite these limitations, we think that these methodologies can offer a valid support, obtainable with reasonable costs and time, which can help authorities to manage alert systems in shallow landslide-prone areas, especially in small and homogeneous environments.Indeed, it is crucial to underline again that choosing a small and homogeneous area allows us to obtain thresholds high enough not to issue a state of alert, even in the case of a not-particularly-intense meteorological phenomena.In this way, it is possible to avoid having a high number of false alarms, as sometimes happens in the case of thresholds calculated on a larger scale (regional or even national).
Specifically, the southern Apuan Alps include these characteristics: a small homogeneous area and high susceptibility to heavy rainstorms inducing shallow landslides.These conditions are optimal for the application of the rainfall threshold's methods in their best conditions, namely achieving a limited number of false and missed alarms.In this research, three statistical approaches (logistic regression (LR), quantile regression (QR), and least-squares linear fit (LSQ)), were applied for the first time in the study area to obtain the rainfall thresholds able to initiate shallow landslides, also comparing them with a previous attempt made by [27] using a non-statistical technique.These approaches were applied to two different datasets, including rainfall data and landslide occurrence information, related to the periods of 1975-2002 and 2008-2016, respectively.The best thresholds obtained for each dataset in terms of efficiency are TLR_55 (Ef = 0.898), TQR_55 (0.886), TLSQ_55 (0.892) for the 1975-2002 dataset and TLR_50 (0.885), TQR_50 (0.842), TLSQ_45 (0.852) for the 2008-2016 dataset, considering the predictive capacity of the thresholds obtained using different methodologies to be acceptable.The comparison between the rainfall thresholds computed by a simple manual fitting and the new statistical ones basing the same dataset do not show significant differences, making it possible to test the thresholds obtained for the management of alerts.
The following step Is necessarily a testing period of the obtained rainfall thresholds to verify their effectiveness and applicability, and then adjust them accordingly, before a final use of the authorities responsible for risk management in a densely populated area.Naturally, the concrete use of these thresholds by civil protection requires close coordination with the public bodies that issue weather forecasts, which should be able to provide reliable data in time for their processing.
Finally, we have been informed that a municipal administration of the study area is acquiring data on rainfall thresholds to start an experimental phase.We believe that this could be a flattering result for scientific research on this topic.

Figure 1 .
Figure 1.Location map and main morphological features of the Apuan Alps (white circle: Retignano raingauge; white triangles: main mountains and relative altitude in m a.s.l.; blue lines: main rivers and torrents.Being a 3D-perspective, the upper and lower scale are showed, along with the distance between two towns, as scale references (3D-perspective by Google Maps).

Figure 2 .
Figure 2. Shallow landslides activated by the rainstorm of 19 June 1996.The upper left inset shows the main village destroyed by the debris torrents, causing 13 deaths (adapted with permission from [14].2004, Elsevier).

Figure 1 . 26 Figure 1 .
Figure 1.Location map and main morphological features of the Apuan Alps (white circle: Retignano raingauge; white triangles: main mountains and relative altitude in m a.s.l.; blue lines: main rivers and torrents.Being a 3D-perspective, the upper and lower scale are showed, along with the distance between two towns, as scale references (3D-perspective by Google Maps).

Figure 2 .
Figure 2. Shallow landslides activated by the rainstorm of 19 June 1996.The upper left inset shows the main village destroyed by the debris torrents, causing 13 deaths (adapted with permission from [14].2004, Elsevier).

Figure 2 .
Figure 2. Shallow landslides activated by the rainstorm of 19 June 1996.The upper left inset shows the main village destroyed by the debris torrents, causing 13 deaths (adapted with permission from [14].2004, Elsevier).

Figure 6 .
Figure 6.Comparison among the upper and lower rainfall thresholds of [27] (red and blue lines, respectively) and those obtained by the three statistical methods (yellow, green, violet lines) considering the maximum efficiency related to 1975-2002 period.Blue squares: events that did not trigger landslides (NSLE); red lozenges: events that triggered landslides (SLE).

Figure 6 .
Figure 6.Comparison among the upper and lower rainfall thresholds of[27] (red and blue lines, respectively) and those obtained by the three statistical methods (yellow, green, violet lines) considering the maximum efficiency related to 1975-2002 period.Blue squares: events that did not trigger landslides (NSLE); red lozenges: events that triggered landslides (SLE).

Water 2024 , 11 Figure 7 .
Figure 7. Distribution of main rainfall events that occurred in the southern Apuan Alps from to 2016 in relation to season.

Figure 8 .
Figure 8. Distribution of rainfall events inducing shallow landslides (SLE) in the southern A Alps from 2008 to 2016 in relation to season.

Figure 7 .
Figure 7. Distribution of main rainfall events that occurred in the southern Apuan Alps from 2008 to 2016 in relation to season.

Figure 7 .
Figure 7. Distribution of main rainfall events that occurred in the southern Apuan Alps to 2016 in relation to season.

Figure 8 .
Figure 8. Distribution of rainfall events inducing shallow landslides (SLE) in the southe Alps from 2008 to 2016 in relation to season.

Figure 8 .
Figure 8. Distribution of rainfall events inducing shallow landslides (SLE) in the southern Apuan Alps from 2008 to 2016 in relation to season.

Figure 9 .
Figure9.ID log-log plot of rainfall events SLEs and NSLEs of the 2008-2016 period reconstructed using the algorithm[52].Green line separates events considered significant (above the green line) from events that instead cause a bias in the results (below the green line) due to an increase of the number of TN in the contingency tables and, therefore, a considerable increase of the efficiency values (Ef).Blue squares: events that did not trigger landslides (NSLE); red lozenges: events that triggered landslides (SLE).

Figure 9 .
Figure9.ID log-log plot of rainfall events SLEs and NSLEs of the 2008-2016 period reconstructed using the algorithm[52].Green line separates events considered significant (above the green line) from events that instead cause a bias in the results (below the green line) due to an increase of the number of TN in the contingency tables and, therefore, a considerable increase of the efficiency values (Ef).Blue squares: events that did not trigger landslides (NSLE); red lozenges: events that triggered landslides (SLE).

Water 2024 , 26 Figure 10 .
Figure 10.ID log-log plot of rainfall events SLEs and NSLEs of the 2008-2016 period located above the discriminating line shown in Figure9.Blue squares: events that did not trigger landslides (NSLE); red lozenges: events that triggered landslides (SLE).

Figure 10 .
Figure 10.ID log-log plot of rainfall events SLEs and NSLEs of the 2008-2016 period located above the discriminating line shown in Figure9.Blue squares: events that did not trigger landslides (NSLE); red lozenges: events that triggered landslides (SLE).

Water 2024 ,
16, x FOR PEER REVIEW 15 of 26

Figure 12 .
Figure12.Rainfall thresholds obtained using the QR method for the 2008-2016 period and corresponding to different landslide probabilities (1-50%).The TQR_10 and TQR_15 lines are perfectly superimposed (only the light blue one is shown).Blue squares: events that did not trigger landslides (NSLE); red lozenges: events that triggered landslides (SLE).

Figure 12 .
Figure12.Rainfall thresholds obtained using the QR method for the 2008-2016 period and corresponding to different landslide probabilities (1-50%).The TQR_10 and TQR_15 lines are perfectly superimposed (only the light blue one is shown).Blue squares: events that did not trigger landslides (NSLE); red lozenges: events that triggered landslides (SLE).

Figure 13 .Table 14 .
Figure 13.Rainfall thresholds obtained using the LSQ method for the 2008-2016 period and corresponding to different landslide probabilities (1-50%).Blue squares: events that did not triggered landslides (NSLE); red lozenges: events that triggered landslides (SLE).Table 14.Summary of contingencies and efficiency values (Ef) of the forecasts of SLEs and NSLEs using the three best thresholds computed on the 2008-2016 dataset before (A) and after (B) the reduction of the dataset shown in Figures 8 and 9.

Figure 14 .
Figure 14.Rainfall thresholds with maximum efficiency computed for the 2008-2016 period shown in Table14Bobtained with the three different statistical methods.In this case there is a strong similarity between the QR and LSQ thresholds, reminiscent of the 1975-2002 period.Blue squares: events that did not triggered landslides (NSLE); red lozenges: events that triggered landslides (SLE).

Figure 14 .
Figure 14.Rainfall thresholds with maximum efficiency computed for the 2008-2016 period shown in Table14Bobtained with the three different statistical methods.In this case there is a strong similarity between the QR and LSQ thresholds, reminiscent of the 1975-2002 period.Blue squares: events that did not triggered landslides (NSLE); red lozenges: events that triggered landslides (SLE). .

4. 3 .
Combination of 2008-2016 SLE and NSLE Events and the Thresholds Calculated on the Dataset Related to the 1975-2002 Period After the calculation of the rainfall threshold curves through the three fit methodologies (LR, QR, LSQ) on the respective datasets (1975-2002 and 2008-2016 periods), the last statistical processing carried out in this work aimed to validate the thresholds calculated regarding the 1975-2002 period by joining them with a different dataset (in this case, the most recent data of the 2008-2016 period).Using the R software, the 2008-2016 dataset was combined with the parameters of the threshold curves calculated on the 1975-2002 dataset, obtaining a new series of three graphs (Figures 15-17 ) and contingency tables (Tables15-17).

Figure 15 .
Figure 15.Rainfall thresholds obtained using the LR technique.Blue squares: events that did not trigger landslides (NSLE); red lozenges: events that triggered landslides (SLE).
) and skill scores (POD, POFD, POFA Ef, HK) for the 2008-2016 dataset and the thresholds calculated regarding the data relating to the 1975-2002 period obtained using the QR method.Maximum efficiency value is reached with the curve highlighted in red relating to an exceedance probability of 0.5 (Ef = 0.863), which is considered better for the higher POD and HK parameters.

Figure 16 .
Figure 16.Rainfall thresholds obtained using the QR technique.Blue squares: events that did not trigger landslides (NSLE); red lozenges: events that triggered landslides (SLE).

Figure 16 .
Figure 16.Rainfall thresholds obtained using the QR technique.Blue squares: events that did not trigger landslides (NSLE); red lozenges: events that triggered landslides (SLE).

Table 18 .
Summary of the contingencies and the skill scores of the forecasts of the SLEs and NSLEs using the three best thresholds computed with the threshold parameters of the 1975-2002 dataset combined with the 2008-2016 dataset.

Figure 18 .
Figure 18.The three rainfall thresholds with maximum efficiency computed using the combination of the rainfall data of the 2008-2016 period and the threshold parameters of the 1975-2002 period.Blue squares: events that did not triggered landslides (NSLE); red lozenges: events that triggered landslides (SLE).

Figure 18 .
Figure 18.The three rainfall thresholds with maximum efficiency computed using the combination of the rainfall data of the 2008-2016 period and the threshold parameters of the 1975-2002 period.Blue squares: events that did not triggered landslides (NSLE); red lozenges: events that triggered landslides (SLE).

Table 1 .
Values of α and β for the 1975-2002 rainfall and landslide dataset obtained with the LR method for different landslide probability p (C.I.: confidence interval).

Table 2 .
Values of α and β for the 1975-2002 rainfall and landslide dataset obtained with the QR method for different exceeding probability p e (C.I.: confidence interval).

Table 3 .
Values of α and β for the 1975-2002 rainfall and landslide dataset obtained with the LSQ method for different exceeding probability p e (C.I.: confidence interval).

Table 4 .
Contingency table (TP, FN, FP, TN) and skill scores (POD, POFD, POFA, Ef, HK) of the thresholds obtained by LR method.Threshold showing the maximum efficiency value is highlighted in red.

Table 5 .
Contingency table (TP, FN, FP, TN) and skill scores (POD, POFD, POFA, Ef, HK) of the thresholds obtained using QR method.Threshold showing the maximum efficiency value is highlighted in red (TQR_60 shows the same results but is calculated for a higher probability).

Table 6 .
Contingency table (TP, FN, FP, TN) and skill scores (POD, POFD, POFA, Ef, HK) of the thresholds obtained using LSQ method.Threshold showing the maximum efficiency value is highlighted in red.

Table 12 .
Contingency table (TP, FN, FP, TN) and skill scores (POD, POFD, POFA Ef, HK) for the thresholds obtained using the QR method.Although the threshold relating to the ninetieth percentile of exceedance probability shows the highest efficiency (Ef = 0.852), the threshold highlighted in red was considered better for the higher POD and HK parameters.

Table 13 .
Contingency table (TP, FN, FP, TN) and skill scores (POD, POFD, POFA, Ef, HK) of the thresholds obtained using the LSQ method.Maximum efficiency value is reached with the threshold highlighted in red relating to an exceedance probability of 0.45 (Ef = 0.852).

Table 8 .
Values of α and β for the 2008-2016 rainfall and landslide dataset obtained with the LR method for different landslide probability p (C.I.: confidence interval).

Table 9 .
Values of α and β for the 2008-2016 rainfall and landslide dataset obtained with the QR method for different exceeding probability p e (C.I.: confidence interval).

Table 10 .
Values of α and β for the 2008-2016 rainfall and landslide dataset obtained with the LSQ method for different exceeding probability p e (C.I.: confidence interval).

Table 11 .
Contingency table (TP, FN, FP, TN) and skill scores (POD, POFD, POFA Ef, HK) for the thresholds obtained by the LR method.The threshold showing the maximum efficiency (Ef = 0.885) is highlighted in red.NA: not available.

Table 12 .
Contingency table (TP, FN, FP, TN) and skill scores (POD, POFD, POFA Ef, HK) for the thresholds obtained using the QR method.Although the threshold relating to the ninetieth percentile of exceedance probability shows the highest efficiency (Ef = 0.852), the threshold highlighted in red was considered better for the higher POD and HK parameters.

Table 13 .
Contingency table (TP, FN, FP, TN) and skill scores (POD, POFD, POFA, Ef, HK) of the thresholds obtained using the LSQ method.Maximum efficiency value is reached with the threshold highlighted in red relating to an exceedance probability of 0.45 (Ef = 0.852).

Table 14 .
Summary of contingencies and efficiency values (Ef) of the forecasts of SLEs and NSLEs using the three best thresholds computed on the 2008-2016 dataset before (A) and after (B) the reduction of the dataset shown in Figures8 and 9.

Table 17 .
Contingency table (TP, FN, FP, TN) and skill scores (POD, POFD, POFA, Ef, HK) for the 2008-2016 dataset period and the thresholds calculated on the data relating to the 1975-2002 period obtained using the LSQ method.Maximum efficiency value is reached with the curve highlighted in red relating to an exceedance probability of 0.30 (Ef = 0.858).NA: not available.

Table 15 .
Contingency table (TP, FN, FP, TN) and skill scores (POD, POFD, POFA Ef, HK) for the dataset of the 2008-2016 period and the rainfall thresholds calculated regarding the 1975-2002 dataset obtained using the LR method.Maximum efficiency value is reached with the curve highlighted in red relating to a landslide probability of 0.1 (Ef = 0.858).NA: not available.

Table 16 .
Contingency table (TP, FN, FP, TN) and skill scores (POD, POFD, POFA Ef, HK) for the 2008-2016 dataset and the thresholds calculated regarding the data relating to the 1975-2002 period obtained using the QR method.Maximum efficiency value is reached with the curve highlighted in red relating to an exceedance probability of 0.5 (Ef = 0.863), which is considered better for the higher POD and HK parameters.

Table 17 .
Contingency table (TP, FN, FP, TN) and skill scores (POD, POFD, POFA, Ef, HK) for the 2008-2016 dataset period and the thresholds calculated on the data relating to the 1975-2002 period obtained using the LSQ method.Maximum efficiency value is reached with the curve highlighted in red relating to an exceedance probability of 0.30 (Ef = 0.858).NA: not available.

Table 18 .
Summary of the contingencies and the skill scores of the forecasts of the SLEs and NSLEs using the three best thresholds computed with the threshold parameters of the 1975-2002 dataset combined with the 2008-2016 dataset.
Water 2024, 16, x FOR PEER REVIEW 21 of 26