Reanalysis of Soil Moisture Used for Rainfall Thresholds for Rainfall-Induced Landslides: The Italian Case Study

Abstract: Landslides are one of the most frequent natural disasters that can endanger human lives and property. Therefore, prediction of landslides is essential to reduce economic damage and save human lives. Numerous methods have been developed for the prediction of landslide triggering, ranging from simple methods that include empirical rainfall thresholds to more complex ones that use sophisticated physically or conceptually based models. Reanalysis of soil moisture data could be one option to improve landslide forecasting accuracy. This study used the publicly available FraneItalia database, which contains almost 9000 landslide events that occurred in Italy in the 2010–2017 period. The Copernicus Uncertainties in Ensembles of Regional Reanalyses (UERRA) dataset was used to obtain precipitation and volumetric soil moisture data. The results of this study indicated that precipitation information is still a much better predictor of landslide triggering than the reanalyzed (i.e., not very detailed) soil moisture data. This conclusion is valid at both the local (i.e., grid) and regional (i.e., catchment-based) scales. Additionally, at the regional scale, soil moisture data can only predict a few landslide events (i.e., on average around one) that are not otherwise predicted by the simple empirical rainfall threshold approach, which on average predicted around 18 events (i.e., 55% of all events). Nevertheless, additional investigation is needed using other (more complete) landslide databases and other (more detailed) soil moisture products.


Introduction
Landslides are one of the most common natural hazards in hilly and mountainous regions around the globe. Numerous landslide types exist [1][2][3], and they pose a serious risk to populations and infrastructure in landslide-prone areas [4,5]. For example, globally, over 55,000 fatalities were reported in the period from 2004 to 2016 [5], 1370 fatalities and 784 injuries in 27 European countries were reported in the period from 1995 to 2014 [6], and 204 reported landslides in Bangladesh caused 727 fatalities and 1017 injuries in the period from 2000 to 2018 [7]. The frequency of landslides increases with the frequency of extreme rainfall events [8], and thus the number of rainfall-induced landslides is expected to rise due to climate change and associated extreme rainfall events [9]. Rainfall-induced landslides are primarily triggered during rainfall events of variable characteristics, although the most extreme events are often critical [10]. A rainfall event is most commonly described by factors such as total rainfall amount (R in mm), event duration (D in hours or days), and intensity (I in mm/time unit). Since the first global rainfall threshold for shallow landslides and debris flows of Caine [11], numerous other rainfall thresholds have been developed. A review of recent literature on rainfall thresholds for landslide occurrence was prepared by Segoni et al. [12], who classified rainfall thresholds by the spatial scale of their definition, ranging from slope, local and basin scales, through regional and national scales, to the global scale. The development of rainfall thresholds is far from being a trivial task. Gariano et al. [13] studied Italian scientific literature published in the period 2008–2018 on the topic of rainfall thresholds for landslide triggering, and found that in this period, 163 thresholds were published worldwide, with just 65 being in Italy.
Recently, an additional focus has been given to procedures for landslide prediction that also take into consideration the hydrological processes responsible for their triggering [14]. Thus, other variables (not only rainfall data) could be used in the process of the landslide prediction with the aim to increase the accuracy [15,16]. For example, ERA5 reanalysis data (https://www.ecmwf.int/en/forecasts/datasets/reanalysis-datasets/era5, (accessed on 15 May 2021)) were recently used as an example in the case of the Campania Region of Southern Italy [16]. This study indicated that ERA5 soil moisture data can be regarded as a proxy of slope wetness conditions [16]. However, it was tested on a relatively small region. Additionally, some other remote sensing data products could provide useful input for the landslide initiation [17]. A recent study indicated that global rainfall products still have issues in detection of precipitation spatiotemporal patterns [18]. Conversely, products with a better potential generated by regional climate models were detected in some other parts of Asia [19] or in Europe [20].
Early warning systems have proven to be a valuable tool for natural disaster risk reduction [21]. In order to forecast landslides and to evacuate populations from hazardous areas in a timely manner, landslide early warning systems (LEWS) have been developed worldwide. LEWS are usually built from several parts: a monitoring system, a selected rainfall threshold, and a warning model that should be evaluated from its performance perspective and how it is perceived by the population [22]. A feature of LEWS is often the empirical rainfall thresholds or some other more physically based thresholds [23,24].
Because of the high spatial and temporal variability of soil moisture, which results from spatiotemporal variations of the associated meteorological (precipitation, temperature, solar radiation, wind speed, and humidity) and biogeophysical (soil properties, topographic features, and vegetation characteristics) parameters, the measurement and monitoring of large-scale soil moisture dynamics is challenging [25]. Therefore, for landslide hazard assessment, the estimation of the wetness conditions of the soil is usually addressed by using antecedent precipitation indices [26,27]. However, Pelletier et al. [28], among others [29,30], recommended replacing the use of antecedent precipitation indices because they are frequently poorly correlated with the actual soil moisture observations [31], as concluded by Ponziani et al. [32]. Soil moisture can be estimated using in-situ measurements, remote sensing techniques, and soil water balance simulation models, such as those provided by Lazzari et al. [33,34]. Wicki et al. [35] showed that in-situ soil moisture data contain some useful information for landslide prediction. Ponziani et al. [32] suggested that the integration of these three tools may provide reliable estimates of soil moisture at the temporal and spatial resolutions required for operational activities related to shallow landslide warning systems [36][37][38][39]. Therefore, reanalysis data, such as the Copernicus Uncertainties in Ensembles of Regional Reanalyses (UERRA) dataset [40,41], could be useful in describing the soil moisture conditions. In cases where additional variables such as soil moisture are to be tested from the perspective of landslide triggering prediction, one needs a good landslide database in the form of a landslide inventory. For such an inventory, adequate spatial and temporal accuracy is of significant importance.
For example, for NW Spain, the dataset covers the period 1980–2015 with 2063 landslide records, out of which 59% show an exact spatial location and 51% provide accurate dates, showing the usefulness of press archives and temporal records [42]. Furthermore, the type and size (volume) of landslides in such an inventory are also important. Other landslide databases are available in Europe [43][44][45]. However, many databases are not publicly accessible. One of the freely available and open databases with a large number of landslide entries was recently developed by Calvello and Pecoraro [45] and covers the territory of Italy in the period from 2010 to 2017.
The main aim of this paper is to test whether reanalysis soil moisture data can provide valuable information for LEWS at the large-regional scale using the publicly available FraneItalia database [45]. An Italian case study is believed to be a good choice, since the landslide density in Italy is one of the highest in the world: the Italian Landslide Inventory includes 486,336 landslides [46,47] that affect an area of about 20,800 km² or 6.9% of Italy; this gives an average landslide density in the affected area of over 23 landslides/km², or, taking the whole territory of Italy (301,388 km²) into account, an average density of 1.6 landslides/km². The latest numbers on the webpages of the Inventario dei Fenomeni Franosi in Italia (IFFI) project [48] show that 620,808 landslides affect around 23,700 km² or 7.9% of the Italian territory. There are other landslide catalogs available in Italy, such as landslide reports published by the Istituto Superiore per la Protezione e la Ricerca Ambientale (ISPRA) or the Consiglio Nazionale delle Ricerche (CNR)-Polaris database, which generally include a smaller number of landslides compared to the IFFI project [45]. Therefore, the main idea of this study is to investigate whether not very detailed (i.e., reanalyzed) soil moisture data can provide an added value in terms of landslide prediction, and how such soil moisture data perform in comparison to simple empirical rainfall threshold curves. The investigation was conducted at the local and regional scales by counting correctly predicted landslides using different approaches and by comparing multiple statistical metrics.

Reanalysis Data
In the scope of this study, the Copernicus Uncertainties in Ensembles of Regional Reanalyses (UERRA) dataset was used. The UERRA dataset includes computations of near-surface and surface essential climate variables from the MESCAN-SURFEX and UERRA-HARMONIE systems [49]. Additional information about the reanalysis data can be found in the existing literature [40,41,49]. In general, the reanalysis combines historical observations (i.e., in situ, surface and satellite remote sensing) with a dynamic model with the aim of providing a coherent description of past climate conditions [49]. In this study, we used the volumetric soil moisture (VSM) [m³/m³] (i.e., m³ of water per m³ of soil) from the approximately 11 km × 11 km (i.e., 121 km²) UERRA-HARMONIE system and the 24 h total precipitation (TP) [mm] from the approximately 5.5 km × 5.5 km MESCAN-SURFEX system. The VSM represents the amount of water per cubic meter of soil that is valid for the grid cell at the corresponding soil level [49]. It is available for the analysis and forecast time steps [49]. Additionally, for the forecast time steps, this variable is also available at approximately 5.5 km × 5.5 km resolution as part of the MESCAN-SURFEX system. The TP represents the amount of water falling onto the ground or water surface, includes all forms of precipitation, and is valid for a grid cell [49]. The horizontal coverage of the UERRA dataset spans from the northern tip of Scandinavia to northern Africa and from the Ural to the Atlantic Ocean. However, in the scope of this study, the focus was Italy, where a publicly available landslide database was freely accessible (Section 2.2). We also checked other European countries where landslide databases exist but are not openly accessible [44]. It should be noted that the selected UERRA dataset contains its own uncertainties but can be regarded as one of the best available datasets with high spatial and temporal resolution covering the entirety of Europe [41,50].
The UERRA dataset was recently used for the investigation of changes in the rainfall events characteristics above the empirical rainfall thresholds [51].

FraneItalia Landslide Database and HydroSHEDS Dataset
To test whether soil moisture data significantly contribute to landslide initiation, we analyzed landslide events from a catalog of recent Italian landslides, called FraneItalia [45]. The catalog contains 8931 landslides from 2010 to 2017 and is based on online sources (news, articles) retrieved with the Google search engine. It is clear that this database includes only a small share (i.e., up to around 10%) of all the landslides that occurred in Italy in this period, since the total number of landslides in Italy is much larger [46,47]. However, it can be assumed that landslides reported in FraneItalia are among the largest ones, since they were mentioned in the online sources. Events affecting infrastructure or human lives are the ones most likely to be mentioned by the media, while smaller landslides occurring in remote places are not expected to be mentioned. About 2% of the included events had very severe consequences and around 14% had severe consequences [45]. Georeferenced landslide events are divided into single landslide events (SLE-Single), for records reporting only one landslide, and areal landslide events (ALE-Areal), for records referring to multiple landslides triggered by the same cause in the same geographic area. Both SLEs and ALEs are classified into three consequence classes, depending on whether the event resulted in casualties or missing persons, injured persons, or no physical damage to people. Landslide event information collected in the catalog always includes data on the location of the event, the day the landslide occurred, the source of the information, and the number of landslides for areal events. Additional information may include the start and duration of the landslide event, the phase of activity, details of consequences, and characteristics of the landslide (e.g., landslide volume). Information on landslide volume is known for less than 15% of landslides, of which only 1% are deep-seated (volume greater than 10,000 m³).
Calvello and Pecoraro [45] provided a temporal and spatial representation of the landslides included in the FraneItalia database. The regions with the highest number of landslides in FraneItalia are Toscana and Veneto, while the region with the smallest number of events is Puglia [45]. In central and southern Italy landslides mostly occur in autumn and winter, while in northern Italy landslides also frequently occur in summer [45]. The open access FraneItalia catalog is available as a PostgreSQL binary dump file at the following link: https://landsliderisk.wordpress.com/dissemination/franeitalia (accessed on 15 May 2021).
Italy was divided into sub-catchments using the 7th Pfafstetter level catchment boundaries from the Hydrological data and maps based on SHuttle Elevation Derivatives at multiple Scales (HydroSHEDS) database [52,53] (Figure 1). Thus, the whole of Italy and parts of the surrounding countries were divided into 136 sub-catchments with a mean catchment area of 2290 km², ranging from 19 to 25,800 km² (Figure 1). As a result, around 10,000 km² of the area located in Italy's neighboring countries was also included. The idea behind using the large sub-catchment delineation was to test whether soil moisture data could provide valuable information for a landslide warning system at the regional level.

Figure 1. Location of all landslides used (FraneItalia database) and HydroSHEDS catchment boundaries (i.e., 7th level) that were used to test the added value of soil moisture data at the regional level.

Statistical Investigation
In order to investigate the benefit of the soil moisture data for landslide initiation, the grid cell values of total precipitation (TP) and volumetric soil moisture (VSM) at the locations of the landslides in the FraneItalia catalog were extracted first. This means that, for the reported day of landslide triggering (FraneItalia), the TP and VSM values were extracted from the UERRA dataset. Additionally, antecedent information for both variables was analyzed (i.e., 3, 7 and 10 days). For example, a 3 day antecedent value was calculated as the mean of the values reported for the 3 days before the triggering date (a similar calculation was done for the 7 day and 10 day values). Based on the extracted data, percentiles of TP and VSM were calculated in order to see whether the TP and VSM values on the day of triggering were among the highest percentiles compared to the entire investigated period (i.e., 2010–2017).
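The antecedent averaging and the percentile ranking described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names are hypothetical, the triggering day itself is assumed to be excluded from the antecedent window, and the percentile is assumed to be the share of reference values that are less than or equal to the value of interest.

```python
from bisect import bisect_right


def antecedent_mean(series, trigger_idx, n_days):
    """Mean of the n_days values before trigger_idx.

    Assumption: the triggering day itself is excluded from the window.
    """
    start = trigger_idx - n_days
    if start < 0:
        raise ValueError("not enough antecedent days before the trigger date")
    return sum(series[start:trigger_idx]) / n_days


def rolling_antecedent_means(series, n_days):
    """Antecedent mean for every day that has a full n_days window."""
    return [antecedent_mean(series, i, n_days)
            for i in range(n_days, len(series))]


def empirical_percentile(reference, value):
    """Percentage of reference values that are <= value."""
    ordered = sorted(reference)
    return 100.0 * bisect_right(ordered, value) / len(ordered)


# Synthetic daily TP values (mm) for one grid cell; index 5 is the trigger day.
daily_tp = [0.0, 0.0, 2.0, 4.0, 6.0, 30.0, 1.0, 0.0, 0.0, 5.0]
trigger_idx = 5

tp_3day = antecedent_mean(daily_tp, trigger_idx, 3)
reference_3day = rolling_antecedent_means(daily_tp, 3)

pct_daily = empirical_percentile(daily_tp, daily_tp[trigger_idx])
pct_3day = empirical_percentile(reference_3day, tp_3day)
print(f"daily percentile: {pct_daily:.1f}, 3 day percentile: {pct_3day:.1f}")
```

In practice the reference distribution would cover every day of the 2010–2017 period for the grid cell in question, and the same functions would be applied to the VSM series.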
In the next step, the added value of the soil moisture data was investigated at the large-regional scale using the 7th Pfafstetter level catchment boundaries (Section 2.2). For each sub-catchment, the average TP and VSM for each day were calculated first. Then, the individual rainfall events were determined. A 24 h period without rainfall was used to separate individual rainfall events [15]. Thus, for all rainfall events (for all sub-catchments), the total rainfall amount, mean rainfall intensity and total rainfall duration were calculated. In the following step, the normalized global empirical rainfall threshold curve proposed by Guzzetti et al. [54] was used to test how many landslides occurred when rainfall events were located above or below this empirical rainfall threshold curve. This global threshold, which is normalized by the mean annual precipitation, was selected because it uses normalized rainfall (i.e., to account for different climate conditions) and because, as a global threshold, it should also be applicable to Italy. The selected threshold is a power-law curve of the following form [54]:

$$I_{MAP} = \alpha \, D^{-\beta} \qquad (1)$$

where D is rainfall duration [h], I_MAP is normalized rainfall intensity [h⁻¹] based on the mean annual precipitation in the area (MAP) [mm], and α and β are the coefficients reported in [54]. It should be noted that this threshold is valid for durations from 0.1 to 1000 h [54]. Moreover, this threshold accounts for shallow landslides and debris flows [54]. It should also be noted that landslides that occurred on the same day in the same region were considered only once (i.e., were merged). Furthermore, many empirical rainfall thresholds are available around the world, and also specifically for Italy (e.g., [54]), so one could also test some other thresholds.
However, in order to obtain a more robust comparison and avoid additional bias due to the use of multiple regional thresholds for different parts of Italy, it was decided to use one (global) threshold for the entire study area. For the evaluation of the soil moisture information, a relatively simple percentile value concept that is frequently used in relation to the definition of extreme heat was applied [55][56][57]. Besides daily VSM, 3, 7 and 10 day average antecedent conditions were tested. Different percentile thresholds were tested, including 90%, 75% and 50%. Thus, the idea was to check how many landslides occurred when the regionally averaged soil moisture was above or below the simple percentile threshold.
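The event separation and threshold check described in this section can be sketched as follows. This is a hedged illustration under stated assumptions: with daily data, a 24 h period without rainfall is taken to be one day with TP equal to zero, the threshold is assumed to have the power-law form I_MAP = alpha * D^(-beta), and the values of `alpha`, `beta` and `mean_annual_precip` below are illustrative placeholders, not the coefficients of [54] or values from the study. All function names are hypothetical.

```python
def split_rainfall_events(daily_tp, dry_threshold=0.0):
    """Group consecutive wet days into events.

    Assumption: one day with TP <= dry_threshold (24 h without rain)
    separates two events.
    """
    events, current = [], []
    for day, tp in enumerate(daily_tp):
        if tp > dry_threshold:
            current.append((day, tp))
        elif current:
            events.append(current)
            current = []
    if current:
        events.append(current)
    return events


def event_stats(event, mean_annual_precip):
    """Total rainfall, duration, mean intensity and MAP-normalized intensity."""
    total_mm = sum(tp for _, tp in event)
    duration_h = 24.0 * len(event)
    intensity = total_mm / duration_h  # mm/h
    return {
        "start_day": event[0][0],
        "end_day": event[-1][0],
        "total_mm": total_mm,
        "duration_h": duration_h,
        "intensity_mm_h": intensity,
        "i_map": intensity / mean_annual_precip,  # 1/h
    }


def exceeds_threshold(stats, alpha, beta):
    """True if the event lies on or above I_MAP = alpha * D**(-beta)."""
    return stats["i_map"] >= alpha * stats["duration_h"] ** (-beta)


# Synthetic sub-catchment averaged daily TP (mm).
daily_tp = [0.0, 12.0, 30.0, 0.0, 0.0, 5.0, 0.0]
mean_annual_precip = 1000.0  # mm, placeholder
alpha, beta = 0.002, 0.4     # placeholders, NOT the coefficients of [54]

events = split_rainfall_events(daily_tp)
stats = [event_stats(e, mean_annual_precip) for e in events]
flags = [exceeds_threshold(s, alpha, beta) for s in stats]
for s, above in zip(stats, flags):
    print(s["start_day"], s["end_day"], s["total_mm"], s["duration_h"], above)
```

A landslide would then be counted as captured by the rainfall threshold if its reported date falls within an event for which the flag is true.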

Results and Discussion
This section presents and discusses the results. First, the results at the local (i.e., point/grid cell) scale are presented, and then the results at the large-regional scale are shown.

Local Approach
First, local (i.e., grid) values on the triggering dates of the landslides were extracted. Figure 2 shows an example of the TP and VSM data on 1 January 2010 (i.e., the beginning of the investigated period). It can be seen that the selected sub-catchments (i.e., 136 in total) have a relatively uniform VSM distribution at the sub-catchment level (Figure 2). Thus, the VSM is not as spatially variable as the TP, which shows larger grid-by-grid variations (Figure 2). These variations are even larger in summer, when local thunderstorms can have an even smaller extent than the grid cell area. Therefore, the application of different sub-catchments (e.g., 6th or even 5th Pfafstetter level catchment boundaries) could partially improve the spatial representation of the VSM and especially the TP. However, using more sub-catchments would reduce the number of landslides per sub-catchment. As can be seen in Figure 1, this number was already relatively low in some cases, while it was relatively high in some other sub-catchments. For the dates when landslides were reported in the FraneItalia database (both SLE-Single and ALE-Areal were tested), the VSM and TP grid cell values were extracted. The same applies for the 3, 7 and 10 day antecedent values. Figure 3 shows boxplots of percentile values of the VSM and TP on the triggering dates and of the antecedent values. It can be seen that using daily values on the date of reported triggering generally yielded worse results compared to the case when antecedent values were used (Figure 3). The highest percentile values were generally found for the 3 day antecedent values (Figure 3). With the consideration of 7 or 10 day antecedent conditions, the performance generally decreases (Figure 3). The same conclusion can be drawn both for the TP and VSM and for the SLE-Single and ALE-Areal landslides according to the FraneItalia database. The median percentiles for the 3 day antecedent TP and VSM were around 85%.
This means that for 50% of the reported landslides, the TP and VSM were higher than the values associated with the specific triggering date on at least 15% of the days in the 2010–2017 period (Figure 3). More specifically, this means that on around 400–450 days (in the 8 year period) the VSM or TP values were higher than the VSM and TP values on the reported triggering date in the FraneItalia database. Moreover, the 1st and 3rd quartile values were around 60% and 95%, respectively, for both TP and VSM (Figure 3). It is clear that in some cases the reported landslides occurred on a day with relatively extreme local (i.e., grid cell) conditions (i.e., high percentile values shown in Figure 3). However, in many cases the local (i.e., grid cell) conditions were not as extreme as one could have expected (i.e., low percentile values shown in Figure 3). Thus, there are many days in the investigated period on which the TP and VSM values according to the UERRA dataset were higher than the values associated with landslide triggering. It should be noted that the selection of only temporally "certain" landslides (FraneItalia) did not yield significantly different results. The same applied for the case when only "numerous reported" landslides were considered. Clearly, these preliminary investigations are subject to many uncertainties, such as: (i) landslide locations in the FraneItalia database could be off by several kilometers (i.e., a neighboring grid cell was consequently used), (ii) landslides were triggered by extreme thunderstorms that were not correctly captured by the UERRA dataset, (iii) there are only a few landslides (in most cases only one) per grid cell in the FraneItalia database, and (iv) not all landslides that actually occurred are necessarily reported in FraneItalia. With respect to point (ii), it should be noted that the difference between gridded data and point data (e.g., gridded precipitation and a local rain gauge) can be quite significant. Thus, as pointed out by [16], an averaging effect should be taken into account. Therefore, additional investigation was performed at the sub-catchment level with the aim of testing whether soil moisture information can provide an added value that could potentially be used in the scope of landslide warning systems.
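As a quick consistency check of the figure of around 400–450 days quoted above, the median percentile of about 85% can be converted into a number of days. The only added assumption is the day count of the 2010–2017 period, which includes the leap years 2012 and 2016:

```latex
\begin{align*}
N_{\mathrm{days}} &= 8 \times 365 + 2 = 2922 \\
N_{\mathrm{higher}} &= (1 - 0.85) \times N_{\mathrm{days}} = 0.15 \times 2922 \approx 438
\end{align*}
```

The result of roughly 438 days lies within the stated range of 400–450 days.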

Regional Approach
Since precipitation data, compared to soil moisture information, are more easily and frequently measured (e.g., either by ground-based or remote sensing measurements), it was first tested how many landslides would be correctly captured by the simple normalized empirical rainfall threshold (Equation (1)). Due to the characteristics of the gridded data, normalized thresholds seem a better option than thresholds that do not use normalized values, since the difference between gridded and point data can be quite significant [16]. Additionally, it was tested how many landslides in the FraneItalia database would be additionally captured by the simple percentile soil moisture threshold. Figure 4 shows an example of the distribution of the rainfall events above or below the selected normalized empirical rainfall threshold and the variability in the 3 day summed VSM for one of the sub-catchments. It can be seen that a relatively large number of rainfall events are located above the selected empirical rainfall threshold (Figure 4). The same applies for the 90% percentile threshold, where the reported 3 day VSM values were above the 90% threshold multiple times. Furthermore, it was tested how many landslides in the FraneItalia database would be detected by such a simple normalized rainfall threshold if used at the sub-catchment scale (i.e., large-regional scale). It can be seen that such a normalized threshold would correctly predict (i.e., true positive) around 55% of the reported landslides (Table 1). It should be noted that the actual performance of such a threshold would be relatively low (i.e., a large number of false alarms) and that the main aim of this research was not to propose an optimal threshold but to test whether soil moisture can provide valuable information. In cases where the goal would be to optimize the performance of the selected threshold, some of the existing approaches could be used [15,58–60].
It should be noted that the use of a different empirical rainfall threshold would yield higher true positive values (e.g., calibration of a local threshold for each specific catchment). However, in such a case, the comparison would be less robust due to the effect of different rainfall thresholds on the results. Thus, as noted, the idea of this study was not to obtain the most suitable empirical rainfall threshold curve for each sub-catchment in Italy but rather to evaluate the added value of the VSM variable in terms of landslide prediction, using Italy as a case study. Moreover, some regions also had relatively large numbers of events, which could be reduced with the use of different catchment boundaries (i.e., smaller ones) or different definitions of regions. Additionally, Table 2 shows the performance of the simple 90% percentile threshold (i.e., a 3 day antecedent VSM was used). It can be seen that in this case, on average, only around 35% of the landslides reported in FraneItalia were correctly determined (i.e., true positive events). As in the case of the empirical rainfall threshold, the VSM percentile threshold also produced a relatively large number of so-called false alarms. It should be noted that there could be landslides that are not reported in the FraneItalia database. Additionally, it was also tested how many landslides were captured by the VSM percentile threshold and not by the empirical rainfall threshold (Table 2). It can be seen that the number of additionally detected events is almost negligible (Table 2). Thus, it seems that the use of additional information in the form of soil moisture did not significantly contribute to the accuracy of landslide prediction (i.e., on average, 2.6% of events were additionally captured by the VSM threshold, with a maximum value of 14.2%). Furthermore, it was also tested how the proportion of landslides detected using the percentile VSM threshold changes with different percentile values (Figure 5).
As expected, lowering the VSM percentile threshold leads to a smaller number of events below the threshold. However, even with the 50% percentile threshold, which is generally relatively low (i.e., it produces many false alarms), around 25% of landslides occurred below the VSM threshold (Figure 5). Thus, it is clear that the VSM percentile threshold (given the limitations of this study) does not yield results as good as one could perhaps expect based on the results presented in some other studies [16,23,24]. However, it should be noted that these studies used different definitions of soil moisture, such as the output of the hydrological model HBV [23,24]. Moreover, some other hydrological models such as GR4J (https://webgr.inrae.fr/en/models/daily-hydrological-model-gr4j/, (accessed on 15 May 2021)) also proved their added value in terms of landslide prediction, but such methods require other input data such as river discharge [15]. The main reason for the non-optimal performance of the VSM percentile threshold is the relatively weak dependence between the VSM and the number of landslides in the FraneItalia database at the monthly or annual level (Tables 3 and 4). It can be seen that a stronger dependence was obtained between the TP and the number of landslides (Tables 3 and 4). Additionally, it can also be seen that the dependence between the TP and VSM is the strongest (Tables 3 and 4). Thus, the TP characteristics clearly have a relatively large impact on the VSM at the monthly or annual level. This can partly explain the relatively low number of landslide events that were additionally detected using the VSM percentile threshold and that were not detected by the normalized empirical rainfall threshold (Table 2).
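The comparison between the two thresholds for one sub-catchment can be sketched as a simple set computation. This is an illustrative sketch only: the function name and the day indices are hypothetical, and an alarm day is assumed to be any day on which the respective threshold was exceeded.

```python
def detection_summary(landslide_days, rain_alarm_days, vsm_alarm_days):
    """Count landslide days captured by each threshold for one sub-catchment."""
    slides = set(landslide_days)
    rain_alarms = set(rain_alarm_days)
    vsm_alarms = set(vsm_alarm_days)

    rain_hits = slides & rain_alarms
    vsm_hits = slides & vsm_alarms
    vsm_only = vsm_hits - rain_hits  # captured by VSM but not by rainfall

    n = len(slides)

    def share(hits):
        # Percentage of reported landslide days; 0.0 if none were reported.
        return 100.0 * len(hits) / n if n else 0.0

    return {
        "rain_tp_pct": share(rain_hits),
        "vsm_tp_pct": share(vsm_hits),
        "vsm_additional_pct": share(vsm_only),
        "rain_false_alarms": len(rain_alarms - slides),
        "vsm_false_alarms": len(vsm_alarms - slides),
    }


# Synthetic day indices for one sub-catchment.
summary = detection_summary(
    landslide_days=[3, 10, 15, 22],
    rain_alarm_days=[3, 4, 10, 22, 30],
    vsm_alarm_days=[3, 15, 16],
)
print(summary)
```

Here the share of landslide days captured only by the VSM threshold corresponds to the "additionally captured" quantity discussed above; averaging it over all sub-catchments would give a figure comparable to the one reported in Table 2.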

Conclusions
Accurate monitoring of soil moisture has a vast range of applications. It can be the basis for numerous predictions, including rainfall estimation or flood forecasting. It can potentially also be used to predict landslides and soil erosion. This study provides a preliminary investigation of the added value of the VSM data obtained from the UERRA reanalysis from the perspective of landslide prediction. Based on the presented results, it can be concluded that the VSM information did not provide much benefit from the perspective of landslide prediction. This was especially evident in comparison with the simple empirical normalized rainfall threshold curve. More specifically, on average, around 1 landslide per sub-catchment (on average 33 landslides were detected per sub-catchment) was captured by the simple 90% percentile threshold value and not detected by the selected empirical rainfall threshold. Thus, the relatively strong dependence between the TP and VSM indicates that information about precipitation is still an essential input to landslide warning systems. However, it should be noted that if other (and more detailed) soil moisture information were selected (e.g., a different reanalysis product, satellite-measured soil moisture, better spatial resolution, consideration of smaller catchments, in-situ soil moisture sensors, etc.) together with different (more complete) landslide databases, a stronger benefit of the soil moisture data could be obtained. Similarly important is the fact that the FraneItalia landslide database used in this case study covers only roughly 10% of the landslides triggered in Italy in the period under investigation (2010–2017). A more complete landslide database with true georeferenced data and precise timing of landslide triggering would possibly yield better results, with a more important contribution of the soil moisture data. Palazzolo et al. [61] also indicated that the quality of the landslide inventory (i.e., landslide database) could be very important, and we confirmed this statement.
Therefore, it seems that relatively coarse gridded soil moisture data cannot provide much added value in terms of LEWS, and that data with better spatial resolution could lead to a more significant contribution of soil moisture, since soil moisture has an effect on the triggering of rainfall-induced landslides. Thus, this study suggests that additional investigations are needed to evaluate the potential of satellite soil moisture data from the landslide initiation perspective. Moreover, additional comparisons using gauge data could be carried out in order to evaluate the benefit of reanalysis soil moisture data.