Towards a National-Scale Dataset of Geotechnical and Hydrological Soil Parameters for Shallow Landslide Modeling

Abstract: One of the main constraints in assessing shallow landslide hazards through physically based models is the need to characterize the geotechnical parameters of the materials involved. The quantity and quality of the input data directly affect the reliability of the results of any model; data acquisition is therefore a critical and time-consuming step in every research activity. With this in mind, we reviewed all official certificates of the tests performed over 30 years at the Geotechnics Laboratory of the Earth Science Department (University of Firenze, Firenze, Italy) and compiled a dataset of 380 accurately geolocated points, each providing information about one or more geotechnical parameters used in slope stability modeling. All tests performed in the past (in the framework of previous research programs, agreements of cooperation, or to support didactic activities) were gathered, homogenized, digitized, and geotagged. The dataset is based on both on-site and laboratory tests. It comprises 40 attributes, of which 13 are descriptive (e.g., lithology or location) and 27 may be of direct interest as input parameters in slope stability modeling. The dataset is openly available and can be useful to scientists and practitioners engaged in landslide modeling.


Summary
Distributed physically based models represent the most rigorous and scientifically sound technique to model landslide occurrence in a given study area, as they use complex mathematical equations to account for all the physical processes involved in landslide triggering and mobilization [1]. Many models have been proposed to this aim [2][3][4][5][6][7][8][9][10]; nevertheless, their application over large areas is still rare and is limited by several drawbacks. The main one is the limited availability, for large areas, of detailed information on the geotechnical and hydraulic properties of soils (e.g., cohesion, internal friction angle, soil unit weight, and hydraulic conductivity) [11][12][13]. It is widely known that these parameters, although quasi-static (i.e., with values that can reasonably be considered constant in time), have a large spatial variability, and even two spots close to each other may exhibit very different values [14][15][16]. As a consequence, applying distributed slope stability models over large areas would require a large number of measurements to characterize sufficiently the physical properties of the materials involved [17][18][19].
To help fill this gap, a homogeneous dataset has been compiled to collect and organize a large number of existing measurements of parameters that can be used as input data for slope stability models. The dataset was compiled from the paper and electronic records of the Geotechnics Laboratory of the Earth Science Department of the University of Firenze (DST Lab henceforth). Since 1990, the DST Lab has carried out geotechnical in situ and laboratory analyses for soil and rock characterization to assist geotechnical modeling, environmental studies, and cultural heritage preservation. The outcome of each analysis carried out by the DST Lab staff has been recorded in a certificate reporting the details of the measuring method, the values of the parameters (or indexes) measured, and additional information such as the location and the date. This paper does not report the acquisition of new measurements; it concerns the gathering, digitization, and systematic organization of measurements that were originally performed in the framework of the institutional duties of the laboratory, including support to local, national, and European research projects, Ph.D., M.Sc., and B.Sc. theses, and agreements of cooperation with public institutions or private companies. Although the tests were performed all over Italy, their spatial distribution is clearly influenced by the aforementioned activities. Consequently, many measurements refer to already published research activities, and the most represented Italian regions in the dataset are those in which the case studies were located: Valle d'Aosta [20], Liguria [21], Tuscany [14,18,22], and Campania [7].
The measurements will be used by the researchers of the department for new applications, saving the time and resources otherwise needed to collect the data used to calibrate and validate distributed slope stability models, especially for shallow landslide modeling. Moreover, this open access publication extends this opportunity to the whole scientific and technical community engaged in slope stability modeling, as discussed in Section 4.

Data Description
The dataset is provided as a zip-compressed folder, which contains a shapefile to be imported into a GIS and an .xlsx spreadsheet that duplicates all the information contained in the attribute table of the shapefile (−9999 is used as the "no data" indicator). The shapefile contains 380 points, each representing a location where one or more soil properties were assessed (Figure 1). For in situ tests, the point represents the actual location where the test was performed, while for laboratory tests the point represents the location where the soil was sampled.
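Because −9999 marks missing values, it should be masked before any statistic is computed; otherwise it would silently distort means and ranges. The following is a minimal sketch, assuming the spreadsheet has been exported to CSV. The column names `id`, `wp`, and `ip` and all the values are hypothetical placeholders, not the actual field names or data of the dataset.

```python
import csv
import io

NO_DATA = -9999.0  # "no data" indicator stated in the dataset description


def clean_value(raw):
    """Return a float, or None when the cell is empty or holds the no-data code."""
    text = raw.strip()
    if not text:
        return None
    value = float(text)
    return None if value == NO_DATA else value


# Hypothetical excerpt standing in for a CSV export of the attribute table.
# Column names and values are placeholders chosen for illustration only.
sample = io.StringIO(
    "id,wp,ip\n"
    "1,21.5,12.0\n"
    "2,-9999,-9999\n"
    "3,18.0,-9999\n"
)

records = [
    {"id": int(row["id"]), "wp": clean_value(row["wp"]), "ip": clean_value(row["ip"])}
    for row in csv.DictReader(sample)
]

# Only valid measurements enter the statistic; no-data cells are skipped.
valid_wp = [r["wp"] for r in records if r["wp"] is not None]
mean_wp = sum(valid_wp) / len(valid_wp)
print(mean_wp)
```

Mapping the indicator to `None` rather than to zero keeps a missing measurement distinguishable from a measured value of zero.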
All relevant information is contained in the attribute table of the shapefile, which is composed of 40 fields. Each of them is reported in the following list, with a short description, the measurement unit and other useful information for the correct interpretation of the data contained:
• [27,28]; it expresses the minimum water content causing the transition from plastic behaviour to liquid behaviour of soils.
• W P [%]: plastic limit of Atterberg limits from hand rolling method [28]; it expresses the minimum water content causing the transition from solid behaviour to plastic behaviour of soils.
• I P [%]: plasticity index [28]; it defines the range of values of water content within which a soil has a plastic behaviour; as I P increases, the mechanical properties of the material decay.

Methods
All the laboratory certificates of the DST Lab were retrieved and analyzed. During the early years of activity only paper certificates were available, while since the 1990s certificates have been recorded digitally in .xls (and, later, .xlsx) spreadsheets. Overall, about 5000 certificates were found in the digital and paper archives of the DST Lab (an average of about 170 tests per year of activity). From this large amount of data, we selected only the certificates containing information useful for soil characterization in support of landslide modeling (i.e., the parameters described in the previous section). According to the protocols used in the laboratory, certificates were released only for tests that adhered to national and international standards; nevertheless, all certificates were checked, and those affected by possible flaws were discarded (e.g., when the notes reported possible disturbances or uncertainties).
To include the useful information in the shapefile and related tables, the location of the in situ measurements (or of the collected samples, in the case of laboratory tests) had to be reconstructed. This was the most difficult and time-consuming activity: typically, the certificates reported only the municipality or the name of a nearby locality, and only some of them reported the exact coordinates of the sampling or test site. However, each certificate also reported the names of the operators and the project connected to the measurement. This information made it possible to find the related documentation (project reports, theses, publications, or other documents), which in some cases included a map of the measuring points. When necessary, the personnel involved in the measurements were contacted and asked to pinpoint the exact or approximate location of each measurement with the support of aerial/satellite imagery, photographic documentation (when available), and topographic maps. The accuracy of the positioning is reported in the dataset field "precision". Most of the measurements (71.05%) are georeferenced with an error of less than one meter (class 1: coordinates acquired with GPS or exactly placed by means of technical cartography or remote-sensing images). A further 13.68% are georeferenced by the name of the locality found in the certificates or within the surroundings of a surveyed area (class 2): the positional uncertainty is typically a few hundred meters, but the points have been placed where the lithology matches the one described in the certificate. For the remaining 15.26% of the measurements, only the municipality is known (class 3), and the point has been placed at an arbitrary location within the municipal territory where the lithology matches the one described in the certificate.
Points in precision classes 2 and 3 should be handled with care: they cannot be taken directly as the actual spatial location of the measurements they represent; nevertheless, they can be used for the statistical characterization of the properties of the lithologies involved.
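The distinction between the precision classes can be applied as a simple filter: class 1 points where the exact position matters, and all points when only the statistics per lithology are needed. The sketch below illustrates this under stated assumptions: the field name "precision" comes from the dataset, while `lithology`, `phi` (friction angle, in degrees), and all the values are hypothetical placeholders.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: "precision" follows the three classes described above;
# "lithology", "phi" and every value are placeholders for illustration only.
points = [
    {"id": 1, "precision": 1, "lithology": "sandstone", "phi": 32.0},
    {"id": 2, "precision": 2, "lithology": "sandstone", "phi": 30.0},
    {"id": 3, "precision": 3, "lithology": "claystone", "phi": 22.0},
    {"id": 4, "precision": 1, "lithology": "claystone", "phi": 24.0},
]


def usable_for_mapping(point):
    """Only class 1 points carry a position reliable enough for spatial use."""
    return point["precision"] == 1


# Spatial analyses: keep class 1 points only.
spatial_points = [p for p in points if usable_for_mapping(p)]

# Statistical characterization per lithology: all classes contribute,
# because classes 2 and 3 were placed on the lithology stated in the certificate.
phi_by_lithology = defaultdict(list)
for p in points:
    phi_by_lithology[p["lithology"]].append(p["phi"])

mean_phi = {lith: mean(values) for lith, values in phi_by_lithology.items()}
print([p["id"] for p in spatial_points])
print(mean_phi)
```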

User Notes
The dataset can be openly accessed and downloaded (https://olmo.unifi.it/file/sharing/a6mZ4nhmI, accessed on 11 February 2022). The dataset will be updated when a relevant number of additional measurements becomes available; the corresponding author can therefore be contacted to check the availability of more recent and more complete versions. Data can be imported into GIS or other software for further spatial and statistical analyses. The dataset has a high degree of homogeneity, i.e., every parameter at each point was measured with the same instrument following the same procedure, thus ensuring full consistency. For this reason, anyone wishing to merge these data with a similar dataset already at their disposal is advised to check the consistency of the two datasets beforehand. To support this check, the dataset description in Section 2 reports the methods used to measure each parameter. Users are invited to read the table field "notes", where additional information may be reported.
The dataset was primarily conceived to calibrate distributed physically based slope stability models for shallow landslides. To this end, it can be overlaid with thematic maps (describing, e.g., lithology, geology, or pedology) to characterize the statistical distribution of the values of each geotechnical parameter in each mapped unit. Such an analysis defines the range and the frequency distribution of the values of each parameter, and thus improves the accuracy of the Monte Carlo simulations commonly used in probabilistic approaches to handle the variability of the parameters and the associated uncertainties. Of course, the spatial distribution of the measuring points contained in the dataset influences its applicability to real case studies. The points are mainly concentrated in the areas near the department and in specific areas covered by former research projects. We leave it to the user to evaluate whether the measurements can be transferred to similar settings elsewhere (e.g., to areas where measuring points are not available but the lithologies are similar); the authors of this manuscript are currently carrying out studies on this topic.
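The Monte Carlo use of the per-unit distributions can be sketched as follows. This is only an illustration under stated assumptions, not the procedure of any specific model cited here: it uses the classical dry infinite-slope factor of safety, FS = c / (γ z sin β cos β) + tan φ / tan β, and resamples each parameter independently from a list of measured values. The function names and all the parameter values are hypothetical.

```python
import math
import random


def factor_of_safety(cohesion_kpa, phi_deg, unit_weight_kn_m3, depth_m, slope_deg):
    """Dry infinite-slope factor of safety (no pore pressure term)."""
    beta = math.radians(slope_deg)
    phi = math.radians(phi_deg)
    # Driving shear stress on the slip surface, in kPa.
    driving = unit_weight_kn_m3 * depth_m * math.sin(beta) * math.cos(beta)
    return cohesion_kpa / driving + math.tan(phi) / math.tan(beta)


def failure_probability(cohesions, phis, unit_weights, depth_m, slope_deg,
                        n_runs=10000, seed=0):
    """Fraction of runs with FS < 1, resampling the measured values."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    failures = 0
    for _ in range(n_runs):
        fs = factor_of_safety(
            rng.choice(cohesions),
            rng.choice(phis),
            rng.choice(unit_weights),
            depth_m,
            slope_deg,
        )
        if fs < 1.0:
            failures += 1
    return failures / n_runs


# Hypothetical measurements for one mapped unit (placeholders, not dataset values).
cohesions = [0.0, 1.5, 3.0, 5.0]   # kPa
phis = [26.0, 29.0, 31.0, 34.0]    # degrees
unit_weights = [17.5, 18.5, 19.5]  # kN/m^3

p_fail = failure_probability(cohesions, phis, unit_weights,
                             depth_m=1.0, slope_deg=35.0)
print(f"Estimated failure probability: {p_fail:.3f}")
```

Sampling each parameter independently ignores any correlation between, e.g., cohesion and friction angle; where several parameters were measured at the same point, resampling whole records would preserve that correlation.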