Integration of Remotely Sensed Soil Sealing Data in Landslide Susceptibility Mapping

Soil sealing is the destruction or covering of natural soils by totally or partially impermeable artificial material. ISPRA (Italian Institute for Environmental Protection Research) uses different remote sensing techniques to monitor this process and updates yearly a national-scale soil sealing map of Italy. In this work, for the first time, we tried to combine soil sealing indicators as additional parameters within a landslide susceptibility assessment. Four new parameters were derived from the raw soil sealing map: Soil sealing aggregation (percentage of sealed soil within each mapping unit), soil sealing (categorical variable expressing if a mapping unit is mainly natural or sealed), urbanization (categorical variable subdividing each unit into natural, semi-urbanized, or urbanized), and roads (expressing the road network disturbance). These parameters were integrated with a set of well-established explanatory variables in a random forest landslide susceptibility model and different configurations were tested: Without the proposed soil-sealing-derived variables, with all of them contemporarily, and with each of them separately. Results were compared in terms of AUC ((area under receiver operating characteristics curve, expressing the overall effectiveness of each configuration) and out-of-bag-error (estimating the relative importance of each variable). We found that the parameter “soil sealing aggregation” significantly enhanced the model performances. The results highlight the potential relevance of using soil sealing maps on landslide hazard assessment procedures.

Of course, besides the statistical techniques used to perform the susceptibility assessment, the input data are also very important, as their selection, number, and parameterization can largely influence the quality of the results [14,21,22]. Several authors pointed out that geographic information systems (GISs) can be useful tools to prepare the input data and to run the LSM [23,24] and the challenge to increase the quality, accuracy, and areal coverage of many input data has led to wide use of remote sensing techniques [25][26][27][28].
According to traditional studies [46,47] and to the most recently updated landslide inventories (which were obtained by combining geomorphological fieldwork, photointerpretation, and satellite remote sensing) [48,49], the main landslide types affecting the area can be classified as slides (rotational, translational, and compound), slow earth flows, complex movements (mainly slides evolving into flows), and, to a lesser extent, debris flows (see also the landslide dataset description in Section 2.3). The main triggering factor is rainfall, with mean annual precipitations gradually ranging from about 1000 mm/year in the plains to about 1900 mm/year in the western mountains [50]. The area is characterized by a high energy of relief, with the steep slopes of the Apennines mountains grading from the maximum altitude of about 2000 m a.s.l. to low-altitude narrow intermontane tectonic valleys or to the floor of a wide alluvial plain with an altitude close to 0 m a.s.l. From a lithological point of view, the bedrock of the area is mainly composed of layered flysch rocks and by metamorphic rocks (phyllite and schists) [46], which play a significant role as landslide-predisposing factors [21]. The mountains are covered by forests and are sparsely urbanized, and the main villages According to traditional studies [46,47] and to the most recently updated landslide inventories (which were obtained by combining geomorphological fieldwork, photointerpretation, and satellite remote sensing) [48,49], the main landslide types affecting the area can be classified as slides (rotational, translational, and compound), slow earth flows, complex movements (mainly slides evolving into flows), and, to a lesser extent, debris flows (see also the landslide dataset description in Section 2.3). The main triggering factor is rainfall, with mean annual precipitations gradually ranging from about 1000 mm/year in the plains to about 1900 mm/year in the western mountains [50]. The area is characterized by a high energy of relief, with the steep slopes of the Apennines mountains grading from the maximum altitude of about 2000 m a.s.l. to low-altitude narrow intermontane tectonic valleys Remote Sens. 2020, 12, 1486 4 of 19 or to the floor of a wide alluvial plain with an altitude close to 0 m a.s.l. From a lithological point of view, the bedrock of the area is mainly composed of layered flysch rocks and by metamorphic rocks (phyllite and schists) [46], which play a significant role as landslide-predisposing factors [21]. The mountains are covered by forests and are sparsely urbanized, and the main villages and cities are located in the valley floors and in the surrounding hills. The sectors of the area occupied by wide alluvial plains were excluded from the analyses because landslides are not a geomorphological process that occurs in a similar setting.

Random Forest Algorithm
The random forest (RF) algorithm, originally proposed by [51], is a machine learning technique based on the binary decision tree classification model. The RF algorithm performs a random sampling of the observations included in the training dataset to create binary decision trees in which a random selection of the explanatory variables is used to split each node (yes/no) and to perform a classification. The observations excluded from the sampling (called "out-of-bag") are used for internal evaluation of the prediction trees and to adjust the definition of the trees generated in other iterations of the process, with the aim of minimizing the prediction errors. Indeed, a single random tree would have a very low predictive capability; thus, to get stable results and lower errors, this application requires the use of several (typically hundreds) randomly generated trees, hence the name "random forest". Usually, in RF applications, a sub-dataset is excluded from the analyses and exclusively used as an independent dataset to validate the results obtained with the optimal "forest" defined by the model.
An interesting feature of RF is the internal procedure of weighting of variables: The out-of-bag samples are used to calculate the prediction error (out-of-bag error) that would be committed if a variable were excluded from the modeling. In this way, the RF algorithm can be used to get, at the same time, a susceptibility mapping, validation of its predictive accuracy, and a study on the sensitivity of the results to the explanatory variables used in the model. RF can be used to solve classification and regression problems and it is a well-established technique in LSM because it is very flexible [14,16,52]: It can use, at the same time, categorical and continuous numerical variables, it can exploit complex information provided by many variables, it can handle multicollinearity, and it is relatively robust with respect to overfitting issues.

Basic Input Data
The landslides location was obtained from IFFI (Inventory of landslides in Italy), a nation-wide catalogue of landslides published by ISPRA, which aims to provide an overview of the distribution of landslides throughout the country and to provide a basic cognitive tool for assessing landslide hazards [53]. It contains 620,808 landslides for a total area of 23,700 km 2 (7.9% of the Italian territory). Landslides are mapped at the 1:10,000 scale and are classified according to [54]. A total of 7799 landslides with areal extension ranging from 10 2 to 10 6 m 2 were identified in the study area: Most of them are mapped as complex movements (typically rotational/translational slides evolving into slow earthflows) (55%) or as rotational/translational slides (35%) (Figure 2). Following the same approach used in previous works, the susceptibility analysis considers only these two typologies. This assumption is supported by two main reasons: First, the triggering mechanism and causative factors are the same; second, the distinction between the two typologies in the inventory is often uncertain and may depend on the interpretation of the surveyors [21,48,49]. Debris flows (4%) and landslides with unspecified typology of movement (6%) were excluded from the analysis. Remote Sens. 2020, 12, x FOR PEER REVIEW 5 of 20 In our application, we decided to work at a 100-m spatial resolution, since this mesoscale has been acknowledged to be a good compromise in regional-scale landslide susceptibility studies [14,55]. Many landslides included in the inventory have dimensions smaller than this cell size, and their significance is taken into account following a methodology of analysis introduced by [14,34]: The dependent variable to be modeled is not represented by a binary classification scheme (pixel with landslides-pixel without landslides), but it is a continuous variable ranging from 0 to 100 and expressing the area of the mapping unit interested in by landslides. When coarse spatial units are used, this approach allows all landslides (and clusters of landslides) to have a significance weighted by their dimensions. Consequently, the 7799 selected landslides were combined with a grid of the area (with 100m x 100m cell size), and for each cell, the percentage of territory covered by landslides was calculated and used as the dependent variable for the landslide susceptibility assessment.
The production of a landslide susceptibility map requires the knowledge of the spatial distribution and interactions of various explanatory variables, which allow landslide-prone areas to be defined. There is no consensus about which is the optimal set of explanatory variables to be used in susceptibility studies and probably an optimal set is something very relative as it may vary according to the landslide type, to the characteristics of the study area, and to the characteristics (e.g., spatial resolution or scale) of the available datasets. However, the main objective of this study was not to produce the best possible susceptibility map but to explore the possibility of using a new explanatory variable derived from soil sealing monitoring data. As a consequence, a very limited set of parameters was used to define the base configuration of the susceptibility model: The "base parameters" were selected among the most used according to a literature review of general landslide studies [3] and studies focused on the same test site [14,56,57]. The base parameters are reported and briefly presented in Table 1. Table 1. Explanatory variables used in the base configuration of the susceptibility assessment.

Variable
Typology Values Source data In our application, we decided to work at a 100-m spatial resolution, since this mesoscale has been acknowledged to be a good compromise in regional-scale landslide susceptibility studies [14,55]. Many landslides included in the inventory have dimensions smaller than this cell size, and their significance is taken into account following a methodology of analysis introduced by [14,34]: The dependent variable to be modeled is not represented by a binary classification scheme (pixel with landslides-pixel without landslides), but it is a continuous variable ranging from 0 to 100 and expressing the area of the mapping unit interested in by landslides. When coarse spatial units are used, this approach allows all landslides (and clusters of landslides) to have a significance weighted by their dimensions. Consequently, the 7799 selected landslides were combined with a grid of the area (with 100 m × 100 m cell size), and for each cell, the percentage of territory covered by landslides was calculated and used as the dependent variable for the landslide susceptibility assessment.
The production of a landslide susceptibility map requires the knowledge of the spatial distribution and interactions of various explanatory variables, which allow landslide-prone areas to be defined. There is no consensus about which is the optimal set of explanatory variables to be used in susceptibility studies and probably an optimal set is something very relative as it may vary according to the landslide type, to the characteristics of the study area, and to the characteristics (e.g., spatial resolution or scale) of the available datasets. However, the main objective of this study was not to produce the best possible susceptibility map but to explore the possibility of using a new explanatory variable derived from soil sealing monitoring data. As a consequence, a very limited set of parameters was used to define the base configuration of the susceptibility model: The "base parameters" were selected among the most used according to a literature review of general landslide studies [3] and studies focused on the same test site [14,56,57]. The base parameters are reported and briefly presented in Table 1.

Soil Sealing Raw Data
Since soil sealing data were the focal point of the research, this paragraph contains a brief description of the ISPRA soil sealing dataset, together with an overview of the methodology that was developed for producing it. Since 2015, ISPRA has undertaken a national-scale monitoring program of land cover and one of the delivered products is a national cartography of soil sealing in the raster format (pixel size 10*10 m), in which the whole Italian territory is classified into two classes: Sealed soil/not sealed soil. The monitoring activity includes an update of this product every year. In this study, we employed the map updated to the year 2017.
In the year 2015, the first soil sealing map was generated by using RapidEye images [58,59], which consist of 5 bands (blue, green, red, NIR -Near Infra-Red, and red edge) at a 5-m spatial resolution [58]. The map was obtained by supervised classification with the maximum likelihood; to increase the accuracy of classification, ancillary data and intensive manual photointerpretation of Very High Resolution (VHR) images and national orthophotos were also used [60].
After 2015, the soil sealing map was updated yearly at a 10-m spatial resolution. A new methodology was implemented to update the soil sealing map each year and to exploit the potential of both Sentinel 2 optical and Sentinel 1 radar satellites. The methodology includes some of the main steps described below.
The new soil sealing mapping procedure was based on Sentinel-2 images because they ensure the best spatial and spectral resolution among the freely available optical images in Italy. The Sentinel-2 images can be used for both photointerpretation and semi-automatic classification purposes. The multispectral instrument (MSI) of Sentinel-2 provides 13 bands from visible to short wave infrared (SWIR) with a 10-m, 20-m, and 60-m resolution, depending on the spectral band ( Table 2). The three 60-m bands (b1, b9, and b10) were not used in the procedure, due to their coarse spatial resolution.
The geometric data resolution was aligned for all image datasets, in order to ensure the consistency of mapping on an annual basis, throughout all the available Copernicus datasets. The Sentinel-2 images were corrected to account for distortions caused by atmosphere and terrain morphology, in order to derive a preliminary thematic classification layer that distinguishes natural soils from imperviousness areas. This first rough classification was improved through specific elaborations based on spectral vegetation indexes that are useful for the detection of target land cover classes. Some masks were created by exploiting the maximum NDVI values calculated over the image time series. These values highlight vegetated areas, and, by contrast, mask the non-vegetated areas (such as sealed land, bare soil, bare rocks, and so on). The maximum NDVI (pixel based) of the current year was compared to the maximum NDVI of the previous year and differences were calculated on a pixel-by-pixel basis; changes were acknowledged if the differences were greater than a set of thresholds empirically calibrated during the last years [60,61], i.e., the difference was higher than 0.2 and the NDVI value of the current year was lower than 0.3 (absence of vegetation). This method allowed identification of the areas where vegetation is removed or is greatly diminished. Another type of analysis was carried out by exploiting the key features of the Sentinel-1 ground range-detected (GRD) data product, including 20-m spatial resolution with a pixel spacing of 10 m and availability of dual polarization acquisitions (VV and VH). Dual polarization acquisitions are widely used for monitoring urban areas, since different polarizations have different sensitivities and different backscattering coefficients for the same target [62]. The first-level image classification obtained from Sentinel-2 classification was integrated by the Sentinel 1 information. The VV polarization is particularly useful to detect land cover changes. The median of the VV values of the Sentinel-1 acquisition of the current year was compared to the median of VV of the previous year: If the difference was greater than 0.1 dB and the median of the current year was lower than −9 dB (i.e., buildings have low VV values), the pixel was considered a possible change due to the construction of buildings.
The output of this integration represents a second-level soil sealing map that represents the base for a following work of manual photointerpretation on multitemporal VHR images and orthophotos for the whole Italian territory on a detailed scale (≥1:5000). In this way, it is possible to accurately (spatial resolution of 0.5 m) detect the limits of the area concerned, to identify the type of change that occurred, and finally, to perform a binary classification encompassing sealed vs. not sealed soil classes. Sealed soils include built-up areas, paved roads, railways, airports, ports, and other paved areas (public squares, parking lots, courtyards, sport fields, permanent paved greenhouses). They also include reversible land consumption: Dirt roads, construction sites, and other compacted earth areas. A further validation process was then performed by using very high-resolution orthophotos and Google Earth imagery as control datasets.
After the classification process was finished, an accuracy analysis was performed by selecting a random sample of control points and calculating accuracy indexes. Finally, the final soil sealing map was re-projected in the ETRS_1989_LAEA system and made available in a public repository (http: //groupware.sinanet.isprambiente.it/uso-copertura-e-consumo-di-suolo/library/consumo-di-suolo).

Soil Sealing Parameterization for Landslide Susceptibility Mapping
This study aimed to investigate whether soil sealing data can be effectively used in landslide susceptibility mapping models. To this aim, the soil sealing map released by ISPRA (see previous section) was used as raw data to derive some parameters to be tested as explanatory variables in landslide susceptibility assessments. Indeed, the original soil sealing map contains very basic information (dichotomic classification between sealed and not-sealed soil) at high spatial resolution (10 m) and some alternate approaches may be needed to convert this information into continuous variables, or to aggregate it at different pixel sizes. The tested parameters were "soil sealing", "urbanization", "soil sealing aggregation" (percentage of sealed soil), and "roads"; they are briefly introduced hereafter and are shown in Figure 3.

Description of the Landslide Susceptibility Tests
To train the susceptibility model, 15,557 randomly selected points were considered. The constraints for the random selection were set as follows: Half of them were randomly sampled within landslide areas, the other half were sampled in areas outside landslide polygons. Even if some studies use crowns, scarps, or detachment zones to sample landslide conditions, we took into account the whole landslide polygon (typically encompassing both the detachment zone and accumulation zone), because the landslides of the area are often subject to reactivations and therefore, in the landslide susceptibility assessment, the definition of conditions associated to landslide bodies that could be reactivated is as important as the definition of conditions associated to new activations. Moreover, the use of the whole landslide area was necessary because, as explained in Section 2.3, the dependent variable is defined as the area of each cell occupied by landslides and consists of a continuous value that will be modeled with a regression (while most of the literature studies perform a classification on binary instances of presence of landsides vs. the absence of landslides). The dataset was divided in a 70% subsample for model training and a 30% subsample for independent verification and testing. Within the subsamples, the balance between landslide and non-landslide conditions was kept at 50%-50%.
The susceptibility assessment was performed in regression mode using a graphical user interface

Soil Sealing (SS)
This parameter is the most similar to the original soil sealing data: Since the original raw data consists of a raster at a 10-m resolution in which each cell represents sealed vs. non-sealed soil, this layer was downsampled in the GIS environment to a 100-m resolution by using the "majority" operator. At the end of the process, each cell represented a portion of territory, where most of the area is occupied either by artificial structures or by natural and semi-natural soil. This parameter was included in the tests because it represents the result of the most straightforward method to assimilate the soil sealing raw data to the mapping unit defined for an application. This parameter represents the density of urbanization: It was derived from the original high-resolution soil sealing map, after a re-classification in three classes:
Predominantly artificial areas, with a high density of urbanization The identification of urban density was carried out on high-resolution soil sealing data with the "focal statistics" tool of ArcMap 10.6, assigning to each pixel the mean value of all the pixels present in a circular area of a 600-m radius. The result was reclassified in three classes, whose limits were defined by comparison with the Copernicus Urban Atlas data set.

Soil Sealing Aggregation (SSA)
This variable was calculated by using the soil sealing layer at a 10-m resolution; a data aggregation operation was performed in a GIS environment and the output was a raster with a 100-m cell size, where the value expresses the number of sealed 10-m-resolution cells. This dataset therefore presents a range of integer values from 0 to 100, which could also be interpreted as the percentage of soil sealed within each hectare. In other applications involving different pixel sizes, a simple arithmetical operation should be performed to convert the count of sealed cells into a percentage.

Roads (ROA)
The distance to roads is traditionally considered as one of the most important anthropogenic factors influencing landslides and it is widely used in landslide susceptibility studies [3]. Roads are included in the raw soil sealing maps but are not distinguishable from other sealing factors. Consequently, to isolate the contribution of the road network to the generalized soil sealing process, OpenStreetMap (in vector format with line geometry) was used. From the original layer, the main and secondary communication routes were selected; only the small mountain tracks were not considered. A 10-m buffer was created around the lines representing the roads to define an area of disturbance, then the resulting polygons were rasterized and aggregated to a resolution of 100 m. As a final check, ROA and SSA raster were compared: Since the former systematically had lower values than the latter, we are confident that ROA can be considered a subset of the aggregated soil sealing map, where the contribution of roads is isolated from other soil sealing processes.

Description of the Landslide Susceptibility Tests
To train the susceptibility model, 15,557 randomly selected points were considered. The constraints for the random selection were set as follows: Half of them were randomly sampled within landslide areas, the other half were sampled in areas outside landslide polygons. Even if some studies use crowns, scarps, or detachment zones to sample landslide conditions, we took into account the whole landslide polygon (typically encompassing both the detachment zone and accumulation zone), because the landslides of the area are often subject to reactivations and therefore, in the landslide susceptibility assessment, the definition of conditions associated to landslide bodies that could be reactivated is as important as the definition of conditions associated to new activations. Moreover, the use of the whole landslide area was necessary because, as explained in Section 2.3, the dependent variable is defined as the area of each cell occupied by landslides and consists of a continuous value that will be modeled with a regression (while most of the literature studies perform a classification on binary instances of presence of landsides vs. the absence of landslides). The dataset was divided in a 70% subsample for model training and a 30% subsample for independent verification and testing. Within the subsamples, the balance between landslide and non-landslide conditions was kept at 50%-50%.
The susceptibility assessment was performed in regression mode using a graphical user interface (GUI) developed in MATLAB, derived from a previous work [63] and already used in several applications to produce and validate landslide susceptibility maps and to investigate the forecasting effectiveness of each input parameter [21,63]. The validation of the susceptibility assessment was performed in terms of AUC (area under receiver-operator characteristic curves), a quantitative index widely used in landslide susceptibility studies to quantify the overall predictive effectiveness of a model [3,64,65]. AUC is a cutoff-independent index with an operating range from 0.5 (completely random predictions) to 1.0 (perfection) [65]. The relative importance of each parameter used within a model configuration is quantitatively expressed by the OOBE (out of bag error), an estimation of the relative error that the model would commit if a given parameter were omitted (as for the procedure described in Section 2.2).
According to sensitivity analyses performed in previous studies [14], the random forest model was tuned by using 200 trees and by performing for each configuration 20 different runs to handle the randomness of the results provided by the RF approach. Therefore, the AUC and OOBE values are intended as mean values of the results obtained in each of the 20 iterations.
A series of tests was conceived to assess whether soil sealing can provide a useful contribution in landslide susceptibility mapping and which soil sealing parameterization provides the best outcomes. The first test was performed using only the basic parameters (base configuration), while a second test was carried out using all the basic parameters together with all the soil sealing parameters. The first test allowed analysis of the performance of the model in a basic and standard configuration, establishing a benchmark for further comparisons with other sets of input parameters. The second one allowed calculation of the relative importance of each variable when all of them were considered together: The objective of this test was to have a direct comparison of the importance of all the parameters, as expressed in terms of OOBE values.
The last group of tests was carried out by adding to the base configuration each time a single soil-sealing-derived variable. For each configuration, the mean and maximum AUC were calculated, and the relative importance of each variable was defined in terms of OOBE.

Sensitivity to Soil Sealing Parameterization
The first objective of the tests was to investigate the sensitivity of the susceptibility assessment to different parameterizations of soil sealing information. To this aim, the AUC values obtained by the validation of the different model configurations were compared and used as an indicator of their accuracy. Figure 4 shows that the base configuration used as a benchmark reports the lowest accuracy (AUC = 0.65), thus indicating that soil sealing information has the potential to improve the accuracy of the susceptibility assessments. However, the results are very sensitive to the different parameterizations of soil sealing information: Some of the soil sealing-derived parameters produce only a very limited improvement of the model (SS, URB), while others bring a more marked contribution (ROA, SSA). The joint use of all the variables together returns intermediate results. The configuration that returned the best validation statistics is the one using the soil sealing aggregation (SSA) parameter, with an AUC of 0.74, showing that land consumption can be used as an important feature in landslide susceptibility mapping.
Another outcome of the test that can be used to investigate the impact of soil sealing-derived parameters in the landslide susceptibility assessment is the comparison of OOBE values, which are indicators of the relative importance of each variable used. Figure 5 shows that the relative importance of the main morphometrical parameters does not vary significantly from a configuration to another: Slope gradient, elevation, curvature, and flow accumulation are the most important parameters, followed by the categorical variables "lithology" and "land cover". The soil sealing-derived parameters have a ranking (and a relative importance) that is clearly connected to the quality of the results observed with the AUC: The most effective parameter (SSA) has an OOBE value very close to lithology and land cover and markedly higher than the variable "aspect". The parameter accounting for roads (ROA) has a ranking higher than aspect but the OOBE is lower than CLC and lithology. The other two soil sealing-derived parameters are ranked as the least important parameters (yet with a positive impact on the modelling), and this is reflected by the low AUC values of the derived susceptibility assessments (which are, however, slightly higher than the base configuration) (Figure 4). The results expressed in terms of AUC values and OOBE rankings are consistent and point out that soil sealing aggregation (SSA), at least in this case of study, is the most effective way to improve a landslide susceptibility assessment with soil sealing. Therefore, this parameterization was selected as a benchmark for the subsequent analyses.

Sensitivity to Soil Sealing Parameterization
The first objective of the tests was to investigate the sensitivity of the susceptibility assessment to different parameterizations of soil sealing information. To this aim, the AUC values obtained by the validation of the different model configurations were compared and used as an indicator of their accuracy. Figure 4 shows that the base configuration used as a benchmark reports the lowest accuracy (AUC = 0.65), thus indicating that soil sealing information has the potential to improve the accuracy of the susceptibility assessments. However, the results are very sensitive to the different parameterizations of soil sealing information: Some of the soil sealing-derived parameters produce only a very limited improvement of the model (SS, URB), while others bring a more marked contribution (ROA, SSA). The joint use of all the variables together returns intermediate results. The configuration that returned the best validation statistics is the one using the soil sealing aggregation (SSA) parameter, with an AUC of 0.74, showing that land consumption can be used as an important feature in landslide susceptibility mapping. Another outcome of the test that can be used to investigate the impact of soil sealing-derived parameters in the landslide susceptibility assessment is the comparison of OOBE values, which are indicators of the relative importance of each variable used. Figure 5 shows that the relative importance of the main morphometrical parameters does not vary significantly from a configuration to another: Slope gradient, elevation, curvature, and flow accumulation are the most important parameters, followed by the categorical variables "lithology" and "land cover". The soil sealingderived parameters have a ranking (and a relative importance) that is clearly connected to the quality of the results observed with the AUC: The most effective parameter (SSA) has an OOBE value very close to lithology and land cover and markedly higher than the variable "aspect". The parameter accounting for roads (ROA) has a ranking higher than aspect but the OOBE is lower than CLC and lithology. The other two soil sealing-derived parameters are ranked as the least important parameters (yet with a positive impact on the modelling), and this is reflected by the low AUC values of the derived susceptibility assessments (which are, however, slightly higher than the base configuration) ( Figure 4). The results expressed in terms of AUC values and OOBE rankings are consistent and point out that soil sealing aggregation (SSA), at least in this case of study, is the most effective way to improve a landslide susceptibility assessment with soil sealing. Therefore, this parameterization was selected as a benchmark for the subsequent analyses.  Figure 6a shows the susceptibility map obtained by using the best configuration (the one including the parameter "soil sealing aggregation"). Following an approach proposed by [16], the other susceptibility assessments were represented in terms of deviation from the benchmark map: Using a GIS system, the susceptibility maps obtained using ROA, SS, and URB were subtracted from the SSA map. Figure 6b, 6c, and 6d show the differences, on a pixel-by-pixel basis, in the modelled  Figure 6a shows the susceptibility map obtained by using the best configuration (the one including the parameter "soil sealing aggregation"). Following an approach proposed by [16], the other susceptibility assessments were represented in terms of deviation from the benchmark map: Using a GIS system, the susceptibility maps obtained using ROA, SS, and URB were subtracted from the SSA map. Figure 6b,c,d show the differences, on a pixel-by-pixel basis, in the modelled susceptibility values. In some points, the differences are relevant (maximum overestimation is 0.38 and maximum underestimation is −0.34, as shown in Table 3, but they never reach "extreme values", as defined by [16]. Moreover, Figure 6 shows that except for an area in the NW sector of the test site, where the SSA-derived susceptibility is generally lower than the others, underestimations and overestimations do not have a systematic spatial pattern: They are not concentrated in some precise zones and they have a "salt and pepper" distribution.  For completeness, a reclassified susceptibility map of the area obtained with the SSA configuration is shown in Figure 7. According to the Italian national regulation on landslide hazards, the susceptibility values were divided into four qualitative classes. To this aim, the well-established [15,66] Jenks method based on natural breaks was used, as it is an objective approach to optimize the data clustering by reducing the variance within classes while maximizing the variance between classes. According to this classification, low susceptibility zones occupy 32% of the study area and moderate, high, and very high zones occupy 32%, 27%, and 10%, respectively (Figure 7a).  For completeness, a reclassified susceptibility map of the area obtained with the SSA configuration is shown in Figure 7. According to the Italian national regulation on landslide hazards, the susceptibility values were divided into four qualitative classes. To this aim, the well-established [15,66] Jenks method based on natural breaks was used, as it is an objective approach to optimize the data clustering by reducing the variance within classes while maximizing the variance between classes. According to this classification, low susceptibility zones occupy 32% of the study area and moderate, high, and very high zones occupy 32%, 27%, and 10%, respectively (Figure 7a).

Discussion
The outcomes of the analyses demonstrated that soil sealing information can have a positive influence in regional-scale landslide susceptibility assessments, but the results are sensitive to the approach used to parametrize soil sealing. Since in Italy soil sealing raw data are characterized by a very fine spatial resolution (10-m pixels) and coarse information (dichotomous classification in sealed and not-sealed pixels), the application to susceptibility models based on wider spatial units requires the derivation of parameters and many approaches could be pursued. In this study, we introduced, tested, and compared four approaches based on four different soil sealing-derived variables.
The best model configuration corresponds to the one that uses the soil sealing aggregation variable, with an AUC value of 0.74 and a higher ranking and OOBE value than other soil sealingderived parameters. Among the tested variables, SSA includes the most detailed information, with continuous values ranging from 0 to 100 representing the degree of sealing (and thus the degree of anthropogenic "disturbance") in each 100m * 100m cell used as the spatial unit in the susceptibility analysis. It is worth highlighting that to create this variable, the pixel size was upscaled from 10 (the native resolution of the raw soil sealing map) to 100m, but despite the coarser spatial resolution, the SSA variable had a positive effect in the susceptibility assessment. Indeed, the OOBE is relatively high (similar to other important variables as lithology and land cover) and the derived susceptibility map has an overall accuracy higher than the base configuration.
This outcome is important as it leaves open other possibilities of application with different spatial units like basins or geomorphic units, which, although less common [3], are receiving growing attention [17,67,68].
The configuration encompassing SSA is the only one that returned an AUC higher than 0.70, which is used by some authors as the limit to "good" results [69]. Indeed, in the susceptibility

Discussion
The outcomes of the analyses demonstrated that soil sealing information can have a positive influence in regional-scale landslide susceptibility assessments, but the results are sensitive to the approach used to parametrize soil sealing. Since in Italy soil sealing raw data are characterized by a very fine spatial resolution (10-m pixels) and coarse information (dichotomous classification in sealed and not-sealed pixels), the application to susceptibility models based on wider spatial units requires the derivation of parameters and many approaches could be pursued. In this study, we introduced, tested, and compared four approaches based on four different soil sealing-derived variables.
The best model configuration corresponds to the one that uses the soil sealing aggregation variable, with an AUC value of 0.74 and a higher ranking and OOBE value than other soil sealing-derived parameters. Among the tested variables, SSA includes the most detailed information, with continuous values ranging from 0 to 100 representing the degree of sealing (and thus the degree of anthropogenic "disturbance") in each 100m * 100m cell used as the spatial unit in the susceptibility analysis. It is worth highlighting that to create this variable, the pixel size was upscaled from 10 (the native resolution of the raw soil sealing map) to 100m, but despite the coarser spatial resolution, the SSA variable had a positive effect in the susceptibility assessment. Indeed, the OOBE is relatively high (similar to other important variables as lithology and land cover) and the derived susceptibility map has an overall accuracy higher than the base configuration. This outcome is important as it leaves open other possibilities of application with different spatial units like basins or geomorphic units, which, although less common [3], are receiving growing attention [17,67,68].
The configuration encompassing SSA is the only one that returned an AUC higher than 0.70, which is used by some authors as the limit to "good" results [69]. Indeed, in the susceptibility mapping literature, it is common to find even higher AUC values, but the main objective of this work was to assess the sensitivity to soil sealing parameterization and identify some promising parameters derived by ISPRA soil sealing monitoring products. For this reason, only a few morphological and thematic parameters have been used as explanatory variables of the susceptibility model, to avoid that the impact of shadowing of the soil sealing-derived features. It is not excluded that other scholars could use the SSA parameter inside a more complex susceptibility model and to get better results in terms of AUC. For instance, other authors in the same test site obtained an AUC of 0.84 by using a random forest model with 23 explanatory variables [56]. However, even if the SSA-derived map can surely be ameliorated, a fair match between the susceptibility map and the landslide polygons can be observed in several sectors of the area (Figure 7b,c).
By using the same soil sealing raw data, two other similar parameters were defined, but they returned worst results. Among all the variables created, SS is conceptually the most similar to the original raw data: The territory is classified in the same two classes (natural soil or sealed soil) and the only difference is the downsampling from the original cell size of 10m to the resolution used in the susceptibility assessment (100m). However, there are two main limitations that in our opinion negatively influenced the results on landslide susceptibility mapping, making the SS parameter irrelevant. First, the rescaling procedure brought a loss of spatial detail (spatial resolution is one of the main advantages of the raw soil sealing maps). Second, some uncertainties were added as the "majority" operator used to aggregate pixels from 10 to 100m is not able to represent intermediate situations in which the 100-m cell is only partially sealed. To overcome the latter limitation, the URB parameter was conceived: It basically reflects the SS structure, but it also has an intermediate class to account for partially sealed spatial units. However, the improvement in terms of AUC and OOBE is very limited and leads to poor results.
The second-best result was given by the roads, with an AUC of 0.70 and a higher ranking and OOBE than SS and URB. ROA was structured as a continuous variable ranging from 0 to 1, each cell representing the percentage of area occupied by roads of every typology, from large highways to small paved roads. Conceptually, ROA is a subset of SSA: While the former quantifies the disturbance of roads on the hillslope system, the latter quantifies the disturbance driven by every human structure. Relative to the susceptibility mapping, in our case of study, SSA seems to be more significant than ROA and to foster better results. This outcome should not be surprising, since roads are widely acknowledged as one of the most important human-related landslide-predisposing factors, especially when built with cut-and-fill techniques. Indeed, the same can also apply to every other human asset, including buildings, as pointed out by a growing number of examples in the literature [70][71][72]. From this perspective, the use of road network is clearly only partial information, while a good (i.e., spatially accurate and timely updated) soil sealing map, like the ones produced every year by ISPRA in Italy, has the potential to hold more complete information that can be conveniently used in landslide susceptibility mapping. Regarding landslides influenced by road networks or other constructions, it should be stressed that in this area, they are quite frequent but have typically small sizes [45,48,50,53]. Despite the relatively coarse (100m x 100m) cell size, the modeling approach allows proper consideration to be given to both large and small landslides, because the modeled susceptibility index represents the percentage of landslide area expected in the mapping unit.
Another important test is the one using all soil sealing-related parameters altogether ("ALL" model configuration). The random forest technique is acknowledged to be able to handle many parameters even if they are partially correlated and to avoid collinearity issues. However, the joint use of SSA, SS, URB, and ROA does not provide satisfactory results (AUC = 0.68). This is in contrast with other similar applications, e.g., [21] in the same test site demonstrated that a landslide susceptibility assessment can be improved if many thematic maps with slightly different geological meanings are jointly used. In the present case of study, the four different parameters derived from soil sealing raw data are evidently redundant and the best prediction was obtained when only the best of them (SSA) was used alone. This outcome can be interpreted as further evidence of the effectiveness of SSA.
In many scientific papers about landslide susceptibility, land cover maps are used as input variables, and their influence on the results is widely recognized. This was confirmed by our results, since land cover has a positive influence on the susceptibility assessment. However, it is important to remark that land cover was outperformed by soil sealing aggregation (higher ranking and OOBE value). Although Corine Land Cover presents good thematic detail, it is less effective in spatial resolution (25 hectares minimum mappable unit) than soil sealing thematic maps, and this is probably the main reason of the lower forecasting effectiveness. In particular, the spatial resolution of CLC is ineffective in mapping the Italian soil sealing phenomenon, which has peculiar spatial characteristics: As [38] clearly pointed out, its spatial pattern is typically fragmented, and small roads and urban sprawl areas are systematically underestimated or even ignored. The fact that a better parameterization of urban fabric leads to an improved accuracy of the derived LSM could be considered as further evidence of the importance of human drivers on landslide activity, at least in this study area. Consequently, the possibility of using land consumption to improve landslide susceptibility assessments brings two advantages: A fine spatial resolution of 10 m of the raw data and a good temporal resolution, as the soil sealing thematic layers in Italy and in other European countries are updated on an annual frequency. The latter point is particularly important also from the perspective of providing timely updates of landslide hazard assessments.
As a last point of discussion, some comparisons could be made with other susceptibility assessments already performed in the same test site [21,56]. Both works describe an RF-based modeling approach that is very similar to the one used in the present work; thus, the differences can be considered depending mainly on the use of explanatory variables. Initially, [56] attempted to draw a first susceptibility map of the area, resorting to 23 state-of-the-art explanatory variables. The resulting map was validated, and a 0.84 AUC value was obtained. More recently, [21] used the same area to optimize the parameterization of geological information in the landslide susceptibility assessment. They used as a benchmark a reduced base configuration of parameters (similar to the one used in this article), then improved the modeling by adding complex and multifaceted geological information in the susceptibility model [21]. The AUC values of the resulting maps were improved from 0.60 to 0.75. The AUC improvement was similar to the one observed in this study (from 0.65 to 0.74). In light of these observations, the present work provided some preliminary insights on the possibility to use in LSM activities a new parameter derived from nation-wide soil sealing monitoring programs. When sensitivity analyses and further tests are carried out for other variables as well, the gained knowledge will be summarized in a new and improved comprehensive LSM effort, in the framework of future research activities on this area.

Conclusions
Soil sealing maps at a 10-m spatial resolution covering the whole Italian territory are updated every year by ISPRA. In this work, this high-resolution thematic layer was used for the first time in a landslide susceptibility assessment. The test site was a 3100-km 2 area located in northern Tuscany (Italy), with 7799 observed landslides, mostly rotational, translational, and compound landslide types. Four different variables were derived from the soil sealing raw data: Soil sealing aggregation (SSA-a continuous variable expressing the percentage of sealed soil within the selected mapping unit), soil sealing (SS-a dichotomous categorical variable expressing if a given mapping unit is composed of natural soil or sealed soil), urbanization (URB-a categorical variable subdividing each unit into natural, seminatural, or urbanized soil), and road (ROA-similar to SSA but encompassing only the effect of road network instead of all human structures).
The main objective of this research was to use these four soil sealing variables and test if they could improve the accuracy of landslide susceptibility assessments. A random forest algorithm was used for the susceptibility analysis and seven conditioning factors were used as base variables (aspect, land cover, slope, elevation, lithology, planar curvature, and flow accumulation). The four soil sealing-derived variables were integrated in the susceptibility model, resulting in a series of tests (each variable alone, all variables together, none of them). For each test, the overall accuracy of the resulting map was evaluated in terms of AUC; in addition, the importance of each variable was evaluated in terms of the "out of bag error".
The results showed that: • Soil sealing data can be used to improve the effectiveness of landslide susceptibility assessments; • Among the tested variables, "soil sealing aggregation" was the most promising, leading to the highest AUC and showing a relative importance within the model higher than other widely used parameters like land cover; • In addition to the improved forecasting effectiveness, the use of soil sealing-derived parameters in landslide susceptibility studies seems promising because the original thematic layer is updated yearly, and this good temporal resolution allows for a quick and constant update of susceptibility maps accounting for the modifications induced by human activity on hillslope systems; and • All the proposed variables were formulated in general terms to make their use reproducible in other susceptibilities studies, even in those based on spatial units different from those used here. The methodology is technically reproducible in Italy and in all other countries with similar environmental monitoring programs.