Comparing Forward Conditional Analysis and Forward Logistic Regression Methods in a Landslide Susceptibility Assessment: A Case Study in Sicily

Forward logistic regression and conditional analysis have been compared to assess landslide susceptibility across the whole territory of the Sicilian region (about 25,000 km2) using previously existing data and a nested tiered approach. These approaches were aimed at singling out a statistical correlation between the spatial distribution of landslides that have affected the Sicilian region in the past, and a set of controlling factors: outcropping lithology, rainfall, landform classification, soil use, and steepness. The landslide inventory used the proposal of building the models like the official one obtained in the PAI (hydro geologic asset plan) project, amounting to more than 33,000 events. The 11 types featured in PAI were grouped into 4 macro-typologies, depending on the inherent conditions believed to generate various kinds of failures and their kinematic evolution. The study has confirmed that it is possible to carry out a regional landslide susceptibility assessment based solely on existing data (i.e., factor maps and the landslide archive), saving a considerable amount of time and money. For scarp landslides, where the selected factors (steepness, landform classification, and lithology) are more discriminate, models show excellent performance: areas under receiver operating characteristic (ROC) (AUCs) average > 0.9, while hillslope landslide results are highly satisfactory (average AUCs of about 0.8). The stochastic approach makes it possible to classify the Sicilian territory depending on its propensity to landslides in order to identify those municipalities which are most susceptible at this level of study, and are potentially worthy of more specific studies, as required by European-level protocols.


Introduction
One of the most obvious effects of rapid territorial expansion in recent decades is the growing impact that natural disasters have on man and his activities. Therefore, institutions are committed to investing their resources in both the implementation of structural interventions to mitigate risk as well as early warning systems, and in defining guidelines for land management and civil protection issues [1,2]. Landslides are among the major contributors to the dynamics of the morphological evolution of slopes and occur when slope stability conditions change due to increased stress or decreased resistance along a failure surface. A deformation occurring on a pre-existing failure surface is called re-activation, whilst one along a new fracture plane is referred to as a neo-activation. International literature refers to landslide susceptibility as the spatial probability for gravitational instability conditions within an area, based on its physical-environmental conditions [3,4]. Therefore, depending on the spatial variability of the physical-environmental features of the study area (typically, a river-basin or an administrative The mechanical characteristics of outcropping lithotypes are among the main geoenvironmental factors directly influencing the geomorphological stability of Sicilian slopes. Slides and flows are mainly located where the clay continental sequences outcrop. Hard block outcrops (metamorphic and carbonate) are affected by falls, topples, and lateral spreading. The rainfall may cause denudation slope actions able to generate, in the presence of debris on the metamorphic units, debris flows or debris avalanches. The triggering factors which are most likely to influence the activation of landslides in Sicily, are, in order: rainfall, human activity, volcanic eruptions, and earthquakes. Carbonatic rocks outcropping in the foreland sector are almost exclusively affected by falls. Both the ductile clayey formations and the weathered top coverage of the metamorphic units may experience rapid debris avalanche/debris flow phenomena.
The geomorphological features of the island of Sicily have been crafted by the collision of the Eurasian and African plates acting in synergy to shape the current landscape. The topography is directly influenced by the stratigraphic structure of the area and by the surface uplifting and subduction that occurred during the quaternary period, influenced heavily by significant eustatic sea level variations. The changes to Sicily's geology through the epochs have led to a mountainous and hilly terrain.
Furthermore, the influence of medieval human civilization is still apparent in the small Sicilian urban centers. These historic centers are often surrounded by harsh terrain, where slopes with a gradient of 50% or more can often be seen encompassing the centers, features which would have been favorable to the population who first settled there when defending their land. This, however, leads to limited space and opportunity for further urban development.
The flat areas of the island, a total of just 7% of the entire territory, are represented by the alluvial plain of Catania, the coastal plain of Licata and Gela, the coastal area of Trapanese, and that between Syracuse and Scicli, at the foot of the Monti Iblei.
From analysis of the rainfall data of the Sicilian stations, it is possible to highlight how the rainfall is concentrated, especially in the October-March period, though it is somewhat appreciable in the spring (April-May) and of little importance in the summer months. The maximum and minimum values of average annual precipitation are respectively 919 mm in the Monreale station, in the province of Palermo, and 510.6 mm in the Agrigento station. Frequently, at the highest peaks, there b a Figure 1. Schematic structural map of Sicily [17]. (a) The three main geological structural elements of Sicily; (b) distribution of the main stratigraphic-structural units.
The mechanical characteristics of outcropping lithotypes are among the main geo-environmental factors directly influencing the geomorphological stability of Sicilian slopes. Slides and flows are mainly located where the clay continental sequences outcrop. Hard block outcrops (metamorphic and carbonate) are affected by falls, topples, and lateral spreading. The rainfall may cause denudation slope actions able to generate, in the presence of debris on the metamorphic units, debris flows or debris avalanches. The triggering factors which are most likely to influence the activation of landslides in Sicily, are, in order: rainfall, human activity, volcanic eruptions, and earthquakes. Carbonatic rocks outcropping in the foreland sector are almost exclusively affected by falls. Both the ductile clayey formations and the weathered top coverage of the metamorphic units may experience rapid debris avalanche/debris flow phenomena.
The geomorphological features of the island of Sicily have been crafted by the collision of the Eurasian and African plates acting in synergy to shape the current landscape. The topography is directly influenced by the stratigraphic structure of the area and by the surface uplifting and subduction that occurred during the quaternary period, influenced heavily by significant eustatic sea level variations. The changes to Sicily's geology through the epochs have led to a mountainous and hilly terrain.
Furthermore, the influence of medieval human civilization is still apparent in the small Sicilian urban centers. These historic centers are often surrounded by harsh terrain, where slopes with a gradient of 50% or more can often be seen encompassing the centers, features which would have been favorable to the population who first settled there when defending their land. This, however, leads to limited space and opportunity for further urban development.
The flat areas of the island, a total of just 7% of the entire territory, are represented by the alluvial plain of Catania, the coastal plain of Licata and Gela, the coastal area of Trapanese, and that between Syracuse and Scicli, at the foot of the Monti Iblei.
From analysis of the rainfall data of the Sicilian stations, it is possible to highlight how the rainfall is concentrated, especially in the October-March period, though it is somewhat appreciable in the spring (April-May) and of little importance in the summer months. The maximum and minimum values of average annual precipitation are respectively 919 mm in the Monreale station, in the province of Palermo, and 510.6 mm in the Agrigento station. Frequently, at the highest peaks, there is snowfall, of which there is no significant long-term data, due largely to the lack of a sufficient number of snow stations. A meteoric event of considerable importance in the northern mountain ranges, in particular for the northern slopes of the Madonie due to the presence of fog, which, in addition to integrating normal water supplies through condensation, performs a mitigating and compensating action for extreme climatic events, limiting precipitation and keeping temperatures lower during periods of summer water deficit, as well as decreasing the intensity of weather events harmful to plants, such as late frosts. As regards the thermometric data, there is an inverse trend compared with that of rainfall, as occurs throughout the Mediterranean region. There is, in fact, a gradual increase between March and April, and a more marked increase from May to July-August-a period in which the absolute maximum values are reached-beyond which the temperatures gradually decrease until October, and then drop sharply until December and touch the minimum values in January-February which is the coldest time of the year. The highest average annual temperature is 18.8 • C for the Cefalù (Palermo) station, while the lowest, 13.3 • C, is recorded in Petralia Sottana (Palermo). The lowest average annual minimum temperature value (9.3 • C) is recorded in the Petralia Sottana station, while the highest annual average maximum temperature value (24.1 • C) in the Lentini (Siracusa) and Palermo stations (Castelnuovo Institute).

Landslide Inventory
Landslides are natural events in the evolution of a slope. They are a problem and become a hazard and/or risk when they interact with man and the man-made environment. Landslides can be classified according to their movement types and the nature of the displaced material type, as well as the state, distribution, and style of their activity [8,[19][20][21]. As raised by [22], there is a conceptual ambiguity concerning landslides stemming from the use of the very same term (i.e., landslide) for both the landslide deposit (displacement volume) and the movement of material on a slope or a pre-existing landslide body [20,23]. This is in addition to general confusion originating from the variable and complex nature of the phenomenon itself [24], due to profoundly different morphological characteristics, behavior, activity states, and their evolution. The construction of the landslide inventory is a fundamental and critical step towards the application of statistical models designed to estimate the probability of new activations affecting previously uninvestigated areas. A landslide inventory commonly represents the sum of all the events that have occurred in an area. Alterations to a slope profile, pointing to ongoing landslides, tend, over time, to become less evident because of erosion, new landslides, human activity, and vegetation, making the "in landslide/not in landslide" border hard to detect with the passage of time. Generally, "newer" phenomena generated by recent heavy rainfall or earthquakes are more easily identifiable and interpretable than more remote ones, where diagnostic elements begin to dissolve. Certainly, there is a critical issue concerning the updating of the landslide inventory, as the public administration lacks economic and human resources. This shows, even more clearly, the limitations of PAI methodology in mapping areas at risk, as it leads to a zoning system, which is not closely related to all new activations. No systematic archive of slope instability exists for the research area of this paper. The latest archive of slope failures is the one belonging to the PAI project, which now holds 33,094 landslides (latest update May 2016) classified into 11 different types. The landslides that were mapped within the PAI project affect an area of approximately 1300 square kilometers, approximately 5.1% of the size of the region (Table 1; Figure 2). The landslide archive derives from a historical inventory and territorial analysis, completed with the execution of inspections, which began in 2003. The archive is continuously updated by reports from the following: literature and scientific publications; studies to support urban development projects in municipalities; regional civil protection archives; and reports from local authorities to regional and national bodies (Civil Protection, Territory and Environment Departments) of geomorphological phenomena that have occurred (from 1998 to today).  The PAI archive focuses more on urban and populated areas. In this piece of research, various models were created to record landslides affecting scarps (scarp landslides; SCR_LSN) and landslides lying over slopes (hillslope landslides; HILL_LSN).

Modelling Approach
A predictive model must represent the response of a natural system to the trigger conditions described by its environmental features; the response may lie in the spatial distribution of new landslides or in the so-called prediction image. The effectiveness of the model can be measured by comparing the final expected results (the susceptibility map) with the actual results that are directly observed (the new landslide map). When studying the natural environment, the use of models is essential, as they allow for a simplification of the infinite natural variables, as well as operating within an acceptable processing time with conventional computers. Thanks to a greater exchange of information, the development of the hardware and software component for the acquisition and processing of data, and the interaction between different research groups, the methods for assessing susceptibility from landslides have evolved rapidly [3,4,[24][25][26][27]. In the last decade, several applications have been carried out with the aim of comparing results from different statistical approaches, as applied to the implementation of models of landslide susceptibility on the same area, using the same landslide inventory and the same control factors [28][29][30][31]. Although full agreement does not exist within the scientific community as to what the best approach to follow is, the experience gained through our previous studies tells us which statistical techniques may be adopted from among the many available (discriminant analysis, conditional analysis, binary logistic regression, classification, and regression trees) and which one leads to greater performance in terms of predictive fitting, and robustness [27,[32][33][34][35][36][37][38][39].

Conditional Analysis (CA)
The CA statistical approach has been widely adopted in landslide susceptibility assessment [24,26,33,[40][41][42][43][44]. The CA concept is simple and easily manageable in a GIS environment. That is why it is the single most recommended method when investigating the landslide susceptibility conditions of larger areas and at scales greater than 1:100,000. The PAI archive focuses more on urban and populated areas. In this piece of research, various models were created to record landslides affecting scarps (scarp landslides; SCR_LSN) and landslides lying over slopes (hillslope landslides; HILL_LSN).

Modelling Approach
A predictive model must represent the response of a natural system to the trigger conditions described by its environmental features; the response may lie in the spatial distribution of new landslides or in the so-called prediction image. The effectiveness of the model can be measured by comparing the final expected results (the susceptibility map) with the actual results that are directly observed (the new landslide map). When studying the natural environment, the use of models is essential, as they allow for a simplification of the infinite natural variables, as well as operating within an acceptable processing time with conventional computers. Thanks to a greater exchange of information, the development of the hardware and software component for the acquisition and processing of data, and the interaction between different research groups, the methods for assessing susceptibility from landslides have evolved rapidly [3,4,[24][25][26][27]. In the last decade, several applications have been carried out with the aim of comparing results from different statistical approaches, as applied to the implementation of models of landslide susceptibility on the same area, using the same landslide inventory and the same control factors [28][29][30][31]. Although full agreement does not exist within the scientific community as to what the best approach to follow is, the experience gained through our previous studies tells us which statistical techniques may be adopted from among the many available (discriminant analysis, conditional analysis, binary logistic regression, classification, and regression trees) and which one leads to greater performance in terms of predictive fitting, and robustness [27,[32][33][34][35][36][37][38][39].

Conditional Analysis (CA)
The CA statistical approach has been widely adopted in landslide susceptibility assessment [24,26,33,[40][41][42][43][44]. The CA concept is simple and easily manageable in a GIS environment. That is why it is the single most recommended method when investigating the landslide susceptibility conditions of larger areas and at scales greater than 1:100,000.
The CA method is based on the concept that landslide density, computed on homogenous domains, represents the relational function between the environmental variables and the framework of a landslide area [2,26,45].
According to the CA concept, we must compute the density of cells in an unstable area for unique condition units (UCUs), defined by overlapping and combining a set of selected control factors and intersecting the UCU and landslide layers. From a statistical point of view, the landslide density corresponds to the susceptibility value of an area in a landslide, linked to a particular combination of control factors. Mathematically, this concept can be expressed as where probability (P) can be computed as the ratio between unstable (UCUunst) and total counts (UCUall) of cells for the cells which have a value. Landslide density is the relational function between the environmental conditions and the landslides in a given area, meaning that a density ranking order corresponds to a scale of landslide susceptibility [32,33,46].

Binary Logistic Regression (BLR)
One of the key points in determining the susceptibility conditions of an area with multivariate statistical techniques is the selection of an appropriate number of factors that can justify the spatial distribution of past and future forms of instability. In fact, many of these techniques provide an estimate of the importance of each factor in relation to the others, or its specific contribution in generating a particular type of landslide in the area under investigation. Many of these techniques rank, through hierarchization, the contribution of each factor in determining the landslide-specificity of an area by identifying the minimum and maximum number of factors needed, beyond which the performance variation of the model may be defined as insignificant or even negative [47,48].
According to Hosmer and Lemeshow [49], the aim of an approach based on binary logistic regression (BLR) is to enable the singling out of the best linear relationship between a dichotomous dependent variable (such as 1 or 0 representing the "presence"/"absence" of landslides, respectively) and a set of independent variables representing control geo-environmental factors. In the logistic regression equation, the expected dependent variable f(x) may be expressed as where logit(x) corresponds to a natural logarithm of odds [π/(1 − π)], expressed as a ratio between the likelihood of the presence of landslides (π) over the likelihood of their absence (1 − π); α is the intercept of the model; and β 1 , β 2 up to β n are the coefficients, which measure the contribution of each independent input variable [50][51][52].
In other words, the BLR allows us to estimate the contribution of each input variable using its coefficient in the probability equation by adopting the maximum likelihood classifier concept. Comparing the maximum likelihood computed for every β-value with each estimated error, the significance of the coefficients is tested using the Wald test [31,49,53]. The probability of occurrence can be estimated by multiplying the −2 log-likelihood ratio; thus, the negative log-likelihood (−2LL) statistic is obtained, which has a chi-square (χ2) distribution. According to this approach, BLR is executed through a stepwise procedure that allows the differentiation of only those predictor variables with a significant impact on the performance of the multivariate model [36,54,55].
A problem arising when using BLR is that which is related to choosing the appropriate sample size for an unstable area (positive cases) as opposed to stable ones (negative cases). Indeed, the number of positive cases is significantly lower than negative ones, thus generating an imbalance. Therefore, it is one of the most important choices concerning the random selection technique likely to lead to a balanced dataset of landslide and non-landslide pixels [56]. In this piece of research, the TANAGRA open-source software was used to apply forward stepwise logistic regression [57].

Variables Selection and Factor Class Definition
Landslides are directly connected to the many environmental condition types, mainly depending on slope morphology (slope angle, orientation slope, curvature, elevation, roughness, etc.) and other intrinsic characteristics (lithology, soil use, tectonic condition, landform classification, etc.), while the activation of new landslides depends on trigger factors, such as intensive rainfall or earthquakes [58][59][60].
All kinds of geo-environmental factors may be considered as potential predictive factors and can be introduced into the analysis. All the factors should be drawn at a specific measurement for classification (categorical, continuous). Acquiring each factor requires time and money. In addition, we must also consider the computing time needed to process huge volumes of data. This is why only those variables thought to be directly and/or indirectly capable of conditioning the established of a slope, thanks to knowledge acquired from previous studies, may enter the analysis. The use and reclassification of a limited number of factors also prevents the generation of a large number of not very widespread mapping units and thus the overestimation of the density for untrained mapping units.
The CA method, requires preselecting the factors to be entered into the analysis, and then defining the UCUs representing the mapping unit in this type of analysis. The univariate approach was followed for each acquired variable and their density calculated for the different types of landslide analyzed (scarp and hillslope landslide). The univariate approach was followed for each acquired variable and their density calculated for the different types of landslide analyzed (scarp and hillslope landslide). Univariate analysis allows to distinguish the significant from insignificant variables, and then combine only those variables among them which are likely to be the most influential in the production of the mapping units GRID layer. The following continuous and categorical variables have been acquired and analyzed (Table 2).

Continuous Variables
The 2-m ARTA-DEM was used and resampled to generate the digital elevation model (DEM). Slope angle, slope aspect, and landform classification were calculated using the DEM.

•
Slope angle (SLO) is usually considered as one of the main controlling factors in landslide modelling. At first, SLO was classified into 5 natural break intervals [14], expressed in sexagesimal The raster-file of the slope angle was obtained by resampling the 2-meter resolution ARTA-DTM flight ATA (2007/2009) to 100 m per side. As shown in Figure 3, the proposed reclassification for the slope angle for the hillslope landslide does not reveal the theoretical concept for the slope increase, which corresponds to an increase in the likelihood of landslides occurring. This does not happen with the scarp landslide, where increasing the slope angle leads to an increase in the percentage of landslides.
obtained by resampling the 2-meter resolution ARTA-DTM flight ATA (2007/2009) to 100 m per side. As shown in Figure 3, the proposed reclassification for the slope angle for the hillslope landslide does not reveal the theoretical concept for the slope increase, which corresponds to an increase in the likelihood of landslides occurring. This does not happen with the scarp landslide, where increasing the slope angle leads to an increase in the percentage of landslides.
Using an ArcMap open source tool, the LCL variable was derived directly from the DEM. LCL provides a simple and repeatable method to classify the landscape into slope position and landform category comparison. The different landform category classes can be determined by classifying the combination of a small and large neighborhood topographic position index (TPI) computed for each cell from different scales. The TPI is simply the difference between a cell elevation value and the average elevation of the neighborhood around that cell.
Positive values mean the cell is higher than its surroundings, while negative values mean it is lower [61]. To compute the LCL, the small and the large neighborhood areas were set to 500 and 100 m, respectively. Ten landform classes were thus obtained ( Table 2); • Outcropping lithology (LITH). Together with the slope itself, the lithological conditions of an area are the most important factors influencing the geomorphological processes on the slope. The lithology controls the response of the slope in terms of the trigger-time of the collapse because of rainfall or seismic forces and evolution of the process. The lithotypes cropping out in the map of the Sicilian region were used in this research and grouped into 9 different "lithological complexes", according to their geotechnical characteristics. The output lithological complexes were named as shown in Table 2. The clay complex is the most widespread one in the Sicilian territory, as it crops out in almost 35% of the area (more than 50,000 hectares);  Table 2 shows land cover characteristics in 11 different classes, for terrain units larger than 0.25 km 2 . The Corine 2006 map was converted into a soil cover digital map provided by the Sicilian region, using the second level of the Corine legend, except for the "urban areas" class, which has been divided into continuous (USE_111) and discontinuous (USE_112) urban fabric, corresponding to level III of the Corine Land Cover classification. Arable land (USE_21) covers more than 30% of the research area. Forest crops cover about 7% of the area and mainly appear in the northeastern sectors. Areas covered by shrubby and herbaceous vegetation associations are dispersed around the study area: they cover 17%. Urban area cover is only 4.5%; • Rainfall (RAIN). For rainfall, 280 rainfall stations were used to create the GRID rainfall map using the inverse distance weighted method. The database from the Sicilian regional administration office (http://www.osservatoriodelleacque.it) was used to extract the mean annual precipitation (for the period 1921-2009). Regarding precipitation, Sicily can be divided into three main sectors with three different pluviometric regimes: the northern sector: includes all the Tyrrhenian coast of the island. Rainfall here is characterized by a rainy season (autumn-winter) and a dry spring and summer. Eastern Sicily: in this area, rainfall is also greater in winter. Precipitation is often concentrated into short spells and is sometimes very violent. This is because the precipitation depression bearers come from Africa and are very hot and humid, favoring strong thermal contrasts. Southern Sicily: includes all the area bordered by the Mediterranean Sea. As in the rest of the island, winter is the rainy season. The number of rainy days is less than in the northern area (<60 days per year). In some areas, rainfall is sparse, especially in the coastal zone. The areas with the highest rainfall are the Madonie, Nebrodi, and Peloritani peaks, Etna, and the area south of Palermo. The driest areas are the Plain of Catania and the southern coast, in particular, Gela city.
Generally, there are two reasons that lead us to choose the smallest possible number of geo-environmental variables for the construction of the forecasting model, which allow the realization of what is called the "best model" for a specific area, capable of providing an acceptable performance forecast. On the one hand, achieving or obtaining each parameter requires spending a considerable amount of time and money, on the other, a large number of environmental variables results in a large number of possible combinations characterizing each of the territorial units chosen in an excessively specific manner, as the basis for statistical analysis. A high number of combinations bring a progressive decrease in the distribution of each combination class. The consequence is an unexpected decrease in the performance of the susceptibility model caused by the inclusion of variables which are closely related to a small number of cells, but poorly correlated to the global distribution of the remaining part, thus affecting the choice of the most predictive variables. Even the selection of factors is an essential step in landslide susceptibility assessment procedures, in which the nature of geomorphological criteria takes priority.
An expert-driven univariate analysis procedure for the following 5 processing control variables was carried out (Figure 3). Depending on the type of landslides, a maximum moderation criterion in the number of factors used is necessary to identify a first set of control factors that can be justified based on morphodynamic models, defined as heuristic at first approximation, for the distribution of the observed phenomena. Then, regression techniques may be applied to highlight the actual role played by each of the geo-environmental variables considered. However, it is also common practice, and recommended by applied statistics handbooks, to maintain certain diagnostic values of the variables (i.e., slope), even when stepwise regression procedures have greatly reduced their influence. A type of approach which, in a certain way, is the opposite of the analytical-geomorphological way, relying on deterministic or physically based methods.
That is why, for example, for "Slope angle", values were reclassified into 4 different classes (for hillslope landslide) and 3 classes (landslide scarp). At this point, a new univariate analysis was carried out and the results are shown in Figure 3.

Model-Building Strategy
The five reclassified geo-environmental variables were combined in unique conditions units (UCUs) aiming to get all the possible combinations of the classes of the different factors in order to assess the landslide susceptibility. The UCU layers were laid over with the landslide ones (SCR_LSN and HILL_LSN, respectively). The landslide densities were computed for each UCU class combination, as the ratio between landslide area counts and total pixel counts: according to Bayes' theorem [26,45], these values express the conditional probability of landslide occurrence, given a factor condition. The variables were combined and a UCU layer was derived from these. The combination produced a large number of combination classes (8355 for the HILL_LSN; 4173 for the SCR_LSN). The CA method requires that the combination of output factors has total areas which are large enough to guarantee the statistical significance of the "observed" sample. Table 3 (Table 3).

Results and Validation
Regardless of the statistical approach, models and susceptibility maps should always be validated to test the forecasting performance of the proposed protocol. Generally, a statistical validation procedure correlates the performance forecast (the expected landslides) that represents the space distribution of the model created by using a part of the observed phenomena (training dataset) with the distribution of a set of landslides which are not used in model-building (test dataset). This test dataset should therefore be temporally or spatially different from the training dataset used for model building, but the regional administration does not provide a systematic and continuous mapping of new landslides which occur reasonably often and in all Italian regions and worldwide. Many validation approaches have been presented in the last two decades in the relevant literature. In this paper, the random partition strategy was used, based on the random partition of the slope failures in two balanced populated training and test subsets to compare and estimate the robustness and reliability of the susceptibility models proposed.
The models presented in this research were derived using the CA and BLR statistical approaches, as the result of the average of 100 different replicas obtained by splitting the dependent variable, in 75% TRN_Subset and 25% TST_Subset randomly 100 times.
A reliable approach to test the performance of susceptibility models is the receiver operating characteristic (ROC) curve, allowing the comparison of the predictive performance models. The ROC curve is a plot of the probability expressing the sensitivity (TP rate) that represents the area classified correctly as susceptible (x-axis) versus the 1-specificity (FP rate), representing the probability of false prediction in response to an event for all the cutoff probability values (y-axis). ROC curve analysis allows the differentiation between two classes of events: unstable and stable cells [62]. The quantitative measure of model performance can be tested by computing the area under the curve (AUC) ranging from 0 to 1 [63]. The closer the AUC values are to 1, the higher the predictive performance of the model will be, while the closer the values are to 0.5 (random performance) the higher the inaccuracy of the model will be [50,64,65]. A value equal to 1 denotes a perfect discrimination between positive and negative cases. ROC curves were created for each of the two different statistical approaches (CA and BLR). In Figure 4, the ROC curves for both CA and BLR models were drawn for the training and test subsets with the aim of evaluating the model fitting for both approaches.
The BLR approach has been applied for forward stepwise selection independent predictive variables [28]. As shown in Table 4, average results for the 100 different splits of dependent variables in terms of SUCU. The model suite produced for SCR_LSN is characterized by a mean error rate of 0.17 (St.dev. 0.004) and AUC > 0.9 (outstanding). The graphs (Figure 4) show the average AUC values are close to excellent for HILL_LSN (>0.77) according to Hosmer and Lemeshow [49].
For HILL_LSN, the mean error rate is higher (about 0.29) but still very stable (St. dev. 0.001): a ranking of predictor variables derived from exploiting the forward stepwise statistical procedure.
Using BLR grid-cell-based models, the positive cases (unstable cells) are dramatically less than the negative cases (stable cells) and a suite of 10 different models (both for SCR_LSN and HILL_LSN) were prepared in order to estimate the model fitting, prediction skill, and robustness of the proposed approach [22,28,30,36,66]. Each suite model is made up of a subset of the unstable/positive cells and by an ever-changing subset which contains the stable/negative cells. The SCR_LSN grid-cell models were prepared by merging 6674 positive cells (5% of the total area) with 10 different subset, randomly selected negative 100 × 100 m cells. As for the HILL_LSN, suite models were created by merging 101,860 unstable cells (about 80% of the total area) and an equal number of stable cases. The negative cases in the subset were randomly selected in order to prevent stable cells from overlapping.  The BLR approach has been applied for forward stepwise selection independent predictive variables [28]. As shown in Table 4, average results for the 100 different splits of dependent variables in terms of SUCU. The model suite produced for SCR_LSN is characterized by a mean error rate of 0.17 (St.dev. 0.004) and AUC > 0.9 (outstanding). The graphs (Figure 4) show the average AUC values are close to excellent for HILL_LSN (>0.77) according to Hosmer and Lemeshow [49].
For HILL_LSN, the mean error rate is higher (about 0.29) but still very stable (St. dev. 0.001): a ranking of predictor variables derived from exploiting the forward stepwise statistical procedure.
Using BLR grid-cell-based models, the positive cases (unstable cells) are dramatically less than the negative cases (stable cells) and a suite of 10 different models (both for SCR_LSN and HILL_LSN) were prepared in order to estimate the model fitting, prediction skill, and robustness of the proposed approach [22,28,30,36,66]. Each suite model is made up of a subset of the unstable/positive cells and Categorical variables were binarized and BLR was applied. Each model underwent the BLR procedure 10 times. An open source statistical package (TANAGRA) was used to generate the contingency tables (Table 4) and automatically extract the true positive (TP) and true negative (TN), and single estimates of the sensitivity or hit rate (TP/(TP + FN)) and 1-specificity (FP/(TP + FN); [66][67][68][69][70].
For SCR_LSN, drawing from 18 predictors, 13 were always selected in all 100 model repeats of the models. SLO, LCL_MNTPS, and USE_321 were systematically extracted as the first three significant factors, and with positive coefficients. The regression coefficients, for each selected predictor, were marked by conceptually coherent signs. A scan was easily imagined, and LCL_PLAINS and LCL_OPEN had a negative coefficient instead (Table 5). The main significant controlling factors for SCR_LSN in the study area which showed a slope angle for LCL_MNTPS was landform classification (positive coefficient), and LCL_PLAINS with a negative coefficient.
On to the HILL_LSN, the predictors extracted by BLR, are more than those extracted for SCR LSN, and above all, this slope angle variable does not appear to be as highly significant as may have been imagined. This is probably due to the fact that the diagnostic areas used here (whole landslide body area) are not so discriminant and able to determine the preparatory conditions for landslides. Additionally, it may owe something to the reasonably large cell size. The PAI landslide inventory is not characterized by high accuracy and some landslide typologies, like for "Areas with diffused landslide", do not allow a true discrimination of the geo-environmental variables which influence the slope stability conditions. Among the variables, LCL PLAINS ranks as the first predictor extracted in the analysis with a negative coefficient. Among all the variables, 20 were systematically extracted in all 100 different repeats, lithological conditions being those selected most frequently, and with a higher weight. Among them, LITH_Ca is characterized by having a negative coefficient value (Table 6).

Discussion
These analyses allow the generation of the susceptibility maps for SCR_LSN and HILL_LSN for both CA and for BLR.
The GRID UCU layers were intersected with 100 different TRNsubset landslide layers (SCR_LSN and HILL_SCR) and the mean density was calculated for each mapping unit. The susceptibility model derived from the CA approach was obtained by assigning each UCU to its corresponding class, according to its computed density. For each landslide type, the computed density corresponds to the susceptibility function (S UCU ) equivalent to the conditional probability of a new landslide, given the selected predictive variables [44].
Although numerous studies on landslide susceptibility zoning have been published, no global approach is yet shared by the scientific community to classify susceptibility maps. In the studies, where the classification of territories in accordance with their level of landslide susceptibility is the aim, it is appropriate to determine the optimal cutoff classification values which are capable of dividing the mapping units mainly into two large domains: stable areas (susceptibility values are less than the cutoff) to the left of the cutoff and unstable terrain (susceptibility greater than the cutoff) to the right of the cutoff [66]. When statistical approaches such as CA and BLR are used, the statistical software sets the significant cutoff (mcutoff) as equal to 0.5 by default [54,[71][72][73][74] in order to correctly classify the predicted stable or unstable cells. This research area is quite extensive (>25,000 km 2 ), therefore, it is useful to divide the territory into a few different classes, depending on the value of susceptibility, and define some useful class range boundaries for the susceptibility (or density) index. The aim is to be able to divide the entire Sicilian territory into four classes of susceptibility: very low, moderate, high, and very high. To such an end, three cutoff values are required: the low cutoff (lcutoff), the central cutoff (mcutoff), and the high cutoff (hcutoff). Operationally, the mcutoff is first identified graphically, on the ROC curves, as the maximum value of the difference between the FP and TP rate. Subsequently, with the same method, were identified the other limits of the classes for the areas to the right of the mcuoff (high) and for the areas of the left of the mcuoff (low). Table 7 shows the identification of mcutoff for the model obtained by following the CA for SCR_LSN. The susceptibility maps based on both statistical approaches have been derived and with test inventory subsets verified. Therefore, the maps presented in Figure 5 classify the Sicilian territory with low, moderate, high and very high susceptibility values according to their degree of propensity to instability and second cutoff probability values identified objectively.    The presented research is focused on verifying two main concepts: the first, to explore the possibility of using existing data (landslide inventories, thematic maps of predictor variables) to create regional landslide susceptibility models; the second, to verify the exploitability of CA, comparing it with models based on BLR, in order to create landslide susceptibility models for areas and studies bigger than 1:100,000.
When analyzing the maps presented in Figure 5, it is noteworthy that those created by BLR (b, d) have a chromatic gradation indicating susceptibility conditions for Sicily which are generally higher than those created by CA (a, c). Therefore, this remains an open question: Which statistical approach is more representative of the conditions of susceptibility?
To answer this question, we analyzed the various sectors of the research area in detail and the different maps were compared and analyzed ( Figure 6). In order to single out the targets in the present study, 75% of instability phenomena (training subset) surveyed in the regional inventory of landslides, produced by the Environment and Territory Department of the Sicilian region (ARTA) was used to create two different landslide susceptibility In order to single out the targets in the present study, 75% of instability phenomena (training subset) surveyed in the regional inventory of landslides, produced by the Environment and Territory Department of the Sicilian region (ARTA) was used to create two different landslide susceptibility maps: one based on the conditional analysis approach, and another on the binary logistic regression approach. The slope failure archive was simplified into two main types based on expert judgment of which preparatory variables they shared: the scarp landslide and hillslope landslide. The results obtained by CA showed an outstanding predictive ability for models based on a small number of predictive parameters combined in UCUs which were verified through spatial validation using a test subset randomly extracted from the Sicilian regional landslide inventory that covers the entire territory. ROC curve validation of the 100 different models showed the unquestionable excellence and stability of the forecasting performance for both SCR_LSN and HILL_LSN. A quality control test on four susceptibility maps was applied according to the degree of fit approach (DF) [24,27,[73][74][75]. DF represents the percentage of an area subject to landslides for each range of classes of susceptibility and is determined by cross-tabulation of the landslide test subset with the susceptibility class of maps. DF can be expressed with the following formula: where LSN is the area occupied in the i-class of susceptibility and Si is the area of the i susceptibility class [74,75]. The percentage of area subject to landslides falling into the null or moderate susceptibility class may be considered as a false negative error. The histograms shown in Figure 7 highlight an excellent predictive ability of new landslides not seen in the construction of maps. The percentages of relative accuracy are understood as the sum of the degrees of fit of the high and very high susceptibility classes [27] and the value of R 2 of the trend line confirms the goodness of the susceptibility performance models created.
Hydrology 2020, 7, x FOR PEER REVIEW 21 of 29 maps: one based on the conditional analysis approach, and another on the binary logistic regression approach. The slope failure archive was simplified into two main types based on expert judgment of which preparatory variables they shared: the scarp landslide and hillslope landslide. The results obtained by CA showed an outstanding predictive ability for models based on a small number of predictive parameters combined in UCUs which were verified through spatial validation using a test subset randomly extracted from the Sicilian regional landslide inventory that covers the entire territory. ROC curve validation of the 100 different models showed the unquestionable excellence and stability of the forecasting performance for both SCR_LSN and HILL_LSN. A quality control test on four susceptibility maps was applied according to the degree of fit approach (DF) [24,27,[73][74][75]. DF represents the percentage of an area subject to landslides for each range of classes of susceptibility and is determined by cross-tabulation of the landslide test subset with the susceptibility class of maps. DF can be expressed with the following formula: where LSN is the area occupied in the i-class of susceptibility and Si is the area of the i susceptibility class [74,75]. The percentage of area subject to landslides falling into the null or moderate susceptibility class may be considered as a false negative error. The histograms shown in Figure 7 highlight an excellent predictive ability of new landslides not seen in the construction of maps. The percentages of relative accuracy are understood as the sum of the degrees of fit of the high and very high susceptibility classes [27] and the value of R 2 of the trend line confirms the goodness of the susceptibility performance models created. The propensity of a territory to be affected by new landslides and the degree of hazard or risk that characterizes it are usually expressed with the help of a map in which the area is divided into different zones according to the different values that qualify it. In this mapping, the territory is zoned or divided into homogeneous zones or user-defined fields/areas, the ranking of which is defined according to their real or potential degree of landslide susceptibility [8].
From among the Italian regions, Sicily is one of areas most affected by geomorphological instability. Landslide activity is a clear threat to the territory, facilities, and people present there. From this point of view, each part of the territory is characterized by a landslide vulnerability value. Generally, it depends on the territorial level of exposure to the threat, determined by the socio- The propensity of a territory to be affected by new landslides and the degree of hazard or risk that characterizes it are usually expressed with the help of a map in which the area is divided into different zones according to the different values that qualify it. In this mapping, the territory is zoned or divided into homogeneous zones or user-defined fields/areas, the ranking of which is defined according to their real or potential degree of landslide susceptibility [8].
From among the Italian regions, Sicily is one of areas most affected by geomorphological instability. Landslide activity is a clear threat to the territory, facilities, and people present there. From this point of view, each part of the territory is characterized by a landslide vulnerability value. Generally, it depends on the territorial level of exposure to the threat, determined by the socio-economic value of the assets as well as by their resistance to the stresses expected. Interaction between humans and the natural environment is a very complex and diverse issue, not often approached in a systematic way, as resources are primarily invested in risk-mitigating measures, while being severely limited when it comes to the medium-and long-term research needed to understand the environment better and more effectively. The current PAI archive version is highly dependent on the past instability scenario. By looking at those that were counted and catalogued using a matrix system of evaluation, it is possible to derive the conditions of the associated geomorphological risk. This last concept represents a big step forward as it is now necessary to consider strongly the concept of landslide susceptibility which would represent the adoption of a spatial analysis tool with predictive power. On this scale, output cuts will be based on administrative boundaries (municipalities).
To meet this goal, the mean values of susceptibility (CA) and probability (BLR) were assigned to each municipality of the Sicily region (the calculation was only performed for the 382 municipalities included within the main island) in a GIS environment.
The output susceptibility values obtained by means of the two methods (CA and BLR) for the municipal boundaries are represented as maps in Figure 8. The picture shows the susceptibility maps relating to both conditional methods (a, b) and BLR (c, d) for the SCR_LSN and HILL_LSN typologies, respectively. In particular, it is possible to observe that for the SCR_LSN types, maps (a, c) are very different with regard to the distribution of the susceptibility values among the municipal boundaries. On the other hand, the comparison between the maps concerning the HILL_LNS type shows a convergence among the results and suitable differentiation for the urban units. Both statistical approaches (CA and BLR) show higher susceptibility landslide values in the northeastern portion of the island, in correspondence with the mountainous chains of Nebrodi and Peloritani in the province of Messina.

Final Remarks and Management Implications
The research results presented here have demonstrated the ability to leverage a set of existing data already in the public administration to generate regional landslide susceptibility maps. A total of 100 performances executed for both the CA and BLR approaches have allowed susceptibility patterns characterized by excellent performance to be obtained, with remarkable stability, robustness, and reliability, even if CA generated more powerful models in terms of forecasted performance (AUC curves). With the goal of creating, for the whole Sicilian territory, susceptibility maps that could be useful for the purposes of effective land management and spatial urban planning, this study proposes a reclassification method into four susceptibility map classes. The reclassified susceptibility maps, determined by the identified cutoff boundaries, were validated quantitatively with the degree of fit technique. Validations demonstrated the reliability of the maps created and the explorability of the proposed protocol. However, some differences may be highlighted when comparing the final maps generated with two different approaches. In fact, the maps generated with CA seem to better represent the conditions leading to the failure of a territory than those created by BLR-generated ones. This is more realistic for the SCR_LSN than for HILL_LSN. This is probably due to the fact that the BLR tries to find a correlation equation between landslides and predictor training models, using a diagnostic area (the entire area subject to landslides) not only representative of the trigger conditions, involving areas only passively affected by the landslide and, in some cases generating an overestimation of the susceptibility value.
Of the 382 of Sicilian municipalities examined, 244 (nearly 64%) were correctly classified as unstable (i.e., prone to landslide instability) as they fall in the classes of high and/or very high susceptibility, both for scarp and for hillslope landslides. We particularly want to highlight that susceptibility is predicted as being very high in 90 of 108 municipalities in the province of Messina (more than 80%). The municipalities featuring lower susceptibility values were those within the provinces of Syracuse and Ragusa. This approach is based on the propensity for gravitational instability, unlike PAI, and allows the concept of landslide density or density index to be overcome. Table 8 lists the top 10 municipalities in order, compared to the size of an area that may be affected by new landslides in the future.
For each municipality surface area, the extension area has been reported, the density (D) which corresponds to the landslide density for the current phenomena surveyed in the PAI and the P (or the area which is located to the right of mcutoff) for each municipality, and finally, the difference between the current framework of the landslide and that provided (Odd).
The table shows that the towns of Isola delle Femmine and Roccafiorita, which despite having a higher SCR_LSN density (respectively 7.04% and 6.04%) are the last two municipalities in the table, because of the difference, in terms of Odd, between the density and the future probability, which is lower than that seen in the municipalities of San Vito Lo Capo and Frazzanò. Almost 4 km of new territory in the municipality of San Vito Lo Capo, for example, could be affected by new SCR_LSN activations in the future. Similarly, we can say that linked to HILL_LSN, the territory of Alcara Li Fusi is the Sicilian municipality with the highest HILL_LSN density value, but the municipalities of Ficarra (with 8 km 2 ) and Sinagra (with 6.54 km 2 ), both in the province of Messina, will be most affected by new activations of hillslope landslides.
The large and widespread use of known geostatistical methods has gone through at least three decades of landslide hazard studies, but still does not eliminate some of the conceptual and operational bottlenecks, only sporadically resulting in the safety enforcement schemes by the authorities involved in studying landslide risk in Italy. Since economic problems, which are common to all countries, do not allow either investment in research projects on a medium-and long-term scale, the concept of landslide susceptibility should represent, for all political and administrative actors dealing with environmental and territorial policies, a new approach to the problem associated with mass movements. For this reason, the scientific community is engaged in a continuous search for methods and techniques to estimate the degree of real and potential instability, using the minimum amount of equipment, data, and economic resources possible. Generally, substantial difficulty exists in identifying the most reliable procedures, allowing this matter to be approached in a non-traditional manner based on modelling and investigative techniques built on the exchange of experiences between experts, studies, and experiments on every continent, and showing different strategies and possible technical combinations, depending on the type and/or the number and complexity of the investigation, producing susceptibility, hazard, and risk maps, used as the basis for decision-making processes in land management. In this framework, further effort is needed in trying to make the different methods more objective and shared by all, in order to be simple and repeatable and, most of all, in transferring the knowledge gained to laws that underpin territorial planning, building regulations, and in civil defense plans [22].
Over the decades, many research groups and national and international commissions have tried to provide precise definitions, in order to generate maps indicating the different urban planning vocation of an area. For this reason, the scientific community is engaged in a continuous search for methods and techniques to estimate the degree of real and potential instability, using the minimum amount of equipment and possible economic resources.

Conflicts of Interest:
The authors declare no conflict of interest.