The geospatial methodology presented in Section 2 was applied hereinafter for megazoning the earthquake-induced soil liquefaction risk in Europe. First, a European data-driven prediction model for liquefaction occurrence [7] was adopted for mapping the probability of liquefaction in the European territory (Section 3.1). Then, the liquefaction potential charts were convolved with the exposure model adopted for Europe (Section 3.2) to assess the liquefaction risk at the continental scale. A knowledge-driven method was adopted at this stage: the AHP technique was applied with the aim of mapping the liquefaction risk at the European scale, as illustrated in Section 3.3.

#### 3.1. Mapping the Probability of Liquefaction by Applying a European Prediction Model

Geological, geomorphological, hydrogeological, and seismological information in addition to digital terrain and shallow lithology data available at the European scale were collected and processed in a GIS database with the final aim of mapping the liquefaction potential in Europe.

A prediction model was calibrated after constructing a dataset of liquefaction occurrences from four recent European earthquakes whose main characteristics (e.g., ground shaking maps, liquefaction manifestations) are well documented [7]. This dataset was adopted to identify the optimal explanatory variables best correlated with liquefaction occurrences. The last step was the calibration of the logistic regression model to predict liquefaction manifestations in both historical (for the validation of the model) and future earthquakes.

The optimal explanatory variables that, at a given point of the territory, are best correlated with liquefaction occurrence were identified according to the criteria of practicality, efficiency, and proficiency characterizing the Luco and Cornell methodology [16,17,18,19], which was originally developed for applications in structural earthquake engineering. Nine candidate explanatory variables were considered (e.g., the distance to the nearest river, to the nearest coast, and to the nearest water body; the Terrain Roughness Index (TRI); and the Topographic Position Index (TPI)). Three out of the nine explanatory variables were selected as optimal geospatial predictors of liquefaction occurrence [7]. They are listed as follows:
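The list of predictors did not survive the conversion of this copy. From the variable names used later in the text, they are presumably the Compound Topographic Index (CTI), the natural logarithm of the magnitude-weighted peak ground acceleration PGA_m, and the natural logarithm of V_{S30}; the product form assumed for PGA_m (with S the EC8 soil factor) is an inference from the description below, not a verbatim reconstruction:

```latex
% Presumed predictors (reconstructed; the original equations are missing):
x_1 = \mathrm{CTI}, \qquad
x_2 = \ln(\mathrm{PGA_m}), \quad \mathrm{PGA_m} = S \cdot \mathrm{PGA} \cdot \mathrm{MWF}, \qquad
x_3 = \ln(V_{S30})
```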

where PGA denotes the values of peak ground acceleration (horizontal component), referred to three different return periods (475, 975, and 2475 years), for standard ground conditions (outcropping bedrock and level site), extracted from the deliverables of the SHARE project (Seismic Hazard Harmonization in Europe; http://www.share-eu.org/). These values of PGA were then multiplied by the soil factor defined in Eurocode 8 Part 1 [22] (hereinafter, EC8) to take into account possible site effects. Ground categories of EC8 were assigned by exploiting the global topographic-slope-based V_{S30} map. In doing so, only 1D lithostratigraphic amplification was explicitly considered. MWF stands for Magnitude-Weighting Factor, which is defined by the following equation:
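The defining equation is missing from this copy. Given that the text below states the MWF is the inverse of the MSF and that the Youd et al. (2001) MSF model was adopted, it presumably reads:

```latex
\mathrm{MWF} = \frac{1}{\mathrm{MSF}} = \frac{M_w^{2.56}}{10^{2.24}}
```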

where Mw was defined by using the European seismogenic zoning proposed in the SHARE project. The MWF [23] is the inverse of the Magnitude-Scaling Factor (MSF). Indeed, the MSF is typically used as a proxy for earthquake duration, which plays a key role in liquefaction occurrence. Different models are available in the literature to define the MSF. In this study, the model adopted is that proposed by Youd et al. (2001) [24].

As mentioned above, the optimal geospatial explanatory variables were used to calibrate a data-driven model to predict liquefaction occurrence. For this purpose, logistic regression was applied to model liquefaction occurrence in Europe based on recent studies in the literature [2,3,4]. Logistic regression is a machine learning procedure that is effective when analyzing a dataset in which several independent (explanatory) variables determine a binary outcome. In this case, the outcome is represented by a liquefaction label: 1 stands for liquefaction, 0 stands for no liquefaction. The independent variables are the geospatial explanatory proxies selected with the Luco and Cornell methodology, namely CTI, ln(PGAm), and ln(V_{S30}). In the logistic regression, the probability of liquefaction is therefore calculated using the following equation:
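The equation itself is missing from this copy; for a logistic regression it is the standard logistic (sigmoid) function:

```latex
P[\text{liquefaction}] = \frac{1}{1 + e^{-X}}
```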

where X was computed as
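The referenced expression (Equation (4) of the original, judging by the sentence that follows) is missing here; it is presumably the usual linear combination of the predictors:

```latex
X = \gamma_0 + \sum_{k} \gamma_k x_k
```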

where the x_{k} are the explanatory variables and the γ_{k} are the coefficients of the regression. By integrating the optimal geospatial predictors into Equation (4), the latter can be rewritten as follows:
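Equation (5) is missing from this copy; given the coefficients A, B, C, and D named in the next paragraph and the three optimal predictors, it presumably reads:

```latex
X = A + B \cdot \mathrm{CTI} + C \cdot \ln(\mathrm{PGA_m}) + D \cdot \ln(V_{S30})
```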

The calibration of the coefficients A, B, C, and D of the regression was carried out using the previously mentioned dataset of liquefaction occurrences associated with the four European earthquakes, balanced after applying different sampling techniques. Indeed, three alternative re-sampling algorithms were adopted to correct the imbalance between the sizes of classes 1 and 0 in the dataset of cells: the under-sampling technique [25], the Synthetic Minority Over-Sampling Technique (SMOTE) [26], and the Adaptive Synthetic (ADASYN) [27] algorithms.

Each of these three re-sampling techniques was used to construct a different logistic prediction model for Europe. To calibrate the coefficients of the logistic regression, the balanced dataset was split into two subsets: a training dataset and a testing dataset. In each of these two subsets, the one-to-one ratio between the classes was maintained. The training dataset, including 75% of the balanced dataset, contained the cells upon which the coefficients of the prediction model were calibrated. The testing dataset, on the other hand, contained data that were used to test the performance of the model in predicting liquefaction occurrence in historical earthquakes.
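The workflow above — balancing by under-sampling, a 75%/25% split, and a logistic fit — can be sketched as follows. This is a minimal illustration on synthetic data (the cell dataset, feature values, and labels are invented stand-ins, not the LIQUEFACT data), using plain gradient descent rather than whatever solver was actually employed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the cell dataset: 3 predictors per cell,
# strongly imbalanced labels (1 = liquefaction, 0 = no liquefaction).
n_pos, n_neg = 200, 1800
X = np.vstack([rng.normal(1.0, 1.0, (n_pos, 3)),
               rng.normal(-1.0, 1.0, (n_neg, 3))])
y = np.concatenate([np.ones(n_pos), np.zeros(n_neg)])

# Random under-sampling: keep all class-1 cells, draw an equal number of
# class-0 cells so the balanced dataset has a one-to-one class ratio.
idx1 = np.flatnonzero(y == 1)
idx0 = rng.choice(np.flatnonzero(y == 0), size=idx1.size, replace=False)
idx = rng.permutation(np.concatenate([idx1, idx0]))
Xb, yb = X[idx], y[idx]

# 75% / 25% train/test split of the balanced dataset.
n_train = int(0.75 * len(yb))
X_tr, y_tr = Xb[:n_train], yb[:n_train]
X_te, y_te = Xb[n_train:], yb[n_train:]

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit P = 1 / (1 + exp(-X)) by gradient descent on the log-loss."""
    Xa = np.hstack([np.ones((len(X), 1)), X])   # prepend intercept column
    gamma = np.zeros(Xa.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xa @ gamma))
        gamma -= lr * Xa.T @ (p - y) / len(y)   # log-loss gradient step
    return gamma

def predict_proba(X, gamma):
    Xa = np.hstack([np.ones((len(X), 1)), X])
    return 1.0 / (1.0 + np.exp(-Xa @ gamma))

gamma = fit_logistic(X_tr, y_tr)
acc = np.mean((predict_proba(X_te, gamma) > 0.5) == y_te)
```

SMOTE and ADASYN would replace the under-sampling step by synthesizing minority-class cells instead of discarding majority-class ones.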

The goodness of the predictions of the alternative models was assessed quantitatively through a Receiver Operating Characteristic (ROC) analysis and expressed in terms of the Area Under the Curve (AUC). The latter is a useful metric for organizing classifiers and visualizing their performance, e.g., in Fawcett [28]. The ROC analysis applied to the training set showed that the best prediction model of liquefaction occurrence is that associated with the ADASYN algorithm, although the under-sampling and SMOTE algorithms showed good performance as well [7].
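For reference, the AUC can be computed directly from the predicted probabilities via the rank-sum identity: it equals the probability that a randomly chosen liquefied cell receives a higher score than a randomly chosen non-liquefied one. A minimal sketch (assuming no tied scores; tie handling is omitted for brevity):

```python
import numpy as np

def roc_auc(y_true, scores):
    """AUC via the Mann-Whitney U / rank-sum identity (no tie handling)."""
    y_true = np.asarray(y_true, dtype=bool)
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)  # rank 1 = lowest score
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    return (ranks[y_true].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```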

After calibration of the coefficients, Equation (5) can be rewritten as follows:

Further details on the prediction model adopted for mapping the probability of liquefaction in Europe are discussed in the article by Bozzoni et al. [7]. The latter also computed liquefaction potential charts for continental Europe with reference to three levels of severity of the expected ground motion, namely for a seismic hazard with return periods of 475, 975, and 2475 years. The outcomes of this work were validated by superimposing on the charts the historical liquefaction cases from the European catalogue delivered in the LIQUEFACT project. The charts of Figure 2a,b show the mega-zonation of the European territory for liquefaction potential with reference to a return period of 475 years, superimposed on the locations of historical liquefaction occurrences associated with the same return period increased and decreased by ±10% [7]. It is important to remark that the latter are independent data with respect to the ones used to calibrate the prediction model. In the map, almost all the historical liquefaction manifestations are located in areas characterized by liquefaction potential (red areas in Figure 2a). This is particularly evident in Italy and the Balkan region. The results are also displayed in Figure 2b according to a chromatic scale based on the following five classes of probability of liquefaction PL defined by Zhu et al. [2]: PL < 0.01 very low; 0.01 ≤ PL < 0.03 low; 0.03 ≤ PL < 0.08 medium; 0.08 ≤ PL < 0.2 high; 0.2 ≤ PL < 1 very high. It is worth noting that the historical liquefaction occurrences are mainly located in areas characterized by a very high probability of liquefaction.

It is important to remark that the liquefaction potential maps were computed in Bozzoni et al. [7] by pre-filtering the datasets in order to exclude from the analysis the territories that were either geologically incompatible with the phenomenon of earthquake-induced liquefaction (e.g., rocky formations) or characterized by an expected severity of ground shaking too low for liquefaction triggering (a value of horizontal peak ground acceleration equal to 0.1 g was arbitrarily assumed as a threshold in this regard).
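In raster terms, this pre-filtering amounts to masking out cells before mapping. A minimal sketch with invented 2×2 layers (the values and layer names are hypothetical, for illustration only):

```python
import numpy as np

# Hypothetical raster layers on the same 1-km grid:
pga = np.array([[0.05, 0.12],
                [0.25, 0.30]])                  # horizontal PGA [g]
rock = np.array([[False, False],
                 [True,  False]])               # geologically incompatible cells

p_liq = np.full(pga.shape, 0.4)                 # placeholder probabilities

# Exclude cells below the 0.1 g shaking threshold or on rocky formations.
mask = (pga < 0.1) | rock
p_liq[mask] = np.nan
```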

#### 3.2. Exposure Model for Europe

Two indicators were used to account for the exposure in the computation of the risk maps. The population density, which is a well-established proxy in the case of urbanized territories [29], was combined with the CORINE land cover map for Europe, which provides the geo-referenced distribution of non-residential, strategic areas in Europe.

The population density data for Europe were obtained from the Global Human Settlement Layer (GHSL; https://ghsl.jrc.ec.europa.eu/index.php). The census data refer to the year 2015, and two different resolutions are available, 250 m and 1 km. The data are provided in a raster format, in which each cell contains the estimated number of inhabitants in that cell. The resolution adopted for this study was 1 km, to be consistent with the resolution of the previously presented hazard charts. The raster map represents the population density in inhabitants/km^{2}, the most common unit for expressing population density. The data were grouped into five classes of exposure, as done for the probability of liquefaction in Section 3.1. In particular, the following classes for the population density (Pd) were adopted:

very low: Pd < 400 inhab./km^{2};

low: 400 ≤ Pd < 800 inhab./km^{2};

medium: 800 ≤ Pd < 2000 inhab./km^{2};

high: 2000 ≤ Pd < 5000 inhab./km^{2};

very high: Pd ≥ 5000 inhab./km^{2}.
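The classification above maps directly onto a binning operation. A small sketch (the sample density values are invented for illustration):

```python
import numpy as np

# Class boundaries for population density Pd (inhab./km^2), as listed above.
bins = [400, 800, 2000, 5000]
labels = ["very low", "low", "medium", "high", "very high"]

pd_values = np.array([120, 400, 1500, 4999, 12000])
classes = np.digitize(pd_values, bins)   # bins[i-1] <= x < bins[i]
named = [labels[c] for c in classes]
```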

An additional proxy for exposure was found in the CORINE European map (http://land.copernicus.eu/pan-european/corine-land-cover/clc-2012/view), which provides a geo-referenced inventory of land cover in Europe. The land-use data were particularly helpful for identifying strategic areas that the population density alone could not capture.

Thus, the population density and the CORINE land cover were jointly adopted as exposure proxies. In particular, areas where airports, ports, roads, and railways are located across Europe were assigned to the highest exposure class (very high). The final exposure model is shown in Figure 3.

#### 3.3. Assessment of the Liquefaction Risk at the European Scale by Using the AHP Technique

Since the risk of occurrence of earthquake-induced liquefaction is the convolution of seismic hazard, vulnerability, and exposure, the charts of liquefaction risk for Europe were calculated by convolving these three variables, represented as geospatial data at a continental scale. The risk assessment was carried out using the Analytical Hierarchy Process (AHP), a multi-criteria decision method introduced by Saaty [8] and then successfully applied for mapping natural hazards [9,10,11]. The AHP technique belongs to the category of knowledge-driven methods. Indeed, a set of explanatory variables were ranked; next, their relative importance was assessed with respect to the goal of the mapping by assigning weights via a pairwise comparison matrix. The final risk map was calculated based on weighted sums and rating assignments through a sequence of overlay operations. The assigned ranking was based on expert judgment. The geospatial explanatory variables representing hazard, vulnerability, and exposure were combined, and the goal, with respect to which the variables were compared, was the liquefaction risk map. The more a variable influences the risk, the higher its weight in the calculation of the final chart. The main steps of the AHP procedure are explained hereinafter.

In Step 1, the alternatives were arranged in a GIS environment and their values were classified into different classes. The classes were ranked, from the highest class (i.e., the value that has the greatest importance for the goal of the map) to the lowest. Step 2 of the procedure involved organizing the data in a matrix. The pairwise comparison of the alternatives was carried out on a qualitative scale. Experts rated each comparison as equal, marginally strong, strong, very strong, or extremely strong, as shown in Table 1. As an example, the row corresponding to alternative A in the column corresponding to alternative B shows the value 9, indicating that A is “extremely strong” compared to B in light of the objective. In general terms, the alternatives in the i-th row are stronger than those in the j-th column if the (i, j) element of the matrix is greater than 1; otherwise, the alternatives in the j-th column are stronger than those in the i-th row. Consequently, the (j, i) element of the comparison matrix is the reciprocal of the (i, j) element.

Step 3 focused on the computation of the principal eigenvalue and the corresponding normalized right eigenvector of the comparison matrix built in Step 2. The elements of the normalized eigenvector were termed weights with respect to the goal of the map and the comparison of the alternatives. Step 4 involved an assessment of the consistency of the comparison matrix. Indeed, the AHP incorporates an effective technique for checking the consistency of the evaluations made by the decision-maker when building each of the pairwise comparison matrices involved in the process. A consistency index is computed and, if it exceeds a pre-defined threshold, the comparisons may have to be re-examined. The Consistency Index (CI) was calculated by adopting the following relation:
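The relation referenced here did not survive extraction; it is Saaty's standard Consistency Index, consistent with the definitions of λ_{max} and n given just below:

```latex
CI = \frac{\lambda_{\max} - n}{n - 1}
```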

where λ_{max} is the maximum eigenvalue of the judgement matrix and n is the order of the matrix. CI is then compared with the consistency index of a Random Matrix (RI). The corresponding ratio, i.e., CR = CI/RI, is termed the Consistency Ratio. Saaty [8] suggested that the upper threshold value of CR should be 0.1.
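Steps 3 and 4 can be sketched numerically as follows. The 3×3 pairwise comparison matrix below is a hypothetical example (not the matrix actually used in the study), and the RI values are Saaty's tabulated random indices:

```python
import numpy as np

# Illustrative pairwise comparison matrix for three alternatives
# (e.g., hazard vs. vulnerability vs. exposure); A[j, i] = 1 / A[i, j].
A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 3.0],
              [1/5, 1/3, 1.0]])

# Step 3: principal eigenvalue and normalized right eigenvector -> weights.
eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)
lam_max = eigvals.real[k]
w = np.abs(eigvecs[:, k].real)
w = w / w.sum()                                  # weights sum to 1

# Step 4: consistency check, CI = (lam_max - n) / (n - 1), CR = CI / RI.
n = A.shape[0]
CI = (lam_max - n) / (n - 1)
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}[n]   # Saaty's random indices
CR = CI / RI
consistent = CR < 0.1                            # Saaty's acceptance threshold
```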

In the final step, namely Step 5, the value of each alternative was multiplied by its own weight. Subsequently, the weighted values were summed and the final rank was calculated. This last step was developed in a GIS environment. The alternatives were represented by overlapped raster files: every pixel of each raster contains a value assigned in the first step. The final raster file is a map in which each pixel contains the weighted sum of the values in the corresponding pixels of the alternatives.
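The pixel-by-pixel weighted overlay of Step 5 reduces to a weighted sum over stacked arrays. A minimal sketch with invented 2×2 rasters and example weights (both hypothetical):

```python
import numpy as np

# Hypothetical ranked raster layers (class values 1-5 per pixel).
hazard        = np.array([[5, 2], [1, 4]], dtype=float)
vulnerability = np.array([[4, 3], [2, 5]], dtype=float)
exposure      = np.array([[5, 1], [1, 3]], dtype=float)
weights = np.array([0.637, 0.258, 0.105])   # e.g., from the AHP eigenvector

# Pixel-by-pixel weighted sum of the overlapped rasters.
layers = np.stack([hazard, vulnerability, exposure])
risk = np.tensordot(weights, layers, axes=1)
```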