1. Introduction
Most of the Italian territory is characterized by significant seismicity and by a large number of existing buildings with high seismic vulnerability [1]; indeed, more than 70% of the existing building stock was built in the absence of any seismic standard [2]. For this reason, the mitigation of seismic risk should be pursued through suitable strategies, measures, and interventions aimed at improving seismic behavior [3,4,5]. The huge number of existing buildings and the urgent need to assess the performance levels, potential damages, and socioeconomic losses and to provide retrofitting actions entail a preliminary screening on a territorial scale, aimed at the prioritization and planning of further and more detailed analyses while optimizing the necessary resources.
At this scale of analysis, it is not possible to use the procedures generally employed to assess single buildings, due to the detailed information required and the considerable computational burden, but simplified procedures should be used, which are easily implementable on the basis of limited information and provide results with acceptable reliability and accuracy [
6].
Different types of methods, widely developed in the last 50 years, are particularly suitable to this aim; they are generally classified as three different approaches [
7,
8]: (i) empirical procedures that identify building categories, called “vulnerability class”, on the basis of recurrent typological and structural features to define corresponding vulnerability functions calibrated using damage observed after past earthquakes or combine the evaluation of a few parameters (geographical position, general characteristics of the structure, and possible damage), to obtain a final seismic vulnerability index to establish a relation between the seismic action and response of the buildings [
9,
10,
11,
12,
13,
14,
15,
16,
17,
18,
19,
20]; (ii) mechanical/analytical procedures that define vulnerability functions as a relation between structural capacity and seismic demand by means of structural analysis of one or more numerical models representative of samples of buildings. However, these types of methodologies are in many cases burdensome for large-scale applications, requiring very detailed knowledge about the building features and time-consuming structural calculations [
21,
22,
23,
24]; and (iii) hybrid procedures that calibrate the results of the mechanical/analytical assessment with post-earthquake observational damages, useful in the cases of partial absence of damage data or difficulty with the calibration or validation of the results of analytical models [
25,
26]. It is worth adding that the implementation of all these procedures for seismic vulnerability and risk assessment on a large scale represents a fundamental step for any subsequent loss estimation [
27,
28,
29,
30] able to describe in a direct and immediate way the impacts of an earthquake event on society and the economy and, hence, is helpful for the public authorities and decision makers to plan and manage earthquake risk mitigation and disaster risk-reduction measures optimizing the necessary resources [
28].
Despite several proposals and applications of the above-mentioned procedures, the lack of the information needed to carry out seismic assessment procedures and the need to reduce the cost and time of collecting and managing a huge amount of data remain challenges; as a matter of fact, in recent years, large-scale data collection procedures have been developed with the common aim of defining inventories of existing buildings [
31,
32]. The level of detail of the knowledge framework reachable is closely connected to the size of the building stock investigated and the context of application [
33]; from this perspective, it is possible to follow three different approaches depending on the scale and scope of the evaluation: the first approach requires few general data of a certain set of buildings, e.g., the building typology and the structural material and age of construction, suitable for national, regional, and urban scales; the second approach entails the use of more information about morphological, geometric, and structural features, applicable on an urban or district scale; the third approach requires very detailed knowledge of single buildings in order to perform a sophisticated numerical analysis, which is not feasible for large-scale application [
34].
The degree of detail achievable also depends on the type and resolution of available information sources. The most commonly used, as described in [
32], are the following:
- Census data, which are freely accessible and provided in the form of aggregate data about structural typology, age of construction, number of floors, and state of maintenance, with regard to specific portions of urban territory, allowing for assessment from a national to urban scale; 
- Interview-based surveys, which allow one to gather more refined information about typological-structural features of sets of buildings by interviewing local technicians with extensive knowledge and experience. Different types of these procedures are available in the literature [34,35,36,37,38,39]. In particular, the CARTIS procedure [40], developed in the framework of the ReLUIS project by the Italian Civil Protection Department, represents a peculiar approach, collecting data about recurrent typological–structural building classes within homogeneous urban sectors characterized by uniformity of the urban fabric and year of construction; 
- Remote sensing (high-resolution (HR) or very-high-resolution (VHR) satellite imagery), which processes satellite images to obtain a huge amount of georeferenced general data about buildings over very large portions of territory, e.g., the footprints, number of floors, and height, but should be integrated with other data sources to provide further information fundamental for vulnerability assessment [41,42]; 
- Building-by-building surveys, which are able to define the most detailed knowledge about individual buildings and are suitable to perform an accurate vulnerability assessment of the focused structures but are impracticable for large-scale assessment due to the time, computational efforts, and economic resources required. 
In addition to these sources of information, in recent years, there has been an increase in the popularity of applications of machine learning (ML) in the data collection process for seismic vulnerability and risk assessment, and some proposals have been developed to rapidly obtain a large amount of information processing images of wide sets of existing buildings with low computational effort [
43,
44,
45,
46].
Other elements that should not be overlooked are the uncertainties associated with each piece of input information, which propagate through the analysis and produce a consequent total uncertainty in the results, especially in the case of the partial lack of data typical of large-scale applications. Uncertainties are generally classified into three categories: (i) aleatory variability, linked to the inherent randomness of earthquake phenomena; (ii) epistemic uncertainty, due to a lack of knowledge and to the hypotheses underlying the models; (iii) ontological uncertainty, which is unknown or unexpected and very difficult, if not impossible, to take into account; hence, in all the models for seismic vulnerability and loss assessments, the first two types of uncertainties must be taken into account [
28].
It is clear that, to face the problems of absence, unavailability, and/or inaccessibility of information, the integration of the abovementioned types of sources allows one to derive the missing data, obtaining a nearly complete knowledge framework for a large set of buildings; in addition, the uncertainty, accuracy, and reliability of information should be evaluated by means of comparison with more detailed data (if available) or by implementing different procedures of seismic vulnerability analysis on a different scale.
In this broad framework, a suitable tool is required to manage, transform, overlay, process, and display a large number and wide variety of data formats in the same environment [
47]. As a consequence, the use of the Geographic Information System (GIS) is gaining momentum in civil engineering applications, especially in the field of risk assessment on a large scale. The potential of such an IT tool lies in the possibility of creating representative georeferenced databases by manipulating and integrating large sets of different types of information derived from the overlap of several data sources and then implementing automatic numerical algorithms for multiple purposes [
48]. Indeed, in the GIS environment, it is possible to manage, combine, and analyze geospatial databases about existing building stock and display general information and results about vulnerability assessment, damage, and loss estimation [
49,
50,
51,
52]. In addition, the data collection procedure and the subsequent seismic vulnerability and risk assessment on a large scale can be enhanced through the use of fuzzy logic [
53,
54,
55], machine learning [
46,
56,
57], and Artificial Neural Networks (ANNs) [
58,
59,
60,
61]. As a matter of fact, in recent years many proposals have been developed that implement computer algorithms trained using post-earthquake data, building images, and/or expert-opinion-based information to identify the fundamental input parameters and calculate seismic vulnerability and risk indicators. In this framework, the GIS represents a suitable tool to collect a large amount of data and images [
62,
63] useful for the training phase. These developments make possible the comprehensive use of a variety of factors and the rapid extrapolation of the parameters necessary for seismic assessment on a large scale, and they make it possible to check the reliability and uncertainty of the information gathered and of the results obtained using traditional techniques [
63,
64].
In this context, the authors have proposed a procedure to extract, integrate, and process a large amount of data from different available sources, in order to construct a georeferenced cartographic and descriptive database in a GIS environment and to perform the seismic vulnerability assessment of existing residential buildings on a large scale through indirect, empirical algorithms.
The procedure follows a top-down approach, using all available datasets with different levels of detail. An application has been developed for a case study in Puglia, Italy, the City of Bisceglie. The analysis of the urban area is performed on the basis of historic cartography and ISTAT statistical data [
2], providing a preliminary definition of homogeneous urban sectors. In each of these urban sectors, the recurrent geometrical, structural, and technological features of buildings are identified using more detailed georeferenced datasets and the CARTIS procedure [
40], and different typological building classes representative of the entire building stock are defined. To validate the database, the features of some typological building classes are verified against a sample of buildings for which detailed information is available. Finally, on the whole database for the examined urban areas, a simplified version of the procedure proposed in [
65,
66] is applied, yielding the seismic vulnerability assessment of 3726 individual buildings, and a 3D interactive tool is released for the visualization of data and results. 
  2. State of the Art about GIS-Based Building Inventory for the Seismic Vulnerability Assessment on a Large Scale
A GIS is an IT tool able to store, manage, process, compare, correlate, and display a large number of spatial variables, derived from different georeferenced datasets, that are useful to many types of applications, e.g., risk and environmental impact assessment, emergency response and management, urban planning, cartography, and geographic history, regarding more or less extended territorial areas. In addition, GISs are flexible, freely available, open-source, and capable of containing multi-hazard information; as a matter of fact, during the last decade geotechnical, topographical, geological, and hazard data, and various satellite images, have been realized, collected, processed, and integrated into qualitative and quantitative spatial databases, allowing for the development of various types of GIS-based modeling [
48]. For these reasons, the GIS also represents a powerful tool in the field of seismic vulnerability and risk assessment on a large scale, for which all approaches and methodologies must be based on a wide and accurate set of information: knowledge of the morphological, geometrical, and structural features of a town, its streets, its buildings, and their aggregation constitutes crucial input for subsequent analyses [
49]. Nevertheless, as already highlighted, at this scale of application, the accessibility, availability, and reliability of data represent the main hurdle to overcome, and the focus should be the data collection procedures and the improvement of methods to investigate the seismic vulnerability of buildings [
50]; indeed, in recent years, several procedures have been developed.
In [
67], a computer-aided strategy for the rapid visual inspection of buildings and the prioritization of strengthening interventions necessary prior to and after an earthquake event is proposed; the first step of the methodology regards the building inventory compilation and a vulnerability ranking procedure for the existing buildings in the specific area investigated, implemented into a GIS-oriented and multi-functional tool, allowing for the management, evaluation, processing, and visualization of the spatially distributed data gathered during the pre- and post-earthquake assessment.
In [
47], the procedure for the construction of a complete GIS database focusing on the earthquake hazard, architectonic/urban planning, and structural vulnerability analyses is developed and applied to the case study of San Giuliano di Puglia. In this case, many different types of information were available, gathered during the activities carried out after the earthquake occurred in the area in 2002; indeed, after the identification of regional context and research on the history and morphologic evolution of the city, all the buildings of the historical center were catalogued, overlapping and integrating all available data: cadastral maps; information derived by damage forms completed by Italian Civil Defense teams during the earthquake emergency; data gathered by ENEA experts; materials on reconstruction projects coming from the Municipality and professionals; and information coming directly from in situ surveys. Using this detailed georeferenced database, a seismic vulnerability assessment and the qualitative identification of collapse mechanism for each masonry building within the historical center were performed quickly using a vulnerability index method.
In [
68], a method for rapidly characterizing seismic disaster risk on a large scale is proposed by focusing on monetary loss, integrating accurate information about the heights and footprint areas derived using remote sensing with building-relevant local knowledge, that is, various sources of information regarding buildings in a specific area, e.g., local building codes, local dwelling traditions, and local building construction planning. On the basis of such a georeferenced database, the damage probability matrix method has been quickly implemented to assess the seismic vulnerability of a large number of existing buildings.
In [
69], an integrated model to consider the five main groups of parameters (geotechnical and seismological, social, distance to dangerous facilities, and access to vital facilities) has been implemented, along with related sub-parameters for the seismic vulnerability analysis of an urban area, using the analytical hierarchy process (AHP) and the GIS.
A rapid visual screening (RVS) method for reinforced concrete buildings is developed, implemented, and managed in the GIS in [
70], exploiting data gathered during an intensive field survey, to elaborate a vulnerability scenario in terms of damage grade for a building stock in an urban area.
In [
71], the GIS is used as a proactive tool to predict and monitor the vulnerability of masonry spired damage in the area hit by the Emilia 2012 earthquake, as well as the object of extensive surveys in the post-earthquake; thus, a georeferenced database to statistically elaborate the data and define the correlation between the levels of damage and the type of masonry arrangement is designed.
In [
72], a QGIS plugin [
73] is presented to implement two vulnerability index-based methods for old masonry buildings and generate vulnerability maps of an urban area, predicting potential damage scenarios and deriving information useful in risk mitigation strategies; in this case, the information about buildings has been taken by means of rapid visual structural and geometrical inspection.
In [
64], the authors propose a new approach using an artificial intelligence system model implemented on the basis of a large set of data about different types of buildings surveyed and collected using the GIS for many years; the aim is to predict the damage to buildings on an urban scale, considering input uncertainties, by means of capacity spectrum method (CSM) to obtain a large set of training data from a combination of three parameters: earthquake magnitudes, structural types, and distances between the epicenter and buildings.
In [
74], a GIS-based seismic risk assessment of a local district on the basis of data gathered from the national census was performed, to assess the vulnerability of local populations, using a holistic model to estimate exposure, resilience, and capacity factors. GIS tools allow one to generate and display maps to highlight the socio-economic and physical characteristics of vulnerability for a district in Pahang, Malaysia. Subsequently, a seismic risk map of the investigated area is produced by overlaying the derived map with the seismic hazard map.
An integrated approach is tested and validated for a city context in [
52], combining data-mining, remote sensing interpretation, GIS-based mapping, and vulnerability index methods and generating vulnerability maps in terms of the distribution of the expected damage grade. A similar approach is used in [
41] with the further aim of ensuring the constant updating of the GIS building inventory, which is able to identify large newly constructed buildings in a large urban context by comparing the high-resolution satellite images with the existing GIS data.
In all the above-mentioned studies, GIS represents a fundamental tool for seismic vulnerability and risk assessment on a large scale, allowing the following operations: (a) the management of a large number of georeferenced data; (b) the integration and processing of several information layers; (c) the rapid implementation of different large-scale assessment procedures; and (d) the elaboration of data and results in simple and immediate thematic maps. Within this process, the key step is the derivation of all possible information about existing buildings from different types of sources and several data-gathering approaches, with the aim of constructing a suitable knowledge framework for the implementation of seismic assessment procedures. However, it is worth noting that, in all previously mentioned studies and applications, the starting point was a wide variety and quantity of data, often collected during intensive in situ surveys or using refined technologies, e.g., remote sensing or high-resolution satellite images, that were already available in a format directly implementable in a GIS environment. In general, however, it is not possible to rely on such a complete information framework, and missing data must be derived by exploiting and integrating limited sources. The present work is set in this framework, proposing an approach suited to circumstances in which data, time, and economic resources are lacking. The approach derives missing information about the existing building stock, from the urban scale down to the building scale, without resorting to extensive and burdensome detailed on-field inspections and/or costly technologies, exploiting solely limited “poor” data, with the aim of realizing an information tool that is easily searchable and dynamically implementable using the potential of the GIS. 
The issue is addressed by proposing a procedure to construct a georeferenced buildings database (GBD) for the seismic vulnerability assessment of an entire urban context, directly in a GIS environment, starting from limited information and sources, e.g., the ISTAT statistical dataset [
2], georeferenced cartography, and CARTIS survey form [
40]. The reliability of the GBD is then verified by means of building-by-building surveys performed for a sample of buildings. An application is proposed for the case study of Bisceglie, located in Puglia, obtaining a georeferenced database composed of layers with three different levels of detail depending on the minimum entity (ME) of information: census section (CS), urban block (UB), and individual building (IB), which are functional regarding the implementation of several types of seismic assessment procedures. In the specific case, using information stored in GBD, a Vulnerability Index Method [
48,
49], available in the literature, has been directly implemented in a GIS environment, obtaining an Output Vulnerability Index for each of the 3726 IBs. These results are stored together with the other information in the GBD; displayed in 2D maps; and used for the realization of a 3D interactive tool that is easily manageable, upgradeable, and searchable.
  3. Methodology
The proposed method allows one to construct a georeferenced database of the existing building stock of an entire municipality, using information derived and integrated from different sources in order to carry out a typological classification based on a few meaningful typological, geometrical, structural, and technological characteristics shared by a group of similar buildings. For this information set, a rapid seismic vulnerability assessment based on the Vulnerability Index Method is automatically implemented in a GIS environment. To this end, the GIS is a powerful tool able to execute deep and integrated spatial analysis on a large scale and manage a huge amount of information, enabling at the same time a rapid search; the automatic implementation of assessment procedures; and, finally, the effective visualization of data and results.
The method follows a top-down approach, starting from the collection and analysis of data of an entire urban context and is structured according to the following different phases:
- Documentation retrieval, data gathering, and integration from different types of sources with different levels of detail and GIS implementation; 
- The identification of homogeneous urban sectors and the definition of different typological structural building classes; 
- Validation through a comparison of some typological structural building classes with the characteristics of a sample of buildings; 
- The automatic implementation of indirect methods of seismic vulnerability assessment and the elaboration of results in 3D interactive maps. 
In Figure 1, a flowchart of the procedure is shown. Data gathering and integration in GIS are carried out by implementing different georeferenced information layers, proceeding from coarser to finer levels of detail: at first, the time development of the urban fabric is studied by overlapping historic cartographies, aerial photos, and base maps; then, the analysis of ISTAT information allows for a preliminary subdivision of the urban area. Subsequently, the definition of homogeneous urban sectors with the relative typological building classes is refined by integrating the information derived from the CARTIS procedure. All obtained data are associated and added to those already available at the level of UB, down to the level of IB. For the latter, it is also possible to recognize the relative CARTIS typological class by means of proper matching of geometrical data. The reliability and accuracy of this classification are verified by means of building-by-building surveys on a sample of buildings. All obtained data are stored in the GBD, allowing for the implementation of several types of seismic vulnerability assessment procedures. As a final step, it is possible to plot 2D thematic maps and implement a 3D tool useful for storing, querying, and processing information and results. The structure and the phases of the procedure are illustrated in more detail in the following sections.
  3.1. Data Gathering and Integration in GIS
The preliminary phase consists of the documentation retrieval and collection of all available information. It is worth highlighting that the availability and quality of data are connected to the specific context of the analysis, but generally it is possible to collect information from different sources and with different levels of detail: statistical data, the existing historical cartography, site maps, aerial photogrammetric images, the geospatial database, technical cartographies, and a rapid in situ survey. These sources are usually provided by the technical departments of municipalities or are available in web platforms in a digital format that can be easily implemented and managed directly in a GIS environment.
The sources used are as follows:
- The ISTAT dataset, which is one of the main sources of statistical data about existing buildings and is available for the whole Italian national territory in the form of georeferenced databases that contain variables and vector files for each census section (a subarea of the municipal territory) [2], defined as a polygon with associated attributes. For each CS, aggregate data are reported on the number of buildings by structural typology (masonry, reinforced concrete, and other material); class of “age of construction” (<1919, 1919–1945, 1946–1960, 1961–1970, 1971–1980, 1981–1990, 1991–2000, 2001–2005, and >2005); number of floors (1, 2, 3, and ≥4); and maintenance state (bad, mediocre, good, and excellent); 
- Historic documentation and cartographies, which are of utmost importance for studying the time development of the urban fabric and its conformation; they are generally available in paper format and can be quickly implemented in GIS as raster files after simple operations of scanning and conversion; 
- Aerial photos and base maps, which allow for a realistic view of the area analyzed, e.g., orthophoto raster files and maps published online as a standard Web Map Service (WMS);  
- Regional Technical Cartography (CTR), which consists of vector files composed of polygons representative of UBs with a set of attributes, including the typology of construction (civil building, dilapidated building, building under construction, and underground building), area, and height; 
- The digital terrain model (DTM) and digital surface model (DSM), which are raster files that represent the elevation of the terrain, respectively, without and with elements such as vegetation and other artefacts, and allow one to derive the real height of each building through a simple operation of subtraction; 
- A cadastral map, available in digital format and directly readable in GIS or in WMS standard, in which the polygons represent the IB; 
- Technical project documentations; if available, they contain different types of detailed information about IB and their components; 
- The ReLUIS CARTIS catalogue [40], which represents an additional fundamental dataset, containing information about the typological and structural features of different building classes identified within homogeneous urban districts, collected by compiling forms that have been subsequently digitized. 
This multisource information has been managed and integrated in a GIS environment, using the QGIS open-source geographic information system software [73]. Generally, the datasets in the form of vector files, in which different attributes are associated with polygons representative of a part of the built space, can be directly implemented in a GIS environment, whereas the datasets available in paper form or in other digital formats are not directly readable in GIS and require simple digitization or conversion operations.
Different ME of information have been defined for the different datasets, depending on the scale of representation of polygons, as follows:
- ISTAT dataset: CS, which is a sector of municipal territory including a set number of buildings; 
- CTR: UB, which is a group of contiguous buildings; 
- Cadastral Building Map: IB, which represents a single building. 
The three datasets are listed in order of decreasing size of the relative ME; this means that within each CS there are one or more UBs and, consequently, within each UB there are one or more IBs. In Table 1, for each of the abovementioned datasets, excerpts of the vector file implemented in QGIS are shown, highlighting one of the polygons representing the corresponding ME, together with the relative data format and attributes. For example, in the excerpt of the ISTAT dataset a polygon of a specific CS is shown, for which the information is given in aggregate form as a number of buildings by structural typology, class of “age of construction”, number of floors, and maintenance state. Subsequently, in the excerpt of the CTR, one of the UBs within the CS is highlighted and, finally, in the excerpt of the Cadastral Building Map, an IB belonging to the UB is highlighted. The procedure, therefore, first performs an integration of data among the various MEs, allowing one to fill in the information gaps at each level of detail. More specifically, for each CS, the information contained in the ISTAT datasets regarding the structural typology, age of construction, number of floors, and maintenance state is processed and then associated with each UB of the CTR included in the specific CS. At the level of IB, the geometrical information of area is derived directly from the Cadastral Building Map, and height is obtained via a subtraction between DSM and DTM. Finally, by overlapping the georeferenced vector layers with aerial photos and WMS base maps, a check is performed that involves correcting or adding polygons and related information if required.
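The height derivation described above can be sketched as follows. This is only a minimal illustration, assuming that the DSM and DTM have already been resampled to the same grid and that each IB footprint has been rasterized to a list of (row, column) cells; the function name `building_height`, the toy elevation values, and the choice of the median as the summary statistic over the footprint are assumptions made for the example and are not taken from the procedure itself.

```python
from statistics import median


def building_height(dsm, dtm, footprint_cells):
    """Estimate a building height (m) as the median of DSM - DTM over its footprint.

    dsm, dtm: 2D lists of elevations on the same grid (alignment is assumed).
    footprint_cells: iterable of (row, col) cells covered by the building polygon.
    """
    diffs = [dsm[r][c] - dtm[r][c] for r, c in footprint_cells]
    if not diffs:
        raise ValueError("footprint covers no raster cells")
    # Clamp at zero: small negative differences can arise from raster noise.
    return max(0.0, median(diffs))


# Toy 3x3 grids with hypothetical elevations (metres above sea level).
dtm = [[10.0, 10.0, 10.0],
       [10.0, 10.0, 10.0],
       [10.0, 10.0, 10.0]]
dsm = [[10.0, 10.0, 10.0],
       [10.0, 16.5, 16.0],
       [10.0, 16.5, 17.0]]

# Hypothetical footprint occupying the lower-right 2x2 cells.
height = building_height(dsm, dtm, [(1, 1), (1, 2), (2, 1), (2, 2)])
print(height)
```

In a GIS workflow the same subtraction would typically be performed on the raster layers themselves and then summarized per polygon; the median is used here only because it is less sensitive than the mean to cells that straddle the footprint edge.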
The time needed for the implementation in GIS and the construction of the GBD depends mainly on the number of data sources that require a format conversion; in particular, all the information available in paper form, such as historic documentation, cartographies, and paper forms, should be digitized, imported, and georeferenced in a GIS environment. However, after these preliminary operations, the integration of all the data takes very little time: it consists of simple operations of overlap and association of the different information layers in QGIS [73] and is carried out in a few seconds for the whole area analyzed, using functions and plugins available in GIS software.
The result of all these operations is a GBD in which the information is structured according to three different layers, each related to the pertinent scale of representation: the first contains the information at the scale of the CS, the second at the scale of UB, and the third at the scale of IB. By selecting the appropriate level, different types of assessment procedures can be implemented easily and automatically for the different ME.
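The three-layer organization of the GBD can be pictured with a small data-structure sketch. It is a hedged illustration only: the class names (`CensusSection`, `UrbanBlock`, `IndividualBuilding`, `GBD`), the field names, the identifiers, and the numerical values are all hypothetical, and the parent references simply encode the containment of IBs in UBs and of UBs in CSs described in the text. The aggregation of IB-level indices to the UB level by a plain mean is likewise an assumption for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CensusSection:
    cs_id: str
    buildings_by_typology: Dict[str, int]  # aggregate ISTAT-style counts


@dataclass
class UrbanBlock:
    ub_id: str
    cs_id: str  # parent census section


@dataclass
class IndividualBuilding:
    ib_id: str
    ub_id: str  # parent urban block
    area_m2: float
    height_m: float
    vulnerability_index: Optional[float] = None  # filled in after assessment


class GBD:
    """Toy georeferenced buildings database with CS, UB, and IB layers."""

    def __init__(self, sections, blocks, buildings):
        self.cs = {s.cs_id: s for s in sections}
        self.ub = {b.ub_id: b for b in blocks}
        self.ib = {b.ib_id: b for b in buildings}

    def buildings_in_cs(self, cs_id):
        """Return the IBs whose parent UB belongs to the given CS."""
        return [b for b in self.ib.values() if self.ub[b.ub_id].cs_id == cs_id]

    def mean_index_by_ub(self):
        """Aggregate IB-level indices to the UB layer (mean of assessed IBs only)."""
        values = {}
        for b in self.ib.values():
            if b.vulnerability_index is not None:
                values.setdefault(b.ub_id, []).append(b.vulnerability_index)
        return {ub_id: sum(v) / len(v) for ub_id, v in values.items()}


gbd = GBD(
    sections=[CensusSection("CS1", {"masonry": 2, "reinforced concrete": 1})],
    blocks=[UrbanBlock("UB1", "CS1"), UrbanBlock("UB2", "CS1")],
    buildings=[
        IndividualBuilding("IB1", "UB1", 120.0, 6.5, 40.0),
        IndividualBuilding("IB2", "UB1", 95.0, 9.0, 20.0),
        IndividualBuilding("IB3", "UB2", 200.0, 12.0),  # not yet assessed
    ],
)

ids = sorted(b.ib_id for b in gbd.buildings_in_cs("CS1"))
ub_means = gbd.mean_index_by_ub()
print(ids, ub_means)
```

Keeping an explicit parent key on each record mirrors the attribute joins that a GIS performs between layers, so that a result computed for one ME can be queried or summarized at a coarser one.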
  3.2. Definition of Homogeneous Urban Sectors and Typology Building Classes
The definition of homogeneous urban sectors and typology building classes is performed according to the CARTIS procedure [
40]. CARTIS forms are filled in by an expert with the support of local technicians, and the data collected regard the recurring structural and typological features (i.e., the identification of building classes) within homogeneous areas of the municipality. In the proposed procedure, the information provided by CARTIS forms is integrated and verified with the data of GBD, in order to obtain more reliable information. The general structure of the CARTIS form is composed of four sections:
- Section 0: the delimitation of homogeneous urban sectors; 
- Section 1: the identification of prevailing buildings’ typological classes for each urban sector; 
- Section 2: the identification of the general characteristics of each building typological class; 
- Section 3: the characterization of the structural elements of each building typological class, which are closely linked to the seismic behavior of the building type. 
A homogeneous urban sector is a part of a municipality characterized by consistency in building fabric, the same age of first construction, and the same prevalent structural and typological features [
40]. The definition of the different homogeneous parts of a municipality can be carried out on the basis of the information contained in the GBD. The overlap of different historical cartographies allows for a preliminary analysis of the historical development of the urban territory. Then, using the information associated with each CS by the ISTAT datasets, it is possible to analyze the distribution of features such as the structural materials, age of construction, number of floors, and maintenance state for the CSs of the whole municipality, obtaining a reliable delimitation of the homogeneous urban sectors and the relative aggregate data. For each homogeneous urban sector, a preliminary identification of typological structural building classes is performed on the basis of this elaborated information. Then, a more detailed characterization of the typological classes is carried out by compiling the CARTIS forms with the information about recurrent geometrical, structural, and morphological features supplied by the local expert technicians. Consequently, these data are associated with each IB by matching the area and height, derived from the Cadastral Building Map and from the DTM and DSM, with the geometrical data on mean floor area and number of stories contained in Section 2 of the CARTIS form, thereby recognizing the corresponding typological class and the pertinent CARTIS typological and structural information.
  3.3. Verification of Building Classes
The information for each IB located in a specific GBD is obtained by classifying the building into the pertinent CARTIS typological structural class, on the basis of the attributes area and height, which are associated with the corresponding polygon within the Cadastral building map. 
A validation of the reliability of information thus obtained for each IB has been performed by analyzing, in detail, a sample composed of similar buildings located close to each other and possibly in the same homogeneous urban sector and, hence, belonging to the same typological classes. 
Building-by-building surveys, using the CARTIS form, have been carried out for each building of the sample, by means of specific in situ inspections and detailed sources of information such as existing technical documentation. The resulting structural and typological features of each building have been compared with those of the typological class previously associated with it, verifying the reliability of the information of some typological classes and the accuracy of the classification implemented by the procedure.
  3.4. Seismic Vulnerability Assessment
The demonstration of the procedure has been completed with the application of a seismic vulnerability assessment, by using a simplified version of the procedure proposed in [
65,
66], which has been implemented in the GBD and is automatically performed. 
It is worth noting that this is only an example of the potential of the platform, which could be equipped with any algorithm for vulnerability assessment based on the data provided by the procedure. 
In the application, a seismic vulnerability index $V$ has been quickly calculated for each ME within the municipality according to the following equation:
$$V = V^{*} + \Delta V_m$$
where $V^{*}$ is the typological seismic vulnerability index proposed in [66,67] and $\Delta V_m$ represents the modifying coefficients depending on different factors. The values of $V^{*}$ and $\Delta V_m$ vary according to the structural material and other parameters; in any case, the result in terms of the seismic vulnerability index $V$ must be between −0.02, corresponding to the maximum seismic vulnerability, and 1.02, corresponding to the minimum seismic vulnerability. As reported in Table 2, the value of $V^{*}$ for masonry buildings depends on classes based on the age of construction, assuming a decreasing value for newer buildings. In Table 3, the values of $V^{*}$ for reinforced concrete buildings are reported; they decrease according to three seismic design level categories: absent, low, and medium. In this case, some assumptions need to be made about the data concerning the age of construction with the evolution of technical standards and Italian seismic classification. Modifying coefficients $\Delta V_m$ take into account the contribution of other features that influence the seismic behavior, listed in Table 4 and Table 5, respectively, for masonry buildings and reinforced concrete buildings with reference to the classes previously defined: in both cases, the parameters considered are the state of maintenance and the number of floors. In addition, for the masonry buildings, the position in aggregate is evaluated; hence, according to this approach, a negative value of $\Delta V_m$ is associated with a worse condition and a positive value of $\Delta V_m$ with a better condition. The results, together with all information, are stored in the GBD as attributes associated with the corresponding ME. In this way, it is possible to elaborate interactive 2D and 3D maps of the city within the GIS environment.
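As a rough illustration of the calculation just described, the sketch below assumes the index is the typological index plus the sum of the applicable modifying coefficients, and keeps the result within the admissible range of −0.02 to 1.02 by clipping. Both the additive form and the clipping are assumptions of this sketch. The numerical inputs are placeholders, not the values of Tables 2 to 5.

```python
V_MIN, V_MAX = -0.02, 1.02  # admissible range of the index stated in the text


def vulnerability_index(typological_index, modifiers):
    """Typological index plus modifying coefficients, clipped to the range.

    typological_index: value assigned to the building class (placeholder).
    modifiers: iterable of coefficients, e.g. for maintenance state,
               number of floors and, for masonry, position in aggregate.
    Clipping is an assumption; the text only says the result must lie
    within the range.
    """
    value = typological_index + sum(modifiers)
    return min(max(value, V_MIN), V_MAX)


# Placeholder inputs, NOT taken from the paper's tables.
v = vulnerability_index(0.45, [-0.04, 0.02])
print(round(v, 2))
```

Because the function only needs attributes already stored for each ME, it can be applied row by row to the attribute table of any of the three layers.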
It should be pointed out that the information structure of the GBD can easily be integrated with any further information about existing buildings, as well as retrofitted and new buildings, allowing for the implementation of any type of seismic assessment procedure on a large scale, including the construction and application of fragility functions using empirical or analytical procedures. Indeed, in the case of the empirical method [15], the data collected in the GBD can be used directly; for the elaboration of analytical fragility curves [75], instead, the statistical elaboration of the same data could allow for the definition of mechanical-analytical models representative of large sets of buildings.
  4. Application to the Case Study
The procedure has been applied to Bisceglie, a city located in Puglia, with a population of about 60,000 inhabitants and a territory of about 70 km². According to ISTAT data [2], the existing building stock occupies 10% of the entire Bisceglie territory and consists of about 5000 residential buildings, of which about 30% have a masonry structure and 60% a reinforced concrete structure. Regarding the age of construction, 13% were built before 1919; 8% between 1919 and 1946; 7% between 1946 and 1960; 12% between 1961 and 1970; 19% between 1971 and 1980; 21% between 1981 and 1990; 11% between 1991 and 2000; 6% between 2001 and 2005; and 3% after 2005 [29]. From these data, it is observed that about 59% of the buildings were constructed before 1981, the year of the first seismic classification of Bisceglie. 
The available dataset is derived from different types of sources and information, including:
A first analysis of the historic development of the urban fabric has been performed using the available historic cartographies and maps, obtaining a preliminary subdivision of the urban areas. A more accurate division of the urban territory has been derived from the ISTAT dataset by calculating, for each CS, the prevailing number of buildings by structural typology, class of age of construction, number of floors, and maintenance state, thus refining the definition of the homogeneous urban sectors, which are composed of CSs characterized by the same prevailing features.
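The grouping step described above can be sketched as follows. The example assumes that, for each CS, the ISTAT data are available as building counts per category, and that CSs are grouped purely by their tuple of prevailing categories. Spatial contiguity and the historic cartography, which the procedure also relies on, are not modelled here. All identifiers and counts are hypothetical.

```python
from collections import defaultdict


def prevailing(counts):
    """Return the category with the largest building count."""
    return max(counts, key=counts.get)


def group_sections(sections):
    """Group census sections sharing the same prevailing features.

    sections: {cs_id: {feature_name: {category: building_count}}}
    Returns {tuple_of_prevailing_categories: [cs_id, ...]}, with the tuple
    ordered by feature name.
    """
    groups = defaultdict(list)
    for cs_id, features in sections.items():
        key = tuple(prevailing(features[name]) for name in sorted(features))
        groups[key].append(cs_id)
    return dict(groups)


# Hypothetical counts for three census sections.
sections = {
    "CS_A": {"structure": {"masonry": 40, "rc": 5},
             "age": {"<1919": 30, "1961-1970": 15}},
    "CS_B": {"structure": {"masonry": 22, "rc": 10},
             "age": {"<1919": 20, "1961-1970": 12}},
    "CS_C": {"structure": {"masonry": 3, "rc": 50},
             "age": {"<1919": 0, "1961-1970": 53}},
}
groups = group_sections(sections)
print(groups)
```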
Then, different typological classes have been defined for each urban sector based on available information in spatial georeferenced datasets and with the support of local expert technicians. For each building class, geometrical, structural, and technological characteristics; the possible damage and degradation state; and maintenance condition information have been collected in the CARTIS form [
40].
The GBD has been constructed by following a top-down procedure: each UB of the CTR inherits the ISTAT data of the corresponding CS concerning structural typology, age of construction, and maintenance state. The area of each IB is available as an attribute of the polygons in the cadastral building maps, and the height of each IB has been obtained by subtracting the DTM from the DSM. In this way, it has been possible to associate the typological class, and the relative typological and structural features derived from the CARTIS catalogue, with each IB of the cadastral building map shapefile, by comparing the area and height of the corresponding polygons with the geometrical data on mean floor area and number of stories contained in Section 2 of the CARTIS form. 
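The top-down inheritance can be illustrated with plain dictionaries. The sketch assumes that each UB already carries the identifier of the CS that contains it (in a GIS this link would come from a spatial join), and the field names are hypothetical rather than those of the actual layers.

```python
INHERITED_FIELDS = ("structural_typology", "age_of_construction",
                    "maintenance_state")


def inherit_cs_attributes(census_sections, urban_blocks):
    """Copy the aggregate CS attributes onto every UB contained in that CS.

    census_sections: {cs_id: {field: value}}
    urban_blocks: list of dicts, each with at least "ub_id" and "cs_id".
    Returns new dicts; the inputs are left unchanged.
    """
    enriched = []
    for ub in urban_blocks:
        cs = census_sections[ub["cs_id"]]
        enriched.append({**ub, **{f: cs[f] for f in INHERITED_FIELDS}})
    return enriched


# Hypothetical records.
census_sections = {
    130: {"structural_typology": "rc", "age_of_construction": "1961-1970",
          "maintenance_state": "good"},
}
urban_blocks = [{"ub_id": "UB-1", "cs_id": 130}, {"ub_id": "UB-2", "cs_id": 130}]

enriched_blocks = inherit_cs_attributes(census_sections, urban_blocks)
print(enriched_blocks[0])
```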
The accuracy and reliability of the approach have been validated through a detailed verification of the characteristics of some building classes, by filling in CARTIS forms for a sample of buildings chosen so as to include the highest possible number of buildings belonging to the same building class. For each building of the sample, a building-by-building survey has been performed by collecting information using the CARTIS form; the required level of detail was obtained from the technical documentation made available by the technical department of the municipality or from in situ surveys specifically performed. These data have been compared with the information of the building class previously associated with each building. In particular, the comparison has been carried out on a few simple parameters: age of construction, total number of floors, underground floor, and mean floor surface. Where the characteristics correspond, the building classification has been confirmed, whereas where they do not, the data of the building included in the GBD have been corrected.
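The comparison on the four parameters listed above might be sketched as below. The relative tolerance on the mean floor surface is an assumption of this sketch (the text does not state how close the values must be), and the field names and sample records are hypothetical.

```python
def matches_class(surveyed, typological_class, area_tolerance=0.20):
    """Check a surveyed building against its assigned typological class.

    Compares age class, total number of floors, presence of an underground
    floor, and mean floor surface (within a relative tolerance, assumed).
    """
    ref_area = typological_class["mean_floor_area"]
    return (
        surveyed["age_class"] == typological_class["age_class"]
        and surveyed["floors"] == typological_class["floors"]
        and surveyed["underground_floor"] == typological_class["underground_floor"]
        and abs(surveyed["mean_floor_area"] - ref_area) <= area_tolerance * ref_area
    )


def correspondence_rate(pairs):
    """Percentage of (surveyed, class) pairs that correspond."""
    hits = sum(matches_class(s, c) for s, c in pairs)
    return 100.0 * hits / len(pairs)


# Hypothetical class and two surveyed buildings.
rc_class = {"age_class": "1961-1970", "floors": 6,
            "underground_floor": True, "mean_floor_area": 300.0}
ok = {"age_class": "1961-1970", "floors": 6,
      "underground_floor": True, "mean_floor_area": 280.0}
bad = {"age_class": "1981-1990", "floors": 4,
       "underground_floor": False, "mean_floor_area": 150.0}

rate = correspondence_rate([(ok, rc_class), (bad, rc_class)])
print(rate)
```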
The subsequent seismic vulnerability assessment can be carried out in the GBD automatically; in particular, for the scope of the present work, the reference unit of the procedure is the IB, and the available data allow one to calculate the seismic vulnerability index $V$ for each ME using the simplified procedure mentioned above, obtaining an index $V$ for 3726 IBs of the cadastral building map. The final results are stored in the 3D georeferenced model and can be mapped in 2D and 3D form. This becomes a powerful interactive tool for knowledge, characterization, and analysis on an urban scale, with an associated database that is easily implementable, updatable, and searchable.
  5. Application to the City of Bisceglie and Discussion of the Results
Using historical cartographies, the time development of the urban fabric has been analyzed. The historic center, located on the coast and clearly delimited by boundary walls, is the medieval part of the city; its foundation dates back to about 1000 AD, and the first expansions toward the inland and the west coast started in 1920. The second expansion area and the touristic zone are located, respectively, to the east and to the west of the city center; they date back to 1950. The modern part of the city covers the inland area and the east coast, and its first buildings were constructed around 1970. After these preliminary considerations based on historical information, eight homogeneous urban sectors have been defined using the ISTAT dataset [
2]; their boundaries are illustrated in 
Figure 2 with the relative denominations, defined as follows: 
- C01—historic center; 
- C02—first expansion; 
- C03—second expansion; 
- C04—third expansion east; 
- C05—fourth expansion east; 
- C06—third expansion west; 
- C07—fourth expansion south; 
- C08—touristic expansion. 
For each homogeneous urban sector, the distribution of buildings by structural material, age of construction, number of floors, and maintenance state has been analyzed, as reported in Figure 3. Figure 3a shows that the highest percentages of buildings with a masonry structure are concentrated in the oldest homogeneous urban sectors, C01 (historic center) and C02 (first expansion); in the newest homogeneous urban sectors, instead, the majority of buildings have a reinforced concrete structure, as expected. With regard to the age of construction (Figure 3b), the prevailing percentage of the buildings in C01 dates back to before 1919; in C02, more than half of the buildings were built before 1919 (33%) or between 1961 and 1970 (20%); within C03 and C04, more than 50% of the buildings were built between 1961 and 1990; a significant percentage of the buildings in C05 were built between 2001 and 2005; within C06, the buildings are uniformly distributed among all the classes of age of construction; and, finally, in C07 and C08, most of the buildings were constructed between 1981 and 1990. Therefore, for all the urban sectors, the distributions are consistent with the age of first construction. Figure 3c shows that the distribution in terms of the number of floors is strictly linked to the location of the homogeneous urban sector within the urban context and to the time development of the urban fabric; indeed, in C01 and C02, there are large shares of 2-, 3-, and 4-storey buildings, while C06 and C08, which cover the west coast, mainly have 1- and 2-storey buildings; finally, there is a high percentage of buildings with 4 or more floors within C03, C04, C05, and C07, located inland and to the east of the city center. Figure 3d shows a similar trend for the maintenance state: in C01 and C02, most of the buildings are in a good or mediocre maintenance state; within C06, C07, and C08, many of the buildings are in a good maintenance state; and within C03, C04, and C05, which cover the modern part of the city, the buildings are mainly in an excellent or good maintenance state.
The subsequent step has been the definition of the different typological-structural building classes compiling the CARTIS form [
40]. 
Table 6 reports all the identified classes and the relative percentages for each homogeneous urban sector; the classes are named with the letter M for masonry or RC for reinforced concrete, followed by a progressive number. 
The data have been integrated into the GBD according to the procedure. ISTAT data about structural materials, age of construction, number of floors, and maintenance state have been analyzed for all the CSs of Bisceglie and then associated with the UBs of the CTR. Each IB has been classified according to the CARTIS typological classes by matching the area and height of its polygon in the cadastral map with the data of the CARTIS catalogue. The polygons inherit all the structural and typological features of the corresponding class.
The subsequent validation of the accuracy and reliability of the approach has been carried out through a building-by-building survey on a sample of 33 buildings in 3 different CSs, as shown in Figure 4. As illustrated in more detail in Figure 5, 21 buildings are included in CS #130, which lies entirely within the urban sector C02 (Figure 5a); 5 buildings are included in the part of CS #93 belonging to the urban sector C02 (Figure 5b); 3 buildings are in the part of CS #93 belonging to the urban sector C06; and 4 buildings are located in CS #405, in the urban sector C02 (Figure 5c). The CARTIS form [40] has been filled in for each building of the sample using detailed information derived from in situ surveys and technical documentation supplied by the technical offices; these data have then been compared with those of the corresponding typological class. The comparison has shown that, in the whole sample, only 37% of the buildings correspond to the characteristics of the assigned class. At the level of the single census section, in CS #130, where the largest number of buildings was analyzed, 43% correspond; in CS #93, only 28% of the sample corresponds; and in CS #405, for which only 4 buildings were analyzed, no building corresponds. At the level of the urban sector, in C02 there is correspondence in only 37% of cases, while in C06 all 3 buildings of the sample correspond. 
For the sake of clarity, the RC typological classes of the homogeneous urban sector C02 are listed in Table 7 with the relative number of stories, mean interstorey height, and mean floor area extracted from Section 2 of the CARTIS form. The polygon from the shapefile of the cadastral map and a relative image are reported in Table 8 for four buildings investigated by means of the building-by-building survey in CS #130, which falls in the homogeneous urban sector C02. For each building, the information is accompanied by the relative ID and by the attributes area and height stored in the GBD, derived, respectively, from the cadastral map shapefile and from the raster file obtained by subtracting the DTM from the DSM. The corresponding RC typological class has been obtained by matching the area of the polygon with the mean floor area of the typological classes of C02, and the number of floors, obtained by dividing the height of the polygon by an interstorey height assumed equal to 3.5 m, with the number of stories of the same typological classes. For example, the IB 9594-G has an area equal to 280.00 m² and a height of 20.5 m, corresponding approximately to six floors, and has consequently been classified as RC1. Subsequently, the reliability of the information of the typological classes has been verified by means of a comparison with the features collected in the building-by-building survey. In this case, for buildings IB 9594-G and IB 9594-I, good correspondence has been found; however, for buildings IB 3538 and IB 7370-A1, low correspondence has been found.
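The matching rule can be sketched as follows, using the 3.5 m interstorey height and the IB 9594-G figures quoted above. The class parameters below are placeholders, not the values of Table 7, and the tie-breaking rule (same number of stories first, then nearest mean floor area) is an assumption of this sketch.

```python
INTERSTOREY_HEIGHT_M = 3.5  # assumed interstorey height stated in the text


def estimate_floors(height_m):
    """Number of floors from the polygon height, rounded to the nearest integer."""
    return max(1, round(height_m / INTERSTOREY_HEIGHT_M))


def classify(area_m2, height_m, classes):
    """Return the class with matching stories and the closest mean floor area.

    Returns None when no class has the estimated number of stories.
    """
    floors = estimate_floors(height_m)
    candidates = [c for c in classes if c["stories"] == floors]
    if not candidates:
        return None
    best = min(candidates, key=lambda c: abs(c["mean_floor_area"] - area_m2))
    return best["name"]


# Placeholder class definitions (hypothetical, not from Table 7).
classes = [
    {"name": "RC1", "stories": 6, "mean_floor_area": 300.0},
    {"name": "RC2", "stories": 4, "mean_floor_area": 250.0},
    {"name": "RC3", "stories": 6, "mean_floor_area": 500.0},
]

# IB 9594-G: area 280.00 m2, height 20.5 m.
floors_9594g = estimate_floors(20.5)
label_9594g = classify(280.0, 20.5, classes)
print(floors_9594g, label_9594g)
```

Returning `None` for unmatched buildings makes it easy to count how many IBs fall outside the existing catalogue of classes.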
The results suggest that the CARTIS typological classes do not cover the whole building stock and should be modified or extended. However, by increasing the number of buildings analyzed, it may be possible to find a match with a typological class. For this reason, it is necessary to establish sampling rules for the verification of typological classes.
Subsequently, the seismic vulnerability index $V$ has been rapidly calculated for 3726 IBs of the cadastral building map, and four classes of vulnerability have been defined: low (L) (0.00 ≤ $V$ ≤ 0.25); medium-low (ML) (0.25 < $V$ ≤ 0.50); medium (M) (0.50 < $V$ ≤ 0.75); and high (H) (0.75 < $V$ ≤ 1.00). Figure 6a shows the results of the seismic vulnerability assessment plotted in a 2D map. The typological classes have similar characteristics, and, considering all the IBs, the value of $V$ varies between 0.29 and 0.56. Specifically, 88.7% of all buildings are characterized by a medium-low seismic vulnerability level, and 11.3% by a medium seismic vulnerability level. In particular, the analysis of the results with regard to the different homogeneous urban sectors (Figure 6b) shows that all the buildings within C05 and C08 (which have the highest percentages of RC buildings) have a medium-low seismic vulnerability. Instead, C01 and C02, which are rather homogeneous and have the highest percentage of masonry buildings, have the worst condition in terms of the seismic vulnerability index.
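The assignment of vulnerability classes and the percentage shares reported above can be reproduced with a short sketch. The thresholds at 0.25, 0.50, and 0.75 are those given in the text; treating every value above 0.75 as high, and every value up to 0.25 (including slightly negative ones) as low, is an assumption of this sketch. The sample indices are hypothetical.

```python
from collections import Counter


def vulnerability_class(v):
    """Map a seismic vulnerability index to L, ML, M or H."""
    if v <= 0.25:
        return "L"
    if v <= 0.50:
        return "ML"
    if v <= 0.75:
        return "M"
    return "H"


def class_shares(indices):
    """Percentage of buildings falling in each vulnerability class."""
    counts = Counter(vulnerability_class(v) for v in indices)
    total = len(indices)
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical indices spanning the range reported for Bisceglie (0.29 to 0.56).
sample_indices = [0.29, 0.35, 0.48, 0.56]
shares = class_shares(sample_indices)
print(shares)
```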
It is worth highlighting that large-scale seismic vulnerability assessment methods are characterized by low resolution and that, generally, the typological classes of the same urban sector present very similar characteristics. Therefore, further analyses of the seismic vulnerability results will be needed to understand the influence of an incorrect classification. 
Finally, the results of the seismic vulnerability assessment have been stored in the GBD and associated with the relative polygon representing each IB, together with all the other information; this operation has allowed for the realization of a 3D georeferenced tool that is easily searchable and dynamically implementable with other information and assessment procedures. Indeed, as displayed in Figure 7, the data and results stored in the GBD can be searched quickly at each level of detail by simply selecting the ME of interest.
Some important considerations should be made about the pros and cons of the proposed method, by means of a comparison with a similar one available in the literature. With this aim, the procedure proposed in [34] has been considered, which focuses on the definition of data-collection strategies to fill the possible lack of consistent data and to characterize the representative building typologies of historical centers, in order to implement, on a GIS platform, a suitable database at the urban level that is useful for subsequent seismic vulnerability assessments. In both procedures, the integration of all available data sources is considered a suitable strategy to extract and derive all the possible information on an urban scale, which is extremely helpful in the definition of representative typologies of the building stock with relative meaningful data. With this aim, the use of GIS tools is essential to manage a large amount of information and build an explanatory database.
Nevertheless, the two proposals take different approaches to refining the information contained in the database. Indeed, in [34] the first phase of data collection is aimed at planning building-by-building on-field surveys to collect very detailed data on a significant sample composed of 111 buildings and, only subsequently, at associating the same information with the other buildings with similar general characteristics. Furthermore, the buildings should be inspected carefully both from inside and outside to optimally fill in the forms and obtain homogeneous and consistent databases, but, as highlighted by the authors of [34], such an optimum scenario is not easily feasible. For this reason, it may be necessary to use the taxonomy derived in the preliminary analysis to extrapolate information for the buildings that are inaccessible or lack information. Therefore, although such an approach allows one to obtain more detailed and reliable information, it requires significant resources in terms of time and costs.
Instead, the procedure proposed in this work builds a knowledge framework of the whole area investigated by exploiting only the available sources and by extracting, integrating, and deriving all the possible information to build a GIS database with different levels of detail, down to the scale of the IB. Obviously, the level of detail achievable is strictly connected to the type of available information sources; in the specific case, it was reached by using the CARTIS catalogue of the structural typological classes, which is, in any case, a simple and fast procedure applicable in any urban context. Therefore, although the data collected are less reliable and accurate, the procedure allows one to minimize time and costs. A database with very detailed information helps to improve the quality and reliability of the large-scale seismic vulnerability assessment but implies the employment of significant resources, and such an approach is often unfeasible, especially when there is a large area to investigate; in this case, a compromise must be found between the reliability and accuracy of the information and the available resources.
  6. Conclusions
The procedure proposed in the present work allows for the extraction, integration, and elaboration of data from different sources to construct a geo-referenced cartographic and descriptive database in a GIS environment, which is the fundamental reference basis for any rapid seismic vulnerability assessment of the existing residential building stock on a large scale. 
The application has demonstrated that it is possible to easily manage different typologies of data and obtain homogeneous information on the urban territory, filling in the possible data deficiencies at the different scales. Moreover, on the basis of such a structured georeferenced database, an indirect method for seismic vulnerability assessment has been implemented in a simple and rapid way, deriving the results for a large number of buildings; other types of large-scale assessment can be implemented as well. 
It is worth remembering that, in order to optimize resources in terms of the time and cost of analysis, this level of procedure is fundamental for providing a preliminary analysis of the current state of the existing building stock and for planning further surveys and more detailed analyses.
Furthermore, the use of a GIS environment allows one to create a tool connected to a relational database that is easily implementable and searchable and can support the management of urban areas. Such a tool is particularly useful as a support for decision makers, enabling them to have a global picture of the risk of a wide territory and use it as a rational basis for programming rehabilitation strategies and risk-mitigation measures. 
An important issue remains, of course: the availability of information and the associated uncertainty, which are often connected to the specific context of analysis. Moreover, the verification procedure performed on a sample of buildings highlights the need to verify and correct the information collected on a large scale to define the building typological classes, through a comparison with the characteristics of a number of buildings, for which gathering the data could be burdensome. 
The “robustness” of the procedure of data extraction and integration from different sources should also be verified; indeed, the procedure should introduce only errors that are insignificant with respect to the resolution of indirect seismic vulnerability assessment procedures. 
Finally, it is important to point out that the present work is part of a broader study on the application of automation and artificial intelligence (AI) methodologies to the design, analysis, monitoring, and risk assessment of structures and infrastructures; in particular, further development will concern the realization of an automated procedure that is able to check and correct data through a comparison between information on different scales, in order to obtain data with lower uncertainty and higher reliability.