Bias in the Estimation of Seismic Risk for Municipal Building Stocks due to Limited Data

This study investigated the effect of the building data knowledge level on seismic risk estimation for municipal building stocks, focusing on identifying the characteristics that influence loss estimation bias. Fifteen municipalities in two Slovenian regions were analysed using twelve building data knowledge levels, defined by combining different knowledge levels about building location and floor area. The knowledge levels ranged from those using data aggregated at the municipality level to those using building-specific data. The bias was quantified as the log residual between the expected annual losses estimated for the given knowledge level and the base-case level, characterised by building-specific data. The results indicate that loss estimation bias is affected by both the building location and floor-area knowledge levels. The data on building density distribution across the municipality and building-class-specific floor areas are sufficient for estimating loss with low bias with respect to the base-case level. The effect of potential data improvement on bias reduction can be assessed using building stock homogeneity and hazard variation indexes determined from readily available data. Further research is needed to explore loss estimation bias for building data knowledge levels not considered in this study and generalise the concepts to other regions and building classifications.


Introduction
Information about the seismic risk to building stocks at the level of administrative units such as municipalities supports decision making about strengthening the community against earthquakes. Because of the numerous buildings considered in such risk estimation, the buildings are typically described by a few essential characteristics needed to construct the building stock exposure model considered in risk assessment (e.g., [1][2][3]). These characteristics of buildings can be defined at different knowledge levels. Building-specific data, accounted for in some studies (e.g., [3,4]), represent a relatively high building stock knowledge level. However, the knowledge available about the building stock is usually less comprehensive. It is much more common that only aggregated building data are readily available for use in seismic risk assessment (e.g., [5]). In such cases, additional bias can be introduced into the seismic risk estimation.
The effect of the building data knowledge level on seismic risk assessment has been investigated in several studies. Bal et al. [6] studied the effect of exposure data and ground-motion field resolution on event loss for an idealised city and a region in Turkey. They found that a crude resolution level does not imply a significantly biased result. A similar study by Dabbeek et al. [7] explored the effect of exposure data resolution on the expected annual loss (EAL) at the national and sub-national levels. In their study, the average bias at the national level was estimated at 27% with respect to the base-case results.
Kalakonas et al. [8] investigated the effects of various exposure model resolutions on risk estimation for Guatemala. They concluded that a coarse resolution at an urban level should be refined to avoid biased results. The importance of the geographical scale used to represent the building exposure model was also highlighted by Douglas [9] and Ordaz et al. [10]. Bazzurro and Park [11] studied the effect of location data aggregation for a selected building type in California. They found that aggregating data affected the loss curve more than the EAL. Senouci et al. [12] explored different building aggregation options and their effects on the uncertainties of event loss estimates. Sanderson and Cox [13] compared the losses estimated using national and local building inventories and found that using the national inventory underpredicted the losses aggregated at the city level. Some studies have focused on the feasibility of developing fragility and vulnerability models using limited data. Basic fragility and vulnerability models can be established using information from freely available databases (e.g., [14,15]), and additional data obtained using an interview-based approach (e.g., [16]) can further improve such models (e.g., [17][18][19][20]). Methods for incorporating new data into risk estimation [21,22] and improving the accuracy of risk estimation based on a given level of data [23][24][25][26] have also been developed.
Previous studies have shown that the bias in seismic risk estimation can be reduced by upgrading the building data knowledge level. However, collecting additional data is a demanding process that requires resources and is thus not always justified. The decision about whether upgrading the building data knowledge level is worth the effort requires an understanding of the effect of new data on the accuracy of risk estimation. Moreover, new data can affect bias reduction differently for different administrative units. Therefore, an understanding of which characteristics of administrative units are indicative of the degree of bias reduction attainable with new data is necessary for additional data collection to be effective.
The current study addressed the problems described by investigating the effect of gradually upgrading the building data knowledge level on seismic risk estimation, with the ultimate goals of (1) identifying which gradual improvements to the knowledge level significantly contribute to bias reduction and (2) developing indexes based on readily available data that can be used to estimate in advance the degree of bias reduction achievable by introducing new data. Fifteen municipalities in two Slovenian regions were analysed using each municipality's EAL as the risk indicator. The building data knowledge levels were defined by combining different knowledge levels about building location (i.e., the location knowledge level) and floor area (i.e., the floor-area knowledge level). The location knowledge level affects the EAL because of the potential bias in estimating the seismic hazard. The floor-area knowledge level determines the accuracy of defining the floor area of each building, which affects the outcome of the consequence assessment for the selected risk indicator. Section 2 describes the seismic risk estimation procedure used in this study. Section 3 presents the building stock data for the investigated municipalities. Section 4 presents the method for determining the bias in seismic risk estimation for different building data knowledge levels. The method considers twelve building data knowledge levels, with the base-case level defined by building-specific data. The bias results are presented in Section 5, and the municipal characteristics affecting the bias are identified in Section 6.

Estimation of Seismic Risk for Municipal Building Stocks
The seismic risk for municipal building stocks was quantified in this study using the EAL as a measure of the direct economic loss resulting from physical damage to buildings. The EAL at the municipality level was obtained by summing the EALs of the individual buildings in the municipality. For the base-case knowledge level, the EALs of individual buildings were estimated using the available building-specific data. However, for lower knowledge levels, building data were adjusted to be consistent with the assumptions at that knowledge level. The assumptions for each knowledge level considered in this study are presented in Section 4.
The general methodology used for the estimation of EAL at the building level comprises (1) hazard assessment, (2) damage assessment and (3) consequence (loss) assessment and can be divided into the following sub-steps:
- Step 1: Assessment of seismic hazard in terms of seismic hazard curves $H_i(im)$, each representing the mean annual frequency of exceedance of a given intensity level ($im$) at the location of the $i$th building.
- Step 2a: Assessment of seismic fragility curves for the designated damage states of the $i$th building.
- Step 2b: Estimation of the mean annual frequencies of exceedance of the designated damage states $\lambda_i(DS \ge ds_d)$, where $d$ is the index of the damage state.
- Step 2c: Estimation of the mean annual frequencies of occurrence of the designated damage states $\lambda_i(DS = ds_d)$. For the most severe damage state, $\lambda_i(DS = ds_d)$ is equal to $\lambda_i(DS \ge ds_d)$. For other damage states, $\lambda_i(DS = ds_d)$ is equal to the difference between $\lambda_i(DS \ge ds_d)$ and $\lambda_i(DS \ge ds_{d+1})$.
- Step 3a: Estimation of the expected economic losses given the occurrences of the designated damage states $E(L_i \mid DS = ds_d)$.
- Step 3b: Calculation of EAL by summing the products of $\lambda_i(DS = ds_d)$ and $E(L_i \mid DS = ds_d)$ over all designated damage states.
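Once the exceedance frequencies of the damage states are known, Steps 2c to 3b reduce to simple arithmetic. The following Python sketch illustrates them under stated assumptions: the function names are hypothetical, and the exceedance frequencies and loss values are placeholders chosen for illustration, not results of this study.

```python
def occurrence_frequencies(exceedance):
    """Step 2c: convert mean annual frequencies of exceedance of the damage
    states (ordered from least to most severe) into frequencies of occurrence."""
    occurrence = []
    for d, freq in enumerate(exceedance):
        if d == len(exceedance) - 1:
            # Most severe damage state: occurrence equals exceedance.
            occurrence.append(freq)
        else:
            # Other states: difference between consecutive exceedance frequencies.
            occurrence.append(freq - exceedance[d + 1])
    return occurrence


def expected_annual_loss(occurrence, losses):
    """Step 3b: sum of occurrence frequency times expected loss per damage state."""
    return sum(f * l for f, l in zip(occurrence, losses))


# Hypothetical exceedance frequencies (1/year) for four damage states
exceedance = [4e-3, 1e-3, 2e-4, 5e-5]
# Hypothetical expected losses (EUR) given each damage state (Step 3a)
losses = [2_000.0, 10_000.0, 40_000.0, 100_000.0]

occ = occurrence_frequencies(exceedance)
eal = expected_annual_loss(occ, losses)
```

In this sketch, the occurrence frequencies sum to the exceedance frequency of the least severe damage state, which is a quick consistency check on Step 2c.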
In the current study, the first step was performed by using the peak ground acceleration (PGA) as the ground-motion intensity measure. To assess the seismic hazard curves, the municipalities were first divided into grid cells with an approximate size of 0.5 km × 0.5 km. The centre of each cell represented the calculation point for the assessment of the hazard curve, which was then applied to all buildings within the cell. The hazard curves, $H_i(pga)$, were defined as follows:
$$H_i(pga) = \lambda_i(PGA > pga) \tag{1}$$
where $\lambda_i(PGA > pga)$ is the mean annual frequency of the PGA exceeding the designated value, $pga$, for the grid cell containing the $i$th building. The hazard curves were calculated for soil type A [27] using the OpenQuake engine [28,29] and the ESHM2020 model [30]. For other soil types, the PGA values from the hazard curves for soil type A were transformed based on the soil factors proposed in the draft of the new Eurocode 8 [31], as in a previous study [32].
In the second step (damage assessment), five damage states were considered, consistent with the HAZUS methodology [33]: no-to-minor damage (DS0), slight damage (DS1), moderate damage (DS2), extensive damage (DS3) and complete damage (DS4). A detailed description of damage associated with each damage state can be found elsewhere [33]. The mean annual frequency of exceedance of the $d$th damage state for the $i$th building, $\lambda_i(DS \ge ds_d)$, was calculated as follows:
$$\lambda_i(DS \ge ds_d) = \int_0^{\infty} P(DS \ge ds_d \mid PGA = pga) \left| \frac{dH_i(pga)}{d(pga)} \right| d(pga) \tag{2}$$
where $P(DS \ge ds_d \mid PGA = pga)$ is the seismic fragility function of the $i$th building for the $d$th damage state, defined as the conditional probability of exceeding the $d$th damage state given the PGA, and $\left| dH_i(pga)/d(pga) \right|$ is the absolute value of the derivative of the seismic hazard curve at the location of the $i$th building. The fragility functions were defined at the building class level, as is typical in large-scale seismic risk assessment (e.g., [3,33,34]). Twenty building classes were defined based on the load-bearing structure material, construction period and number of storeys (Table 1), as in previous studies [3,4].
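The integral for the mean annual frequency of exceedance of a damage state is usually evaluated numerically. The sketch below shows one possible discretisation in Python, using only the standard library. The power-law hazard curve, its parameters `k0` and `k`, the fragility parameters and all function names are assumptions made for illustration; they are not the ESHM2020 hazard curves or the fragility model used in this study.

```python
import math


def lognormal_cdf(x, median, beta):
    """Lognormal fragility: probability of exceeding a damage state given PGA = x."""
    return 0.5 * (1.0 + math.erf(math.log(x / median) / (beta * math.sqrt(2.0))))


def hazard_curve(pga, k0=1e-4, k=2.5):
    """Hypothetical power-law hazard curve (an assumption for illustration)."""
    return k0 * pga ** (-k)


def damage_state_exceedance_frequency(median, beta, pga_min=1e-3, pga_max=10.0, n=2000):
    """Approximate the integral of fragility times |dH/dpga| over PGA.

    The PGA range is split into n log-spaced intervals. In each interval, the
    fragility at the geometric midpoint is multiplied by the drop in the
    hazard curve across the interval.
    """
    ratio = (pga_max / pga_min) ** (1.0 / n)
    total = 0.0
    lo = pga_min
    for _ in range(n):
        hi = lo * ratio
        mid = math.sqrt(lo * hi)
        total += lognormal_cdf(mid, median, beta) * (hazard_curve(lo) - hazard_curve(hi))
        lo = hi
    return total


# Hypothetical fragility parameters: median PGA of 0.3 g, lognormal std of 0.5
lam = damage_state_exceedance_frequency(median=0.3, beta=0.5)
```

For a power-law hazard curve combined with a lognormal fragility function, the integral also has a closed-form solution, k0 · median^(−k) · exp(k²β²/2), which offers a convenient check on the discretisation.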
In terms of the load-bearing structure material, a distinction between masonry (brick or stone) and reinforced concrete buildings was made since the vast majority of buildings in Slovenia are made of these materials. All other load-bearing structure materials (i.e., steel and timber buildings and buildings with mixed or unknown materials of the load-bearing structure) were grouped in a third material class. Buildings with steel and timber structures were not separated into distinct classes because of their minimal representation in the building stock.
A distinction was made between periods of construction based on standards for earthquake-resistant design in Slovenia and the former Yugoslavia, which extended to the territory of today's Slovenia until 1991 (Table 1). Buildings constructed up to 1964 were designed with no or minimal consideration for lateral loads and represented the first construction period category. The first building code that explicitly dealt with seismic actions was effective from 1964 until 1981 and was issued after the Skopje earthquake. The buildings built in this period constituted the second construction period category. The third category included buildings constructed after the implementation of the second generation of earthquake-resistant design codes in 1982. These regulations remained in effect until 2008 when Eurocodes became compulsory in Slovenia. Nonetheless, no distinction was made between buildings constructed between 1982 and 2008 and those constructed after 2008 due to the relatively limited number of buildings from the latter period and the already substantial earthquake resistance of buildings from the 1982-2008 timeframe.
In terms of the number of storeys, two or three classes per material and construction period were defined (Table 1), thus considering that the seismic performance is affected by the height of the building, mainly due to its direct effect on the building's vibration period. Buildings with up to three storeys, four to six storeys and seven storeys or more were classified as low-, medium- and high-rise buildings, respectively. This classification is similar to that used in the literature (e.g., [33,35,36]).
The data needed to classify buildings were obtained from the Slovenian Real Estate Register [37]. In addition to those data, building-specific parameters were obtained, including the buildings' coordinates and net floor areas, for use in the EAL calculation. These parameters, aggregated at the municipality level, are presented in Section 3.
For each building class, fragility functions were defined based on the fragility model presented by Babič et al. [4] in the form of a lognormal cumulative distribution function, which is typically assumed (e.g., [33,38]). However, a simpler definition of fragility functions was used than that presented by Babič et al. [4], who randomly simulated the fragility functions for a given building class to account for the uncertainty in the building class's median fragility and the variability in building fragility within the class. In this study, one fragility function per building class and damage state, defined as the average fragility function of those simulated in [4], was considered, thus disregarding the effect of the uncertainty in the fragility. There were two reasons for this simplification. Firstly, the study focused on the bias in EAL resulting from the building data knowledge level rather than the variability in EAL caused by the uncertainty in the fragility. Secondly, the bias in EAL was quantified by the log ratio of two EAL values (log residual) (see Section 4). Changes in the fragility functions have a lower impact on the EAL ratio than on individual EAL values because they partly cancel each other out. The medians and the lognormal standard deviations of the fragility functions are presented in Table 2.
In the third step in estimating EAL at the building level, the EAL was calculated for each building of the municipality's building stock as follows:
$$EAL_i = \sum_{d} \lambda_i(DS = ds_d)\, E(L_i \mid DS = ds_d) \tag{3}$$
The expected direct economic loss for the $i$th building in the case of the $d$th damage state, $E(L_i \mid DS = ds_d)$, was calculated as follows:
$$E(L_i \mid DS = ds_d) = A_i \, C_R \, r_d \tag{4}$$
where $A_i$ is the net floor area of the $i$th building, $C_R$ is the average building replacement cost per m² of the net floor area, and $r_d$ is the ratio between the repair costs for the $d$th damage state and $C_R$. The areas $A_i$ were obtained from the Real Estate Register [37] (see Section 3). The costs of new construction and the demolition and removal of the damaged building were taken into account in estimating $C_R$. The cost of new construction was estimated to be EUR 1100 per m² of net floor area, based on the construction costs for 2020 obtained from a Slovenian construction cost database [39]. The estimated value includes a tax rate of 9.5%. The cost of the demolition and removal of the damaged building was estimated based on previous studies [40][41][42], in which it varied between 8% and 15% of the new construction cost. On that basis, the assumed cost of reconstruction was EUR 1250 per m² of the net floor area, which is approximately 13.6% more than the cost of a new building. The ratios $r_d$ were defined based on the HAZUS methodology [33]. For damage states DS1-DS4, they are equal to 0.02, 0.1, 0.4 and 1.0, respectively.
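As an illustration of the loss calculation, consider a hypothetical building with a net floor area of 200 m² (an assumed value, not taken from the study data). With the reconstruction cost of EUR 1250 per m² and the repair cost ratios of 0.02, 0.1, 0.4 and 1.0 for DS1-DS4, the expected loss given each damage state is the product of floor area, unit cost and ratio:

$$
\begin{aligned}
\text{DS1:}\quad & 200 \times 1250 \times 0.02 = \text{EUR } 5000 \\
\text{DS2:}\quad & 200 \times 1250 \times 0.1 = \text{EUR } 25{,}000 \\
\text{DS3:}\quad & 200 \times 1250 \times 0.4 = \text{EUR } 100{,}000 \\
\text{DS4:}\quad & 200 \times 1250 \times 1.0 = \text{EUR } 250{,}000
\end{aligned}
$$

Multiplying each of these losses by the mean annual frequency of occurrence of the corresponding damage state and summing the products gives the building's EAL.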
Finally, based on the values of EAL estimated for each building (Equation (3)), the EAL at the municipality level was calculated as follows:
$$EAL = \sum_{i} EAL_i \tag{5}$$
where the sum runs over all buildings in the municipality, thus considering that the EAL for a portfolio of properties can be obtained by summing the EALs estimated for each of the properties (e.g., [43]).

Municipalities Investigated and Their Seismic Hazard
The municipalities investigated are located in two Slovenian regions (Figure 1a). The first region (the Eastern region) is located in the northeastern part of Slovenia, close to the Austrian border. It contains nine municipalities, i.e., Tišina, Radenci, Gornja Radgona, Apače, Sveta Ana, Benedikt, Šentilj, Pesnica and Kungota (see Figure 1b). The second region (the Western region) is located in the western part of the country, near the Italian border. It contains six municipalities, i.e., Kanal, Brda, Nova Gorica, Šempeter-Vrtojba, Renče-Vogrsko and Miren-Kostanjevica (Figure 1c). The two regions are approximately 200 km apart.
The knowledge level of the data on the building stock in Slovenia is relatively good since a substantial amount of data on building characteristics is stored in the Real Estate Register [37]. Some of the basic building characteristics aggregated at the municipality level are presented below. Note that the data from [37] refer to building units, which can either be entire buildings or parts of buildings. As the distinction between these two categories is not recorded in [37], it is also omitted in this study, and all building units are termed 'buildings'. Table 3 shows the number of buildings in a single municipality, the municipality area, the total net floor area of all buildings in the municipality and the number of buildings per square kilometre.
In the Western region, the largest municipality is Nova Gorica, with an area of 279.5 km². Nova Gorica also has the largest number of buildings (7451) and the largest total net floor area of buildings (2,903,638 m²). However, the buildings are not evenly distributed across the municipality; most are situated in an urban area around the city of Nova Gorica, while the rest of the municipality is mostly rural. The smallest municipality in terms of area is Šempeter-Vrtojba, but it has more buildings than, for example, the municipality of Kanal. Šempeter-Vrtojba also has the highest number of buildings per square kilometre (126). In terms of the total net floor area, Renče-Vogrsko is the smallest, with 400,000 m². It also has the lowest number of buildings.
In the Eastern region, the areas of the municipalities and the number of buildings per square kilometre are more uniform than in the Western region. The largest municipality is Gornja Radgona, with 76.6 km² of total area, followed by Šentilj with 65.0 km². These municipalities have almost the same number of buildings (i.e., about 2550), with Gornja Radgona having approximately 16% more total net floor area. The municipalities with the fewest buildings are Benedikt and Apače, with only 775 and 715 buildings, respectively.
Based on the Real Estate Register data, the buildings in the municipalities were further classified into the building classes defined in Section 2. Table 4 reports the percentage of buildings in each building class relative to the total number of buildings in the municipality and the percentage of the total net floor area of each building class relative to the total net floor area in the municipality. In the Western region, most buildings are in building class 1 (low-rise masonry buildings built before 1964). In contrast, in the Eastern region, most buildings are in building class 3 (low-rise masonry buildings built after 1982). Based on the net floor area percentages, the municipalities with the highest shares of high-rise buildings are Nova Gorica, Šempeter-Vrtojba and Kanal in the Western region and Radenci, Šentilj and Gornja Radgona in the Eastern region.
To demonstrate the difference in the seismic hazard between the two regions, the PGA values for the centres of municipalities, soil type A and return periods of 475 and 2475 years are presented in Table 5. The PGA values were obtained from the hazard curves derived using the OpenQuake engine [28,29] and ESHM2020 model [30] (see Section 2). Based on Table 5, it is evident that the seismic hazard in the Western region is higher than in the Eastern region. The PGAs for the 475-year return period do not vary significantly within a given region. However, a slightly larger variation can be observed for the return period of 2475 years. For the Eastern region, the PGA for the return period of 475 years ranges between 0.07 g and 0.08 g, while that for the return period of 2475 years is between 0.16 g and 0.18 g. The range of PGAs in the Western region is between 0.14 g and 0.17 g for the return period of 475 years and between 0.32 g and 0.43 g for the return period of 2475 years.
Table 4. Percentage of buildings in each building class relative to the total number of buildings in the municipality and the percentage of the total net floor area of each building class relative to the total net floor area in the municipality. The green and red colours indicate higher and lower percentages, respectively.

Methodology for the Estimation of Bias in Loss Estimation for Different Knowledge Levels of Building Data
The loss estimation bias for a municipal building stock was calculated in terms of log residuals δEAL,KL:
$$\delta_{EAL,KL} = \ln\left(\frac{EAL_{KL}}{EAL_{base}}\right) \tag{6}$$
where EALKL is the municipality's expected annual loss for the building data knowledge level (KL), and EALbase is the municipality's expected annual loss for the base-case KL. A positive and a negative δEAL,KL indicate overestimation and underestimation of the EAL, respectively. Such a measure of bias was selected because it allows a more straightforward comparison of negative and positive errors than the relative difference (EALKL − EALbase)/EALbase, which is bounded by −100% on the negative side and unbounded on the positive side. However, for low absolute values of δEAL,KL (up to about 0.1), the two measures are approximately equal:
$$\delta_{EAL,KL} \approx \frac{EAL_{KL} - EAL_{base}}{EAL_{base}} \tag{7}$$
Both EALKL and EALbase are calculated as described in Section 2 (Equation (5)). The base-case KL corresponds to the complete data, as described in Section 3. The other KLs are variations of the base-case KL in which the data are modified to be consistent with the assumed data limitations at those levels. Twelve KLs were defined, including the base-case KL. KLs were defined as a combination of (1) the location knowledge level (KLL) and (2) the floor-area knowledge level (KLFA) (Table 6). KLL indicates the accuracy of the definition of buildings' locations within a municipality. By lowering KLL, the seismic hazard assigned to the municipality's building stock is less accurate, which introduces bias in the estimation of the EAL. Moreover, because building-specific characteristics that affect the building's fragility are location-dependent, KLL also affects the EAL through the damage assessment. In contrast, KLFA indicates the precision of defining the net floor areas of buildings within a given building class in a municipality. The term 'net' is omitted hereinafter for brevity. The floor area of a building is the building's main characteristic affecting the outcome of the consequence assessment. Therefore, reducing KLFA also introduces bias in the estimation of EAL at the building level and, consequently, at the municipality level.
Descriptions of the location knowledge levels (from Table 6):
- KLL = '1': All buildings lumped in one point; hazard equal to that in the municipality's centre.
- KLL = '2': All buildings lumped in one point; hazard equal to the municipality's average.
- KLL = '3': Buildings' locations are randomly selected consistent with the building density distribution across the municipality; hazard determined for defined locations.
- KLL = '4': Actual location of each building considered; hazard determined for defined locations.
Four location knowledge levels (KLL = '1', '2', '3' or '4') and three floor-area knowledge levels (KLFA = 'a', 'b' or 'c') were considered. The highest (base-case) KL combines KLL = '4' and KLFA = 'c' and is therefore denoted as '4c'. This KL indicates building-specific data, both in terms of the buildings' locations and floor areas. Such a KL is available in Slovenia, as explained in Section 3.
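The reason for preferring the log residual over the relative difference can be illustrated with a few lines of Python. This is a minimal sketch: the function names and EAL values are hypothetical, and the natural logarithm is assumed, which is consistent with the small-value approximation stated in the text.

```python
import math


def log_residual(eal_kl, eal_base):
    """Bias measure: natural log of the ratio of the EAL at a given
    knowledge level to the base-case EAL."""
    return math.log(eal_kl / eal_base)


def relative_difference(eal_kl, eal_base):
    """Relative difference, shown for comparison."""
    return (eal_kl - eal_base) / eal_base


base = 100.0  # hypothetical base-case EAL (arbitrary units)

# Overestimation and underestimation by the same factor of 2
over = log_residual(200.0, base)               # +ln 2
under = log_residual(50.0, base)               # -ln 2
rel_over = relative_difference(200.0, base)    # +100%
rel_under = relative_difference(50.0, base)    # -50%

# Small bias: the two measures nearly coincide
small_log = log_residual(105.0, base)
small_rel = relative_difference(105.0, base)
```

Errors by the same factor in either direction give log residuals of equal magnitude and opposite sign, whereas the relative differences (+100% and −50%) are not symmetric. For a 5% overestimate, the two measures differ only in the third decimal place.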
At the lowest KL ('1a'), no distinction was made between the locations or floor areas of different buildings in a municipality. Instead, the seismic hazard of all buildings in a municipality was considered equal to that in the centre of the municipality, taking into account the municipality's soil conditions (KLL = '1'). Therefore, the hazard curve applied to each building in a municipality was changed to the hazard curve in the grid point closest to the municipality's centre and adjusted for the average soil conditions. Moreover, as no distinction was considered between the floor areas of different buildings, the floor areas of all buildings in a municipality were changed to the average floor area in the municipality (KLFA = 'a'). Such a KL was selected as the lowest one because it is based on very limited data while still representing a realistic scenario. It combines the treatment of the seismic hazard used in a previous study of seismic risk [5] and the availability of floor-area data observed for Austria [44].
The first intermediate location knowledge level (KLL = '2') was defined similarly to KLL = '1' in that no distinction was made between the locations of buildings in a municipality. However, in contrast to KLL = '1', the hazard of buildings at KLL = '2' was considered equal to the municipality's average hazard, assuming uniform building density distribution across the municipality. Therefore, the hazard curve applied to each building in a municipality was changed to the average of the hazard curves from all grid points in the municipality and adjusted for the average soil conditions.
At the second intermediate location knowledge level (KLL = '3'), the variation in buildings' locations in a municipality was considered. The locations were selected consistently with the building density distribution across the municipality. Thus, they were the same as in the case of KLL = '4'. However, level KLL = '3' differed from KLL = '4' in that it did not allow consideration of the actual building characteristics at a given location. Therefore, the locations of buildings in the municipality were randomly permuted in a set of simulations, and the average EAL over all simulations was considered as EALKL.
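The random permutation of building locations can be sketched as follows. The code is illustrative only: the two-zone hazard, the two building classes, the loss rates per unit floor area, the number of simulations and all names are hypothetical and were chosen to keep the example small.

```python
import random


def municipal_eal(buildings, locations, loss_rate):
    """Sum building EALs; building k is placed at locations[k]."""
    return sum(
        b["area"] * loss_rate(b["cls"], loc)
        for b, loc in zip(buildings, locations)
    )


def eal_permuted_locations(buildings, locations, loss_rate, n_sim=1000, seed=0):
    """Average municipal EAL over random permutations of building locations.

    The set of locations (and hence the building density) is kept, but the
    link between a location and the building standing on it is discarded.
    """
    rng = random.Random(seed)
    shuffled = list(locations)
    total = 0.0
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        total += municipal_eal(buildings, shuffled, loss_rate)
    return total / n_sim


# Hypothetical EAL per m2 (EUR/m2/year) by building class and hazard zone
RATES = {("old", "high"): 4.0, ("old", "low"): 1.0,
         ("new", "high"): 1.0, ("new", "low"): 0.25}


def loss_rate(cls, zone):
    return RATES[(cls, zone)]


buildings = [{"cls": "old", "area": 100.0}, {"cls": "old", "area": 100.0},
             {"cls": "new", "area": 100.0}, {"cls": "new", "area": 100.0}]
locations = ["high", "high", "low", "low"]  # actual locations of the buildings

eal_actual = municipal_eal(buildings, locations, loss_rate)
eal_permuted = eal_permuted_locations(buildings, locations, loss_rate)
```

In this constructed example, the more vulnerable buildings stand in the higher-hazard zone, so discarding the link between building characteristics and location lowers the estimated EAL. With the opposite arrangement, the permuted estimate would be higher than the actual one.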
Moreover, the intermediate floor-area knowledge level (KLFA = 'b') indicates that the distinction is made between the building classes but not between the buildings within a given building class. Therefore, the floor area of each building was changed to the average floor area in its building class.
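The three floor-area knowledge levels amount to three ways of assigning a floor area to each building. A minimal Python sketch, with a hypothetical function name and a made-up building stock, could look as follows.

```python
from collections import defaultdict


def adjust_floor_areas(buildings, level):
    """Return floor areas consistent with floor-area knowledge level 'a', 'b' or 'c'.

    buildings: list of (building_class, floor_area) tuples.
    """
    if level == "c":
        # Building-specific floor areas are used as they are.
        return [area for _, area in buildings]
    if level == "a":
        # Every building gets the municipality's average floor area.
        mean = sum(area for _, area in buildings) / len(buildings)
        return [mean] * len(buildings)
    if level == "b":
        # Every building gets the average floor area of its building class.
        totals, counts = defaultdict(float), defaultdict(int)
        for cls, area in buildings:
            totals[cls] += area
            counts[cls] += 1
        return [totals[cls] / counts[cls] for cls, _ in buildings]
    raise ValueError(f"unknown floor-area knowledge level: {level!r}")


# Hypothetical building stock: (building class, floor area in m2)
stock = [(1, 120.0), (1, 180.0), (3, 200.0), (3, 300.0), (3, 400.0)]

areas_a = adjust_floor_areas(stock, "a")
areas_b = adjust_floor_areas(stock, "b")
areas_c = adjust_floor_areas(stock, "c")
```

In this sketch, all three levels preserve the total floor area in the municipality; they differ only in how that area is distributed among building classes and individual buildings.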
The availability of data can influence the selection of KL. For example, suppose the coordinates of the municipality's borders are not fully known, but those of the municipality's centre are known. In that case, KLL = '1' can be selected because it does not require calculation of the hazard over the entire municipality's area. If the municipality's borders are also known, the average hazard over the municipality area can be evaluated, and KLL = '2' can be applied. If the building density distribution is known, from remote sensing imagery, for example, KLL = '3' can be used. Finally, if data on the characteristics of buildings at each specific location are available, the highest location knowledge level considered in this study, KLL = '4', can be selected. Similarly, if there are no data on the floor areas typical for different building types, the average floor area in the municipality may be assigned to each building, which corresponds to KLFA = 'a'. If the floor areas typical of the building types considered can be estimated, KLFA can be increased to 'b'. Finally, if the variation in floor areas within the building classes is known, KLFA = 'c' can be used. Increasing KL can be quite simple in some cases (e.g., identifying the municipality's borders) but more demanding in others (e.g., determining the floor-area variation in the municipality). However, a KL increase typically requires a certain amount of effort, which is not necessarily justified given the resources available for the analysis.
It should be noted that the selection of KL can also represent the analyst's decision to simplify the analysis despite the available data. For example, the selection of an appropriate exposure resolution to obtain low-bias results, given that data are available, was explored in [7]. It should also be noted that the list of KLs considered herein is not exhaustive. Additional KLs could be defined between the lowest and highest KL. For example, the location and floor-area data could be defined at the census tract level, which is an intermediate level between the building-specific and municipality levels. In addition, other KLs could be defined using less accurate data than those representative of KL = '1a' or more accurate data than those representative of KL = '4c'. For example, the number of buildings in different building classes could be considered unavailable, which would introduce additional uncertainty in the application of the fragility model. Consideration of such options is beyond the scope of this study.

Bias in Seismic Risk for the Selected Building Data Knowledge Levels
The log residuals δEAL,KL (Equation (6)) selected as the measure of bias were estimated for each KL. At the lowest KL ('1a'; Figure 2), the highest δEAL,KL was calculated for the municipality of Benedikt (+0.17) in the Eastern region, while the lowest δEAL,KL corresponded to the municipality of Miren-Kostanjevica in the Western region (−0.17). Interestingly, most municipalities in the Western region had a negative δEAL,KL, while a majority of the municipalities in the Eastern region had a positive δEAL,KL. Accordingly, the mean δEAL,KL calculated at the regional scale (rightmost dot in Figure 2) was positive for the Eastern region and negative for the Western region. The bias was relatively small in both cases, as the absolute values of the mean δEAL,KL were between 0.05 and 0.06.
By increasing the building data knowledge level to KL = '1b', the δEAL,KL values of all municipalities in the Western region shifted by a similar amount (Figure 3b). This shift was negative (−0.09 on average), thus intensifying the error compared to KL = '1a'. The largest bias was again observed for the municipality of Miren-Kostanjevica (−0.25). Such a result may be surprising because it implies that by improving the data, the bias increases. However, it can be explained by the fact that, for this particular region, the bias due to low KLL was negative, while the bias due to low KLFA was positive. Increasing KLFA from 'a' to 'b' reduced the positive part of the bias, which therefore offset less of the negative part, and the absolute values of δEAL,KL consequently increased. Conversely, in the case of the Eastern region (Figure 3a), an increase in the building data knowledge level to KL = '1b' diminished the overall bias so that the maximum municipality δEAL,KL was only +0.06, and the average δEAL,KL in the region was practically negligible.
An opposite outcome was observed when the floor-area knowledge level was maintained at KLFA = 'a' and the location knowledge level was increased to KLL = '3' (Figure 4). In this case, the δEAL,KL values for the municipalities in the Western region generally increased compared to the values obtained at KL = '1a'. Consequently, their absolute values decreased and were all lower than 0.07, which means that improving KLL had a significantly better effect on bias reduction than improving KLFA. On the other hand, δEAL,KL for the municipalities in the Eastern region did not drastically change when KL increased from '1a' to '3a' (Figure 4a), and the bias reduction was not as pronounced as when KL increased to '1b'. This implies that KLFA was more crucial for this particular region than KLL. For other KLs, only the minimum, maximum and average δEAL,KL values for a region are presented for brevity (Figure 5). Some of the observations mentioned above related to KL = '1a', '1b' and '3a' can be further generalised. For the Western region (Figure 5b), an improvement in KLFA from 'a' to 'b' generally decreased δEAL,KL. However, decreasing δEAL,KL did not clearly reduce the bias (the absolute value of δEAL,KL did not change notably or was even further increased), except for KLL = '4'. When KLFA improved further to 'c', a small reduction in bias was again observed for KLL = '4', while this reduction was negligible for other location knowledge levels. In fact, δEAL,KL was exactly the same for KL = '1b' and '1c' and for KL = '2b' and '2c'. This is to be expected because all buildings were assigned the same hazard curve at location knowledge levels KLL = '1' and '2'. Therefore, dividing the floor area within a building class evenly (KLFA = 'b') or based on the actual data (KLFA = 'c') resulted in the same EAL for the building class.
In contrast, for the Eastern region (Figure 5a), increasing KLFA from 'a' to 'b' clearly improved the accuracy of the risk estimation, as it resulted in a smaller range of δEAL,KL, centred closer to 0. However, when the level of this type of knowledge improved further to level 'c', the beneficial effect was again less significant, for the same reasons as in the case of the Western region.
With regard to the location knowledge, the most significant bias reduction was achieved by increasing KLL from '2' to '3' (Figure 5), i.e., by including data on the building density distribution across the municipality. However, the effect of these data was much more pronounced for the municipalities in the Western region than for those in the Eastern region. Further improving KLL to '4' had a small bias reduction effect, especially for the Western region, while improving this type of knowledge level from '1' to '2' had a negligible effect for both regions.
The results indicate that the effect of including different data types on the bias in seismic risk is municipality-dependent. To improve the understanding of this dependency, the municipal characteristics that affect the bias in EAL were identified, as presented in the following section.

Identification of Municipal Characteristics Affecting the Bias in Seismic Risk
Identifying the characteristics of the municipalities affecting the bias reduction can improve the understanding of why some municipalities are more prone to loss estimation bias in the case of incomplete data. It can also allow the analyst to predict whether adding data of a given type can reduce the bias if these characteristics are already available at the given building data knowledge level.
The municipal characteristics affecting the bias at the selected building data knowledge levels were identified in two steps. In the first step, the data types that contribute most to the bias reduction were determined. The results presented in the previous section indicate that the bias at the regional scale can be most reduced by including the data on building density distribution across the municipality (transition from KLL = '2' to KLL = '3') and building-class-specific floor areas (transition from KLFA = 'a' to KLFA = 'b'). However, to obtain a clearer indication of the effect of different data types, the absolute differences in δEAL,KL caused by a gradual increase from knowledge level KL to knowledge level KL′, i.e., |δEAL,KL′ − δEAL,KL|, were also calculated at the municipality level (Figure 6). Consistent with the observations at the regional scale, it was found that the absolute difference in δEAL,KL at the municipality level was the largest when increasing KLL from '2' to '3' and KLFA from 'a' to 'b' (Figure 6). All other gradual increases in KL had a much lower effect on the bias. Including the data on building density distribution across the municipality in the analysis (KLL from '2' to '3') changed δEAL,KL by as much as 0.21, regardless of the floor-area knowledge level, while adding the data on building-class-specific floor areas (KLFA from 'a' to 'b') resulted in the absolute difference of δEAL,KL being as high as 0.15, with a small variation across the location knowledge levels. For both knowledge-level increases, the lowest absolute difference of δEAL,KL was close to 0, which implies that including new data affects the bias reduction differently for different municipalities.
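The first step can be sketched as a loop over all one-step increases in either KLL or KLFA. The helper below is hypothetical and is not the authors' implementation. In the example data, the values for '1a' and '1b' are the Miren-Kostanjevica residuals quoted earlier, whereas the values for '2a' and '3a' are illustrative placeholders chosen only to be consistent with the reported difference of 0.21.

```python
LOCATION_LEVELS = ["1", "2", "3", "4"]
FLOOR_AREA_LEVELS = ["a", "b", "c"]


def gradual_differences(delta):
    """Return |delta[KL'] - delta[KL]| for every one-step increase in KLL or KLFA.

    `delta` maps a knowledge-level label such as '2a' to its log residual.
    Transitions with a missing endpoint are skipped.
    """
    differences = {}
    for li, loc in enumerate(LOCATION_LEVELS):
        for fi, fa in enumerate(FLOOR_AREA_LEVELS):
            current = loc + fa
            next_levels = []
            if li + 1 < len(LOCATION_LEVELS):      # one step up in KLL
                next_levels.append(LOCATION_LEVELS[li + 1] + fa)
            if fi + 1 < len(FLOOR_AREA_LEVELS):    # one step up in KLFA
                next_levels.append(loc + FLOOR_AREA_LEVELS[fi + 1])
            for nxt in next_levels:
                if current in delta and nxt in delta:
                    differences[(current, nxt)] = abs(delta[nxt] - delta[current])
    return differences


# '1a' and '1b': values quoted in the text for Miren-Kostanjevica.
# '2a' and '3a': illustrative placeholders, NOT results from the study.
delta_wmk = {"1a": -0.17, "1b": -0.25, "2a": -0.17, "3a": 0.04}

for (kl, kl_next), diff in gradual_differences(delta_wmk).items():
    print(f"{kl} -> {kl_next}: {diff:.2f}")
```

Repeating this for every municipality and collecting the results per transition gives the kind of comparison summarised in Figure 6.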
In the second step of the identification of the essential municipal characteristics related to the bias in seismic risk, the variations in the absolute difference of δEAL,KL for the two most significant increases in KL were analysed further to identify the indicators that correlate well with the bias reduction. Several simple indicators (e.g., municipality's area, number of buildings and average building density) and more advanced indicators were tested. The indicators that were found to best explain the bias reduction for the two most significant increases in KL are presented in the following.
The effect of increasing KLFA from 'a' to 'b' on the bias can be explained by the municipality's building stock homogeneity index IBSH, calculated as follows:

$$I_{BSH} = \sum_{i=1}^{n_c} \left( \frac{N_{b,i}}{N_b} \right)^{2} \qquad (8)$$

where Nb,i is the number of buildings in the i-th building class in the municipality, Nb is the total number of buildings in the municipality and nc is the number of building classes in the municipality. The value of IBSH is between zero and one. It equals one if the municipality has only one building class and approaches zero as the buildings are spread evenly over an increasing number of building classes. Moreover, for a given nc, IBSH is equal to 1/nc if the building class distribution is uniform, while it takes a larger value if most buildings are in only a few building classes. IBSH correlated reasonably well with the absolute difference in δEAL,KL caused by the increase from KLFA = 'a' to KLFA = 'b', i.e., |δEAL,KL′ − δEAL,KL|a→b (Figure 7). Regardless of the location knowledge level, the correlation coefficient ρ was about −0.65, and the coefficient of determination R² was approximately 0.43. The higher the IBSH is, the lower the benefit is of adding the data on building-class-specific floor areas in the analysis. In other words, the higher the IBSH is, the more likely it is that KLFA = 'a' results in a low-bias estimate of EAL. The reason for this is that higher values of IBSH indicate that the municipality's building stock is less fragmented and that, consequently, the average floor area over all buildings in the municipality can better describe the average floor area in the building class with the predominant contribution to the EAL. For example, almost all buildings in the municipality of Tišina (E-T) are masonry with a maximum of three storeys (Table 4).
Therefore, the building stock is relatively homogeneous, which is reflected by IBSH = 0.275 (Figure 7), and the average floor area over all buildings in the municipality (200 m²) is close to the average floor areas of the three building classes that contribute most to the EAL (182, 202 and 211 m²). Consequently, providing additional data on building-class-specific floor areas did not reduce the bias significantly (by approximately 3% for all location knowledge levels). For other municipalities, IBSH ranged from 0.16 to 0.27. In the case of IBSH greater than 0.25, using floor-area knowledge level KLFA = 'a' could produce relatively accurate results (Figure 7). It should be noted that the absolute values of IBSH depend on the building classification. For a different number of classes, different IBSH values would indicate the sufficiency of KLFA = 'a'.
Similarly, the effect of increasing KLL from '2' to '3' on the loss estimation bias was found to be explainable by the hazard variation index IHV, calculated as follows:

$$I_{HV} = \frac{\sigma_{EAL,HV}}{\mu_{EAL,2a}} \qquad (9)$$

where $\mu_{EAL,2a}$ is the average building-specific EAL in the municipality calculated for KL = '2a', and $\sigma_{EAL,HV}$ is defined as follows:

$$\sigma_{EAL,HV} = \sqrt{\bar{\sigma}_{EAL,R}^{2} - \sigma_{EAL,2a}^{2}} \qquad (10)$$

In Equation (10), $\sigma_{EAL,2a}$ is the standard deviation of building-specific EALs in the municipality calculated for KL = '2a', and $\bar{\sigma}_{EAL,R}$ is the average standard deviation of building-specific EALs in the municipality with randomly distributed building stock. The latter is calculated as the average of the standard deviations $\sigma_{EAL,R}$ obtained in multiple simulations. In each simulation, each building in the municipality is assigned a random location within the municipality (disregarding the building density distribution across the municipality), while the fragility functions and floor area remain the same as in the calculation of $\mu_{EAL,2a}$. Building-specific EALs are then determined, and $\sigma_{EAL,R}$ is calculated as their standard deviation. In this study, 50 such simulations were performed. The calculated $\sigma_{EAL,R}$ reflects the variability in the buildings' fragility and the variability in the hazard across the municipality, whereas $\sigma_{EAL,2a}$, which is obtained with the same hazard curve for all buildings, does not include the latter.
Assuming that these two variabilities are uncorrelated and that building-specific EALs in the municipality are normally distributed, $\sigma_{EAL,HV}$ in Equation (10) represents the variation in building-specific EALs resulting only from the municipality's hazard variability. The index IHV can thus be understood as a coefficient of variation of building-specific EALs resulting only from the hazard variability.
IHV is positively correlated with the absolute difference in δEAL,KL caused by a knowledge level increase from KLL = '2' to KLL = '3', i.e., |δEAL,KL′ − δEAL,KL|2→3 (Figure 8). Regardless of the floor-area knowledge level, ρ was about +0.69, and R² was approximately 0.47. This positive correlation implies that an increase in IHV intensifies the potential for the hazard variation to alter the municipality's EAL when data on building density distribution across the municipality are included. For example, for the municipality of Miren-Kostanjevica (W-MK), the average and standard deviation of building-specific EALs calculated for KL = '2a' ($\mu_{EAL,2a}$ and $\sigma_{EAL,2a}$) were EUR 137 and EUR 69, respectively. When the buildings were moved from the municipality's centre to random locations in the municipality, the standard deviation, on average, increased notably ($\bar{\sigma}_{EAL,R}$ = EUR 85), leading to $\sigma_{EAL,HV}$ = EUR 49 (Equation (10)) and IHV = 0.36 (Equation (9)). Such a high IHV indicates that the municipality's EAL could change significantly if the buildings were moved to their actual locations. This was also indicated by |δEAL,KL′ − δEAL,KL|2→3 being equal to 0.21 (Figure 8). For other municipalities, IHV was between 0.06 and 0.42. The results suggest that for IHV less than 0.1, the hazard variation in the municipality does not have significant potential to cause bias in the estimation of seismic risk.
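The Miren-Kostanjevica example can be retraced with a short sketch. It assumes that the hazard-related standard deviation is obtained by subtracting the variance at KL = '2a' from the squared average standard deviation of the random-location simulations, which follows from the stated assumption of uncorrelated variabilities and reproduces the reported numbers. The function and argument names are hypothetical, and clamping a negative variance difference to zero is an added safeguard rather than part of the study.

```python
import math
import statistics


def hazard_variation_index(mean_eal_2a, std_eal_2a, std_eal_random_runs):
    """Return (IHV, hazard-related standard deviation).

    mean_eal_2a, std_eal_2a: mean and standard deviation of building-specific
        EALs at KL = '2a', where all buildings share one hazard curve.
    std_eal_random_runs: standard deviations of building-specific EALs,
        one value per simulation with randomly relocated buildings.
    """
    std_random = statistics.fmean(std_eal_random_runs)
    # Uncorrelated variabilities: variances add, so the hazard part is the difference.
    var_hazard = std_random ** 2 - std_eal_2a ** 2
    # Clamp to zero in case sampling noise makes the difference negative (assumption).
    std_hazard = math.sqrt(max(var_hazard, 0.0))
    return std_hazard / mean_eal_2a, std_hazard


# Rounded values quoted in the text for Miren-Kostanjevica (EUR). The study
# averaged 50 simulations; only the reported average (85) is used here.
ihv, std_hazard = hazard_variation_index(137.0, 69.0, [85.0])
print(f"hazard-related standard deviation = EUR {std_hazard:.1f}, IHV = {ihv:.2f}")
```

With the rounded inputs, the hazard-related standard deviation comes out at about EUR 49.6 rather than the reported EUR 49, presumably because the study used unrounded values; the resulting index still rounds to 0.36.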
The indexes IBSH and IHV can therefore be used to estimate whether additional data would significantly reduce the bias in seismic risk estimation. It is worth emphasising that IBSH can be calculated at all KLs considered in this study, while IHV can be obtained at KL = '2a' or higher. Therefore, its application requires at least data on the municipality's borders and the average floor area over all buildings in the municipality.
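Because IBSH needs only the number of buildings per building class, it is straightforward to evaluate. The sketch below assumes that the index is the sum of squared class shares, which is consistent with the properties stated above (a value of one for a single class and 1/nc for a uniform distribution over nc classes). The function name, class labels and building counts are hypothetical.

```python
def building_stock_homogeneity(counts_by_class):
    """Building stock homogeneity index from building counts per class.

    Assumed form: sum over classes of (class count / total count) squared.
    """
    total = sum(counts_by_class.values())
    if total <= 0:
        raise ValueError("the municipality must contain at least one building")
    return sum((n / total) ** 2 for n in counts_by_class.values())


# Hypothetical building counts per class, for illustration only
single_class = {"masonry_low": 500}
uniform = {"masonry_low": 100, "masonry_mid": 100, "rc_low": 100, "rc_mid": 100}
concentrated = {"masonry_low": 280, "masonry_mid": 40, "rc_low": 40, "rc_mid": 40}

for label, counts in [("single class", single_class),
                      ("uniform, 4 classes", uniform),
                      ("concentrated, 4 classes", concentrated)]:
    print(f"{label}: IBSH = {building_stock_homogeneity(counts):.2f}")
```

The uniform case returns 1/nc = 0.25, and concentrating most buildings in one class raises the index, in line with the interpretation that a higher IBSH indicates a less fragmented building stock.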

Conclusions
The effect of the availability of building data on loss estimation was analysed for fifteen municipalities in two Slovenian regions. Twelve knowledge levels of building data were introduced as a combination of the location knowledge level and floor-area knowledge level. The bias for a given building data knowledge level was defined as the log residual between the municipality's expected annual loss (EAL) estimated for that knowledge level and the municipality's EAL estimated for the base-case knowledge level.
The log residuals of EAL ranged from −0.25 to +0.16 at the lowest knowledge levels. The bias was affected by both the location and floor-area knowledge levels. It was found that introducing more data can reduce the bias, but does not do so in every case. Among the gradual knowledge level improvements considered in this study, those corresponding to the inclusion of data on the building density distribution across the municipality and data on the building-class-specific floor areas proved to be the most beneficial. Additional data on building-specific locations and floor areas reduced the bias only slightly.
The importance of a given type of data was found to be dependent on the municipality. However, the results suggest that for a given municipality, the reduction in bias caused by introducing new data can be estimated in advance based on readily available data. Introducing building-class-specific floor area data is anticipated to mitigate bias more effectively in municipalities with a lower building stock homogeneity index (IBSH), which quantifies the fragmentation of buildings into building classes. For IBSH values greater than 0.25, using the average floor area for all buildings in the municipality was sufficient to obtain a relatively accurate estimate of EAL. Likewise, including data on the building density distribution across the municipality is expected to yield greater advantages for municipalities with a higher hazard variation index (IHV), which measures the potential for the hazard variation in the municipality to intensify bias in the estimation of seismic risk. The findings imply that for IHV values less than 0.1, the bias remains low even when the average hazard in the municipality is assigned to each building. Both indexes were introduced in this study and require only a low knowledge level of readily available data.
The presented findings can be beneficial in making decisions on collecting additional data when the current data knowledge level is low, especially in the case of limited resources. In such situations, it is suggested to focus primarily on obtaining data types that proved to have a higher impact on bias reduction. Moreover, it is suggested that the prioritisation of municipalities for enhancing their data knowledge levels be guided by indexes such as IBSH and IHV.
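As a rough illustration of how the two indexes could guide such decisions, the sketch below turns the indicative thresholds reported above (IBSH of 0.25 and IHV of 0.1) into a simple screening rule. The function, the constants' names and the example index values are hypothetical, and the thresholds apply only to the building classification and regions considered in this study.

```python
IBSH_SUFFICIENT = 0.25  # above this, the average floor area gave relatively accurate EAL
IHV_NEGLIGIBLE = 0.10   # below this, hazard variation had little potential to cause bias


def suggest_data_collection(ibsh, ihv):
    """Return the data types whose collection is likely to reduce the bias.

    A screening rule based on the indicative thresholds of this study only.
    """
    suggestions = []
    if ibsh <= IBSH_SUFFICIENT:
        suggestions.append("building-class-specific floor areas")
    if ihv >= IHV_NEGLIGIBLE:
        suggestions.append("building density distribution across the municipality")
    return suggestions


# Hypothetical municipalities with illustrative (IBSH, IHV) values
municipalities = {"A": (0.30, 0.05), "B": (0.18, 0.36), "C": (0.27, 0.20)}

for name, (ibsh, ihv) in municipalities.items():
    needed = suggest_data_collection(ibsh, ihv)
    print(f"{name}: {', '.join(needed) if needed else 'current data likely sufficient'}")
```

Such a rule only indicates where additional data are more likely to matter; the correlations reported in this study (R² of roughly 0.43 to 0.47) mean that it cannot replace a municipality-specific assessment.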
The findings refer to the selected building data knowledge levels and building classification. Additional research is needed to identify the sources of loss estimation bias for other knowledge levels, which may be below, above or between the knowledge levels considered in this study. Further research is also needed to generalise the concepts introduced in this study to other regions and building classifications.