Effects of Different Normalization, Aggregation, and Classification Methods on the Construction of Flood Vulnerability Indexes

Index-based approaches are widely employed for measuring flood vulnerability. Nevertheless, the uncertainties in the index construction are rarely considered. Here, we conducted a sensitivity analysis of a flood vulnerability index in the Maquiné Basin, Southern Brazil, considering distinct normalization, aggregation, and classification methods and their effects on the model outputs. The robustness of the results was investigated by considering Spearman’s correlations, the shift in the vulnerability rank, and spatial analysis of different normalization techniques (min-max, z-scores, distance to target, and ranking) and aggregation methods (linear and geometric). The final outputs were classified into vulnerability classes using the natural breaks, equal interval, quantile, and standard deviation methods. The performance of each classification method was evaluated by spatial analysis and Akaike’s information criterion (AIC). The results showed low sensitivity to the normalization step. Conversely, the geometric aggregation method produced substantial differences in the spatial vulnerability and tended to underestimate the vulnerability where indicators with low values compensated for high values. Additionally, the classification of the vulnerability into different classes led to overly sensitive outputs. Given the AIC performance, the natural breaks method was the most suitable. The obtained results can support decision-makers in reducing uncertainty and increasing the quality of flood vulnerability assessments.


Introduction
Vulnerability has an important role in flood risk assessment, as hazards only become disasters if there are vulnerable people or infrastructure located in hazard-exposed areas [1]. Indeed, flood impacts strongly depend on the vulnerability of the exposed system or community [2,3]. Thus, the knowledge of vulnerability is fundamental for assessing flood risk, as it allows computing the susceptibility of the exposed elements [4] by considering multiple dimensions [5]. Furthermore, the assessment of vulnerability allows identifying hot spot areas and the main drivers that contribute to it (e.g., social, economic, physical, cultural, environmental, and institutional) [6].
According to Nasiri et al. [7], flood vulnerability is usually assessed by employing the following methods: (i) vulnerability curve, (ii) disaster loss data, (iii) computer modeling, and (iv) index-based. The latter is recommended by several authors because it allows for a holistic analysis of the vulnerability dimensions, aiming to ensure a better representation of reality [5,7-9]. Additionally, the use of indexes makes it possible to simplify the system conditions and behavior, to summarize complex and multidimensional issues, to facilitate interpretation by end users, and to reduce the number of indicators [10].

Flood Vulnerability Index Construction
The construction of indexes usually consists of the following steps: (i) choice of the phenomenon to be measured, (ii) indicators selection, (iii) normalization, (iv) weighting, (v) aggregation, (vi) classification of the results in different classes, (vii) sensitivity and uncertainty analysis, and (viii) validation [19].
To investigate uncertainty in the construction of a flood vulnerability index for our case study, we (i) chose relevant indicators based on a systematic literature review [45,46]; (ii) normalized these indicators by using the four most common methods (min-max, z-scores, distance to target, and ranking); (iii) employed an equal-weights scheme; (iv) aggregated the normalized indicators with equal weights by two aggregation methods: linear and geometric; (v) classified the flood vulnerability outputs by considering four methods; and (vi) performed a sensitivity and uncertainty analysis of the normalization and aggregation methods by computing the Spearman correlation between the outputs and comparing the rankings using a box plot. Furthermore, the performance of the different classification methods was evaluated by using Akaike's information criterion (AIC) (Figure 2).


Figure 2.
Flowchart with the methodological outline. The uncertainty of the different normalization and aggregation methods was evaluated using both the Spearman correlation and the ranking index analysis (blue boxes). The uncertainty coming from the index classification step was evaluated by employing the Akaike's information criterion (AIC) analysis (green boxes).
The indicators used to represent flood vulnerability were selected according to: (i) their relevance, as evidenced by a systematic literature review conducted in a previous step of this research, (ii) the availability of data to represent them, and (iii) their suitability to the Brazilian context. Based on these criteria, we selected 19 indicators (Table 1). The datasets used to represent them were obtained from the Brazilian 2010 Census [43], whose spatial resolution corresponds to the census block level. Among the 20 census blocks in the Maquiné River Basin, two were excluded (8 and 9, see Figure 1) because no people live there and, consequently, there is no vulnerability. The selected indicators were grouped into three vulnerability dimensions: social, economic, and physical (Table 1). Owing to data availability limitations, relevant criteria such as distance to critical infrastructure and risk perception were not considered.
Notes to Table 1: Data source: [43]. R$ denotes the Brazilian currency. * Dependency rate is an age-population ratio of persons that are not in the labor force (persons aged 0 to 14 and 65+).
The indicators were normalized (i.e., brought to a common scale) to allow their aggregation. The following methods were chosen, as proposed by Saisana and Saltelli [47]:

• Min-max: rescales values from 0 (the worst rank for a specific indicator) to 1 (the best). It subtracts the minimum value and divides the result by the range (the maximum value minus the minimum value):
$$y_{in} = \frac{x_{in} - \min(x_{in})}{\max(x_{in}) - \min(x_{in})} \quad (1)$$
• Z-scores: converts all the indicators to a common scale with a mean of zero and a standard deviation of one:
$$y_{in} = \frac{x_{in} - \bar{x}_{in}}{\sigma_{x_{in}}} \quad (2)$$
• Distance to target: normalizes indicators by dividing the unit's value by a reference target (i.e., the maximum value):
$$y_{in} = \frac{x_{in}}{\max(x_{in})} \quad (3)$$
• Ranking: based on ordinal variables that can be turned into quantitative variables:
$$y_{in} = \operatorname{Rank}(x_{in}) \quad (4)$$
where $y_{in}$ is the normalized indicator, $x_{in}$ is the indicator value, $\bar{x}_{in}$ is the indicator's average, and $\sigma_{x_{in}}$ is the indicator's standard deviation.
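The four normalization schemes described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function names and the sample values are hypothetical, the z-scores use the population standard deviation, and tied values share their average rank (the text does not state either choice).

```python
from statistics import mean, pstdev

def min_max(values):
    """Rescale to [0, 1]: (x - min) / (max - min). Assumes max > min."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

def z_scores(values):
    """Centre on the mean and scale by the (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(x - mu) / sigma for x in values]

def distance_to_target(values):
    """Divide each value by a reference target, here the maximum value."""
    target = max(values)
    return [x / target for x in values]

def ranking(values):
    """Replace each value by its rank (1 = smallest); ties get the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for j in range(pos, end + 1):
            ranks[order[j]] = (pos + end) / 2 + 1
        pos = end + 1
    return ranks

# Hypothetical indicator values for five census blocks (not data from the study).
indicator = [0.0, 2.0, 4.0, 4.0, 10.0]
normalized = {
    "min_max": min_max(indicator),
    "z_scores": z_scores(indicator),
    "distance_to_target": distance_to_target(indicator),
    "ranking": ranking(indicator),
}
```

Note that min-max always maps the smallest value to zero, and distance to target keeps any zero in the raw data, which matters once the normalized scores are multiplied together in a geometric aggregation.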
After normalization, the indicators were aggregated by two methods, linear and geometric, as shown by Equations (5) and (6), respectively. All the indicators had the same weight, thus receiving the same importance. This allowed focusing only on the uncertainties of the normalization and aggregation methods, because different weights would interfere with the final results.
$$FVI = \sum_{i=1}^{n} w_i I_i \quad (5)$$
$$FVI = \prod_{i=1}^{n} I_i^{w_i} \quad (6)$$
where $\sum_{i=1}^{n} w_i = 1$ and $0 \le w_i \le 1$ for all $i = 1, \ldots, n$; $w_i$ is the weight associated with the normalized value $I_i$ of indicator $i$; and $n$ is the number of indicators.
Finally, the flood vulnerability index was spatialized and classified into five categories: "very low", "low", "medium", "high", and "very high". This classification was made by four methods:
• Natural breaks (Jenks): class breaks are placed so as to best group similar values and maximize the differences between classes. The features are divided into classes whose boundaries are set where there are relatively big differences in the data values.
• Equal interval: divides the values into equal-sized classes. After the number of intervals is specified, the class breaks are automatically determined from the value range.
• Quantile: each class contains an equal number of features.
• Standard deviation: shows how much a feature's attribute value varies from the mean. Class breaks are created with equal value ranges that are a proportion of the standard deviation, usually at intervals of one, one-half, one-third, or one-fourth of a standard deviation, using the mean and the standard deviation.
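To make the difference between the two aggregation rules concrete, the following minimal Python sketch applies a weighted arithmetic mean and a weighted geometric mean to the same normalized scores under equal weights. The function names and the example scores are hypothetical and are not taken from the study.

```python
import math

def equal_weights(n):
    """Equal-weights scheme: every indicator receives weight 1/n."""
    return [1.0 / n] * n

def linear_aggregation(scores, weights):
    """Weighted arithmetic mean: sum of w_i * I_i."""
    return sum(w * s for w, s in zip(weights, scores))

def geometric_aggregation(scores, weights):
    """Weighted geometric mean: product of I_i ** w_i (scores must be >= 0)."""
    return math.prod(s ** w for w, s in zip(weights, scores))

# Hypothetical normalized scores for one census block; the last indicator is zero,
# as happens to the minimum value under min-max normalization.
with_zero = [0.9, 0.8, 0.0]
w3 = equal_weights(len(with_zero))
linear_with_zero = linear_aggregation(with_zero, w3)        # about 0.57
geometric_with_zero = geometric_aggregation(with_zero, w3)  # exactly 0.0

# An unbalanced profile without zeros: the geometric mean is pulled toward the low score.
unbalanced = [0.9, 0.1]
w2 = equal_weights(len(unbalanced))
linear_unbalanced = linear_aggregation(unbalanced, w2)        # 0.5
geometric_unbalanced = geometric_aggregation(unbalanced, w2)  # about 0.3
```

A single zero score drives the geometric index to zero regardless of the other indicators, whereas the linear index lets high scores offset it. For nonnegative scores the weighted geometric mean never exceeds the weighted arithmetic mean, so unbalanced profiles receive lower index values under geometric aggregation.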

Sensitivity Analysis
The sensitivity analysis or robustness test was conducted to understand how the variation in the output parameters can be apportioned to different choices of normalization and aggregation methods, as well as the index classification methods.
We tested the robustness of our results by changing the input data parameters, considering a local sensitivity analysis (SA) termed one-at-a-time SA [48]. By varying the normalization, aggregation, and classification methods one at a time, we verified how these changes affected the results when all the other parameters remained constant [49]. The similarity of the outputs under these changes was measured by conducting a correlation analysis using Spearman's rank correlation [12,34,39]. This nonparametric correlation measures the strength of the association between two variables [50]. Additionally, we computed the sensitivity according to the rank methodology proposed by Hudrliková [51] to examine the relative vulnerability ranking of the census blocks under different normalization and aggregation approaches.
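The comparison between two index scenarios can be sketched as follows. Spearman's coefficient is computed here as the Pearson correlation of the ranks. The shift in rank is taken as the absolute difference in rank position of each census block between two scenarios, which is an assumption: the exact formulation in [51] is not reproduced in this text. All names and sample values are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for j in range(pos, end + 1):
            ranks[order[j]] = (pos + end) / 2 + 1
        pos = end + 1
    return ranks

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def spearman(a, b):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))

def rank_shift(a, b):
    """Absolute change in rank position of each unit between two scenarios (assumed form)."""
    return [abs(ra - rb) for ra, rb in zip(average_ranks(a), average_ranks(b))]

# Hypothetical index values of four census blocks under two scenarios.
index_a = [0.10, 0.40, 0.35, 0.80]
index_b = [0.12, 0.30, 0.45, 0.90]
rho = spearman(index_a, index_b)       # 0.8: two blocks swap rank positions
shifts = rank_shift(index_a, index_b)  # [0.0, 1.0, 1.0, 0.0]
```

A coefficient close to 1 with small shifts indicates that the two scenarios order the census blocks in nearly the same way, even if the raw index values differ.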
To investigate the sensitivity of the different classification schemes and identify the most suitable one, we adopted Akaike's information criterion (AIC) [52]. First, we mapped all the indexes by using all the classification methods to investigate the spatial differences. Then, the AIC was applied to the classification of geospatial data [37] by using Equations (7) and (8).
where $x_i$ is the geospatial data (index value), $m$ is the number of classes, $G_k$ is the $k$th class of data, $X$ is the global sum of the $x_i$ data, and $n_k$ is the number of data in class $G_k$.
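The following Python sketch shows how an AIC score for a classification could be computed from the quantities defined above. The formula is an assumption, since the equations are not reproduced in this text: it takes the maximum log-likelihood as the sum over classes of $x_i \ln\left(\sum_{j \in G_k} x_j / (n_k X)\right)$ and the AIC as $-2$ times that value plus $2(m - 1)$, a form commonly used for classifying geospatial data. The helper functions for equal interval and quantile classes, and the sample values, are likewise hypothetical.

```python
import math

def equal_interval_classes(values, m):
    """Assign each value to one of m equal-width classes (0 .. m-1). Assumes max > min."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / m
    return [min(int((x - lo) / width), m - 1) for x in values]

def quantile_classes(values, m):
    """Assign classes so that each one holds (nearly) the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    classes = [0] * len(values)
    for pos, i in enumerate(order):
        classes[i] = pos * m // len(values)
    return classes

def aic(values, classes, m):
    """AIC of a classification under the assumed form: -2 * MLL + 2 * (m - 1)."""
    total = sum(values)  # X, the global sum of the data
    if total <= 0 or any(x < 0 for x in values):
        # The logarithm is undefined for a zero sum or negative values.
        raise ValueError("values must be nonnegative with a positive sum")
    mll = 0.0
    for k in range(m):
        members = [x for x, c in zip(values, classes) if c == k]
        if not members:
            continue  # empty classes contribute nothing
        share = sum(members) / (len(members) * total)
        mll += sum(x * math.log(share) for x in members if x > 0)
    return -2.0 * mll + 2.0 * (m - 1)

# Hypothetical index values for six census blocks, grouped into three classes.
index_values = [0.10, 0.12, 0.15, 0.40, 0.45, 0.80]
classes_equal = equal_interval_classes(index_values, 3)  # [0, 0, 0, 1, 1, 2]
classes_quantile = quantile_classes(index_values, 3)     # [0, 0, 1, 1, 2, 2]
aic_equal = aic(index_values, classes_equal, 3)
aic_quantile = aic(index_values, classes_quantile, 3)
```

Under this form, lower values indicate a classification that loses less information. The guard in `aic` also illustrates why such a score cannot be evaluated for data that sum to zero or contain negative values, as is the case for z-scores.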

Results
Based on four normalization and two aggregation methods, six flood vulnerability indexes or scenarios (N1AL, N2AL, N3AL, N4AL, N2AG, and N4AG) were generated (Figure 3). Two normalization methods could not be used with geometric aggregation: min-max and distance to target. The former transformed the minimum value to zero, resulting in a final vulnerability of zero. Similarly, with the distance to target method, it was only possible to aggregate indicators with nonzero values. This indicates that the geometric aggregation operator cannot aggregate such types of information effectively [53]. In this case, it was necessary to exclude seven indicators that presented values equal to zero in some census blocks.
The correlations between the flood vulnerability indexes (Figure 3) show that the majority of indexes are highly correlated with each other, with values ranging from 0.818 to 0.998. However, the lowest correlations were found between N2AG and all the other indexes, with Spearman correlation coefficients near zero. This occurs because z-scores normalization maps some values close to zero. Hence, when these indicators are aggregated using the geometric method, the high scores result in values near zero (low vulnerability). In addition, the correlation between N2AG and N4AG is negative, which indicates that the flood vulnerability values vary in opposite directions.
All scenarios with linear aggregation have linear and positive correlations (Figure 4a-f). This is not the case for all correlations between the indexes based on geometric aggregation and the others (Figure 4g-o). Although N4AG (Figure 4m) has a high correlation with all the indexes that used linear aggregation, the relationship is not linear.
The robustness of the results when considering the normalization and aggregation steps was also tested by computing the shift in rank for each census block (CB) (Figure 5). Overall, the vulnerability outcomes for each CB obtained by the different normalization and aggregation methods confirmed that flood vulnerability is largely independent of the choice of normalization scheme. On the other hand, flood vulnerability tends to be sensitive to the aggregation scheme.
CB1 and CB7 had, overall, the highest vulnerability scores (i.e., they were usually ranked with high vulnerability by all scenarios), with a relatively low sensitivity. This is mainly because these are the most urbanized areas, and hence, after normalization, they had the highest vulnerability scores, given their higher population density. CB11 was the census block with the highest variability, whereas CB12, CB19, CB13, CB3, and CB4 were the least sensitive ones. In Figure 5, the highest sensitivities are observed in all census blocks for the N2AG index. Similarly, N4AG presents relative differences in most of the census blocks in comparison to the other indexes. Indeed, the geometric aggregation method introduces the greatest sensitivity into the flood vulnerability outputs. Meanwhile, all normalization techniques with linear aggregation stayed almost in the same rank position, except when considering the ranking normalization (N4AL) for CB4.

Figure 5.
Ranking by different normalization and aggregation methods. The census blocks (CB) are organized according to their rank score values. High ranks indicate higher vulnerability, whereas lower values represent CBs ranked with low vulnerability. Outliers are denoted by circles and extremes by asterisks. The CB spatial location is shown in Figure 1. N1 = min-max normalization, N2 = z-scores normalization, N3 = distance to target normalization, N4 = ranking normalization, AL = linear aggregation, and AG = geometric aggregation.

To understand how these methods affect the spatial behavior of flood vulnerability, maps with the class switches were generated by using four classification methods. These maps were classified into "very low", "low", "medium", "high", and "very high" flood vulnerabilities [3,54,55] based on the natural breaks and equal interval classification methods (Figure 6) and the quantile and standard deviation classification methods (Figure 7).
When considering the linear aggregation (AL), the spatial distributions of the vulnerability obtained by the min-max (N1), z-scores (N2), and distance to target (N3) normalization methods are identical for all the census blocks for the natural breaks (Figure 6a-c), quantile (Figure 7a-c), and standard deviation (Figure 7g-i) classification methods. The only exceptions occur with the equal interval approach (Figure 6a,g,h) and with the ranking normalization (N4).
The flood vulnerability classes generated by the geometric aggregation (AG) method are, in some cases, drastically different when compared with linear aggregation for all classification methods. Indeed, some census blocks change from the "very low" (Figure 6a-c) to "very high" (Figure 6e) vulnerability class. Such a high sensitivity from using different aggregation methods can result in inaccurate outputs. The major differences were observed in the equal interval (Figure 6k,l) and standard deviation (Figure 7k,l) classification methods, where most census blocks were classified with a "very low" and "low" vulnerability, according to the geometric aggregation method. This tends to underestimate the real vulnerability.
When fixing the index and focusing on the classes generated by different classification methods, significant differences can be found. For instance, for the N1AL index, one census block changed from the "high" vulnerability class under the natural breaks method (Figure 6a) to the "medium" class under the equal interval method (Figure 6g). In comparison, in the same index, four census blocks changed their classes between the natural breaks (Figure 6a) and quantile (Figure 7a) classification methods. These differences took place for all the indexes in all classification methods. Indeed, the index classification methods introduced great uncertainty.
Figure 8 shows the percentage of census blocks assigned to each class for each index and classification method. The percentages in each class were similar, except for the standard deviation method, where the number of census blocks in each class changed significantly for all the normalization and aggregation methods. For example, for the N1AL and N2AL indexes, 16.67% of the census blocks were classified as having "very high" vulnerability, which strongly disagrees with the standard deviation classification method, which assigned 5.56% to this class.
Other percentage differences arose, within the same classification method, for the N4AL index and for the indexes based on geometric aggregation when compared with the N1AL, N2AL, and N3AL indexes. It is important to note that all indexes have the same quantities for all classes in the quantile classification method. Although the percentages can be the same, there are spatial changes. For example, in the N1AL index, 22% of the census blocks were classified as having "medium" vulnerability by both the natural breaks and quantile classification methods; however, census blocks classified as having "medium" vulnerability using natural breaks were classified as having "low" vulnerability using the quantile method.
Finally, the performances of the classification methods were analyzed by the AIC estimations (Table 2), where the lowest values indicate the best performances. These estimations were not performed for the indexes based on z-scores normalization with linear (N2AL) and geometric aggregation (N2AG), because the sum of all the indicators was zero, and some indicators had negative values. Overall, the natural breaks method provides the lowest AIC for the N1AL and N3AL indexes, whereas the equal interval method yields the lowest AIC for N4AL and N4AG. As shown in Table 2, the geometric aggregation method (N4AG) also resulted in higher variances when compared to the other indexes that used linear aggregation.


Discussion
We verified the sensitivity of the different normalization and aggregation methods used to construct a flood vulnerability index. Additionally, we investigated how different index classification methods could modify the flood vulnerability results. By analyzing the outputs, we could derive the following general summary: (i) most of the indexes have a high positive and linear correlation with each other, except for indexes generated with geometric aggregation, (ii) a high sensitivity of flood vulnerability arises from indexes created by geometric aggregation when compared with linear aggregation, (iii) the results are not sensitive to the different normalization methods, (iv) the flood vulnerability classes vary significantly for the indexes based on geometric aggregation for most classification methods, except quantile, (v) significant spatial changes of the flood vulnerability class occur with equal intervals for indexes whose outputs do not change under the other classification methods, and (vi) according to the AIC, the natural breaks and equal intervals have the best performances among the investigated classification methods.
No significant differences were observed among the min-max, z-scores, and distance to target normalization methods with linear aggregation. The low sensitivity to the normalization method under linear aggregation was also demonstrated by the shift in rank (Figure 5). Even though ranking normalization with linear aggregation yielded high Spearman's correlation coefficients (0.92 to 0.93), there were small spatial changes in the flood vulnerability class in comparison with the other normalization methods with linear aggregation. High Spearman's correlation coefficients (>0.98) for the min-max and z-scores were also found in the vulnerability indexes elaborated by other authors [12,34]. However, for other application areas (e.g., the agricultural sustainability index), significant differences were found according to the different normalization methods [39].
On the other hand, when the flood vulnerability indexes were created by geometric aggregation, the outputs were very distinct. This is because indicators with very low values offset indicators with high values, as geometric aggregation offers lower compensability than linear aggregation [19]. This can be observed in Figures 6 and 7, where most parts of the Maquiné Basin were classified as having "low" or "very low" flood vulnerability. In addition, this method precludes the use of indicators with zero scores or of normalization techniques that result in zero scores, such as the min-max. Since studies of vulnerability normally include social, economic, cultural, environmental, and other dimensions, these indicators are mutually preferentially independent. In these cases, linear aggregation is preferred [56].
In addition to the uncertainty regarding the aggregation method, we identified high uncertainties in the index classification. High spatial sensitivities were observed in all the classification methods. For the equal interval method, unlike the other methods, changes occurred among the N1AL, N2AL, and N3AL indexes. Since it divides the score data into equal-sized classes, its performance might not be optimal for different types of normalization and aggregation methods whose score distributions are different. According to the AIC, the best performances were achieved by the natural breaks and equal interval methods.
Based on the disadvantages of the equal interval method mentioned above and on the AIC, the natural breaks method performed better than the others. Since the quantile method divides the data by the number of elements in each class, its performance might not be optimal for different distributions, and the standard deviation method does not show real scores, only how far these are from the average. On the other hand, the natural breaks method could show the variance of each class from the average and determine the location of class breaks based on the numerical scores of the features. This method also generated the most suitable classification maps in other index studies, such as landslide susceptibility [36] and intelligent compaction data [37].
The spatial analysis of the elaborated indexes showed that some regions tend to be overly sensitive to model changes, whereas others present robust outcomes (e.g., CB12, CB19, CB13, CB3, and CB4 in Figure 5). Furthermore, the vulnerability classification into different classes also contributed to the spatial variation. This information can be used by end users to conduct further studies aiming to investigate the role of the different criteria in shaping the vulnerability outcomes.
Notwithstanding the advances of our study, its limitations should also be pointed out. Although the investigated methods are those most commonly used in flood vulnerability studies, there are others, such as the categorical scale, binary standardization, division by total, and fuzzy normalization methods, as well as non-compensatory aggregation techniques. Therefore, future studies should focus on understanding the uncertainty underpinning these methods. Likewise, the choice of the criteria and the variation of their weights also generate significant uncertainty [25,42] and should be the subject of future research. Here, we decided to use equal weights because of difficulties in finding an acceptable weighting scheme. Nevertheless, when information on the indicators' importance is available, weighted indexes are recommended.

Conclusions
The present study investigated the effects of the use of different normalization, aggregation, and classification methods in order to construct a flood vulnerability index. The sensitivity analysis results provided information on the regions with high sensitivity, as well as the techniques that increased this variability. Overall, we concluded that:
• Normalization techniques such as the min-max, z-scores, and distance to target do not make significant changes in the flood vulnerability outputs. Among the normalization methods, the ranking method was most sensitive.
• The choice of aggregation method strongly affects the vulnerability outcomes. For our case study, the geometric aggregation method was more sensitive when compared with linear aggregation, as it offered inferior compensability for the indexes, with lower scores.
• For each classification method (natural breaks, equal interval, quantile, and standard deviation), there were changes in the same census block with respect to over- and underestimating the flood vulnerability. However, the natural breaks method had the best performance, according to the AIC values.
The present study contributes to addressing the importance of measuring the sensitivity of different steps while building vulnerability indexes. Based on our results, efforts can be taken to reduce the uncertainty. Focus should be given to census blocks classified with high and very high vulnerability and high sensitivity. These blocks are potentially vulnerable but need to be further examined due to their degree of uncertainty. Hence, the outcomes can support end users in reducing uncertainty and improving decision-making. The proposed approach can be transferred to other case studies, providing insights regarding the sensitivity of the flood vulnerability indexes.

Data Availability Statement:
No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest:
The authors declare no conflict of interest.