Developing Composite Indicators for Agricultural Sustainability Assessment: Effect of Normalization and Aggregation Techniques

The assessment of the sustainability of agricultural systems is multidimensional in nature and requires holistic measures using indicators with different measurements and units reflecting social, economic, and environmental aspects. To simplify the assessment process, indicators with different units and measurements are grouped under broad indicator heads, and normalization and/or transformation processes are carried out in order to aggregate them. In this study, a total of 50 indicators from the agricultural sustainability categories of productivity, stability, efficiency, durability, compatibility, and equity are employed to investigate which normalization technique is the most suitable for further mathematical analysis for developing a final composite indicator. These indicators are used to examine the consistency and quality of the normalization techniques and to compare the benefits and drawbacks of the selected normalization processes. Each of the different techniques for normalization has advantages and drawbacks. This study shows that proportionate normalization combined with hybrid aggregation rules of the arithmetic mean and the geometric mean is appropriate for the selected data set, and that this technique has a wider applicability for developing composite indicators for agricultural sustainability assessment.


Introduction
Assessing agricultural sustainability holistically requires multidimensionality [1], and therefore a set of appropriate social, economic, and ecological indicators is needed [2,3]. Multiple criteria methods [4,5] and different kinds of indicators have been designed to evaluate agricultural sustainability [3,6,7]. A multidimensional set of indicators of agricultural sustainability can be interpreted using different statistical scales such as ordinal and nominal, which may be difficult to handle. To avoid this problem, a multidimensional set of indicators can be aggregated into a composite indicator [8]. Andreoli and Tellarini [9], Pirazzoli and Castellini [10], Rigby et al. [11], and van Calker et al. [12] have utilized this approach for agricultural sustainability assessment. A number of steps must be followed to develop a composite indicator. According to the Organization for Economic Co-operation and Development [13], there are 10 steps for constructing ideal composite indicators: theoretical framework, data selection, imputation of missing data, multivariate analysis, normalization, weighting and aggregation, uncertainty and sensitivity analysis, back to the data, links to other indicators, and visualization of the results. These 10 steps are well documented ([13], pp. 20-22; see Table A1 in Appendix A); the normalization step is particularly important for constructing composite indicators.
In the case of agricultural sustainability assessment, the indicators are rarely in the same measurement units irrespective of scholars and organizations [2,3], so normalization for developing composite indicators requires special care. Applying the methodological process of constructing composite indicators in the coastal agricultural systems of Bangladesh as a case study, this paper aims to show the differences between normalization and aggregation techniques and their impacts on the resulting rankings of the composite indicators of the sustainability of agricultural systems. Mathematical experiments are carried out to test whether the ranking of the composite indicators is greatly influenced by the choice of a normalization technique. These tests are important in a situation in which there are no reference values or goalpost values for developing composite indicators to assess the sustainability of agricultural systems. This paper highlights the advantages and disadvantages and compares the results obtained when applying these tests to a dataset containing selected agricultural sustainability indicators for coastal Bangladesh. A point to be noted here is that composite indicators for agricultural sustainability assessment are still the subject of policy/strategic discussion or documents in Bangladesh. There have been a few studies [14-16] that measured aspects of the sustainability of agricultural systems in Bangladesh in a very isolated way, but the measures themselves have rarely been compared. Although a few studies in Bangladesh have used indicators for assessing agricultural sustainability, this is the first one in Bangladesh that checks the effects of normalization and aggregation techniques on developing composite indicators.

Brief Overview of Composite Indicators
The concept of composite indicators was introduced in the 1990s to capture the complexity and multidimensionality of a range of development issues [17]. Since then, international organizations like the United Nations, World Bank, and European Commission have developed composite indicators [18] such as the Human Development Index (HDI), Environmental Performance Index (EPI), Gender Empowerment Measure (GEM), and Quality of Life Index. In the literature, the term "composite indicator" often refers to an index made up of aggregated data, ratings, league tables, and multidimensional measures [19-21]. Bandura and Martin del Campo (as cited in [18]) found 160 composite indicators used around the world.
Although composite indicators are being used extensively, there is a spirited debate over the conceptual and methodological parameters for this measurement technique [22]. For example, Sharpe [23] argued that producing a composite indicator/index is not a good idea because a single indicator is not appropriate to explain and compare any observed phenomenon and does not capture the relative importance of the components of the composite indicators [20]. In spite of this limitation, composite indicators are considered to be desirable among policy makers and stakeholders due to their capacity to summarize complex issues [24], allow for cross comparisons, enable evaluation of results, set the bar for performance, and indicate the steps of accomplishment of a project [25]. They are also useful for generating media interest about a phenomenon [23]. Comprehensive discussions of the advantages and disadvantages of composite indicators are documented in Booysen [26], Foa and Tanner [18], and Nardo et al. [20].
Conceptually, composite indicators are based on sub-indicators that may have no common meaningful unit of measurement [20]. Technically, composite indicators are mathematical combinations of a set of multidimensional indicators [20,24] and normal measures that combine the issues of a complex phenomenon [26]. Therefore, the construction of composite indicators requires transparency as to its process to facilitate replication and debate among stakeholders [24]. The construction of composite indicators requires more craftsmanship by the modeler than universally accepted scientific rules for encoding indicators [20]. Basically, a typical composite indicator "I" is built as follows [13]:

$$I = \sum_{i=1}^{n} w_i x_i$$

where $x_i$ = normalized variable, $w_i$ = weight attached to $x_i$, $\sum_{i=1}^{n} w_i = 1$, and $0 \le w_i \le 1$, $i = 1, 2, \ldots, n$.
From this formula, it is clear that a composite indicator requires a weighted linear aggregation rule that is applied to a set of variables. The formula indicates that normalization and weighted summation of the normalized variables are the two main steps for developing composite indicators.
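As a minimal sketch of this weighted linear rule, the following Python fragment computes a composite score from already-normalized values. The function name and the three example values are hypothetical and are not taken from the study's dataset.

```python
def composite_indicator(normalized, weights):
    """Weighted linear aggregation: I = sum_i w_i * x_i."""
    if len(normalized) != len(weights):
        raise ValueError("one weight per normalized variable is required")
    if any(w < 0 or w > 1 for w in weights):
        raise ValueError("each weight must lie in [0, 1]")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * x for w, x in zip(weights, normalized))


# Hypothetical normalized values for three sub-indicators, equally weighted.
x = [0.2, 0.5, 0.8]
w = [1 / 3, 1 / 3, 1 / 3]
score = composite_indicator(x, w)
```

The checks on the weights mirror the two constraints stated with the formula: every weight lies between 0 and 1, and the weights sum to 1.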
Data can be aggregated without being scaled if all the variables are measured with the same unit (e.g., percent or ratios), but in many situations the variables to be aggregated have different units and different measurement techniques [27] such as nominal, ordinal, interval, and ratio scales. In this situation, normalization is the process by which the indicators in various scales and units are compared on a common basis, as depicted in Figure 1. Normalization is, therefore, the process of reducing the measurements to a standard scale [28], which helps to avoid the dominance of extreme values in a data set and partially corrects data quality problems [29]. Normalization of indicators is required to make the indicators mathematically operational in aggregation [8].
Every step of data transformation and/or normalization increases the probability of uncertainty and measurement error [30]. Accordingly, the choice of the proper normalization technique is indisputably important. In developing composite indicators, the selection of a preferred normalization technique deserves special care, taking into account the objectives of the composite indicators as well as the data properties and the potential requirement of further analysis [20,31]. Different normalization techniques produce different results [13] and may have major effects on composite scores [22,32].

Materials and Methods
For this paper, five normalization techniques are examined to investigate their effect and to identify the preferred technique for constructing composite indicators of coastal agricultural sustainability assessment in Bangladesh. Figure 2 shows the construction and evaluation process of the individual composite indicators that are examined in this paper. As shown in this figure, sustainability was categorized in terms of productivity, stability, efficiency, durability, compatibility, and equity. In brief, productivity is related to the yield of agricultural systems, stability refers to the ability to maintain a good level of productivity over an extended period of time, and efficiency is the measure of the extent to which the inputs for agricultural production enhance the crop yield. Durability can be defined as the ability of the agricultural system to resist or recover from stress and thereby maintain a good level of productivity over a cropping cycle. Compatibility refers to the ability of an agricultural system to fit in with the bio-geophysical, human, and socio-cultural surroundings in which the system is placed, and equity reflects a good quality of life for farmers and their family members [7]. For details about these categories, see vanLoon et al. [7]. For conceptual judging and selection of indicators, the dataset of Talukder [33] was used. Talukder [33] developed 110 indicators for assessing the sustainability of the coastal agricultural systems of Bangladesh. From 110 indicators of six sustainability categories, 50 indicators (Tables A2-A7 in Appendix B) were judged according to their importance and then grouped to make 15 composite indicators, as depicted in Figure 2. Then, various normalization, weighting, and aggregation techniques were applied to identify suitable normalization and aggregation techniques for developing the final set of composite indicators.

Overview of Datasets
The datasets in Tables A2-A7 in Appendix B contain different measurement units under six categories of sustainability. The data were collected from both primary and secondary sources in the southwest coastal zone of Bangladesh. The primary data were collected from five different agricultural systems: Bagda (shrimp)-based agricultural systems (S) from Shyamnagar; Bagda-rice-based agricultural systems (SR) from Kalijang; rice-based agricultural systems (R) from Kalaroa; Galda-rice-vegetable-based integrated agricultural systems (I) from Dumuria; and traditional practices-based agricultural systems (T) from Bhola Sadar (Figure 3). The details of the data collection process, justification of data collection, and development of indicators can be found in Talukder [33]. The description of the indicators, their units, data type, their relationships with sustainability pillars, data collection areas, data sources, and levels of measurement are presented in brief in Tables A2-A7 in Appendix B. A point to be noted here is that to develop the indicators, the collected data were processed through cleaning, integration, reduction, and transformation. No outliers were detected in the collected data during these processes.

Normalization
A variety of transformation and/or normalization techniques are available (e.g., [29,34-36]), but only the five most widely employed techniques [35,37] are shown in Table 1. These five techniques are ranking, distance to target, Z-score, min-max, and proportionate normalization. The first four are the most commonly used normalization techniques [13,21]. The proportionate normalization technique was considered because of its suitability for the development of composite indicators.

Table 1.

| Name | Formula | Explanation |
| --- | --- | --- |
| Ranking [35] | $N_{ias} = \mathrm{Rank}(X_{ias})$ | $N_{ias}$ = normalized value of indicator $i$ for agricultural system $as$; $X_{ias}$ = variable $X$ for indicator $i$ for agricultural system $as$ |
| Distance to target [35] | $N_{ias} = \dfrac{X_{ias}}{\mathrm{Target}\ X_{ias}}$ | $N_{ias}$ = normalized value of indicator $i$ for agricultural system $as$; $X_{ias}$ = variable $X$ for indicator $i$ for agricultural system $as$ |
| Z-score (standardization) [35] | $N_{ias} = \dfrac{X_{ias} - \mu}{\sigma}$ | $N_{ias}$ = normalized value of indicator $i$ for agricultural system $as$; $X_{ias}$ = variable $X$ for indicator $i$ for agricultural system $as$; $\mu$ = mean and $\sigma$ = standard deviation of the indicator |
| Min-max | $N_{ias} = \dfrac{X_{ias} - \min_{as}(X_{ias})}{\max_{as}(X_{ias}) - \min_{as}(X_{ias})}$ | $N_{ias}$ = normalized value of indicator $i$ for agricultural system $as$; $X_i$ = indicator; $X_{ias}$ = variable $X$ for indicator $i$ for agricultural system $as$; $\max_{as}$ and $\min_{as}$ are the largest and smallest observed values |
| Proportionate [36] | $N_{ias} = \dfrac{I_i}{\sum_i I_i}$, $0 < N_{ias} < 1$ | $I_i$ = indicator value; $\sum_i I_i$ = sum of the indicators |

Ranking Normalization
Ranking normalization replaces measurements with their rank. In the rank normalization process, each data point is replaced by its rank, that is, by values ranging from 1 (lowest) to N (highest) [38]. In this system, there is no score, only a rank; the absolute-level information is lost. This technique, while simple, cannot lead to any conclusion about the differences among performances of the indicator being assessed because there is no measure of the distance between values of the indicators [39]. Ranking normalization is employed in the "Information and Communications Technology Index" [29] and "Medicare Study on Healthcare Performance across the United States" [40].
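A short Python sketch of ranking normalization is given here for illustration. The handling of ties (tied values share the average of the ranks they span) is an assumption, since the text does not specify it, and the input values are hypothetical.

```python
def rank_normalize(values):
    """Replace each value by its rank: 1 = lowest, N = highest.

    Assumption: tied values receive the average of the ranks they span.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    return ranks


# Hypothetical indicator values for five agricultural systems.
values = [2.26, 6.4, 4.1, 4.1, 3.0]
ranks = rank_normalize(values)
```

Only the ordering survives: the gap between 2.26 and 3.0 is treated the same as the gap between 4.1 and 6.4, which is the loss of absolute-level information described above.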

Distance to Target Normalization
In the distance to target normalization technique, the indicator's value is divided by the target value to normalize the indicator [21] so that the normalized values represent a fraction of the highest value. The highest value of the indicator set or any reference point can be the target value. The results of this technique are easy to handle and understand, but imbalance between scores and rankings remains, and the normalization results are more influenced by outliers than in other techniques. This method is useful for further analysis (e.g., geometric aggregation) since it does not generate any zero values. However, if outliers are chosen as target points, the result can be misleading. The distance to target normalization technique is used in "Eco-indicator 99" and the "Summary Innovation Index" [41].
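The following Python sketch illustrates distance to target normalization. Defaulting the target to the highest observed value follows the description above; the function name and the sample values are hypothetical.

```python
def distance_to_target(values, target=None):
    """Divide each value by a target; by default the highest observed value."""
    if target is None:
        target = max(values)
    if target == 0:
        raise ValueError("target must be non-zero")
    return [v / target for v in values]


# Hypothetical indicator values for four agricultural systems.
values = [2.0, 4.0, 5.0, 8.0]
normalized = distance_to_target(values)
```

If the target were an outlier, say 80 instead of 8, every other normalized value would shrink toward zero, which is the sensitivity noted in the text.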

Z-Score Normalization
Z-score normalization is calculated by subtracting the mean from an indicator value and then dividing by its standard deviation. If the standard deviation is calculated for a set of variables with a mean of 0 and then all values are divided by the standard deviation, the resulting set of values will have a standard deviation of 1 [27]. After performing normalization, the data have a common scale with a 0 mean and a standard deviation of 1. Since all Z-score distributions have the same mean and standard deviation, individual scores from different distributions can be directly compared. The advantage of this technique is that it provides no distortion from the mean, adjusting for different scales and variance. The output is dimensionless, and the relative differences are maintained due to the application of a linear transformation [42]. Z-score is preferred when extreme values exist in the dataset [20,32]. Although the technique does not fully adjust for outliers, the minimum and maximum values are not as influential as in other techniques such as distance to target. When extreme values are present in the original data, Z-score normalization takes these extreme values into account in a manner that does not distort their impacts on a composite indicator. In this way, an outlier, such as exceptional performance, is recognized and not ignored [13,27]. The Z-score technique is widely employed, including in the knowledge-based economy index [43] and the World Health Organization's child growth standards index [44].
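A minimal Python sketch of Z-score normalization follows. The use of the population standard deviation is an assumption (the text does not say whether the sample or population form was used), and the input values are hypothetical.

```python
from statistics import mean, pstdev


def z_score(values):
    """Subtract the mean and divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("all values are identical; Z-score is undefined")
    return [(v - mu) / sigma for v in values]


# Hypothetical indicator values (mean 5, population standard deviation 2).
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = z_score(values)
```

Values below the mean come out negative, so Z-scores cannot be passed directly to a geometric aggregation rule.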

Min-Max Normalization
The min-max technique rescales data into different intervals based on minimum and maximum values. The advantage of this method is that boundaries can be set and all indicators have an identical range (0, 1). However, the normalized values do not maintain proportionality, and normalized values reflect the percentage of the range of $\max_{as}(X_{ias}) - \min_{as}(X_{ias})$. This technique is based on extreme values (minimum and maximum), but because these two values can be outliers, the range of max and min strongly influences the final output. Another disadvantage is that the difference in variance is not fully eliminated [13]. Nevertheless, this technique is very popular and has been applied in the construction of many composite indicators, the best-known of which is the Human Development Index (HDI, [45]).
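For illustration, a small Python sketch of min-max normalization using the observed minimum and maximum as the bounds. The sample values are hypothetical.

```python
def min_max(values):
    """Rescale values to the 0-1 range using the observed minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; the range is zero")
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical indicator values for four agricultural systems.
values = [2.0, 4.0, 5.0, 10.0]
scaled = min_max(values)
```

The lowest observation always maps to 0, and the ratio between the original values 4 and 2 (a factor of two) is not preserved in the scaled values 0.25 and 0, which illustrates the loss of proportionality.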

Proportionate Normalization
In proportionate normalization, the single attribute value is divided by the sum total of the values of attributes [37,46]. The normalized values maintain proportionality such that they reflect the percentage of the sum of the total value of the indicators. Here, values of the indicator are relatively normalized. Normalizing the indicators by dividing them by their sums has a number of attractive properties, including that the normalized values are identical to the original, except for a scaling factor, and the process is easily understandable. The value differences among indicators become narrow. Dividing by the sum ensures that even the smallest value greater than zero comes out with a positive normalized value [19,37]. The proportionate normalization technique is frequently used in normalizing census data in ArcView GIS (Geographical Information System, [46]). Benini [19] also suggested using this technique for developing composite measures for disaster impact assessment.
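A minimal Python sketch of proportionate normalization is shown here. It assumes that the sum is taken over the values one indicator takes across the agricultural systems being compared; the sample values are hypothetical.

```python
def proportionate(values):
    """Divide each value by the sum of all values (assumed: one indicator
    observed across the systems being compared)."""
    total = sum(values)
    if total == 0:
        raise ValueError("sum of values is zero; shares are undefined")
    return [v / total for v in values]


# Hypothetical indicator values for four agricultural systems.
values = [1.0, 2.0, 3.0, 4.0]
shares = proportionate(values)
```

Because every value is divided by the same total, ratios between systems are unchanged (the second system still scores twice the first), and any value greater than zero stays positive.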

Weighting
The final score and ranking of the composite indicators depends on the weighting of the normalized values of the indicators. Weighting reflects the importance of each indicator relative to the overall composite indicators [21]. Weights should ideally be selected according to an underlying and agreed-upon, or at least clearly stated, theoretical framework so that the process is transparent [47,48]. Weighting can be a very important step in creating composite indicators before aggregation can take place, because it modifies the sub-indicator values. However, Sajeva et al. [28] have shown that the use of different weighting schemes can often have no significant effect on the ranking of the composite indicators. No agreed-upon methodology exists to weight individual indicators. Different types of weighting techniques and their explanations are provided by Nardo et al. [20].
In this paper, equal weighting of sub-indicators is used for all rank, distance to target, Z-score, min-max and proportionate normalization, and arithmetic mean and geometric mean aggregation. Simplicity is the main advantage of equal weighting, but the composite indicator that is developed by the combination of more indicators will have a stronger influence on the list of composite indicators. Using this weighting system may be justified when no other available means of weighting are known [30]. Equal weighting is used in the HDI [49]. Budget allocation techniques for weighting are used for MCA aggregation (as shown in Table A13). A budget allocation technique for weighting is chosen because the sustainability of agriculture is very contextual, so stakeholders' opinions are very important for weighting of the indicators. Geometric and multi-criteria, as well as linear, aggregation can be employed with these weightings [13]. The OECD's Handbook on Constructing Composite Indices [13] describes expert weighting as a budget allocation technique. In expert weighting, an expert allocates 100 points among indicators according to their importance [21]. Selection of the appropriate expert and number of experts is the biggest problem for this system because point allocation may be influenced by the expert's experience [30]. This subjective judgment of the weights of sub-indicators is used to allocate relative worth for each sub-indicator [22]. Subjective weighting is often affected by strong inter-individual disagreement [29] and is particularly sensitive in the case of complex, interrelated, and multidimensional phenomena [20]. Nevertheless, Sen and Foster ([48], p. 206) pointed out that "while the possibility of arriving at a unique set of weights is rather unlikely, that uniqueness is not really necessary to make acceptable judgments in many situations, and may indeed not even be required for a complete ordering".
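The two weighting schemes can be sketched in Python as follows. Averaging the point allocations of several experts is an assumption made for illustration, as the text does not describe how multiple allocations were combined; the point values are hypothetical.

```python
def equal_weights(n):
    """Equal weighting: each of n sub-indicators receives weight 1/n."""
    return [1 / n] * n


def budget_allocation_weights(expert_points):
    """Turn 100-point allocations into weights that sum to 1.

    Assumption: allocations from several experts are simply averaged.
    """
    n = len(expert_points[0])
    for points in expert_points:
        if len(points) != n or abs(sum(points) - 100) > 1e-9:
            raise ValueError("each expert must allocate exactly 100 points")
    k = len(expert_points)
    return [sum(p[i] for p in expert_points) / (100 * k) for i in range(n)]


# Two hypothetical experts allocating 100 points across three indicators.
experts = [[50, 30, 20], [40, 40, 20]]
weights = budget_allocation_weights(experts)
```

With these hypothetical allocations the first indicator receives a weight of 0.45, compared with 1/3 under equal weighting, which shows how expert judgment shifts relative importance.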

Aggregation
The rules for aggregation are well documented in the Handbook on Constructing Composite Indices [13], but steps are still debated in the development of composite indicators [50]. The fundamental issue in aggregation is the compensability of indicators, which is defined as compensating for any indicator's dimension with a suitable surplus in another indicator's dimension. The rules for aggregating composite indicators can be compensatory or non-compensatory [51]. A compensatory technique deals with the imbalances in the indicators and uses linear functions, whereas non-compensatory techniques use unbalance-adjusted functions [52]. Different aggregation rules are possible to develop composite indicators. Commonly applied aggregation options include additive aggregation (arithmetic mean), geometric aggregation (multiplication), and multi-criteria analysis [13].
The arithmetic mean is a linear function [53]. The normalized and weighted or unweighted indicators are summed to compute the arithmetic mean (the formula for evaluating the arithmetic mean is $\frac{1}{n}\sum_{i=1}^{n} x_i$) [26,32]. In this method, compensability can be a disadvantage if a low value in one indicator or dimension masks a high value in another, that is, a deficit in one indicator or dimension can be compensated for by a surplus in another [30,32]. Geometric aggregation, which is the product of normalized weighted indicators, is used to avoid concerns related to interaction and compensability [32]. Non-comparable data measured in a ratio scale can only be meaningfully aggregated by using geometric functions, provided that indicators are strictly positive [20,30]. A geometric mean (the formula for evaluating the geometric mean is $\left(\prod_{i=1}^{n} x_i\right)^{1/n}$) takes into consideration differences in achievement across dimensions [20]. Poor performance in any dimension or indicator is directly reflected in the composite indicator's value. According to Hudrlikova and Kramulova [30], this technique is partly compensable since it rewards composite indicators with higher indicator scores.
"When different goals are equally legitimate and important, and in addition trade-offs exist between the dimensions of a composite indicator (namely negative correlations between dimensions) then a non-compensatory logic may be necessary" ([21], p. 256). Multi-Criteria Analysis (MCA) is used for aggregating non-compensatory data [52]. In general, MCA provides an overall ranking based on the weight and values of given indicators. One of the shortcomings of MCA is that when the number of indicators to develop composite indicators is high, it is difficult to compute MCA [30]. MCA is based on an outranking matrix. The standard procedure for performing an MCA consists of three steps: identifying the weighting of the criteria, preparing an "outranking matrix" by pairwise comparison of the weighted performance of each criterion (for n options, there are n(n − 1)/2 comparisons) [54], and calculating the composite indicator score of the criteria by adding the values of the row of the outranking matrix [55].

Robustness
The outcome of the composite indicators depends on the selection of variables, normalization, weighting (if it is used), and aggregation techniques [20], so it is necessary to examine the robustness of the developed composite indicators. Various statistical tests can help ensure that the composite is reliable. Freudenberg [29] and Hudrlikova and Kramulova [30] mentioned correlation as a technique to assess the impacts of different normalization techniques on composite indicators. The correlation coefficient can show whether the results of the composite indicator are heavily influenced by the choice of normalization rules [30] and aggregation methods. In this paper, correlation is used to assess the robustness of composite indicators.
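A correlation-based robustness check of this kind can be sketched in Python as below. The text does not state which correlation coefficient was used, so the Spearman rank correlation here is an assumption, and the two score lists are hypothetical composite scores for the five systems under two normalization techniques.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Ranks from 1 (lowest) to N (highest); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical composite scores for S, SR, R, I, T under two techniques.
min_max_scores = [0.10, 0.35, 0.60, 0.90, 0.40]
proportionate_scores = [0.12, 0.20, 0.24, 0.28, 0.16]
rho = spearman(min_max_scores, proportionate_scores)
```

A coefficient close to 1 would suggest that the ranking of systems is largely insensitive to the choice between the two techniques; lower values indicate that the normalization step is driving the result.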

Results
The results for the composite indicators using various normalization techniques, weighting, and different aggregation techniques are presented in Tables A8-A13 in Appendix B. The results of the robustness tests of the composite indicators are presented in Tables A14-A27 in Appendix B.
The values of data and different normalization techniques and arithmetic aggregation involve different assumptions that have specific consequences that produce different results for the composite indicators (Tables A8-A13). In this regard, Nardo et al. [20] mentioned that the ranking of composite indicators is heavily influenced by the nature of the data. Saisana and Saltelli [21] also pointed out that it is beyond doubt that composite indicators are a value-laden construct. In arithmetic aggregation, it is also observed that poor performance in some indicators is covered by sufficiently high values of other indicators in composite indicators.
In the dataset for this study, the score for some of the indicators is "0". For example, as shown in Table A6, in the compatibility category "S" scored "0" in drinking water quality. Indicators that have "0" scores have the normalization result "0" in proportionate, distance, and Z-score normalization, but not in ranking normalization, since the score "0" is ranked as the lowest number. The min-max normalization also generates "0" scores as normalized values. Whenever the normalization score is "0" or negative, those indicators are not suitable for geometric mean aggregation because geometric aggregation requires all positive numbers and is therefore only appropriate when indicator values are always positive [20].
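The contrast between the two aggregation rules, and the positivity requirement of the geometric mean, can be illustrated with a short Python sketch. Equal weights are assumed and all values are hypothetical normalized scores.

```python
import math


def arithmetic_mean(values):
    """Equal-weight additive aggregation."""
    return sum(values) / len(values)


def geometric_mean(values):
    """Equal-weight geometric aggregation; requires strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric aggregation requires strictly positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Two hypothetical systems with the same arithmetic mean.
balanced = [0.5, 0.5, 0.5]
unbalanced = [0.9, 0.5, 0.1]

am_balanced = arithmetic_mean(balanced)
am_unbalanced = arithmetic_mean(unbalanced)
gm_balanced = geometric_mean(balanced)
gm_unbalanced = geometric_mean(unbalanced)

# A normalized score of 0 makes the geometric rule unusable.
try:
    geometric_mean([0.4, 0.0, 0.6])
    zero_rejected = False
except ValueError:
    zero_rejected = True
```

Both systems have the same arithmetic mean because the surplus in one indicator fully compensates the deficit in another, whereas the geometric mean penalizes the unbalanced profile. A zero or negative normalized score has to be excluded or handled before geometric aggregation.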
When aggregation was carried out considering indicators' values and budget allocation weight and MCA techniques, the results also generated different values for some of the composite indicators (Table A13) compared to other types of aggregation.Due to the nature of the data, MCA also generates "0" values for productivity, energy efficiency, and human compatibility composite indicators of "S", as well as "0" values for resistance to economic stress, resistance to climate change, and gender composite indicators of "T" (Table A13).Therefore, budget allocation weighting and MCA combinations cannot be recommended for composite indicators.
These differences in the composite indicator values, which result from applying different combinations of normalization, weighting, and aggregation techniques, show that the properties of the indicators are crucial for the final output values of the composite indicators. This study shows that the choice of normalization technique and of arithmetic or geometric mean should take into account the data properties as well as the objectives of the composite indicator. From the results, it appears that not all normalization techniques are suitable for the dataset, and not all normalization techniques support both the arithmetic mean and the geometric mean. Even when MCA techniques are applied, some "0" values are generated for the composite indicators.
Nardo et al. [20] suggested that in the case of non-compensatory composite indicators, MCA is the best way to develop indicator values. However, due to the nature of the present dataset, MCA is not suitable for this experiment because the "0" scores of some of the indicators do not reflect the weight of the indicator, so the results may be difficult to interpret and compare. In MCA, composite indicators are based on weight, so the magnitudes of the values of the different indicators are disregarded in the composite. "This means any issue that does marginally better on many indicators score higher than the issue that does a lot better on a few indicators because outstanding performances of the indicators cannot compensate for the deficiencies in some indicators" ([56], p. 364).
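The quoted property can be illustrated with a minimal Condorcet-style pairwise comparison. This is only a sketch of one non-compensatory rule, not necessarily the MCA procedure applied in this study; the function names, scores, and equal weights are hypothetical.

```python
def outranking_weight(a, b, weights):
    """Sum of the weights of the indicators on which a is strictly better than b."""
    return sum(w for x, y, w in zip(a, b, weights) if x > y)


def prefers(a, b, weights):
    """True if a beats b on a larger share of the total weight than b beats a."""
    return outranking_weight(a, b, weights) > outranking_weight(b, a, weights)


def arithmetic_mean(values):
    return sum(values) / len(values)


# Hypothetical normalized scores on four equally weighted indicators.
weights = [0.25, 0.25, 0.25, 0.25]
system_a = [0.51, 0.51, 0.51, 0.10]  # marginally better on three indicators
system_b = [0.50, 0.50, 0.50, 0.90]  # much better on one indicator

a_outranks_b = prefers(system_a, system_b, weights)
mean_a = arithmetic_mean(system_a)
mean_b = arithmetic_mean(system_b)

print(a_outranks_b)    # pairwise rule: magnitudes of the differences are ignored
print(mean_a, mean_b)  # compensatory rule: the large lead of b dominates
```

Only the direction of each difference and the indicator weights matter in the pairwise rule, so the large advantage of the second system on one indicator cannot compensate for its marginal deficits on the other three.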

Discussion
In this study, it is observed that proportionate normalization produced values that conserve the proportionality of the indicator values (Table A12), whereas the other normalization techniques show different outcomes. For example, the normalization results of rank, distance to target, Z-score, max-min, and proportionate normalization for the weighted yield of rice indicator of "S" in Tables A8-A12 are 1, 0.35, −1, 0, and 0.11, respectively. Here, only the 0.11 generated by proportionate normalization represents the proportionate value of the original score of 2.26 for the weighted yield of rice indicator of "S" (see Table A2). Therefore, proportionate normalization is selected to develop composite indicators in this study, because the relative magnitudes of the original data values do not change through this process. If a normalization or transformation technique changes the relative values of the data, the results are no longer mathematically meaningful. It is therefore always preferable to follow a technique by which the original data are transformed in such a way that their informational content is not fundamentally altered [22]. In proportionate normalization, the rank of the composite indicator depends on actual values, since proportionate normalization does not alter the actual importance of the values of the indicators. This is the strength of this technique [22]. Furthermore, proportionate normalization seems preferable in this experiment to the most popular max-min normalization because there are no goalpost values for any of the 50 indicators.
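What "conserving proportionality" means can be checked numerically. The sketch below assumes that proportionate normalization divides each value by the total of all values and that max-min normalization rescales to the observed range; both definitions and the raw values are assumptions for illustration, not figures from Tables A2-A12.

```python
def proportionate(xs):
    # Assumed definition: share of each value in the total.
    total = sum(xs)
    return [x / total for x in xs]


def max_min(xs):
    # Rescale to [0, 1] using the observed minimum and maximum.
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


# Hypothetical raw scores of four systems on one indicator.
raw = [2.0, 4.0, 6.0, 8.0]

prop = proportionate(raw)
mm = max_min(raw)

# Ratio between the fourth and the second system before and after normalization.
ratio_raw = raw[3] / raw[1]
ratio_prop = prop[3] / prop[1]
ratio_mm = mm[3] / mm[1]

print(ratio_raw, ratio_prop, ratio_mm)
```

In this example the fourth system scores twice as high as the second one both before and after proportionate normalization, while max-min normalization changes that ratio because it subtracts the minimum before rescaling.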
There is clearly no universal best aggregation method because aggregation depends on the requirements of the developer of the composite indicators. In the data there are some "0" values. Therefore, to aggregate the indicators, a hybrid aggregation is suggested: indicator values with the "0" normalization result will be aggregated by arithmetic mean, and the rest will be aggregated by geometric mean. Hybrid aggregation techniques use more than one aggregation function at different levels [54]. For example, the "Multidimensional Poverty Assessment Tool (UNIFAD, 2010 as cited in [54]) used arithmetic average within a subcomponent and geometric average within a component, while the Food and Nutrition Security Index (FAO, 2014 as cited in [57]) used arithmetic averages within dimensions and geometric average across dimensions" ([57], p. 16). When comparing all applied normalization and aggregation techniques, it appeared that, for the present research, proportionate normalization and hybrid aggregation (geometric mean and arithmetic mean) produced the most preferred results. Therefore, 15 single composite indicators are developed from the 50 indicators in Talukder et al. [16] using proportionate normalization and hybrid aggregation. These 15 single composite indicators (see Table 2) are proposed as a set of the most representative variables of agricultural sustainability in the study area. Among these 15 composite indicators, "monetary efficiency" carries the proportionately normalized value of the original data without any aggregation.
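One possible reading of the hybrid rule is sketched below: within each group of normalized indicators, the geometric mean is used when all values are strictly positive, and the arithmetic mean is used when the group contains a "0". This per-group interpretation, the function name, and the sample values are assumptions made for illustration.

```python
from math import prod


def hybrid_aggregate(values):
    """Geometric mean for strictly positive values, arithmetic mean otherwise."""
    if not values:
        raise ValueError("at least one indicator value is required")
    if all(v > 0 for v in values):
        # All values positive: geometric mean is defined.
        return prod(values) ** (1.0 / len(values))
    # A zero (or negative) value is present: fall back to the arithmetic mean.
    return sum(values) / len(values)


# Hypothetical proportionately normalized indicator groups.
group_with_zero = [0.0, 0.2, 0.4]  # contains a "0" -> arithmetic mean
group_positive = [0.2, 0.8]        # all positive -> geometric mean

score_with_zero = hybrid_aggregate(group_with_zero)
score_positive = hybrid_aggregate(group_positive)

print(score_with_zero, score_positive)
```

The fallback keeps a group with a "0" from collapsing to a composite score of 0, at the cost of allowing compensation among the indicators of that group.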

Conclusions
This study tested various normalization and aggregation techniques for developing composite indicators and compared different combinations to identify the best combination of normalization and aggregation. Normalization techniques, weighting, and aggregation all influence the final outcomes of composite indicators, so it is important to compare different combinations of normalization, weighting, and aggregation techniques. Rank, distance to target, Z-score, max-min, and proportionate methods were used for normalization; equal weight and budget allocation were used for weighting; and arithmetic mean, geometric mean, and multi-criteria analysis were used for aggregation. The results show that the normalization technique and the characteristics of the data have a strong influence on composite indicators. For example, the human compatibility composite indicator in the compatibility category has a score of "0" using rank normalization and geometric mean, distance to target normalization and geometric mean, Z-score normalization and arithmetic mean, Z-score normalization and geometric mean, max-min normalization and arithmetic mean, proportionate normalization and geometric mean, proportionate normalization and arithmetic mean, or MCA. A score of 1 results from using rank normalization and arithmetic mean, a score of 0.25 from using distance to target normalization and arithmetic mean, and a score of −1.98 from using Z-score normalization and arithmetic mean.
Both methodological and empirical conclusions can be drawn from this study. From a methodological point of view, it can be said that proportionate normalization and the hybrid aggregation technique are suitable for developing composite indicators from these empirical data, which were collected through a questionnaire and from secondary data and have a score of "0" for several indicators. These techniques allow the aggregation of a multidimensional set of indicators into a unique composite indicator that can facilitate the understanding of a complex concept such as agricultural sustainability. In the case of proportionate normalization, weighting the indicators has no effect. However, these techniques depend on the properties of the indicators, and some subjectivity is associated with the selection of normalization and aggregation rules. Depending on the methodology selected for constructing indicators, the results of the composite indicators can vary and sometimes be misleading. Based on the properties of the dataset, it appears that proportionate normalization is appropriate, and a hybrid of aggregation rules is suitable for developing composite indicators. However, it is the responsibility of the designer of the composite indicator to choose the most appropriate normalization and aggregation techniques. These techniques must have a sound and transparent methodological framework. In this respect, Nardo et al. [20] also stated that the selection of the normalization process deserves special care.

Appendix A
1st: Theoretical framework
To get a clear understanding and definition of the multidimensional phenomenon to be measured.
To structure the various sub-groups of the phenomenon (if needed).
To compile a list of selection criteria for the underlying variables, e.g., input, output, and process.
2nd: Data selection Should be based on the analytical soundness, measurability, country coverage, and relevance of the indicators to the phenomenon being measured and their relationship to each other. The use of proxy variables should be considered when data are scarce (involvement of experts and stakeholders is envisioned in this step).
To check the quality of the available indicators.
To discuss the strengths and weaknesses of each selected indicator.
To create a summary table of data characteristics, e.g., availability (across country, time), source, type (hard, soft or input, output, and process).
3rd: Imputation of missing data Is needed in order to provide a complete dataset (e.g., by means of single or multiple imputation).
To estimate missing values.
To provide a measure of the reliability of each imputed value so as to assess the impact of the imputation on the composite indicator results.
To discuss the presence of outliers in the dataset.
4th: Multivariate analysis Should be used to study the overall structure of the dataset, assess its suitability, and guide subsequent methodological choices (e.g., weighting, aggregation).
To check the underlying structure of the data along the two main dimensions, namely individual indicators and countries (by means of suitable multivariate methods, e.g., principal components analysis, cluster analysis).
To identify groups of indicators or groups of countries that are statistically "similar" and provide an interpretation of the results.
To compare the statistically determined structure of the dataset to the theoretical framework and discuss possible differences.
5th: Normalization
To select suitable normalization procedure(s) that respects both the theoretical framework and the data properties.
To discuss the presence of outliers in the dataset as they may become unintended benchmarks.
To make scale adjustments, if necessary.
To transform highly skewed indicators, if necessary.
6th: Weighting and aggregation Should be done along the lines of the underlying theoretical framework.
To select appropriate weighting and aggregation procedure(s) that respects both the theoretical framework and the data properties.
To discuss whether correlation issues among indicators should be accounted for.
To discuss whether compensability among indicators should be allowed.

7th: Uncertainty and sensitivity analysis Should be undertaken to assess the robustness of the composite indicator in terms of the mechanism for including or excluding an indicator, the normalization scheme, the imputation of missing data, the choice of weights, the aggregation method, and so forth.
To consider a multi-modelling approach to build the composite indicator and alternative conceptual scenarios for the selection of the underlying indicators if available.
To identify all possible sources of uncertainty in the development of the composite indicator and accompany the composite scores and ranks with uncertainty bounds.
To conduct sensitivity analysis of the inference (assumptions) and determine what sources of uncertainty are more influential in the scores and/or ranks.
8th: Back to the data Is needed to reveal the main drivers of overall good or bad performance. Transparency is primordial to good analysis and policymaking.
To profile country performance at the indicator level so as to reveal what is driving the composite indicator results.
To check for correlation and causality (if possible).
To identify whether the composite indicator results are overly dominated by a few indicators and to explain the relative importance of the sub-components of the composite indicator.

Figure 1. Generalized graphical representation of normalization for constructing a composite indicator.
weighting, and aggregation techniques were applied to identify suitable normalization and aggregation techniques for developing the final set of composite indicators.

Figure 3. Location of the study areas and gradients of soil salinity (1973-2009) in the coastal zone of Bangladesh. The soil salinity contours represent the northern boundary of areas where soils may have salinity values of 2 dS m⁻¹ or more ([16], p. 149).
Legend: RNAM = Rank normalization and arithmetic mean; DTTNAM = Distance to target normalization and arithmetic mean; ZSNAM = Z-score normalization and arithmetic mean; M-MNAM = Max-Min normalization and arithmetic mean; PNAM = Proportionate normalization and arithmetic mean; MCA = Multi-criteria analysis.

Table A21. Resistance to climate change: Spearman correlation (in %).
Legend: RNAM = Rank normalization and arithmetic mean; DTTNAM = Distance to target normalization and arithmetic mean; ZSNAM = Z-score normalization and arithmetic mean; M-MNAM = Max-Min normalization and arithmetic mean; PNAM = Proportionate normalization and arithmetic mean; MCA = Multi-criteria analysis.

Table 1. Selected normalization techniques for this study.

Table 2. Composite indicators developed using proportionate normalization and hybrid aggregation techniques.

Table A1. Checklist for building a composite indicator.

Table A3. Selected indicators and values to construct single composite indicators for stability.

Table A4. Selected indicators and values to construct single composite indicators for efficiency.

Table A5. Selected indicators and values to construct single composite indicators for durability.

Table A6. Selected indicators to construct single composite indicators for compatibility.

Table A7. Selected indicators and values to construct single composite indicators for equity.

Table A8. Results of composite indicators after applying rank normalization and aggregation techniques.

Table A9. Results of composite indicators after applying distance to target normalization and aggregation techniques.

Table A10. Results of composite indicators after applying Z-score normalization and aggregation techniques.
* Only proportionate normalization, no aggregation. #NUM! means calculation is not possible.

Table A11. Results of composite indicators after applying max-min normalization and aggregation techniques.

Table A12. Results of composite indicators after applying proportionate normalization and aggregation techniques.

Table A13. Results of composite indicators after applying weight and multi-criteria aggregation.