Building a Composite Indicator to Measure Environmental Sustainability Using Alternative Weighting Methods

Environmental sustainability in agriculture can be measured through the construction of composite indicators. However, this is a challenging task because these indexes are heavily dependent on how the individual base indicators are weighted. The main aim of this paper is to contribute to the existing literature regarding the robustness of subjective (based on experts’ opinions) weighting methods when constructing a composite indicator for measuring environmental sustainability at the farm level. In particular, the study analyzes two multi-criteria techniques, the analytic hierarchy process and the recently developed best-worst method, as well as the more straightforward point allocation method. These alternative methods have been implemented to empirically assess the environmental performance of irrigated olive farms in Spain. Data for this case study were collected from a panel of 22 experts and a survey of 99 farms. The results obtained suggest that there are no statistically significant differences in the weights of the individual base indicators derived from the three weighting methods considered. Moreover, the ranking of the sampled farms, in terms of their level of environmental sustainability measured through the composite indicators proposed, is not dependent on the use of the different weighting methods. Thus, the results support the robustness of the three weighting methods considered.


Introduction
There is a broad consensus about the definition of 'sustainable agriculture' as an activity that satisfies the following requirements for an indefinite period of time [1,2]: a) it protects biodiversity and natural resources and prevents environmental degradation, b) it is economically viable, and c) it is socially acceptable. Taking these requirements into consideration, agricultural sustainability can be defined as a concept that encompasses three main dimensions:
• Environmental sustainability. Sustainable agriculture must preserve biodiversity and the provision of ecosystem services. Thus, environmental sustainability can be defined as the ability to ensure greater agricultural productivity while simultaneously conserving natural resources and preventing the degradation of ecosystems.
• Economic sustainability. Sustainable agriculture must be economically viable, ensuring not only adequate profitability for farmers (the microeconomic level) but also a positive contribution to national/regional income (the macroeconomic level).
• Socio-cultural sustainability. Sustainable agriculture must be socially and culturally beneficial, i.e., it should ensure food security and the fair and equitable distribution of the wealth it generates, as well as contribute to the viability of rural communities.
This paper is focused on the measurement of environmental sustainability. To date, various indicator-based methods have been developed for this purpose, constructed using a wide range of agri-environmental indicators (AEIs) (for a review of the approaches proposed, interested readers can consult Bockstaller et al. [3]). All these proposals are based on large sets of AEIs aimed at assessing the multidimensional environmental impacts of agricultural activity and the provision of ecosystem services (natural resources, biological processes, biodiversity, etc.) (e.g., [4][5][6]). However, the quantification of environmental sustainability through AEIs has been criticized for several reasons. On the one hand, there are technical problems: data and measurability issues (some qualitative aspects are hard to quantify); the lack of sound ecological models that enable the indicators to be interpreted (e.g., the lack of reference levels or of thresholds for reversibility processes); the multiple spatial scales needed for an overall assessment; and the appropriate time horizon required (extended monitoring is needed to detect long-term environmental changes) [7,8]. On the other hand, there are also operational concerns related to the interpretation of the whole set of indicators required for such analyses, which is an obstacle to their use as a practical decision-support tool. In order to deal with the latter problem, composite indicators or indexes have been proposed as a means of summarizing the information provided by multiple indicators into an overall assessment of environmental performance (see, for instance, [9,10]).
In order to construct a composite environmental indicator, specific methodological approaches for the normalization, weighting, and aggregation of base AEIs must be selected from several alternatives (see further details in Section 3). All these choices have a significant effect on the overall composite indicator built. The inherent subjectivity of the choice of these approaches is behind most of the criticisms leveled at the different sustainability indexes proposed in the empirical studies carried out to date [11]. This issue has prompted an academic debate on the robustness and sensitivity of the methodological approaches used in the construction of composite indicators [12,13].
The objective of this paper is to contribute to the existing literature regarding the robustness of three alternative weighting methods [14]. For this purpose, we build a composite environmental indicator using three different methods to assess the relative importance of the individual base AEIs; then, by comparing the results obtained from a real-world case study, we provide further insights into the robustness of the weighting methods implemented.
In this context, it is worth mentioning that two of the weighting methods used in this paper are at the core of the multi-criteria decision-making (MCDM) paradigm, since they are used as key tools to provide information about the relative importance of the different criteria considered in these kinds of problems. For this reason, MCDM weighting procedures have commonly been used to build composite indicators [15], with the analytic hierarchy process (AHP) being the most popular one (e.g., [16][17][18]). Moreover, other more consistent and less time-consuming weighting methods have recently been developed, the most notable among them being the best-worst method (BWM) [19]. As this is a new technique, there have only been a few applications to date, and it has rarely been used in the construction of composite indicators. In this paper, we use the two aforementioned multi-criteria methods, in addition to the more straightforward point allocation (PA) method, to weight the base AEIs that are to be included in an index. We then compare the results obtained from the three methods in an empirical case study. To the best of our knowledge, this is the first study focused on the analysis of the robustness of weighting methods when constructing a composite environmental indicator, although there have been several empirical studies comparing some alternative weighting methods in different types of composite indicators (e.g., [20,21]). The comparative analysis proposed will enable us to draw useful conclusions about the construction of sound composite indicators.
The empirical case study considered is the assessment of the environmental sustainability of olive groves in Spain. For this purpose, the composite environmental indicator built (ENV_SUST) relies on previous research focused on the selection of relevant AEIs for this particular agricultural system [22], and uses primary data gathered from Spanish olive farms to calculate the whole set of selected AEIs at the farm level [23]. Moreover, as far as the authors are aware, this paper is the first to develop a single composite environmental index for an overall assessment of environmental performance in olive growing. Thus, this paper also contributes to the existing literature by providing a sound instrument that is particularly useful for designing targeted policy interventions aimed at promoting sustainable olive farming.

Recent Developments of Olive Groves in Spain
The current area of olive groves in Spain has reached its historically highest level, with around 2.5 million hectares (14% of the country's utilized agricultural area). In fact, Spain is the world's leading olive-producing country, accounting for one-third of the total olive grove area worldwide and half of the total olive oil production.
Spain's accession to the EU in 1986 allowed the Spanish olive sector to benefit from the implementation of the Common Agricultural Policy (CAP), which granted olive growers coupled subsidies that made olive farming more profitable than other types of agriculture (in particular, extensive herbaceous crops). The CAP subsidies encouraged these farmers to increase their olive grove area and olive production. This process of expansion has also been supported by new growing techniques aimed at production intensification, such as irrigation (olive has traditionally been a non-irrigated crop) and higher tree density (increasing from traditional densities of around 100 olive trees per hectare to 300-500 trees per hectare in 'intensive' orchards, or even more than 1000 trees per hectare in 'super-intensive' ones). As a result of these changes, Spain's olive grove area increased by 25% between 1990 and 2020, and its olive oil production doubled.
However, this expansion and intensification of Spanish olive groves has given rise to a number of environmental problems [23,24]:

1.
Soil erosion. Erosion is the main environmental problem caused by this crop. The high soil erosion rates are due to the fact that more than 40% of all olive groves are located on land with unfavorable soil conditions for agricultural production (steep slopes, land particularly sensitive to erosion, or affected by frequent torrential rain), and that poor soil management by farmers has damaged natural vegetation cover (leading to farms with uncovered soils) [25]. This environmental impact has been aggravated in recent years by the expansion of olive groves into areas with especially adverse characteristics (steep slopes, extreme torrential rainfall, high erodibility of soils) [26].

2.
Loss of biodiversity. One of the main characteristics of olive groves in the 1980s (under traditional farming) was the high biodiversity associated with the crop, olive groves being an example of a 'high natural value' agricultural system. The low-intensity olive farming (minimum use of agrochemicals) and the existence of old olive trees with semi-natural herbaceous vegetation located in areas with different land uses (vineyards, cereals, pastures, and Mediterranean forest) provided a varied habitat, where a large number of insects, birds, reptiles, and mammals found refuge. However, the expansion (large olive monoculture areas where hedgerows, stone walls, and islands of shrubs and trees have been eliminated) and intensification of olive groves (disappearance of vegetation cover, intensive use of biocides, fertilizers and machinery, water pollution, and soil erosion) have changed this situation, leading to a reduction in both the number and diversity of animal species in olive grove systems [27,28].

3.
Non-point source water pollution. Modern olive growing has contributed to a decline in water quality due to the intense use of agrochemical products (mainly herbicides and fertilizers). This has resulted in non-point source water pollution problems in rivers, reservoirs and aquifers. Although in recent years some of the most polluting products widely used in olive farming (e.g., simazine and diuron) have been banned, water quality could be further improved by modifying some of the current olive farming practices [29].

4.
Overexploitation of water resources. Before the 1990s, most olive trees in Spain were rain-fed, but the intensification of the crop has led to the emergence of 800,000 hectares of irrigated olive groves. Although olive trees have low water requirements and are usually irrigated using highly efficient irrigation systems (water consumption using drip irrigation is around 1500 m³/ha·year), there is substantial pressure on water resources [30]. Increasing water extraction not only causes the overexploitation of water resources but also jeopardizes the ability to meet other water demands in basins with a higher degree of water scarcity [31].
It is worth noting that several policy initiatives have been implemented in recent years to partially address these environmental problems related to olive groves, by encouraging farmers to adopt various biodiversity-friendly and resource conservation practices. Along with the rational use of agrochemicals and water, these include some soil conservation practices, such as disposing of olive-desuckering debris without burning, which helps mitigate climate change; the shredding of olive-pruning debris for use as soil cover, improving soil texture and reducing the impact of rain and water run-off; and the use of cover crops under mower control as a sustainable practice in terms of soil protection [26,32,33]. In addition, some practices related to functional elements (hedgerows, riparian vegetation, plot margins, etc.) have proven effective in enhancing biodiversity as well as having a positive effect on other ecosystem services, such as landscape aesthetics [34]. Yet there is still plenty of room for further improvement. To effectively guide the future development of olive groves, more in-depth analyses are required, especially those that provide a quantitative assessment of the environmental performance of individual olive farms.

Indicators Measuring Environmental Sustainability of Olive Farming
Although there are several methodological frameworks for the quantitative assessment of the environmental sustainability of agricultural systems, there is widespread scientific agreement that constructing and calculating environmental indicators is the most suitable approach for this purpose [35,36]. Thus, the present study relies on the set of AEIs proposed by Gómez-Limón and Riesgo [22] to evaluate the sustainability of olive farms in Spain.
In order to evaluate the environmental sustainability of agriculture, the approach followed is founded on the SAFE (Sustainability Assessment of Farming and the Environment) analytical framework [37]. SAFE is based on a hierarchical structure with three levels: i) principles, ii) criteria and iii) indicators. Principles are general conditions for achieving environmental sustainability related to the ecological functions of the agro-ecosystems. In this sense, the environmental sustainability of agricultural systems centers on two principles regarding the protection of (a) biodiversity and (b) natural resources. Criteria are the resulting states of agricultural systems when their related principles are respected. For the particular case of olive groves, biodiversity is considered to be protected when the following elements are guaranteed: a1) olive grove genetic diversity (trees in the orchard), a2) biological diversity (a range of different species within the farm boundaries, from 0.1 km² to 10 km²), and a3) habitat diversity (a range of different habitats within the landscape unit, from 10 km² to 1000 km²). In addition, natural resources conservation is achieved when: b1) soil erosion is minimized, b2) soil fertility is protected or enhanced, b3) soil and water quality are maintained or improved, b4) water extraction is minimized, and b5) the energy balance (primary energy supply minus primary energy used per cultivation unit) is optimized. Lastly, indicators are variables that can be assessed to measure compliance with a criterion, thus producing a representative picture of the environmental sustainability of the agricultural system under analysis. Taking the former criteria into account, a set of 11 AEIs was selected on the basis of analytical soundness, measurability and policy relevance [38,39], as shown in Table 1.
Technical details on why each indicator was chosen, how it was calculated at farm level and how its value should be interpreted can be found in Gómez-Limón and Arriaza [23].

The Methodological Approach for Building Composite Indicators
The literature contains a plethora of techniques for building environmental sustainability indices. Nevertheless, the Organization for Economic Co-operation and Development (OECD) and the Joint Research Centre of the European Commission (JRC) [11] provide guidance on the transparent construction of composite indicators, identifying the steps that analysts should follow:

1.
Indicator selection and data gathering. As explained in Section 2, an essential element of this kind of study is the selection of relevant AEIs based on strict quality criteria, and accurate data gathering to calculate the empirical values of these indicators. Given the huge number of possible indicators, the use of a solid theoretical framework is recommended; in this paper, the SAFE approach is applied.

2.
Normalization of indicators. Transforming indicators into dimensionless variables (normalization) is essential before they are weighted and aggregated, as they have usually been calculated using different units of measurement. To be able to compare them and perform arithmetic operations on them, they need to be expressed in homogeneous units within the same range. In our case, selecting from among the various normalization techniques available [40,41], we applied the min-max or re-scaling normalization, taking the reference values for each AEI considered as sustainability thresholds. Thus, the values of all the normalized indicators vary within a dimensionless range [0,1], where 0 is assigned to all cases where the AEI value is worse than or equal to an 'unacceptable level of sustainability' (i.e., the worst environmental performance) and 1 is assigned to all cases where the AEI value is better than or equal to a 'desired level of sustainability' (i.e., the best environmental performance).

3.
Weighting of indicators. Assigning weights enables us to identify the relative importance of the individual indicators. There are several valid procedures for weighting indicators, but the composite indicator may yield different results depending on the procedure used [42,43]. Therefore, the selection of a particular technique is a challenging task. The weighting techniques for constructing indices can be divided into 'objective' and 'subjective' ones [44]. With the former, weights are derived endogenously using statistical or mathematical procedures, such as principal components analysis (PCA), data envelopment analysis (DEA), the benefit of the doubt (BOD) approach or regression analysis (RA). With the latter, weights are determined exogenously on the basis of value judgments expressed by experts or decision-makers, as is the case with AHP, BWM, PA, budget allocation process (BAP) or conjoint analysis (CA). It is worth mentioning that environmental sustainability is a technical concept that requires scientific knowledge to define and measure, especially when it is applied to a specific ecosystem (in this case, olive groves in Spain). This justifies the use of exogenously determined weights in our case study, based on the opinion of experts on the environmental performance of olive groves. In particular, we have chosen AHP because it is the most commonly applied technique among the subjective weighting methods available, BWM due to its novelty and its presumed advantages over AHP, and PA because it is an explicit and straightforward weighting method.

4.
Aggregation of indicators. The OECD and JRC [11] suggest several alternative functional forms that allow indicators to be aggregated, explaining their pros and cons. Depending on the aggregation method used to develop the indices, the results and the conclusions drawn from them may differ from case to case. Thus, the choice of the aggregation method is also subject to criticism relating to the shortcomings of the technique used [40,43,45]. The key issue when selecting a functional form for aggregating indicators is the degree of compensability, or marginal rate of substitution, among indicators [18]: a) additive linear functions implicitly assume total compensability among indicators, b) multiplicative and geometric functions permit partial compensability, and c) non-compensatory functions rule out compensability. In order to minimize the subjectivity regarding the method employed to build the composite indicator measuring environmental sustainability, the multicriteria function developed by Díaz-Balteiro and Romero [46], based on the distance to the ideal point measured by different metrics (i.e., different degrees of compensability), has been chosen for implementation.
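The min-max normalization of step 2 can be sketched in Python as follows. This is a minimal illustration, assuming the usual form (value minus unacceptable level) divided by (desired level minus unacceptable level), clipped to [0, 1]. The function name `normalize` and the threshold values in the example are hypothetical; they are not the reference values used for the 11 AEIs.

```python
def normalize(value, unacceptable, desired):
    """Min-max re-scaling of one AEI value onto [0, 1].

    `unacceptable` maps to 0 and `desired` maps to 1; values beyond either
    threshold are clipped. Dividing by (desired - unacceptable) makes the same
    formula work for "higher is better" and "lower is better" indicators.
    """
    if desired == unacceptable:
        raise ValueError("the two thresholds must differ")
    score = (value - unacceptable) / (desired - unacceptable)
    return min(1.0, max(0.0, score))


# Hypothetical thresholds, for illustration only (not the study's values).
# Lower-is-better indicator: 12 is unacceptable, 2 is desired.
print(normalize(6.0, unacceptable=12.0, desired=2.0))
# Higher-is-better indicator: 0 is unacceptable, 10 is desired (clipped to 1).
print(normalize(15.0, unacceptable=0.0, desired=10.0))
```

Encoding the direction of each indicator through the order of the two thresholds avoids a separate "polarity" flag per AEI.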
Having made the decisions explained above, the composite indicator measuring the environmental sustainability of olive farms (ENV_SUST) can be calculated as a function of the normalized values of the 11 AEIs taken into account ($I_k$), the weights assigned to each of these indicators ($w_k$) and the compensation parameter (λ), following the expression in Equation (1). The parameter λ ranges between 0 and 1 and determines the degree of compensability among the indicators. Here we consider five values of the compensation parameter (λ = 0, λ = 0.25, λ = 0.5, λ = 0.75 and λ = 1), which cover the three possibilities mentioned above: (a) total compensability (λ = 1), (b) various degrees of partial compensability (0 < λ < 1) and (c) zero compensability (λ = 0).
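To illustrate how λ trades off compensability, the Python sketch below assumes one common form of a Díaz-Balteiro and Romero style distance function: a weighted L∞ distance to the ideal point (the largest weighted shortfall, which allows no compensation) combined with a weighted L1 distance (the sum of weighted shortfalls, which allows full compensation), mixed as (1 − λ)·L∞ + λ·L1. This form, the conversion to a score as 1 minus the distance, and the function names are assumptions for illustration and may differ in detail from Equation (1).

```python
def distance_to_ideal(I, w, lam):
    """Weighted distance of a farm's normalized AEIs I (each in [0, 1]) to the
    ideal point (all indicators equal to 1), mixing two metrics:
      - L-infinity: largest weighted shortfall (no compensation), weight 1 - lam
      - L1: sum of weighted shortfalls (full compensation), weight lam
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(I) != len(w):
        raise ValueError("I and w must have the same length")
    shortfalls = [wk * (1.0 - Ik) for Ik, wk in zip(I, w)]
    return (1.0 - lam) * max(shortfalls) + lam * sum(shortfalls)


def env_sust_score(I, w, lam):
    # Assumed convention: a higher score means the farm is closer to the ideal.
    return 1.0 - distance_to_ideal(I, w, lam)


# Hypothetical farm with three normalized AEIs and weights summing to 1.
I = [1.0, 0.5, 0.0]
w = [0.5, 0.3, 0.2]
for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(lam, round(env_sust_score(I, w, lam), 4))
```

When the weights sum to 1 and λ = 1, the score reduces to the weighted sum of the normalized indicators, since 1 − Σ w_k(1 − I_k) = Σ w_k I_k; this is the fully compensatory case.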

Alternative Techniques for Weighting Indicators
As has been previously stated, this paper is focused on the role of weighting techniques when constructing composite indicators, in order to provide further empirical insights into their robustness. For this reason, in this paper we use three exogenous weighting methods (i.e., based on expert opinion) to determine the priorities (global weights) ($w_k$) of the whole set of AEIs used to construct the ENV_SUST index: AHP, BWM, and PA.
The AHP was initially developed as a decision-support tool for making complex decisions [47], but it was subsequently adapted to index construction; this technique is particularly useful for weighting sustainability attributes when constructing composite indicators [13,48]. The implementation of this method involves the following steps. First, the weighting problem is structured as a tree-based hierarchy, with the overall goal of the problem (in our paper, the environmental sustainability of irrigated olive groves) at the top. Decision criteria contributing to the main goal are placed at an intermediate level (biodiversity protection and conservation of natural resources in our case) and decision subcriteria are positioned at the lowest level (the base AEIs in our case). Second, experts individually perform pairwise comparisons at each node of the hierarchy, expressing how much more one (sub)criterion should be valued than another on Saaty's fundamental scale (from 1, equal importance, to 9, extreme importance of one (sub)criterion over another). Based on these expert judgments, a reciprocal square matrix $A$ is built for each node. Third, the local weights of the criteria (biodiversity protection and conservation of natural resources) and subcriteria (the base AEIs) are calculated using the principal eigenvector method proposed by Saaty [49]: $AW = \lambda_{\max} W$, where $\lambda_{\max}$ is the maximum eigenvalue of $A$ and $W$ is the vector of local weights. AHP tolerates some degree of inconsistency in the decision maker's judgments, measured using a consistency ratio that must not exceed predefined values [50]. Fourth, the global weights ($w_k$) of the base AEIs are calculated by multiplying the local weight of each subcriterion (AEI) by the priority of its parent node (its related principle).
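The third and fourth AHP steps can be sketched in plain Python. This minimal version finds the principal eigenvector by power iteration (which converges for positive reciprocal matrices), estimates $\lambda_{\max}$, and computes Saaty's consistency ratio CR = CI/RI with CI = ($\lambda_{\max}$ − n)/(n − 1), using Saaty's tabulated random index RI. The function names and the judgment matrices in the example are hypothetical; the acceptance threshold for CR is left to the analyst.

```python
# Saaty's random consistency index (RI) for matrices of order 1..10.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def _mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def ahp_weights(A, tol=1e-12, max_iter=1000):
    """Local weights from a positive reciprocal matrix A (list of rows) via the
    principal eigenvector, computed by power iteration and scaled to sum to 1.
    Returns (weights, lambda_max, consistency_ratio)."""
    n = len(A)
    if n not in RANDOM_INDEX:
        raise ValueError("this sketch only tabulates RI for n <= 10")
    w = [1.0 / n] * n
    for _ in range(max_iter):
        Aw = _mat_vec(A, w)
        total = sum(Aw)
        new_w = [x / total for x in Aw]
        done = max(abs(a - b) for a, b in zip(new_w, w)) < tol
        w = new_w
        if done:
            break
    Aw = _mat_vec(A, w)
    lambda_max = sum(Aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0   # 1x1 and 2x2 matrices are always consistent
    return w, lambda_max, cr


def global_weights(local, parent_priority):
    """Global weight of each AEI = local weight x priority of its principle."""
    return [lw * parent_priority for lw in local]


# Hypothetical judgments, for illustration only.
principles, _, _ = ahp_weights([[1, 1/2], [2, 1]])   # two principles
aei_local, lam_max, cr = ahp_weights([[1, 2, 4], [1/2, 1, 2], [1/4, 1/2, 1]])
print(principles, lam_max, cr)
print(global_weights(aei_local, principles[0]))
```

The example matrix for the three hypothetical AEIs is perfectly consistent, so $\lambda_{\max}$ equals the matrix order and CR is zero; real expert judgments will usually give a small positive CR.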
The BWM is a novel multi-criteria decision-making technique [19]. Like AHP, this method is suitable for weighting attributes, and although it has only recently been developed, it has already been applied to the construction of composite indicators [51,52]. BWM requires fewer pairwise comparisons than AHP (for $n$ (sub)criteria, AHP requires $n(n-1)/2$ comparisons, while BWM requires only $2n-3$), which may lead to more consistent and reliable results. In order to derive the global weights of the base AEIs, BWM entails the following steps. First, as in AHP, the problem is structured as a tree-based hierarchy (overall goal, decision criteria, and decision subcriteria). Second, the expert identifies the best (i.e., the most important) and the worst (i.e., the least important) (sub)criterion of the set. Third, the preference for the best (sub)criterion over each of the other (sub)criteria is determined using a number between 1 and 9, similar to Saaty's fundamental scale. The expert's preferences are then used to generate the Best-to-Others vector $A_B = (a_{B1}, \ldots, a_{Bk}, \ldots, a_{BK})$, where $a_{Bk}$ shows the preference for the best (sub)criterion $B$ over (sub)criterion $k$, and $a_{BB} = 1$. Fourth, the preference of each (sub)criterion over the worst (sub)criterion is determined using a number between 1 and 9, as in the previous step. This information enables the construction of the Others-to-Worst vector $A_W = (a_{1W}, \ldots, a_{kW}, \ldots, a_{KW})^T$, where $a_{kW}$ shows the preference for (sub)criterion $k$ over the worst (sub)criterion $W$, and $a_{WW} = 1$. Finally, the local weights of the decision (sub)criteria and the corresponding indicator of the consistency of responses ($\xi^L$) are obtained by solving the following linear programming model:

$$\begin{aligned} \min\ & \xi^L \\ \text{s.t.}\ & |w_B - a_{Bk} w_k| \le \xi^L, && \text{for all } k \\ & |w_k - a_{kW} w_W| \le \xi^L, && \text{for all } k \\ & \textstyle\sum_k w_k = 1, \\ & w_k \ge 0, && \text{for all } k. \end{aligned} \tag{2}$$

Using $\xi^L$, it is possible to calculate a consistency ratio (CR), which must not be higher than 0.25. As in AHP, the global weights ($w_k$) of the base AEIs are calculated by multiplying the local weight of each AEI by the priority of its associated principle.
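Model (2) is a small linear program, so it can be handed to a general LP solver. The sketch below assumes SciPy is available and uses `scipy.optimize.linprog` with the HiGHS method; each absolute-value constraint is split into two linear inequalities. The function names `bwm_linear` and `n_comparisons` and the example judgments are hypothetical, and the consistency ratio (which needs the method's own threshold tables) is not computed here.

```python
import numpy as np
from scipy.optimize import linprog


def bwm_linear(a_best, a_worst, best, worst):
    """Solve the linear BWM model (2) for one set of (sub)criteria.

    a_best[k]  = a_Bk, preference of the best criterion over criterion k
    a_worst[k] = a_kW, preference of criterion k over the worst criterion
    Returns (weights, xi_L).
    """
    n = len(a_best)
    c = np.zeros(n + 1)          # decision vector: [w_0, ..., w_{n-1}, xi_L]
    c[n] = 1.0                   # objective: minimize xi_L
    rows = []
    for k in range(n):
        d_best = np.zeros(n)     # coefficients of w_B - a_Bk * w_k
        d_best[best] += 1.0
        d_best[k] -= a_best[k]
        d_worst = np.zeros(n)    # coefficients of w_k - a_kW * w_W
        d_worst[k] += 1.0
        d_worst[worst] -= a_worst[k]
        for d in (d_best, d_worst):
            # |d.w| <= xi_L is split into d.w - xi_L <= 0 and -d.w - xi_L <= 0
            rows.append(np.append(d, -1.0))
            rows.append(np.append(-d, -1.0))
    A_eq = np.append(np.ones(n), 0.0).reshape(1, -1)   # weights sum to 1
    res = linprog(c, A_ub=np.array(rows), b_ub=np.zeros(len(rows)),
                  A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (n + 1), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n], res.x[n]


def n_comparisons(n):
    """Pairwise comparisons needed for n (sub)criteria."""
    return {"AHP": n * (n - 1) // 2, "BWM": 2 * n - 3}


# Hypothetical, fully consistent judgments: criterion 0 is best, 2 is worst.
weights, xi = bwm_linear(a_best=[1, 2, 4], a_worst=[4, 2, 1], best=0, worst=2)
print(weights.round(4), round(xi, 6))
print(n_comparisons(6))
```

Because the example judgments are fully consistent ($a_{Bk} \cdot a_{kW} = a_{BW}$ for every $k$), the optimum has $\xi^L = 0$ and weights proportional to 4:2:1. For a group of six AEIs, the comparison count falls from 15 (AHP) to 9 (BWM).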
The PA method is a straightforward weighting technique that has proved valuable for determining the priorities of the different attributes of sustainability composite indicators [40,53]. In this method, the expert is asked to directly allocate a fixed number of points (e.g., 10 or 100) among the multiple criteria (indicators in our case) considered in a decision problem to establish the weight of each criterion [54]. In this paper, we apply this method as follows: first, each expert distributes 100 points between the two principles related to the environmental sustainability of irrigated olive groves (biodiversity protection and conservation of natural resources); second, experts allocate another 100 points among the five AEIs linked to biodiversity protection and an additional 100 points among the six AEIs related to the conservation of natural resources. Third, the global weights of the AEIs ($w_k$) are obtained (in percentage terms) by multiplying the priority of each AEI by the priority of its related principle.
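The three PA steps reduce to simple arithmetic: each global weight is (principle points / 100) × (AEI points / 100) × 100. The sketch below is a minimal illustration; the function name, the AEI labels B1 to B5 and N1 to N6, and all point values are hypothetical and only mirror the five-plus-six structure described above.

```python
def pa_global_weights(principle_points, aei_points):
    """Point allocation: every allocation must total 100 points.

    principle_points: {principle: points}
    aei_points:       {principle: {aei: points}}
    Returns global AEI weights in percent: principle share x AEI share x 100.
    """
    if sum(principle_points.values()) != 100:
        raise ValueError("principle points must total 100")
    weights = {}
    for principle, p_pts in principle_points.items():
        group = aei_points[principle]
        if sum(group.values()) != 100:
            raise ValueError(f"points for {principle} must total 100")
        for aei, a_pts in group.items():
            weights[aei] = (p_pts / 100) * (a_pts / 100) * 100
    return weights


# Hypothetical allocation by one expert (labels and points are illustrative).
principle_points = {"biodiversity": 40, "natural_resources": 60}
aei_points = {
    "biodiversity": {"B1": 30, "B2": 25, "B3": 20, "B4": 15, "B5": 10},
    "natural_resources": {"N1": 30, "N2": 20, "N3": 15,
                          "N4": 15, "N5": 10, "N6": 10},
}
gw = pa_global_weights(principle_points, aei_points)
print({k: round(v, 2) for k, v in gw.items()})
```

Because both levels are forced to total 100 points, the 11 global weights always sum to 100%.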

Data Collection for the Empirical Assessment of Environmental Sustainability
The empirical assessment of the environmental sustainability of the irrigated olive groves in Spain relies on two data gathering sources: a survey of farmers to collect the farm-level technical data needed to calculate the AEIs considered ($I_k$), and a survey of experts to obtain the weights assigned to each of these indicators ($w_k$).

Farmer Survey
Due to the dispersion of the olive orchards in Spain, we carried out multi-stage cluster sampling to obtain a representative sample of irrigated olive farms. First, following a random selection of agricultural regions proportional to the total area of olive groves, six agricultural regions in Andalusia were selected (Andalusia accounts for more than 80% of total Spanish olive oil production). Figure 1 shows their location on a map. Second, a number of farms proportional to the area of olive groves in the agricultural region were selected through quota sampling, taking into account the farm size. Third, the selection of olive growers to be interviewed was determined using random route sampling. This procedure yielded a final sample consisting of a total of 480 olive farms. Further details about the sampling procedure can be seen in [23]. The data collection was carried out via face-to-face interviews with the farmers (lasting around 35 min each) using a structured questionnaire with nine blocks: (1) farm characteristics (location, area of irrigated and rainfed crops, ownership type and farm labor); (2) olive growing characteristics (varieties, plantation age, tree density, type of management [conventional, integrated or organic] and yield); (3) soil and weed management (agricultural practices, weed control [tilled or not tilled] and use of cover crops); (4) olive-pruning (date and desuckering debris management); (5) irrigation system and use of water-soluble fertilizers; (6) fertilization (fertilizers and dosage); (7) crop protection (chemicals, dosage, and management plans); (8) olive harvest (manual or mechanical); and (9) farmer's socio-economic characteristics (gender, age, professional experience, family size, income share from agriculture, education level, membership of producer organizations, and generational renewal). This information allowed us to calculate the corresponding AEIs ($I_k$) at the farm level.
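Allocating a fixed number of farms across regions in proportion to olive grove area requires a rounding rule so that the regional quotas add up exactly to the total. The sketch below assumes the largest-remainder method; the text does not state which rounding rule was used, and the region names and areas in the example are hypothetical.

```python
def allocate_proportional(total, areas):
    """Split `total` sample units across strata in proportion to `areas`,
    using largest-remainder rounding so the parts sum exactly to `total`."""
    area_sum = sum(areas.values())
    quotas = {r: total * a / area_sum for r, a in areas.items()}
    alloc = {r: int(q) for r, q in quotas.items()}    # round every quota down
    leftover = total - sum(alloc.values())
    # Hand the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda r: quotas[r] - alloc[r], reverse=True)
    for r in by_remainder[:leftover]:
        alloc[r] += 1
    return alloc


# Hypothetical regions and olive grove areas (thousand ha), for illustration.
areas = {"R1": 120, "R2": 95, "R3": 80, "R4": 60, "R5": 40, "R6": 25}
allocation = allocate_proportional(480, areas)
print(allocation, sum(allocation.values()))
```

Within each region, the resulting quota would then be split by farm size before random route sampling picks the individual farms.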
The sample of 480 farms is divided into four types of olive orchards: traditional mountain olive groves (rain-fed), traditional low-medium slope olive groves (rain-fed), semi-intensive irrigated olive groves and other types. In this study, the assessment of environmental sustainability focuses on the semi-intensive irrigated type and is based on a subsample of 99 olive farms. Table 2 summarizes the characteristics of the farms and farmers in this subsample.

Expert Survey
In order to weight the contribution of each AEI to the composite indicator measuring the environmental sustainability of olive farms (ENV_SUST), a multidisciplinary group of 22 experts was selected following a judgmental sampling method [55]. The expert panel was primarily composed of scientists from universities and research centers (15), but also contained specialists from the Regional Administration (3) and technical services firms (4). Although it is a non-probability sampling technique, the nature of the technical information required and the homogeneity of the group (in terms of their expertise) suggest that the data gathering is reliable and bias-free [56].
The survey was based on one-to-one interviews, and two sessions were conducted with each expert. After an introduction about the objective of the study and the assessment methods, the questionnaires designed for each of those methods (AHP, BWM and PA) were administered. Each interview took approximately 20-30 min. Since the AHP questionnaire was slightly longer than the one for BWM, the short questionnaire conducted for the PA method was included in the BWM session. To avoid order effects (period and carryover effects), a counterbalanced Latin square design was followed [57,58]. Thus, half of the experts were given the AHP questionnaire first, whereas the other half began with the BWM and PA methods (controlling for the period effect). Then, two days later, the experiments were reversed for each group of experts (controlling for the carryover effect).
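The counterbalanced two-session design can be sketched as a split of the panel into two equal groups with opposite session orders. The text does not say how experts were assigned to each group; this sketch assumes random assignment with a fixed seed for reproducibility, and the expert identifiers and group labels are hypothetical.

```python
import random


def counterbalance(experts, seed=None):
    """Randomly split experts into two equal-sized groups with opposite
    session orders, so each method appears first for half of the panel."""
    if len(experts) % 2:
        raise ValueError("need an even number of experts")
    pool = list(experts)
    random.Random(seed).shuffle(pool)
    half = len(pool) // 2
    return {
        "AHP first, then BWM + PA": pool[:half],
        "BWM + PA first, then AHP": pool[half:],
    }


# Hypothetical identifiers for a 22-member panel.
experts = [f"E{i:02d}" for i in range(1, 23)]
groups = counterbalance(experts, seed=1)
for order, members in groups.items():
    print(order, len(members))
```

With 22 experts, each ordering is assigned to 11 people, which balances period effects across the two sessions.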
Finally, it is worth pointing out that from these questionnaires we obtained the individual AEI weights according to each of the three methods used, for each expert in the panel ($w_k^j$), with the subscript $k$ denoting the base indicator and the superscript $j$ denoting the expert considered. However, the weights to be included in Equation (1) are the result of the synthesis of the panel's weights. In this regard, we follow Forman and Peniwati [59], who suggest that group decision-making should be performed by aggregating individual weights using the geometric mean for every weighting method (AHP, BWM and PA):

$$w_k = \left( \prod_{j=1}^{22} w_k^j \right)^{1/22}$$
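The group synthesis by geometric mean can be sketched as follows. The geometric means of normalized weights need not sum to one, so this sketch optionally rescales them; that rescaling is an assumption, not something the text specifies. The function name and the two-expert example are hypothetical.

```python
import math


def aggregate_geometric(expert_weights, renormalize=True):
    """Aggregate individual weights w_k^j (one dict per expert j, keyed by AEI k)
    with the geometric mean across the J experts for each AEI."""
    J = len(expert_weights)
    aeis = expert_weights[0].keys()
    # A single zero weight (possible under PA) makes that AEI's mean zero.
    gm = {k: math.prod(ew[k] for ew in expert_weights) ** (1.0 / J) for k in aeis}
    if renormalize:
        # Geometric means of weights need not sum to 1, so rescale them.
        total = sum(gm.values())
        gm = {k: v / total for k, v in gm.items()}
    return gm


# Hypothetical weights from two experts who disagree symmetrically.
panel = [{"a": 0.2, "b": 0.8}, {"a": 0.8, "b": 0.2}]
print(aggregate_geometric(panel, renormalize=False))
print(aggregate_geometric(panel))
```

Compared with the arithmetic mean, the geometric mean respects the reciprocal structure of pairwise-comparison judgments, which is the usual argument for using it when aggregating individual priorities.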

Indicator Weighting
The values of the consistency ratios for AHP and BWM, although not reported here due to space constraints, do not exceed the accepted threshold levels and, hence, indicate that the experts' responses are consistent and therefore reliable. Table 3 shows the summary statistics of the global weights obtained with the three methods implemented (AHP, BWM and PA). There is consensus about the most important AEI for the protection of biodiversity, namely the index of biological diversity (DIVERSIND). This is followed by pesticide risk (PESTRISK) under AHP, and by the percentage of non-cultivated land (NONCULTIV) under BWM and PA. Regarding the second principle, the conservation of natural resources, there is also consensus about the most important indicator, soil erosion (EROSION), followed by soil organic matter (ORGMAT). It is worth noting that the PA method yields the lowest variability of AEI weights across experts, measured by the coefficient of variation.
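For readers unfamiliar with the AHP consistency check mentioned above, the following sketch computes Saaty's consistency ratio (CR) for a single pairwise comparison matrix. It assumes the usual principal-eigenvector approach (approximated here by power iteration), the commonly cited random index (RI) values for matrices of order up to 10, and the conventional 0.10 threshold; the function name and the example matrix are illustrative only, and the BWM consistency ratio, which is computed differently, is not covered.

```python
# Saaty's random consistency index for matrices of order n (commonly cited values).
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}

def ahp_consistency(matrix, iterations=200):
    """Return (priority weights, lambda_max, CR) for a positive reciprocal matrix.

    Hypothetical helper; supports orders up to 10 (the RI table above).
    """
    n = len(matrix)
    v = [1.0 / n] * n
    for _ in range(iterations):
        # Power iteration: multiply by the matrix, then rescale to sum to 1.
        av = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(av)
        v = [x / total for x in av]
    av = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(av[i] / v[i] for i in range(n)) / n
    if n <= 2:
        return v, lambda_max, 0.0  # reciprocal 1x1 and 2x2 matrices are always consistent
    ci = (lambda_max - n) / (n - 1)  # consistency index
    return v, lambda_max, ci / RANDOM_INDEX[n]

if __name__ == "__main__":
    a = [[1, 3, 5],
         [1 / 3, 1, 3],
         [1 / 5, 1 / 3, 1]]
    weights, lam, cr = ahp_consistency(a)
    print([round(w, 3) for w in weights], round(lam, 3), round(cr, 3))
    print("acceptable" if cr <= 0.10 else "revise judgments")
```

For a perfectly consistent matrix ($a_{ij} = w_i / w_j$), $\lambda_{max}$ equals the matrix order and CR is 0; larger departures from transitivity raise $\lambda_{max}$ and hence CR.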

Inter-rater Reliability
Before comparing the weighting methods, we assess the degree of agreement among the experts' AEI weights using the intraclass correlation coefficient (ICC). Unlike the traditional correlation coefficient, which is based on paired observations, the ICC considers agreement across the whole group of raters simultaneously. Higher ICC values indicate higher inter-rater reliability, with 1 indicating perfect agreement and 0 indicating agreement no better than chance [60]. There are 10 different forms of ICC, depending on: (1) the statistical model, one-way or two-way, according to whether only the objects or both the objects and the subjects (raters) are treated as sources of variation; (2) whether raters are considered random or fixed effects (two-way random-effects model or two-way mixed-effects model, respectively); (3) the type of agreement: absolute agreement (for the same object, similar scores among raters) or consistency (for the same object, similar ranking among raters); and (4) whether reliability refers to a single rater or to the mean of several raters [61,62].
Since the experts were not randomly selected and we are interested in assessing whether or not the AEI weights are equal within each weighting method, we estimate the absolute agreement among experts using a two-way mixed-effects model. The resulting ICCs for AHP, BWM and PA are presented in Table 4. To interpret the ICC, Cicchetti [63] proposes the following cut-offs: <0.40, 0.40-0.59, 0.60-0.74 and >0.74 for poor, fair, good and excellent reliability, respectively. Koo and Li [64] propose somewhat stricter intervals: <0.5, 0.5-0.75, 0.75-0.9 and >0.9 for poor, moderate, good and excellent reliability, respectively. In our case, the degree of agreement can be regarded as good to excellent for all three methods. Thus, despite the small sample size, all three methods produce consistent assessments of AEI weights. Nevertheless, as noted above, PA produces the least variability in experts' assessments in terms of the coefficient of variation (see Table 3).
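The absolute-agreement ICC discussed above can be computed from a two-way ANOVA decomposition of a targets × raters matrix (here, AEIs × experts). The sketch below follows the McGraw and Wong formulas for single-measure, ICC(A,1), and average-measure, ICC(A,k), absolute agreement, which are numerically identical under the two-way random and two-way mixed models (only the interpretation differs). Since the text does not state which of the two forms appears in Table 4, both are returned; the function names and the small 6 × 4 example matrix are illustrative, and the Koo and Li bands are encoded as quoted above.

```python
def icc_absolute_agreement(ratings):
    """ICC(A,1) and ICC(A,k) for an n-targets x k-raters matrix (list of rows).

    Hypothetical helper based on two-way ANOVA mean squares.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between targets (AEIs)
    msc = ss_cols / (k - 1)               # between raters (experts)
    mse = ss_error / ((n - 1) * (k - 1))  # residual
    single = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    average = (msr - mse) / (msr + (msc - mse) / n)
    return single, average

def koo_li_label(icc):
    """Reliability label using the Koo and Li cut-offs quoted in the text."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"

if __name__ == "__main__":
    data = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
            [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
    s, a = icc_absolute_agreement(data)
    print(round(s, 2), round(a, 2))  # 0.29 0.62
    print(koo_li_label(s), koo_li_label(a))
```

The example shows why the single/average distinction matters: the same ratings are "poor" as single-rater reliability but "moderate" when the mean of the raters is the quantity of interest.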

Multivariate Comparison of Weights from the Three Methods
To compare the AEI weights from the three methods, a within-subjects multivariate analysis of variance (MANOVA) design was implemented. Compared with running a separate univariate test for each AEI, the MANOVA limits the inflation of the Type I error rate and accounts for the correlation among the dependent variables [65]; it can therefore have more power to detect differences among groups [66,67]. The experimental design met the assumptions relating to the measurement of the dependent variables on an interval scale, the independence of observations, and adequate sample size (more observations than dependent variables; in this case, 22 observations per method vs. 11 variables).
Regarding the additional assumptions of the MANOVA, we conclude that: (1) visual inspection of scatterplots suggests that the linearity condition (no curvilinear pattern between any pair of indicators) is met; (2) the conditions of homogeneity of variance-covariance matrices (Box's M statistic = 124.0, p-value = 0.686) and homogeneity of variances (minimum Levene statistic = 0.895, p-value = 0.414) are fulfilled; (3) there is no multicollinearity among the AEIs (maximum r = 0.580 < 0.90 [67]); (4) none of the three methods satisfies the multivariate normality assumption; however, since the group sizes are equal, the MANOVA is robust to this violation given the absence of multivariate outliers [68,69]; and (5) no multivariate outliers were identified using the Mahalanobis distance (minimum p-value = 0.066). Given this evidence, the MANOVA can be applied to test for significant differences in means among the weighting methods. The results of the MANOVA are shown in Table 5.
As Table 5 shows, three of the four multivariate test statistics indicate no statistically significant difference in means. Furthermore, as Kuhfeld [70] points out, when Roy's largest root disagrees with the other three test statistics, the effect should be considered non-significant. In summary, the results suggest that the AEI weights do not depend on the weighting method (AHP, BWM or PA). It is open to debate whether the consistency of the three methods depends on the use of the same group of experts. Although further research is needed in this regard, the 22 experts were randomly split into three subsamples (8 assigned to AHP, 7 to BWM and 7 to PA) to compare the AEI weights, and the MANOVA led to the same conclusion (p-values of 0.190, 0.256, 0.338 and 0.158 for the four test statistics).
Table 6 shows the main descriptive statistics of the 15 distributions of the composite indicator ENV_SUST (3 weighting methods × 5 values of λ) obtained for the 99 olive farms sampled. The index values clearly vary more with the compensation parameter λ than with the weighting method. In fact, for any given λ there is no statistically significant difference among the means and variances of the ENV_SUST distributions obtained with the three weighting methods, whereas the mean values decrease significantly as the compensation parameter decreases. Although the discussion about the most suitable value of the compensation parameter for measuring farms' environmental performance is beyond the scope of this paper, it is worth pointing out that composite indicators based on complete compensability (λ = 1) have been criticized because trade-offs between base indicators could be considered incompatible with the concept of sustainability [13,40]. It is thus reasonable to opt for indexes that allow partial compensability (i.e., ENV_SUST with 0 < λ < 1).
In any case, the choice of the most suitable value of λ remains an open question for future studies [18].
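Returning to the MANOVA results in Table 5, one reason Roy's largest root can disagree with the other three criteria is that all four statistics are functions of the eigenvalues of $\mathbf{E}^{-1}\mathbf{H}$ (the error and hypothesis sums-of-squares-and-cross-products matrices), but Roy's criterion uses only the largest one. The sketch below computes the four statistics from a list of eigenvalues that are assumed to have been obtained elsewhere (e.g., from statistical software). It follows the convention in which Roy's statistic is the largest eigenvalue itself, whereas some texts report $\lambda_1/(1+\lambda_1)$; the function name is hypothetical, and the conversion of each statistic to an approximate F test is not shown.

```python
import math

def manova_criteria(eigenvalues):
    """Four classical MANOVA test statistics from the eigenvalues of E^-1 H.

    Hypothetical helper; eigenvalues must be computed elsewhere.
    Roy's statistic is reported as the largest eigenvalue (see lead-in).
    """
    lams = [float(x) for x in eigenvalues]
    if any(x < 0 for x in lams):
        raise ValueError("eigenvalues of E^-1 H must be non-negative")
    return {
        "Pillai's trace": sum(x / (1 + x) for x in lams),
        "Wilks' lambda": math.prod(1 / (1 + x) for x in lams),
        "Hotelling-Lawley trace": sum(lams),
        "Roy's largest root": max(lams),
    }

if __name__ == "__main__":
    # One dominant root: Roy's criterion reacts to it, the others dilute it.
    for name, value in manova_criteria([0.9, 0.01, 0.01]).items():
        print(f"{name}: {value:.3f}")
```

Because Pillai's trace, Wilks' lambda and the Hotelling-Lawley trace pool information across all roots, an effect concentrated in a single dimension can make Roy's criterion significant while the pooled criteria are not, which is the kind of discrepancy Kuhfeld's rule addresses.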

Assessing the Environmental Performance of Irrigated Olive Farms
The agreement among the rankings of farms produced by the composite indicator ENV_SUST under the three weighting methods, for each of the five values considered for the compensation parameter (λ = 0, λ = 0.25, λ = 0.5, λ = 0.75 and λ = 1), is assessed using Kendall's coefficient of concordance, or Kendall's W [71]:
$$W = \frac{12 S^2 - 3 m^2 k (k+1)^2}{m^2 k \left(k^2 - 1\right)}$$
where $m = 3$ (AHP, BWM and PA), $k = 99$ (number of farms), $r_{ij}$ is the ranking of farm $i$ by method $j$, $R_i = \sum_{j=1}^{m} r_{ij}$ and $S^2 = \sum_{i=1}^{k} R_i^2$. To test the null hypothesis of no agreement among the methods ($W = 0$), the statistic $m(k-1)W \sim \chi^2_{k-1}$ is used. As the results in Table 7 show, the ranking of the irrigated olive farms based on their environmental performance (i.e., values of the composite indicator ENV_SUST), for any value of λ considered, does not depend on the weighting method used (AHP, BWM or PA). Furthermore, when the 15 rankings are considered simultaneously, the overall Kendall's W indicates a strong level of concordance (Kendall's W = 0.705). This suggests that, regardless of the weighting method or the compensation parameter, all ENV_SUST measurements provide similar rankings of the sampled irrigated olive farms.
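The concordance test described above can be sketched directly from the farms' composite-indicator scores. The code below converts each method's scores into ranks (giving tied values their average rank), forms the rank sums $R_i$ and $S^2 = \sum_i R_i^2$ as defined in the text, and returns W together with the statistic $m(k-1)W$. It is a minimal sketch that assumes no tie correction; the function names and toy scores are hypothetical.

```python
def average_ranks(scores):
    """Rank scores from 1 (lowest) to k, giving tied values their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j share ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks

def kendalls_w(scores_by_method):
    """Kendall's W (no tie correction) and the chi-square statistic m(k-1)W.

    scores_by_method[j][i] is the score of farm i under weighting method j.
    """
    m = len(scores_by_method)
    k = len(scores_by_method[0])
    rankings = [average_ranks(s) for s in scores_by_method]
    r_totals = [sum(rankings[j][i] for j in range(m)) for i in range(k)]  # R_i
    s2 = sum(r ** 2 for r in r_totals)                                     # S^2
    w = (12 * s2 - 3 * m ** 2 * k * (k + 1) ** 2) / (m ** 2 * k * (k ** 2 - 1))
    chi2_stat = m * (k - 1) * w  # approx. chi-square with k - 1 df under W = 0
    return w, chi2_stat

if __name__ == "__main__":
    toy = [[0.2, 0.5, 0.9], [0.1, 0.4, 0.8], [0.3, 0.6, 0.7]]  # 3 methods, 3 farms
    print(kendalls_w(toy))
```

Ranking from lowest to highest score is arbitrary; W is unchanged as long as all methods are ranked in the same direction. If SciPy is available, the p-value of the statistic can be obtained with `scipy.stats.chi2.sf(chi2_stat, k - 1)`.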

Conclusions
Measuring environmental sustainability in agriculture through the construction of composite indicators is a widespread practice, albeit a difficult one. An especially challenging aspect is the choice of the most appropriate methods to normalize, weight and aggregate the large set of base agri-environmental indicators usually considered. In particular, the results of environmental composite indicators depend heavily on how the base indicators are weighted (i.e., on whether the indicator weights accurately capture the relative importance of each AEI included in the index). For this reason, our main aim in this paper was to analyze the robustness of three alternative weighting methods (AHP, BWM and PA). To that end, we consulted a panel of experts and compared the results obtained for an environmental index implemented in a real-world case study of irrigated olive farms in Spain.
In light of the results, we identify three main findings: first, there is a high level of consistency in experts' assessments of AEI weights under each of the three weighting methods; second, there are no statistically significant differences in the means of the AEI weights estimated with the three methods; and third, the composite indicator ENV_SUST built with the three alternative weighting techniques produces similar rankings of the irrigated olive farms in terms of their environmental performance. Further evidence on the consistency of the weighting methods is needed to confirm whether these findings are generalizable. In any case, it can be hypothesized that similarly consistent results can be expected whenever the construction of the composite indicator (i.e., weighting) focuses on the assessment of technical concepts, where expert opinion is rooted in empirical knowledge.
Overall, these findings provide useful empirical insights into the robustness of the two multi-criteria methods, AHP and BWM, and the more straightforward PA as weighting techniques for constructing composite environmental indicators. However, although all three methods are valid, feasible tools for determining the weights of the individual base indicators, PA could become cumbersome when a large number of indicators (e.g., more than six) must be included in a single index [44].
Beyond the methodological focus, it is also worth pointing out that the ENV_SUST composite indicator implemented in this study is sufficiently stable and methodologically sound to measure the environmental sustainability of farms and thus to support the design of targeted policy interventions. In this context, there is still room for research on the practical implementation of this environmental index to analyze the heterogeneous environmental performance among farms (e.g., by determining reference and threshold values) and to track changes in agricultural practices (e.g., irrigation or fertilization). This practical information could help policy-makers design results-based agri-environmental programs, setting the level of payments to be granted to each farm, or other policy instruments with a similar purpose (e.g., fiscal or qualitative rewards).
Funding: This research was partially funded by the Andalusian Department of Economy and Knowledge and the European Regional Development Fund (ERDF) through the research project FINAGUA (UCO-1264548). These funding institutions had no involvement in the conduct of the research or the preparation of the paper.