Methodologies for the Sustainability Assessment of Agricultural Production Systems, with a Focus on Rice: A Review

Abstract: The intensification of agricultural production is connected to the increased use of fertilizers, pesticides, irrigation water, and energy. Among all cropping systems, rice cultivation is considered to be one of the most significant sources of environmental harm due to the flooding conditions in which rice normally grows; at the same time, rice has important economic and social implications, especially in areas where it is a staple food. In the last 20 years, sustainable development of agricultural production has become a priority for scientific research and policy programs. Several studies proposed methodological frameworks to assess the impacts of different management practices adopted in agro-ecosystems and to identify strategies to mitigate the negative effects of agricultural intensification. Such methodologies are based on the use of particular indicators, which are increasingly seen as crucial tools in impact assessment studies and for decision making. This paper aims to review and analyze the most significant methodological frameworks developed to assess the sustainability of agricultural production systems, with a particular focus on rice cultivation. The analysis includes highlighting which dimensions of sustainability (economic, environmental, social, and governance) are covered by each method and identifying which indicators are used to describe the different dimensions. The spatial scale of the application of the indicators, their typology, the data needed for their implementation, and the criteria for formulating the overall sustainability judgment were then examined. The analysis highlighted the scarce availability of clear operational data for the calculation of the indicators and the often-limited involvement of stakeholders in the development and implementation of the methodologies. The exceptions to these limitations are represented by a few methodologies developed under the umbrella of important international organizations to promote sustainability and research efficiency in specific agricultural production systems, such as the Sustainable Rice Platform (SRP) for rice. Finally, the analysis shows that there is a need to develop methodologies that are applicable not only to an individual farm or group of farms, but also at larger spatial scales (district, watershed, region), which are often those of greatest interest to decision makers.


Introduction
The expansion of agricultural land and the intensification of agricultural production methods are approaching environmental, social, and economic limits [1]. In particular, the unsustainable use of natural resources in the agricultural sector has caused a variety of negative environmental outcomes, such as groundwater depletion, agrochemical pollution, soil exhaustion, and increased greenhouse gas emissions [2,3]. It is therefore feared that the productivity of many intensive systems cannot be maintained under current management practices [4]. Thus, it is necessary to move towards more eco-efficient management strategies to minimize the negative effects on the environment [5,6]; however, this must be achieved while maintaining crop productivity in order to meet global food demand and to safeguard (or improve) the economic profitability of farms and social well-being [7].
Among all crops, rice has an important position because it is one of the most widely grown in the world and it is the primary food for more than half of the global population. Rice is grown in most countries under flooding conditions using large volumes of water, which may also mobilize agrochemicals; moreover, anoxic conditions are responsible for the emission of an important greenhouse gas (methane). From 1961 to 2019, the rice area harvested globally increased from about 115 million ha to about 162 million ha, with significant conversions of natural lands to arable lands. Methane emissions are estimated to have increased from 17,400 to 24,100 kilotons in the same period [8]. Yields increased dramatically with the introduction of high-yielding crop varieties, farm mechanization, various types of chemical fertilizers, and pesticides [3,9]. Farmers have often adopted the attitude of 'more is better' to increase yields, regardless of economic and environmental costs [6,10]. Among others, refs. [11-13] have stated that the current situation and trends of rice farming are not sustainable, and that rice sustainability assessment has become a key challenge for policymakers, development planners, researchers, and academics in many areas of the world.
In the last 20 years, scientists and experts have felt the pressing need to involve international organizations, policymakers, governments, and local and national institutions in discussions on how to increase the agricultural sustainability of all cropping systems. These discussions led to the development of a number of methodological frameworks designed to assess the sustainability of agricultural production systems. The methodologies commonly rely on sets of indicators, usually grouped under the following three 'sustainability dimensions': ecological, economic, and social. Sometimes the governance dimension is added to these. Thus, each dimension unfolds into a set of indicators, each of which is built to investigate a specific aspect of sustainability.
The objective of this paper is to provide a review of the main methodologies developed to assess the sustainability of agricultural production systems, with a special focus on assessments applied to or specifically developed for rice systems. To achieve this objective, the paper is structured as follows: main sustainability assessment methodologies based on indicators and sustainability dimensions (Section 2); indicator selection and type of data required (Section 3); spatial scale of the assessment and data collection (Section 4); overall sustainability judgement (Section 5); sustainability assessment of rice agro-ecosystems (Section 6); discussion on the criticalities of the existing methodologies (Section 7); concluding remarks and future research needs (Section 8).

Main Sustainability Assessment Methodologies Based on Indicators and Sustainability Dimensions
Several methodologies to assess agricultural sustainability can be found in the literature. Table 1 summarizes the principal characteristics of the most interesting methodologies which are based on indicators. In particular, it reports the purpose for which the assessment techniques were developed, the spatial scale at which they were applied, the proponents of each methodology, the dimensions of sustainability taken into account, and the approach used to reach a quantification of the overall sustainability of the production system. More detailed information about the number and names of indicators considered under each sustainability dimension is reported in an additional table in the Supplementary Materials. The three classical dimensions of sustainability (i.e., 'environmental', 'social', and 'economic') are included in a number of methodologies (Table 1). These include: RISE 'Response-Inducing Sustainability Evaluation' [19,20], SAFE 'Sustainability Assessment of Farming and the Environment' [27], MOTIFS 'Monitoring tool for integrated farm sustainability' [33], IDEA 'Indicateurs de Durabilité des Exploitations Agricoles' [24], the MESMIS framework 'Evaluating the sustainability of complex socio-environmental systems' [35], SEAMLESS 'System for Environmental and Agricultural Modelling; Linking European Science and Society' [34], and SRP 'Sustainable Rice Platform methodology' [44-46]. Only a few methodologies add 'governance' as a fourth dimension in the sustainability analysis: SAFA 'Sustainability assessment of food and agriculture systems' [29-31], SIRIUS 'Sustainable Irrigation water management and River-basin governance: Implementing User-driven Services' [1], and PSDCIFASA 'Problem-oriented Status-Driver Composite Indicator-base Framework of Agricultural Sustainability Assessment' [2].
However, several methodologies focus only on one or two sustainability dimensions. For instance, the life cycle assessment (LCA) methodology is often adopted when only the environmental dimension is examined [14,48-52]. Similarly, the Agri-environmental Footprint Index (AFI) methodology, presented by [15], addresses only the environmental dimension [16]. Other methodologies also use economic indicators to perform an agroecological and economic sustainability assessment; examples are Bechini and Castoldi (2010) [28] and SOSTARE 'Analysis of farm technical efficiency and impacts on environmental and economic sustainability' [32].
Whatever the sustainability dimensions considered, they are usually broken down into a series of themes, which are sometimes further subdivided into subthemes, which are in turn linked to a set of indicators. Thus, each sustainability dimension relies on a set of indicators.

Indicator Selection and Type of Data Required
The selection of indicators is a crucial step in the assessment of sustainable agricultural systems. Given its importance, the issue of defining robust criteria for indicator selection has been widely discussed in the literature [2,3,17,27,53,54]. For instance, [53] states that 'defining an appropriate set of indicators for the sustainable development of an agricultural sector is a difficult task' and explains that 'if too few indicators are monitored, crucially important developments may escape attention. On the other hand, if too many indicators are considered, data collection and data elaboration become difficult to manage at a reasonable cost, redundancies might appear and the message expressed by the indicator set becomes difficult to understand'. Ideally, the developed/selected indicators should be RACER: relevant, accepted, credible, easy, and robust [16,55]. Following [56], indicators need to meet two criteria: (i) they must objectively measure progress towards achieving sustainable development goals; and (ii) they must be easy to use and also applicable to local conditions. Details on indicators proposed by each methodological approach are reported in the Supplementary Materials.
Many indicators require quantitative data, directly measured or available from other studies, that are then elaborated upon and used in the computation of the indicators' values. This is usually the case for indicators under the economic sustainability dimension, such as farm profitability or energy productivity. Quantitative data are also required for many indicators under the environmental sustainability dimension, such as greenhouse gas emissions or pesticide-use efficiency. Examples of environmental indicators based on quantitative data are those computed in the 'inventory analysis' phase of the LCA methodology. The 'inventory analysis' is based on the following two steps: (i) collection of 'input data', relating to production factors such as inputs of organic and mineral fertilizers, water, pesticides, and energy consumption for agronomic and irrigation operations; and (ii) retrieval of 'output data', relating to crop production, straw production, and fertilizer and pesticide emissions to air, surface water, groundwater, and agricultural soils. Emissions can be measured directly in the field or estimated using models. In the absence of field measurements, a number of models have been applied in different studies. For instance, the IPCC model [57] has usually been adopted for the estimation of methane and nitrous oxide emissions to the air, while the SALCA-Nitrate [58] and SQCB models [58,59] have been used to estimate nitrate leaching to groundwater in European countries and elsewhere. Moreover, the SALCA model has been used in studies that estimate pesticide leaching in surface water, groundwater, and agricultural soils [58]. After the 'inventory analysis' phase, inventory data are aggregated to compute the final LCA indicators (e.g., climate change, acidification of soil and water, eutrophication, freshwater aquatic ecotoxicity, mineral resource depletion, and fossil fuel depletion), which are subsequently used to provide an overall sustainability judgment with respect to the environmental dimension (see Section 5).
In contrast to indicators based on data that are quantitative by nature, other indicators used in the sustainability assessment of agro-ecosystems are subjective. They mainly refer to the social and governance sustainability dimensions; examples are indicators related to working conditions or social security. In these cases, information is usually collected through stakeholder consultations, which involve questionnaires or interviews that are often based on closed-ended questions to ensure that the data collected can be reliably compared among respondents and are statistically processable. Results are then combined to obtain a value that can be expressed either through categories (e.g., 'safe', 'fairly safe', etc.), or converted into a score that can be referred to on a scale (e.g., in the range 0-100, with higher values indicating a higher safety level). For instance, the health and safety indicator proposed by the SRP methodology [45,46] is calculated from the results of a questionnaire including multiple-choice questions related to work conditions on the farm, where a higher score indicates a higher level of safety.
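The conversion of questionnaire answers into a score and a category can be illustrated with a minimal sketch. The point values, the number of questions, and the category cut-offs below are hypothetical and are not taken from the SRP documentation or from any other methodology reviewed here.

```python
def questionnaire_score(points_earned, max_points):
    """Convert the points earned on closed-ended questions to a 0-100 score."""
    if len(points_earned) != len(max_points):
        raise ValueError("one answer is required per question")
    total_max = sum(max_points)
    if total_max <= 0:
        raise ValueError("maximum attainable points must be positive")
    return 100.0 * sum(points_earned) / total_max


def safety_category(score):
    """Map a 0-100 score to a category (cut-offs are illustrative assumptions)."""
    if score >= 75:
        return "safe"
    if score >= 50:
        return "fairly safe"
    return "unsafe"


# Hypothetical farm: five questions, each worth up to 2 points
answers = [2, 1, 2, 0, 1]
score = questionnaire_score(answers, [2, 2, 2, 2, 2])
print(score, safety_category(score))
```

Keeping the score and the category as separate functions reflects the two output forms mentioned above: a numerical value on a scale, or a qualitative class derived from it.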

Spatial Scale of the Assessment and Data Collection
The methodologies listed in Table 1 are applied to specific spatial scales during an assessment. Thus, data collection and the presentation of results also occur at a certain spatial scale. Most methodologies are based on data collected at the field/farm scale, such as ISAP, IDEA, [17,18,28], SOSTARE, MOTIFS, SEAMLESS, SAFE, LCA, SRP, and AFI. Some studies collect data from many farms in a certain geographical area (e.g., a natural park, a province, or a region) and their goal is to present general results for one or more cropping systems in that area (taken individually or compared with one another). Such a study was carried out by [18] to compare 45 and 65 farms operated under organic and conventional agricultural management systems, respectively, in Bangladesh. Another example is the ISAP methodology, which was applied at the farm scale in the UK using data collected through a survey conducted in 80 organic and 157 conventional horticultural farms. The SAFE methodology was explicitly designed to cover three spatial levels: the field level (data collection and estimation of sustainability indicators at the field level); the farm level (computing a weighted average of sustainability indicators calculated at the field level to reach the farm level); and a higher spatial level that can be landscape, regional, or national scale (based on a weighted average of farm sustainability indicators to obtain indicators which apply to larger spatial scales). The approach adopted in SAFE to cover the three levels is described in [26].
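The field-to-farm-to-region aggregation by weighted averages can be sketched as follows. This is only an illustration: the text does not specify which weights SAFE uses, so weighting by cultivated area is an assumption here, and all names and numbers are hypothetical.

```python
def weighted_average(values, weights):
    """Return the weighted mean of `values` using non-negative `weights`."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and of equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def farm_indicator(fields):
    """Aggregate (indicator value, area in ha) pairs of one farm.

    Returns the area-weighted farm value and the total farm area.
    """
    values = [value for value, _ in fields]
    areas = [area for _, area in fields]
    return weighted_average(values, areas), sum(areas)


# Hypothetical field-level indicator values (0-1) and field areas (ha)
farm_a = [(0.8, 10.0), (0.6, 30.0)]
farm_b = [(0.4, 20.0)]

a_value, a_area = farm_indicator(farm_a)
b_value, b_area = farm_indicator(farm_b)

# Regional level: weighted average of the farm-level values (again area-weighted)
regional_value = weighted_average([a_value, b_value], [a_area, b_area])
print(round(a_value, 3), round(b_value, 3), round(regional_value, 3))
```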
On-farm data can be collected through structured questionnaires compiled during face-to-face interviews with farmers or other key actors within the agricultural sector [14,28], or by using computer-assisted personal interviewing software [60]. Questionnaires are generally used to collect the on-farm data needed for the calculation of economic and environmental indicators. When very accurate or very specific data are required, information can be directly measured by researchers or other institutions conducting the study [45,46]. Otherwise, literature sources and existing databases, such as the ecoinvent database [61], can be a valuable source of the information needed for the assessment (e.g., the energy consumption of the most common agricultural machinery).
To facilitate the collection of data and the calculation of indicators at the farm scale, the SRP Performance Indicators methodology [45,46] describes the data needed for each specific indicator based on three levels of detail: basic, intermediate, and advanced. These levels correspond to different data requirements and to the varying complexity of the numerical procedures to be applied in the calculation of the indicators. In other words, the SRP approach allows the user to select the most suitable complexity level, that is, the level for which: (i) data are already available or can be collected; and (ii) the user's knowledge is sufficient to calculate the indicator. When the users are farmers, the basic level is usually selected. Collection of basic data (Level 1) is considered to be an entry point, while the collection of more detailed, higher-level data and the use of more complex calculation procedures are left to more skilled users (e.g., researchers, environmental associations, and institutions that adopt the methodology for specific purposes).
Few methodologies have been developed to directly make use of data collected at large spatial scales (e.g., district, watershed, or region) and provide results that are valid at these scales; in this review, only the SIRIUS procedure exhibited such characteristics. SIRIUS was developed to assess the sustainability of irrigated agricultural systems and was applied in ten irrigation districts located in eight countries with diverse levels of agricultural development, environmental conditions, socio-economic settings, and political contexts, all sharing the characteristic that water use was a critical issue for agricultural sustainability. Most of the information used to calculate indicators focused on water balance, water quality, biodiversity, education, and health (among others), and was collected at the irrigation district level by a team which included researchers, agricultural association members, and water managers from the ten pilot areas involved in the project. Other data were obtained from officially published data (e.g., official statistics) or other verifiable sources (e.g., scientific literature). However, when necessary, data were collected at the farm level and then up-scaled to the irrigation district level, such as in the case of data related to production costs and revenues [1]. The SIRIUS methodology, due to the spatial scale taken into consideration, differs from the others presented in this paper. Unfortunately, documents illustrating the approach are not particularly detailed, and for this reason its application may not be straightforward.

Overall Sustainability Judgement
To evaluate the overall sustainability performance of an agro-ecosystem, once all the indicators have been calculated, the following procedures are the most commonly used: (A) the comparison of indicator results with reference values; (B) weighting and scoring of indicator results; and (C) scale rating of indicator results. Alternatively, some studies apply a fourth procedure: (D) statistical techniques to compare the indicator values of two or more farms or cropping systems, without reaching an absolute judgment about their sustainability performance.
The first approach (A, the comparison of indicator results with reference values) is adopted, for instance, by [17], and in the SIRIUS, SAFA, MESMIS, and SRP sustainability procedures (Table 1). The reference values express the desired level to be reached by each indicator. Reference values are usually determined by scientific studies or coincide with values provided in legislation. When legal values are used, it is important to keep in mind that such values are typically the result of a negotiation among policy makers, farmers' representatives, advisory organizations, and scientists, and therefore they do not necessarily express optimal values for farmers, the environment, or society.
Reference values are often introduced as threshold values (i.e., as a minimum, a maximum, or a range of acceptable values). Ref. [62] used reference values in their study on current soil fertility management practices in wheat, cotton, and chive agroecosystems and their impact on nitrate concentration in groundwater in the North China Plain. In the study, the authors adopted a set of threshold values indicating a good sustainability level: ≥75 mg/kg for soil N content, ≥10 mg/kg for soil P content, ≥100 mg/kg for soil K content, between 6 and 7 for soil pH, and ≤1% for soil organic matter content (derived from previous studies of [63,64]), as well as ≤50 mg/L for nitrate concentration in groundwater (proposed by the World Health Organisation [65]). Another example of a methodology adopting the use of reference values is the SAFA procedure, which reported a reference value for each calculated indicator; for instance, nitrogen and phosphorus balances must not deviate by more than 10% from zero.
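A comparison against threshold-type reference values can be expressed as a simple range check. The sketch below uses a subset of the thresholds quoted above from [62]; the dictionary keys, function names, and the sample measurements are hypothetical and serve only as an illustration.

```python
# (lower bound, upper bound); None means the side is unbounded
REFERENCE_RANGES = {
    "soil_N_mg_per_kg": (75.0, None),
    "soil_P_mg_per_kg": (10.0, None),
    "soil_K_mg_per_kg": (100.0, None),
    "soil_pH": (6.0, 7.0),
    "groundwater_nitrate_mg_per_L": (None, 50.0),
}


def meets_reference(value, lower, upper):
    """Return True if `value` lies within the acceptable range (bounds inclusive)."""
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True


def assess(measurements):
    """Check each measured indicator against its reference range."""
    return {
        name: meets_reference(value, *REFERENCE_RANGES[name])
        for name, value in measurements.items()
    }


# Hypothetical measurements for one site
sample = {
    "soil_N_mg_per_kg": 80.0,
    "soil_P_mg_per_kg": 8.0,
    "soil_K_mg_per_kg": 120.0,
    "soil_pH": 6.5,
    "groundwater_nitrate_mg_per_L": 62.0,
}
result = assess(sample)
print(result)
```

In this hypothetical case the site would fail the soil P and groundwater nitrate checks while meeting the other three reference values.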
The second approach (B, weighting and scoring of indicator results) is adopted by many sustainability assessment methodologies, such as ISAP, IDEA, SOSTARE, MOTIFS, PSDCIFASA, SEAMLESS, [25], LCA, and AFI. This approach usually consists of the following operations: (i) calculation of numerical indicators; (ii) conversion of indicator values to nondimensional variables (normalization) in order to allow a comparison among indicators; (iii) weighting, by assigning a specific weight to each indicator (usually based on expert opinion or literature reviews), a step which often depends on the focus of the sustainability assessment; and (iv) aggregation, by combining (often summing) the weighted indicators to obtain a final sustainability index, also called 'score', such that the higher the score, the more sustainable the agro-ecosystem.
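Steps (ii) to (iv) can be sketched as below. Linear rescaling between a 'worst' and a 'best' value is only one possible normalization and is an assumption of this example, as are the indicator names, bounds, and weights; each methodology listed above defines its own functions and weights.

```python
def normalize(value, worst, best):
    """Rescale a raw indicator linearly to 0-1 (worst -> 0, best -> 1).

    Also works when lower raw values are better (worst > best).
    Values outside the [worst, best] interval are clipped.
    """
    if worst == best:
        raise ValueError("worst and best must differ")
    scaled = (value - worst) / (best - worst)
    return min(1.0, max(0.0, scaled))


def sustainability_score(normalized, weights):
    """Weighted sum of normalized indicators, divided by the total weight."""
    total_weight = sum(weights[name] for name in normalized)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[name] * normalized[name] for name in normalized) / total_weight


# Hypothetical raw values with (worst, best) bounds for one farm
normalized = {
    "grain_yield_t_per_ha": normalize(6.0, worst=4.0, best=8.0),
    "nitrate_leaching_kg_per_ha": normalize(15.0, worst=60.0, best=0.0),
    "gross_margin_eur_per_ha": normalize(900.0, worst=0.0, best=1200.0),
}
weights = {
    "grain_yield_t_per_ha": 0.4,
    "nitrate_leaching_kg_per_ha": 0.4,
    "gross_margin_eur_per_ha": 0.2,
}
score = sustainability_score(normalized, weights)
print(round(score, 3))
```

Dividing by the total weight keeps the score in the 0-1 range even if the weights do not sum to one, so a higher score still corresponds to a more sustainable system.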
The third approach (C, scale rating of indicator results) is adopted by the RISE and SAFE methodologies. Following this approach, the individual indicators are normalized and then a score is assigned to each of them; subsequently, the scores of the individual indicators are used directly (as in the case of RISE) or aggregated (as in the case of SAFE) to evaluate the overall sustainability of a cropping system. As an example, in the SAFE methodology, the procedure adopted is the following: (i) indicator values are normalized through functions that assign to each indicator the corresponding value of a sustainability index (SI) that ranges from 0 ('unacceptable level of sustainability') to 1 ('desired level of sustainability'); (ii) SIs of all indicators are weighted; and finally, (iii) weighted SIs are aggregated to obtain the overall sustainability index, which also ranges from 0 to 1, with lower values indicating low sustainability and higher values indicating high sustainability of the agro-ecosystem.
Finally, the methodologies of group D adopt statistical methods to compare two or more production systems. For instance, [18] compared two systems, organic farms (45) and conventional farms (65), in terms of their environmental soundness, economic viability, and social acceptability, by using different statistical tests for the different indicators. Results of the statistical analysis showed that significant differences were found between the two systems in relation to crop diversification, soil fertility management, and crop protection. However, no significant differences were found with respect to other indicators such as crop yield and stability, or food security. Ref. [28] evaluated the effect of cropping system management on the environment and economic profitability in northern Italy at the crop and field levels by using simple statistical indicators, such as the average and standard deviation of indicator values. This approach highlighted that when a single crop was considered, the gross margin of soybean was lower than that of rice, while, at the field level, the gross margin of maize cultivated in rotation with other crops was lower than that of continuous maize. Ref. [60] applied the SRP Performance Indicators procedure to six rice-producing Asian countries. Since the SRP documentation does not provide reference values for the indicators, the authors proposed a statistical approach to overcome this problem. Performance indicators were calculated first for all the farms selected, and then a baseline value (population mean) and a target value (top decile) for each indicator were calculated for each country and compared to identify the gaps between them, as well as to provide a pathway of action to be adopted by farmers to meet the target value, such as the adoption of best management practices. In the study of [60], across the six sites, there was a yield gap of 24-42% and a profit gap of 36-82% between the baseline and the target value. In addition, there was a labor productivity gap of 12-32%, a nitrogen use efficiency (NUE) gap of 11-20%, a phosphorus use efficiency (PUE) gap of 1-29%, and a water productivity gap of 12-42%. Consequently, strategies that could reduce the gap between the baseline and target values were identified.
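The baseline/target comparison can be illustrated with a short sketch. It assumes an indicator for which higher values are better, takes the target as the mean of the top decile of farms, and expresses the gap as a percentage of the target; the exact gap formula used in [60] is not given in the text, so this definition and the yield values are assumptions.

```python
import statistics


def baseline_target_gap(values, top_fraction=0.10):
    """Return (baseline, target, gap in %) for a 'higher is better' indicator.

    baseline: mean of all farms
    target:   mean of the best-performing `top_fraction` of farms
    gap:      100 * (target - baseline) / target  (assumed definition)
    """
    if not values:
        raise ValueError("at least one farm value is required")
    ranked = sorted(values, reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    baseline = statistics.mean(ranked)
    target = statistics.mean(ranked[:n_top])
    gap_percent = 100.0 * (target - baseline) / target
    return baseline, target, gap_percent


# Hypothetical grain yields (t/ha) of ten farms in one country
yields = [4.0, 4.5, 5.0, 5.0, 5.5, 5.5, 6.0, 6.0, 6.5, 8.0]
baseline, target, gap = baseline_target_gap(yields)
print(round(baseline, 2), round(target, 2), round(gap, 1))
```

For indicators where lower values are better (e.g., an emission intensity), the ranking and the sign of the gap would need to be reversed.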

Sustainability Assessment of Rice Agro-Ecosystems
Only a few of the methodologies described in Table 1 have been applied to cropping systems in which rice plays an important role. The list of indicators proposed by these methodologies is given in Table 2.
The SOSTARE methodology was adopted to assess the sustainability of farm production in one of the most intensively cultivated areas in Europe, that is, the Po plain in northern Italy [32], where rice is one of the most important crops (maize, permanent meadows, and winter cereals were also considered). Data were collected from 68 farms which were considered to be representative of the region. In particular, livestock farms (dairy, cattle, and swine) account for about 25% of the total farms operating in the study area. Cropping system management is mostly intensive and conventional: less than 15% of the farms in the area apply agro-environmental measures, such as reduced use of mineral fertilizers, and fewer than 1% of the farms are organic.
The methodology proposed by [18] was used to assess the sustainability of ecological and conventional agricultural systems in the Tangail district in Bangladesh, including paddies, which cover a large area within the district, together with wheat, potato, and other crops. To make a comparison between the two agricultural systems, two villages, one adopting conventional practices and another using ecological techniques, were selected in the Delduar subdistrict and considered to be representative of the two cropping systems in the region.
The methodology proposed by [28] was developed to evaluate the effects of cropping system management on the economic and environmental sustainability of agricultural systems in northern Italy. The study area is a regional agricultural park with cereal and livestock farms, which cultivate mostly maize, rice, permanent meadows, winter wheat, winter barley, Italian ryegrass, triticale, and soybean.

Table 2. Indicators proposed in the literature for the sustainability assessment of rice production at the farm scale.

| Proposed Indicators in the Literature | Dimension | Sources |
|---|---|---|
| Profitability and productivity: grain yield, labor productivity | Economic | [6,45,46,60] |
| Variable costs, gross income, gross margin | Economic | [28] |
| Value of production, value added, farm household income, independence from CAP subsidies, farm business diversification | Economic | SOSTARE [32] |
| N-use efficiency, P-use efficiency | Environmental | [6,28,45,46,60] |
| K-use efficiency | Environmental | [6,60] |
| Water productivity and water quality, greenhouse gas emission | Environmental | [6,45,46,60] |
| Pesticide use efficiency, biodiversity | Environmental | [45,46] |
| Fossil energy input, energy output, dependency of food and feed production on non-renewable energy, load index algae, load index crustaceans, load index fish, load index rats, environmental exposure (air), environmental exposure (soil), environmental exposure (groundwater), crop sequence indicator, soil cover index, soil organic carbon indicator | Environmental | [28] |
| Land-use pattern, cropping pattern, soil fertility management, pest and disease management, soil fertility status | Environmental | [18] |
| Cropping system and soil fertility, nutrient application and management, consumption of non-renewable energy, water resource management, agrochemical management, natural value of the farm, functional landscape pattern | Environmental | SOSTARE [32] |
| Climate change (CC), ozone depletion (OD), terrestrial acidification (TA), freshwater eutrophication (FE), marine eutrophication (ME), human toxicity (HT), photochemical oxidant formation (POF), particulate matter formation (PMF), terrestrial eco-toxicity (TET), freshwater eco-toxicity (FET), marine eco-toxicity (MET), ionizing radiation (IR), agricultural land occupation (ALO), urban land occupation (ULO), natural land transformation (NLT), water depletion (WD), mineral resource depletion (MRD), fossil fuel depletion (FD) | Environmental | LCA [38,48,51,66] |
| Social acceptability in terms of: input self-sufficiency, equity, food security, risks and uncertainties involved in crop cultivation | Social | [18] |
| Food safety, worker health and safety, child labor and youth engagement, women empowerment | Social | [45,46] |

Still fewer methodologies have been applied in specific studies exclusively to the rice agro-ecosystem; among them, life cycle assessment (LCA) has been the most frequently used. LCA was used to study the environmental impact of rice produced in northern Iran [38] and in Bangladesh [51]. In the study presented by [38], 100 paddy fields of different sizes, utilizing different agricultural management practices (traditional and semi-mechanized) and input regimes (low, conventional, and high), were selected and compared in two Iranian regions (Amol and Rasht). In the research presented by [51], general data describing the baseline rice paddy in Bangladesh used for the LCA were obtained mainly from available datasets, literature studies, and by interviewing rice farmers and experts of the Bangladesh Rice Research Institute. Ref. [48] assessed the environmental profile of organic rice cultivation in the Pavia district (Lombardy region, northern Italy) by collecting data from 19 paddy fields; the objective of the study was to highlight the main environmental hotspots for organic rice in northern Italy. In the same rice-growing area of northern Italy (Vercelli district, Piedmont region), [66] applied the LCA approach to assess the environmental consequences of rice production by considering the cultivation practices most frequently adopted, with particular attention paid to different straw management strategies. Data describing the cropping systems were identified from published literature and interviews with experts (researchers, representative farmers, and technicians), as well as through surveys conducted in farms of the Vercelli district, which were chosen with the support of Ente Nazionale Risi (Italian Rice Bureau).
In recent years, the Sustainable Rice Platform (SRP) initiative has been developed specifically to promote sustainable rice cultivation through the introduction of a range of tools, including the SRP Standard on Sustainable Rice Cultivation [45,46] and a set of twelve SRP Performance Indicators [45,46] connected to the economic, social, and environmental sustainability dimensions of rice production (Table 2). The data needed and the approaches for calculating each individual indicator proposed by SRP [45,46] change according to the level of analysis that is selected (basic/intermediate/advanced), as already explained in Section 4. Ref. [60] applied the SRP Performance Indicators methodology version 1.0 [44] to compare the economic and environmental sustainability performance of rice production among six intensive rice-producing regions in Asia. This was the first multi-country comparison using SRP indicators. The specific objectives of the study were to suggest priorities for research and development in each country and to provide indications for setting target values of the different indicators (in particular, the mean indicator value of the top-decile farmers in each country was suggested as the target value). Another recent study that applied the SRP Performance Indicators methodology was that of [6], which was carried out in the Mekong Delta in Vietnam. In Vietnam, rice production is characterized by an overuse of inputs (fertilizers and pesticides), coupled with rising production costs, which make it increasingly difficult to produce rice while maintaining economic profitability and a low impact on the environment. The SRP Performance Indicators methodology was applied to evaluate the effects of a number of management practices proposed by the Vietnam Ministry of Agriculture and Rural Development from 2003 to 2006 and those already implemented (e.g., using good-quality seeds, which should reduce seeding rates; the optimized use of water, fertilizer, and pesticide inputs; changes in postharvest management), in order to identify the most effective techniques for increasing the overall sustainability of the system.
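The target-setting rule suggested in [60] (the mean indicator value of the top-decile farmers) can be illustrated with a minimal sketch. The function name, the convention used to delimit the top decile (the best-ranked tenth of the farms, rounded down, with at least one farm), and the net-profit figures below are assumptions made purely for illustration; they are not taken from [60] or from the SRP documentation.

```python
from statistics import mean


def top_decile_target(values, higher_is_better=True):
    """Return the mean indicator value of the best-performing 10% of farms.

    Assumption: the top decile is the best-ranked len(values) // 10 farms,
    with a minimum of one farm.
    """
    if not values:
        raise ValueError("values must not be empty")
    # Rank farms from best to worst for the chosen indicator direction.
    ranked = sorted(values, reverse=higher_is_better)
    n_top = max(1, len(ranked) // 10)
    return mean(ranked[:n_top])


# Hypothetical net-profit values (USD/ha) for 20 farms in one country.
profits = [310, 420, 515, 280, 600, 455, 390, 720, 505, 340,
           480, 650, 295, 560, 410, 375, 690, 530, 445, 585]
print("Suggested profit target:", top_decile_target(profits))
```

For indicators where lower values are preferable (e.g., pesticide use), `higher_is_better=False` selects the farms with the lowest values instead.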

Discussion on the Criticalities of the Existing Methodologies
In this section, the main issues arising from the review analysis are illustrated and discussed. The first point that emerges is that publications presenting the application of the methodologies to specific case studies often lack a detailed description of the data to be collected and the steps to be undertaken to calculate the indicators and to formulate the final overall sustainability judgment. Among the studies cited in this paper, only [28,45,46,48,60,66] provide a good degree of detail on all the steps to be carried out for an assessment.
Moreover, sustainability assessment methods are often developed by researchers (as shown in Table 1, column 3). Involving stakeholders and experts in the design of the assessment framework from the very first steps is not easy, as it requires much time and effort to understand procedures, to schematize the processes, and to build the indicators. This review made clear that the involvement of stakeholders and experts in the indicator validation phase is often insufficient or completely missing. Even when stakeholders and experts are involved in identifying reference values or in choosing the weighting and scoring procedures used to provide the overall sustainability judgment, the numerical reference values and the criteria adopted in the choice of weighting and scoring are seldom reported in the papers.
Additionally, very few methodologies have introduced indicators applicable at spatial scales larger than an individual farm. While some of the farm-level indicators can also be used to describe an agricultural system at larger spatial scales (e.g., labor productivity, nutrient use efficiency), others may need to be revised, as additional mechanisms and processes come into play at larger scales. For instance, in the case of environmental indicators dealing with water use in agricultural areas, it must be taken into account that percolation below the rooting zone and surface drainage can be seen as pure losses at the field scale, degrading efficiency, while at larger scales these same processes can be considered to contribute to groundwater recharge and to increase water availability for downstream areas, therefore increasing the efficiency of the system [67,68].
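The scale dependence of water-use indicators can be illustrated with a deliberately simplified sketch. It assumes a chain of identical fields in which a fixed fraction of each field's percolation and drainage is reused by the next field downstream, and it defines efficiency as evapotranspiration divided by the fresh water supplied. The function names, the cascade model, and all numerical values are hypothetical and are not drawn from [67,68].

```python
def field_efficiency(applied_mm, et_mm):
    """Field-scale efficiency: evapotranspiration / water applied to the field."""
    return et_mm / applied_mm


def district_efficiency(n_fields, applied_mm, et_mm, reuse_fraction):
    """District-scale efficiency for a chain of identical fields.

    Assumption: a share `reuse_fraction` of each field's percolation and
    drainage (applied - ET) reaches the next field and offsets its demand
    for fresh water.
    """
    fresh_total = 0.0
    return_flow = 0.0  # water inherited from the upstream field
    for _ in range(n_fields):
        # Fresh withdrawal is whatever the inherited return flow does not cover.
        fresh_total += max(0.0, applied_mm - return_flow)
        return_flow = reuse_fraction * (applied_mm - et_mm)
    return n_fields * et_mm / fresh_total


# Hypothetical seasonal values: 1000 mm applied, 600 mm evapotranspired.
print("Field scale:   ", field_efficiency(1000, 600))
print("District scale:", district_efficiency(4, 1000, 600, reuse_fraction=0.5))
```

With no reuse (`reuse_fraction=0`) the two values coincide; any reuse raises the district-scale figure above the field-scale one, which is why averaging farm-level values can understate efficiency at the territorial scale.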
Regarding the assessment of environmental sustainability in rice agro-ecosystems, until the introduction of the SRP approach, there were no methodologies developed specifically for rice, and only a few of the existing methodologies have been applied to cropping systems that include rice. In this context, it seems important to underline that while LCA has so far been the most frequently used methodology for assessing the environmental sustainability of rice systems, this approach has significant shortcomings. For example, LCA cannot specifically highlight the main environmental impacts of rice cultivation (i.e., methane emission, surface water and groundwater pollution due to inappropriate use of pesticides, and the use of large water volumes); rather, these specific impacts are masked within more 'global' indicators (e.g., climate change, acidification of soil and water, eutrophication, freshwater aquatic eco-toxicity, mineral resource depletion, and fossil fuel depletion). Since 2015, the SRP has been important for the construction of indicators for sustainable rice cultivation, which can be used both for self-assessment by rice growers and for evaluating the sustainability of rice agro-ecosystems in the current situation and/or in scenario analysis. The indicators proposed by [45,46] address many relevant environmental aspects that are strictly connected to the traditional flooding conditions applied in paddy fields, including water productivity, water quality, GHG emission, nutrient and pesticide use efficiency, and biodiversity. However, the metrics refer only to the farm scale; it would be relevant to investigate how to extend these indicators beyond the farm boundaries, as already mentioned for other agro-ecosystems.

Concluding Remarks and Future Research Needs
This work provides an extensive review of methodologies developed over the last 20 years to evaluate the effects of management practices adopted in various agro-ecosystems on their overall sustainability. Particular attention was paid to the rice production system, because rice is a widely cultivated crop that can potentially produce significant impacts on the environment.
From the review conducted, some critical issues emerge; above all, the lack of documentation describing in detail all the steps that must be undertaken to apply the assessment methodologies, from data collection to the formulation of the final sustainability judgment. If authors wish to see their indicators and methodologies applied in a variety of economic, environmental, and societal contexts, they should strive to produce comprehensive technical documents describing all the phases of their implementation, in addition to scientific (i.e., journal) papers. Moreover, the active participation of experts and stakeholders in the formulation of indicators and in their validation is still very sporadic. At the very least, the validation phase should be conducted with the active input of stakeholders and experts to ensure the objectivity, reliability, and applicability of the assessment.
The approach recently proposed by [44–47] for rice production systems provides a good example of an assessment tool, complete with detailed technical documents that facilitate its implementation by the different users involved in the sustainable development of rice agro-ecosystems. The indicators can be used by farmers as well as researchers, both for a sustainability self-assessment and for monitoring progress towards achieving sustainability goals. The SRP initiative was launched in 2011 and is co-convened by the UN Environment Programme and the International Rice Research Institute (IRRI). It includes over 100 public and private partners. Many stakeholders in rice areas around the world have been involved in the construction and validation of the assessment procedure, which aims to be the 'benchmark' for sustainability assessment in rice agro-ecosystems worldwide. Given the road opened by the SRP, it seems crucial that international institutions that aim to safeguard the environment, including through greater sustainability of agricultural production, promote similar initiatives for other agricultural sectors. The hard work of constructing and validating indicators that consider different sustainability dimensions and are appropriate to specific geographical contexts, which has already been carried out by various researchers around the world, certainly constitutes a valid starting point for the building process.
Finally, this review has revealed that there are very few procedures that address sustainability assessment at spatial scales larger than an individual farm. When larger scales have been considered, indicators are almost always calculated for individual farms and averaged over the larger area (with the exception of the SIRIUS procedure). Although this approach can be adopted for some indicators (e.g., gas emissions, yield production), it is not valid for all. For instance, in the case of indicators aimed at quantifying water use in the irrigation of agricultural areas, the water use efficiency in a portion of territory (district, watershed, region) does not necessarily coincide with what is observed on individual farms; at the territorial scale, complex mechanisms involving the reuse of water resources may occur, especially in traditional irrigation systems. Therefore, both for rice (which is usually grown in very traditional irrigation systems) and for other agro-ecosystems, there is a need to identify procedures for the development of indicators covering spatial scales larger than the single farm; these scales are often of greater interest to decision makers.

Table 1. Main methodologies proposed in the literature for the assessment of agricultural sustainability based on sets of indicators (more details can be found in Supplementary Material).