Socioeconomic and Environmental Proxies for Comparing Freshwater Ecosystem Service Threats across International Sites: A Diagnostic Approach

In this work, we develop and test proxy-based diagnostic tools for comparing freshwater ecosystem services (FWES) risks across an international array of freshwater ecosystems. FWES threats are increasing rapidly under pressure from population, climate change, pollution, land use change, and other factors. We identified spatially explicit FWES threat estimates (referred to as threat benchmarks) and extracted watershed-specific values for an array of aquatic ecosystems in the Western Hemisphere (Ramsar sites). We compared these benchmark values to values extracted for sites associated with an international FWES threat investigation. The resulting benchmark threats appeared to provide a meaningful context for the diagnostic assessment of study site selection by revealing gaps in coverage of the underlying socio-environmental problem. In an effort to simplify the method, we tested regularly updated environmental and socioeconomic metrics as potential proxies for the benchmark threats using regression analysis. Three-category proxies, aggregated from (i) external (global to regional, climate-related), (ii) internal (watershed management-related), and (iii) socioeconomic and governance-related proxies, produced strong relationships with water supply threat benchmarks, but only weak relationships with biodiversity-related and nutrient regulation benchmark threats. Our results demonstrate the utility of advancing global FWES status and threat benchmarks for organizing coordinated research efforts and prioritizing decisions with regard to international socio-environmental problems.
Water 2018, 10, 1578; doi:10.3390/w10111578; www.mdpi.com/journal/water


Introduction
Freshwater ecosystems provide our global society with provisioning (e.g., water and food supply), regulating (e.g., pollutant attenuation), cultural (e.g., iconic fishery conservation), and supporting services [1][2][3]. While critical to human and aquatic life and threatened globally, many freshwater ecosystem services (FWES) are complex and difficult to quantify [4][5][6]. Therefore, most ecosystem services (ES) assessments are site-specific, limiting our ability to compare across sites and make informed decisions about the allocation of research and management resources. To enhance our understanding of threats to FWES and make sound, transferable recommendations for mitigation, it is important to consider a broad range of socio-environmental conditions. Coordinated research networks provide new opportunities to develop understanding at larger spatial scales in the context of socio-environmental problems like threats to FWES [7][8][9][10]. However, optimizing site selection in such networks is also a challenging problem that has received little attention in the socio-environmental and ES literature.
Evaluating a network of sites for the study of FWES threats requires at least some knowledge of these threats and their spatial variation. Some knowledge may be gleaned from environmental quality and performance, water scarcity, biodiversity and other ES-related states, which are increasingly being assessed and mapped using socioeconomic, environmental, and human development data [11][12][13][14][15][16]. However, advancing toward understanding and mitigating FWES threats requires further investigation, as this is a "wicked problem" related to natural and anthropogenic factors, and to society's capacity to manage those factors in terms of governance, values, and perceptions [17,18]. Most relevant to this work are the Riverthreat.net [14] and Aqueduct Water Risk Atlas [16] efforts to integrate available environmental, agronomic and human development metrics and simulation outputs to obtain FWES threat-related products at the watershed scale, which is the scale of interest most relevant to FWES decision-makers. The Riverthreat.net products include global maps of threat drivers (e.g., catchment disturbance, pollution, fishing pressure) and the estimated incident threat to human water security (HWS) and biodiversity. Both threats are relatable to FWES threats, as are several of the drivers. For example, Vörösmarty et al. [14] classify nitrogen loading as a driver for the threat to biodiversity, but nitrogen loading is also connected to nutrient regulation FWES. These and related approaches that take more FWES-relevant metrics and simulations into account show great promise in supporting the understanding of the distribution of FWES threats [19][20][21].
Approaches integrating hydro-climatic, agronomic, and other information to map environmental states have advanced markedly in the past decade, and research is ongoing to provide reliable FWES and FWES threat estimates [2,4,6,14,15,16]. A key issue to address is that most such approaches fail to capture social dynamics that would help them advance toward forecasting environmental conditions or performances at the watershed scale and identify high threat/low capacity areas in need of attention.
For example, diagnostic investigations and modifications may be warranted for advanced freshwater ecosystem monitoring and management programs, such as the European Union's Water Framework Directive (WFD) [22][23][24][25]. Coordinated inquiries across socio-environmental gradients can help to address this shortcoming and, despite their limitations, current FWES-related products may be adequate to support the design of coordinated research activity on global FWES threats and adaptation strategies. For a first approximation, societal conditions and aquatic state (hydroclimate, watershed management, etc.) should be useful for predicting FWES threat. It also follows, and others have noted, that some FWES threats are easier to characterize than others [5,6]. For example, provisioning services like water supply are universally valued and straightforward to assess. More complex FWES, such as nutrient regulation, are affected by social, hydro-climatic, watershed management and governance factors and are more difficult to quantify. The assessment of some FWES, such as biodiversity-related ES, may be hampered by unanswered ecological questions, rendering them much more difficult to assess.
In this paper, we develop and test proxy-based diagnostic tools for comparing FWES risks across an international (Pan-American) array of freshwater ecosystems. Our approach was to first identify meaningful benchmarks for threats to FWES, and then to construct and test more easily attainable proxies for these benchmarks by combining readily available and regularly updated indicators. We hypothesized that the reliability of FWES threat proxies will decrease from provisioning services (most reliable), to regulating services, to cultural and supporting services (least reliable). In addressing this hypothesis, we explored two supporting research questions: (1) Do the benchmark threats provide useful diagnostics in the context of a coordinated international research network? (2) Do meaningful benchmark threat proxy relationships exist that could offer similar threat diagnostics in the absence of the more effort-intensive benchmarks that are not necessarily updated over time?

Methods
Our approach involved three main tasks. First, we identified the best available FWES threat estimates for a large number of well-known aquatic ecosystems distributed throughout the Americas, including sites from an existing research network. Second, with respect to experimental design, we used the resulting threat estimates to assess whether the research network sites, as a subset of the greater FWES threat space, adequately spanned that space. These tasks allowed us to address our first research question. Third, to address our second research question, we assembled potential threat proxies from the published literature on environmental and socioeconomic metrics and indicators, and used linear regression analysis to explore potential relationships between threat proxies and benchmarks. Regression approaches such as this one are commonly used to explore global socio-environmental problems and connect regularly available and updated metrics (e.g., Gross Domestic Product) with less available and less frequently updated environmental quality or performance indicators, e.g., [15,16].

Study Sites
To explore the variation in threats to FWES across the Americas, we selected 32 sites in 23 nations from the international Convention on Wetlands (www.ramsar.org, Figure 1 and Table A1), including a mix of lakes, rivers and wetlands. The 32 sites included three sites each from larger nations with heterogeneous climate and population distributions (Canada, USA, Brazil), two sites each from medium-sized nations (Argentina, Chile, Mexico), and one site from each of 17 smaller nations in Central and South America. Several smaller Caribbean nations were omitted because they lacked Ramsar sites, or indicators needed for the analysis were unavailable. We used the Ramsar sites to extract threat estimates from a global FWES threat dataset (Section 2.2), then used those results to diagnostically assess the international coordinated research effort "Sensing the Americas' Freshwater Ecosystem Risk from Climate change" (SAFER Project, www.safer.conicet.gob.ar). The SAFER research network includes seven aquatic sites (rivers, lakes, and coastal lagoons) located in six nations (Figure 1). SAFER sites were selected mainly because they were sites of ongoing investigations by SAFER researchers and allowed the SAFER network to leverage prior work and additional funding. In this work, we examine their potential for exploring socio-environmental gradients to improve the understanding of FWES threats.
As part of the SAFER project, investigators held workshops with local experts and, in some cases, stakeholders to classify and rank key ecosystem services (ES) and identify threats to those services (Table 1). See Smyth et al. (this issue) for more details on the SAFER sites, particularly with respect to stakeholder engagement. For the purposes of this work, we selected three FWES common to all of the SAFER sites: (i) water provisioning, (ii) biodiversity-related ES (broadly defined), and (iii) nutrient regulation, which reflect water quantity, ecological condition, and water quality, respectively.

Identifying Freshwater Ecosystem Service Threat Benchmarks
We identified the global Riverthreat.net spatial 0.5° gridded data products [14] as the best available FWES threat metrics (hereafter referred to as threat benchmarks). Other prospective products provided similar results but were available only at coarser resolutions [26][27][28][29]. We designated these as our benchmarks because (i) they emphasize freshwater ecosystems (primarily rivers) and are organized at the watershed scale, and (ii) they incorporate a comprehensive range of stressors related to climate and human influences. Specific to the three FWES targeted in this work (water supply, biodiversity conservation, nutrient regulation), we selected the adjusted Human Water Security threat (aHWS), incident Biodiversity threat (iBD), and the nitrogen loading (NL) driver as our focal threat benchmarks. The aHWS and iBD metrics are intended to summarize threats of inadequate human water supply and biodiversity loss in a watershed. These threats are based on accumulation of 23 weighted drivers in four thematic areas: catchment disturbance (four drivers); pollution (nine); water resource development (six); and biotic factors (four) [14]. The adjustment in the HWS score (leading to aHWS) is calculated from water resources development drivers associated with risk-ameliorating water infrastructure and management investments. The nitrogen loading (NL) score was categorized as one of the 23 drivers in the Riverthreat.net data product (nutrient regulation threats per se were not available), and we chose to employ it separately as a threat benchmark.

Figure 1. … [30] and the locations of Ramsar sites (black-in-white circles) and SAFER project sites (white-in-black circles) used in this study (see Ramsar site list in Table A1; see SAFER abbreviations in Table 1).
We extracted aHWS, iBD and NL benchmark values from the global dataset for each Ramsar and SAFER study site by averaging the five highest pixel values in contact with each site's water body. Two of the SAFER site water bodies (La Salada and Laguna de Rocha) were small relative to the spatial resolution of the data set (0.5°). In these cases, we averaged only the two or three pixels in contact with the water bodies. We used the resulting Ramsar benchmark values to diagnostically assess the SAFER sites as a coordinated research network in terms of its coverage of the range of FWES threats observed in the broader set of sites.
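The extraction rule described above can be illustrated with a minimal sketch. The function name `benchmark_from_pixels`, the handling of missing cells, and the toy pixel scores are assumptions made for illustration only; they are not taken from the Riverthreat.net products or from the authors' workflow.

```python
import math

def benchmark_from_pixels(pixel_values, k=5):
    """Average the k highest valid threat scores among pixels touching a site.

    pixel_values: scores (0-1) of the grid cells in contact with the water body.
    If fewer than k valid cells exist (small water bodies), all are averaged.
    """
    # Drop missing cells (None or NaN); treating them as no-data is an assumption.
    valid = [v for v in pixel_values if v is not None and not math.isnan(v)]
    if not valid:
        raise ValueError("no valid pixels in contact with the water body")
    top = sorted(valid, reverse=True)[:k]   # the k largest scores
    return sum(top) / len(top)

# Hypothetical pixel scores for a large site (nine cells, one with no data)
large_site = benchmark_from_pixels(
    [0.2, 0.4, 0.6, 0.8, float("nan"), 0.5, 0.3, 0.7, 0.9]
)
# Hypothetical small site touched by only two cells
small_site = benchmark_from_pixels([0.2, 0.4])
```

Averaging only the highest cells biases the benchmark toward the most threatened part of the contact zone, which appears to be the intent of the rule; a plain mean over all contacted cells would give lower values.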

Identifying Proxies for Threat Benchmarks
Because the benchmark threats are (to date) one-time estimates, we tested a wide array of readily available hydrologic, environmental performance, and socioeconomic indicators as potential proxies for the benchmarks. If available, such proxies could facilitate site selection and prioritization in coordinated research networks. Prospective proxies included (Table 2): (i) indicators of water use, water stress/scarcity, and water vulnerability, (ii) scores from the Yale Environmental Performance Index (EPI) and its components, (iii) access to clean water and improved sanitation indicators, (iv) World Governance indicators, and (v) wealth as GDP per capita. All of these proxies are available only at the national scale with the exception of some of the hydroclimatic variables (WRI 2014 proxies; [14]), which are watershed-based values.
After preliminary assessment of a broader array of data sources, using methods described below, we settled on the most promising and reliable proxies (those that are readily available and regularly updated). We categorized the selected proxies in terms of (1) external threats related to regional hydroclimatic changes, (2) internal threats from human activities within a watershed, or (3) social threats, i.e., threats due to lack of resources, poor governance, and other socioeconomic factors (Table 2). Many of these indicators are associated with more than one of these categories, but we categorized them with respect to their primary influence. For example, upstream storage (STOR) is categorized here as an internal indicator because it is primarily associated with geomorphology and major watershed manipulations (e.g., reservoirs), but it is also clearly driven by climate and adaptive capacity (e.g., arid climates require more water storage infrastructure to enable agricultural enterprises, and building that infrastructure requires government policy).
We normalized all non-normalized proxy values using their maximum theoretical value, resulting in a range from 0 to 1 (low to high threat, strong to poor performance, etc.). Proxies scored with the opposite convention were inverted for our purposes (1 − proxy value in Table 2). The directionality of the performance or threat scale was unclear for normalized GDP per capita (GDPP). While increasing national wealth (GDP) is expected to be associated with lower threats to some services (e.g., water supply), it may be associated with elevated threats to other services (e.g., biodiversity-related threats). For this proxy, we made no assumption about directionality. The directionality of the threat scale is sometimes dependent on the ES in question. For example, increasing amounts of upstream storage in a watershed (STOR, Table 2) typically reduce the threat of loss of water supply services. However, storage is also likely to increase the threat of loss of biodiversity-related ES.
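The normalization and inversion steps above can be sketched as follows. The helper names, the clipping to the 0-1 interval, and the numeric values are assumptions for illustration; only the division by a maximum theoretical value, the 1 − proxy inversion, and the ES-dependent direction of STOR come from the text.

```python
def normalize(value, max_theoretical):
    """Scale a raw indicator to 0-1 using its maximum theoretical value."""
    if max_theoretical <= 0:
        raise ValueError("max_theoretical must be positive")
    # Clipping to [0, 1] is an assumption, not stated in the text.
    return min(max(value / max_theoretical, 0.0), 1.0)

def to_threat_scale(score, high_is_low_threat):
    """Return a 0-1 score on which larger values always mean higher threat."""
    return 1.0 - score if high_is_low_threat else score

# Hypothetical nation with 85% access to improved drinking water (WATSUP)
watsup = normalize(85.0, 100.0)
watsup_threat = to_threat_scale(watsup, high_is_low_threat=True)

# Upstream storage (STOR): the direction depends on the service considered
stor = 0.40  # hypothetical normalized value
stor_threat = {
    "water_supply": to_threat_scale(stor, high_is_low_threat=True),
    "biodiversity": to_threat_scale(stor, high_is_low_threat=False),
}
```

Keeping the direction as an explicit argument, rather than fixing it per proxy, makes it straightforward to test a proxy such as GDPP in both directions.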
We used a three-step approach to search for FWES threat proxies that combined proxies from the three categories (Figure 2). First, within each category, we considered the proxy values both individually and in combination with the other proxies in the category. Combinations were created by simple averaging, and all combinations were tested. Second, we aggregated the three categories using both weighted means ($P_{wm}$) and geometric means ($P_{gm}$) as follows:

$$P_{wm} = w_1 T_e + w_2 T_i + w_3 T_s, \qquad P_{gm} = \left(T_e \, T_i \, T_s\right)^{1/3} \qquad (1)$$

where $P_{wm}$ and $P_{gm}$ are referred to as three-category threat proxies based on the intermediate proxies for the external, internal, and social categories ($T_e$, $T_i$, $T_s$), and $w_1$, $w_2$, $w_3$ are weighting factors (valued 0 to 1 and summing to 1). Third, we used regression analysis to identify the strongest linear relationships (highest $R^2$) between the three-category proxy as the independent variable ($P_{wm}$ or $P_{gm}$) and the threat benchmarks as the dependent variables (aHWS, iBD, NL). As with the benchmarks, we used the resulting regression relationships to diagnostically assess the SAFER network of sites in terms of its coverage of the proxy-benchmark threat space.
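The aggregation and regression steps can be sketched in a few lines. Everything below is illustrative: the function names, the 0.1 weight grid, and the six-site toy data are assumptions, and the unweighted cube-root form of the geometric mean is one plausible reading of the text rather than a confirmed formula.

```python
def weighted_mean(te, ti, ts, w):
    """Three-category proxy as a weighted mean; weights must sum to 1."""
    if abs(sum(w) - 1.0) > 1e-9 or min(w) < 0:
        raise ValueError("weights must be non-negative and sum to 1")
    return [w[0] * a + w[1] * b + w[2] * c for a, b, c in zip(te, ti, ts)]

def geometric_mean(te, ti, ts):
    """Three-category proxy as an unweighted geometric mean (assumed form)."""
    return [(a * b * c) ** (1.0 / 3.0) for a, b, c in zip(te, ti, ts)]

def r_squared(x, y):
    """R^2 of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        return 0.0  # a constant proxy explains nothing
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def best_weights(te, ti, ts, benchmark, steps=10):
    """Grid search over weights (step 1/steps) for the highest R^2."""
    best_w, best_r2 = None, float("-inf")
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            w = (i / steps, j / steps, (steps - i - j) / steps)
            r2 = r_squared(weighted_mean(te, ti, ts, w), benchmark)
            if r2 > best_r2:
                best_w, best_r2 = w, r2
    return best_w, best_r2

# Hypothetical intermediate proxies for six sites
te = [0.1, 0.4, 0.7, 0.2, 0.9, 0.5]
ti = [0.3, 0.2, 0.8, 0.6, 0.4, 0.9]
ts = [0.5, 0.1, 0.6, 0.9, 0.3, 0.2]
# Synthetic benchmark built from known weights, so the search can be checked
benchmark = [0.2 + 0.6 * (0.5 * a + 0.3 * b + 0.2 * c)
             for a, b, c in zip(te, ti, ts)]
found_w, found_r2 = best_weights(te, ti, ts, benchmark)
gm_r2 = r_squared(geometric_mean(te, ti, ts), benchmark)
```

With real data the benchmark is regressed on the proxy, as in the figures, and the highest $R^2$ over all within-category combinations and weight sets identifies the candidate proxy. An exhaustive search of this kind can overfit a small set of sites, so the resulting $R^2$ is best read as exploratory.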

Table 2. Proxy code; name; description (threat scale); source.
WRI; Water return index; Fraction of available water previously used and discharged upstream as wastewater effluent (0-1 = high-low threat for water supply ES; 0-1 = low-high threat for biodiversity-related and nutrient regulation ES); [16,33]
AGSUB; Agricultural subsidies; Degree of environmental pressure exerted by subsidizing agricultural inputs (0-1 = high-low threat); [15]
NUE; Nitrogen use efficiency; Measure of the appropriate management of nitrogen resources for agricultural production (0-1 = high-low threat); [15]
NBal; Nitrogen use balance; Measure of the appropriate management of nitrogen resources for agricultural production (0-1 = high-low threat); [15]
STOR; Upstream storage; Upstream water storage capacity relative to total water supply (0-1 = high-low threat for water supply ES; 0-1 = low-high threat for biodiversity-related and nutrient regulation ES); [16]
ECO_S; Upstream protected land; Fraction of total water supply that originates from protected watersheds (0-1 = high-low threat); [16]
WATSUP; Access to drinking water; Fraction of a nation's population with access to improved drinking water (0-1 = high-low threat to water supply); [15]
ACSAT; Access to sanitation; Fraction of a nation's population with access to improved sanitation (0-1 = high-low threat to water supply); [15]
WWT; Wastewater treated; Fraction of collected wastewater that is treated (0-1 = high-low threat); [15]
TPA; Terrestrial protected areas; Degree to which a nation achieves target of protecting 17% of its biomes (0-1 = high-low threat); [15]
Socioeconomic and Governance Proxies
RL; Rule of Law; Captures perceptions of the extent to which agents have confidence in and abide by rules of society (especially quality of contract enforcement, property rights, police, courts and likelihood of crime and violence); [34]
VA; Voice & Accountability; Captures perceptions of the extent to which a country's citizens are able to participate in selecting their government, as well as freedom of expression, freedom of association, and a free media; [34]
GE; Government Effectiveness; Captures perceptions of a nation's quality of public services and civil services, the degree of its independence from political pressures, the quality of policy formulation and implementation, and the credibility of the government's commitment to such policies; [34]
GDPP; Gross Domestic Product per Capita; Value of annual goods produced and services provided by a nation divided by its population (0-1 = low-high normalized GDPP; threat scale tested in both directions); [35]

Figure 2. Method for developing three-category proxies for benchmark freshwater ecosystem service threats: (1) averaging of proxies within each category; (2) aggregation of proxies from the three categories, (i) external (regional hydro-climatic), (ii) internal (land use and watershed management-related), (iii) social (wealth and governance related); and (3) building benchmark-proxy relationships through regression analysis.

Freshwater Ecosystem Service Threat Benchmarks
The FWES threat benchmarks for water supply (aHWS), biodiversity-related services (iBD), and nutrient regulation (N-load) ranged from relatively minor (~0.3) to major threats (~1.0) in all three cases (Figure 3). For the Ramsar site aHWS values (Figure 3a), we found few low-threat sites (aHWS < 0.4) and a relatively high mean threat (0.67). The aHWS values were roughly normally distributed (Figure 3a inset graph), though slightly skewed toward lower values. The sites causing the tailing at the low end are all in highly developed nations (Canada and USA). With respect to SAFER project diagnostics, the SAFER site aHWS values ranged from 0.35 to 0.75, encompassing the lower two-thirds of the range exhibited by the Ramsar sites, as evidenced by the lower mean value for the SAFER sites (0.53). The SAFER sites therefore lacked coverage of the higher water supply threat range.
The biodiversity-related services threat benchmark (iBD) for the Ramsar sites (Figure 3b) ranged from 0.39 to 0.93, with an average (0.66) suggesting moderately high threat levels at most sites. Similar to the aHWS case, the Ramsar iBD values were roughly normally distributed and skewed toward lower values. In this case, the sites associated with lower iBD values were mainly in remote, undeveloped areas. In contrast to aHWS, the higher end of the iBD range included most of the Ramsar sites in highly developed nations. For the iBD-related threat benchmark, the SAFER site threat values provided good coverage of the entire range presented by the Ramsar sites.
The nutrient regulation threat benchmark (N-load) for the Ramsar sites (Figure 3c) ranged from 0.29 to 0.85, with a slightly lower average (0.58) than the other two benchmarks. The Ramsar N-load values were normally distributed (skewness = −0.02). The SAFER site values were skewed toward the intermediate and high threat portions of the range and were absent from the lower portion.
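The coverage diagnostics reported above (range spanned, gaps at either end, and skewness of the reference distribution) can be expressed compactly. This is a sketch under stated assumptions: the function names and the site values are hypothetical, and the moment-based skewness formula is a common choice that may differ from the estimator used for the reported value.

```python
def skewness(values):
    """Moment-based sample skewness (third moment / second moment ** 1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def range_coverage(reference, subset):
    """Fraction of the reference range that lies inside the subset range."""
    ref_lo, ref_hi = min(reference), max(reference)
    lo = max(min(subset), ref_lo)
    hi = min(max(subset), ref_hi)
    return max(hi - lo, 0.0) / (ref_hi - ref_lo)

def share_outside(reference, subset):
    """Shares of reference sites below the subset minimum and above its maximum."""
    n = len(reference)
    below = sum(v < min(subset) for v in reference) / n
    above = sum(v > max(subset) for v in reference) / n
    return below, above

# Hypothetical benchmark values for a reference set and a research network
reference_sites = [0.31, 0.47, 0.62, 0.77, 0.93]
network_sites = [0.36, 0.52, 0.74]

covered = range_coverage(reference_sites, network_sites)
below, above = share_outside(reference_sites, network_sites)
symmetric_skew = skewness([0.2, 0.5, 0.8])
```

In this toy case the network spans about 61% of the reference range and misses the two highest-threat reference sites, which is the kind of gap the benchmark comparison is meant to expose.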

Figure 3. Threat benchmark values for water supply (aHWS, a), biodiversity-related services (iBD, b), and nutrient regulation (N-load, c) for Ramsar (black bars) and SAFER sites (red bars). Inset graphs are frequency histograms for the same values (0.05 bins).

Proxies for Freshwater Ecosystem Service Threat Benchmarks
Multiple combinations of the external, internal, and social proxies resulted in strong relationships with the water supply stress benchmark (aHWS). In the simplest case, national wealth (GDPP) as an independent proxy accounted for 72% of the variation in aHWS across the Ramsar sites (Figure 4a). The majority of the Ramsar sites are from developing nations, which led to the grouping of sites between about 0.1 and 0.5 (x-axis, $P_{aHWS}$ in Figure 4a), and a large gap in coverage between 0.5 and 0.8, with highly developed nations (Canada and USA) falling above 0.8. Notably, aggregation of prospective governance indicators with GDPP did not increase the coefficient of determination in this case. The strongest relationship with the Ramsar aHWS values ($R^2$ = 0.92) was provided by a three-category proxy (Figure 4b) that aggregates, following Equation (1), $T_e = \mathrm{DRO}$, $T_i = \overline{(1 - \mathrm{WRI}),\ \mathrm{WATSUP},\ \mathrm{ACSAT}}$, and $T_s = \mathrm{GDPP}$, where DRO is the drought frequency proxy (external), $\overline{(1 - \mathrm{WRI}),\ \mathrm{WATSUP},\ \mathrm{ACSAT}}$ is the average of the water return index, access to clean water, and access to sanitation (internal) proxies, and GDPP is the gross domestic product per capita (social) proxy. Five of the seven SAFER sites were within the 95% confidence bands developed from the Ramsar site data for the GDPP-only proxy case. For the more complex proxy, there was a much tighter confidence interval encompassing four SAFER sites, with the remaining three just outside the lower confidence band. These results suggest that this benchmark-proxy relationship could be useful for informing the design of a coordinated research network. The SAFER sites displayed the same grouping by development level noted above. This outcome suggests a potential limitation of the resulting relationships: an overabundance of sites characterized by roughly the same GDPP.
Figure 4. Regression analysis for the water supply ecosystem service threat benchmark aHWS (adjusted Human Water Security threat; [14]) as a function of proxy ($P_{aHWS}$) obtained using external, internal and socioeconomic indicators (Table 2): (a) regression results for GDPP as single proxy, and (b) best regression results for three-category proxy. Regression models (solid line; dashed lines 95% confidence interval) based on Ramsar sites (black circles), with SAFER sites (red circles) plotted for comparison.
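The check of whether a network site falls within the 95% confidence band of a regression fitted to reference sites can be sketched as below. The data are synthetic, the function names are hypothetical, and the band shown is the standard confidence band for the mean response of a simple linear regression; the source does not state whether its bands are of this type or prediction bands, which would be wider. The critical t value is supplied by the caller (2.447 is the two-sided 95% value for 6 degrees of freedom).

```python
import math

def fit_line(x, y):
    """Least-squares line plus the quantities needed for a confidence band."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))          # residual standard error
    return {"slope": slope, "intercept": intercept,
            "s": s, "mx": mx, "sxx": sxx, "n": n}

def confidence_band(x0, fit, t_crit):
    """Lower and upper band for the mean response at proxy value x0."""
    centre = fit["intercept"] + fit["slope"] * x0
    half = t_crit * fit["s"] * math.sqrt(
        1.0 / fit["n"] + (x0 - fit["mx"]) ** 2 / fit["sxx"]
    )
    return centre - half, centre + half

def within_band(x0, y0, fit, t_crit):
    """True if the point (x0, y0) lies inside the band at x0."""
    lo, hi = confidence_band(x0, fit, t_crit)
    return lo <= y0 <= hi

# Synthetic reference sites: threat falls as the proxy rises
proxy = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
threat = [0.94, 0.82, 0.78, 0.66, 0.62, 0.50, 0.46, 0.34]
fit = fit_line(proxy, threat)

T_CRIT = 2.447                      # t(0.975, df = 8 - 2)
inside = within_band(0.45, 0.65, fit, T_CRIT)
outside = within_band(0.45, 0.55, fit, T_CRIT)
```

Because the band narrows as the fit improves, a stronger relationship can leave more network sites outside the band, which matches the pattern described for the single-proxy and three-category cases.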
Viable relationships between the benchmark threats to biodiversity services (iBD) and the prospective proxies were more challenging to identify than those for water supply. We identified one relatively weak ($R^2$ = 0.31) but significant relationship for a three-category threat proxy ($P_{iBD}$; Figure 5) that aggregates, following Equation (1), $T_e = \overline{(1 - \mathrm{HFO}),\ (1 - \mathrm{DRO}),\ (1 - \mathrm{WSV}),\ (1 - \mathrm{SV})}$, $T_i = \overline{\mathrm{STOR},\ \mathrm{WRI}}$, and $T_s = \overline{\mathrm{RL},\ \mathrm{GDPP}}$, where $T_e$ is the average of all four of the hydro-climate (external) proxies, $T_i$ is the average of the watershed storage and water return (internal) proxies, and $T_s$ is the average of the Rule of Law and GDPP (social) proxies. The regression slope is positive, meaning that any increase in $P_{iBD}$ predicts an increase in the threat to biodiversity.
The utility of $P_{iBD}$ as a comparative metric for improving the SAFER project is questionable relative to the case for $P_{aHWS}$ (Figure 4b). Only two of the seven SAFER sites (San Joaquin River, USA and La Salada, AR) fall within the confidence bands. Furthermore, the SAFER sites are narrowly clustered (0.3 < $P_{iBD}$ < 0.5) and appear to be aligned along a steeper slope than the Ramsar sites. Lastly, the scale of the optimal proxy is compressed here, achieving the maximum threat (iBD = 1.0) at a $P_{iBD}$ value of 0.7. This compression could be an artifact of the iBD indicator since biodiversity threat classification is itself a challenging topic [36].

Figure 5. Strongest correlation for biodiversity-related threat benchmark, iBD (incident biodiversity threat; [14]), as a function of proxy ($P_{iBD}$) obtained using external, internal and socioeconomic indicators (Table 2). Regression model (solid line; dashed lines 95% confidence interval) based on Ramsar sites (black circles), with SAFER sites (red circles) plotted for comparison.
Relationships between the benchmark threats to nutrient regulation services (NL) and proxies (P_NL) were the most challenging to identify. One significant but weak relationship (R² = 0.22) with a proxy for nitrogen loading (P_NL) was found (Figure 6). The proxy combines three averaged terms: the mean of (1 − WSV) and (1 − SV), the inter-annual and seasonal precipitation variability (external) proxies; the mean of NUE and NBal, the nitrogen use efficiency and nitrogen balance (internal) proxies; and the mean of RL and GDPP, the Rule of Law and GDPP (social) proxies. In this case, the response between the proxy and benchmark threats was weak (slope = 0.35) relative to those for iBD (0.67) and aHWS (−1.23). The 95% confidence bands for the N-loading regression were broad due to the relatively weak relationship, and hence it is not surprising that most of the SAFER sites (six of seven) fell within the bands. The SAFER site proxy values provided good coverage of the upper half of the proxy space (0.5 < P_iBD < 0.9) but were absent in the lower range.
Figure 6. Regression analysis for the nutrient regulation service threat benchmark N-load (NL; [14]) as a function of proxy (P_NL) based on external, internal and socioeconomic indicators (Table 2). Regression model (solid line; dashed lines 95% confidence interval) based on Ramsar sites (black circles), with SAFER sites (red circles) plotted for comparison.
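The comparison of study sites against a regression confidence band can be illustrated with a minimal Python sketch. It assumes an ordinary least squares fit and the standard confidence interval for the mean response; the source does not state how its bands were computed, so this is one plausible reading rather than the authors' procedure. All data values, the function names, and the constant `T_CRIT_DF8` are hypothetical and chosen only for illustration.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit y = a + b*x.

    Returns (a, b, s, x_mean, sxx, n), where s is the residual
    standard error with n - 2 degrees of freedom.
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    return a, b, s, x_mean, sxx, n

def confidence_band(fit, x0, t_crit):
    """Confidence interval for the mean response at proxy value x0."""
    a, b, s, x_mean, sxx, n = fit
    center = a + b * x0
    half = t_crit * s * math.sqrt(1.0 / n + (x0 - x_mean) ** 2 / sxx)
    return center - half, center + half

def inside_band(fit, x0, y0, t_crit):
    """True if the point (x0, y0) lies within the band at x0."""
    lo, hi = confidence_band(fit, x0, t_crit)
    return lo <= y0 <= hi

# Two-sided 95% Student t critical value for 8 degrees of freedom
# (10 hypothetical reference sites minus 2 fitted parameters).
T_CRIT_DF8 = 2.306

# Hypothetical reference-site proxy and benchmark values (not from the paper).
ref_proxy = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
ref_benchmark = [0.30, 0.42, 0.35, 0.50, 0.48, 0.60, 0.55, 0.70, 0.66, 0.78]
fit = fit_line(ref_proxy, ref_benchmark)

# Hypothetical study sites as (proxy, benchmark) pairs.
study_sites = [(0.55, 0.52), (0.80, 0.68), (0.90, 0.30)]
for proxy, bench in study_sites:
    print(proxy, bench, inside_band(fit, proxy, bench, T_CRIT_DF8))
```

Because the band half-width grows with the residual standard error, a weak fit such as the one reported for N-loading will tend to enclose most comparison sites, which is consistent with the observation above.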

Freshwater Ecosystem Service Threat Benchmarks
Our first research question concerned whether the benchmarks can provide useful diagnostics with respect to a coordinated research network. The threat benchmarks extracted for the Ramsar sites demonstrate the promise of this approach for identifying meaningful comparative FWES threat metrics across an international array of sites. Superposition of the SAFER site benchmark values on the spectrum of Ramsar site values (Figure 3a-c) reveals that the SAFER network covers some parts of the more general range of threat behavior better than others. For water supply threats (aHWS), the results point to the need for study sites in higher threat areas (Figure 3a), which mainly coincide with the less developed nations in our study (e.g., Haiti, Ecuador, El Salvador, Dominican Republic). There are logistical and other challenges associated with including such nations, and these must be overcome if the goal is to test a wider range of threats. In contrast, for nutrient regulation threats (NL, Figure 3c) the SAFER sites provide better coverage of the high and intermediate threat levels but fail to cover the lowest levels. Improving coverage in this regard may present the challenge of identifying funding sources for study sites where threats to FWES are relatively low.
As noted at the outset of this paper, identifying the real threat to FWES is a complex and site-specific socio-environmental problem. The outcomes here are considered first approximations for the purpose of better aligning future studies with the diversity of threats present. That said, our benchmarking approach could provide diagnostic support for decisions regarding the inclusion of additional sites in a research network, should research capacity and local interests allow it.
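The coverage diagnostic described in this section can be sketched in a few lines of Python. The example splits a reference distribution of benchmark values into equal-count bins and reports which bins contain no study site. The use of terciles, the function names, and all numerical values are assumptions made for illustration; they are not the procedure or data used in this study.

```python
def quantile_edges(values, n_bins):
    """Interior bin edges that split the sorted reference values
    into n_bins groups of roughly equal size."""
    s = sorted(values)
    return [s[(k * len(s)) // n_bins] for k in range(1, n_bins)]

def bin_index(value, edges):
    """Index of the bin containing value (0 is the lowest-threat bin)."""
    return sum(1 for e in edges if value >= e)

def uncovered_bins(reference, study, n_bins=3):
    """Bins of the reference distribution with no study site in them."""
    edges = quantile_edges(reference, n_bins)
    covered = {bin_index(v, edges) for v in study}
    return [b for b in range(n_bins) if b not in covered]

# Hypothetical benchmark threat values for reference (Ramsar-like) sites.
reference_sites = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40,
                   0.50, 0.60, 0.70, 0.80, 0.90, 0.95]

# Hypothetical benchmark threat values for the study network.
study_network = [0.35, 0.45, 0.75, 0.85]

gaps = uncovered_bins(reference_sites, study_network)
print("Uncovered terciles (0 = lowest threat):", gaps)
```

In this made-up case the study network leaves the lowest-threat tercile empty, which mirrors the pattern described above for nutrient regulation threats. Finer bins would give a more detailed picture but require more study sites to be informative.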

Proxies for Threat Benchmarks
The best proxy-benchmark relationships (Figures 4-6) provided some support for our research hypothesis. Proxies exhibited strong relationships with the water supply benchmark (aHWS), while only weaker relationships were identified between proxies and the benchmarks for biodiversity-related (iBD) and nutrient regulation (NL) service threats. This outcome is likely related to the complexity of the respective ecosystem services, and also pertains to our second research question regarding the meaningfulness of our proxy-benchmark relationships.
For water provisioning ES threats, our three-category proxy was strongly (R² = 0.85) and inversely related to the benchmark aHWS (Figure 4b), indicating a relatively sharp decline in risk per unit increase in proxy. The best-fitting proxy included drought occurrence (DRO), the non-return (natural) flow ratio (1 − WRI), access to clean water (WATSUP) and advanced sanitation (ACSAT), and GDP per capita (GDPP). Although it would be speculative to attribute cause and effect to these simple regression results, the constituent proxies here could be meaningful. Drought-prone regions with sufficient financial resources generally develop water resources infrastructure to provide a buffer against drought impacts (often at the expense of the aquatic ecosystems). Increasing access to clean water and sanitation has long been used to document human development and could be interpreted as another indicator of improving watershed management. All of these conditions are consistent with a reduced threat to the water provisioning service.
For the biodiversity-related ES threat (iBD), the best proxy-benchmark relationship was much weaker than that for water supply (R² = 0.31, Figure 5) and therefore less reliable. The three-category proxy identified is consistent with the rationale that threats to these services increase with decreasing hydro-climate variability, increasing water storage and return flows in the watershed, and increasing national wealth and Rule of Law. These conditions are consistent with development and likely urbanization, which are well known to result in permanent habitat loss and other threats to biodiversity-related services [37,38]. However, given the weakness of the relationship, it is more likely that better and/or more consistently assessed metrics for biodiversity-related threats are needed before meaningful proxies can be identified.
The benchmark-proxy threat relationship for nutrient regulation (NL) was the weakest of the three FWES threats tested (R² = 0.22, Figure 6). The large degree of scatter combined with the weak positive slope in this relationship reveals that nitrogen loading does not differ greatly across the spectrum of sites. Less variable climate and increased national wealth correspond to higher threat values, which may correspond to increased agriculture and fertilizer usage. The appearance of nitrogen-use efficiency (NUE) and nitrogen balance (NBal) from the internal (watershed management) proxy category appears to be counterintuitive. Higher values for these metrics correspond to better agronomic management of nitrogen on watershed farms. Thus, one would expect the nitrogen loading to decrease with increasing NUE and/or NBal. If this benchmark-proxy threat relationship is valid, then it suggests that these agronomic metrics for nitrogen use may also be indicative of nitrogen-related threats to nutrient regulation, as from widespread nitrogen application in a watershed [39].

Testing for Nationally Clustered Behavior
As noted in Section 3, in some cases benchmark values for the sites appeared to cluster with respect to national factors. We did not analyze the clustering behavior exhaustively, but we began to explore it with respect to the World Bank's GDP estimates and governance quality indicators (Table 2). More specifically, we tested for clusters of nations using historical values (1996-2015) of annual GDP and the four governance indicators (k-means clustering in R; https://uc-r.github.io/kmeans_clustering). Three distinct clusters of nations emerged using the historical trends in GDPP, Rule of Law (RL) and access to advanced sanitation (ACSAT) (see Figure A1). These can be characterized as higher (Cluster 1), intermediate (Cluster 2) and lower (Cluster 3) clusters of wealth, sanitation access, and Rule of Law. Reanalyzing 32 Ramsar sites to create three benchmark-proxy correlations using these clusters revealed further insight into the threat benchmark-proxy relationships.
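The clustering step can be illustrated with a short sketch. The analysis above used k-means in R; the example below swaps in a plain Python implementation of Lloyd's algorithm so that it runs without external packages. It also simplifies the input to one period-mean value per indicator per nation, whereas the study used historical trends. The nation labels, indicator values, and starting centers are all hypothetical.

```python
import math

def standardize(rows):
    """Scale each column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / sd for v, m, sd in zip(r, means, sds)] for r in rows]

def kmeans(points, init_centers, max_iter=100):
    """Lloyd's algorithm with caller-supplied starting centers."""
    centers = [list(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center by Euclidean distance.
        new_labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical nations with period-mean (GDPP in USD, RL score, ACSAT in %).
nations = ["A", "B", "C", "D", "E", "F", "G", "H", "I"]
indicators = [
    (45000, 1.5, 98), (50000, 1.6, 99), (42000, 1.4, 97),   # higher
    (12000, 0.0, 80), (14000, 0.1, 82), (10000, -0.1, 78),  # intermediate
    (3000, -0.8, 50), (2500, -0.9, 45), (3500, -0.7, 55),   # lower
]

scaled = standardize(indicators)
# Deterministic start: one seed nation from each expected group.
labels, centers = kmeans(scaled, [scaled[0], scaled[3], scaled[6]])
for name, lab in zip(nations, labels):
    print(name, "cluster", lab + 1)
```

Standardizing first keeps GDPP, which is measured in much larger units, from dominating the distance calculation. In practice, k-means is usually run from several random starts, since the result can depend on the initial centers.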
Many of the new benchmark-proxy relationships for the clusters span narrow ranges and hence it is important to avoid over-interpretation. With this qualification, the results suggest that there are "sub-trends" within the overall trends (Figure 7) that may warrant further investigation. These sub-trends may be related to factors like national development level and governance quality. For instance, the first and second clusters for the water supply threat (aHWS) resulted in significant negative relationships, while the third cluster failed to yield a relationship (Figure 7, left column). Furthermore, if the trends are accurate, they point to a stronger negative relationship (based on the slope) for the more developed nations relative to intermediate nations. These results suggest that the connection between the external, internal, and socioeconomic/governance proxy indicators and the water provisioning service benchmark weakens with decreasing development levels in regard to national wealth, rule of law, and access to sanitation.
Figure 7. Three-category threat benchmark proxy regression outcomes for Ramsar sites clustered by national GDPP and access to advanced sanitation. Plots are for water supply (aHWS), biodiversity-related (iBD), and nutrient regulation (NL) threats (left to right) for higher, intermediate and lower development levels (top to bottom). (Significance levels: ** p < 0.01; * p < 0.05; NS = not significant; see Table A2 for regression results).
For the biodiversity-based benchmark threats (iBD), the first and second clusters resulted in positive slopes, similar to the overall results (Figure 5), while the third cluster again failed to yield a significant relationship (Figure 7, center column). In this case, if accurate, the slope for the second cluster was more positive than that for the first cluster. These results suggest that the connection between the proxy indicators and the biodiversity-based services threat benchmark weakens with increasing development level in regard to national wealth, rule of law, and access to sanitation. However, the abundance of elevated biodiversity threat values for the rightmost portion of the plot also suggests that many of these sites are already experiencing elevated threats and hence further development cannot increase these threats substantially.
The nutrient regulation ES threats (NL) resulted in significant positive and negative relationships with the proxy for the first and third clusters, respectively, while cluster 2 failed to yield a significant relationship (Figure 7, right column). From cluster 1, we see that wealthier nations with relatively strong environmental performance (ACSAT) and Rule of Law appear to show an increase in nitrogen loading. With the United States, Canada and other nations in the group, increasing population and wealth led to increased agricultural production supported by widespread fertilizer application. From cluster 3, we see nitrogen loading decreasing as P_NL increases. This trend would be consistent with a stage in development in which farming pressure in the form of nitrogen loading on waterways is not yet evident, either due to lack of agricultural development or lack of data [39].
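The per-cluster reanalysis amounts to fitting a separate proxy-benchmark regression within each national cluster and comparing slopes. A minimal Python sketch follows; it reports only the slope and R² and omits the significance tests summarized in Table A2. The records are hypothetical (cluster, proxy, benchmark) triples constructed to show a positive slope in cluster 1, a weak relationship in cluster 2, and a negative slope in cluster 3; they are not the study data.

```python
def slope_and_r2(x, y):
    """Least squares slope and coefficient of determination for y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx, (sxy * sxy) / (sxx * syy)

def by_cluster(records):
    """Group (cluster, proxy, benchmark) records and fit each group separately."""
    groups = {}
    for cluster, proxy, bench in records:
        xs, ys = groups.setdefault(cluster, ([], []))
        xs.append(proxy)
        ys.append(bench)
    return {c: slope_and_r2(xs, ys) for c, (xs, ys) in groups.items()}

# Hypothetical site records: (national cluster, proxy value, benchmark threat).
records = [
    (1, 0.6, 0.50), (1, 0.7, 0.60), (1, 0.8, 0.65), (1, 0.9, 0.80),
    (2, 0.4, 0.50), (2, 0.5, 0.60), (2, 0.6, 0.50), (2, 0.7, 0.60),
    (3, 0.2, 0.60), (3, 0.3, 0.50), (3, 0.4, 0.45), (3, 0.5, 0.30),
]

results = by_cluster(records)
for cluster in sorted(results):
    slope, r2 = results[cluster]
    print(f"cluster {cluster}: slope = {slope:.2f}, R2 = {r2:.2f}")
```

With only a handful of sites per cluster, slopes of this kind are sensitive to individual points, which is one reason the narrow-range sub-trends should be interpreted with caution.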

Conclusions
We extracted benchmark threat estimates for three FWES from a global data set using an array of Ramsar aquatic ecosystems distributed throughout the Americas. The resulting benchmark threats provided a meaningful context for a posteriori diagnostic assessment of study site selection for the SAFER project coordinated research network. To our knowledge, this is the first demonstration of the potential utility of such an approach to examine and optimize socio-environmental research site selection. With new global data sets becoming available [40][41][42][43], future work should compare them in this context.
We also explored the potential for using a wide array of readily accessible and regularly updated proxies to estimate the same three benchmark threats. We did this because the benchmark threats are currently one-time snapshot estimates, and proxies would therefore be useful moving forward to apply this now-dated large-scale initiative to current and future FWES assessments. Recognizing that estimating FWES threats is a complex socio-environmental problem, we proposed aggregating proxy values representing external (regional climatic), internal (watershed management-related), and social (wealth and governance-related) factors. We hypothesized that this approach would yield stronger relationships for relatively simple FWES than for relatively complex FWES. Our results for three FWES threats supported this hypothesis with water provisioning benchmark threats yielding a much stronger relationship with proxies than those for either biodiversity-related services or nutrient regulation.
Although the proxy approach proposed is promising, the unreliable outcomes for two of the three ES tested confirm that alternative data are needed if this approach is to apply to a broad array of FWES. Future research would involve identifying and testing new prospective data sources. Such data may include high resolution regional climate products with stronger connections to FWES threats. The amount and quality of such data vary globally, which is another challenge for applications that span countries and hemispheres. For example, we have observed that the climatic pressures on FWES have been scrutinized much more in some regions of the SAFER project (e.g., SJR, California) than others (e.g., SR, Argentina). Additional types of socioeconomic and governance data, at higher granularity, also need to be explored because our proxies were mainly national values, which may be biased toward urban conditions. In contrast, many of the aquatic ecosystems in this study are in relatively rural settings, which can lag in terms of environmental policy enforcement [44,45]. Lastly, our approach failed to capture the likely connections between governance, political factors, and local stakeholders. Understanding these connections is critical to managing socio-environmental problems and is detailed in a related SAFER project paper [46]. Disconnects between environmental policy and implementation that can bias environmental assessments have also been identified in Chile [47] and elsewhere. Thus, additional efforts aimed at validating data and indexes calculated for socio-environmental performance or problems are warranted.
Overall, our results demonstrate the utility of advancing the state of global FWES status and threat benchmarks for organizing coordinated research efforts and prioritizing management decision-making. Identifying risks to freshwater ecosystem services will require intensive coordinated research efforts into the nature of the FWES threats and how best to monitor those threats and communicate them to stakeholders. This study identifies important differences in how easily those threats can be recreated from proxies when data are lacking, and which FWES could benefit most from additional research.

Table A2. Regression outcomes for Figure 7, with nations clustered by historical trends in GDPP, access to advanced sanitation (ACSAT), and Rule of Law (RL), where the regression coefficient (trend line slope).

Sites (by Nation)
Benchmark Indicator