Regional Multivariate Indices of Water Use Potential for the Continental United States

Abstract: The necessity of freshwater for sustaining human life has prompted the development of numerous estimation techniques and metrics for understanding where, when, and why water is used. While estimates are valuable, techniques for estimating water use vary, and may be difficult to replicate and/or unavailable on an annual basis or at the regional scale. To address these drawbacks, this paper proposes a series of regional indices for the continental United States that could serve as proxies for water use that are based on key variables associated with water use. Regional indices at the county level are computed, compared against each other, and compared to water withdrawal estimates from the United States Geological Survey (USGS). These comparisons highlight differences amongst the derived indices and the water withdrawal estimates. They also demonstrate promise for future development and implementation of related indices, given their similarities with water withdrawal estimates. Using only a small set of variables, these indices achieve some degree of similarity (~20%) to estimates of water withdrawals. The comparative data availability and ease of estimating these indices, as well as the ability to decompose the additive indices into their constituent use categories and constituent variables, render them practically useful to water managers and other decision makers for identification of locally specific drivers of water use and implementation of more geographically appropriate policies to manage scarce water resources.


Introduction
Water covers 70% of the earth, but less than 1% of this water is suitable for human consumption and food production [1]. Recent studies estimate that 4 billion people suffer from severe water scarcity at least one month per year and that half these people reside in China and India [2]. This scarcity is exacerbated by the uneven distribution of water globally [3] and widespread pollution that negatively affects human health [4]. Climate change, which has increased air temperature and modified regional precipitation patterns [5], has changed the distribution of renewable water resources [6]. Population growth has also altered the demand for renewable water resources [7]. Compounding this scarcity is the urbanization and development of countries around the globe [8]. Water consumption is linked with affluence [9] and studies highlight that per capita uses of water in developed countries are orders of magnitude greater than water use in developing nations [10].
The global scarcity of freshwater, combined with the aforementioned pressures, has attracted the attention of water scientists and engineers around the world. Their research efforts are dedicated to understanding and reducing demands on water supplies through technological innovation while improving management of available resources [11]. A critical first step in reducing pressures on water systems is identifying locations of higher water use and the drivers of this use. To this end, researchers have created a variety of water scarcity and water stress metrics [12][13][14][15]. Several of these indices involve global assessments using country-level data, which limits our ability to understand important variations in use and water stress at a regional scale.
From a management perspective, more localized regional metrics are critical to achieving the first step in sustainable water management, which is understanding "where, when, and how water is used to satisfy our needs" ([11], p. 20). More recently, studies have begun to address the need for assessment of regional water scarcity and development of stress indicators by analyzing water demand at finer spatial scales [16,17]. This paper builds on these efforts by deriving four multivariate indices of water use based on county-level data for the United States, which is one of the nations with the highest level of water use globally [10]. Variables used to construct our multivariate indices are grouped into three categories of use: agriculture, industry, and municipal. Once derived, the multivariate indices are compared to one another and to county estimates of water withdrawals from the United States Geological Survey (USGS). Here, it is important to note that these variables indicate the potential for water use since the composite indices and their components are not estimates of actual use but are instead based on variables from the literature that are acknowledged to be drivers of use.
While metrics for understanding water use are an ongoing area of research, the composite indices developed in this article offer two key advantages over prior efforts. First, the indices use publicly available data that are easy to collect. We illustrate techniques that can be used by others as better and newer data become available. Second, the additive indices are decomposable into their constituent components, which simplifies understanding of the individual factors driving water use at a localized level. This facilitates a comprehensive appreciation of pressures on water resources, which can assist with the formulation of better policies to reduce water use.

Indicators of Water Use
The United States used an estimated 1.343 trillion liters of water per day in 2010, according to the USGS, yet this level of water use represents a 13% decrease from consumption patterns observed in 2005 [18]. Reasons for this declining trend can vary by location, however, because the local economy, demographic shifts, conditions of the housing stock, and general growth patterns affect the level of water use [19]. Water used for agricultural purposes (e.g., irrigation and livestock) decreased from 488 to 446 billion liters per day, representing an 8% decline between 2005 and 2010, while industrial uses of water also declined by 8% (60.9 to 56 billion liters per day) during the same period [20]. Residential water use also declined from 670 to 522 liters per household per day (22% decline) and from 261 to 219 liters per capita per day (16% decrease) between 1999 and 2016 [21]. Despite the overall decrease in water use and consumption, it remains vitally important that we understand the composition of water use across these activities since it is likely that this trend manifests differently across geographic space with distinct regional variation.
The value of water and the need to understand pressures on water systems has prompted considerable attention in the scientific community. Water use is a multifaceted issue that involves population and demographic characteristics, natural features, and the climatic characteristics of place [6]. Although the uses of water vary by country and regions within countries [18,20] human uses may be categorized into three types: agricultural, industrial, and municipal [6,22]. To construct a more nuanced, multivariate index that characterizes the diverse range of human pressures on water resources, data were collected for each of these three domains of water use to coincide with our study year of 2010. To these three domains, climatic variables were added to account for the relationship between climatic conditions and water use [23,24]. Table 1 contains the suite of variables considered as potential index inputs with associated sources (see also Table S1 in Supplementary Material for data dictionary). Several variables are standardized and placed in percentage terms to account for variations in establishment presence and housing types across counties of varying sizes. It is important to note that this is not an exhaustive list of all possible uses of water. Rather, this list is meant to summarize major uses of water in each of the three activity areas (agriculture, industry, and municipal) based on information derived from the literature.

Agricultural Water Use
While uses of water vary within and across countries, global statistics of water use indicate that agriculture accounts for 70% of water use on average [29]. Irrigation for crop growth is one of the major users of water for agriculture [30] and accordingly, studies use the amount of irrigated land as a proxy for agricultural water use [7]. Irrigation needs vary by the location and type of crop grown and accordingly, crop coefficients that represent the "soil, climatic, environmental and management factors" that impact crop water needs have been developed [31]. Thus, the crop profile of regions can help understand varying water uses.
To capture agricultural activities in counties, data were collected from the 2012 Census of Agriculture from the United States Department of Agriculture (USDA). Variables obtained from this database included the amount of irrigated land area and the production of wheat, cotton, corn, and soybeans; these crops are noted to be water intensive [30]. The Census of Agriculture is released every 5 years and, while there is a slight temporal mismatch between these data and the 2010 data collected as indicators of industrial and residential water demand, these data are useful because they illustrate where water-intensive crops are more likely to be grown.

Industrial Water Use
Industry is another activity that uses substantial amounts of water. Major industrial users of water are power generation, mining, livestock and aquaculture, and manufacturing [18,32,33]. Older manufacturing activities including those related to metals, chemicals, food, and petroleum [34] are well-known users of water, as is mining for solid minerals [18,35]. Water is used in industry as a solvent and in finished products; it is also used for washing and cooling [6]. New economy manufacturing activities oriented towards computer technology also use large amounts of water. For example, manufacturing semiconductors, which requires the use of ultrapure water, can consume millions of liters of water per day in one factory alone [36,37]. Manufacturers of everyday household products such as jeans and cars also demand a large amount of water. For example, an automotive manufacturer uses ~148,000 liters of water to produce one car, and a textile manufacturer uses ~11,000 liters to produce one pair of jeans [38].
To proxy for industrial uses of water, information concerning the number of businesses by county was obtained from the 2010 County Business Patterns of the U.S. Census Bureau for agriculture, forestry, fishing, and hunting (NAICS 11), mining (NAICS 21), utilities (NAICS 22), and manufacturing industries (NAICS 31-33). Establishments in these industries are grouped by the North American Industry Classification System (NAICS), which categorizes establishments by production process [39]. While there is overlap with the USDA data described in the previous section, business-level data about agriculture were collected to capture farming and related activities not otherwise captured in the USDA data, including ranching, hatcheries, vegetable growing, and orchards [39]. The utilities sector includes power generation activities for multiple types of power (electric, hydroelectric, fossil fuel, natural gas, and nuclear) [39].

Residential Water Use
Population growth [7,40] and population density [41] are popular means of characterizing human demands on water systems. In global studies of water scarcity, rising population levels increase competition for scarce water resources [42]. While this is certainly true in the developing world, in the developed world, where water saving fixtures and appliances are available, as are the means for water recycling and reuse, the link between people and water use is more complex. In this context, the characteristics of people, particularly their incomes, matter, as do the characteristics of housing [9]. Studies use housing density to analyze water use because lower household densities are associated with more outdoor water use [43]. Higher housing densities present fewer opportunities for water use in this context. Thus, in the developed world, the relationship between population density or housing density and water use is complex.
From the large volume of research conducted on water use, we know that the three largest building factors affecting residential water use are type and size of the dwelling as well as the age of construction. There is widespread agreement that the size of single-family homes (square footage, number of rooms, and number of bedrooms) predicts higher water use [44][45][46][47]. In particular, single-family homes use the most water at the household level because they are correlated with larger lot size, more square footage, and household size [48]. Zhou [49] and [44] found that the number of bathrooms in a house is a contributor to water use.
Household income is the most studied sociodemographic variable related to residential water consumption and the one for which there is the highest agreement in terms of linkages with water use. As income increases in individual households, studies find that total and per capita water use increase [50][51][52][53][54]. This is because household income is positively correlated with many other factors that are associated with higher water use, including single-family residences (SFR), house size, lot size, and water appliances, so that whatever wealthier households may gain in newer plumbing, water-saving devices, or greater awareness of water conservation strategies is more than offset by water-intensive lifestyles [9]. When income measures are not available, studies have used property values [45,55], education [56], occupation [44], and ethnicity [50] as proxies for affluence.
To account for the aforementioned factors that are associated with higher residential water use, three variables are drawn from the 2008-2012 American Community Survey of the U.S. Census Bureau compiled by the IPUMS National Historical Geographical Information System [27] where our study year of 2010 is the midpoint: the percentage of single-family homes in a county, the percentage of homes with four or more bedrooms, and median home value. The single-family home variable is designed to capture the relationship between single-family homes and greater water use [48]. The bedroom variable is intended to capture the likelihood that a home will have more bathrooms in the absence of data availability on bathrooms in households at the county level. Median home value is used to capture the link between home value and water use [45,57].

Climatic Variables
Water demand for agriculture and urban/residential use is modulated by climate [6]. Aside from the agricultural, industrial, and residential mix within individual counties, their climatic conditions are also important to capture because studies have highlighted that seasonal climatic conditions account for substantial variation in outdoor water use [50]. Many agricultural crops and plants in urban areas (e.g., lawns) require access to sufficient soil moisture to survive and grow. Soil moisture depends principally on the balance of water inputs (precipitation, irrigation) and outputs (evapotranspiration, runoff and percolation) to the soil system [58]. In regions and seasons where precipitation exceeds evapotranspiration, plants typically have access to sufficient water without the need for irrigation (provided precipitation is not too heterogeneously distributed in time and runoff is not excessive). However, in regions where precipitation cannot supply sufficient water, irrigation may be required, and water demand is likely to be more fully coupled to thermal, moisture, and radiative climate. In many regions of the contiguous U.S., summer months represent peak irrigation requirements due to more heterogeneous precipitation, higher evapotranspiration (due to higher temperatures and solar radiative forcing), and greater plant growth (whose underlying power source, photosynthesis, consumes additional water).
Here, we focus on precipitation inputs and evapotranspiration outputs to the local water balance for irrigated locations during summer. In addition to plant characteristics and relative amounts of water in the soil and atmosphere, evapotranspiration is driven by solar radiation and is strongly modulated by air temperature (which is positively correlated with solar radiation). Therefore, we include air temperature as a proxy for evapotranspiration; a more direct approach based on the Penman-Monteith equation could employ the "MODIS/Terra Net Evapotranspiration 8-day" [59] data set available from the US Geological Survey. Alternatively, latent heat flux dynamically downscaled by a regional climate model that was evaluated against a suite of multi-scale annual and seasonal observational products across the contiguous US would provide a third option [60,61]. Broad regional patterns of air temperature in the US have a latitudinal gradient; evapotranspiration, however, presents a stronger east-west variation between the dry western and humid eastern summertime US climates. However, in our view, there is no single data set that stands out from the others, and a number of systematic errors may bias each modelled evapotranspiration product in unknown ways. Accounting for the full range of potential differences and characterizing the source of these differences is a useful endeavor but is beyond the scope of this particular paper. Nevertheless, one caveat of our approach that uses air temperature as a surrogate for evapotranspiration is that it may overestimate variation between northern and southern US counties and underestimate corresponding east-west variation.
We use the June-August-average air temperature and precipitation total from 2010 at 4 km resolution from the observationally-based Parameter-elevation Relationships on Independent Slopes Model (PRISM) data set [62], and subsequently spatially average all 4 km by 4 km precipitation and temperature data grids located within each county. The PRISM data set is based on quality-controlled data from more than 10,000 observation stations and uses spatial regression modeling that considers location, elevation, coastal proximity, topographic facet orientation, vertical atmospheric layer, topographic position, and orographic effectiveness of the terrain to assess gridded values. This product represents a notable improvement over similar data sets, particularly in mountainous terrain [62]. Importantly, because PRISM is a spatially continuous dataset, it offers key advantages for the present application that distinct meteorological stations do not provide.
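The county-level averaging step described above can be illustrated with a minimal sketch. It assumes each 4 km grid cell has already been assigned to a county (for example, by overlaying cell centroids on county boundaries, a step not shown here); the function name `county_means`, the FIPS codes, and the temperature values are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def county_means(cells):
    """Average gridded values within each county.

    `cells` is an iterable of (county_id, value) pairs, one per grid cell.
    Cells with a missing value (None) are skipped.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for county_id, value in cells:
        if value is None:
            continue
        totals[county_id] += value
        counts[county_id] += 1
    return {county_id: totals[county_id] / counts[county_id] for county_id in totals}


# Hypothetical June-August mean temperatures (degrees C) for a few grid cells.
example_cells = [("01001", 26.0), ("01001", 28.0), ("01003", 30.0), ("01003", None)]
print(county_means(example_cells))
```

The same routine applies to precipitation totals, since both variables are averaged over the grid cells that fall within a county.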

Index Construction Methodology
To provide a multivariate characterization of demand-based pressures on water systems, the data discussed above were integrated into four indices (see Table S2 in Supplementary Materials for county-level values of indices calculated in this study). These indices are compared to estimates of water withdrawals in the year 2010 from the United States Geological Survey, a popular means of characterizing water use [63][64][65]. Data on water withdrawals are presented according to the broad categories of public supply, domestic, industrial, irrigation, livestock, aquaculture, mining, and thermoelectric, but we consolidate these categories into groups that more closely match those provided by the Food and Agriculture Organization (FAO) discussed later.
The methodology for constructing these indices and associated formulae are presented below. These indices are computed for U.S. counties, which are administrative subdivisions of states. Aside from the availability of data at the county level, counties have important administrative responsibilities in the United States, including education and law enforcement [66], and thus represent units of analysis that have administrative power over people and resources. Counties are also more geographically stable units for analysis than are other spatial subdivisions in the United States that are based on population thresholds such as Census blocks or tracts, which change size and shape as the population of places expands or contracts [67]. In this study, data were compiled for 3108 counties in the continental United States.

Unweighted Decile Index
The first index developed is an unweighted sum of deciles computed from the fourteen variables that characterize agricultural, industrial and residential pressures on water resources. To accomplish this, the variables summarized in Table 1 were placed into deciles (D i ) and assigned a value of 0 through 9, where a 0 corresponds with counties that fall within the lowest 10% of all counties and a 9 indicates counties that fall within the highest 10% of all counties. In this index low values identify places with lower water use while high index values indicate higher water use. For example, placement of an observation in the lowest 10% indicates it contains a very low value relative to other observations for a given variable of interest. Alternatively, placement of an observation into the top decile indicates that observation contains a very high value for a particular variable, relative to other observations in the dataset.
In the case of average precipitation, however, this schema is reversed and counties with more precipitation are placed into lower deciles. Unlike the other variables in Table 1, higher precipitation typically reduces water use via the provision of soil moisture to agricultural and residential landscapes that would otherwise need to be provided by irrigation. Once a decile value is computed for each of the fourteen variables, the decile values are added and standardized (see Equation (1)) so the index takes on values between 0 and 100.

$$\text{Decile Index} = \frac{\text{DecileSum} - \text{Minimum DecileSum}}{\text{Maximum DecileSum} - \text{Minimum DecileSum}} \times 100 \qquad (1)$$

where: DecileSum is the additive total of decile ranks across all variables in each county; Minimum DecileSum is the lowest value of the total of decile ranks across all counties; and Maximum DecileSum is the highest value of the total of decile ranks across all counties.

In this version of the index, no weights are applied. To address the issue of weighting, information about the distribution of water use between sectors is incorporated into indices discussed later in this paper (see Section 3.3).
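As a minimal sketch of the unweighted decile index, the code below assigns each variable a decile rank from 0 to 9, reverses the ranking for precipitation, sums the ranks by county, and rescales the sums to 0-100. The rank-based decile assignment and its handling of ties are assumptions made for illustration, and the function and variable names are hypothetical.

```python
def decile_ranks(values, reverse=False):
    """Assign each observation a decile rank from 0 (lowest 10%) to 9 (highest 10%).

    With reverse=True the largest values receive the lowest ranks, as is done
    for precipitation. Ties are broken by input order (an assumption).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i], reverse=reverse)
    ranks = [0] * n
    for position, i in enumerate(order):
        ranks[i] = (position * 10) // n
    return ranks


def min_max_0_100(values):
    """Rescale values so the minimum maps to 0 and the maximum to 100."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [100.0 * (v - lo) / (hi - lo) for v in values]


def unweighted_decile_index(table, reversed_vars=()):
    """Sum decile ranks across variables for each county, then standardize."""
    n = len(next(iter(table.values())))
    sums = [0] * n
    for name, column in table.items():
        ranks = decile_ranks(column, reverse=name in reversed_vars)
        sums = [s + r for s, r in zip(sums, ranks)]
    return min_max_0_100(sums)


# Hypothetical data for ten counties and two variables.
table = {
    "irrigated_share": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "summer_precip": [100, 90, 80, 70, 60, 50, 40, 30, 20, 10],
}
print(unweighted_decile_index(table, reversed_vars={"summer_precip"}))
```

Because the index is a plain sum of ranks, the contribution of any single variable to a county's score can be read directly from its decile rank.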

Principal Components Analysis Index
Principal components analysis (PCA) provides a means of constructing a weighted index of these same variables. PCA is a data reduction technique that produces linear combinations of the original variables, called components, which are uncorrelated combinations of variables that explain the greatest amount of variation in the dataset. This method also determines the strength and direction of the relationship between the variables and components, or component loadings, and uses these loadings to calculate component scores, which indicate the value for each county on each component. We employed several heuristics to determine an appropriate number of components to extract and eventually rotate. First, we included a random variable in the analysis and if this variable exhibited a moderate to high loading (±0.50 or greater), that particular component and any subsequent components were considered noise in the dataset and were removed. Next, we visually inspected the output and retained components with an eigenvalue greater than 1. Finally, we ran a parallel analysis of 1000 randomly generated datasets and identified components with an eigenvalue greater than mean eigenvalues produced from the randomly generated datasets [68].
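The parallel analysis heuristic can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the study: random datasets are drawn from independent standard normal distributions, components are retained in order until an observed eigenvalue no longer exceeds the mean random eigenvalue, and eigenvalues are computed with a simple Jacobi rotation routine so that the example needs only the Python standard library. All function names are hypothetical.

```python
import math
import random


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length, non-constant columns."""
    n = len(columns[0])
    unit = []
    for col in columns:
        mean = sum(col) / n
        dev = [x - mean for x in col]
        norm = math.sqrt(sum(d * d for d in dev))
        unit.append([d / norm for d in dev])
    p = len(columns)
    return [[sum(a * b for a, b in zip(unit[i], unit[j])) for j in range(p)]
            for i in range(p)]


def symmetric_eigenvalues(matrix, sweeps=50):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-18:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def parallel_analysis(columns, n_sims=1000, seed=0):
    """Count components whose eigenvalue exceeds the mean eigenvalue of random data."""
    rng = random.Random(seed)
    n, p = len(columns[0]), len(columns)
    observed = symmetric_eigenvalues(correlation_matrix(columns))
    mean_random = [0.0] * p
    for _ in range(n_sims):
        fake = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
        eig = symmetric_eigenvalues(correlation_matrix(fake))
        mean_random = [m + e / n_sims for m, e in zip(mean_random, eig)]
    retained = 0
    for obs, rand in zip(observed, mean_random):
        if obs <= rand:
            break  # assumption: stop at the first component that fails the test
        retained += 1
    return retained, observed, mean_random
```

Calling `parallel_analysis(columns)` with one list per variable returns the number of components to retain along with the observed and mean random eigenvalues, which can be compared against the eigenvalue-greater-than-1 rule described above.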
When these criteria are applied to our dataset summarized in Table 1, the PCA model identified five components for extraction and rotation. These five components were rotated orthogonally using the Varimax method which maximizes the sum of the variances of the squared loadings and, in effect, produces either large loadings or loadings near zero with few intermediate values and simplifies interpretation of the PCA results. The five components accounted for over 59% of the total variation in the data.
Component 1 illustrates corn and soy production per square mile across U.S. counties and, to a lesser extent, the impact of average annual precipitation. Each is positively correlated with the component indicating that corn and soy production increases or decreases alongside average annual precipitation. Average annual summer temperature (°C) and median house value are captured by component 2 and show an inverse relationship suggesting that as the annual average temperature increases, the median home value decreases. Wheat production per square mile is highlighted in component 3 and shows a positive relationship with the percentage of single-family housing units. Cotton production per square mile is positively correlated with the percentage of irrigated agricultural land in component 4 while agricultural establishments and energy-producing establishments as a percentage of total establishments are captured in component 5.
To incorporate weighting into a composite index of water demand, we followed previous research [69,70] and used the results of the PCA from Table 2 to create an unstandardized index of demand (USID) for each county i with the following notation:

$$\text{USID}_i = \sum_{j} P_j f_{ij}$$

where: P_j is the amount of variance explained by component j divided by the cumulative variance explained by the model; and f_ij is the component score on each component j for each county i.

This index was standardized per the notation below so that it can be compared to the unweighted decile index described above. This standardized version of the PCA weighted index takes on values between 0 and 100, where lower numbers correspond to counties that have fewer demands on water resources and counties with high index values have higher demands on water resources.

$$\text{PCA Weighted Index} = \frac{\text{County USID} - \text{Minimum USID}}{\text{Maximum USID} - \text{Minimum USID}} \times 100$$

where: County USID is the non-standardized value of the PCA weighted index for each county; Minimum USID is the lowest non-standardized value of the PCA weighted index across all counties; and Maximum USID is the highest non-standardized value of the PCA weighted index across all counties.
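A minimal sketch of this calculation is given below, assuming the explained variance of each retained component and the county component scores are already available from a PCA routine. The function name and the numeric values in the example are hypothetical and are not the values reported in Table 2.

```python
def pca_weighted_index(explained_variance, scores):
    """Compute the unstandardized index of demand (USID) and its 0-100 rescaling.

    explained_variance: variance explained by each retained component.
    scores: one row per county, holding that county's score on each component.
    """
    total = sum(explained_variance)
    weights = [v / total for v in explained_variance]  # P_j
    usid = [sum(w * f for w, f in zip(weights, row)) for row in scores]
    lo, hi = min(usid), max(usid)
    standardized = [100.0 * (u - lo) / (hi - lo) for u in usid]
    return usid, standardized


# Hypothetical example: two retained components and three counties.
usid, index = pca_weighted_index([3.0, 1.0], [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
print(usid)
print(index)
```

Because each weight is a component's share of the cumulative explained variance, components that explain more variance contribute more to a county's score.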

Composite Principal Components Analysis Index
A drawback of the unweighted decile and PCA-derived indices described above is that they do not reflect the variation in water use across various sectors. While there is variation in this distribution of uses across counties [22], an index that incorporates this allocation of use is perhaps more insightful than one based on use-blind weights, such as those derived from the PCA above. To harness the analytical power of PCA and incorporate weights that reflect the distribution of water use across agriculture, industry, and households, a variant on the PCA-based index is proposed. To construct this index, three separate PCA models (Table 3) were run on the following groups of variables:

1. Agriculture: irrigated land as a percentage of total land area; tons of corn per square mile; bales of cotton per square mile; bushels of soy per square mile; and bushels of winter wheat per square mile
2. Industrial: agricultural establishments as a percent of total establishments; mining establishments as a percent of total establishments; manufacturing establishments as a percent of total establishments; and energy generating establishments as a percent of total establishments
3. Residential: percentage of housing units with four bedrooms or more; single-family housing units as a percent of total housing units; and median home value

These independent PCA models were also rotated orthogonally using the Varimax method to produce uncorrelated components. Communalities describe the amount of variation in each variable explained by the PCA models and these values serve as weights in our composite PCA index. Our original variables were converted to z-scores to account for differences in measurement scale and we applied the communality weights to obtain a weighted total for each group of variables:

$$\text{Group}_i = \sum_{k \in \text{Group}} c_k z_{ik}, \qquad \text{Group} \in \{\text{Agriculture}, \text{Industrial}, \text{Residential}\}$$

where: c_k is the communality of variable k; and z_ik is the z-score of variable k in county i.

Next, each group of variables was combined into a single, composite measure and additional weights were assigned based on information about water use across agricultural, industrial, and municipal activities. Two sets of weights are used at this stage to reflect variations in use between the agricultural, industrial, and municipal sectors. These two sets of weights come from information about water use in the United States from the Aquastat database of the Food and Agriculture Organization [29] and the United States Geological Survey [18]. The FAO indicates uses of 36% agriculture, 51% industrial, and 13% municipal, and the USGS indicates uses of 27% agriculture, 60% industrial, and 13% municipal. As such, we used the weighted totals of the variable groups above and calculated two distinct composite PCA indices as follows:

$$\text{FAO Weighted PCA} = (\text{Agriculture} \times 0.36) + (\text{Industrial} \times 0.51) + (\text{Residential} \times 0.13)$$

$$\text{USGS Weighted PCA} = (\text{Agriculture} \times 0.27) + (\text{Industrial} \times 0.60) + (\text{Residential} \times 0.13)$$

Then, we standardized each of the composite PCA indices to a range of 0-100 as follows:

$$\text{Composite PCA Index} = \frac{\text{County CompPCA} - \text{Minimum CompPCA}}{\text{Maximum CompPCA} - \text{Minimum CompPCA}} \times 100$$

where: County CompPCA is the non-standardized value of the USGS or FAO weighted index for each county; Minimum CompPCA is the lowest non-standardized value of the USGS or FAO weighted index across all counties; and Maximum CompPCA is the highest non-standardized value of the USGS or FAO weighted index across all counties.
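The sector weighting and standardization steps can be sketched as follows. The sketch assumes the communality-weighted totals for the agriculture, industrial, and residential groups have already been computed for each county; the dictionary keys, function name, and example county values are hypothetical, while the weights are the FAO and USGS shares quoted above.

```python
SECTOR_WEIGHTS = {
    "FAO": {"agriculture": 0.36, "industrial": 0.51, "residential": 0.13},
    "USGS": {"agriculture": 0.27, "industrial": 0.60, "residential": 0.13},
}


def composite_index(group_totals, weights):
    """Combine per-county group totals with sector weights, then rescale to 0-100.

    group_totals: one dict per county with keys matching `weights`.
    """
    raw = [sum(weights[sector] * county[sector] for sector in weights)
           for county in group_totals]
    lo, hi = min(raw), max(raw)
    return [100.0 * (r - lo) / (hi - lo) for r in raw]


# Hypothetical counties dominated by one sector each.
counties = [
    {"agriculture": 1.0, "industrial": 0.0, "residential": 0.0},
    {"agriculture": 0.0, "industrial": 1.0, "residential": 0.0},
    {"agriculture": 0.0, "industrial": 0.0, "residential": 1.0},
]
for source, weights in SECTOR_WEIGHTS.items():
    print(source, composite_index(counties, weights))
```

In this toy example the agriculture-dominated county scores higher under the FAO weights than under the USGS weights, which reflects the larger agricultural share reported by the FAO.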
It is worth noting that the two sources report water use under different categories. To align the sectoral categories reported by the FAO with the use categories reported by the USGS, the USGS use categories were mapped to the FAO categories in the following manner: municipal use contains the domestic withdrawals component of public supply; agriculture contains withdrawals for irrigation, livestock, and aquaculture; and industrial withdrawals are classified as all other uses not classified as municipal or agriculture. This latter category includes water withdrawn for use in mining and thermoelectric power generation. It is important to note that, in our analysis, information about domestic use is the only component of municipal use. Municipal use is defined by the FAO as "water withdrawn for the direct use by the population" and can contain industrial and agricultural uses [22]. In the USGS database, public supply is the closest analog to municipal use and contains "water delivered to users for domestic, commercial, and industrial purposes" [71]. Since it was not possible to decompose public supply withdrawals into their constituent parts, only the domestic portion is classified as "municipal" and the remainder is allocated to industrial use. This is not considered a strong assumption since "domestic deliveries represent the largest single component of public supply withdrawals" [71].

Figure 1 presents the four strategies for characterizing potential water use. Figure 1a displays the distribution of index values for the unweighted index. Figure 1b presents the distribution of index values for the principal components derived index. Figure 1c,d present the index values for the USGS and FAO weighted principal components indices, respectively. Each of these figures displays the index of interest in terms of quintiles with breaks set at the 20th, 40th, 60th, and 80th percentiles.
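The mapping of USGS withdrawal categories onto the three FAO-style groups can be expressed as a simple lookup. This is a sketch only: the category keys are hypothetical labels rather than USGS field names, the withdrawal volumes are invented for illustration, and any category not explicitly mapped is assigned to industrial use, following the "all other uses" rule described in the text.

```python
# Hypothetical category labels; anything not listed falls through to "industrial".
FAO_GROUP = {
    "public_supply_domestic": "municipal",
    "irrigation": "agriculture",
    "livestock": "agriculture",
    "aquaculture": "agriculture",
}


def regroup_withdrawals(withdrawals):
    """Aggregate withdrawals by category into agriculture, industrial, and municipal."""
    totals = {"agriculture": 0.0, "industrial": 0.0, "municipal": 0.0}
    for category, amount in withdrawals.items():
        totals[FAO_GROUP.get(category, "industrial")] += amount
    return totals


# Invented volumes for a single county, in arbitrary units.
county = {
    "public_supply_domestic": 10.0,
    "public_supply_other": 5.0,
    "irrigation": 20.0,
    "livestock": 2.0,
    "aquaculture": 3.0,
    "mining": 4.0,
    "thermoelectric": 50.0,
    "industrial": 6.0,
}
print(regroup_withdrawals(county))
```

Dividing each group total by the sum of all withdrawals gives sector shares comparable in form to the FAO and USGS percentages used as weights.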
We present our results using quintiles for two reasons: quintiles contain approximately the same number of counties in each of the five groups and they simplify visualization and interpretation by using a standardized range of values for each version of the index. Lower quintiles represent lower water use while higher quintiles indicate higher water use.

The PCA-weighted version of the index presented in Figure 1b illustrates several similarities and differences with the unweighted decile index. For example, this index also identifies higher water use in the North and Central Plains states but with greater concentration and intensity. The Rustbelt region and the Southeastern United States (with the exception of Appalachia) are also shown to be areas of higher water use according to the PCA-weighted index but, again, with varying intensities. Unlike the unweighted decile index, the West Coast states of Washington, Oregon, and California are shown to be areas having lower water use. An exception to this pattern is the Central Valley of California (which includes the San Joaquin River Valley). As well, the Northeast portion of the country, particularly New York, New Jersey and Pennsylvania, is shown to be an area with lower potential water use when compared against the unweighted decile index.
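The quintile classification used for the maps, with breaks at the 20th, 40th, 60th, and 80th percentiles, can be sketched with the Python standard library. The choice of the "inclusive" percentile method and the rule that a value equal to a break falls into the lower class are assumptions made for illustration; the function name is hypothetical.

```python
from bisect import bisect_left
from statistics import quantiles


def quintile_classes(index_values):
    """Assign each index value to a quintile class from 1 (lowest) to 5 (highest)."""
    # Four cut points at the 20th, 40th, 60th, and 80th percentiles.
    breaks = quantiles(index_values, n=5, method="inclusive")
    # A value equal to a break is placed in the lower class (an assumption).
    return [bisect_left(breaks, value) + 1 for value in index_values]


# Hypothetical index values on the 0-100 scale.
values = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(quintile_classes(values))
```

Applying the same function to each of the four indices gives classes that are directly comparable across maps, since every index is reduced to the same five ordered groups.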

Results

Geographic Trends in Metrics
A visual comparison of the panels presented in Figure 1 highlights differences in potential water use. The unweighted decile index (Figure 1a) indicates five regions with high water use. The first region is in and adjacent to the San Joaquin River Valley in California. Counties of the Northern and Central Plains states where soybeans and corn are grown extensively also are delineated as having high water use. The industrial Rustbelt, consisting of counties located in Illinois, Indiana, Michigan and Ohio, is a third region identified as having high water use. A fourth region with high water use, the Mississippi River Valley, includes counties in the states of Mississippi, Louisiana and Arkansas which are some of the more intensive cotton producing locations in the country [72]. The southeastern seaboard is a fifth region with high water use and includes counties in Georgia, South Carolina, North Carolina and Virginia, where the apparel and automobile industries are a strong presence [73] as well as low-wage manufacturing and military defense industry [74]. On the other hand, counties along the Northeastern seaboard and those located in Appalachia are classified as having lower water use. The same is true for most counties in Florida as well as counties surrounding Seattle and Portland in the Pacific Northwest.
The PCA-weighted version of the index presented in Figure 1b illustrates several similarities and differences with the unweighted decile index. For example, this index also identifies higher water use in the North and Central Plains states but with greater concentration and intensity. The Rustbelt region and the Southeastern United States (with the exception of Appalachia) are also shown to be areas of higher water use according to the PCA-weighted index but, again, with varying intensities. Unlike the unweighted decile index, the West Coast states of Washington, Oregon, and California are shown to be areas having lower water use. An exception to this pattern is the Central Valley of California (which includes the San Joaquin River Valley). As well, the Northeast portion of the country, particularly New York, New Jersey and Pennsylvania, is shown to be an area with lower potential water use when compared against the unweighted decile index.
This picture of water use is quite different from the picture presented by the unweighted index. The reason for these differences is that the PCA index uses the proportion of cumulative variance explained by the individual components as weights and these vary across geographic space. For instance, our first component explains 17.3% of the cumulative variance and receives greater weight when calculating the index while our fifth component receives less weight because it explains only 9% of the cumulative variance. In other words, water-intensive crops such as corn and soy, which are the highest loading variables on our first component, contribute more to this index than the agricultural and energy generating establishments that define our fifth component.
Figure 1c,d present the geography of water use, as indicated by the USGS- and FAO-weighted indices, respectively. Visually, these indices are quite similar to one another and they also share several similarities with the decile index. Like the decile index, they highlight the Central Valley farming region of California as having high water use. The agriculturally intensive Plains states are also highlighted as having higher water use. Pockets of higher water use are noticeable in industry-intensive states including Illinois, Indiana, Michigan and Ohio. The Mississippi River Valley and the Southeastern seaboard are also pockets of higher water use. These results reflect a more nuanced weighting scheme that accounts for distinct regional variation in agricultural, industrial, and residential characteristics across US counties. For this reason, it is not surprising that farming-intensive regions (e.g., the Central Valley of California or the Great Plains) show high potential water use, considering that agricultural activities constitute a large share of overall water use. Likewise, industrial activity that remains in the traditional manufacturing belt continues to indicate potentially high use of water.
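The variance-based weighting of the PCA index can be illustrated with a short Python sketch. It assumes the index is a weighted sum of component scores, with weights proportional to the share of variance each component explains and rescaled to sum to one; the exact formula is not restated in this section, so this is an assumption. Only the shares for the first (17.3%) and fifth (9%) components come from the text; the middle three shares and all component scores are hypothetical.

```python
def pca_weighted_index(component_scores, explained_variance):
    """Combine component scores into one index value per county.

    component_scores: one row per county, one score per component.
    explained_variance: share of variance explained by each component.
    Weights are rescaled to sum to 1 (an assumption of this sketch).
    """
    total = sum(explained_variance)
    weights = [share / total for share in explained_variance]
    return [
        sum(w * s for w, s in zip(weights, row))
        for row in component_scores
    ]


# 0.173 and 0.09 come from the text; the middle three shares are hypothetical.
explained_variance = [0.173, 0.14, 0.12, 0.10, 0.09]

# Hypothetical component scores for two counties.
scores = [
    [2.0, 0.0, 0.0, 0.0, 0.0],  # high on component 1 (corn, soy)
    [0.0, 0.0, 0.0, 0.0, 2.0],  # high on component 5 (establishments)
]
index = pca_weighted_index(scores, explained_variance)
```

Under these assumptions, two counties with identical scores on different components receive different index values, and the county that scores highly on the first component ranks higher.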

Metric Comparison
Given the similarities and differences highlighted, a series of contingency tables were tabulated (Tables 4 and A1-A4) to quantify the degree to which results vary across the different indices of potential water use. The rows and columns of the tables may be interpreted as the number of counties that were classified in a particular quintile for one index but changed quintile values in another index version. The diagonals represent counties that were classified in the same quintile for both indices. Table 4 compares the quintile assignments of the unweighted decile and PCA indices. This table shows some agreement in the lowest quintile: 9% of observations are placed in the first quintile by both the unweighted decile index and the PCA index. There is also some agreement in the assignment of counties to the fifth quintile; 10% of observations are placed in this quintile by both indices. In quintiles 2-4 there is somewhat less agreement between the indices, as indicated by the off-diagonal values. To summarize similarities in quintile classification, the values along the diagonal are summed; they total 36%. This number means that 36% of all counties were assigned to the same quintile by both indices.
Table 5 summarizes the similarities and differences between the four indices computed in this study. The numbers in this table were computed by summing the diagonal elements of Tables 4 and A1-A4. This table also presents the similarities and differences with total withdrawals data from the USGS 2010 estimates [18]. To compute this comparison, water withdrawals were assigned to quintiles (Table A4) and compared with the quintiles for the indices derived in this study. Based on the information in this table, there are some similarities between the derived indices and the quintiles derived from USGS water withdrawal data. Of these indices, the weighted PCA indices are most similar to one another; they classified 83% of all counties in the same quintile. Of these two indices, the FAO-weighted PCA has the most similarities with the other indices. It is 41% similar to the unweighted decile index and 45% similar to the unweighted PCA index. A comparison of the USGS water withdrawal data and the four indices indicates that they are dissimilar from one another; the indices place counties in the same quintile as the withdrawal data about 20% of the time. The unweighted decile index is the most similar of the derived indices to the USGS data, with about 23% of counties classified similarly.
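The similarity figures discussed above follow from a simple calculation: cross-tabulate two quintile assignments and sum the diagonal. The Python sketch below shows one way to do this. The function names and the ten example counties are hypothetical, and cells are expressed as shares of all counties rather than counts.

```python
from collections import Counter


def contingency_table(quintiles_a, quintiles_b, n_classes=5):
    """Cross-tabulate two quintile assignments as shares of all counties."""
    counts = Counter(zip(quintiles_a, quintiles_b))
    n = len(quintiles_a)
    return [
        [counts[(row, col)] / n for col in range(1, n_classes + 1)]
        for row in range(1, n_classes + 1)
    ]


def agreement_share(table):
    """Sum the diagonal: share of counties placed in the same quintile."""
    return sum(table[i][i] for i in range(len(table)))


# Hypothetical quintile labels for ten counties under two indices.
index_a = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
index_b = [1, 2, 2, 3, 3, 1, 4, 5, 5, 4]
table = contingency_table(index_a, index_b)
share_same = agreement_share(table)
```

In this toy example half of the counties fall on the diagonal, so the two hypothetical indices would be reported as 50% similar.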
To expand on the analysis in Table 5, Figures 2-4 depict the geographic similarities and differences between the four indices developed in this paper via difference maps. To construct these maps, the quintile of one index is subtracted from the quintile of the comparison index. For example, Figure 2a compares the decile index to the principal components derived index. If the results of this difference are positive, the decile index has higher values than does the PCA index. These locations appear in green on the map. If the results of this difference are negative, meaning that the decile index produces lower index values than the PCA index, counties are displayed in brown. Gray counties in this figure are counties that were classified in the same quintile by each index. Figure 2 summarizes the similarities and differences between the decile index and the other three indices. A comparison with the PCA index (Figure 2a) reveals that the decile index presents a picture of higher potential use in the West and along the East coast. Alternatively, the PCA index indicates higher use in the Plains states of Iowa and Nebraska, as well as portions of Southeast states including Texas, Florida, Louisiana and Arkansas. These differences are derived from how the indices are constructed. The PCA index emphasizes agricultural and industrial variables while the decile-based index affords equal consideration to all variables including residential variables. As described in the contingency tables, there is some agreement between the two indices; counties in gray are evident in the Plains states, as well as counties in several states including Illinois, Indiana, and New York.
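The construction of the difference maps can be sketched in a few lines of Python. The sign convention follows the Figure 2a example above (decile quintile minus PCA quintile, so positive values mean the decile index is higher). The function names and the four example counties are hypothetical.

```python
def quintile_difference(reference, comparison):
    """Subtract the comparison index's quintile from the reference's."""
    return [r - c for r, c in zip(reference, comparison)]


def map_color(difference):
    """Color used on the difference map for one county."""
    if difference > 0:
        return "green"  # reference index gives the higher quintile
    if difference < 0:
        return "brown"  # reference index gives the lower quintile
    return "gray"       # same quintile under both indices


# Hypothetical quintiles for four counties.
decile_index_quintiles = [5, 2, 3, 1]
pca_index_quintiles = [3, 4, 3, 1]
diffs = quintile_difference(decile_index_quintiles, pca_index_quintiles)
colors = [map_color(d) for d in diffs]
```

Because both inputs are quintiles, the differences range from -4 to 4, and gray counties correspond to the diagonal cells of the contingency tables.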
A comparison of the decile and USGS-weighted PCA indices (Figure 2b) conveys a somewhat different picture. The USGS-weighted index categorizes the Western states as having higher potential use than does the decile index. This is particularly evident in Oregon, Washington, Montana, and Wyoming. The USGS-weighted index also ascribes higher potential use to Plains states such as Nebraska and Iowa than does the decile index. The decile index, which ascribes equal weighting to all variables, indicates more potential for water use along the East Coast and in states such as Nebraska, Oklahoma, and Texas. A comparison with the FAO-weighted index (Figure 2c), which ascribes slightly more weight to industry than does the USGS weighting scheme, presents a similar picture of differences in potential use between the two indices. The decile index ascribes higher potential use in the Plains states and the Southeastern region of the country. These differences are largely attributable to several agricultural variables in the decile index that are capturing these regions even though their mix of crops is different.
Unlike the decile index, which produced somewhat mixed patterns of similarities and differences, a comparison of the PCA index and the other three indices in Figure 3 produces more geographically pronounced trends. Figure 3a contains a difference map of the PCA and decile indices for comparative purposes only since this difference map for the two indices was discussed above. In terms of how the PCA index compares to the USGS-weighted (Figure 3b) and FAO-weighted (Figure 3c) indices, the patterns of similarity and difference are much alike. Both the USGS and FAO indices, which are weighted more towards industry, present the West and Northeast as having higher potential water uses than the PCA index. Alternatively, the PCA index classifies counties in the Plains states, the Midwest, and portions of the Southeast as having higher potential water use.
A geographic comparison of the quintile classifications highlights that those based on the USGS withdrawal data present counties in the West as having higher potential water use than do the indices derived in this paper (Figure 4). The same is true for counties in the Northeast and in Florida. These differences in classification are most stark in the comparison of the decile (Figure 4a) and PCA-derived (Figure 4b) indices to the quintiles based on withdrawal data. The differences between quintile classifications fade a bit once FAO and USGS weighting schemes are incorporated (Figure 4c,d). More gray counties, indicating classification similarities between two indices, appear in the West and Northeast. All four indices characterize the agriculturally intensive Plains states as having higher potential use than do the withdrawal data. However, these findings are made cautiously since publications by the USGS indicate the types of data used in their analysis, but the precise weighting schemes are not presented [18,75].

Index Decomposition
One means of understanding variations in classification results between indices (that is, the drivers of index values and how these map to quintile classifications) is the decomposition of each index into individual components. This moves beyond industry decomposition into municipal, industrial, and agricultural uses, which is possible via information from Aquastat [29], or the decomposition of withdrawal data into different types of uses (e.g., irrigation, thermoelectric power, public supply). We use the unweighted decile index to demonstrate the value of this approach. Figure 5a-d presents a decomposition analysis of the unweighted decile index and three of its constituent variables in decile form.
Figure 5c shows counties with high potential water use located in the Southeast region as well as the Pacific Northwest. Single-family homes, depicted in Figure 5d, represent heightened potential water use and counties positioned in higher deciles are located in and around the North and Central Plains, the upper Midwest and Rustbelt, and several counties in the Rocky Mountain region.
The contribution of various factors to potential water use across the United States is complicated, as demonstrated by the series of indices derived in this paper, but through a decomposition analysis, it is possible to begin to understand how, why, and where potential water use varies geographically.
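A decomposition of this kind can be sketched in Python as follows. The sketch assumes that the unweighted decile index is the plain sum of each variable's decile rank, which is an assumption about its construction rather than a restatement of the method. The variable names and decile values are hypothetical, apart from single-family homes, which is one of the variables shown in Figure 5.

```python
def decompose(county_deciles):
    """Split an additive index into each variable's share of the total."""
    total = sum(county_deciles.values())
    shares = {name: value / total for name, value in county_deciles.items()}
    return total, shares


# Hypothetical decile ranks (1-10) for one county; names are illustrative.
county = {"corn_acres": 9, "power_plants": 3, "single_family_homes": 6}
index_value, shares = decompose(county)

# The variable with the largest share is the main driver for this county.
top_driver = max(shares, key=shares.get)
```

For this hypothetical county, half of the index value comes from the agricultural variable, which is the kind of locally specific driver the decomposition is meant to surface.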

Conclusions
Given the scarcity of freshwater resources across the globe, the goal of this paper was to derive and evaluate the classification performance of indices that could serve as proxies for water use. Classification similarities between the indices were compared against one another and against water withdrawal data from the USGS, which is a frequently used proxy for water use [63][64][65]. This comparison revealed varying degrees of similarity between the four indices in terms of the classification of counties into quintiles. County classifications from the PCA derived indices were more similar to one another but dissimilar from those based on the water withdrawal data. The unweighted decile-based index was most similar to the water withdrawal data. From a geographic perspective, there were also significant differences in the geography of potential water use depicted by each index. These differences reflect the variations in weighting schemes from one index to another. The differences in index values between the indices presented in this study and the USGS data are likely due to differences in the composition of the indices. While the indices constructed in the study use similar data sources as the USGS water withdrawal estimates, the composition of the indices and these data are quite different from one another.
Water withdrawals include consumptive uses of water, conveyance losses (i.e., water lost in transit), as well as water returned to surface and groundwater sources [76]. Ideally, a breakout of withdrawals into each of these components would be available, since it is important for understanding consumptive uses, which are not returned to water sources, and leakages from water systems. The latter is particularly important to analyze given the volume of water lost from leaks annually [77,78]. That said, this level of granularity in data at the regional level is difficult to find. In the absence of this information, deriving information about water use is an ongoing challenge that requires striking a balance between coverage and precision [76]. Numerous means of estimating water use are available to the scientific community, ranging from input-output based techniques to multivariate regression [76]. Given this diversity of approaches, estimates of water use are better referred to as representing "potential use" instead of actual use. Thus, the indices offered in this paper represent a simpler approach to representing potential water use at the regional level and offer four advantages over prior measures of use. First, the indices make use of publicly available data that are easy to collect and integrate. Second, the additive indices are decomposable into their constituent components, which makes it easier to pinpoint the individual drivers of potential use. Third, the indices integrate a targeted yet reasonably comprehensive list of variables across agricultural, industrial and residential sectors and include data about climatic conditions. Fourth, the localized resolution is likely to be more informative within the policy realm than are aggregate measures of use.
These indices are a first step in the direction of computing robust indices of water use at the regional level. The results of this analysis suggest four areas for additional research. The first line of research is with respect to a need for work comparing existing measures of water use to one another as well as across time. A variety of strategies are used to estimate water use because of the "legal, financial, and political constraints" associated with collecting precise information [76]. Comparisons of these measures in terms of their component variables and weighting schemes (as was done in this study) are critical to understanding differences in water use presented by various metrics; this includes where measures agree or disagree and how measures change over time.
The present study examined various sectoral weighting schemes based on national averages and did not pursue regionalized weighting schemes. However, a second extension to this study is the exploration of regionalized weighting schemes that reflect regional variations in water use [20]. This type of analysis would require a detailed examination of water withdrawal estimates to design potential weighting schemes. Third, the present study examined proxies for water use, which is one element of water stress. Given the time and effort dedicated to the development of water stress indices, similar comparative studies of water stress metrics could be undertaken to understand differences and the sources of variation in these metrics.
It is also important to note that this study developed indices for water use in the United States, which is one of the larger users of water globally. As discussed previously, water use, as well as the availability of water-saving technologies and the financial ability of households to purchase and adopt them, is perhaps greater in the developed world than in the developing world. This suggests that some of the drivers of water use in the developing world may differ from those discussed in the present study. Thus, an evaluation of the utility of these indices in the developing world, and of any necessary modifications to them, is a fourth recommended area for future research.
Good proxies for water use at the regional level on an annual basis are needed to help in the management of scarce freshwater resources. To date, estimates of water use are varied and often difficult to replicate. Thus, regional metrics that are possible to replicate and estimate on an annual basis are needed to track water use at finer temporal and geographic scales. This study represented one step in this direction, but it is hoped that more will follow to enhance our knowledge and management of this vital natural resource.
Supplementary Materials: The supplementary material in the Word document contains Table S1, which serves as the data dictionary for this dataset, and is available at http://www.mdpi.com/2071-1050/11/8/2292/s1. The supplementary material in the Excel document (Table S2) contains county level indices that serve as proxies for water demand as computed in this paper and is available at http://www.mdpi.com/2071-1050/11/8/2292/s2.

Acknowledgments:
The authors wish to thank the anonymous reviewers for helpful comments and suggestions that greatly improved the content of this research.

Conflicts of Interest:
The authors declare no conflict of interest.