Weight Approximation for Spatial Outcomes

When preferences explicitly include a spatial component, it can be challenging to assign weights to geographic regions in a way that is both pragmatic and accurate. In multi-attribute decision making, weights reflect cardinal information about preferences that can be difficult to assess thoroughly in practice. Recognizing this challenge, researchers have developed several methods for using ordinal rankings to approximate sets of cardinal weights. However, when the set of weights reflects a set of geographic regions, the number of weights can be enormous, and it may be cognitively challenging for decision makers to provide even a coherent ordinal ranking. This is often the case in policy decisions with widespread impacts. This paper uses a simulation study for spatial preferences to evaluate the performance of several rank-based weight approximation methods, as well as several new methods based on assigning each region to a tier expressing the extent to which it should influence the evaluation of policy alternatives. The tier-based methods do not become more cognitively complex as the number of regions increases, they allow decision makers to express a wider range of preferences, and they are similar in accuracy to rank-based methods when the number of regions is large. The paper then demonstrates all of these approximation methods with preferences for water usage by census block in a United States county.


Introduction
When the impacts of a policy decision vary geographically, decision makers will need to evaluate and compare spatial outcomes. In some cases, simple summary metrics (e.g., average pollution level, total annual water usage) might be sufficient to capture decision makers' preferences adequately. However, if the level of an attribute is more important or meaningful in some locations than in others, then a more detailed assessment of preferences is required. This contrast between "spatially implicit" and "spatially explicit" analysis is discussed by [1], who point out that spatial weights are the crucial distinguishing component of the latter. Each geographic region is assigned a spatial weight capturing tradeoffs that a decision maker is willing to make between regions; a higher weight indicates that the results in that region are factored more heavily into the evaluation of outcomes. Simon, Kirkwood, and Keller [2] provide a theoretical framework for applying a multiattribute value function that includes a set of spatial weights for all of the regions impacted by a decision.
There are several thorough and effective methods for assessing weights that could be applied to a set of spatial weights, including direct point allocation, pricing out, the value tradeoff method [3,4], simple multiattribute rating technique (SMART) [5,6], and swing weighting [6]. However, these methods tend to require substantial time and effort on the part of the decision maker to obtain accurate results. Some methods also require a level of comfort with quantitative thinking not possessed by all decision makers. Recognizing these challenges, researchers have developed and evaluated many techniques for approximating attribute weights, ranging from a very simple assumption that all weights are equal [7] to a calculation of the centroid of the set of weight vectors that follow a specified ranking [8]. Spatial weights, however, pose a new dilemma. The number of regions, and thus the number of weights required, may be extremely large. It is therefore valuable to explore how well various weight approximation techniques perform in this setting.
Many of the common approximation techniques are based on ordinal rankings of the attributes (in this case, the geographic regions). We explore these, as well as three new weight approximation methods that are based on the assignment of each attribute to one of a small number of pre-determined tiers. A tier expresses the degree to which the decision maker would like the attribute to impact the evaluation of alternatives. The set of tiers can be represented clearly using, e.g., a Likert scale or a color-coding scheme. For spatial decision problems, tier-based methods may be preferable to approximations based on ordinal rankings for two main reasons.
First, they require substantially less time and effort from the decision maker. Ranking methods were devised to be less cognitively demanding than the most thorough elicitation methods, but they can still be quite challenging to carry out manually when the number of attributes to be ranked is very large. The results of a simple experiment presented in Section 3 illustrate this advantage of the tier approach. Second, tier-based methods are more flexible. Given n attributes, a rank-based method can only assign n possible weights to any individual attribute (assuming no ties). As we will demonstrate, tier-based methods are much less restricted.
Despite the relative lack of information the decision maker is able to provide about each attribute individually, tier-based methods still perform well for large sets of spatial weights. This is demonstrated using weight simulations in Section 4. The tier-based methods are intuitive and can be a helpful alternative to existing methods.
The remainder of the paper proceeds as follows. Section 2 explores prior related work on attribute weight approximation and the use of spatial weights in decision making. Section 3 describes the approach used in this paper to evaluate approximation techniques. Section 4 presents the results and compares the performance of various techniques. Section 5 applies these weight approximation techniques to preferences for water usage by census block in a county in the United States. Finally, Section 6 offers some additional discussion and concludes the paper.
Any multiattribute decision requires consideration, implicitly or explicitly, of the tradeoffs that stakeholders are willing to make between those attributes. Numerous methods have been developed for eliciting weights on the attributes, as noted in the Introduction. Schoemaker and Waid [22], Borcherding, Eppel, and von Winterfeldt [23], and Pöyhönen and Hämäläinen [24] explore comparisons of these and additional techniques. In a spatial setting, Hobbs [25] compares a rating and a tradeoff weighting method in a power plant siting problem, and finds that the two methods lead to substantially different choices of location.
As Malczewski and Jankowski [1] discuss, while additive multiattribute value functions are common in spatial decisions, the explicit inclusion of a set of weights assigned to a set of regions is a more recent development. Theory supporting this form of spatial preferences is developed by [2,26,27]. Ferretti and Montibeller [28] explore the challenges of using decision support systems in spatial analysis, one of which is the elicitation of weights. In these decision settings, it is possible for the number of regions (and thus the number of spatial weights) to be extremely large. For instance, Ferretti and Montibeller [29] explore an illustrative flood risk example that divides a map into 400 regions, each of which is assigned a vulnerability score.
Due to the substantial time and cognitive effort involved in the weight elicitation techniques mentioned previously, researchers have introduced many ways to approximate weights using simpler approaches. The easiest possible approach is to assign equal weights to all attributes [7], which would be the most reasonable approximation in the absence of any preference information whatsoever. If an ordinal ranking of attributes can be obtained, there are several possible methods for converting that ranking into a set of approximated weights [30]. Perhaps the two simplest rank-based methods are rank-sum, in which the i-th most important attribute receives a weight of n + 1 − i, and rank-reciprocal, in which the i-th most important attribute receives a weight of 1/i. In both methods, the weights are then normalized to sum to 1. Barron [8] introduces another rank-based method called rank-order centroid. Rank-order centroid is based on the center of mass of the subset of the simplex that conforms to the stated ranking. For each attribute, it assigns the attribute's average weight over all possible weight vectors in that rank order. In a simulation experiment, Barron and Barrett [31] find that rank-order centroid is the most accurate of the methods listed here.
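The centroid interpretation of rank-order centroid can be checked numerically. The following is a minimal Python sketch, assuming (as in Barron and Barrett's simulations) that true weights are uniformly distributed over the simplex: it sorts many simulated weight vectors and compares the average weight held by each rank position with the standard closed-form ROC weight, (1/n) Σ_{j=i}^{n} 1/j. The function names are illustrative, not part of any published implementation.

```python
import random

def flat_dirichlet(n, rng):
    """One weight vector drawn uniformly from the simplex (uniform-spacings method)."""
    cuts = sorted(rng.random() for _ in range(n - 1))
    edges = [0.0] + cuts + [1.0]
    return [edges[j + 1] - edges[j] for j in range(n)]

def roc_weight(i, n):
    """Closed-form rank-order centroid weight for the i-th ranked attribute (1-indexed)."""
    return sum(1.0 / j for j in range(i, n + 1)) / n

def average_weight_by_rank(n, trials=50_000, seed=1):
    """Monte Carlo average of the i-th largest weight over random weight vectors."""
    rng = random.Random(seed)
    totals = [0.0] * n
    for _ in range(trials):
        ranked = sorted(flat_dirichlet(n, rng), reverse=True)  # rank 1 = largest weight
        for r, value in enumerate(ranked):
            totals[r] += value
    return [t / trials for t in totals]

n = 5
estimates = average_weight_by_rank(n)
for i in range(1, n + 1):
    print(f"rank {i}: simulated {estimates[i - 1]:.4f}, ROC {roc_weight(i, n):.4f}")
```

The simulated averages should agree with the closed-form values up to Monte Carlo error; increasing `trials` tightens the agreement.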
It is possible to increase the precision of rank-based approximation methods by incorporating some cardinal preference information [32,33]. Indeed, in many decision problems, weight approximations based on purely ordinal information might not be sufficiently accurate, and analysts may be willing to sacrifice some ease of elicitation. When the number of attributes is extremely large, the opposite problem is more likely to arise; even an ordinal ranking of attributes can be prohibitively difficult to obtain manually. However, it could be argued that the tier-based methods explored in the current paper, while designed for easier elicitation, are nonetheless capturing cardinal information.
For pragmatic reasons, tiering is already used in spatial decisions without an intentional elicitation of preference information; this is one of the motivations to develop and evaluate the performance of weight approximation methods that are deliberately tier-based. The tier designations have typically been determined by one or more physical measures. For example, in their water resource vulnerability analysis, Chung et al. [34] classify South Korea's 232 districts into six tiers based on population. Zhou et al. [35] classify regions of Tsushima Island, Japan, into five tiers of landslide susceptibility. Niu, Yu, and Li [36] classify regions of Taogang Town, China, into five tiers of environmental sensitivity for use in planning and development decisions. Note that tiering is distinct from spatial clustering or spatial autocorrelation [37][38][39]. If the metrics being analyzed tend to exhibit similar levels for regions that are geographically close to one another, then clusters of nearby regions would likely end up assigned to the same tier. However, there is nothing about the tiering approach itself that inherently groups nearby regions into the same tier.
There are additional techniques that have been used to obtain spatial weights but are not explored in this paper because they require context-specific information. For instance, Sironen and Mononen [20] capture an approximation of region weights by incorporating several spatial metrics of interest to forest management decisions, such as proximity to main roads and recreation routes. If attribute weights are calculated directly using the ranges of observed attribute levels, then heterogeneity in spatial preferences can be captured by using local ranges rather than the global range for each attribute [40,41]. Alternatively, if the analyst knows what particular tradeoffs are likely to be crucial to the decision(s), then it may be sufficient to elicit incomplete preference information that captures those tradeoffs, and then use it to construct spatial weight approximations, as demonstrated by [27].
In the context of spatial outcomes, single-attribute value functions can be assessed easily using standard approaches, provided that the shape of the value function does not change by region [2]. Therefore, the focus of the analysis in the current paper is restricted to the spatial weights assigned to the n regions.

Methods
Consider an additive multiattribute value function V for spatial outcomes that include n different regions, where n is a large number:

$$V(x_1, \ldots, x_n) = \sum_{i=1}^{n} w_i \, v(x_i), \tag{1}$$

with $x_1, \ldots, x_n$ denoting the physical outcomes in the n regions, v denoting the single-attribute function that translates $x_i$ into a score, and $w_1, \ldots, w_n$ denoting the true weights of the regions, which are the component of Equation (1) central to this paper. The notion of true weights is explained by [31]; they reflect the weights that would be obtained from a perfect elicitation process with zero error. While the true weights could never be known precisely in practice, they are a useful concept for evaluating the accuracy of weight approximation methods via simulation. They serve as a target by which all approximation methods can be measured.

Simulation Approach
Each simulation includes 5000 trials; in each trial, a set of true weights is generated randomly and compared to the weights resulting from various approximation methods. We denote the set of weights generated by an approximation method as $\hat{w}_1, \ldots, \hat{w}_n$.
The simulation approach for $w_1, \ldots, w_n$ used in this paper comes from [42], and is also used by [31]. To generate sets of weights that must sum to one, we require a joint distribution over the n − 1 simplex, which represents all possible weight vectors. We assume that this distribution is uniform over the simplex, since we have no a priori reason to believe that any valid set of weights is more likely than any other. This is also known as a flat Dirichlet distribution, with the property that the marginal distribution of each individual weight is Beta(1, n − 1). A straightforward way to simulate this distribution is to generate n − 1 independent uniform [0,1] random variables and place them on the unit interval; these points divide the unit interval into n segments whose lengths follow the desired flat Dirichlet distribution. Further details regarding this simulation approach are given by [42]. It was implemented here using Microsoft Excel and Visual Basic for Applications. Another possible simulation approach, explored by [43], is to generate n draws from an exponential distribution with rate 1, and divide each by the sum of the n draws to obtain a set of weights.
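Both generation schemes described above are short to implement. The Python sketch below (an illustration, not the Excel/VBA implementation used in the paper) draws weight vectors with each scheme; the function names are hypothetical. The last lines check that, in both cases, the average of a single weight is close to the Beta(1, n − 1) mean of 1/n.

```python
import random

def flat_dirichlet_spacings(n, rng):
    """Sort n - 1 uniform [0,1] points; the n gaps they leave on [0,1] are the weights."""
    cuts = sorted(rng.random() for _ in range(n - 1))
    edges = [0.0] + cuts + [1.0]
    return [edges[j + 1] - edges[j] for j in range(n)]

def flat_dirichlet_exponential(n, rng):
    """Draw n Exp(1) variables (rate 1) and divide each by their sum."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(2024)
n, trials = 20, 20_000
# Average of the first weight under each scheme; both should be near 1/n = 0.05.
mean_first_spacings = sum(flat_dirichlet_spacings(n, rng)[0] for _ in range(trials)) / trials
mean_first_exponential = sum(flat_dirichlet_exponential(n, rng)[0] for _ in range(trials)) / trials
print(round(mean_first_spacings, 3), round(mean_first_exponential, 3), 1 / n)
```

Either function can serve as the true-weight generator in a simulation; the spacings version mirrors the procedure from [42], and the exponential version mirrors the alternative explored by [43].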
For an approximation method to be viable, it should tend to produce $\hat{w}_1, \ldots, \hat{w}_n$ that are sufficiently close to $w_1, \ldots, w_n$. There are many possible error metrics that can serve as a basis for comparison; we focus on three of them. The first is mean squared error (MSE), that is, the average value of $(\hat{w}_i - w_i)^2$. Other similar metrics such as mean absolute deviation and mean absolute percentage error yielded qualitatively similar results. For the sake of robustness, we also consider two metrics related to the quality of a resulting decision. One is hit rate: the proportion of trials in which the approximation led to the same choice that would have been made using the true weights. The other is average value loss: the average percentage loss in value relative to the value that would have been achieved with the best possible choice. The value loss is zero if the best choice is selected. Both value calculations (the best value and the achieved value) are based on the true weights. Hit rate captures the frequency of mistakes, while average value loss captures both the frequency and severity of mistakes.
Of course, calculating hit rate and average value loss requires having alternatives from which to choose. We use two simple alternatives: one in which all regions have a value of 0.5, and one in which half of the regions have a value of 0.8 and the other half have a value of 0.2. The assignment of 0.8 and 0.2 values to regions is arbitrary; the probability density function for the weights is symmetric, as are all of the approximation methods. The results are robust to other changes in these alternatives as well, provided that the alternatives are sufficiently different from one another and both are similarly desirable before the weights are known.
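The three evaluation metrics and the two alternatives can be expressed compactly. In the minimal Python sketch below, the scores v(x_i) are supplied directly, the second alternative gives 0.8 to the first half of the regions and 0.2 to the rest, and value loss is a percentage of the best achievable value under the true weights. The function names and the four-region example are hypothetical. Ties in the approximated values, which arise under EW (for which the text selects an alternative at random), are simply resolved in favor of the first alternative in this sketch.

```python
def mse(approx, true):
    """Average of (w_hat_i - w_i)^2 over all regions."""
    return sum((a - t) ** 2 for a, t in zip(approx, true)) / len(true)

def additive_value(weights, scores):
    """Additive value: sum over regions of w_i * v(x_i)."""
    return sum(w * s for w, s in zip(weights, scores))

def two_alternatives(n):
    """Alternative A: 0.5 everywhere. Alternative B: 0.8 in the first half, 0.2 in the rest."""
    half = n // 2
    return [0.5] * n, [0.8] * half + [0.2] * (n - half)

def evaluate_choice(approx, true):
    """Return (hit, percentage value loss) when choosing with the approximated weights."""
    alternatives = two_alternatives(len(true))
    true_values = [additive_value(true, alt) for alt in alternatives]
    approx_values = [additive_value(approx, alt) for alt in alternatives]
    best = max(range(len(alternatives)), key=lambda j: true_values[j])
    chosen = max(range(len(alternatives)), key=lambda j: approx_values[j])
    # Both values are computed with the true weights, as described in the text.
    loss = 100.0 * (true_values[best] - true_values[chosen]) / true_values[best]
    return chosen == best, loss

true_w = [0.4, 0.3, 0.2, 0.1]
good_guess = [25 / 48, 13 / 48, 7 / 48, 3 / 48]   # ROC weights for n = 4
bad_guess = [0.1, 0.2, 0.3, 0.4]                  # ranking reversed
print(mse(good_guess, true_w), evaluate_choice(good_guess, true_w))
print(mse(bad_guess, true_w), evaluate_choice(bad_guess, true_w))
```

Averaging the hit indicator and the loss over all trials gives the hit rate and average value loss for a method.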

Approximation Methods
The simplest weight approximation method is to assign equal weights (EW) 1/n to every region. This is equivalent to evaluating an outcome based on the average value over all regions, and reflects no spatial preference information whatsoever. A perfect approximation method would yield the true weights every time; EW can be viewed as the opposing baseline, in which no attempt is made to ensure that $\hat{w}_1, \ldots, \hat{w}_n$ are similar to $w_1, \ldots, w_n$.
We will consider three rank-based methods: rank-sum (RS), rank-reciprocal (RR), and rank-order centroid (ROC). All three are based on an ordinal ranking of the regions from most to least important in the evaluation of outcomes. As mentioned previously, RS converts the i-th ranked region to a weight of n + 1 − i, while RR converts the i-th ranked region to a weight of 1/i; both are then normalized such that the resulting weights sum to 1. The premise of ROC is, given that region i has received rank r, to set $\hat{w}_i$ as the average true weight for region i across all sets of true weights that are in the order stated. The weight conversion for the i-th ranked region under ROC is:

$$\hat{w}_i = \frac{1}{n} \sum_{j=i}^{n} \frac{1}{j}. \tag{2}$$

Barron [8] and Barron and Barrett [31] provide the derivation of this formula; it is obtained by averaging the i-th ranked region's weight over all of the vertices of the subset of the n − 1 simplex that obeys the specified ranking. For large n, RS is the most similar to EW (i.e., the weights are closest to equal), and RR is the farthest from EW. For example, when n = 100, EW would result in $\hat{w}_i = 0.01$ for all i. If region 1 is ranked first, then RS results in $\hat{w}_1 = 0.0198$, ROC in $\hat{w}_1 = 0.0519$, and RR in $\hat{w}_1 = 0.1928$. EW, RS, RR, and ROC were evaluated by [31] for smaller values of n, and ROC was found to be the most accurate. This is, of course, influenced by the underlying uniform distribution of true weights (Barron and Barrett's [31] simulation, like the one used in the current paper, assumes a uniform distribution over the n − 1 simplex). As Danielson and Ekenberg [44] discuss, this is analogous to a "point allocation" process with n − 1 degrees of freedom; if preferences followed a "direct rating" approach with n degrees of freedom followed by a normalization, the distribution of weights would be substantially different (the weights would tend to be closer to equal). It is possible that the context of a decision would lead the analyst to suspect otherwise.
For instance, if the region weights were likely to be similar for political reasons, then RS would fare better, and ROC and especially RR would fare worse.
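The three rank-to-weight conversions are simple enough to compute directly. The Python sketch below is an illustration with hypothetical function names; it reproduces the n = 100 first-place weights quoted above and confirms, for a small n, that the ROC formula equals the average of the vertices of the ranked subset of the simplex, which are the points (1/k, ..., 1/k, 0, ..., 0) for k = 1, ..., n.

```python
def rank_sum(n):
    """RS: raw weight n + 1 - i for the i-th ranked region, normalized to sum to 1."""
    raw = [n + 1 - i for i in range(1, n + 1)]
    total = sum(raw)
    return [r / total for r in raw]

def rank_reciprocal(n):
    """RR: raw weight 1/i for the i-th ranked region, normalized to sum to 1."""
    raw = [1.0 / i for i in range(1, n + 1)]
    total = sum(raw)
    return [r / total for r in raw]

def rank_order_centroid(n):
    """ROC: (1/n) * sum_{j=i}^{n} 1/j, accumulated from the bottom rank upward."""
    weights, tail = [0.0] * n, 0.0
    for i in range(n, 0, -1):
        tail += 1.0 / i
        weights[i - 1] = tail / n
    return weights

def roc_from_vertices(n):
    """Average of the vertices (1/k, ..., 1/k, 0, ..., 0), k = 1..n, of the ranked region."""
    vertices = [[1.0 / k if i < k else 0.0 for i in range(n)] for k in range(1, n + 1)]
    return [sum(v[i] for v in vertices) / n for i in range(n)]

for name, method in (("RS", rank_sum), ("ROC", rank_order_centroid), ("RR", rank_reciprocal)):
    print(name, round(method(100)[0], 4))
# prints RS 0.0198, ROC 0.0519, RR 0.1928 (one per line)
```

Because all three conversions depend only on n and the rank position, each method can produce only n distinct weights, which is the restriction that the tier-based methods relax.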
Because manually constructing an ordinal ranking may prove prohibitively difficult when n is very large, we introduce three analogous methods based on tiers rather than ranks. Instead of ranking all n regions, the decision maker assigns each region to one of K tiers reflecting various levels of importance. Assume that $n \gg K$. The tiers are indexed by $k = 1, \ldots, K$. The choice of K reflects a tradeoff between precision and simplicity, and will be discussed in Section 4. Let $t_i$ denote the tier to which region i is assigned. No further assessment from the decision maker is required once $t_1, \ldots, t_n$ have been obtained. In particular, ranking is not required within tiers.
There are two clear advantages to tier-based methods as compared to rank-based methods. First, when the number of regions is very large, tiering is substantially faster. In terms of computational complexity, assigning n regions to tiers is $O(n)$; i.e., the worst-case time required to complete it grows linearly in n. Optimal comparison-based ranking algorithms, assuming no parallel processing, are $O(n \log n)$ [45]. However, the simpler ranking algorithms that are more feasible for humans, such as selection sort or insertion sort, are $O(n^2)$, which is substantially slower than tiering. Furthermore, these comparisons of speed are purely based on algorithmic complexity; they do not consider the cognitive demand on the decision maker, which is likely lower for tiering than it is for ranking. Tiering never requires the decision maker to compare one region to another explicitly. Of course, if the ranking is constructed automatically via calculations based on physical measures (population, distance from a river, soil quality, etc.) with no subjective judgment, then this is not an issue.
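The difference in the number of judgments can be illustrated by counting them. In the hypothetical Python sketch below, a numeric importance score stands in for the decision maker's judgment about each region: tiering compares each region only against fixed tier cutoffs, whereas insertion sort (one of the human-friendly ranking procedures mentioned above) compares regions with one another. The scores, cutoffs, and function names are illustrative assumptions.

```python
def assign_tiers(scores, cutoffs):
    """Tiering: one absolute judgment per region against fixed cutoffs (tier 1 = most important)."""
    tiers = [1 + sum(1 for cut in cutoffs if s < cut) for s in scores]
    return tiers, len(scores)  # exactly n judgments, no region-to-region comparisons

def insertion_rank(scores):
    """Ranking by insertion sort (most important first), counting pairwise comparisons."""
    ranked, comparisons = [], 0
    for s in scores:
        pos = len(ranked)
        while pos > 0:
            comparisons += 1
            if ranked[pos - 1] >= s:   # s belongs right after this region
                break
            pos -= 1
        ranked.insert(pos, s)
    return ranked, comparisons

n = 100
worst_case = [float(i) for i in range(n)]        # each new region outranks all earlier ones
tiers, tier_judgments = assign_tiers(worst_case, cutoffs=[75.0, 50.0, 25.0])
ranking, rank_comparisons = insertion_rank(worst_case)
print(tier_judgments, rank_comparisons)  # 100 judgments versus n(n - 1)/2 = 4950 comparisons
```

In this worst case the comparison count grows quadratically, while the tiering count stays equal to n regardless of the order in which regions are presented.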
To confirm that tiering is indeed easier than ranking, a simple experiment was conducted in which subjects were asked to consider 20 different attributes of apartments. One group of subjects was instructed to rank the attributes from most to least important, a second group was instructed to assign each attribute to one of three tiers (K = 3), and a third group was instructed to assign each attribute to one of seven tiers (K = 7). The ranking group had the highest average completion time, as well as the highest proportion of incomplete or otherwise invalid responses. A more detailed explanation of the experiment and results can be found in Appendix A.
The second benefit of tiering is that it is much more flexible. In a rank-based approach without ties, there are only n possible values for $\hat{w}_i$; any individual approximation method such as RS, RR, or ROC will always yield the same n weights. When using tiers instead of a ranking, the exact number of possible values for $\hat{w}_i$ will depend on how the tiers are converted to weights, but any reasonable conversion method will be able to produce substantially more than n values. (To see why this is true, consider a simple example with three tiers and four regions, and let the first region be assigned to the first tier ($t_1 = 1$). It should be clear that $(t_2, t_3, t_4) \in \{(1,1,1), (2,1,1), (2,2,2), (3,2,2), (3,3,3)\}$ should lead to five different, increasingly large values of $\hat{w}_1$. These are only some of the possible weight estimates with region 1 in the first tier; the total number of possible values for $\hat{w}_1$ is much larger than five.)
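The counting argument can be checked by brute force. The Python sketch below assumes the tier-sum scores s_k = K + 1 − k described in the next paragraph, with K = 3 tiers and n = 4 regions, and takes each region's weight to be its tier score divided by the sum of all regions' tier scores. It enumerates every tier assignment and counts the distinct weights region 1 can receive; exact fractions avoid spurious floating-point duplicates. Function and variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

def tier_weight(tiers, scores, i):
    """Normalized weight of region i: its tier score over the sum of all regions' tier scores."""
    return scores[tiers[i]] / sum(scores[t] for t in tiers)

K, n = 3, 4
ts = {k: Fraction(K + 1 - k) for k in range(1, K + 1)}  # tier-sum scores: 3, 2, 1

assignments = list(product(range(1, K + 1), repeat=n))
all_values = {tier_weight(t, ts, 0) for t in assignments}
top_tier_values = {tier_weight(t, ts, 0) for t in assignments if t[0] == 1}
print(len(top_tier_values), len(all_values))  # 7 17; a ranking of 4 regions allows only 4

# The five assignments of regions 2-4 listed in the text give increasing weights for region 1.
examples = [(1, 1, 1), (2, 1, 1), (2, 2, 2), (3, 2, 2), (3, 3, 3)]
print([str(tier_weight((1,) + e, ts, 0)) for e in examples])  # ['1/4', '3/11', '1/3', '3/8', '1/2']
```

Even in this tiny case, tier-sum scores give region 1 seventeen possible weights, compared with four under any ranking of four regions.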
As in the case of rankings, we must choose a method by which the set of tiers will be converted to weights. We will start by assigning a score $s_k$ to each of the K tiers. Then, once $t_1, \ldots, t_n$ have been assessed, we can calculate each weight as:

$$\hat{w}_i = \frac{s_{t_i}}{\sum_{j=1}^{n} s_{t_j}}. \tag{3}$$

The challenge is how each of the $s_k$ should be determined. It is conceptually analogous to converting a rank to a weight, and thus several analogous approaches are possible. We will consider three in particular: tier-sum (TS), tier-reciprocal (TR), and tier-quantile (TQ). The adaptations for TS and TR are straightforward. Under TS, $s_k = K + 1 - k$, and under TR, $s_k = 1/k$. The only notable difference from RS and RR is that the denominators for normalizing the weights are not known a priori; they will depend on how many regions are assigned to each tier. TQ involves a more complex calculation of $s_k$. The reasoning behind TQ is similar to that of ROC: when region i has been placed into tier k, we would like TQ to set $\hat{w}_i$ as the average true weight that would lead to the region being placed into tier k. However, unlike a ranking, a set of tier assignments $t_1, \ldots, t_n$ does not unambiguously determine the subset of the n − 1 simplex within which the weights must lie. Therefore, we instead base TQ's tier scores on the premise that the decision maker is equally likely to place a given region i into any of the K tiers. We can then take advantage of a convenient property of the flat Dirichlet distribution of true weights: each region's true weight follows a Beta(1, n − 1) marginal distribution, whose density is $f(w) = (n-1)(1-w)^{n-2}$ on [0, 1]. Thus, we can interpret $t_i = k$ as a statement by the decision maker that $w_i$ is in the k-th quantile (out of K quantiles) of a Beta(1, n − 1) distribution, and set $s_k$ as the mean of that quantile. The resulting score is given by:

$$s_k = K \int_{k_l}^{k_h} w \,(n-1)(1-w)^{n-2} \, dw = K \left[ k_l (1-k_l)^{n-1} + \frac{(1-k_l)^{n}}{n} - k_h (1-k_h)^{n-1} - \frac{(1-k_h)^{n}}{n} \right], \tag{4}$$

where $k_l$ and $k_h$ are the lower and upper bounds, respectively, of the k-th quantile of Beta(1, n − 1). An illustration of the resulting scores is shown in Figure 1 with n = 50 and K = 4.
As a reminder, $t_i = k$ does not mean that $\hat{w}_i = s_k$. The scores must still be normalized as shown in Equation (3); $s_k$ could be considered the raw weight associated with tier k. Though Equation (4) may appear cumbersome, it is a straightforward spreadsheet calculation.
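The three tier-to-score conversions and the normalization step can be sketched in a few lines of Python. The sketch assumes, as for TS and TR, that tier 1 is the most important, so that under TQ tier k covers the k-th equal-probability bin of Beta(1, n − 1) counted from the top. It uses the Beta(1, n − 1) quantile function 1 − (1 − p)^{1/(n−1)} and the function G(w) = w(1 − w)^{n−1} + (1 − w)^n/n, for which each bin's mean is K[G(lower) − G(upper)]. Function names and the example tier assignment are hypothetical.

```python
def beta1_quantile(p, n):
    """Inverse CDF of Beta(1, n - 1), whose CDF is 1 - (1 - w)^(n - 1)."""
    return 1.0 - (1.0 - p) ** (1.0 / (n - 1))

def tail_term(w, n):
    """G(w) = w(1-w)^(n-1) + (1-w)^n / n; the integral of w*f(w) over [a, b] is G(a) - G(b)."""
    return w * (1.0 - w) ** (n - 1) + (1.0 - w) ** n / n

def ts_scores(K):
    """Tier-sum scores s_k = K + 1 - k."""
    return {k: float(K + 1 - k) for k in range(1, K + 1)}

def tr_scores(K):
    """Tier-reciprocal scores s_k = 1/k."""
    return {k: 1.0 / k for k in range(1, K + 1)}

def tq_scores(n, K):
    """Mean of Beta(1, n - 1) within each of K equal-probability bins; tier 1 = top bin."""
    scores = {}
    for k in range(1, K + 1):
        lower = beta1_quantile((K - k) / K, n)
        upper = beta1_quantile((K - k + 1) / K, n)
        scores[k] = K * (tail_term(lower, n) - tail_term(upper, n))
    return scores

def tier_weights(tiers, scores):
    """Normalize each region's tier score so that the weights sum to 1."""
    total = sum(scores[t] for t in tiers)
    return [scores[t] / total for t in tiers]

n, K = 50, 4
tiers = [1] * 5 + [2] * 10 + [3] * 15 + [4] * 20   # hypothetical assignment of 50 regions
for name, scores in (("TS", ts_scores(K)), ("TR", tr_scores(K)), ("TQ", tq_scores(n, K))):
    w = tier_weights(tiers, scores)
    print(name, [round(w[i], 4) for i in (0, 5, 15, 30)])  # one region from each tier
```

Because the K bins have equal probability, the TQ scores average exactly to the Beta(1, n − 1) mean of 1/n, which gives a quick sanity check on a spreadsheet implementation.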

Results
In total, 5000 sets of true weights were simulated for each of five different values of n: 20, 40, 60, 80, and 100. For each set of true weights, approximated weights were computed using EW, RS, RR, ROC, TS, TR, and TQ, where each region was placed into a tier according to equally likely quantiles of a weight's Beta(1, n − 1) marginal distribution. Each of the tier-based methods was run using eight different values of K (three through ten). Thus, each trial of each simulation produced a set of true weights and 28 different sets of approximated weights. For each simulation, an MSE, a hit rate, and an average value loss were calculated for each of the 28 approximation methods. The MSE values are all very close to zero and cumbersome to compare directly. Therefore, they are shown instead as percentages of the EW method's MSE for that value of n. EW is a natural baseline to use, since it is the simplest possible way to assign weights, and the one we would expect to have the largest error (though it does not always). Note that the results should not be compared across different values of n. MSE tends to decrease with n for all methods, including EW, simply because the true weights become smaller as n increases. Due to the way the alternatives are constructed, their values (using the true weights) tend to be closer to one another as n increases. For most methods, this leads to more incorrect choices but fewer very large mistakes.
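The structure of a single simulation can be sketched end to end. The compact Python version below is an illustration under stated assumptions, not the Excel/VBA implementation used here: it runs far fewer than 5000 trials, covers only EW, ROC, and TQ, takes the ranking for ROC directly from the true weights (no elicitation error), places each region into the tier whose Beta(1, n − 1) quantile bin (counted from the top) contains its true weight, and reports each method's MSE as a percentage of EW's. Function names, the seed, and the trial count are arbitrary choices.

```python
import random

def flat_dirichlet(n, rng):
    """Uniform draw from the simplex via uniform spacings."""
    cuts = sorted(rng.random() for _ in range(n - 1))
    edges = [0.0] + cuts + [1.0]
    return [edges[j + 1] - edges[j] for j in range(n)]

def tq_scores(n, K):
    """Beta(1, n - 1) bin means; tier 1 is the top bin."""
    def quantile(p):
        return 1.0 - (1.0 - p) ** (1.0 / (n - 1))
    def g(w):
        return w * (1.0 - w) ** (n - 1) + (1.0 - w) ** n / n
    return {k: K * (g(quantile((K - k) / K)) - g(quantile((K - k + 1) / K)))
            for k in range(1, K + 1)}

def true_tier(w, n, K):
    """Tier implied by a true weight: its Beta(1, n - 1) CDF bin, counted from the top."""
    cdf = 1.0 - (1.0 - w) ** (n - 1)
    return K - min(int(cdf * K), K - 1)

def mse_relative_to_ew(n=20, K=5, trials=2000, seed=7):
    rng = random.Random(seed)
    roc = [sum(1.0 / j for j in range(i, n + 1)) / n for i in range(1, n + 1)]
    scores = tq_scores(n, K)
    sq_err = {"EW": 0.0, "ROC": 0.0, "TQ": 0.0}
    for _ in range(trials):
        w = flat_dirichlet(n, rng)
        w_roc = [0.0] * n
        for rank, region in enumerate(sorted(range(n), key=lambda r: -w[r])):
            w_roc[region] = roc[rank]          # ranking taken from the true weights
        raw = [scores[true_tier(x, n, K)] for x in w]
        total = sum(raw)
        w_tq = [r / total for r in raw]
        for name, approx in (("EW", [1.0 / n] * n), ("ROC", w_roc), ("TQ", w_tq)):
            sq_err[name] += sum((a - b) ** 2 for a, b in zip(approx, w)) / n
    # Express each method's accumulated squared error as a percentage of EW's.
    return {name: 100.0 * err / sq_err["EW"] for name, err in sq_err.items()}

print({name: round(pct, 1) for name, pct in mse_relative_to_ew().items()})
```

Changing `n`, `K`, and `trials` (and adding the other methods from the sketches of RS, RR, TS, and TR) reproduces the layout of the full experiment, with EW fixed at 100%.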
The simulation results are shown in Tables 1-3. Note that there is no row for EW in Table 2; the two alternatives being compared are equally desirable when all weights are equal, so EW cannot distinguish between them. That could be interpreted as a hit rate of 50%, for the sake of comparison. To estimate value loss under EW, an alternative was selected at random in each trial.

There are several results worth discussing. First, RR performs extremely badly for large values of n, with even larger errors than EW, as well as the lowest hit rate and highest average value loss among all methods except EW. This is due to the approximated weights being too disparate, i.e., too far away from the center of the n − 1 simplex. When n = 100, RR assigns one region a weight of 0.193, while most regions are largely ignored. TR, however, is substantially more accurate than both TS and RS for K > 4. Because K is much smaller than n, TR does not suffer from the problem of producing overly disparate weights. For the higher values of K shown, TR tends to provide an accurate approximation of the true weights.
The two most accurate approaches are ROC, followed by TQ (which is noticeably ahead of TR). ROC is unique in that it becomes substantially more accurate as n increases. It should be noted, however, that all of these results are based on a presumption of zero elicitation error. This presumption becomes increasingly unlikely when ranking an extremely large number of regions with limited time.
Without elicitation error, ROC is a very accurate approximation for large n. TQ, however, is also quite accurate across the range of values of n, and is a viable approach if rank-based methods are deemed to be impractical.
For all of the tier-based methods, accuracy improves as K increases. This is not surprising, as larger values of K allow the decision maker greater flexibility. However, it is important to note that the primary motivations behind these methods are speed and ease of elicitation. Increasing K means sacrificing these traits. The appropriate value of K depends on the tradeoff that stakeholders are willing to make between thoroughness and expediency.

Example
In this section, we apply the weight approximation methods to illustrative preferences regarding total water usage by census block in Montgomery County, which is located in the United States adjacent to Washington, D.C., in the state of Maryland. There are 215 blocks in total; the map is shown in Figure 2, with many of the blocks labeled, and the primary water provider's two filtration plants shown in blue. These preferences would apply to any regulatory or development decisions in Montgomery County for which impact on water usage is the primary concern. First, a set of "true" spatial weights for each census block's water usage was created. They are shown in Figure 3, along with the results of several approximation methods. Recall that these true weights could never actually be known in practice. The weights used here are illustrative, but are intended to be plausible; they reflect distance from the two filtration plants, population density, and the degree to which public water systems serve that area. A shorter distance from a filtration plant involves lower operating and maintenance costs associated with water usage, and a higher population density allows for greater efficiency of supply. Both of these factors are captured easily by natural measures. The county does not state exact proportions of public vs. private water usage by block, but provides a simple constructed scale by which large differences between blocks' public water usage can be easily observed; the more rural areas far from Washington, D.C. tend to use more water from private sources. A simple linear combination of these three factors is used to calculate the true weights used in the example. Detailed information about Maryland's water systems can be found at www.mgs.md.gov, and Montgomery County's specifically at montgomerycountymd.gov/water/. 
The full set of weights is shown in Appendix B, along with the weights approximated by each method (K = 5 and K = 10 are included for each of the tier-based methods). EW is not included; it leads trivially to a weight of 1/215 in every block. The weights are fairly similar; no census block has an extremely high or extremely low weight. If this lack of spatial weight variability is understood a priori, then either RS or one of the tier-based methods should be used, as RR and ROC for such a large n will produce a much greater disparity in region weights than desired.
The MSE for each method is shown in Table 4. The two most accurate methods, by a wide margin, are TQ and TR. As expected, RR and ROC do not perform well. Maps of the approximated weights for TQ and TR are shown in Figure 3. For the sake of readability, labels are only included for the maximum and minimum weights. ROC is also included in Figure 3 to illustrate why it is less accurate in this example; it assigns too many very large and very small weights. Note that the tier-based methods err slightly in the opposite direction (especially for K = 5), as their highest and lowest weights are less extreme than the true weights.

Discussion
In this paper, we have explored several possible weight approximation techniques regarding their viability for spatial decisions with a large number of regions. The simulation study revealed that, overall, rank-order centroid was the most accurate. However, since generating a ranking of the regions may be infeasible or impractical in some cases, we have also introduced three methods based on assigning each region to a tier of importance. The tier-based methods, particularly tier-reciprocal and tier-quantile, are also quite accurate. The greater the number of tiers, the more accurate the approximation; however, fewer tiers will make the elicitation process simpler. There are limitations to working memory [46], and decision makers might have trouble adequately distinguishing between a large number of tiers.
The goal of this paper is not to identify a single "best" weight approximation method. It is to explore and discuss the performance of a variety of methods that could be included in an analyst's toolbox. Therefore, the methods themselves merit some final discussion regarding both their performance and their potential implementation.
While rank-order centroid was clearly the strongest-performing method overall, it is important to remember the assumptions and limitations of the simulation approach. Primarily, the simulation assumes that all sets of weights are equally likely, i.e., that the true weight vector is drawn uniformly from the (n − 1)-simplex. Although this is a reasonable assumption in the absence of other information, it is unlikely to reflect any particular decision setting. If the decision maker knows that one of the 100 regions is by far the most important, or that all 100 regions are similarly important, then the accuracy of an approximation method matters only for that type of true weight vector, not for the entire (n − 1)-simplex. The sum-based approaches tend to produce more equitable sets of weights. The water usage example for Montgomery County illustrated that, for a large set of relatively similar weights, tier-quantile and tier-reciprocal substantially outperformed rank-order centroid. One could easily construct additional examples that are well suited to specific methods. Table 5 summarizes and compares the relevant characteristics of each method. Elicitation difficulty and overall accuracy each have a clear preference direction (easier and more accurate methods are preferred), whereas the variability of weights does not; instead, it can help guide the selection if some preference information can be obtained regarding the range of importance across the set of regions. A more thorough comparison of the accuracy of weight approximation methods on subsets of the (n − 1)-simplex, or with partial information about preferences, would be a valuable avenue for future work.

Table 5. A brief summary evaluation of each approximation method for large n.
Method | Elicitation Difficulty | Overall Accuracy | Variability of Weights

Compared to the rank-based methods, the tier-based methods have the advantage of being faster and less cognitively demanding to carry out manually. If constructing rankings or assigning tiers requires subjective judgment from the decision maker, then this advantage is an important consideration; it may be extremely difficult to produce a coherent ranking when the number of regions is large. However, it is not uncommon to generate rankings or tiers mathematically from one or more physical attributes of the regions. In such cases, elicitation issues can be set aside, and the accuracy of the weight approximation method should be the overriding concern.
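To make the points about weight variability and attribute-based rankings more concrete, the following is a minimal sketch under stated assumptions. It ranks a few hypothetical regions by a made-up physical attribute, then converts the ranks into weights using the standard rank-order centroid (ROC), rank-sum, and rank-reciprocal formulas. The attribute values and the helper names (`rank_by_attribute`, `roc_weights`, and so on) are illustrative assumptions. The tier-based methods are not reproduced here. The sketch also includes a Monte Carlo check of why ROC suits the "all weights equally likely" assumption: if points are sampled uniformly from the simplex and each is sorted in descending order, their average approaches the ROC weights.

```python
import random


def rank_by_attribute(values):
    """Hypothetical helper: 1-based ranks, rank 1 = largest value (ties keep input order)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for position, idx in enumerate(order, start=1):
        ranks[idx] = position
    return ranks


def roc_weights(n):
    """Rank-order centroid: w_i = (1/n) * sum_{k=i}^{n} 1/k for ranks i = 1..n."""
    return [sum(1.0 / k for k in range(i, n + 1)) / n for i in range(1, n + 1)]


def rank_sum_weights(n):
    """Rank-sum: w_i = 2 (n + 1 - i) / (n (n + 1))."""
    return [2.0 * (n + 1 - i) / (n * (n + 1)) for i in range(1, n + 1)]


def rank_reciprocal_weights(n):
    """Rank-reciprocal: w_i proportional to 1/i, normalized to sum to 1."""
    total = sum(1.0 / k for k in range(1, n + 1))
    return [(1.0 / i) / total for i in range(1, n + 1)]


def sorted_simplex_mean(n, draws=20000, seed=0):
    """Average of uniformly sampled simplex points, each sorted in descending order."""
    rng = random.Random(seed)
    acc = [0.0] * n
    for _ in range(draws):
        # Normalized i.i.d. exponentials are uniformly distributed on the (n - 1)-simplex.
        e = [rng.expovariate(1.0) for _ in range(n)]
        s = sum(e)
        for j, x in enumerate(sorted(e, reverse=True)):
            acc[j] += x / s
    return [a / draws for a in acc]


if __name__ == "__main__":
    # Hypothetical attribute values (e.g., annual water usage) for five regions.
    usage = [120.0, 340.0, 95.0, 210.0, 180.0]
    ranks = rank_by_attribute(usage)  # [4, 1, 5, 2, 3]
    n = len(usage)
    for name, fn in [("ROC", roc_weights), ("rank-sum", rank_sum_weights),
                     ("rank-reciprocal", rank_reciprocal_weights)]:
        w = fn(n)
        print(name, "region weights:", [round(w[r - 1], 3) for r in ranks])
        big = fn(100)
        print(name, "largest/smallest weight for n = 100:", round(big[0] / big[-1], 1))
    # Monte Carlo: sorted uniform simplex points average out to (approximately) ROC.
    print("sorted-simplex mean:", [round(x, 3) for x in sorted_simplex_mean(n)])
    print("ROC weights:        ", [round(x, 3) for x in roc_weights(n)])
```

For n = 100, the ratio of the largest to the smallest ROC weight is n times the nth harmonic number (about 519), while the corresponding rank-sum ratio is 100. This is one way to see why the sum-based approach yields more equitable weights.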
Finally, while the focus of this paper is on decisions with spatial outcomes, it is certainly possible for very large sets of weights to arise in other contexts. For example, Ewing, Tarantino, and Parnell [47] developed a customized approach to weight approximation for evaluating Army bases in the United States; their value function included 40 attributes.
Funding: This research received no external funding.

Acknowledgments:
The author is grateful to several colleagues for valuable comments and suggestions on a conference presentation of this work.

Conflicts of Interest:
The author declares no conflict of interest.

Appendix A. Apartment Attributes Experiment
This experiment was conducted via Amazon Mechanical Turk. Each subject was shown a 4 × 5 table of attributes describing apartments (20 attributes in total), with the positions of the attributes in the table randomized. Each attribute was accompanied by a textbox in which the subject could enter a number. The list of attributes was the following:

There was also a space at the end for subjects to state any very important attributes that were not included. Nearly all subjects either left it blank or confirmed that the list covered their main concerns.
In total, 120 subjects were included in the experiment. Each was randomly given one of the following three tasks (40 subjects per task):
1. Enter a number from 1-3 for each of the 20 characteristics below according to how much influence it would have on your choice of apartment, where 1 means "little-to-no influence," 2 means "some influence," and 3 means "a large influence".
2. Enter a number from 1-7 for each of the 20 characteristics below according to how much influence it would have on your choice of apartment, where 1 means: "would have little-to-no influence on my choice" and 7 means: "would greatly influence my choice".
3. Enter a ranking from 1-20 for each of the 20 characteristics below according to how much they would influence your choice of apartment, where 1 is the most important characteristic and 20 is the least important.
The instructions for the third task deliberately did not prohibit ties, and indeed many of the responses contained ties. Had subjects been told to avoid ties, they would have needed to exert additional effort to ensure that each rank was entered exactly once. However, that additional time and cognitive burden could be avoided with a different interface (e.g., a drag-and-drop list of attributes), so ties were allowed in the interest of obtaining a fair comparison of the methods' completion times.
First, all responses were checked for validity. Responses with blank or otherwise invalid entries were discarded, as were responses in which the subject clearly made very little effort to provide meaningful information (e.g., a tiering response that put every attribute in tier 1, or a ranking response that included only a few distinct ranks). Ties were permitted in rankings, provided that it was clear the response reflected a ranking rather than effectively a small set of tiers. In total, 36 out of 40 responses in the 3-tier group were valid, 31 out of 40 responses in the 7-tier group were valid, and 23 out of 40 responses in the ranking group were valid.
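The mechanical part of the screening rules above can be expressed as a small filter. This is a hypothetical sketch. The function names, the representation of a response as a list of integers (with `None` for a blank), and the cutoff `MIN_DISTINCT_RANKS` are assumptions, because the text says only that rankings with "only a few distinct ranks" were discarded. The sketch also generalizes the "every attribute in tier 1" example to any single-tier response. The judgment call about whether a response reflects "very little effort" cannot be fully automated this way.

```python
from typing import Optional, Sequence

# Hypothetical cutoff chosen for illustration; the source gives no exact threshold.
MIN_DISTINCT_RANKS = 8


def _complete_and_in_range(response: Sequence[Optional[int]], num_items: int, upper: int) -> bool:
    """True if there is one entry per item and every entry is an integer in 1..upper."""
    return len(response) == num_items and all(
        isinstance(x, int) and 1 <= x <= upper for x in response
    )


def is_valid_tiering(response: Sequence[Optional[int]], num_items: int, num_tiers: int) -> bool:
    """Reject blank or out-of-range entries and responses that use only one tier."""
    if not _complete_and_in_range(response, num_items, num_tiers):
        return False
    # Assumption: any single-tier response (e.g., every attribute in tier 1) is uninformative.
    return len(set(response)) > 1


def is_valid_ranking(response: Sequence[Optional[int]], num_items: int,
                     min_distinct_ranks: int = MIN_DISTINCT_RANKS) -> bool:
    """Reject blank or out-of-range ranks and "rankings" that are effectively a few tiers."""
    if not _complete_and_in_range(response, num_items, num_items):
        return False
    # Ties are permitted, but too few distinct ranks suggests a tiering rather than a ranking.
    return len(set(response)) >= min_distinct_ranks


if __name__ == "__main__":
    print(is_valid_tiering([1] * 20, num_items=20, num_tiers=3))             # single tier
    print(is_valid_ranking([1, 1] + list(range(3, 21)), num_items=20))        # one tie, allowed
    print(is_valid_ranking([1] * 10 + [2] * 10, num_items=20))                # only two ranks
```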
The primary hypothesis being tested was that tiering is faster than ranking. Using only the valid responses, the average completion time for the tiering groups (combined) was 207.6 s, while the average completion time for the ranking group was 252.5 s. A one-tailed t-test with unequal variances yielded a p-value of 0.04.
The secondary hypothesis was that tiering is faster with fewer tiers. Using only the valid responses, the average completion time for the 3-tier group was 183.2 s, while the average completion time for the 7-tier group was 235.9 s. A one-tailed t-test with unequal variances also yielded a p-value of 0.04.
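The combined tiering mean in the first hypothesis is the size-weighted average of the two tiering groups, (36 × 183.2 + 31 × 235.9) / 67 ≈ 207.6 s, which matches the reported value. The sketch below reproduces that arithmetic and shows one way to compute a one-tailed Welch (unequal-variance) t-test from raw completion times. The per-subject times are not reported, so the sample lists in the usage lines are hypothetical. The p-value is obtained by numerically integrating the Student t density with the standard library. In practice, `scipy.stats.ttest_ind(a, b, equal_var=False, alternative="less")` performs the same test.

```python
import math
from statistics import mean, variance


def combined_mean(means, sizes):
    """Size-weighted mean of several group means."""
    return sum(m * n for m, n in zip(means, sizes)) / sum(sizes)


def welch_t(a, b):
    """Welch's t statistic for mean(a) - mean(b) and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # sample variances (n - 1 denominator)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def t_upper_tail(t, df, steps=20000):
    """P(T >= |t|) for Student's t with df degrees of freedom, via Simpson's rule on [0, |t|]."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    x_max = abs(t)
    h = x_max / steps  # steps must be even for Simpson's rule
    s = pdf(0.0) + pdf(x_max)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * pdf(k * h)
    # The density is symmetric, so half the mass lies above 0.
    return 0.5 - s * h / 3


def welch_one_tailed(a, b):
    """One-tailed Welch test of H1: mean(a) < mean(b). Returns (t, df, p)."""
    t, df = welch_t(a, b)
    p_upper = t_upper_tail(t, df)
    # p = P(T <= t): equals the upper tail at |t| when t <= 0, its complement otherwise.
    p = p_upper if t <= 0 else 1.0 - p_upper
    return t, df, p


if __name__ == "__main__":
    print(round(combined_mean([183.2, 235.9], [36, 31]), 1))
    tiering = [150, 170, 190, 200, 210, 185, 160]  # hypothetical completion times (s)
    ranking = [230, 260, 240, 270, 250, 255]       # hypothetical completion times (s)
    print(welch_one_tailed(tiering, ranking))
```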
Notably, the shortest completion time among the valid rankings was 125 s, whereas both tiering groups contained several valid responses completed in under 90 s. This suggests that a coherent ranking simply cannot be produced as quickly as a coherent set of tiers.
There were no notable differences in the actual results between groups; attribute importance (whether by rank or by tier) was strongly correlated among the three groups. Although individual responses naturally varied, one attribute deserves specific mention: "Rating of nearby public schools" was frequently placed at both extremes, presumably because it was very important to subjects with children and unimportant to subjects without children. Table A1 shows the full results by block number.
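Tier numbers and ranks run in opposite directions. A higher tier means more influence, whereas rank 1 means most important. Comparing the groups therefore requires putting both on a common scale first. The following sketch, built on assumptions, averages each group's scores per attribute and negates the rank averages so that larger always means more important. It then computes a Spearman rank correlation between the group averages. The text does not state which correlation measure was used, so the choice of Spearman, the helper names, and the small data set below are all assumptions for illustration. Spearman is convenient here because it ignores the different scales of the 3-tier, 7-tier, and 20-rank responses.

```python
from statistics import mean


def average_ranks(values):
    """1-based ascending ranks, with tied values receiving the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def attribute_means(responses):
    """Mean score per attribute across subjects (responses: one list per subject)."""
    return [mean(col) for col in zip(*responses)]


if __name__ == "__main__":
    # Hypothetical data for 4 attributes: two 3-tier responses and two ranking responses.
    tier_means = attribute_means([[3, 2, 1, 2], [3, 3, 1, 1]])  # higher = more influence
    rank_means = attribute_means([[1, 2, 4, 3], [1, 3, 4, 2]])  # lower = more important
    importance_from_ranks = [-r for r in rank_means]            # flip so higher = more important
    print(spearman(tier_means, importance_from_ranks))
```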