#### 3.1. Simulation Approach

Each simulation will include 5000 trials, where a set of true weights is generated randomly in each trial and compared to the weights resulting from various approximation methods. We denote the set of weights generated by an approximation method as ${w}_{1}^{\prime},\dots ,{w}_{n}^{\prime}$.

The simulation approach for ${w}_{1},\dots ,{w}_{n}$ used in this paper comes from [42], and is also used by [31]. To generate sets of weights that must sum to one, we require a joint distribution over the $n-1$ simplex, which represents all possible weight vectors. We assume that this distribution is uniform over the simplex, since we have no a priori reason to believe that any valid set of weights is more likely than any other. This is also known as a flat Dirichlet distribution, with the property that the marginal distribution of each individual weight is Beta$(1,n-1)$. A straightforward way to simulate this distribution is to generate $n-1$ uniform [0,1] random variables and plot them on the unit interval; this divides the unit interval into $n$ segments whose lengths follow the desired flat Dirichlet distribution. Further details regarding this simulation approach are given by [42]. It was implemented here using Microsoft Excel and Visual Basic for Applications. Another possible simulation approach, explored by [43], is to generate $n$ draws from an exponential distribution with rate 1 and divide each by the sum of the $n$ draws to obtain a set of weights.
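Both constructions can be sketched in a few lines. This is an illustrative implementation, not the paper's own (which used Excel and VBA); the function names are ours.

```python
import random

def simulate_weights(n, rng=random):
    """Draw one weight vector uniformly from the (n-1)-simplex:
    place n-1 uniform points on [0, 1]; the n segment lengths
    between consecutive points are the weights (a flat Dirichlet draw)."""
    cuts = sorted(rng.random() for _ in range(n - 1))
    points = [0.0] + cuts + [1.0]
    return [points[i + 1] - points[i] for i in range(n)]

def simulate_weights_exp(n, rng=random):
    """Equivalent draw via the alternative approach: normalize n
    Exponential(1) variables by their sum."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]
```

Either function returns $n$ nonnegative weights summing to one; a full simulation would call it once per trial.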

For an approximation method to be viable, it should tend to produce ${w}_{1}^{\prime},\dots ,{w}_{n}^{\prime}$ that are sufficiently close to ${w}_{1},\dots ,{w}_{n}$. There are many possible error metrics that can serve as a basis for comparison. We focus on three of them. The first is mean squared error (MSE); that is, the average value of ${({w}_{i}^{\prime}-{w}_{i})}^{2}$. Other similar metrics such as mean absolute deviation and mean absolute percentage error yielded qualitatively similar results. For the sake of robustness, we also consider two metrics related to the quality of a resulting decision. One is hit rate: the proportion of trials in which the approximation led to the same choice that would have been made using the true weights. The other is average value loss: the average of the percentage loss in value as compared to the value that would have been achieved with the best possible choice. The value loss is zero if the best choice is selected. Both value calculations (the best value and the achieved value) are based on the true weights. Hit rate captures the frequency of mistakes, while average value loss captures both the frequency and severity of mistakes.

Of course, calculating hit rate and average value loss requires having alternatives from which to choose. We use two simple alternatives: one in which all regions have a value of 0.5, and one in which half of the regions have a value of 0.8 and the other half have a value of 0.2. The assignment of 0.8 and 0.2 values to regions is arbitrary; the probability density function for the weights is symmetric, as are all of the approximation methods. The results are robust to other changes in these alternatives as well, provided that the alternatives are sufficiently different from one another and both are similarly desirable before the weights are known.
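The per-trial decision metrics can be sketched as follows. The helper names are ours, and alternatives are represented simply as lists of region values, with each alternative's overall value taken as the weighted sum of its region values.

```python
def alt_value(weights, values):
    """Overall value of an alternative: weighted sum of region values."""
    return sum(w * v for w, v in zip(weights, values))

def score_trial(true_w, approx_w, alternatives):
    """One trial's hit indicator and percentage value loss.
    The choice is made with the approximate weights, but both the
    best value and the achieved value are judged with the true weights."""
    true_vals = [alt_value(true_w, a) for a in alternatives]
    best = max(true_vals)
    chosen = max(range(len(alternatives)),
                 key=lambda j: alt_value(approx_w, alternatives[j]))
    loss = (best - true_vals[chosen]) / best
    return true_vals[chosen] == best, loss
```

Averaging the hit indicators and losses over all trials yields the hit rate and average value loss. For example, with the two alternatives described above (all regions at 0.5 versus half at 0.8 and half at 0.2), an approximation that still favors the correct half of the regions scores a hit with zero loss.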

#### 3.2. Approximation Methods

The simplest weight approximation method is to assign equal weights (EW) $1/n$ to every region. This is equivalent to evaluating an outcome based on the average value over all regions, and reflects no spatial preference information whatsoever. A perfect approximation method would yield the true weights every time; EW can be viewed as the opposing baseline, in which no attempt is made to ensure that ${w}_{1}^{\prime},\dots ,{w}_{n}^{\prime}$ are similar to ${w}_{1},\dots ,{w}_{n}$.

We will consider three rank-based methods: rank-sum (RS), rank-reciprocal (RR), and rank-order centroid (ROC). All three are based on an ordinal ranking of the regions from most to least important in the evaluation of outcomes. As mentioned previously, RS converts the $i$-th ranked region to a weight of $n+1-i$, while RR converts the $i$-th ranked region to a weight of $1/i$; both are then normalized such that the resulting weights sum to 1. The premise of ROC is, given that region $i$ has received rank $r$, to set ${w}_{i}^{\prime}$ as the average true weight for region $i$ across all sets of true weights that are consistent with the stated ranking. The weight conversion for the $i$-th ranked region under ROC is:

$${w}_{i}^{\prime}=\frac{1}{n}\sum_{j=i}^{n}\frac{1}{j}$$ (2)

Barron [8] and Barron and Barrett [31] provide the derivation of this formula; it is obtained by averaging the $i$-th ranked region’s weight over all of the vertices of the subset of the $n-1$ simplex that obeys the specified ranking.

For large $n$, RS is the most similar to EW (i.e., the weights are closest to equal), and RR is the farthest from EW. For example, when $n=100$, EW would result in ${w}_{i}^{\prime}=0.01$ for all $i$. If region 1 is ranked first, then RS results in ${w}_{1}^{\prime}=0.0198$, ROC in ${w}_{1}^{\prime}=0.0519$, and RR in ${w}_{1}^{\prime}=0.1928$. EW, RS, RR, and ROC were evaluated by [31] for smaller values of $n$, and ROC was found to be the most accurate. This is, of course, influenced by the underlying uniform distribution of true weights (Barron and Barrett’s [31] simulation, like the one used in the current paper, assumes a uniform distribution over the $n-1$ simplex). As Danielson and Ekenberg [44] discuss, this is analogous to a “point allocation” process with $n-1$ degrees of freedom; if preferences followed a “direct rating” approach with $n$ degrees of freedom followed by a normalization, the distribution of weights would be substantially different (the weights would tend to be closer to equal). It is possible that the context of a decision would lead the analyst to suspect otherwise. For instance, if the region weights were likely to be similar for political reasons, then RS would fare better, and ROC and especially RR would fare worse.
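The three rank-to-weight conversions are easy to implement; the sketch below (function names are ours) reproduces the $n=100$ figures quoted above for the first-ranked region.

```python
def rank_sum_weights(n):
    """RS: rank i gets raw weight n + 1 - i, then normalize."""
    total = n * (n + 1) / 2  # sum of raw weights
    return [(n + 1 - i) / total for i in range(1, n + 1)]

def rank_reciprocal_weights(n):
    """RR: rank i gets raw weight 1/i, then normalize."""
    total = sum(1 / i for i in range(1, n + 1))
    return [(1 / i) / total for i in range(1, n + 1)]

def rank_order_centroid_weights(n):
    """ROC: rank i gets (1/n) * sum of 1/j for j = i..n (already normalized)."""
    return [sum(1 / j for j in range(i, n + 1)) / n
            for i in range(1, n + 1)]
```

For $n=100$, the first-ranked weights come out to approximately 0.0198 (RS), 0.0519 (ROC), and 0.1928 (RR), matching the text.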

Because manually constructing an ordinal ranking may prove prohibitively difficult when $n$ is very large, we introduce three analogous methods based on tiers rather than ranks. Instead of ranking all $n$ regions, the decision maker assigns each region to one of $K$ tiers reflecting various levels of importance. Assume that $n\gg K$. The tiers are indexed by $k=1,\dots ,K$. The choice of $K$ reflects a tradeoff between precision and simplicity, and will be discussed in Section 4. Let ${t}_{i}$ denote the tier to which region $i$ is assigned. No further assessment from the decision maker is required once ${t}_{1},\dots ,{t}_{n}$ have been obtained. In particular, ranking is not required within tiers.

There are two clear advantages to tier-based methods as compared to rank-based methods. First, when the number of regions is very large, tiering is substantially faster. In terms of computational complexity, assigning $n$ regions to tiers is $O(n)$; i.e., the longest possible time required by an algorithm to complete it grows linearly in $n$. Optimal ranking algorithms, assuming no parallel processing, are $O(n\,\mathrm{log}\,n)$ [45]. However, the simpler ranking algorithms that are more feasible for humans, such as selection or insertion, are $O({n}^{2})$, which is substantially slower than tiering. Furthermore, these comparisons of speed are based purely on algorithmic complexity; they do not consider the cognitive demand on the decision maker, which is likely lower for tiering than for ranking. Tiering never requires the decision maker to compare one region to another explicitly. Of course, if the ranking is constructed automatically via calculations based on physical measures (population, distance from a river, soil quality, etc.) with no subjective judgment, then this is not an issue.

To confirm that tiering is indeed easier than ranking, a simple experiment was conducted in which subjects were asked to consider 20 different attributes of apartments. One group of subjects was instructed to rank the attributes from most to least important, a second group was instructed to assign each attribute to one of three tiers ($K=3$), and a third group was instructed to assign each attribute to one of seven tiers ($K=7$). The ranking group had the highest average completion time, as well as the highest proportion of incomplete or otherwise invalid responses. A more detailed explanation of the experiment and results can be found in Appendix A.

The second benefit of tiering is that it is much more flexible. In a rank-based approach without ties, there are only $n$ possible values for ${w}_{i}^{\prime}$; any individual approximation method such as RS, RR, or ROC will always yield the same $n$ weights. When using tiers instead of a ranking, the exact number of possible values for ${w}_{i}^{\prime}$ will depend on how the tiers are converted to weights, but any reasonable conversion method will be able to produce substantially more than $n$ values. (To see why this is true, consider a simple example with three tiers and four regions, and let the first region be assigned to the first tier (${t}_{1}=1$). It should be clear that $({t}_{2},{t}_{3},{t}_{4})\in \{(1,1,1),(2,1,1),(2,2,2),(3,2,2),(3,3,3)\}$ should lead to five different, increasingly large values of ${w}_{1}^{\prime}$. These are only some of the possible tier assignments; the total number of possible values for ${w}_{1}^{\prime}$ is much larger than five.)

As in the case of rankings, we must choose a method by which the set of tiers will be converted to weights. We will start by assigning a score ${s}_{k}$ to each of the $K$ tiers. Then, once ${t}_{1},\dots ,{t}_{n}$ have been assessed, we can calculate each weight as:

$${w}_{i}^{\prime}=\frac{{s}_{{t}_{i}}}{\sum_{j=1}^{n}{s}_{{t}_{j}}}$$ (3)

The challenge is determining each of the ${s}_{k}$. This task is conceptually analogous to converting a rank to a weight, and thus several analogous approaches are possible. We will consider three in particular: tier-sum (TS), tier-reciprocal (TR), and tier-quantile (TQ). The adaptations for TS and TR are straightforward. Under TS, ${s}_{k}=K+1-k$, and under TR, ${s}_{k}=1/k$. The only notable difference from RS and RR is that the denominators for normalizing the weights are not known a priori; they depend on how many regions are assigned to each tier.
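The tier-to-weight conversion with TS and TR scores can be sketched as follows (function names are ours). It also confirms the flexibility example above: holding ${t}_{1}=1$ in a three-tier, four-region problem, the five tier assignments listed earlier yield five strictly increasing values of ${w}_{1}^{\prime}$ under TS.

```python
def tier_weights(tiers, scores):
    """Normalize each region's tier score by the total score, which
    depends on how many regions landed in each tier (tiers are 1-indexed)."""
    raw = [scores[t - 1] for t in tiers]
    total = sum(raw)
    return [r / total for r in raw]

def ts_scores(K):
    """Tier-sum: s_k = K + 1 - k."""
    return [K + 1 - k for k in range(1, K + 1)]

def tr_scores(K):
    """Tier-reciprocal: s_k = 1/k."""
    return [1 / k for k in range(1, K + 1)]
```

For instance, with $K=3$ and tier assignments $(1,2,1,3)$, TS assigns raw scores $(3,2,3,1)$ before normalization, while TR assigns $(1,\tfrac{1}{2},1,\tfrac{1}{3})$.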

TQ involves a somewhat more involved calculation of ${s}_{k}$. The reasoning behind TQ is similar to that of ROC: when region $i$ has been placed into tier $k$, we would like TQ to set ${w}_{i}^{\prime}$ as the average true weight that would lead to the region being placed into tier $k$. However, unlike a ranking, a set of tier assignments ${t}_{1},\dots ,{t}_{n}$ does not unambiguously determine the subset of the $n-1$ simplex within which the weights must lie. Therefore, we instead base TQ’s tier scores on the premise that the decision maker is equally likely to place a given region $i$ into any of the $K$ tiers. We can then take advantage of a convenient property of the flat Dirichlet distribution of true weights: each region’s true weight follows a Beta$(1,n-1)$ marginal distribution. Thus, we can interpret ${t}_{i}=k$ as a statement by the decision maker that ${w}_{i}$ is in the $k$-th quantile (out of $K$ quantiles) of a Beta$(1,n-1)$ distribution, and set ${s}_{k}$ as the mean of that quantile. The resulting score is given by:

$${s}_{k}=K\int_{{k}_{l}}^{{k}_{h}}w\,(n-1){(1-w)}^{n-2}\,dw=K\left[{(1-{k}_{l})}^{n-1}-{(1-{k}_{h})}^{n-1}-\frac{n-1}{n}\left({(1-{k}_{l})}^{n}-{(1-{k}_{h})}^{n}\right)\right]$$ (4)

where ${k}_{l}$ and ${k}_{h}$ are the lower and upper bounds, respectively, of the $k$-th quantile of Beta$(1,n-1)$. An illustration of the resulting scores is shown in Figure 1 with $n=50$ and $K=4$. As a reminder, when ${t}_{i}=k$, that does not mean ${w}_{i}^{\prime}={s}_{k}$. The scores must still be normalized as shown in Equation (3); ${s}_{k}$ could be considered the raw weight associated with tier $k$. Though Equation (4) may appear cumbersome, it is a straightforward spreadsheet calculation.
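The TQ score calculation can be sketched as follows. One assumption in this sketch is the direction of the quantile indexing: we map tier 1 (most important) to the top quantile of Beta$(1,n-1)$, so that scores decrease in $k$, consistent with TS and TR. The quantile bounds come from the Beta$(1,n-1)$ quantile function $1-{(1-p)}^{1/(n-1)}$.

```python
def tq_scores(n, K):
    """Tier-quantile scores: s_k is the mean of one of K equal-probability
    quantiles of Beta(1, n-1). Tier 1 is mapped to the top quantile
    (our assumption), so scores decrease with k."""
    def inv_cdf(p):
        # Quantile function of Beta(1, n-1); CDF is F(w) = 1 - (1-w)^(n-1)
        return 1.0 - (1.0 - p) ** (1.0 / (n - 1))
    def antideriv(w):
        # Antiderivative of w * pdf(w), where pdf(w) = (n-1)(1-w)^(n-2)
        return (n - 1) * (1.0 - w) ** n / n - (1.0 - w) ** (n - 1)
    scores = []
    for k in range(1, K + 1):
        k_l = inv_cdf((K - k) / K)      # lower bound of tier k's quantile
        k_h = inv_cdf((K - k + 1) / K)  # upper bound
        scores.append(K * (antideriv(k_h) - antideriv(k_l)))
    return scores
```

A quick sanity check: averaging the $K$ scores recovers $1/n$, the overall mean of the Beta$(1,n-1)$ distribution, since the quantiles partition it into equal-probability pieces.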