Path-Independent Consideration

Abstract: In the context of choice with limited consideration, where the decision maker may not pay attention to all available options, the consideration function of a decision maker is path-independent if her choice cannot be manipulated by the presentation of the choice set. This paper characterizes a model of choice with limited consideration under path independence, which is equivalent to a consideration function that satisfies both the attention filter property of Masatlioglu et al. (2012) and the competition filter property of Lleras et al. (2017). Despite this equivalence, we show that, for a choice with limited consideration to be path-independent, satisfying the two axioms that characterize choice with limited consideration under an attention filter and under a competition filter separately (from Masatlioglu et al. (2012) and Lleras et al. (2017), respectively) is necessary but not sufficient.


Introduction
Individuals do not always compare all alternatives in making a decision, particularly when a decision problem is complex or contains many alternatives. For instance, Lapersonne et al. [1] reported that 22% of new car customers consider only one brand out of more than 100 car brands available in the USA. Instead, a decision maker (DM) often forms a consideration set, a subset of her actual feasible set, and ignores the rest (see, e.g., [2]).
The issue of limited consideration shakes the main principle of revealed preferences and raises the following question: How can we identify the DM's preference by observing her choice under limited attention? It is not straightforward to answer this question with the standard revealed preference tools, since revealed preference implicitly relies on knowledge of the consideration set, which is not observable in real life. The authors of [3,4] managed to provide answers to the above question when the DM picks her most preferred item from her consideration set, not from her entire feasible set. These two papers impose certain assumptions on the consideration set and employ feasible set variations to reveal preferences.
The conditions on the formation of consideration sets contemplated in [3,4] are called the attention filter and the competition filter. According to the attention filter, the consideration set is not affected when overlooked alternatives are removed from the feasible set, whereas the competition filter requires that, if the DM ignores some alternatives, she also ignores them when her feasible set expands. The attention filter is plausible when inattention is based on unawareness, while the competition filter fits settings in which the alternatives compete for the DM's attention (such as in a large supermarket).
The consideration set formation in both models is vulnerable to manipulation. To illustrate a manipulation possibility in a simple example, consider the following consideration set formation, which is an attention filter (but not a competition filter). There are three products x, y, and z, where z is considered only when both x and y are present. 1 The consideration set of such a DM is {x, y, z} if everything is presented at once. On the other hand, when x and z are offered first and then y, z is never considered. Hence, the consideration set of a DM may be manipulated by the way the choice problem is presented. 2 More precisely, suppose that, instead of presenting a large menu A, the choice problem is divided into sub-menus S and T with S ∪ T = A, and the sub-menu S is presented first. Once the DM forms her consideration based on S, the sub-menu T is presented, and she forms her consideration based on the sub-menu T together with the consideration set of S. If this consideration set differs from the one formed when the large menu A is presented as a whole, a firm that is responsible for presenting DMs with menus may have the ability to manipulate the DMs' consideration sets.
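To make the manipulation concrete, the following Python sketch (our own illustration; the function `consider` and the sequential-presentation operator are ad hoc constructions, not part of the paper's formal apparatus) encodes the attention filter just described and shows that offering {x, z} first and then y keeps z out of consideration:

```python
# The attention filter from the text: z is considered only when both x and y
# are present; x and y are always considered when available.
def consider(s):
    keep = set(s) & {'x', 'y'}
    if {'x', 'y', 'z'} <= set(s):
        keep.add('z')
    return frozenset(keep)

# Presenting the whole menu at once: everything is considered.
whole = consider({'x', 'y', 'z'})

# Presenting {x, z} first and then y: the DM re-forms her consideration
# from what she already considers plus the newly presented alternative.
sequential = consider(consider({'x', 'z'}) | {'y'})

assert whole == frozenset('xyz')
assert sequential == frozenset('xy')   # z never enters consideration
```

The composition consider(consider(S) ∪ T) is the sequential presentation described in the text; the two results differ, so this consideration rule is path dependent.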
In this paper, we are interested in situations where the formation of consideration sets is free from this kind of manipulation. This non-manipulability requirement is captured by a well-known property of choice theory, path independence, proposed by Plott [5], which imposes a "consistency" requirement on how consideration sets are determined across comparable situations. Path independence requires that the consideration set cannot be manipulated by changing the presentation of the set of alternatives.
Our model rests on the realistic assumption of path independence and thus offers a better understanding of non-manipulability in consideration set formation. This property not only eliminates the manipulation examples mentioned above but also makes the revealed preference analysis more complete.
In the literature, the formation of consideration sets has been motivated by behavioral reasons, such as shortlisting [6] 3 , rationalization [16], and categorization [17]. At first glance, the path-independence property may be normatively plausible but does not sound behaviorally plausible. However, path independence is equivalent to the consideration set formation satisfying both the attention filter of Masatlioglu et al. [3] and the competition filter of Lleras et al. [4]. Therefore, the path-independence property, although it appears demanding, has a behavioral background, too. In Section 2, we provide a list of heuristics generating consideration sets that satisfy the path-independence property.
The organization of this paper is as follows. Section 2 introduces notations and the model. Section 3 provides our characterization. Section 4 analyzes the revealed preference.

Model
We denote the set of alternatives by X, an arbitrary non-empty finite set, and let 𝒳 denote the set of all subsets of X with cardinality at least 2. Each member of 𝒳 is a choice problem. Let c be a choice function, c : 𝒳 → X with c(S) ∈ S for all S ∈ 𝒳; that is, our DM assigns a unique alternative to each choice problem S. Let ≻ be a complete, transitive, and antisymmetric binary relation (a linear order) over X, and let max(≻, S) denote the best element of S with respect to ≻.
Let Γ : 𝒳 → 𝒳 be a self-map on 𝒳, that is, Γ(S) ⊆ S for all S ∈ 𝒳. Γ(S) represents the consideration set at S ∈ 𝒳; that is, the set of alternatives considered when the DM faces the feasible set S. Since the DM can only consider options that are available, Γ(S) must be a subset of S.
The next definition describes the behavior of a DM with limited consideration: the DM picks her most preferred alternative within her consideration set, not within the entire feasible set.

1 This example is from page 2198 in [3], called "pairwisely unchosen". 2 A DM who has a competition (but not an attention) filter can be manipulated in a similar manner. For instance, suppose she ignores y when x is present and ignores z when y is present. This is a competition filter but not an attention filter. She considers only x when all three alternatives are presented at once. On the other hand, she will consider x and z when she first sees x and y (and so discards y) and is then given z, as y is already gone. 3 For extensions of the shortlisting procedure, see also [7][8][9][10][11][12][13][14][15].

Definition 1. A choice function c is a choice with limited consideration (LC) if there exists a linear order ≻ and a consideration mapping Γ such that c(S) = max(≻, Γ(S)) for every S ∈ 𝒳.
The LC model is very broad. Indeed, without further conditions on Γ, any choice behavior has an LC representation. The authors of [3,4] proposed two conditions on consideration sets. Masatlioglu et al. [3] required that a DM who overlooks some feasible alternative has the same consideration set when that alternative is removed. This property is called Attention Filter (AF). 4

AF: If x ∉ Γ(S), then Γ(S \ x) = Γ(S).

Lleras et al. [4] imposed that, when the opportunity set gets larger, DMs tend to overlook more options. This property is called Competition Filter (CF). 5

CF: If x ∉ Γ(S), then x ∉ Γ(S ∪ y) for any y.
It turns out that, if a consideration function satisfies both AF and CF, then it also satisfies the well-known path-independence condition. In the consideration context, path independence means that the final consideration set is independent of the way the alternatives were initially presented.
Path independence (PI) imposes a consistency property on how the consideration sets are determined:

PI: Γ(S ∪ T) = Γ(Γ(S) ∪ Γ(T)) for all S and T.

An important benefit of PI is the elimination of manipulation possibilities: the consideration set of our DM cannot be altered by different ways of presenting the available set of alternatives. We now define our model formally.

Definition 2.
We say c is a π-LC model if c is a choice with limited consideration and Γ satisfies path independence.
Before we state our main result, we first state the result of Aizerman and Malishevski [28] that AF and CF together are equivalent to PI. Then, we provide a list of examples of consideration set formations satisfying path independence.

Theorem 1 ([28]). A consideration mapping Γ is path independent if and only if it satisfies AF and CF.
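Theorem 1 can be verified exhaustively on a small domain. The Python sketch below is our own illustration (names such as `is_af` are ad hoc): it enumerates every consideration mapping on a three-element X, with the convention Γ({a}) = {a} on singletons so that path independence is well defined, and checks that PI holds exactly when AF and CF both hold.

```python
from itertools import combinations, product

X = ('x', 'y', 'z')

def subsets(base, min_size=1):
    return [frozenset(s) for r in range(min_size, len(base) + 1)
            for s in combinations(sorted(base), r)]

ALL = subsets(X)                          # every non-empty subset of X
MENUS = [s for s in ALL if len(s) > 1]    # choice problems with >= 2 elements

def extend(assignment):
    """Extend a Γ given on MENUS to singletons via Γ({a}) = {a}."""
    full = dict(assignment)
    for a in X:
        full[frozenset([a])] = frozenset([a])
    return full

def is_af(g):   # attention filter: x ∉ Γ(S) implies Γ(S \ x) = Γ(S)
    return all(g[s - {x}] == g[s]
               for s in MENUS for x in s if x not in g[s])

def is_cf(g):   # competition filter: x ∉ Γ(S) implies x ∉ Γ(S ∪ y)
    return all(x not in g[s | {y}]
               for s in ALL for x in s if x not in g[s] for y in X)

def is_pi(g):   # path independence: Γ(S ∪ T) = Γ(Γ(S) ∪ Γ(T))
    return all(g[s | t] == g[g[s] | g[t]] for s in ALL for t in ALL)

count_pi = 0
for gammas in product(*[subsets(m) for m in MENUS]):
    g = extend(dict(zip(MENUS, gammas)))
    pi_holds = is_pi(g)
    assert pi_holds == (is_af(g) and is_cf(g))   # Theorem 1, case by case
    count_pi += pi_holds
```

All 189 candidate mappings (a non-empty consideration set for each of the three pairs and the grand set) are checked, and the equivalence holds in every case.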
There are several examples of non-manipulable heuristics. For example, in elimination by aspects of Tversky [29], at each stage the DM selects a perceived aspect and eliminates the alternatives lacking that attribute. The DM continues selecting aspects and eliminating products. The process stops at stage n, when no alternative is left. The consideration set is the set of alternatives that survive at stage n − 1. Additional examples can be listed as follows. The DM considers:
• only the three cheapest suppliers in the market [30];
• the products that appear on the first page of the web search and/or sponsored links [31];
• the first N available alternatives according to an exogenously given order [32];
• only a job candidate if she is the best in a program, or the top-two job candidates from all first-tier schools and the top candidate from second-tier schools; or
• only the cheapest car, the safest car, and the most fuel-efficient car on the market.
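As a concrete check, the first heuristic above, considering only the three cheapest suppliers, is path independent. The following Python sketch (with made-up prices; the data are purely illustrative) verifies Plott's condition Γ(S ∪ T) = Γ(Γ(S) ∪ Γ(T)) for every pair of non-empty menus:

```python
from itertools import combinations

# Illustrative supplier prices (assumed data, not from the paper).
prices = {'a': 10, 'b': 12, 'c': 15, 'd': 20, 'e': 30}

def consider(s):
    """Γ(S): the (up to) three cheapest alternatives in S."""
    return frozenset(sorted(s, key=prices.get)[:3])

universe = sorted(prices)
menus = [frozenset(s) for r in range(1, len(universe) + 1)
         for s in combinations(universe, r)]

# Plott's condition: Γ(S ∪ T) = Γ(Γ(S) ∪ Γ(T)) for all non-empty S, T.
path_independent = all(consider(s | t) == consider(consider(s) | consider(t))
                       for s in menus for t in menus)
assert path_independent
```

Intuitively, any supplier among the three cheapest in S ∪ T is also among the three cheapest in whichever sub-menu contains it, so it survives the two-step presentation; this is why the heuristic cannot be manipulated.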

Characterization
In this section, we provide necessary and sufficient conditions for our model where the consideration set mapping satisfies path independence. In other words, we ask how one could decide whether choice data are consistent with the π-LC model.
The weak axiom of revealed preference (WARP) characterizes preference maximization. However, WARP does not distinguish between "being feasible" and "being considered." Hence, one cannot conclude that a chosen alternative is preferred to a foregone one without confirming that the latter is considered. The question is: How can we infer that an alternative is considered? The answer depends on the structure imposed on the consideration set. The authors of [3,4] provided two axioms characterizing their respective models. We now state each of them.
In [3], if removing an alternative from a set changes the DM's choice, they infer that this alternative is considered; that is, c(T) ≠ c(T \ x*) implies x* ∈ Γ(T), which is the additional requirement for x* to be chosen from T. Masatlioglu et al. [3] introduced the following axiom.

WARP-AF: For any nonempty S, there exists x* ∈ S such that, for any T including x*, if c(T) ∈ S and c(T) ≠ c(T \ x*), then c(T) = x*.

An alternative way to state this axiom is through revealed preferences. Whenever the choice changes as a consequence of removing an alternative, the initially chosen alternative is revealed to be preferred to the removed one. Formally, for any distinct x and y, define:

x P_AF y if x = c(S) ≠ c(S \ y) for some S. (1)

WARP-AF indeed guarantees that the binary relation P_AF defined in (1) is acyclic, and it fully characterizes the class of choice functions generated by an attention filter. The lemma from [3] states that WARP-AF is equivalent to the acyclicity of P_AF.

Lemma 1 ([3]). P_AF is acyclic if and only if c satisfies WARP-AF.
Given this result, we now illustrate that just observing the following two choice reversals falsifies WARP-AF: c(S₁) = x ≠ c(S₁ \ y) and c(S₂) = y ≠ c(S₂ \ x).
These observations reveal x P_AF y and y P_AF x, which is a cycle. By Lemma 1, WARP-AF is violated.
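These two reversals can be detected mechanically from data. The Python sketch below uses hypothetical menus S₁ = {x, y, z} and S₂ = {x, y, w} (the specific menus and the sub-menu choices are illustrative fillers consistent with the two reversals) to compute P_AF and exhibit the two-cycle:

```python
# Hypothetical choice data realizing the two reversals: removing y from
# S1 = {x, y, z} and removing x from S2 = {x, y, w} both change the choice.
c = {
    frozenset('xyz'): 'x', frozenset('xz'): 'z',   # c(S1) = x ≠ c(S1 \ y)
    frozenset('xyw'): 'y', frozenset('yw'): 'w',   # c(S2) = y ≠ c(S2 \ x)
}

def p_af(choices):
    """x P_AF y iff x = c(S) ≠ c(S \ y) for some observed menu S."""
    return {(x, y) for s, x in choices.items() for y in s - {x}
            if s - {y} in choices and choices[s - {y}] != x}

rel = p_af(c)
assert ('x', 'y') in rel and ('y', 'x') in rel   # a two-cycle: WARP-AF fails
```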
In [4], if an alternative is considered in a larger set, then it must be considered in a smaller set. That is, if b* = c(T′) for some T′ ⊃ T, then b* ∈ Γ(T′), since a necessary condition for being chosen is being considered. Since Γ is a competition filter, b* ∈ Γ(T). The following axiom of Lleras et al. [4] summarizes this discussion.

WARP-CF: For any nonempty S, there exists x* ∈ S such that, for any T including x*, if c(T) ∈ S and x* = c(T′) for some T′ ⊇ T, then c(T) = x*.

Similar to the above, we can state this axiom through revealed preference. Whenever a DM's choices from a small set and a larger set are inconsistent, the former reflects her true preference under CF better than the latter. Formally, for any distinct x and y, define the following binary relation:

x P_CF y if y = c(T) ∈ S ⊂ T and x = c(S) for some S and T. (2)

As in Masatlioglu et al. [3], WARP-CF guarantees that the binary relation P_CF defined in (2) is acyclic, and it fully characterizes the class of choice functions generated by a competition filter. The lemma from [4] states that WARP-CF is equivalent to the acyclicity of P_CF.

Lemma 2 ([4]). P_CF is acyclic if and only if c satisfies WARP-CF.
As few as three observations can falsify this axiom. For example, consider the following choice pattern: c({x, y, z, t}) = y, c({x, y, z}) = x, and c({x, y}) = y.
The first two observations imply x P_CF y. Similarly, the last two observations imply y P_CF x, which leads to a cycle of length two. By Lemma 2, the axiom is violated.
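This three-observation test is easy to run in code. The following Python sketch (our own illustration) computes P_CF, as defined in (2), from exactly these observations and finds the two-cycle:

```python
# The three observations from the text.
c = {frozenset('xyzt'): 'y', frozenset('xyz'): 'x', frozenset('xy'): 'y'}

def p_cf(choices):
    """x P_CF y iff y = c(T) ∈ S ⊂ T and x = c(S) ≠ y for observed S, T."""
    return {(x, y) for t, y in choices.items() for s, x in choices.items()
            if s < t and y in s and x != y}

rel = p_cf(c)
assert ('x', 'y') in rel and ('y', 'x') in rel   # a two-cycle: WARP-CF fails
```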
Since each axiom characterizes the corresponding model, and path independence is equivalent to the combination of an attention filter and a competition filter, it is tempting to claim that WARP-AF and WARP-CF together characterize the π-LC model, where the consideration structure satisfies both AF and CF. However, that is not the case. The following example satisfies both axioms, but it cannot be represented by a pair of a preference and a consideration mapping satisfying path independence.
Consider the following "choosing pairwisely unchosen" pattern, in which the alternative chosen from {x, y, z} is never chosen in any binary comparison:

c({x, y, z}) = x, c({x, y}) = y, c({x, z}) = z.

First, note that c satisfies both WARP-AF and WARP-CF. To see this, note that c involves just two choice reversals: when y or z is removed from {x, y, z}. Therefore, the revealed preference generated by AF is just x P_AF y and x P_AF z, and the one generated by CF is only y P_CF x and z P_CF x. Neither contains a cycle, so c satisfies both axioms. Nevertheless, x must be revealed best under AF and revealed worst under CF, which is incompatible, so c cannot be represented by a Γ satisfying AF and CF simultaneously. In other words, there is no pair (Γ, ≻) with Γ satisfying PI that can rationalize these data.
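That no path-independent representation exists can also be confirmed by brute force. The Python sketch below (our own check) searches over all six linear orders and all consideration mappings on {x, y, z}, and verifies that no path-independent mapping generates the pattern above; we complete the data with c({y, z}) = y, an arbitrary filler that does not affect the conclusion.

```python
from itertools import combinations, permutations, product

X = ('x', 'y', 'z')
menus = [frozenset(s) for r in (2, 3) for s in combinations(X, r)]

# "Choosing pairwisely unchosen": the alternative chosen from {x, y, z}
# is never chosen from a binary menu; c({y, z}) = y is an arbitrary filler.
c = {frozenset('xyz'): 'x', frozenset('xy'): 'y',
     frozenset('xz'): 'z', frozenset('yz'): 'y'}

def nonempty_subsets(s):
    return [frozenset(t) for r in range(1, len(s) + 1)
            for t in combinations(sorted(s), r)]

def is_pi(gamma):
    """Plott's path independence, with Γ({a}) = {a} on singletons."""
    full = dict(gamma)
    for a in X:
        full[frozenset([a])] = frozenset([a])
    sets = nonempty_subsets(X)
    return all(full[s | t] == full[full[s] | full[t]]
               for s in sets for t in sets)

def represents(order, gamma):
    """Does maximizing `order` on Γ(S) reproduce the data? order[0] is best."""
    rank = {a: i for i, a in enumerate(order)}
    return all(min(gamma[s], key=rank.get) == c[s] for s in menus)

found = any(represents(order, dict(zip(menus, gs))) and
            is_pi(dict(zip(menus, gs)))
            for order in permutations(X)
            for gs in product(*[nonempty_subsets(m) for m in menus]))
assert not found   # no (preference, path-independent Γ) pair rationalizes c
```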
The axiom we propose is a stronger version of both WARP-AF and WARP-CF. Remember that both axioms require that every set S have a "best" alternative x*, which must be chosen from any other decision problem T as long as it attracts attention there. Recall that, with an attention filter, an alternative x* attracts attention at a choice set T when removing it changes the choice, i.e., c(T) ≠ c(T \ x*). Now that we assume the consideration set is path independent, we can draw the same conclusion when we know that x* is paid attention to at some bigger decision problem T′ ⊃ T, by observing c(T′) ≠ c(T′ \ x*). Therefore, we need to state that, if the removal of x* changes the choice in some superset of T, then it attracts attention at T.

WARP-PI: For any nonempty S, there exists x* ∈ S such that, for any T including x*, if c(T) ∈ S and c(T′) ≠ c(T′ \ x*) for some T′ ⊇ T, then c(T) = x*.

It turns out that WARP-PI is the necessary and sufficient condition for the π-LC model.

Theorem 2 (Characterization). A choice function satisfies WARP-PI if and only if it is a π-LC model. 6
Theorem 2 characterizes the special class of choice behavior described above. The characterization involves a single behavioral postulate that is stronger than both WARP-AF and WARP-CF. The model has higher predictive power, which comes at the cost of explanatory power: "choosing pairwisely unchosen" is no longer within the model. Consider instead the following choice behavior: c({x, y, z}) = y, c({x, y}) = x, c({y, z}) = y, and c({x, z}) = x. It is routine to verify that this choice behavior satisfies WARP-PI. 7 Hence, Theorem 2 implies that it is consistent with a π-LC model. The choice reversal between {x, y, z} and {x, y} yields that her preference must be x ≻ y ≻ z. This implies that we can uniquely pin down the preference for this choice behavior. Note that this is not true for the models of Masatlioglu et al. [3] and Lleras et al. [4].
In addition to the unique preference, we can also reveal a unique consideration set mapping. To see this, consider the set {x, y, z}. First of all, the choice, which is y, must be in the consideration set. Since removing z changes the choice, z is also in it (attention filter). Finally, we know from the above discussion that x is better than the choice, so x cannot belong to the consideration set of {x, y, z}. Hence, Γ({x, y, z}) = {y, z}. In addition, path independence requires that y and z attract attention whenever they are available, which pins down the consideration set mapping uniquely for this example. Theorem 2 states that it is possible to test our model non-parametrically from observed choice behavior even when the consideration sets themselves are unobservable. 8

Revealed Preference
In this section, we discuss the revealed preference of our model. One might suspect that P_AF ∪ P_CF should be the revealed preference of this model. The following example illustrates that this is not the case: there is an additional preference revelation that cannot be captured even by the transitive closure 9 of P_AF ∪ P_CF.

Example 1 (Hidden Revelation). Consider the following behavior with four alternatives x, y, z, t. The DM chooses z whenever z is available, except on two occasions, {x, z, t} and {z, t}, from which she chooses t. When z is not available, the DM chooses t whenever t is available. Lastly, the DM chooses x from {x, y}. It is routine to show that this choice behavior satisfies WARP-PI; hence, it is a π-LC model. 10
The DM exhibits only one choice reversal: c({x, y, z, t}) = z ≠ t = c({x, z, t}). This implies that we must have z P_AF y and t P_CF z; that is, t must be better than z, and z must be better than y (and, by transitivity, t is better than y). However, there is no revelation between x and y according to P_AF ∪ P_CF.
We now illustrate that, in our model, x is revealed to be better than y. To see this, note that c({x, y, z, t}) = z ≠ t = c({x, z, t}) implies y ∈ Γ({x, y, z, t}). By the competition filter, we must then have y ∈ Γ({x, y}). Since x is chosen from {x, y}, x must be better than y. However, this is captured by neither P_CF nor P_AF.
Given this observation, we provide a characterization of the revealed preference when Γ is known to be path independent. To fix ideas, consider the cyclical choice behavior c({x, y, z}) = x, c({x, y}) = x, c({y, z}) = y, c({x, z}) = z. Here, we can uniquely pin down the preference when Γ is path independent. To see this, first note that c({x, y, z}) = x implies that the DM pays attention to x at {x, y, z}, so she does at {x, z} as well (revealed attention due to the competition filter). Since she picks z from {x, z}, we conclude that she prefers z over x (revealed preference). Since c({x, y, z}) ≠ c({x, z}), y must attract attention at {x, y, z} (revealed attention due to the attention filter). Since she picks x from {x, y, z}, we conclude that she prefers x over y (revealed preference). Therefore, her preference is uniquely pinned down: z ≻ x ≻ y.

Now, we generalize this observation. Suppose c(T) ≠ c(T \ y) and c(T) ≠ y. Then, we conclude that y must be paid attention to at T, hence c(T) ≻ y. Moreover, since Γ is path independent, y must also attract attention at any decision problem S ⊆ T that includes y. Therefore, whenever c(S) ≠ y, c(S) is revealed to be preferred to y. Formally, for any distinct pair of x and y, define:

x P_PI y if there exist S and T with y ∈ S ⊆ T such that x = c(S) and c(T) ≠ c(T \ y).

Note that the second condition in the definition of P_PI holds trivially when y is equal to c(T). In that case, c(T) must have been considered not only at T but also at any decision problem S smaller than T including c(T), since Γ satisfies PI. Therefore, whenever c(T) ∈ S ⊂ T and c(S) ≠ c(T), we have x = c(S) ≻ c(T) = y.
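The relation P_PI can be computed directly from choice data. The following Python sketch (our own implementation of the revealed-attention logic above) recovers exactly z P_PI x and x P_PI y from the cyclical pattern, pinning down z ≻ x ≻ y:

```python
# The cyclical choice pattern from the text.
c = {frozenset('xyz'): 'x', frozenset('xy'): 'x',
     frozenset('yz'): 'y', frozenset('xz'): 'z'}

def p_pi(choices):
    """x P_PI y iff x = c(S) for some y ∈ S ⊆ T with c(T) ≠ c(T \\ y)."""
    rel = set()
    for t, ct in choices.items():
        for y in t:
            # y is revealed to attract attention at t if it is chosen there
            # or if its removal changes the choice.
            if y != ct and not (t - {y} in choices and choices[t - {y}] != ct):
                continue
            # By the competition filter, y then attracts attention at every
            # observed S ⊆ t containing y.
            for s, cs in choices.items():
                if s <= t and y in s and cs != y:
                    rel.add((cs, y))
    return rel

rel = p_pi(c)
assert rel == {('z', 'x'), ('x', 'y')}   # pins down z ≻ x ≻ y
```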
As above, if x P_PI y and y P_PI z for some y, we also conclude that she prefers x to z, even when x P_PI z does not hold. The following proposition states that the transitive closure of P_PI, denoted by P_PI^R, is the revealed preference.

Proposition 1.
Suppose c is a π-LC model. Then, x is revealed to be preferred to y if and only if x P_PI^R y.
Proof. The if-part is already demonstrated above. The only-if part can be shown in parallel with the proof of Theorem 2, where we show that any linear order ≻ including P_PI^R represents c once Γ is chosen properly.
Finally, note that P_PI must include both P_AF and P_CF, but it might include more. To show this, we revisit Example 1 and illustrate that P_PI captures that x is better than y, which is missed by both P_AF and P_CF. Let T = {x, y, z, t} and S = {x, y}. Since c(T) ≠ c(T \ y) and c(S) = x, we must have x P_PI y. Hence, our model reveals more preference information than the models of Masatlioglu et al. [3] and Lleras et al. [4] combined.
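Example 1's hidden revelation can be replicated computationally. The sketch below (our own illustration) computes P_AF, P_CF, and P_PI from the full choice data of Example 1 and confirms that the pair (x, y) is revealed only by P_PI:

```python
# Example 1's full choice data on X = {x, y, z, t}.
c = {frozenset(m): ch for m, ch in [
    ('xyzt', 'z'), ('xyz', 'z'), ('xzt', 't'), ('yzt', 'z'), ('xyt', 't'),
    ('xy', 'x'), ('xz', 'z'), ('yz', 'z'), ('xt', 't'), ('yt', 't'), ('zt', 't')]}

def p_af(ch):
    """x P_AF y iff x = c(S) ≠ c(S \\ y) for some observed menu S."""
    return {(x, y) for s, x in ch.items() for y in s - {x}
            if s - {y} in ch and ch[s - {y}] != x}

def p_cf(ch):
    """x P_CF y iff y = c(T) ∈ S ⊂ T and x = c(S) ≠ y for observed S, T."""
    return {(x, y) for t, y in ch.items() for s, x in ch.items()
            if s < t and y in s and x != y}

def p_pi(ch):
    """x P_PI y iff x = c(S) for some y ∈ S ⊆ T with c(T) ≠ c(T \\ y)."""
    rel = set()
    for t, ct in ch.items():
        for y in t:
            if y == ct or (t - {y} in ch and ch[t - {y}] != ct):
                rel |= {(cs, y) for s, cs in ch.items()
                        if s <= t and y in s and cs != y}
    return rel

assert p_af(c) == {('z', 'y')} and p_cf(c) == {('t', 'z')}
assert ('x', 'y') not in p_af(c) | p_cf(c)
assert ('x', 'y') in p_pi(c)   # the hidden revelation
```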
Author Contributions: The authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Appendix A

The proof of Lemma A1 is completely analogous to the proofs of Lemmas 1 and 2 (see [3,4]); hence, we skip it here.
Let P_PI^R be the transitive closure of P_PI, and let ≻ be an arbitrary completion of P_PI^R. For every S, we call B ⊆ S a minimum block of S if and only if c(S) ≠ c(S \ B) but c(S) = c(S \ B′) for any B′ ⊊ B. Given this, define Γ recursively as follows: 1. Γ(X) consists of the ≻-worst element of each of X's minimum blocks.

2. Suppose Γ has already been defined for all proper supersets of S. Then, define Γ(S) to contain every element of S that belongs to Γ(T) for some T ⊋ S. If there is a minimum block of S that has no element in Γ(S) according to the above, add its ≻-worst element into Γ(S). For any element x added in this way, we have c(S) P_PI x, so it must be that c(S) ≻ x.

Claim 1. Γ is path independent.
Proof. Γ satisfies CF by construction, so we shall prove that Γ satisfies AF. Suppose not; i.e., x ∉ Γ(S) and Γ(S) ≠ Γ(S \ x). Since Γ satisfies CF, we must have Γ(S) ⊆ Γ(S \ x). Hence, Γ(S) ⊊ Γ(S \ x); that is, there exists y ∈ S such that y ∉ Γ(S) but y ∈ Γ(S \ x). Then, there exists T ⊃ S such that: (i) T \ x has a minimum block B in which y is the ≻-worst element; and (ii) none of the elements of B is included in Γ(T′) for any T′ ⊋ T \ x. Then, we must have c(T) = c(T \ x). Otherwise, {x} is a minimum block of T, so we have x ∈ Γ(T), which implies x ∈ Γ(S). Therefore, we have

c(T) = c(T \ x) ≠ c(T \ (x ∪ B)).

Therefore, by Lemma A2 (iii), T has a minimum block that is a subset of x ∪ B, so at least one element of x ∪ B must be in Γ(T), which is a contradiction. Now, we want to show that (≻, Γ) represents c. Since Lemma A2 (i) implies that c(S) ∈ Γ(S), all we need to show is that c(S) ≻ y for all y ∈ Γ(S) \ {c(S)}.
Proof. Since y ∈ Γ(S), there exists T ⊇ S such that y ∈ Γ(T). Furthermore, T has a minimum block B in which y is the ≻-worst element, and none of the elements of B is in Γ(T′) for any T′ ⊋ T. There are three easy cases: (i) if c(S) = c(T), then by Lemma A2 (ii) we have c(S) = c(T) ≻ y; (ii) if y = c(T), then we have c(S) P_PI y, so it must be that c(S) ≻ y; and (iii) if c(S) ∈ B, then c(S) ≻ y by construction. Therefore, we only need to investigate the case in which y ≠ c(T) ≠ c(S) and c(S) ∉ B. Note that c(T) ≻ y in this case by Lemma A2 (ii). Now, let S′ = S \ B. Since y ∈ B, S′ is a proper subset of S.
Case I: c(S″) ≠ c(S) for some S″ with S′ ⊆ S″ ⊊ S. By Lemma A2 (iii), S has a minimum block B′ that is a subset of S \ S″ ⊆ B. Since c(S) ∉ B′ (⊆ B), every element of B′ is worse than c(S) by Lemma A2 (ii). Since y is the ≻-worst element of B, which is a superset of B′, we conclude c(S) ≻ y.