#### 3.1. A College Admissions Model

We begin with a standard college admissions model as in Chapter 5 of Roth and Sotomayor (1990), using different notation that is suitable for our purpose. We then introduce a random preference profile and make explicit the distribution of the observed matching.

Suppose that we have a set of students indexed by ${N}_{s}=\{1,\dots ,{n}_{s}\}$ and a set of colleges indexed by ${N}_{c}=\{1,\dots ,{n}_{c}\}$. In many situations, the colleges are capacity-constrained for various reasons. For each college $j$, let ${q}_{j}$ be a positive integer that represents the quota of college $j$. To accommodate the possibility that students or colleges remain unmatched, define ${N}_{s}^{\prime}={N}_{s}\cup \{0\}$ and ${N}_{c}^{\prime}={N}_{c}\cup \{0\}$, so that an unmatched student (or college) is viewed as being matched to 0. (When we need to view ${N}_{s}^{\prime}$ and ${N}_{c}^{\prime}$ as ordered sets, according to the ordering of natural numbers, we take 0 to be the last element of ${N}_{s}^{\prime}$ and ${N}_{c}^{\prime}$.) A (many-to-one) matching under (the capacity constraint) $q={({q}_{j})}_{j\in {N}_{c}}$ is defined as a (point-valued) map $\mu :{N}_{s}^{\prime}\to {N}_{c}^{\prime}$ such that $|{\mu}^{-1}(j)\cap {N}_{s}|\le {q}_{j}$ for each $j\in {N}_{c}$, i.e., the number of students assigned to each college does not exceed its capacity.
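The capacity condition in the definition of a matching can be checked mechanically. The following is a minimal Python sketch, not part of the formal development; the function name `is_matching` and the dictionary-based input format are assumptions made for illustration only.

```python
from collections import Counter

def is_matching(mu, quotas):
    """Return True if mu respects every college's quota.

    mu: dict mapping each student to a college index, with 0 meaning unmatched.
    quotas: dict mapping each college j to its quota q_j (hypothetical input format).
    """
    # Count students assigned to each college; index 0 (unmatched) has no limit.
    counts = Counter(j for j in mu.values() if j != 0)
    return all(j in quotas and counts[j] <= quotas[j] for j in counts)

quotas = {1: 1, 2: 2}
print(is_matching({1: 2, 2: 2, 3: 1}, quotas))  # True
print(is_matching({1: 1, 2: 1, 3: 0}, quotas))  # False: college 1 exceeds its quota
```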

The matching result depends on a preference profile of agents. In this paper, we allow for only strict preferences, so that each student is never indifferent between choices from ${N}_{c}^{\prime}$ and each college is never indifferent between choices from ${N}_{s}^{\prime}$. It is convenient to represent each preference ordering with a permutation of the agent indices. Let ${\Pi}_{c}$ and ${\Pi}_{s}$ be the collections of permutations over ${N}_{s}^{\prime}$ and ${N}_{c}^{\prime}$ respectively, so that each permutation is associated with a preference ordering. (A word of caution with subscripts: ${\Pi}_{c}$ represents the set of preferences of colleges over students.) For example, suppose that ${N}_{c}^{\prime}=(0,1,2,3,4)$. If a student has a preference ordering $\pi \in {\Pi}_{s}$ over colleges ${N}_{c}^{\prime}$ such that $\pi =(3,2,0,4,1)$ (or equivalently, $\pi (1)=3$, $\pi (2)=2$, $\pi (3)=0$, etc.), this means the student ranks college 3 highest and college 1 lowest (even lower than being unmatched).

More formally, for a given $\pi \in {\Pi}_{s},$ we write ${i}_{1}{\succ}_{\pi}{i}_{2}$ if ${\pi}^{-1}({i}_{1})<{\pi}^{-1}({i}_{2})$, i.e., ${i}_{1}$ is ranked higher than ${i}_{2}$ by preference $\pi $. Given preference $\pi \in {\Pi}_{s}$ of student $i$ over colleges, we say that college $j$ is acceptable to student $i$ if $j{\succ}_{\pi}0$. We make similar definitions for college preferences. For two sets ${S}_{1}$ and ${S}_{2}$ of students and a preference ordering $\pi $ of a college, we write ${S}_{1}{\succ}_{\pi}{S}_{2}$ if ${i}_{1}{\succ}_{\pi}{i}_{2}$ for all ${i}_{1}\in {S}_{1}$ and ${i}_{2}\in {S}_{2}$.
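To make the permutation representation concrete, a preference ordering can be stored as a tuple whose $k$-th entry is the alternative ranked $k$-th, so that ${\pi}^{-1}$ is a position lookup. The sketch below is illustrative only; the helper names `rank`, `prefers`, and `acceptable` are hypothetical and do not appear in the model.

```python
def rank(pi, a):
    """Return pi^{-1}(a): the 1-based position of alternative a in ordering pi."""
    return pi.index(a) + 1

def prefers(pi, a, b):
    """True if a is ranked strictly higher than b under pi, i.e., rank(a) < rank(b)."""
    return rank(pi, a) < rank(pi, b)

def acceptable(pi, j):
    """True if alternative j is preferred to being unmatched (index 0)."""
    return prefers(pi, j, 0)

pi = (3, 2, 0, 4, 1)  # the example ordering from the text: pi(1)=3, pi(2)=2, pi(3)=0, ...
print(rank(pi, 3))        # 1
print(prefers(pi, 2, 4))  # True
print(acceptable(pi, 1))  # False: college 1 is ranked below being unmatched
```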

The collection of preference ordering profiles $\mathit{\pi}=({\mathit{\pi}}_{s},{\mathit{\pi}}_{c})$ is given by

$$\mathbf{\Pi}={\Pi}_{s}^{{n}_{s}}\times {\Pi}_{c}^{{n}_{c}},$$

where ${\mathit{\pi}}_{s}\in {\Pi}_{s}^{{n}_{s}}$ and ${\mathit{\pi}}_{c}\in {\Pi}_{c}^{{n}_{c}}$. Given a quota vector $q$, we call any map $\mu :{N}_{s}^{\prime}\times \mathbf{\Pi}\to {N}_{c}^{\prime}$ a matching mechanism under $q$ if, for each $\mathit{\pi}\in \mathbf{\Pi}$, $\mu (\cdot ;\mathit{\pi})$ is a matching under $q$.

In modeling a predicted outcome of matching in economics, it is standard in the literature to focus on (pairwise) stable matchings (Roth and Sotomayor 1990).

**Definition** **1.**

(i) A matching $\mu :{N}_{s}^{\prime}\to {N}_{c}^{\prime}$ is stable with respect to $\mathit{\pi}=({\mathit{\pi}}_{s},{\mathit{\pi}}_{c})\in \mathbf{\Pi}$, if the following two conditions are satisfied.

(a) There is no $i\in {N}_{s}$ such that $0{\succ}_{{\pi}_{i}}\mu (i)$ and no $j\in {N}_{c}$ such that $0{\succ}_{{\pi}_{j}}{i}^{\prime}$ for some ${i}^{\prime}\in {\mu}^{-1}(j)$.

(b) There is no pair $(i,j)\in {N}_{s}\times {N}_{c}$ such that both $j{\succ}_{{\pi}_{i}}\mu (i)$ and $i{\succ}_{{\pi}_{j}}{i}^{\prime}$ for some ${i}^{\prime}\in {\mu}^{-1}(j)$.

(ii) A matching mechanism $\mu :{N}_{s}^{\prime}\times \mathbf{\Pi}\to {N}_{c}^{\prime}$ is stable if $\mu (\cdot ;\mathit{\pi})$ is stable with respect to each $\mathit{\pi}\in \mathbf{\Pi}.$

Definition 1 says that a matching is stable with respect to $\mathit{\pi}$ if (a) each student matched with a college prefers being matched with that college to remaining unmatched, and each college prefers each student matched with it to remaining unmatched, and (b) there is no pair of a student and a college, not matched with each other, such that the student prefers the college to his current match and the college prefers the student to one of its current matches.
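The two conditions of Definition 1(i) translate directly into a finite check. The following Python sketch follows the conditions as literally stated (in particular, condition (b) compares a candidate student only with students currently assigned to the college) and assumes that `mu` already satisfies the quota constraint. The function name `is_stable` and the dictionary inputs are assumptions for illustration.

```python
def is_stable(mu, student_prefs, college_prefs):
    """Check conditions (a) and (b) of Definition 1(i) for a given matching.

    mu: dict student -> college (0 = unmatched), assumed to respect quotas.
    student_prefs[i]: tuple ordering colleges and 0, most preferred first.
    college_prefs[j]: tuple ordering students and 0, most preferred first.
    """
    def prefers(pi, a, b):
        return pi.index(a) < pi.index(b)

    # Students currently assigned to each college.
    assigned = {j: [i for i, c in mu.items() if c == j] for j in college_prefs}

    # (a) No student prefers being unmatched to his match, and no college
    #     prefers being unmatched to one of its assigned students.
    for i, j in mu.items():
        if j != 0 and prefers(student_prefs[i], 0, j):
            return False
    for j, members in assigned.items():
        if any(prefers(college_prefs[j], 0, i) for i in members):
            return False

    # (b) No student-college pair in which the student prefers the college to
    #     his match and the college prefers him to some assigned student.
    for i, pi_i in student_prefs.items():
        for j, pi_j in college_prefs.items():
            if prefers(pi_i, j, mu[i]) and any(prefers(pi_j, i, k) for k in assigned[j]):
                return False
    return True

student_prefs = {1: (1, 2, 0), 2: (1, 2, 0), 3: (1, 2, 0)}
college_prefs = {1: (1, 2, 3, 0), 2: (1, 2, 3, 0)}  # common ranking, one seat each
print(is_stable({1: 1, 2: 2, 3: 0}, student_prefs, college_prefs))  # True
print(is_stable({1: 2, 2: 1, 3: 0}, student_prefs, college_prefs))  # False
```

In the second call, student 1 and college 1 form a blocking pair: student 1 prefers college 1 to college 2, and college 1 prefers student 1 to its assigned student 2.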

#### 3.3. The Joint Distribution of a Large Observed Matching

In this section, we obtain an explicit expression for the distribution of the observed matching when the matching arises from a stable matching mechanism. Throughout this paper, we regard ${x}_{s,i}$ and ${x}_{c,j}$ as non-stochastic, which means that the unobserved random components such as $\eta $ and $\epsilon $ are drawn independently of these observed characteristics. Recall that the preference profile of students and colleges is given by $\mathit{\pi}(\epsilon ,\eta )=({\pi}_{c}(\eta ),{\mathit{\pi}}_{s}(\epsilon ))$, where we simply write the preference profile of colleges as ${\pi}_{c}(\eta )$ because it is the same across the colleges.

Let $\mathbf{Y}$ denote the observed matching, which is generated by a stable matching mechanism, say, $\mu (\cdot ;\mathit{\pi}(\epsilon ,\eta ))$. In other words, for each $i\in {N}_{s}$,

$$\mathbf{Y}(i)=\mu (i;\mathit{\pi}(\epsilon ,\eta )).$$

This is a reduced form for the observed matching. The randomness of the observed matching comes from the randomness of the students’ and colleges’ preferences (i.e., $\eta $ and $\epsilon $).

We make the following assumptions regarding the random preferences for students and colleges.

**Assumption** **1.** $({v}_{c}(\eta ),{({v}_{s}(i;{\epsilon}_{i}))}_{i\in {N}_{s}})$ is a continuous random vector.

The continuity assumption is made to generate strict preferences. Later we assume that the ${\eta}_{i}$’s and ${\epsilon}_{i,j}$’s are drawn independently from certain parametric families of distributions.

Let us define

$${N}_{s}(\eta )=\{i\in {N}_{s}:i{\succ}_{{\pi}_{c}(\eta )}0\},$$

which is the set of students with whom the colleges prefer to match rather than to stay unmatched. The stability of matching $\mu $ requires that $\mu (i)=0$ for all $i\notin {N}_{s}(\eta )$; that is, students not preferred by any of the colleges (to the alternative of staying unmatched) are not matched with any college under $\mu $. If ${N}_{s}(\eta )=\varnothing $, there is no college-student pair that is matched under a stable matching $\mu $. Thus, it suffices to determine the match between colleges and students in ${N}_{s}(\eta )$ when ${N}_{s}(\eta )$ is not empty. Let us enumerate the student indexes in ${N}_{s}(\eta )$ as $\{S(1),\dots ,S({n}_{s}^{\prime})\}$ with ${n}_{s}^{\prime}=|{N}_{s}(\eta )|$, so that $S:\{1,\dots ,{n}_{s}^{\prime}\}\to {N}_{s}$ is a (random) map depending on $\eta $.

As we shall see now, the stability of matching yields a useful distributional characterization of the observed matching. For each $S(i)\in {N}_{s}(\eta )$ and for any set $A\subset {N}_{c}^{\prime}$, let

$${\rho}_{i}(A;{\mathit{\pi}}_{s})=\min_{j\in A}{\pi}_{S(i)}^{-1}(j),$$

where ${\pi}_{S(i)}$ denotes the preference ordering of student $S(i)$ in ${\mathit{\pi}}_{s}$. In other words, ${\rho}_{i}(A;{\mathit{\pi}}_{s})$ is the ranking of the college that is most preferred among colleges $j$ in $A$ by student $S(i)$. Hence

$${\pi}_{S(i)}({\rho}_{i}(A;{\mathit{\pi}}_{s}))$$

denotes the college in $A$ that is most preferred by student $S(i)$.

Suppose that the colleges’ homogeneous preference is given by ${\pi}_{c}(\eta )(1)=S(1)$, ${\pi}_{c}(\eta )(2)=S(2)$, …, ${\pi}_{c}(\eta )({n}_{s}^{\prime})=S({n}_{s}^{\prime})$. In other words, the colleges prefer student $S(i)$ to $S(j)$ in ${N}_{s}(\eta )$ if and only if $i<j$. Thus, let us simply write ${\pi}_{c}(\eta )$ as ${N}_{s}(\eta )$. In the stable matching with this ${\pi}_{c}(\eta )$, student $S(1)$ is ranked highest by all the colleges and chooses his most preferred college first. Then student $S(2)$ chooses his most preferred college among the colleges whose quota is not filled yet, and so on.

To formalize this matching mechanism, let us define for each student $S(i)\in {N}_{s}(\eta )$,

$$\tau ([i];{\mathit{\pi}}_{s})={\pi}_{S(i)}\left({\rho}_{i}({N}_{[i-1]}({\mathit{\pi}}_{s});{\mathit{\pi}}_{s})\right),$$

where ${N}_{[0]}({\mathit{\pi}}_{s})={N}_{c}^{\prime}$ and, for $i\ge 1$ (setting ${q}_{j}=\infty $ if $j\in {N}_{c}^{\prime}$ is 0),

$${N}_{[i]}({\mathit{\pi}}_{s})=\{j\in {N}_{c}^{\prime}:|\{k\le i:\tau ([k];{\mathit{\pi}}_{s})=j\}|<{q}_{j}\}.$$

The college $\tau ([1];{\mathit{\pi}}_{s})$ is one that student $S(1)$ prefers most among all the colleges in ${N}_{c}^{\prime}$. Then $\tau ([2];{\mathit{\pi}}_{s})$ is a college that student $S(2)$ prefers most among all the colleges whose quota is not yet filled once student $S(1)$ is assigned to college $\tau ([1];{\mathit{\pi}}_{s})$. Now $\tau ([3];{\mathit{\pi}}_{s})$ is a college that student $S(3)$ prefers most among all the colleges whose quota is not yet filled once students $S(1)$ and $S(2)$ are assigned to colleges $\tau ([1];{\mathit{\pi}}_{s})$ and $\tau ([2];{\mathit{\pi}}_{s})$ respectively.
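The sequential choice just described can be sketched in a few lines of Python. This is a minimal illustration under the assumption that students are already listed in the colleges’ common order $S(1),S(2),\dots $; the function name `serial_assignment` and the input format are hypothetical.

```python
def serial_assignment(student_prefs, quotas):
    """Assign students in order to their best college with a free seat.

    student_prefs: list whose k-th entry is the preference tuple of student
        S(k+1) over colleges and 0 (unmatched), most preferred first.
    quotas: dict college -> quota. Option 0 is treated as having unlimited capacity.
    Returns a list tau with tau[k] the college assigned to S(k+1).
    """
    remaining = dict(quotas)  # seats still open at each college
    tau = []
    for pi in student_prefs:
        # Best option among those still available: 0 or a college with a free seat.
        choice = next(j for j in pi if j == 0 or remaining[j] > 0)
        if choice != 0:
            remaining[choice] -= 1
        tau.append(choice)
    return tau

prefs = [(1, 2, 0), (1, 2, 0), (1, 0, 2), (2, 1, 0)]
print(serial_assignment(prefs, {1: 1, 2: 1}))  # [1, 2, 0, 0]
```

In this example, $S(3)$ finds college 1 full and prefers being unmatched to college 2, while $S(4)$ finds both colleges full.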

Note that the assignment of each student $S(i)$ to a college $\tau ([i];{\mathit{\pi}}_{s})$ is fully determined once the students’ preference profile ${\mathit{\pi}}_{s}$ is given. Thus, the matching of each student $S(i)$ to college $\tau ([i];{\mathit{\pi}}_{s})$ is the unique stable matching, and it is explicitly represented as a function of random preferences. In other words, for each $S(i)\in {N}_{s}(\eta )$,

$$\mu (S(i);\mathit{\pi})=\tau ([i];{\mathit{\pi}}_{s}).$$

Note also that $\tau ([i];{\mathit{\pi}}_{s})$ depends on the history of choices of students $S(1),S(2),\dots ,S(i)$, not merely on that of the last student $S(i)$.

It is straightforward to extend this result to the general case where ${\pi}_{c}(\eta )(i)\ne S(i)$ for some $i=1,\dots ,{n}_{s}^{\prime}$. We begin with student ${\pi}_{c}(\eta )(1)$ (i.e., the student ranked highest by the colleges according to ${\pi}_{c}(\eta )$) and let him choose his most preferred college among all the colleges. Then we move on to student ${\pi}_{c}(\eta )(2)$ (ranked second highest by the colleges), who chooses among the colleges whose quota is not yet filled, and so on.
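The general case can be sketched by processing students in the order of the colleges’ common ranking and stopping at the unmatched option 0, so that students ranked below 0 stay unmatched. The Python sketch below is illustrative; the function name `match_with_common_ranking` and the input format are assumptions.

```python
def match_with_common_ranking(college_ranking, student_prefs, quotas):
    """Compute the matching when all colleges share one ranking of students.

    college_ranking: tuple over student ids and 0, most preferred first.
    student_prefs: dict student id -> tuple over colleges and 0, best first.
    quotas: dict college -> quota.
    Returns a dict student id -> college, with 0 meaning unmatched.
    """
    remaining = dict(quotas)
    match = {i: 0 for i in student_prefs}  # everyone starts unmatched
    for i in college_ranking:
        if i == 0:
            break  # students ranked below 0 are unacceptable to the colleges
        # Student i picks his best option among 0 and colleges with free seats.
        choice = next(j for j in student_prefs[i] if j == 0 or remaining[j] > 0)
        if choice != 0:
            remaining[choice] -= 1
        match[i] = choice
    return match

ranking = (3, 1, 0, 2)  # student 3 is ranked first; student 2 is unacceptable
prefs = {1: (1, 2, 0), 2: (1, 2, 0), 3: (1, 2, 0)}
print(match_with_common_ranking(ranking, prefs, {1: 1, 2: 2}))  # {1: 2, 2: 0, 3: 1}
```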

To see its major implication for the distribution of the observed matching, note that

where ${\mathit{\pi}}_{s}(\epsilon ,\eta )$ is a profile of students’ preferences over colleges with the $S(i)$-th student’s preference over the colleges equal to

for each $S(i)\in {N}_{s}(\eta )$. The second equality in (4) follows because the matching mechanism is anonymous. In other words, the college $\mathbf{Y}(S(i))$ matched with student $S(i)$ should be the same college matched with the same student after relabeling the student $S(i)$ as $S({\pi}_{c}^{-1}(\eta )(S(i)))$. (Recall that ${\pi}_{c}^{-1}(\eta )(S(i))$ represents the ranking of student $S(i)$.) After this relabeling, the colleges’ preference ordering over the students becomes: $S(i){\succ}_{{\pi}_{c}}S(j)$ if and only if $i<j$. Then from (3), we obtain the following result.

**Theorem** **1.** Suppose that the matching mechanism μ is stable, and that Assumption 1 holds. Then for each $S(i)\in {N}_{s}(\eta )$,

$$\mathbf{Y}(S(i))=\tau ([({\pi}_{c}^{-1}(\eta )\circ S)(i)];{\mathit{\pi}}_{s}(\epsilon ,\eta )),$$

where $[({\pi}_{c}^{-1}(\eta )\circ S)(i)]=(1,2,\dots ,({\pi}_{c}^{-1}(\eta )\circ S)(i))$.

The expression in the theorem provides an explicit reduced form for the matching $\mathbf{Y}$. It shows how the randomness of the observed matching $\mathbf{Y}$ depends on the random preferences, and that each single match $\mathbf{Y}(S(i))$ of student $S(i)$ is a complex function of the preferences of all the students and colleges. While incorporating this interdependence is crucial to properly take into account the inherent endogeneity of the observed matching, it hampers the use of standard asymptotic theory. Thus, this paper pursues an approach of finite sample inference. Indeed, Theorem 1 shows that once we parametrize the distribution of ${\epsilon}_{i,j}$ and ${\eta}_{i}$, we can obtain the full joint distribution of the observed matching up to a parametrization. By comparing the sorting pattern of observed characteristics implied by this predicted distribution with the observed sorting pattern, we seek to perform inference on the structural parameters in finite samples.