Multiple Discrete Endogenous Variables in Weakly-Separable Triangular Models

Jun, Sung Jae; Pinkse, Joris; Xu, Haiqing; Yıldız, Neşe

doi:10.3390/econometrics4010007

Open AccessArticle

Multiple Discrete Endogenous Variables in Weakly-Separable Triangular Models

by

Sung Jae Jun

¹,

Joris Pinkse

¹,

Haiqing Xu

² and

Neşe Yıldız

^3,*

¹

CAPCP and Department of Economics, Pennsylvania State University, 608 Kern Graduate Building, University Park, PA 16802, USA

²

Department of Economics, University of Texas at Austin, 78712 Austin, TX, USA

³

Department of Economics, University of Rochester, 222 Harkness Hall, Rochester, NY 14627, USA

^*

Author to whom correspondence should be addressed.

Econometrics 2016, 4(1), 7; https://doi.org/10.3390/econometrics4010007

Submission received: 7 September 2015 / Revised: 14 December 2015 / Accepted: 8 January 2016 / Published: 4 February 2016

(This article belongs to the Special Issue Discrete Choice Modeling)

Download

Browse Figures

Versions Notes

Abstract

:

We consider a model in which an outcome depends on two discrete treatment variables, where one treatment is given before the other. We formulate a three-equation triangular system with weak separability conditions. Without assuming assignment is random, we establish the identification of an average structural function using two-step matching. We also consider decomposing the effect of the first treatment into direct and indirect effects, which are shown to be identified by the proposed methodology. We allow for both of the treatment variables to be non-binary and do not appeal to an identification-at-infinity argument.

Keywords:

nonparametric identification; discrete endogenous regressors; triangular models

JEL:

C01; C14; C31

1. Introduction

This paper deals with nonparametric identification in a three-equation nonparametric model with discrete endogenous regressors. We provide conditions under which an average structural function (ASF) (e.g., [1]) is point identified and discuss how different treatment effects can be identified using our methods. Like [2,3], we use a Dynkin system approach, which is based on the idea of matching; the idea of matching was also used by [3,4] inter alia, albeit that our notion of matching is different from the commonly-used matching method in the treatment effect literature (e.g., [5]). The latter uses the matching idea to control for observed covariates, while our method matches on identified/estimable sets, i.e., elements of Dynkin systems, as will become apparent below.

To motivate the parameter of interest in this paper, consider the example of assessing the dynamic evolution of crime (e.g., [6]). The number of crimes, say murders, at time t is affected both by the number of crimes prior to time t and by the level of police activity (measured by, e.g., the number of police patrols) at time t. This example has a special triangular structure, because the number of police patrols at time t is in part a response to the number of crimes at time

t - 1

. The number of crimes is discrete, as is the number of police patrols. There are several potential endogeneity problems in this example, e.g., simultaneity between crimes and police activity at time t and unobserved heterogeneity due to changes in the neighborhood and its surroundings. We focus on the identification of the ASF, which in this example, corresponds to the mean number of crimes at time t that would occur if both the number of crimes at time

t - 1

and the number of police patrols were exogenously fixed. There are other objects of potential interest that can be identified with our identification strategy. For instance, one could instead fix the number of crimes at time

t - 1

, but allow the number of police patrols to respond to it endogenously. We can thus decompose the effect of the changes of the past number of crimes into a direct effect and an indirect effect: a high level of crime at time

t - 1

can create an environment in which crime thrives at time t (e.g., because criminals build up local knowledge, set up networks), but it also leads to an increased police presence, which reduces crimes at time t. We also discuss such decompositions in this paper.

The model that we study is similar to that in [3,4] and others in that we make and exploit a weak separability assumption. However, [4] specifically excludes the possibility of non-binary categorical endogenous regressors, imposes restrictive support conditions on the covariates and only deals with the two-equation case. The non-binary categorical regressor case is not discussed in (the published version of) [3], which further does not deal with the present, more complicated, three-equation model featuring two discrete endogenous regressors. In this paper, we show that the methodology developed in [3] can be used to study non-binary treatments with a double layer of endogeneity. There are other papers that have a three-equation model and/or allow for non-binary regressors (e.g., [7,8,9]), but the model or the object of interest is generally different.

There are many examples in which (a (semi)parametric version of) our structure has been used. We mention only a few. The work in [10] studies the effects of smoking on birth weight through the mechanism of gestation time. The work in [11] analyzes the effects of school type and class size on earnings and educational attainment. The work in [12] has a simpler dependence structure than the one used here. The work in [13] investigates labor market returns to community college attendance and four-year college education. The work in [14] considers the multi-stage nature of the adoption process of on-line banking services, where interruptions in the initial sign-up stage and in the later regular use stage are the treatments of interest. We further note that the double hurdle model of [15], which is used in much empirical work, is a special case of our model, albeit that the identification methods developed here are of limited use in Cragg’s specification.

The focus here is on point identification. There are several papers (e.g., [16,17,18]) that develop bounds on treatment effects in models that are similar to, but simpler than, the one in this paper using weaker monotonicity assumptions than are imposed here. As shown in [2], the Dynkin system approach can be used to obtain sharp bounds in an environment in which there is only partial identification. We do not pursue this possibility in the current paper.

Identification of parameters of interest in our paper proceeds in two steps. In the first step, we use the variation in the instrument

z

for the treatment

d

to infer what variation in the instrument for the intermediate endogenous variable

s

would compensate exactly for variation in

d

. Using this information, we can undo the effect of changing

d

on

s

. Provided that the instruments for

d

and

s

have sufficient variation, we can identify the structural function for

s

this way. Using this first stage information along with variation in instruments for

d

and

s

, we infer what variation in the exogenous regressors in the outcome equation would compensate exactly for variation in both treatment

d

and intermediate endogenous variable

s

. Our paper differs from both [2,3,4] in that we have to use another level of matching in order to undo the effect of both

d

and

s

on the outcome

y

. A critical component of our strategy is the existence of instruments for the endogenous regressors

d

and

s

and sufficient variation in the exogenous regressors in the outcome equation to allow us to compensate for variation in the endogenous regressors directly.

The Dynkin system approach is a natural scheme that allows one to collect and aggregate information contained in the data in a natural and thorough fashion through a recursion scheme1. Each combination of observables implies that the unobservable error terms belong to certain sets. From these sets, one can infer additional information through various operations on these sets. In this paper, we use a version of the Dynkin system approach, first used in [3], which exploits matching in addition to the union and difference operators used in [2]. Matching has been used frequently in the past. For instance, [20] used it to avoid support conditions in estimating weakly-separable nonparametric regression functions. The way we use matching in this paper is closer to [4], albeit that our procedure, as already mentioned, can be applied more generally.

Although the fact that the Dynkin system approach requires only weak covariate support restrictions is an attractive feature, this paper will instead focus on extending the use of the Dynkin system to more complicated situations, since the support restrictions issue was discussed at length in [3], albeit for the two-equation binary endogenous regressor case. Further, the Dynkin system mechanism can be used to study effects other than average partial effects, such as marginal treatment effects (e.g., [21]), but here, we focus on average partial effects.

The remainder of the paper is organized as follows. In Section 2, we lay out our model and discuss the objects we want to identify and the rationale for our desire to do so. Section 3 provides a rough description of the basic ideas underlying our identification approach. These ideas are formalized and illustrated using more complete examples in Section 4 and Section 5. Finally, Section 7 provides a brief sketch of how the identification methods proposed here could be implemented.

2. Model

Imposing weak separability in multiple places, we consider the model

\{\begin{matrix} y = g (α (x, s, d), ϵ), \\ s = \sum_{j = 1}^{η_{s}} {v > m_{j} (x, w, d)}, \\ d = \sum_{j = 1}^{η_{d}} {u > p_{j} (x, w, z)}, \end{matrix}

(1)

where

η_{s}, η_{d} \geq 1

and

g, m_{1}, \dots, m_{η_{s}}, p_{1}, \dots, p_{η_{d}}

are unknown functions. We assume that

η_{s}, η_{d}

are known and that we observe

y, s, d, x, w, z

. The unobservables

u

and

v

are scalar random variables; the dimension of

ϵ

is not restricted.

One feature in (1) is that

w

and

z

are excluded from the first and second equations, respectively. Our identification arguments will require that

w

and

z

be able to vary the

m_{j}

and

p_{j}

functions, respectively, but the fact that

x

appears in the

m_{j}

functions and

x, w

do in the

p_{j}

functions will be immaterial. Therefore, we will simply consider

\{\begin{matrix} y = g (α (x, s, d), ϵ), \\ s = \sum_{j = 1}^{η_{s}} {v > m_{j} (w, d)}, \\ d = \sum_{j = 1}^{η_{d}} {u > p_{j} (z)}, \end{matrix}

(2)

for the sake of illustrational clarity. We now impose that

p_{0} (z) = m_{0} (w, d) = 0

,

p_{j} (z) < p_{j + 1} (z)

,

m_{j} (w, d) < m_{j + 1} (w, d)

and

p_{η_{d} + 1} (z) = m_{η_{s} + 1} (w, d) = 1

. This is without loss of generality in view of Assumption B below. The setup in (2) requires that the exogenous covariates

x, w, z

appear only once in each equation2. It is straightforward to generalize our identification strategy to Model (1) at the expense of exposition. However, doing so would introduce additional notational complexity and requires more variations in

w

and

z

.

In the crime example discussed in the Introduction,

y

would be the number of crimes this period,

s

the number of police patrols and

d

the number of crimes in the previous period. Then,

x, z

represent observable exogenous neighborhood characteristics this period and last period, respectively. Finally,

w

can contain variables that reflect the resources that the police can employ to combat crime, with the implicit assumption that such resources cannot be enhanced in the short term and can hence be treated as exogenous.

We now make several model assumptions. Let

U = (0, 1]

.

Assumption A.

(u, v, ϵ)

is independent of

(w, z, x)

.

Assumption B. The distribution of

(u, v)

is absolutely continuous with respect to the Lebesgue measure μ with support

S_{u v} = U^{2}

, and

u, v

have marginal uniform distributions on

U

.

Assumption C.

E [g (α, ϵ) | u = u, v = v]

is for all

u, v \in U

strictly monotonic in α.

Assumption A is strong, but can be relaxed to independence conditional on covariates, i.e., either covariates in addition to

w, z, x

or elements of vector-valued

w, z, x

. Moreover, if g is additively separable in

ϵ

, then Assumption A can be further weakened as explained below.

The second half of Assumption B constitutes a normalization. The first part is restrictive, but is difficult to avoid. Please note, however, that

u

and

v

are allowed to be dependent and that the support of

(u, v)

given

ϵ

need not be

U^{2}

.

Monotonicity is a common assumption in the nonparametric identification literature, but unlike, e.g., [22,23,24], Assumption C does not require monotonicity in the error term of the structural function g itself, but instead, it requires monotonicity of the (conditional) expectation3; a similar assumption can be found in [4]. For instance, an indicator function, such as

g (α, ϵ) = {ϵ > α}

, is allowed, as long as

ϵ

is continuously distributed, given

u

and

v

. However, the single index feature of the structural function is an essential feature of Assumption C. For the use of the Dynkin system idea to identify a structural function under a stronger form of monotonicity, see [2].

Both

s

and

d

are general ordered response variables, which are allowed to be endogenous. Instead of having one variable with

(1 + η_{s}) (1 + η_{d})

support points, we have two treatment variables here4 that depend on two distinct error terms,

u

and

v

. As a result, if we tried to combine

s

and

d

into one variable with

(1 + η_{s}) (1 + η_{d})

support points, the resulting random variable would not necessarily have the threshold crossing form that

s

and

d

have in our paper. This is because to have a treatment variable that has a threshold crossing form,

u

and

v

would have to be represented by a single unobservable, whose values could be ordered linearly. However, there does not generally exist such a one-to-one mapping. Without having a discrete treatment variable that has this threshold crossing form, the identification method given in [4] would not work. Since [3] also consider a single treatment variable with a threshold crossing form, the method in [3] would not work either. As a result, the model studied in this paper is not covered by the models studied in [3,4]. It is also more general than the double hurdle model of [15], Equations (5) and (6), albeit that our matching strategy for identification is of limited usefulness there5.

When discussing our assumptions, we mentioned that Assumption A could be weakened further if g is additively separable in

ϵ

. To be more specific, let

x = {(x_{1}, x_{2}^{⊤})}^{⊤}, w = {(w_{1}, x_{2}^{⊤})}^{⊤}

and

z = {(z_{1}, x_{2}^{⊤})}^{⊤}

, where

x_{1}, w_{1}, z_{1}

are scalar-valued random variables. Suppose that the outcome equation is given by

y = h (x_{2}, s, d) + x_{1} β + ϵ,

(3)

which is a form commonly applied by researchers. Then, Assumption A can be further weakened in the following way:

Assumption D. (i)

ϵ = x_{1} γ + \tilde{ϵ}

, (ii)

E (\tilde{ϵ} | u, v, w_{1}, z_{1}, x_{1}, x_{2}) = E (\tilde{ϵ} | u, v, x_{2})

, (iii)

(u, v)

is independent of

(x_{1}, w_{1}, z_{1})

conditional on

x_{2}

and (iv)

β + γ \neq 0

.

Under Assumption D, the outcome Equation (3) can be written as

y = h (x_{2}, s, d) + x_{1} (β + γ) + \tilde{ϵ},

(4)

and

β + γ

can be identified by running an OLS regression of

y - E (y | x_{2}, d, s, w_{1}, z_{1})

on

x_{1} - E (x_{1} | x_{2}, d, s, w_{1}, z_{1})

, since

E (y | x_{2}, d, s, w_{1}, z_{1}) = h (x_{2}, s, d) + E (x_{1} | x_{2}, d, s, w_{1}, z_{1}) (β + γ) + ρ (x_{2}, d, s, w_{1}, z_{1}) + e,

where

ρ (x_{2}, d, s, w_{1}, z_{1}) = E (\tilde{ϵ} | x_{2}, d, s, w_{1}, z_{1})

. Then,

x_{1} (β + γ)

can be used to compensate for the effects of varying

d

and

s

in the outcome equation as long as

β + γ \neq 0

.

To see why this weakening of Assumption A might be particularly useful, suppose that

y

equals adult wages of an individual, treatment

d

is whether a student is assigned to a small class or not and

s

is an indicator for college attendance. This example is also considered in [25]. The instrument for

d

is the educational intervention in the Project STAR experiment, in which early graders were randomized into small classes, and the instrument for

s

could be the variation in tuition fees or distance to college; see, for instance, [13,26]. We still need a variable in the wage equation that is exogenous and that does not enter the other two equations. Under Assumption D, the exogeneity condition such a variable has to satisfy is considerably weaker than the one embodied in Assumption A. In particular, the individual’s age when adult wage is measured might be a reasonable candidate as the required

x_{2}

.

In contrast to the existing literature, including [3,4], which mainly focuses on the effects of one endogenous variable while fixing other variables, our setting features multiple endogenous treatments with a triangular structure, which allows us to consider various causal parameters, such as direct and indirect (average) effects of the treatment variable

d

. Below, we discuss such parameters and methods of identifying them, albeit that our main focus is on identifying the average structural function.

We now formally state the average structural function we analyze. Let

y_{s d} = g (α (x, s, d), ϵ)

. Thus,

y_{s d} = y

if

(s, d) = (s, d)

, but if

(s, d) \neq (s, d)

, then

y_{s d}

is the value

y

would have taken if the same individual had

s = s, d = d

. Therefore,

y_{s d}

is a typical counterfactual outcome variable, but with two indices instead of the usual one. The focus in this paper will be on the identification of

ψ (x^{*}, s^{*}, d^{*}) = E (y_{s^{*} d^{*}} | x = x^{*}) = E g (α (x^{*}, s^{*}, d^{*}), ϵ),

(5)

where

x^{*}, s^{*}, d^{*}

are chosen by the researcher. We obtain identification of

m_{s} (w, d)

as a byproduct. Please note that

ψ (x^{*}, s^{*}, d^{*})

is the ASF conditional on

x = x^{*}

, when the treatments are exogenously fixed at

s^{*}

and

d^{*}

. For instance,

ψ (1, 1, 1)

could be the counterfactual mean earnings of a male worker (

x = 1

) if he had both a college degree (

d = 1

) and received on-the-job training (

s = 1

), or it could be the counterfactual mean birth weight for an infant if her mother had a normal gestation length (

s = 1

) and smoked (

d = 1

). In the crime example,

ψ (1, 1, 1)

is the mean number of crimes at time t if current neighborhood characteristics

x

are one and with both police patrols at time t and crime at

t - 1

for exogenous reasons.

The function ψ can be used to obtain many, but not all, causal effects of interest. Recall the dual binary treatment example involving college education and on-the-job training. Consider exogenously changing

d

and fixing

s

at a specified value

s^{*}

. Then, one can identify the ceteris paribus effect of a change in college education status on earnings for a male worker with job training, i.e.,

ψ (1, s^{*}, 1) - ψ (1, s^{*}, 0)

. We call this an average partial treatment effect. Alternatively, we can define average joint treatment effects by looking at the causal effects on earnings for male workers of exogenously changing both college education and job training status, i.e.,

ψ (1, 1, 1) - ψ (1, 0, 0)

. One can aggregate up such effects across sexes, or indeed across job training statuses, e.g.,

E [ψ (1, \tilde{s}, 1) - ψ (1, \tilde{s}, 0)]

, where

\tilde{s}

is drawn from a suitable job training status distribution.

It should also be noted that our results can be further used to conduct a decomposition of direct and indirect effects for policy analysis. For instance, if the policy maker can only influence college education decisions, but not job training decisions directly, then an object of interest would be the effect of exogenously changing

d

on a male worker’s mean earnings leaving

s

to adjust according to the preferences of the worker and his employer, i.e., the parameter

\begin{matrix} E g (α (x, s_{1} (w), 1), ϵ) - E g (α (x, s_{0} (w), 0), ϵ) \\ = [E g (α (x, s_{1} (w), 1), ϵ) - E g (α (x, s_{1} (w), 0), ϵ)] \\ + [E g (α (x, s_{1} (w), 0, ϵ) - E g (α (x, s_{0} (w), 0), ϵ)], \end{matrix}

(6)

where

s_{d} (w)

is the counterfactual value of

s

when

d

is exogenously fixed at d given

w = w

6. We call the left-hand side in (6) an average total treatment effect, which is decomposed into a direct effect and an indirect effect on the right-hand side7. Although the parameters in (6) are not represented by ψ, the methods we develop to identify ψ can be used to identify them, as we show in Section 6.

The fact that there are several causal parameters of potential interest arises both because there are multiple endogenous treatment variables and because of the triangular nature of the model. However, we do not believe that one parameter is generally more important than others, but the purpose and context of the policy question of interest should be taken into account. As explained in Section 6, identification of causal parameters, like (6), can be established by the matching method developed in this paper. Therefore, we focus on the identification of ψ (and

m_{s}

) in the main text to highlight the idea of matching, while we show in Section 6 that the identification of (6) can be obtained by the same methods.

3. Description

We now provide a broad and rough description of our identification strategy. We combine the idea of matching to that of set operations. Matching was also used in [3,4], inter alia. Indeed, our methodology shares some of the intuition with Jun, Pinkse, and Xu (2012) [3]: this will become clear as we proceed. However, due to the triangular structure, the procedure used in this paper is more complicated than that in [3]. The methodology in [3] covers the specification in [4] as a special case.

There are several unknown functions in our model: the

p_{j}

’s,

m_{j}

’s and α are important to identify ψ. The

p_{j}

functions are identified directly from the data since

p_{j} (z)

is simply the probability that the number of crimes last period was no more than

j - 1

given that

z = z

. Identification of the

m_{j}

’s is more involved, but is simpler than that of ψ. Therefore, we start with the

m_{j}

functions.

Our method of identifying the

m_{j}

’s is related to the identification approaches in [3,4]. Indeed, if

d

is binary and the joint support

S_{w z}

of

(w, z)

is sufficiently rich, then our approach has the same intuition as that in [4]. For instance, we also ask what changes in police resources will offset the changes in police activity induced by changes in the number of past crimes. However, the method of [4] only applies to the case in which

d

is binary. Below, we explain how matching is convenient when

d

is binary and how our Dynkin system can be used to obtain identification if

d

is not necessarily binary.

We start with the simple case, i.e., binary

d

. Consider the problem of identifying

m_{1} (w^{*}, 0)

. Note that for any value of z,

\begin{matrix} m_{1} (w^{*}, 0) = P (v \leq & m_{1} (w^{*}, 0)) \\ = \underset{= P (s = 0, d = 0 | w = w^{*}, z = z)}{\underset{︸}{P (v \leq m_{1} (w^{*}, 0), 0 < u \leq p_{1} (z))}} + P (v \leq m_{1} (w^{*}, 0), p_{1} (z) < u \leq 1) . \end{matrix}

(7)

Note here that the inequality

v \leq m_{1} (w^{*}, 0)

describes the event in which the potential status of

s

given

w = w^{*}

when

d

is fixed at zero is equal to zero. There are two possibilities: either

d

is actually equal to zero (the first right-hand side term in (7)) or it is not equal to zero (the second right-hand side term in (7)). The first right-hand side term in (7) can be inferred directly from the distribution of observables and is hence identified. This is where matching is useful. If we can find

\tilde{w}

such that

m_{1} (w^{*}, 0) = m_{1} (\tilde{w}, 1)

, then

v \leq m_{1} (w^{*}, 0)

is the same event as

v \leq m_{1} (\tilde{w}, 1)

. Therefore, the second term on the right-hand side of (7) equals

P (v \leq m_{1} (\tilde{w}, 1), p_{1} (z) < u \leq 1) = P (s = 0, d = 1 | w = \tilde{w}, z = z) .

The question is how to find such

\tilde{w}

. The work in [4] proposes finding

\tilde{w}

for which the left-hand sides (and therefore, the right-hand sides) in the following equations are equal.

\begin{matrix} P (s = 0, d = 0 | w = w^{*}, z = \tilde{z}) - P (s = 0, d & = 0 | w = w^{*}, z = z) \\ = P (v \leq m_{1} (w^{*}, 0), p_{1} (z) < u \leq p_{1} (\tilde{z})), \end{matrix}

(8)

\begin{matrix} P (s = 0, d = 1 | w = \tilde{w}, z = z) - P (s = 0, d & = 1 | w = \tilde{w}, z = \tilde{z}) \\ = P (v \leq m_{1} (\tilde{w}, 1), p_{1} (z) < u \leq p_{1} (\tilde{z})) . \end{matrix}

(9)

The equalities in (8) and (9) rely on the threshold structure of

d

(which is binary for now). There are a few issues here. First,

(w^{*}, \tilde{z}), (w^{*}, z), (\tilde{w}, z)

and

(\tilde{w}, \tilde{z})

must all be in the joint support

S_{w z}

. Second, this procedure only works if

d

is binary.

Our Dynkin system approach is a systematic way of combining multiple such matches via set operations. For instance, when the support

S_{w z}

is limited, the Dynkin system approach provides chaining arguments: see [3] for details. When

d

is not binary, it provides an extra layer of matching. For instance, suppose that

d

can take three values: 0, 1 or 2. Then, like in (7), for any z,

\begin{matrix} m_{1} (w^{*}, 0) = \underset{= P (s = 0, d = 0 | w = w^{*}, z = z)}{\underset{︸}{P (v \leq m_{1} (w^{*}, 0), 0 < u \leq p_{1} (z))}} \\ + P (v \leq m_{1} (w^{*}, 0), p_{1} (z) < u \leq p_{2} (z)) + P (v \leq m_{1} (w^{*}, 0), p_{2} (z) < u \leq 1) . \end{matrix}

(10)

The intuitive interpretation of the event

v \leq m_{1} (w^{*}, 0)

is the same as before: the potential outcome of the

s

variable when

d

is fixed at zero is equal to zero. Therefore, the first term on the right-hand side is identified because it is equal to a conditional probability on observables. In the binary case, (7), we had one unknown right-hand side term; now, there are two. The second and third terms in (10) correspond to the cases where the realized value of

d

equals one and two, respectively. Therefore, we need to find

\tilde{w}, \bar{w}

, such that

m_{1} (w^{*}, 0) = m_{1} (\tilde{w}, 1) = m_{1} (\bar{w}, 2)

. The method of [4] does not provide a solution: (8) is still valid, but (9) is not.

Our solution is to use an extra layer of matching in the

p_{j}

’s. To see how this works, suppose that the probability of having no more than one incidence of crime in the past given

z = \tilde{z}

is matched to the probability of having no crime at all in the past given

z = z

, i.e.,

P (d = 0 | z = \tilde{z}) = p_{1} (\tilde{z}) = P (d \leq 1 | z = z) = p_{2} (z),

(11)

Then, we have

P (s = 0, d = 1 | w = \tilde{w}, z = z) = P (v \leq m_{1} (\tilde{w}, 1), p_{1} (z) < u \leq p_{1} (\tilde{z})),

(12)

which can be used in place of (9). In other words,

m_{1} (w^{*}, 0) = m_{1} (\tilde{w}, 1)

if and only if the left-hand side in (12) equals the left-hand side in (8). The Dynkin system provides a general and systematic method of doing this.

Note that it is insufficient for the (conditional) probability of no crime in the past to vary with z. It now matters how much the conditional probabilities of crime vary with

z

; see (11). The above examples only a few features of the general Dynkin system approach. For instance, if the joint support

S_{w z}

of

(w, z)

is limited, then identification can be obtained via the Dynkin system approach, but it will be more complicated than the procedure described above.

Identification of

ψ (x^{*}, 0, 0)

is substantially more complicated (even when

d

and

s

are both binary), but the basic idea is the same. We want to match the α function at different argument values, for which we need to combine matching

m_{j}

’s and matching of

p_{j}

’s. We now explain how this can be done.

To get a whiff of the basic premise, we focus on the simplest possible meaningful case, i.e., binary treatments

d

and

s

: our results in the remainder of the paper are general. Again, we will exploit only a few features of the general methodology. In particular, in this example, we assume that the joint support

S_{u v}

of

(u, v)

is simply the product of the marginal supports, i.e.,

S_{u v} = U^{2}

, which is unnecessary, as will become apparent later in the paper.

Define

A_{d s} (w, z, j) = (p_{d} (z), p_{d + 1} (z)] \times (m_{s} (w, d), m_{s + 1} (w, j)] .

(13)

Further, define

κ (A, a) = E [g (a, ϵ) {(u, v) \in A}] .

(14)

To understand the idea behind (13) and (14), please note that

(u, v) \in A_{d s} (w, z, j)

is the event that

d

is equal to d, and the potential status of

s

when

d

is fixed at j is equal to s, conditional on

z = z, w = w

. Therefore, it involves the counterfactual status of the

s

variable. There are combinations of

(A, a)

for which

κ (A, a)

can be recovered directly from the joint distribution of observables, namely for given

w = w, z = z

,

(u, v) \in A_{d s} (w, z, d) ⟺ d = d and s = s .

Therefore, if

δ (x^{*}, s, d, w, z) = E (y (s = s) (d = d) | x = x, w = w, z = z)

(15)

then

κ (A_{00} (w, z, 0), α (x^{*}, 0, 0)) = δ (x^{*}, 0, 0, w, z) .

(16)

Equality (16) plays the same role as the first right-hand side term in (7) and (10). Indeed, note that

ψ (x^{*}, 0, 0) = κ (U^{2}, α (x^{*}, 0, 0))

can be decomposed as follows: for any

w, z

,

\begin{matrix} κ (U^{2}, α (x^{*}, 0, 0)) = \underset{= δ (x^{*}, 0, 0, w, z)}{\underset{︸}{κ (A_{00} (w, z, 0), α (x^{*}, 0, 0))}} + κ (A_{01} (w, z, 0), α (x^{*}, 0, 0)) \\ + κ (A_{10} (w, z, 0), α (x^{*}, 0, 0)) + κ (A_{11} (w_{1}, z, 0), α (x^{*}, 0, 0)), \end{matrix}

(17)

which is more complicated than, but similar to (7) and (10). An important complication is that, for instance, finding a value

\tilde{x}

, such that

α (x^{*}, 0, 0) = α (\tilde{x}, 0, 1)

, is insufficient to identify the second term on the right-hand side in (17) because

A_{01} (w, z, 0)

itself also involves a counterfactual.

Resolving this complication requires that we pair this approach with the matching procedure for the

m_{j}

functions, which we have explained above. For example, matching

m_{1} (w, 0)

to

m_{1} (\tilde{w}, 1)

ensures that

A_{01} (w, z, 0) = A_{01} (\tilde{w}, z, 1)

, which implies that matching

α (x^{*}, 0, 0) = α (\tilde{x}, 0, 1)

will indeed lead to identification of the second right-hand side term in (17). In the following example, we provide a graphical illustration to explain how to find such

\tilde{x}

.

Example 1. Consider Figure 1 and suppose for now that the

m_{s}

functions are identified and that the joint support of the covariates equals the product of their marginal supports. Let

x_{1}, x_{2}, w_{1}, w_{2}, w_{3}, z_{1}, z_{2}

be values in their respective supports (i.e.,

S_{X}, S_{W}, S_{Z}

), such that

m_{1} (w_{1}, 0) = m_{1} (w_{2}, 1) < m_{1} (w_{3}, 1)

and

0 < p_{1} (z_{2}) < p_{1} (z_{1}) < 1

, as is depicted in Figure 1. Then, the following quantities are identified directly from the data.

\{\begin{matrix} δ (x^{*}, 0, 0, w_{1}, z_{2}) & = κ (green, α (x^{*}, 0, 0)), \\ δ (x^{*}, 0, 0, w_{1}, z_{1}) & = κ (green+yellow, α (x^{*}, 0, 0)), \\ δ (x_{1}, 0, 1, w_{2}, z_{1}) & = κ (blue, α (x_{1}, 0, 1)), \\ δ (x_{1}, 0, 1, w_{2}, z_{2}) & = κ (blue+yellow, α (x_{1}, 0, 1)), \\ δ (x_{1}, 0, 1, w_{3}, z_{1}) & = κ (blue+purple, α (x_{1}, 0, 1)), \\ δ (x_{2}, 1, 1, w_{3}, z_{1}) & = κ (red, α (x_{2}, 1, 1)), \\ δ (x_{2}, 1, 1, w_{2}, z_{1}) & = κ (red+purple, α (x_{2}, 1, 1)), \\ δ (x_{3}, 1, 0, w_{1}, z_{1}) & = κ (blank, α (x_{3}, 1, 0)) . \end{matrix}

(18)

Subtracting the first and third lines in (18) from the second and fourth lines, respectively, yields

κ (yellow, α (x^{*}, 0, 0))

and

κ (yellow, α (x_{1}, 0, 1))

, which are equal if and only if

α (x^{*}, 0, 0) = α (x_{1}, 0, 1)

. Likewise, subtracting the third and sixth lines in (18) from the fifth and seventh lines allows one to verify whether

α (x_{1}, 0, 1) = α (x_{2}, 1, 1)

. We can verify that

α (x_{1}, 1, 1) = α (x_{2}, 1, 0)

analogously.

Figure 1. Simple matching procedures.

Once values

x_{1}, x_{2}, x_{3}

are found, such that

α (x^{*}, 0, 0) = α (x_{1}, 0, 1) = α (x_{2}, 1, 1) = α (x_{3}, 1, 0)

,

κ (S_{u v}, α (x^{*}, 0, 0))

can be computed as (for instance) the sum of

δ (x^{*}, 0, 0, w_{1}, z_{1})

,

δ (x_{1}, 0, 1, w_{2}, z_{1})

,

δ (x_{2}, 1, 1, w_{2}, z_{1})

and

δ (x_{3}, 1, 0, w_{1}, z_{1})

. ☐

Finally, we note that there exists an alternative, but not particularly attractive, possibility: identification-at-infinity. From (15), it should be apparent that if we can find a sequence

{(z_{n}, w_{n})}

, such that

lim_{n \to \infty} p_{1} (z_{n}) = 1, lim_{n \to \infty} m_{1} (w_{n}, 0) = 1,

then identification of

ψ (x^{*}, 0, 0)

obtains, since

lim_{n \to \infty} E [y (s = 0) (d = 0) | x = x^{*}, w = w_{n}, z = z_{n}] = E [g (α (x^{*}, 0, 0), ϵ\} {(u, v) \in U^{2})] = ψ (x^{*}, 0, 0) .

However, such an identification-at-infinity argument is undesirable since it generally makes inefficient use of the data [27] and imposes extreme support restrictions. Therefore, we do not consider this possibility.

In the remainder of this paper, more general versions of the procedures sketched above are formally expressed in terms of a Dynkin system, and their power is illustrated using some concrete examples.

4. Identification of m

We now establish the identification of

m_{s^{*}} (w^{*}, d^{*})

formally. Define8

θ (V, m) = P (u \in V, v \leq m), V \subset U, m \in U .

Further, let

S_{z} (w)

be the support of

z

conditional on

w = w

and define

V (d, w) = \{(p_{d} (z), p_{d + 1} (z)] : z \in S_{z} (w)\}, d = 0, \dots, η_{d} .

(19)

Then,

θ (V, m_{s} (w, d))

is identified when

V \in V (d, w)

because

θ ((p_{d} (z), p_{d + 1} (z)], m_{s} (w, d)) = P (s < s, d = d | w = w, z = z) .

(20)

We now show that

θ (V, m_{s} (w, d))

is identified for a much broader class of sets than

V (d, w)

.

Definition 1.

D^{*} (d, s, w)

is the collection

D_{\infty}^{*} (d, s, w)

in the following iterative scheme. Let

D_{0}^{*} (d, s, w) = V (d, w)

. Then, for all

t \geq 0

,

D_{t + 1}^{*} (d, s, w)

consists of all sets

A^{*}

, such that at least one of the following conditions is satisfied, where μ denotes the standard Lebesgue measure over

U

.

(i): $A^{*} \in D_{t}^{*} (d, s, w)$ ;
(ii): $\exists A_{1}, A_{2} \in D_{t}^{*} (d, s, w) : A_{1} \subset A_{2}, μ (A_{2} - A_{1}) > 0, A^{*} = A_{2} - A_{1}$ ;
(iii): $\exists A_{1}, A_{2} \in D_{t}^{*} (d, s, w) : A_{1} \cap A_{2} = \emptyset, μ (A_{1} \cup A_{2}) > 0, A^{*} = A_{1} \cup A_{2}$ ;
(iv): $\exists (\bar{d}, \bar{s}, \bar{w}) : m_{s} (d, w) = m_{\bar{s}} (\bar{d}, \bar{w}), D_{t}^{*} (d, s, w) \cap D_{t}^{*} (\bar{d}, \bar{s}, \bar{w}) \neq \emptyset, A^{*} \in D_{t}^{*} (\bar{d}, \bar{s}, \bar{w})$ . ☐

The conditions in Definition 1 are similar to those in [3]. Note that

D^{*} (d, s, w)

depends on s because of Condition (iv). The importance of Condition (iv) will become apparent in Lemma 1 below. The main difference between [3] and what we have here for the identification of m is that the collection in Definition 1 now also has an argument s: identification of ψ is substantially more involved than that.

Note that

{D_{t}^{*} (d, s, w) : t = 0, 1, \dots}

is an increasing sequence of collections, such that

D^{*} (d, s, w)

is the infinite union of

D_{t}^{*} (d, s, w)

’s.9 Note further that

D^{*} (d, s, w)

is indexed by

s, w

, as well as d. If

S_{z} (w)

is the same for all w values, then the argument pursued in this section is simpler, but such support restrictions are undesirable, because they exclude the possibility that

w, z

have elements in common, and they also preclude the situation in which certain combinations of

(w, z)

values cannot occur.

All elements of

D^{*}

are defined in terms of (combinations of) the unknown

p_{d}

and

m_{s}

functions. Hence, each element can be thought of as an unknown parameter. In Lemma 1, we show that all elements in

D^{*}

are identified. Subsequently, we obtain a condition that is sufficient for identification of

m_{s^{*}} (w^{*}, d^{*})

.

Lemma 1. Suppose that Assumptions A and B are satisfied.

(i): For all $(d, s, w) \in S_{d s w}$ , every $V \in D^{*} (d, s, w)$ is identified;
(ii): $θ (V, m_{s} (w, d))$ is identified whenever $(d, s, w) \in S_{d s w}$ and $V \in D^{*} (d, s, w)$ .

Proof. See Appendix A. ☐

Assumption E.

U \in D^{*} (d^{*}, s^{*}, w^{*})

.

Since

{D_{t}^{*} (d^{*}, s^{*}, w^{*}) : t = 0, 1, 2, \dots}

is an increasing sequence of collections of sets and

d, s

take finitely many values, Assumption E is satisfied when there exists a finite T, such that

U \in D_{T}^{*} (d^{*}, s^{*}, w^{*})

. Assumption E is testable, because for any finite t, all elements of

D_{t}^{*} (d, s, w)

are identified.

Theorem 1. If Assumptions A, B and E are satisfied then,

m_{s^{*}} (w^{*}, d^{*})

is identified.

Proof. See Appendix A. ☐

Assumption E involves conditions on the support of

z

; the class

D^{*} (d, s, w)

is mostly determined by the amount of variation available in

z

given

d = d, s < s, w = w

. For example, consider the simple case

η_{d} = 1

. Suppose that there exist

s, \bar{s}, w, \bar{w}

, such that

m_{s} (w, 0) = m_{\bar{s}} (\bar{w}, 1)

. Then, Assumption E is satisfied if the support of

z

contains values

z, \bar{z}

with

p_{1} (z) < p_{1} (\bar{z})

. Please note that even though

V (0, w) = \{(0, p_{1} (z)], (0, p_{1} (\bar{z})]\}

does not contain a partition of

U

, we have

D_{1}^{*} (0, s, w) \cap D_{1}^{*} (1, \bar{s}, \bar{w}) = \{(p_{1} (z), p_{1} (\bar{z})]\}

, and therefore, the matching mechanism (iv) in Definition 1 implies that

D^{*} (0, s, w)

contains a partition of

U

.

Indeed, suppose that

D^{*} (d^{*}, s^{*}, w^{*}) \cap D^{*} (\bar{d}, \bar{s}, \bar{w}) \neq \emptyset

for some

(\bar{d}, \bar{s}, \bar{w}) \in S_{d s w}

. Then, by (iv) in Definition 1,

m_{s^{*}} (w^{*}, d^{*}) = m_{\bar{s}} (\bar{w}, \bar{d})

implies that

D^{*} (d^{*}, s^{*}, w^{*}) = D^{*} (\bar{d}, \bar{s}, \bar{w})

. Therefore, not only

V (d^{*}, w^{*})

, but also

V (\bar{d}, \bar{w})

should be taken into account, which is particularly useful when

d^{*} \neq \bar{d}

. This reasoning suggests a simple sufficient condition, which we state as a corollary.

Corollary 1 (Sufficient conditions). Suppose that Assumptions A and B are satisfied and that

S_{w z} = S_{w} \times S_{z}

. Suppose further that there exists a sequence

{(s_{j}, w_{j}) \in S_{s w} : j = 0, 1, \dots, η_{d}}

, such that

m_{s_{j}} (w_{j}, j) = m_{s^{*}} (w^{*}, d^{*})

for all

j = 0, 1, \dots, η_{d}

. Further, suppose that:

\forall j = 1, \dots, η_{d} - 1 : inf_{z \in S_{z}} p_{j + 1} (z) < sup_{z \in S_{z}} p_{j} (z),

(21)

where each

p_{j}

is a continuous function and

z

is continuously distributed. Then,

m_{s^{*}} (w^{*}, d^{*})

is identified.

Please note that Corollary 1 imposes restrictions on the relationship between

p_{j}

and

p_{j + 1}

(for all values of j), but it does not require there to be a direct relationship between

p_{j}

and

p_{j + 2}

. Indeed, the matching procedure can be chained in the sense that we can first establish equality of

m_{s_{0}} (w_{0}, 0)

to

m_{s_{1}} (w_{1}, 1)

, then uncover that

m_{s_{0}} (w_{0}, 0) = m_{s_{1}} (w_{1}, 1) = m_{s_{2}} (w_{2}, 2)

, and so on.

To illustrate Corollary 1, consider the following example.

Example 2 (Ordered response). Suppose that for all

d, z

and some

β_{0}

and

- \infty = γ_{0} < γ_{1} < \dots < γ_{η_{d} + 1} = \infty

,

p_{d} (z) = Φ (γ_{d} + β^{⊤} z)

, as would be the case in an ordered probit model. This is one of the least favorable cases for our procedure, since for all

z, z^{*}

and

d = 1, \dots, η_{d}

,

p_{d} (z) < p_{d} (z^{*}) \Rightarrow p_{d + 1} (z) \leq p_{d + 1} (z^{*}) and p_{d - 1} (z) \leq p_{d - 1} (z^{*})

.

Therefore, condition (21) in Corollary 1 is satisfied if

sup_{z, z^{*} \in S_{z}} β^{⊤} (z - z^{*}) \geq max_{d = 1, \dots, η_{d} - 1} (γ_{d + 1} - γ_{d}) . ☐

To illustrate the idea of Theorem 1, we provide the following two fairly concrete examples. Let

\begin{matrix} π_{s d} (w, z) = P (s < s, d = d | w = w, z = z) = P {p_{d} (z) < & u \leq p_{d + 1} (z), v \leq m_{s} (w, d)} \\ = θ (p_{d} (z) p_{d + 1} (z), m_{s} (w, d)), \end{matrix}

(22)

which is identified provided that

z \in S_{z} (w)

.

Example 3 (Uncovering that

m_{s_{0}} (w_{0}, 0) = m_{s_{1}} (w_{1}, 1)

). We verify whether

m_{s_{0}} (w_{0}, 0) = m_{s_{1}} (w_{1}, 1)

for some candidate pair

(s_{1}, w_{1})

. Our approach is described below and illustrated in Figure 2, which assumes the existence of values

z_{11}, z_{12}

, such that

p_{1} (z_{12}) = p_{2} (z_{11})

. It should be apparent from Figure 2 that

m_{s_{0}} (w_{0}, 0) = m_{s_{1}} (w_{1}, 1)

if and only if the measure of the red area is zero.

Figure 2. Verifying whether

m_{s_{0}} (w_{0}, 0) = m_{s_{1}} (w_{1}, 1)

.

Figure 2. Verifying whether

m_{s_{0}} (w_{0}, 0) = m_{s_{1}} (w_{1}, 1)

.

The measures of the yellow area, the yellow plus the green area and the yellow plus the red area are identified directly from the data. The measure of the yellow area can then be learned as

(yellow+green) - green

, and finally, the measure of the red area as

(yellow+red) - yellow

.

The formal identification argument is as follows. First,

D_{0}^{*} (0, s_{0}, w_{0}) \supset {0, p_{1} (z_{11}), 0, p_{1} (z_{12})}, D_{0}^{*} (1, s_{1}, w_{1}) \supset {p_{1} (z_{11}) p_{2} (z_{11})} .

Using (i) and (ii) of Definition 1, it follows that

V = p_{1} (z_{11}) p_{1} (z_{12}) = p_{1} (z_{11}) p_{2} (z_{11}) \in D_{1}^{*} (0, s_{0}, w_{0}) \cap D_{1}^{*} (1, s_{1}, w_{1})

. Thus,

\{\begin{matrix} θ (V, m_{s_{0}} (w_{0}, 0)) & = π_{s_{0} 0} (w_{0}, z_{12}) - π_{s_{0} 0} (w_{0}, z_{11}), \\ θ (V, m_{s_{1}} (w_{1}, d_{1})) & = π_{s_{1} 1} (w_{1}, z_{11}), \end{matrix}

are both identified; they are equal if and only if

m_{s_{1}} (w_{1}, 1) = m_{s_{0}} (w_{0}, 0)

. ☐

In Example 3 it is implicitly assumed that

z_{11}, z_{12} \in S_{z} (w_{0})

and that

z_{11} \in S_{z} (w_{1})

. However, Theorem 1 does not require this. Indeed, if there exist

z_{110}, z_{111}

, such that

p_{1} (z_{110}) = p_{1} (z_{111})

,

p_{1} (z_{12}) = p_{2} (z_{111})

and both

z_{110}, z_{12} \in S_{z} (w_{0})

and

z_{111} \in S_{z} (w_{1})

, then we can match

π_{s_{0} 0} (w_{0}, z_{12}) - π_{s_{0} 0} (w_{0}, z_{110})

with

π_{s_{1} 1} (w_{1}, z_{111})

to obtain

m_{s_{0}} (w_{0}, 0) = m_{s_{1}} (w_{1}, 1)

.

Example 4 (Verifying that

m_{s_{1}} (w_{1}, 1) = m_{s_{2}} (w_{2}, 2)

). We now turn to the task of verifying that

m_{s_{1}} (w_{1}, 1) = m_{s_{2}} (w_{2}, 2)

once

m_{s_{0}} (w_{0}, 0) = m_{s_{1}} (w_{1}, 1)

has been established. The procedure is illustrated in Figure 3 and described below, which presumes the existence of

z_{21}, z_{22}

for which

p_{3} (z_{22}) = p_{2} (z_{21})

.

Again, the question is whether the measure of the red area equals zero. Pink, orange and yellow are directly identified, which allows us to deduce

(pink + orange)

. Further,

(pink + orange + yellow + red) = π_{s_{0} 0} (w_{0}, z_{21}) + π_{s_{1} 1} (w_{1}, z_{21})

is identified, and hence, so is

(yellow + red)

, which in turn implies the identification of red.

Figure 3. Verifying whether

m_{s_{1}} (w_{1}, 1) = m_{s_{2}} (w_{2}, 2)

given that

m_{s_{0}} (w_{0}, 0) = m_{s_{1}} (w_{1}, 1)

.

Figure 3. Verifying whether

m_{s_{1}} (w_{1}, 1) = m_{s_{2}} (w_{2}, 2)

given that

m_{s_{0}} (w_{0}, 0) = m_{s_{1}} (w_{1}, 1)

.

Formally, it follows from Example 3 that

D_{t}^{*} (0, s_{0}, w_{0}) = D_{t}^{*} (1, s_{1}, w_{1})

for all

t \geq 2

. Therefore, for sufficiently large t,

V = p_{2} (z_{22}) p_{2} (z_{21}) \in D_{t}^{*} (0, s_{0}, w_{0})

. However, since

V = p_{2} (z_{22}) p_{3} (z_{22}) \in D_{2}^{*} (0, s_{2}, w_{2})

, the equality of

m_{s_{0}} (w_{0}, 0)

and

m_{s_{2}} (w_{2}, 2)

can be verified using the set V. ☐

Once we have ascertained that

m_{s_{0}} (w_{0}, 0) = m_{s_{1}} (w_{1}, 1) = m_{s_{2}} (w_{2}, 2)

, we can identify

\begin{matrix} θ (0 p_{3} (z_{22}), & m_{s_{0}} (w_{0}, 0)) = \\ θ (0 p_{1} (z_{22}), m_{s_{0}} (w_{0}, 0)) + θ (p_{1} (z_{22}) p_{2} (z_{22}), m_{s_{1}} (w_{1}, 1)) + θ (p_{2} (z_{22}) p_{3} (z_{22}), m_{s_{2}} (w_{2}, 2)), \end{matrix}

since

(0, p_{3} (z_{22})] = (0, p_{1} (z_{22})] \cup (p_{1} (z_{22}), p_{2} (z_{22})] \cup (p_{2} (z_{22}), p_{3} (z_{22})]

.

When the support of

z

and

w

is the Cartesian product of the marginals (as in these examples), Assumption E is reduced to the requirement that

p_{d}

has sufficient variability and

z

sufficiently rich support, as in Corollary 1.

5. Identification of ψ

We now turn to the identification of the main object of interest, i.e.,

ψ^{*} = ψ (x^{*}, s^{*}, d^{*})

, for which we use the fact that the m function is identified.

Recall from (14) that for

A \subset S_{u v}

,

κ (A, a) = E [g (a, ϵ) {(u, v) \in A}] .

The role of κ is similar to that of the function θ in Section 4. Indeed, if A is a set of positive measure, then by Assumption C,

κ (A, a) = κ (A, \tilde{a})

if and only if

a = \tilde{a}

. We start with the identification of κ.

Let

S_{w z} (x)

be the support of

(w, z)

conditional on

x = x

. We define

M

to be the collection of

(d, s, w)

triples for which

m_{s} (w, d)

and

m_{s + 1} (w, d)

are both identified. Formally, we let

M^{*} (s) = \{\begin{matrix} \{(d, w) : U \in D^{*} (d, 1, w)\}, & s = 0, \\ \{(d, w) : U \in D^{*} (d, s, w) \cap D^{*} (d, s + 1, w)\}, & 1 \leq s \leq η_{s} - 1, \\ \{(d, w) : U \in D^{*} (d, η_{s}, w)\}, & s = η_{s}, \end{matrix}

M = \{(d, s, w) : (d, w) \in M^{*} (s)\},

and

K (x, s, d) = \{(p_{d} (z), p_{d + 1} (z)] \times (m_{s} (w, d), m_{s + 1} (w, d)] : (w, z) \in S_{w z} (x) and (d, s, w) \in M\} .

10

Therefore, by Theorem 1

K (x, s, d)

is a collection of nonempty rectangles whose corner points are all identified under Assumptions A and B. Moreover, for

K = p_{d} (z) p_{d + 1} (z) \times m_{s} (w, d) m_{s + 1} (w, d)

,

κ (K, α (x, s, d))

is identified, because

κ (K, α (x, s, d)) = E [y (d = d) (s = s) | x = x, w = w, z = z] .

(23)

We now extend

K (x, s, d)

to a larger class of sets K for which the identification of

κ (K, α (x, s, d))

obtains.

Definition 2.

D (x, s, d)

is the collection

D_{\infty} (x, s, d)

in the following iterative scheme. Let

D_{0} (x, s, d) = K (x, s, d)

. Then, for all

t \geq 0

,

D_{t + 1} (x, s, d)

consists of all sets

A^{*}

, such that at least one of the following four conditions is satisfied, where

μ^{*}

denotes the standard Lebesgue measure over

U^{2}

.

(i): $A^{*} \in D_{t} (x, s, d)$ ;
(ii): $\exists A_{1}, A_{2} \in D_{t} (x, s, d) : A_{1} \subset A_{2}, μ^{*} (A_{2} - A_{1}) > 0, A^{*} = A_{2} - A_{1}$ ;
(iii): $\exists A_{1}, A_{2} \in D_{t} (x, s, d) : A_{1} \cap A_{2} = \emptyset, μ^{*} (A_{1} \cup A_{2}) > 0, A^{*} = A_{1} \cup A_{2}$ ;
(iv): $\exists (\tilde{x}, \tilde{s}, \tilde{d}) : α (\tilde{x}, \tilde{s}, \tilde{d}) = α (x, s, d), D_{t} (x, s, d) \cap D_{t} (\tilde{x}, \tilde{s}, \tilde{d}) \neq \emptyset, A^{*} \in D_{t} (\tilde{x}, \tilde{s}, \tilde{d})$ . ☐

The collection

D (x, s, d)

(like

D^{*} (d, s, w)

) consists of sets defined in terms of the unknown

p_{d}, m_{s}, α

functions, such that

D (x, s, d)

can be interpreted as a set of unknown parameters.

Lemma 2. Suppose that Assumptions A to C and E are satisfied.

(i): For all $(x^{*}, s^{*}, d^{*}) \in S_{x s d}$ , every $K \in D (x^{*}, s^{*}, d^{*})$ is identified;
(ii): $κ (K, α (x^{*}, s^{*}, d^{*}))$ is identified whenever $(x, s, d) \in S_{x s d}$ and $K \in D (x^{*}, s^{*}, d^{*})$ .

Assumption F.

U^{2} \in D (x^{*}, d^{*}, s^{*})

.

Like for Assumption E, Assumption F equivalently requires that there be a finite T, such that

U^{2} \in D_{T} (x^{*}, d^{*}, s^{*})

.

Theorem 2. Suppose that Assumptions A to C and F are satisfied. Then,

ψ^{*}

is identified.

Our method for identifying ψ is similar to our method for identifying m described in Section 4:

D (x, s, d)

is now generated from a collection of rectangles, not a collection of intervals. Further, if we can ascertain that

α (x^{*}, s^{*}, d^{*}) = α (\bar{x}, \bar{s}, \bar{d})

, then

D (x^{*}, s^{*}, d^{*}) \cap D (\bar{x}, \bar{s}, \bar{d}) \neq \emptyset

implies that the two collections in fact coincide. This is particularly helpful when

s^{*} \neq s

and

d^{*} \neq \bar{d}

.

We now state a set of sufficient conditions for the identification of

ψ^{*}

.

Corollary 2 (Sufficient conditions). Suppose that there exists a sequence

\{x_{i j} \in S_{x} : i = 0, 1, \dots, η_{s}, j = 0, 1, \dots, η_{d}\}

, such that

α (x_{i j}, i, j) = α (x^{*}, s^{*}, d^{*})

for all

i, j

. Further, suppose that

S_{x w z} = S_{x} \times S_{w} \times S_{z}

. If

w, z

are continuously distributed and that for some continuous functions

m_{1}, \dots, m_{η_{s}}, p_{1}, \dots, p_{η_{d}}

,

(i): for $i = 1, \dots, η_{s} - 1$ and $j = 0, 1, \dots, η_{d}$

$inf_{w \in S_{w}} m_{i + 1} (w, j) < sup_{w \in S_{w}} m_{i} (w, j),$
(ii): for $i = 1, 2, \dots, η_{s}$ and $j = 1, \dots, η_{d} - 1$ ,

$inf_{w \in S_{w}} m_{i} (w, j + 1) < sup_{w \in S_{w}} m_{i} (w, j), inf_{z \in S_{z}} p_{j + 1} (z) < sup_{z \in S_{z}} p_{j} (z) .$

Then,

ψ^{*}

is identified.

Corollary 2 is a two-dimensional analog to Corollary 1.

We now consider a simple example that illustrates the basics of the machinery developed above. The example is limited relative to the theoretical results in several respects, which we discuss after the example.

Example 5. We will focus on the simplest interesting case, i.e.,

η_{s} = η_{d} = 2

with covariate support

S_{x w z} = S_{x} \times S_{w} \times S_{z}

. Because of the absence of support restrictions, we will use

K (s, d)

instead of

K (x, s, d)

in this example. Identification of

p_{d}

is trivial, and identification of

m_{s}

was discussed in Section 4, so the discussion below starts from the point at which identification of

p_{d}

and

m_{s}

has already been established.

The example is illustrated in Figure 4, which depicts a situation in which

ψ^{*}

is identified for all values of

x^{*}, s^{*}, d^{*}

provided that

α (x, s^{*}, d^{*})

varies sufficiently as a function of x. In the discussion below, we assume that there exists a

{x_{s d}}

, such that

α (x_{s d}, s, d)

is the same for all values of s and d, such that the existence of the

w, z

combinations in Figure 4 is sufficient. We show that for such

{x_{s d}}

,

D (x_{s d}, s, d)

is the same for all values of

s, d

, which implies that

U^{2}

is an element of

D (x_{s d}, s, d)

for all

s, d

, which implies identification. From hereon, we use the shorthand notation

D (s, d)

to mean

D (x_{s d}, s, d)

.

Figure 4. Identification of ψ if

η_{s} = η_{d} = 2

.

Figure 4. Identification of ψ if

η_{s} = η_{d} = 2

.

We start by showing that

D (1, 1) = D (0, 1)

if

α (x_{11}, 1, 1) = α (x_{01}, 0, 1)

. Let

K_{h r i j} = p_{h}^{*} p_{r}^{*} \times m_{i}^{*} m_{j}^{*}, h = 0, 1; r = h + 1, \dots, 2; i = 0, \dots, 5; j = i + 1, \dots, 6 .

Since

p_{1} (z_{1}) = p_{1}^{*}

,

p_{2} (z_{1}) = p_{2}^{*}

,

m_{1} (w_{1}, 1) = m_{1}^{*}

and

m_{2} (w_{1}, 1) = m_{4}^{*}

, it follows that

K_{1214} \in D (1, 1)

. Likewise, using

m_{1} (w_{1}, 1) = m_{1}^{*}

and

m_{1} (w_{2}, 1) = m_{4}^{*}

, it follows that

K_{1201}, K_{1204} \in D (0, 1)

, which implies that

K_{1214} = K_{1204} \cap K_{1201} \in D (0, 1)

, also. Therefore,

K_{1214} \in D (1, 1) \cap D (0, 1)

, such that by the assumption on α made earlier in the example and Condition (iv) of Definition 2,

D (1, 1) = D (0, 1)

.

We next show that

D (0, 0) = D (1, 0) = D (2, 0)

. Now,

K_{0145} \in D (1, 0)

, because

m_{1} (w_{3}, 0) = m_{4}^{*} < m_{5}^{*} = m_{2} (w_{3}, 0)

. Further,

m_{1} (w_{3}, 0) = m_{4}^{*} < m_{5}^{*} = m_{1} (w_{4}, 0)

implies that

K_{0104}, K_{0105} \in D (0, 0)

and, hence, that

K_{0145} = K_{0105} \cap K_{0104} \in D (0, 0)

. Likewise,

m_{2} (w_{5}, 0) = m_{4}^{*} < m_{5}^{*} = m_{2} (w_{3}, 0)

, such that

K_{0145} = K_{0146} - K_{0156} \in D (2, 0)

. Consequently,

K_{0145} \in D (0, 0) \cap D (1, 0) \cap D (2, 0)

, which (together with the assumption on α used in this example) implies that

D (0, 0) = D (1, 0) = D (2, 0)

.

Given that

m_{1} (w_{6}, 1) = m_{2}^{*}

, it follows that

K_{1202} \in D (0, 1)

. Likewise, using

w_{7}

,

K_{0102}, K_{0202} \in D (0, 0)

, and hence,

K_{1202} = K_{0102} \cap K_{0202} \in D (0, 0)

, also. Repeating the same argument for

w_{8}

results in

D (0, 0) \cap D (0, 1) \cap D (0, 2) \neq \emptyset

, and hence,

D (0, 0) = D (0, 1) = D (0, 2) = D (1, 0) = D (1, 1) = D (2, 0)

.

Finally, using

w_{9}, w_{0}

, it follows that

K_{2334} \in D (1, 2) \cap D (2, 2)

, and using

w_{8}, w_{9}

, it can be deduced that

K_{1224} \in D (1, 1) \cap D (1, 2)

, such that

D (s, d)

is identical for all

s, d

.

To see that

U^{2} \in D (1, 1)

, note that each of the nine rectangles with solid boundaries in Figure 4 belongs trivially to some

D (s, d)

(e.g.,

K_{1224} \in D (1, 1)

). Since the union of the nine rectangles is exactly

U^{2}

and

D (s, d)

is the same for all

s, d

, identification is hereby established. ☐

In the above example, it was shown that

D (s, d)

was the same for all values of

s, d

. This is not necessary for the identification of

ψ^{*}

. Indeed, all that is required is that

U^{2} \in D (x^{*}, s^{*}, d^{*})

; it does not matter which combinations of

(s, d)

pairs are matched with each other, as long as the Dynkin system generated by the union of their

K

-sets includes

U^{2}

as an element.

Example 5 is limited in several respects. First, the support of covariates was assumed to be the Cartesian product of the marginal supports and to be independent of

s, d

. With support restrictions, the procedure to establish identification of

ψ^{*}

would be similar, but more care should be taken in the selection of

w, z

pairs to ensure that the support restrictions are satisfied. For instance, Figure 4 of Example 5 indicates that

(w_{j}, z_{1})

belongs to

S_{w z}

for a number of different values of j, but this condition can be relaxed in numerous ways.

Further, it was assumed that

η_{s} = η_{d} = 2

. With more than two categories, the essence of the identification procedure does not change, but Figure 4 would be messier. An essential ingredient of Example 5 is that there are values of

z_{1}, z_{3}

for which

p_{1} (z_{1}) = p_{2} (z_{3})

and likewise for

m_{s}

. This is analogous to Corollary 1. It should be pointed out that with more than three categories (

η_{d} > 2

or

η_{s} > 2

), it is not necessary for there to be a

z_{4}

-value for which

p_{1} (z_{1}) = p_{3} (z_{4})

. Indeed, what is needed is for there to be a pair

z_{4}, z_{5}

, such that

p_{2} (z_{4}) = p_{3} (z_{5})

. As mentioned earlier, such a chaining argument can be extended to any number of categories, i.e., one could obtain a set of sufficient conditions similar to those in Corollary 1.

6. Decomposing the Effect of $d$

As mentioned in Section 2, it is possible to use the methodology developed in this paper to identify objects that are not based on ψ. In this section, we show that the average total effect and its decomposition in (6) is indeed identified by the same method. For this purpose, we will explain how to use the matches of the m and α functions, because we have already explained in detail how to achieve those matches and how Dynkin systems can help.

We focus on the special case with binary

s, d

; the general case is similar. We discuss the identification of

ψ^{\circ} (x, w, d, \tilde{d}) = E g (α (x, s_{d} (w), \tilde{d}), ϵ),

where

s_{d} (w)

is the counterfactual value of

s

when

d

is fixed at d given

w = w

, i.e.,

s_{d} = {v > m_{1} (w, d)}

. Therefore, (6) is now

\begin{matrix} \underset{Average Total Effect of d}{\underset{︸}{ψ^{\circ} (x, w, 1, 1) - ψ^{\circ} (x, w, 0, 0)}} \\ = \underset{Direct Effect of d}{\underset{︸}{ψ^{\circ} (x, w, 1, 1) - ψ^{\circ} (x, w, 1, 0)}} + \underset{Indirect Effect of d}{\underset{︸}{ψ^{\circ} (x, w, 1, 0) - ψ^{\circ} (x, w, 0, 0)}} . \end{matrix}

(24)

We note that

ψ^{\circ}

and ψ are different objects unless

v

and

ϵ

are known to be independent11. However, the identification of

ψ^{\circ}

can also be achieved using our matching procedure.

We focus on

ψ^{\circ} (x, w, 0, 0)

; the other cases are similar. We have

ψ^{\circ} (x, w, 0, 0) = E [{d = 0} g (α (x, s_{0} (w), 0), ϵ)] + E [{d = 1} g (α (x, s_{0} (w), 0), ϵ)] .

The first term on the right-hand side can be identified by using

E [{d = 0} y | x = x, w = w, z]

. For the second term on the right-hand side, consider

E [(d = 1) g (α (x, s_{0} (w), 0), ϵ) | x = x, w = w, z = z],

which can be written as

\begin{matrix} E [{u > p_{1} (z)} {v \leq m_{1} (w, 0)} g (α (x, 0, 0), ϵ) & ] + \\ E [{u > p_{1} (z)} {v > m_{1} (w, 0)} g (α (x, 1, 0), ϵ)] . \end{matrix}

(25)

The method developed in the paper explains how to find

(x, w)

and

(\tilde{w}, \tilde{x})

, such that

α (x, 1, 0) = α (\tilde{x}, 1, 1)

and

m_{1} (w, 0) = m_{1} (\tilde{w}, 1)

. Identification of the second term in (25) then follows from the fact that it is equal to

E [{u > p_{1} (z)} {v > m_{1} (\tilde{w}, 1)} g (α (\tilde{x}, 1, 1), ϵ)] = E [y (s = 1) (d = 1) | z = z, w = w, x = \tilde{x}] .

The first term in (25) can be dealt with similarly.

Given that

ψ^{\circ}

is identified, the total, direct and indirect effects of

d

in (24) are all identified.

7. Sketch of an Estimation Procedure

Below follows a sketch of a simple estimation procedure of

ψ^{*} = ψ (x^{*}, s^{*}, d^{*})

. This procedure is provided to demonstrate how

ψ^{*}

can be estimated, but in order to keep the sketch simple, we will make several assumptions, which are much stronger than those made in the identification portion of this paper. For instance, we shall assume that the joint support of

(w, z)

is the Euclidean product of the marginal supports, that

s, d

only take the values

0, 1, 2

and that there is sufficient variation in

z, p_{d} (z)

to allow for the matches used. More complicated procedures can be devised that exploit some salient features of this paper (such as chaining) and lift such restrictions, but such procedures are beyond the scope of this paper, which primarily deals with identification. In earlier work [3], we provide rigorous results for an estimation procedure that does not impose a joint support assumption, albeit in a considerably simpler model than the one considered here.

We will moreover not be assuming the use of any particular nonparametric methodology. Most objects to be estimated can be expressed as conditional expectations (or probabilities), sometimes with estimated regressors. Some of these conditional expectations are then integrated with respect to one of the conditioning variables à la [28]. There are numerous important details in the theoretical development and empirical implementation of such methods, but these can by now be considered to be well established, and elaborate discussions thereof are available in various places in the literature. Hence, we do not discuss them here. Whenever an object is estimable by the standard nonparametric methodology (ENPM) we will so indicate.

7.1. Estimation of m

We commence our discussion with the estimation of

m_{s} (w, 1)

. Please note that

m_{s} (w, 1) = \sum_{d = 0}^{2} E λ_{s d} (w, z),

(26)

where

λ_{s d} (w, z) = P (s < s, d = d | J_{d} (s, w) = J_{1} (s, w), z = z)

with

J_{d} (s, w) = P (p_{1} (z) < u \leq p_{2} (z), v \leq m_{s} (w, d)) .

(27)

Once estimates of

J_{0}, J_{1}, J_{2}

are available,

λ_{s 0}, λ_{s 1}, λ_{s 2}

are ENPM, and

m_{s} (w, 1)

can then be estimated by integrating out over z in the spirit of [28].

Now,

J_{1} (s, w) = \int P (d = 1, s < s | w = w, z = z) d F_{z} (z)

, which is ENPM. For the estimation of

J_{0}, J_{2}

, it is helpful to introduce

ζ_{s d j} (w, p) = P (s < s, d = d | w = w, p_{j} (z) = p)

, which is ENPM given that

p_{d} (z) = P (d < d | z = z)

. Since

J_{d} (s, w) = \{\begin{matrix} E ζ_{s 01} (w, p_{2} (z)) - E ζ_{s 01} (w, p_{1} (z)), & d = 0, \\ E ζ_{s 22} (w, p_{1} (z)) - E ζ_{s 22} (w, p_{2} (z)), & d = 2, \end{matrix}

they too are ENPM.

Finally, to obtain estimates of

m_{s} (w, 0)

and

m_{s} (w, 2)

, one can simply estimate

m_{s} (w, d) = E (m_{s} (w, 1) | J_{1} (s, w) = J_{d} (s, w)) .

7.2. Estimation of ψ

We focus here on the estimation of

ψ^{*} = ψ (x^{*}, s^{*}, d^{*})

for

s^{*} = d^{*} = 1

; other combinations of

(s^{*}, d^{*})

can be handled analogously. Let

ρ_{s d} = y (s = s) (d = d)

. Please note that

ψ^{*} = \sum_{s, d = 0}^{2} E ν_{s d} (x^{*}, w, z) with ν_{s d} (x^{*}, w, z) = E (ρ_{s d} | α (x, s, d) = α (x^{*}, 1, 1), w = w, z = z .)

Naturally,

ν_{11} (x^{*}, w, z)

is ENPM. For

s \neq 1

and/or

d \neq 1

, other methods must be developed to estimate

ν_{s d} (x^{*}, w, z)

. We will focus on the case

s = d = 0

, where the other cases can be handled analogously and possibly (if

s = s^{*}

or

d = d^{*}

) more easily.

Let

κ_{j t}^{*} (x, w, z) = E (ρ_{00} | x = x, p_{1} (z) = p_{j} (z), m_{1} (w, 0) = m_{t} (w, 1)),

which is ENPM. Define

W (x, w, z) = \{κ_{22}^{*} (x, w, z) - κ_{21}^{*} (x, w, z) - κ_{12}^{*} (x, w, z) + κ_{11}^{*} (x, w, z)\} - E (ρ_{11} | x = x^{*}, w = w, z = z),

which is ENPM. Then,

W^{*} (x) = E W (x, w, z) = 0

is equivalent to

α (x, 0, 0) = α (x^{*}, 1, 1)

. Finally,

ν_{00} (x^{*}, w, z) = E (ρ_{00} | W^{*} (x) = 0, w = w, z = z)

is ENPM.

Acknowledgments

This paper is based on research supported by National Science Foundation Grant SES–0922127. We thank the Human Capital Foundation (http://www.hcfoundation.ru/en) and especially Andrey P. Vavilov for their support of the Center for Auctions, Procurements and Competition Policy (CAPCP, http://capcp.psu.edu) at Penn State University. We thank Andrew Chesher, Elie Tamer, Xavier d’Haultfoeuille, (other) participants of the 2010 Cowles foundation workshop and the 2012 conference by Centre Interuniversitaire de Recherche en Economie Quantitative (CIREQ) and Centre for Microdata Methods and Practice (CEMMAP), as well as the referees for their helpful comments.

Author Contributions

All of the authors made contributions to all parts of the paper.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix

A. Proofs

Proof of Lemma 1. We show both parts simultaneously and use transfinite induction: i.e., we will show: (i) that

D_{0}^{*} (d, s, w)

has a property; (ii) that if

D_{t}^{*} (d, s, w)

has the property, then

D_{t + 1} (d, s, w)

has the property too; and (iii) that if for all t,

D_{t} (d, s, w)

has the property, then

D_{\infty} (d, w, s)

must have the property, as well. Please note that (iii) is trivial, because

{D_{t} (d, s, w)}

is an increasing sequence of sets, and therefore,

D_{\infty} (d, s, w) = \cup_{t = 0}^{\infty} D_{t} (d, w, s)

. Therefore, we will establish (i) and (ii) below.

For all

(d, s, w)

, any

V_{0} \in D_{0}^{*} (d, s, w)

can be expressed as

V_{0} = p_{d} (z) p_{d + 1} (z)

for some

z \in S_{z} (w)

and is hence identified and satisfies

θ (V_{0}, m_{s} (w, d)) = P (s < s, d = d | w = w, z = z)

, which is hence also identified.

Now, suppose that for arbitrary t and all

(d, s, w)

, identification of

V_{t}, θ (V_{t}, m_{s} (w, d))

has been established for all

V_{t} \in D_{t}^{*} (d, s, w)

. We now establish identification of

\{V_{t + 1}, θ (V_{t + 1}, m_{s} (w, d))\}

for any set

V_{t + 1} \in D_{t + 1}^{*} (d, s, w)

and any

(d, s, w)

.

Since

V_{t + 1} \in D_{t + 1}^{*} (d, s, w)

, it must be the set

A^{*}

in one of the four conditions in Definition 1. We verify identification in each of the four cases. First (i): if

V_{t + 1} \in D_{t}^{*} (d, s, w)

, then identification of both objects is trivial. Now (ii): since both

V_{t + 1}

and

θ (V_{t + 1}, m_{s} (w, d))

are differences between two identified objects, they are identified, also. The argument is analogous for (iii).

Finally, (iv): We know that

V_{t + 1} \in D_{t}^{*} (\bar{d}, \bar{s}, \bar{w})

where

\bar{d}, \bar{s}, \bar{w}

are such that there exists a set

V^{*} \in D_{t}^{*} (d, s, w) \cap D_{t}^{*} (\bar{d}, \bar{s}, \bar{w})

. Since all sets in

D_{t}^{*} (d, s, w)

and

D_{t}^{*} (\bar{d}, \bar{s}, \bar{w})

are identified, the existence and identification of such a set

V^{*}

can be established. Further,

θ (V^{*}, m_{s} (w, d))

and

θ (V^{*}, m_{\bar{s}} (\bar{w}, \bar{d}))

are both identified and equal if and only if

m_{s} (w, d) = m_{\bar{s}} (\bar{w}, \bar{d})

. Given that

V_{t + 1}

belongs to

D_{t}^{*} (\bar{d}, \bar{s}, \bar{w})

, it is identified and so is

θ (V_{t + 1}, m_{s} (w, d))

, because it is known to equal

θ (V_{t + 1}, m_{\bar{s}} (\bar{w}, \bar{d}))

, which is identified. ☐

Proof of Theorem 1. This follows from the fact that

θ (U, m_{s^{*}} (d^{*}, w^{*})) = m_{s^{*}} (d^{*}, w^{*})

. ☐

Proof of Corollary 1. We use mathematical induction. Suppose that for some

1 \leq i \leq η_{d}

, it has been established that

\forall j < i : D^{*} (0, s_{0}, w_{0}) = D^{*} (j, s_{j}, w_{j})

. By (21), there exists a

z_{1}, z_{2}

for which

p_{i - 1} (z_{1}) = p_{i} (z_{2})

. Now,

p_{i - 1} (z_{2}) p_{i} (z_{2}) \{\begin{matrix} \in D^{*} (i, s_{i}, w_{i}), \\ = 0 p_{i - 1} (z_{1}) - 0 p_{i - 1} (z_{2}) \in D^{*} (0, s_{0}, w_{0}), \end{matrix}

such that

D^{*} (0, s_{0}, w_{0}) = D^{*} (i, s_{i}, w_{i})

. ☐

Proof of Lemma 2. The proof is very similar to, but somewhat more complicated than, that of Lemma 1. We establish both parts simultaneously and again use transfinite induction, for which we note that

D_{\infty} (x, s, d) = \cup_{t = 0}^{\infty} D_{t} (x, s, d)

.

For all

(x, s, d)

, any

K_{0} \in D_{0} (x, s, d)

can be expressed as

K_{0} = p_{d} (z) p_{d + 1} (z) \times m_{s} (w, d) m_{s + 1} (w, d)

for some

(w, z) \in S_{w z} (x, s, d)

for which

(d, s, w) \in M

.

K_{0}

is hence identified and satisfies

κ (K_{0}, α (x, s, d)) = E (y (d = d) (s = s) | x = x, w = w, z = z),

which is hence also identified.

Now, suppose that for arbitrary t and all

(x, s, d)

identification of

K_{t}, κ (K_{t}, α (x, s, d))

has been established for all

K_{t} \in D_{t} (x, s, d)

. We now establish identification of

\{K_{t + 1}, κ (K_{t + 1}, α (x, s, d)})

for any set

K_{t + 1} \in D_{t + 1} (x, s, d)

and any

(x, s, d)

.

Since

K_{t + 1} \in D_{t + 1} (x, s, d)

, it must be the set

A^{*}

in one of the four conditions in Definition 2. We verify identification in each of the four cases. First (i): if

K_{t + 1} \in D_{t} (x, s, d)

, then identification of both objects is trivial. Now (ii): since both

K_{t + 1}

and

κ (K_{t + 1}, α (x, s, d))

are differences between two identified objects, they are identified, also. The argument is analogous for (iii).

Finally (iv): We know that

K_{t + 1} \in D_{t} (\bar{x}, \bar{s}, \bar{d})

, where

\bar{x}, \bar{s}, \bar{d}

are such that there exists a set

K^{*} \in D_{t} (x, s, d) \cap D_{t} (\bar{x}, \bar{s}, \bar{d})

. Since all sets in

D_{t} (x, s, d)

and

D_{t} (\bar{x}, \bar{s}, \bar{d})

are identified, the existence and identity of such a set

K^{*}

can be established. Further,

κ (K^{*}, α (x, s, d))

and

κ (K^{*}, α (\bar{x}, \bar{s}, \bar{d}))

are both identified and equal if and only if

α (x, s, d) = α (\bar{x}, \bar{s}, \bar{d})

by Assumption C. Given that

K_{t + 1}

belongs to

D_{t} (\bar{x}, \bar{s}, \bar{d})

, it is identified and so is

κ (K_{t + 1}, α (x, s, d))

, because it is equal to

κ (K_{t + 1}, α (\bar{x}, \bar{s}, \bar{d}))

, which is identified. ☐

Proof of Theorem 2. When

S_{u v} \subset K

, we have

ψ (x^{*}, s^{*}, d^{*}) = κ (K, α (x^{*}, s^{*}, d^{*}))

. Apply the previous theorem. ☐

References

R.W. Blundell, and J.L. Powell. “Endogeneity in semiparametric binary response models.” Rev. Econ. Stud. 71 (2004): 655–679. [Google Scholar]
S. Jun, J. Pinkse, and H. Xu. “Tighter bounds in triangular systems.” J. Econom. 161 (2011): 122–128. [Google Scholar]
S.J. Jun, J. Pinkse, and H.Q. Xu. “Discrete endogenous variables in weakly separable models.” Econom. J. 15 (2012): 288–312. [Google Scholar]
E. Vytlacil, and N. Yildiz. “Dummy endogenous variables in weakly separable models.” Econometrica 75 (2007): 757–779. [Google Scholar] [CrossRef]
G.W. Imbens, and J.M. Wooldridge. “Recent Developments in the Econometrics of Program Evaluation.” J. Econ. Lit. 47 (2009): 5–86. [Google Scholar]
B. Jacob, L. Lefgren, and E. Moretti. “The dynamics of criminal behavior evidence from weather shocks.” J. Hum. Resour. 42 (2007): 489–527. [Google Scholar]
D. Black, and J. Smith. “How robust is the evidence on the effects of college quality? Evidence from matching.” J. Econom. 121 (2004): 99–124. [Google Scholar]
K. Imai, and D. van Dyk. “Causal inference with general treatment regimes.” J. Am. Stat. Assoc. 99 (2004): 854–866. [Google Scholar]
A. Lewbel. “Endogenous selection or treatment model estimation.” J. Econom. 141 (2007): 777–806. [Google Scholar]
C. Flores, and A. Flores-Lagunes. Identification and Estimation of Causal Mechanisms and Net Effects of a Treatment under Unconfoundedness. Discussion Paper, IZA Discussion Paper; Bonn, Germany: The Institute for the Study of Labor (IZA), 2009. [Google Scholar]
L. Dearden, J. Ferri, and C. Meghir. “The effect of school quality on educational attainment and wages.” Rev. Econ. Stat. 84 (2002): 1–20. [Google Scholar]
M. Lechner. “Identification and estimation of causal effects of multiple treatments under the conditional independence assumption.” In Econometric Evaluation of Labour Market Policies. Berlin, Germany: Springer Science and Business Media, 2001, pp. 43–58. [Google Scholar]
T.J. Kane, and C.E. Rouse. “Labor-market returns to two-and four-year college.” Am. Econ. Rev. 85 (1995): 600–614. [Google Scholar]
A. Lambrecht, K. Seim, and C. Tucker. “Stuck in the adoption funnel: The effect of interruptions in the adoption process on usage.” Mark. Sci. 30 (2011): 355–367. [Google Scholar] [CrossRef]
J.G. Cragg. “Some statistical models for limited dependent variables with application to the demand for durable goods.” Econometrica 39 (1971): 829–844. [Google Scholar] [CrossRef]
R. Chiburis. “Semiparametric bounds on treatment effects.” J. Econom. 159 (2010): 267–275. [Google Scholar]
I. Mourifié. Sharp Bounds on Treatment Effects. Discussion Paper; Québec, Canada: Université de Montréal, 2012. [Google Scholar]
A. Shaikh, and E. Vytlacil. “Partial identification in triangular systems of equations with binary dependent variables.” Econometrica 79 (2011): 949–955. [Google Scholar]
X. D’Haultfœuille, and P. Février. “Identification of nonseparable models with endogeneity and discrete instruments.” Econometrica 83 (2015): 1199–1210. [Google Scholar]
J. Pinkse. “Nonparametric Regression Estimation Using Weak Separability.” PA, USA: Pennsylvania State University, Unpublished work. 2001. [Google Scholar]
J. Heckman, and E. Vytlacil. “Structural equations, treatment effects, and econometric policy evaluation.” Econometrica 73 (2005): 669–738. [Google Scholar]
V. Chernozhukov, and C. Hansen. “An IV model of quantile treatment effects.” Econometrica 73 (2005): 245–261. [Google Scholar] [CrossRef]
A. Chesher. “Identification in nonseparable models.” Econometrica 71 (2003): 1405–1441. [Google Scholar]
G. Imbens, and W. Newey. “Identification and estimation of triangular simultaneous equations models without additivity.” Econometrica 77 (2009): 1481–1512. [Google Scholar]
M. Frölich, and M. Huber. “Direct and Indirect Treatment Effects: Causal Chains and Mediation Analysis with Instrumental Variables.” Discussion Paper, IZA Discussion Paper; Bonn, Germany: IZA, 2014. [Google Scholar]
D. Card. “The wage curve: A review.” J. Econ. Lit. 33 (1995): 785–799. [Google Scholar]
S. Khan, and E. Tamer. “Irregular identification, support conditions, and inverse weight estimation.” Econometrica 78 (2010): 2021–2042. [Google Scholar]
O. Linton, and W. Härdle. “Estimation of additive regression models with known links.” Biometrika 83 (1996): 529–540. [Google Scholar]

^1.D’Haultfoeuille and Février (2015) [19] also uses a recursion scheme for the purpose of identification, but both their method and their model is different from ours.
^2.We allow for the possibility that $x, w, z$ are random vectors containing common elements, e.g., $x = {(x_{1}^{⊤}, x_{2}^{⊤})}^{⊤}$ and $w = {(x_{2}^{⊤}, w_{1}^{⊤}, w_{2}^{⊤})}^{⊤}$ and $z = {(x_{2}^{⊤}, w_{2}^{⊤}, z_{1}^{⊤})}^{⊤}$ , provided that at least one variable in each equation is excluded from the other equations.
^3.Under additive separability of the error term, both types of monotonicity are satisfied.
^4.We thank Elie Tamer for pointing this out.
^5.Indeed, let $s, d$ be binary; let $x = w$ ; and let $u, v, ϵ$ be independent uniform $(0, 1]$ . Define $g (α, ϵ) = \tilde{σ} Φ^{- 1} \{1 - (1 - ϵ) Φ (α / \tilde{σ})\} + α$ . Then, for parameter vectors $\tilde{β}, \bar{β}$ , and scale parameter $\tilde{σ}$ , letting $p (z) = Φ (- z^{⊤} \tilde{β})$ , $m_{1} (w, 0) = m_{1} (w, 1) = Φ (- w^{⊤} \bar{β} / \bar{σ})$ , $α (w, s, d) = - \infty$ if $s d = 0$ , and $α (w, 1, 1) = w^{⊤} \bar{β}$ , otherwise, reproduces the likelihoods in Equations (5) and (6), of [15]. We note however that our matching strategy will explicitly require that $x$ and $w$ can be varied separately.
^6.Note that $E g (α (x, s_{1} (w), 1), ϵ)$ is generally not equal to $ψ (x, s_{1} (w), 1)$ because $ϵ$ and $s_{1} (w)$ are dependent.
^7.A similar decomposition is studied by Frölich, M. and Huber, M. (2014) [25].
^8.We use ⊂ as a generic symbol for the subset, where some other authors might distinguish between proper and non-proper subsets.
^9.Please note that this is the infinite union of collections of sets, not the collection of infinite unions of sets. To see the difference, consider that $U = \cup_{n = 1}^{\infty} (1 / n, 1]$ , but $U \notin \cup_{n^{*} = 1}^{\infty} {(1 / n, 1]}_{n = 1}^{n^{*}}$ . It is the latter concept that is used here.
^10. $u$ is nonempty under the the conditions of Theorem 1.
^11.If $ϵ$ and $v$ are independent, then $ψ^{\circ} (1, d, d) = E ψ (1, s_{d}, d)$ .

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons by Attribution (CC-BY) license ( http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Jun, S.J.; Pinkse, J.; Xu, H.; Yıldız, N. Multiple Discrete Endogenous Variables in Weakly-Separable Triangular Models. Econometrics 2016, 4, 7. https://doi.org/10.3390/econometrics4010007

AMA Style

Jun SJ, Pinkse J, Xu H, Yıldız N. Multiple Discrete Endogenous Variables in Weakly-Separable Triangular Models. Econometrics. 2016; 4(1):7. https://doi.org/10.3390/econometrics4010007

Chicago/Turabian Style

Jun, Sung Jae, Joris Pinkse, Haiqing Xu, and Neşe Yıldız. 2016. "Multiple Discrete Endogenous Variables in Weakly-Separable Triangular Models" Econometrics 4, no. 1: 7. https://doi.org/10.3390/econometrics4010007

APA Style

Jun, S. J., Pinkse, J., Xu, H., & Yıldız, N. (2016). Multiple Discrete Endogenous Variables in Weakly-Separable Triangular Models. Econometrics, 4(1), 7. https://doi.org/10.3390/econometrics4010007

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Multiple Discrete Endogenous Variables in Weakly-Separable Triangular Models

Abstract

1. Introduction

2. Model

3. Description

4. Identification of m

5. Identification of ψ

6. Decomposing the Effect of $d$

7. Sketch of an Estimation Procedure

7.1. Estimation of m

7.2. Estimation of ψ

Acknowledgments

Author Contributions

Conflicts of Interest

Appendix

A. Proofs

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

Article Menu

Multiple Discrete Endogenous Variables in Weakly-Separable Triangular Models

Abstract

1. Introduction

2. Model

3. Description

4. Identification of m

5. Identification of ψ

6. Decomposing the Effect of d

7. Sketch of an Estimation Procedure

7.1. Estimation of m

7.2. Estimation of ψ

Acknowledgments

Author Contributions

Conflicts of Interest

Appendix

A. Proofs

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

6. Decomposing the Effect of $d$