1. Introduction
Identification and estimation of strategic interaction models have recently received a great deal of attention in econometrics, owing to the growing interest in, and application of, stochastic games in various fields, including industrial organization, labor, political and international economics. Most of the existing literature has focused on discrete choice games; see [1,2] for a survey of recent results. In this literature, the observed data are assumed to arise from an equilibrium of a game played by a finite number of players, and therefore to be correlated across players. Typically, the number of players is assumed to be fixed, and the asymptotic inferential theory relies on a large number of independent repetitions of the same game in different markets or in a single market at different points of time. Two notable exceptions are Menzel [3] and Xu [4], who develop the inferential theory of discrete choice games based on a large number of players.
In this paper, we develop a new model of a static game of incomplete information with a large number of players. The model has two key distinguishing features. First, the strategies are subject to threshold effects and can be interpreted as dependent censored random variables, e.g., R&D investment and labor supply. Second, the game is played in a single market and is not repeated over time. To develop the asymptotic theory, we instead assume that the number of players grows unboundedly, and that the players reside on an exogenously given lattice, so that the vector of their choices and characteristics can be viewed as a dependent random field, which can be handled by the limit theorems for near-epoch dependent (NED) random fields established by Jenish and Prucha [5].
We derive this model explicitly in two game-theoretical applications: (i) R&D investment by firms under strategic complementarities; and (ii) labor supply decisions by women under peer effects. The set-up is standard for a static game of incomplete information: each player's payoff function depends on her own choice, the choices of other players, her commonly observed characteristics, and her private characteristic unobserved by other players; players move simultaneously based on their expectations about the choices of other players; and in equilibrium, players have self-consistent expectations, see [6], i.e., their subjective expectations coincide with the expectation based on the equilibrium distribution of strategies conditional on commonly observed variables. We assume that private characteristics are i.i.d. normal across players, and prove existence and uniqueness of the pure strategy equilibrium under some mild conditions. We then show that the censored equilibrium strategies also satisfy the NED property under the same conditions.
Under normality of private shocks, the equilibrium strategies boil down to a Tobit econometric model. However, in contrast to the standard Tobit model, our censored model involves a non-zero threshold parameter that needs to be estimated. Therefore, we use the following two-step semiparametric procedure: we first estimate the threshold by the minimum order statistic of the uncensored subsample, and then estimate the remaining parameters either by the maximum likelihood or the least squares method. Unlike in the standard Tobit model, the maximum likelihood estimator does not strictly dominate the least squares estimator in our model because of the discontinuous dependence of the likelihood on the first-step estimator, which may amplify finite-sample biases stemming from the first-step estimation. This provides a rationale for considering least squares estimation as an alternative to the maximum likelihood procedure. We establish consistency and asymptotic distributions of these estimators. The minimum order statistic is n-consistent and asymptotically exponentially distributed, while the maximum likelihood and least squares estimators are root-n consistent and asymptotically normal. A Monte Carlo study suggests that all these estimators perform well in finite samples.
Finally, we address the computational challenges of our game with a large number of players. The standard estimation of games involves computing the equilibrium for each alternative parameter value and then optimizing the objective function over parameter values, and thus presents a formidable computational burden. To tackle it, we use the constrained optimization algorithm proposed by Su and Judd [7], which treats the equilibrium equations as constraints and optimizes simultaneously over parameters and equilibrium variables, thereby avoiding calculation of the equilibrium at each iteration on the parameter value. Su and Judd [7] show the equivalence of this constrained optimization problem to the original problem. Our simulations confirm the viability and significant computational efficiency of the Su-Judd algorithm in our model.
To our knowledge, the proposed censored model has not been considered in the existing literature. Most of the existing results have dealt with discrete choice games, e.g., [8]. Recently, Xu and Lee [9] analyzed a spatially autoregressive (SAR) Tobit model, which can be viewed as a censored version of the Cliff-Ord-type linear SAR model with a known spatial weight matrix. Xu and Lee [9] establish the NED property as well as consistency and asymptotic normality of the maximum likelihood estimator using the limit theorems of Jenish and Prucha [5]. Though not explicitly demonstrated, this SAR Tobit model can be interpreted as an equilibrium of a static game of complete information, while our model is a game of incomplete information with a different concept of equilibrium and, consequently, qualitatively different implications. Moreover, the presence of latent endogenous variables and a non-zero threshold in our model poses additional statistical and computational difficulties. Thus, the two papers are complementary to each other.
The paper is organized as follows. Section 2 describes and derives the model in two examples. Section 3 establishes existence and uniqueness of the equilibrium, and proves the NED property of the equilibrium strategies. Section 4 discusses identification and estimation of the model. Consistency and asymptotic distributions of the estimators are established in Section 5. Section 6 contains a Monte Carlo study, and Section 7 concludes. All proofs are collected in the appendices.
2. Model
In this paper, we are concerned with estimation of an econometric model, Equation (1), in which the choice of agent i is censored at a common nonstochastic threshold and, when uncensored, depends linearly on (i) agent i's expectations, given its information set, about the choices of its neighbors within the neighborhood of radius r, which contains a fixed number of neighbors k that does not depend on i; (ii) the vector of agent i's observed characteristics; and (iii) agent i's private characteristic, observed only by agent i. The vector of unknown coefficients is common to all agents. The distribution of the private shocks is known to all players; the shocks are assumed to be i.i.d. normal across players and independent of the observed characteristics. The information set of each player consists of the entire state vector of commonly observed characteristics and the player's own private information.
The choice of player i is assumed to be directly affected only by its neighbors in a fixed neighborhood of the known radius r, with respect to some socio-economic metric. However, it will be indirectly affected by all other players. The number of neighbors within the r-neighborhood of each agent is assumed to be fixed and equal to k. To avoid the incidental parameters problem, the k interaction coefficients, measuring the effect of these k neighbors, are assumed to depend only on the relative locations of i and j, but not on i or j themselves. We formally specify the metric and the neighborhood structure in the following section.
The above assumptions seem reasonable in many empirical settings. For example, in its R&D decision, a firm would take into account the R&D of its neighbors within a certain distance in the geographic (or product characteristic) space, rather than that of all firms in the market. This is because technological spillovers, knowledge diffusion and labor mobility, the main determinants of R&D diffusion, are usually confined to a limited geographical or technological area.
Aside from the unobserved heterogeneity captured by the private shocks, we do not allow individual heterogeneity in the parameters. The reason is that the asymptotic theory rests on a single repetition of the game with the number of agents growing to infinity; allowing heterogeneous parameters across individuals would clearly result in inconsistency.
Model (1) is fairly general for applications. It can arise as a system of best response functions of a static game of incomplete information among n players. Below, we derive these equations for two strategic interaction models: (i) R&D investment by firms; and (ii) labor supply by women. In these models, the decisions of players exhibit strategic complementarities and are subject to threshold effects.
2.1. Spillovers in R&D Investment
A large body of empirical evidence suggests the presence of technology and R&D spillovers among firms, e.g., [10]. Audretsch and Feldman [10] find that knowledge spillovers are more prevalent in industries that exhibit spatial clustering. Positive R&D spillovers may occur through several channels, including knowledge transfers, labor mobility and imitation. Therefore, it is reasonable to expect the magnitude of such a spillover effect to depend on the geographical and technological distances between firms. As a result, firms' R&D expenditures may be spatially correlated, and the magnitude of this correlation often decays with the distance between firms.
The literature distinguishes two major channels through which R&D can raise firms' profits: cost-reducing and demand-creating effects. The former allows firms to carry out process improvements leading to efficiency gains and cost reduction, while the latter enables firms to improve the quality of their product and thereby boost demand. Levin and Reiss [11] analyze a model of monopolistic competition with both demand-creating and cost-reducing R&D spillovers across n firms. Based on a sample of US manufacturing firms, the authors find statistically significant, sizeable spillovers in cost-reducing R&D and insignificant, small spillovers in demand-creating R&D in most industries. Levin and Reiss [11] also find the elasticity of product quality with respect to the firm's own R&D to be much higher than that of cost with respect to the firm's own R&D. Other theoretical models of R&D spillovers include d'Aspremont and Jacquemin [12] and Motta [13], among others.
Yet all these papers model R&D investment as a continuous variable, thereby neglecting the strong empirical evidence that a sizeable proportion of firms do not undertake R&D activities; see, e.g., [14]. One plausible explanation is that the demand-creating effect of R&D is subject to threshold effects: quality can be raised only after a certain minimum level of R&D investment is attained, and R&D has no effect on quality below this level. Thus, R&D expenditure can be viewed as a censored decision variable whose optimal values below a certain threshold are unobserved. This type of model in the single-firm setup is analyzed by Gonzalez and Jaumandreu [15].
To study spatial spillovers in R&D investment, we develop a simple model of strategic interaction with a censored decision variable that incorporates the empirical findings discussed above. We consider a single, monopolistically competitive industry composed of a large number, n, of firms, each producing a brand of the same product differentiated by quality. Each firm i sets a price and chooses an R&D expenditure, which determines its product quality and hence its demand. To derive the demand, we employ a variant of the Dixit-Stiglitz [16] model of monopolistic competition in which the CES utility of a representative consumer is augmented with a preference for quality, governed by a quality sensitivity parameter. Utility maximization yields the demand for firm i as a function of its quality-adjusted price, the elasticity of substitution between the quality-adjusted goods, consumer's income I, and a quality-adjusted price index. To obtain non-increasing marginal demand for quality, we restrict the quality sensitivity parameter. If the number of firms is large, it is reasonable to assume that the effect of a single firm's decision on the industry price index is negligible, i.e., the index is constant, and to normalize it.
Following Gonzalez and Jaumandreu [15], we assume that the firm's own R&D expenditure affects only its product quality, subject to a technological constraint, where the threshold is the minimum investment required for quality improvements and δ is the R&D sensitivity parameter. Throughout, we measure investment as one plus the R&D expenditure, so that the logarithm of the censored investment is defined for zero values. This is a convenient normalization, which does not affect the results.
Furthermore, in light of the above empirical findings, we assume that other firms' R&D has only a cost-reducing effect on firm i, and that this effect is limited to the fixed r-neighborhood of firm i. Specifically, firm i's marginal cost depends on a vector of observed cost determinants of firm i, on the log R&D choices of the firms in its r-neighborhood, and on firm i's idiosyncratic cost component; the associated coefficient measures the strength of this spillover effect. Throughout, firm i's own choice variable is the log of its investment.
Suppose that all firms observe the commonly observed characteristics, but each firm's idiosyncratic cost component is observed only by that firm. Given this uncertainty about the choices of other firms, following Durlauf [17], see also [6], we assume that each firm i decides on its R&D investment based on its beliefs about the choices of the other firms, which are formed as the conditional expectation given all the information available to firm i. Based on these beliefs, firms simultaneously choose their price and R&D investment to maximize profits subject to the technological constraint, i.e., they solve problem (2) and (3), which also accounts for the cost of investment. The nonstochastic threshold is assumed to be observed and constant across all firms.
Lemma 1. The solution to optimization problem (2) and (3) is given by (1), with the reduced-form coefficients determined by the structural parameters of the model.

If the spillover coefficient is positive, i.e., R&D of the neighbors has a cost-reducing effect on firm i, then both the probability and the intensity of firm i's R&D increase with the expected R&D of its neighbors. In other words, there are strategic complementarities, or positive externalities, in the R&D decisions of firms. Furthermore, the probability of R&D is also increasing in (i) the elasticity of demand with respect to quality, higher ϵ; (ii) the elasticity of quality with respect to R&D, higher δ; and (iii) the market power, lower ν. The latter is consistent with the Schumpeterian argument that economies of scale make R&D more attractive to large firms than to small firms.
2.2. Peer Effects in Female Labor Supply
Our next example involves social interactions in female labor supply. Suppose the utility of female i is defined over her consumption and leisure, with a parameter characterizing her relative preference for consumption over leisure. The weight on leisure captures peer effects that depend on the labor supply decisions of female i's peers in her social neighborhood, referred to as friends: it depends on the vector of observed characteristics of woman i, on the log labor supply of woman i's friends, and on her private characteristic, unobserved by other women. As in the previous example, we add one to the labor supply before taking logs, so that the log of the censored labor supply is defined for zero values. In the presence of positive peer effects, the peer-effect coefficient is positive, which implies mutual reinforcement of the choices within the social group.
As before, all women observe the commonly observed characteristics, but each woman's private characteristic is observed only by herself. Woman i makes her decision based on her beliefs about the choices of her peer group, which are formed as the conditional expectation given all the information available to woman i. Based on these beliefs, women simultaneously maximize their utility subject to threshold effects, i.e., they solve problem (4) and (5), where w is the wage, T is the time endowment, and the threshold is the reservation labor income, which can be interpreted as welfare or other government transfers. The nonstochastic threshold is assumed to be observed and constant across women.
Lemma 2. The solution to optimization problem (4) and (5) is given by (1), with the reduced-form coefficients determined by the structural parameters of the model.

If the peer-effect coefficient is positive, i.e., there are positive peer effects, then both the probability and the magnitude of woman i's labor supply increase with the expected labor supply of her peers.
3. Equilibrium: Characterization and Weak Dependence
We assume that in equilibrium, players have self-consistent expectations, i.e., their subjective expectations or beliefs coincide with the expectation based on the equilibrium distribution of strategies conditional on the commonly observed characteristics. That is, each player's belief about a neighbor's choice equals the conditional expectation of that choice taken with respect to the equilibrium conditional distribution of strategies. The last equality follows from the independence of the private shocks across players and their independence of the commonly observed characteristics.
Suppose that the private shocks are i.i.d. normal with mean zero and variance σ². Taking the conditional expectation of Equation (1) with respect to the equilibrium distribution of strategies, conditional on the commonly observed characteristics, yields the system of equilibrium equations (7), where Φ and ϕ are, respectively, the c.d.f. and p.d.f. of the standard normal distribution.
Provided that they are well-defined, the strategies are independent across i conditional on the commonly observed characteristics, and have censored normal distributions with means given by the equilibrium conditional expectations, a common variance and a common nonstochastic threshold. In equilibrium, the conditional means satisfy system (7). If this system has a unique solution, the corresponding equilibrium strategies will also be unique with probability 1, since a censored normal variable is uniquely characterized by its mean, variance and threshold. This leads to the following characterization of equilibrium.
Definition 1. An equilibrium is a set of policy functions whose conditional mean functions satisfy system (7).

A similar characterization of equilibrium in discrete games of incomplete information is used in [8].
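To make the characterization concrete, the following sketch (in Python) computes an equilibrium by iterating a censored-normal best-response map until a fixed point is reached. The notation is ours, not the paper's: xb collects the exogenous part of the latent index, W is the 0/1 neighbor matrix of the r-neighborhoods, and the censored mean assumes that censored outcomes are recorded as zero, as in the R&D example; the exact form of system (7) is given in the paper.

```python
import numpy as np
from scipy.stats import norm

def censored_mean(mu, sigma, gamma):
    """E[y] for y = y* if y* >= gamma and y = 0 otherwise, with y* ~ N(mu, sigma^2);
    a stylized stand-in for the conditional mean functions entering system (7)."""
    c = (gamma - mu) / sigma
    return mu * (1.0 - norm.cdf(c)) + sigma * norm.pdf(c)

def solve_equilibrium(xb, W, alpha, sigma, gamma, tol=1e-10, max_iter=1000):
    """Solve the equilibrium system m = E[y | x] by fixed-point iteration.
    Under a contraction condition such as Assumption 1, the map is a contraction
    and the iteration converges to the unique equilibrium."""
    m = xb.copy()                      # initial guess for the conditional means
    for _ in range(max_iter):
        mu = xb + alpha * (W @ m)      # latent index given current beliefs
        m_new = censored_mean(mu, sigma, gamma)
        if np.max(np.abs(m_new - m)) < tol:
            return m_new
        m = m_new
    raise RuntimeError("fixed-point iteration did not converge")
```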
An appealing feature of Equation (1) is that it reduces to the popular Tobit model, which is part of any regression package. However, the difficulty is that the latent index depends on latent regressors, namely the equilibrium expectations of the neighbors' choices. Thus, one would first need to obtain consistent estimates of the latent regressors, and then use any consistent estimation procedure for the Tobit model.
Since consistency of any estimation method hinges upon uniqueness of equilibrium, we first prove existence and uniqueness of the pure strategy equilibrium. To this end, we maintain the following assumption.
Assumption 1. The shocks are i.i.d. normal with mean zero and variance σ², and the interaction coefficients satisfy a contraction condition, stated in terms of the p.d.f. ϕ of the standard normal distribution, that bounds the total strength of interactions within the k-neighbor neighborhoods.
This assumption restricts the strength of interactions, captured by the coefficients α: interactions must not be too strong for a stable equilibrium to exist. Intuitively, if the interactions are long-ranged and too strong, then the effect of remote neighbors is substantial and may lead to instability and multiple equilibria. Since the condition involves only the estimated coefficients, and the number of neighbors, k, is typically known, the assumption is testable.
Assumption 1 is similar to Assumptions B and C in [4], which restrict the strength of interactions to obtain a unique equilibrium in a discrete choice game of social interactions.
Based on this assumption, we can now show existence and uniqueness of equilibrium.
Theorem 1. Under Assumption 1, there exists a unique equilibrium of model (1).

In general, without restrictions on the parameters, multiple equilibria could occur. If one does not want to impose restrictions directly, one can use the Mathematical Program with Equilibrium Constraints (MPEC) routine to deal with multiple equilibria implicitly, by choosing the equilibrium that maximizes the empirical likelihood.
In equilibrium, the policy variables will be correlated across players. To characterize their dependence, we assume that the process is indexed by a vector of locations on the integer lattice, and hence can be viewed as a random field. In other words, the data-generating process is a triangular array of vector-valued random fields defined on a probability space and observed on a sequence of sample regions. In the following, to simplify notation, we suppress the index denoting the sample region. Furthermore, we denote by ‖·‖ the Euclidean norm and by ‖·‖_p the L_p-norm.
Assumption 2. The data-generating process is a triangular array of random fields indexed by locations in a sequence of sample regions whose cardinality tends to infinity as n → ∞. The distance between players i and j is measured by the Euclidean metric.
This assumption implies that the players’ locations are exogenous, i.e., they are known and determined outside the model. Extensions to endogenous locations would require explicit modeling of the location choice, and would therefore considerably complicate the model. This extension is an interesting direction for future research.
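As an illustration of the neighborhood structure implied by Assumption 2, the following minimal Python sketch builds the r-neighborhoods for players located on a two-dimensional integer lattice; the function name and the 0/1 neighbor-matrix representation are our own conventions, not the paper's.

```python
import numpy as np

def build_neighbors(coords, r):
    """Return the 0/1 neighbor matrix W implied by the Euclidean metric:
    W[i, j] = 1 iff 0 < ||coords[i] - coords[j]|| <= r.
    coords: (n, d) array of lattice locations; r: neighborhood radius."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    return ((dist > 0) & (dist <= r)).astype(float)

# Example: on a 5 x 5 grid with radius r = 1, each interior player has k = 4 neighbors.
coords = np.array([(i, j) for i in range(5) for j in range(5)])
W = build_neighbors(coords, 1.0)
```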
Given Assumption 2, it turns out that the equilibrium policy variables satisfy a weak dependence condition known as near-epoch dependence (NED), see [5], under the same condition that ensures uniqueness of equilibrium. For ease of reference, we state the definition of NED random fields.
Definition 2. The triangular array of random fields is L_p-NED on the input field iff the L_p-distance between each variable and its conditional expectation, given the input variables located within distance m of its location, is bounded by a scaling constant times ψ(m), for some sequence ψ(m) → 0 as m → ∞.
Theorem 2. Suppose Assumptions 1 and 2 hold; then (i) the equilibrium conditional means are NED on the input field, with NED numbers whose expression involves the integer part of the distance m; (ii) the equilibrium strategies are NED on the input field, with NED numbers involving a constant c that does not depend on m; and (iii) the censored equilibrium strategies are NED on the input field as well.
The value of the constant c is given in the proof, but it is not important for what follows.
4. Identification and Estimation
We now discuss identification and estimation of our model. Let θ denote the vector of the remaining coefficients of model (1), i.e., all unknown parameters other than the threshold γ. Given the specification, it is natural to identify and estimate all unknown parameters based on the likelihood function; the log likelihood function of the model is given in (8). Likelihood function (8) involves an unknown threshold parameter, which is in contrast to the standard Tobit model, where the threshold is assumed to be known and equal to zero. The maximum likelihood (ML) estimator of γ is the minimum order statistic of the uncensored subsample. More specifically, partition the dependent variable and the regressor matrix into a censored and an uncensored part, with the subscripts indicating whether observations come from the censored or the uncensored subsample, and take the minimum of the uncensored observations. As shown in Proposition 1 below, this minimum order statistic is a consistent estimator of γ. The ML estimators of the other parameters θ can then be obtained by standard differentiation techniques.
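A minimal sketch of this first step, assuming (as in the R&D example) that censored outcomes are recorded as zero, so that the uncensored subsample consists of the strictly positive observations:

```python
import numpy as np

def estimate_threshold(y):
    """First-step estimator of gamma: the minimum order statistic of the
    uncensored subsample. Censored outcomes are assumed to be recorded as zero."""
    y = np.asarray(y)
    uncensored = y[y > 0]
    if uncensored.size == 0:
        raise ValueError("no uncensored observations")
    return uncensored.min()
```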
Assumption 3. Suppose (i) a moment condition holds for the uncensored observations; and (ii) the regressor process is a stationary α-mixing process whose mixing coefficients satisfy a standard summability condition.

Proposition 1. Under Assumptions 1–3, n times the estimation error of the minimum order statistic converges in distribution to an exponential random variable, with a rate parameter that depends on the distribution of the regressor vector from the uncensored subsample.
Thus, the minimum order statistic estimator of γ is n-consistent and asymptotically exponentially distributed. For an i.i.d. sample, this result has been established by Carson and Sun [19], so Proposition 1 extends [19] to the spatially dependent case. The superconsistency of the estimator is a well-known consequence of the dependence of the support of the outcome on γ.
Proposition 1 implies that γ is identified. The remaining parameters, θ, can now be identified from the likelihood function. Alternatively, one can identify θ from the conditional mean function of the censored outcome and estimate it by the least squares procedure in (10).
In contrast to the standard Tobit model with zero threshold, the ML estimator does not strictly dominate the least squares (LS) estimator in our model, due to the presence of the first-step estimator. The reason is that the LS objective function is continuous in γ, while the likelihood function is not. The latter implies that small finite-sample biases in the first-step estimator may cause sizeable finite-sample biases in the ML estimates of θ. This prediction is confirmed by the simulation results of Section 6, which suggest larger finite-sample biases in the ML than in the LS estimator. This is the main rationale for considering the LS procedure as an alternative to the ML procedure in our model.
Thus, estimation of model (1) can be carried out in two steps. First, estimate the threshold parameter γ by the minimum order statistic of the uncensored subsample, and substitute it for the true threshold in (8) and (10). Then, estimate the remaining parameters θ in (8) and (10) by the ML or LS procedures, respectively. Note that the least squares estimator of γ in (1) would be imprecise due to near multicollinearity of the intercept and the threshold. Therefore, we use the first-step estimator of γ in both procedures.
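For concreteness, a sketch of the second-step ML objective, written under our notational conventions for the case where censored outcomes are recorded as zero, with the threshold replaced by its first-step estimate and the latent index computed from the equilibrium expectations of the neighbors' choices:

```python
import numpy as np
from scipy.stats import norm

def tobit_loglik(y, mu, sigma, gamma_hat):
    """Stylized log likelihood in the spirit of (8): censored observations (y == 0)
    contribute the probability that the latent outcome falls below the threshold;
    uncensored observations contribute the normal density of the observed outcome.
    mu is the latent conditional mean, e.g. b + alpha * (W @ m) + X @ beta."""
    y, mu = np.asarray(y), np.asarray(mu)
    censored = (y <= 0)
    c = (gamma_hat - mu[censored]) / sigma
    ll_censored = norm.logcdf(c).sum()
    resid = (y[~censored] - mu[~censored]) / sigma
    ll_uncensored = (norm.logpdf(resid) - np.log(sigma)).sum()
    return ll_censored + ll_uncensored
```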
We now present sufficient conditions for identification of θ.
Assumption 4. Suppose (i) at least one of the components of the regressor vector has full support; and (ii) the relevant second-moment matrix of the regressors is positive definite.
Theorem 3. Under Assumptions 1–4, the population objective function is uniquely maximized at the true parameter value, both for (A) the log likelihood function defined in (8), and for (B) the least squares objective function defined in (10).
Practically, the second-step estimation of θ could be implemented through the following nested fixed-point (NFXP) algorithm: (i) in an inner loop, for a given θ, find the unique solution of the equilibrium equations (7) by the fixed-point algorithm; and (ii) in an outer loop, search over θ to maximize the objective function. Writing the solution of the equilibrium equations (7) as a function of θ, the resulting estimator can be represented as in (11), where the objective is either the log likelihood function defined in (8) or minus the squared deviation of the outcome from the conditional mean defined in (10). This formulation makes explicit the dependence of the equilibrium variables on the estimated parameters. Given the superconsistency of the first-step estimator, the resulting second-step maximum likelihood or least squares estimators of θ will be root-n consistent, asymptotically normal and independent of the first-step estimator, as shown in Theorem 4 below. However, the NFXP algorithm will be computationally costly for large cross-sectional datasets.
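A compact sketch of this NFXP loop, reusing the solve_equilibrium and tobit_loglik sketches above; the parameter packing and the derivative-free optimizer are our illustrative choices, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize

def nfxp_estimate(y, X, W, gamma_hat, theta0):
    """Nested fixed-point estimation: the inner loop solves the equilibrium
    system for the current parameter value; the outer loop maximizes the
    log likelihood over theta = (b, alpha, beta_1, ..., beta_K, sigma)."""
    def neg_loglik(theta):
        b, alpha, sigma = theta[0], theta[1], abs(theta[-1])
        beta = theta[2:-1]
        xb = b + X @ beta
        m = solve_equilibrium(xb, W, alpha, sigma, gamma_hat)   # inner fixed point
        mu = xb + alpha * (W @ m)
        return -tobit_loglik(y, mu, sigma, gamma_hat)

    result = minimize(neg_loglik, theta0, method="Nelder-Mead")
    return result.x
```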
To overcome this problem, we instead use the constrained optimization algorithm proposed by Su and Judd [7]. The idea is to solve the constrained optimization problem (12), in which the equilibrium system (7), written in vector form, enters as a set of constraints. Note that the vector of equilibrium expectations in this formulation does not depend on θ; it is chosen simultaneously with θ to maximize the objective function subject to the equilibrium constraints. This obviates the need to solve the multi-dimensional fixed point problem for the equilibrium expectations at each iteration on θ.
Su and Judd [7] prove the equivalence of problems (11) and (12) provided that the model is identified. They also demonstrate the computational advantage of this constrained optimization algorithm over the NFXP algorithm in the context of a single-agent dynamic discrete choice model. In particular, they show that the proposed algorithm leads, on average, to a ten-fold reduction of the computational time relative to the NFXP algorithm.
Since our model is identified by Theorem 3, the maximizer of problem (11) equals the maximizer of problem (12) by Proposition 1 of Su and Judd [7], and one can thus replace the computationally intensive problem (11) by the simpler problem (12). We investigate the performance of the constrained optimization algorithm for our model in the Monte Carlo study of Section 6.
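The following sketch illustrates the constrained formulation: the parameters and the n-vector of equilibrium expectations are stacked into one decision vector, the equilibrium equations enter as equality constraints, and a general-purpose NLP solver handles both. It reuses the censored_mean and tobit_loglik sketches above; in practice, a dedicated large-scale solver would be used rather than SLSQP.

```python
import numpy as np
from scipy.optimize import minimize

def mpec_estimate(y, X, W, gamma_hat, theta0):
    """Su-Judd style estimation: maximize the likelihood jointly over theta and the
    equilibrium expectations m, imposing m = E[y | x] as equality constraints."""
    n_theta = len(theta0)
    z0 = np.concatenate([theta0, np.maximum(np.asarray(y, float), 0.0)])

    def unpack(z):
        theta, m = z[:n_theta], z[n_theta:]
        b, alpha, sigma = theta[0], theta[1], abs(theta[-1])
        beta = theta[2:-1]
        mu = b + X @ beta + alpha * (W @ m)
        return mu, sigma, m

    def neg_obj(z):
        mu, sigma, _ = unpack(z)
        return -tobit_loglik(y, mu, sigma, gamma_hat)

    def equilibrium_gap(z):                 # equals zero exactly at an equilibrium
        mu, sigma, m = unpack(z)
        return m - censored_mean(mu, sigma, gamma_hat)

    result = minimize(neg_obj, z0, method="SLSQP",
                      constraints=[{"type": "eq", "fun": equilibrium_gap}])
    return result.x[:n_theta]
```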
5. Consistency and Asymptotic Normality
We next show consistency and asymptotic normality of the maximum likelihood and least squares estimators. To this end, we need the following assumption.
Assumption 5. Suppose (i) the parameter spaces Θ and Γ are compact and contain the true parameter values in their interior; and (ii) the regressors and equilibrium variables satisfy suitable moment conditions.
Part (i) of Assumption 5 is the standard condition on the parameter space and the true parameter value; Part (ii) is used to verify uniform convergence of various sample functions. Generally, the above assumptions are slightly stronger than those in the fully parametric Tobit model with zero threshold, since our Tobit estimator of θ relies on a nonparametric first-step estimator of γ.
Theorem 4. Under Assumptions 1–5, the maximum likelihood and least squares estimators are both consistent and asymptotically normal, i.e., after centering at the true value and scaling by the square root of the sample size, they converge in distribution to a normal random vector with mean zero and a finite asymptotic covariance matrix.
Thus, both the maximum likelihood and least squares estimators of θ are root-n consistent and asymptotically normal. To conduct inference, it remains to obtain a consistent estimate of the asymptotic covariance matrix. For this purpose, one can employ a spatial HAC estimator in which pairs of observations are weighted by a d-dimensional symmetric kernel evaluated at their distance divided by a bandwidth parameter. Jenish [20] proves consistency of this estimator for more general nonparametric estimators of γ. In our model, consistency is achieved by bandwidth parameters satisfying a suitable rate condition.
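A sketch of such a kernel-weighted covariance estimator for the score contributions; the Bartlett kernel of the scalar distance and the helper name are illustrative choices, not necessarily the kernel used in the paper.

```python
import numpy as np

def spatial_hac(scores, coords, bandwidth):
    """Kernel-weighted covariance of score/moment contributions:
    V = (1/n) * sum_i sum_j K(d_ij / h) * s_i s_j', with K a Bartlett kernel
    that downweights distant pairs and vanishes beyond the bandwidth h.
    scores: (n, p) contributions evaluated at the estimates; coords: (n, d) locations."""
    scores = np.asarray(scores, float)
    coords = np.asarray(coords, float)
    n, p = scores.shape
    V = np.zeros((p, p))
    for i in range(n):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        w = np.maximum(1.0 - dist / bandwidth, 0.0)       # Bartlett weights
        V += np.outer(scores[i], (w[:, None] * scores).sum(axis=0))
    return V / n
```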
6. Numerical Results
In this section, we examine the finite sample properties of the maximum likelihood (ML) and least squares (LS) estimators of our censored model, as well as the performance of the Su-Judd [7] algorithm.
Throughout, the data reside on the two-dimensional integer lattice, with each observation indexed by its vector of coordinates. The data are simulated on a rectangular grid of locations. To control for boundary effects, we discard the 300 outer boundary points along each of the axes and use the remaining sample for estimation.
Our experiment consists of two stages: (i) simulation; and (ii) estimation. In the first stage, we first simulate two i.i.d. processes, the observed characteristics and the private shocks, which are independent of each other. Next, using the fixed-point algorithm, we generate the equilibrium conditional means according to the equilibrium system (13), and then the latent outcomes according to the model, with the parameters set at their true values. Last, we form the observed censored process by applying the threshold.
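A compact sketch of this first stage under illustrative parameter values (not those of the paper), reusing the build_neighbors and solve_equilibrium sketches from above and omitting the boundary-trimming step:

```python
import numpy as np

rng = np.random.default_rng(0)
side = 40
coords = np.array([(i, j) for i in range(side) for j in range(side)])
n = side * side

x = rng.standard_normal(n)                    # i.i.d. observed characteristic
eps = rng.standard_normal(n)                  # i.i.d. private shock
b, alpha, beta, sigma, gamma = 1.0, 0.2, 1.0, 1.0, 0.5   # illustrative values only

W = build_neighbors(coords, 1.0)              # r-neighborhoods on the lattice
xb = b + beta * x
m = solve_equilibrium(xb, W, alpha, sigma, gamma)    # equilibrium expectations
y_star = xb + alpha * (W @ m) + sigma * eps          # latent outcomes
y = np.where(y_star >= gamma, y_star, 0.0)           # censored observed outcomes
```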
In the second stage, we first construct the minimum order statistic estimator of γ, and then use the Su-Judd [7] constrained optimization algorithm to estimate the remaining parameters θ. As discussed in Section 4, we estimate the n-dimensional vector of endogenous variables jointly with θ, instead of computing it at each iteration on θ, i.e., we solve the constrained optimization problem in which the equilibrium system (13), written in vector form, enters as a set of constraints.
Table 1. Estimation of the auto-regressive parameter α.

Maximum Likelihood
| Sample Size | Mean | Bias (%) | SD | RMSE | 25th Pct | 50th Pct | 75th Pct |

Least Squares
| Sample Size | Mean | Bias (%) | SD | RMSE | 25th Pct | 50th Pct | 75th Pct |
Both the ML and LS estimators of α behave well for all sample sizes. The finite-sample bias declines rapidly from about 1.1% to 0.3% in the case of the ML estimator, and from 0.55% to 0.08% in the case of the LS estimator. These results suggest that a five-fold increase in the sample size leads to a more than three-fold reduction in the ML bias, and a more than six-fold decrease in the LS bias, which is consistent with our asymptotic theory. The standard errors also fall off rapidly with the sample size. A similar pattern is observed for the estimates of the slope β, shown in Table 2.

Table 3 contains the minimum order statistic estimates of γ. The finite-sample bias diminishes from 4.5% to 0.8%, which means that a five-fold increase in the sample size is associated with a more than five-fold reduction in the bias. This is in line with the theoretical prediction of n-consistency of the minimum order statistic.
Table 2. Estimation of the slope β.

Maximum Likelihood
| Sample Size | Mean | Bias (%) | SD | RMSE | 25th Pct | 50th Pct | 75th Pct |

Least Squares
| Sample Size | Mean | Bias (%) | SD | RMSE | 25th Pct | 50th Pct | 75th Pct |
Table 3. Estimation of the threshold γ.

Min. Order Statistics
| Sample Size | Mean | Bias (%) | SD | RMSE | 25th Pct | 50th Pct | 75th Pct |
Next, Table 4 and Table 5 present the estimates of the intercept b and the standard deviation σ, respectively. The maximum likelihood estimates of b and σ exhibit larger biases than those of α and β. However, the biases decrease as the sample size increases: from 5.5% to 1.4% in the case of b, and from 15.9% to 6.9% in the case of σ. Thus, the biases still halve when the sample size increases four-fold, consistent with the asymptotic theory. The larger small-sample biases could be explained by weak identification, or near multicollinearity, introduced by the inverse Mills ratio, which is approximately linear over a wide range of its argument.
Table 4. Estimation of the intercept b.

Maximum Likelihood
| Sample Size | Mean | Bias (%) | SD | RMSE | 25th Pct | 50th Pct | 75th Pct |

Least Squares
| Sample Size | Mean | Bias (%) | SD | RMSE | 25th Pct | 50th Pct | 75th Pct |
Table 5. Estimation of the standard deviation σ.

Maximum Likelihood
| Sample Size | Mean | Bias (%) | SD | RMSE | 25th Pct | 50th Pct | 75th Pct |

Least Squares
| Sample Size | Mean | Bias (%) | SD | RMSE | 25th Pct | 50th Pct | 75th Pct |
Interestingly, the LS estimates of all parameters, including b and σ, have smaller finite-sample biases than the respective ML estimates. The reason is that the LS objective function is continuous in the first-step nonparametric estimator of γ, while the likelihood function is not, and small first-step biases in γ may get disproportionately amplified and translate into sizeable second-step biases in θ. Consequently, the biases in the ML estimates of b and σ due to weak identification are further exacerbated by discontinuity of the likelihood function. Nevertheless, as expected, the LS estimator has larger standard errors than the ML estimator across all parameters. Thus, in contrast to the standard Tobit model with zero-threshold, the ML estimator does not strictly dominate the LS estimator in our model.
Finally, Table 6 reports the computational time and the number of converged iterations for the Su-Judd [7] algorithm. The algorithm performs well for all sample sizes: it converges in almost 99% of the Monte Carlo iterations, and the time costs are less than two hours even for the largest sample sizes. For comparison, the NFXP algorithm would take about 130–150 hours to estimate the model for the same sample sizes. Thus, the Su-Judd [7] algorithm offers considerable time savings over standard nested fixed-point algorithms.
Table 6. Algorithm Performance.

Maximum Likelihood
| Sample Size | | | | | |
|---|---|---|---|---|---|
| Number of converged iterations | 1000 | 999 | 991 | 992 | 961 |
| Run Time | 2 min | 6 min | 27 min | 61 min | 119 min |

Least Squares
| Sample Size | | | | | |
|---|---|---|---|---|---|
| Number of converged iterations | 942 | 996 | 997 | 999 | 998 |
| Run Time | 2 min | 5 min | 24 min | 57 min | 112 min |
Overall, the simulation results are consistent with our asymptotic theory: the finite-sample biases and standard errors of the ML and LS estimators decay rapidly with the sample size. Moreover, the Su-Judd [7] constrained optimization algorithm appears to be a viable and effective numerical procedure for estimating games with a large number of players, including our model.
7. Conclusions
In this paper, we study identification and estimation of a static game of incomplete information with censored strategies. Specifically, we show existence and uniqueness of an equilibrium as well as its weak dependence property under a condition that restricts the strength of interactions among the players. We then show identification of the parameters and estimate them by maximum likelihood and least squares procedures. The resulting estimators are shown to be consistent and asymptotically normal. We also demonstrate application of our results to modeling spillovers in firms’ R&D investment and peer effects in female labor supply.
One direction for future research is to relax the normality assumption on the errors and obtain identification under more general error distributions whose conditional mean functions satisfy contraction mapping conditions, similar to the one used in the paper. Another extension could be to allow for random threshold effects in the outcome variable, using some parametric family of distributions. One can also allow for truncated strategies by slight modifications in the likelihood function and Assumption 1. Finally, instead of the regular lattice, one can consider players located at the nodes of some graph, which describes the network structure, as in the social interactions literature.