Strategic Interaction Model with Censored Strategies

In this paper, we develop a new model of a static game of incomplete information with a large number of players. The model has two key distinguishing features. First, the strategies are subject to threshold effects, and can be interpreted as dependent censored random variables. Second, in contrast to most of the existing literature, our inferential theory relies on a large number of players, rather than a large number of independent repetitions of the same game. We establish existence and uniqueness of the pure strategy equilibrium, and prove that the censored equilibrium strategies satisfy a near-epoch dependence property. We then show that the normal maximum likelihood and least squares estimators of this censored model are consistent and asymptotically normal. Our model can be useful in a wide variety of settings, including investment, R&D, labor supply, and social interaction applications.


Introduction
Identification and estimation of strategic interaction models have recently received a great deal of attention in econometrics, owing to the growing interest and application of stochastic games in various fields including industrial organization, labor, political and international economics.Most of the existing literature has focused on discrete choice games, see [1,2] for a survey of recent results.In this literature, the observed data are assumed to arise from an equilibrium of a game played by a finite number of players, and therefore, to be correlated across players.Typically, the number of players is assumed to be fixed, and the asymptotic inferential theory relies on a large number of independent repetitions of the same game in different markets or in a single market at different points of time.Two notable exceptions are Menzel [3] and Xu [4], who develop the inferential theory of discrete choice games based on a large number of players.
In this paper, we develop a new model of a static game of incomplete information with a large number of players.The model has two key distinguishing features.First, the strategies are subject to threshold effects, and can be interpreted as dependent censored random variables, e.g., R&D investment and labor supply.Second, the game is played in a single market and is not repeated over time.To develop the asymptotic theory, we instead assume that the number of players grows unboundedly, and the players reside on an exogenously given lattice so that the vector of their choices and characteristics can be viewed as a dependent random field, which can be handled by the limit theorems for near-epoch dependent (NED) random fields established by Jenish and Prucha [5].
We derive this model explicitly in two game-theoretical applications: (i) R&D investment by firms under strategic complementarities; and (ii) labor supply decision by women under peer effects.The set-up is standard for a static game of incomplete information: each player's payoff function depends on her choice and choices of other players, her commonly observed characteristics, and her private characteristic unobserved by other players; players move simultaneously based on their expectations about the choices of other players, and in equilibrium, players have self-consistent expectations, see [6], i.e., their subjective expectations coincide with the expectation based on the equilibrium distribution of strategies conditional on commonly observed variables.We assume that private characteristics are i.i.d.normal across players, and prove existence and uniqueness of the pure strategy equilibrium under some mild conditions.We then show that the censored equilibrium strategies also satisfy the NED property under the same conditions.
Under normality of private shocks, the equilibrium strategies boil down to a Tobit econometric model.However, in contrast to the standard Tobit model, our censored model involves a non-zero threshold parameter that needs to be estimated.Therefore, we use the following two-step semiparametric procedure: we first estimate the threshold by the minimum order statistic of the uncensored subsample, and then estimate the remaining parameters either by the maximum likelihood or least squares method.Unlike the standard Tobit model, the maximum likelihood estimator does not strictly dominate the least squares estimator in our model due to discontinuous dependence of the likelihood on the first-step estimator, which may amplify finite-sample biases stemming from the first-step estimation.This provides a rationale for considering the least squares estimation as an alternative to the maximum likelihood procedure.We establish consistency and asymptotic distributions of these estimators.The minimum order statistic is n-consistent and asymptotically exponentially distributed, while the maximum likelihood and least squares estimators are √ n-consistent and asymptotically normal.A Monte Carlo study suggests that all these estimators perform well in finite samples.
Finally, we address the computational challenges of our game with a large number of players.The standard estimation of games involves computing the equilibrium for each alternative parameter value and then optimizing the objective function over parameter values, and thus presents a formidable computational burden.To tackle it, we use the constrained optimization algorithm proposed by Su and Judd [7], which treats the equilibrium equations as constraints and optimizes simultaneously over parameters and equilibrium variables, thereby avoiding calculation of the equilibrium at each iteration on the parameter value.Su and Judd [7] show equivalence of this constrained optimization problem to the original problem.Our simulations confirm viability and a significant computational efficiency of the Su-Judd algorithm in our model.
To our knowledge, the proposed censored model has not been considered in the existing literature.Most of the existing results have dealt with discrete choice games, e.g., [8].Recently, Xu and Lee [9] analyzed a spatially autoregressive (SAR) Tobit model, which can be viewed as a censored version the Cliff-Ord type linear SAR model with a known spatial weight matrix.Xu and Lee [9] establish the NED property as well as consistency and asymptotic normality of the maximum likelihood estimator using the limit theorems of Jenish and Prucha [5].Though not explicitly demonstrated, this SAR Tobit model can be interpreted as an equilibrium of a static game of complete information, while our model is a game of incomplete information with a different concept of equilibrium and, consequently, qualitatively different implications.Moreover, the presence of latent endogenous variables and non-zero threshold in our model pose additional statistical and computational difficulties.Thus, the two papers are complementary to each other.
The paper is organized as follows.Section 2 describes and derives the model in two examples.Section 3 establishes existence and uniqueness of the equilibrium, and proves the NED property of the equilibrium strategies.Section 4 discusses identification and estimation of the model.Consistency and asymptotic distributions of the estimators are established in Section 5. Section 6 contains a Monte Carlo study, and Section 7 concludes.All proofs are collected in the appendices.

Model
In this paper, we are concerned with estimation of the following econometric model: where is agent i's expectation, given its information set F i , about the choices of its neighbors, Y jn , within the neighborhood of radius r, N i = N i (r), containing a fixed number of neighbors k = |N i | that does not depend on i; X i is the vector of observed characteristics of agent i; ε i is agent i's private characteristic observed only by agent i, and (α 0 , β 0 , b 0 , γ 0 ), γ 0 ≥ 0, is the vector of unknown coefficients.The distribution of private shocks is known to all players.It is assumed that ε i ∼ i.i.d.N (0, σ 2 0 ), and are independent of {X i }.The information set of each player consists of the entire state vector, X n = (X 1 , ..., X n ) , and its private information ε i , i.e., The choice of player i is assumed to be directly affected by its neighbors only in a fixed neighborhood of the known radius, r > 0, with respect to some socio-economic metric.However, it will be indirectly affected by all other players.The number of the neighbors within the r-neighborhood of each agent is assumed to be fixed and equal to k.To avoid the incidental parameters problem, the k coefficients (α j0 ) j∈N i , measuring the effect of these k neighbors, are assumed to depend only on the relative locations of i and j, but not on j or i.We formally specify the metric and the neighborhood structure in the following section.
The above assumptions seem reasonable in many empirical settings.For example, in their R&D decision, firms would take into account R&D of its neighbors within a certain distance in the geographic (or product characteristic) space, rather than all firms in the market.This is due to the fact that technological spillovers, knowledge diffusion and labor mobility-determinants of R&D diffusion-are usually confined to a limited geographical or technological area.
Aside from the unobserved heterogeneity captured by the private shocks ε i , we do not allow individual heterogeneity in the parameters.The reason is that the model assumes only one repetition of the game and the number of agents growing to infinity to develop the asymptotic theory.Clearly, allowing heterogenous parameters across individuals will result in inconsistency.
Model ( 1) is fairly general for applications.It can arise as a system of best response functions of a static game of incomplete information among n players.Below, we derive these equations for two strategic interaction models: (i) R&D investment by firms; and (ii) labor supply by women.In these models, decisions of players exhibit strategic complementarities, and are subject to threshold effects.

Spillovers in R&D Investment
A large body of empirical evidence suggests presence of technology and R&D spillovers among firms, e.g., [10].Audretsch and Feldman [10] find that knowledge spillovers are more prevalent in the industries that exhibit spatial clustering.Positive R&D spillovers may occur through several channels including knowledge transfers, labor mobility and imitation.Therefore, it is reasonable to expect the magnitude of such a spillover effect to depend on the geographical and technological distances between firms.As a result, firms' R&D expenditures may be spatially correlated, and the magnitude of this correlation often decays with the distance between firms.
The literature distinguishes two major channels through which R&D can raise firms' profits: cost-reducing and demand-creating effects.The former allows firms to carry out process improvements leading to efficiency gains and cost reduction, while the latter enables firms to improve the quality of their product and thereby boost the demand.Levin and Reiss [11] analyze a model of monopolistic competition with both demand-creating and cost-reducing R&D spillovers across n firms.Based on a sample of US manufacturing firms, the authors find statistically significant, sizeable spillovers in the cost-reducing R&D and insignificant, small spillovers in the demand-creating R&D in most industries.Levin and Reiss [11] also find the elasticity of product quality to firm's own R&D to be much higher than that of cost to firm's own R&D.Other theoretical models of R&D spillovers include d'Aspremont and Jacquemin [12], and Motta [13], among others.
Yet all these papers model the R&D investment as a continuous variable thereby neglecting the strong empirical evidence that a sizeable proportion of firms do not undertake R&D activities, see, e.g., [14].One plausible explanation is that the demand-creating effect of R&D is subject to threshold effects: the quality could be raised only after a certain minimum level R&D investment is attained; R&D has no effect on the quality below this level.Thus, the R&D expenditure could be viewed as a censored decision variable whose optimal values below a certain threshold are unobserved.This type of model in the single-firm setup is analyzed by Gonzalez and Jaumandreu [15].
To study spatial spillovers in R&D investment, we develop a simple model of strategic interaction with a censored decision variable that incorporates the empirical findings discussed above.We consider a single, monopolistically competitive industry composed of a large number, n, firms, each producing a brand of the same product differentiated by quality.Let p i, q i , s i and y i denote, respectively, the price, demand, product quality and R&D expenditure of firm i.To derive q i , we employ a variant of the Dixit-Stiglitz [16] model of monopolistic competition in which the CES utility of a representative consumer is augmented with preference for quality: where η > 0 is a quality sensitivity parameter.Utility maximization yields the demand for firm i of the form: is the elasticity of substitution between the quality-adjusted goods, = η (ν − 1), K = Ip −1 with I being consumer's income and p = n i=1 (p i /s η i ) 1−ν is a quality-adjusted price index.To obtain non-increasing marginal demand for quality, ∂ 2 q i /∂s 2 i ≤ 0, suppose that η ≤ 1/ (ν − 1) .If the number of firms is large, it is reasonable to assume the effect of a single firm's decision on the industry index p to be negligible, i.e., K is constant, and normalize K = 1.
Following Gonzalez and Jaumandreu [15], we assume that firm's own R&D expenditure affects only its product quality, subject to a technological constraint: where y is the minimum investment required for quality improvements, 0 < δ < 1 is the R&D sensitivity parameter.Throughout, we use y i +1 instead of y i to ensure that the logarithm of the censored investment is defined for zero values, and let Y i = log (y i + 1).It is a convenient normalization, which does not affect the results.Furthermore, in light of the above empirical findings, we assume that other firms' R&D have only a cost-reducing effect on firm i, and this effect is limited to the fixed r-neighborhood, N i (r), of firm i: where c i and X i are, respectively, the marginal cost and vector of observed cost-determinants of firm i, is the vector of the log R&D choices of all firms except i, and e i is firm i's idiosyncratic cost component.The coefficient α ≤ 0 measures the strength of this spillover effect.
Suppose that all firms observe X n = (X 1 , ..., X n ) , but e i is observed only by firm i.Given this uncertainty about the choices of other firms, following Durlauf [17], see also [6], we assume that each firm i decides on its R&D investment based on its beliefs about the choices of the other firms, E i (Y −i ) = E Y −i |X n , e i , which are formed as the conditional expectation given all the information available to firm i.
Based on these beliefs, firms choose simultaneously price, p i , and R&D investment, y i , to maximize their profits subject to a technological constraint, i.e., solve where C(y i ) is the cost of investment.The nonstochastic threshold y ≥ 0 is assumed to be observed and constant across all firms.
Lemma 1.The solution to optimization problem (2) and ( 3) is given by (1) with α j0 = −τ α, j = 1, .., k, If α < 0, i.e., R&D of the neighbors has a cost-reducing effect on firm i, then both the probability and intensity of firm i's R&D increases with the expected R&D of its neighbors.In other words, there are strategic complementarities or positive externalities in the R&D decision of firms.Furthermore, the probability of R&D is also increasing in (i) the elasticity of demand with respect to quality, higher ; (ii) the elasticity of quality with respect to R&D, higher δ; and (iii) the market power, lower ν.The latter is consistent with the Schumpeterian argument that economies of scales make R&D more attractive to large firms than to small firms.

Peer Effects in Female Labor Supply
Our next example involves social interactions in the female labor supply.Suppose the utility of female i is defined over her consumption, c i , and leisure, l i , as follows: where 0 < δ < 1 is the parameter characterizing the relative preference for consumption over leisure.Let y i be the labor supply of female i, and let the weight on the leisure, h i , capture the peer effects that depend on the labor supply decisions of female i's peers in her social neighborhood, referred to as friends, as follows: where X i is the vector of observed characteristics of woman i, Y j = log (y j + 1) is the log labor supply of woman i's friends, and e i is her private characteristic unobserved by other women.As in the previous example, we use y i + 1 instead of y i to ensure that log of the censored labor supply is defined for zero values.In presence of positive peer effects, α < 0, which implies mutual reinforcement in the choices within the social group.
As before, all women observe X n = (X 1 , ..., X n ) , but e i is observed only by i. Woman i makes her decision based on her beliefs about the choices of her peer group, E i (Y −i ) = E Y −i |X n , e i , which are formed as the conditional expectation given all the information available to woman i.Based on these beliefs, women simultaneously maximize their utility subject to threshold effects: where w is the wage, T is the time endowment, c is the reservation labor income, which can be interpreted as welfare or other government transfers.The nonstochastic threshold c ≥ 1 is assumed to be observed and constant across women.
Lemma 2. The solution to optimization problem (4) and ( 5) is given by (1) with If α < 0, i.e., there are positive peer effects, then both the probability and magnitude of woman i's labor supply increases with the expected labor supply of her peers.

Equilibrium: Characterization and Weak Dependence
We assume that in equilibrium, players have self-consistent expectations, i.e., their subjective expectations or beliefs coincide with the expectation based on the equilibrium distribution of strategies conditional on X n .That is, where the expectation E •|X n is taken with respect to the equilibrium conditional distribution of strategies.The last equality follows from independence of {ε i }, and independence of {ε i } and {X i }.
Suppose that the ε i are i.i.d.N (0, σ 2 0 ).Taking conditional expectation of Equation ( 1) with respect to the equilibrium distribution of strategies, conditional on X n , yields: where Φ and φ are, respectively, the c.d.f. and p.d.f. of the standard normal distribution.Provided that they are well-defined, strategies {Y in , i = 1, .., n} are independent across i conditional on X n and have censored normal distributions with the means Y in , i = 1, .., n , the common variance σ 0 and the common nonstochastic threshold γ 0 .In equilibrium, Y in , i = 1, .., n satisfy system (7).If this system has a unique solution, the corresponding equilibrium strategies, {Y in , i = 1, .., n} , will be also unique with probability 1, since a censored normal variable is uniquely characterized by its mean, variance and threshold.This leads to the following characterization of equilibrium.Definition 1.An equilibrium is a set of policy functions {Y in , i = 1, .., n} whose conditional mean functions satisfy system (7).
A similar characterization of equilibrium in discrete games of incomplete information is used in [8].An appealing feature of Equation ( 1) is that it reduces to the popular Tobit model, which is part of any regression package.However, the difficulty is that {Y in } depends on the latent regressors, Y in .Thus, one would need first to obtain consistent estimates of the latent regressors, and then use any consistent estimation procedures for the Tobit model.
Since consistency of any estimation method hinges upon uniqueness of equilibrium, we first prove existence and uniqueness of the pure strategy equilibrium.To this end, we maintain the following assumption.
This assumption restricts the strength of interactions, captured by the coefficients α: interactions must not be too strong for a stable equilibrium to exist.Intuitively, if the interactions are long-ranged and too strong, then the effect of remote neighbors is substantial and may lead to instability and multiple equilibria.Since it involves only the estimated coefficients and the number of neighbors, k, is typically known, the assumption is testable.
Assumption 1 is similar to Assumptions B and C in [4], which restrict the strength of interactions to obtain a unique equilibrium in a discrete choice game of social interactions.
Based on this assumption, we can now show existence and uniqueness of equilibrium.
In general, without restrictions on the parameters, multiple equilibria could occur.If one does not want to impose restrictions directly, one can use the Mathematical Program with Equilibrium Constraints (MPEC) routine to deal with multiple equilibria implicitly by choosing the equilibrium that maximizes the empirical likelihood.
In equilibrium, the policy variables will be correlated across players.To characterize their dependence, we assume that the process W in = Y in , Y in , X i , ε i is indexed by a vector of locations , and hence can be viewed as a random field on Z d .In other words, W in = W t(i)n , t(i) ∈ Λ n , n ≥ 1 is a triangular array of vector-valued random fields defined on a probability space (Ω, F, P ) and observed on sample regions Λ n ⊂ Z d .In the following, to simplify notation, we suppress the index t and write W in = W t(i)n .Furthermore, we denote by • the Euclidian norm in R d and by Assumption 2. The data-generating process The distance between players i and j is measured by the Euclidian metric: This assumption implies that the players' locations are exogenous, i.e., they are known and determined outside the model.Extensions to endogenous locations would require explicit modeling of the location choice, and would therefore considerably complicate the model.This extension is an interesting direction for future research.
Given Assumption 2, it turns out that the equilibrium policy variables satisfy a weak dependence condition known as near-epoch dependence (NED), see [5], under the same condition that ensures uniqueness of equilibrium.For ease of reference, we state definition of NED random variables.

Definition 2. The triangular array of random fields {W
Theorem 2. Suppose Assumptions 1 and 2 hold and The value of the constant c is given in the proof, but it is not important for what follows.

Identification and Estimation
We now discuss identification and estimation of our model.Let Z in = Y jn , j ∈ N i , 1, X i , and let δ = ((α j , j ∈ N i ) , b, β ) denote the corresponding vector of the coefficients.Given the specification, it is natural to identify and estimate all unknown parameters based on the likelihood function.The log likelihood function of the model is where θ = (δ , σ) .Likelihood function (8) involves an unknown threshold parameter, γ, which is in contrast to the standard Tobit model, where the threshold is assumed to be known and equal to zero.The maximum likelihood (ML) estimator of γ is the minimum order statistics of the uncensored subsample.More specifically, partition the dependent variable and regressor matrix into two parts: , where the subscript (0) indicates that observations come from the censored subsample, and the subscript (1) -from the uncensored subsample, and let As shown in Proposition 1 below, γ is a consistent estimator of γ.The ML estimators of the other parameters θ can then be obtained by the standard differentiation techniques.
and Z in,(1) is the regressor vector from the uncensored subsample. 1 For the definition of mixing coefficients, see [18].
Thus, γ is n-consistent and asymptotically exponentially distributed.For i.i.d.sample, this result has been established by Carson and Sun [19].So, Proposition 1 extends [19] to a spatially dependent case.The superconsistency of γ is a well-known consequence of the dependence of the support of Y i on γ.
Proposition 1 implies that γ 0 is identified.The remaining parameters, θ 0 = (δ 0 , σ 0 ) can now be identified from the likelihood function.Alternatively, one can identify θ 0 from the conditional mean function and estimate it by the least squares procedure: In contrast to the standard Tobit model with zero-threshold, the ML estimator does not strictly dominate the least squares (LS) estimator in our model due to the presence of the first-step estimator.The thing is that the LS objective function is continuous in γ, while the likelihood function is not.The latter implies that small finite-sample biases in γ may cause sizeable finite-sample biases in the ML estimates of θ.This prediction is confirmed by the simulation results of Section 6, which suggest larger finite-sample biases in the ML than in the LS estimator.This is the main rationale for considering the LS procedure as an alternative to the ML procedure in our model.Thus, estimation of model ( 1) could be carried out in two steps.First, estimate the threshold parameter γ by the minimum order statistic of the uncensored subsample γ, and substitute it for the true γ 0 in ( 8) and (10).Then, estimate the remaining parameters θ in ( 8) and ( 10) by the ML or LS procedures, respectively.Note that the least squares estimator of γ in (1) will be imprecise due to near multicollinearity of the intercept and threshold.Therefore, we use the first-step estimator γ in both procedures.
We now present sufficient conditions for identification of θ.Assumption 4. Suppose (i) at least one of components of X i has the full support, R; and (ii) , where ϕ(Z in , θ, γ) is as defined in (10).
Practically, the second-step estimation of θ could be implemented through the following nested fixed-point (NFXP) algorithm: (i) in an inner loop, for a given θ, find the unique solution of the equilibrium equations ( 7) by the fixed-point algorithm; and (ii) in an outer loop, search over θ ∈ Θ that maximizes the objective function.Let Y (θ, γ) = Y 1 , ..., Y n be the solution of the equilibrium equations (7).Then, the resulting estimator can be represented as where m (•, •, •) is either the log likelihood function defined in (8) or minus the squared deviation of Y i from the conditional mean defined in (10).This formulation makes explicit the dependence of the equilibrium variables on the estimated parameters.Given superconsistency of the first-step estimator, the resulting second-step maximum likelihood or least squares estimators of θ will be root-n consistent, asymptotically normal and independent of γ, as shown in Theorem 4 below.However, the NFXP algorithm will be computationally costly for large cross-sectional datasets, e.g., n ≥ 200.
To overcome this problem, we instead use the constrained optimization algorithm proposed by Su and Judd [7].The idea is to solve the following constrained optimization problem: where h( Y , θ, γ) = 0 is the vector representation of the equilibrium system (7).Note that Y in this formulation does not depend on θ, and is chosen simultaneously with θ to maximize the objective function subject to the equilibrium constraints.This obviates the need to solve the multi-dimensional fixed point problem for Y at each iteration on θ.Su and Judd [7] prove equivalence of problems ( 11) and ( 12) provided that the model is identified.They also demonstrate the computational advantage of this constrained optimization algorithm over the NFXP algorithm in the context of a single-agent dynamic discrete choice model.In particular, they show that the proposed algorithm leads, on average, to a ten-fold reduction of the computational time relative to the NFXP algorithm.
Since our model is identified by Theorem 3, the maximizer θ of problem (11) equals the maximizer θ of problem ( 12) by Proposition 1 of Su and Judd [7], and one can thus replace the computationally intensive problem (11) by the simpler problem (12).We investigate performance of the constrained optimization algorithm for our model in the Monte Carlo study of Section 6.

Consistency and Asymptotic Normality
We next show consistency and asymptotic normality of the maximum likelihood and least squares estimators.To this end, we need the following assumption.Assumption 5. Suppose Part (i) Assumption 5 is the standard condition on the parameter space and the true parameter value; Part (ii) is used to verify uniform convergence of various sample functions.Generally, the above assumptions are slightly stronger than those in the fully parametric Tobit model with zero-threshold, since our Tobit estimator of θ relies on a nonparametric first-step estimator of γ.
Theorem 4.Under Assumptions 1-5, the maximum likelihood and least squares estimators are both consistent and asymptotically normal, i.e., Thus, both the maximum likelihood and least squares estimators of θ are √ n-consistent and asymptotically normal.To conduct inference, it remains to obtain a consistent estimate of the covariance matrix S 0 .For this purpose, one can employ the following spatial HAC estimator: where and h n is a bandwidth parameter.Jenish [20] proves consistency of this estimator for more general nonparametric estimators of γ.In our model, consistency is achieved by bandwidth parameters satisfying h n = O(n 1/(3d) ).

Numerical Results
In this section, we examine the finite sample properties of the maximum likelihood (ML) and least squares (LS) estimators of our censored model, as well the performance of the Su-Judd [7] algorithm.
Throughout, data {W i 1 ,i 2 } reside on the two-dimensional lattice Z 2 , where (i 1 , i 2 ) ∈ Z 2 denotes, for simplicity, the vector of coordinates.The data are simulated on a rectangular grid of (m 1 + 300) × (m 2 + 300) locations.To control for boundary effects, we discard the 300 outer boundary points along each of the axes and use the sample of size n = m 1 m 2 for estimation.
Our experiment consists of two stages: (i) simulation; and (ii) estimation.In the first stage, we first simulate two i.i.d.N (0, 1) processes {ε i 1 ,i 2 } and {η i 1 ,i 2 }, which are independent of each other.Next, using the fixed-point algorithm, we generate the process {X i 1 ,i 2 } according to: and then the process Y i 1 ,i 2 according to: 25, γ = 2 and σ = 1.Last, we form the process {Y i 1 ,i 2 } according to: In the second stage, we first construct the minimum order statistic estimator of γγ -and then use the Su-Judd [7] constrained optimization algorithm to estimate the remaining parameters θ = (α, β , b, σ) .As discussed in Section 4, we estimate the n-dimensional vector of endogenous variables Y = Y i 1 ,i 2 jointly with θ, instead of computing it at each iteration on θ, i.e., we solve the constrained optimization problem: where h( Y , θ, γ) = 0 is the vector representation of the equilibrium system (13).
The estimation results based on 1000 Monte-Carlo repetitions are presented in Tables 1-5.Both the ML and LS estimators of α behave well for all sample sizes.The finite-sample bias declines rapidly from about 1.1% (n = 200) to 0.3% (for n = 1000) in the case of the ML estimator, and from 0.55% (n = 200) to 0.08% (n = 1000) in the case of the LS estimator.These results suggest that a five-fold increase in the sample size leads to a more than three-fold reduction in the ML bias, and a more than six-fold decrease in the LS bias, which is consistent with our asymptotic theory.The standard errors also fall off rapidly with the sample size.A similar pattern is observed for the estimates of the slope β, shown in Table 2.
Table 3 contains the minimum order statistic estimates of γ.The finite-sample bias diminishes from 4.5% (n = 200) to 0.8% (for n = 1000), which means that a five-fold increase in the sample size is associated with more than a five-fold reduction in the bias.This is in line with the theoretical prediction of n-consistency of the minimum order statistics.Next, Tables 4 and 5 present the estimates of the intercept b and the standard deviation σ, respectively.The maximum likelihood estimates of b and σ exhibit larger biases than those of α and β.However, they decrease as the sample size increases: from 5.5% (n = 200) to 1.4% (n = 1000) in the case of b, and from 15.9% (n = 200) to 6.9% (n = 1000) in the case of σ.Thus, the biases still halve when the sample size increases four-fold, consistent with the asymptotic theory.Large small-sample biases could be explained by weak identification or near multicollinearity introduced by the inverse Mills ratio, which is approximately linear over a wide range of its argument.Interestingly, the LS estimates of all parameters, including b and σ, have smaller finite-sample biases than the respective ML estimates.The reason is that the LS objective function is continuous in the first-step nonparametric estimator of γ, while the likelihood function is not, and small first-step biases in γ may get disproportionately amplified and translate into sizeable second-step biases in θ.Consequently, the biases in the ML estimates of b and σ due to weak identification are further exacerbated by discontinuity of the likelihood function.Nevertheless, as expected, the LS estimator has larger standard errors than the ML estimator across all parameters.Thus, in contrast to the standard Tobit model with zero-threshold, the ML estimator does not strictly dominate the LS estimator in our model.
Finally, Table 6 reports the computational time and the number of converged iterations for the Su-Judd [7] algorithm.The algorithm performs well for all sample sizes: converges in almost 99% of iterations and the time costs are less than two hours even in large sample sizes such as n = 1000.For comparison, the NFXP algorithm will take about 130-150 hours to estimate the model for same sample sizes.Thus,the Su-Judd [7] algorithm offers a considerable time savings over standard nested fixed-point algorithms.Overall, the simulations results are consistent with our asymptotic theory: the finite-sample biases and standard errors of the ML and LS estimators decay rapidly with the sample size.Moreover, the Su-Judd [7] constrained optimization algorithm seems to be a viable and effective numerical procedure for estimating games with large number of players, including our model.

Conclusions
In this paper, we study identification and estimation of a static game of incomplete information with censored strategies.Specifically, we show existence and uniqueness of an equilibrium as well as its weak dependence property under a condition that restricts the strength of interactions among the players.We then show identification of the parameters and estimate them by maximum likelihood and least squares procedures.The resulting estimators are shown to be consistent and asymptotically normal.We also demonstrate application of our results to modeling spillovers in firms' R&D investment and peer effects in female labor supply.
One direction for future research is to relax the normality assumption on the errors and obtain identification under more general error distributions whose conditional mean functions satisfy contraction mapping conditions, similar to the one used in the paper.Another extension could be to allow for random threshold effects in the outcome variable, using some parametric family of distributions.One can also allow for truncated strategies by slight modifications in the likelihood function and Assumption 1. Finally, instead of the regular lattice, one can consider players located at the nodes of some graph, which describes the network structure, as in the social interactions literature.
First, if y i > y, then the first-order conditions with respect to investment imply the following optimal investment: and τ = (ν − 1) /(1 − δ) > 0 since, by assumption, 0 < δ < 1 and ν > 1.The value of the profit at the optimal values of price and investment is Π i (y + 1) δ − y i and, hence, it is optimal to set y * i = 0.The value of the profit in the second case is , and hence the optimal investment is given by: where b 0 = log B and γ 0 = log (1 − δ) −1/( δ) (y + 1) > 0 since 0 < δ < 1 and y > 0.
Proof of Lemma 2: and the first-order conditions imply the following labor supply: The optimal utility is and, hence, it is optimal to set y * i = 0.The optimal utility in the second case is , and hence the optimal investment is given by: where b 0 = log w δ/(1−δ) and γ 0 = log (1 − δ) −1/δ c > 0 since c ≥ 1 and 0 < δ < 1.

B. Proofs for Section 3
Proof of Theorem 1: In the following, we suppress dependence of variables on n and write Y i = Y in .
To prove the theorem, it suffices to show that the mapping , with the components given by: The result would then follow by the Banach Fixed Point Theorem.
To simplify notation, let z where we used φ (z) = −zφ (z).It then follows that Hence, by the Mean Value Theorem, for any X i and any Y , Y ∈ R n : by Assumption 1, where λ 0 = α φ (0) γ 0 σ −1 0 + 1 and α = max 1≤j≤k |α j0 |.Consequently, the vector mapping G satisfies the following Lipschitz condition: Proof of Theorem 2: The proof is similar to that of Proposition 1 of Jenish [18].Let N i (m) = {j, j = i, 1 ≤ j ≤ n : t (i) − t (j) ≤ m} be the m-neighborhood of agent i that excludes i, and N o i (m) = N i (m)∪{i} be the neighborhood of agent i that includes i, where , X i ), i = 1, ..., n, defined in (B1).To simplify, we suppress dependence of variables on n, and write Y i = Y in .
Fix i, and define η . Suppose that the underlying probability space is rich enough that there exists a random variable U uniformly distributed on [0, 1] that is independent of {X j }.Then, by Lemma A1 in Jenish [18], there exists a function h(U, η (m) i ) such that the process ) has the same distribution as X.

We now construct an approximation to
Since X (m) has the same distribution as X , we have = X j and by the Lipschitz condition (B2), we have Consider two cases: m ≥ r and m < r.If m ≥ r, recursive use of (B4) gives ...
] < 1 and again we will have the same bound on the error: Since i was arbitrary, using the same arguments, we can approximate the entire process Y i by Y (m) i , which has a similar functional form but a more tractable dependence structure.The approximation error is given by sup n,i∈Λn We now verify the NED condition sup n,i∈Λn ) du.Then, by the Jensen inequality and independence of U and {X i }, we have ) ) ) and define Z i as The proof will proceed in three steps: (i) verify the NED property of Z i ; (ii) show the NED property of 1 (Y i > 0) = 1 {Z i > γ 0 }; and (iii) apply Proposition 3 of Jenish and Prucha [5] to the product of the two processes to show its NED property.It can be easily verified that g (y 1 , ..., y k ) satisfies a Lipschitz inequality: |g (y 1 , ..., y k ) − g (y 1 , ..., y k )| ≤ α k j=1 y j − y j , where α = max 1≤j≤k |α j0 |.By the least mean squared property of the conditional mean, a > 0 be some positive scalar to be chosen later.Define the function: This piecewise linear function converges pointwise to and consequently, where f (•) is the p.d.f. of z and z ∈ (γ 0 , γ 0 + ψ a (m)).Using the above inequalities gives Now, minimize the order of magnitude of the variable in the last line by setting a = 2/3.Thus, . Hence, by Proposition 2 of Jenish and Prucha [5], and using Proposition 3 of Jenish and Prucha [5] with B(z, z ) = |z|+ϕ(z ) and r = 3, we have that

C. Proofs for Section 4
Proof of Proposition 1: The proof of this proposition follows Carson and Sun [19].The latter paper is not readily applicable to our model since it relies on LLN for independent processes.Let n 0 and n 1 denote, respectively, the sizes of the censored and uncensored subsamples, and let Y i,(0) and Y i,(1) denote observations from the censored and uncensored subsample, respectively.Conditional on the state variables X n , the uncensored subsample Y i,(1) is independent and follows a truncated normal distribution.Consequently, where o p (1) holds uniformly over i since φ (•) is uniformly continuous on R. Now, let Next, we establish some inequalities for the r.v.µ in .By Lemma D1, Z in is L 2 -NED on {X i } with geometrically decaying coefficients.Then, by Proposition 3 of Jenish and Prucha [5], {µ in } is also L 2 -NED on {X i }, which is mixing satisfying Assumption 3. Consequently, {µ in } satisfies the LLN of Jenish and Prucha [5], i.e., 1 Then, by the Markov inequality, which implies max 1≤i≤n µ in = O p n 1/6 .Now, using (C1) and the last inequality gives Thus, P min Y (1) > γ + z/n|n 1 = exp − n 1 n zµ (1 + o p (1)) .Moreover, by Theorem 2, 1 (Y i > γ) is L 2 -NED on (X i , ε i ) , and hence, satisfies the LLN of Jenish and Prucha [5]: Consequently, P min Y (1) > γ + z/n|n 1 → exp (−az) , where a = κµ.As the right-hand side of the last expression does not depend on n 1 , it is also the limit of the unconditional probability.Thus, Proof of Theorem 3: By Proposition 1, γ 0 is identified, so it remains to prove identification of θ by showing that the population objective function Q (θ, γ 0 ) = E [m (W in , θ, γ 0 )] is uniquely maximized at θ 0 .

A. Identification in ML
To prove identification in the ML case, it suffices to verify the Kullback-Leibler information inequality, i.e., for all θ ∈ Θ s.
Clearly, log f Y i |X n ; θ, γ 0 = log f Y i |X n ; θ 0 , γ 0 with positive probability if σ 2 = σ 2 0 .Suppose the opposite were true, i.e., log f Y i |X n ; θ, γ 0 = log f Y i |X n ; θ 0 , γ 0 w.p.1.Then, we would have = 0 for Y i > γ 0 , which implies that the r.v.(Y i − Z in δ 0 ) 2 and (Y i − Z in δ) 2 would have a point mass, which is impossible since at least one of the regressors in X i has a full support by Assumption 4. So, σ 2 0 is identified, and we can focus on the case: δ 0 with positive probability.Since Z in δ = Z in δ 0 with positive probability, and Φ (•) and log are strictly increasing functions, we also have log Φ with positive probability, which proves (C2).

B. Identification in LS
To prove identification in the LS case, it suffices to verify that for all θ ∈ Θ s.t.
By similar arguments as in part A, if υ = υ 0 , then Z in υ = Z in υ 0 with positive probability.Denote u = Z in υ and u 0 = Z in υ 0 , and consider the function It is strictly increasing in σ.To see this, let γ = γ 0 /σ and note that ∂ ϕ(u,σ) ∂σ by the Mill's Ratio inequality since γ = γ 0 /σ ≥ 0.Moreover, ϕ(u, σ) is also strictly increasing in u: which proves identification of υ 0 and σ 0 , and thus completes the proof of the theorem.

D. Proofs for Section 5
The following lemma collects formulas and some properties of the score and Hessian functions for the ML and LS estimators, which are used throughout the proofs.Let denote the score function, and let the Hessian matrix be denoted by: and ϕ in = ϕ(Z in , θ, γ).
A. For the ML estimator, the components of the score and Hessian are given by: For the LS estimator, the components of the score and Hessian are given by: C. Under Assumptions 1-3(i), {m(W in , θ, γ)} and {H (W in , θ 0 , γ 0 )} are L 1 -NED on (X i , ε i ) with coefficients decaying at geometric rates, and the score {s(W in , θ 0 , γ 0 )} is L 2 -NED on (X i , ε i ) with coefficients decaying at geometric rates.
The rates of the NED coefficients are required only for verification of the assumptions of the CLT and LLN for NED processes of Jenish and Prucha [5], which rely on polynomial decay rates.Clearly, the NED coefficients declining at any geometric rate will automatically satisfy these theorems, and their exact orders of magnitude are unimportant for the proofs our results.
Proof of Lemma D1: Parts A and B follow by straightforward differentiation.To show part C, observe that by Theorem 2 and Proposition 3 of Jenish and Prucha [5] with the NED coefficients decaying at geometric rates.Then, by Theorem 17.9 of Davidson [21], all products and sums of L 2 -NED terms are L 1 -NED variables with the NED coefficients decaying at the slowest rate of the multiples or summands.Thus, {m in } is L 1 -NED on (X i , ε i ) for both the ML and LS estimators.B analogous arguments, the Hessians H (W in , θ 0 , γ 0 ) are L 1 -NED on (X i , ε i ) for both the ML and LS estimators.Finally, the score of the ML estimator is L 2 -NED on (X i , ε i ) with geometric decay rates by Example 17.17 of Davidson [21] as the sum of products of 1 (Y i = 0) or 1 (Y i > γ), which are bounded and L 2 -NED, and some smooth functions, each of which is L 2 -NED on (X i , ε i ) with geometric decay rates.By analogous arguments, the score of the LS estimator is also L 2 -NED on (X i , ε i ) with geometric decay rates.A. We first verify condition (a) for the ML estimator.Note that
Next, write out the second term in (D3) as By construction, γ = min {Y i : Y i > 0} ≥ γ 0 , and hence, Note that the minimal value in the uncensored subsample, {Y i > 0}, is attained only by a single observation in the subsample.This is because the variables {Y i } are i.i.d.continuously distributed on (γ 0 , +∞) so that the probability of the event Y i = Y j , i = j, is zero.Then, Since log Φ (z) is continuously differentiable on R, we have by the Mean Value Theorem: where γ is between γ and γ 0 , and  B. We now verify condition (a) for the LS estimator.Using similar arguments as in part A, {m (W in , θ, γ 0 )} is L 1 -NED and L 1 -stochastically equicontinuous on Θ, and hence, by the ULLN of Jenish and Prucha [23], the first term on the r.h.s. of (D3) converges to zero.

by Assumption 5 ,
and by Proposition 1: E and hence, the ML estimator θ n p → θ 0 .

Table 6 .
Algorithm Performance