Nonparametric Regression with Common Shocks

Souza-Rodrigues, Eduardo A.

doi:10.3390/econometrics4030036

Department of Economics, University of Toronto, Max Gluskin House, 150 St. George Street, 324, Toronto, ON M5S 3G7, Canada

Econometrics 2016, 4(3), 36; https://doi.org/10.3390/econometrics4030036

Submission received: 7 April 2016 / Revised: 20 July 2016 / Accepted: 16 August 2016 / Published: 1 September 2016

Abstract:

This paper considers a nonparametric regression model for cross-sectional data in the presence of common shocks. Common shocks are allowed to be very general in nature; they do not need to be finite dimensional with a known (small) number of factors. I investigate the properties of the Nadaraya-Watson kernel estimator and determine how general the common shocks can be while still obtaining meaningful kernel estimates. Restrictions on the common shocks are necessary because kernel estimators typically manipulate conditional densities, and conditional densities do not necessarily exist in the present case. By appealing to disintegration theory, I provide sufficient conditions for the existence of such conditional densities and show that the estimator converges in probability to the Kolmogorov conditional expectation given the sigma-field generated by the common shocks. I also establish the rate of convergence and the asymptotic distribution of the kernel estimator.

Keywords:

nonparametric regression; common shocks; cross-sectional dependence; disintegration theory

JEL:

C13, C14, C21

1. Introduction

Cross-sectional dependence has attracted considerable attention among economists recently.1 It is well-known that ignoring cross-sectional dependence may lead to inconsistent estimators and misleading inference. A popular and successful way to capture cross-sectional dependence is through common factors.2 Common factor models assume a finite number of unobserved factors that may be the result of economy-wide shocks with impacts on population units that may depend on the characteristics of the unit. Possible common factors include macroeconomic, technological, legal/institutional, political, environmental, health and sociological shocks, among others. The applied literature has considered, for example, technological shocks (such as new procedures, drugs and surgical techniques) affecting the relationship between countries’ healthcare attainments and their per capita health expenditures and educational levels (e.g., [20]); cross-country cross-industry analysis of returns to R&D, which are affected both by global shocks, such as the recent financial crisis, and by local shocks, such as spillovers between a limited group of industries or countries (e.g., [21]); and the analysis of transnational terrorism, where common factors may arise from common terrorist training camps, common grievances and demonstration effects (cf. [22]).

Typically, common factor models allow for a small and known number of unobserved factors. Although such an approach is convincing in empirical macro models, in microeconometric models, it is often more reasonable to think of a potentially large, possibly unknown (and maybe infinite) number of factors that can influence individuals’ behaviour. For instance, in studies of individual earnings, there are many individual-level observables and unobservables that affect income; as well as several common factors, such as region, family, male/female ratio, race composition, education, age composition, and so on (cf. [7]). The number of common factors may increase as we collect more cross-sectional observations or there may be an infinite number of unobserved factors (see, e.g., [23]).

The purpose of this paper is to study a nonparametric regression model for cross-sectional data in the presence of common shocks that are very general in nature. The common shocks can be of infinite dimension with flexible impact on different population units. For example, common shocks could take the form of a nonlinear random function of observable or unobservable individual characteristics with the effect on the i-th observation varying continuously across i depending on the value of the characteristic. We focus on nonparametric models because there may be little guidance (or justification) in practice for selecting a particular functional form for the regression function.

There has been important recent work on nonparametric models with finitely many common factors (e.g., [5,15,16]). These papers consider common shocks that enter the regression function additively, with disturbances that are modelled as linear functions of mutually-independent unobserved common factors and individual-specific factor loadings. We, in contrast, allow the regression function to be non-separable in the common shocks, and we do not require the mutual independence assumption. In other words, we allow for an unknown large, potentially infinite, number of factors that can influence individuals’ outcomes and that may interact with observable and unobservable individual characteristics in extremely rich and flexible ways. To the best of our knowledge, this is the first paper that allows for such a flexible framework.

We consider this flexible setting because we are interested in investigating how general the regression function and the common shocks can be while still allowing for meaningful nonparametric estimates. We focus on the Nadaraya-Watson kernel estimator and study the effects of general common shocks on its asymptotic properties. Asymptotic results for kernel estimators are typically obtained by manipulating conditional densities of random variables. However, if the common shocks are too general, conditional densities do not necessarily exist. Doob [24] (pp. 623–624) and Halmos [25] (Section 48) present some examples of non-existence. If conditional densities do not exist, then what we would expect to be the probability limit of the kernel estimator in the present context is either meaningless or difficult to interpret.3

The idea here is to let the common shocks be as general as possible and to work with well-defined conditional densities that adhere as closely as possible to the standard kernel literature. To do so, we appeal to the disintegration theory for conditional distributions that can be found in Pollard [28], Dellacherie and Meyer [29] and Hoffmann-Jorgensen [30]. We find that an important sufficient condition to guarantee the existence of conditional densities is that the common shocks must belong to a separable metric space equipped with the Borel σ-field. We conclude that the sufficient conditions are mild and not very restrictive in practice.4

Given the existence of conditional densities, we adjust the standard assumptions of the kernel literature to the present case. We show that the Nadaraya-Watson kernel estimator converges in probability to the Kolmogorov conditional expectation given the sigma-field generated by the common shocks. The optimal rate of convergence is the same as the rate obtained when the observations are i.i.d. The asymptotic distribution is mixed normal with weights depending on the common shocks. It is obtained by applying a martingale difference sequence central limit theorem. We find that inference depends on how the common shocks affect the regression variables. A dichotomy similar to that of Andrews [8] is present here: if the dependent variable is mean independent of the common shocks given the explanatory variables, the usual t-test has the correct size; but if the dependent variable is not mean independent, the t statistic diverges to infinity in probability under the null hypothesis.

The closest paper in the literature to ours is that of Andrews [8], who considers a linear regression model in the presence of general common shocks. He shows that the least-squares estimator converges in probability to Kolmogorov conditional expectations given the σ-field generated by the common shocks. The random probability limit is a well-defined object because the Kolmogorov conditional expectation always exists. Andrews, therefore, does not need to guarantee the existence of conditional densities. Extending his results to a nonparametric model is important because parametric models may be misspecified. We show that the price to be paid is that mild restrictions must then be imposed on the nature of the common shocks.

The nonparametric version of the standard factor model is a special case of the model considered here. For this class of models, we show that, even though the kernel regression converges in probability to a random object measurable with respect to the common shocks, it is possible to identify and estimate the slope of the regression function. However, its location (e.g., the intercept in a linear model) is not identified even if we normalize common shocks to have a zero mean. To identify and estimate location, the dependent variable must be mean independent of the common shocks given the regressors.

Common factor models are typically applied to panel data sets (e.g., [14,17,18,19]). We view the present paper as a first step towards nonparametric panel data models that may incorporate a more general and flexible common factors structure. Indeed, in a companion paper, Souza-Rodrigues [27] develops a two-step nonparametric estimator that requires a “large-N, large-T” dataset for a generalized regression model based on the identification results of Berry and Haile [31]. The estimator applies equally to datasets with a large number of individuals in different groups and a large number of groups. The empirical application in Souza-Rodrigues [27] considers the impact of hospital volumes of surgical procedures on individual health status (e.g., mortality rate).5 Group-level observables (i.e., hospital volume of surgeries) may be correlated with group-level unobservables (hospital unobserved quality), which, in turn, may be indexed by individual characteristics (since an unobserved hospital characteristic that is helpful for patients with some demographic characteristics may not be as helpful for other patients). The strategy proposed by Souza-Rodrigues [27] is to run, in the first step, a nonparametric regression of individual outcomes on individual observables within each group (hospital). This is a nonparametric regression with common shocks, where the common shocks are the group-level observables and unobservables. Because the group-level unobservables may be a (random) function of individual characteristics, it is important to allow for this possibility, as we do here.6 The results of the present paper can be incorporated in other nonlinear panel data settings.

The present paper also relates to the literature on spatial dependence.7 Typically, in this literature, common shocks are presumed to have predominantly local effects, and the dependence is modelled as a function of an exogenously-given spatial or economic distance, with some form of stationary mixing condition analogous to those used for time series data. Recent nonparametric versions of spatial models have been considered by Martins-Filho and Yao [37] and Gerolimetto and Magrini [38], among others. Although the present paper can incorporate common shocks with differential local effects (e.g., assuming that individual factor loadings include geographic location), we do not allow individual outcomes to depend on the characteristics of other individuals. We therefore view spatial dependence models as complementary to ours.

Robinson [44] provides an alternative way of modelling cross-sectional dependence. He considers a nonparametric kernel regression in which the disturbances are represented by a (possibly infinite) sum of independent random variables with unknown weights. The structure in the disturbances is sufficiently rich to cover spatial dependence models, but, since it does not require known economic distances, it can accommodate stronger forms of dependence than mixing conditions. Robinson [44] investigates the properties of kernel estimators, and Robinson and Lee [45] study the properties of sieve estimators within this framework. The present paper can accommodate disturbances of the type represented by Robinson [44], but with a vector of common shocks in place of the vector of independent random variables. We do not require the vector of common shocks to be independent random variables, and we allow for potentially correlated random weights in the summation term for the disturbances. However, the restrictions we need to impose on the common shocks differ from the assumptions in Robinson [44]. Furthermore, we require i.i.d. sampling schemes, which are assumed neither by Robinson [44] nor by the spatial dependence literature. Our model is therefore neither more general than Robinson’s model nor a special case of it.

The paper is organized as follows: Section 2 presents the regression model and discusses sufficient conditions to guarantee the existence of conditional densities. Section 3 establishes the asymptotic properties of the Nadaraya-Watson kernel regression estimator and discusses its implications. Section 4 concludes. The Appendix presents the disintegration theory and briefly discusses the role of separability of common shocks in the existence of conditional densities. The Supplemental Material presents results for the kernel density estimator, contains all relevant proofs and discusses the probabilistic framework adapted from Andrews [8] that justifies the approach taken here.

2. Regression Model and Conditional Densities

The dataset is $\{Y_{i}, X_{i} : i = 1, \ldots, n\}$, where $Y_{i} \in \mathcal{Y}$ ($\subseteq \mathbb{R}$) and $X_{i} \in \mathcal{X}$ ($\subseteq \mathbb{R}^{k}$). Consider the model:

$$Y_{i} = m(X_{i}, C(S_{i})) + ε_{i}, \tag{1}$$

where $S_{i} \in \mathcal{S}$ ($\subseteq \mathbb{R}^{d_{s}}$, with $d_{s} \in \mathbb{N}$) is a vector of individual-specific random variables; $C(\cdot) \in \mathcal{C}$ is the common shock; and $ε_{i}$ is the idiosyncratic error. Some components of $S_{i}$ may be observable (in which case, they may be incorporated in $X_{i}$), or $S_{i}$ may be completely unobservable. We allow the common shock $C(\cdot)$ to be either a random vector (possibly infinite-dimensional) or a random function of $S_{i}$. In the latter case, the common shocks may affect individuals differently. As usual, we use upper-case letters to denote random quantities and lower-case letters to denote realizations.

The standard parametric factor model is a special case of our model and is typically written as:

$$Y_{i} = α + X_{i}' β + U_{i}, \qquad U_{i} = \sum_{j = 1}^{J} S_{ij} C_{j} + ε_{i}, \tag{2}$$

where $S_{i} = (S_{i1}, \ldots, S_{iJ})$ is the vector of individual-specific factor loadings; $C = (C_{1}, \ldots, C_{J})$ is the vector of unobserved common factors; $ε_{i}$ is the idiosyncratic error that is independent of $(X_{i}, S_{i}, C)$ and has zero mean; and $(α, β)$ is the vector with the parameters of interest. Cross-sectional dependence in the disturbances is generated by the term $S_{i}' C$. The standard model can also accommodate cross-sectional dependence on regressors $X_{i}$. For example, consider the expanded vectors $C = (C^{1}, C^{2})$ and $S_{i} = (S_{i}^{1}, S_{i}^{2})$ and take $X_{i} = S_{i}^{1'} C^{1}$ and $U_{i} = S_{i}^{2'} C^{2} + ε_{i}$. Note that if $C^{1} = C^{2}$, then $X_{i}$ and $U_{i}$ are correlated even when $S_{i}^{1}$ and $S_{i}^{2}$ are independent of each other (e.g., [8,10,11]). The nonparametric version of (2) takes $Y_{i} = m_{1}(X_{i}) + U_{i}$, with the same structure for the disturbances $U_{i}$.8
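To make the structure of model (2) concrete, the following minimal Python sketch simulates one cross-section. The function name `simulate_factor_model`, the Gaussian distributions, the loading mean of one and all parameter values are illustrative assumptions rather than part of the paper. The only feature taken from the text is that the factor vector $C$ is drawn once and shared by every unit, while loadings and idiosyncratic errors are drawn independently across $i$.

```python
import numpy as np

def simulate_factor_model(n, J=2, alpha=1.0, beta=2.0, seed=0):
    """Draw one cross-section from Y_i = alpha + beta*X_i + S_i'C + eps_i.

    All distributional choices below are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    C = rng.normal(size=J)                  # common factors: ONE draw shared by all units
    S = rng.normal(loc=1.0, size=(n, J))    # individual-specific loadings, i.i.d. across i
    X = rng.normal(size=n)                  # scalar regressor (k = 1 for simplicity)
    eps = rng.normal(size=n)                # idiosyncratic error, independent of (X, S, C)
    U = S @ C + eps                         # disturbance with cross-sectional dependence
    Y = alpha + beta * X + U
    return Y, X, S, C

Y, X, S, C = simulate_factor_model(n=5000)
# Given C, the disturbances are i.i.d. with mean sum_j E[S_ij] * C_j = C.sum() here,
# so their sample mean settles near a random, C-dependent level rather than zero.
U = Y - 1.0 - 2.0 * X
print("realized common factors:", C)
print("mean disturbance:", U.mean(), "vs. C.sum():", C.sum())
```

Changing the seed changes the realized $C$ and hence the level around which the disturbances concentrate, which is the cross-sectional dependence the text describes.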

The standard factor model (2) is a special case of our model (1) with the regression function given by the linear and additively separable $m(X_{i}, C(S_{i})) = α + X_{i}' β + C(S_{i})$ and the common shock function given by $C(S_{i}) = \sum_{j = 1}^{J} S_{ij} C_{j}$. We therefore generalize the standard model in the following ways: (i) we let the regression function $m(\cdot)$ be nonparametric; (ii) we allow the regressors $X_{i}$ to freely interact with the common shock $C(\cdot)$; and (iii) we let the common shock be a general function of individual-specific factor loadings $S_{i}$ (subject to the restrictions discussed below). Furthermore, factor models typically impose independence between $S_{i}$ and $(X_{i}, C)$ and assume that $C = (C_{1}, \ldots, C_{J})$ is a mutually independent vector, while we do not need to impose these independence assumptions. We, however, do not consider a fully-non-separable model; we maintain the additive separability assumption in the idiosyncratic error $ε_{i}$.

Robinson [44] also considers a nonparametric version of (2), but with another structure for $U_{i}$. He considers the model:

$$Y_{i} = m_{1}(X_{i}) + U_{i}, \qquad U_{i} = σ_{i}(X_{i}) \sum_{j = 1}^{\infty} b_{ij} e_{j}, \qquad \sum_{j = 1}^{\infty} b_{ij}^{2} < \infty, \tag{3}$$

where $σ_{i}$ are scalar unknown functions; the $e_{j}$'s are independent random variables with zero mean and unit variance; and the $b_{ij}$'s are unknown fixed weights.9 The present paper compares to Robinson [44] when the following holds: $m(X_{i}, C(S_{i})) = m_{1}(X_{i}) + σ_{i}(X_{i}) C(S_{i})$, with $C(S_{i}) = \sum_{j = 1}^{\infty} S_{ij} C_{j}$, $S_{ij} = b_{ij}$ and $C_{j} = e_{j}$. Unlike Robinson [44], we allow for (potentially correlated) random coefficients $b_{ij}$ and do not restrict the $e_{j}$'s to be independent variables with zero mean and unit variances. The restrictions we need to impose on the function $C(S_{i})$ are discussed below and are of a different nature than the assumptions used by Robinson [44].

Data Generation

Denote the vector $W_{i} = (Y_{i}, X_{i}, S_{i}, C) \in \mathcal{W}$, where $\mathcal{W} \subseteq \mathcal{Y} \times \mathcal{X} \times \mathcal{S} \times \mathcal{C}$. Define the measurable space $(\mathcal{W}, \mathcal{A})$, where $\mathcal{A}$ is the Borel sigma-field. The random elements $\{W_{i} : i \geq 1\}$ are defined on $(\mathcal{W}^{\mathbb{N}}, \mathcal{A}^{\mathbb{N}})$, where $\mathcal{W}^{\mathbb{N}}$ is the product space and $\mathcal{A}^{\mathbb{N}}$ is the product Borel sigma-field on $\mathcal{W}^{\mathbb{N}}$. We suppose the common shocks across observations are captured by the σ-field generated by $C$, denoted by $σ(C) \subset \mathcal{A}^{\mathbb{N}}$. We impose the following assumption:

Assumption 1

The sequence $\{W_{i} : i \geq 1\}$ is i.i.d. conditional on the σ-field $σ(C) \subset \mathcal{A}^{\mathbb{N}}$.

As shown by Andrews [8], this assumption is valid when the units are drawn randomly from the population. One difference between the present paper and Andrews [8] is that he posits the existence of some σ-field such that the data are i.i.d. conditional on it, without specifying a priori how this σ-field is constructed, while we impose more structure and state explicitly how the σ-field is generated. Andrews’ framework is, therefore, more general than ours in this respect. Note that neither the spatial dependence models nor Robinson’s [44] approach requires random sampling.10

Existence of Conditional Densities

Because we make use of the Nadaraya-Watson kernel estimator and because the kernel estimator requires the existence of conditional densities, we now discuss the existence problem.

To guarantee the existence of conditional densities that allow for very general common shocks, we make use of disintegration theory. A disintegration of a probability measure is a collection of regular conditional probabilities, each satisfying (i) a concentration property (i.e., conditional on an event, the probability of its complement is zero) and (ii) a decomposition property (i.e., the probability of an event is a weighted sum of the conditional probability measures, also known as the law of total probability).11 The reader unfamiliar with disintegration theory might want to read the Appendix (or the references cited there) before proceeding.

Define the sub-vector $Z_{i} = (Y_{i}, X_{i}, S_{i}) \in \mathcal{Z} \subseteq \mathcal{Y} \times \mathcal{X} \times \mathcal{S}$, for $i \geq 1$. We want to guarantee the existence of the conditional density of $Z_{i}$ given $C$. By Assumption 1, the probability distribution of $\{W_{i} : i \geq 1\}$, denoted by $P^{\mathbb{N}}$, is exchangeable on $(\mathcal{W}^{\mathbb{N}}, \mathcal{A}^{\mathbb{N}})$. Call $P^{i}$ the marginal distribution of $W_{i}$ under $P^{\mathbb{N}}$. We impose the following:

Assumption 2

(i) $\mathcal{W}$ is a metric space.

(ii) $λ$ is a sigma-finite Radon measure on $(\mathcal{W}, \mathcal{A})$.12

(iii) $C$ maps $(\mathcal{W}, \mathcal{A})$ into $(\mathcal{C}, \mathcal{B})$. $\mathcal{C}$ is a separable metric space, and $\mathcal{B}$ is the Borel σ-field.

(iv) $μ$ is a sigma-finite measure on $(\mathcal{C}, \mathcal{B})$. Let the measure $λ(C^{-1})$ induced by $C$ and $λ$ on $(\mathcal{C}, \mathcal{B})$ be absolutely continuous with respect to $μ$.

(v) Let $P^{i}$, for any $i \geq 1$, be absolutely continuous with respect to $λ$. Denote its Radon-Nikodym density by $f_{i}(z, c)$.

Assumption 2(iii) requires $\mathcal{C}$ to be a separable metric space. This is trivially satisfied when $\mathcal{C}$ belongs to a finite-dimensional Euclidean space. However, if $C$ is an infinite dimensional vector of random variables, we need restrictions, such as $\mathcal{C} = \ell_{p}$, for some $1 \leq p < \infty$, where $\ell_{p}$ is the space of sequences with finite $\|\cdot\|_{p}$-norm, and we need to rule out the case $\mathcal{C} = \ell_{\infty}$, because $\ell_{\infty}$ is non-separable. Similarly, if $C$ is a random function of $S_{i}$, it must belong to spaces, such as the $L_{p}(\mathcal{S})$ space for $1 \leq p < \infty$, or the space of bounded and continuous functions defined on a closed bounded subset of $\mathcal{S}$ and equipped with the sup-norm, or the Hölder space, etc. However, it cannot belong to the space of bounded functions with the sup-norm, $L_{\infty}(\mathcal{S})$, because it is not separable. See the discussion about the role of separability for existence of conditional densities in the Appendix.13

The restrictions in Assumption 2 are mild and sufficient to guarantee the existence of conditional densities of $Z_{i}$ given $C$ for any $i \geq 1$. The reason for sufficiency is the following: first, Assumptions 2(i)–(iv) are sufficient for the sigma-finite Radon measure $λ$ to have a $(C, μ)$-disintegration; i.e., they guarantee the existence of a collection of measures, denoted by $Λ = \{λ_{c} : c \in \mathcal{C}\}$, that satisfy the aforementioned concentration and decomposition properties (but note that the $λ_{c}$'s do not have to be probability measures; see Definition 3 and Theorem 1 in the Appendix).

Second, if the disintegration $Λ = \{λ_{c} : c \in \mathcal{C}\}$ exists and the probability measure $P^{i}$ on $(\mathcal{W}, \mathcal{A})$ is absolutely continuous with respect to $λ$ with density $f_{i}(z, c)$ (Assumption 2(v)), then two implications follow (see Theorem 2 in the Appendix): (i) the probability distribution of $C$ induced by $P^{i}$ (i.e., the image measure $Q^{i} = P^{i}(C^{-1})$) is absolutely continuous with respect to $μ$ with density:

$$q_{i}(c) \equiv \int f_{i}(\tilde{z}, c) \, dλ_{c}(\tilde{z}) \tag{4}$$

and (ii) the probability measure $P^{i}$ has a conditional distribution given $C$, denoted by the collection $\mathcal{P}^{i} = \{P_{c}^{i} : c \in \mathcal{C}\}$, where $P_{c}^{i}$ is defined by having density:

$$f_{i}(z \mid c) \equiv \frac{f_{i}(z, c)}{q_{i}(c)} \, 1\{0 < q_{i}(c) < \infty\} \tag{5}$$

with respect to $λ_{c}$ for $Q^{i}$-almost all $c \in \mathcal{C}$. The conditional density $f_{i}(z \mid c)$ is therefore similar to elementary conditional densities: it is the ratio of the joint density $f_{i}(z, c)$ and the marginal $q_{i}(c)$. However, it does not require $C$ to belong to a finite-dimensional Euclidean space.
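As a purely illustrative finite analogue of (4) and (5), the sketch below lets $z$ and $c$ range over finite sets, with counting measure playing the role of both $λ_{c}$ and $μ$. This is an assumption made only for the example, since the point of the disintegration argument is that no such finiteness is needed. The table `joint` and the helper name `conditional_density` are hypothetical.

```python
import numpy as np

def conditional_density(joint):
    """Return (q, cond) with q[c] = sum_z joint[z, c] and
    cond[z, c] = joint[z, c] / q[c] * 1{0 < q[c] < inf}, mirroring (4)-(5)."""
    joint = np.asarray(joint, dtype=float)
    q = joint.sum(axis=0)                         # marginal of c, as in (4)
    valid = (q > 0) & np.isfinite(q)              # the indicator 1{0 < q(c) < inf}
    cond = np.zeros_like(joint)
    cond[:, valid] = joint[:, valid] / q[valid]   # ratio joint / marginal, as in (5)
    return q, cond

# Rows index z, columns index c; the last value of c has probability zero.
joint = np.array([[0.10, 0.30, 0.0],
                  [0.20, 0.15, 0.0],
                  [0.10, 0.15, 0.0]])
q, cond = conditional_density(joint)
print(q)      # marginal masses of c
print(cond)   # columns with q > 0 sum to one; the null column is set to zero
```

The indicator matters only on the null set where the marginal vanishes, which is why the conditional density is determined for almost all $c$ rather than for every $c$.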

Because $C$ is common to all $i$, the equality $Q = Q^{i}$ follows for all $i \geq 1$. In addition, $f_{i}(z \mid c) = f_{j}(z \mid c)$ for all $i \neq j$ and for $Q$-almost all $c \in \mathcal{C}$ by Assumption 1. We state this result as a lemma:

Lemma 1.

Let Assumptions 1 and 2 hold. Then, there exist conditional densities of $Z_{i}$ given $C$, for all $i \geq 1$, defined by:

$$f_{i}(z \mid c) = \frac{f_{1}(z, c)}{q(c)} \, 1\{0 < q(c) < \infty\}, \tag{6}$$

for $Q$-almost all $c \in \mathcal{C}$, where $q(c) \equiv \int f_{1}(\tilde{z}, c) \, dλ_{c}(\tilde{z})$.14

Example 1.

Suppose $S_{i}$ is scalar and $\mathcal{C}$ is the separable Hilbert space $L_{2}(\mathcal{S})$. Take a basis $\{ϕ_{j}\}_{j = 1}^{\infty}$ for $L_{2}(\mathcal{S})$ and represent the common shock by $C(S_{i}) = \sum_{j = 1}^{\infty} C_{j} ϕ_{j}(S_{i})$, where $C_{j} \in \mathbb{R}$ for $j \geq 1$. Note that one can define $S_{ij} = ϕ_{j}(S_{i})$, in which case the random coefficients are not independent of each other. More important for us is to note that selecting a function in $\mathcal{C}$ is equivalent to selecting the infinite dimensional vector $\{C_{j}\}_{j = 1}^{\infty}$ in $\ell_{2}$. Let $\mathcal{B}(L_{2})$ be the Borel σ-field on $L_{2}$ and $\mathcal{B}(\ell_{2})$ be the Borel σ-field on $\ell_{2}$. Because the spaces $L_{2}$ and $\ell_{2}$ are homeomorphic, their topologies are equivalent, and so, $\mathcal{B}(L_{2})$ and $\mathcal{B}(\ell_{2})$ are equivalent.15 As a result, the event $\{C(\cdot) = c\}$ on $L_{2}$ is equivalent to the (potentially more intuitive) event $\{(C_{1}, C_{2}, \ldots) = (c_{1}, c_{2}, \ldots)\}$ on $\ell_{2}$. In addition, conditioning on $\{C(\cdot) = c\}$ is equivalent to conditioning on $\{(C_{1}, C_{2}, \ldots) = (c_{1}, c_{2}, \ldots)\}$. We have, therefore, $f(z \mid c(\cdot)) = f(z \mid c_{1}, c_{2}, \ldots)$ and:

$$\begin{aligned} \Pr(Z_{i} \in A \mid C(\cdot) = c) &= \Pr(Z_{i} \in A \mid C_{1} = c_{1}, C_{2} = c_{2}, \ldots) \\ &= \int_{A} f(\tilde{z} \mid c_{1}, c_{2}, \ldots) \, d\tilde{z}, \end{aligned} \tag{7}$$

for any measurable set $A$.16
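The identification of a function in $L_{2}(\mathcal{S})$ with its coefficient sequence in $\ell_{2}$ can be checked numerically. The sketch below assumes $\mathcal{S} = [0, 1]$, the cosine basis $ϕ_{1}(s) = 1$, $ϕ_{j}(s) = \sqrt{2} \cos((j - 1) π s)$ for $j \geq 2$, and coefficients $c_{j} = 1 / j^{2}$, and it truncates the series at $J$ terms purely for computation. These choices and the function names are illustrative assumptions. The truncation illustrates only the agreement of the two norms, not the conditioning argument.

```python
import numpy as np

def cosine_basis(s, J):
    """Evaluate the first J functions of an orthonormal cosine basis of L2[0, 1].
    Returns an array of shape (len(s), J)."""
    s = np.asarray(s, dtype=float)
    j = np.arange(J)                                   # frequencies 0, 1, ..., J-1
    phi = np.sqrt(2.0) * np.cos(np.pi * np.outer(s, j))
    phi[:, 0] = 1.0                                    # the constant function has unit norm
    return phi

def common_shock(s, coeffs):
    """C(s) = sum_j c_j * phi_j(s) for a finite coefficient vector."""
    return cosine_basis(s, len(coeffs)) @ coeffs

J = 50
coeffs = 1.0 / np.arange(1, J + 1) ** 2                # a square-summable sequence
m = 20000
grid = (np.arange(m) + 0.5) / m                        # midpoint grid on [0, 1]
l2_norm_sq = np.mean(common_shock(grid, coeffs) ** 2)  # midpoint rule for int C(s)^2 ds
print(l2_norm_sq, np.sum(coeffs ** 2))                 # the two squared norms should agree
```

Under these assumptions, the squared $L_{2}$ norm of the function and the squared $\ell_{2}$ norm of its coefficients coincide up to numerical error, which is the isometry behind the equivalence of the two conditioning events.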

Example 1 intends to translate properties of conditional probabilities given an element in some abstract space of functions into properties in (hopefully) more concrete spaces defined by random vectors. Example 1, however, does not apply when $\mathcal{C}$ is not a Hilbert space. Although we may approximate any of the separable metric spaces by other simpler spaces, the conditioning argument does not hold without running into problems, such as the Borel paradox (see, e.g., [47]). For instance, take $\mathcal{C}$ to be the set of bounded and continuous functions, $(BC(\mathcal{S}), \|\cdot\|_{\infty})$. It is separable, and any $c \in \mathcal{C}$ can be well approximated by a polynomial of order $J < \infty$, say $p^{J}(\cdot)$ with some coefficients $(b_{j})_{j = 1}^{J}$. Because we can take $J$ such that $\|c - p^{J}\|_{\infty} < ε$, for some $ε > 0$, the probability of the event $\{C(\cdot) = c\}$ is close to the probability of the event $\{(B_{j})_{j = 1}^{J} = (b_{j})_{j = 1}^{J}\}$. However, the topology of $(BC(\mathcal{S}), \|\cdot\|_{\infty})$ is not the same as the topology of the Euclidean $\mathbb{R}^{J}$ for any finite $J$. Therefore, the Borel σ-field on $BC(\mathcal{S})$ is different from the Borel σ-field on any $\mathbb{R}^{J}$. Conditioning on different σ-fields delivers different conditional probability distributions, and so, we are not guaranteed to have $\Pr(Z \in A \mid C(\cdot) = c)$ close to $\Pr(Z \in A \mid \cap_{j = 1}^{J} \{B_{j} = b_{j}\})$ for all measurable sets $A$. We can still obtain the existence of conditional densities, but we cannot derive conclusions based on some approximation $\sum_{j = 1}^{J} b_{j} p_{j}(S_{i})$ for $C(S_{i})$, no matter how large $J$ is.

3. Regression Estimator

Next, we consider the properties of the Nadaraya-Watson kernel regression estimator:

$$\hat{m}(x) = \frac{\sum_{i = 1}^{n} Y_{i} K\left(\frac{X_{i} - x}{h_{n}}\right)}{\sum_{i = 1}^{n} K\left(\frac{X_{i} - x}{h_{n}}\right)}, \tag{8}$$

where $K(\cdot)$ is the kernel function and $h_{n}$ is the bandwidth. As previously mentioned, the objective here is to work as closely as possible to the standard kernel literature. The assumptions we impose are therefore similar to the standard assumptions (see Pagan and Ullah [48]), but with the population density and regression function replaced by the corresponding conditional functions and with the extra “Q-almost all c” qualifiers added. For brevity, we relegate the properties of the kernel density estimator to the Supplemental Material.
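The estimator in (8) is a kernel-weighted average of the $Y_{i}$ and can be sketched in a few lines of Python. The version below assumes a product Gaussian kernel and a single scalar bandwidth. The function names are hypothetical, and any kernel satisfying the conditions imposed on $K$ could be substituted.

```python
import numpy as np

def gaussian_kernel(u):
    """Product Gaussian kernel evaluated at the rows of u (shape (n, k))."""
    k = u.shape[1]
    return np.exp(-0.5 * np.sum(u ** 2, axis=1)) / (2.0 * np.pi) ** (k / 2.0)

def nadaraya_watson(x0, X, Y, h):
    """Kernel-weighted average of Y at the evaluation point x0, as in (8)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                       # treat a 1-D array as n observations with k = 1
    x0 = np.atleast_1d(np.asarray(x0, dtype=float))
    weights = gaussian_kernel((X - x0) / h)  # K((X_i - x) / h_n) for each i
    denom = weights.sum()
    if denom == 0.0:
        return float("nan")                  # no observation receives positive weight
    return float(weights @ Y / denom)

# Usage on synthetic data (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=2000)
Y = np.sin(X) + 0.1 * rng.normal(size=2000)
print(nadaraya_watson(0.5, X, Y, h=0.2))
```

The normalizing constant of the kernel cancels in the ratio, so it affects the density estimate in the denominator but not the regression estimate itself.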

We maintain Assumptions 1 and 2 from now on. In addition, we impose the following conditions:

Condition 1.

Let $\mathcal{K}$ be the class of all Borel measurable nonnegative bounded real-valued functions $K(u)$, such that: (i) $\int K(u) \, du = 1$; (ii) $\int |K(u)| \, du < \infty$; (iii) $|K(u)| \, \|u\|^{k} \to 0$ as $\|u\| \to \infty$; (iv) $κ = \int K^{2}(u) \, du < \infty$; (v) $\sup_{u} |K(u)| < \infty$; and (vi) $μ_{2} = \int u^{2} K(u) \, du < \infty$.

Condition 2.

For $Q$-almost all $c \in \mathcal{C}$, the conditional density $f(x \mid c)$ is continuous at any point $x_{0}$.

Condition 3.

(i) $h_{n} \to 0$ as $n \to \infty$ and (ii) $n h_{n} \to \infty$ as $n \to \infty$.

Condition 4.

For $Q$-almost all $c$, (i) $f(x \mid c)$ is twice continuously differentiable with respect to $x$ in some neighbourhood of $x_{0}$ and (ii) the second-order derivatives of $f(x \mid c)$ with respect to $x$ are bounded in this neighbourhood.

Condition 5.

For $Q$-almost all $c$, the point $x_{0}$ is in the interior of the support of $X$ conditional on $\{C = c\}$ and $f(x_{0} \mid c) \geq ξ > 0$, for some finite $ξ$.

Condition 6.

The kernel $K$ is a symmetric function satisfying $\int u K(u) \, du = 0$.

Condition 7.

(i) $E[ε_{i} \mid X_{i}, σ(C)] = 0$ a.s.; and (ii) let $σ^{2}(x, c) = E(ε_{i}^{2} \mid X_{i} = x, C = c)$, and assume $σ^{2}(X, C) < \infty$ a.s.

Condition 8.

For $Q$-almost all $c$, the function $m(x, c)$ is twice continuously differentiable with respect to $x$ in some neighbourhood of $x_{0}$.

Conditions 1–5 suffice to obtain the asymptotic properties of the kernel density estimator (consistency, rate of convergence and asymptotic distribution; see the Supplemental Material). Condition 6 is standard in the literature.
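The integral requirements in Conditions 1 and 6 are easy to check numerically for a given kernel. The sketch below does so for two common univariate choices, the Gaussian and the Epanechnikov kernels, using a simple midpoint rule. The helper names, the integration ranges and the grid size are assumptions made for illustration, and the choice of these two kernels is not taken from the paper.

```python
import numpy as np

def midpoint_integral(f, a, b, m=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    du = (b - a) / m
    u = a + (np.arange(m) + 0.5) * du
    return float(np.sum(f(u)) * du)

def gaussian(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def epanechnikov(u):
    return 0.75 * np.clip(1.0 - u ** 2, 0.0, None)

def kernel_constants(K, a, b):
    """Return (integral of K, first moment, kappa, mu_2) for a univariate kernel."""
    total = midpoint_integral(K, a, b)                        # Condition 1(i)
    first = midpoint_integral(lambda u: u * K(u), a, b)       # Condition 6
    kappa = midpoint_integral(lambda u: K(u) ** 2, a, b)      # Condition 1(iv)
    mu2 = midpoint_integral(lambda u: u ** 2 * K(u), a, b)    # Condition 1(vi)
    return total, first, kappa, mu2

print(kernel_constants(gaussian, -10.0, 10.0))
print(kernel_constants(epanechnikov, -1.0, 1.0))
```

Both kernels are nonnegative, bounded and symmetric, so the remaining parts of Conditions 1 and 6 hold by inspection. The constant $κ$ reappears in the asymptotic variance of the estimator.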

Condition 7(i) implies $m(x, C) = E[Y \mid X = x, σ(C)]$. In the standard factor model, this translates into:

$$m(x, C) = E[Y \mid X = x, C] = α + x' β + \sum_{j = 1}^{J} E[S_{ij} \mid X = x, C] \, C_{j}. \tag{9}$$

Note that $m(x, C)$ is a random object because $C$ has not been fixed. Typically in the literature, $S_{i}$ is assumed to be independent of $(X_{i}, C)$, in which case $E[S_{ij} \mid X = x, C] = E[S_{ij}] \equiv b_{ij}$, where $b_{ij}$ is an unknown constant. Unlike the standard model, here, we allow $J$ to be infinite (as long as $C$ belongs to an appropriate separable metric space); we do not require $S_{i}$ to be independent of $(X_{i}, C)$, and we allow for more complicated interactions between $X$ and $C$.

Condition 7(ii) allows for conditional heteroskedasticity; and Condition 8 is used to apply Q-almost sure Taylor expansions similar to what is usually done in the kernel literature.

Remark 1.

Condition 8 requires $m(x, c)$ to be twice continuously differentiable in $x$ for almost all $c$. To fix ideas, consider the following case: let $S_{i} = X_{i}$, $C(\cdot) \in L_{2}(\mathcal{X})$ and $m(X, C) = m_{1}(X_{i}) + C(X_{i})$. Conditioned on the event $\{X_{i} = x\} \cap \{C(\cdot) = c\}$, we have that:

$$E[Y_{i} \mid X_{i} = x, C(\cdot) = c] = m_{1}(x) + \left(\sum_{j = 1}^{\infty} c_{j} ϕ_{j}(x)\right) = m(x, c),$$

while conditioning only on the event $\{X_{i} = x\}$, we obtain the random object:

$$E[Y_{i} \mid X_{i} = x, C(x)] = m_{1}(x) + \left(\sum_{j = 1}^{\infty} C_{j} ϕ_{j}(x)\right) = m(x, C).$$

Therefore, to satisfy Condition 8, we need $E[Y_{i} \mid X_{i}, C(X_{i})]$ to be twice continuously differentiable with respect to both the first and second arguments, and we also need $C(\cdot)$ to be twice continuously differentiable with respect to $x$.17

To obtain the consistency of $\hat{m}(x)$, we first show that the kernel density converges in probability to the conditional density $f(x \mid C)$. Then, we prove that the mean-squared error of $\hat{m}(x)$ conditional on $σ(C)$ converges to zero in probability. Finally, consistency follows by the dominated convergence theorem. We then show that the rate of convergence is the same as the rate of convergence without common shocks. The pointwise asymptotic distribution is obtained using the martingale difference sequence central limit theorem.

Proposition 1.

Let

E [· | X = {\{x_{i}\}}_{i = 1}^{n}]

denote the conditional expectation given

x_{i}

,

i = 1, \ldots, n

. Let Assumptions 1 and 2 and Conditions 1–8 hold. Then:

1.: $\hat{m} (x) \overset{p}{⟶} m (x, C)$ as $n \to \infty .$
2.: $\hat{m} (x) - m (x, C) = O_{p} (n^{- \frac{2}{4 + k}}) .$
3.: Suppose also that $\int {|K (u)|}^{2 + δ} d u < \infty$ and $E [{|ε_{i}|}^{2 + δ}] < \infty$ , for some $δ > 0$ . Define $σ^{2} (x, C) = E (ε_{i}^{2} | X_{i} = x, C)$ . Then, (i) as $n \to \infty$ :

$\sqrt{n h_{n}^{k}} (\hat{m} (x) - E [\hat{m} (x) | X = {\{x_{i}\}}_{i = 1}^{n}, C]) \overset{d}{⟶} {(\frac{σ^{2} (x, C)}{f (x | C)} \int K^{2} (u) d u)}^{1 / 2} N (0, 1)$

and (ii) if, in addition, $\sqrt{n h_{n}^{k}} h_{n}^{2} \to 0$ as $n \to \infty$ , then:

$\sqrt{n h_{n}^{k}} (\hat{m} (x) - m (x, C)) \overset{d}{⟶} {(\frac{σ^{2} (x, C)}{f (x | C)} \int K^{2} (u) d u)}^{1 / 2} N (0, 1)$

as $n \to \infty$ .

Proposition 1.1 shows that the kernel regression estimator converges in probability to the random object

m (x, C) = E [Y | X = x, σ (C)]

. In general,

m (x, C)

is different from the conditional expectation

m (x) = E [Y | X = x]

; the equality

m (x, C) = m (x)

only holds when Y is mean independent of C given X. To see how this difference may affect the interpretation of potential estimands, take the standard factor model as an example.18 In this case,

m (x, C)

is given by (9), while

m (x)

is given by:

m (x) = E [Y | X = x] = α + x^{'} β + \sum_{j = 1}^{J} E [S_{i j} C_{j} | X = x] .

(10)

If we assume, as is usually done, that

S_{i}

is independent of

(X_{i}, C)

, we have that

E [S_{i j} C_{j} | X = x] = b_{i j} E [C_{j} | X = x]

. If there is no cross-sectional dependence on regressors resulting from the common shocks, then

E [C_{j} | X = x] = E [C_{j}]

. In addition, if we normalize

E [C_{j}] = 0

for all j, then

m (x) = α + x^{'} β

, while

m (x, C) = α + x^{'} β + \sum_{j = 1}^{J} b_{i j} C_{j}

. Because Y is not mean independent of C given X,

\hat{m} (x) \overset{p}{⟶} m (x, C) \neq m (x)

.

Although we cannot estimate

m (x)

consistently, it is possible to identify and estimate β by noting that

m (x_{1}, C) - m (x_{2}, C) = {(x_{1} - x_{2})}^{'} β

, for

x_{1} \neq x_{2}

. Similarly, for nonparametric factor models,

Y_{i} = m_{1} (X_{i}) + U_{i}

, one can identify and estimate the slope of

m_{1} (x)

. However, the presence of the common shocks

\sum_{j = 1}^{J} b_{i j} C_{j}

prevents the identification of the intercept α in the linear model (and the identification of the location of

m_{1} (x)

in the nonparametric model) even if we normalize

E [C_{j}] = 0

for all j.
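The contrast between $m (x, C)$ and $m (x)$ in the linear factor model, and the fact that the slope remains identified while the intercept does not, can be illustrated with a small simulation. The sketch below rests on assumed values that do not come from the paper: a scalar regressor, a single factor ($J = 1$), a loading with common mean b, a Gaussian kernel, a rule-of-thumb bandwidth, and one fixed realization of the shock ($C = 2$) standing in for a single draw from a mean-zero distribution.

```python
import numpy as np

def nadaraya_watson(x0, X, Y, h):
    """Nadaraya-Watson estimate at x0 with a Gaussian kernel (scalar regressor)."""
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)
    return np.sum(w * Y) / np.sum(w)

rng = np.random.default_rng(1)
n = 20_000
alpha, beta, b = 1.0, 2.0, 1.0      # hypothetical parameter values
C = 2.0                             # one realized shock, shared by every unit i

X = rng.standard_normal(n)          # regressors unaffected by the shock
S = b + rng.standard_normal(n)      # loadings independent of (X, C), E[S_i] = b
eps = rng.standard_normal(n)
Y = alpha + beta * X + S * C + eps

h = n ** (-1 / 5)                   # bandwidth of order n^{-1/(4+k)} with k = 1

# Level at x = 0: the estimate targets m(0, C) = alpha + b*C, not m(0) = alpha.
m_hat_0 = nadaraya_watson(0.0, X, Y, h)

# Slope from a difference: m(x1, C) - m(x2, C) = (x1 - x2) * beta, so b*C cancels.
x1, x2 = 1.0, -1.0
slope_hat = (nadaraya_watson(x1, X, Y, h) - nadaraya_watson(x2, X, Y, h)) / (x1 - x2)

print(m_hat_0, alpha + b * C, alpha)
print(slope_hat, beta)
```

Under these assumptions the level estimate should settle near $α + b C$ instead of α, whereas the difference quotient should be close to β up to smoothing bias, since the term $b C$ drops out of the difference.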

Remark 2.

The nonparametric factor model with

J = \infty

,

E [S_{i j} | X = x, C] = b_{i j}

and

E [C_{j}] = 0

, for all j, has a structure similar to the one proposed by Robinson [44]. Yet, while Robinson [44] shows that the kernel regression estimator converges in probability to

m (x)

, we obtain convergence to

m (x, C)

. An important distinction comes from the assumption on the sampling process. Because we have exchangeable data given the common shocks (Assumption 1), the conditions we impose are not sufficient to “get rid of” C in the limit. Robinson [44], in contrast, does not impose the conditional i.i.d. sampling process.

Returning to the standard factor model, if we assume now the presence of cross-sectional dependence on regressors captured by, say,

X_{i} = S_{i}^{1'} C

with

S_{i} = (S_{i}^{1}, S_{i}^{2})

, then

E [C_{j} | X = x] \neq E [C_{j}]

and:

\begin{matrix} m (x, C) & = & α + x^{'} β + \sum_{j = 1}^{J} b_{i j} C_{j}, \\ m (x) & = & α + x^{'} β + \sum_{j = 1}^{J} b_{i j} E [C_{j} | X = x] . \end{matrix}

Again, Y is not mean independent of C given X, so

\hat{m} (x) \overset{p}{⟶} m (x, C) \neq m (x)

, but it is still possible to identify β in the parametric model and the slope of

m_{1} (x)

in the nonparametric version.19

In the standard factor model, Y is mean independent of C given X only if the common shocks have no direct effect on Y. This is the case when

E [S_{i j}] = b_{i j} = 0

. When this is true,

m (x, C) = m (x)

, and the kernel regression estimator converges in probability to

m (x)

, even when there is cross-sectional dependence on X. In this case, we identify both parameters α and β in the linear model and

m_{1} (x)

in the nonparametric model. Note that assuming

E [S_{i j}] = 0

for all j is not an innocuous normalization, but a substantive assumption.

Remark 3.

The last case is similar to Andrews [8]. Let

X_{i} = S_{i}^{1'} C^{1}

and

U_{i} = S_{i}^{2'} C^{2} + ε_{i}

, where

C = (C^{1}, C^{2})

is mutually independent and

S_{i} = (S_{i}^{1}, S_{i}^{2})

(see Andrews’ Assumption SF1). Imposing

E [S_{i j} | X = x, C] = E [S_{i j}] = 0

is similar to imposing Andrews’ Condition SF3. Assuming Condition 7(i) together with mutual independence of

(S_{i}^{1}, S_{i}^{2}, ε_{i})

is similar to Andrews’ Condition SF2.

Proposition 1.2 shows that the rate of convergence of the kernel regression in the presence of common shocks is the same as the rate of convergence without common shocks.

Proposition 1.3 presents the asymptotic distribution of the kernel regression estimator. It shows that even when

\hat{m} (x) \overset{p}{⟶} m (x, C) = m (x)

, the common shocks affect the asymptotic distribution of the kernel regression because they may impact both the conditional variance of Y and the conditional density of X. This result is similar to that of Andrews [8], Robinson [44] and others.

Remark 4.

A consequence of Proposition 1.3 is that inference results depend on whether Y is mean independent of C given X. To test a null hypothesis, say,

H_{0} : m (x) = m_{0} (x)

against

H_{1} : m (x) \neq m_{0} (x)

, the corresponding t statistic is:

T_{n} = \sqrt{n h_{n}^{k}} \frac{(\hat{m} (x) - m_{0} (x))}{{(\frac{{\hat{σ}}^{2} (x)}{\hat{f} (x)} \int K^{2} (u) d u)}^{1 / 2}} .

The usual two-sided t test with significance level α rejects the null if

|T_{n}| > z_{1 - α / 2}

, where

z_{α}

is the α quantile of the standard normal distribution. If Y is mean independent of C given X, then

Pr (|T_{n}| > z_{1 - α / 2}) \to α

as

n \to \infty

. Otherwise, we have

Pr (|T_{n}| > z_{1 - α / 2}) \to 1

as

n \to \infty

.20
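A rough numerical sketch of Remark 4 follows. It computes $T_{n}$ with the plug-in quantities $\hat{f} (x)$ and ${\hat{σ}}^{2} (x)$ for scalar X and compares a design in which Y is mean independent of C given X (loading mean zero) with one in which it is not. The simulated design, the Gaussian kernel, the fixed shock value and the undersmoothed bandwidth $h = n^{- 1 / 4}$ are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

INT_K2 = 1.0 / (2.0 * np.sqrt(np.pi))    # integral of K(u)^2 for the standard Gaussian kernel

def t_stat(x0, X, Y, h, m0):
    """T_n for H0: m(x0) = m0, with scalar X (k = 1) and a Gaussian kernel."""
    n = X.size
    K = np.exp(-0.5 * ((X - x0) / h) ** 2) / np.sqrt(2.0 * np.pi)
    f_hat = K.sum() / (n * h)                        # kernel density estimate
    m_hat = (K * Y).sum() / K.sum()                  # Nadaraya-Watson estimate
    s2_hat = (K * Y**2).sum() / K.sum() - m_hat**2   # local variance estimate
    return np.sqrt(n * h) * (m_hat - m0) / np.sqrt(s2_hat / f_hat * INT_K2)

def simulate(b, seed, n=20_000, C=2.0):
    """Hypothetical design: Y = 1 + 2 X + S C + eps with E[S] = b and one shock C."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal(n)
    S = b + rng.standard_normal(n)
    Y = 1.0 + 2.0 * X + S * C + rng.standard_normal(n)
    return X, Y

n = 20_000
h = n ** (-1 / 4)          # undersmoothed, so sqrt(n h) h^2 shrinks with n
x0, m0 = 0.0, 1.0          # null value m0(0) = alpha = 1

X_dep, Y_dep = simulate(b=1.0, seed=2)   # Y not mean independent of C given X
X_ind, Y_ind = simulate(b=0.0, seed=3)   # Y mean independent of C given X
T_dep = t_stat(x0, X_dep, Y_dep, h, m0)
T_ind = t_stat(x0, X_ind, Y_ind, h, m0)
print(T_dep, T_ind)
```

With a zero loading mean the statistic should behave like a standard normal draw, while with a nonzero loading mean it should be far in the tail even though the null on $m (x)$ is true, because $\hat{m} (x)$ is centered at $m (x, C)$.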

Remark 5.

The bandwidth can be chosen by minimizing the approximated integrated mean squared error (

\mathrm{AMISE}

) conditional on

σ (C)

. The bandwidth must be a

σ (C)

-measurable random variable,

h_{n} (C)

. In the Supplemental Material, we show that

h_{n} (C) = O_{p} (n^{- \frac{1}{4 + k}})

, and one might expect both plug-in and cross-validation estimators to be consistent. The usual concerns in the literature about how to select the bandwidth are present here, but for brevity, we do not investigate the topic further. We only emphasize that the bandwidth choice based on the unconditional

\mathrm{AMISE}

is infeasible because it is impossible to estimate the distribution of C (and integrate that out) using a single cross-sectional dataset.
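The interaction between the bandwidth rate in Remark 5 and the condition $\sqrt{n h_{n}^{k}} h_{n}^{2} \to 0$ in Proposition 1.3(ii) can be checked by simple arithmetic. For a deterministic bandwidth $h = n^{- a}$, the quantity $\sqrt{n h^{k}} h^{2}$ equals $n^{(1 - a (4 + k)) / 2}$, so it stays constant when the bandwidth is exactly of order $n^{- 1 / (4 + k)}$ and vanishes only under a faster rate. The snippet below evaluates this for $k = 1$; the function name and the alternative exponent $1 / (3 + k)$ are illustrative choices, not part of the paper.

```python
def bias_scale(n, k, a):
    """sqrt(n h^k) * h^2 for the deterministic bandwidth h = n^(-a)."""
    h = n ** (-a)
    return (n * h**k) ** 0.5 * h**2

k = 1
for n in (10**3, 10**5, 10**7):
    at_optimal_rate = bias_scale(n, k, 1 / (4 + k))   # exponent (1 - 1)/2 = 0: constant
    undersmoothed = bias_scale(n, k, 1 / (3 + k))     # exponent -1/8: shrinks with n
    print(n, at_optimal_rate, undersmoothed)
```

This suggests that a bandwidth shrinking slightly faster than the rate in Remark 5 is the natural choice when the centered result in Proposition 1.3(ii) is needed.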

4. Conclusions

In this paper, we investigate a nonparametric regression estimator for cross-sectional data in the presence of very general, potentially infinite-dimensional, common shocks. In a companion paper (Souza-Rodrigues [27]), we extend the results to a “large-N, large-T” panel data framework for a nonlinear generalized regression model. We plan to investigate extensions to finite-T panel data models in the future.

Supplementary Materials

Supplementary File 1

Acknowledgments

I am grateful to Donald Andrews, Xiaohong Chen, Philip Haile, Steven Berry, Tai Otsu, Yuichi Kitamura, Ed Vytlacil, Peter Phillips, Marfisa Queiroz, two anonymous referees, and the participants of the Econometrics Lunch at Yale. Financial support from Charles V. Hickox Fellowship at Yale University, Yale University Fellowship, and Kernan Brothers Environmental Fellowship at Harvard University is gratefully acknowledged. All errors are mine.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A. Disintegration Theory

We follow the discussion in Pollard [28] (Chapter 5 and Appendix F).21 Throughout this section, let the measurable space be

(Ω, F)

, and let C be a measurable map from

(Ω, F)

into

(C, B)

. Let λ be a sigma-finite measure on

F

and μ be a sigma-finite measure on

B

. The definition of conditional distributions given in Pollard [28] (p. 113) is:

Definition 1.

Let P be a probability measure on

(Ω, F)

, and let Q be the probability distribution of C induced by P. A family

P = \{P_{c} : c \in C\}

of probability measures on

F

is called the conditional probability distribution of P given C if:

1.: $P_{c} \{C \neq c\} = 0$ , for Q-almost all $c \in C$ ;
and for each nonnegative measurable function f on Ω:
2.: the map $c \mapsto \int f (ω) d P_{c} (ω)$ is $B$ -measurable; and
3.: the equality $\int f (ω) d P = \int [\int f (ω) d P_{c} (ω)] d Q (c)$ holds.

The conditional probability distribution

P = \{P_{c} : c \in C\}

is a family of probability measures satisfying (i) a concentration property (

P_{c} \{C \neq c\} = 0

), (ii) a measurability condition (Property 2) and (iii) a decomposition property (Property 3). Unfortunately, the conditional probability distribution may not exist. The Kolmogorov conditional expectation, on the other hand, always exists. For completeness, we state its definition below (Pollard [28], p. 126):

Definition 2.

Let f be a random variable on Ω and P be a probability measure on

(Ω, F)

. For each sub-sigma-field

G \subset F

, the conditional expectation

E (f | G)

is the random variable defined on

(Ω, G)

, such that, for all sets

A \in G

, with indicator functions

I_{A}

,

\int \{I_{A} (ω) f (ω)\} d P (ω) = \int \{I_{A} (ω) [E (f | G) (ω)]\} d P (ω) .

(11)

The random variable

E (f | G)

is called the conditional expectation of f given the sub-sigma-field

G

and it is unique up to P-equivalence.

If the conditional probability distribution of P given C exists,

P = \{P_{c} : c \in C\}

, the

B

-measurable function defined by:

E (f | C = c) = \int f (ω) d P_{c} (ω)

satisfies the equality (11) for Q-almost all

c \in C

.
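When the sample space is finite, the conditional probability distribution always exists and the properties in Definition 1, together with equality (11), can be checked by direct enumeration. The toy example below is purely illustrative; the sample space, the probabilities and the function f are hypothetical. Here $P_{c}$ is obtained by renormalizing P on the set $\{C = c\}$, and the measurability requirement (Property 2) holds trivially because C takes only two values.

```python
from fractions import Fraction as Fr

# Hypothetical finite sample space: each point is omega = (z, c).
P = {
    (0, "a"): Fr(1, 12), (1, "a"): Fr(2, 12), (2, "a"): Fr(3, 12),
    (0, "b"): Fr(3, 12), (1, "b"): Fr(2, 12), (2, "b"): Fr(1, 12),
}

def C(w):
    return w[1]                       # the conditioning map picks the second coordinate

values = sorted({C(w) for w in P})

# Q is the distribution of C induced by P.
Q = {c: sum(p for w, p in P.items() if C(w) == c) for c in values}

# P_c renormalizes P on {C = c} and puts zero mass elsewhere.
P_c = {c: {w: (p / Q[c] if C(w) == c else Fr(0)) for w, p in P.items()} for c in values}

def f(w):
    return w[0]                       # an arbitrary nonnegative function on the sample space

def cond_exp(c):
    """E(f | C = c), the integral of f with respect to P_c."""
    return sum(f(w) * p for w, p in P_c[c].items())

# Property 1 (concentration): P_c{C != c} = 0 for every c.
concentrated = all(sum(p for w, p in P_c[c].items() if C(w) != c) == 0 for c in values)

# Property 3 (decomposition): integrating f against P equals integrating E(f | C = c) against Q.
lhs = sum(f(w) * p for w, p in P.items())
rhs = sum(cond_exp(c) * Q[c] for c in values)

# Equality (11) on the set A = {C = "a"}, which belongs to sigma(C).
lhs_A = sum(f(w) * p for w, p in P.items() if C(w) == "a")
rhs_A = sum(cond_exp(C(w)) * p for w, p in P.items() if C(w) == "a")

print(concentrated, lhs, rhs, lhs_A, rhs_A)
```

The difficulties discussed next arise only when C takes uncountably many values, so that the sets $\{C = c\}$ have probability zero and renormalization is no longer available.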

The problem with the Kolmogorov conditional expectation is that each of its usual properties (mainly, being a linear increasing functional of f satisfying the monotone convergence property) holds Q-almost everywhere, but with possibly uncountably many negligible sets on which these properties do not hold. The accumulation of these null sets may lead to paradoxes when one is trying to compute the conditional expectation (see, e.g., [47]). To avoid these difficulties, topological assumptions are invoked to guarantee the existence of conditional probability distributions

P = \{P_{c} : c \in C\}

, such that all of the properties of the Kolmogorov conditional expectation are satisfied except in countably many Q-negligible sets. By collecting all of these countably many Q-negligible sets into a single Q-null set, we avoid the problems and paradoxes coming from an accumulation of uncountably many null sets. Under the topological assumptions, the family

P = \{P_{c} : c \in C\}

is a version of the Kolmogorov conditional expectation that does not run into such difficulties and, as a by-product, guarantees the existence of conditional densities. The conditional densities may then be (carefully) manipulated preserving the intuition we have for the cases where the conditioning event has positive probability.

The existence of a conditional probability distribution follows from a general decomposition called disintegration. The definition of disintegration given in Chang and Pollard [50] (p. 292) is:

Definition 3.

The measure λ has a disintegration

Λ = \{λ_{c} : c \in C\}

with respect to C and μ (or a

(C, μ)

-disintegration) if:

1.: $λ_{c}$ is a sigma-finite measure on $F$ concentrated on $\{C = c\}$ , that is $λ_{c} \{C \neq c\} = 0$ for μ-almost all c; and for each nonnegative measurable function f on Ω:
2.: the map $c \mapsto \int f (ω) d λ_{c} (ω)$ is $B$ -measurable; and
3.: the equality $\int f (ω) d λ = \int [\int f (ω) d λ_{c} (ω)] d μ (c)$ holds.

From the definitions, it is clear that a

(C, μ)

-disintegration

Λ = \{λ_{c} : c \in C\}

can be a conditional probability distribution

P = \{P_{c} : c \in C\}

. Yet, it is useful to let Λ be a collection of sigma-finite measures, so that we can define conditional densities with respect to dominating measures.

Based on this disintegration, we can define a new measure

μ \otimes Λ

on the product

(Ω \times C, F \otimes B)

, by the iterated integral:

(μ \otimes Λ) (A) = \int [\int I_{A} d λ_{c} (ω)] d μ (c)

for all sets

A \in F \otimes B

. The measure

μ \otimes Λ

has to be well-defined for Property 3 of Definition 3 to be satisfied.

The existence of a disintegration is guaranteed by the following theorem (Theorem 6 in Pollard [28], Appendix F).

Theorem 1.

(Existence of disintegration) Let λ be a sigma-finite Radon measure on the Borel sigma-field

F

of a metric space Ω. Let μ be a sigma-finite measure on

B

that dominates the image measure

λ (C^{- 1})

(i.e., the measure on

B

induced by the map C and the measure λ). If the set:

\mathrm{graph} (C) \equiv \{(ω, c) \in Ω \times C : C (ω) = c\}

is

F \otimes B

measurable, then λ has a

(C, μ)

-disintegration,

Λ = \{λ_{c} : c \in C\}

, uniquely determined up to μ-equivalence (i.e., if

\{λ_{c}^{*} : c \in C\}

is another

(C, μ)

-disintegration, then

μ \{c \in C : λ_{c} \neq λ_{c}^{*}\} = 0

).

To guarantee the existence of the

(C, μ)

-disintegration, we need, therefore, to restrict: (i) Ω to be a metric space with the Borel sigma-field

F

; (ii)

λ

to be a sigma-finite Radon measure; and (iii) the set

\mathrm{graph} (C) \equiv \{(ω, c) \in Ω \times C : C (ω) = c\}

to be

F \otimes B

measurable. Depending on the problem at hand, it may be reasonable to assume (i) directly. To see the importance of the requirements (ii) and (iii), we briefly describe how the proof works. We then finally discuss the existence of conditional densities.

First, assume that Ω is a compact metric space, and let

K_{0}

be a compact paving. A compact paving is a class of compact sets in Ω that is closed under finite unions and intersections. One can show that

K_{0}

is countable when Ω is compact. The proof carefully constructs a finitely additive measure

λ_{c} : K_{0} \to R^{+}

, for some

c \in C

, so that the desired “measure-like” properties of the disintegration (Definition 3) hold for μ-almost all c. Because

K_{0}

is countable, all of the desired properties of

λ_{c}

hold, except on countably many negligible sets, which can be collected into a single negligible set. It is shown, then, that there exists a unique extension of

λ_{c}

to a countably additive measure defined on a sigma-field containing

K_{0}

(see [28], Appendix A). The extension is (inner) approximated by compact sets. By construction, all of the desirable properties hold for the extension of

λ_{c}

and for all

c \notin N

, where N is a single set with

μ (N) = 0

. The proof then shows that

c ⟼ λ_{c} (A)

is

B

-measurable for all Borel sets

A \in F

. Finally, the argument is extended to the case in which Ω is not compact but the measure λ concentrates all of its mass on a disjoint union of countably many compact Borel sets; i.e., the measure λ is a sigma-finite Radon measure. Intuitively, the proof exploits compact approximations as a way to obtain countable additivity from finite additivity and to collect the negligible sets into a single null set N.

Pachl [49] shows that a sigma-finite Radon λ (Requirement (ii)) is a necessary condition for existence of disintegration. Therefore, even when Ω is not compact (or not separable), λ must have separable support.22

The third requirement, the

F \otimes B

-measurability of the set

\mathrm{graph} (C)

, is also necessary because the measure:

(μ \otimes Λ) (A) = \int [\int I_{A} d λ_{c} (ω)] d μ (c) = λ \{ω \in Ω : (ω, C (ω)) \in A\}

is well-defined only if

A \in F \otimes B

. The condition is not innocuous: it is well known that the

\mathrm{graph} (C)

may not be

F \otimes B

-measurable even when C is measurable. The

F \otimes B

-measurability can be obtained if the σ-field

B

is countably generated and contains all of the singleton sets

\{c\}

(see [28], p. 344). In particular, if

B

is the Borel σ-field on the separable metric space

C

, these conditions are satisfied (see [28], p. 103).

A separable

C

with the Borel σ-field

B

is sufficient, but not necessary, for the

F \otimes B

-measurability of the

\mathrm{graph} (C)

. It is possible, but not trivial, to obtain such a result for non-separable spaces. Hansell [51] provides very abstract (and somewhat difficult to interpret) sufficient conditions for the

F \otimes B

-measurability when

C

is not separable. Yet, even if the

F \otimes B

-measurability holds for a non-separable

C

, the Radon measure λ puts all mass on a separable subset of

C

. To see why, let G be a countable union of compact sets on Ω, such that

λ (G^{c}) = 0

, where

G^{c}

is the complement of G. The map

g : Ω \to Ω \times C

defined by

g (ω) = (ω, C (ω))

is such that λ concentrates all mass in the set

g (G)

. If C is Borel measurable, the set

g (G) \subset Ω \times C

is separable, and so is

C (G)

(see Bogachev [52], Corollary 6.10.17). The image measure of λ under C therefore puts all of its mass on a separable subset of

C

when

C

is non-separable. Therefore, although

C

does not have to be separable to obtain the existence of disintegration, it seems difficult to get away from separability in this context.

The next theorem provides the conditions under which conditional densities exist (Theorem 12 in Pollard [28], Chapter 5).

Theorem 2.

(Conditional densities) Let P be a probability measure on

(Ω, F)

with density

f (ω)

with respect to the sigma-finite measure λ. Let λ have a

(C, μ)

-disintegration

Λ = \{λ_{c} : c \in C\}

. Then:

1.: The image measure $Q = P (C^{- 1})$ (i.e., the probability distribution of C induced by P) is absolutely continuous with respect to μ, with density $q (c) \equiv \int f (ω) d λ_{c} (ω)$ .
2.: The set $\{(ω, c) \in Ω \times C : q (c) = \infty \text{ or } q (c) = 0\}$ has zero $μ \otimes Λ$ measure.
3.: The probability measure P has conditional distribution $\{P_{c} : c \in C\}$ given C, where $P_{c}$ is defined by having density:

$f (ω | c) \equiv \frac{f (ω)}{q (c)} \{0 < q (c) < \infty\}$

(12)

with respect to $λ_{c}$ , for Q-almost all $c \in C$ .

The formula in (12) is the general version of the conditional density as the ratio of the joint density to the marginal density, but not requiring C to belong to a Euclidean space. To guarantee the existence of the conditional density, we therefore need the existence of the

(C, μ)

-disintegration

Λ = \{λ_{c} : c \in C\}

. For a more detailed discussion, see [28,29,30,50].
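In the familiar Euclidean case, formula (12) reduces to the ratio of a joint density to a marginal density, and this can be verified numerically. The sketch below assumes, purely for illustration, that $ω = (z, c)$ lies in the plane, that C is the projection onto the second coordinate, that λ and μ are Lebesgue measures, that $λ_{c}$ is Lebesgue measure on the line through c, and that the density f is bivariate normal with unit variances and a hypothetical correlation of 0.6. The integral defining $q (c)$ is approximated by a Riemann sum.

```python
import numpy as np

rho = 0.6                                   # hypothetical correlation

def joint(z, c):
    """Bivariate normal density f(z, c) with unit variances and correlation rho."""
    det = 1.0 - rho**2
    quad = (z**2 - 2.0 * rho * z * c + c**2) / det
    return np.exp(-0.5 * quad) / (2.0 * np.pi * np.sqrt(det))

z = np.linspace(-10.0, 10.0, 20_001)        # grid on the fiber {C = c0}
dz = z[1] - z[0]
c0 = 0.8

q_c0 = np.sum(joint(z, c0)) * dz            # q(c0): integral of f over the fiber
cond = joint(z, c0) / q_c0                  # f(z | c0) from formula (12)
cond_mean = np.sum(z * cond) * dz           # E[Z | C = c0]

# Textbook closed forms for the normal case, used only as a check.
marginal = np.exp(-0.5 * c0**2) / np.sqrt(2.0 * np.pi)
var = 1.0 - rho**2
closed = np.exp(-0.5 * (z - rho * c0) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

print(q_c0, marginal, cond_mean, rho * c0)
print(np.max(np.abs(cond - closed)))
```

The value of (12) in the paper's setting is that the same ratio remains valid when c ranges over a separable metric space rather than a Euclidean one, provided the disintegration exists.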

^1.See, for example, Arbia [1], the proceedings of the 2008 Cowles Summer Conference [2], the special issue of the Journal of Econometrics (“Analysis of Spatially Dependent Data,” 2007, 140(1), edited by Baltagi, Kelejian and Prucha), and the special issue of Econometrics (“Spatial Econometrics,” 2015, edited by Arbia and Lee). For recent surveys, see [3,4,5].
^2.See, e.g., [6,7,8,9,10,11,12,13,14,15,16,17,18,19].
^3.Formally, the probability limit of the kernel estimator for a nonstationary process can be obtained using the concept of local time, as in Wang and Phillips [26]. However, the probability limit of the kernel regression estimator may not be measurable with respect to the conditioning variables, including the common shocks. This is a particularly important problem when we extend the results to panel data models, as in Souza-Rodrigues [27].
^4.Although separability is not a necessary condition, it seems difficult to avoid it if we are to obtain the existence of conditional densities; see the discussion about the role of separability in the Appendix. Note that several separable metric spaces satisfying the sufficient conditions are available, but careful interpretation is needed in particular cases. For instance, suppose that an infinite-dimensional common shock can be well-approximated by a finite dimensional object. Because the sigma-field generated by the common shock may be different from the sigma-field generated by the approximating object, the conditional expectations given the common shock and given the finite-dimensional object are different. Ignoring this difference leads to problems such as the Borel paradox.
^5.The motivation for this application is that numerous studies have documented an inverse relationship between hospital volumes of operations and mortality rates (see [32]). This suggests that thousands of deaths per year could have been prevented if hospitals with inadequate experience (i.e., with low volume of operations) had performed fewer surgical procedures. The evidence, however, is weak for most operations. Furthermore, existing papers have estimated parametric models that may be misspecified and have not considered the potential correlation between hospital volume of operations and hospital unobserved quality.
^6.The second step runs a nonparametric instrumental variable regression across groups (hospitals) of the predicted outcome obtained in the first step on the group-level observables. It separates the impacts of group-level observables (hospital volume of surgeries) and unobservables (hospital unobserved quality).
^7.See, e.g., [33,34,35,36,37,38,39,40,41], and the discussion in [42]. For a recent survey, see [43].
^8.In a panel data setting, one typically allows for time-varying regressors $X_{i t}$ , but restricts $S_{i}$ , so that it does not vary over time, and the common shock C, so that it does not vary across individuals. Fixed-effect panel data models let $X_{i t}$ and $S_{i}$ be correlated.
^9.Note that this approach does not require known economic distances, but can readily accommodate them by taking $U_{i} = σ_{i} (X_{i}) \sum_{j = 1}^{\infty} b_{i j} e_{j}$ , $e = (e_{1}, \ldots, e_{n})$ and by making some assumptions regarding how $b_{i j}$ depends on the distance $|i - j|$ .
^10.When Andrews [8] specializes to factor structure models, he imposes more restrictions on the common shocks, which makes his approach more similar to ours.
^11.A regular conditional probability, $Pr (Y | X = x)$ , is a family of probability distributions, such that (i) for a fixed x, $Pr (· | X = x)$ is a probability measure and (ii) for a fixed measurable set A, $Pr (A | X = x)$ is a measurable function mapping x to $[0, 1]$ .
^12.The measure λ is Radon if (i) $λ (K) < \infty$ for each compact K and (ii) $λ (B) = \sup \{λ (K) : B \supseteq K, K \text{ compact}\}$ for each Borel set B.
^13.It is possible to characterize all of the objects in Assumption 2 when $W = Z \times C$ . First, we have that (i) $W$ is a separable metric space provided that $C$ is a separable metric space, as well, and (ii) the Borel σ-field $A$ on $W$ equals the product Borel σ-field $A_{Z} \otimes B$ , where we denote $A_{Z}$ the Borel σ-field on $Z$ (see [46], Proposition 1.5). Second, let $π_{c}$ be the projection of $W$ onto the coordinate space $C$ , i.e., $π_{c} : W \to C$ . Then, (i) the sub-sigma field $π_{c}^{- 1} (B)$ is contained in $A$ and (ii) because $C (w) = π_{c} (w)$ , for all $w \in W$ ; the sigma-field generated by C is $σ (C) = π_{c}^{- 1} (B) \subset A$ . Furthermore, if we define the sigma-finite Radon λ on $(W, A)$ to be the product measure $λ = ν \otimes μ$ , where $ν$ is defined on $(Z, A_{Z})$ and μ on $(C, B)$ , then the measure $λ (C^{- 1})$ induced by C and λ on $(C, B)$ equals μ, and so, $λ (C^{- 1})$ is (trivially) absolutely continuous with respect to μ. Finally, we have to assume both ν and μ are sigma-finite Radon, so that λ is sigma-finite Radon on $A$ , as well.
^14.Note that we can manipulate the conditional density (6) on $Z \otimes C$ as is usually done. Fix $C = c$ and think of $Z \otimes \{c\}$ as a copy of $Z$ embedded into the product space. For a fixed $c \in C$ , take the measure $λ_{c}$ living on $Z \otimes \{c\}$ to coincide with the Lebesgue measure on $Z$ . If $r (·)$ is a vector-valued function with $E ∥r (Z)∥ < \infty$ , then:

$E [r (Z) | C = c] = \int r (\tilde{z}) d P_{c} (\tilde{z}) = \int r (\tilde{z}) f (\tilde{z} | c) d λ_{c} (\tilde{z}) = \int r (\tilde{z}) f (\tilde{z} | c) d \tilde{z} .$
^15.Any infinite-dimensional separable Hilbert space, say $H$ , is isometrically isomorphic to a suitable $ℓ_{2} (I)$ , where the cardinality of the set I is the cardinality of an arbitrary Hilbertian basis for $H$ , i.e., there exists a linear operator $L : H \to ℓ_{2} (I)$ , such that ${∥L h∥}_{2} = {∥h∥}_{H}$ , where $h \in H$ , ${∥·∥}_{H}$ is the norm on $H$ and ${∥·∥}_{2}$ is the $ℓ_{2}$ -norm.
^16.Conditioning on the event $\{(C_{1}, C_{2}, \ldots) = (c_{1}, c_{2}, \ldots)\}$ is only one possibility. For some $a \in R$ , we could condition either on the event $\{C (S_{i}) = a\} = \{\sum_{j = 1}^{\infty} C_{j} ϕ_{j} (S_{i}) = a\}$ , or on the event $\{c (S_{i}) = a\} = \{\sum_{j = 1}^{\infty} c_{j} ϕ_{j} (S_{i}) = a\}$ , where the randomness of the event comes from $S_{i}$ , or on $\{C (s) = a\} = \{\sum_{j = 1}^{\infty} C_{j} ϕ_{j} (s) = a\}$ , where the randomness comes from $(C_{1}, C_{2}, \ldots)$ .
^17.It should be clear that it is not possible to separately identify $m_{1} (X)$ from $C (X)$ in this example.
^18.Recall that the nonparametric version of the factor model takes $Y_{i} = m_{1} (X_{i}) + U_{i}$ , with $U_{i} = \sum_{j = 1}^{J} S_{i j} C_{j} + ε_{i}$ . The parametric model imposes $m_{1} (x) = α + x^{'} β$ .
^19.Note that if we were able to estimate the conditional expectation $m (x)$ instead of $m (x, C)$ , it would be impossible to separate $x^{'} β$ from $\sum_{j = 1}^{J} b_{i j} E [C_{j} | X = x]$ , and so, we would not be able to identify β.
^20.In the Supplemental Material, we provide conditions under which the kernel density estimator is consistent: $\hat{f} (x) \overset{p}{⟶} f (x | C)$ . For the variance $σ^{2} (x, c) = E (Y_{i}^{2} | X_{i} = x, C = c) - {[m (x, c)]}^{2}$ , we can take ${\hat{σ}}^{2} (x)$ to be:

${\hat{σ}}^{2} (x) = [\frac{Σ_{i = 1}^{n} Y_{i}^{2} K (\frac{X_{i} - x}{h})}{Σ_{i = 1}^{n} K (\frac{X_{i} - x}{h})}] - {[\hat{m} (x)]}^{2} .$

The first term on the right-hand side converges in probability to $E (Y_{i}^{2} | X_{i} = x, C)$ using the same arguments as in Proposition 1. The second term on the right-hand side converges in probability to ${[m (x, C)]}^{2}$ by the Slutsky theorem. Therefore, ${\hat{σ}}^{2} (x) \overset{p}{⟶} σ^{2} (x, C)$ . Next, note that:

$T_{n} = \sqrt{n h_{n}^{k}} \frac{(\hat{m} (x) - m (x, C))}{{(\frac{{\hat{σ}}^{2} (x)}{\hat{f} (x)} \int K^{2} (u) d u)}^{1 / 2}} + \sqrt{n h_{n}^{k}} \frac{(m (x, C) - m_{0} (x))}{{(\frac{{\hat{σ}}^{2} (x)}{\hat{f} (x)} \int K^{2} (u) d u)}^{1 / 2}} .$

The first term on the RHS converges in distribution to $N (0, 1)$ by Proposition 1.3(ii). The second term on the RHS is such that: (a) $\hat{f} (x) \geq ξ > 0$ , for some finite ξ, with probability approaching one because $f (x | c) \geq ξ > 0$ for Q-almost all c (see the Supplemental Material). If (b) $σ^{2} (x, C)$ is finite Q-almost surely (implying ${\hat{σ}}^{2} (x)$ is finite with probability approaching one); and if (c) $m (x, C) \neq m_{0} (x)$ with positive probability; then, the second term on the RHS diverges in probability to $\pm \infty$ . As a result, $|T_{n}| \to \infty$ as $n \to \infty$ under the null.
^21.Dellacherie and Meyer [29], Hoffmann-Jorgensen [30] (Chapter 6 and Section 10.11), Pachl [49], and Chang and Pollard [50] are also important references.
^22.Formally, the necessary condition is that λ must be approximated by a compact paving that is closed under countable unions.

References

G. Arbia. Spatial Econometrics: Statistical Foundations and Applications to Regional Convergence. Berlin, Germany: Springer-Verlag, 2006. [Google Scholar]
D.W.K. Andrews. “Handling Dependence: Temporal, Cross-Sectional and Spatial.” In Proceedings of the Cowles Summer Conference, New Haven, CT, USA, 22–23 June 2009.
V. Sarafidis, and T. Wansbeek. “Cross-sectional dependence in panel data analysis.” Econom. Rev. 31 (2012): 483–531. [Google Scholar] [CrossRef] [Green Version]
A. Chudik, and M.H. Pesaran. “Large panel data models with cross-sectional dependence: A survey.” In The Oxford Handbook on Panel Data. Edited by B. Baltagi. New York, NY, USA: Oxford University Press, 2015, pp. 3–45. [Google Scholar]
Q.H. Xu, Z.W. Cai, and Y. Fang. “Panel data models with cross-sectional dependence: A selective review.” Appl. Math.-J. Chin. Univ. 31 (2016): 127–147. [Google Scholar] [CrossRef]
P.C.B. Phillips, and D. Sul. “Dynamic panel estimation and homogeneity testing under cross section dependence.” Econom. J. 6 (2003): 217–259. [Google Scholar] [CrossRef]
P.C.B. Phillips, and D. Sul. “Bias in dynamic panel estimation with fixed effects, incidental trends and cross section dependence.” J. Econom. 137 (2007): 162–188. [Google Scholar] [CrossRef]
D.W.K. Andrews. “Cross-section regression with common shocks.” Econometrica 73 (2005): 1551–1585. [Google Scholar] [CrossRef]
J. Bai, and S. Ng. “Evaluating latent and observed factors in macroeconomics and finance.” J. Econom. 131 (2006): 507–537. [Google Scholar] [CrossRef]
M.H. Pesaran. “Estimation and inference in large heterogeneous panels with a multifactor error structure.” Econometrica 74 (2006): 967–1012. [Google Scholar] [CrossRef]
J. Bai. “Panel data models with interactive fixed effects.” Econometrica 77 (2009): 1229–1279. [Google Scholar]
H. Moon, and M. Weidner. “Likelihood Expansion for Panel Regression Models with Factors.” Available online: http://cowles.yale.edu/handling-dependence-temporal-cross-sectional-and-spatial (accessed on 2 July 2014).
P. Zaffaroni. “Generalized Least Squares Estimation of Panel with Common Shocks.” Unpublished paper. Available online: http://www.imperial.ac.uk/people/p.zaffaroni (accessed on 1 July 2014).
M.H. Pesaran, and E. Tosetti. “Large panels with common factors and spatial correlation.” J. Econom. 161 (2011): 182–202.
L. Su, and S. Jin. “Sieve estimation of panel data models with cross section dependence.” J. Econom. 169 (2012): 34–47. In Special Issue “Recent Advances in Panel Data, Nonlinear and Nonparametric Models: A Festschrift in Honor of Peter C.B. Phillips”.
X. Huang. “Nonparametric Estimation in Large Panels with Cross-Sectional Dependence.” Econom. Rev. 32 (2013): 754–777.
G.M. Kuersteiner, and I.R. Prucha. “Limit theory for panel data models with cross sectional dependence and sequential exogeneity.” J. Econom. 174 (2013): 107–126.
A. Chudik, and M.H. Pesaran. “Common correlated effects estimation of heterogeneous dynamic panel data models with weakly exogenous regressors.” J. Econom. 188 (2015): 393–420.
G. Forchini, and B. Peng. “A conditional approach to panel data models with common shocks.” Econometrics 4 (2016): 4.
D. Evans, A. Tandon, C. Murray, and J. Lauer. The comparative efficiency of national health systems in producing health: An analysis of 191 countries. GPE Discussion Paper No. 29; Geneva, Switzerland: World Health Organization, 2000.
M. Eberhardt, C. Helmers, and H. Strauss. “Do spillovers matter when estimating private returns to R&D?” Rev. Econ. Stat. 95 (2013): 436–448.
K. Gaibulloev, T. Sandler, and D. Sul. “Common drivers of transnational terrorism: Principal component analysis.” Econ. Inq. 51 (2013): 707–721.
J. Altonji, T. Conley, T.E. Elder, and C.R. Taber. Methods for Using Selection on Observed Variables to Address Selection on Unobserved Variables. New Haven, CT, USA: Yale University, 2010.
J.L. Doob. Stochastic Processes. New York, NY, USA: Wiley, 1953.
P.R. Halmos. Measure Theory. New York, NY, USA: Van Nostrand, 1950 (July 1969 reprinting).
Q. Wang, and P.C.B. Phillips. “Asymptotic Theory for Local Time Density Estimation and Nonparametric Cointegration Regression.” Econom. Theory 25 (2009): 710–738.
E.A. Souza-Rodrigues. “Nonparametric estimation of generalized regression model with group effects.” Unpublished paper. Toronto, ON, Canada: University of Toronto, 2014.
D. Pollard. A User’s Guide to Measure Theoretic Probability. New York, NY, USA: Cambridge University Press, 2002.
C. Dellacherie, and P.A. Meyer. Probabilities and Potential. Amsterdam, The Netherlands: North-Holland, 1978.
J. Hoffmann-Jorgensen. Probability with a View Towards Statistics. New York, NY, USA: Chapman and Hall, 1994, Volume 2.
S.T. Berry, and P.A. Haile. “Identification of a nonparametric generalized regression model with group effects.” Discussion paper. New Haven, CT, USA: Yale University, 2009.
J.F. Finks, N.H. Osborne, and J.D. Birkmeyer. “Trends in hospital volume and operative mortality for high-risk surgery.” N. Engl. J. Med. 364 (2011): 2128–2137.
L. Anselin. Spatial Econometric Methods and Models. Boston, MA, USA: Springer, 1988.
T.G. Conley. “GMM estimation with cross-sectional dependence.” J. Econom. 92 (1999): 1–45.
H.H. Kelejian, and I.R. Prucha. “A generalized moments estimator for the autoregressive parameter in a spatial model.” Int. Econ. Rev. 40 (1999): 509–533.
L.F. Lee. “Consistency and efficiency of least squares estimation for mixed regressive, spatial autoregressive models.” Econom. Theory 18 (2002): 252–277.
C. Martins-Filho, and F. Yao. “Nonparametric regression estimation with general parametric error covariance.” J. Multivar. Anal. 100 (2009): 309–333.
M. Gerolimetto, and S. Magrini. Nonparametric Regression with Spatially Dependent Data. Venice, Italy: Dipartimento di Scienze Economiche, Università Ca’ Foscari Venezia, 2009.
L. Lee, and J. Yu. “Efficient GMM estimation of spatial dynamic panel data models with fixed effects.” J. Econom. 180 (2014): 174–197.
L. Su, and Z. Yang. “QML estimation of dynamic panel data models with spatial errors.” J. Econom. 185 (2015): 230–258.
C.A. Bester, T.G. Conley, C.B. Hansen, and T.J. Vogelsang. “Fixed-b asymptotics for spatially dependent robust nonparametric covariance matrix estimators.” Econom. Theory 32 (2016): 154–186.
G. Arbia. “Spatial Econometrics: A Rapidly Evolving Discipline.” Econometrics 4 (2016): 18.
L. Lee, and J. Yu. “Some recent developments in spatial panel data models.” Reg. Sci. Urban Econ. 40 (2010): 255–271.
P.M. Robinson. “Asymptotic theory for nonparametric regression with spatial data.” J. Econom. 165 (2011): 5–19.
P.M. Robinson, and J. Lee. “Series estimation under cross-sectional dependence.” J. Econom. 190 (2016): 1–17.
G.B. Folland. Real Analysis: Modern Techniques and Their Applications, 2nd ed. Pure and Applied Mathematics: A Wiley Series of Texts, Monographs and Tracts; New York, NY, USA: Wiley Interscience, 1999.
M.M. Rao. “Paradoxes in conditional expectation.” J. Multivar. Anal. 27 (1988): 434–446.
A.R. Pagan, and A. Ullah. Nonparametric Econometrics. New York, NY, USA: Cambridge University Press, 1999.
J. Pachl. “Disintegration and compact measures.” Mathematica Scandinavica 43 (1978): 157–168.
J.T. Chang, and D. Pollard. “Conditioning as disintegration.” Statistica Neerlandica 51 (1997): 287–317.
R.W. Hansell. “Sums, products and continuity of Borel maps in nonseparable metric spaces.” Proc. Am. Math. Soc. 104 (1988): 465–471.
V.I. Bogachev. Measure Theory. Berlin, Germany: Springer-Verlag, 2007, Volume 2.

© 2016 by the author; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).

