Weak Identification Robust Tests for Subvectors Using Implied Probabilities

Carrasco, Marine; Chaudhuri, Saraswata

doi:10.3390/e27040396

Open AccessArticle

Weak Identification Robust Tests for Subvectors Using Implied Probabilities

by

Marine Carrasco

^1,*

and

Saraswata Chaudhuri

²

¹

Department of Economics, University of Montreal, Montreal, QC H3T 1J4, Canada

²

Department of Economics, McGill University, Montreal, QC H3A 0G4, Canada

^*

Author to whom correspondence should be addressed.

Entropy 2025, 27(4), 396; https://doi.org/10.3390/e27040396

Submission received: 2 February 2025 / Revised: 21 March 2025 / Accepted: 2 April 2025 / Published: 8 April 2025

(This article belongs to the Special Issue Maximum Entropy Principle and Applications)

Download

Browse Figure

Versions Notes

Abstract

This paper develops tests for hypotheses concerning subvectors of parameters in models defined by moment conditions. It is well known that conventional tests such as Wald, Likelihood-ratio and Score tests tend to over-reject when the identification is weak. To prevent uncontrolled size distortion and introduce refined finite-sample performance, we extend the projection-based test to a modified version of the score test using implied probabilities obtained by information theoretic criteria. Our test is performed in two steps, where the first step reduces the space of parameter candidates, while the second one involves the modified score test mentioned earlier. We derive the asymptotic properties of this procedure for the entire class of Generalized Empirical Likelihood implied probabilities. Simulations show that the test has very good finite-sample size and power. Finally, we apply our approach to the veteran earnings and find a negative impact of the veteran status.

Keywords:

empirical likelihood; entropy; exponential tilting; projection based test; score test; weak instruments

1. Introduction

We are interested in developing tests for hypotheses concerning subvectors of an unknown parameter

θ \in R^{d_{θ}} .

The true value of the parameter

θ,

denoted by

θ^{0},

satisfies a vector of moment conditions:

E [g (W_{i}; θ^{0})] = 0

where the vector

g \in R^{d_{g}}

is known and

d_{g} \geq d_{θ} .

The moment conditions may stem from the first-order condition of the maximization of any criterion written as an expectation (for instance the expected utility in economics). They may also come from matching theoretical and empirical moments (see Section 4.1) or from instrumental variables (see Section 5). Other examples can be found in the book by Hall [1]. A particularity of our tests is that they are robust to weak identification.

To illustrate the notion of weak identification, consider the example of the linear instrumental variable regression:

y_{i} = X_{i} θ^{0} + u_{i},

where the endogenous regressor

X_{i}

is a scalar random variable related to the instruments

Z_{i}

through the reduced-form equation

X_{i} = Z_{i}^{'} Π + V_{i}

where

E [Z_{i} u_{i}] = 0

. Let

W_{i} = {(y_{i}, X_{i}, Z_{i}^{'})}^{'}

, and the moment condition corresponding to the orthogonality between

Z_{i}

and

u_{i}

is

g (W_{i}; θ) = Z_{i} (y_{i} - X_{i} θ) .

If

Π

is non-null and independent of n, the instruments

Z_{i}

are strongly correlated with the endogenous regressor

X_{i}

, and hence the instruments are said to be strong. In that case,

θ

is identified in the sense that

E g (W_{i}; θ) = 0 \Leftrightarrow θ = θ^{0} .

Then, the GMM estimator is consistent with the

\sqrt{n}

rate of convergence. When

Π = C / \sqrt{n}

, the correlations between

Z_{i}

and

X_{i}

go to zero, and the instruments are said to be weak (in the sense of Staiger and Stock [2]). In that case, the GMM estimator of

θ

is not consistent because

E [g (W_{i}; θ)] \to 0

as

n \to \infty

for all

θ \neq θ^{0}

(with

θ - θ^{0}

fixed). Then, the standard confidence intervals and tests are not reliable. In the nearly weak/semi-strong case, i.e., when

Π = C / n^{ϕ}

with

0 < ϕ < 1 / 2

, the GMM estimator of

θ

is consistent with a slower rate of convergence than the usual

\sqrt{n}

(see Antoine and Renault [3] and Andrews and Cheng [4]).

Based on a random sample

W_{i}

,

i = 1, 2, \dots, n,

the standard approach of inference is to conduct a Wald test based on the Generalized Method of Moments (GMM) estimator of

θ

or a score test. Wald tests have been shown to be inappropriate in the presence of weak identification (Dufour [5]). Moreover, the GMM-based score test proposed by Newey and West [6] is plagued by size distortions under common scenarios such as skewed moment vectors or models with weak identification; see the discussion in Wang and Zivot [7]. To improve the finite sample properties of this test, Chaudhuri and Renault [8] and Chaudhuri and Renault [9] propose to replace the uniform weights by implied probabilities obtained from an Information Theory criterion. These probabilities exploit the information from the model, namely that

E [g (W_{i}, θ)] = 0

. So the implied probabilities

{\hat{π}}_{i}

are selected such that the moments hold exactly:

\sum_{i = 1}^{n} {\hat{π}}_{i} g (W_{i}; θ) = 0 .

However, given that the number of moments,

d_{g},

is smaller than the sample size n, there is an infinity of possibilities for

{\hat{π}}_{i}

,

i = 1, 2, \dots, n

. The estimation of

π_{i}

is an ill-posed problem. Which distribution should be used? A solution inspired from the entropy literature is to select the distribution obtained by minimizing the Cressie–Read divergence measure under the moment restrictions. Equivalently, one could also work with the Generalized Empirical Likelihood (GEL) that is characterized by the dual problem of this Cressie–Read divergence minimization. Two notable members of this class are the Empirical Likelihood estimator and the exponential tilting estimator (see Newey and Smith [10]). All these estimators can be viewed as Information Theory estimators (see Kitamura and Stutzer [11] and Golan [12]).

Chaudhuri and Renault [9] focus on tests for the entire parameter vector, i.e.,

H_{0} : θ = θ_{0}

and they show that implied probability-based score tests lead to improved finite sample properties compared to the conventional score test. In particular, they have better size control and remain powerful.

In this paper, we are concerned with testing the subsets of parameters

θ_{1} \subset θ

. More precisely, we want to test

H_{0} : θ_{1} = θ_{10} .

The subset version of score tests suffers from important size distortion as shown by Guggenberger et al. [13]. To address this issue, we suggest to use the projection-based test developed by Chaudhuri and Zivot [14] coupled with the score test that includes the GEL implied probabilities.

The contribution of our paper is to provide a framework that opens up the possibility of applying any type of the Generalized Empirical Likelihood or Cressie–Read implied probabilities to the type of score tests discussed in Chaudhuri and Zivot [14]; see also Smith [15], Newey and Smith [10]. We derive the asymptotic properties of the resulting tests using the properties of the implied probabilities obtained in Chaudhuri and Renault [9] and generalized to include all the GEL estimators. Special care is taken to allow for weak identification. The simulations show that these tests perform well in terms of finite-sample size and exhibit strong power under the alternative. We complete the paper with an empirical illustration examining the effect of veteran status on earnings. Using our proposed test, we construct confidence intervals for the returns to veteran status on earnings, leveraging instrumental variables. This analysis, inspired by Chaudhuri and Rose [16], builds on the seminal natural experiment framework developed by Angrist [17] and Card [18], which earned them the 2021 Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel. Our findings provide evidence of a negative impact of the veteran status on earnings.

The related literature is vast. The application of Information Theory measures to the estimation of econometric models goes back to Golan et al. [19], Kitamura and Stutzer [11], Imbens et al. [20], Smith [21], and Kitamura [22], among others. The use of the implied probabilities in the context of testing the hypothesis in the GMM setup was pioneered by Guggenberger and Smith [23] and further developed by Caner [24], Chaudhuri and Renault [8], and Chaudhuri and Renault [9]. Our current paper builds on this literature. Extensive research in econometrics has demonstrated that testing subsets of parameters in the face of commonly encountered problems such as weak identification is a much more difficult problem than testing the full parameter vector studied in Chaudhuri and Renault [9]; see Guggenberger et al. [13], Andrews et al. [25]. Chaudhuri and Zivot [14] provide an early contribution to the weak identification robust testing of subsets of parameters that is subsequently extended and refined in Andrews [26] and Andrews [27], and that is particularly suitable for application of the Information Theory. Time series extensions of generalized empirical likelihood are proposed by Otsu [28] and Guggenberger and Smith [29]. Recent papers have tried to develop most powerful tests for subvectors; see Guggenberger et al. [30], Kleibergen [31], and Kleibergen et al. [32]. In the current paper, we demonstrate that the application of Information Theory in the form of implied probabilities to the test of Chaudhuri and Zivot [14] for subsets of parameters delivers improved finite-sample performance.

The remainder of this paper is organized as follows. Section 2 describes the GMM framework and the implied probabilities in the context of the null hypothesis for subsets of parameters that is the focus of our interest. Section 3 discusses the score test for subsets of parameters and establishes its asymptotic properties. Section 4 provides evidence of the improved performance of this test using simulation results in empirically relevant settings. Section 5 includes the empirical application. Finally, Section 6 concludes. The main proofs are collected in Appendix A.

Notation. For a sequence

\{a_{n}\}

of real numbers,

a_{n} = O (n^{k})

means

a_{n} / n^{k} \to c

as

n \to \infty

for some constant c;

a_{n} = o (n^{k})

means

a_{n} / n^{k} \to 0

as

n \to \infty .

The notation

\overset{P}{\to}

represents the convergence in probability as

n \to \infty

. If

\{a_{n}\}

is a sequence of random variables, then

a_{n} = O_{p} (n^{k})

if

a_{n} / n^{k} - c_{n} \overset{P}{\to} 0

where

c_{n}

is a deterministic sequence. Similarly

a_{n} = o_{p} (n^{k})

if

a_{n} / n^{k} \overset{P}{\to} 0 .

2. Implied Probabilities for Hypothesis on Subsets of Parameters in the GMM Framework

2.1. Background

Let

W_{1}, \dots, W_{n}

be independent and identically distributed (i.i.d.) random variables. Let

g (W_{i}; θ) : R^{d_{w}} \times Θ \mapsto R^{d_{g}}

be the

d_{g}

-dimensional moment vector,

W_{i}

a

d_{w}

-dimensional random vector,

Θ \subseteq R^{d_{θ}}

the parameter space, and let

d_{g} \geq d_{θ}

. The dimensions

d_{w}

,

d_{g}

and

d_{θ}

are fixed and hence do not depend on n. Suppose that we have a set of moment restrictions:

E [g (W_{i}; θ^{0})] = 0

(1)

which holds for the true value of the parameter

θ^{0} .

Our goal is the testing of hypotheses on subsets of parameters, i.e., a subvector of

θ

. Without loss of generality, let

θ = {(θ_{1}^{'}, θ_{2}^{'})}^{'}

, and let the null hypothesis of interest be

H_{0} : θ_{1} = θ_{10} .

(2)

The parameter

θ_{1}

is the parameter of primary interest, while

θ_{2}

is the nuisance parameter.

The usual approach to tackle this problem consists in estimating

θ

by constrained GMM. Constrained estimators are obtained by imposing the null hypothesis. The estimator takes the form

θ = {(θ_{10}^{'}, θ_{2}^{'})}^{'}

that restricts

θ_{1}

by

H_{0}

but lets the

θ_{2}

parameters be unrestricted. Given a first-step consistent estimator of

θ

, denoted

\bar{θ}

, the constrained GMM estimator is a solution of

{\hat{θ}}_{G M M} = arg min_{θ \in Θ_{1}} Q_{n} (θ) \equiv {\bar{g}}_{n} {(θ)}^{'} {\hat{Ω}}_{n} {(\bar{θ})}^{- 1} {\bar{g}}_{n} (θ)

where

Θ_{1}

is the set of elements

{(θ_{1}^{'}, θ_{2}^{'})}^{'}

of

Θ

such that

θ_{1} = θ_{10}

,

{\bar{g}}_{n} (θ) : = \sum_{i = 1}^{n} g (W_{i}; θ) / n

, and

{\hat{Ω}}_{n} (θ) : = \frac{1}{n} \sum_{i = 1}^{n} g (W_{i}; θ) g^{'} (W_{i}; θ) .

Let

{\bar{G}}_{n} (θ) = (1 / n) \sum_{i = 1}^{n} \partial g (W_{i}; θ) / \partial θ^{'}

. The score test proposed by Newey and West [6] is

L M_{n} = {(\frac{\partial Q_{n} ({\hat{θ}}_{G M M})}{\partial θ})}^{'} I_{n}^{- 1} ({\hat{θ}}_{G M M}) (\frac{\partial Q_{n} ({\hat{θ}}_{G M M})}{\partial θ})

where

\begin{matrix} \partial Q_{n} (θ) / \partial θ & = & {\bar{G}}_{n} {(θ)}^{'} {\hat{Ω}}_{n} {(θ)}^{- 1} {\bar{g}}_{n} (θ) and \\ I_{n} (θ) & = & {\bar{G}}_{n} {(θ)}^{'} {\hat{Ω}}_{n} {(θ)}^{- 1} {\bar{G}}_{n} (θ) . \end{matrix}

(3)

According to Chaudhuri and Renault [8] and Chaudhuri and Renault [9], the poor finite sample properties of the score test can be improved by replacing the averages in

{\bar{G}}_{n} (θ)

and

{\hat{Ω}}_{n} (θ)

by weighted sum using Information Theory. Instead of averaging using an equal weight

1 / n,

one should use the implied probabilities obtained from information theoretic criteria. The criterion considered is the Cressie–Read family.

The optimization problem solved by the implied probabilities

{\hat{π}}_{n}^{(γ)} (θ)

for

θ \in Θ

is

min_{π \in R^{n}} \frac{1}{γ (γ + 1)} \sum_{i = 1}^{n} [{(n π_{i})}^{1 + γ} - 1] subject to \sum_{i = 1}^{n} π_{i} = 1 and \sum_{i = 1}^{n} π_{i} g (W_{i}; θ) = 0 .

(4)

The objective function (4) is defined for any real

γ

, including the two limit cases

γ \to 0

and

γ \to - 1 .

γ = - 1

corresponds to Empirical Likelihood (EL), and

γ = 1

corresponds to the Euclidean Empirical Likelihood (EEL), and

γ \to 0

to the Kullback–Leibler Information Criterion (KLIC) that is consistent with Shannon’s entropy. The so-called Generalized Empirical Likelihood (GEL) estimator of

θ

is obtained by minimizing the criterion in (4) with respect to

θ

or alternatively by minimizing the dual problem based on the Lagrange multipliers associated with the constraints

\sum_{i = 1}^{n} π_{i} g (W_{i}; θ) = 0

; see Guggenberger and Smith [23] and Chaudhuri and Renault [9]. Here, however, the aim is not to estimate

θ

but to perform a test. Therefore, we need to go further than the aforementioned references and devise the two-step approach described in Section 3 to address the additional inferential issues of the uncontrolled over-rejection of the truth without an unnecessary loss in power.

2.2. Assumptions

Let us define

V (θ) : = V a r (\sqrt{n} {\bar{g}}_{n} (θ)) and {\hat{V}}_{n} (θ) : = \frac{1}{n} \sum_{i = 1}^{n} g (W_{i}; θ) {[g (W_{i}; θ) - {\bar{g}}_{n} (θ)]}^{'} .

We remark that this definition of

{\hat{V}}_{n} (θ)

corresponds to the appropriate estimator of

V (θ)

for the EEL estimator.

Consider a sequence of subsets

Θ_{n} : n \geq 1

of

Θ

containing

θ^{0}

.

Θ_{n}

is a neighborhood of

θ^{0}

whose width depends on the identification strength in (1). Typically,

Θ_{n}

is narrower for strongly identified parameters and wider for weakly identified parameters.

We will maintain Assumptions 1 and 2 below to show the asymptotic equivalence of the implied probabilities for

θ \in Θ_{n}

; see Guggenberger and Smith [23] and Chaudhuri and Renault [9] for more discussion.

Assumption 1.

(i): ${sup}_{θ \in Θ_{n}} ∥E [{\bar{g}}_{n} (θ)]∥ = O (\frac{1}{\sqrt{n}})$ .
(ii): ${max}_{1 \leq i \leq n} {sup}_{θ \in Θ_{n}} ∥g (W_{i}; θ)∥ = o_{p} (\sqrt{n})$ .
(iii): ${sup}_{θ \in Θ_{n}} ∥g (W_{i}; θ)∥ = O_{p} (1)$ for $i = 1, \dots, n$ .
(iv): ${sup}_{θ \in Θ_{n}} ∥{\bar{g}}_{n} (θ) - E [{\bar{g}}_{n} (θ)]∥ = O_{p} (\frac{1}{\sqrt{n}})$ .

Assumption 2.

(i): ${sup}_{θ \in Θ_{n}} ∥{\hat{Ω}}_{n} (θ) - V (θ)∥ = o_{p} (1)$ , ${sup}_{θ \in Θ_{n}} ∥{\hat{V}}_{n} (θ) - V (θ)∥ = o_{p} (1)$ ,
${sup}_{θ \in Θ_{n}} ∥{\hat{Ω}}_{n}^{- 1} (θ) - V^{- 1} (θ)∥ = o_{p} (1)$ and ${sup}_{θ \in Θ_{n}} ∥{\hat{V}}_{n}^{- 1} (θ) - V^{- 1} (θ)∥ = o_{p} (1)$ .
(ii): $0 < {inf}_{θ \in Θ_{n}} b_{min} (θ) < {sup}_{θ \in Θ_{n}} b_{max} (θ) < + \infty$ where $b_{min} (θ)$ and $b_{max} (θ)$ stand for the smallest and largest eigenvalues, respectively, of $V (θ)$ .

Assumption 1 is not restrictive if

Θ_{n}

is reduced to

θ^{0}

for all

n .

In that case, Assumptions 1(i) and 1(iii) are fulfilled by definition. Assumptions 1(ii) and 1(iv) follow from the fact that

g (W_{i}; θ^{0})

is i.i.d. with zero mean and finite variance. Assumption 1(ii) is a consequence of the Borel–Cantelli Lemma. Assumption 1(iv) can be proved by the Lindeberg–Levy Central Limit Theorem. The validity of Assumption 1 when

Θ_{n}

is a local neighborhood of

θ^{0}

, where the definition of local depends on the identification strength of

θ

, follows under mild conditions.

Regarding Assumption 2(i), it requires the uniform law of large numbers for the sample covariance matrix only. The two convergences on the first line of (i) are equivalent, provided Assumptions 1(i) and 1(iv) hold. The same is true for the convergence on the second line under the extra condition of Assumption 2(ii), which ensures that the population covariance matrix is positive definite and finite.

2.3. Properties of GEL Implied Probabilities

In this section, we investigate the properties of weighted sums based on implied probabilities. To do so, it is convenient to use the dual representation of the estimators introduced in Section 2.1.

Let

ρ (.)

be a scalar function that is concave on its domain

V

, an open interval containing 0. The GEL class of estimators of

θ^{0}

is indexed by the function

ρ

and is defined as

\begin{matrix} {\hat{θ}}_{ρ, n} : = arg min_{θ \in Θ} sup_{λ \in Λ_{n} (θ)} {\hat{Q}}_{ρ, n} (θ, λ) \\ where & {\hat{Q}}_{ρ, n} (θ, λ) : = \frac{1}{n} \sum_{i = 1}^{n} ρ (λ^{'} g (W_{i}; θ)) - ρ (0), \\ and & Λ_{n} (θ) : = {λ \in R^{k} : λ^{'} g (W_{i}; θ) \in V, \forall i = 1, \dots, n} . \end{matrix}

Different choices of

ρ (.)

lead to different GEL estimators. The Continuous-Updating GMM or Euclidean Empirical Likelihood (EEL) estimator is a special case with (

ρ (v) = - {(1 + v)}^{2} / 2, V = R

) corresponding to

γ \to - 1

in Equation (4), the Empirical Likelihood (EL) estimator (

ρ (v) = ln (1 - v), V = (- 1, \infty)

) corresponds to

γ = 1

, the exponential tilting (ET) estimator (

ρ (v) = - exp (v), V = R

) to

γ \to 0

, etc., all of which satisfy Assumption

ρ

below.

Assumption $ρ$ : (GEL function)
$ρ : V \mapsto R$ is a continuous function such that

(i): $ρ$ is concave on its domain $V$ , which is an open interval containing 0.
(ii): $ρ$ is twice continuously differentiable on its domain. Defining $ρ_{r} (v) : = \partial^{r} ρ (v) / \partial v^{r}$ for $r = 1, 2$ and $ρ_{r} : = ρ_{r} (0)$ , let $ρ_{1} = ρ_{2} = - 1$ (standardization for convenience).
(iii): There exists a positive constant b such that for each $v \in V$ , $| ρ_{2} (v) - ρ_{2} (0) | \leq b \times | v |$ hold.

The desirable higher-order properties of the GEL estimators are due to the GEL first-order condition, which, assuming the differentiability of the moment vector

g (w; θ)

with respect to

θ

, is given by

{[\sum_{i = 1}^{n} π_{ρ, i, n} ({\hat{θ}}_{ρ, n}) G_{i} ({\hat{θ}}_{ρ, n})]}^{'} {[\sum_{i = 1}^{n} κ_{ρ, i, n} ({\hat{θ}}_{ρ, n}) g (W_{i}; {\hat{θ}}_{ρ, n}) g^{'} (W_{i}; {\hat{θ}}_{ρ, n})]}^{- 1} {\bar{g}}_{n} ({\hat{θ}}_{ρ, n}) = o_{P} (\frac{1}{\sqrt{n}})

where for given

θ

and

ρ (.), {\bar{g}}_{n} (θ) : = \frac{1}{n} \sum_{i = 1}^{n} g (W_{i}; θ), G_{i} (θ) : = \frac{\partial}{\partial θ^{'}} g (W_{i}; θ),

\begin{matrix} λ_{ρ, n} (θ) & : & = arg sup_{λ \in Λ_{n} (θ)} {\hat{Q}}_{ρ, n} (θ, λ), \end{matrix}

(5)

\begin{matrix} π_{ρ, i, n} (θ) & : & = \frac{ρ_{1} (λ_{ρ, n}^{'} (θ) g (W_{i}; θ))}{\sum_{j = 1}^{n} ρ_{1} (λ_{ρ, n}^{'} (θ) g (W_{j}; θ))} : implied probabilities from GEL, \\ κ_{ρ, i, n} (θ) & : & = \frac{κ_{ρ} (λ_{ρ, n}^{'} (θ) g (W_{i}; θ))}{\sum_{j = 1}^{n} κ_{ρ} (λ_{ρ, n}^{'} (θ) g (W_{j}; θ))}, κ_{ρ} (v) : = \frac{ρ_{1} (v) + 1}{v} if v \neq 0, κ_{ρ} (0) = - 1 \end{matrix}

(6)

Interestingly, the form of

ρ (.)

for EL leads to

π_{ρ, i, n} (θ) = κ_{ρ, i, n} (θ)

for

i = 1, \dots, n

. It is because of this along with the orthogonalization property of the implied probabilities

π_{ρ, i, n} (θ)

(shown in Proposition 2 below) that the EL estimator has superior higher-order properties among the GEL class (see Newey and Smith [10]).

Note that Assumption

ρ (i i i)

is a technical assumption needed only for the proofs. Now, we are able to establish some important results relative to the GEL implied probabilities.

Proposition 1.

Let Assumptions 1, 2, and ρ hold. Then, for

θ \in Θ_{n}

:

(A): $λ_{ρ, n} (θ)$ defined in (5) is such that $λ_{ρ, n} (θ) = - {\hat{Ω}}_{n}^{- 1} (θ) {\bar{g}}_{n} (θ) + o_{P} (n^{- 1 / 2})$ ,
(B): $π_{ρ, i, n} (θ)$ defined in (6) is such that for a given $i = 1, \dots, n$ ,

$π_{ρ, i, n} (θ) = π_{E E L, i, n} (θ) + o_{P} (n^{- 3 / 2}),$

where $π_{E E L, i, n} (θ)$ ’s are the implied probabilities from EEL with the closed-form expression

$π_{E E L, i, n} (θ) = \frac{1}{n} [1 - {(g (W_{i}; θ) - {\bar{g}}_{n} (θ))}^{'} {\hat{Ω}}_{n}^{- 1} (θ) {\bar{g}}_{n} (θ)] = \frac{1}{n} + O_{P} (n^{- 3 / 2}) .$

Remark 1.

It follows from (B) that the difference between the EEL and GEL implied probabilities is of a smaller order than that between the EEL implied probabilities and the naive empirical probabilities

{1 / n}

. It may be tempting to argue that the use of the GEL implied probabilities to reweight the observations results in an equivalence up to one higher order. However, this result, in itself, is not sufficient for such a claim because (B) is not uniform in

i = 1, \dots, n

. We provide a formal proof of this claim in Proposition 2.

Proposition 2.

Let Assumptions 1, 2, and ρ hold, and let θ be an arbitrary element of

Θ_{n}

. Consider n i.i.d. realizations

{Y_{1, n}, \dots, Y_{n, n}}

of a

d \times 1

random vector

Y_{n}

. Denote

{\bar{Y}}_{n} = \sum_{i = 1}^{n} Y_{i, n} / n

. Assume that

{\bar{Y}}_{n} - E [{\bar{Y}}_{n}] \overset{P}{\to} 0

,

\frac{1}{n} \sum_{i = 1}^{n} (Y_{i, n} - {\bar{Y}}_{n}) [{(g (W_{i}; θ) - {\bar{g}}_{n} (θ))}^{'}, Y_{i, n}^{'}] \overset{P}{\to} [Ω_{Y g}, Ω_{Y Y}]

(finite) and that

(\begin{matrix} \sqrt{n} ({\bar{Y}}_{n} - E [{\bar{Y}}_{n}]) \\ \sqrt{n} ({\bar{g}}_{n} (θ) - E [{\bar{g}}_{n} (θ)]) \end{matrix}) \overset{d}{\to} N (0, [\begin{matrix} Ω_{Y Y} & Ω_{Y g} \\ Ω_{Y g}^{'} & V \end{matrix}]) .

Then, as

n \to \infty

, we have

(B): $(\begin{matrix} \sqrt{n} \sum_{i = 1}^{n} π_{E E L, i, n} (θ) (Y_{i} - E [{\bar{Y}}_{n}]) \\ \sqrt{n} ({\bar{g}}_{n} (θ) - E [{\bar{g}}_{n} (θ)]) \end{matrix}) \overset{d}{\to} N (0, [\begin{matrix} Ω_{Y Y} - Ω_{Y g} V^{- 1} Ω_{Y g}^{'} & 0 \\ 0 & V \end{matrix}])$ ,
(B): $\sqrt{n} \sum_{i = 1}^{n} π_{ρ, i, n} (θ) (Y_{i} - E [{\bar{Y}}_{n}]) - \sqrt{n} \sum_{i = 1}^{n} π_{E E L, i, n} (θ) (Y_{i} - E [{\bar{Y}}_{n}]) \overset{P}{\to} 0$ .

Remark 2.

The proofs of Propositions 1 and 2 are given in the Appendix. Some of these results are already established in Chaudhuri and Renault [9]. However, the result for the ET estimator is not covered in Chaudhuri and Renault [9]. This is important because ET is the only GEL estimator fully consistent with Shannon’s entropy.

Proposition 2 shows that the weighted average involving the implied probabilities is asymptotically independent of the average

{\bar{g}}_{n} (θ)

. Replacing

Y_{i}

by the first derivative of

g (W_{i}; θ)

or by

g (W_{i}; θ) g {(W_{i}; θ)}^{T},

one can deduce that the implied probability estimates of the Jacobian and variance are asymptotically independent of

{\bar{g}}_{n} (θ)

. In the case of weak identification, this asymptotic independence of the estimated Jacobian (and estimated variance) with the moment vector leads to better finite-sample properties.

It follows from Proposition 2 that the use of the implied probabilities provides a more precise estimator of

E [Y]

since the asymptotic variance is smaller than

V a r (Y) .

The score test for the subsets of parameters that we will discuss now allows for weak identification that makes the use of implied probabilities necessary. Chaudhuri and Zivot [14] followed Kleibergen [33] and therefore implicitly used the EEL (Euclidean Empirical Likelihood) implied probabilities. Our paper opens up the possibility of using other implied probabilities for the same test for subsets of parameters, and demonstrates using simulations that other implied probabilities, such as those from EL, can provide significant improvement in its finite-sample performance.

3. Score Test for Subsets of Parameters Using the Implied Probabilities

3.1. Score Vector and Score Statistic Using the Implied Probabilities

Following Chaudhuri and Renault [9], we define the general score vector:

l_{n} (θ, π^{G} (θ), π^{V} (θ)) = \{\sum_{i = 1}^{n} π_{i, n}^{G} (θ) G_{i}^{'} (θ)\} {[\sum_{i = 1}^{n} π_{i, n}^{V} (θ) V_{i, n} (θ)]}^{- 1} \sqrt{n} {\bar{g}}_{n} (θ)

(7)

where

G_{i} (θ) = \frac{\partial g (W_{i}; θ)}{\partial θ^{'}}, V_{i, n} (θ) = g (W_{i}; θ) {(g (W_{i}; θ) - {\bar{g}}_{n} (θ))}^{'},

and

π_{i, n}^{G} (θ)

and

π_{i, n}^{V} (θ)

may be different but such that

π_{i, n}^{G} (θ), π_{i, n}^{V} (θ) \in \{{\hat{π}}_{i, n}^{(γ)} (θ); γ \in R\} \cup \{\frac{1}{n}\} .

(8)

The choice of

π_{i, n}^{G} (θ) = π_{i, n}^{V} (θ) = 1 / n

leads to the standard GMM score statistic (3) as defined in Newey and West [6]. The choice of

π_{i, n}^{G} (θ) = {\hat{π}}_{i, n}^{(1)} (θ)

(EEL) and

π_{i, n}^{V} (θ) = 1 / n

leads to the K-statistic of Kleibergen [33]. The other choices in (8) cover the various score statistics of Guggenberger and Smith [23]. Importantly, note that

π_{i, n}^{G} (θ)

and

π_{i, n}^{V} (θ)

can be based on different

γ

s, accommodating for hybrid GEL score statistics in the spirit of Schennach [34]. We refer the interested reader to Chaudhuri and Renault [9] for further discussion on the score vector.

Pretending that the parameters are all strongly identified, the natural estimator of the asymptotic variance of

l_{n} (θ, π^{G} (θ), π^{V} (θ))

would be

I_{n} (θ, π^{G} (θ), π^{V} (θ)) = \{\sum_{i = 1}^{n} π_{i, n}^{G} (θ) G_{i}^{'} (θ)\} {[\sum_{i = 1}^{n} π_{i, n}^{V} (θ) V_{i, n} (θ)]}^{- 1} \{\sum_{i = 1}^{n} π_{i, n}^{G} (θ) G_{i} (θ)\} .

(9)

Using (9), the general score statistic based on the general score vector in (7) is given by

{LM}_{n} (θ, π^{G} (θ), π^{V} (θ)) = l_{n}^{'} (θ, π^{G} (θ), π^{V} (θ)) I_{n}^{- 1} (θ, π^{G} (θ), π^{V} (θ)) l_{n} (θ, π^{G} (θ), π^{V} (θ)) .

(10)

It is now well known that if

θ_{2}

is weakly identified, then plugging in a GMM estimator of

θ_{2}

that is restricted by

H_{0}

in (2) generally results in a badly over-sized test; see Andrews [26] for a comprehensive discussion. An alternative to such plug-in tests is the projection tests as in, for example, Dufour and Taamouti [35,36]. However, projection tests can be needlessly conservative.

Therefore, we adopt here the idea of the refined projection score test as in Chaudhuri [37], Zivot and Chaudhuri [38], Chaudhuri et al. [39], Chaudhuri and Zivot [14]. Our presentation can be adapted to the more sophisticated version of the aforementioned tests that was introduced in Andrews [26], but that is not performed here for simplicity and brevity.

To present the refined projection score test for the null hypothesis (2) on

θ_{1}

, treating

θ_{2}

as the nuisance parameters, it will be useful to introduce the natural partition of

l_{n} (θ, π^{G} (θ), π^{V} (θ))

and

I_{n} (θ, π^{G} (θ), π^{V} (θ))

conformable to the partition of

θ = {(θ_{1}^{'}, θ_{2}^{'})}^{'}

as

\begin{matrix} l_{n} (θ, π^{G} (θ), π^{V} (θ)) = & [\begin{matrix} l_{1, n} (θ, π^{G} (θ), π^{V} (θ)) \\ l_{2, n} (θ, π^{G} (θ), π^{V} (θ)) \end{matrix}], \\ I_{n} (θ, π^{G} (θ), π^{V} (θ)) = & [\begin{matrix} I_{11, n} (θ, π^{G} (θ), π^{V} (θ)) & I_{12, n} (θ, π^{G} (θ), π^{V} (θ)) \\ I_{21, n} (θ, π^{G} (θ), π^{V} (θ)) & I_{22, n} (θ, π^{G} (θ), π^{V} (θ)) \end{matrix}], \\ l_{1.2, n} (θ, π^{G} (θ), π^{V} (θ)) = & l_{1, n} (., ., .) - I_{12, n} (., ., .) I_{22, n}^{- 1} (., ., .) l_{2, n} (., ., .), \\ I_{11.2, n} (θ, π^{G} (θ), π^{V} (θ)) = & I_{11, n} (., ., .) - I_{12, n} (., ., .) I_{22, n}^{- 1} (., ., .) I_{21, n} (., ., .) \end{matrix}

(11)

where the right-hand side of the last two lines above use

(., ., .)

to denote

(θ, π^{G} (θ), π^{V} (θ))

to avoid notational clutter. Using the notation in (11), it is straightforward to decompose the score statistic in (10) as follows:

{LM}_{n} (θ, π^{G} (θ), π^{V} (θ)) = {LM}_{2, n} (θ, π^{G} (θ), π^{V} (θ)) + {LM}_{11.2} (θ, π^{G} (θ), π^{V} (θ))

(12)

where, borrowing the maximum-likelihood terminology from Cox and Hinkley [40],

\begin{matrix} {LM}_{2, n} (θ, π^{G} (θ), π^{V} (θ)) & = & l_{2, n}^{'} (θ, π^{G} (θ), π^{V} (θ)) I_{22, n}^{- 1} (θ, π^{G} (θ), π^{V} (θ)) l_{2, n} (θ, π^{G} (θ), π^{V} (θ)), \\ {LM}_{1.2, n} (θ, π^{G} (θ), π^{V} (θ)) & = & l_{1.2, n}^{'} (θ, π^{G} (θ), π^{V} (θ)) I_{11.2, n}^{- 1} (θ, π^{G} (θ), π^{V} (θ)) l_{1.2, n} (θ, π^{G} (θ), π^{V} (θ)) \end{matrix}

are respectively the score statistic for

θ_{2}

and the efficient score statistic for

θ_{1}

. The efficient score statistic

{LM}_{1.2, n} (θ, π^{G} (θ), π^{V} (θ))

expressed at

θ = (θ_{10}, θ_{2}^{0})

can be seen as the

C (α)

statistic of Neyman [41] for testing

H_{0} : θ_{1} = θ_{10} .

Interestingly, this test has, under standard regularity conditions, an asymptotic distribution that is invariant to the

\sqrt{n}

-local perturbation of

θ_{2}

from the truth

θ_{2}^{0}

; see, for example, Bera and Bilias [42]. So the unknown nuisance parameter

θ_{2}^{0}

can be replaced by a

\sqrt{n}

-consistent estimator without altering the asymptotic distribution of the

C (α)

statistic.

Another important fact is that

{LM}_{1.2, n} (θ, π^{G} (θ), π^{V} (θ))

can be constructed using any choice of implied probabilities (including

1 / n

) for the Jacobian or the variance matrix, which will now allow us to explore the improved performance of the refined projection score test idea for the null hypothesis

H_{0} : θ_{1} = θ_{10}

in (2) with the use of these implied probabilities.

3.2. Refined Projection Score Test Using the Implied Probabilities

To test the null hypothesis

H_{0} : θ_{1} = θ_{10},

we propose to use the refined projection score test as in Chaudhuri [37], Zivot and Chaudhuri [38], Chaudhuri et al. [39], Chaudhuri and Zivot [14] but with the accommodation for the various choice of implied probabilities. The test is conducted in two steps:

Step 1: Construct a $100 (1 - τ) %$ confidence interval $C_{H_{0}} (θ_{2}, 1 - τ)$ for $θ_{2}$ under the restriction of the null hypothesis $H_{0} : θ_{1} = θ_{10}$ . $C_{H_{0}} (θ_{2}, 1 - τ)$ is a random subset of the parameter space $Θ_{2}$ of $θ_{2}$ and is defined as follows:

$C_{H_{0}} (θ_{2}, 1 - τ) = \{θ_{2} \in Θ_{2} | n {\bar{g}}_{n}^{T} (θ_{10}, θ_{2}) {[{\hat{Ω}}_{n} (θ_{10}, θ_{2})]}^{- 1} {\bar{g}}_{n} (θ_{10}, θ_{2}) \leq χ_{d_{g}}^{2} (1 - τ)\}$

where $χ_{a}^{2} (b)$ denotes the b-th quantile of a chi-square distribution with a degrees of freedom.
Step 2: Reject the null hypothesis $H_{0} : θ_{1} = θ_{10}$ if either $C_{H_{0}} (θ_{2}, 1 - τ)$ is empty or

$inf_{θ_{2} \in C_{H_{0}} (θ_{2}, 1 - τ)} {LM}_{1.2, n} (θ_{10}, θ_{2}, π^{G} (θ_{10}, θ_{2}), π^{V} (θ_{10}, θ_{2})) \geq χ_{d_{θ_{1}}}^{2} (1 - α)$

where $d_{θ_{1}}$ is the dimension of $θ_{1}$ . When deemed necessary, one should impose $π_{i, n}^{G} (θ_{10}, θ_{2}) \neq 1 / n$ following Kleibergen [33,43] to be robust to the weak identification of $θ$ .

Step 1 corresponds to inverting the S-test of Stock and Wright [44]. In special cases, such as the linear instrumental variables regression with conditionally homoskedastic error,

C_{H_{0}} (θ_{2}, 1 - τ)

can be obtained analytically using closed-form formula presented in Dufour and Taamouti [35]. Moreover, Sun [45] provides a STATA command “twostepweakiv” with the “project” option to obtain confidence intervals for

θ_{1}

based on the version of this refined projection test from Chaudhuri and Zivot [14].

The difference between the refined projection test and the Newey and West [6], Kleibergen [33,43] or Guggenberger and Smith [23] score test is that the former performs a projection of

{LM}_{1.2, n} (.)

from

C_{H_{0}} (θ_{2}, 1 - τ)

, while the latter plugs in an estimator of

θ_{2}

in

{LM}_{n} (.)

that makes

{LM}_{2, n} (.)

in (12) zero. This difference enables the refined projection test to guard against the uncontrolled over-rejection of a true

H_{0}

under weak identification. All these tests are asymptotically equivalent under strong identification thanks to the

C (α)

form of

{LM}_{1.2, n} (.)

.

On the other hand, the refinement provided by the refined projection test over the standard projection test principle is two-fold. First, the projection is performed from

C_{H_{0}} (θ_{2}, 1 - τ)

instead of from

Θ_{2}

, as is performed by the latter. Second, the test statistic and critical values used are

{LM}_{1.2, n} (.)

and

χ_{d_{θ_{1}}}^{2} (α)

instead of

{LM}_{n} (.)

and

χ_{d_{θ}}^{2} (α)

, as is performed by the standard projection score test. The restricted projection from

C_{H_{0}} (θ_{2}, 1 - τ)

instead of from

Θ_{2}

and the use of the smaller critical values based on the degrees of freedom

d_{θ_{1}}

instead of

d_{θ}

of the chi-squared distribution are what make the refined projection test more powerful than the standard projections tests.

Without the weak identification problem, the refined projection test is the efficient test in the sense of Newey and West [6]. The standard projection score test is less powerful. In presence of weak identification, both the standard projection score test and the refined projection score test guard against the uncontrolled over-rejection of the truth, while the Newey and West [6], Kleibergen [33,43] or Guggenberger and Smith [23] score tests do not do so.

The following proposition makes precise the statement about “uncontrolled over-rejection” and “efficient test” made above. For brevity, we list the technical assumptions

Θ

, SW, and D in the Appendix. These additional assumptions are essential for establishing the asymptotic properties of the refined projection test in Chaudhuri and Zivot [14] to which we refer the readers for the proof. Then, by appealing to the results in Propositions 1 and 2 that were obtained under Assumptions 1, 2, and

ρ

, the results stated in Proposition 3 follow directly.

Proposition 3.

Let Assumptions 1, 2,

ρ,

and the three assumptions Θ,

S W

and D, stated in the Appendix, hold. Then, we obtain the following results for the refined projection score test using the implied probabilities in (8):

(i): The asymptotic size of the test cannot exceed $α + τ$ for any choice of $α > 0$ and $τ > 0$ with $α + τ < 1$ under a restriction in (8) that $π_{i, n}^{G} (θ) \in \{{\hat{π}}_{i, n}^{(γ)} (θ); γ \in R\}$ .
(ii): If all elements of θ are strongly identified as in Newey and West [6], and $θ_{10} = θ_{1}^{0} + b / \sqrt{n}$ , then the test with any given $τ > 0,$ such that $C_{H_{0}} (θ_{2}, 1 - τ)$ is non-empty and is asymptotically equivalent to the infeasible efficient score test that rejects $H_{0} : θ_{1} = θ_{10}$ if ${LM}_{1.2, n} (θ_{10}, θ_{2}^{0}, 1 / n, 1 / n) \geq χ_{d_{θ_{1}}}^{2} (α)$ .

Remark 3.

The tests discussed here involving various implied probabilities have the same first-order asymptotic properties as the test in Chaudhuri and Zivot [14]. Indeed, their asymptotic size cannot exceed

α + τ

, and if there is no problem of weak identification, then for any choice of τ (however small or large), these tests are asymptotically equivalent to the asymptotically efficient infeasible score test with asymptotic size α. So, with strong identification, the asymptotic size of this test is α, provided the first-step confidence interval is non-empty. The results in Chaudhuri and Renault [9] suggest that the use of the implied probabilities could lead to better properties in the finite samples. This is precisely what we find in the Monte Carlo experiment described below.

4. Monte Carlo Experiment

The improvement in the finite-sample size properties of tests by the use of implied probabilities is well known. The characterization of the asymptotic size described in Proposition 3(i) of the refined projection test appeals to the Bonferroni inequality applied to the size properties of two full vector score tests. Guggenberger and Smith [23] and Chaudhuri and Renault [9] document evidence that the finite-sample size of the full vector score tests with various implied probabilities is similar to their nominal level under various scenarios involving different strengths of identification. This will be confirmed here in our simulations.

On the other hand, less attention has been paid to the matter of improvement in power; the work of Chaudhuri and Renault [9] is an exception but only when testing a full vector (

θ

and not

θ_{1}

). However, there is a big difference between the power of a test for the full vector

θ

versus a test for subset of

θ

, and the main advantage of the refined projection test concerns its power. Therefore, we will primarily focus on the power properties of the refined projection score test for

θ_{1}

, compared to that of the plug-in tests. Since the power properties of the plug-in tests are better understood when parameters

θ

are strongly identified (see Andrews [26]), we will maintain strong identification of

θ

in this section.

4.1. Design

In this section, we examine a model that is not subject to weak identification but is instead affected by large higher-order moments, leading to difficult estimation of the variance matrix. This is the same experiment as that considered in the unpublished manuscript by Chaudhuri and Renault [46]. We generate

W_{i} \sim i . i . d Gamma (\exp (θ_{1}^{0}) = 1, exp (θ_{2}^{0}) = 2) for i = 1, \dots, n

where

θ_{1}^{0} = ln (1) = 0

and

θ_{2}^{0} = ln (2)

. We exploit the first two moments of the Gamma distribution, i.e.,

E [W_{i}] = exp (θ_{1}^{0} + θ_{2}^{0})

and

E [W_{i}^{2}] = exp (θ_{1}^{0} + 2 θ_{2}^{0}) + exp (2 θ_{1}^{0} + 2 θ_{2}^{0})

to conduct the score tests. Consequently, the moment vector is defined as

g (W_{i}, (θ_{1}, θ_{2})) = [\begin{matrix} W_{i} - exp [θ_{1} + θ_{2}] \\ W_{i}^{2} - exp [θ_{1} + 2 θ_{2}] - exp [2 θ_{1} + 2 θ_{2}] \end{matrix}]

and it satisfies the moment restrictions in (1) for

θ = θ^{0} = {(θ_{1}^{0'}, θ_{2}^{0'})}^{'}

. The Jacobian does not depend on

W_{i}

, so the implied probabilities are not involved in its estimation. The elements of the moment vector

g (W_{i}; θ^{0})

are skewed. Indeed, the skewness of the first element is 2, while that of the second element is approximately 6.6. Moreover, the two elements of the moment vector are strongly leptokurtic with fourth moments equal to 144 (kurtosis = 9) and 8,982,528 (kurtosis = 87.7), respectively. Hence, the estimation of the variance might be problematic and, therefore, appropriate weighting for the estimator of the variance matrix might be crucial.

4.2. Results

There is no weak identification issue in this design. Hence, without the fear of over-rejection of the truth, according to the first-order asymptotics, one could plug in the restricted GMM estimator of

θ_{2}

in the second-step test statistic

{LM}_{1.2, n} (.)

instead of minimizing the test statistic

{LM}_{1.2, n} (.)

over values of

θ_{2}

in the first-step confidence interval. This is similar in spirit to the score test of Newey and West [6]. Taking advantage of the

C (α)

form of

{LM}_{1.2, n} (.)

’s asymptotic invariance to

\sqrt{n}

-local deviation of

θ_{2}

from

θ_{2}^{0}

, we plug in the computationally convenient restricted GMM estimator of

θ_{2}

in

{LM}_{1.2, n} (.)

. We consider this plug-in version of the score test for three popular choices: (i)

π^{G} (.) = π^{V} (.) = 1 / n

; (ii)

π^{G} (.) = π^{V} (.) = {\hat{π}}^{(1)} (.)

, i.e., the EEL implied probabilities; and (iii)

π^{G} (.) = π^{V} (.) = {\hat{π}}^{(- 1)} (.)

, i.e., the EL implied probabilities. We similarly consider each of these choices for the refined projection score test with two choices

τ = 1 %

and

τ = 5 %

for the first-step confidence interval. Asymptotic theory says that all tests considered here are asymptotically equivalent and efficient in this case.

To explore the finite-sample properties of the tests, we run 5000 Monte Carlo trials for the sample sizes

n = 100

and

1000 .

The theoretical size is

α = 5 %

for all tests. Table 1 contains the rejection rate of the null

H_{0} : θ = θ_{10}

of all these tests for a grid of deviations from the null, i.e.,

θ_{10} - θ_{1}^{0}

. The columns contain rejection rates for the plug-in score test and our refined test with two values of

τ

,

τ = 1 %,

5%. The row with

θ_{10} - θ_{1}^{0} = 0

corresponds to the empirical size of the tests.

First, we analyze the size. We see that the plug-in version of the score test for all three choices of

π^{G} (.), π^{V} (.)

over-rejects the true null. Over-rejection goes down for the choices

π^{G} (.) = π^{V} (.) = 1 / n

and

π^{G} (.) = π^{V} (.) = {\hat{π}}^{(- 1)} (.)

when the sample size increases to

n = 1000

. However, the refined projection version of the score test for all three choices largely solves this problem of the over-rejection of the truth even when

n = 100

. Importantly, we see that the choice of

τ = 1 %

versus

τ = 5 %

for the refined projection does not much affect the finite-sample rejection rate of the truth under this strong identification setup.

Moving to the discussion of power, we see that the refined projection test has good power in small samples. Now, comparing the choices

π^{G} (.) = π^{V} (.) = 1 / n

,

π^{G} (.) = π^{V} (.) = {\hat{π}}^{(1)} (.)

,

π^{G} (.) = π^{V} (.) = {\hat{π}}^{(- 1)} (.)

, we see that the finite sample power of the third choice, i.e., EL, is much better than that of the other two. The lower power in small samples for the choice

π^{G} (.) = π^{V} (.) = 1 / n

supports that orthogonalization by the implied probabilities in the variance matrix estimator is important for power. However, do note that the

π^{G} (.) = π^{V} (.) = {\hat{π}}^{(1)} (.)

(EEL) delivers the worst power in spite of the orthogonalization by the implied probabilities in the variance matrix estimator. This happens because the EEL implied probabilities can be negative, which rules out the positive (semi-) definiteness of the variance estimator and, in turn, leads to an unduly small

{LM}_{1.2, n} (.)

under false null hypotheses. The shrinkage of the EEL implied probabilities to make them positive, as suggested in Antoine et al. [47] and extensively used in Chaudhuri and Renault [9], can alleviate this problem of poor power to some extent but is not investigated here.

The refined projection test with the EL implied probabilities is the clear winner in terms of size and power. Its superiority is more prominent in the smaller sample, where it matters more.

Another Monte Carlo experiment using a linear instrumental variables regression confirms the good size and power of our test (the results are available from the authors upon request).

5. Application to the Impact of Veteran Status on Earnings

Following Chaudhuri and Rose [16], we propose to estimate the effect of the veteran status on future earnings for Vietnam war veterans in the United States by running an instrumental variables regression of log annual earnings on the dummy variable veteran status and a variety of control variables related to both earnings and veteran status. One important variable which influences earnings is years of schooling. However, since schooling is related to some unobservable variable (“ability”) that is related to both earnings and veteran status, it is obviously endogenous. So, we wish to estimate a regression of the log earnings on both veteran status and schooling. (The causal question in this empirical illustration is a difficult one due to the nature of the relationship between veteran status and schooling. First, the veteran status can help increase the years of schooling because of the subsidy provided by the GI Bill. Hence, schooling can be a mediator through which the veteran status affects wages. Second, the draft avoidance behavior of individuals was often enacted by enrolling in college and thereby increasing years of schooling. That is, the decisions to join the military and for the continuation of schooling were often made simultaneously. A more complete analysis is beyond the scope of this paper.) Given both regressors are endogenous, we need to use instrumental variables.

Angrist [17,48] used the Vietnam Era draft lottery that determined the draft eligibility of individuals, to instrument for an individual’s veteran status in the Vietnam war. A popular choice of instrument for schooling since Card [18,49,50] has been the presence of colleges in the neighborhood of where the individual grew up. Following these seminal references, we use four instrumental variables: (i) the lottery number assigned to the individual based on their date of birth, (ii) the lottery ceiling for the year when this individual attained draft age, (iii) a dummy variable indicating the presence of a 4 year accredited public college, and (iv) a dummy variable indicating the presence of a 4 year accredited private college in the neighborhood of the individual’s residence in 1966.

Partialling out the control variables from the system by taking the residuals from a regression of the concerned variables on those controls and the intercept, we focus on the instrumental variables regression model

y_{i} = X_{1 i} θ + X_{2 i} θ_{2} + u_{i}

with moment vector

g (W_{i}; θ) = Z_{i} (y_{i} - X_{1 i} θ_{1} - X_{2 i} θ_{2})

where

y_{i}, X_{1 i}, X_{2 i}

denote the residuals from the regression on the controls and the intercept of the variables log earnings, veteran status, and years of schooling, respectively, and

Z_{i}

is the

4 \times 1

vector of instruments such that

E (Z_{i} u_{i}) = 0

.

We use the same data (the dataset is available on https://saraswata.research.mcgill.ca/MC_SC_Data.xlsx, accessed on 2 February 2025) as in Chaudhuri and Rose [16], which were obtained from the National Longitudinal Survey of Young Men. The sample includes 1080 (i.e., 39%) veterans and 1674 non-veterans. In this dataset, the instruments are weak for both veteran status and schooling with the first stage F statistic equal to 8.46 and 2.53, respectively.

Using these data, Chaudhuri and Rose [16] implemented a variety of plug-in methods, namely, the subset-K, subset-KJ and subset-CLR tests, and obtained a significant (at the 5% level) negative effect of the veteran status. However, these tests are not reliable in the presence of weak identification as shown by Guggenberger et al. [13] and Andrews [26].

The only genuinely weak-identification robust method used in Chaudhuri and Rose [16] is the so-called subset-Anderson–Rubin test proposed by Guggenberger et al. [13]. The subset-AR test lead to a 90% confidence interval for the coefficient of the veteran status whose upper bound is approximately 0.095, signifying that rather large positive effects of veteran status—

100 (\exp (0.095) - 1) = 9.97 %

increase in wage—cannot be ruled out. The lower bound of the subset-AR confidence interval asymptotes to

- \infty

, which is a consequence of weak identification. The inclusion of positive values in the confidence renders this test inconclusive.

The subset-AR test can be conservative when the effective number of over-identifying restrictions (the number of instruments minus the dimension of

θ_{2}

, in this case

4 - 1 = 3

) is larger than the number of restrictions in the null (in this case, 1) being tested. Therefore, a priori, there is reason to believe that the refined projection test, that is, the efficient test under strong identification but also robust to weak identification, might alter the conclusion of the subset-AR test.

Indeed, this is what we find with the refined projection test using EL implied probabilities

π^{G} (.) = π^{V} (.) = {\hat{π}}^{(- 1)} (.)

. This confidence interval also includes implausibly large negative values (consequence of weak identification); however, its upper bound is less than zero, supporting the hypothesis that the veteran effect is negative.

For a visual illustration, Figure 1 presents two plots against various values of

θ_{10}

of

H_{0} : θ_{1} = θ_{10}

—(i) the subset-AR statistic minus the

χ_{3}^{2} (1 - 0.1)

, i.e., the tests statistic minus the 10% critical value for the subset-AR test, and (ii) the second step test statistic for the refined projection test minus the second step critical value, i.e.,

{inf}_{θ_{2} \in C_{H_{0}} (θ_{2}, 1 - τ)} {LM}_{1.2, n} (θ_{10}, θ_{2}, π^{G} (θ_{10}, θ_{2}), π^{V} (θ_{10}, θ_{2})) - χ_{1}^{2} (1 - α)

for the choice

τ = α = 0.05

. We take the function plotted for (ii) as

+ \infty

if the first-step confidence interval is empty (that automatically rejects

H_{0} : θ_{1} = θ_{10}

without requiring the second step). The values

θ_{10}

for which these two plots are below the horizontal red dotted line at zero are those that are included in the confidence interval for the respective tests. The vertical black dotted line is the zero effect line. Inclusion of the blue or the green line in the south-east quadrant of the graph means the positive effect is not ruled out by the concerned test. We see that while the CI of the subset-AR test includes positive values, that of our refined test includes only negative values which permits to conclude that the veteran effect is negative.

6. Conclusions

In this paper, we propose a two-step approach for testing the subvectors of parameters in models characterized by a vector of moment restrictions. The first step is based on an identification robust confidence interval of the parameter, while the second relies on a score test. We show the advantages of using the implied probabilities obtained from the Information Theory criteria to estimate the Jacobian and variance matrix present in our score tests. These tests exploit efficiently the information content of the moment conditions. As a result, these tests have an empirical size close to the theoretical size and their power is good. The resulting confidence intervals are more reliable than those from alternative tests in the presence of skewness and/or weak identification. The theoretical properties of our tests are derived for all the elements of the Cressie–Read family, including the Kullback–Leibler Information Criterion. Finally, the empirical application brings evidence that veterans have lower earnings than comparable non-veterans.

Author Contributions

Methodology, S.C.; Resources, M.C.; Writing—original draft, M.C. and S.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by FQRSC and SSHRC.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Acknowledgments

The authors thank the editors, two referees, and Eric Renault for their helpful comments. Financial support from FQRSC and SSHRC are gratefully acknowledged.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Appendix A.1. Assumptions Involving Weak Identification

All the following assumptions are discussed in detail in Chaudhuri and Zivot [14]. Without loss of generality, we group the parameters into weakly and strongly identified parameters. For

j = w, s

, let

ν_{j} = ν_{1 j} + ν_{2 j}

,

θ_{j} = {(θ_{1 j}^{'}, θ_{2 j}^{'})}^{'}

and

Θ_{j} = Θ_{1 j} \times Θ_{2 j}

. This notation denotes the weakly identified parameters as

θ_{w}

and the strongly identified parameters as

θ_{s}

. The true values are, when convenient, regrouped as

θ_{0 w} = {(θ_{01 w}^{'}, θ_{02 w}^{'})}^{'}

and

θ_{0 s} = {(θ_{01 s}^{'}, θ_{02 s}^{'})}^{'}

, respectively. When necessary,

N \subset Θ

and

N_{r} \subset Θ_{r}

are generically used to denote non-shrinking open neighborhoods of

θ_{0}

and

θ_{0 r}

for

r = 1 w, 1 s, 2 w, 2 s, w, s, 1, 2

, respectively. Define

\tilde{N} : = N_{w} \times N_{1 s} \times Θ_{2 s}

.

Assumption $Θ$ :

[partition of parameter space]

For $l = 1, 2$ , let $Θ_{l} = Θ_{l w} \times Θ_{l s}$ and for $j = w, s$ , let $θ_{l j}^{0} \in$ interior $(Θ_{l j})$ , where $Θ_{l j} \subset R^{ν_{l j}}$ is compact.

Assumption SW:

[characterization of strong/weak identification]

$E [{\bar{g}}_{n} (θ)] = {\tilde{m}}_{n} (θ) / \sqrt{n} + m (θ_{s})$ where

(a): ${\tilde{m}}_{n} (θ) : Θ \mapsto R^{d_{g}}$ is such that ${\tilde{m}}_{n} (θ) \to \tilde{m} (θ)$ uniformly for $θ \in \tilde{N}$ where $\tilde{m} (θ)$ is bounded and continuous and $\tilde{m} (θ_{0}) = 0$ . For $θ \in \tilde{N}$ , ${\tilde{M}}_{n} (θ) : = \partial {\tilde{m}}_{n} (θ) / \partial θ^{'}$ , ${\tilde{M}}_{n} (θ) \to \tilde{M} (θ)$ uniformly. $\tilde{M} (θ) = [{\tilde{M}}_{1 w} (θ), {\tilde{M}}_{1 s} (θ), {\tilde{M}}_{2 w} (θ), {\tilde{M}}_{2 s} (θ)]$ where, for $l = 1, 2$ and $j = w, s$ , the $k \times ν_{l j}$ matrix ${\tilde{M}}_{l j} (θ)$ is bounded and continuous.
(b): $m (θ_{s}) : Θ_{s} \mapsto R^{d_{g}}$ is a continuous function and $m (θ_{s}) = 0$ if and only if $θ_{s} = θ_{s}^{0}$ . For $θ_{s} \in N_{1 s} \times Θ_{2 s}$ , $M (θ_{s}) : = \partial m (θ_{s}) / \partial θ_{s}^{'}$ is bounded and continuous. $M (θ_{0 s})$ has full column rank. Here, $M (θ_{s}) = [M_{1} (θ_{s}), M_{2} (θ_{s})]$ where $M_{l} (θ_{s}) : = \partial m (θ_{s}) / \partial θ_{l s}^{'}$ for $l = 1, 2$ .

Assumption D: [assumptions on the moment vector and its derivative]

D1.: ${\bar{G}}_{n} (θ) : = \partial {\bar{g}}_{n} (θ) / \partial θ^{'} = [G_{1 w n} (θ), G_{1 s n} (θ), G_{2 w n} (θ), G_{2 s n} (θ)] = E [{\bar{G}}_{n} (θ)] + o_{p} (1)$ uniformly for $θ \in \tilde{N}$ where $E [{\bar{G}}_{n} (θ)] = \partial E [{\bar{g}}_{n} (θ)] / \partial θ^{'} = {\tilde{M}}_{n} (θ) / \sqrt{n} + [0, M_{1} (θ_{s}), 0, M_{2} (θ_{s})]$ by imposing interchangeability of the order of differentiation and integration (and from Assumption $S W$ ).
D2.: $\sqrt{n} [{\bar{g}}_{n}^{'} (θ_{0}), v e c^{'} ({\bar{G}}_{w n} (θ_{0}) - E [{\bar{G}}_{w n} (θ^{0})])] \overset{d}{\to} [Ψ_{g}^{T}, Ψ_{w}^{T}]$ where (the partition of $Ψ_{w} (θ) = {[Ψ_{1 w}^{'} (θ), Ψ_{2 w}^{'} (θ)]}^{'}$ , $Σ_{g w} (θ) = [Σ_{g 1} (θ), Σ_{g 2} (θ)] = Σ_{w g}^{'} (θ)$ and $Σ_{w w} (θ) = {(Σ_{l l^{'}} (θ))}_{l, l^{'} = 1, 2}$ is conformable to the partition of $θ_{w} = {(θ_{1 w}^{'}, θ_{2 w}^{'})}^{'}$ , the partition of the weakly identified elements of $θ$ into those from $θ_{1}$ and $θ_{2}$ , respectively)

$[\begin{matrix} Ψ_{g} \\ Ψ_{w} \end{matrix}] \sim N (0, Σ (θ_{0}) = [\begin{matrix} \underset{k \times k}{Σ_{g g} (θ^{0})} \equiv V (θ^{0}) & \underset{k \times k ν_{w}}{Σ_{g w} (θ^{0})} \\ \underset{k ν_{w} \times k}{Σ_{w g} (θ^{0})} & \underset{k ν_{w} \times k ν_{w}}{Σ_{w w} (θ^{0})} \end{matrix}]) .$

$Σ_{g g} (θ)$ is bounded, continuous, and positive definite. Refining Assumption 2 and with the obvious correspondence of notation between the V’s and the $Σ$ ’s that specify what the estimators are, we also make the following assumptions: ${\hat{Σ}}_{g g} (θ) \overset{p}{\to} Σ_{g g} (θ)$ uniformly for $θ \in \tilde{N}$ . $Σ_{w g} (θ)$ is bounded and continuous. ${\hat{Σ}}_{w g} (θ) : = {[{\hat{Σ}}_{1 g}^{'} (θ), \dots, {\hat{Σ}}_{ν_{1 w}, g}^{'} (θ), {\hat{Σ}}_{ν_{1} + 1, g}^{'} (θ), \dots, {\hat{Σ}}_{ν_{1} + ν_{2 w}, g}^{'} (θ)]}^{'}$ $\overset{p}{\to} Σ_{w g} (θ)$ uniformly for $θ \in N$ . (It is worth noting that the $d_{g} \times d_{g}$ matrix ${\hat{Σ}}_{l g} (θ^{0}) \overset{p}{\to} Σ_{l g} (θ^{0}) = Asym . Cov (n^{- 1 / 2} \partial {\bar{g}}_{n} (θ^{0}) / \partial θ_{l}, n^{- 1 / 2} {\bar{g}}_{n} (θ^{0}))$ where $θ_{l}$ is the l-th element ( $l = 1, \dots, ν_{1 w}, ν_{1} + 1, \dots, ν_{1} + ν_{2 w}$ ) of $θ$ .) For $l = ν_{1 w} + 1, \dots, ν_{1}, ν_{1} + ν_{2 w} + 1, \dots, ν$ the $d_{g} \times d_{g}$ matrices ${\hat{Σ}}_{l g} (θ)$ are such that ${\hat{Σ}}_{l g} (θ) {\hat{Σ}}_{g g}^{- 1} (θ) = O_{p} (1)$ uniformly for $θ \in N$ .

Appendix A.2. Proofs

In the proofs, we use the notation $g_{i} (θ) = g (W_{i}; θ) .$

Proof of Proposition 1:

(A) A mean-value expansion of the RHS of the (approximate) first-order condition of the maximization problem in (5) gives,

\begin{matrix} o_{P} (\frac{1}{\sqrt{n}}) & = & \frac{1}{n} \sum_{i = 1}^{n} ρ_{1} (λ_{ρ, n}^{'} (θ) g_{i} (θ)) g_{i} (θ) \\ = & \frac{1}{n} \sum_{i = 1}^{n} ρ_{1} (0) g_{i} (θ) + \frac{1}{n} \sum_{i = 1}^{n} ρ_{2} (0) g_{i} (θ) g_{i}^{'} (θ) λ_{ρ, n} (θ) + R_{λ, n} (θ) \\ = & - {\bar{g}}_{n} (θ) - {\hat{Ω}}_{n} (θ) λ_{ρ, n} (θ) + R_{λ, n} (θ), \end{matrix}

(A1)

where

{\bar{v}}_{i}

are the mean values satisfying

| {\bar{v}}_{i} | \leq | λ_{ρ, n}^{'} (θ) g_{i} (θ) |

for all

i = 1, \dots, n

, and the remainder term

R_{λ, n} (θ) = \frac{1}{n} \sum_{i = 1}^{n} [ρ_{2} ({\bar{v}}_{i}) - ρ_{2} (0)] g_{i} (θ) g^{'} (W_{i}; θ) λ_{ρ, n} (θ)

. If we could ignore the contribution of

R_{λ, n} (θ)

in (A1), we would obtain

λ_{ρ, n} (θ) = - {\hat{Ω}}_{n}^{- 1} (θ) {\bar{g}}_{n} (θ) + {\hat{Ω}}_{n}^{- 1} (θ) \times o_{P} (\frac{1}{\sqrt{n}}) = O_{P} (\frac{1}{\sqrt{n}}) .

{\hat{Ω}}_{n} (θ)

and

{\hat{Ω}}_{n}^{- 1} (θ)

are assumed to be

O_{P} (1)

by assumption A2 and

{\bar{g}}_{n} (θ) = O_{P} (n^{- 1 / 2})

by assumption A1 (i) and (iv). For this reason, if we can show that

∥ R_{λ, n} (θ) ∥ = o_{P} (n^{- 1 / 2}),

then it will be sufficient to establish result (A). This is what we prove next:

\begin{matrix} ∥ R_{λ, n} (θ) ∥ & = & ∥(\frac{1}{n} \sum_{i = 1}^{n} [ρ_{2} ({\bar{v}}_{i}) - ρ_{2} (0)] g_{i} (θ) g_{i}^{'} (θ)) λ_{ρ, n} (θ)∥ \\ \leq & ∥\frac{1}{n} \sum_{i = 1}^{n} [ρ_{2} ({\bar{v}}_{i}) - ρ_{2} (0)] g_{i} (θ) g_{i}^{'} (θ)∥ \times ∥ λ_{ρ, n} (θ) ∥ \\ \leq & max_{1 \leq i \leq n} | ρ_{2} ({\bar{v}}_{i}) - ρ_{2} (0) | \times ∥\frac{1}{n} \sum_{i = 1}^{n} g_{i} (θ) g_{i}^{'} (θ)∥ \times ∥ λ_{ρ, n} (θ) ∥ \\ \leq & b \times max_{1 \leq i \leq n} | {\bar{v}}_{i} | \times (∥ V (θ) ∥ + o_{p} (1)) \times ∥ λ_{ρ, n} (θ) ∥ \\ \leq & b \times max_{1 \leq i \leq n} | g_{i}^{'} (θ) λ_{ρ, n} (θ) | \times b_{\max} (θ) \times ∥ λ_{ρ, n} (θ) ∥ \\ \leq & b \times max_{1 \leq i \leq n} ∥ g_{i} (θ) ∥ \times b_{m a x} (θ) \times {∥ λ_{ρ, n} (θ) ∥}^{2} \\ \leq & b \times b_{\max} (θ) \times o_{p} (\sqrt{n}) \times {∥ λ_{ρ, n} (θ) ∥}^{2} = o_{P} (n^{- 1 / 2}), \end{matrix}

(A2)

by the repeated use of Cauchy–Schwarz and triangle inequalities and because

{max}_{1 \leq i \leq n} ∥ g_{i} (θ) ∥ = o_{P} (\sqrt{n})

and

∥ λ_{ρ, n} (θ) ∥ = O_{P} (n^{- 1 / 2})

. Therefore, result (A) follows.

(B) Expanding the numerator and denominator of the RHS of (6) around 0, and using the result obtained in (A), we obtain for any given

i = 1, \dots, n

\begin{matrix} π_{ρ, i, n} (θ) & = & \frac{\frac{1}{n} [ρ_{1} (0) + ρ_{2} (0) λ_{ρ, n}^{'} (θ) g_{i} (θ) + \{ρ_{2} ({\bar{v}}_{i}) - ρ_{2} (0)\} λ_{ρ, n}^{'} g_{i} (θ)]}{\frac{1}{n} \sum_{j = 1}^{n} [ρ_{1} (0) + ρ_{2} (0) λ_{ρ, n}^{'} g_{i} (θ) + \{ρ_{2} ({\bar{v}}_{j}) - ρ_{2} (0)\} λ_{ρ, n}^{'} g (W_{j}; θ)]} \\ = & \frac{\frac{1}{n} [ρ_{1} (0) - ρ_{2} (0) g_{i}^{'} (θ) \{{\hat{Ω}}_{n}^{- 1} (θ) {\bar{g}}_{n} (θ) + o_{P} (n^{- 1 / 2})\} + \{ρ_{2} ({\bar{v}}_{i}) - ρ_{2} (0)\} λ_{ρ, n}^{'} (θ) g_{i} (θ)]}{\frac{1}{n} \sum_{j = 1}^{n} [ρ_{1} (0) - ρ_{2} (0) g_{i}^{'} (θ) \{{\hat{Ω}}_{n}^{- 1} (θ) {\bar{g}}_{n} (θ) + o_{P} (n^{- 1 / 2})\} + \{ρ_{2} ({\bar{v}}_{j}) - ρ_{2} (0)\} λ_{ρ, n}^{'} (θ) g (W_{j}; θ)]} \\ = & \frac{\frac{1}{n} [1 - {(g_{i} (θ) - {\bar{g}}_{n} (θ))}^{'} {\hat{Ω}}_{n}^{- 1} (θ) {\bar{g}}_{n} (θ)] + R_{N U M, i, n}}{1 - {\bar{g}}_{n}^{'} (θ) {\hat{Ω}}_{n}^{- 1} (θ) {\bar{g}}_{n} (θ) + R_{D E N, n}} \end{matrix}

(A3)

where the remainder terms in the numerator and the denominator are given by

\begin{matrix} R_{N U M, i, n} : = & \frac{1}{n} \{ρ_{2} ({\bar{v}}_{i}) - ρ_{2} (0)\} λ_{ρ, n}^{'} (θ) g_{i} (θ) - \frac{1}{n} ρ_{2} (0) g^{'} (W_{i}; θ) \times o_{P} (n^{- 1 / 2}) + \frac{1}{n} {\bar{g}}_{n}^{'} (θ) {\hat{Ω}}_{n}^{- 1} (θ) {\bar{g}}_{n} (θ), \\ R_{D E N, n} : = & \frac{1}{n} \sum_{j = 1}^{n} [\{ρ_{2} ({\bar{v}}_{j}) - ρ_{2} (0)\} λ_{ρ, n}^{'} (θ) g (W_{j}; θ) - ρ_{2} (0) g^{'} (W_{i}; θ) \times o_{P} (n^{- 1 / 2})] . \end{matrix}

It is important to note that i is given (fixed) in the remainder term

R_{N U M, i, n}

. Now following the same steps as in (A) to deal with the remainder term, we obtain for a given

i = 1, \dots, n

\begin{matrix} | R_{N U M, i, n} | & \leq & \frac{1}{n} | ρ_{2} ({\bar{v}}_{i}) - ρ_{2} (0) | \times ∥ λ_{ρ, n} (θ) ∥ \times ∥ g_{i} (θ) ∥ + \frac{1}{n} ∥ g_{i} (θ) ∥ \times o_{P} (\frac{1}{\sqrt{n}}) + \frac{1}{n} {\bar{g}}_{n}^{'} (θ) {\hat{Ω}}_{n}^{- 1} (θ) {\bar{g}}_{n} (θ) \\ \leq & \frac{1}{n} b \times | λ_{ρ, n}^{'} (θ) g_{i} (θ) | \times ∥ λ_{ρ, n} (θ) ∥ \times ∥ g_{i} (θ) ∥ + ∥ g_{i} (θ) ∥ \times o_{P} (\frac{1}{n^{3 / 2}}) + \frac{1}{n} {∥ {\bar{g}}_{n} (θ) ∥}^{2} \times b_{m i n}^{- 1} (θ) \\ \leq & \frac{1}{n} b \times ∥ λ_{ρ, n} {(θ) ∥}^{2} \times ∥ g_{i} {(θ) ∥}^{2} + ∥ g_{i} (θ) ∥ \times o_{P} (\frac{1}{n^{3 / 2}}) + \frac{1}{n} {∥ {\bar{g}}_{n} (θ) ∥}^{2} \times b_{m i n}^{- 1} (θ), \\ = & O_{P} (\frac{1}{n^{1 + 1}}) \times O_{P} (1) + O_{P} (1) \times o_{P} (\frac{1}{n^{3 / 2}}) + O_{P} (\frac{1}{n^{1 + 1}}) \\ = & o_{P} (\frac{1}{n^{3 / 2}}) \end{matrix}

(A4)

because

∥ g_{i} (θ) ∥ = O_{P} (1)

by A1(iii),

∥ {\bar{g}}_{n} (θ) ∥ = O_{P} (n^{- 1 / 2})

by A1 (i) and (iv), and

λ_{ρ, n} (θ) = O_{P} (n^{- 1 / 2})

by (A). Finally, we want to derive the order of magnitude of

| R_{D E N, n} |

. Using a similar technique as before, we obtain that

\begin{matrix} | R_{D E N, n} | & \leq & \frac{1}{n} |\sum_{j = 1}^{n} \{ρ_{2} ({\bar{v}}_{j}) - ρ_{2} (0)\} λ_{ρ, n}^{'} (θ) g (W_{j}; θ)| + ∥ {\bar{g}}_{n} (θ) ∥ \times o_{P} (n^{- 1 / 2}) \\ \leq & max_{1 \leq j \leq n} | ρ_{2} ({\bar{v}}_{j}) - ρ_{2} (0) | \times ∥ {\bar{g}}_{n} (θ) ∥ \times ∥ λ_{ρ, n} (θ) ∥ + ∥ {\bar{g}}_{n} (θ) ∥ \times o_{P} (n^{- 1 / 2}) \\ \leq & b \times max_{1 \leq j \leq n} | λ_{ρ, n}^{'} (θ) g_{j} (θ) | \times ∥ {\bar{g}}_{n} (θ) ∥ \times ∥ λ_{ρ, n} (θ) ∥ + ∥ {\bar{g}}_{n} (θ) ∥ \times o_{P} (n^{- 1 / 2}) \\ \leq & b \times max_{1 \leq j \leq n} ∥ g_{j} (θ) ∥ \times ∥ {\bar{g}}_{n} (θ) ∥ \times ∥ λ_{ρ, n} {(θ) ∥}^{2} + ∥ {\bar{g}}_{n} (θ) ∥ \times o_{P} (n^{- 1 / 2}) \\ = & o_{P} (n^{1 / 2 - 3 / 2}) + o_{P} (n^{- 1}) = o_{P} (n^{- 1}) \end{matrix}

because

{max}_{1 \leq j \leq n} ∥ g_{j} (θ) ∥ = o_{P} (\sqrt{n})

by Assumption 1(ii), while by (A) we have

λ_{ρ, n} (θ) = O_{P} (n^{- 1 / 2})

. Moreover,

{\bar{g}}_{n}^{'} (θ) {\hat{Ω}}_{n}^{- 1} (θ) {\bar{g}}_{n} (θ)

in the denominator of (A3) is

O_{P} (n^{- 1})

because

∥ {\bar{g}}_{n} (θ) ∥ = O_{P} (n^{- 1 / 2})

as before. Therefore, the whole denominator of (A3) is

1 + O_{P} (n^{- 1})

. Consequently, result (B) follows from (A3) and (A4).

Proof of Proposition 2:

(A) This result follows directly from the definition of the

π_{E E L, i, n} (θ)

and (vi).

(B) Since our result in Proposition 1(B) is not uniform in

i,

we cannot appeal to

{max}_{1 \leq i \leq n} | π_{ρ, i, n} (θ) - π_{E E L, i, n} (θ) |

after applying the Cauchy–Schwarz inequality. Alternatively, we directly work with the expression of the difference

\{π_{ρ, i, n} (θ) - π_{E E L, i, n} (θ)\} = R_{N U M, i} / (1 + o_{P} (1)

obtained in (A3). To simplify notations, we denote

{\tilde{Y}}_{i, n} : = Y_{i, n} - E [{\bar{Y}}_{n}]

,

g_{i} : = g_{i} (θ)

,

{\bar{g}}_{n} : = {\bar{g}}_{n} (θ)

,

{\hat{Ω}}_{n} : = {\hat{Ω}}_{n} (θ)

and

λ : = λ_{ρ, n} (θ)

. Accordingly, using Proposition 1(A), and assumptions A1 and A2, we obtain

\begin{matrix} ∥\sqrt{n} \sum_{i = 1}^{n} {\tilde{Y}}_{i, n} \{π_{ρ, i, n} (θ) - π_{E E L, i, n} (θ)\}∥ \\ \leq & ∥\frac{1}{n} \sum_{i = 1}^{n} \{ρ_{2} ({\bar{v}}_{i}) - ρ_{2} (0)\} {\tilde{Y}}_{i, n} g_{i}^{'} \sqrt{n} λ∥ + o_{P} (1) \times ∥\frac{1}{n} \sum_{i = 1}^{n} {\tilde{Y}}_{i, n} g_{i}^{'}∥ + {\bar{g}}_{n}^{'} {\hat{Ω}}_{n}^{- 1} {\bar{g}}_{n} \times ∥\frac{1}{\sqrt{n}} \sum_{i = 1}^{n} {\tilde{Y}}_{i, n}∥ \\ \leq & \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {\{ρ_{2} ({\bar{v}}_{i}) - ρ_{2} (0)\}}^{2}} \times ∥\frac{1}{n} \sum_{i = 1}^{n} {\tilde{Y}}_{i, n} g_{i}^{'}∥ \times ∥\sqrt{n} λ∥ + o_{P} (1) \times ∥\frac{1}{n} \sum_{i = 1}^{n} {\tilde{Y}}_{i, n} g_{i}^{'}∥ + {\bar{g}}_{n}^{'} {\hat{Ω}}_{n}^{- 1} {\bar{g}}_{n} \times ∥\frac{1}{\sqrt{n}} \sum_{i = 1}^{n} {\tilde{Y}}_{i, n}∥ \\ \leq & b \times \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {| λ^{'} g_{i} |}^{2}} \times ∥\frac{1}{n} \sum_{i = 1}^{n} {\tilde{Y}}_{i, n} g_{i}^{'}∥ \times ∥\sqrt{n} λ∥ + o_{P} (1) \times ∥\frac{1}{n} \sum_{i = 1}^{n} {\tilde{Y}}_{i, n} g_{i}^{'}∥ + {\bar{g}}_{n}^{'} {\hat{Ω}}_{n}^{- 1} {\bar{g}}_{n} \times ∥\frac{1}{\sqrt{n}} \sum_{i = 1}^{n} {\tilde{Y}}_{i, n}∥ \\ \leq & b \times \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {∥ g_{i} ∥}^{2}} \times ∥\frac{1}{n} \sum_{i = 1}^{n} {\tilde{Y}}_{i, n} g_{i}^{'}∥ \times \sqrt{n} {∥λ∥}^{2} + o_{P} (1) \times ∥\frac{1}{n} \sum_{i = 1}^{n} {\tilde{Y}}_{i, n} g_{i}^{'}∥ + {\bar{g}}_{n}^{'} {\hat{Ω}}_{n}^{- 1} {\bar{g}}_{n} \times ∥\frac{1}{\sqrt{n}} \sum_{i = 1}^{n} {\tilde{Y}}_{i, n}∥ \\ = & O_{P} (1) \times O_{P} (1) \times O_{P} (n^{- 1 / 2}) + o_{P} (1) \times O_{P} (1) + O_{P} (n^{- 1}) \times O_{P} (1) = o_{P} (1), \end{matrix}

from the standard arguments, for example,

∥ n^{- 1} \sum_{i = 1}^{n} {\tilde{Y}}_{i, n} g_{i}^{'} ∥ \leq ∥ n^{- 1} \sum_{i = 1}^{n} {\tilde{Y}}_{i, n} {(g_{i} - {\bar{g}}_{n})}^{'} ∥ + ∥ ({\bar{Y}}_{n} - E [{\bar{Y}}_{n}]) {\bar{g}}_{n} ∥ \leq ∥ V_{Y g} ∥ + o_{P} (1) + O_{P} (n^{- 1 / 2} ∥ V_{Y Y} ∥) \times O_{P} (∥ {\bar{g}}_{n} ∥) = O_{P} (1)

.

References

Hall, A. Generalized Method of Moments; Advanced Texts in Econometrics; Oxford University Press: Oxford, UK, 2005. [Google Scholar]
Staiger, D.; Stock, J. Intrumal Variables Regression with Weak Instruments. Econometrica 1997, 65, 557–586. [Google Scholar]
Antoine, B.; Renault, E. Efficient GMM with Nearly-Weak Instruments. Econom. J. 2009, 12, S135–S171. [Google Scholar]
Andrews, D.; Cheng, X. Estimation and Inference with Weak, Semi-Strong, and Strong Identification. Econometrica 2012, 80, 2153–2211. [Google Scholar]
Dufour, J.M. Some Impossibility Theorems in Econometrics with Applications to Structural and Dynamic Models. Econometrica 1997, 65, 1365–1387. [Google Scholar]
Newey, W.K.; West, K.D. Hypothesis Testing with Efficient Method of Moments Estimation. Int. Econ. Rev. 1987, 28, 777–787. [Google Scholar]
Wang, J.; Zivot, E. Inference on Structural Parameters in Instrumental Variables Regression with Weak Instruments. Econometrica 1998, 66, 1389–1404. [Google Scholar]
Chaudhuri, S.; Renault, E. Shrinkage of Variance for Minimum Distnce Based Tests. Econom. Rev. 2015, 34, 328–351. [Google Scholar]
Chaudhuri, S.; Renault, E. Score Tests in GMM: Why Use Implied Probabilities. J. Econom. 2020, 219, 260–280. [Google Scholar]
Newey, W.K.; Smith, R.J. Higher Order Properties of GMM and Generalized Empirical Likelihood Estimators. Econometrica 2004, 72, 219–255. [Google Scholar]
Kitamura, Y.; Stutzer, M. An Information-Theoretic Alternative to Generalized Method of Moments Estimation. Econometrica 1997, 65, 861–874. [Google Scholar]
Golan, A. Information and entropy economerics—A review and synthesis. Found. Trends Econom. 2006, 2, 1–145. [Google Scholar]
Guggenberger, P.; Kleibergen, F.; Mavroeidis, S.; Chen, L. On the Asymptotic Sizes of Subset Anderson-Rubin and Lagrange Multiplier Tests in Linear Instrumental Variables Regression. Econometrica 2012, 80, 2649–2666. [Google Scholar]
Chaudhuri, S.; Zivot, E. A new method of projection-based inference in GMM with weakly identified nuisance parameters. J. Econom. 2011, 164, 239–251. [Google Scholar]
Smith, R.J. Alternative semi-parametric likelihood approaches to generalized method of moments estimation. Econ. J. 1997, 107, 503–519. [Google Scholar]
Chaudhuri, S.; Rose, E.; Estimating the Veteran Effect with Endogenous Schooling when Instruments are Potentially Weak. Technical Report, University of North Carolina, Chapel Hill and University of Washington. 2010. Available online: https://saraswata.research.mcgill.ca/sc_er_10.pdf (accessed on 2 February 2025).
Angrist, J. Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records. Am. Econ. Rev. 1990, 80, 313–336. [Google Scholar]
Card, D. Using Geographical Variation in College Proximity to Estimate the Return to Schooling. Aspects of Labour Market Behavior: Essays in Honor of John Vanderkamp; University of Toronto Press: Toronto, ON, Canada, 1995. [Google Scholar]
Golan, A.; Judge, G.; Miller, D. Maximum Entropy Econometrics: Robust Estimation with Limited Data; John Wiley & Sons: New York, NY, USA, 1996. [Google Scholar]
Imbens, G.W.; Spady, R.H.; Johnson, P. Information Theoretic Approaches to Inference in Moment Condition Models. Econometrica 1998, 66, 333–357. [Google Scholar]
Smith, R.J. Empirical Likelihood Estimation and Inference. In Applications of Differential Geometry to Econometrics; Salmon, M., Marriott, P., Eds.; Cambridge University Press: Cambridge, UK, 2000. [Google Scholar]
Kitamura, Y. Empirical Likelihood Methods in Econometrics: Theory and Practice. In Advances in Economics and Econometrics: Theory and Applications, Ninth World Congress; Blundell, R., Newey, W., Persson, T., Eds.; Cambridge University Press: Cambridge, UK, 2007. [Google Scholar]
Guggenberger, P.; Smith, R. Generalized Empirical Likelihood Estimators and Tests under Partial, Weak and Strong Identification. Econom. Theory 2005, 21, 667–709. [Google Scholar]
Caner, M. Exponential tilting with weak instruments: Estimation and testing. Oxf. Bull. Econ. Stat. 2010, 72, 307–326. [Google Scholar]
Andrews, A.; Stock, J.; Sun, L. Weak Instruments in IV Regression: Theory and Practice. Annu. Rev. Econ. 2019, 11, 727–753. [Google Scholar]
Andrews, D.W.K. Identification-Robust Subvector Inference; Yale University Working Paper; Yale University: New Haven, CT, USA, 2017. [Google Scholar]
Andrews, I. Valid Two-Step Identification-Robust Confidence Sets for GMM. Rev. Econ. Stat. 2018, 100, 337–348. [Google Scholar]
Otsu, T. Generalized Empirical Likelihood Inference for Nonlinear and Time Series Models under Weak Identification. Econom. Theory 2006, 22, 513–527. [Google Scholar] [CrossRef]
Guggenberger, P.; Smith, R. Generalized empirical likelihood tests in time series models with potential identification failure. J. Econom. 2008, 142, 134–161. [Google Scholar] [CrossRef]
Guggenberger, P.; Kleibergen, F.; Mavroeidis, S. A more powerful subvector Anderson Rubin test in linear instrumental variables regression. Quant. Econ. 2019, 10, 487–526. [Google Scholar] [CrossRef]
Kleibergen, F. Efficient size correct subset inference in homoskedastic linear instrumental variables regression. J. Econom. 2021, 221, 78–96. [Google Scholar] [CrossRef]
Kleibergen, F.; Mavroeidis, S.; Guggenberger, P. A powerful subvector Anderson-Rubin test in linear instrumental variables regression with conditional heteroskedasticity. Econom. Theory 2024, 40, 957–1002. [Google Scholar]
Kleibergen, F. Testing Parameters In GMM Without Assuming That They Are Identified. Econometrica 2005, 73, 1103–1123. [Google Scholar] [CrossRef]
Schennach, S.M. Point estimation with exponentially tilted empirical likelihood. Ann. Stat. 2007, 35, 634–672. [Google Scholar] [CrossRef]
Dufour, J.M.; Taamouti, M. Projection-Based Statistical Inference in Linear Structural Models with Possibly Weak Instruments. Econometrica 2005, 73, 1351–1365. [Google Scholar] [CrossRef]
Dufour, J.M.; Taamouti, M. Further Results on Projection-Based Inference in IV Regressions with Weak, Collinear or Missing Instruments. J. Econom. 2007, 139, 133–153. [Google Scholar] [CrossRef]
Chaudhuri, S. Projection-Type Score Tests for Subsets of Parameters. Ph.D. Thesis, University of Washington, Seattle, WA, USA, 2008. [Google Scholar]
Zivot, E.; Chaudhuri, S. Comment: Weak Instrument Robust Tests in GMM and the New Keynesian Phillips Curve by F. Kleibergen and S. Mavroeidis. J. Bus. Econ. Stat. 2009, 27, 328–331. [Google Scholar] [CrossRef]
Chaudhuri, S.; Richardson, T.; Robins, J.M.; Zivot, E. Split-Sample Score Tests in Linear Instrumental Variables Regression. Econom. Theory 2010, 26, 1820–1837. [Google Scholar] [CrossRef]
Cox, D.R.; Hinkley, D.V. Theoretical Statistics; Chapman and Hall: London, UK, 1974. [Google Scholar]
Neyman, J. Optimal Asymptotic Test of Composite Statistical Hypothesis. In Probability and Statistics, the Harald Cramer Volume; Grenander, U., Ed.; Almqvist and Wiksell: Uppsala, Sweden, 1959; pp. 313–334. [Google Scholar]
Bera, A.K.; Bilias, Y. Rao’s score, Neyman’s C(α) and Silvey’s LM tests: An essay on historical developments and some new results. J. Stat. Plan. Inference 2001, 97, 9–44. [Google Scholar] [CrossRef]
Kleibergen, F. Pivotal Statistics for Testing Structural Parameters in Instrumental Variables Regression. Econometrica 2002, 70, 1781–1803. [Google Scholar]
Stock, J.H.; Wright, J.H. GMM with Weak Identification. Econometrica 2000, 68, 1055–1096. [Google Scholar]
Sun, L. Implementing valid two-step identification-robust confidence sets for linear instrumental-variables model. Stata J. 2018, 18, 803–825. [Google Scholar] [CrossRef]
Chaudhuri, S.; Renault, E. Finite-Sample Improvements of Score Tests by the Use of Implied Probabilities from Generalized Empirical Likelihood; Technical report; McGill University: Montreal, QC, Canada, 2011. [Google Scholar]
Antoine, B.; Bonnal, H.; Renault, E. On the efficient use of the informational content of estimating equations: Implied probabilities and Euclidean empirical likelihood. J. Econom. 2007, 138, 461–487. [Google Scholar]
Angrist, J. The Draft Lottery and Voluntary Enlistment in the Vietnam Era. J. Am. Stat. Assoc. 1991, 86, 584–595. [Google Scholar]
Card, D. The Causal Effect of Education on Earnings. Handb. Labor Econ. 1999, 3, 1801–1863. [Google Scholar]
Card, D. Estimating the Return to Schooling: Progress on Some Persistent Econometric Problems. Econometrica 2001, 69, 1127–1160. [Google Scholar]

Figure 1. The values of

θ_{10}

below the horizontal line are included in the confidence interval obtained by inverting the refined projection test (blue line) and the subset-AR test (green line).

Figure 1. The values of

θ_{10}

below the horizontal line are included in the confidence interval obtained by inverting the refined projection test (blue line) and the subset-AR test (green line).

Table 1. Finite-sample rejection rate (in %) of score tests for

H_{10} : θ_{1} = θ_{10}

with nominal level

α = 5 %

. The asymptotic size of the refined projection test cannot exceed

α + τ

.

Table 1. Finite-sample rejection rate (in %) of score tests for

H_{10} : θ_{1} = θ_{10}

with nominal level

α = 5 %

. The asymptotic size of the refined projection test cannot exceed

α + τ

.

Nominal Level		Plug-in and Refined Projection Score Tests with $τ = 5 %$ and $τ = 1 %$
$α = 5 %$		$π^{G} (.) = π^{V} (.) = 1 / n$			$π^{G} (.) = π^{V} (.) = {\hat{π}}^{(1)} (.)$ : EEL			$π^{G} (.) = π^{V} (.) = {\hat{π}}^{(- 1)} (.)$ : EL
n	$θ_{10} - θ_{1}^{0}$	Plug-in	$τ = 5 %$	$τ = 1 %$	Plug-in	$τ = 5 %$	$τ = 1 %$	Plug-in	$τ = 5 %$	$τ = 1 %$
100	−1	99.7	99.3	98.4	97.5	96.3	95.2	99.5	99.4	98.9
100	−0.8	98.6	96.9	93.7	92.5	89.5	87.0	97.6	96.7	95.2
100	−0.6	93.4	87.7	79.2	78.4	72.3	67.3	91.2	88.7	83.5
100	−0.4	76.2	64.0	50.4	50.9	42.9	37.6	72.6	64.9	55.4
100	−0.2	42.1	28.6	18.1	20.3	15.2	12.2	39.1	29.3	21.3
100	0	10.6	6.1	2.9	7.8	5.6	4.6	13.0	7.7	5.4
100	0.2	6.6	6.1	6.0	21.8	18.6	17.9	26.1	20.5	20.1
100	0.4	34.4	34.4	34.4	57.4	50.8	48.5	68.4	63.8	63.6
100	0.6	74.4	74.4	74.4	76.4	66.5	61.6	94.7	93.4	93.4
100	0.8	92.3	92.3	92.3	62.5	49.0	43.1	99.8	99.8	99.8
100	1	96.2	96.2	96.2	37.8	23.2	18.5	100.0	100.0	100.0
1000	−0.3162	99.3	99.3	99.1	99.3	97.6	92.9	97.1	96.1	94.9
1000	−0.253	96.9	96.3	95.7	97.6	93.4	84.7	91.6	88.1	86.1
1000	−0.1897	86.9	84.9	83.1	91.2	81.5	68.4	76.4	71.1	67.1
1000	−0.1265	60.5	57.3	53.8	73.1	56.9	41.4	48.1	42.3	37.5
1000	−0.0632	25.6	23.2	20.6	39.5	24.7	14.8	18.9	14.9	12.4
1000	0	6.2	5.6	4.8	11.9	6.1	3.1	6.8	4.8	4.0
1000	0.0632	11.4	11.4	11.3	15.8	7.4	5.8	21.3	18.8	18.8
1000	0.1265	45.1	45.1	45.1	33.2	16.0	13.0	61.1	58.2	58.2
1000	0.1897	85.6	85.6	85.6	24.8	13.3	11.1	92.0	90.8	90.8
1000	0.253	98.5	98.5	98.5	8.9	5.1	4.3	99.5	99.4	99.4
1000	0.3162	100.0	100.0	100.0	1.5	1.0	0.8	100.0	100.0	100.0

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Carrasco, M.; Chaudhuri, S. Weak Identification Robust Tests for Subvectors Using Implied Probabilities. Entropy 2025, 27, 396. https://doi.org/10.3390/e27040396

AMA Style

Carrasco M, Chaudhuri S. Weak Identification Robust Tests for Subvectors Using Implied Probabilities. Entropy. 2025; 27(4):396. https://doi.org/10.3390/e27040396

Chicago/Turabian Style

Carrasco, Marine, and Saraswata Chaudhuri. 2025. "Weak Identification Robust Tests for Subvectors Using Implied Probabilities" Entropy 27, no. 4: 396. https://doi.org/10.3390/e27040396

APA Style

Carrasco, M., & Chaudhuri, S. (2025). Weak Identification Robust Tests for Subvectors Using Implied Probabilities. Entropy, 27(4), 396. https://doi.org/10.3390/e27040396

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Weak Identification Robust Tests for Subvectors Using Implied Probabilities

Abstract

1. Introduction

2. Implied Probabilities for Hypothesis on Subsets of Parameters in the GMM Framework

2.1. Background

2.2. Assumptions

2.3. Properties of GEL Implied Probabilities

3. Score Test for Subsets of Parameters Using the Implied Probabilities

3.1. Score Vector and Score Statistic Using the Implied Probabilities

3.2. Refined Projection Score Test Using the Implied Probabilities

4. Monte Carlo Experiment

4.1. Design

4.2. Results

5. Application to the Impact of Veteran Status on Earnings

6. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

Appendix A

Appendix A.1. Assumptions Involving Weak Identification

Appendix A.2. Proofs

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI