Article

On the Stock–Yogo Tables

1
Department of Economics, The University of Melbourne, Parkville, 3010, Australia
2
Department of Economics and IEU, University of Bristol, Bristol, BS8 1TU, UK
*
Author to whom correspondence should be addressed.
Econometrics 2018, 6(4), 44; https://doi.org/10.3390/econometrics6040044
Received: 31 March 2018 / Revised: 17 October 2018 / Accepted: 8 November 2018 / Published: 13 November 2018
(This article belongs to the Special Issue Celebrated Econometricians: Peter Phillips)

Abstract

A standard test for weak instruments compares the first-stage F-statistic to a table of critical values obtained by Stock and Yogo (2005) using simulations. We derive a closed-form solution for the expectation from which these critical values are derived, as well as present some second-order asymptotic approximations that may be of value in the presence of multiple endogenous regressors. Inspection of this new result provides insights not available from simulation, and will allow software implementations to be generalised and improved. Finally, we explore the calculation of p-values for the first-stage F-statistic weak instruments test.
Keywords: weak instruments; hypothesis testing; Stock–Yogo tables; hypergeometric functions; quadratic forms; p-values

1. Introduction

In a seminal contribution, Phillips (1989) focussed the attention of the profession on the distributional consequences of what has come to be known as the problem of weak instruments. There followed a series of important papers that helped crystallise the consequences for inference of this problem including, but not restricted to, Nelson and Startz (1990a, 1990b); Buse (1992); Cragg and Donald (1993); Dufour (1997) and Kleibergen (2002). There have been many, many responses seeking to address the issues raised in this literature, some with greater merit than others. Phillips continues to make substantial contributions to this area of the literature. See, for example, Phillips (2016, 2017); Phillips and Gao (2017). One development that, for better or worse,1 has had significant practical impact, as a consequence of its inclusion in widely used econometric software, is that of Stock and Yogo (2005), hereafter SY. It is this latter development which is the focus of this paper.
The fundamental contribution of SY was to develop quantitative definitions of weak instruments, based on either IV estimator bias or Wald test size distortion, that were testable. Their idea was to relate the first-stage F-statistic (or, when there are multiple endogenous regressors, the Cragg–Donald (Cragg and Donald 1993) statistic) to a non-centrality parameter that, in turn, was related to the aforementioned estimator bias or test size distortion. In this way, they were able to use this F-statistic to test whether instruments were weak.2 That such tests have become part of the toolkit of many practitioners is evidenced by the fact that critical values for the SY tests are available within standard computer software, such as Stata (e.g., StataCorp 2015) when using either the intrinsic ivregress command or the ivreg2 package (Baum et al. 2010). The difficulty in the SY approach is that, in order to compute appropriate critical values, it is necessary to evaluate a complicated integral as an intermediate step. SY did this by Monte Carlo simulation and the tables of critical values they provided are widely used in practice.
In this paper, we focus primarily on the two-stage least squares (2SLS) bias representation of weak instruments for the single endogenous variable case, which we shall call the scalar case. Although the SY tables allow for more endogenous regressors, Sanderson and Windmeijer (2016) demonstrate how these situations can all be mapped into the single endogenous regressor case, making it the case of greatest interest and importance. We show that, for this case, the integral mentioned above need not be estimated by simulation methods, as it can be resolved analytically and evaluated numerically using the intrinsic functions of software such as Matlab (MathWorks 2016). This result, Theorem 1, is presented in the next section, in which we also provide complete details of the model in question and the problem to be addressed.
From an empirical perspective, there are two important consequences of Theorem 1. First, it allows us to examine the accuracy of the SY critical values that have become so important in empirical research. For the most part, these critical values concord reasonably well with those that we derive analytically, although the most substantial differences occur in regions that we would argue are of practical significance.3 Second, it is now straightforward to generate more extensive sets of critical values, something that we do in Table 1 (also in Section 2). In particular, we extend the SY tables to include more values of both $k_z$ and $B$, where $k_z$ is the number of instruments and $B$ denotes the bias of 2SLS relative to that of ordinary least squares (OLS).
From a theoretical perspective, Theorem 1 provides a foundation that allows us to explore analytically certain patterns that exist in the SY tables, something that can only be alluded to on the basis of simulation results. These cases are explored in Section 3 of the paper. To further support the discussion of Section 3, we present in Section 4 some Monte Carlo simulation results, where we explore the sampling distributions of the F-statistics in relation to the bias of the 2SLS estimator relative to that of the OLS estimator, for the $k_z = 2$ and $k_z = 3$ cases.
Key to the development of Theorem 1 is the expectation of the ratio of a bilinear form in perfectly correlated normally distributed random variables, that differ only in their means, to a quadratic form in one of these same random variables, which is of some independent interest. We note in passing that the problem could be re-cast as one involving the expectation of a ratio of quadratic forms in normal variables, although in this form the normal variables have a singular distribution and both the numerator and the denominator weighting matrices are also singular, with the numerator weighting matrix asymmetric. This observation explains the difficulty in evaluating the integral analytically, but is also the reason that the expectation ultimately has such a simple structure. This expectation is evaluated in Appendix A.
Given the recent statement on p-values issued by the American Statistical Association Board of Directors (Wasserstein and Lazar 2016), it would be remiss of a paper such as this to be silent on the matter. In Section 5, we extend our discussion to show how p-values can be readily calculated on the basis of our earlier results.
Having extensively studied the scalar case, in Section 6, we turn our attention to analysing the model when there are multiple endogenous regressors in the model of interest. We shall, hereafter, refer to this as the general case. We find that we are able to draw on a variety of results developed in the exact finite sample literature. None of our results are exact; they are all asymptotic in nature, but they do share a common structure with the earlier work that we are able to exploit. In addition to obtaining results for the relative bias of 2SLS, in the general case, we also provide first- and second-order asymptotic approximations to the relative bias, where the nesting sequence is the number of instruments. Overall, the analysis of the general case is less favourable to the SY approach than is the scalar case.
Final remarks appear in Section 8. For the most part, we have relegated technical developments to the various appendices, as well as discussion of some matters deemed secondary to the main ideas of the paper.

2. An Analytic Development of Stock–Yogo

Consider the simple model
$$y = x\beta + u, \tag{1}$$
where $y = [y_1, \ldots, y_n]'$, $x = [x_1, \ldots, x_n]'$ and $u = [u_1, \ldots, u_n]'$ are $n \times 1$ vectors, with $n$ the number of observations. The regressor $x$ is assumed endogenous, so that $E[u \mid x] \neq 0$. Other exogenous regressors in the model, including the constant, have been partialled out.
We can implicitly define a set of instruments via the following linear projection
$$x = Z\pi + v, \tag{2}$$
where $Z$ is an $n \times k_z$ matrix of instruments (with full column rank), $\pi$ a $k_z \times 1$ vector of parameters and $v$ is an $n \times 1$ error vector. In this model, $k_z - 1$ is the degree of over-identification. We assume that individual observations are independently and identically distributed, and
$$(u_i, v_i)' \mid z_i \sim (0, \Sigma), \quad \text{with } \Sigma = \begin{bmatrix} \sigma_u^2 & \sigma_{uv} \\ \sigma_{uv} & \sigma_v^2 \end{bmatrix}, \quad i = 1, 2, \ldots, n,$$
where $z_i'$ denotes the $i$th row of $Z$. A test for $H_0: \pi = 0$ against $H_1: \pi \neq 0$ is the so-called first-stage F-statistic
$$F = \frac{\hat{\pi}' Z'Z \hat{\pi}}{k_z \hat{\sigma}_v^2} \xrightarrow[H_0]{d} \frac{\chi^2_{k_z}}{k_z},$$
where $\hat{\pi} = (Z'Z)^{-1} Z'x$ and $\hat{\sigma}_v^2 = n^{-1} x'(I_n - Z(Z'Z)^{-1}Z')x$. Here, a large value of the statistic is evidence against the null hypothesis, which is that the nominated instruments are irrelevant.
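As an illustration of the definitions above, the following is a minimal sketch of how the first-stage F-statistic might be computed, assuming NumPy is available. The function name `first_stage_f` and the simulated data are ours and are purely illustrative; as in the text, any other exogenous regressors are assumed to have been partialled out already.

```python
import numpy as np


def first_stage_f(x, Z):
    """First-stage F-statistic for H0: pi = 0, following the definition in the text.

    x : (n,) endogenous regressor, Z : (n, k_z) instrument matrix.
    """
    n, k_z = Z.shape
    pi_hat, *_ = np.linalg.lstsq(Z, x, rcond=None)    # (Z'Z)^{-1} Z'x
    fitted = Z @ pi_hat                               # Z pi_hat
    resid = x - fitted                                # (I - P_Z) x
    sigma2_v_hat = resid @ resid / n                  # n^{-1} x'(I - P_Z) x
    return (fitted @ fitted) / (k_z * sigma2_v_hat)   # pi_hat'Z'Z pi_hat / (k_z sigma2)


# Illustrative data only (hypothetical design, not from the paper).
rng = np.random.default_rng(0)
n, k_z = 500, 3
Z = rng.standard_normal((n, k_z))
x = Z @ np.full(k_z, 0.2) + rng.standard_normal(n)
print(first_stage_f(x, Z))
```

Note that the variance estimator divides by $n$, as in the text, rather than by a degrees-of-freedom corrected quantity.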
Following Staiger and Stock (1997), we consider values of $\pi$ local to zero, as $\pi = c/\sqrt{n}$. We then obtain for the concentration parameter $\mu_n^2$,
$$\mu_n^2 = \frac{\pi' Z'Z \pi}{\sigma_v^2} \xrightarrow{p} \frac{c' Q_{zz} c}{\sigma_v^2} \equiv \mu^2,$$
where $Q_{zz} = E[z_i z_i'] = \operatorname{plim}_{n \to \infty} n^{-1} Z'Z$ is positive definite by assumption. We see that $k_z F$ is a sample analogue of $\mu^2$. With this formulation, the testing problem previously discussed is equivalent to that of testing $H_0: \mu^2 = 0$ against $H_1: \mu^2 > 0$. Rather than testing for the irrelevance of instruments, SY characterised weak instruments as a situation where $\mu^2$ was greater than zero but proximate to it. Specifically, their testing problem can be thought of as $H_0: \mu^2 = \mu_0^2 > 0$ against $H_1: \mu^2 > \mu_0^2$, for some suitably specified value of $\mu_0^2$. The statistic $F$ is still a natural one in this problem; although, of course, the null distribution is no longer the central distribution associated with $\mu_0^2 = 0$. Instead, we have
$$F \xrightarrow[H_0]{d} \frac{\chi^2_{k_z, \mu_0^2}}{k_z}, \tag{4}$$
where $\chi^2_{k, \delta}$ denotes a random variable following a non-central chi-squared distribution with $k$ degrees of freedom and non-centrality parameter $\delta \geq 0$.4 Let
$$\chi_\alpha = \chi^2_{k_z, \mu_0^2}(1 - \alpha)$$
denote the $(1 - \alpha)100$th quantile of a non-central chi-squared distribution with $k_z$ degrees of freedom and non-centrality parameter $\mu_0^2$. Then, the relevant size $\alpha$ critical region is
$$\left\{ F : F > cv_\alpha = \frac{\chi_\alpha}{k_z} \right\}, \tag{5}$$
where $\chi_\alpha$ can be obtained, for given $\mu_0^2$ and $k_z$, as the solution to either of the equations
$$\begin{aligned} 1 - \alpha &= e^{-\mu^2/2} \sum_{j=0}^{\infty} \frac{(\mu^2/2)^j}{2^{k_z/2 + j}\, j!\, \Gamma\left(\frac{k_z}{2} + j\right)} \int_0^{\chi_\alpha} e^{-s/2} s^{k_z/2 + j - 1}\, ds \\ &= e^{-\mu^2/2} e^{-\chi_\alpha/2} \left(\frac{\chi_\alpha}{2}\right)^{k_z/2} \sum_{j=0}^{\infty} \frac{(\chi_\alpha \mu^2/4)^j}{j!\, \Gamma\left(\frac{k_z + 2}{2} + j\right)}\, {}_1F_1\left(1; \frac{k_z + 2}{2} + j; \frac{\chi_\alpha}{2}\right), \end{aligned} \tag{6}$$
and where ${}_1F_1(\cdot\,; \cdot\,; \cdot)$ denotes a confluent hypergeometric function (Slater 1960).
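In practice, the series in (6) need not be summed by hand, since many libraries provide the quantile function of the non-central chi-squared distribution directly. The sketch below assumes SciPy and uses `scipy.stats.ncx2`; the function name `critical_value` and the numerical inputs are ours and serve only as an illustration of (5).

```python
from scipy.stats import chi2, ncx2


def critical_value(mu0_sq, k_z, alpha=0.05):
    """cv_alpha = chi_alpha / k_z, with chi_alpha the (1 - alpha) quantile of a
    non-central chi-squared with k_z degrees of freedom and non-centrality mu0_sq."""
    if mu0_sq == 0.0:
        # Central case, which corresponds to testing irrelevant instruments.
        return chi2.ppf(1.0 - alpha, df=k_z) / k_z
    return ncx2.ppf(1.0 - alpha, df=k_z, nc=mu0_sq) / k_z


# Illustrative inputs only: k_z = 3 instruments and mu0_sq = 10.
print(critical_value(10.0, 3))
```

The choice of `mu0_sq` is the remaining ingredient and is the subject of the discussion that follows.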
The aspect of the SY approach that remains outstanding is the choice of μ 0 2 . Their quantitative definition of the weakness of a set of instruments is couched in terms of the impact that it has on inference. They provided two possible definitions that variously reflect the known consequences of weak instruments for (i) estimation, through the bias of the estimator, and (ii) hypothesis testing, through the size of a particular Wald test relative to its nominal size. It is the former that is in most common use and the approach of interest here.5
In particular, SY relate the bias of the 2SLS estimator of $\beta$, $\hat{\beta}_{2SLS}$, relative to that of the ordinary least squares estimator of $\beta$, $\hat{\beta}_{OLS}$, to the first-stage F-statistic by showing that they are both related to $\mu^2$. A value for $\mu^2$, denoted $\mu_0^2$, is then chosen to allow a certain level of relative bias. Specifically, let $B_n$ denote the relative bias for a given sample size $n$, i.e.,
$$B_n = \frac{E[\hat{\beta}_{2SLS,n}] - \beta}{E[\hat{\beta}_{OLS,n}] - \beta}.$$
As discussed in Chao and Swanson (2007), if there exists a positive integer $N < \infty$ such that6
$$\sup_{n \geq N} E\left[ \left| \hat{\beta}_{2SLS,n} - \beta \right|^{1 + \delta} \right] < \infty \tag{7}$$
for some $\delta > 0$, then the limit of the sequence of finite sample biases will coincide with the bias computed from the local-to-zero asymptotic distribution. That is,
$$\lim_{n \to \infty} |B_n| = \left| E\left[ \frac{(\xi - \lambda_0)'\xi}{\xi'\xi} \right] \right| \equiv |B|, \tag{8}$$
where $\xi \sim N(\lambda_0, I_{k_z})$. The test for weak instruments then proceeds as follows:
  1. The practitioner chooses a value for $|B|$, e.g., $|B| = 0.1$, if an asymptotic relative bias of less than 10% is deemed acceptable.
  2. Given $k_z$ and $|B|$, $\mu_0^2 = \lambda_0'\lambda_0$ is obtained on solving (8).
  3. Given $\mu_0^2$, critical values for $F$ can be determined, which are proportional to those of the non-central chi-squared distribution as specified in (4).
  4. The null of weak instruments is then rejected for sufficiently large values of the first-stage F-statistic, and we conclude that $|B|$ is no larger than the value chosen in Step 1 above.
The difficulty in the procedure just described is that, at Step 2, there is an integral that must be evaluated as part of a search for $\mu_0^2$. SY do this using a 20,000 draw Monte Carlo simulation. This is unnecessary as the integral can be solved analytically. The result is summarised in the following theorem.
Theorem 1.
If $B$ is as defined in Equation (8), then, provided $k_z \geq 2$,
$$B = {}_1F_1\left(1; \frac{k_z}{2}; -\frac{\mu_0^2}{2}\right) > 0, \tag{9}$$
where, as noted following (6), ${}_1F_1(\cdot\,; \cdot\,; \cdot)$ denotes a confluent hypergeometric function.
Proof.
The result follows immediately from Theorem A1, in Appendix A, which establishes the equality, and from the observation that if $b \geq a > 0$ but $s < 0$ then $0 < {}_1F_1(a; b; s) \leq 1$, which establishes the inequality.7 □
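A quick numerical sanity check of Theorem 1 can be obtained by comparing the closed form in (9) with a direct simulation of the expectation in (8). The sketch below assumes NumPy and SciPy (`scipy.special.hyp1f1`); the function names, the number of draws and the parameter values are ours, and the simulation is not a reproduction of the SY design. It places all of the non-centrality in the first element of $\lambda_0$, which is harmless because the distribution of $\xi$ is spherical around $\lambda_0$, so the expectation depends on $\lambda_0$ only through $\lambda_0'\lambda_0$.

```python
import numpy as np
from scipy.special import hyp1f1


def relative_bias(mu0_sq, k_z):
    """Closed form of Theorem 1: B = 1F1(1; k_z/2; -mu0_sq/2), for k_z >= 2."""
    return hyp1f1(1.0, k_z / 2.0, -mu0_sq / 2.0)


def simulated_relative_bias(mu0_sq, k_z, draws=200_000, seed=0):
    """Monte Carlo estimate of E[(xi - lam)'xi / xi'xi] with xi ~ N(lam, I)."""
    rng = np.random.default_rng(seed)
    lam = np.zeros(k_z)
    lam[0] = np.sqrt(mu0_sq)                  # any lam with lam'lam = mu0_sq would do
    xi = lam + rng.standard_normal((draws, k_z))
    ratio = ((xi - lam) * xi).sum(axis=1) / (xi * xi).sum(axis=1)
    return ratio.mean()


# Illustrative values only.
print(relative_bias(5.0, 4), simulated_relative_bias(5.0, 4))
```

The two printed numbers should agree up to simulation noise, which is the source of the small discrepancies between simulated and analytical critical values discussed below.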
That Theorem 1 involves a confluent hypergeometric function is not surprising as they have long figured prominently in the finite sample literature; see, for example, Phillips (1980) and the papers cited therein. These functions have been very intensively studied in the mathematics literature over a period of hundreds of years and so an important consequence of Theorem 1 is that it allows the use of efficiently programmed intrinsic functions in readily available software, such as Matlab (MathWorks 2016), at each step of a search for $\mu_0^2$ rather than having to estimate an integral by simulation.8 For the special case of $k_z = 2$,
$${}_1F_1\left(1; \frac{k_z}{2}; -\frac{\mu_0^2}{2}\right) = \exp\left(-\frac{\mu_0^2}{2}\right),$$
making evaluation of the expression especially simple.
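The four-step procedure, with Step 2 carried out via Theorem 1, might be sketched as follows. This is an illustration under our own assumptions rather than the authors' implementation: it uses SciPy's `hyp1f1`, `brentq` and `ncx2` in place of the Matlab intrinsics mentioned in the text, and the function names and the bracketing strategy for the root search are ours.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import hyp1f1
from scipy.stats import ncx2


def relative_bias(mu0_sq, k_z):
    """Theorem 1: B = 1F1(1; k_z/2; -mu0_sq/2), valid for k_z >= 2."""
    return hyp1f1(1.0, k_z / 2.0, -mu0_sq / 2.0)


def solve_mu0_sq(B, k_z):
    """Step 2: find mu0_sq such that relative_bias(mu0_sq, k_z) = B, for 0 < B < 1."""
    if k_z == 2:
        return -2.0 * np.log(B)               # special case: B = exp(-mu0_sq / 2)
    hi = 1.0
    while relative_bias(hi, k_z) > B:         # B falls from 1 as mu0_sq grows
        hi *= 2.0
    return brentq(lambda m: relative_bias(m, k_z) - B, 0.0, hi)


def critical_value(B, k_z, alpha=0.05):
    """Step 3: non-central chi-squared quantile divided by k_z, as in (5)."""
    mu0_sq = solve_mu0_sq(B, k_z)
    return ncx2.ppf(1.0 - alpha, df=k_z, nc=mu0_sq) / k_z


# Step 1 (choose B) and Step 4 (compare F with the critical value), illustratively.
print(critical_value(0.10, 3))
```

A root finder is used because (9) has no closed-form inverse in $\mu_0^2$ for general $k_z$; for $k_z = 2$ the exponential form can be inverted directly.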
Using our result, we provide in Table 1 an extended version of that panel of SY Table 5.1 corresponding to a single endogenous variable, which is the set of critical values most commonly used. We note that SY start their tables at $k_z = 3$ even though, following the arguments of Kinal (1980), finite biases will exist for all $k_z \geq 2$ if one is prepared to make a normality assumption. As this is a practically relevant case, we include it in Table 1. However, such inclusion is not without controversy. The mode of convergence leading to (8) is convergence in distribution. Existence of moments is not sufficient to imply convergence in expectation, which is a stronger result (see (7) and the accompanying discussion). Heuristically, (7) might be interpreted as meaning that a little more than simply the existence of the moments of the estimators is required for the sequence of biases to converge to the local-to-zero asymptotic results, and so this might be achieved by requiring $k_z \geq 3$ in the case of a single endogenous regressor rather than just $k_z \geq 2$, although we note that (7) does not actually say this. In any event, the inclusion in Table 1 of the row $k_z = 2$ may be viewed as something of an ad hoc approximation. Some confidence in the value of the approximation may be garnered from the simulation results presented in Section 4.
Where Table 1 overlaps with SY (Table 5.1), we are able to provide an indication of the difference made by the analytical evaluation of the expectation in (8). As shown in Table 2, the differences are typically small, with the largest differences when $k_z$ and $B$ are themselves small, which we would argue is the most important case in practice.9

3. Some Further Consequences of Theorem 1

Theorem 1 allows us to prove a variety of further results that can only be speculated about on the basis of simulation results.
Remark 1.
Implicit in Theorem 1 is the observation that, whenever $k_z \geq 2$, OLS and 2SLS are always asymptotically biased in the same direction, making the absolute value function of $B$ in (8) redundant.
Remark 2.
The values of the limiting relative biases of $\hat{\beta}_{2SLS}$ and $\hat{\beta}_{OLS}$ are explored in Figure 1 for different values of the parameters $k_z$ and $-\mu^2/2$. The figure illustrates that, for $k_z \geq 2$, the function is increasing in its argument, which is $-\mu^2/2$. Note also that, as $\mu^2 \to 0$, the information in the instruments approaches zero, and so the local-to-zero asymptotic bias of $\hat{\beta}_{2SLS}$ approaches that of $\hat{\beta}_{OLS}$ from below. Hence, the limit of the relative asymptotic biases at $\mu^2 = 0$ is unity, which is the value of ${}_1F_1\left(1; \frac{k_z}{2}; 0\right)$.
Remark 3.
Certain patterns in Table 1 are readily established, as illustrated by the following result.10
Theorem 2.
The critical values $cv_\alpha$ are decreasing functions of $B$ for given $k_z$.
Proof. 
See Appendix B.2. □
Heuristically, Theorem 2 states that the critical values will necessarily decrease as one moves from left to right across any given row of Table 1; that is, the critical values decrease as the practitioner is willing to accept increasing amounts of 2SLS bias relative to that of OLS. The intuition behind the results is as follows. An increase in $B$ for fixed $k_z$ implies that the argument of the confluent hypergeometric function in (9) must increase, i.e., that $\mu^2/2$ must decrease. As $\mu^2$ approaches zero, the non-central chi-squared distribution from which critical values are drawn approaches a central chi-squared and the corresponding quantiles become smaller. Hence, as one moves across columns from left to right in Table 1, the $cv_\alpha$ become smaller.
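The row behaviour described by Theorem 2 is easy to see numerically in the $k_z = 2$ case, where the special case of Theorem 1 gives $\mu_0^2 = -2\ln B$ in closed form. The sketch below assumes SciPy; the function name `cv_kz2` is ours and the bias levels are those used in the Monte Carlo section. It is only an illustration of the monotonicity, not a reproduction of Table 1, and the caveats in the text about the $k_z = 2$ row apply.

```python
import numpy as np
from scipy.stats import ncx2


def cv_kz2(B, alpha=0.05):
    """Critical value for k_z = 2, using B = exp(-mu0_sq / 2) to recover mu0_sq."""
    mu0_sq = -2.0 * np.log(B)
    return ncx2.ppf(1.0 - alpha, df=2, nc=mu0_sq) / 2.0


bias_levels = (0.01, 0.05, 0.10, 0.20)
cvs = [cv_kz2(B) for B in bias_levels]
for B, cv in zip(bias_levels, cvs):
    print(f"B = {B:.2f}: cv = {cv:.2f}")
```

Larger tolerated bias maps to a smaller non-centrality parameter and hence to a smaller quantile, which is the mechanism described above.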
Theorem 2 explains the row behaviour of Table 1. Explaining the column behaviour is much more complicated. Observation suggests the following to be true.
Conjecture 1.
For given $B$, the critical values $cv_\alpha$, presented in Table 1, are increasing functions of $k_z$ up to some value, $k^*$ say, whereafter they are decreasing functions of $k_z$. $k^*$ is a decreasing function of $B$.
Some intuition for Conjecture 1 is available from the definition of $cv_\alpha$, see (5), if one considers the impact of increasing the number of instruments by one, from $k_z$ to $k_z + 1$, with superscripts ‘0’ and ‘1’ distinguishing the two cases, respectively. For given $B$ and $\alpha$,
$$cv_\alpha^1 - cv_\alpha^0 \gtreqless 0 \quad \text{as} \quad \frac{\chi_\alpha^1 - \chi_\alpha^0}{\chi_\alpha^0} \gtreqless \frac{1}{k_z}.$$
$k^*$ is then that value of $k_z$ after which the $cv_\alpha$ start diminishing.
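The comparison underlying Conjecture 1 can be explored numerically. The sketch below, which assumes SciPy and uses our own function name `chi_alpha`, computes the quantile $\chi_\alpha$ for successive numbers of instruments and reports whether the critical value rises or falls when one instrument is added. It offers numerical exploration only and is not a proof of the conjecture.

```python
from scipy.optimize import brentq
from scipy.special import hyp1f1
from scipy.stats import ncx2


def chi_alpha(B, k_z, alpha=0.05):
    """Quantile chi_alpha for relative bias B (0 < B < 1) and k_z >= 2 instruments."""
    f = lambda m: hyp1f1(1.0, k_z / 2.0, -m / 2.0) - B   # Theorem 1 minus target
    hi = 1.0
    while f(hi) > 0:                                      # expand until B is bracketed
        hi *= 2.0
    mu0_sq = brentq(f, 0.0, hi)
    return ncx2.ppf(1.0 - alpha, df=k_z, nc=mu0_sq)


B = 0.10
for k_z in range(2, 20):
    chi0, chi1 = chi_alpha(B, k_z), chi_alpha(B, k_z + 1)
    # cv rises iff the proportional growth in chi_alpha exceeds 1 / k_z.
    rising = (chi1 - chi0) / chi0 > 1.0 / k_z
    print(k_z, round(chi0 / k_z, 2), "rising" if rising else "falling")
```

The condition in the loop is just a rearrangement of $\chi_\alpha^1/(k_z + 1) > \chi_\alpha^0/k_z$.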
Remark 4.
Although $B$ does not exist when $k_z = 1$, the confluent hypergeometric function of Theorem 1 remains well-defined. In Appendix D, we analyse the properties of ${}_1F_1\left(1; \frac{1}{2}; -\frac{\mu^2}{2}\right)$.

4. Some Monte Carlo Results

We follow Sanderson and Windmeijer (2016) and specify the model as in (1) and (2), with $\beta = 1$ and
$$\begin{pmatrix} u_i \\ v_i \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix} \right).$$
The instruments in $Z$ are $k_z$ independent standard normally distributed random variables and $\pi = c_{Bk_z} \iota_{k_z} / \sqrt{n}$, where $\iota_{k_z}$ is a $k_z$ vector of ones, and with $c_{Bk_z}$ chosen such that the relative bias $B$ is equal to 0.01, 0.05, 0.10 or 0.20, for values of $k_z = 3$ and $k_z = 2$. The sample size is $n = 10{,}000$ and the results are presented in Table 3 for 100,000 Monte Carlo replications.
For $k_z = 3$, the results are exactly in line with the theory: the Monte Carlo relative biases are equal to $B$ and the rejection frequencies of the first-stage F-test are 5% at the 5% nominal level, using the critical values reported in Table 1.
The results for $k_z = 2$ are also in line with the theory, although we see here that the standard deviations of $\hat{\beta}_{2SLS}$ are much larger than those of the $k_z = 3$ case at the same values of $B$. This is due to the fact that the information needed to obtain the same relative bias is much smaller for the $k_z = 2$ case than for the $k_z = 3$ case, as reflected by their smaller $\mu_0^2 / k_z$ values, but it also reflects the problem that the second moment does not exist when the degree of over-identification is equal to 1. The interquartile ranges for the 2SLS estimator when $k_z = 2$ are 0.3296, 0.4170, 0.4811 and 0.5570 for $B$ = 0.01, 0.05, 0.10 and 0.20, respectively. These Monte Carlo results therefore confirm our theoretical findings for the $k_z = 2$ case. Clearly some caution should be exercised when working with 2SLS in this case because it possesses no second moment.
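A much scaled-down version of this experiment might look as follows. This is a sketch under our own assumptions, not the code used for Table 3: the sample size and number of replications are far smaller than the $n = 10{,}000$ and 100,000 replications reported above, the function names are ours, and the constant $c_{Bk_z}$ is recovered from $\mu_0^2 = c_{Bk_z}^2 k_z$, which follows from $Q_{zz} = I_{k_z}$ and $\sigma_v^2 = 1$ in this design.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import hyp1f1
from scipy.stats import ncx2


def solve_mu0_sq(B, k_z):
    """Invert Theorem 1 for mu0_sq (0 < B < 1, k_z >= 2)."""
    f = lambda m: hyp1f1(1.0, k_z / 2.0, -m / 2.0) - B
    hi = 1.0
    while f(hi) > 0:
        hi *= 2.0
    return brentq(f, 0.0, hi)


def simulate(B=0.10, k_z=3, n=1_000, reps=2_000, alpha=0.05, seed=0):
    """Return (Monte Carlo relative bias, rejection frequency of the F-test)."""
    rng = np.random.default_rng(seed)
    mu0_sq = solve_mu0_sq(B, k_z)
    c = np.sqrt(mu0_sq / k_z)                      # mu^2 = c^2 * k_z in this design
    pi = np.full(k_z, c / np.sqrt(n))              # pi = c * iota / sqrt(n)
    cv = ncx2.ppf(1.0 - alpha, df=k_z, nc=mu0_sq) / k_z
    chol = np.linalg.cholesky(np.array([[1.0, 0.5], [0.5, 1.0]]))
    b_2sls, b_ols, F = np.empty(reps), np.empty(reps), np.empty(reps)
    for r in range(reps):
        Z = rng.standard_normal((n, k_z))
        u, v = (rng.standard_normal((n, 2)) @ chol.T).T
        x = Z @ pi + v
        y = x + u                                  # beta = 1
        x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
        resid = x - x_hat
        F[r] = (x_hat @ x_hat) / (k_z * (resid @ resid) / n)
        b_2sls[r] = (x_hat @ y) / (x_hat @ x)
        b_ols[r] = (x @ y) / (x @ x)
    rel_bias = (b_2sls.mean() - 1.0) / (b_ols.mean() - 1.0)
    return rel_bias, (F > cv).mean()


print(simulate())
```

With so few replications the estimated relative bias is noisy, particularly given the heavy tails of the 2SLS estimator, so only the rejection frequency should be expected to sit close to its nominal level.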

5. p-Values

p-values are readily available as a straightforward extension of our earlier analysis. Specifically, from (4), we have the limiting result
$$k_z \times F \xrightarrow[H_0]{d} \chi^2_{k_z, \mu_0^2}.$$
For any particular sample value of the F-test, say $\hat{F}$, if $X \sim \chi^2_{k_z, \mu_0^2}$, then the p-value for the SY weak instruments test considered in this paper is simply $\Pr[X \geq k_z \times \hat{F}]$. Of course, the problem here is the determination of $\mu_0^2$. Table 4 reports those values of $\mu_0^2 / k_z$ that were calculated in order to construct Table 1. For those values of $B$ considered in Table 1, we now have the parameters $k_z$ and $\mu_0^2 / k_z$. Consequently, any computer software that can evaluate a non-central chi-squared cdf can readily calculate p-values for the test for weak instruments considered here.
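For instance, with SciPy the calculation reduces to a single call to the survival function of the non-central chi-squared distribution. In the sketch below the function name `weak_iv_p_value` is ours, and the numerical inputs are purely illustrative placeholders rather than entries of Table 4.

```python
from scipy.stats import ncx2


def weak_iv_p_value(F_hat, k_z, mu0_sq_over_kz):
    """p-value Pr[X >= k_z * F_hat], with X non-central chi-squared(k_z, mu0_sq).

    mu0_sq_over_kz is the tabulated quantity mu0_sq / k_z for the chosen B.
    """
    mu0_sq = k_z * mu0_sq_over_kz
    return ncx2.sf(k_z * F_hat, df=k_z, nc=mu0_sq)


# Illustrative placeholders: F_hat = 12.0, k_z = 3, mu0_sq / k_z = 3.0.
print(weak_iv_p_value(12.0, 3, 3.0))
```

Using the survival function rather than one minus the cdf avoids loss of precision when the p-value is very small.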

6. Multiple Endogenous Regressors

The general case is obtained by defining $x$ in (1) to be a matrix of endogenous regressors of dimension $n \times k_x$, say, so that $\beta$ is $k_x \times 1$. Then, (2) becomes a multivariate regression model with $\pi$ of dimension $k_z \times k_x$ and $v$ of dimension $n \times k_x$. The rows of $[u, v]$ are independent with common covariance matrix $\Sigma$. All other aspects of the model described in (1) and (2) remain essentially unchanged.11
In terms of the SY analysis, it clearly makes no sense to proceed in terms of the ratio
$$\frac{E[\hat{\beta}_{2SLS}] - \beta}{E[\hat{\beta}_{OLS}] - \beta},$$
given that $\beta$ is now a vector rather than a scalar quantity. Thus, instead, they focus attention on the quadratic form (SY, Equation 3.8)
$$B^2 = \frac{\rho' h' h \rho}{\rho' \rho} = \gamma' h' h \gamma,$$
where $h$ is a matrix analogue of the expectation explored in Theorem 1 and $\gamma = \rho (\rho'\rho)^{-1/2}$, with $\rho$ defined in SY (p. 85). The essential feature of $\gamma$ is that it is not a function of $\lambda$. Despite being somewhat more tractable, the quadratic form tells us no more about the non-centrality parameter determining the bias of the 2SLS estimator than does
$$h\gamma = E\left[ (\xi'\xi)^{-1} \xi' (\xi - \lambda) \right] \gamma = \gamma - J,$$
say, where the elements of the $k_z \times k_x$ matrix $\xi$ are jointly distributed according to
$$\operatorname{vec} \xi \sim N(\operatorname{vec} \lambda, I_{k_z \times k_x}),$$
with $\lambda$ as defined in SY (p. 85).12 Again, we should stress that the normality of $\xi$ is a consequence of the local-to-zero asymptotic analysis and is not a strong distributional assumption. As $\gamma$ is independent of $\lambda$, it is sufficient to focus attention on the vector of expectations $J$. If we let $e_i'$ denote the $i$th row of the identity matrix $I_{k_x}$, then we have immediately13
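$$J = E\left[ (\xi'\xi)^{-1} \xi' \lambda \right] \gamma = \operatorname{etr}\{-\Lambda\} \sum_{j=0}^{\infty} \sum_{\phi} \sum_{\psi \in \phi \cdot [1]} \frac{\left(\frac{k_z}{2}\right)_{\phi}}{j!\, \left(\frac{k_z}{2}\right)_{\psi}}\, \theta_{\psi}^{\phi,1}\, C(\Lambda, \gamma, k_z), \tag{14}$$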
where the $i$th element of the $k_x \times 1$ vector $C(\Lambda, \gamma, k_z)$ is $C_{\psi}^{\phi,1}(k_z \Lambda, k_z \Lambda \gamma e_i')$, with $\Lambda = \lambda'\lambda / (2 k_z)$.14 Here, $\phi$ denotes ordered partitions of $j$ into no more than $k_x$ parts, so that $\phi = (\phi_1, \phi_2, \ldots, \phi_p)$ where the integers $\phi_1, \phi_2, \ldots, \phi_p$ satisfy the restrictions (i) $\phi_1 \geq \phi_2 \geq \cdots \geq \phi_p$, (ii) $\sum_{i=1}^{p} \phi_i = j$, and (iii) $p \leq k_x$. The symbol $\sum_{\phi}$ then denotes the sum over all such partitions of $j$. For example, the ordered partitions of 2 are the so-called top partition $[2]$ and $(1, 1)$.15 The invariant polynomials $C_{\psi}^{\phi,1}(\cdot, \cdot)$ and the symbol $\psi \in \phi \cdot [1]$ are defined in Davis (1979), p. 465, with $\theta_{\psi}^{\phi,1} = C_{\psi}^{\phi,1}(I_{k_z}, I_{k_z}) / C_{\psi}(I_{k_z})$ a constant that may be zero. These were developed as extensions (to two matrix arguments) of the zonal polynomials $C_{\psi}(\cdot)$ originally due to James (1961).16 Finally, the generalised hypergeometric coefficients are defined as (Constantine 1963, Equation (26))
$$(a)_{\kappa} = \prod_{i=1}^{m} \left( a - \tfrac{1}{2}(i - 1) \right)_{k_i}, \quad \kappa = (k_1, \ldots, k_m),$$
where $(b)_n = b(b + 1) \cdots (b + n - 1)$, $(b)_0 = 1$ is the usual Pochhammer symbol or forward factorial (see Slater 1966, Appendix I).
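While the invariant polynomials themselves are not readily computable, the combinatorial ingredients just described are simple. The following sketch, using only the Python standard library and our own function names, enumerates the ordered partitions of $j$ into at most a given number of parts and evaluates the Pochhammer symbol and the generalised hypergeometric coefficient as defined above. It does not attempt to evaluate (14).

```python
from math import prod


def pochhammer(b, n):
    """(b)_n = b (b + 1) ... (b + n - 1), with (b)_0 = 1 (empty product)."""
    return prod(b + i for i in range(n))


def gen_hyp_coeff(a, kappa):
    """(a)_kappa = prod_i (a - (i - 1)/2)_{k_i} for kappa = (k_1, ..., k_m)."""
    # enumerate starts at 0, which plays the role of (i - 1) in the formula.
    return prod(pochhammer(a - 0.5 * i, k) for i, k in enumerate(kappa))


def partitions(j, max_parts):
    """Yield partitions of j into at most max_parts non-increasing positive parts."""
    def rec(remaining, largest, parts_left):
        if remaining == 0:
            yield ()
            return
        if parts_left == 0:
            return
        for first in range(min(remaining, largest), 0, -1):
            for rest in rec(remaining - first, first, parts_left - 1):
                yield (first,) + rest
    yield from rec(j, j, max_parts)


print(list(partitions(2, 2)))   # the top partition (2,) and (1, 1)
print([gen_hyp_coeff(1.5, phi) for phi in partitions(2, 2)])
```

The number of partitions grows quickly with $j$, which gives some feel for why series such as (14) are expensive even before the polynomials are evaluated.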
Expressions like (14) are computationally problematic. The available evidence suggests that the series are typically slow to converge (Phillips 1983a, 1983b). Unfortunately, the invariant polynomials of Davis are tabulated only to low order and, to date, no algorithms have been derived for their computation.17 Consequently, until such time as the computational restrictions are lifted, the practical relevance of the result is limited and is offered here only for completeness. Better progress might be made working with one of the various approximation techniques that are available, e.g., the Laplace approximation used by Phillips (1983b) to extract approximate marginal distributions for IV estimators in this more general setting. In this case, however, we can further adapt the results of Hillier et al. (1984) to obtain many instrument approximations to (14). Specifically, analogous to their Equations (32) and (33), we have
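$$J_i = e_i'\gamma - (k_z - k_x - 1)\, e_i' (\lambda'\lambda)^{-1} \gamma + O(k_z^{-2}), \tag{15}$$
and
$$J_i = e_i'\gamma - (k_z - k_x - 1) \left[ 1 + \operatorname{tr}\{(\lambda'\lambda)^{-1}\} \right] e_i' (\lambda'\lambda)^{-1} \gamma + (k_z - k_x - 1)(k_z - k_x - 2)\, e_i' (\lambda'\lambda)^{-2} \gamma + O(k_z^{-3}), \tag{16}$$
where $J_i$ denotes the $i$th element of $J$.18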
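Dropping the remainder terms, the two approximations are straightforward to compute. The sketch below assumes NumPy and SciPy, with function names of our own choosing, and returns the whole vector of approximations rather than one element at a time. As a check, it compares them in the scalar case ($k_x = 1$, $\gamma = 1$) with $1 - B$ from Theorem 1, which is what $h\gamma = \gamma - J$ implies for $J$ in that case; the parameter values are illustrative only.

```python
import numpy as np
from scipy.special import hyp1f1


def J_first_order(lam, gamma):
    """First-order many-instrument approximation, remainder term omitted."""
    k_z, k_x = lam.shape
    A_inv = np.linalg.inv(lam.T @ lam)              # (lam'lam)^{-1}
    return gamma - (k_z - k_x - 1) * A_inv @ gamma


def J_second_order(lam, gamma):
    """Second-order many-instrument approximation, remainder term omitted."""
    k_z, k_x = lam.shape
    A_inv = np.linalg.inv(lam.T @ lam)
    m = k_z - k_x - 1
    return (gamma
            - m * (1.0 + np.trace(A_inv)) * A_inv @ gamma
            + m * (m - 1) * A_inv @ A_inv @ gamma)  # (m - 1) = k_z - k_x - 2


# Scalar illustration: k_z = 6, k_x = 1, lam'lam = mu^2 = 60.
k_z, mu_sq = 6, 60.0
lam = np.zeros((k_z, 1))
lam[0, 0] = np.sqrt(mu_sq)
gamma = np.array([1.0])
exact = 1.0 - hyp1f1(1.0, k_z / 2.0, -mu_sq / 2.0)
print(J_first_order(lam, gamma)[0], J_second_order(lam, gamma)[0], exact)
```

In this particular scalar example the second-order approximation is essentially exact, which is a feature of the chosen $k_z$ and should not be expected in general.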
Although these approximations are operational, in the presence of multiple endogenous regressors, we question whether the approach of SY is as sound as it is in the single endogenous regressor case. Our concern is rooted in the structure of the Davis polynomials themselves. For matrix arguments, $X$ and $Y$ with given indices $k$ and $l$, respectively, the polynomials are ‘linear combinations of the distinct products of traces
$$(\operatorname{tr} X^{a_1} Y^{b_1} X^{c_1} \cdots)^{r_1} (\operatorname{tr} X^{a_2} Y^{b_2} X^{c_2} \cdots)^{r_2} \cdots$$
of total degree $k$, $l$ in the elements of $X$, $Y$, respectively’ (Davis 1979, p. 468). It is immediately apparent that the local-to-zero asymptotic expression for the bias of 2SLS is not a function of the eigenvalues of the concentration parameter or, at least, not a function of them alone. Stock and Yogo (2005), p. 90, remark on this themselves when discussing certain numerical results, where they observe that 2SLS bias is decreasing in all eigenvalues of the concentration parameter for all values of (what we call) $k_z$. To focus on the smallest eigenvalue, as the Cragg–Donald statistic does, is, consequently, problematic. Another way of thinking of this problem is to consider the problem of determining the magnitude of a matrix and to ask which of the following three matrices is either largest or smallest:
$$\operatorname{diag}(1, 2, 3), \quad \operatorname{diag}(2, 2, 2), \quad \operatorname{diag}(0, 0, 6).$$
While consideration of the smallest eigenvalue will lead to a particular choice, it is not clear that that choice actually has that much to do with the exact behaviour of the IV estimator, except in the scalar case. Indeed, for this reason, the SY results for multiple endogenous variable cases are only approximate and provide upper bounds on critical values for the Cragg–Donald minimum eigenvalue test. That the SY approach works in the case of a single endogenous regressor is a consequence of the commutative law of multiplication that allows us to extract scalars from products in ways that we cannot in more general matrix situations. In summary, the SY approach results in a well-posed problem in the case of a single endogenous regressor but a somewhat poorly-posed problem in the case of multiple endogenous regressors.
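The ambiguity in ranking the three matrices can be made concrete by computing a few scalar summaries. The sketch below assumes NumPy; the choice of summaries (smallest and largest eigenvalue, trace and determinant) is ours and is meant only to illustrate that different notions of matrix size order the matrices differently.

```python
import numpy as np

matrices = {
    "diag(1, 2, 3)": np.diag([1.0, 2.0, 3.0]),
    "diag(2, 2, 2)": np.diag([2.0, 2.0, 2.0]),
    "diag(0, 0, 6)": np.diag([0.0, 0.0, 6.0]),
}

summaries = {}
for name, M in matrices.items():
    eig = np.linalg.eigvalsh(M)                    # eigenvalues of a symmetric matrix
    summaries[name] = {
        "min_eig": eig.min(),
        "max_eig": eig.max(),
        "trace": np.trace(M),
        "det": np.linalg.det(M),
    }
    print(name, summaries[name])
```

All three matrices share the same trace, yet each of them is the largest under one of the other summaries and the third is smallest by both the minimum eigenvalue and the determinant.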
If one must deal with multiple endogenous regressors, then we prefer the approach of Sanderson and Windmeijer (2016), who define weak identification as the rank of $\Pi$ being local to a rank reduction of one, which is essentially a scalar problem. By reducing the problem in this way, the approach is reduced to a well-posed problem for which the single endogenous variable results apply; only the degrees of freedom need to be adjusted for the number of endogenous variables (see Sanderson and Windmeijer (2016) for details).

7. The Wisdom of Hindsight: Some Historical Remarks

As discussed in Footnote 7, helpful referees drew our attention to the fact that there are other ways that we might have approached this problem than the one that we initially chose. Indeed, once one recognises the structure of the problem, many results become available. However, before discussing some of these, we will again stress that the results derived in the previous sections are all asymptotic in nature; there are no underlying exact distributional assumptions beyond those originally made in Staiger and Stock (1997). What these results share with the exact distribution literature is integrals with similar structures and a common approach to resolving them. Furthermore, the parameterisations adopted here are different from the canonical forms underlying the exact distributional results and so the resultant expressions are different even if their structures are reminiscent of earlier results. A prime example of this is given by the similarities between the many-instrument, local-to-zero approximations of Equations (15) and (16) and the large-sample approximations of Hillier et al. (1984, Equations (32) and (33)).
It was noted in Footnote 7 that Chao and Swanson (2007) had derived local-to-zero asymptotic expressions for the bias of both 2SLS and OLS in the scalar case, so that they might bias correct the estimators, and that these provide an alternative path to Theorem 1. If moments were the focus of our attention, then we should note that the results of Chao and Swanson (2007) have the same structure as do those of Richardson (1968) and references cited therein, who first derived moments in the scalar case with arbitrary numbers of instruments and proved Basmann’s conjecture (Richardson 1968, Section 4.3) for the existence of moments. In the general case, we should, of course, be looking to Hillier et al. (1984) for results with the same structure as presented here and to Kinal (1980), who established existence criteria in the general case. Noting that the distributions of interest in the exact finite sample literature are different from the distributions thrown up by the local-to-zero asymptotics, with different parameters, it might be argued that the exact results for misspecified models are closer in spirit to what we have here. In this event, we might argue that results for moments are implicit (but unrecognisable) in Hale et al. (1980) or, in a far more recognisable form, in Knight (1982), for the scalar case, or Skeels (1995) in the general case. However, here moments are only of interest as an intermediate result to obtain the relative bias that is used to obtain a non-centrality parameter that determines the non-central chi-squared distribution of interest in Step 3 of the procedure given following (8). Perhaps of greater historical interest, in the scalar case, are the results of Richardson and Wu (1971), who explore the properties of the relative bias. However, these results are of limited interest for two reasons. First, and probably most important, because the parameterisation of their model differs from that generated by the local-to-zero asymptotics, their tabulated results are not comparable with what is done here. Second, to reiterate, the relative bias is only of interest to us inasmuch as it allows us, given certain other information, to choose a non-centrality parameter useful in determining SY critical values in the scalar case. That is, unlike for all of the above-mentioned papers, moments are only of tangential interest to us, as this is not a study of the estimators themselves. To the best of our knowledge, Section 6 provides the first treatment of the (local-to-zero asymptotic) relative bias in the general case.

8. Conclusions

The main contribution of this paper has been to resolve analytically an integral as a special function, obviating the need to resolve it by simulation. This integral is of independent interest in the theory of ratios of quadratic forms in normal variables. Here, it is of primary interest because it provides a functional relationship between the bias in the 2SLS estimator and the limiting sampling distribution of a test statistic that SY proposed for testing the presence of weak instruments, when the null of weak instruments is true. Analysis of this special function provides theoretical foundations for the remarks of Section 3, which explore patterns observed in Table 1 as the parameters $B$ and $k_z$ vary. This analysis required the derivation of certain results that are of independent interest in the theory of confluent hypergeometric functions. We have also explored the problem of p-values of the aforementioned test for weak instruments, on the basis of our earlier theoretical developments. We provide information such that any computer software that can evaluate a non-central chi-squared cdf can readily compute p-values in essentially all practical circumstances. The final contribution of this paper has been the analysis of the general case characterised by an arbitrary number of endogenous regressors. Here, we find that the analysis is able to draw heavily on the foundations laid down in the literature on exact sampling distributions. This allows us to provide expressions for both the expectation of interest and also first- and second-order many-instrument expansions of this expectation. The exact expression obtained for the integral of interest in the general case is not of great practical interest, as it involves invariant polynomials with two matrix arguments for which, at the time of writing, there exist no algorithms for their computation, except in special cases. Nevertheless, the asymptotic expansions obtained are readily computable and potentially of practical importance. Given our reservations about the usefulness of the overall procedure in the general case, we leave such explorations to others.
One aspect of the SY tables that we have not addressed relates to those tables based on size distortions of a Wald statistic. This is a much more difficult analytical problem than has been addressed here and it is not clear that there is much benefit in tackling it as, in our estimation, the bias tables are in much more frequent use, making them of greater practical relevance.
Finally, in support of the results presented in the paper, we provide two Matlab programs on an ‘as is’ basis. The first of these, Table1.m, provides the body of Table 1. The second program, entitled sypval.m, provides p-values. Appendix C provides some discussion on the contents of these programs. The programs are available at https://sites.google.com/site/skeelscv/.

Author Contributions

F.W. conceived, designed, and performed the simulation experiments; C.S. and F.W. wrote the paper.

Funding

Windmeijer acknowledges funding from the Medical Research Council (MC_UU_12013/9).

Acknowledgments

We happily acknowledge our intellectual debt to Peter Phillips, whose seminal contributions to the analysis of inference in simultaneous equations models are but a small part of his enormous contribution to the econometrics literature over a nearly 50-year period. We are particularly grateful to Grant Hillier for helping us with some technical details that form the basis of Appendix E. We also acknowledge Jon Temple, who provided extensive comments on an earlier draft of this paper, and Mark Schaffer for more recent comments. We would also like to thank the anonymous referees for their very constructive comments. The usual caveat applies. Skeels thanks the Department of Economics at the University of Bristol for their hospitality during his visits there, which is where this paper was first written. Finally, we would like to dedicate this paper to the memory of John Knight, a fine scholar and, far more importantly, one of nature’s true gentlemen, taken far too soon.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. The Expectation of a Particular Function of Normal Random Variables

Theorem A1.
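Suppose that $\xi \sim N(\lambda, I_k)$. Then,
$$E\left[\frac{(\xi-\lambda)'\xi}{\xi'\xi}\right] = \begin{cases} {}_1F_1\left(1; \dfrac{k}{2}; -\dfrac{\mu^2}{2}\right), & k \geq 2, \\ \text{diverges}, & k = 1, \end{cases}$$
where $\mu^2 = \lambda'\lambda$.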
Proof. 
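It is straightforward to demonstrate that the expectation is unbounded when $k = 1$ and so we shall assume hereafter that $k > 1$. Given the normality assumption on $\xi$, we can write
$$E\left[\frac{(\xi-\lambda)'\xi}{\xi'\xi}\right] = \frac{\exp\{-\lambda'\lambda/2\}}{(2\pi)^{k/2}} \int_{\mathbb{R}^k} \left(1 - \frac{\lambda'\xi}{\xi'\xi}\right) \exp\left\{-\frac{\xi'\xi}{2}\right\} \exp\{\lambda'\xi\}\, d\xi = I \ \text{(say)}.$$
In accordance with Herz (1955), Lemma 1.4, we can decompose almost all $k$-vectors $\xi$ into $\xi = h r^{1/2}$, where $h = \xi(\xi'\xi)^{-1/2}$, so that $h'h = 1$, and $r = \xi'\xi > 0$, with volume elements
$$d\xi = 2^{-1} r^{(k-2)/2}\, dh\, dr.$$
This is essentially a transformation to polar coordinates. The resulting expression is
$$I = \frac{\exp\{-\lambda'\lambda/2\}}{2(2\pi)^{k/2}} \int_{r>0} \exp\{-r/2\}\, r^{(k-2)/2} \times \left[\int_{h'h=1} \exp\{\lambda'h\, r^{1/2}\}\, dh - r^{-1} \int_{h'h=1} \lambda'h\, r^{1/2} \exp\{\lambda'h\, r^{1/2}\}\, dh\right] dr$$
almost everywhere. Next, write
$$\lambda'h\, r^{1/2} \exp\{\lambda'h\, r^{1/2}\} = \left.\frac{d \exp\{(1+t)\lambda'h\, r^{1/2}\}}{dt}\right|_{t=0}$$
and evaluate the integrals over $h'h = 1$ using Hillier et al. (1984), Equation (6):
$$\int_{h'h=1} \exp\{r^{1/2}\lambda'h\}\, dh = \frac{2\pi^{k/2}}{\Gamma\left(\frac{k}{2}\right)}\, {}_0F_1\left(\frac{k}{2}; \frac{\mu^2 r}{4}\right).$$
This yields, on replacing $\lambda'\lambda$ by $\mu^2$,
$$I = \frac{\exp\{-\mu^2/2\}}{2^{k/2}\,\Gamma\left(\frac{k}{2}\right)} \int_{r>0} \exp\left\{-\frac{r}{2}\right\} r^{(k-2)/2} \times \left[{}_0F_1\left(\frac{k}{2}; \frac{\mu^2 r}{4}\right) - r^{-1} \left.\frac{d}{dt}\, {}_0F_1\left(\frac{k}{2}; \frac{(1+t)^2 \mu^2 r}{4}\right)\right|_{t=0}\right] dr,$$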
where ${}_pF_q(a_1, \ldots, a_p; b_1, \ldots, b_q; \xi)$ denotes a generalised hypergeometric function. Finally, differentiating with respect to $t$, using say NIST (2015, Equation 16.3.1), evaluating the derivative at $t = 0$, and then resolving the resulting Laplace transforms using
$$\frac{1}{\Gamma\left(\frac{k}{2}\right)} \int_{r>0} \exp\left\{-\frac{r}{2}\right\} r^{(k-2)/2}\, {}_0F_1\left(b; \frac{\mu^2 r}{4}\right) dr = 2^{k/2}\, {}_1F_1\left(\frac{k}{2}; b; \frac{\mu^2}{2}\right), \qquad b \in \left\{\frac{k}{2}, \frac{k+2}{2}\right\},$$
yields
$$I = \exp\{-\mu^2/2\} \left[{}_1F_1\left(\frac{k}{2}; \frac{k}{2}; \frac{\mu^2}{2}\right) - \frac{\mu^2}{k}\, {}_1F_1\left(\frac{k}{2}; \frac{k+2}{2}; \frac{\mu^2}{2}\right)\right] = \exp\{-\mu^2/2\}\, {}_1F_1\left(\frac{k-2}{2}; \frac{k}{2}; \frac{\mu^2}{2}\right) = {}_1F_1\left(1; \frac{k}{2}; -\frac{\mu^2}{2}\right),$$
where the second to last equality exploits one of the relationships for contiguous confluent hypergeometric functions (NIST 2015, Equation 13.3.4) and the final equality is another application of Kummer’s transformation. □
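As an informal numerical check on Theorem A1, the following Python sketch compares the closed form with a simulated expectation. It is not part of the programs accompanying the paper: the function names (`hyp1f1`, `closed_form`, `monte_carlo`), the seed, and the number of draws are illustrative assumptions, and the confluent hypergeometric function is summed from its power series after applying Kummer’s transformation so that all terms are non-negative.

```python
import math
import random


def hyp1f1(a, b, x, tol=1e-15, max_terms=10_000):
    """Power series for 1F1(a; b; x); intended here for x >= 0 only."""
    term, total = 1.0, 1.0
    for n in range(max_terms):
        term *= (a + n) * x / ((b + n) * (n + 1))
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def closed_form(k, mu2):
    """1F1(1; k/2; -mu2/2), computed as exp(-mu2/2) * 1F1((k-2)/2; k/2; mu2/2)."""
    return math.exp(-mu2 / 2.0) * hyp1f1((k - 2) / 2.0, k / 2.0, mu2 / 2.0)


def monte_carlo(lam, draws=200_000, seed=12345):
    """Sample mean of (xi - lam)'xi / xi'xi with xi ~ N(lam, I_k)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(draws):
        z = [rng.gauss(0.0, 1.0) for _ in lam]        # z = xi - lam
        xi = [zi + li for zi, li in zip(z, lam)]
        acc += sum(zi * x for zi, x in zip(z, xi)) / sum(x * x for x in xi)
    return acc / draws


if __name__ == "__main__":
    lam = [1.0, 1.0, 1.0, 1.0]                        # k = 4, mu2 = lam'lam = 4
    print(closed_form(4, 4.0), monte_carlo(lam))
```

The simulated mean should agree with the closed form only up to Monte Carlo error, and the comparison is meaningful only for $k \geq 2$, where the expectation exists.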

Appendix B. Analysis of Table 1

Appendix B.1. Preliminaries

In this appendix, we analyse how the critical values $cv_\alpha$, presented in Table 1, change in response to changes in one of either $k_z$ or $B$ when the other is held fixed; that is, as one either moves down columns of the table or across rows, from left to right, respectively. Some additional analysis is available in Skeels and Windmeijer (2016).
From Equation (5),
$$cv_\alpha = \frac{\chi_\alpha}{k_z}, \tag{A4}$$
with $\chi_\alpha$ the solution to the equation
$$1 - \alpha = \int_0^{\chi_\alpha} f(s \mid k_z, \mu^2)\, ds, \tag{A5}$$
where $f(s \mid k_z, \mu^2)$ denotes the density function of a non-central chi-squared random variable; specifically
$$f(s \mid k_z, \mu^2) = e^{-\mu^2/2} \sum_{j=0}^{\infty} \frac{(\mu^2/2)^j}{j!\, 2^{(k_z+2j)/2}\, \Gamma\left(\frac{k_z+2j}{2}\right)}\, e^{-s/2} s^{(k_z+2j-2)/2} = \sum_{j=0}^{\infty} \kappa_j(k_z, \mu^2)\, e^{-s/2} s^{(k_z+2j-2)/2}, \tag{A6}$$
where
$$\kappa_j(k_z, \mu^2) = \frac{e^{-\mu^2/2} (\mu^2/2)^j}{j!\, 2^{(k_z+2j)/2}\, \Gamma\left(\frac{k_z+2j}{2}\right)}. \tag{A7}$$
The parameter $\mu^2$ is chosen to satisfy
$$B = \left|{}_1F_1\left(1; \frac{k_z}{2}; -\frac{\mu^2}{2}\right)\right|. \tag{A8}$$
The absolute values can be ignored as the confluent hypergeometric function is positive for all $\mu^2 \geq 0$ whenever $k_z \geq 2$, which shall be assumed for the rest of this appendix unless indicated otherwise.
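The steps above can be sketched in Python as follows. This is an illustration rather than the authors’ Matlab code: the helper names (`reg_lower_gamma`, `ncx2_cdf`, `ncx2_quantile`, `critical_value`) are assumptions, the non-central chi-squared cdf is summed as a Poisson mixture of central chi-squared cdfs, and the plain series used for the incomplete gamma function is only suitable for moderate arguments (it would overflow for very large quantiles).

```python
import math


def reg_lower_gamma(a, x, tol=1e-15):
    """Regularised lower incomplete gamma P(a, x) from its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > tol * abs(total):
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def ncx2_cdf(s, df, nc, tol=1e-13, max_terms=2000):
    """Non-central chi-squared cdf as a Poisson(nc/2) mixture of central cdfs."""
    if s <= 0.0:
        return 0.0
    weight = math.exp(-nc / 2.0)          # Poisson weight for j = 0
    total, used = 0.0, 0.0
    for j in range(max_terms):
        total += weight * reg_lower_gamma(df / 2.0 + j, s / 2.0)
        used += weight
        if 1.0 - used < tol:              # remaining Poisson mass is negligible
            break
        weight *= (nc / 2.0) / (j + 1)
    return total


def ncx2_quantile(p, df, nc, iters=100):
    """Solve ncx2_cdf(q) = p for q by bisection."""
    lo, hi = 0.0, df + nc + 1.0
    while ncx2_cdf(hi, df, nc) < p:       # expand until the root is bracketed
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ncx2_cdf(mid, df, nc) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def critical_value(kz, mu2, alpha=0.05):
    """cv_alpha = chi_alpha / kz, with chi_alpha the (1 - alpha) quantile."""
    return ncx2_quantile(1.0 - alpha, kz, mu2) / kz


if __name__ == "__main__":
    mu2 = -2.0 * math.log(0.10)           # kz = 2: B = exp(-mu2/2), so mu2 = -2 ln B
    print(critical_value(2, mu2))
```

For $k_z = 2$ and $B = 0.10$ the printed value should be close to the corresponding entry of Table 1.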

Appendix B.2. The Consequence of Varying $B$ for Fixed $k_z \geq 2$

With $k_z$ held fixed we have, from (A4),
$$\frac{d\, cv_\alpha}{dB} = \frac{1}{k_z} \frac{d\chi_\alpha / d\mu^2}{dB / d\mu^2}. \tag{A9}$$
First,
$$\frac{dB}{d\mu^2} = \frac{dB}{d(\mu^2/2)} \frac{d(\mu^2/2)}{d\mu^2} = -\frac{1}{k_z}\, {}_1F_1\left(2; \frac{k_z+2}{2}; -\frac{\mu^2}{2}\right) < 0, \tag{A10}$$
for all $\mu^2$ and $k_z$. Second, using Leibniz’s Rule for the differentiation of integrals, we can differentiate both sides of (A5) with respect to $\mu^2$ to obtain,
$$0 = \int_0^{\chi_\alpha} \frac{\partial f(s \mid k_z, \mu^2)}{\partial \mu^2}\, ds + f(\chi_\alpha \mid k_z, \mu^2) \frac{d\chi_\alpha}{d\mu^2} \;\Longrightarrow\; \frac{d\chi_\alpha}{d\mu^2} = -\frac{1}{f(\chi_\alpha \mid k_z, \mu^2)} \int_0^{\chi_\alpha} \frac{\partial f(s \mid k_z, \mu^2)}{\partial \mu^2}\, ds. \tag{A11}$$
Note that (A11) implicitly assumes $0 < \chi_\alpha < \infty$, so that $0 < \alpha < 1$. In the event that either $\chi_\alpha = 0$ or $\chi_\alpha$ is infinite, then $f(s \mid k_z, \mu^2) = 0$, as does its derivative with respect to $\mu^2$, making the representation (A11) invalid. Indeed, as these cases are on the boundaries of support of a non-central chi-squared random variable, the ordinary derivative is not well-defined and so the approach taken above would require modification. For this reason, hereafter, we shall assume that $0 < \alpha < 1$.
From (A6), the integrand in (A11) is (Cohen 1988, Equation (2))
$$\frac{\partial f(s \mid k_z, \mu^2)}{\partial \mu^2} = \frac{1}{2}\left[f(s \mid k_z + 2, \mu^2) - f(s \mid k_z, \mu^2)\right].$$
Integrating by parts allows us to write
$$\int_0^{\chi_\alpha} e^{-s/2} s^{(k_z+2j)/2}\, ds = \left[-2 e^{-s/2} s^{(k_z+2j)/2}\right]_0^{\chi_\alpha} + (k_z + 2j) \int_0^{\chi_\alpha} e^{-s/2} s^{(k_z+2j-2)/2}\, ds = -2 e^{-\chi_\alpha/2} \chi_\alpha^{(k_z+2j)/2} + (k_z + 2j) \int_0^{\chi_\alpha} e^{-s/2} s^{(k_z+2j-2)/2}\, ds,$$
and so (A11) becomes
$$\begin{aligned} \frac{d\chi_\alpha}{d\mu^2} &= -\frac{1}{2 f(\chi_\alpha \mid k_z, \mu^2)} \sum_{j=0}^{\infty} \Bigg\{ \kappa_j(k_z + 2, \mu^2) \left[-2 e^{-\chi_\alpha/2} \chi_\alpha^{(k_z+2j)/2} + (k_z + 2j) \int_0^{\chi_\alpha} e^{-s/2} s^{(k_z+2j-2)/2}\, ds\right] - \kappa_j(k_z, \mu^2) \int_0^{\chi_\alpha} e^{-s/2} s^{(k_z+2j-2)/2}\, ds \Bigg\} \\ &= \frac{f(\chi_\alpha \mid k_z + 2, \mu^2)}{f(\chi_\alpha \mid k_z, \mu^2)} - \sum_{j=0}^{\infty} \frac{\kappa_j(k_z + 2, \mu^2)(k_z + 2j) - \kappa_j(k_z, \mu^2)}{2 f(\chi_\alpha \mid k_z, \mu^2)} \int_0^{\chi_\alpha} e^{-s/2} s^{(k_z+2j-2)/2}\, ds \\ &= \frac{f(\chi_\alpha \mid k_z + 2, \mu^2)}{f(\chi_\alpha \mid k_z, \mu^2)} > 0, \end{aligned} \tag{A12}$$
as $\kappa_j(k_z + 2, \mu^2)(k_z + 2j) - \kappa_j(k_z, \mu^2) = 0$. The positivity of the ratio follows because each of the functions $f$ is the value of a non-central chi-squared density function; the two densities differ only in their degrees of freedom, $k_z$ versus $k_z + 2$ respectively, and so both are positive for all $0 < \chi_\alpha < \infty$, as is assumed above. As an aside, we know that as degrees of freedom increase for given $\mu^2$ these functions cross, which means that sometimes $f(\chi_\alpha \mid k_z, \mu^2) > f(\chi_\alpha \mid k_z + 2, \mu^2)$ and sometimes the converse is true. That is, we are unable to bound $d\chi_\alpha / d\mu^2$ from above.
Combining (A9), (A10), and (A12), we find that
$$\frac{d\, cv_\alpha}{dB} = -\frac{f(\chi_\alpha \mid k_z + 2, \mu^2)}{f(\chi_\alpha \mid k_z, \mu^2)\; {}_1F_1\left(2; \frac{k_z+2}{2}; -\frac{\mu^2}{2}\right)} < 0, \tag{A13}$$
which confirms the behaviour observed in Table 1. That is, for given values of $k_z$, the critical values $cv_\alpha$ are decreasing functions of the asymptotic bias $B$.
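The derivative identity of Cohen (1988) that drives this argument can be checked numerically. The sketch below is an illustration under stated assumptions: `ncx2_pdf`, `check_cohen` and `dchi_dmu2` are hypothetical helper names, the density is summed from its Poisson-mixture series with a fixed number of terms, and the non-centrality parameter is assumed to be strictly positive.

```python
import math


def ncx2_pdf(s, df, nc, terms=400):
    """Non-central chi-squared density from its series (requires s > 0, nc > 0)."""
    total = 0.0
    for j in range(terms):
        a = df / 2.0 + j
        log_term = (-nc / 2.0 + j * math.log(nc / 2.0) - math.lgamma(j + 1.0)
                    - a * math.log(2.0) - math.lgamma(a)
                    - s / 2.0 + (a - 1.0) * math.log(s))
        total += math.exp(log_term)
    return total


def check_cohen(s, kz, mu2, h=1e-4):
    """Return (central finite difference in mu2, 0.5 * [f(kz + 2) - f(kz)])."""
    fd = (ncx2_pdf(s, kz, mu2 + h) - ncx2_pdf(s, kz, mu2 - h)) / (2.0 * h)
    identity = 0.5 * (ncx2_pdf(s, kz + 2, mu2) - ncx2_pdf(s, kz, mu2))
    return fd, identity


def dchi_dmu2(chi, kz, mu2):
    """Slope of the quantile in mu2: a ratio of two densities, hence positive."""
    return ncx2_pdf(chi, kz + 2, mu2) / ncx2_pdf(chi, kz, mu2)


if __name__ == "__main__":
    print(check_cohen(6.0, 3, 4.0))
    print(dchi_dmu2(15.7, 2, 4.6))
```

The two numbers returned by `check_cohen` should agree up to finite-difference error, and `dchi_dmu2` is positive at any interior point, in line with the sign argument above.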

Appendix C. Some Remarks on Computational Aspects

For the most part, both the programs Table1.m and sypval.m rely on intrinsic Matlab functions. Once the relevant inputs are available then the structure of the programs is immediately apparent. Specifically, for given values of $k_z$ and $B$, it is necessary to obtain the corresponding value for $\mu_0^2$ from the nonlinear Equation (9). We adopt a fairly simple-minded approach to this, by iterating from a starting value to the correct solution using a bisection algorithm.
Our starting values are chosen as follows. When $k_z = 2$, we know from (10) that the values of $\mu_0^2$ can be calculated exactly as $\mu_0^2 = -2 \ln B$ and so no search is required. When $k_z > 2$, we exploit an approximation asymptotic in $\mu_0^2$ (Slater 1960, Equation (4.1.8)) that reduces to $\mu_0^2 \approx (k_z - 2)/B$. As expected, the performance of the approximation improves as $B$ decreases which, for fixed $k_z$, corresponds to increasing $\mu_0^2$ (see Appendix B.2). Nevertheless, for all cases where $k_z > 2$, this approximation provides much better starting values in the search for $\mu_0^2$ than do naive alternatives, such as starting the search from zero (say). Moreover, this approximation performs best under exactly the same circumstances that naive methods are at their slowest, affording considerable computational time savings. As the values of $\mu_0^2 / k_z$ are much more stable for any given $B$ than the $\mu_0^2$, as can be deduced from Table 4, we use this parameterisation in our search algorithm.
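A minimal Python sketch of this search is given below. It is not a transcription of Table1.m or sypval.m: the names `relative_bias` and `solve_mu2` are assumptions, the search is carried out directly on $\mu_0^2$ rather than on $\mu_0^2/k_z$, and the confluent hypergeometric function is evaluated by rewriting the Kummer-transformed series term by term as a Poisson-weighted sum, which is an implementation choice made here to avoid overflow for large $\mu_0^2$.

```python
import math


def relative_bias(kz, mu2):
    """1F1(1; kz/2; -mu2/2) for kz >= 2, as a Poisson(mu2/2)-weighted sum.

    Each term of exp(-z) * 1F1(kz/2 - 1; kz/2; z) equals the Poisson(z)
    probability of n multiplied by c / (c + n), with c = kz/2 - 1.
    """
    z = mu2 / 2.0
    if kz == 2:
        return math.exp(-z)
    if z == 0.0:
        return 1.0
    c = kz / 2.0 - 1.0
    n_max = int(z + 12.0 * math.sqrt(z) + 60)     # covers the Poisson mass
    total = 0.0
    for n in range(n_max + 1):
        log_w = -z + n * math.log(z) - math.lgamma(n + 1.0)
        total += math.exp(log_w) * c / (c + n)
    return total


def solve_mu2(kz, B, tol=1e-12):
    """Solve B = 1F1(1; kz/2; -mu2/2) for mu2 by bisection (kz >= 2, 0 < B < 1)."""
    if kz == 2:
        return -2.0 * math.log(B)                 # exact, no search required
    start = (kz - 2) / B                          # large-mu2 starting value
    lo = hi = start
    while relative_bias(kz, lo) < B:              # bias falls as mu2 rises
        lo /= 2.0
    while relative_bias(kz, hi) > B:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if relative_bias(kz, mid) > B:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for kz in (2, 4, 10):
        print(kz, solve_mu2(kz, 0.10), (kz - 2) / 0.10)
```

Printing the solution next to the starting value $(k_z - 2)/B$ gives a rough sense of how close the approximation is for a given $B$.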

Appendix D. Some Remarks on the Just-Identified Scalar Case

The SY approach is not available if $k_z = 1$ because the bias of 2SLS does not exist, hence $|B|$ is undefined.19 Nevertheless, given the difficulties often encountered in finding appropriate instruments, the exactly identified model is one of considerable practical relevance. As the confluent hypergeometric function of Theorem 1 remains well-defined when $k_z = 1$, one might ask if it could provide an ad hoc basis for a test for weak instruments in this case, based on $F$, in the spirit of the SY approach.
The function ${}_1F_1\left(1; \frac{1}{2}; -\frac{\mu^2}{2}\right)$ displays behaviours that are quite different to what was observed in over-identified models. These behaviours are displayed in Figure A1 where we plot both the confluent hypergeometric function and its absolute value against $\mu^2/2$, when $k_z = 1$. Note that in the figure we use the symbol $B$ to represent the confluent hypergeometric function ${}_1F_1\left(1; \frac{1}{2}; -\frac{\mu^2}{2}\right)$, rather than the expectation $E\left[(\xi - \lambda)'\xi / \xi'\xi\right]$, with the latter unbounded when $k_z = 1$.
In Figure A1, we see that neither $B$ nor $|B|$ is monotonic in $\mu^2/2$ when $k_z = 1$, in contrast to the over-identified cases. That the confluent hypergeometric function can take negative values when $k_z = 1$ means that this case is the only one considered where taking the absolute value of the hypergeometric function has any material impact on observed behaviour. We can establish numerically that $B$, and hence $|B|$, both have a zero at $\mu^2/2 \approx 0.8540$. As this is in the region where the hypergeometric function is a decreasing function of its argument, we see that, as $B$ moves through its zero to the right, so that $\mu^2$ is increasing, it becomes negative and appears to stay that way, with a minimum of approximately $-0.2847$ occurring at $\mu^2/2 \approx 2.2559$. Clearly, $|B|$ cannot become negative and so, at $\mu^2/2 \approx 2.2559$, it has a local maximum of approximately $0.2847$. Consequently, there are three values of $\mu^2$ that yield the same value of $|B|$ for $0 < |B| < 0.2847$, there are two values of $\mu^2$ corresponding to $|B| = 0.2847$, and for $|B| \in \{0\} \cup (0.2847, 1]$ there is a one-to-one mapping between $|B|$ and $\mu^2 < \infty$.
Figure A1. Plots of $B = {}_1F_1\left(1; \frac{1}{2}; -\frac{\mu^2}{2}\right)$ and its absolute value against $\mu^2/2$.
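The zero and the minimum of the just-identified function quoted above can be located with a few lines of Python. The sketch below is illustrative only: the names `b_just_identified`, `find_zero` and `find_min` are assumptions, the function is evaluated through Kummer’s transformation as $e^{-z}\,{}_1F_1(-\tfrac{1}{2}; \tfrac{1}{2}; z)$ with $z = \mu^2/2$, and the search intervals are chosen by inspection of the behaviour described in the text.

```python
import math


def b_just_identified(z, terms=200):
    """1F1(1; 1/2; -z) for z = mu2/2 >= 0, via exp(-z) * 1F1(-1/2; 1/2; z).

    The series coefficient of z**n / n! is -1 / (2n - 1) for n >= 1.
    """
    total, pw = 1.0, 1.0                  # pw holds z**n / n!
    for n in range(1, terms):
        pw *= z / n
        total -= pw / (2 * n - 1)
    return math.exp(-z) * total


def find_zero(f, lo, hi, iters=100):
    """Bisection for a sign change of f on [lo, hi], with f(lo) > 0 > f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def find_min(f, lo, hi, iters=200):
    """Ternary search for the minimiser of a unimodal f on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    z0 = find_zero(b_just_identified, 0.5, 1.5)
    zmin = find_min(b_just_identified, 1.0, 4.0)
    print(z0, zmin, b_just_identified(zmin))
```

The three printed numbers should be close to the zero, the location of the minimum, and the minimum value reported in the text.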
Observe in Figure A1 that there are three values of $\mu^2$ corresponding to $|B| = 0.1$. Setting $\mu_0^2 = 13.83$, the largest of these numbers, we find a critical value for the first-stage F-test of 28.77. At this level of information, the 2SLS estimator appears well-behaved. This is shown in Table A1, which shows the estimation results of a Monte Carlo analysis as in Section 4 for $k_z = 1$. Even though the estimator has no moments, as the model is just-identified, we find that the Monte Carlo relative bias is indeed 10% with the rejection frequency of the F-test again 5%. The same holds at the smaller values of $|B|$ of 0.05 and 0.01, for which the largest implied values of $\mu_0^2$ are 23.41 and 103.06, with the estimation results very similar to those for $k_z = 3$. However, when we consider the $|B| = 0.20$ case, for which $\mu_0^2$ is 8.198, the lack of moments of the 2SLS estimator becomes apparent, with the standard deviation now very large at 6.05. These results suggest that the approximation might be useful for the smaller values of $|B|$, if one works with the largest implied values of $\mu_0^2$, even though the 2SLS estimator does not possess any moments in this case.
Table A1. Simulation results for $k_z = 1$.

| | $B = -0.01$: mean | std dev | $B = -0.05$: mean | std dev | $B = -0.10$: mean | std dev | $B = -0.20$: mean | std dev |
|---|---|---|---|---|---|---|---|---|
| $\hat{\beta}_{OLS}$ | 1.4949 | 0.0086 | 1.4988 | 0.0086 | 1.4993 | 0.0086 | 1.4996 | 0.0087 |
| $\hat{\beta}_{2SLS}$ | 0.9954 | 0.1003 | 0.9753 | 0.2327 | 0.9495 | 0.4389 | 0.8936 | 6.0492 |
| $F$ | 104.11 | 20.411 | 24.429 | 9.8352 | 14.821 | 7.5732 | 9.1757 | 5.8671 |
| rel bias | −0.0092 | | −0.0496 | | −0.1011 | | −0.2130 | |
| $\mu_0^2$ | 103.06 | | 23.412 | | 13.830 | | 8.198 | |
| cv $F$ | 139.17 | | 42.035 | | 28.769 | | 20.323 | |
| rej freq $F$ | 0.0496 | | 0.0507 | | 0.0485 | | 0.0489 | |

Notes: Sample size n = 10,000, number of Monte Carlo replications is 100,000.
Confirming the approximate median unbiasedness of the just-identified 2SLS estimator (see, for example, the discussion in Angrist and Pischke (2009), p. 209), we find that the median biases, not reported in the table, are very close to 0 at all values of $\mu^2$.

Appendix E. Derivation of the $O(k_z^{-2})$ Term in (16)

The following derivation is due to Grant Hillier, via private communication, and we thank him for allowing us to include it here.
By analogy with Hillier et al. (1984), Equation (31), the $O(k_z^{-2})$ term in (16) requires the resolution of an integral of the form
$$G_i(\Lambda, \gamma, k_z, k_x) = \Gamma_{k_x}\left(\frac{k_z}{2}\right) c_{k_x} \int_{\mathrm{Re}(W) > 0} \mathrm{etr}\{W\}\, |W|^{-k_z/2}\, e_i'\Lambda^{-1/2} W \Lambda^{-1} W \Lambda^{-1/2} \gamma\, dW,$$
where $e_i$ denotes the $i$th column of an identity matrix, $c_p = 2^{p(p-1)/2} / (2\pi\imath)^{p(p+1)/2}$, with $\imath^2 = -1$, and the range of integration is the set of complex, symmetric matrices of dimension $p \times p$ with fixed, positive definite real part. Note that
$$e_i'\Lambda^{-1/2} W \Lambda^{-1} W \Lambda^{-1/2} \gamma = \frac{1}{2} \mathrm{tr}\left\{(\gamma e_i' + e_i \gamma')\, \Lambda^{-1/2} W \Lambda^{-1} W \Lambda^{-1/2}\right\} = \mathrm{tr}\left\{(f_1 \eta_1 \eta_1' + f_2 \eta_2 \eta_2')\, \Lambda^{-1/2} W \Lambda^{-1} W \Lambda^{-1/2}\right\},$$
where $f_1$ and $f_2$ are the non-zero eigenvalues of $(\gamma e_i' + e_i \gamma')/2$, and $\eta_1$ and $\eta_2$ are the corresponding eigenvectors, respectively. Making this substitution, the integral of interest reduces to a weighted sum of integrals of the form $G_i(\Lambda, \gamma, k_z, k_x) = f_1 I(\Lambda^{-1/2}, \eta_1, k_z, k_x) + f_2 I(\Lambda^{-1/2}, \eta_2, k_z, k_x)$, where
$$I(S, \eta, m, p) = \Gamma_p\left(\frac{m}{2}\right) c_p \int_{\mathrm{Re}(W) > 0} \mathrm{etr}\{W\}\, |W|^{-m/2}\, \eta' S W S^2 W S \eta\, dW.$$
We shall present the key result in the following lemma.
Lemma A1.
Let $W$ be a $p \times p$ complex, symmetric matrix whose real part is fixed and positive definite, i.e., $\mathrm{Re}(W) > 0$, let $S$ denote a non-singular, symmetric matrix of the same dimension, and let $\eta$ denote a fixed $p$-vector. Then, the inverse Laplace transform of
$$g(W) = \Gamma_p\left(\frac{m}{2}\right) |W|^{-m/2}\, \eta' S W S^2 W S \eta, \qquad m > p + 1,$$
is
$$f(S) = \frac{(m - p - 1)}{4} \left[(m - p - 2)\, \eta' S^4 \eta - (\eta' S^2 \eta)\, \mathrm{tr}\{S^2\}\right].$$
Proof. 
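Noting that
$${}_0F_1\left(\frac{p}{2}; \frac{t^2}{4}\, \eta' S W S^2 W S \eta\right) = \sum_{k=0}^{\infty} \frac{1}{\left(\frac{p}{2}\right)_k} \frac{\left(\frac{t^2}{4}\, \eta' S W S^2 W S \eta\right)^k}{k!} = 1 + \frac{t^2\, \eta' S W S^2 W S \eta}{2p} + \text{higher order terms in } t,$$
we see that $f(S)$ is $2p$ times the coefficient on $t^2$ in
$$I(t) = \Gamma_p\left(\frac{m}{2}\right) c_p \int_{\mathrm{Re}(W) > 0} \mathrm{etr}\{W\}\, |W|^{-m/2}\, {}_0F_1\left(\frac{p}{2}; \frac{t^2}{4}\, \eta' S W S^2 W S \eta\right) dW.$$
That is,
$$f(S) = 2p \times \frac{1}{2} \left.\frac{d^2 I(t)}{dt^2}\right|_{t=0} = p \left.\frac{d^2 I(t)}{dt^2}\right|_{t=0}.$$
The hypergeometric function in (A15) has the integral representation (James 1961, Theorem 5)
$${}_0F_1\left(\frac{p}{2}; \frac{t^2}{4}\, \eta' S W S^2 W S \eta\right) = \int_{h'h=1} \exp\{t\, h' S W S \eta\}\, (dh),$$
where $(dh)$ denotes the normalised invariant Haar measure on the unit sphere, and so
$$\begin{aligned} I(t) &= \Gamma_p\left(\frac{m}{2}\right) c_p \int_{\mathrm{Re}(W) > 0} \mathrm{etr}\{W\}\, |W|^{-m/2} \int_{h'h=1} \exp\{t\, h' S W S \eta\}\, (dh)\, dW \\ &= \Gamma_p\left(\frac{m}{2}\right) c_p \int_{h'h=1} \int_{\mathrm{Re}(W) > 0} \mathrm{etr}\{R(t) W\}\, |W|^{-m/2}\, dW\, (dh) \\ &= \int_{h'h=1} |R(t)|^{(m-p-1)/2}\, (dh), \end{aligned}$$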
where
$$R(t) = I_p + \frac{t}{2}\, S (h \eta' + \eta h') S$$
is symmetric and positive definite for small enough $t$, and the final equality in (A15) follows from Muirhead (1982), pp. 252–53. Now
$$|R(t)| = \left|I_2 + \frac{t}{2} (\eta, h)' S^2 (h, \eta)\right| = \left(1 + \frac{t}{2}\, \eta' S^2 h\right)^2 - \frac{t^2}{4}\, h' S^2 h\; \eta' S^2 \eta = 1 + c_1 t + c_2 t^2,$$
say, where $c_1 = \eta' S^2 h$ and $c_2 = \frac{1}{4} h' \left[S^2 \eta \eta' S^2 - (\eta' S^2 \eta) S^2\right] h$. Hence,
$$\left.\frac{d^2 I(t)}{dt^2}\right|_{t=0} = \int_{h'h=1} \left[(m - p - 1)\, c_2 + \frac{(m - p - 1)(m - p - 3)}{4}\, c_1^2\right] (dh) = \frac{(m - p - 1)}{4} \int_{h'h=1} h' \left[(m - p - 2)\, S^2 \eta \eta' S^2 - (\eta' S^2 \eta) S^2\right] h\, (dh).$$
Finally, noting that, for symmetric $p \times p$ matrix $B$,20
$$\int_{h'h=1} h' B h\, (dh) = \frac{1}{p}\, \mathrm{tr}\{B\},$$
we find that
$$f(S) = \frac{(m - p - 1)}{4} \left[(m - p - 2)\, \eta' S^4 \eta - (\eta' S^2 \eta)\, \mathrm{tr}\{S^2\}\right],$$
as required. □
Applying Lemma A1 to (A14), and re-combining the spectral decomposition, we find that
$$G_i(\Lambda, \gamma, k_z, k_x) = \frac{k_z - k_x - 1}{4} \left[(k_z - k_x - 2)\, \eta' \Lambda^{-2} \eta - (\eta' \Lambda^{-1} \eta)\, \mathrm{tr}\{\Lambda^{-1}\}\right].$$
Equation (16) follows directly on substituting $\lambda'\lambda / (2\nu)$ for $\Lambda$.
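The determinant expansion used in the proof of Lemma A1 is a purely algebraic identity and can be spot-checked numerically. The Python sketch below is an illustration under stated assumptions: the helper names (`det`, `quad`, `check_expansion`), the dimension, the value of $t$ and the random seed are all arbitrary choices made here, and only the symmetry of $S$ is used.

```python
import random


def det(a):
    """Determinant by cofactor expansion (adequate for small matrices)."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))


def quad(x, m, y):
    """Bilinear form x' M y."""
    return sum(x[i] * m[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def check_expansion(p=3, t=0.1, seed=0):
    """Return (|R(t)|, 1 + c1 t + c2 t^2) for a random symmetric S, eta and unit h."""
    rng = random.Random(seed)
    raw = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(p)]
    S = [[0.5 * (raw[i][j] + raw[j][i]) for j in range(p)] for i in range(p)]
    S2 = [[sum(S[i][k] * S[k][j] for k in range(p)) for j in range(p)]
          for i in range(p)]
    eta = [rng.gauss(0.0, 1.0) for _ in range(p)]
    h = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = sum(v * v for v in h) ** 0.5
    h = [v / norm for v in h]                      # h'h = 1
    Sh = [sum(S[i][k] * h[k] for k in range(p)) for i in range(p)]
    Se = [sum(S[i][k] * eta[k] for k in range(p)) for i in range(p)]
    # R(t) = I_p + (t/2) S (h eta' + eta h') S, using the symmetry of S
    R = [[(1.0 if i == j else 0.0) + 0.5 * t * (Sh[i] * Se[j] + Se[i] * Sh[j])
          for j in range(p)] for i in range(p)]
    c1 = quad(eta, S2, h)
    c2 = 0.25 * (c1 * c1 - quad(h, S2, h) * quad(eta, S2, eta))
    return det(R), 1.0 + c1 * t + c2 * t * t


if __name__ == "__main__":
    print(check_expansion())
```

Both numbers should coincide up to rounding error for any $t$, since the reduction from a $p \times p$ to a $2 \times 2$ determinant is exact.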

References

  1. Angrist, Joshua D., and Jörn-Steffen Pischke. 2009. Mostly Harmless Econometrics. An Empiricist’s Companion. Princeton and Oxford: Princeton University Press.
  2. Baum, Christopher F., Mark E. Schaffer, and Steven Stillman. 2010. ivreg2: Stata Module for Extended Instrumental Variables/2SLS, GMM and AC/HAC, LIML, and k-Class Regression. Boston College Department of Economics, Statistical Software Components S425401. Available online: http://ideas.repec.org/c/boc/bocode/s425401.html (accessed on 26 August 2015).
  3. Buse, Adolf. 1992. The bias of instrumental variables estimators. Econometrica 60: 173–80.
  4. Chao, John, and Norman R. Swanson. 2007. Alternative approximations of the bias and MSE of the IV estimator under weak identification with an application to bias correction. Journal of Econometrics 137: 515–55.
  5. Cohen, Jonathan D. 1988. Noncentral Chi-Square: Some observations on recurrence. The American Statistician 42: 120–22.
  6. Constantine, A. Graham. 1963. Some non-central distribution problems in multivariate analysis. Annals of Mathematical Statistics 34: 1270–85.
  7. Cragg, John G., and Stephen G. Donald. 1993. Testing identifiability and specification in instrumental variable models. Econometric Theory 9: 222–40.
  8. Das Gupta, Somesh, and Michael D. Perlman. 1974. Power of the noncentral F-test: Effect of additional variates on Hotelling’s T2-test. Journal of the American Statistical Association 69: 174–80.
  9. Davis, A. William. 1979. Invariant polynomials with two matrix arguments extending the zonal polynomials: Applications to multivariate distribution theory. Annals of the Institute of Statistical Mathematics 31 Pt A: 465–85.
  10. Dufour, Jean-Marie. 1997. Some impossibility theorems in econometrics with applications to structural and dynamic models. Econometrica 65: 1365–87.
  11. Forchini, Giovanni, and Grant H. Hillier. 2003. Conditional inference for possibly unidentified structural equations. Econometric Theory 19: 707–43.
  12. Hale, Christopher, Roberto S. Mariano, and John G. Ramage. 1980. Finite sample analysis of misspecification in simultaneous equation models. Journal of the American Statistical Association 75: 418–27.
  13. Herz, Carl S. 1955. Bessel functions of matrix argument. Annals of Mathematics 61: 474–523.
  14. Hillier, Grant H., Raymond Kan, and Xiaolu Wang. 2009. Computationally efficient recursions for top-order invariant polynomials with applications. Econometric Theory 25: 211–42.
  15. Hillier, Grant H., Raymond Kan, and Xiaolu Wang. 2014. Generating functions and short recursions, with applications to the moments of quadratic forms in noncentral normal vectors. Econometric Theory 30: 436–73.
  16. Hillier, Grant H., Terrence W. Kinal, and Virendra K. Srivastava. 1984. On the moments of ordinary least squares and instrumental variables estimators in a general structural equation. Econometrica 52: 185–202.
  17. James, Alan T. 1961. Zonal polynomials of the real positive definite symmetric matrices. Annals of Mathematics 74: 456–69.
  18. James, Alan T. 1964. Distributions of matrix variates and latent roots derived from normal samples. The Annals of Mathematical Statistics 35: 475–501.
  19. Johansson, Fredrik. 2016. Computing Hypergeometric Functions Rigorously. Available online: https://hal.inria.fr/hal-01336266v2 (accessed on 15 September 2016).
  20. Kinal, Terrence W. 1980. The existence of moments of k-class estimators. Econometrica 48: 241–49.
  21. Kleibergen, Frank. 2002. Pivotal statistics for testing structural parameters in instrumental variables regression. Econometrica 70: 1781–803.
  22. Knight, John L. 1982. A note on finite sample analysis of misspecification in simultaneous equation models. Economics Letters 9: 275–79.
  23. MathWorks. 2016. MATLAB and Statistics Toolbox Release 2016b. Natick: The MathWorks Inc.
  24. Muirhead, Robb J. 1982. Aspects of Multivariate Statistical Theory. New York: John Wiley and Sons, Inc.
  25. National Institute of Standards and Technology (NIST). 2015. Digital Library of Mathematical Functions; Edited by Frank W. J. Olver, Adri B. Olde Daalhuis, Daniel W. Lozier, Barry I. Schneider, Ronald F. Boisvert, Charles W. Clark, Bruce R. Miller and Bonita V. Saunders. Release 1.0.10 of 2015-08-07; Gaithersburg: NIST. Available online: http://dlmf.nist.gov/ (accessed on 10 August 2015).
  26. Nelson, Charles R., and Richard Startz. 1990a. The distribution of the instrumental variables estimator and its t-ratio when the instrument is a poor one. Journal of Business 63 Pt 2: S125–40.
  27. Nelson, Charles R., and Richard Startz. 1990b. Some further results on the exact small sample properties of the instrumental variable estimator. Econometrica 58: 967–76.
  28. Phillips, Peter C. B. 1980. The exact distribution of instrumental variable estimators in an equation containing n + 1 endogenous variables. Econometrica 48: 861–78.
  29. Phillips, Peter C. B. 1983a. Exact small sample theory in the simultaneous equations model. In Handbook of Econometrics, Volume I. Edited by Zvi Griliches and Michael D. Intriligator. Amsterdam: North Holland, Chapter 8. pp. 449–516.
  30. Phillips, Peter C. B. 1983b. Marginal densities of instrumental variable estimators in the general single equation case. Advances in Econometrics 2: 24.
  31. Phillips, Peter C. B. 1989. Partially identified econometric models. Econometric Theory 5: 181–240.
  32. Phillips, Peter C. B. 2016. Inference in near-singular regression. Advances in Econometrics 36: 461–86.
  33. Phillips, Peter C. B. 2017. Reduced forms and weak instrumentation. Econometric Reviews 36: 818–39.
  34. Phillips, Peter C. B., and Wayne Y. Gao. 2017. Structural inference from reduced forms with many instruments. Journal of Econometrics 199: 96–116.
  35. Richardson, David H. 1968. The exact distribution of a structural coefficient estimator. Journal of the American Statistical Association 63: 1214–26.
  36. Richardson, David H., and De-Min Wu. 1971. A note on the comparison of ordinary and two-stage least squares estimators. Econometrica 39: 973–81.
  37. Sanderson, Eleanor, and Frank Windmeijer. 2016. A weak instrument F-test in linear IV models with multiple endogenous variables. Journal of Econometrics 190: 212–21.
  38. Skeels, Christopher L. 1995. Instrumental variables estimation in misspecified single equations. Econometric Theory 11: 498–529.
  39. Skeels, Christopher L., and Frank Windmeijer. 2016. On the Stock–Yogo Tables. Discussion Paper 16/679. Bristol: Department of Economics, University of Bristol.
  40. Slater, Lucy J. 1960. Confluent Hypergeometric Functions. Cambridge: Cambridge University Press.
  41. Slater, Lucy J. 1966. Generalized Hypergeometric Functions. Cambridge: Cambridge University Press.
  42. Staiger, Douglas, and James H. Stock. 1997. Instrumental variables regression with weak instruments. Econometrica 65: 557–86.
  43. StataCorp. 2015. Stata Statistical Software: Release 14. College Station: StataCorp LP.
  44. Stock, James H., and Motohiro Yogo. 2005. Testing for weak instruments in linear IV regression. In Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg. Edited by Donald W. K. Andrews and James H. Stock. Cambridge: Cambridge University Press, Chapter 5. pp. 80–108.
  45. Wasserstein, Ronald L., and Nicole A. Lazar. 2016. The ASA’s statement on p-values: Context, process, and purpose. The American Statistician 70: 129–33.
1.
Much of the later literature has focussed less on testing for the presence of weak instruments and more on the development of techniques that are robust to the presence of weak instruments.
2.
A heuristically appealing aspect of using the first-stage F-statistic as a measure of instrument weakness, in the case of a single endogenous regressor, is its consistency with the well-known Staiger–Stock rule of thumb. Staiger and Stock (1997), p. 557, suggested that instruments be deemed weak if the first-stage F is less than 10. SY (pp. 101–2) observe that 10 corresponds closely to their tabulated critical values for a 5% test that the relative bias is 10% for all values of $k_z$, and concluded that ‘this provides a formal, and not unreasonable, testing interpretation of the Staiger–Stock rule of thumb.’
3.
Through the use of more extensive simulation results than those used originally by SY, we are able to support the proposition that the numerical approximation errors inherent in the computation of our analytical results are less than those contained in the original SY tables.
4.
Some references specify the non-centrality parameter for a non-central chi-squared distribution as $\delta$, whereas others specify it as $\delta/2$. We have adopted the former convention here.
5.
The exact details of these arguments can be found in SY and will not be repeated here.
6.
We thank an anonymous referee for bringing this subtlety to our attention. For a more complete discussion of this point, we refer the reader to the discussion of (Chao and Swanson 2007, pp. 518–19) and the references cited therein.
7.
It should be noted that the proof provided is not the only one possible and we would like to thank helpful referees for drawing various alternatives to our attention. For example, in an elegant paper, Chao and Swanson (2007), Proposition 3.1 and Lemma 3.3, respectively, derive local-to-zero approximations for each of $\lim_{n \to \infty} E\left[\hat{\beta}_{2SLS,n} - \beta\right]$ and $\lim_{n \to \infty} E\left[\hat{\beta}_{OLS,n} - \beta\right]$, from whence derivation of the ratio is straightforward. Similarly, there are finite sample papers in the literature from which it would be possible to start a proof along the lines of the one presented but at a more advanced point (see, for example, Forchini and Hillier 2003, Equation B.13). However, we favour the proof presented for two reasons. First, it is a direct continuation of the developments of Stock and Yogo (2005), Equation 3.1, and the discussion immediately thereafter. Second, when viewed in the correct light, there are much earlier antecedents that take precedence over the two mentioned here. We discuss this further in Section 6.
8.
In the absence of such intrinsic functions, computational aspects of hypergeometric functions are discussed in Johansson (2016).
9.
We have also computed simulated critical values from 20,000 random draws as in SY, but repeating the exercise 1000 times. The resulting mean critical values are virtually identical to those in Table 1, with the maximum difference being 0.02.
10.
Theorem 2 is similar in spirit to Das Gupta and Perlman (1974), p. 180, Remark 4.1, although they only address the numerator of the ratio in Equation (5). Consequently, Das Gupta and Perlman are silent on the relative magnitudes of $\chi_\alpha$ and $k_z$ which, in essence, is the content of Theorem 2.
11.
The complete set of assumptions are presented in SY (Section 2.4).
12.
In (13), $\mathrm{vec}(\cdot)$ is the usual matrix operator that stacks all of the columns of its matrix argument into a single column vector; see, for example, Muirhead (1982), p. 17.
13.
Make the substitutions $\{e_i, \gamma, k_z\}$ for $\{\bar{\alpha}, \beta, \nu\}$, respectively, in Hillier et al. (1984), Equation (30).
14.
Please note that the definition of $\Lambda$ adopted here is slightly different from the definitions used in either Hillier et al. (1984) or SY.
15.
See Muirhead (1982), Section 7.2.1, for a much more complete treatment of ordered partitions.
16.
The zonal polynomials appearing in (14) adopt a normalisation due to Constantine (1963), which typically leads to more compact expressions than do the polynomials originally proposed by James (1961).
17.
Some progress towards addressing the computational aspects of these polynomials has been made by Hillier et al. (2009, 2014).
18.
Although the derivation of (15) is straightforward, this is less true for (16). A derivation of the terms in (16) that are in addition to those in (15) is provided in Appendix E.
19.
Similarly, in the proof of Theorem A1, we established that $E[(\xi - \lambda_0)'\xi / \xi'\xi]$ was unbounded when $k_z = 1$.
20.
This is an application of James (1964), Equation (23), where, in his notation, $S \equiv B$, $T \equiv e_1 e_1'$, $\kappa = [1]$ and $H \equiv [h, H_2]$ is an orthogonal $p \times p$ matrix.
Figure 1. Plots of $B = {}_1F_1(1; k_z/2; -\mu^2/2)$ against $\mu^2/2$ for $k_z = 2, 3$ and 6.
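The curve in Figure 1 can be evaluated numerically. The following is a minimal sketch, assuming SciPy's `hyp1f1` as the confluent hypergeometric routine; the function name `relative_bias` is hypothetical and is not taken from the paper.

```python
from scipy.special import hyp1f1


def relative_bias(k_z: int, mu_sq: float) -> float:
    """Evaluate B = 1F1(1; k_z/2; -mu^2/2), the quantity plotted in Figure 1.

    k_z   : number of instruments (assumed >= 2 here)
    mu_sq : concentration parameter mu^2
    """
    return float(hyp1f1(1.0, k_z / 2.0, -mu_sq / 2.0))


if __name__ == "__main__":
    # Tabulate B on a small grid of mu^2/2 for the three k_z values in Figure 1.
    for k_z in (2, 3, 6):
        row = [round(relative_bias(k_z, 2.0 * x), 4) for x in (0.5, 1.0, 2.0, 4.0)]
        print(k_z, row)
```

For $k_z = 2$ the function reduces to $e^{-\mu^2/2}$, and for $k_z = 4$ to $(1 - e^{-\mu^2/2})/(\mu^2/2)$, which gives a quick check on any implementation.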
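Table 1. 5% critical values ($cv^{SW}_{0.05}$) for a single endogenous regressor, 2SLS bias.

| $k_z$ \ $B$ | 0.01 | 0.05 | 0.1 | 0.15 | 0.2 | 0.25 | 0.3 |
|---|---|---|---|---|---|---|---|
| 2 | 11.57 | 9.02 | 7.85 | 7.14 | 6.61 | 6.19 | 5.83 |
| 3 | 46.32 | 13.76 | 9.18 | 7.52 | 6.60 | 5.96 | 5.49 |
| 4 | 63.10 | 16.72 | 10.23 | 7.91 | 6.67 | 5.88 | 5.32 |
| 5 | 72.55 | 18.27 | 10.78 | 8.11 | 6.71 | 5.82 | 5.19 |
| 6 | 78.59 | 19.19 | 11.08 | 8.21 | 6.70 | 5.75 | 5.09 |
| 7 | 82.75 | 19.79 | 11.25 | 8.25 | 6.67 | 5.69 | 5.01 |
| 8 | 85.78 | 20.20 | 11.36 | 8.26 | 6.64 | 5.63 | 4.93 |
| 9 | 88.07 | 20.49 | 11.42 | 8.25 | 6.60 | 5.58 | 4.87 |
| 10 | 89.86 | 20.70 | 11.46 | 8.24 | 6.56 | 5.52 | 4.81 |
| 11 | 91.30 | 20.86 | 11.49 | 8.22 | 6.53 | 5.48 | 4.76 |
| 12 | 92.47 | 20.99 | 11.50 | 8.20 | 6.49 | 5.43 | 4.71 |
| 13 | 93.43 | 21.08 | 11.50 | 8.17 | 6.46 | 5.39 | 4.67 |
| 14 | 94.25 | 21.16 | 11.50 | 8.15 | 6.42 | 5.36 | 4.63 |
| 15 | 94.94 | 21.22 | 11.49 | 8.13 | 6.39 | 5.32 | 4.59 |
| 16 | 95.54 | 21.26 | 11.49 | 8.11 | 6.36 | 5.29 | 4.56 |
| 17 | 96.05 | 21.30 | 11.48 | 8.08 | 6.34 | 5.26 | 4.53 |
| 18 | 96.50 | 21.33 | 11.46 | 8.06 | 6.31 | 5.23 | 4.50 |
| 19 | 96.90 | 21.35 | 11.45 | 8.04 | 6.29 | 5.21 | 4.47 |
| 20 | 97.25 | 21.37 | 11.44 | 8.02 | 6.26 | 5.18 | 4.45 |
| 21 | 97.56 | 21.39 | 11.43 | 8.00 | 6.24 | 5.16 | 4.43 |
| 22 | 97.84 | 21.40 | 11.41 | 7.98 | 6.22 | 5.14 | 4.40 |
| 23 | 98.09 | 21.41 | 11.40 | 7.96 | 6.20 | 5.12 | 4.38 |
| 24 | 98.32 | 21.41 | 11.39 | 7.94 | 6.18 | 5.10 | 4.36 |
| 25 | 98.53 | 21.42 | 11.38 | 7.93 | 6.16 | 5.08 | 4.35 |
| 26 | 98.71 | 21.42 | 11.36 | 7.91 | 6.15 | 5.06 | 4.33 |
| 27 | 98.88 | 21.42 | 11.35 | 7.90 | 6.13 | 5.05 | 4.31 |
| 28 | 99.04 | 21.42 | 11.34 | 7.88 | 6.11 | 5.03 | 4.30 |
| 29 | 99.18 | 21.42 | 11.32 | 7.87 | 6.10 | 5.02 | 4.28 |
| 30 | 99.31 | 21.42 | 11.31 | 7.85 | 6.08 | 5.00 | 4.27 |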
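Critical values of this kind can be reproduced numerically. The sketch below rests on an assumption that is not stated in this part of the text: that, under weak-instrument asymptotics with a single endogenous regressor, $k_z F$ is distributed as noncentral chi-square with $k_z$ degrees of freedom and noncentrality $\mu_0^2$. The function name `critical_value` is hypothetical, and SciPy's `ncx2` is used for the quantile.

```python
from scipy.stats import ncx2


def critical_value(k_z: int, mu0_sq_over_kz: float, alpha: float = 0.05) -> float:
    """Upper-alpha critical value for the first-stage F-statistic.

    Assumption (labelled, not taken from the paper's code): k_z * F follows a
    noncentral chi-square with k_z degrees of freedom and noncentrality mu0^2
    at the boundary of the null hypothesis.
    """
    noncentrality = k_z * mu0_sq_over_kz          # mu0^2
    quantile = ncx2.ppf(1.0 - alpha, df=k_z, nc=noncentrality)
    return float(quantile) / k_z                   # rescale back to the F scale


if __name__ == "__main__":
    # Inputs are the mu0^2 / k_z values reported for B = 0.01 in Table 3.
    print(round(critical_value(3, 33.674), 2))
    print(round(critical_value(2, 4.6052), 2))
```

Under this assumption, the inputs $\mu_0^2/k_z$ from Tables 3 and 4 should map to the entries of Table 1 up to rounding.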
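Table 2. Differences: $cv^{SW}_{0.05} - cv^{SY}_{0.05}$.

| $k_z$ \ $B$ | 0.05 | 0.10 | 0.20 | 0.30 |
|---|---|---|---|---|
| 3 | 0.15 | −0.10 | −0.14 | −0.10 |
| 4 | 0.13 | 0.04 | 0.04 | 0.02 |
| 5 | 0.10 | 0.05 | 0.06 | 0.06 |
| 6 | 0.09 | 0.04 | 0.06 | 0.06 |
| 7 | 0.07 | 0.04 | 0.06 | 0.07 |
| 8 | 0.05 | 0.03 | 0.05 | 0.06 |
| 9 | 0.04 | 0.04 | 0.05 | 0.05 |
| 10 | 0.04 | 0.03 | 0.05 | 0.05 |
| 11 | 0.04 | 0.01 | 0.03 | 0.04 |
| 12 | 0.02 | 0.02 | 0.04 | 0.04 |
| 13 | 0.02 | 0.02 | 0.03 | 0.04 |
| 14 | 0.02 | 0.02 | 0.03 | 0.04 |
| 15 | 0.01 | 0.02 | 0.03 | 0.04 |
| 16 | 0.02 | 0.01 | 0.03 | 0.03 |
| 17 | 0.01 | 0.01 | 0.02 | 0.03 |
| 18 | 0.01 | 0.02 | 0.02 | 0.03 |
| 19 | 0.01 | 0.01 | 0.02 | 0.04 |
| 20 | 0.01 | 0.01 | 0.02 | 0.03 |
| 21 | 0.00 | 0.01 | 0.02 | 0.03 |
| 22 | 0.00 | 0.01 | 0.02 | 0.03 |
| 23 | 0.00 | 0.01 | 0.02 | 0.03 |
| 24 | 0.00 | 0.01 | 0.02 | 0.03 |
| 25 | 0.00 | 0.00 | 0.02 | 0.02 |
| 26 | 0.00 | 0.01 | 0.01 | 0.02 |
| 27 | 0.00 | 0.01 | 0.01 | 0.03 |
| 28 | 0.00 | 0.02 | 0.02 | 0.02 |
| 29 | 0.00 | 0.01 | 0.01 | 0.03 |
| 30 | 0.00 | 0.01 | 0.01 | 0.02 |

Note: The values of $cv^{SY}_{0.05}$ are taken from Stock and Yogo (2005), Table 5.1.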
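Table 3. Simulation results for $k_z = 3$ and $k_z = 2$.

| | $B = 0.01$: mean | std dev | $B = 0.05$: mean | std dev | $B = 0.10$: mean | std dev | $B = 0.20$: mean | std dev |
|---|---|---|---|---|---|---|---|---|
| **$k_z = 3$** | | | | | | | | |
| $\hat\beta_{OLS}$ | 1.4950 | 0.0086 | 1.4989 | 0.0087 | 1.4994 | 0.0087 | 1.4997 | 0.0087 |
| $\hat\beta_{2SLS}$ | 1.0054 | 0.0998 | 1.0241 | 0.2222 | 1.0506 | 0.3161 | 1.1025 | 0.4276 |
| $F$ | 34.713 | 6.7626 | 8.0336 | 3.1828 | 4.7849 | 2.3952 | 3.0948 | 1.8630 |
| rel bias | 0.0108 | | 0.0482 | | 0.1014 | | 0.2052 | |
| $\mu_0^2 / k_z$ | 33.674 | | 7.0445 | | 3.7754 | | 2.0902 | |
| cv $F$ | 46.316 | | 13.765 | | 9.1815 | | 6.5960 | |
| rej freq $F$ | 0.0515 | | 0.0505 | | 0.0508 | | 0.0511 | |
| **$k_z = 2$** | | | | | | | | |
| $\hat\beta_{OLS}$ | 1.4996 | 0.0087 | 1.4997 | 0.0087 | 1.4997 | 0.0087 | 1.4998 | 0.0087 |
| $\hat\beta_{2SLS}$ | 1.0056 | 0.4398 | 1.0256 | 0.7195 | 1.0519 | 0.9651 | 1.0981 | 1.1404 |
| $F$ | 5.6124 | 3.1989 | 4.0004 | 2.6492 | 3.2963 | 2.3746 | 2.6011 | 2.0502 |
| rel bias | 0.0111 | | 0.0513 | | 0.1039 | | 0.1962 | |
| $\mu_0^2 / k_z$ | 4.6052 | | 2.9957 | | 2.3026 | | 1.6094 | |
| cv $F$ | 11.572 | | 9.0232 | | 7.8521 | | 6.6087 | |
| rej freq $F$ | 0.0509 | | 0.0507 | | 0.0505 | | 0.0498 | |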
Notes: Sample size n = 10,000, number of Monte Carlo replications is 100,000.
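Table 4. Values for $\mu_0^2 / k_z$ corresponding to Table 1.

| $k_z$ \ $B$ | 0.01 | 0.05 | 0.1 | 0.15 | 0.2 | 0.25 | 0.3 |
|---|---|---|---|---|---|---|---|
| 2 | 4.605 | 2.996 | 2.303 | 1.897 | 1.609 | 1.386 | 1.204 |
| 3 | 33.674 | 7.045 | 3.775 | 2.677 | 2.090 | 1.706 | 1.426 |
| 4 | 50.000 | 10.000 | 5.000 | 3.329 | 2.483 | 1.960 | 1.599 |
| 5 | 59.799 | 11.793 | 5.784 | 3.774 | 2.761 | 2.144 | 1.724 |
| 6 | 66.332 | 12.991 | 6.315 | 4.081 | 2.958 | 2.277 | 1.816 |
| 7 | 70.998 | 13.848 | 6.696 | 4.304 | 3.102 | 2.375 | 1.885 |
| 8 | 74.498 | 14.491 | 6.982 | 4.472 | 3.212 | 2.450 | 1.938 |
| 9 | 77.221 | 14.992 | 7.205 | 4.604 | 3.298 | 2.510 | 1.980 |
| 10 | 79.398 | 15.392 | 7.384 | 4.709 | 3.367 | 2.558 | 2.014 |
| 11 | 81.180 | 15.720 | 7.531 | 4.796 | 3.424 | 2.597 | 2.043 |
| 12 | 82.665 | 15.993 | 7.653 | 4.868 | 3.471 | 2.630 | 2.066 |
| 13 | 83.922 | 16.224 | 7.756 | 4.929 | 3.511 | 2.658 | 2.086 |
| 14 | 84.999 | 16.423 | 7.845 | 4.981 | 3.546 | 2.682 | 2.104 |
| 15 | 85.932 | 16.594 | 7.922 | 5.027 | 3.576 | 2.703 | 2.119 |
| 16 | 86.749 | 16.745 | 7.989 | 5.067 | 3.602 | 2.721 | 2.132 |
| 17 | 87.470 | 16.877 | 8.048 | 5.102 | 3.626 | 2.738 | 2.144 |
| 18 | 88.110 | 16.995 | 8.101 | 5.133 | 3.646 | 2.752 | 2.154 |
| 19 | 88.683 | 17.101 | 8.148 | 5.161 | 3.665 | 2.765 | 2.163 |
| 20 | 89.199 | 17.196 | 8.191 | 5.186 | 3.681 | 2.777 | 2.172 |
| 21 | 89.666 | 17.281 | 8.229 | 5.209 | 3.697 | 2.787 | 2.179 |
| 22 | 90.090 | 17.360 | 8.264 | 5.230 | 3.710 | 2.797 | 2.186 |
| 23 | 90.477 | 17.431 | 8.296 | 5.249 | 3.723 | 2.806 | 2.193 |
| 24 | 90.833 | 17.496 | 8.326 | 5.266 | 3.734 | 2.814 | 2.198 |
| 25 | 91.159 | 17.556 | 8.353 | 5.282 | 3.745 | 2.821 | 2.204 |
| 26 | 91.461 | 17.612 | 8.377 | 5.297 | 3.755 | 2.828 | 2.209 |
| 27 | 91.740 | 17.663 | 8.400 | 5.311 | 3.764 | 2.834 | 2.213 |
| 28 | 91.999 | 17.711 | 8.422 | 5.323 | 3.772 | 2.840 | 2.217 |
| 29 | 92.241 | 17.755 | 8.442 | 5.335 | 3.780 | 2.846 | 2.221 |
| 30 | 92.466 | 17.797 | 8.460 | 5.346 | 3.787 | 2.851 | 2.225 |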
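Entries such as those in Table 4 can be recovered by inverting the relative-bias function numerically. The following is a minimal sketch, assuming $B = {}_1F_1(1; k_z/2; -\mu^2/2)$ as in the Figure 1 caption and assuming that this function decreases monotonically in $\mu^2$ for $k_z \geq 2$, so that a bracketing root finder applies. The function name `mu0_sq_over_kz` is hypothetical; SciPy's `brentq` and `hyp1f1` are used.

```python
from scipy.optimize import brentq
from scipy.special import hyp1f1


def mu0_sq_over_kz(k_z: int, bias: float) -> float:
    """Solve 1F1(1; k_z/2; -mu^2/2) = bias for mu^2 and return mu^2 / k_z.

    Assumes 0 < bias < 1 and k_z >= 2 (a labelled assumption of this sketch).
    """
    def gap(mu_sq: float) -> float:
        return float(hyp1f1(1.0, k_z / 2.0, -mu_sq / 2.0)) - bias

    # gap(0) = 1 - bias > 0; double the upper end until the sign changes.
    upper = 1.0
    while gap(upper) > 0.0:
        upper *= 2.0
    return brentq(gap, 0.0, upper) / k_z


if __name__ == "__main__":
    print(round(mu0_sq_over_kz(4, 0.05), 3))
    print(round(mu0_sq_over_kz(2, 0.10), 3))
```

The doubling step keeps the hypergeometric function away from needlessly large negative arguments, which is a design choice of this sketch rather than anything prescribed in the paper.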