On the Stock–Yogo Tables

A standard test for weak instruments compares the first-stage F-statistic to a table of critical values obtained by Stock and Yogo (2005) using simulations. We derive a closed-form solution for the expectation from which these critical values are derived, and present some second-order asymptotic approximations that may be of value in the presence of multiple endogenous regressors. Inspection of this new result provides insights not available from simulation, and will allow software implementations to be generalised and improved. Finally, we explore the calculation of p-values for the first-stage F-statistic weak instruments test.


Introduction
In a seminal contribution, Phillips (1989) focussed the attention of the profession on the distributional consequences of what has come to be known as the problem of weak instruments. There followed a series of important papers that helped crystallise the consequences of this problem for inference including, but not restricted to, Startz (1990a, 1990b); Buse (1992); Cragg and Donald (1993); Dufour (1997) and Kleibergen (2002). There have been many responses seeking to address the issues raised in this literature, some with greater merit than others. Phillips continues to make substantial contributions to this area of the literature; see, for example, Phillips (2016, 2017) and Phillips and Gao (2017). One development that, for better or worse,¹ has had significant practical impact, as a consequence of its inclusion in widely used econometric software, is that of Stock and Yogo (2005), hereafter SY. It is this latter development which is the focus of this paper.
The fundamental contribution of SY was to develop quantitative definitions of weak instruments, based on either IV estimator bias or Wald test size distortion, that were testable. Their idea was to relate the first-stage F-statistic (or, when there are multiple endogenous regressors, the Cragg-Donald (Cragg and Donald 1993) statistic) to a non-centrality parameter that, in turn, was related to the aforementioned estimator bias or test size distortion. In this way, they were able to use this F-statistic to test whether instruments were weak.² That such tests have become part of the toolkit of many […] refer to this as the general case. We find that we are able to draw on a variety of results developed in the exact finite sample literature. None of our results are exact, they are all asymptotic in nature, but they share a common structure with the earlier work that we are able to exploit. In addition to obtaining results for the relative bias of 2SLS in the general case, we also provide first- and second-order asymptotic approximations to the relative bias, where the nesting sequence is the number of instruments. Overall, the analysis of the general case is less favourable to the SY approach than is the scalar case.

¹ Much of the later literature has focussed less on testing for the presence of weak instruments and more on the development of techniques that are robust to the presence of weak instruments.

² A heuristically appealing aspect of using the first-stage F-statistic as a measure of instrument weakness, in the case of a single endogenous regressor, is its consistency with the well-known Staiger-Stock rule of thumb. Staiger and Stock (1997), p. 557, suggested that instruments be deemed weak if the first-stage F is less than 10. SY (pp. 101-2) observe that 10 corresponds […]
Final remarks appear in Section 8. For the most part, we have relegated technical developments to the various appendices, as well as discussion of some matters deemed secondary to the main ideas of the paper.

An Analytic Development of Stock-Yogo
Consider the simple model

y = xβ + u,   (1)

where y = [y_1, . . . , y_n]′, x = [x_1, . . . , x_n]′ and u = [u_1, . . . , u_n]′ are n × 1 vectors, with n the number of observations. The regressor x is assumed endogenous, so that E[u|x] ≠ 0. Other exogenous regressors in the model, including the constant, have been partialled out. We can implicitly define a set of instruments via the linear projection

x = Zπ + v,   (2)

where Z is an n × k_z matrix of instruments (with full column rank), π a k_z × 1 vector of parameters and v an n × 1 error vector. In this model, k_z − 1 is the degree of over-identification. We assume that individual observations are independently and identically distributed, where z_i′ denotes the ith row of Z. A test for H_0: π = 0 against H_1: π ≠ 0 is based on the so-called first-stage F-statistic

F = π̂′Z′Zπ̂ / (k_z σ̂_v²),   (3)

where π̂ = (Z′Z)^{−1}Z′x and σ̂_v² = n^{−1} x′(I_n − Z(Z′Z)^{−1}Z′)x. Here, a large value of the statistic is evidence against the null hypothesis, which is that the nominated instruments are irrelevant.
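For concreteness, the statistic just defined is straightforward to compute from data; the following is a minimal sketch (the data-generating values are arbitrary illustrations, not taken from the paper):

```python
import numpy as np

def first_stage_F(x, Z):
    """First-stage F-statistic for H0: pi = 0 in x = Z pi + v."""
    n, kz = Z.shape
    pihat = np.linalg.solve(Z.T @ Z, Z.T @ x)      # pihat = (Z'Z)^{-1} Z'x
    resid = x - Z @ pihat
    sigma2v = resid @ resid / n                    # n^{-1} x'(I - Z(Z'Z)^{-1}Z')x
    return (pihat @ (Z.T @ Z) @ pihat) / (kz * sigma2v)

rng = np.random.default_rng(0)
n, kz = 1000, 3
Z = rng.standard_normal((n, kz))
x = Z @ np.full(kz, 0.1) + rng.standard_normal(n)  # modestly informative instruments
F = first_stage_F(x, Z)
```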
Following Staiger and Stock (1997), we consider values of π local to zero, π = c/√n for a fixed vector c. For the concentration parameter µ²_n we then obtain

µ² = lim_{n→∞} µ²_n = c′Q_zz c / σ_v²,

where Q_zz = E[z_i z_i′] = plim_{n→∞} n^{−1}Z′Z is positive definite by assumption. We see that k_z F is a sample analogue of µ². With this formulation, the testing problem previously discussed is equivalent to that of testing H_0: µ² = 0 against H_1: µ² > 0. Rather than testing for the irrelevance of instruments, SY characterised weak instruments as a situation where µ² was greater than zero but proximate to it. Specifically, their testing problem can be thought of as H_0: µ² = µ²_0 > 0 against H_1: µ² > µ²_0, for some suitably specified value of µ²_0. The statistic F is still a natural one in this problem; although, of course, the null distribution is no longer the central distribution associated with µ²_0 = 0. Instead, we have

k_z F →_d χ²_{k_z, µ²_0},   (4)

where χ²_{k,δ} denotes a random variable following a non-central chi-squared distribution with k degrees of freedom and non-centrality parameter δ ≥ 0.⁴ Let χ_α denote the (1 − α)100th quantile of a non-central chi-squared distribution with k_z degrees of freedom and non-centrality parameter µ²_0. Then, the relevant size α critical region is

F > cv_α ≡ χ_α/k_z,   (5)

where χ_α can be obtained, for given µ²_0 and k_z, as the solution to

Pr[χ²_{k_z, µ²_0} ≥ χ_α] = α,   (6)

or, equivalently, to a series representation of this probability in terms of confluent hypergeometric functions, where 1F1(·; ·; ·) denotes a confluent hypergeometric function (Slater 1960). The aspect of the SY approach that remains outstanding is the choice of µ²_0. Their quantitative definition of the weakness of a set of instruments is couched in terms of the impact that it has on inference. They provided two possible definitions that variously reflect the known consequences of weak instruments for (i) estimation, through the bias of the estimator, and (ii) hypothesis testing, through the size of a particular Wald test relative to its nominal size. It is the former that is in most common use and the approach of interest here.⁵
In particular, SY relate the bias of the 2SLS estimator of β, β̂_2SLS, relative to that of the ordinary least squares estimator of β, β̂_OLS, to the first-stage F-statistic by showing that both are related to µ². A value for µ², denoted µ²_0, is then chosen to allow a certain level of relative bias. Specifically, let B_n denote the relative bias for a given sample size n, i.e.,

B_n = |E[β̂_2SLS,n − β]| / |E[β̂_OLS,n − β]|.

As discussed in Chao and Swanson (2007), if there exists a positive integer N < ∞ such that⁶

sup_{n≥N} E|β̂_2SLS,n − β|^{1+δ} < ∞,   (7)

for some δ > 0, then the limit of the sequence of finite sample biases will coincide with the bias computed from the local-to-zero asymptotic distribution. That is,

B ≡ lim_{n→∞} B_n = |E[ξ′(ξ − λ_0)/(ξ′ξ)]|,   (8)

where ξ ∼ N(λ_0, I_{k_z}) and µ²_0 = λ_0′λ_0. The test for weak instruments then proceeds as follows:

1. The practitioner chooses a value for |B|, e.g., |B| = 0.1, if an asymptotic relative bias of less than 10% is deemed acceptable.
2. Given |B| and k_z, the value µ²_0 is obtained by solving (8) for the non-centrality parameter.
3. Given µ²_0, critical values for F can be determined, which are proportional to those of the non-central chi-squared distribution as specified in (4).
4. The null of weak instruments is then rejected for sufficiently large values of the first-stage F-statistic, and we conclude that |B| is no larger than the value chosen in Step 1 above.

⁴ Some references specify the non-centrality parameter for a non-central chi-squared distribution as δ, whereas others specify it as δ/2. We have adopted the former convention here.

⁵ The exact details of these arguments can be found in SY and will not be repeated here.

⁶ We thank an anonymous referee for bringing this subtlety to our attention. For a more complete discussion of this point, we refer the reader to the discussion of Chao and Swanson (2007, pp. 518-19) and the references cited therein.
The difficulty in the procedure just described is that, at Step 2, there is an integral that must be evaluated as part of a search for µ²_0. SY do this using a 20,000 draw Monte Carlo simulation. This is unnecessary as the integral can be solved analytically. The result is summarised in the following theorem.

Theorem 1. If B is defined by (8), then, provided k_z ≥ 2,

0 < B = 1F1(1; k_z/2; −µ²_0/2) ≤ 1.   (9)
Proof. The result follows immediately from Theorem A1, in Appendix A, which establishes the equality, and from the observation that if b ≥ a > 0 but s < 0 then 0 < 1F1(a; b; s) ≤ 1, which establishes the inequality.⁷

That Theorem 1 involves a confluent hypergeometric function is not surprising, as such functions have long figured prominently in the finite sample literature; see, for example, Phillips (1980) and the papers cited therein. These functions have been intensively studied in the mathematics literature over a period of hundreds of years, and so an important consequence of Theorem 1 is that it allows the use of efficiently programmed intrinsic functions in readily available software, such as MATLAB (MathWorks 2016), at each step of a search for µ²_0, rather than having to estimate an integral by simulation.⁸ For the special case of k_z = 2,

1F1(1; 1; −µ²_0/2) = exp(−µ²_0/2),

making evaluation of the expression especially simple.

⁷ It should be noted that the proof provided is not the only one possible, and we would like to thank helpful referees for drawing various alternatives to our attention. For example, in an elegant paper, Chao and Swanson (2007), Proposition 3.1 and Lemma 3.3, respectively, derive local-to-zero approximations for each of lim_{n→∞} E[β̂_2SLS,n − β] and lim_{n→∞} E[β̂_OLS,n − β], whence derivation of the ratio is straightforward. Similarly, there are finite sample papers in the literature from which it would be possible to start a proof along the lines of the one presented, but at a more advanced point (see, for example, Forchini and Hillier 2003, Equation B.13). However, we favour the proof presented for two reasons. First, it is a direct continuation of the developments of Stock and Yogo (2005), Equation 3.1, and the discussion immediately thereafter. Second, when viewed in the correct light, there are much earlier antecedents that take precedence over the two mentioned here. We discuss this further in Section 6.

⁸ In the absence of such intrinsic functions, computational aspects of hypergeometric functions are discussed in Johansson (2016).
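To illustrate, the search for µ²_0 that Theorem 1 enables can be sketched in a few lines. The sketch below takes the relative bias to equal 1F1(1; k_z/2; −µ²_0/2), as in (9), and uses SciPy's `hyp1f1` and non-central chi-squared routines in place of the MATLAB intrinsics mentioned above; the bracketing interval and tolerances are our own choices:

```python
from scipy.optimize import brentq
from scipy.special import hyp1f1
from scipy.stats import ncx2

def sy_critical_value(B, kz, alpha=0.05):
    """Critical value for the first-stage F weak-instruments test.

    Solves 1F1(1; kz/2; -m/2) = B for the non-centrality parameter m = mu_0^2,
    then returns the (1 - alpha) quantile of chi^2(kz, m), scaled by 1/kz.
    """
    mu2_0 = brentq(lambda m: hyp1f1(1.0, kz / 2.0, -m / 2.0) - B, 1e-8, 2e3)
    return ncx2.ppf(1.0 - alpha, df=kz, nc=mu2_0) / kz, mu2_0

# SY's simulated value for this cell (kz = 3, B = 0.10, alpha = 0.05) is 9.08.
cv, mu2_0 = sy_critical_value(B=0.10, kz=3)
```

For k_z = 2 the hypergeometric function collapses to exp(−µ²_0/2), so the same search covers the extended row of Table 1.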
Econometrics 2018, 6, 44

Using our result, we provide in Table 1 an extended version of that panel of SY Table 5.1 corresponding to a single endogenous variable, which is the set of critical values most commonly used. We note that SY start their tables at k_z = 3 even though, following the arguments of Kinal (1980), finite biases will exist for all k_z ≥ 2 if one is prepared to make a normality assumption. As this is a practically relevant case, we include it in Table 1. However, such inclusion is not without controversy. The mode of convergence leading to (8) is convergence in distribution. Existence of moments is not sufficient to imply convergence in expectation, which is a stronger result (see (7) and the accompanying discussion). Heuristically, (7) might be interpreted as meaning that a little more than simply the existence of the moments of the estimators is required for the sequence of biases to converge to the local-to-zero asymptotic results, and so this might be achieved by requiring k_z ≥ 3 in the case of a single endogenous regressor rather than just k_z ≥ 2, although we note that (7) does not actually say this. In any event, the inclusion in Table 1 of the row k_z = 2 may be viewed as something of an ad hoc approximation. Some confidence in the value of the approximation may be garnered from the simulation results presented in Section 4.
Where Table 1 overlaps with SY (Table 5.1), we are able to provide an indication of the difference made by the analytical evaluation of the expectation in (8). As shown in Table 2, the differences are typically small, with the largest differences occurring when k_z and B are themselves small, which we would argue is the most important case in practice.⁹

⁹ We have also computed simulated critical values from 20,000 random draws as in SY, but repeating the exercise 1000 times. The resulting mean critical values are virtually identical to those in Table 1, with the maximum difference being 0.02.

Some Further Consequences of Theorem 1
Theorem 1 allows us to prove a variety of further results that can only be speculated about on the basis of simulation results.
Remark 1. Implicit in Theorem 1 is the observation that, whenever k_z ≥ 2, OLS and 2SLS are always asymptotically biased in the same direction, making the absolute value function in the definition of B in (8) redundant.

Remark 2. The values of the limiting relative biases of β̂_2SLS and β̂_OLS are explored in Figure 1 for different values of the parameters k_z and µ²/2. The figure illustrates that, for k_z ≥ 2, the function is increasing in its argument, which is −µ²/2. Note also that, as µ² → 0, the information in the instruments approaches zero, and so the local-to-zero asymptotic bias of β̂_2SLS approaches that of β̂_OLS from below. Hence, the limit of the relative asymptotic biases at µ² = 0 is unity, which is the value of 1F1(1; k_z/2; 0).

Monotonicity properties of the critical values presented in Table 1 are readily established, as illustrated by the following result.

Theorem 2. For given k_z ≥ 2 and α, the critical values cv_α presented in Table 1 are decreasing functions of B.
Heuristically, Theorem 2 states that the critical values will necessarily decrease as one moves from left to right across any given row of Table 1; that is, the critical values decrease as the practitioner is willing to accept increasing amounts of 2SLS bias relative to that of OLS. The intuition behind the result is as follows. An increase in B for fixed k_z implies that the argument of the confluent hypergeometric function in (9) must increase, i.e., that µ²/2 must decrease. As µ² approaches zero, the non-central chi-squared distribution from which critical values are drawn approaches a central chi-squared and the corresponding quantiles become smaller. Hence, as one moves across columns from left to right in Table 1, the cv_α become smaller.

Theorem 2 explains the row behaviour of Table 1. Explaining the column behaviour is much more complicated. Observation suggests the following to be true.

Conjecture 1. For given B, the critical values cv_α presented in Table 1 are increasing functions of k_z up to some value, k̄ say, whereafter they are decreasing functions of k_z. k̄ is a decreasing function of B.

Some intuition for Conjecture 1 is available from the definition of cv_α, see (5), if one considers the impact of increasing the number of instruments by one, from k_z to k_z + 1, with superscripts '0' and '1' distinguishing the two cases, respectively. For given B and α, k̄ is then that value of k_z after which the cv_α start diminishing.
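Conjecture 1 can also be explored numerically. The sketch below traces the critical values across k_z for fixed B, taking the relative bias to equal 1F1(1; k_z/2; −µ²_0/2) as in Theorem 1; the grid and bracketing interval are our own choices:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import hyp1f1
from scipy.stats import ncx2

def cv_alpha(B, kz, alpha=0.05):
    """Critical value cv_alpha implied by Theorem 1 for given B and kz."""
    mu2_0 = brentq(lambda m: hyp1f1(1.0, kz / 2.0, -m / 2.0) - B, 1e-8, 2e3)
    return ncx2.ppf(1.0 - alpha, df=kz, nc=mu2_0) / kz

kz_grid = np.arange(2, 41)
cvs = np.array([cv_alpha(0.10, int(kz)) for kz in kz_grid])
kbar = int(kz_grid[np.argmax(cvs)])   # the turning point of Conjecture 1
```

An interior maximiser of `cvs`, rather than a maximiser at either end of the grid, is consistent with the hump shape the conjecture describes.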
Remark 4. Although B does not exist when k_z = 1, the confluent hypergeometric function of Theorem 1 remains well-defined. In Appendix D, we analyse the properties of 1F1(1; 1/2; −µ²/2).

Some Monte Carlo Results
We follow Sanderson and Windmeijer (2016) and specify the model as in (1) and (2), with β = 1. The instruments in Z are k_z independent standard normally distributed random variables and π = c_{Bk_z} ι_{k_z}/√n, where ι_{k_z} is a k_z-vector of ones, and with c_{Bk_z} chosen such that the relative bias B is equal to 0.01, 0.05, 0.10 or 0.20, for values of k_z = 3 and k_z = 2. The sample size is n = 10,000 and the results are presented in Table 3 for 100,000 Monte Carlo replications. For k_z = 3, the results are exactly in line with the theory: the Monte Carlo relative biases are equal to B and the rejection frequencies of the first-stage F-test are 5% at the 5% nominal level, using the critical values reported in Table 1.
The results for k_z = 2 are also in line with the theory, although we see here that the standard deviations of β̂_2SLS are much larger than those of the k_z = 3 case at the same values of B. This is because the information needed to obtain the same relative bias is much smaller in the k_z = 2 case than in the k_z = 3 case, as reflected by the smaller µ²_0/k_z values, but it also reflects the problem that the second moment does not exist when the degree of over-identification is equal to 1. The interquartile ranges for the 2SLS estimator when k_z = 2 are 0.3296, 0.4170, 0.4811 and 0.5570 for B = 0.01, 0.05, 0.10 and 0.20, respectively. These Monte Carlo results therefore confirm our theoretical findings for the k_z = 2 case. Clearly, some caution should be exercised when working with 2SLS in this case because it possesses no second moment.
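The flavour of this experiment can be reproduced in miniature. The sketch below uses a smaller n and fewer replications than in the text, fixes σ_v² = 1, and sets µ²_0 directly rather than solving for c_{Bk_z}; under the null µ² = µ²_0, the rejection frequency at the 5% level should be close to 5%:

```python
import numpy as np
from scipy.stats import ncx2

rng = np.random.default_rng(42)
n, kz, alpha, reps = 500, 3, 0.05, 2000
mu2_0 = 10.0                                   # illustrative non-centrality under the null
pi = np.full(kz, np.sqrt(mu2_0 / (n * kz)))    # chosen so that n * pi'pi / sigma_v^2 = mu2_0
chi_alpha = ncx2.ppf(1 - alpha, df=kz, nc=mu2_0)

rejections = 0
for _ in range(reps):
    Z = rng.standard_normal((n, kz))
    x = Z @ pi + rng.standard_normal(n)        # first stage with sigma_v^2 = 1
    pihat = np.linalg.solve(Z.T @ Z, Z.T @ x)
    resid = x - Z @ pihat
    F = (pihat @ (Z.T @ Z) @ pihat) / (kz * (resid @ resid) / n)
    rejections += kz * F > chi_alpha
rejection_rate = rejections / reps
```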

p-Values
p-values are readily available as a straightforward extension of our earlier analysis. Specifically, from (4), we have the limiting result that k_z F converges in distribution to χ²_{k_z, µ²_0}. For any particular sample value of the F-test, say F, if X ∼ χ²_{k_z, µ²_0}, then the p-value for the SY weak instruments test considered in this paper is simply Pr[X ≥ k_z × F]. Of course, the problem here is the determination of µ²_0. Table 4 reports those values of µ²_0/k_z that were calculated in order to construct Table 1. For those values of B considered in Table 1, we now have the parameters k_z and µ²_0/k_z. Consequently, any computer software that can evaluate a non-central chi-squared cdf can readily calculate p-values for the test for weak instruments considered here.
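As noted, any software exposing the non-central chi-squared cdf suffices. A sketch with SciPy (the values of F and µ²_0 below are illustrative only; they are not taken from Table 4):

```python
from scipy.stats import ncx2

def sy_pvalue(F, kz, mu2_0):
    """p-value Pr[X >= kz * F] with X ~ chi^2(kz, mu2_0)."""
    return ncx2.sf(kz * F, df=kz, nc=mu2_0)

p = sy_pvalue(F=15.0, kz=3, mu2_0=10.0)   # illustrative inputs
```

By construction, evaluating the p-value at the corresponding critical value returns α.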

Multiple Endogenous Regressors
The general case is obtained by defining x in (1) to be a matrix of endogenous regressors of dimension n × k_x, say, so that β is k_x × 1. Then, (2) becomes a multivariate regression model with π of dimension k_z × k_x and v of dimension n × k_x. The rows of [u, v] are independent with common covariance matrix Σ. All other aspects of the model described in (1) and (2) remain essentially unchanged.¹¹ In terms of the SY analysis, it clearly makes no sense to proceed in terms of the ratio B of (8), given that β is now a vector rather than a scalar quantity. Thus, instead, they focus attention on a quadratic form (SY, Equation 3.8) in which h is a matrix analogue of the expectation explored in Theorem 1 and γ = ρ(ρ′ρ)^{−1/2}, with ρ defined in SY (p. 85). The essential feature of γ is that it is not a function of λ. Despite being somewhat more tractable, the quadratic form tells us no more about the non-centrality parameter determining the bias of the 2SLS estimator than does, say, the vector of expectations J, where the elements of the k_z × k_x matrix ξ are jointly distributed according to (13), with λ as defined in SY (p. 85).¹² Again, we should stress that the normality of ξ is a consequence of the local-to-zero asymptotic analysis and is not a strong distributional assumption. As γ is independent of λ, it is sufficient to focus attention on the vector of expectations J. If we let e_i denote the ith row of the identity matrix I_{k_x}, then we have immediately¹³ the expansion (14) for the ith element of J.¹⁴ Here, φ denotes ordered partitions of j into no more than k_x parts, so that φ = (φ_1, φ_2, . . . , φ_p), where the integers φ_1, φ_2, . . . , φ_p satisfy the restrictions (i) φ_1 ≥ φ_2 ≥ · · · ≥ φ_p, (ii) ∑_{i=1}^p φ_i = j, and (iii) p ≤ k_x. The symbol ∑_φ then denotes the sum over all such partitions of j. For example, the ordered partitions of 2 are the so-called top partition [2] and (1, 1).¹⁵ The invariant polynomials C_ψ^{φ,1}(·, ·) and the symbol ∑_{ψ∈φ·[1]} are defined in Davis (1979), p. 465, with θ_ψ^{φ,1} = C_ψ^{φ,1}(I_{k_z}, I_{k_z})/C_ψ(I_{k_z}) a constant that may be zero. These were developed as extensions (to two matrix arguments) of the zonal polynomials C_ψ(·) originally due to James (1961).¹⁶ Finally, the generalised hypergeometric coefficients are defined as in Constantine (1963), Equation (26).

Unfortunately, the invariant polynomials appearing in (14) are computationally problematic. The available evidence suggests that the series are typically slow to converge (Phillips 1983a, 1983b). Moreover, the invariant polynomials of Davis are tabulated only to low order and, to date, no algorithms have been derived for their computation.¹⁷ Consequently, until such time as the computational restrictions are lifted, the practical relevance of the result is limited and it is offered here only for completeness. Better progress might be made working with one of the various approximation techniques that are available, e.g., the Laplace approximation used by Phillips (1983b) to extract approximate marginal distributions for IV estimators in this more general setting. In this case, however, we can further adapt the results of Hillier et al. (1984) to obtain many-instrument approximations to (14). Specifically, analogous to their Equations (32) and (33), we obtain the first- and second-order approximations (15) and (16), where J_i denotes the ith element of J.¹⁸ Although these approximations are operational, in the presence of multiple endogenous regressors we question whether the approach of SY is as sound as it is in the single endogenous regressor case. Our concern is rooted in the structure of the Davis polynomials themselves. For matrix arguments X and Y with given indices k and l, respectively, the polynomials are 'linear combinations of the distinct products of traces (tr X^{a_1} Y^{b_1} X^{c_1} · · ·)^{r_1} (tr X^{a_2} Y^{b_2} X^{c_2} · · ·)^{r_2} · · · of total degree k, l in the elements of X, Y', respectively (Davis 1979, p. 468). It is immediately apparent that the local-to-zero asymptotic expression for the bias of 2SLS is not a function of the eigenvalues of the concentration parameter or, at least, not a function of them alone. Stock and Yogo (2005), p. 90, remark on this themselves when discussing certain numerical results, where they observe that 2SLS bias is decreasing in all eigenvalues of the concentration parameter for all values of (what we call) k_z. To focus on the smallest eigenvalue, as the Cragg-Donald statistic does, is, consequently, problematic.

¹¹ The complete set of assumptions is presented in SY (Section 2.4).

¹² In (13), vec(·) is the usual matrix operator that stacks all of the columns of its matrix argument into a single column vector; see, for example, Muirhead (1982), p. 17.

¹³ Make the substitutions {e_i, γ, k_z} for {α, β, ν}, respectively, in Hillier et al. (1984), Equation (30).

¹⁴ Please note that the definition of Λ adopted here is slightly different from the definitions used in either Hillier et al. (1984) or SY.

¹⁵ See Muirhead (1982), Section 7.2.1, for a much more complete treatment of ordered partitions.

¹⁶ The zonal polynomials appearing in (14) adopt a normalisation due to Constantine (1963), which typically leads to more compact expressions than do the polynomials originally proposed by James (1961).
Another way of thinking about this issue is to consider the problem of determining the magnitude of a matrix and to ask which of the following three matrices is largest or smallest: diag(1, 2, 3), diag(2, 2, 2), diag(0, 0, 6).
While consideration of the smallest eigenvalue will lead to a particular choice, it is not clear that this choice has much to do with the exact behaviour of the IV estimator, except in the scalar case. Indeed, for this reason, the SY results for multiple endogenous variable cases are only approximate and provide upper bounds on critical values for the Cragg-Donald minimum eigenvalue test. That the SY approach works in the case of a single endogenous regressor is a consequence of the commutative law of multiplication, which allows us to extract scalars from products in ways that we cannot in more general matrix settings. In summary, the SY approach results in a well-posed problem in the case of a single endogenous regressor but a somewhat poorly-posed problem in the case of multiple endogenous regressors.
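The point is easily made concrete: the three matrices above share the same trace, yet the smallest-eigenvalue criterion ranks the second 'largest' even though the third dominates in other respects. A small numerical sketch:

```python
import numpy as np

mats = [np.diag([1.0, 2.0, 3.0]), np.diag([2.0, 2.0, 2.0]), np.diag([0.0, 0.0, 6.0])]
traces = [float(np.trace(M)) for M in mats]                     # each equals 6
min_eigs = [float(np.linalg.eigvalsh(M).min()) for M in mats]   # 1, 2 and 0
```

A criterion based on the minimum eigenvalue alone cannot distinguish matrices that agree in their smallest eigenvalue but differ wildly elsewhere.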
If one must deal with multiple endogenous regressors, then we prefer the approach of Sanderson and Windmeijer (2016), who define weak identification as the rank of Π being local to a rank reduction of one, which is essentially a scalar problem. Framed in this way, the problem is again well-posed and the single endogenous variable results apply; only the degrees of freedom need be adjusted for the number of endogenous variables (see Sanderson and Windmeijer (2016) for details).

The Wisdom of Hindsight: Some Historical Remarks
As discussed in Footnote 7, helpful referees drew our attention to the fact that there are other ways that we might have approached this problem than the one that we initially chose. Indeed, once one recognises the structure of the problem, many results become available. However, before discussing some of these, we again stress that the results derived in the previous sections are all asymptotic in nature; there are no underlying exact distributional assumptions beyond those originally made in Staiger and Stock (1997). What these results share with the exact distribution literature is integrals with similar structures and a common approach to resolving them. Furthermore, the parameterisations adopted here differ from the canonical forms underlying the exact distributional results, and so the resultant expressions are different even if their structures are reminiscent of earlier results. A prime example is given by the similarities between the many-instrument, local-to-zero approximations of Equations (15) and (16) and the large-sample approximations of Hillier et al. (1984, Equations (32) and (33)).
It was noted in Footnote 7 that Chao and Swanson (2007) had derived local-to-zero asymptotic expressions for the bias of both 2SLS and OLS in the scalar case, so that they might bias correct the estimators, and that these provide an alternate path to Theorem 1. If moments were the focus of our attention, then we should note that the results of Chao and Swanson (2007) have the same structure as those of Richardson (1968) and references cited therein, who first derived moments in the scalar case with arbitrary numbers of instruments and proved Basmann's conjecture (Richardson 1968, Section 4.3) for the existence of moments. In the general case, we should, of course, be looking to Hillier et al. (1984) for results with the same structure as presented here and to Kinal (1980), who established existence criteria in the general case. Noting that the distributions of interest in the exact finite sample literature are different from the distributions thrown up by the local-to-zero asymptotics, with different parameters, it might be argued that the exact results for misspecified models are closer in spirit to what we have here. In this event, we might argue that results for moments are implicit (but unrecognisable) in Hale et al. (1980) or, in a far more recognisable form, in Knight (1982), for the scalar case, or Skeels (1995) in the general case. However, here moments are only of interest as an intermediate result used to obtain the relative bias, which in turn determines the non-centrality parameter of the non-central chi-squared distribution of interest in Step 3 of the procedure following (8). Perhaps of greater historical interest, in the scalar case, are the results of Richardson and Wu (1971), who explore the properties of the relative bias. However, these results are of limited interest for two reasons. First, and probably most important, because the parameterisation of their model differs from that generated by the local-to-zero asymptotics, their tabulated results are not comparable with what is done here. Second, to reiterate, the relative bias is only of interest to us inasmuch as it allows us, given certain other information, to choose a non-centrality parameter useful in determining SY critical values in the scalar case. That is, unlike for all of the above-mentioned papers, moments are only of tangential interest to us, as this is not a study of the estimators themselves. To the best of our knowledge, Section 6 provides the first treatment of the (local-to-zero asymptotic) relative bias in the general case.

Conclusions
The main contribution of this paper has been to evaluate analytically, as a special function, an integral that previously had to be evaluated by simulation. This integral is of independent interest in the theory of ratios of quadratic forms in normal variables. Here, it is of primary interest because it provides a functional relationship between the bias in the 2SLS estimator and the limiting sampling distribution of a test statistic that SY proposed for testing the presence of weak instruments, when the null of weak instruments is true. Analysis of this special function provides theoretical foundations for the remarks of Section 3, which explore patterns observed in Table 1 as the parameters B and k_z vary. This analysis required the derivation of certain results that are of independent interest in the theory of confluent hypergeometric functions. We have also explored the problem of p-values for the aforementioned test for weak instruments, on the basis of our earlier theoretical developments. We provide information such that any computer software that can evaluate a non-central chi-squared cdf can readily compute p-values in essentially all practical circumstances. The final contribution of this paper has been the analysis of the general case, characterised by an arbitrary number of endogenous regressors. Here, we find that the analysis is able to draw heavily on the foundations laid down in the literature on exact sampling distributions. This allows us to provide expressions for both the expectation of interest and first- and second-order many-instrument expansions of this expectation. The exact expression obtained for the integral of interest in the general case is not of great practical interest, as it involves invariant polynomials with two matrix arguments for which, at the time of writing, there exist no algorithms for their computation, except in special cases.
Nevertheless, the asymptotic expansions obtained are readily computable and potentially of practical importance. Given our reservations about the usefulness of the overall procedure in the general case, we leave such explorations to others.
One aspect of the SY tables that we have not addressed relates to those tables based on size distortions of a Wald statistic. This is a much more difficult analytical problem than the one treated here, and it is not clear that there is much benefit in tackling it: in our estimation, the bias tables are in much more frequent use, making them of greater practical relevance.
Finally, in support of the results presented in the paper, we provide two MATLAB programs on an 'as is' basis. The first of these, Table1.m, provides the body of Table 1. The second program, entitled sypval.m, provides p-values. Appendix C provides some discussion on the contents of these programs. The programs are available at https://sites.google.com/site/skeelscv/.
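To illustrate the claim that a non-central chi-squared cdf is all that is needed to compute p-values, the following Python sketch (our own construction, not a translation of sypval.m; the names ncchi2_cdf and sy_pvalue are ours) computes a p-value on the assumption that, under the null of weak instruments, k_z times the first-stage F-statistic is referred to a non-central χ²(k_z, µ₀²) distribution:

```python
import math

def ncchi2_cdf(x, k, lam, tol=1e-14):
    """CDF of the non-central chi-squared(k, lam), computed via its
    standard Poisson mixture of central chi-squared CDFs."""
    if x <= 0.0:
        return 0.0

    def reg_lower_gamma(a, z):
        # Regularised lower incomplete gamma P(a, z) by series expansion.
        term = 1.0 / a
        total = term
        n = 0
        while term > total * 1e-16:
            n += 1
            term *= z / (a + n)
            total += term
        return total * math.exp(a * math.log(z) - z - math.lgamma(a))

    total, w, j = 0.0, math.exp(-lam / 2.0), 0
    while True:
        # j-th Poisson weight times a central chi-squared(k + 2j) CDF
        total += w * reg_lower_gamma((k + 2 * j) / 2.0, x / 2.0)
        j += 1
        w *= (lam / 2.0) / j
        if w < tol and j > lam / 2.0:
            break
    return total

def sy_pvalue(F, kz, mu0_sq):
    """p-value for the weak-instruments test, assuming the null
    distribution of kz * F is non-central chi-squared(kz, mu0_sq)."""
    return 1.0 - ncchi2_cdf(kz * F, kz, mu0_sq)
```

The series implementations are deliberately simple; any library routine for the non-central chi-squared cdf (for instance, MATLAB's ncx2cdf) could be substituted for ncchi2_cdf.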
Author Contributions: F.W. conceived, designed, and performed the simulation experiments; C.S. and F.W. wrote the paper.

Acknowledgments:
We happily acknowledge our intellectual debt to Peter Phillips, whose seminal contributions to the analysis of inference in simultaneous equations models are but a small part of his enormous contribution to the econometrics literature over a nearly 50-year period. We are particularly grateful to Grant Hillier for helping us with some technical details that form the basis of Appendix E. We also acknowledge Jon Temple, who provided extensive comments on an earlier draft of this paper, and Mark Schaffer for more recent comments. We would also like to thank the anonymous referees for their very constructive comments. The usual caveat applies. Skeels thanks the Department of Economics at the University of Bristol for their hospitality during his visits there, which is where this paper was first written. Finally, we would like to dedicate this paper to the memory of John Knight, a fine scholar and, far more importantly, one of nature's true gentlemen, taken far too soon.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. The Expectation of a Particular Function of Normal Random Variables
Theorem A1. Suppose that ξ ∼ N(λ, I_k). Then,

Proof. It is straightforward to demonstrate that the expectation is unbounded when k = 1, and so we shall assume hereafter that k > 1. Given the normality assumption on ξ, we can write

In accordance with Herz (1955), Lemma 1.4, we can decompose almost all k-vectors ξ into ξ = h r^{1/2}, where h = ξ(ξ′ξ)^{−1/2}, so that h′h = 1, and r = ξ′ξ > 0, with volume element dξ = 2^{−1} r^{(k−2)/2} dh dr.
Note that (A11) implicitly assumes 0 < χ_α < ∞, so that 0 < α < 1. In the event that either χ_α = 0 or χ_α is infinite, then f(χ_α | k_z, µ²) = 0, as is its derivative with respect to µ², making the representation (A11) invalid. Indeed, as these cases lie on the boundary of the support of a non-central chi-squared random variable, the ordinary derivative is not well-defined, and so the approach taken above would require modification. For this reason, hereafter, we shall assume that 0 < α < 1. From (A6), the integrand in (A11) is (Cohen 1988, Equation (2)). Integrating by parts allows us to write

∫₀^{χ_α} e^{−s/2} s^{(k_z+2j)/2} ds = −2 e^{−χ_α/2} χ_α^{(k_z+2j)/2} + (k_z + 2j) ∫₀^{χ_α} e^{−s/2} s^{(k_z+2j−2)/2} ds,

and so (A11) becomes, as κ_j(k_z + 2, µ²)(k_z + 2j) − κ_j(k_z, µ²) = 0. The positivity of the ratio follows because each of the functions f is the value of a non-central chi-squared density function, the two differing only in their degrees of freedom, k_z versus k_z + 2 respectively, and so both are everywhere positive for all 0 < χ_α < ∞, as is assumed above. As an aside, we know that, as the degrees of freedom increase for given µ², these densities cross, which means that sometimes f(χ_α | k_z, µ²) > f(χ_α | k_z + 2, µ²) and sometimes the converse is true. That is, we are unable to bound dχ_α/dµ² from above. Combining (A9), (A10), and (A12), we find that which confirms the behaviour observed in Table 1. That is, for given values of k_z, the critical values cv_α are decreasing functions of the asymptotic bias B.
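The crossing of the two densities invoked above is easily checked numerically. The following Python sketch (the function name ncchi2_pdf is ours) evaluates the non-central chi-squared density via its standard Poisson-mixture representation; near the origin the density with fewer degrees of freedom dominates, while in the upper tail the density with more degrees of freedom does:

```python
import math

def ncchi2_pdf(x, k, lam, tol=1e-15):
    """Density of the non-central chi-squared(k, lam), as a Poisson
    mixture of central chi-squared densities."""
    if x <= 0.0:
        return 0.0
    total, w, j = 0.0, math.exp(-lam / 2.0), 0
    while True:
        m = k + 2 * j
        # central chi-squared(m) density at x, computed in logs for stability
        g = math.exp((m / 2.0 - 1.0) * math.log(x) - x / 2.0
                     - (m / 2.0) * math.log(2.0) - math.lgamma(m / 2.0))
        total += w * g
        j += 1
        w *= (lam / 2.0) / j
        if w < tol and j > lam / 2.0:
            break
    return total

# Example of the crossing, for k_z = 3 and mu^2 = 5:
# near the origin f(x | 3, 5) > f(x | 5, 5), but in the upper tail
# the inequality is reversed.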

Appendix C. Some Remarks on Computational Aspects
For the most part, both of the programs Table1.m and sypval.m rely on intrinsic MATLAB functions. Once the relevant inputs are available, the structure of the programs is immediately apparent. Specifically, for given values of k_z and B, it is necessary to obtain the corresponding value of µ₀² from the nonlinear Equation (9). We adopt a fairly simple-minded approach to this, iterating from a starting value to the solution using a bisection algorithm.
Our starting values are chosen as follows. When k_z = 2, we know from (10) that the value of µ₀² can be calculated exactly as µ₀² = −2 ln B, and so no search is required. When k_z > 2, we exploit an approximation asymptotic in µ₀² (Slater 1960, Equation (4.1.8)) that reduces to µ₀² ≈ (k_z − 2)/B.
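The search step can be sketched as follows. The programs themselves are in MATLAB; this Python fragment (the names bisect_mu0 and bias_kz2 are ours, and the general relative-bias function defined by Equation (9) is not reproduced here) illustrates the bisection, using the k_z = 2 closed form as a check:

```python
import math

def bisect_mu0(bias_fn, B, lo=1e-8, hi=1e3, tol=1e-10):
    """Solve bias_fn(mu2) = B for mu2 by bisection.
    Assumes bias_fn is monotonically decreasing in mu2 on [lo, hi];
    for k_z > 2 the bracket can be centred on the starting value
    (k_z - 2)/B suggested by the Slater approximation."""
    f_lo = bias_fn(lo) - B
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f_mid = bias_fn(mid) - B
        if abs(f_mid) < tol or (hi - lo) < tol:
            return mid
        if (f_lo > 0) == (f_mid > 0):
            lo, f_lo = mid, f_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
    return 0.5 * (lo + hi)

# k_z = 2 case: relative bias B = exp(-mu2/2), so mu0^2 = -2 ln B exactly.
bias_kz2 = lambda mu2: math.exp(-mu2 / 2.0)
mu0 = bisect_mu0(bias_kz2, 0.10)   # closed form gives -2 ln(0.10) ≈ 4.6052
```

For k_z > 2 one would replace bias_kz2 with a routine evaluating the confluent hypergeometric expression in Equation (9).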
As expected, the performance of the approximation improves as B decreases which, for fixed k_z, corresponds to increasing µ₀² (see Appendix B.2). Nevertheless, for all cases where k_z > 2, this approximation provides much better starting values in the search for µ₀² than do naive alternatives, such as starting the search from zero (say). Moreover, the approximation performs best in exactly the circumstances in which naive methods are at their slowest, affording considerable savings in computational time. As the values of µ₀²/k_z are much more stable for any given B than are the values of µ₀², as can be deduced from Table 4, we use this parameterisation in our search algorithm.

In the k_z = 1 case, the relative bias |B|, viewed as a function of µ², has a local maximum of approximately 0.2847 at µ²/2 ≈ 2.2559. Consequently, there are three values of µ² that yield the same value of |B| for 0 < |B| < 0.2847, there are two values of µ² corresponding to |B| = 0.2847, and for |B| ∈ {0} ∪ (0.2847, 1] there is a one-to-one mapping between |B| and µ² < ∞. Observe in Figure A1 that there are three values of µ² corresponding to |B| = 0.1. Setting µ₀² = 13.83, the largest of these values, we find a critical value for the first-stage F-test of 28.77. At this level of information, the 2SLS estimator appears well-behaved. This is shown in Table A1, which reports the results of a Monte Carlo analysis, conducted as in Section 4, for k_z = 1. Even though the estimator has no moments, as the model is just-identified, we find that the Monte Carlo relative bias is indeed 10%, with the rejection frequency of the F-test again 5%. The same holds at the smaller values of |B| of 0.05 and 0.01, for which the largest implied values of µ₀² are 23.41 and 103.06, with estimation results very similar to those for k_z = 3. However, when we consider the |B| = 0.20 case, for which µ₀² is 8.198, the lack of moments of the 2SLS estimator becomes apparent, with the standard deviation now very large at 6.05.
These results suggest that the approximation might be useful for the smaller values of |B|, if one works with the largest implied values of µ₀², even though the 2SLS estimator does not possess any moments in this case.