On the Interpretation of Instrumental Variables in the Presence of Specification Errors

The method of instrumental variables (IV) and the generalized method of moments (GMM), and their applications to the estimation of errors-in-variables and simultaneous equations models in econometrics, require data on a sufficient number of instrumental variables that are both exogenous and relevant. We argue that, in general, such instruments (weak or strong) cannot exist.


Introduction
Researchers are becoming increasingly aware that there are often serious problems with the use of instrumental-variable-based techniques, both instrumental variable (IV) estimation and versions of the generalized method of moments (GMM) that use instrumental variables [1]. A valid instrument must be uncorrelated with the errors in an equation, that is, it must be exogenous, and correlated with the explanatory variable, that is, it must be relevant [1,2]; [3] (p. 316); [4] (pp. 603-605). In this connection, Pratt and Schlaifer [5] pointed out that, without knowing what the errors represent, it is not possible to decide whether the exogeneity condition is correct. They also noted that the condition is "meaningless" if the errors are included in an equation to represent the net effect (on the dependent variable) of variables excluded from the equation (footnote 1). This paper may be seen as an extension of the argument made by Pratt and Schlaifer [5] to the general case of IV estimators; in particular, it explains why much IV estimation is plagued by either irrelevant instruments or instruments that fail the exogeneity condition. As pointed out by Murray [1] (p. 114), an instrument can be so weakly correlated with the troublesome variable that the instrument has little relevance (footnote 2).
In this paper we argue that the difficulties associated with instruments should not be surprising. Specifically, we show that valid instruments cannot exist in the presence of any model mis-specification. Such mis-specification can arise, and indeed is very likely to arise, from a variety of influences, including omitted variables, measurement errors, and incorrect functional forms. To generate cases in which instruments could exist, the model being estimated would have to be correctly specified; any error component of such a model would have to be a white noise process that is independent of the instruments.
As Pratt and Schlaifer [5] make clear, the interpretation of the error in an equation is crucial here. There are two possible extreme interpretations. One interpretation is embedded in the classical regression model, which includes an error that is simply assumed to be a white noise error process with a given distribution. The alternative view is that the error is generated by all the misspecification in the model; a perfectly specified model would have no error. We would argue that the second interpretation is always more relevant in practice and it is this interpretation which gives rise to the problem with instrumental variables outlined below.
How does our framework fit with the "standard" one? The standard view typically starts from a multivariate DGP made up of a set of random variables with non-degenerate distributions. This will imply the existence of a set of error terms that are not directly associated with any misspecification in the model but which reflect the basic stochastic nature of the variables being considered. These error terms may be easily built into the analysis below simply by interpreting one (or more) of the time-varying coefficients as errors. We will not do this below, as it simply adds an extra layer of complexity without changing the results. The key assumption that makes the analysis below work, however, is that at least part of the observed errors comes from model misspecifications, including omitted variables, measurement error and the wrong functional form. If errors are not the result of such misspecification then we would essentially be claiming to know the true model, and the criticism of instrumental variables made below will not hold true.
We would also stress that we are certainly not arguing, in light of the problems associated with IV estimation, for a return to standard OLS, with its well-known problems. We simply show that instrumental variables do not adequately deal with these problems. There is also a reasonably large literature on conducting inference in IV regressions with poor instruments; this literature includes Cheng and Liao [7], Conley, Hansen and Rossi [8], Di Traglia [9] and Guggenberger [10]. However, this literature often assumes that IV at least yields consistent estimates. We argue that this is not the case and that, in general, IV is not a consistent estimator, so the accuracy of the inference made is highly questionable.

1 Pratt and Schlaifer [5] go on to state that the exogeneity condition may be satisfied for certain "sufficient sets" of excluded variables. However, the point we make here is that it cannot hold for the excluded variables (in the Pratt and Schlaifer sense [5], meaning that, in principle, there are variables that should be in the equation, but are omitted; these are the excluded variables referred to by Pratt and Schlaifer [5]).

2 Additionally, it is extremely difficult to verify if an instrument is uncorrelated with the error term in the equation being estimated. For a discussion, see [6] (pp. 144-145).
The remainder of this paper consists of three sections. Section 2 presents a general representation of model mis-specifications. We show why errors in an equation can arise. If a real-world relationship were completely known, there would be no role for a substantial error term. However, incomplete knowledge of real-world relationships is a basic component of estimated relationships. We show how correctly specified models involve time-varying coefficients (TVCs) [11], for which instruments cannot exist because, under a TVC set-up, the error terms contain the explanatory variables. Section 3 provides a simple example that illustrates our argument. Section 4 concludes.

General Considerations
In general, economic theory suggests relationships between variables, but it does not usually give clear guidance as to the correct functional form or the complete set of variables that are relevant. For example, consider an economic variable, denoted by $y_t^*$, that is determined by

$$y_t^* = f_t\left(x_{1t}^*, \ldots, x_{L(t),t}^*\right) \tag{1}$$

with unknown functional form, no knowledge of some of the arguments of $f_t(x_{1t}^*, \ldots, x_{L(t),t}^*)$, and with no need for an error term. In other words, we do not have any omitted determinant of $y_t^*$ in Equation (1), which is, therefore, a mathematical equation. To distinguish it from a regression equation, we do not call the $x^*$s regressors or explanatory variables but call them the determinants of $y_t^*$ or "the arguments" of the function $f_t(\cdot)$. We call the arguments $x_{1t}^*, \ldots, x_{K-1,t}^*$ observable determinants and the arguments $x_{Kt}^*, \ldots, x_{L(t),t}^*$ omitted determinants, since data on the latter arguments are not available. Without mis-specifying the relationship in Equation (1), we can write

$$y_t^* = \alpha_{0t} + \sum_{j=1}^{K-1} \alpha_{jt}\, x_{jt}^* + \sum_{g=K}^{L(t)} \alpha_{gt}\, x_{gt}^* \tag{2}$$

where the time profiles of the $\alpha_t$s are determined by the correct functional form of model (1). Since the correct functional form is unknown, these time profiles are also unknown. Allowing the coefficients of Equation (2) to vary freely defines an infinite class of functional forms, which surely encompasses the correct (but unknown) functional form of Equation (2) as a special case. A main benefit of model (2) is the certainty that the infinite class of functional forms will encompass the correct functional form and, thus, the unknown functional form problem is solved.
We wish to point out that if spline, cubic-spline, P-spline, or any other type of restrictions are imposed on the functional form of model (1), then it can have an incorrect functional form; for examples of spline and cubic-spline restrictions, see [3] (p. 111) and [12] (p. 803). The notion that a time-varying coefficient model can exactly represent an unknown nonlinear functional form was first proved by Swamy and Mehta [13] and subsequently confirmed by Granger [14].
Clearly, the determinants of $y_t^*$ in Equation (2) can be correlated with each other, leading to the well-known problem of multicollinearity. In particular, the $K - 1$ observable determinants (the $x_{jt}^*$s) in Equation (2) can be correlated with the $L(t) - K + 1$ omitted determinants (the $x_{gt}^*$s). To assume otherwise would, in the words of Pratt and Schlaifer [5], be a "meaningless" assumption. The mathematical relationship between each omitted determinant and the observed determinants is as follows:

$$x_{gt}^* = \lambda_{0gt} + \sum_{j=1}^{K-1} \lambda_{jgt}\, x_{jt}^*, \qquad g = K, \ldots, L(t) \tag{3}$$

where $\lambda_{0gt}$ is the portion of $x_{gt}^*$ remaining after the effects of the $x_{jt}^*$s have been removed from $x_{gt}^*$.
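Since we do not have data on the $L(t) - K + 1$ omitted determinants (the $x_{gt}^*$s), we can eliminate them from Equation (2) by substituting Equation (3) into (2), which gives

$$y_t^* = \alpha_{0t} + \sum_{g=K}^{L(t)} \alpha_{gt}\lambda_{0gt} + \sum_{j=1}^{K-1}\left(\alpha_{jt} + \sum_{g=K}^{L(t)} \alpha_{gt}\lambda_{jgt}\right) x_{jt}^* \tag{4}$$

Note that Equation (4) is written in terms of the true values $y_t^*$ and $x_{jt}^*$. Let $y_t = y_t^* + v_{0t}$ and $x_{jt} = x_{jt}^* + v_{jt}$ denote the observed values, where $v_{0t}$ and the $v_{jt}$s are measurement errors, and consider the model

$$y_t = \gamma_{0t} + \sum_{j=1}^{K-1} \gamma_{jt}\, x_{jt} \tag{5}$$

In the presence of Equation (3) and measurement errors, model (5) coincides with model (2) if

$$\gamma_{0t} = \alpha_{0t} + \sum_{g=K}^{L(t)} \alpha_{gt}\lambda_{0gt} + v_{0t} \tag{6}$$

$$\gamma_{jt} = \left(\alpha_{jt} + \sum_{g=K}^{L(t)} \alpha_{gt}\lambda_{jgt}\right)\left(1 - \frac{v_{jt}}{x_{jt}}\right) \tag{7}$$

According to Pratt and Schlaifer [5], the term $\sum_{g=K}^{L(t)} \alpha_{gt}\lambda_{0gt}$ in Equation (4) can be treated as an error term. With this treatment we can use the usual regression terminology from this point on.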
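The substitution step can be checked numerically. The sketch below assumes that Equation (2) is linear in the determinants with freely varying coefficients and that Equation (3) expresses each omitted determinant as an intercept plus a coefficient-weighted sum of the observable ones; the array names, sizes, and random draws are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def check_substitution_identity(T=200, n_obs=2, n_omit=3, seed=0):
    """Return the largest gap between y* built from an assumed Eq. (2)
    and y* built from the substituted form, an assumed Eq. (4)."""
    rng = np.random.default_rng(seed)

    x_obs = rng.normal(size=(T, n_obs))          # observable determinants x*_jt
    alpha0 = rng.normal(size=T)                  # alpha_0t
    alpha_j = rng.normal(size=(T, n_obs))        # alpha_jt (time-varying)
    alpha_g = rng.normal(size=(T, n_omit))       # alpha_gt (time-varying)

    # Assumed Eq. (3): x*_gt = lambda_0gt + sum_j lambda_jgt * x*_jt
    lam0 = rng.normal(size=(T, n_omit))          # lambda_0gt
    lam = rng.normal(size=(T, n_omit, n_obs))    # lambda_jgt
    x_omit = lam0 + np.einsum("tgj,tj->tg", lam, x_obs)

    # Assumed Eq. (2): all determinants enter with time-varying coefficients
    y_eq2 = alpha0 + (alpha_j * x_obs).sum(axis=1) + (alpha_g * x_omit).sum(axis=1)

    # Assumed Eq. (4): omitted determinants eliminated by substitution
    intercept = alpha0 + (alpha_g * lam0).sum(axis=1)
    slopes = alpha_j + np.einsum("tg,tgj->tj", alpha_g, lam)
    y_eq4 = intercept + (slopes * x_obs).sum(axis=1)

    return float(np.max(np.abs(y_eq2 - y_eq4)))


max_gap = check_substitution_identity()
print(f"largest absolute gap between the two forms: {max_gap:.2e}")
```

Under these assumptions the two forms agree up to floating-point rounding, which is the sense in which no approximation is involved: the coefficients on the observable determinants simply absorb the contribution of the omitted ones.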
To recapitulate, we have begun with Equation (1). To solve the unknown functional form problem, Equation (1) is replaced with Equation (2). To solve the excluded variables problem without making meaningless assumptions, Equation (3) is introduced and inserted into Equation (2) to obtain Equation (4). After introducing measurement errors at the appropriate places in Equation (4), it is replaced with Equation (5) (footnote 3). In this derivation, no approximations and no meaningless assumptions are made. The terms on the right-hand side of Equations (6) and (7) provide crucial information. Equation (4) shows that the $\lambda_{0gt}$s, in conjunction with the $x_{jt}^*$s, are at least sufficient to determine $y_t^*$.
This is the proof Pratt and Schlaifer [5] (pp. 34, 50) offer to show that the second term on the right-hand side of Equation (6) is a function with the correct functional form of certain "sufficient sets" of excluded variables. The authors warn against adding an arbitrary error term to a linear or nonlinear function of the $x_{jt}^*$s and assuming that the $x_{jt}^*$s are independent of the error term.
The interpretation of the terms on the right-hand side of Equation (7) and their implications are as follows:
• The term $\alpha_{jt}$ is the bias-free effect of $x_{jt}^*$ on $y_t^*$. Even though these bias-free effects are economically very meaningful, they cannot be estimated using any of the conventional econometric techniques.
• The term $\sum_{g=K}^{L(t)} \alpha_{gt}\lambda_{jgt}$ is the omitted-variable bias component of $\gamma_{jt}$, and the term $-\left(\alpha_{jt} + \sum_{g=K}^{L(t)} \alpha_{gt}\lambda_{jgt}\right)(v_{jt}/x_{jt})$ is its measurement-error bias component (footnote 4).
• The explanatory variables of model (5) are correlated with their own coefficients because the measurement-error bias component of $\gamma_{jt}$ is a function of $x_{jt}$.
• Model (5) can be mis-specified if the omitted-variable and measurement-error bias (or simply, the specification bias) components of its coefficients in Equation (7) are ignored.

3 For the derivation, see [15].

4 The minus sign in the expression reflects the fact that the second parenthetical term on the right-hand side of Equation (7) is one minus the ratio $(v_{jt}/x_{jt})$.

Econometrics 2015, 3
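Having derived the model in Equation (5), which explicitly includes all these forms of bias, it is now possible to show why valid instruments cannot be found for this model. Combining Equations (5)-(7) into one gives

$$y_t = \alpha_{0t} + \sum_{g=K}^{L(t)} \alpha_{gt}\lambda_{0gt} + v_{0t} + \sum_{j=1}^{K-1}\left(\alpha_{jt} + \sum_{g=K}^{L(t)} \alpha_{gt}\lambda_{jgt}\right)\left(1 - \frac{v_{jt}}{x_{jt}}\right) x_{jt} \tag{8}$$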

Some Illustrative Cases
In the standard approach, we aim to choose instruments that are strongly correlated with the variable being instrumented, but which are independent of the errors in the model. If an instrument is not well correlated with the variable under consideration, then we have the problem of weak instruments; if the instrument is not independent of the error, then we will not remove the bias. We illustrate the problem with IV by considering three cases.
Case I. (Linear models). By adding and subtracting a constant parameter model we get

$$y_t = \beta_0 + \sum_{j=1}^{K-1} \beta_j x_{jt} + (\gamma_{0t} - \beta_0) + \sum_{j=1}^{K-1} (\gamma_{jt} - \beta_j)\, x_{jt} \tag{9}$$

where the last two terms in Equation (9) become the error term in the model. The problem with instrumental variables in this context now becomes apparent: we need to find a variable that is correlated with $x_{jt}$ but uncorrelated with the error term, which itself contains $x_{jt}$. Such a variable almost certainly cannot exist. We extend this proof to nonlinear models in Case III below.
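A small simulation can make the point concrete. The sketch below is an illustration under assumed values, not the paper's own example: a single regressor, a coefficient that varies with the regressor (`gamma = 1 + 0.5 * x`), a fixed coefficient set equal to the mean of `gamma`, and a candidate instrument built as the regressor plus independent noise. All of these choices, and the function name, are hypothetical.

```python
import numpy as np


def simulate_case_one(n=200_000, seed=0):
    """Covariances of a candidate instrument with the regressor and with the
    error term of a fixed-coefficient model (all design choices are assumed)."""
    rng = np.random.default_rng(seed)

    x = rng.normal(loc=2.0, scale=1.0, size=n)   # observed regressor
    gamma = 1.0 + 0.5 * x                        # assumed time-varying coefficient
    y = gamma * x                                # exact relationship, no added error

    beta = 2.0                                   # assumed fixed coefficient, E[gamma]
    u = y - beta * x                             # equals (gamma - beta) * x: contains x

    z = x + rng.normal(size=n)                   # candidate instrument, relevant by design

    cov_zx = np.cov(z, x)[0, 1]                  # relevance
    cov_zu = np.cov(z, u)[0, 1]                  # exogeneity would need this near zero
    return float(cov_zx), float(cov_zu)


cov_zx, cov_zu = simulate_case_one()
print(f"cov(z, x) = {cov_zx:.3f}, cov(z, u) = {cov_zu:.3f}")
```

In this design the instrument is relevant only because it moves with the regressor, and for the same reason it also moves with the error term, so relevance and exogeneity pull against each other.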
Equation (10) implies that there are no omitted variables and Equation (11) implies that the true model has a linear functional form. Under Equations (10) and (11), Equation (9) reduces to an errors-in-variables model and the error term consists solely of measurement errors. For IV estimation of such a model, we need instruments that are relevant and uncorrelated with the errors (exogenous). Assumptions (10) and (11) are highly restrictive and, in effect, amount to the assumption that the model is perfectly specified and that there are no excluded variables. Hence, this extreme case rules out Pratt and Schlaifer's case [5] where the included variables are independent of the excluded variables, as there are none. The error term is then purely an identifier, in the Pratt and Schlaifer sense [5]. However, we would argue that this case can never occur in the real world.

Model (13) is an estimable form of model (5) (footnote 6). Now suppose we were to estimate a fixed-coefficient IV version of Equation (5). The instrumental variables that are correlated with the $x_{jt}$s of that IV equation, but not with the error terms of model (14), almost surely do not exist because these error terms also involve the $x_{jt}$s.
Therefore, IV estimation is not possible. It is sometimes claimed that lagged values of the variables in a model provide natural instrumental variables in many time-series settings. The mere fact that the value of $x_{j,t-1}$ was determined before the value of $\varepsilon_{jt}$ should not lead one to conclude that $x_{j,t-1}$ is necessarily independent of $\varepsilon_{jt}$. The variable $x_{j,t-1}$ may well have been influenced by a forecast of a variable represented in $\varepsilon_{jt}$, or both $x_{j,t-1}$ and $\varepsilon_{jt}$ may have been affected by some third variable, as shown by Pratt and Schlaifer [5] (p. 47). Of course, if $x_{j,t-1}$ were independent of the error then this would imply that it was no longer relevant.

6 Good approximations to the minimum variance linear unbiased estimators of the $\pi$'s and the best linear unbiased predictors of the $\varepsilon$'s can be obtained by applying an iteratively rescaled generalized least squares method to model (13). The consistency of these estimators can be established by letting $T$ go to $\infty$ and letting $p$ go to $\infty$ more slowly than $T$. For further discussion, see [15].
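The "third variable" channel for lagged instruments can be sketched with a short simulation. The set-up is an assumption made purely for illustration: a persistent AR(1) variable `w` (hypothetical) drives both the regressor and the error term, so the lagged regressor predates the current error yet remains correlated with it.

```python
import numpy as np


def lagged_instrument_check(n=100_000, rho=0.8, seed=0):
    """Correlation between a lagged regressor and the current error when both
    depend on a persistent common variable (an assumed, illustrative design)."""
    rng = np.random.default_rng(seed)

    shocks = rng.normal(size=n)
    w = np.zeros(n)                      # hypothetical persistent third variable
    for t in range(1, n):
        w[t] = rho * w[t - 1] + shocks[t]

    x = w + rng.normal(size=n)           # regressor driven by w
    eps = w + rng.normal(size=n)         # error term also driven by w

    x_lag = x[:-1]                       # x_{t-1}, determined before eps_t
    eps_now = eps[1:]                    # eps_t
    return float(np.corrcoef(x_lag, eps_now)[0, 1])


corr = lagged_instrument_check()
print(f"corr(x_(t-1), eps_t) = {corr:.3f}")
```

Timing alone therefore does not deliver exogeneity in this design; setting `rho` to zero removes the correlation with the error but also removes the link between the lagged and current values of the regressor.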

A Simple Example
Consider a simple example where the only misspecification is measurement error in the independent variable. Assume that we have a perfectly fitting linear relationship in the true variables, Equation (15), where the measured value $x_t$ of the true regressor $x_t^*$ is given by Equation (16), in which $v_t$ is an error term.
There are two ways we can demonstrate the problem with IV applied to Equation (17). First, we may consider the issue from a TVC perspective and write an exact version of Equation (15), leading to Equation (19), where we consider only the cases in which $\beta^{**} \neq \beta$, so that its last term is not the same as the last term in Equation (17). The last term in Equation (19) is the error term. We can see that no valid instruments can exist for $x_t$ since $x_t$ is also in the error term.
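As a hedged sketch of the TVC argument, suppose the true relationship is $y_t = \alpha + \beta x_t^*$ and the measured regressor is $x_t = x_t^* + v_t$ with $x_t \neq 0$. These explicit forms, the intercept symbol $\alpha$, and the additive error are illustrative assumptions and need not match the paper's Equations (15) to (19) exactly. Under them:

```latex
\begin{align*}
y_t &= \alpha + \beta x_t^* = \alpha + \beta\,(x_t - v_t) \\
    &= \alpha + \beta\left(1 - \frac{v_t}{x_t}\right) x_t \\
    &= \alpha + \beta^{**} x_t
       + \left[\beta\left(1 - \frac{v_t}{x_t}\right) - \beta^{**}\right] x_t .
\end{align*}
```

The bracketed term multiplies $x_t$, so whatever fixed coefficient $\beta^{**}$ is chosen, the implied error term is a function of the very variable an instrument is supposed to be correlated with. In the special case $\beta^{**} = \beta$ the last term collapses to $-\beta v_t$.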
We can also show the same problem from a more conventional perspective. If we perform a fixed-coefficient regression, then we can rewrite Equation (15) in a form in which the term in brackets is the error term. We again can see that the error term contains the same variable that we are trying to instrument. Thus, almost surely no valid instrument can exist. One standard way to construct a suitable instrument would be to create the variable $z_t = x_t^* + \varepsilon_t$, where $\varepsilon_t$ is uncorrelated with $v_t$ and $x_t^*$. Constructing $z_t$, however, requires knowledge of $x_t^*$, and so the whole problem goes away, as we could simply have estimated Equation (15) without any measurement error; IV would, therefore, have been unnecessary in this case.
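The circularity in this construction can be illustrated with a simulation. The sketch assumes a true relationship `y = alpha + beta * x_true`, additive measurement error, and standard normal draws throughout; the parameter values and the function name are hypothetical choices made for illustration.

```python
import numpy as np


def measurement_error_demo(n=200_000, alpha=1.0, beta=2.0, seed=0):
    """Compare OLS on the mismeasured regressor, IV with an instrument built
    from the true regressor, and OLS on the true regressor (assumed design)."""
    rng = np.random.default_rng(seed)

    x_true = rng.normal(size=n)              # x*_t, unobserved in practice
    v = rng.normal(size=n)                   # measurement error v_t
    x_meas = x_true + v                      # measured regressor x_t
    y = alpha + beta * x_true                # perfectly fitting true relationship

    z = x_true + rng.normal(size=n)          # instrument that needs x*_t to build

    def slope(regressor, outcome):
        c = np.cov(regressor, outcome)
        return c[0, 1] / c[0, 0]

    ols_measured = slope(x_meas, y)                           # attenuated toward zero
    iv = np.cov(z, y)[0, 1] / np.cov(z, x_meas)[0, 1]         # simple IV ratio estimator
    ols_true = slope(x_true, y)                               # recovers beta exactly
    return float(ols_measured), float(iv), float(ols_true)


ols_measured, iv, ols_true = measurement_error_demo()
print(f"OLS on measured x: {ols_measured:.3f}")
print(f"IV using z:        {iv:.3f}")
print(f"OLS on true x:     {ols_true:.3f}")
```

The IV estimate is close to `beta` here only because `z` was built from `x_true`; anyone able to build such an instrument could run the last regression directly, which is the sense in which IV is unnecessary in this case.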

Conclusions
The instrumental variables that are correlated with the $x_{jt}$s of model (5), but not with the error terms of model (13), do not, in general, exist because these error terms also involve the $x_{jt}$ variables.
These arguments help explain why practical work with IV methods is plagued by several problems. We would argue that a much better way forward in terms of practical estimation rests on avoiding incorrect functional forms and recognizing the potential sources of omitted-variable and measurement-error bias that are present in Equation (5). By accounting for these sources of bias, we are able to show that (i) the unknown functional form gives rise to TVCs; and (ii) in this TVC set-up, instruments almost surely cannot exist.