Accuracy and Efficiency of Various GMM Inference Techniques in Dynamic Micro Panel Data Models

Kiviet, Jan; Pleus, Milan; Poldermans, Rutger

doi:10.3390/econometrics5010014

Open AccessArticle

Accuracy and Efficiency of Various GMM Inference Techniques in Dynamic Micro Panel Data Models

by

Jan Kiviet

^1,*

,

Milan Pleus

² and

Rutger Poldermans

¹

Amsterdam School of Economics, University of Amsterdam, P.O. Box 15867, 1001 NJ Amsterdam,The Netherlands

²

IKZ, Newtonlaan 1-41, 3584 BX Utrecht, The Netherlands

^*

Author to whom correspondence should be addressed.

Econometrics 2017, 5(1), 14; https://doi.org/10.3390/econometrics5010014

Submission received: 28 December 2016 / Revised: 6 March 2017 / Accepted: 10 March 2017 / Published: 20 March 2017

(This article belongs to the Special Issue Recent Developments in Panel Data Methods)

Download Versions Notes

Abstract

:

Studies employing Arellano-Bond and Blundell-Bond generalized method of moments (GMM) estimation for linear dynamic panel data models are growing exponentially in number. However, for researchers it is hard to make a reasoned choice between many different possible implementations of these estimators and associated tests. By simulation, the effects are examined in terms of many options regarding: (i) reducing, extending or modifying the set of instruments; (ii) specifying the weighting matrix in relation to the type of heteroskedasticity; (iii) using (robustified) 1-step or (corrected) 2-step variance estimators; (iv) employing 1-step or 2-step residuals in Sargan-Hansen overall or incremental overidentification restrictions tests. This is all done for models in which some regressors may be either strictly exogenous, predetermined or endogenous. Surprisingly, particular asymptotically optimal and relatively robust weighting matrices are found to be superior in finite samples to ostensibly more appropriate versions. Most of the variants of tests for overidentification and coefficient restrictions show serious deficiencies. The variance of the individual effects is shown to be a major determinant of the poor quality of most asymptotic approximations; therefore, the accurate estimation of this nuisance parameter is investigated. A modification of GMM is found to have some potential when the cross-sectional heteroskedasticity is pronounced and the time-series dimension of the sample is not too small. Finally, all techniques are employed to actual data and lead to insights which differ considerably from those published earlier.

Keywords:

cross-sectional heteroskedasticity; model specification strategy; Sargan-Hansen (incremental) tests; variants of t-tests; weighting matrices; Windmeijer-correction

JEL Classification:

C12; C13; C15; C23; C26; C52; J22

1. Introduction

One of the major attractions of analyzing panel data rather than single indexed variables is that they allow us to cope with the empirically very relevant situation of unobserved heterogeneity correlated with included regressors. Econometric analysis of dynamic relationships on the basis of panel data, where the number of surveyed individuals is relatively large while covering just a few time periods, is very often based on GMM (generalized method of moments). Its reputation is built on its claimed flexibility, generality, ease of use, robustness and efficiency. Widely available standard software enables us to estimate models including exogenous, predetermined and endogenous regressors consistently, while allowing for semiparametric approaches regarding the presence of heteroskedasticity and the type of distribution of the disturbances. This software also provides specification checks regarding the adequacy of the internal and external instrumental variables employed and the specific assumptions made regarding (absence of) serial correlation.

Especially popular are the GMM implementations put forward by Arellano and Bond [1]. However, practical problems have often been reported, such as vulnerability due to the abundance of internal instruments, discouraging improvements of 2-step over 1-step GMM findings, poor size control of test statistics, and weakness of instruments especially when the dynamic adjustment process is slow (a root is close to unity). As remedies it has been suggested to reduce the number of instruments by renouncing some valid orthogonality conditions, but also to extend the number of instruments by adopting more orthogonality conditions. Extra orthogonality conditions can be based on certain homoskedasticity or stationarity assumptions or initial value conditions, see Blundell and Bond [2]. By abandoning weak instruments finite sample bias may be reduced, whereas by extending the instrument set with a few strong ones the bias may be further reduced and the efficiency enhanced. Presently, it is not clear yet how practitioners can best make use of these suggestions, because no set of preferred testing tools is yet available, nor a comprehensive sequential specification search strategy, which in a systematic fashion allow us to select instruments by assessing both their validity and their strength, as well as to classify individual regressors accurately as relevant and either endogenous, predetermined or strictly exogenous. Therefore, it happens often that, in applied research, models and techniques are selected simply on the basis of the perceived significance and plausibility of their coefficient estimates, whereas it is well known that imposing invalid coefficient restrictions and employing regressors wrongly as instruments will often lead to relatively small estimated standard errors. Then, however, these provide misleading information on the actual precision of the often seriously biased estimators.

The available studies on the performance of alternative inference techniques for dynamic panel data models have obvious limitations when it comes to advising practitioners on the most effective implementations of estimators and tests under general circumstances. As a rule, they do not consider various empirically relevant issues in conjunction, such as: (i) occurrence and the possible endogeneity of regressors additional to the lagged dependent variable, (ii) occurrence of individual effect (non-)stationarity of both the lagged dependent variable and other regressors, (iii) cross-section and/or time-series heteroskedasticity of the idiosyncratic disturbances, and (iv) variation in signal-to-noise ratios and in the relative prominence of individual effects. For example: the simulation results in Arellano and Bover [3], Hahn and Kuersteiner [4], Alvarez and Arellano [5], Hahn et al. [6], Kiviet [7], Kruiniger [8], Okui [9], Roodman [10], Hayakawa [11] and Han and Phillips [12] just concern the panel AR(1) model under homoskedasticity. Although an extra regressor is included in the simulation studies in Arellano and Bond [1], Kiviet [13], Bowsher [14], Hsiao et al. [15], Bond and Windmeijer [16], Bun and Carree [17,18], Bun and Kiviet [19], Gouriéroux et al. [20], Hayakawa [21], Dhaene and Jochmans [22], Flannery and Hankins [23], Everaert [24] and Kripfganz and Schwarz [25], this regressor is (weakly-)exogenous and most experiments just concern homoskedastic disturbances and stationarity regarding the impact of individual effects. Blundell et al. [26] and Bun and Sarafidis [27] include an endogenous regressor, but their design does not allow us to control the degree of simultaneity; moreover, they stick to homoskedasticity. Harris et al. [28] only examine the effects of neglected endogeneity. Heteroskedasticity is considered in a few simulation experiments in Arellano and Bond [1] in the model with an exogenous regressor, and just for the panel AR(1) case in Blundell and Bond [2]. Windmeijer [29] analyzes panel GMM with heteroskedasticity, but without including a lagged dependent variable in the model. Bun and Carree [30] and Juodis [31] examine effects of heteroskedasticity in the model with a lagged dependent and a strictly exogenous regressor under stationarity regarding the effects. Moral-Benito [32] examines stationary and nonstationary regressors in a dynamic model with heteroskedasticity, but the extra regressor is predetermined or strictly exogenous. Moreover, his study is restricted to time-series heteroskedasticity, while assuming cross-sectional homoskedasticity. In a micro context cross-sectional heteroskedasticity seems more realistic to us, whereas it is also trickier when N is large and T small.

So, knowledge is still scarce with respect to the performance of GMM when it is not only needed to cope with genuine simultaneity (which we consider to be the core of econometrics), but also because of occurrence of heteroskedasticity of unknown form. Moreover, many of the simulation studies mentioned above did not systematically explore the effects of relevant nuisance parameter values on the finite sample distortions to asymptotic approximations. We examine estimating a prominent nuisance parameter, namely the variance of the individual effects, which to date has received surprisingly little attention in the literature. Regarding the performance of tests on the validity of instruments worrying results have been obtained in Bowsher [14] and Roodman [10] for homoskedastic models. On the other hand Bun and Sarafidis [27] report reassuring results, but these just concern models where

T = 3 .

Hence, it would be useful to examine more cases over an extended grid covering more dimensions. Our grid of examined cases will be much wider and have more dimensions. Moreover, we will deliberately explore both feasible and unfeasible versions of estimators and test statistics (in unfeasible versions particular nuisance parameter values are assumed to be known). Therefore we will be able to draw more useful conclusions on what aspects do have major effects on any inference inaccuracies in finite samples.

The data generating process designed here can be simulated for classes of models which may include individual and time effects, a lagged dependent variable regressor and another regressor which may be correlated with these and other individual effects and be either strictly exogenous or jointly dependent with regard to the idiosyncratic disturbances, whereas the latter may show a form of cross-section heteroskedasticity associated with both the individual effects. For a range of relevant parameter values we will verify in moderately large samples the properties of alternative GMM estimators, both 1-step and 2-step, focussing on alternative implementations regarding the weighting matrix and corresponding corrections to variance estimates according to the often practiced approach by Windmeijer [29]. This will include variants of the popular system estimator, which exploit as instruments the first-difference of lagged internal variables for the untransformed model in addition to lagged level internal variables as instruments for the model from which the individual effects have been removed. We will examine cases where the extra instruments are (in)valid in order to verify whether particular tests for overidentification restrictions have appropriate size and power, such that with reasonable probabilities valid instruments will be recognized as appropriate and invalid instruments will be detected and can be discarded. Moreover, following Kiviet and Feng [33], we shall investigate a rather novel modification of the traditional GMM implementation which aims at improving the strength of the exploited instruments in the presence of heteroskedasticity. Of course, also the simulation design used here has its limitations. It has only one extra regressor next to the lagged dependent variable, we only consider cross-sectional heteroskedasticity, and all basic random terms have been drawn from the normal distribution. Moreover, the design does not accommodate general forms of cross-sectional dependence between error terms. However, by including individual and time specific effects particular simple forms of cross-sectional dependence are accommodated.

Due to the high dimensionality of the Monte Carlo design a general discussion of the major findings is hard, because particular qualities (and failures) of inference techniques are usually not global but only occur in a particular limited context. However, in the last but one paragraph of the concluding Section 7 we nevertheless provide a list of eleven (a through k) established observations which seem very useful for practitioners. Here we will list a few more advises most of which seem contrary to current dominant practice: (i) many studies claim to have dealt with the limitations of a static model by just including the lagged dependent variable, whereas its single extra coefficient just leads to a highly restrictive dynamic model; (ii) it seems widely believed that an exogenous regressor should just be instrumented by itself, whereas using its lags as instruments too is highly effective to instrument further non-exogenous regressors; (iii) test statistics involving a large number of degrees of freedom will generally lack power when they jointly test restrictions of which only few are false and therefore Sargan-Hansen statistics should as a rule be partitioned into a series of well-chosen increments; (iv) it has been reported (see, for instance, Hayashi [34] p. 218), that Sargan-Hansen tests tend to overreject, especially when using many instruments, though in our simulations we find that underrejection is predominant, as already reported under homoskedasticity by Bowsher [14]; (v) estimates of nuisance parameters are generally useful to interpret estimated parameters of primary interest and therefore not only the variance of the idiosyncratic disturbances but also the variance of the individual effects should be examined.

The structure of this study is as follows. In Section 2 we first present the major issues regarding IV and GMM coefficient and variance estimation in linear models and on inference techniques on establishing instrument validity and regarding the coefficient values by standard and by corrected test statistics. Next in Section 3 the generic results of Section 2 are used to discuss in more detail than provided elsewhere the various options for their implementation in linear models for single dynamic simultaneous micro econometric panel data relationships with both individual and time effects and some form of cross-sectional heteroskedasticity. In Section 4 the Monte Carlo design is developed to analyze and compare the performance of alternative often asymptotically equivalent inference methods in finite samples of empirically relevant parametrizations. Section 5 summarizes the simulation results, from which some preferred techniques for use in finite samples of particular models emerge, plus a warning regarding particular types of models that require more refined methods yet to be developed. An empirical illustration, which involves data on labor supply earlier examined by Ziliak [35], can be found in Section 6, where we also formulate a tentative comprehensive specification search strategy for dynamic micro panel data models. Finally, in Section 7 the major findings are summarized.

2. Basic GMM Results for Linear Models

Here we present concisely the major generic results on IV and GMM inference for single indexed data. First we define the model and estimators, discuss some of their special properties and consider specific test situations. From these general findings for linear regressions the examined implementations for specific linear dynamic panel data models follow easily in Section 3.

2.1. Model and Estimators

Let the scalar dependent variable

y_{i}

depend linearly on K regressors

x_{i}

and an unobserved disturbance term

u_{i},

and let there be

L \geq K

variables

z_{i}

(the instruments) that establish orthogonality conditions such that

\begin{matrix} \begin{matrix} \begin{matrix} y_{i} = x_{i}^{'} β_{0} + u_{i} \\ E [z_{i} (y_{i} - x_{i}^{'} β_{0})] = 0 \end{matrix}\} i = 1, . . ., N . \end{matrix} \end{matrix}

(1)

Here

x_{i}

and

β_{0}

are

K \times 1

vectors,

β_{0}

containing the true values of the unknown coefficients, and

z_{i}

is an

L \times 1

vector. Applying the analogy principle, the method of moments (MM) aims to find an estimator for model parameter β by solving the L sample moment equations

\begin{matrix} \begin{matrix} N^{- 1} Σ_{i = 1}^{N} z_{i} (y_{i} - x_{i}^{'} \hat{β}) = 0 . \end{matrix} \end{matrix}

(2)

Generally, these have a unique solution only when

L = K

and then yield

\begin{matrix} \begin{matrix} \hat{β} = {(Σ_{i = 1}^{N} z_{i} x_{i}^{'})}^{- 1} Σ_{i = 1}^{N} z_{i} y_{i}, \end{matrix} \end{matrix}

(3)

provided the inverse exists. For

L > K

the MM recipe to find a unique estimator is: minimize with respect to β the criterion function

Σ_{i = 1}^{N} [(y_{j} - x_{j}^{'} β) z_{j}^{'}] G Σ_{i = 1}^{N} [z_{i} (y_{i} - x_{i}^{'} β)]

for some weighting matrix

G .

It can be shown that the asymptotically optimal choice for G is an expression which has a probability limit that is proportional to the inverse of the asymptotic variance V of

N^{- 1 / 2} Σ_{i = 1}^{N} z_{i} u_{i} \overset{d}{\to} N (0, V) .

When

u_{i} \sim i i d (0, σ_{u}^{2})

an optimal choice for G is proportional to the inverse of

Σ_{i = 1}^{N} z_{i} z_{i}^{'}

and the MM estimator is

\begin{matrix} \begin{matrix} {\hat{β}}_{I V} = {[X^{'} Z {(Z^{'} Z)}^{- 1} Z^{'} X]}^{- 1} X^{'} Z {(Z^{'} Z)}^{- 1} Z^{'} y, \end{matrix} \end{matrix}

(4)

where

y = {(y_{1} \dots y_{N})}^{'},

X = {(x_{1} \dots x_{N})}^{'}

and

Z = {(z_{1} \dots z_{N})}^{'} .

But, when

E (z_{i} u_{i}) = 0

while

u = {(u_{1} \dots u_{N})}^{'} \sim (0, σ_{u}^{2} Ω),

where Ω has full rank and without loss of generality

t r (Ω) = N,

the optimal choice for G is a matrix proportional to

{(Z^{'} Ω Z)}^{- 1},

yielding MM estimator

\begin{matrix} \begin{matrix} {\hat{β}}_{G M M} = {[X^{'} Z {(Z^{'} Ω Z)}^{- 1} Z^{'} X]}^{- 1} X^{'} Z {(Z^{'} Ω Z)}^{- 1} Z^{'} y . \end{matrix} \end{matrix}

(5)

Note that for

Ω = I_{N}

the latter formula simplifies to

{\hat{β}}_{I V} .

When

L = K

both

{\hat{β}}_{G M M}

and

{\hat{β}}_{I V}

simplify to (3) or

{(Z^{'} X)}^{- 1} Z^{'} y .

Below we focus on cases where Ω is diagonal.

When Ω is unknown and therefore (5) is unfeasible, one should use an informed guess

Ω^{(0)}

to obtain the 1-step estimator

{\hat{β}}_{G M M}^{(1)},

which is sub-optimal when

Ω^{(0)} \neq Ω,

though consistent under the assumptions made. Then the residuals

\begin{matrix} \begin{matrix} {\hat{u}}^{(1)} = y - X {\hat{β}}_{G M M}^{(1)} \end{matrix} \end{matrix}

(6)

are consistent for

u .

In cases where

{\hat{G}}^{(1)} = {(N^{- 1} Σ_{i = 1}^{N} {\hat{u}}_{i}^{(1) 2} z_{i} z_{i}^{'})}^{- 1}

is such that

plim ({\hat{G}}^{(1)} - {[N^{- 1} Z^{'} Ω Z]}^{- 1}) = O

the 2-step estimator

\begin{matrix} \begin{matrix} {\hat{β}}_{G M M}^{(2)} = {[X^{'} Z {\hat{G}}^{(1)} Z^{'} X]}^{- 1} X^{'} Z {\hat{G}}^{(1)} Z^{'} y \end{matrix} \end{matrix}

(7)

is asymptotically equivalent to

{\hat{β}}_{G M M}

and thus asymptotically optimal, given the L instruments used.

2.2. Some Algebraic Peculiarities

Defining

P_{Z} = Z {(Z^{'} Z)}^{- 1} Z^{'}

and

\hat{X} = P_{Z} X

one finds

{\hat{β}}_{I V} = {({\hat{X}}^{'} \hat{X})}^{- 1} {\hat{X}}^{'} y,

which highlights its two-stage least-squares character. Now suppose that

X = (X_{1}, X_{2})

and

Z = (Z_{1}, Z_{2})

with

Z_{2} = X_{2}

whereas

X β = X_{1} β_{1} + X_{2} β_{2}

, where

β_{1}

and

β_{2}

have

K_{1}

and

K_{2}

elements respectively. Standard results on partitioned regression yields

\begin{matrix} \begin{matrix} {\hat{β}}_{1, I V} = {({\hat{X}}_{1}^{'} M_{{\hat{X}}_{2}} {\hat{X}}_{1})}^{- 1} {\hat{X}}_{1}^{'} M_{{\hat{X}}_{2}} y = {(X_{1}^{'} P_{M_{X_{2}} Z_{1}} X_{1})}^{- 1} X_{1}^{'} P_{M_{X_{2}} Z_{1}} y, \end{matrix} \end{matrix}

(8)

which is the IV estimator in the regression of y on just

X_{1}

using the

L - K_{2}

instruments

M_{X_{2}} Z_{1} .

This result is known as partialling out the predetermined regressors

X_{2} .

It follows from

{\hat{X}}_{2} = P_{Z} X_{2} = X_{2}

which yields

\begin{matrix} M_{{\hat{X}}_{2}} {\hat{X}}_{1} = M_{X_{2}} P_{Z} X_{1} = M_{X_{2}} (P_{X_{2}} + P_{M_{X_{2}} Z_{1}}) X_{1} = P_{M_{X_{2}} Z_{1}} X_{1} . \end{matrix}

A similar result is not straight-forwardly available for GMM because of the following.

Let positive definite matrix Ω be factorized as follows

\begin{matrix} \begin{matrix} Ω^{- 1} = Ψ^{'} Ψ, so Ω = Ψ^{- 1} {(Ψ^{'})}^{- 1} . \end{matrix} \end{matrix}

(9)

Now define

\begin{matrix} \begin{matrix} y^{*} = Ψ y, X^{*} = Ψ X, Z^{†} = {(Ψ^{'})}^{- 1} Z, \end{matrix} \end{matrix}

(10)

then

\begin{matrix} \begin{matrix} {\hat{β}}_{G M M} & = {[X^{*'} Z^{†} {(Z^{†'} Z^{†})}^{- 1} Z^{†'} X^{*}]}^{- 1} X^{*'} Z^{†} {(Z^{†'} Z^{†})}^{- 1} Z^{†'} y^{*} \\ = {(X^{*'} P_{Z^{†}} X^{*})}^{- 1} X^{*'} P_{Z^{†}} y^{*}, \end{matrix} \end{matrix}

so GMM is equivalent to IV using transformed variables, but where Z has been transformed differently. Therefore, if

X_{2}

is such that

X_{2}^{*}

establishes valid instruments in the transformed model

y^{*} = X^{*} β + u^{*},

where

u^{*} \sim (0, σ_{u}^{2} I_{N})

, the regressors

X_{2}^{*}

are not used as instruments in GMM in its IV interpretation. They would, though, if one would deliberately choose

Z_{2} = Ω^{- 1} X_{2}

.

As is well-known and easily verified, linear transformations of the matrix of instruments of the form

Z^{⋄} = Z C,

where C is a full rank

L \times L

matrix, have no effect on

{\hat{β}}_{I V}

nor on

{\hat{β}}_{G M M} .

However, there is not such invariance when the matrix Z is premultiplied by some transformation matrix, and hence not the columns but the rows of Z are directly affected. It has been shown in Kiviet and Feng [33] that such transformations, chosen in correspondence with the required transformation of the model when

Ω \neq I_{N}

, may lead to modified GMM estimation achieving higher efficiency levels and better results in finite samples than standard GMM, provided the validity of the transformed instruments is maintained. We will examine here the effects of employing transformation

Z^{*} = Ψ Z,

which provides the modified GMM estimator

\begin{matrix} \begin{matrix} {\hat{β}}_{M G M M} = {[X^{'} Ω^{- 1} Z {(Z^{'} Ω^{- 1} Z)}^{- 1} Z^{'} Ω^{- 1} X]}^{- 1} X^{'} Ω^{- 1} Z {(Z^{'} Ω^{- 1} Z)}^{- 1} Z^{'} Ω^{- 1} y . \end{matrix} \end{matrix}

(11)

When this can be made feasible, it yields

{\hat{β}}_{M G M M}^{(2)} .

This modification attempts to employ the optimal instruments, see Arellano [36], Appendix B.

2.3. Particular Test Procedures

Inference on elements of

β_{0}

based on

{\hat{β}}_{G M M}^{(2)}

of (7) requires an asymptotic approximation to its distribution. Under correct specification the standard first-order approximation is

\begin{matrix} \begin{matrix} {\hat{β}}_{G M M}^{(2)} \overset{a}{\sim} N (β_{0}, {[X^{'} Z {\hat{G}}^{(1)} Z^{'} X]}^{- 1}) . \end{matrix} \end{matrix}

(12)

It allows testing general restrictions by Wald-type tests. For an individual coefficient, say

β_{0 k} = e_{K, k}^{'} β_{0},

where

e_{K, k}

is a

K \times 1

vector with all elements zero except its

k^{t h}

element

(1 \leq k \leq K)

which is unity, testing

H_{0} : β_{0 k} = β_{k}^{0}

and allowing for one-sided alternatives, amounts to comparing test statistic

\begin{matrix} \begin{matrix} W_{β_{k}} = (e_{K, k}^{'} {\hat{β}}_{G M M}^{(2)} - β_{k}^{0}) / {e_{K, k}^{'} {[X^{'} Z {\hat{G}}^{(1)} Z^{'} X]}^{- 1} e_{K, k}}^{1 / 2} \end{matrix} \end{matrix}

(13)

with the appropriate quantile of the standard normal distribution. Note that this test statistic is actually an asymptotic t-test; in finite samples the type I error probability may deviate from the chosen nominal level, also depending on any employed loss of degrees of freedom corrections. In fact it has been observed that the consistent estimator of the variance of two-step GMM estimators

\hat{V a r} ({\hat{β}}_{G M M}^{(2)})

given in (12) often underestimates the finite sample variance, because in its derivation the randomness of

{\hat{G}}^{(1)}

is not taken into account. Windmeijer [29] provides a corrected formula

\hat{V a r c} ({\hat{β}}_{G M M}^{(2)})

, see Appendix A, which can be used in the corrected t-test

\begin{matrix} \begin{matrix} W_{β_{k}}^{c} = (e_{K, k}^{'} {\hat{β}}_{G M M}^{(2)} - β_{k}^{0}) / {e_{K, k}^{'} \hat{V a r c} ({\hat{β}}_{G M M}^{(2)}) e_{K, k}}^{1 / 2} . \end{matrix} \end{matrix}

(14)

Provided

L > K

the overidentification restrictions can be tested by the Sargan-Hansen statistic. When

Ω = I_{N}

this is given by

J_{Z} = {\hat{u}}_{I V}^{'} P_{Z} {\hat{u}}_{I V} / {\hat{σ}}_{u}^{2},

where

{\hat{u}}_{I V} = y - X {\hat{β}}_{I V}

and

{\hat{σ}}_{u}^{2} = {\hat{u}}_{I V}^{'} {\hat{u}}_{I V} / N .

Because

P_{Z} (y - {\hat{β}}_{I V}) = P_{Z} [I_{N} - X {({\hat{X}}^{'} \hat{X})}^{- 1} {\hat{X}}^{'}] u = P_{Z} M_{\hat{X}} u = (P_{\hat{X}} + P_{{\hat{X}}^{⊥}}) M_{\hat{X}} u = P_{{\hat{X}}^{⊥}} u,

where

P_{{\hat{X}}^{⊥}}

is idempotent,

(X, {\hat{X}}^{⊥})

spans the same subspace as Z while

{\hat{X}}^{'} {\hat{X}}^{⊥} = O,

J_{Z}

is asymptotically

χ^{2}

distributed with

r a n k ({\hat{X}}^{⊥}) = L - K

degrees of freedom under correct specification and using valid instruments.

For general Ω the derivations in subsection 2.2 indicate that its GMM generalization is simply given by

J_{Z} = {(y^{*} - X^{*} {\hat{β}}_{G M M})}^{'} P_{Z^{†}} (y^{*} - X^{*} {\hat{β}}_{G M M}) / {\hat{σ}}_{u}^{2},

which necessarily has asymptotic null distribution

χ_{L - K}^{2}

too. Here

\begin{matrix} \begin{matrix} {\hat{σ}}_{u}^{2} = {(y^{*} - X^{*} {\hat{β}}_{G M M})}^{'} (y^{*} - X^{*} {\hat{β}}_{G M M}) / N = {\hat{u}}_{G M M}^{'} Ω^{- 1} {\hat{u}}_{G M M} / N, \end{matrix} \end{matrix}

(15)

where

{\hat{u}}_{G M M} = y - X {\hat{β}}_{G M M},

and

P_{Z^{†}} (y^{*} - X^{*} {\hat{β}}_{G M M}) = {(Ψ^{'})}^{- 1} Z {(Z^{'} Ω Z)}^{- 1} Z^{'} {\hat{u}}_{G M M} .

Substitution yields for

J_{Z}

the familiar expression

\begin{matrix} \begin{matrix} J_{Z} = {\hat{u}}_{G M M}^{'} Z {(Z^{'} Ω Z)}^{- 1} Z^{'} {\hat{u}}_{G M M} / {\hat{σ}}_{u}^{2}, \end{matrix} \end{matrix}

(16)

which is unfeasible when Ω is unknown.

We will consider various feasible test statistics when Ω is diagonal. Choosing

{\hat{Ω}}^{(0)} = I_{N}

yields

\begin{matrix} \begin{matrix} J_{Z}^{(1, 0)} = {\hat{u}}^{(1)'} Z {(Z^{'} Z)}^{- 1} Z^{'} {\hat{u}}^{(1)} / ({\hat{u}}^{(1)'} {\hat{u}}^{(1)} / N), \end{matrix} \end{matrix}

(17)

with

Z^{'} {\hat{u}}^{(1)} = Z^{'} [I_{N} - X {({\hat{X}}^{'} \hat{X})}^{- 1} {\hat{X}}^{'}] u .

The variance of the limiting distribution of

N^{- 1 / 2} Z^{'} {\hat{u}}^{(1)}

differs from

σ_{u}^{2} plim N^{- 1} Z^{'} Z,

unless

V a r (u_{i} ∣ z_{i}) = σ_{u}^{2} .

The same holds for

\begin{matrix} \begin{matrix} J_{Z}^{(1, 1)} = {\hat{u}}^{(1)'} Z {\hat{G}}^{(1)} Z^{'} {\hat{u}}^{(1)} . \end{matrix} \end{matrix}

(18)

So, these two tests have a limiting

χ_{L - K}^{2}

distribution only under unconditional or conditional homoskedasticity.

For

\begin{matrix} \begin{matrix} J_{Z}^{(2, 1)} = {\hat{u}}^{(2)'} Z {\hat{G}}^{(1)} Z^{'} {\hat{u}}^{(2)}, \end{matrix} \end{matrix}

(19)

where

{\hat{u}}^{(2)} = y - X {\hat{β}}_{G M M}^{(2)}

and

plim {\hat{G}}^{(1)} = {(σ_{u}^{2} plim N^{- 1} Z^{'} Ω Z)}^{- 1},

we find

\begin{matrix} \begin{matrix} N^{- 1 / 2} Z^{'} {\hat{u}}^{(2)} & = N^{- 1 / 2} Z^{'} (y - X {\hat{β}}_{G M M}^{(2)}) = N^{- 1 / 2} Z^{'} [u - X ({\hat{β}}_{G M M}^{(2)} - β_{0})] \\ \approx N^{- 1 / 2} Z^{'} {I_{N} - X {[X^{'} Z {(Z^{'} Ω Z)}^{- 1} Z^{'} X]}^{- 1} X^{'} Z {(Z^{'} Ω Z)}^{- 1} Z^{'}} u \\ = N^{- 1 / 2} Z^{†'} {I_{N} - X^{*} {[X^{*'} P_{Z^{†}} X^{*}]}^{- 1} X^{*'} P_{Z^{†}}} u^{*}, \end{matrix} \end{matrix}

using the asymptotic approximation (A2) of Appendix A. Then it follows that

J_{Z}^{(2, 1)}

converges to

u^{*'} M_{{\hat{X}}^{*}} P_{Z^{†}} M_{{\hat{X}}^{*}} u^{*} / σ_{u}^{2} = u^{*'} P_{{\hat{X}}^{* ⊥}} u^{*} / σ_{u}^{2},

where

{\hat{X}}^{*} = P_{Z^{†}} X^{*},

({\hat{X}}^{*}, {\hat{X}}^{* ⊥}) = Z^{†}

and

{\hat{X}}^{*'} {\hat{X}}^{* ⊥} = O .

Hence

J_{Z}^{(2, 1)}

does have asymptotic null distribution

χ_{L - K}^{2},

and so will

J_{Z}^{(2, 2)},

where

{\hat{u}}^{(2)} = y - X {\hat{β}}_{G M M}^{(2)}

is used to construct

{\hat{G}}^{(2)} .

When

Z = (Z_{m}

,

Z_{a}),

where

Z_{m}

is an

N \times L_{m}

matrix with

L_{m} \geq K

containing the instruments whose validity seems very likely, then, under the maintained hypothesis

E (Z_{m}^{'} u) = 0

, one can test the validity of the

L - L_{m}

additional instruments

Z_{a}

by the incremental test statistic

\begin{matrix} \begin{matrix} J I_{Z_{a}}^{(2, 1)} = {\hat{u}}^{(2)'} Z {\hat{G}}^{(1)} Z^{'} {\hat{u}}^{(2)} - {\hat{u}}_{m}^{(2)'} Z_{m} {\hat{G}}_{m}^{(1)} Z_{m}^{'} {\hat{u}}_{m}^{(2)}, \end{matrix} \end{matrix}

(20)

which under correct specification of the model with valid instruments Z is distributed as

χ_{L - L_{m}}^{2}

asymptotically. Of course,

{\hat{u}}_{m}^{(2)}

and

{\hat{G}}_{m}^{(1)}

are obtained by just using the instruments

Z_{m} .

Note that for

m = K

we have

J I_{Z_{a}}^{(2, 1)} = J_{Z}^{(2, 1)}

because in that case

Z_{m}^{'} {\hat{u}}_{m}^{(2)} = 0 .

Hence, when

m = K,

explicit specification of component

Z_{m}

is meaningless.

In simulations it is interesting to examine as well unfeasible versions of the above test statistics, which exploit information that is usually not available in practice. This will produce evidence on what elements of the feasible asymptotic tests may cause any inaccuracies in finite samples. So, next to (13), (19) and (20) we will also examine

\begin{matrix} W_{β_{k}}^{(u)} & = (e_{K, k}^{'} {\hat{β}}_{G M M} - β_{k}^{0}) / {σ_{u}^{2} e_{K, k}^{'} {[X^{'} Z {(Z^{'} Ω Z)}^{- 1} Z^{'} X]}^{- 1} e_{K, k}}^{1 / 2}, \end{matrix}

(21)

\begin{matrix} J_{Z}^{(u)} & = {\hat{u}}^{'} Z {(Z^{'} Ω Z)}^{- 1} Z^{'} \hat{u} / σ_{u}^{2}, where \hat{u} = y - X {\hat{β}}_{G M M}, \end{matrix}

(22)

\begin{matrix} J I_{Z_{a}}^{(u)} & = [{\hat{u}}^{'} Z {(Z^{'} Ω Z)}^{- 1} Z^{'} \hat{u} - {\hat{u}}_{m}^{'} Z_{m} {(Z_{m}^{'} Ω Z_{m})}^{- 1} Z_{m}^{'} {\hat{u}}_{m}] / σ_{u}^{2} . \end{matrix}

(23)

Similar feasible and unfeasible implementations of t-tests and Sargan-Hansen tests for MGMM based estimators follow straight-forwardly.

3. Implementations for Dynamic Micro Panel Models

3.1. Model and Assumptions

We consider the balanced linear dynamic panel data model

(i = 1, . . ., N;

t = 1, . . ., T)

\begin{matrix} \begin{matrix} y_{i t} = x_{i t}^{'} β + w_{i t}^{'} γ + v_{i t}^{'} δ + μ + τ_{t} + η_{i} + ε_{i t}, \end{matrix} \end{matrix}

(24)

where

x_{i t}

contains

K_{x} \geq 0

strictly exogenous regressors (excluding a constant and fixed time effects),

w_{i t}

are

K_{w} \geq 0

predetermined regressors (probably including lags of the dependent variable and other variables affected by lagged feedback from

y_{i t}

or just from

ε_{i t}

),

v_{i t}

are

K_{v} \geq 0

endogenous regressors (affected by instantaneous feedback from

y_{i t}

and therefore jointly dependent with

y_{i t}

), μ is an overall constant, the

τ_{t}

are random or fixed time effects, the

η_{i}

are random individual specific effects (most likely correlated with many of the regressors) such that

\begin{matrix} \begin{matrix} η_{i} \sim i i d (0, σ_{η}^{2}), \end{matrix} \end{matrix}

(25)

whereas

\begin{matrix} \begin{matrix} E (ε_{i t}) = 0, E (ε_{i t}^{2}) = σ_{i t}^{2}, E (ε_{i t} ε_{j s}) = 0, E (η_{i} ε_{j t}) = 0, \forall i, j, t \neq s . \end{matrix} \end{matrix}

(26)

The parameter vector τ could have all its elements equal and then will be absorbed by the overall intercept μ of the model. However, it seems better to allow for time effects in addition to an intercept, because this helps to underpin the assumption that both the idiosyncratic disturbances and the individual effects have expectation zero. Note, though, that for identification at least one restriction should be imposed on the

T + 1

scalar parameters represented by μ and

τ .

The classification of the regressors implies

\begin{matrix} \begin{matrix} E (x_{i t} ε_{i s}) = 0, E (w_{i t} ε_{i, t + l}) = 0, E (v_{i t} ε_{i, t + 1 + l}) = 0, \forall i, t, s, l \geq 0 . \end{matrix} \end{matrix}

(27)

For the sake of simplicity we assume that all regressors are time varying and that the vectors

x_{i t},

w_{i t}

or

v_{i t}

are defined for

t = 1, . . ., T

. However, their elements may contain observations prior to

t = 1

for regressors that are actually the

l^{t h}

order lag of a current variable. Only these lagged regressors are observed from

t = 1 - l

onwards. This means that all regressors in (24), be it current variables or lags of them, have exactly T observations. So, any unbalancedness problems have been defined away; moreover, no internal instrumental variables can be constructed involving observations prior to those included in

x_{i 1},

w_{i 1}

or

v_{i 1} .

Stacking the T time-series observations of (24) the equation in levels can be written

\begin{matrix} \begin{matrix} y_{i} = X_{i} β + W_{i} γ + V_{i} δ + I_{T} τ + ι_{T} (μ + η_{i}) + ε_{i}, \end{matrix} \end{matrix}

(28)

where

y_{i} = {(y_{i 1} \dots y_{i T})}^{'},

X_{i} = {(x_{i 1} \dots x_{i T})}^{'},

W_{i} = {(w_{i 1} \dots w_{i T})}^{'},

V_{i} = {(v_{i 1} \dots v_{i T})}^{'},

τ = {(τ_{1} \dots τ_{T})}^{'}

and

ι_{T}

is the

T \times 1

vector with all its elements equal to unity. We do allow

K_{x} = 0,

K_{w} = 0,

K_{v} = 0,

but not all three at the same time, so

\begin{matrix} \begin{matrix} K = K_{x} + K_{w} + K_{v} > 0 . \end{matrix} \end{matrix}

(29)

We will focus on micro panels, where the number of time-series observations T is usually very small, possibly a one digit number, and the number of cross-section units N is large, usually at least several hundreds. Therefore, asymptotic approximations will be for

N \to \infty

and T finite.

3.2. Removing Individual Effects by First Differencing

First we consider estimating the model by GMM following the approach propounded by Arellano and Bond [1], see also Holtz-Eakin et al. [37]. To clarify this, and the consequences it may have for the time-dummies, we introduce the matrices

\begin{matrix} \begin{matrix} D_{T} = (\begin{matrix} - 1 & 1 & 0 & \dots & 0 \\ 0 & - 1 & 1 & ⋮ \\ ⋮ & ⋱ & ⋱ & 0 \\ 0 & 0 & \dots & - 1 & 1 \end{matrix}) and D_{T - 1}^{*} = (\begin{matrix} 1 & 0 & 0 \\ - 1 & 1 & ⋱ & ⋮ \\ ⋱ & 1 & 0 \\ 0 & \dots & - 1 & 1 \end{matrix}), \end{matrix} \end{matrix}

(30)

where

D_{T}

is

(T - 1) \times T

and

D_{T - 1}^{*}

is its

(T - 1) \times (T - 1)

submatrix after removing the first column. By taking first differences the intercept and the individual effects are removed and one may estimate the N sets of

T - 1

equations

D_{T} y_{i} = D_{T} X_{i} β + D_{T} W_{i} γ + D_{T} V_{i} δ + D_{T} τ + D_{T} ε_{i} .

Denoting

{\tilde{y}}_{i} = D_{T} y_{i},

{\tilde{X}}_{i} = D_{T} X_{i},

{\tilde{W}}_{i} = D_{T} W_{i},

{\tilde{V}}_{i} = D_{T} V_{i},

{\tilde{ε}}_{i} = D_{T} ε_{i}

and

\tilde{τ} = D_{T} τ,

where

{\tilde{τ}}_{t} = τ_{t} - τ_{t - 1}

for

t = 2, . . ., T,

this can compactly be expressed as

{\tilde{y}}_{i} = {\bar{R}}_{i} \bar{α} + {\tilde{ε}}_{i},

where

{\bar{R}}_{i} = ({\tilde{X}}_{i}, {\tilde{W}}_{i}, {\tilde{V}}_{i}, I_{T - 1})

and

\bar{α} = {(β^{'}, γ^{'}, δ^{'}, {\tilde{τ}}^{'})}^{'} .

The popular Stata package xtabond2 (StataCorp LLC, College Station, TX, USA) reparametrizes the time-effects differently (as we found out by experimentation). It substitutes

\tilde{τ} = D_{T - 1}^{*} τ^{*} .

Hence,

τ_{t}^{*} = τ_{t} - τ_{1}

for

t = 2, . . ., T,

and it estimates

\begin{matrix} \begin{matrix} {\tilde{y}}_{i} = {\tilde{R}}_{i} \tilde{α} + {\tilde{ε}}_{i}, \end{matrix} \end{matrix}

(31)

where

{\tilde{R}}_{i} = ({\tilde{X}}_{i}, {\tilde{W}}_{i}, {\tilde{V}}_{i}, D_{T - 1}^{*})

and

\tilde{α} = {(β^{'}, γ^{'}, δ^{'}, τ^{*'})}^{'} .

So, basically, it addresses the problem that not all T time-dummy coefficients can be identified, by replacing the submatrix of regressors

D_{T},

which has rank

T - 1,

by full rank matrix

D_{T - 1}^{*},

so by simply removing its first column, with the effect that coefficients

τ^{*}

will be estimated.

Defining

\begin{matrix} \begin{matrix} Q_{T} = (\begin{matrix} 0^{'} \\ I_{T - 1} \end{matrix}) and A_{T} = (\begin{matrix} 1 & 0 & \dots & 0 \\ 1 & 1 & 0 \\ ⋮ & ⋮ & ⋱ & ⋮ \\ 1 & 1 & \dots & 1 \end{matrix}), \end{matrix} \end{matrix}

(32)

where

Q_{T}

is

T \times (T - 1)

and

A_{T}

is

T \times T

lower-triangular, one easily finds that

D_{T} Q_{T} = D_{T - 1}^{*} = A_{T - 1}^{- 1} .

Instead of

τ^{*},

one can directly estimate

\tilde{τ} = D_{T - 1}^{*} τ^{*}

by xtabond2 (StataCorp LLC) by replacing in the equation in levels

I_{T} τ

by

A_{T} τ^{* *},

hence by replacing the time dummy variables by accumulated time dummies. Here

τ^{* *} = A_{T}^{- 1} τ = D_{T + 1} Q_{T + 1} τ = D_{T + 1} {(0, τ^{'})}^{'} = {(τ_{1}, {\tilde{τ}}^{'})}^{'} .

Note that

D_{T} A_{T} = (0, I_{T - 1}) .

Hence,

I_{T - 1} \tilde{τ} = \tilde{τ}

would remain, after removal of the first column, as in

{\tilde{y}}_{i} = {\bar{R}}_{i} \bar{α} + {\tilde{ε}}_{i}

. Of course, many more different transformations of the T coefficients τ can be estimated, though, by taking first differences only

T - 1

linear transformations of them can be identified; by interpreting them appropriately no fundamental differences emerge.

To construct a full column rank matrix of instrumental variables

Z = {(Z_{1}^{'} \dots Z_{N}^{'})}^{'},

which expresses as many linearly independent orthogonality conditions as possible for (31), while restricting ourselves to internal variables, i.e., variables occurring in (24), we define the following vectors

\begin{matrix} \begin{matrix} x_{i}^{T'} = (x_{i 1}^{'} \dots x_{i T}^{'}), w_{i}^{t'} = (w_{i 1}^{'} \dots w_{i t}^{'}), v_{i}^{t'} = (v_{i 1}^{'} \dots v_{i t}^{'}) . \end{matrix} \end{matrix}

(33)

Without making this explicit in the notation it should be understood that these three vectors only contain unique elements. Hence, if vector

x_{i s}

(or

w_{i s})

contains for

1 < s \leq T

a particular value and also its lag (which is not possible for

v_{i t}

), then this lag should be taken out since it already appears in

x_{i, s - 1} .

Matrix

Z_{i}

is of order

(T - 1) \times L

and consists of four blocks (though some of these may be void)

\begin{matrix} \begin{matrix} Z_{i} = (Z_{i}^{x}, Z_{i}^{w}, Z_{i}^{v}, D_{T - 1}^{*}) . \end{matrix} \end{matrix}

(34)

The final block spans the same space as

I_{T - 1}

and could thus be replaced by

I_{T - 1} .

It is associated with the fundamental moment conditions

E ({\tilde{ε}}_{i}) = 0 .

Therefore, it could form part of

Z_{i}

even if one imposes

τ = 0 .

For the other blocks we have

\begin{matrix} \begin{matrix} Z_{i}^{x} = I_{T - 1} \otimes x_{i}^{T'}, Z_{i}^{w} = (\begin{matrix} w_{i}^{1'} & 0^{'} & 0^{'} \\ O & ⋱ & O \\ 0^{'} & 0^{'} & w_{i}^{T - 1'} \end{matrix}), Z_{i}^{v} = (\begin{matrix} 0^{'} & 0^{'} & 0^{'} \\ v_{i}^{1'} & 0^{'} & 0^{'} \\ O & ⋱ & O \\ 0^{'} & 0^{'} & v_{i}^{T - 2'} \end{matrix}) . \end{matrix} \end{matrix}

(35)

The maximum possible number of columns of

Z_{i}^{x}

is

K_{x} T (T - 1),

for

Z_{i}^{w}

it is

K_{w} T (T - 1) / 2

and for

Z_{i}^{v}

it is

K_{v} (T - 1) (T - 2) / 2,

thus

\begin{matrix} \begin{matrix} L \leq (T - 1) {T [K_{x} + (K_{w} + K_{v}) / 2] - K_{v} + 1}, \end{matrix} \end{matrix}

(36)

whereas MM estimation requires

L \geq K + T - 1 .

It follows from (26) and (27) that

E (Z_{i}^{'} {\tilde{ε}}_{i}) = 0

indeed. In actual estimation one may use a subset of these instruments by taking the linear transformation

Z_{i}^{*} = Z_{i} C,

where C is an

L \times L^{*}

matrix (with all its elements often being either zero, one or minus one) of rank

L^{*} < L,

provided

L^{*} \geq K + T - 1 .

In the above we have implicitly assumed that the variables are such that

Z = {(Z_{1}^{'} \dots Z_{N}^{'})}^{'}

will have full column rank, so another necessary condition is

N (T - 1) \geq L^{*} .

Of course, it is not required that individual blocks

Z_{i}

have full column rank.

Despite its undesirable effect on the asymptotic variance of method of moments estimators, reducing the number of instruments may improve estimation precision, because it may at the same time mitigate estimation bias in finite samples, especially when weak instruments are being removed. So, instead of including the block

D_{T - 1}^{*}

or

I_{T - 1}

in

Z_{i}

one could—especially when the model has no time-effects—replace it by

I_{T - 1} ι_{T - 1} = ι_{T - 1} .

Regarding

Z_{i}^{w}

and

Z_{i}^{v}

two alternative instrument reduction methods have been suggested, namely omitting long lags (see Bowsher [14], Windmeijer [29] and Bun and Kiviet [19]) and collapsing (see Roodman [10], but also suggested in Anderson and Hsiao [38]). Both are employed in Ziliak [35]; these two methods can also be combined.

Omitting long lags could be achieved by reducing

Z_{i}^{w}

to, for instance,

\begin{matrix} \begin{matrix} (\begin{matrix} w_{i 1}^{'} & 0^{'} & 0^{'} & 0^{'} & 0^{'} & 0^{'} & 0^{'} \\ 0^{'} & w_{i 1}^{'} & w_{i 2}^{'} & 0^{'} & 0^{'} & 0^{'} & 0^{'} \\ 0^{'} & 0^{'} & 0^{'} & w_{i 2}^{'} & w_{i 3}^{'} & 0^{'} & 0^{'} \\ ⋮ & ⋮ & ⋮ & O^{'} & O^{'} & ⋱ & O^{'} & O^{'} \\ 0^{'} & 0^{'} & 0^{'} & 0^{'} & 0^{'} & w_{i, T - 2}^{'} & w_{i, T - 1}^{'} \end{matrix}) \end{matrix} \end{matrix}

(37)

and similar for

Z_{i}^{v} .

The collapsed versions of

Z_{i}^{w}

and of

Z_{i}^{v}

can be denoted as

\begin{matrix} \begin{matrix} Z_{i}^{* w} = (\begin{matrix} w_{i 1}^{'} & 0^{'} & \dots & 0^{'} \\ w_{i 2}^{'} & w_{i 1}^{'} & ⋮ \\ ⋮ & ⋱ & O \\ w_{i, T - 1}^{'} & w_{i, T - 2}^{'} & \dots & w_{i, 1}^{'} \end{matrix}), Z_{i}^{* v} = (\begin{matrix} 0^{'} & 0^{'} & \dots & 0^{'} \\ v_{i 1}^{'} & 0^{'} & 0^{'} \\ v_{i 2}^{'} & v_{i 1}^{'} & ⋮ \\ ⋮ & ⋱ & O \\ v_{i, T - 2}^{'} & w_{i, T - 3}^{'} & \dots & v_{i, 1}^{'} \end{matrix}) . \end{matrix} \end{matrix}

(38)

Collapsing can be combined with omitting long lags, if one removes all the columns of

Z_{i}^{* w}

and

Z_{i}^{* v}

which have at least a certain number of zero elements (say 1 or 2 or more) in their top rows. In corresponding ways, the column space of

Z_{i}^{x}

can be reduced by including in

Z_{i}

either a limited number of lags and leads, or the collapsed matrix

\begin{matrix} \begin{matrix} Z_{i}^{* x} = (\begin{matrix} x_{i 2}^{'} & x_{i 1}^{'} & 0^{'} & \dots & 0^{'} \\ x_{i 3}^{'} & x_{i 2}^{'} & x_{i 1}^{'} & ⋮ \\ ⋮ & ⋮ & ⋱ & O \\ x_{i, T}^{'} & x_{i, T - 1}^{'} & x_{i, T - 2}^{'} & \dots & x_{i, 1}^{'} \end{matrix}), \end{matrix} \end{matrix}

(39)

or just its first two or three columns – or what is often done in practice – simply the difference between the first two columns, the

K_{x}

regressors

{(Δ x_{i 2}, . . ., Δ x_{i T})}^{'} .

It seems useful to distinguish the following specific forms of instrument matrix reduction of the case where all instruments associated with valid linear moment restrictions are being used. The latter case we label as A (all); the reductions are labelled C (standard collapsing), L0, L1, L2, L3 (which primarily restrict the lag length), and C0, C1, C2, C3 (which combine the two reduction principles). In all the reductions we replace

I_{T - 1}

by

ι_{T - 1}

when the model does not include time-effects. Regarding

Z_{i}^{x},

Z_{i}^{w}

and

Z_{i}^{v}

different types of reductions can be taken, which we will distinguish by using for example the characterization:

A^{v},

L2

^{w},

C1

^{x}

, etc. This leads to the particular reductions as indicated and defined in Table 1.

Note that for all three types of regressors L2, like L1, uses one extra lag compared to L0, but does not impose the first-difference restrictions characterizing L1. We skipped a similar intermediary case between L2 and L3. Self-evidently L2

^{x}

can also be represented by combining

diag (x_{i 1}^{'}, . . ., x_{i, T - 1}^{'})

with L1

^{x},

and similar for L2

^{w}

and L2

^{v} .

The reductions C0 and C1, which yield just one instrument per regressor, constitute generalizations of the classic instruments suggested by Anderson and Hsiao [38]. These may lead to just identified models where the number of instruments equals the number of regressors which provokes the non-existence of moments problem. To avoid that, and also because we suppose that in general some degree of overidentification will have advantages regarding both estimation precision and the opportunity to test model adequacy, one may choose to restrict oneself to the popular C1

^{x}

and the reductions C and C3 as far as collapsing is concerned. In C3 just the first three columns of the matrices in (38) and (39) are being used as instruments.

3.2.1. Alternative Weighting Matrices

We assumed in (26) that the

ε_{i t}

are serially and cross-sectionally uncorrelated but may be heteroskedastic. Let us define the matrix

Ω_{i} = diag (σ_{i 1}^{2}, . . ., σ_{i T}^{2}),

thus

ε_{i} \sim (0, Ω_{i})

and

{\tilde{ε}}_{i} = D_{T} ε_{i} \sim (0, D_{T} Ω_{i} D_{T}^{'}) .

Under standard regularity we have

\begin{matrix} \begin{matrix} N^{- 1 / 2} Σ_{i = 1}^{N} Z_{i}^{'} {\tilde{ε}}_{i} \overset{d}{\to} N (0, plim N^{- 1} Σ_{i = 1}^{N} Z_{i}^{'} D_{T} Ω_{i} D_{T}^{'} Z_{i}) . \end{matrix} \end{matrix}

(40)

Hence, the optimal GMM estimator of

\tilde{α}

of (31) should use a weighting matrix such that its inverse has probability limit proportional to

plim N^{- 1} Σ_{i = 1}^{N} Z_{i}^{'} D_{T} Ω_{i} D_{T}^{'} Z_{i} .

This can be achieved by first obtaining a consistent 1-step GMM estimator

\begin{matrix} \begin{matrix} {\hat{\tilde{α}}}^{(1)} = {[(Σ_{i = 1}^{N} {\tilde{R}}_{i}^{'} Z_{i}) G^{(0)} (Σ_{i = 1}^{N} Z_{i}^{'} {\tilde{R}}_{i})]}^{- 1} (Σ_{i = 1}^{N} {\tilde{R}}_{i}^{'} Z_{i}) G^{(0)} (Σ_{i = 1}^{N} Z_{i}^{'} {\tilde{y}}_{i}), \end{matrix} \end{matrix}

(41)

which uses the weighting matrix

\begin{matrix} \begin{matrix} G^{(0)} = {(Σ_{i = 1}^{N} Z_{i}^{'} D_{T} D_{T}^{'} Z_{i})}^{- 1} . \end{matrix} \end{matrix}

(42)

This is already efficient if

Ω_{i} = σ_{ε}^{2} I_{T};

otherwise, in a second step, the consistent 1-step residuals

{\hat{\tilde{ε}}}_{i}^{(1)} = {\tilde{y}}_{i} - {\tilde{R}}_{i} {\hat{\tilde{α}}}^{(1)}

can be used to construct the asymptotically optimal weighting matrix

\begin{matrix} \begin{matrix} {\hat{G}}_{a}^{(1)} = {(Σ_{i = 1}^{N} Z_{i}^{'} {\hat{\tilde{ε}}}_{i}^{(1)} {\hat{\tilde{ε}}}_{i}^{(1)'} Z_{i})}^{- 1} . \end{matrix} \end{matrix}

(43)

An alternative is using

\begin{matrix} \begin{matrix} {\hat{G}}_{b}^{(1)} = {(Σ_{i = 1}^{N} Z_{i}^{'} {\hat{H}}_{i}^{(1) b} Z_{i})}^{- 1}, \end{matrix} \end{matrix}

(44)

where

{\hat{H}}_{i}^{(1) b}

is the band matrix

\begin{matrix} \begin{matrix} {\hat{H}}_{i}^{(1) b} = (\begin{matrix} {\hat{\tilde{ε}}}_{i 2}^{(1)} {\hat{\tilde{ε}}}_{i 2}^{(1)} & {\hat{\tilde{ε}}}_{i 2}^{(1)} {\hat{\tilde{ε}}}_{i 3}^{(1)} & 0 & \dots & 0 \\ {\hat{\tilde{ε}}}_{i 3}^{(1)} {\hat{\tilde{ε}}}_{i 2}^{(1)} & {\hat{\tilde{ε}}}_{i 3}^{(1)} {\hat{\tilde{ε}}}_{i 3}^{(1)} & {\hat{\tilde{ε}}}_{i 3}^{(1)} {\hat{\tilde{ε}}}_{i 4}^{(1)} & 0 \\ 0 & {\hat{\tilde{ε}}}_{i 4}^{(1)} {\hat{\tilde{ε}}}_{i 3}^{(1)} & {\hat{\tilde{ε}}}_{i 4}^{(1)} {\hat{\tilde{ε}}}_{i 4}^{(1)} & ⋱ & 0 \\ ⋮ & ⋱ & ⋱ & ⋱ & ⋮ \\ 0 & ⋱ & {\hat{\tilde{ε}}}_{i, T - 1}^{(1)} {\hat{\tilde{ε}}}_{i, T - 1}^{(1)} & {\hat{\tilde{ε}}}_{i, T - 1}^{(1)} {\hat{\tilde{ε}}}_{i T}^{(1)} \\ 0 & 0 & 0 & \dots & {\hat{\tilde{ε}}}_{i T}^{(1)} {\hat{\tilde{ε}}}_{i, T - 1}^{(1)} & {\hat{\tilde{ε}}}_{i T}^{(1)} {\hat{\tilde{ε}}}_{i T}^{(1)} \end{matrix}) . \end{matrix} \end{matrix}

(45)

Both

{(N {\hat{G}}_{a}^{(1)})}^{- 1}

and

{(N {\hat{G}}_{b}^{(1)})}^{- 1}

have a probability limit equal to the limiting variance of (40). The latter is less robust, but may converge faster when

Ω_{i}

is diagonal indeed. On the other hand (44) may not be positive definite, whereas (43) is.

For the special case

Ω_{i} = σ_{ε, i}^{2} I_{T}

of cross-section heteroskedasticity but time-series homoskedasticity, one could use

\begin{matrix} \begin{matrix} {\hat{G}}_{c}^{(1)} = {(Σ_{i = 1}^{N} Z_{i}^{'} {\hat{H}}_{i}^{(1) c} Z_{i})}^{- 1}, \end{matrix} \end{matrix}

(46)

with

\begin{matrix} \begin{matrix} {\hat{H}}_{i}^{(1) c} = {\hat{σ}}_{ε, i}^{2, (1)} H = {\hat{σ}}_{ε, i}^{2, (1)} (\begin{matrix} 2 & - 1 & 0 & \dots & 0 \\ - 1 & 2 & ⋱ & ⋱ & ⋮ \\ 0 & ⋱ & ⋱ & ⋱ & 0 \\ ⋮ & ⋱ & ⋱ & 2 & - 1 \\ 0 & \dots & 0 & - 1 & 2 \end{matrix}), \end{matrix} \end{matrix}

(47)

where

H = D_{T} D_{T}^{'}

and

\begin{matrix} \begin{matrix} {\hat{σ}}_{ε, i}^{2, (1)} = {\hat{\tilde{ε}}}_{i}^{(1)'} H^{- 1} {\hat{\tilde{ε}}}_{i}^{(1)} / (T - 1) . \end{matrix} \end{matrix}

(48)

Of course, these N estimators are not consistent for T finite. However, a consistent estimator for

σ_{ε}^{2} = N^{- 1} Σ_{i = 1}^{N} σ_{ε, i}^{2}

is given by

\begin{matrix} \begin{matrix} {\hat{σ}}_{ε}^{2, (1)} = N^{- 1} Σ_{i = 1}^{N} {\hat{σ}}_{ε, i}^{2, (1)} . \end{matrix} \end{matrix}

(49)

The three different weighting matrices can be used to calculate alternative

{\hat{\tilde{α}}}_{(j)}^{(2)}

estimators for

j \in {a, b, c}

according to

\begin{matrix} \begin{matrix} {\hat{\tilde{α}}}_{(j)}^{(2)} = {[(Σ_{i = 1}^{N} {\tilde{R}}_{i}^{'} Z_{i}) {\hat{G}}_{j}^{(1)} (Σ_{i = 1}^{N} Z_{i}^{'} {\tilde{R}}_{i})]}^{- 1} (Σ_{i = 1}^{N} {\tilde{R}}_{i}^{'} Z_{i}) {\hat{G}}_{j}^{(1)} (Σ_{i = 1}^{N} Z_{i}^{'} {\tilde{y}}_{i}) . \end{matrix} \end{matrix}

(50)

When the employed weighting matrix is asymptotically optimal indeed, the first-order asymptotic approximation to the variance of

{\hat{\tilde{α}}}_{(j)}^{(2)}

is given by the inverse of the matrix in square brackets. From this (corrected) t-tests are easily obtained, see Section 3.4. Matching implementations of Sargan-Hansen statistics follow easily too, see Section 3.5. Note that estimators for

σ_{ε, i}^{2}

or

σ_{ε}^{2}

can also be obtained by employing second-stage residuals.

Let

\hat{\tilde{α}}

represent any of the consistent estimators of

\tilde{α}

mentioned above, and consider the residuals

{\hat{u}}_{i}^{†} = y_{i} - (X_{i}, W_{i}, V_{i}, Q_{T}) \hat{\tilde{α}} .

From these we find

\hat{μ + τ_{1}} = N^{- 1} T^{- 1} Σ_{i = 1}^{N} ι_{T}^{'} {\hat{u}}_{i}^{†}

, giving

{\hat{u}}_{i} = {\hat{u}}_{i}^{†} - \hat{μ + τ_{1}},

which for

N \to \infty

converges to

u_{i} = ι_{T} η_{i} + ε_{i} .

Since

E (u_{i} u_{i}^{'}) = σ_{ε, i}^{2} I_{T} + σ_{η}^{2} ι_{T} ι_{T}^{'}

we have

plim N^{- 1} Σ_{i = 1}^{N} {\hat{u}}_{i} {\hat{u}}_{i}^{'} = σ_{ε}^{2} I_{T} + σ_{η}^{2} ι_{T} ι_{T}^{'} .

This yields

plim N^{- 1} Σ_{i = 1}^{N} ι_{T}^{'} {\hat{u}}_{i} {\hat{u}}_{i}^{'} ι_{T} = T σ_{ε}^{2} + σ_{η}^{2} T^{2}

from which the consistent estimator

T^{- 2} N^{- 1} Σ_{i = 1}^{N} {(ι_{T}^{'} {\hat{u}}_{i})}^{2} - T^{- 1} {\hat{σ}}_{ε}^{2, (1)}

for

σ_{η}^{2}

follows. From simulations we established that, especially when

σ_{η}

is relatively small, this estimator is often negative. An alternative more satisfactory consistent estimator turns out to be

\begin{matrix} \begin{matrix} {\hat{σ}}_{η}^{2, (1)} = T^{- 1} N^{- 1} Σ_{i = 1}^{N} {\hat{u}}_{i}^{'} {\hat{u}}_{i} - {\hat{σ}}_{ε}^{2, (1)} . \end{matrix} \end{matrix}

(51)

This does not hinge as much on the serial uncorrelatedness of the

ε_{i t} .

However, estimator (51) can be negative too, especially when

σ_{η} / σ_{ε}

is small or T is very small. When this happens it seems reasonable to set

{\hat{σ}}_{η}^{2, (1)} = 0 .

Note that this does not jeopardize the consistency of the estimator; therefore we followed this approach in the simulations, rather then using the non-negative estimator

N^{- 1} Σ_{i = 1}^{N} {\hat{η}}_{i}^{2},

where

{\hat{η}}_{i} = T^{- 1} Σ_{t = 1}^{T} {\hat{u}}_{i t}^{†} - \hat{μ + τ_{1}},

since this is inconsistent, because

E ({\hat{η}}_{i}^{2}) \neq σ_{η}^{2}

.

3.3. Respecting the Equation in Levels as Well

In this subsection we will examine whether the first-difference operation in the foregoing subsection implied a loss of valid orthogonality conditions embodied by our initial assumptions made in Section 3.1.

Since

τ = τ_{1} ι_{T} + Q_{T} τ^{*},

we can rewrite model (28) as

\begin{matrix} \begin{matrix} y_{i} & = X_{i} β + W_{i} γ + V_{i} δ + Q_{T} τ^{*} + (μ + τ_{1} + η_{i}) ι_{T} + ε_{i} \\ = R_{i} \tilde{α} + (μ + τ_{1}) ι_{T} + u_{i} \\ = R_{i}^{*} \ddot{α} + u_{i}, \end{matrix} \end{matrix}

(52)

where

R_{i} = (X_{i}, W_{i}, V_{i}, Q_{T}),

R_{i}^{*} = (R_{i}, ι_{T})

and

(K + T) \times 1

vector

\ddot{α} = {({\tilde{α}}^{'}, μ + τ_{1})}^{'};

note that

{\tilde{R}}_{i} = D_{T} R_{i} .

Regressor

ι_{T}

is a valid instrument for model (52). It embodies the single orthogonality condition

E [Σ_{t = 1}^{T} (η_{i} + ε_{i t})] = E [Σ_{t = 1}^{T} u_{i t}] = 0

(\forall i),

which is implied by the

T + 1

assumptions

E (η_{i}) = 0

and

E (ε_{i t}) = 0

(for

t = 1, . . ., T)

made in (25) and (26). These

T + 1

assumptions can also be represented (through linear transformation) by (i)

E (η_{i}) = 0,

(ii)

E (Δ ε_{i t}) = 0

(for

t = 2, . . ., T)

and (iii)

E (Σ_{t = 1}^{T} u_{i t}) = 0 .

Because we cannot express

η_{i}

exclusively in observed variables and unknown parameters it is impossible to convert (i) into a separate sample orthogonality condition. The

T - 1

orthogonality conditions (ii) are already employed by Arellano-Bond estimation, through including

I_{T - 1}

or

D_{T - 1}^{*}

in

Z_{i}

of (34) for the equation in first differences. Orthogonality condition (iii), which is in terms of the level disturbance, can be exploited by including the column

ι_{T}

in the i-th block of an instrument matrix for level equation (52). Apparently, this condition will get lost when estimation is just based on the equation in first differences.

Combining the

T - 1

difference equations and the T level equations in a system yields

\begin{matrix} \begin{matrix} {\ddot{y}}_{i} = {\ddot{R}}_{i} \ddot{α} + {\ddot{u}}_{i}, \end{matrix} \end{matrix}

(53)

for each individual

i,

where

{\ddot{y}}_{i} = {({\tilde{y}}_{i}^{'}, y_{i}^{'})}^{'},

{\ddot{R}}_{i} = {({\tilde{R}}_{i}^{*'}, R_{i}^{*'})}^{'},

with

{\tilde{R}}_{i}^{*} = ({\tilde{R}}_{i}, 0),

so it is extended by an extra column of zeros (to annihilate coefficient

μ + τ_{1}

in the equation in first differences), and

{\ddot{u}}_{i} = {({\tilde{ε}}_{i}^{'}, u_{i}^{'})}^{'} .

We find that

E ({\tilde{ε}}_{i} u_{i}^{'}) = E (D ε_{i} ε_{i}^{'}) = D Ω_{i}

and

E (u_{i} u_{i}^{'}) = E [(η_{i} ι_{T} + ε_{i}) {(η_{i} ι_{T} + ε_{i})}^{'}] = σ_{η}^{2} ι_{T} ι_{T}^{'} + Ω_{i},

so

\begin{matrix} \begin{matrix} E ({\ddot{u}}_{i} {\ddot{u}}_{i}^{'}) = (\begin{matrix} D Ω_{i} D^{'} & D Ω_{i} \\ Ω_{i} D^{'} & Ω_{i} + σ_{η}^{2} ι_{T} ι_{T}^{'} \end{matrix}) . \end{matrix} \end{matrix}

(54)

Model (53) can be estimated by MM using the

N (2 T - 1) \times (L + 1)

matrix of instruments with blocks

\begin{matrix} \begin{matrix} {\ddot{Z}}_{i} = (\begin{matrix} Z_{i} & 0 \\ O & ι_{T} \end{matrix}), \end{matrix} \end{matrix}

(55)

provided

N (2 T - 1) \geq L + 1 \geq K + T .

Since both

{\ddot{R}}_{i}

and

{\ddot{Z}}_{i}

contain a column

{(0^{'}, ι_{T}^{'})}^{'},

and due to the occurrence of the O-block in

{\ddot{Z}}_{i},

by a minor generalization of result (8) the IV estimator of

\ddot{α}

obtained by using instrument blocks

{\ddot{Z}}_{i}

in (53) will be equivalent regarding

\tilde{α}

with the IV estimator of Equation (31) using instruments with blocks

Z_{i} .

That the same holds here for GMM under cross-sectional heteroskedasticity when using optimal instruments is due to the very special shape of

{\ddot{Z}}_{i}

and is proved in Appendix B. Hence, there seems no good reason to estimate the system, just in order to exploit the extra valid instrument

{(0^{'}, ι_{T}^{'})}^{'} .

3.3.1. Effect Stationarity

However, more valid internal instruments can be found for the equation in levels when some of the regressors

X_{i},

W_{i}

or

V_{i}

are known to be uncorrelated (like

ι_{T}

) with the individual effects, or (which is more general) have time-invariant correlation with the individual effects. Then, after first differencing, these explanatory variables will be uncorrelated with

η_{i} .

Let

r_{i t}^{⧫} = {(x_{i t}^{⧫'}, w_{i t}^{⧫'}, v_{i t}^{⧫'})}^{'}

contain the

K^{⧫} = K_{x}^{⧫} + K_{w}^{⧫} + K_{v}^{⧫}

unique elements of

r_{i t}

which are effect stationary, by which we mean that

E (r_{i t} η_{i})

is time-invariant, so that

\begin{matrix} E (Δ r_{i t}^{⧫} η_{i}) = 0, \forall i, t = 2, . . ., T . \end{matrix}

This implies that for the equation in levels the following orthogonality conditions hold

\begin{matrix} \begin{matrix} \begin{matrix} E [Δ x_{i t}^{⧫} (η_{i} + ε_{i s})] = 0 \\ E [Δ w_{i t}^{⧫} (η_{i} + ε_{i, t + l})] = 0 \\ E [Δ v_{i t}^{⧫} (η_{i} + ε_{i, t + 1 + l})] = 0 \end{matrix}\} \forall i, t > 1, s \geq 1, l \geq 0 \end{matrix} \end{matrix}

(56)

When

w_{i t}^{⧫}

includes

y_{i, t - 1},

then apparently

y_{i t}

is effect stationary so that the adopted model (24) suggests that all regressors in

r_{i t}

must be effect stationary, resulting in

K^{⧫} = K .

Like for the

T - 1

conditions

E (Δ ε_{i t}) = 0

discussed below Equation (52), many of the conditions (56) are already implied by the orthogonality conditions

E (Z_{i}^{'} {\tilde{ε}}_{i}) = 0

for the equation in first-differences. In Appendix C we demonstrate that a matrix

{\tilde{Z}}_{i}^{s}

of instruments can be designed for the equation in levels (52) just containing instruments additional to those already exploited by

E (Z_{i}^{'} {\tilde{ε}}_{i}) = 0,

whilst

E [{\tilde{Z}}_{i}^{s'} (η_{i} ι_{T} + ε_{i})] = 0 .

This is the

T \times L^{⧫}

matrix

\begin{matrix} \begin{matrix} {\tilde{Z}}_{i}^{s} = ({\tilde{Z}}_{i}^{x}, {\tilde{Z}}_{i}^{w}, {\tilde{Z}}_{i}^{v}, ι_{T}), \end{matrix} \end{matrix}

(57)

where

L^{⧫} = K^{⧫} (T - 1) - K_{v}^{⧫} + 1,

with

\begin{matrix} {\tilde{Z}}_{i}^{x} & = (\begin{matrix} 0^{'} & 0^{'} & \dots & 0^{'} \\ Δ x_{i 2}^{⧫'} & 0^{'} & \dots & 0^{'} \\ 0^{'} & Δ x_{i 3}^{⧫'} & 0^{'} \\ ⋮ & ⋱ \\ 0^{'} & 0^{'} & Δ x_{i T}^{⧫'} \end{matrix}), {\tilde{Z}}_{i}^{w} = (\begin{matrix} 0^{'} & 0^{'} & \dots & 0^{'} \\ Δ w_{i 2}^{⧫'} & 0^{'} & \dots & 0^{'} \\ 0^{'} & Δ w_{i 3}^{⧫'} & 0^{'} \\ ⋮ & ⋱ \\ 0^{'} & 0^{'} & Δ w_{i T}^{⧫'} \end{matrix}), \\ {\tilde{Z}}_{i}^{v} & = (\begin{matrix} 0^{'} & \dots & 0^{'} \\ 0^{'} & \dots & 0^{'} \\ Δ v_{i 2}^{⧫'} & 0^{'} \\ ⋱ \\ 0^{'} & Δ v_{i, T - 1}^{⧫'} \end{matrix}) . \end{matrix}

Under effect stationarity of the

K^{⧫}

variables (56) the system (53) can be estimated while exploiting the matrix of instruments

\begin{matrix} \begin{matrix} {\ddot{Z}}_{i}^{s} = (\begin{matrix} Z_{i} & O \\ O & {\tilde{Z}}_{i}^{s} \end{matrix}) . \end{matrix} \end{matrix}

(58)

If one decides to collapse the instruments included in

Z_{i},

it seems reasonable to collapse

{\tilde{Z}}_{i}^{s}

as well and replace it by

\begin{matrix} \begin{matrix} {\ddot{Z}}_{i}^{* s} = (\begin{matrix} 0^{'} & 0^{'} & 0^{'} & 1 \\ Δ x_{i 2}^{⧫'} & Δ w_{i 2}^{⧫'} & 0^{'} & 1 \\ Δ x_{i 3}^{⧫'} & Δ w_{i 3}^{⧫'} & Δ v_{i 2}^{⧫'} & 1 \\ ⋮ & ⋮ & ⋮ & ⋮ \\ Δ x_{i T}^{⧫'} & Δ w_{i T}^{⧫'} & Δ v_{i, T - 1}^{⧫'} & 1 \end{matrix}) . \end{matrix} \end{matrix}

(59)

Note that

{\ddot{Z}}_{i}^{* s}

has

L^{⧫} = K^{⧫} + 1

columns.

3.3.2. Alternative Weighting Matrices under Effect Stationarity

For the above system we have

\begin{matrix} \begin{matrix} N^{- 1 / 2} Σ_{i = 1}^{N} {\ddot{Z}}_{i}^{s'} {\ddot{u}}_{i} \overset{d}{\to} N (0, plim N^{- 1} Σ_{i = 1}^{N} Φ_{i}), \end{matrix} \end{matrix}

(60)

with

\begin{matrix} \begin{matrix} Φ_{i} = (\begin{matrix} Z_{i}^{'} D Ω_{i} D^{'} Z_{i} & Z_{i}^{'} D Ω_{i} {\tilde{Z}}_{i}^{s} \\ {\tilde{Z}}_{i}^{s'} Ω_{i} D^{'} Z_{i} & {\tilde{Z}}_{i}^{s'} (Ω_{i} + σ_{η}^{2} ι_{T} ι_{T}^{'}) {\tilde{Z}}_{i}^{s} \end{matrix}) . \end{matrix} \end{matrix}

(61)

Hence a feasible initial weighting matrix is given by

\begin{matrix} \begin{matrix} S^{(0)} (q) = {[Σ_{i = 1}^{N} Φ_{i}^{(0)} (q)]}^{- 1}, \end{matrix} \end{matrix}

(62)

where

\begin{matrix} Φ_{i}^{(0)} (q) = (\begin{matrix} Z_{i}^{'} D D^{'} Z_{i} & Z_{i}^{'} D {\tilde{Z}}_{i}^{s} \\ {\tilde{Z}}_{i}^{s'} D^{'} Z_{i} & {\tilde{Z}}_{i}^{s'} (I_{T} + q ι_{T} ι_{T}^{'}) {\tilde{Z}}_{i}^{s} \end{matrix}), \end{matrix}

with q some nonnegative real value. Weighting matrix

S^{(0)} (q)

would be optimal if

Ω_{i} = σ_{ε}^{2} I_{T}

with

q = σ_{η}^{2} / σ_{ε}^{2} .

For any nonnegative q a consistent 1-step GMM system estimator is given by

\begin{matrix} \begin{matrix} {\hat{\ddot{α}}}^{(1)} (q) = {[(Σ_{i = 1}^{N} {\ddot{R}}_{i}^{'} {\ddot{Z}}_{i}^{s}) S^{(0)} (q) (Σ_{i = 1}^{N} {\ddot{Z}}_{i}^{s'} {\ddot{R}}_{i})]}^{- 1} (Σ_{i = 1}^{N} {\ddot{R}}_{i}^{'} {\ddot{Z}}_{i}^{s}) S^{(0)} (q) (Σ_{i = 1}^{N} {\ddot{Z}}_{i}^{s'} {\ddot{y}}_{i}) . \end{matrix} \end{matrix}

(63)

Next, in a second step, the consistent 1-step residuals

{\hat{\ddot{u}}}_{i}^{(1)} = {\ddot{y}}_{i} - {\ddot{R}}_{i} {\hat{\ddot{α}}}^{(1)} (q)

can be used to construct the asymptotically optimal weighting matrix

\begin{matrix} \begin{matrix} {\hat{S}}_{a}^{(1)} = {(Σ_{i = 1}^{N} {\ddot{Z}}_{i}^{s'} {\hat{\ddot{u}}}_{i}^{(1)} {\hat{\ddot{u}}}_{i}^{(1)'} {\ddot{Z}}_{i}^{s})}^{- 1}, \end{matrix} \end{matrix}

(64)

where

{\hat{\ddot{u}}}_{i}^{(1)} = {({\hat{\tilde{ε}}}_{i}^{s (1)'}, {\hat{u}}_{i}^{s (1)'})}^{'}

with

{\hat{\tilde{ε}}}_{i}^{s (1)} = {\tilde{y}}_{i} - {\tilde{R}}_{i}^{*} {\hat{\ddot{α}}}^{(1)} (q)

and

{\hat{u}}_{i}^{s (1)} = y_{i} - R_{i}^{*} {\hat{\ddot{α}}}^{(1)} (q) .

However, several alternatives are possible. Consider weighting matrix

\begin{matrix} \begin{matrix} {\hat{S}}_{b}^{(1)} = {[Σ_{i = 1}^{N} (\begin{matrix} Z_{i}^{'} {\hat{H}}_{i}^{s (1)} Z_{i} & Z_{i}^{'} {\hat{D}}_{i}^{s (1)} {\tilde{Z}}_{i}^{s} \\ {\tilde{Z}}_{i}^{s'} {\hat{D}}_{i}^{s (1)'} Z_{i} & {\tilde{Z}}_{i}^{s'} {\hat{u}}_{i}^{s (1)} {\hat{u}}_{i}^{s (1)'} {\tilde{Z}}_{i}^{s} \end{matrix})]}^{- 1}, \end{matrix} \end{matrix}

(65)

where

{\hat{H}}_{i}^{s (1)}

is self-evidently like

{\hat{H}}_{i}^{(1) b}

but on the basis of the residuals

{\hat{\tilde{ε}}}_{i}^{s (1)},

and

\begin{matrix} \begin{matrix} {\hat{D}}_{i}^{s (1)} = (\begin{matrix} {\hat{\tilde{ε}}}_{i 2}^{s (1)} {\hat{u}}_{i 1}^{s (1)} & {\hat{\tilde{ε}}}_{i 2}^{s (1)} {\hat{u}}_{i 2}^{s (1)} & 0 & \dots & 0 & 0 \\ 0 & {\hat{\tilde{ε}}}_{i 3}^{s (1)} {\hat{u}}_{i 2}^{s (1)} & {\hat{\tilde{ε}}}_{i 3}^{s (1)} {\hat{u}}_{i 3}^{s (1)} & 0 \\ ⋮ & ⋱ & ⋱ & ⋱ & ⋮ \\ ⋮ & ⋱ & ⋱ & ⋱ & ⋮ \\ 0 & ⋱ & {\hat{\tilde{ε}}}_{i, T - 1}^{s (1)} {\hat{u}}_{i, T - 1}^{s (1)} & {\hat{\tilde{ε}}}_{i, T - 1}^{s (1)} {\hat{u}}_{i T}^{s (1)} \\ 0 & 0 & 0 & \dots & 0 & {\hat{\tilde{ε}}}_{i T}^{s (1)} {\hat{u}}_{i T}^{s (1)} \end{matrix}) . \end{matrix} \end{matrix}

(66)

For the special case

σ_{ε}^{2} Ω_{i} = σ_{ε, i}^{2} I_{T}

of cross-section heteroskedasticity and time-series homoskedasticity one can use the weighting matrix

\begin{matrix} \begin{matrix} {\hat{S}}_{c}^{(1)} = {[Σ_{i = 1}^{N} {\hat{σ}}_{ε, i}^{2, s (1)} (\begin{matrix} Z_{i}^{'} H Z_{i} & Z_{i}^{'} D {\tilde{Z}}_{i}^{s} \\ {\tilde{Z}}_{i}^{s'} D^{'} Z_{i} & {\tilde{Z}}_{i}^{s'} [I_{T} + ({\hat{σ}}_{η}^{2, (1)} / {\hat{σ}}_{ε, i}^{2, s (1)}) ι_{T} ι_{T}^{'}] {\tilde{Z}}_{i}^{s} \end{matrix})]}^{- 1}, \end{matrix} \end{matrix}

(67)

where

\begin{matrix} {\hat{σ}}_{ε, i}^{2, s (1)} & = {\hat{\tilde{ε}}}_{i}^{s (1)'} H^{- 1} {\hat{\tilde{ε}}}_{i}^{s (1)} / (T - 1), \end{matrix}

(68)

\begin{matrix} {\hat{σ}}_{η}^{2, s (1)} & = T^{- 1} N^{- 1} Σ_{i = 1}^{N} {\hat{u}}_{i}^{s (1)'} {\hat{u}}_{i}^{s (1)} - N^{- 1} Σ_{i = 1}^{N} {\hat{σ}}_{ε, i}^{2, s (1)} . \end{matrix}

(69)

For

j \in {a, b, c}

three alternative 2-step system estimators

\begin{matrix} \begin{matrix} {\hat{\ddot{α}}}_{(j)}^{(2)} = {[(Σ_{i = 1}^{N} {\ddot{R}}_{i}^{'} {\ddot{Z}}_{i}^{s}) {\hat{S}}_{j}^{(1)} (Σ_{i = 1}^{N} {\ddot{Z}}_{i}^{s'} {\ddot{R}}_{i})]}^{- 1} (Σ_{i = 1}^{N} {\ddot{R}}_{i}^{'} {\ddot{Z}}_{i}^{s}) {\hat{S}}_{j}^{(1)} (Σ_{i = 1}^{N} {\ddot{Z}}_{i}^{s'} {\ddot{y}}_{i}) \end{matrix} \end{matrix}

(70)

are obtained, where the inverse matrix expression can be used again to estimate the variance of

{\hat{\ddot{α}}}_{(j)}^{(2)}

if all employed moment conditions are valid.

3.4. Coefficient Restriction Tests

Simple Student-type coefficient test statistics can be obtained from 1-step and 2-step AB and BB estimation for the different weighting matrices considered. The 1-step estimators can be used in combination with a robust variance estimate (which takes possible heteroskedasticity into account). The 2-step estimators can be used in combination with the standard or a corrected variance estimate.1

When testing particular coefficient values, the relevant element of estimator

{\hat{\tilde{α}}}^{(1)}

given in (41) should under homoskedasticity be scaled by the corresponding diagonal element of the standard expression for its estimated variance given by

\begin{matrix} \begin{matrix} \hat{V a r} ({\hat{\tilde{α}}}^{(1)}) = N^{- 1} Σ_{i = 1}^{N} {\hat{σ}}_{ε, i}^{2, (1)} Ψ, with Ψ = {[(Σ_{i = 1}^{N} {\tilde{R}}_{i}^{'} Z_{i}) G^{(0)} (Σ_{i = 1}^{N} Z_{i}^{'} {\tilde{R}}_{i})]}^{- 1}, \end{matrix} \end{matrix}

(71)

where

{\hat{σ}}_{ε, i}^{2, (1)}

is given in (48). Its robust version under cross-sectional heteroskedasticity uses for

j \in {a, b, c}

\begin{matrix} \begin{matrix} {\hat{V a r}}_{(j)} ({\hat{\tilde{α}}}^{(1)}) = Ψ (Σ_{i = 1}^{N} {\tilde{R}}_{i}^{'} Z_{i}) G^{(0)} {[{\hat{G}}_{j}^{(1)}]}^{- 1} G^{(0)} (Σ_{i = 1}^{N} Z_{i}^{'} {\tilde{R}}_{i}) Ψ . \end{matrix} \end{matrix}

(72

However, under heteroskedasticity the estimators

{\hat{\tilde{α}}}_{(j)}^{(2)}

given in (70) are more efficient. The standard estimator for their variance is

\begin{matrix} \begin{matrix} \hat{V a r} ({\hat{\tilde{α}}}_{(j)}^{(2)}) = {[(Σ_{i = 1}^{N} {\tilde{R}}_{i}^{'} Z_{i}) {\hat{G}}_{j}^{(1)} (Σ_{i = 1}^{N} Z_{i}^{'} {\tilde{R}}_{i})]}^{- 1} . \end{matrix} \end{matrix}

(73)

The corrected version

\hat{V a r c} ({\hat{\tilde{α}}}_{(j)}^{(2)})

requires derivation for

k = 1, . . ., K - 1

of the actual implementation of matrix

\partial Ω (β) / \partial β_{k}

of Appendix A which is here

N (T - 1) \times N (T - 1)

. We denote its i-th block as

\partial {\tilde{Ω}}_{(j) i} (\tilde{α}) / \partial {\tilde{α}}_{k} .

For the a-type weighting matrix2 the relevant

T - 1 \times T - 1

matrix

\partial {\tilde{ε}}_{i} {\tilde{ε}}_{i}^{'} / \partial {\tilde{α}}_{k}

with

{\tilde{ε}}_{i} = {\tilde{y}}_{i} - {\tilde{R}}_{i} \tilde{α},

is

- ({\tilde{ε}}_{i} {\tilde{R}}_{i k}^{'} + {\tilde{R}}_{i k} {\tilde{ε}}_{i}^{'}),

where

{\tilde{R}}_{i k}

denotes the k-th column of

{\tilde{R}}_{i} .

For weighting matrix b it simplifies to the matrix consisting of the main diagonal and the two first sub-diagonals of

- ({\tilde{ε}}_{i} {\tilde{R}}_{i k}^{'} + {\tilde{R}}_{i k} {\tilde{ε}}_{i}^{'})

with all other elements zero. Moreover,

\partial {\tilde{Ω}}_{(c) i} (\tilde{α}) / \partial {\tilde{α}}_{k} = - 2 [{\tilde{ε}}_{i}^{'} H^{- 1} {\tilde{R}}_{i k} / (T - 1)] H .

So, we find

\begin{matrix} \begin{matrix} \hat{V a r c} ({\hat{\tilde{α}}}_{(j)}^{(2)}) = \hat{V a r} ({\hat{\tilde{α}}}_{(j)}^{(2)}) + {\hat{F}}_{(j)} \hat{V a r} ({\hat{\tilde{α}}}_{(j)}^{(2)}) + \hat{V a r} ({\hat{\tilde{α}}}_{(j)}^{(2)}) {\hat{F}}_{(j)}^{'} + {\hat{F}}_{(j)} {\hat{V a r}}_{(j)} ({\hat{\tilde{α}}}^{(1)}) {\hat{F}}_{(j)}^{'}, \end{matrix} \end{matrix}

(74)

with the k-th column of

{\hat{F}}_{(j)}

given by

\begin{matrix} \begin{matrix} {\hat{F}}_{(j) \cdot k} = - \hat{V a r} ({\hat{\tilde{α}}}_{(j)}^{(2)}) (Σ_{i = 1}^{N} {\tilde{R}}_{i}^{'} Z_{i}) {\hat{G}}_{j}^{(1)} (Σ_{i = 1}^{N} Z_{i}^{'} {\frac{\partial {\tilde{Ω}}_{(j) i} (\tilde{α})}{\partial {\tilde{α}}_{k}}|}_{{\hat{\tilde{α}}}^{(1)}} Z_{i}) {\hat{G}}_{j}^{(1)} (Σ_{i = 1}^{N} Z_{i}^{'} {\hat{\tilde{ε}}}_{i}^{(2)}) . \end{matrix} \end{matrix}

(75)

All above expressions become a bit more complex when considering Blundell-Bond estimation of the K coefficients

\ddot{α} .

The suboptimal 1-step estimator (63) of

\ddot{α}

should not be used for testing, unless in combination with

\begin{matrix} \begin{matrix} \hat{V a r} ({\hat{\ddot{α}}}^{(1)}) = Φ (Σ_{i = 1}^{N} {\ddot{R}}_{i}^{'} {\ddot{Z}}_{i}^{s}) S^{(0)} (q) {[S^{(0)} ({\hat{σ}}_{η}^{2, s (1)} / {\hat{σ}}_{ε}^{2, s (1)})]}^{- 1} S^{(0)} (q) (Σ_{i = 1}^{N} {\ddot{Z}}_{i}^{s'} {\ddot{R}}_{i}) Φ, \end{matrix} \end{matrix}

(76)

under homoskedasticity, or a robust variance estimator, which is

\begin{matrix} \begin{matrix} {\hat{V a r}}_{(j)} ({\hat{\ddot{α}}}^{(1)}) = Φ (Σ_{i = 1}^{N} {\ddot{R}}_{i}^{'} {\ddot{Z}}_{i}^{s}) S^{(0)} (q) {[{\hat{S}}_{j}^{(1)}]}^{- 1} S^{(0)} (q) (Σ_{i = 1}^{N} {\ddot{Z}}_{i}^{s'} {\ddot{R}}_{i}) Φ, \end{matrix} \end{matrix}

(77)

where

Φ = {[(Σ_{i = 1}^{N} {\ddot{R}}_{i}^{'} {\ddot{Z}}_{i}^{s}) S^{(0)} (q) (Σ_{i = 1}^{N} {\ddot{Z}}_{i}^{s'} {\ddot{R}}_{i})]}^{- 1} .

It seems better of course to use the efficient estimator

{\hat{\ddot{α}}}_{(j)}^{(2)}

of (70). The standard expression for its estimated variance is

\begin{matrix} \begin{matrix} \hat{V a r} ({\hat{\ddot{α}}}_{(j)}^{(2)}) = {[(Σ_{i = 1}^{N} {\ddot{R}}_{i}^{'} {\ddot{Z}}_{i}^{s}) {\hat{S}}_{j}^{(1)} (Σ_{i = 1}^{N} {\ddot{Z}}_{i}^{s'} {\ddot{R}}_{i})]}^{- 1} . \end{matrix} \end{matrix}

(78)

Their corrected versions can be obtained by a formula similar to (74) upon changing

\tilde{α}

in

\ddot{α}

and

{\hat{F}}_{(j) \cdot k}

of (75) in

\begin{matrix} \begin{matrix} {\hat{F}}_{(j) \cdot k} = - \hat{V a r} ({\hat{\ddot{α}}}_{(j)}^{(2)}) (Σ_{i = 1}^{N} {\ddot{R}}_{i}^{'} {\ddot{Z}}_{i}^{s}) {\hat{S}}_{j}^{(1)} (Σ_{i = 1}^{N} {\ddot{Z}}_{i}^{s'} {\frac{\partial {\ddot{Ω}}_{(j) i} (\ddot{α})}{\partial {\ddot{α}}_{k}}|}_{{\hat{\ddot{α}}}^{(1)}} {\ddot{Z}}_{i}^{s}) {\hat{S}}_{j}^{(1)} (Σ_{i = 1}^{N} {\ddot{Z}}_{i}^{s'} {\hat{\ddot{ε}}}_{i}^{(2)}), \end{matrix} \end{matrix}

(79)

where the block of

\partial {\ddot{Ω}}_{(j) i} (\ddot{α}) / \partial {\ddot{α}}_{k}

corresponding to the equation in first differences is similar as before, but with an extra column and row of zeros for the intercept. The block corresponding to the equation in levels we took for weighting matrices a and b equal to

\partial u_{i} u_{i}^{'} / \partial {\ddot{α}}_{k} = - (u_{i} {\bar{R}}_{i k}^{*'} + {\bar{R}}_{i k}^{*} u_{i}^{'}),

and for type c

\begin{matrix} \partial {{\tilde{ε}}_{i}^{'} H^{- 1} {\tilde{ε}}_{i} / (T - 1) + Σ_{i = 1}^{N} [{(ι_{T}^{'} u_{i})}^{2} - u_{i}^{'} u_{i}] / [N T (T - 1)]} I_{T} / \partial {\ddot{α}}_{k}, \end{matrix}

for which the first term yields

- 2 [({\tilde{ε}}_{i}^{'} H^{- 1} {\tilde{R}}_{i k}) / (T - 1)] I_{T},

and the second gives

\begin{matrix} - 2 {Σ_{i = 1}^{N} (ι_{T}^{'} u_{i} {\bar{R}}_{i k}^{*'} ι_{T} - {\bar{R}}_{i k}^{*'} u_{i}) / [N T (T - 1)]} I_{T} . \end{matrix}

For the nondiagonal upperblock of

\partial {\ddot{Ω}}_{(j) i} (\ddot{α}) / \partial {\ddot{α}}_{k}

we took in cases a and b

\partial {\tilde{ε}}_{i} u_{i}^{'} / \partial {\tilde{α}}_{k} = - ({\tilde{ε}}_{i} {\bar{R}}_{i k}^{*'} + {\tilde{R}}_{i k} u_{i}^{'})

and for the derivative with respect to the intercept

- {\tilde{ε}}_{i} ι_{T}^{'} .

In case c it is

- 2 [({\tilde{ε}}_{i}^{'} H^{- 1} {\tilde{R}}_{i k}) / (T - 1)] D

and a zero matrix for the derivative with respect to the intercept.

3.5. Tests of Overidentification Restrictions

Using Arellano-Bond and Blundell-Bond type estimation, many options exist with respect to testing the overidentification restrictions. These options differ in the residuals and weighting matrices being employed. After 1-step Arellano-Bond estimation, see (41) and (48), we have the test statistics

\begin{matrix} J A B^{(1, 0)} & = (Σ_{i = 1}^{N} {\hat{\tilde{ε}}}_{i}^{(1)'} Z_{i}) {(Σ_{i = 1}^{N} Z_{i}^{'} H Z_{i})}^{- 1} (Σ_{i = 1}^{N} Z_{i}^{'} {\hat{\tilde{ε}}}_{i}^{(1)}) / (N^{- 1} Σ_{i = 1}^{N} {\hat{σ}}_{i}^{2, (1)}), \end{matrix}

(80)

\begin{matrix} J A B_{j}^{(1, 1)} & = (Σ_{i = 1}^{N} {\hat{\tilde{ε}}}_{i}^{(1)'} Z_{i}) {\hat{G}}_{j}^{(1)} (Σ_{i = 1}^{N} Z_{i}^{'} {\hat{\tilde{ε}}}_{i}^{(1)}), j \in {a, b, c} \end{matrix}

(81)

which are only valid in case of conditional homoskedasticity. Here

{\hat{G}}_{j}^{(1)}

is given in (43), (44) and (46). From (70) one may obtain 2-step residuals

{\hat{\tilde{ε}}}_{i (j)}^{(2)} = {\tilde{y}}_{i} - {\tilde{R}}_{i}^{*} {\hat{\ddot{α}}}_{(j)}^{(2)}

, and from these overidentification restrictions test statistics can be calculated which are valid under heteroskedasticity. These may differ depending on whether the j-th weighting matrix is now obtained still from 1-step or already from 2-step residuals. This leads to

\begin{matrix} \begin{matrix} J A B_{j}^{(2, h)} = (Σ_{i = 1}^{N} {\hat{\tilde{ε}}}_{a, i}^{(2)'} Z_{i}) {\hat{G}}_{j}^{(h)} (Σ_{i = 1}^{N} Z_{i}^{'} {\hat{\tilde{ε}}}_{a, i}^{(2)}), for h \in {1, 2} \end{matrix} \end{matrix}

(82)

where the 2-step weighting matrices are either

{\hat{G}}_{a}^{(2)} = {(Σ_{i = 1}^{N} Z_{i}^{'} {\hat{\tilde{ε}}}_{i (a)}^{(2)} {\hat{\tilde{ε}}}_{i (a)}^{(2)'} Z_{i})}^{- 1},

{\hat{G}}_{b}^{(2)} = {(Σ_{i = 1}^{N} Z_{i}^{'} {\hat{H}}_{i}^{(2) b} Z_{i})}^{- 1}

or

{\hat{G}}_{c}^{(2)} = {(Σ_{i = 1}^{N} {\hat{σ}}_{ε, i (c)}^{2, (2)} Z_{i}^{'} H Z_{i})}^{- 1},

and

{\hat{H}}_{i}^{(2) b}

is like

{\hat{H}}_{i}^{(1) b}

of (45), though using

{\hat{\tilde{ε}}}_{i (b)}^{(2)}

instead of

{\hat{\tilde{ε}}}_{i}^{(1)}

; furthermore

{\hat{σ}}_{ε, i (c)}^{2, (2)} = {\hat{\tilde{ε}}}_{i (c)}^{(2)'} H^{- 1} {\hat{\tilde{ε}}}_{i (c)}^{(2)} / (T - 1) .

Exploiting effect stationarity of a subset of the regressors by estimating the Blundell-Bond system leads to the 1-step test statistics (exclusively for use under conditional homoskedasticity)

\begin{matrix} J B B^{(1, 0)} & = (Σ_{i = 1}^{N} {\hat{\ddot{u}}}_{i}^{(1)'} {\ddot{Z}}_{i}^{s}) S^{(0)} ({\hat{σ}}_{η}^{2, s (1)} / {\hat{σ}}_{ε}^{2, s (1)}) (Σ_{i = 1}^{N} {\ddot{Z}}_{i}^{s'} {\hat{\ddot{u}}}_{i}^{(1)}) / {\hat{σ}}_{ε}^{2, s (1)}, \end{matrix}

(83)

\begin{matrix} J B B_{j}^{(1, 1)} & = (Σ_{i = 1}^{N} {\hat{\ddot{u}}}_{i}^{(1)'} {\ddot{Z}}_{i}^{s}) {\hat{S}}_{j}^{(1)} (Σ_{i = 1}^{N} {\ddot{Z}}_{i}^{s'} {\hat{\ddot{u}}}_{i}^{(1)}), j \in {a, b, c} \end{matrix}

(84)

where

{\hat{σ}}_{ε}^{2, s (1)} = Σ_{i = 1}^{N} {\hat{σ}}_{ε, i}^{2, s (1)} / N

and

S^{(0)} (\cdot)

and

{\hat{S}}_{j}^{(1)}

can be found in (62), (64), (65) and (67). Defining the various 2-step residuals and variance estimators as

{\hat{\ddot{u}}}_{i (j)}^{(2)} = {\ddot{y}}_{i} - {\ddot{R}}_{i} {\hat{\ddot{α}}}_{(j)}^{(2)} = {({\hat{\tilde{ε}}}_{i (j)}^{s (2)'}, {\hat{u}}_{i (j)}^{s (2)'})}^{'}

and

{\hat{σ}}_{ε, i (j)}^{2, s (2)}

and

{\hat{σ}}_{η (j)}^{2, (2)}

similar to (68) and (69) though obtained from the appropriate two-step residuals

{\hat{\tilde{ε}}}_{i (j)}^{s (2)} = {\tilde{y}}_{i} - {\tilde{R}}_{i}^{*} {\hat{\ddot{α}}}_{(j)}^{(2)}

and

{\hat{u}}_{i (j)}^{s (2)} = y_{i} - R_{i}^{*} {\hat{\ddot{α}}}_{(j)}^{(2)}

, the statistics to be used under heteroskedasticity after 2-step estimation are

\begin{matrix} \begin{matrix} J B B_{j}^{(2, h)} = (Σ_{i = 1}^{N} {\hat{\ddot{u}}}_{i (j)}^{(2)'} {\ddot{Z}}_{i}^{s}) {\hat{S}}_{j}^{(h)} (Σ_{i = 1}^{N} {\ddot{Z}}_{i}^{s'} {\hat{\ddot{u}}}_{i (j)}^{(2)}), \end{matrix} \end{matrix}

(85)

where

{\hat{S}}_{a}^{(2)}

and

{\hat{S}}_{b}^{(2)}

are like

{\hat{S}}_{a}^{(1)}

and

{\hat{S}}_{b}^{(1)}

, except that they use

{\hat{\ddot{u}}}_{i (a)}^{(2)}

and

{\hat{\ddot{u}}}_{i (b)}^{(2)}

instead of

{\hat{\ddot{u}}}_{i}^{(1)}

. With respect to

{\hat{S}}_{c}^{(2)}

one can use

\begin{matrix} {\hat{S}}_{c}^{(2)} = {[Σ_{i = 1}^{N} {\hat{σ}}_{ε, i (j)}^{2, s (2)} (\begin{matrix} Z_{i}^{'} H Z_{i} & Z_{i}^{'} D {\tilde{Z}}_{i}^{s} \\ {\tilde{Z}}_{i}^{s'} D^{'} Z_{i} & {\tilde{Z}}_{i}^{s'} [I_{T} + ({\hat{σ}}_{η (c)}^{2, (2)} / {\hat{σ}}_{ε, i (c)}^{2, s (2)}) ι_{T} ι_{T}^{'}] {\tilde{Z}}_{i}^{s} \end{matrix})]}^{- 1} . \end{matrix}

Under their respective null hypotheses the tests based on Arellano-Bond estimation follow asymptotically

χ^{2}

distributions with

L - K - T + 1

degrees of freedom, whereas the tests based on Blundell-Bond estimates have

L + L^{⧫} - K - T

degrees of freedom3. Self-evidently tests on the effect stationarity related orthogonality conditions are given by

\begin{matrix} J E S^{(1, 0)} & = J B B^{(1, 0)} - J A B^{(1, 0)}, \end{matrix}

(86)

\begin{matrix} J E S_{j}^{(l, h)} & = J B B_{j}^{(l, h)} - J A B_{j}^{(l, h)}, 0 < l \leq h \in {1, 2}, j \in {a, b, c}, \end{matrix}

(87)

where

J E S^{(1, 0)}

and

J E S_{j}^{(1, 1)}

require homoskedasticity. They should all be compared with a

χ^{2}

critical value for

L^{⧫} - 1

degrees of freedom.4

3.6. Modified GMM

In the special case that panel model (28) has cross-sectional heteroskedasticity and no time-series heteroskedasticity, hence

\begin{matrix} \begin{matrix} σ_{ε}^{2} Ω_{i} = σ_{ε, i}^{2} I_{T}, with Σ_{i = 1}^{N} σ_{ε, i}^{2} = σ_{ε}^{2} N, \end{matrix} \end{matrix}

(88)

we can easily employ MGMM estimator (11). However, because

H^{- 1}

is not a lower-triangular matrix, not all instruments

σ_{ε, i}^{- 2} H^{- 1} Z_{i}

would be valid for the equation in first-differences. This problem can be avoided by using, instead of first-differencing, the forward orthogonal deviation (FOD) transformation for removing the individual effects. Let

\begin{matrix} \begin{matrix} B = {[\begin{matrix} \frac{T - 1}{T} & 0 & 0 & \dots & 0 \\ 0 & \frac{T - 2}{T - 1} & 0 & \dots & 0 \\ ⋮ & 0 & ⋱ & ⋱ & ⋮ \\ ⋮ & ⋮ & ⋱ & \frac{2}{3} & 0 \\ 0 & 0 & \dots & 0 & \frac{1}{2} \end{matrix}]}^{1 / 2} [\begin{matrix} 1 & - \frac{1}{T - 1} & \dots & \dots & - \frac{1}{T - 1} & - \frac{1}{T - 1} \\ 0 & 1 & - \frac{1}{T - 2} & \dots & - \frac{1}{T - 2} & - \frac{1}{T - 2} \\ ⋮ & ⋮ & ⋱ & ⋱ & ⋮ & ⋮ \\ 0 & 0 & ⋱ & 1 & - \frac{1}{2} & - \frac{1}{2} \\ 0 & 0 & \dots & 0 & 1 & - 1 \end{matrix}], \end{matrix} \end{matrix}

(89)

and

{\overset{ˇ}{ε}}_{i} = B ε_{i} .

Then

B ι_{T} = 0

and

B u_{i} = {\overset{ˇ}{ε}}_{i} \sim (0, σ_{ε, i}^{2} I_{T - 1})

provided (88) holds, whereas

E (Z_{i}^{'} {\overset{ˇ}{ε}}_{i}) = 0

for

Z_{i}

given by (34). Hence, premultiplying model

y_{i} = R_{i} \tilde{α} + (μ + τ_{1} + η_{i}) ι_{T} + ε_{i}

by B yields

\begin{matrix} \begin{matrix} {\overset{ˇ}{y}}_{i} = {\overset{ˇ}{R}}_{i} \tilde{α} + {\overset{ˇ}{ε}}_{i}, \end{matrix} \end{matrix}

(90)

where

{\overset{ˇ}{y}}_{i} = B y_{i}

and

{\overset{ˇ}{R}}_{i} = B R_{i} .

Estimating this by GMM, but using an instrument matrix with components

σ_{ε, i}^{- 2} Z_{i},

yields the unfeasible MABu estimator for the model with cross-sectional heteroskedasticity, which is

\begin{matrix} \begin{matrix} {\hat{\tilde{α}}}_{M A B u} & = {[(Σ_{i = 1}^{N} σ_{ε, i}^{- 2} {\overset{ˇ}{R}}_{i}^{'} Z_{i}) {(Σ_{i = 1}^{N} σ_{ε, i}^{- 2} Z_{i}^{'} Z_{i})}^{- 1} (Σ_{i = 1}^{N} σ_{ε, i}^{- 2} Z_{i}^{'} {\overset{ˇ}{R}}_{i})]}^{- 1} \times \\ (Σ_{i = 1}^{N} σ_{ε, i}^{- 2} {\overset{ˇ}{R}}_{i}^{'} Z_{i}) {(Σ_{i = 1}^{N} σ_{ε, i}^{- 2} Z_{i}^{'} Z_{i})}^{- 1} (Σ_{i = 1}^{N} σ_{ε, i}^{- 2} Z_{i}^{'} {\overset{ˇ}{y}}_{i}) . \end{matrix} \end{matrix}

(91)

Note that the exploited moment conditions are here

E (σ_{ε, i}^{- 2} Z_{i}^{'} {\overset{ˇ}{ε}}_{i}) = σ_{ε, i}^{- 2} E (Z_{i}^{'} {\overset{ˇ}{ε}}_{i}) = 0 .

For

σ_{ε, i}^{2} > 0

these are intrinsically equivalent with

E (Z_{i}^{'} {\overset{ˇ}{ε}}_{i}) = 0,

but they induce the use of a different set of instruments yielding a different estimator. That it is most likely that the unfeasible standard AB estimator ABu, which uses instruments

σ_{ε, i} Z_{i}

for regressors

σ_{ε, i}^{- 1} {\overset{ˇ}{R}}_{i},

will generally exploit weaker instruments than MABu, which uses

σ_{ε, i}^{- 1} Z_{i}

for regressors

σ_{ε, i}^{- 1} {\overset{ˇ}{R}}_{i},

should be intuitively obvious.

To convert this into a feasible procedure, one could initially assume that all

σ_{ε, i}^{2}

are equal. Then the first-step MGMM estimator

{\hat{\tilde{α}}}_{M A B}^{(1)}

is numerically equivalent to AB1 of (41), provided all instruments are being used.5 Next, exploiting (48), the feasible 2-step MAB estimator can be obtained by

\begin{matrix} \begin{matrix} {\hat{\tilde{α}}}_{M A B}^{(2)} & = {[(Σ_{i = 1}^{N} {\overset{ˇ}{R}}_{i}^{'} Z_{i} / {\hat{σ}}_{ε, i}^{2, (1)}) {(Σ_{i = 1}^{N} Z_{i}^{'} Z_{i} / {\hat{σ}}_{ε, i}^{2, (1)})}^{- 1} (Σ_{i = 1}^{N} Z_{i}^{'} {\overset{ˇ}{R}}_{i} / {\hat{σ}}_{ε, i}^{2, (1)})]}^{- 1} \times \\ (Σ_{i = 1}^{N} {\overset{ˇ}{R}}_{i}^{'} Z_{i} / {\hat{σ}}_{ε, i}^{2, (1)}) {(Σ_{i = 1}^{N} Z_{i}^{'} Z_{i} / {\hat{σ}}_{ε, i}^{2, (1)})}^{- 1} (Σ_{i = 1}^{N} Z_{i}^{'} {\overset{ˇ}{y}}_{i} / {\hat{σ}}_{ε, i}^{2, (1)}) . \end{matrix} \end{matrix}

(92)

Modifying the system estimator is more problematic, primarily because the inverse of the matrix

V a r (u_{i}) = Σ_{i} = σ_{ε, i}^{2} I_{T} + σ_{η}^{2} ι_{T} ι_{T}^{'},

which is

Σ_{i}^{- 1} = σ_{ε, i}^{- 2} [I_{T} + {(T + σ_{ε, i}^{2} / σ_{η}^{2})}^{- 1} ι_{T} ι_{T}^{'}],

is nondiagonal. So, although

E ({\tilde{Z}}_{i}^{s'} u_{i}) = 0,

surely

E ({\tilde{Z}}_{i}^{s'} Σ_{i}^{- 1} u_{i}) \neq 0 .

However, as an unfeasible modified system estimator we can combine estimation of the model for

{\overset{ˇ}{y}}_{i}

using instruments

σ_{ε, i}^{- 2} Z_{i}

with estimation of the model for

y_{i}

using instruments

{(σ_{ε, i}^{2} + σ_{η}^{2})}^{- 1} {\tilde{Z}}_{i}^{s} .

So, the system is then given by the model

\begin{matrix} \begin{matrix} {\overset{⃛}{y}}_{i} = {\overset{⃛}{R}}_{i} \ddot{α} + {\overset{⃛}{u}}_{i}, \end{matrix} \end{matrix}

(93)

where

{\overset{⃛}{y}}_{i} = {({\overset{ˇ}{y}}_{i}^{'}, y_{i}^{'})}^{'},

{\overset{⃛}{R}}_{i} = {({\overset{ˇ}{R}}_{i}^{*'}, R_{i}^{*'})}^{'},

with

{\overset{ˇ}{R}}_{i}^{*} = ({\overset{ˇ}{R}}_{i}, 0)

, and

{\overset{⃛}{u}}_{i} = {({\overset{ˇ}{ε}}_{i}^{'}, u_{i}^{'})}^{'} .

For the 1-step estimator we could again choose some nonnegative value q and calculate the 1-step estimator BB1 given in (63) in order to find residuals and obtain the estimators

{\hat{σ}}_{ε, i}^{2, s (1)}

and

{\hat{σ}}_{η}^{2, s (1)}

of (68) and (69). Building on

E ({\overset{⃛}{u}}_{i} {\overset{⃛}{u}}_{i}^{'})

and instrument matrix block

{\overset{⃛}{Z}}_{i},

given by

\begin{matrix} E ({\overset{⃛}{u}}_{i} {\overset{⃛}{u}}_{i}^{'}) = σ_{ε, i}^{2} (\begin{matrix} I_{T - 1} & B \\ B^{'} & I_{T} + (σ_{η}^{2} / σ_{ε, i}^{2}) ι_{T} ι_{T}^{'} \end{matrix}) and {\overset{⃛}{Z}}_{i} = (\begin{matrix} {\hat{σ}}_{ε, i}^{- 2, s (1)} Z_{i} & O \\ O & {({\hat{σ}}_{ε, i}^{2, s (1)} + {\hat{σ}}_{η}^{2, s (1)})}^{- 1} {\tilde{Z}}_{i}^{s} \end{matrix}), \end{matrix}

one obtains weighting matrix

\begin{matrix} \begin{matrix} {\hat{S}}_{c}^{B (1)} = {[Σ_{i = 1}^{N} (\begin{matrix} {\hat{σ}}_{ε, i}^{- 2, s (1)} Z_{i}^{'} Z_{i} & {({\hat{σ}}_{ε, i}^{2, s (1)} + {\hat{σ}}_{η}^{2, s (1)})}^{- 1} Z_{i}^{'} B {\tilde{Z}}_{i}^{s} \\ {({\hat{σ}}_{ε, i}^{2, s (1)} + {\hat{σ}}_{η}^{2, s (1)})}^{- 1} {\tilde{Z}}_{i}^{s'} B^{'} Z_{i} & {({\hat{σ}}_{ε, i}^{2, s (1)} + {\hat{σ}}_{η}^{2, s (1)})}^{- 2} {\tilde{Z}}_{i}^{s'} [{\hat{σ}}_{ε, i}^{2, s (1)} I_{T} + {\hat{σ}}_{η}^{2, s (1)} ι_{T} ι_{T}^{'}] {\tilde{Z}}_{i}^{s} \end{matrix})]}^{- 1}, \end{matrix} \end{matrix}

(94)

which can be exploited in the feasible 2-step MGMM system estimator MBB

\begin{matrix} \begin{matrix} {\hat{\ddot{α}}}_{M B B}^{(2)} = {[(Σ_{i = 1}^{N} {\overset{⃛}{R}}_{i}^{'} {\overset{⃛}{Z}}_{i}) {\hat{S}}_{c}^{B (1)} (Σ_{i = 1}^{N} {\overset{⃛}{Z}}_{i}^{'} {\overset{⃛}{R}}_{i})]}^{- 1} (Σ_{i = 1}^{N} {\overset{⃛}{R}}_{i}^{'} {\overset{⃛}{Z}}_{i}) {\hat{S}}_{c}^{B (1)} (Σ_{i = 1}^{N} {\overset{⃛}{Z}}_{i}^{'} {\overset{⃛}{y}}_{i}) . \end{matrix} \end{matrix}

(95)

For both

{\hat{\tilde{α}}}_{M A B}^{(2)}

and

{\hat{\ddot{α}}}_{M B B}^{(2)}

relevant t-test and Sargan-Hansen test statistics can be constructed. Regarding the latter we will just examine

\begin{matrix} \begin{matrix} J M A B = (Σ_{i = 1}^{N} {\hat{\overset{ˇ}{ε}}}_{i}^{(2)'} Z_{i} / {\hat{σ}}_{ε, i}^{2, s (1)}) {(Σ_{i = 1}^{N} Z_{i}^{'} Z_{i} / {\hat{σ}}_{ε, i}^{2, s (1)})}^{- 1} (Σ_{i = 1}^{N} Z_{i}^{'} {\hat{\overset{ˇ}{ε}}}_{i}^{(2)} / {\hat{σ}}_{ε, i}^{2, s (1)}), \end{matrix} \end{matrix}

(96)

where

{\hat{\overset{ˇ}{ε}}}_{i}^{(2)} = {\overset{ˇ}{y}}_{i} - {\overset{ˇ}{R}}_{i} {\hat{\tilde{α}}}_{M G M M c h}^{(2)},

and

\begin{matrix} \begin{matrix} J M B B = (Σ_{i = 1}^{N} {\hat{\overset{⃛}{u}}}_{i}^{(2)'} {\overset{⃛}{Z}}_{i}^{s}) {\hat{S}}_{c}^{(1)} (Σ_{i = 1}^{N} {\overset{⃛}{Z}}_{i}^{s'} {\hat{\overset{⃛}{u}}}_{i}^{(2)}), \end{matrix} \end{matrix}

(97)

with

{\hat{\overset{⃛}{u}}}_{i}^{(2)} = {\overset{⃛}{y}}_{i} - {\overset{⃛}{R}}_{i} {\hat{\ddot{α}}}_{M B B}^{(2)} = ({\hat{\overset{ˇ}{ε}}}_{i}^{s (2)'}

{\hat{u}}_{i}^{s (2)'})^{'} .

Under their respective null hypotheses these follow asymptotically

χ^{2}

distributions with

L - K - T + 1

and

L + L^{⧫} - K

degrees of freedom. Self-evidently, the test on the effect stationarity related orthogonality conditions is given by

\begin{matrix} \begin{matrix} J E S M = J M B B - J M A B . \end{matrix} \end{matrix}

(98)

4. Simulation Design

We will examine the stable dynamic simultaneous heteroskedastic DGP (

i = 1, . . ., N,

t = 1, . . ., T

)

\begin{matrix} \begin{matrix} y_{i t} = μ_{y} + γ y_{i, t - 1} + β x_{i t} + σ_{η} η_{i}^{\circ} + σ_{ε} ω_{i}^{1 / 2} ε_{i t}^{\circ} (|γ| < 1) . \end{matrix} \end{matrix}

(99)

Here β has just one element relating to the for each i stable autoregressive regressor

\begin{matrix} x_{i t} & = μ_{x} + ξ x_{i, t - 1} + π_{η} η_{i}^{\circ} + π_{λ} λ_{i}^{\circ} + σ_{v} ω_{i}^{1 / 2} v_{i t}^{\circ}, where \end{matrix}

(100)

\begin{matrix} v_{i t}^{\circ} & = ρ_{v ε} ε_{i t}^{\circ} + {(1 - ρ_{v ε}^{2})}^{1 / 2} ζ_{i t}^{\circ}, \end{matrix}

(101)

with

|ξ| < 1

and

|ρ_{v ε}| < 1 .

All random drawings

η_{i}^{\circ},

ε_{i t}^{\circ},

λ_{i}^{\circ},

ζ_{i t}^{\circ}

are

I I D (0, 1)

and mutually independent. Parameter

ρ_{v ε}

indicates the correlation between the cross-sectionally heteroskedastic disturbances

ε_{i t} = σ_{ε} ω_{i}^{1 / 2} ε_{i t}^{\circ}

and

v_{i t} = σ_{v} ω_{i}^{1 / 2} v_{i t}^{\circ},

which are both homoskedastic over time. How we did generate the values

ω_{1}, . . ., ω_{N}

and the start-up values

x_{i, 0}

and

y_{i, 0}

and chose relevant numerical values for the other eleven parameters will be discussed extensively below.

Note that in this DGP

x_{i t}

is either strictly exogenous

(ρ_{v ε} = 0)

or otherwise endogenous6; the only weakly exogenous regressor is

y_{i, t - 1} .

Regressor

x_{i t}

may be affected contemporaneously by two independent individual specific effects when

π_{η} \neq 0

and

π_{λ} \neq 0,

but also with delays if

ξ \neq 0 .

The dependent variable

y_{i t}

may be affected contemporaneously by the (standardized) individual effect

η_{i}^{\circ},

both directly and indirectly; directly if

σ_{η} \neq 0,

and indirectly via

x_{i t}

when

β π_{η} \neq 0 .

However,

η_{i}^{\circ}

will also have delayed effects on

y_{i t}

, when

γ \neq 0

or

ξ β π_{η} \neq 0,

and so has

λ_{i}^{\circ}

when

ξ β π_{λ} \neq 0 .

The cross-sectional heteroskedasticity is determined by both

η_{i}^{\circ}

and

λ_{i}^{\circ},

the two standardized individual effects, and is thus associated with the regressors

x_{i t}

and

y_{i, t - 1} .

It follows a lognormal pattern when both

η_{i}^{\circ}

and

λ_{i}^{\circ}

are standard normal, because we take

\begin{matrix} \begin{matrix} ω_{i} = e^{h_{i} (θ)}, with h_{i} (θ) = - θ^{2} / 2 + θ [κ^{1 / 2} η_{i}^{\circ} + {(1 - κ)}^{1 / 2} λ_{i}^{\circ}] \sim N I D (- θ^{2} / 2, θ^{2}), \end{matrix} \end{matrix}

(102)

where

0 \leq κ \leq 1 .

This establishes a lognormal distribution with

E (ω_{i}) = 1

and

V a r (ω_{i}) = e^{θ^{2}} - 1 .

So, for

θ = 0

the

ε_{i t}

and

v_{i t}

are homoskedastic. The seriousness of the heteroskedasticity increases with the absolute value of θ. From

h_{i} (θ) / 2 \sim N I D (- θ^{2} / 4, θ^{2} / 4)

it follows that

ω_{i}^{1 / 2} = e^{h_{i} (θ) / 2}

is lognormally distributed too, hence

E (σ_{ε} ω_{i}^{1 / 2}) = σ_{ε} e^{- θ^{2} / 8}

and

V a r (σ_{ε} ω_{i}^{1 / 2}) = σ_{ε}^{2} (1 - e^{- θ^{2} / 4}) .

Table 2 presents some quantiles and moments of the distributions of

ω_{i}

and

ω_{i}^{1 / 2}

(taken as the positive square root of

ω_{i}

) in order to disclose the effects of parameter

θ .

It shows that

θ \geq 1

implies pretty serious heteroskedasticity, whereas it may be qualified mild when

θ \leq 0.3,

say. In all our simulation experiments

ε_{i t}

will be unconditionally homoskedastic, irrespective of the value of

θ,

because

E (ε_{i t}^{2}) = σ_{ε}^{2} E (ω_{i} ε_{i t}^{\circ 2}) = σ_{ε}^{2} .

However, for

θ \neq 0

none of the experiments will be characterized by conditional homoskedasticity, because of the following. Always

y_{i, t - 2}

will be employed as instrument. Because a realization of

y_{i, t - 2}

depends on

θ,

also

E (ε_{i t}^{2} ∣ y_{i, t - 2})

will depend on

θ .

Without loss of generality we may chose

σ_{ε} = 1

and

μ_{y} = μ_{x} = 0 .

Note that (99) implicitly specifies

τ_{1} = 0,

τ^{*} = 0 .

All simulation results refer to estimators where these T restrictions have been imposed (there are no time effects), but

μ_{y} = μ_{x} = 0

have not been imposed. Hence, when estimating the model in levels

ι_{T}

is one of the regressors. Moreover, we may always include

I_{T - 1}

in

Z_{i}

and

ι_{T}

in

{\tilde{Z}}_{i}^{s}

in order to exploit the fundamental moment conditions

E ({\tilde{ε}}_{i t}) = 0

(for

t = 2, . . ., T)

and

E [\sum_{t = 1}^{T} (η_{i} + ε_{i t})] = 0

for

i = 1, . . ., N .

Apart from values for θ and

κ,

we have to make choices on relevant values for eight more parameters. We could choose

γ \in {0.2, 0.5, 0.8},

which covers a broad range of adjustment processes for dynamic behavioral relationships, and

ξ \in {0.5, 0.8, 0.95}

to include less and more smooth

x_{i t}

processes. Next, interesting values should be given to the remaining six parameters, namely

β,

σ_{η},

π_{η},

π_{λ},

σ_{v}

and

ρ_{v ε} .

We will do this by choosing relevant values for six alternative more meaningful notions, which are all functions of some of the eight DGP parameters and allow us to establish relevant numerical values for them, as suggested in Kiviet [39].

The first three notions will be based on (ratios of) particular variance components of the long-run stationary path of the process for

x_{i t}

. Using lag-operator notation and assuming that

v_{i t^{'}}^{\circ}

(and

ε_{i t^{'}}^{\circ})

exist for

t^{'} = - \infty, . . . ., 0, 1, . . ., T,

we find7 that the long-run path for

x_{i t}

consists of three mutually independent components, namely

\begin{matrix} \begin{matrix} x_{i t} = {(1 - ξ)}^{- 1} π_{η} η_{i}^{\circ} + {(1 - ξ)}^{- 1} π_{λ} λ_{i}^{\circ} + σ_{v} ω_{i}^{1 / 2} {(1 - ξ L)}^{- 1} v_{i t}^{\circ} . \end{matrix} \end{matrix}

(103)

The third component, the accumulated contributions of

v_{i t}^{\circ},

is a stationary AR(1) process with variance

σ_{v}^{2} ω_{i} / (1 - ξ^{2}) .

Approximating

N^{- 1} Σ_{i = 1}^{N} ω_{i}

by

1,

the average variance is

σ_{v}^{2} / (1 - ξ^{2}) .

The other two components have variances

π_{η}^{2} / {(1 - ξ)}^{2}

and

π_{λ}^{2} / {(1 - ξ)}^{2}

respectively, so the average long-run variance of

x_{i t}

equals

\begin{matrix} \begin{matrix} {\bar{V}}_{x} = {(1 - ξ)}^{- 2} (π_{η}^{2} + π_{λ}^{2}) + {(1 - ξ^{2})}^{- 1} σ_{v}^{2} . \end{matrix} \end{matrix}

(104)

A first characterization of the

x_{i t}

series can be obtained by setting

{\bar{V}}_{x} = 1 .

This is an innocuous normalization, because β is still a free parameter. As a second characterization of the

x_{i t}

series, we can choose what we call the (average) effects variance fraction of

x_{i t},

given by

\begin{matrix} \begin{matrix} E V F_{x} = {(1 - ξ)}^{- 2} (π_{η}^{2} + π_{λ}^{2}) / {\bar{V}}_{x}, \end{matrix} \end{matrix}

(105)

with

0 \leq E V F_{x} \leq 1,

for which we could take, say,

E V F_{x} \in {0, 0.3, 0.6}

. To balance the two individual effect variances we define for the case

E V F_{x} > 0

what we call the individual effect fraction of

η_{i}^{\circ}

in

x_{i t}

given by

\begin{matrix} \begin{matrix} I E F_{x}^{η} = π_{η}^{2} / (π_{η}^{2} + π_{λ}^{2}) . \end{matrix} \end{matrix}

(106)

So

I E F_{x}^{η},

with

0 \leq I E F_{x}^{η} \leq 1,

expresses the fraction due to

π_{λ} η_{i}^{\circ}

of the (long-run) variance of

x_{i t}

stemming from the two individual effects. We could take, say,

I E F_{x}^{η} \in {0, 0.3, 0.6}

.

From these three characterizations we obtain

\begin{matrix} π_{λ} & = (1 - ξ) {[(1 - I E F_{x}^{η}) E V F_{x}]}^{1 / 2}, \end{matrix}

(107)

\begin{matrix} π_{η} & = (1 - ξ) {[I E F_{x}^{η} E V F_{x}]}^{1 / 2}, \end{matrix}

(108)

\begin{matrix} σ_{v} & = {[(1 - ξ^{2}) (1 - E V F_{x})]}^{1 / 2} . \end{matrix}

(109)

For all three we will only consider the nonnegative root, because changing the sign would have no effects on the characteristics of

x_{i t}

, as we will generate the series

η_{i}^{\circ},

ε_{i t}^{\circ},

λ_{i}^{\circ}

and

ζ_{i t}^{\circ}

from symmetric distributions. The above choices regarding the

x_{i t}

process have the following implications for the average correlations between

x_{i t}

and its two constituting effects:

\begin{matrix} {\bar{ρ}}_{x η} & = π_{η} / (1 - ξ) = {[(1 - I E F_{x}^{η}) E V F_{x}]}^{1 / 2}, \end{matrix}

(110)

\begin{matrix} {\bar{ρ}}_{x λ} & = π_{λ} / (1 - ξ) = {[I E F_{x}^{η} E V F_{x}]}^{1 / 2} . \end{matrix}

(111)

Now the

x_{i t}

series can be generated upon choosing a value for

ρ_{v ε} .

This we obtain from

E (x_{i t} ε_{i t}) = σ_{ε} σ_{v} ρ_{v ε} ω_{i},

which on average is

σ_{v} ρ_{v ε} .

Hence, fixing the average simultaneity8 to

{\bar{ρ}}_{x ε},

we should choose

\begin{matrix} \begin{matrix} ρ_{v ε} = {\bar{ρ}}_{x ε} / σ_{v} . \end{matrix} \end{matrix}

(112)

In order that both correlations are smaller than 1 in absolute value an admissibility restriction has to be satisfied, namely

{\bar{ρ}}_{x ε}^{2} \leq σ_{v}^{2},

giving

\begin{matrix} \begin{matrix} {\bar{ρ}}_{x ε}^{2} \leq (1 - ξ^{2}) (1 - E V F_{x}) . \end{matrix} \end{matrix}

(113)

When choosing

E V F_{x} = 0.6

and

ξ = 0.8

we should have

|{\bar{ρ}}_{x ε}| \leq 0.379 .

That we should not exclude negative values of

{\bar{ρ}}_{x ε}

will become obvious in due course. For the moment it seems interesting to examine, say,

{\bar{ρ}}_{x ε} \in {- 0.3, 0, 0.3} .

The remaining choices concern β and

σ_{η}

which both directly affect the DGP for

y_{i t} .

Substituting (103) and (101) in (99) we find that the long-run stationary path for

y_{i t}

entails four mutually independent components, since

\begin{matrix} \begin{matrix} y_{i t} & = β {(1 - γ L)}^{- 1} x_{i t} + {(1 - γ)}^{- 1} σ_{η} η_{i}^{\circ} + σ_{ε} ω_{i}^{1 / 2} {(1 - γ L)}^{- 1} ε_{i t}^{\circ} \\ = {(1 - γ)}^{- 1} {(1 - ξ)}^{- 1} {[β π_{η} + (1 - ξ) σ_{η}] η_{i}^{\circ} + β π_{λ} λ_{i}^{\circ}} \\ + β σ_{v} ω_{i}^{1 / 2} {(1 - γ L)}^{- 1} {(1 - ξ L)}^{- 1} v_{i t}^{\circ} + σ_{ε} ω_{i}^{1 / 2} {(1 - γ L)}^{- 1} ε_{i t}^{\circ} \\ = {(1 - γ)}^{- 1} {(1 - ξ)}^{- 1} {[β π_{η} + (1 - ξ) σ_{η}] η_{i}^{\circ} + β π_{λ} λ_{i}^{\circ}} \\ + \frac{β σ_{v} {(1 - ρ_{v ε}^{2})}^{1 / 2} ω_{i}^{1 / 2}}{(1 - γ L) (1 - ξ L)} ζ_{i t}^{\circ} + \frac{[β ρ_{v ε} σ_{v} + (1 - ξ L) σ_{ε}] ω_{i}^{1 / 2}}{(1 - γ L) (1 - ξ L)} ε_{i t}^{\circ} \end{matrix} \end{matrix}

(114)

The second term of the final expression constitutes for each i an AR(2) process and the third one an ARMA(2,1) process. The variance of

y_{i t}

has four components given by (derivations in Appendix D)

\begin{matrix} \begin{matrix} V_{η} & = {(1 - γ)}^{- 2} {(1 - ξ)}^{- 2} {[β π_{η} + (1 - ξ) σ_{η}]}^{2} \\ V_{λ} & = {(1 - γ)}^{- 2} {(1 - ξ)}^{- 2} β^{2} π_{λ}^{2} \\ V_{ζ} (i) & = \frac{β^{2} σ_{v}^{2} (1 - ρ_{v ε}^{2}) (1 + γ ξ)}{(1 - γ^{2}) (1 - ξ^{2}) (1 - γ ξ)} ω_{i} \\ V_{ε} (i) & = \frac{[{(1 + β ρ_{v ε} σ_{v})}^{2} + ξ^{2}] (1 + γ ξ) - 2 ξ (1 + β ρ_{v ε} σ_{v}) (γ + ξ)}{(1 - γ^{2}) (1 - ξ^{2}) (1 - γ ξ)} ω_{i} . \end{matrix} \end{matrix}

Averaging the last two over all i yields

{\bar{V}}_{ζ}

and

{\bar{V}}_{ε} .

For the average long-run variance of

y_{i t}

we then can evaluate

\begin{matrix} \begin{matrix} {\bar{V}}_{y} = V_{η} + V_{λ} + {\bar{V}}_{ζ} + {\bar{V}}_{ε} . \end{matrix} \end{matrix}

(115)

When choosing fixed values for ratios involving these components to obtain values for β and

σ_{η}

we will run into the problem of multiple solutions. On the other hand, the four components of (115) have particular invariance properties regarding the signs of

β,

σ_{η}

and

ρ_{v ε}

, since changing the sign of all three yields exactly the same value of

{\bar{V}}_{y}

. We coped with this as follows. Although we note that

V_{η}

does depend on

β π_{η},

we set

σ_{η}

simply by fixing the direct cumulated effect impact of

η_{i}^{\circ}

on

y_{i t}

relative to the current noise

σ_{ε} = 1 .

This is

\begin{matrix} \begin{matrix} D E N_{y}^{η} = σ_{η} / (1 - γ) . \end{matrix} \end{matrix}

(116)

Because the direct and indirect (via

x_{i t}

) effects from

η_{i}^{\circ}

may have opposite signs,

D E N_{y}^{η}

could be given negative values too, but we restricted ourselves to

D E N_{y}^{η} \in {1, 4},

yielding

\begin{matrix} \begin{matrix} σ_{η} = (1 - γ) D E N_{y}^{η} . \end{matrix} \end{matrix}

(117)

Finally we fix a signal-noise ratio, which gives a value for

β .

Because under simultaneity the noise and current signal conflate, we focus on the case where

ρ_{x ε} = 0 .

Then we have

\begin{matrix} {\bar{V}}_{ζ} = {[(1 - γ^{2}) (1 - ξ^{2}) (1 - γ ξ)]}^{- 1} β^{2} σ_{v}^{2} (1 + γ ξ), {\bar{V}}_{ε} = {(1 - γ^{2})}^{- 1} . \end{matrix}

Leaving the variance due to the effects aside, the average signal variance is

{\bar{V}}_{ζ} + {\bar{V}}_{ε} - 1

, because the current average noise variance is unity. Hence, we may define a signal-noise ratio as

\begin{matrix} \begin{matrix} S N R = {\bar{V}}_{ζ} + {\bar{V}}_{ε} - 1 = \frac{β^{2} (1 - E V F_{x}) (1 + γ ξ)}{(1 - γ^{2}) (1 - γ ξ)} + \frac{γ^{2}}{(1 - γ^{2})}, \end{matrix} \end{matrix}

(118)

where we have substituted (109). For this we may choose, say,

S N R \in {2, 3, 5},

in order to find

\begin{matrix} \begin{matrix} β = {[\frac{1 - γ ξ}{1 + γ ξ} \frac{S N R - γ^{2} (S N R + 1)}{1 - E V F_{x}}]}^{1 / 2} . \end{matrix} \end{matrix}

(119)

Note that here another admissibility restriction crops up, namely

\begin{matrix} \begin{matrix} γ^{2} \leq S N R / (S N R + 1) . \end{matrix} \end{matrix}

(120)

However, for

|γ| \leq 0.8

this is satisfied for

S N R \geq 1.78 .

From (119) we only examined the positive root.

Instead of fixing SNR another approach would be to fix the total multiplier

\begin{matrix} \begin{matrix} T M = β / (1 - γ), \end{matrix} \end{matrix}

(121)

which would directly lead to a value for

β,

given

γ .

However, different

T M

values will then lead to different

S N R

values, because

\begin{matrix} \begin{matrix} S N R = T M^{2} (1 - E V F_{x}) \frac{(1 - γ) (1 + γ ξ)}{(1 + γ) (1 - γ ξ)} + \frac{γ^{2}}{(1 - γ^{2})} . \end{matrix} \end{matrix}

(122)

At this stage it is hard to say what would yield more useful information from the Monte Carlo, fixing

T M

or

S N R .

Keeping both constant for different γ and some other characteristics of this DGP is out of the question. We chose to fix

S N R = 3

. which yields

T M

values in the range 1.5–1.8. When comparing with results for

T M = 1

we did not note substantial principle differences.

For all different design parameter combinations considered, which involve sample size

N \in {200, 1000}

and

T \in {3, 6, 9},

we used the very same realizations of the underlying standardized random components

η_{i}^{\circ},

λ_{i}^{\circ},

ε_{i t}^{\circ}

and

ζ_{i t}^{\circ}

over the respective 10,000 replications that we performed. At this stage, all these components have been drawn from the standard normal distribution. To speed-up the convergence of our simulation results, in each replication we have modified the N drawings

η_{i}^{\circ}

and

λ_{i}^{\circ}

such that they have sample mean zero, sample variance 1 and sample correlation zero. This rescaling is achieved by replacing the N draws

η_{i}^{\circ}

first by

[η_{i}^{\circ} - N^{- 1} Σ_{i = 1}^{N} η_{i}^{\circ}]

and next by

η_{i}^{\circ} / {[N^{- 1} Σ_{i = 1}^{N} {(η_{i}^{\circ})}^{2}]}^{1 / 2},

and by replacing the

λ_{i}^{\circ}

by the residuals obtained after regressing

λ_{i}^{\circ}

on

η_{i}^{\circ}

and an intercept, and next scaling them by taking

λ_{i}^{\circ} / {[N^{- 1} Σ_{i = 1}^{N} {(λ_{i}^{\circ})}^{2}]}^{1 / 2} .

In addition, we have rescaled in each replication the

ω_{i}

by dividing them by

N^{- 1} Σ_{i = 1}^{N} ω_{i},

so that the resulting

ω_{i}

have sum N as they should in order to avoid that presence of heteroskedasticity is conflated with larger or smaller average disturbance variance.

In the simulation experiments we will start-up the processes for

x_{i t}

and

y_{i t}

at pre-sample period

s < 0

by taking

x_{i s} = 0

and

y_{i, s} = 0

and next generate

x_{i t}

and

y_{i t}

for the indices

t = s + 1, . . ., T .

The data with time-indices

s, . . ., - 1

will be discarded when estimating the model. We suppose that for

s = - 50

both series will be on their stationary track from

t = 0

onwards. When taking

s = - 1

or

- 2

the initial values

y_{i 0}

and

x_{i 1}

will be such that effect-stationarity has not yet been achieved. Due to the fixed zero startups (which are equal to the unconditional expectations) the (cross-)autocorrelations of the

x_{i t}

and

y_{i t}

series have a very peculiar start then too, so such results regarding effect nonstationarity will certainly not be fully general, but for s close to zero they mimic in a particular way the situation that the process started only very recently.

Another simple way to mimic a situation in which lagged first-differenced variables are invalid instruments for the model in levels can be designed as follows. Equations (103) and (114) highlight that in the long-run

Δ x_{i t}

and

Δ y_{i t}

are uncorrelated with the effects

η_{i}^{\circ}

and

λ_{i}^{\circ} .

This can be undermined by perturbing

x_{i 0}

and

y_{i 0}

as obtained from

s = - 50

in such a way that we add to them the values

\begin{matrix} \begin{matrix} \frac{ϕ - 1}{1 - ξ} (π_{η} η_{i}^{\circ} + π_{λ} λ_{i}^{\circ}) and \frac{ϕ - 1}{(1 - γ) (1 - ξ)} {[β π_{η} + (1 - ξ) σ_{η}] η_{i}^{\circ} + β π_{λ} λ_{i}^{\circ}}, \end{matrix} \end{matrix}

(123)

respectively. Note that for

ϕ = 1

effect stationarity is maintained, whereas for

0 \leq ϕ < 1

the dependence of

x_{i 0}

and

y_{i 0}

on the effects is mitigated in comparison to the stationary track (upon maintaining stationarity regarding

ε_{i t}^{\circ}

and

ζ_{i t}^{\circ}

), whereas for

ϕ > 1

this dependence is inflated. Note that this is a straight-forward generalization of the approach followed in Kiviet [7] for the panel AR(1) model.

5. Simulation Results

To limit the number of tables we proceed as follows. Often we will first produce results on unfeasible implementations of the various inference techniques in relatively simple DGPs. These exploit the true values of

ω_{1}, . . ., ω_{N},

σ_{ε}^{2}

and

σ_{η}^{2}

instead of their estimates. Although this information is generally not available in practice, only when such unfeasible techniques behave reasonably well in finite samples it seems useful to examine in more detail the performance of feasible implementations. Results for the unfeasible Arellano and Bond [1] and Blundell and Bond [2] GMM estimators are denoted as ABu and BBu respectively. Their feasible counterparts are denoted as AB1 and BB1 for the 1-step (which under homoskedasticity are equivalent to their unfeasible counterparts) and AB2 and BB2 for the 2-step estimators. For 2-step estimators the lower case letters a, b or c are used (as in for instance AB2c) to indicate which type of weighting matrix has been exploited, as discussed in Section 3.2.1 and Section 3.3.2. For corresponding MGMM implementations these acronyms are preceded by the letter M. Under homoskedasticity their unfeasible implementation has been omitted when this is equivalent to GMM. In BB estimation we have always used

q = 1

.

First in Section 5.1 we will discuss the results for DGPs in which the initial conditions are such that BB estimation will be consistent and more efficient than AB, and subsequently in Section 5.2 the situation where BB is inconsistent is examined. Within these subsections we will examine different parameter value combinations for the DGP. We will start by presenting results for a reference parametrization (indicated P0) which has been chosen such that the model has in fact four parameters less, by taking

{\bar{ρ}}_{x ε} = 0

(

x_{i t}

is strictly exogenous),

E V F_{x} = 0

(hence

π_{λ} = π_{η} = 0,

so

x_{i t}

is neither correlated with

λ_{i}

nor with

η_{i}

) and

κ = 0

(any cross-sectional heteroskedasticity is just related with

λ_{i}

). These choices (implying that any heteroskedasticity will be unrelated to the mean of regressor

x_{i t}

) may (hopefully) lead to results where little difference between unfeasible and feasible estimation will be found and where test sizes are relatively close to the nominal level of 5%. Next we will discuss the effects of settings (to be labelled P1, P2 etc.) which deviate from this reference parametrization P0 in one or more aspects regarding the various correlations and variance fractions and ratios. In P0 the relationship for

y_{i t}

will be characterized by

D E N_{y}^{η} = 1

(the impact on

y_{i t}

of the individual effect

η_{i}

and of the idiosyncratic disturbance

ε_{i t}

have equal variance). The two remaining parameters have been held fixed over all cases examined (including P0); the

x_{i t}

series has autoregressive coefficient

ξ = 0.8

and regarding

y_{i t}

we take

S N R = 3

(excluding the impacts of the individual effects, the variance of the explanatory part of

y_{i t}

is three times as large as

σ_{ε}^{2}

).

In Section 3.2 we already indicated that we will examine implementations of GMM where all internal instruments associated with linear moment conditions will be employed (A), but also particular reductions based either on collapsing (C) or omitting long lags (L3, etc.), or a combination (C3, etc.). On top of this we will also distinguish situations that may lead to reductions of the instruments that are being used, because the regressor

x_{i t}

in model (99), which will either be strictly exogenous or endogenous with respect to

ε_{i t},

might be rightly or wrongly treated as either strictly exogenous, or as predetermined (weakly exogenous), or as endogenous. These three distinct situations will be indicated by the letters X, W and E respectively. So, in parametrization P0, where

x_{i t}

is strictly exogenous, the instruments used by either A, C or, say, L2, are not the same under the situations X, W and E. This is hopefully clarified in the next paragraph.

Since we assume that for estimation just the observations

y_{i 0}, . . ., y_{i T}

and

x_{i 1}, . . ., x_{i T}

are available, the number of internal instruments that are used under XA (all instruments,

x_{i t}

treated as strictly exogenous) for estimation of the equation in first differences is:

T - 1

(time-dummies) +

T (T - 1) / 2

(lags of

y_{i t})

+

(T - 1) T

(lags and leads of

x_{i t})

. This yields

{11, 50, 116}

instruments for

T = {3, 6, 9} .

Under WA this is

{8, 35, 80}

and under EA

{6, 30, 72} .

From Section 3.3.1 it follows that for BB estimation this number of instruments increases with 1 (intercept) +

T - 1

(when

y_{i, t - 1}

is supposed to be effect stationary) +

T - 1

(when

x_{i, t}

is supposed to be effect stationary)

- 1

(when

x_{i t}

is treated as endogenous). This implies for

T = {3, 6, 9}

a total of

{5, 11, 17}

extra instruments under XA and WA, and of

{4, 10, 16}

under EA, whereas these extra instruments will be valid in Section 5.1 below and invalid in Section 5.2.

For the tables to follow we always examine the three values

γ \in {0.2, 0.5, 0.8}

for the dynamic adjustment coefficient at the three sample size values

T \in {3, 6, 9}

while mostly

N = 200,

as in the classic Arellano and Bond [1] study. This is done for both

θ = 0

(homoskedasticity) and

θ = 1

(substantial cross-sectional heteroskedasticity). Tables have a code which starts by the design parametrization, followed by the character u or f, indicating whether the table contains unfeasible or feasible results. Because of the many feasible variants not all results can be combined in just one table. Therefore, the f is followed by c, t, J or σ, where c indicates that the table just contains results on coefficient estimates, which are estimated bias, standard deviation (Stdv) and RMSE (root mean squared error; below often loosely addressed as precision); t refers to estimates of the actual rejection probabilities of tests on true coefficient values; J indicates that the table only contains results on Sargan-Hansen tests; and σ indicates that the table just contains results on estimating

σ_{ε}

en

σ_{η}

. Next, after a bar (-), the earlier discussed code is given for how regressor

x_{i t}

is actually treated when selecting the instruments, followed by the type of instrument reduction.

5.1. DGPs under Effect Stationarity

Here we focus on the case where BB is consistent and more efficient than AB, since

s = - 50

and

ϕ = 1

.

5.1.1. Results for the Reference Parametrization P0

Table 3, with code P0u-XA, gives results for unfeasible GMM coefficient estimators, unfeasible single coefficient tests, and for unfeasible Sargan-Hansen tests for the reference parametrization P0 when

x_{i t}

is (correctly) treated as strictly exogenous and all available instruments are being used. Table 4 (P0fc-XA) presents a selection of feasible counterparts regarding the coefficient estimators. Under homoskedasticity we see that for

{\hat{γ}}_{A B u} = {\hat{γ}}_{A B 1}

its bias (which is negative), stdv and thus its RMSE increase with γ and decrease with

T,

whereas the bias of

{\hat{β}}_{A B u} = {\hat{β}}_{A B 1}

is moderate and its RMSE, although decreasing in

T,

is almost invariant with respect to β. The BBu coefficient estimates are superior indeed, the more so for larger γ values (as is already well-known), but less so for

β .

As already conjectured in Section 3.6 under cross-sectional heteroskedasticity both ABu and BBu are substantially less precise than under homoskedasticity. However, modifying the instruments under cross-sectional heteroskedasticity as is done by MABu and MBBu yields considerable improvements in performance both in terms of bias and RMSE. In fact, the precision of the unfeasible modified estimators under heteroskedasticity comes very close to their counterparts under homoskedasticity.

The simulation results in Table 4 for feasible estimation do not contain the b variant of the weighting matrix9 because it is so bad, whereas both the a and c variants yield RMSE values very close to their unfeasible counterparts, under homoskedasticity as well as heteroskedasticity. Although the best unfeasible results under heteroskedasticity are obtained by MBBu, this does not fully carry over to MBB, because for T small and also for moderate T and large

γ,

BB2c performs much better. The performance of MAB and AB2c is rather similar, whereas we established that their unfeasible variants differ a lot when γ is large. Apparently, the modified estimators can be much more vulnerable when the variances of the error components,

σ_{ε, i}^{2}

and

σ_{η}^{2}

, are unknown, probably because their estimates have to be inverted in (92) and (94).

From the type I error estimates for unfeasible single coefficient tests in Table 3 we see that the standard test procedures work pretty well for all techniques regarding

β,

but with respect to γ ABu fails for larger

γ .

This gets even worse under heteroskedasticity, but less so for MABu. For BBu and MBBu the results are reasonable. Here the test seems to benefit from the smaller bias of BBu. For the feasible variants we find in Table 5 (P0ft-XA) that under homoskedasticity AB1 has reasonable actual significance level for

β,

but for γ only when it is small. The same holds for AB2c. Under heteroskedasticity AB2c overrejects, especially for γ or T large, but only mildly so for tests on

β .

Both AB2a and MAB overreject enormously. Employing the Windmeijer [29] correction mitigates the overrejection probabilities in many cases, but not in all. AB2cW has appropriate size for tests on

β,

but for tests on γ the size increases both with γ and with T from 7% to 37% over the grid examined. Since the test based on ABu shows a similar pattern, it is self-evident that a correction which just takes the randomness of AB1 into account cannot be fully effective. Oddly enough the Windmeijer correction is occasionally more effective for the heavily oversized AB2a than for the less oversized AB2c. Under homoskedasticity both BB2c and BB2cW behave very reasonable, both for tests on β and on

γ .

Under heteroskedasticity BB2cW is still very reasonable, but all other implementations fail in some instances, especially for tests on γ when γ or T are large. The failure of BB1 under heteroskedasticity is self-evident, see (76).

Regarding the unfeasible J tests Table 3 shows reasonable size properties under homoskedasticity, especially for

J B B u

, but less so for the incremental test on effect stationarity when γ is large. Under heteroskedasticity this problem is more serious, though less so for the unfeasible modified procedure. Heteroskedasticity and γ large lead to underrejection of the

J A B u

test, especially when T is large too. Turning now to the many variants of feasible J tests, of which only some10 are presented in Table 6 (P0fJ-XA), we first focus on

J A B

. Under homoskedasticity

J A B^{(1, 0)}

behaves reasonable, though when (inappropriately) applied when

θ = 1

it rejects with high probability (thus detecting heteroskedasticity instead of instrument invalidity, probably due to underestimation of the variance of the still valid moment conditions). Of the

J A B^{(1, 1)}

tests, which are only valid when

θ = 0,

the c variant severely underrejects when

T = 9

(when there is an abundance of instruments), but less so than the a version. Such severe underrejection under homoskedasticity had already been noted by Bowsher [14] when

T > 9 .

An almost similar pattern we note for

J A B^{(2, 1)}

and

J A B^{(2, 2)},

which are asymptotically valid for any

θ .

Test

J M A B

overrejects severely for

T = 3

and underrejects otherwise. Turning now to feasible

J B B

tests we find that

J B B^{(1, 0)}

underrejects when

θ = 0

and, like

J A B^{(1, 0)},

rejects with high probability when

θ = 1 .

Both the a and c variants of test

J B B^{(1, 1)},

like

J A B^{(1, 1)},

have rejection probabilities that are not invariant with respect to

T,

γ and

θ .

The c variants seem the least vulnerable, and therefore also yield an almost reasonable incremental test

J E S^{(1, 1)}

, although it underrejects when

θ = 0

and overrejects when

θ = 1

for

γ = 0.8 .

For

J B B^{(2, 1)}

and

J B B^{(2, 2)}

too the c variant has rejection probabilities which vary the least with

T,

γ and

θ,

but they are systematically below the nominal significance level, which is also the case for the resulting incremental tests. Oddly enough, the incremental tests resulting from the a variants have type I error probabilities reasonably close to 5%, despite the serious underrejection of both the

J A B

and

J B B

tests from which they result.

From Table 7 it can be seen that in the base case P0 estimation of

σ_{ε}

(which has true value 1) is pretty accurate for all techniques and T and γ values, but less so under heteroskedasticity when T is small and γ large. Estimation of

σ_{η}

is much more problematic. Only when γ is moderate, estimation bias is moderate too. The bias can exceed 100% when γ is large and T is small, and gets even worse under heteroskedasticity. Employing BB mitigates this bias.

When treating regressor

x_{i t}

as predetermined (P0-WA, not presented here), although it is strictly exogenous, fewer instruments are being used. Since the ones that are now abstained from are most probably the strongest ones regarding

Δ x_{i t},

it is no surprise that in the simulation results we note that especially the standard deviation of the β coefficient suffers. Also the rejection probabilities of the various tests differ slightly between implementations WA and XA, but not in a very systematic way as it seems. When treating

x_{i t}

as endogenous (P0-EA) the precision of the estimators gets worse, with again no striking effects on the performance of test procedures under their respective null hypotheses. Upon comparing for P0 the instrument set A (and set C) with the one where A

^{x}

(C

^{x}

) is replaced by C1

^{x}

it has been found that the in practice quite popular choice C1

^{x}

yields often slightly less efficient estimates for β, but much less efficient estimates for γ.

When

x_{i t}

is again treated as strictly exogenous, but the number of instruments is reduced by collapsing the instruments stemming from both

y_{i t}

and

x_{i t},

then we note from Table 8 (P0fc-XC, just covering

θ = 1)

a mixed picture regarding the coefficient estimates. Although any substantial bias always reduces by collapsing, standard errors always increase at the same time, leading either to an increase or a decrease in RMSE. Decreases occur for the AB estimators of

γ,

especially when γ is large; for β just increases occur. A noteworthy reduction in RMSE does show up for BB2a when γ is large,

T = 9

and

θ = 1,

but then the RMSE of BB2c using all instruments is in fact smaller. However, Table 9 (P0ft-XC) shows that collapsing is certainly found to be very beneficial for the type I error probability of coefficient tests, especially in cases where collapsing yields substantially reduced coefficient bias. The AB tests benefit a lot from collapsing, especially the c variant, leaving only little room for further improvement by employing the Windmeijer correction. After collapsing AB1 works well under homoskedasticity, and also under heteroskedasticity provided robust standard errors are being used, where the c version is clearly superior to the a version. AB2c has appropriate type I error probabilities, except for testing γ when it is

0.8

at

T = 3

and

θ = 1

(which is not repaired by a Windmeijer correction either), and is for most cases superior to AB2aW. After collapsing BB2a shows overrejection which is not completely repaired by BB2aW when

θ = 1

. BB2c and BB2cW generally show lower rejection probabilities, with occasionally some underrejection. Tests based on MAB and MBB still heavily overreject. Table 10 (P0fJ-XC) shows that by collapsing the

J A B

and

J B B

tests suffer much less from underrejection when T is larger than 3. However, both the a and c versions of the

J^{(2, 1)}

and

J^{(2, 2)}

tests usually still underreject, mostly by about 1 or 2 percentage points. Good performance is shown by

J E S_{a}^{(2, 1)}

and

J E S_{c}^{(2, 1)} .

Table 11 (P0fσ-XC) shows that collapsing reduces the bias in estimates of

σ_{η}

substantially, although the bias is still huge when γ is large and T small, especially for AB and more so under heteroskedasticity.

When

x_{i t}

is still correctly treated as strictly exogenous but for the level instruments just a few lags or first differences are being used (XL0 ... XL3) for both

y_{i t}

and

x_{i t}

then we find the following. Regarding feasible AB and BB estimation collapsing (XC) always gives smaller RMSE values than XL0 and XL1 (which is much worse than XL0), but this is not the case for XL2 and XL3. Whereas XC yields smaller bias, XL2 and XL3 often reach smaller Stdv and RMSE. Especially regarding β XL3 performs better than XL2. Probably due to the smaller bias of XC it is more successful in mitigating size problems of coefficient tests than XL0 through XL3. The effects on J tests is less clear-cut. Combining collapsing with restricting the lag length we find that XC2 and XC3 are in some aspects slightly worse but in others occasionally better than XC for P0. We also examined the hybrid instrumentation which seems popular amongst practitioners where C

^{w}

is combined with L1

^{x}

(see Table 1). Especially for γ this leads to loss of estimator precision without any other clear advantages, so it does not outperform the XC results for P0. From examining P0-WC (and P0-EC) we find that in comparison to P0-WA (P0-EA) there is often some increase in RMSE, but the size control of especially the t-tests is much better.

Summarizing the results for P0 on feasible estimators and tests we note that when choosing between different possible instrument sets a trade off has to be made between estimator precision and test size control. For both some form of reduction of the instrument set is often but not always beneficial. Not one single method seems superior irrespective of the actual values of

γ,

β and

T .

Using all instruments is not necessarily a bad choice; also XC, XL3 and XC3 often work well. To mitigate estimator bias and foster test size control while not sacrificing too much estimator precision using collapsing (C) for all regressors seems a reasonable compromise, as far as P0 is concerned. Coefficient and J tests based on the modified estimator using its simple feasible implementation examined here behave so poorly, that in the remainder we no longer mention its results.

5.1.2. Results for Alternative Parametrizations

Next we examine a series of alternative parametrizations where each time we just change one of the parameter values of one of the already examined cases. In P1 we increase

D E N_{y}^{η}

from 1 to 4 (hence, substantially increasing the relative variance of the individual effects). We note that for P1-XA (not tabulated here) all estimators regarding γ are more biased and dispersed than for P0-XA, but there is little or no effect on the β estimates. For both T and γ large this leads to serious overrejection for the unfeasible coefficient tests regarding γ, in particular for ABu. Self-evidently, this carries over to the feasible tests and, although a Windmeijer correction has a mitigating effect, the overrejection remains often serious for both AB and BB based tests. Tests on β based on AB behave reasonable, apart from not robustified AB1 and AB2a. For the latter a Windmeijer correction proves reasonably effective. When exploiting the effect stationarity the BB2c implementation seems preferable. The unfeasible J tests show a similar though slightly more extreme pattern as for P0-XA. Among the feasible tests both serious underrejection and some overrejection occurs. The when

θ = 1

invalid

J A B^{(1, 1)}

is not much worse than the valid tests. As far as the incremental tests concerns

J E S_{c}^{(2, 2)}

behaves remarkably well.

In Table 12, Table 13, Table 14 and Table 15 (P1fj-XC,

j =

c,t,J,σ) we find that collapsing leads again to reduced bias, slightly deteriorated precision though improved size control (here all unfeasible tests behave reasonably well). All feasible AB1R and AB2W tests have reasonable size control, apart from tests on γ when T is small and γ large. These give actual significance levels close to 10%. BB2cW seems slightly better than BB2aW. The J tests using 1-step residuals only show some serious overrejection under heteroskedasticity, whereas the

J^{(2, 1)}

and

J^{(2, 2)}

behave quite satisfactorily. The increase of

σ_{η}

has an adverse effect on its estimate when using uncollapsed BB for γ small, but collapsing substantially reduces the bias in

σ_{η}

estimates. For C3 reasonably similar results are obtained, but those for L3 are generally slightly less attractive.

In P2 we increase

E V F_{x}

from 0 to 0.6, upon having again

I E F_{x}^{η} = 0

(hence,

x_{i t}

is still uncorrelated with effect

η_{i}

though correlated with effect

λ_{i},

which determines any heteroskedasticity). This leads to increased β values. Results for P2-XA show larger absolute values for the standard deviations of the β estimates than for P0-XA, but they are almost similar in relative terms. The patterns in the rejection probabilities under the respective null hypotheses are hardly affected, and P2-XC shows again improved behavior of the test statistics due to reduced estimator bias, whereas the RMSE values have slightly increased. Under P2

σ_{η}

estimates are more biased than under P0.

In P3 we change

I E F_{x}^{η}

from 0 to 0.3, while keeping

E V F_{x} = 0.6

(hence, realizing now dependence between regressor

x_{i t}

and the individual effect

η_{i}

). Comparing the results for P3-XA with those for P2-XA (which have the same β values) we find that all patterns are pretty similar. Also P3-XC follows the P2-XC picture closely. Under P3

σ_{η}

estimates are more biased than under P0.

P4 differs from P3 because

κ = 0.25,

thus now the heteroskedasticity is determined by

η_{i}

too. This has a noteworthy effect on MBB estimation, a minor effect on JBB (and thus on JES) testing, and almost no effect on

σ_{η}

estimation.

P5 differs from P0 just in having

{\bar{ρ}}_{x ε} = 0.3,

so

x_{i t}

is now endogenous with respect to

ε_{i t} .

P5-EA uses all instruments available when correctly taking the endogeneity into account. This leads to very unsatisfactory results. The coefficient estimates of γ have serious negative bias, and those for β positive bias, whereas the standard deviation is slightly larger than for P0-EA, which are substantially larger than for P0-XA. All coefficient tests are very seriously oversized, also after a Windmeijer correction, both for AB and BB. Tests

J A B u

and

J B B u

show underrejection, whereas the matching

J E S

tests show serious overrejection when T is large, but the feasible 2-step variants are not all that bad. From Table 16, Table 17 and Table 18 (P5fj-EC,

j =

c,t,J) we see that most results which correctly handle the simultaneity of

x_{i t}

are still bad after collapsing, especially for T small (where collapsing can only lead to a minor reduction of the instrument set), although not as bad as those for P5-EA and larger values of T. For P5-EC the rejection probabilities of the corrected coefficient tests are usually in the 10%–20% range, but those of the 2-step J tests are often close to 5%. Under P5 estimates of

σ_{ε}

and

σ_{η}

are much more biased than under P0. Both AB and BB are inconsistent when treating

x_{i t}

either as predetermined or as exogenous. For P5-WA and P5-XA the coefficient bias is almost similar but much more serious than for P5-EA. For the inconsistent estimators the bias does not reduce when collapsing the instruments. Because the inconsistent estimators have a much smaller standard deviation than the consistent estimators practitioners should be warned never to select an estimator simply because of its attractive estimated standard error. The consistency of AB and BB should be tested with the Sargan-Hansen test.

In this study we did not examine the particular incremental test which focusses on the validity of the extra instruments when comparing E with W or E with X. Here we just examine the rejection probabilities of the overall overidentification J tests for case P5 using all instruments and can compare the rejection frequencies when treating

x_{i t}

correctly as endogenous, or incorrectly as either predetermined or exogenous. From Table 19 (P5fJ-jA,

j =

E,W,X) we find that size control for

J^{(2, 2)}

can be slightly better than for

J^{(2, 1)} .

The detection of inconsistency by

J^{(2, 1)}

has often a higher probability when the null hypothesis is W than when it is X. The probability generally increases with T and with γ and is often better for the c variant than for the a variant and slightly better for BB implementations than for AB implementations, whereas in general heteroskedasticity mitigates the rejection probability. In the situation where all instruments have been collapsed, where we already established that the J tests do have reasonable size control, we find the following. For

T = 3

and

γ = 0.2

the rejection probability of the

J A B

and

J B B

tests does not rise very much when

{\bar{ρ}}_{x ε}

moves from 0 to 0.3, whereas for

T = 9,

{\bar{ρ}}_{x ε} = 0.3

this rejection probability is often larger than

0.7

when

γ \geq 0.5

and often larger than

0.9

for

γ = 0.8 .

Hence, only for particular

T,

γ and θ parametrizations the probability to detect inconsistency seems reasonable, whereas the major consequence of inconsistency, which is serious estimator bias, is relatively invariant regarding

T,

γ and

θ .

Summarizing our results for effect stationary models we note the following. We established that finite sample inaccuracies of the asymptotic techniques seriously aggravate when either

σ_{η} / (1 - γ) ≫ σ_{ε}

or under simultaneity. For both problems it helps to collapse instruments, and the first problem is mitigated and the second problem detected with higher probability by instrumenting according to W rather than X. Neglected simultaneity leads to seemingly accurate but seriously biased coefficient estimators, whereas asymptotically valid inference on simultaneous dynamic relationships is often not very accurate either. Even when the more efficient BB estimator is used with Windmeijer corrected standard errors, the bias in both γ and β is very substantial and test sizes are seriously distorted. Some further pilot simulations disclosed that N should be much and much larger than 200 in order to find much more reasonable asymptotic approximation errors.

5.2. Nonstationarity

Next we examine the effects of a value of ϕ different from unity. We will just consider setting

ϕ = 0.5

and perturbing

x_{i 0}

and

y_{i 0}

according to (123), so that their dependence on the effects is initially 50% away from stationarity so that BB estimation is inconsistent. That this occurred we will indicate in the parametrization code by changing P into P

^{ϕ} .

Comparing the results for P

^{ϕ}

0-XA with those for P0-XA, where

ϕ = 1

(effect stationarity), we note from Table 20 (P

^{ϕ}

0fc-XA) a rather moderate positive bias in the BB estimators for both γ and β when both T and γ are small. Despite the inconsistency of BB the bias is very mild for larger T and especially for larger γ it is much smaller than for consistent AB. The pattern regarding T can be explained, because convergence towards effect stationarity does occur when time proceeds. Since this convergence is faster for smaller γ the good results for large γ seem due to great strength of the first-differenced lagged instruments regarding the level equation. Since

π_{η} = 0

here

Δ x_{i, t - 1}

is in fact a valid instrument too. Note that the RMSE of inconsistent BB1, BB2a and BB2c is always smaller than that for consistent AB1, AB2a and AB2c, except when T and γ are both small. With respect to the AB estimators we find little to no difference compared to the results under stationarity. Table 21 (P

^{ϕ}

0ft-XA) shows that when

γ = 0.8

the BB2cW coefficient test on γ yields very mild overrejection, while AB2aW and AB2cW seriously overreject. For smaller values of γ it is the other way around. After collapsing (not tabulated here) similar but more moderate patterns are found, due to the mitigated bias which goes again with slightly increased standard errors. Hence, for this case we find that one should perhaps not worry too much when applying BB even if effect stationarity does not strictly hold for the initial observations. As it happens, we note from Table 22 (P

^{ϕ}

0fJ-XA) that the rejection probabilities of the JES tests are such that they are relatively low when BB inference is more precise than AB inference, and relatively high when either T or γ are low for

ϕ = 0.5 .

This pattern is much more pronounced for the JES tests than for the JBB tests. However, it is also the case in P

^{ϕ}

0 that collapsing mitigates this welcome quality of the JES tests to warn against unfavorable consequences of effect nonstationarity on BB inference.

From P

^{ϕ}

1-XA, in which the individual effects are much more prominent, we find that

ϕ = 0.5

has curious effects on AB and BB results. For effect stationarity

(ϕ = 1)

we already noted more bias for AB than under P0. For γ large, this bias is even more serious when

ϕ = 0.5,

despite the consistency of AB. For BB estimation the reduction of ϕ leads to much larger bias and much smaller stdv, with the effect that RMSE values for inconsistent BB are usually much worse than for AB, but are often slightly better (except for BB2c) when

γ = 0.8

. All BB coefficient tests for γ have size close or equal to 1 under P

^{ϕ}

1-XA and the AB tests for

γ = 0.8

overreject very seriously as well. Under P

^{ϕ}

1-XC the bias of AB is reasonable except for

γ = 0.8 .

The bias of BB has decreased but is still enormous, although its RMSE remains preferable when

γ = 0.8 .

Especially regarding tests on γ BB fails. For both the a and c versions the JES test has high rejection probability to detect

ϕ \neq 1,

except when γ is large. The relatively low rejection probability of JES tests obtained after collapsing when

γ = 0.8

and

ϕ = 0.5

again indicates that despite its inconsistency BB has similar or smaller RMSE than AB for that specific case.

Next we consider the simultaneous model again. In case P

^{ϕ}

5-EA estimator AB is consistent and BB again inconsistent. Nevertheless, for all γ and T values examined in Table 23 (P

^{ϕ}

5fc-EA), AB has a more severe bias than BB, whereas BB has smaller stdv values at the same time and thus has smaller RMSE for all γ and T values examined. The size control of coefficient tests is worse for AB, but for BB it is appalling too, where BB2aW, with estimated type I error probabilities ranging from 5% to 70%, is often preferable to BB2cW. The 2-step JAB tests behave reasonably, whereas the JBB tests reject with probabilities in the 3%–38% range, and JES in the 3%–69% range. By collapsing the RMSE of AB generally reduces when

T \geq 6

and for BB especially when

γ = 0.8 .

BB has again smaller RMSE than AB. The rejection rates of the JBB and JES tests are substantially lower now, which seems bad because the invalid (first-differenced) instruments are less often detected, but this may nevertheless be appreciated because it induces to prefer less inaccurate BB inference to AB inference. After collapsing the size distortions of BB2aW and BB2cW are less extreme too, now ranging from 5%–33%, but the RMSE values for BB may suffer due to collapsing, especially when γ and T are small. The RMSE values for BB under P

^{ϕ}

5-WA and P

^{ϕ}

5-XA are usually much worse than those for AB under P

^{ϕ}

5-EA. Hence, although the invalid instruments for the level equation are not necessarily a curse when endogeneity of

x_{i t}

is respected, they should not be used when they are invalid for two reasons (

ϕ \neq 1

and

ρ_{x ε} \neq 0

). That neither AB nor BB should be used in P5 under W and X will be indicated with highest probability under WC, and then this probability is larger than 0.8 for

J B B_{a}^{(2, 1)}

only when T is high and for

J A B_{a}^{(2, 1)}

only when both T and γ are high.

Summarizing our findings regarding effect nonstationarity, we have established that although

ϕ \neq 1

renders BB estimators inconsistent, especially when T is not small BB inference nevertheless often beats consistent AB, provided possible endogeneity of

x_{i t}

is respected. The JES test seems to have the remarkable property to be able to guide towards the technique with smallest RMSE instead of the technique exploiting the valid instruments. For further details we refer to the full set of Monte Carlo results.

6. Empirical Results

The above findings will be employed now in a re-analysis of the data and some of the techniques studied in Ziliak [35]. The main purpose of that article was to expose the downward bias in GMM as the number of moment conditions expands. This is done by estimating a static life-cycle labor-supply model for a ten year balanced panel of males, and comparing for various implementations of 2SLS and GMM the coefficient estimates and their estimated standard errors when exploiting expanding sets of instruments. We find this approach rather naive for various reasons: (a) the difference between empirical coefficient estimates will at best provide a very poor proxy to any underlying difference in bias; (b) standard asymptotic variance estimates of IV estimators are known to be very poor representations of true estimator uncertainty;11 (c) the whole analysis is based on just one sample and possibly the model is seriously misspecified.12 The latter issue also undermines conclusions drawn in Ziliak [35] on overrejection by the J test, because it is of course unknown in which if any of his empirical models the null hypothesis is true. To avoid such criticism we designed the controlled experiments in the two foregoing sections on the effects of different sets of instruments on various relevant inference techniques. Now we will examine how these simulation results can be exploited to underpin actual inference from the data set used by Ziliak.

This data set originates from waves XII-XXI and the years 1979–1988 of the PSID. The subjects are

N = 532

continuously married working men aged 22–51 in 1979. Ziliak [35] employs the static model13

\begin{matrix} \begin{matrix} \ln h_{i t} = β \ln w_{i t} + z_{i t}^{'} γ + η_{i} + ε_{i t}, \end{matrix} \end{matrix}

(124)

where

h_{i t}

is the observed annual hours of work,

w_{i t}

the hourly real wage rate,

z_{i t}

a vector of four characteristics (kids, disabled, age, age-squared),

η_{i}

an individual effect and

ε_{i t}

the idiosyncratic error term. He assumes that

\ln w_{i t}

may be an endogenous regressor and that all variables included in

z_{i t}

are predetermined. The parameter of interest is β and in the various static models examined its GMM estimates range from approximately 0.07 to 0.52, depending on the number of instruments employed.

After some experimentation we inferred that lagged reactions play a significant role in this relationship and that in fact a general second-order linear dynamic specification is required in order to pass the diagnostic tests which are provided by default in the Stata package xtabond2 (StataCorp LLC), see Roodman [43]. This model, also allowing for time-effects, is given by

\begin{matrix} \begin{matrix} \ln h_{i t} & = \sum_{l = 1}^{2} γ_{l} \ln h_{i, t - l} + \sum_{l = 0}^{2} (β_{l}^{w} \ln w_{i, t - l} + β_{l}^{k} k i d s_{i, t - l} + β_{l}^{d} d i s a b_{i, t - l}) \\ + β^{a} a g e_{i, t} + β^{a a} a g e_{i, t}^{2} + τ_{t} + η_{i} + ε_{i t} . \end{matrix} \end{matrix}

(125)

We did not include lags of age and its square.14 Contrary to Ziliak, we will not treat variable

a g e

as predetermined, since due to its very nature (no feedbacks from hours worked to age) it must be strictly exogenous. On the other hand, lagged or even immediate feedbacks from labor supply to the variables

k i d s

and

d i s a b

seem well possible.

In the sequence of various model specifications and instrument set compositions embarked on below, we adopted the following methodological strategy. We start with a rather general initial dynamic model specification employing a relatively uncontroversial set of instruments, hence avoiding as much as possible the imposition of doubtful exclusion restrictions on (lagged) regressor variables as well as the exploitation of yet unconfirmed orthogonality conditions. This initial model is estimated by 1-step AB with heteroskedasticity robust standard errors, neglecting for the moment any coefficient t-tests, unless serial correlation tests and heteroskedasticity robust J tests show favorable results. As long as the latter is not the case, the model should be re-specified by adapting the functional form and/or including additional explanatories, either new ones or transformations of already included ones such as longer lags or interactions. When favorable serial correlation and robust J tests have been obtained, and when reconfirmed (especially in case evidence has been found indicating the presence of heteroskedasticity) by favorable autocorrelation and J tests after 2-step AB estimation, hopefully initial consistent estimates have been accomplished. Then, in next stages, the two further aims are: attaining increased efficiency and mitigating finite sample bias. These are pursued first by sequentially testing additional orthogonality conditions. Initially by testing whether variables treated as endogenous seem actually predetermined, and next by verifying whether predetermined variables seem in fact exogenous, possibly followed by testing the orthogonality conditions implied by effect stationarity. In this process the tested extra instruments are added to the already adopted set of instruments, provided incremental J tests are convincingly insignificant. Next, one could test coefficient restrictions (on the basis of robust 1-step AB standard errors in case of suspected heteroskedasticity, or using Windmeijer-corrected 2-step AB standard errors) and impose these restrictions when convincingly insignificant from both a statistical and economic point of view. During the whole process the effects on the various estimates and test statistics of collapsing the instrument set and/or removing instruments with long lags could be monitored and possibly induce not exploiting particular probably valid orthogonality conditions represented by apparently weak instruments.

For the present data set, the inclusion of second-order lags in the initial model specification yields

T = 7

and

K = 20

when estimating the first-differenced model (125), hence

N T = 3724

. Although no generally accepted rules of thumb exist yet on requirements regarding the number of degrees of freedom and the degree of overidentification for GMM to work well in the analysis of micro panel data sets, we chose to respect at any stage in the specification search the inequalities

L ≪ 10 K

and

L ≪ N T / 20,

but also examined cases where

K < L < 2 K .

Table 24 presents some estimation and test results for model (125) obtained by employing different estimation methods and instrument sets. All results have been obtained by Stata/SE14.0 with package xtabond2 (StataCorp LLC), abstaining from any finite sample corrections, and supplemented with code for calculating

{\hat{σ}}_{η},

{\hat{σ}}_{ε}

and

J A B

test variants. In column (1) 1-step Arellano-Bond GMM estimates are presented (omitting the results for the included time-effects) with heteroskedasticity robust standard errors (indicated by AB1R) using all level instruments that are valid when (with respect to

ε_{i t})

regressor

\ln h_{i, t - 1}

is predetermined, the regressors

\ln w_{i t},

k i d s_{i t}

and

d i s a b_{i t}

could be endogenous, and

a g e

is exogenous (indicated by 1P3E1X). This yields

4 \times Σ_{h = 2}^{8} h + 9 = 149

instruments, because we instrumented both

a g e

and

a g e^{2},

like the seven time-dummies, just by themselves. For the AR and J tests given in the bottom lines the p-values are presented. Hence, in column (1) the (first-differenced) residuals do exhibit 1st order serial correlation (as they should) but no significant 2nd order problems emerge. We supplemented

J A B^{(1, 0)}

and

J A B_{a}^{(2, 1)},

as presented by xtabond2 (StataCorp LLC, see our footnote 3) by

J A B_{a}^{(1, 1)}

and

J A B_{a}^{(2, 2)} .

The p-value of 0.000 for

J A B_{a}^{(1, 0)}

should be neglected, because we found convincing evidence of heteroskedasticity from an auxiliary LSDV regression (not presented) of the squared level residuals for the findings in column (1) on all regressors of model (125), except the current endogenous ones. Xtabond2 suggests now that we judge the adequacy of model and instruments on the basis of test

J A B_{a}^{(2, 1)},

hence on a hybrid test statistic involving both 1-step and 2-step residuals. Its p-value is high, thus seems to approve the validity of the instruments. However, in some of our simulations this variant underrejects. The purely 1-step based

J A B_{a}^{(1, 1)}

test is only valid under conditional homoskedasticity, so we may neglect its low p-value in this and all other columns.

Because many regressors in column (1) have very low absolute t-values, this may undermine the finite sample performance of the

J A B

tests. Therefore, in column (2) we examine removal of the time-effects from the regression, which in column (1) have absolute t-values between 0.34 and 1.15. In column (3) we remove the time-dummies from the set of instruments too. This has little effect. Because the exogeneity of the time-effects is self-evident, we decide to keep them in the instrument set, though exclude them from the regressors. Since we did not manage to get more satisfying results regarding the J tests by relaxing implicit restrictions (including interactions, generalizing the functional form), we adopt with some hesitance the specification and classification of the variables of column (2) as an acceptable starting point. In the table all coefficient estimates with a t-ratio above 2 are marked by a double asterix, and a single asterix when between 1 and 2 (estimated standard errors are given between parentheses). The modest estimated values for the lagged dependent variable coefficient estimates in combination with those of

{\hat{σ}}_{η} / {\hat{σ}}_{ε}

suggest values of the

D E N_{y}^{η}

concept such that the relatively unfavorable simulation results for case P1 (where

D E N_{y}^{η} = 4

) do not seem to apply here. Column (4) presents the Windmeijer corrected 2-step AB estimates. For many coefficients these suggest an improvement in estimator efficiency. From the simulations we learned that we should not overrate the qualities of 2-step estimation. Also note that some of the coefficient estimates deviate from their 1-step counterparts, which might be due to vulnerability to finite sample bias. This can also be seen from the bottom row of the table which presents the estimate of the long-run wage elasticity of hours worked. This total multiplier is given by

T M^{w} = ({\hat{β}}_{0}^{w} + {\hat{β}}_{1}^{w} + {\hat{β}}_{2}^{w}) / (1 - {\hat{γ}}_{1} - {\hat{γ}}_{2}) .

Column (4) suggests a lower elasticity than column (2). Many of the static models estimated by Ziliak suggest even lower values for this elasticity (and forced equality of immediate and long-run elasticity, which is sharply rejected by all our models).

Before we proceed, we want to report that when estimating model (125) by AB1R without second order lags (then

K = 17

and

L = 154

) the p-values of the AR(1) and AR(2) tests are 0.000 and 0.754 respectively, whereas that of

J A B_{a}^{(2, 1)}

is

0.510 .

Hence, despite the significance of various of the coefficients of twice lagged variables in columns (1) through (3), these three tests do not detect the apparent dynamic underspecification; hence, they lack power.

Although quite a few slope coefficients in columns (1) through (3) have t-ratio’s with small absolute values, similar to the time-effects, we prefer not to proceed at this stage by imposing further coefficient restrictions on the model. Instead, we shall try to decrease the estimated standard errors and mitigate finite sample bias by examining whether the three regressors which we treated as endogenous could actually be classified such that additional and stronger instruments might be used. However, before we do that, just for illustrative purposes, we present again AB1 and AB2 results for the model specification and instrument set as used in column (2), but now not robustified AB1 in column (5) and not Windmeijer corrected AB2 in column (6). For most coefficients column (5) suggests smaller standard errors than column (2), but given the detected heteroskedasticity we know these are deceitful inconsistent standard deviation estimates. Column (6) shows that not using the Windmeijer correction would incorrectly suggest that AB2 is substantially more efficient than (robust) AB1, which often it is not, as we already learned from our simulations. Note that the value of the serial correlation tests does not just depend on the (unaffected) residuals, but on the (affected) coefficient standard errors too. Therefore, we interpret the rejection by AR(2) in column (5) as due to size problems.

Next, a series of incremental

J_{a}^{(2, 1)}

tests (not presented in the table) has been performed to establish the actual classification of the three yet treated as endogenous regressors. Testing against 1P4X (which implies 42 extra instruments) yields a p-values below 0.005. So, we better proceed step by step to assess whether some of these 42 instruments are nevertheless valid. Testing validity of the 7 extra instruments in case

d i s a b_{i, t}

is treated as predetermined yields a p-value of 0.029, so this seems truly endogenous. Doing the same for

k i d s_{i, t}

gives 0.520. Next testing whether the 7 extra instruments involving current values of

k i d s_{i, t}

seem valid too yields p-value 0.398 and when testing against column (2) the 14 extra instruments yield p-value 0.490. Accepting exogeneity of the variable

k i d s_{i, t}

and maintaining endogeneity of

d i s a b_{i, t}

we now focus on the classification of

\ln w_{i t} .

Testing the extra 7 instruments when treating

\ln w_{i t}

as predetermined yields p-values 0.330. Testing jointly the 21 instruments additional to column (2) the p-value is 0.429. We decide to adopt the classification where variables

a g e_{i, t}

and

k i d s_{i, t}

are exogenous,

d i s a b_{i, t}

and

\ln w_{i, t}

are endogenous, and self-evidently

\ln h_{i, t - 1}

is predetermined (all with respect to

ε_{i, t}

). The corresponding AB1R and AB2W estimates can be found in columns (7) and (8). Note that the extra instruments are especially beneficial for the standard errors of the

β_{j}^{k}

coefficients. Again the

T M^{w}

estimate is larger for 1-step than for 2-step estimation.

In columns (9) and (10) we examine the effects on the results of column (7) of reducing the number of instruments; in column (9) by collapsing and in column (10) by discarding instruments lagged more than two periods. This leads to disturbing results. If the instruments used in column (7) are valid, those used in the columns (9) and (10) cannot be invalid. Nevertheless, test p-values of test

J A B_{a}^{(2, 1)}

reduces substantially. That the estimated coefficient standard errors have increased in columns (9) and (10) is understandable, but the substantial shifts in coefficient estimates is seriously uncomfortable. The negative

T M^{w}

found after collapsing seems not very realistic. The main question seems now whether this is just caused by finite sample bias, or by inconsistency. In the latter case the results of all other columns must be inconsistent too.

Finally, we examine 2-step Blundell-Bond system estimation with Windmeijer correction. Testing validity of the 34 instruments used in column (11) additional to those used in column (8), yields a p-value for the

J E S_{a}^{(2, 1)}

test of 0.016, whereas the

J A B_{a}^{(2, 1)}

based Hayashi-version (see our footnote 3) calculated by xtabond2 (StataCorp LLC) gives a p-value of 0.136. So, effect stationarity seems doubtful, although the five γ and

β^{w}

coefficients seem all highly significant now (with all further coefficients insignificant). The estimates of

T M^{w}

and

σ_{η}

deviate strongly from those of columns (1) through (8). Even more distorted BB2 results are obtained after collapsing. We find it hard to believe that this is all due to increased efficiency and reduced finite sample bias and simply reject effect stationarity and tend to accept the results of columns (7) and (8). Or, should we declare all results in Table 21 uninterpretable simply because no model from the class examined here matches with the Ziliak data? It is hard to answer this question, simply because we learned from the simulations how vulnerable all employed tools are even in cases where the adopted model specification fully corresponds with the underlying DGP.

Hopefully the small sample bias is such that proper interpretation of the coefficients of column (7) is possible. Then we note that—although not statistically significant—we find a tendency that a positive change in either

k i d s

or

d i s a b

leads to an immediate drop in hours supplied, although this drop is mitigated for a substantial part after a few periods. Also, the older an individual gets there is a tendency (again insignificant) to work fewer hours. The wage elasticity is positive with a larger value than was inferred by earlier (static) studies. However, given what we learned from the simulations, we should restrain ourselves when drawing far-reaching conclusions from the estimation and test results given in Table 24, simply because we established that for the currently available techniques for analysis of dynamic panel data models the bias of coefficient estimates can be substantial and the actual size of tests may deviate considerably from the aimed at levels whereas their actual power seems modest.

7. Major Findings

In social science the quantitative analysis of many highly relevant problems requires structural dynamic panel data methods. These allow the observed data to have at best a quasi-experimental nature, whereas the causal structure and the dynamic interactions in the presence of unobserved heterogeneity have yet to be unraveled. When the cross-section dimension of the sample is not very small, employing GMM techniques seems most appropriate in such circumstances. This is also practical since corresponding software packages are widely available. However, not too much is known yet about the actual accuracy in practical situations on the abundance of different not always asymptotically equivalent implementations of estimators and test procedures. This study aims to demarcate the areas in the parameter space where the asymptotic approximations to the properties of the relevant inference techniques in this context have either shown to be reliable beacons or are actually often misguiding marsh fires.

In this context we provide a rather rigorous treatment of many major variants of GMM implementations as well as for the inference techniques on testing the validity of particular orthogonality assumptions and restrictions on individual coefficient values. Special attention is given to the consequences of the joint presence in the model of time-constant and individual-constant unobserved effects, covariates that may be strictly exogenous, predetermined or endogenous, and disturbances that may show particular forms of heteroskedasticity. Also the implications regarding initial conditions for separate regressors with respect to individual effect stationarity are analyzed in great detail, and various popular options that aim to mitigate bias by reducing the number of exploited internal instruments are elucidated. In addition, as alternatives to those used in current standard software, less robust weighting matrices and additional variants of Sargan-Hansen test implementations are considered, as well as the effects of particular modifications of the instruments under heteroskedasticity.

Next, a simulation study is designed in which all the above variants and details are being parametrized and categorized, which leads to a data generating process involving 10 parameters, for which, under 6 different settings regarding sample size and initial conditions, 60 different grid points are examined. For each setting and various of the grid points 13 different choices regarding the set of instruments have been used to examine 12 different implementations of GMM coefficient estimates, giving rise to 24 different implementations of t-tests and 27 different implementations of Sargan-Hansen tests. From all this only a pragmatically selected subset of results is actually presented in this paper.

The major conclusion from the simulations is that, even when the cross-section sample size is several hundreds, the quality of this type of inference depends heavily on a great number of aspects of which many are usually beyond the control of the investigator, such as: magnitude of the time-dimension sample size, speed of dynamic adjustment, presence of any endogenous regressors, type and severity of heteroskedasticity, relative prominence of the individual effects and (non)stationarity of the effect impact on any of the explanatory variables. The quality of inference also depends seriously on choices made by the investigator, such as: type and severity of any reductions applied regarding the set of instruments, choice between (robust) 1-step or (corrected) 2-step estimation, employing a modified GMM estimator, the chosen degree of robustness of the adopted weighting matrix, the employed variant of coefficient tests and of (incremental) Sargan-Hansen tests in deciding on the endogeneity of regressors, the validity of instruments and on the (dynamic) specification of the relationship in general.

Our findings regarding the alternative approaches of modifying instruments and exploiting different weighting matrices are as follows for the examined case of cross-sectional heteroskedasticity. Although the unfeasible form of modification does yield very substantial reductions in both bias and variance, for the straight-forward feasible implementation examined here the potential efficiency gains do not materialize. The robust weighting matrix, which also allows for possible time-series heteroskedasticity, performs often as well as (and sometimes even better than) a specially designed less robust version, although the latter occasionally demonstrates some benefits for incremental Sargan-Hansen tests.

Furthermore, we can report to practitioners: (a) when the effect-noise-ratio is large, the performance of all GMM inference deteriorates; (b) the same occurs in the presence of a genuine (or a supervacaneously treated as) endogenous regressor; (c) in many settings the coefficient restrictions tests show serious size problems which usually can be mitigated by a Windmeijer correction, although for γ large or under simultaneity serious overrejection remains unless N is very much larger than 200; (d) the limited effectiveness of the Windmeijer correction is due to the fact that the positive or negative bias in coefficient estimates is often more serious than the negative bias in the variance estimate; (e) limiting to some degree the number of instruments usually reduces bias and therefore improves size properties of coefficient tests, though at the potential cost of power loss because efficiency usually suffers; (f) for the case of an autoregressive strictly exogenous regressor we noted that it is better to not just instrument it by itself, but also by some of its lags because this improves inference, especially regarding the lagged dependent variable coefficient; (g) to mitigate size problems of the overall Sargan-Hansen overidentification tests the set of instruments should be reduced, possibly by collapsing; under conditional heteroskedasticity one should employ the quadratic form in 2-step residuals, possibly in combination with a weighting matrix based on 1-step residuals, although occasionally the 2-step weighting matrix seems preferable; (h) collapsing also reduces size problems of the incremental Sargan-Hansen effect stationarity test; (i) except under simultaneity, the GMM estimator which exploits instruments which are invalid under effect nonstationarity (BB) may nevertheless perform better than the estimator abstaining from these instruments (AB); (j) the rejection probability of the incremental Sargan-Hansen test for effect stationarity is such that it tends to direct the researcher towards applying the most accurate estimator, even if this is inconsistent; (k) The estimate of

{\hat{σ}}_{ε}

is usually pretty accurate, which is certainly not always the case for

{\hat{σ}}_{η},

although quality improves for larger N and

T,

is better for BB than for AB and usually benefits from collapsing.

When re-analyzing a popular empirical data set in the light of the above simulation findings we note in particular that actual dynamic feedbacks may be much more subtle than those that can be captured by just including a lagged dependent variable regressor, which at present seems the most common approach to model dynamics in panels. In theory the omission of further lagged regressor variables should result in rejections by Sargan-Hansen test statistics, but their power suffers when many valid and some invalid orthogonality conditions are tested jointly instead of by deliberately chosen sequences of incremental tests or by direct variable addition tests. Hopefully tests for serial correlation, which we intentionally left out of this already overloaded study, provide an extra help to practitioners in guiding them towards well-specified models. Our results demonstrate that, especially under particular unfavorable settings, there is great urge for developing more refined inference procedures for structural dynamic panel data models.

multiple

Supplementary Materials

The following are available online at www.mdpi.com/2225-1146/5/1/14/s1, the full set of all Monte Carlo results produced for this article.

Acknowledgments

Financial support from the Netherlands Organization for Scientific Research (NWO) grants “Statistical inference methods regarding effectivity of endogenous policy measures” and “Causal inference with panel data” is gratefully acknowledged. Furthermore, all three authors would like to thank the Division of Economics at Nanyang Technological University in Singapore, where substantial parts of this paper have been written. The paper benefited from constructive comments by two reviewers and an academic editor, all anonymous.

Author Contributions

Overall, all authors contributed equally to this project.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Corrected Variance Estimation for 2-Step GMM

Windmeijer [29] provides a correction to the standard expression for the estimated variance of the 2-step GMM estimator in general nonlinear models and next specializes his results for models with linear moment conditions and finally for linear (panel data) models. Here we apply his approach directly to the standard linear model of Section 2.1 where

{\hat{β}}^{(2)}

is based on weighting matrix

Z^{'} {\hat{Ω}}^{(1)} Z,

where

{\hat{Ω}}^{(1)}

depends on

{\hat{u}}^{(1)}

and thus on

{\hat{β}}^{(1)} .

The nonlinear dependence of

{\hat{β}}^{(2)}

on

{\hat{β}}^{(1)}

can be made explicit by a linear approximation obtained by employing the well-known `delta-method’ to the vector function

f (β) = {X^{'} Z {[Z^{'} Ω (β) Z]}^{- 1} Z^{'} X}^{- 1} X^{'} Z {[Z^{'} Ω (β) Z]}^{- 1} Z^{'} u .

Note that

{\hat{β}}^{(2)} = β_{0} + f ({\hat{β}}^{(1)}) .

Expanding the second term around

β_{0}

yields

\begin{matrix} \begin{matrix} {\hat{β}}^{(2)} - β_{0} \approx f (β_{0}) + {\frac{\partial f (β)}{\partial β^{'}}|}_{β = β_{0}} ({\hat{β}}^{(1)} - β_{0}), \end{matrix} \end{matrix}

(A1)

where under sufficient regularity the omitted terms will be of small order. For

k = 1, . . ., K

we find

\begin{matrix} \begin{matrix} \frac{\partial f (β)}{\partial β_{k}} & = \frac{\partial {X^{'} Z {[Z^{'} Ω (β) Z]}^{- 1} Z^{'} X}^{- 1}}{\partial β_{k}} X^{'} Z {[Z^{'} Ω (β) Z]}^{- 1} Z^{'} u \\ + {X^{'} Z {[Z^{'} Ω (β) Z]}^{- 1} Z^{'} X}^{- 1} \frac{\partial X^{'} Z {[Z^{'} Ω (β) Z]}^{- 1} Z^{'} u}{\partial β_{k}}, \end{matrix} \end{matrix}

where

\begin{matrix} \begin{matrix} \frac{\partial {X^{'} Z {[Z^{'} Ω (β) Z]}^{- 1} Z^{'} X}^{- 1}}{\partial β_{k}} & = - {X^{'} Z {[Z^{'} Ω (β) Z]}^{- 1} Z^{'} X}^{- 1} X^{'} Z \frac{\partial {[Z^{'} Ω (β) Z]}^{- 1}}{\partial β_{k}} Z^{'} X \\ \times {X^{'} Z {[Z^{'} Ω (β) Z]}^{- 1} Z^{'} X}^{- 1}, \end{matrix} \end{matrix}

with

\begin{matrix} \frac{\partial {[Z^{'} Ω (β) Z]}^{- 1}}{\partial β_{k}} = - {[Z^{'} Ω (β) Z]}^{- 1} Z^{'} \frac{\partial Ω (β)}{\partial β_{k}} Z {[Z^{'} Ω (β) Z]}^{- 1}, \end{matrix}

and

\begin{matrix} \frac{\partial {X^{'} Z {[Z^{'} Ω (β) Z]}^{- 1} Z^{'} u}}{\partial β_{k}} = X^{'} Z \frac{\partial {[Z^{'} Ω (β) Z]}^{- 1}}{\partial β_{k}} Z^{'} u . \end{matrix}

In the latter we omit an extra term in

\partial u / \partial β_{k} = \partial (y - X β) / \partial β_{k}

simply because we just want to extract the dependence of

{\hat{β}}^{(2)}

on the operational weighting matrix.

Using the short-hand notation

A (β) = Z {[Z^{'} Ω (β) Z]}^{- 1} Z^{'}

and

Ω_{k} (β) = \partial Ω (β) / \partial β_{k}

we can establish from the above that

\begin{matrix} \partial f (β) / \partial β_{k} = - {[X^{'} A (β) X]}^{- 1} X^{'} A (β) Ω_{k} (β) {I_{N} - A (β) X {[X^{'} A (β) X]}^{- 1} X^{'}} A (β) u . \end{matrix}

This is the k-th column of the matrix

F (β) = \partial f (β) / \partial β^{'}

in (A1). The latter can now be expressed as

\begin{matrix} \begin{matrix} {\hat{β}}^{(2)} - β_{0} \approx {{[X^{'} A (β_{0}) X]}^{- 1} X^{'} A (β_{0}) + F (β_{0}) {(X^{'} P_{Z} X)}^{- 1} X^{'} P_{Z}} u . \end{matrix} \end{matrix}

(A2)

Because

F (β_{0}) = O_{p} (N^{- 1 / 2})

the second term is of smaller order.

This approximation to the estimation errors of

{\hat{β}}^{(2)}

can be used to obtain a finite sample corrected variance estimate of

{\hat{β}}^{(2)} .

This is relatively easy if one conditions on some value for

F (β_{0}),

say

\hat{F} .

Windmeijer chooses for the k-th column of

\hat{F}

the vector

\begin{matrix} - {[X^{'} A ({\hat{β}}^{(1)}) X]}^{- 1} X^{'} A ({\hat{β}}^{(1)}) Ω_{k} ({\hat{β}}^{(1)}) {I_{N} - A ({\hat{β}}^{(1)}) X {[X^{'} A ({\hat{β}}^{(1)}) X]}^{- 1} X^{'}} A ({\hat{β}}^{(1)}) {\hat{u}}^{(2)} . \end{matrix}

Taking

{\hat{u}}^{(2)}

instead of the asymptotically equivalent

{\hat{u}}^{(1)}

leads to substantial simplification, because

X^{'} A ({\hat{β}}^{(1)}) {\hat{u}}^{(2)} = X^{'} Z {[Z^{'} Ω (β^{(1)}) Z]}^{- 1} Z^{'} (y - X {\hat{β}}^{(2)}) = 0,

giving

\begin{matrix} \begin{matrix} \hat{F} = ({\hat{F}}_{\cdot 1}, . . ., {\hat{F}}_{\cdot K}), with {\hat{F}}_{\cdot k} = - {[X^{'} A ({\hat{β}}^{(1)}) X]}^{- 1} X^{'} A ({\hat{β}}^{(1)}) Ω_{k} ({\hat{β}}^{(1)}) A ({\hat{β}}^{(1)}) {\hat{u}}^{(2)} . \end{matrix} \end{matrix}

(A3)

Note that when

L = K

we have

\hat{F} = O,

because

Z^{'} {\hat{u}}^{(1)} = Z^{'} {\hat{u}}^{(2)} = 0 .

This all then yields for

L > K

the corrected variance estimator

\begin{matrix} \begin{matrix} \hat{V a r c} ({\hat{β}}^{(2)}) = \hat{V a r} ({\hat{β}}^{(2)}) + \hat{F} \hat{V a r} ({\hat{β}}^{(2)}) + \hat{V a r} ({\hat{β}}^{(2)}) {\hat{F}}^{'} + \hat{F} \hat{V a r} ({\hat{β}}^{(1)}) {\hat{F}}^{'}, \end{matrix} \end{matrix}

(A4)

where

\begin{matrix} \hat{V a r} ({\hat{β}}^{(1)}) = {\hat{σ}}_{u}^{2} {(X^{'} P_{Z} X)}^{- 1} X^{'} P_{Z} \hat{Ω} P_{Z} X {(X^{'} P_{Z} X)}^{- 1} . \end{matrix}

Note that in case

Ω (β) = d i a g (u_{1}^{2}, . . ., u_{N}^{2})

one has

Ω_{k} ({\hat{β}}^{(1)}) = - 2 {\hat{β}}_{k}^{(1)} d i a g ({\hat{u}}_{1}^{(1)} x_{1 k}, . . ., {\hat{u}}_{N}^{(1)} x_{N k}) .

Appendix B. Partialling Out and GMM

The IV/2SLS result on partialling out directly generalizes for the MGMM estimator, provided this uses all the (transformed) predetermined regressors as instruments. In standard GMM the equivalence of predetermined regressors and a block of the instruments gets lost. Using the notation of (10) and considering the partitioned model leading to (8), we easily find its counterpart

\begin{matrix} \begin{matrix} {\hat{β}}_{1, G M M} = {({\hat{X}}_{1}^{† *'} M_{{\hat{X}}_{2}^{† *}} {\hat{X}}_{1}^{† *})}^{- 1} {\hat{X}}_{1}^{† *'} M_{{\hat{X}}_{2}^{† *}} y^{*}, \end{matrix} \end{matrix}

(A5)

where

\begin{matrix} {\hat{X}}^{† *} = ({\hat{X}}_{1}^{† *}, {\hat{X}}_{2}^{† *}) = P_{Z^{†}} (X_{1}^{*}, X_{2}^{*}) = P_{{(Ψ^{'})}^{- 1} Z} (Ψ X_{1}, Ψ X_{2}) . \end{matrix}

In the special case of system (53) with instruments (55) we have

X_{2} = Z_{2} = {(0^{'}, ι_{T}^{'})}^{'}

and

Z_{1}^{'} Z_{2} = 0,

whereas under cross-sectional heteroskedasticity, due to

D ι_{T} = 0,

the optimal weighting matrix is block-diagonal, hence

Z_{1}^{'} Ω Z_{2} = 0 .

Therefore

Z_{1}^{†'} Z_{2}^{†} = 0

too, giving

P_{Z^{†}} = P_{Z_{1}^{†}} + P_{Z_{2}^{†}} .

Now we find

{\hat{X}}_{1}^{† *} = P_{Z^{†}} X_{1}^{*} = (P_{Z_{1}^{†}} + P_{Z_{2}^{†}}) X_{1}^{*}

and

{\hat{X}}_{2}^{† *} = (P_{Z_{1}^{†}} + P_{Z_{2}^{†}}) Ψ Z_{2} = P_{Z_{2}^{†}} Ψ Z_{2} = {(Ψ^{'})}^{- 1} Z_{2} {(Z_{2}^{'} Ω Z_{2})}^{- 1} Z_{2}^{'} Z_{2} = c Z_{2}^{†},

with c some scalar, because

Z_{2}

has just one column. Therefore,

M_{{\hat{X}}_{2}^{† *}} {\hat{X}}_{1}^{† *} = M_{Z_{2}^{†}} (P_{Z_{1}^{†}} + P_{Z_{2}^{†}}) X_{1}^{*} = P_{Z_{1}^{†}} X_{1}^{*} .

Thus, in this particular case (when using an appropriate weighting matrix), we find

\begin{matrix} {\hat{β}}_{1, G M M} = {({\hat{X}}_{1}^{*'} M_{{\hat{X}}_{2}^{*}} {\hat{X}}_{1}^{*})}^{- 1} {\hat{X}}_{1}^{*'} M_{{\hat{X}}_{2}^{*}} y^{*} = {({\hat{X}}_{1}^{*'} P_{Z_{1}^{†}} X_{1}^{*})}^{- 1} {\hat{X}}_{1}^{*'} P_{Z_{1}^{†}} y . \end{matrix}

Due to the block of zeros in

Z_{1}

this is just the GMM estimator of the model in first differences.

Appendix C. Extracting Redundant Moment Conditions

Through linear transformation15 we demonstrate that the sets of moment conditions for the equation in levels and for the equation in first-differences have a non empty intersection. First we consider the moment conditions associated with the strictly exogenous regressors. For the equation in first differences these are

E (x_{i}^{T} Δ ε_{i t}) = E [Δ ε_{i t} {(x_{i 1}^{'} . . . x_{i T}^{'})}^{'}] = 0,

for

t = 2, . . ., T .

They can also be represented16 by the combination

E [Δ ε_{i t} {(Δ x_{i 2}^{'} . . . Δ x_{i T}^{'})}^{'}] = 0

and

E (x_{i t} Δ ε_{i t}) = 0 .

However, by a similar transformation17 (here of the disturbances instead of the instruments), the conditions for the equation in levels

E [Δ x_{i t h}^{⧫} (η_{i} ι_{T} + ε_{i})] = 0,

where

h = 1, . . ., K_{x}^{⧫}

(and again

t = 2, . . ., T

), can be represented by

E (Δ x_{i t h}^{⧫} {\tilde{ε}}_{i}) = 0

and

E [Δ x_{i t h}^{⧫} (η_{i} + ε_{i t})] = 0 .

So, just the

K_{x}^{⧫} (T - 1)

orthogonality conditions

E [Δ x_{i t}^{⧫} (η_{i} + ε_{i t})] = 0

for

t = 2, . . ., T

are additional due to effect stationarity of

K_{x}^{⧫}

of the strictly exogenous regressors.

Similarly, the orthogonality conditions

E (w_{i}^{t - 1} Δ ε_{i t}) = 0,

or

E (w_{i s} Δ ε_{i t}) = 0

for

s = 1, . . ., t - 1

with

t = 2, . . ., T,

can be represented by

E (w_{i, t - 1} Δ ε_{i t}) = 0

for

t = 2, . . ., T

and

E (Δ w_{i s} Δ ε_{i t}) = 0

for

t = 3, . . ., T

and

s = 2, . . ., t - 1 .

On the other hand, the conditions

E [Δ w_{i t}^{⧫} (η_{i} + ε_{i, t + l})] = 0

for

t > 1

and

l > 0

are actually

E [Δ w_{i s}^{⧫} (η_{i} + ε_{i t})] = 0

for

t = 2, . . ., T

and

s = 2, . . ., t,

whereas these can be represented by

E (Δ w_{i s}^{⧫} Δ ε_{i t}) = 0

for

t = 3, . . ., T

and

s = 2, . . ., t - 1

and

E [Δ w_{i t}^{⧫} (η_{i} + ε_{i t})] = 0

for

t = 2, . . ., T .

Thus, only the

K_{w}^{⧫} (T - 1)

conditions

E [Δ w_{i t}^{⧫} (η_{i} + ε_{i t})] = 0

for

t = 2, . . ., T

are additional.

Using the same logic, the orthogonality conditions

E (v_{i}^{t - 2} Δ ε_{i t}) = 0

for

t = 3, . . ., T,

which are actually

E (v_{i s} Δ ε_{i t}) = 0

for

t = 3, . . ., T

and

s = 1, . . ., t - 2,

can also be represented by

E (v_{i, t - 2} Δ ε_{i t}) = 0

for

t = 3, . . ., T

and

E (Δ v_{i s} Δ ε_{i t}) = 0

for

t = 4, . . ., T

and

s = 2, . . ., t - 2 .

However, the conditions

E [Δ v_{i t}^{⧫} (η_{i} + ε_{i, t + 1 + l})] = 0

for

t > 1

and

l > 0

are in fact

E [Δ v_{i s}^{⧫} (η_{i} + ε_{i t})] = 0

for

t = 3, . . ., T

and

s = 2, . . ., t - 1,

which can also be represented as

E (Δ v_{i s}^{⧫} Δ ε_{i t}) = 0

for

t = 4, . . ., T

and

s = 2, . . ., t - 2

and

E [Δ v_{i, t - 2}^{⧫} (η_{i} + ε_{i t})] = 0

for

t = 3, . . ., T .

Thus, we find that only the

K_{v}^{⧫} (T - 2)

conditions

E [Δ v_{i, t - 2}^{⧫} (η_{i} + ε_{i t})] = 0

for

t = 3, . . ., T

are additional.

Appendix D. Derivations for (115)

The results for

V_{η}

and

V_{λ}

are obvious. Those for

V_{ζ} (i)

and

V_{ε} (i)

are obtained as follows. We use the standard result that the variance of a general stationary ARMA(2,1) process

\begin{matrix} \begin{matrix} z_{t} = \frac{ψ (1 - ϕ L)}{(1 - γ L) (1 - ξ L)} u_{t}, \end{matrix} \end{matrix}

(A6)

where

u_{t} \sim I I D (0, 1),

is given by

\begin{matrix} \begin{matrix} V a r (z_{t}) = ψ^{2} \frac{(1 + γ ξ) (1 + ϕ^{2}) - 2 ϕ (γ + ξ)}{(1 - γ ξ) (1 - γ^{2}) (1 - ξ^{2})} . \end{matrix} \end{matrix}

(A7)

Because we can rewrite (using

σ_{ε} = 1

)

\begin{matrix} [β ρ_{v ε} σ_{v} + (1 - ξ L)] ω_{i}^{1 / 2} = (1 + β ρ_{v ε} σ_{v}) ω_{i}^{1 / 2} (1 - \frac{ξ}{1 + β ρ_{v ε} σ_{v}} L), \end{matrix}

the result for

V_{ε} (i)

follows upon substituting

ψ = (1 + β ρ_{v ε} σ_{v}) ω_{i}^{1 / 2}

and

ϕ = ξ / (1 + β ρ_{v ε} σ_{v}) .

For

V_{ζ} (i)

simply take

ϕ = 0

and

ψ = β σ_{v} {(1 - ρ_{v ε}^{2})}^{1 / 2} ω^{1 / 2} .

^1.Many authors and the Stata xtabond2 package (StataCorp LLC) confusingly address the corrected 2-step variance as robust.
^2.This is the only variant considered in Windmeijer [29].
^3.Package xtabond2 (StataCorp LLC) for Stata always reports $J A B^{(1, 0)}$ after Arellano-Bond estimation, which is inappropriate unless there is conditional homoskedasticity. After requesting for robust standard errors in 1-step estimation it presents also $J A B_{a}^{(2, 1)} .$ Requesting 2-step estimation also presents both $J A B^{(1, 0)}$ and $J A B_{a}^{(2, 1)}$ . Blundell-Bond estimation yields $J B B^{(1, 0)}$ and $J B B^{(2, 1)},$ although a version of $J B B^{(1, 0)}$ is reported that does not use weighting matrix $S^{(0)} ({\hat{σ}}_{η}^{2, s (1)} / {\hat{σ}}_{ε}^{2, s (1)}),$ but $S^{(0)} (0),$ which is only valid under homoskedasticity and $σ_{η}^{2} = 0 .$ Package xtabond2 (StataCorp LLC) addresses overidentification tests after 1-step estimation always as “Sargan test” and after 2-step estimation as “Hansen test”.
^4.By specifying instruments in separate groups, xtabond2 (StataCorp LLC) presents for each separate group the corresponding incremental J test. However, not the version as defined in (87), but an asymptotically equivalent one as suggested in Hayashi [34] (p. 220) which will never be negative.
^5.Proved in Arellano and Bover [3].
^6.If we would strictly follow the notation of the earlier sections the coefficient β should actually be called δ when $ρ_{v ε} \neq 0 .$
^7.Note that ${(1 - ξ L)}^{- 1} = 1 + ξ L + ξ^{2} L^{2} + . . .$ and therefore ${(1 - ξ L)}^{- 1} η_{i}^{\circ} = \sum_{j = 0}^{\infty} ξ^{j} η_{i}^{\circ} = η_{i}^{\circ} / (1 - ξ)$ .
^8.Such control is not exercised in the simulation designs of Blundell et al. [26] and Bun and Sarafidis [27]. They do consider simultaneity, but its magnitude has not been mentioned and it is not kept constant over different designs.
^9.The b variant differs from a only for $T > 3$ and may then be not positive definite. For $T = 6, 9$ it proved to be so bad for both AB and BB that we discarded it completely from the presented tables.
^10.Below various results will be discussed in the text without referring to a table presented in the article, simply because we did not find it worthwhile to include it, respecting reasonable space limitations. However, the full set of all Monte Carlo results produced for this article is available as supplementary material at: https://www.mdpi.com/2225-1146/5/1/14/s1.
^11.See findings in Kiviet [40] and in many of its references.
^12.Baltagi et al. [41] study a similar life-cycle labor-supply model for physicians in Norway. They consider a dynamic model, and this rejects the static specification used by Ziliak [35].
^13.This static model is also used extensively for illustrative purposes in Cameron and Trivedi [42].
^14.If $a g e_{i, t} = a g e_{i, t - 1} + 1$ then $a g e_{i t}^{2} = a g e_{i, t - 1}^{2} + 2 a g e_{i, t - 1} + 1 .$ Thus, including lags of $a g e_{i t}$ and of $a g e_{i t}^{2}$ in addition to their current values, either as regressors or as instruments, in combination with an intercept or time-dummies, leads to rank reduction. Although in this particular data set $a g e_{i, t} = a g e_{i, t - 1} + 1$ does not hold $\forall i, t,$ we abstained from including lags of age and its square.
^15.In this Appendix we repeatedly use the result that the p conditions $E (a C b) = 0,$ where a is a random scalar, b a $p \times 1$ random vector and C a deterministic nonsingular $p \times p$ matrix, are equivalent with the p conditions $E (a b) = 0,$ because $E (a b) = 0 \Leftrightarrow C E (a b) = 0 \Leftrightarrow E (a C b) = 0 .$
^16.Here $C = (D^{'}$ $e_{T, t})^{'} \otimes I_{K_{x}} .$
^17.Now $C = (D^{'}$ $e_{T, t})^{'} .$

References

M. Arellano, and S. Bond. “Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations.” Rev. Econ. Stud. 58 (1991): 277–297. [Google Scholar] [CrossRef]
R. Blundell, and S. Bond. “Initial Conditions and Moment Restrictions in Dynamic Panel Data Models.” J. Econom. 87 (1998): 115–143. [Google Scholar] [CrossRef]
M. Arellano, and O. Bover. “Another Look at the Instrumental Variable Estimation of Error-Components Models.” J. Econom. 68 (1995): 29–51. [Google Scholar] [CrossRef]
J. Hahn, and G. Kuersteiner. “Asymptotically Unbiased Inference for a Dynamic Panel Model with Fixed Effects When Both n and T Are Large.” Econometrica 70 (2002): 1639–1657. [Google Scholar] [CrossRef]
J. Alvarez, and M. Arellano. “The Time Series and Cross-Section Asymptotics of Dynamic Panel Data Estimators.” Econometrica 71 (2003): 1121–1159. [Google Scholar] [CrossRef]
J. Hahn, J. Hausman, and G. Kuersteiner. “Long Difference Instrumental Variables Estimation for Dynamic Panel Models with Fixed Effects.” J. Econom. 140 (2007): 574–617. [Google Scholar] [CrossRef]
J.F. Kiviet. “Judging Contending Estimators by Simulation: Tournaments in Dynamic Panel Data Models.” In The Refinement of Econometric Estimation and Test Procedures; Finite Sample and Asymptotic Analysis. Edited by G.D.A. Phillips and E. Tzavalis. Cambridge, UK: Cambridge University Press, 2007, pp. 282–318. [Google Scholar]
H. Kruiniger. “Maximum Likelihood Estimation and Inference Methods for the Covariance Stationary Panel AR(1)/Unit Root Model.” J. Econom. 144 (2008): 447–464. [Google Scholar] [CrossRef]
R. Okui. “The Optimal Choice of Moments in Dynamic Panel Data Models.” J. Econom. 151 (2009): 1–16. [Google Scholar] [CrossRef]
D. Roodman. “A Note on the Theme of too Many Instruments.” Oxf. Bull. Econ. Stat. 71 (2009): 135–158. [Google Scholar] [CrossRef]
K. Hayakawa. “On the Effect of Mean-Nonstationarity in Dynamic Panel Data Models.” J. Econom. 153 (2009): 133–135. [Google Scholar] [CrossRef]
C. Han, and P.C.B. Phillips. “First Difference Maximum Likelihood and Dynamic Panel Estimation.” J. Econom. 175 (2013): 35–45. [Google Scholar] [CrossRef]
J.F. Kiviet. “On Bias, Inconsistency, and Efficiency of Various Estimators in Dynamic Panel Data Models.” J. Econom. 68 (1995): 53–78. [Google Scholar] [CrossRef]
C.G. Bowsher. “On Testing Overidentifying Restrictions in Dynamic Panel Data Models.” Econ. Lett. 77 (2002): 211–220. [Google Scholar] [CrossRef]
C. Hsiao, M.H. Pesaran, and A.K. Tahmiscioglu. “Maximum Likelihood Estimation of Fixed Effects Dynamic Panel Data Models Covering Short Time Periods.” J. Econom. 109 (2002): 107–150. [Google Scholar] [CrossRef]
S. Bond, and F. Windmeijer. “Reliable Inference for GMM Estimators? Finite Sample Properties of Alternative Test Procedures in Linear Panel Data Models.” Econom. Rev. 24 (2005): 1–37. [Google Scholar] [CrossRef]
M.J.G. Bun, and M.A. Carree. “Bias-Corrected Estimation in Dynamic Panel Data Models.” J. Bus. Econ. Stat. 23 (2005): 200–210. [Google Scholar] [CrossRef]
M.J.G. Bun, and M.A. Carree. “Correction: Bias-Corrected Estimation in Dynamic Panel Data Models.” J. Bus. Econ. Stat. 23 (2005): 200–210. [Google Scholar] [CrossRef]
M.J.G. Bun, and J.F. Kiviet. “The Effects of Dynamic Feedbacks on LS and MM Estimator Accuracy in Panel Data Models.” J. Econom. 132 (2006): 409–444. [Google Scholar] [CrossRef]
C. Gouriéroux, P.C.B. Phillips, and J. Yu. “Indirect Inference for Dynamic Panel Models.” J. Econom. 157 (2010): 68–77. [Google Scholar] [CrossRef]
K. Hayakawa. “The Effects of Dynamic Feedbacks on LS and MM Estimator Accuracy in Panel Data Models: Some Additional Results.” J. Econom. 159 (2010): 202–208. [Google Scholar] [CrossRef]
G. Dhaene, and K. Jochmans. “Likelihood Inference in an Autoregression with Fixed Effects.” Econom. Theory 32 (2016): 1178–1215. [Google Scholar] [CrossRef]
M.J. Flannery, and K.W. Hankins. “Estimating Dynamic Panel Models in Corporate Finance.” J. Corp. Financ. 19 (2013): 1–19. [Google Scholar] [CrossRef]
G. Everaert. “Orthogonal to Backward Mean Transformation for Dynamic Panel Data Models.” Econom. J. 16 (2013): 179–221. [Google Scholar] [CrossRef]
S. Kripfganz, and C. Schwarz. Estimation of Linear Dynamic Panel Data Models with Time-Invariant Regressors. ECB Working Paper 1838; Frankfurt am Main, Germany: European Central Bank, 2015. [Google Scholar]
R. Blundell, S. Bond, and F. Windmeijer. “Estimation in Dynamic Panel Data Models: Improving on the Performance of the Standard GMM Estimator.” Adv. Econom. 1 (2001): 53–91. [Google Scholar]
M.J.G. Bun, and V. Sarafidis. “Dynamic Panel Data Models.” In The Oxford Handbook of Panel Data. Edited by B.H. Baltagi. Oxford, UK: Oxford University Press, 2015, pp. 76–110. [Google Scholar]
M.N. Harris, W. Kostenko, L. Mátyás, and I. Timol. “The Robustness of Estimators for Dynamic Panel Data Models to Misspecification.” Singap. Econ. Rev. 54 (2009): 399–426. [Google Scholar] [CrossRef]
F. Windmeijer. “A Finite Sample Correction for the Variance of Linear Efficient Two-Step GMM Estimators.” J. Econom. 126 (2005): 25–51. [Google Scholar] [CrossRef]
M.J.G. Bun, and M.A. Carree. “Bias-Corrected Estimation in Dynamic Panel Data Models with Heteroscedasticity.” Econ. Lett. 92 (2006): 220–227. [Google Scholar] [CrossRef]
A. Juodis. “A Note on Bias-Corrected Estimation in Dynamic Panel Data Models.” Econ. Lett. 118 (2013): 435–438. [Google Scholar] [CrossRef]
E. Moral-Benito. “Likelihood-Based Estimation of Dynamic Panels with Predetermined Regressors.” J. Bus. Econ. Stat. 31 (2013): 451–472. [Google Scholar] [CrossRef]
J.F. Kiviet, and Q. Feng. Efficiency Gains by Modifying GMM Estimation in Linear Models under Heteroskedasticity. UvA-Econometrics Discussion Paper 2014/06; Amsterdam, The Netherlands: University of Amsterdam, Amsterdam School of Economics, 2016. [Google Scholar]
F. Hayashi. Econometrics. Princeton, NJ, USA: Princeton University Press, 2000. [Google Scholar]
J.P. Ziliak. “Efficient Estimation with Panel Data When Instruments are Predetermined: An Empirical Comparison of Moment-Condition Estimators.” J. Bus. Econ. Stat. 15 (1997): 419–431. [Google Scholar] [CrossRef]
M. Arellano. Panel Data Econometrics. Oxford, UK: Oxford University Press, 2003. [Google Scholar]
D. Holtz-Eakin, W. Newey, and H.S. Rosen. “Estimating Vector Autoregressions with Panel Data.” Econometrica 56 (1988): 1371–1395. [Google Scholar] [CrossRef]
T.W. Anderson, and C. Hsiao. “Estimation of Dynamic Models with Error Components.” J. Am. Stat. Assoc. 76 (1981): 598–606. [Google Scholar] [CrossRef]
J.F. Kiviet. “Monte Carlo Simulation for Econometricians.” Found. Trends Econom. 5 (2011): 1–181. [Google Scholar] [CrossRef]
J.F. Kiviet. “Identification and Inference in a Simultaneous Equation under Alternative Information Sets and Sampling Schemes.” Econom. J. 16 (2013): S24–S59. [Google Scholar] [CrossRef]
B.H. Baltagi, E. Bratberg, and T.H. Holmås. “A Panel Data Study of Physicians’ Labor Supply: The Case of Norway.” Health Econ. 14 (2005): 1035–1045. [Google Scholar] [CrossRef] [PubMed]
A.C. Cameron, and P.K. Trivedi. Microeconometrics: Methods and Applications. Cambridge, UK: Cambridge University Press, 2005. [Google Scholar]
D. Roodman. “How to do xtabond2: An Introduction to Difference and System GMM in Stata.” Stata J. 9 (2009): 86–136. [Google Scholar] [CrossRef]

Table 1. Definition of labels for particular instrument matrix reductions.

**Table 1.** Definition of labels for particular instrument matrix reductions.
A $^{x} : Z_{i}^{x}$	A $^{w} : Z_{i}^{w}$	A $^{v} : Z_{i}^{v}$
L0 $^{x} : d i a g (x_{i 2}^{'}, . . ., x_{i T}^{'})$	L0 $^{w} : d i a g (w_{i 1}^{'}, . . ., w_{i, T - 1}^{'})$	L0 $^{v} : {[0, d i a g (v_{i 1}, . . ., v_{i, T - 2})]}^{'}$
L1 $^{x} : d i a g (Δ x_{i 2}^{'}, . . ., Δ x_{i T}^{'})$	L1 $^{w} : d i a g (w_{i 1}^{'}, Δ w_{i 2}^{'}, . . ., Δ w_{i, T - 1}^{'})$	L1 $^{v} : {[0, d i a g (v_{i 1}, Δ v_{i 2}, . . ., Δ v_{i, T - 2})]}^{'}$
L2 $^{x} :$ $d i a g (x_{i 1}^{'}, . . ., x_{i, T - 1}^{'}),$ L0 $^{x}$	L2 $^{w} :$ ${[0, d i a g (w_{i 1}, . . ., w_{i, T - 2})]}^{'},$ L0 $^{w}$	L2 $^{v} :$ ${[0, 0, d i a g (v_{i 1}, . . ., v_{i, T - 3})]}^{'},$ L0 $^{v}$
L3 $^{x} :$ ${[0, d i a g (x_{i 1}, . . ., x_{i, T - 2})]}^{'},$ L2 $^{x}$	L3 $^{w} :$ ${[0, 0, d i a g (w_{i 1}, . . ., w_{i, T - 3})]}^{'},$ L2 $^{w}$	L3 $^{v} :$ ${[0, 0, 0, d i a g (v_{i 1}, . . ., v_{i, T - 4})]}^{'},$ L2 $^{v}$
C $^{x} : Z_{i}^{* x}$	C $^{w} : Z_{i}^{* w}$	C $^{v} : Z_{i}^{* v}$
C0 $^{x} : {(x_{i 2}, . . ., x_{i T})}^{'}$	C0 $^{w} : {(w_{i 1}, . . ., w_{i, T - 1})}^{'}$	C0 $^{v} : {(0, v_{i 1}, . . ., v_{i, T - 2})}^{'}$
C1 $^{x} : {(Δ x_{i 2}, . . ., Δ x_{i T})}^{'}$	C1 $^{w} : {(0, Δ w_{i 2}, . . ., Δ w_{i, T - 1})}^{'}$	C1 $^{v} : {(0, 0, Δ v_{i 2}, . . ., Δ v_{i, T - 2})}^{'}$
C2 $^{x} :$ C0 $^{x}, {(x_{i 1}, . . ., x_{i, T - 1})}^{'}$	C2 $^{w} :$ C0 $^{w}$ , ${(0, w_{i 1}, . . ., w_{i, T - 2})}^{'}$	C2 $^{v} :$ C0 $^{v}$ , ${(0, 0, v_{i 1}, . . ., v_{i, T - 3})}^{'}$
C3 $^{x} :$ C2 $^{x}$ , ${(0, x_{i 1}, . . ., x_{i, T - 2})}^{'}$	C3 $^{w} :$ C2 $^{w}$ , ${(0, 0, w_{i 1}, . . ., w_{i, T - 3})}^{'}$	C3 $^{v} :$ C2 $^{v}$ , ${(0, 0, 0, v_{i 1}, . . ., v_{i, T - 4})}^{'}$

Table 2. Heteroskedasticity quantiles and moments of

ω_{i}^{1 / 2}

for different values of θ.

**Table 2.** Heteroskedasticity quantiles and moments of $ω_{i}^{1 / 2}$ for different values of θ.
$\|θ\|$		$ω_{i}$			$ω_{i}^{1 / 2}$
	$q_{0.01}$	$q_{0.5}$	$q_{0.99}$	$q_{0.01}$	$q_{0.5}$	$q_{0.99}$	$E (ω_{i}^{1 / 2})$	$St . Dev . (ω_{i}^{1 / 2})$
0.1	0.789	0.995	1.256	0.888	0.998	1.121	0.999	0.050
0.3	0.476	0.956	1.921	0.690	0.978	1.386	0.989	0.149
0.5	0.276	0.883	2.824	0.525	0.939	1.681	0.969	0.246
0.7	0.154	0.783	3.989	0.391	0.885	1.997	0.941	0.340
1.0	0.059	0.607	6.211	0.243	0.779	2.492	0.882	0.470
1.3	0.021	0.430	8.840	0.145	0.655	2.973	0.810	0.587
1.6	0.007	0.278	11.498	0.082	0.527	3.391	0.726	0.688
2.0	0.001	0.135	14.192	0.036	0.368	3.767	0.607	0.795

Table 3. P0u-XA *.

**Table 3.** P0u-XA *.
Unfeasible Coefficient Estimators
					$θ = 0$						$θ = 1$
${\bar{ρ}}_{x ε} = 0.0$	$L$				ABu			BBu			ABu				BBu				MABu				MBBu
	AB	BB		$γ$	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE		Bias	Stdv	RMSE		Bias	Stdv	RMSE		Bias	Stdv	RMSE
$T = 3$	11	16		0.20	−0.012	0.058	0.060	−0.001	0.049	0.049	−0.022	0.080	0.083		−0.003	0.067	0.067		−0.015	0.066	0.068		−0.002	0.055	0.055
				0.50	−0.022	0.076	0.080	−0.003	0.055	0.055	−0.042	0.105	0.113		−0.007	0.075	0.075		−0.029	0.087	0.091		−0.003	0.061	0.061
				0.80	−0.077	0.134	0.155	−0.009	0.067	0.068	−0.144	0.182	0.232		−0.018	0.096	0.097		−0.096	0.150	0.178		−0.006	0.071	0.071
$T = 6$	50	61		0.20	−0.009	0.029	0.030	0.000	0.026	0.026	−0.017	0.040	0.044		0.001	0.036	0.036		−0.010	0.030	0.032		−0.002	0.029	0.029
				0.50	−0.017	0.034	0.038	0.000	0.028	0.028	−0.030	0.046	0.055		−0.000	0.038	0.038		−0.020	0.037	0.041		−0.000	0.031	0.031
				0.80	−0.054	0.052	0.075	−0.002	0.032	0.032	−0.094	0.070	0.117		−0.005	0.043	0.043		−0.065	0.057	0.087		0.001	0.034	0.034
$T = 9$	116	133		0.20	−0.008	0.021	0.023	0.001	0.020	0.020	−0.015	0.029	0.032		0.001	0.027	0.027		−0.009	0.022	0.024		−0.002	0.021	0.021
				0.50	−0.014	0.024	0.027	0.001	0.020	0.020	−0.024	0.031	0.040		0.002	0.027	0.027		−0.016	0.025	0.029		−0.001	0.022	0.022
				0.80	−0.041	0.033	0.053	−0.000	0.022	0.022	−0.069	0.043	0.081		−0.001	0.028	0.028		−0.049	0.036	0.061		0.003	0.023	0.023
	AB	BB		$β$	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE		Bias	Stdv	RMSE		Bias	Stdv	RMSE		Bias	Stdv	RMSE
$T = 3$	11	16		1.43	0.003	0.100	0.100	0.001	0.096	0.096	0.004	0.148	0.148		0.005	0.136	0.136		0.004	0.100	0.100		0.001	0.097	0.097
				0.93	0.002	0.099	0.099	0.003	0.092	0.092	0.002	0.146	0.146		0.009	0.131	0.131		0.002	0.099	0.099		0.002	0.094	0.094
				0.31	−0.002	0.097	0.097	0.006	0.092	0.093	−0.005	0.142	0.142		0.012	0.132	0.133		−0.003	0.097	0.097		0.004	0.093	0.093
$T = 6$	50	61		1.43	0.006	0.054	0.055	−0.000	0.053	0.053	0.011	0.078	0.078		−0.000	0.074	0.074		0.007	0.055	0.055		0.001	0.054	0.054
				0.93	0.007	0.053	0.053	−0.000	0.051	0.051	0.012	0.075	0.076		0.001	0.070	0.070		0.008	0.053	0.054		0.000	0.052	0.052
				0.31	0.004	0.051	0.051	0.002	0.048	0.048	0.006	0.073	0.073		0.004	0.066	0.066		0.005	0.051	0.051		0.000	0.048	0.048
$T = 9$	116	133		1.43	0.007	0.040	0.041	−0.001	0.039	0.039	0.012	0.056	0.057		−0.001	0.054	0.054		0.008	0.040	0.041		0.001	0.040	0.040
				0.93	0.009	0.039	0.040	−0.001	0.037	0.037	0.014	0.054	0.056		−0.001	0.051	0.051		0.010	0.039	0.040		0.001	0.038	0.038
				0.31	0.006	0.037	0.037	0.001	0.034	0.034	0.010	0.051	0.052		0.002	0.047	0.047		0.008	0.037	0.037		−0.000	0.035	0.035
Unfeasible t-Test: Actual Significance Level
${\bar{ρ}}_{x ε} = 0.0$	$L$			$θ = 0$							$θ = 1$
${\bar{ρ}}_{x ε} = 0.0$	AB	BB		$γ$	ABu	BBu		$β$	ABu	BBu	$γ$	ABu	BBu	MABu	MBBu			$β$	ABu	BBu	MABu	MBBu
$T = 3$	11	16		0.20	0.058	0.051		1.43	0.048	0.050	0.20	0.060	0.051	0.062	0.049			1.43	0.046	0.049	0.048	0.049
				0.50	0.061	0.053		0.93	0.047	0.050	0.50	0.066	0.053	0.067	0.057			0.93	0.045	0.048	0.047	0.048
				0.80	0.089	0.056		0.31	0.042	0.049	0.80	0.123	0.061	0.099	0.058			0.31	0.037	0.047	0.039	0.049
$T = 6$	50	61		0.20	0.059	0.041		1.43	0.050	0.048	0.20	0.071	0.048	0.061	0.044			1.43	0.052	0.050	0.049	0.049
				0.50	0.074	0.044		0.93	0.052	0.047	0.50	0.099	0.053	0.079	0.044			0.93	0.051	0.050	0.051	0.049
				0.80	0.172	0.052		0.31	0.050	0.049	0.80	0.267	0.058	0.197	0.054			0.31	0.047	0.048	0.050	0.049
$T = 9$	116	133		0.20	0.071	0.043		1.43	0.049	0.047	0.20	0.082	0.048	0.072	0.053			1.43	0.055	0.048	0.049	0.048
				0.50	0.095	0.047		0.93	0.053	0.048	0.50	0.127	0.049	0.101	0.048			0.93	0.058	0.047	0.053	0.049
				0.80	0.246	0.053		0.31	0.055	0.050	0.80	0.377	0.053	0.281	0.055			0.31	0.054	0.050	0.057	0.050
Unfeasible Sargan-Hansen Test: Rejection Probability
${\bar{ρ}}_{x ε} = 0.0$	$df$				$θ = 0$						$θ = 1$
${\bar{ρ}}_{x ε} = 0.0$	AB	BB	Inc	$γ$	$JABu$	$JBBu$	$JESu$	$JMABu$	$JMMBu$	$JESMu$	$JABu$	$JBBu$	$JESu$		$JMABu$	$JMMBu$	$JESMu$
$T = 3$	9	13	4	0.20	0.048	0.047	0.048	0.048	0.047	0.048	0.049	0.047	0.045		0.050	0.050	0.049
				0.50	0.049	0.049	0.048	0.049	0.049	0.048	0.047	0.048	0.050		0.049	0.051	0.051
				0.80	0.039	0.051	0.063	0.039	0.051	0.063	0.033	0.048	0.075		0.038	0.050	0.067
$T = 6$	48	58	10	0.20	0.045	0.048	0.048	0.045	0.048	0.048	0.048	0.048	0.050		0.045	0.048	0.050
				0.50	0.043	0.045	0.049	0.043	0.045	0.049	0.043	0.048	0.057		0.042	0.047	0.052
				0.80	0.036	0.043	0.075	0.036	0.043	0.075	0.030	0.047	0.103		0.035	0.043	0.077
$T = 9$	114	130	16	0.20	0.048	0.053	0.048	0.048	0.053	0.048	0.049	0.054	0.053		0.048	0.052	0.051
				0.50	0.046	0.051	0.053	0.046	0.051	0.053	0.048	0.052	0.059		0.045	0.051	0.055
				0.80	0.036	0.049	0.086	0.036	0.049	0.086	0.034	0.050	0.118		0.037	0.049	0.095

*

R = 10, 000

simulation replications. Design parameter values:

N = 200

,

S N R = 3

,

D E N_{y} = 1.0

,

E V F_{x} = 0.0

,

{\bar{ρ}}_{x ε} = 0.0

,

ξ = 0.8

,

κ = 0.00

,

σ_{ε} = 1

,

q = 1

,

ϕ = 1.0

. These yield the DGP parameter values:

π_{λ} = 0.00

,

π_{η} = 0.00

,

σ_{v} = 0.60

,

σ_{η} = 1.0 * (1 - γ)

,

ρ_{v ε} = 0.0

(and

{\bar{ρ}}_{x η} = 0.00

,

{\bar{ρ}}_{x λ} = 0.00

).

Table 4. P0fc-XA *.

**Table 4.** P0fc-XA *.
Feasible Coefficient Estimators for Arellano-Bond
			$θ = 0$									$θ = 1$
${\bar{ρ}}_{x ε} = 0.0$			AB1			AB2a			AB2c			AB1			AB2a			AB2c			MAB
	$L$	$γ$	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE
$T = 3$	11	0.20	−0.012	0.058	0.060	−0.011	0.061	0.062	−0.012	0.059	0.061	−0.023	0.086	0.089	−0.019	0.081	0.083	−0.022	0.081	0.084	−0.022	0.083	0.086
		0.50	−0.022	0.076	0.080	−0.022	0.079	0.082	−0.022	0.078	0.081	−0.044	0.112	0.121	−0.036	0.106	0.112	−0.040	0.106	0.114	−0.042	0.109	0.117
		0.80	−0.077	0.134	0.155	−0.077	0.141	0.161	−0.075	0.137	0.156	−0.146	0.194	0.243	−0.132	0.188	0.230	−0.139	0.184	0.231	−0.143	0.190	0.238
$T = 6$	50	0.20	−0.009	0.029	0.030	−0.009	0.032	0.033	−0.009	0.029	0.031	−0.019	0.045	0.049	−0.016	0.041	0.043	−0.017	0.040	0.044	−0.014	0.036	0.039
		0.50	−0.017	0.034	0.038	−0.017	0.038	0.041	−0.017	0.034	0.038	−0.035	0.052	0.063	−0.028	0.047	0.055	−0.030	0.047	0.055	−0.026	0.043	0.050
		0.80	−0.054	0.052	0.075	−0.055	0.059	0.081	−0.053	0.053	0.075	−0.105	0.078	0.131	−0.091	0.074	0.118	−0.094	0.071	0.117	−0.087	0.068	0.110
$T = 9$	116	0.20	−0.008	0.021	0.023	−0.008	0.024	0.025	−0.008	0.021	0.023	−0.017	0.033	0.037	−0.015	0.031	0.035	−0.014	0.029	0.032	−0.011	0.024	0.027
		0.50	−0.014	0.024	0.027	−0.014	0.026	0.030	−0.014	0.024	0.028	−0.028	0.036	0.046	−0.026	0.034	0.043	−0.024	0.032	0.040	−0.019	0.028	0.034
		0.80	−0.041	0.033	0.053	−0.042	0.037	0.056	−0.041	0.033	0.053	−0.078	0.050	0.093	−0.074	0.048	0.088	−0.070	0.043	0.082	−0.061	0.040	0.073
	$L$	$β$	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE
$T = 3$	11	1.43	0.003	0.100	0.100	0.004	0.103	0.103	0.003	0.101	0.101	0.006	0.158	0.158	0.005	0.145	0.145	0.005	0.147	0.147	0.005	0.148	0.148
		0.93	0.002	0.099	0.099	0.003	0.103	0.103	0.002	0.100	0.100	0.004	0.157	0.157	0.003	0.144	0.144	0.002	0.146	0.146	0.003	0.147	0.147
		0.31	−0.002	0.097	0.097	−0.002	0.101	0.101	−0.002	0.099	0.099	−0.004	0.152	0.152	−0.005	0.141	0.141	−0.005	0.143	0.143	−0.004	0.142	0.142
$T = 6$	50	1.43	0.006	0.054	0.055	0.006	0.060	0.061	0.006	0.055	0.055	0.013	0.087	0.088	0.010	0.077	0.077	0.011	0.078	0.079	0.009	0.067	0.067
		0.93	0.007	0.053	0.053	0.007	0.059	0.059	0.007	0.053	0.054	0.014	0.085	0.086	0.011	0.074	0.075	0.012	0.076	0.077	0.010	0.065	0.066
		0.31	0.004	0.051	0.051	0.004	0.057	0.057	0.004	0.052	0.052	0.007	0.082	0.082	0.005	0.072	0.072	0.006	0.073	0.074	0.005	0.063	0.063
$T = 9$	116	1.43	0.007	0.040	0.041	0.008	0.045	0.045	0.007	0.041	0.041	0.014	0.065	0.066	0.013	0.060	0.061	0.012	0.056	0.058	0.009	0.046	0.047
		0.93	0.009	0.039	0.040	0.009	0.043	0.044	0.009	0.039	0.040	0.017	0.062	0.064	0.016	0.058	0.060	0.014	0.054	0.056	0.012	0.044	0.046
		0.31	0.006	0.037	0.037	0.007	0.041	0.041	0.006	0.037	0.037	0.012	0.059	0.060	0.011	0.055	0.056	0.010	0.051	0.052	0.009	0.042	0.043
Feasible Coefficient Estimators for Blundell-Bond
			$θ = 0$									$θ = 1$
${\bar{ρ}}_{x ε} = 0.0$			BB1			BB2a			BB2c			BB1			BB2a			BB2c			MBB
	$L$	$γ$	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE
$T = 3$	16	0.20	−0.003	0.049	0.050	−0.000	0.051	0.051	−0.000	0.051	0.051	−0.010	0.073	0.074	−0.004	0.067	0.068	−0.002	0.070	0.070	−0.003	0.072	0.072
		0.50	−0.009	0.057	0.058	−0.004	0.058	0.058	−0.003	0.057	0.057	−0.021	0.085	0.087	−0.010	0.078	0.078	−0.005	0.079	0.079	0.002	0.092	0.092
		0.80	−0.029	0.075	0.081	−0.013	0.076	0.077	−0.010	0.073	0.074	−0.055	0.111	0.124	−0.031	0.108	0.112	−0.014	0.107	0.108	0.020	0.155	0.157
$T = 6$	61	0.20	−0.002	0.026	0.026	−0.001	0.028	0.028	0.001	0.027	0.027	−0.010	0.041	0.042	−0.006	0.037	0.037	0.000	0.038	0.038	−0.003	0.034	0.034
		0.50	−0.008	0.029	0.030	−0.003	0.032	0.032	0.000	0.030	0.030	−0.021	0.045	0.050	−0.013	0.040	0.043	−0.001	0.040	0.040	−0.003	0.039	0.039
		0.80	−0.029	0.038	0.048	−0.014	0.039	0.041	−0.005	0.035	0.035	−0.058	0.056	0.081	−0.042	0.052	0.067	−0.009	0.047	0.048	0.007	0.056	0.056
$T = 9$	133	0.20	−0.002	0.020	0.020	−0.001	0.021	0.021	0.001	0.020	0.020	−0.009	0.031	0.033	−0.008	0.030	0.031	0.001	0.028	0.028	−0.003	0.024	0.024
		0.50	−0.008	0.021	0.022	−0.006	0.022	0.023	0.001	0.021	0.021	−0.019	0.033	0.038	−0.017	0.031	0.036	0.001	0.029	0.029	−0.004	0.026	0.027
		0.80	−0.027	0.026	0.038	−0.021	0.027	0.034	−0.003	0.024	0.024	−0.053	0.040	0.066	−0.049	0.038	0.062	−0.006	0.031	0.032	−0.004	0.034	0.034
	$L$	$β$	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE
$T = 3$	16	1.43	0.002	0.096	0.096	0.002	0.101	0.101	0.002	0.097	0.097	0.007	0.149	0.149	0.004	0.138	0.138	0.008	0.137	0.137	0.009	0.139	0.139
		0.93	0.004	0.094	0.094	0.004	0.097	0.097	0.005	0.093	0.093	0.009	0.146	0.146	0.008	0.133	0.133	0.014	0.132	0.133	0.021	0.136	0.137
		0.31	0.005	0.094	0.094	0.007	0.098	0.098	0.007	0.094	0.094	0.010	0.145	0.146	0.010	0.134	0.134	0.015	0.135	0.135	0.041	0.157	0.162
$T = 6$	61	1.43	0.002	0.053	0.053	0.001	0.059	0.059	−0.000	0.054	0.054	0.007	0.085	0.085	0.004	0.075	0.075	0.000	0.075	0.075	0.003	0.065	0.065
		0.93	0.004	0.052	0.052	0.002	0.057	0.057	0.000	0.051	0.051	0.011	0.082	0.083	0.007	0.072	0.073	0.003	0.072	0.072	0.005	0.063	0.063
		0.31	0.004	0.050	0.050	0.004	0.054	0.054	0.003	0.048	0.048	0.009	0.079	0.079	0.007	0.070	0.070	0.006	0.067	0.068	0.008	0.060	0.060
$T = 9$	133	1.43	0.002	0.039	0.039	0.002	0.043	0.043	−0.001	0.040	0.040	0.008	0.064	0.064	0.007	0.060	0.060	−0.001	0.055	0.055	0.003	0.046	0.046
		0.93	0.005	0.038	0.038	0.004	0.041	0.042	−0.000	0.038	0.038	0.012	0.061	0.062	0.011	0.057	0.058	0.000	0.052	0.052	0.003	0.044	0.044
		0.31	0.006	0.036	0.036	0.005	0.039	0.039	0.002	0.035	0.035	0.011	0.057	0.058	0.010	0.054	0.055	0.004	0.047	0.048	0.004	0.040	0.041

*

R = 10, 000

simulation replications. Design parameter values:

N = 200

,

S N R = 3

,

D E N_{y} = 1.0

,

E V F_{x} = 0.0

,

{\bar{ρ}}_{x ε} = 0.0

,

ξ = 0.8

,

κ = 0.00

,

σ_{ε} = 1

,

q = 1

,

ϕ = 1.0

. These yield the DGP parameter values:

π_{λ} = 0.00

,

π_{η} = 0.00

,

σ_{v} = 0.60

,

σ_{η} = 1.0 * (1 - γ)

,

ρ_{v ε} = 0.0

(and

{\bar{ρ}}_{x η} = 0.00

,

{\bar{ρ}}_{x λ} = 0.00

).

Table 5. P0ft-XA *.

**Table 5.** P0ft-XA *.
Feasible t-Test Arellano-Bond: Actual Significance Level
${\bar{ρ}}_{x ε} = 0.0$			$θ = 0$							$θ = 1$
${\bar{ρ}}_{x ε} = 0.0$	$L$	$γ$	AB1	AB1aR	AB1cR	AB2a	AB2aW	AB2c	AB2cW	AB1	AB1aR	AB1cR	AB2a	AB2aW	AB2c	AB2cW	MAB
$T = 3$	11	0.20	0.062	0.064	0.057	0.084	0.062	0.065	0.059	0.213	0.088	0.070	0.134	0.074	0.082	0.067	0.556
		0.50	0.072	0.072	0.065	0.093	0.071	0.073	0.069	0.237	0.104	0.086	0.150	0.083	0.101	0.083	0.584
		0.80	0.120	0.119	0.114	0.144	0.104	0.121	0.114	0.348	0.190	0.172	0.248	0.146	0.192	0.166	0.697
$T = 6$	50	0.20	0.061	0.063	0.052	0.159	0.061	0.058	0.053	0.246	0.091	0.067	0.354	0.077	0.078	0.066	0.201
		0.50	0.078	0.078	0.071	0.182	0.071	0.076	0.071	0.299	0.123	0.098	0.395	0.096	0.108	0.091	0.241
		0.80	0.191	0.183	0.176	0.317	0.142	0.182	0.174	0.547	0.329	0.285	0.617	0.234	0.299	0.267	0.465
$T = 9$	116	0.20	0.073	0.074	0.062	0.324	0.070	0.069	0.064	0.272	0.101	0.075	0.689	0.095	0.087	0.075	0.154
		0.50	0.098	0.096	0.090	0.358	0.084	0.096	0.090	0.344	0.149	0.117	0.728	0.139	0.130	0.115	0.198
		0.80	0.255	0.242	0.239	0.552	0.192	0.246	0.235	0.653	0.411	0.367	0.893	0.376	0.396	0.366	0.460
	$L$	$β$	AB1	AB1aR	AB1cR	AB2a	AB2aW	AB2c	AB2cW	AB1	AB1aR	AB1cR	AB2a	AB2aW	AB2c	AB2cW	MAB
$T = 3$	11	1.43	0.052	0.052	0.051	0.073	0.053	0.057	0.052	0.216	0.070	0.065	0.117	0.062	0.071	0.056	0.586
		0.93	0.051	0.052	0.051	0.073	0.053	0.057	0.051	0.215	0.073	0.065	0.118	0.062	0.072	0.055	0.582
		0.31	0.050	0.051	0.050	0.072	0.052	0.055	0.050	0.214	0.068	0.062	0.116	0.061	0.072	0.057	0.581
$T = 6$	50	1.43	0.051	0.055	0.050	0.145	0.055	0.056	0.051	0.222	0.068	0.059	0.315	0.060	0.068	0.057	0.210
		0.93	0.054	0.054	0.052	0.145	0.055	0.058	0.053	0.225	0.069	0.061	0.315	0.062	0.069	0.058	0.216
		0.31	0.054	0.054	0.054	0.144	0.052	0.059	0.054	0.232	0.066	0.064	0.317	0.058	0.070	0.058	0.229
$T = 9$	116	1.43	0.050	0.052	0.048	0.296	0.057	0.054	0.051	0.232	0.068	0.057	0.652	0.067	0.067	0.055	0.138
		0.93	0.053	0.055	0.052	0.298	0.057	0.058	0.054	0.241	0.072	0.064	0.657	0.071	0.070	0.059	0.145
		0.31	0.057	0.056	0.058	0.297	0.057	0.061	0.057	0.241	0.069	0.069	0.654	0.066	0.071	0.061	0.153
Feasible t-Test Blundell-Bond: Actual Significance Level
${\bar{ρ}}_{x ε} = 0.0$			$θ = 0$							$θ = 1$
${\bar{ρ}}_{x ε} = 0.0$	$L$	$γ$	BB1	BB1aR	BB1cR	BB2a	BB2aW	BB2c	BB2cW	BB1	BB1aR	BB1cR	BB2a	BB2aW	BB2c	BB2cW	MBB
$T = 3$	16	0.20	0.051	0.056	0.041	0.093	0.059	0.058	0.050	0.196	0.076	0.051	0.158	0.064	0.073	0.058	0.494
		0.50	0.056	0.057	0.046	0.098	0.057	0.062	0.056	0.209	0.079	0.056	0.165	0.067	0.080	0.065	0.503
		0.80	0.066	0.065	0.048	0.103	0.057	0.055	0.043	0.251	0.100	0.071	0.208	0.073	0.087	0.065	0.595
$T = 6$	61	0.20	0.042	0.052	0.034	0.178	0.056	0.045	0.038	0.206	0.074	0.048	0.389	0.067	0.059	0.046	0.160
		0.50	0.053	0.060	0.044	0.184	0.054	0.052	0.042	0.248	0.092	0.064	0.413	0.073	0.069	0.052	0.167
		0.80	0.116	0.122	0.100	0.217	0.061	0.063	0.046	0.424	0.218	0.161	0.562	0.129	0.079	0.054	0.261
$T = 9$	133	0.20	0.049	0.056	0.041	0.364	0.054	0.050	0.042	0.221	0.076	0.051	0.735	0.072	0.057	0.043	0.119
		0.50	0.063	0.070	0.056	0.381	0.058	0.056	0.044	0.279	0.108	0.078	0.767	0.101	0.063	0.048	0.121
		0.80	0.173	0.176	0.156	0.506	0.109	0.066	0.049	0.552	0.313	0.249	0.894	0.282	0.074	0.049	0.167
	$L$	$β$	BB1	BB1aR	BB1cR	BB2a	BB2aW	BB2c	BB2cW	BB1	BB1aR	BB1cR	BB2a	BB2aW	BB2c	BB2cW	MBB
$T = 3$	16	1.43	0.051	0.054	0.052	0.083	0.056	0.058	0.053	0.210	0.070	0.061	0.147	0.064	0.073	0.058	0.571
		0.93	0.050	0.053	0.051	0.084	0.055	0.057	0.052	0.214	0.070	0.061	0.148	0.064	0.073	0.058	0.564
		0.31	0.048	0.051	0.049	0.087	0.056	0.059	0.054	0.217	0.072	0.062	0.155	0.065	0.078	0.064	0.626
$T = 6$	61	1.43	0.048	0.052	0.047	0.166	0.055	0.053	0.049	0.216	0.069	0.053	0.371	0.059	0.064	0.052	0.200
		0.93	0.051	0.055	0.050	0.166	0.053	0.054	0.049	0.223	0.069	0.058	0.376	0.063	0.064	0.051	0.205
		0.31	0.053	0.055	0.053	0.169	0.055	0.053	0.050	0.229	0.068	0.060	0.388	0.064	0.064	0.054	0.221
$T = 9$	133	1.43	0.047	0.051	0.046	0.342	0.052	0.051	0.047	0.220	0.067	0.054	0.717	0.064	0.060	0.049	0.128
		0.93	0.050	0.052	0.049	0.349	0.053	0.053	0.049	0.232	0.070	0.060	0.718	0.067	0.061	0.050	0.132
		0.31	0.055	0.055	0.055	0.356	0.056	0.056	0.051	0.234	0.070	0.066	0.730	0.066	0.063	0.054	0.144

*

R = 10, 000

simulation replications. Design parameter values:

N = 200

,

S N R = 3

,

D E N_{y} = 1.0

,

E V F_{x} = 0.0

,

{\bar{ρ}}_{x ε} = 0.0

,

ξ = 0.8

,

κ = 0.00

,

σ_{ε} = 1

,

q = 1

,

ϕ = 1.0

. These yield the DGP parameter values:

π_{λ} = 0.00

,

π_{η} = 0.00

,

σ_{v} = 0.60

,

σ_{η} = 1.0 * (1 - γ)

,

ρ_{v ε} = 0.0

(and

{\bar{ρ}}_{x η} = 0.00

,

{\bar{ρ}}_{x λ} = 0.00

).

Table 6. P0fJ-XA *.

**Table 6.** P0fJ-XA *.
Feasible Sargan-Hansen Test: Rejection Probability
${\bar{ρ}}_{x ε} = 0.0$	$df$				$θ = 0$
${\bar{ρ}}_{x ε} = 0.0$	AB	BB	Inc	$γ$	${JAB}_{a}^{(2, 1)}$	${JBB}_{a}^{(2, 1)}$	${JES}_{a}^{(2, 1)}$	${JAB}_{c}^{(2, 1)}$	${JBB}_{c}^{(2, 1)}$	${JES}_{c}^{(2, 1)}$	$JMAB$	$JMBB$	$JESM$
$T = 3$	9	13	4	0.20	0.047	0.050	0.053	0.043	0.033	0.030	0.244	0.304	0.306
				0.50	0.050	0.051	0.052	0.044	0.034	0.025	0.246	0.328	0.327
				0.80	0.061	0.055	0.047	0.055	0.035	0.021	0.251	0.381	0.375
$T = 6$	48	58	10	0.20	0.034	0.038	0.068	0.026	0.025	0.030	0.032	0.386	0.439
				0.50	0.037	0.039	0.062	0.027	0.023	0.023	0.033	0.391	0.442
				0.80	0.046	0.042	0.056	0.031	0.022	0.013	0.039	0.404	0.452
$T = 9$	114	130	16	0.20	0.007	0.002	0.056	0.021	0.023	0.033	0.022	0.409	0.466
				0.50	0.007	0.002	0.053	0.021	0.021	0.026	0.022	0.411	0.467
				0.80	0.009	0.002	0.048	0.025	0.019	0.014	0.026	0.416	0.471
${\bar{ρ}}_{x ε} = 0.0$	$df$				$θ = 1$
${\bar{ρ}}_{x ε} = 0.0$	AB	BB	Inc	$γ$	${JAB}_{a}^{(2, 1)}$	${JBB}_{a}^{(2, 1)}$	${JES}_{a}^{(2, 1)}$	${JAB}_{c}^{(2, 1)}$	${JBB}_{c}^{(2, 1)}$	${JES}_{c}^{(2, 1)}$	$JMAB$	$JMBB$	$JESM$
$T = 3$	9	13	4	0.20	0.036	0.035	0.047	0.037	0.034	0.041	0.278	0.569	0.557
				0.50	0.041	0.035	0.045	0.042	0.038	0.042	0.280	0.594	0.581
				0.80	0.057	0.041	0.045	0.061	0.047	0.047	0.300	0.620	0.608
$T = 6$	48	58	10	0.20	0.016	0.015	0.054	0.020	0.022	0.028	0.037	0.727	0.754
				0.50	0.018	0.016	0.051	0.023	0.020	0.027	0.036	0.730	0.756
				0.80	0.024	0.017	0.048	0.033	0.028	0.027	0.042	0.738	0.761
$T = 9$	114	130	16	0.20	0.001	0.000	0.046	0.015	0.017	0.032	0.024	0.764	0.788
				0.50	0.001	0.000	0.044	0.017	0.015	0.025	0.023	0.766	0.788
				0.80	0.001	0.000	0.038	0.025	0.019	0.020	0.027	0.770	0.791

*

R = 10, 000

simulation replications. Design parameter values:

N = 200

,

S N R = 3

,

D E N_{y} = 1.0

,

E V F_{x} = 0.0

,

{\bar{ρ}}_{x ε} = 0.0

,

ξ = 0.8

,

κ = 0.00

,

σ_{ε} = 1

,

q = 1

,

ϕ = 1.0

. These yield the DGP parameter values:

π_{λ} = 0.00

,

π_{η} = 0.00

,

σ_{v} = 0.60

,

σ_{η} = 1.0 * (1 - γ)

,

ρ_{v ε} = 0.0

(and

{\bar{ρ}}_{x η} = 0.00

,

{\bar{ρ}}_{x λ} = 0.00

).

Table 7. P0fσ-XA *.

**Table 7.** P0fσ-XA *.
Standard Errors of Error Components Eta and Epsilon
				$θ = 0$						$θ = 1$
${\bar{ρ}}_{x ε} = 0.0$				Bias ${\hat{σ}}_{η}$			Bias ${\hat{σ}}_{ε}$			Bias ${\hat{σ}}_{η}$				Bias ${\hat{σ}}_{ε}$
${\bar{ρ}}_{x ε} = 0.0$	$L$	$γ$	$σ_{η}$	AB1	AB2a	AB2c	AB1	AB2a	AB2c	AB1	AB2a	AB2c	MAB	AB1	AB2a	AB2c	MAB
$T = 3$	11	0.20	0.80	0.025	0.024	0.025	−0.007	−0.006	−0.007	0.053	0.043	0.048	0.049	−0.016	−0.013	−0.015	−0.015
		0.50	0.50	0.050	0.049	0.049	−0.011	−0.011	−0.011	0.106	0.086	0.095	0.099	−0.024	−0.020	−0.022	−0.023
		0.80	0.20	0.224	0.228	0.223	−0.033	−0.033	−0.033	0.413	0.377	0.390	0.402	−0.063	−0.056	−0.060	−0.061
$T = 6$	50	0.20	0.80	0.013	0.013	0.013	−0.003	−0.002	−0.003	0.027	0.022	0.023	0.019	−0.006	−0.005	−0.006	−0.005
		0.50	0.50	0.027	0.027	0.026	−0.005	−0.005	−0.005	0.057	0.046	0.048	0.042	−0.011	−0.009	−0.010	−0.008
		0.80	0.20	0.127	0.129	0.126	−0.019	−0.019	−0.019	0.244	0.214	0.219	0.202	−0.036	−0.032	−0.032	−0.030
$T = 9$	116	0.20	0.80	0.010	0.010	0.010	−0.001	−0.001	−0.001	0.020	0.019	0.017	0.013	−0.004	−0.003	−0.003	−0.002
		0.50	0.50	0.019	0.020	0.019	−0.003	−0.003	−0.003	0.040	0.037	0.034	0.027	−0.006	−0.006	−0.005	−0.004
		0.80	0.20	0.092	0.094	0.092	−0.012	−0.012	−0.012	0.172	0.162	0.154	0.134	−0.022	−0.021	−0.020	−0.017
	$L$	$γ$	$σ_{η}$	BB1	BB2a	BB2c	BB1	BB2a	BB2c	BB1	BB2a	BB2c	MBB	BB1	BB2a	BB2c	MBB
$T = 3$	16	0.20	0.80	0.008	0.006	0.005	−0.004	−0.003	−0.003	0.026	0.015	0.012	0.016	−0.012	−0.008	−0.008	−0.009
		0.50	0.50	0.021	0.012	0.009	−0.006	−0.004	−0.004	0.051	0.028	0.016	0.013	−0.016	−0.010	−0.008	−0.006
		0.80	0.20	0.090	0.049	0.037	−0.016	−0.008	−0.006	0.176	0.114	0.072	0.097	−0.031	−0.019	−0.011	0.007
$T = 6$	61	0.20	0.80	0.003	0.002	−0.000	−0.001	−0.001	−0.001	0.014	0.009	0.000	0.005	−0.005	−0.004	−0.002	−0.003
		0.50	0.50	0.013	0.005	−0.000	−0.003	−0.001	−0.001	0.034	0.022	0.001	0.005	−0.008	−0.006	−0.002	−0.003
		0.80	0.20	0.069	0.029	0.003	−0.011	−0.006	−0.002	0.141	0.102	0.011	−0.022	−0.023	−0.017	−0.005	0.001
$T = 9$	133	0.20	0.80	0.002	0.002	−0.001	−0.001	−0.001	−0.000	0.011	0.010	−0.001	0.004	−0.003	−0.003	−0.001	−0.002
		0.50	0.50	0.010	0.008	−0.002	−0.002	−0.001	−0.000	0.027	0.024	−0.001	0.005	−0.005	−0.005	−0.001	−0.002
		0.80	0.20	0.061	0.045	0.000	−0.008	−0.006	−0.001	0.119	0.110	0.004	−0.001	−0.017	−0.015	−0.003	−0.002

*

R = 10, 000

simulation replications. Design parameter values:

N = 200

,

S N R = 3

,

D E N_{y} = 1.0

,

E V F_{x} = 0.0

,

{\bar{ρ}}_{x ε} = 0.0

,

ξ = 0.8

,

κ = 0.00

,

σ_{ε} = 1

,

q = 1

,

ϕ = 1.0

. These yield the DGP parameter values:

π_{λ} = 0.00

,

π_{η} = 0.00

,

σ_{v} = 0.60

,

σ_{η} = 1.0 * (1 - γ)

;

ρ_{v ε} = 0.0

(and

{\bar{ρ}}_{x η} = 0.00

,

{\bar{ρ}}_{x λ} = 0.00

).

Table 8. P0fc-XC *.

**Table 8.** P0fc-XC *.
Feasible Coefficient Estimators for Arellano-Bond ( $θ = 1$ )
${\bar{ρ}}_{x ε} = 0.0$			AB1			AB2a			AB2c			MAB
${\bar{ρ}}_{x ε} = 0.0$	$L$	$γ$	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE
$T = 3$	6	0.20	−0.010	0.090	0.090	−0.010	0.086	0.086	−0.013	0.087	0.088	−0.010	0.086	0.086
		0.50	−0.019	0.118	0.120	−0.018	0.113	0.114	−0.022	0.114	0.116	−0.018	0.113	0.115
		0.80	−0.065	0.217	0.227	−0.062	0.207	0.216	−0.074	0.206	0.219	−0.065	0.210	0.220
$T = 6$	12	0.20	−0.006	0.049	0.050	−0.004	0.046	0.046	−0.006	0.047	0.048	−0.004	0.039	0.040
		0.50	−0.011	0.059	0.060	−0.007	0.054	0.055	−0.010	0.056	0.057	−0.007	0.047	0.047
		0.80	−0.035	0.094	0.100	−0.026	0.088	0.092	−0.033	0.089	0.095	−0.024	0.074	0.078
$T = 9$	18	0.20	−0.005	0.037	0.037	−0.003	0.033	0.034	−0.004	0.035	0.035	−0.002	0.027	0.027
		0.50	−0.009	0.042	0.043	−0.006	0.038	0.039	−0.007	0.040	0.041	−0.004	0.031	0.031
		0.80	−0.024	0.061	0.066	−0.017	0.056	0.059	−0.021	0.058	0.062	−0.014	0.045	0.047
	$L$	$β$	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE
$T = 3$	6	1.43	0.004	0.181	0.181	0.003	0.172	0.172	0.002	0.174	0.174	0.004	0.159	0.159
		0.93	0.003	0.180	0.180	0.002	0.172	0.172	0.002	0.174	0.174	0.004	0.159	0.159
		0.31	0.001	0.179	0.179	−0.001	0.171	0.171	−0.001	0.173	0.173	0.000	0.158	0.158
$T = 6$	12	1.43	0.004	0.110	0.111	0.000	0.102	0.102	0.003	0.106	0.106	0.002	0.079	0.079
		0.93	0.004	0.109	0.109	0.000	0.101	0.101	0.003	0.105	0.105	0.002	0.077	0.077
		0.31	0.001	0.109	0.109	−0.001	0.101	0.101	0.000	0.105	0.105	0.001	0.076	0.076
$T = 9$	18	1.43	0.003	0.084	0.084	0.000	0.075	0.075	0.002	0.080	0.080	0.001	0.056	0.056
		0.93	0.003	0.083	0.083	0.001	0.074	0.074	0.002	0.079	0.079	0.002	0.055	0.055
		0.31	0.001	0.082	0.082	−0.001	0.073	0.073	0.000	0.078	0.078	0.000	0.053	0.053
Feasible Coefficient Estimators for Blundell-Bond ( $θ = 1$ )
${\bar{ρ}}_{x ε} = 0.0$			BB1			BB2a			BB2c			MBB
${\bar{ρ}}_{x ε} = 0.0$	$L$	$γ$	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE
$T = 3$	9	0.20	−0.005	0.078	0.078	−0.002	0.074	0.074	−0.004	0.075	0.075	0.001	0.076	0.076
		0.50	−0.011	0.092	0.093	−0.007	0.086	0.086	−0.008	0.087	0.088	0.010	0.101	0.101
		0.80	−0.037	0.129	0.134	−0.024	0.123	0.126	−0.025	0.126	0.128	0.052	0.204	0.210
$T = 6$	15	0.20	−0.004	0.046	0.046	−0.001	0.042	0.042	−0.002	0.044	0.044	−0.001	0.036	0.036
		0.50	−0.007	0.052	0.052	−0.002	0.047	0.047	−0.003	0.048	0.049	0.001	0.041	0.041
		0.80	−0.021	0.071	0.074	−0.010	0.063	0.063	−0.011	0.065	0.066	0.022	0.064	0.067
$T = 9$	21	0.20	−0.003	0.035	0.035	−0.001	0.031	0.031	−0.001	0.033	0.033	−0.001	0.026	0.026
		0.50	−0.006	0.038	0.039	−0.002	0.034	0.034	−0.003	0.036	0.036	−0.001	0.028	0.028
		0.80	−0.016	0.051	0.053	−0.007	0.044	0.045	−0.007	0.047	0.047	0.003	0.038	0.038
	$L$	$β$	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE
$T = 3$	9	1.43	0.004	0.170	0.170	0.001	0.162	0.162	0.005	0.163	0.163	0.007	0.157	0.157
		0.93	0.006	0.168	0.168	0.004	0.158	0.158	0.009	0.160	0.160	0.024	0.154	0.156
		0.31	0.008	0.172	0.172	0.007	0.164	0.164	0.010	0.165	0.166	0.052	0.180	0.188
$T = 6$	15	1.43	0.004	0.107	0.107	0.002	0.098	0.098	0.004	0.102	0.102	0.002	0.078	0.078
		0.93	0.005	0.105	0.105	0.003	0.095	0.096	0.006	0.100	0.100	0.005	0.076	0.076
		0.31	0.004	0.106	0.106	0.003	0.097	0.097	0.005	0.101	0.101	0.014	0.076	0.077
$T = 9$	21	1.43	0.003	0.082	0.082	0.002	0.073	0.073	0.003	0.078	0.078	0.001	0.056	0.056
		0.93	0.004	0.080	0.080	0.003	0.071	0.071	0.005	0.075	0.075	0.002	0.054	0.054
		0.31	0.003	0.079	0.080	0.003	0.070	0.070	0.004	0.075	0.075	0.005	0.053	0.053

*

R = 10, 000

simulation replications. Design parameter values:

N = 200

,

S N R = 3

,

D E N_{y} = 1.0

,

E V F_{x} = 0.0

,

{\bar{ρ}}_{x ε} = 0.0

,

ξ = 0.8

,

κ = 0.00

,

σ_{ε} = 1

,

q = 1

,

ϕ = 1.0

. These yield the DGP parameter values:

π_{λ} = 0.00

,

π_{η} = 0.00

,

σ_{v} = 0.60

,

σ_{η} = 1.0 * (1 - γ)

,

ρ_{v ε} = 0.0

(and

{\bar{ρ}}_{x η} = 0.00

,

{\bar{ρ}}_{x λ} = 0.00

).

Table 9. P0ft-XC *.

**Table 9.** P0ft-XC *.
Feasible t-Test: Actual Significance Level ( $θ = 1$ )
${\bar{ρ}}_{x ε} = 0.0$			Arellano-Bond									Blundell-Bond
${\bar{ρ}}_{x ε} = 0.0$	$L$	$γ$	AB1	AB1aR	AB1cR	AB2a	AB2aW	AB2c	AB2cW	$L$	$γ$	BB1	BB1aR	BB1cR	BB2a	BB2aW	BB2c	BB2cW
$T = 3$	6	0.20	0.192	0.073	0.051	0.097	0.071	0.059	0.055	9	0.20	0.196	0.070	0.046	0.110	0.066	0.058	0.051
		0.50	0.198	0.079	0.058	0.100	0.072	0.066	0.061		0.50	0.201	0.071	0.046	0.109	0.067	0.064	0.056
		0.80	0.230	0.110	0.091	0.132	0.094	0.102	0.095		0.80	0.218	0.079	0.046	0.131	0.071	0.063	0.055
$T = 6$	12	0.20	0.208	0.070	0.047	0.120	0.062	0.048	0.046	15	0.20	0.203	0.066	0.043	0.129	0.060	0.048	0.046
		0.50	0.206	0.069	0.049	0.121	0.061	0.050	0.048		0.50	0.203	0.067	0.043	0.127	0.058	0.048	0.045
		0.80	0.239	0.090	0.067	0.139	0.071	0.072	0.069		0.80	0.225	0.081	0.050	0.135	0.062	0.049	0.045
$T = 9$	18	0.20	0.216	0.070	0.047	0.148	0.064	0.051	0.048	21	0.20	0.208	0.067	0.047	0.153	0.063	0.050	0.048
		0.50	0.216	0.071	0.049	0.144	0.064	0.054	0.053		0.50	0.209	0.069	0.047	0.155	0.059	0.049	0.047
		0.80	0.233	0.090	0.068	0.165	0.068	0.068	0.066		0.80	0.225	0.078	0.054	0.160	0.058	0.049	0.045
	$L$	$β$	AB1	AB1aR	AB1cR	AB2a	AB2aW	AB2c	AB2cW	$L$	$β$	BB1	BB1aR	BB1cR	BB2a	BB2aW	BB2c	BB2cW
$T = 3$	6	1.43	0.212	0.069	0.063	0.091	0.068	0.068	0.061	9	1.43	0.210	0.068	0.062	0.104	0.066	0.069	0.062
		0.93	0.211	0.066	0.061	0.089	0.067	0.066	0.059		0.93	0.213	0.068	0.061	0.105	0.068	0.068	0.062
		0.31	0.206	0.066	0.059	0.089	0.064	0.061	0.054		0.31	0.215	0.070	0.062	0.111	0.073	0.072	0.064
$T = 6$	12	1.43	0.215	0.064	0.054	0.114	0.062	0.058	0.055	15	1.43	0.214	0.065	0.056	0.124	0.063	0.062	0.058
		0.93	0.217	0.065	0.054	0.114	0.062	0.056	0.055		0.93	0.216	0.066	0.056	0.125	0.064	0.062	0.057
		0.31	0.214	0.065	0.052	0.116	0.064	0.055	0.053		0.31	0.217	0.068	0.054	0.128	0.065	0.060	0.058
$T = 9$	18	1.43	0.221	0.064	0.051	0.135	0.058	0.055	0.054	21	1.43	0.220	0.065	0.052	0.147	0.059	0.056	0.054
		0.93	0.222	0.063	0.052	0.138	0.059	0.054	0.053		0.93	0.219	0.063	0.052	0.149	0.059	0.055	0.053
		0.31	0.219	0.063	0.052	0.139	0.057	0.053	0.051		0.31	0.221	0.064	0.053	0.150	0.057	0.056	0.054

*

R = 10, 000

simulation replications. Design parameter values:

N = 200

,

S N R = 3

,

D E N_{y} = 1.0

,

E V F_{x} = 0.0

,

{\bar{ρ}}_{x ε} = 0.0

,

ξ = 0.8

,

κ = 0.00

,

σ_{ε} = 1

,

q = 1

,

ϕ = 1.0

. These yield the DGP parameter values:

π_{λ} = 0.00

,

π_{η} = 0.00

,

σ_{v} = 0.60

,

σ_{η} = 1.0 * (1 - γ)

,

ρ_{v ε} = 0.0

(and

{\bar{ρ}}_{x η} = 0.00

,

{\bar{ρ}}_{x λ} = 0.00

).

Table 10. P0fJ-XC *.

**Table 10.** P0fJ-XC *.
Feasible Sargan-Hansen Test: Rejection Probability
${\bar{ρ}}_{x ε} = 0.0$	$df$				$θ = 1$
${\bar{ρ}}_{x ε} = 0.0$	AB	BB	Inc	$γ$	${JAB}_{a}^{(2, 1)}$	${JBB}_{a}^{(2, 1)}$	${JES}_{a}^{(2, 1)}$	${JAB}_{c}^{(2, 1)}$	${JBB}_{c}^{(2, 1)}$	${JES}_{c}^{(2, 1)}$
$T = 3$	4	6	2	0.20	0.037	0.037	0.045	0.042	0.040	0.044
				0.50	0.040	0.037	0.042	0.045	0.042	0.042
				0.80	0.048	0.040	0.041	0.054	0.048	0.046
$T = 6$	10	12	2	0.20	0.032	0.030	0.048	0.036	0.035	0.042
				0.50	0.031	0.029	0.046	0.037	0.037	0.044
				0.80	0.037	0.035	0.047	0.040	0.040	0.053
$T = 9$	16	18	2	0.20	0.027	0.027	0.047	0.033	0.034	0.050
				0.50	0.029	0.028	0.046	0.034	0.032	0.049
				0.80	0.032	0.030	0.051	0.035	0.036	0.054

*

R = 10, 000

simulation replications. Design parameter values:

N = 200

,

S N R = 3

,

D E N_{y} = 1.0

,

E V F_{x} = 0.0

,

{\bar{ρ}}_{x ε} = 0.0

,

ξ = 0.8

,

κ = 0.00

,

σ_{ε} = 1

,

q = 1

,

ϕ = 1.0

. These yield the DGP parameter values:

π_{λ} = 0.00

,

π_{η} = 0.00

,

σ_{v} = 0.60

,

σ_{η} = 1.0 * (1 - γ)

,

ρ_{v ε} = 0.0

(and

{\bar{ρ}}_{x η} = 0.00

,

{\bar{ρ}}_{x λ} = 0.00

).

Table 11. P0fσ-XC *.

**Table 11.** P0fσ-XC *.
Standard Errors of Error Components $η_{i}$ and $ε_{it}$
				$θ = 0$						$θ = 1$
${\bar{ρ}}_{x ε} = 0.0$				Bias ${\hat{σ}}_{η}$			Bias ${\hat{σ}}_{ε}$			Bias ${\hat{σ}}_{η}$				Bias ${\hat{σ}}_{ε}$
${\bar{ρ}}_{x ε} = 0.0$	$L$	$γ$	$σ_{η}$	AB1	AB2a	AB2c	AB1	AB2a	AB2c	AB1	AB2a	AB2c	MAB	AB1	AB2a	AB2c	MAB
$T = 3$	6	0.20	0.80	0.017	0.018	0.018	−0.004	−0.004	−0.004	0.037	0.035	0.040	0.032	−0.010	−0.009	−0.011	−0.010
		0.50	0.50	0.029	0.030	0.032	−0.006	−0.006	−0.006	0.070	0.063	0.072	0.061	−0.013	−0.012	−0.014	−0.013
		0.80	0.20	0.152	0.155	0.159	−0.014	−0.013	−0.015	0.306	0.284	0.305	0.291	−0.027	−0.026	−0.031	−0.027
$T = 6$	12	0.20	0.80	0.005	0.005	0.005	−0.001	−0.001	−0.001	0.013	0.009	0.012	0.007	−0.003	−0.002	−0.003	−0.002
		0.50	0.50	0.010	0.010	0.010	−0.001	−0.001	−0.001	0.025	0.017	0.022	0.013	−0.004	−0.003	−0.004	−0.003
		0.80	0.20	0.034	0.034	0.036	−0.005	−0.004	−0.005	0.097	0.074	0.091	0.059	−0.011	−0.008	−0.011	−0.009
$T = 9$	18	0.20	0.80	0.003	0.003	0.003	−0.000	−0.000	−0.000	0.008	0.006	0.007	0.004	−0.001	−0.001	−0.001	−0.001
		0.50	0.50	0.006	0.006	0.006	−0.001	−0.000	−0.001	0.016	0.010	0.013	0.007	−0.002	−0.001	−0.002	−0.001
		0.80	0.20	0.015	0.014	0.016	−0.002	−0.002	−0.003	0.053	0.034	0.044	0.021	−0.006	−0.004	−0.006	−0.004
	$L$	$γ$	$σ_{η}$	BB1	BB2a	BB2c	BB1	BB2a	BB2c	BB1	BB2a	BB2c	MBB	BB1	BB2a	BB2c	MBB
$T = 3$	9	0.20	0.80	0.008	0.008	0.008	−0.003	−0.002	−0.003	0.023	0.017	0.019	0.012	−0.009	−0.006	−0.007	−0.006
		0.50	0.50	0.016	0.014	0.014	−0.004	−0.003	−0.004	0.040	0.027	0.028	0.009	−0.011	−0.008	−0.009	−0.001
		0.80	0.20	0.073	0.056	0.061	−0.010	−0.007	−0.008	0.158	0.120	0.124	0.149	−0.021	−0.014	−0.015	0.027
$T = 6$	15	0.20	0.80	0.003	0.002	0.002	−0.001	−0.000	−0.001	0.009	0.004	0.005	0.003	−0.003	−0.002	−0.002	−0.002
		0.50	0.50	0.006	0.003	0.004	−0.001	−0.000	−0.001	0.017	0.007	0.009	0.001	−0.004	−0.002	−0.002	−0.001
		0.80	0.20	0.017	0.005	0.010	−0.003	−0.002	−0.002	0.059	0.026	0.031	−0.045	−0.008	−0.004	−0.005	0.008
$T = 9$	21	0.20	0.80	0.002	0.001	0.001	−0.000	−0.000	−0.000	0.006	0.003	0.003	0.002	−0.001	−0.001	−0.001	−0.001
		0.50	0.50	0.004	0.002	0.003	−0.000	−0.000	−0.000	0.011	0.005	0.005	0.003	−0.002	−0.001	−0.001	−0.001
		0.80	0.20	0.008	−0.001	0.003	−0.002	−0.000	−0.001	0.033	0.009	0.011	−0.018	−0.005	−0.002	−0.002	0.001

*

R = 10, 000

simulation replications. Design parameter values:

N = 200

,

S N R = 3

,

D E N_{y} = 1.0

,

E V F_{x} = 0.0

,

{\bar{ρ}}_{x ε} = 0.0

,

ξ = 0.8

,

κ = 0.00

,

σ_{ε} = 1

,

q = 1

,

ϕ = 1.0

. These yield the DGP parameter values:

π_{λ} = 0.00

,

π_{η} = 0.00

,

σ_{v} = 0.60

,

σ_{η} = 1.0 * (1 - γ)

,

ρ_{v ε} = 0.0

(and

{\bar{ρ}}_{x η} = 0.00

,

{\bar{ρ}}_{x λ} = 0.00

).

Table 12. P1fc-XC *.

**Table 12.** P1fc-XC *.
Feasible Coefficient Estimators for Arellano-Bond ( $θ = 1$ )
${\bar{ρ}}_{x ε} = 0.0$			AB1			AB2a			AB2c			MAB
${\bar{ρ}}_{x ε} = 0.0$	$L$	$γ$	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE
$T = 3$	6	0.20	−0.017	0.134	0.135	−0.020	0.127	0.129	−0.021	0.126	0.128	−0.017	0.132	0.134
		0.50	−0.036	0.175	0.179	−0.038	0.167	0.171	−0.041	0.164	0.169	−0.035	0.172	0.176
		0.80	−0.131	0.303	0.330	−0.134	0.298	0.326	−0.135	0.284	0.315	−0.136	0.298	0.328
$T = 6$	12	0.20	−0.009	0.062	0.063	−0.007	0.058	0.058	−0.008	0.058	0.059	−0.006	0.052	0.053
		0.50	−0.017	0.077	0.079	−0.014	0.072	0.074	−0.016	0.072	0.074	−0.011	0.065	0.066
		0.80	−0.061	0.126	0.139	−0.053	0.121	0.133	−0.056	0.117	0.130	−0.049	0.113	0.123
$T = 9$	18	0.20	−0.006	0.044	0.044	−0.005	0.040	0.040	−0.005	0.041	0.042	−0.004	0.035	0.035
		0.50	−0.012	0.052	0.053	−0.009	0.048	0.049	−0.010	0.049	0.050	−0.007	0.041	0.042
		0.80	−0.040	0.081	0.090	−0.032	0.077	0.084	−0.035	0.076	0.083	−0.028	0.070	0.075
	$L$	$β$	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE
$T = 3$	6	1.43	0.005	0.182	0.182	0.002	0.174	0.174	0.003	0.176	0.176	0.007	0.161	0.161
		0.93	0.003	0.181	0.181	0.000	0.174	0.174	0.001	0.175	0.175	0.006	0.159	0.159
		0.31	−0.002	0.177	0.177	−0.006	0.171	0.171	−0.004	0.172	0.172	−0.002	0.157	0.157
$T = 6$	12	0.20	0.004	0.110	0.110	0.002	0.103	0.103	0.004	0.106	0.106	0.003	0.080	0.080
		0.93	0.004	0.109	0.109	0.001	0.102	0.102	0.004	0.105	0.105	0.004	0.078	0.078
		0.31	0.000	0.108	0.108	−0.002	0.102	0.102	−0.001	0.105	0.105	0.002	0.076	0.076
$T = 9$	18	1.43	0.003	0.084	0.084	0.001	0.076	0.076	0.003	0.081	0.081	0.002	0.057	0.057
		0.93	0.004	0.083	0.083	0.001	0.075	0.075	0.003	0.079	0.079	0.003	0.055	0.055
		0.31	0.000	0.082	0.082	−0.001	0.074	0.074	−0.000	0.078	0.078	0.001	0.053	0.053
Feasible Coefficient Estimators for Blundell-Bond ( $θ = 1$ )
${\bar{ρ}}_{x ε} = 0.0$			BB1			BB2a			BB2c			MBB
${\bar{ρ}}_{x ε} = 0.0$	$L$	$γ$	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE
$T = 3$	9	0.20	0.023	0.125	0.128	0.018	0.114	0.116	0.016	0.114	0.115	0.021	0.123	0.124
		0.50	0.011	0.124	0.125	0.012	0.124	0.124	0.010	0.126	0.126	0.017	0.129	0.130
		0.80	−0.022	0.142	0.144	−0.015	0.148	0.149	−0.016	0.153	0.154	0.031	0.190	0.193
$T = 6$	15	0.20	0.007	0.061	0.062	0.004	0.053	0.054	0.002	0.055	0.055	0.002	0.049	0.049
		0.50	0.002	0.064	0.064	0.004	0.061	0.061	0.002	0.063	0.063	0.001	0.056	0.056
		0.80	−0.016	0.078	0.080	−0.005	0.075	0.075	−0.009	0.078	0.078	0.008	0.079	0.080
$T = 9$	21	0.20	0.003	0.044	0.044	0.002	0.038	0.038	−0.000	0.039	0.039	0.000	0.033	0.033
		0.50	−0.001	0.046	0.046	0.002	0.042	0.042	−0.001	0.044	0.044	−0.001	0.037	0.037
		0.80	−0.014	0.056	0.058	−0.004	0.053	0.053	−0.007	0.055	0.056	−0.002	0.050	0.050
	$L$	$β$	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE
$T = 3$	9	1.43	−0.010	0.185	0.185	−0.000	0.175	0.175	0.000	0.174	0.174	−0.009	0.171	0.171
		0.93	−0.003	0.176	0.176	−0.000	0.170	0.170	0.000	0.170	0.170	−0.001	0.163	0.163
		0.31	0.004	0.174	0.174	0.002	0.167	0.167	0.003	0.168	0.168	0.019	0.167	0.168
$T = 6$	15	1.43	−0.002	0.112	0.112	−0.000	0.103	0.103	0.001	0.106	0.106	−0.001	0.082	0.082
		0.93	0.001	0.108	0.108	0.001	0.101	0.101	0.002	0.104	0.104	0.000	0.079	0.079
		0.31	0.003	0.106	0.107	0.002	0.100	0.100	0.003	0.103	0.103	0.004	0.077	0.077
$T = 9$	21	1.43	−0.001	0.086	0.086	−0.000	0.076	0.076	0.001	0.081	0.081	−0.000	0.058	0.058
		0.93	0.002	0.082	0.082	0.001	0.074	0.074	0.002	0.078	0.078	0.000	0.056	0.056
		0.31	0.002	0.080	0.080	0.002	0.073	0.073	0.002	0.076	0.076	0.001	0.054	0.054

*

R = 10, 000

simulation replications. Design parameter values:

N = 200

,

S N R = 3

,

D E N_{y} = 4.0

,

E V F_{x} = 0.0

,

{\bar{ρ}}_{x ε} = 0.0

,

ξ = 0.8

,

κ = 0.00

,

σ_{ε} = 1

,

q = 1

,

ϕ = 1.0

. These yield the DGP parameter values:

π_{λ} = 0.00

,

π_{η} = 0.00

,

σ_{v} = 0.60

,

σ_{η} = 4.0 * (1 - γ)

,

ρ_{v ε} = 0.0

(and

{\bar{ρ}}_{x η} = 0.00

,

{\bar{ρ}}_{x λ} = 0.00

).

Table 13. P1ft-XC *.

**Table 13.** P1ft-XC *.
Feasible t-Test: Actual Significance Level ( $θ = 1$ )
${\bar{ρ}}_{x ε} = 0.0$			Arellano-Bond									Blundell-Bond
${\bar{ρ}}_{x ε} = 0.0$	$L$	$γ$	AB1	AB1aR	AB1cR	AB2a	AB2aW	AB2c	AB2cW	$L$	$γ$	BB1	BB1aR	BB1cR	BB2a	BB2aW	BB2c	BB2cW
$T = 3$	6	0.20	0.165	0.075	0.064	0.099	0.072	0.068	0.063	9	0.20	0.118	0.087	0.068	0.132	0.080	0.085	0.065
		0.50	0.179	0.082	0.073	0.106	0.076	0.080	0.073		0.50	0.138	0.083	0.063	0.146	0.093	0.097	0.077
		0.80	0.236	0.134	0.130	0.165	0.111	0.137	0.130		0.80	0.144	0.063	0.034	0.141	0.088	0.088	0.069
$T = 6$	12	0.20	0.192	0.070	0.053	0.115	0.060	0.052	0.050	15	0.20	0.124	0.071	0.048	0.127	0.062	0.052	0.046
		0.50	0.192	0.072	0.058	0.117	0.059	0.056	0.054		0.50	0.141	0.065	0.045	0.133	0.065	0.059	0.049
		0.80	0.215	0.099	0.086	0.155	0.076	0.088	0.085		0.80	0.161	0.065	0.038	0.138	0.069	0.058	0.048
$T = 9$	18	0.20	0.201	0.070	0.054	0.140	0.063	0.055	0.053	21	0.20	0.138	0.064	0.047	0.150	0.062	0.052	0.048
		0.50	0.201	0.071	0.057	0.140	0.063	0.058	0.055		0.50	0.155	0.065	0.045	0.153	0.060	0.051	0.046
		0.80	0.225	0.095	0.084	0.174	0.067	0.078	0.076		0.80	0.175	0.068	0.042	0.157	0.061	0.053	0.044
	$L$	$β$	AB1	AB1aR	AB1cR	AB2a	AB2aW	AB2c	AB2cW	$L$	$β$	BB1	BB1aR	BB1cR	BB2a	BB2aW	BB2c	BB2cW
$T = 3$	6	1.43	0.207	0.066	0.060	0.086	0.065	0.065	0.058	9	1.43	0.176	0.068	0.058	0.097	0.065	0.065	0.058
		0.93	0.204	0.064	0.057	0.086	0.065	0.061	0.055		0.93	0.191	0.066	0.056	0.097	0.066	0.066	0.059
		0.31	0.195	0.061	0.051	0.083	0.061	0.057	0.050		0.31	0.208	0.069	0.057	0.103	0.068	0.067	0.059
$T = 6$	12	1.43	0.213	0.065	0.053	0.111	0.064	0.056	0.053	15	1.43	0.196	0.065	0.055	0.119	0.061	0.058	0.055
		0.93	0.215	0.066	0.053	0.112	0.062	0.057	0.053		0.93	0.205	0.064	0.053	0.119	0.062	0.059	0.055
		0.31	0.211	0.064	0.051	0.114	0.062	0.055	0.052		0.31	0.214	0.067	0.053	0.124	0.063	0.059	0.058
$T = 9$	18	1.43	0.219	0.063	0.051	0.131	0.060	0.053	0.052	21	1.43	0.200	0.063	0.053	0.139	0.061	0.056	0.052
		0.93	0.219	0.063	0.051	0.131	0.057	0.054	0.052		0.93	0.207	0.064	0.054	0.141	0.061	0.055	0.053
		0.31	0.215	0.063	0.050	0.130	0.056	0.052	0.051		0.31	0.217	0.063	0.054	0.141	0.060	0.054	0.052

*

R = 10, 000

simulation replications. Design parameter values:

N = 200

,

S N R = 3

,

D E N_{y} = 4.0

,

E V F_{x} = 0.0

,

{\bar{ρ}}_{x ε} = 0.0

,

ξ = 0.8

,

κ = 0.00

,

σ_{ε} = 1

,

q = 1

,

ϕ = 1.0

. These yield the DGP parameter values:

π_{λ} = 0.00

,

π_{η} = 0.00

,

σ_{v} = 0.60

,

σ_{η} = 4.0 * (1 - γ)

,

ρ_{v ε} = 0.0

(and

{\bar{ρ}}_{x η} = 0.00

,

{\bar{ρ}}_{x λ} = 0.00

).

Table 14. P1fJ-XC *.

**Table 14.** P1fJ-XC *.
Feasible Sargan-Hansen Test: Rejection Probability
${\bar{ρ}}_{x ε} = 0.0$	$df$				$θ = 1$
${\bar{ρ}}_{x ε} = 0.0$	AB	BB	Inc	$γ$	${JAB}_{a}^{(2, 1)}$	${JBB}_{a}^{(2, 1)}$	${JES}_{a}^{(2, 1)}$	${JAB}_{c}^{(2, 1)}$	${JBB}_{c}^{(2, 1)}$	${JES}_{c}^{(2, 1)}$
$T = 3$	4	6	2	0.20	0.039	0.043	0.061	0.043	0.047	0.054
				0.50	0.042	0.043	0.055	0.048	0.042	0.046
				0.80	0.053	0.036	0.042	0.059	0.038	0.036
$T = 6$	10	12	2	0.20	0.035	0.036	0.055	0.039	0.041	0.053
				0.50	0.034	0.035	0.052	0.039	0.042	0.054
				0.80	0.043	0.035	0.044	0.045	0.042	0.044
$T = 9$	16	18	2	0.20	0.029	0.029	0.056	0.037	0.039	0.054
				0.50	0.029	0.028	0.053	0.037	0.039	0.054
				0.80	0.031	0.026	0.049	0.040	0.039	0.046

*

R = 10, 000

simulation replications. Design parameter values:

N = 200

,

S N R = 3

,

D E N_{y} = 4.0

,

E V F_{x} = 0.0

,

{\bar{ρ}}_{x ε} = 0.0

,

ξ = 0.8

,

κ = 0.00

,

σ_{ε} = 1

,

q = 1

,

ϕ = 1.0

. These yield the DGP parameter values:

π_{λ} = 0.00

,

π_{η} = 0.00

,

σ_{v} = 0.60

,

σ_{η} = 4.0 * (1 - γ)

,

ρ_{v ε} = 0.0

(and

{\bar{ρ}}_{x η} = 0.00

,

{\bar{ρ}}_{x λ} = 0.00

).

Table 15. P1fσ-XC *.

**Table 15.** P1fσ-XC *.
Standard Errors of Error Components $η_{i}$ and $ε_{it}$
				$θ = 0$						$θ = 1$
${\bar{ρ}}_{x ε} = 0.0$				Bias ${\hat{σ}}_{η}$			Bias ${\hat{σ}}_{ε}$			Bias ${\hat{σ}}_{η}$				Bias ${\hat{σ}}_{ε}$
	$L$	$γ$	$σ_{η}$	AB1	AB2a	AB2c	AB1	AB2a	AB2c	AB1	AB2a	AB2c	MAB	AB1	AB2a	AB2c	MAB
$T = 3$	6	0.20	3.20	0.050	0.053	0.053	−0.002	−0.002	−0.003	0.081	0.091	0.096	0.078	−0.006	−0.007	−0.008	−0.006
		0.50	2.00	0.104	0.107	0.108	−0.006	−0.006	−0.007	0.177	0.184	0.193	0.168	−0.012	−0.013	−0.015	−0.012
		0.80	0.80	0.490	0.510	0.498	−0.025	−0.025	−0.026	0.823	0.810	0.802	0.826	−0.040	−0.041	−0.044	−0.042
$T = 6$	12	0.20	3.20	0.020	0.020	0.020	−0.001	−0.000	−0.001	0.039	0.031	0.035	0.024	−0.002	−0.001	−0.002	−0.002
		0.50	2.00	0.038	0.038	0.038	−0.002	−0.001	−0.002	0.074	0.061	0.069	0.049	−0.004	−0.003	−0.004	−0.003
		0.80	0.80	0.143	0.147	0.144	−0.009	−0.009	−0.009	0.279	0.241	0.254	0.219	−0.016	−0.014	−0.016	−0.014
$T = 9$	18	0.20	3.20	0.012	0.013	0.012	−0.000	0.000	−0.000	0.027	0.019	0.022	0.015	−0.001	−0.001	−0.001	−0.001
		0.50	2.00	0.023	0.023	0.023	−0.000	−0.000	−0.000	0.049	0.036	0.042	0.028	−0.002	−0.001	−0.002	−0.001
		0.80	0.80	0.087	0.086	0.087	−0.004	−0.004	−0.004	0.172	0.139	0.151	0.121	−0.009	−0.007	−0.008	−0.006
	$L$	$γ$	$σ_{η}$	BB1	BB2a	BB2c	BB1	BB2a	BB2c	BB1	BB2a	BB2c	MBB	BB1	BB2a	BB2c	MBB
$T = 3$	9	0.20	3.20	−0.105	−0.062	−0.058	0.014	0.007	0.006	−0.089	−0.068	−0.060	−0.081	0.009	0.005	0.004	0.008
		0.50	2.00	−0.064	−0.054	−0.049	0.008	0.007	0.006	−0.036	−0.042	−0.033	−0.060	0.002	0.003	0.002	0.005
		0.80	0.80	0.037	0.026	0.028	−0.004	−0.002	−0.002	0.134	0.102	0.112	0.054	−0.013	−0.008	−0.009	0.016
$T = 6$	15	0.20	3.20	−0.037	−0.008	−0.003	0.003	0.001	0.000	−0.027	−0.015	−0.006	−0.010	0.001	0.000	−0.001	−0.000
		0.50	2.00	−0.024	−0.013	−0.006	0.002	0.001	0.001	−0.005	−0.016	−0.005	−0.005	0.000	0.001	−0.000	−0.000
		0.80	0.80	0.021	−0.001	0.013	−0.001	0.001	−0.001	0.071	0.020	0.036	−0.039	−0.006	−0.001	−0.003	0.004
$T = 9$	21	0.20	3.20	−0.020	−0.003	0.002	0.001	0.000	0.000	−0.012	−0.006	0.001	−0.000	0.000	0.000	−0.000	−0.000
		0.50	2.00	−0.011	−0.006	0.002	0.001	0.001	0.000	0.004	−0.006	0.003	0.004	−0.000	0.000	−0.000	−0.000
		0.80	0.80	0.018	−0.001	0.013	−0.001	0.001	−0.000	0.059	0.015	0.028	0.007	−0.004	−0.000	−0.001	0.000

*

R = 10, 000

simulation replications. Design parameter values:

N = 200

,

S N R = 3

,

D E N_{y} = 4.0

,

E V F_{x} = 0.0

,

{\bar{ρ}}_{x ε} = 0.0

,

ξ = 0.8

,

κ = 0.00

,

σ_{ε} = 1

,

q = 1

,

ϕ = 1.0

. These yield the DGP parameter values:

π_{λ} = 0.00

,

π_{η} = 0.00

,

σ_{v} = 0.60

,

σ_{η} = 4.0 * (1 - γ)

,

ρ_{v ε} = 0.0

(and

{\bar{ρ}}_{x η} = 0.00

,

{\bar{ρ}}_{x λ} = 0.00

).

Table 16. P5fc-EC *.

**Table 16.** P5fc-EC *.
Feasible Coefficient Estimators for Arellano-Bond ( $θ = 1$ )
${\bar{ρ}}_{x ε} = 0.3$			AB1			AB2a			AB2c			MAB
${\bar{ρ}}_{x ε} = 0.3$	$L$	$γ$	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE
$T = 3$	4	0.20	−0.165	0.297	0.340	−0.169	0.298	0.343	−0.169	0.291	0.337	−0.178	0.287	0.338
		0.50	−0.191	0.379	0.424	−0.197	0.372	0.421	−0.195	0.360	0.409	−0.204	0.357	0.412
		0.80	−0.181	0.477	0.511	−0.195	0.474	0.513	−0.197	0.453	0.494	−0.197	0.443	0.485
$T = 6$	10	0.20	−0.097	0.101	0.140	−0.090	0.101	0.135	−0.099	0.099	0.141	−0.098	0.100	0.140
		0.50	−0.104	0.111	0.152	−0.092	0.112	0.144	−0.096	0.108	0.144	−0.093	0.107	0.142
		0.80	−0.091	0.126	0.155	−0.078	0.123	0.146	−0.083	0.121	0.147	−0.068	0.110	0.130
$T = 9$	16	0.20	−0.073	0.070	0.101	−0.062	0.066	0.091	−0.073	0.068	0.100	−0.060	0.062	0.086
		0.50	−0.077	0.075	0.108	−0.063	0.072	0.096	−0.069	0.072	0.100	−0.056	0.065	0.086
		0.80	−0.064	0.080	0.102	−0.050	0.076	0.091	−0.055	0.076	0.094	−0.039	0.063	0.074
	$L$	$β$	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE
$T = 3$	4	1.43	0.790	1.455	1.655	0.810	1.451	1.661	0.804	1.420	1.632	0.823	1.362	1.591
		0.93	0.613	1.322	1.457	0.638	1.286	1.436	0.644	1.242	1.399	0.631	1.195	1.351
		0.31	0.362	1.109	1.166	0.394	1.100	1.168	0.418	1.068	1.146	0.383	0.989	1.061
$T = 6$	10	1.43	0.409	0.432	0.595	0.399	0.437	0.591	0.420	0.423	0.596	0.381	0.395	0.548
		0.93	0.324	0.362	0.486	0.307	0.364	0.476	0.314	0.351	0.471	0.264	0.309	0.406
		0.31	0.178	0.283	0.334	0.170	0.276	0.324	0.179	0.273	0.326	0.121	0.216	0.247
$T = 9$	16	1.43	0.286	0.270	0.394	0.255	0.261	0.365	0.289	0.262	0.390	0.213	0.218	0.305
		0.93	0.235	0.235	0.332	0.205	0.224	0.304	0.222	0.224	0.315	0.157	0.179	0.238
		0.31	0.129	0.179	0.221	0.112	0.167	0.201	0.123	0.171	0.211	0.070	0.121	0.140
Feasible Coefficient Estimators for Blundell-Bond ( $θ = 1$ )
${\bar{ρ}}_{x ε} = 0.3$			BB1			BB2a			BB2c			MBB
${\bar{ρ}}_{x ε} = 0.3$	$L$	$γ$	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE
$T = 3$	7	0.20	−0.106	0.169	0.199	−0.102	0.179	0.206	−0.108	0.174	0.205	−0.089	0.187	0.207
		0.50	−0.126	0.187	0.225	−0.117	0.195	0.228	−0.120	0.193	0.227	−0.089	0.187	0.207
		0.80	−0.123	0.198	0.233	−0.112	0.209	0.237	−0.112	0.203	0.232	−0.063	0.257	0.265
$T = 6$	13	0.20	−0.066	0.080	0.104	−0.054	0.078	0.095	−0.064	0.079	0.102	−0.050	0.078	0.093
		0.50	−0.073	0.086	0.113	−0.055	0.083	0.100	−0.060	0.084	0.103	−0.039	0.084	0.093
		0.80	−0.064	0.089	0.110	−0.044	0.084	0.094	−0.047	0.086	0.098	−0.005	0.086	0.086
$T = 9$	19	0.20	−0.054	0.059	0.080	−0.042	0.056	0.070	−0.051	0.057	0.077	−0.037	0.050	0.063
		0.50	−0.058	0.063	0.085	−0.042	0.059	0.072	−0.047	0.060	0.076	−0.031	0.052	0.061
		0.80	−0.048	0.063	0.079	−0.031	0.057	0.065	−0.034	0.059	0.068	−0.012	0.050	0.052
	$L$	$β$	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE
$T = 3$	7	1.43	0.495	0.778	0.922	0.489	0.837	0.970	0.500	0.793	0.937	0.499	0.751	0.901
		0.93	0.419	0.643	0.768	0.403	0.675	0.786	0.413	0.650	0.770	0.465	0.594	0.754
		0.31	0.276	0.504	0.574	0.265	0.511	0.576	0.279	0.495	0.568	0.406	0.442	0.600
$T = 6$	13	1.43	0.284	0.321	0.429	0.247	0.318	0.403	0.276	0.315	0.419	0.219	0.297	0.369
		0.93	0.244	0.278	0.370	0.202	0.270	0.337	0.216	0.268	0.344	0.162	0.238	0.288
		0.31	0.151	0.216	0.264	0.125	0.203	0.238	0.135	0.203	0.243	0.116	0.159	0.196
$T = 9$	19	1.43	0.220	0.223	0.313	0.180	0.211	0.278	0.210	0.215	0.300	0.143	0.180	0.230
		0.93	0.190	0.197	0.273	0.151	0.183	0.237	0.167	0.185	0.249	0.108	0.151	0.186
		0.31	0.115	0.150	0.189	0.092	0.135	0.163	0.103	0.138	0.172	0.063	0.105	0.122

*

R = 10, 000

simulation replications. Design parameter values:

N = 200

,

S N R = 3

,

D E N_{y} = 1.0

,

E V F_{x} = 0.0

,

{\bar{ρ}}_{x ε} = 0.3

,

ξ = 0.8

,

κ = 0.00

,

σ_{ε} = 1

,

q = 1

,

ϕ = 1.0

. These yield the DGP parameter values:

π_{λ} = 0.00

,

π_{η} = 0.00

,

σ_{v} = 0.60

,

σ_{η} = 1.0 * (1 - γ)

,

ρ_{v ε} = 0.5

(and

{\bar{ρ}}_{x η} = 0.00

,

{\bar{ρ}}_{x λ} = 0.00

).

Table 17. P5ft-EC *.

**Table 17.** P5ft-EC *.
Feasible t-Test: Actual Significance Level ( $θ = 1$ )
${\bar{ρ}}_{x ε} = 0.3$			Arellano-Bond									Blundell-Bond
${\bar{ρ}}_{x ε} = 0.3$	$L$	$γ$	AB1	AB1aR	AB1cR	AB2a	AB2aW	AB2c	AB2cW	$L$	$γ$	BB1	BB1aR	BB1cR	BB2a	BB2aW	BB2c	BB2cW
$T = 3$	4	0.20	0.252	0.155	0.118	0.175	0.164	0.134	0.134	7	0.20	0.263	0.149	0.113	0.198	0.147	0.146	0.129
		0.50	0.283	0.173	0.140	0.195	0.175	0.161	0.154		0.50	0.301	0.166	0.142	0.210	0.150	0.165	0.134
		0.80	0.274	0.170	0.146	0.191	0.166	0.168	0.158		0.80	0.297	0.145	0.135	0.188	0.127	0.147	0.111
$T = 6$	10	0.20	0.420	0.250	0.196	0.316	0.203	0.225	0.211	13	0.20	0.359	0.195	0.140	0.247	0.141	0.159	0.143
		0.50	0.404	0.231	0.204	0.289	0.177	0.203	0.187		0.50	0.367	0.193	0.163	0.236	0.129	0.149	0.130
		0.80	0.340	0.173	0.157	0.220	0.127	0.154	0.147		0.80	0.332	0.153	0.130	0.185	0.098	0.109	0.098
$T = 9$	16	0.20	0.453	0.258	0.218	0.359	0.201	0.237	0.224	19	0.20	0.399	0.215	0.165	0.294	0.149	0.169	0.157
		0.50	0.440	0.245	0.215	0.327	0.177	0.206	0.194		0.50	0.405	0.209	0.176	0.279	0.135	0.152	0.140
		0.80	0.353	0.175	0.161	0.247	0.118	0.145	0.140		0.80	0.339	0.155	0.138	0.216	0.098	0.102	0.096
	$L$	$β$	AB1	AB1aR	AB1cR	AB2a	AB2aW	AB2c	AB2cW	$L$	$β$	BB1	BB1aR	BB1cR	BB2a	BB2aW	BB2c	BB2cW
$T = 3$	4	1.43	0.253	0.154	0.114	0.175	0.167	0.132	0.128	7	1.43	0.265	0.151	0.130	0.205	0.152	0.162	0.136
		0.93	0.274	0.173	0.135	0.196	0.183	0.155	0.147		0.93	0.304	0.180	0.164	0.223	0.160	0.185	0.155
		0.31	0.259	0.154	0.130	0.176	0.159	0.148	0.139		0.31	0.288	0.158	0.151	0.195	0.140	0.163	0.139
$T = 6$	10	1.43	0.429	0.255	0.217	0.336	0.219	0.243	0.227	13	1.43	0.381	0.215	0.179	0.280	0.165	0.194	0.168
		0.93	0.413	0.234	0.207	0.305	0.195	0.218	0.202		0.93	0.387	0.213	0.192	0.266	0.157	0.185	0.168
		0.31	0.319	0.154	0.136	0.213	0.138	0.149	0.140		0.31	0.330	0.165	0.146	0.226	0.135	0.146	0.139
$T = 9$	16	1.43	0.463	0.270	0.238	0.375	0.215	0.262	0.248	19	1.43	0.423	0.239	0.203	0.323	0.173	0.206	0.192
		0.93	0.443	0.244	0.226	0.350	0.193	0.226	0.214		0.93	0.425	0.228	0.211	0.318	0.166	0.196	0.186
		0.31	0.348	0.161	0.147	0.249	0.129	0.149	0.145		0.31	0.355	0.167	0.153	0.262	0.135	0.150	0.145

*

R = 10, 000

simulation replications. Design parameter values:

N = 200

,

S N R = 3

,

D E N_{y} = 1.0

,

E V F_{x} = 0.0

,

{\bar{ρ}}_{x ε} = 0.3

,

ξ = 0.8

,

κ = 0.00

,

σ_{ε} = 1

,

q = 1

, These yield the DGP parameter values:

π_{λ} = 0.00

,

π_{η} = 0.00

,

σ_{v} = 0.60

,

σ_{η} = 1.0 * (1 - γ)

,

ρ_{v ε} = 0.5

(and

{\bar{ρ}}_{x η} = 0.00

,

{\bar{ρ}}_{x λ} = 0.00

).

Table 18. P5fJ-EC *.

**Table 18.** P5fJ-EC *.
Feasible Sargan-Hansen Test: Rejection Probability
${\bar{ρ}}_{x ε} = 0.3$	$df$				$θ = 1$
${\bar{ρ}}_{x ε} = 0.3$	AB	BB	Inc	$γ$	${JAB}_{a}^{(2, 1)}$	${JBB}_{a}^{(2, 1)}$	${JES}_{a}^{(2, 1)}$	${JAB}_{c}^{(2, 1)}$	${JBB}_{c}^{(2, 1)}$	${JES}_{c}^{(2, 1)}$
$T = 3$	2	4	2	0.20	0.036	0.042	0.068	0.032	0.045	0.072
				0.50	0.044	0.050	0.074	0.038	0.055	0.077
				0.80	0.067	0.057	0.069	0.061	0.064	0.074
$T = 6$	8	10	2	0.20	0.065	0.064	0.072	0.056	0.057	0.074
				0.50	0.071	0.066	0.067	0.066	0.063	0.070
				0.80	0.060	0.058	0.064	0.063	0.062	0.065
$T = 9$	14	16	2	0.20	0.063	0.058	0.063	0.053	0.058	0.072
				0.50	0.063	0.056	0.066	0.060	0.062	0.072
				0.80	0.053	0.048	0.061	0.053	0.056	0.066

*

R = 10, 000

simulation replications. Design parameter values:

N = 200

,

S N R = 3

,

D E N_{y} = 1.0

,

E V F_{x} = 0.0

,

{\bar{ρ}}_{x ε} = 0.3

,

ξ = 0.8

,

κ = 0.00

,

σ_{ε} = 1

,

q = 1

,

ϕ = 1.0

. These yield the DGP parameter values:

π_{λ} = 0.00

,

π_{η} = 0.00

,

σ_{v} = 0.60

,

σ_{η} = 1.0 * (1 - γ)

,

ρ_{v ε} = 0.5

(and

{\bar{ρ}}_{x η} = 0.00

,

{\bar{ρ}}_{x λ} = 0.00

).

Table 19. P5fJ-jA *,

j =

E,W,X.

**Table 19.** P5fJ-jA *, $j =$ E,W,X.
*P5fJ-EA . Feasible Sargan-Hansen Test: Rejection Probability**
${\bar{ρ}}_{x ε} = 0.3$	$df$				$θ = 1$
${\bar{ρ}}_{x ε} = 0.3$	AB	BB	Inc	$γ$	${JAB}_{a}^{(2, 1)}$	${JBB}_{a}^{(2, 1)}$	${JES}_{a}^{(2, 1)}$	${JAB}_{c}^{(2, 1)}$	${JBB}_{c}^{(2, 1)}$	${JES}_{c}^{(2, 1)}$	${JAB}_{a}^{(1, 1)}$	${JBB}_{a}^{(1, 1)}$	${JES}_{a}^{(1, 1)}$		${JAB}_{a}^{(2, 2)}$	${JBB}_{a}^{(2, 2)}$	${JES}_{a}^{(2, 2)}$	${JAB}_{c}^{(2, 2)}$	${JBB}_{c}^{(2, 2)}$	${JES}_{c}^{(2, 2)}$
$T = 3$	4	7	3	0.20	0.052	0.047	0.059	0.039	0.040	0.053	0.074	0.074	0.079		0.052	0.048	0.059	0.043	0.040	0.053
				0.50	0.065	0.057	0.064	0.043	0.048	0.063	0.088	0.094	0.091		0.066	0.057	0.066	0.047	0.045	0.057
				0.80	0.083	0.065	0.068	0.058	0.055	0.069	0.111	0.108	0.107		0.088	0.066	0.070	0.061	0.047	0.062
$T = 6$	28	37	9	0.20	0.065	0.056	0.062	0.042	0.048	0.057	0.105	0.107	0.091		0.072	0.064	0.067	0.045	0.044	0.049
				0.50	0.074	0.062	0.062	0.055	0.059	0.063	0.116	0.123	0.099		0.077	0.069	0.070	0.053	0.044	0.047
				0.80	0.081	0.059	0.057	0.071	0.065	0.062	0.128	0.128	0.103		0.083	0.068	0.067	0.065	0.042	0.041
$T = 9$	70	85	15	0.20	0.030	0.018	0.049	0.046	0.054	0.060	0.055	0.050	0.074		0.040	0.033	0.065	0.048	0.049	0.046
				0.50	0.032	0.021	0.048	0.060	0.070	0.073	0.055	0.055	0.082		0.042	0.036	0.067	0.053	0.050	0.048
				0.80	0.032	0.016	0.043	0.070	0.078	0.067	0.059	0.054	0.086		0.042	0.033	0.065	0.058	0.044	0.041
*P5fJ-XA . Feasible Sargan-Hansen Test: Rejection Probability**											*P5fJ-WA . Feasible Sargan-Hansen Test: Rejection Probability**
${\bar{ρ}}_{x ε} = 0.3$	$df$				$θ = 1$						$df$			$θ = 1$
${\bar{ρ}}_{x ε} = 0.3$	AB	BB	Inc	γ	${JAB}_{a}^{(2, 1)}$	${JBB}_{a}^{(2, 1)}$	${JES}_{a}^{(2, 1)}$	${JAB}_{c}^{(2, 1)}$	${JBB}_{c}^{(2, 1)}$	${JES}_{c}^{(2, 1)}$	AB	BB	Inc	γ	${JAB}_{a}^{(2, 1)}$	${JBB}_{a}^{(2, 1)}$	${JES}_{a}^{(2, 1)}$	${JAB}_{c}^{(2, 1)}$	${JBB}_{c}^{(2, 1)}$	${JES}_{c}^{(2, 1)}$
$T = 3$	9	13	4	0.20	0.103	0.128	0.103	0.088	0.133	0.124	6	10	4	0.20	0.110	0.130	0.099	0.077	0.111	0.112
				0.50	0.108	0.136	0.103	0.098	0.149	0.131				0.50	0.120	0.155	0.115	0.095	0.142	0.133
				0.80	0.175	0.220	0.141	0.187	0.291	0.219				0.80	0.182	0.252	0.178	0.171	0.278	0.235
$T = 6$	48	58	10	0.20	0.148	0.215	0.169	0.183	0.250	0.164	33	43	10	0.20	0.217	0.316	0.189	0.184	0.270	0.199
				0.50	0.145	0.248	0.218	0.223	0.336	0.243				0.50	0.233	0.382	0.237	0.244	0.395	0.293
				0.80	0.324	0.488	0.251	0.576	0.771	0.507				0.80	0.465	0.670	0.319	0.580	0.824	0.580
$T = 9$	114	130	16	0.20	0.013	0.004	0.049	0.290	0.368	0.189	78	94	16	0.20	0.141	0.138	0.093	0.303	0.416	0.255
				0.50	0.011	0.006	0.069	0.339	0.484	0.319				0.50	0.138	0.171	0.125	0.382	0.589	0.411
				0.80	0.033	0.016	0.052	0.815	0.940	0.619				0.80	0.349	0.401	0.122	0.835	0.968	0.731

*

R = 10, 000

simulation replications. Design parameter values:

N = 200

,

S N R = 3

,

D E N_{y} = 1.0

,

E V F_{x} = 0.0

,

{\bar{ρ}}_{x ε} = 0.3

,

ξ = 0.8

,

κ = 0.00

,

σ_{ε} = 1

,

q = 1

,

ϕ = 1.0

. These yield the DGP parameter values:

π_{λ} = 0.00

,

π_{η} = 0.00

,

σ_{v} = 0.60

,

σ_{η} = 1.0 * (1 - γ)

,

ρ_{v ε} = 0.5

(and

{\bar{ρ}}_{x η} = 0.00

,

{\bar{ρ}}_{x λ} = 0.00

).

Table 20. P

^{ϕ}

0fc-XA *.

**Table 20.** P $^{ϕ}$ 0fc-XA *.
Feasible Coefficient Estimators for Arellano-Bond ( $θ = 1$ )
${\bar{ρ}}_{x ε} = 0.0$			AB1			AB2a			AB2c			MAB
${\bar{ρ}}_{x ε} = 0.0$	$L$	$γ$	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE
$T = 3$	11	0.20	−0.024	0.089	0.092	−0.020	0.084	0.086	−0.023	0.084	0.087	−0.023	0.086	0.089
		0.50	−0.044	0.116	0.124	−0.038	0.110	0.117	−0.043	0.110	0.118	−0.043	0.113	0.121
		0.80	−0.146	0.199	0.246	−0.136	0.193	0.236	−0.144	0.189	0.237	−0.143	0.194	0.241
$T = 6$	50	0.20	−0.020	0.046	0.050	−0.016	0.041	0.044	−0.017	0.041	0.045	−0.015	0.037	0.040
		0.50	−0.035	0.053	0.064	−0.030	0.049	0.057	−0.031	0.048	0.057	−0.027	0.044	0.052
		0.80	−0.105	0.080	0.132	−0.093	0.076	0.120	−0.098	0.072	0.122	−0.089	0.070	0.113
$T = 9$	116	0.20	−0.017	0.034	0.038	−0.015	0.032	0.035	−0.015	0.029	0.033	−0.011	0.025	0.027
		0.50	−0.029	0.037	0.047	−0.026	0.035	0.044	−0.025	0.032	0.041	−0.020	0.028	0.035
		0.80	−0.079	0.050	0.094	−0.075	0.049	0.089	−0.073	0.044	0.085	−0.064	0.042	0.076
	$L$	$β$	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE
$T = 3$	11	1.43	0.006	0.158	0.158	0.005	0.145	0.145	0.005	0.147	0.147	0.005	0.148	0.148
		0.93	0.004	0.157	0.157	0.003	0.144	0.144	0.003	0.146	0.146	0.004	0.147	0.147
		0.31	−0.004	0.152	0.152	−0.006	0.140	0.140	−0.005	0.142	0.143	−0.004	0.142	0.142
$T = 6$	50	1.43	0.013	0.087	0.088	0.010	0.077	0.077	0.011	0.078	0.079	0.009	0.067	0.068
		0.93	0.014	0.085	0.086	0.011	0.074	0.075	0.012	0.076	0.077	0.011	0.065	0.066
		0.31	0.007	0.082	0.082	0.005	0.072	0.072	0.006	0.073	0.074	0.006	0.063	0.063
$T = 9$	116	1.43	0.014	0.065	0.067	0.013	0.060	0.062	0.012	0.057	0.058	0.010	0.046	0.047
		0.93	0.017	0.062	0.065	0.016	0.058	0.060	0.015	0.054	0.056	0.012	0.044	0.046
		0.31	0.012	0.059	0.060	0.011	0.055	0.056	0.011	0.051	0.052	0.009	0.042	0.043
Feasible Coefficient Estimators for Blundell-Bond ( $θ = 1$ )
${\bar{ρ}}_{x ε} = 0.0$			BB1			BB2a			BB2c			MBB
${\bar{ρ}}_{x ε} = 0.0$	$L$	$γ$	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE
$T = 3$	16	0.20	0.036	0.073	0.082	0.056	0.075	0.094	0.061	0.074	0.096	0.060	0.078	0.099
		0.50	0.011	0.084	0.084	0.036	0.079	0.087	0.039	0.078	0.087	0.062	0.101	0.119
		0.80	−0.040	0.108	0.116	−0.010	0.103	0.104	0.005	0.101	0.101	0.067	0.162	0.176
$T = 6$	61	0.20	0.009	0.041	0.042	0.016	0.038	0.042	0.036	0.040	0.054	0.028	0.037	0.047
		0.50	−0.006	0.045	0.045	0.009	0.042	0.043	0.032	0.041	0.052	0.041	0.042	0.059
		0.80	−0.049	0.055	0.074	−0.028	0.051	0.058	0.007	0.045	0.045	0.040	0.055	0.068
$T = 9$	133	0.20	0.002	0.031	0.031	0.003	0.030	0.030	0.023	0.029	0.037	0.014	0.025	0.029
		0.50	−0.011	0.033	0.034	−0.007	0.031	0.032	0.025	0.029	0.038	0.023	0.028	0.036
		0.80	−0.048	0.039	0.062	−0.042	0.038	0.056	0.007	0.030	0.031	0.021	0.034	0.040
	$L$	$β$	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE
$T = 3$	16	1.43	0.038	0.153	0.158	0.064	0.149	0.162	0.058	0.146	0.158	0.043	0.144	0.150
		0.93	0.029	0.149	0.152	0.047	0.139	0.147	0.048	0.138	0.146	0.049	0.141	0.150
		0.31	0.013	0.147	0.147	0.016	0.136	0.137	0.019	0.136	0.138	0.048	0.156	0.164
$T = 6$	61	1.43	0.008	0.086	0.086	0.010	0.078	0.079	0.004	0.078	0.078	−0.001	0.067	0.067
		0.93	0.014	0.083	0.084	0.015	0.074	0.076	0.011	0.074	0.074	0.003	0.064	0.065
		0.31	0.010	0.079	0.080	0.009	0.070	0.071	0.009	0.068	0.069	0.010	0.061	0.062
$T = 9$	133	1.43	0.005	0.064	0.064	0.005	0.061	0.061	−0.006	0.056	0.057	−0.004	0.047	0.047
		0.93	0.012	0.061	0.062	0.011	0.058	0.059	−0.000	0.053	0.053	−0.003	0.045	0.045
		0.31	0.011	0.057	0.059	0.011	0.054	0.055	0.005	0.048	0.048	0.003	0.041	0.041

*

R = 10, 000

simulation replications. Design parameter values:

N = 200

,

S N R = 3

,

D E N_{y} = 1.0

,

E V F_{x} = 0.0

,

{\bar{ρ}}_{x ε} = 0.0

,

ξ = 0.8

,

κ = 0.00

,

σ_{ε} = 1

,

q = 1

,

ϕ = 0.5

. These yield the DGP parameter values:

π_{λ} = 0.00

,

π_{η} = 0.00

,

σ_{v} = 0.60

,

σ_{η} = 1.0 * (1 - γ)

,

ρ_{v ε} = 0.0

(and

{\bar{ρ}}_{x η} = 0.00

,

{\bar{ρ}}_{x λ} = 0.00

).

Table 21. P

^{ϕ}

0ft-XA *.

**Table 21.** P $^{ϕ}$ 0ft-XA *.
Feasible t-Test: Actual Significance Level ( $θ = 1$ )
${\bar{ρ}}_{x ε} = 0.0$			Arellano-Bond									Blundell-Bond
${\bar{ρ}}_{x ε} = 0.0$	$L$	$γ$	AB1	AB1aR	AB1cR	AB2a	AB2aW	AB2c	AB2cW	$L$	$γ$	BB1	BB1aR	BB1cR	BB2a	BB2aW	BB2c	BB2cW
$T = 3$	11	0.20	0.225	0.088	0.073	0.140	0.075	0.086	0.069	16	0.20	0.257	0.118	0.087	0.322	0.165	0.213	0.187
		0.50	0.250	0.103	0.088	0.159	0.087	0.106	0.086		0.50	0.219	0.083	0.061	0.245	0.120	0.135	0.122
		0.80	0.353	0.191	0.170	0.254	0.147	0.196	0.171		0.80	0.240	0.090	0.060	0.196	0.071	0.084	0.069
$T = 6$	50	0.20	0.250	0.092	0.067	0.361	0.079	0.077	0.065	61	0.20	0.219	0.080	0.049	0.445	0.090	0.205	0.156
		0.50	0.307	0.125	0.101	0.411	0.099	0.114	0.096		0.50	0.212	0.068	0.045	0.435	0.074	0.184	0.151
		0.80	0.549	0.322	0.282	0.624	0.236	0.312	0.277		0.80	0.382	0.176	0.127	0.499	0.088	0.085	0.066
$T = 9$	116	0.20	0.274	0.101	0.074	0.693	0.093	0.088	0.077	133	0.20	0.210	0.067	0.044	0.730	0.064	0.164	0.124
		0.50	0.349	0.148	0.117	0.734	0.138	0.134	0.116		0.50	0.233	0.075	0.054	0.729	0.067	0.182	0.140
		0.80	0.655	0.404	0.363	0.893	0.376	0.412	0.379		0.80	0.511	0.267	0.209	0.866	0.217	0.085	0.066
	$L$	$β$	AB1	AB1aR	AB1cR	AB2a	AB2aW	AB2c	AB2cW	$L$	$β$	BB1	BB1aR	BB1cR	BB2a	BB2aW	BB2c	BB2cW
$T = 3$	11	1.43	0.215	0.070	0.064	0.117	0.063	0.071	0.056	16	1.43	0.230	0.083	0.073	0.214	0.106	0.106	0.096
		0.93	0.216	0.072	0.064	0.118	0.063	0.071	0.054		0.93	0.226	0.077	0.067	0.190	0.089	0.097	0.085
		0.31	0.215	0.069	0.063	0.117	0.061	0.072	0.056		0.31	0.220	0.073	0.062	0.163	0.069	0.083	0.068
$T = 6$	50	1.43	0.221	0.068	0.059	0.315	0.060	0.068	0.058	61	1.43	0.219	0.070	0.054	0.387	0.065	0.070	0.059
		0.93	0.227	0.070	0.061	0.314	0.064	0.069	0.059		0.93	0.227	0.070	0.058	0.395	0.070	0.072	0.060
		0.31	0.232	0.066	0.064	0.315	0.057	0.069	0.058		0.31	0.230	0.069	0.060	0.395	0.066	0.069	0.058
$T = 9$	116	1.43	0.232	0.067	0.056	0.654	0.068	0.065	0.056	133	1.43	0.221	0.065	0.052	0.719	0.063	0.066	0.054
		0.93	0.241	0.073	0.065	0.657	0.071	0.070	0.060		0.93	0.229	0.069	0.059	0.721	0.067	0.066	0.055
		0.31	0.242	0.068	0.069	0.657	0.066	0.072	0.061		0.31	0.236	0.070	0.065	0.734	0.067	0.067	0.057

*

R = 10, 000

simulation replications. Design parameter values:

N = 200

,

S N R = 3

,

D E N_{y} = 1.0

,

E V F_{x} = 0.0

,

{\bar{ρ}}_{x ε} = 0.0

,

ξ = 0.8

,

κ = 0.00

,

σ_{ε} = 1

,

q = 1

,

ϕ = 0.5

. These yield the DGP parameter values:

π_{λ} = 0.00

,

π_{η} = 0.00

,

σ_{v} = 0.60

,

σ_{η} = 1.0 * (1 - γ)

,

ρ_{v ε} = 0.0

(and

{\bar{ρ}}_{x η} = 0.00

,

{\bar{ρ}}_{x λ} = 0.00

).

Table 22. P

^{ϕ}

0fJ-XA *.

**Table 22.** P $^{ϕ}$ 0fJ-XA *.
Feasible Sargan-Hansen Test: Rejection Probability
${\bar{ρ}}_{x ε} = 0.0$	$df$				$θ = 1$
${\bar{ρ}}_{x ε} = 0.0$	AB	BB	Inc	$γ$	${JAB}_{a}^{(2, 1)}$	${JBB}_{a}^{(2, 1)}$	${JES}_{a}^{(2, 1)}$	${JAB}_{c}^{(2, 1)}$	${JBB}_{c}^{(2, 1)}$	${JES}_{c}^{(2, 1)}$
$T = 3$	9	13	4	0.20	0.036	0.289	0.505	0.037	0.240	0.412
				0.50	0.039	0.101	0.169	0.041	0.101	0.146
				0.80	0.056	0.048	0.060	0.060	0.057	0.063
$T = 6$	48	58	10	0.20	0.017	0.224	0.660	0.020	0.192	0.596
				0.50	0.019	0.082	0.327	0.022	0.071	0.201
				0.80	0.023	0.024	0.074	0.032	0.035	0.045
$T = 9$	114	130	16	0.20	0.001	0.004	0.350	0.016	0.134	0.642
				0.50	0.001	0.002	0.219	0.016	0.056	0.264
				0.80	0.000	0.001	0.068	0.024	0.025	0.039

*

R = 10, 000

simulation replications. Design parameter values:

N = 200

,

S N R = 3

,

D E N_{y} = 1.0

,

E V F_{x} = 0.0

,

{\bar{ρ}}_{x ε} = 0.0

,

ξ = 0.8

,

κ = 0.00

,

σ_{ε} = 1

,

q = 1

,

ϕ = 0.5

. These yield the DGP parameter values:

π_{λ} = 0.00

,

π_{η} = 0.00

,

σ_{v} = 0.60

,

σ_{η} = 1.0 * (1 - γ)

,

ρ_{v ε} = 0.0

(and

{\bar{ρ}}_{x η} = 0.00

,

{\bar{ρ}}_{x λ} = 0.00

).

Table 23. P

^{ϕ}

5fc-EA*.

**Table 23.** P $^{ϕ}$ 5fc-EA*.
Feasible Coefficient Estimators for Arellano-Bond ( $θ = 1$ )
${\bar{ρ}}_{x ε} = 0.3$			AB1			AB2a			AB2c			MAB
${\bar{ρ}}_{x ε} = 0.3$	$L$	$γ$	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE
$T = 3$	6	0.20	−0.169	0.193	0.256	−0.167	0.191	0.254	−0.182	0.184	0.259	−0.177	0.191	0.260
		0.50	−0.241	0.251	0.349	−0.243	0.251	0.350	−0.253	0.247	0.354	−0.250	0.247	0.352
		0.80	−0.292	0.335	0.444	−0.311	0.332	0.455	−0.312	0.327	0.452	−0.298	0.328	0.443
$T = 6$	30	0.20	−0.144	0.070	0.160	−0.135	0.069	0.151	−0.149	0.065	0.163	−0.145	0.063	0.158
		0.50	−0.187	0.087	0.206	−0.178	0.087	0.198	−0.183	0.082	0.200	−0.197	0.083	0.214
		0.80	−0.203	0.109	0.230	−0.191	0.110	0.221	−0.196	0.102	0.221	−0.192	0.105	0.219
$T = 9$	72	0.20	−0.142	0.048	0.150	−0.134	0.047	0.142	−0.144	0.043	0.150	−0.125	0.039	0.131
		0.50	−0.169	0.057	0.179	−0.163	0.056	0.172	−0.163	0.052	0.171	−0.159	0.049	0.167
		0.80	−0.165	0.068	0.179	−0.157	0.067	0.171	−0.157	0.062	0.169	−0.150	0.062	0.162
	$L$	$β$	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE
$T = 3$	6	1.43	0.677	0.816	1.060	0.685	0.804	1.056	0.719	0.778	1.060	0.682	0.788	1.043
		0.93	0.660	0.741	0.992	0.668	0.728	0.988	0.690	0.723	0.999	0.663	0.717	0.977
		0.31	0.534	0.695	0.877	0.554	0.669	0.868	0.568	0.672	0.880	0.533	0.675	0.860
$T = 6$	30	1.43	0.515	0.257	0.576	0.495	0.255	0.557	0.529	0.240	0.581	0.484	0.227	0.535
		0.93	0.508	0.240	0.562	0.489	0.239	0.544	0.498	0.228	0.547	0.489	0.216	0.534
		0.31	0.368	0.214	0.425	0.347	0.208	0.404	0.357	0.201	0.410	0.312	0.181	0.361
$T = 9$	72	1.43	0.485	0.166	0.512	0.462	0.161	0.490	0.489	0.148	0.511	0.409	0.132	0.429
		0.93	0.472	0.157	0.498	0.454	0.153	0.479	0.456	0.142	0.478	0.413	0.128	0.433
		0.31	0.327	0.135	0.354	0.307	0.128	0.332	0.312	0.122	0.335	0.258	0.104	0.279
Feasible Coefficient Estimators for Blundell-Bond ( $θ = 1$ )
${\bar{ρ}}_{x ε} = 0.3$			BB1			BB2a			BB2c			MBB
${\bar{ρ}}_{x ε} = 0.3$	$L$	$γ$	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE
$T = 3$	10	0.20	0.030	0.134	0.137	0.067	0.152	0.166	0.064	0.153	0.166	0.110	0.190	0.219
		0.50	−0.048	0.147	0.155	−0.014	0.157	0.158	−0.013	0.160	0.160	0.038	0.200	0.204
		0.80	−0.124	0.160	0.202	−0.103	0.172	0.201	−0.092	0.169	0.192	−0.044	0.208	0.213
$T = 6$	40	0.20	−0.048	0.062	0.079	−0.024	0.065	0.069	−0.004	0.071	0.071	0.004	0.075	0.075
		0.50	−0.080	0.065	0.103	−0.040	0.068	0.079	−0.015	0.068	0.069	0.017	0.077	0.079
		0.80	−0.102	0.066	0.122	−0.069	0.064	0.094	−0.041	0.060	0.072	−0.003	0.069	0.069
$T = 9$	88	0.20	−0.083	0.045	0.095	−0.072	0.044	0.085	−0.050	0.047	0.069	−0.053	0.044	0.069
		0.50	−0.104	0.047	0.114	−0.086	0.047	0.098	−0.042	0.047	0.063	−0.034	0.050	0.060
		0.80	−0.104	0.047	0.114	−0.086	0.045	0.097	−0.038	0.039	0.055	−0.018	0.044	0.047
	$L$	$β$	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE	Bias	Stdv	RMSE
$T = 3$	10	1.43	0.169	0.582	0.606	0.113	0.660	0.670	0.118	0.636	0.647	0.109	0.629	0.638
		0.93	0.327	0.492	0.591	0.295	0.523	0.601	0.289	0.520	0.594	0.335	0.484	0.589
		0.31	0.342	0.404	0.530	0.332	0.420	0.536	0.316	0.409	0.517	0.434	0.362	0.565
$T = 6$	40	1.43	0.291	0.216	0.362	0.238	0.227	0.329	0.192	0.232	0.301	0.152	0.229	0.275
		0.93	0.326	0.193	0.378	0.256	0.198	0.323	0.205	0.190	0.280	0.149	0.195	0.246
		0.31	0.276	0.159	0.319	0.228	0.149	0.273	0.194	0.139	0.239	0.179	0.131	0.222
$T = 9$	88	1.43	0.351	0.149	0.381	0.324	0.145	0.355	0.274	0.146	0.310	0.246	0.135	0.281
		0.93	0.358	0.136	0.383	0.323	0.134	0.350	0.240	0.129	0.272	0.195	0.127	0.233
		0.31	0.272	0.112	0.294	0.245	0.105	0.266	0.183	0.092	0.205	0.147	0.087	0.171

*

R = 10, 000

simulation replications. Design parameter values:

N = 200

,

S N R = 3

,

D E N_{y} = 1.0

,

E V F_{x} = 0.0

,

{\bar{ρ}}_{x ε} = 0.3

,

ξ = 0.8

,

κ = 0.00

,

σ_{ε} = 1

,

q = 1

,

ϕ = 0.5

. These yield the DGP parameter values:

π_{λ} = 0.00

,

π_{η} = 0.00

,

σ_{v} = 0.60

,

σ_{η} = 1.0 * (1 - γ)

,

ρ_{v ε} = 0.5

(and

{\bar{ρ}}_{x η} = 0.00

,

{\bar{ρ}}_{x λ} = 0.00

).

Table 24. Empirical findings for the Ziliak data by Arellano-Bond estimation.

**Table 24.** Empirical findings for the Ziliak data by Arellano-Bond estimation.
	(1)	(2)	(3)	(4)	(5)	(6)	(7)	(8)	(9)	(10)	(11)
	AB1R	AB1R	AB1R	AB2W	AB1	AB2	AB1R	AB2W	AB1RC	AB1RL2	BB2W
	1P3E1X	1P3E1X	1P3E1X	1P3E1X	1P3E1X	1P3E1X	1P2E2X	1P2E2X	1P2E2X	1P2E2X	1P2E2X
$γ_{1}$	0.207 **	0.208 **	0.190 **	0.200 **	0.208 **	0.200 **	0.211 **	0.202 **	0.251 **	0.081	0.305 **
	(0.068)	(0.069)	(0.073)	(0.063)	(0.025)	(0.015)	(0.070)	(0.065)	(0.108)	(0.142)	(0.052)
$γ_{2}$	0.070 **	0.069 **	0.056 **	0.078 **	0.069 **	0.078 **	0.071 **	0.082 **	0.058 **	0.122 **	0.159 **
	(0.030)	(0.029)	(0.028)	(0.029)	(0.023)	(0.011)	(0.030)	(0.029)	(0.029)	(0.045)	(0.037)
$β_{0}^{w}$	0.629 **	0.625 **	0.617 **	0.438 **	0.625 **	0.438 **	0.588 **	0.429 **	0.040	0.518 **	0.383 **
	(0.204)	(0.202)	(0.209)	(0.181)	(0.095)	(0.053)	(0.197)	(0.167)	(0.210)	(0.282)	(0.143)
$β_{1}^{w}$	0.001	−0.019	−0.013	−0.032	−0.019	−0.032	−0.048	−0.054	−0.038	−0.210 *	−0.225
	(0.123)	(0.121)	(0.115)	(0.112)	(0.069)	(0.037)	(0.112)	(0.106)	(0.124)	(0.178)	(0.080)
$β_{2}^{w}$	−0.070 *	−0.080 *	−0.078 *	−0.058 *	−0.080 *	−0.058 **	−0.098 *	−0.076 *	−0.090 *	0.012	−0.142 **
	(0.065)	(0.064)	(0.062)	(0.056)	(0.042)	(0.022)	(0.061)	(0.054)	(0.072)	(0.101)	(0.050)
$β_{0}^{k}$	−0.047	−0.047	−0.046	0.006	−0.047	0.006	−0.024 *	−0.014 *	−0.019 *	−0.027 *	−0.006
	(0.085)	(0.079)	(0.083)	(0.061)	(0.056)	(0.029)	(0.014)	(0.010)	(0.015)	(0.017)	(0.010)
$β_{1}^{k}$	0.016	0.008	0.006	−0.032	0.008	−0.032 *	0.009	0.003	−0.011	−0.089 *	0.004
	(0.069)	(0.064)	(0.068)	(0.052)	(0.052)	(0.026)	(0.011)	(0.009)	(0.014)	(0.073)	(0.009)
$β_{2}^{k}$	0.008	0.008	0.008	0.005	0.008	0.005	0.001	0.007	0.006	0.102 *	0.003
	(0.016)	(0.015)	(0.015)	(0.012)	(0.014)	(0.008)	(0.012)	(0.009)	(0.014)	(0.070)	(0.008)
$β_{0}^{d}$	−0.154 *	−0.118 *	−0.112 *	−0.072 *	−0.118 *	−0.072 *	−0.112 *	−0.073	0.321 *	−0.038	−0.046
	(0.092)	(0.090)	(0.090)	(0.071)	(0.084)	(0.038)	(0.086)	(0.077)	(0.251)	(0.211)	(0.062)
$β_{1}^{d}$	0.015	0.017	0.020	0.015	0.017	0.015	0.006	0.003	0.043	0.124	−0.004
	(0.048)	(0.047)	(0.047)	(0.044)	(0.040)	(0.018)	(0.046)	(0.043)	(0.083)	(0.197)	(0.036)
$β_{2}^{d}$	0.069 **	0.072 **	0.071 **	0.053 **	0.072 **	0.053 **	0.065 *	0.049 *	0.070 *	0.079 **	0.045
	(0.034)	(0.034)	(0.034)	(0.030)	(0.033)	(0.014)	(0.033)	(0.030)	(0.040)	(0.039)	(0.026)
$β^{a}$	−0.010	0.007	0.008	0.011	0.007	0.011 *	0.008	−0.001	0.024	0.002	0.003
	(0.023)	(0.019)	(0.019)	(0.017)	(0.017)	(0.011)	(0.015)	(0.013)	(0.021)	(0.020)	(0.006)
$β^{a a}$	−0.0000	−0.0000	−0.0001	−0.0001	−0.0000	−0.0001 *	−0.0000	0.0000	−0.0003	−0.0000	−0.0000
	(0.0002)	(0.0002)	(0.0002)	(0.0002)	(0.0002)	(0.0001)	(0.0002)	(0.0002)	(0.0003)	(0.0003)	(0.0001)
K	20	13	13	13	13	13	13	13	13	13	13
L	149	149	142	149	149	149	163	163	43	51	197
$A R (1)$	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000
$A R (2)$	0.151	0.150	0.207	0.502	0.038	0.427	0.157	0.490	0.288	0.307	0.336
$J A B^{(1, 0)}$	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.026	0.000
$J A B_{a}^{(1, 1)}$	0.057	0.173	0.117	0.173	0.173	0.173	0.137	0.137	0.005	0.078	0.020
$J A B_{a}^{(2, 1)}$	0.783	0.728	0.643	0.728	0.728	0.728	0.728	0.728	0.084	0.295	0.312
$J A B_{a}^{(2, 2)}$	0.139	0.207	0.157	0.207	0.207	0.207	0.218	0.218	0.034	0.184	0.084
${\hat{σ}}_{η}$	0.235	0.234	0.237	0.172	0.234	0.172	0.203	0.155	0.155	0.179	0.068
${\hat{σ}}_{ε}$	0.246	0.246	0.244	0.237	0.246	0.237	0.243	0.236	0.243	0.245	0.242
$T M^{w}$	0.775	0.728	0.698	0.482	0.728	0.482	0.616	0.418	−0.127	0.402	0.030

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license ( http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Kiviet, J.; Pleus, M.; Poldermans, R. Accuracy and Efficiency of Various GMM Inference Techniques in Dynamic Micro Panel Data Models. Econometrics 2017, 5, 14. https://doi.org/10.3390/econometrics5010014

AMA Style

Kiviet J, Pleus M, Poldermans R. Accuracy and Efficiency of Various GMM Inference Techniques in Dynamic Micro Panel Data Models. Econometrics. 2017; 5(1):14. https://doi.org/10.3390/econometrics5010014

Chicago/Turabian Style

Kiviet, Jan, Milan Pleus, and Rutger Poldermans. 2017. "Accuracy and Efficiency of Various GMM Inference Techniques in Dynamic Micro Panel Data Models" Econometrics 5, no. 1: 14. https://doi.org/10.3390/econometrics5010014

APA Style

Kiviet, J., Pleus, M., & Poldermans, R. (2017). Accuracy and Efficiency of Various GMM Inference Techniques in Dynamic Micro Panel Data Models. Econometrics, 5(1), 14. https://doi.org/10.3390/econometrics5010014

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Accuracy and Efficiency of Various GMM Inference Techniques in Dynamic Micro Panel Data Models

Abstract

1. Introduction

2. Basic GMM Results for Linear Models

2.1. Model and Estimators

2.2. Some Algebraic Peculiarities

2.3. Particular Test Procedures

3. Implementations for Dynamic Micro Panel Models

3.1. Model and Assumptions

3.2. Removing Individual Effects by First Differencing

3.2.1. Alternative Weighting Matrices

3.3. Respecting the Equation in Levels as Well

3.3.1. Effect Stationarity

3.3.2. Alternative Weighting Matrices under Effect Stationarity

3.4. Coefficient Restriction Tests

3.5. Tests of Overidentification Restrictions

3.6. Modified GMM

4. Simulation Design

5. Simulation Results

5.1. DGPs under Effect Stationarity

5.1.1. Results for the Reference Parametrization P0

5.1.2. Results for Alternative Parametrizations

5.2. Nonstationarity

6. Empirical Results

7. Major Findings

Supplementary Materials

Acknowledgments

Author Contributions

Conflicts of Interest

Appendix A. Corrected Variance Estimation for 2-Step GMM

Appendix B. Partialling Out and GMM

Appendix C. Extracting Redundant Moment Conditions

Appendix D. Derivations for (115)

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI